End-to-End Data Integrity with Blockchain

Renjith KN
Aug 2, 2020

Data integrity is the property that the data used in a solution is correct, reliable, and useful for all participants. This article covers typical considerations for ensuring that the data used in a blockchain solution is correct, reliable, and timely for all participants, and that it is preserved from the point of creation to the point of use on the blockchain. The real strength of blockchain and Distributed Ledger Technology (DLT) lies in decentralized consensus and in the immutability it provides, not only for data added to the ledger but also for running Turing-complete programs known as Smart Contracts. Even though Smart Contracts aren't really that smart (and they aren't really contracts either), combining them with a blockchain produces a technological advance as groundbreaking as the internet itself. The ability to deploy and execute a trustworthy, self-executing, tamper-proof digital agreement on a distributed ledger is game-changing, but only when it is used in conjunction with other technologies and processes. On its own, a Smart Contract is essentially just a program running on multiple computers.

Enterprise solutions often need to integrate data from many different sources and systems, so a blockchain system must connect to the external systems that supply that data. This means Smart Contracts need connectivity to external resources: they must be able to receive external data, and they must be able to push data out to external systems. In addition, the whole process needs to be as secure and immutable as possible from end to end. Data integrity is often raised at blockchain events as a crucial barrier to adoption, but it is rarely discussed in detail. The focus is usually on the blockchain itself, which already offers a high level of security and integrity. To mitigate risk and increase security, however, you need to look at data integrity from an end-to-end perspective, not just from within the blockchain. Achieving data integrity in blockchain applications rests broadly on three pillars: data origin integrity, oracle integrity, and Trusted Execution Environments.

Data origin integrity

If you need to get external data into a Smart Contract on a blockchain, you need to ensure that the data keeps its integrity along the way. A common misconception is that using a blockchain alone ensures data integrity. Although a blockchain can reliably prevent undetected modification of data once it is confirmed on-chain, it enforces this only on the data it is given. Without data origin integrity, participants cannot draw useful insights from the data on the blockchain, because the data itself is faulty. To protect the integrity of the data, you should therefore try to decentralize it and obtain it from multiple data points or sources. That way you can tell when one of the sources may have been compromised and act accordingly. Whether this is possible depends on the scenario. If you need the current weather or the price of a stock, you can use multiple data points. If you are pulling data from an enterprise ERP system, this option usually isn't available, and you may have only a single data point.
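As a rough illustration of cross-checking several sources, the Python sketch below takes one reading per source, uses the median as the agreed value, and flags any source that strays more than a chosen tolerance from it. The function name, the source names, and the 2% tolerance are illustrative assumptions, not part of any particular product.

```python
from statistics import median


def aggregate_sources(
    readings: dict[str, float], max_deviation: float = 0.02
) -> tuple[float, list[str]]:
    """Combine readings from independent sources and flag suspicious ones.

    The median is used as the agreed value because a single outlier
    cannot drag it far. A source is flagged when it differs from the
    median by more than `max_deviation` (a fraction, e.g. 0.02 = 2%).
    """
    if not readings:
        raise ValueError("no data sources supplied")
    agreed = median(readings.values())
    suspect = [
        name
        for name, value in readings.items()
        if abs(value - agreed) > max_deviation * abs(agreed)
    ]
    return agreed, suspect


# Hypothetical price feeds for the same stock.
readings = {"feed_a": 101.0, "feed_b": 100.5, "feed_c": 120.0}
value, suspect = aggregate_sources(readings)
print(value, suspect)  # 101.0 ['feed_c']
```

With a single source, such as the ERP case above, the function simply returns that one reading and can flag nothing, which is exactly the limitation described.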

Oracle integrity

A common place for problems to occur is the point of submission to the blockchain. Because blockchains cannot directly access information about the real world, such as the status of a shipment, weather conditions, or commodity prices, they must rely on third parties, commonly referred to as oracles, to submit this information. Oracles are the means by which data is sent to and received from a Smart Contract on a blockchain. They are needed because Smart Contracts can only access data from within the blockchain itself; in practice, an oracle is essentially a piece of software running on a machine outside the chain.

Oracles open up the blockchain to a much wider set of real-world use cases, but because they act as a trusted third party between the data source and the Smart Contract, they also add a new potential point of failure to what is meant to be a highly secure, tamper-proof solution. Depending on the environment the blockchain solution operates in, care must be taken to ensure that oracles have not modified or omitted data before submitting it to the blockchain. This is referred to as oracle integrity. Despite all of a blockchain's strengths with regard to security and immutability, if an Oracle is compromised, the whole Smart Contract can be gamed into behaving in a way an attacker chooses. In the emerging era of automated digital agreements that cut out third parties and manual intervention, we need to do everything possible to prevent this.

A good way to remove or mitigate this extra point of failure is to decentralize the Oracle layer as well, passing data from the data sources to the blockchain through several independent Oracles. For example, if 3 data sources feed a Smart Contract, you can use 3 or more Oracles to pass the data in, ideally an odd number to reduce the chance of a split vote. Even if there is only 1 data source, you can still use 3 Oracles to pass the data in: if one of them is compromised, its submission will not match the other two, and that disagreement exposes it.
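A minimal Python sketch of the majority check described above, assuming each Oracle submits the same field (here a hypothetical shipment status) and a value is accepted only if a strict majority of Oracles agree. The oracle names and status strings are illustrative.

```python
from collections import Counter
from typing import Optional


def reach_consensus(reports: dict[str, str]) -> tuple[Optional[str], list[str]]:
    """Majority vote over the values submitted by independent oracles.

    Returns the agreed value and the oracles that disagreed with it,
    or (None, []) when no value has a strict majority.
    """
    if not reports:
        raise ValueError("no oracle reports supplied")
    value, votes = Counter(reports.values()).most_common(1)[0]
    if votes * 2 <= len(reports):
        # No strict majority (e.g. a 1-1 split between two oracles):
        # we cannot tell which oracle is faulty, so nothing is accepted.
        return None, []
    dissenters = [oracle for oracle, v in reports.items() if v != value]
    return value, dissenters


# Three oracles relay the same shipment status from a single source.
print(reach_consensus({"oracle_1": "DELIVERED", "oracle_2": "DELIVERED", "oracle_3": "IN_TRANSIT"}))
# ('DELIVERED', ['oracle_3'])
print(reach_consensus({"oracle_1": "DELIVERED", "oracle_2": "IN_TRANSIT"}))
# (None, [])
```

With two Oracles, a single dishonest one produces a deadlock; with three, it is simply outvoted and identified, which is why an odd number is preferred.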

Trusted Execution Environments

Another way to protect the integrity of data passed into a blockchain by an Oracle is to use Trusted Execution Environments (TEEs). In cyber and web security, a defense-in-depth approach is often taken to minimize the risk of a breach, and blockchain technology should be no different. Decentralized data feeds and Oracles can be combined with Trusted Execution Environments when the security requirements of the end-to-end solution call for this level of protection. A Trusted Execution Environment is a hardware-enforced secure area of a processor that allows software (in this case an Oracle) to run in an isolated enclave, separate from the operating system and other processes. This adds an extra layer of integrity and confidentiality because the trusted computing base is reduced, leaving fewer places from which attackers can target the software (the Oracle). One example of an Oracle running in a Trusted Execution Environment is the Town Crier project, which uses Intel SGX.
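The sketch below illustrates the kind of check a consumer of TEE-backed Oracle data might perform: accept a report only if it comes from an approved enclave build (its "measurement") and its authentication tag is valid. It is a simplified model, not Intel SGX's actual attestation protocol: real remote attestation relies on hardware-rooted asymmetric signatures, whereas this example uses a shared HMAC key from Python's standard library as a stand-in. The allowlist, key, and build names are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical allowlist: SHA-256 "measurements" of the oracle builds we trust.
TRUSTED_MEASUREMENTS = {hashlib.sha256(b"oracle-build-1.0").hexdigest()}


@dataclass(frozen=True)
class AttestedReport:
    measurement: str  # identity of the code that ran inside the enclave
    payload: bytes    # data the oracle wants to submit on-chain
    tag: bytes        # authentication tag over measurement + payload


def _tag(key: bytes, measurement: str, payload: bytes) -> bytes:
    # Stand-in for a hardware-rooted signature: HMAC-SHA256 with a shared key.
    return hmac.new(key, measurement.encode() + payload, hashlib.sha256).digest()


def sign_report(key: bytes, measurement: str, payload: bytes) -> AttestedReport:
    """Conceptually runs inside the enclave: binds the payload to the code identity."""
    return AttestedReport(measurement, payload, _tag(key, measurement, payload))


def accept_report(key: bytes, report: AttestedReport) -> bool:
    """Runs at the verifier before the data is handed to the Smart Contract."""
    if report.measurement not in TRUSTED_MEASUREMENTS:
        return False  # unknown or modified oracle code
    expected = _tag(key, report.measurement, report.payload)
    return hmac.compare_digest(expected, report.tag)  # constant-time comparison


key = b"demo-shared-key"  # hypothetical; real TEEs use hardware-protected keys
build = hashlib.sha256(b"oracle-build-1.0").hexdigest()
report = sign_report(key, build, b"price=101.0")
print(accept_report(key, report))  # True
forged = AttestedReport(build, b"price=999.0", report.tag)
print(accept_report(key, forged))  # False
```

The two checks mirror the two guarantees the enclave is meant to give: that the expected Oracle code produced the data, and that the data was not altered after it left the enclave.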

What about off-chain data?

It is common practice in blockchain deployments to store only the hash digest of data on-chain, rather than the data itself, when the dataset is particularly large, for example when it includes documents, images, videos, or long strings of text. Storing all of this on the blockchain would lead to blockchain bloat.

To address this issue, the larger dataset may be stored somewhere off-chain, whether in a shared database, another blockchain, or a peer-to-peer network like InterPlanetary File System (IPFS). The on-chain hashes of the data can then help a blockchain refer to the off-chain data as needed.

This arrangement means that any unwanted modification to the off-chain data can be detected by comparing it against the on-chain hash, but the same problems, considerations, and solutions relevant to data integrity still apply. The data must still be validated in some way, and since the Smart Contracts on the blockchain cannot do this directly, the validation must be performed by a different architectural component, such as the clients of participating users or a Trusted Execution Environment.
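The hash-anchoring pattern can be sketched in a few lines of Python. Here a dictionary stands in for the on-chain record (an assumption for illustration), and a client recomputes the SHA-256 digest of the data it retrieves off-chain before trusting it. A matching hash only proves that the data is unchanged since it was anchored, not that it was correct in the first place, which is why the validation concerns above still apply.

```python
import hashlib

# Stand-in for the ledger: document ID -> SHA-256 digest recorded on-chain.
on_chain_digests: dict[str, str] = {}


def anchor(doc_id: str, content: bytes) -> str:
    """Record the digest of a large document on-chain; the document stays off-chain."""
    digest = hashlib.sha256(content).hexdigest()
    on_chain_digests[doc_id] = digest
    return digest


def verify_off_chain(doc_id: str, content: bytes) -> bool:
    """Client-side check: does the retrieved off-chain copy match the anchored digest?"""
    expected = on_chain_digests.get(doc_id)
    return expected is not None and hashlib.sha256(content).hexdigest() == expected


# Hypothetical document stored in a shared database or IPFS.
anchor("invoice-001", b"amount=1000;currency=EUR")
print(verify_off_chain("invoice-001", b"amount=1000;currency=EUR"))  # True
print(verify_off_chain("invoice-001", b"amount=9000;currency=EUR"))  # False
```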

Output

The output of a Smart Contract works in a similar fashion to its input: Oracles are required to capture the data, so the same solutions apply. If a Smart Contract produces a specific output, whether a payment or something else, you can use multiple Oracles to capture it and have them reach consensus on what should happen. Only if they agree is the action performed, giving a high degree of confidence that the output was not tampered with or modified at any point between the moment the Smart Contract produced it and the moment the intended action was triggered.
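A minimal Python sketch of this output-side check, assuming each Oracle reports the output it observed as a tuple and the action (for example a payment) fires only when at least a quorum of identical observations is present. Setting the quorum to the number of Oracles reproduces the "all agree" rule; the tuple layout, names, and callback are hypothetical.

```python
from collections import Counter
from collections.abc import Callable, Hashable


def execute_if_confirmed(
    observations: list[Hashable],
    quorum: int,
    action: Callable[[Hashable], None],
) -> bool:
    """Trigger `action` only if at least `quorum` oracles observed the same output."""
    if quorum < 1:
        raise ValueError("quorum must be at least 1")
    if not observations:
        return False
    output, votes = Counter(observations).most_common(1)[0]
    if votes < quorum:
        return False  # not enough agreement: do not act
    action(output)
    return True


# Three oracles watch the contract's payment output; all three must agree.
sent = []
agreed = [("pay", "supplier-42", 500)] * 3
print(execute_if_confirmed(agreed, quorum=3, action=sent.append))  # True
tampered = [("pay", "supplier-42", 500), ("pay", "supplier-42", 500), ("pay", "attacker", 500)]
print(execute_if_confirmed(tampered, quorum=3, action=sent.append))  # False
```

A lower quorum, such as 2 of 3, tolerates one faulty Oracle at the cost of a weaker guarantee; which trade-off is acceptable depends on the value of the action being triggered.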

Bottom Line

When you bring everything together, combining blockchain technology with externally connected, highly secure Smart Contracts that can take part in end-to-end processes involving more than just the blockchain and adding data to the ledger, you will see a whole new era of digital agreements come to fruition in just about every industry and vertical.

The fourth industrial revolution is coming. AI, big data, blockchain, and machine learning are going to drastically change the way businesses operate. Smart Contracts running on a Distributed Ledger will play a big part in this, but for that to happen we need to ensure a high level of security and risk mitigation across the whole process, while maintaining the highest level of data integrity. Only then will the true potential of blockchain and Smart Contract technology be realized.


Renjith KN

Senior technical architect with more than 15 years of experience in microservices, blockchain, and J2EE technologies.