Blockchain Data Governance.

Renjith KN
Oct 17, 2021

The key idea behind blockchain technology is a completely distributed, append-only ledger enabled by cryptography. It’s been a few years now since blockchain first burst onto the digital landscape as the underlying technology behind cryptocurrency. But today, the technology is making its mark far beyond the crypto world.

As the use of blockchain expands into different areas, it is becoming increasingly apparent that it still requires the same fundamental supporting controls as any other software system. If erroneous data, such as a fraudulent record, is entered and proper controls are not in place, end users may end up relying on that information. Because users expect information on a blockchain to be reliable, data governance controls are more important than ever. Data governance can be thought of as the proper management of data, from its initial capture to its protection, availability, usability, and eventual archiving or destruction.

Data governance in blockchain systems.

A blockchain as a data store raises a range of governance issues, first as a data processing platform that removes the need for a centralized authority, and second as append-only, permanent data storage. Data governance provides comprehensive control, including processes, policies, and structures, that can be applied to all data assets to support sound decision-making in an organization. Data quality is an important aspect of data governance and is measured by consistency, traceability, availability, compliance, and credibility.

Here, we focus on the privacy and data quality aspects of governance and present a data flow architecture and analysis that address both.

Data flow and placement architecture.

Figure: A blockchain data flow

The diagram shows a representative data architecture of a blockchain-based application, depicting the flow of data between blockchain networks and external environments. Usually, data flows from external sources into a blockchain data store, as shown in the diagram. In practice, most data goes through some processing steps in external storage before it is transferred on-chain in the form of metadata, summaries, or attestations, generally as a hash of the data. Because on-chain storage can be limited in capacity and performance, and raises privacy and cost concerns, most blockchain-based applications with high throughput and storage requirements need to consider an on-chain/off-chain hybrid similar to the one shown in the figure.

Blockchains cannot directly access data outside of the network. However, to make deterministic choices, blockchain-based applications need some external information (e.g., real-time weather conditions). An oracle is a data feeder that connects an external data source to the blockchain network. Oracles typically have both on-chain and off-chain components. Either a human or a server can play the role of an oracle and submit data (from an external system or off-chain storage) to a blockchain as a set of transactions. An oracle may also rely on a smart contract that adds metadata such as the reliability or reputation of the data source.

The on-chain data store may consist of multiple blockchains, much as an application may use multiple databases with different features, performance, and security settings. Therefore, even within the on-chain data store, data may flow between multiple blockchains, referred to as auxiliary chains.
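The oracle role described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ToyChain, read_weather, and oracle_submit are hypothetical names, the "chain" is an in-memory list rather than a real blockchain client, and the reputation score is a made-up value standing in for the source metadata an oracle contract might record.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field


@dataclass
class ToyChain:
    """Hypothetical in-memory stand-in for an on-chain data store."""
    transactions: list = field(default_factory=list)

    def submit(self, tx: dict) -> int:
        # Append-only: transactions are added, never edited or removed.
        self.transactions.append(tx)
        return len(self.transactions) - 1


def read_weather() -> dict:
    """Hypothetical external source; a real oracle would query an API or a sensor."""
    return {"station": "demo-1", "temperature_c": 21.5}


def oracle_submit(chain: ToyChain, source_name: str, reputation: float) -> int:
    """Fetch an external reading and submit it as a transaction with source metadata."""
    payload = json.dumps(read_weather(), sort_keys=True)
    tx = {
        "payload": payload,
        "payload_hash": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "source": source_name,
        "source_reputation": reputation,  # assumed metadata about feed reliability
        "submitted_at": time.time(),
    }
    return chain.submit(tx)


chain = ToyChain()
index = oracle_submit(chain, source_name="weather-feed", reputation=0.9)
```

The off-chain part here is the function that reads the external source, and the on-chain part is whatever accepts the transaction. In a real deployment the latter would typically be a smart contract rather than a Python list.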

The on-chain data storage usually has the following options.

  1. Using Public Blockchain: To fully leverage transparency, immutability, and consistency, an application can utilize a public blockchain.
  2. Using Private Blockchain: As only the authorized nodes can participate in a private blockchain network, it provides confidentiality.
  3. Using Consortium Blockchain: Similar to a private blockchain in offering higher performance, scalability, and confidentiality, except that the network spans multiple organizations that interact with each other.
  4. Using Auxiliary Chains: More than one blockchain is used in a larger software system to build connected data stores. Highly sensitive data of an organization can be stored on a private blockchain owned by the organization, while data essential for business interactions between multiple entities could be stored on a consortium blockchain. Data from both a private and a consortium blockchain could be anchored on a public blockchain to leverage the public blockchain’s stronger transparency and immutability properties.
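One way to picture the anchoring mentioned in option 4 is to fold the latest block hashes of the private and consortium chains into a single digest and record only that digest on the public chain. The sketch below assumes a simple Merkle-style fold. The block contents, the public_chain list, and the helper names are hypothetical, and real anchoring schemes differ in their details.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(leaf_hashes: list) -> str:
    """Fold hex digests pairwise into one root; an odd node is paired with itself."""
    if not leaf_hashes:
        raise ValueError("at least one leaf is required")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [
            sha256_hex(bytes.fromhex(level[i]) + bytes.fromhex(level[i + 1]))
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Hypothetical block hashes taken from a private and a consortium chain.
private_heads = [sha256_hex(b"private-block-41"), sha256_hex(b"private-block-42")]
consortium_heads = [sha256_hex(b"consortium-block-7")]

# Only the combined digest goes to the (stand-in) public chain, not the data itself.
leaves = private_heads + consortium_heads
anchor = {"anchor_root": merkle_root(leaves), "leaf_count": len(leaves)}
public_chain = [anchor]
```

Because only a digest is published, the sensitive data stays on the private or consortium chain, while any later change to an anchored block would produce a different root.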

On-chain and off-chain data store combination: Most applications rely on data from one or more external systems. Therefore, a more realistic architectural design for a blockchain-based application places data both on-chain and off-chain. This combination is commonly found in large software systems that include a blockchain. Such a hybrid design can also overcome the scalability issues that result from full data replication and the limited computational capacity of blockchains. One architectural strategy for separating data is to keep only the application's metadata or hashes on-chain while keeping the actual data off-chain (e.g., on-premise, cloud, or P2P storage).
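The hash-on-chain, data-off-chain strategy can be sketched as follows. This is a minimal illustration, not a reference design: off_chain_store and on_chain_records are hypothetical in-memory stand-ins for real storage and a real ledger, and SHA-256 over canonical JSON is an assumed choice of fingerprint.

```python
import hashlib
import json

off_chain_store = {}   # hypothetical stand-in for on-premise, cloud, or P2P storage
on_chain_records = []  # hypothetical stand-in for an append-only ledger


def canonical_hash(document: dict) -> str:
    """Hash a canonical JSON encoding so key order does not change the digest."""
    encoded = json.dumps(document, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def store(doc_id: str, document: dict) -> str:
    """Keep the full document off-chain and record only its hash on-chain."""
    off_chain_store[doc_id] = document
    digest = canonical_hash(document)
    on_chain_records.append({"doc_id": doc_id, "sha256": digest})
    return digest


def verify(doc_id: str) -> bool:
    """True if the off-chain copy still matches the most recent on-chain hash."""
    recorded = [r["sha256"] for r in on_chain_records if r["doc_id"] == doc_id]
    if not recorded or doc_id not in off_chain_store:
        return False
    return canonical_hash(off_chain_store[doc_id]) == recorded[-1]


store("invoice-001", {"amount": 1200, "currency": "USD"})
ok_before = verify("invoice-001")
off_chain_store["invoice-001"]["amount"] = 9999  # simulate off-chain tampering
ok_after = verify("invoice-001")
```

The ledger carries a fixed-size digest per document regardless of document size, which is what keeps on-chain storage small. The trade-off is that availability of the data itself now depends on the off-chain store.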

Data Analysis

The data generated by the blockchain network can be collected to visualize and analyze historical and current system behavior. System operators and administrators need such visualization and analysis to understand behavior at the network layer and the transactions in the network, and to operate the blockchain-based system accordingly. Blockchain data analysis techniques can be divided into two groups: visualization and data mining. In a blockchain-based system, data visualization is essential for illustrating system behavior, because a blockchain is highly dynamic, with nodes connecting and disconnecting all the time. An overall visualization makes the system easier to understand. By mining the immutable data, we can reconstruct the historical behavior of the system and detect any data manipulation.
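As a small illustration of the data mining side, the sketch below builds a toy hash-linked chain, counts transactions per sender, and then re-checks every block's hash and back-link to flag manipulated history. The block layout and the function names are assumptions made for this example and do not correspond to any particular blockchain platform.

```python
import hashlib
import json
from collections import Counter

GENESIS_PREV = "0" * 64  # assumed placeholder for the first block's back-link


def block_hash(block: dict) -> str:
    body = {k: block[k] for k in ("index", "prev_hash", "transactions")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def build_chain(tx_batches: list) -> list:
    chain, prev = [], GENESIS_PREV
    for i, txs in enumerate(tx_batches):
        block = {"index": i, "prev_hash": prev, "transactions": txs}
        block["hash"] = block_hash(block)
        chain.append(block)
        prev = block["hash"]
    return chain


def find_tampered(chain: list) -> list:
    """Return indexes of blocks whose stored hash or back-link no longer checks out."""
    bad, prev = [], GENESIS_PREV
    for block in chain:
        if block["hash"] != block_hash(block) or block["prev_hash"] != prev:
            bad.append(block["index"])
        prev = block["hash"]
    return bad


def sender_activity(chain: list) -> Counter:
    """Count transactions per sender across the whole chain."""
    return Counter(tx["sender"] for block in chain for tx in block["transactions"])


chain = build_chain([
    [{"sender": "alice", "amount": 5}],
    [{"sender": "bob", "amount": 2}, {"sender": "alice", "amount": 1}],
    [{"sender": "carol", "amount": 7}],
])
activity = sender_activity(chain)
clean = find_tampered(chain)
chain[1]["transactions"][0]["amount"] = 200  # simulate manipulation of history
flagged = find_tampered(chain)
```

Counts like the ones in activity are the kind of aggregate that could feed a dashboard, while find_tampered shows why hash-linked data lends itself to detecting manipulation after the fact.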

Conclusion.

We are seeing blockchain applications grow far beyond the initial craze of Bitcoin. The data quality properties of blockchain technology, such as consistency, transparency, auditability, and availability, together with their utility for data provenance, are useful in many data governance use cases. Having a clear understanding of a blockchain as a data store, and being able to evaluate its characteristics against those of conventional data stores, will help application developers design and implement blockchain-based applications more effectively.

Renjith KN

Senior technical architect with more than 15 years of experience in microservices, blockchain, and J2EE technologies.