Build a Blockchain Database with bigchainDB for Immutable Data Transactions

Unless you’ve been living under a rock for the past decade, you’ve undoubtedly heard of the blockchain, or at least of bitcoin. Maybe you’ve heard so much about those virtual coins that you own a couple of them.

In this post, I’ll give you a conceptual overview of the blockchain, highlight its applications, and then show you how to use bigchainDB to develop a simple blockchain database for a human capital management (HCM) system.

What the heck is Blockchain?

Some people claim that the blockchain is the greatest invention since the internet, and for this reason, it has become a technological buzzword akin to artificial intelligence and cloud computing. The late 2017 cryptocurrency gold-rush only added confusion to an already foreign concept to many of us.

Security and Integrity

Consider your mobile phone. You can easily divide your phone’s features into functional and nonfunctional aspects, essentially distinguishing between what your phone does and how it does it. For example, you may expect your phone to take photos, make phone calls, connect to the internet, and last at least 10 hours on a full charge. These are functional requirements. In addition to them, you may also expect nonfunctional requirements: a beautiful user interface and physical design, performance, efficient energy management, and secure data storage.

We loosely summarize these nonfunctional requirements as security and integrity. When a system or product has integrity, it behaves as intended. In software, there are three types of integrity:

  1. Data integrity: The data collected and manipulated by the system will be free from flaws.
  2. Behavioral integrity: The system behaves as intended and is free from corruption.
  3. Security: Access to the system is managed and restricted to monitored users or groups.

We take security and integrity for granted simply because we use systems that maintain their integrity. Unfortunately, this can make us increasingly complacent when protecting our information. After all, the most popular password is password. Only when data breaches occur do we realize the value of a secure system. Further, given the sheer expanse of the internet and capacity for data breaches to affect several systems, software professionals should be dedicating significant resources to improving security, privacy and integrity.

Distributed versus Centralized Architecture

When developing software, it’s important for engineers to decide on an appropriate architecture. Two main architectural approaches are centralized and distributed.

In centralized networks, several components are located around and related to a single, central component (or node). This central node has a direct connection to, and control over, every other node in the network. In distributed networks, by contrast, the nodes are connected without any single node taking precedence over the others.

Distributed (left) versus Centralized (right) networks. From: Tanenbaum, Andrew S., and Maarten Van Steen. Distributed systems: principles and paradigms.
Upper Saddle River, NJ: Pearson Prentice Hall, 2007.

Both networks have advantages and disadvantages. Centralized networks provide a single source of control, with easier coordination and communication throughout the network. However, they suffer by relying significantly on the central node for performance, which makes them unreliable in the event of network attacks or intrusions. The integrity of this system largely depends on the integrity of the central node.

A centralized network runs the risk of being completely disabled when its core node is vulnerable.

Distributed networks combine the performance and power of all connected nodes, and since no single node takes precedence, redundancy is maintained. The integrity of this system depends on the combined integrity of every connected node. These networks can also scale as needed by simply adding more nodes. Unfortunately, distributed networks require complex communication and coordination services.
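To make the redundancy argument concrete, here is a minimal sketch that removes one node from a centralized (star) network and from a distributed network, then checks which nodes can still reach each other. The node names and the two small topologies are made up for illustration.

```python
from collections import deque

def reachable(adjacency, start, removed):
    """Return the set of nodes reachable from `start`, ignoring removed nodes."""
    if start in removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

# Centralized (star): every node talks only to the hub.
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"},
}

# Distributed: every node has several neighbours and none is special.
mesh = {
    "a": {"b", "c", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"a", "d"},
}

# Knock out the hub of the star, and an arbitrary node of the mesh.
star_after = reachable(star, "a", removed={"hub"})
mesh_after = reachable(mesh, "b", removed={"a"})

print(sorted(star_after))  # node "a" is now isolated
print(sorted(mesh_after))  # the remaining mesh nodes still reach each other
```

Losing the hub isolates every remaining node in the star, while the mesh keeps all of its surviving nodes connected.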

Peer-to-Peer Networking

Peer-to-peer (P2P) networks are distributed networks where all the connected nodes make their processing resources accessible throughout the entire network. The rights and roles of each node are equal, and the more nodes in a P2P network, the more robust and secure it is.

This system, what’s most interesting about it is, you’re interacting with peers, you’re exchanging information with a person down the street.

~ Shawn Fanning, Napster

Some P2P networks use a hybrid distributed and centralized network. For example, a handful of nodes can be selected to take precedence over nearby distributed nodes to provide a central point of control and data storage. These networks are especially powerful as they combine the advantages of both network architectures. These hybrid networks were used by file-sharing services, like Napster and Limewire.

However, there are several other applications where software developers may want to maintain a distributed, disintermediated network where all nodes have the same permissions. For example, electronic voting and currency exchange systems will want to ensure that no node has preference over the others.

The blockchain is simply a method of achieving integrity in a purely distributed network.

It’s all about Trust

In a world with many peoples, cultures and languages, we have historically relied on, or trusted, individuals who appeared to understand a variety of languages to translate and interpret our conversations correctly. If a single individual were to convince you that an orange was in fact a purple-coloured fruit, you’d think them insane. However, if a significantly large group of people expressed that sentiment, well we’d be eating purple slices at soccer games.

The blockchain provides a way of establishing trust in a network where all nodes are equal. The integrity of a purely distributed network really comes down to two important properties:

  1. The number of peers in the network, and
  2. the trustworthiness of each peer.

If the number of nodes and the trustworthiness of each node are high, then the network will attract more nodes and its integrity will grow. Unfortunately, it’s impractical to gauge either property due to communication failures and malicious peers. Therefore, we need a method of understanding the integrity of a network with unknown reliability and trustworthiness. This is known as the Byzantine Generals Problem.

The Byzantine Generals Problem is an agreement problem (described by Leslie Lamport, Robert Shostak and Marshall Pease in their 1982 paper, “The Byzantine Generals Problem”) in which a group of generals, each commanding a portion of the Byzantine army, encircle a city. These generals wish to formulate a plan for attacking the city. In its simplest form, the generals must decide only whether to attack or retreat. Some generals may prefer to attack, while others prefer to retreat. The important thing is that every general agrees on a common decision, for a halfhearted attack by a few generals would be chaotic and worse than either a coordinated attack or a coordinated retreat.
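A toy sketch shows why naive majority voting is not enough here. It assumes two loyal generals and one traitor (the names and votes are invented for illustration), and it is not the algorithm from the Lamport, Shostak and Pease paper. The traitor simply tells each loyal general something different.

```python
from collections import Counter

def decide(votes):
    """Majority decision over the votes a single general has collected."""
    return Counter(votes).most_common(1)[0][0]

# Two loyal generals announce their honest preference to everyone.
loyal_votes = {"general_1": "attack", "general_2": "retreat"}

# The traitor sends a different message to each loyal general.
traitor_messages = {"general_1": "attack", "general_2": "retreat"}

decisions = {}
for general in loyal_votes:
    # Each loyal general sees both loyal votes plus whatever the traitor said.
    votes_seen = list(loyal_votes.values()) + [traitor_messages[general]]
    decisions[general] = decide(votes_seen)

print(decisions)
```

Both loyal generals follow the same rule honestly, yet they end up with different decisions, which is exactly the halfhearted attack the problem warns about.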

So How Does the Blockchain Help?

The blockchain is a distributed, immutable database that determines the consensus of the entire network, through an incentivized proof-of-work system to deter cheating. In the blockchain, every node maintains an append-only, accessible ledger that contains a full copy of the entire database and its transactions.

Data is stored in successive blocks; each block contains a small chunk of information that verifies the content of the previous block. If an attempt is made to maliciously change the blockchain’s previous transactions, all the subsequent blocks will cease to match up. Further, all participants in the blockchain can be uniquely identified with public-private key cryptography, and the data transferred is also secured through hashing, a process by which an input is run through a secure hashing algorithm (bitcoin uses SHA-256) so that, no matter the input length, the output is always a fixed-length value: 256 bits for SHA-256, usually written as 64 hexadecimal characters. The task of finding an input whose hash meets a given condition is called a hash puzzle.

Hashing ‘Hello World’ with different hashing algorithms. Source: Drescher, Daniel. Blockchain Basics: a Non-Technical Introduction in 25 Steps. Apress, 2017.

Hashing is very different from encryption. Encryption is a two-way process: encrypting a string obfuscates it with a key, and only an individual or computer that possesses the right key can decrypt it. Hashing is a one-way process: a secure hashing algorithm (SHA) cannot be run backwards, so the only way to find an input that produces a given hash is trial and error, where a computer hashes candidate inputs until a match is found. In a blockchain, this trial-and-error process is called mining.
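You can see the fixed-length, one-way behaviour for yourself with Python’s standard hashlib module. This is a small sketch; the choice of algorithms and the second test string are just for illustration.

```python
import hashlib

message = b"Hello World!"

# Hash the same input with a few different algorithms.
digests = {
    name: hashlib.new(name, message).hexdigest()
    for name in ("sha1", "sha256", "sha512")
}

for name, digest in digests.items():
    # Each hex character encodes 4 bits of the digest.
    print(f"{name:>6}: {len(digest) * 4} bits  {digest[:16]}...")

# Changing a single character of the input gives an unrelated digest.
changed = hashlib.sha256(b"Hello World?").hexdigest()
print(changed == digests["sha256"])
```

Whatever the length of the input, each algorithm always returns a digest of the same size, and there is no key that turns the digest back into the message.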

Hashing puzzles are called “proof of work” because a mined solution proves that the work necessary to solve it has actually been performed.

To make a hash puzzle even harder to solve, developers can force the algorithm to return values with specific patterns. For example, suppose a blockchain required that every output value began with three leading zeroes. That is, instead of Hello World! returning a SHA-256 value of “7F83B165…”, it must return a value of “000…”. A developer could apply a nonce, a suffix that would adjust the final output, increasing from 0 until the desired hash appears.

In this scenario, “Hello World!” leads with three zeros after a nonce of 614 is added to the input string. Source: Drescher, Daniel. Blockchain Basics: a Non-Technical Introduction in 25 Steps. Apress, 2017.
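The nonce search can be sketched in a few lines of Python. This version assumes SHA-256 and appends the nonce directly to the message as decimal text, so the nonce it finds need not match the 614 in the figure: the result depends on the hash function and on exactly how the nonce is joined to the input. The function name mine is my own.

```python
import hashlib
from itertools import count

def mine(message, prefix="000"):
    """Try nonces 0, 1, 2, ... until the hex digest starts with `prefix`."""
    for nonce in count():
        candidate = f"{message}{nonce}".encode("utf-8")
        digest = hashlib.sha256(candidate).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest

nonce, digest = mine("Hello World!")
print(nonce, digest)
```

Each extra leading zero multiplies the expected number of attempts by 16, which is how a blockchain can tune the difficulty of its puzzle.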

To verify the transactions, blockchain nodes need to parse a series of input phrases and nonce values, matching each output with the desired hash. Integrity can be achieved by requiring a distributed consensus; validation from several nodes in the network. When a transaction is verified, it is added as a block to the blockchain, and all ledgers are updated to append that new transaction.
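Appending blocks, and noticing when history has been altered, can be sketched with nothing more than hashlib and json. This is a simplified illustration, not bitcoin’s or bigchainDB’s actual block format; the field names and sample transactions are invented.

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 over a canonical JSON encoding of the block."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def append_block(chain, data):
    """Append a block that records the hash of the block before it."""
    previous = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "previous_hash": previous})

def first_broken_link(chain):
    """Return the index of the first block whose back-reference fails, else None."""
    for i in range(1, len(chain)):
        if chain[i]["previous_hash"] != block_hash(chain[i - 1]):
            return i
    return None

chain = []
for entry in ("Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dan 1"):
    append_block(chain, entry)

intact = first_broken_link(chain)          # None: every link checks out

chain[0]["data"] = "Alice pays Bob 500"    # tamper with an old transaction
broken_at = first_broken_link(chain)       # block 1 no longer matches block 0

print(intact, broken_at)
```

To hide the change, an attacker would have to rewrite block 1 as well, which in turn breaks block 2, and so on down the chain.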

Miners dedicate significant processing power to solve hashing puzzles and are rewarded for their work with newly created coins. Mining rigs are energy intensive, generating heat and requiring massive cooling facilities to maintain their efficiency. For this reason, the blockchain community is experimenting with other validation frameworks like Proof-of-Stake, but we’ll save that for another post.

Blockchain Applications

Keeping that introduction in mind, it’s easy to see that a distributed ledger can be applied to several industries where security and integrity are paramount. These applications can include:

  1. Currencies: cryptocurrencies, which still follow traditional supply and demand curves, albeit with increased volatility because their prices are heavily driven by sentiment.
  2. Agriculture: to record the trade of goods and commodities, using the ledger as a pedigree. Consumers and health agencies can monitor produce for pesticides, transportation, and other processes through transactions, as food travels from farm to table.
  3. Apartment rentals: to record rental payments, inspections, and create a transparent network where landlord/tenant actions are traceable and public.
  4. Used car dealerships: to document the transfer-of-ownership of a vehicle as transactions, including major accidents and emissions tests.
  5. Identity management: access to certain services can be controlled by harnessing the SHA-256 hashing algorithm to provide every user with a unique identifier and an immutable ledger.
  6. Data governance and auditing: the immutability of the blockchain makes it a great tool to maintain version control and document ownership history.
  7. Hazardous waste management: As a nuclear engineer, I need to plug this in as an exceptional application for the blockchain. Tracking spent nuclear fuel is paramount to nuclear safety and security, and an immutable ledger could provide that integrity.

And many more. From credit cards to human capital management software, companies are readily exploring the blockchain’s immutability, integrity and security for global applications.

Set up a simple Blockchain Database with bigchainDB

The best way to learn anything in computer science is by building your own system, and that’s the same with the blockchain. BigchainDB is a blockchain database that combines the immutability and security of the blockchain ledger, with low latency and high performance from a traditional NoSQL (non-relational) database.

BigchainDB 2.0 uses Tendermint as its Byzantine fault-tolerant consensus system, and each node maintains its own MongoDB non-relational database. Therefore, if a hacker infiltrates a single node and manipulates its database, all the other nodes will remain unaffected. The consensus would identify the misaligned database, and the corrupt database would be reloaded with the correct data to match.
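As a rough illustration of how a tampered copy stands out, the sketch below fingerprints each node’s copy of the data and flags any node that disagrees with the majority. This is not how Tendermint or bigchainDB works internally; the node names, records and helper functions are all hypothetical.

```python
import hashlib
import json
from collections import Counter

def fingerprint(records):
    """SHA-256 over a canonical JSON encoding of a node's records."""
    encoded = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def find_outliers(node_databases):
    """Return the names of nodes whose data differs from the majority copy."""
    prints = {name: fingerprint(db) for name, db in node_databases.items()}
    majority, _ = Counter(prints.values()).most_common(1)[0]
    return sorted(name for name, value in prints.items() if value != majority)

ledger = [{"tx": 1, "owner": "finance"}, {"tx": 2, "owner": "marketing"}]

nodes = {
    "node_a": list(ledger),
    "node_b": list(ledger),
    "node_c": list(ledger),
    # One node has had a record altered.
    "node_d": [{"tx": 1, "owner": "finance"}, {"tx": 2, "owner": "attacker"}],
}

outliers = find_outliers(nodes)
print(outliers)
```

Because the honest nodes still hold identical copies, the altered node is easy to spot and could be restored from any of them.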

Assets can be created, signed with Ed25519 keypairs, and transferred as required. The Tendermint network can complete thousands of transactions per second, and appends the result of each transaction to every database in the network.

bigchainDB network schematic. Source: bigchainDB.

In this example, we’re going to investigate using bigchainDB to transfer an employee from one department to another. We will need to generate two departments (each identified by its own public/private keypair), and one asset (the employee), which will contain both their persistent data (e.g. date of birth), and the associated metadata, for dynamic information (e.g. position title).

To install bigchainDB, we first need to install Docker CE and docker-compose. Docker is a program that performs containerization, or operating-system-level virtualization where all the containers run on a single OS kernel while maintaining independence.

Install Docker

In Ubuntu 16.04+, first update your apt package index (the list of available packages):

sudo apt-get update

And then install packages to allow apt to use HTTPS:

sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common

Now add Docker to your apt package index:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

Rerun the package index update and install docker-ce:

sudo apt-get update
sudo apt-get install docker-ce

Now install docker-compose:

sudo curl -L https://github.com/docker/compose/releases/download/1.22.0/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose

And apply executable permissions to the binary:

sudo chmod +x /usr/local/bin/docker-compose

To test both docker-ce and docker-compose, run:

docker-compose --version
sudo docker run hello-world

Install bigchainDB

Now you can clone the latest version of bigchainDB from GitHub with the following command:

git clone https://github.com/bigchaindb/bigchaindb.git

Once complete, change to the bigchainDB directory and run the make help command to see a list of available commands:

cd bigchaindb
make help

You’ll see a list of commands. Notice that to run bigchainDB, you can use make run, or you can use make start to run it in the background as a daemon, and that you can stop the service with make stop. For now, we need to install the bigchainDB Python driver, as well as a cryptography tools package, for development with Python 3.6.

I’m assuming that you’re using a virtual environment for the following commands. If you aren’t, simply replace pip with pip3.

Install setuptools for pip:

pip install --upgrade setuptools

Then install the cryptography package:

pip install cryptography

Now you can install the bigchainDB python driver:

pip install bigchaindb_driver

And that’s it! Now we can begin using bigchainDB for a very basic HCM application that moves an employee from one department to another, securely transferring his or her persistent and dynamic data.

HCM with Blockchain

Let’s get started by starting our bigchainDB service. Take note of the root URL, which is likely to be http://localhost:9984:

cd bigchaindb
make start

Launch python 3 in terminal and begin importing the libraries we’ll require:

from bigchaindb_driver import BigchainDB
from bigchaindb_driver.crypto import generate_keypair

Add the BigchainDB root URL. For the purposes of this tutorial, we won’t require authentication tokens.

bdb_root_url = 'http://localhost:9984'
bdb = BigchainDB(bdb_root_url)

Now generate a keypair for both departments or supervisory organizations between which we will transfer our employee:

finance, marketing = generate_keypair(), generate_keypair()

We can define the employee as an asset. Assets carry persistent data, which is included in the asset definition, and metadata which can store dynamic data. Our asset (employee) will be Jim.

Jim = {
     'data': {
         'jim': {
             'employee_number': 'abcd1234',
             'date_of_birth': '07201985',
         },
     },
}

JimMetadata = {'position': 'Administration Assistant'}
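If you plan to register more than one employee, it may help to build these dictionaries with small helper functions. The sketch below is one way to do it; the function names are hypothetical, and storing the date of birth in ISO 8601 form (rather than the '07201985' string above) is my own assumption to avoid ambiguity between day and month.

```python
from datetime import date

def make_employee_asset(name, employee_number, date_of_birth):
    """Build the persistent (immutable) part of an employee asset."""
    return {
        "data": {
            name: {
                "employee_number": employee_number,
                # ISO 8601 date string, e.g. '1985-07-20' (an assumption).
                "date_of_birth": date_of_birth.isoformat(),
            },
        },
    }

def make_employee_metadata(position):
    """Build the dynamic part, which can change from transaction to transaction."""
    return {"position": position}

jim_asset = make_employee_asset("jim", "abcd1234", date(1985, 7, 20))
jim_metadata = make_employee_metadata("Administration Assistant")

print(jim_asset)
print(jim_metadata)
```

Apart from the date format, these are the same structures as Jim and JimMetadata above, so either pair can be passed to the CREATE transaction.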

Once we’ve identified the persistent and dynamic data for our asset, we can prepare the CREATE transaction:

prepared_create_tx = bdb.transactions.prepare(
        operation='CREATE',
        signers=finance.public_key,
        asset=Jim,
        metadata=JimMetadata
    )

We can then fulfill (sign) the transaction with Finance’s private key and send it to the network, completing the asset creation process and adding Jim to the Finance department.

fulfilled_creation_tx = bdb.transactions.fulfill(
        prepared_create_tx,
        private_keys=finance.private_key
    )

sent_creation_tx = bdb.transactions.send_commit(fulfilled_creation_tx)

Check if the transaction has been committed with the following command. It will return None if the transaction has not been included in a block.

block_height = bdb.blocks.get(txid=fulfilled_creation_tx['id'])

Suppose Jim has completed his tasks with the Finance team and needs to be transferred to the Marketing team. We’ll need to process that transfer securely with bigchainDB. To do that, we’ll need the ID of the transaction that created Jim in the first place:

creation_tx = fulfilled_creation_tx

asset_id = creation_tx['id']

transfer_asset = {
     'id': asset_id,
}

We’ll need a bit more information. To transfer an asset, we need the output index of the transaction (which would be 0 for the first output), the public keys of the previous owner (the Finance department in this case), and a fulfillment that contains the crypto-condition details of the transaction (the condition type can be ED25519-SHA-256 or THRESHOLD-SHA-256). To make this easier, we’ll just pull all the required data from the CREATE transaction:

output_index = 0

output = creation_tx['outputs'][output_index]

transfer_input = {
     'fulfillment': output['condition']['details'],
     'fulfills': {
          'output_index': output_index,
          'transaction_id': creation_tx['id'],
      },
      'owners_before': output['public_keys'],
 }

prepared_transfer_tx = bdb.transactions.prepare(
     operation='TRANSFER',
     asset=transfer_asset,
     inputs=transfer_input,
     recipients=marketing.public_key,
 )
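The dictionary plumbing above can be wrapped in a small helper so that it works for any earlier transaction. This is a sketch: the helper names are hypothetical, and the mock transaction below contains placeholder values only, not real bigchainDB output. It runs without a bigchainDB node because it only manipulates plain dictionaries.

```python
def build_transfer_input(previous_tx, output_index=0):
    """Build the `inputs` entry that spends one output of `previous_tx`."""
    output = previous_tx["outputs"][output_index]
    return {
        "fulfillment": output["condition"]["details"],
        "fulfills": {
            "output_index": output_index,
            "transaction_id": previous_tx["id"],
        },
        "owners_before": output["public_keys"],
    }

def asset_reference(previous_tx):
    """A CREATE transaction's id is the asset id; a TRANSFER carries it along."""
    if previous_tx["operation"] == "CREATE":
        return {"id": previous_tx["id"]}
    return {"id": previous_tx["asset"]["id"]}

# Placeholder transaction with the same shape as the fields used above.
mock_create_tx = {
    "id": "create-tx-id-placeholder",
    "operation": "CREATE",
    "outputs": [
        {
            "condition": {
                "details": {
                    "type": "ed25519-sha-256",
                    "public_key": "finance-public-key-placeholder",
                },
            },
            "public_keys": ["finance-public-key-placeholder"],
        },
    ],
}

mock_transfer_input = build_transfer_input(mock_create_tx)
mock_transfer_asset = asset_reference(mock_create_tx)
print(mock_transfer_input)
print(mock_transfer_asset)
```

The second helper matters once an asset has moved more than once: a later transfer must keep pointing at the ID of the original CREATE transaction, not at the ID of the most recent TRANSFER.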

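We can then fulfill the TRANSFER, send it to the network, and check if Marketing is now the new owner of our Jim asset:

fulfilled_transfer_tx = bdb.transactions.fulfill(
     prepared_transfer_tx,
     private_keys=finance.private_key,
 )

# Submit the signed transfer; until this step it exists only on our machine
sent_transfer_tx = bdb.transactions.send_commit(fulfilled_transfer_tx)

fulfilled_transfer_tx['outputs'][0]['public_keys'][0] == marketing.public_key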

Note that we sign the transfer with Finance’s private key, since Finance is the current owner. If Jim was successfully transferred, the comparison above returns True.

That’s it! We just completed a conceptual introduction to the blockchain, highlighted some applications, and then used bigchainDB to build our own secure, transparent database to transfer an employee from one department to another.


Geoffrey Momin is an Engineer and Technology Consultant. He is actively researching the application of blockchain, artificial intelligence and conversational interfaces to improve human capital and enterprise management.
