Basket of Apples, 1895, by Levi Wells Prentice. (source: Wikimedia Commons).

Blockchain may be the buzzword of 2015, but few understand what blockchains are. In this article, I’ll compare the key components of blockchains to their analogues in better-understood technologies, such as Git, BitTorrent, and Raft. In doing so, we’ll develop a sense of what blockchains are and how they present a unified framework to solve problems that are presently solved by many disparate tools.

Blockchains are databases

At their core, blockchains are databases. They contain data that is persisted to disk. Blockchains usually employ tabular schemas, which encode their core data types on top of a traditional database.

Common tables are accounts, transactions, or blocks. Roughly speaking, a block is a bundle of transactions, which are executed to update account balances. Consider a transaction encoded in SQL-like pseudocode below:

SQL-Like Pseudocode I:

[   BEGIN transaction;
         UPDATE accounts SET account_balance = account_balance - 50 WHERE account_address = sender;
         UPDATE accounts SET account_balance = account_balance + 50 WHERE account_address = receiver;
    END transaction;         ]

This transaction simply sends 50 units of a blockchain-resident currency from the sender to the receiver. The word transaction is appropriate in two senses. The first is the traditional meaning in database theory: a transaction is all or nothing, so either all the actions it contains are completed together, or any partially completed actions are rolled back.

The second is the everyday financial sense. Blockchain transactions are all or nothing, but they also preserve a global invariant: the total amount of value in the system stays constant (apart from newly minted coins; see the discussion of mining, below). Thus, transactions on blockchains are like cash transactions: the money really changes hands.
 

Understanding transactions

Transactions can contain more logic than in the simple example above. First, notice that the example above is buggy. If the sender has a balance of less than 50 units, the transaction shouldn’t go through at all, because otherwise the sender would end up with a negative balance. Let’s fix that problem.

SQL-Like Pseudocode II:
[   BEGIN transaction;
         UPDATE accounts SET account_balance = account_balance-50 WHERE account_address = sender AND account_balance >= 50;
         UPDATE accounts SET account_balance = account_balance+50 WHERE account_address = receiver;
    END transaction;         ]
 

Now we’ve added a bit of logic to our transaction: we validate that the sender has the requisite account balance before updating either account. We are assuming that if the first UPDATE fails, or matches no rows, then the whole transaction will roll back, so the receiver won’t get 50 units out of thin air.
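The rollback assumption can be made explicit in runnable form. The following is a minimal sketch using Python’s built-in sqlite3 module, not a description of how any particular blockchain stores balances; the transfer helper and the alice and bob accounts are hypothetical names introduced for illustration.

```python
import sqlite3

def transfer(conn, sender, receiver, amount):
    """Move `amount` from sender to receiver atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            debited = conn.execute(
                "UPDATE accounts SET account_balance = account_balance - ? "
                "WHERE account_address = ? AND account_balance >= ?",
                (amount, sender, amount),
            )
            if debited.rowcount != 1:
                # The first UPDATE matched no rows: abort the whole transaction.
                raise ValueError("insufficient funds or unknown sender")
            credited = conn.execute(
                "UPDATE accounts SET account_balance = account_balance + ? "
                "WHERE account_address = ?",
                (amount, receiver),
            )
            if credited.rowcount != 1:
                raise ValueError("unknown receiver")
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (account_address TEXT PRIMARY KEY, account_balance INTEGER)"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 60), ("bob", 0)])
conn.commit()

print(transfer(conn, "alice", "bob", 50))  # alice holds 60, so this goes through
print(transfer(conn, "alice", "bob", 50))  # alice now holds 10, so this is rolled back
print(dict(conn.execute("SELECT account_address, account_balance FROM accounts")))
```

Because the debit and the credit share one transaction, the sum of all balances is the same before and after every call, whether or not the transfer succeeds.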

Let’s take this a step further—we’ll only execute the transaction if the receiver isn't too rich.

SQL-Like Pseudocode III:

[   BEGIN transaction;
         UPDATE accounts SET account_balance = account_balance - 50 WHERE account_address = sender AND account_balance >= 50;
         UPDATE accounts SET account_balance = account_balance + 50 WHERE account_address = receiver AND account_balance < 10000;
    END transaction;         ]
 

Here, we create a transaction that updates the sender and receiver balances only if the sender has the requisite capital, and the receiver’s account balance is less than 10,000 units. This is a hint of the idea of smart contracts. A smart contract is somewhat like a transaction that encodes balance transfers based on programmatic logic along the lines of example III. We’ll discuss smart contracts in more detail later.

Persistent, replicated databases (related technology: Git)

Persistent or immutable data structures, which are common in pure functional programming, are data structures that cannot be mutated in place. Rather, to update a persistent data structure, it is necessary to create a new version of that data structure. It might sound inefficient to do so, but one of the advantages is that persistent data structures are “tamper proof” and, in practice, contain their histories. The transaction history of a blockchain is persistent: each block contains an authenticated reference to its parent, which contains a reference to its parent, and so on. These references are typically hash-based, as in Git, where each commit hash identifies a snapshot of the state of the entire repository together with the history that led to it.
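To see why hash-based parent references make history tamper-evident, consider the minimal sketch below. It assumes a block is just a dictionary holding a parent hash and a list of transactions, and that hashing the JSON encoding with SHA-256 is an acceptable stand-in for a real block format; the function names are hypothetical.

```python
import hashlib
import json

def block_hash(block):
    """Hash a block's contents, including the reference to its parent."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, transactions):
    """Return a new chain with one more block; the old chain is left untouched."""
    parent = block_hash(chain[-1]) if chain else None
    return chain + [{"parent": parent, "transactions": transactions}]

def is_valid(chain):
    """Check that every block points at the hash of the block before it."""
    return all(
        chain[i]["parent"] == block_hash(chain[i - 1]) for i in range(1, len(chain))
    )

chain = append_block([], ["genesis"])
chain = append_block(chain, ["alice pays bob 50"])
chain = append_block(chain, ["bob pays carol 20"])
print(is_valid(chain))

# Rewriting an old block changes its hash, so its child no longer points at it.
tampered = [dict(block) for block in chain]
tampered[1]["transactions"] = ["alice pays mallory 50"]
print(is_valid(tampered))
```

Note that append_block never modifies its input: each update yields a new list, which is the persistent style described above.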

Further, blockchains are replicated databases. Each participant in a blockchain protocol has a copy of the blockchain’s entire transaction history. As with Git, there is a symmetry between “local” and “remote.” Any remote peer should have exactly the same transaction history as the local peer (i.e. the client running on your machine), up to the last few blocks.

The last few blocks live in a sort of limbo until the network chooses a best block, which is analogous to the Git HEAD. It’s as if a band of rogue developers were in a commit war, fighting over the syntax of a particular module. They will eventually become tired and achieve consensus.

Peer-to-peer networks (related technology: BitTorrent)

If blockchains are replicated databases, how can you get a copy of one? And if they contain money, how can one be sure that the database isn’t fraudulent or out-of-date?

Part of the answer is that blockchains usually overlay a BitTorrent-like peer-to-peer network. When joining the network, one connects to a set of peers, and downloads the blockchain from them, validating the correctness of each block along the way. One doesn’t necessarily have to download all the blocks in order—as in BitTorrent, the blockchain can be assembled piecemeal.
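The piecemeal assembly can be sketched as follows. This is an illustration under simplifying assumptions, not a real wire protocol: each block is a hypothetical (identifier, parent identifier, payload) triple, the identifier is a SHA-256 hash of the other two fields, and forks are ignored so that every parent has at most one child.

```python
import hashlib

def block_id(parent_id, payload):
    """Derive a block's identifier from its parent's identifier and its payload."""
    return hashlib.sha256((parent_id + "|" + payload).encode("utf-8")).hexdigest()

def assemble(received, genesis_id):
    """Order blocks that arrived in any order by following parent references."""
    by_parent = {}
    for claimed_id, parent_id, payload in received:
        # Validate each block on arrival: the identifier must match the contents.
        if claimed_id == block_id(parent_id, payload):
            by_parent[parent_id] = (claimed_id, payload)
    ordered, cursor = [], genesis_id
    while cursor in by_parent:
        cursor, payload = by_parent[cursor]
        ordered.append(payload)
    return ordered

GENESIS = "0" * 64
a = block_id(GENESIS, "block 1")
b = block_id(a, "block 2")
c = block_id(b, "block 3")

# Peers deliver blocks out of order, and one peer sends a forged block.
received = [
    (c, b, "block 3"),
    (a, GENESIS, "block 1"),
    (b, a, "forged block 2"),
    (b, a, "block 2"),
]
print(assemble(received, GENESIS))
```

The forged block is dropped because its contents do not hash to the identifier it claims, and the remaining blocks fall into place regardless of arrival order.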

Distributed consensus (related technology: distributed databases, Raft)

Once we’ve downloaded the blockchain, how do we know that it is authentic? How do we validate the blocks we’ve downloaded?

Blockchains employ consensus algorithms similar to the popular consensus algorithm Raft. A consensus algorithm is a procedure for a set of actors or machines to agree on the state of some system. Raft is implemented in many popular distributed computing technologies including Akka and RethinkDB.

Like many other consensus algorithms, Raft operates in the state machine replication framework. This means that there are a number of machines that have identical rules for updating their state, but transmit updates to each other unreliably. Their task is to work together to sync their states efficiently.

Raft approaches the distributed consensus problem by solving three subproblems:

Leader election

Raft has randomized leader election. Leaders must be replaced when they fail or are otherwise unresponsive, and there is, at most, one leader at a time.

Log replication

The leader must accept updates and serialize them to an append-only log. Further, the leader must force the rest of the machines to be in sync with the leader’s append-only log.

Safety

Assume the updates are indexed by positive integers. If any machine has an update at a given index, then no other machine will apply a different update at that index.
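The safety property can be stated as a small check over the logs that each machine has applied. The sketch below tests the invariant only and is not an implementation of Raft; the find_conflict name and the string-valued updates are assumptions made for illustration.

```python
def find_conflict(applied_logs):
    """Return an index where two machines applied different updates, or None."""
    seen = {}  # index -> update first observed at that index
    for log in applied_logs:
        for index, update in enumerate(log, start=1):
            if seen.setdefault(index, update) != update:
                return index
    return None

# Machines may lag behind one another, but they never disagree on an index.
healthy = [["x=1", "y=2", "z=3"], ["x=1", "y=2"], ["x=1"]]
print(find_conflict(healthy))

# Here the second machine applied a different update at index 2.
broken = [["x=1", "y=2", "z=3"], ["x=1", "y=9"]]
print(find_conflict(broken))
```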

Minting new coins (mining)

Blockchains are quite similar: the blockchain database itself is an append-only log. A peer on the blockchain peer-to-peer network can become the leader (and gain the right to propagate updates) via mining. Mining is basically an anti-spam mechanism that imposes an economic cost on attacking a blockchain: peers must burn a significant amount of computational energy to earn the right to publish a block. It also confers an incentive: mining is typically the only means by which new “coins” are minted within blockchains.
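One common way to make publishing a block costly is proof of work: the miner searches for a number (a nonce) that makes the block’s hash fall below a target. The sketch below is a simplified illustration, assuming a hypothetical header made of a parent hash, a transaction string, and a nonce, with difficulty measured in leading zero hex digits; real blockchains use different header formats and targets.

```python
import hashlib

def mine(parent_hash, transactions, difficulty):
    """Search for a nonce whose block hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        header = f"{parent_hash}|{transactions}|{nonce}".encode("utf-8")
        digest = hashlib.sha256(header).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

def verify(parent_hash, transactions, nonce, difficulty):
    """Checking a proof takes a single hash, however long the search took."""
    header = f"{parent_hash}|{transactions}|{nonce}".encode("utf-8")
    return hashlib.sha256(header).hexdigest().startswith("0" * difficulty)

nonce, digest = mine("0" * 64, "alice pays bob 50", difficulty=4)
print(nonce, digest)
print(verify("0" * 64, "alice pays bob 50", nonce, difficulty=4))
```

The asymmetry is the point: finding the nonce takes many hash evaluations on average, while any peer can verify it with one. Each extra zero digit multiplies the expected search effort by 16.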

The key safety property of blockchains is that history is immutable. This is guaranteed at the block level and at the database level: blocks must contain only valid, consistently ordered transactions, and each must contain a unique reference to its parent block. Blocks added at an earlier point in history than the present are ignored by default. Thus, when a new block is propagated, it is appended to the history of the system, as there is no incentive to build blocks “previously in time” where they could revert history.

Technology | Leader Election | Log Replication      | Safety
Raft       | Randomized      | Append Only          | Updates Never Overwritten
Blockchain | Mining          | Append Only (Blocks) | Transaction History Never Overwritten

Embedded identities (related technology: TLS)

Identities on the blockchain are roughly equivalent to sets of public and private key pairs—so, an identity is a collection of accounts. The public key is the “address” associated with an account. The private key allows the account owner to spend the balance of the account, as in the SQL-like pseudocode examples, above.

To ensure that a transaction is authentic, it is cryptographically signed by the private key associated with the sending account. So, before executing a transaction, one must check that it is signed by the owner of the account sending the transaction.
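The sign-then-verify flow can be illustrated with the Python standard library alone. Production blockchains typically use elliptic-curve signatures (Bitcoin, for example, uses ECDSA), which the standard library does not provide, so this sketch substitutes a different scheme: Lamport one-time signatures, built only from a hash function. It is meant to show the roles of the private and public keys, not to suggest that any blockchain signs transactions this way; all function names are hypothetical.

```python
import hashlib
import secrets

def _h(data):
    return hashlib.sha256(data).digest()

def generate_keypair():
    """Private key: 256 pairs of random secrets. Public key: their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(zero), _h(one)) for zero, one in private]
    return private, public

def _bits(message):
    """The 256 bits of the message's SHA-256 hash, most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(private, message):
    """Reveal one secret per bit of the message hash. A key must sign only once."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]

def verify(public, message, signature):
    """Anyone holding the public key can check the revealed secrets."""
    return len(signature) == 256 and all(
        _h(signature[i]) == public[i][bit] for i, bit in enumerate(_bits(message))
    )

private, public = generate_keypair()
tx = b"send 50 from alice to bob"
signature = sign(private, tx)
print(verify(public, tx, signature))
print(verify(public, b"send 5000 from alice to mallory", signature))
```

A node that receives the transaction needs only the sender’s public key to run verify; the private key never leaves the account owner.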

Representing identities as public-private key pairs is not a new idea. It is core to the architecture of Transport Layer Security (TLS): a certificate binds a public key to an identity, and the matching private key lets a server prove to a user (or the user’s browser) that the pages being served come from an authentic source. TLS certs are issued and validated by a number of certificate authorities. Identities on the blockchain have no certificate authority, or, put another way, the certificate authority is the blockchain itself.

Smart contracts: Like SQL expressions & triggers

Smart contracts are perhaps the number two buzzword of 2015. We hinted earlier that smart contracts were just transactions with logic. However, while the transactions we wrote before were just one-offs, contracts sound like something persistent. Contracts have rules that need to be enforced in the future, not just at the time of creation.

Thus, contracts are a bit closer to SQL triggers than SQL expressions. A SQL trigger is a persistent rule that updates a table entry when some condition is met. Here’s an example of a contract written as a SQL trigger, in which the husband immediately sends any money he’s received to his wife.

SQL-Like Pseudocode IV:

[   CREATE TRIGGER forward_balance AFTER UPDATE ON accounts
        FOR EACH ROW
        WHEN NEW.account_address = husband_address
            BEGIN transaction;
              UPDATE accounts SET account_balance = account_balance + NEW.account_balance WHERE account_address = wife_address;
              UPDATE accounts SET account_balance = 0 WHERE account_address = husband_address;
            END transaction;         ]

The contract is invoked on a push basis: an event happens (the husband gets paid) and a transaction is created in response (the husband hands the balance of his account to the wife). A common assumption about smart contracts is that they can invoke themselves. At present, all production smart contract designs are push-based, in the sense that the user pushes data into the contract to invoke it, and the contract otherwise lies dormant. Smart contracts that invoke themselves are possible, but haven’t been implemented yet.
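The push-based model of example IV can be mimicked with a toy in-memory ledger. This is a sketch under stated assumptions rather than a model of any production contract platform: the Ledger class, the forward_to helper, and the account names are all hypothetical.

```python
class Ledger:
    """Toy in-memory ledger whose contracts run only when a deposit is pushed in."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.contracts = {}  # account address -> function(ledger, address)

    def register(self, address, contract):
        """Attach a persistent rule to an account."""
        self.contracts[address] = contract

    def deposit(self, address, amount):
        self.balances[address] = self.balances.get(address, 0) + amount
        contract = self.contracts.get(address)
        if contract is not None:
            # The contract is invoked by the incoming payment, never on its own.
            contract(self, address)

def forward_to(target):
    """Build a contract that forwards an account's whole balance to `target`."""
    def contract(ledger, address):
        amount = ledger.balances[address]
        ledger.balances[address] = 0
        ledger.deposit(target, amount)
    return contract

ledger = Ledger({"husband": 0, "wife": 100})
ledger.register("husband", forward_to("wife"))
ledger.deposit("husband", 50)
print(ledger.balances)
```

Between deposits the registered contract does nothing at all, which is the dormancy described above. The sketch does not guard against two accounts that forward to each other, which would recurse without end.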

What can we really do with blockchains?

Financial contracts

A good example of where a blockchain is useful is in the settlement of financial contracts. As opposed to the typically slow movement of money through the infrastructure of traditional financial institutions, blockchain-based settlement is fully automated and nearly instant. Margin requirements are enforced a priori, and at settle time, payouts happen automatically.

Passport for financial identity

Blockchains can also mitigate the challenges that financial institutions face in complying with Anti-Money Laundering (AML) and Know Your Customer (KYC) laws. These laws require institutions to collect detailed financial histories and sensitive personal data from all of their customers, a process that is often duplicated every time a customer interacts with a new financial institution. Using blockchain technology, KYC requirements can be enforced at the protocol level, with identities being roughly synonymous with known lists of public-private key pairs.

Blockchains contain complete and authenticated payment histories, at per-customer granularity. These payment histories are portable: it is easy to extract them from a blockchain and run sophisticated analytics. Blockchains offer a passport for financial identity.

Data consistency and provenance

Health care is a perfect example of a field where data consistency and provenance matter. Using a blockchain, medical records can be referenced (but not stored) by a unique patient identifier. Referral rules are programmatic and can be adjudicated by smart contracts. They can be recorded to the blockchain to create an efficient, fraud-proof market for matching doctors and patients. Using blockchain technology could allow billing to occur instantly and help keep prices fair.

Internet of Things

In large-scale manufacturing, where a company might be producing tens of millions of widgets per year, blockchain technology could enable a highly automated production pipeline. A blockchain-aware inventory scanner can scan the barcode of each part as it arrives in the factory, posting a certificate of the part’s integrity and authenticity to the blockchain. The factory itself can have thousands of blockchain-aware devices that perform functions ranging from detecting failed components and ordering new ones to managing load balances to serving as an employee time clock.
