A Primer on Consensus in DLT

10 min readFeb 19, 2020

Distributed Ledger Technology (DLT) brings together a number of capabilities — shared views of tamper-resistant data, smart contracts, etc. — to create a powerful solution for new applications. Underlying DLT are two core concepts: cryptography and consensus. Both are complex in nature and grounded in mathematics and computer science. However, unlike cryptography where standards bodies (e.g. ANSI, ISO) and government agencies (e.g. NIST) can be relied upon to certify an algorithm, there is not the same convenience for consensus. This puts the onus on organizations to do their own due diligence to confirm whether a consensus algorithm behaves and performs in a way that is expected.

This paper provides an overview of consensus within the context of DLT and will also cover a handful of the most commonly employed algorithm families. The paper is intended for a broad audience and it intentionally stays at the conceptual level for many reasons, not the least of which are:

Substantial bodies of work already exist on the topic in the public domain (over two thousand published on arVix alone);
Within an algorithm family, there are many variants (e.g. Istanbul BFT is an offspring of PBFT) and it will be impractical to cover every variant;
Minimize the loss of accuracy due to summarization

The paper will defer to the appropriate original works as much as possible for details and proofs of specific algorithms.

What is consensus?

While there is no shortage of definitions for “consensus” (Google is Your Friend), for the purpose of this paper, we define consensus as:

the collective agreement on the ordering of transactions and more generally, the state of the ledger

Consensus is necessary to maintain the integrity and consistency of the data that is recorded on the ledger. Without consensus, participants will not be able to trust what’s on the ledger.

In traditional architectures, a central system or entity is responsible for holding the official gold copies of all transactions. Consensus is authoritarian and connecting nodes inherently rely on the central system as the single source of truth. With distributed ledger technology, all nodes are peer to each other and can initiate transactions directly on the ledger. There must be a mechanism for everyone to review and confirm whether any initiated transaction is valid. This mechanism is called consensus and is used to keep the ledger globally consistent for all nodes.

If there is one thing to get out of the way first, it’s: There is no perfect consensus algorithm!

When a vendor claims to have the perfect algorithm — they are either naive, lying — or both! Every consensus algorithm has strengths and weaknesses. Some may be better suited than others for certain requirements (e.g. fast performance in a private network), but become not viable when applied to another set of requirements (e.g. high fault tolerance in a public network).

There are many considerations when choosing a consensus algorithm including:

Type of DLT network: Public, consortium (multi-organization), or private (single organization)
Fault tolerance
Performance
Finality

Popular Consensus Algorithms

Byzantine Fault Tolerance

Before going into different algorithm families, it’s worth taking a moment to understand what Byzantine failures mean. A Byzantine failure places no restrictions on how a node on the network can “fail”:

Crashed, taken offline (planned or unplanned)
Loss of network connectivity or network delays
Pretending to be good, but actually producing bogus data
Basically anything goes!

Byzantine failure is the most difficult condition to solve for in a highly decentralized network. Algorithms that attempt to solve all aspects of a Byzantine failure are called Byzantine Fault Tolerant (BFT). Given the complexity of Byzantine failures, some algorithms focus only on ensuring consensus can be achieved in cases where nodes go offline — these algorithms are called Crash Fault Tolerant (CFT).

Proof-of-Work

Proof-of-Work is arguably the only BFT algorithm demonstrated to work at planetary scale. In Proof-of-Work (PoW), any node that wants to append transactions (typically packaged in blocks) to the ledger must solve very difficult — even for a computer — puzzles; the solving of these puzzles is also referred to as mining. Naturally, a node that is mining is called a miner. At a high level, a PoW-based consensus works as follows:

Each miner determines the block of transactions it wants to submit for appending to the ledger. A puzzle is automatically created based on this block and the current state of the ledger (from the miner’s point of view). Note that this puzzle is unique to each block.
As soon as a miner finds a solution to the puzzle for its block, it broadcasts that block along with the solution to all other nodes.
Other nodes may accept the block by verifying that the puzzle solution is indeed correct and all transactions in the block are valid.
Once a node accepts a block, it appends the block to its ledger and the cycle starts again for a new block of transactions..
In the case of any conflicts, the ledger with the most number of blocks is deemed the correct ledger.

The main idea behind PoW is to discourage cheating by ensuring the effort needed to solve the puzzle is disproportionately high compared to verifying the solution to the puzzle. In other words, bad actors on the network will need to commit significant resources to mining for a chance to have their block of transactions accepted by other nodes. The more resources a node commits, the higher the probability the node will get its block accepted by the ledger — it’s somewhat like playing the lottery.

PoW are effective in public networks where participating nodes cannot fully trust each other. There are many PoW algorithms, but the most famous comes from Bitcoin. By design, PoW is slow and requires expending a non-trivial amount of resources, be it compute or something else, to participate.

Lastly, PoW consensus are vulnerable to so-called “51% attacks” in which a group of bad actors is able to control sufficient mining power to introduce blocks with fraudulent transactions. As long as an adversary controls less than half of the total mining power present in the network, PoW consensus prevents the adversary from creating new blocks faster than honest participants. Finality for PoW is probabilistic in that the likelihood of accepted transactions being changed or reversed decreases as more and more blocks are appended after them.

Proof-of-Stake

Proof-of-Stake (PoS) is another lottery-based algorithm family. Instead of miners expending computational resources in order to participate in the consensus process, a node is randomly selected with a probability proportional to the stake that node possesses according to the current ledger. The node that is selected is called a minter. A stake may be denominated in any asset or currency widely accepted on the network as having significant value; for example, in Ethereum a stake is denominated in ether.

Like PoW, PoS are popular in public networks. There are also many PoS algorithms, each with a variation on these two facets:

Mechanism for random selection of minters
Block minting: whether minters are directly minting blocks of transactions or whether they are simply proposing blocks and other stakeholders of the network will need to vote on the proposed block before it will be accepted on the ledger.

A fair random selection of minters is fundamental to PoS. Early implementations of PoS used stake as the primary selection factor which led to the Matthew Effect whereby nodes with outsized holdings on the network could prioritize transactions that were advantageous to them.

Finality is probabilistic for PoS algorithms that allow minters to directly mint blocks. PoS algorithms in which minters only propose blocks can have deterministic finality; the subsequent voting by other stakeholders on the network will “seal” the block.

Practical Byzantine Fault Tolerance (PBFT)

Outside of PoW and PoS, of particular significance is the work from MIT researchers which led to the Practical Byzantine Fault Tolerance (PBFT) algorithm. PBFT was important in that it spawned a whole new family of protocols.

The main ideas behind PBFT are:

Nodes are sequentially ordered from 0 to n-1 where n is the number of nodes in the network.
All nodes are connected to each other and the network itself is assumed to be partially synchronous.
Time is divided into views and for each view, a node is designated the Leader of the network.
Each node takes turns being the Leader in a never-ending cycle, starting with the first node. For example, in a ten node network, node 0 is the leader for View 0, node 1 is the leader for View 1 and so on and so forth. When the network gets to View 10, node 0 will get another chance at being the leader again.
Transactions are sent to the Leader node which is responsible for sharing those transactions with all other nodes (called Replicas).
Every replica executes the same transactions in the same order as provided by the Leader node. After execution, at least 1/3 of the replicas must arrive at the same ledger state for the ledger to be updated.

A Leader node can be replaced during every view if:

A predefined amount of time has passed in which no communication has been received from the leader
A majority of honest nodes votes to replace the leader with the next node in line

Given n number of nodes in the network, PBFT can ensure consensus will be reached if more than two thirds of the network are honest. More precisely, PBFT can tolerate (n — 1)/3 malicious nodes.

The PBFT works efficiently when the number of nodes in the network is small. Due to the high communication overhead (remember all nodes are connected to each other and must actively communicate), performance degrades substantially as more nodes are added.

PBFT is susceptible to Sybil attacks, where one party controls many nodes. As the number of nodes increases, Sybil attacks become increasingly difficult to carry out. Unfortunately, the increase in number of nodes also means a corresponding substantial decrease in network throughput.

Descendants of PBFT include Istanbul BFT, BFT-SMaRt, HotStuff (which inspired LibraBFT) and many others. Each of these algorithms attempt to optimize PBFT by relaxing certain conditions to improve performance.

Raft

Raft also relies on a leader node to drive consensus. Time is divided into terms (similar to PBFT’s views) and for each term, a node is elected Leader of the network. A voting process is used to elect the leader; the node with a simple majority of votes will be the leader for the term. Nodes not acting as a leader are called followers.

Unlike PBFT, Raft assumes the leader node is always honest and is given full authority to process all transactions directly to determine what the resulting ledger state should be. This resulting ledger state is shared and must be accepted by all followers within a term.

Raft is deemed a crash fault tolerant (CFT) and not a BFT algorithm since it only handles crashing and manual stopping failure conditions. When a leader node crashes in Raft, the other nodes will automatically elect a new leader after a predefined timeout period. Recall that the leader election in PBFT is predictable in a round-robin format. In Raft, any node can nominate itself as a candidate and subsequently be voted by its peer for the leader role.

Raft is a viable algorithm in a private permissioned network where the identity of all participants are known and deemed trustworthy. Assuming these conditions are true, then Raft can be used to ensure the ledger for all nodes will be consistent with that of the most current elected leader node.

Consensus in Permissioned Networks

In a permissioned network, participants’ identities, and by extension, individual node identities are known. Contrary to what some may want you to believe, permissioned networks are not fully decentralized — they all rely on some degree of centralization or authoritative service to properly function.

Corda

Consensus in Corda is fairly straightforward: each transaction is routed between appropriate participants for signing and a Notary service handles the final verification to make sure there is no double-spending for the transaction. While the Notary service can optionally also validate smart contract results (i.e. validating notary), this has not been widely used due to implementation complexity.

The open source version of Corda (as of this writing, Jan 2020) only supports a single-node Notary service and despite what is suggested in the documentation, this configuration is not really suited for mission-critical Production use. On the other hand, in the enterprise version of Corda, the Notary service can be backed by multiple notary nodes for high availability. These nodes take turn in a simple round-robin fashion to service incoming requests.

Network participants generally assume the Notary service is trustworthy. Any compromise of the Notary service would ultimately lead to a complete halt of a Corda network since transactions can no longer be finalized.

Fabric

Consensus in Hyperledger Fabric is broken out into 3 phases: Proposal, Ordering, and Validation:

Proposal: each node decides whether to accept or reject a transaction as specified by an endorsement policy. Endorsement policies are written using the grammar rules:

EXPR(E[, E...]) where EXPR is AND or OR and E is a principal or nested EXPR(...)

For example, let’s say you have 4 participants: Superman, Batman, WonderWoman, and Flash. A policy definition of

AND(Superman, AND(Batman OR WonderWoman))

would mean you need a total of 2 endorsements out of 4 participants. Superman must endorse PLUS either Batman or WonderWoman. (Sorry, Flash — your opinion is not needed here :-))

Ordering: determines order of endorsed transactions within a block
Validation and Commit: takes a block of ordered transactions and validates the correctness of the results, including checking endorsement policy and double-spending.

The Ordering phase is handled by something called an Ordering Service which itself is comprised of one or more Ordering nodes (called Orderers). Since each Orderer handles only a portion of all incoming transactions, there must be a way to agree on the “global” order for transactions across all Orderers. This is where traditional consensus algorithms (covered in prior sections) come into play. Fabric ships with Raft (Apache Kafka, which implements another CFT algorithm, has been deprecated in v2.x), to ensure that transactions will be delivered (liveness guarantee) and packaged in the correct order (safety guarantee). Since these are CFT algorithms, they assume that each Orderer is trustworthy. A Byzantine fault tolerant option is planned for a future release of Fabric.

Enterprise-flavors of Ethereum

Ethereum was built from the ground up for decentralization. Nevertheless, in deployments of enterprise-flavors of Ethereum, usually only a subset of nodes are allowed to participate in consensus. These nodes (referred to as validators) collectively act as an authoritative service and are responsible for “certifying” all transactions. The validators use a CFT or BFT algorithm to reach consensus amongst themselves before propagating a block to other non-validating nodes.

The Enterprise-flavors of Ethereum include JPM Quorum, Axoni, and others that implement the Enterprise Ethereum Alliance (EEA) specification.