|6-Jan-2018||Like this? Dislike this? Let me know|
Blockchain ... it's almost too much to take on in one rant.
First and foremost, there is no single precise definition of "blockchain" like there is with, for example, the derivative of x^2 + 1. The term blockchain is now used broadly to cover a soup of approaches involving immutability, transaction management, distribution of data, payload, and consensus. Here is a sample from both ends of the spectrum:
|Concept||Bitcoin||Hyperledger (incl. many variations within)|
|Participants||Completely anonymous, known only by public key||All participants well-known and identity is vetted|
|Transactions||Accumulate in an uncommitted block (effectively a bucket). Miners attempt to find the right data (a "nonce") to add to the block that will produce a properly constructed hash or fingerprint of the block, after which the block can be committed to the chain and the result broadcast to the mining network||No mining, no nonces, and essentially no blocks. Each transaction (e.g. modification of a loan agreement, updating a current assessed value, etc.) yields a new version, which is hashed and committed to the chain|
|Consensus||Statistically driven, relying on a large number of participants. Side chains can emerge but eventually, participants add more and more blocks to one particular chain, leading to a "longest chain wins" model. Statistics suggest that after 6 blocks have been committed to a chain, the transactions within are nearly (but not 100%) guaranteed to be correct and without double-spending.||Workflow entitlements driven, relying on specific actions by specifically named participants. No consensus required although the workflow might demand two or more participants to do something before state change can take place. But this is not the same thing as law-of-large-numbers statistical consensus.|
|Distribution Model||Fully distributed data and processing running on any infrastructure from the cloud to a PC on a desktop. Many nodes in the network, each with a copy of the blockchain. Nodes broadcast changes and listen for others and each applies the same algorithms to achieve global consensus.||(Most extreme variation) Single copy of a single workflow running on infrastructure in the cloud hosted by a major company. No other nodes, no other copies (other than those manually created and maintained by participants but these are not part of the consensus / data integrity model)|
|Payload||Completely objective and context-free, the value and bookkeeping data about the bitcoin. Any party examining the payload can understand it||The digital asset is an arbitrary payload such as a loan that may have a great deal of subjective and context-sensitive data. Consistent relevance/importance and interpretation of all the data to every party involved in the workflow is highly questionable, e.g. the building inspector does not care about the LTV and toggle rate parameters of the loan -- and by extension does not want to in any way be responsible for assuring their integrity|
So... which one is correct? Both. Wikipedia summarizes blockchain thusly and I believe it is not only a fair description, but one that could be applied to both scenarios above:
A blockchain, originally block chain, is a continuously growing list of records, called blocks, which are linked and secured using cryptography. Each block typically contains a hash pointer as a link to a previous block, a timestamp and transaction data. By design, blockchains are inherently resistant to modification of the data. The Harvard Business Review describes it as "an open, distributed ledger that can record transactions between two parties efficiently and in a verifiable and permanent way." For use as a distributed ledger, a blockchain is typically managed by a peer-to-peer network collectively adhering to a protocol for validating new blocks. Once recorded, the data in any given block cannot be altered retroactively without the alteration of all subsequent blocks, which requires collusion of the network majority.
The landscape is filled with terms like "distributed ledger" and "smart contract" and different definitions have been applied to each one depending on the particular product at hand. In other words, there are many different (and useful and interesting) products and solutions performing very different workloads at different scale and performance -- and each trying to tag the solution with as many blockchain terms as possible.
Instead of adding to the mess by proclaiming another top-down definition of the blockchain as it revolutionizes yet another business use case, let's instead start fresh from the bottom up: a chain of transactions.
fingerprint1 = hash(version 1)
fingerprint2 = hash(version 2)
merged = concatenate(fingerprint1, fingerprint2)
chain_fingerprint_from_1_to_2 = hash(merged)

fingerprint2 = hash(version 2)
fingerprint3 = hash(version 3)
merged = concatenate(fingerprint2, fingerprint3)
chain_fingerprint_from_2_to_3 = hash(merged)

fingerprint3 = hash(version 3)
fingerprint4 = hash(version 4)
merged = concatenate(fingerprint3, fingerprint4)
chain_fingerprint_from_3_to_4 = hash(merged)
As a result, given a list of versions of a transaction, it is possible for anyone to "walk" the list and recalculate all the fingerprints and ensure that the recalculated data matches whatever was originally stored. Not a single byte of any of the versions can change nor can the order of the list. No secret keys are required; in fact, no keys are required at all and the process is patently transparent. It almost does not matter if a thing in fact has a "version number" as part of its data payload. It is the creation order and fingerprint chaining that is the ultimate guarantor of integrity and transaction activity over time.
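To make the "walk the list" idea concrete, here is a minimal Python sketch. It assumes SHA-256 as the hash and hex strings as the fingerprints; the names fingerprint, chain_fingerprints, and verify_chain are mine, invented for illustration, and not taken from any particular blockchain product.

```python
import hashlib
from typing import List


def fingerprint(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes (assumed hash choice)."""
    return hashlib.sha256(data).hexdigest()


def chain_fingerprints(versions: List[bytes]) -> List[str]:
    """For each adjacent pair of versions, hash the concatenation of their
    individual fingerprints, mirroring the pseudocode above."""
    fps = [fingerprint(v) for v in versions]
    return [fingerprint((a + b).encode("ascii")) for a, b in zip(fps, fps[1:])]


def verify_chain(versions: List[bytes], stored_links: List[str]) -> bool:
    """Recalculate every link and compare with what was originally stored."""
    return chain_fingerprints(versions) == stored_links


# Hypothetical payloads standing in for four versions of a transaction.
versions = [b"loan v1", b"loan v2", b"loan v3", b"loan v4"]
stored = chain_fingerprints(versions)
print(verify_chain(versions, stored))
```

Changing a byte in any version, or swapping two versions, changes at least one recalculated link, so verify_chain returns False. Note that no key of any kind appears anywhere in the code.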
As valuable and important as chain immutability is, there are several important points that should be made here:
Note: Even in a single shared central ledger design like the IBM Cloud Hyperledger, it is very likely that participants will have to make an out-of-process copy of the ledger in order to integrate the data with other systems. Long story short, there is no practical way you are going to issue a SELECT statement to join the blockchain persistor to your local database.
An important concept to understand is that you must be very careful to physically guard your private keys.
It is much, much easier to steal a private key than to computationally attack
encrypted material. This challenge has been present for more than 20 years.
In the emerging world of smart contracts, this could have devastating consequences as contracts signed by you (but not really you) automatically transfer ownership of your car to an unintended third party, which quickly sells the car for bitcoins, remaining anonymous and leaving you to deal with the new owner who can present cryptographically secure proof that he owns the asset. Because people are fallible -- much more so than strong cryptography -- clearly legal counsel will continue to be a needed profession.
In the mid 1990s, we stored smallish perl programs in a database as a BLOB. These programs exploited the compactness and "quickness" of perl to perform if/then/else logic and array and hashmap manipulation without getting buried in the rigid and unterse syntax of C++. Every night, a C++ program linked with the perl interpreter would iteratively fetch these programs based on various criteria, determine the data needs, make market and other information available to it, let it run its perl logic (which could also make use of the parent C++ program's high performance functions and, indeed, the distributed computing environment), and then save results back to the database.
Sound familiar? It is also important to note that even today most smart contract implementations have some sort of a runtime context around them. In other words, the contract software as a unit of release just "sits there"; something has to run it and bring it to life. In the example above, the parent C++ program was the execution engine that took care of this. Today, smart contracts require a similar engine that is live and sitting on top of the blockchain. The code for the smart contract is part of the data payload managed on the blockchain and enjoys the same benefits of immutability as regular "simple" data like fields of numbers and text.
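The "code as payload plus an engine to run it" arrangement can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: the contract is a string of Python rather than perl, the fingerprint is SHA-256, and the names CONTRACT_SOURCE, execute, and the context fields are invented for illustration. It is emphatically not how any real smart contract platform works, and exec on untrusted code is unsafe outside a toy.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Hex SHA-256 digest of the contract source text (assumed hash choice)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


# The contract is just data: a blob of source text that "sits there".
CONTRACT_SOURCE = """
def run(context):
    # Simple if/then/else logic over data supplied by the engine.
    if context["balance"] >= context["payment_due"]:
        return {"action": "pay", "amount": context["payment_due"]}
    return {"action": "flag_default", "amount": 0}
"""


def execute(source: str, expected_fingerprint: str, context: dict) -> dict:
    """Toy execution engine: refuse tampered code, otherwise load and run it."""
    if fingerprint(source) != expected_fingerprint:
        raise ValueError("contract source does not match its stored fingerprint")
    namespace = {}
    exec(source, namespace)  # the engine brings the stored code to life
    return namespace["run"](context)


stored_fp = fingerprint(CONTRACT_SOURCE)
print(execute(CONTRACT_SOURCE, stored_fp, {"balance": 100, "payment_due": 40}))
```

The design point is the fingerprint check: because the source is part of the fingerprinted payload, the engine can refuse to run anything that has been altered since it was committed.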
Note that you actually don't need a blockchain to make a smart contract run, but the versioning, chain immutability, and signature integration capabilities of a blockchain stack tremendously improve the robustness and integrity of the actions autoexecuted by the smart contract.
To be fair, smart contracts have a little more work to do than our 1995 version:
In the Bitcoin system, consensus is achieved through proof-of-work by a
statistically significant number of participants performing extremely
objective and clear (but time/cost expensive) operations for which there is
specific incentivization. Perfect.
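For a feel of what that "objective and clear but expensive" operation looks like, here is a toy Python sketch of the nonce hunt. It is a simplification under loud assumptions: a single SHA-256 over the block data plus the nonce, and a target of "the hex digest starts with N zeros". Bitcoin's real block header layout and difficulty target are more involved, and the names mine and is_valid are invented here.

```python
import hashlib


def block_hash(block_data: bytes, nonce: int) -> str:
    """Hash the block data together with a candidate nonce (toy scheme)."""
    return hashlib.sha256(block_data + str(nonce).encode("ascii")).hexdigest()


def is_valid(block_data: bytes, nonce: int, difficulty: int = 3) -> bool:
    """Cheap check anyone can run: does the hash start with enough zeros?"""
    return block_hash(block_data, nonce).startswith("0" * difficulty)


def mine(block_data: bytes, difficulty: int = 3) -> int:
    """Expensive search: try nonces 0, 1, 2, ... until one is valid."""
    nonce = 0
    while not is_valid(block_data, nonce, difficulty):
        nonce += 1
    return nonce


# Hypothetical block contents; a real block holds many transactions.
block = b"alice pays bob 1 coin"
winning_nonce = mine(block)
print(winning_nonce, block_hash(block, winning_nonce))
```

The asymmetry is the whole point: finding the nonce takes thousands of hashes on average even at this toy difficulty, while verifying someone else's claimed nonce takes exactly one.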
But the concept of consensus gets murkier when the problem is not statistically based and involves data much more complex than a bitcoin value. For example, consider a real estate processing blockchain involving 6 parties: the buyer, the seller, the broker, the buyer's bank, the housing inspector, and an escrow bank. There is no bitcoin-style consensus here. There are not 1000s of miners performing the same task in parallel, each trying to win the next block. Instead, there is only one of each type of participant, each with a different set of responsibilities and incentivizations. This gives rise to the following:
No participant will provide input to consensus or voting regarding authenticity and/or accuracy on data or process for which they are not incentivized (typically through monetary compensation) and for which legal backstops and risk mitigation have not been established.
This does not defeat the usefulness of the blockchain, of course, but developers of solutions must be careful when using the term "consensus." Consensus is only appropriate when 2 or more participants work in parallel and a mathematical model is employed to determine if conditions are sufficient for workflow to move forward. It is worth noting that consensus does not have to be a PhD-complex algorithm or one that demands large numbers of participants -- and in fact, many consensus models in longer cycle business-transaction workflows look very much like standard workflow approval, e.g. if a simple majority of participants at stage n say all is well, proceed to stage n+1. Or even simpler (and very common): when all participants say all is well, move to stage n+1. As such, in most business workflows, it will be necessary to clearly define the fields for which a participant has "vouching/review" responsibility. This is the next step beyond basic read/write entitlements.
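The workflow-approval flavor of consensus, together with per-field vouching responsibility, can be sketched as follows. The participant names, field names, and the VOUCHES_FOR table are hypothetical examples loosely borrowed from the real estate scenario, and the two rules shown (unanimous and simple majority) are just the two mentioned above.

```python
from typing import Dict, Tuple

# Hypothetical mapping: which fields each participant has agreed to vouch for.
VOUCHES_FOR = {
    "inspector": {"inspection_report"},
    "buyer_bank": {"ltv", "loan_rate"},
}

Votes = Dict[Tuple[str, str], bool]


def record_vote(votes: Votes, participant: str, field_name: str, ok: bool) -> None:
    """Accept a vote only for a field the participant is responsible for."""
    if field_name not in VOUCHES_FOR.get(participant, set()):
        raise PermissionError(f"{participant} does not vouch for {field_name}")
    votes[(participant, field_name)] = ok


def can_advance(votes: Votes, rule: str = "unanimous") -> bool:
    """Decide whether stage n may proceed to stage n+1 under the given rule."""
    approvals = sum(1 for ok in votes.values() if ok)
    if rule == "unanimous":
        return len(votes) > 0 and approvals == len(votes)
    if rule == "majority":
        return approvals * 2 > len(votes)
    raise ValueError(f"unknown rule: {rule}")


votes: Votes = {}
record_vote(votes, "inspector", "inspection_report", True)
record_vote(votes, "buyer_bank", "ltv", True)
record_vote(votes, "buyer_bank", "loan_rate", False)
print(can_advance(votes, "unanimous"), can_advance(votes, "majority"))
```

Asking the inspector to vote on "ltv" raises PermissionError, which is the code-level version of the point above: nobody vouches for data they are not responsible for.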
An exciting opportunity exists to hybridize single-actor workflow together with consensus via crowdsourced incentivized participation. Relatively simple but somewhat more subjective steps in a workflow could be tackled by dozens or more participants, making their responses (mean and standard deviation) more statistically relevant.