Addendum to the "Public Subnets" thread: Helpful info on subnet workings

Regarding the referenced thread, I want to share some technical explanations to help clarify the discussion, and also provide some points for consideration. I will start by explaining how and why Generation I & II blockchains support the local processing of blocks and transactions, then explain how and why the Internet Computer, as a Generation III blockchain, uses a different model, which does not require the local processing of blocks using e.g. “local nodes.” Finally, I’ll raise some technical issues that would be involved in adding that functionality for use in special cases, such as with the Network Nervous System DAO, and raise some societal, security and regulatory issues involved with making all its data, and the interactions with it, totally transparent.

Why Generation I & II blockchains involve local processing of blocks

Generation I (e.g. Bitcoin) and Generation II (e.g. Ethereum) blockchains provide a model for secure interaction that cannot work for Generation III blockchains. Their networks maintain copies of all the blocks of transactions added to the chain and processed. When a party wishes to securely interact with the blockchain, in a manner that does not involve trusting some centralized actor, such as a crypto exchange, or Infura or Alchemy, say, they must download a copy of the blocks, and re-run all the transactions locally to construct a copy of its current state. Since the blockchain protocols involved make it possible to verify that the blocks downloaded represent the correct chain, the party then knows they are interacting with a correct copy of the state, and they can also see past cryptocurrency transfers, and the results of smart contract computations, say. This enables them to safely interact, for example by creating new transactions.

A self-hosted (“decentralized”) Bitcoin wallet must download the blockchain’s past blocks, and re-run the transactions they contain to calculate its bitcoin balances and obtain the transactions sent and received. Meanwhile, those creating Web3 services using Ethereum, say, will typically build the website on a cloud account (for example one provided by Amazon Web Services) by installing a web server, database and Ethereum “local node” that downloads its blocks, and re-runs the transactions, to create a trusted local copy of its smart contracts and data that the website code can interact with. A key problem has emerged with this model over time, which is that as these blockchains grow, it has become more and more expensive to download the blocks and re-run their transactions.
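To make the mechanics concrete, here is a minimal sketch in Python of what a Generation I/II “local node” effectively does: replay every historical transaction to rebuild current balances. The transaction format is an illustrative assumption; real clients also verify signatures, proof-of-work or stake, scripts, and much more.

```python
# A toy replay of a Generation I/II chain: rebuild balances by re-running every
# historical transaction. Transaction fields are simplified for illustration;
# real clients also verify signatures, proof-of-work/stake, scripts, etc.
from collections import defaultdict

def replay(blocks):
    """Derive current balances by applying every transaction since genesis."""
    balances = defaultdict(int)
    for block in blocks:                 # every block ever produced...
        for tx in block:                 # ...and every transaction it contains
            balances[tx["from"]] -= tx["amount"]
            balances[tx["to"]] += tx["amount"]
    return dict(balances)

chain = [
    [{"from": "coinbase", "to": "alice", "amount": 50}],
    [{"from": "alice", "to": "bob", "amount": 20}],
]
print(replay(chain))   # the wallet only trusts the state it derived itself
```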

As early as 2013, I can remember my self-hosted Bitcoin wallet taking hours to initialize, and today Ethereum local nodes can often take days to initialize when started, even when running on a very powerful computer. As a result of this key problem, few people now use traditional self-hosted Bitcoin wallets, and when they do, they choose a self-hosted wallet that relies on recent state checkpoints, which involves trusting the checkpoint creators, or worse, and much more commonly, they just keep their bitcoin on a centralized crypto exchange like FTX. Meanwhile, the vast majority of Ethereum developers now simply build blockchain websites using code that interacts with centralized blockchain infrastructure services such as Infura and Alchemy, choosing to trust them as a tradeoff against the cost of running a local node, hoping that those services have not been hacked and aren’t malicious, and also trusting the clouds that these services themselves run on.

For this and other similar reasons, the majority of participants in Generation I and II ecosystems no longer verify their interactions with their blockchains in a trustless and decentralized way, and instead rely on trusting centralized actors. This not only reduces their security and resilience (for example, Infura has suffered outages because Amazon Web Services has gone down), but also lays them open to being disrupted, for example by regulators who impose conditions upon those centralized actors. This means the architecture that has become the status quo is far from desirable, and those building Generation III blockchains need to leave this old model behind. However, the referenced thread discusses adding support for this traditional model, which is not currently supported by the Internet Computer, for use in specific cases.

Why Generation III blockchains cannot easily use the old model

Because Generation III blockchains (today, only the Internet Computer) must operate at massive scale, they can’t easily use the old model. Ethereum only processes a handful of transactions a second, while the Internet Computer was recently processing more than ten thousand transactions a second, and we hope that one day it will process many millions or billions of transactions a second. Indeed, the network has already processed more than 1.5 billion blocks, and one day will have processed quadrillions of blocks. It is hard to see how developers could run local nodes that could download such large numbers of blocks and replay their transactions economically. Moreover, it would be extremely expensive for the blockchain to store every past block for later download by local nodes.

Another reason why the old model isn’t generally suitable is that Generation III blockchains must host smart contracts that can directly and securely serve content to end-users, including transmitting content into web browsers, whose pages then directly interact with hosted smart contracts, closing the loop. This is the only way to obtain the genuine end-to-end decentralization required for security, liveness and censorship resistance, and to allow Web3 services to run fully on-chain under the absolute control of community DAOs, fulfilling a key Web3 promise. Even if some technically feasible way to transparently embed local nodes inside web browsers were found, perhaps by making the local nodes more efficient using zero knowledge proofs, say, so that direct interaction could be enabled using a hybrid of the old model, it’s doubtful anyone would want this!

Forming a blockchain from “subnet blockchains”

The Internet Computer doesn’t need the old model, but understanding why requires understanding how it is formed from “subnet blockchains”, and how they work. Each subnet provides the blockchain with additional capacity for hosting smart contracts, allowing it to scale as required. The entire network is directly controlled and managed by a permissionless governance DAO called the “Network Nervous System” (NNS). This forms new subnets from node machines, which are special hardware devices owned and operated by independent “node providers” who install them in independent traditional data centers located in a variety of geographies and jurisdictions around the world.

The NNS creates new subnets very deliberately, by instructing sets of nodes in the Internet Computer network to combine. The sets chosen are sufficiently decentralized, when considering their node providers, data centers and locations, that a) when combined by the network protocols involved they can provide the security and liveness guarantees expected of a blockchain, while b) at the same time the minimum possible number of nodes is used, reducing the replication of data and computations and thus the cost of running smart contract software. There are different types of subnet, with different replication levels, which host smart contracts with different guarantees and different operational costs. This approach is unique to the Internet Computer and is called “deterministic decentralization.”

The NNS lives on its own subnet, which was, naturally, the first subnet created at mainnet Genesis, since it was responsible for creating every other subnet that exists. A game-changing innovation is that each subnet blockchain has its own unique “chain key,” which it uses to sign its interactions with users and other subnets, and the public chain key never changes, even as the subnet has nodes added and removed. The chain key of the subnet that hosts the NNS serves as the Internet Computer network’s “master chain key,” which like other chain keys never changes, and the NNS uses this master key to sign each new subnet’s chain key, so it can operate as part of the network.

How subnets maintain and use their chain keys

Chain Key Crypto is the name for the special protocol math and cryptography at the heart of the Internet Computer blockchain that makes it possible for subnets to have chain keys. A chain key is produced using threshold cryptography schemes, which involve one public key, and lots of individual “private key shares” that are held by the different nodes in the subnet. This might sound simple, but it’s not: the complexity is in creating the private key shares in the first place, using an NIDKG (Non-interactive Distributed Key Generation) protocol, and key resharing protocols, which allow the subnet to have nodes added and removed without causing a change to its public chain key. The resharing process in fact runs constantly, with the aim of defeating “adaptive adversaries” that wish to incrementally steal a consistent threshold of private key shares from nodes, by constantly remaking the shares.
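For intuition, here is a minimal sketch of the threshold property itself, using plain Shamir secret sharing over a prime field. This is an illustration only: the Internet Computer actually uses BLS threshold signatures together with NIDKG and resharing, none of which is implemented here.

```python
# A toy illustration of the threshold idea behind chain keys: a secret is split
# into n shares so that any t of them can reconstruct it, but fewer reveal nothing.
# This is plain Shamir secret sharing; the real protocol uses BLS threshold
# signatures plus NIDKG/resharing, which this sketch does not implement.
import random

P = 2**127 - 1  # a prime modulus for the toy field

def make_shares(secret, t, n):
    """Split `secret` into n shares, any t of which reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret from >= t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = make_shares(secret=123456789, t=3, n=5)
assert reconstruct(shares[:3]) == 123456789   # any 3 of the 5 shares suffice
```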

So long as a threshold number (i.e. a subset of a certain size) of the nodes are operating correctly, the subnet can create signatures using its chain key. This is important for the functioning of the blockchain consensus protocols, and also for signing the output of consensus, including the results of newly processed update call TXs, and pre-signing (pre-finalizing) a Merkle root of certified query call TX results, which can then be returned without going through consensus. Threshold signing is Byzantine fault tolerant, which means that faulty (i.e. arbitrarily bad) nodes cannot stop a threshold of correct nodes signing, a property that is also leveraged by the blockchain protocols in the way they ensure that subnets are Byzantine fault tolerant (i.e. so they continue operating without any corruption to data or function even when a portion of their nodes are arbitrarily faulty).
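As a rough illustration of how many query results can be covered by a single threshold signature, here is a toy Merkle root construction. The pairwise SHA-256 layout is an assumption for illustration, not the Internet Computer’s actual certification format.

```python
# A minimal sketch of committing a batch of query results to one Merkle root that
# the subnet could then threshold-sign once. The tree layout (simple pairwise
# SHA-256) is an illustrative assumption, not the real certification format.
import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold the leaf hashes pairwise until a single root remains."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])          # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

results = [b"result-of-query-1", b"result-of-query-2", b"result-of-query-3"]
root = merkle_root(results)
# The subnet would sign `root`; a client holding one result plus a Merkle path to
# `root` can then verify that result without re-running any transactions.
```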

Enabling scaling and direct interaction using chain keys

Given the current status quo, it is easy to imagine that a blockchain by its nature must allow users to download past blocks to verify its state, which leads to a misconception that this is a necessary part of blockchain security. However, the requirement is really just that a blockchain provides a tamperproof and unstoppable platform that supports autonomy and is trustless because no centralized actor can do anything other than submit legal transactions to create updates to the state. While past practice makes it seem necessary to download the blocks and replay the transactions to reconstruct the current state, it is possible to devise other systems, proven equally unbreakable by mathematics, in which cryptography takes care of the verification. This is what Chain Key Crypto protocols do for the Internet Computer.

The availability of chain keys is core to how this is done. When your software interacts with the Internet Computer, the results returned are signed by the chain key of the subnet hosting the smart contract software involved, which key in turn is signed by the blockchain’s master key. Thanks to the way the protocol math works, if the chain key signature validates, this not only tells you that your interaction has not been tampered with, but also that the subnet blockchain returning the result is running correctly and that neither its state nor its computations have been tampered with (which could only be achieved by corrupting the subnet blockchain involved, for example by overwhelming the fault bounds of its Byzantine Fault Tolerant protocols). This means that it is not necessary for you to download its blocks and re-run the transactions so you can be sure of what is happening.
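Here is a minimal sketch of the two-level check a client effectively relies on. Ed25519 (via the Python cryptography package) stands in for the BLS threshold signatures actually used, and the message layouts are illustrative assumptions; it is not the real certificate format.

```python
# Sketch of the two-level verification: the master key certifies the subnet's
# chain key, and the chain key certifies the response. Ed25519 is a stand-in for
# BLS threshold signatures; message layouts are illustrative assumptions only.
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey)
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

# -- setup (normally done by the NNS and the subnet, never by the client) --
master_sk = Ed25519PrivateKey.generate()
subnet_sk = Ed25519PrivateKey.generate()                 # the subnet's "chain key"
subnet_pk_bytes = subnet_sk.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
delegation = master_sk.sign(subnet_pk_bytes)             # master key signs the chain key

response = b"certified reply to the user's call"
response_sig = subnet_sk.sign(response)                  # subnet signs the response

# -- what the client checks: two signatures, no block downloads --
master_sk.public_key().verify(delegation, subnet_pk_bytes)   # raises if the chain key is bogus
subnet_pk = Ed25519PublicKey.from_public_bytes(subnet_pk_bytes)
subnet_pk.verify(response_sig, response)                     # raises if the reply was tampered with
print("response verified against the master key")
```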

Chain key signing thus makes three wonderful things possible. Firstly, when two smart contracts hosted on two different subnets interact, the subnets can securely pass the traffic involved without needing to download and process each other’s blocks, since they can simply check the chain key signatures on the traffic instead to know that a) the traffic has not been tampered with, and b) the sender is also running correctly. This means that one unified blockchain environment can be created by combining any number of subnets, and that capacity can be scaled by adding new subnets. Secondly, user software, such as JavaScript in a web page, can securely and directly interact with smart contracts hosted on the Internet Computer by checking the chain key signatures on results, and indeed, the assets composing such web pages can also be signed and directly served by smart contracts. Thirdly, subnet blockchains can discard blocks they have already processed when the protocol no longer needs them, saving node machines from having to store them – which might quickly exhaust their memory if they are processing a block a second, say.

In summary, a blockchain is defined by the properties it provides to hosted ledgers and smart contracts, such as making them unhackable (tamperproof), unstoppable (always keeping them live) and supporting autonomy (i.e. tokens and smart contracts that exist independently of any centralized party, which provides for sub-properties, such as being censorship resistant). It is the provision of such properties that defines what a blockchain is, not the technical mechanisms used to create a blockchain network. This means that the Generation I/II model, in which participants locally download and process copies of a network’s blocks to secure interaction and ensure copies of the “correct” chain are maintained, is just one possible technical solution. What matters are the provable mathematics of the protocols involved, and the associated security analyses, which show that the required properties are provided. Chain Key Crypto, while undeniably complex, meets these requirements while allowing for the production of Generation III blockchains that can play the role of a World Computer.

Thinking more about what have been called “public subnets” in the referenced thread

Firstly, it’s worth mentioning that things went a little awry because of the nomenclature used by the topic, which talked about “public subnets,” implying that today’s subnets are private, or something, when in fact they are all public, can be accessed by anyone, and run under the control of the permissionless Network Nervous System DAO. What was meant was that today’s subnets do not allow arbitrary access to their current state, and that smart contracts can keep their data private (i.e. your only route to the data hidden inside a smart contract you do not control is through the logic it makes available to you). To gain “unauthorized” access to such smart contract data you need to obtain physical access to a node machine in the subnet that hosts it (and misusing such physical access will soon become harder when node machines switch on their SEV-SNP hardware privacy technology). Furthermore, subnets do not provide access to the previously processed blocks of transactions that created the current state.

The referenced post was only concerned with providing access to smart contract state, and the transaction history that created that state, for specific subnets, such as the subnet that hosts the NNS. Meanwhile, the use of the term “public subnet,” which implied existing subnets are not public, created rather a lot of discussion. Regarding the proposed changes, I do not personally have super strong opinions one way or the other at the moment, but I will highlight some important things to think about.

The technical challenges involved

  1. The NNS subnet is processing around 1 block of transactions a second. Should it make those blocks available for download, so that people could try to verify its state using the Generation I/II “local node” model, then each participant running some kind of local node would have to download and process 86,400 blocks a day (see the back-of-envelope sketch after this list). Depending upon how many people wished to do that, an enormous bandwidth overhead would be created, which would be expensive, and it’s not clear how that could be measured and node providers compensated via the protocol or NNS. No doubt it could be worked out, but the network would still have to bear a very substantial additional expense.

  2. Thanks to Chain Key Crypto math, Internet Computer subnets do not need to keep old blocks around for long, which avoids filling up the memory of the node machines unnecessarily. With some modifications, however, nodes could keep, say, the last 3,000 blocks around (50 minutes worth), which would allow those running “local nodes” to resync if they get disconnected momentarily. Every sync/resync would be very expensive though: a) first a syncing “local node” would have to begin downloading a checkpoint “snapshot” of the entire state, while storing every streamed block thereafter, then applying the collected blocks to the fully downloaded snapshot state to catch up, and b) any “local node” that got disconnected for more than 50 minutes would have to restart this expensive procedure. This magnifies the bandwidth expense involved in streaming the blocks, and might create a DDoS vulnerability that attackers could go after. Moreover, even if only one snapshot were created an hour (and the previous one discarded), this would halve the amount of state that could be stored on a subnet.

  3. One of the most amazing things about the Internet Computer is that it is self-updating. That is, a proposal can be submitted to the NNS DAO to update the protocol using a new binary image produced by some referenced source code, and if the proposal is adopted, the image is then used to update the node machines in the network – entirely automatically. The problem here is that the meaning and processing of blocks can change as the protocol design and implementation evolves. This means it would be impossible (without insane engineering effort) to create a “local node” system that could record a state snapshot today, say, and then every block after, for weeks, months or years, that would be sufficient to re-create the current state, since the logic would have to change whenever a block height is hit where a protocol upgrade occurred. Therefore, syncing a “local node” will always involve downloading a snapshot that is less than an hour old, and applying subsequent blocks – which I feel isn’t what people really want here.
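As referenced in point 1 above, here is a quick back-of-envelope calculation. The one-block-per-second rate comes from this post; the block size and the number of “local nodes” are purely illustrative assumptions.

```python
# Back-of-envelope numbers for point 1 above. The block rate comes from the post;
# the average block size and the number of syncing "local nodes" are assumptions.
BLOCKS_PER_DAY = 60 * 60 * 24            # 86,400 blocks at one block per second
ASSUMED_BLOCK_SIZE_MB = 1.0              # assumption: average block size
ASSUMED_LOCAL_NODES = 500                # assumption: parties syncing independently

per_node_gb_per_day = BLOCKS_PER_DAY * ASSUMED_BLOCK_SIZE_MB / 1024
total_tb_per_day = per_node_gb_per_day * ASSUMED_LOCAL_NODES / 1024
print(f"{BLOCKS_PER_DAY} blocks/day, ~{per_node_gb_per_day:.0f} GB per local node,"
      f" ~{total_tb_per_day:.1f} TB/day served by the subnet")
```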

Questions about “opening” the NNS

The essential idea is that by creating a new kind of subnet, people will be able to see the insides of the Network Nervous System/NNS, and every transaction (smart contract function call) that updates it.

  1. Given the aforementioned technical challenges involved in creating a new kind of subnet that can support “local nodes”, if it were decided that the insides of the NNS must be made available to all, then there is an easier way forwards that involves a tiny fraction of the work, which is simply to add functions to the NNS’s smart contracts to allow people to obtain any information inside that they want. Of course, alone this would not reveal the transactions (i.e. how users have interacted with the NNS) that created the information being made available. However, with some rather more substantial work, we could also add logging to the NNS, so that anything and everything anyone did within the NNS, such as staking a neuron, configuring a follow, or voting, would be maintained, say for a couple of days, and also made available. Because the Internet Computer’s Chain Key Crypto protocols create subnet blockchains that are tamperproof, there would be no risk that information returned through such APIs had been modified by malicious actors, and we could securely make everything available to anyone.

  2. We must ask serious questions about whether doxxing people’s NNS interactions and data would really contribute to a healthy democracy. For example, there are reasons that when you go to a polling booth in an election, your vote is anonymous and private. Arguably, for similar reasons, how much you’ve got staked in neurons, and for how long, who you follow, how you vote on motions, and the links to e.g. your balances of ICP and other governance tokens, should also be kept private. Our community is not immune to toxicity, and those with substantial voting power might find themselves being pressured by zealots keen to get their way who wish them to change their neuron follows and/or vote for/against specific proposals. Moreover, they might even be blamed and persecuted for how they voted in the past. The danger is that quiet and reasonable people might be forced out of the governance community, and the useful inputs they provide through voting would be lost.

  3. Doxxing people’s NNS interactions and data could cause both security and regulatory risks. Firstly, it would provide attackers with accurate information on exactly who they had to nobble, blackmail or extort, to push through proposals that further their goals. Where gangsters are involved, and sadly nowhere is completely free of dangerous actors, participants in the governance community could find themselves in horrible and dicey situations, and bad actors might cause some real harm to the network by pushing through bad proposals, for example to profit from shorts on ICP. Secondly, aggressive regulators out to make a name for themselves by harming the Internet Computer ecosystem might seek to hold participants in the governance community responsible for the adoption of proposals that they do not like, according to how they voted. Thus arguably, revealing this information could create serious risks for both individuals and the network.

Summary

As is often the case in blockchain, unpacking problems often reveals them to be more complex than they seem at first sight. It is necessary to understand how the technology works in detail, and how it might be modified, and to consider the potential regulatory, security, game theoretic, tokenomic, socionomic and economic implications. The potential R&D costs and distraction involved with making NNS interactions and data transparent, should we wish to do that, also have to be weighed against the need to ship and polish many other things, such as the SNS functionality, or Ethereum chain key integration.

A last comment is that we can more easily travel in the direction discussed in the referenced thread with the governance token ledgers. Currently, they are implemented in the mode of a blockchain-within-a-blockchain. That is, transactions are recorded in a hashed chain in which each transaction forms one block that lives on the Internet Computer. This, for example, is what enables crypto exchanges to interact with the ICP ledger via the Rosetta API, and meet regulatory demands that require them to have knowledge of every transaction. We could look at ways to modify this so that the signatures on interactions that ultimately caused a transaction are stored with the transaction. This would make it possible to re-run every transaction on the ledger, and prove that some user did not transfer some tokens that they owned, and therefore should still possess them, independently of what the Chain Key Crypto protocol of the subnet hosting the ledger says. Note that even here there are non-obvious challenges though – what would be the situation if the user had originally received the balance in question via a transfer made by an autonomous smart contract invoked by a blockchain heartbeat!? The tl;dr is that leaning on the hard math of Chain Key Crypto is much easier.
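For illustration, here is a toy hash chain in the style described above, where each ledger transaction forms its own block committing to its parent. The field names are assumptions; the real ICP ledger and Rosetta block format differ in detail.

```python
# A toy hash-chained ledger: every transaction is its own "block" that commits to
# the previous one, so tampering anywhere breaks every later link. Field names are
# illustrative; the real ICP ledger/Rosetta format differs in detail.
import hashlib, json

def append_tx(chain, tx):
    parent = chain[-1]["hash"] if chain else "genesis"
    block = {"parent": parent, "tx": tx}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    chain.append(block)

def verify_chain(chain):
    """Re-derive every hash and parent link; any tampering breaks the chain."""
    prev = "genesis"
    for block in chain:
        body = {"parent": block["parent"], "tx": block["tx"]}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if block["parent"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True

ledger = []
append_tx(ledger, {"from": "alice", "to": "bob", "amount": 10})
append_tx(ledger, {"from": "bob", "to": "carol", "amount": 4})
assert verify_chain(ledger)
```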

20 Likes

Thank you, Dom. I have to admit that every time I read your writing I learn a few new things about the IC, and why some decisions were made from a technological point of view.

1 Like

I felt like a first grader reading this. Surely there is a lot I have to catch up on :pray::100: Thanks for the in-depth response regarding network systems. It sure does clear the air on some misconceptions that were going around.

1 Like

I had to have my phone read this to me. So much to unpack there. So much to learn.

We well understand the privacy concern with opening the NNS subnets. But shouldn’t that be addressed in a cryptographic way, such as zero-knowledge proofs or ring signatures? Hiding the data so that only insiders can peek discriminates against regular end users while privileging the insiders.

2 Likes

Yes, I have long believed that we should probably make neuron follow relationships and voting inaccessible to even those who had physical access to the node machines involved (and, since the node machines will eventually, hopefully soon, turn on SEV-SNP hardware privacy protection, even to those who have physical access and can additionally break that protection).

The origins of the Network Nervous System can be traced to The DAO fiasco and hack on the Ethereum network in the summer of 2016. Firstly, this revealed the need for better DAO designs, which didn’t depend on a cabal of high-profile human “curators” to make decisions and used mechanisms that couldn’t be so easily gamed (although in the end, The DAO was exploited by a simple reentrancy flaw in its Solidity code, rather than through its vulnerability to nefarious voting strategies, and other game theoretic vulnerabilities that I was worried about). Secondly, because the fiasco resulted in a hard fork of the Ethereum network, it revealed the need for a blockchain network that could be easily and regularly updated by a DAO, rather than by a cabal of blockchain insiders orchestrating a hard fork among the miners, with all of the political, ethical and security issues, and sheer friction, that hard forks entail. Although I had been working on how to build a World Computer, funding limitations at the time meant that my early ambitions were pretty much limited to building a sister version of Ethereum that ran under the control of a DAO, with some limited improvements to consensus.

During 2016, I created a simple proof-of-concept implementation of such a DAO, which was really just a privileged smart contract that had access to special Ethereum op codes invented for the purpose. Anyway, if you look at my early proposals for a “Blockchain Nervous System” from January 2017, you’ll see that I proposed storing neuron follow relationships on user devices, in client software that would run e.g. on the phones of neuron owners. This provided a simple way of making the neuron follow graph completely inaccessible to all, and I also proposed introducing random time delays during automatic voting to disguise the follow graph from those temporally tracking voting activity (remember that on a simple Ethereum 1.0-like blockchain, every individual transaction can be downloaded).

Of course, today, almost 6 years later, many aspects of how I would wish governance to work have changed from the original vision, but I personally still believe that there is value in making the follow graph and voting completely private using cryptography. Voting could perhaps be disguised using cryptographic blinding. Disguising the follow graph would be more challenging using cryptography, although I’m sure DFINITY’s cryptographers could find ways of doing that if set to the task, and if not, potentially we could revert to my original shortcut – which involves the creation of special NNS client software to hold neuron follows, which would run on phones and laptops.

A basic case for privacy was expressed in that original post: “One dimension of security we did not mention in the foregoing is privacy. The neurons are managed at the edge of the network for a reason — at a fundamental level, it makes the follow graph and decision process unknowable, since it will be impossible for an adversary to collect the state of the thousands of privately operated client instances distributed around the Internet. This prevents an adversary targeting critical nodes in the graph, for example via extortion or kidnapping, to increase his chances of having proposals adopted. Furthermore, it also means that such critical nodes cannot be held accountable for decisions, for example by angry owners of systems that are frozen, or governments or agencies that believe legal liabilities should stem from decision making.”

One final note. When describing the need for privacy in that original post, I said I wished to protect neuron owners from “angry owners of systems that are frozen.” This should not be interpreted as a wish to support automated governance that generally censors smart contracts for political, competitive or other reasons (i.e. to advance the agendas of a majority whose voting might control governance). I was talking specifically about the presence of “assassination markets” and “ISIS slave markets.” Currently, the NNS does not perform any such censorship of smart contracts, and such nefarious and highly illegal systems are currently simply blocked by the operators of boundary nodes when discovered (and in the future, it will be for each boundary node operator to decide on their own block list to protect themselves from the consequences of forwarding such traffic).

I stand by the opinion that aberrations such as assassination markets and ISIS slave markets have no place on decentralized blockchains, whose aim is to enrich and improve the human condition, and one way or another, we must develop acceptable ways of preventing them running on World Computer blockchains, which by their nature are vastly more powerful than traditional blockchains. The purpose of having boundary node operators maintain block lists is to decentralize censorship to those who will bear the practical consequences of forwarding such traffic (typically, a data center will disable a boundary node machine once it is shown to be serving illegal content after a complaint from law enforcement, say, unless remedial action is quickly taken). Hat tip to everyone who advocated for this approach in earlier community discussions.

4 Likes

But doesn’t a subnet signing with chain-key just prove that the subnet nodes came to consensus on the blocks/state, not necessarily that the computation was performed correctly? How does a chain-key signature prove that the subnet wasn’t overrun by malicious nodes and signed invalid blocks/state transitions? The only way to prove correct computation is to run a node yourself and download/run all the blocks and computation yourself (or generate zk proofs of the computation).

7 Likes

@JaMarco, But why should this downloading of all prior transactions/blocks always be necessary to prove state if state is always being confirmed by consensus too?

It’s the same concept as an audit of a company’s trial balance, which is what financial statements are derived from. When financial auditors come in to audit the 2022 year of a company like GM, formed in 1908, they don’t start by “proving the calculation” of its trial balance at Dec. 31, 2022 – i.e., by recalculating all 114+ years of its general ledger transactions for the original company and its thousands of acquired and sold companies since inception. No one does that, nor will anyone ever do that again, even if they wanted to.

That said, it would not hurt to have the equivalent of the Wayback Machine, which tracks prior versions of Internet pages. Someone could store the “since inception” blockchain offline to analyze (and potentially investigate) it in great detail for any anomalies or lack of 100% consensus by the “node auditors”. That could at least provide an additional level of confidence, even if it is not a real-time internal control.

Because consensus doesn’t deterministically prove computation is correct, it just proves that the majority of nodes agreed it’s correct. If that majority of nodes are malicious they could “agree” on invalid computation/state transitions.

3 Likes

Sure, just like any other blockchain, which could theoretically become compromised, either via incorrect calculations or human intervention. This is why I suggested a type of “Wayback Machine” to at least detect this retroactively, even though it would not prevent such a compromise in real-time.

Either type of compromise could even happen on the Ethereum blockchain, though the latter (human intervention) would be more likely. The majority of Ethereum nodes are controlled by only a handful of EVM node providers and a couple of major hosting providers. Even without coercive force upon the few individuals necessary to take over the majority of the Ethereum blockchain nodes, all it would take is a billionaire with money to burn. Based on my understanding, that billionaire could simply spin up enough nodes to change the state of the Ethereum blockchain, since it is based on permissionless node providers (unlike the IC, which requires NNS permission and approved hardware to join).

However, as we have seen, a blockchain compromise could happen far more easily than in that scenario. Ethereum, as the largest smart-contract blockchain in the world, has essentially been compromised already. The censorship power of a single, bugger-eating bureaucrat waving around a little memo was all it took to threaten node providers enough that they were willing to shut down Tornado Cash and a whole list of censored ETH addresses. So is any smart contract blockchain truly “unstoppable” or “non-censorable” today? Not likely.

EDIT: One more quick point. What I like about the IC node providers being restricted by NNS approval and consistent hardware standards is that 100% of all divergences from majority consensus can be easily investigated, as they have been in the past (per DFINITY R&D folks). There have been almost no divergences at all since genesis, let alone any that would come close to threatening a majority consensus. The few that did occur were all related to bugs or similar software/hardware upgrade issues, again per DFINITY R&D. Such an investigation would not be possible or even legally feasible to conduct upon all the diverging Ethereum node providers. Think about it.

1 Like

This is my biggest gripe as well…

Just being able to verify that the IC performed and signed a computation does NOT necessarily guarantee that a computation was performed correctly. That is why all other blockchains allow you to download and verify their entire state and state mutations.

1 Like

Stepping back for a moment from the specific workings of different blockchains, we can generalize, and say that all good blockchains are versions of Byzantine fault tolerant (BFT) networks. That is, they use protocol math that guarantees that so long as the proportion of “faulty” nodes (i.e. nodes that can behave arbitrarily, including going offline, corrupting data, subverting the protocol, and colluding with other faulty nodes in any way they choose) stays beneath a defined “fault bound,” the network will continue running correctly.

What we care about is the probability that liveness might be lost, or that an adversary might cause some invalid state transition. We also care about the presence of vulnerabilities to specific kinds of attack, such as DDoS, which might cause the network to lose liveness, say. To understand the probabilities, we must mathematically analyze the protocols, and also analyze the probability of the fault bounds being exceeded given the specific network model being used. We have to be careful here, because the mathematical analysis of BFT protocols, cryptography, and aspects of complex network designs, and reasoning about fault bounds, is difficult, which has led to the existence of simple blockchain design rubrics that are easy to understand, but oftentimes become misunderstood as necessary and immutable laws of design. The Internet Computer community often bumps up against these, because it takes a different approach to nearly all aspects of blockchain design.

As I mentioned in earlier posts in this thread, we must define “blockchain” by the properties it can provide as a platform, such as being tamperproof, being unstoppable, and supporting autonomy, not by specific modalities of design, including whether participants can themselves verify signatures on the transactions processed, to add security. What matters is that the blockchain provides guarantees that an invalid transaction isn’t processed, say, which requires more sophisticated analysis. The Internet Computer is a Generation III blockchain, which does not work like earlier blockchains in nearly any way, but it provides the key properties that define blockchain extremely well.

Regarding the subnet blockchain design used, I think what you allude to is that if a large portion of a subnet blockchain’s nodes were faulty (i.e. controlled by an adversary), then they could in principle modify the subnet’s state without being detected. This is correct in theory – but what matters with any theoretical possibility is not just its potential impact, but the probability that it could occur. Also, what we discuss here is just one of many failure modes that can be experienced once a blockchain network’s fault bounds are exceeded and its Byzantine fault tolerance stops working. For example, although blockchain networks incorporating vast numbers of validators that check the legality of every transaction and state transition prevent unobserved bad behavior, they are often much more susceptible to adversaries gaining control of their networks, and should that occur, the validators performing the checks cannot also prevent the creation of new branches that reorder the transactions, perhaps allowing the adversary to double-spend a large token transfer, or wreak some havoc within DeFi in a way that profits them. Even when such a network’s fault bounds are not exceeded, simple design weaknesses can allow for this problem or that, as we see with miner extractable value (MEV) on Ethereum. The importance of any potential issue is, roughly, the product of the impact it would cause and the likelihood it will occur.

The question we must ask, then, is what is the likelihood that a “fiduciary subnet” on the Internet Computer, say, gets so corrupted that it can whir away doing illegal state transitions unnoticed. Such a subnet is currently composed of a minimum of 34 node machines, although the protocols used actually scale very nicely, and allow subnets to be constructed from hundreds of nodes if desirable. The nodes involved are selected by the NNS using a system of deterministic decentralization. This means that they are owned and operated by independent node providers, and installed in different data centers in different geographies and jurisdictions. This is very different to randomly selecting anonymous nodes in a typical Proof-of-Stake network, within which an adversary might be running a large number of nodes – such that if the random selection of a group is unlucky, nearly all the nodes might belong to the adversary. Moreover, an adaptive adversary will find it very difficult to corrupt honest nodes (or more specifically, their owners). Node providers are not anonymous, and if they maliciously collude and are found out, they might be held accountable for their actions, and any node provider approached by the corrupting adaptive adversary might only pretend to be corruptible, continuing in the mode of a mole to expose their operation. Further, where node providers are companies, numerous people within their organizations might choose to blow the whistle on suspicious activity.

Internet Computer subnets are 3f+1 Byzantine fault tolerant (which, according to simple math, is the best you can do in an asynchronous network), which means that so long as fewer than one third of the nodes are faulty, no problems can occur. However, that does not mean that every problem becomes possible if one third or more of the subnet’s nodes are faulty. A supermajority of nodes is required to make the network run, which is 2f+1. Since f is 11, a supermajority is 23 nodes. If an adversary wanted to modify the network in secret, they would need to have installed at least 23 faulty nodes, which would a) stop talking to the remaining 11 correct nodes, b) perform the illegal state transitions desired, and c) start talking to the correct nodes again after a sufficiently long period of time that they are forced to resync using the illegally created state, rather than by re-running cached recent blocks to catch up. The question is, given the aforementioned nature of deterministic decentralization, what is the likelihood of an adaptive adversary corrupting 23 node providers, and getting them to modify the software their node machines run, without getting detected? We claim that this is extremely unlikely.
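Spelled out for a 34-node subnet, the fault-bound arithmetic above is simply:

```python
# The quorum arithmetic referenced above, for a 34-node subnet. f is the number of
# faults tolerated under the 3f+1 bound; 2f+1 is the supermajority needed to advance.
n = 34
f = (n - 1) // 3           # 11 faulty nodes tolerated
supermajority = 2 * f + 1  # 23 nodes must sign/agree
print(f"n={n}: tolerate up to {f} faulty nodes, supermajority = {supermajority}")
```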

Before discussing PoS blockchains to cover why they need more validators, let’s first review the three-layer consensus mechanism that Internet Computer subnet blockchain protocols use, and consider what can be achieved to add more security simply by adding nodes, without any need to create a system of external validators.

The first layer is a random number generator, called Threshold Relay, which generates the random numbers using BLS threshold signatures (each signature is itself threshold signed to produce the next random number), which is possible because BLS signatures are unique and deterministic. The second layer is a highly-consistent blockchain protocol, called Probabilistic Slot Consensus, which is driven by the random numbers. Only blocks that are fully validated can be added to the chain. The third layer finalizes blocks in the chain, and is called an Optimistic Asynchronous Finalizer, which instantly anoints blocks as finalized when it is successful, such that they cannot be overtaken by a new branch, making it unnecessary to wait until a block is buried to some depth. The nodes (“replicas”) only process the transactions in blocks once they have been finalized by this third finalization layer, which means that nodes never have to “rewind” their state after a branch collapses.
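A toy sketch of the “each random number is a signature on the previous one” idea follows. HMAC-SHA256 under a fixed key stands in for the unique, deterministic BLS threshold signature the actual protocol uses; that substitution is an assumption made purely for illustration.

```python
# Toy random-beacon chaining: each value is a deterministic, unique "signature" on
# the previous one, so every correct node derives the same sequence and no
# equivocation on it is possible. HMAC stands in for a BLS threshold signature.
import hmac, hashlib

beacon_key = b"stand-in for the subnet's threshold signing key"

def next_beacon(previous):
    return hmac.new(beacon_key, previous, hashlib.sha256).digest()

value = b"genesis beacon"
for height in range(3):
    value = next_beacon(value)
    print(height, value.hex()[:16])
```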

The blocks added to the chain by the second (blockchain) layer must be signed by a 2f+1 supermajority, which validates that they contain no invalid transactions (i.e. no transactions that break the rules, and none that are incorrectly signed). If a correct node receives an invalid block, then the signatures on that block by faulty nodes would create an incontrovertible cryptographic proof that the faulty nodes were indeed faulty. So to advance the chain illegally, the faulty nodes need a supermajority, and they also need that supermajority so they can exclude the correct nodes and prevent them bearing witness. The exclusion of the correct nodes would inevitably raise attention, but that’s another matter.

I don’t want to go too deep here, but the tl;dr here is that rather than modifying the architecture, if we want/need to increase subnet security, and reduce the chance that some adversary could do this, the easiest means is simply to add more nodes, not adopt an old-fashioned validator model. It can be made arbitrarily hard for an adversary to corrupt a 2f+1 supermajority of faulty nodes by increasing the size of the subnet, and thus the number of node providers they must corrupt. For example, what if there were a thousand node machines from a thousand node providers! It would surely not be credible that an adversary could manage to bribe 667 node providers, and persuade them to collude, all without being reported and caught. The point here is that security exists on a cost curve, and after a while, security becomes so strong that additional expenditure provides near zero gains. While we can make a subnet arbitrarily secure by adding additional nodes, the gain from adding each additional node diminishes, while the extra expense involved with adding the new node, and thus replicating the protocol cryptography and smart contract data and compute, on an additional hardware device, remains constant. At some point it makes no sense.

Finally, it’s worth touching on why traditional blockchains need so many validators. At the time of writing, Ethereum has 488,586 validators slurping down Ethereum blocks, and checking that they are correctly formed, thus replicating the ledger and its currency transactions, and smart contract data and computations, 488,586 times. This is unbelievably expensive. These validators, in the vast majority, are really just software instances running on centralized cloud services, rather than dedicated hardware, but still, if we imagine them as computers arranged in a line, 1 meter apart, then the line would stretch for an incredible 488km. Anyone can see that that’s a lot of checking, and I certainly struggle to find good justifications for the expense as a crypto theoretician. There are other benefits that don’t relate to security though, and these might be important: widespread staking in validators reduces the supply of ETH on the market, helping defend its price, and those staking often become vociferous advocates for the network. However, although Ethereum does not need such vast numbers of validators for security purposes, generally speaking, Proof-of-Stake networks hosted by anonymous validator nodes do indeed need more nodes for security purposes than if they were using deterministic decentralization, say.

Firstly, such networks need to prevent “Sybil” attacks. These occur when an adversary, who is anonymous, can keep adding additional faulty nodes to the network until he overcomes its fault bounds, and can then make it do bad things. For example, an adversary might accumulate a large holding of the blockchain’s native cryptocurrency token through a hack, say, then use it to spin up large numbers of validators. To prevent him creating enough validators to breach the network’s fault bounds, the network needs to be hosted by an enormous number of validators (which we take as a proxy for stake), which will make it too expensive for him to reach the fault bounds, since he will eventually run out of resources. In this sense, Proof-of-Stake networks hosted by anonymous nodes are similar to traditional Proof-of-Work networks (the Internet Computer is Proof-of-Useful-Work, which is very different), which require miners to do an enormous amount of hashing, making it infeasibly expensive for a malicious miner to obtain 51% of the overall hashing power, which would allow them to perform a double-spend attack or stop the chain. Obviously, if you want to go to the next level and provide a World Computer blockchain, this kind of approach is not a good idea.

Secondly, in these kinds of network, the probability that a subset of nodes chosen to construct a shard, or consensus committee in a larger network, say, contains a number of faulty nodes/validators that exceeds the fault bounds of the protocol, is calculated using hypergeometric probability (you can find a good online hypergeometric calculator here), since the anonymity of nodes always makes their selection random. It’s rather like having a bag full of blue marbles, which are correct, and red marbles, which are faulty, and drawing some number from the bag without looking, and without replacement, and then seeing if the proportion of red marbles is too high. For example, if a dynamic shard were created with 34 nodes, drawn from a population of 5000, where 2000 were faulty having been created by a Sybil attacker, then the probability that the number of faulty nodes is >11 is 1-0.2322786150022 = ~77%. This is obviously unacceptable, and is another reason why Proof-of-Stake networks require vast numbers of validators to constrain by expense the proportion of faulty nodes that a Sybil attacker can add, and must also use larger group sizes. For example, if Ethereum had 450,000 nodes (let’s pretend they have equal stake for simpler math), and only 50,000 belonged to the resource-constrained adversary after his Sybil attack, then the probability that a randomly selected consensus committee of 115 nodes chosen by the Beacon Chain contains >38 faulty nodes is 1-0.9999999999817 = 0.00000000778%, which is fine.
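For those who want to check the figures above themselves, a few lines with scipy’s hypergeometric distribution reproduce the calculation:

```python
# Reproducing the two hypergeometric figures quoted above with scipy.
from scipy.stats import hypergeom

# 34-node shard drawn from 5,000 nodes of which 2,000 are faulty:
# probability that more than 11 of the 34 are faulty.
p_small = 1 - hypergeom.cdf(11, 5000, 2000, 34)
print(f"P(>11 faulty of 34) = {p_small:.2%}")        # roughly 77%

# 115-node committee drawn from 450,000 validators of which 50,000 are faulty:
# probability that more than 38 of the 115 are faulty.
p_large = 1 - hypergeom.cdf(38, 450000, 50000, 115)
print(f"P(>38 faulty of 115) = {p_large:.2e}")       # vanishingly small
```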

Important note: fears that large numbers of nodes in Proof-of-Stake networks might belong to an adversary are not overblown. Proof-of-Stake “validator nodes” can nearly always run on cloud services, making it easy to spin up thousands in seconds by running a script, perhaps using stake obtained through a hack (this compares to the difficulty of spinning up dedicated physical hardware devices, such as Internet Computer node machines, which must be built or purchased, then configured, then installed in data centers, which will also involve arranging minimum-term contracts for the racks used). Moreover, there is a significant risk that the cloud service running the validators, or some malicious employee working there, could surreptitiously commandeer the validator nodes themselves, and turn them into zombies that do their bidding. The potential for things to go wrong was demonstrated recently when Hetzner flicked a switch and took down more than 1,000 Solana validators in the blink of an eye, a number that represented 40% of the Solana network.

11 Likes

A supermajority of nodes is required to make the network run, which is 2f+1. Since f is 11, a supermajority is 23 nodes.

Is this purely because of the 3rd bullet point below (i.e. to certify the replicated state)?

Dom, thank you for taking the time to explain the technical reasonings behind the IC.

I agree, having a network with 450,000x replication is extremely wasteful. Lots of work is repeated when a lesser degree of replication will do.

However, the benefit of networks like Ethereum is that ANYONE can join and start verifying blocks. This is not the case with the Internet Computer because becoming a validator is permissioned via the NNS. Thus, only node providers at the moment can tell if the state transitions executed by other node providers are legitimate.

What I would like to see, and I think others would as well, is a way to download and verify blocks on subnets. Is this possible? We needn’t download the entire state of the chain, perhaps given an initial input state, one could follow the state transitions of canisters and verify that the transitions are progressing as expected. Sort of like a read-only node.

The problem that I notice in the community is that the IC is opaque to everyone but DFINITY and node providers. We would like to see with our own eyes that the IC is functioning as expected. I think it would help build a lot of trust.

6 Likes

Is this purely because of the 3rd bullet point below (i.e. to certify the replicated state)?

(repost because didn’t quote @jzxchiang above)

Yes, that’s an example. Note that for some things f+1 is sufficient, since a) there are assumed to be at most f faulty nodes in the subnet, so b) a quorum of size f+1 will include at least 1 correct node. But this only works where no “equivocation” is possible, such as in production of the random beacon. In that application, there can only be one prior random number/signature to sign, since BLS is unique and deterministic, and the next signature (which is a random number) on that prior number will also be unique and deterministic, such that no fork is possible (it gives you consensus on a sequence of numbers without consensus…). So there is no wriggle room.

But when you get into consensus protocols things change, because faulty nodes can equivocate. For example, faulty nodes could sign e.g. two different states, or blocks, or finalizations, say, and suitably partitioned correct nodes (which cannot see both states) could be coaxed into helping to sign both (the network is assumed to be asynchronous, so in theory this could happen just by accident, and anyway, the correct nodes might not detect the “fork”; the model also considers that the adversary could interfere with message delivery over the network to allow their scheme to proceed). The only way you can prevent this is by requiring 2f+1 signatures.

Incidentally, the math behind this result is pretty simple and there is no way to get around it in an asynchronous network (where message delivery times are not guaranteed). Many blockchains have been designed in ways that pretend this result doesn’t exist, usually because the architects seem not to know about it for some reason. We know this is the case with Solana, for example, and that’s why it kept running when Hetzner took down 40% of its nodes (more than one third). The result is that simple network perturbances, and nodes processing TX at different rates, can make it go wrong and fork, which has happened quite regularly, and indeed, with time, it would be easy enough to add a modified validator to its network that could fork their chain on demand.
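A small sketch of why 2f+1 is the magic quorum size: in a 3f+1 network, any two quorums of 2f+1 nodes overlap in at least f+1 nodes, so at least one correct node would have to sign both sides of an equivocation, which correct nodes never do.

```python
# Quorum intersection in a 3f+1 network: two quorums of size 2f+1 must share at
# least f+1 nodes (pigeonhole), hence at least one correct node, which will not
# sign two conflicting states/blocks/finalizations.
n, f = 34, 11
quorum = 2 * f + 1
min_overlap = 2 * quorum - n      # 23 + 23 - 34 = 12 = f + 1
assert min_overlap >= f + 1
print(f"two quorums of {quorum} in a {n}-node subnet share >= {min_overlap} nodes")
```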

4 Likes

My point is: the only reason that this feature appears to be necessary and/or beneficial is that “blockchain community consensus” provides misleading social proof that this is somehow technically necessary and good. This leads people to feel that the crypto math cannot be trusted, and that they themselves should be able to download every once-per-second block from every subnet and rerun all the transactions contained to replicate and check the multi-terabyte blockchain states, to ensure that no funny business goes on. After all, how could this not be necessary since this is how every other blockchain works, and they advertise the thousands of cloud validators that keep them safe…?

But the math does say otherwise (as it does about the safety of many other leading blockchains that use this model). In the end, we have to trust the math and stay the course with the science, and eschew misleading blockchain rubrics. By adding a feature that enables people to slurp blocks from subnets and replicate their state, we are effectively saying that this is technically necessary, feeding the falsehood, and soon people will believe that any subnet not being replicated by an army of local nodes is unsafe.

There are also several technical challenges involved with implementing this feature (this is not to say they cannot be solved in special cases, such as the NNS subnet). Crucially, it is not possible for node machines to store all past blocks forever, since they would eventually consume all their memory, and the math used doesn’t require them to do that. This means that if a local node becomes disconnected for sufficiently long, it would have to resync from a recent state, rather than from the blocks it missed: thus, a malicious subnet could simply disconnect the local nodes while it illegally modified the state, keeping them disconnected for long enough that they would be forced to later resync from the illegally modified state, once the blocks that would have revealed the mischief are no longer available. And there are other issues too. The Internet Computer is just not designed to work like a Generation I or II blockchain – albeit with a lot of work we could find solutions, but why?

I also strongly dispute that removing the weak-to-medium privacy that the Internet Computer provides to smart contracts is a good idea. Most people like privacy. Currently, the only way to get data from a smart contract is by interacting with it. It chooses what data somebody should be able to access, unless you have physical access to a node machine and can poke around on its SSDs. However, this isn’t much different to the situation with cloud providers, where they could in principle also look at the private data of services they host, and soon we can enable SEV-SNP on node machines, which will prevent even those with physical access snooping, unless they have special skills and are willing to invest a lot of time.

We can argue that because the NNS runs the whole network, it needs special oversight, and it is a special case. I am not against that reasoning in principle. But regarding risk, I think I am far more concerned about the consequences of doxxing everybody’s voting, neurons, balances etc. in the NNS. Democracy depends on privacy to prevent coercion and reprisals. The real reason that some are demanding this technical feature has nothing to do with mathematical security, and lots to do with a feeling that they want to know why proposals they submit are not adopted, and perhaps a subconscious corollary belief that if those voting against them could be revealed, they would find it easier to get their way. To me, that looks like a potentially disastrous way forwards.

8 Likes

Currently, DFINITY seems to host a good number of nodes. Taking that as a concern, technically DFINITY has access to NNS data, while others don’t. Talking about SEV, what if SEV is not properly set up on the node machines, leading node providers to potentially leak the data?

What people should understand is that someone can say one thing (an idea) in, say, Chinese, and when it is translated to English it can mean something else.

But math is the language of the universe. Numbers don’t lie. Mathematical proofs are the same no matter in what language they are transcribed.

That is why it makes sense to use math to communicate ideas.

1 Like

Regarding privacy & security issues, having no block histories makes DeFi teams and users feel really unsafe.

I don’t see any concern for DeFi teams, because what seems to be going on is actually two questions:

  1. Do we want public networks? This seems to be an easy yes IMO, so DeFi teams should be feeling just fine;
  2. Do we want the NNS to be public? This is not clear, because we may want to keep some information private, like our staked maturity, voting history and so on.