Long Term R&D: Malicious Node Security (proposal)

Malicious party security

Objective

The internet computer is designed to run software in the form of canister smart contracts in a very secure and reliable fashion. In simple terms, it means that many replicas together form a subnet, and even if some of those replicas are malicious and misbehaving, the internet computer still guarantees two important properties:

  • Liveness: the canister smart contracts on that subnet should still be available and process incoming messages
  • Safety: the state of a canister only transitions according to the rules of the canister (i.e., tampering with a canister state is impossible, and in particular rolling back / “double spending” attacks are impossible).

The internet computer protocol by design has a very simple argument that guarantees safety, which can be observed simply by receiving sufficiently many cryptographic signatures. Liveness however is harder to achieve in practice: if peers in the subnet are malicious, they can send a lot of useless messages, trying to waste the bandwidth or CPU of an honest replica.

Why this is important

The IC should be able to withstand and defend against all sorts of adversarial and non-adversarial behavior that may affect its liveness, safetly, performance and/or fairness. In particular, the IC should cope with DoS and other attacks that aim to waste its resources.

Beyond spamming the network by sending invalid, incorrect, irrelevant (e.g. duplicate, expired) or explicitly unwanted artifacts, every behavior that is capable of wasting system resources like advertising and then withholding requested artifacts is considered a form of misbehavior.

Background

There are many different ways in which a replica can misbehave. Some are relatively simple: if a node sends some signed artifacts that deviates from the protocol (no honest party would ever send such artifacts), then this node is clearly misbehaving and there exists evidence. Other types of misbehavior are much harder to act upon: if a node never sends any message, there is no hard evidence of misbehavior.

We can classify the types of misbehavior based on whether a node can deterministically detect and prove them to other nodes in the network. If certain malicious actions are not provable, the affected victims could still file a complaint and let the system punish the bad node or user if a threshold of (e.g. > n/3) has complained against the same node or user.

Deterministically detectable misbehavior

This category of misbehavior is unequivocally detectable based on specific action or received artifact.

Publicly blamable:

  • Node sends an unrequested or invalid artifact/advert/request
  • Node sends the same artifact twice
  • Node withholds requested artifacts
  • Node sends too many artifacts (e.g., state sync requests) per time unit

Publicly provable:

  • Equivocating blocks
  • Equivocating signatures
  • DKG dealings from non-dealers

Probabilistically detectable misbehavior

A node can only probabilistically infer a misbehavior by statistically analyzing communication patterns of a peer.

Publicly blamable:

  • Node regularly responds to artifact requests late (but before timeout)
  • Node sends “too many” errors to requests.

Publicly observable:

  • Node fails to produce blocks most of the time, i.e. the fraction of the blocks in the final chain is below a certain level

Collectively detectable

Some adversarial communication patterns may only be detectable on an aggregate basis by multiple nodes.

Publicly blamable:

  • Node selectively advertises some artifacts to some of the peers but not to others.

Key milestones

  1. Replicas temporarily disconnect from peers that send invalid artifacts
  2. All data structures are of bounded size, and malicious peers cannot prevent an honest node from staying up-to-date on the blockchain and state of the subnet
  3. Statistically deviating replicas can be identified by the protocol
  4. Action is taken against provably misbehaving replicas, and such replicas may be permanently removed from the internet computer by the protocol
  5. Action is taken against statistically deviating replicas, and such replicas may be permanently removed from the internet computer by the protocol
  6. No bad peer can deteriorate the throughput and latency by more than 20% in and across subnets of at least 50 nodes

Discussion leads

@yvonneanne, @Manu

Open Research questions

  • What are relevant metrics for correct participation in the Internet Computer Protocol?
  • What are suitable observation windows and what are the expected and tolerated deviations from aggregate metrics?
  • How can the traffic be shaped to utilize the available bandwidth between nodes in the same and different data centers efficiently despite malicious nodes?
  • How can one design OS and networking scheduling algorithms to provide Quality of service guarantees despite Byzantine players?
  • How can we prove that the internet computer protocol is live while maintaining bounded data structures?

Skills and expertise necessary to accomplish this

This project will require a wide range of different skills. One the one hand, this is a theoretical problem, and requires academic distributed systems and cryptography experts. On the other hand, it requires empirical testing, which includes testing how specific malicious behavior is handled, but also chaos engineering and fuzzing techniques.

What are we asking the community

  • Review comments, ask questions, give feedback
  • Vote accept or reject on NNS Motion
  • Participate in technical discussions as the motion moves forward
4 Likes