Verification of Node Data

Background and Goal

The Internet Computer Protocol (ICP) operates on a network of independently owned and globally distributed node machines. A registry tracks specific characteristics for each node, including the node provider, data center, data center provider, and the country location. These characteristics are vital as they enable the ICP to achieve governance-approved decentralization targets, which are essential for the network’s security.

The data for each node, provided by the node providers themselves, is recorded when they join the network. This information undergoes a vetting process by the community during the onboarding of new node providers. To support the network’s growth and the continued reliability of node data, it is recommended to enhance the auditing of this self-declared information. The proposed enhancement of the auditing process should occur not only at the initial stage of node provider onboarding but also on a continuous basis thereafter.

In this forum post, we would like to share an initial collection of ideas for enhancing the auditing process. We invite the community, especially the node providers, to share their feedback and insights.

Potential auditing measures

The following strategies to enhance the auditing of node data have been identified for further discussion:

  • Node provider identity verification
  • Contractual transparency
  • Algorithmic monitoring of geographic location
  • On-site audits

We will elaborate on each of these strategies in the following sections. It is important to note, that these measures, while improving the system, do not guarantee 100% accuracy of node data. Rather, collectively they raise the bar to make the node data more reliable.

Node provider identity verification

Node providers play a crucial role by hosting and running the network’s infrastructure. To maintain transparency and accountability, their identities are made public. This allows the community to assess potential providers during the onboarding process and ensures providers can be held accountable for malicious actions.

For the first generation of node providers, the DFINITY Foundation conducted identity verification checks on node providers through a third-party service, a step deemed necessary for the network’s initial setup. As the network transitioned to onboarding the second generation of node providers, the process evolved. Node providers now submit their identities through a self-declaration form. However, a precise standardized protocol detailing what specific information suffices for thorough identity verification is not yet available.

Hence it is suggested to standardize this process in more detail. Below are key considerations we propose for creating a more robust framework. This is not an exhaustive process yet but rather a foundation for developing one:

  • Distinguishing between entities: Individuals and companies present unique challenges in identity documentation. It is essential to differentiate not just between individuals and companies but potentially also among various types of companies, each capable of providing different forms of identity proof.

  • Verification scope: For individuals, the process should collect enough data to uniquely identify a person. This includes their name, date of birth, address, and passport number. For companies, the verification should confirm the company’s legal existence, its physical address, and the authenticity of its representatives’ authority.

  • Handling of sensitive information: To ensure privacy, sensitive information such as dates of birth, addresses, and passport numbers should be stored in an encrypted format, for example, using vetKeys once available. Access to this encrypted data should be strictly controlled and only decrypted following a Network Nervous System (NNS) vote.

  • Decentralizing verification: Future identity checks should avoid reliance on a single verification provider. Furthermore, to ensure integrity, the identity verification confirmation report should be issued directly by the identity verification provider to the Internet Computer community.

Contractual transparency

Node providers, when deploying machines in data centers, engage in contractual agreements with both data center providers and Internet service providers (ISPs). To enhance transparency within the network, it is recommended that node providers either publicly disclose these contracts or provide a confirmation of the contract’s existence and its duration. Recognizing that certain details within these contracts may be sensitive and confidential, it would be acceptable to redact such information before making the documents available to the community.

This approach aims to bolster transparency by affirming the existence of contractual relationships between node providers, data center providers, and ISPs. Additionally, it allows community members to verify that the indicated data center providers and ISPs exist and appear trustworthy.

Algorithmic monitoring of geographic location

This method aims to validate the geographic locations of nodes after they have been onboarded, and hence helps to verify that the Internet Computer meets its geographic decentralization targets.

It is suggested to employ two primary techniques: analyzing IPv6 addresses and measuring round-trip times.

  • IPv6 address analysis: This technique uses IPv6 addresses to estimate a node’s geographic location. However, it’s important to acknowledge that this method has limitations. The use of VPNs can mask the true location of a node by altering its perceived IP address, which may lead to inaccurate location estimates.
  • Round-trip time measurement: To complement IPv6 address analysis, the measurement of round-trip times provides an additional layer of verification. This method assesses the time it takes for a signal to travel to the node and back, offering insights into the node’s physical proximity based on the delay observed.

Please note that the ICP boundary nodes already ping all nodes every 10 seconds, though this data is not stored or made publicly available yet. While the implementation of round-trip time checks can trigger alerts and facilitate discussions, it should not be interpreted as definitive proof of a discrepancy.

The roll-out of such a monitoring could be done in phases. In the first phase, metrics on round-trip times could be provided by the boundary nodes to the community. Although these nodes are presently operated by DFINITY, it is planned to decentralize them in the future. In a second phase, one could develop a monitoring toolkit or integrate the measurement of round-trip times directly into the protocol to automate the monitoring.

On-site audits

The core concept of on-site audits involves dispatching individuals to physically verify the presence of node machines at the declared data centers. This measure would help to verify that the Internet Computer meets its geographic and data center decentralization targets.

These auditors, in collaboration with the node provider, would inspect the data center to ensure that the machines correspond to those registered in the ICP registry. This would require a protocol to conduct this verification, for example using latency check via the local network.

While this method offers the most robust form of verification for the physical location of node machines, it also comes with logistical challenges and costs.

Several points require further discussion:

  • Implementing mechanisms to ensure the integrity and reliability of the auditors.
  • Determining which subnets to target for these audits. It might be practical to focus primarily on highly critical subnets, such as the NNS or Fiduciary subnets, given the high resource demands of on-site audits.

Consequences of Incorrect Node Provider Data

If the community identifies inaccuracies in the data provided by node providers, the following actions could be taken to maintain the integrity and trustworthiness of the network:

  • For New Nodes: Proposals for onboarding new nodes found to have incorrect data may be rejected.
  • For Existing Nodes: In the case of nodes that are already part of the network, discovering incorrect data could lead to punitive measures triggered by a proposal. These may include slashing rewards and initiating a proposal to offboard the nodes from the network.
9 Likes

I can tell you I get pinged on a routine basis from Coinbase to verify my account information.

The security bar for the IC network should be much higher. Node providers should be pinged on an ongoing basis PLUS in person audits conducted.

This should be a non negotiable and inflation should be allocated to pay for such audits. It should be built into the system itself per country.

2 Likes

@bjoernek All these points are fair and action points are well planned.

In the point geographic location. IPv6 address verification might not be as effective. This depends on how IP blocks are signed, especially when we use a Tier 1 ISP. The round trip measure can be one way. Still having issues as also mentioned in your post.

I would like to point out a very important statement on SEO crawling of sensitive data. A lot of NPs upload documents in terms of verification, purchase orders, and maybe in future contract agreements to ICP wiki. I would like or propose that we can disable the crawling of such pages.

3 Likes

The section about “Contractual Transparency” is not appropriate. We as node providers, provide node machines (according to a pre-defined specification) in NNS approved Data Centre locations and provide the connectivity necessary for the node to optimally perform as part of the IC.

We choose the service providers we want to work with and enter various contracts over time to enable and ensure that our node machines successfully join the IC, and that the performance of these node machines remain within the community’s expectations.

Almost all these contracts are subject to Mutual Non-Disclosure Agreements to protect the interests of both parties. The existence of, and/or the terms of any contract between we, as Node Providers, and our Service Providers, has nothing to do with the rest of the IC community.

I am concerned about publishing more and more information about us as Node Providers on the public internet. We should disclose only the data that is necessary to confirm that we achieve our decentralization goals in terms of Node Provider identity, data center ownership and geographic location of the node machines. These facts can only be verified with an on-site audit and can’t be proofed by disclosing any contract details.

Remember that we as Node Providers, don’t have any contractual certainty when providing nodes to the IC. Rules can change, rewards can change, expectations can change. It would make sense, given this uncertain environment, that some Node Providers may choose to mitigate their risks by limiting contract duration to the minimum. Others may choose to go for longer term contracts to get the services they need from their service providers at the best possible price. Bottom line – it proofs nothing regarding the actual location of the node machine.

I am all for “Node provider identity verification” from time to time provided that any sensitive information is provided in an encrypted state and can only be decrypted following the acceptance of a NNS proposal to this affect.

“On-Site Audits” that confirms the specific node machines are indeed located within the approved data center, as per the Node Operator Records on the IC, are of utmost importance. As part of the audit, the auditor can also confirm that the Data Center Record accurately reflect the x;y coordinates and the most up to date data center ownership information.

9 Likes

Thanks for the thoughts regarding verification of node providers and the servers. The team has clearly put in lots of thought into how to best proceed with this. I did have some thoughts:

  • Publication of private information is an issue. Although you can ensure privacy through masking information, at what stage does the masking make the information useless to act upon. I certainly do not want my passport number or other information that can be used to steal my identity online. If its masked too much, nobody will actually be able to verify. This may be better stored internally with restricted user access by Dfinity
  • Same thing with contractual transparency. We have mutual NDAs in place, so we cannot share the info.
  • On site audits pose a challenge as I certainly do not trust parties not directly associated with Dfinity in the data centre. The NP is the one responsible for the visitor’s actions, so they must be part of Dfinity, not a random hobbyist. If the goal is to check if machines match, I think this can probably be done remotely. Although it would be nice to verify the machines are there, I am not sure it adds that much benefit as the person will not be able to access the machine’s internals.
2 Likes

First of all, thanks for this. From our perspective, we have several considerations regarding the proposed measures:

  • Sensitive Personal Data Storage: We question the necessity of storing sensitive personal data, such as passports, on the NNS. In our view, the collection of passport data is overly intrusive. As the industry moves towards decentralized identity (DID) systems, we anticipate the adoption of more privacy-preserving methods, such as zero-knowledge proofs of identity, which would mitigate the need for direct data collection. In the meantime, we suggest a more streamlined approach where only a verification report from approved KYC providers is stored, indicating that KYC has been successfully performed. This method not only enhances privacy but also simplifies the data storage process.

  • IPv6 Monitoring: We view IPv6 monitoring as a potentially superficial metric. Given its susceptibility to manipulation, we believe that relying on this metric might not effectively build trust within the community. While we’re capable of implementing this measure, it’s important to recognize its limitations and the fact that it might serve more as a cosmetic solution.

  • Contractual Transparency: We believe that demanding full contractual transparency could be an overstep. Such a requirement could lead to complications with data center providers, who may prefer to keep certain information, such as pricing, confidential. For regulated entities like Sygnum, this level of transparency is not only impractical but could also pose significant challenges.

  • On-site Audits: In our opinion, the most effective and legitimate method of verification is through on-site audits. Despite their high cost and limited scalability, they stand as the most reliable way to achieve thorough verification. We propose that Dfinity should cover the costs of these audits, either through the treasury or by adjusting inflation rates. Looking ahead, if the number of node providers grows to a point where auditing each one becomes impractical, we could consider a strategy of random sampling, coupled with stricter penalties for any violations found.

Lastly, we are unsure about the effectiveness of these four measures in achieving the goal of ensuring decentralisation of the IC. If there’s a will to obfuscate, these measures could potentially be circumvented.

3 Likes

Many thanks for the detailed feedback @cchung !

I would like to provide regarding your comment on-site audits:

I agree with you that ensuring the trustworthiness of auditors and establishing a reliable process is challenging. However, I believe that allowing only auditors from DFINITY would not be a sound approach for decentralization.

Could you elaborate a bit more on how the remote check could work from your perspective?

1 Like

Hi @LBsygnum many thanks for your detailed answer! In order to make sure that I fully understand I would like to clarify a few points

I am fully with you that IPv6 Monitoring standalone can be easily circumvented, as mentioned above. Hence, it is proposed to complement it by measurements of round-trip times. Do you agree with that suggestion?

Just to clarify, when you mention ‘adjusting inflation rates,’ do you mean that the protocol (not DFINITY) would compensate for on-site audits by minting ICP? If that’s the case, it would also align with the suggestion @dfisher made earlier, which makes a lot of sense to me. Conversely, mirroring earlier comments in this thread, I think it would not be sound if only one entity, like DFINITY, would conduct/pay these audits.

Yes, I agree with that point. Not all nodes would need to be checked all the time. Random sampling (potentially with a higher weight on criticial subnets) sounds like a good strategy.

In case you have additional ideas (even if high level) on how to verify node data and decentralization, then it would be great if you could share them.

Thank you for the reply @bjoernek

For the auditor portion, I think a good first step to get something set up quickly would be auditors from Dfinity to establish a guideline for future audits and the processes involved. For future auditors, if there is a guideline for acceptance to be an auditor similar to the Node Provider Application, then I think that would be fine. There would have to be economic incentives to do such an action as well as some sort of economic disincentive to prevent malicious acts. Maybe some sort of staking mechanism?

The other thing is that they should be escorted by somebody on the NP team to make sure that they do not damage anything at the data centre. This would be important given legal/insurance issues that may pop up, and will require forward planning with enough notice period given. Auditors would not be able to go to the data centre anyway themselves.

Regarding the remote check portion, it depends on at what level of detail you want when checking the hardware. It’s probably possible to set up a fairly low-privilege user account with SSH access, which would allow to pull info about the CPUs, how much memory is available, what storage is available, etc. But if they need anything more specific (like memory speed), that might require admin permissions, and we would honestly rather physically escort someone to a data centre than give them admin access via SSH.

This also assumes theres nothing sneaky like redirecting SSH connections for that user to a VM you set up with a bunch of spoofed hardware details.

3 Likes

Hi all,
many thanks for all the great feedback received in this thread! Here is a suggestion for next steps

Datacenter On-site Audits
It seems that there is a general agreement to pursue this further, with some concerns noted that we have to ensure the trustworthiness of the auditors.

To address operational complexities (mirroring comments made above),it is proposed initiating a pilot project. This will involve forming a small group of trusted community members and DFINITY representatives. The group’s mandate and funding—either through inflation revenues or a DFINITY grant—could be approved via an NNS vote.

This team will be tasked with developing an audit process and implementing it at selected data centers. We suggest initiating work on the pilot as part of a Node Provider ICP Lab meeting.

Algorithmic monitoring of geographic location
It is suggested to build an off-chain tool PoC to process and analyze IP addresses and round-trip times as measured by the boundary nodes. If the PoC is successful and insightful, we could consider to integrate this tooling in the ICP dashboard (or another suitable place).

Node provider identity verification
There is general consensus on the need to strengthen this process, although concerns about handling sensitive data have been raised. We should establish clear standards for the required documents and the verification process. It is recommended to discuss this further in a Node Provider ICP Lab meeting.

3 Likes

Here is the landing page of the Node Provider ICP Lab workshop which I mentioned above, where we could work out further details for the node data verification.

It is organized by @SvenF and @louisevelayo and will take place May 13 - 15, 2024.

2 Likes

Personally I believe that zero knowledge verification is an extremely promising technology to balance privacy and verification concerns. I just want to say that and urge the community to look into which zero knowledge verification solutions can be incorporated now.

I understand that the current solutions are probably immature, but once robust the promise seems enormous.

Here are some potentially useful projects:

zkPass
Reclaim Protocol
Holonym

Not zk-based but a potentially useful concept:

Gitcoin Passport

I really like the Gitcoin Passport idea where “stamps” can be collected that increase the overall sybil resistance score of the holder.

It could be interesting to see this kind of solution implemented for Node Providers, where each node provider has a public account with some kind of identifier and score. Private information made public is kept to a minimum, but a score and maybe even individual “stamps” are shown for each node provider public account.

Then node providers could pick and choose which solutions they would like to add to boost their score. Some might want to engage in audits, which might increase the score by a lot relative to the other solutions. All solutions would have their scoring mechanisms, perhaps voted on by the NNS eventually.

This could prove to be a very flexible, composable, and extensible system.

3 Likes

Just to add to the list of @lastmjs (super cool btw!).

There is DocuTrack that is being developed at DFINITY and could also be a potential solution!