Let's solve these crucial protocol weaknesses

Yes. Anyone can record anything. The next unit that picks up these messages will have to decide which ones should be used.

As with all other off-chain execution models, things recorded on-chain are only there for availability. How to interpret them is up to the code running the off-chain execution.

3 Likes

Ugh. Ok, I thought they did something smarter.

4 Likes

Where were you guys during the whole Justin Bons thing? Wait, I know! Busy making ICP the best blockchain on Earth.

3 Likes

That’s in the first year. Costs grow quadratically over time. So in 10 years you’d have to pay 50x more per year. Storage costs may decrease over time, but quite possibly not at that rate.

Why quadratically? Sure, with more use, but is there something else that makes it grow faster? Costs also decline according to Moore’s law (or have historically), so 5000 should be less than 78 in 10 years.

Yes… this completely depends on whether there is a use case. I think I have one for tokens, but it’s likely less important for websites and other static files. Although it can be hard to know what might be interesting eventually. Data is getting more and more valuable with AI. I guess the question to ask is: what is in the block data? And what is in the block data that a contract couldn’t record itself if it were interested in eternal persistence? (Trying to understand the actual bytes… I don’t think a bunch of trace info about who has signed would be interesting; calldata very well might be.)

To me this begs the question of why we aren’t setting up some kind of staking for boundary nodes. The certification dance is a pain (especially for dynamic queries), and some financial guarantees would be interesting… maybe this needs its own thread.

5 Likes

My dots could have been connected better and laid out better. I even screwed up a vocab word (thanks @timo… fixed above). But to be fair, I did it on an iPhone while drinking beer by the river with my kids throwing rocks far too close to me. :joy:

As far as why I think an EVM on AO would be slower: it’s because the messaging goes through the Arweave layer. Depending on your trust assumptions, this is going to add latency compared to the IC model (maybe the IC model needs to come down a bit on the security axis due to its lack of economic slashing security).

1 Like

My CU running from the IC is going to be slower than what may be possible on an unbounded CU, but mostly because of the tECDSA signing (if you are OK with an on-node key then this could be faster… but less secure… enclaves would make this go away). But I think it’s going to be fast enough for ‘most’ applications that are typical in crypto land. It won’t be able to pick up and run any old CU wasm. But you’ll get 13-node agreement out of the box. So some trade-offs… we’ll see once I’m done whether it is practical.

Unless the nodes writing the output are all on the IC and agree to sign with a common tECDSA key, and that is the security (you only trust messages signed with that key). :) (The eventual question this begs is: why use the AO layer at all then? My very early theory is it’s only memetics, or a very specific kind of compute that makes the other hoop-jumping worth it.)
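A minimal sketch of the “only trust messages signed with that key” rule: a subnet’s tECDSA output verifies like an ordinary ECDSA signature against the subnet’s public key, so a consumer can simply gate on that check. The helper below uses Python’s `cryptography` package; the DER-encoded signature and SHA-256 hashing are assumptions for illustration, not the exact IC encoding.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec


def is_trusted(message: bytes, signature_der: bytes,
               subnet_pubkey: ec.EllipticCurvePublicKey) -> bool:
    """Accept a message only if it carries a valid ECDSA signature
    from the known (threshold-generated) subnet public key."""
    try:
        # DER-encoded signature and SHA-256 are assumptions for this sketch.
        subnet_pubkey.verify(signature_der, message, ec.ECDSA(hashes.SHA256()))
        return True
    except InvalidSignature:
        return False
```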

1 Like

Because I’m stupid. I was thinking of the total cost. After 10 years you would have paid 1 + 2 + 3 + … + 10 = 55 times as much as what you paid after one year. But the growth is linear, not quadratic. Only the total cost is quadratic.
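In other words (a back-of-envelope sketch in Python, with the first-year cost normalized to 1): the per-year cost grows linearly as the stored data accumulates, while the cumulative bill grows quadratically.

```python
# Year k costs k times the first year's price (linear growth),
# so the total paid after n years is 1 + 2 + ... + n = n(n + 1) / 2.
first_year_cost = 1.0  # normalized; substitute your actual yearly storage cost

for n in (1, 5, 10):
    this_year = n * first_year_cost
    total_paid = first_year_cost * n * (n + 1) / 2
    print(f"year {n:2d}: pay {this_year:4.0f}x this year, {total_paid:4.0f}x in total so far")
# year 10: pay 10x this year, 55x in total so far
```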

Regardless, trying to predict the future (especially when it comes to costs) is fraught. E.g. looking at this chart I found online (and assuming it’s vaguely accurate), the cost of disk storage only dropped from $37 to $13 between 2011 and 2022. That’s less than 3x, more than an order of magnitude away from what Moore’s law would have predicted.

Edit: I just realized I hadn’t linked to the chart: Historical cost of computer memory and storage - Our World in Data
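For a rough sense of the gap (a sketch using the figures quoted above and assuming a Moore’s-law-style halving of cost every two years):

```python
# Observed decline per the Our World in Data chart referenced above.
observed_2011_usd, observed_2022_usd = 37.0, 13.0
years = 2022 - 2011

observed_factor = observed_2011_usd / observed_2022_usd  # ~2.8x cheaper
predicted_factor = 2 ** (years / 2)                      # ~45x cheaper if cost halves every 2 years

print(f"observed: ~{observed_factor:.1f}x cheaper, predicted: ~{predicted_factor:.0f}x cheaper")
# The prediction overshoots the observation by ~16x, i.e. more than an order of magnitude.
```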

I don’t think that staking addresses the issue that Paul brought up: yes, you can speed things up, but if you have a long chain of transactions spanning multiple canisters/subnets and one of them towards the beginning turns out to have been incorrect, what can you do about it?

And if you’re going to wait for certification whenever a chain of transactions is involved, how is that different from issuing an update and a query at the same time and temporarily using the output of the query (e.g. displaying it in a front-end) until you get the certified response from the update?
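For concreteness, here is a sketch of that “update and query at the same time” pattern. `call_query` and `call_update` are hypothetical stand-ins for whatever client you use to talk to the canister; the idea is just to render the fast, uncertified query result immediately and overwrite it once the certified update response arrives.

```python
import asyncio


async def call_query(method: str) -> dict:
    """Hypothetical fast, uncertified read (no consensus round)."""
    await asyncio.sleep(0.2)
    return {"value": "provisional balance", "certified": False}


async def call_update(method: str) -> dict:
    """Hypothetical certified call that goes through consensus."""
    await asyncio.sleep(2.0)
    return {"value": "final balance", "certified": True}


async def optimistic_read(method: str, render) -> None:
    update_task = asyncio.create_task(call_update(method))  # fire the update first
    render(await call_query(method))                         # show provisional data right away
    render(await update_task)                                # replace it with the certified answer


asyncio.run(optimistic_read("get_balance", print))
```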

2 Likes

Lots of great discussion on many topics here! I’ll quickly reply with regard to the instruction limit:

I don’t think you can certify things per canister; this might work for 10 canisters on a subnet, but it seems entirely infeasible for 100k canisters on a subnet, because creating threshold signatures with all subnet nodes is still significant work.
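A quick illustration of why per-canister certification doesn’t scale (the numbers here are assumptions for the sake of the argument, not measured figures):

```python
# Assume one threshold signature costs on the order of tens of milliseconds of
# subnet-wide work, and the subnet certifies state roughly once per second.
sig_cost_s = 0.03   # assumed cost of a single threshold signature
round_time_s = 1.0  # assumed certification interval

for canisters in (10, 1_000, 100_000):
    signing_work_s = canisters * sig_cost_s
    print(f"{canisters:>7} canisters -> ~{signing_work_s:,.0f}s of signing work "
          f"per {round_time_s:.0f}s round")
# 10 canisters fit easily; 100k canisters would need ~3,000s of work every second.
```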

I agree this would be great. Currently I don’t think we know how we could extend DTS to go over checkpoint boundaries, so that’s one limitation we have right now. We are planning to propose doubling the DTS instruction limit (from 20B to 40B), so that should already help here. Increasing it beyond that would likely be more involved, so I am not aware of any concrete plans there.

6 Likes

Incredible depth of knowledge, clearly explained! I learned a lot from this!

1 Like

It’s only sort-of transparent. If your process does actually exhaust the RAM, the system will likely collapse for all practical purposes due to thrashing. Another example is supercomputers: in theory they expose petabytes of RAM, but they generally use a NUMA (non-uniform memory access) design, so you have to design your app quite carefully, otherwise it’s going to be extremely slow. So straight-up “infinite memory” doesn’t exist in Web2, even in the supercomputer world, and that’s a world with highly specialized (and expensive) hardware and zero maliciousness. I don’t think it’s realistic to expect it in a Web3 world either (and I don’t see how Arweave solves that, for example).

That’s not to say that we can’t have better support for storing lots of data on the IC (for example, what CanDB was doing; I’m not sure what the state of the project is now). But even with this support, you can’t expect to treat it the same way as you’d treat your Wasm heap.

8 Likes

I’m aligned with what you are saying in general. Knowing the data is certified is certainly better than betting that it is correct. But there are extreme circumstances where you could stream data faster than you could certify it or predict its requests. There it might be interesting to at least have some financial assurance from the boundary nodes that they aren’t swapping bits on you. Maybe something like live-streaming video?

1 Like

I assumed that’d be the case, which is why I’m proposing to have it only on dedicated “heavy processing” subnets, where instead of running lots of canisters doing light work, there are fewer but more demanding ones. In order to achieve the vision of a world computer, the IC needs higher throughput; it is unlikely to deliver on its promises if all services running on it are subject to homogeneous constraints.

Even if the threshold sigs weren’t a bottleneck, I wouldn’t expect such subnets to have a high count of actively running canisters anyway; if there are too many and their execution is time-sliced too often, it’d kind of defeat the purpose.
It’d certainly benefit canisters likely to run into the instruction limit, whether very small ones, e.g. HPL ledgers, or subnet-sized ones, e.g. Bitfinity. The latter kind takes up an entire subnet anyway, and the former could be load balanced by quickly moving them between subnets individually or by subnet splitting.

Though it is true that if the limit on canisters per subnet under this model is too low, and costs have to be increased by many orders of magnitude to make up for it, then such subnets probably wouldn’t be used at all, or not enough to justify the engineering effort.
Is ~10 canisters per subnet actually in the ballpark of what we could expect?

Perhaps the way I phrased it made it more dramatic than I intended; I wasn’t implying there are hard constraints in the protocol, and it’d be worrying if that were the case.
Nonetheless, if the community suddenly decided to create a 100-node subnet, it wouldn’t be possible. Sure, the foundation could prioritize the work to make it happen sooner, but even then, nobody knows how well it’d run; it might function, but further optimization could be required to make it actually usable.

I can understand low-node-count subnets not being compelling enough to justify the work needed to safely add them, but it is somewhat concerning that almost 3 years after mainnet release we still don’t have even one subnet with >=100 nodes, nor any clue what kind of performance we can expect when they eventually become a thing.

Imho these too are symptoms of the protocol being developed primarily on a set of assumptions dictated by the network structure unilaterally chosen for it.
If there had been no a priori bias toward which configurations are more desirable, with any form of specialization taking place only after usage patterns spontaneously formed on mainnet, these safeguards would likely already have been implemented, as they’d have been mandatory for the genesis release.

Assuming the vision is still to offer a crypto cloud capable of covering as much of the decentralization spectrum as possible, doesn’t it make more sense to start by accounting for the “worst case” scenario first? This would entail implementing high-replication, permissionless subnets, and only later optimizing by granting more favorable conditions, such as low-replication, permissioned subnets with server-grade hardware.
As a result the end product would be more robust, as it would need to account for more adversities. Generally speaking, it’s easier to optimize a system by providing it with a less harsh environment (see Hyperledger) than to go the other way around.

Btw, the cycles issue should at some point be addressed regardless of any new subnet types; the tokenomics are potentially in constant jeopardy from a single subnet being taken over.

It’s super interesting that this seems to be flattening out. I wonder whether Moore’s law still holds if you take latency, access, and reliability into account? It likely looks steeper then, but I wonder by how much. What do the replicas use for disk space?

Replicas use data center SSDs in a RAID configuration (not sure which). This is necessary because orthogonal persistence requires the ability to read and write GBs per second, and to do so over and over (at rates that would likely cause consumer SSDs to fail within weeks or months).
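Rough endurance math behind the “fail within weeks or months” remark (the write rate and endurance rating below are illustrative assumptions, not replica specifications):

```python
write_rate_gb_s = 1.0     # assumed sustained write rate from checkpointing / orthogonal persistence
consumer_ssd_tbw = 1_200  # typical endurance rating of a ~2TB consumer SSD, in terabytes written

written_per_day_tb = write_rate_gb_s * 86_400 / 1_000  # ~86 TB written per day
lifetime_days = consumer_ssd_tbw / written_per_day_tb
print(f"~{written_per_day_tb:.0f} TB/day -> rated endurance exhausted in ~{lifetime_days:.0f} days")
# About two weeks at 1 GB/s sustained, which is why datacenter-grade drives are needed.
```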

2 Likes

Here is a description of the replica’s hardware.

1 Like

For an actual example of the disks used in our Gen2 replica node hardware, we use the following SSD model: 6.4TB Micron 7450 MAX Series U.3 PCIe 4.0 x4 NVMe Solid State Drive.
Each node server has 5 of them per the specification, so 5 x 6.4TB gives roughly 30TB usable (with some taken by the IC-HOST OS, etc.). The total cost for the 5 SSDs is around US$6000, which gives you a rule-of-thumb cost of US$200/TB.
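Spelled out (using the approximate figures above):

```python
drives = 5
capacity_tb_each = 6.4
total_cost_usd = 6_000  # approximate price for the five SSDs

raw_tb = drives * capacity_tb_each  # 32 TB raw
usable_tb = 30                      # after space taken by the IC-HOST OS, etc.
print(f"raw: {raw_tb:.0f} TB, usable: ~{usable_tb} TB, "
      f"~US${total_cost_usd / usable_tb:.0f} per usable TB")
# ~US$200/TB
```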

Also, on the topic of disk price changes: the pricing of these datacenter-grade NVMe SSDs (at least) jumped about ten percent at the end of last year, apparently due to flash module supply constraints. So pricing doesn’t always trend down, at least in the near term.

1 Like

Agree, these are among the strongest new blockchain designs. I think they do keep the entire history of the chain though, at least for full nodes… are you sure about this? Also, the TPS figures are a bit overblown.