Why are Internet Computer’s storage costs so low? Is it because of the existence of Storage Node? I tried to find information about Storage Node, but could not find it.
Storage Node does not exist yet?
Because there are only a few nodes?
Because data is not stored forever?
There are a number of reasons for the low cost.
The primary reason is that (as opposed to e.g. Ethereum) the IC is not a single, monolithic blockchain. If you have a single blockchain and thousands of nodes, you need to pay for storage across all those nodes.
Second, on a network like Ethereum (I just picked it because I have a vaguely decent understanding of it, not because I have anything against it) not all nodes may store the full blockchain. But the full blockchain is intended to be stored indefinitely. So when you pay for storage, you pay for it for an indefinite period of time (with the underlying idea being that storage costs halve every X years, so a finite upfront payment should cover storage forever). On the IC (as mentioned on the wiki page you linked) storage costs are charged per year (actually per day, or more frequently).
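To make the "finite upfront payment covers storage forever" idea concrete, here is a minimal Python sketch. It assumes the yearly cost halves every fixed number of years (the "X" above), which turns the total into a geometric series. The function names and all numbers are hypothetical and only illustrate the arithmetic, not actual pricing on any network.

```python
def upfront_cost_forever(yearly_cost: float, halving_years: float) -> float:
    """Total cost of storing data forever if the yearly cost halves every `halving_years`.

    Year k costs yearly_cost * 0.5 ** (k / halving_years), so the geometric
    series sums to yearly_cost / (1 - 0.5 ** (1 / halving_years)).
    """
    if halving_years <= 0:
        raise ValueError("halving_years must be positive")
    ratio = 0.5 ** (1.0 / halving_years)
    return yearly_cost / (1.0 - ratio)


def pay_as_you_go_cost(yearly_cost: float, years_stored: float) -> float:
    """Total cost when billed per period at a flat rate, only while the data is kept."""
    return yearly_cost * years_stored


if __name__ == "__main__":
    # Hypothetical numbers: 1 unit per year, cost halving every 2 years.
    print("pay once, store forever:", upfront_cost_forever(1.0, 2.0))
    print("pay per period, 1 year: ", pay_as_you_go_cost(1.0, 1.0))
```

The upfront sum is finite but is always a multiple of the first year's cost, whereas per-period billing only charges for the time the data actually stays around.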
Third, even considering a network consisting of multiple subnets/sub-chains (I don’t know which of the ones listed there qualify, if any), there is still the issue of anonymous decentralization. I.e. it may be that only a subset of the network’s nodes (those making up a specific subnet) will store your data. But if any validator is free to anonymously join any subnet, you need large numbers of validators in order to prevent a Sybil attack (where an attacker, acting under a multitude of identities, controls a majority of the validators and can get the subnet to act maliciously). The IC uses deterministic decentralization, whereby node operators are publicly identified (and the public can look into whether they are actually independent). Meaning that a much smaller number of validators per subnet is required to achieve the same level of decentralization as on a network with anonymous validators. Hence, a lot fewer copies of any piece of data.
Finally, the $5 per year cost was (I believe) calculated for subnets consisting of 4 or 7 replicas. Subnets with 13 replicas (the current standard subnet size) should charge a bit more. Fiduciary subnets, with 30+ replicas, even more than that. I believe that this is already implemented and will apply to all future subnets.
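As a rough illustration of why bigger subnets should charge more, the sketch below scales a yearly storage price linearly with the number of replicas holding a copy. Linear scaling is my assumption here, not a statement of how the IC actually prices storage, and the baseline of 7 replicas is just one of the two sizes guessed at above.

```python
def scaled_storage_cost(base_cost: float, base_replicas: int, replicas: int) -> float:
    """Scale a yearly storage cost linearly with the number of replicas (assumed model)."""
    if base_replicas <= 0 or replicas <= 0:
        raise ValueError("replica counts must be positive")
    return base_cost * replicas / base_replicas


if __name__ == "__main__":
    # Assumed baseline: $5 per year on a 7-replica subnet.
    for n in (7, 13, 30):
        print(f"{n} replicas: ${scaled_storage_cost(5.0, 7, n):.2f} per year")
```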
@free are you able to share why Arweave can advertise a cost that is much lower than the IC?
Thank you for your answer! I understood. I have one more related question.
Is the low cost of storage the reason for being able to store large files on Internet Computer or is there another reason? It is my understanding that Ethereum cannot store large files due to the high cost, the low maximum size per block and the low TPS.
To my understanding the low cost is due to the consensus mechanism that the IC uses. In Ethereum the files are stored in every machine. This as a result tends to be much more costly.
Internet Computer’s consensus mechanisms ensure network efficiency, network stability, fast finality, and high security. However, I believe it does not contribute directly to the cost of storage.
It may depend on the definition of the consensus mechanism.
In Ethereum the files are stored in every machine. This as a result tends to be much more costly.
Thank you for your answer! I agree with you.
They are very different classes of storage. I don’t know much about Arweave, but it looks very much like cold storage: you upload some data and it stays there forever, but you can never modify it. They do have a “Use Arweave as a database” section on their website, but if you click through, all it talks about is discovering and retrieving data. Not a general purpose database, with records you can update at will.
Arweave also relies on “people who have hard drive space” for the actual data storage. Meaning that on the one hand the actual hardware costs are cheap (you don’t even need a fully dedicated machine, much less high end hardware); and on the other, that you probably get relatively low quality of service (I imagine that heavily queried data is widely replicated, but some out-of-the-way piece of data may live on HDDs behind a 200kb/s uplink).
On the other hand, on the IC currently all data is replicated at least 13x, stored on expensive data center SSDs and connected to 10 Gbps network links, with the nearest copy of said data likely being 10 ms away from 90% of potential users. All of it is also very much mutable, so you can build an actual general purpose database on top of it, since it all acts as canister memory (with the frequently used canisters being pretty much in memory at all times, given that a subnet is currently limited to 450 GB of state and the nodes all have 512 GB of RAM).
Which is why we’re looking into storage subnets or possibly relying on external cold storage, such as Arweave or IPFS for the kinds of data that don’t require this level of availability, throughput, latency and mutability.
The reason why Ethereum cannot store large files is that Ethereum stores everything (smart contracts, user requests, data) in its blockchain, forever. And that blockchain is replicated across many of the thousands of Ethereum nodes (some of them only retain parts of the blockchain, but IIUC you need access to the full blockchain for everything to work).
The IC OTOH only stores user and canister messages in its blockchains (including e.g. ingress messages that upload large pieces of data in chunks). But apart from the NNS, said blockchains are not persisted beyond the last few hundred blocks (and seconds). (And the NNS blockchain usually consists of NNS proposals, votes and ledger transactions, so it’s quite limited in size.)
The full state of a subnet is maintained by the replicas making up the subnet and, as said, can go up to 450 GB per subnet (and soon more). But this data is not permanent. Canisters have to pay for storage and when a canister is either deleted or fully runs out of cycles, its state is lost. There’s an obvious trade-off there, which is reflected in the cost.
Thank you for your answer! It is very informative.
I understood that Ethereum stores smart contracts and states while IC does not store smart contracts and states. And I understood that only the last few hundred blocks are stored. But I can’t fully understand. Stable variables in Motoko don’t lose state when the canister is upgraded, but are they not saved? Does it work even if the canister is not saved because it is saved as WASM?
There seems to be a misconception. The IC does not keep all the blocks of its subnet blockchains forever, but it does store the currently deployed canister smart contracts and their state, together with the current subnet states, see IC state manager - Internet Computer Wiki for more details.
I.e. the whole idea of a blockchain as a ledger is that if you follow the full blockchain and replay every block, you will arrive at the current state of the network. And that is what defines the state of the network. E.g. if all copies of a handful of Bitcoin blocks would disappear overnight, the whole state of Bitcoin would be lost forever (imagine trying to figure out the balance of an account from a paper ledger that has had a few pages torn out).
The IC OTOH has made the conscious decision that it is only going to use a subnet’s blockchain as a way of ensuring consensus (i.e. everyone agrees that these are the transactions everyone should be executing), and discard the blocks after every replica has had a chance to process them. And instead it maintains the current state of the subnet (kind of like a computer maintains its current state in memory instead of keeping an ever growing list of key presses and incoming network packets).
From this point of view, the downside of the IC’s approach is that state is not persisted forever, as is the case with Bitcoin and Ethereum. And you cannot reconstruct the state of the subnet at an arbitrary point in the past. The upside is that state (including long since obsolete state) does not need to be persisted forever. So storage is cheaper.
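The difference between the two approaches can be sketched in a few lines of Python. This is a toy model under my own assumptions: "blocks" are just lists of (account, delta) pairs and the class names are made up. It has nothing to do with the actual replica code.

```python
from collections import deque


class ReplayLedger:
    """Keeps every block forever; the state is whatever replaying all of them yields."""

    def __init__(self):
        self.blocks = []

    def append(self, block):
        self.blocks.append(block)

    def state(self):
        balances = {}
        for block in self.blocks:
            for account, delta in block:
                balances[account] = balances.get(account, 0) + delta
        return balances


class StateMachine:
    """Applies each block to the current state and retains only the last few blocks."""

    def __init__(self, keep_last):
        self.balances = {}
        self.recent_blocks = deque(maxlen=keep_last)

    def append(self, block):
        for account, delta in block:
            self.balances[account] = self.balances.get(account, 0) + delta
        self.recent_blocks.append(block)  # older blocks fall off automatically

    def state(self):
        return dict(self.balances)


if __name__ == "__main__":
    blocks = [[("alice", 10)], [("alice", -3), ("bob", 3)], [("bob", -1)]]
    ledger, machine = ReplayLedger(), StateMachine(keep_last=2)
    for block in blocks:
        ledger.append(block)
        machine.append(block)
    print(ledger.state(), len(ledger.blocks))
    print(machine.state(), len(machine.recent_blocks))
```

Both arrive at the same current state, but only the first can reconstruct what the state looked like at an earlier block, and only the second keeps its storage bounded as history grows.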
I misunderstood. I did not know IC state manager. Thank you very much.
Thank you for your answer. My disparate knowledge is now connected. Very interesting.
IC stores only messages in blocks. Old blocks are discarded. It was my understanding that if the cycles needed to maintain the canister run out and the state is destroyed, the state is completely lost as there is no way to recover it. I understood that the old blocks are destroyed, but since they are distributed across multiple nodes, if one node goes down, state is recovered by Resumption.
However, since the State Manager stores canisters and states, storing only messages in blocks may not be a reason why storage is low. If State Manager is not distributed, then it might be a reason why it is low. What do you think?
Thank you, super helpful. What’s the latest in Dfinity’s thinking on storage subnets vs. other blockchains vs. other storage solutions? Is this a major priority?
Not 100% clear on what you are saying, so for the sake of completeness I’m going to clarify (even though it may turn out this is exactly what you were saying in the first place). If a canister completely runs out of cycles, it is actively deleted by the subnet. You should think of a subnet (from this point of view) as a fully replicated virtual computer. Each of the replicas executes the same inputs and has the same state. The state of the subnet is defined by the state of this virtual machine, which is continuously persisted to disk. The state of the subnet could also be fully defined by the blockchain (as is the case with most blockchains), but apart from the NNS subnet, subnet blockchains are not persisted. So they are primarily used as a consensus mechanism.
With a subnet behaving like a virtual machine, if a canister state is deleted (whether explicitly, by its controller or because it ran out of cycles) it is indeed the case that its state cannot be recovered. In this view of a subnet, a canister is equivalent to a Linux or Windows process (a running application). And deleting the canister is equivalent to terminating the process. There is usually no way of resuming the state of a process once it is terminated.
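A very simplified Python model of "pay for storage or lose your state" might look like the following. The `Canister` class, the `charge_storage` function and the fee rate are all hypothetical, and the sketch deliberately ignores any grace period or other intermediate steps the real protocol may apply before deleting a canister.

```python
class Canister:
    """Toy canister: a cycles balance plus some amount of stored state."""

    def __init__(self, cycles: int, stored_bytes: int):
        self.cycles = cycles
        self.stored_bytes = stored_bytes
        self.deleted = False


def charge_storage(canister: Canister, cycles_per_byte_per_day: int, days: int = 1) -> None:
    """Deduct the storage fee; delete the canister (and its state) if it cannot pay."""
    if canister.deleted:
        return
    fee = canister.stored_bytes * cycles_per_byte_per_day * days
    if fee >= canister.cycles:
        # Out of cycles: the state is dropped and cannot be recovered afterwards.
        canister.cycles = 0
        canister.stored_bytes = 0
        canister.deleted = True
    else:
        canister.cycles -= fee


if __name__ == "__main__":
    c = Canister(cycles=100, stored_bytes=10)
    day = 0
    while not c.deleted:
        charge_storage(c, cycles_per_byte_per_day=3)
        day += 1
    print("deleted after", day, "days")
```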
I’m guessing by “storage is low” you mean “storage cost is low”. In that case, not persisting blocks (and thus not persisting all inputs) is very much a reason for why storage costs are lower.
E.g. imagine the amount of storage required to store all Windows updates that you ever applied to your computer. Forever. Some of those may have been half the size of your Windows installation, and there are thousands and thousands of them. Now compare that to the current disk size of your Windows installation. You will find that the former is at least 10x larger than the latter. Now imagine that you go ahead and wipe Windows from your machine (say, 10 GB) and install a very small Linux distribution (say, 1 GB). (We can see this as the equivalent of one canister being deleted and another one installed.) Now the full state of your machine is 1 GB whereas before it used to be 10 GB. And, if you still had to persist all Windows updates you ever installed, it would be 100 GB.
If, for the sake of this analogy, we looked at your computer and its state as the equivalent of a canister, your 1 GB current Linux installation is what the IC would persist as its state. Completely ignoring the fact that it used to have a 10 GB Windows installation; and that over its lifetime said Windows installation required the downloading of 100 GB of updates. A traditional blockchain, OTOH, would represent the state of your computer by those 100 GB of updates plus the 1 GB Linux download, so it would require 100x more space.
So not storing the full blockchain does result in vastly reduced storage over time.
A subnet’s state is identically replicated across all replicas on the subnet. But it is not replicated across all replicas on the IC (point (1) in my original post). And the number of replicas making up an IC subnet is orders of magnitude lower than the number of validators of a traditional blockchain (point (3) in my original post).
Both of these (and point (2), not having to store the full blockchain) contribute to the IC requiring less storage than a traditional blockchain for the same functionality.
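Putting the three factors together with the numbers from the Windows/Linux analogy gives a back-of-the-envelope comparison. The 13 copies come from the current standard subnet size mentioned earlier; the 1000 full copies for a traditional blockchain is purely a hypothetical figure standing in for "thousands of nodes", and the helper name is made up.

```python
def total_footprint_gb(per_copy_gb: float, copies: int) -> float:
    """Network-wide storage: size of one copy times the number of nodes holding a copy."""
    return per_copy_gb * copies


if __name__ == "__main__":
    history_gb = 100 + 1       # every update ever applied plus the Linux download
    current_state_gb = 1       # just the current Linux installation
    full_chain_copies = 1000   # hypothetical number of nodes storing the full chain
    subnet_copies = 13         # replicas on one standard subnet

    traditional = total_footprint_gb(history_gb, full_chain_copies)
    subnet = total_footprint_gb(current_state_gb, subnet_copies)
    print("traditional chain:", traditional, "GB")
    print("single subnet:    ", subnet, "GB")
    print("ratio:            ", traditional / subnet)
```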
Honestly, I don’t know. Each of those approaches has its benefits and downsides. E.g. something like Arweave is cheap, but immutable and (very likely, although I’m not sure) provides limited bandwidth. So it may work for something like personal data storage (like Google Drive) but maybe not if you wanted to build some sort of shared storage (like, e.g. video storage for Netflix). For the latter, storage subnets (with datacenter-level bandwidth and adjustable replication) may be more appropriate. On the other hand, if you want to build something like YouTube (with huge storage requirements and a long tail of never accessed content), maybe some other blockchain or a combination would again be more appropriate.
And I guess (again, without any knowledge of the details) that given the IC’s threshold ECDSA (tECDSA) support you may even be able to implement your own integration with a third party storage solution (e.g. another blockchain), without any direct support from or integration with the IC protocol.
I understand. Thank you very much for your thoughtful response.
Are the blocks publicly available?
NNS blocks are not publicly available. The Internet Identity canister was migrated off the NNS subnet precisely so the blocks could be made publicly available (as Internet Identity login uses update calls and every login would be reflected in the blocks).
But there was still work required to actually make the blocks public. And there was a long thread about public subnets that didn’t really reach a conclusion IIRC.
We can see ICP transactions on the dashboard.
Can we see non-ICP transactions (blocks)?
We may be able to check the transaction history at
dfx canister history [canister-id].
I was wrong. The following command does not exist.
dfx canister history [canister-id].