As you can see from Saša’s answer, we do use AWS. Since the file that was no longer accessible was older than one year and the new retention period is 6 months, I would guess ~50% less storage is now used. AFAIK it’s mostly build artefacts that you can reproduce yourself given the public repos.
I doubt the costs will be made public. Just out of curiosity: why do you want to know about AWS usage specifically?
Let me preface this by saying, I understand there are multiple reasons why any organization would need to use AWS. It’s obviously heavily used by almost anyone in the web space and has a ton of use cases.
With that said, in my opinion, it doesn’t “look” good when a foundation uses a company/service it is directly competing against. When I (as a user/non-developer) see marketing material saying the IC can do everything AWS can do, but better, and then see the foundation that is promoting the IC using AWS, it makes me wonder “Why aren’t they using their own service if it’s so great?”
Like I said, I understand there can be any number of reasons, technical and non-technical, why this may be occurring. I think ICP tech is awesome and am trying to use it whenever/wherever I can. But again, as a layperson, it doesn’t make the foundation “look” like they are willing to put their money where their mouth is and dump AWS for their own product if it has all the capabilities of AWS right now. Just my opinion.
Such a solution allows permanently archiving critical information such as blockchain states/history in a trustless, decentralized way. Pay once, store forever.
But I have a feeling the community doesn’t know about these technologies.
I’ve seen games built on the IC, web sites, social media dapps, a data storage app, digital marketplaces, etc. The only thing I can think of off the top of my head that I haven’t seen is a store that sells/ships physical goods.
I’m genuinely curious, what can’t it do that AWS can? Like I said originally, I’m sure there are some things, and that’s why Dfinity would be using it, but I don’t know what they are.
Totally fair opinion and I agree with most of it. I think once storage subnets are a thing this could change pretty quickly.
Related question: Do you think such data should be hosted on-chain? While the IC is an extremely high-availability system, is it really optimal to use it to store everything? I’ve heard of more than a few situations (not related to the IC) where the service that was down was also hosting the things needed to get it back up and running. Or incidents where the status page is down at the same time as the service.
Thank you for the links! Do you know how much these storage options cost? I couldn’t find it after a bit of looking around…
While I see what you’re getting at, I don’t think it’s fair to say ‘choose to hide’. The data on AWS is relatively unimportant (build artefacts are recoverable from the source code) and AWS going down does not affect mainnet at all.
Store very large amounts of data, and make it cheap. The latest node hardware spec demands 32 TB disks, and assuming all of this were available to store data on-chain (it isn’t, but we’ll skip over that for now), given the 36 subnets we have right now the capacity of the IC is 36 × 32 TB ≈ 1150 TB. I don’t think DFINITY should hog more than half the capacity of the IC.
Also, cost is a factor. I don’t know AWS’s exact prices, but since there is less replication and (some of) their systems are built specifically to store data, it is a lot cheaper. Assuming on-chain storage costs $5/GB/year and AWS is 5x cheaper, keeping 600 TB off-chain frees up ~2.4M USD per year to fund additional development (600,000 GB × $4/GB/year saved).
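For concreteness, here is a quick sanity check of the back-of-envelope numbers above. The $5/GB/year on-chain price, the 5x AWS discount, and the 600 TB volume are the post’s own rough assumptions, not published figures:

```python
# Sanity check of the capacity and cost estimates above.
# Assumptions (not official numbers): 36 subnets, 32 TB of usable disk
# per subnet, ~$5/GB/year on-chain vs. 5x cheaper on AWS, 600 TB stored.

subnets = 36
tb_per_subnet = 32
total_capacity_tb = subnets * tb_per_subnet  # ~1150 TB across the IC

stored_tb = 600
on_chain_usd_per_gb_year = 5.0
aws_usd_per_gb_year = on_chain_usd_per_gb_year / 5  # "5x cheaper"

stored_gb = stored_tb * 1000
savings = stored_gb * (on_chain_usd_per_gb_year - aws_usd_per_gb_year)

print(f"Total IC capacity: {total_capacity_tb} TB")
print(f"Yearly savings from off-chain storage: ${savings / 1e6:.1f}M")
```

Under these assumptions the capacity works out to 1152 TB and the yearly savings to $2.4M, matching the figures in the post.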
With that I was also asking why nobody is asking about GCP or Azure. (AFAIK we don’t use these at all, but why do people only care about AWS?)
Hey @Severin, thanks for the reply! I think data should be stored wherever makes the most business sense for the organization storing it. If that’s AWS right now for Dfinity, then go with it. My comments are purely based on optics. If it’s not economically viable to store the amount of data on the IC that you need to store because of current limitations, so be it. But it just doesn’t “look” good (and this is just my opinion) when marketing materials make it seem like AWS is legacy tech and people can replace it right now with the IC. From the two images below from https://deck.internetcomputer.org, that’s the impression I got.
Also, I fully understand marketing includes things that are not currently possible and will be available at some point down the road. I respect that. Personally, I think if you were using some provider other than AWS, even fewer people would care.
To me, this is just because AWS is the service that has the most “visibility”, it’s a company/service that most people have heard of in the media, even just in passing, and it’s someone that Dfinity is/will be competing against, even in their marketing material. I’m sure GCP or Azure have had outages, but when AWS has one, it’s “newsworthy” just because of the name.
I wholeheartedly agree with allocating resources where they’re necessary. Definitely seems like a good tradeoff. As I said, my whole viewpoint and comments are based on optics, nothing more. Thanks again!
Storing things forever is a nice promise, but it’s precisely the opposite of the use case Dfinity is using AWS for. We create vast volumes of build artifacts through CI jobs; they need to exist for a little while for testing and development, but most of them can be discarded shortly after.
We’ve already transitioned other temporary storage, such as build previews for internetcomputer.org, to hosting on the Internet Computer, but to date there still isn’t a great solution for the kind of scale these ephemeral multi-GB files call for.
Also, from the Arweave yellow paper HERE: “The Arweave protocol avoids making it an obligation to store everything, which in turn allows each node to decide for itself which blocks and transactions to store.” My understanding of that statement is any node provider can choose not to store your information if they don’t want to. Just like most other protocols.
The Internet Computer doesn’t rely on AWS; however, we do utilize AWS S3 as a data store for build artifacts. It’s important to clarify that this reliance on AWS is not absolute. We could employ any S3-compatible data storage solution with an HTTPS interface. The choice of AWS S3 is primarily for convenience. Notably, in recent weeks, we have begun pushing IC release artifacts to GitHub as well, and we may explore other storage options in the future.
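To illustrate the “any S3-compatible store” point: with a generic S3 client, only the endpoint URL changes between AWS and another provider; the API calls stay the same. A minimal sketch (the endpoint and bucket names are hypothetical placeholders, not DFINITY’s actual setup):

```python
# Sketch: pointing a generic S3 client at any S3-compatible endpoint.
# Endpoint and bucket names below are hypothetical, for illustration only.

def s3_client_config(endpoint_url: str) -> dict:
    """Build keyword arguments for an S3-compatible client.

    Only the endpoint differs between AWS S3 and another S3-compatible
    provider (MinIO, Backblaze B2, ...); the calls themselves
    (put_object, get_object, ...) are identical.
    """
    return {
        "service_name": "s3",
        "endpoint_url": endpoint_url,
    }

aws_cfg = s3_client_config("https://s3.amazonaws.com")
alt_cfg = s3_client_config("https://s3.example-provider.com")

# With boto3 installed and credentials configured, either config
# creates a working client:
#   import boto3
#   s3 = boto3.client(**aws_cfg)
#   s3.put_object(Bucket="build-artifacts", Key="ic.tar.gz", Body=b"...")
```

This is why the dependency is on the S3 protocol rather than on AWS itself.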
Currently, we have approximately 500TB of IC build artifacts stored on AWS S3. Unfortunately, we cannot disclose the cost publicly due to confidentiality reasons.
One unique aspect of the IC is the privacy of on-chain data. While this may or may not be the primary differentiator in the future, it is a key consideration for us now. The decentralized nature of subnet nodes, spread across independent nodes globally, ensures the safety and integrity of data against malicious actors. Sharing block data, such as through backups to platforms like Arweave, has irreversible consequences. Once data becomes public, there’s no turning back. To maintain this privacy, we create subnet backups on private machines, accessible only to a select few individuals. Even I do not have access to these machines. Simultaneously, we are actively exploring better methods to ensure privacy and data backups. Storing encrypted data on public blockchains is not a viable option, as it would offer minimal value.
It’s essential to clarify that we are not attempting to hide our reliance on AWS. As mentioned earlier, AWS is a tool in our toolkit, serving as a temporary solution until we transition away from it entirely. Currently, our dependence on AWS is relatively minimal compared to many other blockchain projects.