As you can read from Saša's answer, we do use AWS. Since the file that was no longer accessible was older than one year and the new retention period is 6 months, I would guess that ~50% less storage is now used. AFAIK it's mostly build artefacts that you could reproduce yourself from the public repos.
I doubt the costs will be made public. Just out of curiosity: why do you want to know about AWS usage specifically?
Let me preface this by saying I understand there are multiple reasons why any organization would need to use AWS. It's obviously heavily used by almost anyone in the web space and has a ton of use cases.
With that said, in my opinion, it doesn't "look" good when a foundation uses a company/service they are directly competing against. When I (as a user/non-developer) see marketing material saying the IC can do everything AWS can do, but better, and then see the foundation that is promoting the IC using AWS, it makes me wonder: "Why aren't they using their own service if it's so great?"
Like I said, I understand there can be any number of reasons, technical and non-technical, why this may be occurring. I think ICP tech is awesome and I try to use it whenever/wherever I can. But again, as a layperson, it doesn't make the foundation "look" like they are willing to put their money where their mouth is and dump AWS for their own product, if it really has all the capabilities of AWS right now. Just my opinion.
Guys, if anyone needs to store backups of blockchain state or blockchain history, there are a couple of projects that are almost production-ready and designed for exactly THIS use case.
Such solutions allow you to permanently archive critical information such as blockchain state/history in a trustless, decentralized way. Pay once, store forever.
But I have a feeling the community doesn't know about these technologies.
It's important to be honest about why officials choose to hide their dependence on AWS while also bashing AWS. Where is the trust if you choose to hide it?
I'm not totally familiar with the whole "DFINITY storing data on AWS" situation. But in my opinion it's not that weird, as long as they don't run the IC itself or store on-chain data on AWS.
I assume Microsoft also uses services from Google or Amazon and vice versa while they are also competitors in the hosting space.
It doesn't currently have all the capabilities of AWS. I am a layperson and even I know that. There is potential that it might in the coming years or decade.
I've seen games built on the IC, websites, social media dapps, a data storage app, digital marketplaces, etc. The only thing I can think of off the top of my head that I haven't seen is a store that sells/ships physical goods.
I'm genuinely curious: what can't it do that AWS can? Like I said originally, I'm sure there are some things, and that's why Dfinity would be using it, but I don't know what they are.
Totally fair opinion and I agree with most of it. I think once storage subnets are a thing this could change pretty quickly.
Related question: do you think such data should be hosted on-chain? While the IC is an extremely high-availability system, is it really optimal to use it to store everything? I've heard of more than a few situations (not related to the IC) where the service that was down was also hosting things that were needed to get it back up and running, or incidents where the status page is down when the service is down.
Thank you for the links! Do you know how much these storage options cost? I couldn't find it after a bit of looking around…
While I see what you're getting at, I don't think it's fair to say "choose to hide". The data on AWS is relatively unimportant (build artefacts are recoverable from the source code) and AWS going down does not affect mainnet at all.
Store very large amounts of data, and make it cheap. The latest node hardware spec demands 32TB disks, and assuming all of this is available to store data on-chain (it isn't, but we'll skip over that for now) and given the 36 subnets we have right now, the capacity of the IC is 36 * 32TB ≈ 1150TB. I don't think DFINITY should hog more than half the capacity of the IC.
Also, cost is a factor. I don't know AWS costs, but since there is less replication and (some of) their systems are specifically built to store data, it is a lot cheaper. Assuming AWS is 5x cheaper: 600TB on the IC at $5/GB/year is $3M per year, vs. ~$0.6M on AWS, which would free up ~$2.4M per year to fund additional development.
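The back-of-the-envelope math above can be checked in a few lines. All figures here are the assumptions from this post (36 subnets, 32TB disks, $5/GB/year, "AWS is 5x cheaper"), not official numbers:

```python
# Assumptions from the post above, not authoritative figures.
subnets = 36
tb_per_subnet = 32
capacity_tb = subnets * tb_per_subnet            # total raw IC capacity, ~1150 TB

stored_tb = 600                                   # hypothetical DFINITY share (~half)
ic_cost_per_gb_year = 5.0                         # $/GB/year on the IC
aws_cost_per_gb_year = ic_cost_per_gb_year / 5    # "AWS is 5x cheaper"

gb = stored_tb * 1000
ic_yearly = gb * ic_cost_per_gb_year              # cost of hosting it on-chain
aws_yearly = gb * aws_cost_per_gb_year            # cost of hosting it on AWS
savings = ic_yearly - aws_yearly                  # money freed up per year

print(f"capacity: {capacity_tb} TB")
print(f"IC: ${ic_yearly:,.0f}/yr, AWS: ${aws_yearly:,.0f}/yr, "
      f"freed up: ${savings:,.0f}/yr")
```

Which lands on the ~$2.4M/year figure quoted above.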
With that I was also asking why nobody is asking about GCP or Azure. (AFAIK we don't use these at all, but why do people only care about AWS?)
Hey @Severin, thanks for the reply! I think data should be stored wherever makes the most business sense for the organization storing it. If that's AWS right now for Dfinity, then go with it. My comments are purely based on optics. If it's not economically viable to store the amount of data on the IC that you need to store because of current limitations, so be it. But it just doesn't "look" good (and this is just my opinion) when marketing materials make it seem like AWS is legacy tech and people can replace it right now with the IC. From the two images below from https://deck.internetcomputer.org, that's the impression I got.
Also, I fully understand marketing includes things that are not currently possible and will be available at some point down the road. I respect that. Personally, I think if you were using some provider other than AWS, even fewer people would care.
To me, this is just because AWS is the service that has the most "visibility"; it's a company/service that most people have heard of in the media, even just in passing, and it's someone that Dfinity is/will be competing against, even in their marketing material. I'm sure GCP or Azure have had outages, but when AWS has one, it's "newsworthy" just because of the name.
I wholeheartedly agree with allocating resources where they're necessary. Definitely seems like a good tradeoff. As I said, my whole viewpoint and comments are based on optics, nothing more. Thanks again!
Arweave is ~$3.5/GB FOREVER; Kyve and IRYS will add a fixed fee on top of this, but I'm pretty sure it's below $5/GB.
You pay just once and the data is stored forever on Arweave.
Storing things forever is a nice promise, but it's pretty much the opposite of the use case Dfinity is using AWS for. We create vast volumes of build artifacts through CI jobs, which need to exist for a while during testing and development, but most of which can be discarded shortly after.
We've already transitioned other temporary storage, such as build previews for internetcomputer.org, to hosting on the Internet Computer, but to date there still isn't a great solution for the kind of scale that these ephemeral multi-GB files call for.
Also, from the Arweave yellow paper HERE: "The Arweave protocol avoids making it an obligation to store everything, which in turn allows each node to decide for itself which blocks and transactions to store." My understanding of that statement is that any node provider can choose not to store your information if they don't want to, just like in most other protocols.
The Internet Computer doesn't rely on AWS; however, we do utilize AWS S3 as a data store for build artifacts. It's important to clarify that this reliance on AWS is not absolute: we could employ any S3-compatible data storage solution with an HTTPS interface. The choice of AWS S3 is primarily for convenience. Notably, in recent weeks we have begun pushing IC release artifacts to GitHub as well, and we may explore other storage options in the future.
Currently, we have approximately 500TB of IC build artifacts stored on AWS S3. Unfortunately, we cannot disclose the cost publicly due to confidentiality reasons.
One unique aspect of the IC is the privacy of on-chain data. While this may or may not be the primary differentiator in the future, it is a key consideration for us now. The decentralized nature of subnet nodes, spread across independent nodes globally, ensures the safety and integrity of data against malicious actors. Sharing block data, such as through backups to platforms like Arweave, has irreversible consequences. Once data becomes public, there's no turning back. To maintain this privacy, we create subnet backups on private machines, accessible only to a select few individuals. Even I do not have access to these machines. Simultaneously, we are actively exploring better methods to ensure privacy and data backups. Storing encrypted data on public blockchains is not a viable option, as it would offer minimal value.
It's essential to clarify that we are not attempting to hide our reliance on AWS. As mentioned earlier, AWS is a tool in our toolkit, serving as a temporary solution until we transition away from it entirely. Currently, our dependence on AWS is relatively minimal compared to many other blockchain projects.