Data Processing Subnet

Just throwing out an idea for discussion. I’ve been looking at some of the stuff Saorsa Labs does off-chain and asking the question ‘why are we not doing this on the IC?’

Partly the reason is the lack of tried and tested ICP DB solutions when we were setting up backend services but a bigger part is probably to do with the setup of ICP subnets. I know that this has been discussed in the context of gaming with 3 node subnets proposed. I’d like to go one step further and propose a non-replicated data processing ‘subnet’. Hear me out…

So first point I’ll make is not every bit of information stored requires the same level of security and auditability. Of course mission-critical information should be replicated with consensus being reached on the validity of that information. But what about when this is not the case… for example calculating metrics for ‘internal’ use or for ‘info’ only? What is the benefit of replicating this information over 13 nodes?

Say for example you want to track all ICP account balances for a non-mission-critical ‘info only’ service. To do this the developer wants to take X latest blocks from the ledger canister, check if the send/receive accounts are known, if so update the balances and if not add the account/ balance.

Although this is quite heavy on read/write it’s well within the capabilities of the traditional server/ DB setup. The question is… in it’s current form, how easy would it be to do this on the IC? I’ve been meaning to give it a go but the list of priority dev work is ever-growing…

My initial thought is that the read part isn’t the issue (query calls) and the problem would lie with writing (update calls). Given that update calls could take 2sec each there is a realistic possibility that you simply couldn’t write the data quick enough to keep up with the tip of the chain. Sure, you could batch stuff and spit processing over multiple canisters… but the question is why bother?

Remember this isn’t mission critical information… I don’t really care all that much if a node goes rogue and changes a 1 to 0 (as long as I’ve got a backup). This is much the same with my DB provider. What I care about is cheap and easy way to compute and store information. I think a lot of businesses will have the same requirement.

The problem that the IC faces that simply having this stuff off chain isn’t a great option either because passing information between IC and the outside world is also a consensus bottleneck. Again HTTP outcalls are great when you need validated info… but I just want to pass some useless info out to my DB as quick as possible. Or better still, have this information stored within the ICP ecosystem on a non-replicated data storage and processing subnet. BINGO!

Interested on people’s thoughts on this?

Anyone else running off-chain services? Why?

3 Likes

For the internet computer do truly compete with the giant cloud providers it will need all types of subnet configurations. Your idea is a good conversation starter.

3 Likes

Yeah, TradCloud has many different use cases covered. As an example, have a look at what AWS offers. Mind you, TradCloud is heavily virtualized so different types don’t necessarily mean different hardware, just different virtual settings. But there are various machine types as well.

1 Like
1 Like