Instruction limit is crushing me!

One aspect of the IC architecture that pretty much requires instruction limits is the fact that canisters are (single-threaded) actors. If instruction limits were removed or just greatly increased, a canister could be stuck executing one message for a very long time (especially if it doesn’t have reserved compute and competes for CPU cores with other canisters), and it could not process any other message in the meantime. So at best one could apply higher instruction limits to “batch processing subnets”, where latency is explicitly not a core requirement.

Then there are all the nitty-gritty details of the implementation, such as checkpointing: it is impossible (or at least very, very hard) to preserve the state of an ongoing execution across a checkpoint (so that a newly added or catching-up replica can resume computation from halfway through). And allowing a message execution to take something like 100+ rounds with a 500-round checkpoint interval, when the canister is not guaranteed to be scheduled for at least 1 out of every 5 rounds, will just lead to aborting and restarting the execution after every checkpoint.
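To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It is a simplified model, not how the replica actually schedules anything: it assumes the execution starts right after a checkpoint, that the canister runs exactly once every `scheduled_every` rounds, and that any execution still in flight at the next checkpoint is aborted. The function and parameter names are made up for illustration.

```python
def wall_clock_rounds(exec_rounds: int, scheduled_every: int) -> int:
    """Rounds that elapse while a message needing `exec_rounds` rounds of
    execution runs on a canister scheduled once every `scheduled_every` rounds."""
    return exec_rounds * scheduled_every


def finishes_before_checkpoint(
    exec_rounds: int, scheduled_every: int, checkpoint_interval: int = 500
) -> bool:
    """True if the execution can complete within one checkpoint interval
    (assumed model: unfinished executions are aborted at the checkpoint)."""
    return wall_clock_rounds(exec_rounds, scheduled_every) <= checkpoint_interval


# 100 rounds of work, scheduled 1 out of every 5 rounds: just fits in 500 rounds.
print(finishes_before_checkpoint(100, 5))   # True
# Scheduled slightly less often, or slightly more work: never finishes.
print(finishes_before_checkpoint(100, 6))   # False
print(finishes_before_checkpoint(120, 5))   # False
```

In the last two cases the execution would be aborted at every checkpoint and restarted from scratch, so it would never complete under this model.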

Introducing concurrency within canisters would likely not help much: if you had a background execution reindexing all (or much) of your data, it would likely lock you out of accessing that same data concurrently. I.e. canister concurrency would pretty much only help with transactions that touch disjoint data / memory areas. So it would be useful for increasing the throughput of something like independent ledger transactions; but not for heavy computation.

“Renting” a subnet (even for a short period of time) might allow one to set it up as desired (arbitrarily high instruction limit, arbitrarily high checkpoint interval, etc.). This may not lead to the expected outcome, though. For example, if one sets up a 4-replica subnet with a one-hour checkpoint interval and two replicas fail within the same hour, at least one of them would have to redo most of one hour’s worth of computation before the subnet can make progress; by which time another replica may fail and the whole process would have to start again.
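Why two failures out of four stall the subnet can be sketched as follows, assuming the usual Byzantine fault-tolerance bound of n ≥ 3f + 1 replicas to tolerate f faulty ones. The helper names are hypothetical and this ignores everything else about how a real subnet behaves.

```python
def max_faulty(n_replicas: int) -> int:
    """Largest f such that n_replicas >= 3 * f + 1 (assumed BFT bound)."""
    return (n_replicas - 1) // 3


def can_make_progress(n_replicas: int, failed: int) -> bool:
    """True if the number of failed replicas is within the tolerated bound."""
    return failed <= max_faulty(n_replicas)


print(max_faulty(4))            # 1
print(can_make_progress(4, 1))  # True
print(can_make_progress(4, 2))  # False
```

So with two of four replicas down, the subnet waits until at least one of them has caught up, which with a one-hour checkpoint interval can mean replaying up to an hour of computation.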
