That would surprise me tbh, do you have a canister id for me?
iyehc-lqaaa-aaaap-ab25a-cai.
We have canister logs that show when it started and finished its work.
It started at Oct 9, 2024, 3:11:41 PM UTC.
It finished its work at Oct 10, 2024, 2:50:01 AM UTC.
That's roughly 12 hours, but it's more like an 11-hour delay once you factor in the time the actual work takes.
My guess is that you do this work with a timer over many rounds, and that each round calls a canister on the same (or a similarly contested) subnet to get some value or distribute data? If you don't increase your compute allocation on all of the involved canisters, then yes, you will see this behavior.
Example:
Canister A sets a timer for the next round.
Canister A is not scheduled for 23 minutes.
23 minutes later: Canister A is activated and does work, but awaits Canister B for a value to finish the round.
Canister B is not scheduled for up to 23 minutes.
Up to 23 minutes later (+46): Canister B processes the request and returns the value to A.
Up to 23 minutes later (+69): Canister A is scheduled to process the response and starts loop 2.
If you do a few of these loops, you end up with a very long wait.
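To make that pattern concrete, here is a minimal sketch of the round-per-timer structure, assuming a Rust canister using ic-cdk and ic-cdk-timers (the same shape applies in Motoko); `other_canister` and `get_value` are hypothetical placeholders. Every `await` on a call to a contested canister is another point where the scheduler can delay you.

```rust
use std::time::Duration;
use candid::Principal;

// Hypothetical round driver: each round is kicked off by a timer and has to
// await another canister before it can queue the next round.
fn schedule_next_round(other_canister: Principal) {
    ic_cdk_timers::set_timer(Duration::from_secs(0), move || {
        ic_cdk::spawn(async move {
            // One scheduling delay to run this closure, one for the callee,
            // and one more for us to resume after the await.
            let (_value,): (u64,) = ic_cdk::call(other_canister, "get_value", ())
                .await
                .expect("inter-canister call failed");
            // ... do this round's work, then queue the next round ...
            schedule_next_round(other_canister);
        });
    });
}
```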
If this work can be parallelized at all, you should do so. Initiate all the calls as futures and then await them all at the same time (rough sketch below). Make sure you have enough cycles to cover all the potential calls (up to 500).
I’m guessing your work is minting or sending tokens to a group of accounts? Parallelize it.
See this thread: All or nothing batch transaction ICRC standard? - #14 by icme
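For reference, a minimal sketch of that pattern, assuming a Rust canister with the ic-cdk, futures, and icrc-ledger-types crates (type and crate paths may need adjusting to your setup). The point is to start every icrc1_transfer call before awaiting any of them, so the whole batch costs one await round instead of one per transfer.

```rust
use candid::{Nat, Principal};
use futures::future::join_all;
use ic_cdk::api::call::CallResult;
use icrc_ledger_types::icrc1::transfer::{TransferArg, TransferError};

// Fire off all transfers at once and collect the individual results; failed
// entries can be retried in a later pass.
async fn parallel_transfers(
    ledger: Principal,
    transfers: Vec<TransferArg>,
) -> Vec<CallResult<(Result<Nat, TransferError>,)>> {
    // Build every call future up front; nothing is awaited yet.
    let calls = transfers.into_iter().map(|arg| {
        ic_cdk::call::<_, (Result<Nat, TransferError>,)>(ledger, "icrc1_transfer", (arg,))
    });
    // Await all of them together.
    join_all(calls).await
}
```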
The primary work of the canister is to distribute rewards. The only other canister it interacts with is a ledger canister, and it is already doing work in parallel to some degree: we calculate all the transfers that need to happen and then do batches of 45 icrc1_transfer calls to the ledger. The canister was developed around 4-5 months ago, and at the time we found that doing more than 50 calls resulted in errors related to making too many calls. Perhaps we could make it do 500 and just retry the failures, but last time I did that I ended up with so many failures that it took more time to retry them than if I had just batched at 45.
If the ledger canister had timed out some of our icrc1_transfer calls, then we should have seen errors in our reward payments, but we didn't; they all went through, so I think the ledger canister was actually performing well. We also noticed that once the transfers started to work, they all went through in the normal time it would take, which would indicate that subsequent rounds were being processed in a timely manner, wouldn't it? It seems like there was just a huge 11-hour delay and then all the rounds got processed like normal.
Is it safe to parallelize up to 500 icrc1_transfers now?
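(For context, a rough sketch of the chunked approach described above, reusing the hypothetical `parallel_transfers` helper from the earlier sketch: send the transfers in fixed-size batches and collect anything that failed for a retry pass.)

```rust
use candid::Principal;
use icrc_ledger_types::icrc1::transfer::TransferArg;

// Send transfers in chunks of `batch_size`, returning the ones that failed so
// the caller can retry them later. Builds on the parallel_transfers sketch above.
async fn batched_transfers(
    ledger: Principal,
    transfers: Vec<TransferArg>,
    batch_size: usize,
) -> Vec<TransferArg> {
    let mut failed = Vec::new();
    for chunk in transfers.chunks(batch_size) {
        let results = parallel_transfers(ledger, chunk.to_vec()).await;
        for (arg, result) in chunk.iter().zip(results) {
            // Keep anything rejected at the call level or by the ledger itself.
            if !matches!(result, Ok((Ok(_),))) {
                failed.push(arg.clone());
            }
        }
    }
    failed
}
```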
I see, yeah so if you interact with another canister then it makes more sense. Your canister must definitely get scheduled more frequently than once in 11 hours, but if you make multiple calls to another canister and need to await the result (which means getting scheduled again), I understand how the whole task can take a long time.
That sounds more correct. I guess the only thing we can do is try to bump the batch amount from 45 to closer to 500.
You can fill the outgoing queue with up to 500 calls, but you need the right number of cycles in reserve. Each call reserves 20B cycles. You get them back when the calls complete (minus the actual execution cost, which is far, far lower), but the system has to reserve them because it doesn't know how long it needs to hold the call context for.
500 * 20_000_000_000 = 10_000_000_000_000 = 10T cycles.
So make sure you have at least 10T cycles + your freezing threshold (I'd double it just to be safe), and you should be able to up your parallelization to close to 500. I'd leave some room in case your canister needs to do something else.
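A minimal sketch of that headroom check, assuming the 20B-per-call reservation figure quoted above; `freezing_threshold_cycles` is a hypothetical value you'd take from your own canister settings:

```rust
// Check whether the canister has enough spare cycles to open ~500 calls at once.
const RESERVED_PER_CALL: u128 = 20_000_000_000; // 20B cycles reserved per open call
const MAX_PARALLEL_CALLS: u128 = 500;

fn has_headroom(freezing_threshold_cycles: u128) -> bool {
    // 500 * 20B = 10_000_000_000_000 = 10T cycles of reservations.
    let reserve_needed = MAX_PARALLEL_CALLS * RESERVED_PER_CALL;
    let balance = ic_cdk::api::canister_balance128();
    // Double the reserve as a safety margin, per the suggestion above.
    balance >= 2 * reserve_needed + freezing_threshold_cycles
}
```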
Also, if this is a custom ledger, consider moving it to the @PanIndustrial ledger at GitHub - PanIndustrial-Org/ICRC_fungible: A full implementation of an ICRC 1,2,3 compatible fungible token, which implements ICRC-4, and you could do hundreds of transfers in one call.
Thanks for the recommendations, this should also be a nice boost for us <3
A lot of our staging canisters are affected by the problem at large too. We will also have to add a lot of cycles just to test that our application works. I'm wondering how this will affect smaller developers without our budget.
My knowledge of how the scheduler works is based on @skilesare's comment.
Let’s see what the engineers working on it will figure out. I also have the same scheduling problem with what I am working on, so I am interested in how this gets solved, but we are now far away from the original topic and should move to another thread.
I suppose selecting canisters with multiple algorithms at the same time would make it feel faster, each algorithm getting a different % of the unreserved computation power: 1) by cycle balance, 2) round robin, 3) by canister age (against DoS), 4) by least cycles used. Here (4) is going to make devs who write optimized apps that don't get used a lot happy.
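Purely as an illustration of that idea (entirely hypothetical, not how the replica scheduler actually works), a sketch where each policy gets a fixed share of the free compute per round:

```rust
// Hypothetical split of free scheduler capacity across several selection policies.
#[derive(Clone, Copy)]
enum Policy {
    ByCycleBalance,    // 1) prefer canisters with larger cycle balances
    RoundRobin,        // 2) plain rotation
    ByCanisterAge,     // 3) older canisters first (against DoS)
    ByLeastCyclesUsed, // 4) reward canisters that burn little compute
}

/// Split `free_cores` across the policies by percentage weight.
fn allocate_cores(free_cores: u32) -> Vec<(Policy, u32)> {
    let weights = [
        (Policy::ByCycleBalance, 40u32),
        (Policy::RoundRobin, 30),
        (Policy::ByCanisterAge, 20),
        (Policy::ByLeastCyclesUsed, 10),
    ];
    weights
        .iter()
        .map(|&(policy, pct)| (policy, free_cores * pct / 100))
        .collect()
}
```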
Did they make it through?? Sounds like they haven’t found a solution yet.
Hey guys,
Seems the release DFINITY did yesterday fixed all our issues on our side. We are back to compute allocation 0, and update requests are working correctly.
Thanks DFINITY for your work here.
Some thoughts from people prompted by your reply and the way you handled this situation, @Manu.
I'm a strong supporter of your work and your knowledge, but sometimes you don't respond properly when concerns come from builders on the network.
Instead of telling developers that they are doing it wrong, DFINITY should be working day and night to fix these limitations and deliver a production-ready product faster. It's fine if a dapp builder's dapp can't scale because it isn't using the correct architecture; in that situation you can tell them not to use a single canister per user. BUT ONCE the approach taken by the dapp builder AFFECTS THE NETWORK ITSELF, you can't come back with that answer, because the network should be able to adapt and not be affected by what other people do on it. So does that mean you are going to say to an attacker,
"hey, don't use a canister per user because the network will slow down and this will affect us"? NO, THAT'S NOT THE ANSWER. The network should handle that properly. If what Yral is doing didn't affect other dapps on the subnet and the subnet itself, your answer would be okay, but once this affects everyone else, your answer is simply not professional.
@dominicwilliams @Jan people are worried about the replies of some engineers working at DFINITY sometimes. Thanks
Reflecting on the network limitation discussions… I think a key takeaway is the need for clear guidance. To support builders and ensure a harmonious network, I strongly recommend:
- Developing a Comprehensive Best Practice Guide for building scalable dapps on DFINITY, covering architecture, optimization, and network considerations.
This resource would empower devs to create successful apps while minimizing potential network impacts.