Dfx canister --network ic stop command throwing error

I’m trying to upgrade my backend canister. I ran a stop command on it about 30 minutes ago and its still has a status of stopping as opposed to stopped. How long should it take for the canister to stop? and do I have to stop the frontend canister before stopping the backend canister?


edit: it has now had a status of stopping for 13 hours.

for a bit more context, I’m attempting to remove a function that I had recently added to my main.mo file since introducing this function came with issues. When I removed the function, I then proceeded to deploy the backend canister and was met with this error message:

Error: The Replica returned an error: code 5, message: "Canister cxi6d-5iaaa-aaaap-qaaka-cai trapped explicitly: canister_pre_upgrade attempted with outstanding message callbacks (try stopping the canister before upgrade)"

so in response to this message, I then attempted to stop the canister before upgrading it and was thats when I ran into the issues with stopping the canister.


update: I discovered an infinite loop in my code. That would explain why the error message is saying canister_pre_upgrade attempted with outstanding message callbacks . I fixed the loop and am now attempting to upgrade the canister to resolve this issue, but of course I can’t upgrade it due to the outstanding callbacks most likely caused by the infinite loop. the infinite loop also seems to be causing me to not be able to stop the canister either. @diegop, would you mind directing me to the person who would be most able to help me resolve this issue. The documentation makes no mention of this and any questions in this forum that are remotely similar went unanswered.

The command actually just errored out. The terminal gave me this message:

Error: The request timed out.

How do I stop this canister? the front end canister stopped with no problem. The backend canister is throwing that error

@paulyoung , @chenyan or @claudio would any of you know why this would be happening?

AFAIK, stop command will only execute once canister is not being used anymore (ie: it has no pending requests to it etc). Not sure if this is causing issues to you or not.

I doubt thats causing the issue. The canister is a test version deployed to the IC. The only person who could be making a request to that canister is me as I am the only person who uses the test version of my app.

Very weird indeed ser. Hope there are people that know more about what could happen with this than me, that will provide you with the answer :slight_smile:

1 Like

The canister seems to be running again. @Jesse Any update?

Hi @Jesse,
Yes this is a tough situation if your canister is stuck in an infinite loop. One option would be to delete your canister and create a brand new one. Another option would be to wait for the canister to run out of cycles (causing it to break out of the loop because it will reply with a reject message). You can then deposit more cycles and do your upgrade.

Sorry that we don’t have a better solution.

@Jesse would you mind sharing some details about the infinite loop or a code snippet?

Do I understand correctly that the stopping canister A calls another canister B in an infinite loop within an async function without checking whether the call was successful or not? So that even if canister B is stopped and rejects calls, canister A continues running in the loop?

below is a code snippet of the code that caused the infinite loop. You’ll notice that i forgot to increment the index variable. My frontend canister makes a call to this backend function.

public func getTotalValueLocked () : async Nat64 {
        var index = 0;
        var totalValueLocked : Nat64 = 0;
        let numberOfProfiles = Trie.size(profiles);
        let profilesIter = Trie.iter(profiles);
        let profilesArray = Iter.toArray(profilesIter);

        while(index < numberOfProfiles){
            let userProfile = profilesArray[index].1;
            let userJournal = userProfile.journal;
            let userBalance = await userJournal.canisterBalance();
            totalValueLocked += userBalance.e8s;
        };

        return totalValueLocked;

    };

so the accurate description of the situation would be, the stopping canister A is called by my frontend canister B. the function within canister A that is being called is trapped in an infinite loop.

1 Like

No update. Last night I was testing a theory which required me to start the canister. When that didn’t work, i went to bed, leaving it as “running”

yikes. Neither of these options are good options at all. I’m worried that if my hypothesized fix turns out not to be the solution, I’ll be deleting and creating canisters every time I wanna test a hypothesis as a solution. and in the event that this issue had made it to production, I’d be forced to choose between letting all the user data be lost or allow the canister to run out of cycles. In production, I keep hundreds of trillions of cycles in my backend canister because it is this canister that facilitates the creation of new canisters each time a user makes an account. It could be months or years before that canister runs out. Both options are infeasible if this bug were to make it to production. If it is in fact the case that an infinite loop is this costly of a mistake, I’m requesting that this be made among top priority by the SDK team. If you are a member of the SDK team or are able to reach them, please do relay this request.

Thanks! Does userJournal.canisterBalance() call some other canister? If so stopping that canister might help to break the infinite loop.

@ulan yes. It calls a canister that was dynamically created using a canister class. So as far as I know, the canister where the loop is occurring is the controller of the canister that is called when userJournal.canisterBalance() is implemented. thus leaving me without the power to stop the canister since my prinicpal isn’t a controller directly.

Yes I agree that this is unreasonable price to pay for an accidental infinite loop. I’ll pass on the request.

2 Likes

@Jesse Are you in this subnet ?

This subnet seems to experience issues with the number of blocks per seconds since a few hours.

I’m having issues upgrading and making calls to my canisters in this subnet (some requests are taking more than 50s and others just time out; this seems to be worse when I try inter-canister calls).

Also, does someone know if there is a way to choose in which subnet you will be deployed ? If not, what kind of system / who is making this decision for you ?

I’m actually not sure. The closest data center location to me would be chicago. So if subnets are geographically oriented then maybe. If all the nodes in that link you sent me are a part of a single subnet, then I can tell that it’s not the case that subnets are specific to a single geographical location and would have to check to see what subnet I’m on. How do i check which subnet I’m on?

You can just copy/paste your canister id into the search bar on the top left

Hi @Jesse and sorry to hear about your troubles here. On top of the recommendations others have offered to help with your immediate problem, I wanted to make sure you’re aware of the Motoko Playground. This should give you a test environment where you can safely run tests and debug your canisters before going to production. Of course it’s not going to help you in this case but it’s good to be aware of this for the future.

1 Like

Thank you for the recommendation. I was previously using motoko playground, but since introducing dynamically created canister class, I haven’t been able to use the motoko playground because it doesn’t allow me to send cycles (a functionality that is necessary for dynamically creating canister classes)