Broken local replica

richardartoul · January 10, 2021, 5:52am

I was experimenting with a simple Matoko program to do some benchmarking of the IC and see what its capable of:

import Array "mo:base/Array";
import Iter "mo:base/Iter";

actor {
    var nums : [[Int]] = [];

    public query func size(): async Int {
        var sum = 0;
        for (x in nums.vals()) {
            sum += x.size();
        };
        return sum;
    };

    public query func sum(): async Int {
        var sum : Int = 0;
        for (x in nums.vals()) {
            for (y in x.vals()) {
                sum += y;
            }
        };
        return sum;
    };

    public func append(size : Int) {
        // TODO: Could pre-alloc this if i could figure out how to cast
        // Int to Nat or make IDL parse a Nat.
        var new : [Int] = [];
        for (i in Iter.range(0, size)) {
            new := Array.append<Int>(new, [i]);
        };
        nums := Array.append<[Int]>(nums, [new]);
    };
};

I found that if I ran a bunch of calls to append in parallel I would push my local replica into a bad state where it would no longer reply to any calls (btw the dfx client doesn’t seem to have a default timeout which is a bit annoying). If I force-restarted my local replica it would recover.

Eventually after doing this long enough I seem to have borked my local replica and it just spam this message when I turn it on:

Jan 10 05:52:00.405 WARN s:fscpm-uiaaa-aaaaa-aaaap-yai/n:fhkdv-npwzk-k6xia-pkinl-pzgfn-tifm2-stjfg-2udsp-v5ejo-zx4ud-nqe/ic_consensus/block_maker Cannot propose block as the locally available validation context is smaller than the parent validation context (locally available=ValidationContext { registry_version: 1, certified_height: 4104, time: Time(1610257920403552000) }, parent context=ValidationContext { registry_version: 1, certified_height: 4157, time: Time(1610256351441464000) })

Now any calls I make with the dfx cli return:

Could not reach the server

I assume I can just wipe my local replica (how do I do that?), but thought I’d bring it up as a bug

richardartoul · January 10, 2021, 5:54am

Sorry my DFX version is 0.6.16

Ori · January 10, 2021, 9:10am

Thanks Richard. The local replica state is local to each project, there’s a .dfx directory created when you first run dfx start within the project directory.

You can run dfx start --clean or manually delete this .dfx directory to reset the state for that project only.

Ori · January 10, 2021, 9:33am

Re the TODO, you could use Nat for the size parameter in your append function there, since Nat is a subtype of Int in Motoko.

For reference, to cast Int to Nat as an absolute value you could use Int.abs() from the base library: https://github.com/dfinity/motoko-base/blob/master/src/Int.mo

richardartoul · January 10, 2021, 3:30pm

Thanks!

Any idea why I’m struggling to make parallel calls? I’m hoping that the dfx CLI is just doing something non thread-safe in my local project and it’s not an issue with the replica itself.

Any suggestions on the best way I can try and bulk load data? The consensus latency is kind of high and the cycle limit limits how much data I can load at once so I’m trying to figure out how I can get a few gibs of data in there quickly without waiting for a bunch of synchronous consensus round trips.

richardartoul · January 10, 2021, 4:50pm

@Ori It seems really easy to break my local replica and get it into a bad state. Is this representative of how full nodes will behave? All it takes is making 10 concurrent update calls for about a minute or so and the whole thing becomes unresponsive and gets into a bad state.

chenyan · January 10, 2021, 7:48pm

Can you share the dfx command you use? I cannot reproduce this locally.

richardartoul · January 10, 2021, 7:52pm

Sure, initially I was doing stuff like this:

#!/bin/bash

for i in {1..5}
do
	dfx canister call rno2w-sqaaa-aaaaa-aaacq-cai append '(10000)'
done

then I switched to doing stuff like this using the front-end (please excuse my horrible javascript):

import cache from 'ic:canisters/cache';

var numRunning = 0;
var numIters = 0;

function run() {
  for (var i = 0; i < 10; i++) {
    numRunning++;
    cache.append(10000).then((response) => {
        console.log(i + " -> " + response);
        numRunning--;

        if (numRunning == 0 && numIters < 100) {
          run()
        }
    });
  }
  numIters++;
}

run();

chenyan · January 10, 2021, 8:15pm

So the bash works fine for me, with 10 iterations. I haven’t tried the JS version, but it seems you are making 10 parallel calls for each append, that’s 10^100 calls…

richardartoul · January 10, 2021, 8:18pm

Try the bash thing with 100 or 1000 calls.

The JS thing is confusing but what it’s doing is running 10 parallel calls, waiting for them to finish, then running 10 parallel calls etc

richardartoul · January 10, 2021, 8:19pm

Either way though I should get some kind of back pressure response no?

Anyone on the network could issue calls like this so the replicas need to be able to apply back pressure.

richardartoul · January 10, 2021, 10:42pm

Hmm so interestingly, I tried deploying to the sodium ic and that seems to work much better. My script did not break the sodium IC and I get error now about my canister exceeding its memory allocation. I should be nowhere near the 4GiB limit but I assume the sodium test net has an artificially low limit since no one is paying for it?

nomeata · January 11, 2021, 6:11pm

The default allocation was reduced to a few dozen MB, but you can still request a higher allocation when you install your code.

richardartoul · January 12, 2021, 3:07am

How do I do that? Is it a command line argument to dfx? (I’m on mobile right now and can’t check)

nomeata · January 12, 2021, 5:59pm

Yes, dfx canister install takes such arguments:

~ $ dfx canister install --help
dfx-canister-install 
Deploys compiled code as a canister on the Internet Computer

USAGE:
    dfx canister install [FLAGS] [OPTIONS] [canister-name]

ARGS:
    <canister-name>    Specifies the canister name to deploy. You must specify either canister
                       name or the --all option

FLAGS:
        --all           Deploys all canisters configured in the project dfx.json files
        --async-call    Specifies not to wait for the result of the call to be returned by polling
                        the replica. Instead return a response ID
    -h, --help          Prints help information
    -V, --version       Prints version information

OPTIONS:
        --argument <argument>                        Specifies the argument to pass to the method
        --argument-type <argument-type>
            Specifies the data type for the argument when making the call using an argument
            [possible values: idl, raw]

    -c, --compute-allocation <compute-allocation>
            Specifies the canister's compute allocation. This should be a percent in the range
            [0..100]

        --memory-allocation <memory-allocation>
            Specifies how much memory the canister is allowed to use in total. This should be a
            value in the range [0..256 TB]

    -m, --mode <mode>
            Specifies the type of deployment. You can set the canister deployment modes to install,
            reinstall, or upgrade [default: install] [possible values: install, reinstall, upgrade]

Topic		Replies	Views
Local Replica Weirdness with "Ping-Pong" Canisters Developers	5	429	June 11, 2021
Getting Error IC0504 Developers	1	438	July 4, 2021
Errors when stress testing locally Developers	6	664	June 16, 2022
More Motoko parentheticals: try _bounded-wait_ calls in local replicas now! Motoko release	7	97	March 18, 2025
Error: The Replica returned an error: code 5, message: "Canister trapped explicitly: could not perform self call" Developers	8	2368	August 25, 2022

Broken local replica

Related topics