Identifying a replica/Motoko performance anomaly where larger responses are faster

Hey there,

In our project we’re observing an interesting performance anomaly. We store encrypted messages in canisters and have been trying to identify the best batch sizes for reads and writes. Checking different batch read window sizes (reading 10, 100, 1000, or 10000 messages at once), we got these measurements:

> ./test.exs read_bench
...snip...
Name                 ips        average  deviation         median         99th %
read 10           344.41        2.90 ms    ±18.94%        2.85 ms        4.25 ms
read 1000         100.74        9.93 ms    ±25.17%        9.60 ms       16.75 ms
read 100           21.30       46.96 ms     ±1.89%       46.90 ms       49.54 ms
read 10000          1.51      660.43 ms     ±2.61%      653.82 ms      697.74 ms
...snip...
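(Per message, that works out to roughly 0.29 ms at batch size 10, 0.47 ms at 100, 0.010 ms at 1000, and 0.066 ms at 10000, derived from the averages above.)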

So there is an outlier here when reading messages with a batch size of 1000. Reading a single batch of 100 messages takes ~47 ms on average, but reading a single batch of 1000 messages takes only ~10 ms on average! So we’re getting more data in less time! We’re always calling the same function in the canister: zone_availability_canister/src/DiodeMessages.mo at master · diodechain/zone_availability_canister · GitHub

  public func get_messages_by_range(store: MessageStore, min_message_id: Nat32, in_max_message_id: Nat32) : [Message] {
    let max_message_id = Nat32.min(in_max_message_id, get_max_message_id(store));
    if (min_message_id > max_message_id) {
      return [];
    };

    Iter.range(Nat32.toNat(min_message_id), Nat32.toNat(max_message_id))
    |> Iter.map(_, func (i : Nat) : Message {
        let offset = get_message_offset_by_id(store, Nat32.fromNat(i));
        get_message_by_offset(store, offset);
    })
    |> Iter.toArray(_);
  };

Which is wrapped by the actor:

  public shared query func get_messages_by_range(min_message_id: Nat32, max_message_id: Nat32) : async [DiodeMessages.Message] {
    DiodeMessages.get_messages_by_range(dm, min_message_id, max_message_id);
  };

And the bench code checks the number of returned messages, so we know that part is correct.
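For illustration, one “read 1000” batch boils down to a single call like the sketch below (the ids 1 and 1000 are just example values; Iter.range is inclusive on both ends):

  // One "read 1000" batch: ids 1..1000 inclusive -> 1000 messages,
  // assuming at least 1000 messages have been written to dm.
  let batch = DiodeMessages.get_messages_by_range(dm, 1, 1000);
  assert batch.size() == 1000;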

Going deeper into this and trying to isolate the breaking point, it seems that once we cross a certain response size this “turbo anomaly” kicks in.

It’s wild that performance suddenly improves 10x once we cross a certain threshold. So my question is: what can we do to dig deeper into this and find the root cause? I initially thought there might be some caching in the replica, but the effect does not depend on repeated requests. Could this be a local-only dfx replica issue, or a Motoko compiler anomaly?

Also, is this test even representative of what we will experience once deployed on the real network?

Any suggestions from an ecosystem expert on how to debug this further would be amazing!

Thanks!

What is your store? It may be something in the iter allocation code given what you’ve posted.

Hey @skilesare, the store holds two regions and uses the Region API to load the messages. The whole get_message_by_offset function is actually pretty short, so I’m copying it here:

  private func get_message_by_offset(store: MessageStore, offset: Nat64) : Message {
    let id = Region.loadNat32(store.inbox.region, offset);
    let timestamp = Region.loadNat32(store.inbox.region, offset + 4);
    let key_id = Region.loadBlob(store.inbox.region, offset + 8, 20);
    let next_msg_id = Region.loadNat32(store.inbox.region, offset + 28);
    let prev_msg_id = Region.loadNat32(store.inbox.region, offset + 32);
    let hash = Region.loadBlob(store.inbox.region, offset + 36, 32);
    let payload_offset = Region.loadNat64(store.inbox.region, offset + 68);
    let payload_size = Region.loadNat32(store.inbox.region, offset + 76);
    let ciphertext = Region.loadBlob(store.payloads.region, payload_offset, Nat32.toNat(payload_size));

    return {
      id = id;
      timestamp = timestamp;
      key_id = key_id;
      next_msg_id = next_msg_id;
      prev_msg_id = prev_msg_id;
      hash = hash;
      ciphertext = ciphertext;
    };
  };
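For reference, the Message record assembled above has this shape (the field types follow directly from the Region loads; apart from the ciphertext, whose size comes from payload_size, every field is fixed-size):

  // Shape of one returned message, as implied by the loads above.
  public type Message = {
    id : Nat32;
    timestamp : Nat32;
    key_id : Blob;      // 20 bytes
    next_msg_id : Nat32;
    prev_msg_id : Nat32;
    hash : Blob;        // 32 bytes
    ciphertext : Blob;  // payload_size bytes read from the payloads region
  };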

And I didn’t mention it above, but the versions are:

dfx 0.24.0
moc 0.13.0

Hi Dominic,

It could be interesting to trace the cycle/instruction consumption in the canister, to see whether the overhead comes from the canister itself or from dfx/the replica, e.g. by logging the following at the end of the query message:

Prim.debugPrint("Total instructions " # debug_show(Prim.performanceCounter(0)));
Prim.debugPrint("Mutator instructions " # debug_show(Prim.rts_mutator_instructions()));
Prim.debugPrint("Collector instructions " # debug_show(Prim.rts_collector_instructions()));