NFTAnvil IC network tests report

We have been running tests on a local replica for the last three weeks.

A few days ago we started testing on the real IC network. The tests burned ~30T cycles.
All canisters are written in Motoko and implement our own NFTA protocol.

The test tries to simulate real usage: minting, transferring, burning, and purchasing.
There were 3 IC network tests.
Each test ran ~300 threads for ~14 hours, adding roughly half a million transactions to the history. One NFTA transaction uses 1 to 3 IC update calls.

During the tests, the subnet's update-call throughput rose from ~13/s to ~90/s.

During the first test, canisters were loaded with ~3T cycles, and at some point we started getting errors:

#system_transient Canister n7njf-sqaaa-aaaai-qcmhq-cai is out of cycles: requested 2000590000 cycles but the available balance is 2770416633363 cycles and the freezing threshold 2790027551738 cycles'

These errors occurred on inter-canister calls, which ruined the cluster's data integrity.
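
For context, these failures surface in Motoko as errors that can be caught with try/catch around the await. Below is a minimal sketch of that pattern; the remote interface, method name, and canister id are placeholders for illustration, not NFTAnvil's actual code.

import Error "mo:base/Error";
import Debug "mo:base/Debug";

actor Caller {
  // Placeholder remote interface; the real NFTA canister interfaces differ.
  let remote : actor { ping : shared () -> async Nat } =
    actor ("aaaaa-aa"); // placeholder canister id

  public func callRemote() : async ?Nat {
    try {
      // A frozen or out-of-cycles callee shows up here as a rejected call.
      ?(await remote.ping())
    } catch (e) {
      // Error.message(e) carries text like the "out of cycles" message above.
      Debug.print("inter-canister call failed: " # Error.message(e));
      null
    }
  }
}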

Additionally, canisters using HashMaps consumed an unreasonable amount of memory: ~700 MB for 65,000 records of (32-byte accountid) → {Nat64, Nat64}. This was no different during local tests.

Canister upgrades worked without issues.
Projected cycle costs were the same as measured costs.

During the next test, canisters were kept loaded with ~7T cycles. There were no inter-canister call errors. :steam_locomotive:

HashMaps were replaced with TrieMaps, and memory consumption dropped from ~700 MB to ~25 MB.

Memory = Prim.rts_memory_size()
Heap = Prim.rts_heap_size()
ICE = Inter-Canister call Errors caught with try-catch
Dashboard here - https://nftanvil.com/dashboard
Code here - https://github.com/infu/nftanvil
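
The Memory and Heap figures come from Motoko's runtime primitives; here is a minimal sketch of a query exposing them (the actor and method names are just for illustration, not necessarily what the dashboard actually calls):

import Prim "mo:⛔";

actor Metrics {
  // rts_memory_size() = total Wasm memory the canister has reserved (bytes);
  // rts_heap_size()   = live Motoko heap inside that memory (bytes).
  public query func stats() : async { memory : Nat; heap : Nat } {
    { memory = Prim.rts_memory_size(); heap = Prim.rts_heap_size() }
  }
}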


This is what the transaction history looks like.

Under load, previously unseen client errors showed up once in a while. However, these can easily be caught and retried:

Error: Server returned an error: Code: 503 () Body: Certified state is not available yet. Please try again...

TL;DR: We are very happy with the results, and launch is in sight. :steam_locomotive:

The current cluster can support:
History transactions (auto-scaling) - unlimited
NFTs (auto-scaling) - unlimited; up to 327 million if no more canisters are added.
Non-fungible token inventories - 32 million; unlimited with rebalancing.
Fungible token accounts - 10 million; a few options are available to scale that up too.

You can try the dapp. It is in test mode, and after authenticating with Internet Identity you are given 8 test ICP. All tokens, fungible and non-fungible, are temporary during the tests and will be erased.
https://nftanvil.com/mint


I just wanted to bump this up. We need more “report from production” posts. There is great info here and it looks like hash maps may just be best avoided. This is unfortunate as I have a number of them already deployed. :grimacing: I guess I’ll switch them to trie maps in the next upgrade.


Was your code Motoko or Rust?

Hey infu,

This is really great stuff, thank you for sharing this!

It’s all Motoko.
TrieMaps have 99% the same interface as HashMaps, so perhaps they can be switched easily. Maybe the problem is in AssocList. I remember getting high memory usage with it too, and it's used inside HashMap: the HashMap is a var array of AssocLists.
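
For illustration, a rough sketch of what such a switch can look like, assuming a map from 32-byte account-id Blobs to a small record (the type and method names are made up, not NFTAnvil's actual code). TrieMap drops the initial-capacity argument but takes the same (equal, hash) pair and exposes the same get/put/delete/entries methods.

// import HashMap "mo:base/HashMap"; // only needed for the "before" version
import TrieMap "mo:base/TrieMap";
import Blob "mo:base/Blob";

actor Balances {
  type Balance = { amount : Nat64; updated : Nat64 };

  // Before:
  // let accounts = HashMap.HashMap<Blob, Balance>(0, Blob.equal, Blob.hash);

  // After: same equality and hash functions, same usage.
  let accounts = TrieMap.TrieMap<Blob, Balance>(Blob.equal, Blob.hash);

  public func put(aid : Blob, amount : Nat64, updated : Nat64) : async () {
    accounts.put(aid, { amount = amount; updated = updated });
  };

  public query func get(aid : Blob) : async ?Balance {
    accounts.get(aid)
  };
}

Neither structure is stable, so in a real canister the contents still have to be copied to a stable var across upgrades, same as with HashMap.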

9 days later

Canisters 3, 4, 5 have bigger images, so there are fewer NFTs in each. They hit the 1 GB memory limit we set instead of the max count limit like 1, 2, 3.

The heap:memory ratio increased slightly on NFT-type canisters (they use only a var array), from 1 : 2.1 to 1 : 2.9.
private stable var _token : [var ?TokenRecord] = Array.init<?TokenRecord>(65535, null);

Upgraded a few times (without changing types) and all is good.

I was surprised to see an even higher freezing_threshold of ~8T cycles. That happened on the biggest, 1.5 GB canister:

Canister p6u54-wyaaa-aaaai-qcmka-cai is out of cycles: requested 2000590000 cycles but the available balance is 6993673709288 cycles and the freezing threshold 8071506052621 cycles

At the frontend (over 5 days), 153 users got 507 errors like these:

  Canister: kbzti-laaaa-aaaai-qe2ma-cai
  Method: config_get (query)
  "Status": "rejected"
  "Code": "SysTransient"
  "Message": "IC0515: Certified state is not available yet. Please try again..." 

These errors happened during two specific hours and then disappeared. Maybe someone fixed them.

41 users got 107 errors like these:
Code: 400 () Body: Specified ingress_expiry not within expected range: Minimum allowed expiry: 2022-03-01 07:57:27.063
These also happened at a specific time and then disappeared.

Overall, Sentry.io reports a 95% crash-free rate, and users seem happy with how it works.

“Crash Free Sessions” is the percentage of sessions in the specified time range not ended by a crash of the application. Crash: the app had an explicit unhandled error or hard crash.


@infu What made you choose TrieMaps over Red Black Trees (RBTree.mo)? Did you try out Red Black Trees and find any results on performance/memory usage or go straight to TrieMap?

@claudio @rossberg Any input on why the Motoko HashMap library takes up 28X memory vs. TrieMap? I would expect it to take up maybe 2-3X (Hash Table Array doubling & collision lists), but this seems like a pretty unexpected result, suggesting that either the HashMap implementation or mutable Arrays (the underlying hash table) take up more memory than expected.

I’m wondering if part of this memory usage comes from the new table that gets created each time: that table gets thrown away, but the memory footprint remains and isn’t being overwritten. I’m referring to line 92, where the new table is created in the replace method of this code: motoko-base/HashMap.mo at master · dfinity/motoko-base · GitHub
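
As a rough illustration of the pattern under discussion, here is a simplified sketch of a load-factor-1 grow step over an array of AssocLists. This is an approximation for discussion only, not the actual motoko-base HashMap.mo code.

import Array "mo:base/Array";
import AssocList "mo:base/AssocList";
import List "mo:base/List";
import Nat32 "mo:base/Nat32";
import Text "mo:base/Text";

module {
  // Once the entry count reaches the table size, allocate a table twice as
  // large and rehash every entry into it. The old table becomes garbage;
  // when that space is reclaimed (and whether the reserved Wasm memory ever
  // shrinks back) is up to the runtime.
  public func grow(
    table : [var AssocList.AssocList<Text, Nat>],
    count : Nat
  ) : [var AssocList.AssocList<Text, Nat>] {
    if (count < table.size()) return table;
    let newSize = if (table.size() == 0) 1 else table.size() * 2;
    let newTable = Array.init<AssocList.AssocList<Text, Nat>>(newSize, null);
    for (bucket in table.vals()) {
      List.iterate<(Text, Nat)>(bucket, func(kv : (Text, Nat)) {
        let i = Nat32.toNat(Text.hash(kv.0)) % newSize;
        newTable[i] := ?(kv, newTable[i]);
      });
    };
    newTable
  };
}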


Tagging #Motoko to hopefully get some additional eyes on this.

I’m also very curious about this large memory discrepancy between HashMap and TrieMap.

@claudio do you have any ideas regarding why the memory usage is so much higher for HashMap? And were there any benchmarks run on the modules in motoko-base beforehand that can be referenced?

Based on the table-resizing code @justmythoughts linked, the load factor for the current HashMap implementation is 1 (the hash table waits until count >= the size of the array before doubling). I could see memory usage being 4X or 10X if the load factor were 0.25 or 0.1 respectively, but it makes no sense that the memory usage is 28X that of TrieMap with a load factor of 1.

I wonder if it has anything to do with this issue, and not necessarily that the HashMap is less space efficient.


I haven’t tried RBTrees. I will check that out when I get to do more tests.

Maybe we should also consider asynchronicity.

What happens to the memory and locally scoped variables when 100 async requests (each doing something with HashMaps) get paused at the same time, “awaiting” inter-canister calls to finish?

My tests were doing 100s of async calls per sec.
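
To make that concrete, here is a minimal sketch of the situation (the ledger interface and names are hypothetical). Everything a suspended call holds on to stays on the heap until its await resolves, so hundreds of concurrent awaits mean hundreds of live suspended frames on top of the maps themselves.

import TrieMap "mo:base/TrieMap";
import Text "mo:base/Text";

actor Worker {
  let cache = TrieMap.TrieMap<Text, Nat>(Text.equal, Text.hash);

  // Hypothetical remote canister; not NFTAnvil's actual interface.
  let ledger : actor { balance : shared Text -> async Nat } =
    actor ("aaaaa-aa"); // placeholder canister id

  public func refresh(key : Text) : async Nat {
    // Locals and captured state (including the map) stay reachable while
    // this call is suspended at the await. With 100s of such calls per
    // second, many suspended frames can be live at the same time.
    let fresh = await ledger.balance(key);
    cache.put(key, fresh);
    fresh
  }
}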