Currently each response from the IC (from a boundary node, if I understand correctly) has the following headers:
x-ic-canister-id
x-ic-node-id
x-ic-subnet-id
When sending the same query request multiple times, every client gets a response from a different (sometimes the same, of course) boundary node (x-ic-node-id differs).
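For illustration, here is a minimal TypeScript sketch (plain fetch) of how one can observe these headers by repeating the same request. The gateway URL and request count are placeholders, not part of the original report:

```typescript
// Hit the same canister URL a few times and print the x-ic-* headers that
// the boundary node attaches to each response. The URL is a placeholder.
async function sampleNodes(url: string, attempts: number): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    const res = await fetch(url);
    console.log(
      res.headers.get("x-ic-node-id"),
      res.headers.get("x-ic-subnet-id"),
      res.headers.get("x-ic-canister-id"),
    );
  }
}

// Example usage (assumes the canister is reachable through the raw HTTP gateway):
sampleNodes("https://6wih6-siaaa-aaaah-qczva-cai.raw.ic0.app/", 5);
```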
Is it possible to have an additional header like x-ic-last-state-timestamp?
Given multiple responses, I would like to figure out which of them reflects the most recent state and is "up-to-date".
P.S. Maybe @nomeata is the right person to start with?
I'm not too familiar with the inner workings of the boundary nodes, but x-ic-node-id is very likely not the boundary node, rather the actual "normal" node (replica) that is answering this request.
I assume that these headers are added by the boundary node, which does not know the state timestamp. We'd first have to change the query call interface to include that information, and then the boundary node could add it to the response. It would be untrustworthy, though (but so is the other information).
Each line below is request-response information for a particular canister and the query method listings:
As an example, the line INFO | jvm 1 | 2022/05/09 14:45:42 | 2022-05-09 14:45:42 INFO AbstractNFTCollectionCanisterCommand:83 - [6wih6-siaaa-aaaah-qczva-cai]:[icbucks]:[qoxup-yfb2y-2hpsx-6nz4w-3fxcx-eg7yj-wtarr-3dt47-6qiqw-g654f-dae] - 'listings' loaded bytes [76839]:[2030564593] in [1753] ms. means that:
the response was received at 2022/05/09 14:45:42
the response came from node-id qoxup-yfb2y-2hpsx-6nz4w-3fxcx-eg7yj-wtarr-3dt47-6qiqw-g654f-dae
the response size was 76839 bytes
the hash of the response bytes was 2030564593
the response took 1753 milliseconds
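For reference, a rough TypeScript sketch of how log lines in this format could be parsed into structured samples (the field names are mine, chosen to match the description above):

```typescript
// One parsed request-response sample from a log line in the format above.
interface QuerySample {
  timestamp: string;
  canisterId: string;
  method: string;
  nodeId: string;
  bytes: number;
  hash: number;
  millis: number;
}

// Matches: timestamp, [canister]:[method]:[node], bytes, hash, latency.
const LINE_RE =
  /(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*\[([\w-]+)\]:\[(\w+)\]:\[([\w-]+)\] - '.*' loaded bytes \[(\d+)\]:\[(-?\d+)\] in \[(\d+)\] ms/;

function parseLine(line: string): QuerySample | undefined {
  const m = LINE_RE.exec(line);
  if (!m) return undefined;
  return {
    timestamp: m[1],
    canisterId: m[2],
    method: m[3],
    nodeId: m[4],
    bytes: Number(m[5]),
    hash: Number(m[6]),
    millis: Number(m[7]),
  };
}
```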
By analyzing records we can see that:
from 2022/05/09 14:41:11 to 2022/05/09 14:44:26 we got 20 responses from different nodes with the same content (all response contents have the same hash -1278708667)
the state in the canister changed, and from 2022/05/09 14:44:34 to 2022/05/09 14:45:42 we got 8 responses from different nodes with hash 2030564593
then at 2022/05/09 14:45:52 we got outdated content (with the previous hash -1278708667) from the node hy7f5-pw2f5-6qinp-znpu5-6gb25-xmbg2-vunun-uzcuk-fodvd-lwg4m-mqe
all subsequent responses had the correct current hash 2030564593
Please note!
node hy7f5-pw2f5-6qinp-znpu5-6gb25-xmbg2-vunun-uzcuk-fodvd-lwg4m-mqe eventually responded with the correct content twice, at 2022/05/09 14:46:53 and 2022/05/09 14:47:32
node hy7f5-pw2f5-6qinp-znpu5-6gb25-xmbg2-vunun-uzcuk-fodvd-lwg4m-mqe was outdated for at least 1 minute and 18 seconds (from the time we first got the 2030564593 hash until the outdated response from that node)
So the question is:
Is it normal that a node can be out of sync for that long (1.5 minutes)?
P.S. @diegop, please mention someone from the DFINITY devs who can answer.
Hi there @alexeychirkov, thanks for pinging me. I will escalate this to the right folks. (I cannot believe I missed it the first time; I appreciate you pinging a second time.)
x-ic-subnet-id is the IC subnet_id;
x-ic-node-id is the IC node_id;
x-ic-canister-id is the IC canister ID calculated from the URI, length 27.
This is a very pertinent request. We don't have a formally defined view-consistency model for data written on the IC. FWIW, writes using update calls and reads using queries without a global timestamp amount to local consistency, i.e. the weakest form of consistency. A reader can read old data, then new data, followed again by old data, depending on the replica that is chosen to serve the query (boundary nodes choose a random replica to serve a read).
Having said that, an HTTP response header carrying the block number at which the query was answered will NOT solve the developer's woes, especially for developers trying to reason about the consistency model. Timestamps/block heights will let users know which of N responses is the latest, but users still won't know whether the N responses came from the same replica or from an overlapping set of lagging replicas. The question that needs answering is:
Is query response X served from the most up-to-date replica?
Timestamps in headers won't answer this.
So where next?
If one wants to read the most up-to-date state of a canister, the state has to be read using an update call.
OR.
Collect responses from 1/3+ of the nodes, each with a unique node-id; the response with the highest block height is the latest one (see the sketch after this list). This assumes that at least one of the 1/3+ nodes participated in consensus on the update and thus has the highest block height when responding to the query (and also didn't mount a dynamic attack of responding with old data).
Option 2 is no better than an update call. I don't see a workaround for R+W>N (where N = 2/3, i.e. the consensus quorum) for any form of a strong consistency model.
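To make option 2 concrete, here is a hedged TypeScript sketch. It assumes a hypothetical x-ic-block-height response header (no such header exists today; it is exactly the kind of thing being asked for) and simply keeps polling until responses from enough distinct node-ids have been collected:

```typescript
// Option 2 sketch: gather responses from `quorum` distinct replicas and keep
// the one with the highest (hypothetical) block height.
async function freshestResponse(url: string, quorum: number) {
  const byNode = new Map<string, { height: number; body: ArrayBuffer }>();
  while (byNode.size < quorum) {
    const res = await fetch(url);
    const nodeId = res.headers.get("x-ic-node-id");
    const height = Number(res.headers.get("x-ic-block-height")); // hypothetical header
    if (nodeId !== null && !Number.isNaN(height)) {
      byNode.set(nodeId, { height, body: await res.arrayBuffer() });
    }
  }
  // The highest block height among the sampled replicas is the freshest view,
  // assuming at least one of them is honest and up to date.
  return [...byNode.values()].sort((a, b) => b.height - a.height)[0];
}
```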
x-ic-node-id is the replica that responded to the query. It's chosen randomly by the boundary node.
The question was not answered: is it normal that a node can be out of sync for that long (1.5 minutes)?
What is the expected synchronisation delay between nodes?
It is not that the node is "out of sync for 1.5 minutes". What's happening is that this one node is consistently behind, trying to catch up. And, for reasons we are still unable to pinpoint, it is failing to catch up.
What should be happening in an ideal world is that, as long as the subnet is not CPU bound (with canisters executing 1+ seconds' worth of instructions every single round), a replica that has fallen behind for any reason will be able to execute messages at full speed, resulting in a block rate > 1 and eventually catching up. (Arguably the replica should be able to catch up even if the subnet is CPU bound, as the block rate of the subnet is limited by the 1/3 slowest replicas. And there is a whole lot of work around Consensus that a catching-up replica does not need to do.)
What we are seeing instead is this replica actually advancing (very slightly) slower than the up-to-date replicas, even though its average round execution time is around 600 ms, based on which one would expect execution to advance at 1/0.6, or about 1.66 blocks per second. As said above, this is not what we are seeing, meaning the replica stays behind by roughly a constant amount, then does a state sync every 10 or so minutes.
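As a back-of-the-envelope illustration of why catching up should be possible in theory (the 100-block backlog is an assumed number, not a measurement from this subnet):

```typescript
const subnetBlockRate = 1.0;      // blocks/s produced by the subnet
const replicaExecRate = 1 / 0.6;  // ~1.66 blocks/s at ~600 ms per round
const backlogBlocks = 100;        // assumed backlog of the lagging replica

// In theory the gap closes at the difference of the two rates:
const catchUpSeconds = backlogBlocks / (replicaExecRate - subnetBlockRate);
console.log(catchUpSeconds.toFixed(0)); // ~150 s to clear a 100-block backlog
```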
As for how far behind is "normal", that depends very much on the subnet. Assuming the case that we're seeing (a catching-up replica not managing to outpace the up-to-date replicas), the delta between the catching-up replica and the up-to-date ones is basically dictated by how long it takes to complete state sync. On this particular subnet (pjljw, with about 5.5K canisters and 130 GB of state):
computing the manifest (producing a hash of the whole 130 GB of state) takes about 15 seconds;
followed by 30+ seconds of long-tail latency before the CUP is certified (before 2/3 of the replicas have computed the manifest and agreed on the result);
the catching-up replica takes about 15 seconds to complete a state sync; and finally
the catching-up replica takes about 20 seconds to verify that it has the full, correct state before it starts execution.
The above all adds up to about 1 minute 20 seconds before the catching-up replica is in the same state as the other replicas, which would seem to match up quite well with what you are seeing.
This is not an issue for the health of the subnet as a whole (we can see that it manages to consistently put out 1 block per second even with 12 out of 13 replicas), but it does cause some proportion of queries to be served from an out-of-date state.
As I described in the earlier post, there is no good way to have N queries handled by N unique replicas.
Boundary nodes choose a random replica for answering the queries. One might get lucky if N queries land on unique replicas, but that's unlikely. There are good reasons for choosing a random replica.
Why not do a single update call instead of N query calls? This doesn't need any further fixes.
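As a sketch of that suggestion using agent-js (the Candid signature of listings is assumed here, based only on the log lines above; adjust it to the canister's real interface):

```typescript
import { Actor, HttpAgent } from "@dfinity/agent";

// Assumed interface: `listings` returning a blob. Declaring it WITHOUT the
// "query" annotation makes agent-js issue an update call, which goes through
// consensus and therefore reflects the latest certified state.
const idlFactory = ({ IDL }: any) =>
  IDL.Service({ listings: IDL.Func([], [IDL.Vec(IDL.Nat8)], []) });

async function readLatestListings(): Promise<unknown> {
  const agent = new HttpAgent({ host: "https://ic0.app" });
  const actor = Actor.createActor(idlFactory, {
    agent,
    canisterId: "6wih6-siaaa-aaaah-qczva-cai",
  });
  return actor.listings();
}
```

The trade-off is latency: an update call waits for consensus (a couple of seconds) instead of returning in the ~1-2 seconds a query takes, but it removes the need to reason about which replica answered.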