BoundaryNode HTTP response headers

Hello

Currently each response from IC (it is a boundary node if I understand it correctly) has following headers:

  • x-ic-canister-id
  • x-ic-node-id
  • x-ic-subnet-id

Sending the same query request multiple times - every client gets response from different (sometimes the same of course) boundary nodes (x-ic-node-id is different).

Is it possible to have additional header like x-ic-last-state-timestamp?

Having multiple responses - I would like to figure out - which of them has the most recent state and is “up-to-date”.

P.S. maybe @nomeata is the right person to start with…?

Thanks!

3 Likes

I’m not too familiar with the inner workings of the boundary nodes. But x-ic-node-id is very likely not the boundary node, but rather the actual “normal” node that is answering this request.

I assume that these headers are added by the boundary node, which does not know the state timestamp. We’d first have to change the query call interface to include that information, and then the boundary node can add it to response. It would be untrustworty, though (but so is the other information).

1 Like

Looks like on mainnet - some nodes are out of sync by 1.5 minutes…

Please take a look at log: Query call node sync state · GitHub

Each line is a request-response information for a particular canister and query method listings:

As an example line: INFO | jvm 1 | 2022/05/09 14:45:42 | 2022-05-09 14:45:42 INFO AbstractNFTCollectionCanisterCommand:83 - [6wih6-siaaa-aaaah-qczva-cai]:[icbucks]:[qoxup-yfb2y-2hpsx-6nz4w-3fxcx-eg7yj-wtarr-3dt47-6qiqw-g654f-dae] - 'listings' loaded bytes [76839]:[2030564593] in [1753] ms. means that:

  • reponse got at 2022/05/09 14:45:42
  • response got from node-id qoxup-yfb2y-2hpsx-6nz4w-3fxcx-eg7yj-wtarr-3dt47-6qiqw-g654f-dae
  • response size was 76839 bytes
  • hash of response bytes was 2030564593
  • reponse got in 1753 milliseconds

By analyzing records we can see that:

  • from 2022/05/09 14:41:11 to 2022/05/09 14:44:26 we got 20 responses from different nodes with the same content (all response contents has the same hash -1278708667)
  • the state in the canister changed and from 2022/05/09 14:44:34 to 2022/05/09 14:45:42 we got 8 responses from different nodes with hash 2030564593
  • then at 2022/05/09 14:45:52 from the node hy7f5-pw2f5-6qinp-znpu5-6gb25-xmbg2-vunun-uzcuk-fodvd-lwg4m-mqe we got updated (with previous hash -1278708667) content
  • all subsequent responses had correct current hash 2030564593

Please note!

  • node hy7f5-pw2f5-6qinp-znpu5-6gb25-xmbg2-vunun-uzcuk-fodvd-lwg4m-mqe finally responded twice with correct at 2022/05/09 14:46:53 and 2022/05/09 14:47:32
  • node hy7f5-pw2f5-6qinp-znpu5-6gb25-xmbg2-vunun-uzcuk-fodvd-lwg4m-mqe was outdated for at least 1 minute and 18 seconds (from time when we first got 2030564593 hash till outdated response on that node…

So question is:
Is is normal that node can be out of sync for that long (1.5 minutes)?

P.S. @diegop please mention someone from Dfinity devs who can answer

5 Likes

@diegop anyone?
It is very critical I guess…

1 Like

Hi there @alexeychirkov thanks for pinging me. I will escalate this to the right folks. (I cannot believe I missed it the first time, i appreciate you pinging a second time).

1 Like

In which timezone are those timestamps @alexeychirkov?

1 Like

Coordinated Universal Time (UTC)

Fwiw folks are looking into this

‘x-ic-subnet-id’ is the IC subnet_id;
‘x-ic-node-id’ is the IC node_id;
x-ic-canister-id is the IC canister ID calculated from the URI, length 27.

1 Like

Hi,

This is a very pertinent request. We don’t have a view consistency model formally defined for data written on the IC. FWIW writes using updates calls and reads using queries without global timestamp amounts to local consistency i.e. weakest form of consistency. A reader can read old data, and new data, followed again by old data depending on the replica that is chosen to serve the query. (boundary nodes choose a random replica to serve a read)

Having said the HTTP response header in the form of the block number at which the query was replied to will NOT solve any developer woes. Esp. for developers trying to reason about the consistency model. Timestamps/block height will let users know out of N response which one is the latest. But users still won’t know if the N responses came from the same replica or overlap of lagging replicas. The question that needs to answer is -

  1. Is query response X served from the most up-to-date replica?

Timestamps in headers won’t answer this.

So where next?

  1. if one wants to read the most up-to-date state of a canister. The state has to be read using an update call.
    OR.
  2. Collect 1/3+ responses with unique node-id and the response with the highest block id is the latest one. Assumes one of the 1/3+ nodes participated in the consensus of the update and thus has the highest block number for responding to the query. (And also didn’t mount a dynamic attack of responding with old data :slight_smile: )

2 is no better than an update call. I don’t see a workaround for R+W>N (where N=2/3 i.e. the consensus quorum) for any form on a strong consistency model.

x-ic-node-id is the replica that responded to the query. Its chosen randomly by the boundary node.

4 Likes

In my reply above I claimed "we don't have a ... view consistency model formally defined for data written on the IC"

Coincidentally, I came across this today

This section specifically calls out the local consistency behavior - Thanks @nomeata

1 Like

The question was not answered: Is is normal that node can be out of sync for that long (1.5 minutes)?
What is expected synchronisation delay time between nodes?

For some cases having ‘x-ic-block-id’ or something like that would be really helpful.

‘x-ic-block-id’ with the comparing possibility (to define which one is the latest one)

It is not that the node is “out of sync for 1.5 minutes”. What’s happening is that this one node is consistently behind, trying to catch up. And, for reasons we are still unable to pinpoint, it is failing to catch up.

What should be happening in an ideal world is that as long as the subnet is not CPU bound (with canisters executing 1+ second worth of instructions every single round), a replica that has fallen behind for any reason will be able to execute messages at full speed, resulting in a block rate > 1 and eventually catching up. (Arguably the replica should be able to catch up even if the subnet is CPU bound, as the block rate of the subnet is limited by the 1/3 slowest replicas. And there is a whole lot of work around Consensus that a catching up replica does not need to do.)

What we are seeing instead is this replica actually making (very slightly) slower advance than up-to-date replicas, even though the average round execution time is around 600 ms. Based on which one would expect execution to advance at 1/0.6 or about 1.66 blocks per second. As said above, this is not what we are seeing, meaning the replica stays behind by about a constant amount, then doing a state sync.every 10 or so minutes.

As for how far behind is “normal”, that depends very much on the subnet. Assuming the case that we’re seeing (catching up replica not managing to outpace up-to-date replicas), the delta between the catching-up replica and the up-to-date ones is basically dictated by how long it takes to complete state sync. On this particular subnet (pjljw, with about 5.5K canisters and 130 GB of state):

  1. computing the manifest (producing a hash of the whole 130 GB of state) takes about 15 seconds;
  2. followed by 30+ seconds of long tail latency before the CUP is certified (before 2/3 of the replicas have done (1) and agreed on the result);
  3. the catching-up replica takes about 15 seconds to complete a state sync; and finally
  4. the catching-up replica takes about 20 seconds to verify that it has the full, correct state before it starts execution.

The above all adds up to about 1 minute 20 seconds before the catching-up replica is in the same state as the other replicas, which would seem to match up quite well with what you are seeing.

This is not an issue for the health of the subnet as a whole (we can see that it manages to consistently put out 1 block per second even with 12 out of 13 replicas), but it does cause some proportion of queries to be served from an out-of-date state.

4 Likes

Thanks for such in depth answer.

How big is the chance that we have “x-ic-block-height” header in reponse to deal with this situation?

1 Like

@alexeychirkov if there is a strong usecase then chances are 100pct.

How would one use the x-block-Id feature ? It would be great to drive this discussion around

  1. The problem ?
  2. The expected solution … and
  3. How exactly will the http header solve the problem

All answers are in previous messages:

problem - one of N requests has old state

solution - store block height from the response header and skip subsequent responses with lower or the same block height

As I described in the earlier post, there is no good way to have N queries, handled by N unique replicas.

Boundary nodes choose a random replica for answering the queries. One might get lucky if N queries land on unique replicas but that’s unlikely. There are good reasons for choosing a random replica.

why not do a single update call, instead of N query calls? - this doesn’t need any further fixes

why not do a single update call, instead of N query calls? - this doesn’t need any further fixes

Update calls are expensive, query calls - for free

Constantly calling canisters (in our NFTGeek case - canisters of each NFT collection) with update calls will drain their cycles.