Timestamp failed to pass the watermark after retrying the configured 3 times

Hi, @kpeacock!

After upgrading agent-js to v1.2, we started to observe the following error while fetching ICP token balances:

Timestamp failed to pass the watermark after retrying the configured 3 times. We cannot guarantee the integrity of the response since it could be a replay attack.

All the other tokens mostly work fine - we’re able to fetch their balances without this error most of the time, but occasionally it shows up for them as well. The error seems to occur roughly 95% of the time for ICP and about 10% of the time for other tokens.

Maybe something is wrong with our latency expectations. How do we fix that?

Thanks in advance!

Interesting! It may have something to do with the volume of transactions coming through the ICP canister. This is great feedback, and there are a few ways we can handle this.

Things you can do right now:

  1. You could wait a second and retry the query if it fails watermarking
  2. You could set a higher retryTimes count
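
For reference, here’s a rough sketch of both workarounds, assuming an HttpAgent-based setup. fetchIcpBalance is a placeholder for whatever query is failing in your app, and the exact option name should be double-checked against your agent-js version:

import { HttpAgent } from '@dfinity/agent';

// Workaround 2: raise the agent's own watermark retry count (the default is 3).
const agent = new HttpAgent({ host: 'https://icp-api.io', retryTimes: 10 });

// Workaround 1: wait a moment and retry the query yourself when it fails watermarking.
async function fetchBalanceWithRetry(
  fetchIcpBalance: () => Promise<bigint>,
  attempts = 3,
  delayMs = 1000,
): Promise<bigint> {
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fetchIcpBalance();
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // Only retry the watermark error; rethrow anything else immediately.
      if (!message.includes('failed to pass the watermark') || attempt === attempts - 1) {
        throw err;
      }
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error('unreachable');
}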

Things I can do:

  1. Add additional / exponential delay to the retries
  2. Test against the ICP ledger on mainnet and set a more reliable default
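
For item 1, the idea would be roughly this (a sketch only, not necessarily how it will land in agent-js):

// Exponential delay between watermark retries: wait baseDelayMs, then 2x, 4x, ...
// so a lagging node has time to catch up before the next attempt.
async function delayBeforeRetry(attempt: number, baseDelayMs = 500): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
}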

Thanks! We will try this first thing tomorrow and report back to you.

I don’t know if it helps, but maybe you could reproduce this issue by putting the browser into network throttling mode.

Yes, retrying more times helps.
Set to 10 just to be sure.

Thanks!

Here’s a PR - I’d appreciate your feedback on the design and naming!

I just faced it too in Oisy Wallet. Any progress on the fix?

Sorry for the delay. The new strategy has been merged today - feat: retry delay strategy by krpeacock · Pull Request #871 · dfinity/agent-js · GitHub

And now the fix is out - Agent-JS 1.3.0 is released!

Hi, I’m having this problem again with agent-js 1.3.0.

Which canister is this targeting? Is it the ICP ledger again?

I believe the delay is working correctly, so this might just mean that the retryTimes count should be increased.

What are these “watermark protections against replay attacks / stale data”? Are they documented somewhere? I would like to understand what causes and throws the error. Is it the gateway, boundary node, or replica? And under what circumstances exactly?

I see two potential places where this problem could happen.

Here blsVerify is passed instead of the actual request. TypeScript doesn’t catch that, because the request is of type any in the definition of pollForResponse.

The fix is:

const { certificate, reply } = await pollForResponse(
  agent,
  ecid,
  requestId,
  pollStrategy,
  undefined, // request: intentionally unset so blsVerify lands in the correct position
  blsVerify,
);

And here blsVerify isn’t propagated through the recursion.

The fix is:

return pollForResponse(agent, canisterId, requestId, strategy, currentRequest, blsVerify);

This is an agent error. It was introduced to prevent stale data from getting through when its timestamp is older than the last known block the agent has seen come back from a call.

This protects against ordinary stale data as well as a malicious MITM replay attack. Since a node can fall behind, a valid canister signature may still come back, but we know that the state may have changed in a more recent block.

In theory, another request or two with a slight delay should hit a different node, or allow the behind node to catch up.
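
To make that concrete, here’s a conceptual sketch of the check (not the actual agent-js source): the agent remembers the newest certificate time it has seen and only accepts responses at least that fresh, retrying a few times before giving up:

// Conceptual sketch only - not the real agent-js implementation.
let watermarkMs = 0; // newest certificate time observed from calls / read_state

function recordCertificateTime(certTimeMs: number): void {
  watermarkMs = Math.max(watermarkMs, certTimeMs);
}

async function readWithWatermark<T>(
  read: () => Promise<{ certTimeMs: number; body: T }>,
  retryTimes = 3,
): Promise<T> {
  for (let attempt = 0; attempt <= retryTimes; attempt++) {
    const response = await read();
    if (response.certTimeMs >= watermarkMs) {
      recordCertificateTime(response.certTimeMs); // fresh enough: accept and advance the watermark
      return response.body;
    }
    // A lagging node can return a validly signed but older certificate;
    // retrying may hit a different node or give this one time to catch up.
  }
  throw new Error(
    'Timestamp failed to pass the watermark after retrying the configured ' +
      `${retryTimes} times. We cannot guarantee the integrity of the response ` +
      'since it could be a replay attack.',
  );
}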

Despite the security advantages of this feature, this has been leading to increased client errors and degrading the user experience.

Thank you for identifying this mistake! I’ve opened a PR here - fix: passing request correctly during pollForResponse Processing status by krpeacock · Pull Request #909 · dfinity/agent-js · GitHub. I don’t know of a good way to test this flow to verify the theory that this was causing the error and that the fix will resolve it, though. There isn’t any tooling I know of to produce a Processing response from a test replica.

So the correct way to handle it is simply to retry the call?

Heya frens,

After updating agent-js to v2.1.2 we encountered this error again.
Is anyone else seeing it again?

@kpeacock, are you by any chance already aware of it?

Thanks

@timo also called this out to me. It’s on my radar and important, but I have to take care of a couple of other things before I can investigate fully. It’s possible this is happening more frequently with the recent higher-load incidents, but I’ll hunt for a flaw in the logic.