We would like to give some context around the security patch proposal 125000 to upgrade the ckBTC Ledger.
On Friday 2023-10-06 afternoon DFINITY found a security bug in the ckBTC Ledger and immediately started a security patch procedure according to the Security Patch Policy and Procedure (see proposal 48792). The security patch was finalized at 4 PM UTC of the same day. The proposal 125000 was then created, voted and executed.
The security bug affected the icrc2_approve endpoint of the ckBTC Ledger. The ledger records when an approval was submitted in an arrival_queue. If the same approval is added for the same (account, spender) pair more than 2 times, instead of updating the element in the approval_queue, a new element was inserted additionally. The outdated elements were never cleaned up and could cause memory contention if a large number of approvals were added.
How it Happened
The bug has been introduced with commit 3706996b7c290e8f5cff9a29532b4d255b995ff5. An appropriate debug_assert was introduced to check whether the arrival_queue has the same size as the data structure containing the approvals. However, due to a configuration error, the assert did not trigger and the bug was not caught by the tests in time.
On 2023-10-05, the ckBTC Ledger upgrade proposal 124868 went live. This upgrade included the commit 370699 and exposed the security bug on mainnet.
My perception of ckBTC being “not a bridge, it’s a mathematical masterpiece” just shattered… Makes me wonder whether to “bet the farm on it” going forward
transparency and verifiability are very important for projects like this. that’s why the details and explanations for (even small) hotfixes are always provided and verifiability of the patches is ensured.
however, the risk of this issue was not high. that allowance entries are very small and an attacker would have to create several millions of entries to cause real harm which is practically impossible. the fix was proposed immediately after the issue was found to avoid potential clean up work, not because it would have really harmed the project.
We all appreciate the transparency but since it took exactly 4min 22 sec from the creation to execution of the proposal not sure anyone would bother to check even the hash after the fact.
No, we didn’t. The bug was a small memory leak in the ckBTC Ledger. We followed the security patch update because we prefer to not expose problems before we patch them but it’s important to understand the impact of what happened and why realistically it wasn’t a concern.
The first point to understand is that the memory leak had small impact on the Ledger. The bug meant that a new approval for an existing allowance could make the Ledger use ~100 bytes more than needed. For this to become a problem, an attacker would have to create millions of approvals with a certain pattern. Creating an approval requires paying a fee and takes time, making the attack not easy to do and trivial to detect.
The second point is that we monitor our canisters and we would be able to spot the memory leak almost immediately once an attacker starts exploiting the bug. All our canisters expose the stable memory usage in the form of metrics and we have dashboards that keep track of them. We also have procedures to prevent memory usage to spiral out of control in case of anomalous memory usage. For instance, we can push an upgrade to the ckBTC Ledger to stop it from accepting new approvals and effectively neutralize the attack. Once the Ledger is frozen, we then would work on patching the issue.
The bug did not affect the functionalities of ckBTC. The only effect of the bug was to make the Ledger use a bit more space than needed for approvals. The ckBTC canisters correctness was not shattered.
Security patches cannot be disclosed before they are executed because otherwise they would expose the issue and somebody could then exploit them. The verification of security patches should still happen after the patch is installed.
@mariop thanks for this post-mortem, and @maria thanks for the initial communication. For trusted environments as such, it’s important (at least for ex-BigTech types like me) to know the maturity of the processes underneath, and that (almost) any mishap can be contained with minimal loss of funds, and most importantly credibility in the platform. Rock on Dfinity