Incident Report
Node: qpt6h
Date & Time: March 14, 2025, 12:12 AM (UTC+5:30)
Incident Summary
The node attempted to download the latest update (Proposal 135696), but the download repeatedly failed. This led to multiple retries, causing the node to degrade and appear offline after rebooting.
Root Causes & Resolutions
- ISP Packet Loss: Possible packet loss during the update caused interruptions and slow downloads, triggering the 60-second timeout and retry. (Fixed)
- Download Loop & Throttling: Repeated download attempts may have led AWS or Cloudflare to throttle the IP. (Resolved after leaving the node offline for an hour)
- Cable Issue: A faulty cable on one port was identified and replaced. (Fixed)
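To give a rough sense of why a download loop could attract throttling, the sketch below estimates an upper bound on the number of attempts between the start of the incident (00:12) and the time the issues were addressed (around 18:00). Only the 60-second timeout and the two timestamps come from this report. The node's actual retry logic is not known here, so the zero-pause loop and the exponential backoff variant (with its `base_pause_s` and `cap_s` values) are purely hypothetical comparisons, not a description of the node software.

```python
from datetime import datetime, timedelta


def fixed_retry_attempts(window: timedelta, timeout_s: int = 60, pause_s: int = 0) -> int:
    """Upper bound on attempts if every attempt runs into the timeout
    and the next one starts after a constant pause (assumed 0 here)."""
    return int(window.total_seconds() // (timeout_s + pause_s))


def backoff_retry_attempts(window: timedelta, timeout_s: int = 60,
                           base_pause_s: int = 60, cap_s: int = 3600) -> int:
    """Same bound, but the pause doubles after each failure up to a cap.
    base_pause_s and cap_s are illustrative values, not taken from the report."""
    total = window.total_seconds()
    attempts, elapsed, pause = 0, 0, base_pause_s
    # An attempt counts only if it can run for the full timeout inside the window.
    while elapsed + timeout_s <= total:
        attempts += 1
        elapsed += timeout_s + pause
        pause = min(pause * 2, cap_s)
    return attempts


# Timestamps from the report (both in UTC+5:30, so the offset cancels out).
start = datetime(2025, 3, 14, 0, 12)
end = datetime(2025, 3, 14, 18, 0)
window = end - start

fixed = fixed_retry_attempts(window)
backoff = backoff_retry_attempts(window)
print(f"fixed 60 s loop: up to {fixed} attempts")
print(f"with backoff:    up to {backoff} attempts")
```

Under these assumptions the fixed loop could issue on the order of a thousand requests from a single IP over the window, while a backoff schedule would issue only a few dozen. This is only an illustration of the scale involved. The report does not confirm that throttling actually occurred or how the retries were spaced.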
These issues were addressed around 6:00 PM (UTC+5:30). Until then, since no proper fix was available, @DRE-Team and @sat considered submitting a proposal to remove the node from the subnet so that it could download the update properly, while keeping me informed.
Now that all issues have been resolved, the node is healthy again. Apologies for any inconvenience caused, and special thanks to @sat for the continued support.