The issue was already mentioned in another forum thread.
Seems like this is a very unlikely corner case in the consensus code. We will submit an NNS proposal to recover the subnet, and continue debugging the root cause with the data after the incident is resolved.
This recovered the subnet, but we did not resolve the root cause. The consensus team will try to get to the bottom of it, although they have limited resources over the holidays.
There will also likely be a post-mortem once we identify and resolve the root cause.
Voted to adopt proposals 134605, 134606 and 134607. Critical fix as explained above, already executed. Great work @sat and team on getting this done so fast!
Yes, we recovered the subnet yesterday without fixing the root cause, and just when the fix for the root cause was about to be rolled out to the subnet, it got stuck again. Bad luck.
For what it’s worth, this subnet is unusable for us: latency is too high and too unpredictable for a social network. As a result, we will be moving to another one, with all the inconveniences that creates. I’m not sure if this is the expected behavior.
Voted to adopt proposals 134623, 134629 and 134632. Critical hotfix for subnet lhg73, which stalled again. Already executed, pending further investigation into the cause, as explained above.
Voted to adopt proposals 134623, 134629 and 134632.
This subnet stalled not long ago and was recovered without fixing the root cause, and just before the root cause was fixed, it stalled again. The issue was discussed here, and IC OS Version Election proposals 134608 and 134609 were made and have been executed, so the issue should be solved. Great work @sat and team.