Upgrade to latest replica: CHANGELOG and errors

I’m trying to upgrade the replica in the Juno Docker image from 3d6a76...ce96ee93 to the latest release e54d3fa34ded...05fa4b5d564743 (see PR), but it doesn’t work out of the box.

Where is the CHANGELOG, or what needs to be adapted to spin up this new version locally?

For example, this is the output I get when it crashes:


juno-console-1  | # TYPE artifact_manager_client_processing_interval_seconds histogram
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="0"} 0
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="0.1"} 0
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="0.2"} 0
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="0.3"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="0.4"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="0.5"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="0.6"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="0.8"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="1"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="1.2"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="1.5"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="2"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="2.2"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="2.5"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="5"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="8"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="10"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="15"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="20"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="50"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="canisterhttp",le="+Inf"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_sum{client="canisterhttp"} 29.59558589199999
juno-console-1  | artifact_manager_client_processing_interval_seconds_count{client="canisterhttp"} 146
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="0"} 0
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="0.1"} 62
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="0.2"} 62
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="0.3"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="0.4"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="0.5"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="0.6"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="0.8"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="1"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="1.2"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="1.5"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="2"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="2.2"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="2.5"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="5"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="8"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="10"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="15"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="20"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="50"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="certification",le="+Inf"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_sum{client="certification"} 29.53086163499999
juno-console-1  | artifact_manager_client_processing_interval_seconds_count{client="certification"} 207
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="0"} 0
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="0.1"} 317
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="0.2"} 317
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="0.3"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="0.4"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="0.5"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="0.6"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="0.8"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="1"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="1.2"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="1.5"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="2"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="2.2"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="2.5"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="5"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="8"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="10"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="15"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="20"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="50"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="consensus",le="+Inf"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_sum{client="consensus"} 28.432888392000006
juno-console-1  | artifact_manager_client_processing_interval_seconds_count{client="consensus"} 454
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="dkg",le="0"} 0
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="dkg",le="0.1"} 1
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="dkg",le="0.2"} 1
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="dkg",le="0.3"} 139
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="dkg",le="0.4"} 139
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="dkg",le="0.5"} 139
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="dkg",le="0.6"} 139
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="dkg",le="0.8"} 139
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="dkg",le="1"} 139
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="dkg",le="1.2"} 139
juno-console-1  | artifact_manager_client_processing_interval_seconds_bucket{client="dkg",le="1.5"} 139
juno-console-1  | 
// etc. for hundreds of lines
artifact_pool_consensus_height_stat{pool_type="validated",stat="min",type="random_tape"} 1
juno-console-1  | artifact_pool_consensus_height_stat{pool_type="validated",stat="min",type="random_tape_share"} 1
juno-console-1  | # HELP artifact_pool_op_duration_seconds The time it took to perform an operation on the given pool
juno-console-1  | # TYPE artifact_pool_op_duration_seconds histogram
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="unvalidated",le="0.0001"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="unvalidated",le="0.0002"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="unvalidated",le="0.0005"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="unvalidated",le="0.001"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="unvalidated",le="0.002"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="unvalidated",le="0.005"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="unvalidated",le="0.01"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="unvalidated",le="0.02"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="unvalidated",le="0.05"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="unvalidated",le="0.1"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="unvalidated",le="0.2"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="unvalidated",le="0.5"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="unvalidated",le="+Inf"} 32
juno-console-1  | artifact_pool_op_duration_seconds_sum{op="purge_below",pool="ingress",pool_type="unvalidated"} 0.000029124000000000008
juno-console-1  | artifact_pool_op_duration_seconds_count{op="purge_below",pool="ingress",pool_type="unvalidated"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="validated",le="0.0001"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="validated",le="0.0002"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="validated",le="0.0005"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="validated",le="0.001"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="validated",le="0.002"} 32
juno-console-1  | artifact_pool_op_duration_seconds_bucket{op="purge_below",pool="ingress",pool_type="validated",le="0.005"} 32
juno-console-1  | thread 'logger' panicked at external/crate_index__slog-2.7.0/src/lib.rs:1944:33:
juno-console-1  | slog::Fuse Drain: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }
juno-console-1  | note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
juno-console-1  | ./docker/replica: line 16:    16 Aborted                 ./target/ic-starter --replica-path ./target/replica --http-port "$REPLICA_PORT" --state-dir "$STATE_REPLICA_DIR" --create-funds-whitelist '*' --subnet-type application --chain-key-ids ecdsa:Secp256k1:juno_test_key --chain-key-ids schnorr:Bip340Secp256k1:juno_test_key --log-level warning --use-specified-ids-allocation-range --consensus-pool-backend lmdb --subnet-features canister_sandboxing --subnet-features http_requests --initial-notary-delay-millis 600 --canister-http-uds-path /juno/.juno/sock
juno-console-1  | 2024-10-23T18:03:49.434413Z  WARN icx_proxy_dev::proxy::agent: Other error: An error happened during communication with the replica: error trying to connect: tcp connect error: Connection refused (os error 111)
juno-console-1  | 2024-10-23T18:03:49.434470Z  WARN icx_proxy_dev::proxy::agent: Other error: An error happened during communication with the replica: error trying to connect: tcp connect error: Connection refused (os error 111)
juno-console-1  | 2024-10-23T18:03:49.434537Z ERROR icx_proxy_dev::proxy: Internal Error during request:
juno-console-1  | error trying to connect: tcp connect error: Connection refused (os error 111)
juno-console-1  | 2024-10-23T18:03:49.450769Z ERROR icx_proxy_dev::proxy: Internal Error during request:
juno-console-1  | An error happened during communication with the replica: error trying to connect: tcp connect error: Connection refused (os error 111)
juno-console-1  | 2024-10-23T18:03:49.450772Z ERROR icx_proxy_dev::proxy: Internal Error during request:
juno-console-1  | An error happened during communication with the replica: error trying to connect: tcp connect error: Connection refused (os error 111)
juno-console-1  | 2024-10-23T18:03:49.451852Z ERROR tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=32 ms
juno-console-1  | 2024-10-23T18:03:49.451983Z ERROR tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=31 ms
juno-console-1  | 2024-10-23T18:03:49.452090Z ERROR tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=24 ms
juno-console-1  | 2024-10-23T18:03:49.858907Z ERROR icx_proxy_dev::proxy: Internal Error during request:
juno-console-1  | error trying to connect: tcp connect error: Connection refused (os error 111)
juno-console-1  | 2024-10-23T18:03:49.859278Z ERROR tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=2 ms
juno-console-1  | 2024-10-23T18:03:50.194834Z  WARN icx_proxy_dev::proxy::agent: Other error: An error happened during communication with the replica: error trying to connect: tcp connect error: Connection refused (os error 111)
juno-console-1  | 2024-10-23T18:03:50.194893Z ERROR icx_proxy_dev::proxy: Internal Error during request:
juno-console-1  | An error happened during communication with the replica: error trying to connect: tcp connect error: Connection refused (os error 111)
juno-console-1  | 2024-10-23T18:03:50.195017Z ERROR tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=1 ms
juno-console-1  | 2024-10-23T18:03:50.624890Z ERROR icx_proxy_dev::proxy: Internal Error during request:
juno-console-1  | error trying to connect: tcp connect error: Connection refused (os error 111)
juno-console-1  | 2024-10-23T18:03:50.625167Z ERROR tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=2 ms
juno-console-1  | 2024-10-23T18:03:52.162431Z ERROR icx_proxy_dev::proxy: Internal Error during request:
juno-console-1  | error trying to connect: tcp connect error: Connection refused (os error 111)
juno-console-1  | 2024-10-23T18:03:52.162721Z ERROR tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=1 ms

I believe the relevant bit is this:

juno-console-1  | thread 'logger' panicked at external/crate_index__slog-2.7.0/src/lib.rs:1944:33:
juno-console-1  | slog::Fuse Drain: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }
juno-console-1  | note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

But I have no idea what kind: WouldBlock, message: "Resource temporarily unavailable" actually means. Seems like the replica fails to set up logging. Maybe running it with the RUST_BACKTRACE=1 environment variable set will give you more context.

Were there any breaking changes or parameters that need to be adapted in the replica, ic-starter, canister_sandbox, compiler_sandbox, sandbox_launcher, and ic-https-outcalls-adapter between the two versions I’m mentioning? Is there any CHANGELOG I can have a look at to see what I missed?

The new replica is also throwing lots of new logs when I start it. Can it be related? What are those new things?

juno-console-1  | ..........Oct 25 04:55:07.440 TRCE # HELP crypto_bls12_381_sig_cache_hits Number of cache hits for successfully verified BLS12-381 threshold signatures
juno-console-1  | # TYPE crypto_bls12_381_sig_cache_hits counter
juno-console-1  | crypto_bls12_381_sig_cache_hits 0
juno-console-1  | # HELP crypto_bls12_381_sig_cache_misses Number of cache misses for successfully verified BLS12-381 threshold signatures
juno-console-1  | # TYPE crypto_bls12_381_sig_cache_misses counter
juno-console-1  | crypto_bls12_381_sig_cache_misses 0
juno-console-1  | # HELP crypto_bls12_381_sig_cache_size Size of cache for successfully verified BLS12-381 threshold signatures
juno-console-1  | # TYPE crypto_bls12_381_sig_cache_size gauge
juno-console-1  | crypto_bls12_381_sig_cache_size 0
juno-console-1  | # HELP crypto_boolean_results Boolean results from crypto operations
juno-console-1  | # TYPE crypto_boolean_results counter
juno-console-1  | crypto_boolean_results{operation="latest_local_idkg_key_exists_in_registry",result="false"} 0
juno-console-1  | crypto_boolean_results{operation="latest_local_idkg_key_exists_in_registry",result="true"} 0
juno-console-1  | # HELP crypto_epoch_in_loaded_nidkg_transcript Epoch in loaded NI-DKG transcript
juno-console-1  | # TYPE crypto_epoch_in_loaded_nidkg_transcript gauge
juno-console-1  | crypto_epoch_in_loaded_nidkg_transcript 0
juno-console-1  | # HELP crypto_fine_grained_verify_dealing_private_duration_seconds Histogram of a verify dealing private call durations in seconds
juno-console-1  | # TYPE crypto_fine_grained_verify_dealing_private_duration_seconds histogram
juno-console-1  | crypto_fine_grained_verify_dealing_private_duration_seconds_bucket{le="0.00035"} 0
juno-console-1  | crypto_fine_grained_verify_dealing_private_duration_seconds_bucket{le="0.00038500000000000003"} 0
juno-console-1  | crypto_fine_grained_verify_dealing_private_duration_seconds_bucket{le="0.00042350000000000005"} 0
juno-console-1  | crypto_fine_grained_verify_dealing_private_duration_seconds_bucket{le="0.0004658500000000001"} 0
juno-console-1  | crypto_fine_grained_verify_dealing_private_duration_seconds_bucket{le="0.0005124350000000001"} 0
juno-console-1  | crypto_fine_grained_verify_dealing_private

The error was due to a breaking change that now requires running ic-starter with the argument --metrics-addr=[100::]:0.
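For anyone hitting the same crash, here is a rough sketch of what the adapted start command might look like. The arguments are copied from the aborted command in the log above, with the new flag added. Appending it at the end, the `METRICS_ADDR` variable name, and the guard on the binary are my own assumptions for illustration, not necessarily how the Juno Docker script does it.

```shell
# The value is quoted because square brackets are glob characters in the shell.
# METRICS_ADDR is a hypothetical variable name used only in this sketch.
METRICS_ADDR='[100::]:0'

# Guarded so the sketch does nothing where the binary is absent.
# REPLICA_PORT and STATE_REPLICA_DIR are expected to be set by the surrounding script.
if [ -x ./target/ic-starter ]; then
  ./target/ic-starter \
    --replica-path ./target/replica \
    --http-port "$REPLICA_PORT" \
    --state-dir "$STATE_REPLICA_DIR" \
    --create-funds-whitelist '*' \
    --subnet-type application \
    --chain-key-ids ecdsa:Secp256k1:juno_test_key \
    --chain-key-ids schnorr:Bip340Secp256k1:juno_test_key \
    --log-level warning \
    --use-specified-ids-allocation-range \
    --consensus-pool-backend lmdb \
    --subnet-features canister_sandboxing \
    --subnet-features http_requests \
    --initial-notary-delay-millis 600 \
    --canister-http-uds-path /juno/.juno/sock \
    --metrics-addr="$METRICS_ADDR"
fi
```

Everything except the last argument is unchanged from the previous invocation.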

This is the second or third time I’ve had to spend time comparing code lines to update local replica tooling. As I’ve mentioned before, it would be really helpful if the foundation treated these tooling projects like open-source projects (not just as “code is public, yolo”) by providing some documentation, such as a CHANGELOG.

Those would be Prometheus metrics, which the internal dashboard (and, to a large extent, the public dashboard) is based on. There’s an option somewhere to dump the metrics to stderr, likely in case of failure (or just when the process terminates, I don’t remember).

There is a CHANGELOG for every replica / GuestOS / HostOS release. There’s even a message being posted to the forum every time a release candidate proposal is made (there was one yesterday, IIRC) with the change log.

There is no CHANGELOG for ic-starter, canister_sandbox, compiler_sandbox and sandbox_launcher, and ic-https-outcalls-adapter.

Btw, besides the forum, where is this CHANGELOG posted? I know about the forum, and I know I can search through the proposals, but is it also available in some sort of markdown file or as GitHub releases content?

The releases themselves appear to be listed at Releases · dfinity/ic · GitHub

I don’t see any attached release notes / changelog, though.

True, although the replica changelog is only a beautified variant of Commits · dfinity/ic · GitHub (limited to actual replica code). So if you go over the commits, you’ll get the combined changelog (plus a bunch of cruft, I guess). It’s not ideal, but it’s what I could dig up.
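If you do end up going over the commits, a small helper along these lines might make it less painful. It is only a sketch: it assumes a local clone of the repository, the function name `changes_between` is made up for this post, and the revisions and path filters are placeholders you would fill in yourself.

```shell
# List the non-merge commits between two revisions, optionally limited to
# the given paths. Run it from inside a clone of the repository.
changes_between() {
  old_rev="$1"
  new_rev="$2"
  shift 2
  # Anything after "--" is treated as a path filter; with no paths given,
  # every commit in the range is listed.
  git log --oneline --no-merges "${old_rev}..${new_rev}" -- "$@"
}

# Usage (OLD_REV, NEW_REV and the path are placeholders, not real values):
#   changes_between "$OLD_REV" "$NEW_REV" path/to/tool
```

Restricting the range to the directories of the tools you actually ship should cut down on most of the cruft.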

It’s exactly what I meant. I’m absolutely not aware of any other open-source project published on GitHub, especially if funded, that releases without any notes and expects developers to go through commits to interpret the release.

Just sharing to help improve the experience for developers, as well as the communication and, by extension, the marketing of the foundation. Anyway, my issue is resolved, and the new version of Juno Docker is published. Thanks for the feedback, much appreciated!


Human-readable change logs for GuestOS are published (under the name “release notes”) on the forum as well as in the proposal. The change logs in the proposal and on the forum are not ultra detailed, because they would be very long otherwise (sometimes in excess of the proposal payload size!).

It bears clarifying that most projects which publish release notes don’t add absolutely every single change to the release notes. For instance, this is how the second most active open source project makes release notes. If you compare their release notes with the actual merged commits, you’ll notice that the release notes are much, much shorter than the actual work that went into the release. That seems pretty standard to me.

I suppose we could improve the release automation so release notes are also posted to the respective GitHub-based release (see here for an example), but what is actually cryptographically signed is the proposal itself. It would also be a good thing if specific canisters that were updated had their respective logs. It’s a matter we’re discussing for sure.


Indeed, some contributors publish detailed notes, while others share general context, which I personally do on Juno (see RELEASES). I’m absolutely fine with both.

What’s really missing, from my developer point of view, is:

  1. A developer-focused way to share notes on the main IC GitHub repo. This is currently nonexistent.

  2. Generally speaking, there’s also no consolidated list of changes. As a developer, I’m usually able to open a CHANGELOG file or scroll down the GitHub releases to see a history of changes, which is useful for historical context and catching up on skipped releases. While proposals could serve this purpose, to my knowledge, there’s no UI/UX to browse them in that way, and they’re far from where the code lives, hence no way of learning easily what changed between two releases.

But again, just sharing my two cents—maybe other devs don’t relate. Absolutely feel free to ignore!!! I’m fine with opening a forum post and grumbling a bit (:wink:) each time I upgrade my Docker image.
