I’m working on a Rust-based dfinance project on ICP and need to improve our logging approach. Currently, we’re using basic ic_cdk::println! for logging (with over 1,500 calls across our backend), which is becoming problematic since:
Logs are unstructured and lack context (file, line number, timestamp)
No severity levels are implemented
With the 4KB log limit on canisters, we risk losing important information
Difficult to use in practice, especially for incident response
I’m exploring options like:
ic-logger
Has anyone implemented a comprehensive logging solution for production ICP applications? I’d appreciate recommendations on:
Which framework you’ve found most effective in production
Strategies for external log storage/backup beyond the 4KB limit
Best practices for log levels in dev vs. production environments
Any performance considerations specific to the IC environment
Any examples, code snippets, or lessons learned would be incredibly valuable as we look to improve our system.
I see you’re asking the IC dev community to share their experiences, and I’d be happy to hear them too. But as the developer who implemented the current state of canister logging and is planning further improvements to it, I’d like to write down some of my recommendations.
The current implementation was a proof-of-concept designed to be low-barrier, available by default, simple, and free – hence the small 4KiB buffer.
Overcoming the Small Buffer Limit
Log Aggregation: Create an off-chain logging agent to regularly fetch, deduplicate, aggregate, and expose logs
(optional) Custom Logging: Build a wrapper around ic_cdk::print* to add log levels (e.g., [I], [D], [W], [E] for info, debug, warning, error) and decode these on the aggregation side
(optional) Missing Log Detection: Utilize each log entry’s unique increasing IDX to detect gaps, which can indicate either infrequent fetching or oversized messages
(optional) Message Shortening: Apply ellipsizing to long messages by preserving only the beginning and end, with an ellipsis (...) in the middle
(optional) Compression: Explore compressing text logs into a binary format using Rust crates to extend the effective buffer size. I’m not sure how much it helps, but it might stretch the 4KiB limit significantly. Trap messages need to be handled separately, because they are saved as strings by the replica, not by the canister
Log Exposure: Enhance your logging agent to either send logs to third-party services like Datadog or expose them via an HTML page with filtering (by level, time, traps/no_traps) and other sorting options
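To make the custom logging and message shortening steps above more concrete, here is a minimal sketch of the canister-side formatting. The names `Level`, `ellipsize`, and `format_line` are hypothetical helpers for illustration, not part of `ic_cdk` or any existing crate, and the tag format is just the `[I]`/`[D]`/`[W]`/`[E]` convention mentioned above.

```rust
/// Severity levels, encoded as the short tags suggested above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Level {
    Debug,
    Info,
    Warn,
    Error,
}

impl Level {
    /// Short prefix that the off-chain aggregator can decode back into a level.
    pub fn tag(self) -> &'static str {
        match self {
            Level::Debug => "[D]",
            Level::Info => "[I]",
            Level::Warn => "[W]",
            Level::Error => "[E]",
        }
    }
}

/// Shortens `msg` to at most `max_chars` characters by keeping the beginning
/// and the end and putting "..." in the middle.
/// Works on chars rather than bytes so multi-byte UTF-8 is never split.
pub fn ellipsize(msg: &str, max_chars: usize) -> String {
    const ELLIPSIS: &str = "...";
    let len = msg.chars().count();
    if len <= max_chars {
        return msg.to_string();
    }
    if max_chars <= ELLIPSIS.len() {
        // Not enough room for any content: return a (possibly cut) ellipsis.
        return ELLIPSIS.chars().take(max_chars).collect();
    }
    let keep = max_chars - ELLIPSIS.len();
    let tail = keep / 2;
    let head = keep - tail; // the head gets the extra char when `keep` is odd
    let start: String = msg.chars().take(head).collect();
    let end: String = msg.chars().skip(len - tail).collect();
    format!("{}{}{}", start, ELLIPSIS, end)
}

/// Builds one log line: level tag, source location, shortened message.
pub fn format_line(level: Level, file: &str, line: u32, msg: &str, max_chars: usize) -> String {
    format!("{} {}:{} {}", level.tag(), file, line, ellipsize(msg, max_chars))
}
```

Inside a canister, the result would be passed to the existing print call, for example `ic_cdk::println!("{}", format_line(Level::Warn, file!(), line!(), &msg, 200))`, where the 200-character cap is an arbitrary value chosen for illustration. With over 1,500 call sites, wrapping this in a small macro would probably be the practical route.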
There are obviously some limitations to that approach too. For example, Boundary Node rate limiting does not allow fetching logs too often. But for a single canister that produces moderately sized logs, it should be usable.
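Since the agent cannot poll arbitrarily often, it helps to know when entries were lost between two fetches. Below is a rough sketch of the deduplication and gap detection on the aggregator side. It assumes only that every record carries a unique, increasing index. The `LogRecord` struct and `merge_batch` function are hypothetical, and how the batch is actually fetched is left out.

```rust
/// Hypothetical local mirror of a fetched log record.
/// Only the increasing index and the message content matter for this sketch.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct LogRecord {
    pub idx: u64,
    pub content: String,
}

/// Merges a freshly fetched batch into `store`.
/// - Records with an index at or below `last_idx` are treated as duplicates
///   from overlapping fetches and skipped.
/// - Jumps in the index are returned as inclusive `(first_missing, last_missing)` ranges.
/// On the very first batch (`last_idx` is `None`) no gap is reported, because
/// there is nothing to compare against.
pub fn merge_batch(
    store: &mut Vec<LogRecord>,
    last_idx: &mut Option<u64>,
    mut batch: Vec<LogRecord>,
) -> Vec<(u64, u64)> {
    batch.sort_by_key(|r| r.idx);
    let mut gaps = Vec::new();
    for rec in batch {
        match *last_idx {
            // Already stored during a previous fetch.
            Some(last) if rec.idx <= last => continue,
            // One or more entries were evicted before we could fetch them.
            Some(last) if rec.idx > last + 1 => gaps.push((last + 1, rec.idx - 1)),
            _ => {}
        }
        *last_idx = Some(rec.idx);
        store.push(rec);
    }
    gaps
}
```

A non-empty result is a signal to either fetch more often (as far as rate limiting permits) or shorten messages on the canister side.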
These steps can help overcome the buffer limitations and deliver a more robust logging solution.
I am curious to hear what people think about the current state of logging, what their pain points are, and which feature requests are most urgent (apart from increasing the buffer size, which I assume everyone wants).