DFX Telemetry Proposal
Summary
We are working on adding telemetry collection to the Internet Computer SDK so that we can improve the developer experience based on insights from behavioral data.
We’d like to create one of the most privacy-friendly telemetry collection pipelines:
- Data collection is opt-in only.
- There will be no required telemetry that you cannot opt out of.
- The data will be stored on the IC platform and managed by a canister with open-source code.
- The information will be thoroughly anonymized.
Please let us know what you think!
Intro
DFX is the default tool for developers to create, build, and manage Internet Computer canisters.
We have been hearing community feedback that the developer tools need improvement. For example, a survey of SuperNova participants flagged the developer experience as the #1 concern for the Internet Computer.
What should we improve exactly?
Taking a step back, how do you decide which direction to take the product?
Approach #1: Heuristic.
The product team uses DFX itself, notices rough edges, and adds features to address them.
The good part of this approach is that the team feels deep empathy for users. When you experience firsthand how badly (or how well) something works, it's easy to get motivated to improve it. The team also knows exactly how users feel, down to the tiniest detail, instead of working from general categories like “improve onboarding” or “optimize identity.”
However, the product team’s usage may differ from the community’s. Team members may use the tools less often than the most devoted developers. The team is also cursed by knowing how things work internally, so it doesn’t stumble on the issues that confuse beginners.
Approach #2: Collect user feedback.
The product team would ask users to complete a survey or conduct an in-depth interview. This is a traditional way to do product research.
While this method works, it has many limitations:
- Users don’t like surveys. Participation is usually low. If there’s a reward, some users fill out surveys randomly just to collect the incentive.
- Users don’t necessarily know what’s wrong. Only a minority of developers can express what exactly needs improvement.
- This is the most expensive way to collect information. The team needs to hire a UX researcher to create surveys and conduct deep interviews.
- The feedback cycle is slow. If something goes wrong, it can take weeks or months to learn what users think.
Approach #3: Collect behavioral data ← BEST
Collecting behavioral data directly is often the best way to know what’s happening with the product. You see how usage changes over time, can draw conclusions quickly, and can iterate on features.
The data collected by the product is usually called “telemetry.” The product collects telemetry and sends the data to the server for further analysis.
Today, the product team doesn’t know how many errors DFX users run into. We have to rely on Twitter and the developer forum to find out about SDK failures. Once the data collection infrastructure is in place, it will be straightforward to collect errors, proactively detect spikes, and save the environment information needed to find root causes.
However, data collection can be privacy-invasive. Let’s take Visual Studio Code as an example:
- Opt-in vs. opt-out. The IDE collects telemetry by default without asking for explicit user consent (maybe somewhere in the ToS, but who reads those anyway). If you have never thought about telemetry, chances are you will never know that you can opt out.
- Required telemetry. Furthermore, only the “optional” analytics can be turned off. The “required” analytics will always be sent to Microsoft unless you block them with a personal firewall.
- Hidden processing. You don’t know how the telemetry data is processed on the server and whom it’s shared with. There’s only vague language in the privacy policy.
- Bad anonymization. Anonymization can be inadequate. For example, an IP address can often pinpoint a particular household (residential ISPs typically assign an address to each subscriber’s connection), and it can later be correlated with other activity to reconstruct the developer’s real identity.
Solution
Is there a way to keep most of the benefits of telemetry while making data sharing consensual, voluntary, and as private as possible?
The DFINITY team is working on a telemetry collection system that addresses most of these concerns:
- Opt-in only. Telemetry will be opt-in only. Users will go through a consent prompt before any data hits the server.
- All telemetry is optional. Developers are not required to share ANY data to use the IC.
- Transparent processing. The analytics platform will process the data in an IC canister with open-source code. Users can see exactly how data is stored, anonymized, and aggregated.
- Good anonymization. The analytics platform will drop IP addresses and other unique identifiers from the dataset. Since the data collection and processing code is open source, developers can check the anonymization logic for themselves.
- Purposefulness. We will inform the community about how data-driven decisions are made (e.g., the release notes could say that we added feature X after analyzing telemetry data from the past Y days).
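To make the opt-in and anonymization points more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `TelemetryConfig` type, the `anonymize` and `prepare_for_upload` functions, and the field names in `IDENTIFYING_FIELDS` are illustrative assumptions, not part of DFX or of the planned canister. It shows two rules only: nothing is sent unless the user has explicitly said yes (an unanswered prompt counts as no), and identifying fields are removed before an event is kept. For brevity both steps appear in one function, although the proposal places anonymization in the open-source canister.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Hypothetical set of identifying fields; the real event schema is not defined yet.
IDENTIFYING_FIELDS = {"ip_address", "hostname", "username", "machine_id"}


@dataclass
class TelemetryConfig:
    # None means the user has not answered the consent prompt yet.
    consent: Optional[bool] = None


def anonymize(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event with identifying fields removed."""
    return {key: value for key, value in event.items() if key not in IDENTIFYING_FIELDS}


def prepare_for_upload(
    event: Dict[str, Any], config: TelemetryConfig
) -> Optional[Dict[str, Any]]:
    """Return an anonymized event if the user opted in, otherwise None (nothing is sent)."""
    # Opt-in only: anything other than an explicit "yes" blocks collection.
    if config.consent is not True:
        return None
    return anonymize(event)


if __name__ == "__main__":
    event = {"command": "deploy", "exit_code": 1, "ip_address": "203.0.113.7"}
    print(prepare_for_upload(event, TelemetryConfig()))              # None
    print(prepare_for_upload(event, TelemetryConfig(consent=True)))  # {'command': 'deploy', 'exit_code': 1}
```

Treating an unanswered prompt the same as a refusal is what makes the scheme opt-in rather than opt-out. Dropping fields is only one part of anonymization; the full logic would live in the open-source canister code where anyone can review it.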
Rollout and timing
We plan to start working on the feature in the coming weeks. Watch for updates on the architecture and timeline.