Documenting a Problematic Idea on Data Ownership

I am recording a problematic idea here aimed at improving user data ownership and data sovereignty while optimizing the storage efficiency of dApps. The reason I call it a problematic idea is that I’ve identified many issues it could introduce, but I haven’t yet found suitable solutions. I want to document it first, await discussion, and gradually refine it.

Problem Statement

I often try out various dApps in the ecosystem, and many claim to return data ownership/sovereignty to users. I often wonder: if the dApp team ceases maintenance and no one tops up cycles for the dApp’s canister, how can I retrieve my own data? For example, if I wrote some blogs or published some artworks, would these data disappear along with the dApp?

I find that very few dApps currently explain how they address this issue. Personally, I believe that in Web3 this is an obligation. If I, as a dApp developer, claim to provide data sovereignty, I should ensure that users can still retrieve their data in some way even if the dApp is shut down.

Some dApps adopt an architecture where a separate canister is created for each user to store their data, aiming to achieve better “data sovereignty.” The intention behind this architecture is commendable, and I believe the designers are working hard for the users’ benefit. However, when considering the questions I raised earlier, does this architecture solve those problems? I don’t think so. Ordinary users still don’t know how to find and manage their own canisters. The dApp team remains responsible for managing these canisters, only now the system complexity has become more daunting, while the team remains immersed in the satisfaction of providing data ownership to users.

If storing all user data in a single canister is analogous to a typical Web2 application storing data in a database, then the alternative design of deploying a separate database instance for each user does not fundamentally change the problem. It only creates additional challenges for the dApp team, such as the one-time deployment fees for a large number of canisters and the need to upgrade thousands or even tens of thousands of canisters during updates instead of just a few. It also requires monitoring the cycles balance of every one of these canisters.

Worse yet, assuming many dApps in the ecosystem adopt the approach of creating a canister for each user (which I am certain is already the case), let’s temporarily set aside the practical significance of this solution (assuming it truly solves the data ownership problem). How would users know which dApps have created independent canisters to store their data? Imagine living in a world where Web3 is widely adopted, and we use services from 50, 100, or even more dApps in our daily lives. If each dApp creates a canister for user data, would users even know how to manage these 100 canisters storing their data? Would this management be efficient? I don’t think so.

My Solution

  • Each user owns a Personal Data Canister (hereafter referred to as a PD Canister), and their Internet Identity has full control over this PD Canister.
  • When a user uses a dApp for the first time with their Internet Identity, the dApp will ask for the user’s PD Canister ID. The dApp will maintain a user index mapping the user’s ID to the PD Canister ID and request restricted read/write permissions.
  • If a user’s PD Canister runs out of storage space, they can create a second PD Canister. Generally, dApps do not need to concern themselves with how many PD Canisters a user has; they only need to focus on the PD Canister linked to the user.
  • Users can manage their PD Canisters through a specific interface, such as viewing which dApps are linked to the canister, how much space is being used, etc. Users can perform management operations like data cleanup, data transfer, and offline backups. They can also transfer the PD Canister to another subnet.
  • Users can adjust encryption and non-encryption strategies for certain data.
  • Users can also authorize and set fees for third-party access to data they are willing to make public.
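
The linking and permission flow in the list above can be sketched roughly as follows. This is a minimal in-memory Python model, not canister code: the names `PDCanister`, `DAppIndex`, `grant`, `write`, `read`, and `link` are hypothetical, principals are modeled as plain strings, and a real implementation would rely on the caller identity supplied by the network rather than a `caller` argument.

```python
from dataclasses import dataclass, field


@dataclass
class PDCanister:
    """Toy model of a Personal Data Canister (hypothetical API)."""
    canister_id: str
    owner: str  # the user's Internet Identity principal, modeled as a string
    grants: dict = field(default_factory=dict)  # dApp ID -> set of permissions
    data: dict = field(default_factory=dict)    # dApp ID -> {key: value}

    def grant(self, caller, dapp, permissions):
        # Only the owner may hand out restricted read/write permissions.
        if caller != self.owner:
            raise PermissionError("only the owner can grant access")
        self.grants[dapp] = set(permissions)

    def write(self, caller, key, value):
        # A dApp can only write inside its own section.
        if "write" not in self.grants.get(caller, set()):
            raise PermissionError(f"{caller} has no write permission")
        self.data.setdefault(caller, {})[key] = value

    def read(self, caller, dapp, key):
        # The owner can read everything; a dApp can read only its own section.
        allowed = caller == self.owner or (
            caller == dapp and "read" in self.grants.get(caller, set())
        )
        if not allowed:
            raise PermissionError(f"{caller} may not read data of {dapp}")
        return self.data[dapp][key]

    def linked_dapps(self):
        # What a management interface could show the user.
        return sorted(self.grants)


class DAppIndex:
    """dApp-side index mapping a user ID to that user's PD Canister ID."""

    def __init__(self):
        self._pd_by_user = {}

    def link(self, user_id, pd_canister_id):
        self._pd_by_user[user_id] = pd_canister_id

    def pd_canister_for(self, user_id):
        return self._pd_by_user.get(user_id)


# First use of a dApp: the user grants access, the dApp records the link.
pd = PDCanister(canister_id="pd-alice", owner="alice-ii")
pd.grant("alice-ii", "blog-dapp", {"read", "write"})
index = DAppIndex()
index.link("alice-ii", pd.canister_id)

pd.write("blog-dapp", "post-1", b"Hello")
print(pd.read("alice-ii", "blog-dapp", "post-1"))
print(pd.linked_dapps())
```

In this sketch the dApp never holds the data itself; it only keeps the index entry, so the user's content stays readable by the owner even if the dApp disappears.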

Users will be responsible for their own canisters, much like they are responsible for their physical devices, such as smartphones. Just as we need to charge our phones, we need to add cycles to our PD Canisters (don’t worry, we will discuss this issue in more detail later).

Let’s briefly look at the benefits we gain:

  • Users achieve stronger and more practical data ownership and data sovereignty.
  • From the user’s perspective, the fragmentation of data storage canisters is alleviated.
  • From the dApp’s perspective, meaningless one-time deployment fees for canisters are significantly reduced.
  • From the network’s perspective, the number of data canisters drops from roughly dApps × Users (one canister per user in every dApp) to roughly dApps + Users, enabling more efficient operation.
  • This is especially conducive to the development of small dApps. With the emergence of Caffeine-AI, more small dApps will appear.
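
To make the network-level point concrete, here is a small worked comparison. The figures are purely illustrative assumptions, and the first function is an upper bound that assumes every user uses every dApp, which overstates real usage.

```python
def canisters_one_per_user_per_dapp(dapps: int, users: int) -> int:
    # Upper bound: every dApp creates one data canister for every user.
    return dapps * users


def canisters_with_pd(dapps: int, users: int) -> int:
    # One canister per dApp plus one PD Canister per user.
    return dapps + users


dapps, users = 100, 10_000  # illustrative numbers, not measurements
print(canisters_one_per_user_per_dapp(dapps, users))  # 1000000
print(canisters_with_pd(dapps, users))                # 10100
```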

Consider this: most dApps do not generate a large amount of user data, except for those involving multimedia resources like images and videos. For example, a blog or article publishing platform might only require a few MiB for dozens of articles. If images are involved, the storage need may be larger but still manageable. Many other dApps may only generate personal preference or settings data. By allocating dedicated sections within a Personal Data Canister, we can efficiently store such data.
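
As a rough illustration of “dedicated sections,” the sketch below keeps one namespace per dApp inside a single store and reports how much space each one uses. The class and method names are hypothetical, and only value sizes are counted; a real canister would also have to account for keys and storage overhead.

```python
class SectionedStore:
    """Toy key-value store with one section per dApp (hypothetical)."""

    def __init__(self) -> None:
        self._sections: dict = {}  # dApp ID -> {key: bytes}

    def put(self, dapp: str, key: str, value: bytes) -> None:
        self._sections.setdefault(dapp, {})[key] = value

    def get(self, dapp: str, key: str) -> bytes:
        return self._sections[dapp][key]

    def usage_bytes(self) -> dict:
        # Bytes used per dApp, counting stored values only.
        return {
            dapp: sum(len(value) for value in section.values())
            for dapp, section in self._sections.items()
        }

    def clear_section(self, dapp: str) -> None:
        # Data cleanup for one dApp without touching the others.
        self._sections.pop(dapp, None)


store = SectionedStore()
store.put("blog-dapp", "post-1", b"x" * 2048)  # a short article
store.put("settings-dapp", "theme", b"dark")   # a preference value
print(store.usage_bytes())  # {'blog-dapp': 2048, 'settings-dapp': 4}
```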

Now, let’s discuss the problems this solution introduces:

Users are responsible for topping up cycles for their PD Canisters.

  • When users first enter the ecosystem, they may not have ICP to top up cycles, which raises the barrier to entry and is very unfriendly. However, this issue can be mitigated through incentive programs, such as providing newly registered users with a PD Canister pre-filled with a certain amount of cycles to experience the ecosystem.
  • If a user remains inactive for an extended period, the cycles in their PD Canister may be depleted for various reasons. If the user fails to address this issue during the retention period, the PD Canister may be released by the network.

At first glance, it might seem burdensome for users to bear the cost of data storage cycles. However, from another perspective, dApps, as service providers, currently shoulder the cost of user data storage, but they ultimately need to profit. In this context, if we view personal data storage separately, dApps essentially act as intermediaries. Intermediaries typically need to profit from price differences, which are hidden within the dApp’s business model. Users often struggle to understand dApp business models, which can be seen as a lack of transparency.

Reducing intermediaries is one of the core principles of Web3, and achieving this step will benefit users in the long run. dApp developers can benefit as well. Their business models become simpler because they no longer need to worry about storing large amounts of user data, maintaining complex systems, upgrading canisters for thousands of users, or monitoring and topping up cycles. Their work becomes simpler, allowing them to focus more on core business development. For small dApps, especially those that only generate user settings or other personal data, this is undoubtedly an optimization.

To address the issue of topping up cycles for users’ PD Canisters, we can introduce a top-up service provider. Users can pre-deposit some ICP, and the service provider will monitor and replenish cycles for them. Centralized handling is more efficient than individual efforts, saving resources while allowing the service provider to earn appropriate profits.
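
A top-up service of this kind could work roughly like the sketch below. Everything here is an assumption made for illustration: the names `ManagedCanister` and `top_up_if_low`, the threshold, the service fee, and the fixed `cycles_per_e8s` conversion rate (the real ICP-to-cycles rate is not fixed). Balances are kept as integers, with ICP expressed in e8s (1 ICP = 10**8 e8s).

```python
from dataclasses import dataclass


@dataclass
class ManagedCanister:
    canister_id: str
    cycles: int       # current cycles balance
    deposit_e8s: int  # ICP pre-deposited by the user, in e8s


def top_up_if_low(c: ManagedCanister, *, threshold: int, amount_e8s: int,
                  fee_e8s: int, cycles_per_e8s: int) -> bool:
    """Top up one canister if it is below the threshold.

    Returns True if a top-up happened, False otherwise.
    """
    if c.cycles >= threshold:
        return False  # balance is healthy, nothing to do
    cost = amount_e8s + fee_e8s  # converted amount plus the provider's fee
    if c.deposit_e8s < cost:
        return False  # deposit exhausted; a real service would notify the user
    c.deposit_e8s -= cost
    c.cycles += amount_e8s * cycles_per_e8s
    return True


# Illustrative numbers only.
pd = ManagedCanister("pd-alice", cycles=100, deposit_e8s=1_000)
done = top_up_if_low(pd, threshold=500, amount_e8s=100, fee_e8s=5,
                     cycles_per_e8s=10)
print(done, pd.cycles, pd.deposit_e8s)  # True 1100 895
```

The provider's profit is the explicit `fee_e8s` here, which keeps the cost visible to the user instead of hiding it inside a dApp's business model.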

I’d like to link a related topic here: “A standard for user owned data canisters?”

@peterparker I think Juno’s storage solution is very close to this, as it inherently provides storage services for other dApps through stable interfaces. However, I believe Juno’s current approach is more focused on serving as a database for dApps, rather than being bound to individual users and allowing different dApps to map to their respective storage areas.

Additionally, I recently watched a video released by DFINITY. The Dmail team initially created an independent canister for each user, but due to cost constraints eventually switched to sharding, so that multiple users share one canister.

Here’s the link: https://youtu.be/ga7uMpDiBk0?si=SZWJ_BzPzOQ71FUR

The Personal Canister has been an obsession of ours; we even ran a one-month event centered around it. My current conclusion is that the tech is here, but it’s missing the one use case where it “clicks” and the network effect spreads.

I don’t think the appeal of owning your data is enough. Most people don’t give a f*ck. Even users who claim to care usually end up back on centralized networks. I’m one of them.

I believe the “Personal Canister movement” needs something that is simply impossible to build any other way. Until then, we’ll continue to build Web 2.5.

Also, the twist of fate of building something like this on the Internet Computer is that it will free users from the currently centralized networks only to put them back under the NNS. But that’s probably a discussion for another day.

I’m not entirely focused on the ownership issue itself; I usually approach such considerations under the premise that users fundamentally don’t care. What I’m thinking is that this solution could also address certain problems of resource waste and potentially simplify the model by reducing intermediaries.

My approach is what is currently demonstrated on mycanister.app:

  • Deposit ICP
  • Perform II auth
  • You have your own canister

Your “Canister Dapp” (as I like to call them):

  • Is controlled only by your II
  • Has a dashboard to manage the canister
  • Serves the frontend UI for the Dapp

For devs:

  • Code backend logic
  • Code frontend
  • Glue together into single wasm using My Canister Frontend crate
  • Add My Canister Dashboard

There will be a test suite to check whether your wasm meets the specification for a Canister Dapp, which is yet to be written.

Once it’s easy to develop Canister Dapps, we need to solve one more problem:

Decentralized user discovery. Your Canister Dapp needs to discover users of the service, join a group, publish stuff, etc.

Several approaches are possible. I aim to demo this later this year.

Wasn’t there an ICRC standard in this direction?

We need to think of the industries and fields where people do care about owning their own stuff.

Design/film: in this industry, for example, we regularly back up work to various drives as well as to cloud storage.

Games: people hate that many game companies no longer let you fully own a game and make their own products obsolete.

Software: as with games, many products are just rentals.

Music/video: a specialist section of the market favours owning media over renting it.

I’m sure there are many more than that. Even within a niche there will be large markets keen to own their own data.
