Data persistance with ICP stable objects

jaxopaxo · May 5, 2024, 8:10am

Hi!

I’ve been trying to find a solution to my concerns about data loss when the canister data model changes.
Let me explain the idea briefly (please feel free to critique it). Here’s how I’m thinking, illustrated with an example.

I chose to keep everything in text format to avoid dependencies and the need for parsing data, thus reducing the risk of errors when saving or retrieving data.
Although this approach may not offer optimal performance, I’ve made an exception for the created and updated timestamps which are Numbers. You can offcource add createdBy and uppdatedBy, but I recomend to add those as attribute.

Here is my model

DataObject = {
id: Text,
parent_id: Text,
type: Text,
path: Text, // path is the name of the object it self, it can ve a vehicle, human, or any kind of object
owner_id: Text,
status: Text, // You may want to flag the object with any kind of status, like DELETED, BLOCKED, HIDDEN, NOTACTIVE, …
created: Int,
updated: Int,
attributes: [Attribute]
};

Attribute = {
id: Text,
parent_id: Text,
status: Text, // Same as DataObject.status
type: Text,
name: Text, // name, value pair
value: Text, // name, value pair
created: Int,
updated: Int
};

Below simple example how to use it,
let userDataObject: DataObject = {
id: “user123”,
parent_id: “”,
type: “User”,
path: “User”,
owner_id: “admin”,
status: “Active”,
created: 1632654400, // October 1, 2021
updated: 1632654500, // 100 seconds later
attributes: [
{ id: “user123_name”, parent_id: “user123”, status: “Active”, type: “string”, name: “name”, value: “John Doe”, created: 1632654400, updated: 1632654400 },
{ id: “user123_age”, parent_id: “user123”, status: “Active”, type: “int”, name: “age”, value: “30”, created: 1632654400, updated: 1632654400 },
{ id: “user123_email”, parent_id: “user123”, status: “Active”, type: “string”, name: “email”, value: “john@example.com”, created: 1632654400, updated: 1632654400 }
]
};

Please try to break it with more complext objects like
Object { id: ‘’, property1: AnotherObject, property2: ReferenceToAnotherObject, property3: ListOfObject, …} etc

asjn3e · May 5, 2024, 8:31am

can you explain what do you mean by data loss when the canister data model changes?
are you talking about updating a canister when you change a type or a data model?

jaxopaxo · May 5, 2024, 9:44am

Thanks for the quick response!
It is kind of psychological comfort

Data loss I mean is when the model changed, the data is is removed, it happened for me, and yes, there are ways to migrate data when schema (model) has been changed. My question here is only about the ObjectModel I described ( I tested it, and it seems to work ), the only problem about it is the persormance as I explained, but that can be solved ( hopefully )

So please, just focus on the model if it seems to work, and if not, please explain why.

jaxopaxo · May 5, 2024, 9:52am

One good thing about this DataObject model is that it is very easy to backup, because it is always the same, if it is a good approach

But ofcource, this is not comming to solve the world problems, it is just a tool I may use ( If it works ) in some cases

asjn3e · May 5, 2024, 10:05am

Im glad to hear that
it depends where you want to use your data object, since i have no idea where it’s supposed to be used my feedback is just a really general feedback.
The structure you’ve implemented looks good but if you could introduce possible options for status or enums it would be better

Status={
#DELETED,
#BLOCKED,
#HIDDEN,
#ACTIVE,
#DISABLED
};

DataObject = {
id: Text,
parent_id: Text,
type: Text,
path: Text, // path is the name of the object it self, it can ve a vehicle, human, or any kind of object
owner_id: Text,
status: Status,
created: Int,
updated: Int,
attributes: [Attribute]
};

Attribute = {
id: Text,
parent_id: Text,
status: Status,
type: Text,
name: Text, // name, value pair
value: Text, // name, value pair
created: Int,
updated: Int
};

jaxopaxo · May 5, 2024, 10:23am

Thanks a lot for your response, I thought about the status, but I just want give the freedom to parse the status when data is mapped into a real object, so the mapping can look like below:
DataObject (NEW) → ConcreteObject (#new)
Or another example
DataObject (0) → ConcreteObject (#new)
and map back
ConcreteObject (#new) → DataObject (NEW)
Or another example
ConcreteObject (#new) → DataObject (0)

This gives the programmer the freedom to chose any text to represent the status

asjn3e · May 5, 2024, 10:32am

i would say its better to sacrifice that freedom to have some rules that will reduce the amount of pain in the future. e.g. if you try to show the status in your UI in the future, it would be easier to write code for a specific list of possible status rather than trying to include every possible option. if you want to keep the freedom and also have a specific list statuses you can create a function in the constructor of your object called getStatus() that takes an input and returns sth based on that input
getStatus(“New”) => “NEW”
getStatus(0) => “NEW”

jaxopaxo · May 5, 2024, 10:51am

Hi!
Yes, the problem is that the status in the DataObject is generic, but if I specify it as an enum I will luck it to specific kind of object, let’s say that we have two kind of object that we want to map
let orderData = new DataObject().
let userData = new DataObject().

Both are dataobjects but the statuses for concrete Order is #new, #pending, #deleted, #delivered
for the user is #active, #banned, #baduser ( Won’t have such )

In this case the responisibilty will recide on the consumer.

Yes you are right about the complixity, but that is less painfull than loosing data

And thanks alot for your support and great feedback.

Samer · May 5, 2024, 11:38am

The data in a canister is replicated on all nodes in a subnet. This means it is already ‘backed up’ in sense. The state of your canister is pretty well guarenteed by the network.

Now you may run some code that results in data loss. If you experiment with new code, you could also do extra backup on the canister itself (copy the data and make periodic backups) or copy the data to another canister by chunking it and sending it over.

If you use Candid serialization or some other established format, you should be assured that you can always recover the data.

Note that copying large amounts of data to another caniser does come with a cost (cycle usage)

Samer · May 5, 2024, 11:45am

Having data stored on IC should give you more comfort than storing it anywhere else. But you need to understand the security guarentees the system is providing.

The IC is tamperproof. Once the data is on there, the only way to compromise it is if you intentionally run some code or allow access to it in some way.

Storage on IC costs more than on storage on physical drives or servers though.

asjn3e · May 5, 2024, 11:47am

ok in that case for each object it is common practice to have specific enums for each object just to make it easier for you and also anyone else working on this project,
forexample for order object define the status
#new, #pending, #deleted, #delivered
and for user #active, #banned, #baduser,
having said that adding new types wont cause data loss only removing types and modification of existing types might cause data loss in some cases so it will be always safe for you to add new status children

Samer · May 5, 2024, 11:48am

There are other ways to solve for this than storing data as strings and adding that complexity.

First off, you should thibk the models through before deciding them and perhaps keep some room for extensibility. If thats not enough, you could also transform old models into new ones. Or start all over on a new canister

jaxopaxo · May 5, 2024, 12:09pm

Thanks alot for the answers!

I know and trust the IC, no question about that. It is just to avoid human misstakes, I am just thiking when having the code deployed and someone accedently change some thing and deployed breaking change code with accepting changes, that results data loss for sure no matter how data is replicated among nodes.

From your ansers some options below:
1 - Simple canisters with unchangable models, and when new features added, just a new object with reference to the main plus the extra data features.
2- Periodically backup, or before each deploy backup the data, remap and copy back.
3- Candid serialization, maybe good option.

jaxopaxo · May 5, 2024, 12:10pm

With all respect to your help, complixity is less painfull than loosing data

LiveDuo · May 9, 2024, 9:50am

This is a very fair concern.

It’s always a good idea to backup but you’d need an “restore” function that loads back the backup data. That easy to do conceptually but has implications on canister ownership and decentralization.

Another important aspect is to have proper tests. You can test your canister before and after and see that data are not lost / have not changed after the upgrade. We have an example here sadly made in JS.

Topic		Replies	Views
Permanent data in a stable memory, not really permanent Language Support	3	232	April 28, 2024
Canister backup Programs & Applications	28	3000	September 18, 2023
Canister backup and restore [Community Consideration] Developers Discussing , community-consideration	53	2517	October 3, 2024
Backup & Restore Canister data Programs & Applications	3	951	September 18, 2023
ICP Backend Canister Logic Question Motoko Functional-Programming	0	57	September 23, 2024

Data persistance with ICP stable objects

Related topics