Offline-first web apps with canister backend, has anyone done it?

Offline-first web apps often access and store the user’s data in a local real-time db, then sync that to the cloud and, if applicable, from there to another instance of the web app running on a different device of the same user.

Has anyone done that already for a dapp with a canister as the backend/sync target?

Could any of the existing technologies such as CouchDB or RxDB be used for this and what would be required to make it work with a canister backend?


Could be interesting. Can you share more thoughts about this, for example a diagram that shows how it works or what it looks like?

Yes. Papyrs was such an offline-first dapp. Data were saved in IndexedDB and periodically synced to the IC with web workers.


Actually I do that all the time :grin:. icdraw is also offline first. Same strategy.

And how does the syncing logic in the web worker work? Did you have to write that or is it standard code? Syncing logic can get quite complicated.

And what did you have to write in order to provide a sync target inside a canister? Could you re-use any existing Rust code for that?

Custom code. I basically simplified the logic by making the frontend always push. The backend canister, as in Juno, prevents overwrites by comparing timestamps, i.e. only the most recent entries can be written.

User journey summarized:

  • user loads data from canister → browser
  • user works, data are saved in the browser
  • every x seconds a web worker checks if local data have been modified
  • if yes, push data → canister
  • canister checks the data timestamps and saves the data
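The journey above can be sketched in a few lines. This is only a minimal in-memory model, not code from Papyrs, icdraw or Juno: `createCanisterMock` and `syncOnce` are hypothetical names, and the "newer timestamp wins" check is one assumed way to implement the comparison described above.

```javascript
// Hypothetical in-memory stand-in for the backend canister: it accepts a
// write only if the incoming entry is newer than the one it already stores.
function createCanisterMock() {
  const store = new Map();
  return {
    // Returns true if the entry was saved, false if it was rejected as stale.
    set(key, entry) {
      const current = store.get(key);
      if (current !== undefined && current.updatedAt >= entry.updatedAt) {
        return false;
      }
      store.set(key, { ...entry });
      return true;
    },
    get: (key) => store.get(key),
  };
}

// One tick of the worker: push only the entries modified since the last sync.
// `localEntries` is a Map of key -> { data, updatedAt }.
function syncOnce(localEntries, canister, lastSyncedAt) {
  const pushed = [];
  let newest = lastSyncedAt;
  for (const [key, entry] of localEntries) {
    if (entry.updatedAt > lastSyncedAt) {
      if (canister.set(key, entry)) pushed.push(key);
      newest = Math.max(newest, entry.updatedAt);
    }
  }
  return { pushed, lastSyncedAt: newest };
}
```

In a real worker, `syncOnce` would run inside a `setInterval` callback, with the local entries read from IndexedDB and `canister.set` replaced by an actual canister call.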

Both projects are open source if you are curious.

I think diagram-wise it is pretty easy:

web app js running in browser ↔ local real-time db ↔ background syncing protocol (maybe web worker) ↔ backend (goal to move this into a canister)

And then the same for another browser on another device connecting to the same backend. Access control would be by principal. If the user logs into the web app with II then he already has the same principal on both devices.

I suppose you always push the entire db for simplicity? Or do you actually figure out what entries have changed?

What format do you push? JSON?

Could you give me some pointers to start? I suppose the backend canister and the web worker are both in this repo (GitHub - papyrs/ic: Backend canisters and providers of Papyrs), or?

Yes, RxDB is pretty great. One of the reasons I made RxMoDB.
The backend needs two functions to sync data with multiple frontends.
The frontend needs an adapter.
There are a lot of edge cases covered:

  • You have two offline devices and each adds new records, then you bring them online and they have to resolve conflicts.
  • You have multiple online devices
  • You have multiple tabs open and one has to be selected as a master

Of course only new, updated and deleted entries are pushed.

Actor and candid.


Papyrs is going to be a bit tricky to follow because there is a lot of abstraction.

dapp: https://github.com/papyrs/papyrs
uses a sync library: https://github.com/deckgo/deckdeckgo/tree/main/providers/sync
which effectively makes the calls with another library: https://github.com/papyrs/ic
(the same repo also contains the backend)

The worker itself is here: https://github.com/papyrs/papyrs/blob/7028d3e0b44b328711d39f07051228e43d81d13a/src/lib/workers/sync.worker.ts#L40 but, as you can see, it contains almost no code because everything is abstracted away.


icdraw is way simpler, everything is in the same repo: https://github.com/peterpeterparker/icdraw
which uses https://github.com/buildwithjuno/juno-js for the calls

It’s also easier because it does not make an extensive comparison of what has changed but just checks if one timestamp was updated: https://github.com/peterpeterparker/icdraw/blob/b98a1e3d2d371be4cad2f8ee244955609910ea2d/src/workers/worker.ts#L71

Thanks for the links!

If you are pushing candid, does that mean there’s a hardcoded data schema in the backend canister? I mean, it is not generic in the sense that I can push any IndexedDB content to the backend, or?

Do you mean these edge cases are already covered by RxMoDB?

If so, how? Does RxMoDB come with a frontend component?

A mix of both. Types (timestamps, keys) are defined in the backend, but the data themselves are saved as blobs in the backends of the two solutions I shared above. That is not really relevant for the workflow I built, though.

RxDB handles them. RxMoDB allows you to store documents (which have a primary key, deleted, and last_updated index) and provides what RxDB needs. No frontend components currently. I am planning on making some in the future to make things easier. You can set it up using this https://rxdb.info/replication.html

Conflicts are resolved by the frontend. The backend needs to do this to help it out:

When a document is sent to the backend and the backend detects a conflict (by comparing revisions or other properties), the backend will respond with the actual document state so that the client can compare this with the local document state and create a new, resolved document state that is then pushed to the server again.
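The client-side half of that exchange could look roughly like this. It is a sketch of the idea only, not RxDB's conflict handler API: `resolveConflict` is a hypothetical helper, and "last write wins by `updatedAt`" plus a numeric `revision` field are assumptions made for illustration.

```javascript
// Hypothetical resolver: given the local document and the master state
// returned by the backend, decide what the client should do next.
function resolveConflict(localDoc, masterDoc) {
  if (localDoc.updatedAt <= masterDoc.updatedAt) {
    // The master is at least as recent: adopt it locally, nothing to push.
    return { resolved: masterDoc, needsPush: false };
  }
  // The local change is newer: rebase it on top of the master revision so
  // that the backend accepts it on the next push.
  return {
    resolved: { ...localDoc, revision: masterDoc.revision + 1 },
    needsPush: true,
  };
}
```

Other strategies (merging individual fields, asking the user) fit the same shape: take both states, return one resolved state to push again.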

What are the two functions? Do you mean what is listed on Replication · RxDB - JavaScript Database under transfer level?

Do you mean where RxDB would normally call a REST API I need an adapter to turn this into canister calls?

I am trying to understand what exactly is needed on the frontend to connect RxDB with a RxMoDB canister. Do you have a minimal example?

For example, I suppose the “adapter” also has to include an authentication part, i.e. making the canister calls from the right principal. Not sure how RxDB’s replication protocol handles authentication in other setups. They don’t really talk about authentication on rxdb.info.

I had a working demo, but then I made a lot of changes in RxMoDB. Let me try to fix it a bit and walk you through.

Start with a document setup like this https://github.com/infu/RxMoDb/blob/main/test/hero.mo
There is some documentation and tests to help you figure out how to set it up.
It has a Primary key (id) and Index (updatedAt + id)
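The composite index matters because the pull handler pages through documents using a checkpoint made of `updatedAt` and `id`. Below is a small in-memory model of that paging, written in JavaScript rather than Motoko. The `pull` function and its parameters are hypothetical and only mirror the shape of the checkpoint used by the frontend replication code in this thread.

```javascript
// Order documents by (updatedAt, id), i.e. the order of the composite index.
function byUpdatedAtThenId(a, b) {
  if (a.updatedAt !== b.updatedAt) return a.updatedAt - b.updatedAt;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// Hypothetical pull: return up to `limit` documents that come strictly after
// the checkpoint (minUpdatedAt, lastId) in index order.
function pull(docs, minUpdatedAt, lastId, limit) {
  return [...docs]
    .sort(byUpdatedAtThenId)
    .filter(
      (d) =>
        d.updatedAt > minUpdatedAt ||
        // Same timestamp: the id breaks the tie (no lastId means "from start").
        (d.updatedAt === minUpdatedAt && (lastId === null || d.id > lastId))
    )
    .slice(0, limit);
}
```

With `updatedAt` alone as the checkpoint, two documents written in the same instant would either be skipped or returned twice when a batch boundary falls between them; adding `id` makes the order total.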

Push


I haven’t tested the conflict resolution yet, but this should work. You will need to do doc.revision += 1 whenever you update a document on the frontend (or you can use a different conflict detection).

Pull

Frontend

This is the schema I’ve used.

const mySchema = {
  title: "hero schema",
  version: 0,
  description: "describes a simple hero",
  primaryKey: "id",
  type: "object",
  properties: {
    id: {
      type: "string",
      maxLength: 100, // <- the primary key must have set maxLength
    },
    updatedAt: {
      type: "number",
    },
    score: {
      type: "number",
      minimum: 0,
      maximum: 100000,
      multipleOf: 1,
    },
    name: {
      type: "string",
      maxLength: 100,
    },
    level: {
      type: "number",
      minimum: 1,
      maximum: 80,
    },
    skills: {
      type: "array",
      maxItems: 5,
      uniqueItems: true,
      items: {
        type: "string",
      },
    },
  },
  required: ["id", "updatedAt", "score", "name", "level", "skills"],
  indexes: ["score"],
};

Init

import { createRxDatabase, addRxPlugin } from "rxdb";
import { getRxStorageDexie } from "rxdb/plugins/storage-dexie";
import { RxDBDevModePlugin } from "rxdb/plugins/dev-mode";
import { RxDBUpdatePlugin } from "rxdb/plugins/update";
import { RxDBQueryBuilderPlugin } from "rxdb/plugins/query-builder";
import { start_replication } from "./replication";

// Register the plugins before creating the database.
addRxPlugin(RxDBQueryBuilderPlugin);
addRxPlugin(RxDBUpdatePlugin);
addRxPlugin(RxDBDevModePlugin);

const db = await createRxDatabase({
  name: "heroesdb",
  storage: getRxStorageDexie(), 
  multiInstance: true, 
  eventReduce: true, 
  cleanupPolicy: {},
});

const myCollections = await db.addCollections({
  humans: {
    schema: mySchema,
  },
});

await start_replication(myCollections.humans);

replication.js | https://rxdb.info/replication.html

import { replicateRxCollection } from "rxdb/plugins/replication";
import { lastOfArray } from "rxdb";
import { Subject } from "rxjs";
import icblast, { toState } from "@infu/icblast";

let ic = icblast({ local: "true", local_host: "http://localhost:8080" });

export const start_replication = async (myRxCollection) => {
  const pullStream$ = new Subject();

  const replicationState = await replicateRxCollection({
    collection: myRxCollection,

    replicationIdentifier: "anything",

    retryTime: 5 * 1000,

    waitForLeadership: true,

    autoStart: true,

    deletedField: "deleted",

    push: {
      async handler(docs) {
        let store = [];
        for (let { assumedMasterState, newDocumentState } of docs) {
          store.push(newDocumentState);
        }

        let can = await ic("rrkah-fqaaa-aaaaa-aaaaq-cai");
        return await can.push(store);
      },

      batchSize: 100,

      modifier: (d) => d,
    },

    pull: {
      async handler(lastCheckpoint, batchSize) {
        const minTimestamp = lastCheckpoint ? lastCheckpoint.updatedAt : 0;

        const lastId = lastCheckpoint ? lastCheckpoint.id : null;

        let can = await ic("rrkah-fqaaa-aaaaa-aaaaq-cai");

        // Let errors propagate so that RxDB retries the pull after retryTime
        // instead of continuing with an undefined result.
        let rez = await can.pull(minTimestamp, lastId || undefined, batchSize);

        const documentsFromRemote = toState(rez);
        return {
          documents: documentsFromRemote,

          checkpoint:
            documentsFromRemote.length === 0
              ? lastCheckpoint
              : {
                  id: lastOfArray(documentsFromRemote).id,
                  updatedAt: lastOfArray(documentsFromRemote).updatedAt,
                },
        };
      },
      batchSize: 1000,
      modifier: (d) => d,
    },
  });
};

Then you can use all RxDB functions and the DB will get synced with the IC backend.


Thanks for the example! That’s very helpful. I get an idea now for what remains to be done.

As I understand it, the repo at GitHub - infu/RxMoDb is the DB, which itself does not provide push/pull functions. I have to implement them myself at the top level of the actor in the way you wrote them. Right?

There are a few things unclear in the Motoko code:

  • What is it in Iter.fromArray(it)? Is that line meant to be for (doc in docs.vals()) {?
  • If hk.pk.get returns null then we trap. Shouldn’t we just insert this doc as a new doc instead? Or how do we ever add an entirely new doc (not update an existing one)?
  • We only call hero.db.insert if there are no conflicts at all. Even if there are conflicts for some docs, shouldn’t we insert the other ones and then return the array of conflicts?
  • In rec.revision != doc.revision + 1 shouldn’t it be the other way around, doc.revision != rec.revision + 1? It seems doc is the provided version and rec is the old one in the database.
  • In Vector.push(conflicts, doc) we push the provided one into the conflicts array. On Replication · RxDB - JavaScript Database under pushHandler it says “It must return an array that contains the master document states of all conflicts.” so shouldn’t we push rec instead of doc?
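To make the questions above concrete, here is how the push logic would behave if all of those suggestions were applied. This is a JavaScript model over a plain `Map`, not the Motoko code under discussion and not RxMoDB's API; the `push` function, the `db` map and the `revision` check are assumptions for illustration.

```javascript
// Hypothetical model of a push endpoint. `db` is a Map of id -> stored doc.
// Returns the master states of all conflicting documents.
function push(db, docs) {
  const conflicts = [];
  for (const doc of docs) {
    const rec = db.get(doc.id);
    if (rec === undefined) {
      // Unknown id: insert it as a new document instead of trapping.
      db.set(doc.id, { ...doc });
    } else if (doc.revision === rec.revision + 1) {
      // The incoming doc builds on the stored revision: accept the update.
      db.set(doc.id, { ...doc });
    } else {
      // Conflict: report the stored (master) state, not the incoming doc.
      conflicts.push(rec);
    }
  }
  return conflicts;
}
```

Non-conflicting documents in the same batch are still written, and the caller only receives the master states it needs to resolve.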

RxMoDB has some reactivity, which is why it has “Rx”, but is not related to RxDB. Although, they should work well together.

Yes, these both should do the same thing

You are right. I haven’t tested the conflict resolution really. It’s a few lines I just added quickly based on their documentation to show where its place is and how it should probably work. The logic is probably crooked and needs some experimentation and further digging into their documentation.

I’ve tested it with the conflict resolution lines removed, without trying to simulate conflicts, and it was pretty great. It synced 5000 documents in a few seconds. Updates, inserts and deletes work well. Queries work well too, but it seems that if someone wants the fastest possible frontend engine, they need to purchase it.
