Canister exceeded the limit of 5000000000 instructions for single message execution

I’m attempting to perform a text search over a large dataset stored in a map structure.

First, I filter the map by projectId to reduce the data volume, then I apply an additional filter using the Text.contains function to search for a specified string within the record property. This approach works for small datasets, but with larger ones the canister instruction limit for a single message is exceeded.

What strategies can I use to mitigate this issue?

// Imports added for context, assuming the mo:map package:
import Map "mo:map/Map";
import { nhash } "mo:map/Map";
import Text "mo:base/Text";
import Debug "mo:base/Debug";

public query func getIssuesByIssue(issue : Text, projectId : Nat16) : async [(Nat, Record)] {
  Debug.print(debug_show (Map.size(map)));

  // First pass: narrow the map down to the requested project.
  let mappedFilteredMap = Map.mapFilter<Nat, Record, Record>(map, nhash, func(key, value) = if (value.projectId == projectId) ?value else null);

  // Second pass: keep records whose issue text contains the search string.
  let results = Map.mapFilter<Nat, Record, Record>(mappedFilteredMap, nhash, func(key, value) = if (Text.contains(value.issue, #text(issue))) ?value else null);

  Map.toArray(results);
};


You could use Azle or Rust and use SQLite with an indexed text field.
If you want it in Motoko, then something like this:
Break the text into individual tokens (words or stems of words). For each token, add an entry in the index associating it with the document ID and offset.
Search for the tokens; if their offsets are sequential, then you have also matched a phrase.
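For illustration, here is a minimal sketch of such an inverted index in Motoko. The tokenize helper and the index layout are assumptions, not something from this thread:

  import Map "mo:map/Map";
  import { thash } "mo:map/Map";
  import Buffer "mo:base/Buffer";
  import Text "mo:base/Text";
  import Iter "mo:base/Iter";

  // token -> postings: (document id, token offset within the document)
  type Posting = (Nat, Nat);
  let index = Map.new<Text, Buffer.Buffer<Posting>>();

  // Hypothetical tokenizer: lowercase the text and split on spaces.
  func tokenize(t : Text) : [Text] {
    Iter.toArray(Text.split(Text.toLowercase(t), #char ' '));
  };

  // Record the position of every token of a document in the index.
  func indexDocument(docId : Nat, body : Text) {
    var offset : Nat = 0;
    for (token in tokenize(body).vals()) {
      let postings = switch (Map.get(index, thash, token)) {
        case (?b) b;
        case null {
          let b = Buffer.Buffer<Posting>(4);
          Map.set(index, thash, token, b);
          b;
        };
      };
      postings.add((docId, offset));
      offset += 1;
    };
  };

A phrase query then looks up each query token's postings and checks for the same document id with consecutive offsets, instead of scanning every record's text.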


Can you explain what the Nat is here, the key in your Map?

Also, it seems like your projectIds are consecutive numbers. Why don’t you have one Map per project? I mean, for example, a vector of Maps where the vector index is the projectId. Then you can skip the first mapFilter.
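A sketch of that layout, assuming the mo:vector package, the poster's Record type, and project ids assigned consecutively from 0:

  import Map "mo:map/Map";
  import Vector "mo:vector";
  import Nat16 "mo:base/Nat16";

  // One issue map per project; the vector index is the projectId.
  let projects = Vector.new<Map.Map<Nat, Record>>();

  // Look up a project's map directly instead of filtering the global map.
  func getProjectMap(projectId : Nat16) : Map.Map<Nat, Record> {
    Vector.get(projects, Nat16.toNat(projectId));
  };

The per-project lookup becomes a single vector access, so only the matching project's records are ever touched.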

mapFilter creates a new map and is therefore expensive. You don’t care about a map as a result because you want an array in the end. Hence, I would eliminate the second mapFilter, too. You can iterate over the entries yourself with vals or entries, keep the ones that contain the search text, and put the results in a Buffer or Vector. Then convert from there to an Array at the end.

If all that still doesn’t help, then you can build your own word index. But even then you should do the above steps regardless, so I would start with those.

Thanks for the help:

The Nat as key represents the issueId in this example.

I have modified the function to use a buffer, but I’m still hitting the limit when using Text.contains for comparisons. Without the Text.contains query, the function runs very quickly. The dataset contains 22,000 records.

  public query func getIssuesByIssue(issue : Text, projectId : Nat16) : async [(Nat, Record)] {
    let buffer = Buffer.Buffer<(Nat, Record)>(0);

    Debug.print(debug_show (Map.size(map)));

    for (value in Map.vals(map)) {
      // Enabling the Text.contains check is what exceeds the instruction limit:
      // if (value.projectId == projectId and Text.contains(value.issue, #text(issue)))
      if (value.projectId == projectId) {
        Debug.print(debug_show (value.title));
        buffer.add((value.id, value));
      };
    };
    Debug.print(debug_show (buffer.size()));
    return Buffer.toArray(buffer);
  };

There are a number of things in Motoko, like text operations, binary access and update by index, etc., that I think could use some optimization. I know that @luc-blaeser has been hyper-focused on OEP for a while and that the team has been super busy. It might be worth indexing all of these small performance issues and getting them slated on the priority list for the Motoko team. I’m not holding out hope for a hyper-optimized native regex engine in Motoko anytime soon, but I think there are likely a number of places that would offer substantial gains.

Searching in the middle of a string has been rough to deal with since SQL Server 1.0 (and likely before!): a substring match can’t use an ordinary index, so it always comes down to scanning the text itself.

Your best bet is going to be some kind of preprocessing like @infu suggests.

Haha… maybe I’m wrong… wouldn’t be the first time.

Maybe @demali.icp may have some good insights on where they were seeing some slow down or optimization opportunities.

Do you have an example that I can study?

Can you tell me what the smallest data set is that exceeds 5B instructions? I mean the number of records and the total length of the Text of all records combined.

issue_text_length: 5_696_451 (sum of Text.size() over all records processed before the limit is hit)
records: 14_681 (number of records processed before the limit is hit)


I doubt regexp will help; it goes letter by letter, the same as Text.contains.

Don’t have one. Someone has to make a library, and it won’t be very easy to do properly.


It seems that the issue isn’t the text size of any single record, but rather the combined size of all text values. This suggests that when using Text.contains for checks, memory isn’t being released after each operation. Could it be that the Text.contains function is retaining memory unnecessarily?

You can check this demo I made when trying out SQL in Azle. It has full-text search:
[https://yv5hx-rqaaa-aaaam-abj6a-cai.icp0.io/]
The code won’t work anymore, but with the newer Azle it should be easier:
[GitHub - infu/internetbase-sql-demo]
If you want to do more of this kind of thing (text searching, ordering, intersections, queries on your db, adding queries on demand without planning for them), it’s better to try Azle + SQL. But then you need to be good at SQL to keep things optimized.

I tried using Azle, but it took a very long time to compile, which is why I switched back to Motoko. But thanks.

Just for information:
I have imported an additional 90_000 messages which belong to the existing issues, and everything is working as expected. I can now query a single issue with all its comments.

The local canister memory size is currently Nat(135572009), which, if my calculation is correct (135_572_009 bytes / 1_048_576 ≈ 129.3), is approximately 129 MB.

Motoko imo is really good for implementing protocols. You can delegate complex query functions like text search to another indexing canister.
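A minimal sketch of that delegation, assuming a separate indexer canister with a hypothetical search interface and a placeholder canister id:

  // Hypothetical interface of a dedicated indexing canister.
  type Indexer = actor {
    search : (Text) -> async [Nat];
  };

  // Placeholder id; substitute the real indexer canister's id.
  let indexer = actor "rrkah-fqaaa-aaaaa-aaaaq-cai" : Indexer;

  // The main canister only forwards the query and stays cheap.
  public func searchIssues(q : Text) : async [Nat] {
    await indexer.search(q);
  };

Note that the forwarding function is an update call, since plain (non-composite) queries cannot make inter-canister calls.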

Another alternative:
Do these steps inside the browser when creating or modifying text articles, for each word:

  1. stemming [stemr - npm]
  2. convert to lowercase
  3. hash to nat32
  4. send all hashes to the canister along with the text article

Your canister:

1) For each article, you keep a Set [Mops • Motoko Package Manager] of these word hashes.

To search: inside the frontend, split the query into words, convert them to Nat32 the same way, then check the resulting hashes against a project's set using
Set.has(myproject.words, word)
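A minimal sketch of the canister side, assuming mo:map's Map and Set, and that the frontend sends the Nat32 word hashes along with the article. The function names are illustrative, and the hashes are widened to Nat here so the package's nhash utility can be used:

  import Map "mo:map/Map";
  import Set "mo:map/Set";
  import { nhash } "mo:map/Map";
  import Buffer "mo:base/Buffer";
  import Nat32 "mo:base/Nat32";

  // articleId -> set of hashed words occurring in that article
  let wordIndex = Map.new<Nat, Set.Set<Nat>>();

  // Store the word hashes computed by the frontend for one article.
  public func addArticle(articleId : Nat, hashes : [Nat32]) : async () {
    let words = Set.new<Nat>();
    for (h in hashes.vals()) {
      Set.add(words, nhash, Nat32.toNat(h));
    };
    Map.set(wordIndex, nhash, articleId, words);
  };

  // Return the ids of all articles containing every hashed query word.
  public query func search(queryHashes : [Nat32]) : async [Nat] {
    let results = Buffer.Buffer<Nat>(8);
    label articles for ((id, words) in Map.entries(wordIndex)) {
      for (h in queryHashes.vals()) {
        if (not Set.has(words, nhash, Nat32.toNat(h))) continue articles;
      };
      results.add(id);
    };
    Buffer.toArray(results);
  };

Set membership checks are constant-time, so the per-record cost no longer depends on the length of the article text, only on the number of query words.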

Thank you, this approach looks interesting. I’ll explore it further to see how I can apply it to phrase searching.

Short Update on My Search Problem

I’ve developed a prototype of a German-language stemmer in Motoko. While I’m not certain that all Porter stemming rules are fully implemented, the core stemming functionality is working. The parser also excludes a predefined list of stop words and removes all HTML tags.
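As an illustration of the stop-word pass only (the word list here is a tiny hypothetical sample; the real list and the stemmer itself are not shown):

  import Set "mo:map/Set";
  import { thash } "mo:map/Map";
  import Iter "mo:base/Iter";
  import Text "mo:base/Text";

  // Tiny sample of a German stop-word list; the real list is much longer.
  let stopWords = do {
    let s = Set.new<Text>();
    for (w in ["der", "die", "das", "und", "ist"].vals()) {
      Set.add(s, thash, w);
    };
    s;
  };

  // Drop stop words before stemming and indexing.
  func removeStopWords(tokens : Iter.Iter<Text>) : [Text] {
    Iter.toArray(Iter.filter<Text>(tokens, func(t) = not Set.has(stopWords, thash, t)));
  };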

However, the initial indexing process is currently very slow. Additionally, many unusable words are being indexed, and some words that appear frequently across issues cause their word arrays to grow excessively large.

In theory, both the indexing and search functionalities are working, but I’m unsure how to scale this approach effectively with my dataset of 22,000 issues and 90,000 messages, and growing.

Hey, what version of Azle did you use? I am assuming an old version. Azle no longer has those compilation issues, if you’re talking about anything over a few seconds.

It must have been an older version. I tried it again today with Azle 0.24.1, and it seems much faster now!
