I'm attempting to perform a text search on a large dataset stored in a map structure.
First, I filter the map by projectId to reduce the data volume, then I apply an additional filter using Text.contains to search for a specified string within the record's issue property. This approach works for small datasets, but with larger ones, the canister instruction limit for a single message is exceeded.
What strategies can I use to mitigate this issue?
public query func getIssuesByIssue(issue : Text, projectId : Nat16) : async [(Nat, Record)] {
  Debug.print(debug_show(Map.size(map)));
  let mappedFilteredMap = Map.mapFilter<Nat, Record, Record>(
    map,
    nhash,
    func(key, value) = if (value.projectId == projectId) ?value else null
  );
  let results = Map.mapFilter<Nat, Record, Record>(
    mappedFilteredMap,
    nhash,
    func(key, value) = if (Text.contains(value.issue, #text(issue))) ?value else null
  );
  Map.toArray(results);
};
You could use Azle or Rust and embed SQLite with an indexed text field.
If you want it in Motoko, then something like this:
Break the text into individual tokens (words or word stems). For each token, add an entry to the index associating it with the document ID and offset.
Search for tokens; if their offsets are sequential, then you also match a phrase.
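A minimal sketch of such an index, assuming the same mo:map package the question already uses (thash is that package's hash utility for Text keys; indexDocument and the whitespace tokenizer are illustrative simplifications, not a full implementation):

import Map "mo:map/Map";
import { thash } "mo:map/Map";
import Array "mo:base/Array";
import Text "mo:base/Text";

// token -> postings list of (documentId, token offset within that document)
let index = Map.new<Text, [(Nat, Nat)]>();

func indexDocument(docId : Nat, body : Text) {
  var offset : Nat = 0;
  for (token in Text.tokens(body, #char ' ')) {
    let word = Text.toLowercase(token);
    let postings = switch (Map.get(index, thash, word)) {
      case (?p) p;
      case null [];
    };
    // Array.append is O(n); a Buffer per token would scale better, but this keeps the sketch short.
    Map.set(index, thash, word, Array.append(postings, [(docId, offset)]));
    offset += 1;
  };
};

A phrase "foo bar" then matches document d when "foo" has a posting (d, i) and "bar" has a posting (d, i + 1).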
Can you explain what the Nat is here, the key in your Map?
Also, it seems like your projectIds are consecutive numbers. Why don't you have one Map per project? I mean, for example, a vector of Maps where the vector index is the projectId. Then you can skip the first mapFilter.
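For illustration, a sketch of that layout, assuming the mo:vector package and consecutive projectIds starting at 0 (the Record type and helper names here are stand-ins, not from your code):

import Map "mo:map/Map";
import Vector "mo:vector";
import Nat16 "mo:base/Nat16";

type Record = { id : Nat; projectId : Nat16; title : Text; issue : Text };

// One issue map per project; the vector index doubles as the projectId.
let projects = Vector.new<Map.Map<Nat, Record>>();

// Register a new project and return its id.
func addProject() : Nat16 {
  Vector.add(projects, Map.new<Nat, Record>());
  Nat16.fromNat(Vector.size(projects) - 1);
};

// Direct lookup replaces the first mapFilter entirely.
func mapForProject(projectId : Nat16) : Map.Map<Nat, Record> {
  Vector.get(projects, Nat16.toNat(projectId));
};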
mapFilter creates a new map and is therefore expensive. You don't care about a map as a result because you want an array in the end. Hence, I would eliminate the second mapFilter, too. You can iterate over the entries yourself with vals or entries, keep only the ones that contain the search text, and put the results in a Buffer or Vector. Then convert from there to an array at the end.
If all that still doesn't help, then you can still build your own word index. But even then, you should do the above steps regardless, so I would start with those.
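Put together, a minimal sketch of that single-pass version (assuming mo:base/Buffer and mo:base/Text are imported, and map and Record are as in your function):

public query func getIssuesByIssue(issue : Text, projectId : Nat16) : async [(Nat, Record)] {
  let buffer = Buffer.Buffer<(Nat, Record)>(0);
  // Single pass: no intermediate maps, filter straight into the buffer.
  for ((key, value) in Map.entries(map)) {
    if (value.projectId == projectId and Text.contains(value.issue, #text(issue))) {
      buffer.add((key, value));
    };
  };
  Buffer.toArray(buffer);
};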
The Nat as key represents the issueId in this example.
I have modified the function to use a buffer, but I'm still hitting the limit when using Text.contains for comparisons. Without the Text.contains check, the function runs very quickly. The dataset contains 22,000 records.
public query func getIssuesByIssue(issue : Text, projectId : Nat16) : async [(Nat, Record)] {
  let buffer = Buffer.Buffer<(Nat, Record)>(0);
  Debug.print(debug_show(Map.size(map)));
  for (value in Map.vals(map)) {
    // The commented-out Text.contains check below is what pushes the call over the instruction limit:
    //if (value.projectId == projectId and Text.contains(value.issue, #text(issue)))
    if (value.projectId == projectId) {
      Debug.print(debug_show(value.title));
      buffer.add((value.id, value));
    };
  };
  Debug.print(debug_show(buffer.size()));
  return Buffer.toArray(buffer);
};
There are a number of things in Motoko, like text operations, binary access and update by index, etc., that I think could use some optimization. I know that @luc-blaeser has been hyper-focused on OEP for a while and that the team has been super busy. It might be worth indexing all of these small performance issues and getting them slated on the priority list for the Motoko team. I'm not holding out hope for a hyper-optimized native regex engine in Motoko anytime soon, but I think there are likely a number of places that would offer substantial gains.
Searching in the middle of a string has been a rough problem to deal with since SQL Server 1.0 (and likely before!).
Your best bet is going to be some kind of preprocessing like @infu suggests.
Can you tell me the smallest dataset that exceeds 5B instructions? I mean the number of records and the total length of the Text of all records combined.
issue_text_length: 5_696_451 (sum of Text.size() over all records processed before it stops)
records: 14_681 (number of records processed before it stops)
It seems that the issue isn’t the text size of any single record, but rather the combined size of all text values. This suggests that when using Text.contains for checks, memory isn’t being released after each operation. Could it be that the Text.contains function is retaining memory unnecessarily?
You can check this demo I made when trying out SQL in Azle. It has full-text search:
https://yv5hx-rqaaa-aaaam-abj6a-cai.icp0.io/
The code won’t work, but with the newer Azle it should be easier.
GitHub: infu/internetbase-sql-demo
If you want to do more of these kinds of things (text searching, ordering, intersections, queries on your db, adding queries on demand without planning for them), it's better to try Azle + SQL. But then you need to be good with SQL to keep things optimized.
Just for information:
I have imported an additional 90_000 messages which belong to the existing issues, and everything is working as expected. I can now query a single issue with all its comments.
The local canister memory size is currently 135_572_009 bytes, which (divided by 1_048_576 bytes per MB) is approximately 129 MB.
I've developed a prototype for a German-language stemmer in Motoko. While I'm not certain that all Porter stemming rules are fully implemented, the core stemming functionality is working. The parser also excludes a predefined list of stop words and removes all HTML tags.
However, the initial indexing process is currently very slow. Additionally, many unusable words are being indexed, and some words that appear frequently across issues cause their per-word postings arrays to grow excessively large.
In theory, both the indexing and search functionalities are working, but I’m unsure how to scale this approach effectively with my dataset of 22,000 issues and 90,000 messages and more.
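For reference, a minimal sketch of that kind of preprocessing pipeline; the stop-word list, the single suffix rule standing in for the full Porter logic, and all helper names here are hypothetical, not taken from the actual prototype:

import Text "mo:base/Text";
import Char "mo:base/Char";
import Buffer "mo:base/Buffer";

// Illustrative stop-word list; a real German list is far longer.
let stopWords : [Text] = ["der", "die", "das", "und", "oder", "ein", "eine"];

func isStopWord(w : Text) : Bool {
  for (s in stopWords.vals()) { if (s == w) return true };
  false;
};

// Stand-in for the Porter rules: strip one common German suffix.
func stem(w : Text) : Text {
  switch (Text.stripEnd(w, #text "en")) {
    case (?stripped) stripped;
    case null w;
  };
};

// Lowercase, split on non-letters, drop stop words, stem the rest.
func tokensForIndex(body : Text) : [Text] {
  let out = Buffer.Buffer<Text>(16);
  for (raw in Text.tokens(body, #predicate(func(c : Char) : Bool = not Char.isAlphabetic(c)))) {
    let w = Text.toLowercase(raw);
    if (not isStopWord(w)) { out.add(stem(w)) };
  };
  Buffer.toArray(out);
};

Splitting on non-letters and dropping stop words before indexing keeps many unusable words out of the index; capping or chunking the postings arrays of very frequent words is a separate problem.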
Hey, what version of Azle did you use? I am assuming an old version. Azle no longer has those compiling issues if you’re talking about anything over a few seconds.