Awarded: ICDevs.org - Bounty #33 - Reg Ex Utility Canister - Rust - $4,000

Reg Ex Utility Canister - Rust - #33

Current Status: Discussion

  • Discussion (01/09/2023)
  • Ratification: (01/09/2023)
  • Open for application: (01/09/2023)
  • Assigned
  • In Review
  • Closed

Official Link - Discussion

Bounty Details

  • Bounty Amount: $4,000 USD of ICP at award date.
  • ICDevs.org Bounty Acceleration: For each 1 ICP sent to 6e8afebab59f703356e189297e3f49fbe18ace5150ccc43f74f30ceb3f6b5ece, ICDevs.org will add .25 ICP to this issue and .75 ICP to fund other ICDevs.org initiatives.
  • Project Type: Individual
  • Opened: 01/09/2023
  • Time Commitment: Days
  • Project Type: Canister
  • Experience Type: Intermediate - Rust;

Description

This rust canister allows motoko canister to query it with data to receive EVM compliant transactions that can be signed via t-ECDSA and submitted to EVM networks.

This bounty gives the opportunity to

  • learn rust
  • learn about regEx
  • learn about intercanister calls and their restrictions

Motoko currently is missing a RegEx library. Until RegEx is integrated into Motoko, it would be nice to have a utility canister that does most of this work for a motoko canister. This will involve async communication and may incur long transaction times, but it will at least let motoko devs get started with RegEx based searches while those motoko libraries are being developed.

  • Create a proper candid type for passing a single or set of strings and regEx expressions to the utility canister.
  • Expose the RegEx crate with public methods as found at regex - Rust
  • Analyze the Performance features and make a recommendation for the canister.
  • Analyze the Unicode options and provide optional parameters to drive the selection at run time if possible.
  • Analyze the performance of the library and provide guidance on limits of its use(We expect that with the 2MB cap on inter-canister calls that it will be unlikely to hit cycle limits, but we need you to test and report.)

To apply for this bounty you should:

  • Include links to previous work writing tutorials and any other open-source contributions(ie. your github).
  • Include a brief overview of how you will complete the task. This can include things like which dependencies you will use, how you will make it self-contained, the sacrifices you would have to make to achieve that, or how you will make it simple. Anything that can convince us you are taking a thoughtful and expert approach to this design.
  • Give an estimated timeline on completing the task.
  • Post your application text to the Bounty Thread

Selection Process

The ICDevs.org developer’s advisors will propose a vote to award the bounty and the Developer Advisors will vote.

Bounty Completion

Please keep your ongoing code in a public repository(fork or branch is ok). Please provide regular (at least weekly) updates. Code commits count as updates if you link to your branch/fork from the bounty thread. We just need to be able to see that you are making progress.

The balance of the bounty will be paid out at completion.

Once you have finished, please alert the dev forum thread that you have completed work and where we can find that work. We will review and award the bounty reward if the terms have been met. If there is any coordination work(like a pull request) or additional documentation needed we will inform you of what is needed before we can award the reward.

Bounty Abandonment and Re-awarding

If you cease work on the bounty for a prolonged(at the Developer Advisory Board’s discretion) or if the quality of work degrades to the point that we think someone else should be working on the bounty we may re-award it. We will be transparent about this and try to work with you to push through and complete the project, but sometimes, it may be necessary to move on or to augment your contribution with another resource which would result in a split bounty.

Funding

The bounty was generously funded by the DFINITY Foundation. If you would like to turbocharge this bounty you can seed additional donations of ICP to 6e8afebab59f703356e189297e3f49fbe18ace5150ccc43f74f30ceb3f6b5ece. ICDevs will match the bounty $40:1 ICP for the first 50 ICP out of the DFINITY grant and then 0.25:1. All donations will be tax deductible for US Citizens and Corporations. If you send a donation and need a donation receipt, please email the hash of your donation transaction, physical address, and name to donations@icdevs.org. More information about how you can contribute can be found at our donations page.

FYI: General Bounty Process

Discussion

The draft bounty is posted to the DFINITY developer’s forum for discussion

Ratification: (01/09/2023)

The developer advisor’s board will propose a bounty be ratified and a vote will take place to ratify the bounty. Until a bounty is ratified by the Dev it hasn’t been officially adopted. Please take this into consideration if you are considering starting early.

Open for application

Developers can submit applications to the Dev Forum post. The council will consider these as they come in and propose a vote to award the bounty to one of the applicants. If you would like to apply anonymously you can send an email to austin at icdevs dot org or sending a PM on the dev forum.

Assigned

A developer is currently working on this bounty, you are free to contribute, but any splitting of the award will need to be discussed with the currently assigned developer.

In Review

The Dev Council is reviewing the submission

Awarded

The award has been given and the bounty is closed.

Other ICDevs.org Bounties

Hello @skilesare . I’m interested in being assigned to work on this bounty.

About me:

I’m experienced with Rust and specifically interested in IC technology. I think this project will be a great opportunity to enhance my understanding of Canisters and IC in general while helping community.

Relevant experience:

Estimate timeline:

  1. POC version - already in progress - 3-4d
  2. second iteration with well-thought-out API supporting batching and other features - 5-7d
  3. researching performance features as outlined in the bounty - 4d
  4. testing implementation (including integration tests) - 4d
  5. writing documentation and examples, deploying canister - 4d

Questions to tackle:

  • How should lifecycle of RegEx be managed? Surely we don’t want to compile it for every request.
  • Should consumers pay for using canister? How much?
  • What does the best API for consumer look like? for performance-concerned consumer?
1 Like

Awesome. Feel free to progress!

I probably need to better understand what you mean here. If I send a text and matching pattern you need to compile that particular matching patter? Is this an expensive operation? If so, it might be nice to store these as named-compiled regexs. So the submission pair would look like ({#pattern(text),#named(text)}, text). And have a function to create a new named regex. Maybe I’m misunderstanding this.

This is likely part of the bounty. For the first version, I think it is safe to assume that each user than needs this will deploy their own canister so that cycle management isn’t an issue. But if you can do the evaluation of how much these generally costs in cycles and/or find a good way to count them then we can look at a second bounty to SNSify the service as a utility canister.

Please make a proposal. It should be optimally useful and should take into account the 2MB message limit. I’d imagine batch would essential here…maybe ask @icme what would be interesting from a large data standpoint.

Simple use is likely just :

public query func com_regex_match([(pattern, text)]) : async matches; //maybe some variants there if we do named.

Great!

Yes, compiling regex each time will definitely waste a lot of cycles. I also started implementation with saving regexes by name. But this is not friendly API (requires to keep track of compiled regexes and handle name clashes)
Instead I decided to go with precompile function, which will store compiled regex by it’s sha256 in a hashmap, so any matching function could lookup already compiled version (or compile regex in place if cached version is missing).

My proposed API:

type Match = record {
    text: text;
    start: nat64;
    end: nat64;
};

type Re = text; // regex text

service : {
    "precompile": (vec Re) -> ();  // idempotent method, call during initialization or before large batch of calls
    "is_match": (Re, text) -> (bool) query;
    "captures": (Re, text) -> (vec opt Match) query;
    "batch_is_match": (Re, vec text) -> (vec bool) query;
    "batch_captures": (Re, vec text) -> (vec vec opt Match) query;
};

Should we support multiple patterns in the same call? I think we can skip this and go straight to exposing RegexSet in regex - Rust API, which matches several patterns in a single pass.

Hello. I have some progress to share. I published a repo with all code, docs and examples to this day. The code also includes benchmark (npm run bench) that compares performance with/without precompilation (spoiler: it’s really not that much of a difference in terms of performance, but should save some cycles)

Feedback on docs and API is greatly appreciated.


Things that remain:

  • Test implementation
  • Deploy canister to IC (& include command to donate cycles to it in readme)
  • Document more findings on performance (?)
  • Research on making this canister as a service

cc @skilesare

1 Like

These look great! Amazing progress!

Hello again. Updates since my last message:

  • migrated to using ic-kit (code looks even simpler now)
  • wrote implementation tests (thanks ic-kit!)
  • set up CI and added a nice badge
  • deployed canister to IC and shared it’s address in readme

I consider this project finalized and ready-to-use. Feedback on API and documentation is still appreciated tho.


Note about making this a canister-as-a-service:

  • There are methods msg_cycles_available and msg_cycles_accept that can be theoretically used to collect fee from callers based on current call usage
  • How to calculate that usage remains a question, as there is AFAIK no methods that would allow to inspect cycles usage in real time. Correct me if I missed something here.
  • Closest thing I found is performance_counter in ic_kit::utils - Rust method, but we will need to determine a fair price for the executed instructions, and “making the price right for everyone” sounds like a tedious but interesting task. I am happy to take on this if a bounty comes up for it.

@skilesare cc just in case you missed this

On my list to review! Awesome turnaround time and great work. I’d love to get @icme 's opinion on how something like this might be useful immediately for canscale.

1 Like

Submitted for awarding

1 Like

Hey Austin - what did you have in mind in terms of usefulness for CanDB? I’m not sure I see the utility.

I’m not sure where you are at with pre processing for indexing or maintaining active filters, but I thought it might help with some of that if users had reg ex searches they wanted to stay active.

1 Like