Great!
Yes, compiling a regex on every call would definitely waste a lot of cycles. I also started out by saving compiled regexes by name, but that is not a friendly API: the caller has to keep track of the compiled regexes and handle name clashes.
Instead I decided to go with a precompile function, which stores each compiled regex in a hashmap keyed by the sha256 of its pattern text, so any matching function can look up the already compiled version (or compile the regex in place if no cached version exists).
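A minimal sketch of that cache, not the actual implementation. To stay dependency-free it makes two assumptions: std's `DefaultHasher` stands in for sha256, and the compile step is passed in as a closure instead of calling the regex crate directly. The names `RegexCache`, `pattern_key` and `get_or_compile` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Cache key derived from the pattern text.
/// Assumption: DefaultHasher (64-bit) is a stand-in for sha256 here;
/// a real implementation would want the wider digest to avoid collisions.
fn pattern_key(pattern: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    pattern.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical cache of compiled patterns, generic over the compiled type `C`.
pub struct RegexCache<C> {
    compiled: HashMap<u64, C>,
}

impl<C> RegexCache<C> {
    pub fn new() -> Self {
        Self { compiled: HashMap::new() }
    }

    /// Idempotent: patterns that are already cached are not compiled again.
    pub fn precompile<E>(
        &mut self,
        patterns: &[&str],
        compile: impl Fn(&str) -> Result<C, E>,
    ) -> Result<(), E> {
        for pattern in patterns {
            self.get_or_compile(pattern, &compile)?;
        }
        Ok(())
    }

    /// Returns the cached version, compiling in place on a cache miss.
    pub fn get_or_compile<E>(
        &mut self,
        pattern: &str,
        compile: impl Fn(&str) -> Result<C, E>,
    ) -> Result<&C, E> {
        let key = pattern_key(pattern);
        if !self.compiled.contains_key(&key) {
            let value = compile(pattern)?;
            self.compiled.insert(key, value);
        }
        Ok(&self.compiled[&key])
    }

    /// Number of cached patterns.
    pub fn len(&self) -> usize {
        self.compiled.len()
    }
}
```

With the regex crate, the closure would presumably just be `regex::Regex::new`, e.g. `cache.get_or_compile(pattern, regex::Regex::new)`.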
My proposed API:
type Match = record {
  text: text;
  start: nat64;
  end: nat64;
};

type Re = text; // regex text

service : {
  "precompile": (vec Re) -> (); // idempotent method, call during initialization or before a large batch of calls
  "is_match": (Re, text) -> (bool) query;
  "captures": (Re, text) -> (vec opt Match) query;
  "batch_is_match": (Re, vec text) -> (vec bool) query;
  "batch_captures": (Re, vec text) -> (vec vec opt Match) query;
};
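To illustrate how the batch methods relate to the single-text ones, here is a rough Rust mirror of the interface. It rests on several assumptions that the interface itself does not state: `start` and `end` are byte offsets, a text with no match yields an empty vec from `captures`, and each batch result lines up with its input by position. The `Matcher` trait and the `Literal` substring matcher are hypothetical stand-ins for a compiled regex.

```rust
/// Rust mirror of the Candid `Match` record.
/// Assumption: `start` and `end` are byte offsets into the input text.
#[derive(Debug, Clone, PartialEq)]
pub struct Match {
    pub text: String,
    pub start: u64,
    pub end: u64,
}

/// Hypothetical trait standing in for a compiled regex.
pub trait Matcher {
    fn is_match(&self, text: &str) -> bool;
    fn captures(&self, text: &str) -> Vec<Option<Match>>;

    /// One result per input text, in input order.
    fn batch_is_match(&self, texts: &[String]) -> Vec<bool> {
        texts.iter().map(|t| self.is_match(t)).collect()
    }

    /// One capture list per input text, in input order.
    fn batch_captures(&self, texts: &[String]) -> Vec<Vec<Option<Match>>> {
        texts.iter().map(|t| self.captures(t)).collect()
    }
}

/// Stand-in matcher: a literal substring search instead of a real regex.
pub struct Literal(pub String);

impl Matcher for Literal {
    fn is_match(&self, text: &str) -> bool {
        text.contains(&self.0)
    }

    fn captures(&self, text: &str) -> Vec<Option<Match>> {
        match text.find(&self.0) {
            // A literal has only the whole-match "group".
            Some(start) => vec![Some(Match {
                text: self.0.clone(),
                start: start as u64,
                end: (start + self.0.len()) as u64,
            })],
            // Assumption: no match is reported as an empty list.
            None => Vec::new(),
        }
    }
}
```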
Should we support multiple patterns in the same call? I think we can skip this and go straight to exposing RegexSet from the Rust regex crate, which matches several patterns in a single pass.