Completed: ICDevs.org Bounty #8 - HttpRequest Parser

What should it look like in the end?
Option 1
A canister with a function that takes a JSON string parameter, for example http_request(json: Text) → Trie,
where json is { method: Text; url: { }…},
and all interaction on the client is also via Actor and HttpAgent (@dfinity/agent).
Or
Option 2
Is it possible to build a request from scratch using HTTP/HTTPS specifications?

I would love to work on this bounty. I have a repo with some of the fields in the spec implemented here: https://github.com/tomijaga/http-parser.mo. I plan to add tests and documentation once I have completed the body field. I estimate it will take two weeks to complete this project.

My Solution

My solution splits the URL into its parts (scheme, domain, subdirectories, query, and anchor) and returns them in the object described by the spec.
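
For example, a URL would be broken up roughly like this (the field names follow the spec; the exact shape of the query object in the module may differ):

    // "https://www.example.com/docs/parser?tag=http&page=2#install"
    //
    //   scheme          "https"
    //   host            "www.example.com"
    //   path            "/docs/parser"
    //   subdirectories  ["docs", "parser"]
    //   query           [("tag", "http"), ("page", "2")]
    //   anchor          "install"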

For the request body, I plan to convert the blob to text and follow the specification in the POST - HTTP article on MDN.

As an external dependency, I am using the http.mo package from aviate-labs to provide the HeaderField and Request types.

One issue I came across: I cannot use query as a field name in the object because it is a reserved keyword in Motoko, so I renamed it to queryObj. I am open to better naming suggestions.

From a user’s perspective, I think it would be helpful to add a deserialize() method that converts JSON text in the body into an object.
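
As a rough illustration of the idea (the JSON type and field names below are just a proposal, not the module’s current API), the body object could expose something along these lines:

    module {
        // A hypothetical JSON value type that deserialize() could return.
        public type JSON = {
            #Null;
            #Bool : Bool;
            #Number : Float;
            #String : Text;
            #Array : [JSON];
            #Object : [(Text, JSON)];
        };

        // Proposed addition to the parsed body object (other fields omitted):
        // parse the body as JSON, returning null if it is not valid JSON.
        public type Body = {
            deserialize : () -> ?JSON;
        };
    };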

About Me

I have some experience contributing to open source, including a JS SDK, a Rust SDK, and an HD key generation tool for an open-source blockchain project. These projects can be found on my GitHub profile.

I took part in the Motoko Bootcamp and learned a lot about Motoko from completing the daily challenges. I now have a better understanding of Motoko syntax and of how to use vessel to import and publish packages.

2 Likes

Fantastic. Thanks for the detailed application. I’ll submit it to the board, but I don’t see any reason why you shouldn’t get started!

1 Like

You are assigned! Please create a repo to hold your work and let us know what it is!

1 Like

I think the repo is private; can you make it public? I’d love to follow along.

1 Like

My bad, I’ve made it public now. GitHub - tomijaga/http-parser.mo: HTTP Request Parser for Motoko

2 Likes

@tomijaga,

Can I already do some testing, or should I wait a bit?

I came across this recently and thought the people in this thread might be interested.

The url crate refers to the URL standard whereas this adheres to IETF RFC 3986.

1 Like

Is there a comparable standard for a body parser as well? I’d like to get something out first, and then we can get into the nitty-gritty of standards, because they usually help.

What’s left is percent decoding, testing, and documentation, so you can start testing it out. If you run into any problems, please let me know.

I made some changes to the initial spec. I added a fileKeys array to the form object type and created a new type for files.

form: {
        get: (Text) -> ?[Text];                  // values submitted under a field name
        hashMap: HashMap.HashMap<Text, [Text]>;  // underlying map of field names to values
        keys: [Text];                            // all field names in the form

        fileKeys: [Text];                        // field names that hold uploaded files
        files: (Text) -> ?[File];                // files uploaded under a field name
    };

public type File = {
        name: Text;                    // form field name the file was submitted under
        filename: Text;                // original file name from the upload

        mimeType: Text;                // e.g. "image" in "image/png"
        mimeSubType: Text;             // e.g. "png" in "image/png"

        // byte offsets of the file's data within the request body
        start: Nat;
        end: Nat;
        bytes: Buffer.Buffer<Nat8>;    // the file's contents
    };
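
For illustration, here is a minimal sketch of how the new fields might be used. The helper name is hypothetical, and how the form object is reached from the parsed request is left out; it only assumes the form fields and File type shown above:

    import Buffer "mo:base/Buffer";
    import Debug "mo:base/Debug";
    import Nat "mo:base/Nat";

    module {
        // Same shape as the File type above.
        public type File = {
            name : Text;
            filename : Text;
            mimeType : Text;
            mimeSubType : Text;
            start : Nat;
            end : Nat;
            bytes : Buffer.Buffer<Nat8>;
        };

        // Hypothetical helper: given the form's fileKeys array and files()
        // lookup, print a summary of every uploaded file.
        public func debugFiles(fileKeys : [Text], files : (Text) -> ?[File]) {
            for (key in fileKeys.vals()) {
                switch (files(key)) {
                    case (?fs) {
                        for (file in fs.vals()) {
                            Debug.print(
                                key # ": " # file.filename
                                # " (" # file.mimeType # "/" # file.mimeSubType # "), "
                                # Nat.toText(file.bytes.size()) # " bytes"
                            );
                        };
                    };
                    case (null) {};
                };
            };
        };
    };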

@tomijaga,
that’s great! I will let you know how it works out.

Do you happen to have some examples that demonstrate how to use it?

How does one actually call it when it is deployed to either the local network or the IC?

It would be great if you could share some examples. Maybe in curl format, or even better, a Postman collection?

1 Like

I’d suggest switching to TrieMap, as HashMap has serious memory issues.
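
For reference, a minimal sketch of the suggested swap, using TrieMap from the base library (the map contents here are illustrative):

    import TrieMap "mo:base/TrieMap";
    import Text "mo:base/Text";

    // TrieMap exposes essentially the same put/get interface as HashMap,
    // but grows incrementally instead of rehashing and copying the whole
    // underlying array when it fills up.
    let formValues = TrieMap.TrieMap<Text, [Text]>(Text.equal, Text.hash);
    formValues.put("name", ["motoko"]);
    let name : ?[Text] = formValues.get("name");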

1 Like

Do you happen to have some examples that demonstrate how to use it?

Yes, I have an example canister in the repo with a simple HTML form for uploading a file and some fields. You can clone the repo and deploy it locally to access the form at http://localhost:8000. I have a function called debugRequestParser that prints the parsed request to the console once the form is submitted. This function should show how to use and access some fields in the object.

How does one actually call it when it is deployed to either the local network or the IC?

The parser is a module that can be imported into your canister by adding the vessel.dhall and package-set.dhall files specified in the example canister and adding this line to your code: import HttpParser "mo:HttpParser";. It would be called in the http_request function on the incoming request to the canister. Here’s a snippet of what it would look like:

    public query func http_request(rawReq: HttpParser.HttpRequest) : async HttpParser.HttpResponse {

        let req = HttpParser.parse(rawReq);

        let {host; port; path; queryObj; anchor; original = url} = req.url;

               ...
2 Likes

Thanks, I will switch to that. Do you have any other suggestions on how to be more efficient? Parsing files over 30 KB is really slow; it takes about 40 seconds. I think it’s because I’m concatenating the characters of every line, when I only need to check the first character for a match and move on to the next line if it fails. I will try this out and get back to you about any performance improvements.
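
For illustration, a rough sketch of that idea (not the module’s actual code): check the first character before doing any further work on a line, so lines that cannot match the boundary are skipped cheaply:

    import Iter "mo:base/Iter";

    // Hypothetical check: does this line start with the multipart boundary?
    // Checking the first character first avoids concatenating or comparing
    // the rest of the line when it cannot possibly match.
    func lineStartsWithBoundary(line : [Char], boundary : [Char]) : Bool {
        if (boundary.size() == 0) return true;
        if (line.size() < boundary.size()) return false;
        if (line[0] != boundary[0]) return false; // cheap early exit
        for (i in Iter.range(1, boundary.size() - 1)) {
            if (line[i] != boundary[i]) return false;
        };
        return true;
    };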

I just implemented this, and there were some improvements. The module now parses data at about 100 KB/s, which is still relatively slow: at that rate, a 3 GB file would take over eight hours to parse. I will check other similar parsing libraries to see how I can improve the performance further.

Wow. We are going to need a strategy for that. We really need a highly performant regex function. I’ll try to take a look, but perhaps @paulyoung has an idea?

One thing to keep in mind is that requests on the IC are limited to 2MB, so you are unlikely to run into a scenario where you need to parse more than that. I think this applies to http_request as well.

If you want a file bigger than that, you have to chunk it.

As I said earlier in this thread, I suggest using parser combinators, or at least a parser that consumes the input as it goes.

I haven’t used these but they might be a good place to start.

3 Likes

Any new updates? Looking forward to paying out the bounty!

Yes, I’ve made a few updates. I’ve added support for percent-encoded search queries, written unit tests for each class and added documentation for the ParsedHttpRequest data type.

However, I haven’t been able to get the module to parse files any faster. I’ve tried looking into parser combinators (thanks for the pointer, @paulyoung), but I haven’t been able to wrap my head around them yet. It will take some time to understand how they work before I can use them in the module.