Awarded: ICDevs.org Bounty #13 - Zip Encoder/Decoder - $14,000

This is the second bounty in a series of bounties we are releasing this week in the run-up to Motoko Bootcamp. Winners of the Bootcamp’s Intermediate level will get the first crack at selecting one of these bounties to complete.

Zip Encoder/Decoder - #13

Current Status: Discussion

  • Discussion (02/21/2022)
  • Ratification
  • Open for application
  • Assigned
  • In Review
  • Closed

Official Link

Bounty Details

  • Bounty Amount: $5,000 USD of ICP at award date - $5,000 USD of ICP match available
  • ICDevs.org DFINITY Bounty Accelerator Grant Match Available: $5,000 USD of ICP at award time. (For every ICP sent to 77f7c65f6a5b59e8694ab9594ee6ba4fd2d9fb4d8197e869173f22679d8465c8, ICDevs.org will add $125 USD of ICP at award date to the bounty, up to the first 40 ICP donated. After 40 ICP, donations to the above address will add 0.25 ICP to this issue and 0.75 ICP to fund other ICDevs.org initiatives.)
  • Project Type: Single Contributor
  • Opened: 02/20/2022
  • Time Commitment: Weeks
  • Project Type: Library
  • Experience Type: Intermediate - Motoko
  • Issue Type: Motoko Library

Description

This bounty offers the opportunity to:

  • learn how gzip works
  • learn advanced Motoko parsing and type management
  • learn about cycle management and multi-step processes

The goal of this bounty is to create an encoder/decoder for gzip/zlib data in canisters, written in Motoko. gzip is a compression format that uses the DEFLATE algorithm to compress raw data into a manageable file size.
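For orientation, a gzip file is one or more "members": a small header, the DEFLATE-compressed payload, and an 8-byte trailer holding a CRC32 checksum and the uncompressed size modulo 2^32 (RFC 1952). Here is a minimal Motoko sketch of that layout; the field names are hypothetical, and the bounty's ZIP type must follow the actual spec:

// Illustrative sketch of a gzip member (RFC 1952). Field names are
// hypothetical, not from any existing library.
type GzipHeader = {
  // magic bytes 0x1f 0x8b and method 8 (DEFLATE) are fixed by the spec
  flags : Nat8;     // FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT bits
  mtime : Nat32;    // modification time, Unix seconds
  filename : ?Text; // present iff the FNAME flag is set
};

type GzipMember = {
  header : GzipHeader;
  deflateData : [Nat8]; // the DEFLATE-compressed payload
  crc32 : Nat32;        // checksum of the *uncompressed* data
  isize : Nat32;        // uncompressed size mod 2^32
};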

More info about GZip can be found at:

https://git.savannah.gnu.org/cgit/gzip.git/?h=v1.11

https://stackoverflow.com/questions/19746313/when-using-the-yui-compressor-should-i-combine-then-minify-or-minify-then-comb/19799494#19799494

http://www.infinitepartitions.com/art001.html

https://stackoverflow.com/questions/20762094/how-are-zlib-gzip-and-zip-related-what-do-they-have-in-common-and-how-are-they#:~:text=The%20main%20difference%20between%20zlib,that%20read%20and%20write%20the%20

Example GZip/Zlib libraries can be found at:

https://github.com/nodeca/pako

https://github.com/sile/libflate

The library will need to support compressing and decompressing large data arrays that will likely exceed the current cycle limit. You will need to use a library like Pipelinify.mo, although Pipelinify is not required if another, better method is available. The bounty applicant may also enhance Pipelinify.mo for their needs.

The library will need to provide the following functions (or equivalent with a different multi-step processing library):


compress(Pipelinify.ProcessRequest) -> Pipelinify.ProcessResponse; // sets up a compress process

compress_process(Pipelinify.StepRequest) -> Result<ProcessResponse, ProcessError>; // executes a step

decompress(Pipelinify.ProcessRequest) -> Pipelinify.ProcessResponse; // sets up a decompress process

decompress_process(Pipelinify.StepRequest) -> Result<ProcessResponse, ProcessError>; // executes a step

compress_result(Pipelinify.ProcessingStatusRequest) -> {data: Buffer.Buffer<Buffer.Buffer<Nat8>>; metadata: ZIP}; // returns the compressed chunks plus zip metadata

decompress_result(Pipelinify.ProcessingStatusRequest) -> Buffer.Buffer<Buffer.Buffer<Nat8>>;

clear_cache(?Pipelinify.ProcessingStatusRequest) -> Bool; // clean out any pipelinify cache for a status request, or the entire cache if null

type ZIP = {
    // see the spec and define this type
};
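To make the intended multi-step flow concrete, here is a self-contained sketch of a driver loop over a step function like the ones above. All types here are hypothetical stand-ins, since (as noted below) the actual request shapes are left to the developer:

import Result "mo:base/Result";

// Hypothetical stand-ins for the Pipelinify types the developer will define.
type ProcessId = Nat;
type StepStatus = { #done; #pending : Nat }; // #pending carries remaining steps
type ProcessError = { #outOfCycles; #invalidRequest };

// Drive a multi-step process to completion, one step per call.
func runToCompletion(
  start : () -> ProcessId,
  step : ProcessId -> Result.Result<StepStatus, ProcessError>,
) : Result.Result<(), ProcessError> {
  let id = start();
  loop {
    switch (step(id)) {
      case (#ok(#done)) { return #ok(()) };
      case (#ok(#pending(_))) {}; // more chunks remain; call step again
      case (#err(e)) { return #err(e) };
    };
  };
};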

The ProcessRequest variables will need to be defined by the developer to conform with the spec. For this bounty, the dev may implement one optimal configuration of the compression inputs. A future bounty can allow for more customization.

Results should generally be returned as Buffer.Buffer<Buffer.Buffer<Nat8>> so that files can be manipulated in memory. The sub-chunks should not exceed 2MB so that they can be shipped to other canisters under the file transfer limit. The Candy library that Pipelinify uses provides a number of functions to help with these limitations. Feel free to refactor Candy and submit improvement pull requests.
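For illustration, a minimal sketch of the chunking itself, using only base's Buffer and assuming a 2MB ceiling:

import Buffer "mo:base/Buffer";

// Split a flat byte buffer into sub-buffers of at most chunkSize bytes
// (e.g. 2_000_000 to stay under the ~2MB message limit).
func toChunks(bytes : Buffer.Buffer<Nat8>, chunkSize : Nat) : Buffer.Buffer<Buffer.Buffer<Nat8>> {
  let chunks = Buffer.Buffer<Buffer.Buffer<Nat8>>(bytes.size() / chunkSize + 1);
  var current = Buffer.Buffer<Nat8>(chunkSize);
  for (byte in bytes.vals()) {
    current.add(byte);
    if (current.size() == chunkSize) {
      chunks.add(current);
      current := Buffer.Buffer<Nat8>(chunkSize);
    };
  };
  if (current.size() > 0) { chunks.add(current) };
  chunks;
};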

The developer will need to experiment with process chunking strategies and make recommendations about how much data can be processed during one process cycle. Hopefully, we can remove this requirement once deterministic time slicing is integrated.
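One way to ground those recommendations is to measure the cost directly; a sketch assuming the countInstructions helper in base's ExperimentalInternetComputer module, with compressChunk as a placeholder for the real step function:

import IC "mo:base/ExperimentalInternetComputer";
import Debug "mo:base/Debug";
import Nat64 "mo:base/Nat64";

// Count the instructions one compression step consumes, so chunk sizes
// can be tuned against the per-message instruction limit.
func measureStep(compressChunk : () -> ()) {
  let cost : Nat64 = IC.countInstructions(compressChunk);
  Debug.print("instructions per step: " # Nat64.toText(cost));
};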

The final delivery should include test cases for both single-step and multi-step compression/decompression.

The package should be deployed as a vessel package.

To apply for this bounty you should:

  • Include links to previous work writing tutorials and any other open-source contributions (i.e., your GitHub).
  • Include a brief overview of how you will complete the task. This can include things like which dependencies you will use, how you will make it self-contained, the sacrifices you would have to make to achieve that, or how you will keep it simple. Anything that can convince us you are taking a thoughtful and expert approach to this design.
  • Give an estimated timeline on completing the task.
  • Post your application text to the Bounty Thread

Selection Process

One of the ICDevs.org Developer Advisors will propose a vote to award the bounty, and the Developer Advisors will vote.

Bounty Completion

Please keep your ongoing code in a public repository (a fork or branch is fine). Please provide regular (at least weekly) updates. Code commits count as updates if you link to your branch/fork from the bounty thread. We just need to be able to see that you are making progress.

The balance of the bounty will be paid out at completion.

Once you have finished, please alert the dev forum thread that you have completed the work and where we can find it. We will review and award the bounty reward if the terms have been met. If there is any coordination work (like a pull request) or additional documentation needed, we will inform you of what is needed before we can award the reward.

Bounty Abandonment and Re-awarding

If you cease work on the bounty for a prolonged period (at the Developer Advisory Board's discretion), or if the quality of work degrades to the point that we think someone else should be working on the bounty, we may re-award it. We will be transparent about this and try to work with you to push through and complete the project, but sometimes it may be necessary to move on or to augment your contribution with another resource, which would result in a split bounty.

Funding

The bounty was generously funded by the DFINITY Foundation Accelerator. If you would like to turbocharge this bounty you can seed additional donations of ICP to 77f7c65f6a5b59e8694ab9594ee6ba4fd2d9fb4d8197e869173f22679d8465c8. ICDevs will match the bounty 5:1 for the first 40 ICP out of the DFINITY grant and then 0.25:1 after that. All donations will be tax deductible for US Citizens and Corporations. If you send a donation and need a donation receipt, please email the hash of your donation transaction, physical address, and name to donations@icdevs.org. More information about how you can contribute can be found at our donations page.

General Bounty Process

Discussion

The draft bounty is posted to the DFINITY developer forum for discussion.

Ratification

The Developer Advisory Board will propose that a bounty be ratified, and a vote will take place to ratify it. Until a bounty is ratified by the Developer Advisors, it hasn't been officially adopted. Please take this into consideration if you are considering starting early.

Open for application

Developers can submit applications to the Dev Forum post. The council will consider these as they come in and propose a vote to award the bounty to one of the applicants. If you would like to apply anonymously, you can send an email to austin at icdevs dot org or send a PM on the dev forum.

Assigned

A developer is currently working on this bounty. You are free to contribute, but any splitting of the award will need to be discussed with the currently assigned developer.

In Review

The Dev Council is reviewing the submission.

Awarded

The award has been given and the bounty is closed.

Matches

DFINITY Accelerator Grant: $5,000 USD of ICP at award date

Other ICDevs.org Bounties


This bounty has been assigned to discord user barolukluk. Good luck and congrats on placing in the Intermediate section of Motoko Bootcamp!

This bounty is being assigned to @W_work. Good luck. This will be a great feature for data storage.

Hey @W_work, any progress on this? I know you'd made good headway on it.

This would be great for Gzip as well.

Stuck for a while with Motoko. Lack of sources of information and mentoring.
Right now we have a file header.
We have gzstat.mo (an analyzer that gives us information about files, etc.).

Yes. It’s currently working with gzip only.
The current milestone is to finish with an uncompressed gzip.
And then move forward with DEFLATE and ZIP.

Bumping in case anyone is looking for a holiday project. We could really use better compression for Motoko… and now that we have deterministic time slicing, it is more realistic to do the work over a number of rounds.

This bounty has been increased to $14,000.

Hey @skilesare, I’m interested in completing this bounty.
I plan on exporting functions for the single-step process and a class for the multi-step process that uses Pipelinify internally and abstracts away the details, so users can easily upload bytes for each step into the class instance without worrying about Pipelinify's types.

Awesome!!! I just realized this didn't go through a week ago because of the character limit.


Here’s the repository I will be working in: GitHub - NatLabs/deflate.mo: An implementation of gzip and the DEFLATE compression algorithm.
I’ve added a naive implementation of the LZSS encoding algorithm and will post further updates and design choices in the forum.

Deflate Update

I’m pleased to report that I’ve made progress in implementing the DEFLATE algorithm and gzip compression. Here’s a summary of the progress so far:

Gzip Compression Support

I have created a Gzip module that supports all the header options and the CRC32 checksum defined by the gzip standard. The compressed data generated by the module can be decompressed using the gzip CLI tool on Linux.
There is an example output from the module provided in the output.data file. The make gz-d command converts the data into a zipped file (output.gz) and decompresses it into the output file using gzip -d.

Compression with Fixed Huffman Codes

I have also implemented compression with fixed Huffman codes and tested it using the gzip CLI tool on Linux. Tests have shown that this implementation can achieve up to 38% compression.
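For context, fixed Huffman codes are the static code of RFC 1951 §3.2.6: every literal/length symbol has a predetermined bit length, so no code table has to be sent with the block. A sketch of that assignment:

// Code lengths for the fixed Huffman literal/length alphabet (RFC 1951 §3.2.6).
func fixedLiteralCodeLength(symbol : Nat) : Nat {
  if (symbol <= 143) { 8 }      // literals 0..143
  else if (symbol <= 255) { 9 } // literals 144..255
  else if (symbol <= 279) { 7 } // end-of-block marker and length codes 256..279
  else { 8 };                   // length codes 280..287
};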

Dependencies

I have created some libraries and updated existing ones to support features needed for implementing the DEFLATE algorithm:

  • BitBuffer: A library for reading and writing binary data
  • CircularBuffer: A fixed-sized buffer that overwrites the oldest item in it if it overflows
  • hash.mo: A library that provides a CRC32 checksum. I made a pull request to update the library to encode consecutive blocks of bytes of unknown size.
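For reference, the CRC32 that gzip uses (reflected, polynomial 0xEDB88320) can be written table-free in a few lines. This is a minimal sketch; hash.mo's actual implementation may differ:

import Nat32 "mo:base/Nat32";
import Nat8 "mo:base/Nat8";

// Table-free CRC32 (reflected, polynomial 0xEDB88320), as used by gzip.
func crc32(bytes : [Nat8]) : Nat32 {
  var crc : Nat32 = 0xFFFFFFFF;
  for (byte in bytes.vals()) {
    crc := crc ^ Nat32.fromNat(Nat8.toNat(byte));
    var i = 0;
    while (i < 8) {
      let mask : Nat32 = 0 -% (crc & 1); // 0xFFFFFFFF if the low bit is set, else 0
      crc := (crc >> 1) ^ (0xEDB88320 & mask);
      i += 1;
    };
  };
  crc ^ 0xFFFFFFFF;
};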

Roadmap

  • Create a Prefix Table for storing repeated sequences
  • Implement the LZSS algorithm for encoding literals and back-references
  • Gzip-compatible format
  • Compression with fixed Huffman codes
  • Compression with dynamic Huffman codes
  • Decompression with fixed Huffman codes
  • Decompression with dynamic Huffman codes
  • Automate encoding and decoding large files in multi-step processes

Stay tuned for future updates!


Great progress! Keep it up!


Hi @tomijaga, just wanted to check in and see how this one was going.

Hey @skilesare,

Since the last update, I’ve implemented the Gzip decoder for fixed and dynamic Huffman codes.

I am currently working on compression with dynamic Huffman codes and should be finished soon.

Do you have any example projects where you used the pipelinify.mo library to avoid exceeding the instruction limit? I’ve tried implementing the processor-consumer model for multi-step processing, where the processor canister does the heavy lifting and returns the response to the consumer. However, I’m still hitting the instruction limit for data above 1MB.

The library now fully supports compression using dynamic Huffman codes, which outperforms compression with fixed Huffman codes. This test (NatLabs/deflate.mo@cad237f on GitHub) achieves 53% compression over the previous 38%.

Here is the code for the processor-consumer model mentioned in the previous post:

The goal is for the consumer to create a task in the processing canister and send multiple instructions to compress consecutive data chunks until the entire dataset is compressed. Unfortunately, I keep encountering a Natural Subtraction Error. I suspect it’s related to the function exceeding the instruction limit, as this error does not occur when compressing files of 1MB or smaller, as demonstrated in encode.mo.
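For what it's worth, Nat subtraction in Motoko traps whenever the result would be negative, so sliding-window offset math (e.g. position minus window start) can trap on edge cases independent of the instruction limit. A minimal guard, as a sketch rather than the actual encode.mo code:

// Nat subtraction traps when the result would be negative:
// let bad : Nat = 3 - 5; // traps

// Clamp at zero instead of trapping:
func saturatingSub(x : Nat, y : Nat) : Nat {
  if (y > x) { 0 } else { x - y };
};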
