ICDevs.org - Bounty #14 - Big SHA-KECCAK

This is the third bounty in a series of bounties we are releasing this week in the run-up to Motoko Bootcamp. Winners of the Bootcamp’s Intermediate level will get the first crack at selecting one of these bounties to complete.

Big SHA256-KECCAK - #14

Current Status: Discussion

  • Discussion (02/22/2022)
  • Ratification
  • Open for application
  • Assigned
  • In Review
  • Closed

Official Link - Discussion

Bounty Details

  • Bounty Amount: $1,000 USD of ICP at award date - $1000 USD of ICP Match Available
  • ICDevs.org DFINITY Bounty Accelerator Grant Match Available: $1000 USD of ICP at award time - (For every ICP sent to 1a25118faf5e325df52b38ff021560e6e0232b816f6526eb575156a6cb5f1ce1, ICDevs.org will add $125 USD of ICP at award date to the bounty, up to the first 8 ICP donated, After 8 ICP, donations to the above address will add .25 ICP to this issue and .75 ICP to fund other ICDevs.org initiatives)
  • Project Type: Single Contributor
  • Opened: 02/20/2021
  • Time Commitment: Days
  • Project Type: Library
  • Experience Type: Intermediate - Motoko
  • Issue Type: Motoko Library

Description

This bounty gives the opportunity to

  • learn how sha256 works
  • learn how keccak works(Ethereum Hashing)
  • learn advanced motoko parsing and type management
  • learn about cycle management and multi step processes

The goal of this bounty is to create a workflow for calculating a sha256 and keccak in motoko for very large files or data structures. SHA256 is a cryptographic hash function that outputs a value that is 256 bits long for a given data that can be verified by other processes that have access to the data. Keccak is a hash algo used in calculating hashes on the EVM.

More info about SHA256 can be found at:

https://www.n-able.com/blog/sha-256-encryption

More info about Keccak can be found at https://keccak.team/specifications.html

We already have a SHA256 library/crypto libraries for motoko at:

https://github.com/enzoh/motoko-sha
https://github.com/aviate-labs/crypto.mo

The current library currently fails if the data passed in is too large. The canister will run out of cycles.The developer will need to create a library to support computing the hash over large data arrays. You will need to use a library like Pipelinify.mo. Using pipelinify is not required if another, better method is available. The bounty applicant may also enhance pipelinify.mo for their needs.

The library will need to provide the following functions(or equivalent with a different multi-step processing library):


sha256_bytes([Nat8]) -> Pipelinify.ProcessResponse; //sets up a sha process for a large Nat8 array
sha256_blob(Blob) -> Pipelinify.ProcessResponse; //sets up a sha process for a blob
sha256_datazone(Buffer<Buffer<Nat8>>) -> Pipelinify.ProcessResponse; //sets up a sha process for a buffer of byte buffers

sha256_process(Pipelinify.StepRequest) -> Result<ProcessResponse,ProcessError>; //executes a step

sha256_result(Pipelinify.ProcessingStatusRequest) -> [Nat8];

clear_cache(?Pipelinify.ProcessingStatusRequest) -> bool; //clean out any pipelinify cache for a status request, or the entire cache if null

keccak_bytes([Nat8]) -> Pipelinify.ProcessResponse; //sets up a sha process for a large Nat8 array
keccak_blob(Blob) -> Pipelinify.ProcessResponse; //sets up a sha process for a blob
keccak_datazone(Buffer<Buffer<Nat8>>) -> Pipelinify.ProcessResponse; //sets up a sha process for a buffer of byte buffers

keccak_process(Pipelinify.StepRequest) -> Result<ProcessResponse,ProcessError>; //executes a step

keccak_result(Pipelinify.ProcessingStatusRequest) -> [Nat8];

clear_cache(?Pipelinify.ProcessingStatusRequest) -> bool; //clean out any pipelinify cache for a status request, or the entire cache if null

The developer will need to experiment with process chunking strategies and make recommendations about how much data can be processed during one process cycle. Hopefully, we can remove this requirement once deterministic time slicing is integrated.

The final delivery should include test cases for both single step and multistep calculation.

The output of this library will be important for future bounties around calculating EVM witnesses.

The package should be deployed as a vessel package.

To apply for this bounty you should:

  • Include links to previous work writing tutorials and any other open-source contributions(ie. your github).
  • Include a brief overview of how you will complete the task. This can include things like which dependencies you will use, how you will make it self-contained, the sacrifices you would have to make to achieve that, or how you will make it simple. Anything that can convince us you are taking a thoughtful and expert approach to this design.
  • Give an estimated timeline on completing the task.
  • Post your application text to the Bounty Thread

Selection Process

The ICDevs.org developer’s advisors will propose a vote to award the bounty and the Developer Advisors will vote.

Bounty Completion

Please keep your ongoing code in a public repository(fork or branch is ok). Please provide regular (at least weekly) updates. Code commits count as updates if you link to your branch/fork from the bounty thread. We just need to be able to see that you are making progress.

The balance of the bounty will be paid out at completion.

Once you have finished, please alert the dev forum thread that you have completed work and where we can find that work. We will review and award the bounty reward if the terms have been met. If there is any coordination work(like a pull request) or additional documentation needed we will inform you of what is needed before we can award the reward.

Bounty Abandonment and Re-awarding

If you cease work on the bounty for a prolonged(at the Developer Advisory Board’s discretion) or if the quality of work degrades to the point that we think someone else should be working on the bounty we may re-award it. We will be transparent about this and try to work with you to push through and complete the project, but sometimes, it may be necessary to move on or to augment your contribution with another resource which would result in a split bounty.

Funding

The bounty was generously funded by the DFINITY Foundation Accelerator. If you would like to turbocharge this bounty you can seed additional donations of ICP to 1a25118faf5e325df52b38ff021560e6e0232b816f6526eb575156a6cb5f1ce1. ICDevs will match the bounty 5:1 for the first 8 ICP out of the DFINITY grant and then 0.25:1 after that. All donations will be tax deductible for US Citizens and Corporations. If you send a donation and need a donation receipt, please email the hash of your donation transaction, physical address, and name to [email protected]. More information about how you can contribute can be found at our donations page.

General Bounty Process

Discussion

The draft bounty is posted to the DFINITY developer’s forum for discussion

Ratification

The developer advisor’s board will propose a bounty be ratified and a vote will take place to ratify the bounty. Until a bounty is ratified by the Dev it hasn’t been officially adopted. Please take this into consideration if you are considering starting early.

Open for application

Developers can submit applications to the Dev Forum post. The council will consider these as they come in and propose a vote to award the bounty to one of the applicants. If you would like to apply anonymously you can send an email to austin at icdevs dot org or sending a PM on the dev forum.

Assigned

A developer is currently working on this bounty, you are free to contribute, but any splitting of the award will need to be discussed with the currently assigned developer.

In Review

The Dev Council is reviewing the submission

Awarded

The award has be been given and the bounty is closed.

Matches

DFINITY Accelerator Grant: - $1000 USD of ICP at award date

Other ICDevs.org Bounties

4 Likes

Hi, Skillshare
Aviate-labs has an implementation with SHA256 (crypto.mo ). For large arrays [Nat8] (files) this is not very effective in IC — for me personally, a request from the WEB took a lot of time. Therefore, I wrote crc8 so that there would be at least some kind of data integrity control. Maybe someone will come up with something interesting, but there are not very many options here. I can’t say anything about Keccak. In general, very interesting grants and proposals from you have been sent. By the way, I’m on the wallet, look at the topic.

1 Like

Quint added a paging method to the SHA256, but it still needs some infrastructure around it to make it practical to call between consensus rounds.

“Skilesare” please clarify what problems the infrastructure should solve. So I understand that with heavy (long) calculations, synchronization between nodes fails?

Each “round” of calculation has a cycle limit. If you hit this cycle limit then your call fails. For long running functions, you have to ‘chunk’ them across calls. Basically, Initiate_call(), process(processID), process(processID), process(processID), etc. until you get a finished status. See GitHub - skilesare/pipelinify.mo: Move data chunks between canisters

2 Likes

Is it worth waiting for the following feature instead?

I’ve been thinking about this and I think that blocking is going to be an issue. Say you are recalculating an index on GBs of data that requires a table scan. Maybe time slicing let’s you ignore chunking, but isn’t your canister going to be blocked during the calculation? I’m not sure how it will work, but I’m thinking that the inherent structure of the ic is going to demand this kind of functionality eventually.

Ok.
Interested questions:

  1. How many cycles per round is the maximum?
  2. Exactly how long does it take?

I understand we can’t listen to the system calls of the beginning and end of the round? That is, there is no such thing in the IC API?
Even an implemented library in this case will require the preparation of input data.

The max data that can be uploaded at a time is 2MB. Generally I stage this data I’m a byte buffer. I store large files I’m a buffet of buffets. When I have to process it I iterate over those two buffers keeping track of my spot across each call. See the pipelining.mo package for a schéma for keeping track of things, and the CandyLibrary for the workspace object that creates a…well…workspace.

A round is about 2 seconds. The number of cycles are defined in the ic code.