As far as I can see, nothing in this manipulation would be dynamic – it would be a step of the build/linking process. Or perhaps I misunderstand what you mean by that.
To be sure, shifting indices is what tools like wasm-merge (a static linker that’s part of Binaryen) also do as part of their normal operation, nothing shady about that – programming language nerds would simply call this alpha-renaming :).
That said, a tool like the one described shouldn’t even need to do that if it simply inserts the additional data segments at the end?
I had issues inserting active data segments at the end: even after manually inspecting everything and ensuring all indices were correct, a seemingly spurious Rust ownership/borrowing error would occur where it shouldn’t have. If I removed the data section, the error would disappear. After studying all the ways to accomplish this, it seemed I couldn’t accurately update the data section without essentially doing what the Rust compiler would do, so why not just use the Rust compiler? But then we’re back to the original problem I aim to solve, which is removing the need to ship a Rust compiler environment.
By dynamic I just mean I have to manipulate the Wasm binary “dynamically” to add the functions into the binary and create their exports.
I think the missing piece is really adding the data segment into the binary. An active segment doesn’t seem very easy to do; I was definitely aware of passive segments but didn’t pursue them, for some reason I don’t remember.
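For anyone following along, a passive segment carries no offset expression, so appending one mostly means bumping the segment count rather than touching existing segments. Here is a rough sketch of the raw encoding, purely as an illustration (this is not the actual tooling being discussed):

```rust
// Minimal sketch: encode a *passive* data segment in the raw Wasm binary format
// so it can be appended to the data section. Passive segments are copied into
// linear memory at runtime with `memory.init`, so no placement offset is needed.
// Note: if the module has a DataCount section (id 12), its count must also be
// bumped to match the new number of segments.

fn encode_u32_leb128(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80);
    }
}

/// One passive data segment: flag 0x01 followed by a length-prefixed byte vector.
fn encode_passive_segment(payload: &[u8]) -> Vec<u8> {
    let mut seg = vec![0x01];
    encode_u32_leb128(payload.len() as u32, &mut seg);
    seg.extend_from_slice(payload);
    seg
}

fn main() {
    let js_source = b"export const greet = () => 'hello';";
    let segment = encode_passive_segment(js_source);
    println!("encoded segment is {} bytes", segment.len());
}
```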
Another way to do it, which @ulan has suggested, is to create a static array in Rust and initialize it with something like 10 MiB of zeros, so the data segmenting would all be done correctly by the compiler. As long as less than 10 MiB of source code is needed, I would then just write into that segment. But this would guarantee at least a 10 MiB binary, which would compress very well when gzipped, but still.
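To make that concrete, the reserved-buffer idea could look roughly like this (the symbol name and the length-prefix convention are just my own assumptions for illustration):

```rust
// Rough sketch: reserve a fixed-size static that the build tool can locate in
// the compiled Wasm and overwrite with the JS source.
// Caveat: an all-zero static may be placed in .bss and never show up as a data
// segment, so a non-zero filler or marker bytes might be needed to force the
// linker to emit an actual segment to patch.
#[no_mangle]
#[used]
static JS_CODE: [u8; 10 * 1024 * 1024] = [0u8; 10 * 1024 * 1024];

/// Read the embedded source back out, assuming the build tool patched the
/// first four bytes with the payload length (little-endian).
fn embedded_js() -> &'static [u8] {
    let len = u32::from_le_bytes(JS_CODE[0..4].try_into().unwrap()) as usize;
    &JS_CODE[4..4 + len]
}

fn main() {
    println!("reserved {} bytes, payload is {} bytes", JS_CODE.len(), embedded_js().len());
}
```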
It’s a hack still, I would love to have an elegant solution.
Hm, I can’t speak to the specific problems Rust was creating with this, other than saying that Rust is obviously not a great language for writing compilers and similar tools in. In principle, this transformation should be fairly straightforward to implement. FWIW, I don’t see how passive segments change it much.
I looked for existing tools that might help here. Wizer seems promising:
First we instantiate the input Wasm module with Wasmtime and run the initialization function. Then we record the Wasm instance’s state:
What are the values of its globals?
What regions of memory are non-zero?
Then we rewrite the Wasm binary by initializing its globals directly to their recorded state, and removing the module’s old data segments and replacing them with data segments for each of the non-zero regions of memory we recorded.
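Just to illustrate that second step, the “which regions of memory are non-zero?” pass is conceptually simple; a rough sketch (not Wizer’s actual code, and the `min_gap` coalescing threshold is my own assumption) of what it boils down to:

```rust
// Scan the instantiated module's linear memory for runs of non-zero bytes.
// Each run becomes one new active data segment in the rewritten binary.
// `min_gap` controls how many consecutive zero bytes are needed before a run
// is split into two separate segments (small gaps get coalesced).
fn non_zero_regions(memory: &[u8], min_gap: usize) -> Vec<(usize, Vec<u8>)> {
    let mut regions: Vec<(usize, Vec<u8>)> = Vec::new();
    let mut i = 0;
    while i < memory.len() {
        if memory[i] == 0 {
            i += 1;
            continue;
        }
        // Start of a non-zero run; extend it until a long enough gap of zeros.
        let start = i;
        let mut end = i + 1;
        let mut zeros = 0;
        while end < memory.len() && zeros < min_gap {
            if memory[end] == 0 { zeros += 1; } else { zeros = 0; }
            end += 1;
        }
        end -= zeros; // trim the trailing zeros we scanned past
        regions.push((start, memory[start..end].to_vec()));
        i = end;
    }
    regions
}

fn main() {
    let mut mem = vec![0u8; 64];
    mem[10..14].copy_from_slice(b"init");
    mem[40] = 7;
    for (offset, bytes) in non_zero_regions(&mem, 8) {
        println!("segment at offset {offset}: {} bytes", bytes.len());
    }
}
```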
@lastmjs: I wonder if the tool would work for you. If not, I think we could write something similar but tailored to your case that can embed bytecode and also export static endpoints.
IIUC, in your use case, developers need to upload a new Wasm binary and use standard canister installation. In such cases, dynamic endpoints might be overkill since we have the new binary anyway.
That said, dynamic endpoints might still be useful for interpreted languages to enable completely new use cases. For example, running multiple versions of the application in the same canister (e.g. for A/B testing or for zero-downtime upgrades). This would work if Azle supported multiple JS contexts in the same JS engine. I am not sure if there is an appetite for such use cases in the community right now.
It currently uses a Rust crate, wasm-transform, to mutate the Wasm, and this is code that I’ve copied over from our instrumentation code in the replica. But we could easily publish it as a standalone crate if you’d find it useful.
After thinking and discussing with @ulan, I will pursue the Wasm binary modification again for now. If I can continue to get help as I go, then hopefully we can get this sorted.
Any limitations should become apparent, and then we can weigh those against possible protocol changes; as @ulan said, dynamic method registration may provide more use cases down the road.
I think it would be better to have it as a global, but I just ran into some issues when trying to make it work. I’m sure it’s doable, but I went with the function for this simple example because I got it working faster.
It should also be doable with an active segment, but I think it might require some more complicated logic when you inject the new data segment. Since the segment is active, you’ll need to decide exactly where it ends up in the Wasm memory when you do the injection, and you’ll need to make sure that that region doesn’t later get overwritten by the Rust stack or heap while the module executes. A Wasm module compiled from Rust seems to have globals called __stack_pointer, __data_end, and __heap_base, so it might be enough to place the data at __data_end and then increment both __data_end and __heap_base, but I haven’t tried that.
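For what it’s worth, the bookkeeping could look roughly like this, assuming a stack-first layout where the heap starts right after __data_end (with other layouts the shadow stack can sit between the two, which is exactly the hazard above; this is untested):

```rust
// Sketch of the placement arithmetic: put the new active segment at the current
// __data_end, then bump __data_end and __heap_base so the Rust heap never grows
// into the injected data. Field names are just for illustration.
struct Layout {
    data_end: u32,  // value of the __data_end global
    heap_base: u32, // value of the __heap_base global
}

/// Returns (segment_offset, new_layout) for an injected payload of `len` bytes.
/// `align` must be a power of two.
fn place_after_data_end(layout: Layout, len: u32, align: u32) -> (u32, Layout) {
    // Round the placement up to the requested alignment.
    let offset = (layout.data_end + align - 1) & !(align - 1);
    let new_end = offset + len;
    (
        offset,
        Layout {
            data_end: new_end,
            // Only push __heap_base up if the new data would otherwise overlap it.
            heap_base: layout.heap_base.max(new_end),
        },
    )
}

fn main() {
    // Stack-first style layout: heap begins right after the data.
    let layout = Layout { data_end: 1_048_576, heap_base: 1_048_576 };
    let (offset, new) = place_after_data_end(layout, 64 * 1024, 8);
    println!("segment at {offset}, data_end={}, heap_base={}", new.data_end, new.heap_base);
}
```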
Is there any reason you’d prefer to have it as an active data segment?
I would prefer whatever works best. I had the manipulation itself working with an active segment, but the resulting module just didn’t work, I’m assuming for similar reasons to what you described.
I just don’t remember why I didn’t pursue passive segments, so just curious. I’m excited to try passive segments.
I really appreciate all of the help, I’ll be getting to this soon.
There is still a major problem here that could be simplified if we had dynamic canister method registration. We haven’t dug into the Wasm binary manipulation yet, but we are trying to go to 1.0 soon and this will probably be included.
The problem is that even getting the names of the methods to export in the Wasm binary requires either dynamically executing the canister’s JS code, which is error-prone and could add to compilation/build time, or writing a compiler to extract this information. Writing that compiler is non-trivial to do correctly; it’s how Azle started, and we abandoned it after ~18 months because of the complexity of achieving a full solution.
Dynamic canister method registration would make this whole process simple. We could just use a pre-compiled Wasm binary, swap out JS code, and deploy. During init the canister would register its methods.
No need to execute the code twice, no need for a complicated static analysis pass, no need for a more complicated Wasm binary manipulation.
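Purely hypothetically, to show how little glue code this would need on the canister side (none of these functions exist in the system API today; all the names below are made up):

```rust
// Hypothetical sketch only: during init, walk the parsed JS exports and register
// one callback per method, instead of baking method names into the Wasm export
// section at build time.
fn canister_init(js_method_names: &[&str]) {
    for &name in js_method_names {
        // Made-up system call: associate `name` with a shared dispatcher that
        // looks the method up in the embedded JS context when it is called.
        ic0_register_update_method(name, dispatch_js_update);
    }
}

// Stubs so the sketch is self-contained; not real IC system API functions.
fn ic0_register_update_method(_name: &str, _cb: fn(&str)) {}
fn dispatch_js_update(_method: &str) {}

fn main() {
    canister_init(&["get_message", "set_message"]);
}
```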
I would love it if we could reconsider adding dynamic canister method registration…or as an alternative, default canister method callbacks could possibly work.
We have the binary manipulation working very well, just figuring out those last few issues to get our test suite passing.
I would still like to push for dynamic canister method registration. Right now we are still required to deploy an entirely new binary every time the canister’s methods change. If we could dynamically change the canister methods we could just use update calls to completely swap out a canister at runtime. This may provide better latency than having to go through the whole deploy and init process.