Improving the Stable-FS: Enhancements and Optimizations for Better Performance

Hello everyone,

Over the past few months, I’ve been working on improving the current implementation of stable-fs, and I’m excited to share the new features and optimizations I’ve introduced.

TLDR

Stable-fs is the backend of ic-wasi-polyfill. Together with the wasi2ic project, it makes it possible to adapt various WASI projects to run on the Internet Computer. The file system is implemented using a few BTreeMap stable structures and can use stable memory for larger storage and for preserving canister data between upgrades.

Various optimizations and some interesting new features were implemented. It is now possible to write a 100MB file onto the file system within the 2B instruction limit.

Some other interesting new features: mounted memory files (for faster read/write operations), the ability to configure the chunk size, and the ability to limit the maximum allowed file size.

The new version (0.7) is backwards compatible and should just work with the older storage after canister upgrade.

I don’t consider it production-ready yet, and would recommend it for non-critical projects. At the same time, I think it is quite stable and should be able to handle reasonably heavy loads.

Your feedback is welcome!

What is Stable-FS?

Stable-fs is a file system implementation that serves as the backend for the ic-wasi-polyfill library, which the wasi2ic tool uses to convert and run WASI applications on the Internet Computer (IC).

The initial implementation of stable-fs used a combination of a stable Cell and three BTreeMap structures to manage file system components such as the folder structure, file metadata, and file contents stored in data chunks:

struct StableFileSystem {
    header: Cell<Header>,
    metadata: BTreeMap<Node, Metadata>,
    dir_entry: BTreeMap<(Node, DirEntryIndex), DirEntry>,
    file_chunk: BTreeMap<(Node, FileChunkIndex), FileChunk>,
}
  • header : Stores general information like the file system version and the next node ID to use.

  • metadata : Contains details like file type, size, and reference count.

  • dir_entry : Represents folder entries within parent nodes.

  • file_chunk : Holds 4KB chunks of file data.

While functional, this initial implementation had some performance problems, and to address them I implemented a few new features and improvements.

Mounted virtual memory

As a first step to address the performance concerns, I implemented mounted memory files. This allows a user to configure files to act as hosts for virtual memories. Once configured, the file system forwards all read and write operations on such a file to one of the virtual memories, bypassing several overheads and working considerably faster than the chunked file storage. This makes mounted files particularly suitable for use cases with a few large files requiring high-speed random access, such as SQLite databases.

While implementing the mounted memories, I wanted to give as much flexibility as possible in how they are used. A memory can be attached at runtime over an existing file. There is functionality to transfer an existing mounted memory into the chunked storage, or to initialize a new mounted memory with the data from the host file’s chunked storage. Alternatively, it is possible to use the mounted storage without synchronizing it with the chunked storage: after a canister upgrade, simply re-mount it again and no data will be lost, because the current file size is stored in the mounted file’s own metadata.

While this approach has limitations, i.e. the limited number of available virtual memories, it sets a good performance baseline for comparisons with the chunked file storage.
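For illustration, here is a minimal sketch of how a canister might dedicate one of its virtual memories to a mounted file. The helper names mount_memory_file and store_memory_file (and their signatures) are assumptions based on the description above rather than confirmed API, so please check the ic-wasi-polyfill documentation for the exact calls:

use ic_stable_structures::{
    memory_manager::{MemoryId, MemoryManager},
    DefaultMemoryImpl,
};
use std::cell::RefCell;

thread_local! {
    static MEMORY_MANAGER: RefCell<MemoryManager<DefaultMemoryImpl>> =
        RefCell::new(MemoryManager::init(DefaultMemoryImpl::default()));
}

fn mount_db_file() {
    // Dedicate one virtual memory to the database file (the id 10 is arbitrary).
    let db_memory = MEMORY_MANAGER.with(|m| m.borrow().get(MemoryId::new(10)));

    // Assumed helper: forward all reads/writes of "db.sqlite" to the mounted memory.
    ic_wasi_polyfill::mount_memory_file("db.sqlite", Box::new(db_memory));

    // Assumed helper: optionally copy the mounted data back into the chunked
    // host file, e.g. before unmounting it.
    // ic_wasi_polyfill::store_memory_file("db.sqlite");
}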

New chunked memory storage

For use cases where there are many files in the file system, a faster chunked storage is still important. There are a few overheads to deal with (see the sketch after this list):

  • Serialization Overhead: Reading or writing even small amounts of data requires processing entire chunks.

  • Search Overhead: Chunks have to be located within a BTreeMap.

  • Insertion/Re-balancing Overhead: Adding new chunks to the BTreeMap requires locating and repositioning existing nodes and re-balancing the tree, which adds extra overhead.
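To make the first two overheads concrete, here is a rough sketch of what a small write looks like when whole chunks are stored as BTreeMap values. It uses a plain std::collections::BTreeMap and made-up type aliases, not the actual stable-fs code; with a stable BTreeMap, the get and insert calls additionally (de)serialize the whole 4KB value:

use std::collections::BTreeMap;

// Illustrative stand-ins for the V1 layout (not the real stable-fs types).
type Node = u64;
type FileChunkIndex = u32;
const CHUNK_SIZE: usize = 4096;
type FileChunk = [u8; CHUNK_SIZE];

// Writing even a few bytes means locating the chunk in the tree, copying the
// whole chunk out, patching it, and storing the whole chunk back.
// (For simplicity, this sketch assumes the write fits inside one chunk.)
fn write_small(
    chunks: &mut BTreeMap<(Node, FileChunkIndex), FileChunk>,
    node: Node,
    offset: u64,
    data: &[u8],
) {
    let index = (offset / CHUNK_SIZE as u64) as FileChunkIndex;
    let start = (offset % CHUNK_SIZE as u64) as usize;

    // Search overhead: locate the chunk, then copy the entire 4 KB value out.
    let mut chunk = *chunks.get(&(node, index)).unwrap_or(&[0u8; CHUNK_SIZE]);
    chunk[start..start + data.len()].copy_from_slice(data);

    // Insertion overhead: store the entire chunk back (re-balancing the tree
    // when the chunk is new).
    chunks.insert((node, index), chunk);
}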

New Chunked Storage Structure

Inspired by discussions with @Ulan, I introduced a more efficient data structure split across three virtual memories:

struct V2FileChunks {
    // (file node, chunk index) -> pointer into the chunks memory
    chunk_ptr: BTreeMap<(Node, FileChunkIndex), FileChunkPtr>,
    // raw virtual memory holding the fixed-size chunk contents
    chunks: VirtualMemory,
    // hands out chunk pointers and keeps track of freed ones for reuse
    allocator: ChunkPtrAllocator,
}

The chunk_ptr BTreeMap now stores only pointers instead of the actual data, which reduces the serialization overhead. Also, as long as chunks are not deleted, the pointers to the chunks remain constant and can be cached for faster rewrites.

The chunks memory stores the actual file chunks of a fixed size. Direct access to this memory allows reading and writing small portions of data: there is no need to rewrite the whole chunk, and no multiple copies are made (the data is read or written only once, between the source and the destination memory).
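As a rough sketch (assuming the Memory trait from ic-stable-structures, which exposes read and write at a byte offset), a small V2 write touches only the bytes it changes:

use ic_stable_structures::Memory;

// Write a few bytes at `chunk_ptr + offset_in_chunk` inside the chunks memory.
// Unlike the V1 path, the rest of the chunk is never copied or re-serialized.
// (Sketch only: it assumes the chunk was already allocated, so the target
// range lies within the memory's current size.)
fn write_into_chunk<M: Memory>(chunks: &M, chunk_ptr: u64, offset_in_chunk: u64, data: &[u8]) {
    chunks.write(chunk_ptr + offset_in_chunk, data);
}

fn read_from_chunk<M: Memory>(chunks: &M, chunk_ptr: u64, offset_in_chunk: u64, buf: &mut [u8]) {
    chunks.read(chunk_ptr + offset_in_chunk, buf);
}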

Finally, the allocator is responsible for managing freed chunks. It keeps all released pointers on a stack and offers them for reuse when its allocate function is called.

If no freed chunks are available, the allocator hands out the next not-yet-used address from the chunks memory, which grows the chunks memory as a result.
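A simplified, in-memory sketch of this allocation scheme (the names are made up, and the real ChunkPtrAllocator persists its state in stable memory):

// Freed chunk pointers are kept on a stack for reuse; otherwise the next
// never-used offset is handed out, which grows the chunks memory.
struct AllocatorSketch {
    chunk_size: u64,
    next_unused: u64,    // first offset in the chunks memory not handed out yet
    free_list: Vec<u64>, // pointers released when chunks were deleted
}

impl AllocatorSketch {
    fn allocate(&mut self) -> u64 {
        if let Some(ptr) = self.free_list.pop() {
            ptr // reuse a previously freed chunk
        } else {
            let ptr = self.next_unused;
            self.next_unused += self.chunk_size; // the chunks memory grows here
            ptr
        }
    }

    fn free(&mut self, ptr: u64) {
        self.free_list.push(ptr); // remember the pointer for later reuse
    }
}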

Chunk size matters!

Experimenting revealed that the chunk size significantly affects the chunk insertion overhead. For instance, doubling the chunk size halved the number of insertions needed for the same amount of data.

I’ve increased the default chunk size from 4KB to 16KB, enabling the writing of a 100MB file in under 2 billion instructions.

For flexibility, users can configure the chunk size during canister initialization (4K to 64K), before anything is written to the file system. With the chunk size set to 64K, writing 100 MB can be done in under 1B instructions.
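As a quick back-of-the-envelope check (illustrative only), the number of chunk insertions needed to write a 100 MB file at each supported chunk size:

fn main() {
    let file_size: u64 = 100 * 1024 * 1024; // 100 MB
    for chunk_kb in [4u64, 8, 16, 32, 64] {
        let chunk_size = chunk_kb * 1024;
        let insertions = file_size.div_ceil(chunk_size);
        println!("{chunk_kb:>2} KB chunks -> {insertions:>5} chunk insertions");
    }
    // 4 KB -> 25600, 16 KB -> 6400, 64 KB -> 1600:
    // every doubling of the chunk size halves the number of BTreeMap insertions.
}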

Summary of new features available in v0.7

  • Backward compatibility with older versions.

  • Storing a mounted memory into its host file, and initializing a mounted memory from the host file’s chunked storage.

  • Switching between V1 and V2 chunks if needed (V2 is the default, and there is probably no good reason to switch back to the older V1 chunks other than for performance comparisons).

  • Configuring chunk size (possible settings: 4K, 8K, 16K, 32K, 64K).

  • Sparse file support.

  • Maximum allowed file size setting.

Summary of optimizations

  • Caching: metadata, file descriptors, and chunk pointers are cached for faster access times.

  • Caching missing chunks (if a chunk was not found, this information is also cached).

  • Look-ahead pointer caching for reading and writing (we only search for the first chunk, then use the iterator’s next to find the following chunks; see the sketch after this list).

  • Faster metadata read/write (no serialization, the object is stored directly in a dedicated memory area).

  • Chunk storage optimization for metadata (we use the same V2 file chunks to also store file metadata).
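As a rough illustration of the look-ahead idea (sketched with std::collections::BTreeMap and made-up type aliases rather than the actual stable-fs code): the tree is searched once for the first chunk of the requested range, and the following chunk pointers come from the same iterator instead of a separate lookup per chunk.

use std::collections::BTreeMap;

// Illustrative stand-ins for the stable-fs key/value types.
type Node = u64;
type FileChunkIndex = u32;
type FileChunkPtr = u64;

// Collect the chunk pointers covering `first..=last` for one file: a single
// range search finds the first chunk, and the rest are pulled from the same
// iterator instead of doing a fresh tree lookup for every chunk index.
fn chunk_ptrs_in_range(
    chunk_ptr: &BTreeMap<(Node, FileChunkIndex), FileChunkPtr>,
    node: Node,
    first: FileChunkIndex,
    last: FileChunkIndex,
) -> Vec<(FileChunkIndex, FileChunkPtr)> {
    chunk_ptr
        .range((node, first)..=(node, last))
        .map(|(&(_, index), &ptr)| (index, ptr))
        .collect()
}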

Performance comparisons

Here’s how the stable-fs versions compare when reading or writing 100 MB of data with different access patterns: the older version (v0.4), the current version (v0.7), and the current version using mounted files (numbers are instruction counts; B = billions, M = millions):

| Read/write 100 MB of data | v0.4 | v0.7 | v0.7 mounted files |
| --- | --- | --- | --- |
| Write into a new file | 14.51 B | 1.32 B | 124.45 M |
| Overwrite existing file | 10.52 B | 142.73 M | 100.01 M |
| Read in a single data segment | 4.00 B | 142.34 M | 100.02 M |
| Write 1 KB segments into a new file | 79.48 B | 2.26 B | 918.28 M |
| Overwrite an existing file in multiple 1 KB segments | 57.42 B | 655.89 M | 519.91 M |
| Read multiple 1 KB segments | 22.59 B | 621.00 M | 476.11 M |
| Write 1 KB segments into 10 new files (10 MB each) | 81.39 B | 2.29 B | 1.11 B |
| Overwrite 1 KB segments in 10 files | 57.61 B | 819.30 M | 699.55 M |
| Read 1 KB segments from 10 files | 22.93 B | 736.88 M | 655.03 M |

The first three tests check the performance of writing and reading one 100 MB data segment in a single call. The remaining tests estimate the performance of making multiple 1024-byte calls on one or several files.

As shown, there was a significant improvement from version v0.4 to v0.7.

As expected, the mounted-files baseline is faster than the chunked storage. The big difference shows up for newly created files, where the chunked storage needs an extra 1 billion instructions to create the file and insert all the chunks into the tree. On the other hand, reading files or writing over existing data (which does not require chunk node insertions) is relatively fast and adds only a few million instructions of overhead.

SQLite database benchmarks

The database consists of two tables: users(user_id, username, email) and orders(order_id, user_id, amount).

| Operation | Regular files | Database on mounted memory |
| --- | --- | --- |
| Create 100 000 users | 2.35 B | 2.30 B (improved by 2.23%) |
| Create 1M orders (each refers to one of the users, no extra indexes present) | 36.28 B | 35.99 B |
| Create indexes on users.email and orders.user_id when orders is filled with data | 6.98 B | 6.78 B (improved by 2.76%) |
| Make a joint selection: SELECT u.user_id, u.username, o.order_id, o.amount FROM users u JOIN orders o ON u.user_id = o.user_id WHERE u.user_id < 1000 ORDER BY o.created_at DESC; | 247.81 M | 217.42 M (improved by 12.26%) |
| Select using "like" on an indexed field: SELECT * FROM users WHERE email LIKE 'user%' | 856.80 M | 846.90 M |
| Create 100 extra orders after there were already 1M orders and the field indexes were created | 32.37 M | 18.39 M (improved by 43.20%) |
| Remove 1000 orders (each user has 10 orders): DELETE FROM orders WHERE user_id <= 100 | 90.10 M | 53.32 M (improved by 40.82%) |
| Delete 100000 orders with transaction rollback: BEGIN TRANSACTION; DELETE FROM orders WHERE order_id > 900000; ROLLBACK | 6.42 B | 6.30 B |

Interestingly, adding records in bulk did not benefit much from the mounted files. The main advantage was seen in operations involving random memory access.

Is it production-ready?

I don’t think it is production-ready yet. I’ve made my best effort to make the system durable, and I would recommend using it on various non-critical projects and sharing your experience. I plan to continue improving the existing implementation and adding more tests to check its WASI compliance, and I would be happy to receive your feedback.


Great news to finally have something resembling a file system on a canister.

A question regarding the standard Rust code to create a directory, change to that directory, and then open a file and write to it. Below is the code:

use std::fs::{self, File};
use std::io::Write;
use std::env;
use std::path::Path;

fn main() -> std::io::Result<()> {
    // Create a directory named "data"
    fs::create_dir_all("data")?;
    
    // Get the current directory
    let original_dir = env::current_dir()?;
    println!("Starting directory: {:?}", original_dir);
    
    // Change to the data directory
    env::set_current_dir("data")?;
    
    // Verify we're in the new directory
    let current_dir = env::current_dir()?;
    println!("Current directory: {:?}", current_dir);
    
    // Create and write to a file in the data directory
    let file_path = Path::new("example.txt");
    let mut file = File::create(file_path)?;
    
    // Write some content to the file
    writeln!(file, "Hello from Rust.")?;
    writeln!(file, "This is a new line of text, should be there in a file.")?;

    // Make sure the data is written out
    file.flush()?;

    // Read the file back to check its contents
    let contents = fs::read_to_string(file_path)?;
    println!("File contents:\n{}", contents);

    Ok(())
}

What would be the equivalent code for a canister using Rust?


Great stuff as always @sgaflv!

I created a PR to add an example that does just what @josephgranata asks for, some basic fs operations.

Pasting here also for reference:

use ic_stable_structures::{
    memory_manager::{MemoryId, MemoryManager},
    DefaultMemoryImpl,
};
use std::{
    cell::RefCell,
    env,
    fs::{self, File},
    io::Write,
    path::Path,
};

thread_local! {
    static MEMORY_MANAGER: RefCell<MemoryManager<DefaultMemoryImpl>> =
        RefCell::new(MemoryManager::init(DefaultMemoryImpl::default()));
}

const WASI_MEMORY_ID: MemoryId = MemoryId::new(0);

fn init_wasi() {
    let wasi_memory = MEMORY_MANAGER.with(|m| m.borrow().get(WASI_MEMORY_ID));
    ic_wasi_polyfill::init_with_memory(&[0u8; 32], &[], wasi_memory);
}

#[ic_cdk::update]
fn greet(name: String) -> String {
    // Create a directory named "data"
    fs::create_dir_all("data").unwrap();

    // Get the current directory
    let original_dir = env::current_dir().unwrap();
    println!("Starting directory: {:?}", original_dir);

    // Change to the data directory
    env::set_current_dir("data").unwrap();

    // Verify we're in the new directory
    let current_dir = env::current_dir().unwrap();
    println!("Current directory: {:?}", current_dir);

    // Create and write to a file in the data directory
    let file_path = Path::new("example.txt");
    let mut file = File::create(file_path).unwrap();

    // Write some content to the file
    writeln!(file, "Hello from {}.", name).unwrap();
    writeln!(
        file,
        "This is a new line of text, should be there in a file."
    )
    .unwrap();

    file.flush().unwrap();

    fs::read_to_string(file_path).unwrap()
}

#[ic_cdk::init]
fn init() {
    init_wasi();
}

#[ic_cdk::post_upgrade]
fn post_upgrade() {
    init_wasi();
}

Example should probably be an update call instead of query call.


Yep, that is better. Changed.


The stable-fs API was created to support the wasm32-wasip1 target calls, and there is no concept of a “current folder”. For the most convenience, you can use the approach suggested by @kristofer: create a project with the ic-wasi-polyfill dependency, compile it, and use wasi2ic to remove the WASI dependencies; then the standard Rust functions should just work.

If you want to use stable-fs directly, the closest thing to a “current folder” is a directory file descriptor. When you create a directory, you can “open” it like you do with a file and then pass it as the parent FD to your file operations. This approach is a tiny bit faster, but you need to use the stable-fs API, which might still change in the future. Also, don’t forget to close the file descriptor once it is no longer needed.


Are we any closer to being able to wire these up and access them from a motoko canister?


I would assume, since Motoko is written specifically for the Internet Computer, there is no std::fs library you could even theoretically use. Nothing preventing anyone from building it though.

Thanks again for doing this. You made the library clear and easy to use with normal-looking Rust code, like the example I provided using the standard Rust library. This is super useful!

@sgaflv and @sea-snake thanks for the tips, the code is better because of it.

One final important question: how much space per canister would be available if we use these Rust calls with this virtual file system?

The file system lives inside several virtual memories of a virtual memory manager. As long as the file system can grow the virtual memory sizes, it can create more files with more data, so it is only limited by the stable memory available to the canister.

The file system creates its own instance of the virtual memory manager if you initialize it from a given memory.

Currently, the initial structure of an empty file system consumes 58 MB, plus about 10 MB more after the first writes into the file system. This is due to the memory manager and the resident stable data structures initializing themselves for work.

Then creating new files and folders will eventually consume more memory pages. I’ve created a vector-memory example that stores the file system into a Vec and can report its current size.

Initialization example:

use std::{
    fs::{self, File},
    io::Write,
    path::Path,
};

use ic_stable_structures::VectorMemory;

// initialize new vector memory from a vector (could be filled with some data already)
fn new_vector_memory(v: Vec<u8>) -> VectorMemory {
    use std::{cell::RefCell, rc::Rc};
    Rc::new(RefCell::new(v))
}

thread_local! {
    static MEMORY: VectorMemory = new_vector_memory(Vec::new());
}

fn init_wasi() {
    MEMORY.with(|m| {
        ic_wasi_polyfill::init_with_memory(&[0u8; 32], &[], m.clone());
    });
}

#[ic_cdk::init]
fn init() {
    init_wasi();
}

#[ic_cdk::update]
pub fn fs_size() -> u64 {
    MEMORY.with(|m| {
        // get fs memory
        let v = m.borrow();

        // return the current size of the memory
        v.len() as u64
    })
}

#[ic_cdk::update]
pub fn write_file(file_name: String, content: String) {
    let path = Path::new(&file_name);

    // Ensure parent directories exist
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent).unwrap();
    }

    // Create and write to the file
    let mut file = File::create(&file_name).unwrap();
    file.write_all(content.as_bytes()).unwrap();
}

Thanks a lot this is super useful!

One further question regarding maximum possible storage.

According to DFINITY’s documentation, canisters can hold 500 GiB if the subnet allows it. That would mean one canister taking over most or all of the subnet’s storage, so in practice it will always be less than 500 GiB, because there are usually other canisters in the subnet.

500 GiB equals 536,870,912,000 bytes, or about 536 Gigabytes.

Given that, as you explained, the instantiation of the virtual file system takes about 68 MB:

Maximum Virtual File System Size
= 536 GB - 68 MB
= 536,000 MB - 68 MB = 535,932 MB
= 535.93 GB

Thanks @sgaflv for correcting my math above.

In practice I do think it will be closer to 435 GB, given the fact that other canisters do run inside the same subnet and use storage space.

Is my analysis correct? Please let me know,

Joseph

P.S. For those wondering the 500 GiB maximum storage figure comes from the Internet Computer’s documentation here: Storage | Internet Computer

Just to note, 536 GB = 536,000 MB.


Thanks for fixing my math mistake, so the actual available space would be a maximum of:
536 GB - 68 MB = 536,000 MB - 68 MB = 535,932 MB, or about 535.9 GB.

For those who want to know how much space is available for your canister and your app, use the Internet Computer Dashboard and open the subnet; there you will see how much space has already been used.

Here is an example where 238 GiB (about 255.6 GB) were already used, so at most you would have roughly 535.9 GB - 255.6 GB ≈ 280 GB.