Hello everyone,
I wanted to share an idea for a potential feature that could be built on ICP, particularly around federated learning.
Imagine a university or hospital (let’s say University A) has access to sensitive lung scan data that could be used to train an AI model. However, due to privacy regulations — or because patients only gave consent for local use — this data can’t be shared externally. On top of that, the dataset might be large, making it expensive or impractical to transfer.
Now consider another institution (University B) that wants to train a lung cancer detection model but lacks sufficient data. Instead of transferring the data, the model could be sent to University A, where it trains directly on the local server. ICP could act as the coordination and verification layer for this process — ensuring trust, logging training metadata, and possibly even enforcing access control.
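To make that flow a bit more concrete, here is a minimal sketch (plain Python, not actual canister code) of the kind of record the coordination layer might keep per training round. The `TrainingRound` class, its field names, and `submit_update` are all hypothetical, just to illustrate logging training metadata without storing raw data or full weights:

```python
import hashlib
import json
import time
from dataclasses import dataclass, field


@dataclass
class TrainingRound:
    """Hypothetical record the coordination layer could keep for one round."""
    round_id: int
    model_hash: str                    # hash of the model sent to the data holder
    data_holder: str                   # e.g. "university-a"
    submitted_updates: dict = field(default_factory=dict)

    def submit_update(self, holder_id: str, weights_bytes: bytes, metrics: dict) -> None:
        """Log an update: keep only a hash of the weights plus training metadata."""
        self.submitted_updates[holder_id] = {
            "weights_hash": hashlib.sha256(weights_bytes).hexdigest(),
            "metrics": metrics,        # e.g. {"epochs": 5, "val_auc": 0.91}
            "timestamp": time.time(),
        }


# Example usage: University B registers a round, University A reports back.
round_1 = TrainingRound(round_id=1, model_hash="abc123", data_holder="university-a")
round_1.submit_update("university-a", b"...serialized weights...", {"epochs": 5, "val_auc": 0.91})
print(json.dumps(round_1.submitted_updates, indent=2))
```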
This approach could scale across multiple data holders, allowing a model to train across several institutions — all without ever exposing raw data. Smaller universities and research centers that lack data or infrastructure could greatly benefit from using models trained on shared (but secure) resources.
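For the multi-institution case, the standard aggregation step would be something like federated averaging. Below is a small FedAvg-style sketch, assuming each institution returns its weights as a list of NumPy arrays together with its local sample count; the names and toy numbers are purely illustrative:

```python
import numpy as np


def federated_average(updates):
    """
    Combine model updates from several data holders without seeing their raw data.

    `updates` is a list of (weights, n_samples) pairs, where `weights` is a list
    of NumPy arrays (one per layer) and `n_samples` is the local dataset size.
    Returns the sample-weighted average of the weights (FedAvg-style).
    """
    total = sum(n for _, n in updates)
    n_layers = len(updates[0][0])
    averaged = []
    for layer in range(n_layers):
        averaged.append(sum(w[layer] * (n / total) for w, n in updates))
    return averaged


# Toy example: two institutions, a one-layer "model".
update_a = ([np.array([1.0, 2.0])], 800)   # University A, 800 local scans
update_b = ([np.array([3.0, 4.0])], 200)   # University B, 200 local scans
print(federated_average([update_a, update_b]))  # -> [array([1.4, 2.4])]
```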
There could be a problem with data preprocessing, since the researchers building the model can't do much without knowing something about the data. Perhaps the institution hosting the data could share a short, de-identified snippet or a schema as an example.
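One way this could look is a small published descriptor instead of raw scans, along the lines of the sketch below (every field name and value here is made up, just to show the idea):

```python
# Hypothetical dataset descriptor a data holder could publish instead of raw scans.
lung_dataset_descriptor = {
    "name": "university-a-lung-scans",       # illustrative identifier
    "modality": "CT",
    "image_shape": [512, 512, 1],            # height, width, channels
    "dtype": "uint16",
    "labels": ["benign", "malignant"],
    "n_samples": 12000,
    "preprocessing": "HU windowing [-1000, 400], resampled to 1 mm spacing",
    # A few fully synthetic or de-identified example records, not real patients:
    "example_records": [
        {"label": "benign", "pixel_stats": {"mean": 0.12, "std": 0.31}},
        {"label": "malignant", "pixel_stats": {"mean": 0.18, "std": 0.29}},
    ],
}
```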
Training might be too expensive to run directly inside a canister, especially given the lack of GPU support and the potential costs involved. A practical solution could be to perform the training off-chain and then provide the updated model weights along with verifiable evidence that training was completed, such as the number of epochs, validation metrics, or logs. That said, smaller models might still be feasible to train directly on ICP, and in some cases even the data could be kept in decentralized storage such as Filecoin.
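For the off-chain route, the minimum artifact to record might look like the sketch below: hash the final weights and bundle them with the agreed training metadata, so the on-chain record can later be checked against what the data holder actually delivered (the function and field names are assumptions, not an existing API):

```python
import hashlib
import json


def build_training_attestation(weights_bytes: bytes, metadata: dict) -> dict:
    """
    Produce a compact record that could be stored on-chain after off-chain training.

    `weights_bytes` is the serialized model (e.g. a saved state dict);
    `metadata` holds whatever the parties agree to disclose (epochs, metrics, ...).
    Only hashes and metadata go on-chain; the weights themselves stay off-chain.
    """
    payload = {
        "weights_hash": hashlib.sha256(weights_bytes).hexdigest(),
        "metadata": metadata,
    }
    # Hash the whole payload too, so later tampering with the metadata is detectable.
    payload["record_hash"] = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
    return payload


attestation = build_training_attestation(
    b"...serialized model weights...",
    {"epochs": 5, "val_accuracy": 0.92, "dataset": "university-a-lung-scans"},
)
print(json.dumps(attestation, indent=2))
```

Of course, a hash like this only pins down what was delivered; the accompanying metrics and logs would still need to be checked by whatever verification logic the coordination layer runs.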
Furthermore, this could become a new type of benchmark. For example: "Our model achieved 92% accuracy on the Dfinity lung dataset."
I’d love to hear your thoughts on this concept and whether you see potential in exploring it further.