Hey @dawnnguyen, sorry for the late reply, I was on PTO. You’ve precisely described the approach and how it’s implemented on the DFINITY side.
The main limitations around input/output limits, context length, and model diversity stem from cost, maintenance, and latency considerations. If cost and maintenance aren’t concerns for you, you’re free to choose whatever serves your purpose best.
One key limitation is that data streaming isn’t supported. Unlike the experience you get when querying ChatGPT, where the response appears character by character as it’s generated, you’ll need to wait for the complete output text. This can result in noticeable delays depending on your model size, complexity, and input length.
Without going into too much detail, additional considerations might be:
Reliability: The off-chain worker becomes a single point of failure. If reliability matters to you, add monitoring, error handling, and redundancy.
Scalability: If usage grows, you might need to redesign the off-chain worker to scale accordingly.
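To make the reliability point above concrete, here is a minimal sketch of a retry-with-backoff loop an off-chain worker might use so that transient failures don't take the whole pipeline down. Everything here is illustrative: `poll_with_backoff` and `poll_once` are hypothetical names, not part of any DFINITY API, and a real worker would pair this with alerting rather than silently giving up.

```rust
use std::time::Duration;

// Hypothetical retry loop: `poll_once` stands in for whatever call the
// worker makes against the LLM canister. On repeated failure we back off
// exponentially and eventually return None so the caller can raise an alert.
fn poll_with_backoff<F>(mut poll_once: F, max_attempts: u32) -> Option<Vec<u64>>
where
    F: FnMut() -> Result<Vec<u64>, String>,
{
    let mut delay = Duration::from_millis(100);
    for _ in 0..max_attempts {
        match poll_once() {
            Ok(ids) => return Some(ids),
            Err(_) => {
                // Transient failure: wait, then double the delay.
                std::thread::sleep(delay);
                delay = delay.saturating_mul(2);
            }
        }
    }
    None // exhausted retries; monitoring should flag this
}

fn main() {
    // Simulate two transient failures followed by a success.
    let mut calls = 0;
    let result = poll_with_backoff(
        || {
            calls += 1;
            if calls < 3 {
                Err("transient error".to_string())
            } else {
                Ok(vec![42])
            }
        },
        5,
    );
    assert_eq!(result, Some(vec![42]));
    println!("recovered after {calls} attempts");
}
```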
@aespieux The issue with dfx deps pull failing on versions 0.26.1 and higher has been resolved in dfx 0.28.0. The problem was related to the switch from replica to PocketIC in versions above 0.25.0, which caused topology differences that prevented dependency pulling.
You should now be able to upgrade to dfx 0.28.0 and successfully pull the LLM canister dependency. Please let me know if you encounter any issues after upgrading.
Hi @ddave, after implementing a POC I ran into some issues; could you please give me some advice?
I have a state type that marks a message as queued, processing, or responded. At the step where the off-chain worker polls queued messages from the LLM canister, I update the state to processing, so the poll has to be an update call rather than a query call. If the off-chain service polls continuously, that could drive up costs because of the update calls. Did you run into the same problem? I'm considering keeping the poll as a query call and having the off-chain worker filter out processing messages by their ids, but I'm not sure that will work. Can you share how you handle this?
How can the LLM canister call the chatbot canister to update the responded message? Currently my off-chain service can call the LLM canister to poll and update messages, but I can't update the main canister from the LLM canister.
You can have a query call that does the polling. When it hits a message it wants to process, you can then do an update call to update the state.
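The poll-then-claim pattern described above can be sketched as follows. This is a local simulation, not canister code: `LlmCanister`, `poll_queued`, and `claim` are hypothetical names standing in for the real query and update endpoints; the point is that the read-only poll stays a cheap query, and an update call is only issued once there is a message to claim.

```rust
// Illustrative message states, matching the queued/processing/responded
// states described in the thread.
#[derive(Clone, Copy, PartialEq, Debug)]
enum MsgState {
    Queued,
    Processing,
    Responded,
}

struct Message {
    id: u64,
    state: MsgState,
}

// Stand-in for the LLM canister's state; names are hypothetical.
struct LlmCanister {
    messages: Vec<Message>,
}

impl LlmCanister {
    // Query call: read-only, returns ids of queued messages. Cheap to
    // call in a polling loop.
    fn poll_queued(&self) -> Vec<u64> {
        self.messages
            .iter()
            .filter(|m| m.state == MsgState::Queued)
            .map(|m| m.id)
            .collect()
    }

    // Update call: only issued when the worker actually claims a message,
    // flipping it from Queued to Processing.
    fn claim(&mut self, id: u64) -> bool {
        match self
            .messages
            .iter_mut()
            .find(|m| m.id == id && m.state == MsgState::Queued)
        {
            Some(m) => {
                m.state = MsgState::Processing;
                true
            }
            None => false, // already claimed or responded
        }
    }
}

fn main() {
    let mut canister = LlmCanister {
        messages: vec![
            Message { id: 1, state: MsgState::Queued },
            Message { id: 2, state: MsgState::Responded },
        ],
    };
    // Poll via query; claim via update only when there is work.
    for id in canister.poll_queued() {
        assert!(canister.claim(id));
    }
    assert!(canister.poll_queued().is_empty());
    println!("all queued messages claimed");
}
```

Because the query can return stale data, the update-call `claim` re-checks the state before flipping it, so two workers racing on the same id cannot both claim it.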
The example we provide requires a blocking call from your dapp → LLM canister that waits until the off-chain worker picks up the message (query/update) and responds with an update call to put the result back into the LLM canister. That's why there is a timeout on processing the prompt sent to the LLM canister. If that flow doesn't work for you, you can add an endpoint to your dapp that accepts responses (which the LLM canister can call via an inter-canister call).
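The blocking flow above behaves much like waiting on a channel with a timeout, which the following local sketch illustrates. This is an analogy, not canister code: the spawned thread plays the role of the off-chain worker, and the timeout mirrors the one the LLM canister applies while waiting for the worker's update call.

```rust
use std::sync::mpsc;
use std::time::Duration;

fn main() {
    let (tx, rx) = mpsc::channel::<String>();

    // "Off-chain worker": picks up the prompt and eventually sends the
    // response back (analogous to its update call into the LLM canister).
    std::thread::spawn(move || {
        std::thread::sleep(Duration::from_millis(50));
        tx.send("model output".to_string()).unwrap();
    });

    // "Dapp side": the blocking call waits until the response arrives or
    // the timeout expires, mirroring the timeout in the example flow.
    match rx.recv_timeout(Duration::from_secs(2)) {
        Ok(resp) => println!("got response: {resp}"),
        Err(_) => println!("timed out waiting for worker"),
    }
}
```

The alternative callback flow removes this wait entirely: instead of the dapp blocking, the LLM canister calls an endpoint on the dapp once the worker has responded.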
@ddave thanks for your answer. The second option is very interesting; I will try both solutions (blocking until the response arrives, and the inter-canister call) and will post the results later. Personally, I prefer the inter-canister call over the blocking method, but I'll measure both to choose the approach that saves the most cycles.
The experimental non-replicated outcalls feature will make the off-chain worker example redundant. Now you can directly use your preferred model or LLM API in Web2 to make model calls, giving you the freedom to choose any model you want. (Note: You’re still operating off-chain, so there’s no consensus or verification that your results were not tampered with.)
Hey,
I’m trying to set this up locally but I’m quite confused:
The LLM canister expects initialization arguments, but they aren’t documented anywhere in the docs. I only have the Candid interface available, which makes it difficult to understand what the values correspond to.
When I try to use it anyway, the canister fails to connect to Ollama and times out:
Call was rejected: Request ID: 91c0922cf6f63314223dcd4e470458707741503f7351daf3080a79a03be6566c
Reject code: 4
Reject text: IC0503: Error from Canister w36hm-eqaaa-aaaal-qr76a-cai: Canister called `ic0.trap` with message: 'Panicked at 'called `Result::unwrap()` on an `Err` value: (SysFatal, "Timeout expired")', src/ollama.rs:31:6'
How do I fix this? How is the LLM canister supposed to connect to a local Ollama instance? Does it expect Ollama to be running on a specific port or endpoint?
Quick question: I believe the LLMs are computed outside of ICP, and not in the canister itself, right? If so, ICP takes a strong stance on privacy and private computing, yet dapp teams can’t directly verify whether third-party providers retain prompts. @ielashi do you have any insights on this?
As far as I know, they have said that they don’t retain exact prompts or any data that can be tied to a specific user, though they do retain general usage metrics (number of prompts/users). Otherwise, you’re totally right, and there is not much we can do at this stage except trust them.
Okay, if we’re relying on trust alone: do we at least know who the providers are? The IC has a very strict system for onboarding node providers, with KYC audited by a third party and verified by independent reviewers. Shouldn’t the same apply to AI providers, with the identity and location of the service provider or service endpoint recorded on-chain?