I wanted to try implementing a chatbot on ICP, and I understand at a high level what I need to do. I've read some threads about running an LLM on-chain, and also about an app where someone offered the option of downloading the LLM into the browser cache. However, let's assume I want to make straight-up OpenAI calls. I'd like to store state per user, with threading capability. Every time I ask a question, should I typically make that a query call and then persist the response with an update call? If the LLM request were an update call rather than a query call, every node could get a different response, since the LLM is non-deterministic. So how should I go about thinking about this? Thanks in advance.
Yeah, that sounds about right: query call to read the thread state, then make the LLM request, then an update call to persist the response once you have it.
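A minimal sketch of what the canister-side state for this might look like, in plain Rust. This is illustrative only: the names (`Message`, `ChatState`, `get_thread`, `append`) are made up, and the real canister would expose these through query/update method annotations (e.g. via `ic-cdk`), which are noted in comments rather than used here.

```rust
use std::collections::HashMap;

// Hypothetical message record: role is "user" or "assistant".
#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: String,
    content: String,
}

// Per-user, per-thread chat state. Keyed by (user principal as text, thread id).
#[derive(Default)]
struct ChatState {
    threads: HashMap<(String, String), Vec<Message>>,
}

impl ChatState {
    // Read-only lookup: in a canister this would be exposed as a *query*
    // method, since it mutates nothing and can be answered cheaply.
    fn get_thread(&self, user: &str, thread_id: &str) -> Vec<Message> {
        self.threads
            .get(&(user.to_string(), thread_id.to_string()))
            .cloned()
            .unwrap_or_default()
    }

    // State change: in a canister this would be an *update* method, so the
    // appended message goes through consensus and is replicated identically
    // on every node.
    fn append(&mut self, user: &str, thread_id: &str, msg: Message) {
        self.threads
            .entry((user.to_string(), thread_id.to_string()))
            .or_default()
            .push(msg);
    }
}
```

The flow the reply describes then becomes: the frontend reads the thread with the query method, makes the OpenAI call itself (so only one non-deterministic response ever exists), and then sends both the question and the answer back through the update method to persist them.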