So technically, I can deploy a self-hosted off-chain worker that runs my LLM. Then I deploy two canisters: one is the on-chain LLM canister, the other is my chatbot canister. Then I can follow the same flow as you:
- The chatbot canister calls the LLM canister
- The LLM canister ingresses the message and queues it
- The off-chain worker polls the queued messages from the LLM canister
- The worker processes them with the local LLM (or whatever backend)
- The off-chain worker calls the LLM canister to mark the message as processed and insert the newly generated response
- The LLM canister calls the chatbot canister to return the response.
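To make sure I understand the flow, here is a minimal sketch of the worker side. Everything here is hypothetical: `poll_pending` and `submit_response` are stand-ins for the actual canister update calls (which would go through an agent over HTTPS, not local function calls), and the canister is simulated in memory just to show the queue/poll/respond cycle:

```python
# Hypothetical in-memory stand-in for the LLM canister's message queue.
# In a real deployment these would be update calls made through an agent
# (e.g. ic-agent), not local method calls.
class LlmCanisterStub:
    def __init__(self):
        self.queue = []       # messages ingressed by the chatbot canister
        self.responses = {}   # msg_id -> generated text

    def ingress(self, msg_id, prompt):
        self.queue.append((msg_id, prompt))

    def poll_pending(self, limit=10):
        # The off-chain worker pulls up to `limit` queued messages.
        batch, self.queue = self.queue[:limit], self.queue[limit:]
        return batch

    def submit_response(self, msg_id, text):
        # The worker marks the message processed and stores the reply;
        # the canister would then call back into the chatbot canister.
        self.responses[msg_id] = text


def run_local_llm(prompt):
    # Placeholder for the self-hosted model.
    return f"echo: {prompt}"


def worker_tick(canister):
    # One polling iteration of the off-chain worker.
    for msg_id, prompt in canister.poll_pending():
        canister.submit_response(msg_id, run_local_llm(prompt))


canister = LlmCanisterStub()
canister.ingress(1, "hello")
canister.ingress(2, "world")
worker_tick(canister)
print(canister.responses)  # {1: 'echo: hello', 2: 'echo: world'}
```

In practice the worker would run `worker_tick` in a loop with a sleep or backoff between polls, and would need to handle retries so a crashed worker doesn't leave messages stuck in the queue.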
Am I right? If I do this, what should I be concerned about? I saw that you have been working on improvements such as extending the input/output limits. Can you share some of the difficulties and obstacles you have encountered?