AI LLM via Canister/Cycles?

This has been an ongoing discussion topic in the DeAI working group this year: a mechanism to pay in cycles for query calls that exceed the per-call instruction limit, to support running LLMs in canisters.
The forum conversation starts here after our early Jan meeting:

This then led to further discussion in the Explore Query Charging thread here:

The Omnia Network team ( @ilbert and @massimoalbarello ) have been working on a project to enable paying (with ICP) for compute and web services external to the IC, building on their WebSocket Gateway work and following on from this proposal thread they started:

I think you are already across some or most of this, but it may be useful for others reading this thread.