The costs right now are still quite high: a story of about 100 generated tokens consumes 0.076 TCycles, roughly $0.10.
That will come down drastically, though, once we can do the inference as a query call and once the compute/AI improvements on the DFINITY roadmap become available.
I expect to be able to deploy a real chat LLM within a few months, but latency will be poor. I'm implementing it now so we are ready when the IC scales up.
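For reference, here is a quick back-of-the-envelope sketch of the per-token cost, assuming the standard 1 XDR = 1 TCycles peg and an XDR rate of roughly $1.35 (both assumptions on my part; the rate fluctuates):

```python
# Rough per-token cost estimate, assuming 1 TCycles = 1 XDR
# and an XDR rate of about $1.35 (assumption, it fluctuates).
USD_PER_TCYCLES = 1.35

tcycles_per_story = 0.076     # measured burn for one ~100-token story
tokens_per_story = 100

usd_per_story = tcycles_per_story * USD_PER_TCYCLES
usd_per_token = usd_per_story / tokens_per_story

print(f"~${usd_per_story:.2f} per story, ~${usd_per_token:.4f} per token")
# -> ~$0.10 per story, ~$0.0010 per token
```

So the current cost works out to about a tenth of a cent per generated token, which is what the query-call and roadmap improvements should bring down.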