llama.cpp

Note

llama.cpp is a production-ready, open-source runner for a variety of large language models (LLMs).

It has an excellent built-in server with an HTTP API.
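
To make the HTTP API concrete, here is a minimal sketch of a client that posts a prompt to the server's `/completion` endpoint and reads the generated text from the `content` field of the JSON reply. The address `http://127.0.0.1:8080` and the helper names `build_completion_request` and `complete` are assumptions for illustration; adjust them to match your setup and check the server documentation for your llama.cpp version.

```python
import json
import urllib.request

# Assumption: a llama.cpp server is already running locally on port 8080.
BASE_URL = "http://127.0.0.1:8080"


def build_completion_request(prompt, n_predict=64, base_url=BASE_URL):
    """Build a POST request for the server's /completion endpoint.

    `n_predict` caps the number of tokens the server should generate.
    """
    payload = {"prompt": prompt, "n_predict": n_predict}
    return urllib.request.Request(
        f"{base_url}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, n_predict=64, base_url=BASE_URL, timeout=60):
    """Send the prompt and return the generated text from the JSON reply."""
    request = build_completion_request(prompt, n_predict, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["content"]
```

With a server running, `print(complete("Building a website takes"))` would print the generated continuation. Only the standard library is used here so the sketch has no extra dependencies.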

In this handbook, we will use continuous batching, which in practice lets a single server instance handle multiple requests in parallel.
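
As a rough illustration of parallel requests, the sketch below sends several prompts to the server at the same time from a thread pool. It assumes the server was started with more than one slot (for example with the `--parallel 4` option) and with continuous batching enabled (the `--cont-batching` option); option names and defaults can differ between llama.cpp versions, so treat them as assumptions. The address and the helper names `request_completion` and `complete_in_parallel` are hypothetical.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: a llama.cpp server is listening locally on port 8080.
BASE_URL = "http://127.0.0.1:8080"


def request_completion(prompt, n_predict=32, base_url=BASE_URL, timeout=120):
    """POST one prompt to /completion and return the generated text."""
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict})
    request = urllib.request.Request(
        f"{base_url}/completion",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["content"]


def complete_in_parallel(prompts, send=request_completion, max_workers=4):
    """Send all prompts concurrently; results keep the order of `prompts`.

    `send` is injectable so the fan-out logic can be exercised without
    a running server.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, prompts))
```

Calling `complete_in_parallel(["First prompt", "Second prompt"])` issues both requests at once. Whether the server actually processes them together depends on how many slots it was started with; the client only controls how many requests are in flight.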