llama.cpp
GitHub Repository: https://github.com/ggerganov/llama.cpp
Llama.cpp is a production-ready, open-source runner for various Large Language Models.
It has an excellent built-in server with an HTTP API.
In this handbook, we will use continuous batching, which in practice lets the server handle multiple requests in parallel instead of processing them one at a time.
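To make the idea of parallel requests concrete, here is a minimal client-side sketch that sends several prompts to the server at once. It assumes a llama.cpp server is already running at `http://127.0.0.1:8080` and that its `/completion` endpoint accepts a JSON body with `prompt` and `n_predict` and returns the generated text under `content`. Check the server documentation for your version before relying on this. The helper names (`build_request`, `send`, `complete_many`) are made up for this illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

# Assumption: the llama.cpp server listens here; adjust to your setup.
BASE_URL = "http://127.0.0.1:8080"


def build_request(prompt, n_predict=32, base_url=BASE_URL):
    """Build a POST request for the server's /completion endpoint."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return Request(
        f"{base_url}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send one request and return the generated text.

    Assumes the response JSON carries the text in a "content" field.
    """
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["content"]


def complete_many(prompts, sender=send, max_workers=4):
    """Issue all prompts concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: sender(build_request(p)), prompts))
```

With a server running, `complete_many(["Hello", "What is batching?"])` would return one completion per prompt, in the same order. The threads only make the client send requests at the same time. Whether the server actually processes them in parallel depends on how it was started, for example how many slots it was given.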