Forward Proxy
A forward proxy is an intermediary server between the client and the origin server. Clients connect to the forward proxy server and request a resource (such as a completion) available on a different server that is otherwise inaccessible to them. The forward proxy server retrieves the resource and forwards it to the client.
You can combine a forward proxy and a reverse proxy to create a gateway.
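The forwarding behavior described above can be illustrated with a minimal sketch using only the Python standard library. Everything here is an assumption made for illustration: `OriginHandler`, `ForwardProxyHandler`, `start`, and `fetch_via_proxy` are hypothetical names, the origin is a local stand-in server, and the `Via` value is a made-up label. It handles plain HTTP `GET` only and is not how any particular proxy product is implemented.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit


class OriginHandler(BaseHTTPRequestHandler):
    """Stand-in origin server that returns a fixed body."""

    def do_GET(self):
        body = b"hello from origin"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


class ForwardProxyHandler(BaseHTTPRequestHandler):
    """Fetches the absolute URL the client asked for and relays the response."""

    def do_GET(self):
        # A client talking to a forward proxy puts the full target URL in the
        # request line, so self.path looks like "http://host:port/path".
        target = urlsplit(self.path)
        path = target.path or "/"
        if target.query:
            path += "?" + target.query

        # Retrieve the resource from the origin on the client's behalf.
        upstream = http.client.HTTPConnection(
            target.hostname, target.port or 80, timeout=5
        )
        upstream.request("GET", path)
        response = upstream.getresponse()
        body = response.read()
        upstream.close()

        # Forward the origin's status and body back to the client.
        self.send_response(response.status)
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Via", "1.1 demo-forward-proxy")  # hypothetical label
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


def start(handler):
    """Start a server on a free localhost port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_via_proxy(url, proxy_port):
    """Ask the proxy for `url` instead of connecting to the origin directly."""
    conn = http.client.HTTPConnection("127.0.0.1", proxy_port, timeout=5)
    conn.request("GET", url)  # absolute URL in the request line
    response = conn.getresponse()
    body = response.read()
    via = response.getheader("Via")
    conn.close()
    return body, via
```

To try it, start both servers with `start(OriginHandler)` and `start(ForwardProxyHandler)`, then call `fetch_via_proxy` with the origin's URL and the proxy's port. The client only ever opens a connection to the proxy; the `Via` header on the response shows that the proxy relayed it.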
llama.cpp Forward Proxy
llama.cpp implements its own forward proxy in the form of an RPC server.
It puts the llama.cpp server in front of multiple backends and distributes requests among them.
flowchart TD
rpcb---|TCP|srva
rpcb---|TCP|srvb
rpcb-.-|TCP|srvn
subgraph hostn[Host N]
srvn[rpc-server]-.-backend3["Backend (CUDA,Metal,etc.)"]
end
subgraph hostb[Host B]
srvb[rpc-server]---backend2["Backend (CUDA,Metal,etc.)"]
end
subgraph hosta[Host A]
srva[rpc-server]---backend["Backend (CUDA,Metal,etc.)"]
end
subgraph host[Main Host]
ggml[llama.cpp]---rpcb[RPC backend]
end
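As a rough sketch of how the main host in the diagram above might be pointed at its `rpc-server` instances, the helper below assembles a llama.cpp command line. It assumes the `--rpc` (comma-separated `host:port` list), `-m`, and `-ngl` flags as described in the llama.cpp RPC example README; flag names can differ between versions, so check your build. `build_rpc_command`, the model path, and the addresses are hypothetical placeholders.

```python
import shlex


def build_rpc_command(model_path, rpc_hosts, gpu_layers=99, binary="llama-cli"):
    """Return the argv for a llama.cpp run that offloads to remote rpc-server hosts.

    rpc_hosts is a list of (host, port) pairs, one per rpc-server instance.
    """
    if not rpc_hosts:
        raise ValueError("at least one rpc-server endpoint is required")
    # Assumed format: the --rpc flag takes a comma-separated list of host:port.
    endpoints = ",".join(f"{host}:{port}" for host, port in rpc_hosts)
    return [binary, "-m", model_path, "--rpc", endpoints, "-ngl", str(gpu_layers)]


# Placeholder addresses standing in for Host A and Host B.
command = build_rpc_command(
    "model.gguf",
    [("192.168.88.10", 50052), ("192.168.88.11", 50052)],
)
print(shlex.join(command))
```

Each remote host would need its own `rpc-server` process listening on the given port before the main host starts. Returning an argv list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting issues.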