Installing with AWS Image Builder
This tutorial explains how to install llama.cpp with AWS EC2 Image Builder.
By putting llama.cpp into an EC2 Image Builder pipeline, you can automatically build custom AMIs with llama.cpp pre-installed.
You can also use that AMI as a base and add your foundation model on top of it. Thanks to that, you can quickly scale your llama.cpp instance groups up or down.
We will repackage the base EC2 tutorial as a set of Image Builder Components and a Workflow.
You can complete the tutorial steps either manually or by automating the setup with Terraform/OpenTofu. Terraform source files are linked to their respective tutorial steps.
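If you go the Terraform/OpenTofu route, the sketches later in this tutorial assume a minimal provider setup along these lines (the region is a placeholder; the linked source files remain the authoritative version):

```hcl
# Minimal Terraform/OpenTofu setup assumed by the sketches below.
# The region is a placeholder; pick the one you deploy to.
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}
```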
Installation Steps
- Create an IAM `imagebuilder` role (source file).
Go to the IAM Dashboard, click “Roles” in the left-hand menu, and select “AWS service” as the trusted entity type. Next, select “EC2” as the use case:
Next, assign the following policies:
- `arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role`
- `arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilderECRContainerBuilds`
- `arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilder`
- `arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore`
Name your role (for example, “imagebuilder”) and finish creating it. You should end up with permissions and trust relationships looking like this:
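For the Terraform path, the role, policy attachments, and instance profile could look like the sketch below (resource names are our own; see the linked source file for the authoritative version):

```hcl
# IAM role that build instances assume while Image Builder bakes the image.
resource "aws_iam_role" "imagebuilder" {
  name = "imagebuilder"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# Attach the four managed policies listed above.
resource "aws_iam_role_policy_attachment" "imagebuilder" {
  for_each = toset([
    "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role",
    "arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilderECRContainerBuilds",
    "arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilder",
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
  ])

  role       = aws_iam_role.imagebuilder.name
  policy_arn = each.value
}

# Image Builder attaches the role to build instances via an instance profile.
resource "aws_iam_instance_profile" "imagebuilder" {
  name = "imagebuilder"
  role = aws_iam_role.imagebuilder.name
}
```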
- Create components.
We’ll need the following four components:
  - llama.cpp build dependencies: installs `build-essential` and `ccache` (source file)
  - CUDA toolkit (source file)
  - NVIDIA driver (source file)
  - llama.cpp itself (source file)
To create the components via the GUI, navigate to the EC2 Image Builder service on AWS. From there, select “Components” from the menu. We’ll need to add four components that will act as the building blocks in our Image Builder pipeline. You can refer to the generic EC2 tutorial for more details.
Click “Create component”. Next, for each component:
- Choose “Build” as the component type
- Select “Linux” as the image OS
- Select “Ubuntu 22.04” as the compatible OS version
Provide the following as component names and contents in YAML format:
Component name: apt_build_essential
```yaml
name: apt_build_essential
description: "Component to install build essentials on Ubuntu"
schemaVersion: '1.0'

phases:
  - name: build
    steps:
      - name: InstallBuildEssential
        action: ExecuteBash
        inputs:
          commands:
            - sudo apt-get update
            - DEBIAN_FRONTEND=noninteractive sudo apt-get install -yq build-essential ccache
        onFailure: Abort
        timeoutSeconds: 180
```
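If you are scripting the setup, the same component can be registered with Terraform. A sketch, assuming the YAML above is saved as `apt_build_essential.yml` next to your configuration (a hypothetical file name); the other three components follow the same pattern:

```hcl
# Registers the YAML document above as an Image Builder component.
resource "aws_imagebuilder_component" "apt_build_essential" {
  name     = "apt_build_essential"
  platform = "Linux"
  version  = "1.0.0"

  # Hypothetical file name; inline the YAML or point at your own path.
  data = file("${path.module}/apt_build_essential.yml")
}
```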
Component name: apt_nvidia_driver_550

```yaml
name: apt_nvidia_driver_550
description: "Component to install NVIDIA driver 550 on Ubuntu"
schemaVersion: '1.0'

phases:
  - name: build
    steps:
      - name: apt_nvidia_driver_550
        action: ExecuteBash
        inputs:
          commands:
            - sudo apt-get update
            - DEBIAN_FRONTEND=noninteractive sudo apt-get install -yq nvidia-driver-550
        onFailure: Abort
        timeoutSeconds: 180
      - name: reboot
        action: Reboot
```
Component name: cuda_toolkit_12
```yaml
name: cuda_toolkit_12
description: "Component to install CUDA Toolkit 12 on Ubuntu"
schemaVersion: '1.0'

phases:
  - name: build
    steps:
      - name: apt_cuda_toolkit_12
        action: ExecuteBash
        inputs:
          commands:
            - DEBIAN_FRONTEND=noninteractive sudo apt-get -yq install nvidia-cuda-toolkit
        onFailure: Abort
        timeoutSeconds: 600
      - name: reboot
        action: Reboot
```
Component name: llamacpp_gpu_compute_75
```yaml
name: llamacpp_gpu_compute_75
description: "Component to install and compile llama.cpp with CUDA compute capability 75 on Ubuntu"
schemaVersion: '1.0'

phases:
  - name: build
    steps:
      - name: compile
        action: ExecuteBash
        inputs:
          commands:
            - cd /opt
            - git clone https://github.com/ggerganov/llama.cpp.git
            - cd llama.cpp
            - |
              CUDA_DOCKER_ARCH=compute_75 \
              LD_LIBRARY_PATH="/usr/local/cuda-12/lib64:$LD_LIBRARY_PATH" \
              GGML_CUDA=1 \
              PATH="/usr/local/cuda-12/bin:$PATH" \
              make -j
        onFailure: Abort
        timeoutSeconds: 1200
```
Once you’re finished, you’ll see all four components on the list:
- Add Infrastructure Configuration (source file).
Next, we’ll create a new Infrastructure Configuration. Select it from the left-hand menu and click “Create”. You’ll need to use the `g4dn.xlarge` instance type, or any other instance type that supports CUDA. Name your configuration, select the IAM role you created in step 1, and select the instance type, for example:
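The equivalent Terraform sketch, assuming the instance profile from step 1:

```hcl
# Build instances run as g4dn.xlarge (or any CUDA-capable type) under the
# instance profile created in step 1.
resource "aws_imagebuilder_infrastructure_configuration" "llamacpp" {
  name                  = "llamacpp"
  instance_profile_name = aws_iam_instance_profile.imagebuilder.name
  instance_types        = ["g4dn.xlarge"]

  terminate_instance_on_failure = true
}
```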
- Add Distribution Configuration (source file).
Select “Distribution settings” in the left-hand menu to create a Distribution Configuration. It specifies how the resulting AMI should be distributed, for example under what name and in which regions it will be published. Select “Amazon Machine Image”, name the configuration, and save:
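A Terraform sketch of the same configuration (the region and AMI name pattern are assumptions; adjust both to your setup):

```hcl
# Publish the resulting AMI in one region under a date-stamped name.
resource "aws_imagebuilder_distribution_configuration" "llamacpp" {
  name = "llamacpp"

  distribution {
    region = "us-east-1"

    ami_distribution_configuration {
      name = "llamacpp-{{ imagebuilder:buildDate }}"
    }
  }
}
```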
- Add Image Pipeline (source file).
Next, we’ll add the Image Pipeline. It will use the Components, Infrastructure Configuration, and Distribution Configuration we prepared previously. Select “Image pipelines” from the left-hand menu and click “Create”. Name your image pipeline and select the desired build schedule.
In the second step, create a new recipe: choose “AMI” as the output type and name the recipe:
Next, select the previously created components:
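In Terraform, the recipe and pipeline could be sketched as follows (the parent image ARN is an assumption; any Ubuntu 22.04 base image for your region works):

```hcl
# Recipe: Ubuntu 22.04 base image plus our four components, in build order.
resource "aws_imagebuilder_image_recipe" "llamacpp" {
  name    = "llamacpp"
  version = "1.0.0"

  # AWS-managed Ubuntu 22.04 image; "x.x.x" resolves to the latest version.
  parent_image = "arn:aws:imagebuilder:us-east-1:aws:image/ubuntu-server-22-lts-x86/x.x.x"

  component {
    component_arn = aws_imagebuilder_component.apt_build_essential.arn
  }
  # ...repeat for the NVIDIA driver, CUDA toolkit, and llama.cpp components.
}

# Pipeline tying together the recipe, infrastructure, and distribution.
resource "aws_imagebuilder_image_pipeline" "llamacpp" {
  name                             = "llamacpp"
  image_recipe_arn                 = aws_imagebuilder_image_recipe.llamacpp.arn
  infrastructure_configuration_arn = aws_imagebuilder_infrastructure_configuration.llamacpp.arn
  distribution_configuration_arn   = aws_imagebuilder_distribution_configuration.llamacpp.arn

  # Optional schedule; omit to trigger builds manually.
  schedule {
    schedule_expression = "cron(0 0 * * ? *)"
  }
}
```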
- Build the image.
You should now be able to run the pipeline:
- Launch a test EC2 instance.
When launching an EC2 instance, the llama.cpp image we prepared should be available under the “My AMIs” list:
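With Terraform, a test instance could be launched from the freshest pipeline output like this (the name filter is an assumption; match it to your Distribution Configuration):

```hcl
# Find the most recent AMI produced by the pipeline.
data "aws_ami" "llamacpp" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["llamacpp-*"]
  }
}

# Boot a CUDA-capable test instance from it.
resource "aws_instance" "llamacpp_test" {
  ami           = data.aws_ami.llamacpp.id
  instance_type = "g4dn.xlarge"
}
```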
Summary
Feel free to open an issue if you find a bug in the tutorial or have ideas on how to improve it.