Installing with AWS Image Builder
This tutorial explains how to install llama.cpp with AWS EC2 Image Builder.
By putting llama.cpp into an EC2 Image Builder pipeline, you can automatically build custom AMIs with llama.cpp pre-installed.
You can also use that AMI as a base and add your foundation model on top of it. Thanks to that, you can quickly scale your llama.cpp instance groups up or down.
We will repackage the base EC2 tutorial as a set of Image Builder Components and a Workflow.
You can complete the tutorial steps either manually or by automating the setup with Terraform/OpenTofu. Terraform source files are linked to their respective tutorial steps.
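If you go the Terraform/OpenTofu route, the sketches later in this tutorial assume a minimal provider setup along these lines (the region is a placeholder; the linked source files remain the authoritative version):

```hcl
# Minimal Terraform/OpenTofu setup assumed by the sketches below.
# The region is a placeholder; pick the one you deploy to.
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}
```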
Installation Steps
- Create an IAM `imagebuilder` role (source file).
Go to the IAM Dashboard, click “Roles” in the left-hand menu, and select “AWS service” as the trusted entity type. Next, select “EC2” as the use case:
Next, assign the following policies:
- `arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role`
- `arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilderECRContainerBuilds`
- `arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilder`
- `arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore`
Name your role (for example, “imagebuilder”) and finish creating it. You should end up with permissions and trust relationships looking like this:
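For the Terraform path, the role, policy attachments, and instance profile could look like the sketch below (resource names are our own; see the linked source file for the authoritative version):

```hcl
# IAM role that build instances assume while Image Builder bakes the image.
resource "aws_iam_role" "imagebuilder" {
  name = "imagebuilder"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# Attach the four managed policies listed above.
resource "aws_iam_role_policy_attachment" "imagebuilder" {
  for_each = toset([
    "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role",
    "arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilderECRContainerBuilds",
    "arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilder",
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
  ])

  role       = aws_iam_role.imagebuilder.name
  policy_arn = each.value
}

# Image Builder attaches the role to build instances via an instance profile.
resource "aws_iam_instance_profile" "imagebuilder" {
  name = "imagebuilder"
  role = aws_iam_role.imagebuilder.name
}
```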
- Create components.
We’ll need the following four components:
  - llama.cpp build dependencies: installs `build-essential` and `ccache` (source file)
  - CUDA toolkit (source file)
  - NVIDIA driver (source file)
  - llama.cpp itself (source file)
To create the components via the GUI, navigate to the EC2 Image Builder service on AWS. From there, select “Components” from the menu. We’ll need to add four components that will act as the building blocks in our Image Builder pipeline. You can refer to the generic EC2 tutorial for more details.
Click “Create component”. Next, for each component:
- Choose “Build” as the component type
- Select “Linux” as the image OS
- Select “Ubuntu 22.04” as the compatible OS version
Provide the following as component names and contents in YAML format:
Component name: apt_build_essential
```yaml
name: apt_build_essential
description: "Component to install build essentials on Ubuntu"
schemaVersion: '1.0'

phases:
  - name: build
    steps:
      - name: InstallBuildEssential
        action: ExecuteBash
        inputs:
          commands:
            - sudo apt-get update
            - DEBIAN_FRONTEND=noninteractive sudo apt-get install -yq build-essential ccache
        onFailure: Abort
        timeoutSeconds: 180
```
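If you are scripting the setup, the same component can be registered with Terraform. A sketch, assuming the YAML above is saved as `apt_build_essential.yml` next to your configuration (a hypothetical file name); the other three components follow the same pattern:

```hcl
# Registers the YAML document above as an Image Builder component.
resource "aws_imagebuilder_component" "apt_build_essential" {
  name     = "apt_build_essential"
  platform = "Linux"
  version  = "1.0.0"

  # Hypothetical file name; inline the YAML or point at your own path.
  data = file("${path.module}/apt_build_essential.yml")
}
```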
Component name: apt_nvidia_driver_550

```yaml
name: apt_nvidia_driver_550
description: "Component to install NVIDIA driver 550 on Ubuntu"
schemaVersion: '1.0'

phases:
  - name: build
    steps:
      - name: apt_nvidia_driver_550
        action: ExecuteBash
        inputs:
          commands:
            - sudo apt-get update
            - DEBIAN_FRONTEND=noninteractive sudo apt-get install -yq nvidia-driver-550
        onFailure: Abort
        timeoutSeconds: 180
      - name: reboot
        action: Reboot
```
Component name: cuda_toolkit_12
```yaml
name: cuda_toolkit_12
description: "Component to install CUDA Toolkit 12 on Ubuntu"
schemaVersion: '1.0'

phases:
  - name: build
    steps:
      - name: apt_cuda_toolkit_12
        action: ExecuteBash
        inputs:
          commands:
            - DEBIAN_FRONTEND=noninteractive sudo apt-get -yq install nvidia-cuda-toolkit
        onFailure: Abort
        timeoutSeconds: 600
      - name: reboot
        action: Reboot
```
Component name: llamacpp_gpu_compute_75
```yaml
name: llamacpp_gpu_compute_75
description: "Component to install and compile llama.cpp with CUDA compute capability 75 on Ubuntu"
schemaVersion: '1.0'

phases:
  - name: build
    steps:
      - name: compile
        action: ExecuteBash
        inputs:
          commands:
            - cd /opt
            - git clone https://github.com/ggerganov/llama.cpp.git
            - cd llama.cpp
            - |
              CUDA_DOCKER_ARCH=compute_75 \
              LD_LIBRARY_PATH="/usr/local/cuda-12/lib64:$LD_LIBRARY_PATH" \
              GGML_CUDA=1 \
              PATH="/usr/local/cuda-12/bin:$PATH" \
              make -j
        onFailure: Abort
        timeoutSeconds: 1200
```
Once you’re finished, you’ll see all four components on the list:
- Add Infrastructure Configuration (source file).
Next, we’ll create a new Infrastructure Configuration. Select it from the left-hand menu and click “Create”. You’ll need to use the `g4dn.xlarge` instance type, or any other instance type that supports CUDA. Name your configuration, select the IAM role you created in step 1, and select the instance type, for example:
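The equivalent Terraform sketch, assuming the instance profile from step 1:

```hcl
# Build instances run as g4dn.xlarge (or any CUDA-capable type) under the
# instance profile created in step 1.
resource "aws_imagebuilder_infrastructure_configuration" "llamacpp" {
  name                  = "llamacpp"
  instance_profile_name = aws_iam_instance_profile.imagebuilder.name
  instance_types        = ["g4dn.xlarge"]

  terminate_instance_on_failure = true
}
```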
- Add Distribution Configuration (source file).
Select “Distribution settings” in the left-hand menu to create a Distribution Configuration. It specifies how the resulting AMI should be distributed, for example under what name and in which regions it will be published. Select “Amazon Machine Image”, name the configuration, and save:
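A Terraform sketch of the same configuration (the region and AMI name pattern are assumptions; adjust both to your setup):

```hcl
# Publish the resulting AMI in one region under a date-stamped name.
resource "aws_imagebuilder_distribution_configuration" "llamacpp" {
  name = "llamacpp"

  distribution {
    region = "us-east-1"

    ami_distribution_configuration {
      name = "llamacpp-{{ imagebuilder:buildDate }}"
    }
  }
}
```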
- Add Image Pipeline (source file).
Next, we’ll add the Image Pipeline. It will use the Components, Infrastructure Configuration, and Distribution Configuration we prepared previously. Select “Image pipelines” from the left-hand menu and click “Create”. Name your image pipeline and select the desired build schedule.
In the second step, create a new recipe: choose “AMI” as the output type and name the recipe:
Next, select the previously created components:
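In Terraform, the recipe and pipeline could be sketched as follows (the parent image ARN is an assumption; any Ubuntu 22.04 base image for your region works):

```hcl
# Recipe: Ubuntu 22.04 base image plus our four components, in build order.
resource "aws_imagebuilder_image_recipe" "llamacpp" {
  name    = "llamacpp"
  version = "1.0.0"

  # AWS-managed Ubuntu 22.04 image; "x.x.x" resolves to the latest version.
  parent_image = "arn:aws:imagebuilder:us-east-1:aws:image/ubuntu-server-22-lts-x86/x.x.x"

  component {
    component_arn = aws_imagebuilder_component.apt_build_essential.arn
  }
  # ...repeat for the NVIDIA driver, CUDA toolkit, and llama.cpp components.
}

# Pipeline tying together the recipe, infrastructure, and distribution.
resource "aws_imagebuilder_image_pipeline" "llamacpp" {
  name                             = "llamacpp"
  image_recipe_arn                 = aws_imagebuilder_image_recipe.llamacpp.arn
  infrastructure_configuration_arn = aws_imagebuilder_infrastructure_configuration.llamacpp.arn
  distribution_configuration_arn   = aws_imagebuilder_distribution_configuration.llamacpp.arn

  # Optional schedule; omit to trigger builds manually.
  schedule {
    schedule_expression = "cron(0 0 * * ? *)"
  }
}
```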
- Build the image.
You should now be able to run the pipeline:
- Launch a test EC2 instance.
When launching an EC2 instance, the llama.cpp image we prepared should be available under the “My AMIs” list:
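With Terraform, a test instance could be launched from the freshest pipeline output like this (the name filter is an assumption; match it to your Distribution Configuration):

```hcl
# Find the most recent AMI produced by the pipeline.
data "aws_ami" "llamacpp" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["llamacpp-*"]
  }
}

# Boot a CUDA-capable test instance from it.
resource "aws_instance" "llamacpp_test" {
  ami           = data.aws_ami.llamacpp.id
  instance_type = "g4dn.xlarge"
}
```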
Summary
Feel free to open an issue if you find a bug in the tutorial or have ideas on how to improve it.