Installing with AWS Image Builder

This tutorial explains how to install llama.cpp with AWS EC2 Image Builder.

By putting llama.cpp in an EC2 Image Builder pipeline, you can automatically build custom AMIs with llama.cpp pre-installed.

You can also use that AMI as a base and add your foundation model on top of it. That lets you quickly scale your llama.cpp instance groups up or down.

We will repackage the base EC2 tutorial as a set of Image Builder Components and Workflow.

You can complete the tutorial steps either manually or by automating the setup with Terraform/OpenTofu. Terraform source files are linked to their respective tutorial steps.

Installation Steps

  1. Create an IAM imagebuilder role (source file)

    Go to the IAM Dashboard, click “Roles” in the left-hand menu, then click “Create role”. Select “AWS service” as the trusted entity type and “EC2” as the use case:

    screenshot-01

    Next, assign the following policies:

    • arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
    • arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilderECRContainerBuilds
    • arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilder
    • arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

    Name your role (for example, “imagebuilder”) and finish creating it. You should end up with permissions and trust relationships looking like this:

    screenshot-02 screenshot-03
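
    If you are automating this step with Terraform/OpenTofu, the role can be sketched as follows. This is a minimal sketch, assuming the role name “imagebuilder” used above; the resource names are illustrative:

    ```hcl
    # Trust policy: allow EC2 instances to assume this role.
    resource "aws_iam_role" "imagebuilder" {
      name = "imagebuilder"

      assume_role_policy = jsonencode({
        Version = "2012-10-17"
        Statement = [{
          Action    = "sts:AssumeRole"
          Effect    = "Allow"
          Principal = { Service = "ec2.amazonaws.com" }
        }]
      })
    }

    # Attach the four managed policies listed above.
    resource "aws_iam_role_policy_attachment" "imagebuilder" {
      for_each = toset([
        "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role",
        "arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilderECRContainerBuilds",
        "arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilder",
        "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
      ])

      role       = aws_iam_role.imagebuilder.name
      policy_arn = each.value
    }

    # Instance profile so the role can be attached to build instances.
    resource "aws_iam_instance_profile" "imagebuilder" {
      name = "imagebuilder"
      role = aws_iam_role.imagebuilder.name
    }
    ```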

  2. Create components.

    We’ll need four components that will act as the building blocks of our Image Builder pipeline. To create them via the GUI, navigate to the EC2 Image Builder service on AWS and select “Components” from the left-hand menu. You can refer to the generic EC2 tutorial for more information.

    Click “Create component”. Next, for each component:

    • Choose “Build” as the component type
    • Select “Linux” as the image OS
    • Select “Ubuntu 22.04” as the compatible OS version

    Provide the following as component names and contents in YAML format:

    Component name: apt_build_essential

     name: apt_build_essential
     description: "Component to install build essentials on Ubuntu"
     schemaVersion: '1.0'
     phases:
       - name: build
         steps:
           - name: InstallBuildEssential
             action: ExecuteBash
             inputs:
               commands:
                 - sudo apt-get update
                 - sudo DEBIAN_FRONTEND=noninteractive apt-get install -yq build-essential ccache
             onFailure: Abort
             timeoutSeconds: 180
    

    Component name: apt_nvidia_driver_550

     name: apt_nvidia_driver_550
     description: "Component to install NVIDIA driver 550 on Ubuntu"
     schemaVersion: '1.0'
     phases:
       - name: build
         steps:
           - name: apt_nvidia_driver_550
             action: ExecuteBash
             inputs:
               commands:
                 - sudo apt-get update
                 - sudo DEBIAN_FRONTEND=noninteractive apt-get install -yq nvidia-driver-550
             onFailure: Abort
             timeoutSeconds: 180
           - name: reboot
             action: Reboot
    

    Component name: cuda_toolkit_12

     name: cuda_toolkit_12
     description: "Component to install CUDA Toolkit 12 on Ubuntu"
     schemaVersion: '1.0'
     phases:
       - name: build
         steps:
           - name: apt_cuda_toolkit_12
             action: ExecuteBash
             inputs:
               commands:
                 - sudo DEBIAN_FRONTEND=noninteractive apt-get -yq install nvidia-cuda-toolkit
             onFailure: Abort
             timeoutSeconds: 600
           - name: reboot
             action: Reboot
    

    Component name: llamacpp_gpu_compute_75

     name: llamacpp_gpu_compute_75
     description: "Component to install and compile llama.cpp with CUDA compute capability 75 on Ubuntu"
     schemaVersion: '1.0'
     phases:
       - name: build
         steps:
           - name: compile
             action: ExecuteBash
             inputs:
               commands:
                 - cd /opt
                 - git clone https://github.com/ggerganov/llama.cpp.git
                 - cd llama.cpp
                 - |
                   CUDA_DOCKER_ARCH=compute_75 \
                   LD_LIBRARY_PATH="/usr/local/cuda-12/lib64:$LD_LIBRARY_PATH" \
                   GGML_CUDA=1 \
                   PATH="/usr/local/cuda-12/bin:$PATH" \
                   make -j
             onFailure: Abort
             timeoutSeconds: 1200
    

    Once you’re finished, all four components will appear on the list:

    screenshot-04
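
    Components can also be registered with Terraform instead of the GUI. A minimal sketch for the first component, assuming its YAML document is saved as `apt_build_essential.yml` next to the Terraform configuration (the file name and version are illustrative):

    ```hcl
    resource "aws_imagebuilder_component" "apt_build_essential" {
      name     = "apt_build_essential"
      platform = "Linux"
      version  = "1.0.0"

      # The component document is the same YAML shown above.
      data = file("${path.module}/apt_build_essential.yml")
    }
    ```

    The remaining three components follow the same pattern, each pointing at its own YAML document.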

  3. Add Infrastructure Configuration (source file)

    Next, we’ll create a new Infrastructure Configuration. Select it from the left-hand menu and click “Create”. You’ll need to use the g4dn.xlarge instance type, or any other instance type that supports CUDA. Name your configuration, select the IAM role you created in step 1, and choose the instance type, for example:

    screenshot-05
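
    The equivalent Terraform sketch, assuming the “imagebuilder” instance profile from step 1 and an illustrative configuration name:

    ```hcl
    resource "aws_imagebuilder_infrastructure_configuration" "llamacpp" {
      name                  = "llamacpp"
      instance_profile_name = "imagebuilder"   # the instance profile from step 1
      instance_types        = ["g4dn.xlarge"]  # any CUDA-capable instance type works

      # Clean up build instances if a build fails.
      terminate_instance_on_failure = true
    }
    ```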

  4. Add Distribution Configuration (source file)

    Select “Distribution settings” in the left-hand menu to create a Distribution Configuration. It specifies how and where the resulting image will be published. Select “Amazon Machine Image”, name the configuration, and save:

    screenshot-06
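
    In Terraform, a minimal Distribution Configuration sketch could look like this (the configuration and AMI names are illustrative; `{{ imagebuilder:buildDate }}` is substituted by Image Builder at build time):

    ```hcl
    data "aws_region" "current" {}

    resource "aws_imagebuilder_distribution_configuration" "llamacpp" {
      name = "llamacpp"

      distribution {
        region = data.aws_region.current.name

        ami_distribution_configuration {
          name = "llamacpp-{{ imagebuilder:buildDate }}"
        }
      }
    }
    ```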

  5. Add Image Pipeline (source file)

    Next, we’ll add the Image Pipeline. It will use the Components, Infrastructure Configuration, and Distribution Configuration we prepared previously. Select “Image pipelines” from the left-hand menu and click “Create”. Name your image pipeline and select the desired build schedule.

    In the second step, create a new recipe. Choose “AMI” as the output type and name the recipe:

    screenshot-07

    Next, select the previously created components:

    screenshot-08
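
    The recipe and pipeline can also be sketched in Terraform. This assumes the components and configurations from the previous steps were likewise defined in Terraform under the illustrative names used above:

    ```hcl
    # Recipe: an Ubuntu 22.04 base AMI plus our components, applied in order.
    resource "aws_imagebuilder_image_recipe" "llamacpp" {
      name         = "llamacpp"
      version      = "1.0.0"
      parent_image = "ami-..." # an Ubuntu 22.04 base AMI for your region

      component {
        component_arn = aws_imagebuilder_component.apt_build_essential.arn
      }
      # ...repeat for the NVIDIA driver, CUDA toolkit, and llama.cpp components

    }

    resource "aws_imagebuilder_image_pipeline" "llamacpp" {
      name                             = "llamacpp"
      image_recipe_arn                 = aws_imagebuilder_image_recipe.llamacpp.arn
      infrastructure_configuration_arn = aws_imagebuilder_infrastructure_configuration.llamacpp.arn
      distribution_configuration_arn   = aws_imagebuilder_distribution_configuration.llamacpp.arn
    }
    ```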

  6. Build the image. You should now be able to run the pipeline:

    screenshot-09

  7. Launch test EC2 Instance.

    When launching an EC2 instance, the llama.cpp image we prepared should be available under the “My AMIs” list:

    screenshot-10
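
    If you want to launch the test instance from Terraform as well, you can look up the freshly built AMI by name. A sketch, assuming the illustrative `llamacpp-*` AMI naming from the Distribution Configuration step:

    ```hcl
    # Find the most recent AMI produced by the pipeline.
    data "aws_ami" "llamacpp" {
      most_recent = true
      owners      = ["self"]

      filter {
        name   = "name"
        values = ["llamacpp-*"]
      }
    }

    resource "aws_instance" "llamacpp_test" {
      ami           = data.aws_ami.llamacpp.id
      instance_type = "g4dn.xlarge" # must match a CUDA-capable instance type
    }
    ```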

Summary

Feel free to open an issue if you find a bug in the tutorial or have ideas on how to improve it.