Teaching Docker to an SLM using RAG

Make a Small Language Model smarter

Introduction

The topic of this blog post is making an SLM (Small Language Model) smarter.

Why would you want to use an SLM?

  • You can run it on a Raspberry Pi (so you don't need a GPU)

  • The hosting costs are lower (no need for a GPU; the size of the model is smaller)

I would say that running an SLM is more ecological and democratic.

So, today, I will try to teach some Docker commands to an SLM. My tools are:

  • Ollama and its API

  • For chatting: Qwen2:0.5b, a small language model of 352MB

  • For the embeddings: All-minilm:33m, a tiny model of 67MB

  • Parakeet is a simple and easy-to-use Golang wrapper around the Ollama API. I developed it to make it easier to use the Ollama API in my Golang projects instead of using Langchain for Go, which is more complex.

As Parakeet is only a Golang wrapper around the Ollama API, you can reproduce this experiment in any language and with other frameworks like LangChain.

All the source code examples are available here: https://github.com/parakeet-nest/parakeet/tree/main/examples/26-docker-cmd

Let's check the Docker knowledge of the Qwen2:0.5b LLM

First, you will create a new Golang project and add the Parakeet package to your project. This project is a simple command-line application that interacts with a language model to translate user input into Docker commands.
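
If you are starting from scratch, the setup looks roughly like this (the module name is just an example, and the go get path is taken from the imports below; check the Parakeet repository if it differs). You also need the two models pulled locally with Ollama:

go mod init docker-cmd
go get github.com/parakeet-nest/parakeet
ollama pull qwen2:0.5b
ollama pull all-minilm:33m

Here is the full program: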

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "strings"

    "github.com/parakeet-nest/parakeet/completion"
    "github.com/parakeet-nest/parakeet/llm"
    "github.com/parakeet-nest/parakeet/enums/option"
)

func main() {
    ollamaUrl := "http://localhost:11434"
    smallChatModel := "qwen2:0.5b"

    // 1️⃣ Prepare the system content
    systemContent := `instruction: translate this sentence in docker command - stay brief`

    // 2️⃣ Start the conversation
    for {
        question := input(smallChatModel)
        if question == "bye" {
            break
        }

        // 3️⃣ Prepare the query
        query := llm.Query{
            Model: smallChatModel,
            Messages: []llm.Message{
                {Role: "system", Content: systemContent},
                {Role: "user", Content: question},
            },
            Options: llm.SetOptions(map[string]interface{}{
                option.Temperature: 0.0,
                option.RepeatLastN: 2,
                option.RepeatPenalty: 3.0,
                option.TopK: 10,
                option.TopP: 0.5,
            }),
        }

        // 4️⃣ Answer the question (stream mode)
        _, err := completion.ChatStream(ollamaUrl, query,
            func(answer llm.Answer) error {
                fmt.Print(answer.Message.Content)
                return nil
            })

        if err != nil {
            log.Fatal("😡:", err)
        }
        fmt.Println()
    }
}

// input prompts the user and reads one line from stdin, trimming the trailing newline
func input(smallChatModel string) string {
    reader := bufio.NewReader(os.Stdin)
    fmt.Printf("🐳 [%s] ask me something> ", smallChatModel)
    question, _ := reader.ReadString('\n')
    return strings.TrimSpace(question)
}

  1. systemContent is a string that sets the instruction for the language model.

  2. The program enters an infinite loop, repeatedly prompting the user for input. question := input(smallChatModel) calls the input function to read user input from the command line. If the user types bye, the loop breaks and the program exits.

  3. Query Preparation: a query of type llm.Query is prepared with the following fields:

  • Model: The model identifier (smallChatModel).

  • Messages: A slice of llm.Message containing two messages:

    • A system message with the content systemContent.

    • A user message with the content of the user's question.

  • Options: Various options for the language model, such as Temperature, RepeatLastN, RepeatPenalty, TopK, and TopP. (more info about these options can be found in the Ollama documentation)

  4. Query Execution:

  • The completion.ChatStream function is called with ollamaUrl and the query.

  • A callback function is provided to handle the response (answer llm.Answer), which prints the content of the answer message to the console.

Let's run the program and see how well the Qwen2:0.5b LLM can translate user input into Docker commands.

go run main.go

Then, ask the following questions:

  • "Give me a list of all the local Docker images."

  • "Give me a list of all containers, indicating their status as well."

  • "List all containers with Ubuntu as their ancestor."

You will see that the answers are inaccurate (except perhaps one). 🤔 The LLM is completely off the mark and talking nonsense. We can say that the LLM is terrible at Docker.

The right answers to the questions are:

  • "Give me a list of all the local Docker images."

    • docker images
  • "Give me a list of all containers, indicating their status as well."

    • docker ps -a
  • "List all containers with Ubuntu as their ancestor."

    • docker ps --filter 'ancestor=ubuntu'

Let's see how we can lend a hand to our LLM so it can help us generate relevant responses.

Add some context to the prompt.

An LLM is just a text generator that needs precise information to generate relevant responses (basically, you must also provide the answers in the prompt).

The prompt comprises instructions for the LLM and the user's questions. However, you can also add answers (context) to help the LLM generate more relevant responses.

I will update the code to add some context to the prompt:

  1. I will add a system message to provide more information about the task.

  2. I will add some examples of questions and Docker commands to the prompt.

  3. I will update the query to include the context in the prompt.

func main() {
    ollamaUrl := "http://localhost:11434"
    smallChatModel := "qwen2:0.5b"

    // 1️⃣ Make the system content more informative
    systemContent := `instruction: 
    translate the user question in docker command using the given context.
    Stay brief.`

    // 2️⃣ Prepare the context content with some examples
    contextContent := `<context>
        <doc>
        input: Give me a list of all containers, indicating their status as well.
        output: docker ps -a
        </doc>
        <doc>
        input: List all containers with Ubuntu as their ancestor.
        output: docker ps --filter 'ancestor=ubuntu'
        </doc>
        <doc>
        input: Give me a list of all the local Docker images.
        output: docker images
        </doc>
    </context>
    `

    for {
        question := input(smallChatModel)
        if question == "bye" {
            break
        }

        // Prepare the query
        query := llm.Query{
            Model: smallChatModel,
            Messages: []llm.Message{
                {Role: "system", Content: systemContent},
                // 3️⃣ Add the context to the prompt
                {Role: "system", Content: contextContent},
                {Role: "user", Content: question},
            },
            Options: llm.SetOptions(map[string]interface{}{
                option.Temperature: 0.0,
                option.RepeatLastN: 2,
                option.RepeatPenalty: 3.0,
                option.TopK: 10,
                option.TopP: 0.5,
            }),    
        }

        // Answer the question
        _, err := completion.ChatStream(ollamaUrl, query,
            func(answer llm.Answer) error {
                fmt.Print(answer.Message.Content)
                return nil
            })

        if err != nil {
            log.Fatal("😡:", err)
        }
        fmt.Println()
    }
}

Note: I added some tags (<context><doc></doc></context>) to the context content to make it more readable/understandable for the LLM. I found that the LLM's answers were more accurate when I delimited the examples in the context. This seems to help the LLM focus on the most appropriate example.
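
For what it's worth, you can also build this delimited context programmatically instead of hard-coding the string. Here is a small illustrative helper (my own sketch, not part of Parakeet) that only needs the standard strings package:

// buildContext wraps (input, output) example pairs in the same
// <context>/<doc> tags as the hand-written context above.
func buildContext(examples [][2]string) string {
    var sb strings.Builder
    sb.WriteString("<context>\n")
    for _, ex := range examples {
        sb.WriteString("<doc>\ninput: " + ex[0] + "\noutput: " + ex[1] + "\n</doc>\n")
    }
    sb.WriteString("</context>")
    return sb.String()
}

// usage:
// contextContent := buildContext([][2]string{
//     {"Give me a list of all the local Docker images.", "docker images"},
//     {"Give me a list of all containers, indicating their status as well.", "docker ps -a"},
// })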

Now, let's run the program again and see if the Qwen2:0.5b LLM is smarter.

go run main.go

Then, ask (again) the following questions:

  • "Give me a list of all the local Docker images."

  • "Give me a list of all containers, indicating their status as well."

  • "List all containers with Ubuntu as their ancestor."

You should see more accurate Docker commands generated by the LLM.

Unfortunately, we have only made our SLM smarter for these three examples. We need to give it the ability to build an appropriate context based on the user's request.

Let's do some RAG and create Embeddings.

RAG (Retrieval-Augmented Generation) and embeddings are key concepts in Generative AI:

Embeddings are numerical representations of words, phrases, or entire documents in a high-dimensional vector space. They capture semantic meaning, allowing similar concepts to be close to each other in this space.

RAG: This technique combines retrieval of relevant information with language generation. It works by:

  1. Creating embeddings of a knowledge base

  2. For a given query, finding the most relevant information using embedding similarity

  3. Providing this retrieved context to an LLM to generate a response

RAG enhances the accuracy and relevance of AI responses by grounding them in specific, retrieved information rather than relying solely on the model's pre-trained knowledge.
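
To make the "similarity" part of step 2 concrete, here is a minimal, self-contained cosine-similarity sketch in Go. It is purely illustrative (the toy vectors are made up, and Parakeet's vector store performs this comparison for you):

package main

import (
    "fmt"
    "math"
)

// cosineSimilarity returns a value between -1 and 1:
// the closer to 1, the more similar the two vectors are.
func cosineSimilarity(a, b []float64) float64 {
    var dot, normA, normB float64
    for i := range a {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    if normA == 0 || normB == 0 {
        return 0
    }
    return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
    // Toy 3-dimensional "embeddings" (real models produce hundreds of dimensions)
    question := []float64{0.9, 0.1, 0.0}
    dockerDoc := []float64{0.8, 0.2, 0.1}
    cookingDoc := []float64{0.0, 0.9, 0.4}

    fmt.Println("question vs docker doc :", cosineSimilarity(question, dockerDoc))
    fmt.Println("question vs cooking doc:", cosineSimilarity(question, cookingDoc))
}

The document with the higher score is the better match and gets injected into the prompt as context.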

Create Embeddings for Docker Commands

To create the embedding database, we need a knowledge base. We will use the Docker commands dataset from the Hugging Face Datasets library: https://huggingface.co/datasets/adeocybersecurity/DockerCommand.

To download the dataset, run the following command:

wget https://huggingface.co/datasets/adeocybersecurity/DockerCommand/resolve/main/NLDockercommands.json

You will get a JSON file containing the Docker commands dataset with the following structure:

[
  {
    "input": "Give me a list of containers that have the Ubuntu image as their ancestor.",
    "instruction": "translate this sentence in docker command",
    "output": "docker ps --filter 'ancestor=ubuntu'"
  },
]

We will use the all-minilm:33m LLM to generate embeddings from every record in the dataset. The all-minilm:33m model is a tiny language model of 67MB. This model is only used for embeddings and not for chatting.

We need to create a new Golang project and add the Parakeet package. This project will be a simple command-line application that interacts with all-minilm:33m to generate the embeddings and store them in a vector database.

The provided Go code reads a JSON file containing a list of items, processes each item to create embeddings using all-minilm:33m, and then saves these embeddings to a vector store.

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "os"
    "strconv"

    "github.com/parakeet-nest/parakeet/embeddings"
    "github.com/parakeet-nest/parakeet/llm"
)

// 1️⃣ Item: one record of the JSON dataset
type Item struct {
    Input       string `json:"input"`
    Instruction string `json:"instruction"`
    Output      string `json:"output"`
}

func main() {
    ollamaUrl := "http://localhost:11434"
    embeddingsModel := "all-minilm:33m" 

    // 2️⃣ Initialize the vector store
    store := embeddings.BboltVectorStore{}
    store.Initialize("../embeddings.db")

    // 3️⃣ Read the JSON file
    fileContent, err := os.ReadFile("./NLDockercommands.json")
    if err != nil {
        log.Fatal("😠 Error reading file:", err)
    }

    // 4️⃣ Parse the JSON data
    var items []Item
    err = json.Unmarshal(fileContent, &items)
    if err != nil {
        log.Fatal("😠 Error parsing JSON:", err)
    }

    // 5️⃣ Create and save the embeddings
    for i, item := range items {
        fmt.Println("📝 Creating embedding from record ", i+1)

        doc := fmt.Sprintf("Input: %s \n\nOutput:%s", item.Input, item.Output)

        embedding, err := embeddings.CreateEmbedding(
            ollamaUrl,
            llm.Query4Embedding{
                Model:  embeddingsModel,
                Prompt: doc,
            },
            strconv.Itoa(i+1),
        )
        if err != nil {
            fmt.Println("😡:", err)
        } else {
            _, err := store.Save(embedding)
            if err != nil {
                fmt.Println("😡:", err)
            }
        }
    }
}

  1. Item Struct: Defines a struct Item with three fields: Input, Instruction, and Output, all of which are strings and will be parsed from the JSON file.

  2. Initialization:

  • Initializes a BboltVectorStore to store embeddings, pointing to a database file ../embeddings.db. (Parakeet provides two kinds of vector stores: an in-memory vector store and a Bbolt vector store. The latter uses Bbolt, an embedded key/value database for Go, to persist the vectors and the related data.)

  3. Read the JSON File: reads the content of NLDockercommands.json into fileContent.

  4. Parse JSON Data: unmarshals the JSON content into a slice of Item structs.

  5. Create and Save Embeddings:

  • Iterates over each Item in the parsed data.

  • For each item, it constructs a document string doc combining the Input and Output fields.

  • Calls embeddings.CreateEmbedding with the constructed document to create an embedding.

  • Saves the embedding to the vector store.

Ok, now, let's run the program to create the embeddings and store them in the vector database: embeddings.db.

go run main.go

Wait for the program to finish processing all the items in the dataset (you only need to run it once, unless you want to update the dataset). Once completed, you will have a database of embeddings for the 2415 records of the Docker commands dataset.
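
Optionally, you can sanity-check the store before wiring it into the chat program. The sketch below reuses only the Parakeet calls already shown in this post; the test question and the 0.4 threshold are arbitrary choices of mine:

package main

import (
    "fmt"
    "log"

    "github.com/parakeet-nest/parakeet/embeddings"
    "github.com/parakeet-nest/parakeet/llm"
)

func main() {
    ollamaUrl := "http://localhost:11434"
    embeddingsModel := "all-minilm:33m"

    // Open the vector store created by the previous program
    store := embeddings.BboltVectorStore{}
    store.Initialize("../embeddings.db")

    // Create an embedding from a test question
    embeddingFromQuestion, err := embeddings.CreateEmbedding(
        ollamaUrl,
        llm.Query4Embedding{
            Model:  embeddingsModel,
            Prompt: "List all containers with Ubuntu as their ancestor.",
        },
        "question",
    )
    if err != nil {
        log.Fatalln("😡:", err)
    }

    // Retrieve the 3 closest records with a similarity score above 0.4
    similarities, err := store.SearchTopNSimilarities(embeddingFromQuestion, 0.4, 3)
    if err != nil {
        log.Fatalln("😡:", err)
    }
    fmt.Println("🎉 similarities found:", len(similarities))

    // Print the context that would be injected into the prompt
    fmt.Println(embeddings.GenerateContextFromSimilarities(similarities))
}

If the printed context contains Docker commands related to the question, the store is ready to use.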

Now, we are ready to update the first program to use the embedding database to provide context to the LLM.

Use Embeddings to enhance the LLM

We will update the source code again to use the embeddings we created in the previous step.

This time, the application interacts with two models (all-minilm:33m for the embeddings and qwen2:0.5b for the chat) to translate user questions into Docker commands using a retrieved context:

func main() {
    // 1️⃣ Configuration
    ollamaUrl := "http://localhost:11434"
    smallChatModel := "qwen2:0.5b"
    embeddingsModel := "all-minilm:33m"

    // 2️⃣ System Content
    systemContent := `instruction: 
    translate the user question in docker command using the given context.
    Stay brief.`

    // 3️⃣ Embedding Store
    store := embeddings.BboltVectorStore{}
    store.Initialize("../embeddings.db")

    // 4️⃣ Main Loop
    for {
        question := input(smallChatModel)
        if question == "bye" {
            break
        }

        // 5️⃣ Create an embedding from the question
        embeddingFromQuestion, err := embeddings.CreateEmbedding(
            ollamaUrl,
            llm.Query4Embedding{
                Model:  embeddingsModel,
                Prompt: question,
            },
            "question",
        )
        if err != nil {
            log.Fatalln("😡:", err)
        }

        // 6️⃣ Search for similarity
        fmt.Println("🔎 searching for similarity...")
        similarities, _ := store.SearchTopNSimilarities(
            embeddingFromQuestion, 
            0.4, 
            3)

        // 7️⃣ Generate context content
        contextContent := embeddings.GenerateContextFromSimilarities(similarities)
        fmt.Println("🎉 similarities:", len(similarities))

        // 8️⃣ Prepare the query
        query := llm.Query{
            Model: smallChatModel,
            Messages: []llm.Message{
                {Role: "system", Content: systemContent},
                {Role: "system", Content: contextContent},
                {Role: "user", Content: question},
            },
            Options: llm.SetOptions(map[string]interface{}{
                option.Temperature: 0.0,
                option.RepeatLastN: 2,
                option.RepeatPenalty: 3.0,
                option.TopK: 10,
                option.TopP: 0.5,
            }),    
        }

        // 9️⃣ Answer the question
        _, err = completion.ChatStream(ollamaUrl, query,
            func(answer llm.Answer) error {
                fmt.Print(answer.Message.Content)
                return nil
            })

        if err != nil {
            log.Fatal("😡:", err)
        }

        fmt.Println()
    }
}

  1. Configuration: Sets the Ollama URL and the model names for the chat LLM and the embeddings LLM.

  2. System Content: Defines the instruction for the language model to translate user questions into Docker commands.

  3. Embedding Store: Initializes a vector store for embeddings using a Bbolt database.

  4. Main Loop: Continuously prompts the user for input until the user types "bye".

  5. Embedding Creation: Creates an embedding for the user's question using the specified embeddings model (all-minilm:33m).

  6. Similarity Search: Searches for similar embeddings in the vector store.

  7. Context Generation: Generates context content from the found similarities.

  8. Query Preparation: Prepares a query for the Chat LLM (qwen2:0.5b) with the system content, context content, and user question.

  9. Answer Handling: Sends the query to qwen2:0.5b and prints the response.

Remark 1: This line of code similarities, _ := store.SearchTopNSimilarities(embeddingFromQuestion, 0.4, 3) searches the vector store for the top 3 embeddings that are most similar to embeddingFromQuestion with a similarity score above 0.4. The results are stored in the similarities variable.

Remark 2: SearchTopNSimilarities uses the cosine distance calculation to compare embeddings.

Remark 3: Limiting the number of retrieved similarities avoids overloading the context with unnecessary information and helps the SLM stay focused.

OK, now, let's run the program again and see if the Qwen2:0.5b LLM is smarter on other commands.

go run main.go

Then, ask (again) the following questions:

  • "Give me a list of all the local Docker images."

  • "Give me a list of all containers, indicating their status as well."

  • "List all containers with Ubuntu as their ancestor."

  • "Can you list down the images that are dangling?"

  • "Show the list of nginx images in the store."

  • "Show the list of redis images in the store."

You should see more accurate Docker commands generated by the LLM.

The dataset I found on Hugging Face isn't perfect (it's missing quite a few Docker commands, like docker build, and the Docker Compose commands), but it shows you the approach for "augmenting" an LLM without having to do fine-tuning.

So you can see that with a dataset of specific information and appropriate prompts, it is entirely possible to use an SLM in a useful and relevant way for your applications.

That's all for today. Next time, I'll explore other RAG techniques and discuss the principles of document chunking.