Snippet Management and Generation with `cagent`

With Docker Model Runner, `jan-nano` (a very small language model), and the Docker MCP Gateway.

Today, I'm going to tell you about cagent, a tool developed by Docker that lets you easily create and run specialized AI agents. It serves as an engine for orchestrating multiple agents with different capabilities and tools so that they can solve complex tasks together. It is configured with a YAML file, supports multiple AI providers (OpenAI, Anthropic, Gemini, and local models, including Docker Model Runner 🎉), and lets you add tools via the MCP protocol. It's an OSS project designed to be simple, flexible and extensible, and you can find it here: https://github.com/docker/cagent.

So we're going to use cagent with Docker Model Runner, Docker MCP Gateway and 🐙 Docker Compose to create an AI agent for managing and generating code snippets.

But first, let me tell you about one of my "hobby horses": small language models (SLMs), which I also like to call "Tiny Language Models" (because I try to use only the smallest ones).

Introduction: What if we could do without the "big" AI platforms? ... At least sometimes?

Since I started working on generative AI, one topic in particular has obsessed me: can small language models (SLMs) be useful for practical tasks? When I say "small", I'm thinking of models ranging from 0.5b to 8b parameters that can run on consumer hardware without too many resources (a qwen2.5:0.5b type model can run quite well on a Raspberry Pi 5 with 8GB of RAM).

Because I have no memory, one of my long-standing habits when I code is to go on the web to look for code examples:

  • "How to write to a file in Python?"

  • "How to make an HTTP request in Go?"

  • "How to define a structure with a method in Rust?"

  • ...

Today, I increasingly use ChatGPT, Claude.ai, Gemini, ... to get these code examples. But I often find myself without an internet connection (I code on the train with bad wifi, on planes, in isolated places, ...). What if I didn't need all that after all? For example, for code completion, good old LSP (Language Server Protocol) completion does the job very well. But for code examples, could a small language model do the trick? By helping it a little, the answer is yes! 🤓

Super powers for small language models: Thanks MCP! 🙏

One way to bring "super-powers" to these small models is to connect them to tools using the MCP protocol, which gives them access to different data sources, for example. The advantage of MCP servers is that you can reuse them for other applications... and package them in 🐳 Docker images.

During my experiments, for my "side projects", I developed several small MCP servers and notably two that I use every day:

  • mcp-snippets-server: a streamable HTTP MCP server that provides semantic search capabilities for code snippets using RAG (Retrieval-Augmented Generation). The code is available at https://github.com/micro-agent/mcp-snippets-server. The principle is simple: you provide a set of code snippets in a markdown file and at first launch the server creates a vector search index in a JSON file. Then, the MCP server is able to do similarity search on the snippets. mcp-snippets-server offers only one tool: search_snippet.

  • mcp-files-server: a streamable HTTP MCP server that provides file system access. The code is available at https://github.com/micro-agent/mcp-files-server. Its tools include read_file, write_file, delete_file, list_directory, create_directory, delete_directory and tree_view. Very important: for obvious security reasons, mcp-files-server can only work inside one specific directory (configurable at launch).
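To picture what "can only work inside one specific directory" means, here is a minimal Python sketch of the usual path-confinement check. It is not the code of mcp-files-server (I am only illustrating the idea): the function name `resolve_in_workspace` is hypothetical, and `/app/workspace` is simply the workspace path that the Compose file in this article assigns to the server.

```python
from pathlib import Path


def resolve_in_workspace(workspace: str, requested: str) -> Path:
    """Return the absolute path for `requested`, or raise if it escapes `workspace`."""
    root = Path(workspace).resolve()
    # Joining and then resolving collapses ".." segments (and follows symlinks),
    # so the check below sees the real target of the request.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{requested!r} is outside the workspace")
    return candidate


if __name__ == "__main__":
    # Hypothetical requests: one inside the workspace, one trying to escape it.
    print(resolve_in_workspace("/app/workspace", "snippets/hello.go"))
    try:
        resolve_in_workspace("/app/workspace", "../etc/passwd")
    except PermissionError as err:
        print("refused:", err)
```

The important design choice is to resolve the path before checking it: comparing raw strings would let a request such as `a/../../etc/passwd` slip through.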

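For these two MCP servers, a Docker image is available on Docker Hub: k33g/mcp-snippets-server and k33g/mcp-files-server (the images referenced in the compose.yaml file further down).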

Easy launch of MCP servers with the Docker MCP Gateway

To make these MCP servers easier to launch, I use the Docker MCP Gateway, which can launch multiple MCP servers in Docker containers and expose them as a single MCP server. It's also an OSS project developed by Docker. You can find it here: https://github.com/docker/mcp-gateway, and it has an official Docker image on Docker Hub: https://hub.docker.com/r/docker/mcp-gateway.

Configuration

In a working directory, create a snippets directory that will contain the markdown file(s) of code snippets. For this article I'm going to use Go snippets: https://github.com/k33g/bob-agent/blob/main/snippets/snippets-golang.md. Each snippet is short, has the following form, and ends with a ---------- separator:

    ## Hello World
    Basic program structure and main function
    ```go
    package main

    import "fmt"

    func main() {
        fmt.Println("Hello, World!")
    }
    ```
    ----------
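To make the role of the separator concrete, here is a small Python sketch of how a snippets file could be cut into individual snippets before indexing. This is only my illustration of the idea, not the server's actual code: `split_snippets` is a hypothetical helper, and the second sample snippet ("Variables") is made up for the example.

```python
DELIMITER = "----------"  # the separator that ends each snippet


def split_snippets(markdown: str, delimiter: str = DELIMITER) -> list[str]:
    """Split a snippets file into individual snippets, dropping empty chunks."""
    chunks = (chunk.strip() for chunk in markdown.split(delimiter))
    return [chunk for chunk in chunks if chunk]


# Tiny stand-in for a snippets file (code bodies omitted to keep it short).
SAMPLE = """## Hello World
Basic program structure and main function
----------
## Variables
Declaring and initializing variables
----------
"""

if __name__ == "__main__":
    for snippet in split_snippets(SAMPLE):
        # Print the title line of each snippet.
        print(snippet.splitlines()[0])
```

Each resulting chunk (title, description and code together) is what would then be turned into one embedding.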


Then, at the root of your working directory, create a `compose.yaml` file that will launch the MCP servers and the Docker MCP Gateway, and download the necessary models. Here is its content:

```yaml
services:

  mcp-gateway:
    image: docker/mcp-gateway:v1
    ports:
      - 9011:9011
    use_api_socket: true
    command:
      - --port=9011
      - --transport=streaming
      - --verbose
      - --catalog=/mcp/catalog.yaml
      - --servers=mcp-snippets,mcp-files
    configs:
      - source: catalog.yaml
        target:
          /mcp/catalog.yaml
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./config:/config
    depends_on:
      mcp-snippets:
        condition: service_healthy
      mcp-files:
        condition: service_healthy

  mcp-snippets:
    image: k33g/mcp-snippets-server:0.0.4
    environment:
      MCP_HTTP_PORT: 6060
      LIMIT: 0.6
      MAX_RESULTS: 3
      JSON_STORE_FILE_PATH: store/rag-memory-store.json
      DELIMITER: ----------

    volumes:
      - ./snippets:/app/snippets
      - ./store/snippets:/app/store
    models:
      mxbai-embed:
        endpoint_var: MODEL_RUNNER_BASE_URL
        model_var: EMBEDDING_MODEL

    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:6060/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s


  mcp-files:
    image: k33g/mcp-files-server:0.0.2
    environment:
      MCP_HTTP_PORT: 6060
      LOCAL_WORKSPACE_FOLDER: /app/workspace
    volumes:
      - ./workspace:/app/workspace
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:6060/health"]
      interval: 30s
      timeout: 30s
      retries: 5
      start_period: 40s

models:

  mxbai-embed:
    model: ai/mxbai-embed-large


configs:
  catalog.yaml:
    content: |
      registry:

        mcp-snippets:
          remote:
            url: "http://mcp-snippets:6060/mcp"
            transport_type: http

        mcp-files:
          remote:
            url: "http://mcp-files:6060/mcp"
            transport_type: http
```

Some explanations of the compose.yaml file:

The Docker Compose file orchestrates an MCP server system with 3 main services:

  • mcp-gateway: the Docker MCP Gateway that unifies access to MCP servers

  • mcp-snippets: the semantic search server for code snippets, based on embeddings. Its configuration uses LIMIT: 0.6 and MAX_RESULTS: 3 to limit search results, JSON_STORE_FILE_PATH for the embeddings storage file, and DELIMITER for the separator between snippets. Its models section indicates which embedding model to use (here mxbai-embed).

  • mcp-files: file access server with secure CRUD operations, where we specify the working directory with LOCAL_WORKSPACE_FOLDER.

We also have a models section to define the language models to use. Here, Docker Compose will download (if not already present) and configure the model: ai/mxbai-embed-large for embeddings.

Finally, the configs section contains the MCP catalog configuration, which maps MCP servers to URLs accessible via the Docker MCP Gateway.
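To give an intuition for LIMIT and MAX_RESULTS, here is a minimal Python sketch of a similarity search. I am assuming here that LIMIT acts as a minimum cosine similarity and MAX_RESULTS as a cap on the number of hits; that is my reading of the settings, not something taken from the server's code. The three-dimensional vectors are toy values standing in for real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, index, limit=0.6, max_results=3):
    """Return up to `max_results` (score, text) pairs scoring at least `limit`."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    kept = [item for item in scored if item[0] >= limit]
    kept.sort(key=lambda item: item[0], reverse=True)  # best match first
    return kept[:max_results]


# Toy index: (snippet title, fake embedding).
INDEX = [
    ("hello world", [1.0, 0.0, 0.0]),
    ("http request", [0.0, 1.0, 0.0]),
    ("write file", [0.9, 0.1, 0.0]),
]

if __name__ == "__main__":
    for score, text in search([1.0, 0.0, 0.0], INDEX):
        print(f"{score:.3f}  {text}")
```

With a small model, keeping only a few high-scoring snippets matters: everything returned by the search ends up in the model's context.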

Launch

We just need to launch everything with the command:

```bash
docker compose up
```

On first launch, the mcp-snippets server will create the vector search index in the store/rag-memory-store.json file from the snippets present in the snippets directory. So be patient ⏳.
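The "build once, then reuse" behavior can be sketched like this in Python. Everything here is an assumption made for illustration: the JSON layout, the `load_or_build_index` helper, and the `embed` callable, which stands in for a call to the embedding model.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_build_index(
    store_path: str,
    snippets: list[str],
    embed: Callable[[str], list[float]],
) -> list[dict]:
    """Load the index from `store_path` if it exists, otherwise build and save it."""
    path = Path(store_path)
    if path.exists():
        # Later launches: no embedding calls, just read the stored vectors.
        return json.loads(path.read_text(encoding="utf-8"))
    # First launch: one embedding call per snippet (the slow part).
    index = [{"text": text, "embedding": embed(text)} for text in snippets]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(index), encoding="utf-8")
    return index
```

This is why only the first launch is slow: as long as the store file is kept (here through the mounted volume), the embeddings are not recomputed.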

And if all goes well, the last lines of Docker Compose logs should look like this:

```
mcp-gateway-1   | - Reading configuration...
mcp-gateway-1   |   - Reading catalog from [/mcp/catalog.yaml]
mcp-gateway-1   | - Configuration read in 167.167µs
mcp-gateway-1   | - Those servers are enabled: mcp-snippets, mcp-files
mcp-gateway-1   | - Listing MCP tools...
mcp-gateway-1   |   > mcp-snippets: (1 tools)
mcp-gateway-1   |   > mcp-files: (8 tools)
mcp-gateway-1   | > 9 tools listed in 5.238208ms
mcp-gateway-1   | > Initialized in 12.091458ms
mcp-gateway-1   | > Start streaming server on port 9011
```

Here we can see that both MCP servers started properly and that the Docker MCP Gateway exposes their tools.

Now, it's time to implement our "Golang code snippets specialist" agent with cagent. And we'll call it Bob.

Creating the Bob agent with cagent

Installing cagent

You can find the cagent binaries for different OS on the releases page https://github.com/docker/cagent/releases/tag/v1.5.0.

In my case, I install it as follows (I work on macOS):

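```bash
#!/bin/bash

export VERSION="v1.5.0"
export ARCH="darwin-arm64"
# Download the release binary for the chosen version and architecture
curl -L -o cagent-${ARCH} https://github.com/docker/cagent/releases/download/${VERSION}/cagent-${ARCH}

# Make it executable, move it onto the PATH, then check the installation
chmod +x cagent-${ARCH}
sudo mv cagent-${ARCH} /usr/local/bin/cagent
cagent version
```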

Configuring the Bob agent: bob.yaml

So I created a bob.yaml file, in the current directory, to hold the configuration of the Bob agent. I just need to define the instructions for the agent, the language model to use (here jan-nano), and the MCP tools it can access (here the tools exposed by the Docker MCP Gateway):

```yaml
version: 1
# cagent run bob.yaml
agents:
  root:
    model: jan-nano
    description: Bob
    instruction: |
      Your name is Bob, you are a world class Golang expert.
      You write perfect, idiomatic, secure, efficient, well tested Golang code.
      You never write code in any other programming language.
      You always respond in markdown format with proper syntax highlighting for Golang.
      You always include comments and documentation for all functions and methods.
      You always follow best practices for error handling and logging.
      You always ensure your code is secure and free from common vulnerabilities.
      You always optimize your code for performance and efficiency.
      You always write clean, readable, and maintainable code.
      If the user responds no to the execution of a tool, do not ask again
      If you have executed a tool, do not try to execute it again. Unless the user explicitly asks you to.

      If the person responds no to the confirmation, the cycle is terminated and the tool must no longer be offered, except upon new request
      Keep a history of refusals and stop any repeated suggestion on this same tool.
      Do not rephrase the tool suggestion after a refusal.

    tools: ["search_snippet", "write_file"]
    toolsets:
      - type: mcp
        remote:
          url: http://localhost:9011/mcp
          transport_type: streamable

models:

  jan-nano:
    provider: dmr
    model: hf.co/menlo/jan-nano-gguf:q8_0
    temperature: 0.0
    max_tokens: 16384
    parallel_tool_calls: false
```

  • 🤚 You'll need to download the jan-nano model before launching the agent with the command: docker model pull hf.co/menlo/jan-nano-gguf:q8_0

  • 🤚 With small language models, it's often preferable to disable parallel tool calls (parallel_tool_calls: false) to avoid unexpected behaviors.

  • 🤚 Also note that you can filter the tools the agent may use with tools: ["search_snippet", "write_file"]. Restricting the usable tools is valuable with small local models: they pick the right tool more easily from a short list, whereas with a long list they can make mistakes.
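The effect of such a filter can be sketched in a few lines of Python. This is not how cagent implements it, just the idea of an allowlist: `filter_tools` is a hypothetical helper, and the tool names are the ones mentioned in this article.

```python
def filter_tools(available: list[str], allowed: list[str]) -> list[str]:
    """Keep only the allowed tools, failing loudly on a name that does not exist."""
    unknown = [name for name in allowed if name not in available]
    if unknown:
        raise ValueError(f"unknown tools in allowlist: {unknown}")
    return [name for name in available if name in allowed]


# Tool names mentioned in this article for the two MCP servers.
GATEWAY_TOOLS = [
    "search_snippet",
    "read_file",
    "write_file",
    "delete_file",
    "list_directory",
    "create_directory",
    "delete_directory",
    "tree_view",
]

if __name__ == "__main__":
    # The model only ever "sees" these two tool descriptions.
    print(filter_tools(GATEWAY_TOOLS, ["search_snippet", "write_file"]))
```

Fewer tool descriptions also means fewer tokens in the prompt, which helps a 4b model stay focused.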

Quick aside on the choice of language model

I chose jan-nano (https://huggingface.co/Menlo/Jan-nano-gguf) for several reasons: it's a rather small model (4b parameters) and, above all, it has a reputation for being very good at function calling (tools), which is an advantage when using MCP tools. This is essential, because small local models are generally considered weak in this area.

By the way, jan-nano has a "little sister", lucy, which also does very well in this area: https://huggingface.co/Menlo/Lucy-gguf, so don't hesitate to try it too.

End of the aside. Now let's start the Bob agent.

Starting the Bob agent

Run the following command in the directory where the bob.yaml file is located:

```bash
cagent run bob.yaml
```

And now rather than describing how to interact with Bob, I offer you a little video demonstration:

And there you have it! With cagent and Docker Model Runner, you can quickly create specialized AI agents for your needs (code generation, project structure creation, summaries, translations, ...) using small local language models and MCP tools.

All the source code is available here: https://github.com/k33g/bob-agent
