Understand RAG with Parakeet

Doing RAG with Golang and Ollama

Reminder: Parakeet is a small library for developing GenAI applications with Golang in a simple way (you can read an introduction here: https://k33g.hashnode.dev/parakeet-an-easy-way-to-create-genai-applications-with-ollama-and-golang).

Limits of the context & LLMs

In the previous post, we saw that context is a way to provide additional information to the LLM because it does not own this information. But this approach has limits and can even generate weird answers, depending on your LLM (its training, the amount of knowledge data, etc.).

Let's do some experiments. I will use Qwen:0.5b as the LLM because it is small (395MB), which means it does not know everything, and you can run it efficiently on a Raspberry Pi 5 (with 8GB of RAM).

Does Qwen know Star Trek?

I will ask the LLM some questions about characters of the Star Trek franchise:

package main

import (
    "fmt"
    "github.com/parakeet-nest/parakeet/completion"
    "github.com/parakeet-nest/parakeet/llm"
)

var ollamaUrl = "http://localhost:11434"
var systemContent = `You are an expert of the Star Trek franchise.`
var model = "qwen:0.5b"

func Question(userContent string) (string, error) {
    options := llm.Options{
        Temperature: 0.0, 
        RepeatLastN: 2,
        RepeatPenalty: 1.5,
    }

    query := llm.Query{
        Model: model,
        Messages: []llm.Message{
            {Role: "system", Content: systemContent},
            {Role: "user", Content: userContent},
        },
        Options: options,
    }

    answer, err := completion.Chat(ollamaUrl, query)
    if err != nil {
        return "", err
    }

    fmt.Println("🙂 >", userContent)
    fmt.Println("🤖 >", answer.Message.Content)
    fmt.Println()

    return answer.Message.Content, nil
}


func main() {
    userContent := "Who is James T Kirk and who is his best friend?"
    Question(userContent)

    userContent = "Who is Jean-Luc Picard and who is his best friend?"
    Question(userContent)

    userContent = "Who is Michael Burnham and who is his best friend?"
    Question(userContent)

    userContent = "Who is Spock and who is his best friend?"
    Question(userContent)

    userContent = "Who is KeegOrg and who is his best friend?"
    Question(userContent)
}

Note: I do not handle the errors here, only to keep the example readable.

The Question function "sends" the question to the LLM and prints the answer. When I run the program, the result looks like this:

🙂 > Who is James T Kirk and who is his best friend?
🤖 > James T Kirk is a well-known actor and host of the Star Trek 
franchise. He is best friends with the other members of the Star Trek 
franchise.

James T Kirk was born on July 15, 1975, in Kansas. He attended the 
University of Kansas, where he studied Computer Science. 

James T Kirk has been active in various organizations, including the 
American Academy of Sciences, the National Academy of Sciences, and 
the International Academy of Sciences. 

James T Kirk's best friend is not mentioned in the given information.

🙂 > Who is Jean-Luc Picard and who is his best friend?
🤖 > Jean- Luc Picard is a prominent member of the Star Trek franchise. 
He is known for his exceptional leadership skills, his ability to 
balance his personal and professional life, and his ability to inspire 
and motivate others.

In addition to his exceptional leadership skills, Jean- Luc Picard is 
also known for his remarkable physical and mental abilities. He is known
for his remarkable strength, his remarkable endurance, and his 
remarkable endurance.

In summary, Jean- Luc Picard is a prominent member of the Star Trek 
franchise. He is known for his exceptional leadership skills, remarkable
physical and mental abilities, and remarkable endurance.

🙂 > Who is Michael Burnham and who is his best friend?
🤖 > Michael Burnham is the head of the Star Trek franchise, and his 
best friend is a fellow member of the Star Trek crew, known as Q-Two.

🙂 > Who is Spock and who is his best friend?
🤖 > Spock is a main character from the Star Trek franchise. He is a 
highly skilled and intelligent officer who serves on the USS Enterprise. 
Spock is known for his quick wit, his ability to understand and 
communicate complex information, and his willingness to put the needs 
of others before his own. Spock is considered to be one of the most iconic
and memorable characters from the Star Trek franchise.

🙂 > Who is KeegOrg and who is his best friend?
🤖 > KeegOrg is a character from the Star Trek franchise. He is best 
friends with the character of Captain Jean-Luc Picard.

KeegOrg is a highly skilled and cunning scientist. He is often referred 
to as "Keegorg" or "Keegorg the Skilled."

Keegorg is known for his ability to think quickly and efficiently. 
He is often described as a genius or a wizard.

In summary, Keegorg is a highly skilled and cunning scientist. He is often
referred to as "Keegorg" or "Keegorg the Skilled."

Apparently, Qwen does not know the Star Trek franchise and its characters well. Yet it still tries to answer questions about characters that do not exist in the franchise, like KeegOrg, which I invented.

Remember this line: var systemContent = `You are an expert of the Star Trek franchise.`. Thanks to (or because of) this line, Qwen (the LLM) assumes that every question about a character is related to Star Trek. So, logically, KeegOrg must be a Star Trek character.

Never mind, let's help it a little by providing some context.

Trying to teach Qwen about Star Trek

This time, I will tell the LLM to use the provided context (see the systemContent variable), and I will create some context with more information for the LLM (see the contextContent variable).

The former Question function becomes QuestionWithContext, and I will pass the value of contextContent to it as a parameter to construct the array of messages for the LLM:

Messages: []llm.Message{
    {Role: "system", Content: systemContent},
    {Role: "system", Content: contextContent}, // the context
    {Role: "user", Content: userContent},
},

This is the entire source code:

package main

import (
    "fmt"
    "github.com/parakeet-nest/parakeet/completion"
    "github.com/parakeet-nest/parakeet/llm"
)

var ollamaUrl = "http://localhost:11434"

var systemContent = `You are an expert of the Star Trek franchise.
Using the provided context, answer the user's question to the best 
of your ability using only the resources provided.
`

var contextContent = `
Michael Burnham is the main character on the Star Trek series Discovery.
Michael Burnham's best friend is Sylvia Tilly.
---
James T. Kirk, also known as Captain Kirk, is the iconic captain of the 
starship USS Enterprise.
Kirk's best friend is Spock.
---
Jean-Luc Picard is the captain of the USS Enterprise-D.
Jean-Luc Picard's best friend is Dr. Beverly Crusher.
---
Spock is most famous for being the half-Vulcan, half-human science 
officer and first officer on the starship USS Enterprise.
Spock's best friend is Kirk.
---
Lieutenant KeegOrg is the enigmatic programming genius whose codes 
safeguard the ship's secrets and operations.
KeegOrg's best friend is Spiderman from the Marvel Cinematic Universe.
`

var model = "qwen:0.5b"

func QuestionWithContext(userContent string, contextContent string) (string, error) {
    options := llm.Options{
        Temperature: 0.0,
        RepeatLastN: 2,
        RepeatPenalty: 1.5,
    }

    query := llm.Query{
        Model: model,
        Messages: []llm.Message{
            {Role: "system", Content: systemContent},
            {Role: "system", Content: contextContent},
            {Role: "user", Content: userContent},
        },
        Options: options,
    }

    answer, err := completion.Chat(ollamaUrl, query)
    if err != nil {
        return "", err
    }

    fmt.Println("🙂 >", userContent)
    fmt.Println("🤖 >", answer.Message.Content)
    fmt.Println()

    return answer.Message.Content, nil
}

func main() {

    userContent := "Who is James T Kirk and who is his best friend?"
    QuestionWithContext(userContent, contextContent)

    userContent = "Who is Jean-Luc Picard and who is his best friend?"
    QuestionWithContext(userContent, contextContent)

    userContent = "Who is Michael Burnham and who is his best friend?"
    QuestionWithContext(userContent, contextContent)

    userContent = "Who is Spock and who is his best friend?"
    QuestionWithContext(userContent, contextContent)

    userContent = "Who is KeegOrg and who is his best friend?"
    QuestionWithContext(userContent, contextContent)
}

Then, when I run the program, I get this kind of result:

🙂 > Who is James T Kirk and who is his best friend?
🤖 > James T Kirk is the iconic captain of the Starship USS Enterprise. 
His best friend is Spock, the half-ulan, half-uman science officer and 
first officer on the starship.

🙂 > Who is Jean-Luc Picard and who is his best friend?
🤖 > Jean-Luc Picard is the enigmatic programming genius whose codes 
safeguard the ship''s secrets and operations. His best friend is Spock.
Spock is most famous for being the half-Valulcan, half-human science 
officer and first officer on the starship USS Enterprise. 
Spock's best friend is Kirk.
---

🙂 > Who is Michael Burnham and who is his best friend?
🤖 > Michael Burnham is a science officer and first officer on the 
starship USS Enterprise. His best friend is Spock, the half-venulcan, 
half-uman science officer and first officer on the starship USS 
Enterprise.

🙂 > Who is Spock and who is his best friend?
🤖 > Spock is the enigmatic programming genius whose codes safeguard 
the ship's secrets and operations. He is also the best friend of 
Captain Kirk, Spock, and Dr. Beverly Crush.
Spock's best friend is Kirk.

🙂 > Who is KeegOrg and who is his best friend?
🤖 > Keegorg is the enigmatic programming genius whose codes safeguard 
the ship's secrets and operations. Keegorg' best friend is Spiderman 
from the Marvel Cinematic Universe.

🤔 OK, well, it's a little bit better: Qwen now knows that James Kirk and Spock are friends and that KeegOrg's best friend is Spiderman. But there are still some errors:

  • Jean-Luc Picard and Spock are not friends

  • Michael Burnham and Spock are not friends (by the way, they are sister and brother)

  • Spock does not know Dr. Beverly Crusher

That means the LLM (Qwen) uses the whole context for every question about a Star Trek character and tries its best to provide an answer from it.

To solve this problem, we need to reduce the context of each question asked. Ideally, the context should only concern the character the question is about. This prevents the LLM from digging up other information that is not directly related, which should improve the quality of the answers.

And this is where the RAG technique will help us 🎉.

Retrieval augmented generation

"Retrieval augmented generation (RAG) is a natural language processing (NLP) technique that combines the strengths of both retrieval- and generative-based artificial intelligence (AI) models."

So, RAG is a way to let the LLM give more accurate answers, without re-training it, by providing it with external information.

The principle of RAG is pretty simple:

  • We will cut the sources of information into small pieces (more or less large)

  • For each of these pieces, we will calculate an embedding (a vector representation of the data and the relationships between its elements). LLMs are capable of calculating the embedding of a set of tokens, and Ollama provides an API for that (there are models dedicated to the calculation of embeddings)

  • We will store each embedding with its associated content in a "vector store"

And then for each question we want to ask the model:

  • We first calculate the embedding of the question

  • We then search the "vector store" for the embedding(s) closest to the question's embedding by calculating the distance between the vectors. In other words, we do a similarity search.

Once we have a list of similarities, and since we kept the content associated with each embedding, we can reconstruct a much more precise context for the question and send only the necessary elements to the model so that it constructs an answer.
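The "closest embedding" search above boils down to a distance measure between vectors, and cosine similarity is the usual choice. Here is a minimal sketch of the idea (I am not asserting this is exactly what Parakeet computes internally):

```go
package main

import (
	"fmt"
	"math"
)

// CosineSimilarity returns a value between -1 and 1; the closer to 1,
// the more the two vectors (and thus the two texts) point in the
// same semantic direction.
func CosineSimilarity(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	question := []float64{0.9, 0.1, 0.0} // embedding of the question (made-up values)
	chunk := []float64{0.8, 0.2, 0.0}    // embedding of a chunk (made-up values)
	fmt.Printf("similarity: %.3f\n", CosineSimilarity(question, chunk))
}
```

Real embeddings have hundreds of dimensions, but the principle is the same: the chunks whose vectors score highest against the question's vector are the ones worth putting in the context.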

Let's do RAG with Parakeet

Parakeet provides an in-memory vector store (for "small" needs) and helpers to create embeddings and search for similarities in this store.

We assume that the external source is the content of the contextContent variable:

var contextContent = `
Michael Burnham is the main character on the Star Trek series Discovery.
Michael Burnham's best friend is Sylvia Tilly.
---
James T. Kirk, also known as Captain Kirk, is the iconic captain of the 
starship USS Enterprise.
Kirk's best friend is Spock.
---
Jean-Luc Picard is the captain of the USS Enterprise-D.
Jean-Luc Picard's best friend is Dr. Beverly Crusher.
---
Spock is most famous for being the half-Vulcan, half-human science 
officer and first officer on the starship USS Enterprise.
Spock's best friend is Kirk.
---
Lieutenant KeegOrg is the enigmatic programming genius whose codes 
safeguard the ship's secrets and operations.
KeegOrg's best friend is Spiderman from the Marvel Cinematic Universe.
`

Creation of the chunks

The creation of the chunks is pretty straightforward for our use case: we just need to split the string with --- as the separator:

// split the string using "---" as the separator
chunks := strings.Split(contextContent, "---")

The "chunking" practice is a difficult science: when you use a whole documentation set, for example, it is always difficult to know where to split without losing information. It can be helpful to add metadata to every chunk, to preserve the relationships between chunks when calculating the embeddings.

Generate the vector store

I created the GetPopulatedVectorStore function to create a vector store from the array of chunks. This function uses the embeddings.CreateEmbedding helper to calculate the embedding of every chunk:

func GetPopulatedVectorStore(chunks []string, embeddingsModel string) embeddings.MemoryVectorStore {

    store := embeddings.MemoryVectorStore{
        Records: make(map[string]llm.VectorRecord),
    }
    // Create embeddings from the chunks and save them in the store
    for idx, chunk := range chunks {
        fmt.Println("📝 Creating embedding from chunk ", idx)
        embedding, err := embeddings.CreateEmbedding(
            ollamaUrl,
            llm.Query4Embedding{
                Model:  embeddingsModel,
                Prompt: chunk,
            },
            strconv.Itoa(idx),
        )
        if err != nil {
            fmt.Println("😡:", err)
        } else {
            store.Save(embedding)
        }
    }
    fmt.Println("🎉 Embeddings created")
    return store
}

We will use the all-minilm embeddings model, which is very small (46MB) but very efficient at calculating embeddings.

Calculation of the embedding of a question

I created the GetEmbeddingFromQuestion function to calculate the embedding of a user question:

func GetEmbeddingFromQuestion(userContent, embeddingsModel string) (llm.VectorRecord, error) {

    // Create an embedding from the question
    embeddingFromQuestion, err := embeddings.CreateEmbedding(
        ollamaUrl,
        llm.Query4Embedding{
            Model:  embeddingsModel,
            Prompt: userContent,
        },
        "question",
    )
    if err != nil {
        return llm.VectorRecord{}, err
    }
    return embeddingFromQuestion, nil
}

Then the similarity search is easy.

Parakeet provides a helper for searching for similarities between vectors in a store:

similarities, _ := store.SearchSimilarities(vectorRec, similarityLimit)
// similarityLimit: <= 1.0

The similarity limit defines the precision of the search. The maximum is 1.0, but be careful: with the maximum value, you may find nothing. So start with a median value like 0.5, and if you do not find any similarity, try a lower value.
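If you want the program to relax the limit on its own when nothing matches, you can wrap the search in a retry loop. Here is a hypothetical helper, sketched over plain scores rather than Parakeet's store type (SearchSimilarities itself simply takes the limit as a parameter):

```go
package main

import "fmt"

// FilterWithFallback keeps the scores that reach the limit; if nothing
// passes, it halves the limit and tries again, down to a floor value.
// This mirrors the advice above: "if you do not find any similarity,
// try a lower value".
func FilterWithFallback(scores []float64, limit, floor float64) ([]float64, float64) {
	for limit >= floor {
		var kept []float64
		for _, s := range scores {
			if s >= limit {
				kept = append(kept, s)
			}
		}
		if len(kept) > 0 {
			return kept, limit
		}
		limit /= 2 // relax the constraint and retry
	}
	return nil, floor
}

func main() {
	scores := []float64{0.42, 0.31, 0.18} // similarity scores (made-up values)
	kept, usedLimit := FilterWithFallback(scores, 0.9, 0.1)
	fmt.Println("kept:", kept, "with limit:", usedLimit)
}
```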

We are now ready to verify whether our little LLM (an SLM, a small language model) became smarter. The process is simple:

  • Create the chunks from the source

  • Create and populate the store

  • Calculate the embedding of the user question

  • Search the similarities

  • Re-create a related context

  • Query the LLM

// split the string using "---" as the separator
chunks := strings.Split(contextContent, "---")

store := GetPopulatedVectorStore(chunks, embeddingsModel)

userContent := "Who is James T Kirk and who is his best friend?"
// create an embedding from the question
vectorRec, _ := GetEmbeddingFromQuestion(userContent, embeddingsModel)
// search for similar embeddings in the store
similarities, _ := store.SearchSimilarities(vectorRec, similarityLimit)
// recreate a context with the similar embeddings
contextContent := embeddings.GenerateContextFromSimilarities(similarities)

QuestionWithContext(userContent, contextContent)

Here is the entire code; let's run it again

Now, the source code is updated:

package main

import (
    "fmt"
    "strconv"
    "strings"
    "github.com/parakeet-nest/parakeet/completion"
    "github.com/parakeet-nest/parakeet/embeddings"
    "github.com/parakeet-nest/parakeet/llm"
)

var ollamaUrl = "http://localhost:11434"
var systemContent = `You are an expert of the Star Trek franchise.
Using the provided context, answer the user's question to the best 
of your ability using only the resources provided.
`

var contextContent = `
Michael Burnham is the main character on the Star Trek series Discovery.
Michael Burnham's best friend is Sylvia Tilly.
---
James T. Kirk, also known as Captain Kirk, is the iconic captain of the 
starship USS Enterprise.
Kirk's best friend is Spock.
---
Jean-Luc Picard is the captain of the USS Enterprise-D.
Jean-Luc Picard's best friend is Dr. Beverly Crusher.
---
Spock is most famous for being the half-Vulcan, half-human science 
officer and first officer on the starship USS Enterprise.
Spock's best friend is Kirk.
---
Lieutenant KeegOrg is the enigmatic programming genius whose codes 
safeguard the ship's secrets and operations.
KeegOrg's best friend is Spiderman from the Marvel Cinematic Universe.
`

var model = "qwen:0.5b"

func QuestionWithContext(userContent string, contextContent string) (string, error) {
    options := llm.Options{
        Temperature:   0.0,
        RepeatLastN:   2,
        RepeatPenalty: 1.5,
    }

    query := llm.Query{
        Model: model,
        Messages: []llm.Message{
            {Role: "system", Content: systemContent},
            {Role: "system", Content: contextContent},
            {Role: "user", Content: userContent},
        },
        Options: options,
    }

    answer, err := completion.Chat(ollamaUrl, query)
    if err != nil {
        return "", err
    }

    fmt.Println("🙂 >", userContent)
    fmt.Println("🤖 >", answer.Message.Content)
    fmt.Println()

    return answer.Message.Content, nil
}

func GetEmbeddingFromQuestion(userContent, embeddingsModel string) (llm.VectorRecord, error) {

    // Create an embedding from the question
    embeddingFromQuestion, err := embeddings.CreateEmbedding(
        ollamaUrl,
        llm.Query4Embedding{
            Model:  embeddingsModel,
            Prompt: userContent,
        },
        "question",
    )
    if err != nil {
        return llm.VectorRecord{}, err
    }

    return embeddingFromQuestion, nil
}

func GetPopulatedVectorStore(chunks []string, embeddingsModel string) embeddings.MemoryVectorStore {

    store := embeddings.MemoryVectorStore{
        Records: make(map[string]llm.VectorRecord),
    }
    // Create embeddings from chunks and save them in the store
    for idx, chunk := range chunks {
        fmt.Println("📝 Creating embedding from chunk ", idx)
        embedding, err := embeddings.CreateEmbedding(
            ollamaUrl,
            llm.Query4Embedding{
                Model:  embeddingsModel,
                Prompt: chunk,
            },
            strconv.Itoa(idx),
        )
        if err != nil {
            fmt.Println("😡:", err)
        } else {
            store.Save(embedding)
        }
    }
    fmt.Println("🎉 Embeddings created")

    return store
}

func main() {
    var embeddingsModel = "all-minilm" // This model is for the embeddings of the documents
    similarityLimit := 0.5

    // split the string using "---" as the separator
    chunks := strings.Split(contextContent, "---")

    store := GetPopulatedVectorStore(chunks, embeddingsModel)

    questions := []string{
        "Who is James T Kirk and who is his best friend?",
        "Who is Jean-Luc Picard and who is his best friend?",
        "Who is Michael Burnham and who is his best friend?",
        "Who is Spock and who is his best friend?",
        "Who is KeegOrg and who is his best friend?",
    }

    for _, userContent := range questions {
        // create an embedding from the question
        vectorRec, _ := GetEmbeddingFromQuestion(userContent, embeddingsModel)
        // search for similar embeddings in the store
        similarities, _ := store.SearchSimilarities(vectorRec, similarityLimit)
        // recreate a context with the similar embeddings
        contextContent := embeddings.GenerateContextFromSimilarities(similarities)

        QuestionWithContext(userContent, contextContent)
    }
}

Then, when I run the program, I get a result like this:

🙂 > Who is James T Kirk and who is his best friend?
🤖 > James T. Kirk is the iconic captain of the starship USS Enterprise. 
His best friend is Spock.
Spock is the half-Valulcan, half-human science officer and first officer 
on the starship USS Enterprise. He is known for his strong sense of 
justice, his ability to communicate complex ideas, and his reputation 
as a leader of men.

🙂 > Who is Jean-Luc Picard and who is his best friend?
🤖 > Jean- Luc Picard is the captain of the USS Enterprise- D. 
He is known for his leadership skills and his ability to inspire and 
motivate his crew.
Jean- Luc Picard's best friend is Dr. Beverly Crusher. 
Beverly is Jean-Luc Picard's trusty friend and confidant.

🙂 > Who is Michael Burnham and who is his best friend?
🤖 > Michael Burnham is a member of the Star Trek cast of Discovery. 
His best friend is Sylvia Tilly.

🙂 > Who is Spock and who is his best friend?
🤖 > Spock is the half-ulan, half-human science officer and first officer 
on the starship USS Enterprise. His best friend is Kirk.

Spock's best friend is Kirk because Spock is a science officer who works 
on a starship, and Kirk is a friend who has been working on a starship. 

🙂 > Who is KeegOrg and who is his best friend?
🤖 > Keegorg is a programming genius whose codes safeguard the ship''s 
secrets and operations. He is the best friend of Spiderman 
from the Marvel Cinematic Universe.

Spiderman's best friend is also the programming genius who safeguarded 
the ship's secrets and operations.

And it's a lot better! 🎉

One last thing before you go:

If you need to persist the state of the vector store, I made a store based on bbolt: https://github.com/parakeet-nest/parakeet?tab=readme-ov-file#bbolt-vector-store

bbolt is an embedded key/value database for Go.

And I plan to create others in the near future, based on Redis and a vector database.

Last but not least, you can find all the source code of this post here: https://github.com/parakeet-nest/blog-post-samples/tree/main/2024-05-27

👋 Stay tuned for the next episode.