Baby steps with Kronk
Updated on 2025-12-01 with Kronk v0.25.0

Kronk provides a high-level API on top of the Yzma library to develop Generative AI applications in Go by incorporating local model inference directly into your application (so your application doesn't depend on an external service such as Ollama).
This means you can develop completely independent Generative AI applications in Go, without relying on any third-party solution to run model inference.
Kronk aims to be close to the OpenAI API (and targets compatibility with it), which will make it easier to use.
This project was initiated by William (Bill) Kennedy, a Go consultant and trainer, who is notably a co-author of the excellent book Go in Action and above all one of the founding members of GoBridge.
Fun fact: I met Bill for the first time when, to my great surprise, I was interviewed on his podcast, the Ardan Labs Podcast, in 2023, and it was a really fun moment.
Today, we're going to see how to build our first projects with Kronk.
Prerequisites
Installing dependencies
You'll need to have installed the libraries necessary for using Yzma (for installation and usage, see the previous article: Installing and Using Yzma on a Jetson Orin Nano).
Note: you don't need to have a Jetson Orin Nano to use Kronk and Yzma, even though it's the platform I use for my tests. The dependencies also exist for other platforms, which you can find listed here: support.
Download a model
You'll need a model in GGUF format to run your tests with Kronk. My favorite model is Qwen2.5:0.5b, which you can find on Hugging Face:
mkdir first-steps-with-kronk
cd first-steps-with-kronk
curl -L -o qwen2.5-0.5b-instruct-q4_k_m.gguf --progress-bar "https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_k_m.gguf?download=true"
First project with Kronk: simple completion
We're going to do a "first completion" with Kronk using the model we just downloaded. But first, we need to initialize our Go module and add the Kronk dependency:
go mod init first-steps-with-kronk
go get github.com/ardanlabs/kronk@v0.25.0
touch main.go
Then create the main.go file with the following content:
main.go
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"strings"
	"time"

	"github.com/ardanlabs/kronk"
	"github.com/ardanlabs/kronk/model"
)

func main() {
	// Step 1️⃣
	ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
	defer cancel()

	// Step 2️⃣
	modelFile := "./qwen2.5-0.5b-instruct-q4_k_m.gguf"
	libPath := os.Getenv("YZMA_LIB")

	// Step 3️⃣
	// Initialize Kronk
	err := kronk.Init(libPath, kronk.LogSilent)
	if err != nil {
		log.Fatal("😡 Error initializing Kronk:", err)
	}

	// Step 4️⃣
	// modelInstances represents the number of instances of the model to create.
	// Unless you have more than 1 GPU, the recommended number of instances is 1.
	const modelInstances = 1
	modelConfig := model.Config{
		ModelFile: modelFile,
	}
	// Create a new Kronk inference model
	krn, err := kronk.New(modelInstances, modelConfig)
	if err != nil {
		log.Fatal("😡 Unable to create inference model:", err)
	}
	defer krn.Unload(context.Background())

	// Step 5️⃣
	data := model.D{
		"messages": model.DocumentArray(
			model.ChatMessage("system", "You are a helpful assistant, expert in Star Trek."),
			model.ChatMessage("user", "Who is Jean-Luc Picard?"),
		),
	}
	params := model.Params{
		Temperature: 0.0,
		TopP:        0.9,
	}

	// Step 6️⃣
	response, err := krn.Chat(ctx, params, data)
	if err != nil {
		log.Fatal("😡 Chat:", err)
	}

	// Step 7️⃣
	// Print the response
	fmt.Println(strings.Repeat("-", 60))
	fmt.Println(response.Choice[0].Delta.Content)
	fmt.Println("\n" + strings.Repeat("-", 60))
	log.Println("✅ Chat complete.")
}
Some explanations
Here's what the program does step by step:
Context configuration
- Creating a context with a 120-second timeout to limit execution time
- The defer cancel() ensures that context resources will be freed at the end
Parameter preparation
- Defining the path to the downloaded GGUF model
- Retrieving the Yzma library path from the YZMA_LIB environment variable
Kronk initialization
- Calling kronk.Init() to initialize the Yzma library
- The kronk.LogSilent parameter disables verbose logs
- In case of error, the program stops immediately
Creating the inference instance
- The modelInstances = 1 parameter defines the number of model instances to create
- Unless you have more than 1 GPU, the recommended number of instances is 1
- model.Config contains the model configuration with the path to the GGUF file
- kronk.New() loads the model into memory and creates the inference instance
- The defer krn.Unload(context.Background()) ensures that the model will be unloaded from memory at the end
Preparing messages and parameters
- Building a data structure (model.D) containing messages in OpenAI format
- Using model.DocumentArray() and model.ChatMessage() helper functions
- The "system" message defines the assistant's role (Star Trek expert)
- The "user" message contains the question asked
- Inference parameters are defined separately in a model.Params structure
Executing the chat
- Calling krn.Chat() with context, parameters, and data
- Temperature: 0.0 makes responses deterministic (no variability)
- TopP: 0.9 limits sampling to the most probable tokens whose cumulative probability reaches 0.9
- In case of error, the program stops
Displaying the result
- Displaying the response generated by the model (response.Choice[0].Delta.Content)
Running the program
The first time, remember to run go mod tidy to download the dependencies, then run the program with the following command:
go run main.go
And you should have output like this after a few seconds (it all depends on your hardware of course):
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: Orin, compute capability 8.7, VMM: yes
load_backend: loaded CUDA backend from /home/k33g/yzma-projects/yzma_cuda_lib/libggml-cuda.so
load_backend: loaded CPU backend from /home/k33g/yzma-projects/yzma_cuda_lib/libggml-cpu-armv8.2_2.so
------------------------------------------------------------
Jean-Luc Picard is a fictional character created by the creator of Star Trek, the show and its franchise. He is a starship commander on the starship Enterprise, the starship that is the flagship of the Starfleet crew.
Jean-Luc Picard is a highly advanced AI, a type of artificial intelligence that is designed to assist and facilitate human decision-making. He is known for his advanced programming, his ability to learn from experience, and his ability to make decisions based on complex calculations and data.
Jean-Luc Picard is a member of the crew of the Enterprise, a starship that is the flagship of the Starfleet crew. He is a member of the crew of the Enterprise-A, a starship that is the flagship of the Starfleet crew. He is also a member of the crew of the Enterprise-B, a starship that is the flagship of the Starfleet crew.
Jean-Luc Picard is a member of the crew of the Enterprise-C, a starship that is the flagship of the Starfleet crew. He is also a member of the crew of the Enterprise-D, a starship that is the flagship of the Starfleet crew.
------------------------------------------------------------
2025/11/30 07:11:44 ✅ Chat complete.
You see, it's pretty simple to use Kronk to do local inference with GGUF models in Go 🥰!
Streaming Chat completion
If the model takes time to respond, you can display the response as it's being generated (streaming) rather than waiting for the completion to finish. This greatly improves the user experience, since users see the results appear little by little. To do this, we need to modify the previous code a bit.
We're going to replace this code:
response, err := krn.Chat(ctx, params, data)
if err != nil {
	log.Fatal("😡 Chat:", err)
}
With this code:
ch, err := krn.ChatStreaming(ctx, params, data)
if err != nil {
	log.Fatal("😡 Chat streaming:", err)
} else {
	log.Println("😁 Chat streaming ready...")
}
for resp := range ch {
	fmt.Print(resp.Choice[0].Delta.Content)
}
Brief explanations
Here's what this code does step by step:
Creating a streaming channel
- krn.ChatStreaming() returns a channel (ch) instead of a complete response
- This channel will receive response fragments as they're generated
- The parameters (Messages, Temperature, TopP) remain identical to non-streaming mode
Reception loop
- for resp := range ch iterates over each fragment received from the channel
- Each resp contains a small piece of the generated text
- resp.Choice[0].Delta.Content extracts the fragment content
- fmt.Print() displays the fragment immediately (without newline)
Complete code
main.go
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"strings"
	"time"

	"github.com/ardanlabs/kronk"
	"github.com/ardanlabs/kronk/model"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
	defer cancel()

	modelFile := "./qwen2.5-0.5b-instruct-q4_k_m.gguf"
	libPath := os.Getenv("YZMA_LIB")

	// Initialize Kronk
	err := kronk.Init(libPath, kronk.LogSilent)
	if err != nil {
		log.Fatal("😡 Error initializing Kronk:", err)
	}

	// modelInstances represents the number of instances of the model to create.
	// Unless you have more than 1 GPU, the recommended number of instances is 1.
	const modelInstances = 1
	modelConfig := model.Config{
		ModelFile: modelFile,
	}
	// Create a new Kronk inference model
	krn, err := kronk.New(modelInstances, modelConfig)
	if err != nil {
		log.Fatal("😡 Unable to create inference model:", err)
	}
	defer krn.Unload(context.Background())

	data := model.D{
		"messages": model.DocumentArray(
			model.ChatMessage("system", "You are a helpful assistant, expert in Star Trek."),
			model.ChatMessage("user", "Who is Jean-Luc Picard?"),
		),
	}
	params := model.Params{
		Temperature: 0.0,
		TopP:        0.9,
	}

	ch, err := krn.ChatStreaming(ctx, params, data)
	if err != nil {
		log.Fatal("😡 Chat streaming:", err)
	} else {
		log.Println("😁 Chat streaming ready...")
	}

	log.Println("⏳ Chat streaming is starting...")
	fmt.Println(strings.Repeat("-", 60))
	for resp := range ch {
		fmt.Print(resp.Choice[0].Delta.Content)
	}
	fmt.Println("\n" + strings.Repeat("-", 60))
	log.Println("✅ Chat streaming complete.")
}
All you have to do now is run the program again with the command:
go run main.go
And there you go! You now have a program that uses streaming chat completion with a GGUF model locally 🥳! You can see that the code isn't much more complicated.
You can find the complete source code for these examples here: https://codeberg.org/GenAI-On-Small-Devices/genai-with-kronk.
See you very soon for new adventures with Kronk and Yzma! 🤓