GenAI streaming API with LangChainJS, Ollama and Fastify
And this is still happening on a Pi 5 (and propelled by 🐳 Docker Compose)

In the previous blog post (Let's chat about programming with LangChainJS and Ollama), we saw how to use LangChainJS with Node.js to prompt Ollama. Today, we will create a streaming API from our source code with the help of Fastify.
As in the previous blog posts, I use the Pi GenAI stack project to do this; it's a Docker Compose stack with all the dependencies and a Web IDE.
Don't forget to git pull the latest version.
To start the JavaScript Dev environment, type the following command:
docker compose --profile javascript up
Once the stack is started, you can connect to the Web IDE with http://<dns_name_of_the_pi_or_IP_address>:3001 or http://localhost:3001 if you are working directly on your Pi.
Of course, if it runs on a Pi, it can run on your workstation.
You can have a look at the npm dependencies here: https://github.com/bots-garden/pi-genai-stack/blob/main/js-dev-environment/package.json
Update index.mjs
Creating an API from our previous code is very straightforward with Fastify. So, the updated source code looks like this:
import Fastify from 'fastify'
import { ChatOllama } from "@langchain/community/chat_models/ollama"
import { StringOutputParser } from "@langchain/core/output_parsers"
import {
  SystemMessagePromptTemplate,
  HumanMessagePromptTemplate,
  ChatPromptTemplate
} from "@langchain/core/prompts"

let ollama_base_url = process.env.OLLAMA_BASE_URL

const model = new ChatOllama({
  baseUrl: ollama_base_url,
  model: "deepseek-coder",
  temperature: 0,
  repeatPenalty: 1,
  verbose: false
})

const prompt = ChatPromptTemplate.fromMessages([
  SystemMessagePromptTemplate.fromTemplate(
`You are an expert in computer programming.
Please make friendly answer for the noobs.
Add source code examples if you can.
`
  ),
  HumanMessagePromptTemplate.fromTemplate(
`I need a clear explanation regarding my {question}.
And, please, be structured with bullet points.
`
  )
])

const fastify = Fastify({
  logger: false
})

const { ADDRESS = '0.0.0.0', PORT = '8080' } = process.env;

fastify.post('/prompt', async (request, reply) => {
  const question = request.body["question"]
  const outputParser = new StringOutputParser()
  const chain = prompt.pipe(model).pipe(outputParser)
  let stream = await chain.stream({
    question: question,
  })
  reply.header('Content-Type', 'application/octet-stream')
  return reply.send(stream)
})

const start = async () => {
  try {
    await fastify.listen({ host: ADDRESS, port: parseInt(PORT, 10) })
  } catch (err) {
    fastify.log.error(err)
    process.exit(1)
  }
  console.log(`Server listening at ${ADDRESS}:${PORT}`)
}
start()
The most important and exciting thing is this part:
fastify.post('/prompt', async (request, reply) => {
  const question = request.body["question"]
  const outputParser = new StringOutputParser()
  const chain = prompt.pipe(model).pipe(outputParser)
  let stream = await chain.stream({
    question: question,
  })
  reply.header('Content-Type', 'application/octet-stream')
  return reply.send(stream)
})
const question = request.body["question"] reads the question parameter from the JSON body of the HTTP request.
reply.header('Content-Type', 'application/octet-stream') "explains" to Fastify that the response will be a stream.
And the magic thing is that you only need to return:
reply.send(stream)
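To get a feel for what is being sent, here is a minimal sketch of how a stream of text chunks can be consumed one piece at a time. It assumes the stream is async-iterable and yields plain strings, which is what I expect from a chain ending with StringOutputParser; fakeTokenStream and collectChunks are hypothetical names used only for this illustration, they are not part of LangChainJS or Fastify.

```javascript
// Hypothetical stand-in for the stream returned by chain.stream():
// an async generator that yields small string chunks.
async function* fakeTokenStream() {
  for (const token of ["Structs ", "group ", "related ", "fields."]) {
    yield token
  }
}

// Consume any async-iterable stream chunk by chunk.
// onChunk is called for every chunk as soon as it arrives;
// the full text is returned once the stream is finished.
async function collectChunks(stream, onChunk = () => {}) {
  let answer = ""
  for await (const chunk of stream) {
    onChunk(chunk)
    answer += chunk
  }
  return answer
}
```

For example, collectChunks(fakeTokenStream(), (chunk) => process.stdout.write(chunk)) prints the chunks as they come and resolves with the complete text. In the route above, Fastify does the equivalent work for us and forwards each chunk to the HTTP client.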
Start the stream and query Ollama
Now, we have an HTTP server (thanks to Fastify), so start it:
node index.mjs
# you should get this message:
Server listening at 0.0.0.0:8080
You can now try a curl request like this one:
curl -H "Content-Type: application/json" http://robby.local:8080/prompt \
-d '{
"question": "what are structs in Golang?"
}'
robby.local is the DNS name of my Pi; replace it with yours, or with the IP address of your Pi.
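If you prefer JavaScript to curl, the same request can be sketched with the fetch API that ships with recent versions of Node.js (18 and later). This is only a sketch under a few assumptions: readTextStream and askQuestion are hypothetical helper names, and the base URL is the one of my Pi from the curl example.

```javascript
// Read a web ReadableStream of bytes and pass each decoded
// text chunk to onChunk; resolve with the complete text.
async function readTextStream(body, onChunk) {
  const reader = body.getReader()
  const decoder = new TextDecoder()
  let text = ""
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    // stream: true keeps multi-byte characters split across chunks intact
    const chunk = decoder.decode(value, { stream: true })
    onChunk(chunk)
    text += chunk
  }
  // Flush whatever the decoder may still hold
  const tail = decoder.decode()
  if (tail) {
    onChunk(tail)
    text += tail
  }
  return text
}

// Hypothetical helper: POST a question to the /prompt route
// and print the answer while it is being generated.
async function askQuestion(baseUrl, question) {
  const response = await fetch(`${baseUrl}/prompt`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question })
  })
  return readTextStream(response.body, (chunk) => process.stdout.write(chunk))
}
```

You would call it with something like askQuestion("http://robby.local:8080", "what are structs in Golang?"), again replacing robby.local with the name or IP address of your own Pi.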
And now wait for the magic:

Fastify is a powerful and easy-to-use Node.js framework for developing web application servers. In the next blog post, we will see how to develop a single-page application using this API.
👋 Happy geeking!🤓