GenAI streaming API with LangChainJS, Ollama and Fastify
And yes, this still runs on a Pi 5 (powered by 🐳 Docker Compose)
In the previous blog post (Let's chat about programming with LangChainJS and Ollama), we saw how to use LangChainJS with Node.js to prompt Ollama. Today, we will create a streaming API from our source code with the help of Fastify.
As in the previous blog posts, I use the Pi GenAI stack project to do this; it's a Docker Compose stack that provides all the dependencies and a Web IDE.
Don't forget to git pull the latest version
To start the JavaScript Dev environment, type the following command:
docker compose --profile javascript up
Once the stack is started, you can connect to the WebIDE at http://<dns_name_of_the_pi_or_IP_address>:3001, or at http://localhost:3001 if you are working directly on your Pi.
Of course, if it runs on a Pi, it can run on your workstation.
You can have a look at the npm dependencies here: https://github.com/bots-garden/pi-genai-stack/blob/main/js-dev-environment/package.json
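By the way, if you want to reproduce this outside the stack, the code of this post only relies on three packages (this is an assumption based on its imports; the stack's package.json may pin specific versions), which you can install with npm:

npm install fastify @langchain/community @langchain/core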
Update index.mjs
Creating an API from our previous code is very straightforward with Fastify. So, the updated source code looks like this:
import Fastify from 'fastify'
import { ChatOllama } from "@langchain/community/chat_models/ollama"
import { StringOutputParser } from "@langchain/core/output_parsers"
import {
  SystemMessagePromptTemplate,
  HumanMessagePromptTemplate,
  ChatPromptTemplate
} from "@langchain/core/prompts"

// URL of the Ollama server (provided by the Docker Compose stack)
let ollama_base_url = process.env.OLLAMA_BASE_URL

const model = new ChatOllama({
  baseUrl: ollama_base_url,
  model: "deepseek-coder",
  temperature: 0,
  repeatPenalty: 1,
  verbose: false
})

// The prompt is built from a system message (the "personality" of the model)
// and a human message template with a {question} placeholder
const prompt = ChatPromptTemplate.fromMessages([
  SystemMessagePromptTemplate.fromTemplate(
    `You are an expert in computer programming.
    Please make friendly answer for the noobs.
    Add source code examples if you can.
    `
  ),
  HumanMessagePromptTemplate.fromTemplate(
    `I need a clear explanation regarding my {question}.
    And, please, be structured with bullet points.
    `
  )
])

const fastify = Fastify({
  logger: false
})

const { ADDRESS = '0.0.0.0', PORT = '8080' } = process.env

// The /prompt endpoint: build the chain and stream the answer back to the client
fastify.post('/prompt', async (request, reply) => {
  const question = request.body["question"]
  const outputParser = new StringOutputParser()
  const chain = prompt.pipe(model).pipe(outputParser)

  let stream = await chain.stream({
    question: question,
  })

  reply.header('Content-Type', 'application/octet-stream')
  return reply.send(stream)
})

const start = async () => {
  try {
    await fastify.listen({ host: ADDRESS, port: parseInt(PORT, 10) })
  } catch (err) {
    fastify.log.error(err)
    process.exit(1)
  }
  console.log(`Server listening at ${ADDRESS}:${PORT}`)
}
start()
The most important and exciting thing is this part:
fastify.post('/prompt', async (request, reply) => {
  const question = request.body["question"]
  const outputParser = new StringOutputParser()
  const chain = prompt.pipe(model).pipe(outputParser)

  let stream = await chain.stream({
    question: question,
  })

  reply.header('Content-Type', 'application/octet-stream')
  return reply.send(stream)
})
const question = request.body["question"] lets us read the question parameter from the body of the HTTP request.
reply.header('Content-Type', 'application/octet-stream') "explains" to Fastify that the response will be a stream.
And the magic thing is that you only need to return reply.send(stream) to send the stream of the answer back to the client.
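To check on the client side that the answer really arrives chunk by chunk, here is a minimal consumer sketch (this example is mine, not part of the Pi GenAI stack, and the file name stream-client.mjs is hypothetical); it relies on the fetch API built into Node.js 18+:

// stream-client.mjs (hypothetical example)
// Replace localhost with the DNS name or the IP address of your Pi if needed
const response = await fetch("http://localhost:8080/prompt", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ question: "what are structs in Golang?" })
})

// response.body is a ReadableStream: print every chunk as soon as it arrives
const decoder = new TextDecoder()
for await (const chunk of response.body) {
  process.stdout.write(decoder.decode(chunk, { stream: true }))
}

Run it with node stream-client.mjs while the server is up, and the answer should be printed progressively.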
Start the stream and query Ollama
Now we have an HTTP server (thanks to Fastify), so let's start it:
node index.mjs
# you should get this message:
Server listening at 0.0.0.0:8080
You can now try a curl request like this one:
curl -H "Content-Type: application/json" http://robby.local:8080/prompt \
  -d '{
    "question": "what are structs in Golang?"
  }'
robby.local is the DNS name of my Pi; replace it with yours, or with the IP address of your Pi.
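Side note: if the answer seems to arrive in one single block, it is probably curl buffering its output; the -N (or --no-buffer) flag makes the chunks show up as soon as they are received:

curl -N -H "Content-Type: application/json" http://robby.local:8080/prompt \
  -d '{
    "question": "what are structs in Golang?"
  }'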
And now wait for the magic:
Fastify is a powerful and easy-to-use Node.js framework for developing web application servers. In the next blog post, we will see how to develop a single-page application using this API.
👋 Happy geeking! 🤓