GenAI streaming API with LangChainJS, Ollama and Fastify

And this is still happening on a Pi 5 (propelled by 🐳 Docker Compose)

In the previous blog post (Let's chat about programming with LangChainJS and Ollama), we saw how to use LangChainJS with Node.js to prompt Ollama. Today, we will create a streaming API from our source code with the help of Fastify.

As in the previous blog posts, I use the Pi GenAI stack project to do this; it's a Docker Compose stack with all the dependencies and a Web IDE.

Don't forget to git pull the latest version.

To start the JavaScript Dev environment, type the following command:

docker compose --profile javascript up

Once the stack is started, you can connect to the Web IDE at http://<dns_name_of_the_pi_or_IP_address>:3001, or http://localhost:3001 if you are working directly on your Pi.

Of course, if it runs on a Pi, it can run on your workstation.

You can have a look at the npm dependencies here: https://github.com/bots-garden/pi-genai-stack/blob/main/js-dev-environment/package.json
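
If you want to reproduce this setup outside the stack, the main dependencies can be installed like this (a sketch; the linked package.json remains the source of truth for the exact versions):

npm install fastify @langchain/community @langchain/core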

Update index.mjs

Creating an API from our previous code is very straightforward with Fastify. The updated source code looks like this:

import Fastify from 'fastify'

import { ChatOllama } from "@langchain/community/chat_models/ollama"
import { StringOutputParser } from "@langchain/core/output_parsers"

import {
  SystemMessagePromptTemplate,
  HumanMessagePromptTemplate,
  ChatPromptTemplate
} from "@langchain/core/prompts"

// URL of the Ollama server, provided by the Docker Compose stack
const ollama_base_url = process.env.OLLAMA_BASE_URL

const model = new ChatOllama({
  baseUrl: ollama_base_url,
  model: "deepseek-coder",
  temperature: 0,
  repeatPenalty: 1,
  verbose: false
})

// The prompt is built from a system message (the model's instructions)
// and a human message (the user's {question})
const prompt = ChatPromptTemplate.fromMessages([
  SystemMessagePromptTemplate.fromTemplate(
    `You are an expert in computer programming.
     Please make friendly answer for the noobs.
     Add source code examples if you can.
    `
  ),
  HumanMessagePromptTemplate.fromTemplate(
    `I need a clear explanation regarding my {question}.
     And, please, be structured with bullet points.
    `
  )
])

const fastify = Fastify({
  logger: false
})

const { ADDRESS = '0.0.0.0', PORT = '8080' } = process.env

fastify.post('/prompt', async (request, reply) => {
  const question = request.body["question"]
  const outputParser = new StringOutputParser()
  // Build the chain: prompt -> model -> string output parser
  const chain = prompt.pipe(model).pipe(outputParser)

  // Ask for a stream of chunks instead of waiting for the complete answer
  const stream = await chain.stream({
    question: question,
  })

  reply.header('Content-Type', 'application/octet-stream')
  return reply.send(stream)
})

const start = async () => {
  try {
    await fastify.listen({ host: ADDRESS, port: parseInt(PORT, 10) })
  } catch (err) {
    fastify.log.error(err)
    process.exit(1)
  }
  console.log(`Server listening at ${ADDRESS}:${PORT}`)
}
start()

The most important and exciting thing is this part:

fastify.post('/prompt', async (request, reply) => {
  const question = request.body["question"]
  const outputParser = new StringOutputParser()
  // Build the chain: prompt -> model -> string output parser
  const chain = prompt.pipe(model).pipe(outputParser)

  // Ask for a stream of chunks instead of waiting for the complete answer
  const stream = await chain.stream({
    question: question,
  })

  reply.header('Content-Type', 'application/octet-stream')
  return reply.send(stream)
})
  • const question = request.body["question"] gets the question parameter from the body of the HTTP request.

  • reply.header('Content-Type', 'application/octet-stream') tells Fastify that the response will be a stream.

  • And the magic thing is that you only need to return reply.send(stream): Fastify takes the stream and sends the chunks to the client as they arrive (see the sketch below if you want to peek at the chunks on their way out).
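
By the way, chain.stream() returns an async iterable of string chunks (that's what the StringOutputParser produces). So, if you ever want to inspect the chunks before Fastify sends them, a variant like this should work (a sketch; the logging is only for illustration):

import { Readable } from 'node:stream'

// Wrap the LangChain stream in an async generator
// to inspect every chunk before it leaves the server
async function* logged(stream) {
  for await (const chunk of stream) {
    process.stdout.write(chunk) // echo the chunk on the server console
    yield chunk
  }
}

// then, in the route handler, replace the last line with:
return reply.send(Readable.from(logged(stream)))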

Start the stream and query Ollama

Now we have an HTTP server (thanks to Fastify), so let's start it:

node index.mjs

# you should get this message:
Server listening at 0.0.0.0:8080
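
The address and the port come from the ADDRESS and PORT environment variables (with 0.0.0.0 and 8080 as defaults), so you can also start the server on another port:

PORT=9090 node index.mjs

# you should get this message:
Server listening at 0.0.0.0:9090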

You can now try a curl request like this one:

curl -H "Content-Type: application/json" http://robby.local:8080/prompt \
-d '{
  "question": "what are structs in Golang?"
}'

robby.local is the DNS name of my Pi; replace it with yours, or with the IP address of your Pi.
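
By the way, curl is not mandatory: since Node.js 18, you can consume the same stream with the built-in fetch API. Here is a minimal client sketch (the file name query.mjs and the hostname are mine; adapt them to your setup):

// query.mjs: a minimal streaming client (a sketch, assumes Node.js 18+)
const response = await fetch('http://robby.local:8080/prompt', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ question: 'what are structs in Golang?' })
})

const decoder = new TextDecoder()
// response.body is a ReadableStream: display the answer chunk by chunk
for await (const chunk of response.body) {
  process.stdout.write(decoder.decode(chunk, { stream: true }))
}

Run it with node query.mjs and the answer is displayed as it is generated.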

And now, wait for the magic: the answer arrives chunk by chunk instead of all at once.

Fastify is a powerful and easy-to-use Node.js framework for developing web application servers. In the next blog post, we will see how to develop a single-page application using this API.

👋 Happy geeking!🤓