GenAI streaming API with LangChainJS, Ollama and Fastify

And this is still happening on a Pi 5 (propelled by 🐳 Docker Compose)

In the previous blog post (Let's chat about programming with LangChainJS and Ollama), we saw how to use LangChainJS with Node.js to prompt Ollama. Today, we will create a streaming API from our source code with the help of Fastify.

As in the previous blog posts, I use the Pi GenAI stack project to do this; it's a Docker Compose stack with all the dependencies and a Web IDE.

Don't forget to git pull the latest version.

To start the JavaScript Dev environment, type the following command:

docker compose --profile javascript up

Once the stack is started, you can connect to the Web IDE at http://<dns_name_of_the_pi_or_IP_address>:3001, or http://localhost:3001 if you are working directly on your Pi.

Of course, if it runs on a Pi, it can run on your workstation.

You can have a look at the npm dependencies here: https://github.com/bots-garden/pi-genai-stack/blob/main/js-dev-environment/package.json
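
If you want to reproduce this setup outside the stack, the main dependencies can be installed like this (a sketch; the linked package.json remains the source of truth for the exact versions):

npm install fastify @langchain/community @langchain/core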

Update index.mjs

Creating an API from our previous code is very straightforward with Fastify. The updated source code looks like this:

import Fastify from 'fastify'

import { ChatOllama } from "@langchain/community/chat_models/ollama"
import { StringOutputParser } from "@langchain/core/output_parsers"

import {
  SystemMessagePromptTemplate,
  HumanMessagePromptTemplate,
  ChatPromptTemplate
} from "@langchain/core/prompts"

// URL of the Ollama server, provided by the Docker Compose stack
const ollama_base_url = process.env.OLLAMA_BASE_URL

const model = new ChatOllama({
  baseUrl: ollama_base_url,
  model: "deepseek-coder",
  temperature: 0,
  repeatPenalty: 1,
  verbose: false
})

// The prompt is built from a system message (the model's instructions)
// and a human message (the user's {question})
const prompt = ChatPromptTemplate.fromMessages([
  SystemMessagePromptTemplate.fromTemplate(
    `You are an expert in computer programming.
     Please make friendly answer for the noobs.
     Add source code examples if you can.
    `
  ),
  HumanMessagePromptTemplate.fromTemplate(
    `I need a clear explanation regarding my {question}.
     And, please, be structured with bullet points.
    `
  )
])

const fastify = Fastify({
  logger: false
})

const { ADDRESS = '0.0.0.0', PORT = '8080' } = process.env

fastify.post('/prompt', async (request, reply) => {
  const question = request.body["question"]
  const outputParser = new StringOutputParser()
  // Build the chain: prompt -> model -> string output parser
  const chain = prompt.pipe(model).pipe(outputParser)

  // Ask for a stream of chunks instead of waiting for the complete answer
  const stream = await chain.stream({
    question: question,
  })

  reply.header('Content-Type', 'application/octet-stream')
  return reply.send(stream)
})

const start = async () => {
  try {
    await fastify.listen({ host: ADDRESS, port: parseInt(PORT, 10) })
  } catch (err) {
    fastify.log.error(err)
    process.exit(1)
  }
  console.log(`Server listening at ${ADDRESS}:${PORT}`)
}
start()

The most important and exciting thing is this part:

fastify.post('/prompt', async (request, reply) => {
  const question = request.body["question"]
  const outputParser = new StringOutputParser()
  // Build the chain: prompt -> model -> string output parser
  const chain = prompt.pipe(model).pipe(outputParser)

  // Ask for a stream of chunks instead of waiting for the complete answer
  const stream = await chain.stream({
    question: question,
  })

  reply.header('Content-Type', 'application/octet-stream')
  return reply.send(stream)
})
  • const question = request.body["question"] gets the question parameter from the body of the HTTP request.

  • reply.header('Content-Type', 'application/octet-stream') tells Fastify that the response will be a stream.

  • And the magic thing is that you only need to return reply.send(stream): Fastify takes the stream and sends the chunks to the client as they arrive (see the sketch below if you want to peek at the chunks on their way out).
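
By the way, chain.stream() returns an async iterable of string chunks (that's what the StringOutputParser produces). So, if you ever want to inspect the chunks before Fastify sends them, a variant like this should work (a sketch; the logging is only for illustration):

import { Readable } from 'node:stream'

// Wrap the LangChain stream in an async generator
// to inspect every chunk before it leaves the server
async function* logged(stream) {
  for await (const chunk of stream) {
    process.stdout.write(chunk) // echo the chunk on the server console
    yield chunk
  }
}

// then, in the route handler, replace the last line with:
return reply.send(Readable.from(logged(stream)))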

Start the stream and query Ollama

Now we have an HTTP server (thanks to Fastify), so let's start it:

node index.mjs

# you should get this message:
Server listening at 0.0.0.0:8080
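
The address and the port come from the ADDRESS and PORT environment variables (with 0.0.0.0 and 8080 as defaults), so you can also start the server on another port:

PORT=9090 node index.mjs

# you should get this message:
Server listening at 0.0.0.0:9090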

You can now try a curl request like this one:

curl -H "Content-Type: application/json" http://robby.local:8080/prompt \
-d '{
  "question": "what are structs in Golang?"
}'

robby.local is the DNS name of my Pi; replace it with yours, or with the IP address of your Pi.
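
By the way, curl is not mandatory: since Node.js 18, you can consume the same stream with the built-in fetch API. Here is a minimal client sketch (the file name query.mjs and the hostname are mine; adapt them to your setup):

// query.mjs: a minimal streaming client (a sketch, assumes Node.js 18+)
const response = await fetch('http://robby.local:8080/prompt', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ question: 'what are structs in Golang?' })
})

const decoder = new TextDecoder()
// response.body is a ReadableStream: display the answer chunk by chunk
for await (const chunk of response.body) {
  process.stdout.write(decoder.decode(chunk, { stream: true }))
}

Run it with node query.mjs and the answer is displayed as it is generated.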

And now, wait for the magic: the answer arrives chunk by chunk instead of all at once.

Fastify is a powerful and easy-to-use Node.js framework for developing web application servers. In the next blog post, we will see how to develop a single-page application using this API.

👋 Happy geeking!🤓