
Create your own QA RAG Chatbot with LangChain.js + Azure OpenAI Service

theophilusO
Mar 12, 2025

Use LangChain.js and Azure OpenAI to create an awesome QA RAG web application. In this quick read you will learn how you can leverage Node.js to do some amazing things with AI.

Demo: Mpesa for Business Setup QA RAG Application

In this tutorial we are going to build a question-answering (QA) RAG chat web app. The back end runs on Node.js, the front end is plain HTML, CSS, and JavaScript, and the stack incorporates LangChain.js + Azure OpenAI + a MongoDB vector store (MongoDB Atlas Search index). Get a quick look below.

Note: The documents and illustrations shared here are for demo purposes only; Mpesa is not associated with Microsoft or its products. The content demonstrated here should be used for educational purposes only. Additionally, all views shared here are solely mine.

What you will need:

- Node.js and npm installed
- An Azure subscription
- Azure CLI or Azure Developer CLI
- A MongoDB Atlas account
- A GitHub account (to fork and clone the repository)

Setting Up the Project

To build this project, fork this repository and clone it. GitHub repository link: https://github.com/tiprock-network/azure-qa-rag-mpesa. Follow the steps highlighted in the README.md to set up the project under Setting Up the Node.js Application.

Create the Resources You Need

To do this, you will need Azure CLI or Azure Developer CLI installed on your computer. Go ahead and follow the steps indicated in the README.md to create the Azure resources under Azure Resources Set Up with Azure CLI.

You might want to log in to Azure CLI with a device code instead of the default browser flow. Here's how you can do this: instead of az login, run

az login --use-device-code

Or, if you prefer the Azure Developer CLI, execute this command instead:

azd auth login --use-device-code

Remember to update the .env file with the names you gave your Azure OpenAI instance and model deployments, as well as the API keys you obtained while creating your resources.

Setting Up MongoDB

After signing in to your MongoDB account, get the URI of your database and add it to the .env file, along with your database name and the vector store collection name you specified while creating your index for vector search.
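For reference, here is a sketch of what the finished .env might look like. The Azure variable names come from the code snippets later in this post; the MongoDB variable names and all of the values are hypothetical placeholders, so match them to what the README.md and your own code actually use.

# Azure OpenAI (variable names referenced by the code in this post)
AZURE_OPENAI_API_INSTANCE_NAME=your-openai-instance
AZURE_OPENAI_API_DEPLOYMENT_NAME=gpt-4o
AZURE_OPENAI_API_DEPLOYMENT_EMBEDDING_NAME=text-embedding-ada-002
AZURE_OPENAI_API_VERSION=2024-06-01
BASE_URL=/api/v1

# MongoDB (hypothetical variable names -- check the README.md)
MONGODB_URI=mongodb+srv://<user>:<password>@<cluster>.mongodb.net
MONGODB_DB_NAME=your-database
MONGODB_COLLECTION_NAME=your-vector-collection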

Running the Project

To run this Node.js project, start it with the following command:

npm run dev 

The Vector Store

The vector store used in this project is MongoDB Atlas, where the document embeddings are stored. From the embeddings model instance we created on Azure AI Foundry we are able to create embeddings that can be stored in a vector store. The following code shows our embeddings model instance.

// Create a new embeddings model instance.
// Requires: import { AzureOpenAIEmbeddings } from "@langchain/openai";
const azOpenEmbedding = new AzureOpenAIEmbeddings({
    azureADTokenProvider,
    azureOpenAIApiInstanceName: process.env.AZURE_OPENAI_API_INSTANCE_NAME,
    azureOpenAIApiEmbeddingsDeploymentName: process.env.AZURE_OPENAI_API_DEPLOYMENT_EMBEDDING_NAME,
    azureOpenAIApiVersion: process.env.AZURE_OPENAI_API_VERSION,
    azureOpenAIBasePath: "https://eastus2.api.cognitive.microsoft.com/openai/deployments"
});
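The azureADTokenProvider referenced above is created elsewhere in the project. Here is a minimal sketch of keyless (Microsoft Entra ID) authentication with @azure/identity, plus a quick sanity check of the instance; the sample text is illustrative:

// Keyless auth: exchange an Entra ID credential for a bearer token provider.
import { DefaultAzureCredential, getBearerTokenProvider } from "@azure/identity";

const credential = new DefaultAzureCredential();
const azureADTokenProvider = getBearerTokenProvider(
    credential,
    "https://cognitiveservices.azure.com/.default"
);

// Sanity check: embed a sample string and inspect the vector length.
const vector = await azOpenEmbedding.embedQuery("Mpesa for Business setup");
console.log(vector.length); // e.g. 1536 for text-embedding-ada-002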

 

The code in uploadDoc.js offers a simple way to create embeddings and store them in MongoDB. In this approach, the text from the documents is loaded using the PDFLoader from the LangChain community package and split into chunks by a helper called returnSplittedContent, sketched below, before the embeddings are stored in the vector store.
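A minimal sketch of what returnSplittedContent might look like; the file path, chunk size, and overlap are assumptions:

import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

const returnSplittedContent = async () => {
    // Load the PDF into LangChain Document objects (path is illustrative).
    const loader = new PDFLoader("./docs/mpesa_for_business_setup.pdf");
    const docs = await loader.load();

    // Split into overlapping chunks so each embedding stays within context limits.
    const splitter = new RecursiveCharacterTextSplitter({
        chunkSize: 1000,
        chunkOverlap: 100
    });
    return splitter.splitDocuments(docs);
};

With the documents split, the following function embeds them and stores them in the vector store.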

// Embed the split documents and upload them to the vector store.
// Requires: import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
const storeToCosmosVectorStore = async () => {
    try {
        const documents = await returnSplittedContent()

        // Create the store instance: embeds each document and inserts it
        // into the collection, ready for the "myrag_index" vector index.
        const store = await MongoDBAtlasVectorSearch.fromDocuments(
            documents,
            azOpenEmbedding,
            {
                collection: vectorCollection,
                indexName: "myrag_index",
                textKey: "text",
                embeddingKey: "embedding",
            }
        )

        if (!store) {
            console.log('Something went wrong while creating or getting the store!')
            return false
        }

        console.log('Done creating/getting and uploading to store.')
        return true

    } catch (e) {
        console.log(`This error occurred: ${e}`)
        return false
    }
}

 

In this setup, question answering is achieved by integrating Azure OpenAI's GPT-4o with MongoDB Vector Search through LangChain.js. The system processes user queries via a Large Language Model (LLM), which retrieves relevant information from the vectorized database, ensuring contextual and accurate responses. Azure OpenAI embeddings convert text into dense vector representations, enabling semantic search within MongoDB. The LangChain RunnableSequence structures the retrieval and response generation workflow, while the StringOutputParser ensures proper text formatting. The snippets below explain the major parts of the code: the AzureChatOpenAI instantiation and the API endpoint that handles chat queries through a runnable sequence.
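To make the retrieval step concrete, here is a minimal sketch of how the vector store can be wired into a runnable sequence. It assumes a store like the one created above and the llm instance shown in the next section; the question string is illustrative:

import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence, RunnablePassthrough } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { formatDocumentsAsString } from "langchain/util/document";

// Turn the vector store into a retriever that returns the top 4 matches.
const retriever = store.asRetriever({ k: 4 });

const ragPrompt = ChatPromptTemplate.fromMessages([
    ["system", "Answer the question using only the context below.\n\nContext:\n{context}"],
    ["human", "{question}"]
]);

const ragChain = RunnableSequence.from([
    {
        // Retrieve matching chunks and flatten them into one context string.
        context: retriever.pipe(formatDocumentsAsString),
        question: new RunnablePassthrough()
    },
    ragPrompt,
    llm,
    new StringOutputParser()
]);

const answer = await ragChain.invoke("How do I set up Mpesa for Business?");
console.log(answer);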

 

Azure AI Chat Completion Model

This is the model used for chat completion in this implementation of RAG. Below is a code snippet for it.

// Requires: import { AzureChatOpenAI } from "@langchain/openai";
// azTokenProvider is a bearer token provider created with @azure/identity.
const llm = new AzureChatOpenAI({
    azureADTokenProvider: azTokenProvider,
    azureOpenAIApiInstanceName: process.env.AZURE_OPENAI_API_INSTANCE_NAME,
    azureOpenAIApiDeploymentName: process.env.AZURE_OPENAI_API_DEPLOYMENT_NAME,
    azureOpenAIApiVersion: process.env.AZURE_OPENAI_API_VERSION
})
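A quick way to smoke-test the model before wiring it into a chain (the prompt is illustrative):

const testMsg = await llm.invoke("Say hello in one short sentence.");
console.log(testMsg.content);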

 

Using a RunnableSequence to Stream Chat Output

This shows how a runnable sequence can be used to stream a response, given the particular output parser added onto the chain.

// Stream response
// Requires: import { Readable } from "node:stream";
app.post(`${process.env.BASE_URL}/az-openai/runnable-sequence/stream/chat`, async (req, res) => {
    // Check for a human message
    const { chatMsg } = req.body

    if (!chatMsg) return res.status(400).json({
        message: 'Hey, you didn\'t send anything.'
    })

    // Put the code in an error handler
    try {
        // Create a prompt template
        const prompt = ChatPromptTemplate.fromMessages(
            [
                ["system", `You are a French-to-English translator that detects if a message isn't in French. If it's not, you respond, "This is not French." Otherwise, you translate it to English.`],
                ["human", `${chatMsg}`]
            ]
        )

        // Runnable chain (outPutParser is a StringOutputParser instance defined earlier)
        const chain = RunnableSequence.from([prompt, llm, outPutParser])

        // Chain result: the prompt has no input variables, so pass an empty object
        let result_stream = await chain.stream({})

        // Set response headers
        res.setHeader('Content-Type', 'application/json')
        res.setHeader('Transfer-Encoding', 'chunked')

        // Create a readable stream
        const readable = Readable.from(result_stream)

        res.status(200).write(`{"message": "Successful translation.", "response": "`);

        readable.on('data', (chunk) => {
            // Convert chunk to string and write it (note: chunks are not JSON-escaped here)
            res.write(`${chunk}`);
        });

        readable.on('end', () => {
            // Close the JSON response properly
            res.write('" }');
            res.end();
        });

        readable.on('error', (err) => {
            // Headers are already sent at this point, so just log and end the stream
            console.error("Stream error:", err);
            res.end();
        });
    } catch (e) {
        // Deliver a 500 error response
        return res.status(500).json({
            message: 'Failed to send request.',
            error: e
        })
    }
})

 

To run the front end, open your BASE_URL with the given port in a browser. This lets you run the chatbot above and achieve similar results. The chatbot is plain HTML, CSS, and JavaScript, with JavaScript mainly using the Fetch API to get a response. Thanks for reading. I hope you play around with the code and learn some new things.
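For reference, a minimal sketch of the front-end call, assuming the endpoint from the server code above and a BASE_URL of /api/v1; the message is illustrative:

const res = await fetch('/api/v1/az-openai/runnable-sequence/stream/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ chatMsg: 'Bonjour, comment allez-vous ?' })
});

// Read the chunked response as it streams in.
const reader = res.body.getReader();
const decoder = new TextDecoder();
let text = '';
while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
}
console.log(text); // the JSON string assembled by the server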

 
