Credit to Lore Van Oudenhove

Advanced Chatbot Architecture with Langchain and Pinecone on AWS

A Comprehensive Workflow for Vector Database Integration in AWS

Bhavik Jikadara
9 min read · Sep 24, 2024


In the past few months, there has been a significant increase in the number of tools for creating AI chatbots. From no-code platforms like Voiceflow to custom implementations using Langchain, there are numerous options available. However, while many tutorials cover building chatbots on a local machine, there is often a lack of detailed guidance on creating chatbot applications that are ready for production.

How do you create a chatbot that is both functional and secure, and can be shared with the world? In this article, I will outline the workflow I use to build custom AI chatbots for my clients. I leverage the power of Langchain, Langserve, Pinecone, and AWS.

Introducing Langchain, Langserve, Pinecone, and AWS

The workflow for creating a production-ready AI chatbot involves several key steps, utilizing Langchain, Langserve, Pinecone, and AWS to ensure a robust and scalable solution.

  • Langchain: Used to build the core of the chatbot, integrating sophisticated large language models (LLMs) to create engaging and intelligent conversational experiences.
  • Pinecone: Manages and queries vector embeddings. Pinecone provides a high-performance, scalable vector database that ensures fast and accurate search capabilities, enhancing the chatbot’s responsiveness and intelligence.
  • Langserve: Wraps the chatbot in an API, making it accessible to different platforms and users.
  • AWS: AWS provides the infrastructure for developing and deploying the application. AWS S3 handles data storage, keeping the documents the chatbot relies on secure and scalable. AWS Bedrock gives the chatbot access to powerful large language models (LLMs) for language understanding and generation. Finally, AWS Copilot simplifies deploying the API, streamlining how the chatbot is launched and managed in a production environment.

This comprehensive workflow ensures that your AI chatbot is ready for real-world use, with all components seamlessly integrated for optimal performance.

Step-by-Step Guide

Requirements

To successfully follow and complete this guide, you need:

  • Python installed on your local computer.
  • An AWS account.
  • A Pinecone account.

1. Initiate Langserve Application

First, create an app directory and initiate a new Python virtual environment to install the required dependencies:

# Create directory
mkdir tutorial_langserve
cd tutorial_langserve

# Create a virtual environment
python -m venv env
source env/bin/activate

# Install required packages
pip install -U pip langchain-cli langchain_pinecone langchain-aws \
langchain-community boto3 poetry "unstructured[pdf]"

Standard Python installations include the pip package manager, but the LangServe project uses poetry to manage its dependencies. Because of this, we'll install our dependencies in two stages.

Now that we have Langchain installed, we can initiate a new Langchain project in our current directory. You will be asked “What package would you like to add?” but you can skip this for now.

langchain app new .

Your directory should now look similar to this:

.
├── app/
│   ├── __init__.py
│   ├── __pycache__/
│   │   └── . . .
│   └── server.py
├── Dockerfile
├── packages/
│   └── README.md
├── pyproject.toml
├── README.md
└── env/
    └── . . .

The pyproject.toml file is the primary file that the langchain command and poetry both use to record dependency information and configure project metadata. As this now makes the directory a valid poetry project, we can use poetry to install the remaining dependencies:

poetry add "langserve[all]" python-decouple boto3 langchain \ 
langchain-aws langchain_community "pydantic<2"

Our project directory now has all the dependencies and project files necessary to build our Langserve application.
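
For reference, pyproject.toml now lists everything poetry manages for the project. The excerpt below is purely illustrative; the exact entries and version constraints depend on your LangChain CLI and poetry versions:

[tool.poetry]
name = "tutorial-langserve"
version = "0.1.0"
description = ""

[tool.poetry.dependencies]
python = "^3.11"
langserve = { extras = ["all"], version = "*" }
python-decouple = "*"
boto3 = "*"
langchain = "*"
langchain-aws = "*"
langchain-community = "*"
pydantic = "<2"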

2. Build a Vector Database Using Pinecone

In order to build a production-ready AI chatbot, we will need a vector database that can be accessed by our Langchain application. For this, we will use Pinecone.

First, we will create a new vector database via the Pinecone console. Create a new index with Dimension 1536 (the output dimension of the Amazon Titan embeddings model we will use below); you can leave the other settings at their defaults.
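
If you prefer to create the index from code rather than the console, the Pinecone Python client can do it too. A minimal sketch, assuming a serverless index on AWS in us-east-1 (adjust the name, cloud, and region to your setup):

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="<YOUR PINECONE API KEY>")

# 1536 matches the output dimension of the amazon.titan-embed-text-v1 embeddings model
pc.create_index(
    name="production-ready-tutorial",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)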

Next, we will use this Python script to upload our PDFs stored in the AWS S3 bucket under s3://production-ready-tutorial/database/. Make sure to add your Pinecone API Key and have your AWS credentials set up properly.

import os
from langchain_community.document_loaders import S3DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
import boto3
from langchain_community.embeddings import BedrockEmbeddings
from langchain_pinecone import PineconeVectorStore

os.environ['PINECONE_API_KEY'] = '<YOUR PINECONE API KEY>'

def chunk_data(data):
    ''' Function to split documents into chunks '''
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=100)
    chunks = text_splitter.split_documents(data)
    return chunks

def get_bedrock_client(region):
    ''' Function to create a Bedrock client via boto3 '''
    bedrock_client = boto3.client("bedrock-runtime", region_name=region)
    return bedrock_client

def create_embeddings(region):
    ''' Function to create vector embeddings '''
    bedrock_client = get_bedrock_client(region)
    bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1",
                                           client=bedrock_client)
    return bedrock_embeddings

def load_vectordatabase(chunks, bedrock_embeddings, index_name):
    ''' Function to load the chunks into the vector database '''
    docsearch = PineconeVectorStore.from_documents(chunks, bedrock_embeddings, index_name=index_name)
    return docsearch

def main():
    print("### Load S3 data")
    bucket_name = 'production-ready-tutorial'
    prefix = 'database/'

    loader = S3DirectoryLoader(bucket_name, prefix=prefix)
    data = loader.load()

    print("### Split data into chunks")
    chunks = chunk_data(data)

    print("### Create embeddings model")
    embeddings = create_embeddings(region='us-east-1')

    print("### Load data into vector database")
    index_name = 'production-ready-tutorial'
    load_vectordatabase(chunks, embeddings, index_name)

    print("### Done!")

if __name__ == "__main__":
    main()

3. Create the Langserve Application

To create a basic LangServe application, open the app/server.py file in your text editor. Inside, replace the existing contents with the following code. Make sure to add your Pinecone API Key and have your AWS credentials set up properly.

from fastapi import FastAPI
from fastapi.responses import RedirectResponse
from langserve import add_routes
from langchain.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
import boto3
from langchain_core.output_parsers import StrOutputParser
from langchain_aws import ChatBedrock
from langchain_pinecone import PineconeVectorStore
from langchain_community.embeddings import BedrockEmbeddings

app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="A simple API server using Langchain's Runnable interfaces",
)

@app.get("/")
async def redirect_root_to_docs():
    return RedirectResponse("/docs")

# Setup AWS and Bedrock client
def get_bedrock_client():
    return boto3.client("bedrock-runtime", region_name='us-east-1')

def create_embeddings(client):
    return BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=client)

def create_bedrock_llm(client):
    # Use the full Bedrock model ID, e.g. Claude 3 Sonnet
    return ChatBedrock(model_id='anthropic.claude-3-sonnet-20240229-v1:0', client=client,
                       model_kwargs={'temperature': 0}, region_name='us-east-1')

# Initialize everything
bedrock_client = get_bedrock_client()
bedrock_embeddings = create_embeddings(bedrock_client)
vectorstore = PineconeVectorStore(index_name='production-ready-tutorial', embedding=bedrock_embeddings, pinecone_api_key='<YOUR PINECONE API KEY>')
model = create_bedrock_llm(bedrock_client)

template = '''
Use the following context to answer the question:
{context}

Question: {question}'''

prompt = ChatPromptTemplate.from_template(template)

chain = (
    {"context": vectorstore.as_retriever(search_kwargs={"k": 2}), "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

add_routes(
    app,
    chain,
    path="/knowledge",
)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)

Explanation of the Code:

The code snippet sets up a FastAPI server for a LangChain-based chatbot application, integrating several components crucial for its functionality:

  1. FastAPI Setup: FastAPI is initialized with a title, version, and description, providing a RESTful API framework for handling HTTP requests and responses.
  2. Root Endpoint Redirect: An asynchronous function redirect_root_to_docs() redirects requests to the root endpoint ("/") to the API documentation ("/docs"), facilitating easier navigation and understanding of the API.
  3. AWS and Bedrock Client Setup: Functions get_bedrock_client() and create_embeddings(client) are defined to set up connections to AWS Bedrock, a service for running LLMs. get_bedrock_client() initializes a client for AWS Bedrock, while create_embeddings(client) creates an instance of BedrockEmbeddings using a specified model (amazon.titan-embed-text-v1).
  4. LangChain Components: ChatBedrock from langchain_aws is utilized via create_bedrock_llm(client) to instantiate a Claude 3 chat model (via its Bedrock model ID) tailored for conversational tasks. The model is configured with a temperature of 0 to minimize the variability of its responses.
  5. Vector Store Initialization: PineconeVectorStore from langchain_pinecone is used to load our vector store with index_name 'production-ready-tutorial'. Make sure to add your pinecone_api_key here (see the sketch after this list for loading it from the environment instead of hardcoding it).
  6. Chat Prompt Template: A template is defined using ChatPromptTemplate from langchain.prompts, to define our prompt. It includes placeholders for the context (i.e., the information retrieved from our vector store) and the question ({context} and {question}).
  7. Processing Pipeline (chain): The chain defines the sequence of operations for processing user queries: the retriever pulls the two most relevant chunks from Pinecone, the prompt template combines that context with the user's question, the Bedrock chat model generates an answer, and StrOutputParser converts the model output into a plain string.
  8. Route Addition: add_routes() from langserve integrates the defined processing chain (chain) into the FastAPI application (app) under the /knowledge endpoint, enabling the API to handle knowledge-based queries as per the defined processing pipeline.
  9. Server Execution: Finally, uvicorn.run() starts the FastAPI server (app) on host="0.0.0.0" (accessible from any network interface) and port=8000, making the chatbot application operational and ready to handle incoming API requests.

Now we can test our chatbot app locally by running:

langchain serve

You can now test your chatbot via http://127.0.0.1:8000/knowledge/playground/.

4. Deploy the REST API

Next, we want to deploy our API to make it accessible both locally and over the internet. Using AWS Copilot simplifies this process by leveraging AWS services for scalability and manageability. To get started, initialize your Copilot application with the following command:

copilot init --app production-ready-tutorial \
--name production-ready-tutorial \
--type 'Load Balanced Web Service' \
--dockerfile './Dockerfile' --deploy

Next, you can configure your AWS environment, such as setting it to “dev” during development and switching to “prod” for the production-ready version of your chatbot application.
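
As a sketch of what that can look like with the Copilot CLI (flag names may differ slightly between Copilot versions), you would initialize and deploy a "dev" environment before deploying the service into it:

copilot env init --name dev --profile default --default-config
copilot env deploy --name dev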

The command sets up a new load-balanced web service, uses your Dockerfile to define the service’s environment, and deploys it. Note that this Dockerfile was generated automatically when we initialized the LangServe project.

Next, you need to create a policy file to enable your application to interact with AWS Bedrock. The policy should include parameters for the application, environment, and workload names, and resources that allow actions such as bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream on specific Bedrock models. In the folder copilot/production-ready-tutorial/addons create a file bedrock-policy.yml that defines the necessary permissions.

# bedrock-policy.yml
Parameters:
  App:
    Type: String
    Description: Your application name
  Env:
    Type: String
    Description: The environment name your service, job, or workflow is being deployed to
  Name:
    Type: String
    Description: Your workload name

Resources:
  BedrockAccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: BedrockActions
            Effect: Allow
            Action:
              - bedrock:InvokeModel
              - bedrock:InvokeModelWithResponseStream
            Resource:
              - arn:aws:bedrock:*::foundation-model/anthropic.*
              - arn:aws:bedrock:*::foundation-model/amazon.*

Outputs:
  BedrockAccessPolicyArn:
    Description: "The ARN of the ManagedPolicy to attach to the task role."
    Value: !Ref BedrockAccessPolicy

Upon completing this update, run copilot deploy. Creating the necessary AWS resources will take a couple of minutes. After completion, you will receive your API URL, which you can also retrieve via the AWS Console under EC2 Load Balancers.
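
If you need to look up the URL again later from the command line, Copilot can print the service details, including its route:

copilot svc show --name production-ready-tutorial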

5. Test the Chatbot API

To test your LangChain-powered chatbot API using Postman, follow these steps:

  1. Open Postman: Download and install Postman, then open it.

2. Create a New Request:

  • Click on “New” in the top-left corner.
  • Select “Request” to create a new API request.

3. Set Request Type and URL:

  • Change the request type to POST.
  • Enter your API URL followed by /knowledge/invoke (LangServe exposes the chain’s invocation endpoint under this path).

4. Set Headers:

  • Switch to the “Headers” tab.
  • Add a header with key accept and value application/json.
  • Add another header with key Content-Type and value application/json.

5. Set the Request Body:

  • Go to the “Body” tab in Postman.
  • Select “raw” as the input format.
  • Choose “JSON” from the dropdown menu.

{
  "input": "<YOUR QUESTION>",
  "config": {},
  "kwargs": {}
}

6. Send the Request:

  • Click the “Send” button in Postman to execute the request and observe the response from your LangChain chatbot API.

If all goes well, you should be able to see an answer to your question.
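
If you prefer the command line over Postman, the same request can be sent with curl (substitute your load balancer URL and your question):

curl -X POST "http://<YOUR-API-URL>/knowledge/invoke" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{"input": "<YOUR QUESTION>", "config": {}, "kwargs": {}}'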

Conclusion

Building a production-ready AI chatbot requires more than just creating a functional prototype; it necessitates ensuring scalability, security, and performance. Utilizing Langchain, Langserve, Pinecone, and AWS, this guide provides a robust workflow for developing AI chatbots suited for real-world applications.

  • Langchain and Pinecone enable efficient data management and querying.
  • Langserve makes your chatbot accessible via an API.
  • AWS offers the necessary infrastructure for secure and scalable deployment.
  • AWS Copilot simplifies the deployment process, allowing you to focus on enhancing functionality.

By following this guide, you can develop a chatbot that is not only functional but also ready for production use, providing users with a seamless and intelligent conversational experience.
