Implementing Retrieval-Augmented Generation with LangChain, Pgvector and OpenAI

Adithya Hebbar

Introduction

In the previous blog, we explored how Retrieval-Augmented Generation (RAG) can enhance the capabilities of GPT models. This post takes it a step further by demonstrating how to build a system that creates and stores embeddings from a document set using LangChain and Pgvector, then uses those embeddings to retrieve relevant passages and feed them to OpenAI's GPT for more accurate, contextually relevant responses.


1. The Role of Embeddings in RAG Systems

Embeddings represent text data in a dense numerical format, which helps models capture the semantic meaning of the text. By storing these embeddings in a database like Pgvector, we can efficiently retrieve the most relevant pieces of information and feed them into a model, like GPT, to enhance its ability to answer queries.
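
To make this concrete, here is a minimal sketch (it relies on the libraries and API key set up in the next section, and the two sentences are placeholder examples) that embeds two short texts and compares them with cosine similarity:

from langchain.embeddings.openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment

# Embed two semantically related sentences (placeholder examples)
vec_a, vec_b = embeddings_model.embed_documents([
    "The Eiffel Tower is in Paris.",
    "Paris is home to a famous iron lattice tower.",
])

# Cosine similarity: values closer to 1 mean the texts are closer in meaning
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

print(f"Cosine similarity: {cosine_similarity(vec_a, vec_b):.3f}")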


2. Installation and Setup

Before diving into the code, let’s go over the installation steps to set up the environment with LangChain, Pgvector, and OpenAI.

Step 1: Install the Required Libraries

You’ll need the following Python libraries to get started:

pip install langchain openai pgvector psycopg2-binary
  • langchain: A framework to work with LLMs and build AI applications.
  • openai: The official OpenAI Python client, used under the hood to generate embeddings and chat completions.
  • pgvector: The Python client for the Pgvector Postgres extension, which stores vector embeddings and supports similarity search.
  • psycopg2-binary: A PostgreSQL adapter for Python to handle database interactions.

Step 2: Set up Pgvector in PostgreSQL

To store embeddings in Pgvector, your PostgreSQL instance needs the Pgvector extension installed. Here's how to install and configure it:

  1. Enable the Pgvector extension in your database (the extension itself must be installed on the Postgres server first, e.g. via your system's package manager). You can do this from psql, or from Python as sketched below:
psql -d your_database -c "CREATE EXTENSION IF NOT EXISTS vector;"
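
If you prefer to enable the extension from Python instead of psql, a minimal sketch using psycopg2 (the connection details below are placeholders) could look like this:

import psycopg2

# Connect to the target database (placeholder credentials)
conn = psycopg2.connect("dbname=your_database user=user password=password host=localhost port=5432")
conn.autocommit = True

with conn.cursor() as cur:
    # Enable the extension if it is not already present
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    # Confirm that it is now available
    cur.execute("SELECT extversion FROM pg_extension WHERE extname = 'vector';")
    print("pgvector version:", cur.fetchone()[0])

conn.close()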

Step 3: Set Up OpenAI API Access

You’ll need an OpenAI API key to generate embeddings and interact with GPT models.

  1. Go to the OpenAI API platform to sign up or log in.
  2. Get your API key and store it in an environment variable (you can then read it from your code, as shown below):
export OPENAI_API_KEY="your_openai_api_key"
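
In your Python code, you can then read the key from the environment instead of hard-coding it, for example:

import os

# Read the key from the environment; LangChain's OpenAI integrations also pick it up automatically
openai_api_key = os.environ["OPENAI_API_KEY"]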

With these installations in place, you’re ready to move on to loading, embedding, and querying your documents.


3. Loading the Document Data

from langchain.document_loaders import TextLoader
 
# Load the document from a local file on the device
loader = TextLoader("path_to_your_file.txt", encoding="utf-8")
documents = loader.load()
 
# Optional: Split large documents for better embedding granularity
from langchain.text_splitter import RecursiveCharacterTextSplitter
 
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
texts = text_splitter.split_documents(documents)

Splitting the documents into smaller chunks ensures that each chunk can be embedded individually, improving retrieval accuracy.
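
For example, you can sanity-check the split before embedding anything (page_content and metadata are standard fields on LangChain's Document objects):

# Inspect the chunks before embedding them
print(f"Split {len(documents)} document(s) into {len(texts)} chunks")
print(texts[0].page_content[:200])   # preview the first chunk
print(texts[0].metadata)             # source metadata carried over from the loader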


4. Creating and Storing Embeddings

Once we have our text data, we can generate embeddings using LangChain and store them in Pgvector.

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.pgvector import PGVector
 
# Set up the embeddings model
embeddings_model = OpenAIEmbeddings(openai_api_key="your_openai_key")
 
# Set up the Pgvector database
collection_name = "my_collection"
connection_string = "postgresql+psycopg2://user:password@localhost:5432/mydb"
 
# Embed the chunks and save them to the Pgvector database
pgvector = PGVector.from_documents(
    documents=texts,
    embedding=embeddings_model,
    collection_name=collection_name,
    connection_string=connection_string
)

5. Querying the Database for Relevant Data

Now that the embeddings are stored, we can retrieve the most relevant documents based on a query.

# Retrieve documents based on similarity to the query
query = "Tell me about the Eiffel Tower."
 
retriever = pgvector.as_retriever(search_type="similarity", search_kwargs={"k": 5})
 
# Get the top 5 most relevant documents
relevant_docs = retriever.get_relevant_documents(query)
 
def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])
 
context = format_docs(relevant_docs)
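
If you also want to see how close each match is, the PGVector store exposes a scored variant of the search; a brief sketch (with the default settings the score is a distance, so lower means more similar):

# Similarity search that also returns a distance score per chunk
results = pgvector.similarity_search_with_score(query, k=5)
for doc, score in results:
    print(f"{score:.4f}  {doc.page_content[:80]}")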

6. Generating a Response with OpenAI

With the relevant context retrieved, we now use OpenAI’s GPT model to generate a response.

from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
 
# Prepare the template for the prompt
template = """You are an AI assistant. Here's the context for the question:\n{context}\nNow answer the question below:\n\nQuestion: {question}\nAI: """
prompt = ChatPromptTemplate.from_template(template)
 
# Set up the GPT-4 chat model
model = ChatOpenAI(model_name="gpt-4", openai_api_key="your_openai_key")
 
# Prepare the full chain
chain = prompt | model
 
# Generate the response
response = chain.invoke({
    "context": context,
    "question": query
})
 
print(response.content)
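
As a follow-up, retrieval and generation can be wired into a single chain with the same pipe syntax, so one call handles both steps. A minimal sketch, assuming a LangChain version where retrievers support the runnable interface (import paths may differ slightly between versions):

from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser

# The retriever fills {context} (formatted by format_docs); the raw question passes straight through
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

print(rag_chain.invoke("Tell me about the Eiffel Tower."))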

Conclusion

In this guide, we’ve shown how to set up LangChain with Pgvector to create and store document embeddings, query them for relevant context, and use the context to generate a response using OpenAI’s GPT model. This approach provides a powerful system for generating accurate, contextually informed answers.

