#rag #ai #openai #gpt4

Understanding Retrieval-Augmented Generation (RAG) with OpenAI

Adithya Hebbar

Introduction

In today's AI-driven world, models like GPT-4 are remarkable at generating human-like responses. However, there's a catch: they are limited by the data they've been trained on, and they don't have real-time access to information. That's where Retrieval-Augmented Generation (RAG) comes in, enhancing the abilities of language models by integrating external knowledge retrieval into the generation process.

This post dives into how RAG works and walks through an example using OpenAI's API to create a simple RAG-based system.


What is Retrieval-Augmented Generation (RAG)?

RAG combines the power of language models with information retrieval. In this setup:

  • Retrieval: Instead of relying solely on the pre-trained knowledge in the model, RAG queries external sources, such as databases, documents, or web APIs, to retrieve relevant information based on the user's input.
  • Generation: The retrieved information is then passed to the language model, which uses this additional context to generate a more accurate and relevant response.

This process is beneficial when the language model needs to provide up-to-date or domain-specific information that wasn’t available during its training.


Why Use RAG?

  1. Overcome Training Limitations: GPT models, while powerful, are only aware of data up to a specific point in time. With RAG, you can query the latest information from databases, APIs, or indexed documents, making responses more relevant.

  2. Accurate & Relevant Responses: With retrieval in place, the language model has access to current and precise data, which improves the relevance of its responses in knowledge-intensive domains.

  3. Improved Scalability: Instead of fine-tuning a model for every new knowledge domain, RAG allows the model to dynamically retrieve knowledge, making it scalable across different applications.


How RAG Works

RAG systems typically operate in two phases:

  1. Retrieval Phase: The user’s query is used to retrieve relevant documents or snippets from a knowledge base or document store.

  2. Generation Phase: The retrieved documents are then used as additional input to the language model (like GPT-4), helping it generate a more accurate and contextually rich response.
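Before diving into the concrete example below, the two phases can be sketched as a tiny pipeline. The `retriever` and `generator` arguments here are stand-ins for whatever search backend and model you plug in, not part of any real API:

```python
def rag_answer(query, retriever, generator):
    docs = retriever(query)        # Phase 1: retrieve relevant context
    return generator(query, docs)  # Phase 2: generate a grounded answer

# Stand-in components, just to show the flow end to end
fake_retriever = lambda q: ["Next.js 15 introduces the @next/codemod CLI."]
fake_generator = lambda q, docs: f"Based on {len(docs)} document(s): {docs[0]}"

print(rag_answer("What's new in Next.js 15?", fake_retriever, fake_generator))
```

The rest of this post fills in each stand-in with a working implementation.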


RAG with OpenAI: A Simple Example

To illustrate how RAG can be implemented with OpenAI’s API, let's walk through an example where we:

  • Retrieve documents from an external source (a Wikipedia-like database) based on a user query.
  • Use the retrieved documents to generate a more informed response using GPT-4.

1. Setting up the Retrieval Mechanism

First, let’s create a simple retrieval system. We'll use a mock database of articles and search for the most relevant ones based on the user’s input.

# Sample article database (could be a real database or indexed documents)
articles = {
    1: "Venom: Last Dance, released in October 2024, is the final installment of the Venom trilogy.",
    2: "Gather AI, a Slack bot developed by Codemancers, introduces a new feature for creating mindmaps.",
    3: "Next.js 15 introduces the @next/codemod CLI for easily upgrading to the latest Next.js and React versions."
}
 
# Function to simulate document retrieval
def retrieve_documents(query):
    query_words = query.lower().split()
 
    relevant_docs = []
 
    # In the real world you'd use a proper search index; here we just
    # match query words against the article text for simplicity
    for doc_id, content in articles.items():
        content_lower = content.lower()
        if any(word in content_lower for word in query_words):
            relevant_docs.append(content)
 
    return relevant_docs
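One easy improvement on the naive word match above is to drop common stopwords and rank documents by how many query words they contain, so the best match comes first. A minimal, self-contained sketch (the stopword list and `top_k` parameter are illustrative choices, not a standard):

```python
articles = {
    1: "Venom: Last Dance, released in October 2024, is the final installment of the Venom trilogy.",
    2: "Gather AI, a Slack bot developed by Codemancers, introduces a new feature for creating mindmaps.",
    3: "Next.js 15 introduces the @next/codemod CLI for easily upgrading to the latest Next.js and React versions.",
}

STOPWORDS = {"the", "a", "an", "with", "what's", "whats", "is", "for", "of"}

def retrieve_ranked(query, top_k=2):
    # Strip punctuation, lowercase, and discard stopwords
    words = [w.strip("?.,!").lower() for w in query.split()]
    words = [w for w in words if w and w not in STOPWORDS]

    # Score each document by how many query words it contains
    scored = []
    for content in articles.values():
        content_lower = content.lower()
        score = sum(word in content_lower for word in words)
        if score:
            scored.append((score, content))

    # Highest-scoring documents first
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [content for _, content in scored[:top_k]]
```

With this version, "What's new with Next.js 15?" ranks the Next.js article above the Gather AI article (which only matches the word "new").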

2. Generate a Response using GPT-4 and Retrieved Data

Once we retrieve the relevant documents, we pass them into the GPT-4 model to generate a more informative response. Here’s how we can do that using OpenAI’s API:

import os
 
from openai import OpenAI
 
# Read the key from the environment rather than hard-coding it
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
 
def generate_response(query, documents):
    # Combine retrieved documents with the original query
    context = "\n\n".join(documents)
    prompt = f"Answer the following question based on the context provided:\n\n{context}\n\nQuestion: {query}"
 
    messages = [
        {"role": "user", "content": prompt}
    ]
 
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages
    )
 
    return response.choices[0].message.content
 
 
user_query = "What's new with Next.js 15?"
retrieved_docs = retrieve_documents(user_query)
 
if retrieved_docs:
    response = generate_response(user_query, retrieved_docs)
    print(response)
else:
    print("No relevant documents found.")
 
# Example response: Next.js 15 introduces the @next/codemod CLI for easily upgrading to the latest Next.js and React versions.
 

In this example:

  1. The retrieve_documents() function searches through a small set of articles and retrieves those relevant to the user’s query.
  2. The retrieved documents are passed to GPT-4 in the form of a prompt. The model then generates a response that is contextually informed by the documents.

3. Validate Response

For the query "What's new with Next.js 15?", the system retrieves the document about Next.js 15 from the database and passes it to GPT-4. The response generated would be more detailed and contextually accurate than what GPT-4 might produce without the document.


Expanding RAG: Next Steps

While this example is simple, you can expand it in several ways:

  1. Advanced Retrieval: Implement advanced retrieval techniques using libraries like Faiss (for embeddings search) or ElasticSearch to search large document stores efficiently.

  2. Real-time Data: Integrate live APIs or real-time databases (such as Wolfram Alpha, News APIs, or product databases) to retrieve the latest information.

  3. Better Integration: In production, combining RAG with tools like LangChain can help manage retrievals and facilitate seamless query-to-retrieval workflows.
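To give a flavor of the embeddings-based retrieval mentioned in step 1, the sketch below ranks documents by cosine similarity between vectors. The 3-dimensional vectors here are toy placeholders; in practice you would obtain real embeddings from a model (for example via OpenAI's `client.embeddings.create(model="text-embedding-3-small", input=text)`) and use a library like Faiss to search them at scale:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot product over norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dimensional "embeddings" standing in for real model output
doc_vectors = {
    "venom-article":  [0.9, 0.1, 0.0],
    "gather-article": [0.1, 0.8, 0.2],
    "nextjs-article": [0.0, 0.2, 0.9],
}

def retrieve_by_embedding(query_vector, top_k=1):
    # Rank documents by similarity to the query vector, best first
    ranked = sorted(
        doc_vectors.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:top_k]]

print(retrieve_by_embedding([0.05, 0.1, 0.95]))  # ['nextjs-article']
```

Unlike keyword matching, this approach can surface documents that share meaning with the query even when they share no exact words.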


Conclusion

This guide has explored the concept of Retrieval-Augmented Generation (RAG) and demonstrated how to enhance the capabilities of language models like GPT-4 by integrating external knowledge sources. By combining document retrieval with advanced text generation, you can provide more accurate and contextually relevant responses. Whether you're building chatbots, content generators, or question-answering systems, RAG offers a powerful approach to improving AI-driven interactions. Happy implementing!

