RAG
The Evolution of AI Text Generation
Retrieval-Augmented Generation (RAG) has emerged as a transformative approach in AI, addressing key limitations of traditional large language models (LLMs). By combining the generative capabilities of LLMs with dynamic access to external knowledge sources, RAG systems deliver more accurate, contextual, and up-to-date responses. This advancement has particular significance in domains requiring specialized knowledge or real-time information, from technical documentation to financial analysis.
Understanding Basic RAG Architecture
At its core, RAG operates through a three-stage process:
1. Document vectorization: Converting chunked documents into embeddings stored in a vector database
2. Similarity-based retrieval: Finding the document vectors most similar to the query vector, using metrics such as cosine similarity (see the sketch after this list)
3. Context-enhanced generation: Combining the original query with the retrieved content to generate a more informed response
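To make the retrieval stage concrete, here is a minimal sketch of cosine-similarity retrieval over raw NumPy vectors. The cosine_similarity and retrieve helpers are illustrative names, not from any particular library.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: dot product of the vectors divided by the product of their norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, doc_vecs: list[np.ndarray], k: int = 3) -> list[int]:
    # Return the indices of the k document vectors most similar to the query.
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]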
The effectiveness of a RAG system hinges on both retrieval accuracy and generation quality. While traditional metrics like precision, recall, and F1 score evaluate retrieval performance, techniques such as LLM-based evaluation and similarity scoring assess the quality of the generated responses. In this document, we focus on how to create a vector DB from a given dataset and how to evaluate its retrieval performance.
Document vectorization
You use an embedding model to vectorize a document. If a document is larger than the maximum context length supported by the embedding model, you should first split it into smaller pieces using one of various chunking techniques.
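As a simple illustration, the following sketch implements fixed-size chunking with overlap; the window and overlap sizes are illustrative assumptions, not values this guide prescribes.

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    # Slide a fixed-size character window across the text with some overlap,
    # so content cut at a boundary still appears intact in one of the chunks.
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks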
If you have many documents or document chunks, it is more efficient to run them through the embedding model as a batch. You can use the Parasail Batch API to achieve this. The following notebook code shows the process.
We use the Anthropic documentation dataset for this experiment:
import json

# Load the pre-chunked Anthropic documentation dataset.
with open("datasets/anthropic_docs.json", "r") as f:
    dataset = json.load(f)

print(f"Number of documents: {len(dataset)}")
print(f"Number of chunks in the first document: {len(dataset[0]['chunks'])}")
print(f"Sample chunk:\n{dataset[0]['chunks'][15]['chunk_content'][:100]}")
Output:
We define a function that converts a dataset into a vector DB.
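The sketch below is one possible implementation. It builds a simple in-memory store with NumPy rather than a dedicated vector database, and it relies on an embed_texts helper that is sketched in the next section; the function name and metadata fields are illustrative assumptions.

import numpy as np

def create_vector_db(dataset: list[dict], model: str) -> dict:
    # Collect every chunk in the dataset along with where it came from.
    texts, metadata = [], []
    for doc_idx, doc in enumerate(dataset):
        for chunk_idx, chunk in enumerate(doc["chunks"]):
            texts.append(chunk["chunk_content"])
            metadata.append({"doc_index": doc_idx, "chunk_index": chunk_idx})
    # Embed all chunks in one batch (embed_texts is defined in the next section).
    embeddings = embed_texts(texts, model)
    return {"embeddings": np.array(embeddings), "metadata": metadata, "texts": texts}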
We now define a function that computes embeddings for a list of texts using the Parasail Batch API.
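Here is a sketch of what such a function can look like, assuming Parasail's Batch API follows the OpenAI batch format (a JSONL request file, the /v1/embeddings endpoint, and a polled batch job). The base URL, polling interval, and error handling are assumptions; check the Parasail documentation for the exact values.

import json
import time
from openai import OpenAI

# Assumption: an OpenAI-compatible client pointed at Parasail; verify the
# base URL against the Parasail documentation.
client = OpenAI(base_url="https://api.parasail.io/v1", api_key="YOUR_PARASAIL_API_KEY")

def embed_texts(texts: list[str], model: str) -> list[list[float]]:
    # One JSONL request per text, keyed by custom_id so we can restore input order.
    lines = [
        json.dumps({
            "custom_id": str(i),
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        })
        for i, text in enumerate(texts)
    ]
    batch_file = client.files.create(
        file=("embedding_requests.jsonl", "\n".join(lines).encode()),
        purpose="batch",
    )
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/embeddings",
        completion_window="24h",
    )
    # Poll until the batch job reaches a terminal state.
    while batch.status not in ("completed", "failed", "expired", "cancelled"):
        time.sleep(30)
        batch = client.batches.retrieve(batch.id)
    if batch.status != "completed":
        raise RuntimeError(f"Batch ended with status: {batch.status}")
    # Results may arrive out of order; map them back by custom_id.
    embeddings = {}
    for line in client.files.content(batch.output_file_id).text.splitlines():
        record = json.loads(line)
        embeddings[int(record["custom_id"])] = record["response"]["body"]["data"][0]["embedding"]
    return [embeddings[i] for i in range(len(texts))]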
We can now create a vector DB from our dataset using an open-source embedding model.
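For example, with an illustrative model name (an assumption; use whichever embedding model your account has access to):

vector_db = create_vector_db(dataset, model="BAAI/bge-large-en-v1.5")  # model name is illustrative
print(f"Number of vectors in the DB: {len(vector_db['metadata'])}")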
Output:
Evaluating the retrieval performance of vector DBs
Now we want to measure the quality of our vector DB using standard retrieval metrics.
We use an evaluation dataset that consists of 100 query/answer pairs. The chunks that are relevant to each query are listed along with the answer.
Let's look at one of the queries and its relevant chunks.
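A sketch of how that might look; the file path and the field names (query, answer, golden_chunks) are assumptions about the evaluation dataset's schema.

# Assumed path and schema for the evaluation dataset.
with open("datasets/anthropic_docs_eval.json", "r") as f:
    eval_dataset = json.load(f)

example = eval_dataset[0]
print(f"Query: {example['query']}")
print(f"Answer: {example['answer']}")
print(f"Relevant chunks: {example['golden_chunks']}")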
We now define a function that takes the evaluation dataset and the vector DB as arguments, and returns a number of retrieval metrics. We first retrieve the most similar chunks for each query using the vector DB, and then compare the retrieved chunks with the golden chunks listed in the evaluation dataset.
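A minimal sketch of such a function, reusing the retrieve helper and the assumed dataset schema from the sketches above; it reports precision@k, recall@k, and their F1.

import numpy as np

def evaluate_retrieval(eval_dataset: list[dict], vector_db: dict, model: str, k: int = 5) -> dict:
    # Embed all queries in one batch, then score each against its golden chunks.
    query_vecs = embed_texts([ex["query"] for ex in eval_dataset], model)
    precisions, recalls = [], []
    for example, query_vec in zip(eval_dataset, query_vecs):
        top_idx = retrieve(np.array(query_vec), list(vector_db["embeddings"]), k=k)
        retrieved = {
            (vector_db["metadata"][i]["doc_index"], vector_db["metadata"][i]["chunk_index"])
            for i in top_idx
        }
        golden = {(c["doc_index"], c["chunk_index"]) for c in example["golden_chunks"]}
        hits = len(retrieved & golden)
        precisions.append(hits / k)
        recalls.append(hits / len(golden) if golden else 0.0)
    # Macro-average over queries, then combine into F1.
    precision = float(np.mean(precisions))
    recall = float(np.mean(recalls))
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision@k": precision, "recall@k": recall, "f1": f1}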
Let's now evaluate the vector DB.
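Continuing with the same illustrative model name as before:

metrics = evaluate_retrieval(eval_dataset, vector_db, model="BAAI/bge-large-en-v1.5", k=5)
print(metrics)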