Rig the RAG

8 min read · May 28, 2024

Unleashing the Power of LLMs with Naive Retrieval-Augmented-Generation!

Photo by Yannis Papanastasopoulos on Unsplash

This article is a playful introduction to vector databases and RAG, and to how they can improve the performance of AI applications. It is a fun blog, designed with AI assistance, and it ends with a simple hands-on RAG example for you to play around with!

Why should tech learning be boring? ;P

Welcome to the ultimate rock concert of AI! In this electrifying event, we’ll dive into the world of Vector Databases and Retrieval-Augmented Generation (RAG), using a fun and memorable rock concert metaphor to make learning as exciting as a front-row seat at your favourite show!

Backstage crew: Vector Database

Trusty Guitarist: RAG

Lead singer: the Language Model (LLM)

Rhythm of the band — Embeddings

Let’s start our journey with embeddings, the rhythm section of our band, providing the backbone for the music. Just like how different musical notes contribute to the overall sound, embeddings capture the unique characteristics of each piece of data.

In technical terms, embeddings are dense vector representations of words or other items in a dataset. They encode semantic and syntactic information, allowing AI models to understand relationships between words and concepts. For example, in a song lyric dataset, embeddings could capture similarities between words like “love” and “heartbreak,” enabling the AI model to generate emotionally resonant lyrics.
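To make "capturing similarities" a little more concrete, here is a minimal sketch of how closeness between two embeddings is often measured with cosine similarity. The three-dimensional vectors below are made up purely for illustration; real embeddings come from a trained model and have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy "embeddings" (assumption: not produced by any real model).
toy_embeddings = {
    "love": [0.9, 0.8, 0.1],
    "heartbreak": [0.8, 0.9, 0.2],
    "amplifier": [0.1, 0.2, 0.9],
}

love_vs_heartbreak = cosine_similarity(toy_embeddings["love"], toy_embeddings["heartbreak"])
love_vs_amplifier = cosine_similarity(toy_embeddings["love"], toy_embeddings["amplifier"])

print(f"love vs heartbreak: {love_vs_heartbreak:.3f}")
print(f"love vs amplifier:  {love_vs_amplifier:.3f}")
```

With these toy numbers, "love" lands much closer to "heartbreak" than to "amplifier", which is the kind of relationship a real embedding model learns from data.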

Setting the Stage: Vector Databases

https://www.kdnuggets.com/2023/06/vector-databases-important-llms.html

Imagine our Vector Database as the backstage area of our concert venue. Just like a backstage crew organizes instruments and equipment, a Vector Database organizes data in a way that AI models can easily access and utilize. Each piece of data, whether it’s a song lyric or a guitar riff, is represented as a vector, a unique set of numbers capturing its essence.

Vector Database Breakdown:

Vectors: Think of vectors as the superstar notes. Each note (vector) represents a piece of data, like a document or an image. Just like a musical note captures a particular sound, a vector captures the essence of data in numerical form.
Dimensionality: The number of strings on your guitar. More dimensions mean more detail, allowing for more nuanced representations of data. A higher dimensional vector can capture more complex relationships within the data.
Similarity Search: Finding the most similar notes to create a hit song. This is how we fetch relevant information. In practice, this means using mathematical operations to find vectors that are close to each other in the vector space, indicating similar content.
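The similarity search described above can be sketched in a few lines of plain Python. This is an exhaustive (brute-force) search using squared Euclidean distance, which is the same idea a flat index such as FAISS's IndexFlatL2 applies at scale. The vectors and the helper names (squared_l2, search) are invented for this example.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(vectors, query, k):
    """Return the k (distance, index) pairs closest to the query, nearest first."""
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    return scored[:k]

# Made-up 2-dimensional "notes" (assumption: toy data, not real embeddings).
notes = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2], [5.0, 5.0]]
query = [1.0, 1.1]

top_two = search(notes, query, k=2)
for distance, idx in top_two:
    print(f"note {idx} at squared distance {distance:.2f}")
```

Comparing the query against every stored vector is fine for a handful of notes; dedicated vector databases exist because this gets slow when there are millions of them.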

Rocking with RAG: Unleashing the Lead Singer’s Potential

Author: Pier Paolo Ippolito https://kili-technology.com/large-language-models-llms

Now, let’s introduce our lead singer: the Language Model (LLM). Like any rockstar, our LLM has a vast repertoire of songs (data) stored in its memory. But sometimes even the best rockstars are asked to sing songs from another rockstar’s album. That’s where Retrieval-Augmented Generation (RAG) comes in!

Like the famous concert where Coldplay sang ‘Vande Mataram’ with A. R. Rahman! 🤩

RAG acts as our band’s trusty guitarist, ready to support at a moment’s notice. When the lead singer (LLM) needs to sing another rockstar’s song (customer-specific context), the guitarist (RAG) steps in, retrieving the relevant information from the Vector Database with lightning speed. Together, they create a seamless performance that wows the crowd.

How Does RAG Work?

https://kx.com/blog/implementing-rag-with-kdb-ai-and-langchain/

Retriever: The retriever dives into the vector database and fetches the most relevant chunks of information.
Generator: The LLM takes the query together with the chunks the retriever fetched and produces a more accurate, contextually relevant response.
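The retriever and generator roles can be sketched as two small functions. This is only an illustration under loud assumptions: the retriever here ranks documents by shared words instead of querying a vector database, and generate is a placeholder standing in for a real LLM call. All function names and documents are hypothetical.

```python
def retrieve(query, documents, k=2):
    """Toy retriever: rank documents by how many words they share with the query."""
    query_words = set(query.lower().split())

    def overlap(doc):
        return len(query_words & set(doc.lower().split()))

    return sorted(documents, key=overlap, reverse=True)[:k]

def build_prompt(query, context_chunks):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

def generate(prompt):
    """Placeholder for an LLM call; a real system would send the prompt to a model."""
    return f"[LLM response to a prompt of {len(prompt)} characters]"

# Made-up documents for illustration.
documents = [
    "The band plays its encore at 10 pm",
    "Tickets are refundable until the day before the show",
    "The guitarist uses a vintage amplifier",
]
query = "When does the band play the encore"

chunks = retrieve(query, documents, k=2)
prompt = build_prompt(query, chunks)
answer = generate(prompt)
print(prompt)
print(answer)
```

The key design point is that the generator never searches anything itself: it only sees the query plus whatever the retriever hands over.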

Different Ways to Rock RAG

RAG can perform in different ways, and there are several tricks for tightening up the performance. Let’s check out the band’s setlist:

RAG-Token: The generator can draw on a different retrieved document for each token it produces, so a single answer can blend several sources. It’s like the guitarist switching songbooks from one word to the next, so the singer always has the right line.
RAG-Sequence: The same retrieved document backs the entire output sequence; the model scores the whole answer against each retrieved document and then combines those scores. Here, the guitarist hands over one songbook and the singer performs the whole song from it.
Re-ranking the Retrieved Documents: Imagine your band has a list of songs (documents) it could play, but you want to ensure the best ones are played first. Re-ranking helps prioritize the most relevant documents.
Ensemble Methods: Sometimes, your lead singer needs backup vocalists to harmonize and enhance the overall performance. Ensemble methods combine multiple models to improve accuracy.
Fine-Tuning: Regular practice sessions help the band perfect their performance. Fine-tuning the model on specific datasets ensures more accurate and relevant responses.
Contextual Awareness: The rhythm section (bass and drums) ensures the band stays in sync. In RAG, contextual awareness ensures responses are coherent and relevant.
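As a rough sketch of the re-ranking idea from the setlist above: a first-stage search returns candidates, and a second scoring pass re-orders them before they reach the lead singer. The scoring function below (word_overlap_ratio) is a hypothetical stand-in; real systems often use a stronger model for this second pass.

```python
def rerank(query, candidates, score_fn, top_n=2):
    """Re-order first-stage candidates by a second score and keep the best top_n."""
    return sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)[:top_n]

def word_overlap_ratio(query, doc):
    """Hypothetical re-ranking score: fraction of query words found in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

# Pretend these came back from a vector search, in first-stage order (made-up data).
candidates = [
    "Concert tickets go on sale Friday",
    "The encore song list for the concert",
    "Guitar strings and how to change them",
]

reranked = rerank("concert encore song", candidates, word_overlap_ratio)
for position, doc in enumerate(reranked, start=1):
    print(f"{position}. {doc}")
```

Because the score function is passed in as an argument, swapping in a better scorer later does not require touching the rest of the pipeline.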

Setting Up Your Own RAG Concert

Time to get practical!

Let’s see some simple code to set up RAG with a vector database. We’ll use Python, the sentence-transformers library (built on Hugging Face’s transformers), and FAISS for our vector database.

Step 1: Clone the Repository

git clone https://github.com/NiteeShah/ai.git

Step 2: Install dependencies

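Follow README.md in the repository for a step-by-step guide to the simple RAG implementation:

https://github.com/NiteeShah/ai

pip install torch transformers faiss-cpu numpy datasets sentence-transformers

The command above installs the CPU build of FAISS. On a machine with a CUDA GPU, install faiss-gpu instead of faiss-cpu. Install only one of the two, since both packages provide the same faiss module.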

Step 3: Creating Embeddings and Vector Database

This code snippet demonstrates how to utilize FAISS (Facebook AI Similarity Search) and Sentence Transformers for efficient text similarity searches. Firstly, we load a pre-trained sentence transformer model to convert textual data into embeddings.

Next, we define our custom data as a list of text strings, which serves as the input for generating embeddings using the loaded model. These embeddings are then added to a FAISS index, a data structure optimized for similarity searches.

By creating this index, we enable quick retrieval of similar text data. Finally, we save the FAISS index to a file, allowing us to reuse it for future similarity searches without recomputation. This approach enhances the performance of AI applications requiring rapid and accurate text retrieval based on semantic similarity.

# python3 sentence_embeddings.py

import faiss
from sentence_transformers import SentenceTransformer
# Load pre-trained sentence transformer model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
# Define your custom data as a list of text strings
sentences = ["This is an example sentence", "Each sentence is converted"]

# Convert sentences to embeddings
embeddings = model.encode(sentences)
print(embeddings)

# Create a FAISS index for efficient similarity search
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Save the index for future use
faiss.write_index(index, "my_vector_database.faiss")
print("Vector database created and saved!")

Check the output to see what the embeddings look like:

[[ 6.76569194e-02  6.34959713e-02  4.87131476e-02  7.93049783e-02
   3.74480933e-02  2.65277131e-03  3.93749513e-02 -7.09846430e-03
   5.93614057e-02  3.15369777e-02  6.00981042e-02 -5.29052615e-02
   4.06067781e-02 -2.59308387e-02  2.98428647e-02  1.12688600e-03
   7.35148042e-02 -5.03817983e-02 -1.22386612e-01  2.37028562e-02
   2.97265332e-02  4.24768738e-02  2.56337579e-02  1.99516211e-03
  -5.69190793e-02 -2.71598026e-02 -3.29035036e-02  6.60248846e-02
   1.19007230e-01 -4.58791181e-02 -7.26214051e-02 -3.25840116e-02
   5.23413308e-02  4.50553037e-02  8.25298298e-03  3.67024504e-02
  -1.39415674e-02  6.53918684e-02 -2.64271945e-02  2.06370896e-04
  -1.36643145e-02 -3.62810642e-02 -1.95043720e-02 -2.89738011e-02
   3.94270122e-02 -8.84091184e-02  2.62427772e-03  1.36713730e-02
   4.83062901e-02 -3.11566498e-02 -1.17329173e-01 -5.11690713e-02
  -8.85288343e-02 -2.18963325e-02  1.42986095e-02  4.44167666e-02
  -1.34815648e-02  7.43392259e-02  2.66382862e-02 -1.98763069e-02

Step 4: Perform RAG on custom Data

In this code, we implement the retrieval half of Retrieval-Augmented Generation (RAG), combining Sentence Transformers for generating embeddings with FAISS for efficient similarity search.

First, we create a FAISS index containing the embeddings of a set of sentences. When a query sentence is provided, we use FAISS to retrieve the top k most similar sentences from the index.

These retrieved sentences serve as context for text generation. In a full RAG pipeline, a language model would then generate a response from both the query and the retrieved context; this script stops at retrieval and reports the best match directly.

This approach grounds the final response in relevant information retrieved from the FAISS index rather than in the query alone, which leads to more coherent and contextually relevant outputs.

# python3 naive_rag_from_text.py

import faiss
from sentence_transformers import SentenceTransformer

# Load pre-trained sentence transformer model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Define your custom data as a list of text strings
sentences = ["India is worlds best place", "Rome is also decent", "cinque terre is wonderful", "Kerela is Gods own country", "Moon is not on earth", "Europe is a must visit"]

# Convert sentences to embeddings
embeddings = model.encode(sentences)

# Create a FAISS index for efficient similarity search
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Save the index for future use
faiss.write_index(index, "my_vector_database.faiss")
print("Vector database created and saved!")

# Load the FAISS index from disk
index = faiss.read_index("my_vector_database.faiss")

# Define your query sentence
query_sentence = "which is worlds best place"

# Convert the query sentence to an embedding
query_embedding = model.encode([query_sentence])

# Perform a similarity search using FAISS
k = 5  # Retrieve the top k most similar sentences
distances, similar_indices = index.search(query_embedding, k)


# Prepare the response incorporating the retrieved similar sentences
response = f"The query sentence is: '{query_sentence}'.\n\n"
response += "Top similar sentences:\n"

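# Print the retrieved sentences and their scores.
# Note: IndexFlatL2 returns squared L2 distances, so "1 - distance" is only a
# rough score, not a true similarity; it turns negative once distance exceeds 1.
for i, distance in enumerate(distances.ravel()):
    response += f"{i+1}. (Similarity: {1-distance:.4f}): {sentences[similar_indices[0][i]]}\n"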

print(response)


best_match = sentences[similar_indices[0][0]]
best_match_score = 1 - distances[0][0]  # Rough score derived from the squared L2 distance
response = f"The best match for your query '{query_sentence}' is:\n'{best_match}' with a similarity score of {best_match_score:.4f}."

print(response)

Check out the similarity results in the response:

Vector database created and saved!
The query sentence is: 'which is worlds best place'.

Top similar sentences:
1. (Similarity: 0.5098): India is worlds best place
2. (Similarity: -0.1724): Europe is a must visit
3. (Similarity: -0.2183): Rome is also decent
4. (Similarity: -0.4220): cinque terre is wonderful
5. (Similarity: -0.5599): Kerela is Gods own country

The best match for your query 'which is worlds best place' is:
'India is worlds best place' with a similarity score of 0.5098.
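A note on those negative numbers: the script reports 1 - distance, and IndexFlatL2 returns squared Euclidean distances, which can exceed 1. If the embeddings are unit-length (an assumption worth verifying for your model), the squared distance d relates to cosine similarity by d = 2 - 2·cos, so cos = 1 - d/2. The sketch below applies that conversion to the scores printed above; the helper name cosine_from_squared_l2 is made up for this example.

```python
def cosine_from_squared_l2(squared_distance):
    """For unit-length vectors, ||a - b||^2 = 2 - 2*cos, so cos = 1 - d/2."""
    return 1.0 - squared_distance / 2.0

# Scores copied from the output above; each one was printed as 1 - distance.
printed_scores = [0.5098, -0.1724, -0.2183, -0.4220, -0.5599]

# Recover the squared L2 distances, then convert them to cosine similarities.
squared_distances = [1.0 - score for score in printed_scores]
cosines = [cosine_from_squared_l2(d) for d in squared_distances]

for score, cosine in zip(printed_scores, cosines):
    print(f"printed score {score:+.4f} -> cosine similarity {cosine:.3f}")
```

The ranking stays exactly the same; only the scale changes, so every value now sits in the familiar range where 1 means "same direction" and 0 means "unrelated".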

Wrapping Up the Concert

And there you have it! A rock-solid introduction to vector databases and RAG. By combining the precision of vector databases with the generative power of language models, you can build systems that not only remember the lyrics but also generate new hit songs on the fly.

Here’s a final easy diagram to summarize the RAG workflow:

https://refactoring.fm/p/meeting-buffers-naming-files-and

Remember, the AI band is only as good as its training and the data you feed it. So keep experimenting, keep rocking, and most importantly, keep having fun with AI!

Please like and share if it was a good read. :D