
Build Retrieval-Augmented Generation (RAG) With Milvus

Learn to manage hallucinations by building RAGs with Milvus. Developers can embed similarity searches and use unstructured data for LLMs.

By Ifeanyi Benny Iheagwara · Nov. 05, 24 · Tutorial

It's no secret that traditional large language models (LLMs) often hallucinate — generate incorrect or nonsensical information — when asked knowledge-intensive questions that require up-to-date information or business and domain knowledge. This limitation exists primarily because most LLMs are trained on publicly available information, not your organization's internal knowledge base or proprietary custom data. This is where retrieval-augmented generation (RAG), an approach introduced by Meta AI researchers, comes in.

RAG addresses an LLM's limitation of over-relying on pre-trained data for output generation by combining parametric memory with non-parametric memory through vector-based information retrieval techniques. Depending on the scale, this vector-based information retrieval technique often works with vector databases to enable fast, personalized, and accurate similarity searches. In this guide, you'll learn how to build a retrieval-augmented generation (RAG) with Milvus.

What Is RAG?

RAG stands for retrieval-augmented generation, a cost-effective technique for optimizing an LLM's output so it can generate responses grounded in information outside its training data, without retraining the model.

This is important because LLMs are constrained by the cut-off date of their training data, which can lead to unpredictable, noncontextual, or inaccurate responses. RAG addresses this by integrating vector-based information retrieval, so the model can draw on current information at query time.
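To make the idea concrete, here is a minimal, self-contained sketch of the RAG loop: retrieve relevant context, then augment the prompt before generation. The toy keyword-overlap retriever and sample documents are illustrative stand-ins for the embedding search and Milvus collection built later in this guide.

```python
# Minimal sketch of the RAG loop (toy retriever, no external services).
# The scoring function and document list are illustrative stand-ins for
# the embedding search and Milvus collection built later in this guide.

docs = [
    "Milvus is an open-source vector database for unstructured data.",
    "RAG combines retrieval with generation to ground LLM answers.",
    "Paris is the capital of France.",
]

def score(query: str, doc: str) -> int:
    """Toy relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring documents for the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Augment the query with retrieved context before generation."""
    context = "\n".join(retrieve(query))
    return f"<context>\n{context}\n</context>\n<question>{query}</question>"

print(build_prompt("What is a vector database like Milvus?"))
```

In the real pipeline built below, the scoring and retrieval steps are replaced by embedding similarity search against Milvus, and the augmented prompt is sent to a chat model.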

What Is Milvus?

Milvus is an open-source, high-performance vector database specially designed to manage and retrieve unstructured data through vector embeddings. Unlike other vector databases, Milvus is optimized for fast storage and offers users a flexible and scalable database with index support and search capabilities.

One thing that makes vector databases interesting is their vector embedding and data storage capabilities, which pair with a real-time retrieval system to help reduce hallucinations. By vector embedding, we mean a numerical representation of data that captures the semantic meaning of text, allowing models to find related concepts positioned close together in a multidimensional space.
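As a quick illustration of that idea, the sketch below scores similarity with an inner product, the same "IP" metric used when creating the Milvus collection later in this tutorial. The 4-dimensional vectors are invented for the demo; real embeddings such as OpenAI's text-embedding-3-small have 1,536 dimensions.

```python
# Illustrative only: the vectors are made up for the demo, but the metric
# ("IP", inner product) matches what the Milvus collection uses below.

def inner_product(a: list[float], b: list[float]) -> float:
    """Inner-product similarity: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

cat = [0.9, 0.1, 0.0, 0.1]
kitten = [0.85, 0.15, 0.05, 0.1]
car = [0.0, 0.1, 0.95, 0.2]

print(inner_product(cat, kitten))  # high score: semantically close
print(inner_product(cat, car))     # low score: unrelated concepts
```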


Steps to Building a Retrieval-Augmented Generation (RAG) Pipeline With Milvus

TL;DR: This project focuses on building a RAG system using Milvus and OpenAI's API to efficiently answer users' questions based on the developer guide in the repositories.

  • Use the GitHub REST API to download the developer guides from the Milvus repository. 
  • Process the documents into vector representations using OpenAI's embedding model.
  • Create a collection in Milvus to store the embeddings for information retrieval and response generation.
  • Use OpenAI's GPT-3.5-turbo model to generate responses.

Prerequisites/Dependencies

To follow along with this tutorial, you'll need the following:

  • Python 3.9 or higher: Download it here.
  • An IDE or code editor of your choice: I recommend Google Colab, but you can also use Jupyter Notebook.

Setup and Installation

Before building the RAG pipeline, you'll need to install all your dependencies. Open your notebook environment (Google Colab or Jupyter Notebook) and run:

Python
 
! pip install --upgrade pymilvus openai requests tqdm


This code will install and upgrade:

  • pymilvus, which is the Milvus Python SDK
  • openai, the OpenAI Python API library
  • requests for making HTTP requests
  • tqdm for showing progress bars while creating embeddings

Next, import os and set your OpenAI API key (available from the OpenAI developer dashboard) as an environment variable.

Python
 
import os 

os.environ["OPENAI_API_KEY"] = "sk-***********"


Preparing the Data and Embedding Model

For this project, you can use the Milvus developer guides repository as the data source for your RAG pipeline. To do that, you'll download all the files within the developer guide directory of the repo using the script below.

This script uses the GitHub REST API to retrieve and download all the developer doc content with the .md extension and saves it in the milvus_docs folder.

With the Markdown files downloaded, you'll then gather all the text from the .md files, split it into chunks, and store them in a single list called text_lines. 

Python
 
import requests

api_url = "https://api.github.com/repos/milvus-io/milvus/contents/docs/developer_guides"
raw_base_url = "https://raw.githubusercontent.com/milvus-io/milvus/master/docs/developer_guides/"
docs_path = "milvus_docs"

if not os.path.exists(docs_path):
    os.makedirs(docs_path)

response = requests.get(api_url)

if response.status_code == 200:
    files = response.json()

    for file in files:
        if file['name'].endswith('.md'):  # Only select markdown files
            file_url = raw_base_url + file['name']

            # Download each markdown file
            file_response = requests.get(file_url)
            if file_response.status_code == 200:
                # Save the content to a local markdown file
                with open(os.path.join(docs_path, file['name']), "wb") as f:
                    f.write(file_response.content)
                print(f"Downloaded: {file['name']}")
            else:
                print(f"Failed to download: {file_url} (Status code: {file_response.status_code})")
else:
    print(f"Failed to fetch file list from {api_url} (Status code: {response.status_code})")


Prepare the Embedding Model With OpenAI

Embedding techniques ensure that similarity, classification, and search tasks can be performed on our text. The model will transform our text into vectors of floating-point numbers and use the distance between each vector to represent how similar the texts are.

Python
 
from glob import glob

text_lines = []

# Split each downloaded file into chunks on "# " headings
for file_path in glob(os.path.join(docs_path, "*.md"), recursive=True):
    with open(file_path, "r", encoding="utf-8") as file:
        file_text = file.read()

    text_lines += file_text.split("# ")


We will use the OpenAI client to make requests to the OpenAI API and interact with its embedding models; the OpenAI documentation provides more information about the available embedding models. 

Python
 
from openai import OpenAI

openai_client = OpenAI()


Next, you will need to write a function, emb_text, that takes a text string and returns its embedding vector. 

Python
 
def emb_text(text):
    return (
        openai_client.embeddings.create(input=text, model="text-embedding-3-small")
        .data[0]
        .embedding
    )


Loading and Inserting the Data Into Milvus

You can run Milvus in various ways:

  • Milvus Lite: a lightweight version of Milvus that is great for small-scale projects. 
  • Via Docker or Kubernetes: you'd run a Milvus server and use its address as the connection URI.
  • Via Zilliz Cloud, a fully managed cloud service: you'll need the endpoint URI and API key for your Zilliz Cloud account.

Since we're using Milvus Lite, make sure pymilvus (the Milvus Python SDK) is installed; it was included in the earlier install step, so this is only needed if you skipped it.

Shell
 
pip install -U pymilvus


Next, you'll create an instance of MilvusClient and specify a URI ("./milvus_demo.db") for storing the data. After that, define your collection. Think of a collection as a data schema that serves as a vector container. This is important for effectively organizing and indexing your data for similarity searches.


Python
 
from pymilvus import MilvusClient

milvus_client = MilvusClient(uri="./milvus_demo.db")

collection_name = "my_rag_collection"

if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)


Then generate a test embedding to verify the setup and capture the embedding dimension.

Python
 
test_embedding = emb_text("This is a test")
embedding_dim = len(test_embedding)
print(embedding_dim)
print(test_embedding[:10])


Next, you create a collection. By default, Milvus generates three fields:

  • an ID field for unique identification
  • a vector field for storing embeddings
  • a JSON field for accommodating non-schema-defined data

Python
 
milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type="IP",  # Inner product distance
    consistency_level="Strong",  # Strong consistency level
)


Once done, insert the data.

Python
 
from tqdm import tqdm

data = []

for i, line in enumerate(tqdm(text_lines, desc="Creating embeddings")):
    data.append({"id": i, "vector": emb_text(line), "text": line})

milvus_client.insert(collection_name=collection_name, data=data)


Building the RAG

You start by specifying a question.

Python
 
question = "What are the key features of Milvus that make it suitable for handling vector databases in AI applications?"


Using milvus_client.search, you retrieve the top-3 semantic matches for the question from your collection.

Python
 
search_res = milvus_client.search(
    collection_name=collection_name,
    data=[
        emb_text(question)
    ],  
    limit=3,  # Return top 3 results
    search_params={"metric_type": "IP", "params": {}},  
    output_fields=["text"],  # Return the text field
)


Now, process the retrieved text and use OpenAI's GPT-3.5-turbo chat model to generate a response to the question.

Python
 
import json

retrieved_lines_with_distances = [
    (res["entity"]["text"], res["distance"]) for res in search_res[0]
]
print(json.dumps(retrieved_lines_with_distances, indent=4))

context = "\n".join(
    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]
)

SYSTEM_PROMPT = """
Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
"""
USER_PROMPT = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""
response = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
print(response.choices[0].message.content)


Response output

Deploying the System

You can view the full code in this GitHub repository. To deploy your Google Colab RAG application using Docker, follow these steps:

  1. First, download your Google Colab files as .py and .ipynb and put them in a folder. Alternatively, you can push the file to GitHub and clone the repo.

Shell
 
git clone https://github.com/Bennykillua/Build_a_RAG_Milvus.git


2. Create a .env file for your environment variables. 

Plain Text
 
OPENAI_API_KEY=sk-***********
MILVUS_ENDPOINT=./milvus_demo.db
COLLECTION_NAME=my_rag_collection


3. Then install your dependencies. Alternatively, you can create a requirements.txt file.

requirements.txt file components
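The requirements file appears only as an image in the original article. Based on the libraries used throughout this tutorial, plus Streamlit (which the Dockerfile below runs), it would look roughly like this; treat it as a sketch and pin versions as needed:

```text
pymilvus
openai
requests
tqdm
streamlit
```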

4. Next, you will build and run the application inside a Docker container by creating a Dockerfile.

5. Start by downloading milvus-standalone-docker-compose.yml and adding it to the folder with your .py file. Rename the downloaded file docker-compose.yml.

However, if your file is not present or incorrectly downloaded, you can redownload it using the command below:

PowerShell
 
Invoke-WebRequest -Uri "https://github.com/milvus-io/milvus/releases/download/v2.0.2/milvus-standalone-docker-compose.yml" -OutFile "docker-compose.yml"


6. Start Milvus by running docker-compose up -d. You can learn more about Milvus Standalone with Docker Compose in the documentation.

7. In your project directory, create a Dockerfile.

Dockerfile
 
FROM python:3.9-slim

WORKDIR /app

COPY . /app

RUN pip install -r requirements.txt
EXPOSE 8501

# For local testing only; in shared environments, pass secrets at runtime (docker run -e)
ENV OPENAI_API_KEY=your_openai_api_key
ENV MILVUS_ENDPOINT=./milvus_demo.db
ENV COLLECTION_NAME=my_rag_collection

# Run the app
CMD ["streamlit", "run", "build_rag_with_milvus.py"]


8. Next, build and run your Docker image:


PowerShell
 
docker build -t my_rag_app . 

docker run -it -p 8501:8501 my_rag_app


Build With Milvus

LLMs are great, but they come with limitations, like hallucinations. With the right tools, though, these limitations can be managed. This article showed how to rein in hallucinations by building a RAG pipeline with Milvus. Milvus makes it easy for developers to perform embedding-based similarity searches and use unstructured data with their LLMs. By using Milvus in your project, you can build accurate, informative LLM applications backed by up-to-date information. And because Milvus is open source, its architecture is continually improving.

If you have read this far, I want to say thank you — I appreciate it! You can connect with me on LinkedIn or leave a comment.


