
Build Retrieval-Augmented Generation (RAG) With Milvus

Learn to manage hallucinations by building RAGs with Milvus. Developers can embed similarity searches and use unstructured data for LLMs.

By Ifeanyi Benny Iheagwara · Nov. 05, 24 · Tutorial

It's no secret that traditional large language models (LLMs) often hallucinate — generate incorrect or nonsensical information — when asked knowledge-intensive questions that require up-to-date information or business and domain knowledge. This limitation exists primarily because most LLMs are trained on publicly available information, not your organization's internal knowledge base or proprietary custom data. This is where retrieval-augmented generation (RAG), an approach introduced by Meta AI researchers, comes in.

RAG addresses an LLM's limitation of over-relying on pre-trained data for output generation by combining parametric memory with non-parametric memory through vector-based information retrieval techniques. Depending on the scale, this vector-based information retrieval technique often works with vector databases to enable fast, personalized, and accurate similarity searches. In this guide, you'll learn how to build a retrieval-augmented generation (RAG) with Milvus.

What Is RAG?

RAG stands for retrieval-augmented generation, a cost-effective technique for optimizing an LLM's output so it can generate responses grounded in information outside its training data, without retraining the model.

This is important because LLMs are constrained by the cut-off date of their training data, which can lead to unpredictable, noncontextual, or inaccurate responses. RAG addresses this by integrating vector-based information retrieval, so the model can draw on current information at query time.
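To make the idea concrete, here is a minimal, self-contained sketch of the RAG loop: retrieve relevant context, then augment the prompt before generation. The toy keyword-overlap retriever and sample documents are illustrative stand-ins for the embedding search and Milvus collection built later in this guide.

```python
# Minimal sketch of the RAG loop (toy retriever, no external services).
# The scoring function and document list are illustrative stand-ins for
# the embedding search and Milvus collection built later in this guide.

docs = [
    "Milvus is an open-source vector database for unstructured data.",
    "RAG combines retrieval with generation to ground LLM answers.",
    "Paris is the capital of France.",
]

def score(query: str, doc: str) -> int:
    """Toy relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring documents for the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Augment the query with retrieved context before generation."""
    context = "\n".join(retrieve(query))
    return f"<context>\n{context}\n</context>\n<question>{query}</question>"

print(build_prompt("What is a vector database like Milvus?"))
```

In the real pipeline built below, the scoring and retrieval steps are replaced by embedding similarity search against Milvus, and the augmented prompt is sent to a chat model.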

What Is Milvus?

Milvus is an open-source, high-performance vector database specially designed to manage and retrieve unstructured data through vector embeddings. Unlike other vector databases, Milvus is optimized for fast storage and offers users a flexible and scalable database with index support and search capabilities.

One thing that makes vector databases interesting is their vector embedding and data storage capabilities, which pair with a real-time retrieval system to help reduce hallucinations. By vector embedding, we mean a numerical representation of data that captures the semantic meaning of text, allowing models to find related concepts positioned close together in a multidimensional space.
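As a quick illustration of that idea, the sketch below scores similarity with an inner product, the same "IP" metric used when creating the Milvus collection later in this tutorial. The 4-dimensional vectors are invented for the demo; real embeddings such as OpenAI's text-embedding-3-small have 1,536 dimensions.

```python
# Illustrative only: the vectors are made up for the demo, but the metric
# ("IP", inner product) matches what the Milvus collection uses below.

def inner_product(a: list[float], b: list[float]) -> float:
    """Inner-product similarity: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

cat = [0.9, 0.1, 0.0, 0.1]
kitten = [0.85, 0.15, 0.05, 0.1]
car = [0.0, 0.1, 0.95, 0.2]

print(inner_product(cat, kitten))  # high score: semantically close
print(inner_product(cat, car))     # low score: unrelated concepts
```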


Steps to Building a Retrieval-Augmented Generation (RAG) Pipeline With Milvus

TL;DR: This project focuses on building a RAG system using Milvus and OpenAI's API to efficiently answer users' questions based on the developer guide in the repositories.

  • Use the GitHub REST API to download the developer guides from the Milvus repository. 
  • Process the documents into vector representations using OpenAI's embedding model.
  • Create a collection in Milvus to store the embeddings for information retrieval and response generation.
  • Use OpenAI's GPT-3.5-turbo model to generate responses.

Prerequisites/Dependencies

To follow along with this tutorial, you'll need the following:

  • Python 3.9 or higher: Download it here.
  • An IDE or code editor of your choice: I recommend Google Colab, but you can also use Jupyter Notebook.

Setup and Installation

Before building the RAG pipeline, you'll need to install all your dependencies. Open your notebook environment (Google Colab or Jupyter Notebook) and run:

Python
 
! pip install --upgrade pymilvus openai requests tqdm


This code will install and upgrade:

  • pymilvus, which is the Milvus Python SDK
  • openai, the OpenAI Python API library
  • requests for making HTTP requests
  • tqdm for showing progress bars while creating embeddings

Next, import os and set your OpenAI API key (available from the OpenAI developer dashboard) as an environment variable.

Python
 
import os 

os.environ["OPENAI_API_KEY"] = "sk-***********"


Preparing the Data and Embedding Model

For this project, you can use the Milvus developer guides repository as the data source for your RAG pipeline. To do that, you'll download all the files within the developer guide directory of the repo using the script below.

This script uses the GitHub REST API to retrieve and download all the developer doc content with the .md extension and saves it in the milvus_docs folder.

With the Markdown files downloaded, you'll then gather all the text from the .md files, split it into chunks, and store them in a single list called text_lines. 

Python
 
import requests

api_url = "https://api.github.com/repos/milvus-io/milvus/contents/docs/developer_guides"
raw_base_url = "https://raw.githubusercontent.com/milvus-io/milvus/master/docs/developer_guides/"
docs_path = "milvus_docs"

if not os.path.exists(docs_path):
    os.makedirs(docs_path)

response = requests.get(api_url)

if response.status_code == 200:
    files = response.json()

    for file in files:
        if file['name'].endswith('.md'):  # Only select markdown files
            file_url = raw_base_url + file['name']

            # Download each markdown file
            file_response = requests.get(file_url)
            if file_response.status_code == 200:
                # Save the content to a local markdown file
                with open(os.path.join(docs_path, file['name']), "wb") as f:
                    f.write(file_response.content)
                print(f"Downloaded: {file['name']}")
            else:
                print(f"Failed to download: {file_url} (Status code: {file_response.status_code})")
else:
    print(f"Failed to fetch file list from {api_url} (Status code: {response.status_code})")


Prepare the Embedding Model With OpenAI

Embedding techniques ensure that similarity, classification, and search tasks can be performed on our text. The model will transform our text into vectors of floating-point numbers and use the distance between each vector to represent how similar the texts are.

Python
 
from glob import glob

text_lines = []

# Split each downloaded file into chunks on "# " headings
for file_path in glob(os.path.join(docs_path, "*.md"), recursive=True):
    with open(file_path, "r", encoding="utf-8") as file:
        file_text = file.read()

    text_lines += file_text.split("# ")


We will use the OpenAI client to make requests to the OpenAI API and interact with its embedding models; the OpenAI documentation provides more information about the available embedding models. 

Python
 
from openai import OpenAI

openai_client = OpenAI()


Next, you will need to write a function, emb_text, that takes a text string and returns its embedding vector. 

Python
 
def emb_text(text):
    return (
        openai_client.embeddings.create(input=text, model="text-embedding-3-small")
        .data[0]
        .embedding
    )


Loading and Inserting the Data Into Milvus

You can run Milvus in various ways:

  • Milvus Lite: a lightweight version of Milvus that is great for small-scale projects. 
  • Via Docker or Kubernetes: you'd run a Milvus server and use its address as the connection URI.
  • Via Zilliz Cloud, a fully managed cloud service: you'll need the endpoint URI and API key for your Zilliz Cloud account.

Since we're using Milvus Lite, make sure pymilvus (the Milvus Python SDK) is installed; it was included in the earlier install step, so this is only needed if you skipped it.

Shell
 
pip install -U pymilvus


Next, you'll create an instance of MilvusClient and specify a URI ("./milvus_demo.db") for storing the data. After that, define your collection. Think of a collection as a data schema that serves as a vector container. This is important for effectively organizing and indexing your data for similarity searches.


Python
 
from pymilvus import MilvusClient

milvus_client = MilvusClient(uri="./milvus_demo.db")

collection_name = "my_rag_collection"

if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)


Then generate a test embedding to verify the setup and capture the embedding dimension.

Python
 
test_embedding = emb_text("This is a test")
embedding_dim = len(test_embedding)
print(embedding_dim)
print(test_embedding[:10])


Next, you create a collection. By default, Milvus generates three fields:

  • an ID field for unique identification
  • a vector field for storing embeddings
  • a JSON field for accommodating non-schema-defined data

Python
 
milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type="IP",  # Inner product distance
    consistency_level="Strong",  # Strong consistency level
)


Once done, insert the data.

Python
 
from tqdm import tqdm

data = []

for i, line in enumerate(tqdm(text_lines, desc="Creating embeddings")):
    data.append({"id": i, "vector": emb_text(line), "text": line})

milvus_client.insert(collection_name=collection_name, data=data)


Building the RAG

You start by specifying a question.

Python
 
question = "What are the key features of Milvus that make it suitable for handling vector databases in AI applications?"


Using milvus_client.search, you retrieve the top-3 semantic matches for the question from your collection.

Python
 
search_res = milvus_client.search(
    collection_name=collection_name,
    data=[
        emb_text(question)
    ],  
    limit=3,  # Return top 3 results
    search_params={"metric_type": "IP", "params": {}},  
    output_fields=["text"],  # Return the text field
)


Now, process the retrieved text and use OpenAI's GPT-3.5-turbo chat model to generate a response to the question.

Python
 
import json

retrieved_lines_with_distances = [
    (res["entity"]["text"], res["distance"]) for res in search_res[0]
]
print(json.dumps(retrieved_lines_with_distances, indent=4))

context = "\n".join(
    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]
)

SYSTEM_PROMPT = """
Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
"""
USER_PROMPT = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""
response = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
print(response.choices[0].message.content)


Response output

Deploying the System

You can view the full code in this GitHub repository. To deploy your Google Colab RAG application using Docker, follow these steps:

  1. First, download your Google Colab files as .py and .ipynb and put them in a folder. Alternatively, you can push the file to GitHub and clone the repo.

Shell
 
git clone https://github.com/Bennykillua/Build_a_RAG_Milvus.git


2. Create a .env file for your environment variables. 

Plain Text
 
OPENAI_API_KEY=sk-***********
MILVUS_ENDPOINT=./milvus_demo.db
COLLECTION_NAME=my_rag_collection


3. Then install your dependencies. Alternatively, you can create a requirements.txt file.

requirements.txt file components
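The requirements file appears only as an image in the original article. Based on the libraries used throughout this tutorial, plus Streamlit (which the Dockerfile below runs), it would look roughly like this; treat it as a sketch and pin versions as needed:

```text
pymilvus
openai
requests
tqdm
streamlit
```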

4. Next, you will build and run the application inside a Docker container by creating a Dockerfile.

5. Start by downloading milvus-standalone-docker-compose.yml and adding it to the folder with your .py file. Rename the downloaded file docker-compose.yml.

However, if your file is not present or incorrectly downloaded, you can redownload it using the command below:

PowerShell
 
Invoke-WebRequest -Uri "https://github.com/milvus-io/milvus/releases/download/v2.0.2/milvus-standalone-docker-compose.yml" -OutFile "docker-compose.yml"


6. Start Milvus by running docker-compose up -d. You can learn more about Milvus Standalone with Docker Compose in the documentation.

7. In your project directory, create a Dockerfile.

Dockerfile
 
FROM python:3.9-slim

WORKDIR /app

COPY . /app

RUN pip install -r requirements.txt
EXPOSE 8501

# For local testing only; in shared environments, pass secrets at runtime (docker run -e)
ENV OPENAI_API_KEY=your_openai_api_key
ENV MILVUS_ENDPOINT=./milvus_demo.db
ENV COLLECTION_NAME=my_rag_collection

# Run the app
CMD ["streamlit", "run", "build_rag_with_milvus.py"]


8. Next, build and run your Docker image:


PowerShell
 
docker build -t my_rag_app . 

docker run -it -p 8501:8501 my_rag_app


Build With Milvus

LLMs are great, but they come with limitations, like hallucinations. With the right tools, though, these limitations can be managed. This article showed how to rein in hallucinations by building a RAG pipeline with Milvus. Milvus makes it easy for developers to perform embedding-based similarity searches and use unstructured data with their LLMs. By using Milvus in your project, you can build accurate, informative LLM applications backed by up-to-date information. And because Milvus is open source, its architecture is continually improving.

If you have read this far, I want to say thank you — I appreciate it! You can connect with me on LinkedIn or leave a comment.


