Build a Local AI-Powered Document Summarization Tool

Learn how to build a simple document summarizer using Streamlit for the interface and Ollama for running AI models locally.

By Vamsi Kavuri · Feb. 24, 2025 · Tutorial


When I began my journey into AI and large language models (LLMs), my initial aim was to experiment with various models and learn how effective they are. Like most developers, I started with cloud-hosted services, enticed by the quick setup and the ready-to-use LLMs at my fingertips.

But pretty quickly, I ran into a snag: cost. Using LLMs in the cloud is convenient, but the pay-per-token pricing adds up fast, especially when you are working with lots of text or asking many questions. I realized I needed a better way to learn and experiment with AI without blowing my budget. This is where Ollama came in, offering a rather interesting solution.

By using Ollama, you can do the following (a quick CLI sketch follows this list):

  • Load and experiment with multiple LLMs locally
  • Avoid API rate limits and usage restrictions
  • Customize and fine-tune LLMs
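
As an illustration, here is what that experimentation looks like from the Ollama CLI. This assumes Ollama is already installed; the model names are just examples:

Shell
 
# Download models to your machine
ollama pull llama3.2
ollama pull mistral

# List the models available locally
ollama list

# Start an interactive session with a model (downloads it on first use)
ollama run llama3.2
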

In this article, we will explore how to build a simple document summarization tool using Ollama, Streamlit, and LangChain. Ollama lets us run LLMs locally, Streamlit provides a web interface so users can interact with those models smoothly, and LangChain offers pre-built chains that simplify development.

Environment Setup

  • Ensure Python 3.12 or higher is installed.
  • Download and install Ollama.
  • Fetch the llama3.2 model via ollama run llama3.2 (this downloads the model on first use).
  • I prefer Conda for managing dependencies and creating isolated environments. Create a new Conda environment (a sketch follows this list) and then install the necessary packages shown below.
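
If you use Conda as well, creating and activating the environment might look like this (the environment name doc-summarizer is arbitrary):

Shell
 
conda create -n doc-summarizer python=3.12
conda activate doc-summarizer

With the environment active, install the required packages: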
Shell
 
pip install streamlit langchain langchain-ollama langchain-community langchain-core pymupdf


Now, let's dive into building our document summarizer. We will start by creating a Streamlit app that handles document uploads and displays summaries in a user-friendly interface.

Next, we will extract the text from the uploaded documents (only PDF and plain-text files are supported) and prepare it for the summarization chain.

Finally, we will bring in Ollama to perform the actual summarization, using its local language model to generate a concise, informative summary.

The code below contains the complete implementation, with detailed comments to guide you through each step.

Python
 
import os
import tempfile
import streamlit as stlit
from langchain_text_splitters import CharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain_ollama import OllamaLLM
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_core.documents import Document

# Set up the Streamlit page: configuration, title, and a file uploader
stlit.set_page_config(page_title="Local Document Summarizer", layout="wide")
stlit.title("Local Document Summarizer")
# File uploader that accepts pdf and txt files only
uploaded_file = stlit.file_uploader("Choose a PDF or Text file", type=["pdf", "txt"])

# Extract text from the uploaded file (PDF via PyMuPDF, plain text read directly)
def process_file(uploaded_file):
    if uploaded_file.name.endswith(".pdf"):
        # Persist the upload to a temp file so PyMuPDFLoader can read it from disk;
        # the .pdf suffix makes the file type explicit for the loader
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as temp_file:
            temp_file.write(uploaded_file.getvalue())
        loader = PyMuPDFLoader(temp_file.name)
        docs = loader.load()
        extracted_text = " ".join([doc.page_content for doc in docs])
        os.unlink(temp_file.name)
    else:
        # Read the content directly for text files, no need for tempfile 
        extracted_text = uploaded_file.getvalue().decode("utf-8")
    return extracted_text

# Split the extracted text and run the summarization chain over it
def summarize(text):
    # Split the text into overlapping chunks and wrap each in a Document object
    chunks = CharacterTextSplitter(chunk_size=500, chunk_overlap=100).split_text(text)
    docs = [Document(page_content=chunk) for chunk in chunks]

    # Initialize the LLM with the llama3.2 model and load a map_reduce chain:
    # each chunk is summarized on its own ("map"), then the partial summaries
    # are combined into a final summary ("reduce")
    chain = load_summarize_chain(OllamaLLM(model="llama3.2"), chain_type="map_reduce")
    
    return chain.invoke(docs)

if uploaded_file:
    # Process and preview the uploaded file content
    extracted_text = process_file(uploaded_file)
    stlit.text_area("Document Preview", extracted_text[:1200], height=200)

    # Generate a summary of the extracted text
    if stlit.button("Generate Summary"):
        with stlit.spinner("Summarizing...may take a few seconds"):
            summary_text = summarize(extracted_text)
            stlit.text_area("Summary", summary_text['output_text'], height=400)



Running the App

Save the above code snippet into summarizer.py, then open your terminal, navigate to where you saved the file, and run:

Shell
 
streamlit run summarizer.py


That should start your Streamlit app and automatically open it in your web browser at a local URL like http://localhost:8501.

Conclusion

You've just built a document summarization tool by combining Streamlit's simplicity with Ollama's local model hosting capabilities. This example uses the llama3.2 model, but you can experiment with other models to find what works best for your needs, and you can also consider adding support for additional document formats, error handling, and customized summarization parameters, as sketched below.
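
As a starting point for that kind of customization, here is a minimal sketch that swaps in a different local model and passes custom prompts to the map_reduce chain. The model name and prompt wording are illustrative assumptions, not part of the tool above:

Python
 
from langchain.chains.summarize import load_summarize_chain
from langchain_core.prompts import PromptTemplate
from langchain_ollama import OllamaLLM

# The map_reduce summarize chain expects prompts with a {text} variable
map_prompt = PromptTemplate(
    input_variables=["text"],
    template="Write a 2-3 sentence summary of the following section:\n\n{text}",
)
combine_prompt = PromptTemplate(
    input_variables=["text"],
    template="Combine these section summaries into a short bulleted summary:\n\n{text}",
)

# Any model you have pulled locally will work here, e.g., mistral
chain = load_summarize_chain(
    OllamaLLM(model="mistral"),
    chain_type="map_reduce",
    map_prompt=map_prompt,
    combine_prompt=combine_prompt,
)

Swap this chain into the summarize() function above and the rest of the app stays unchanged.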

Happy AI experimenting!
