Google Cloud Document AI Basics

This simple example shows how to use a custom extractor in Google's Doc AI to process W-2s and use a PDF as part of the context to Gemini.

By Imran Burki · Apr. 30, 25 · Tutorial

Google Cloud’s Document AI (Doc AI) helps organizations automate the processing, extraction, and classification of massive amounts of documents. 

Doc AI has many capabilities and use cases. Here are a few ways it can help organizations. They’re tailored to the public sector, since those are the customers I work with, but the same use cases apply to private companies.

Doc AI Example Use Cases

Processing Applications

  • Automating the extraction of key data from applications such as services/benefits, driver’s licenses, and building permits.

Tax Document Processing

  • Extracting information from tax forms (W-2s, 1040s, etc.) for faster processing and auditing. We’ll focus on this example.

Healthcare Administration

  • Processing medical documents, such as medical records and insurance claims, for faster payment.

Unemployment

  • Streamlining document collection, speeding up adjudication, and reducing the time it takes to process benefits.

Let’s Get Started!

In this blog post, we’ll review how to create a custom document extractor for W-2 forms, use the Doc AI API to extract information from a document, and pass the W-2 PDF to Gemini to summarize the document.

Create a Custom Processor

Rather than going over the steps to create a custom extractor in this blog post, you can reference the Document AI Workbench — Custom Document Extractor Google codelab. The codelab does an excellent job of showing you, step by step, how to easily create, train, test, validate, and deploy a custom processor using the Doc AI Workbench without writing any code.

Here’s what one of the W-2s looks like after you’ve labeled it in Doc AI Workbench. A custom extractor offers three training methods. I chose the one that uses Gemini 1.5 Flash. This Gen AI training method needs about 50 documents for the best results. You can learn more about the training methods in the Doc AI documentation.

Labeled W-2

You can view evaluation metrics and upload a document to test as well.

Evaluation metrics

Application Overview

Our application is very simple. You upload a W-2 PDF, Doc AI extracts the key items, Gemini 2.0 Flash summarizes the PDF, and the results are displayed as shown below. Rather than go through the entire application, I’ll show only the code for document extraction and for summarization with Gemini 2.0 Flash. I plan to share the full code on GitHub soon.

Document processor
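
To make the flow concrete, here is a minimal sketch of how the two steps could be wired together for one upload. It is not the application's actual code: handle_upload and its extract and summarize arguments are hypothetical names, and the callables stand in for the Doc AI and Gemini functions shown later in this post.

```python
from typing import Callable, Dict


def handle_upload(
    pdf_bytes: bytes,
    extract: Callable[[bytes], Dict[str, str]],
    summarize: Callable[[bytes], str],
) -> dict:
    """Run extraction and summarization on one uploaded PDF.

    `extract` and `summarize` are hypothetical callables standing in for
    the Doc AI and Gemini steps.
    """
    # A well-formed PDF begins with the "%PDF" header, so reject
    # anything else before spending an API call on it.
    if not pdf_bytes.startswith(b"%PDF"):
        raise ValueError("Uploaded file does not look like a PDF")

    return {
        "fields": extract(pdf_bytes),
        "summary": summarize(pdf_bytes),
    }


if __name__ == "__main__":
    # Stub callables so the sketch runs without any cloud credentials.
    result = handle_upload(
        b"%PDF-1.7 fake",
        extract=lambda data: {"EmployeeName": "Jane Doe"},
        summarize=lambda data: "A W-2 for Jane Doe.",
    )
    print(result)
```

Passing the two steps in as arguments keeps the flow easy to test with stubs, as the example run does.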

Here’s the sample W-2 we’ll upload.

W-2 First Page

W-2 Second Page

Doc AI Code

Here’s the code for Doc AI and an explanation of what it does.

Python
 
from google.cloud import documentai
import os

def process_document(file):
    try:
        # Initialize Document AI client
        client = documentai.DocumentProcessorServiceClient()
        
        # Configure processor path
        LOCATION = 'us'  # Format is 'us' or 'eu'
        PROJECT_ID = os.getenv('PROJECT_ID')
        PROCESSOR_ID = os.getenv('PROCESSOR_ID')
        
        if not PROJECT_ID or not PROCESSOR_ID:
            raise ValueError("PROJECT_ID and PROCESSOR_ID must be set in .env file")
        
        PROCESSOR_PATH = f"projects/{PROJECT_ID}/locations/{LOCATION}/processors/{PROCESSOR_ID}"
        print(f"Using processor path: {PROCESSOR_PATH}")
        
        # Read file content
        file_content = file.read()
        print(f"Read file content, size: {len(file_content)} bytes")
        
        # Configure the process request
        raw_document = documentai.RawDocument(
            content=file_content,
            mime_type="application/pdf"
        )
        
        # Process the document
        request = documentai.ProcessRequest(
            name=PROCESSOR_PATH,
            raw_document=raw_document
        )
        
        print("Sending request to Doc AI...")
        result = client.process_document(request=request)
        print("Received response from Doc AI")
        
        document = result.document
        
        # Extract entities from the processed document
        extracted_data = {}
        for entity in document.entities:
            extracted_data[entity.type_] = entity.mention_text
            
        print(f"Extracted {len(extracted_data)} entities")
        return extracted_data
        
    except Exception as e:
        print(f"Error in process_document: {str(e)}")
        raise


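  1. Import libraries: Import the Doc AI client library and os, which is used to read environment variables.
  2. Doc AI processor: Build the processor path from the PROJECT_ID and PROCESSOR_ID environment variables. You can find both values in the Doc AI Workbench.
  3. Read and configure file: Read the file into the file_content variable, then wrap the bytes in a RawDocument with the application/pdf MIME type so that Doc AI can process them.
  4. Process document: Send the request to Doc AI and save the processed document from the response to the document variable.
  5. Extract key data: Loop over document.entities and store each entity’s mention_text in the extracted_data dictionary, keyed by entity type, then return the dictionary. If an entity type appears more than once, the last value wins.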
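
Because the dictionary in step 5 is keyed by entity type, a repeated type keeps only one value. The sketch below shows one possible alternative that keeps every occurrence and filters on each entity's confidence score. collect_entities and min_confidence are hypothetical names, and the sample field names and values are made up for illustration. The only assumption about the input is that each item exposes type_, mention_text, and confidence, as Doc AI entities do.

```python
from collections import defaultdict
from types import SimpleNamespace


def collect_entities(entities, min_confidence=0.0):
    """Group entity text by type, keeping every occurrence.

    Each item is expected to expose `type_`, `mention_text`, and
    `confidence`, as Document AI entities do.
    """
    grouped = defaultdict(list)
    for entity in entities:
        if entity.confidence < min_confidence:
            continue  # skip predictions below the threshold
        grouped[entity.type_].append(
            {"text": entity.mention_text, "confidence": entity.confidence}
        )
    return dict(grouped)


if __name__ == "__main__":
    # Stand-in objects so the sketch runs without calling Doc AI.
    # The field names and values are invented for this example.
    sample = [
        SimpleNamespace(type_="StateWages", mention_text="50000.00", confidence=0.97),
        SimpleNamespace(type_="StateWages", mention_text="1200.00", confidence=0.91),
        SimpleNamespace(type_="EmployerName", mention_text="Acme", confidence=0.42),
    ]
    print(collect_entities(sample, min_confidence=0.5))
```

With a real response you would pass document.entities instead of the stand-in list. The right threshold depends on your processor's evaluation metrics.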

Here’s the final output.

Doc AI Output
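
The Doc AI code above hard-codes LOCATION = 'us'. If your processor lives in another location such as 'eu', the client generally has to be pointed at the matching regional endpoint as well. Here is a minimal sketch of that setup. The helper names endpoint_for, processor_name, and make_client are hypothetical, while ClientOptions and the location-prefixed endpoint format follow Google's published Document AI samples.

```python
def endpoint_for(location: str) -> str:
    """Return the regional Document AI endpoint for a processor location."""
    return f"{location}-documentai.googleapis.com"


def processor_name(project_id: str, location: str, processor_id: str) -> str:
    """Build the processor resource path, as the original code does."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def make_client(location: str):
    """Create a Document AI client bound to the location's endpoint."""
    # Imported lazily so the helpers above work without the SDK installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    return documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=endpoint_for(location))
    )


if __name__ == "__main__":
    print(endpoint_for("eu"))  # eu-documentai.googleapis.com
    print(processor_name("my-project", "eu", "abc123"))
```

The project and processor IDs in the example run are placeholders. Keeping the location in one variable and deriving both the endpoint and the processor path from it avoids a mismatch between the two.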

Summarize PDF Using Gemini

I’m using the Gemini 2.0 Flash model to create a summary of the W-2.

Python
 
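import google.generativeai as genai
import os

def get_summary(file):
    # Assumption: `file` is the filesystem path of the PDF to summarize
    api_key = os.getenv('GEMINI_API_KEY')
    if not api_key:
        raise ValueError("GEMINI_API_KEY must be set")
    genai.configure(api_key=api_key)

    # Upload the PDF through the File API so the prompt can reference it
    sample_pdf = genai.upload_file(path=file, display_name="file")

    model = genai.GenerativeModel(model_name="gemini-2.0-flash")

    # Send the uploaded PDF and the instruction together as one prompt
    response = model.generate_content(
        contents=[sample_pdf, "Give me a summary of this pdf file."]
    )
    print(response.text)

    return response.text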


The code is simple. One of the things I love about Gemini 2.0 is that you can pass a PDF or TXT file directly in the prompt, or mix file and text parts in a multimodal prompt. There’s no need to build a RAG pipeline or do other preprocessing: upload the PDF and include it in the contents list passed to model.generate_content, as shown in the code above.
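
In a web application the uploaded PDF is often already in memory, so there may be no path on disk to hand to upload_file. For small files, one possible variation is to send the bytes inline as a dictionary with mime_type and data keys, a form the google.generativeai library accepts as a content part. The sketch below assumes that behavior. build_contents and summarize_pdf_bytes are hypothetical names, not part of the original application.

```python
import os


def build_contents(pdf_bytes: bytes, prompt: str) -> list:
    """Pair the raw PDF bytes with a text prompt for generate_content."""
    return [{"mime_type": "application/pdf", "data": pdf_bytes}, prompt]


def summarize_pdf_bytes(pdf_bytes: bytes) -> str:
    """Summarize an in-memory PDF without uploading it first."""
    # Imported lazily so build_contents can be used without the SDK.
    import google.generativeai as genai

    api_key = os.getenv("GEMINI_API_KEY")
    if not api_key:
        raise ValueError("GEMINI_API_KEY must be set")
    genai.configure(api_key=api_key)

    model = genai.GenerativeModel(model_name="gemini-2.0-flash")
    response = model.generate_content(
        contents=build_contents(pdf_bytes, "Give me a summary of this pdf file.")
    )
    return response.text


if __name__ == "__main__":
    # Only inspects the request parts; no API call is made here.
    parts = build_contents(b"%PDF-1.7 fake", "Give me a summary of this pdf file.")
    print(parts[0]["mime_type"], len(parts[0]["data"]))
```

This also lets the same bytes be reused for both Doc AI and Gemini, since a file stream that has already been read once would otherwise need to be rewound. Larger documents are better sent through upload_file.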

Here are the results from Gemini 2.0 Flash.

Gemini Summarization

References

Here are some additional references:

  • Processing Documents Using GCP Document AI
  • GCP Document AI Overview
  • Process Requests with Doc AI
  • Code on GitHub