Fixing a SOLR Memory Leak

Let's get things working again.

By Harish Kumar Murugesan · Dec. 18, 2021 · Analysis


In this blog, we will look at a memory leak in the SOLR QueryResultCache, how the root cause analysis (RCA) was carried out, and the fix that resolved the issue.

In the application under test, SOLR 7.5 was used as the component to store, search, and retrieve content. During performance testing, it was observed that the SOLR Slave CPU usage climbed steadily from test to test:

  • Test 1 – 40% CPU usage
  • Test 2 – 60% CPU usage
  • Test 3 – 80% CPU usage

In these tests, content was only retrieved; there were no writes to the SOLR Slave server. The SOLR Slave was not restarted between tests, since it would not be restarted in production. The CPU usage drill-down view in Dynatrace did not show any specific evidence of where the time was actually being spent.

The GC graph, however, showed the old generation memory growing, and the time spent on young GC was somewhat high. The old generation grew to 9 GB. Frequent minor GCs were seen, averaging around 850 milliseconds, with a few spikes in GC time of up to 1.3 seconds. Since the old generation kept growing, it was suspected that some objects were accumulating in memory. Looking at the SOLR cache in Dynatrace, only inserts into QueryResultCache were observed; no evictions were seen in the cache at all.

The tests were repeated after restarting SOLR, and the same behavior was observed. This time, QueryResultCache was monitored closely via the SOLR admin console, and the cache size grew as given below (a sketch for pulling these counters programmatically follows the list):

  • Test 1 – 190K elements in cache
  • Test 2 – 350K elements in cache
  • Test 3 – 520K elements in cache
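
These counters can also be pulled programmatically from the Metrics API (/admin/metrics, available since SOLR 6.4) rather than eyeballed in the console. Below is a minimal SolrJ sketch; the base URL is a placeholder for the Slave under test:

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class CacheWatch {
    public static void main(String[] args) throws Exception {
        // Placeholder base URL: point at the SOLR Slave under test.
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("group", "core");
            // Restrict the response to the queryResultCache entry, which
            // carries lookups, hits, inserts, evictions, size, and ramBytesUsed.
            params.set("prefix", "CACHE.searcher.queryResultCache");
            NamedList<Object> response = client.request(
                new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/metrics", params));
            // Inserts that keep growing with zero evictions are the leak
            // signature described above.
            System.out.println(response.get("metrics"));

            // The same endpoint exposes the JVM numbers discussed earlier:
            // group=jvm with prefix=memory.pools returns per-pool usage
            // (including the old generation), and prefix=gc. returns
            // cumulative GC counts and times.
        }
    }
}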

A heap dump was taken after these tests, and it showed that around 8 GB of the memory was occupied by FastLRUCache and its contents.
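
The post does not say how the dump was captured; jmap -dump:live,format=b,file=heap.hprof <pid> is the usual route. As an alternative, a dump can be triggered remotely through the HotSpot diagnostic MXBean, assuming remote JMX is enabled on the SOLR JVM (for example via ENABLE_REMOTE_JMX_OPTS in solr.in.sh); the host, port, and output path below are placeholders:

import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDump {
    public static void main(String[] args) throws Exception {
        // Placeholder JMX endpoint for the SOLR JVM.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:18983/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                conn, "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
            // 'true' dumps live objects only (forces a full GC first);
            // the .hprof file is written on the SOLR host's filesystem.
            bean.dumpHeap("/tmp/solr-heap.hprof", true);
        }
    }
}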

While looking at the SOLR configuration, the following settings were found for the QueryResultCache:

<queryResultCache class="solr.FastLRUCache" size="5000" initialSize="512" maxRamMB="1048" autowarmCount="0"/>

In this case, QueryResultCache honored neither the ‘size’ nor the ‘maxRamMB’ parameter; it grew well past both limits. The cache reached 520K entries, as opposed to the configured size of 5K, and exceeded 8 GB in the heap dump, as opposed to the roughly 1 GB cap set via maxRamMB. Instead of using both parameters to limit the cache, it was decided to restrict it with the ‘size’ parameter alone.
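
For contrast, here is what a working size bound is supposed to do: once the cap is reached, each insert evicts the least-recently-used entry, so the element count (and with it the heap footprint) plateaus. Below is a minimal conceptual illustration using a LinkedHashMap; this is not Solr's implementation, which evicts in batches from a concurrent map:

import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual size-bounded LRU cache: once maxSize entries exist, each new
// insert evicts the least-recently-used entry, so memory use plateaus.
public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    public BoundedLruCache(int maxSize) {
        super(16, 0.75f, true); // accessOrder=true gives LRU iteration order
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize; // evict on insert once over the cap
    }

    public static void main(String[] args) {
        BoundedLruCache<Integer, String> cache = new BoundedLruCache<>(5000);
        for (int i = 0; i < 150_000; i++) {
            cache.put(i, "result-" + i);
        }
        System.out.println(cache.size()); // prints 5000, not 150000
    }
}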

The following settings were applied in SOLR:

<queryResultCache class="solr.FastLRUCache" size="150000" initialSize="512" autowarmCount="0"/>
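
Note that a solrconfig.xml change only takes effect once the core is reloaded (or SOLR is restarted). A minimal SolrJ sketch for a standalone master/slave setup; the base URL and core name ('content') are placeholders:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class ReloadCore {
    public static void main(String[] args) throws Exception {
        // Placeholder base URL and core name.
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            // Reloads the core so the new queryResultCache settings apply.
            CoreAdminRequest.reloadCore("content", client);
        }
    }
}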

The three tests were repeated, and SOLR CPU usage stayed constant at around 40% in all of them. The QueryResultCache grew to 150K entries, after which evictions were seen in the cache, keeping its size within the defined limit. Old generation memory remained steady, growing only to 3 GB and staying consistent across the tests, and the time to garbage collect the young generation also came down: minor GCs averaged around 650 milliseconds, with no spikes. The tests completed successfully without any CPU issues in SOLR.

Based on the root cause analysis, there appears to be a leak in the SOLR cache when both ‘size’ and ‘maxRamMB’ are set for the QueryResultCache. Setting only the ‘size’ parameter for the QueryResultCache avoids this memory leak.

Tags: Memory (storage engine), garbage collection, Testing, Cache (computing)

