June 16, 2026 · 13 min read · performance.qa

Java Application Performance Monitoring: 7 JVM Bottlenecks

Java application performance monitoring done right: diagnose GC pauses, thread starvation, heap leaks and JIT warmup, mapped to the exact tool and fix.


Your Java service is slow. Latency is spiking, the dashboard is red, and someone is asking why. The instinct is to reach for a tool and start poking. That is backwards. The fastest path from “it’s slow” to “it’s fixed” starts with the symptom, narrows to the JVM subsystem responsible, and only then picks the tool that confirms it.

This is a vendor-neutral diagnostic playbook for the seven JVM bottlenecks that cause most slow Java applications. Each one gets a self-contained signature - a symptom plus the confirming signal - the named tool that exposes it, and the concrete remediation step. We anchor it to 2026 reality: virtual threads, GraalVM native image, generational ZGC, and continuous production profiling. By the end you will have a decision tree you can run the next time a Java service goes sideways.

How to monitor a Java application (the diagnostic order of operations)

The biggest mistake in Java application performance monitoring is starting from the tool instead of the symptom. Different symptoms implicate completely different JVM subsystems, and grabbing the wrong tool wastes the time you do not have during an incident.

There are three top-level symptoms, and each points somewhere different:

  • Latency spikes (p99 jumps, p50 stays flat) usually mean GC pauses, lock contention, or JIT warmup.
  • Throughput collapse (requests per second falls off a cliff) usually means thread pool starvation or connection pool exhaustion.
  • Memory growth (heap or RSS climbs until OOM) means heap pressure or a real leak, on-heap or off-heap.

Before you change anything, capture four signals. These are the raw material every diagnosis is built from:

  1. GC logs - pause times, frequency, and allocation rate.
  2. Thread dumps - what every thread is doing right now, and whether pools are maxed.
  3. Heap usage - live set, growth trend, and where the bytes are going.
  4. Method-level CPU profile - which code is actually burning cycles.
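
Three of the four signals can also be sampled from inside the process through the standard java.lang.management beans. The sketch below is a minimal illustration, not a replacement for GC logs or a real profiler; the class name JvmSignals and its method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: a coarse in-process snapshot of GC, heap, and thread signals. */
final class JvmSignals {

    /** Returns {total collections, accumulated collection time in ms} across all collectors. */
    static long[] gcTotals() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when a collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    /** Heap bytes currently in use (live objects plus garbage not yet collected). */
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Number of live threads, daemon and non-daemon. */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** One-line summary suitable for a periodic log statement. */
    static String summary() {
        long[] gc = gcTotals();
        return String.format("gcCount=%d gcTimeMs=%d heapUsedMB=%d liveThreads=%d",
                gc[0], gc[1], heapUsedBytes() / (1024 * 1024), liveThreads());
    }
}
```

Logging JvmSignals.summary() on a fixed interval gives a cheap trend line; the method-level CPU profile still has to come from JFR or async-profiler.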

Here is the trap most teams fall into: an APM agent alone is not enough. Datadog, New Relic, and Dynatrace are excellent at request-level tracing - they tell you which endpoint is slow and which downstream call it waited on. But they largely treat the JVM as a black box. A 300ms stop-the-world GC pause, a lock that 40 threads are queued behind, or an allocation hot spot churning the young generation will not show up cleanly in a trace waterfall. You need to pair APM with a JVM-internal profiler - JDK Flight Recorder or async-profiler - to see inside the box.

The diagnostic decision tree

Symptom | Likely JVM subsystem | Tool to confirm
p99 latency spikes, p50 flat | GC pauses | GC logs + JFR
Latency bad only after deploy, recovers | JIT warmup | JFR compilation events
Throughput collapses under load | Thread pool starvation | Thread dump
Latency rises with concurrency | Lock contention | async-profiler (lock mode)
Requests stall, downstream looks fine | Connection pool exhaustion | Pool metrics + thread dump
Heap grows until OOM | Heap leak / pressure | Heap dump (Eclipse MAT)
RSS grows, heap looks fine | Off-heap / native leak | NMT + JFR

Run the tree top to bottom. Match the symptom, capture the confirming signal, then act. Now let’s walk the seven bottlenecks.

Bottleneck 1-2: Garbage collection pauses and heap pressure

Signature: p99 latency spikes in regular bursts while p50 stays flat, and the spikes line up with stop-the-world pauses in the GC log.

Garbage collection is the single most common source of mysterious Java latency. The mechanics are simple even if the tuning is not. Your application allocates objects, the heap fills, and the collector periodically stops application threads to reclaim dead objects. Those stops are your latency spikes.

Reading GC logs

Three numbers tell the whole story:

  • Pause time - how long each stop-the-world event lasts. This is the latency you feel.
  • Pause frequency - how often pauses happen. More frequent pauses mean more cumulative stall.
  • Allocation rate - megabytes per second of new objects. This is the driver. High allocation rate forces more frequent collections, which means more pauses.
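
Allocation rate can be worked out from two adjacent GC log entries: take the heap occupancy just before a collection, subtract the occupancy left after the previous collection, and divide by the time between them. A small sketch of that arithmetic follows; the class and parameter names are hypothetical and the numbers in the usage note are made up for illustration.

```java
/** Hypothetical helper: allocation rate between two consecutive collections. */
final class AllocationRate {

    /**
     * @param usedAfterPrevGcBytes  heap occupancy right after collection k
     * @param usedBeforeNextGcBytes heap occupancy right before collection k+1
     * @param secondsBetween        wall-clock time between the two collections
     * @return megabytes allocated per second in that interval
     */
    static double mbPerSecond(long usedAfterPrevGcBytes,
                              long usedBeforeNextGcBytes,
                              double secondsBetween) {
        if (secondsBetween <= 0) {
            throw new IllegalArgumentException("secondsBetween must be positive");
        }
        double allocatedMb = (usedBeforeNextGcBytes - usedAfterPrevGcBytes) / (1024.0 * 1024.0);
        return allocatedMb / secondsBetween;
    }
}
```

For example, a heap that sits at 512 MB after one collection and reaches 2,560 MB four seconds later, just before the next, has allocated 2,048 MB in 4 s, or 512 MB/s.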

The mistake is tuning pause time directly. Often the real fix is upstream: a hot loop allocating throwaway objects, boxing primitives in a tight path, or building giant intermediate collections. Find the allocation hot spot with a profiler (async-profiler in allocation mode, or JFR’s allocation events) and the GC problem frequently disappears without touching a single GC flag.
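
As an illustration of the boxing case, the two methods below compute the same sum. This is a deliberately tiny sketch with hypothetical names; whether the boxed version really allocates on every iteration depends on the JIT (escape analysis can remove some of it), so treat it as the shape to look for in an allocation profile rather than a benchmark.

```java
/** Hypothetical example of an allocation hot spot caused by boxing. */
final class BoxingHotSpot {

    /** The accumulator is a Long, so each += unboxes, adds, and may box a fresh object. */
    static long sumBoxed(int n) {
        Long total = 0L;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    /** Same result with a primitive accumulator: no per-iteration allocation. */
    static long sumPrimitive(int n) {
        long total = 0L;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }
}
```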

G1 vs ZGC vs Shenandoah in 2026

The collector choice matters, but only after you understand your workload:

Collector | Use when | Avoid when
G1 | Balanced throughput and latency, heaps up to ~32GB, the safe default | You need consistent sub-millisecond pauses
ZGC (generational, 2026) | Low, predictable pauses on large heaps; latency-sensitive services | Tiny heaps where its overhead is not worth it
Shenandoah | Low pauses, concurrent compaction, similar niche to ZGC | Throughput-bound batch where pause time does not matter
Parallel GC | Pure throughput batch jobs, pauses irrelevant | Any interactive, latency-sensitive service

The 2026 detail that matters: generational ZGC is now the only ZGC mode (it became the default in JDK 23, and the non-generational mode was removed in JDK 24), and it closed most of the throughput gap that made teams hesitant to switch from G1. For a latency-sensitive service on a large heap, generational ZGC is often the single highest-leverage change you can make.

Heap leak vs heap pressure

These look identical from a distance - both show a growing heap - but the fix is opposite:

  • Heap pressure means the heap is simply too small for the live working set. GC runs constantly trying to keep up, burning CPU and causing pauses, but the heap stabilizes after each collection. Fix: size the heap correctly, or reduce allocation.
  • A real leak means the live set itself grows without bound. GC reclaims less each cycle until you hit OutOfMemoryError. Fix: find and release the retained reference (covered in bottleneck 7).

The tell is the post-GC heap floor. If the floor is flat and the heap just cycles, it is pressure. If the floor keeps climbing, it is a leak.
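
The floor test can be made mechanical: collect the heap occupancy after each full or mixed collection and fit a line through it. The sketch below uses an ordinary least-squares slope; the class name, the verdict names, and the threshold are all assumptions you would tune for your own sampling interval.

```java
/** Hypothetical helper: is the post-GC heap floor flat or climbing? */
final class HeapFloor {

    enum Verdict { FLAT_FLOOR, RISING_FLOOR }

    /** Least-squares slope of post-GC occupancy, in MB per sample. */
    static double slope(double[] postGcMb) {
        int n = postGcMb.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (double y : postGcMb) {
            meanY += y;
        }
        meanY /= n;
        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (postGcMb[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    /** FLAT_FLOOR suggests pressure (or a healthy heap); RISING_FLOOR suggests a leak. */
    static Verdict classify(double[] postGcMb, double thresholdMbPerSample) {
        return slope(postGcMb) > thresholdMbPerSample ? Verdict.RISING_FLOOR : Verdict.FLAT_FLOOR;
    }
}
```

With samples of 400, 420, 440, 460, 480 MB the slope is 20 MB per sample, which points at a leak; samples that wobble around 400 MB give a slope near zero.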

Slow Java app and the heap is the suspect? A Performance Audit gives you five days of JVM profiling that pinpoints the top bottlenecks and a validated remediation plan - so you fix the cause, not the symptom.

Bottleneck 3-4: Thread pool starvation and lock contention

Signature for starvation: throughput collapses under load while CPU sits idle, and a thread dump shows the pool fully occupied with threads blocked on the same downstream.

This is the bottleneck most often misdiagnosed as “the database is slow.” When a fixed thread pool is exhausted, every incoming request queues behind a busy thread. From the outside it looks like the whole service slowed down. In reality a maxed pool whose threads are all waiting on a slow downstream call presents exactly like a slow downstream - because functionally it is one, just upstream of where you are looking.

The thread dump is the truth serum. Take several, a few seconds apart, and count threads in RUNNABLE, BLOCKED, WAITING, and TIMED_WAITING. If your 200-thread pool shows 200 threads all parked in the same JDBC call in dump after dump, you have starvation, not a slow query. The query might be fine; you just have too few threads or too slow a downstream for your arrival rate.
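
The same counting can be done in-process. The sketch below uses Thread.getAllStackTraces() to build a histogram of thread states and of the top stack frame per thread; the class and method names are hypothetical, and a jstack or jcmd Thread.print dump gives the same information from outside the process.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: summarize what the JVM's threads are doing right now. */
final class ThreadStates {

    /** Number of live threads in each Thread.State. */
    static Map<Thread.State, Integer> histogram() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Number of threads whose innermost frame is each class.method. */
    static Map<String, Integer> topFrames() {
        Map<String, Integer> counts = new HashMap<>();
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            if (stack.length == 0) {
                continue; // thread has no Java frames to report
            }
            String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
            counts.merge(frame, 1, Integer::sum);
        }
        return counts;
    }
}
```

One caveat when reading the histogram: a platform thread blocked in a native socket read is reported as RUNNABLE, so a pool stuck on a slow downstream can look busy rather than waiting. The top-frame count is what exposes it, because most of the pool will share the same frame.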

Lock contention

Signature: latency rises as concurrency rises, CPU is not saturated, and a profile shows threads parked on the same monitor.

Lock contention is starvation’s quieter cousin. A synchronized block or a hot ReentrantLock serializes threads that should run in parallel. Add load and they queue. async-profiler in lock mode is the right tool here - it produces a flame graph of where threads are blocking, which a thread dump can only hint at. Look for a single synchronized method or shared mutable structure on the hot path.
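
A typical instance of the "shared mutable structure on the hot path" is a counter map guarded by one monitor. The sketch below contrasts it with a lower-contention version built on ConcurrentHashMap and LongAdder; the class names are hypothetical and the right replacement depends on what the structure is actually used for.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-key hit counters, contended and uncontended. */
final class HitCounters {

    /** Every caller serializes on this object's monitor, regardless of key. */
    static final class SingleLock {
        private final Map<String, Long> hits = new HashMap<>();

        synchronized void record(String key) {
            hits.merge(key, 1L, Long::sum);
        }

        synchronized long get(String key) {
            return hits.getOrDefault(key, 0L);
        }
    }

    /** Different keys do not block each other; LongAdder spreads same-key updates. */
    static final class Striped {
        private final ConcurrentHashMap<String, LongAdder> hits = new ConcurrentHashMap<>();

        void record(String key) {
            hits.computeIfAbsent(key, k -> new LongAdder()).increment();
        }

        long get(String key) {
            LongAdder adder = hits.get(key);
            return adder == null ? 0L : adder.sum();
        }
    }
}
```

Both versions return the same counts; the difference only shows up in a lock-mode flame graph under concurrency, which is where to verify that the change helped.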

Virtual threads (Project Loom) in 2026

Virtual threads are the headline JVM feature of this era, and they genuinely fix one class of starvation: the “thread-per-request blocking on I/O” pattern. With virtual threads, blocking on a database or HTTP call no longer ties up a precious platform thread, so you can have millions of cheap threads and the pool-exhaustion problem largely evaporates.

But virtual threads also hide problems, and that is the part the marketing skips:

  • Pinning - on JDK 21 through 23, a virtual thread that blocks inside a synchronized block pins its carrier platform thread, silently reintroducing the starvation you thought you removed. JDK 24 (JEP 491) removed that limitation, but a virtual thread that blocks while a native frame is on its stack still pins. On the older JDKs, replace synchronized with ReentrantLock on hot paths.
  • Unbounded concurrency - if blocking no longer creates back-pressure, your service can flood a downstream that does have a small pool, just moving the bottleneck one hop away.

Remediation: size pools to your actual concurrency and downstream limits, push blocking calls behind clear async boundaries, replace blocking calls with non-blocking equivalents where the latency budget demands it, and, on JDKs before 24, audit for synchronized on virtual-thread paths.
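
One way to put back the back-pressure that virtual threads remove is a semaphore around the downstream call, sized to what the downstream can actually take. The sketch below is a minimal version of that idea; DownstreamLimiter and its methods are hypothetical names, and the limit is something you would derive from the downstream's own pool size.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical limiter: caps concurrent calls to one downstream dependency. */
final class DownstreamLimiter {

    private final Semaphore permits;

    DownstreamLimiter(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Blocks the calling thread until a permit is free, then runs the call. */
    <T> T call(Supplier<T> downstreamCall) {
        permits.acquireUninterruptibly();
        try {
            return downstreamCall.get();
        } finally {
            permits.release(); // always hand the permit back, even on failure
        }
    }

    /** Permits not currently held; useful as a saturation metric. */
    int availablePermits() {
        return permits.availablePermits();
    }
}
```

On JDK 21 or later the callers would typically be tasks submitted to Executors.newVirtualThreadPerTaskExecutor(); waiting on the semaphore parks the virtual thread cheaply, so the limit applies to the downstream and not to your own thread count.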

Bottleneck 5-6: JIT warmup and connection pool limits

Signature for JIT warmup: p99 is terrible for the first minutes after every deploy or restart, then settles on its own with no change you made.

The JVM starts out interpreting bytecode and compiles hot methods to native code on the fly. Until the JIT compiler has profiled and compiled your hot paths, everything runs slower. That is why a freshly deployed service has an ugly p99 for the first few minutes - and why autoscaling that adds cold instances during a traffic spike can make latency worse before it gets better.

What to do about it:

  • Tiered compilation is on by default and is usually right. For short-lived or rapidly-scaled services, the number worth optimizing is time to peak performance, not peak performance itself.
  • AOT / GraalVM native image compiles ahead of time, so there is essentially no warmup and startup is near-instant - ideal for serverless and scale-to-zero. The trade-off: lower peak throughput than a warmed-up JIT, and reduced runtime optimization. For a latency-sensitive long-running service, a warm JIT often still wins; for fast cold-start workloads, native image wins.
  • Warmup strategies - send synthetic traffic to a new instance before it joins the load balancer, so it compiles its hot paths before serving real users.
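
The warmup strategy can be sketched as a gate that exercises the hot path before the instance reports itself ready. Everything here is hypothetical: the class name, the idea that readiness is exposed through isReady(), and the iteration count, which you would tune by watching JFR compilation events rather than guess.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical readiness gate: run synthetic work first, then accept traffic. */
final class WarmupGate {

    private final AtomicBoolean ready = new AtomicBoolean(false);

    /** Runs the hot path repeatedly so the JIT can profile and compile it. */
    void warmUp(Runnable hotPath, int iterations) {
        for (int i = 0; i < iterations; i++) {
            try {
                hotPath.run();
            } catch (RuntimeException e) {
                // A failing synthetic request should not keep the instance out of rotation;
                // in a real service, log it.
            }
        }
        ready.set(true);
    }

    /** Wire this to the health endpoint the load balancer polls. */
    boolean isReady() {
        return ready.get();
    }
}
```

The synthetic traffic has to follow the same code paths as real requests, with realistic payload shapes, or the JIT will optimize for a profile that production never sees.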

Signature for connection pool exhaustion: requests stall, the database CPU and the downstream service both look healthy, but your app threads are waiting to acquire a connection.

This is “slow Java” that is not the JVM’s fault at all. Your HikariCP pool has 10 connections; under load, the 11th request waits for one to free up. The query is fast, the database is bored, and yet requests pile up. The fix is to size the connection pool in concert with your thread model and the downstream’s real capacity - a bigger pool is not always better, because the database has its own limits. Tune acquisition timeouts so a starved request fails fast and visibly instead of hanging. See our database query optimization guide when the queries themselves turn out to be the real cost.
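
A rough starting point for the pool size comes from Little's law: the average number of connections in use equals the request arrival rate multiplied by the average time each request holds a connection. The sketch below applies that with a headroom factor; the class name, parameter names, and the headroom value are assumptions, and the result still has to be checked against the database's own connection limit.

```java
/** Hypothetical sizing helper based on Little's law. */
final class PoolSizing {

    /**
     * @param requestsPerSecond arrival rate of requests that need a connection
     * @param meanHoldMillis    average time a request holds a connection
     * @param headroomFactor    multiplier above the average, e.g. 1.5 for bursts
     * @return suggested pool size, rounded up
     */
    static int connectionsNeeded(double requestsPerSecond,
                                 double meanHoldMillis,
                                 double headroomFactor) {
        double meanInUse = requestsPerSecond * meanHoldMillis / 1000.0;
        return (int) Math.ceil(meanInUse * headroomFactor);
    }
}
```

At 500 requests per second with each request holding a connection for 20 ms, 10 connections are busy on average; with 1.5x headroom the estimate is 15. If the hold time doubles during an incident, the same pool is suddenly too small, which is why the acquisition timeout matters.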

Bottleneck 7: Memory leaks and off-heap growth

Signature: memory climbs steadily until OutOfMemoryError, restarts buy you a few hours or days, then it climbs again.

Memory leaks in Java are not about forgetting to free memory - the GC handles that. They are about unintentionally retaining references so the GC cannot reclaim objects that are logically dead.

Classic on-heap leaks

Three patterns cause most of them:

  • Static collections - a static Map or List used as a cache with no eviction. It only grows.
  • Classloader leaks - common in app servers and hot-reload setups, where an old classloader (and everything it loaded) stays reachable across redeploys.
  • ThreadLocal leaks - values set on a pooled thread and never removed. Pooled threads are reused rather than terminated, so the value stays reachable for as long as the pool keeps the thread alive.

The tool is a heap dump analyzed in Eclipse MAT. MAT’s dominator tree and “leak suspects” report point straight at the object retaining the most memory. Trace the retention path back to the static field, classloader, or ThreadLocal holding it, then release or bound it.
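
For the static-collection case, "bound it" usually means giving the cache an eviction policy. A minimal sketch using LinkedHashMap in access order as an LRU cache follows; BoundedCache is a hypothetical name, and the class is not thread-safe on its own, so a shared instance would need Collections.synchronizedMap or a purpose-built caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache: evicts the least recently used entry once full. */
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, oldest access first
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insert; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```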

Off-heap and native growth

Here is the leak that fools everyone: the heap looks perfectly healthy, GC is calm, but the process RSS keeps climbing until the container OOM-kills it. The growth is off-heap - direct ByteBuffers, Netty’s pooled allocator, memory-mapped files, or native memory allocated through JNI. Heap-only tools are blind to it.

The right tool is Native Memory Tracking (NMT). Start the JVM with -XX:NativeMemoryTracking=summary, then use jcmd <pid> VM.native_memory summary to see where native memory is going - thread stacks, code cache, direct buffers, internal structures. Pair it with JFR for direct-buffer allocation events. Once you localize the category, the fix is usually bounding a buffer pool, closing resources deterministically, or capping -XX:MaxDirectMemorySize.
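
Direct and mapped buffer usage can also be read in-process through BufferPoolMXBean, which is handy as a metric to export alongside heap usage. The sketch below is a minimal reader with a hypothetical class name. It only sees buffers the JDK accounts for in its buffer pools, so allocators that obtain native memory by other routes will not show up here and still need NMT.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: read the JDK's own accounting of NIO buffer pools. */
final class DirectMemory {

    /** Bytes held by the named pool (for example "direct" or "mapped"), or -1 if absent. */
    static long usedBytes(String poolName) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals(poolName)) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    /** Number of buffers currently in the named pool, or -1 if absent. */
    static long bufferCount(String poolName) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals(poolName)) {
                return pool.getCount();
            }
        }
        return -1;
    }
}
```

A "direct" figure that climbs while the heap floor stays flat is the in-process version of the RSS-grows-heap-fine signature.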

Remediation path: confirm it is a leak (post-GC floor rises) → decide on-heap vs off-heap (heap dump vs NMT) → localize the retaining reference or allocator → bound or release it → verify the floor goes flat under sustained load.

The Java performance toolkit: what to measure with which tool

You do not need every tool. You need the right tool for the symptom in front of you. This is the mapping to keep at hand - symptom to metric to tool:

Symptom | Metric to capture | Tool
Latency spikes, GC suspected | Pause time, frequency, allocation rate | GC logs + JDK Flight Recorder
CPU hot, unknown method | Method-level CPU profile | async-profiler, JFR
Threads blocked, latency rises with load | Lock/monitor contention | async-profiler (lock mode)
Pool exhausted | Thread states, pool saturation | Thread dump, VisualVM
Heap grows to OOM | Retained heap, dominator tree | Heap dump + Eclipse MAT
RSS grows, heap fine | Native memory by category | NMT (jcmd VM.native_memory)
Need always-on prod visibility | Continuous CPU/alloc profile | Pyroscope, JFR streaming
Request-level tracing | Endpoint latency, downstream waits | Datadog / New Relic JVM APM

A few opinions on how to wield these:

Make JDK Flight Recorder your always-on default. JFR is built into the JDK, runs at low overhead (typically about 1% or less with the default settings), and can stream continuously in production. There is rarely a good reason not to have it on. When an incident hits, you already have the recording instead of scrambling to reproduce.
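
An always-on recording can be started from the command line or from code through the jdk.jfr API. The sketch below shows the programmatic route with the JDK's built-in "default" settings and a rolling retention window; the class name, recording name, and retention period are assumptions for illustration.

```java
import java.io.IOException;
import java.text.ParseException;
import java.time.Duration;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

/** Hypothetical bootstrap: start a rolling JFR recording at application startup. */
final class AlwaysOnJfr {

    /** Starts a recording that keeps roughly the last maxAge of events on disk. */
    static Recording start(Duration maxAge) {
        try {
            // "default" is the low-overhead settings file shipped with the JDK.
            Recording recording = new Recording(Configuration.getConfiguration("default"));
            recording.setName("always-on");
            recording.setToDisk(true);   // retention by age applies to disk recordings
            recording.setMaxAge(maxAge);
            recording.start();
            return recording;
        } catch (IOException | ParseException e) {
            throw new IllegalStateException("could not load the JFR 'default' configuration", e);
        }
    }
}
```

During an incident, calling dump(Path) on the returned Recording writes what has been captured so far to a file without stopping the recording.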

Continuous production profiling beats local reproduction. The hardest performance bugs only appear under real traffic, real data shapes, and real concurrency. Tools like Pyroscope and JFR streaming let you profile production safely and continuously, so the flame graph for “what was slow at 3pm” already exists. This is almost always faster than trying to recreate the conditions on your laptop. Our continuous profiling service sets exactly this up. For the broader telemetry foundation, see our OpenTelemetry instrumentation guide and our comparison of APM tools.

Know when to escalate. Some bottlenecks are not a tuning knob. If you have right-sized the heap, picked the correct collector, fixed the allocation hot spots, and the service still cannot meet its latency budget, the problem is architectural - a chatty service boundary, an N+1 across the network, a synchronous call that should be a queue. No GC flag fixes that. That is the line between backend tuning and a design change.

Stop guessing. Profile, then fix.

The seven bottlenecks above cover the overwhelming majority of slow Java services: GC pauses, heap pressure, thread pool starvation, lock contention, JIT warmup, connection pool exhaustion, and memory leaks. Every one has a signature, a confirming signal, and a named tool. Run the decision tree, capture the four signals, and you turn “the app is slow” into “this method allocates too much in the request path, here is the fix.”

Most teams know this in theory and still spend weeks guessing in practice - because doing it under incident pressure, on an unfamiliar service, with production stakes, is genuinely hard. That is the work we do every day.

Slow Java app? Book a Performance Audit - five days of JVM profiling that identifies your top bottlenecks and hands you a validated remediation plan. Our backend optimisation engagements routinely take a p99 from 800ms to under 150ms. Bring us the symptom; we will find the cause.

Frequently Asked Questions

How do you monitor Java application performance?

Start at the symptom, not the tool. Capture four signals first: GC logs, thread dumps, heap usage, and a method-level CPU profile. Pair an APM agent (Datadog, New Relic) for request-level traces with a JVM-internal profiler like JDK Flight Recorder or async-profiler, because APM alone misses GC pauses, lock contention and allocation hot spots. Map each symptom to the JVM subsystem it implicates, then confirm with the matching tool before changing anything.

What are the most common JVM performance bottlenecks?

The seven that cover most slow Java services are: garbage collection pauses, heap pressure, thread pool starvation, lock contention, JIT warmup, connection pool exhaustion, and memory leaks (including off-heap growth). Each has a self-contained signature: long stop-the-world pauses point to GC, a maxed thread pool that looks like a slow downstream points to starvation, and steady heap growth until OOM points to a leak.

How do you diagnose garbage collection problems in Java?

Enable GC logging and read three numbers: pause time, pause frequency, and allocation rate. High allocation rate drives both pause frequency and heap pressure, so an allocation profile from async-profiler or JFR usually finds the hot spot causing it. If pauses are long but the heap is healthy, switch collectors: ZGC or Shenandoah for low pause times, G1 for balanced throughput. If the post-GC heap floor keeps climbing, you have a leak, not a tuning problem.

What is the best tool for profiling a Java application?

JDK Flight Recorder (JFR) is the best default - it is built into the JDK, runs with low overhead, and can stay always-on in production. Pair it with async-profiler for low-overhead CPU, allocation and lock profiling with proper native stack traces. Use VisualVM for quick local inspection, Pyroscope for continuous production profiling, and NMT (Native Memory Tracking) when growth is off-heap. There is no single best tool; match the tool to the symptom.

How do you find a memory leak in a Java application?

Watch for a post-GC heap floor that keeps rising until OutOfMemoryError. Take a heap dump and analyze it with Eclipse MAT, using the dominator tree to find what retains the most memory - usually a static collection, a classloader leak, or an unbounded ThreadLocal. If the heap looks fine but the process RSS keeps climbing, the growth is off-heap (direct buffers, Netty, JNI); use Native Memory Tracking (NMT) to localize it. Fix the retained reference or bound the allocation.
