APIs That Respond in Milliseconds, Not Seconds
Specialised API performance engineering for REST, GraphQL, and gRPC. Distributed tracing to find latency contributors, payload optimisation, and caching strategy — reducing P99 from seconds to milliseconds without changing your API contract.
You might be experiencing...
API performance is the critical path between your infrastructure and your users’ experience. A P99 latency of 2 seconds at the API layer means the slowest 1 in 100 user interactions takes at least 2 seconds, and at scale that 1% becomes thousands of users per hour. Distributed tracing is the tool that makes API latency tractable: by tracing every step of a request, from the entry point through every service call and database query, we identify exactly where time is spent.
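To illustrate what a trace waterfall captures, here is a minimal Python sketch of span timing. All names and timings are hypothetical, and a real deployment would use OpenTelemetry rather than hand-rolled timers:

```python
import time
from contextlib import contextmanager

spans = []  # (name, start, duration) records, as a tracer would collect


@contextmanager
def span(name):
    """Time one unit of work, mimicking a single bar in a trace waterfall."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append((name, start, time.perf_counter() - start))


# A request whose latency is dominated by one downstream call:
with span("GET /orders"):
    with span("auth.check"):
        time.sleep(0.01)
    with span("db.query.orders"):
        time.sleep(0.05)  # the dominant latency contributor

# Ignore the root span and find the slowest child:
children = [s for s in spans if s[0] != "GET /orders"]
slowest = max(children, key=lambda s: s[2])
assert slowest[0] == "db.query.orders"
```

The same per-span data, collected across service boundaries, is what lets a flame graph point at the one database call responsible for most of the P99.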
GraphQL performance deserves specific attention because the resolver pattern creates a structural temptation toward N+1 problems: each field resolver may independently query the database, producing O(n) database calls for a query returning n records. DataLoader batching collapses those into O(1) batched calls, often reducing database load by an order of magnitude. We find N+1 resolver patterns in the majority of GraphQL APIs we audit.
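The effect of batching can be sketched in a few lines of Python. The in-memory "database" and query counter below are hypothetical stand-ins; a real GraphQL server would use DataLoader (Node) or an equivalent batching library:

```python
# Hypothetical in-memory "database" standing in for real queries.
DB = {1: "Widget", 2: "Gadget", 3: "Gizmo"}
query_count = 0


def fetch_one(product_id):
    """N+1 pattern: one database query per resolved field."""
    global query_count
    query_count += 1
    return DB[product_id]


def fetch_many(product_ids):
    """Batched pattern: one query for the whole set of ids."""
    global query_count
    query_count += 1
    return {pid: DB[pid] for pid in product_ids}


# N+1: resolving 3 records costs 3 queries (O(n)).
query_count = 0
naive = [fetch_one(pid) for pid in [1, 2, 3]]
assert query_count == 3

# Batched: a loader collects the ids first, then issues one query (O(1)).
query_count = 0
batch = fetch_many([1, 2, 3])
assert query_count == 1
```

DataLoader does exactly this collection step automatically, deferring each resolver's lookup until the end of the event-loop tick and issuing one batched query for all collected keys.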
Payload optimisation is the other major lever: REST APIs that return every field regardless of client need transfer unnecessary data, slowing responses, increasing bandwidth costs, and forcing clients to parse data they discard. Response compression, field selection (sparse fieldsets), and pagination design each reduce payload size. For mobile clients in particular, cutting a payload from 48 MB to 2 MB is roughly a 96% reduction in transfer size, with corresponding savings in parse time and data cost.
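A sparse-fieldset filter is simple to sketch. The record shape below is hypothetical, but the pattern (JSON:API-style `?fields=id,name`) is the one described above:

```python
import json


def select_fields(record, fields):
    """Return only the fields the client asked for, e.g. ?fields=id,name."""
    wanted = set(fields.split(","))
    return {k: v for k, v in record.items() if k in wanted}


product = {
    "id": 7,
    "name": "Widget",
    "description": "long marketing copy " * 500,  # rarely needed by list views
}

full = json.dumps(product)
slim = json.dumps(select_fields(product, "id,name"))
assert len(slim) < len(full) // 10  # payload shrinks by over 10x here
```

The savings compound across list endpoints, where the same unwanted fields are repeated for every item on every page.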
Engagement Phases
Distributed Tracing Analysis
We instrument your API endpoints with OpenTelemetry (or connect to your existing Jaeger/Zipkin/Datadog APM) to produce per-request waterfall traces. For GraphQL, we add resolver-level tracing to identify N+1 resolver patterns and slow field resolution. We build flame graphs mapping latency to specific code paths.
Payload & Protocol Optimisation
We analyse payload size and structure for unnecessary data transfer: over-fetching in REST (fields returned that no client uses), N+1 resolvers in GraphQL (DataLoader implementation or query batching), serialisation overhead (JSON vs MessagePack vs Protobuf evaluation). We implement response compression, field selection, and pagination optimisation.
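As a small illustration of what response compression alone buys on repetitive JSON, using only the Python standard library (the payload shape is hypothetical):

```python
import gzip
import json

# A typical repetitive list payload: many items with the same keys.
payload = json.dumps(
    {"items": [{"id": i, "name": f"product-{i}", "in_stock": True} for i in range(500)]}
).encode()

compressed = gzip.compress(payload)

# Repetitive JSON compresses well; actual ratios vary with payload shape.
assert len(compressed) < len(payload) // 2
```

Compression is complementary to field selection: compression shrinks the bytes on the wire, while field selection also removes the client-side cost of parsing data that is then discarded.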
Caching Strategy Implementation
We design and implement a caching strategy appropriate for your API's consistency requirements: Redis or Memcached for mutable data with event-driven invalidation, HTTP cache headers for public endpoints, CDN caching for authenticated-but-cacheable responses. We instrument cache hit rates and validate under load.
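The core pattern (TTL-based expiry plus event-driven invalidation) can be sketched with an in-memory dictionary standing in for Redis; all names here are hypothetical:

```python
import time


class Cache:
    """TTL cache with explicit invalidation, standing in for Redis."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]
        return None

    def set(self, key, value, ttl=60):
        self._store[key] = (value, time.monotonic() + ttl)

    def invalidate(self, key):
        self._store.pop(key, None)


db_reads = 0
cache = Cache()


def get_product(pid):
    global db_reads
    cached = cache.get(f"product:{pid}")
    if cached is not None:
        return cached
    db_reads += 1  # cache miss: hit the database
    value = {"id": pid, "price": 100}
    cache.set(f"product:{pid}", value, ttl=60)
    return value


get_product(1)
get_product(1)
assert db_reads == 1  # second read served from cache

cache.invalidate("product:1")  # event-driven: fired when the price changes
get_product(1)
assert db_reads == 2  # invalidation forces a fresh read
```

The TTL bounds staleness if an invalidation event is ever missed, while event-driven invalidation keeps hot keys fresh without waiting for expiry.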
Deliverables
Before & After
| Metric | Before | After |
|---|---|---|
| P99 API latency | 2.4 s | 180 ms |
| Payload size | 48 MB | 2.1 MB |
| Cache hit rate | 0% | 87% |
Tools We Use
Frequently Asked Questions
Can you optimise GraphQL APIs specifically?
Yes. GraphQL performance requires different tooling from REST — resolver tracing, DataLoader for batching, query complexity analysis, and persisted queries. We add resolver-level spans to identify which fields are slow and why, and implement DataLoader patterns to eliminate N+1 resolver chains. Most GraphQL APIs we audit have significant N+1 problems at the resolver layer.
How do you handle caching for APIs with personalised responses?
Personalised responses require per-user cache keys, which changes the cache hit rate economics significantly. For personalised APIs, we focus on partial caching — caching the slow, shared parts of the response (product catalog, pricing, permissions) and assembling the personalised elements cheaply. We also evaluate edge-side includes and stale-while-revalidate patterns for near-real-time personalisation.
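The partial-caching idea can be sketched as follows: the shared, expensive part is fetched once and cached, and the per-user part is applied cheaply at assembly time. The catalog, discount, and user names are hypothetical:

```python
# Shared cache for the part of the response that is identical for all users.
catalog_fetches = 0
shared_cache = {}


def get_catalog():
    """Expensive and identical for every user, so cache it once."""
    global catalog_fetches
    if "catalog" not in shared_cache:
        catalog_fetches += 1  # only the first caller pays this cost
        shared_cache["catalog"] = [{"id": 1, "price": 100}]
    return shared_cache["catalog"]


def get_response(user_id, discount):
    """Assemble a personalised response around the cached shared core."""
    items = [
        {**item, "price": item["price"] * (1 - discount)} for item in get_catalog()
    ]
    return {"user": user_id, "items": items}


a = get_response("alice", 0.5)
b = get_response("bob", 0.0)
assert catalog_fetches == 1  # shared part fetched once, served to both users
assert a["items"][0]["price"] == 50.0
```

The cache key economics improve because the expensive entry is shared across all users instead of duplicated per user.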
What about gRPC APIs?
gRPC APIs typically have better serialisation efficiency than JSON REST by default thanks to Protobuf, but they can still suffer from inefficient connection management and from chatty unary calls where server streaming would be more appropriate. We profile gRPC services with interceptors and analyse the generated Protobuf schemas for unnecessary complexity.
Your P99 Deserves Better
Book a free 30-minute performance scope call with our engineers. We review your latency profile, identify the most impactful optimisation target, and scope a sprint to fix it.
Talk to an Expert