APIs That Respond in Milliseconds, Not Seconds
Specialised API performance engineering for REST, GraphQL, and gRPC. Distributed tracing to find latency contributors, payload optimisation, and caching strategy — reducing P99 from seconds to milliseconds without changing your API contract.
You might be experiencing...
API performance is the critical path between your infrastructure and your users’ experience. A P99 latency of 2 seconds at the API layer means the slowest 1 in 100 user interactions takes at least 2 seconds, and at scale that 1% becomes thousands of users per hour. Distributed tracing is the tool that makes API latency tractable: by tracing every step of a request, from the entry point through every service call and database query, we identify exactly where time is spent.
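To illustrate what a trace waterfall captures, here is a minimal Python sketch of span timing. All names and timings are hypothetical, and a real deployment would use OpenTelemetry rather than hand-rolled timers:

```python
import time
from contextlib import contextmanager

spans = []  # (name, start, duration) records, as a tracer would collect


@contextmanager
def span(name):
    """Time one unit of work, mimicking a single bar in a trace waterfall."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append((name, start, time.perf_counter() - start))


# A request whose latency is dominated by one downstream call:
with span("GET /orders"):
    with span("auth.check"):
        time.sleep(0.01)
    with span("db.query.orders"):
        time.sleep(0.05)  # the dominant latency contributor

# Ignore the root span and find the slowest child:
children = [s for s in spans if s[0] != "GET /orders"]
slowest = max(children, key=lambda s: s[2])
assert slowest[0] == "db.query.orders"
```

The same per-span data, collected across service boundaries, is what lets a flame graph point at the one database call responsible for most of the P99.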
GraphQL performance deserves specific attention because the resolver pattern creates a structural temptation toward N+1 problems: each field resolver may independently query the database, producing O(n) database calls for a query returning n records. DataLoader batching collapses those into O(1) batched calls, often reducing database load by an order of magnitude. We find N+1 resolver patterns in the majority of GraphQL APIs we audit.
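The effect of batching can be sketched in a few lines of Python. The in-memory "database" and query counter below are hypothetical stand-ins; a real GraphQL server would use DataLoader (Node) or an equivalent batching library:

```python
# Hypothetical in-memory "database" standing in for real queries.
DB = {1: "Widget", 2: "Gadget", 3: "Gizmo"}
query_count = 0


def fetch_one(product_id):
    """N+1 pattern: one database query per resolved field."""
    global query_count
    query_count += 1
    return DB[product_id]


def fetch_many(product_ids):
    """Batched pattern: one query for the whole set of ids."""
    global query_count
    query_count += 1
    return {pid: DB[pid] for pid in product_ids}


# N+1: resolving 3 records costs 3 queries (O(n)).
query_count = 0
naive = [fetch_one(pid) for pid in [1, 2, 3]]
assert query_count == 3

# Batched: a loader collects the ids first, then issues one query (O(1)).
query_count = 0
batch = fetch_many([1, 2, 3])
assert query_count == 1
```

DataLoader does exactly this collection step automatically, deferring each resolver's lookup until the end of the event-loop tick and issuing one batched query for all collected keys.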
Payload optimisation is the other major lever: REST APIs that return every field regardless of client need transfer unnecessary data, slowing responses, increasing bandwidth costs, and forcing clients to parse data they discard. Response compression, field selection (sparse fieldsets), and pagination design each reduce payload size. For mobile clients in particular, cutting a payload from 48 MB to 2 MB is roughly a 96% reduction in transfer size, with corresponding savings in parse time and data cost.
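A sparse-fieldset filter is simple to sketch. The record shape below is hypothetical, but the pattern (JSON:API-style `?fields=id,name`) is the one described above:

```python
import json


def select_fields(record, fields):
    """Return only the fields the client asked for, e.g. ?fields=id,name."""
    wanted = set(fields.split(","))
    return {k: v for k, v in record.items() if k in wanted}


product = {
    "id": 7,
    "name": "Widget",
    "description": "long marketing copy " * 500,  # rarely needed by list views
}

full = json.dumps(product)
slim = json.dumps(select_fields(product, "id,name"))
assert len(slim) < len(full) // 10  # payload shrinks by over 10x here
```

The savings compound across list endpoints, where the same unwanted fields are repeated for every item on every page.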
Engagement Phases
Distributed Tracing Analysis
We instrument your API endpoints with OpenTelemetry (or connect to your existing Jaeger/Zipkin/Datadog APM) to produce per-request waterfall traces. For GraphQL, we add resolver-level tracing to identify N+1 resolver patterns and slow field resolution. We build flame graphs mapping latency to specific code paths.
Payload & Protocol Optimisation
We analyse payload size and structure for unnecessary data transfer: over-fetching in REST (fields returned that no client uses), N+1 resolvers in GraphQL (DataLoader implementation or query batching), serialisation overhead (JSON vs MessagePack vs Protobuf evaluation). We implement response compression, field selection, and pagination optimisation.
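As a small illustration of what response compression alone buys on repetitive JSON, using only the Python standard library (the payload shape is hypothetical):

```python
import gzip
import json

# A typical repetitive list payload: many items with the same keys.
payload = json.dumps(
    {"items": [{"id": i, "name": f"product-{i}", "in_stock": True} for i in range(500)]}
).encode()

compressed = gzip.compress(payload)

# Repetitive JSON compresses well; actual ratios vary with payload shape.
assert len(compressed) < len(payload) // 2
```

Compression is complementary to field selection: compression shrinks the bytes on the wire, while field selection also removes the client-side cost of parsing data that is then discarded.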
Caching Strategy Implementation
We design and implement a caching strategy appropriate for your API's consistency requirements: Redis or Memcached for mutable data with event-driven invalidation, HTTP cache headers for public endpoints, CDN caching for authenticated-but-cacheable responses. We instrument cache hit rates and validate under load.
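The core pattern (TTL-based expiry plus event-driven invalidation) can be sketched with an in-memory dictionary standing in for Redis; all names here are hypothetical:

```python
import time


class Cache:
    """TTL cache with explicit invalidation, standing in for Redis."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]
        return None

    def set(self, key, value, ttl=60):
        self._store[key] = (value, time.monotonic() + ttl)

    def invalidate(self, key):
        self._store.pop(key, None)


db_reads = 0
cache = Cache()


def get_product(pid):
    global db_reads
    cached = cache.get(f"product:{pid}")
    if cached is not None:
        return cached
    db_reads += 1  # cache miss: hit the database
    value = {"id": pid, "price": 100}
    cache.set(f"product:{pid}", value, ttl=60)
    return value


get_product(1)
get_product(1)
assert db_reads == 1  # second read served from cache

cache.invalidate("product:1")  # event-driven: fired when the price changes
get_product(1)
assert db_reads == 2  # invalidation forces a fresh read
```

The TTL bounds staleness if an invalidation event is ever missed, while event-driven invalidation keeps hot keys fresh without waiting for expiry.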
Deliverables
Before & After
| Metric | Before | After |
|---|---|---|
| P99 API latency | 2.4 s | 180 ms |
| Payload size | 48 MB | 2.1 MB |
| Cache hit rate | 0% | 87% |
Tools We Use
Frequently Asked Questions
Can you optimise GraphQL APIs specifically?
Yes. GraphQL performance requires different tooling from REST — resolver tracing, DataLoader for batching, query complexity analysis, and persisted queries. We add resolver-level spans to identify which fields are slow and why, and implement DataLoader patterns to eliminate N+1 resolver chains. Most GraphQL APIs we audit have significant N+1 problems at the resolver layer.
How do you handle caching for APIs with personalised responses?
Personalised responses require per-user cache keys, which changes the cache hit rate economics significantly. For personalised APIs, we focus on partial caching — caching the slow, shared parts of the response (product catalog, pricing, permissions) and assembling the personalised elements cheaply. We also evaluate edge-side includes and stale-while-revalidate patterns for near-real-time personalisation.
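The partial-caching idea can be sketched as follows: the shared, expensive part is fetched once and cached, and the per-user part is applied cheaply at assembly time. The catalog, discount, and user names are hypothetical:

```python
# Shared cache for the part of the response that is identical for all users.
catalog_fetches = 0
shared_cache = {}


def get_catalog():
    """Expensive and identical for every user, so cache it once."""
    global catalog_fetches
    if "catalog" not in shared_cache:
        catalog_fetches += 1  # only the first caller pays this cost
        shared_cache["catalog"] = [{"id": 1, "price": 100}]
    return shared_cache["catalog"]


def get_response(user_id, discount):
    """Assemble a personalised response around the cached shared core."""
    items = [
        {**item, "price": item["price"] * (1 - discount)} for item in get_catalog()
    ]
    return {"user": user_id, "items": items}


a = get_response("alice", 0.5)
b = get_response("bob", 0.0)
assert catalog_fetches == 1  # shared part fetched once, served to both users
assert a["items"][0]["price"] == 50.0
```

The cache key economics improve because the expensive entry is shared across all users instead of duplicated per user.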
What about gRPC APIs?
gRPC APIs typically have better serialisation efficiency than JSON REST by default thanks to Protobuf, but they can still suffer from inefficient connection management and from chatty unary calls where server streaming would be more appropriate. We profile gRPC services with interceptors and analyse the generated Protobuf schemas for unnecessary complexity.
Your P99 Deserves Better
Book a free 30-minute performance scope call with our engineers. We review your latency profile, identify the most impactful optimisation target, and scope a sprint to fix it.
Talk to an Expert