OpenTelemetry Instrumentation: From Zero to Distributed Tracing in 30 Minutes
Step-by-step OpenTelemetry setup guide - auto-instrument Node.js and Python apps, configure the Collector, and send traces to Grafana or Datadog.
OpenTelemetry (OTEL) is the open-source observability framework that has become the industry standard for distributed tracing, metrics, and logs. It provides a vendor-neutral instrumentation layer that lets you collect telemetry data once and send it to any backend - Grafana, Datadog, New Relic, Jaeger, or your own infrastructure.
Before OpenTelemetry, switching APM vendors meant re-instrumenting your entire application. With OTEL, you instrument once and change only the Collector configuration when you switch backends. This guide gets you from zero to working distributed traces in a Node.js or Python application, with the Collector routing data to Grafana Tempo (free) or Datadog.
Why OpenTelemetry Matters
The problem OpenTelemetry solves is vendor lock-in at the instrumentation layer. Pre-OTEL, APM vendors provided proprietary SDK clients. Installing the Datadog SDK meant your instrumentation code was tightly coupled to Datadog. Migrating to New Relic required replacing all instrumentation calls.
OpenTelemetry provides a standard API that works with any OTEL-compatible backend. Your application code calls the OTEL API. The OTEL SDK collects the data. The OTEL Collector routes it to your chosen backend.
Key benefits:
- Instrument once, send anywhere
- Auto-instrumentation for popular frameworks (Express, FastAPI, Django, gRPC, etc.)
- W3C TraceContext standard for trace propagation across service boundaries
- Supported by every major observability vendor
- CNCF incubating project with strong governance and long-term support
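The W3C TraceContext header that carries trace identity across service boundaries is a plain string, so its shape is easy to inspect without the SDK. A rough sketch (`parse_traceparent` is a hypothetical helper for illustration, not part of the OTEL API; the example IDs come from the W3C spec):

```python
def parse_traceparent(header: str) -> dict:
    """Split a W3C traceparent header into its four fields.

    Format: version(2 hex)-trace_id(32 hex)-parent_id(16 hex)-flags(2 hex)
    """
    version, trace_id, parent_id, flags = header.split("-")
    return {
        "version": version,
        "trace_id": trace_id,    # 16-byte ID shared by every span in the trace
        "parent_id": parent_id,  # 8-byte ID of the calling span
        "sampled": int(flags, 16) & 0x01 == 1,  # trace-flags bit 0 = sampled
    }

parts = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
```

Every OTEL SDK reads and writes this header automatically; you only touch it directly when bridging a system that is not yet instrumented.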
OpenTelemetry Architecture
Understanding the three components prevents confusion:
OTEL SDK (in your application): The library your application code links against. Provides the API for creating spans, recording metrics, and emitting logs. Available for 12+ languages.
OTEL Collector (separate process): Receives telemetry from your applications, processes it (batching, sampling, filtering, transforming), and exports it to backends. Runs as a sidecar container or as a cluster-wide deployment.
Backend (observability tool): Receives processed telemetry from the Collector. Jaeger, Grafana Tempo, Zipkin, Datadog, New Relic, Honeycomb - anything that speaks OTLP (OpenTelemetry Protocol).
The typical data flow:
Application (OTEL SDK) --> OTEL Collector --> Backend (Grafana/Datadog/etc.)
Node.js Auto-Instrumentation
Auto-instrumentation patches popular libraries at startup without requiring code changes. Express routes, HTTP calls, database queries, Redis operations, and more are automatically traced.
Install the dependencies:
npm install @opentelemetry/sdk-node \
  @opentelemetry/sdk-metrics \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-grpc \
  @opentelemetry/exporter-metrics-otlp-grpc \
  @opentelemetry/resources \
  @opentelemetry/semantic-conventions
Create instrumentation.js (must be required before any other module):
// instrumentation.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-grpc');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');
const { Resource } = require('@opentelemetry/resources');
const { SEMRESATTRS_SERVICE_NAME, SEMRESATTRS_SERVICE_VERSION } = require('@opentelemetry/semantic-conventions');
const sdk = new NodeSDK({
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: process.env.SERVICE_NAME || 'my-service',
    [SEMRESATTRS_SERVICE_VERSION]: process.env.SERVICE_VERSION || '1.0.0',
    'deployment.environment': process.env.NODE_ENV || 'production',
  }),
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://otel-collector:4317',
  }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://otel-collector:4317',
    }),
    exportIntervalMillis: 30000, // Export metrics every 30 seconds
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      // Instrument HTTP calls (includes Express route handling)
      '@opentelemetry/instrumentation-http': {
        enabled: true,
        // Filter out health check endpoints from traces
        ignoreIncomingRequestHook: (req) => req.url === '/health',
      },
      // Instrument PostgreSQL queries
      '@opentelemetry/instrumentation-pg': { enabled: true },
      // Instrument Redis operations
      '@opentelemetry/instrumentation-redis': { enabled: true },
      // Instrument gRPC calls
      '@opentelemetry/instrumentation-grpc': { enabled: true },
    }),
  ],
});

sdk.start();

// Ensure clean shutdown
process.on('SIGTERM', () => {
  sdk.shutdown()
    .then(() => console.log('OpenTelemetry SDK shut down successfully'))
    .catch((error) => console.error('Error shutting down OpenTelemetry SDK', error))
    .finally(() => process.exit(0));
});
Start your application with the instrumentation:
node -r ./instrumentation.js app.js
# Or set NODE_OPTIONS for automatic loading:
NODE_OPTIONS="--require ./instrumentation.js" node app.js
With this setup, every Express route, PostgreSQL query, and Redis operation will automatically generate traces with timing, status codes, and error information.
Python Auto-Instrumentation
Python auto-instrumentation uses a similar approach but leverages the opentelemetry-instrument command-line wrapper.
Install dependencies:
pip install opentelemetry-distro opentelemetry-exporter-otlp-proto-grpc
opentelemetry-bootstrap -a install # Automatically installs instrumentation packages
Configure via environment variables (no code changes required for basic setup):
export OTEL_SERVICE_NAME="payment-service"
export OTEL_SERVICE_VERSION="2.1.0"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"
export OTEL_TRACES_EXPORTER="otlp"
export OTEL_METRICS_EXPORTER="otlp"
export OTEL_LOGS_EXPORTER="otlp"
export OTEL_PYTHON_LOG_CORRELATION="true" # Inject trace IDs into log records
# Start FastAPI app with auto-instrumentation
opentelemetry-instrument uvicorn main:app --host 0.0.0.0 --port 8000
Or configure via code for more control:
# otel_setup.py
from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
import os
def configure_otel():
    resource = Resource.create({
        "service.name": os.getenv("SERVICE_NAME", "payment-service"),
        "service.version": os.getenv("SERVICE_VERSION", "1.0.0"),
        "deployment.environment": os.getenv("ENVIRONMENT", "production"),
    })

    # Configure tracing
    tracer_provider = TracerProvider(resource=resource)
    tracer_provider.add_span_processor(
        BatchSpanProcessor(
            OTLPSpanExporter(endpoint=os.getenv("OTEL_ENDPOINT", "http://otel-collector:4317"))
        )
    )
    trace.set_tracer_provider(tracer_provider)

    # Configure metrics
    reader = PeriodicExportingMetricReader(
        OTLPMetricExporter(endpoint=os.getenv("OTEL_ENDPOINT", "http://otel-collector:4317")),
        export_interval_millis=30000,
    )
    meter_provider = MeterProvider(resource=resource, metric_readers=[reader])
    metrics.set_meter_provider(meter_provider)

configure_otel()
OTEL Collector Configuration
The Collector is the routing layer between your applications and backends. Configure it as a Kubernetes deployment or sidecar.
otel-collector-config.yaml:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  # Add memory limit to prevent OOM
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  # Batch telemetry before sending to reduce API calls
  batch:
    timeout: 5s
    send_batch_size: 1024
  # Add resource attributes to all telemetry
  resource:
    attributes:
      - key: environment
        value: "production"
        action: upsert
  # Drop noisy health-check spans
  filter:
    spans:
      exclude:
        match_type: regexp
        span_names:
          - "^.*health.*$"
          - "^.*readiness.*$"

exporters:
  # Send traces to Grafana Tempo
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  # Expose a scrape endpoint for Prometheus (which Grafana queries)
  prometheus:
    endpoint: 0.0.0.0:8889
  # Optionally send to Datadog
  datadog:
    api:
      key: "${DD_API_KEY}"
      site: datadoghq.com
  # Debug logging (disable in production)
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, resource, filter]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
Deploy the Collector in Kubernetes:
# otel-collector-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: observability
spec:
  replicas: 2
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.95.0
          args: ["--config=/conf/config.yaml"]
          ports:
            - containerPort: 4317 # OTLP gRPC
            - containerPort: 4318 # OTLP HTTP
            - containerPort: 8889 # Prometheus metrics
          volumeMounts:
            - name: collector-config
              mountPath: /conf
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 512Mi
      volumes:
        - name: collector-config
          configMap:
            name: otel-collector-config
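The Deployment mounts its configuration from a ConfigMap named otel-collector-config. One way to supply it, sketched here on the assumption that the Collector config above is stored under the data key config.yaml (the key must match the path passed via --config):

```yaml
# otel-collector-configmap.yaml (sketch)
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: observability
data:
  config.yaml: |
    # paste the contents of otel-collector-config.yaml from above here
```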
Adding Custom Spans
Auto-instrumentation traces framework operations. For business logic, add custom spans manually.
Node.js custom spans:
const { trace, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('payment-service', '1.0.0');

async function processPayment(orderId, amount, paymentMethod) {
  // Create a span for the entire payment processing operation
  return tracer.startActiveSpan('process-payment', async (span) => {
    try {
      // Add business context as span attributes (values must be primitives)
      span.setAttributes({
        'order.id': orderId,
        'payment.amount': amount,
        'payment.method': paymentMethod.gateway,
        'payment.currency': 'USD',
      });

      // Child span for fraud check
      const fraudResult = await tracer.startActiveSpan('fraud-check', async (fraudSpan) => {
        try {
          const result = await fraudCheckService.check({ orderId, amount });
          fraudSpan.setAttribute('fraud.score', result.score);
          fraudSpan.setAttribute('fraud.decision', result.decision);
          return result;
        } finally {
          fraudSpan.end();
        }
      });

      if (fraudResult.decision === 'block') {
        span.setStatus({ code: SpanStatusCode.ERROR, message: 'Payment blocked by fraud check' });
        span.setAttribute('payment.outcome', 'blocked');
        throw new Error('Payment blocked');
      }

      // Child span for payment gateway call
      const chargeResult = await tracer.startActiveSpan('charge-gateway', async (chargeSpan) => {
        try {
          chargeSpan.setAttribute('gateway.name', paymentMethod.gateway);
          const result = await paymentGateway.charge({ amount, token: paymentMethod.token });
          chargeSpan.setAttribute('gateway.transaction_id', result.transactionId);
          return result;
        } finally {
          chargeSpan.end();
        }
      });

      span.setAttribute('payment.outcome', 'success');
      span.setAttribute('payment.transaction_id', chargeResult.transactionId);
      return chargeResult;
    } catch (error) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      span.recordException(error);
      throw error;
    } finally {
      span.end();
    }
  });
}
Common Pitfalls
Pitfall 1: Sampling too aggressively. Many teams set a 1% head-based sampling rate and then cannot find traces for rare events (errors, slow outliers). Use tail-based sampling in the Collector: sample 100% of traces that contain errors or exceed a latency threshold, and sample 1-10% of successful fast traces.
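A tail-sampling policy along these lines can be expressed in the Collector's contrib distribution; the thresholds and policy names below are illustrative, not recommendations:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s          # hold spans until the whole trace has likely arrived
    policies:
      - name: keep-errors       # 100% of traces containing an error status
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow         # 100% of traces slower than the threshold
        type: latency
        latency: {threshold_ms: 500}
      - name: baseline          # 5% of everything else
        type: probabilistic
        probabilistic: {sampling_percentage: 5}
```

A trace is kept if any policy matches, so errors and slow requests survive even at a low baseline rate. Note that tail sampling requires all spans of a trace to reach the same Collector instance, which constrains load balancing.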
Pitfall 2: Not setting semantic conventions. OTEL defines standard attribute names for common concepts (http.method, db.system, rpc.service). Using standard names means your data works with pre-built dashboards and alerts in Grafana and other tools. Using custom names means you lose compatibility.
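The difference is nothing more than attribute naming, illustrated here with hypothetical attribute dictionaries (values are examples):

```python
# Ad-hoc names: tools cannot recognize these, so pre-built
# dashboards and alerts will not pick them up
nonstandard = {"httpVerb": "GET", "database_type": "postgres"}

# Semantic-convention names: recognized by any OTEL-aware backend
standard = {"http.method": "GET", "db.system": "postgresql"}
```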
Pitfall 3: No resource attributes. Without service.name, service.version, and deployment.environment attributes, traces from different services look identical. Always set these in your SDK configuration.
Pitfall 4: Sending traces directly to the backend. Always use the Collector. It handles batching, retries on backend failures, and sampling - things that are complex and resource-intensive to implement in every application SDK.
Pitfall 5: Ignoring cardinality in metrics. OTEL metrics with high-cardinality attributes (user ID, request ID, IP address) will cause backend cardinality explosions. Use low-cardinality attributes for metrics (service name, endpoint path, status code range). Save high-cardinality data for trace attributes.
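One common mitigation is to collapse high-cardinality values into buckets before attaching them to metrics. A minimal sketch (`status_class` and `route_template` are hypothetical helpers, not OTEL APIs):

```python
def status_class(status_code: int) -> str:
    """Collapse individual HTTP status codes into low-cardinality classes."""
    return f"{status_code // 100}xx"

def route_template(path: str) -> str:
    """Replace numeric path segments with a placeholder so /users/42 and
    /users/97 count toward the same metric series."""
    return "/".join(":id" if seg.isdigit() else seg for seg in path.split("/"))
```

With these helpers, a request counter would carry attributes like `{"http.route": route_template(path), "http.status_class": status_class(code)}` while the raw user ID and full URL stay on the trace, where cardinality is cheap.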
Implemented well, distributed tracing lets engineering teams debug complex microservice performance problems in minutes instead of hours. Our observability setup service takes your team from zero to production-grade distributed tracing with proper sampling, dashboards, and alerting.
Your P99 Deserves Better
Book a free 30-minute performance scope call with our engineers. We review your latency profile, identify the most impactful optimization target, and scope a sprint to fix it.