May 23, 2026 | 6 min read

Building Observable Go APIs With OpenTelemetry & Prometheus

Spandan Ghosh

Content @One2N

Teams rarely set out to build black-box APIs, yet that is exactly how many of them behave in production. When latency spikes or errors appear, teams often know something is wrong but still cannot explain why quickly enough.

In this post, we’ll build a small Go API in a way that makes it observable from day zero, not after the first painful incident. We’ll use OpenTelemetry for traces and context propagation, Go’s slog for structured logs, and Prometheus for metrics, because the goal is not to collect more telemetry but to make production behaviour explainable.

If you want to get comfortable building production-grade REST APIs in Go first, start with our Go Bootcamp.

What observable means

Monitoring tells you whether an API is up; observability helps you explain strange behaviour you did not explicitly predict when you shipped it. For an API, that means being able to answer which requests are failing, how latency varies by endpoint or tenant, and what changed around the time things broke.

The key is that metrics, logs, and traces must reinforce each other. Metrics tell you that something moved, traces tell you where time went, and logs give you detailed request context.

The demo service

To make this a follow-along, we’ll build a tiny service with two endpoints:

GET /health for a quick health check.
GET /users/{id} that simulates a downstream dependency call.

By the end, the service will:

Return a request ID in every response.
Emit structured logs for every request.
Expose Prometheus metrics on /metrics.
Create traces that show request flow and downstream latency.

That design-first approach is the same mindset behind Why your Architecture should start with Questions, not boxes: the important operational choices are made before production traffic shows up.

Step 1: Start with the API contract

Most teams treat observability as an implementation detail, but some of the highest-value decisions belong in the API contract. A standard response shape gives clients a predictable model and gives operators a reliable way to correlate failures across systems.

Use a single response envelope like this:

type APIResponse struct {
	Data      any    `json:"data,omitempty"`
	Error     string `json:"error,omitempty"`
	ErrorCode string `json:"error_code,omitempty"`
	RequestID string `json:"request_id"`
}

And a small helper to write responses consistently:

func writeJSON(w http.ResponseWriter, status int, resp APIResponse) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(resp)
}

A good rule here is to standardise three things early:

A machine-readable error code like validation_error or upstream_timeout.
A human-readable error string.
A request ID returned to the caller.

That one decision makes dashboards easier to group, support tickets easier to trace, and logs easier to search. If you already design APIs for service boundaries, this also pairs well with One2N’s Backend Engineering practice page for framing APIs as long-lived operational surfaces.

Checkpoint

At this stage, your API does not need OTEL or Prometheus yet. It should simply return a stable JSON shape for both success and error paths.

Step 2: Bootstrap tracing and metrics early

If tracing and metrics are initialised late or inconsistently, gaps appear exactly where you need clarity most. Set up the OTEL SDK, propagators, and metrics exposure before any handler starts serving traffic.

A simple OTEL tracer bootstrap in Go can look like this:

package telemetry

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func InitTracerProvider(ctx context.Context, serviceName string) (*sdktrace.TracerProvider, error) {
	exporter, err := otlptracehttp.New(ctx)
	if err != nil {
		return nil, err
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceName(serviceName),
		)),
	)

	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(
		propagation.NewCompositeTextMapPropagator(
			propagation.TraceContext{},
			propagation.Baggage{},
		),
	)

	return tp, nil
}

Expose Prometheus metrics on a separate endpoint:

go func() {
	mux := http.NewServeMux()
	mux.Handle("/metrics", promhttp.Handler())
	if err := http.ListenAndServe(":9090", mux); err != nil {
		log.Fatal(err)
	}
}()

Running /metrics on its own internal port, as shown here, keeps it off the public API surface.

Checkpoint

By now, your service should:

Start without handler changes.
Export traces to an OTLP-compatible backend.
Expose /metrics on port 9090.

Step 3: Assign and propagate request IDs

You cannot debug distributed systems without consistent request identity. Correlation IDs are what let one customer complaint become a searchable trail across logs, traces, and downstream calls.

A small middleware can attach a request ID to every request:

package middleware

import (
	"context"
	"net/http"

	"github.com/google/uuid"
)

type contextKey string

const RequestIDKey contextKey = "request_id"

func RequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		requestID := r.Header.Get("X-Request-Id")
		if requestID == "" {
			requestID = uuid.NewString()
		}

		ctx := context.WithValue(r.Context(), RequestIDKey, requestID)
		w.Header().Set("X-Request-Id", requestID)

		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

func GetRequestID(ctx context.Context) string {
	if v, ok := ctx.Value(RequestIDKey).(string); ok {
		return v
	}
	return ""
}

Then use it in your handlers:

func getUserHandler(w http.ResponseWriter, r *http.Request) {
	requestID := middleware.GetRequestID(r.Context())

	resp := APIResponse{
		Data: map[string]any{
			"id":   "123",
			"name": "Jane Doe",
		},
		RequestID: requestID,
	}

	writeJSON(w, http.StatusOK, resp)
}
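The middleware above reuses whatever X-Request-Id the caller sends. If the API is reachable from untrusted clients, it may be worth validating that value before it reaches logs and traces. The following is a sketch, not part of the original middleware: validRequestID and the 64-character limit are assumptions.

```go
package main

const maxRequestIDLen = 64 // hypothetical limit; choose one that fits your ID format

// validRequestID reports whether an inbound X-Request-Id is safe to reuse:
// non-empty, bounded in length, and limited to letters, digits, '-', '_' and '.'.
func validRequestID(id string) bool {
	if id == "" || len(id) > maxRequestIDLen {
		return false
	}
	for i := 0; i < len(id); i++ {
		c := id[i]
		switch {
		case c >= 'a' && c <= 'z', c >= 'A' && c <= 'Z', c >= '0' && c <= '9':
			// allowed alphanumeric character
		case c == '-', c == '_', c == '.':
			// allowed separator
		default:
			return false
		}
	}
	return true
}
```

In the middleware, the empty-string check could become if !validRequestID(requestID) { requestID = uuid.NewString() }, so malformed IDs are replaced rather than propagated.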

Now make one request with curl, inspect the response headers, and confirm that the same request ID appears in both the response body and the X-Request-Id header.

Try it

curl -i http://localhost:8080/users/123

You should see:

X-Request-Id in the response headers.
request_id in the JSON body.

Step 4: Make logs structured and useful

Text logs are fine when skimming one process locally, but they break down fast across services and incidents. Observable APIs need logs that are structured, queryable, and rich enough to explain a specific request.

Go’s slog is a good default:

logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
	Level: slog.LevelInfo,
}))

Log with request context in handlers:

func getUserHandler(logger *slog.Logger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		requestID := middleware.GetRequestID(r.Context())
		userID := r.PathValue("id")

		logger.Info("fetching user",
			"request_id", requestID,
			"endpoint", "/users/{id}",
			"user_id", userID,
			"method", r.Method,
		)

		resp := APIResponse{
			Data: map[string]any{
				"id":   userID,
				"name": "Jane Doe",
			},
			RequestID: requestID,
		}

		writeJSON(w, http.StatusOK, resp)
	}
}

The practical rule is simple:

Log in JSON.
Use consistent field names.
Add business context only when it helps explain behaviour.

Avoid two common mistakes:

Free-form log messages with the important values buried in strings.
Logging every tiny event with no consistent keys, which makes production search noisy and expensive.

Checkpoint

Make one request and verify that your logs include:

request_id
endpoint
method
user_id where relevant.

Step 5: Add tracing at the boundaries

Metrics tell you that something is wrong; traces tell you where the time went. For APIs that call databases, caches, or third-party services, tracing stops being optional very quickly.

Start a span in the handler:

var tracer = otel.Tracer("users-api")

func getUserHandler(logger *slog.Logger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, span := tracer.Start(r.Context(), "getUserHandler")
		defer span.End()

		userID := r.PathValue("id")
		requestID := middleware.GetRequestID(ctx)

		span.SetAttributes(
			// http.request.method is the current semantic-convention name (formerly http.method).
			attribute.String("http.request.method", r.Method),
			attribute.String("http.route", "/users/{id}"),
			attribute.String("app.request_id", requestID),
			attribute.String("app.user_id", userID),
		)

		user, err := fetchUserFromDownstream(ctx, userID)
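		if err != nil {
			span.RecordError(err)
			span.SetStatus(codes.Error, "downstream failure")

			// Log the failure with the same keys used in Step 4 so the
			// request can be found by request_id.
			logger.Error("failed to fetch user",
				"request_id", requestID,
				"endpoint", "/users/{id}",
				"user_id", userID,
				"method", r.Method,
				"error", err.Error(),
			)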

			writeJSON(w, http.StatusBadGateway, APIResponse{
				Error:     "failed to fetch user",
				ErrorCode: "upstream_timeout",
				RequestID: requestID,
			})
			return
		}

		writeJSON(w, http.StatusOK, APIResponse{
			Data:      user,
			RequestID: requestID,
		})
	}
}

Wrap the downstream boundary too:

func fetchUserFromDownstream(ctx context.Context, userID string) (map[string]any, error) {
	ctx, span := tracer.Start(ctx, "fetchUserFromDownstream")
	defer span.End()

	time.Sleep(120 * time.Millisecond)

	return map[string]any{
		"id":   userID,
		"name": "Jane Doe",
	}, nil
}

The key idea is to instrument boundaries, not every tiny function. Handlers, external calls, DB queries, and cache lookups are usually enough to make traces useful without turning them noisy.

Checkpoint

By now, one request to /users/123 should produce:

A parent span for the handler.
A child span for the downstream fetch.
Shared request context between the trace and logs.

Step 6: Expose the right metrics

APIs produce a lot of numbers, but only a few matter every day. Start with SLI-shaped metrics: latency, throughput, and error rate.

A simple request counter and latency histogram in Prometheus might look like this:

var (
	httpRequestsTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Total number of HTTP requests",
		},
		[]string{"method", "route", "status"},
	)

	httpRequestDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "http_request_duration_seconds",
			Help:    "HTTP request latency",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"method", "route", "status"},
	)
)

And wrap handlers with instrumentation:

func instrument(route string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()

		rw := &statusRecorder{ResponseWriter: w, statusCode: http.StatusOK}
		next.ServeHTTP(rw, r)

		status := strconv.Itoa(rw.statusCode)
		duration := time.Since(start).Seconds()

		httpRequestsTotal.WithLabelValues(r.Method, route, status).Inc()
		httpRequestDuration.WithLabelValues(r.Method, route, status).Observe(duration)
	})
}

type statusRecorder struct {
	http.ResponseWriter
	statusCode int
}

func (r *statusRecorder) WriteHeader(statusCode int) {
	r.statusCode = statusCode
	r.ResponseWriter.WriteHeader(statusCode)
}

A useful note here is what not to label:

Do not label metrics with raw user_id or request_id, because that creates high-cardinality metrics that become hard to store and query efficiently.
Prefer stable, low-cardinality dimensions such as route, method, status, tenant tier, or region when needed.
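Exact status codes are already a bounded label, but if you want fewer series per route you could collapse them into classes. This is an optional variant, not what the instrument function above does; statusClass is a hypothetical helper.

```go
package main

import "strconv"

// statusClass collapses an HTTP status code into a class label such as
// "2xx" or "5xx". Codes outside 100-599 map to "unknown".
func statusClass(code int) string {
	if code < 100 || code > 599 {
		return "unknown"
	}
	return strconv.Itoa(code/100) + "xx"
}
```

The cost is that a 502 and a 503 become indistinguishable in metrics, so this fits best when traces or logs carry the exact code.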

These metrics become most useful when tied to release and reliability workflows, which is where One2N’s CI/CD page and SRE Bootcamp pick up.

Checkpoint

Hit the API a few times and then open /metrics. You should be able to find:

http_requests_total
http_request_duration_seconds.

Step 7: Break it on purpose

A follow-along post becomes more useful when readers can observe a failure, not just a healthy request path. So let’s simulate a flaky downstream dependency.

Change the downstream function:

func fetchUserFromDownstream(ctx context.Context, userID string) (map[string]any, error) {
	ctx, span := tracer.Start(ctx, "fetchUserFromDownstream")
	defer span.End()

	if userID == "500" {
		err := errors.New("downstream timeout")
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		time.Sleep(800 * time.Millisecond)
		return nil, err
	}

	time.Sleep(120 * time.Millisecond)

	return map[string]any{
		"id":   userID,
		"name": "Jane Doe",
	}, nil
}

Now call:

curl -i http://localhost:8080/users/500

You should now see the response and all three signals line up:

The response returns an error payload with error_code=upstream_timeout and a request_id.
The logs show the same request ID and an error path.
The trace shows the failing downstream span and the extra latency.
The metrics reflect a slower request and a 502 response.

For a real-world example of why production debugging gets expensive when systems are operationally opaque, see One2N’s How one parameter stalled Postgres Replication with Debezium for 3 Weeks.

Step 8: Make observability part of delivery

If observability only appears at the end of implementation, it will always feel like overhead. The better pattern is to make it part of how endpoints are shipped.

A practical team checklist looks like this:

Every new endpoint returns a request ID.
Every handler emits structured logs with stable keys.
Every external dependency call has a span around it.
Every critical path exposes latency and error metrics.
Dashboards and alerts live in version control alongside code where possible.

That “observability as code” mindset lines up well with One2N’s Gitops for Kafka in the real world, where operational control improves when systems are expressed explicitly rather than left implicit.

What you should have now

At this point, your small Go API should do five useful things:

Return consistent response envelopes.
Attach and propagate request IDs.
Emit structured JSON logs.
Expose Prometheus metrics.
Create traces that explain downstream latency and failure paths.

That does not make the service perfect. It does make it far easier to operate, debug, and evolve under real traffic.

Before you ship

Before you ship the next API or feature, check for these basics:

Every request has a correlation ID that flows across services and appears in responses.
Logs are structured, centralised, and enriched with useful request context.
Key SLIs such as latency, throughput, and error rate exist for critical endpoints.
Traces show boundary spans for downstream calls and capture failure states.
Metrics, alerts, and dashboard assumptions are treated as part of delivery, not post-release cleanup.

That is what “observable from day zero” really means in practice.

If you want to get comfortable building production-grade REST APIs in Go first, start with our Go Bootcamp.

What observable means

The key is that metrics, logs, and traces must reinforce each other. Metrics tell you that something moved, traces tell you where time went, and logs give you detailed request context.

The demo service

To make this a follow-along, we’ll build a tiny service with two endpoints:

GET /health for a quick health check.
GET /users/{id} that simulates a downstream dependency call.

By the end, the service will:

Return a request ID in every response.
Emit structured logs for every request.
Expose Prometheus metrics on /metrics.
Create traces that show request flow and downstream latency.

That design-first approach is the same mindset behind Why your Architecture should start with Questions, not boxes: the important operational choices are made before production traffic shows up.

Step 1: Start with the API contract

Use a single response envelope like this:

type APIResponse struct {
	Data      any    `json:"data,omitempty"`
	Error     string `json:"error,omitempty"`
	ErrorCode string `json:"error_code,omitempty"`
	RequestID string `json:"request_id"`
}

And a small helper to write responses consistently:

func writeJSON(w http.ResponseWriter, status int, resp APIResponse) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(resp)
}

A good rule here is to standardise three things early:

A machine-readable error code like validation_error or upstream_timeout.
A human-readable error string.
A request ID returned to the caller.

Checkpoint

At this stage, your API does not need OTEL or Prometheus yet. It should simply return a stable JSON shape for both success and error paths.

Step 2: Bootstrap tracing and metrics early

A simple OTEL tracer bootstrap in Go can look like this:

package telemetry

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func InitTracerProvider(ctx context.Context, serviceName string) (*sdktrace.TracerProvider, error) {
	exporter, err := otlptracehttp.New(ctx)
	if err != nil {
		return nil, err
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceName(serviceName),
		)),
	)

	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(
		propagation.NewCompositeTextMapPropagator(
			propagation.TraceContext{},
			propagation.Baggage{},
		),
	)

	return tp, nil
}

Expose Prometheus metrics on a separate endpoint:

go func() {
	mux := http.NewServeMux()
	mux.Handle("/metrics", promhttp.Handler())
	if err := http.ListenAndServe(":9090", mux); err != nil {
		log.Fatal(err)
	}
}()

This is a good place to add a note in the live post that /metrics can run on a separate internal port if you do not want to expose it on the public API surface.

Checkpoint

By now, your service should:

Start without handler changes.
Export traces to an OTLP-compatible backend.
Expose /metrics on port 9090.

Step 3: Assign and propagate request IDs

You cannot debug distributed systems without consistent request identity. Correlation IDs are what let one customer complaint become a searchable trail across logs, traces, and downstream calls.

A small middleware can attach a request ID to every request:

package middleware

import (
	"context"
	"net/http"

	"github.com/google/uuid"
)

type contextKey string

const RequestIDKey contextKey = "request_id"

func RequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		requestID := r.Header.Get("X-Request-Id")
		if requestID == "" {
			requestID = uuid.NewString()
		}

		ctx := context.WithValue(r.Context(), RequestIDKey, requestID)
		w.Header().Set("X-Request-Id", requestID)

		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

func GetRequestID(ctx context.Context) string {
	if v, ok := ctx.Value(RequestIDKey).(string); ok {
		return v
	}
	return ""
}

Then use it in your handlers:

func getUserHandler(w http.ResponseWriter, r *http.Request) {
	requestID := middleware.GetRequestID(r.Context())

	resp := APIResponse{
		Data: map[string]any{
			"id":   "123",
			"name": "Jane Doe",
		},
		RequestID: requestID,
	}

	writeJSON(w, http.StatusOK, resp)
}

Try it

curl -i http://localhost:8080/users/123

You should see:

X-Request-Id in the response headers.
request_id in the JSON body.

Step 4: Make logs structured and useful

Go’s slog is a good default:

logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
	Level: slog.LevelInfo,
}))

Log with request context in handlers:

func getUserHandler(logger *slog.Logger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		requestID := middleware.GetRequestID(r.Context())
		userID := r.PathValue("id")

		logger.Info("fetching user",
			"request_id", requestID,
			"endpoint", "/users/{id}",
			"user_id", userID,
			"method", r.Method,
		)

		resp := APIResponse{
			Data: map[string]any{
				"id":   userID,
				"name": "Jane Doe",
			},
			RequestID: requestID,
		}

		writeJSON(w, http.StatusOK, resp)
	}
}

The practical rule is simple:

Log in JSON.
Use consistent field names.
Add business context only when it helps explain behaviour.

Avoid two common mistakes:

Free-form log messages with the important values buried in strings.
Logging every tiny event with no consistent keys, which makes production search noisy and expensive.

Checkpoint

Make one request and verify that your logs include:

request_id
endpoint
method
user_id where relevant.

Step 5: Add tracing at the boundaries

Metrics tell you that something is wrong; traces tell you where the time went. For APIs that call databases, caches, or third-party services, tracing stops being optional very quickly.

Start a span in the handler:

var tracer = otel.Tracer("users-api")

func getUserHandler(logger *slog.Logger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, span := tracer.Start(r.Context(), "getUserHandler")
		defer span.End()

		userID := r.PathValue("id")
		requestID := middleware.GetRequestID(ctx)

		span.SetAttributes(
			attribute.String("http.method", r.Method),
			attribute.String("http.route", "/users/{id}"),
			attribute.String("app.request_id", requestID),
			attribute.String("app.user_id", userID),
		)

		user, err := fetchUserFromDownstream(ctx, userID)
		if err != nil {
			span.RecordError(err)
			span.SetStatus(codes.Error, "downstream failure")

			writeJSON(w, http.StatusBadGateway, APIResponse{
				Error:     "failed to fetch user",
				ErrorCode: "upstream_timeout",
				RequestID: requestID,
			})
			return
		}

		writeJSON(w, http.StatusOK, APIResponse{
			Data:      user,
			RequestID: requestID,
		})
	}
}

Wrap the downstream boundary too:

func fetchUserFromDownstream(ctx context.Context, userID string) (map[string]any, error) {
	ctx, span := tracer.Start(ctx, "fetchUserFromDownstream")
	defer span.End()

	time.Sleep(120 * time.Millisecond)

	return map[string]any{
		"id":   userID,
		"name": "Jane Doe",
	}, nil
}

The key idea is to instrument boundaries, not every tiny function. Handlers, external calls, DB queries, and cache lookups are usually enough to make traces useful without turning them noisy.

Checkpoint

By now, one request to /users/123 should produce:

A parent span for the handler.
A child span for the downstream fetch.
Shared request context between the trace and logs.

Step 6: Expose the right metrics

APIs produce a lot of numbers, but only a few matter every day. Start with SLI-shaped metrics: latency, throughput, and error rate.

A simple request counter and latency histogram in Prometheus might look like this:

var (
	httpRequestsTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Total number of HTTP requests",
		},
		[]string{"method", "route", "status"},
	)

	httpRequestDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "http_request_duration_seconds",
			Help:    "HTTP request latency",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"method", "route", "status"},
	)
)

And wrap handlers with instrumentation:

func instrument(route string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()

		rw := &statusRecorder{ResponseWriter: w, statusCode: http.StatusOK}
		next.ServeHTTP(rw, r)

		status := strconv.Itoa(rw.statusCode)
		duration := time.Since(start).Seconds()

		httpRequestsTotal.WithLabelValues(r.Method, route, status).Inc()
		httpRequestDuration.WithLabelValues(r.Method, route, status).Observe(duration)
	})
}

type statusRecorder struct {
	http.ResponseWriter
	statusCode int
}

func (r *statusRecorder) WriteHeader(statusCode int) {
	r.statusCode = statusCode
	r.ResponseWriter.WriteHeader(statusCode)
}

A useful note here is what not to label:

Do not label metrics with raw user_id or request_id, because that creates high-cardinality metrics that become hard to store and query efficiently.
Prefer stable, low-cardinality dimensions such as route, method, status, tenant tier, or region when needed.

This is also a natural spot to reference One2N’s CI/CD page or SRE Bootcamp, since these metrics become most useful when tied to release and reliability workflows.

Checkpoint

Hit the API a few times and then open /metrics. You should be able to find:

http_requests_total
http_request_duration_seconds.

Step 7: Break it on purpose

A follow-along post becomes more useful when readers can observe a failure, not just a healthy request path. So let’s simulate a flaky downstream dependency.

Change the downstream function:

func fetchUserFromDownstream(ctx context.Context, userID string) (map[string]any, error) {
	ctx, span := tracer.Start(ctx, "fetchUserFromDownstream")
	defer span.End()

	if userID == "500" {
		err := errors.New("downstream timeout")
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		time.Sleep(800 * time.Millisecond)
		return nil, err
	}

	time.Sleep(120 * time.Millisecond)

	return map[string]any{
		"id":   userID,
		"name": "Jane Doe",
	}, nil
}

Now call:

curl -i http://localhost:8080/users/500

You should now see the three signals line up:

The response returns an error payload with error_code=upstream_timeout and a request_id.
The logs show the same request ID and an error path.
The trace shows the failing downstream span and the extra latency.
The metrics reflect a slower request and a 502 response.

Step 8: Make observability part of delivery

If observability only appears at the end of implementation, it will always feel like overhead. The better pattern is to make it part of how endpoints are shipped.

A practical team checklist looks like this:

Every new endpoint returns a request ID.
Every handler emits structured logs with stable keys.
Every external dependency call has a span around it.
Every critical path exposes latency and error metrics.
Dashboards and alerts live in version control alongside code where possible.

What you should have now

At this point, your small Go API should do five useful things:

Return consistent response envelopes.
Attach and propagate request IDs.
Emit structured JSON logs.
Expose Prometheus metrics.
Create traces that explain downstream latency and failure paths.

That does not make the service perfect. It does make it far easier to operate, debug, and evolve under real traffic.

Before you ship

Before you ship the next API or feature, check for these basics:

Every request has a correlation ID that flows across services and appears in responses.
Logs are structured, centralised, and enriched with useful request context.
Key SLIs such as latency, throughput, and error rate exist for critical endpoints.
Traces show boundary spans for downstream calls and capture failure states.
Metrics, alerts, and dashboard assumptions are treated as part of delivery, not post-release cleanup.

That is what “observable from day zero” really means in practice

If you want to get comfortable building production-grade REST APIs in Go first, start with our Go Bootcamp.

What observable means

The key is that metrics, logs, and traces must reinforce each other. Metrics tell you that something moved, traces tell you where time went, and logs give you detailed request context.

The demo service

To make this a follow-along, we’ll build a tiny service with two endpoints:

GET /health for a quick health check.
GET /users/{id} that simulates a downstream dependency call.

By the end, the service will:

Return a request ID in every response.
Emit structured logs for every request.
Expose Prometheus metrics on /metrics.
Create traces that show request flow and downstream latency.

That design-first approach is the same mindset behind Why your Architecture should start with Questions, not boxes: the important operational choices are made before production traffic shows up.

Step 1: Start with the API contract

Use a single response envelope like this:

type APIResponse struct {
	Data      any    `json:"data,omitempty"`
	Error     string `json:"error,omitempty"`
	ErrorCode string `json:"error_code,omitempty"`
	RequestID string `json:"request_id"`
}

And a small helper to write responses consistently:

func writeJSON(w http.ResponseWriter, status int, resp APIResponse) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(resp)
}

A good rule here is to standardise three things early:

A machine-readable error code like validation_error or upstream_timeout.
A human-readable error string.
A request ID returned to the caller.

Checkpoint

At this stage, your API does not need OTEL or Prometheus yet. It should simply return a stable JSON shape for both success and error paths.

Step 2: Bootstrap tracing and metrics early

A simple OTEL tracer bootstrap in Go can look like this:

package telemetry

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func InitTracerProvider(ctx context.Context, serviceName string) (*sdktrace.TracerProvider, error) {
	exporter, err := otlptracehttp.New(ctx)
	if err != nil {
		return nil, err
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceName(serviceName),
		)),
	)

	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(
		propagation.NewCompositeTextMapPropagator(
			propagation.TraceContext{},
			propagation.Baggage{},
		),
	)

	return tp, nil
}

Expose Prometheus metrics on a separate endpoint:

go func() {
	mux := http.NewServeMux()
	mux.Handle("/metrics", promhttp.Handler())
	if err := http.ListenAndServe(":9090", mux); err != nil {
		log.Fatal(err)
	}
}()

This is a good place to add a note in the live post that /metrics can run on a separate internal port if you do not want to expose it on the public API surface.

Checkpoint

By now, your service should:

Start without handler changes.
Export traces to an OTLP-compatible backend.
Expose /metrics on port 9090.

Step 3: Assign and propagate request IDs

You cannot debug distributed systems without consistent request identity. Correlation IDs are what let one customer complaint become a searchable trail across logs, traces, and downstream calls.

A small middleware can attach a request ID to every request:

package middleware

import (
	"context"
	"net/http"

	"github.com/google/uuid"
)

type contextKey string

const RequestIDKey contextKey = "request_id"

func RequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		requestID := r.Header.Get("X-Request-Id")
		if requestID == "" {
			requestID = uuid.NewString()
		}

		ctx := context.WithValue(r.Context(), RequestIDKey, requestID)
		w.Header().Set("X-Request-Id", requestID)

		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

func GetRequestID(ctx context.Context) string {
	if v, ok := ctx.Value(RequestIDKey).(string); ok {
		return v
	}
	return ""
}

Then use it in your handlers:

func getUserHandler(w http.ResponseWriter, r *http.Request) {
	requestID := middleware.GetRequestID(r.Context())

	resp := APIResponse{
		Data: map[string]any{
			"id":   "123",
			"name": "Jane Doe",
		},
		RequestID: requestID,
	}

	writeJSON(w, http.StatusOK, resp)
}

Try it

curl -i http://localhost:8080/users/123

You should see:

X-Request-Id in the response headers.
request_id in the JSON body.
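The middleware above handles inbound requests. For the ID to become a trail across downstream calls, outbound requests need to carry it too. One way to do that is a small http.RoundTripper wrapper, sketched below with only the standard library. The type requestIDTransport, the helper seenDownstream, and the URL users.internal are hypothetical; the fake transport stands in for the network so the example runs anywhere.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

type contextKey string

// requestIDKey mirrors middleware.RequestIDKey from the post.
const requestIDKey contextKey = "request_id"

// requestIDTransport copies the request ID from the request context into the
// X-Request-Id header of every outbound call.
type requestIDTransport struct {
	base http.RoundTripper
}

func (t requestIDTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if id, ok := req.Context().Value(requestIDKey).(string); ok && id != "" {
		// A RoundTripper must not modify the caller's request, so clone it first.
		req = req.Clone(req.Context())
		req.Header.Set("X-Request-Id", id)
	}
	return t.base.RoundTrip(req)
}

// roundTripFunc lets a plain function act as an http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// seenDownstream makes one outbound call with requestID in the context and
// returns the X-Request-Id value a downstream service would have received.
func seenDownstream(requestID string) (string, error) {
	var seen string
	fake := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		seen = r.Header.Get("X-Request-Id")
		return &http.Response{
			StatusCode: http.StatusOK,
			Header:     http.Header{},
			Body:       http.NoBody,
			Request:    r,
		}, nil
	})

	client := &http.Client{Transport: requestIDTransport{base: fake}}
	ctx := context.WithValue(context.Background(), requestIDKey, requestID)

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://users.internal/users/123", nil)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return seen, nil
}

func main() {
	seen, err := seenDownstream("req-abc-123")
	fmt.Println(seen, err) // req-abc-123 <nil>
}
```

In a real service, base would usually be http.DefaultTransport, and the client would be built once and shared by every downstream call.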

Step 4: Make logs structured and useful

Go’s slog, part of the standard library since Go 1.21, is a good default:

logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
	Level: slog.LevelInfo,
}))

Log with request context in handlers (r.PathValue needs Go 1.22 or later):

func getUserHandler(logger *slog.Logger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		requestID := middleware.GetRequestID(r.Context())
		userID := r.PathValue("id")

		logger.Info("fetching user",
			"request_id", requestID,
			"endpoint", "/users/{id}",
			"user_id", userID,
			"method", r.Method,
		)

		resp := APIResponse{
			Data: map[string]any{
				"id":   userID,
				"name": "Jane Doe",
			},
			RequestID: requestID,
		}

		writeJSON(w, http.StatusOK, resp)
	}
}

The practical rule is simple:

Log in JSON.
Use consistent field names.
Add business context only when it helps explain behaviour.

Avoid two common mistakes:

Free-form log messages with the important values buried in strings.
Logging every tiny event with no consistent keys, which makes production search noisy and expensive.
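One way to keep field names consistent is to derive a request-scoped logger once and reuse it, rather than repeating the keys at every call site. The sketch below uses only log/slog. The helpers requestLogger and logOnce are hypothetical names, and writing to an in-memory buffer is just a convenient way to check the fields.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log/slog"
)

// requestLogger returns a child logger that stamps every line with the same
// stable keys, so handlers cannot forget or misspell them.
func requestLogger(base *slog.Logger, requestID, endpoint, method string) *slog.Logger {
	return base.With(
		"request_id", requestID,
		"endpoint", endpoint,
		"method", method,
	)
}

// logOnce writes one structured line to a buffer and decodes it back into a
// map, which is a cheap way to assert on log fields in a unit test.
func logOnce(requestID, userID string) (map[string]any, error) {
	var buf bytes.Buffer
	base := slog.New(slog.NewJSONHandler(&buf, nil))

	reqLog := requestLogger(base, requestID, "/users/{id}", "GET")
	reqLog.Info("fetching user", "user_id", userID)

	var fields map[string]any
	err := json.Unmarshal(buf.Bytes(), &fields)
	return fields, err
}

func main() {
	fields, err := logOnce("req-abc-123", "123")
	if err != nil {
		panic(err)
	}
	fmt.Println(fields["request_id"], fields["endpoint"], fields["method"], fields["user_id"])
	// req-abc-123 /users/{id} GET 123
}
```

The same child logger can be stored in the request context by middleware if you prefer not to pass it explicitly.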

Checkpoint

Make one request and verify that your logs include:

request_id
endpoint
method
user_id where relevant.

Step 5: Add tracing at the boundaries

Metrics tell you that something is wrong; traces tell you where the time went. For APIs that call databases, caches, or third-party services, tracing stops being optional very quickly.

Start a span in the handler:

var tracer = otel.Tracer("users-api")

func getUserHandler(logger *slog.Logger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, span := tracer.Start(r.Context(), "getUserHandler")
		defer span.End()

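		userID := r.PathValue("id")
		requestID := middleware.GetRequestID(ctx)

		// Stamp every log line with the request and trace IDs so that
		// logs and traces for the same request can be joined later.
		reqLog := logger.With(
			"request_id", requestID,
			"trace_id", span.SpanContext().TraceID().String(),
			"endpoint", "/users/{id}",
			"method", r.Method,
			"user_id", userID,
		)
		reqLog.Info("fetching user")

		span.SetAttributes(
			// http.request.method is the current semantic-convention name for http.method.
			attribute.String("http.request.method", r.Method),
			attribute.String("http.route", "/users/{id}"),
			attribute.String("app.request_id", requestID),
			attribute.String("app.user_id", userID),
		)

		user, err := fetchUserFromDownstream(ctx, userID)
		if err != nil {
			span.RecordError(err)
			span.SetStatus(codes.Error, "downstream failure")
			reqLog.Error("downstream fetch failed", "error", err.Error())

			writeJSON(w, http.StatusBadGateway, APIResponse{
				Error:     "failed to fetch user",
				ErrorCode: "upstream_timeout",
				RequestID: requestID,
			})
			return
		}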

		writeJSON(w, http.StatusOK, APIResponse{
			Data:      user,
			RequestID: requestID,
		})
	}
}

Wrap the downstream boundary too:

func fetchUserFromDownstream(ctx context.Context, userID string) (map[string]any, error) {
	ctx, span := tracer.Start(ctx, "fetchUserFromDownstream")
	defer span.End()

	time.Sleep(120 * time.Millisecond)

	return map[string]any{
		"id":   userID,
		"name": "Jane Doe",
	}, nil
}

The key idea is to instrument boundaries, not every tiny function. Handlers, external calls, DB queries, and cache lookups are usually enough to make traces useful without turning them noisy.
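Two details at this boundary are worth tightening in real code. The simulated call uses time.Sleep, which ignores cancellation, and the handler reports every failure as upstream_timeout. The sketch below shows one possible approach using only the standard library: a wait that respects the context, and a small mapping from error to error code. The names simulateDownstream, errorCodeFor, and codeWithBudget are hypothetical, as are the codes request_cancelled and upstream_error.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// simulateDownstream waits for the given latency but gives up as soon as ctx
// is cancelled or its deadline passes, unlike a bare time.Sleep.
func simulateDownstream(ctx context.Context, latency time.Duration) error {
	timer := time.NewTimer(latency)
	defer timer.Stop()

	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// errorCodeFor maps an error to the machine-readable code returned in the
// response envelope. Only upstream_timeout comes from the post; the other
// codes are illustrative.
func errorCodeFor(err error) string {
	switch {
	case err == nil:
		return ""
	case errors.Is(err, context.DeadlineExceeded):
		return "upstream_timeout"
	case errors.Is(err, context.Canceled):
		return "request_cancelled"
	default:
		return "upstream_error"
	}
}

// codeWithBudget runs the simulated call under a time budget and returns the
// resulting error code ("" on success). Both arguments are in milliseconds.
func codeWithBudget(budgetMs, latencyMs int) string {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(budgetMs)*time.Millisecond)
	defer cancel()
	return errorCodeFor(simulateDownstream(ctx, time.Duration(latencyMs)*time.Millisecond))
}

func main() {
	// A 50ms budget against a dependency that needs 120ms.
	fmt.Println(codeWithBudget(50, 120)) // upstream_timeout
}
```

Because the span in fetchUserFromDownstream ends when the function returns, a cancelled call also shows up as a shorter span in the trace.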

Checkpoint

By now, one request to /users/123 should produce:

A parent span for the handler.
A child span for the downstream fetch.
Shared request context between the trace and logs.

Step 6: Expose the right metrics

APIs produce a lot of numbers, but only a few matter every day. Start with SLI-shaped metrics: latency, throughput, and error rate.

A simple request counter and latency histogram in Prometheus might look like this:

var (
	httpRequestsTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Total number of HTTP requests",
		},
		[]string{"method", "route", "status"},
	)

	httpRequestDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "http_request_duration_seconds",
			Help:    "HTTP request latency",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"method", "route", "status"},
	)
)

And wrap handlers with instrumentation:

func instrument(route string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()

		rw := &statusRecorder{ResponseWriter: w, statusCode: http.StatusOK}
		next.ServeHTTP(rw, r)

		status := strconv.Itoa(rw.statusCode)
		duration := time.Since(start).Seconds()

		httpRequestsTotal.WithLabelValues(r.Method, route, status).Inc()
		httpRequestDuration.WithLabelValues(r.Method, route, status).Observe(duration)
	})
}

type statusRecorder struct {
	http.ResponseWriter
	statusCode int
}

func (r *statusRecorder) WriteHeader(statusCode int) {
	r.statusCode = statusCode
	r.ResponseWriter.WriteHeader(statusCode)
}

A useful note here is what not to label:

Do not label metrics with raw user_id or request_id, because that creates high-cardinality metrics that become hard to store and query efficiently.
Prefer stable, low-cardinality dimensions such as route, method, status, tenant tier, or region when needed.
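A rough way to see why this matters is to multiply out the label values. The sketch below is plain arithmetic, not Prometheus code. The counts (4 methods, 10 routes, 5 statuses, 100,000 users) are made-up numbers, and the helper names are hypothetical. It assumes a classic histogram exposes one series per configured bucket plus the +Inf bucket, _sum, and _count.

```go
package main

import "fmt"

// cardinality returns the worst-case number of label combinations for a
// metric: the product of the number of distinct values of each label.
func cardinality(valuesPerLabel ...int) int {
	total := 1
	for _, n := range valuesPerLabel {
		total *= n
	}
	return total
}

// histogramSeries returns the number of time series a classic histogram
// exposes: each label combination gets one series per configured bucket,
// plus the +Inf bucket, _sum, and _count.
func histogramSeries(labelCombos, buckets int) int {
	return labelCombos * (buckets + 3)
}

func main() {
	// method x route x status with illustrative counts.
	safe := cardinality(4, 10, 5)
	fmt.Println(safe, histogramSeries(safe, 11)) // 200 2800

	// The same labels plus a user_id label across 100,000 users.
	risky := cardinality(4, 10, 5, 100000)
	fmt.Println(risky, histogramSeries(risky, 11)) // 20000000 280000000
}
```

In practice not every combination occurs, so real series counts are lower, but the multiplication shows how quickly a single unbounded label dominates.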

These metrics become most useful when tied to release and reliability workflows; see One2N’s CI/CD page or SRE Bootcamp for that side of the work.

Checkpoint

Hit the API a few times and then open /metrics. You should be able to find:

http_requests_total
http_request_duration_seconds (exposed as _bucket, _sum, and _count series).

Step 7: Break it on purpose

A walkthrough is more useful when you can watch a failure, not just a healthy request path, so let’s simulate a flaky downstream dependency.

Change the downstream function:

func fetchUserFromDownstream(ctx context.Context, userID string) (map[string]any, error) {
	ctx, span := tracer.Start(ctx, "fetchUserFromDownstream")
	defer span.End()

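	if userID == "500" {
		// Simulate a slow dependency that eventually fails: wait first,
		// then record the error so its timestamp lands at the end of the span.
		time.Sleep(800 * time.Millisecond)
		err := errors.New("downstream timeout")
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		return nil, err
	}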

	time.Sleep(120 * time.Millisecond)

	return map[string]any{
		"id":   userID,
		"name": "Jane Doe",
	}, nil
}

Now call:

curl -i http://localhost:8080/users/500

You should now see the response and all three signals line up:

The response returns an error payload with error_code=upstream_timeout and a request_id.
The logs show the same request ID and an error path.
The trace shows the failing downstream span and the extra latency.
The metrics reflect a slower request and a 502 response.
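To see how the slower request shows up in the histogram, it helps to check which bucket each latency lands in. The sketch below copies the upper bounds of prometheus.DefBuckets into a local slice so it runs without the client library. firstBucket is a hypothetical helper, not part of the Prometheus API.

```go
package main

import "fmt"

// defBuckets copies the upper bounds of prometheus.DefBuckets, in seconds.
var defBuckets = []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}

// firstBucket returns the smallest upper bound that contains the observation.
// ok is false when the value only fits in the implicit +Inf bucket.
func firstBucket(seconds float64, bounds []float64) (le float64, ok bool) {
	for _, b := range bounds {
		if seconds <= b {
			return b, true
		}
	}
	return 0, false
}

func main() {
	// Roughly the healthy (120ms) and failing (800ms) paths from this step.
	for _, s := range []float64{0.120, 0.800} {
		le, _ := firstBucket(s, defBuckets)
		fmt.Printf("%.3fs first counted in le=%g\n", s, le)
	}
	// 0.120s first counted in le=0.25
	// 0.800s first counted in le=1
}
```

Histogram buckets are cumulative, so the failing request is also counted in every bucket above le=1. If most of your traffic sits between two default bounds, custom buckets around your latency target usually give more useful percentiles.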

Step 8: Make observability part of delivery

If observability only appears at the end of implementation, it will always feel like overhead. The better pattern is to make it part of how endpoints are shipped.

A practical team checklist looks like this:

Every new endpoint returns a request ID.
Every handler emits structured logs with stable keys.
Every external dependency call has a span around it.
Every critical path exposes latency and error metrics.
Dashboards and alerts live in version control alongside code where possible.
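Checklist items like the first one can be enforced by a test instead of by review. The sketch below, using only the standard library, walks a list of paths and reports any response that lacks a request ID in the header or the JSON body. The helper missingRequestID and the demo routes are hypothetical; in a real project the handler under test would be your actual router.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// missingRequestID calls each path on h and returns the paths whose response
// lacks an X-Request-Id header or a non-empty request_id field in the body.
func missingRequestID(h http.Handler, paths []string) []string {
	var missing []string
	for _, p := range paths {
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, p, nil))

		var body struct {
			RequestID string `json:"request_id"`
		}
		// Invalid JSON simply leaves RequestID empty, which counts as missing.
		_ = json.Unmarshal(rec.Body.Bytes(), &body)

		if rec.Header().Get("X-Request-Id") == "" || body.RequestID == "" {
			missing = append(missing, p)
		}
	}
	return missing
}

// demoMux is an illustrative router: /users/123 follows the contract,
// /legacy does not.
func demoMux() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/users/123", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-Request-Id", "req-1")
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"data":{"id":"123"},"request_id":"req-1"}`)
	})
	mux.HandleFunc("/legacy", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	return mux
}

func main() {
	fmt.Println(missingRequestID(demoMux(), []string{"/users/123", "/legacy"})) // [/legacy]
}
```

Run in CI, a check like this fails the build when a new endpoint skips the middleware or bypasses the response envelope.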

What you should have now

At this point, your small Go API should do five useful things:

Return consistent response envelopes.
Attach and propagate request IDs.
Emit structured JSON logs.
Expose Prometheus metrics.
Create traces that explain downstream latency and failure paths.

That does not make the service perfect. It does make it far easier to operate, debug, and evolve under real traffic.

Before you ship

Before you ship the next API or feature, check for these basics:

Every request has a correlation ID that flows across services and appears in responses.
Logs are structured, centralised, and enriched with useful request context.
Key SLIs such as latency, throughput, and error rate exist for critical endpoints.
Traces show boundary spans for downstream calls and capture failure states.
Metrics, alerts, and dashboard assumptions are treated as part of delivery, not post-release cleanup.

That is what “observable from day zero” really means in practice.

If you want to get comfortable building production-grade REST APIs in Go first, start with our Go Bootcamp.

What observable means

The key is that metrics, logs, and traces must reinforce each other. Metrics tell you that something moved, traces tell you where time went, and logs give you detailed request context.

The demo service

To make this a follow-along, we’ll build a tiny service with two endpoints:

GET /health for a quick health check.
GET /users/{id} that simulates a downstream dependency call.

By the end, the service will:

Return a request ID in every response.
Emit structured logs for every request.
Expose Prometheus metrics on /metrics.
Create traces that show request flow and downstream latency.

That design-first approach is the same mindset behind Why your Architecture should start with Questions, not boxes: the important operational choices are made before production traffic shows up.

Step 1: Start with the API contract

Use a single response envelope like this:

type APIResponse struct {
	Data      any    `json:"data,omitempty"`
	Error     string `json:"error,omitempty"`
	ErrorCode string `json:"error_code,omitempty"`
	RequestID string `json:"request_id"`
}

And a small helper to write responses consistently:

func writeJSON(w http.ResponseWriter, status int, resp APIResponse) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(resp)
}

A good rule here is to standardise three things early:

A machine-readable error code like validation_error or upstream_timeout.
A human-readable error string.
A request ID returned to the caller.

Checkpoint

At this stage, your API does not need OTEL or Prometheus yet. It should simply return a stable JSON shape for both success and error paths.

Step 2: Bootstrap tracing and metrics early

A simple OTEL tracer bootstrap in Go can look like this:

package telemetry

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func InitTracerProvider(ctx context.Context, serviceName string) (*sdktrace.TracerProvider, error) {
	exporter, err := otlptracehttp.New(ctx)
	if err != nil {
		return nil, err
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceName(serviceName),
		)),
	)

	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(
		propagation.NewCompositeTextMapPropagator(
			propagation.TraceContext{},
			propagation.Baggage{},
		),
	)

	return tp, nil
}

Expose Prometheus metrics on a separate endpoint:

go func() {
	mux := http.NewServeMux()
	mux.Handle("/metrics", promhttp.Handler())
	if err := http.ListenAndServe(":9090", mux); err != nil {
		log.Fatal(err)
	}
}()

This is a good place to add a note in the live post that /metrics can run on a separate internal port if you do not want to expose it on the public API surface.

Checkpoint

By now, your service should:

Start without handler changes.
Export traces to an OTLP-compatible backend.
Expose /metrics on port 9090.

Step 3: Assign and propagate request IDs

You cannot debug distributed systems without consistent request identity. Correlation IDs are what let one customer complaint become a searchable trail across logs, traces, and downstream calls.

A small middleware can attach a request ID to every request:

package middleware

import (
	"context"
	"net/http"

	"github.com/google/uuid"
)

type contextKey string

const RequestIDKey contextKey = "request_id"

func RequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		requestID := r.Header.Get("X-Request-Id")
		if requestID == "" {
			requestID = uuid.NewString()
		}

		ctx := context.WithValue(r.Context(), RequestIDKey, requestID)
		w.Header().Set("X-Request-Id", requestID)

		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

func GetRequestID(ctx context.Context) string {
	if v, ok := ctx.Value(RequestIDKey).(string); ok {
		return v
	}
	return ""
}

Then use it in your handlers:

func getUserHandler(w http.ResponseWriter, r *http.Request) {
	requestID := middleware.GetRequestID(r.Context())

	resp := APIResponse{
		Data: map[string]any{
			"id":   "123",
			"name": "Jane Doe",
		},
		RequestID: requestID,
	}

	writeJSON(w, http.StatusOK, resp)
}

Try it

curl -i http://localhost:8080/users/123

You should see:

X-Request-Id in the response headers.
request_id in the JSON body.

Step 4: Make logs structured and useful

Go’s slog is a good default:

logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
	Level: slog.LevelInfo,
}))

Log with request context in handlers:

func getUserHandler(logger *slog.Logger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		requestID := middleware.GetRequestID(r.Context())
		userID := r.PathValue("id")

		logger.Info("fetching user",
			"request_id", requestID,
			"endpoint", "/users/{id}",
			"user_id", userID,
			"method", r.Method,
		)

		resp := APIResponse{
			Data: map[string]any{
				"id":   userID,
				"name": "Jane Doe",
			},
			RequestID: requestID,
		}

		writeJSON(w, http.StatusOK, resp)
	}
}

The practical rule is simple:

Log in JSON.
Use consistent field names.
Add business context only when it helps explain behaviour.

Avoid two common mistakes:

Free-form log messages with the important values buried in strings.
Logging every tiny event with no consistent keys, which makes production search noisy and expensive.

Checkpoint

Make one request and verify that your logs include:

request_id
endpoint
method
user_id where relevant.

Step 5: Add tracing at the boundaries

Metrics tell you that something is wrong; traces tell you where the time went. For APIs that call databases, caches, or third-party services, tracing stops being optional very quickly.

Start a span in the handler:

var tracer = otel.Tracer("users-api")

func getUserHandler(logger *slog.Logger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, span := tracer.Start(r.Context(), "getUserHandler")
		defer span.End()

		userID := r.PathValue("id")
		requestID := middleware.GetRequestID(ctx)

		span.SetAttributes(
			attribute.String("http.method", r.Method),
			attribute.String("http.route", "/users/{id}"),
			attribute.String("app.request_id", requestID),
			attribute.String("app.user_id", userID),
		)

		user, err := fetchUserFromDownstream(ctx, userID)
		if err != nil {
			span.RecordError(err)
			span.SetStatus(codes.Error, "downstream failure")

			writeJSON(w, http.StatusBadGateway, APIResponse{
				Error:     "failed to fetch user",
				ErrorCode: "upstream_timeout",
				RequestID: requestID,
			})
			return
		}

		writeJSON(w, http.StatusOK, APIResponse{
			Data:      user,
			RequestID: requestID,
		})
	}
}

Wrap the downstream boundary too:

func fetchUserFromDownstream(ctx context.Context, userID string) (map[string]any, error) {
	ctx, span := tracer.Start(ctx, "fetchUserFromDownstream")
	defer span.End()

	time.Sleep(120 * time.Millisecond)

	return map[string]any{
		"id":   userID,
		"name": "Jane Doe",
	}, nil
}

The key idea is to instrument boundaries, not every tiny function. Handlers, external calls, DB queries, and cache lookups are usually enough to make traces useful without turning them noisy.

Checkpoint

By now, one request to /users/123 should produce:

A parent span for the handler.
A child span for the downstream fetch.
Shared request context between the trace and logs.

Step 6: Expose the right metrics

APIs produce a lot of numbers, but only a few matter every day. Start with SLI-shaped metrics: latency, throughput, and error rate.

A simple request counter and latency histogram in Prometheus might look like this:

var (
	httpRequestsTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Total number of HTTP requests",
		},
		[]string{"method", "route", "status"},
	)

	httpRequestDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "http_request_duration_seconds",
			Help:    "HTTP request latency",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"method", "route", "status"},
	)
)

And wrap handlers with instrumentation:

func instrument(route string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()

		rw := &statusRecorder{ResponseWriter: w, statusCode: http.StatusOK}
		next.ServeHTTP(rw, r)

		status := strconv.Itoa(rw.statusCode)
		duration := time.Since(start).Seconds()

		httpRequestsTotal.WithLabelValues(r.Method, route, status).Inc()
		httpRequestDuration.WithLabelValues(r.Method, route, status).Observe(duration)
	})
}

type statusRecorder struct {
	http.ResponseWriter
	statusCode int
}

func (r *statusRecorder) WriteHeader(statusCode int) {
	r.statusCode = statusCode
	r.ResponseWriter.WriteHeader(statusCode)
}

A useful note here is what not to label:

Do not label metrics with raw user_id or request_id, because that creates high-cardinality metrics that become hard to store and query efficiently.
Prefer stable, low-cardinality dimensions such as route, method, status, tenant tier, or region when needed.

This is also a natural spot to reference One2N’s CI/CD page or SRE Bootcamp, since these metrics become most useful when tied to release and reliability workflows.

Checkpoint

Hit the API a few times and then open /metrics. You should be able to find:

http_requests_total
http_request_duration_seconds.

Step 7: Break it on purpose

A follow-along post becomes more useful when readers can observe a failure, not just a healthy request path. So let’s simulate a flaky downstream dependency.

Change the downstream function:

func fetchUserFromDownstream(ctx context.Context, userID string) (map[string]any, error) {
	ctx, span := tracer.Start(ctx, "fetchUserFromDownstream")
	defer span.End()

	if userID == "500" {
		err := errors.New("downstream timeout")
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		time.Sleep(800 * time.Millisecond)
		return nil, err
	}

	time.Sleep(120 * time.Millisecond)

	return map[string]any{
		"id":   userID,
		"name": "Jane Doe",
	}, nil
}

Now call:

curl -i http://localhost:8080/users/500

You should now see the three signals line up:

The response returns an error payload with error_code=upstream_timeout and a request_id.
The logs show the same request ID and an error path.
The trace shows the failing downstream span and the extra latency.
The metrics reflect a slower request and a 502 response.

Step 8: Make observability part of delivery

If observability only appears at the end of implementation, it will always feel like overhead. The better pattern is to make it part of how endpoints are shipped.

A practical team checklist looks like this:

Every new endpoint returns a request ID.
Every handler emits structured logs with stable keys.
Every external dependency call has a span around it.
Every critical path exposes latency and error metrics.
Dashboards and alerts live in version control alongside code where possible.

What you should have now

At this point, your small Go API should do five useful things:

Return consistent response envelopes.
Attach and propagate request IDs.
Emit structured JSON logs.
Expose Prometheus metrics.
Create traces that explain downstream latency and failure paths.

That does not make the service perfect. It does make it far easier to operate, debug, and evolve under real traffic.

Before you ship

Before you ship the next API or feature, check for these basics:

Every request has a correlation ID that flows across services and appears in responses.
Logs are structured, centralised, and enriched with useful request context.
Key SLIs such as latency, throughput, and error rate exist for critical endpoints.
Traces show boundary spans for downstream calls and capture failure states.
Metrics, alerts, and dashboard assumptions are treated as part of delivery, not post-release cleanup.

That is what “observable from day zero” really means in practice

If you want to get comfortable building production-grade REST APIs in Go first, start with our Go Bootcamp.

What observable means

The key is that metrics, logs, and traces must reinforce each other. Metrics tell you that something moved, traces tell you where time went, and logs give you detailed request context.

The demo service

To make this a follow-along, we’ll build a tiny service with two endpoints:

GET /health for a quick health check.
GET /users/{id} that simulates a downstream dependency call.

By the end, the service will:

Return a request ID in every response.
Emit structured logs for every request.
Expose Prometheus metrics on /metrics.
Create traces that show request flow and downstream latency.

That design-first approach is the same mindset behind Why your Architecture should start with Questions, not boxes: the important operational choices are made before production traffic shows up.

Step 1: Start with the API contract

Use a single response envelope like this:

type APIResponse struct {
	Data      any    `json:"data,omitempty"`
	Error     string `json:"error,omitempty"`
	ErrorCode string `json:"error_code,omitempty"`
	RequestID string `json:"request_id"`
}

And a small helper to write responses consistently:

func writeJSON(w http.ResponseWriter, status int, resp APIResponse) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(resp)
}

A good rule here is to standardise three things early:

A machine-readable error code like validation_error or upstream_timeout.
A human-readable error string.
A request ID returned to the caller.

Checkpoint

At this stage, your API does not need OTEL or Prometheus yet. It should simply return a stable JSON shape for both success and error paths.

Step 2: Bootstrap tracing and metrics early

A simple OTEL tracer bootstrap in Go can look like this:

package telemetry

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func InitTracerProvider(ctx context.Context, serviceName string) (*sdktrace.TracerProvider, error) {
	exporter, err := otlptracehttp.New(ctx)
	if err != nil {
		return nil, err
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceName(serviceName),
		)),
	)

	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(
		propagation.NewCompositeTextMapPropagator(
			propagation.TraceContext{},
			propagation.Baggage{},
		),
	)

	return tp, nil
}

Expose Prometheus metrics on a separate endpoint:

go func() {
	mux := http.NewServeMux()
	mux.Handle("/metrics", promhttp.Handler())
	if err := http.ListenAndServe(":9090", mux); err != nil {
		log.Fatal(err)
	}
}()

This is a good place to add a note in the live post that /metrics can run on a separate internal port if you do not want to expose it on the public API surface.

Checkpoint

By now, your service should:

Start without handler changes.
Export traces to an OTLP-compatible backend.
Expose /metrics on port 9090.

Step 3: Assign and propagate request IDs

You cannot debug distributed systems without consistent request identity. Correlation IDs are what let one customer complaint become a searchable trail across logs, traces, and downstream calls.

A small middleware can attach a request ID to every request:

package middleware

import (
	"context"
	"net/http"

	"github.com/google/uuid"
)

type contextKey string

const RequestIDKey contextKey = "request_id"

func RequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		requestID := r.Header.Get("X-Request-Id")
		if requestID == "" {
			requestID = uuid.NewString()
		}

		ctx := context.WithValue(r.Context(), RequestIDKey, requestID)
		w.Header().Set("X-Request-Id", requestID)

		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

func GetRequestID(ctx context.Context) string {
	if v, ok := ctx.Value(RequestIDKey).(string); ok {
		return v
	}
	return ""
}

Then use it in your handlers:

func getUserHandler(w http.ResponseWriter, r *http.Request) {
	requestID := middleware.GetRequestID(r.Context())

	resp := APIResponse{
		Data: map[string]any{
			"id":   "123",
			"name": "Jane Doe",
		},
		RequestID: requestID,
	}

	writeJSON(w, http.StatusOK, resp)
}

Try it

curl -i http://localhost:8080/users/123

You should see:

X-Request-Id in the response headers.
request_id in the JSON body.

Step 4: Make logs structured and useful

Go’s slog is a good default:

logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
	Level: slog.LevelInfo,
}))

Log with request context in handlers:

func getUserHandler(logger *slog.Logger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		requestID := middleware.GetRequestID(r.Context())
		userID := r.PathValue("id")

		logger.Info("fetching user",
			"request_id", requestID,
			"endpoint", "/users/{id}",
			"user_id", userID,
			"method", r.Method,
		)

		resp := APIResponse{
			Data: map[string]any{
				"id":   userID,
				"name": "Jane Doe",
			},
			RequestID: requestID,
		}

		writeJSON(w, http.StatusOK, resp)
	}
}

The practical rule is simple:

Log in JSON.
Use consistent field names.
Add business context only when it helps explain behaviour.

Avoid two common mistakes:

Free-form log messages with the important values buried in strings.
Logging every tiny event with no consistent keys, which makes production search noisy and expensive.

Checkpoint

Make one request and verify that your logs include:

request_id
endpoint
method
user_id where relevant.

Step 5: Add tracing at the boundaries

Metrics tell you that something is wrong; traces tell you where the time went. For APIs that call databases, caches, or third-party services, tracing stops being optional very quickly.

Start a span in the handler:

var tracer = otel.Tracer("users-api")

func getUserHandler(logger *slog.Logger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, span := tracer.Start(r.Context(), "getUserHandler")
		defer span.End()

		userID := r.PathValue("id")
		requestID := middleware.GetRequestID(ctx)

		span.SetAttributes(
			attribute.String("http.method", r.Method),
			attribute.String("http.route", "/users/{id}"),
			attribute.String("app.request_id", requestID),
			attribute.String("app.user_id", userID),
		)

		user, err := fetchUserFromDownstream(ctx, userID)
		if err != nil {
			span.RecordError(err)
			span.SetStatus(codes.Error, "downstream failure")

			writeJSON(w, http.StatusBadGateway, APIResponse{
				Error:     "failed to fetch user",
				ErrorCode: "upstream_timeout",
				RequestID: requestID,
			})
			return
		}

		writeJSON(w, http.StatusOK, APIResponse{
			Data:      user,
			RequestID: requestID,
		})
	}
}

Wrap the downstream boundary too:

func fetchUserFromDownstream(ctx context.Context, userID string) (map[string]any, error) {
	ctx, span := tracer.Start(ctx, "fetchUserFromDownstream")
	defer span.End()

	time.Sleep(120 * time.Millisecond)

	return map[string]any{
		"id":   userID,
		"name": "Jane Doe",
	}, nil
}

The key idea is to instrument boundaries, not every tiny function. Handlers, external calls, DB queries, and cache lookups are usually enough to make traces useful without turning them noisy.

Checkpoint

By now, one request to /users/123 should produce:

A parent span for the handler.
A child span for the downstream fetch.
Shared request context between the trace and logs.

Step 6: Expose the right metrics

APIs produce a lot of numbers, but only a few matter every day. Start with SLI-shaped metrics: latency, throughput, and error rate.

A simple request counter and latency histogram in Prometheus might look like this:

var (
	httpRequestsTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Total number of HTTP requests",
		},
		[]string{"method", "route", "status"},
	)

	httpRequestDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "http_request_duration_seconds",
			Help:    "HTTP request latency",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"method", "route", "status"},
	)
)

And wrap handlers with instrumentation:

func instrument(route string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()

		rw := &statusRecorder{ResponseWriter: w, statusCode: http.StatusOK}
		next.ServeHTTP(rw, r)

		status := strconv.Itoa(rw.statusCode)
		duration := time.Since(start).Seconds()

		httpRequestsTotal.WithLabelValues(r.Method, route, status).Inc()
		httpRequestDuration.WithLabelValues(r.Method, route, status).Observe(duration)
	})
}

type statusRecorder struct {
	http.ResponseWriter
	statusCode int
}

func (r *statusRecorder) WriteHeader(statusCode int) {
	r.statusCode = statusCode
	r.ResponseWriter.WriteHeader(statusCode)
}

A useful note here is what not to label:

Do not label metrics with raw user_id or request_id, because that creates high-cardinality metrics that become hard to store and query efficiently.
Prefer stable, low-cardinality dimensions such as route, method, status, tenant tier, or region when needed.

This is also a natural spot to reference One2N’s CI/CD page or SRE Bootcamp, since these metrics become most useful when tied to release and reliability workflows.

Checkpoint

Hit the API a few times and then open /metrics. You should be able to find:

http_requests_total
http_request_duration_seconds.

Step 7: Break it on purpose

A follow-along post becomes more useful when readers can observe a failure, not just a healthy request path. So let’s simulate a flaky downstream dependency.

Change the downstream function:

func fetchUserFromDownstream(ctx context.Context, userID string) (map[string]any, error) {
	ctx, span := tracer.Start(ctx, "fetchUserFromDownstream")
	defer span.End()

	if userID == "500" {
		err := errors.New("downstream timeout")
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		time.Sleep(800 * time.Millisecond)
		return nil, err
	}

	time.Sleep(120 * time.Millisecond)

	return map[string]any{
		"id":   userID,
		"name": "Jane Doe",
	}, nil
}

Now call:

curl -i http://localhost:8080/users/500

You should now see the three signals line up:

The response returns an error payload with error_code=upstream_timeout and a request_id.
The logs show the same request ID and an error path.
The trace shows the failing downstream span and the extra latency.
The metrics reflect a slower request and a 502 response.

Step 8: Make observability part of delivery

If observability only appears at the end of implementation, it will always feel like overhead. The better pattern is to make it part of how endpoints are shipped.

A practical team checklist looks like this:

Every new endpoint returns a request ID.
Every handler emits structured logs with stable keys.
Every external dependency call has a span around it.
Every critical path exposes latency and error metrics.
Dashboards and alerts live in version control alongside code where possible.

What you should have now

At this point, your small Go API should do five useful things:

Return consistent response envelopes.
Attach and propagate request IDs.
Emit structured JSON logs.
Expose Prometheus metrics.
Create traces that explain downstream latency and failure paths.

That does not make the service perfect. It does make it far easier to operate, debug, and evolve under real traffic.

Before you ship

Before you ship the next API or feature, check for these basics:

Every request has a correlation ID that flows across services and appears in responses.
Logs are structured, centralised, and enriched with useful request context.
Key SLIs such as latency, throughput, and error rate exist for critical endpoints.
Traces show boundary spans for downstream calls and capture failure states.
Metrics, alerts, and dashboard assumptions are treated as part of delivery, not post-release cleanup.

That is what “observable from day zero” really means in practice.

Keywords

Observability, Go, OpenTelemetry, Prometheus, API
