#DevSecOps

Jun 18, 2025 | 5 min read

Implementing secure error handling in Go for B2B SaaS applications

Mohit Kumar

Software Engineer

In this post, I’ll share how our team tackled an interesting challenge in our Go-based microservices: Error Handling.

I know 'Error Handling' doesn't sound exciting, but getting it right is one of the most effective ways to prevent security leaks and stop your product from feeling unfinished and unprofessional. By overhauling our approach, we significantly improved our customer and developer experience while mitigating security issues.

The problem: incomprehensible errors and security leaks

Initially, our product's error messages were a major pain point. We received feedback from our customers that our application felt unpolished because when something went wrong, they were seeing raw, technical error messages.

The culprit was our error handling strategy. The messages revealed internal details, stack traces, and incomprehensible technical jargon. This created two critical problems:

A Poor User Experience (UX): Customers were frustrated by un-actionable errors they couldn't understand.
A Serious Security Risk: Exposing internal implementation details (like database error messages) is a security vulnerability that can be exploited by malicious actors.

Our initial setup

To understand the solution, let's first look at our setup:

We have a bunch of Go-based microservices.
We use OpenAPI to define our REST APIs.
We use a tool (oapi-codegen) to auto-generate handlers, request/response bodies, and other boilerplate code from our OpenAPI specifications.
Developers focus on writing the core service layer and business logic.
A single user-facing service acts as a gateway, orchestrating calls to our various internal, backend services.

Previously, our OpenAPI spec would define an error response for non-200 status codes, and the code to return an error looked like this:

# OpenAPI spec
post:
  summary: Set User Details
  responses:
    '200':
      $ref: '#/components/responses/SetUserSuccessResponse'
    # All error codes point to the same generic response
    '400':
      $ref: '#/components/responses/SetUserErrorResponse'
    '401':
      $ref: '#/components/responses/SetUserErrorResponse'
    '500':
      $ref: '#/components/responses/SetUserErrorResponse'

// How we returned errors
return server.SetUserErrorResponse{
	// This often exposed internal Go error messages directly
	Message: err.Error(),
}, nil

This approach had several clear downsides:

Exposing Internal Errors: As seen above, err.Error() could easily leak internal system details.
Bad User Experience: The messages were technical and unhelpful for end-users.
Lack of Standardization: Each developer or service could implement error responses differently, leading to inconsistency.
Painful Upstream Error Handling: The user-facing service struggled to correctly handle and translate errors coming from the internal services it called.

The solution: a centralized error library

To solve these problems, we built a Go error library that standardizes error handling across all our services. Its job is to create and return meaningful errors for REST APIs. The goal of the library, and of the related work I'll describe later, was to improve the experience not just for our users but also for our developers, who use it to return errors and later rely on those errors to track down bugs and issues.

The library exposes a function that creates a RestError. RestError satisfies Go's error interface and looks something like this:

type RestError struct {
	// Kind is the class of error, such as "database" or "permission".
	Kind Kind `json:"kind"`

	// Code is a human-readable, short representation of the error.
	Code Code `json:"code"`

	// Message is the user-friendly error message.
	Message string `json:"message"`
	// ... other useful fields like Param, Status, etc.
}

This structure allows us to categorize errors for easier identification and debugging:

The Kind field helps us understand the class of error (e.g., Database, Queue, Permission).
The Code field provides a more specific, machine-readable key (e.g., invalid_user_id). This is incredibly useful for searching and filtering logs, even while the user sees a friendly message.
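To make the shape concrete, here is a minimal sketch of a type like RestError satisfying Go's error interface. It is not the library's actual code: the Err field, the Unwrap method, the constant names, and the string-based Kind and Code types are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Kind and Code are modeled as plain strings here (an assumption).
type Kind string
type Code string

const (
	KindDatabase       Kind = "database"
	KindInvalidRequest Kind = "invalid_request"
)

// RestError mirrors the fields described in the post.
type RestError struct {
	Kind    Kind   `json:"kind"`
	Code    Code   `json:"code"`
	Message string `json:"message"`
	Err     error  `json:"-"` // hypothetical: underlying cause, never serialized
}

// Error makes *RestError satisfy the built-in error interface.
func (e *RestError) Error() string {
	return fmt.Sprintf("%s/%s: %s", e.Kind, e.Code, e.Message)
}

// Unwrap lets errors.Is and errors.As reach the underlying cause.
func (e *RestError) Unwrap() error { return e.Err }

func main() {
	cause := errors.New("no rows in result set")
	var err error = &RestError{
		Kind:    KindDatabase,
		Code:    "no_row_found",
		Message: "User does not have an API key",
		Err:     cause,
	}

	// Callers can recover the structured error and filter on Kind or Code.
	var re *RestError
	if errors.As(err, &re) {
		fmt.Println(re.Kind, re.Code)
	}
	// The original cause stays available for logs without being serialized.
	fmt.Println(errors.Is(err, cause))
}
```

Keeping the cause behind a `json:"-"` tag is one way to retain debugging detail in logs while keeping it out of the response body.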

With the new library, the developer experience became much simpler. Instead of manually building a response, a developer can now simply do this:

// Before: Manually creating a response struct
return server.SetUserErrorResponse{ Message: err.Error() }, nil

// After: Using the new error library
return errs.E(errs.InvalidRequest, errors.New("User does not have an API key"))

Our new library also included several developer-friendly features:

It automatically determines the correct HTTP status code from the Kind and Code provided.
It's forgiving. If a developer accidentally passes a raw internal error from our external dependencies (e.g., from Postgres or Redis), the library attempts to parse it into a structured RestError.
It handles nested errors and stack traces gracefully for easier debugging.
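The first and third features above could be sketched roughly as follows. This is an illustration under assumptions, not the library's implementation: the statusFor helper, the two-argument signature of E, and the Kind-to-status mapping are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

type Kind string

// Kind values assumed for this sketch.
const (
	InvalidRequest    Kind = "invalid_request"
	ResourceNotExists Kind = "resource_not_exist"
	Unauthenticated   Kind = "unauthenticated"
	Database          Kind = "database"
)

type RestError struct {
	Kind    Kind
	Message string
	Status  int
	Err     error
}

func (e *RestError) Error() string { return e.Message }
func (e *RestError) Unwrap() error { return e.Err }

// statusFor maps a Kind to an HTTP status; unknown kinds default to 500.
func statusFor(k Kind) int {
	switch k {
	case InvalidRequest:
		return http.StatusBadRequest
	case Unauthenticated:
		return http.StatusUnauthorized
	case ResourceNotExists:
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}

// E builds a RestError. If err already wraps a RestError, that one is
// returned unchanged so the original classification survives nesting.
func E(kind Kind, err error) *RestError {
	if err == nil {
		err = errors.New("unknown error")
	}
	var existing *RestError
	if errors.As(err, &existing) {
		return existing
	}
	return &RestError{Kind: kind, Message: err.Error(), Status: statusFor(kind), Err: err}
}

func main() {
	err := E(InvalidRequest, errors.New("User does not have an API key"))
	fmt.Println(err.Status, err.Kind, err.Message)

	// Wrapping the error again does not overwrite its Kind or Status.
	wrapped := E(Database, fmt.Errorf("set user: %w", err))
	fmt.Println(wrapped.Status, wrapped.Kind)
}
```

Deriving the status in one place means handlers never pick status codes by hand, which removes one source of inconsistency between services.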

Handling upstream errors

The library solved error creation within a single service, but what about errors returned by the upstream services?

Previously, our user-facing service handled upstream errors with a flawed process:

Check if the upstream HTTP response status code was not 200 OK.
If it was an error, un-marshal the response body.
Forward the upstream service's error message directly to the end-user.

This approach had two major problems:

Security Risk: It forwarded sensitive errors from our internal services, which is even riskier than exposing the user-facing service's own internal errors. For example, "Login failed" is safe to show to a user, but "LDAPFailed" is not, because it exposes internal details.
Inconsistent Formats: Each internal microservice that we talk to could return errors in different formats, making them impossible to handle reliably.

The second problem was solved by adopting our new library across all services. To solve the first problem, we introduced a translation function and a middleware.

With a new HandleUpstreamError function, our code became much cleaner:

if result.StatusCode() != http.StatusOK {
	// This function handles all the translation logic
	return nil, errorutils.HandleUpstreamError(ctx, result.StatusCode(), result.Body)
}

This function translates upstream errors from our internal services into the RestError format. If an error is not in the expected format, fallbacks create a RestError from whatever error response was received.
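A translation step like this might look roughly like the sketch below. It assumes the upstream body is JSON with kind, code, and message fields; the translateUpstream name, the simplified signature (no context), and the fallback values are hypothetical and not the real HandleUpstreamError.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type RestError struct {
	Kind    string `json:"kind"`
	Code    string `json:"code"`
	Message string `json:"message"`
	Status  int    `json:"status"`
}

func (e *RestError) Error() string { return e.Message }

// translateUpstream converts an upstream error response into a *RestError.
func translateUpstream(status int, body []byte) *RestError {
	var re RestError
	// Happy path: the upstream service already speaks the shared format.
	if err := json.Unmarshal(body, &re); err == nil && re.Kind != "" && re.Message != "" {
		if re.Status == 0 {
			re.Status = status
		}
		return &re
	}
	// Fallback: the body is not in the expected shape, so classify it as
	// internal instead of forwarding raw content.
	return &RestError{
		Kind:    "internal",
		Code:    "upstream_error",
		Message: http.StatusText(http.StatusInternalServerError),
		Status:  http.StatusInternalServerError,
	}
}

func main() {
	good := []byte(`{"kind":"resource_not_exist","code":"no_row_found","message":"User does not have an API key"}`)
	fmt.Println(translateUpstream(http.StatusNotFound, good).Kind)

	bad := []byte("<html>502 Bad Gateway</html>")
	fmt.Println(translateUpstream(http.StatusBadGateway, bad).Message)
}
```

Treating anything unparseable as an internal error is the conservative choice here: a malformed body is never echoed back to the caller.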

This ensures that any error leaving our user-facing service is properly formatted. But a crucial question remains: How do we prevent sensitive internal error details from leaking out, even if they are well-formatted?

We solved it in two steps:

We first modified our public APIs to return non-200 responses in a standard format; let's call it ErrorResponse.
We wrote a middleware that does a few things, namely:
- It intercepts the final RestError before it's sent to the user and converts it to an ErrorResponse.
- It performs intelligent filtering. Using the Kind field, it decides what to expose and what to hide.
  - If the Kind is Database, Queue, or another internal type, the middleware replaces the message with a generic "Internal Server Error" message.
  - If the Kind is ResourceNotExists, Validation, or Unauthenticated, it allows the descriptive, user-safe message to pass through.
- It handles the formatting of the error messages by:
  - Applying basic grammatical corrections (capitalization, punctuation).
  - Appending an appropriate suffix to the error message depending on the Kind. For example, for Kind ResourceNotExists the suffix is: "Please check your request or contact support if the issue persists."
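The filtering and formatting rules above can be sketched as a single conversion function that such a middleware would call. Everything here is an assumption for illustration: the toErrorResponse and polish names, the string kinds, the allow-list, and the empty suffixes for validation and unauthenticated errors.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

type RestError struct {
	Kind, Code, Message string
	Status              int
}

type ErrorResponse struct {
	Kind    string `json:"kind"`
	Code    string `json:"code"`
	Message string `json:"message"`
	Status  int    `json:"status"`
}

// exposedKinds lists the kinds that are safe to show, with their suffix.
var exposedKinds = map[string]string{
	"resource_not_exist": " Please check your request or contact support if the issue persists.",
	"validation":         "",
	"unauthenticated":    "",
}

// polish trims the message, capitalizes it, and adds a final period.
func polish(msg string) string {
	msg = strings.TrimSpace(msg)
	if msg == "" {
		return msg
	}
	r := []rune(msg)
	r[0] = unicode.ToUpper(r[0])
	msg = string(r)
	if !strings.ContainsRune(".!?", r[len(r)-1]) {
		msg += "."
	}
	return msg
}

// toErrorResponse hides every kind that is not explicitly allow-listed.
func toErrorResponse(e RestError) ErrorResponse {
	suffix, exposed := exposedKinds[e.Kind]
	if !exposed {
		return ErrorResponse{Kind: "internal", Code: "internal_error", Message: "Internal Server Error", Status: 500}
	}
	return ErrorResponse{Kind: e.Kind, Code: e.Code, Message: polish(e.Message) + suffix, Status: e.Status}
}

func main() {
	safe := RestError{Kind: "resource_not_exist", Code: "no_row_found", Message: "user does not have an API key", Status: 404}
	fmt.Println(toErrorResponse(safe).Message)

	internal := RestError{Kind: "database", Code: "conn_refused", Message: "dial tcp 10.0.0.5:5432: connection refused", Status: 500}
	fmt.Println(toErrorResponse(internal).Message)
}
```

Using an allow-list rather than a deny-list means a newly added internal Kind is hidden by default until someone decides it is safe to expose.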

Conclusion

The positive feedback from our customers confirmed the value of this effort. This error handling framework completely revamped our error messages, transforming a major pain point into a polished and professional feature. By building a system with empathy for both our customers and our developers, we created a more secure and robust product.

The impact can be best described by the change in our API responses.

Before: A leaky and unhelpful error.

{
	"message" : "error getting user api key, err: no rows in result set"
}

After: A clear, secure, and actionable response.

{
    "code": "no_row_found",
    "message": "User does not have an API key. Please check your request or contact support if the issue persists.",
    "kind": "resource_not_exist",
    "status": 404,
    "error_id": "1234567890"
}

Limitations

While the new system works well for us, it's important to acknowledge its challenges and limitations:

Developer Adoption: The approach forces developers to rethink how they handle errors. It is not intuitive without prior training and requires a conscious effort to adopt. We addressed this with tech huddles to demonstrate the process and its benefits.
Keeping Everyone Consistent: The new framework only works if our entire team uses it the same way every time. If developers slip back into old habits, we'll end up with a mix of different error styles, which defeats the whole purpose of creating a standard.

The way forward

Our work isn't done. Addressing the limitations is our primary focus moving forward:

Automating Compliance: The key question remains: "How do I know how much of my codebase is adhering to these standards?" Manually checking is not scalable. This is a good use case for static analysis. We are trying out AI-powered code review tools to automatically identify these kinds of inconsistencies and enforce our standards.
Continuous Improvement: We will continue to refine the library and the middleware, extending them to handle new dependencies and edge cases as our platform evolves.
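As one illustration of what a static check for the old pattern could look like, the sketch below uses Go's standard go/parser and go/ast packages to flag struct literals that set a Message field directly from an Error() call. It is a hypothetical starting point, not the tooling we use; the findLeakyMessages name and the pattern it matches are assumptions.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// findLeakyMessages returns the line numbers where a composite literal
// contains `Message: <expr>.Error()`.
func findLeakyMessages(src string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var lines []int
	ast.Inspect(file, func(n ast.Node) bool {
		kv, ok := n.(*ast.KeyValueExpr)
		if !ok {
			return true
		}
		key, ok := kv.Key.(*ast.Ident)
		if !ok || key.Name != "Message" {
			return true
		}
		call, ok := kv.Value.(*ast.CallExpr)
		if !ok {
			return true
		}
		// Match a zero-argument method call named Error on any receiver.
		if sel, ok := call.Fun.(*ast.SelectorExpr); ok && sel.Sel.Name == "Error" && len(call.Args) == 0 {
			lines = append(lines, fset.Position(kv.Pos()).Line)
		}
		return true
	})
	return lines, nil
}

const sample = `package handlers

func setUser(err error) (resp, error) {
	return resp{Message: err.Error()}, nil
}

type resp struct{ Message string }
`

func main() {
	lines, err := findLeakyMessages(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("possible leaks on lines:", lines)
}
```

A purely syntactic check like this will miss indirect cases (for example, a message assigned to a variable first), so it complements review rather than replacing it.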

In this post, I’ll share how our team tackled an interesting challenge in our Go-based microservices: Error Handling.

The problem: incomprehensible errors and security leaks

The culprit was our error handling strategy. The messages revealed internal details, stack traces, and non-understandable tech jargon. This created two critical problems:

A Poor User Experience (UX): Customers were frustrated by un-actionable errors they couldn't understand.
A Serious Security Risk: Exposing internal implementation details (like database error messages) is a security vulnerability that can be exploited by malicious actors.

Our initial setup

To understand the solution, let's first look at the current implementation:

We have a bunch of Go-based microservices.
We use OpenAPI to define our REST APIs.
We use a tool (oapi-codegen) to auto-generate handlers, request/response bodies, and other boilerplate code from our OpenAPI specifications.
Developers focus on writing the core service layer and business logic.
A single user-facing service acts as a gateway, orchestrating calls to our various internal, backend services.

Previously, our OpenAPI spec would define an error response for non-200 status codes, and the code to return an error looked like this:

// OpenAPI spec
post:
  summary: Set User Details
  responses:
    '200':
      $ref: '#/components/responses/SetUserSuccessResponse'
    // All error codes point to the same generic response
    '400':
      $ref: '#/components/responses/SetUserErrorResponse'
    '401':
      $ref: '#/components/responses/SetUserErrorResponse'
    '500':
      $ref: '#/components/responses/SetUserErrorResponse'

 // How we returned errors
 return server.SetUserErrorResponse{
		// This often exposed internal Go error messages directly
		Message: err.Error(),
 }, nil

This approach had several clear downsides:

Exposing Internal Errors: As seen above, err.Error() could easily leak internal system details.
Bad User Experience: The messages were technical and unhelpful for end-users.
Lack of Standardization: Each developer or service could implement error responses differently, leading to inconsistency.
Painful Upstream Error Handling: The user-facing service struggled to correctly handle and translate errors coming from the internal services it called.

The solution: a centralized error library

The library exposed a function which created a RestError. This RestError satisfied the Error interface and looked something like this:

type RestError struct {
	// Kind is the class of error, such as "database" or "permission".
	Kind Kind `json:"kind"`

	// Code is a human-readable, short representation of the error.
	Code Code `json:"code"`

	// Message is the user-friendly error message.
	Message string `json:"message"`
	// ... other useful fields like Param, Status, etc.

This structure allows us to categorize errors for easier identification and debugging:

The Kind field helps us understand the class of error (e.g., Database, Queue, Permission).
The Code field provides a more specific, machine-readable key (e.g., invalid_user_id). This is incredibly useful for searching and filtering logs, even while the user sees a friendly message.

With the new library, the developer experience became much simpler. Instead of manually building a response, a developer can now simply do this:

// Before: Manually creating a response struct
return server.SetUserErrorResponse{ Message: err.Error() }, nil

// After: Using the new error library
return errs.E(errs.InvalidRequest, errors.New("User does not have an API key"

Our new library also included several developer-friendly features:

It automatically determines the correct HTTP status code from the Kind and Code provided.
It's forgiving. If a developer accidentally passes a raw internal error from our external dependencies (e.g., from Postgres or Redis), the library attempts to parse it into a structured RestError.
It handles nested errors and stack traces gracefully for easier debugging.

Handling upstream errors

The library solved error creation within a single service, but what about errors returned by the upstream services?

Previously, our user-facing service handled upstream errors with a flawed process:

Check if the upstream HTTP response status code was not 200 OK.
If it was an error, un-marshal the response body.
Forward the upstream service's error message directly to the end-user.

This approach had two major problems:

Security Risk: It forwarded sensitive internal errors from the internal services, which is even riskier than sending the internal error of the user-facing service. For example: Login failed is still okay to be published, but LDAPFailed is not, because it exposes inner details.
Inconsistent Formats: Each internal microservice that we talk to could return errors in different formats, making them impossible to handle reliably.

The second problem was solved by adopting our new library across all services. To solve the first problem, we introduced a translation function and a middleware.

With a new HandleUpstreamError function, our code became much cleaner:

if result.StatusCode() != http.StatusOK {
    // This function handles all the translation logic
	return nil, errorutils.HandleUpstreamError(ctx, result.StatusCode(), result.Body

We solved it in 2 steps:

We first modified our public APIs to throw a non 200 response in a standard format, let’s call it ErrorResponse.
We wrote a middleware that does a few things, namely:
- It intercepts the final RestError before it's sent to the user and converts it to ErrorResponse
- It performs intelligent filtering. Using the Kind field, it decides what to expose and what to hide.
  - If the Kind is Database, Queue, or another internal type, the middleware replaces it with a generic "Internal Server Error" message.
  - If the Kind is ResourceNotExists, Validation, or Unauthenticated, it allows the descriptive, user-safe message to pass through.
- It handles the formatting of the error messages by:
  - Applying basic grammatical corrections (Capitalization, Punctuation).
  - Creating appropriate suffix for the error message depending on the kind of the error. An example could be: For Kind ResourceNotExists, the suffix is - Please check your request or contact support if the issue persists.

Conclusion

The impact can be best described by the change in our API responses.

Before: A leaky and unhelpful error.

{
	"message" : "error getting user api key, err: no rows in result set"
}

After: A clear, secure, and actionable response.

{
    "code": "no_row_found",
    "message": "User does not have an API key. Please check your request or contact support if the issue persists.",
    "kind": "resource_not_exist",
    "status": 404,
    "error_id": "1234567890"
}

Limitations

While the new system works well for us, it's important to acknowledge its challenges and limitations:

Developer Adoption: The approach forces developers to rethink how they handle errors. It is not intuitive without prior training and requires a conscious effort to adopt. We addressed this with tech huddles to demonstrate the process and its benefits.
Keeping Everyone Consistent: The new framework only works if our entire team uses it the same way every time. If developers slip back into old habits, we'll end up with a mix of different error styles, which defeats the whole purpose of creating a standard.

The way forward

Our work isn't done. Addressing the limitations is our primary focus moving forward:

Automating Compliance: The key question remains: "How do I know how much of my codebase is adhering to these standards?" Manually checking is not scalable. This is a good use case for static analysis. We are trying out AI-powered code review tools to automatically identify these kinds of inconsistencies and enforce our standards.
Continuous Improvement: We will continue to refine the library and the middleware, extending them to handle new dependencies and edge cases as our platform evolves.

In this post, I’ll share how our team tackled an interesting challenge in our Go-based microservices: Error Handling.

The problem: incomprehensible errors and security leaks

The culprit was our error handling strategy. The messages revealed internal details, stack traces, and non-understandable tech jargon. This created two critical problems:

A Poor User Experience (UX): Customers were frustrated by un-actionable errors they couldn't understand.
A Serious Security Risk: Exposing internal implementation details (like database error messages) is a security vulnerability that can be exploited by malicious actors.

Our initial setup

To understand the solution, let's first look at the current implementation:

We have a bunch of Go-based microservices.
We use OpenAPI to define our REST APIs.
We use a tool (oapi-codegen) to auto-generate handlers, request/response bodies, and other boilerplate code from our OpenAPI specifications.
Developers focus on writing the core service layer and business logic.
A single user-facing service acts as a gateway, orchestrating calls to our various internal, backend services.

Previously, our OpenAPI spec would define an error response for non-200 status codes, and the code to return an error looked like this:

// OpenAPI spec
post:
  summary: Set User Details
  responses:
    '200':
      $ref: '#/components/responses/SetUserSuccessResponse'
    // All error codes point to the same generic response
    '400':
      $ref: '#/components/responses/SetUserErrorResponse'
    '401':
      $ref: '#/components/responses/SetUserErrorResponse'
    '500':
      $ref: '#/components/responses/SetUserErrorResponse'

 // How we returned errors
 return server.SetUserErrorResponse{
		// This often exposed internal Go error messages directly
		Message: err.Error(),
 }, nil

This approach had several clear downsides:

Exposing Internal Errors: As seen above, err.Error() could easily leak internal system details.
Bad User Experience: The messages were technical and unhelpful for end-users.
Lack of Standardization: Each developer or service could implement error responses differently, leading to inconsistency.
Painful Upstream Error Handling: The user-facing service struggled to correctly handle and translate errors coming from the internal services it called.

The solution: a centralized error library

The library exposed a function which created a RestError. This RestError satisfied the Error interface and looked something like this:

type RestError struct {
	// Kind is the class of error, such as "database" or "permission".
	Kind Kind `json:"kind"`

	// Code is a human-readable, short representation of the error.
	Code Code `json:"code"`

	// Message is the user-friendly error message.
	Message string `json:"message"`
	// ... other useful fields like Param, Status, etc.

This structure allows us to categorize errors for easier identification and debugging:

The Kind field helps us understand the class of error (e.g., Database, Queue, Permission).
The Code field provides a more specific, machine-readable key (e.g., invalid_user_id). This is incredibly useful for searching and filtering logs, even while the user sees a friendly message.

With the new library, the developer experience became much simpler. Instead of manually building a response, a developer can now simply do this:

// Before: Manually creating a response struct
return server.SetUserErrorResponse{ Message: err.Error() }, nil

// After: Using the new error library
return errs.E(errs.InvalidRequest, errors.New("User does not have an API key"

Our new library also included several developer-friendly features:

It automatically determines the correct HTTP status code from the Kind and Code provided.
It's forgiving. If a developer accidentally passes a raw internal error from our external dependencies (e.g., from Postgres or Redis), the library attempts to parse it into a structured RestError.
It handles nested errors and stack traces gracefully for easier debugging.

Handling upstream errors

The library solved error creation within a single service, but what about errors returned by the upstream services?

Previously, our user-facing service handled upstream errors with a flawed process:

Check if the upstream HTTP response status code was not 200 OK.
If it was an error, un-marshal the response body.
Forward the upstream service's error message directly to the end-user.

This approach had two major problems:

Security Risk: It forwarded sensitive internal errors from the internal services, which is even riskier than sending the internal error of the user-facing service. For example: Login failed is still okay to be published, but LDAPFailed is not, because it exposes inner details.
Inconsistent Formats: Each internal microservice that we talk to could return errors in different formats, making them impossible to handle reliably.

The second problem was solved by adopting our new library across all services. To solve the first problem, we introduced a translation function and a middleware.

With a new HandleUpstreamError function, our code became much cleaner:

if result.StatusCode() != http.StatusOK {
    // This function handles all the translation logic
	return nil, errorutils.HandleUpstreamError(ctx, result.StatusCode(), result.Body

We solved it in 2 steps:

We first modified our public APIs to throw a non 200 response in a standard format, let’s call it ErrorResponse.
We wrote a middleware that does a few things, namely:
- It intercepts the final RestError before it's sent to the user and converts it to ErrorResponse
- It performs intelligent filtering. Using the Kind field, it decides what to expose and what to hide.
  - If the Kind is Database, Queue, or another internal type, the middleware replaces it with a generic "Internal Server Error" message.
  - If the Kind is ResourceNotExists, Validation, or Unauthenticated, it allows the descriptive, user-safe message to pass through.
- It handles the formatting of the error messages by:
  - Applying basic grammatical corrections (Capitalization, Punctuation).
  - Creating appropriate suffix for the error message depending on the kind of the error. An example could be: For Kind ResourceNotExists, the suffix is - Please check your request or contact support if the issue persists.

Conclusion

The impact can be best described by the change in our API responses.

Before: A leaky and unhelpful error.

{
	"message" : "error getting user api key, err: no rows in result set"
}

After: A clear, secure, and actionable response.

{
    "code": "no_row_found",
    "message": "User does not have an API key. Please check your request or contact support if the issue persists.",
    "kind": "resource_not_exist",
    "status": 404,
    "error_id": "1234567890"
}

Limitations

While the new system works well for us, it's important to acknowledge its challenges and limitations:

Developer Adoption: The approach forces developers to rethink how they handle errors. It is not intuitive without prior training and requires a conscious effort to adopt. We addressed this with tech huddles to demonstrate the process and its benefits.
Keeping Everyone Consistent: The new framework only works if our entire team uses it the same way every time. If developers slip back into old habits, we'll end up with a mix of different error styles, which defeats the whole purpose of creating a standard.

The way forward

Our work isn't done. Addressing the limitations is our primary focus moving forward:

Automating Compliance: The key question remains: "How do I know how much of my codebase is adhering to these standards?" Manually checking is not scalable. This is a good use case for static analysis. We are trying out AI-powered code review tools to automatically identify these kinds of inconsistencies and enforce our standards.
Continuous Improvement: We will continue to refine the library and the middleware, extending them to handle new dependencies and edge cases as our platform evolves.

In this post, I’ll share how our team tackled an interesting challenge in our Go-based microservices: Error Handling.

The problem: incomprehensible errors and security leaks

The culprit was our error handling strategy. The messages revealed internal details, stack traces, and non-understandable tech jargon. This created two critical problems:

A Poor User Experience (UX): Customers were frustrated by un-actionable errors they couldn't understand.
A Serious Security Risk: Exposing internal implementation details (like database error messages) is a security vulnerability that can be exploited by malicious actors.

Our initial setup

To understand the solution, let's first look at the current implementation:

We have a bunch of Go-based microservices.
We use OpenAPI to define our REST APIs.
We use a tool (oapi-codegen) to auto-generate handlers, request/response bodies, and other boilerplate code from our OpenAPI specifications.
Developers focus on writing the core service layer and business logic.
A single user-facing service acts as a gateway, orchestrating calls to our various internal, backend services.

Previously, our OpenAPI spec would define an error response for non-200 status codes, and the code to return an error looked like this:

// OpenAPI spec
post:
  summary: Set User Details
  responses:
    '200':
      $ref: '#/components/responses/SetUserSuccessResponse'
    // All error codes point to the same generic response
    '400':
      $ref: '#/components/responses/SetUserErrorResponse'
    '401':
      $ref: '#/components/responses/SetUserErrorResponse'
    '500':
      $ref: '#/components/responses/SetUserErrorResponse'

 // How we returned errors
 return server.SetUserErrorResponse{
		// This often exposed internal Go error messages directly
		Message: err.Error(),
 }, nil

This approach had several clear downsides:

Exposing Internal Errors: As seen above, err.Error() could easily leak internal system details.
Bad User Experience: The messages were technical and unhelpful for end-users.
Lack of Standardization: Each developer or service could implement error responses differently, leading to inconsistency.
Painful Upstream Error Handling: The user-facing service struggled to correctly handle and translate errors coming from the internal services it called.

The solution: a centralized error library

The library exposed a function which created a RestError. This RestError satisfied the Error interface and looked something like this:

type RestError struct {
	// Kind is the class of error, such as "database" or "permission".
	Kind Kind `json:"kind"`

	// Code is a human-readable, short representation of the error.
	Code Code `json:"code"`

	// Message is the user-friendly error message.
	Message string `json:"message"`
	// ... other useful fields like Param, Status, etc.

This structure allows us to categorize errors for easier identification and debugging:

The Kind field helps us understand the class of error (e.g., Database, Queue, Permission).
The Code field provides a more specific, machine-readable key (e.g., invalid_user_id). This is incredibly useful for searching and filtering logs, even while the user sees a friendly message.

With the new library, the developer experience became much simpler. Instead of manually building a response, a developer can now simply do this:

// Before: Manually creating a response struct
return server.SetUserErrorResponse{ Message: err.Error() }, nil

// After: Using the new error library
return errs.E(errs.InvalidRequest, errors.New("User does not have an API key"

Our new library also included several developer-friendly features:

It automatically determines the correct HTTP status code from the Kind and Code provided.
It's forgiving. If a developer accidentally passes a raw internal error from our external dependencies (e.g., from Postgres or Redis), the library attempts to parse it into a structured RestError.
It handles nested errors and stack traces gracefully for easier debugging.

Handling upstream errors

The library solved error creation within a single service, but what about errors returned by the upstream services?

Previously, our user-facing service handled upstream errors with a flawed process:

Check if the upstream HTTP response status code was not 200 OK.
If it was an error, un-marshal the response body.
Forward the upstream service's error message directly to the end-user.

This approach had two major problems:

Security Risk: It forwarded sensitive internal errors from the internal services, which is even riskier than sending the internal error of the user-facing service. For example: Login failed is still okay to be published, but LDAPFailed is not, because it exposes inner details.
Inconsistent Formats: Each internal microservice that we talk to could return errors in different formats, making them impossible to handle reliably.

The second problem was solved by adopting our new library across all services. To solve the first problem, we introduced a translation function and a middleware.

With a new HandleUpstreamError function, our code became much cleaner:

if result.StatusCode() != http.StatusOK {
    // This function handles all the translation logic
	return nil, errorutils.HandleUpstreamError(ctx, result.StatusCode(), result.Body

We solved the first problem in two steps:

We first modified our public APIs to return non-200 responses in a standard format; let's call it ErrorResponse.
We wrote a middleware that does a few things:
- It intercepts the final RestError before it's sent to the user and converts it to an ErrorResponse.
- It filters what the user sees. Using the Kind field, it decides what to expose and what to hide.
  - If the Kind is Database, Queue, or another internal type, the middleware replaces the message with a generic "Internal Server Error".
  - If the Kind is ResourceNotExists, Validation, or Unauthenticated, it lets the descriptive, user-safe message pass through.
- It formats the error message by:
  - Applying basic grammatical corrections (capitalization, punctuation).
  - Appending a suffix that depends on the Kind of the error. For example, for Kind ResourceNotExists, the suffix is "Please check your request or contact support if the issue persists."
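The filtering and formatting rules above can be sketched as a single conversion function, which is the core of such a middleware. This is a simplified illustration rather than the real implementation: the string values of the kinds, the ErrorResponse fields, and the polish and toErrorResponse helpers are assumptions.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// RestError is a reduced stand-in for the library type.
type RestError struct {
	Kind    string
	Code    string
	Message string
	Status  int
}

// ErrorResponse is the assumed shape of the public error payload.
type ErrorResponse struct {
	Code    string `json:"code"`
	Message string `json:"message"`
	Kind    string `json:"kind"`
	Status  int    `json:"status"`
}

// userSafeKinds lists the kinds whose messages may be shown to users.
var userSafeKinds = map[string]bool{
	"resource_not_exist": true,
	"validation":         true,
	"unauthenticated":    true,
}

// suffixes holds an optional per-kind hint appended to the message.
var suffixes = map[string]string{
	"resource_not_exist": "Please check your request or contact support if the issue persists.",
}

// polish capitalizes the first letter and ensures terminal punctuation.
func polish(msg string) string {
	msg = strings.TrimSpace(msg)
	if msg == "" {
		return msg
	}
	r := []rune(msg)
	r[0] = unicode.ToUpper(r[0])
	msg = string(r)
	if !strings.ContainsRune(".!?", r[len(r)-1]) {
		msg += "."
	}
	return msg
}

// toErrorResponse hides internal kinds and formats user-safe ones.
func toErrorResponse(e RestError) ErrorResponse {
	if !userSafeKinds[e.Kind] {
		// Database, Queue, and any unknown kind end up here.
		return ErrorResponse{Code: "internal_error", Message: "Internal Server Error", Kind: "internal", Status: 500}
	}
	msg := polish(e.Message)
	if s, ok := suffixes[e.Kind]; ok {
		msg += " " + s
	}
	return ErrorResponse{Code: e.Code, Message: msg, Kind: e.Kind, Status: e.Status}
}

func main() {
	safe := RestError{Kind: "resource_not_exist", Code: "no_row_found", Message: "user does not have an API key", Status: 404}
	fmt.Println(toErrorResponse(safe).Message)
	// prints: User does not have an API key. Please check your request or contact support if the issue persists.

	internal := RestError{Kind: "database", Message: "no rows in result set", Status: 500}
	fmt.Println(toErrorResponse(internal).Message)
	// prints: Internal Server Error
}
```

Using an allowlist of user-safe kinds, rather than a blocklist of internal ones, means a newly added Kind is hidden by default until someone marks it as safe.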

Conclusion

The impact is best illustrated by the change in our API responses.

Before: A leaky and unhelpful error.

{
    "message": "error getting user api key, err: no rows in result set"
}

After: A clear, secure, and actionable response.

{
    "code": "no_row_found",
    "message": "User does not have an API key. Please check your request or contact support if the issue persists.",
    "kind": "resource_not_exist",
    "status": 404,
    "error_id": "1234567890"
}

Limitations

While the new system works well for us, it's important to acknowledge its challenges and limitations:

Developer Adoption: The approach forces developers to rethink how they handle errors. It is not intuitive without prior training and requires a conscious effort to adopt. We addressed this with tech huddles to demonstrate the process and its benefits.
Keeping Everyone Consistent: The new framework only works if our entire team uses it the same way every time. If developers slip back into old habits, we'll end up with a mix of different error styles, which defeats the whole purpose of creating a standard.

The way forward

Our work isn't done. Addressing the limitations is our primary focus moving forward:

Automating Compliance: The key question remains: "How do I know how much of my codebase is adhering to these standards?" Manually checking is not scalable. This is a good use case for static analysis. We are trying out AI-powered code review tools to automatically identify these kinds of inconsistencies and enforce our standards.
Continuous Improvement: We will continue to refine the library and the middleware, extending them to handle new dependencies and edge cases as our platform evolves.
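As an illustration of the static analysis idea (separate from the AI-powered tools mentioned above), here is a minimal sketch that uses Go's standard go/ast package to flag the old pattern of filling a Message field directly from an Error() call. The findLeakyMessages function and the embedded sample source are hypothetical, and a production check would need type information to avoid false positives.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// sample is a hypothetical handler that still uses the old pattern.
const sample = `package server

func handler(err error) (SetUserErrorResponse, error) {
	return SetUserErrorResponse{Message: err.Error()}, nil
}
`

// findLeakyMessages returns the positions of struct literal fields named
// Message whose value is a direct call to a method named Error.
func findLeakyMessages(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var hits []string
	ast.Inspect(file, func(n ast.Node) bool {
		kv, ok := n.(*ast.KeyValueExpr)
		if !ok {
			return true
		}
		key, ok := kv.Key.(*ast.Ident)
		if !ok || key.Name != "Message" {
			return true
		}
		call, ok := kv.Value.(*ast.CallExpr)
		if !ok {
			return true
		}
		if sel, ok := call.Fun.(*ast.SelectorExpr); ok && sel.Sel.Name == "Error" {
			hits = append(hits, fset.Position(kv.Pos()).String())
		}
		return true
	})
	return hits, nil
}

func main() {
	hits, err := findLeakyMessages("handler.go", sample)
	if err != nil {
		panic(err)
	}
	for _, h := range hits {
		fmt.Println(h, "Message is set from an Error() call")
	}
}
```

A check like this only answers the narrow question of where raw error strings still reach a response. It could run in CI alongside code review to track how much of the codebase has moved to the new library.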

