In this post, I’ll share how our team tackled an interesting challenge in our Go-based microservices: Error Handling.
I know 'Error Handling' doesn't sound exciting, but getting it right is one of the most effective ways to prevent security leaks and stop your product from feeling unfinished and unprofessional. By overhauling our approach, we significantly improved our customer and developer experience while mitigating security issues.
The Problem: Incomprehensible Errors and Security Leaks
Initially, our product's error messages were a major pain point. We received feedback from our customers that our application felt unpolished because when something went wrong, they were seeing raw, technical error messages.
The culprit was our error-handling strategy. The messages exposed internal details, stack traces, and incomprehensible technical jargon. This created two critical problems:
A Poor User Experience (UX): Customers were frustrated by unactionable errors they couldn't understand.
A Serious Security Risk: Exposing internal implementation details (like database error messages) is an information-disclosure vulnerability that malicious actors can exploit.
Our Initial Setup
To understand the solution, let's first look at the current implementation:
We have a bunch of Go-based microservices.
We use OpenAPI to define our REST APIs.
We use a tool (oapi-codegen) to auto-generate handlers, request/response bodies, and other boilerplate code from our OpenAPI specifications. Developers focus on writing the core service layer and business logic.
A single user-facing service acts as a gateway, orchestrating calls to our various internal, backend services.
Previously, our OpenAPI spec would define an error response for non-200 status codes, and the code to return an error looked like this:
```yaml
# OpenAPI spec
post:
  summary: Set User Details
  responses:
    '200':
      $ref: '#/components/responses/SetUserSuccessResponse'
    # All error codes point to the same generic response
    '400':
      $ref: '#/components/responses/SetUserErrorResponse'
    '401':
      $ref: '#/components/responses/SetUserErrorResponse'
    '500':
      $ref: '#/components/responses/SetUserErrorResponse'
```

```go
// How we returned errors
return server.SetUserErrorResponse{
	// This often exposed internal Go error messages directly
	Message: err.Error(),
}, nil
```
This approach had several clear downsides:
Exposing Internal Errors: As seen above, err.Error() could easily leak internal system details.
Bad User Experience: The messages were technical and unhelpful for end users.
Lack of Standardization: Each developer or service could implement error responses differently, leading to inconsistency.
Painful Upstream Error Handling: The user-facing service struggled to correctly handle and translate errors coming from the internal services it called.
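To make the err.Error() leak concrete, here is a small sketch (not our production code) of how idiomatic error wrapping ends up exposing database details once the error string is copied into a response. The getAPIKey and leakyMessage helpers are hypothetical, and the standard library's sql.ErrNoRows stands in for a real driver error.

```go
package main

import (
	"database/sql"
	"fmt"
)

// getAPIKey is a hypothetical data-access helper. It wraps the driver
// error with context using %w, which is idiomatic Go.
func getAPIKey(userID string) (string, error) {
	// Simulate a lookup that finds no row for this user.
	return "", fmt.Errorf("error getting user api key, err: %w", sql.ErrNoRows)
}

// leakyMessage mimics the old handler: it copies err.Error() straight
// into the user-facing message.
func leakyMessage(userID string) string {
	_, err := getAPIKey(userID)
	return err.Error()
}

func main() {
	// Prints the full wrapped chain, including the database/sql message.
	fmt.Println(leakyMessage("user-123"))
}
```

Each layer that wraps the error makes the chain more useful to developers and more revealing to anyone who reads the response body.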
The Solution: A Centralized Error Library
To solve these problems, we built a new Go error library to standardize error handling across all our services. Its purpose was to create and return meaningful errors from our REST APIs. The goal of this library, and of the related work I'll describe later, was to improve the experience not only for users but also for our developers, who use the library to return errors and then rely on those errors to identify bugs and issues.
The library exposed a function that created a RestError. This RestError type satisfied Go's built-in error interface and looked something like this:
```go
type RestError struct {
	// Kind is the class of error, such as "database" or "permission".
	Kind Kind `json:"kind"`
	// Code is a short, machine-readable identifier for the error.
	Code Code `json:"code"`
	// Message is the user-friendly error message.
	Message string `json:"message"`
	// ... other useful fields like Param, Status, etc.
}
```
This structure allows us to categorize errors for easier identification and debugging:
The Kind field helps us understand the class of error (e.g., Database, Queue, Permission).
The Code field provides a more specific, machine-readable key (e.g., invalid_user_id). This is incredibly useful for searching and filtering logs, even while the user sees a friendly message.
With the new library, the developer experience became much simpler. Instead of manually building a response, a developer can now simply do this:
```go
// Before: manually creating a response struct
return server.SetUserErrorResponse{Message: err.Error()}, nil

// After: using the new error library
return errs.E(errs.InvalidRequest, errors.New("User does not have an API key"))
```
Our new library also included several developer-friendly features:
It automatically determines the correct HTTP status code from the Kind and Code provided.
It's forgiving. If a developer accidentally passes a raw internal error from our external dependencies (e.g., from Postgres or Redis), the library attempts to parse it into a structured RestError.
It handles nested errors and stack traces gracefully for easier debugging.
Handling Upstream Errors
The library solved error creation within a single service, but what about errors returned by the upstream services?
Previously, our user-facing service handled upstream errors with a flawed process:
Check whether the upstream HTTP response status code was anything other than 200 OK.
If it was an error, unmarshal the response body.
Forward the upstream service's error message directly to the end-user.
This approach had two major problems:
Security Risk: It forwarded sensitive errors from internal services, which is even riskier than leaking the user-facing service's own internal errors. For example, "Login failed" is fine to show to a user, but "LDAPFailed" is not, because it exposes implementation details.
Inconsistent Formats: Each internal microservice we talk to could return errors in a different format, making them impossible to handle reliably.
The second problem was solved by adopting our new library across all services. To solve the first problem, we introduced a translation function and a middleware.
With a new HandleUpstreamError function, our code became much cleaner:
```go
if result.StatusCode() != http.StatusOK {
	// This function handles all the translation logic
	return nil, errorutils.HandleUpstreamError(ctx, result.StatusCode(), result.Body)
}
```
This function translates errors from our internal services into the RestError format. If an upstream error is not in the expected format, fallbacks still build a RestError from whatever error response it received.
This ensures that any error leaving our user-facing service is properly formatted. But a crucial question remains: How do we prevent sensitive internal error details from leaking out, even if they are well-formatted?
We solved this in two steps:
First, we modified our public APIs to return every non-200 response in a single standard format; let's call it ErrorResponse.
Second, we wrote a middleware that does a few things:
It intercepts the final RestError before it's sent to the user and converts it to an ErrorResponse.
It performs intelligent filtering, using the Kind field to decide what to expose and what to hide. If the Kind is Database, Queue, or another internal type, the middleware replaces the message with a generic "Internal Server Error" message. If the Kind is ResourceNotExists, Validation, or Unauthenticated, it lets the descriptive, user-safe message pass through.
It formats the error messages by:
Applying basic grammatical corrections (capitalization, punctuation).
Appending a suffix that depends on the error's Kind. For example, for Kind ResourceNotExists, the suffix is "Please check your request or contact support if the issue persists."
Conclusion
The positive feedback from our customers confirmed the value of this effort. This error handling framework completely revamped our error messages, transforming a major pain point into a polished and professional feature. By building a system with empathy for both our customers and our developers, we created a more secure and robust product.
The impact is best illustrated by the change in our API responses.
Before: A leaky and unhelpful error.
```json
{ "message": "error getting user api key, err: no rows in result set" }
```
After: A clear, secure, and actionable response.
```json
{
  "code": "no_row_found",
  "message": "User does not have an API key. Please check your request or contact support if the issue persists.",
  "kind": "resource_not_exist",
  "status": 404,
  "error_id": "1234567890"
}
```
Limitations
While the new system works well for us, it's important to acknowledge its challenges and limitations:
Developer Adoption: The approach forces developers to rethink how they handle errors. It is not intuitive without prior training and requires a conscious effort to adopt. We addressed this with tech huddles to demonstrate the process and its benefits.
Keeping Everyone Consistent: The new framework only works if our entire team uses it the same way every time. If developers slip back into old habits, we'll end up with a mix of different error styles, which defeats the whole purpose of creating a standard.
The Way Forward
Our work isn't done. Addressing the limitations is our primary focus moving forward:
Automating Compliance: The key question remains: "How do I know how much of my codebase is adhering to these standards?" Manually checking is not scalable. This is a good use case for static analysis. We are trying out AI-powered code review tools to automatically identify these kinds of inconsistencies and enforce our standards.
Continuous Improvement: We will continue to refine the library and the middleware, extending them to handle new dependencies and edge cases as our platform evolves.
In this post, I’ll share how our team tackled an interesting challenge in our Go-based microservices: Error Handling.
I know 'Error Handling' doesn't sound exciting, but getting it right is one of the most effective ways to prevent security leaks and stop your product from feeling unfinished and unprofessional. By overhauling our approach, we significantly improved our customer and developer experience while mitigating security issues.
The Problem: Incomprehensible Errors and Security Leaks
Initially, our product's error messages were a major pain point. We received feedback from our customers that our application felt unpolished because when something went wrong, they were seeing raw, technical error messages.
The culprit was our error handling strategy. The messages revealed internal details, stack traces, and non-understandable tech jargon. This created two critical problems:
A Poor User Experience (UX): Customers were frustrated by un-actionable errors they couldn't understand.
A Serious Security Risk: Exposing internal implementation details (like database error messages) is a security vulnerability that can be exploited by malicious actors.
Our Initial Setup
To understand the solution, let's first look at the current implementation:
We have a bunch of Go-based microservices.
We use OpenAPI to define our REST APIs.
We use a tool (
oapi-codegen
) to auto-generate handlers, request/response bodies, and other boilerplate code from our OpenAPI specifications.Developers focus on writing the core service layer and business logic.
A single user-facing service acts as a gateway, orchestrating calls to our various internal, backend services.
Previously, our OpenAPI spec would define an error response for non-200 status codes, and the code to return an error looked like this:
// OpenAPI spec post: summary: Set User Details responses: '200': $ref: '#/components/responses/SetUserSuccessResponse' // All error codes point to the same generic response '400': $ref: '#/components/responses/SetUserErrorResponse' '401': $ref: '#/components/responses/SetUserErrorResponse' '500': $ref: '#/components/responses/SetUserErrorResponse' // How we returned errors return server.SetUserErrorResponse{ // This often exposed internal Go error messages directly Message: err.Error(), }, nil
This approach had several clear downsides:
Exposing Internal Errors: As seen above,
err.Error()
could easily leak internal system details.Bad User Experience: The messages were technical and unhelpful for end-users.
Lack of Standardization: Each developer or service could implement error responses differently, leading to inconsistency.
Painful Upstream Error Handling: The user-facing service struggled to correctly handle and translate errors coming from the internal services it called.
The Solution: A Centralized Error Library
To solve these problems, we built a new Go error library to standardize error handling across all our services. It was meant for creating and sending out meaningful errors for REST APIs. The goal of this library and other work that went behind this initiative, which I’ll talk about later, was to make the experience better not just for the users, but also for our developers who’ll use this library to send errors and then use those errors later to identify bugs and issues.
The library exposed a function which created a RestError
. This RestError satisfied the Error interface and looked something like this:
type RestError struct { // Kind is the class of error, such as "database" or "permission". Kind Kind `json:"kind"` // Code is a human-readable, short representation of the error. Code Code `json:"code"` // Message is the user-friendly error message. Message string `json:"message"` // ... other useful fields like Param, Status, etc.
This structure allows us to categorize errors for easier identification and debugging:
The
Kind
field helps us understand the class of error (e.g.,Database
,Queue
,Permission
).The
Code
field provides a more specific, machine-readable key (e.g.,invalid_user_id
). This is incredibly useful for searching and filtering logs, even while the user sees a friendly message.
With the new library, the developer experience became much simpler. Instead of manually building a response, a developer can now simply do this:
// Before: Manually creating a response struct return server.SetUserErrorResponse{ Message: err.Error() }, nil // After: Using the new error library return errs.E(errs.InvalidRequest, errors.New("User does not have an API key"
Our new library also included several developer-friendly features:
It automatically determines the correct HTTP status code from the
Kind
andCode
provided.It's forgiving. If a developer accidentally passes a raw internal error from our external dependencies (e.g., from Postgres or Redis), the library attempts to parse it into a structured
RestError
.It handles nested errors and stack traces gracefully for easier debugging.
Handling Upstream Errors
The library solved error creation within a single service, but what about errors returned by the upstream services?
Previously, our user-facing service handled upstream errors with a flawed process:
Check if the upstream HTTP response status code was not
200 OK
.If it was an error, un-marshal the response body.
Forward the upstream service's error message directly to the end-user.
This approach had two major problems:
Security Risk: It forwarded sensitive internal errors from the internal services, which is even riskier than sending the internal error of the user-facing service. For example:
Login failed
is still okay to be published, butLDAPFailed
is not, because it exposes inner details.Inconsistent Formats: Each internal microservice that we talk to could return errors in different formats, making them impossible to handle reliably.
The second problem was solved by adopting our new library across all services. To solve the first problem, we introduced a translation function and a middleware.
With a new HandleUpstreamError
function, our code became much cleaner:
if result.StatusCode() != http.StatusOK { // This function handles all the translation logic return nil, errorutils.HandleUpstreamError(ctx, result.StatusCode(), result.Body
This function translates the upstream errors from our internal services to the RestError format. If the error was not in a proper format, it has fallbacks in place to handle the scenario and create a RestError from the error response that it received.
This ensures that any error leaving our user-facing service is properly formatted. But a crucial question remains: How do we prevent sensitive internal error details from leaking out, even if they are well-formatted?
We solved it in 2 steps:
We first modified our public APIs to throw a non 200 response in a standard format, let’s call it
ErrorResponse
.We wrote a middleware that does a few things, namely:
It intercepts the final
RestError
before it's sent to the user and converts it toErrorResponse
It performs intelligent filtering. Using the
Kind
field, it decides what to expose and what to hide.If the
Kind
isDatabase
,Queue
, or another internal type, the middleware replaces it with a generic "Internal Server Error" message.If the
Kind
isResourceNotExists
,Validation
, orUnauthenticated
, it allows the descriptive, user-safe message to pass through.
It handles the formatting of the error messages by:
Applying basic grammatical corrections (Capitalization, Punctuation).
Creating appropriate suffix for the error message depending on the kind of the error. An example could be: For Kind
ResourceNotExists
, the suffix is -Please check your request or contact support if the issue persists.
Conclusion
The positive feedback from our customers confirmed the value of this effort. This error handling framework completely revamped our error messages, transforming a major pain point into a polished and professional feature. By building a system with empathy for both our customers and our developers, we created a more secure and robust product.
The impact can be best described by the change in our API responses.
Before: A leaky and unhelpful error.
{ "message" : "error getting user api key, err: no rows in result set" }
After: A clear, secure, and actionable response.
{ "code": "no_row_found", "message": "User does not have an API key. Please check your request or contact support if the issue persists.", "kind": "resource_not_exist", "status": 404, "error_id": "1234567890" }
Limitations
While the new system works well for us, it's important to acknowledge its challenges and limitations:
Developer Adoption: The approach forces developers to rethink how they handle errors. It is not intuitive without prior training and requires a conscious effort to adopt. We addressed this with tech huddles to demonstrate the process and its benefits.
Keeping Everyone Consistent: The new framework only works if our entire team uses it the same way every time. If developers slip back into old habits, we'll end up with a mix of different error styles, which defeats the whole purpose of creating a standard.
The Way Forward
Our work isn't done. Addressing the limitations is our primary focus moving forward:
Automating Compliance: The key question remains: "How do I know how much of my codebase is adhering to these standards?" Manually checking is not scalable. This is a good use case for static analysis. We are trying out AI-powered code review tools to automatically identify these kinds of inconsistencies and enforce our standards.
Continuous Improvement: We will continue to refine the library and the middleware, extending them to handle new dependencies and edge cases as our platform evolves.
In this post, I’ll share how our team tackled an interesting challenge in our Go-based microservices: Error Handling.
I know 'Error Handling' doesn't sound exciting, but getting it right is one of the most effective ways to prevent security leaks and stop your product from feeling unfinished and unprofessional. By overhauling our approach, we significantly improved our customer and developer experience while mitigating security issues.
The Problem: Incomprehensible Errors and Security Leaks
Initially, our product's error messages were a major pain point. We received feedback from our customers that our application felt unpolished because when something went wrong, they were seeing raw, technical error messages.
The culprit was our error handling strategy. The messages revealed internal details, stack traces, and non-understandable tech jargon. This created two critical problems:
A Poor User Experience (UX): Customers were frustrated by un-actionable errors they couldn't understand.
A Serious Security Risk: Exposing internal implementation details (like database error messages) is a security vulnerability that can be exploited by malicious actors.
Our Initial Setup
To understand the solution, let's first look at the current implementation:
We have a bunch of Go-based microservices.
We use OpenAPI to define our REST APIs.
We use a tool (
oapi-codegen
) to auto-generate handlers, request/response bodies, and other boilerplate code from our OpenAPI specifications.Developers focus on writing the core service layer and business logic.
A single user-facing service acts as a gateway, orchestrating calls to our various internal, backend services.
Previously, our OpenAPI spec would define an error response for non-200 status codes, and the code to return an error looked like this:
// OpenAPI spec post: summary: Set User Details responses: '200': $ref: '#/components/responses/SetUserSuccessResponse' // All error codes point to the same generic response '400': $ref: '#/components/responses/SetUserErrorResponse' '401': $ref: '#/components/responses/SetUserErrorResponse' '500': $ref: '#/components/responses/SetUserErrorResponse' // How we returned errors return server.SetUserErrorResponse{ // This often exposed internal Go error messages directly Message: err.Error(), }, nil
This approach had several clear downsides:
Exposing Internal Errors: As seen above,
err.Error()
could easily leak internal system details.Bad User Experience: The messages were technical and unhelpful for end-users.
Lack of Standardization: Each developer or service could implement error responses differently, leading to inconsistency.
Painful Upstream Error Handling: The user-facing service struggled to correctly handle and translate errors coming from the internal services it called.
The Solution: A Centralized Error Library
To solve these problems, we built a new Go error library to standardize error handling across all our services. It was meant for creating and sending out meaningful errors for REST APIs. The goal of this library and other work that went behind this initiative, which I’ll talk about later, was to make the experience better not just for the users, but also for our developers who’ll use this library to send errors and then use those errors later to identify bugs and issues.
The library exposed a function which created a RestError
. This RestError satisfied the Error interface and looked something like this:
type RestError struct { // Kind is the class of error, such as "database" or "permission". Kind Kind `json:"kind"` // Code is a human-readable, short representation of the error. Code Code `json:"code"` // Message is the user-friendly error message. Message string `json:"message"` // ... other useful fields like Param, Status, etc.
This structure allows us to categorize errors for easier identification and debugging:
The
Kind
field helps us understand the class of error (e.g.,Database
,Queue
,Permission
).The
Code
field provides a more specific, machine-readable key (e.g.,invalid_user_id
). This is incredibly useful for searching and filtering logs, even while the user sees a friendly message.
With the new library, the developer experience became much simpler. Instead of manually building a response, a developer can now simply do this:
// Before: Manually creating a response struct return server.SetUserErrorResponse{ Message: err.Error() }, nil // After: Using the new error library return errs.E(errs.InvalidRequest, errors.New("User does not have an API key"
Our new library also included several developer-friendly features:
It automatically determines the correct HTTP status code from the
Kind
andCode
provided.It's forgiving. If a developer accidentally passes a raw internal error from our external dependencies (e.g., from Postgres or Redis), the library attempts to parse it into a structured
RestError
.It handles nested errors and stack traces gracefully for easier debugging.
Handling Upstream Errors
The library solved error creation within a single service, but what about errors returned by the upstream services?
Previously, our user-facing service handled upstream errors with a flawed process:
Check if the upstream HTTP response status code was not
200 OK
.If it was an error, un-marshal the response body.
Forward the upstream service's error message directly to the end-user.
This approach had two major problems:
Security Risk: It forwarded sensitive internal errors from the internal services, which is even riskier than sending the internal error of the user-facing service. For example:
Login failed
is still okay to be published, butLDAPFailed
is not, because it exposes inner details.Inconsistent Formats: Each internal microservice that we talk to could return errors in different formats, making them impossible to handle reliably.
The second problem was solved by adopting our new library across all services. To solve the first problem, we introduced a translation function and a middleware.
With a new HandleUpstreamError
function, our code became much cleaner:
if result.StatusCode() != http.StatusOK { // This function handles all the translation logic return nil, errorutils.HandleUpstreamError(ctx, result.StatusCode(), result.Body
This function translates the upstream errors from our internal services to the RestError format. If the error was not in a proper format, it has fallbacks in place to handle the scenario and create a RestError from the error response that it received.
This ensures that any error leaving our user-facing service is properly formatted. But a crucial question remains: How do we prevent sensitive internal error details from leaking out, even if they are well-formatted?
We solved it in 2 steps:
We first modified our public APIs to throw a non 200 response in a standard format, let’s call it
ErrorResponse
.We wrote a middleware that does a few things, namely:
It intercepts the final
RestError
before it's sent to the user and converts it toErrorResponse
It performs intelligent filtering. Using the
Kind
field, it decides what to expose and what to hide.If the
Kind
isDatabase
,Queue
, or another internal type, the middleware replaces it with a generic "Internal Server Error" message.If the
Kind
isResourceNotExists
,Validation
, orUnauthenticated
, it allows the descriptive, user-safe message to pass through.
It handles the formatting of the error messages by:
Applying basic grammatical corrections (Capitalization, Punctuation).
Creating appropriate suffix for the error message depending on the kind of the error. An example could be: For Kind
ResourceNotExists
, the suffix is -Please check your request or contact support if the issue persists.
Conclusion
The positive feedback from our customers confirmed the value of this effort. This error handling framework completely revamped our error messages, transforming a major pain point into a polished and professional feature. By building a system with empathy for both our customers and our developers, we created a more secure and robust product.
The impact can be best described by the change in our API responses.
Before: A leaky and unhelpful error.
{ "message" : "error getting user api key, err: no rows in result set" }
After: A clear, secure, and actionable response.
{ "code": "no_row_found", "message": "User does not have an API key. Please check your request or contact support if the issue persists.", "kind": "resource_not_exist", "status": 404, "error_id": "1234567890" }
Limitations
While the new system works well for us, it's important to acknowledge its challenges and limitations:
Developer Adoption: The approach forces developers to rethink how they handle errors. It is not intuitive without prior training and requires a conscious effort to adopt. We addressed this with tech huddles to demonstrate the process and its benefits.
Keeping Everyone Consistent: The new framework only works if our entire team uses it the same way every time. If developers slip back into old habits, we'll end up with a mix of different error styles, which defeats the whole purpose of creating a standard.
The Way Forward
Our work isn't done. Addressing the limitations is our primary focus moving forward:
Automating Compliance: The key question remains: "How do I know how much of my codebase is adhering to these standards?" Manually checking is not scalable. This is a good use case for static analysis. We are trying out AI-powered code review tools to automatically identify these kinds of inconsistencies and enforce our standards.
Continuous Improvement: We will continue to refine the library and the middleware, extending them to handle new dependencies and edge cases as our platform evolves.
In this post, I’ll share how our team tackled an interesting challenge in our Go-based microservices: Error Handling.
I know 'Error Handling' doesn't sound exciting, but getting it right is one of the most effective ways to prevent security leaks and stop your product from feeling unfinished and unprofessional. By overhauling our approach, we significantly improved our customer and developer experience while mitigating security issues.
The Problem: Incomprehensible Errors and Security Leaks
Initially, our product's error messages were a major pain point. We received feedback from our customers that our application felt unpolished because when something went wrong, they were seeing raw, technical error messages.
The culprit was our error handling strategy. The messages revealed internal details, stack traces, and non-understandable tech jargon. This created two critical problems:
A Poor User Experience (UX): Customers were frustrated by un-actionable errors they couldn't understand.
A Serious Security Risk: Exposing internal implementation details (like database error messages) is a security vulnerability that can be exploited by malicious actors.
Our Initial Setup
To understand the solution, let's first look at the current implementation:
We have a bunch of Go-based microservices.
We use OpenAPI to define our REST APIs.
We use a tool (
oapi-codegen
) to auto-generate handlers, request/response bodies, and other boilerplate code from our OpenAPI specifications.Developers focus on writing the core service layer and business logic.
A single user-facing service acts as a gateway, orchestrating calls to our various internal, backend services.
Previously, our OpenAPI spec would define an error response for non-200 status codes, and the code to return an error looked like this:
// OpenAPI spec post: summary: Set User Details responses: '200': $ref: '#/components/responses/SetUserSuccessResponse' // All error codes point to the same generic response '400': $ref: '#/components/responses/SetUserErrorResponse' '401': $ref: '#/components/responses/SetUserErrorResponse' '500': $ref: '#/components/responses/SetUserErrorResponse' // How we returned errors return server.SetUserErrorResponse{ // This often exposed internal Go error messages directly Message: err.Error(), }, nil
This approach had several clear downsides:
Exposing Internal Errors: As seen above,
err.Error()
could easily leak internal system details.Bad User Experience: The messages were technical and unhelpful for end-users.
Lack of Standardization: Each developer or service could implement error responses differently, leading to inconsistency.
Painful Upstream Error Handling: The user-facing service struggled to correctly handle and translate errors coming from the internal services it called.
The Solution: A Centralized Error Library
To solve these problems, we built a new Go error library to standardize error handling across all our services. It was meant for creating and sending out meaningful errors for REST APIs. The goal of this library and other work that went behind this initiative, which I’ll talk about later, was to make the experience better not just for the users, but also for our developers who’ll use this library to send errors and then use those errors later to identify bugs and issues.
The library exposed a function that created a RestError. RestError implemented Go's built-in error interface and looked something like this:
```go
type RestError struct {
	// Kind is the class of error, such as "database" or "permission".
	Kind Kind `json:"kind"`
	// Code is a human-readable, short representation of the error.
	Code Code `json:"code"`
	// Message is the user-friendly error message.
	Message string `json:"message"`
	// ... other useful fields like Param, Status, etc.
}
```
This structure allows us to categorize errors for easier identification and debugging:
The Kind field helps us understand the class of error (e.g., Database, Queue, Permission).
The Code field provides a more specific, machine-readable key (e.g., invalid_user_id). This is incredibly useful for searching and filtering logs, even while the user sees a friendly message.
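Because RestError satisfies the error interface, it can travel through ordinary Go error wrapping and still be recovered for logging. The sketch below assumes Kind and Code are plain string types and that Error() formats as "kind/code: message"; neither detail is confirmed by the post, and setUserDetails and logKey are hypothetical helpers. errors.As finds the RestError even after fmt.Errorf has wrapped it:

```go
package main

import (
	"errors"
	"fmt"
)

// Kind and Code are ASSUMED to be string types; the post does not show
// their real definitions.
type Kind string
type Code string

// Hypothetical Kind values, named after the examples in the text.
const (
	KindDatabase   Kind = "database"
	KindQueue      Kind = "queue"
	KindPermission Kind = "permission"
	KindValidation Kind = "validation"
)

// RestError repeats the fields shown above; other fields are omitted.
type RestError struct {
	Kind    Kind   `json:"kind"`
	Code    Code   `json:"code"`
	Message string `json:"message"`
}

// Error makes *RestError satisfy Go's built-in error interface.
func (e *RestError) Error() string {
	return fmt.Sprintf("%s/%s: %s", e.Kind, e.Code, e.Message)
}

// setUserDetails is a hypothetical service function that returns a RestError
// wrapped with extra context, as errors often are by the time they are logged.
func setUserDetails() error {
	re := &RestError{Kind: KindValidation, Code: "invalid_user_id", Message: "The user ID is not valid."}
	return fmt.Errorf("set user details: %w", re)
}

// logKey digs the first *RestError out of an error chain and returns a
// "kind/code" key for log search; errors without one are labelled "unknown".
func logKey(err error) string {
	var re *RestError
	if errors.As(err, &re) {
		return string(re.Kind) + "/" + string(re.Code)
	}
	return "unknown"
}

func main() {
	fmt.Println(logKey(setUserDetails())) // validation/invalid_user_id
}
```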
With the new library, the developer experience became much simpler. Instead of manually building a response, a developer can now simply do this:
```go
// Before: manually creating a response struct
return server.SetUserErrorResponse{Message: err.Error()}, nil

// After: using the new error library
return errs.E(errs.InvalidRequest, errors.New("User does not have an API key"))
```
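The post does not show how errs.E is implemented. One common way to build such a constructor, a pattern also used by the Upspin project's errors package, is a variadic function that assigns each argument to a field according to its type. The sketch below assumes that shape; Kind, Code, InvalidRequest, and E here are hypothetical stand-ins for the library's identifiers, not its actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// Kind, Code and InvalidRequest are HYPOTHETICAL stand-ins for the library's
// errs.Kind type and errs.InvalidRequest value.
type Kind string
type Code string

const InvalidRequest Kind = "invalid_request"

type RestError struct {
	Kind    Kind
	Code    Code
	Message string
	Err     error // underlying cause: kept for logs and errors.Is/As, not shown to users
}

func (e *RestError) Error() string { return e.Message }

// Unwrap exposes the cause to errors.Is and errors.As.
func (e *RestError) Unwrap() error { return e.Err }

// E builds a *RestError by inspecting the type of each argument, so callers
// pass only what they have, in any order. This variadic signature is an
// assumption modelled on the errs.E call above, not the library's own.
func E(args ...any) error {
	e := &RestError{}
	for _, arg := range args {
		switch a := arg.(type) {
		case Kind:
			e.Kind = a
		case Code:
			e.Code = a
		case string:
			e.Message = a
		case error:
			e.Err = a
		}
	}
	// Without an explicit message, fall back to the cause's text. In this
	// sketch, that is how errors.New("User does not have an API key")
	// becomes the message.
	if e.Message == "" && e.Err != nil {
		e.Message = e.Err.Error()
	}
	return e
}

func main() {
	err := E(InvalidRequest, errors.New("User does not have an API key"))
	var re *RestError
	if errors.As(err, &re) {
		fmt.Println(re.Kind, "-", re.Message) // invalid_request - User does not have an API key
	}
}
```

Dispatching on argument type keeps call sites short, at the cost of compile-time checking: passing an argument of an unexpected type is silently ignored.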
Our new library also included several developer-friendly features:
It automatically determines the correct HTTP status code from the Kind and Code provided.
It's forgiving. If a developer accidentally passes a raw internal error from our external dependencies (e.g., from Postgres or Redis), the library attempts to parse it into a structured RestError.
It handles nested errors and stack traces gracefully for easier debugging.
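A minimal sketch of the first two features, assuming a Kind-to-status table and a small set of recognized dependency errors. The kind names, status choices, and helpers (statusFor, normalize, lookupAPIKey) are illustrative only; sql.ErrNoRows from the standard database/sql package stands in for the Postgres or Redis errors mentioned above, since each real client exposes its own sentinel errors:

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
	"net/http"
)

type Kind string

// Hypothetical kinds; the names follow the examples used in this post.
const (
	Internal          Kind = "internal"
	Validation        Kind = "validation"
	Unauthenticated   Kind = "unauthenticated"
	ResourceNotExists Kind = "resource_not_exist"
)

type RestError struct {
	Kind    Kind
	Code    string
	Message string
	Err     error
}

func (e *RestError) Error() string { return e.Message }
func (e *RestError) Unwrap() error { return e.Err }

// statusFor maps a Kind to an HTTP status. The post says the library also
// looks at Code; this sketch keys on Kind alone for brevity.
func statusFor(k Kind) int {
	switch k {
	case Validation:
		return http.StatusBadRequest
	case Unauthenticated:
		return http.StatusUnauthorized
	case ResourceNotExists:
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}

// normalize is the "forgiving" path: a *RestError anywhere in the chain is
// returned unchanged, a recognised dependency error is translated, and
// anything else becomes an Internal error that keeps the raw cause for logs.
func normalize(err error) *RestError {
	if err == nil {
		return nil
	}
	var re *RestError
	if errors.As(err, &re) {
		return re
	}
	if errors.Is(err, sql.ErrNoRows) {
		return &RestError{Kind: ResourceNotExists, Code: "no_row_found", Message: "The requested resource was not found.", Err: err}
	}
	return &RestError{Kind: Internal, Code: "internal_error", Message: "Internal Server Error", Err: err}
}

// lookupAPIKey is a hypothetical data-access call that wraps a raw driver error.
func lookupAPIKey() error {
	return fmt.Errorf("error getting user api key, err: %w", sql.ErrNoRows)
}

func main() {
	re := normalize(lookupAPIKey())
	fmt.Println(statusFor(re.Kind), re.Code) // 404 no_row_found
}
```

Keeping the raw cause in Err rather than in Message means the original error stays available for logs and errors.Is checks without being sent to the client.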
Handling Upstream Errors
The library solved error creation within a single service, but what about errors returned by the upstream services?
Previously, our user-facing service handled upstream errors with a flawed process:
Check whether the upstream HTTP response status code was not 200 OK.
If it was an error, unmarshal the response body.
Forward the upstream service's error message directly to the end-user.
This approach had two major problems:
Security Risk: It forwarded sensitive errors from the internal services, which is even riskier than exposing the user-facing service's own internal errors. For example, "Login failed" is fine to show to users, but "LDAPFailed" is not, because it exposes internal details.
Inconsistent Formats: Each internal microservice we called could return errors in a different format, making them impossible to handle reliably.
The second problem was solved by adopting our new library across all services. To solve the first problem, we introduced a translation function and a middleware.
With a new HandleUpstreamError function, our code became much cleaner:
```go
if result.StatusCode() != http.StatusOK {
	// This function handles all the translation logic
	return nil, errorutils.HandleUpstreamError(ctx, result.StatusCode(), result.Body)
}
```
This function translates errors from our internal services into the RestError format. If an error is not in the expected format, fallbacks kick in and build a RestError from whatever error response was received.
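The internals of HandleUpstreamError are not shown in the post. Below is a minimal sketch under stated assumptions: the upstream body is JSON in the RestError shape, a body with a non-empty kind is trusted as-is, and anything else falls back to a RestError derived from the status code. handleUpstreamError, kindForStatus, and the Detail field are hypothetical:

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

type RestError struct {
	Kind    string `json:"kind"`
	Code    string `json:"code"`
	Message string `json:"message"`
	Status  int    `json:"status"`
	Detail  string `json:"-"` // raw upstream body, for logs only; never serialized
}

func (e *RestError) Error() string { return e.Message }

// handleUpstreamError is a HYPOTHETICAL stand-in for errorutils.HandleUpstreamError.
// It trusts a body that decodes as a RestError and otherwise falls back to a
// RestError built from the status code. ctx mirrors the call above; a real
// service might use it for request-scoped logging, but it is unused here.
func handleUpstreamError(ctx context.Context, status int, body []byte) error {
	var re RestError
	if err := json.Unmarshal(body, &re); err == nil && re.Kind != "" {
		if re.Status == 0 {
			re.Status = status
		}
		return &re
	}
	// Fallback: empty, non-JSON, or foreign-format body. The raw text goes to
	// Detail rather than Message so it cannot leak to the end user.
	return &RestError{
		Kind:    kindForStatus(status),
		Code:    "upstream_error",
		Message: http.StatusText(status),
		Status:  status,
		Detail:  string(body),
	}
}

// kindForStatus guesses a Kind from a bare status code (hypothetical mapping).
func kindForStatus(status int) string {
	switch {
	case status == http.StatusNotFound:
		return "resource_not_exist"
	case status == http.StatusUnauthorized:
		return "unauthenticated"
	case status >= 400 && status < 500:
		return "validation"
	default:
		return "internal"
	}
}

func main() {
	ctx := context.Background()
	good := []byte(`{"kind":"resource_not_exist","code":"no_row_found","message":"User does not have an API key"}`)
	fmt.Println(handleUpstreamError(ctx, http.StatusNotFound, good)) // User does not have an API key
	bad := []byte(`pq: relation "api_keys" does not exist`)
	fmt.Println(handleUpstreamError(ctx, http.StatusInternalServerError, bad)) // Internal Server Error
}
```

In this sketch a well-formed upstream error passes through unchanged even when its kind is internal; hiding such errors is left to a later stage.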
This ensures that any error leaving our user-facing service is properly formatted. But a crucial question remains: How do we prevent sensitive internal error details from leaking out, even if they are well-formatted?
We solved it in two steps:
First, we modified our public APIs to return every non-200 response in one standard format; let’s call it ErrorResponse.
Second, we wrote a middleware that does a few things:
It intercepts the final RestError before it's sent to the user and converts it to an ErrorResponse.
It performs intelligent filtering. Using the Kind field, it decides what to expose and what to hide.
If the Kind is Database, Queue, or another internal type, the middleware replaces the message with a generic "Internal Server Error" message.
If the Kind is ResourceNotExists, Validation, or Unauthenticated, it allows the descriptive, user-safe message to pass through.
It formats the error messages by:
Applying basic grammatical corrections (capitalization, punctuation).
Appending a suffix that depends on the kind of error. For example, for Kind ResourceNotExists, the suffix is "Please check your request or contact support if the issue persists."
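The middleware's filtering and formatting duties can be sketched as a single conversion function that a middleware or the server's error hook would call. The kind names, the allowlist, the suffix table, and toErrorResponse are assumptions for illustration; the ErrorResponse fields are taken from the "After" response in the conclusion:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

type Kind string

// Hypothetical kinds named after the examples in the text.
const (
	Database          Kind = "database"
	Queue             Kind = "queue"
	Validation        Kind = "validation"
	Unauthenticated   Kind = "unauthenticated"
	ResourceNotExists Kind = "resource_not_exist"
)

type RestError struct {
	Kind    Kind
	Code    string
	Message string
	Status  int
}

// ErrorResponse is the public error body (fields follow the "After" example).
type ErrorResponse struct {
	Code    string `json:"code"`
	Message string `json:"message"`
	Kind    string `json:"kind"`
	Status  int    `json:"status"`
	ErrorID string `json:"error_id"`
}

// userSafe is an allowlist: only these kinds may show their own message.
// Database, Queue and any kind not listed here are hidden.
var userSafe = map[Kind]bool{
	Validation:        true,
	Unauthenticated:   true,
	ResourceNotExists: true,
}

// suffixes holds per-kind advice appended to user-safe messages.
var suffixes = map[Kind]string{
	ResourceNotExists: "Please check your request or contact support if the issue persists.",
}

// toErrorResponse converts the final RestError into the public body.
// errorID would normally be generated and logged alongside the real cause.
func toErrorResponse(re *RestError, errorID string) ErrorResponse {
	if !userSafe[re.Kind] {
		return ErrorResponse{Code: "internal_error", Message: "Internal Server Error", Kind: "internal", Status: 500, ErrorID: errorID}
	}
	msg := polish(re.Message)
	if s, ok := suffixes[re.Kind]; ok {
		msg += " " + s
	}
	return ErrorResponse{Code: re.Code, Message: msg, Kind: string(re.Kind), Status: re.Status, ErrorID: errorID}
}

// polish capitalizes the first letter and adds a full stop if the message
// does not already end in terminal punctuation.
func polish(msg string) string {
	msg = strings.TrimSpace(msg)
	if msg == "" {
		return msg
	}
	first, size := utf8.DecodeRuneInString(msg)
	msg = string(unicode.ToUpper(first)) + msg[size:]
	if !strings.ContainsAny(msg[len(msg)-1:], ".!?") {
		msg += "."
	}
	return msg
}

func main() {
	safe := &RestError{Kind: ResourceNotExists, Code: "no_row_found", Message: "User does not have an API key", Status: 404}
	fmt.Println(toErrorResponse(safe, "1234567890").Message)
	// User does not have an API key. Please check your request or contact support if the issue persists.

	leaky := &RestError{Kind: Database, Code: "pg_query_failed", Message: `pq: relation "api_keys" does not exist`, Status: 500}
	fmt.Println(toErrorResponse(leaky, "1234567891").Message) // Internal Server Error
}
```

Using an allowlist of user-safe kinds, rather than a denylist of internal ones, means a newly added kind stays hidden by default until someone decides its messages are safe to show.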
Conclusion
The positive feedback from our customers confirmed the value of this effort. This error handling framework completely revamped our error messages, transforming a major pain point into a polished and professional feature. By building a system with empathy for both our customers and our developers, we created a more secure and robust product.
The impact is best illustrated by the change in our API responses.
Before: A leaky and unhelpful error.
```json
{ "message": "error getting user api key, err: no rows in result set" }
```
After: A clear, secure, and actionable response.
```json
{
  "code": "no_row_found",
  "message": "User does not have an API key. Please check your request or contact support if the issue persists.",
  "kind": "resource_not_exist",
  "status": 404,
  "error_id": "1234567890"
}
```
Limitations
While the new system works well for us, it's important to acknowledge its challenges and limitations:
Developer Adoption: The approach forces developers to rethink how they handle errors. It is not intuitive without prior training and requires a conscious effort to adopt. We addressed this with tech huddles to demonstrate the process and its benefits.
Keeping Everyone Consistent: The new framework only works if our entire team uses it the same way every time. If developers slip back into old habits, we'll end up with a mix of different error styles, which defeats the whole purpose of creating a standard.
The Way Forward
Our work isn't done. Addressing the limitations is our primary focus moving forward:
Automating Compliance: The key question remains: "How do I know how much of my codebase is adhering to these standards?" Manually checking is not scalable. This is a good use case for static analysis. We are trying out AI-powered code review tools to automatically identify these kinds of inconsistencies and enforce our standards.
Continuous Improvement: We will continue to refine the library and the middleware, extending them to handle new dependencies and edge cases as our platform evolves.
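As a complement to AI-assisted review, even a purely syntactic check can catch the most common regression. This is an illustrative sketch using only the standard go/parser and go/ast packages, not a tool the team describes: the hypothetical findLeaks function flags composite-literal fields named Message whose value is a bare .Error() call. Because it has no type information, it can report false positives and will miss indirect leaks:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// findLeaks parses one Go source file and reports every composite-literal
// field named Message whose value is a bare x.Error() call, the pattern the
// old handlers used. It is purely syntactic: no type information is used.
func findLeaks(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var findings []string
	ast.Inspect(file, func(n ast.Node) bool {
		kv, ok := n.(*ast.KeyValueExpr)
		if !ok {
			return true
		}
		if key, ok := kv.Key.(*ast.Ident); !ok || key.Name != "Message" {
			return true
		}
		call, ok := kv.Value.(*ast.CallExpr)
		if !ok || len(call.Args) != 0 {
			return true
		}
		if sel, ok := call.Fun.(*ast.SelectorExpr); ok && sel.Sel.Name == "Error" {
			findings = append(findings, fset.Position(kv.Pos()).String()+": Message set directly from Error()")
		}
		return true
	})
	return findings, nil
}

// sample is a hypothetical handler file with one old-style and one new-style return.
const sample = `package handlers

func oldStyle(err error) SetUserErrorResponse {
	return SetUserErrorResponse{Message: err.Error()}
}

func newStyle() SetUserErrorResponse {
	return SetUserErrorResponse{Message: "User does not have an API key"}
}
`

func main() {
	findings, err := findLeaks("handlers.go", sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Prints one finding, for the old-style return on line 4.
	for _, f := range findings {
		fmt.Println(f)
	}
}
```

The same inspection logic could be ported to an Analyzer from golang.org/x/tools/go/analysis to run alongside other linters in CI.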