Curious case of debugging failing webhook API requests

A short debugging story to start off the new year.

Developer (D): Hey, can you join a call? I need some help in debugging a webhook API connectivity issue. The customer team is also on the call.

Team lead (You): Okay sure, add me to the call.

You: So, what are you trying to do and what's the problem?

D: Our backend has a webhook API that a third party will invoke over the internet. The customer team from that third party is on the call. They are getting a 500 error when making this API call.

You: Okay, what's the error? And before that, can you tell me the HTTP request flow?

D: Yeah, so the error is related to TLS and I am not sure how to debug this. The request flow is as follows:

Third-party --> Firewall --> Nginx --> Our backend

You: Okay, have you tried exposing the API on staging over HTTP instead of HTTPS?

D: Yes, this is on a staging environment, and the HTTP API works. When I switch to HTTPS, the third party gets an error.

You: Okay, where does the TLS termination happen in the above request flow?

D: (long pause) I think it happens at Nginx.

You: Sure?

D: (thinking...) Yes, I have configured the certs in Nginx.

You: So the Firewall is Layer 4 and Nginx acts as Layer 7 and terminates TLS?

D: Yes

You: Can you try calling the webhook API from your local machine via curl or Postman?

D: (tries to demo this, but the call times out)

You: Is there any IP whitelisting at the Firewall level to make sure we allow the webhook API call only from that third party and no one else?

D: Yes.

You: Can you try removing the whitelisting for testing purposes right now and try the request from the local machine again?

D: (demos this use case, curl works okay, returns 4xx due to missing auth headers)

You: Can you open the URL subdomain.example.com/api/webhook (used only as an example here) in your browser?

D: (demos this; Chrome shows "Not secure" even though the protocol is https)

You: Interesting, show me more details about the certificate in Chrome.

D: (shows the details) Where are we going with this?

You: See, the certificate you're using is not valid for the subdomain. It's only valid for the main example.com domain. The third party strictly checks the certificate, so the hostname mismatch makes its TLS handshake fail, and that's the error they see.
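As a rough illustration of the kind of strict check a client like the third party's would run, here is a minimal Python sketch built on the standard library's ssl module. The function names strict_context and check_certificate are made up for this example, and it assumes the client trusts the system CA store; the third party's actual client is not known from the story.

```python
import socket
import ssl


def strict_context() -> ssl.SSLContext:
    # The default context verifies the chain against the system CA store
    # and checks that the certificate is valid for the requested hostname.
    return ssl.create_default_context()


def check_certificate(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return "ok" if a strictly verified TLS handshake succeeds, else the reason."""
    context = strict_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname drives both SNI and the hostname check.
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError as err:
        # Raised for hostname mismatches, expired certs, unknown issuers, etc.
        return f"certificate rejected: {err.verify_message}"
    except ssl.SSLError as err:
        return f"TLS handshake failed: {err}"
    except OSError as err:
        # Timeouts, refused connections, DNS failures.
        return f"connection failed: {err}"
```

Calling check_certificate("subdomain.example.com") against a server that presents a cert for a different name should come back as "certificate rejected" with a hostname mismatch message, which is the same class of failure a strict client reports.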

You: I wonder why it works via curl, though. It should also perform a strict check, as we didn't pass the -k or --insecure flag.

D: Oh, I have set up a default config in curl to always use the --insecure flag to make testing easier.

You: (smiling) Ah, that explains why! So, the TLS error is due to the bad certificate. You'll need a cert that is valid for the subdomain, or a wildcard cert that covers the subdomains, and then update your Nginx config accordingly.
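To make the subdomain versus wildcard point concrete, here is a small Python sketch of certificate name matching. It is deliberately simplified (real clients follow more detailed rules), and name_matches and cert_covers are hypothetical helper names used only for this illustration.

```python
def name_matches(pattern: str, hostname: str) -> bool:
    """Simplified matching: exact name, or '*' standing in for the leftmost label."""
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(pattern_labels) != len(host_labels):
        # A wildcard covers exactly one label, never zero and never several.
        return False
    if pattern_labels[0] == "*":
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


def cert_covers(cert_names: list[str], hostname: str) -> bool:
    """True if any name listed in the certificate matches the hostname."""
    return any(name_matches(name, hostname) for name in cert_names)


# The situation in the story: a cert issued only for the main domain.
print(cert_covers(["example.com"], "subdomain.example.com"))    # False
# Either fix works for the subdomain.
print(cert_covers(["subdomain.example.com"], "subdomain.example.com"))  # True
print(cert_covers(["*.example.com"], "subdomain.example.com"))  # True
# A wildcard does not cover the bare domain or deeper levels.
print(cert_covers(["*.example.com"], "example.com"))            # False
print(cert_covers(["*.example.com"], "a.b.example.com"))        # False
```

One thing the sketch makes visible: a wildcard cert alone does not cover the bare domain, so certs that must serve both usually list both names.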

D: (hours later) Thanks, this issue was fixed. How do I learn to debug like this?

You: Here are the lessons.

Lessons

  • Learn the end-to-end request flow and the whole stack (from Layer 4 to Layer 7, at least)

  • Understand how proxies operate and how TLS, DNS, and certs work

  • Get familiar with basic networking utilities: curl, nslookup, netstat, telnet, tcpdump, and more

  • Try to form a mental model about how things work and a hypothesis about where the problem could be

  • Only change one variable at a time when debugging

  • Only change what's relevant to your hypothesis and revisit your hypothesis and mental model

  • Practice and learn from past incidents and war stories from seniors
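The "one layer, one variable at a time" idea from the lessons above can be sketched in a few lines of Python. This is only an illustration, not a replacement for the utilities listed; the function names resolve, tcp_connect, and first_failing_layer are invented for this example, and it stops after Layer 4 on purpose.

```python
import socket


def resolve(host: str) -> list[str]:
    """DNS layer: return the sorted unique IPs for a name (empty list on failure)."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def tcp_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Layer 4: can we complete a TCP handshake within the timeout?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # A timeout here (rather than a refusal) often hints at a firewall rule.
        return False


def first_failing_layer(host: str, port: int = 443) -> str:
    """Check one layer at a time and stop at the first failure."""
    if not resolve(host):
        return "dns"
    if not tcp_connect(host, port):
        return "tcp"
    return "none (DNS and TCP look fine; check TLS and HTTP next)"
```

In the story, the call from the local machine timed out while the IP whitelist was in place, so a check like this would have stopped at "tcp". Once the whitelist was lifted, the remaining suspect was the TLS layer.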

I write such stories on software engineering.

There's no specific frequency, as I don't make these up.

If you liked this one, you might love: Migrating Terabytes of metrics data with zero downtime.

Follow me on LinkedIn and Twitter for more such stuff, straight from the production oven!
