Platform Outage
Incident Report for Pinpoint
Postmortem

We use various tools at Pinpoint to help us identify and prevent attacks on our systems. One of those tools, Sqreen, monitors requests to our web servers and can block them based on a set of predetermined rules.

In this case, Sqreen determined that a vulnerability discovery was being run against our system and, as designed, took steps to temporarily block the IP address that the requests appeared to be coming from.

Unfortunately, the IP address identified as the source of the requests was inaccurate. Instead of blocking a third-party IP address, Sqreen blocked an IP address used by the load balancer that distributes web traffic between our web servers.

This stopped all web traffic from reaching our web servers, causing the outage. As a side effect, everyone who attempted to access the site was shown a web page claiming that the site was under attack.

To resolve the issue, we temporarily turned off the monitoring system (Sqreen).

Upon investigation, we found that the load balancer was reporting its own IP address as the source of the requests, rather than the IP address of the original requester. We amended the load balancer configuration, which allowed us to turn Sqreen back on.
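
For readers unfamiliar with this class of problem, the following is a minimal illustrative sketch of how an application behind a load balancer can recover the original client IP. It is not our actual configuration or Sqreen's implementation. It assumes the load balancer passes the client address in an X-Forwarded-For header, and the TRUSTED_PROXIES range and the function names are hypothetical.

```python
import ipaddress
from typing import Optional

# Hypothetical load balancer address range (an assumption for this sketch,
# not taken from our infrastructure).
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def is_trusted(addr: str) -> bool:
    """Return True if addr falls inside one of the trusted proxy networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TRUSTED_PROXIES)


def resolve_client_ip(remote_addr: str, forwarded_for: Optional[str] = None) -> str:
    """Pick the address that security rules should act on.

    remote_addr is the direct TCP peer; forwarded_for is the raw
    X-Forwarded-For header value, if any.
    """
    # Only believe the header when the direct peer is one of our own proxies.
    if not forwarded_for or not is_trusted(remote_addr):
        return remote_addr

    hops = [h.strip() for h in forwarded_for.split(",") if h.strip()]
    # Walk right to left: the rightmost entries were appended by our proxies,
    # so the first untrusted address is the closest thing to the real client.
    for hop in reversed(hops):
        try:
            if not is_trusted(hop):
                return hop
        except ValueError:
            break  # malformed entry: stop trusting the rest of the header
    return remote_addr
```

With this approach, a request arriving from 10.0.0.5 with the header "203.0.113.7, 10.0.0.9" resolves to 203.0.113.7, so a block would target the third party rather than the load balancer. When the header is missing or unusable, the sketch falls back to the direct peer address, so a real deployment would also want to exempt trusted proxy addresses from blocking.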

Posted Feb 29, 2020 - 12:42 UTC

Resolved
For around 3 minutes the entire Pinpoint application was unavailable for all end users.
Posted Feb 28, 2020 - 12:30 UTC