Database outage

Incident Report for Pinpoint

Postmortem

Date: 22 July 2025

Duration: 8 hours 34 minutes (01:07 – 09:41 BST)

What Happened

Pinpoint experienced a significant service disruption affecting our primary database, admin/API services, and careers site from 01:07 to 09:41 BST on 22 July. Customers experienced complete service unavailability for approximately 2 hours and 45 minutes, after which services ran on outdated data from the previous day's backup until full restoration was completed.

Timeline (BST)

  • 01:07 – Service disruption began; database connectivity issues detected
  • 01:09 – Emergency response initiated; services marked as unavailable
  • 01:45 – Initial database restoration commenced
  • 03:45 – Services restored with backup data from the previous day
  • 09:21 – Second restoration initiated with more recent backup
  • 09:41 – Full service restoration completed with current data
  • 10:00+ – Performance optimisation measures implemented
  • 23:50+ – Restoration of some delayed data from 03:00 to 09:41
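
The durations quoted in this report can be cross-checked against the timeline with simple arithmetic. The sketch below is illustrative only: the variable names are our own, and it assumes every timestamp falls on 22 July 2025 in BST.

```python
from datetime import datetime

DAY = "2025-07-22"  # all timeline entries are assumed to be on this date (BST)

def at(hhmm: str) -> datetime:
    """Parse an HH:MM timeline entry into a datetime on the incident day."""
    return datetime.strptime(f"{DAY} {hhmm}", "%Y-%m-%d %H:%M")

outage_start = at("01:07")    # service disruption began
first_restore = at("03:45")   # services back on the previous day's backup
full_restore = at("09:41")    # full restoration with current data

full_unavailability = first_restore - outage_start  # no service at all
stale_data_window = full_restore - first_restore    # service up, data outdated
total_duration = full_restore - outage_start        # whole incident

print("Full unavailability:", full_unavailability)
print("Outdated-data window:", stale_data_window)
print("Total duration:", total_duration)
```

The total matches the 8 hours 34 minutes stated above, and the timeline puts the fully unavailable period at 2 hours 38 minutes, consistent with the approximate figure given in the summary.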

Why It Happened

The incident occurred during a trial of a new database vendor's migration service. While testing their system, operations intended for a test environment were unexpectedly redirected to our production database, resulting in the unintended removal of data.

The database vendor's system includes advanced features designed to help companies migrate data safely. However, the behaviour of these features during our testing phase was not clearly indicated in their interface. This led to test operations affecting our live production data instead of remaining isolated in the test environment, where they belonged.

What We're Doing About It

  • Immediate Recovery Completed: All services have been fully restored. We are working to restore data created during the early morning hours of Tuesday 22 July.
  • Enhanced Monitoring: We are implementing additional safeguards to detect and prevent unintended database operations, including alerts for any bulk data modifications.
  • Vendor Collaboration: We are working closely with the database vendor to improve their user interface and documentation, so that system behaviour is clearly communicated during all operational phases.
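
As an illustration of the kind of safeguard described under Enhanced Monitoring, the sketch below shows one possible guard against bulk data modifications. It is not the safeguard actually deployed: the function guarded_execute, the exception BulkChangeBlocked, the 100-row threshold, and the candidates table are all hypothetical, and an in-memory SQLite database stands in for a real one.

```python
import sqlite3

class BulkChangeBlocked(Exception):
    """Raised when a statement modifies more rows than the allowed limit."""

def guarded_execute(conn, sql, params=(), max_rows=100):
    """Run a data-modifying statement; roll it back if it touches too many rows.

    Assumes the caller has no other uncommitted work on `conn`, because a
    blocked statement rolls back the entire open transaction.
    """
    cur = conn.execute(sql, params)
    affected = cur.rowcount
    if affected > max_rows:
        conn.rollback()  # undo the change before anything is committed
        raise BulkChangeBlocked(f"{affected} rows affected; limit is {max_rows}")
    conn.commit()
    return affected

# Demonstration against a throwaway in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO candidates (name) VALUES (?)",
    [(f"candidate-{i}",) for i in range(500)],
)
conn.commit()

# A small, targeted delete is allowed through.
small = guarded_execute(conn, "DELETE FROM candidates WHERE id <= ?", (10,))

# A delete covering most of the table is rolled back and reported.
try:
    guarded_execute(conn, "DELETE FROM candidates WHERE id > ?", (10,))
    blocked = False
except BulkChangeBlocked:
    blocked = True

remaining = conn.execute("SELECT COUNT(*) FROM candidates").fetchone()[0]
```

In a real deployment the raised exception would typically also trigger an alert. A guard like this only covers statements routed through it, so it complements database-level permissions and environment isolation and does not replace them.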

We apologise for this disruption to your service. We understand the critical nature of your recruitment processes and the impact this may have had on scheduled interviews and daily operations.

Posted Jul 23, 2025 - 08:31 UTC

Resolved

The outage has been resolved, and all services are operational.

A full post-mortem will follow.
Posted Jul 23, 2025 - 08:29 UTC

Monitoring

A fix has been implemented and we are monitoring the application. You may experience degraded performance.
Posted Jul 22, 2025 - 02:40 UTC

Investigating

We are investigating an outage with our primary database that is affecting the whole platform.
Posted Jul 22, 2025 - 01:33 UTC
This incident affected: Pinpoint.