A network issue to our backend database is causing intermittent disruption

Incident Report for Featureflow

Postmortem

The issue was caused by excessive ‘CPU Steal’ on our backend database causing connection issues and timeouts.

We temporarily and intentionally dropped connections by scaling down the servers to scale the database cluster in minimal time.

Services were restored after the cluster was upgraded, starting with the event and SDK servers.

Posted Sep 13, 2023 - 21:27 UTC

This incident has been resolved.

Posted Sep 13, 2023 - 20:26 UTC

All Services are operational.

Posted Sep 13, 2023 - 20:23 UTC

Feaatureflow SDK and Event Server is recovered. Admin Api is being recovered.

Posted Sep 13, 2023 - 20:18 UTC

Issue has been identified in a rogue process causing database CPU issues. We are recovering services currently.

Posted Sep 13, 2023 - 20:07 UTC

We are currently investigating this issue.

Posted Sep 13, 2023 - 18:19 UTC

This incident affected: Admin API, Event Server, and SDK Server.