CyberCenter Service Disruption
Incident Report for Aqua Cloud
Resolved
On August 21st, 2023, at approximately 8:30 PM UTC, our Container Image Scanning service experienced a major disruption because a Lambda function exceeded its ephemeral storage limit. The function, which downloads and extracts a critical database used for image scanning, was configured with 3 GB of ephemeral storage. The extracted database (2.5 GB) and the 500 MB zip archive it is extracted from together exhausted the available storage, causing the function to panic.
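
The storage arithmetic above can be sketched as a pre-flight check. This is a minimal illustration only, not the code the service runs: the function names and the 10% headroom figure are assumptions, and the sizes are the approximate ones quoted in this report.

```python
import shutil

MB = 1024 ** 2
GB = 1024 ** 3


def peak_storage_bytes(archive_bytes: int, extracted_bytes: int) -> int:
    """Peak usage when the archive and its extracted contents coexist on disk."""
    return archive_bytes + extracted_bytes


def fits(archive_bytes: int, extracted_bytes: int, limit_bytes: int,
         headroom: float = 0.10) -> bool:
    """True if peak usage stays within the limit after reserving `headroom`.

    `headroom` is a hypothetical safety margin (a fraction of the limit),
    not a value taken from the incident.
    """
    usable = limit_bytes * (1 - headroom)
    return peak_storage_bytes(archive_bytes, extracted_bytes) <= usable


def free_bytes(path: str = "/tmp") -> int:
    """Free space currently available at `path` (Lambda's ephemeral storage is /tmp)."""
    return shutil.disk_usage(path).free


# Sizes from this report: a 500 MB archive plus a 2.5 GB extracted database
# leaves essentially no margin under a 3 GB limit.
print(fits(500 * MB, int(2.5 * GB), 3 * GB))
```

A check like this could run before the download starts, so that a growing database produces an explicit, actionable error rather than a crash partway through extraction.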

The failure caused a service outage that affected container image scanning. Although monitoring was in place for various components, there was no alert specifically for Lambda panics, which delayed identification and remediation.
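
As an illustration of the kind of alert that was missing, the sketch below builds the parameters for a CloudWatch alarm on the standard `Errors` metric in the `AWS/Lambda` namespace. It assumes that a panicking invocation is counted as a function error; the function name, SNS topic ARN, and threshold are hypothetical and do not describe the alerting we actually deployed.

```python
def lambda_error_alarm(function_name: str, sns_topic_arn: str) -> dict:
    """Build keyword arguments for CloudWatch PutMetricAlarm.

    Fires when the function reports at least one error within a one-minute
    period. Names and thresholds here are illustrative assumptions.
    """
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # No invocations in a period should not be treated as a failure.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   params = lambda_error_alarm("db-updater", "<sns-topic-arn>")
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the alarm definition as plain data makes it easy to review and to apply the same rule to other functions.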

The Aqua Fields team promptly identified the issue and engaged the on-call channel. However, because the US team was unavailable and it was late at night in India, the response was delayed. The India team resolved the incident at 10:52 PM UTC on August 21st by increasing the Lambda function's ephemeral storage.
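
For reference, a change of this kind can be expressed through the Lambda `UpdateFunctionConfiguration` API. The sketch below is an assumption about how such a change might look, not a record of the exact change made: the function name and the 4096 MB value are hypothetical, since this report does not state the new size.

```python
# Lambda's documented ephemeral storage range, in MB.
MIN_MB, MAX_MB = 512, 10240


def ephemeral_storage_update(function_name: str, size_mb: int) -> dict:
    """Build keyword arguments for Lambda UpdateFunctionConfiguration."""
    if not MIN_MB <= size_mb <= MAX_MB:
        raise ValueError(
            f"ephemeral storage must be between {MIN_MB} and {MAX_MB} MB, got {size_mb}"
        )
    return {"FunctionName": function_name, "EphemeralStorage": {"Size": size_mb}}


# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   params = ephemeral_storage_update("db-updater", 4096)
#   boto3.client("lambda").update_function_configuration(**params)
```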

We apologize for any inconvenience caused by this disruption. We are taking steps to improve our monitoring and alerting capabilities, including implementing automated remediation where possible, to prevent similar incidents in the future.
Posted Aug 21, 2024 - 20:30 UTC