
Incident Response Plan

Status: DRAFT
Owner: Engineering
Last Review: 2026-02-21
Applicable Standards: SOC 2 (CC7.3, CC7.4, CC7.5) / GDPR (Art. 33, Art. 34) / SEC (data breach disclosure)

1. Purpose

This document defines the incident response procedures for the Equa platform. It covers how incidents are detected, who is notified, how they are contained and remediated, and how post-incident reviews are conducted.

2. Scope

| Component | In Scope | Notes |
|---|---|---|
| equa-server | Yes | Application-level incidents, API outages, data breaches |
| equa-web | Yes | Frontend availability, client-side security issues |
| PostgreSQL (Cloud SQL) | Yes | Database incidents, data corruption, unauthorized access |
| AWS S3 | Yes | Document storage incidents, access control breaches |
| Google Cloud Run | Yes | Infrastructure incidents, deployment failures |
| equabot-gateway | Yes | AI agent incidents, permission violations |

For incidents involving the AI agent (Equanaut), also refer to the gateway-specific incident response procedures documented in the equabot-gateway repository. Agent-specific controls include rate limiting (AGENT_MAX_TOOL_CALLS_PER_MINUTE, AGENT_MAX_WRITE_OPS_PER_MINUTE, AGENT_MAX_DESTRUCTIVE_PER_HOUR) and the permission proxy that enforces user-level permissions on all agent tool calls.

Source: equa-server/modules/agent/src/security/guardrails.ts
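As a rough sketch, a per-minute limit like AGENT_MAX_TOOL_CALLS_PER_MINUTE can be enforced with a sliding-window counter. The class and names below are illustrative assumptions, not the actual guardrails.ts implementation.

```typescript
// Hypothetical sliding-window rate limiter for agent tool calls.
// The limit value would come from AGENT_MAX_TOOL_CALLS_PER_MINUTE;
// the class itself is a sketch, not the real guardrails module.
class SlidingWindowCounter {
  private timestamps: number[] = [];

  constructor(private limit: number, private windowMs = 60_000) {}

  // Returns true and records the call if it is within the limit;
  // returns false (call should be blocked) otherwise.
  tryAcquire(now: number = Date.now()): boolean {
    // Drop entries that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

A separate counter instance per user (and per limit type, e.g. write ops vs. destructive ops) keeps violations isolated to the offending principal.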

3. Incident Severity Levels

| Severity | Description | Examples | Response Time |
|---|---|---|---|
| P1 — Critical | Service outage or confirmed data breach | Database compromise, production down, unauthorized data access | Immediate (within 15 minutes) |
| P2 — High | Degraded service or suspected security incident | Partial outage, unusual access patterns, failed deployment causing errors | Within 1 hour |
| P3 — Medium | Non-critical issue with potential security impact | Elevated error rates, dependency vulnerability disclosed, suspicious login activity | Within 4 hours |
| P4 — Low | Minor issue, no immediate security impact | Performance degradation, non-critical bug, informational security alert | Within 24 hours |
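The severity-to-SLA mapping above can be expressed directly in code, which makes SLA-breach checks auditable. This is a minimal illustrative sketch, not part of equa-server.

```typescript
// Severity levels and their acknowledgement SLAs, per the table above.
type Severity = "P1" | "P2" | "P3" | "P4";

const RESPONSE_SLA_MINUTES: Record<Severity, number> = {
  P1: 15,   // immediate (within 15 minutes)
  P2: 60,   // within 1 hour
  P3: 240,  // within 4 hours
  P4: 1440, // within 24 hours
};

// True if the gap between detection and first response exceeded the SLA.
function slaBreached(severity: Severity, detectedAt: Date, respondedAt: Date): boolean {
  const elapsedMinutes = (respondedAt.getTime() - detectedAt.getTime()) / 60_000;
  return elapsedMinutes > RESPONSE_SLA_MINUTES[severity];
}
```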

4. Phase 1: Detection

4.1 Automated Detection

| Mechanism | What It Detects | Current Status |
|---|---|---|
| Health endpoint monitoring | Cloud Run instance health, application availability | Implemented (Cloud Run built-in) |
| Error logging | Application exceptions, unhandled errors | Implemented (application logs) |
| Cloud Run metrics | Request latency, error rates, instance count | Available (Google Cloud Monitoring) |
| Database health | Cloud SQL availability, connection pool exhaustion | Available (Google Cloud Monitoring) |
| Agent guardrails | Tool call rate limit violations, unauthorized write operations | Implemented (equa-server/modules/agent/src/security/guardrails.ts) |

4.2 Manual Detection

| Source | What It Detects |
|---|---|
| User reports | Functionality issues, unexpected behavior, suspicious activity |
| Team observation | Unusual patterns during routine operations |
| Third-party notification | Vulnerability disclosure, vendor security advisory |

4.3 Detection Gaps

The following detection capabilities should be implemented to improve incident identification.

| Gap | Recommendation |
|---|---|
| No external uptime monitoring | Deploy a third-party uptime monitor (e.g., Better Uptime, Pingdom) |
| No alerting on error rate spikes | Configure Cloud Monitoring alerts for 5xx rate exceeding threshold |
| No authentication anomaly detection | Monitor for brute-force patterns, credential stuffing, geographic anomalies |
| No audit log anomaly detection | Alert on unusual admin actions, bulk data access, or privilege escalation |
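In production the "5xx rate exceeding threshold" check would be a Cloud Monitoring alerting policy; the sketch below only illustrates the underlying ratio computation, with assumed names.

```typescript
// Illustrative 5xx error-rate check over a sample window.
type RequestSample = { status: number };

// True if the fraction of server errors (status >= 500) exceeds the
// threshold (e.g. 0.05 for 5%). An empty window never alerts.
function errorRateExceeds(samples: RequestSample[], threshold: number): boolean {
  if (samples.length === 0) return false;
  const serverErrors = samples.filter((s) => s.status >= 500).length;
  return serverErrors / samples.length > threshold;
}
```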

5. Phase 2: Notification

5.1 Internal Notification

When an incident is detected, the following notification chain is activated:
| Step | Action | Responsible |
|---|---|---|
| 1 | Incident detected (automated alert or manual report) | Detection system / reporter |
| 2 | Incident logged with severity level, timestamp, and initial description | First responder |
| 3 | Incident lead assigned based on severity and type | Engineering lead |
| 4 | Notification sent to relevant team members | Incident lead |
| 5 | For P1/P2: executive stakeholders notified | Incident lead |
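The escalation rule in the chain above (executives only for P1/P2) can be captured in a small routing function. Role names here are placeholders, not real contact lists.

```typescript
// Illustrative notification routing for the chain above.
type Severity = "P1" | "P2" | "P3" | "P4";

// Returns the roles to notify for a given severity. Every incident pages
// the incident lead and an engineering responder; P1/P2 also escalate to
// the executive sponsor (step 5 above).
function notificationTargets(severity: Severity): string[] {
  const targets = ["incident-lead", "engineering-responder"];
  if (severity === "P1" || severity === "P2") {
    targets.push("executive-sponsor");
  }
  return targets;
}
```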

5.2 External Notification

| Scenario | Notification Required | Timeline |
|---|---|---|
| Confirmed data breach (PII) | Affected users, relevant supervisory authority | GDPR Article 33: 72 hours to authority; Article 34: without undue delay to users |
| Confirmed data breach (financial) | Affected users, state attorneys general (per state breach notification laws) | Varies by state; typically 30-60 days |
| Service outage | Affected users via status page or email | As soon as impact is confirmed |
| Vulnerability in third-party dependency | No external notification unless exploited | Internal assessment first |
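The GDPR Article 33 clock runs from the moment the controller becomes aware of the breach. A trivial deadline helper makes the remaining time explicit during a response; this is a sketch, not legal tooling.

```typescript
// GDPR Article 33: the supervisory authority must be notified within
// 72 hours of becoming aware of a personal data breach.
function gdprAuthorityDeadline(awareAt: Date): Date {
  return new Date(awareAt.getTime() + 72 * 3_600_000);
}

// Hours left on the 72-hour clock (negative once the deadline has passed).
function hoursRemaining(awareAt: Date, now: Date): number {
  return (gdprAuthorityDeadline(awareAt).getTime() - now.getTime()) / 3_600_000;
}
```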

6. Phase 3: Containment

6.1 Immediate Containment Actions

| Action | When to Use | How |
|---|---|---|
| Isolate Cloud Run instance | Suspected compromised instance | Deploy a new revision with the fix; route traffic away from the compromised revision |
| Revoke sessions | Suspected credential compromise | Truncate the sessions table or invalidate specific user sessions via equa-server/modules/auth/src/sessions.ts |
| Disable user account | Confirmed compromised account | Set Users.enabled = false; destroy all active sessions |
| Block IP range | Active attack from identifiable source | Configure Cloud Armor or firewall rules |
| Rotate secrets | Suspected secret exposure | Rotate API_SESSION_SECRET, TWO_FACTOR_PRIVATE_KEY, database credentials, and OAuth secrets; redeploy |
| Enable maintenance mode | Widespread compromise requiring investigation | Deploy a static maintenance page; stop processing requests |
| Disable agent | AI agent acting outside expected parameters | Revoke agent permissions via the permission proxy; disable agent tool access |
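The "revoke sessions" action above has two shapes: blanket revocation (P1 containment) and targeted revocation for one account. The SQL below assumes a `sessions` table with a `user_id` column; that schema, and the helper itself, are assumptions for illustration, not the actual sessions.ts module.

```typescript
// Hypothetical builder for the two session-revocation queries described
// above. Parameterized to avoid interpolating the user id into SQL.
function revokeSessionsSql(userId?: string): { text: string; params: string[] } {
  if (userId === undefined) {
    // Blanket revocation: invalidate every active session on the platform.
    return { text: "TRUNCATE TABLE sessions", params: [] };
  }
  // Targeted revocation for a single compromised account.
  return { text: "DELETE FROM sessions WHERE user_id = $1", params: [userId] };
}
```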

6.2 Cloud Run-Specific Containment

Google Cloud Run provides natural containment boundaries:
  • Instance isolation — Each request is handled by an isolated container instance
  • Revision-based deployment — Traffic can be routed to a known-good revision instantly
  • Scaling controls — Min/max instances can be adjusted (currently 1-10) to limit blast radius
  • Service disable — The entire service can be stopped if necessary

6.3 Database Containment

  • Read-only mode — Cloud SQL can be set to read-only to prevent further data modification
  • Point-in-time recovery — Cloud SQL automated backups enable restoration to a specific timestamp
  • Connection kill — Active database connections can be terminated to stop ongoing unauthorized queries

7. Phase 4: Remediation

7.1 Root Cause Analysis

  1. Collect evidence — Preserve logs, database snapshots, and affected container images before any remediation
  2. Timeline reconstruction — Build a chronological timeline of the incident from first indicator to detection
  3. Attack vector identification — Determine how the incident occurred (vulnerability, misconfiguration, credential compromise, etc.)
  4. Impact assessment — Identify all affected data, users, and systems
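Step 2 above (timeline reconstruction) amounts to merging events from several evidence sources and sorting them chronologically. The event shape below is illustrative.

```typescript
// Illustrative timeline reconstruction: merge per-source event lists
// (application logs, database audit records, etc.) into one ordered timeline.
type TimelineEvent = { at: Date; source: string; description: string };

function buildTimeline(...sources: TimelineEvent[][]): TimelineEvent[] {
  return sources.flat().sort((a, b) => a.at.getTime() - b.at.getTime());
}

// Span from first indicator to last recorded event, in minutes.
function incidentDurationMinutes(timeline: TimelineEvent[]): number {
  if (timeline.length < 2) return 0;
  const first = timeline[0].at.getTime();
  const last = timeline[timeline.length - 1].at.getTime();
  return (last - first) / 60_000;
}
```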

7.2 Remediation Actions

| Category | Actions |
|---|---|
| Code fix | Patch the vulnerability, deploy via normal CI/CD pipeline with expedited review |
| Configuration fix | Update infrastructure configuration (IAM, firewall, Cloud Run settings) |
| Credential rotation | Rotate all potentially compromised credentials and secrets |
| Data restoration | Restore from backup if data was corrupted or deleted |
| User notification | Notify affected users with clear description of impact and actions taken |
| Monitoring enhancement | Add detection rules to catch similar incidents in the future |

7.3 Verification

Before declaring the incident resolved:
  1. Deploy the fix to a staging environment and verify
  2. Deploy to production
  3. Monitor for recurrence (minimum 24 hours for P1/P2)
  4. Confirm all containment measures have been reversed (or intentionally kept)
  5. Verify affected systems are operating normally

8. Phase 5: Post-Incident Review

8.1 Timeline

| Severity | Review Deadline |
|---|---|
| P1 — Critical | Within 48 hours of resolution |
| P2 — High | Within 5 business days |
| P3 — Medium | Within 10 business days |
| P4 — Low | Monthly batch review |

8.2 Post-Incident Report Template

Incident Report: [INCIDENT-YYYY-MM-DD-NNN]

Summary:        [One-sentence description]
Severity:       [P1/P2/P3/P4]
Duration:       [Detection time to resolution time]
Impact:         [Users affected, data affected, service degradation]

Timeline:
  [timestamp] — [event description]
  [timestamp] — [event description]
  ...

Root Cause:     [Technical description of what went wrong]
Contributing Factors: [Process, tooling, or organizational factors]

Remediation:
  - [Action taken]
  - [Action taken]

Prevention:
  - [Improvement to prevent recurrence]
  - [Improvement to detect earlier]
  - [Improvement to contain faster]

Action Items:
  - [ ] [Specific task] — Owner: [name] — Due: [date]
  - [ ] [Specific task] — Owner: [name] — Due: [date]
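The report identifier format in the template (INCIDENT-YYYY-MM-DD-NNN) is the detection date plus a zero-padded daily sequence number. A minimal sketch of generating it, for illustration only:

```typescript
// Build an INCIDENT-YYYY-MM-DD-NNN identifier from the detection timestamp
// (UTC) and a daily sequence number, zero-padded to three digits.
function incidentId(detectedAt: Date, sequence: number): string {
  const yyyy = detectedAt.getUTCFullYear();
  const mm = String(detectedAt.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(detectedAt.getUTCDate()).padStart(2, "0");
  const nnn = String(sequence).padStart(3, "0");
  return `INCIDENT-${yyyy}-${mm}-${dd}-${nnn}`;
}
```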

8.3 Blameless Culture

Post-incident reviews focus on systemic improvements, not individual fault. The goal is to understand what happened, why existing controls failed to prevent or detect it, and what changes will reduce the likelihood and impact of similar incidents.

9. Roles and Responsibilities

| Role | Responsibilities |
|---|---|
| Incident Lead | Coordinates response, makes containment decisions, owns communication |
| Engineering Responder | Investigates technical root cause, implements fixes |
| Communications Lead | Drafts user notifications, updates status page, handles external inquiries |
| Executive Sponsor | Approves external communications for P1/P2, allocates resources |

10. Annual Review

This incident response plan is reviewed and updated:
  • Annually as part of the security program review
  • After every P1/P2 incident to incorporate lessons learned
  • After infrastructure changes that affect detection or containment capabilities

11. Cross-References

| Topic | Document |
|---|---|
| Security controls and encryption | Security Architecture |
| Access control and permission model | Access Control Model |
| Audit logging and event tracking | Audit Trail Design |
| Data breach notification (GDPR) | Data Privacy and GDPR |
| Evidence retention after incidents | Data Retention Policy |

12. Regulatory References

| Standard | Requirement | Current Status |
|---|---|---|
| SOC 2 CC7.3 | Evaluate security events to determine if they are incidents | Implemented — severity classification defined (P1-P4) |
| SOC 2 CC7.4 | Respond to identified security incidents | Documented — containment and remediation procedures defined |
| SOC 2 CC7.5 | Identify the cause of incidents and take corrective action | Documented — post-incident review process with root cause analysis |
| GDPR Art. 33 | Notification of personal data breach to supervisory authority within 72 hours | Documented — external notification timeline defined |
| GDPR Art. 34 | Communication of personal data breach to data subjects | Documented — user notification procedures defined |
| SEC | Material cybersecurity incident disclosure (Form 8-K for public companies) | Noted — Equa currently serves private companies; relevant if customers have public parent entities |

13. Revision History

| Date | Version | Author | Changes |
|---|---|---|---|
| 2026-02-21 | 0.1 | Agent (Phase 5 Session A) | Initial draft |
| 2026-02-21 | 0.2 | Agent (Phase 5 Session B) | Template alignment (status header, scope table, numbered sections), agent incident cross-reference, regulatory references table, cross-references to other compliance docs |