Incident Response Plan
Status: DRAFT
Owner: Engineering
Last Review: 2026-02-21
Applicable Standards: SOC 2 (CC7.3, CC7.4, CC7.5) / GDPR (Art. 33, Art. 34) / SEC (data breach disclosure)
1. Purpose
This document defines the incident response procedures for the Equa platform. It covers how incidents are detected, who is notified, how they are contained and remediated, and how post-incident reviews are conducted.
2. Scope
| Component | In Scope | Notes |
|---|---|---|
| equa-server | Yes | Application-level incidents, API outages, data breaches |
| equa-web | Yes | Frontend availability, client-side security issues |
| PostgreSQL (Cloud SQL) | Yes | Database incidents, data corruption, unauthorized access |
| AWS S3 | Yes | Document storage incidents, access control breaches |
| Google Cloud Run | Yes | Infrastructure incidents, deployment failures |
| equabot-gateway | Yes | AI agent incidents, permission violations |
For incidents involving the AI agent (Equanaut), also refer to the gateway-specific incident response procedures documented in the equabot-gateway repository. Agent-specific controls include rate limiting (AGENT_MAX_TOOL_CALLS_PER_MINUTE, AGENT_MAX_WRITE_OPS_PER_MINUTE, AGENT_MAX_DESTRUCTIVE_PER_HOUR) and the permission proxy that enforces user-level permissions on all agent tool calls.
Source: equa-server/modules/agent/src/security/guardrails.ts
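The rate limits named above could be enforced with a sliding-window counter. The following is a minimal sketch of that technique, not the code in guardrails.ts: the class name, the action names, and the numeric limit values are all assumptions made for illustration.

```typescript
// Illustrative sliding-window limiter; NOT the implementation in guardrails.ts.
type AgentAction = "toolCall" | "writeOp" | "destructive";

interface Limit {
  max: number;      // maximum events allowed inside the window
  windowMs: number; // window length in milliseconds
}

// Hypothetical values; the real ones come from the AGENT_MAX_* settings.
const EXAMPLE_LIMITS: Record<AgentAction, Limit> = {
  toolCall: { max: 60, windowMs: 60 * 1000 },         // AGENT_MAX_TOOL_CALLS_PER_MINUTE
  writeOp: { max: 10, windowMs: 60 * 1000 },          // AGENT_MAX_WRITE_OPS_PER_MINUTE
  destructive: { max: 3, windowMs: 60 * 60 * 1000 },  // AGENT_MAX_DESTRUCTIVE_PER_HOUR
};

class SlidingWindowLimiter {
  private limits: Record<AgentAction, Limit>;
  private events = new Map<string, number[]>();

  constructor(limits: Record<AgentAction, Limit>) {
    this.limits = limits;
  }

  // Returns true and records the event if the action is within its limit;
  // returns false (and records nothing) if the limit is already reached.
  tryAcquire(userId: string, action: AgentAction, nowMs: number): boolean {
    const { max, windowMs } = this.limits[action];
    const key = `${userId}:${action}`;
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.events.get(key) ?? []).filter((t) => nowMs - t < windowMs);
    const allowed = recent.length < max;
    if (allowed) recent.push(nowMs);
    this.events.set(key, recent);
    return allowed;
  }
}
```

A rejected call is a candidate detection signal: under this sketch, repeated `false` results for the same user would be worth logging as a guardrail violation.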
3. Incident Severity Levels
| Severity | Description | Examples | Response Time |
|---|---|---|---|
| P1 — Critical | Service outage or confirmed data breach | Database compromise, production down, unauthorized data access | Immediate (within 15 minutes) |
| P2 — High | Degraded service or suspected security incident | Partial outage, unusual access patterns, failed deployment causing errors | Within 1 hour |
| P3 — Medium | Non-critical issue with potential security impact | Elevated error rates, dependency vulnerability disclosed, suspicious login activity | Within 4 hours |
| P4 — Low | Minor issue, no immediate security impact | Performance degradation, non-critical bug, informational security alert | Within 24 hours |
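The response-time targets in the table can be expressed as data so that tooling can flag an overdue response. This is a minimal sketch; the function and constant names are hypothetical and do not refer to existing Equa code.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

// Response-time targets from the severity table, in minutes.
const RESPONSE_MINUTES: Record<Severity, number> = {
  P1: 15,
  P2: 60,
  P3: 4 * 60,
  P4: 24 * 60,
};

// Latest time by which a responder should have picked up the incident.
function responseDeadline(severity: Severity, detectedAt: Date): Date {
  return new Date(detectedAt.getTime() + RESPONSE_MINUTES[severity] * 60 * 1000);
}

// True once the response-time target for this severity has passed.
function isResponseOverdue(severity: Severity, detectedAt: Date, now: Date): boolean {
  return now.getTime() > responseDeadline(severity, detectedAt).getTime();
}
```

For example, a P1 detected at 10:00 UTC has a response deadline of 10:15 UTC under this mapping.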
4. Phase 1: Detection
4.1 Automated Detection
| Mechanism | What It Detects | Current Status |
|---|---|---|
| Health endpoint monitoring | Cloud Run instance health, application availability | Implemented (Cloud Run built-in) |
| Error logging | Application exceptions, unhandled errors | Implemented (application logs) |
| Cloud Run metrics | Request latency, error rates, instance count | Available (Google Cloud Monitoring) |
| Database health | Cloud SQL availability, connection pool exhaustion | Available (Google Cloud Monitoring) |
| Agent guardrails | Tool call rate limit violations, unauthorized write operations | Implemented (equa-server/modules/agent/src/security/guardrails.ts) |
4.2 Manual Detection
| Source | What It Detects |
|---|---|
| User reports | Functionality issues, unexpected behavior, suspicious activity |
| Team observation | Unusual patterns during routine operations |
| Third-party notification | Vulnerability disclosure, vendor security advisory |
4.3 Detection Gaps
The following detection capabilities should be implemented to improve incident identification.
| Gap | Recommendation |
|---|---|
| No external uptime monitoring | Deploy a third-party uptime monitor (e.g., Better Uptime, Pingdom) |
| No alerting on error rate spikes | Configure Cloud Monitoring alerts for 5xx rate exceeding threshold |
| No authentication anomaly detection | Monitor for brute-force patterns, credential stuffing, geographic anomalies |
| No audit log anomaly detection | Alert on unusual admin actions, bulk data access, or privilege escalation |
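The error-rate recommendation above comes down to a threshold check over a trailing window. In practice this would be configured as a Cloud Monitoring alerting policy; the sketch below only illustrates the logic. The 5% threshold, the 5-minute window, the minimum request count, and all names are placeholder assumptions, not agreed values.

```typescript
interface RequestSample {
  status: number;      // HTTP status code of the response
  timestampMs: number; // time the response was sent, in epoch milliseconds
}

// Samples that fall inside the trailing window ending at nowMs.
function recentSamples(samples: RequestSample[], nowMs: number, windowMs: number): RequestSample[] {
  return samples.filter((s) => s.timestampMs > nowMs - windowMs && s.timestampMs <= nowMs);
}

// Fraction of recent requests that returned a 5xx status (0 if there were none).
function serverErrorRate(samples: RequestSample[], nowMs: number, windowMs: number): number {
  const recent = recentSamples(samples, nowMs, windowMs);
  if (recent.length === 0) return 0;
  const errors = recent.filter((s) => s.status >= 500 && s.status <= 599).length;
  return errors / recent.length;
}

// Alerts only when there is enough traffic for the rate to be meaningful.
function shouldAlert(
  samples: RequestSample[],
  nowMs: number,
  windowMs: number = 5 * 60 * 1000, // assumed 5-minute window
  threshold: number = 0.05,         // assumed 5% threshold
  minRequests: number = 20,         // assumed minimum sample size
): boolean {
  const recent = recentSamples(samples, nowMs, windowMs);
  if (recent.length < minRequests) return false;
  return serverErrorRate(recent, nowMs, windowMs) > threshold;
}
```

The minimum request count is a design choice to avoid paging on a single failed request during low-traffic periods.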
5. Phase 2: Notification
5.1 Internal Notification
When an incident is detected, the following notification chain is activated:
| Step | Action | Responsible |
|---|---|---|
| 1 | Incident detected (automated alert or manual report) | Detection system / reporter |
| 2 | Incident logged with severity level, timestamp, and initial description | First responder |
| 3 | Incident lead assigned based on severity and type | Engineering lead |
| 4 | Notification sent to relevant team members | Incident lead |
| 5 | For P1/P2: executive stakeholders notified | Incident lead |
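Steps 2, 4, and 5 of the chain can be sketched as a small record plus a routing rule. The type and function names below are hypothetical and are only meant to show the shape of the data.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";
type Audience = "team" | "executives";

interface IncidentRecord {
  severity: Severity;
  detectedAt: string;  // ISO 8601 timestamp
  description: string; // initial description from the first responder
  lead: string | null; // filled in when the engineering lead assigns an incident lead (step 3)
}

// Step 2: log the incident with severity, timestamp, and initial description.
function logIncident(severity: Severity, description: string, detectedAt: Date): IncidentRecord {
  return { severity, detectedAt: detectedAt.toISOString(), description, lead: null };
}

// Steps 4 and 5: relevant team members are always notified;
// executive stakeholders are added for P1 and P2 only.
function internalAudiences(severity: Severity): Audience[] {
  const audiences: Audience[] = ["team"];
  if (severity === "P1" || severity === "P2") audiences.push("executives");
  return audiences;
}
```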
5.2 External Notification
| Scenario | Notification Required | Timeline |
|---|---|---|
| Confirmed data breach (PII) | Relevant supervisory authority and affected users | GDPR Art. 33: authority within 72 hours of becoming aware of the breach; Art. 34: affected users without undue delay when the breach is likely to result in a high risk to them |
| Confirmed data breach (financial) | Affected users, state attorneys general (per state breach notification laws) | Varies by state; typically 30 to 60 days |
| Service outage | Affected users via status page or email | As soon as impact is confirmed |
| Vulnerability in third-party dependency | No external notification unless exploited | Internal assessment first |
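Because the GDPR Art. 33 clock starts when the organization becomes aware of the breach, it helps to compute the authority deadline explicitly at that moment. A minimal sketch, with hypothetical function names:

```typescript
// GDPR Art. 33: notify the supervisory authority within 72 hours of becoming aware.
const GDPR_AUTHORITY_WINDOW_HOURS = 72;
const MS_PER_HOUR = 60 * 60 * 1000;

// Latest time by which the supervisory authority must be notified.
function authorityNotificationDeadline(awareAt: Date): Date {
  return new Date(awareAt.getTime() + GDPR_AUTHORITY_WINDOW_HOURS * MS_PER_HOUR);
}

// Hours left before the deadline; negative once the deadline has passed.
function hoursRemaining(awareAt: Date, now: Date): number {
  return (authorityNotificationDeadline(awareAt).getTime() - now.getTime()) / MS_PER_HOUR;
}
```

The sketch works on absolute timestamps, so weekends and holidays do not extend the window.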
6. Phase 3: Containment
6.1 Containment Actions
| Action | When to Use | How |
|---|---|---|
| Isolate Cloud Run instance | Suspected compromised instance | Deploy a new revision with the fix; route traffic away from compromised revision |
| Revoke sessions | Suspected credential compromise | Truncate the sessions table or invalidate specific user sessions via equa-server/modules/auth/src/sessions.ts |
| Disable user account | Confirmed compromised account | Set Users.enabled = false; destroy all active sessions |
| Block IP range | Active attack from identifiable source | Configure Cloud Armor or firewall rules |
| Rotate secrets | Suspected secret exposure | Rotate API_SESSION_SECRET, TWO_FACTOR_PRIVATE_KEY, database credentials, OAuth secrets; redeploy |
| Enable maintenance mode | Widespread compromise requiring investigation | Deploy a static maintenance page; stop processing requests |
| Disable agent | AI agent acting outside expected parameters | Revoke agent permissions via permission proxy; disable agent tool access |
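The "Disable user account" action combines two steps whose order matters: disable the account first so no new session can be created, then destroy the existing sessions. The sketch below uses in-memory maps as stand-ins for the Users and sessions tables; it does not reflect the actual API of sessions.ts, and every name in it is hypothetical.

```typescript
interface User {
  id: string;
  enabled: boolean;
}

interface Session {
  id: string;
  userId: string;
}

// In-memory stand-ins for the Users and sessions tables (assumption for illustration).
class InMemoryStore {
  users = new Map<string, User>();
  sessions = new Map<string, Session>();
}

// Disables the account, then removes every active session for that user.
// Returns the number of sessions destroyed, which is worth recording in the incident log.
function containCompromisedAccount(store: InMemoryStore, userId: string): number {
  const user = store.users.get(userId);
  if (!user) throw new Error(`Unknown user: ${userId}`);
  user.enabled = false;

  let destroyed = 0;
  store.sessions.forEach((session, sessionId) => {
    if (session.userId === userId) {
      store.sessions.delete(sessionId);
      destroyed++;
    }
  });
  return destroyed;
}
```

Against a real database, both steps would ideally run in a single transaction so that a failure cannot leave the account disabled with sessions still live, or the reverse.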
6.2 Cloud Run-Specific Containment
Google Cloud Run provides natural containment boundaries:
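- Instance isolation — Each container instance runs in its own sandbox, isolated from other instances; a single instance may still serve multiple concurrent requests, depending on the service's concurrency setting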
- Revision-based deployment — Traffic can be routed to a known-good revision instantly
- Scaling controls — Min/max instances can be adjusted (currently 1 to 10) to limit blast radius
- Service disable — The entire service can be stopped if necessary
6.3 Database Containment
- Read-only mode — Cloud SQL can be set to read-only to prevent further data modification
- Point-in-time recovery — Cloud SQL automated backups enable restoration to a specific timestamp
- Connection kill — Active database connections can be terminated to stop ongoing unauthorized queries
7. Phase 4: Remediation
7.1 Root Cause Analysis
- Collect evidence — Preserve logs, database snapshots, and affected container images before any remediation
- Timeline reconstruction — Build a chronological timeline of the incident from first indicator to detection
- Attack vector identification — Determine how the incident occurred (vulnerability, misconfiguration, credential compromise, etc.)
- Impact assessment — Identify all affected data, users, and systems
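Timeline reconstruction usually means merging events from several evidence sources into one chronological list. A minimal sketch, assuming each source has already been parsed into events with ISO 8601 timestamps; the type and function names are hypothetical.

```typescript
interface TimelineEvent {
  at: string;          // ISO 8601 timestamp, e.g. "2026-02-21T10:05:00Z"
  source: string;      // where the evidence came from (application logs, audit log, ...)
  description: string;
}

// Merges events from any number of sources and orders them chronologically.
// The input arrays are left unmodified.
function buildTimeline(...sources: TimelineEvent[][]): TimelineEvent[] {
  const merged = ([] as TimelineEvent[]).concat(...sources);
  return merged.sort((a, b) => Date.parse(a.at) - Date.parse(b.at));
}

// Minutes between the first indicator in the timeline and the moment of detection.
function minutesToDetect(timeline: TimelineEvent[], detectedAt: string): number {
  if (timeline.length === 0) throw new Error("Timeline is empty");
  return (Date.parse(detectedAt) - Date.parse(timeline[0].at)) / (60 * 1000);
}
```

Normalizing all timestamps to UTC before merging avoids ordering errors when sources log in different time zones.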
7.2 Remediation Actions
| Category | Actions |
|---|---|
| Code fix | Patch the vulnerability, deploy via normal CI/CD pipeline with expedited review |
| Configuration fix | Update infrastructure configuration (IAM, firewall, Cloud Run settings) |
| Credential rotation | Rotate all potentially compromised credentials and secrets |
| Data restoration | Restore from backup if data was corrupted or deleted |
| User notification | Notify affected users with clear description of impact and actions taken |
| Monitoring enhancement | Add detection rules to catch similar incidents in the future |
7.3 Verification
Before declaring the incident resolved:
- Deploy the fix to a staging environment and verify
- Deploy to production
- Monitor for recurrence (minimum 24 hours for P1/P2)
- Confirm all containment measures have been reversed (or intentionally kept)
- Verify affected systems are operating normally
8. Phase 5: Post-Incident Review
8.1 Timeline
| Severity | Review Deadline |
|---|---|
| P1 — Critical | Within 48 hours of resolution |
| P2 — High | Within 5 business days |
| P3 — Medium | Within 10 business days |
| P4 — Low | Monthly batch review |
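The review deadlines mix clock hours (P1) with business days (P2, P3), which is easy to miscount by hand. The sketch below assumes business days are Monday through Friday in UTC and ignores public holidays; both are simplifying assumptions, and the function names are hypothetical.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

// Adds whole business days (Mon-Fri, UTC), skipping weekends. Holidays are not considered.
function addBusinessDays(start: Date, days: number): Date {
  const result = new Date(start.getTime());
  let remaining = days;
  while (remaining > 0) {
    result.setUTCDate(result.getUTCDate() + 1);
    const day = result.getUTCDay(); // 0 = Sunday, 6 = Saturday
    if (day !== 0 && day !== 6) remaining--;
  }
  return result;
}

// Review deadline per the table; null means the incident goes to the monthly batch review.
function reviewDeadline(severity: Severity, resolvedAt: Date): Date | null {
  switch (severity) {
    case "P1":
      return new Date(resolvedAt.getTime() + 48 * 60 * 60 * 1000);
    case "P2":
      return addBusinessDays(resolvedAt, 5);
    case "P3":
      return addBusinessDays(resolvedAt, 10);
    case "P4":
      return null;
  }
}
```

For an incident resolved on Friday 2026-02-20, this gives a P2 review deadline of Friday 2026-02-27.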
8.2 Post-Incident Report Template
Incident Report: [INCIDENT-YYYY-MM-DD-NNN]
Summary: [One-sentence description]
Severity: [P1/P2/P3/P4]
Duration: [Detection time to resolution time]
Impact: [Users affected, data affected, service degradation]
Timeline:
[timestamp] — [event description]
[timestamp] — [event description]
...
Root Cause: [Technical description of what went wrong]
Contributing Factors: [Process, tooling, or organizational factors]
Remediation:
- [Action taken]
- [Action taken]
Prevention:
- [Improvement to prevent recurrence]
- [Improvement to detect earlier]
- [Improvement to contain faster]
Action Items:
- [ ] [Specific task] — Owner: [name] — Due: [date]
- [ ] [Specific task] — Owner: [name] — Due: [date]
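If reports are stored or generated by tooling, the template maps naturally onto a typed record. The sketch below mirrors the template fields and shows one possible way to build the INCIDENT-YYYY-MM-DD-NNN identifier; the field names, the helper names, and the use of the UTC date are assumptions.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

interface ActionItem {
  task: string;
  owner: string;
  due: string; // ISO 8601 date
  done: boolean;
}

// Mirrors the fields of the post-incident report template.
interface IncidentReport {
  id: string;
  summary: string;
  severity: Severity;
  detectedAt: string;
  resolvedAt: string;
  impact: string;
  timeline: { at: string; description: string }[];
  rootCause: string;
  contributingFactors: string[];
  remediation: string[];
  prevention: string[];
  actionItems: ActionItem[];
}

// Builds an identifier such as INCIDENT-2026-02-21-007 from the UTC date and a daily sequence number.
function incidentId(date: Date, sequence: number): string {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `INCIDENT-${day}-${String(sequence).padStart(3, "0")}`;
}

// Duration field: minutes from detection to resolution.
function durationMinutes(report: Pick<IncidentReport, "detectedAt" | "resolvedAt">): number {
  return (Date.parse(report.resolvedAt) - Date.parse(report.detectedAt)) / (60 * 1000);
}
```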
8.3 Blameless Culture
Post-incident reviews focus on systemic improvements, not individual fault. The goal is to understand what happened, why existing controls failed to prevent or detect it, and what changes will reduce the likelihood and impact of similar incidents.
9. Roles and Responsibilities
| Role | Responsibilities |
|---|---|
| Incident Lead | Coordinates response, makes containment decisions, owns communication |
| Engineering Responder | Investigates technical root cause, implements fixes |
| Communications Lead | Drafts user notifications, updates status page, handles external inquiries |
| Executive Sponsor | Approves external communications for P1/P2, allocates resources |
10. Annual Review
This incident response plan is reviewed and updated:
- Annually as part of the security program review
- After every P1/P2 incident to incorporate lessons learned
- When infrastructure changes affect detection or containment capabilities
11. Cross-References
| Topic | Document |
|---|---|
| Security controls and encryption | Security Architecture |
| Access control and permission model | Access Control Model |
| Audit logging and event tracking | Audit Trail Design |
| Data breach notification (GDPR) | Data Privacy and GDPR |
| Evidence retention after incidents | Data Retention Policy |
12. Regulatory References
| Standard | Requirement | Current Status |
|---|---|---|
| SOC 2 CC7.3 | Evaluate security events to determine if they are incidents | Implemented — severity classification defined (P1 through P4) |
| SOC 2 CC7.4 | Respond to identified security incidents | Documented — containment and remediation procedures defined |
| SOC 2 CC7.5 | Identify the cause of incidents and take corrective action | Documented — post-incident review process with root cause analysis |
| GDPR Art. 33 | Notification of personal data breach to supervisory authority within 72 hours | Documented — external notification timeline defined |
| GDPR Art. 34 | Communication of personal data breach to data subjects | Documented — user notification procedures defined |
| SEC | Material cybersecurity incident disclosure (Form 8-K for public companies) | Noted — Equa currently serves private companies; relevant if customers have public parent entities |
13. Revision History
| Date | Version | Author | Changes |
|---|---|---|---|
| 2026-02-21 | 0.1 | Agent (Phase 5 Session A) | Initial draft |
| 2026-02-21 | 0.2 | Agent (Phase 5 Session B) | Template alignment (status header, scope table, numbered sections), agent incident cross-reference, regulatory references table, cross-references to other compliance docs |