
SPEC 023 — File Storage

| Field | Value |
|---|---|
| Status | DRAFT |
| Priority | P0 — Launch-Critical |
| Backend | `equa-server/modules/file-storage/` |
| Frontend | `equa-web/src/service/lib/http-client.ts` (`postMultipart()`) |

1. Feature Purpose

File Storage is the foundational module that all file-handling features in Equa depend on. It manages upload, storage, retrieval, and deduplication of files using AWS S3 as the backing store and a content-addressed hashing scheme for deduplication. Every document uploaded to Data Rooms (SPEC 008), agreement PDFs (SPEC 005), generated reports (SPEC 014), and Google Drive synced files (SPEC 016) ultimately flows through this module. A secondary Microsoft file storage integration handles files managed through the Microsoft 365 connector (SPEC 017).

2. Current State (Verified)

2.1 Storage Backend

| Detail | Value |
|---|---|
| Primary store | AWS S3 |
| Addressing | Content-addressed by file hash (SHA-256) |
| Deduplication | `HashedFiles` table stores unique content hashes; multiple `Files` rows can reference the same hash |
| Upload limit | `AWS_S3_UPLOAD_SIZE_LIMIT_MB` (default 10 MB) |
| Static URL | `STATIC_FILE_URL` — base URL for public/signed file access |
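
The content-addressing scheme above can be sketched in a few lines, assuming a Node.js backend with the built-in `node:crypto` module (the actual hashing code in `equa-server` may differ):

```typescript
import { createHash } from "node:crypto";

// SHA-256 hex digest of the file bytes — the dedup key stored in HashedFiles.
// Identical content always yields the same hash, so a second upload of the
// same bytes maps to the existing HashedFiles row and S3 object.
function contentHash(data: Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}
```

Because the hash depends only on content, renaming a file or re-uploading it under a different owner produces a new `Files` row but no new S3 object.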

2.2 Upload Flow

| Detail | Value |
|---|---|
| Frontend method | `postMultipart()` in `equa-web/src/service/lib/http-client.ts` |
| HTTP client | Axios with `Content-Type: multipart/form-data` |
| Progress tracking | `onUploadProgress` callback for UI progress bars |
| Server processing | Compute hash → check `HashedFiles` → upload to S3 if new → create `Files` record |

2.3 Microsoft File Storage

| Detail | Value |
|---|---|
| Module path | `equa-server/modules/file-storage/` (shared module, MS-specific adapters) |
| Integration | Files from Microsoft 365 (SharePoint, OneDrive) stored via the same S3 pipeline |
| Metadata | Original Microsoft file metadata preserved in the `Files` record |

2.4 Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `AWS_S3_UPLOAD_SIZE_LIMIT_MB` | No | 10 | Maximum upload size in megabytes |
| `STATIC_FILE_URL` | Yes | — | Base URL for serving stored files |
| AWS credentials | Yes | — | Standard AWS credential chain (env vars, IAM role, etc.) |
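
A minimal sketch of how these variables could be read with the required/default semantics from the table (the loader function and its name are hypothetical, not the actual server code):

```typescript
// Hypothetical config loader mirroring the table above.
interface FileStorageConfig {
  uploadSizeLimitMb: number; // AWS_S3_UPLOAD_SIZE_LIMIT_MB, optional, default 10
  staticFileUrl: string;     // STATIC_FILE_URL, required
}

function loadFileStorageConfig(env: Record<string, string | undefined>): FileStorageConfig {
  const staticFileUrl = env.STATIC_FILE_URL;
  if (!staticFileUrl) {
    throw new Error("STATIC_FILE_URL is required"); // fail fast at startup
  }
  return {
    uploadSizeLimitMb: Number(env.AWS_S3_UPLOAD_SIZE_LIMIT_MB ?? "10"),
    staticFileUrl,
  };
}
```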

3. Data Model

Files

| Column | Type | Constraints |
|---|---|---|
| `id` | uuid | PK |
| `hash` | Hash | NOT NULL, FK → `HashedFiles` — content-addressed reference |
| `filename` | varchar | NOT NULL — original filename |
| `url` | varchar | NOT NULL — S3 object URL |
| `extension` | varchar | nullable — file extension (e.g. `pdf`, `docx`) |
| `contentType` | varchar | nullable — MIME type (e.g. `application/pdf`) |
| `size` | number | NOT NULL — file size in bytes |
| `owner` | uuid | FK → `Users`, NOT NULL |

HashedFiles

| Column | Type | Constraints |
|---|---|---|
| `hash` | Hash | PK — SHA-256 content hash |
| `size` | number | NOT NULL — file size in bytes (for quick size lookups without joining `Files`) |

Relationships

  • Files → HashedFiles: Many-to-one. Multiple Files records (different filenames, owners, or contexts) can reference the same HashedFiles row when content is identical.
  • Files → Users: Many-to-one via owner. Tracks who uploaded the file.
  • DirectoryItems → Files: Many-to-one via DirectoryItems.file (SPEC 008). Data Room entries reference Files.
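
The two tables and their many-to-one relationship can be sketched as TypeScript types (illustrative shapes derived from the columns above, not the actual ORM entities):

```typescript
interface HashedFile {
  hash: string; // SHA-256 hex digest, primary key
  size: number; // bytes, duplicated here for size lookups without joining Files
}

interface FileRecord {
  id: string;           // uuid
  hash: string;         // FK → HashedFile.hash
  filename: string;     // original filename
  url: string;          // S3 object URL
  extension?: string;   // e.g. "pdf", "docx"
  contentType?: string; // e.g. "application/pdf"
  size: number;         // bytes
  owner: string;        // FK → Users.id
}

// The dedup win in one line: distinct S3 objects vs. metadata rows.
function storedObjectCount(files: FileRecord[]): number {
  return new Set(files.map((f) => f.hash)).size;
}
```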

4. API Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| POST | `/api/v1/files/upload` | Yes | Upload a file (`multipart/form-data`) |
| POST | `/api/v1/organizations/:id/files/upload` | Yes | Upload a file scoped to an organization |
| GET | `/api/v1/files/:fileId` | Yes | Get file metadata |
| GET | `/api/v1/files/:fileId/download` | Yes | Download file content (streams from S3) |
| DELETE | `/api/v1/files/:fileId` | Yes | Delete file record (orphan cleanup handles S3) |
| GET | `/api/v1/organizations/:id/files` | Yes | List files for an organization |
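
Client-side path construction for these routes can be sketched as plain helpers (hypothetical names; the real client builds URLs inside `http-client.ts`):

```typescript
// Hypothetical path builders matching the endpoint table above.
const filePath = (fileId: string) =>
  `/api/v1/files/${encodeURIComponent(fileId)}`;

const fileDownloadPath = (fileId: string) =>
  `${filePath(fileId)}/download`;

const orgFilesPath = (orgId: string) =>
  `/api/v1/organizations/${encodeURIComponent(orgId)}/files`;

const orgUploadPath = (orgId: string) =>
  `${orgFilesPath(orgId)}/upload`;
```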

Upload Processing Pipeline

  1. Client sends multipart/form-data via postMultipart() (Axios)
  2. Server receives file stream, computes SHA-256 hash
  3. Check HashedFiles for existing hash
  4. If hash exists → skip S3 upload (dedup), create new Files record pointing to existing hash
  5. If hash is new → upload to S3, create HashedFiles row, create Files record
  6. Return Files record with id, url, and metadata
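
The six steps above can be sketched end to end; `Db` and `ObjectStore` are hypothetical adapter interfaces standing in for the real database and S3 client:

```typescript
import { createHash } from "node:crypto";

interface Db {
  hashExists(hash: string): Promise<boolean>;
  insertHashedFile(hash: string, size: number): Promise<void>;
  insertFile(meta: {
    hash: string;
    filename: string;
    owner: string;
    size: number;
  }): Promise<{ id: string; url: string }>;
}

interface ObjectStore {
  putObject(key: string, body: Buffer): Promise<void>;
}

async function handleUpload(
  db: Db,
  store: ObjectStore,
  body: Buffer,
  filename: string,
  owner: string
): Promise<{ id: string; url: string }> {
  const hash = createHash("sha256").update(body).digest("hex"); // step 2
  if (!(await db.hashExists(hash))) {                           // step 3
    await store.putObject(hash, body);                          // step 5: new content
    await db.insertHashedFile(hash, body.length);
  }
  // step 4 falls through here: existing hash → S3 upload skipped
  return db.insertFile({ hash, filename, owner, size: body.length }); // step 6
}
```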

5. Frontend Components

| Component | Path | Description |
|---|---|---|
| `postMultipart` | `service/lib/http-client.ts` | Axios wrapper for multipart file uploads with progress callback |
| `ProgressBar` | (shared component) | Upload progress indicator driven by `onUploadProgress` |

Frontend Behavior

  • Multipart upload — `postMultipart(url, formData, onProgress)` sends files as `multipart/form-data` with the `Content-Type` header set automatically by Axios.
  • Progress tracking — `onUploadProgress` receives Axios progress events with `loaded` and `total` bytes for percentage calculation.
  • Size validation — Frontend validates file size against `AWS_S3_UPLOAD_SIZE_LIMIT_MB` before initiating upload; over-limit files show an error toast.
  • Multiple files — Batch uploads send files sequentially (not in parallel) to avoid overwhelming the server; each file gets its own progress bar.
  • Retry — No automatic retry on upload failure; users see an error and can retry manually.
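
The size check and progress math described above reduce to two pure helpers (hypothetical names; the real equa-web logic lives alongside `postMultipart`):

```typescript
// Client-side size check mirroring AWS_S3_UPLOAD_SIZE_LIMIT_MB (default 10 MB).
function isWithinSizeLimit(sizeBytes: number, limitMb = 10): boolean {
  return sizeBytes <= limitMb * 1024 * 1024;
}

// Percentage for the progress bar from an Axios-style progress event
// ({ loaded, total } in bytes); guards against a missing total.
function uploadPercent(loaded: number, total?: number): number {
  return total && total > 0 ? Math.round((loaded / total) * 100) : 0;
}
```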

6. Business Rules

  1. Content-addressed deduplication — Files with identical content (same SHA-256 hash) are stored once in S3; additional uploads create new `Files` metadata records pointing to the same `HashedFiles` entry.
  2. Upload size limit — Files exceeding `AWS_S3_UPLOAD_SIZE_LIMIT_MB` (default 10 MB) are rejected at the server with a 413 response. The frontend also validates before upload.
  3. Hash as foreign key — `Files.hash` references `HashedFiles.hash`; the `HashedFiles` row must exist before the `Files` row is created.
  4. Owner tracking — Every file upload records the authenticated user as `Files.owner` for attribution and access control.
  5. S3 key structure — S3 object keys include the content hash to enable direct dedup lookup; the exact key format is {prefix}/{hash}/(unknown).
  6. Soft delete — Deleting a `Files` record does not immediately remove the S3 object; orphaned `HashedFiles` entries (no remaining `Files` references) are cleaned up by a background job.
  7. MIME type detection — Content type is determined from the file extension and validated against the file’s magic bytes where possible.
  8. Static file URL — `STATIC_FILE_URL` is prepended to file paths for client-facing URLs; this allows CDN or proxy configuration without changing stored paths.
  9. Microsoft file passthrough — Files originating from Microsoft 365 (via SPEC 017) flow through the same upload pipeline; Microsoft-specific metadata is preserved in the `Files` record.
  10. No direct S3 access — Clients never interact with S3 directly; all uploads and downloads are proxied through the API server for access control.
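
Rule 8 amounts to joining a stored relative path onto `STATIC_FILE_URL` at response time; a minimal sketch (the function name is hypothetical):

```typescript
// Resolve a client-facing URL from STATIC_FILE_URL and a stored relative
// path, tolerating stray slashes on either side of the join.
function resolveFileUrl(staticFileUrl: string, storedPath: string): string {
  const base = staticFileUrl.replace(/\/+$/, "");
  const path = storedPath.replace(/^\/+/, "");
  return `${base}/${path}`;
}
```

Storing only the relative path and resolving the full URL at response time is also what makes a later `STATIC_FILE_URL` change (CDN or proxy swap) safe.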

7. Acceptance Criteria

  • File upload via multipart form-data succeeds and returns file metadata with ID and URL
  • Uploading the same file content twice creates two Files records but only one HashedFiles / S3 object
  • Files exceeding the size limit are rejected with a 413 error and descriptive message
  • Frontend validates file size before upload and shows error toast for over-limit files
  • Upload progress bar reflects actual upload progress via onUploadProgress
  • File download streams content from S3 with correct Content-Type and Content-Disposition
  • File metadata endpoint returns correct filename, size, extension, content type, and owner
  • Deleting a file removes the Files record; S3 object persists if other records reference the same hash
  • Organization-scoped file listing returns only files belonging to that organization
  • STATIC_FILE_URL is correctly prepended to file URLs in API responses
  • Microsoft-sourced files are stored and retrievable through the same file endpoints
  • File owner is correctly set to the authenticated user on upload

8. Risks

| Risk | Impact | Mitigation |
|---|---|---|
| S3 bucket misconfiguration (public access) | Data exposure of all stored files | Audit bucket policies; block public access; use signed URLs with expiry |
| Upload size limit bypass via chunked encoding | Over-limit files consume storage | Enforce size limit at reverse proxy (nginx/ALB) and application layer |
| Hash collision (SHA-256) | Extremely unlikely, but would serve wrong file content | Finding a collision requires on the order of 2^128 hash operations (birthday bound); no practical mitigation needed |
| Orphan S3 objects accumulate | Wasted storage costs | Run periodic orphan cleanup job comparing `HashedFiles` references to S3 inventory |
| No multipart/resumable upload for large files | Uploads fail on slow connections near the 10 MB limit | Consider S3 presigned URL uploads or chunked multipart for files >5 MB |
| `STATIC_FILE_URL` change breaks existing URLs | Stored file URLs become unreachable | Use relative paths internally; resolve full URL at response time |
| Concurrent upload of identical content | Race condition creating duplicate `HashedFiles` rows | Use `INSERT ... ON CONFLICT` (upsert) on `hash` column |
| No virus/malware scanning | Malicious files stored and served | Add ClamAV or S3 Object Lambda scanning before persisting |
| File deletion without cascade awareness | Orphaned `DirectoryItems` pointing to deleted `Files` | Enforce referential integrity; check for references before delete |
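
The upsert mitigation for the concurrent-upload race can be sketched as a parameterized statement, assuming a Postgres-style client with a `query(text, params)` method (the helper name is illustrative):

```typescript
// ON CONFLICT makes concurrent inserts of the same hash a no-op instead of
// a unique-constraint error, so racing uploads both proceed to create their
// Files rows against a single HashedFiles entry.
const UPSERT_HASHED_FILE = `
  INSERT INTO "HashedFiles" (hash, size)
  VALUES ($1, $2)
  ON CONFLICT (hash) DO NOTHING
`;

interface SqlClient {
  query(text: string, params: unknown[]): Promise<unknown>;
}

async function ensureHashedFile(client: SqlClient, hash: string, size: number): Promise<void> {
  await client.query(UPSERT_HASHED_FILE, [hash, size]);
}
```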