SPEC 016 — Google Drive Integration
| Field | Value |
|---|---|
| Status | DRAFT |
| Priority | P2 — Integration |
| Backend | equa-server/modules/google-drive/ |
| Frontend | equa-web/src/modules/google-drive/ |
1. Feature Purpose
Google Drive Integration lets organizations connect a Google account and automatically synchronize files from specific Google Drive folders into the Equa Data Room. Admins configure which Google Drive folders map to which Data Room paths, and the system handles ongoing sync by detecting new files, updates, and deletions. This eliminates manual file uploads for teams that already use Google Drive as their primary document store.
2. Current State (Verified)
2.1 OAuth Connection Flow
| Detail | Value |
|---|---|
| Auth type | OAuth 2.0 authorization code flow |
| Scopes | drive.readonly, drive.metadata.readonly (read-only access) |
| Token storage | GoogleDriveConnections.accessToken / refreshToken (encrypted at rest) |
| Token refresh | Automatic refresh when tokenExpiresAt is in the past |
| Env vars | GOOGLE_DRIVE_CLIENT_ID, GOOGLE_DRIVE_CLIENT_SECRET, GOOGLE_DRIVE_REDIRECT_URI, GOOGLE_DRIVE_ENABLED |
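The connection flow in the table above could look roughly like the following sketch. The helper names `buildConsentUrl` and `isTokenExpired` and the `OAuthConfig` shape are hypothetical and not taken from the codebase. The endpoint and query parameters are the standard Google OAuth 2.0 ones, and `access_type=offline` is assumed because the spec stores a refresh token.

```typescript
// Sketch only: helper names and the OAuthConfig shape are assumptions.
const GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth";

// The two read-only scopes from the table, in their full URL form.
const DRIVE_SCOPES = [
  "https://www.googleapis.com/auth/drive.readonly",
  "https://www.googleapis.com/auth/drive.metadata.readonly",
];

interface OAuthConfig {
  clientId: string; // value of GOOGLE_DRIVE_CLIENT_ID
  redirectUri: string; // value of GOOGLE_DRIVE_REDIRECT_URI
}

// Builds the consent URL that the connect endpoint would return to the client.
function buildConsentUrl(config: OAuthConfig, state: string): string {
  const params: Array<[string, string]> = [
    ["client_id", config.clientId],
    ["redirect_uri", config.redirectUri],
    ["response_type", "code"], // authorization code flow
    ["scope", DRIVE_SCOPES.join(" ")],
    ["access_type", "offline"], // ask Google for a refresh token
    ["prompt", "consent"],
    ["state", state], // opaque value checked again in the callback
  ];
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${GOOGLE_AUTH_ENDPOINT}?${query}`;
}

// A token needs a refresh once tokenExpiresAt is no longer in the future.
function isTokenExpired(tokenExpiresAt: Date, now: Date = new Date()): boolean {
  return tokenExpiresAt.getTime() <= now.getTime();
}
```

Passing `now` as a parameter keeps the expiry check deterministic in tests; a real implementation might also refresh slightly early to allow for clock skew.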
2.2 Sync Configuration
| Detail | Value |
|---|---|
| Mapping model | GoogleDriveSyncConfigurations — one row per folder mapping |
| Folder picker | Frontend folder browser hitting Google Drive API via backend proxy |
| Target path | Maps to a Data Room virtual path (SPEC 008 directory structure) |
| Subfolder support | Optional via syncSubfolders boolean |
| File type filter | JSONB column supporting include/exclude MIME type lists |
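The file type filter could be evaluated along the lines of the sketch below. The function name `passesFileTypeFilter` is hypothetical, and the precedence rule is an assumption: the exclude list always wins, and a non-empty include list acts as an allowlist. The spec does not state how the two lists interact.

```typescript
// Shape of the fileTypeFilter JSONB column as described in the data model.
interface FileTypeFilter {
  include: string[];
  exclude: string[];
}

// Assumed semantics: exclude takes precedence; an empty include list means "allow all".
function passesFileTypeFilter(mimeType: string, filter: FileTypeFilter | null): boolean {
  if (filter === null) return true; // no filter configured
  if (filter.exclude.indexOf(mimeType) !== -1) return false;
  if (filter.include.length > 0) return filter.include.indexOf(mimeType) !== -1;
  return true;
}
```

For example, a filter of `{ include: ["application/pdf"], exclude: [] }` would let PDFs through and drop everything else.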
2.3 Sync Execution
| Detail | Value |
|---|---|
| Trigger | Manual (UI button) or scheduled (cron-based) |
| Change detection | Compares googleModifiedTime and googleMd5Checksum against last-synced values |
| Conflict resolution | Google Drive wins — local copy is overwritten on mismatch |
| History tracking | GoogleDriveSyncHistory records every sync run with file counts and errors |
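A minimal sketch of the change-detection rule follows. The names `decideSyncAction`, `RemoteFile`, and `SyncedFileRecord` are hypothetical. The `modifiedTime` and `md5Checksum` fields mirror the Drive API file resource, and the checksum is compared only when both sides have one, since Google-native documents carry no MD5.

```typescript
// Subset of a Drive API file resource (modifiedTime is an RFC 3339 string).
interface RemoteFile {
  id: string;
  modifiedTime: string;
  md5Checksum?: string;
}

// Subset of a GoogleDriveSyncedFiles row.
interface SyncedFileRecord {
  googleModifiedTime: Date;
  googleMd5Checksum: string | null;
}

type SyncAction = "add" | "update" | "skip";

function decideSyncAction(remote: RemoteFile, stored: SyncedFileRecord | undefined): SyncAction {
  if (stored === undefined) return "add"; // never synced before
  const timeChanged =
    new Date(remote.modifiedTime).getTime() !== stored.googleModifiedTime.getTime();
  // Compare checksums only when both are present; otherwise rely on modifiedTime.
  const md5Changed =
    remote.md5Checksum !== undefined &&
    stored.googleMd5Checksum !== null &&
    remote.md5Checksum !== stored.googleMd5Checksum;
  // Google Drive wins: any mismatch means the local copy is replaced.
  return timeChanged || md5Changed ? "update" : "skip";
}
```

Deletion detection is not covered here; it would need a separate pass over stored records whose `googleFileId` no longer appears in the folder listing.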
2.4 Frontend
| Component | Path |
|---|---|
| Sync page | equa-web/src/modules/google-drive/GoogleDriveSyncPage.tsx |
| Endpoints file | equa-server/modules/api/src/endpoints/google-drive-endpoints.ts |
3. Data Model
GoogleDriveConnections
| Column | Type | Constraints |
|---|---|---|
| id | uuid | PK |
| organization | uuid | FK → Organizations, NOT NULL |
| connectedBy | uuid | FK → Users, NOT NULL |
| accessToken | varchar | Encrypted |
| refreshToken | varchar | Encrypted |
| tokenExpiresAt | timestamp | NOT NULL |
| googleEmail | varchar | NOT NULL |
| googleUserId | varchar | NOT NULL |
| isActive | boolean | DEFAULT true |
| lastSyncAt | timestamp | nullable |
| syncError | varchar | nullable |
GoogleDriveSyncConfigurations
| Column | Type | Constraints |
|---|---|---|
| id | uuid | PK |
| organization | uuid | FK → Organizations, NOT NULL |
| connection | uuid | FK → GoogleDriveConnections, NOT NULL |
| googleFolderId | varchar | NOT NULL |
| googleFolderName | varchar | NOT NULL |
| targetDataRoomPath | varchar | NOT NULL |
| syncEnabled | boolean | DEFAULT true |
| syncSubfolders | boolean | DEFAULT false |
| lastSyncAt | timestamp | nullable |
| syncError | varchar | nullable |
| fileTypeFilter | jsonb | nullable — { include: string[], exclude: string[] } |
GoogleDriveSyncHistory
| Column | Type | Constraints |
|---|---|---|
| id | uuid | PK |
| organization | uuid | FK → Organizations, NOT NULL |
| configuration | uuid | FK → GoogleDriveSyncConfigurations, NOT NULL |
| triggeredBy | uuid | FK → Users, nullable (null for scheduled runs) |
| status | varchar | pending, running, completed, failed |
| errorMessage | varchar | nullable |
| filesProcessed | integer | DEFAULT 0 |
| filesAdded | integer | DEFAULT 0 |
| filesUpdated | integer | DEFAULT 0 |
| filesSkipped | integer | DEFAULT 0 |
| filesFailed | integer | DEFAULT 0 |
| startedAt | timestamp | NOT NULL |
| completedAt | timestamp | nullable |
| details | jsonb | nullable — per-file results |
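The counter columns above could be derived from the per-file results stored in details, as in this sketch. The `FileResult` shape and the `summarizeResults` name are assumptions, since the spec does not define the structure of the details JSONB. It also assumes filesProcessed is the total across all outcomes.

```typescript
type FileOutcome = "added" | "updated" | "skipped" | "failed";

// Hypothetical per-file entry inside the details JSONB column.
interface FileResult {
  googleFileId: string;
  outcome: FileOutcome;
  error?: string;
}

interface SyncCounts {
  filesProcessed: number;
  filesAdded: number;
  filesUpdated: number;
  filesSkipped: number;
  filesFailed: number;
}

// Folds per-file outcomes into the counter columns of a history row.
function summarizeResults(results: FileResult[]): SyncCounts {
  const counts: SyncCounts = {
    filesProcessed: 0,
    filesAdded: 0,
    filesUpdated: 0,
    filesSkipped: 0,
    filesFailed: 0,
  };
  for (const result of results) {
    counts.filesProcessed += 1;
    if (result.outcome === "added") counts.filesAdded += 1;
    else if (result.outcome === "updated") counts.filesUpdated += 1;
    else if (result.outcome === "skipped") counts.filesSkipped += 1;
    else counts.filesFailed += 1;
  }
  return counts;
}
```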
GoogleDriveSyncedFiles
| Column | Type | Constraints |
|---|---|---|
| id | uuid | PK |
| organization | uuid | FK → Organizations, NOT NULL |
| configuration | uuid | FK → GoogleDriveSyncConfigurations, NOT NULL |
| googleFileId | varchar | NOT NULL |
| googleFileName | varchar | NOT NULL |
| googleMimeType | varchar | NOT NULL |
| googleModifiedTime | timestamp | NOT NULL |
| googleMd5Checksum | varchar | nullable (Google Docs lack MD5) |
| localFileId | uuid | FK → Files, nullable |
| dataRoomPath | varchar | NOT NULL |
| syncStatus | varchar | synced, pending, error, deleted |
| lastSyncedAt | timestamp | nullable |
| syncError | varchar | nullable |
| size | bigint | DEFAULT 0 |
4. API Endpoints
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /api/v1/organizations/:id/google-drive/status | Yes | Check connection status and feature flag |
| POST | /api/v1/organizations/:id/google-drive/connect | Yes | Initiate OAuth flow, return redirect URL |
| GET | /api/v1/organizations/:id/google-drive/callback | Yes | Handle OAuth callback, store tokens |
| DELETE | /api/v1/organizations/:id/google-drive/disconnect | Yes | Revoke tokens and deactivate connection |
| GET | /api/v1/organizations/:id/google-drive/folders | Yes | Browse Google Drive folders (proxy to Drive API) |
| GET | /api/v1/organizations/:id/google-drive/configurations | Yes | List sync configurations |
| POST | /api/v1/organizations/:id/google-drive/configurations | Yes | Create a new folder-to-path mapping |
| PUT | /api/v1/organizations/:id/google-drive/configurations/:configId | Yes | Update sync configuration |
| DELETE | /api/v1/organizations/:id/google-drive/configurations/:configId | Yes | Remove sync configuration |
| POST | /api/v1/organizations/:id/google-drive/sync | Yes | Trigger manual sync for all active configurations |
| POST | /api/v1/organizations/:id/google-drive/sync/:configId | Yes | Trigger sync for a specific configuration |
| GET | /api/v1/organizations/:id/google-drive/history | Yes | List sync history with pagination |
| GET | /api/v1/organizations/:id/google-drive/files | Yes | List synced files for a configuration |
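To keep client code aligned with the route table above, paths could be built by a small helper like the one sketched here. The `googleDrivePath` name is hypothetical; the path segments are copied from the table, and only the configurations and sync routes accept a trailing configId.

```typescript
type GoogleDriveResource =
  | "status"
  | "connect"
  | "callback"
  | "disconnect"
  | "folders"
  | "configurations"
  | "sync"
  | "history"
  | "files";

// Builds a path from the endpoint table; configId is allowed only where the table defines it.
function googleDrivePath(
  orgId: string,
  resource: GoogleDriveResource,
  configId?: string,
): string {
  const base = `/api/v1/organizations/${encodeURIComponent(orgId)}/google-drive/${resource}`;
  if (configId === undefined) return base;
  if (resource !== "configurations" && resource !== "sync") {
    throw new Error(`Resource "${resource}" does not take a configId`);
  }
  return `${base}/${encodeURIComponent(configId)}`;
}
```

For example, `googleDrivePath("org-1", "sync", "cfg-9")` yields the path for triggering a sync of a single configuration.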
5. Frontend Components
| Component | Path | Description |
|---|---|---|
| GoogleDriveSyncPage | google-drive/GoogleDriveSyncPage.tsx | Main integration page — connection status, folder mappings, sync controls |
| ConnectionCard | google-drive/components/ConnectionCard.tsx | Displays connected Google account with email, status, disconnect button |
| FolderMappingList | google-drive/components/FolderMappingList.tsx | List of configured folder-to-path mappings with enable/disable toggles |
| FolderPicker | google-drive/components/FolderPicker.tsx | Browsable tree selector for Google Drive folders |
| SyncHistoryTable | google-drive/components/SyncHistoryTable.tsx | Paginated table of past sync runs with status, counts, and error details |
Frontend Behavior
- Feature gate: Page is hidden unless GOOGLE_DRIVE_ENABLED is true and the user has editDocuments permission.
- OAuth popup: Connect button opens a popup for Google consent; callback closes the popup and refreshes connection status.
- Sync progress: Manual sync triggers poll the history endpoint until the run status reaches completed or failed.
- Error display: Configuration-level and file-level sync errors are surfaced inline with retry options.
- Folder picker: Lazy-loads folder children on expand; caches results for the session.
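The sync-progress behavior above could be sketched as a polling loop. `pollUntilDone` and `isTerminalStatus` are hypothetical names, and the interval and attempt cap are illustrative defaults rather than values from the spec. The status fetcher and the wait function are injected so the loop can be exercised without a network or real timers.

```typescript
type SyncRunStatus = "pending" | "running" | "completed" | "failed";

// A run is finished once it reaches completed or failed.
function isTerminalStatus(status: SyncRunStatus): boolean {
  return status === "completed" || status === "failed";
}

// Polls until the run reaches a terminal status or the attempt cap is hit.
async function pollUntilDone(
  fetchStatus: () => Promise<SyncRunStatus>, // e.g. reads the latest run from the history endpoint
  wait: (ms: number) => Promise<void>, // e.g. a setTimeout-based delay in the browser
  intervalMs: number = 2000, // assumed interval
  maxAttempts: number = 150, // assumed cap, about 5 minutes at 2 s
): Promise<SyncRunStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (isTerminalStatus(status)) return status;
    await wait(intervalMs);
  }
  throw new Error("Sync run did not finish within the polling window");
}
```

Capping the number of attempts keeps a stuck run from polling forever; the UI could surface the thrown error alongside the other inline sync errors.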
6. Business Rules
- One connection per organization: Only one Google account can be connected at a time; connecting a new account replaces the previous one.
- Read-only access: The integration requests only drive.readonly scopes; it never modifies files on Google Drive.
- Token refresh: Access tokens are refreshed automatically before API calls when expired; refresh failures set syncError and mark the connection for re-auth.
- Google Drive wins: During sync, if a file's googleModifiedTime or googleMd5Checksum differs from the stored values, the local copy is replaced.
- Subfolder opt-in: Subfolders are only traversed when syncSubfolders is explicitly enabled on the configuration.
- File type filtering: When fileTypeFilter is set, only files matching the include list (or not in the exclude list) are synced.
- Sync isolation: Each configuration syncs independently; a failure in one does not block others.
- History retention: Sync history records are retained indefinitely for audit; the details JSONB field stores per-file outcomes.
- Soft disconnect: Disconnecting sets isActive = false and revokes the OAuth token but preserves previously synced files in the Data Room.
- Google Docs export: Google-native formats (Docs, Sheets, Slides) are exported to their Microsoft Office equivalents (docx, xlsx, pptx) during sync since they lack direct download URLs.
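The Google Docs export rule could be expressed as a lookup from Google-native MIME types to Office targets, as sketched below. The `resolveExport` name and `ExportTarget` shape are hypothetical; the MIME type strings are the standard identifiers for these formats. A non-null result would be passed as the target type to the Drive API export call, while a null result means the file can be downloaded as-is.

```typescript
interface ExportTarget {
  mimeType: string; // Office MIME type to request from the export call
  extension: string; // extension appended to the stored file name
}

// Maps a Google-native MIME type to its Office equivalent, or null for regular files.
function resolveExport(googleMimeType: string): ExportTarget | null {
  switch (googleMimeType) {
    case "application/vnd.google-apps.document":
      return {
        mimeType: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        extension: "docx",
      };
    case "application/vnd.google-apps.spreadsheet":
      return {
        mimeType: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        extension: "xlsx",
      };
    case "application/vnd.google-apps.presentation":
      return {
        mimeType: "application/vnd.openxmlformats-officedocument.presentationml.presentation",
        extension: "pptx",
      };
    default:
      return null; // binary file: download directly
  }
}
```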
7. Acceptance Criteria
- Admin can connect a Google account via OAuth popup and see the connected email
- Admin can disconnect the Google account; synced files remain in the Data Room
- Admin can browse Google Drive folders and select one for sync
- Admin can map a Google Drive folder to a specific Data Room path
- Admin can enable/disable subfolder traversal per mapping
- Admin can set file type filters (include or exclude by MIME type)
- Manual sync detects new, updated, and deleted files correctly
- Sync history shows accurate counts for processed/added/updated/skipped/failed files
- Failed syncs display error messages with retry option
- Token refresh happens transparently; expired tokens do not block sync
- Google Docs are exported as Office-format files
- Page is not visible when GOOGLE_DRIVE_ENABLED is false
- Non-admin users cannot connect/disconnect or modify sync configurations
8. Risks
| Risk | Impact | Mitigation |
|---|---|---|
| OAuth token stored in database | Token theft enables Google Drive read access | Encrypt tokens at rest; restrict DB access; rotate on disconnect |
| Google API rate limits (Drive API: 12,000 queries per minute per project by default) | Sync fails mid-run for large folders | Implement exponential backoff; batch requests; track quota usage |
| Large file sync (>100 MB) | Timeout or memory pressure during download/upload to S3 | Stream files through the server without buffering; set per-file timeout |
| Google Docs lack MD5 checksums | Cannot detect content changes for native Google formats | Fall back to modifiedTime comparison for Docs/Sheets/Slides |
| Subfolder traversal on deeply nested structures | Excessive API calls and slow sync | Cap recursion depth; paginate folder listing; show progress |
| Stale refresh tokens (Google revokes after 6 months of inactivity) | Silent sync failure | Proactive health check on connection status; notify admin to re-auth |
| Concurrent sync triggers (manual + scheduled) | Duplicate file processing, race conditions | Lock sync per configuration; skip if already running |
| Feature flag misconfiguration | Users see broken integration page | Check GOOGLE_DRIVE_ENABLED on both frontend route guard and API middleware |
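The exponential backoff mitigation for rate limits could compute its delays as in this sketch. The `backoffDelayMs` name and the default base, cap, and jitter values are assumptions; the shape (doubling delay plus up to one second of random jitter, capped at a maximum) follows the commonly recommended pattern for retrying Google API calls.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt plus jitter, capped.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000, // assumed base delay
  capMs: number = 64000, // assumed maximum delay
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const exponential = baseMs * Math.pow(2, attempt);
  const jitter = Math.floor(random() * 1000); // up to 1 s so retries do not align
  return Math.min(capMs, exponential + jitter);
}
```

A sync worker would sleep for `backoffDelayMs(n)` after the n-th rate-limit response and give up after a fixed number of attempts, recording the failure in the run's errorMessage.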