SPEC 016 — Google Drive Integration

| Field | Value |
| --- | --- |
| Status | DRAFT |
| Priority | P2 — Integration |
| Backend | equa-server/modules/google-drive/ |
| Frontend | equa-web/src/modules/google-drive/ |

1. Feature Purpose

Google Drive Integration lets organizations connect a Google account and automatically synchronize files from specific Google Drive folders into the Equa Data Room. Admins configure which Google Drive folders map to which Data Room paths, and the system handles ongoing sync — detecting new files, updates, and deletions. This eliminates manual file uploads for teams that already use Google Drive as their primary document store.

2. Current State (Verified)

2.1 OAuth Connection Flow

| Detail | Value |
| --- | --- |
| Auth type | OAuth 2.0 authorization code flow |
| Scopes | drive.readonly, drive.metadata.readonly (read-only access) |
| Token storage | GoogleDriveConnections.accessToken / refreshToken (encrypted at rest) |
| Token refresh | Automatic refresh when tokenExpiresAt is in the past |
| Env vars | GOOGLE_DRIVE_CLIENT_ID, GOOGLE_DRIVE_CLIENT_SECRET, GOOGLE_DRIVE_REDIRECT_URI, GOOGLE_DRIVE_ENABLED |
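
As a rough illustration of the connection flow in the table above, the sketch below builds a Google consent URL for the two read-only scopes and checks whether a stored token needs refreshing. The helper names (`buildAuthUrl`, `needsRefresh`, `OAuthConfig`) and the optional clock-skew margin are assumptions made for this example, not part of the verified implementation. The endpoint, parameter names, and scope URLs are the standard Google OAuth 2.0 values.

```typescript
// Hypothetical helpers for illustration only.
const GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth";

const DRIVE_SCOPES: string[] = [
  "https://www.googleapis.com/auth/drive.readonly",
  "https://www.googleapis.com/auth/drive.metadata.readonly",
];

interface OAuthConfig {
  clientId: string; // value of GOOGLE_DRIVE_CLIENT_ID
  redirectUri: string; // value of GOOGLE_DRIVE_REDIRECT_URI
}

// Builds the consent URL that a connect endpoint could return to the browser.
function buildAuthUrl(config: OAuthConfig, state: string): string {
  const params: Array<[string, string]> = [
    ["client_id", config.clientId],
    ["redirect_uri", config.redirectUri],
    ["response_type", "code"], // authorization code flow
    ["scope", DRIVE_SCOPES.join(" ")],
    ["access_type", "offline"], // asks Google to issue a refresh token
    ["state", state], // opaque value used to match the callback to this request
  ];
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${GOOGLE_AUTH_ENDPOINT}?${query}`;
}

// Returns true when the access token should be refreshed before an API call.
// skewMs is an assumed safety margin; pass 0 for the literal "in the past" rule.
function needsRefresh(tokenExpiresAt: Date, now: Date, skewMs: number = 60000): boolean {
  return tokenExpiresAt.getTime() - skewMs <= now.getTime();
}
```

A caller would typically generate `state` per request, store it server-side, and compare it in the callback handler before exchanging the code for tokens.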

2.2 Sync Configuration

| Detail | Value |
| --- | --- |
| Mapping model | GoogleDriveSyncConfigurations — one row per folder mapping |
| Folder picker | Frontend folder browser hitting Google Drive API via backend proxy |
| Target path | Maps to a Data Room virtual path (SPEC 008 directory structure) |
| Subfolder support | Optional via syncSubfolders boolean |
| File type filter | JSONB column supporting include/exclude MIME type lists |
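
The spec does not say how the include and exclude lists interact when both are set. The sketch below shows one plausible reading, where an exclude match always wins and an empty or missing include list allows everything. That precedence, the function name `passesFilter`, and the optional fields are assumptions for illustration.

```typescript
// Shape of the fileTypeFilter JSONB column; fields are optional here so that
// partially filled filters are tolerated (an assumption).
interface FileTypeFilter {
  include?: string[];
  exclude?: string[];
}

// Assumed semantics: exclude takes precedence over include, and a missing
// or empty include list means "allow every MIME type".
function passesFilter(mimeType: string, filter: FileTypeFilter | null): boolean {
  if (filter === null) return true; // no filter configured
  const include = filter.include ?? [];
  const exclude = filter.exclude ?? [];
  if (exclude.indexOf(mimeType) !== -1) return false;
  return include.length === 0 || include.indexOf(mimeType) !== -1;
}
```

For example, with `{ include: ["application/pdf"] }` a PDF passes and a PNG does not; with `{ exclude: ["image/png"] }` everything except PNG passes.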

2.3 Sync Execution

| Detail | Value |
| --- | --- |
| Trigger | Manual (UI button) or scheduled (cron-based) |
| Change detection | Compares googleModifiedTime and googleMd5Checksum against last-synced values |
| Conflict resolution | Google Drive wins — local copy is overwritten on mismatch |
| History tracking | GoogleDriveSyncHistory records every sync run with file counts and errors |
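
The change-detection rule above can be sketched as a small pure function. The type names and `decideAction` are hypothetical. `modifiedTime` and `md5Checksum` are the field names the Drive API v3 uses on file resources, and the stored fields mirror the GoogleDriveSyncedFiles columns.

```typescript
// Subset of a Drive API v3 file resource relevant to change detection.
interface RemoteFile {
  modifiedTime: string; // RFC 3339 timestamp
  md5Checksum?: string; // absent for Google-native formats
}

// Subset of a GoogleDriveSyncedFiles row.
interface SyncedRecord {
  googleModifiedTime: Date;
  googleMd5Checksum: string | null;
}

type SyncAction = "add" | "update" | "skip";

// "Google Drive wins": any mismatch in timestamp or checksum replaces the local copy.
// When neither side has a checksum, only the timestamp comparison applies.
function decideAction(remote: RemoteFile, stored: SyncedRecord | undefined): SyncAction {
  if (stored === undefined) return "add"; // never synced before
  const timeDiffers = Date.parse(remote.modifiedTime) !== stored.googleModifiedTime.getTime();
  const md5Differs = (remote.md5Checksum ?? null) !== stored.googleMd5Checksum;
  return timeDiffers || md5Differs ? "update" : "skip";
}
```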

2.4 Frontend

| Component | Path |
| --- | --- |
| Sync page | equa-web/src/modules/google-drive/GoogleDriveSyncPage.tsx |
| Endpoints file | equa-server/modules/api/src/endpoints/google-drive-endpoints.ts |

3. Data Model

GoogleDriveConnections

| Column | Type | Constraints |
| --- | --- | --- |
| id | uuid | PK |
| organization | uuid | FK → Organizations, NOT NULL |
| connectedBy | uuid | FK → Users, NOT NULL |
| accessToken | varchar | Encrypted |
| refreshToken | varchar | Encrypted |
| tokenExpiresAt | timestamp | NOT NULL |
| googleEmail | varchar | NOT NULL |
| googleUserId | varchar | NOT NULL |
| isActive | boolean | DEFAULT true |
| lastSyncAt | timestamp | nullable |
| syncError | varchar | nullable |

GoogleDriveSyncConfigurations

| Column | Type | Constraints |
| --- | --- | --- |
| id | uuid | PK |
| organization | uuid | FK → Organizations, NOT NULL |
| connection | uuid | FK → GoogleDriveConnections, NOT NULL |
| googleFolderId | varchar | NOT NULL |
| googleFolderName | varchar | NOT NULL |
| targetDataRoomPath | varchar | NOT NULL |
| syncEnabled | boolean | DEFAULT true |
| syncSubfolders | boolean | DEFAULT false |
| lastSyncAt | timestamp | nullable |
| syncError | varchar | nullable |
| fileTypeFilter | jsonb | nullable — { include: string[], exclude: string[] } |

GoogleDriveSyncHistory

| Column | Type | Constraints |
| --- | --- | --- |
| id | uuid | PK |
| organization | uuid | FK → Organizations, NOT NULL |
| configuration | uuid | FK → GoogleDriveSyncConfigurations, NOT NULL |
| triggeredBy | uuid | FK → Users, nullable (null for scheduled runs) |
| status | varchar | pending, running, completed, failed |
| errorMessage | varchar | nullable |
| filesProcessed | integer | DEFAULT 0 |
| filesAdded | integer | DEFAULT 0 |
| filesUpdated | integer | DEFAULT 0 |
| filesSkipped | integer | DEFAULT 0 |
| filesFailed | integer | DEFAULT 0 |
| startedAt | timestamp | NOT NULL |
| completedAt | timestamp | nullable |
| details | jsonb | nullable — per-file results |
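
To illustrate how the counter columns could relate to the per-file results stored in details, the sketch below tallies a list of outcomes into the five counts. The `FileResult` shape is an assumption, since the spec does not define the structure of the details column. Treating filesProcessed as the total of all outcomes is likewise an assumption.

```typescript
// Hypothetical per-file entry inside GoogleDriveSyncHistory.details.
type FileOutcome = "added" | "updated" | "skipped" | "failed";

interface FileResult {
  googleFileId: string;
  outcome: FileOutcome;
  error?: string; // only expected when outcome is "failed"
}

// Mirrors the counter columns on GoogleDriveSyncHistory.
interface SyncCounts {
  filesProcessed: number;
  filesAdded: number;
  filesUpdated: number;
  filesSkipped: number;
  filesFailed: number;
}

// Tallies per-file outcomes; filesProcessed is assumed to count every file seen.
function summarize(results: FileResult[]): SyncCounts {
  const counts: SyncCounts = {
    filesProcessed: 0,
    filesAdded: 0,
    filesUpdated: 0,
    filesSkipped: 0,
    filesFailed: 0,
  };
  for (const result of results) {
    counts.filesProcessed += 1;
    switch (result.outcome) {
      case "added": counts.filesAdded += 1; break;
      case "updated": counts.filesUpdated += 1; break;
      case "skipped": counts.filesSkipped += 1; break;
      case "failed": counts.filesFailed += 1; break;
    }
  }
  return counts;
}
```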

GoogleDriveSyncedFiles

| Column | Type | Constraints |
| --- | --- | --- |
| id | uuid | PK |
| organization | uuid | FK → Organizations, NOT NULL |
| configuration | uuid | FK → GoogleDriveSyncConfigurations, NOT NULL |
| googleFileId | varchar | NOT NULL |
| googleFileName | varchar | NOT NULL |
| googleMimeType | varchar | NOT NULL |
| googleModifiedTime | timestamp | NOT NULL |
| googleMd5Checksum | varchar | nullable (Google Docs lack MD5) |
| localFileId | uuid | FK → Files, nullable |
| dataRoomPath | varchar | NOT NULL |
| syncStatus | varchar | synced, pending, error, deleted |
| lastSyncedAt | timestamp | nullable |
| syncError | varchar | nullable |
| size | bigint | DEFAULT 0 |

4. API Endpoints

| Method | Path | Auth | Description |
| --- | --- | --- | --- |
| GET | /api/v1/organizations/:id/google-drive/status | Yes | Check connection status and feature flag |
| POST | /api/v1/organizations/:id/google-drive/connect | Yes | Initiate OAuth flow, return redirect URL |
| GET | /api/v1/organizations/:id/google-drive/callback | Yes | Handle OAuth callback, store tokens |
| DELETE | /api/v1/organizations/:id/google-drive/disconnect | Yes | Revoke tokens and deactivate connection |
| GET | /api/v1/organizations/:id/google-drive/folders | Yes | Browse Google Drive folders (proxy to Drive API) |
| GET | /api/v1/organizations/:id/google-drive/configurations | Yes | List sync configurations |
| POST | /api/v1/organizations/:id/google-drive/configurations | Yes | Create a new folder-to-path mapping |
| PUT | /api/v1/organizations/:id/google-drive/configurations/:configId | Yes | Update sync configuration |
| DELETE | /api/v1/organizations/:id/google-drive/configurations/:configId | Yes | Remove sync configuration |
| POST | /api/v1/organizations/:id/google-drive/sync | Yes | Trigger manual sync for all active configurations |
| POST | /api/v1/organizations/:id/google-drive/sync/:configId | Yes | Trigger sync for a specific configuration |
| GET | /api/v1/organizations/:id/google-drive/history | Yes | List sync history with pagination |
| GET | /api/v1/organizations/:id/google-drive/files | Yes | List synced files for a configuration |
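
A frontend client might centralize these routes in a small path builder so that components never assemble URLs by hand. The object name `googleDrivePaths` and its method names are hypothetical; only the paths come from the table above. Query parameters for pagination are omitted because the spec does not name them.

```typescript
// Shared prefix for every Google Drive endpoint of one organization.
function basePath(orgId: string): string {
  return `/api/v1/organizations/${encodeURIComponent(orgId)}/google-drive`;
}

// Hypothetical path builder; one entry per distinct route in the endpoint table.
const googleDrivePaths = {
  status: (orgId: string): string => `${basePath(orgId)}/status`,
  connect: (orgId: string): string => `${basePath(orgId)}/connect`,
  disconnect: (orgId: string): string => `${basePath(orgId)}/disconnect`,
  folders: (orgId: string): string => `${basePath(orgId)}/folders`,
  configurations: (orgId: string): string => `${basePath(orgId)}/configurations`,
  configuration: (orgId: string, configId: string): string =>
    `${basePath(orgId)}/configurations/${encodeURIComponent(configId)}`,
  syncAll: (orgId: string): string => `${basePath(orgId)}/sync`,
  syncOne: (orgId: string, configId: string): string =>
    `${basePath(orgId)}/sync/${encodeURIComponent(configId)}`,
  history: (orgId: string): string => `${basePath(orgId)}/history`,
  files: (orgId: string): string => `${basePath(orgId)}/files`,
};
```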

5. Frontend Components

| Component | Path | Description |
| --- | --- | --- |
| GoogleDriveSyncPage | google-drive/GoogleDriveSyncPage.tsx | Main integration page — connection status, folder mappings, sync controls |
| ConnectionCard | google-drive/components/ConnectionCard.tsx | Displays connected Google account with email, status, disconnect button |
| FolderMappingList | google-drive/components/FolderMappingList.tsx | List of configured folder-to-path mappings with enable/disable toggles |
| FolderPicker | google-drive/components/FolderPicker.tsx | Browsable tree selector for Google Drive folders |
| SyncHistoryTable | google-drive/components/SyncHistoryTable.tsx | Paginated table of past sync runs with status, counts, and error details |

Frontend Behavior

  • Feature gate — Page is hidden unless GOOGLE_DRIVE_ENABLED is true and the user has editDocuments permission.
  • OAuth popup — Connect button opens a popup for Google consent; callback closes the popup and refreshes connection status.
  • Sync progress — After a manual sync is triggered, the page polls the history endpoint until the run status reaches completed or failed.
  • Error display — Configuration-level and file-level sync errors are surfaced inline with retry options.
  • Folder picker — Lazy-loads folder children on expand; caches results for the session.
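
The sync-progress behavior described above could be implemented along these lines. This is a minimal sketch: `pollUntilDone`, the injected `fetchRun` and `sleep` callbacks, the interval, and the attempt cap are all assumptions chosen to keep the example testable without a network.

```typescript
type RunStatus = "pending" | "running" | "completed" | "failed";

interface SyncRun {
  id: string;
  status: RunStatus;
}

// Polls until the run reaches a terminal status (completed or failed).
// fetchRun would wrap a call to the history endpoint; sleep is injected for testing.
async function pollUntilDone(
  fetchRun: () => Promise<SyncRun>,
  sleep: (ms: number) => Promise<void>,
  intervalMs: number = 2000, // assumed polling interval
  maxAttempts: number = 150, // assumed cap so the UI never polls forever
): Promise<SyncRun> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const run = await fetchRun();
    if (run.status === "completed" || run.status === "failed") {
      return run;
    }
    await sleep(intervalMs);
  }
  throw new Error("Sync run did not finish within the polling window");
}
```

In a browser, `sleep` could be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`; injecting it keeps the loop easy to unit test.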

6. Business Rules

  1. One connection per organization — Only one Google account can be connected at a time; connecting a new account replaces the previous one.
  2. Read-only access — The integration requests only drive.readonly scopes; it never modifies files on Google Drive.
  3. Token refresh — Access tokens are refreshed automatically before API calls when expired; refresh failures set syncError and mark the connection for re-auth.
  4. Google Drive wins — During sync, if a file’s googleModifiedTime or googleMd5Checksum differs from the stored values, the local copy is replaced.
  5. Subfolder opt-in — Subfolders are only traversed when syncSubfolders is explicitly enabled on the configuration.
  6. File type filtering — When fileTypeFilter is set, only files matching the include list (or not in the exclude list) are synced.
  7. Sync isolation — Each configuration syncs independently; a failure in one does not block others.
  8. History retention — Sync history records are retained indefinitely for audit; the details JSONB field stores per-file outcomes.
  9. Soft disconnect — Disconnecting sets isActive = false and revokes the OAuth token but preserves previously synced files in the Data Room.
  10. Google Docs export — Google-native formats (Docs, Sheets, Slides) are exported to their Microsoft Office equivalents (docx, xlsx, pptx) during sync, because native files have no binary content to download directly and must be converted through the Drive API export endpoint.
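
Rule 10 can be sketched as a lookup from Google-native MIME types to Office export targets. The function and type names are hypothetical. The MIME types and the two Drive API v3 URL forms (`files/{id}/export?mimeType=...` for native formats, `files/{id}?alt=media` for binary files) are standard; authentication headers are out of scope here.

```typescript
interface ExportTarget {
  mimeType: string; // Office MIME type to request from the export endpoint
  extension: string; // file extension to use in the Data Room
}

// Google-native formats and their Office equivalents.
const EXPORT_TARGETS: Record<string, ExportTarget> = {
  "application/vnd.google-apps.document": {
    mimeType: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    extension: "docx",
  },
  "application/vnd.google-apps.spreadsheet": {
    mimeType: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    extension: "xlsx",
  },
  "application/vnd.google-apps.presentation": {
    mimeType: "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    extension: "pptx",
  },
};

const DRIVE_FILES_BASE = "https://www.googleapis.com/drive/v3/files";

// Chooses between export (native formats) and direct download (everything else).
function resolveDownload(
  fileId: string,
  googleMimeType: string,
): { url: string; extension: string | null } {
  const target: ExportTarget | undefined = EXPORT_TARGETS[googleMimeType];
  const id = encodeURIComponent(fileId);
  if (target === undefined) {
    // Binary file: keep the original name and extension.
    return { url: `${DRIVE_FILES_BASE}/${id}?alt=media`, extension: null };
  }
  return {
    url: `${DRIVE_FILES_BASE}/${id}/export?mimeType=${encodeURIComponent(target.mimeType)}`,
    extension: target.extension,
  };
}
```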

7. Acceptance Criteria

  • Admin can connect a Google account via OAuth popup and see the connected email
  • Admin can disconnect the Google account; synced files remain in the Data Room
  • Admin can browse Google Drive folders and select one for sync
  • Admin can map a Google Drive folder to a specific Data Room path
  • Admin can enable/disable subfolder traversal per mapping
  • Admin can set file type filters (include or exclude by MIME type)
  • Manual sync detects new, updated, and deleted files correctly
  • Sync history shows accurate counts for processed/added/updated/skipped/failed files
  • Failed syncs display error messages with retry option
  • Token refresh happens transparently; expired tokens do not block sync
  • Google Docs are exported as Office-format files
  • Page is not visible when GOOGLE_DRIVE_ENABLED is false
  • Non-admin users cannot connect/disconnect or modify sync configurations

8. Risks

| Risk | Impact | Mitigation |
| --- | --- | --- |
| OAuth token stored in database | Token theft enables Google Drive read access | Encrypt tokens at rest; restrict DB access; rotate on disconnect |
| Google API rate limits (Drive API default: 12,000 queries per 60 seconds) | Sync fails mid-run for large folders | Implement exponential backoff; batch requests; track quota usage |
| Large file sync (>100 MB) | Timeout or memory pressure during download/upload to S3 | Stream files through the server without buffering; set per-file timeout |
| Google Docs lack MD5 checksums | Cannot detect content changes for native Google formats | Fall back to modifiedTime comparison for Docs/Sheets/Slides |
| Subfolder traversal on deeply nested structures | Excessive API calls and slow sync | Cap recursion depth; paginate folder listing; show progress |
| Stale refresh tokens (Google revokes after 6 months of inactivity) | Silent sync failure | Proactive health check on connection status; notify admin to re-auth |
| Concurrent sync triggers (manual + scheduled) | Duplicate file processing, race conditions | Lock sync per configuration; skip if already running |
| Feature flag misconfiguration | Users see broken integration page | Check GOOGLE_DRIVE_ENABLED on both frontend route guard and API middleware |
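
For the rate-limit mitigation, a truncated exponential backoff with a small random jitter is one common approach. The sketch below computes only the delay and leaves the retry loop to the caller. The function names, base delay, cap, and jitter window are assumptions, and `random` is injectable so the behavior can be tested deterministically.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt plus jitter, capped.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000, // assumed base delay
  capMs: number = 64000, // assumed maximum delay
  jitterMs: number = 1000, // assumed jitter window
  random: () => number = Math.random,
): number {
  const exponential = baseMs * Math.pow(2, attempt);
  const jitter = Math.floor(random() * jitterMs);
  return Math.min(capMs, exponential + jitter);
}

// Status codes that are usually worth retrying. Drive can also report rate
// limiting as a 403 with a rate-limit reason in the error body, which would
// need inspecting the body and is not covered here.
function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 504);
}
```

With jitter disabled, the delays for attempts 0 through 3 are 1000, 2000, 4000, and 8000 ms, and later attempts level off at the cap.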