
Data Sources

Connect your existing document storage to DataVault

DataVault can sync documents from your existing storage systems. Configure one or more data sources to start processing your documents.

The configurations on this page apply when you operate your own DataVault runtime (on-prem/customer-managed). In the default managed setup, most users configure sources in meinGPT and do not edit app_config.yaml directly.

UI-First Setup (Recommended)

For most users, the first step is to configure sources directly in meinGPT:

  1. Open Data Pools / Data Sources in meinGPT
  2. Add the source in the UI
  3. Start the sync and check the indexed content

You only need the per-source app_config.yaml examples if you run your own on-prem DataVault.

Supported Data Sources

Basic Configuration

config/app_config.yaml
data_pools:
  - id: local
    type: local
    base_path: ./data
    
  - id: my-s3
    type: s3
    access_key_id: $AWS_ACCESS_KEY_ID
    secret_access_key: $AWS_SECRET_ACCESS_KEY
    endpoint: https://s3.amazonaws.com
    bucket_name: my-bucket

Security: Always supply credentials through environment variables. Never hardcode them in configuration files.
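This page does not describe how DataVault resolves placeholders such as $AWS_ACCESS_KEY_ID. As a minimal sketch, assuming plain $NAME substitution from the process environment, the idea could look like the following. The helper resolve_placeholders and the example values are hypothetical and not part of DataVault.

```python
import os
import re
from typing import Mapping, Optional

# Hypothetical helper, not part of DataVault: illustrates $NAME substitution.
# Only values that consist entirely of "$NAME" are treated as placeholders.
_PLACEHOLDER = re.compile(r"^\$([A-Za-z_][A-Za-z0-9_]*)$")


def resolve_placeholders(pool: dict, env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of a data pool entry with $NAME values replaced from env.

    Raises KeyError if a referenced variable is not set, so a missing
    credential fails loudly instead of being passed on as a literal "$NAME".
    """
    env = os.environ if env is None else env
    resolved = {}
    for key, value in pool.items():
        match = _PLACEHOLDER.match(value) if isinstance(value, str) else None
        if match:
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            resolved[key] = env[name]
        else:
            resolved[key] = value
    return resolved


# Mirrors the my-s3 entry from the configuration above.
pool = {
    "id": "my-s3",
    "type": "s3",
    "access_key_id": "$AWS_ACCESS_KEY_ID",
    "secret_access_key": "$AWS_SECRET_ACCESS_KEY",
    "endpoint": "https://s3.amazonaws.com",
    "bucket_name": "my-bucket",
}

# Example values only; real keys would come from the runtime environment.
example_env = {
    "AWS_ACCESS_KEY_ID": "AKIAEXAMPLE",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
}
resolved_pool = resolve_placeholders(pool, example_env)
```

Failing on an unset variable is a deliberate choice in this sketch: a connector that silently receives the literal string "$AWS_ACCESS_KEY_ID" tends to produce a confusing authentication error much later.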

Synchronization (How It Works)

For all source types, synchronization follows the same high-level pattern:

  1. Data pool configuration is resolved (from meinGPT and/or local data_pools in app_config.yaml)
  2. Vault connector fetches files/content into local sync storage
  3. Content is parsed, chunked, embedded, and indexed
  4. Later sync runs update changed content incrementally
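
The steps above can be sketched as a loop that re-indexes only content that changed since the previous run. How DataVault actually detects changes is not stated here; comparing content hashes is an assumption made for illustration, embedding is left out, and every name in the block is hypothetical.

```python
import hashlib

# All names below are hypothetical; this is not DataVault's actual code.


def chunk(text: str, size: int = 20) -> list[str]:
    """Split text into fixed-size character chunks (real chunkers are smarter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def sync(source: dict[str, str], seen: dict[str, str], index: dict[str, list[str]]) -> list[str]:
    """Run one sync pass and return the paths that were (re)indexed.

    source: path -> content, as fetched by a connector (step 2)
    seen:   path -> content hash recorded by the previous run
    index:  path -> chunks; stands in for the embedded index (step 3)
    """
    changed = []
    for path, content in source.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if seen.get(path) == digest:
            continue  # unchanged since the last run, so skip it (step 4)
        index[path] = chunk(content)
        seen[path] = digest
        changed.append(path)
    return changed


seen, index = {}, {}
files = {"a.txt": "first document", "b.txt": "second document"}
first_run = sync(files, seen, index)    # both files are new
files["b.txt"] = "second document, edited"
second_run = sync(files, seen, index)   # only b.txt differs from the last run
```

In this sketch the first pass indexes both files and the second pass touches only b.txt, which is the incremental behavior step 4 describes.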
