Data Sources

Connect your existing document storage to DataVault

DataVault can sync documents from your existing storage systems. Configure one or more data sources to start processing your documents.

The app_config.yaml configurations on this page are relevant when you operate your own DataVault runtime (on-prem or customer-managed). In the default managed setup, most users configure sources in meinGPT and do not edit app_config.yaml directly.

Most users should configure sources in meinGPT first:

  1. Open Data Pools / Data Sources in meinGPT
  2. Add your source in the UI
  3. Start a sync and verify that the content is indexed

Use the per-source app_config.yaml examples below only when you run your own on-prem DataVault runtime.

Supported Data Sources

Basic Configuration

config/app_config.yaml
data_pools:
  # Local directory source
  - id: local
    type: local
    base_path: ./data

  # S3 bucket source; credentials come from environment variables
  - id: my-s3
    type: s3
    access_key_id: $AWS_ACCESS_KEY_ID
    secret_access_key: $AWS_SECRET_ACCESS_KEY
    endpoint: https://s3.amazonaws.com
    bucket_name: my-bucket

Security: Always supply credentials through environment variables. Never hardcode them in configuration files.
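To illustrate what referencing credentials by environment variable means in practice, here is a minimal Python sketch of placeholder substitution. It is not DataVault's actual loader: the `$NAME` / `${NAME}` syntax is inferred from the example above, and `resolve_value` and `resolve_pool` are hypothetical helper names used only for illustration.

```python
import os
import re

# Matches $NAME or ${NAME}. This placeholder syntax is an assumption based on
# the example config, not a documented DataVault grammar.
_PLACEHOLDER = re.compile(r"\$(\w+)|\$\{(\w+)\}")


def resolve_value(value, env=None):
    """Replace each placeholder with its value from env; raise KeyError if unset."""
    env = os.environ if env is None else env

    def lookup(match):
        name = match.group(1) or match.group(2)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(lookup, value)


def resolve_pool(pool, env=None):
    """Resolve placeholders in every string field of one data pool entry."""
    return {
        key: resolve_value(val, env) if isinstance(val, str) else val
        for key, val in pool.items()
    }
```

Failing on an unset variable, rather than passing the literal `$AWS_ACCESS_KEY_ID` string through, makes a missing secret visible at startup instead of at the first sync.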

Synchronization (How It Works)

For all source types, synchronization follows the same high-level pattern:

  1. The data pool configuration is resolved (from meinGPT, from the local data_pools section in app_config.yaml, or both)
  2. The vault connector fetches files and content into local sync storage
  3. The content is parsed, chunked, embedded, and indexed
  4. Later sync runs incrementally update only the content that has changed
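The incremental behaviour in the steps above can be pictured with a small Python sketch. This is a conceptual illustration under stated assumptions, not DataVault's implementation: change detection by SHA-256 content hash is one common approach and is assumed here, the embedding step is left out, and `chunk_text`, `sync`, `state`, and `index` are hypothetical names.

```python
import hashlib
from pathlib import Path


def chunk_text(text, size=200):
    """Split text into fixed-size chunks (a stand-in for real chunking)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def sync(source_dir, state, index):
    """Re-index only files whose content hash changed since the last run.

    state maps relative path -> last seen SHA-256 digest.
    index maps relative path -> list of text chunks (embedding is omitted).
    Returns the relative paths that were (re)indexed in this run.
    """
    source_dir = Path(source_dir)
    changed = []
    for path in sorted(source_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(source_dir).as_posix()
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if state.get(rel) == digest:
            continue  # unchanged since the previous sync run
        text = path.read_text(encoding="utf-8", errors="replace")
        index[rel] = chunk_text(text)
        state[rel] = digest
        changed.append(rel)
    return changed
```

Calling `sync` twice on an unchanged directory indexes every file on the first run and nothing on the second. The sketch does not handle files deleted at the source.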
