Complete reference for all DataVault enterprise configuration options
This page is the complete configuration reference for meinGPT DataVault enterprise deployments, covering every option available in the main configuration file.
Advanced/on-prem only: this reference is for teams operating their own DataVault runtime.
If you use the default managed setup in meinGPT, you usually do not need to edit these files.
```yaml
# Version of the config file format
version: 1.0

# Base URL for meinGPT service
meingpt_url: $MEINGPT_URL

# ================================
# VAULT CORE SETTINGS
# ================================
vault:
  # Required: Your vault credentials from meinGPT dashboard
  id: your-vault-id
  secret: $VAULT_SECRET

  # Standalone mode - if true, vault won't connect to meinGPT server
  standalone_mode: false

  # Data storage directory (acts as sync target for rclone)
  data_dir: ./tmp

  # Ingestion settings
  ingestion_interval: 900  # Interval in seconds between ingestion runs (0 for disabled)
  tasks_batch_size: 10     # Tasks added to event loop at once from every datapool
  chunk_size: 256          # Size of each text chunk in tokens
  chunk_overlap: 26        # Overlapping tokens between consecutive chunks

# ================================
# VECTOR DATABASE CONFIGURATION
# ================================
weaviate:
  # Connection settings
  connection_type: local   # "local" or "custom"
  host: localhost          # Docker service name or IP
  port: 8001               # Weaviate port
  grpc_host: localhost     # gRPC host for Weaviate
  grpc_port: 50051         # gRPC port for Weaviate

  # Authentication (empty string for local)
  api_key: ""

# ================================
# EMBEDDING MODEL CONFIGURATION
# ================================
embedding_model:
  # Required base settings for all providers
  rpm: 3000                # Requests per minute
  tpm: 1000000             # Tokens per minute

  # Optional prompts for specialized embedding
  query_prompt: null       # Prompt prepended to queries
  document_prompt: null    # Prompt prepended to documents

  # === AZURE OPENAI ===
  provider: "azure"
  api_key: $AZURE_API_KEY
  api_version: "2023-05-15"
  model: text-embedding-3-small
  endpoint: https://your-endpoint.openai.azure.com/
  embedding_dimensions: 512

  # === OR OPENAI ===
  # provider: "openai"
  # model: "text-embedding-ada-002"  # Default model
  # base_url: null                   # Optional custom URL
  # api_key: $OPENAI_API_KEY

  # === OR NEBIUS ===
  # provider: "nebius"
  # tokenizer: "BAAI/bge-multilingual-gemma2"
  # model: "bge-multilingual-gemma2"
  # base_url: "https://api.studio.nebius.ai/v1/"
  # api_key: $NEBIUS_API_KEY

  # === OR HUGGINGFACE LOCAL ===
  # provider: "huggingface_local"
  # model: "sentence-transformers/all-mpnet-base-v2"
  # model_kwargs: {}                 # Additional model parameters
  # encode_kwargs: {}                # Additional encoding parameters

# ================================
# LOGGING CONFIGURATION
# ================================
logging:
  log_level: "INFO"        # DEBUG, INFO, WARNING, ERROR, CRITICAL
  log_to_file: true
  log_file_path: "logs/app.log"
  uvicorn_log_file_path: "logs/uvicorn.log"

  # Sentry error tracking
  sentry_dsn: ""           # Sentry DSN for error tracking
  sentry_event_level: "WARNING"
  sentry_tags: {}          # Additional tags for Sentry events

  # Heartbeat monitoring
  heartbeat_url: null      # URL for uptime monitoring
  heartbeat_interval_minutes: 1

  # System monitoring intervals (0 to disable)
  system_monitoring_interval: 0    # System usage monitoring
  storage_monitoring_interval: 0   # Storage monitoring
  database_monitoring_interval: 0  # Database monitoring

# ================================
# API RATE LIMITING
# ================================
search_requests_per_minute: 30  # Search requests per minute limit
search_results_limit: 20        # Maximum search results returned

# ================================
# DATA SOURCES CONFIGURATION
# ================================
data_pools:
  # === LOCAL FILESYSTEM ===
  - id: local
    type: local
    base_path: ./data      # Directory used as synchronization source

  # === AMAZON S3 ===
  - id: s3-documents
    type: s3
    access_key_id: $AWS_ACCESS_KEY_ID
    secret_access_key: $AWS_SECRET_ACCESS_KEY
    endpoint: $S3_ENDPOINT
    bucket_name: your-bucket-name
    provider: "Other"       # AWS, MinIO, DigitalOcean, Other
    base_path: "documents/" # Optional folder prefix

  # === GOOGLE DRIVE ===
  - id: google-drive
    type: drive
    refresh_token: $GOOGLE_REFRESH_TOKEN
    scope: "drive.readonly" # drive, drive.readonly, drive.file, drive.appfolder, drive.metadata.readonly
    root_folder_id: null    # Optional specific folder
    team_drive: null        # For shared drives
    client_id: null         # Optional custom client
    client_secret: null
    base_path: "/"

  # === MICROSOFT ONEDRIVE ===
  - id: onedrive
    type: onedrive
    client_id: "4306c62e-d96d-41a0-9f59-f577e3707aba"  # Default client ID
    client_secret: null     # Optional custom client secret
    refresh_token: $ONEDRIVE_REFRESH_TOKEN
    drive_id: $ONEDRIVE_DRIVE_ID
    drive_type: "personal"  # personal, business, documentLibrary
    tenant_id: null         # Optional custom tenant
    base_path: "/"

  # === CONFLUENCE ===
  - id: confluence
    type: confluence
    url: "https://company.atlassian.net"
    username: $CONFLUENCE_USERNAME
    token: $CONFLUENCE_TOKEN
    space_id: $CONFLUENCE_SPACE_ID
    base_path: null         # Optional

  # === SMB/CIFS NETWORK SHARE ===
  - id: smb-share
    type: smb
    host: "server.company.com"
    user: $SMB_USERNAME
    password: $SMB_PASSWORD
    port: null              # Optional port (default 445)
    domain: null            # Optional domain
    spn: null               # Optional SPN
    base_path: "/shared"

  # === WEBDAV ===
  - id: webdav
    type: webdav
    url: "https://webdav.company.com"
    vendor: "nextcloud"     # fastmail, nextcloud, owncloud, sharepoint, sharepoint-ntlm, rclone, other
    user: $WEBDAV_USERNAME
    password: $WEBDAV_PASSWORD
    bearer_token: null      # Alternative to username/password
    base_path: "/"
```
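The `chunk_size` and `chunk_overlap` settings in the vault section describe a sliding-window split of each document's token stream. A minimal sketch of that interaction (illustrative only; `chunk_tokens` is a hypothetical helper, not DataVault's actual implementation):

```python
def chunk_tokens(tokens, chunk_size=256, chunk_overlap=26):
    """Split a token sequence into overlapping chunks (sliding window)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # tokens advanced per chunk
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the final chunk already reaches the end of the document
    return chunks

# With the defaults above, consecutive chunks share 26 tokens:
parts = chunk_tokens(list(range(600)))
print(len(parts), parts[1][0])  # 3 230 (second chunk starts at token 230)
```

A larger overlap improves context continuity across chunk boundaries at the cost of more stored vectors per document.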
`data_pools` is optional in many setups: the vault can fetch pool definitions configured in meinGPT at runtime, so local entries are mainly needed for additional local or advanced scenarios.
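For example, a setup that relies entirely on remotely configured pools can use a much smaller file (an illustrative sketch assembled from the options above, not a tested template):

```yaml
version: 1.0
meingpt_url: $MEINGPT_URL

vault:
  id: your-vault-id
  secret: $VAULT_SECRET
  standalone_mode: false
  data_dir: ./tmp
# No local data_pools: pool definitions are fetched from meinGPT at runtime.
```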