Complete Configuration Reference
Complete reference for all DataVault enterprise configuration options
This reference covers every option available in the main configuration file of a meinGPT DataVault enterprise deployment: an annotated example file first, followed by per-section tables of fields, types, defaults, and requirements.
Configuration Structure
Vault Settings
Core vault credentials, processing, and system configuration
Weaviate Database
Vector database connection and configuration
Embedding Models
OpenAI, Azure, Nebius, and HuggingFace model configurations
Data Sources
S3, OneDrive, Google Drive, Confluence, SMB, WebDAV, and local file configurations
Configuration File Structure
# Version of the config file format
version: 1.0
# Base URL for meinGPT service
meingpt_url: $MEINGPT_URL
# ================================
# VAULT CORE SETTINGS
# ================================
vault:
  # Required: Your vault credentials from meinGPT dashboard
  id: your-vault-id
  secret: $VAULT_SECRET
  # Standalone mode - if true, vault won't connect to meinGPT server
  standalone_mode: false
  # Data storage directory (acts as sync target for rclone)
  data_dir: ./tmp
  # Ingestion settings
  ingestion_interval: 900 # Interval in seconds between ingestion runs (0 to disable)
  tasks_batch_size: 10 # Tasks added to event loop at once from every datapool
  chunk_size: 256 # Size of each text chunk in tokens
  chunk_overlap: 26 # Overlapping tokens between consecutive chunks
# ================================
# VECTOR DATABASE CONFIGURATION
# ================================
weaviate:
  # Connection settings
  connection_type: local # "local" or "custom"
  host: localhost # Docker service name or IP
  port: 8001 # Weaviate port
  grpc_host: localhost # gRPC host for Weaviate
  grpc_port: 50051 # gRPC port for Weaviate
  # Authentication (empty string for local)
  api_key: ""
# ================================
# EMBEDDING MODEL CONFIGURATION
# ================================
embedding_model:
  # Required base settings for all providers
  rpm: 3000 # Requests per minute
  tpm: 1000000 # Tokens per minute
  # Optional prompts for specialized embedding
  query_prompt: null # Prompt prepended to queries
  document_prompt: null # Prompt prepended to documents
  # === AZURE OPENAI ===
  provider: "azure"
  api_key: $AZURE_API_KEY
  api_version: "2023-05-15"
  model: text-embedding-3-small
  endpoint: https://your-endpoint.openai.azure.com/
  embedding_dimensions: 512
  # === OR OPENAI ===
  # provider: "openai"
  # model: "text-embedding-ada-002" # Default model
  # base_url: null # Optional custom URL
  # api_key: $OPENAI_API_KEY
  # === OR NEBIUS ===
  # provider: "nebius"
  # tokenizer: "BAAI/bge-multilingual-gemma2"
  # model: "bge-multilingual-gemma2"
  # base_url: "https://api.studio.nebius.ai/v1/"
  # api_key: $NEBIUS_API_KEY
  # === OR HUGGINGFACE LOCAL ===
  # provider: "huggingface_local"
  # model: "sentence-transformers/all-mpnet-base-v2"
  # model_kwargs: {} # Additional model parameters
  # encode_kwargs: {} # Additional encoding parameters
# ================================
# LOGGING CONFIGURATION
# ================================
logging:
  log_level: "INFO" # DEBUG, INFO, WARNING, ERROR, CRITICAL
  log_to_file: true
  log_file_path: "logs/app.log"
  uvicorn_log_file_path: "logs/uvicorn.log"
  # Sentry error tracking
  sentry_dsn: "" # Sentry DSN for error tracking
  sentry_event_level: "WARNING"
  sentry_tags: {} # Additional tags for Sentry events
  # Heartbeat monitoring
  heartbeat_url: null # URL for uptime monitoring
  heartbeat_interval_minutes: 1
  # System monitoring intervals (0 to disable)
  system_monitoring_interval: 0 # System usage monitoring
  storage_monitoring_interval: 0 # Storage monitoring
  database_monitoring_interval: 0 # Database monitoring
# ================================
# API RATE LIMITING
# ================================
search_requests_per_minute: 30 # Search requests per minute limit
search_results_limit: 20 # Maximum search results returned
# ================================
# DATA SOURCES CONFIGURATION
# ================================
data_pools:
  # === LOCAL FILESYSTEM ===
  - id: local
    type: local
    base_path: ./data # Directory used as synchronization source
  # === AMAZON S3 ===
  - id: s3-documents
    type: s3
    access_key_id: $AWS_ACCESS_KEY_ID
    secret_access_key: $AWS_SECRET_ACCESS_KEY
    endpoint: $S3_ENDPOINT
    bucket_name: your-bucket-name
    provider: "Other" # AWS, MinIO, DigitalOcean, Other
    base_path: "documents/" # Optional folder prefix
  # === GOOGLE DRIVE ===
  - id: google-drive
    type: drive
    refresh_token: $GOOGLE_REFRESH_TOKEN
    scope: "drive.readonly" # drive, drive.readonly, drive.file, drive.appfolder, drive.metadata.readonly
    root_folder_id: null # Optional specific folder
    team_drive: null # For shared drives
    client_id: null # Optional custom client
    client_secret: null
    base_path: "/"
  # === MICROSOFT ONEDRIVE ===
  - id: onedrive
    type: onedrive
    client_id: "4306c62e-d96d-41a0-9f59-f577e3707aba" # Default client ID
    client_secret: null # Optional custom client secret
    refresh_token: $ONEDRIVE_REFRESH_TOKEN
    drive_id: $ONEDRIVE_DRIVE_ID
    drive_type: "personal" # personal, business, documentLibrary
    tenant_id: null # Optional custom tenant
    base_path: "/"
  # === CONFLUENCE ===
  - id: confluence
    type: confluence
    url: "https://company.atlassian.net"
    username: $CONFLUENCE_USERNAME
    token: $CONFLUENCE_TOKEN
    space_id: $CONFLUENCE_SPACE_ID
    base_path: null # Optional
  # === SMB/CIFS NETWORK SHARE ===
  - id: smb-share
    type: smb
    host: "server.company.com"
    user: $SMB_USERNAME
    password: $SMB_PASSWORD
    port: null # Optional port (default 445)
    domain: null # Optional domain
    spn: null # Optional SPN
    base_path: "/shared"
  # === WEBDAV ===
  - id: webdav
    type: webdav
    url: "https://webdav.company.com"
    vendor: "nextcloud" # fastmail, nextcloud, owncloud, sharepoint, sharepoint-ntlm, rclone, other
    user: $WEBDAV_USERNAME
    password: $WEBDAV_PASSWORD
    bearer_token: null # Alternative to username/password
    base_path: "/"
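The example file relies on `$NAME` placeholders such as `$VAULT_SECRET` and `$AWS_ACCESS_KEY_ID`. This reference does not state how DataVault substitutes them; assuming they are read from environment variables, a pre-flight check along the following lines could list the ones that are still unset before the service starts. The function name `missing_env_vars` and the uppercase-only placeholder pattern are illustrative assumptions, not part of DataVault.

```python
import os
import re

# Assumption: placeholders look like $UPPER_CASE_NAME, as in the example file.
PLACEHOLDER = re.compile(r"\$([A-Z_][A-Z0-9_]*)")


def missing_env_vars(config_text, env=None):
    """Return the sorted placeholder names that are unset or empty in env."""
    env = os.environ if env is None else env
    names = set(PLACEHOLDER.findall(config_text))
    return sorted(name for name in names if not env.get(name))


if __name__ == "__main__":
    sample = "meingpt_url: $MEINGPT_URL\nvault:\n  secret: $VAULT_SECRET\n"
    # Hypothetical environment in which only MEINGPT_URL is set.
    print(missing_env_vars(sample, env={"MEINGPT_URL": "https://example.invalid"}))
```

Passing the environment as a parameter keeps the check easy to run against a deployment's `.env` contents without touching the real process environment.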
Configuration Options Reference
Vault Settings
Field | Type | Default | Required | Description |
---|---|---|---|---|
id | string | - | ✅ | Unique identifier for the Vault instance |
secret | string | - | ✅ | Secret key for authentication |
standalone_mode | boolean | false | ❌ | If true, vault won't connect to meinGPT server |
data_dir | string | "./tmp" | ❌ | Directory for temporary data and rclone sync |
ingestion_interval | integer | 900 | ❌ | Seconds between ingestion runs (0 to disable) |
tasks_batch_size | integer | 10 | ❌ | Tasks added to event loop at once per datapool |
chunk_size | integer | 256 | ❌ | Text chunk size in tokens |
chunk_overlap | integer | 26 | ❌ | Overlapping tokens between chunks |
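To make the interaction between `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a sliding-window split over token positions. It is not DataVault's actual chunker, whose tokenizer and boundary handling are not described here; it only assumes that each chunk starts `chunk_size - chunk_overlap` tokens after the previous one. The function name `chunk_spans` is hypothetical.

```python
def chunk_spans(n_tokens, chunk_size=256, chunk_overlap=26):
    """Return (start, end) token index pairs for overlapping chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    stride = chunk_size - chunk_overlap  # 230 tokens with the defaults
    spans = []
    start = 0
    while start < n_tokens:
        end = min(start + chunk_size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break  # the last chunk reached the end of the document
        start += stride
    return spans


if __name__ == "__main__":
    # A 600-token document with the default settings yields three chunks.
    print(chunk_spans(600))  # [(0, 256), (230, 486), (460, 600)]
```

Under this assumption the default overlap of 26 tokens is roughly 10% of the chunk size, so consecutive chunks share about one tenth of their content.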
Weaviate Settings
Field | Type | Default | Required | Description |
---|---|---|---|---|
connection_type | string | "local" | ❌ | Connection type: "local" or "custom" |
host | string | "weaviate" | ❌ | HTTP host for Weaviate instance |
port | integer | 8001 | ❌ | HTTP port for Weaviate instance |
grpc_host | string | "weaviate" | ❌ | gRPC host for Weaviate instance |
grpc_port | integer | 50051 | ❌ | gRPC port for Weaviate instance |
api_key | string | "" | ❌ | API key for authentication (empty for local) |
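When troubleshooting connectivity, it can help to confirm that both the HTTP and the gRPC endpoint accept TCP connections. The sketch below merges a partial `weaviate` section with the defaults from the table and offers a plain socket probe. It uses only the standard library and does not speak the Weaviate protocol; the helper names are hypothetical.

```python
import socket

# Defaults copied from the Weaviate settings table.
WEAVIATE_DEFAULTS = {
    "connection_type": "local",
    "host": "weaviate",
    "port": 8001,
    "grpc_host": "weaviate",
    "grpc_port": 50051,
    "api_key": "",
}


def weaviate_endpoints(settings):
    """Return the (host, port) pairs for HTTP and gRPC after applying defaults."""
    merged = {**WEAVIATE_DEFAULTS, **settings}
    return {
        "http": (merged["host"], merged["port"]),
        "grpc": (merged["grpc_host"], merged["grpc_port"]),
    }


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    endpoints = weaviate_endpoints({"host": "localhost", "grpc_host": "localhost"})
    for name, (host, port) in endpoints.items():
        print(name, host, port, "reachable" if is_reachable(host, port) else "unreachable")
```

A successful probe only shows that something is listening on the port; it does not validate the `api_key`.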
Embedding Model Settings
Common Settings (All Providers)
Field | Type | Default | Required | Description |
---|---|---|---|---|
provider | string | - | ✅ | Provider: "azure", "openai", "nebius", "huggingface_local" |
rpm | integer | 3000 | ❌ | Requests per minute |
tpm | integer | 1000000 | ❌ | Tokens per minute |
query_prompt | string | null | ❌ | Prompt prepended to queries |
document_prompt | string | null | ❌ | Prompt prepended to documents |
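As a rough way to reason about `rpm` and `tpm` together, the following back-of-the-envelope sketch estimates an upper bound on embedding throughput. It assumes one chunk per embedding request, treats both limits as hard per-minute caps, and ignores any tokens added by `query_prompt` or `document_prompt`; how DataVault actually batches requests is not covered by this reference. Both function names are hypothetical.

```python
import math


def max_chunks_per_minute(rpm=3000, tpm=1_000_000, tokens_per_chunk=256):
    """Upper bound on chunks embedded per minute under both limits."""
    if tokens_per_chunk <= 0:
        raise ValueError("tokens_per_chunk must be positive")
    # Whichever limit is hit first caps the throughput.
    return min(rpm, tpm // tokens_per_chunk)


def estimated_minutes(n_chunks, **limits):
    """Minimum whole minutes needed to embed n_chunks at the bound above."""
    rate = max_chunks_per_minute(**limits)
    if rate == 0:
        raise ValueError("limits allow no requests")
    return math.ceil(n_chunks / rate)


if __name__ == "__main__":
    print(max_chunks_per_minute())    # 3000: the request limit binds first
    print(estimated_minutes(10_000))  # 4
```

With the default values the request limit is the bottleneck. Raising `chunk_size` to 512 would make the token limit bind instead, at 1953 chunks per minute under the same assumptions.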
Azure OpenAI (provider: "azure")
Field | Type | Default | Required | Description |
---|---|---|---|---|
api_key | string | - | ✅ | Azure API key |
api_version | string | "2023-05-15" | ❌ | API version |
model | string | "text-embedding-3-small" | ❌ | Model name |
endpoint | string | "https://meingpt-canada.openai.azure.com/" | ❌ | Azure endpoint URL |
embedding_dimensions | integer | 512 | ❌ | Number of dimensions |
OpenAI (provider: "openai")
Field | Type | Default | Required | Description |
---|---|---|---|---|
model | string | "text-embedding-ada-002" | ❌ | Model name |
base_url | string | null | ❌ | Optional custom URL |
api_key | string | - | ✅ | OpenAI API key |
Nebius (provider: "nebius")
Field | Type | Default | Required | Description |
---|---|---|---|---|
tokenizer | string | "BAAI/bge-multilingual-gemma2" | ❌ | Tokenizer name |
model | string | "bge-multilingual-gemma2" | ❌ | Model name |
base_url | string | "https://api.studio.nebius.ai/v1/" | ❌ | Nebius API URL |
api_key | string | - | ✅ | Nebius API key |
HuggingFace Local (provider: "huggingface_local")
Field | Type | Default | Required | Description |
---|---|---|---|---|
model | string | "sentence-transformers/all-mpnet-base-v2" | ❌ | Model path or name |
model_kwargs | object | {} | ❌ | Model initialization parameters |
encode_kwargs | object | {} | ❌ | Encoding parameters |
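The provider tables can be condensed into a small check that flags a missing `provider` or a missing required field before deployment. This is an illustrative sketch built only from the "Required" columns above; the names `REQUIRED_BY_PROVIDER` and `embedding_config_errors` are hypothetical, and DataVault's own validation may be stricter.

```python
# Required provider-specific fields, taken from the tables above.
REQUIRED_BY_PROVIDER = {
    "azure": ["api_key"],
    "openai": ["api_key"],
    "nebius": ["api_key"],
    "huggingface_local": [],
}


def embedding_config_errors(cfg):
    """Return a list of human-readable problems in an embedding_model section."""
    provider = cfg.get("provider")
    if provider not in REQUIRED_BY_PROVIDER:
        return [f"unknown or missing provider: {provider!r}"]
    return [
        f"{provider}: missing required field '{field}'"
        for field in REQUIRED_BY_PROVIDER[provider]
        if not cfg.get(field)
    ]


if __name__ == "__main__":
    print(embedding_config_errors({"provider": "huggingface_local"}))  # []
    print(embedding_config_errors({"provider": "azure"}))
```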
Logging Settings
Field | Type | Default | Required | Description |
---|---|---|---|---|
log_level | string | "INFO" | ❌ | DEBUG, INFO, WARNING, ERROR, CRITICAL |
log_to_file | boolean | true | ❌ | Write logs to file |
log_file_path | string | "logs/app.log" | ❌ | Main application log file |
uvicorn_log_file_path | string | "logs/uvicorn.log" | ❌ | Uvicorn server logs |
sentry_dsn | string | "" | ❌ | Sentry DSN for error tracking |
sentry_event_level | string | "WARNING" | ❌ | Level for Sentry events |
sentry_tags | object | {} | ❌ | Additional tags for Sentry events |
heartbeat_url | string | null | ❌ | URL for uptime monitoring |
heartbeat_interval_minutes | integer | 1 | ❌ | Heartbeat interval |
system_monitoring_interval | integer | 0 | ❌ | System usage monitoring (0 to disable) |
storage_monitoring_interval | integer | 0 | ❌ | Storage monitoring (0 to disable) |
database_monitoring_interval | integer | 0 | ❌ | Database monitoring (0 to disable) |
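Because a value of 0 disables each monitor, it is easy to lose track of which ones are active in a given file. The sketch below reports the enabled monitors and checks `log_level` against the five values listed in the table. The helper names are hypothetical, and the unit of the monitoring intervals is not stated in this reference, so the values are passed through unchanged.

```python
LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")
MONITOR_KEYS = (
    "system_monitoring_interval",
    "storage_monitoring_interval",
    "database_monitoring_interval",
)


def enabled_monitors(logging_cfg):
    """Map each enabled monitor key to its interval; 0 or missing means disabled."""
    return {key: logging_cfg[key] for key in MONITOR_KEYS if logging_cfg.get(key, 0) > 0}


def is_valid_log_level(logging_cfg):
    """Check log_level against the documented values, defaulting to INFO."""
    return logging_cfg.get("log_level", "INFO") in LOG_LEVELS


if __name__ == "__main__":
    cfg = {"log_level": "INFO", "system_monitoring_interval": 60}
    print(enabled_monitors(cfg))    # {'system_monitoring_interval': 60}
    print(is_valid_log_level(cfg))  # True
```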
Data Pool Types
Common Settings (All Data Pools)
Field | Type | Required | Description |
---|---|---|---|
id | string | ✅ | Unique identifier for the data pool |
type | string | ✅ | Data pool type |
base_path | string | ❌ | Optional path within the data source |
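Local (type: "local")
No additional fields beyond the common settings are required. For this type, base_path is the directory used as the synchronization source.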
S3 (type: "s3")
Field | Type | Required | Description |
---|---|---|---|
access_key_id | string | ✅ | AWS access key |
secret_access_key | string | ✅ | AWS secret key |
endpoint | string | ✅ | S3 endpoint URL |
bucket_name | string | ✅ | S3 bucket name |
provider | string | ❌ | Provider type ("AWS", "MinIO", "DigitalOcean", "Other") |
Google Drive (type: "drive")
Field | Type | Required | Description |
---|---|---|---|
refresh_token | string | ✅ | OAuth refresh token |
scope | string | ❌ | Access scope (default: "drive.readonly") |
root_folder_id | string | ❌ | Optional specific folder ID |
team_drive | string | ❌ | Shared drive ID |
client_id | string | ❌ | Optional custom client ID |
client_secret | string | ❌ | Optional custom client secret |
OneDrive (type: "onedrive")
Field | Type | Required | Description |
---|---|---|---|
refresh_token | string | ✅ | OAuth refresh token |
drive_id | string | ✅ | OneDrive ID |
drive_type | string | ✅ | Drive type ("personal", "business", "documentLibrary") |
client_id | string | ❌ | Application client ID (has default) |
client_secret | string | ❌ | Application client secret |
tenant_id | string | ❌ | Optional custom tenant |
Confluence (type: "confluence")
Field | Type | Required | Description |
---|---|---|---|
url | string | ✅ | Confluence base URL |
username | string | ✅ | Username for authentication |
token | string | ✅ | API token |
space_id | string | ✅ | Confluence space ID |
SMB (type: "smb")
Field | Type | Required | Description |
---|---|---|---|
host | string | ✅ | SMB server hostname or IP |
user | string | ✅ | Username for authentication |
password | string | ✅ | Password for authentication |
port | integer | ❌ | Optional port (default 445) |
domain | string | ❌ | Optional domain |
spn | string | ❌ | Optional SPN |
WebDAV (type: "webdav")
Field | Type | Required | Description |
---|---|---|---|
url | string | ✅ | WebDAV server URL |
vendor | string | ✅ | Vendor ("fastmail", "nextcloud", "owncloud", "sharepoint", "sharepoint-ntlm", "rclone", "other") |
user | string | ❌ | Username for authentication |
password | string | ❌ | Password for authentication |
bearer_token | string | ❌ | Bearer token (alternative to user/password) |
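The "Required" columns of the data pool tables can be turned into a quick consistency check for a `data_pools` list, for example to catch a missing `bucket_name` or a duplicated `id` before the first ingestion run. This is a sketch derived only from the tables above. The names `REQUIRED_FIELDS` and `data_pool_errors` are hypothetical, and the uniqueness check reads the "unique identifier" description as meaning that no two pools may share an `id`.

```python
# Required type-specific fields, taken from the data pool tables above.
REQUIRED_FIELDS = {
    "local": [],
    "s3": ["access_key_id", "secret_access_key", "endpoint", "bucket_name"],
    "drive": ["refresh_token"],
    "onedrive": ["refresh_token", "drive_id", "drive_type"],
    "confluence": ["url", "username", "token", "space_id"],
    "smb": ["host", "user", "password"],
    "webdav": ["url", "vendor"],
}


def data_pool_errors(pools):
    """Return a list of problems found in a data_pools list of dicts."""
    errors = []
    seen_ids = set()
    for index, pool in enumerate(pools):
        pool_id = pool.get("id")
        label = pool_id or f"#{index}"
        if not pool_id:
            errors.append(f"pool {label}: missing 'id'")
        elif pool_id in seen_ids:
            errors.append(f"pool {label}: duplicate id")
        else:
            seen_ids.add(pool_id)
        pool_type = pool.get("type")
        if pool_type not in REQUIRED_FIELDS:
            errors.append(f"pool {label}: unknown or missing type {pool_type!r}")
            continue
        for field in REQUIRED_FIELDS[pool_type]:
            if pool.get(field) in (None, ""):
                errors.append(f"pool {label}: {pool_type} requires '{field}'")
    return errors


if __name__ == "__main__":
    pools = [
        {"id": "local", "type": "local", "base_path": "./data"},
        {"id": "s3-documents", "type": "s3", "access_key_id": "k",
         "secret_access_key": "s", "endpoint": "https://s3.example.invalid"},
    ]
    for problem in data_pool_errors(pools):
        print(problem)
```

Run against the parsed configuration, an empty result means every pool carries the fields marked as required; it says nothing about whether the credentials themselves are valid.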