Local Deployment

Simple local DataVault setup with an OpenAI-compatible embedding service

This page walks through the basic setup for a self-managed (on-premises) DataVault deployment on your own hardware. In the default meinGPT cloud setup, you usually do not need these steps.

After following this tutorial, you will have a working local instance of the DataVault, connected to meinGPT, running on your server. Make sure you have the required prerequisites in place before following this guide.

Overview

The local DataVault consists of these services managed via Docker Compose:

Service        Purpose
vault          API server: serves queries and file downloads
vault-worker   Ingestion pipeline: processes and indexes documents
database       PostgreSQL with VectorChord: stores both ingestion metadata and vector embeddings
ollama         Local embedding model server (OpenAI-compatible API)
piko           Tunnel to meinGPT Cloud: connects the vault without exposing ports publicly

Your final project directory will look like this:

datavault-local/
├── config/
│   ├── vault.env           # Credentials and IDs (from meinGPT settings section)
│   └── app_config.yaml     # Vault configuration
├── data/
│   ├── vault/              # Local documents for ingestion (optional)
│   ├── postgres/           # Postgres data (auto-populated)
│   └── ollama/             # Embedding model cache (auto-populated)
├── documents/              # Optional extra read-only mount
└── docker-compose.yaml     # Service definitions

data/vault/ is only needed if you want to ingest local files from disk. If all your sources are cloud-based (SharePoint, Google Drive, etc.), you can skip it.

Step 1: Create directory structure

mkdir datavault-local
cd datavault-local
mkdir -p config data/vault data/postgres data/ollama documents

Step 2: Create the environment file

Grab your Vault ID and Vault Secret from the meinGPT dashboard and fill them in below.

config/vault.env
VAULT_ID=your-vault-id
VAULT_SECRET=your-vault-secret
MEINGPT_URL=https://app.meingpt.com

POSTGRES_USER=datavault
POSTGRES_PASSWORD=your-postgres-password

OPENAI_BASE_URL=http://ollama:11434/v1
OPENAI_API_KEY=local-dev
OPENAI_EMBEDDING_MODEL=bge-m3
OPENAI_EMBEDDING_DIMENSIONS=1024
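
A forgotten or empty value in vault.env tends to surface only later, as a failing container. The following is a minimal sketch of a pre-flight check, assuming a POSIX shell; the function name check_env_file is hypothetical and not part of the DataVault tooling, and the key list simply mirrors the file above.

```shell
#!/bin/sh
# Keys this guide puts into config/vault.env.
required_keys="VAULT_ID VAULT_SECRET MEINGPT_URL POSTGRES_USER POSTGRES_PASSWORD OPENAI_BASE_URL OPENAI_API_KEY OPENAI_EMBEDDING_MODEL OPENAI_EMBEDDING_DIMENSIONS"

# Hypothetical helper: report keys that are missing or have an empty value.
check_env_file() {
  env_file="$1"
  missing=""
  for key in $required_keys; do
    # A key counts as set only if a line "KEY=<non-empty value>" exists.
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      missing="$missing $key"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing or empty:$missing"
    return 1
  fi
  echo "ok"
}
```

After sourcing the snippet, run check_env_file config/vault.env from the datavault-local directory. The check only looks for non-empty values; it does not detect placeholders such as your-vault-id that were never replaced.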

Step 3: Create the vault configuration

Values like $VAULT_ID reference the environment variables from vault.env; they are resolved automatically at runtime.

config/app_config.yaml
version: 1.0
meingpt_url: $MEINGPT_URL

vault:
  id: $VAULT_ID
  secret: $VAULT_SECRET
  standalone_mode: false
  data_dir: ./tmp
  ingestion_interval: 300
  tasks_batch_size: 3
  chunk_size: 256
  chunk_overlap: 26

metadata:
  # 'deployment_type' is a legacy label only; storage always uses PostgreSQL/VectorChord.
  # Both 'vault' and 'vault-worker' must point at the SAME postgres so they share the task queue.
  deployment_type: "cloud"
  postgres:
    user: $POSTGRES_USER
    password: $POSTGRES_PASSWORD
    host: database
    port: 5432
    database: datavault

embedding_model:
  provider: "openai"
  model: $OPENAI_EMBEDDING_MODEL
  base_url: $OPENAI_BASE_URL
  api_key: $OPENAI_API_KEY
  embedding_dimensions: $OPENAI_EMBEDDING_DIMENSIONS
  rpm: 1000
  tpm: 100000

logging:
  log_level: "INFO"
  log_to_file: true
  log_file_path: "logs/app.log"
  uvicorn_log_file_path: "logs/uvicorn.log"

data_pools:
  - id: your-datapool-id-from-meinGPT
    type: local
    base_path: /data/vault

Keep standalone_mode: false and keep the piko service enabled. This is what connects the local vault to meinGPT Cloud.

Step 4: Create the Docker Compose file

Replace your-vault-id in the piko command below with the Vault ID from vault.env.

docker-compose.yaml
services:
  vault:
    image: meingpt/vault:latest
    ports:
      - 8080:8080
    depends_on:
      database-init:
        condition: service_completed_successfully
      ollama:
        condition: service_healthy
    restart: unless-stopped
    networks:
      - vault_network
    volumes:
      - ./config:/etc/vault:ro
      - ./data/vault:/data/vault
      # Optional: mount a local directory for ingestion
      - ./documents:/app/documents:ro
    environment:
      - VAULT_CONFIG_FILE_PATH=/etc/vault/app_config.yaml
    env_file:
      - ./config/vault.env

  vault-worker:
    image: meingpt/vault:worker-latest
    ports:
      - 8081:8080
    depends_on:
      database-init:
        condition: service_completed_successfully
      ollama:
        condition: service_healthy
      vault:
        condition: service_started
    restart: unless-stopped
    networks:
      - vault_network
    volumes:
      - ./config:/etc/vault:ro
      - ./data/vault:/data/vault
      # Optional: mount a local directory for ingestion
      - ./documents:/app/documents:ro
    environment:
      - VAULT_CONFIG_FILE_PATH=/etc/vault/app_config.yaml
    env_file:
      - ./config/vault.env

  database:
    image: tensorchord/vchord-suite:pg17-20260401
    command:
      - postgres
      - -c
      - shared_preload_libraries=vchord.so,vchord_bm25.so,pg_tokenizer.so
      - -c
      - search_path="$$user", public, bm25_catalog, tokenizer_catalog
    ports:
      - 5432:5432
    env_file:
      - ./config/vault.env
    environment:
      - POSTGRES_DB=datavault
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d datavault"]
      interval: 5s
      timeout: 5s
      retries: 20
    networks:
      - vault_network

  database-init:
    image: tensorchord/vchord-suite:pg17-20260401
    depends_on:
      database:
        condition: service_healthy
    env_file:
      - ./config/vault.env
    environment:
      - POSTGRES_DB=datavault
    entrypoint: ["/bin/sh", "-c"]
    command:
      - |
        PGPASSWORD="$${POSTGRES_PASSWORD}" psql \
          -h database \
          -U "$${POSTGRES_USER}" \
          -d "$${POSTGRES_DB}" <<'SQL'
        \set ON_ERROR_STOP on

        CREATE EXTENSION IF NOT EXISTS vector;
        CREATE EXTENSION IF NOT EXISTS vchord CASCADE;
        CREATE EXTENSION IF NOT EXISTS pg_tokenizer CASCADE;
        CREATE EXTENSION IF NOT EXISTS vchord_bm25 CASCADE;

        SET search_path TO "$$user", public, bm25_catalog, tokenizer_catalog;

        DO $$$$
        BEGIN
          EXECUTE format(
            'ALTER DATABASE %I SET search_path TO "$$user", public, bm25_catalog, tokenizer_catalog',
            current_database()
          );
        END
        $$$$;

        DO $$$$
        BEGIN
          IF NOT EXISTS (
            SELECT 1
            FROM tokenizer_catalog.tokenizer
            WHERE name = 'chunks_token'
          ) THEN
            PERFORM create_tokenizer('chunks_token', $$tokenizer$$
        model = "llmlingua2"
        $$tokenizer$$);
          END IF;
        END
        $$$$;
        SQL
    restart: "no"
    networks:
      - vault_network

  ollama:
    image: ollama/ollama:latest
    # Pre-pull the embedding model on first start so vault can use it immediately.
    entrypoint: ["/bin/sh", "-c"]
    command:
      - |
        ollama serve &
        SERVE_PID=$$!
        export OLLAMA_HOST=http://127.0.0.1:11434
        until ollama list >/dev/null 2>&1; do sleep 1; done
        ollama pull bge-m3
        wait $$SERVE_PID
    volumes:
      - ./data/ollama:/root/.ollama
    networks:
      - vault_network
    healthcheck:
      # Becomes healthy only once the embedding model is fully pulled,
      # which gates vault/vault-worker startup so first embed calls don't fail.
      test: ["CMD-SHELL", "OLLAMA_HOST=http://127.0.0.1:11434 ollama list | grep -q bge-m3"]
      interval: 10s
      timeout: 5s
      retries: 60
      start_period: 30s

  piko:
    image: ghcr.io/andydunstall/piko:latest
    command:
      - agent
      - http
      # Replace "your-vault-id" with the vault ID from vault.env
      - your-vault-id
      - vault:8080
      - --connect.url
      - https://vault-proxy.meingpt.com
    env_file:
      - ./config/vault.env
    depends_on:
      vault:
        condition: service_started
    networks:
      - vault_network

networks:
  vault_network:
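
The your-vault-id placeholder in the piko command can also be filled in by script instead of by hand. This is a sketch only: the function name fill_vault_id is hypothetical, and it assumes the Vault ID contains no | or & characters, since both are special in the sed expression used here.

```shell
#!/bin/sh
# Hypothetical helper: copy VAULT_ID from the env file into the compose file.
fill_vault_id() {
  env_file="$1"
  compose_file="$2"
  # Take the value of the first VAULT_ID= line.
  vault_id=$(sed -n 's/^VAULT_ID=//p' "$env_file" | head -n 1)
  if [ -z "$vault_id" ] || [ "$vault_id" = "your-vault-id" ]; then
    echo "VAULT_ID is not set in $env_file" >&2
    return 1
  fi
  # Write to a temporary file first; 'sed -i' behaves differently on GNU and BSD sed.
  sed "s|your-vault-id|$vault_id|g" "$compose_file" > "$compose_file.tmp" &&
    mv "$compose_file.tmp" "$compose_file"
}
```

Called as fill_vault_id config/vault.env docker-compose.yaml, this replaces every occurrence of the placeholder, including the one in the YAML comment above the piko argument, which is harmless.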

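Embedding model: This config uses bge-m3 (1024 dimensions, multilingual, works well for German). Other Ollama embedding models such as nomic-embed-text (768 dimensions) or mxbai-embed-large (1024 dimensions) work too. To switch, adjust OPENAI_EMBEDDING_MODEL and OPENAI_EMBEDDING_DIMENSIONS in vault.env, and replace bge-m3 in the ollama service's pull command and healthcheck in docker-compose.yaml, since both hardcode the model name.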

GPU: Ollama runs CPU-only inside Docker by default. If you have an NVIDIA GPU on the host, add deploy: { resources: { reservations: { devices: [{ driver: nvidia, count: all, capabilities: [gpu] }] } } } to the ollama service for major throughput gains.
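
The inline mapping above is easier to read in block style. As a sketch, the same reservation can live in a separate Compose override file so that docker-compose.yaml stays unchanged; the file name docker-compose.gpu.yaml is an arbitrary choice, not something the DataVault setup prescribes. GPU access from containers generally also requires the NVIDIA Container Toolkit on the host.

```shell
#!/bin/sh
# Write a Compose override that adds the NVIDIA device reservation to the ollama service.
# The file name is an assumption; any name works as long as it is passed with -f.
cat > docker-compose.gpu.yaml <<'EOF'
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF
```

To apply it, pass both files to Compose: docker compose -f docker-compose.yaml -f docker-compose.gpu.yaml up -d. Compose merges the override into the ollama service definition.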

Step 5: Deploy

  1. If you have local files to ingest, add them to data/vault/
  2. Pull images: docker compose pull
  3. Start services: docker compose up -d
  4. Check health: curl http://localhost:8080/health/
  5. Monitor logs: docker compose logs -f database ollama vault vault-worker

On first start, the ollama service downloads bge-m3. The vault and vault-worker services wait until the model is available, so the health check may not respond for several minutes.
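
Because the first start can take a while, a single curl call against the health endpoint may fail simply because the vault has not started yet. The loop below is a minimal sketch that polls until the endpoint answers; the function name wait_for_health and the default retry values are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it returns a successful HTTP status.
# Arguments: URL, maximum attempts (default 60), seconds between attempts (default 5).
wait_for_health() {
  url="$1"
  attempts="${2:-60}"
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors; -s silences progress output.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}
```

For this guide's setup the call would be wait_for_health http://localhost:8080/health/. If it gives up, the logs from step 5 are the next place to look.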

Troubleshooting

  • Check service status: docker compose ps
  • View API logs: docker compose logs vault
  • View ingestion logs: docker compose logs vault-worker
  • Test Postgres: docker compose exec database pg_isready -U datavault -d datavault
  • Test Ollama: docker compose exec ollama sh -c "OLLAMA_HOST=http://127.0.0.1:11434 ollama list"
  • Restart services: docker compose restart

Common pitfall: ingestion not picked up

If ingestion never runs (no errors, but documents stay un-indexed), check that the metadata.postgres block in app_config.yaml is present and that both vault and vault-worker resolve to the same database host. Both services must use the same Postgres instance so that the worker sees the tasks the API enqueues.

Common pitfall: database initialized with the wrong setup

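If you started once with the wrong database image or the wrong database volume path, remove the broken data directory and start again. Note that this deletes everything stored in Postgres, including ingestion metadata and vector embeddings: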

docker compose down
rm -rf data/postgres
mkdir -p data/postgres
docker compose up -d
