Overview
Overview of Data Pools, RAG, and source connectivity
Data Pools (RAG)
Data Pools are the foundation for retrieval-augmented generation (RAG) in meinGPT.
Content from connected sources is indexed and provided to your assistants as knowledge.
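To make the retrieval-augmented pattern concrete, here is a minimal conceptual sketch. It is not meinGPT's implementation: the function names, the sample documents, and the simple keyword-overlap scoring are all assumptions chosen for illustration (production RAG systems typically rank by embedding similarity instead). It only shows the general flow of indexing content, retrieving the most relevant entries for a question, and passing them to a model as context.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, Counter]:
    """Index step: store term counts per document (hypothetical stand-in for a real index)."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}


def retrieve(index: dict[str, Counter], query: str, top_k: int = 2) -> list[str]:
    """Retrieval step: rank documents by how often they contain the query terms."""
    query_terms = set(tokenize(query))
    scores = {
        doc_id: sum(counts[term] for term in query_terms)
        for doc_id, counts in index.items()
    }
    # Highest score first; ties are broken alphabetically by document id.
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return [doc_id for doc_id in ranked if scores[doc_id] > 0][:top_k]


def build_prompt(documents: dict[str, str], doc_ids: list[str], question: str) -> str:
    """Augmentation step: put the retrieved content in front of the question."""
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Illustrative sample content, not real data.
documents = {
    "vacation-policy": "Employees receive 30 vacation days per year.",
    "expense-policy": "Travel expenses are reimbursed within 14 days.",
}

index = build_index(documents)
question = "How many vacation days do employees get?"
hits = retrieve(index, question, top_k=1)
prompt = build_prompt(documents, hits, question)
print(prompt)
```

The resulting prompt is what would be handed to a language model. In meinGPT, indexing and retrieval happen inside the Data Pool, so you do not write this logic yourself.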
For Most Teams (Default)
In the default setup, you connect sources directly in meinGPT and use Data Pools without running your own infrastructure.
- Connect sources in meinGPT
- Select the Data Pool
- Use it in assistants/workflows
You do not need to run or configure your own Data Vault for this.
Advanced: Customer-Managed Data Vault (On-Premise)
If you need your own on-prem knowledge infrastructure, you can deploy and operate your own Data Vault.
- Choose a network model: On-Premise Connections
- Vault operations and configuration: /integrations/vault
When to use Data Pools
- You want to manage large document collections centrally
- You need reusable knowledge across multiple assistants
- You want controlled source sync and ingestion
Sources
All supported sources are listed here:
Typical source types:
- SharePoint / OneDrive
- Google Drive
- Confluence
- Amazon S3
- SMB / WebDAV
- Local filesystems
Custom Data Preparation Pipelines
For the dedicated pattern with S3 handover for third-party systems, see: