Guide: Data Quality
Prepare data pragmatically before integration
Goal
Deliver quickly without building on low-quality data.
SharePoint / file shares at large scale
When connecting large SharePoint estates:
- do not ingest everything at once: select relevant sites/scopes first
- reduce duplicates/outdated content: less noise, better retrieval
- use clear metadata/naming conventions: better findability
Pragmatic sequence:
- define top use cases
- map only relevant data scopes
- expand step by step after validation
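The sequence above can be sketched as an explicit scope allowlist that maps use cases to sites and only grows after validation. A minimal sketch, assuming hypothetical site paths and use-case names:

```python
# Minimal sketch of an incremental SharePoint ingestion scope.
# All site paths and use-case names below are hypothetical examples.

# Phase 1: only the sites that serve the top use cases.
SCOPE_PHASE_1 = {
    "hr-policy-qa": ["sites/HR-Policies"],
    "sales-enablement": ["sites/Sales-Playbooks"],
}

def sites_to_ingest(scope: dict) -> set:
    """Flatten the use-case -> sites mapping into a deduplicated ingest set."""
    return {site for sites in scope.values() for site in sites}

def expand_scope(scope: dict, use_case: str, new_sites: list, validated: bool) -> dict:
    """Add sites for a use case only after the previous scope is validated."""
    if not validated:
        raise ValueError("Validate answer quality before expanding the scope")
    scope.setdefault(use_case, []).extend(new_sites)
    return scope
```

Keeping the scope as explicit data (rather than ingesting whole tenants) makes each expansion a reviewable change.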
SAP / ERP with many tables
For very large table landscapes (e.g. SAP):
- do not start with full coverage
- curate tables by use case
- assign business owners per data domain
Recommendation:
- start with a small core set
- validate answer quality
- expand table scope in controlled increments
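One way to make this curation concrete is a small table registry where every entry carries a data domain and a business owner, and expansion is blocked without one. A sketch, assuming a few well-known SAP master-data tables as the core set (domain and owner names are illustrative):

```python
# Sketch of a curated SAP table registry: a small core set, each table
# assigned to a business-owned data domain. Domain/owner names are examples.
CORE_TABLES = {
    "KNA1": {"domain": "customer-master", "owner": "sales-ops"},
    "MARA": {"domain": "material-master", "owner": "supply-chain"},
    "VBAK": {"domain": "sales-orders",    "owner": "sales-ops"},
}

def tables_for_domain(registry: dict, domain: str) -> list:
    """List the curated tables that belong to one data domain."""
    return sorted(t for t, meta in registry.items() if meta["domain"] == domain)

def add_table(registry: dict, table: str, domain: str, owner: str) -> None:
    """Controlled increment: every new table must come with a domain and owner."""
    if not owner:
        raise ValueError(f"Table {table} needs a business owner before ingestion")
    registry[table] = {"domain": domain, "owner": owner}
```

Starting from a registry like this, "expand in controlled increments" becomes: validate answer quality, then call `add_table` for the next domain.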
Minimum standards for structured data
- stable keys/IDs available
- consistent date fields
- null/empty handling is understood
- field semantics are documented
- clear update cadence (e.g. hourly/daily)
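The standards above can be checked mechanically on each batch before it is ingested. A minimal sketch, assuming hypothetical field names (`id`, `updated_at`) and ISO 8601 dates as the agreed format:

```python
import datetime

# Sketch of minimum-standard checks for one batch of structured records.
# Field names ("id", "updated_at") are assumptions, not a fixed schema.
def check_batch(records: list) -> list:
    """Return a list of violations of the minimum standards for structured data."""
    issues = []
    ids = [r.get("id") for r in records]
    if any(i is None for i in ids):
        issues.append("missing key: every record needs a stable ID")
    if len(set(ids)) != len(ids):
        issues.append("duplicate keys: IDs must be unique")
    for r in records:
        ts = r.get("updated_at")
        if ts is None:
            continue  # null handling: absent dates are allowed, malformed ones are not
        try:
            # Consistent date fields: one format (here ISO 8601) across the batch.
            datetime.date.fromisoformat(ts)
        except (TypeError, ValueError):
            issues.append(f"inconsistent date field: {ts!r}")
    return issues
```

An empty result means the batch meets the checks; anything else is a concrete item for the data owner.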
Minimum standards for document data
- clear titles/file names
- prefer current versions over shadow copies
- avoid legacy archives in first scope
- consistent folder/metadata structure
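A first-scope filter can encode these rules as naming conventions. A sketch, assuming hypothetical folder and file-name patterns that would need adapting to the actual estate:

```python
import re

# Sketch of a first-scope document filter: skip legacy archives and
# shadow copies by naming convention. Patterns are assumptions to adapt.
LEGACY_DIRS = re.compile(r"/(archive|old|backup)/", re.IGNORECASE)
SHADOW_COPY = re.compile(r"(copy|final_v\d+|~\$)", re.IGNORECASE)

def in_first_scope(path: str) -> bool:
    """Keep only current, clearly named documents for the initial ingestion."""
    if LEGACY_DIRS.search(path):
        return False  # avoid legacy archives in the first scope
    filename = path.rsplit("/", 1)[-1]
    if SHADOW_COPY.search(filename):
        return False  # prefer the current version over shadow copies
    return True
```

A filter like this also surfaces where naming conventions are inconsistent: documents that fail it for the wrong reason point to cleanup work.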
Go/No-Go checklist before pilot
- Is first scope clearly bounded?
- Are data owners assigned?
- Are one or two high-value use cases explicitly defined?
- Is it clear which data is intentionally excluded from phase 1?
If these points are clear, pilot speed and stability improve significantly.
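The checklist can be kept as an explicit gate in the project repo so a pilot does not start by accident. A minimal sketch; the field names simply mirror the four questions above and the values are illustrative:

```python
# Sketch of the Go/No-Go checklist as an explicit gate before the pilot.
# Keys mirror the questions above; values here are illustrative.
CHECKLIST = {
    "scope_bounded": True,          # first scope clearly bounded
    "owners_assigned": True,        # data owners assigned
    "use_cases_defined": True,      # one or two high-value use cases named
    "exclusions_documented": True,  # phase-1 out-of-scope data listed
}

def go_no_go(checklist: dict) -> str:
    """Return 'GO' only if every checklist item is satisfied."""
    missing = [k for k, v in checklist.items() if not v]
    return "GO" if not missing else "NO-GO: " + ", ".join(missing)
```

The NO-GO output names the open items, which doubles as the agenda for the next preparation step.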