Guide: Data Quality

Prepare data pragmatically before integration

Goal

Deliver quickly without building on low-quality data.

SharePoint / file shares at large scale

When connecting large SharePoint estates:

  • do not ingest everything at once: select relevant sites/scopes first
  • reduce duplicates/outdated content: less noise, better retrieval
  • use clear metadata/naming conventions: better findability

Pragmatic sequence:

  1. define top use cases
  2. map only relevant data scopes
  3. expand step by step after validation
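The sequence above can be sketched as a small scoping plan: ingest only the scopes that belong to a defined use case, and expand after validation. All names here (the `IngestionPlan` class, the site paths) are hypothetical illustrations, not part of any SharePoint API.

```python
# Hypothetical sketch: track which SharePoint scopes belong to phase 1,
# and expand only after a use case has been validated.
from dataclasses import dataclass, field

@dataclass
class IngestionPlan:
    # use case name -> list of SharePoint site/library scopes it needs
    scopes_by_use_case: dict = field(default_factory=dict)
    validated: set = field(default_factory=set)

    def add_use_case(self, name, scopes):
        self.scopes_by_use_case[name] = scopes

    def mark_validated(self, name):
        # Only validated use cases justify expanding the scope further.
        self.validated.add(name)

    def active_scopes(self):
        # Only ingest scopes belonging to defined use cases;
        # everything else stays out of phase 1 by default.
        return sorted({s for scopes in self.scopes_by_use_case.values()
                       for s in scopes})

plan = IngestionPlan()
plan.add_use_case("hr-policy-qa", ["sites/HR/Policies"])
plan.add_use_case("it-helpdesk", ["sites/IT/KnowledgeBase"])
print(plan.active_scopes())
```

The point of the sketch is the default: nothing is ingested unless a use case claims it.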

SAP / ERP with many tables

For very large table landscapes (e.g. SAP):

  • do not start with full coverage
  • curate tables by use case
  • assign business owners per data domain

Recommendation:

  • start with a small core set
  • validate answer quality
  • expand table scope in controlled increments
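One way to make this curation explicit is a per-use-case allowlist. The table names below are real SAP examples (MARA = material master, MAKT = material descriptions, KNA1 = customer master, KNVV = customer sales data), but the allowlist structure itself is an assumed sketch, not a standard mechanism.

```python
# Hypothetical sketch: curate SAP tables by use case instead of full coverage.
CORE_TABLES = {
    "material-lookup": {"MARA", "MAKT"},   # material master + descriptions
    "customer-360":    {"KNA1", "KNVV"},   # customer master + sales data
}

def allowed_tables(use_cases):
    """Union of curated tables for the use cases currently in scope."""
    tables = set()
    for uc in use_cases:
        tables |= CORE_TABLES.get(uc, set())
    return tables

# Start with a small core set; add use cases in controlled increments.
print(sorted(allowed_tables(["material-lookup"])))
```

Expanding scope then means adding one use case entry and re-validating, rather than opening the whole table landscape at once.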

Minimum standards for structured data

  • stable keys/IDs available
  • consistent date fields
  • null/empty handling is understood
  • field semantics are documented
  • clear update cadence (e.g. hourly/daily)
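The first three standards can be spot-checked mechanically. This is a minimal sketch with an assumed helper (`check_records`), not a production validation framework: it flags missing or duplicate keys and inconsistent date fields.

```python
# Minimal sketch (assumed helper, not from the guide): check a list of
# records against the minimum standards above before integration.
from datetime import date

def check_records(records, key_field, date_field):
    issues = []
    seen_keys = set()
    for i, rec in enumerate(records):
        key = rec.get(key_field)
        if key is None:
            issues.append(f"row {i}: missing key")            # stable keys/IDs
        elif key in seen_keys:
            issues.append(f"row {i}: duplicate key {key!r}")
        else:
            seen_keys.add(key)
        value = rec.get(date_field)
        if value is not None and not isinstance(value, date):
            issues.append(f"row {i}: {date_field} is not a date")  # consistent dates
    return issues

rows = [
    {"id": 1, "updated": date(2024, 1, 5)},
    {"id": 1, "updated": "2024-01-06"},   # duplicate key + string date
    {"id": None, "updated": date(2024, 1, 7)},
]
print(check_records(rows, "id", "updated"))
```

Null handling, field semantics, and update cadence are harder to automate; those belong in the documentation owned per data domain.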

Minimum standards for document data

  • clear titles/file names
  • current versions over shadow copies
  • avoid legacy archives in first scope
  • consistent folder/metadata structure

Go/No-Go checklist before pilot

  • Is first scope clearly bounded?
  • Are data owners assigned?
  • Are 1-2 high-value use cases explicitly defined?
  • Is it clear which data is intentionally excluded from phase 1?
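The checklist can be turned into a simple gate so the decision is recorded, not implied. The check names are paraphrases of the four questions above; the structure is an assumed sketch.

```python
# Hypothetical sketch: encode the go/no-go checklist as an explicit gate.
CHECKLIST = {
    "scope_bounded": True,          # first scope clearly bounded?
    "owners_assigned": True,        # data owners assigned?
    "use_cases_defined": True,      # 1-2 high-value use cases defined?
    "exclusions_explicit": False,   # phase-1 exclusions written down?
}

def go_no_go(checks):
    missing = [name for name, ok in checks.items() if not ok]
    return ("GO", []) if not missing else ("NO-GO", missing)

decision, gaps = go_no_go(CHECKLIST)
print(decision, gaps)
```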

If these points are clear, pilot speed and stability improve significantly.
