Deployment Verification

After deploying DataVault, verify that all components are functioning correctly. This guide provides a checklist and the test procedures needed to work through it.

Initial Health Checks

1. Service Status

Check that all services are running:

# Docker deployment
docker-compose ps

# Kubernetes deployment
kubectl get pods -n datavault
kubectl get svc -n datavault

# Systemd services
systemctl status datavault-api
systemctl status weaviate
systemctl status postgresql

Expected output:

All services show "running" or "healthy" status
No restart loops
No error states
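
For the Kubernetes case, the restart-loop check can be scripted. The following is a minimal sketch, assuming the input is the parsed output of `kubectl get pods -n datavault -o json`. The function name `find_unstable_pods`, the `max_restarts` threshold, and the sample pod names are illustrative and not part of DataVault.

```python
def find_unstable_pods(pod_list, max_restarts=3):
    """Return (pod_name, reason) pairs for pods that look unhealthy.

    pod_list is the parsed JSON from `kubectl get pods -o json`.
    max_restarts is an arbitrary, illustrative threshold.
    """
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase not in ("Running", "Succeeded"):
            problems.append((name, f"phase is {phase}"))
            continue
        for container in status.get("containerStatuses", []):
            restarts = container.get("restartCount", 0)
            if restarts > max_restarts:
                problems.append(
                    (name, f"{container['name']} restarted {restarts} times")
                )
            elif phase == "Running" and not container.get("ready", False):
                problems.append((name, f"{container['name']} is not ready"))
    return problems


# Hand-written sample standing in for live cluster output
sample = {
    "items": [
        {
            "metadata": {"name": "api-0"},
            "status": {
                "phase": "Running",
                "containerStatuses": [
                    {"name": "api", "restartCount": 0, "ready": True}
                ],
            },
        },
        {
            "metadata": {"name": "worker-0"},
            "status": {
                "phase": "Running",
                "containerStatuses": [
                    {"name": "worker", "restartCount": 12, "ready": True}
                ],
            },
        },
    ]
}

problems = find_unstable_pods(sample)
print(problems)
```

An empty list means no pod exceeded the restart threshold or reported a non-ready container.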

2. API Accessibility

Test API endpoint:

# Health check
curl -X GET http://localhost:8000/health

# Expected response:
{
  "status": "healthy",
  "version": "2.0.0",
  "services": {
    "database": "connected",
    "vector_db": "connected",
    "cache": "connected"
  }
}

# API docs
curl -X GET http://localhost:8000/docs
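
The health response can also be checked programmatically rather than by eye. This sketch assumes the response has the shape shown above; the function name `health_problems` and the `degraded` and `disconnected` values in the second sample are hypothetical, chosen only to show a failing case.

```python
def health_problems(payload):
    """Return a list of human-readable problems found in a /health response."""
    problems = []
    if payload.get("status") != "healthy":
        problems.append(f"status is {payload.get('status')!r}")
    for name, state in payload.get("services", {}).items():
        if state != "connected":
            problems.append(f"{name} is {state!r}")
    return problems


healthy = {
    "status": "healthy",
    "version": "2.0.0",
    "services": {
        "database": "connected",
        "vector_db": "connected",
        "cache": "connected",
    },
}

# Hypothetical failing response; the real values may differ
degraded = {
    "status": "degraded",
    "version": "2.0.0",
    "services": {
        "database": "connected",
        "vector_db": "disconnected",
        "cache": "connected",
    },
}

print(health_problems(healthy))
print(health_problems(degraded))
```

To run it against a live deployment, parse the body returned by the health check with `json.loads` and pass the resulting dictionary to `health_problems`.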

3. Database Connectivity

PostgreSQL:

# Test connection
psql -h localhost -U datavault -d datavault -c "SELECT version();"

# Check tables
psql -h localhost -U datavault -d datavault -c "\dt"

Weaviate:

# Check Weaviate status
curl -X GET http://localhost:8080/v1/meta

# List schemas
curl -X GET http://localhost:8080/v1/schema
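
To confirm that the schema listing contains what the deployment needs, the response can be compared against a list of expected classes. This is a sketch, assuming the Weaviate `/v1/schema` response is a JSON object with a `classes` array whose entries carry a `class` name. The class names `Document` and `Chunk` are placeholders; substitute the ones your DataVault installation actually creates.

```python
def missing_classes(schema, expected):
    """Return the expected class names that are absent from a schema response."""
    present = {entry.get("class") for entry in schema.get("classes", [])}
    return sorted(set(expected) - present)


# Hand-written sample standing in for the /v1/schema response
schema = {"classes": [{"class": "Document", "properties": []}]}

# Placeholder class names, not taken from DataVault
missing = missing_classes(schema, ["Document", "Chunk"])
print(missing)
```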

Functional Testing

1. Authentication

Test login:

# Get access token
curl -X POST http://localhost:8000/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "username": "admin",
    "password": "your-password"
  }'

# Use token for authenticated requests
export TOKEN="your-access-token"
curl -X GET http://localhost:8000/api/v1/user/profile \
  -H "Authorization: Bearer $TOKEN"

2. Document Ingestion

Upload test document:

# Create test file
echo "This is a test document for DataVault verification." > test.txt

# Upload document
curl -X POST http://localhost:8000/api/v1/documents/upload \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@test.txt" \
  -F "metadata={\"category\":\"test\"}"

# Check processing status (set DOCUMENT_ID to the uploaded document's ID)
curl -X GET "http://localhost:8000/api/v1/documents/status/$DOCUMENT_ID" \
  -H "Authorization: Bearer $TOKEN"
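
Processing is usually asynchronous, so a single status call may not be enough. The sketch below polls until a terminal status is reached or a timeout expires. The status values `completed` and `failed` are assumptions, since this guide does not list the values the status endpoint returns, and `wait_for_status` is an illustrative helper rather than a DataVault API.

```python
import time


def wait_for_status(fetch_status, done=("completed",), failed=("failed",),
                    timeout=60.0, interval=2.0):
    """Call fetch_status() repeatedly until it returns a terminal status.

    Returns the status when it is in `done`, raises RuntimeError when it is
    in `failed`, and raises TimeoutError once `timeout` seconds have passed.
    The default status names are assumptions, not confirmed values.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"processing ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout}s")
        time.sleep(interval)


# Stand-in for a function that calls the status endpoint and returns its status
states = iter(["queued", "processing", "completed"])
result = wait_for_status(lambda: next(states), interval=0, timeout=5)
print(result)
```

In real use, `fetch_status` would wrap the status request shown above and return the status field from its JSON body.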

3. Vector Search

Test semantic search:

# Search for similar documents
curl -X POST http://localhost:8000/api/v1/search \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "test document verification",
    "limit": 10,
    "threshold": 0.7
  }'

4. RAG Functionality

Test question answering:

# Ask a question
curl -X POST http://localhost:8000/api/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What is DataVault?"}],
    "use_context": true,
    "model": "gpt-3.5-turbo"
  }'

Performance Verification

1. Response Times

Measure API latency:

# Simple latency test
time curl -X GET http://localhost:8000/health

# Load test with Apache Bench
ab -n 100 -c 10 -H "Authorization: Bearer $TOKEN" \
   http://localhost:8000/api/v1/documents/list

# Expected results:
# - Health check: <100ms
# - Document list: <500ms
# - Search queries: <2s
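
A single `time curl` run is noisy, so it helps to collect several samples and compare a percentile against the limits above. This sketch uses the nearest-rank method. Choosing the 95th percentile is an assumption, because the guide does not say whether its limits apply to the mean or to a percentile, and the sample values are made up.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_latency(samples_ms, limit_ms, pct=95):
    """Return (observed percentile, whether it is below the limit)."""
    observed = percentile(samples_ms, pct)
    return observed, observed < limit_ms


# Made-up health check latencies in milliseconds
health_samples_ms = [12, 15, 11, 14, 13, 90, 12, 16, 13, 14]

observed, ok = check_latency(health_samples_ms, limit_ms=100)
print(f"p95 = {observed} ms, within limit: {ok}")
```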

2. Resource Usage

Monitor system resources:

# CPU and Memory
top -b -n 1 | grep -E "datavault|weaviate|postgres"

# Disk usage
df -h | grep -E "datavault|docker"

# Network connections
netstat -tulpn | grep -E "8000|8080|5432"

3. Database Performance

Check query performance:

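-- PostgreSQL slow queries (requires the pg_stat_statements extension;
-- on PostgreSQL 12 and earlier the column is mean_time, not mean_exec_time)
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;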

-- Table sizes
SELECT 
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

Integration Testing

1. Data Source Connections

Test each configured source:

# S3 source
curl -X POST http://localhost:8000/api/v1/sources/test \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "s3",
    "config": {
      "bucket": "test-bucket",
      "region": "us-east-1"
    }
  }'

# Confluence source
curl -X POST http://localhost:8000/api/v1/sources/test \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "confluence",
    "config": {
      "url": "https://company.atlassian.net",
      "space": "TEST"
    }
  }'

2. Embedding Provider

Verify embedding generation:

# Test embedding endpoint
curl -X POST http://localhost:8000/api/v1/embeddings/generate \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Test embedding generation",
    "model": "text-embedding-ada-002"
  }'

3. LLM Provider

Test LLM connectivity:

# Test completion without context
curl -X POST http://localhost:8000/api/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello"}],
    "use_context": false,
    "model": "gpt-3.5-turbo"
  }'

Security Verification

1. Authentication Tests

# Test invalid token
curl -X GET http://localhost:8000/api/v1/documents/list \
  -H "Authorization: Bearer invalid-token"
# Expected: 401 Unauthorized

# Test missing token
curl -X GET http://localhost:8000/api/v1/documents/list
# Expected: 401 Unauthorized

# Test expired token
# Wait for token expiry, then:
curl -X GET http://localhost:8000/api/v1/documents/list \
  -H "Authorization: Bearer $EXPIRED_TOKEN"
# Expected: 401 Unauthorized

2. Authorization Tests

# Test accessing other user's documents
# (set OTHER_USER_DOC_ID to a document owned by a different user)
curl -X GET "http://localhost:8000/api/v1/documents/$OTHER_USER_DOC_ID" \
  -H "Authorization: Bearer $TOKEN"
# Expected: 403 Forbidden

# Test admin-only endpoints as regular user
curl -X GET http://localhost:8000/api/v1/admin/users \
  -H "Authorization: Bearer $USER_TOKEN"
# Expected: 403 Forbidden
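
The authentication and authorization checks all follow the same pattern: send a request with a given token and compare the status code with the expected one. A table-driven sketch of that pattern follows. `http_status` and `failed_expectations` are illustrative helpers, and `fake_send` stands in for a running server so the example can run anywhere.

```python
import urllib.error
import urllib.request


def http_status(path, token, base_url="http://localhost:8000"):
    """Send a GET request and return the HTTP status code."""
    request = urllib.request.Request(base_url + path)
    if token is not None:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx and 5xx; the code is what we want to compare
        return error.code


def failed_expectations(cases, send):
    """cases is a list of (description, path, token, expected_status)."""
    failures = []
    for description, path, token, expected in cases:
        actual = send(path, token)
        if actual != expected:
            failures.append(f"{description}: expected {expected}, got {actual}")
    return failures


cases = [
    ("invalid token", "/api/v1/documents/list", "invalid-token", 401),
    ("missing token", "/api/v1/documents/list", None, 401),
    ("admin endpoint as regular user", "/api/v1/admin/users", "user-token", 403),
]


def fake_send(path, token):
    # Stand-in for http_status; it deliberately returns 200 for the
    # admin endpoint to show what a failed check looks like.
    if token in (None, "invalid-token"):
        return 401
    return 200


failures = failed_expectations(cases, fake_send)
print(failures)
```

Against a live deployment, pass `http_status` instead of `fake_send` and replace the placeholder tokens with real ones.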

3. Input Validation

# Test SQL injection
# ('\'' embeds a literal single quote inside the single-quoted payload)
curl -X POST http://localhost:8000/api/v1/search \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "test'\''; DROP TABLE documents; --",
    "limit": 10
  }'
# Expected: Normal results, no database damage

# Test XSS
curl -X POST http://localhost:8000/api/v1/documents/upload \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@test.txt" \
  -F "metadata={\"title\":\"<script>alert('xss')</script>\"}"
# Expected: Escaped or rejected

Monitoring Setup

1. Prometheus Metrics

# Check metrics endpoint
curl -X GET http://localhost:8000/metrics

# Verify key metrics:
# - http_requests_total
# - http_request_duration_seconds
# - document_processing_duration_seconds
# - vector_search_duration_seconds
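
Checking for the key metrics can be automated by parsing the text returned by the metrics endpoint. This sketch assumes the standard Prometheus text exposition format, where histogram and summary samples carry `_bucket`, `_sum`, or `_count` suffixes. The sample text is hand-written, and `missing_metrics` is an illustrative helper.

```python
REQUIRED_METRICS = (
    "http_requests_total",
    "http_request_duration_seconds",
    "document_processing_duration_seconds",
    "vector_search_duration_seconds",
)


def metric_families(exposition_text):
    """Collect metric family names from Prometheus text-format output."""
    suffixes = ("_bucket", "_sum", "_count")
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split()
            # "# TYPE <name> <type>" and "# HELP <name> <text>" name the family
            if len(parts) >= 3 and parts[1] in ("TYPE", "HELP"):
                names.add(parts[2])
            continue
        sample_name = line.split("{", 1)[0].split()[0]
        names.add(sample_name)
        # Also record the base name of histogram and summary samples
        for suffix in suffixes:
            if sample_name.endswith(suffix):
                names.add(sample_name[: -len(suffix)])
    return names


def missing_metrics(exposition_text, required=REQUIRED_METRICS):
    """Return the required metric names that do not appear in the output."""
    present = metric_families(exposition_text)
    return [name for name in required if name not in present]


# Hand-written sample standing in for the /metrics response
sample_metrics = """\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 42
http_request_duration_seconds_bucket{le="0.1"} 40
http_request_duration_seconds_sum 3.2
http_request_duration_seconds_count 42
vector_search_duration_seconds_count 7
"""

missing = missing_metrics(sample_metrics)
print(missing)
```

Because suffixes are stripped by name only, a counter that happens to end in `_count` also registers its shortened name, which is harmless for a presence check.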

2. Logging

# Check application logs
tail -f /var/log/datavault/api.log

# Check error logs
tail -f /var/log/datavault/error.log

# Verify log format and content:
# - Timestamp
# - Log level
# - Request ID
# - User ID
# - Action performed
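
If the logs are written as one JSON object per line, the field check can be scripted. Both the JSON format and the field names below (`timestamp`, `level`, `request_id`, `user_id`, `action`) are assumptions made for illustration; this guide does not specify the actual log format, so adjust them to match your configuration.

```python
import json

# Hypothetical field names; adjust to the deployed log format
REQUIRED_FIELDS = ("timestamp", "level", "request_id", "user_id", "action")


def missing_log_fields(line, required=REQUIRED_FIELDS):
    """Return the required fields absent from one JSON-formatted log line.

    A line that is not a JSON object is reported as missing every field.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return list(required)
    if not isinstance(record, dict):
        return list(required)
    return [field for field in required if field not in record]


# Hand-written sample lines, not real DataVault output
complete_line = json.dumps({
    "timestamp": "2024-01-01T00:00:00Z",
    "level": "INFO",
    "request_id": "abc123",
    "user_id": "42",
    "action": "document.upload",
})
partial_line = json.dumps({"timestamp": "2024-01-01T00:00:00Z", "level": "ERROR"})

print(missing_log_fields(complete_line))
print(missing_log_fields(partial_line))
```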

3. Alerts

Test alert conditions:

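# Run these only in a non-production environment.

# Simulate high CPU
stress --cpu 8 --timeout 60s

# Simulate low disk space (writes a 10 GB file)
dd if=/dev/zero of=/tmp/testfile bs=1G count=10

# Simulate service down
docker stop datavault-api

# Verify that each alert fires, then clean up
rm /tmp/testfile
docker start datavault-api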

Load Testing

Gradual Load Test

# load_test.py
import concurrent.futures
import requests
import time

BASE_URL = "http://localhost:8000"
TOKEN = "your-token"

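def search_request():
    """Send one search request; return (status_code, elapsed_seconds)."""
    headers = {"Authorization": f"Bearer {TOKEN}"}
    data = {"query": "test query", "limit": 10}
    start = time.perf_counter()
    try:
        response = requests.post(
            f"{BASE_URL}/api/v1/search",
            json=data,
            headers=headers,
            timeout=30,  # avoid hanging a worker thread indefinitely
        )
        status = response.status_code
    except requests.RequestException:
        status = None  # counted as a failure in the success rate
    return status, time.perf_counter() - start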

# Test with increasing load
for num_users in [1, 5, 10, 20, 50]:
    print(f"\nTesting with {num_users} concurrent users:")
    
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_users) as executor:
        start_time = time.time()
        futures = [executor.submit(search_request) for _ in range(100)]
        results = [f.result() for f in futures]
        
        total_time = time.time() - start_time
        success_count = sum(1 for status, _ in results if status == 200)
        avg_response_time = sum(t for _, t in results) / len(results)
        
        print(f"Total time: {total_time:.2f}s")
        print(f"Success rate: {success_count}/100")
        print(f"Avg response time: {avg_response_time:.2f}s")

Verification Checklist

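Core Functionality

All services are running and healthy
API health check and docs are reachable
PostgreSQL and Weaviate connections work
Authentication, document upload, search, and RAG queries succeed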

Performance

Response times meet requirements
Resource usage is acceptable
System handles expected load
No memory leaks observed

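Security

Invalid, missing, and expired tokens are rejected
Users cannot access other users' documents or admin-only endpoints
Injection and XSS payloads are escaped or rejected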

Reliability

Troubleshooting Failed Checks

If any verification fails:

Check service logs for errors
Verify configuration files
Ensure all dependencies are installed
Check network connectivity
Verify resource availability
Review security settings

For detailed troubleshooting, see the operations guide.