OpenAI Embeddings

Configure OpenAI embedding models

Configuration

embedding_model:
  provider: "openai"
  api_key: $OPENAI_API_KEY
  model: "text-embedding-ada-002"
  base_url: null

Configuration Options

Field      Type     Default                   Required  Description
provider   string   -                         Yes       Must be "openai"
api_key    string   -                         Yes       OpenAI API key
model      string   "text-embedding-ada-002"  No        Model name
base_url   string   null                      No        Optional custom URL
rpm        integer  3000                      No        Requests per minute
tpm        integer  1000000                   No        Tokens per minute

Setup

  1. Create an OpenAI account: platform.openai.com
  2. Generate an API key: API Keys
  3. Add the key to your environment: OPENAI_API_KEY=sk-...

OpenAI API Documentation

Configuration Parameters

Parameter  Description                        Required  Default
provider   Must be openai                     Yes       -
api_key    OpenAI API key                     Yes       -
model      Name of the embedding model        No        text-embedding-ada-002
base_url   Alternative API URL (for proxies)  No        https://api.openai.com/v1
rpm        Maximum API requests per minute    No        3000
tpm        Maximum tokens per minute          No        1000000

Available Models

text-embedding-3-small

  • Dimensions: 1536
  • Cost: ~$0.00002/1K tokens
  • Usage: Cost-effective, good for most applications

text-embedding-3-large

  • Dimensions: 3072
  • Cost: ~$0.00013/1K tokens
  • Usage: Highest quality, for demanding applications

text-embedding-ada-002 (Legacy)

  • Dimensions: 1536
  • Cost: ~$0.0001/1K tokens
  • Usage: Proven, gradually being replaced by newer models
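To compare the models on price, the per-token costs listed above can be turned into a rough estimate for a given corpus size. The following is a minimal sketch; the function name `estimate_cost_usd` is hypothetical, and the prices are the approximate figures from this page, which may differ from your actual bill.

```python
# Approximate prices in USD per 1K tokens, copied from the model list above.
PRICE_PER_1K_TOKENS = {
    "text-embedding-3-small": 0.00002,
    "text-embedding-3-large": 0.00013,
    "text-embedding-ada-002": 0.0001,
}


def estimate_cost_usd(model: str, total_tokens: int) -> float:
    """Return the approximate cost of embedding `total_tokens` tokens."""
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS[model]


if __name__ == "__main__":
    # Compare all three models for a 10-million-token corpus.
    for model in PRICE_PER_1K_TOKENS:
        cost = estimate_cost_usd(model, 10_000_000)
        print(f"{model}: ${cost:.2f}")
```

At these prices, text-embedding-3-large costs 6.5 times as much per token as text-embedding-3-small.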

Example Configuration

Standard Setup

embedding_model:
  provider: openai
  api_key: $OPENAI_API_KEY
  model: text-embedding-3-small

With Custom Base URL (Proxy)

embedding_model:
  provider: openai
  api_key: $OPENAI_API_KEY
  model: text-embedding-3-small
  base_url: https://your-proxy.company.com/v1
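For a proxy to work as a drop-in replacement, it presumably has to expose the same paths as the OpenAI API under its base URL. The sketch below shows how an embeddings request could be assembled for either the default URL or a custom one. It only builds the request and does not send it. The helper `build_embeddings_request` is hypothetical and is not DataVault's actual client code.

```python
import json
import urllib.request

# Default taken from the parameter table on this page.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def build_embeddings_request(api_key, model, texts, base_url=None):
    """Build (but do not send) a POST request to the embeddings endpoint."""
    # Fall back to the default when base_url is null, and avoid a double slash.
    root = (base_url or DEFAULT_BASE_URL).rstrip("/")
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{root}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_embeddings_request(
        "sk-...", "text-embedding-3-small", ["hello"],
        base_url="https://your-proxy.company.com/v1",
    )
    print(req.get_method(), req.full_url)
```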

With Rate Limiting

embedding_model:
  provider: openai
  api_key: $OPENAI_API_KEY
  model: text-embedding-3-large
  rpm: 1000  # Reduced requests per minute
  tpm: 500000  # Reduced tokens per minute
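How DataVault enforces `rpm` and `tpm` internally is not documented here. As an illustration of what such limits mean, the sketch below implements one common approach: a sliding 60-second window that admits a request only if both the request count and the token count stay within the limits. The class `RateLimiter` and its `try_acquire` method are hypothetical names.

```python
import time
from collections import deque


class RateLimiter:
    """Sliding 60-second window over request and token counts."""

    def __init__(self, rpm: int, tpm: int, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock      # injectable so the limiter can be tested
        self.events = deque()   # (timestamp, tokens) for each admitted request

    def _prune(self, now: float) -> None:
        # Forget requests that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Record the request and return True if it fits both limits."""
        now = self.clock()
        self._prune(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True


if __name__ == "__main__":
    limiter = RateLimiter(rpm=1000, tpm=500000)
    print(limiter.try_acquire(tokens=8000))
```

A caller that receives `False` would typically sleep briefly and retry rather than drop the request.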

Legacy Model

embedding_model:
  provider: openai
  api_key: $OPENAI_API_KEY
  model: text-embedding-ada-002

API Key Setup

1. Create OpenAI Account

  1. Create an account or log in at platform.openai.com
  2. Navigate to API Keys
  3. Click "Create new secret key"
  4. Enter a name and copy the API key

2. Environment Variables

Define the key in your vault.env file:

# OpenAI API
OPENAI_API_KEY=sk-...

Security: Never share your API key publicly or commit it to Git repositories. Always use environment variables.
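Before starting an indexing run, it can help to confirm that the key is actually visible in the process environment. The sketch below assumes the variable has already been exported (for example, by whatever loads vault.env). The helpers `load_api_key` and `mask` are hypothetical. It prints only the last four characters, so the full key never reaches logs.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


def mask(key: str) -> str:
    """Return a log-safe form of the key that shows only its last 4 characters."""
    return "..." + key[-4:]


if __name__ == "__main__":
    try:
        print("Using API key", mask(load_api_key()))
    except RuntimeError as err:
        print("Configuration error:", err)
```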

Cost Optimization

Configure Rate Limiting

embedding_model:
  provider: openai
  api_key: $OPENAI_API_KEY
  model: text-embedding-3-small
  rpm: 500   # Reduced requests for cost control
  tpm: 100000  # Fewer tokens per minute
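A `tpm` limit also puts a ceiling on spend: at most `tpm` tokens can be embedded per minute, so the hourly cost cannot exceed `tpm × 60` tokens at the model's price. The sketch below works this out using the approximate prices from this page. The function name `max_hourly_cost_usd` is hypothetical.

```python
def max_hourly_cost_usd(tpm: int, price_per_1k_tokens: float) -> float:
    """Upper bound on hourly spend when throughput is capped at `tpm`."""
    tokens_per_hour = tpm * 60
    return tokens_per_hour / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    # text-embedding-3-small at ~$0.00002 per 1K tokens, capped at 100000 TPM.
    ceiling = max_hourly_cost_usd(tpm=100000, price_per_1k_tokens=0.00002)
    print(f"At most ${ceiling:.2f} per hour")
```

With the configuration above, the ceiling is about $0.12 per hour. Note that the limit slows spending rather than reducing the total cost of embedding a fixed corpus.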

Model Selection

Scenario              Recommended Model       Reason
Cost-sensitive        text-embedding-3-small  Best cost-benefit ratio
Highest quality       text-embedding-3-large  Best performance
Existing integration  text-embedding-ada-002  Proven and stable

Monitoring and Limits

OpenAI Usage Dashboard

Monitor your usage in the OpenAI usage dashboard.

Typical Limits (as of 2024)

Model                   Rate Limit  Context Limit
text-embedding-3-small  3000 RPM    8191 tokens
text-embedding-3-large  3000 RPM    8191 tokens
text-embedding-ada-002  3000 RPM    8191 tokens
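An input longer than the context limit cannot be embedded in one piece, so long documents are usually split first. The sketch below shows the simplest form of this, assuming the text has already been tokenized into a list of token IDs by a tokenizer of your choice. The function `split_tokens` is hypothetical, and whether DataVault chunks documents this way is not stated here.

```python
MAX_INPUT_TOKENS = 8191  # context limit from the table above


def split_tokens(token_ids, max_tokens=MAX_INPUT_TOKENS):
    """Split a token sequence into consecutive chunks of at most `max_tokens`."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [
        token_ids[start:start + max_tokens]
        for start in range(0, len(token_ids), max_tokens)
    ]


if __name__ == "__main__":
    # Stand-in for real token IDs: a document of 20000 tokens.
    chunks = split_tokens(list(range(20000)))
    print([len(chunk) for chunk in chunks])
```

Splitting at fixed offsets can cut sentences in half. Splitting on paragraph or sentence boundaries below the limit generally gives more useful embeddings.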

Troubleshooting

Common Issues

Debug Configuration

When debugging, use reduced limits so that individual requests are easier to follow in the logs:

embedding_model:
  provider: openai
  api_key: $OPENAI_API_KEY
  model: text-embedding-3-small
  # Reduced limits for testing
  rpm: 10
  tpm: 1000

Performance Tips

Batch Processing

DataVault processes multiple texts at once for better efficiency.
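The batching strategy DataVault uses is not described here. As a sketch of the general idea, the code below greedily packs texts into batches so that each batch stays under a token budget, which reduces the number of requests counted against `rpm`. The function `batch_by_tokens` and the budget value are assumptions for illustration.

```python
def batch_by_tokens(items, max_batch_tokens):
    """Greedily group (text, token_count) pairs into batches under a token budget."""
    batches, current, current_tokens = [], [], 0
    for text, n_tokens in items:
        if n_tokens > max_batch_tokens:
            raise ValueError("a single text exceeds the batch budget")
        # Start a new batch when adding this text would exceed the budget.
        if current and current_tokens + n_tokens > max_batch_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += n_tokens
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    items = [("a", 40), ("b", 50), ("c", 30), ("d", 100)]
    print(batch_by_tokens(items, max_batch_tokens=100))
```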

Cost Monitoring

Regularly monitor your OpenAI usage in the dashboard.

API Key Rotation

Rotate API keys regularly for better security.

Migration from Legacy Models

From ada-002 to text-embedding-3-small

# Old
embedding_model:
  provider: openai
  api_key: $OPENAI_API_KEY
  model: text-embedding-ada-002

# New
embedding_model:
  provider: openai
  api_key: $OPENAI_API_KEY
  model: text-embedding-3-small

Reindexing required: Embeddings produced by different models are not comparable with each other, so after a model change all documents must be reindexed. Plan for appropriate downtime.
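A dimension check alone would not catch a partially migrated index, because text-embedding-ada-002 and text-embedding-3-small both produce 1536-dimensional vectors. One way to find vectors that still need reindexing is to store the model name next to each vector. The sketch below assumes such a record layout. `StoredVector` and `stale_doc_ids` are hypothetical and do not describe DataVault's storage format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredVector:
    doc_id: str
    model: str           # name of the model that produced `values`
    values: List[float]


def stale_doc_ids(index: List[StoredVector], configured_model: str) -> List[str]:
    """Return IDs of documents embedded with a model other than the configured one."""
    return [v.doc_id for v in index if v.model != configured_model]


if __name__ == "__main__":
    index = [
        StoredVector("doc-1", "text-embedding-ada-002", [0.0] * 1536),
        StoredVector("doc-2", "text-embedding-3-small", [0.0] * 1536),
    ]
    print(stale_doc_ids(index, "text-embedding-3-small"))
```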

Enterprise Features

Dedicated Instances

For large enterprises, OpenAI offers dedicated instances:

  • Guaranteed capacity
  • Lower latency
  • Customized rate limits

Compliance

OpenAI meets various compliance standards:

  • GDPR: General Data Protection Regulation
  • SOC 2: Service Organization Control 2
  • Others: Depending on region and plan