Azure OpenAI
Purpose of quotas
To regulate the number of requests, the volume of text tokens, or the amount of computational
power used by OpenAI API users
Quota Allocation
Each subscription receives a quota allocation, assigned per geographic region and per model and
quantified in tokens per minute (TPM)
Rate limit
- tokens per minute (TPM)
- requests per minute (RPM)
Conversion metric
- The RPM ratio is set to 6 RPM for every 1000 TPM available
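As a worked illustration of the conversion above, here is a minimal sketch that derives the RPM limit from a TPM allocation. The function name `rpm_from_tpm` and the sample TPM values are hypothetical, chosen only for this example; the 6-per-1000 ratio is the one stated in these notes.

```python
def rpm_from_tpm(tpm: int, rpm_per_1000_tpm: int = 6) -> int:
    """Return the RPM limit implied by a TPM allocation (6 RPM per 1000 TPM by default)."""
    if tpm < 0:
        raise ValueError("tpm must be non-negative")
    # Integer arithmetic: multiply first, then floor-divide by 1000.
    return (tpm * rpm_per_1000_tpm) // 1000


# Sample allocations (illustrative values, not taken from any real subscription).
for tpm in (1_000, 30_000, 120_000):
    print(f"{tpm} TPM -> {rpm_from_tpm(tpm)} RPM")
```

With this ratio, a 30,000 TPM allocation works out to 180 RPM, so a workload that sends many small requests can hit the RPM limit well before it uses up its token quota.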
Azure API Management (APIM)
Azure Monitor and Log Analytics
Azure OpenAI Monitoring Metrics
- Azure OpenAI requests
- Processed prompt tokens
- Generated completion tokens
- Processed inference tokens
- Processed fine-tuned training hours
References
Learn Live: Monitoring Azure OpenAI
Monitor OpenAI