Hi Marcus,
Thanks for the feedback. You might consider the APIM Standard v2 tier, which offers VNet integration at a lower price point. The Azure Function I'm using handles cost lookups, latency measurement, and real-time quota checks. It's theoretically possible to embed some of that logic in APIM policies, but policy expressions are harder to debug than code written in a full programming language. Also, token usage for streaming responses can show up in Application Insights with a delay, which can pose problems for enforcing quotas in real time at scale.
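To give a rough idea of the per-request logic, here is a minimal Python sketch. It is illustrative only, not the actual Function code: the names (`QuotaTracker`, `lookup_cost`, `handle_request`), the price table, and the in-memory dict are all assumptions, and a real deployment would need a shared store rather than process memory.

```python
import time
from dataclasses import dataclass, field

# Hypothetical per-1K-token prices in USD; real values would come from a pricing source.
PRICE_PER_1K_TOKENS = {"model-a": 0.5, "model-b": 0.25}


@dataclass
class QuotaTracker:
    """Tracks token usage per client in memory (assumption: single process)."""
    limit_tokens: int
    used: dict[str, int] = field(default_factory=dict)

    def check(self, client_id: str, requested_tokens: int) -> bool:
        # Allow the request only if it stays within the client's limit.
        return self.used.get(client_id, 0) + requested_tokens <= self.limit_tokens

    def record(self, client_id: str, tokens: int) -> None:
        self.used[client_id] = self.used.get(client_id, 0) + tokens


def lookup_cost(model: str, tokens: int) -> float:
    """Cost lookup: price per 1K tokens times tokens used."""
    return PRICE_PER_1K_TOKENS[model] * tokens / 1000


def handle_request(tracker, client_id, model, tokens, call_backend):
    # Quota check happens before the backend is called.
    if not tracker.check(client_id, tokens):
        return {"allowed": False, "status": 429}

    # Latency measurement around the backend call.
    start = time.perf_counter()
    call_backend()
    latency_ms = (time.perf_counter() - start) * 1000

    tracker.record(client_id, tokens)
    return {
        "allowed": True,
        "status": 200,
        "cost_usd": lookup_cost(model, tokens),
        "latency_ms": latency_ms,
    }
```

Having this in ordinary code means each step can be unit-tested and stepped through in a debugger, which is the main thing I'd lose by moving it into policies.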
I'm currently using the Flex Consumption plan for Azure Functions. It comes with a generous monthly free grant (250,000 executions and 100,000 GB-s) and supports VNet integration, which keeps it cost-effective for this setup.
Hope that helps clarify things.
Hieu