api design
26 TopicsTake full control of your AI APIs with Azure API Management Gateway
This article is part of a series of articles on API Management and Generative AI. We believe that adding Azure API Management to your AI projects can help you scale your AI models, make them more secure and easier to manage. In this article, we will shed some light on capabilities in API Management, which are designed to help you govern and manage Generative AI APIs, ensuring that you are building resilient and secure intelligent applications. But why exactly do I need API Management for my AI APIs? Common challenges when implementing Gen AI-Powered solutions include: - Quota, (calculated in tokens-per-minute (TPM)), allocation across multiple client apps, How to control and track token consumption for all users, Mechanisms to attribute costs to specific client apps, activities, or users, Your systems resiliency to backend failures when hitting one or more limits And the list goes on with more challenges and questions. Well, let’s find some answers, shall we? Quota allocation Take a scenario where you have more than one client application, and they are talking to one or more models from Azure OpenAI Service or Azure AI Foundry. With this complexity, you want to have control over the quota distribution for each of the applications. Tracking Token usage & Security I bet you agree with me that it would be unfortunate if one of your applications (most likely that which gets the highest traffic), hogs up all the TPM quota leaving zero tokens remaining for your other applications, right? If this occurs though, there is a high chance that it might be a DDOS Attack, with bad actors trying to bombard your system with purposeless traffic causing service downtime. Yet another reason why you will need more control and tracking mechanisms to ensure this doesn’t happen. Token Metrics As a data-driven company, having additional insights with flexibility to dissect and examine usage data down to dimensions like subscription ID or API ID level is extremely valuable. These metrics go a long way in informing capacity and budget planning decisions. Automatic failovers This is a common one. You want to ensure that your users experience zero service downtime, so if one of your backends is down, does your system architecture allow automatic rerouting and forwarding to healthy services? So, how will API Management help address these challenges? API Management has a set of policies and metrics called Generative AI (Gen AI) gateway capabilities, which empower you to manage and have full control of all these moving pieces and components of your intelligent systems. Minimize cost with Token-based limits and semantic caching How can you minimize operational costs for AI applications as much as possible? By leveraging the `llm-token limit` policy in Azure API Management, you can enforce token-based limits per user on identifiers such as subscription keys and requesting IP addresses. When a caller surpasses their allocated tokens-per-minute quota, they receive a HTTP "Too Many Requests" error along with ‘retry-after’ instructions. This mechanism ensures fair usage and prevents any single user from monopolizing resources. To optimize cost consumption for Large Language Models (LLMs), it is crucial to minimize the number of API calls made to the model. Implementing the `llm-semantic-cache-store` policy and `llm-semantic-cache-lookup` policies allow you to store and retrieve similar completions. This method involves performing a cache lookup for reused completions, thereby reducing the number of calls sent to the LLM backend. Consequently, this strategy helps in significantly lowering operational costs. Ensure reliability with load balancing and circuit breakers Azure API Management allows you to leverage load balancers to distribute the workload across various prioritized LLM backends effectively. Additionally, you can set up circuit breaker rules that redirect requests to a responsive backend if the prioritized one fails, thereby minimizing recovery time and enhancing system reliability. Implementing the semantic-caching policy not only saves costs but also reduces system latency by minimizing the number of calls processed by the backend. Okay. What Next? This article mentions these capabilities at a high level, but in the coming weeks, we will publish articles that go deeper into each of these generative AI capabilities in API Management, with examples of how to set up each policy. Stay tuned! Do you have any resources I can look at in the meantime to learn more? Absolutely! Check out: - Manage your Azure OpenAI APIs with Azure API Management http://aka.ms/apimloveHow To: Retrieve from CosmosDB using Azure API Management
In this How To, I will show a simple mechanism for reading items from CosmosDB using Azure API Management (APIM). There are many scenarios where you might want to do this in order to leverage the capabilities of APIM while having a highly scalable, flexible data store.Essentials for building and modernizing AI apps on Azure
Building and modernizing AI applications is complex—but Azure Essentials simplifies the journey. With a structured, three-stage approach—Readiness and Foundation, Design and Govern, Manage and Optimize—it provides tools, best practices, and expert guidance to tackle key challenges like skilled resource gaps, modernization, and security. Discover how to streamline AI app development, enhance scalability, and achieve cost efficiency while driving business value. Ready to transform your AI journey? Explore the Azure Essentials Hub today.How To: Send requests to Azure Storage from Azure API Management
In this How To, I will show a simple mechanism for writing a payload to Azure Blob Storage from Azure API Management. Some examples where this is useful is implementing a Claim-Check pattern for large messages or to support message logging when Application Insights is not suitable.Build a no-code GraphQL service with Azure API Management
Synthetic GraphQL allows you to build a GraphQL API from your existing REST, SOAP, or other HTTP APIs. It allows your API to coexist while you migrate your client applications to GraphQL. Learn how you can leverage Synthetic GraphQL with Azure API Management.Helm charts managed through Terraform to deploy an Azure SecretProviderClass on AKS
Using Terraform and Helm charts will help you reap the benefits of both worlds: Make full use of your teams’ skills. Pass calculated values from your cloud provider without writing them in your code. Manage planned changes that new git commits plan to do before applying them in production.2.1KViews0likes0CommentsHow To: APIM Asynch to Synch Pattern
Just because you can do something, does not mean you should... In this How To post, I will use APIM to turn an asynchronous messaging into a synchronous messaging by publishing a message to Azure Service Bus and retrieving the response using Azure Blob Storage.4.2KViews2likes0Comments