Azure OpenAI Service
Use generative AI to extract structured data out of emails
One thing we regularly hear from clients is that they receive information that is key to their business, such as order requests, via email in an unstructured format, and sometimes there is structured information within the body of those emails in a variety of table formats. In today’s fast-paced digital world, businesses need a way to automatically extract, structure, and integrate this information into their existing applications. Whether it’s leveraging AI-powered document processing, natural language processing (NLP), or intelligent automation, the right approach can transform email-based orders into structured, actionable data.

In this blog, we’ll explore one such scenario where AI can be leveraged to extract information in tabular format that has been provided within an email. The emails contextually belong to a specific domain, but the tables do not have consistent headers or shapes, and sometimes the body of one email contains multiple tables.

The Problem Statement

Extract tabular information with varying table formats from emails.

The typical approach to this problem involves rule-based processing, where individual tables are extracted and merged based on predefined logic. However, given the variety of email formats from hundreds or even thousands of different senders, maintaining such rule-based logic becomes increasingly complex and difficult to manage. A more optimal solution is leveraging the cognitive capabilities of generative AI, which can dynamically adapt to different table structures, column names, and formatting variations, eliminating the need for constant rule updates while improving accuracy and scalability.

To create this sample code, I used the below email with test data, which contains two tables with inconsistent column names. It provides information about some upcoming trainings. Please note the difference between the column headers:

Hi there,

Regarding the upcoming trainings, this is the list:

| Event Date | Description of Event | Length | Grade |
| --- | --- | --- | --- |
| 2025-01-21 | Digital environments | 20 hours | 5 |
| 2025-03-01 | AI for Industry A | 10 hours | 3 |

and some further events in below list

| Date | Subject | Duration | Grade |
| --- | --- | --- | --- |
| 2025-01-21 | Digital environments 2 | 2 days | 1 |
| 2025-03-01 | AI for Industry B | 2 weeks | 4 |

These sessions are designed to be interactive and informative, so your timely participation is crucial. Please make sure to log in or arrive on time to avoid missing key insights. If you have any questions or need assistance, feel free to reach out.

Looking forward to seeing you there!

Thanks,
Azadeh

These are the two tables within the email, and we need to extract one consistent table with all the rows from both.

Table 1

| Event Date | Description of Event | Length | Grade |
| --- | --- | --- | --- |
| 2025-01-21 | Digital environments | 20 hours | 5 |
| 2025-03-01 | AI for Industry A | 10 hours | 3 |

Table 2

| Date | Subject | Duration | Grade |
| --- | --- | --- | --- |
| 2025-01-21 | Digital environments 2 | 2 days | 1 |
| 2025-03-01 | AI for Industry B | 2 weeks | 4 |

To extract the tabular data into one single table in JSON format, I am using Python with the below libraries installed in my environment (an install sketch follows the list):

- pandas
- beautifulsoup4
- openai
- lxml
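The post lists the required packages without the setup step; assuming a standard pip environment, a minimal install sketch is:

```
pip install pandas beautifulsoup4 openai lxml
```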
The Code

I use Azure OpenAI Service with a gpt-4o deployment. The code below is just one way of solving this type of problem and can be customized or improved to fit other, similar problems. I have provided some guidelines about merging the tables and column-name similarity in the user prompt. This sample code uses an email message saved in .eml format in a local path, but the email library has other capabilities to help you connect to a mailbox and fetch the emails.

```
import email
import os

import pandas as pd
from bs4 import BeautifulSoup
from openai import AzureOpenAI

endpoint = os.getenv("ENDPOINT_URL", "https://....myendpointurl....openai.azure.com/")
deployment = os.getenv("DEPLOYMENT_NAME", "gpt-4o")
subscription_key = os.getenv("AZURE_OPENAI_API_KEY", "myapikey")

# Initialize the Azure OpenAI client with key-based authentication
client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=subscription_key,
    api_version="2024-05-01-preview",
)

# Process email content with GPT-4o
def extract_information(email_body, client):
    # Strip the HTML markup and keep only the visible text
    soup = BeautifulSoup(email_body, "html.parser")
    body = soup.get_text()
    print(body)

    # Prepare the chat prompt
    chat_prompt = [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are an AI assistant that is expert in extracting structured data from emails."
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Extract the required information from the following email and format it as JSON, "
                            "and consolidate the tables using the common column names. For example, the columns "
                            "Length and Duration are the same, and the columns Event and Subject are the same:"
                            f"\n\n{body}"
                }
            ]
        }
    ]

    messages = chat_prompt

    # Generate the completion
    completion = client.chat.completions.create(
        model=deployment,
        messages=messages,
        max_tokens=800,
        temperature=0.1,
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None,
        stream=False
    )

    return completion.choices[0].message.content

email_file_name = r'...path to your file....\Test Email with Tables.eml'

with open(email_file_name, "r") as f:
    msg = email.message_from_file(f)

email_body = ""
for part in msg.walk():
    if part.get_content_type() == "text/plain":
        email_body = part.get_payload(decode=True).decode()
    elif part.get_content_type() == "text/html":
        # An HTML part, when present, overwrites the plain-text body
        email_body = part.get_payload(decode=True).decode()

extracted_info = extract_information(email_body, client)
print(extracted_info)
```

The output is:

```
[
    {
        "Event": "Digital environments",
        "Date": "2025-01-21",
        "Length": "20 hours",
        "Grade": 5
    },
    {
        "Event": "AI for Industry A",
        "Date": "2025-03-01",
        "Length": "10 hours",
        "Grade": 3
    },
    {
        "Event": "Digital environments 2",
        "Date": "2025-01-21",
        "Length": "2 days",
        "Grade": 1
    },
    {
        "Event": "AI for Industry B",
        "Date": "2025-03-01",
        "Length": "2 weeks",
        "Grade": 4
    }
]
```

Key points in the code:

- Read an email and extract the body.
- Use a gen AI model with the right instruction prompt to complete the task.
- Gen AI follows the instructions and creates a combined, consistent table.
- Get the output in the right format, e.g. JSON (a sketch for loading it into pandas follows below).
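The script imports pandas but never uses it; a natural next step is loading the model's JSON output into a DataFrame. The sketch below is an assumption on my part rather than part of the original post, and it guards against the model wrapping its JSON in a markdown code fence:

```
import json

import pandas as pd

# extracted_info comes from the code above; models sometimes wrap JSON in a
# markdown code fence, so strip backticks and an optional "json" tag first
raw = extracted_info.strip().strip("`")
if raw.lower().startswith("json"):
    raw = raw[4:]

rows = json.loads(raw)   # A list of row dictionaries
df = pd.DataFrame(rows)  # One consolidated table
print(df.to_string(index=False))
```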
I hope you find this blog post helpful and can apply it to your own use case or domain, or simply take away the idea of using generative AI to solve a problem instead of building layers of custom logic.

Laying the Groundwork: Key Elements for Effective AI Deployment

This post explores the essential components required to build production-ready AI solutions, including the importance of solid architectural foundations, robust data management practices, and responsible AI development. We discuss the complexities of integrating AI into existing systems, the need for continuous evaluation to ensure optimal performance, and the ethical considerations vital for deploying AI responsibly. Whether you're starting your AI journey or looking to refine your approach, this post provides valuable insights into creating scalable, reliable, and ethical AI solutions.
The Future of AI: Harnessing AI for E-commerce - personalized shopping agents

Explore the development of personalized shopping agents that enhance the user experience by providing tailored product recommendations based on uploaded images. Leveraging Azure AI Foundry, these agents analyze images for apparel recognition and generate intelligent product recommendations, creating a seamless and intuitive shopping experience for retail customers.
GenAI Solutions: Elevating Production Apps Performance Through Latency Optimization

As the influence of GenAI-based applications continues to expand, the critical need to enhance their performance becomes ever more apparent. In the realm of production applications, responses are expected within a range of milliseconds to seconds. The integration of Large Language Models (LLMs) can extend the response times of such applications by a few more seconds. This blog explores diverse strategies for optimizing response times in applications that harness Large Language Models on the Azure platform.

Broadly, the following methodologies can be employed to optimize the responsiveness of Generative Artificial Intelligence (GenAI) applications:

- Response optimization of LLM models
- Designing an efficient workflow orchestration
- Improving latency in ancillary AI services

Response Optimization of LLM Models

The inherent complexity and size of Large Language Model (LLM) architectures contribute substantially to the latency observed in any application that integrates them. Prioritizing the optimization of LLM responsiveness is therefore imperative. Let's explore various strategies for enhancing the responsiveness of LLM applications, with particular emphasis on optimizing the model itself. Key factors influencing the latency of LLMs are:

Prompt Size and Output Token Count

A token is a unit of text that the model processes. It can be as short as one character or as long as one word, depending on the model's architecture. For example, in the sentence "ChatGPT is amazing," there are five tokens: ["Chat", "G", "PT", " is", " amazing"]. Each word or sub-word is considered a token, and the model analyzes and generates text based on these units. A helpful rule of thumb is that one token corresponds to ~4 characters of common English text, or roughly ¾ of a word (so 100 tokens ~= 75 words).

A deployment of the GPT-3.5-turbo instance on Azure comes with a rate limit of around 120,000 tokens per minute, equivalent to approximately 2,000 tokens per second. (Details of the TPM limits of each Azure OpenAI model are given here.) It is evident that the quantity of output tokens has a direct impact on the response time of LLMs, and consequently on the application's responsiveness. To optimize application response times, it is recommended to minimize the number of output tokens generated. Set an appropriate value for the max_tokens parameter to limit and control the length of the generated output.

The latency of LLMs is influenced not only by the output tokens but also by the input prompts. Input prompts can be categorized into two main types: instructions, which serve as guidelines for the LLM to follow, and information, which provides a summary or context for the grounded data to be processed by the LLM. While instructions are typically of standard length and crucial for prompt construction, the inclusion of multiple tasks can lead to varied instructions and ultimately increase the overall prompt size. It is advisable to limit prompts to a maximum of one or two tasks to manage prompt size effectively. Additionally, the information or content can be condensed or summarized to optimize the overall prompt length. A quick way to measure prompt size before sending a request is sketched below.
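As a rough way to budget prompts against TPM limits, tokens can be counted locally before a request is sent. The sketch below uses the tiktoken tokenizer, which is my assumption rather than something the post names, and it requires a recent tiktoken release that recognizes the gpt-4o encoding; counts are encoding-specific estimates:

```
import tiktoken

def count_tokens(text, model="gpt-4o"):
    """Estimate how many tokens the text consumes for the given model."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Fall back to a general-purpose encoding for unrecognized model names
        encoding = tiktoken.get_encoding("o200k_base")
    return len(encoding.encode(text))

prompt = "Extract the required information from the following email and format it as JSON."
print(count_tokens(prompt))  # Sanity-check the prompt size before sending it
```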
Model Size

The size of an LLM is typically measured in terms of its parameters. A simple neural network with just one hidden layer has a parameter for each connection between nodes (neurons) across layers and for each node's bias. The more layers and nodes a model has, the more parameters it contains. A larger parameter count usually translates into a more complex model that can capture intricate patterns in the data.

Applications frequently utilize LLMs for various tasks such as classification, keyword extraction, reasoning, and summarization. It is crucial to choose the appropriate model for the specific task at hand. Smaller models like Davinci are well suited for tasks like classification or key-value extraction, offering better accuracy and speed than larger models. Larger models, on the other hand, are more suitable for complex use cases like summarization, reasoning, and chat conversations. Selecting the right model for the task optimizes both efficiency and performance.

Leverage Azure-Hosted LLM Models

Azure AI Studio provides customers with cutting-edge language models like OpenAI's GPT-4, GPT-3, Codex, DALL-E, and Whisper models, as well as other open-source models, all backed by the security, scalability, and enterprise assurances of Microsoft Azure. The OpenAI models are co-developed by Azure OpenAI and OpenAI, ensuring seamless compatibility and a smooth transition between the different models. By opting for Azure OpenAI, customers not only benefit from the security features inherent to Microsoft Azure but also run on the same models employed by OpenAI. The service offers additional advantages such as private networking, regional availability, scalability, and responsible AI content filtering, enhancing the overall experience and reliability of language AI applications. For anyone consuming GenAI models directly from their creators, transitioning to the Azure-hosted versions of these models has yielded notable enhancements in response time. This shift to Azure infrastructure has led to improved efficiency and performance, resulting in more responsive and timely outputs from the models.

Rate Limiting, Batching, and Parallelized API Calls

Large language models are subject to rate limits, such as RPM (requests per minute) and TPM (tokens per minute), which depend on the chosen model and platform. It is important to recognize that rate limiting can introduce latency into the application. To accommodate high-traffic requirements, it is recommended to select an appropriate value for the max_tokens parameter to prevent any occurrence of a 429 error, which can lead to subsequent latency issues. Additionally, it is advisable to implement retry logic in your application to further enhance its resilience.

Effectively managing the balance between RPM and TPM allows for reduced latency through strategies like batching or parallelizing API calls. When you find yourself reaching the upper limit of RPM but remain comfortably within TPM bounds, consolidating multiple requests into a single batch can optimize your response times. This batching approach enables more efficient utilization of the model's token capacity without violating rate limits. Moreover, if your application involves multiple calls to the LLM API, you can achieve a notable speed boost by adopting an asynchronous programming approach that allows requests to be made in parallel (a sketch follows below). This concurrent execution minimizes idle time, enhancing overall responsiveness and making the most of available resources. If the parameters are already optimized and the application requires additional support for higher traffic and a more scalable approach, consider implementing a load-balancing solution through an Azure API Management layer.
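Building on the parallelization advice above, here is a minimal sketch using the AsyncAzureOpenAI client from the openai Python package. The endpoint, key, and deployment name are placeholders, and the linear backoff on 429 responses is an illustrative assumption rather than a production-grade retry policy:

```
import asyncio
import os

from openai import AsyncAzureOpenAI, RateLimitError

client = AsyncAzureOpenAI(
    azure_endpoint=os.environ["ENDPOINT_URL"],   # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],  # placeholder
    api_version="2024-05-01-preview",
)

async def complete(prompt, retries=3):
    """Send one chat completion, pausing and retrying on 429 responses."""
    for attempt in range(retries):
        try:
            response = await client.chat.completions.create(
                model="gpt-4o",  # deployment name; a placeholder
                messages=[{"role": "user", "content": prompt}],
                max_tokens=200,
            )
            return response.choices[0].message.content
        except RateLimitError:
            # Simple linear backoff; tune or replace for production use
            await asyncio.sleep(2 * (attempt + 1))
    raise RuntimeError("Exhausted retries for prompt: " + prompt[:40])

async def main():
    prompts = ["Summarize order A.", "Summarize order B.", "Summarize order C."]
    # Fire all requests concurrently instead of one after another
    results = await asyncio.gather(*(complete(p) for p in prompts))
    for result in results:
        print(result)

asyncio.run(main())
```

A note on the design: gathering the coroutines keeps total wall-clock time close to that of the slowest single request, rather than the sum of all of them.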
Stream Output and Use Stop Sequences

Every LLM endpoint has a particular throughput capacity. As discussed earlier, a GPT-3.5-turbo instance on Azure comes with a rate limit of 120,000 tokens per minute, equivalent to approximately 2 tokens per millisecond. At that rate, an output paragraph of 2,000 tokens takes about 1 second, and the time taken to produce the output response grows as the number of tokens increases. The total time for the output response (latency) can be measured as the time to generate the first token plus the time taken per token from the first token onwards:

Latency = Time to first token + (Time taken per token * Total tokens)

So, to improve perceived latency, we can stream the output as each token is generated instead of waiting for the entire paragraph to finish. Both the completions and chat Azure OpenAI APIs support a stream parameter which, when set to true, streams the response back from the model via Server-Sent Events (SSE). We can use Azure Functions with FastAPI to stream the output of OpenAI models in Azure, as shown in the blog here. A minimal client-side streaming sketch follows below.
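As a client-side illustration of the stream parameter (a hedged sketch only; the Azure Functions with FastAPI approach from the linked blog is a separate, server-side concern), the loop below prints tokens as they arrive, with a stop sequence included to show both ideas from this section's title:

```
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://....myendpointurl....openai.azure.com/",  # placeholder
    api_key="myapikey",                                               # placeholder
    api_version="2024-05-01-preview",
)

stream = client.chat.completions.create(
    model="gpt-4o",  # deployment name
    messages=[{"role": "user", "content": "List three tips for faster LLM apps."}],
    stream=True,   # Tokens arrive via Server-Sent Events as they are generated
    stop=["END"],  # Optional stop sequence that cuts generation short
)

for chunk in stream:
    # On Azure, some chunks (e.g., content-filter annotations) carry no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```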
Designing an Efficient Workflow Orchestration

Incorporating GenAI solutions into applications requires specialized frameworks like LangChain or Semantic Kernel. These frameworks play a crucial role in orchestrating multiple LLM-based tasks and grounding the models on custom datasets. However, it is essential to address the latency these frameworks introduce into the overall application response. To minimize the impact on application latency, a strategic approach is imperative. A highly effective strategy involves optimizing LLM usage through workflow consolidation, either by minimizing the frequency of calls to LLM APIs or by simplifying the overall workflow steps. Streamlining the process not only enhances overall efficiency but also ensures a smoother user experience.

For example, suppose the requirement is to identify the intention of the user query and, based on its context, produce a response grounded on data from multiple sources. Such requirements are often executed as a three-step process: first identify the intent using an LLM, then retrieve the prompt content relevant to that intent from the knowledge base, and finally obtain the LLM output using that content. One simple approach is to leverage data engineering to build a consolidated knowledge base with data from all sources, and then use the input user text directly as the prompt against the grounded data in the knowledge base to get the final LLM response in almost a single step.

Improving Latency in Ancillary AI Services

The supporting AI services such as vector databases, Azure AI Search, data pipelines, and others that complement an LLM-based application within the overall Retrieval-Augmented Generation (RAG) pattern are often referred to as "ancillary AI services." These services play a crucial role in enhancing different aspects of the application, such as data ingestion, searching, and processing, to create a comprehensive and efficient AI ecosystem. For instance, in scenarios where data ingestion plays a substantial role, optimizing the ingestion process becomes paramount to minimizing latency in the application. Let's look at improving a few other such services.

Azure AI Search

Here are some tips for better performance in Azure AI Search:

- Index size and schema: Queries run faster on smaller indexes. One best practice is to periodically revisit index composition, both schema and documents, to look for content-reduction opportunities. Schema complexity can also adversely affect indexing and query performance: excessive field attribution builds in limitations and processing requirements.
- Query design: Query composition and complexity are among the most important factors for performance, and query optimization can drastically improve it (a query-side sketch follows below).
- Service capacity: A service is overburdened when queries take too long or when the service starts dropping requests. To avoid this, you can increase capacity by adding replicas or upgrading the service tier.

For more information on optimizing an Azure AI Search index, please refer here.
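To make the query-design tip concrete, here is a minimal sketch using the azure-search-documents Python SDK. The endpoint, index name, key, and field names are placeholders; selecting only the needed fields and capping the result count keeps the query lean:

```
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint="https://<service>.search.windows.net",  # placeholder
    index_name="trainings-index",                     # placeholder
    credential=AzureKeyCredential("<query-key>"),     # placeholder
)

# Keep queries lean: return only the fields you need, and cap the result count
results = client.search(
    search_text="AI for Industry",
    select=["Event", "Date", "Grade"],
    top=5,
)
for doc in results:
    print(doc["Event"], doc["Date"], doc["Grade"])
```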
For optimizing third-party vector databases, consider exploring techniques such as vector indexing, Approximate Nearest Neighbor (ANN) search (instead of exact KNN), optimizing data distribution, implementing parallel processing, and incorporating load-balancing strategies. These approaches enhance scalability and improve overall performance significantly.

Conclusion

In conclusion, these strategies contribute significantly to mitigating latency and improving response times in large language models. However, given the inherent complexity of these models, the optimal response time can fluctuate between milliseconds and 3-4 seconds. It is crucial to recognize that comparing the response expectations of large language models to those of traditional applications, which typically operate in milliseconds, may not be entirely equitable.

Announcing Model Fine-Tuning Collaborations: Weights & Biases, Scale AI, Gretel and Statsig

As AI continues to transform industries, the ability to fine-tune models and customize them for specific use cases has become more critical than ever. Fine-tuning can enable companies to align models with their unique business goals, ensuring that AI solutions deliver results with greater precision. However, organizations face several hurdles in their model customization journey:

- Lack of end-to-end tooling: Organizations struggle with fine-tuning foundation models due to complex processes and the absence of tracking and evaluation tools for modifications.
- Data scarcity and quality: Limited access to large, high-quality datasets, along with privacy issues and high costs, complicates model training and fine-tuning.
- Shortage of fine-tuning expertise and pre-trained models: Many companies lack specialized knowledge and access to refined models for fine-tuning.
- Insufficient experimentation tools: A lack of tools for ongoing experimentation in production limits optimization of key variables like model diversity and operational efficiency.

To address these challenges, Azure AI Foundry is pleased to announce new collaborations with Weights & Biases, Scale AI, Gretel and Statsig to streamline the process of model fine-tuning and experimentation through advanced tools, synthetic data and specialized expertise.

Weights & Biases integration with Azure OpenAI Service: Making end-to-end fine-tuning accessible with tooling

The integration of Weights & Biases with Azure OpenAI Service offers a comprehensive end-to-end solution for enterprises aiming to fine-tune foundation models such as GPT-4, GPT-4o, and GPT-4o mini. This collaboration provides a seamless connection between Azure OpenAI Service and Weights & Biases Models, which offers powerful capabilities for experiment tracking, visualization, model management, and collaboration. With the integration, users can also utilize Weights & Biases Weave to evaluate, monitor, and iterate in real time on the performance of AI applications powered by their fine-tuned models. Azure's scalable infrastructure allows organizations to handle the computational demands of fine-tuning, while Weights & Biases offers robust capabilities for fine-tuning experimentation and evaluation of LLM-powered applications. Whether optimizing GPT-4o for complex reasoning tasks or using the lightweight GPT-4o mini for real-time applications, the integration simplifies the customization of models to meet enterprise-specific needs.

This collaboration addresses the growing demand for tailored AI models in industries such as retail and finance, where fine-tuning can significantly improve customer-service chatbots or complex financial analysis. The Azure OpenAI Service and Weights & Biases integration is now available in public preview. For further details on the integration, including real-world use cases and a demo, refer to the blog here. A generic sketch of experiment tracking with the wandb API follows below.
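The W&B integration itself is configured through Azure OpenAI Service rather than hand-written code, but as a generic, hedged illustration of experiment tracking with the standard wandb API (the project name, config, and metric values below are invented for the example), an evaluation loop might log metrics like this:

```
import wandb

# Hypothetical project and metrics, for illustration only
run = wandb.init(project="aoai-finetuning", config={"base_model": "gpt-4o-mini", "epochs": 3})

for epoch in range(3):
    # In a real workflow these values would come from your evaluation harness
    wandb.log({
        "epoch": epoch,
        "train_loss": 1.0 / (epoch + 1),
        "eval_accuracy": 0.70 + 0.05 * epoch,
    })

run.finish()  # Flush and close the run so it shows as complete in the W&B UI
```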
Scale AI and Azure Collaboration: Confidently Implement Agentic GenAI Solutions in Production

Scale AI collaborates with Azure AI Foundry to offer advanced fine-tuning and model customization for enterprise use cases. It enhances the performance of Azure AI Foundry models by providing high-quality data transformation, fine-tuning and customization services, end-to-end solution development, and specialized generative AI expertise. This collaboration helps improve the performance of AI-driven applications and Azure AI services such as the Azure AI Agent in Azure AI Foundry, while reducing production time and driving business impact.

"Scale is excited to partner with Azure to help our customers transform their proprietary data into real business value with end-to-end GenAI Solutions, including model fine-tuning and customization in Azure." - Vijay Karunamurthy, Field CTO, Scale AI

Check out a demo in the BRK116 session showcasing how Scale AI's fine-tuned models can improve agents in Azure AI Foundry and Copilot Studio. In the coming months, Scale AI will offer fine-tuning services for Azure AI Agents in Azure AI Foundry. For more details, please refer to this blog and start transforming your AI initiatives by exploring Scale AI on the Azure Marketplace.

Gretel and Azure OpenAI Service Collaboration: Revolutionizing the data pipeline for custom AI models

Azure AI Foundry is collaborating with Gretel, a pioneer in synthetic data and privacy technology, to remove data bottlenecks and bring advanced AI development capabilities to our customers. Gretel's platform enables Azure users to generate high-quality datasets for ML and AI through multiple approaches, from prompts and seed examples to differential-privacy-preserved synthetic data. This technology helps organizations overcome key challenges in AI development, including data availability, privacy requirements, and high development costs, with support for structured, unstructured, and hybrid text data formats.

Through this collaboration, customers can seamlessly generate datasets tailored to their specific use cases and industry needs using Gretel, then use them directly in Azure OpenAI Service for fine-tuning. This integration greatly reduces both costs and time compared to traditional data-labeling methods, while maintaining strong privacy and compliance standards. The collaboration enables new use cases for Azure AI Foundry customers, who can now easily use synthetic data generated by Gretel for training and fine-tuning models. These include cost-effective improvements for Small Language Models (SLMs), improved reasoning abilities of Large Language Models (LLMs), and scalable data generation from limited real-world examples.

This value is already being realized by leading enterprises. "EY is leveraging the privacy-protected synthetic data to fine-tune Azure OpenAI Service models in the financial domain," said John Thompson, Global Client Technology AI Lead at EY. "Using this technology with differential privacy guarantees, we generate highly accurate synthetic datasets—within 1% of real data accuracy—that safeguard sensitive financial information and prevent PII exposure. This approach ensures model safety through privacy attack simulations and robust data quality reporting. With this integration, we can safely fine-tune models for our specific financial use cases while upholding the highest compliance and regulatory standards."

The Gretel integration with Azure OpenAI Service is available now through the Gretel SDK. Explore this blog describing a finance-industry case study, and check out the details in the technical documentation for fine-tuning Azure OpenAI Service models with synthetic data from Gretel.
Visit this page to learn more.

Statsig and Azure Collaboration: Enabling Experimentation in AI Applications

Statsig is a platform for feature management and experimentation that helps teams manage releases, run powerful experiments, and measure the performance of their products. Statsig and Azure AI Foundry are collaborating to enable customers to easily configure and run experiments (A/B tests) in Azure AI-powered applications, using Statsig SDKs in Python, NodeJS and .NET. With these Statsig SDKs, customers can manage the configuration of their AI applications, manage the release of new configurations, run A/B tests to optimize model and application performance, and automatically collect metrics at the model and application level. Please check out this page to learn more about the collaboration, and find detailed documentation here. A hedged sketch of an experiment lookup appears below.
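As a rough, heavily hedged sketch of server-side experimentation with the Statsig Python SDK (the exact SDK surface may differ; the experiment name, parameter name, and deployment names below are invented for illustration), an application might pick a model deployment per user like this:

```
from statsig import statsig
from statsig.statsig_user import StatsigUser

statsig.initialize("server-secret-key")  # placeholder key

user = StatsigUser("user-123")

# Hypothetical experiment that assigns each user a model deployment
experiment = statsig.get_experiment(user, "model_selection_experiment")
deployment = experiment.get("deployment_name", "gpt-4o-mini")

# Route the request to the chosen deployment (client setup as in earlier examples)
print(f"Routing user to deployment: {deployment}")

statsig.shutdown()  # Flush queued exposure events before the process exits
```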
Conclusion

The new collaborations between Azure and Weights & Biases, Scale AI, Gretel and Statsig represent a significant step forward in simplifying the process of AI model customization. These collaborations aim to address the common pain points associated with fine-tuning models, including the lack of end-to-end tooling, data scarcity and privacy concerns, and the lack of expertise and experimentation tooling. Through these collaborations, Azure AI Foundry will empower organizations to fine-tune and customize models more efficiently, ultimately enabling faster, more accurate AI deployments. Whether it's through better model tracking, access to synthetic data, or scalable data-preparation services, these collaborations will help businesses unlock the full potential of AI.

The Future of AI: Customizing AI agents with the Semantic Kernel agent framework

The blog post Customizing AI agents with the Semantic Kernel agent framework discusses the capabilities of the Semantic Kernel SDK, an open-source tool developed by Microsoft for creating AI agents and multi-agent systems. It highlights the benefits of using single-purpose agents within a multi-agent system to achieve more complex workflows with improved efficiency. The Semantic Kernel SDK offers features like telemetry, hooks, and filters to ensure secure and responsible AI solutions, making it a versatile tool for both simple and complex AI projects.