Microsoft Ignite 2024
Introducing Azure AI Agent Service at Microsoft Ignite 2024

Discover how Azure AI Agent Service is revolutionizing the development and deployment of AI agents. This service empowers developers to build, deploy, and scale high-quality AI agents tailored to business needs within hours. With features like rapid development, extensive data connections, flexible model selection, and enterprise-grade security, Azure AI Agent Service sets a new standard in AI automation.

A Roadmap to Microsoft Ignite 2024 for AI Developers at Startups and Cloud-First Companies
Microsoft Ignite starts November 19th. Register now to attend virtually.

Microsoft Ignite, the annual conference where customers, partners, and tech professionals come together to learn about Microsoft technology and AI solutions, kicks off November 19th in Chicago. If you don't already have a pass to attend in person, you can register as a virtual attendee and get access to all the keynotes and breakout sessions.

Generative AI is rapidly evolving. Take, for example, the recent launch of OpenAI's newest models, o1-preview and o1-mini. While earlier models excelled at language tasks such as writing and editing, the new models are designed to handle complex, multistep reasoning tasks such as advanced mathematics, coding, and STEM-based questions. They use a "chain of thought" technique that leads to increased accuracy versus earlier language-based models. Each generation of models continues to deliver considerable performance improvements at lower cost. And that's just OpenAI: there is a plethora of cutting-edge solutions providing models, frameworks, vector databases, LLM observability platforms, and many more tools for developers building GenAI apps. GenAI has spurred a record share of startup funding this year, with 35% of US startup investment going to AI-related companies per Crunchbase data.

All these advancements are making GenAI easier to build, more affordable, and suitable for a wide range of use cases. While a large majority of companies are jumping on the GenAI bandwagon, experimenting with use cases and technologies, many are not yet running GenAI apps at scale in production. Microsoft has been an early innovator in the GenAI wave and has been recognized as a Leader in the 2024 Gartner Magic Quadrant for Cloud AI Developer Services as well as the latest Gartner Magic Quadrant for Cloud Application Platform.
Microsoft Ignite promises to feature exciting announcements and learning opportunities to help you build, deploy, and run your GenAI apps at scale, securely and responsibly. More importantly, you'll also hear from peers at leading startups and at-scale cloud-first companies on how they are innovating and disrupting with GenAI. Here are the top must-watch sessions:

Microsoft Ignite opening keynote (Nov 19, 6:00-8:30am PST). Reason to watch: Satya Nadella, CEO Microsoft, will be joined by Scott Guthrie, EVP Cloud and AI; Charlie Bell, EVP Microsoft Security; and Rajesh Jha, EVP Experiences and Devices, to talk about how this era of AI will unlock new opportunities, transform how developers work, and drive business productivity across industries.

Implementing AI Responsibly and Securely (Nov 19, 12:45-1:30pm PST). Reason to watch: The session addresses startup-specific issues like limited resources, ethical considerations, and building scalable AI solutions, offering practical tips. Annie Pearl, CVP/GM of Azure Experiences and Ecosystems, leads this session featuring Adriano Koshiyama, Co-Founder, Holistic AI, and Sam Dover, Global AI Strategy Lead at Unilever.

How Microsoft can help you build and sell differentiated AI apps (Nov 19, 2:00-2:45pm PST). Reason to watch: Startups and cloud-first companies are looking not just for a cloud platform to build their GenAI apps on, but also a reliable technology partner that can help them grow their business. Jason Graefe, CVP ISV and Digital Natives, is joined by Microsoft leaders for AI, developer services, and application platform, as well as Adam Binnie, Chief Innovation Officer at Visier, and Jeff Zobrist, VP Global Partner Network and GTM at Siemens Digital Industries Software, to dive deep into building on the Microsoft Cloud and partnering with Microsoft to grow your business.

Strategies for enterprises to fast-track AI adoption (Nov 21, 6:30-7:15am PST).
Reason to watch: This fireside chat, led by Tom Davis, Partner at Microsoft for Startups, features Shlomi Avigdor, CPO Dataloop AI; Surojit Chatterjee, Founder and CEO Ema Unlimited; and Gil Perry, CEO and Co-founder D-ID. It delves into strategies and methods from industry-leading startups that have worked hand in hand with global enterprises to expedite the integration and widespread use of artificial intelligence across various industries.

Inspire AI enterprise revolution with innovations from NVIDIA startups (Nov 21, 7:45-8:30am PST). Reason to watch: NVIDIA Inception Head of Cloud Partnerships, Jennifer Hoskins, hosts a power-packed panel including Dr. Vibhor Gupta, CEO Pangaea Data; Lauren Nemeth, COO Pinecone; and Andrew Gostine, CEO Artisight. Gain insights into how AI is driving groundbreaking innovations, enhancing efficiency, and creating new business opportunities across sectors.

Unicorns Unleashed: Scaling Innovation with Microsoft (Nov 21, 12:30-1:00pm PST). Reason to watch: If you are going to be attending in person in Chicago, this panel led by Ross Kennedy, VP Digital Natives, features visionary leaders from leading startups across the globe, including Marc Jones, CTO Shift Technology; Mike Murchison, CEO and Co-founder Ada; and Sergio Passos, CTO Blip.

Best practices for building and monetizing your AI applications (Nov 21, 1:45-2:30pm PST). Reason to watch: This session will offer best practices and hands-on guidance to help maximize your AI product's success, including product strategy, considerations for tech stack, GTM, and even pricing. It features Dave Uppal, VP of Ecosystem at Moveworks, and Alois Reitbauer, Chief Technology Strategist at Dynatrace.

Disrupt and grow with Azure AI (On Demand). Reason to watch: If you are new to Azure, watch this on-demand session to learn about the full stack of Azure services that help you build, run, and scale your GenAI apps.
Anand Kulkarni, CEO Crowdbotics, will talk about their AI-powered app development platform built on Azure.

Azure OpenAI for Startups and Digital-First Companies (On Demand). Reason to watch: While there is a ton of content on Azure OpenAI, this one specifically talks about the challenges facing startups and at-scale cloud-first companies and how Azure OpenAI can help accelerate innovation. Mike Murchison, CEO Ada, will discuss how Ada's AI-powered customer service automation platform built on Azure OpenAI delivers streamlined and differentiated customer service experiences to enterprises.

These are just a few sessions featuring thought leaders from Microsoft and the startup industry. If you are a developer, there is a lot of content across the Azure AI, application platform, database, developer services, and infrastructure stack you can put on your watch list based on the models, use cases, and tools you use. I highly recommend checking out the session catalog to discover the sessions most relevant to you. Popular AI toolchain solutions such as Arize, LlamaIndex, Pinecone, Weights & Biases, and Hugging Face are also featured in the content.

Thousands of startups and at-scale software development companies are innovating with Microsoft. D-ID, for example, provides a Natural User Interface (NUI)-based immersive content platform for businesses specializing in customer experience, marketing, and sales. Enveda, a biotechnology company, uses AI-powered tools to identify and characterize a wide range of molecules produced by living organisms, the vast majority of which have never been explored by science, creating a database of chemical biodiversity. Cribl, the Data Engine for IT and Security, partners with Microsoft to make its data management capabilities available to enterprises to help improve security posture and enhance visibility. This is truly the era of AI, and cloud-first companies are at the forefront of this wave.
Join us at Microsoft Ignite 2024 and let's build the future together! By the way, if you are looking to quickly get started with building apps on Azure for common AI use cases in as little as 5 minutes, I highly recommend checking out the AI App Template Gallery.

Resources:
Microsoft Azure
Microsoft for Startups Founders Hub
Digital Natives on Azure
Build AI applications with the new AI App Template Gallery
Try Azure for free

Enter new era of enterprise communication with Microsoft Translator Pro & document image translation
Microsoft Translator Pro: standalone, native mobile experience

We are thrilled to unveil the gated public preview of Microsoft Translator Pro, our robust solution designed for enterprises seeking to dismantle language barriers in the workplace. Available on iOS, Microsoft Translator Pro offers a standalone, native experience, enabling speech-to-speech translated conversations among coworkers, users, or clients within your enterprise ecosystem.

Watch how Microsoft Translator Pro transforms a hotel check-in experience by breaking down language barriers. In this video, a hotel receptionist speaks in English, and the app translates and plays the message aloud in Chinese for the traveler. The traveler responds in Chinese, and the app translates and plays the message aloud in English for the receptionist.

Key features of the public preview

Our enterprise version of the app is packed with features tailored to meet the stringent demands of enterprises:

Speech-to-speech translation (core feature): Real-time speech-to-speech translation breaks language barriers, letting you communicate seamlessly with individuals speaking different languages.
Unified experience: View or hear both transcription and translation simultaneously on a single device, ensuring smooth and efficient conversations.
On-device translation: Use the app's speech-to-speech translation capability without an internet connection in a limited set of languages, ensuring your productivity remains unhampered.
Full administrator control: Enterprise IT administrators wield extensive control over the app's deployment and usage within your organization. They can fine-tune settings to manage conversation history, audit, and diagnostic logs, with the ability to disable history or configure automatic export of history to cloud storage.
Uncompromised privacy and security: Microsoft Translator Pro provides enterprises with a high level of translation quality and robust security.
We know that privacy and security are top priorities for you. Once granted access by your organization's admin, you can sign in to the app with your organizational credentials. Your conversational data remains strictly yours, safeguarded within your Azure tenant. Neither Microsoft nor any external entities have access to your data.

Join the Preview

To embark on this journey with us, please complete the gating form. Upon meeting the criteria, we will grant your organization access to the paid version of the Microsoft Translator Pro app, which is now available in the US. Learn more and get started: Microsoft Translator Pro documentation.

Document translation translates text embedded in images

Our commitment to advancing cross-language communication takes a major step forward with a new enhancement in Azure AI Translator's Document Translation (DT) feature. Previously, Document Translation supported fully digital documents and scanned PDFs. Starting January 2025, with this latest update, the service can also process mixed-content documents, translating both digital text and text embedded within images.

Sample document translated from English to Spanish (frames in order: source document; translated output document, image not translated; translated output document with image translation).

How It Works

To enable this feature, the Document Translation service now leverages the Microsoft Azure AI Vision API to detect, extract, and translate text from images within documents. This capability is especially useful for scenarios where documents contain a mix of digital text and image-based text, ensuring complete translations without manual intervention.

Getting Started

To take advantage of this feature, customers can use a new optional parameter when setting up a translation request:

Request: A new parameter under "options" called "translateTextWithinImage" has been introduced. This parameter is of type Boolean, accepting "true" or "false."
The default value is "false," so you'll need to set it to "true" to activate the image text translation capability.

Response: When this feature is enabled, the response includes additional details for transparency on image processing:
totalImageScansSucceeded: the count of successfully translated image scans.
totalImageScansFailed: the count of image scans that encountered processing issues.

Usage and cost

For this feature, customers will need to use the Azure AI Services resource, as this new feature leverages Azure AI Vision services along with Azure AI Translator. The OCR service incurs additional charges based on usage. Pricing details for the OCR service can be found here: Pricing details.

Learn more and get started (starting January 2025): Translator Documentation.

These new advancements reflect our dedication to pushing boundaries in Document Translation, empowering enterprises to connect and collaborate more effectively, regardless of language. Stay tuned for more innovations as we continue to expand the reach and capabilities of Microsoft Azure AI Translator.
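To make the request shape concrete, here is a minimal Python sketch of a batch Document Translation payload that opts in to image text translation. Only the "translateTextWithinImage" parameter itself comes from the update described above; the blob container URLs are placeholders, and the exact placement of the "options" object in the batch request should be confirmed against the Translator documentation.

```python
import json

# Sketch of a batch translation payload: translate documents from a source
# container into a Spanish target container, and opt in to translating text
# embedded in images. URLs are placeholders.
payload = {
    "inputs": [
        {
            "source": {"sourceUrl": "https://<account>.blob.core.windows.net/source"},
            "targets": [
                {
                    "targetUrl": "https://<account>.blob.core.windows.net/target-es",
                    "language": "es",
                }
            ],
        }
    ],
    # New optional parameter: defaults to "false", so it must be set
    # explicitly to true to activate image text translation.
    "options": {"translateTextWithinImage": True},
}

print(json.dumps(payload, indent=2))
```

When the feature is enabled, the service response should additionally report totalImageScansSucceeded and totalImageScansFailed, which you can use to verify how many embedded images were actually processed.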
Announcing an accelerator for GenAI-powered assistants using Azure AI Language and Azure OpenAI

We're thrilled to introduce a new accelerator solution in the GitHub Azure-Samples library, designed specifically for creating and enhancing GenAI-based conversational assistants with robust, human-controllable workflows. This accelerator uses key services from Azure AI Language in addition to Azure OpenAI: PII detection to protect sensitive information, Conversational Language Understanding (CLU) to predict users' top intents, and Custom Question Answering (CQA) to respond to top questions with deterministic answers. Together with Azure OpenAI and Large Language Models (LLMs), the solution is designed to orchestrate and deliver a smooth, human-guided, controllable, and deterministic conversational experience. The integration with LLMs is coming soon. It's perfect for developers and organizations looking to build assistants that can handle complex queries, route tasks, and provide reliable answers, all with a controlled, scalable architecture.

Why This Accelerator

While LLMs have been embraced by many customers building conversational assistants for natural, engaging, and context-aware interactions, there are challenges, such as the significant effort required in prompt engineering, document chunking, and reducing hallucinations to improve the quality of Retrieval-Augmented Generation (RAG) solutions. If an AI quality issue is discovered in production, customers need an effective way to address it promptly. This solution aims to help customers utilize offerings in the Azure AI portfolio to address key challenges when building Generative AI (GenAI) assistants. Designed for flexibility and reliability, this accelerator enables human-controllable workflows that meet real-world customer needs.
It minimizes the need for extensive prompt engineering by using a structured workflow that prioritizes top questions with exact answers and custom intents critical to your business, while using an LLM to handle lower-priority topics in a conversation. This architecture not only enhances answer quality and control but also ensures that complex queries are handled efficiently. If you want to quickly fix an incorrect answer for a chatbot built with RAG, you can also attach this accelerator solution to your existing RAG solution and add a QA pair with the correct response in CQA to fix the issue for your users.

What This Accelerator Delivers

This accelerator provides and demonstrates an end-to-end orchestration using several capabilities of Azure AI Language and Azure OpenAI for conversational assistants. It can be applied in various scenarios where control over assistant behavior and response quality is essential, such as call centers, help desks, and other customer support applications. Below is a reference architecture of the solution. Key components of this solution include (components in dashed boxes coming soon):

Client-Side User Interface for Demonstration (coming soon): A web-based client-side interface is included in the accelerator solution to showcase it in an interactive, user-friendly format. This web UI allows you to quickly explore and test the solution, such as its orchestration routing behavior and functionalities.

Workflow Orchestration for Human-Controllable Conversations: By combining services like CLU, CQA, and LLMs, the accelerator allows for a dynamic, adaptable workflow. CLU can recognize and route customer-defined intents, while CQA provides exact answers from predefined QA pairs. If a question falls outside the predefined scope, the workflow can seamlessly fall back to LLMs, enhanced with RAG for contextually relevant, accurate responses.
This workflow ensures human-like adaptability while maintaining control over assistant responses.

Conversational Language Understanding (CLU) for Intent Routing: The CLU service allows you to define the top intents you want the assistant to handle. These can be the intents critical to your business and/or those your users ask about most often. This component plays a central role in directing conversations by interpreting user intents and routing them to the right action or AI agents. Whether completing a task or addressing specific customer needs, CLU provides the mechanism to ensure the assistant accurately understands and executes the handling of custom-defined intents.

Custom Question Answering (CQA) for Exact Answers with No Hallucinations: CQA allows you to create and manage predefined QA pairs to deliver precise responses, reducing ambiguity and ensuring that the assistant aligns closely with defined answers. This controlled response approach maintains consistency in interactions, improving reliability, particularly for high-stakes or regulatory-sensitive conversations. You can also attach CQA to your existing RAG solution to quickly fix incorrect answers.

PII Detection and Redaction for Privacy Protection (coming soon): Protecting user privacy is a top priority, especially in conversational AI. This accelerator showcases an optional integration of Azure AI Language's Personally Identifiable Information (PII) detection to automatically identify and redact sensitive information when compliance with privacy standards and regulations is required.

LLM with RAG to Handle Everything Else (coming soon): In this accelerator, we use a RAG solution to handle missed intents or user queries on lower-priority topics. This RAG solution can be replaced with your existing one. The predefined intents and question-answer pairs can be appended and updated over time based on evolving business needs and DSATs (dissatisfactions) discovered in the RAG responses.
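The routing priority described by these components (exact CQA answers and business-critical CLU intents first, LLM+RAG for everything else) can be sketched in a few lines. This is an illustrative stand-in, not the accelerator's actual code: the function names, the 0.8 confidence threshold, and the stubbed CLU/CQA/RAG implementations are all assumptions.

```python
# Stub QA pairs, as CQA would store them; keys are normalized questions.
CQA_PAIRS = {
    "what are your support hours?": "Support is available 9am-5pm, Mon-Fri.",
}

def cqa_lookup(question):
    """Return a deterministic CQA answer if the question matches a QA pair."""
    return CQA_PAIRS.get(question.strip().lower())

def clu_predict(utterance):
    """Stand-in for a CLU call: return (top intent, confidence)."""
    if "reset" in utterance.lower():
        return ("ResetPassword", 0.92)
    return ("None", 0.0)

def llm_rag_answer(question):
    """Stand-in for the LLM+RAG fallback that handles lower-priority topics."""
    return "[LLM+RAG answer for: " + question + "]"

def route(utterance):
    # 1. Exact, deterministic answers first (no hallucination risk).
    exact = cqa_lookup(utterance)
    if exact is not None:
        return exact
    # 2. Business-critical custom intents via CLU.
    intent, confidence = clu_predict(utterance)
    if intent != "None" and confidence >= 0.8:  # threshold is an assumption
        return "[handler for intent " + intent + "]"
    # 3. Everything else falls back to the LLM+RAG solution.
    return llm_rag_answer(utterance)

print(route("What are your support hours?"))
print(route("I need to reset my password"))
print(route("Tell me about our product roadmap"))
```

A PII redaction step could be slotted in before routing when privacy requirements demand it, matching the optional PII component above.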
This approach ensures controlled and deterministic experiences for high-value or high-priority topics while maintaining flexibility and extensibility for lower-priority interactions.

Components Configuration for "Plug-and-Play"

One of the standout features of this accelerator is its flexibility through a "plug-and-play" component configuration. The architecture is designed to let you easily swap, add, or remove components to tailor the solution to your specific needs. Whether you want to add custom intents, adjust fallback mechanisms, or incorporate additional data sources, the modular nature of the accelerator makes it simple to configure.

Get Started Building Your GenAI-Powered Assistant Today

Our new accelerator is available on GitHub, ready for developers to deploy, customize, and use as a foundation for their own needs. Join us as we move towards a future where GenAI empowers organizations to meet business needs with intelligent, adaptable, and human-controllable assistants.

What's More: Other New Azure AI Language Releases This Ignite

Beyond these, Azure AI Language provides additional capabilities to support GenAI customers in more scenarios, ensuring quality, privacy, and flexible deployment in any type of environment, cloud or on-premises. We are also excited to announce the following new features launching at Ignite.

Azure AI Language in Azure AI Studio: Azure AI Language is moving to AI Studio. Extract PII from text, Extract PII from conversation, Summarize text, Summarize conversation, Summarize for call center, and Text Analytics for health are now available in the AI Studio playground. More skills will follow.

Conversational Language Understanding (CLU): Today, customers use CLU to build custom natural language understanding models hosted by Azure to predict the overall intention of an incoming utterance and extract important information from it. However, some customers have specific needs that require an on-premises connection.
We are excited to announce runtime containers for CLU for these specific use cases.

PII Detection and Redaction: Azure AI Language offers Text PII and Conversational PII services to extract personally identifiable information from input text and conversations to enhance privacy and security, often before sending data to the cloud or an LLM. We are excited to announce new improvements to these services: the preview API (version 2024-11-15-preview) now supports the option to mask detected sensitive entities with a label. For example, "John Doe received a call from 424-878-9192" can now be masked with entity labels: "[PERSON_1] received a call from [PHONENUMBER_1]". More on how to specify the redaction policy style for your outputs can be found in our documentation.

Native document support: The gating control is removed with the latest API version, 2024-11-15-preview, allowing customers to access native document support for PII redaction and summarization. Key updates in this version include:
- Increased maximum file size limits (from 1 MB to 10 MB).
- Enhanced PII redaction customization: customers can now specify whether they want only the redacted document or both the redacted document and a JSON file containing the detected entities.

Language detection: Language detection is a preconfigured feature that can detect the language a document is written in and returns a language code for a wide range of languages, variants, dialects, and some regional/cultural languages. We are happy to announce the general availability of script detection capability, along with support for 16 more languages, bringing the total to 139 supported languages.

Named entity recognition (NER): The Named Entity Recognition (NER) service supports customer scenarios for identifying and analyzing entities such as addresses, names, and phone numbers from input text.
NER's generally available API (version 2024-11-01) now supports several optional input parameters (inclusionList, exclusionList, inferenceOptions, and overlapPolicy) as well as an updated output structure (with new fields tags, type, and metadata) to enable enhanced user customization and deeper analysis. More on how to use these parameters can be found in our documentation.

Text analytics for health: Text analytics for health (TA4H) is a preconfigured feature that extracts and labels relevant medical information from unstructured texts such as doctor's notes, discharge summaries, clinical documents, and electronic health records. Today, we released support for Fast Healthcare Interoperability Resources (FHIR) structuring and temporal assertion detection in the generally available API.

Exciting Enhancements in Azure OpenAI Service: Provisioned Deployment and Global Standard Support
Provisioned Deployment Support for Fine-Tuned Models

Starting in December, Azure OpenAI Service will offer Provisioned deployment support for fine-tuned models. This new feature ensures that your AI workloads receive the necessary compute resources, providing fast and consistent results regardless of demand. Whether you're building a chatbot to handle thousands of users or running complex batch processes, this feature can elevate your application to the next level.

Models supported: GPT-4o, GPT-4o-mini
Regions supported (public preview): North Central US and Sweden Central

What is Provisioned Throughput?

Imagine hosting a party where you have a private table just for your guests, separate from the general buffet. Provisioned Throughput is similar for your AI models: it reserves dedicated compute resources in Azure OpenAI Service, so your models don't have to compete with others for power, resulting in consistent performance.

What's New About This Offering?

Provisioned deployment for fine-tuned models offers the same features as the existing Provisioned deployment for base models, with the added option to deploy a fine-tuned model with the Provisioned SKU. Overall, provisioned deployment for fine-tuning offers customers consistent performance, predictable pricing, cost-effective scaling, and the ability to handle high-throughput requirements, making it a valuable option for those looking to optimize their AI model deployments.

How to Get Started?

The public preview will be available in early December. Stay tuned for more updates!

Global Standard Support for Fine-Tuned Model Inferencing

We are thrilled to announce that Global Standard deployment support for Azure OpenAI Service fine-tuned model inferencing will be available as a public preview starting early December. This launch provides greater flexibility and higher availability for our valued customers.

What Motivated Us?
Until now, Azure OpenAI Service fine-tuning was limited to specific regions, creating challenges in managing resources and capacity for training and inferencing. Customers often had to copy their fine-tuned models across different regions, which was not ideal. We understand the inconvenience and are excited to address it with this new feature.

What's in It for You?

Expanded regional support: Azure OpenAI fine-tuning capabilities will now be available in additional regions, including East US, West US 3, and various European regions.
Increased flexibility: You will no longer be constrained by regional capacity limits, allowing for seamless growth in your fine-tuned model usage.
Enhanced backup and disaster recovery: Improved backup and disaster recovery options will be available with multi-LoRA endpoints for a base model enabled across multiple regions.

The launch of Provisioned deployment support and Global Standard support for Azure OpenAI fine-tuned model inferencing marks a significant step forward. By ensuring consistent AI performance and addressing regional capacity limits, we aim to provide you with a smoother, more flexible experience. We're excited to see the positive impact these updates will have on your operations. Stay tuned for more updates as we continue to innovate and expand our services. Thank you for being on this journey with us!

Ready to Get Started?
Learn more about Azure OpenAI Service
Watch this Ignite session about new fine-tuning capabilities in Azure OpenAI Service
Check out our How-To Guide for Fine-Tuning with Azure OpenAI
Try it out with Azure AI Foundry

Transforming Video into Value with Azure AI Content Understanding
Unlocking Value from Unstructured Video

Every minute, social video sharing platforms see over 500 hours of video uploads [1], and 91% of businesses leverage video as a key tool [2]. From media conglomerates managing extensive archives to enterprises producing training and marketing materials, organizations are overwhelmed with video. Yet, despite this abundance, video remains inherently unstructured and difficult to utilize effectively. While the volume of video content continues to grow exponentially, its true value often remains untapped due to the friction involved in making video useful. Organizations grapple with several pain points:

Inaccessibility of Valuable Content Archives: Massive video archives sit idle because finding the right content to reuse requires extensive manual effort.
The Impossibility of Personalization Without Metadata: Personalization holds the key to unlocking new revenue streams and increasing engagement. However, without reliable and detailed metadata, it is cost-prohibitive to tailor content to specific audiences or individuals.
Missed Monetization Opportunities: For media companies, untapped archives mean missed chances to monetize content through new formats or platforms.
Operational Bottlenecks: Enterprises struggle with slow turnaround times for training materials, compliance checks, and marketing campaigns due to inefficient video workflows, leading to delays and increased expenses.

Many video processing applications rely on purpose-built, frame-by-frame analysis to identify objects and key elements within video content. While this method can detect a specific list of objects, it is inherently lossy, struggling to capture actions, events, or uncommon objects. It is also expensive and time-consuming to customize for specific tasks. Generative AI promises to revolutionize video content analysis, with GPT-4o topping leaderboards for video understanding tasks, but finding a generative model that processes video is just the first step.
Creating video pipelines with generative models is hard. Developers must invest significant effort in infrastructure to create custom video processing pipelines that get good results. These systems need optimized prompts, integrated transcription, smart handling of context-window limitations, shot-aligned segmentation, and much more. This makes them expensive to optimize and hard to maintain over time.

Introducing Azure AI Content Understanding for video

This is where Azure AI Content Understanding transforms the game. By offering an integrated video pipeline that leverages advanced foundational models, you can effortlessly extract insights from both the audio and visual elements of your videos. This service transforms unstructured video into structured, searchable knowledge, enabling powerful use cases like media asset management and highlight reel generation. With Content Understanding, you can automatically identify key moments in a video to extract highlights and summarize the full context. For example, for corporate events and conferences you can quickly produce same-day highlight reels. This capability not only reduces the time and cost associated with manual editing but also empowers organizations to deliver timely, professional reaction videos that keep audiences engaged and informed. In another case, a news broadcaster can create a personalized viewing experience for news by recommending stories of interest. This is achieved by automatically tagging segments with relevant metadata like topic and location, enabling the delivery of content personalized to individual interests and driving higher engagement and viewer satisfaction. By generating specific metadata on a segment-by-segment basis, including chapters, scenes, and shots, Content Understanding provides a detailed outline of what's contained in the video, facilitating these workflows.
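As a sketch of what requesting segment-level metadata might look like, the following defines a custom field schema asking for a summary, topics, and a highlight flag per segment. The field names, the "prebuilt-videoAnalyzer" base id, and the overall schema layout are illustrative assumptions, not the service's documented contract; consult the Content Understanding documentation for the actual analyzer definition format.

```python
import json

# Illustrative analyzer definition for segment-level video metadata.
# Field names and schema layout are assumptions for illustration only.
analyzer_definition = {
    "description": "Segment-level metadata for highlight-reel workflows",
    "baseAnalyzerId": "prebuilt-videoAnalyzer",  # hypothetical prebuilt id
    "fieldSchema": {
        "fields": {
            "segmentSummary": {
                "type": "string",
                "description": "One-sentence summary of what happens in this segment",
            },
            "topics": {
                "type": "array",
                "description": "Topics discussed or shown in this segment",
            },
            "isHighlightCandidate": {
                "type": "boolean",
                "description": "Whether this segment is a strong highlight-reel candidate",
            },
        }
    },
}

print(json.dumps(analyzer_definition, indent=2))
```

Because the extracted fields come back per segment, downstream code can filter on a flag like isHighlightCandidate to assemble a highlight reel automatically.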
This is enabled by a streamlined pipeline for video that starts with content extraction tasks like transcription, shot detection, key frame extraction, and face grouping to create grounding data for analysis. Then, generative models use that information to extract the specific fields you request for each segment of the video. This generative field extraction capability enables customers to:

Customize Metadata: Tailor the extracted information to focus on elements important to your use case, such as key events, actions, or dialogues.
Create Detailed Outlines: Understand the structure of your video content at a granular level.
Automate Repetitive Editing Tasks: Quickly pinpoint important segments to create summaries, trailers, or compilations that capture the essence of the full video.

By leveraging these capabilities, organizations can automate many video creation tasks, including creating highlight reels and repurposing content across formats, saving time and resources while delivering compelling content to their audiences. Whether it's summarizing conference keynotes, capturing the essence of corporate events, or showcasing the most exciting moments in sports, Azure AI Content Understanding makes video workflows efficient and scalable. But how do these solutions perform in real-world scenarios?

Customer Success Stories

IPV Curator: Transforming Media Asset Management

IPV Curator, a leader in media asset management solutions, assists clients in managing and monetizing extensive video libraries across various industries, including broadcast, sports, and global enterprises. It enables seamless, zero-download editing of video in the Azure cloud using Adobe applications. Their customers needed an efficient way to search, repurpose, and produce vast amounts of video content, with data extraction tailored to specific use cases. IPV integrated Azure AI Content Understanding into their Curator media asset management platform.
They found that it provided a step-function improvement in metadata extraction for their clients. It was particularly beneficial as it enabled:

- Industry-Specific Metadata: Allowed clients to extract metadata tailored to their specific needs by using simple prompts, without the need for domain-specific training of new AI models. For example:
  - Broadcast: Rapidly identified key scenes for promo production and efficiently surfaced the highest-value content for free ad-supported streaming TV (FAST) channels.
  - Travel Marketing Content: Automatically tagged geographic locations, landmarks, shot types (e.g., aerial, close-up), and highlighted scenic details.
  - Shopping Channel Content: Detected specific products, demo segments, product categories, and key selling points.
- Advanced Action and Event Analysis: Enabled detailed analysis of a set of frames in a video segment to identify actions and events, providing a new level of insight compared to frame-by-frame analysis of objects.
- Segmentation Aligned to Shots: Detected shot boundaries in produced videos and in-media edit points, enabling easy reuse by capturing full shots in segments.

As a result, IPV's clients can quickly find and repurpose content, significantly reducing editing time and accelerating video production at scale.

IPV Curator enables search across industry-specific metadata extracted from videos

"IPV's collaboration with Microsoft transforms media stored in Azure into an easily accessible, streaming, and highly searchable active archive. The powerful search engine within IPV's new generation of Media Asset Management uses Azure AI Content Understanding to accurately surface any archived video clip, driving users to their highest value content in seconds." —Daniel Mathew, Chief Revenue Officer, IPV

Cognizant: Innovative Ad Moderation

Cognizant, a global leader in consulting and professional services, has identified the challenge of moderating advertising content for its media customers.
Their customers' traditional methods rely heavily on manual review and struggle to scale with the increasing volume of content requiring assessment. The Cognizant Ad Moderation solution framework leverages Content Understanding to create a more accurate, cost-effective approach to ad moderation, resulting in a 96% reduction in review time. It allows customers to automate ad reviews to ensure cultural sensitivity and regulatory compliance and to optimize programming placement, ultimately reducing manual review efforts. Cognizant achieves these results by leveraging Content Understanding for multimodal field extraction, tailored output, and native generative AI video processing.

- Multimodal Field Extraction: Extracts key attributes from both the audio and visual elements, allowing for a more comprehensive analysis of the content. This analysis is critical to get a holistic view of suitability for various audiences.
- Tailored Output Schema: Outputs a custom structured schema that detects content directly relevant to the moderation task. This includes detecting specific risky attributes like prohibited language, potentially banned topics, violations of content restrictions, and sensitive products like alcohol or smoking.
- Native Generative AI Video Processing: Content Understanding natively processes video files with generative AI to provide the detailed insights requested in the schema, capturing context, actions, and events over entire segments of the video.

This optimized video pipeline provides Cognizant with a detailed analysis of videos to ground an automated decision. It allows them to quickly green-light compliant ads and flag others for rejection or human review. Content Understanding empowers Cognizant to focus on solving business challenges rather than managing the underlying infrastructure for video processing and integrating generative models.

"I'm absolutely thrilled about the Azure AI Content Understanding service!
It's a game-changer that accelerates processing by integrating multiple AI capabilities into a single service call, delivering combined audio and video transcription in one JSON output with incredibly detailed results. The ability to add custom fields that integrate with an LLM provides even more detailed, meaningful, and flexible output." – Rushil Patel, Developer @ Cognizant

The Broader Impact: Transformation across industries

The transformative power of Azure AI Content Understanding extends far beyond these specific use cases, offering significant benefits across various industries and workflows. By leveraging advanced AI capabilities on video, organizations have been able to unlock new opportunities and drive innovation in several key areas:

- Social Media Listening and Consumer Insights: Analyze video content across social platforms to understand how products are perceived and discussed online. Gain valuable consumer insights to inform product development, marketing strategies, and brand management.
- Unlocking Video for AI Assistants and Agents: Enable AI assistants and agents to access and utilize information from video content, transforming meeting recordings, training videos, and events into valuable data sources for Retrieval-Augmented Generation (RAG). Enhance customer support and knowledge management by integrating video insights into AI-driven interactions.
- Enhancing Accessibility with Audio Descriptions: Generate draft audio descriptions for video content to provide a starting point for human editors. This streamlines the creation of accessible content for visually impaired audiences, reducing effort and accelerating compliance with accessibility standards.
- Marketing and Advertising Workflows: Automate content analysis to ensure brand alignment and effective advertising. Understand and optimize the content within video advertisements to maintain consistent branding and enhance audience engagement.
The business value of Azure AI Content Understanding is clear. By addressing core challenges in video content management with generative AI, customization, and native video processing, it enhances operational efficiencies and unlocks new opportunities for monetization and innovation. Organizations can now turn dormant video archives into valuable assets, deliver personalized content to engage audiences effectively, and automate manual, time-consuming workflows.

Ready to Transform Your Video Content?

- For more details on how to use Content Understanding for video, check out the Video Solution Overview.
- If you are at Microsoft Ignite 2024 or are watching online, check out this breakout session.
- Try this new service in Azure AI Foundry.
- For documentation, please refer to the Content Understanding Overview.
- For a broader perspective, see Announcing Azure AI Content Understanding: Transforming Multimodal Data into Insights and discover how it extends these capabilities across all content formats.

-----
[1] According to Statista in 2022 - Hours of video uploaded every minute 2022 | Statista
[2] According to a Wyzowl survey in 2024 - Video Marketing 2024 (10 Years of Data) | Wyzowl

Announcing Azure AI Content Understanding: Transforming Multimodal Data into Insights
Solve Common GenAI Challenges with Content Understanding

As enterprises leverage foundation models to extract insights from multimodal data and develop agentic workflows for automation, it's common to encounter issues like inconsistent output quality, ineffective pre-processing, and difficulties in scaling out the solution. Organizations often find that to handle multiple types of data, the effort is fragmented by modality, increasing the complexity of getting started. Azure AI Content Understanding is designed to eliminate these barriers, accelerating success in generative AI workflows.

- Handling Diverse Data Formats: By providing a unified service for ingesting and transforming data of different modalities, businesses can extract insights from documents, images, videos, and audio seamlessly and simultaneously, streamlining workflows for enterprises.
- Improving Output Data Accuracy: Deriving high-quality output for their use cases requires practitioners to ensure the underlying AI is customized to their needs. Using advanced AI techniques like intent clarification and a strongly typed schema, Content Understanding can effectively parse large files to extract values accurately.
- Reducing Costs and Accelerating Time-to-Value: Using confidence scores to trigger human review only when needed minimizes the total cost of processing the content. Integrating the different modalities into a unified workflow and grounding the content when applicable allows for faster reviews.

Core Features and Advantages

Azure AI Content Understanding offers a range of innovative capabilities that improve efficiency, accuracy, and scalability, enabling businesses to unlock deeper value from their content and deliver a superior experience to their end users.

- Multimodal Data Ingestion and Content Extraction: The service ingests a variety of data types such as documents, images, audio, and video, transforming them into a structured format that can be easily processed and analyzed.
It instantly extracts core content from your data, including transcriptions, text, faces, and more.
- Data Enrichment: Content Understanding offers additional features that enhance content extraction results, such as layout elements, barcodes, and figures in documents, speaker recognition and diarization in audio, and more.
- Schema Inferencing: The service offers a set of prebuilt schemas and allows you to build and customize your own to extract exactly what you need from your data. Schemas allow you to extract a variety of results, generating task-specific representations like captions, transcripts, summaries, thumbnails, and highlights. This output can be consumed by downstream applications for advanced reasoning and automation.
- Post Processing: Enhances service capabilities with generative AI tools that ensure the accuracy and usability of extracted information. This includes providing confidence scores for minimal human intervention and enabling continuous improvement through user feedback.

Transformative Applications Across Industries

Azure AI Content Understanding is ideal for a wide range of use cases and industries, as it is fully customizable and allows for the input of data from multiple modalities. Here are just a few examples of scenarios Content Understanding is powering today:

- Post-call analytics: Customers utilize Azure AI Content Understanding to extract analytics on call center or recorded meeting data, allowing you to aggregate data on the sentiment, speakers, and content discussed, including specific names, companies, user data, and more.
- Media asset management and content creation assistance: Extract key features from images and videos to better manage media assets and enable search on your data for entities like brands, settings, key products, people, and more.
- Insurance claims: Analyze and process insurance claims and other low-latency batch processing scenarios to automate previously time-intensive processes.
- Highlight video reel generation: With Content Understanding, you can automatically identify key moments in a video to extract highlights and summarize the full content. For example, automatically generate a first draft of highlight reels from conferences, seminars, or corporate events by identifying key moments and significant announcements.
- Retrieval Augmented Generation (RAG): Ingest and enrich content of any modality to effectively find answers to common questions in scenarios like customer service agents, or power content search across all types of data.

Customer Success with Content Understanding

Customers all over the world are already finding unique and powerful ways to accelerate their inferencing and unlock insights on their data by leveraging the multimodal capabilities of Content Understanding. Here are a few examples of how customers are unlocking greater value from their data:

Philips: Philips Speech Processing Solutions (SPS) is a global leader in dictation and speech-to-text solutions, offering innovative hardware and software products that enhance productivity and efficiency for professionals worldwide. Content Understanding enables Philips to power their speech-to-result solution, allowing customers to use voice to generate accurate, ready-to-use documentation.

"With Azure AI Content Understanding, we're taking Philips SpeechLive, our speech-to-result solution, to a whole new level. Imagine speaking, and getting fully generated, accurate documents—ready to use right away, thanks to powerful AI speech analytics that work seamlessly with all the relevant data sources." – Thomas Wagner, CTO, Philips Dictation Services

WPP: WPP, one of the world's largest advertising and marketing services providers, is revolutionizing website experiences using Azure AI Content Understanding.
SJR, a content tech firm within WPP, is leveraging this technology for the SJR Generative Experience Manager (GXM), which extracts data from all types of media on a company's website—including text, audio, video, PDFs, and images—to deliver intelligent, interactive, and personalized web experiences, with the support of WPP's AI technology company, Satalia. This enables them to convert static websites into dynamic, conversational interfaces, unlocking information buried deep within websites and presenting it as if spoken by the company's most knowledgeable salesperson. Through this innovation, WPP's SJR is enhancing customer engagement and driving conversion for their clients.

ASC: ASC Technologies is a global leader in providing software and cloud solutions for omni-channel recording, quality management, and analytics, catering to industries such as contact centers, financial services, and public safety organizations. ASC utilizes Content Understanding to enhance their compliance analytics solution, streamlining processes and improving efficiency.

"ASC expects to significantly reduce the time-to-market for its compliance analytics solutions. By integrating all the required capture modalities into one request, instead of customizing and maintaining various APIs and formats, we can cover a wide range of use cases in a much shorter time." – Tobias Fengler, Chief Engineering Officer

Numonix: Numonix AI specializes in capturing, analyzing, and managing customer interactions across various communication channels, helping organizations enhance customer experiences and ensure regulatory compliance. They are leveraging Content Understanding to capture insights from both the audio and video of recorded calls, transcribing, analyzing, and summarizing the contents of calls and meetings, allowing them to ensure compliance across all conversations.
“Leveraging Azure AI Content Understanding across multiple modalities has allowed us to supercharge the value of the recorded data Numonix captures on behalf of our customers, enabling smarter communication compliance and security in the financial industry and fully automating quality management in the world’s largest call centers.” – Evan Kahan, CTO & CPO, Numonix

IPV Curator: A leader in media asset management solutions, IPV is leveraging Content Understanding to improve its metadata extraction capabilities, producing stronger industry-specific metadata, advanced action and event analysis, and video segmentation aligned to specific shots. IPV's clients are now able to accelerate their video production, reduce editing time, and access their content more quickly and easily. To learn more about how Content Understanding empowers video scenarios, and how customers such as IPV are using the service to power their unique media applications, check out Transforming Video Content into Business Value.

Robust Security and Compliance

Built using Azure's industry-leading enterprise security, data privacy, and Responsible AI guidelines, Azure AI Content Understanding ensures that your data is handled with the utmost care and compliance and generates responses that align with Microsoft's principles for responsible use of AI. We are excited to see how Azure AI Content Understanding will empower organizations to unlock their data's full potential, driving efficiency and innovation across various industries. Stay tuned as we continue to develop and enhance this groundbreaking service.

Getting Started

- If you are at Microsoft Ignite 2024 or are watching online, check out this breakout session on Content Understanding.
- Learn more about the new Azure AI Content Understanding service here.
- Build your own Content Understanding solution in the Azure AI Foundry.
For all documentation on Content Understanding, please refer to this page.

GitHub Models: Retrieval Augmented Generation (RAG)
We recently announced that we're launching GitHub Models, enabling more than 100 million GitHub developers to become AI developers and build with industry-leading AI models. GitHub Models opens the door for you to rapidly go from idea to code to cloud, simplifying model experimentation and selection across the best of the Azure AI catalog.

Today, we're announcing Retrieval Augmented Generation (RAG), powered by Azure AI Search, for GitHub Models. Coming soon to public beta, GitHub Models RAG simplifies the development of user-friendly, high-quality RAG applications. With an intuitive interface and seamless integration within GitHub, you can effortlessly create data-grounded applications with your own data. Your RAG indexes automatically take advantage of Azure AI Search's capabilities – hybrid text and vector retrieval, semantic ranking, integrated vectorization, and more – right out of the box.

Key features you'll love:

- Playground for Experimentation: For the first time, you can easily ground your model with your own data by uploading files directly within the Playground. This intuitive setup lets you quickly experiment with RAG.
- Advanced Scenarios in Code: With only your GitHub personal access token (PAT), explore ready-to-use code samples and dive into more advanced RAG and retrieval scenarios, whether in Codespaces or your preferred development tools.
- Free, Full-Featured Azure AI Search Service: With only your GitHub credentials, get started with Azure AI Search at no cost. It comes auto-provisioned for you without an Azure subscription. This free version provides a growth path for expanding into full production as your needs evolve.
- Full Azure AI Search Query API: Enjoy complete access to the Azure AI Search query API, including powerful features like vectors, hybrid search, and semantic ranker – all within GitHub Models.
From Playground to Code: Frictionless Grounding with Your Data

Traditional standard models can present significant challenges, relying on static, outdated knowledge that often results in inaccurate, generalized responses lacking the specificity your projects require. Customizing these models for your needs is both time-consuming and costly, and requires navigating multiple tools and workflows, which adds unnecessary complexity.

GitHub Models RAG transforms this experience by enabling your models to access up-to-date information, retrieving current, relevant documents as needed. This approach reduces reliance on outdated data, providing contextually accurate answers grounded in your data – without the need for frequent retraining. You can start in the GitHub Models playground at no cost and easily transition to code for more advanced scenarios. With only your GitHub PAT, you can experiment with RAG building blocks, tune retrieval and response accuracy with Azure AI Search features like vectors, hybrid search, and semantic ranker, and set up scalable solutions tailored to your domain.

```python
import os
import json

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import *
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import QueryType, VectorizableTextQuery

# To authenticate with the model you will need to generate a personal access
# token (PAT) in your GitHub settings. Create your PAT token by following the
# instructions here:
# https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
MODELS_ENDPOINT = "https://models.inference.ai.azure.com"

client = ChatCompletionsClient(
    endpoint=MODELS_ENDPOINT,
    credential=AzureKeyCredential(GITHUB_TOKEN),
)

# Your free Azure AI Search endpoint will be auto-provisioned
SEARCH_ENDPOINT = "https://[AzureAISearchServiceName].search.windows.net"

search_client = SearchClient(
    endpoint=SEARCH_ENDPOINT,
    index_name="default-vector-index",
    credential=AzureKeyCredential(GITHUB_TOKEN),
)


def search_documents(query: str):
    """Search documents matching a query given as a series of keywords."""
    # Search for documents that are semantically similar to the query
    r = search_client.search(
        search_text=query,
        vector_queries=[
            VectorizableTextQuery(
                text=query, k_nearest_neighbors=50, fields="text_vector"
            )
        ],
        query_type=QueryType.SEMANTIC,
        top=5,
        select=["chunk_id", "title", "chunk"],
    )
    # Create a JSON structure for the tool output
    results = [
        f"title: {doc['title']}\n" + f"content: {doc['chunk']}".replace("\n", " ")
        for doc in r
    ]
    # Return the JSON structure as a string
    return json.dumps(results)


search_tool = ChatCompletionsToolDefinition(
    function=FunctionDefinition(
        name=search_documents.__name__,
        description=search_documents.__doc__,
        parameters={
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "a series of keywords to use as query to search documents",
                }
            },
            "required": ["query"],
        },
    )
)

# Start a loop to interact with the model
agent_returned = False
messages = [
    SystemMessage(
        content="""You are an assistant that answers users questions about documents.
You are given a tool that can search within those documents.
Use it systematically to provide the best answers to the user's questions.
If you do not find the information, just say you do not know."""
    ),
    UserMessage(content="Can I claim an ambulance?"),
]
print(f"User> {messages[-1].content}")

while not agent_returned:
    # Ask the model for its next step, offering the search tool
    response = client.complete(
        messages=messages,
        tools=[search_tool],
        model="gpt-4o",
        temperature=1,
        max_tokens=4096,
        top_p=1,
    )
    # We expect the model to ask for a tool call
    if response.choices[0].finish_reason == CompletionsFinishReason.TOOL_CALLS:
        # Append the model response to the chat history
        messages.append(
            AssistantMessage(tool_calls=response.choices[0].message.tool_calls)
        )
        # There might be multiple tool calls to run in parallel
        if response.choices[0].message.tool_calls:
            for tool_call in response.choices[0].message.tool_calls:
                # We expect the tool to be a function call
                if isinstance(tool_call, ChatCompletionsToolCall):
                    # Parse the function call arguments and call the function
                    function_args = json.loads(
                        tool_call.function.arguments.replace("'", '"')
                    )
                    print(
                        f"System> Calling function `{tool_call.function.name}` "
                        f"with arguments {function_args}"
                    )
                    callable_func = locals()[tool_call.function.name]
                    function_return = callable_func(**function_args)
                    print("System> Function returned.")
                    # Append the function call result to the chat history
                    messages.append(
                        ToolMessage(
                            tool_call_id=tool_call.id, content=function_return
                        )
                    )
                else:
                    raise ValueError(
                        f"Expected a function call tool, instead got: {tool_call}"
                    )
    else:
        agent_returned = True
        print(f"Assistant> {response.choices[0].message.content}")
```

The code snippet can be accessed in the code tab of the GitHub Models playground. This provides a convenient way to initiate quick experimentation, allowing you to obtain relevant responses tailored to your specific data. All you need is your GitHub PAT.
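To make the shape of that tool output concrete, here is a small offline sketch with stubbed search results (the document contents below are invented for illustration). It mirrors the formatting inside `search_documents` without needing a live index:

```python
import json

# Stubbed search results (invented for illustration) in the shape returned by
# the "chunk_id"/"title"/"chunk" select clause used in search_documents.
fake_docs = [
    {"chunk_id": "1", "title": "Benefits FAQ",
     "chunk": "Ambulance rides are\ncovered up to $500."},
    {"chunk_id": "2", "title": "Claims Guide",
     "chunk": "Submit claims within 90 days."},
]

def format_tool_output(docs):
    """Mirror the formatting in search_documents: flatten newlines in the
    content and serialize the title/content pairs as a JSON string."""
    results = [
        f"title: {doc['title']}\n" + f"content: {doc['chunk']}".replace("\n", " ")
        for doc in docs
    ]
    return json.dumps(results)

payload = format_tool_output(fake_docs)
print(payload)
```

The resulting JSON string is exactly what the agent loop appends as a `ToolMessage`, giving the model grounded text to cite in its final answer.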
Enhance Retrieval with Azure AI Search's Hybrid Search and Semantic Ranking

Azure AI Search enhances your experience with GitHub Models RAG by providing advanced retrieval capabilities through hybrid search and semantic ranking. With hybrid search, you can combine keyword and vector retrieval, using Reciprocal Rank Fusion (RRF) to merge and select the most relevant results from each method. This fusion step ensures that both precise keywords and contextual relevance play a role in the results you receive.

Additionally, Semantic Ranker adds a reranking layer to your initial results, whether they are BM25-ranked or RRF-ranked. Leveraging deep learning models that support multilingual contexts, Semantic Ranker emphasizes results that are semantically relevant to the query. Because Semantic Ranker runs natively within the Azure AI Search stack, our data shows that combining semantic ranking with hybrid retrieval delivers the most effective and relevant retrieval experience right out of the box.

Both hybrid search and semantic ranking are enabled by default in the GitHub Models playground. Query rewriting will come to GitHub Models soon, and as we introduce new Azure AI Search capabilities, they will be part of GitHub Models RAG for free, ensuring you always have access to the latest advancements.

What's Next?

GitHub Models RAG enters public beta next month, giving you the perfect opportunity to dive in, experiment, and start building intelligent, data-grounded applications. Join the public beta to explore the ease of use of GitHub Models RAG firsthand.

Get Started and Next Steps

- Join the waitlist
- Explore and experiment with GitHub Models
- Learn more about GitHub Models
- Learn more about Azure AI Search
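For readers curious how the Reciprocal Rank Fusion step described above works, here is a minimal Python sketch of the standard RRF formula (score = sum of 1 / (k + rank) over each ranked list, with the conventional k = 60). This illustrates the general technique, not Azure AI Search's internal implementation:

```python
def rrf_fuse(rankings, k=60):
    """Merge several ranked result lists with Reciprocal Rank Fusion.

    Each ranking is a list of document ids ordered best-first; a document's
    fused score is the sum of 1 / (k + rank) across every list it appears in,
    so documents ranked highly by multiple retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["d3", "d1", "d7"]  # e.g., BM25-ranked results
vector_hits = ["d1", "d5", "d3"]   # e.g., vector-similarity-ranked results
print(rrf_fuse([keyword_hits, vector_hits]))  # → ['d1', 'd3', 'd5', 'd7']
```

Note how `d1`, which appears near the top of both lists, outranks `d3` even though `d3` is first in the keyword list; that is the behavior the blog describes when it says both precise keywords and contextual relevance play a role.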