# Azure AI Video Indexer

## From Foundry to Fine-Tuning: Topics You Need to Know in Azure AI Services
With so many new features and development patterns arriving in Azure, especially in generative AI, you may be wondering what you need to know and where to start in Azure AI. Whether you're a developer or an IT professional, this guide summarizes the key terms, use cases, and documentation links for each service. Let's explore how Azure AI can transform your projects and drive innovation in your organization. Stay tuned for more details!

| Term | Description | Use Case | Azure Resource |
|------|-------------|----------|----------------|
| Azure AI Foundry | A comprehensive platform for building, deploying, and managing AI-driven applications. | Customizing, hosting, running, and managing AI applications. | Azure AI Foundry |
| AI Agent | Within Azure AI Foundry, an AI agent acts as a "smart" microservice that can answer questions (RAG), perform actions, or fully automate workflows. | Automating tasks, improving efficiency, and enhancing user experiences across a variety of applications. | Link |
| AutoGen | An open-source framework for building and managing AI agents, supporting workflows with multiple agents. | Developing complex AI applications with multiple agents. | AutoGen |
| Multi-Agent AI | Systems in which multiple AI agents collaborate to solve complex tasks. | Managing energy in smart grids, coordinating drones. | Link |
| Model as a Platform | A business model that leverages digital infrastructure to facilitate interactions between user groups. | Social media channels, online marketplaces, crowdsourcing websites. | Link |
| Azure OpenAI Service | Provides access to OpenAI's powerful language models, integrated into the Azure platform. | Text generation, summarization, translation, conversational AI. | Azure OpenAI Service |
| Azure AI Services | A suite of APIs and services for adding AI capabilities such as image analysis, speech-to-text, and language understanding to applications. | Image analysis, speech-to-text, language understanding. | Link |
| Azure Machine Learning (Azure ML) | A cloud-based service for building, training, and deploying machine learning models. | Creating models to predict sales or detect fraud. | Azure Machine Learning |
| Azure AI Search | An AI-powered search service that enriches information to facilitate exploration. | Enterprise search, e-commerce search, knowledge mining. | Azure AI Search |
| Azure Bot Service | A platform for developing intelligent, enterprise-grade bots. | Chatbots for customer service, virtual assistants. | Azure Bot Service |
| Deep Learning | A subset of machine learning that uses neural networks with many layers to analyze complex data. | Image and speech recognition, natural language processing. | Link |
| Multimodal AI | AI that integrates and processes multiple types of data, such as text and images (covering both input and output). | Describing images, answering questions about pictures. | Azure OpenAI Service, Azure AI Services |
| Unimodal AI | AI that processes a single type of data, such as text or images (covering both input and output). | Writing text, recognizing objects in photos. | Azure OpenAI Service, Azure AI Services |
| Fine-Tuning Models | Adapting pre-trained models to specific tasks or datasets for improved performance. | Customizing models for specific industries, such as healthcare. | Azure AI Foundry |
| Model Catalog | A repository of pre-trained models available for use in AI projects. | Discovering, evaluating, fine-tuning, and deploying models. | Model Catalog |
| Capacity & Quotas | Limits and quotas for using Azure AI services, ensuring optimal resource allocation. | Managing resource usage and scaling AI applications. | Link |
| Tokens | Units of text processed by language models, affecting cost and performance. | Managing and optimizing text processing tasks. | Link |
| TPM (Tokens per Minute) | A measure of the rate at which tokens are processed, impacting throughput and performance. | Allocating and managing processing capacity for AI models. | Link |
| PTU (Provisioned Throughput) | Lets you specify the amount of throughput you require in a deployment. | Ensuring predictable performance for AI applications. | Link |
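Tokens, TPM, and PTU become concrete once you do a little capacity arithmetic. The sketch below uses the rough ~4-characters-per-token heuristic for English text (an assumption; real counts depend on the model's tokenizer) to estimate how many requests of a given size fit within a TPM quota:

```python
# Rough capacity-planning sketch. The ~4 chars/token rule is a common
# heuristic for English text, not an exact count.
def estimate_tokens(text: str) -> int:
    """Estimate the token count of a prompt using the ~4 chars/token rule."""
    return max(1, len(text) // 4)

def requests_per_minute(tpm_quota: int, tokens_per_request: int) -> int:
    """How many requests of a given size fit within a TPM quota."""
    return tpm_quota // tokens_per_request

prompt = "Summarize the attached incident report in three bullet points."
# Budget for the prompt plus an expected ~500-token completion
per_request = estimate_tokens(prompt) + 500
print(requests_per_minute(30_000, per_request))  # prints 58
```

This kind of estimate is only a starting point for sizing a deployment; for predictable throughput, that is what PTUs are for.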
## Enhancing Workplace Safety and Efficiency with Azure AI Foundry's Content Understanding

Discover how Azure AI Foundry's Content Understanding service, featuring the Video Shot Analysis template, revolutionizes workplace safety and efficiency. By leveraging generative AI to analyze video data, businesses can gain actionable insights into worker actions, posture, safety risks, and environmental conditions. Learn how this cutting-edge tool transforms operations across industries like manufacturing, logistics, and healthcare.

## Multimodal video search powered by Video Retrieval in Azure
Video content is becoming increasingly central to business operations, from training materials to safety monitoring. As part of Azure's comprehensive video analysis capabilities, we're excited to discuss Azure Video Retrieval, a powerful service that enables natural language search across your video and image content. This service makes it easier than ever to locate exactly what you need within your media assets.

### What is Azure Video Retrieval?

Azure Video Retrieval allows you to create a search index and populate it with both videos and images. Using natural language queries, you can search through this content to identify visual elements (like objects and safety events) and speech content without requiring manual transcription or specialized expertise. The service offers powerful customization options: developers can define metadata schemas for each index, ingest custom metadata, and specify which features (vision, speech) to extract and filter during search operations. Whether you're looking for specific spoken phrases or visual occurrences, the service pinpoints the exact timestamps where your search criteria appear.

### Key Features

- **Multimodal Search**: Search across both visual and audio content using natural language
- **Custom Metadata Support**: Define and ingest metadata schemas for enhanced retrieval
- **Flexible Feature Extraction**: Specify which features (vision, speech) to extract and search
- **Precise Timestamp Matching**: Get exact frame locations where your search criteria appear
- **Multiple Content Types**: Index and search both videos and images
- **Simple Integration**: Easy implementation with Azure Blob Storage
- **Comprehensive API**: Full REST API support for custom implementations

### Getting Started

#### Prerequisites

Before you begin, you'll need:

- An Azure Cognitive Services multi-service account
- An Azure Blob Storage account for video content

#### Setting Up Video Indexing

The indexing process is straightforward.
Here's how to create an index and upload videos:

```python
import datetime
import uuid

import requests

# Iterate through blobs and build the index payload
for blob in blob_service_client.get_container_client(az_storage_container_name).list_blobs():
    blob_name = blob.name
    blob_url = f"https://{az_storage_account_name}.blob.core.windows.net/{az_storage_container_name}/{blob_name}"

    # Generate a SAS URL for secure access
    sas_url = blob_url + "?" + sas_token

    # Add the video to the index payload
    payload["videos"].append({
        "mode": "add",
        "documentId": str(uuid.uuid4()),
        "documentUrl": sas_url,
        "metadata": {
            "cameraId": "video-indexer-demo-camera1",
            "timestamp": datetime.datetime.now(datetime.UTC).strftime("%Y-%m-%d %H:%M:%S"),
        },
    })

# Create the index and ingest the documents
response = requests.put(url, headers=headers, json=payload)
```

#### Searching Videos

The service supports two primary search modes:

```python
# Query templates for searching by visual content or by speech
query_by_text = {
    "queryText": "<user query>",
    "filters": {
        "featureFilters": ["vision"],
    },
}

query_by_speech = {
    "queryText": "<user query>",
    "filters": {
        "featureFilters": ["speech"],
    },
}
```

The search input is passed to the REST API based on the mode chosen.
```python
# Search for video frames matching the user's query via the
# Azure Video Retrieval service
def search_videos(query, query_type):
    url = f"https://{az_video_indexer_endpoint}/computervision/retrieval/indexes/{az_video_indexer_index_name}:queryByText?api-version={az_video_indexer_api_version}"
    headers = {
        "Ocp-Apim-Subscription-Key": az_video_indexer_key,
        "Content-Type": "application/json",
    }

    # Pick the query template that matches the chosen search mode
    if query_type == "Speech":
        query_by_speech["queryText"] = query
        input_query = query_by_speech
    else:
        query_by_text["queryText"] = query
        input_query = query_by_text

    try:
        response = requests.post(url, headers=headers, json=input_query)
        response.raise_for_status()
        print("search response\n", response.json())
        return response.json()
    except Exception as e:
        print("error", e)
        return None
```

The REST APIs required to complete the steps in this process are covered here.

### Use Cases

Azure Video Retrieval can transform how organizations work with video content across various scenarios:

- **Training and Education**: Quickly locate specific topics or demonstrations within training videos
- **Content Management**: Efficiently organize and retrieve media assets
- **Safety and Compliance**: Find specific safety-related content or incidents
- **Media Production**: Locate specific scenes or dialogue across video libraries

### Demo

Watch this sample application that uses Video Retrieval to let users search frames across multiple videos in an index. The source code of the sample application can be accessed here.

### Resources

- Video Retrieval API
- Video Retrieval API reference
- Azure AI Video Indexer overview
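To turn a raw query response into something actionable, a small helper can rank the returned matches by relevance. The response shape used below (a `value` array with `documentId`, `start`, and `relevance` fields) is an assumption for illustration; consult the Video Retrieval API reference for the exact schema:

```python
# Sketch: rank matches from a query response. The response shape here
# ("value" / "documentId" / "start" / "relevance") is assumed for
# illustration -- check the Video Retrieval API reference for the
# actual schema.
def top_matches(response: dict, k: int = 3) -> list:
    """Return (documentId, start timestamp) for the k most relevant hits."""
    hits = sorted(
        response.get("value", []),
        key=lambda hit: hit.get("relevance", 0.0),
        reverse=True,
    )
    return [(h["documentId"], h["start"]) for h in hits[:k]]

# Example with a stubbed response
fake_response = {
    "value": [
        {"documentId": "doc-1", "start": "00:00:12", "relevance": 0.42},
        {"documentId": "doc-2", "start": "00:03:05", "relevance": 0.91},
    ]
}
print(top_matches(fake_response, k=1))  # prints [('doc-2', '00:03:05')]
```

A helper like this pairs naturally with `search_videos` above: feed its return value in, and jump straight to the highest-relevance timestamps.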
## Transforming Video into Value with Azure AI Content Understanding

### Unlocking Value from Unstructured Video

Every minute, social video sharing platforms see over 500 hours of video uploads [1], and 91% of businesses leverage video as a key tool [2]. From media conglomerates managing extensive archives to enterprises producing training and marketing materials, organizations are awash in video. Yet despite this abundance, video remains inherently unstructured and difficult to use effectively. While the volume of video content continues to grow exponentially, its true value often remains untapped because of the friction involved in making video useful. Organizations grapple with several pain points:

- **Inaccessibility of valuable content archives**: Massive video archives sit idle because finding the right content to reuse requires extensive manual effort.
- **The impossibility of personalization without metadata**: Personalization holds the key to unlocking new revenue streams and increasing engagement. Without reliable, detailed metadata, however, it is cost-prohibitive to tailor content to specific audiences or individuals.
- **Missed monetization opportunities**: For media companies, untapped archives mean missed chances to monetize content through new formats or platforms.
- **Operational bottlenecks**: Enterprises struggle with slow turnaround times for training materials, compliance checks, and marketing campaigns due to inefficient video workflows, leading to delays and increased expenses.

Many video processing applications rely on purpose-built, frame-by-frame analysis to identify objects and key elements within video content. While this method can detect a specific list of objects, it is inherently lossy, struggling to capture actions, events, or uncommon objects. It is also expensive and time-consuming to customize for specific tasks. Generative AI promises to revolutionize video content analysis, with GPT-4o topping leaderboards for video understanding tasks, but finding a generative model that processes video is just the first step.
Creating video pipelines with generative models is hard. Developers must invest significant effort in infrastructure to build custom video processing pipelines that produce good results. These systems need optimized prompts, integrated transcription, smart handling of context-window limitations, shot-aligned segmentation, and much more. This makes them expensive to optimize and hard to maintain over time.

### Introducing Azure AI Content Understanding for video

This is where Azure AI Content Understanding transforms the game. By offering an integrated video pipeline built on advanced foundation models, it lets you effortlessly extract insights from both the audio and visual elements of your videos. The service transforms unstructured video into structured, searchable knowledge, enabling powerful use cases like media asset management and highlight reel generation.

With Content Understanding, you can automatically identify key moments in a video to extract highlights and summarize the full context. For corporate events and conferences, for example, you can quickly produce same-day highlight reels. This capability not only reduces the time and cost associated with manual editing but also empowers organizations to deliver timely, professional reaction videos that keep audiences engaged and informed. In another case, a news broadcaster can create a personalized viewing experience by recommending stories of interest. This is achieved by automatically tagging segments with relevant metadata, such as topic and location, enabling the delivery of content personalized to individual interests and driving higher engagement and viewer satisfaction. By generating specific metadata on a segment-by-segment basis, including chapters, scenes, and shots, Content Understanding provides a detailed outline of what's contained in the video, facilitating these workflows.
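To make the idea of requesting metadata per segment concrete, here is a sketch of the kind of per-segment field schema a broadcaster might define. The field names and structure below are invented for illustration only; the service's actual request format is documented in the Content Understanding overview:

```python
# Illustrative only: a custom per-segment field schema of the kind that
# generative field extraction works from. The field names below are
# invented for this example, not the service's exact API.
segment_schema = {
    "description": "Per-segment metadata for a news archive",
    "fields": {
        "topic": {
            "type": "string",
            "description": "Main news topic covered in this segment",
        },
        "location": {
            "type": "string",
            "description": "Geographic location the segment reports on",
        },
        "keyMoment": {
            "type": "boolean",
            "description": "True if the segment is a highlight-worthy moment",
        },
    },
}

print(sorted(segment_schema["fields"]))  # prints ['keyMoment', 'location', 'topic']
```

The point of a schema like this is that the natural-language field descriptions, not custom-trained models, tell the generative pipeline what to extract for each segment.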
These workflows are enabled by a streamlined video pipeline that starts with content extraction tasks, such as transcription, shot detection, key frame extraction, and face grouping, to create grounding data for analysis. Generative models then use that information to extract the specific fields you request for each segment of the video. This generative field extraction capability enables customers to:

- **Customize metadata**: Tailor the extracted information to focus on the elements important to your use case, such as key events, actions, or dialogues.
- **Create detailed outlines**: Understand the structure of your video content at a granular level.
- **Automate repetitive editing tasks**: Quickly pinpoint important segments to create summaries, trailers, or compilations that capture the essence of the full video.

By leveraging these capabilities, organizations can automate many video creation tasks, including building highlight reels and repurposing content across formats, saving time and resources while delivering compelling content to their audiences. Whether it's summarizing conference keynotes, capturing the essence of corporate events, or showcasing the most exciting moments in sports, Azure AI Content Understanding makes video workflows efficient and scalable. But how do these solutions perform in real-world scenarios?

### Customer Success Stories

#### IPV Curator: Transforming Media Asset Management

IPV Curator, a leader in media asset management solutions, helps clients manage and monetize extensive video libraries across industries including broadcast, sports, and global enterprises, and enables seamless, zero-download editing of video in the Azure cloud using Adobe applications. Their customers needed an efficient way to search, repurpose, and produce vast amounts of video content, with data extraction tailored to specific use cases. IPV integrated Azure AI Content Understanding into their Curator media asset management platform.
They found that it provided a step-function improvement in metadata extraction for their clients. It was particularly beneficial because it enabled:

- **Industry-specific metadata**: Clients can extract metadata tailored to their specific needs using simple prompts, without domain-specific training of new AI models. For example:
  - **Broadcast**: Rapidly identified key scenes for promo production and efficiently surfaced the highest-value content for free ad-supported streaming TV (FAST) channels.
  - **Travel marketing content**: Automatically tagged geographic locations, landmarks, and shot types (e.g., aerial, close-up), and highlighted scenic details.
  - **Shopping channel content**: Detected specific products and identified demo segments, product categories, and key selling points.
- **Advanced action and event analysis**: Enabled detailed analysis of a set of frames in a video segment to identify actions and events, providing a new level of insight compared to frame-by-frame analysis of objects.
- **Segmentation aligned to shots**: Detected shot boundaries in produced videos and in-media edit points, enabling easy reuse by capturing full shots in segments.

As a result, IPV's clients can quickly find and repurpose content, significantly reducing editing time and accelerating video production at scale.

*IPV Curator enables search across industry-specific metadata extracted from videos.*

"IPV's collaboration with Microsoft transforms media stored in Azure into an easily accessible, streaming, and highly searchable active archive. The powerful search engine within IPV's new generation of Media Asset Management uses Azure AI Content Understanding to accurately surface any archived video clip, driving users to their highest value content in seconds." —Daniel Mathew, Chief Revenue Officer, IPV

#### Cognizant: Innovative Ad Moderation

Cognizant, a global leader in consulting and professional services, has identified the challenge of moderating advertising content for its media customers.
Their customers' traditional methods rely heavily on manual review and struggle to scale with the increasing volume of content requiring assessment. The Cognizant Ad Moderation solution framework leverages Content Understanding to create a more accurate, cost-effective approach to ad moderation, resulting in a 96% reduction in review time. It allows customers to automate ad reviews to ensure cultural sensitivity and regulatory compliance and to optimize programming placement, ultimately reducing manual review effort. Cognizant achieves these results by leveraging Content Understanding for multimodal field extraction, tailored output, and native generative AI video processing:

- **Multimodal field extraction**: Extracts key attributes from both the audio and visual elements, allowing a more comprehensive analysis of the content. This analysis is critical for a holistic view of suitability for various audiences.
- **Tailored output schema**: Outputs a custom structured schema that detects content directly relevant to the moderation task, including specific risky attributes such as prohibited language, potentially banned topics, violations of content restrictions, and sensitive products like alcohol or smoking.
- **Native generative AI video processing**: Content Understanding natively processes video files with generative AI to provide the detailed insights requested in the schema, capturing context, actions, and events over entire segments of the video.

This optimized video pipeline provides Cognizant with a detailed analysis of videos to ground an automated decision, allowing them to quickly green-light compliant ads and flag others for rejection or human review. Content Understanding empowers Cognizant to focus on solving business challenges rather than managing the underlying infrastructure for video processing and integrating generative models.

"I'm absolutely thrilled about the Azure AI Content Understanding service!
It's a game-changer that accelerates processing by integrating multiple AI capabilities into a single service call, delivering combined audio and video transcription in one JSON output with incredibly detailed results. The ability to add custom fields that integrate with an LLM provides even more detailed, meaningful, and flexible output." —Rushil Patel, Developer at Cognizant

### The Broader Impact: Transformation Across Industries

The transformative power of Azure AI Content Understanding extends far beyond these specific use cases, offering significant benefits across industries and workflows. By applying advanced AI capabilities to video, organizations can unlock new opportunities and drive innovation in several key areas:

- **Social media listening and consumer insights**: Analyze video content across social platforms to understand how products are perceived and discussed online, gaining valuable consumer insights to inform product development, marketing strategies, and brand management.
- **Unlocking video for AI assistants and agents**: Enable AI assistants and agents to access and use information from video content, transforming meeting recordings, training videos, and events into valuable data sources for Retrieval-Augmented Generation (RAG). Enhance customer support and knowledge management by integrating video insights into AI-driven interactions.
- **Enhancing accessibility with audio descriptions**: Generate draft audio descriptions for video content as a starting point for human editors, streamlining the creation of accessible content for visually impaired audiences and accelerating compliance with accessibility standards.
- **Marketing and advertising workflows**: Automate content analysis to ensure brand alignment and effective advertising. Understand and optimize the content within video advertisements to maintain consistent branding and enhance audience engagement.
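As a tiny illustration of the RAG idea above, once per-segment metadata exists, video segments become a retrieval corpus. This toy retriever scores segments by keyword overlap with the query; it is a sketch only, and a production system would use embeddings and a vector index instead:

```python
# Sketch: treat extracted per-segment metadata as a RAG corpus.
# Keyword overlap stands in for real embedding-based retrieval.
def score(query: str, text: str) -> int:
    """Count how many query terms appear in the segment text."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(query: str, segments: list, k: int = 2) -> list:
    """Return ids of the k segments whose summaries best match the query."""
    ranked = sorted(segments, key=lambda s: score(query, s["summary"]), reverse=True)
    return [s["id"] for s in ranked[:k]]

# Hypothetical segment metadata of the kind Content Understanding emits
segments = [
    {"id": "seg-1", "summary": "safety briefing on forklift operation"},
    {"id": "seg-2", "summary": "quarterly marketing results discussion"},
    {"id": "seg-3", "summary": "forklift maintenance walkthrough"},
]
print(retrieve("forklift safety", segments, k=2))  # prints ['seg-1', 'seg-3']
```

The retrieved segment summaries (and their timestamps) can then be passed to an assistant as grounding context, which is exactly how meeting and training recordings become answerable sources.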
The business value of Azure AI Content Understanding is clear. By addressing core challenges in video content management with generative AI, customization, and native video processing, it improves operational efficiency and unlocks new opportunities for monetization and innovation. Organizations can now turn dormant video archives into valuable assets, deliver personalized content that engages audiences effectively, and automate manual, time-consuming workflows.

### Ready to Transform Your Video Content?

For more details on how to use Content Understanding for video, check out the Video Solution Overview. If you are at Microsoft Ignite 2024 or watching online, check out this breakout session. Try the new service in Azure AI Foundry. For documentation, refer to the Content Understanding Overview. For a broader perspective, see Announcing Azure AI Content Understanding: Transforming Multimodal Data into Insights and discover how it extends these capabilities across all content formats.

-----

[1] According to Statista in 2022 - Hours of video uploaded every minute 2022 | Statista
[2] According to a Wyzowl survey in 2024 - Video Marketing 2024 (10 Years of Data) | Wyzowl

## Azure Video Indexer & Phi-3 introduce Textual Video Summary on Edge: Better Together story
Azure AI Video Indexer collaborated with the Phi-3 team to introduce a textual video summary capability on the edge. This collaboration showcases the use of the Phi-3 SLM, enabling the Azure AI Video Indexer team to extend the same LLM-based summarization capabilities available in the cloud to the edge. It follows the Build 2024 announcements of Azure AI Video Indexer's integration with language models to generate textual summaries of videos and the expansion of the Phi-3 model family. The feature is accessible both in the cloud, using Azure OpenAI, and at the edge via the Phi-3-mini-4k-instruct model.
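As an illustration of the LLM-based summarization pattern mentioned above, the sketch below shows the common chunk-then-combine ("map-reduce") approach for transcripts that exceed a small context window such as Phi-3-mini's 4k tokens. The `summarize_chunk` stub stands in for an actual model call and is purely illustrative of the pattern, not of Video Indexer's implementation:

```python
# Sketch of chunked "map-reduce" summarization. summarize_chunk is a
# stand-in for a call to a small language model such as
# Phi-3-mini-4k-instruct; here it just truncates for illustration.
def chunk_transcript(transcript: list, max_lines: int = 4) -> list:
    """Split a transcript into chunks that fit a small context window."""
    return [transcript[i:i + max_lines] for i in range(0, len(transcript), max_lines)]

def summarize_chunk(lines: list) -> str:
    """Stub: in practice this would prompt the language model."""
    return lines[0][:40]  # placeholder "summary" of the chunk

def summarize(transcript: list) -> str:
    """Map: summarize each chunk. Reduce: combine the partial summaries."""
    partials = [summarize_chunk(c) for c in chunk_transcript(transcript)]
    return " ".join(partials)

transcript = [f"line {i}" for i in range(10)]
print(len(chunk_transcript(transcript)))  # 10 lines in chunks of 4 -> prints 3
```

In a real pipeline the reduce step would itself be a model call that summarizes the partial summaries, keeping every prompt within the model's context window.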