Azure AI Vision
Learn about Azure AI during the Global AI Bootcamp 2025
The Global AI Bootcamp is starting next week, and it's more exciting than ever! With 135 bootcamps in 44 countries, this is your chance to be part of a global movement in AI innovation. 🤖🌍 From Germany to India, Nigeria to Canada, and beyond, join us for hands-on workshops, expert talks, and networking opportunities that will boost your AI skills and career. Whether you're a seasoned pro or just starting out, there's something for everyone! 🚀

Why Attend?

🛠️ Hands-on Workshops: Build and deploy AI models.
🎤 Expert Talks: Learn the latest trends from industry leaders.
🤝 Network: Connect with peers, mentors, and potential collaborators.
📈 Career Growth: Discover new career paths in AI.

Don't miss this incredible opportunity to learn, connect, and grow! Check out the event in your city or join virtually. Let's shape the future of AI together! 🌟 👉 Explore All Bootcamps
From Foundry to Fine-Tuning: Topics You Need to Know in Azure AI Services

With so many new Azure features and new ways of developing, especially in generative AI, you may be wondering what you need to know and where to start in Azure AI. Whether you're a developer or an IT professional, this guide will help you understand the key features, use cases, and documentation links for each service. Let's explore how Azure AI can transform your projects and drive innovation in your organization.

| Term | Description | Use Case | Azure Resource |
| --- | --- | --- | --- |
| Azure AI Foundry | A comprehensive platform for building, deploying, and managing AI-driven applications. | Customizing, hosting, running, and managing AI applications. | Azure AI Foundry |
| AI Agent | Within Azure AI Foundry, an AI Agent acts as a "smart" microservice that can answer questions (RAG), perform actions, or fully automate workflows. | Automating tasks, improving efficiency, and enhancing user experiences across a variety of applications. | Link |
| AutoGen | An open-source framework designed for building and managing AI agents, supporting workflows with multiple agents. | Developing complex AI applications with multiple agents. | AutoGen |
| Multi-Agent AI | Systems where multiple AI agents collaborate to solve complex tasks. | Managing energy in smart grids, coordinating drones. | Link |
| Model as a Platform | A business model leveraging digital infrastructure to facilitate interactions between user groups. | Social media channels, online marketplaces, crowdsourcing websites. | Link |
| Azure OpenAI Service | Provides access to OpenAI's powerful language models integrated into the Azure platform. | Text generation, summarization, translation, conversational AI. | Azure OpenAI Service |
| Azure AI Services | A suite of APIs and services designed to add AI capabilities like image analysis, speech-to-text, and language understanding to applications. | Image analysis, speech-to-text, language understanding. | Link |
| Azure Machine Learning (Azure ML) | A cloud-based service for building, training, and deploying machine learning models. | Creating models to predict sales, detect fraud. | Azure Machine Learning |
| Azure AI Search | An AI-powered search service that enriches information to facilitate exploration. | Enterprise search, e-commerce search, knowledge mining. | Azure AI Search |
| Azure Bot Service | A platform for developing intelligent, enterprise-grade bots. | Creating chatbots for customer service, virtual assistants. | Azure Bot Service |
| Deep Learning | A subset of ML using neural networks with many layers to analyze complex data. | Image and speech recognition, natural language processing. | Link |
| Multimodal AI | AI that integrates and processes multiple types of data, such as text and images (covering both input and output). | Describing images, answering questions about pictures. | Azure OpenAI Service, Azure AI Services |
| Unimodal AI | AI that processes a single type of data, such as text or images (covering both input and output). | Writing text, recognizing objects in photos. | Azure OpenAI Service, Azure AI Services |
| Fine-Tuning Models | Adapting pre-trained models to specific tasks or datasets for improved performance. | Customizing models for specific industries like healthcare. | Azure AI Foundry |
| Model Catalog | A repository of pre-trained models available for use in AI projects. | Discovering, evaluating, fine-tuning, and deploying models. | Model Catalog |
| Capacity & Quotas | Limits and quotas for using Azure AI services, ensuring optimal resource allocation. | Managing resource usage and scaling AI applications. | Link |
| Tokens | Units of text processed by language models, affecting cost and performance. | Managing and optimizing text processing tasks. | Link |
| TPM (Tokens per Minute) | A measure of the rate at which tokens are processed, impacting throughput and performance. | Allocating and managing processing capacity for AI models. | Link |
| PTU (Provisioned Throughput Units) | The provisioned throughput capability allows you to specify the amount of throughput you require in a deployment. | Ensuring predictable performance for AI applications. | Link |
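To make the Tokens, TPM, and PTU rows concrete, here is a minimal back-of-the-envelope sketch in Python for checking whether a workload fits within a deployment's TPM quota. The traffic numbers below are hypothetical placeholders, not service defaults.

```python
# Rough capacity check for an Azure OpenAI deployment quota.
# All workload numbers below are hypothetical examples.

requests_per_minute = 120        # expected peak request rate
avg_prompt_tokens = 800          # average tokens per prompt
avg_completion_tokens = 200      # average tokens per completion

# Both prompt and completion tokens count against the TPM quota.
required_tpm = requests_per_minute * (avg_prompt_tokens + avg_completion_tokens)

deployment_tpm_quota = 100_000   # example quota; check your actual deployment

print(f"Required: {required_tpm} TPM, available: {deployment_tpm_quota} TPM")
if required_tpm > deployment_tpm_quota:
    print("Consider requesting more quota or a provisioned-throughput (PTU) deployment.")
```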
Real Time, Real You: Announcing General Availability of Face Liveness Detection

A Milestone in Identity Verification

We are excited to announce the general availability of our face liveness detection features, a key milestone in making identity verification both seamless and secure. As deepfake technology and sophisticated spoofing attacks continue to evolve, organizations need solutions that can verify the authenticity of an individual in real time. During the preview, we listened to customer feedback, expanded capabilities, and made significant improvements to ensure that liveness detection works across three platforms and for common use cases.

What's New Since the Preview?

During the preview, we introduced several features that laid the foundation for secure and seamless identity verification, including an active challenge in the JavaScript library. Building on that foundation, there are improvements across the board. Here's what's new:

- Feature Parity Across Platforms: Liveness detection's active challenge is now available on both Android and iOS, achieving full feature parity across all supported devices. This allows a consistent and seamless experience for both developers and end users on all three supported platforms.
- Easy Integration: The liveness detection client SDK now requires only a single function call to start the entire flow, making it easier for developers to integrate. The SDK also includes an integrated UI flow to simplify implementation, allowing a seamless developer experience across platforms.
- Runtime Environment Safety: The liveness detection client SDK integrates a safety check for untrustworthy runtime environments on both iOS and Android devices.
- Accuracy and Usability Improvements: We've delivered numerous bug fixes and enhancements to improve detection accuracy and user experience across all supported platforms. Our solution is now faster, more intuitive, and more resilient against even the most advanced spoofing techniques.

These advancements help businesses integrate liveness detection with confidence, providing both security and convenience.

Security in Focus: Microsoft's Commitment to Innovation

As identity verification threats continue to evolve, general availability is only the start of the journey. Microsoft is dedicated to advancing our face liveness detection technology to address evolving security challenges:

- Continuous Support and Innovation: Our team is actively monitoring emerging spoofing techniques. With ongoing updates and enhancements, we ensure that our liveness detection solution adapts to new challenges. Learn more about liveness detection updates.
- Security and Privacy by Design: Microsoft's principles of security and privacy are built into every step. We provide robust support to assist customers in integrating and maintaining these solutions effectively. We process the data securely, respecting user privacy and complying with global regulations. By collaborating closely with our customers, we ensure that together, we build solutions that are not only innovative but also secure. Learn more about shared responsibility in liveness solutions.

We provide reliable, long-term solutions to help organizations stay ahead of threats.

Get Started Today

We're excited for customers to experience the benefits of real-time liveness detection. Whether you're safeguarding financial transactions, streamlining digital onboarding, or enabling secure logins, our solution can strengthen your security.

- Explore: Learn more about integrating liveness detection into your applications with this tutorial.
- Try it Out: Liveness detection is available to experience in Vision Studio.
- Build with Confidence: Empower your organization with secure, real-time identity verification. Try our sample code to see how easy it is to get started: Azure-Samples/azure-ai-vision-sdk

A Step Toward a Safer Future

With a focus on real-time, reliable identity verification, we're making identity verification smarter, faster, and safer. As we continue to improve and evolve this solution, our goal remains the same: to protect identities, build trust, and verify that the person behind the screen is really you. Start building with liveness detection today and join us on this journey toward a more secure digital world.
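As a concrete starting point, here is a minimal, hedged sketch of the server-side step that typically precedes the client SDK flow: creating a liveness session and handing its auth token to the client. The REST path, API version, and field names below are assumptions based on the preview API and may differ from the GA surface; consult the tutorial linked above for the authoritative flow.

```python
import requests

# Placeholders: replace with your own Face resource endpoint and key.
FACE_ENDPOINT = "https://<your-face-resource>.cognitiveservices.azure.com"
FACE_KEY = "<your-key>"

# NOTE: path and api-version are assumptions based on the preview API;
# check the official tutorial for the current GA route.
url = f"{FACE_ENDPOINT}/face/v1.1-preview.1/detectLiveness/singleModal/sessions"

payload = {
    "livenessOperationMode": "Passive",          # or an active-challenge mode
    "deviceCorrelationId": "my-device-id-123",   # hypothetical client identifier
}

resp = requests.post(url, headers={"Ocp-Apim-Subscription-Key": FACE_KEY}, json=payload)
resp.raise_for_status()
session = resp.json()

# The client SDK (Android/iOS/JavaScript) uses this short-lived token
# to run the on-device liveness check in its single start call.
print("sessionId:", session.get("sessionId"))
print("authToken:", session.get("authToken"))
```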
Dify Works with Microsoft AI Search

Please refer to my repo to get more AI resources, and welcome to star it: https://github.com/xinyuwei-david/david-share.git

This article is from one of my repos: https://github.com/xinyuwei-david/david-share/tree/master/LLMs/Dify-With-AI-Search

Dify is an open-source platform for developing large language model (LLM) applications. It combines the concepts of Backend as a Service (BaaS) and LLMOps, enabling developers to quickly build production-grade generative AI applications. Dify offers various types of tools, including first-party and custom tools. These tools can extend the capabilities of LLMs, such as web search, scientific calculations, image generation, and more. On Dify, you can create more powerful AI applications, like intelligent assistant-type applications, which can complete complex tasks through task reasoning, step decomposition, and tool invocation.

Dify works with AI Search: Demo

As of now, Dify cannot integrate with Microsoft AI Search directly via the default Dify web portal. Let me show how to achieve it. Please click the picture below to see my demo video on YouTube: https://www.youtube.com/watch?v=20GjS6AtjTo

Dify works with AI Search: Configuration steps

Configure AI Search: create an index and make sure you can get results from the AI Search index.

Run Dify on a VM via Docker:

```
root@a100vm:~# docker ps |grep -i dify
5d6c32a94313   langgenius/dify-api:0.8.3       "/bin/bash /entrypoi…"   3 months ago   Up 3 minutes   5001/tcp   docker-worker-1
264e477883ee   langgenius/dify-api:0.8.3       "/bin/bash /entrypoi…"   3 months ago   Up 3 minutes   5001/tcp   docker-api-1
2eb90cd5280a   langgenius/dify-sandbox:0.2.9   "/main"                  3 months ago   Up 3 minutes (healthy)    docker-sandbox-1
708937964fbb   langgenius/dify-web:0.8.3       "/bin/sh ./entrypoin…"   3 months ago   Up 3 minutes   3000/tcp   docker-web-1
```

Create a custom tool in the Dify portal and set its schema. Schema details:

```json
{
  "openapi": "3.0.0",
  "info": {
    "title": "Azure Cognitive Search Integration",
    "version": "1.0.0"
  },
  "servers": [
    { "url": "https://ai-search-eastus-xinyuwei.search.windows.net" }
  ],
  "paths": {
    "/indexes/wukong-doc1/docs": {
      "get": {
        "operationId": "getSearchResults",
        "parameters": [
          {
            "name": "api-version",
            "in": "query",
            "required": true,
            "schema": { "type": "string", "example": "2024-11-01-preview" }
          },
          {
            "name": "search",
            "in": "query",
            "required": true,
            "schema": { "type": "string" }
          }
        ],
        "responses": {
          "200": {
            "description": "Successful response",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "@odata.context": { "type": "string" },
                    "value": {
                      "type": "array",
                      "items": {
                        "type": "object",
                        "properties": {
                          "@search.score": { "type": "number" },
                          "chunk_id": { "type": "string" },
                          "parent_id": { "type": "string" },
                          "title": { "type": "string" },
                          "chunk": { "type": "string" },
                          "text_vector": {
                            "type": "array",
                            "items": { "type": "number" }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

Then complete the remaining steps in the Dify portal:

- Set the AI Search API key
- Do a search test with your input words
- Create a workflow on Dify
- Check the AI Search stage
- Check the LLM stage
- Run the workflow and get the workflow result
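To verify outside of Dify that the index responds the way the schema above expects, you can issue the same GET request directly. Here is a minimal sketch using Python's requests library; the endpoint, index name, and API version match the schema above, while the key and query text are placeholders.

```python
import requests

# Placeholders: use your own search service endpoint, index, and query key.
SEARCH_ENDPOINT = "https://ai-search-eastus-xinyuwei.search.windows.net"
INDEX_NAME = "wukong-doc1"
API_KEY = "<your-query-key>"

url = f"{SEARCH_ENDPOINT}/indexes/{INDEX_NAME}/docs"
params = {"api-version": "2024-11-01-preview", "search": "wukong"}
headers = {"api-key": API_KEY}

resp = requests.get(url, params=params, headers=headers)
resp.raise_for_status()

# Each hit carries the fields declared in the OpenAPI schema above.
for doc in resp.json()["value"]:
    print(doc["@search.score"], doc["title"])
```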
Announcing the General Availability of Document Intelligence v4.0 API

The Document Intelligence v4.0 API is now generally available! This latest version of the Document Intelligence API brings new and updated capabilities across the entire product, including updates to the Read and Layout APIs for content extraction, prebuilt and custom extraction models for schema extraction from documents, and classification models. Document Intelligence has all the tools to enable RAG and document automation solutions for structured and unstructured documents.

Enhanced Layout capabilities

This release brings significant updates to our Layout capabilities, making it the default choice for document ingestion with enhanced support for Retrieval-Augmented Generation (RAG) workflows. The Layout API now offers a markdown output format that provides a better representation of document elements such as headers, footers, sections, section headers, and tables when working with generative AI models. This structured output enables semantic chunking of content, making it easier to ingest documents into RAG workflows and generate more accurate results. Try Layout in the Document Intelligence Studio or use Layout as a skill in your RAG pipelines with Azure Search.
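As a quick illustration, here is a minimal sketch of requesting markdown output from the Layout model with the azure-ai-documentintelligence Python package. Exact parameter names can vary between SDK versions, and the endpoint, key, and document URL are placeholders.

```python
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.core.credentials import AzureKeyCredential

# Placeholders: use your own resource endpoint and key.
client = DocumentIntelligenceClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<your-key>"),
)

# Ask the Layout model for markdown instead of JSON-only content.
poller = client.begin_analyze_document(
    "prebuilt-layout",
    {"urlSource": "https://example.com/sample.pdf"},  # hypothetical document URL
    output_content_format="markdown",
)
result = poller.result()

# result.content now holds markdown with headings, sections, and tables,
# ready for semantic chunking in a RAG pipeline.
print(result.content[:500])
```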
Searchable PDF output

Document Intelligence no longer outputs only JSON! With the 4.0 release, you can now generate a searchable PDF from an input document. The recognized text is overlaid on the scanned text, making all the content in the document instantly searchable. This feature enhances the accessibility and usability of your documents, allowing for quick and efficient information retrieval. Try the new searchable PDF output in the Studio or learn more. Searchable PDF is available as an output from the Read API at no additional cost. This release also includes several updates to the OCR model to better handle complex text recognition challenges.

New and updated prebuilt models

Prebuilt models offer a simple API to extract a defined schema from known document types. The v4.0 release adds new prebuilt models for mortgage processing, bank document processing, paystubs, credit/debit cards, checks, marriage certificates, and variants of the 1095, W4, and 1099 tax forms for US tax processing scenarios. These models are ideal for extracting specific details from documents like bank statements, checks, paystubs, and various tax forms. With over 22 prebuilt model types, Document Intelligence has models for common documents in procurement, tax, mortgage, and financial services. See the models overview for a complete list of document types supported with prebuilt models.

Query field add-on capability

Query field is an add-on capability that extends the schema extracted by any prebuilt model. This add-on is ideal when you have simple fields that need to be extracted. Query fields also work with Layout, so for simple documents you don't need to train a custom model and can just define the query fields to begin processing the document with no training. Query field supports a maximum of 20 fields per request. Try query fields in the Document Intelligence Studio with Layout or any prebuilt model.

Document classification model

The custom classification models are updated to improve the classification process and now support multi-language documents and incremental training. This allows you to update the classifier model with additional samples or classes without needing the entire training dataset. Classifiers also support analyzing Office document types (.docx, .pptx, and .xls). Version 4.0 adds a classifier copy operation for copying your classifier across resources, regions, or subscriptions, making model management easier. This version also introduces some changes in splitting behavior: by default, the custom classification model no longer splits documents during analysis. Learn more about the classification and splitting capabilities.

Improvements to custom extraction models

Custom extraction models now output confidence scores for tables, table rows, and cells. This makes validating model results much easier and provides the tools to trigger human reviews. Custom model capabilities have also improved with the addition of signature detection to neural models and support for overlapping fields. Neural models now include a paid training tier for when you have a large dataset of labeled documents to train on. Paid training enables longer training to ensure you have a model that performs better on the different variations in your training dataset. Learn more about improvements to custom extraction models.

New implementation of model compose for greater flexibility

With custom extraction models in the past, you could compose multiple models into a single composed model. When a document was analyzed with a composed model, the service picked the model best suited to process the document. With this version, model compose introduces a new implementation requiring a classification model in addition to the extraction models. This enables processing multiple instances of the same document with splitting, conditional routing, and more. Learn more about the new model compose implementation.

Get started with the v4.0 API today

The Document Intelligence v4.0 API is packed with many more updates. Start with the what's new page to learn more. You can try all of the new and updated capabilities in the Document Intelligence Studio. Explore the new REST API or the language-specific SDKs to start building or updating your document workflows.
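Before moving on, here is one more concrete example: a hedged sketch of the query field add-on described above, requesting ad-hoc fields alongside a Layout analysis with the Python SDK. The feature and parameter names follow the azure-ai-documentintelligence package but may differ by SDK version, and the field names are hypothetical.

```python
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import DocumentAnalysisFeature
from azure.core.credentials import AzureKeyCredential

client = DocumentIntelligenceClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",  # placeholder
    credential=AzureKeyCredential("<your-key>"),                     # placeholder
)

# Define up to 20 ad-hoc fields; no custom model training required.
poller = client.begin_analyze_document(
    "prebuilt-layout",
    {"urlSource": "https://example.com/contract.pdf"},  # hypothetical document
    features=[DocumentAnalysisFeature.QUERY_FIELDS],
    query_fields=["FullName", "CompanyName", "EffectiveDate"],  # hypothetical fields
)
result = poller.result()

# Each requested field comes back with a value and a confidence score.
for doc in result.documents or []:
    for name, field in (doc.fields or {}).items():
        print(name, field.content, field.confidence)
```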
Unlock Multimodal Data Insights with Azure AI Content Understanding: New Code Samples Available

We are excited to share code samples that leverage the Azure AI Content Understanding service to help you extract insights from your images, documents, videos, and audio content. These code samples are available on GitHub and cover the following:

Azure AI integrations

- Visual Document Search: Leverage Azure Document Intelligence, Content Understanding, Azure Search, and Azure OpenAI to unlock natural language search of document contents for a complex document with pictures of charts and diagrams.
- Video Chapter Generation: Generate video chapters using Azure Content Understanding and Azure OpenAI. This allows you to break long videos into smaller, labeled parts with key details, making it easier to find, share, and access the most relevant content.
- Video Content Discovery: Learn how to use Content Understanding, Azure Search, and Azure OpenAI models to process videos and create a searchable index for AI-driven content discovery.

Content Understanding operations

- Analyzer Templates: An analyzer enables you to tailor Content Understanding to extract valuable insights from your content based on your specific needs. Start quickly with these ready-made templates.
- Content Extraction: Learn how the Content Understanding API can extract semantic information from various files, including performing OCR to recognize tables in documents, transcribing audio files, and analyzing faces in videos.
- Field Extraction: This example demonstrates how to extract specific fields from your content. For instance, you can identify the invoice amount in a document, capture names mentioned in an audio file, or generate a summary of a video.
- Analyzer Training: For document scenarios, you can further enhance field extraction performance by providing a few labeled samples.
- Analyzer Management: Create a minimal analyzer, list all analyzers in your resource, and delete any analyzers you no longer need.

Azure AI Content Understanding: Turn Multimodal Content into Structured Data

Azure AI Content Understanding is a cutting-edge Azure AI offering designed to help businesses seamlessly extract insights from various content types. Built with and for generative AI, it empowers organizations to develop GenAI solutions using the latest models without needing advanced AI expertise. Content Understanding simplifies the processing of unstructured stores of documents, images, videos, and audio, transforming them into structured, actionable insights. It is versatile and adaptable across numerous industries and use case scenarios, offering customization and support for input from multiple data types. Here are a few example use cases:

- Retrieval-Augmented Generation (RAG): Enhance and integrate content from any format to power effective content searches or provide answers to frequent questions in scenarios like customer service or enterprise-wide data retrieval.
- Post-call analytics: Organizations use Content Understanding to analyze call center or meeting recordings, extracting insights like sentiment, speaker details, and topics discussed, including names, companies, and other relevant data.
- Insurance claims processing: Automate time-consuming processes like analyzing and handling insurance claims or other low-latency batch processing tasks.
- Media asset management and content creation: Extract essential features from images and videos to streamline media asset organization and enable entity-based searches for brands, settings, key products, and people.
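For orientation, here is a heavily hedged sketch of what calling a Content Understanding analyzer over REST can look like, in the spirit of the samples above. The route, API version, and response shape are assumptions based on the preview service and may not match the current API; rely on the linked GitHub samples for working code.

```python
import time

import requests

# Placeholders: your Azure AI services resource and an analyzer you created.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
KEY = "<your-key>"
ANALYZER_ID = "my-invoice-analyzer"  # hypothetical analyzer name
API_VERSION = "2024-12-01-preview"   # assumed preview version

headers = {"Ocp-Apim-Subscription-Key": KEY}

# Submit a document URL for analysis (assumed route shape).
resp = requests.post(
    f"{ENDPOINT}/contentunderstanding/analyzers/{ANALYZER_ID}:analyze",
    params={"api-version": API_VERSION},
    headers=headers,
    json={"url": "https://example.com/invoice.pdf"},  # hypothetical input
)
resp.raise_for_status()

# The operation is asynchronous; poll the operation URL until it completes.
operation_url = resp.headers["Operation-Location"]
while True:
    status = requests.get(operation_url, headers=headers).json()
    if status.get("status") in ("Succeeded", "Failed"):
        break
    time.sleep(2)

print(status)  # extracted content and fields on success
```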
Resources & Documentation

To begin extracting valuable insights from your multimodal content, explore the following resources:

- Azure Content Understanding Overview
- Azure Content Understanding in Azure AI Foundry
- FAQs

Want to get in touch? We'd love to hear from you! Send us an email at cu_contact@microsoft.com
Enhancing Workplace Safety and Efficiency with Azure AI Foundry's Content Understanding

Discover how Azure AI Foundry's Content Understanding service, featuring the Video Shot Analysis template, revolutionizes workplace safety and efficiency. By leveraging generative AI to analyze video data, businesses can gain actionable insights into worker actions, posture, safety risks, and environmental conditions. Learn how this cutting-edge tool transforms operations across industries like manufacturing, logistics, and healthcare.
Microsoft Computer Vision Test

Please refer to my repo to get more AI resources, and welcome to star it: https://github.com/xinyuwei-david/david-share.git

This article is from one of my repos: https://github.com/xinyuwei-david/david-share/tree/master/Multimodal-Models/Computer-Vision

I have developed two Python programs that run on Windows and utilize Azure Computer Vision (Azure CV):

1. Perform object recognition on images selected by the user. After the recognition is complete, the user can choose the objects they wish to retain (one or more). The selected objects are then cropped and saved locally.
2. Remove the background based on the image and the objects the user selects.

Object detection and image segmentation

Please refer to my demo video on YouTube: https://youtu.be/edjB-PDapN8

Currently, the background removal API of Azure CV has been discontinued. In the future, this functionality can be achieved through the region-to-segmentation feature of Florence-2. For a detailed implementation, please refer to: https://huggingface.co/microsoft/Florence-2-large/blob/main/sample_inference.ipynb

Object recognition and background removal

Please refer to my demo video on YouTube: https://youtu.be/6x49D3YUTGA

Code for object detection and image segmentation

```python
import requests
from PIL import Image, ImageTk, ImageDraw
import tkinter as tk
from tkinter import messagebox, filedialog
import threading

# Azure Computer Vision API credentials
subscription_key = "o"
endpoint = "https://cv-2.cognitiveservices.azure.com/"

# Image analysis function
def analyze_image(image_path):
    analyze_url = endpoint + "vision/v3.2/analyze"
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key,
        'Content-Type': 'application/octet-stream'
    }
    params = {'visualFeatures': 'Objects'}
    try:
        with open(image_path, 'rb') as image_data:
            response = requests.post(
                analyze_url,
                headers=headers,
                params=params,
                data=image_data,
                timeout=10  # 10-second timeout
            )
        response.raise_for_status()
        analysis = response.json()
        print("Image analysis complete")
        return analysis
    except requests.exceptions.Timeout:
        print("Request timed out; check your network connection and retry.")
        messagebox.showerror("Error", "Request timed out; check your network connection and retry.")
    except Exception as e:
        print("Exception in analyze_image:", e)
        messagebox.showerror("Error", f"An error occurred: {e}")

# Background removal function
def remove_background(image_path, objects_to_keep):
    print("remove_background called")
    try:
        image = Image.open(image_path).convert("RGBA")
        width, height = image.size
        # Create an image with a transparent background
        new_image = Image.new("RGBA", image.size, (0, 0, 0, 0))
        # Create a mask the same size as the image
        mask = Image.new("L", (width, height), 0)
        draw = ImageDraw.Draw(mask)
        # Draw the regions of the objects to keep onto the mask
        for obj in objects_to_keep:
            x1, y1, x2, y2 = obj['coords']
            # Convert the coordinates to integers
            x1, y1, x2, y2 = map(int, [x1, y1, x2, y2])
            # Fill the rectangle with white (the area to keep)
            draw.rectangle([x1, y1, x2, y2], fill=255)
        # Apply the mask to the original image
        new_image.paste(image, (0, 0), mask)
        print("Background removal complete; showing result")
        new_image.show()
        # Save the result
        save_path = filedialog.asksaveasfilename(
            defaultextension=".png",
            filetypes=[('PNG image', '*.png')],
            title='Save result image'
        )
        if save_path:
            new_image.save(save_path)
            messagebox.showinfo("Info", f"Processing complete; result saved to: {save_path}")
    except Exception as e:
        print("Exception in remove_background:", e)
        messagebox.showerror("Error", f"An error occurred: {e}")
    print("remove_background finished")

# GUI
def create_gui():
    # Create the main window
    root = tk.Tk()
    root.title("Select objects to keep")

    # Button handler for choosing an image
    def select_image():
        image_path = filedialog.askopenfilename(
            title='Select an image',
            filetypes=[('Image files', '*.png;*.jpg;*.jpeg;*.bmp'), ('All files', '*.*')]
        )
        if image_path:
            show_image(image_path)
        else:
            messagebox.showwarning("Warning", "No image file selected.")

    def show_image(image_path):
        analysis = analyze_image(image_path)
        if analysis is None:
            print("Analysis result is empty; cannot build the GUI")
            return
        # Load the image
        pil_image = Image.open(image_path)
        img_width, img_height = pil_image.size
        tk_image = ImageTk.PhotoImage(pil_image)
        # Create the canvas
        canvas = tk.Canvas(root, width=img_width, height=img_height)
        canvas.pack()
        # Show the image on the canvas
        canvas.create_image(0, 0, anchor='nw', image=tk_image)
        canvas.tk_image = tk_image  # Keep a reference to the image
        # Track each object's rectangle, label, and selection state
        object_items = []
        # Process each detected object
        for obj in analysis['objects']:
            rect = obj['rectangle']
            x = rect['x']
            y = rect['y']
            w = rect['w']
            h = rect['h']
            obj_name = obj['object']
            # Draw the object's bounding box
            rect_item = canvas.create_rectangle(
                x, y, x + w, y + h,
                outline='red', width=2
            )
            # Show the object's name
            text_item = canvas.create_text(
                x + w / 2, y - 10,
                text=obj_name,
                fill='red'
            )
            # Initialize the object's selection state to unselected
            selected = False
            # Record the object's details
            object_items.append({
                'rect_item': rect_item,
                'text_item': text_item,
                'coords': (x, y, x + w, y + h),
                'object': obj_name,
                'selected': selected
            })

        # Click handler: toggle selection of the clicked object
        def on_canvas_click(event):
            for item in object_items:
                x1, y1, x2, y2 = item['coords']
                if x1 <= event.x <= x2 and y1 <= event.y <= y2:
                    # Toggle selection state
                    item['selected'] = not item['selected']
                    if item['selected']:
                        # Selected: draw in green
                        canvas.itemconfig(item['rect_item'], outline='green')
                        canvas.itemconfig(item['text_item'], fill='green')
                    else:
                        # Unselected: draw in red
                        canvas.itemconfig(item['rect_item'], outline='red')
                        canvas.itemconfig(item['text_item'], fill='red')
                    break

        canvas.bind("<Button-1>", on_canvas_click)

        # Submit button handler
        def on_submit():
            print("on_submit called")
            selected_objects = []
            for item in object_items:
                if item['selected']:
                    # The object is selected; keep its details
                    selected_objects.append(item)
            if not selected_objects:
                messagebox.showwarning("Warning", "Please select at least one object.")
            else:
                # Run background removal on a worker thread
                threading.Thread(target=remove_background, args=(image_path, selected_objects)).start()
            print("on_submit finished")

        submit_button = tk.Button(root, text="Submit", command=on_submit)
        submit_button.pack()

    # Button for choosing an image
    select_button = tk.Button(root, text="Select image", command=select_image)
    select_button.pack()

    root.mainloop()

# Example usage
if __name__ == "__main__":
    create_gui()
```

Demo result:

Code for object recognition and background removal

On a GPU VM:

```python
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image, ImageDraw, ImageChops
import torch
import numpy as np
import ipywidgets as widgets
from IPython.display import display, clear_output
import io

# Load the model
model_id = 'microsoft/Florence-2-large'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype='auto'
).to(device)
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True
)

def run_example(task_prompt, image, text_input=None):
    if text_input is None:
        prompt = task_prompt
    else:
        prompt = task_prompt + text_input
    # Process inputs
    inputs = processor(
        text=prompt,
        images=image,
        return_tensors="pt"
    )
    # Move inputs to the device with appropriate data types
    inputs = {
        "input_ids": inputs["input_ids"].to(device),  # input_ids are integers (int64)
        "pixel_values": inputs["pixel_values"].to(device, torch.float16)  # pixel_values need to be float16
    }
    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            early_stopping=False,
            do_sample=False,
            num_beams=3,
        )
    generated_text = processor.batch_decode(
        generated_ids,
        skip_special_tokens=False
    )[0]
    parsed_answer = processor.post_process_generation(
        generated_text,
        task=task_prompt,
        image_size=(image.width, image.height)
    )
    return parsed_answer

def create_mask(image_size, prediction):
    mask = Image.new('L', image_size, 0)
    mask_draw = ImageDraw.Draw(mask)
    for polygons in prediction['polygons']:
        for _polygon in polygons:
            _polygon = np.array(_polygon).reshape(-1, 2)
            if len(_polygon) < 3:
                continue
            _polygon = _polygon.flatten().tolist()
            mask_draw.polygon(_polygon, outline=255, fill=255)
    return mask

def combine_masks(masks):
    combined_mask = Image.new('L', masks[0].size, 0)
    for mask in masks:
        combined_mask = ImageChops.lighter(combined_mask, mask)
    return combined_mask

def apply_combined_mask(image, combined_mask):
    # Convert the image to RGBA
    image = image.convert('RGBA')
    result_image = Image.new('RGBA', image.size, (255, 255, 255, 0))
    result_image = Image.composite(image, result_image, combined_mask)
    return result_image

def process_image_multiple_objects(image, descriptions):
    """
    Process the image for multiple object descriptions.

    Parameters:
    - image: PIL.Image object.
    - descriptions: list of strings, descriptions of objects to retain.

    Returns:
    - output_image: Processed image with the specified objects retained.
    """
    masks = []
    for desc in descriptions:
        print(f"Processing description: {desc}")
        results = run_example('<REFERRING_EXPRESSION_SEGMENTATION>', image, text_input=desc.strip())
        prediction = results['<REFERRING_EXPRESSION_SEGMENTATION>']
        if not prediction['polygons']:
            print(f"No objects found for description: {desc}")
            continue
        # Generate mask for this object
        mask = create_mask(image.size, prediction)
        masks.append(mask)
    if not masks:
        print("No objects found for any of the descriptions.")
        return image.convert('RGBA')
    # Combine all masks
    combined_mask = combine_masks(masks)
    # Apply the combined mask
    output_image = apply_combined_mask(image, combined_mask)
    return output_image

def on_file_upload(change):
    # Clear any previous output (except for the upload widget)
    clear_output(wait=True)
    display(widgets.HTML("<h3>Please upload an image file using the widget below:</h3>"))
    display(upload_button)
    # Check if a file has been uploaded
    if upload_button.value:
        # Get the first uploaded file
        uploaded_file = upload_button.value[0]
        # Access the content of the file
        image_data = uploaded_file.content
        image = Image.open(io.BytesIO(image_data)).convert('RGB')
        # Display the uploaded image
        print("Uploaded Image:")
        display(image)
        # Create a text box for object descriptions
        desc_box = widgets.Text(
            value='',
            placeholder='Enter descriptions of objects to retain, separated by commas',
            description='Object Descriptions:',
            disabled=False,
            layout=widgets.Layout(width='80%')
        )
        # Create a button to submit the descriptions
        submit_button = widgets.Button(
            description='Process Image',
            disabled=False,
            button_style='primary',
            tooltip='Click to process the image',
            icon='check'
        )

        # Function to handle the button click
        def on_submit_button_click(b):
            object_descriptions = desc_box.value
            if not object_descriptions.strip():
                print("Please enter at least one description.")
                return
            # Disable the button to prevent multiple clicks
            submit_button.disabled = True
            # Clear previous output
            clear_output(wait=True)
            print("Processing the image. This may take a few moments...")
            # Split the descriptions by commas
            descriptions_list = [desc.strip() for desc in object_descriptions.split(',') if desc.strip()]
            if not descriptions_list:
                print("No valid descriptions entered. Exiting the process.")
                return
            # Process the image
            output_image = process_image_multiple_objects(image, descriptions_list)
            # Display the result
            display(output_image)
            # Optionally, save the output image
            # Uncomment the lines below to save the image
            # output_image.save('output_image.png')
            # print("The image with background removed has been saved as 'output_image.png'")

        submit_button.on_click(on_submit_button_click)
        # Display the text box and submit button
        display(widgets.VBox([desc_box, submit_button]))

# Create the upload widget
upload_button = widgets.FileUpload(
    accept='image/*',
    multiple=False
)
display(widgets.HTML("<h3>Please upload an image file using the widget below:</h3>"))
display(upload_button)

# Observe changes in the upload widget
upload_button.observe(on_file_upload, names='value')
```

GPU resources needed during inference:
Boost Your Holiday Spirit with Azure AI

🎄✨ As we gear up for the holiday season, what better way to bring innovation to your business than by using cutting-edge Azure AI technologies? From personalized customer experiences to festive-themed data insights, here's how Azure AI can help elevate your holiday initiatives:

🎅 1. Azure OpenAI Service for Creative Content

Kickstart the holiday cheer by using Azure OpenAI to create engaging holiday content. From personalized greeting messages to festive social media posts, the GPT models can assist you in generating creative text in a snap.

🎨 Step-by-step:
- Use GPT to draft festive email newsletters, promotions, or customer-facing messages.
- Train models on your specific brand voice for customized holiday greetings.

🎁 2. Azure AI Services for Image Recognition and Generation

Enhance your holiday product offerings by leveraging image recognition to identify and categorize holiday-themed products. Additionally, create stunning holiday-themed visuals with DALL-E: generate unique images from text descriptions to make your holiday marketing materials stand out.

📸 Step-by-step:
- Use Azure Computer Vision to analyze product images and automatically categorize seasonal items.
- Implement the AI model in e-commerce platforms to help customers find holiday-specific products faster.
- Use DALL-E to generate holiday-themed images based on your descriptions.
- Customize and refine the images to fit your brand's style.
- Incorporate these visuals into your marketing campaigns.

✨ 3. Azure AI Speech Services for Holiday Customer Interaction and Audio Generation

Transform your customer service experience with Azure's Speech-to-Text and Text-to-Speech services. You can create festive voice assistants or add holiday-themed voices to your customer support lines for a warm, personalized experience. Additionally, add a festive touch to your audio content with Azure OpenAI: use models like Whisper for high-quality speech-to-text conversion, perfect for creating holiday-themed audio messages and voice assistants.

🎙️ Step-by-step:
- Use Speech-to-Text to transcribe customer feedback or support requests in real time.
- Build a holiday-themed voice model using Text-to-Speech for interactive voice assistants.
- Use Whisper to transcribe holiday messages or convert text to festive audio.
- Customize the audio to match your brand's tone and style.
- Implement these audio clips in customer interactions or marketing materials.

🎄 4. Azure Machine Learning for Predictive Holiday Trends

Stay ahead of holiday trends with Azure ML models. Use AI to analyze customer behavior, forecast demand for holiday products, and manage stock levels efficiently. Predict what your customers need before they even ask!

📊 Step-by-step:
- Use Azure ML to train models on historical sales data to predict trends in holiday shopping.
- Build dashboards using Power BI integrated with Azure for real-time tracking of holiday performance metrics.

🔔 5. Azure AI for Sentiment Analysis

Understand the holiday mood of your customers by implementing sentiment analysis on social media, reviews, and feedback. Gauge the public sentiment around your brand during the festive season and respond accordingly; a minimal code sketch follows after the steps below.

📈 Step-by-step:
- Use Text Analytics for sentiment analysis on customer feedback, reviews, or social media posts.
- Generate insights and adapt your holiday marketing based on customer sentiment trends.
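Here is a minimal sentiment analysis sketch using the azure-ai-textanalytics Python package; the endpoint, key, and review texts are placeholders.

```python
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Placeholders: use your own Language resource endpoint and key.
client = TextAnalyticsClient(
    endpoint="https://<your-language-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<your-key>"),
)

# Hypothetical holiday-season customer reviews.
reviews = [
    "The gift wrapping service was wonderful, and my order arrived early!",
    "Holiday delivery was late and support never replied.",
]

# Each result reports an overall sentiment plus per-class confidence scores.
for doc in client.analyze_sentiment(reviews):
    if not doc.is_error:
        print(doc.sentiment, doc.confidence_scores)
```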
🌟 6. Latest Azure AI Open Models

Explore the newest Azure AI models to bring even more innovation to your holiday projects:
- GPT-4o and GPT-4 Turbo: These models offer enhanced capabilities for understanding and generating natural language and code, perfect for creating sophisticated holiday content.
- Embeddings: Use these models to convert holiday-related text into numerical vectors for improved text similarity and search capabilities.

🔧 7. Azure AI Foundry

Leverage Azure AI Foundry to build, deploy, and scale AI-driven applications. This platform provides everything you need to customize, host, run, and manage AI applications, ensuring your holiday projects are innovative and efficient.

🎉 Conclusion

With Azure AI, the possibilities to brighten your business this holiday season are endless! Whether it's automating your operations or delivering personalized customer experiences, Azure's AI models can help you stay ahead of the game and spread holiday joy. Wishing everyone a season filled with innovation and success! 🎄✨
Multimodal video search powered by Video Retrieval in Azure

Video content is becoming increasingly central to business operations, from training materials to safety monitoring. As part of Azure's comprehensive video analysis capabilities, we're excited to discuss Azure Video Retrieval, a powerful service that enables natural language search across your video and image content. This service makes it easier than ever to locate exactly what you need within your media assets.

What is Azure Video Retrieval?

Azure Video Retrieval allows you to create a search index and populate it with both videos and images. Using natural language queries, you can search through this content to identify visual elements (like objects and safety events) and speech content without requiring manual transcription or specialized expertise. The service offers powerful customization options: developers can define metadata schemas for each index, ingest custom metadata, and specify which features (vision, speech) to extract and filter during search operations. Whether you're looking for specific spoken phrases or visual occurrences, the service pinpoints exact timestamps where your search criteria appear.

Key Features

- Multimodal Search: Search across both visual and audio content using natural language
- Custom Metadata Support: Define and ingest metadata schemas for enhanced retrieval
- Flexible Feature Extraction: Specify which features (vision, speech) to extract and search
- Precise Timestamp Matching: Get exact frame locations where your search criteria appear
- Multiple Content Types: Index and search both videos and images
- Simple Integration: Easy implementation with Azure Blob Storage
- Comprehensive API: Full REST API support for custom implementations

Getting Started

Prerequisites

Before you begin, you'll need:
- An Azure Cognitive Services multi-service account
- An Azure Blob Storage account for video content

Setting Up Video Indexing

The indexing process is straightforward. Here's how to create an index and upload videos:

```python
import datetime
import uuid

import requests

# blob_service_client, az_storage_* names, sas_token, payload, url, and
# headers are defined earlier in the full sample.

# Iterate through blobs and build the index
for blob in blob_service_client.get_container_client(az_storage_container_name).list_blobs():
    blob_name = blob.name
    blob_url = f"https://{az_storage_account_name}.blob.core.windows.net/{az_storage_container_name}/{blob_name}"
    # Generate SAS URL for secure access
    sas_url = blob_url + "?" + sas_token
    # Add video to index
    payload["videos"].append({
        "mode": "add",
        "documentId": str(uuid.uuid4()),
        "documentUrl": sas_url,
        "metadata": {
            "cameraId": "video-indexer-demo-camera1",
            "timestamp": datetime.datetime.now(datetime.UTC).strftime("%Y-%m-%d %H:%M:%S")
        }
    })

# Create index
response = requests.put(url, headers=headers, json=payload)
```

Searching Videos

The service supports two primary search modes:

```python
# Query templates for searching by text or speech
query_by_text = {
    "queryText": "<user query>",
    "filters": {
        "featureFilters": ["vision"],
    },
}

query_by_speech = {
    "queryText": "<user query>",
    "filters": {
        "featureFilters": ["speech"],
    },
}
```

The search input is passed to the REST API based on the mode chosen.
```python
# Function to search for video frames based on user input, using the Azure Video Retrieval service
def search_videos(query, query_type):
    url = f"https://{az_video_indexer_endpoint}/computervision/retrieval/indexes/{az_video_indexer_index_name}:queryByText?api-version={az_video_indexer_api_version}"
    headers = {
        "Ocp-Apim-Subscription-Key": az_video_indexer_key,
        "Content-Type": "application/json",
    }
    input_query = None
    if query_type == "Speech":
        query_by_speech["queryText"] = query
        input_query = query_by_speech
    else:
        query_by_text["queryText"] = query
        input_query = query_by_text
    try:
        response = requests.post(url, headers=headers, json=input_query)
        response.raise_for_status()
        print("search response \n", response.json())
        return response.json()
    except Exception as e:
        print("error", e.args)
        print("error", e)
        return None
```

The REST APIs required to complete the steps in this process are covered here.

Use Cases

Azure Video Retrieval can transform how organizations work with video content across various scenarios:

- Training and Education: Quickly locate specific topics or demonstrations within training videos
- Content Management: Efficiently organize and retrieve media assets
- Safety and Compliance: Find specific safety-related content or incidents
- Media Production: Locate specific scenes or dialogue across video libraries

Demo

Watch this sample application that uses Video Retrieval to let users search frames across multiple videos in an index. The source code of the sample application can be accessed here.

Resources

- Video Retrieval API
- Video Retrieval API reference
- Azure AI Video Indexer overview