Azure AI Vision

Exploring Azure OpenAI Assistants and Azure AI Agent Services: Benefits and Opportunities
In the rapidly evolving landscape of artificial intelligence, businesses are increasingly turning to cloud-based solutions to harness the power of AI. Microsoft Azure offers two prominent services in this domain: Azure OpenAI Assistants and Azure AI Agent Services. While both services aim to enhance user experiences and streamline operations, they cater to different needs and use cases. This blog post will delve into the details of each service, their benefits, and the opportunities they present for businesses. Understanding Azure OpenAI Assistants What Are Azure OpenAI Assistants? Azure OpenAI Assistants are designed to leverage the capabilities of OpenAI's models, such as GPT-3 and its successors. These assistants are tailored for applications that require advanced natural language processing (NLP) and understanding, making them ideal for conversational agents, chatbots, and other interactive applications. Key Features Pre-trained Models: Azure OpenAI Assistants utilize pre-trained models from OpenAI, which means they come with a wealth of knowledge and language understanding out of the box. This reduces the time and effort required for training models from scratch. Customizability: While the models are pre-trained, developers can fine-tune them to meet specific business needs. This allows for the creation of personalized experiences that resonate with users. Integration with Azure Ecosystem: Azure OpenAI Assistants seamlessly integrate with other Azure services, such as Azure Functions, Azure Logic Apps, and Azure Cognitive Services. This enables businesses to build comprehensive solutions that leverage multiple Azure capabilities. Benefits of Azure OpenAI Assistants Enhanced User Experience: By utilizing advanced NLP capabilities, Azure OpenAI Assistants can provide more natural and engaging interactions. This leads to improved customer satisfaction and loyalty. Rapid Deployment: The availability of pre-trained models allows businesses to deploy AI solutions quickly. This is particularly beneficial for organizations looking to implement AI without extensive development time. Scalability: Azure's cloud infrastructure ensures that applications built with OpenAI Assistants can scale to meet growing user demands without compromising performance. Understanding Azure AI Agent Services What Are Azure AI Agent Services? Azure AI Agent Services provide a more flexible framework for building AI-driven applications. Unlike Azure OpenAI Assistants, which are limited to OpenAI models, Azure AI Agent Services allow developers to utilize a variety of AI models, including those from other providers or custom-built models. Key Features Model Agnosticism: Developers can choose from a wide range of AI models, enabling them to select the best fit for their specific use case. This flexibility encourages innovation and experimentation. Custom Agent Development: Azure AI Agent Services support the creation of custom agents that can perform a variety of tasks, from simple queries to complex decision-making processes. Integration with Other AI Services: Like OpenAI Assistants, Azure AI Agent Services can integrate with other Azure services, allowing for the creation of sophisticated AI solutions that leverage multiple technologies. Benefits of Azure AI Agent Services Diverse Use Cases: The ability to use any AI model opens a world of possibilities for businesses. 
Whether it's a specialized model for sentiment analysis or a custom-built model for a niche application, organizations can tailor their solutions to meet specific needs. Enhanced Automation: AI agents can automate repetitive tasks, freeing up human resources for more strategic activities. This leads to increased efficiency and productivity. Cost-Effectiveness: By allowing the use of various models, businesses can choose cost-effective solutions that align with their budget and performance requirements. Opportunities for Businesses Improved Customer Engagement Both Azure OpenAI Assistants and Azure AI Agent Services can significantly enhance customer engagement. By providing personalized and context-aware interactions, businesses can create a more satisfying user experience. For example, a retail company can use an AI assistant to provide tailored product recommendations based on customer preferences and past purchases. Data-Driven Decision Making AI agents can analyze vast amounts of data and provide actionable insights. This capability enables organizations to make informed decisions based on real-time data analysis. For instance, a financial institution can deploy an AI agent to monitor market trends and provide investment recommendations to clients. Streamlined Operations By automating routine tasks, businesses can streamline their operations and reduce operational costs. For example, a customer support team can use AI agents to handle common inquiries, allowing human agents to focus on more complex issues. Innovation and Experimentation The flexibility of Azure AI Agent Services encourages innovation. Developers can experiment with different models and approaches to find the most effective solutions for their specific challenges. This culture of experimentation can lead to breakthroughs in product development and service delivery. Enhanced Analytics and Insights Integrating AI agents with analytics tools can provide businesses with deeper insights into customer behavior and preferences. This data can inform marketing strategies, product development, and customer service improvements. For example, a company can analyze interactions with an AI assistant to identify common customer pain points, allowing them to address these issues proactively. Conclusion In summary, both Azure OpenAI Assistants and Azure AI Agent Services offer unique advantages that can significantly benefit businesses looking to leverage AI technology. Azure OpenAI Assistants provide a robust framework for building conversational agents using advanced OpenAI models, making them ideal for applications that require sophisticated natural language understanding and generation. Their ease of integration, rapid deployment, and enhanced user experience make them a compelling choice for businesses focused on customer engagement. Azure AI Agent Services, on the other hand, offer unparalleled flexibility by allowing developers to utilize a variety of AI models. This model-agnostic approach encourages innovation and experimentation, enabling businesses to tailor solutions to their specific needs. The ability to automate tasks and streamline operations can lead to significant cost savings and increased efficiency. 
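The post stays at the conceptual level, so here is a minimal sketch of what an assistant-based flow can look like with the OpenAI Python SDK pointed at Azure OpenAI. The endpoint, key, API version, and the gpt-4o deployment name are placeholders and assumptions, not values from the post; adjust them to your own resource.

# Minimal sketch: creating and running an Azure OpenAI Assistant (placeholder endpoint, key, and deployment).
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # assumption: your Azure OpenAI endpoint
    api_key="<your-api-key>",
    api_version="2024-05-01-preview",  # assumption: any API version that supports Assistants
)

# Create an assistant backed by a model deployment in your resource.
assistant = client.beta.assistants.create(
    model="gpt-4o",  # assumption: the name of your model deployment
    name="retail-recommendation-assistant",
    instructions="Recommend products based on the customer's stated preferences.",
)

# Each conversation lives in a thread; add a user message and run the assistant.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="I need a lightweight laptop for travel."
)
run = client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=assistant.id)

if run.status == "completed":
    for message in client.beta.threads.messages.list(thread_id=thread.id):
        print(message.role, ":", message.content[0].text.value)

The same pattern extends to tool-enabled agents; the main design choice is whether conversation state lives in threads managed by the service, as above, or in your own application.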
Additional Resources

To further explore Azure OpenAI Assistants and Azure AI Agent Services, consider the following resources:
- Agent Service on Microsoft Learn Docs
- Watch On-Demand Sessions: Streamlining Customer Service with AI-Powered Agents: Building Intelligent Multi-Agent Systems with Azure AI
- Microsoft Learn: Develop AI agents on Azure - Training | Microsoft Learn
- Community and Announcements: Tech Community Announcement: Introducing Azure AI Agent Service
- Bonus Blog Post: Announcing the Public Preview of Azure AI Agent Service
- AI Agents for Beginners 10-Lesson Course: https://aka.ms/ai-agents-beginners

Learn about Azure AI during the Global AI Bootcamp 2025
The Global AI Bootcamp is starting next week, and it's more exciting than ever! With 135 bootcamps in 44 countries, this is your chance to be part of a global movement in AI innovation. 🤖🌍 From Germany to India, Nigeria to Canada, and beyond, join us for hands-on workshops, expert talks, and networking opportunities that will boost your AI skills and career. Whether you're a seasoned pro or just starting out, there's something for everyone! 🚀

Why Attend?
- 🛠️ Hands-on Workshops: Build and deploy AI models.
- 🎤 Expert Talks: Learn the latest trends from industry leaders.
- 🤝 Network: Connect with peers, mentors, and potential collaborators.
- 📈 Career Growth: Discover new career paths in AI.

Don't miss this incredible opportunity to learn, connect, and grow! Check out the event in your city or join virtually. Let's shape the future of AI together! 🌟 👉 Explore All Bootcamps

From Foundry to Fine-Tuning: Topics you Need to Know in Azure AI Services
With so many new features from Azure and newer ways of development, especially in generative AI, you must be wondering what all the different things you need to know are and where to start in Azure AI. Whether you're a developer or an IT professional, this guide will help you understand the key features, use cases, and documentation links for each service. Let's explore how Azure AI can transform your projects and drive innovation in your organization. Stay tuned for more details!

Each entry below lists the term, its description, a typical use case, and the related Azure resource.

- Azure AI Foundry: A comprehensive platform for building, deploying, and managing AI-driven applications. Use case: customizing, hosting, running, and managing AI applications. Resource: Azure AI Foundry
- AI Agent: Within Azure AI Foundry, an AI Agent acts as a "smart" microservice that can be used to answer questions (RAG), perform actions, or completely automate workflows. Use case: automating tasks, improving efficiency, and enhancing user experiences across a variety of applications. Resource: Link
- AutoGen: An open-source framework designed for building and managing AI agents, supporting workflows with multiple agents. Use case: developing complex AI applications with multiple agents. Resource: AutoGen
- Multi-Agent: AI systems where multiple AI agents collaborate to solve complex tasks. Use case: managing energy in smart grids, coordinating drones. Resource: Link
- Model as a Platform: A business model leveraging digital infrastructure to facilitate interactions between user groups. Use case: social media channels, online marketplaces, crowdsourcing websites. Resource: Link
- Azure OpenAI Service: Provides access to OpenAI's powerful language models integrated into the Azure platform. Use case: text generation, summarization, translation, conversational AI. Resource: Azure OpenAI Service
- Azure AI Services: A suite of APIs and services designed to add AI capabilities like image analysis, speech-to-text, and language understanding to applications. Use case: image analysis, speech-to-text, language understanding. Resource: Link
- Azure Machine Learning (Azure ML): A cloud-based service for building, training, and deploying machine learning models. Use case: creating models to predict sales or detect fraud. Resource: Azure Machine Learning
- Azure AI Search: An AI-powered search service that enriches information to facilitate exploration. Use case: enterprise search, e-commerce search, knowledge mining. Resource: Azure AI Search
- Azure Bot Service: A platform for developing intelligent, enterprise-grade bots. Use case: creating chatbots for customer service, virtual assistants. Resource: Azure Bot Service
- Deep Learning: A subset of ML using neural networks with many layers to analyze complex data. Use case: image and speech recognition, natural language processing. Resource: Link
- Multimodal AI: AI that integrates and processes multiple types of data, such as text and images (covering both input and output). Use case: describing images, answering questions about pictures. Resource: Azure OpenAI Service, Azure AI Services
- Unimodal AI: AI that processes a single type of data, such as text or images (covering both input and output). Use case: writing text, recognizing objects in photos. Resource: Azure OpenAI Service, Azure AI Services
- Fine-Tuning Models: Adapting pre-trained models to specific tasks or datasets for improved performance. Use case: customizing models for specific industries like healthcare. Resource: Azure AI Foundry
- Model Catalog: A repository of pre-trained models available for use in AI projects. Use case: discovering, evaluating, fine-tuning, and deploying models. Resource: Model Catalog
- Capacity & Quotas: Limits and quotas for using Azure AI services, ensuring optimal resource allocation. Use case: managing resource usage and scaling AI applications. Resource: Link
- Tokens: Units of text processed by language models, affecting cost and performance. Use case: managing and optimizing text-processing tasks. Resource: Link
- TPM (Tokens per Minute): A measure of the rate at which tokens are processed, impacting throughput and performance. Use case: allocating and managing processing capacity for AI models. Resource: Link
- PTU (Provisioned Throughput): The provisioned throughput capability allows you to specify the amount of throughput you require in a deployment. Use case: ensuring predictable performance for AI applications. Resource: Link
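Because Tokens, TPM, and PTU quotas are all expressed in tokens, it helps to estimate token counts before sizing a deployment. The sketch below uses the open-source tiktoken library for a rough estimate; the encoding name is an assumption (it varies by model family), and the exact count billed by a given Azure OpenAI deployment can differ slightly.

# Rough token estimate for budgeting against TPM / PTU quotas (illustrative only).
import tiktoken

def estimate_tokens(text: str, encoding_name: str = "o200k_base") -> int:
    # "o200k_base" is the encoding used by GPT-4o-family models (assumption: adjust per model).
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

prompt = "Summarize the key differences between fine-tuning and prompt engineering."
tokens = estimate_tokens(prompt)
print(f"~{tokens} tokens per request; at 1,000 requests/minute that is roughly {tokens * 1000} TPM.")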
Real Time, Real You: Announcing General Availability of Face Liveness Detection

A Milestone in Identity Verification

We are excited to announce the general availability of our face liveness detection features, a key milestone in making identity verification both seamless and secure. As deepfake technology and sophisticated spoofing attacks continue to evolve, organizations need solutions that can verify the authenticity of an individual in real time. During the preview, we listened to customer feedback, expanded capabilities, and made significant improvements to ensure that liveness detection works across three platforms and for common use cases.

What's New Since the Preview?

During the preview, we introduced several features that laid the foundation for secure and seamless identity verification, including an active challenge in the JavaScript library. Building on that foundation, there are improvements across the board. Here's what's new:
- Feature Parity Across Platforms: Liveness detection's active challenge is now available on both Android and iOS, achieving full feature parity across all supported devices. This allows a consistent and seamless experience for both developers and end users on all three supported platforms.
- Easy Integration: The liveness detection client SDK now requires only a single function call to start the entire flow, making it easier for developers to integrate. The SDK also includes an integrated UI flow to simplify implementation, allowing a seamless developer experience across platforms.
- Runtime Environment Safety: The liveness detection client SDK now includes an integrated safety check for untrustworthy runtime environments on both iOS and Android devices.
- Accuracy and Usability Improvements: We've delivered numerous bug fixes and enhancements to improve detection accuracy and user experience across all supported platforms. Our solution is now faster, more intuitive, and more resilient against even the most advanced spoofing techniques.

These advancements help businesses integrate liveness detection with confidence, providing both security and convenience.

Security in Focus: Microsoft's Commitment to Innovation

As identity verification threats continue to evolve, general availability is only the start of the journey. Microsoft is dedicated to advancing our face liveness detection technology to address evolving security challenges:
- Continuous Support and Innovation: Our team is actively monitoring emerging spoofing techniques. With ongoing updates and enhancements, we ensure that our liveness detection solution adapts to new challenges. Learn more about liveness detection updates.
- Security and Privacy by Design: Microsoft's principles of security and privacy are built into every step. We provide robust support to assist customers in integrating and maintaining these solutions effectively. We process data securely, respecting user privacy and complying with global regulations. By collaborating closely with our customers, we ensure that together, we build solutions that are not only innovative but also secure. Learn more about shared responsibility in liveness solutions.

We provide reliable, long-term solutions to help organizations stay ahead of threats.

Get Started Today

We're excited for customers to experience the benefits of real-time liveness detection. Whether you're safeguarding financial transactions, streamlining digital onboarding, or enabling secure logins, our solution can strengthen your security.
- Explore: Learn more about integrating liveness detection into your applications with this tutorial.
- Try It Out: Liveness detection is available to experience in Vision Studio.
- Build with Confidence: Empower your organization with secure, real-time identity verification. Try our sample code to see how easy it is to get started: Azure-Samples/azure-ai-vision-sdk

A Step Toward a Safer Future

With a focus on real-time, reliable identity verification, we're making identity verification smarter, faster, and safer. As we continue to improve and evolve this solution, our goal remains the same: to protect identities, build trust, and verify that the person behind the screen is really you. Start building with liveness detection today and join us on this journey toward a more secure digital world.
Dify Works with Microsoft AI Search

Please refer to my repo for more AI resources, and feel free to star it: https://github.com/xinyuwei-david/david-share.git This article is from one of my repos: https://github.com/xinyuwei-david/david-share/tree/master/LLMs/Dify-With-AI-Search

Dify is an open-source platform for developing large language model (LLM) applications. It combines the concepts of Backend as a Service (BaaS) and LLMOps, enabling developers to quickly build production-grade generative AI applications. Dify offers various types of tools, including first-party and custom tools. These tools can extend the capabilities of LLMs, such as web search, scientific calculations, image generation, and more. On Dify, you can create more powerful AI applications, like intelligent assistant-type applications, which can complete complex tasks through task reasoning, step decomposition, and tool invocation.

Dify works with AI Search: Demo

As of now, Dify cannot integrate with Azure AI Search directly through the default Dify web portal. Let me show how to achieve it. Please click the picture below to see my demo video on YouTube: https://www.youtube.com/watch?v=20GjS6AtjTo

Dify works with AI Search: Configuration steps

Configure Azure AI Search. Create the index and make sure you can get results from the AI Search index:

Run Dify on a VM via Docker:

root@a100vm:~# docker ps |grep -i dify
5d6c32a94313 langgenius/dify-api:0.8.3 "/bin/bash /entrypoi…" 3 months ago Up 3 minutes 5001/tcp docker-worker-1
264e477883ee langgenius/dify-api:0.8.3 "/bin/bash /entrypoi…" 3 months ago Up 3 minutes 5001/tcp docker-api-1
2eb90cd5280a langgenius/dify-sandbox:0.2.9 "/main" 3 months ago Up 3 minutes (healthy) docker-sandbox-1
708937964fbb langgenius/dify-web:0.8.3 "/bin/sh ./entrypoin…" 3 months ago Up 3 minutes 3000/tcp docker-web-1

Create a custom tool in the Dify portal and set its schema. Schema details:

{
  "openapi": "3.0.0",
  "info": { "title": "Azure Cognitive Search Integration", "version": "1.0.0" },
  "servers": [ { "url": "https://ai-search-eastus-xinyuwei.search.windows.net" } ],
  "paths": {
    "/indexes/wukong-doc1/docs": {
      "get": {
        "operationId": "getSearchResults",
        "parameters": [
          { "name": "api-version", "in": "query", "required": true, "schema": { "type": "string", "example": "2024-11-01-preview" } },
          { "name": "search", "in": "query", "required": true, "schema": { "type": "string" } }
        ],
        "responses": {
          "200": {
            "description": "Successful response",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "@odata.context": { "type": "string" },
                    "value": {
                      "type": "array",
                      "items": {
                        "type": "object",
                        "properties": {
                          "@search.score": { "type": "number" },
                          "chunk_id": { "type": "string" },
                          "parent_id": { "type": "string" },
                          "title": { "type": "string" },
                          "chunk": { "type": "string" },
                          "text_vector": { "type": "array", "items": { "type": "number" } }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Set the AI Search API key:

Do a search test. Input words:

Create a workflow on Dify:

Check the AI Search stage:

Check the LLM stage:

Run the workflow:

Get the workflow result:
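To verify the index outside Dify, you can issue the same GET request the custom tool makes. Here is a small sketch that reuses the endpoint, index name, and api-version from the schema above; the query key and search text are placeholders you need to fill in.

# Quick sanity check against the same Azure AI Search endpoint and index the Dify tool calls.
import requests

endpoint = "https://ai-search-eastus-xinyuwei.search.windows.net"
index_name = "wukong-doc1"
api_key = "<your-query-or-admin-key>"  # placeholder

url = f"{endpoint}/indexes/{index_name}/docs"
params = {
    "api-version": "2024-11-01-preview",
    "search": "Wukong",  # placeholder search text
    "$top": 3,
}
response = requests.get(url, params=params, headers={"api-key": api_key}, timeout=10)
response.raise_for_status()

# Print the score and title of the top matches, mirroring what the Dify tool returns.
for doc in response.json().get("value", []):
    print(doc.get("@search.score"), doc.get("title"))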
Announcing the General Availability of Document Intelligence v4.0 API

The Document Intelligence v4.0 API is now generally available! This latest version of the Document Intelligence API brings new and updated capabilities across the entire product, including updates to the Read and Layout APIs for content extraction, prebuilt and custom extraction models for schema extraction from documents, and classification models. Document Intelligence has all the tools to enable RAG and document automation solutions for structured and unstructured documents.

Enhanced Layout capabilities

This release brings significant updates to our Layout capabilities, making it the default choice for document ingestion with enhanced support for Retrieval-Augmented Generation (RAG) workflows. The Layout API now offers a markdown output format that provides a better representation of document elements such as headers, footers, sections, section headers, and tables when working with Gen AI models. This structured output enables semantic chunking of content, making it easier to ingest documents into RAG workflows and generate more accurate results. Try Layout in the Document Intelligence Studio or use Layout as a skill in your RAG pipelines with Azure Search.

Searchable PDF output

Document Intelligence no longer outputs only JSON! With the 4.0 release, you can now generate a searchable PDF output from an input document. The recognized text is overlaid on the scanned text, making all the content in the documents instantly searchable. This feature enhances the accessibility and usability of your documents, allowing for quick and efficient information retrieval. Try the new searchable PDF output in the Studio or learn more. Searchable PDF is available as an output from the Read API at no additional cost. This release also includes several updates to the OCR model to better handle complex text recognition challenges.

New and updated Prebuilt models

Prebuilt models offer a simple API to extract a defined schema from known document types. The v4.0 release adds new prebuilt models for mortgage processing, bank document processing, paystub, credit/debit card, check, and marriage certificate, plus prebuilt models for processing variants of the 1095, W4, and 1099 tax forms for US tax processing scenarios. These models are ideal for extracting specific details from documents like bank statements, checks, paystubs, and various tax forms. With over 22 prebuilt model types, Document Intelligence has models for common documents in procurement, tax, mortgage, and financial services. See the models overview for a complete list of document types supported with prebuilt models.

Query field add-on capability

Query field is an add-on capability to extend the schema extracted from any prebuilt model. This add-on capability is ideal when you have simple fields that need to be extracted. Query fields also work with Layout, so for simple documents you don't need to train a custom model and can just define the query fields to begin processing the document with no training. Query field supports a maximum of 20 fields per request. Try query field in the Document Intelligence Studio with Layout or any prebuilt model.

Document classification model

The custom classification models are updated to improve the classification process and now support multi-language documents and incremental training. This allows you to update the classifier model with additional samples or classes without needing the entire training dataset. Classifiers also support analyzing Office document types (.docx, .pptx, and .xls).
Version 4.0 adds a classifier copy operation for copying your classifier across resources, regions, or subscriptions, making model management easier. This version also introduces some changes in the splitting behavior: by default, the custom classification model no longer splits documents during analysis. Learn more about the classification and splitting capabilities.

Improvements to Custom Extraction models

Custom extraction models now output confidence scores for tables, table rows, and cells. This makes the process of validating model results much easier and provides the tools to trigger human reviews. Custom model capabilities have also improved with the addition of signature detection to neural models and support for overlapping fields. Neural models now include a paid training tier for when you have a large dataset of labeled documents to train. Paid training enables longer training to ensure you have a model that performs better on the different variations in your training dataset. Learn more about improvements to custom extraction models.

New implementation of model compose for greater flexibility

With custom extraction models in the past, you could compose multiple models into a single composed model. When a document was analyzed with a composed model, the service picked the model best suited to process the document. With this version, model compose introduces a new implementation that requires a classification model in addition to the extraction models. This enables processing multiple instances of the same document with splitting, conditional routing, and more. Learn more about the new model compose implementation.

Get started with the v4.0 API today

The Document Intelligence v4.0 API is packed with many more updates. Start with the what's new page to learn more. You can try all of the new and updated capabilities in the Document Intelligence Studio. Explore the new REST API or the language-specific SDKs to start building or updating your document workflows.
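For orientation, here is a minimal sketch of calling the v4.0 Layout model with markdown output through the azure-ai-documentintelligence Python package. The endpoint, key, and document URL are placeholders, and parameter and enum names can differ slightly between SDK versions, so verify against the SDK reference before using it.

# Minimal sketch: Layout analysis with markdown output for RAG ingestion (placeholder endpoint, key, URL).
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
from azure.core.credentials import AzureKeyCredential

client = DocumentIntelligenceClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

poller = client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source="https://example.com/sample-report.pdf"),
    output_content_format="markdown",  # markdown output enables semantic chunking in RAG pipelines
)
result = poller.result()

print(result.content[:500])  # markdown with sections, headers, and tables preserved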
Unlock Multimodal Data Insights with Azure AI Content Understanding: New Code Samples Available

We are excited to share code samples that leverage the Azure AI Content Understanding service to help you extract insights from your images, documents, videos, and audio content. These code samples are available on GitHub and cover the following:

Azure AI integrations
- Visual Document Search: Leverage Azure Document Intelligence, Content Understanding, Azure Search, and Azure OpenAI to unlock natural-language search of document contents for a complex document with pictures of charts and diagrams.
- Video Chapter Generation: Generate video chapters using Azure Content Understanding and Azure OpenAI. This allows you to break long videos into smaller, labeled parts with key details, making it easier to find, share, and access the most relevant content.
- Video Content Discovery: Learn how to use Content Understanding, Azure Search, and Azure OpenAI models to process videos and create a searchable index for AI-driven content discovery.

Content Understanding Operations
- Analyzer Templates: An analyzer enables you to tailor Content Understanding to extract valuable insights from your content based on your specific needs. Start quickly with these ready-made templates.
- Content Extraction: Learn how the Content Understanding API can extract semantic information from various files, including performing OCR to recognize tables in documents, transcribing audio files, and analyzing faces in videos.
- Field Extraction: This example demonstrates how to extract specific fields from your content. For instance, you can identify the invoice amount in a document, capture names mentioned in an audio file, or generate a summary of a video.
- Analyzer Training: For document scenarios, you can further enhance field extraction performance by providing a few labeled samples.
- Analyzer Management: Create a minimal analyzer, list all analyzers in your resource, and delete any analyzers you no longer need.

Azure AI Content Understanding: Turn Multimodal Content into Structured Data

Azure AI Content Understanding is a cutting-edge Azure AI offering designed to help businesses seamlessly extract insights from various content types. Built with and for generative AI, it empowers organizations to develop GenAI solutions using the latest models without needing advanced AI expertise. Content Understanding simplifies the processing of unstructured stores of documents, images, videos, and audio, transforming them into structured, actionable insights. It is versatile and adaptable across numerous industries and use-case scenarios, offering customization and support for input from multiple data types. Here are a few example use cases (a request sketch follows this list):
- Retrieval Augmented Generation (RAG): Enhance and integrate content from any format to power effective content searches or provide answers to frequent questions in scenarios like customer service or enterprise-wide data retrieval.
- Post-call analytics: Organizations use Content Understanding to analyze call center or meeting recordings, extracting insights like sentiment, speaker details, and topics discussed, including names, companies, and other relevant data.
- Insurance claims processing: Automate time-consuming processes like analyzing and handling insurance claims or other low-latency batch processing tasks.
- Media asset management and content creation: Extract essential features from images and videos to streamline media asset organization and enable entity-based searches for brands, settings, key products, and people.
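The GitHub samples above are the authoritative reference; as a rough orientation only, the REST pattern is an asynchronous analyze call followed by polling an Operation-Location header. The sketch below illustrates that pattern; the route, api-version, and analyzer ID shown are assumptions, so take the exact values from the linked samples and documentation.

# Illustrative async call-and-poll pattern for Content Understanding (route and api-version are assumptions).
import time
import requests

endpoint = "https://<your-ai-services-resource>.cognitiveservices.azure.com"  # placeholder
key = "<your-key>"  # placeholder
analyzer_id = "prebuilt-documentAnalyzer"  # assumption: an analyzer name from the sample templates
api_version = "2024-12-01-preview"  # assumption: check the samples for the current version

headers = {"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"}
url = f"{endpoint}/contentunderstanding/analyzers/{analyzer_id}:analyze?api-version={api_version}"

# Submit the file by URL, then poll the returned operation until it finishes.
submit = requests.post(url, headers=headers, json={"url": "https://example.com/sample.pdf"})
submit.raise_for_status()
operation_url = submit.headers["Operation-Location"]

while True:
    status = requests.get(operation_url, headers=headers).json()
    if status.get("status", "").lower() in ("succeeded", "failed"):
        break
    time.sleep(2)

print(status)  # structured content and fields extracted by the analyzer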
Resources & Documentation

To begin extracting valuable insights from your multimodal content, explore the following resources:
- Azure Content Understanding Overview
- Azure Content Understanding in Azure AI Foundry
- FAQs

Want to get in touch? We'd love to hear from you! Send us an email at cu_contact@microsoft.com

Fine-Tuning and Deploying Phi-3.5 Model with Azure and AI Toolkit
What is Phi-3.5?

Phi-3.5 is a state-of-the-art language model with strong multilingual capabilities. It is designed to handle multiple languages with high proficiency, making it a versatile tool for Natural Language Processing (NLP) tasks across different linguistic backgrounds.

Key Features of Phi-3.5

The core features of the Phi-3.5 model:
- Multilingual Capabilities: The model supports a wide variety of languages, including major world languages such as English, Spanish, Chinese, and French. For example, it can translate a sentence or document from one language to another without losing context or meaning.
- Fine-Tuning Ability: The model can be fine-tuned for specific use cases. For instance, in a customer support setting, Phi-3.5 can be fine-tuned to understand the nuances of the different languages used by customers across the globe, improving response accuracy.
- High Performance in NLP Tasks: Phi-3.5 is optimized for tasks like text classification, machine translation, summarization, and more. It performs strongly on large-scale datasets and produces coherent, contextually correct language outputs.

Applications in Real-World Scenarios

A few real-world applications where the Phi-3.5 model can be utilized:
- Customer Support Chatbots: For companies with global customer bases, the model's multilingual support can enhance chatbot capabilities, allowing for real-time responses in a customer's native language, no matter where they are located.
- Content Creation for Global Markets: Businesses can use Phi-3.5 to automatically generate or translate content for different regions. For example, marketing copy can be adapted to fit cultural and linguistic nuances in multiple languages.
- Document Summarization Across Languages: The model can summarize long documents or articles written in one language and then translate the summary into another language, improving access to information for non-native speakers.

Why Choose Phi-3.5 for Your Project?
- Versatility: It is not limited to just one or two languages but performs well across many.
- Customization: The ability to fine-tune it for particular use cases or industries makes it highly adaptable.
- Ease of Deployment: With tools like Azure ML and Ollama, deploying Phi-3.5 in the cloud or locally is accessible even for smaller teams.

Objective of This Blog

Specialized Language Models (SLMs) are at the forefront of advancements in Natural Language Processing, offering fine-tuned, high-performance solutions for specific tasks and languages. Among these, the Phi-3.5 model has emerged as a powerful tool, excelling in its multilingual capabilities. Whether you're working with English, Spanish, Mandarin, or any other major world language, Phi-3.5 offers robust, reliable language processing that adapts to various real-world applications. This makes it an ideal choice for businesses looking to deploy multilingual chatbots, automate content generation, or translate customer interactions in real time. Moreover, its fine-tuning ability allows for customization, making Phi-3.5 versatile across industries and tasks.

Customization and Fine-Tuning for Different Applications

The Phi-3.5 model is not just limited to general language understanding tasks.
It can be fine-tuned for specific applications, industries, and language models, allowing users to tailor its performance to meet their needs. Customizable for Industry-Specific Use Cases: With fine-tuning, the model can be trained further on domain-specific data to handle particular use cases like legal document translation, medical records analysis, or technical support. Example: A healthcare company can fine-tune Phi-3.5 to understand medical terminology in multiple languages, enabling it to assist in processing patient records or generating multilingual health reports. Adapting for Specialized Tasks: You can train Phi-3.5 to perform specialized tasks like sentiment analysis, text summarization, or named entity recognition in specific languages. Fine-tuning helps enhance the model's ability to handle unique text formats or requirements. Example: A marketing team can fine-tune the model to analyse customer feedback in different languages to identify trends or sentiment across various regions. The model can quickly classify feedback as positive, negative, or neutral, even in less widely spoken languages like Arabic or Korean. Applications in Real-World Scenarios To illustrate the versatility of Phi-3.5, here are some real-world applications where this model excels, demonstrating its multilingual capabilities and customization potential: Case Study 1: Multilingual Customer Support Chatbots Many global companies rely on chatbots to handle customer queries in real-time. With Phi-3.5’s multilingual abilities, businesses can deploy a single model that understands and responds in multiple languages, cutting down on the need to create language-specific chatbots. Example: A global airline can use Phi-3.5 to power its customer service bot. Passengers from different countries can inquire about their flight status or baggage policies in their native languages—whether it's Japanese, Hindi, or Portuguese—and the model responds accurately in the appropriate language. Case Study 2: Multilingual Content Generation Phi-3.5 is also useful for businesses that need to generate content in different languages. For example, marketing campaigns often require creating region-specific ads or blog posts in multiple languages. Phi-3.5 can help automate this process by generating localized content that is not just translated but adapted to fit the cultural context of the target audience. Example: An international cosmetics brand can use Phi-3.5 to automatically generate product descriptions for different regions. Instead of merely translating a product description from English to Spanish, the model can tailor the description to fit cultural expectations, using language that resonates with Spanish-speaking audiences. Case Study 3: Document Translation and Summarization Phi-3.5 can be used to translate or summarize complex documents across languages. Its ability to preserve meaning and context across languages makes it ideal for industries where accuracy is crucial, such as legal or academic fields. Example: A legal firm working on cross-border cases can use Phi-3.5 to translate contracts or legal briefs from German to English, ensuring the context and legal terminology are accurately preserved. It can also summarize lengthy documents in multiple languages, saving time for legal teams. Fine-Tuning Phi-3.5 Model Fine-tuning a language model like Phi-3.5 is a crucial step in adapting it to perform specific tasks or cater to specific domains. 
This section will walk through what fine-tuning is, its importance in NLP, and how to fine-tune the Phi-3.5 model using the Azure Model Catalog for different languages and tasks. We'll also explore a code example and best practices for evaluating and validating the fine-tuned model.

What is Fine-Tuning?

Fine-tuning refers to the process of taking a pre-trained model and adapting it to a specific task or dataset by training it further on domain-specific data. In the context of NLP, fine-tuning is often required to ensure that the language model understands the nuances of a particular language, industry-specific terminology, or a specific use case.

Why Fine-Tuning is Necessary

Pre-trained Large Language Models (LLMs) are trained on diverse datasets and can handle various tasks like text summarization, generation, and question answering. However, they may not perform optimally in specialized domains without fine-tuning. The goal of fine-tuning is to enhance the model's performance on specific tasks by leveraging its prior knowledge while adapting it to new contexts.

Challenges of Fine-Tuning
- Resource Intensiveness: Fine-tuning large models can be computationally expensive, requiring significant hardware resources.
- Storage Costs: Each fine-tuned model can be large, leading to increased storage needs when deploying multiple models for different tasks.

LoRA and QLoRA

To address these challenges, techniques like LoRA (Low-Rank Adaptation) and QLoRA (Quantized Low-Rank Adaptation) have emerged. Both methods aim to make the fine-tuning process more efficient:
- LoRA: This technique reduces the number of trainable parameters by introducing low-rank matrices into the model while keeping the original model weights frozen. This approach minimizes memory usage and speeds up the fine-tuning process.
- QLoRA: An enhancement of LoRA, QLoRA incorporates quantization techniques to further reduce memory requirements and increase the efficiency of the fine-tuning process. It allows for the deployment of large models on consumer hardware without the extensive resource demands typically associated with full fine-tuning.

from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
from peft import get_peft_model, LoraConfig

# Load a pre-trained model
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Configure LoRA
lora_config = LoraConfig(
    r=16,  # rank of the low-rank update matrices
    lora_alpha=32,
    lora_dropout=0.1,
    task_type="SEQ_CLS",  # sequence classification, so the classifier head stays trainable
)

# Wrap the model with LoRA
model = get_peft_model(model, lora_config)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
)

# Create a Trainer (train_dataset and eval_dataset are assumed to be tokenized datasets prepared earlier)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

# Start fine-tuning
trainer.train()

This code outlines how to set up a model for fine-tuning using LoRA, which can significantly reduce the resource requirements while still adapting the model effectively to specific tasks. In summary, fine-tuning with methods like LoRA and QLoRA is essential for optimizing pre-trained models for specific applications in NLP, making it feasible to deploy these powerful models in various domains efficiently.

Why is Fine-Tuning Important in NLP?

Task-Specific Performance: Fine-tuning helps improve performance for tasks like text classification, machine translation, or sentiment analysis in specific domains (e.g., legal, healthcare).
Language-Specific Adaptation: Since models like Phi-3.5 are trained on general datasets, fine-tuning helps them handle industry-specific jargon or linguistic quirks. Efficient Resource Utilization: Instead of training a model from scratch, fine-tuning leverages pre-trained knowledge, saving computational resources and time. Steps to Fine-Tune Phi-3.5 in Azure AI Foundry Fine-tuning the Phi-3.5 model in Azure AI Foundry involves several key steps. Azure provides a user-friendly interface to streamline model customization, allowing you to quickly configure, train, and deploy models. Step 1: Setting Up the Environment in Azure AI Foundry Access Azure AI Foundry: Log in to Azure AI Foundry. If you don’t have an account, you can create one and set up a workspace. Create a New Experiment: Once in the Azure AI Foundry, create a new training experiment. Choose the Phi-3.5 model from the pre-trained models provided in the Azure Model Zoo. Set Up the Data for Fine-Tuning: Upload your custom dataset for fine-tuning. Ensure the dataset is in a compatible format (e.g., CSV, JSON). For instance, if you are fine-tuning the model for a customer service chatbot, you could upload customer queries in different languages. Step 2: Configure Fine-Tuning Settings Select the Training Dataset: Select the dataset you uploaded and link it to the Phi-3.5 model. 2) Configure the Hyperparameters: Set up training hyperparameters like the number of epochs, learning rate, and batch size. You may need to experiment with these settings to achieve optimal performance. 3) Choose the Task Type: Specify the task you are fine-tuning for, such as text classification, translation, or summarization. This helps Azure AI Foundry understand how to optimize the model during fine-tuning. 4) Fine-Tuning for Specific Languages: If you are fine-tuning for a specific language or multilingual tasks, ensure that the dataset is labeled appropriately and contains enough examples in the target language(s). This will allow Phi-3.5 to learn language-specific features effectively. Step 3: Train the Model Launch the Training Process: Once the configuration is complete, launch the training process in Azure AI Foundry. Depending on the size of your dataset and the complexity of the model, this could take some time. Monitor Training Progress: Use Azure AI Foundry’s built-in monitoring tools to track performance metrics such as loss, accuracy, and F1 score. You can view the model’s progress during training to ensure that it is learning effectively. Code Example: Fine-Tuning Phi-3.5 for a Specific Use Case Here's a code snippet for fine-tuning the Phi-3.5 model using Python and Azure AI Foundry SDK. In this example, we are fine-tuning the model for a customer support chatbot in multiple languages. from azure.ai import Foundry from azure.ai.model import Model # Initialize Azure AI Foundry foundry = Foundry() # Load the Phi-3.5 model model = Model.load("phi-3.5") # Set up the training dataset training_data = foundry.load_dataset("customer_queries_dataset") # Fine-tune the model model.fine_tune(training_data, epochs=5, learning_rate=0.001) # Save the fine-tuned model model.save("fine_tuned_phi_3.5") Best Practices for Evaluating and Validating Fine-Tuned Models Once the model is fine-tuned, it's essential to evaluate and validate its performance before deploying it in production. Split Data for Validation: Always split your dataset into training and validation sets. This ensures that the model is evaluated on unseen data to prevent overfitting. 
Evaluate Key Metrics: Measure performance using key metrics such as: Accuracy: The proportion of correct predictions. F1 Score: A measure of precision and recall. Confusion Matrix: Helps visualize true vs. false predictions for classification tasks. Cross-Language Validation: If the model is fine-tuned for multiple languages, test its performance across all supported languages to ensure consistency and accuracy. Test in Production-Like Environments: Before full deployment, test the fine-tuned model in a production-like environment to catch any potential issues. Continuous Monitoring and Re-Fine-Tuning: Once deployed, continuously monitor the model’s performance and re-fine-tune it periodically as new data becomes available. Deploying Phi-3.5 Model After fine-tuning the Phi-3.5 model, the next crucial step is deploying it to make it accessible for real-world applications. This section will cover two key deployment strategies: deploying in Azure for cloud-based scaling and reliability, and deploying locally with AI Toolkit for simpler offline usage. Each deployment strategy offers its own advantages depending on the use case. Deploying in Azure Azure provides a powerful environment for deploying machine learning models at scale, enabling organizations to deploy models like Phi-3.5 with high availability, scalability, and robust security features. Azure AI Foundry simplifies the entire deployment pipeline. Set Up Azure AI Foundry Workspace: Log in to Azure AI Foundry and navigate to the workspace where the Phi-3.5 model was fine-tuned. Go to the Deployments section and create a new deployment environment for the model. Choose Compute Resources: Compute Target: Select a compute target suitable for your deployment. For large-scale usage, it’s advisable to choose a GPU-based compute instance. Example: Choose an Azure Kubernetes Service (AKS) cluster for handling large-scale requests efficiently. Configure Scaling Options: Azure allows you to set up auto-scaling based on traffic. This ensures that the model can handle surges in demand without affecting performance. Model Deployment Configuration: Create an Inference Pipeline: In Azure AI Foundry, set up an inference pipeline for your model. Specify the Model: Link the fine-tuned Phi-3.5 model to the deployment pipeline. Deploy the Model: Select the option to deploy the model to the chosen compute resource. Test the Deployment: Once the model is deployed, test the endpoint by sending sample requests to verify the predictions. Configuration Steps (Compute, Resources, Scaling) During deployment, Azure AI Foundry allows you to configure essential aspects like compute type, resource allocation, and scaling options. Compute Type: Choose between CPU or GPU clusters depending on the computational intensity of the model. Resource Allocation: Define the minimum and maximum resources to be allocated for the deployment. For real-time applications, use Azure Kubernetes Service (AKS) for high availability. For batch inference, Azure Container Instances (ACI) is suitable. Auto-Scaling: Set up automatic scaling of the compute instances based on the number of requests. For example, configure the deployment to start with 1 node and scale to 10 nodes during peak usage. Cost Comparison: Phi-3.5 vs. Larger Language Models When comparing the costs of using Phi-3.5 with larger language models (LLMs), several factors come into play, including computational resources, pricing structures, and performance efficiency. 
Here’s a breakdown: Cost Efficiency Phi-3.5: Designed as a Small Language Model (SLM), Phi-3.5 is optimized for lower computational costs. It offers competitive performance at a fraction of the cost of larger models, making it suitable for budget-conscious projects. The smaller size (3.8 billion parameters) allows for reduced resource consumption during both training and inference. Larger Language Models (e.g., GPT-3.5): Typically require more computational resources, leading to higher operational costs. Larger models may incur additional costs for storage and processing power, especially in cloud environments. Performance vs. Cost Performance Parity: Phi-3.5 has been shown to achieve performance parity with larger models on various benchmarks, including language comprehension and reasoning tasks. This means that for many applications, Phi-3.5 can deliver similar results to larger models without the associated costs. Use Case Suitability: For simpler tasks or applications that do not require extensive factual knowledge, Phi-3.5 is often the more cost-effective choice. Larger models may still be preferred for complex tasks requiring deep contextual understanding or extensive factual recall. Pricing Structure Azure Pricing: Phi-3.5 is available through Azure with a pay-as-you-go billing model, allowing users to scale costs based on usage. Pricing details for Phi-3.5 can be found on the Azure pricing page, where users can customize options based on their needs. Code Example: API Setup and Endpoints for Live Interaction Below is a Python code snippet demonstrating how to interact with a deployed Phi-3.5 model via an API in Azure: import requests # Define the API endpoint and your API key api_url = "https://<your-azure-endpoint>/predict" api_key = "YOUR_API_KEY" # Prepare the input data input_data = { "text": "What are the benefits of renewable energy?" } # Make the API request response = requests.post(api_url, json=input_data, headers={"Authorization": f"Bearer {api_key}"}) # Print the model's response if response.status_code == 200: print("Model Response:", response.json()) else: print("Error:", response.status_code, response.text) Deploying Locally with AI Toolkit For developers who prefer to run models on their local machines, the AI Toolkit provides a convenient solution. The AI Toolkit is a lightweight platform that simplifies local deployment of AI models, allowing for offline usage, experimentation, and rapid prototyping. Deploying the Phi-3.5 model locally using the AI Toolkit is straightforward and can be used for personal projects, testing, or scenarios where cloud access is limited. Introduction to AI Toolkit The AI Toolkit is an easy-to-use platform for deploying language models locally without relying on cloud infrastructure. It supports a range of AI models and enables developers to work in a low-latency environment. Advantages of deploying locally with AI Toolkit: Offline Capability: No need for continuous internet access. Quick Experimentation: Rapid prototyping and testing without the delays of cloud deployments. Setup Guide: Installing and Running Phi-3.5 Locally Using AI Toolkit Install AI Toolkit: Go to the AI Toolkit website and download the platform for your operating system (Linux, macOS, or Windows). Install AI Toolkit by running the appropriate installation command in your terminal. Download the Phi-3.5 Model: Once AI Toolkit is installed, you can download the Phi-3.5 model locally by running: 3. 
Run the Model Locally: After downloading the model, start a local session by running: This will launch a local server on your machine where the model will be available for interaction. Code Example: Using Phi-3.5 Locally in a Project Below is a Python code example demonstrating how to send a query to the locally deployed Phi-3.5 model running on the AI Toolkit. import requests # Define the local endpoint local_url = "http://localhost:8000/predict" # Prepare the input data input_data = { "text": "What are the benefits of renewable energy?" } # Make the API request response = requests.post(local_url, json=input_data) # Print the model's response if response.status_code == 200: print("Model Response:", response.json()) else: print("Error:", response.status_code, response.text) Comparing Language Capabilities Test Results: How Phi-3.5 Handles Different Languages The Phi-3.5 model demonstrates robust multilingual capabilities, effectively processing and generating text in various languages. Below are comparative examples showcasing its performance in English, Spanish, and Mandarin: English Example: Input: "What are the benefits of renewable energy?" Output: "Renewable energy sources, such as solar and wind, reduce greenhouse gas emissions and promote sustainability." Spanish Example: Input: "¿Cuáles son los beneficios de la energía renovable?" Output: "Las fuentes de energía renovable, como la solar y la eólica, reducen las emisiones de gases de efecto invernadero y promueven la sostenibilidad." Mandarin Example: Input: "可再生能源的好处是什么?" Output: "可再生能源,如太阳能和风能,减少温室气体排放,促进可持续发展。" Performance Benchmarking and Evaluation Across Different Languages Benchmarking Phi-3.5 across different languages involves evaluating its accuracy, fluency, and contextual understanding. For instance, using BLEU scores and human evaluations, the model can be assessed on its translation quality and coherence in various languages. Real-World Use Case: Multilingual Customer Service Chatbot A practical application of Phi-3.5's multilingual capabilities is in developing a customer service chatbot that can interact with users in their preferred language. For instance, the chatbot could provide support in English, Spanish, and Mandarin, ensuring a wider reach and better user experience. Optimizing and Validating Phi-3.5 Model Model Performance Metrics To validate the model's performance in different scenarios, consider the following metrics: Accuracy: Measure how often the model's outputs are correct or align with expected results. Fluency: Assess the naturalness and readability of the generated text. Contextual Understanding: Evaluate how well the model understands and responds to context-specific queries. Tools to Use in Azure and Ollama for Evaluation Azure Cognitive Services: Utilize tools like Text Analytics and Translator to evaluate performance. Ollama: Use local testing environments to quickly iterate and validate model outputs. Conclusion In summary, Phi-3.5 exhibits impressive multilingual capabilities, effective deployment options, and robust performance metrics. Its ability to handle various languages makes it a versatile tool for natural language processing applications. Phi-3.5 stands out for its adaptability and performance in multilingual contexts, making it an excellent choice for future NLP projects, especially those requiring diverse language support. 
We encourage readers to experiment with the Phi-3.5 model using Azure AI Foundry or the AI Toolkit, explore fine-tuning techniques for their specific use cases, and share their findings with the community. For more information on optimized fine-tuning techniques, check out the Ignite Fine-Tuning Workshop.

References
- Customize the Phi-3.5 family of models with LoRA fine-tuning in Azure
- Fine-tune Phi-3.5 models in Azure
- Fine-Tuning with Azure AI Foundry and Microsoft Olive Hands-on Labs and Workshop
- Customize a model with fine-tuning: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/fine-tuning?tabs=azure-openai%2Cturbo%2Cpython-new&pivots=programming-language-studio
- Microsoft AI Toolkit - AI Toolkit for VSCode

Enhancing Workplace Safety and Efficiency with Azure AI Foundry's Content Understanding
Discover how Azure AI Foundry's Content Understanding service, featuring the Video Shot Analysis template, revolutionizes workplace safety and efficiency. By leveraging generative AI to analyze video data, businesses can gain actionable insights into worker actions, posture, safety risks, and environmental conditions. Learn how this cutting-edge tool transforms operations across industries like manufacturing, logistics, and healthcare.

Microsoft Computer Vision Test
Please refer to my repo to get more AI resources, wellcome to star it: https://github.com/xinyuwei-david/david-share.git This article if from one of my repo: https://github.com/xinyuwei-david/david-share/tree/master/Multimodal-Models/Computer-Vision I have developed 2 Python programs that runs on Windows and utilizes Azure Computer Vision (Azure CV) . Perform object recognition on images selected by the user. After the recognition is complete, the user can choose the objects they wish to retain (one or more). The selected objects are then cropped and saved locally. Do background remove based on the images and the object user select. Object detection and image segmentation: Please refer to my demo vedio on Yutube: https://youtu.be/edjB-PDapN8 Currently, the background removal API of Azure CV has been discontinued. In the future, this functionality can be achieved through the region-to-segmentation feature of Florence-2. For detailed implementation, please refer to: https://huggingface.co/microsoft/Florence-2-large/blob/main/sample_inference.ipynb Object recognition and background remove: Please refer to my demo vedio on Yutube: https://youtu.be/6x49D3YUTGA Code for Object detection and image segmentation import requests from PIL import Image, ImageTk, ImageDraw import tkinter as tk from tkinter import messagebox, filedialog import threading # Azure Computer Vision API 信息 subscription_key = "o" endpoint = "https://cv-2.cognitiveservices.azure.com/" # 图像分析函数 def analyze_image(image_path): analyze_url = endpoint + "vision/v3.2/analyze" headers = { 'Ocp-Apim-Subscription-Key': subscription_key, 'Content-Type': 'application/octet-stream' } params = {'visualFeatures': 'Objects'} try: with open(image_path, 'rb') as image_data: response = requests.post( analyze_url, headers=headers, params=params, data=image_data, timeout=10 # 设置超时时间为10秒 ) response.raise_for_status() analysis = response.json() print("图像分析完成") return analysis except requests.exceptions.Timeout: print("请求超时,请检查网络连接或稍后重试。") messagebox.showerror("错误", "请求超时,请检查网络连接或稍后重试。") except Exception as e: print("在 analyze_image 中发生异常:", e) messagebox.showerror("错误", f"发生错误:{e}") # 背景移除函数 def remove_background(image_path, objects_to_keep): print("remove_background 被调用") try: image = Image.open(image_path).convert("RGBA") width, height = image.size # 创建一个透明背景的图像 new_image = Image.new("RGBA", image.size, (0, 0, 0, 0)) # 创建一个与图像大小相同的掩码 mask = Image.new("L", (width, height), 0) draw = ImageDraw.Draw(mask) # 在掩码上绘制要保留的对象区域 for obj in objects_to_keep: x1, y1, x2, y2 = obj['coords'] # 将坐标转换为整数 x1, y1, x2, y2 = map(int, [x1, y1, x2, y2]) # 绘制矩形区域,填充为白色(表示保留) draw.rectangle([x1, y1, x2, y2], fill=255) # 应用掩码到原始图像上 new_image.paste(image, (0, 0), mask) print("背景移除完成,显示结果") new_image.show() # 保存结果 save_path = filedialog.asksaveasfilename( defaultextension=".png", filetypes=[('PNG 图像', '*.png')], title='保存结果图像' ) if save_path: new_image.save(save_path) messagebox.showinfo("信息", f"处理完成,结果已保存到:{save_path}") except Exception as e: print("在 remove_background 中发生异常:", e) messagebox.showerror("错误", f"发生错误:{e}") print("remove_background 完成") # GUI 界面 def create_gui(): # 创建主窗口 root = tk.Tk() root.title("选择要保留的对象") # 添加选择图像的按钮 def select_image(): image_path = filedialog.askopenfilename( title='选择一张图像', filetypes=[('图像文件', '*.png;*.jpg;*.jpeg;*.bmp'), ('所有文件', '*.*')] ) if image_path: show_image(image_path) else: messagebox.showwarning("警告", "未选择图像文件。") def show_image(image_path): analysis = analyze_image(image_path) if analysis is None: print("分析结果为空,无法创建 GUI") return 
Code for object detection and image segmentation

import requests
from PIL import Image, ImageTk, ImageDraw
import tkinter as tk
from tkinter import messagebox, filedialog
import threading

# Azure Computer Vision API information
subscription_key = "<your-subscription-key>"  # replace with your Azure CV key
endpoint = "https://cv-2.cognitiveservices.azure.com/"

# Image analysis function
def analyze_image(image_path):
    analyze_url = endpoint + "vision/v3.2/analyze"
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key,
        'Content-Type': 'application/octet-stream'
    }
    params = {'visualFeatures': 'Objects'}
    try:
        with open(image_path, 'rb') as image_data:
            response = requests.post(
                analyze_url,
                headers=headers,
                params=params,
                data=image_data,
                timeout=10  # 10-second timeout
            )
        response.raise_for_status()
        analysis = response.json()
        print("Image analysis completed")
        return analysis
    except requests.exceptions.Timeout:
        print("Request timed out. Please check your network connection or try again later.")
        messagebox.showerror("Error", "Request timed out. Please check your network connection or try again later.")
    except Exception as e:
        print("Exception in analyze_image:", e)
        messagebox.showerror("Error", f"An error occurred: {e}")

# Background removal function
def remove_background(image_path, objects_to_keep):
    print("remove_background called")
    try:
        image = Image.open(image_path).convert("RGBA")
        width, height = image.size

        # Create an image with a transparent background
        new_image = Image.new("RGBA", image.size, (0, 0, 0, 0))

        # Create a mask the same size as the image
        mask = Image.new("L", (width, height), 0)
        draw = ImageDraw.Draw(mask)

        # Draw the regions of the objects to keep onto the mask
        for obj in objects_to_keep:
            x1, y1, x2, y2 = obj['coords']
            # Convert the coordinates to integers
            x1, y1, x2, y2 = map(int, [x1, y1, x2, y2])
            # Draw the rectangle filled with white (meaning "keep")
            draw.rectangle([x1, y1, x2, y2], fill=255)

        # Apply the mask to the original image
        new_image.paste(image, (0, 0), mask)

        print("Background removal finished; showing the result")
        new_image.show()

        # Save the result
        save_path = filedialog.asksaveasfilename(
            defaultextension=".png",
            filetypes=[('PNG image', '*.png')],
            title='Save result image'
        )
        if save_path:
            new_image.save(save_path)
            messagebox.showinfo("Info", f"Processing finished. The result has been saved to: {save_path}")
    except Exception as e:
        print("Exception in remove_background:", e)
        messagebox.showerror("Error", f"An error occurred: {e}")
    print("remove_background finished")

# GUI
def create_gui():
    # Create the main window
    root = tk.Tk()
    root.title("Select the objects to keep")

    # Handler for the image-selection button
    def select_image():
        image_path = filedialog.askopenfilename(
            title='Select an image',
            filetypes=[('Image files', '*.png;*.jpg;*.jpeg;*.bmp'), ('All files', '*.*')]
        )
        if image_path:
            show_image(image_path)
        else:
            messagebox.showwarning("Warning", "No image file was selected.")

    def show_image(image_path):
        analysis = analyze_image(image_path)
        if analysis is None:
            print("Analysis result is empty; cannot build the GUI")
            return

        # Load the image
        pil_image = Image.open(image_path)
        img_width, img_height = pil_image.size
        tk_image = ImageTk.PhotoImage(pil_image)

        # Create the canvas
        canvas = tk.Canvas(root, width=img_width, height=img_height)
        canvas.pack()

        # Display the image on the canvas
        canvas.create_image(0, 0, anchor='nw', image=tk_image)
        canvas.tk_image = tk_image  # keep a reference to the image

        # Track each object's rectangle, label, and selection state
        object_items = []

        # Process each detected object
        for obj in analysis['objects']:
            rect = obj['rectangle']
            x = rect['x']
            y = rect['y']
            w = rect['w']
            h = rect['h']
            obj_name = obj['object']

            # Draw the object's bounding box
            rect_item = canvas.create_rectangle(
                x, y, x + w, y + h,
                outline='red', width=2
            )
            # Display the object name
            text_item = canvas.create_text(
                x + w/2, y - 10,
                text=obj_name, fill='red'
            )
            # Initialize the object's selection state to unselected
            selected = False

            # Add the object's information to the list
            object_items.append({
                'rect_item': rect_item,
                'text_item': text_item,
                'coords': (x, y, x + w, y + h),
                'object': obj_name,
                'selected': selected
            })

        # Click event handler
        def on_canvas_click(event):
            for item in object_items:
                x1, y1, x2, y2 = item['coords']
                if x1 <= event.x <= x2 and y1 <= event.y <= y2:
                    # Toggle the selection state
                    item['selected'] = not item['selected']
                    if item['selected']:
                        # Selected: set the outline to green
                        canvas.itemconfig(item['rect_item'], outline='green')
                        canvas.itemconfig(item['text_item'], fill='green')
                    else:
                        # Not selected: set the outline back to red
                        canvas.itemconfig(item['rect_item'], outline='red')
                        canvas.itemconfig(item['text_item'], fill='red')
                    break

        canvas.bind("<Button-1>", on_canvas_click)

        # Submit button
        def on_submit():
            print("on_submit called")
            selected_objects = []
            for item in object_items:
                if item['selected']:
                    # If the object is selected, keep its information
                    selected_objects.append(item)
            if not selected_objects:
                messagebox.showwarning("Warning", "Please select at least one object.")
            else:
                # Call the background removal function
                threading.Thread(target=remove_background, args=(image_path, selected_objects)).start()
            print("on_submit finished")

        submit_button = tk.Button(root, text="Submit", command=on_submit)
        submit_button.pack()

    # Add a button for selecting an image
    select_button = tk.Button(root, text="Select image", command=select_image)
    select_button.pack()

    root.mainloop()

# Example usage
if __name__ == "__main__":
    create_gui()

Demo result:

Code for object recognition and background removal (runs on a GPU VM)

from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image, ImageDraw, ImageChops
import torch
import numpy as np
import ipywidgets as widgets
from IPython.display import display, clear_output
import io

# Load the model
model_id = 'microsoft/Florence-2-large'
device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype='auto'
).to(device)

processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True
)

def run_example(task_prompt, image, text_input=None):
    if text_input is None:
        prompt = task_prompt
    else:
        prompt = task_prompt + text_input

    # Process inputs
    inputs = processor(
        text=prompt,
        images=image,
        return_tensors="pt"
    )

    # Move inputs to the device with appropriate data types
    inputs = {
        "input_ids": inputs["input_ids"].to(device),  # input_ids are integers (int64)
        "pixel_values": inputs["pixel_values"].to(device, torch.float16)  # pixel_values need to be float16
    }

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            early_stopping=False,
            do_sample=False,
            num_beams=3,
        )

    generated_text = processor.batch_decode(
        generated_ids,
        skip_special_tokens=False
    )[0]

    parsed_answer = processor.post_process_generation(
        generated_text,
        task=task_prompt,
        image_size=(image.width, image.height)
    )
    return parsed_answer
def create_mask(image_size, prediction):
    mask = Image.new('L', image_size, 0)
    mask_draw = ImageDraw.Draw(mask)
    for polygons in prediction['polygons']:
        for _polygon in polygons:
            _polygon = np.array(_polygon).reshape(-1, 2)
            if len(_polygon) < 3:
                continue
            _polygon = _polygon.flatten().tolist()
            mask_draw.polygon(_polygon, outline=255, fill=255)
    return mask

def combine_masks(masks):
    combined_mask = Image.new('L', masks[0].size, 0)
    for mask in masks:
        combined_mask = ImageChops.lighter(combined_mask, mask)
    return combined_mask

def apply_combined_mask(image, combined_mask):
    # Convert the image to RGBA
    image = image.convert('RGBA')
    result_image = Image.new('RGBA', image.size, (255, 255, 255, 0))
    result_image = Image.composite(image, result_image, combined_mask)
    return result_image

def process_image_multiple_objects(image, descriptions):
    """
    Process the image for multiple object descriptions.

    Parameters:
    - image: PIL.Image object.
    - descriptions: list of strings, descriptions of objects to retain.

    Returns:
    - output_image: Processed image with the specified objects retained.
    """
    masks = []
    for desc in descriptions:
        print(f"Processing description: {desc}")
        results = run_example('<REFERRING_EXPRESSION_SEGMENTATION>', image, text_input=desc.strip())
        prediction = results['<REFERRING_EXPRESSION_SEGMENTATION>']
        if not prediction['polygons']:
            print(f"No objects found for description: {desc}")
            continue
        # Generate mask for this object
        mask = create_mask(image.size, prediction)
        masks.append(mask)

    if not masks:
        print("No objects found for any of the descriptions.")
        return image.convert('RGBA')

    # Combine all masks
    combined_mask = combine_masks(masks)

    # Apply the combined mask
    output_image = apply_combined_mask(image, combined_mask)
    return output_image
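# Usage note (a hedged sketch, not part of the original notebook):
# process_image_multiple_objects can also be called directly, without the
# ipywidgets UI defined below. The file names here are placeholders:
#
#     img = Image.open('input.jpg').convert('RGB')
#     result = process_image_multiple_objects(img, ['the dog', 'the red car'])
#     result.save('objects_only.png')   # RGBA PNG with a transparent background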
Exiting the process.") return # Process the image output_image = process_image_multiple_objects(image, descriptions_list) # Display the result display(output_image) # Optionally, save the output image # Uncomment the lines below to save the image # output_image.save('output_image.png') # print("The image with background removed has been saved as 'output_image.png'") submit_button.on_click(on_submit_button_click) # Display the text box and submit button display(widgets.VBox([desc_box, submit_button])) # Create the upload widget upload_button = widgets.FileUpload( accept='image/*', multiple=False ) display(widgets.HTML("<h3>Please upload an image file using the widget below:</h3>")) display(upload_button) # Observe changes in the upload widget upload_button.observe(on_file_upload, names='value') GPU resource needed during inference:234Views0likes0Comments