slm
27 TopicsBuilding AI Agents on edge devices using Ollama + Phi-4-mini Function Calling
The new Phi-4-mini and Phi-4-multimodal now support Function Calling. This feature enables the models to connect with external tools and APIs. By deploying Phi-4-mini and Phi-4-multimodal with Function Calling capabilities on edge devices, we can achieve local expansion of knowledge capabilities and enhance their task execution efficiency. This blog will focus on how to use Phi-4-mini's Function Calling capabilities to build efficient AI Agents on edge devices. What‘s Function Calling How it works First we need to learn how Function Calling works Tool Integration: Function Calling allows LLM/SLM to interact with external tools and APIs, such as weather APIs, databases, or other services. Function Definition: Defines a function (tool) that LLM/SLM can call, specifying its name, parameters, and expected output. LLM Detection: LLM/SLM analyzes the user's input and determines if a function call is required and which function to use. JSON Output: LLM/SLM outputs a JSON object containing the name of the function to call and the parameters required by the function. External Execution: The application executes the function call using the parameters provided by LLM/SLM. Response to LLM: Returns the output of Function Calling to LLM/SLM, and LLM/SLM can use this information to generate a response to the user. Application scenarios Data retrieval: convert natural language queries into API calls to fetch data (e.g., "show my recent orders" triggers a database query) Operation execution: convert user requests into specific function calls (e.g., "schedule a meeting" becomes a calendar API call) Computational tasks: handle mathematical or logical operations through dedicated functions (e.g., calculate compound interest or statistical analysis) Data processing: chain multiple function calls together (e.g., get data → parse → transform → store) UI/UX integration: trigger interface updates based on user interactions (e.g., update map markers or display charts) Phi-4-mini / Phi-4-multimodal's Function Calling Phi-4-mini / Phi-4-multimodal supports single and parallel Function Calling. Things to note when calling You need to define Tools in System to start single or parallel Function Calling If you want to start parallel Function Calling, you also need to add 'some tools' to the System prompt The following is an example Single Function Calling tools = [ { "name": "get_match_result", "description": "get match result", "parameters": { "match": { "description": "The name of the match", "type": "str", "default": "Arsenal vs ManCity" } } }, ] messages = [ { "role": "system", "content": "You are a helpful assistant", "tools": json.dumps(tools), # pass the tools into system message using tools argument }, { "role": "user", "content": "What is the result of Arsenal vs ManCity today?" } ] Full Sample : Click Parallel Function Calling AGENT_TOOLS = { "booking_fight": { "name": "booking_fight", "description": "booking fight", "parameters": { "departure": { "description": "The name of Departure airport code", "type": "str", }, "destination": { "description": "The name of Destination airport code", "type": "str", }, "outbound_date": { "description": "The date of outbound flight", "type": "str", }, "return_date": { "description": "The date of return flight", "type": "str", } } }, "booking_hotel": { "name": "booking_hotel", "description": "booking hotel", "parameters": { "query": { "description": "The name of the city", "type": "str", }, "check_in_date": { "description": "The date of check in", "type": "str", }, "check_out_date": { "description": "The date of check out", "type": "str", } } }, } SYSTEM_PROMPT = """ You are my travel agent with some tools available. """ messages = [ { "role": "system", "content": SYSTEM_PROMPT, "tools": json.dumps(AGENT_TOOLS), # pass the tools into system message using tools argument }, { "role": "user", "content": """I have a business trip from London to New York in March 21 2025 to March 27 2025, can you help me to book a hotel and flight tickets""" } ] Full sample : click Using Ollama and Phi-4-mini Function Calling to Create AI Agents on Edge Devices Ollama is a popular free tool for deploying LLM/SLM locally and can be used in combination with AI Toolkit for VS Code. In addition to being deployed on your PC/Laptop, it can also be deployed on IoT, mobile phones, containers, etc. To use Phi-4-mini on Ollama, you need to use Ollama 0.5.13+. Different quantitative versions are supported on Ollama, as shown in the figure below: Using Ollama, we can deploy Phi-4-mini on the edge, and implement AI Agent with Function Calling under limited computing power, so that Generative AI can be applied more effectively on the edge. Current Issues A sad experience - If you directly use the interface to try to call Ollama in the above way, you will find that Function Calling will not be triggered. There are discussions on Ollama's GitHub Issue. You can enter the Issue https://github.com/ollama/ollama/issues/9437. By modifying the Phi-4-mini Template on the ModelFile to implement a single Function Calling, but the call to Parallel Function Calling still failed. Resolution We have implemented a fix by making a adjustments to the template. We have improved it according to Phi-4-mini's Chat Template and re-modified the Modelfile. Of course, the quantitative model has a huge impact on the results. The adjustments are as follows: TEMPLATE """ {{- if .Messages }} {{- if or .System .Tools }}<|system|> {{ if .System }}{{ .System }} {{- end }} In addition to plain text responses, you can chose to call one or more of the provided functions. Use the following rule to decide when to call a function: * if the response can be generated from your internal knowledge (e.g., as in the case of queries like "What is the capital of Poland?"), do so * if you need external information that can be obtained by calling one or more of the provided functions, generate a function calls If you decide to call functions: * prefix function calls with functools marker (no closing marker required) * all function calls should be generated in a single JSON list formatted as functools[{"name": [function name], "arguments": [function arguments as JSON]}, ...] * follow the provided JSON schema. Do not hallucinate arguments or values. Do to blindly copy values from the provided samples * respect the argument type formatting. E.g., if the type if number and format is float, write value 7 as 7.0 * make sure you pick the right functions that match the user intent Available functions as JSON spec: {{- if .Tools }} {{ .Tools }} {{- end }}<|end|> {{- end }} {{- range .Messages }} {{- if ne .Role "system" }}<|{{ .Role }}|> {{- if and .Content (eq .Role "tools") }} {"result": {{ .Content }}} {{- else if .Content }} {{ .Content }} {{- else if .ToolCalls }} functools[ {{- range .ToolCalls }}{{ "{" }}"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}{{ "}" }} {{- end }}] {{- end }}<|end|> {{- end }} {{- end }}<|assistant|> {{ else }} {{- if .System }}<|system|> {{ .System }}<|end|>{{ end }}{{ if .Prompt }}<|user|> {{ .Prompt }}<|end|>{{ end }}<|assistant|> {{ end }}{{ .Response }}{{ if .Response }}<|user|>{{ end }} """ We have tested the solution using different quantitative models. In the laptop environment, we recommend that you use the following model to enable single/parallel Function Calling: phi4-mini:3.8b-fp16. Note: you need to bind the defined Modelfile and phi4-mini:3.8b-fp16 together to enable this to work. Please execute the following command in the command line: #If you haven't downloaded it yet, please execute this command firstr ollama run phi4-mini:3.8b-fp16 #Binding with the adjusted Modelfile ollama create phi4-mini:3.8b-fp16 -f {Your Modelfile Path} To test the single Function Calling and Parallel Function Calling of Phi-4-mini. Single Function Calling Parallel Function Calling Full Sample in notebook The above example is just a simple introduction. As we move forward with the development we hope to find simpler ways to apply it on the edge, use Function Calling to expand the scenarios of Phi-4-mini / Phi-4-multimodal, and also develop more usecases in vertical industries. Resources Phi-4 model on Hugging face https://huggingface.co/collections/microsoft/phi-4-677e9380e514feb5577a40e4 Phi-4-mini on Ollama https://ollama.com/library/phi4-mini Learn Function Calling https://huggingface.co/docs/hugs/en/guides/function-calling Phi Cookbook - Samples and Resources for Phi Models https://aka.ms/phicookbook578Views4likes1CommentAI Sparks: AI Toolkit for VS Code - from playground to production
Are you building AI-powered applications from scratch or infusing intelligence into existing production code and systems? AI Sparks is your go-to webinar series for mastering the AI Toolkit (AITK) for VS Code from foundational concepts to cutting-edge techniques. In this bi-weekly, hands-on series, we’ll cover: SLMs & Local Models – Test and deploy AI models and applications efficiently on your own terms locally, to edge devices or to the cloud Embedding Models & RAG – Supercharge retrieval for smarter applications using existing data. Multi-Modal AI – Work with images, text, and beyond. Agentic Frameworks – Build autonomous, decision-making AI systems. What will you learn from this session? Whether you're a developer, startup founder, or AI enthusiast, you'll gain practical insights, live demos, and actionable takeaways to level up your AI integration journey. Join us and spark your AI transformation! You can click here and register for the entire series on the reactor page Episode list You can also sign up for the individual episode and read about the topics covered using the following links: Feb 13 th 2025 – WATCH ON DEMAND @ Introduction to AI toolkit and feature walkthrough In In this episode, we’ll introduce the AI Toolkit extension for VS Code—a powerful way to explore and integrate the latest AI models from OpenAI, Meta, Deepseek, Mistral, and more. With this extension, you can browse state-of-the-art models, download some for local use, or experiment with others remotely. Whether you're enhancing an existing application or building something new, the AI Toolkit simplifies the process of selecting and integrating the right model for your needs. Feb 27 th 2025 – A short introduction to SLMs and local model with use cases In this episode, we’ll explore Small Language Models (SLMs) and how they compare to larger models. SLMs are efficient, require less compute and memory, and can run on edge devices while still excelling at a variety of tasks. We’ll dive into the Phi-3.5 and Phi-4 model series and demonstrate how to build a practical application using these models. Mar 13 th 2025 – How to work with embedding models and build a RAG application In this episode, we’ll dive into embedding models—important tools for working with vector databases and large language models. These models convert text into numerical representations, making it easier to process and retrieve information efficiently. After covering the core concepts, we’ll apply them in practice by building a Retrieval-Augmented Generation (RAG) app using Small Language Models (SLMs) and a vector database. Mar 27 th 2025 – Multi-modal support and image analysis In this episode, we’ll dig deeper into multi-modal capabilities of local and remote AI models and use visualization tools for better insights. We’ll also dive into multi-modal support in the AI Toolkit, showcasing how to process and analyze images alongside text. By the end, you’ll see how these capabilities come together to enhance powerful AI applications. Apr 10 th 2025 – Evaluations – How to choose the best model for you applications needs In this episode, we’ll explore how to evaluate AI models and choose the right one for your needs. We’ll cover key performance metrics, compare different models, and demonstrate testing strategies using features like Playground, Bulk Run, and automated evaluations. Whether you're experimenting with the latest models or transitioning to a new version, these evaluation techniques will help you make informed decisions with confidence. Apr 24 th 2025 – Agents and Agentic Frameworks In this episode, we’ll explore agents and agentic frameworks—systems that enable AI models to make decisions, take actions, and automate complex tasks. We’ll break down how these frameworks work, their practical applications, and how to build and integrate them into your projects. By the end, you’ll have a clear understanding of how to build and leverage AI agents effectively. We will explore how to use and build agentic frameworks using AI Toolkit. Resources AI toolkit for VSCode - https://aka.ms/AIToolkit AI toolkit for VSCode Documentation - https://aka.ms/AIToolkit/doc Building Retrieval Augmented Generation (RAG) apps on VSCode & AI Toolkit Understanding and using Reasoning models such as DeepSeek R1 on AI toolkit - Using Ollama and OpenAI, Google and Anthropic hosted models with AI toolkit AI Sparks - YouTube PlaylistAI Genius - AI Skilling series for Developers
We are conducting a six-part AI Skilling series called AI Genius starting January 28th, 2025, to kickstart your AI learning journey from beginner to advanced use cases. The series will feature experts from Microsoft talking about different aspects of using AI and building AI Applications. This is targeted towards developers who are looking to upskill their AI capabilities in the latest AI technologies such as SLMs, RAG and AI agents.Using Advanced Reasoning Model on EdgeAI Part 2 - Evaluate local models using AITK for VSCode
Quantizing a generative AI model assists in ease of deployment of edge devices. For development purposes we would like to have 1) quantization 2) optimization and 3) evaluation of models all in one tool. We recommend AI Toolkit for Visual Studio Code (AITK) for as an all-in-one tool for developers experimenting with GenAI models. This is a lightweight open-source tool that covers model selection, fine-tuning, deployment, application development, batch data for testing, and evaluation. In Using Advanced Reasoning Model on EdgeAI Part 1 - Quantization, Conversion, Performance, we performed quantization and format conversion for Phi-4 and DeepSeek-R1-Distill-Qwen-1.5B. In This blog will evaluate Phi-4-14B-ONNX-INT4-GPU and DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU using AITK. About model quantization Quantization refers to the process of approximating the continuous value of a signal to a finite number of discrete values. It can be understood as a method of information compression. When considering this concept on a computer system, it is generally represented by "low bits". Some people also call quantization "fixed-point", but strictly speaking, the range represented is reduced. Fixed-point refers specifically to linear quantization with a scale of 2, which is a more practical quantization method. Our trained models generally have the characteristics of large number of parameters, large amount of calculation, high memory usage, and high precision. However, after model quantization, we can compress parameters, increase the calculation speed of the model, and reduce memory usage, but there is a problem that the quantization approximation algorithm reduces the precision. In the use of generative AI, we will face some errors and omissions in the results. We therefore need to evaluate the quantization model. Use AITK to evaluate quantitative models Model deployment AITK is an open-source tool with GenAIOps, based on Visual Studio Code. If you want to evaluate the model, you first need to deploy the model, which can be deployed locally or remotely. Currently AITK only supports the deployment of ONNX models in the Model Catalog (Phi-3/Phi-3.5 Family/Mistral 7B) To test models that are not currently supported within model catalog we propose API that is compatible with AITK. This uses the remote deployment method of the Open AI Chat completion. We use Python + Flask to create the Open AI Chat completion service, the code is as follows: from flask import Flask, request, jsonify, stream_with_context, Response import json import onnxruntime_genai as og import numpy as np import requests import uuid, time app = Flask(__name__) model_path = "Your Phi-4-14B-ONNX-INT4-GPU or DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU path" model = og.Model(model_path) tokenizer = og.Tokenizer(model) tokenizer_stream = tokenizer.create_stream() # Your Phi-4 chat template chat_template = "<|user|>{input}<|assistant|>" # Your DeepSeek-R1-Distill-Qwen chat template # chat_template = "<|im_start|> user {input}<|im_end|><|im_start|> assistant" @app.route('/v1/chat/completions', methods=['POST']) def ort_chat(): data = request.get_json() messages = data.get("messages") if not messages or not isinstance(messages, list): return jsonify({ "error": { "message": "Invalid request. 'messages' is required and must be a list of strings.", "type": "invalid_request_error", "param": "messages", "code": None } }), 400 prompt = '' if messages: last_message = messages[-1] if last_message.get("role") == "user" and last_message.get("content"): prompt = last_message['content'] search_options = {} search_options['max_length'] = data.get("max_tokens") search_options['temperature'] = data.get("temperature") search_options['top_p'] = data.get("top_p") search_options['past_present_share_buffer'] = False prompt = f'{chat_template.format(input=messages)}' input_tokens = tokenizer.encode(prompt) params = og.GeneratorParams(model) params.set_search_options(**search_options) params.input_ids = input_tokens generator = og.Generator(model, params) reply = "" while not generator.is_done(): generator.compute_logits() generator.generate_next_token() new_token = generator.get_next_tokens()[0] reply += tokenizer.decode(new_token) if data.get("stream"): def generate(): tokens = reply.split() for token in tokens: chunk = { "id": "chatcmpl-"+ str(uuid.uuid4()), "object": "chat.completion.chunk", "created": int(time.time()), "model": data.get("model"), "choices": [ { "delta": {"content": token + " "}, "index": 0, "finish_reason": None } ] } yield "data: " + json.dumps(chunk) + "\n\n" time.sleep(0.5) yield "data: [DONE]\n\n" return Response(stream_with_context(generate()), mimetype="text/event-stream") response_data = { "id": "chatcmpl-"+ str(uuid.uuid4()), "object": "chat.completion", "created": int(time.time()), "model": data.get("model"), "choices": [ { "index": 0, "message": { "role": "assistant", "content": reply }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": len(messages), "completion_tokens": len(reply.split()), "total_tokens": len(reply.split()) } } return jsonify(response_data) if __name__ == '__main__': app.run(debug=True, port=5000) After create service, through AITK's My MODELS Remote models select "Add a custom model" select the API (http://127.0.0.1:5000/v1/chat/completions) set the name such as DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU or Phi-4-14B-ONNX-INT4-GPU. We can test it in playground DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU Playground Phi-4-14B-ONNX-INT4-GPU Playground Batch Data We can use AITK's Bulk Data to set up the execution environment for batch data. I simulate the execution of 10 data here. We can execute in batches to see the relevant results. This is an important step in the evaluation because we need to know the results based on a specific problem before the evaluation. After the run, we can export the run results Evaluation You can evaluate the exported batch data results. AITK supports different evaluation methods. Create an evaluation by selecting Evaluation from Tools and importing the data exported from Bulk Data. You can choose different evaluation methods. We select F1 to complete the evaluation selection, Let's take a look at the evaluation results of DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU Summary The AI Toolkit (AITK) enables us to assess various models, which is instrumental for our edge deployments. As Responsible AI gains importance, continuous testing and evaluation post-deployment become essential. In this context, AITK serves as a crucial tool for our daily operations. Resource Learn about AI Toolkit for Visual Studio Code https://marketplace.visualstudio.com/items?itemName=ms-windows-ai-studio.windows-ai-studio DeepSeek-R1 in GitHub Models https://github.com/marketplace/models/azureml-deepseek/DeepSeek-R1 DeepSeek-R1 in Azure AI Foundry https://ai.azure.com/explore/models/DeepSeek-R1/version/1/registry/azureml-deepseek Phi-4-14B in Hugging face https://huggingface.co/microsoft/phi-4 Learn about Microsoft Olive https://github.com/microsoft/olive Learn about ONNX Runtime GenAI https://github.com/microsoft/onnxruntime-genai285Views0likes0CommentsGetting Started with the AI Dev Gallery
The AI Dev Gallery is a new open-source project designed to inspire and support developers in integrating on-device AI functionality into their Windows apps. It offers an intuitive UX for exploring and testing interactive AI samples powered by local models. Key features include: Quickly explore and download models from well-known sources on GitHub and HuggingFace. Test different models with interactive samples over 25 different scenarios, including text, image, audio, and video use cases. See all relevant code and library references for every sample. Switch between models that run on CPU and GPU depending on your device capabilities. Quickly get started with your own projects by exporting any sample to a fresh Visual Studio project that references the same model cache, preventing duplicate downloads. Part of the motivation behind the Gallery was exposing developers to the host of benefits that come with on-device AI. Some of these benefits include improved data security and privacy, increased control and parameterization, and no dependence on an internet connection or third-party cloud provider. Requirements Device Requirements Minimum OS Version: Windows 10, version 1809 (10.0; Build 17763) Architecture: x64, ARM64 Memory: At least 16 GB is recommended Disk Space: At least 20GB free space is recommended GPU: 8GB of VRAM is recommended for running samples on the GPU Visual Studio 2022 You will need Visual Studio 2022 with the Windows Application Development workload. Running the Gallery To run the gallery: Clone the repository: git clone https://github.com/microsoft/AI-Dev-Gallery.git Run the solution: .\AIDevGallery.sln Hit F5 to build and run the Gallery Using the Gallery The AI Dev Gallery has can be navigated in two ways: The Samples View The Models View Navigating Samples In this view, samples are broken up into categories (Text, Code, Image, etc.) and then into more specific samples, like in the Translate Text pictured below: On clicking a sample, you will be prompted to choose a model to download if you haven’t run this sample before: Next to the model you can see the size of the model, whether it will run on CPU or GPU, and the associated license. Pick the model that makes the most sense for your machine. You can also download new models and change the model for a sample later from the sample view. Just click the model drop down at the top of the sample: The last thing you can do from the Sample pane is view the sample code and export the project to Visual Studio. Both buttons are found in the top right corner of the sample, and the code view will look like this: Navigating Models If you would rather navigate by models instead of samples, the Gallery also provides the model view: The model view contains a similar navigation menu on the right to navigate between models based on category. Clicking on a model will allow you to see a description of the model, the versions of it that are available to download, and the samples that use the model. Clicking on a sample will take back over to the samples view where you can see the model in action. Deleting and Managing Models If you need to clear up space or see download details for the models you are using, you can head over the Settings page to manage your downloads: From here, you can easily see every model you have downloaded and how much space on your drive they are taking up. You can clear your entire cache for a fresh start or delete individual models that you are no longer using. Any deleted model can be redownload through either the models or samples view. Next Steps for the Gallery The AI Dev Gallery is still a work in progress, and we plan on adding more samples, models, APIs, and features, and we are evaluating adding support for NPUs to take the experience even further If you have feedback, noticed a bug, or any ideas for features or samples, head over to the issue board and submit an issue. We also have a discussion board for any other topics relevant to the Gallery. The Gallery is an open-source project, and we would love contribution, feedback, and ideation! Happy modeling!3.7KViews4likes3CommentsAI Toolkit for VS Code January Update
AI Toolkit is a VS Code extension aiming to empower AI engineers in transforming their curiosity into advanced generative AI applications. This toolkit, featuring both local-enabled and cloud-accelerated inner loop capabilities, is set to ease model exploration, prompt engineering, and the creation and evaluation of generative applications. We are pleased to announce the January Update to the toolkit with support for OpenAI's o1 model and enhancements in the Model Playground and Bulk Run features. What's New? January’s update brings several exciting new features to boost your productivity in AI development. Here's a closer look at what's included: Support for OpenAI’s new o1 Model: We've added access to GitHub hosted OpenAI’s latest o1 model. This new model replaces the o1-preview and offers even better performance in handling complex tasks. You can start interacting with the o1 model within VS Code for free by using the latest AI Toolkit update. Chat History Support in Model Playground: We have heard your feedback that tracking past model interactions is crucial. The Model Playground has been updated to include support for chat history. This feature saves chat history as individual files stored entirely on your local machine, ensuring privacy and security. Bulk Run with Prompt Templating: The Bulk Run feature, introduced in the AI Toolkit December release, now supports prompt templating with variables. This allows users to create templates for prompts, insert variables, and run them in bulk. This enhancement simplifies the process of testing multiple scenarios and models. Stay tuned for more updates and enhancements as we continue to innovate and support your journey in AI development. Try out the AI Toolkit for Visual Studio Code, share your thoughts, and file issues and suggest features in our GitHub repo. Thank you for being a part of this journey with us!Fine-Tuning Language Models with Azure AI Foundry: A Detailed Guide
What is Azure AI Foundry? Azure AI Foundry is a comprehensive platform designed to simplify the development, deployment, and management of AI models. It provides a user-friendly interface and powerful tools that enable developers to create custom AI solutions without needing extensive machine learning expertise. Key Features of Azure AI Foundry One-Button Fine-Tuning: A streamlined process that allows users to fine-tune models with minimal configuration. Integration with Development Tools: Seamless integration with popular development environments, particularly Visual Studio Code. Support for Multiple Models: Access to a variety of pre-trained models, including the Phi family of models. Understanding Fine-Tuning Fine-tuning is the process of taking a pre-trained model and adapting it to a specific dataset or task. This is particularly useful when the base model has been trained on a large corpus of general data but needs to perform well on a narrower domain. Why Fine-Tune? Improved Performance: Fine-tuning can significantly enhance the model's accuracy and relevance for specific tasks. Reduced Training Time: Starting with a pre-trained model reduces the amount of data and time required for training. Customization: Tailor the model to meet the unique needs of your application or business. One-Button Fine-Tuning in Azure AI Foundry Step-by-Step Process Select the Model: Log in to Azure AI Foundry and navigate to the model selection interface. Choose Phi-3 or another small language model from the available options. Prepare Your Data: Ensure your dataset is formatted correctly. Typically, this involves having a set of input-output pairs that the model can learn from. Upload your dataset to Azure AI Foundry. The platform supports various data formats, making it easy to integrate your existing data. Initiate Fine-Tuning: Locate the one-button fine-tuning feature within the Azure AI Foundry interface. Click the button to start the fine-tuning process. The platform will handle the configuration and setup automatically. Monitor Progress: After initiating fine-tuning, you can monitor the process through the Azure portal. The portal provides real-time updates on training metrics, allowing you to track the model's performance as it learns. Evaluate the Model: Once fine-tuning is complete, evaluate the model's performance using a validation dataset. Azure AI Foundry provides tools for assessing accuracy, precision, recall, and other relevant metrics. Deploy the Model: After successful evaluation, you can deploy the fine-tuned model directly from Azure AI Foundry. The platform supports various deployment options, including REST APIs and integration with other Azure services. Using the AI Toolkit in Visual Studio Code Overview of the AI Toolkit The AI Toolkit for Visual Studio Code enhances the development experience by providing tools specifically designed for AI model management and fine-tuning. This integration allows developers to work within a familiar environment while leveraging powerful AI capabilities. Key Features of the AI Toolkit 1) Model Management: Easily manage and switch between different models, including Phi-3 and Ollama models. 2) Data Handling: Simplified data upload and preprocessing tools to prepare datasets for training. 3) Real-Time Collaboration: Collaborate with team members in real-time, sharing insights and progress on AI projects. How to Use the AI Toolkit 1) Install the AI Toolkit: Open Visual Studio Code and navigate to the Extensions Marketplace. Search for "AI Toolkit" and install the extension. 2) Connect to Azure AI Foundry: Once installed, configure the toolkit to connect to your Azure AI Foundry account. This will allow you to access your models and datasets directly from Visual Studio Code. 3) Fine-Tune Models: Use the toolkit to initiate fine-tuning processes directly from your development environment. Monitor training progress and view logs without leaving Visual Studio Code. 4) Consume Ollama Models: The AI Toolkit supports the consumption of Ollama models, providing additional flexibility in your AI projects. This feature allows you to integrate various models seamlessly, enhancing your application's capabilities. Microsoft ONNX Live for Fine-Tuning What is Microsoft ONNX Live? Microsoft ONNX Live is a platform that allows developers to deploy and optimize AI models using the Open Neural Network Exchange (ONNX) format. ONNX is an open-source format that enables interoperability between different AI frameworks, making it easier to deploy models across various environments. Key Features of Microsoft ONNX Live Model Optimization: ONNX Live provides tools to optimize models for performance, ensuring they run efficiently in production environments. Cross-Framework Compatibility: Models trained in different frameworks (like PyTorch or TensorFlow) can be converted to ONNX format, allowing for greater flexibility in deployment. Real-Time Inference: ONNX Live supports real-time inference, enabling applications to utilize AI models for immediate predictions. Fine-Tuning with ONNX Live Model Conversion: If you have a model trained in a different framework, you can convert it to ONNX format using tools provided by Microsoft. This conversion allows you to leverage the benefits of ONNX Live for deployment and optimization. Integration with Azure AI Foundry: Once your model is in ONNX format, you can integrate it with Azure AI Foundry for fine-tuning. The one-button fine-tuning feature can be used to adapt the ONNX model to your specific dataset. Optimization Techniques: After fine-tuning, you can apply various optimization techniques available in ONNX Live to enhance the model's performance. Techniques such as quantization and pruning can significantly reduce the model size and improve inference speed. Deployment: Once optimized, the model can be deployed directly from Azure AI Foundry or ONNX Live. This deployment can be done as a REST API, allowing easy integration with web applications and services. Additional Resources To further enhance your understanding and capabilities in fine-tuning language models, consider exploring the following resources: Phi-3 Cookbook: This comprehensive guide provides insights into getting started with Phi models, including best practices for fine-tuning and deployment. Explore the Phi-3 Cookbook. Ignite Fine-Tuning Workshop: This workshop offers a hands-on approach to learning about fine-tuning techniques and tools. It includes real-world scenarios to help you understand the practical applications of fine-tuning. Visit the GitHub Repository. Conclusion Fine-tuning language models like Phi-3 using Azure AI Foundry, combined with the AI Toolkit in Visual Studio Code and Microsoft ONNX Live, provides a powerful and efficient workflow for developers. The one-button fine-tuning feature simplifies the process, while the integration with ONNX Live allows for optimization and deployment flexibility. By leveraging these tools, you can enhance your AI applications, ensuring they are tailored to meet specific needs and perform optimally in production environments. Whether you are a seasoned AI developer or just starting, Azure AI Foundry and its associated tools offer a robust ecosystem for building and deploying advanced AI solutions. References Microsoft Docs Links Fine-Tuning Models in Azure OpenAI Azure AI Services Documentation Azure Machine Learning Documentation Microsoft Learn Links Develop Generative AI Apps in Azure Fine-Tune a Language Model Azure AI Foundry Overview Get started with AI Toolkit for Visual Studio Code859Views0likes0CommentsRecipe Generator Application with Phi-3 Vision on AI Toolkit Locally
In today's data-driven world, images have become a ubiquitous source of information. From social media feeds to medical imaging, we encounter and generate images constantly. Extracting meaningful insights from these visual data requires sophisticated analysis techniques. In this blog post let’s build an Image Analysis Application using the cutting-edge Phi-3 Vision model completely free of cost and on-premise environment using the VS Code AI Toolkit. We'll explore the exciting possibilities that this powerful combination offers. The AI Toolkit for Visual Studio Code (VS Code) is a VS Code extension that simplifies generative AI app development by bringing together cutting-edge AI development tools and models. I would recommend going through the following blogs for getting started with VS Code AI Toolkit. 1. Visual Studio Code AI Toolkit: How to Run LLMs locally 2. Visual Studio AI Toolkit : Building Phi-3 GenAI Applications 3. Building Retrieval Augmented Generation on VSCode & AI Toolkit 4. Bring your own models on AI Toolkit - using Ollama and API keys Setup VS Code AI Toolkit: Launch the VS Code application and Click on the VS Code AI Toolkit extension. Login to the GitHub account if not already done. Once ready, click on model catalog. In the model catalog there are a lot of models, broadly classified into two categories, Local Run (with CPU and with GPU) Remote Access (Hosted by GitHub and other providers) For this blog, we will be using a Local Run model. This will utilize the local machine’s hardware to run the Language model. Since it involves analyzing images, we will be using the language model which supports vision operations and hence Phi-3-Vision will be a good fit as its light and supports local run. Download the model and then further it will be loaded it in the playground to test. Once downloaded, Launch the “Playground” tab and load the Phi-3 Vision model from the dropdown. The Playground also shows that Phi-3 vision allows image attachments. We can try it out before we start developing the application. Let’s upload the image using the “Paperclip icon” on the UI. I have uploaded image of Microsoft logo and prompted the language model to Analyze and explain the image. Phi-3 vision running on local premise boasts an uncanny ability to not just detect but unerringly pinpoint the exact Company logo and decipher the name with astonishing precision. This is a simple use case, but it can be built upon with various applications to unlock a world of new possibilities. Port Forwarding: Port Forwarding, a valuable feature within the AI Toolkit, serves as a crucial gateway for seamless communication with the GenAI model. To do this, launch the terminal and navigate to the “Ports” section. There will be button “Forward a Port”, click on that and select any desired port, in this blog we will use 5272 as the port. The Model-as-a-server is now ready, where the model will be available on the port 5272 to respond to the API calls. It can be tested with any API testing application. To know more click here. Creating Application with Python using OpenAI SDK: To follow this section, Python must be installed on the local machine. Launch the new VS Code window and set the working directory. Create a new Python Virtual environment. Once the setup is ready, open the terminal on VS Code, and install the libraries using “pip”. pip install openai pip install streamlit Before we build the streamlit application, lets develop the basic program and check the responses in the VSCode terminal and then further develop a basic webapp using the streamlit framework. Basic Program Import libraries: import base64 from openai import OpenAI base64: The base64 module provides functions for encoding binary data to base64-encoded strings and decoding base64-encoded strings back to binary data. Base64 encoding is commonly used for encoding binary data in text-based formats such as JSON or XML. OpenAI: The OpenAI package is a Python client library for interacting with OpenAI's API. The OpenAI class provides methods for accessing various OpenAI services, such as generating text, performing natural language processing tasks, and more. Initialize Client: Initialize an instance of the OpenAI class from the openai package, client = OpenAI( base_url="http://127.0.0.1:5272/v1/", api_key="xyz" # required by API but not used ) OpenAI (): Initializes a OpenAI model with specific parameters, including a base URL for the API, an API key, a custom model name, and a temperature setting. This model is used to generate responses based on user queries. This instance will be used to interact with the OpenAI API. base_url = "http://127.0.0.1:5272/v1/": Specifies the base URL for the OpenAI API. In this case, it points to a local server running on 127.0.0.1 (localhost) at port 5272. api_key = "ai-toolkit": The API key used to authenticate requests to the OpenAI API. In case of AI Toolkit usage, we don’t have to specify any API key. The image analysis application will frequently deal with images uploaded by users. But to send these images to GenAI model, we need them in a format it understands. This is where the encode_image function comes in. # Function to encode the image def encode_image(image_path): with open(image_path, "rb") as image_file: return base64.b64encode(image_file.read()).decode("utf-8") Function Definition: def encode_image(image_path): defines a function named encode_image that takes a single argument, image_path. This argument represents the file path of the image we want to encode. Opening the Image: with open(image_path, "rb") as image_file: opens the image file specified by image_path in binary reading mode ("rb"). This is crucial because we're dealing with raw image data, not text. Reading Image Content: image_file.read() reads the entire content of the image file into a byte stream. Remember, images are stored as collections of bytes representing color values for each pixel. Base64 Encoding: base64.b64encode(image_file.read()) encodes the byte stream containing the image data into base64 format. Base64 encoding is a way to represent binary data using a combination of printable characters, which makes it easier to transmit or store the data. Decoding to UTF-8: .decode("utf-8") decodes the base64-encoded data into a UTF-8 string. This step is necessary because the OpenAI API typically expects text input, and the base64-encoded string can be treated as text containing special characters. Returning the Encoded Image: return returns the base64-encoded string representation of the image. This encoded string is what we'll send to the AI model for analysis. In essence, the encode_image function acts as a bridge, transforming an image file on your computer into a format that the AI model can understand and process. Path for the Image: We will use an image stored on our local machine for this section, while we develop the webapp, we will change this to accept it to what the user uploads. image_path = "C:/img.jpg" #path of the image here This line of code is crucial for any program that needs to interact with an image file. It provides the necessary information for the program to locate and access the image data. Base64 String: # Getting the base64 string base64_image = encode_image(image_path) This line of code is responsible for obtaining the base64-encoded representation of the image specified by the image_path. Let's break it down: encode_image(image_path): This part calls the encode_image function, which we've discussed earlier. This function takes the image_path as input and performs the following: Reads the image file from the specified path. Converts the image data into a base64-encoded string. Returns the resulting base64-encoded string. base64_image = ...: This part assigns the return value of the encode_image function to the variable base64_image. This section effectively fetches the image from the given location and transforms it into a special format (base64) that can be easily handled and transmitted by the computer system. This base64-encoded string will be used subsequently to send the image data to the AI model for analysis. Invoking the Language Model: This code tells the AI model what to do with the image. response = client.chat.completions.create( model="Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx", messages=[ { "role": "user", "content": [ { "type": "text", "text": "What's in the Image?", }, { "type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}, }, ], } ], ) response = client.chat.completions.create(...): This line sends instructions to the AI model we're using (represented by client). Here's a breakdown of what it's telling the model: chat.completions.create: We're using a specific part of the OpenAI API designed for having a conversation-like interaction with the model. The ... part: This represents additional details that define what we want the model to do, which we'll explore next. Let's break down the details (...) sent to the model: 1) model="Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx": This tells the model exactly which AI model to use for analysis. In our case, it's the "Phi-3-vision" model. 2) messages: This defines what information we're providing to the model. Here, we're sending two pieces of information: role": "user": This specifies that the first message comes from a user (us). The content: This includes two parts: "What's in the Image?": This is the prompt we're sending to the model about the image. "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}: This sends the actual image data encoded in base64 format (stored in base64_image). In a nutshell, this code snippet acts like giving instructions to the AI model. We specify the model to use, tell it we have a question about an image, and then provide the image data itself. Printing the response on the console: print(response.choices[0].message.content) We asked the AI model "What's in this image?" This line of code would then display the AI's answer. Console response: Finally, we can see the response on the terminal. Now to make things more interesting, let’s convert this into a webapp using the streamlit framework. Recipe Generator Application with Streamlit: Now we know how to interact with the Vision model offline using a basic console. Let’s make things even more exciting by applying all this to a use-case which probably will be most loved by all those who are cooking enthusiasts!! Yes, let’s create an application which will assist in cooking by looking what’s in the image of ingredients! Create a new file and name is as “app.py” select the same. venv that was used earlier. Make sure the Visual studio toolkit is running and serving the Phi-3 Vision model through the port 5272. First step is importing the libraries, import streamlit as st import base64 from openai import OpenAI base64 and OpenAI is the same as we had used in the earlier section. Streamlit: This part imports the entire Streamlit library, which provides a powerful set of tools for creating user interfaces (UIs) with Python. Streamlit simplifies the process of building web apps by allowing you to write Python scripts that directly translate into interactive web pages. client = OpenAI( base_url="http://127.0.0.1:5272/v1/", api_key="xyz" # required by API but not used ) As discussed in the earlier section, initializing the client and configuring the base_url and api_key. st.title('Recipe Generator 🍔') st.write('This is a simple recipe generator application.Upload images of the Ingridients and get the recipe by Chef GenAI! 🧑🍳') uploaded_file = st.file_uploader("Choose a file") if uploaded_file is not None: st.image(uploaded_file, width=300) st.title('Recipe Generator 🍔'): This line sets the title of the Streamlit application as "Recipe Generator" with a visually appealing burger emoji. st.write(...): This line displays a brief description of the application's functionality to the user. uploaded_file = st.file_uploader("Choose a file"): This creates a file uploader component within the Streamlit app. Users can select and upload an image file (likely an image of ingredients). if uploaded_file is not None: : This conditional block executes only when the user has actually selected and uploaded a file. st.image(uploaded_file, width=300): If an image is uploaded, this line displays the uploaded image within the Streamlit app with a width of 300 pixels. In essence, this code establishes the basic user interface for the Recipe Generator app. It allows users to upload an image, and if an image is uploaded, it displays the image within the app. preference = st.sidebar.selectbox( "Choose your preference", ("Vegetarian", "Non-Vegetarian") ) cuisine = st.sidebar.selectbox( "Select for Cuisine", ("Indian","Chinese","French","Thai","Italian","Mexican","Japanese","American","Greek","Spanish") ) We use Streamlit's sidebar and selectbox features to create interactive user input options within a web application: st.sidebar.selectbox(...): This line creates a dropdown menu (selectbox) within the sidebar of the Streamlit application.The first argument, "Choose your preference", sets the label or title for the dropdown.The second argument, ("Vegetarian", "Non-Vegetarian"), defines the list of options available for the user to select (in this case, dietary preferences). cuisine = st.sidebar.selectbox(...): This line creates another dropdown menu in the sidebar, this time for selecting the desired cuisine.The label is "Select for Cuisine".The options provided include "Indian", "Chinese", "French", and several other popular cuisines. In essence, this code allows users to interact with the application by selecting their preferred dietary restrictions (Vegetarian or Non-Vegetarian) and desired cuisine from the dropdown menus in the sidebar. def encode_image(uploaded_file): """Encodes a Streamlit uploaded file into base64 format""" if uploaded_file is not None: content = uploaded_file.read() return base64.b64encode(content).decode("utf-8") else: return None base64_image = encode_image(uploaded_file) The same function of encode_image as discussed in the earlier section is being used here. if st.button("Ask Chef GenAI!"): if base64_image: response = client.chat.completions.create( model="Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx", messages=[ { "role": "user", "content": [ { "type": "text", "text": f"STRICTLY use the ingredients in the image to generate a {preference} recipe and {cuisine} cuisine.", }, { "type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}, }, ], } ], ) print(response.choices[0].message.content) st.write(response.choices[0].message.content) else: st.write("Please upload an image with any number of ingridients and instantly get a recipe.") Above code block implements the core functionality of the Recipe Generator app, triggered when the user clicks a button labeled "Ask Chef GenAI!": if st.button("Ask Chef GenAI!"): This line checks if the user has clicked the button. If they have, the code within the if block executes. if base64_image: This inner if condition checks if a variable named base64_image has a value. This variable likely stores the base64 encoded representation of the uploaded image (containing ingredients). If base64_image has a value (meaning an image is uploaded), the code proceeds. client.chat.completions.create(...): Client that had been defined earlier interacts with the API . Here, it calls a to generate text completions, thereby invoking a small language model. The arguments provided specify the model to be used ("Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx") and the message to be completed. The message consists of two parts within a list: User Input: The first part defines the user's role ("user") and the content they provide. This content is an instruction with two key points: Dietary Preference: It specifies to "STRICTLY use the ingredients in the image" to generate a recipe that adheres to the user's preference (vegetarian or non-vegetarian, set using the preference dropdown). Cuisine Preference: It mentions the desired cuisine type (Indian, Chinese, etc., selected using the cuisine dropdown). Image Data: The second part provides the image data itself. It includes the type ("image_url") and the URL, which is constructed using the base64_image variable containing the base64 encoded image data. print(response.choices[0].message.content) & st.write(...): The response will contain a list of possible completions. Here, the code retrieves the first completion (response.choices[0]) and extracts its message content. This content is then printed to the console like before and displayed on the Streamlit app using st.write. else block: If no image is uploaded (i.e., base64_image is empty), the else block executes. It displays a message reminding the user to upload an image to get recipe recommendations. The above code block is the same as before except the we have now modified it to accept few inputs and also have made it compatible with streamlit. The coding is now completed for our streamlit application! It's time to test the application. Navigate to the terminal on Visual Studio Code and enter the following command, (if the file is named as app.py) streamlit run app.py Upon successful run, it will redirect to default browser and a screen with the Recipe generator will be launched, Upload an image with ingredients, select the recipe, cuisine and click on “Ask Chef GenAI”. It will take a few moments for delightful recipe generation. While generating we can see the logs on the terminal and finally the recipe will be shown on the screen! Enjoy your first recipe curated by Chef GenAI powered by Phi-3 vision model on local prem using Visual Studio AI Toolkit! The code is available on the following GitHub Repository. In the upcoming series we will explore more types of Gen AI implementations with AI toolkit. Resources: 1. Visual Studio Code AI Toolkit: Run LLMs locally 2. Visual Studio AI Toolkit : Building Phi-3 GenAI Applications 3. Building Retrieval Augmented Generation on VSCode & AI Toolkit 4. Bring your own models on AI Toolkit - using Ollama and API keys 5. Expanded model catalog for AI Toolkit 6. Azure Toolkit Samples GitHub Repository320Views1like1CommentGetting Started - Generative AI with Phi-3-mini: Running Phi-3-mini in Intel AI PC
In 2024, with the empowerment of AI, we will enter the era of AI PC. On May 20, Microsoft also released the concept of Copilot + PC, which means that PC can run SLM/LLM more efficiently with the support of NPU. We can use models from different Phi-3 family combined with the new AI PC to build a simple personalized Copilot application for individuals. This content will combine Intel's AI PC, use Intel's OpenVINO, NPU Acceleration Library, and Microsoft's DirectML to create a local Copilot.30KViews2likes2Comments