ollama
Building AI Agents on edge devices using Ollama + Phi-4-mini Function Calling
The new Phi-4-mini and Phi-4-multimodal now support Function Calling. This feature enables the models to connect with external tools and APIs. By deploying Phi-4-mini and Phi-4-multimodal with Function Calling capabilities on edge devices, we can expand their knowledge locally and improve their task execution efficiency. This blog focuses on how to use Phi-4-mini's Function Calling capabilities to build efficient AI Agents on edge devices.

What's Function Calling

How it works

First we need to learn how Function Calling works:

- Tool Integration: Function Calling allows an LLM/SLM to interact with external tools and APIs, such as weather APIs, databases, or other services.
- Function Definition: Defines a function (tool) that the LLM/SLM can call, specifying its name, parameters, and expected output.
- LLM Detection: The LLM/SLM analyzes the user's input and determines whether a function call is required and which function to use.
- JSON Output: The LLM/SLM outputs a JSON object containing the name of the function to call and the parameters required by the function.
- External Execution: The application executes the function call using the parameters provided by the LLM/SLM.
- Response to LLM: The output of the function call is returned to the LLM/SLM, which can use this information to generate a response to the user.

Application scenarios

- Data retrieval: convert natural language queries into API calls to fetch data (e.g., "show my recent orders" triggers a database query)
- Operation execution: convert user requests into specific function calls (e.g., "schedule a meeting" becomes a calendar API call)
- Computational tasks: handle mathematical or logical operations through dedicated functions (e.g., calculate compound interest or statistical analysis)
- Data processing: chain multiple function calls together (e.g., get data → parse → transform → store)
- UI/UX integration: trigger interface updates based on user interactions (e.g., update map markers or display charts)

Phi-4-mini / Phi-4-multimodal's Function Calling

Phi-4-mini / Phi-4-multimodal supports single and parallel Function Calling. Things to note when calling:

- You need to define Tools in the system message to start single or parallel Function Calling.
- If you want to start parallel Function Calling, you also need to add 'some tools' to the system prompt.

The following is an example.

Single Function Calling

```python
import json

tools = [
    {
        "name": "get_match_result",
        "description": "get match result",
        "parameters": {
            "match": {
                "description": "The name of the match",
                "type": "str",
                "default": "Arsenal vs ManCity",
            }
        },
    },
]

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant",
        "tools": json.dumps(tools),  # pass the tools into system message using tools argument
    },
    {
        "role": "user",
        "content": "What is the result of Arsenal vs ManCity today?",
    },
]
```

Full sample: Click.
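To make the round trip concrete, here is a minimal sketch of how an application might parse the model's function-call output and execute the matching local function. The functools output format is the one produced by the adjusted template shown later in this post; the raw_output string and the get_match_result implementation are hypothetical placeholders.

```python
import json

# Hypothetical raw completion from Phi-4-mini in the functools format
# (the exact text will vary from run to run).
raw_output = 'functools[{"name": "get_match_result", "arguments": {"match": "Arsenal vs ManCity"}}]'

def parse_tool_calls(text: str):
    """Extract the JSON list that follows the functools marker."""
    marker = "functools"
    if marker not in text:
        return []  # plain-text answer, no tool call requested
    payload = text.split(marker, 1)[1].strip()
    return json.loads(payload)

# Hypothetical local implementation of the tool defined above.
def get_match_result(match: str) -> str:
    return f"{match}: 2-2"  # placeholder result

for call in parse_tool_calls(raw_output):
    if call["name"] == "get_match_result":
        result = get_match_result(**call["arguments"])
        # The result would then be sent back to the model as a tool message
        # so it can phrase the final answer for the user.
        print(result)
```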
Parallel Function Calling

```python
import json

AGENT_TOOLS = {
    "booking_fight": {
        "name": "booking_fight",
        "description": "booking fight",
        "parameters": {
            "departure": {
                "description": "The name of Departure airport code",
                "type": "str",
            },
            "destination": {
                "description": "The name of Destination airport code",
                "type": "str",
            },
            "outbound_date": {
                "description": "The date of outbound flight",
                "type": "str",
            },
            "return_date": {
                "description": "The date of return flight",
                "type": "str",
            },
        },
    },
    "booking_hotel": {
        "name": "booking_hotel",
        "description": "booking hotel",
        "parameters": {
            "query": {
                "description": "The name of the city",
                "type": "str",
            },
            "check_in_date": {
                "description": "The date of check in",
                "type": "str",
            },
            "check_out_date": {
                "description": "The date of check out",
                "type": "str",
            },
        },
    },
}

SYSTEM_PROMPT = """
You are my travel agent with some tools available.
"""

messages = [
    {
        "role": "system",
        "content": SYSTEM_PROMPT,
        "tools": json.dumps(AGENT_TOOLS),  # pass the tools into system message using tools argument
    },
    {
        "role": "user",
        "content": """I have a business trip from London to New York in March 21 2025 to March 27 2025, can you help me to book a hotel and flight tickets""",
    },
]
```

Full sample: Click.
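For the parallel case, the model returns several calls in one functools list, which the application then dispatches to the matching handlers. The sketch below is illustrative only: the handler bodies and the raw_output string are hypothetical, and the tool names (including the booking_fight spelling) follow the AGENT_TOOLS definition above.

```python
import json

# Hypothetical handlers for the two tools defined above; a real application
# would call flight and hotel booking APIs here.
def booking_fight(departure, destination, outbound_date, return_date):
    return {"flight": f"{departure}->{destination}", "out": outbound_date, "back": return_date}

def booking_hotel(query, check_in_date, check_out_date):
    return {"hotel_city": query, "check_in": check_in_date, "check_out": check_out_date}

HANDLERS = {"booking_fight": booking_fight, "booking_hotel": booking_hotel}

# Hypothetical parallel output in the functools format (two calls in one list).
raw_output = ('functools[{"name": "booking_fight", "arguments": {"departure": "LHR", '
              '"destination": "JFK", "outbound_date": "2025-03-21", "return_date": "2025-03-27"}}, '
              '{"name": "booking_hotel", "arguments": {"query": "New York", '
              '"check_in_date": "2025-03-21", "check_out_date": "2025-03-27"}}]')

calls = json.loads(raw_output.split("functools", 1)[1])
results = [HANDLERS[c["name"]](**c["arguments"]) for c in calls]
print(results)
```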
Using Ollama and Phi-4-mini Function Calling to Create AI Agents on Edge Devices

Ollama is a popular free tool for deploying LLMs/SLMs locally and can be used in combination with the AI Toolkit for VS Code. In addition to running on your PC or laptop, it can also be deployed on IoT devices, mobile phones, containers, etc. To use Phi-4-mini on Ollama, you need Ollama 0.5.13 or later. Different quantized versions are supported on Ollama, as shown in the figure below.

Using Ollama, we can deploy Phi-4-mini on the edge and implement an AI Agent with Function Calling under limited computing power, so that generative AI can be applied more effectively on the edge.

Current Issues

A sad experience: if you call Ollama directly in the way shown above, you will find that Function Calling is not triggered. There is a discussion in Ollama's GitHub issues; see https://github.com/ollama/ollama/issues/9437. Modifying the Phi-4-mini template in the Modelfile makes single Function Calling work, but parallel Function Calling still fails.

Resolution

We implemented a fix by adjusting the template. We improved it according to Phi-4-mini's chat template and re-modified the Modelfile. Of course, the quantized model has a huge impact on the results. The adjustments are as follows:

```
TEMPLATE """
{{- if .Messages }}
{{- if or .System .Tools }}<|system|>
{{ if .System }}{{ .System }}
{{- end }}
In addition to plain text responses, you can choose to call one or more of the provided functions.

Use the following rule to decide when to call a function:
  * if the response can be generated from your internal knowledge (e.g., as in the case of queries like "What is the capital of Poland?"), do so
  * if you need external information that can be obtained by calling one or more of the provided functions, generate function calls

If you decide to call functions:
  * prefix function calls with functools marker (no closing marker required)
  * all function calls should be generated in a single JSON list formatted as functools[{"name": [function name], "arguments": [function arguments as JSON]}, ...]
  * follow the provided JSON schema. Do not hallucinate arguments or values. Do not blindly copy values from the provided samples
  * respect the argument type formatting. E.g., if the type is number and format is float, write value 7 as 7.0
  * make sure you pick the right functions that match the user intent

Available functions as JSON spec:
{{- if .Tools }}
{{ .Tools }}
{{- end }}<|end|>
{{- end }}
{{- range .Messages }}
{{- if ne .Role "system" }}<|{{ .Role }}|>
{{- if and .Content (eq .Role "tools") }}
{"result": {{ .Content }}}
{{- else if .Content }}
{{ .Content }}
{{- else if .ToolCalls }}
functools[
{{- range .ToolCalls }}{{ "{" }}"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}{{ "}" }}
{{- end }}]
{{- end }}<|end|>
{{- end }}
{{- end }}<|assistant|>
{{ else }}
{{- if .System }}<|system|>
{{ .System }}<|end|>{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>{{ end }}<|assistant|>
{{ end }}{{ .Response }}{{ if .Response }}<|user|>{{ end }}
"""
```

We have tested the solution using different quantized models. In a laptop environment, we recommend using the following model to enable single/parallel Function Calling: phi4-mini:3.8b-fp16.

Note: you need to bind the adjusted Modelfile to phi4-mini:3.8b-fp16 for this to work. Execute the following commands in the command line:

```bash
# If you haven't downloaded it yet, please execute this command first
ollama run phi4-mini:3.8b-fp16

# Bind with the adjusted Modelfile
ollama create phi4-mini:3.8b-fp16 -f {Your Modelfile Path}
```

You can then test both single and parallel Function Calling of Phi-4-mini; a minimal API-level sketch follows, and the full samples are available in the notebook.
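Once the adjusted model has been created with ollama create, one way to exercise it is through Ollama's REST API on the default port 11434. This is a minimal sketch, not the notebook sample: it assumes the phi4-mini:3.8b-fp16 model built from the Modelfile above, and, depending on the Ollama version and template handling, the function call may come back either as structured tool_calls or as raw functools[...] text in the message content, so the sketch prints whichever is present.

```python
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_match_result",
        "description": "get match result",
        "parameters": {
            "type": "object",
            "properties": {"match": {"type": "string", "description": "The name of the match"}},
            "required": ["match"],
        },
    },
}]

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "phi4-mini:3.8b-fp16",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": "What is the result of Arsenal vs ManCity today?"},
        ],
        "tools": tools,
        "stream": False,
    },
    timeout=120,
)
message = response.json()["message"]
# Either structured tool calls or the raw functools[...] text, depending on version.
print(message.get("tool_calls") or message.get("content"))
```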
The above example is just a simple introduction. As development moves forward, we hope to find simpler ways to apply this on the edge, use Function Calling to expand the scenarios of Phi-4-mini / Phi-4-multimodal, and develop more use cases in vertical industries.

Resources

- Phi-4 model on Hugging Face: https://huggingface.co/collections/microsoft/phi-4-677e9380e514feb5577a40e4
- Phi-4-mini on Ollama: https://ollama.com/library/phi4-mini
- Learn Function Calling: https://huggingface.co/docs/hugs/en/guides/function-calling
- Phi Cookbook - Samples and Resources for Phi Models: https://aka.ms/phicookbook

Step-by-step: Integrate Ollama Web UI to use Azure Open AI API with LiteLLM Proxy

Introduction

Ollama WebUI is a streamlined interface for deploying and interacting with open-source large language models (LLMs) like Llama 3 and Mistral, enabling users to manage models, test them via a ChatGPT-like chat environment, and integrate them into applications through Ollama's local API. While it excels for self-hosted models on platforms like Azure VMs, it does not natively support Azure OpenAI API endpoints; OpenAI's proprietary models (e.g., GPT-4) remain accessible only through OpenAI's managed API. However, tools like LiteLLM bridge this gap, allowing developers to combine Ollama-hosted models with OpenAI's API in hybrid workflows, while maintaining compliance and cost-efficiency. This setup empowers users to leverage both self-managed open-source models and cloud-based AI services.

Problem Statement

As of February 2025, Ollama WebUI still does not support the Azure OpenAI API. The Ollama Web UI only supports the self-hosted Ollama API and the managed OpenAI API service (PaaS). This is an issue for users who want to use OpenAI models they have already deployed on Azure AI Foundry.

Objective

To integrate the Azure OpenAI API into Ollama Web UI via a LiteLLM proxy. LiteLLM translates Azure AI API requests into OpenAI-style requests for Ollama Web UI, allowing users to use OpenAI models deployed on Azure AI Foundry. If you haven't hosted Ollama WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you have Ollama WebUI deployed already.

Step 1: Deploy OpenAI Models on Azure AI Foundry

If you haven't created an Azure AI Hub already, search for Azure AI Foundry on Azure, and click on the "+ Create" button > Hub. Fill out all the empty fields with the appropriate configuration and click on "Create". After the Azure AI Hub is successfully deployed, click on the deployed resource and launch the Azure AI Foundry service.

To deploy new models on Azure AI Foundry, find the "Models + Endpoints" section on the left-hand side and click on the "+ Deploy Model" button > "Deploy base model". A popup will appear, and you can choose which models to deploy on Azure AI Foundry. Please note that the o-series models are only available to select customers at the moment. You can request access to the o-series models by completing the request access form and waiting until Microsoft approves the request.

Click on "Confirm" and another popup will emerge. Name the deployment and click on "Deploy" to deploy the model. Wait a few moments for the model to deploy. Once it is successfully deployed, save the "Target URI" and the API key.

Step 2: Deploy LiteLLM Proxy via Docker Container

Before pulling the LiteLLM image into the host environment, create a file named "litellm_config.yaml" and list the models you deployed on Azure AI Foundry, along with the API endpoints and keys. Replace "API_Endpoint" and "API_Key" with the "Target URI" and "Key" found in Azure AI Foundry, respectively. Template for the "litellm_config.yaml" file:
```yaml
model_list:
  - model_name: [model_name]
    litellm_params:
      model: azure/[model_name_on_azure]
      api_base: "[API_ENDPOINT/Target_URI]"
      api_key: "[API_Key]"
      api_version: "[API_Version]"
```

Tip: you can find the API version at the end of the Target URI of the model's endpoint. Sample endpoint: https://example.openai.azure.com/openai/deployments/o1-mini/chat/completions?api-version=2024-08-01-preview

Run the docker command below to start LiteLLM Proxy with the correct settings:

```bash
docker run -d \
    -v $(pwd)/litellm_config.yaml:/app/config.yaml \
    -p 4000:4000 \
    --name litellm-proxy-v1 \
    --restart always \
    ghcr.io/berriai/litellm:main-latest \
    --config /app/config.yaml --detailed_debug
```

Make sure to run the docker command inside the directory where you created the "litellm_config.yaml" file. The port used to listen for LiteLLM Proxy traffic is port 4000.

Now that the LiteLLM proxy is running on port 4000, let's change the OpenAI API settings in Ollama WebUI. Navigate to Ollama WebUI's Admin Panel > Settings > Connections and, under the OpenAI API section, enter http://127.0.0.1:4000 as the API endpoint and set any key (you must enter something to make it work). Click the "Save" button to apply the changes.

Refresh the browser and you should see the AI models deployed on Azure AI Foundry listed in the Ollama WebUI. Now let's test the chat completion and Web Search capability using the "o1-mini" model in Ollama WebUI; a quick API-level check of the proxy is sketched below.
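Besides the WebUI, the proxy can be verified directly with the OpenAI Python SDK, since LiteLLM exposes an OpenAI-compatible endpoint on port 4000. This is a minimal sketch: the model name o1-mini is assumed to match a model_name entry in your litellm_config.yaml, and the API key can be any placeholder string unless you configured keys on the proxy.

```python
from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy instead of api.openai.com.
client = OpenAI(base_url="http://127.0.0.1:4000", api_key="anything")

response = client.chat.completions.create(
    model="o1-mini",  # must match a model_name from litellm_config.yaml
    messages=[{"role": "user", "content": "Say hello from Azure via LiteLLM."}],
)
print(response.choices[0].message.content)
```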
Conclusion

Hosting Ollama WebUI on an Azure VM and integrating it with OpenAI's API via LiteLLM offers a powerful, flexible approach to AI deployment, combining the cost-efficiency of open-source models with the advanced capabilities of managed cloud services. While Ollama itself doesn't support Azure OpenAI endpoints, the hybrid architecture empowers IT teams to balance data privacy (via self-hosted models on Azure AI Foundry) and cutting-edge performance (using the Azure OpenAI API), all within Azure's scalable ecosystem. This guide covers every step required to deploy your OpenAI models on Azure AI Foundry, set up the required resources, deploy LiteLLM Proxy on your host machine, and configure Ollama WebUI to support Azure AI endpoints. You can test and improve your AI model even further with the Ollama WebUI interface, with Web Search, Text-to-Image Generation, etc., all in one place.

Recipe Generator Application with Phi-3 Vision on AI Toolkit Locally

In today's data-driven world, images have become a ubiquitous source of information. From social media feeds to medical imaging, we encounter and generate images constantly. Extracting meaningful insights from this visual data requires sophisticated analysis techniques. In this blog post, let's build an image analysis application using the cutting-edge Phi-3 Vision model, completely free of cost and in an on-premise environment, using the VS Code AI Toolkit. We'll explore the exciting possibilities that this powerful combination offers.

The AI Toolkit for Visual Studio Code (VS Code) is a VS Code extension that simplifies generative AI app development by bringing together cutting-edge AI development tools and models. I recommend going through the following blogs to get started with the VS Code AI Toolkit:

1. Visual Studio Code AI Toolkit: How to Run LLMs locally
2. Visual Studio AI Toolkit: Building Phi-3 GenAI Applications
3. Building Retrieval Augmented Generation on VSCode & AI Toolkit
4. Bring your own models on AI Toolkit - using Ollama and API keys

Setup VS Code AI Toolkit

Launch the VS Code application and click on the VS Code AI Toolkit extension. Log in to your GitHub account if you haven't already. Once ready, click on Model Catalog. The model catalog lists many models, broadly classified into two categories:

- Local run (with CPU and with GPU)
- Remote access (hosted by GitHub and other providers)

For this blog, we will use a local-run model, which utilizes the local machine's hardware to run the language model. Since the task involves analyzing images, we will use a language model that supports vision operations; Phi-3 Vision is a good fit as it is lightweight and supports local runs. Download the model, then load it in the playground to test.

Once downloaded, launch the "Playground" tab and load the Phi-3 Vision model from the dropdown. The Playground also shows that Phi-3 Vision allows image attachments. We can try it out before we start developing the application. Upload an image using the paperclip icon in the UI. I uploaded an image of the Microsoft logo and prompted the language model to analyze and explain the image. Phi-3 Vision, running entirely on local hardware, identifies the company logo and reads the name with impressive precision. This is a simple use case, but it can be built upon with various applications to unlock a world of new possibilities.

Port Forwarding

Port forwarding, a valuable feature within the AI Toolkit, serves as a crucial gateway for seamless communication with the GenAI model. To set it up, launch the terminal and navigate to the "Ports" section. Click the "Forward a Port" button and select any desired port; in this blog we will use port 5272. The model-as-a-server is now ready: the model is available on port 5272 to respond to API calls. It can be tested with any API testing application (a quick check is sketched below). To know more, click here.
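For example, a short request against the forwarded port confirms the server is answering before any application code is written. This is a sketch only: the model identifier is the one used later in this post, and the exact name for your setup can be read from the AI Toolkit playground.

```python
import requests

# Quick smoke test of the AI Toolkit model server on the forwarded port.
response = requests.post(
    "http://127.0.0.1:5272/v1/chat/completions",
    json={
        "model": "Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx",
        "messages": [{"role": "user", "content": "Describe what you can do in one sentence."}],
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```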
Creating the Application with Python using the OpenAI SDK

To follow this section, Python must be installed on the local machine. Launch a new VS Code window and set the working directory. Create a new Python virtual environment. Once the setup is ready, open the terminal in VS Code and install the libraries using pip:

```bash
pip install openai
pip install streamlit
```

Before we build the Streamlit application, let's develop a basic program, check the responses in the VS Code terminal, and then build a simple web app using the Streamlit framework.

Basic Program

Import libraries:

```python
import base64
from openai import OpenAI
```

- base64: The base64 module provides functions for encoding binary data to base64-encoded strings and decoding base64-encoded strings back to binary data. Base64 encoding is commonly used for encoding binary data in text-based formats such as JSON or XML.
- OpenAI: The OpenAI package is a Python client library for interacting with OpenAI-compatible APIs. The OpenAI class provides methods for generating text, performing natural language processing tasks, and more.

Initialize the client:

```python
client = OpenAI(
    base_url="http://127.0.0.1:5272/v1/",
    api_key="xyz"  # required by API but not used
)
```

- OpenAI(): Initializes the client with a base URL and an API key. This instance will be used to interact with the model server and generate responses to user queries.
- base_url="http://127.0.0.1:5272/v1/": Specifies the base URL for the API. In this case, it points to the local server running on 127.0.0.1 (localhost) at port 5272.
- api_key="xyz": The API key used to authenticate requests. When using the AI Toolkit, we don't have to specify a real key; any placeholder value works.

The image analysis application will frequently deal with images uploaded by users. To send these images to the GenAI model, we need them in a format it understands. This is where the encode_image function comes in.

```python
# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
```

- Function definition: def encode_image(image_path): defines a function named encode_image that takes a single argument, image_path, the file path of the image we want to encode.
- Opening the image: with open(image_path, "rb") as image_file: opens the image file in binary reading mode ("rb"). This is crucial because we're dealing with raw image data, not text.
- Reading image content: image_file.read() reads the entire content of the image file into a byte stream, since images are stored as collections of bytes representing pixel values.
- Base64 encoding: base64.b64encode(...) encodes the byte stream into base64 format, a way to represent binary data using printable characters, which makes it easier to transmit or store.
- Decoding to UTF-8: .decode("utf-8") decodes the base64-encoded bytes into a UTF-8 string, since the API expects text input.
- Returning the encoded image: the function returns the base64-encoded string representation of the image, which is what we'll send to the AI model for analysis.

In essence, the encode_image function acts as a bridge, transforming an image file on your computer into a format that the AI model can understand and process.
Path for the image

We will use an image stored on the local machine for this section; when we develop the web app, we will change this to accept whatever the user uploads.

```python
image_path = "C:/img.jpg"  # path of the image here
```

This line provides the information the program needs to locate and access the image data.

Getting the base64 string:

```python
# Getting the base64 string
base64_image = encode_image(image_path)
```

This line calls the encode_image function discussed earlier, which reads the image file from the specified path, converts the image data into a base64-encoded string, and returns it. The result is assigned to the variable base64_image, which will be used to send the image data to the AI model for analysis.

Invoking the language model

This code tells the AI model what to do with the image:

```python
response = client.chat.completions.create(
    model="Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in the Image?",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
)
```

- client.chat.completions.create(...): sends the request to the model through the chat completions API, which is designed for conversation-like interactions.
- model="Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx": tells the client exactly which model to use for the analysis; in our case, the Phi-3 Vision model.
- messages: defines the information we're providing. The single user message has two content parts: the text prompt "What's in the Image?" and the actual image data, encoded in base64 (stored in base64_image) and passed as an image_url.

In a nutshell, this snippet specifies the model to use, tells it we have a question about an image, and provides the image data itself.

Printing the response to the console:

```python
print(response.choices[0].message.content)
```

We asked the AI model "What's in this image?"; this line displays the AI's answer, and we can finally see the response in the terminal. Now, to make things more interesting, let's convert this into a web app using the Streamlit framework.

Recipe Generator Application with Streamlit

Now we know how to interact with the Vision model offline using a basic console program.
Let's make things even more exciting by applying all of this to a use case that cooking enthusiasts will love: an application that assists in cooking by looking at an image of the ingredients!

Create a new file and name it "app.py", selecting the same venv that was used earlier. Make sure the AI Toolkit is running and serving the Phi-3 Vision model through port 5272.

The first step is importing the libraries:

```python
import streamlit as st
import base64
from openai import OpenAI
```

base64 and OpenAI are the same as in the earlier section. Streamlit provides a powerful set of tools for creating user interfaces (UIs) with Python; it simplifies the process of building web apps by allowing you to write Python scripts that directly translate into interactive web pages.

```python
client = OpenAI(
    base_url="http://127.0.0.1:5272/v1/",
    api_key="xyz"  # required by API but not used
)
```

As discussed in the earlier section, this initializes the client and configures the base_url and api_key.

```python
st.title('Recipe Generator 🍔')
st.write('This is a simple recipe generator application. Upload images of the ingredients and get the recipe by Chef GenAI! 🧑‍🍳')

uploaded_file = st.file_uploader("Choose a file")
if uploaded_file is not None:
    st.image(uploaded_file, width=300)
```

- st.title('Recipe Generator 🍔'): sets the title of the Streamlit application as "Recipe Generator" with a visually appealing burger emoji.
- st.write(...): displays a brief description of the application's functionality to the user.
- uploaded_file = st.file_uploader("Choose a file"): creates a file uploader component within the Streamlit app. Users can select and upload an image file (likely an image of ingredients).
- if uploaded_file is not None: this conditional block executes only when the user has actually selected and uploaded a file.
- st.image(uploaded_file, width=300): if an image is uploaded, this line displays it within the Streamlit app at a width of 300 pixels.

In essence, this code establishes the basic user interface for the Recipe Generator app: it allows users to upload an image and, if an image is uploaded, displays it within the app.

```python
preference = st.sidebar.selectbox(
    "Choose your preference",
    ("Vegetarian", "Non-Vegetarian")
)

cuisine = st.sidebar.selectbox(
    "Select for Cuisine",
    ("Indian", "Chinese", "French", "Thai", "Italian", "Mexican", "Japanese", "American", "Greek", "Spanish")
)
```

We use Streamlit's sidebar and selectbox features to create interactive user input options within the web application:

- preference = st.sidebar.selectbox(...): creates a dropdown menu (selectbox) within the sidebar of the Streamlit application. The first argument, "Choose your preference", sets the label for the dropdown; the second argument, ("Vegetarian", "Non-Vegetarian"), defines the options available for the user to select (in this case, dietary preferences).
- cuisine = st.sidebar.selectbox(...): creates another dropdown in the sidebar, this time for selecting the desired cuisine, with options including "Indian", "Chinese", "French", and several other popular cuisines.

In essence, this code allows users to select their dietary preference (Vegetarian or Non-Vegetarian) and desired cuisine from the dropdown menus in the sidebar.
```python
def encode_image(uploaded_file):
    """Encodes a Streamlit uploaded file into base64 format"""
    if uploaded_file is not None:
        content = uploaded_file.read()
        return base64.b64encode(content).decode("utf-8")
    else:
        return None

base64_image = encode_image(uploaded_file)
```

This is the same encode_image idea from the earlier section, adapted to read from the Streamlit uploaded file instead of a file path.

```python
if st.button("Ask Chef GenAI!"):
    if base64_image:
        response = client.chat.completions.create(
            model="Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx",
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": f"STRICTLY use the ingredients in the image to generate a {preference} recipe and {cuisine} cuisine.",
                        },
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                        },
                    ],
                }
            ],
        )
        print(response.choices[0].message.content)
        st.write(response.choices[0].message.content)
    else:
        st.write("Please upload an image with any number of ingredients and instantly get a recipe.")
```

The block above implements the core functionality of the Recipe Generator app, triggered when the user clicks the "Ask Chef GenAI!" button:

- if st.button("Ask Chef GenAI!"): checks whether the user has clicked the button. If they have, the code within the if block executes.
- if base64_image: checks whether base64_image holds a value, meaning an image has been uploaded and encoded. If so, the code proceeds.
- client.chat.completions.create(...): the client defined earlier calls the chat completions API, invoking the small language model. The arguments specify the model ("Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx") and the message to complete. The user message consists of two parts:
  - Text instruction: it tells the model to "STRICTLY use the ingredients in the image" and to generate a recipe matching the user's dietary preference (vegetarian or non-vegetarian, set via the preference dropdown) and the desired cuisine (selected via the cuisine dropdown).
  - Image data: the type ("image_url") and a data URL constructed from the base64_image variable containing the base64-encoded image.
- print(...) and st.write(...): the response contains a list of possible completions; the code retrieves the first one (response.choices[0]), extracts its message content, prints it to the console as before, and displays it in the Streamlit app with st.write.
- else block: if no image is uploaded (base64_image is empty), a message reminds the user to upload an image to get recipe recommendations.

This is the same flow as before, now modified to accept a few user inputs and made compatible with Streamlit. The coding for our Streamlit application is complete; it's time to test it.
Navigate to the terminal in Visual Studio Code and enter the following command (if the file is named app.py):

```bash
streamlit run app.py
```

Upon a successful run, your default browser will open with the Recipe Generator screen. Upload an image with ingredients, select the preference and cuisine, and click "Ask Chef GenAI!". It will take a few moments to generate the recipe; while it is generating you can watch the logs in the terminal, and finally the recipe will be shown on the screen. Enjoy your first recipe curated by Chef GenAI, powered by the Phi-3 Vision model running locally with the Visual Studio AI Toolkit! The code is available in the following GitHub repository. In the upcoming series we will explore more types of GenAI implementations with the AI Toolkit.

Resources

1. Visual Studio Code AI Toolkit: Run LLMs locally
2. Visual Studio AI Toolkit: Building Phi-3 GenAI Applications
3. Building Retrieval Augmented Generation on VSCode & AI Toolkit
4. Bring your own models on AI Toolkit - using Ollama and API keys
5. Expanded model catalog for AI Toolkit
6. Azure Toolkit Samples GitHub Repository
Fine-Tuning and Deploying Phi-3.5 Model with Azure and AI Toolkit

What is Phi-3.5?

Phi-3.5 is a state-of-the-art language model with strong multilingual capabilities. It is designed to handle multiple languages with high proficiency, making it a versatile tool for Natural Language Processing (NLP) tasks across different linguistic backgrounds.

Key Features of Phi-3.5

- Multilingual capabilities: The model supports a wide variety of languages, including major world languages such as English, Spanish, Chinese, and French. For example, it can translate a sentence or document from one language to another without losing context or meaning.
- Fine-tuning ability: The model can be fine-tuned for specific use cases. For instance, in a customer support setting, Phi-3.5 can be fine-tuned to understand the nuances of the different languages used by customers across the globe, improving response accuracy.
- High performance in NLP tasks: Phi-3.5 is optimized for tasks like text classification, machine translation, summarization, and more. It handles large-scale datasets well and produces coherent, contextually correct language outputs.

Applications in Real-World Scenarios

A few real-world applications where the Phi-3.5 model can be used:

- Customer support chatbots: For companies with global customer bases, the model's multilingual support can enhance chatbot capabilities, allowing real-time responses in a customer's native language, no matter where they are located.
- Content creation for global markets: Businesses can use Phi-3.5 to automatically generate or translate content for different regions. For example, marketing copy can be adapted to fit cultural and linguistic nuances in multiple languages.
- Document summarization across languages: The model can summarize long documents or articles written in one language and then translate the summary into another language, improving access to information for non-native speakers.

Why Choose Phi-3.5 for Your Project?

- Versatility: It is not limited to just one or two languages but performs well across many.
- Customization: The ability to fine-tune it for particular use cases or industries makes it highly adaptable.
- Ease of deployment: With tools like Azure ML and Ollama, deploying Phi-3.5 in the cloud or locally is accessible even for smaller teams.

Objective of This Blog

Specialized Language Models (SLMs) are at the forefront of advancements in Natural Language Processing, offering fine-tuned, high-performance solutions for specific tasks and languages. Among these, the Phi-3.5 model has emerged as a powerful tool, excelling in its multilingual capabilities. Whether you're working with English, Spanish, Mandarin, or any other major world language, Phi-3.5 offers robust, reliable language processing that adapts to various real-world applications. This makes it an ideal choice for businesses looking to deploy multilingual chatbots, automate content generation, or translate customer interactions in real time. Moreover, its fine-tuning ability allows for customization, making Phi-3.5 versatile across industries and tasks.

Customization and Fine-Tuning for Different Applications

The Phi-3.5 model is not limited to general language understanding tasks.
It can be fine-tuned for specific applications, industries, and languages, allowing users to tailor its performance to meet their needs.

- Customizable for industry-specific use cases: With fine-tuning, the model can be trained further on domain-specific data to handle particular use cases like legal document translation, medical records analysis, or technical support. Example: a healthcare company can fine-tune Phi-3.5 to understand medical terminology in multiple languages, enabling it to assist in processing patient records or generating multilingual health reports.
- Adapting for specialized tasks: You can train Phi-3.5 to perform specialized tasks like sentiment analysis, text summarization, or named entity recognition in specific languages. Fine-tuning enhances the model's ability to handle unique text formats or requirements. Example: a marketing team can fine-tune the model to analyze customer feedback in different languages and identify trends or sentiment across regions, quickly classifying feedback as positive, negative, or neutral, even in less widely spoken languages like Arabic or Korean.

Applications in Real-World Scenarios

To illustrate the versatility of Phi-3.5, here are some real-world applications where this model excels, demonstrating its multilingual capabilities and customization potential.

Case Study 1: Multilingual Customer Support Chatbots

Many global companies rely on chatbots to handle customer queries in real time. With Phi-3.5's multilingual abilities, businesses can deploy a single model that understands and responds in multiple languages, cutting down on the need to create language-specific chatbots. Example: a global airline can use Phi-3.5 to power its customer service bot. Passengers from different countries can inquire about their flight status or baggage policies in their native languages, whether Japanese, Hindi, or Portuguese, and the model responds accurately in the appropriate language.

Case Study 2: Multilingual Content Generation

Phi-3.5 is also useful for businesses that need to generate content in different languages. Marketing campaigns, for example, often require region-specific ads or blog posts in multiple languages. Phi-3.5 can help automate this process by generating localized content that is not just translated but adapted to fit the cultural context of the target audience. Example: an international cosmetics brand can use Phi-3.5 to automatically generate product descriptions for different regions. Instead of merely translating a product description from English to Spanish, the model can tailor the description to fit cultural expectations, using language that resonates with Spanish-speaking audiences.

Case Study 3: Document Translation and Summarization

Phi-3.5 can be used to translate or summarize complex documents across languages. Its ability to preserve meaning and context makes it ideal for industries where accuracy is crucial, such as legal or academic fields. Example: a legal firm working on cross-border cases can use Phi-3.5 to translate contracts or legal briefs from German to English, ensuring the context and legal terminology are accurately preserved. It can also summarize lengthy documents in multiple languages, saving time for legal teams.

Fine-Tuning the Phi-3.5 Model

Fine-tuning a language model like Phi-3.5 is a crucial step in adapting it to perform specific tasks or cater to specific domains.
This section walks through what fine-tuning is, why it matters in NLP, and how to fine-tune the Phi-3.5 model using the Azure model catalog for different languages and tasks. We'll also explore a code example and best practices for evaluating and validating the fine-tuned model.

What is Fine-Tuning?

Fine-tuning is the process of taking a pre-trained model and adapting it to a specific task or dataset by training it further on domain-specific data. In the context of NLP, fine-tuning is often required to ensure that the language model understands the nuances of a particular language, industry-specific terminology, or a specific use case.

Why Fine-Tuning is Necessary

Pre-trained large language models (LLMs) are trained on diverse datasets and can handle various tasks like text summarization, generation, and question answering. However, they may not perform optimally in specialized domains without fine-tuning. The goal of fine-tuning is to enhance the model's performance on specific tasks by leveraging its prior knowledge while adapting it to new contexts.

Challenges of Fine-Tuning

- Resource intensiveness: Fine-tuning large models can be computationally expensive, requiring significant hardware resources.
- Storage costs: Each fine-tuned model can be large, leading to increased storage needs when deploying multiple models for different tasks.

LoRA and QLoRA

To address these challenges, techniques like LoRA (Low-Rank Adaptation) and QLoRA (Quantized Low-Rank Adaptation) have emerged. Both methods aim to make the fine-tuning process more efficient:

- LoRA: This technique reduces the number of trainable parameters by introducing low-rank matrices into the model while keeping the original model weights frozen. This approach minimizes memory usage and speeds up the fine-tuning process.
- QLoRA: An enhancement of LoRA, QLoRA incorporates quantization techniques to further reduce memory requirements and increase the efficiency of the fine-tuning process. It allows large models to be fine-tuned on consumer hardware without the extensive resource demands typically associated with full fine-tuning.

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
from peft import get_peft_model, LoraConfig

# Load a pre-trained model
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Configure LoRA
lora_config = LoraConfig(
    r=16,  # Rank
    lora_alpha=32,
    lora_dropout=0.1,
)

# Wrap the model with LoRA
model = get_peft_model(model, lora_config)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
)

# Create a Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

# Start fine-tuning
trainer.train()
```

This code outlines how to set up a model for fine-tuning using LoRA, which can significantly reduce resource requirements while still adapting the model effectively to specific tasks. In summary, fine-tuning with methods like LoRA and QLoRA is essential for optimizing pre-trained models for specific NLP applications, making it feasible to deploy these powerful models in various domains efficiently.
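For completeness, here is a hedged sketch of the QLoRA variant described above: the base model is loaded in 4-bit with bitsandbytes before LoRA adapters are attached. The model ID, target modules, and hyperparameters are illustrative assumptions and depend on the architecture being tuned.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization configuration (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Illustrative checkpoint; swap in the model you are actually tuning.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters are trained on top of the frozen, quantized weights.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["qkv_proj", "o_proj"],  # assumption; depends on the architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```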
Why is Fine-Tuning Important in NLP?

- Task-specific performance: Fine-tuning improves performance on tasks like text classification, machine translation, or sentiment analysis in specific domains (e.g., legal, healthcare).
- Language-specific adaptation: Since models like Phi-3.5 are trained on general datasets, fine-tuning helps them handle industry-specific jargon or linguistic quirks.
- Efficient resource utilization: Instead of training a model from scratch, fine-tuning leverages pre-trained knowledge, saving computational resources and time.

Steps to Fine-Tune Phi-3.5 in Azure AI Foundry

Fine-tuning the Phi-3.5 model in Azure AI Foundry involves several key steps. Azure provides a user-friendly interface to streamline model customization, allowing you to quickly configure, train, and deploy models.

Step 1: Set Up the Environment in Azure AI Foundry

1. Access Azure AI Foundry: Log in to Azure AI Foundry. If you don't have an account, create one and set up a workspace.
2. Create a new experiment: Once in Azure AI Foundry, create a new training experiment. Choose the Phi-3.5 model from the pre-trained models provided in the Azure model catalog.
3. Set up the data for fine-tuning: Upload your custom dataset, ensuring it is in a compatible format (e.g., CSV, JSON). For instance, if you are fine-tuning the model for a customer service chatbot, you could upload customer queries in different languages.

Step 2: Configure Fine-Tuning Settings

1. Select the training dataset: Select the dataset you uploaded and link it to the Phi-3.5 model.
2. Configure the hyperparameters: Set training hyperparameters such as the number of epochs, learning rate, and batch size. You may need to experiment with these settings to achieve optimal performance.
3. Choose the task type: Specify the task you are fine-tuning for, such as text classification, translation, or summarization. This helps Azure AI Foundry understand how to optimize the model during fine-tuning.
4. Fine-tune for specific languages: If you are fine-tuning for a specific language or multilingual tasks, ensure that the dataset is labeled appropriately and contains enough examples in the target language(s). This allows Phi-3.5 to learn language-specific features effectively.

Step 3: Train the Model

1. Launch the training process: Once the configuration is complete, launch the training process in Azure AI Foundry. Depending on the size of your dataset and the complexity of the model, this could take some time.
2. Monitor training progress: Use Azure AI Foundry's built-in monitoring tools to track metrics such as loss, accuracy, and F1 score, and verify that the model is learning effectively.

Code Example: Fine-Tuning Phi-3.5 for a Specific Use Case

Here's a code snippet for fine-tuning the Phi-3.5 model using Python and the Azure AI Foundry SDK. In this example, we fine-tune the model for a customer support chatbot in multiple languages.

```python
from azure.ai import Foundry
from azure.ai.model import Model

# Initialize Azure AI Foundry
foundry = Foundry()

# Load the Phi-3.5 model
model = Model.load("phi-3.5")

# Set up the training dataset
training_data = foundry.load_dataset("customer_queries_dataset")

# Fine-tune the model
model.fine_tune(training_data, epochs=5, learning_rate=0.001)

# Save the fine-tuned model
model.save("fine_tuned_phi_3.5")
```
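The snippet above is illustrative of the workflow rather than a specific SDK. As a hedged alternative, the same fine-tuning work can be submitted as a command job with the Azure Machine Learning Python SDK (azure-ai-ml); the script name, dataset asset, environment, and compute target below are assumptions about your workspace, not fixed values.

```python
from azure.ai.ml import Input, MLClient, command
from azure.identity import DefaultAzureCredential

# Connect to the workspace backing your Azure AI Foundry project.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Assumes finetune.py implements the LoRA training loop shown earlier,
# a registered data asset exists, and a GPU cluster named gpu-cluster is available.
job = command(
    code="./src",
    command="python finetune.py --data ${{inputs.training_data}} --epochs 5",
    inputs={"training_data": Input(type="uri_file", path="azureml:customer_queries_dataset:1")},
    environment="azureml://registries/azureml/environments/acpt-pytorch-2.2-cuda12.1/labels/latest",  # assumed curated environment
    compute="gpu-cluster",
    display_name="phi-3.5-lora-finetune",
)

returned_job = ml_client.jobs.create_or_update(job)
print(returned_job.studio_url)
```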
Best Practices for Evaluating and Validating Fine-Tuned Models

Once the model is fine-tuned, it is essential to evaluate and validate its performance before deploying it to production.

- Split data for validation: Always split your dataset into training and validation sets. This ensures that the model is evaluated on unseen data to prevent overfitting.
- Evaluate key metrics: Measure performance using metrics such as accuracy (the proportion of correct predictions), F1 score (a measure combining precision and recall), and the confusion matrix (which helps visualize true vs. false predictions for classification tasks).
- Cross-language validation: If the model is fine-tuned for multiple languages, test its performance across all supported languages to ensure consistency and accuracy.
- Test in production-like environments: Before full deployment, test the fine-tuned model in a production-like environment to catch any potential issues.
- Continuous monitoring and re-fine-tuning: Once deployed, continuously monitor the model's performance and re-fine-tune it periodically as new data becomes available.
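A hedged sketch of how these metrics might be computed for a classification-style evaluation set, using scikit-learn; the label lists are placeholders standing in for your validation data and the fine-tuned model's predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Placeholder ground-truth labels and model predictions from the validation split.
y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "positive", "positive", "negative"]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1 (macro):", f1_score(y_true, y_pred, average="macro"))
print("Confusion matrix:\n",
      confusion_matrix(y_true, y_pred, labels=["positive", "negative", "neutral"]))
```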
Deploying the Phi-3.5 Model

After fine-tuning the Phi-3.5 model, the next crucial step is deploying it to make it accessible for real-world applications. This section covers two deployment strategies: deploying in Azure for cloud-based scaling and reliability, and deploying locally with the AI Toolkit for simpler offline usage. Each strategy offers its own advantages depending on the use case.

Deploying in Azure

Azure provides a powerful environment for deploying machine learning models at scale, enabling organizations to deploy models like Phi-3.5 with high availability, scalability, and robust security features. Azure AI Foundry simplifies the entire deployment pipeline.

1. Set up the Azure AI Foundry workspace: Log in to Azure AI Foundry and navigate to the workspace where the Phi-3.5 model was fine-tuned. Go to the Deployments section and create a new deployment environment for the model.
2. Choose compute resources: Select a compute target suitable for your deployment. For large-scale usage, a GPU-based compute instance is advisable; for example, an Azure Kubernetes Service (AKS) cluster handles large-scale requests efficiently. Azure also lets you set up auto-scaling based on traffic, which ensures the model can handle surges in demand without affecting performance.
3. Configure the model deployment: Create an inference pipeline in Azure AI Foundry, link the fine-tuned Phi-3.5 model to it, and deploy the model to the chosen compute resource.
4. Test the deployment: Once the model is deployed, test the endpoint by sending sample requests and verifying the predictions.

Configuration Steps (Compute, Resources, Scaling)

During deployment, Azure AI Foundry allows you to configure essential aspects like compute type, resource allocation, and scaling options:

- Compute type: Choose between CPU and GPU clusters depending on the computational intensity of the model.
- Resource allocation: Define the minimum and maximum resources to be allocated for the deployment. For real-time applications, use Azure Kubernetes Service (AKS) for high availability; for batch inference, Azure Container Instances (ACI) is suitable.
- Auto-scaling: Set up automatic scaling of compute instances based on the number of requests. For example, configure the deployment to start with 1 node and scale to 10 nodes during peak usage.

Cost Comparison: Phi-3.5 vs. Larger Language Models

When comparing the costs of using Phi-3.5 with larger language models (LLMs), several factors come into play, including computational resources, pricing structures, and performance efficiency. Here is a breakdown:

Cost efficiency

- Phi-3.5: Designed as a Small Language Model (SLM), Phi-3.5 is optimized for lower computational costs. It offers competitive performance at a fraction of the cost of larger models, making it suitable for budget-conscious projects. Its smaller size (3.8 billion parameters) reduces resource consumption during both training and inference.
- Larger language models (e.g., GPT-3.5): These typically require more computational resources, leading to higher operational costs, and may incur additional costs for storage and processing power, especially in cloud environments.

Performance vs. cost

- Performance parity: Phi-3.5 has been shown to achieve performance parity with larger models on various benchmarks, including language comprehension and reasoning tasks. For many applications, Phi-3.5 can deliver similar results to larger models without the associated costs.
- Use case suitability: For simpler tasks or applications that do not require extensive factual knowledge, Phi-3.5 is often the more cost-effective choice. Larger models may still be preferred for complex tasks requiring deep contextual understanding or extensive factual recall.

Pricing structure

Phi-3.5 is available through Azure with a pay-as-you-go billing model, allowing users to scale costs based on usage. Pricing details can be found on the Azure pricing page, where users can customize options based on their needs.

Code Example: API Setup and Endpoint for Live Interaction

Below is a Python code snippet demonstrating how to interact with a deployed Phi-3.5 model via an API in Azure:

```python
import requests

# Define the API endpoint and your API key
api_url = "https://<your-azure-endpoint>/predict"
api_key = "YOUR_API_KEY"

# Prepare the input data
input_data = {
    "text": "What are the benefits of renewable energy?"
}

# Make the API request
response = requests.post(api_url, json=input_data, headers={"Authorization": f"Bearer {api_key}"})

# Print the model's response
if response.status_code == 200:
    print("Model Response:", response.json())
else:
    print("Error:", response.status_code, response.text)
```
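As an alternative to raw requests, deployments that expose the Azure AI model inference API (such as serverless API deployments of Phi-3.5 in Azure AI Foundry) can also be called through the azure-ai-inference package. This is a hedged sketch under the assumption that the endpoint URL and key come from the deployment's details page; your deployment type may differ.

```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-azure-endpoint>",  # from the deployment's details page
    credential=AzureKeyCredential("YOUR_API_KEY"),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="What are the benefits of renewable energy?"),
    ],
)
print(response.choices[0].message.content)
```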
Deploying Locally with the AI Toolkit

For developers who prefer to run models on their local machines, the AI Toolkit provides a convenient solution. The AI Toolkit is a lightweight platform that simplifies local deployment of AI models, allowing for offline usage, experimentation, and rapid prototyping. Deploying the Phi-3.5 model locally using the AI Toolkit is straightforward and suits personal projects, testing, or scenarios where cloud access is limited.

Advantages of deploying locally with the AI Toolkit:

- Offline capability: No need for continuous internet access.
- Quick experimentation: Rapid prototyping and testing without the delays of cloud deployments.

Setup Guide: Installing and Running Phi-3.5 Locally Using the AI Toolkit

1. Install the AI Toolkit: Go to the AI Toolkit website, download the platform for your operating system (Linux, macOS, or Windows), and run the appropriate installation command in your terminal.
2. Download the Phi-3.5 model: Once the AI Toolkit is installed, download the Phi-3.5 model locally.
3. Run the model locally: After downloading the model, start a local session. This launches a local server on your machine where the model is available for interaction.

Code Example: Using Phi-3.5 Locally in a Project

Below is a Python code example demonstrating how to send a query to the locally deployed Phi-3.5 model running on the AI Toolkit:

```python
import requests

# Define the local endpoint
local_url = "http://localhost:8000/predict"

# Prepare the input data
input_data = {
    "text": "What are the benefits of renewable energy?"
}

# Make the API request
response = requests.post(local_url, json=input_data)

# Print the model's response
if response.status_code == 200:
    print("Model Response:", response.json())
else:
    print("Error:", response.status_code, response.text)
```

Comparing Language Capabilities

Test results: how Phi-3.5 handles different languages

The Phi-3.5 model demonstrates robust multilingual capabilities, effectively processing and generating text in various languages. Below are comparative examples showcasing its performance in English, Spanish, and Mandarin:

- English input: "What are the benefits of renewable energy?" Output: "Renewable energy sources, such as solar and wind, reduce greenhouse gas emissions and promote sustainability."
- Spanish input: "¿Cuáles son los beneficios de la energía renovable?" Output: "Las fuentes de energía renovable, como la solar y la eólica, reducen las emisiones de gases de efecto invernadero y promueven la sostenibilidad."
- Mandarin input: "可再生能源的好处是什么？" Output: "可再生能源，如太阳能和风能，减少温室气体排放，促进可持续发展。"

Performance Benchmarking and Evaluation Across Different Languages

Benchmarking Phi-3.5 across different languages involves evaluating its accuracy, fluency, and contextual understanding. For instance, using BLEU scores and human evaluations, the model can be assessed on its translation quality and coherence in various languages; a BLEU-style check is sketched below.
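A hedged sketch of such a check with the sacrebleu package; the hypothesis and reference sentences are placeholders standing in for model translations and human references.

```python
import sacrebleu

# Placeholder model outputs (hypotheses) and human references for one language pair.
hypotheses = ["Renewable energy sources reduce greenhouse gas emissions."]
references = [["Renewable energy sources, such as solar and wind, reduce greenhouse gas emissions."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")
```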
Real-World Use Case: Multilingual Customer Service Chatbot

A practical application of Phi-3.5's multilingual capabilities is a customer service chatbot that interacts with users in their preferred language. For instance, the chatbot could provide support in English, Spanish, and Mandarin, ensuring a wider reach and a better user experience.

Optimizing and Validating the Phi-3.5 Model

Model performance metrics. To validate the model's performance in different scenarios, consider the following:

- Accuracy: how often the model's outputs are correct or align with expected results.
- Fluency: the naturalness and readability of the generated text.
- Contextual understanding: how well the model understands and responds to context-specific queries.

Tools for evaluation in Azure and Ollama:

- Azure Cognitive Services: Utilize tools like Text Analytics and Translator to evaluate performance.
- Ollama: Use local testing environments to quickly iterate and validate model outputs.

Conclusion

In summary, Phi-3.5 exhibits impressive multilingual capabilities, flexible deployment options, and robust performance. Its ability to handle various languages makes it a versatile tool for natural language processing applications. Phi-3.5 stands out for its adaptability and performance in multilingual contexts, making it an excellent choice for future NLP projects, especially those requiring diverse language support.

We encourage readers to experiment with the Phi-3.5 model using Azure AI Foundry or the AI Toolkit, explore fine-tuning techniques for their specific use cases, and share their findings with the community. For more information on optimized fine-tuning techniques, check out the Ignite Fine-Tuning Workshop.

References

- Customize the Phi-3.5 family of models with LoRA fine-tuning in Azure
- Fine-tune Phi-3.5 models in Azure
- Fine Tuning with Azure AI Foundry and Microsoft Olive Hands on Labs and Workshop
- Customize a model with fine-tuning: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/fine-tuning?tabs=azure-openai%2Cturbo%2Cpython-new&pivots=programming-language-studio
- Microsoft AI Toolkit - AI Toolkit for VSCode

Harnessing Local AI: Unleashing the Power of .NET Smart Components and Llama2
Let's explore the integration of AI into .NET applications, highlighting .NET Smart Components with local LLMs like Llama2 for enhanced productivity, and encouraging the adoption of these transformative AI features.

Extending Semantic Kernel using OllamaSharp for Chat and Text Completion
This .NET binding for the Ollama API makes it straightforward for developers to integrate chat and text completion features into their applications with the power of Semantic Kernel and OllamaSharp.

SemanticKernel – 📎Chat Service demo running Llama2 LLM locally in Ubuntu
Learn how to run a Llama 2 model locally with Ollama, an open-source language model platform. Interact with the model using .NET and Semantic Kernel through a chat service and a console app, and experiment with large language models without external tools or services.