Tag:"slm" | Microsoft Community Hub

Microsoft Semantic Kernel and AutoGen: Open Source Frameworks for AI Solutions
Explore Microsoft’s open-source frameworks, Semantic Kernel and AutoGen. Semantic Kernel enables developers to create AI solutions across various domains using a single Large Language Model (LLM). AutoGen, on the other hand, uses AI Agents to perform smart tasks through agent dialogues. Discover how these technologies serve different scenarios and can be used to build powerful AI applications.
Lee_Stott
Feb 08, 2024 Place Educator Developer Blog
44KViews
6likes
1Comment
Building AI Agents on edge devices using Ollama + Phi-4-mini Function Calling
The new Phi-4-mini and Phi-4-multimodal now support Function Calling. This feature enables the models to connect with external tools and APIs. By deploying Phi-4-mini and Phi-4-multimodal with Function Calling on edge devices, we can extend the models' knowledge locally and make them more effective at executing tasks. This blog focuses on how to use Phi-4-mini's Function Calling capabilities to build efficient AI Agents on edge devices.

What's Function Calling

How it works
First we need to learn how Function Calling works:
Tool Integration: Function Calling allows an LLM/SLM to interact with external tools and APIs, such as weather APIs, databases, or other services.
Function Definition: The application defines a function (tool) that the LLM/SLM can call, specifying its name, parameters, and expected output.
LLM Detection: The LLM/SLM analyzes the user's input and determines whether a function call is required and which function to use.
JSON Output: The LLM/SLM outputs a JSON object containing the name of the function to call and the parameters required by the function.
External Execution: The application executes the function call using the parameters provided by the LLM/SLM.
Response to LLM: The application returns the output of the function call to the LLM/SLM, which can use this information to generate a response to the user.
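The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: `get_weather`, the `REGISTRY` dictionary, and the simulated model output are hypothetical stand-ins, and no model is actually invoked.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical tool; a real application would call a weather API here."""
    return {"city": city, "forecast": "sunny", "temperature_c": 21}


# Function definition shown to the model: name, description, parameters.
TOOLS = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {"city": {"description": "City name", "type": "str"}},
    }
]

# Registry that maps tool names to the Python callables that implement them.
REGISTRY = {"get_weather": get_weather}


def execute_function_call(model_output: str) -> dict:
    """Parse the model's JSON output, run the tool, and wrap the result."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in REGISTRY:
        raise ValueError(f"Unknown function: {name}")
    result = REGISTRY[name](**call.get("arguments", {}))
    # Message the application would send back to the model (role name assumed).
    return {"role": "tool", "content": json.dumps(result)}


# Simulated JSON output of the model for "What is the weather in London?"
simulated_output = '{"name": "get_weather", "arguments": {"city": "London"}}'
tool_message = execute_function_call(simulated_output)
print(tool_message)
```

Keeping the tool definitions and the registry separate means the model only ever sees the JSON description, while the application decides what code actually runs.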
Application scenarios
Data retrieval: convert natural language queries into API calls to fetch data (e.g., "show my recent orders" triggers a database query)
Operation execution: convert user requests into specific function calls (e.g., "schedule a meeting" becomes a calendar API call)
Computational tasks: handle mathematical or logical operations through dedicated functions (e.g., calculate compound interest or statistical analysis)
Data processing: chain multiple function calls together (e.g., get data → parse → transform → store)
UI/UX integration: trigger interface updates based on user interactions (e.g., update map markers or display charts)

Phi-4-mini / Phi-4-multimodal's Function Calling
Phi-4-mini and Phi-4-multimodal support single and parallel Function Calling. Things to note when calling:
You need to define the tools in the system message to start single or parallel Function Calling.
If you want to start parallel Function Calling, you also need to add 'some tools' to the system prompt.
The following are examples.

Single Function Calling

import json

tools = [
    {
        "name": "get_match_result",
        "description": "get match result",
        "parameters": {
            "match": {
                "description": "The name of the match",
                "type": "str",
                "default": "Arsenal vs ManCity"
            }
        }
    },
]

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant",
        "tools": json.dumps(tools),  # pass the tools into system message using tools argument
    },
    {
        "role": "user",
        "content": "What is the result of Arsenal vs ManCity today?"
    }
]

Full sample: Click

Parallel Function Calling

import json

AGENT_TOOLS = {
    "booking_flight": {
        "name": "booking_flight",
        "description": "booking flight",
        "parameters": {
            "departure": {
                "description": "The name of Departure airport code",
                "type": "str",
            },
            "destination": {
                "description": "The name of Destination airport code",
                "type": "str",
            },
            "outbound_date": {
                "description": "The date of outbound flight",
                "type": "str",
            },
            "return_date": {
                "description": "The date of return flight",
                "type": "str",
            }
        }
    },
    "booking_hotel": {
        "name": "booking_hotel",
        "description": "booking hotel",
        "parameters": {
            "query": {
                "description": "The name of the city",
                "type": "str",
            },
            "check_in_date": {
                "description": "The date of check in",
                "type": "str",
            },
            "check_out_date": {
                "description": "The date of check out",
                "type": "str",
            }
        }
    },
}

SYSTEM_PROMPT = """
You are my travel agent with some tools available.
"""

messages = [
    {
        "role": "system",
        "content": SYSTEM_PROMPT,
        "tools": json.dumps(AGENT_TOOLS),  # pass the tools into system message using tools argument
    },
    {
        "role": "user",
        "content": """I have a business trip from London to New York in March 21 2025 to March 27 2025, can you help me to book a hotel and flight tickets"""
    }
]

Full sample: Click

Using Ollama and Phi-4-mini Function Calling to Create AI Agents on Edge Devices
Ollama is a popular free tool for deploying LLMs/SLMs locally and can be used in combination with AI Toolkit for VS Code. In addition to your PC or laptop, it can also be deployed on IoT devices, mobile phones, containers, and so on. To use Phi-4-mini on Ollama, you need Ollama 0.5.13 or later. Several quantized versions of the model are available on Ollama.
[Figure: quantized versions of Phi-4-mini available on Ollama]
Using Ollama, we can deploy Phi-4-mini on the edge and implement an AI Agent with Function Calling under limited computing power, so that Generative AI can be applied more effectively on the edge.
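As a rough sketch of how the tool definitions above could be sent to a local Ollama server, the snippet below builds a request body for Ollama's `/api/chat` endpoint. Several things here are assumptions rather than part of the article: the helper names `to_ollama_tool`, `build_chat_payload`, and `send`, the mapping from `"str"` to JSON Schema `"string"`, and the rule that a parameter without a default is required. The `send` helper is defined but not called, so the example runs without a server.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

# Assumed mapping from the article's Python-style type names to JSON Schema types.
TYPE_MAP = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}


def to_ollama_tool(tool: dict) -> dict:
    """Convert an article-style tool definition into a function-tool entry."""
    properties = {}
    required = []
    for name, spec in tool["parameters"].items():
        properties[name] = {
            "type": TYPE_MAP.get(spec["type"], spec["type"]),
            "description": spec.get("description", ""),
        }
        # Assumption: a parameter without a default value is required.
        if "default" not in spec:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def build_chat_payload(model: str, system_prompt: str, user_prompt: str, tools: list) -> dict:
    """Build the JSON body for a non-streaming chat request with tools."""
    return {
        "model": model,
        "stream": False,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "tools": [to_ollama_tool(t) for t in tools],
    }


def send(payload: dict) -> dict:
    """POST the payload to a running Ollama server (not called in this sketch)."""
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


booking_hotel = {
    "name": "booking_hotel",
    "description": "booking hotel",
    "parameters": {
        "query": {"description": "The name of the city", "type": "str"},
        "check_in_date": {"description": "The date of check in", "type": "str"},
        "check_out_date": {"description": "The date of check out", "type": "str"},
    },
}

payload = build_chat_payload(
    "phi4-mini:3.8b-fp16",
    "You are my travel agent with some tools available.",
    "Book a hotel in New York from March 21 2025 to March 27 2025",
    [booking_hotel],
)
print(json.dumps(payload, indent=2))
```

With a server running, `send(payload)` would return the chat response; any tool calls the model makes are typically found in the `tool_calls` field of the returned message.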
Current Issues
Unfortunately, if you call Ollama directly through its interface in the way shown above, you will find that Function Calling is not triggered. The problem is discussed in an Ollama GitHub issue: https://github.com/ollama/ollama/issues/9437. Modifying the Phi-4-mini template in the Modelfile made single Function Calling work, but Parallel Function Calling still failed.

Resolution
We have implemented a fix by adjusting the template. We improved it according to Phi-4-mini's chat template and modified the Modelfile again. Note that the choice of quantized model has a large impact on the results. The adjusted template is as follows:

TEMPLATE """
{{- if .Messages }}
{{- if or .System .Tools }}<|system|>
{{ if .System }}{{ .System }}
{{- end }}
In addition to plain text responses, you can chose to call one or more of the provided functions.
Use the following rule to decide when to call a function:
* if the response can be generated from your internal knowledge (e.g., as in the case of queries like "What is the capital of Poland?"), do so
* if you need external information that can be obtained by calling one or more of the provided functions, generate a function calls
If you decide to call functions:
* prefix function calls with functools marker (no closing marker required)
* all function calls should be generated in a single JSON list formatted as functools[{"name": [function name], "arguments": [function arguments as JSON]}, ...]
* follow the provided JSON schema. Do not hallucinate arguments or values. Do to blindly copy values from the provided samples
* respect the argument type formatting. E.g., if the type if number and format is float, write value 7 as 7.0
* make sure you pick the right functions that match the user intent
Available functions as JSON spec:
{{- if .Tools }}
{{ .Tools }}
{{- end }}<|end|>
{{- end }}
{{- range .Messages }}
{{- if ne .Role "system" }}<|{{ .Role }}|>
{{- if and .Content (eq .Role "tools") }}
{"result": {{ .Content }}}
{{- else if .Content }}
{{ .Content }}
{{- else if .ToolCalls }}
functools[
{{- range .ToolCalls }}{{ "{" }}"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}{{ "}" }}
{{- end }}]
{{- end }}<|end|>
{{- end }}
{{- end }}<|assistant|>
{{ else }}
{{- if .System }}<|system|>
{{ .System }}<|end|>{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>{{ end }}<|assistant|>
{{ end }}{{ .Response }}{{ if .Response }}<|user|>{{ end }}
"""

We have tested the solution using different quantized models. In a laptop environment, we recommend the following model for single and parallel Function Calling: phi4-mini:3.8b-fp16.
Note: you need to bind the adjusted Modelfile to phi4-mini:3.8b-fp16 for this to work. Run the following commands on the command line:

# If you haven't downloaded the model yet, run this command first
ollama run phi4-mini:3.8b-fp16
# Bind the model to the adjusted Modelfile
ollama create phi4-mini:3.8b-fp16 -f {Your Modelfile Path}

You can then test both single Function Calling and Parallel Function Calling with Phi-4-mini.
[Figures: Single Function Calling; Parallel Function Calling]
Full sample in notebook

The above example is just a simple introduction. As development moves forward, we hope to find simpler ways to apply it on the edge, use Function Calling to expand the scenarios for Phi-4-mini / Phi-4-multimodal, and develop more use cases in vertical industries.
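The template asks the model to prefix its calls with a `functools` marker followed by a single JSON list. As an illustration of what handling that raw text could look like in application code, here is a small sketch. It assumes you receive the unparsed model output as a string; the handler functions `booking_flight` and `booking_hotel` and the simulated output are hypothetical, and nothing here comes from Ollama itself.

```python
import json

MARKER = "functools"


def parse_functools(raw: str) -> list:
    """Extract the JSON list of calls that follows the functools marker."""
    index = raw.find(MARKER)
    if index == -1:
        return []  # plain text answer, no function calls requested
    payload = raw[index + len(MARKER):].lstrip()
    # raw_decode parses the first JSON value and ignores any trailing text.
    calls, _ = json.JSONDecoder().raw_decode(payload)
    if not isinstance(calls, list):
        raise ValueError("expected a JSON list after the functools marker")
    return calls


def booking_flight(departure, destination, outbound_date, return_date):
    """Hypothetical handler; a real agent would call a flight search API."""
    return f"flight {departure}->{destination} {outbound_date}/{return_date}"


def booking_hotel(query, check_in_date, check_out_date):
    """Hypothetical handler; a real agent would call a hotel search API."""
    return f"hotel in {query} {check_in_date}/{check_out_date}"


HANDLERS = {"booking_flight": booking_flight, "booking_hotel": booking_hotel}


def dispatch(calls: list) -> list:
    """Run every requested call in order and collect the results."""
    results = []
    for call in calls:
        arguments = call["arguments"]
        if isinstance(arguments, str):  # some outputs encode arguments as a JSON string
            arguments = json.loads(arguments)
        results.append(HANDLERS[call["name"]](**arguments))
    return results


# Simulated parallel Function Calling output for the travel example.
raw_output = (
    'functools[{"name": "booking_flight", "arguments": {"departure": "LHR", '
    '"destination": "JFK", "outbound_date": "2025-03-21", "return_date": "2025-03-27"}}, '
    '{"name": "booking_hotel", "arguments": {"query": "New York", '
    '"check_in_date": "2025-03-21", "check_out_date": "2025-03-27"}}]'
)
results = dispatch(parse_functools(raw_output))
print(results)
```

Because all calls arrive in one list, parallel Function Calling needs no extra protocol on the application side: the list simply contains more than one entry.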
Resources
Phi-4 model on Hugging Face: https://huggingface.co/collections/microsoft/phi-4-677e9380e514feb5577a40e4
Phi-4-mini on Ollama: https://ollama.com/library/phi4-mini
Learn Function Calling: https://huggingface.co/docs/hugs/en/guides/function-calling
Phi Cookbook - Samples and Resources for Phi Models: https://aka.ms/phicookbook
kinfey
Mar 11, 2025 Place Educator Developer Blog
593Views
4likes
1Comment
Getting Started with the AI Dev Gallery
The AI Dev Gallery is a new open-source project designed to inspire and support developers in integrating on-device AI functionality into their Windows apps. It offers an intuitive UX for exploring and testing interactive AI samples powered by local models. Key features include:
Quickly explore and download models from well-known sources on GitHub and HuggingFace.
Test different models with interactive samples covering more than 25 scenarios, including text, image, audio, and video use cases.
See all relevant code and library references for every sample.
Switch between models that run on CPU and GPU depending on your device capabilities.
Quickly get started with your own projects by exporting any sample to a fresh Visual Studio project that references the same model cache, preventing duplicate downloads.

Part of the motivation behind the Gallery was to expose developers to the benefits of on-device AI. These include improved data security and privacy, increased control and parameterization, and no dependence on an internet connection or third-party cloud provider.

Requirements
Device Requirements
Minimum OS Version: Windows 10, version 1809 (10.0; Build 17763)
Architecture: x64, ARM64
Memory: At least 16 GB is recommended
Disk Space: At least 20 GB of free space is recommended
GPU: 8 GB of VRAM is recommended for running samples on the GPU
Visual Studio 2022
You will need Visual Studio 2022 with the Windows Application Development workload.

Running the Gallery
To run the gallery:
Clone the repository: git clone https://github.com/microsoft/AI-Dev-Gallery.git
Run the solution: .\AIDevGallery.sln
Hit F5 to build and run the Gallery

Using the Gallery
The AI Dev Gallery can be navigated in two ways:
The Samples View
The Models View

Navigating Samples
In this view, samples are broken up into categories (Text, Code, Image, etc.) and then into more specific samples, such as Translate Text. When you click a sample that you haven't run before, you will be prompted to choose a model to download. Next to each model you can see its size, whether it will run on CPU or GPU, and the associated license. Pick the model that makes the most sense for your machine. You can also download new models and change the model for a sample later from the sample view: just click the model drop-down at the top of the sample. The last thing you can do from the Sample pane is view the sample code and export the project to Visual Studio. Both buttons are found in the top right corner of the sample.

Navigating Models
If you would rather navigate by models instead of samples, the Gallery also provides the model view. The model view contains a similar navigation menu on the right to navigate between models based on category. Clicking on a model lets you see a description of the model, the versions of it that are available to download, and the samples that use the model. Clicking on a sample takes you back to the samples view, where you can see the model in action.

Deleting and Managing Models
If you need to clear up space or see download details for the models you are using, you can head over to the Settings page to manage your downloads. From here, you can easily see every model you have downloaded and how much space each one takes up on your drive. You can clear your entire cache for a fresh start or delete individual models that you are no longer using. Any deleted model can be redownloaded through either the models or samples view.
Next Steps for the Gallery
The AI Dev Gallery is still a work in progress. We plan on adding more samples, models, APIs, and features, and we are evaluating support for NPUs to take the experience even further. If you have feedback, have noticed a bug, or have ideas for features or samples, head over to the issue board and submit an issue. We also have a discussion board for any other topics relevant to the Gallery. The Gallery is an open-source project, and we would love contributions, feedback, and ideas! Happy modeling!
zteutsch
Dec 10, 2024 Place Microsoft Developer Community Blog
3.7KViews
4likes
3Comments
AI Toolkit for Visual Studio Code: October 2024 Update Highlights
The AI Toolkit’s October 2024 update revolutionizes Visual Studio Code with game-changing features for developers, researchers, and enthusiasts. Explore multi-model integration, including GitHub Models, ONNX, and Google Gemini, alongside custom model support. Dive into multi-modal capabilities for richer AI testing and seamless multi-platform compatibility across Windows, macOS, and Linux. Tailored for productivity, the enhanced Model Catalog simplifies choosing the best tools for your projects. Try it now and share feedback to shape the future of AI in VS Code!
ronglums
Nov 19, 2024 Place Microsoft Developer Community Blog
2.6KViews
4likes
0Comments
Getting Started - Generative AI with Phi-3-mini: A Guide to Inference and Deployment
Getting started with inference on Microsoft Phi-3-mini models. Discover how Phi-3-mini, a new series of models from Microsoft, enables deployment of Large Language Models (LLMs) on edge devices and IoT devices. Learn how to use Semantic Kernel, Ollama/LlamaEdge, and ONNX Runtime to access and run inference with Phi-3-mini models, and explore the possibilities of generative AI in various application scenarios.
kinfey
Apr 23, 2024 Place Microsoft Developer Community Blog
49KViews
4likes
13Comments
Introducing BioAgents: Advancing Bioinformatics with Multi-Agent Systems
BioAgents is a multi-agent system designed to improve bioinformatics analysis by leveraging specialized agents fine-tuned on bioinformatics data and enhanced with retrieval-augmented generation.
Venkat_Malladi
Jan 14, 2025 Place Healthcare and Life Sciences Blog
1.4KViews
3likes
0Comments
Unlocking the Potential of Phi-3 and C# in AI Development
A Must-Attend Session for Technical Students. Are you a technical student eager to dive into the world of AI and software development? Look no further! We're excited to invite you to an enlightening session that explores the integration of Phi-3 models with C#, presented by the cloud advocates team at Microsoft, featuring Bruno Capuano and Kinfey Lo.
Lee_Stott
Jul 09, 2024 Place Educator Developer Blog
2KViews
2likes
0Comments
Accelerate Phi-3 use on macOS: A Beginner's Guide to Using Apple MLX Framework
Learn how to use macOS and Apple Silicon to speed up machine learning models with this easy guide. We’ll cover the Apple MLX Framework, a tool that helps you run and fine-tune models like Phi-3-mini right on your Mac. First, install MLX by running pip install mlx-lm in your terminal. You can then use commands to run or fine-tune models. Apple's Metal Performance Shaders make this possible by using your Mac's GPU. We'll also show you how to use LoRA for better fine-tuning results and compare the performance of different models.
kinfey
Jun 25, 2024 Place Microsoft Developer Community Blog
12KViews
2likes
0Comments
Getting Started - Generative AI with Phi-3-mini: Running Phi-3-mini in Intel AI PC
In 2024, powered by AI, we are entering the era of the AI PC. On May 20, Microsoft also introduced the concept of the Copilot+ PC, which means that a PC can run SLMs/LLMs more efficiently with the support of an NPU. We can combine different models from the Phi-3 family with the new AI PC to build a simple personalized Copilot application for individuals. This post uses Intel's AI PC together with Intel's OpenVINO, the NPU Acceleration Library, and Microsoft's DirectML to create a local Copilot.
kinfey
May 21, 2024 Place Microsoft Developer Community Blog
30KViews
2likes
2Comments
Exploring Microsoft's Phi-3 Family of Small Language Models (SLMs) with Azure AI
Dive into the world of small language models (SLMs) with Microsoft's Phi-3 family and learn how to integrate them into real-world applications using Azure AI. Discover step-by-step guidance, practical exercises, and a Gradio-powered chatbot interface to bolster your confidence in deploying and integrating AI. Keep learning and building with Azure AI.
Lee_Stott
May 09, 2024 Place Educator Developer Blog
17KViews
2likes
0Comments