llm
Speed Up OpenAI Embedding By 4x With This Simple Trick!
In today’s fast-paced world of AI applications, optimizing performance should be one of your top priorities. This guide walks you through a simple yet powerful way to reduce OpenAI embedding response sizes by 75%, cutting them from 32 KB to just 8 KB per request. By switching from float32 to base64 encoding in your Retrieval-Augmented Generation (RAG) system, you can achieve a 4x efficiency boost, minimizing network overhead, saving costs and dramatically improving responsiveness. Let's consider the following scenario.

Use Case: RAG Application Processing a 10-Page PDF
A user interacts with a RAG-powered application that processes a 10-page PDF and uses OpenAI embedding models to make the document searchable from an LLM. The goal is to show how optimizing embedding response size impacts overall system performance.

Step 1: Embedding Creation from the 10-Page PDF
In a typical RAG system, the first step is to embed documents (in this case, a 10-page PDF) to store meaningful vectors that will later be retrieved for answering queries. The PDF is split into chunks. In our example, each chunk contains approximately 100 tokens (for the sake of simplicity), but the recommended chunk size varies based on the language and the embedding model.
Assumptions for the PDF:
- A 10-page PDF has approximately 3,325 tokens (roughly 330 tokens per page).
- You’ll split this document into 34 chunks (each containing about 100 tokens).
- Each chunk will then be sent to the OpenAI embeddings API for processing.

Step 2: The User Interacts with the RAG Application
Once the embeddings for the PDF are created, the user interacts with the RAG application, querying it multiple times. Each query is processed by retrieving the most relevant pieces of the document using the previously created embeddings. For simplicity, let’s assume:
- The user sends 10 queries, each containing 200 tokens.
- Each query requires 2 embedding requests (since the query is split into 100-token chunks for embedding).
- After embedding the query, the system performs retrieval and returns the most relevant documents (the RAG response).

Embedding Response Size
The OpenAI embeddings models take an input of tokens (the text to embed) and return a list of numbers called a vector. This list of numbers represents the “embedding” of the input in the model so that it can be compared with another vector to measure similarity. In RAG, we use embedding models to quickly search for relevant data in a vector database. By default, embeddings are serialized as an array of floating-point values in a JSON document, so each response from the embedding API is relatively large. The array values are 32-bit floating point numbers, or float32. Each float32 value occupies 4 bytes, and the embedding vector returned by models like OpenAI’s text-embedding-ada-002 typically has 1536 dimensions.
The challenge is the size of the embedding response:
- Each response consists of 1536 float32 values (one per dimension).
- As raw binary, 1536 float32 values occupy 6,144 bytes (1536 × 4 bytes).
- When those values are serialized as decimal text in a JSON document (roughly 20 characters per value, plus delimiters), the response grows to approximately 32 KB.

Optimizing Embedding Response Size
One approach to optimizing the embedding response size is to serialize the embedding as base64. Instead of writing each float as decimal text, the raw float32 bytes are base64-encoded, which preserves the full precision of the embedding while producing a much more compact payload. This leads to a significant reduction in the size of the embedding response.
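To see where these numbers come from, here is a minimal sketch (not part of the article's benchmark) that serializes a random 1536-dimensional vector both ways; the JSON text lands around 32 KB, while the base64 string is exactly 8,192 characters.

# Minimal sketch: compare JSON-float vs base64 serialization of one embedding.
import base64, json, random, struct

dims = 1536                                            # e.g. text-embedding-3-small
vector = [random.uniform(-1, 1) for _ in range(dims)]

json_payload = json.dumps(vector)                      # each float becomes ~20 chars of text
raw = struct.pack(f"<{dims}f", *vector)                # 1536 * 4 = 6144 raw bytes
b64_payload = base64.b64encode(raw).decode()           # 6144 * 4 / 3 = 8192 characters

print(len(json_payload), len(b64_payload))             # roughly 32 KB vs 8 KB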
With base64-encoded embeddings, the response size drops from roughly 32 KB to approximately 8 KB, as the benchmark below demonstrates (response sizes in bytes, for 100-token inputs):

Model | float32 Min | float32 Max | float32 Mean | base64 Min | base64 Max | base64 Mean
text-embedding-3-small | 32,673 | 32,751 | 32,703.8 | 8,192 (4.0x, 74.9%) | 8,192 (4.0x, 75.0%) | 8,192 (4.0x, 74.9%)
text-embedding-3-large | 65,757 | 65,893 | 65,810.2 | 16,384 (4.0x, 75.1%) | 16,384 (4.0x, 75.1%) | 16,384 (4.0x, 75.1%)
text-embedding-ada-002 | 32,882 | 32,939 | 32,909.0 | 8,192 (4.0x, 75.1%) | 8,192 (4.0x, 75.2%) | 8,192 (4.0x, 75.1%)

The source code of this benchmark can be found at https://github.com/manekinekko/rich-bench-node (kudos to Anthony Shaw for creating the rich-bench Python runner).

Comparing the Two Scenarios
Let’s break down and compare the total response size of the system in two scenarios:
- Scenario 1: embeddings serialized as float32 (32 KB per response).
- Scenario 2: embeddings serialized as base64 (8 KB per response).

Scenario 1: Embeddings Serialized as Float32
In this scenario, the PDF embedding creation and user queries involve larger responses due to float32 serialization. Let’s compute the total response size for each phase.
1. Embedding creation for the PDF:
- 34 embedding requests (one per 100-token chunk).
- 34 responses of 32 KB each.
- Total size for PDF embedding responses: 34 × 32 KB = 1,088 KB.
2. User interactions with the RAG app:
- Each user query consists of 200 tokens, split into 2 chunks of 100 tokens.
- 10 user queries, requiring 2 embedding responses per query.
- Each embedding response is 32 KB.
- Embedding responses: 20 × 32 KB = 640 KB.
- RAG responses: 10 × 32 KB = 320 KB.
- Total size for user interactions: 640 KB + 320 KB = 960 KB.
3. Totals:
- Embedding responses (PDF + user queries): 1,088 KB + 640 KB = 1,728 KB.
- RAG responses: 320 KB.
- Overall total for all responses: 1,728 KB + 320 KB = 2,048 KB = 2 MB.

Scenario 2: Embeddings Serialized as Base64
In this optimized scenario, the embedding response size is reduced to 8 KB by using base64 encoding.
1. Embedding creation for the PDF:
- 34 embedding requests.
- 34 responses of 8 KB each.
- Total size for PDF embedding responses: 34 × 8 KB = 272 KB.
2. User interactions with the RAG app:
- Embedding responses for 10 queries, 2 responses per query.
- Each embedding response is 8 KB.
- Embedding responses: 20 × 8 KB = 160 KB.
- RAG responses: 10 × 8 KB = 80 KB.
- Total size for user interactions: 160 KB + 80 KB = 240 KB.
3. Totals (optimized scenario):
- Embedding responses (PDF + user queries): 272 KB + 160 KB = 432 KB.
- RAG responses: 80 KB.
- Overall total for all responses: 432 KB + 80 KB = 512 KB.

Performance Gain: Comparison Between Scenarios
The optimized scenario (base64 encoding) is 4 times smaller than the original (float32 encoding): 2,048 KB / 512 KB = 4. The total size reduction between the two scenarios is 2,048 KB - 512 KB = 1,536 KB, which is a (1,536 / 2,048) × 100 = 75% reduction.
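The totals above are straightforward arithmetic; the short sketch below simply reproduces them, keeping the article's simplifying assumption that a RAG response is the same size as an embedding response.

# Reproduces the scenario totals above (sizes in KB).
PDF_CHUNKS, QUERIES, CHUNKS_PER_QUERY = 34, 10, 2

def total_kb(embedding_kb: int, rag_kb: int) -> int:
    embedding = (PDF_CHUNKS + QUERIES * CHUNKS_PER_QUERY) * embedding_kb
    rag = QUERIES * rag_kb
    return embedding + rag

print(total_kb(32, 32), total_kb(8, 8))  # 2048 512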
How to Configure the base64 Encoding Format
When getting a vector representation of a given input that can be easily consumed by machine learning models and algorithms, as a developer you usually call either the OpenAI API endpoint directly or use one of the official libraries for your programming language.

Calling the OpenAI or Azure OpenAI APIs
Using the OpenAI endpoint:
curl -X POST "https://api.openai.com/v1/embeddings" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "input": "The five boxing wizards jump quickly",
    "model": "text-embedding-ada-002",
    "encoding_format": "base64"
  }'

Or, calling Azure OpenAI resources:
curl -X POST "https://{endpoint}/openai/deployments/{deployment-id}/embeddings?api-version=2024-10-21" \
  -H "Content-Type: application/json" \
  -H "api-key: YOUR_API_KEY" \
  -d '{
    "input": ["The five boxing wizards jump quickly"],
    "encoding_format": "base64"
  }'

Using OpenAI Libraries
JavaScript/TypeScript:
const response = await client.embeddings.create({
  input: "The five boxing wizards jump quickly",
  model: "text-embedding-3-small",
  encoding_format: "base64"
});
A pull request has been sent to the openai SDK for Node.js repository to make base64 the default encoding when/if the user does not provide an encoding. Please feel free to give that PR a thumbs up.

Python:
embedding = client.embeddings.create(
  input="The five boxing wizards jump quickly",
  model="text-embedding-3-small",
  encoding_format="base64"
)
NB: from version 1.62, the openai SDK for Python defaults to base64.

Java:
EmbeddingCreateParams embeddingCreateParams = EmbeddingCreateParams
  .builder()
  .input("The five boxing wizards jump quickly")
  .encodingFormat(EncodingFormat.BASE64)
  .model("text-embedding-3-small")
  .build();

.NET:
The openai-dotnet library already enforces base64 encoding and does not allow the user to set encoding_format (see the openai-dotnet repository).

Conclusion
By optimizing the embedding response serialization from float32 to base64, you achieve a 75% reduction in data size, a payload 4x smaller. This reduction significantly enhances the efficiency of your RAG application, especially when processing large documents like PDFs and handling multiple user queries. For 1 million users sending 1,000 requests per month, the total size saved would be approximately 22.9 TB per month simply by using base64-encoded embeddings. As demonstrated, optimizing the size of the API responses is not only crucial for reducing network overhead but also for improving the overall responsiveness of your application. In a world where efficiency and scalability are key to delivering robust AI-powered solutions, this optimization can make a substantial difference in both performance and user experience.

Shoutout to my colleague Anthony Shaw for the long and great discussions we had about embedding optimisations.
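One practical note: if you call the REST API directly with encoding_format set to base64, each item's embedding field comes back as a base64 string rather than a float array, so it needs to be decoded client-side. Here is a minimal sketch, assuming the standard response shape and NumPy being available (the response and field names are illustrative):

# Sketch: decode a base64-encoded embedding back into float32 values.
import base64
import numpy as np

def decode_embedding(b64_string: str) -> np.ndarray:
    raw = base64.b64decode(b64_string)                 # 6144 bytes for 1536 dimensions
    return np.frombuffer(raw, dtype=np.dtype("<f4"))   # little-endian float32 vector

# embedding = decode_embedding(response["data"][0]["embedding"])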
Unlocking the Power of Azure Container Apps in 1 Minute Video

Azure Container Apps provides a seamless way to build, deploy, and scale cloud-native applications without the complexity of managing infrastructure. Whether you’re developing microservices, APIs, or AI-powered applications, this fully managed service enables you to focus on writing code while Azure handles scalability, networking, and deployments. In this blog post, we explore five essential aspects of Azure Container Apps, each highlighted in a one-minute video. From intelligent applications and secure networking to effortless deployments and rollbacks, these insights will help you maximize the capabilities of serverless containers on Azure.

Azure Container Apps - in 1 Minute
Azure Container Apps is a fully managed platform designed for cloud-native applications, providing effortless deployment and scaling. It eliminates infrastructure complexity, letting developers focus on writing code while Azure automatically handles scaling based on demand. Whether running APIs, event-driven applications, or microservices, Azure Container Apps ensures high performance and flexibility with minimal operational overhead.
Watch the video on YouTube

Intelligent Apps with Azure Container Apps – in 1 Minute
Azure Container Apps, Azure OpenAI, and Azure AI Search make it possible to build intelligent applications with Retrieval-Augmented Generation (RAG). Your app can call Azure OpenAI in real-time to generate and interpret data, while Azure AI Search retrieves relevant information, enhancing responses with up-to-date context. For advanced scenarios, AI models can execute live code via Azure Container Apps, and GPU-powered instances support fine-tuning and inferencing at scale. This seamless integration enables AI-driven applications to deliver dynamic, context-aware functionality with ease.
Watch the video on YouTube

Networking for Azure Container Apps: VNETs, Security Simplified – in 1 Minute
Azure Container Apps provides built-in networking features, including support for Virtual Networks (VNETs) to control service-to-service communication. Secure internal traffic while exposing public endpoints with custom domain names and free certificates. Fine-tuned ingress and egress controls ensure that only the right traffic gets through, maintaining a balance between security and accessibility. Service discovery is automatic, making inter-app communication seamless within your Azure Container Apps environment.
Watch the video on YouTube

Azure Continuous Deployment and Observability with Azure Container Apps - in 1 Minute
Azure Container Apps simplifies continuous deployment with built-in integrations for GitHub Actions and Azure DevOps pipelines. Every code change triggers a revision, ensuring smooth rollouts with zero downtime. Observability is fully integrated via Azure Monitor, Log Streaming, and the Container Console, allowing you to track performance, debug live issues, and maintain real-time visibility into your app’s health, all without interrupting operations.
Watch the video on YouTube

Effortless Rollbacks and Deployments with Azure Container Apps – in 1 Minute
With Azure Container Apps, every deployment creates a new revision, allowing multiple versions to run simultaneously. This enables safe, real-time testing of updates without disrupting production. Rolling back is instant: just select a previous revision and restore your app effortlessly. This powerful revision control system ensures that deployments remain flexible, reliable, and low-risk.
Watch the video on YouTube

Watch the Full Playlist
For a complete overview of Azure Container Apps capabilities, watch the full JavaScript on Azure Container Apps YouTube playlist.

Create Your Own AI-Powered Video Content
Inspired by these short-form technical videos? You can create your own AI-generated videos using Azure AI to automate scriptwriting and voiceovers. Whether you’re a content creator or a business looking to showcase technical concepts, Azure AI makes it easy to generate professional-looking explainer content. Learn how to create engaging short videos with Azure AI by following our open-source AI Video Playbook.

Conclusion
Azure Container Apps is designed to simplify modern application development by providing a fully managed, serverless container environment. Whether you need to scale microservices, integrate AI capabilities, enhance security with VNETs, or streamline CI/CD workflows, Azure Container Apps offers a comprehensive solution. By leveraging its built-in features such as automatic scaling, revision-based rollbacks, and deep observability, developers can deploy and manage applications with confidence. These one-minute videos provide a quick technical overview of how Azure Container Apps empowers you to build scalable, resilient applications with ease.

FREE Content
Check out our other FREE content to learn more about Azure services and Generative AI:
- Generative AI for Beginners - A JavaScript Adventure!
- Learn more about Azure AI Agent Service
- LlamaIndex on Azure
- JavaScript on Azure Container Apps
- JavaScript at Microsoft

Learn how to develop innovative AI solutions with updated Azure skilling paths
The rapid evolution of generative AI is reshaping how organizations operate, innovate, and deliver value. Professionals who develop expertise in generative AI development, prompt engineering, and AI lifecycle management are increasingly valuable to organizations looking to harness these powerful capabilities while ensuring responsible and effective implementation. In this blog, we’re excited to share our newly refreshed series of Plans on Microsoft Learn that aim to supply your team with the tools and knowledge to leverage the latest AI technologies, including:
- Find the best model for your generative AI solution with Azure AI Foundry
- Create agentic AI solutions by using Azure AI Foundry
- Build secure and responsible AI solutions and manage generative AI lifecycles

From sophisticated AI agents that can autonomously perform complex tasks to advanced chat models that enable natural human-AI collaboration, these technologies are becoming essential business tools rather than optional enhancements. Let’s take a look at the latest developments and unlock their full potential with our curated training resources from Microsoft Learn.

Simplify the process of choosing an AI model with Azure AI Foundry
Choosing the optimal generative AI model is essential for any solution, requiring careful evaluation of task complexity, data requirements, and computational constraints. Azure AI Foundry streamlines this decision-making process by offering diverse pre-trained models, fine-tuning capabilities, and comprehensive MLOps tools that enable businesses to test, optimize, and scale their AI applications while maintaining enterprise-grade security and compliance. Our Plan on Microsoft Learn titled Find the best model for your generative AI solution with Azure AI Foundry will guide you through the process of discovering and deploying the best models for creating generative AI solutions with Azure AI Foundry, including:
- Learn about the differences and strengths of various language models.
- Find out how to integrate and use AI models in your applications to enhance functionality and user experience.
- Rapidly create intelligent, market-ready multimodal applications with Azure models, and explore industry-specific models.

In addition, you’ll have the chance to take part in a Microsoft Azure Virtual Training Day, with interactive sessions and expert guidance to help you skill up on Azure AI features and capabilities. By engaging with this Plan on Microsoft Learn, you’ll also have the chance to prove your skills and earn a Microsoft Certification.

Leap into the future of agentic AI solutions with Azure
After choosing the right model for your generative AI purposes, our next Plan on Microsoft Learn goes a step further by introducing agentic AI solutions. A significant evolution in generative AI, agentic AI solutions enable autonomous decision-making, problem-solving, and task execution without constant human intervention. These AI agents can perceive their environment, adapt to new inputs, and take proactive actions, making them valuable across various industries. In the Create agentic AI solutions by using Azure AI Foundry Plan on Microsoft Learn, you’ll find out how developing agentic AI solutions requires a platform that provides scalability, adaptability, and security. With pre-built AI models, MLOps tools, and deep integrations with Azure services, Azure AI Foundry simplifies the development of custom AI agents that can interact with data, make real-time decisions, and continuously learn from new information.
You’ll also:
- Learn how to describe the core features and capabilities of Azure AI Foundry, provision and manage Azure AI resources, create and manage AI projects, and determine when to use Azure AI Foundry.
- Discover how to customize with RAG in Azure AI Foundry, Azure AI Foundry SDK, or Azure OpenAI Service to look for answers in documents.
- Learn how to use Azure AI Agent Service, a comprehensive suite of feature-rich, managed capabilities, to bring together the models, data, tools, and services your enterprise needs to automate business processes.

There’s also a Microsoft Virtual Training Day featuring interactive sessions and expert guidance, and you can validate your skills by earning a Microsoft Certification.

Safeguard your AI systems for security and fairness
Widespread AI adoption demands rigorous security, fairness, and transparency safeguards to prevent bias, privacy breaches, and vulnerabilities that lead to unethical outcomes or non-compliance. Organizations must implement responsible AI through robust data governance, explainability, bias mitigation, and user safety protocols, while protecting sensitive data and ensuring outputs align with ethical standards. Our third Plan on Microsoft Learn, Build secure and responsible AI solutions and manage generative AI lifecycles, is designed to introduce the basics of AI security and responsible AI to help increase the security posture of AI environments. You’ll not only learn how to evaluate and improve generative AI outputs for quality and safety, but you’ll also:
- Gain an understanding of the basic concepts of AI security and responsible AI to help increase the security posture of AI environments.
- Learn how to assess and improve generative AI outputs for quality and safety.
- Discover how to help reduce risks by using Azure AI Content Safety to detect, moderate, and manage harmful content.

Learn more by taking part in an interactive, expert-guided Microsoft Virtual Training Day to deepen your understanding of core AI concepts.

Got a skilling question? Our new Ask Learn AI assistant is here to help
Beyond our comprehensive Plans on Microsoft Learn, we’re also excited to introduce Ask Learn, our newest skilling innovation! Ask Learn is an AI assistant that can answer questions, clarify concepts, and define terms throughout your training experience. Ask Learn is your Copilot for getting skilled in AI, helping to answer your questions within the Microsoft Learn interface, so you don’t have to search elsewhere for the information. Simply click the Ask Learn icon at the top corner of the page to activate it.

Begin your generative AI skilling journey with curated Azure skilling Plans
Azure AI Foundry provides the necessary platform to train, test, and deploy AI solutions at scale, and with the expert-curated skilling resources available in our newly refreshed Plans on Microsoft Learn, your teams can accelerate the creation of intelligent, self-improving AI agents tailored to your business needs. Get started today!
- Find the best model for your generative AI solution with Azure AI Foundry
- Create agentic AI solutions by using Azure AI Foundry
- Build secure and responsible AI solutions and manage generative AI lifecycles

AI Sparks: AI Toolkit for VS Code - from playground to production
Are you building AI-powered applications from scratch or infusing intelligence into existing production code and systems? AI Sparks is your go-to webinar series for mastering the AI Toolkit (AITK) for VS Code, from foundational concepts to cutting-edge techniques. In this bi-weekly, hands-on series, we’ll cover:
- SLMs & Local Models – Test and deploy AI models and applications efficiently on your own terms: locally, to edge devices, or to the cloud.
- Embedding Models & RAG – Supercharge retrieval for smarter applications using existing data.
- Multi-Modal AI – Work with images, text, and beyond.
- Agentic Frameworks – Build autonomous, decision-making AI systems.

What will you learn from this series?
Whether you're a developer, startup founder, or AI enthusiast, you'll gain practical insights, live demos, and actionable takeaways to level up your AI integration journey. Join us and spark your AI transformation! You can register for the entire series on the Reactor page.

Episode list
You can also sign up for individual episodes and read about the topics covered using the following links:

Feb 13th 2025 – WATCH ON DEMAND – Introduction to AI Toolkit and feature walkthrough
In this episode, we’ll introduce the AI Toolkit extension for VS Code—a powerful way to explore and integrate the latest AI models from OpenAI, Meta, Deepseek, Mistral, and more. With this extension, you can browse state-of-the-art models, download some for local use, or experiment with others remotely. Whether you're enhancing an existing application or building something new, the AI Toolkit simplifies the process of selecting and integrating the right model for your needs.

Feb 27th 2025 – A short introduction to SLMs and local models with use cases
In this episode, we’ll explore Small Language Models (SLMs) and how they compare to larger models. SLMs are efficient, require less compute and memory, and can run on edge devices while still excelling at a variety of tasks. We’ll dive into the Phi-3.5 and Phi-4 model series and demonstrate how to build a practical application using these models.

Mar 13th 2025 – How to work with embedding models and build a RAG application
In this episode, we’ll dive into embedding models—important tools for working with vector databases and large language models. These models convert text into numerical representations, making it easier to process and retrieve information efficiently. After covering the core concepts, we’ll apply them in practice by building a Retrieval-Augmented Generation (RAG) app using Small Language Models (SLMs) and a vector database.

Mar 27th 2025 – Multi-modal support and image analysis
In this episode, we’ll dig deeper into the multi-modal capabilities of local and remote AI models and use visualization tools for better insights. We’ll also dive into multi-modal support in the AI Toolkit, showcasing how to process and analyze images alongside text. By the end, you’ll see how these capabilities come together to enhance powerful AI applications.

Apr 10th 2025 – Evaluations – How to choose the best model for your application's needs
In this episode, we’ll explore how to evaluate AI models and choose the right one for your needs. We’ll cover key performance metrics, compare different models, and demonstrate testing strategies using features like Playground, Bulk Run, and automated evaluations. Whether you're experimenting with the latest models or transitioning to a new version, these evaluation techniques will help you make informed decisions with confidence.
Apr 24th 2025 – Agents and Agentic Frameworks
In this episode, we’ll explore agents and agentic frameworks—systems that enable AI models to make decisions, take actions, and automate complex tasks. We’ll break down how these frameworks work, their practical applications, and how to build and integrate them into your projects using the AI Toolkit. By the end, you’ll have a clear understanding of how to build and leverage AI agents effectively.

Resources
- AI Toolkit for VS Code - https://aka.ms/AIToolkit
- AI Toolkit for VS Code documentation - https://aka.ms/AIToolkit/doc
- Building Retrieval-Augmented Generation (RAG) apps on VS Code & AI Toolkit
- Understanding and using reasoning models such as DeepSeek R1 on AI Toolkit
- Using Ollama and OpenAI, Google, and Anthropic hosted models with AI Toolkit
- AI Sparks - YouTube playlist

AI Genius - AI Skilling series for Developers
We are conducting a six-part AI Skilling series called AI Genius starting January 28th, 2025, to kickstart your AI learning journey from beginner to advanced use cases. The series will feature experts from Microsoft talking about different aspects of using AI and building AI Applications. This is targeted towards developers who are looking to upskill their AI capabilities in the latest AI technologies such as SLMs, RAG and AI agents.

Getting Started with the AI Dev Gallery
The AI Dev Gallery is a new open-source project designed to inspire and support developers in integrating on-device AI functionality into their Windows apps. It offers an intuitive UX for exploring and testing interactive AI samples powered by local models. Key features include:
- Quickly explore and download models from well-known sources on GitHub and HuggingFace.
- Test different models with interactive samples across over 25 different scenarios, including text, image, audio, and video use cases.
- See all relevant code and library references for every sample.
- Switch between models that run on CPU and GPU depending on your device capabilities.
- Quickly get started with your own projects by exporting any sample to a fresh Visual Studio project that references the same model cache, preventing duplicate downloads.

Part of the motivation behind the Gallery was exposing developers to the host of benefits that come with on-device AI. Some of these benefits include improved data security and privacy, increased control and parameterization, and no dependence on an internet connection or third-party cloud provider.

Requirements
Device requirements:
- Minimum OS version: Windows 10, version 1809 (10.0; Build 17763)
- Architecture: x64, ARM64
- Memory: at least 16 GB is recommended
- Disk space: at least 20 GB of free space is recommended
- GPU: 8 GB of VRAM is recommended for running samples on the GPU
Visual Studio 2022: you will need Visual Studio 2022 with the Windows Application Development workload.

Running the Gallery
To run the Gallery:
- Clone the repository: git clone https://github.com/microsoft/AI-Dev-Gallery.git
- Run the solution: .\AIDevGallery.sln
- Hit F5 to build and run the Gallery.

Using the Gallery
The AI Dev Gallery can be navigated in two ways: the Samples view and the Models view.

Navigating Samples
In this view, samples are broken up into categories (Text, Code, Image, etc.) and then into more specific samples, like the Translate Text sample pictured below. On clicking a sample, you will be prompted to choose a model to download if you haven’t run this sample before. Next to the model you can see the size of the model, whether it will run on CPU or GPU, and the associated license. Pick the model that makes the most sense for your machine. You can also download new models and change the model for a sample later from the sample view; just click the model drop-down at the top of the sample. The last thing you can do from the sample pane is view the sample code and export the project to Visual Studio. Both buttons are found in the top-right corner of the sample, and the code view will look like this:

Navigating Models
If you would rather navigate by models instead of samples, the Gallery also provides the model view. The model view contains a similar navigation menu on the right to navigate between models based on category. Clicking on a model will allow you to see a description of the model, the versions of it that are available to download, and the samples that use the model. Clicking on a sample will take you back over to the samples view, where you can see the model in action.

Deleting and Managing Models
If you need to clear up space or see download details for the models you are using, you can head over to the Settings page to manage your downloads. From here, you can easily see every model you have downloaded and how much space on your drive they are taking up. You can clear your entire cache for a fresh start or delete individual models that you are no longer using.
Any deleted model can be redownloaded through either the models or samples view.

Next Steps for the Gallery
The AI Dev Gallery is still a work in progress, and we plan on adding more samples, models, APIs, and features. We are also evaluating adding support for NPUs to take the experience even further. If you have feedback, noticed a bug, or have any ideas for features or samples, head over to the issue board and submit an issue. We also have a discussion board for any other topics relevant to the Gallery. The Gallery is an open-source project, and we would love contribution, feedback, and ideation! Happy modeling!

How Reasoning Models are transforming Logical AI thinking
Introduction
Reasoning models are a new category of specialized language models. They are designed to break down complex problems into smaller, manageable steps and solve them through explicit logical reasoning (this step is also called “thinking”). Unlike general-purpose LLMs, which might generate direct answers, reasoning models are specifically trained to show their work and follow a more structured thought process. Some models hide this reasoning phase, while others expose it explicitly. The reasoning phase shows how the model can break the stated problem down into smaller problems (decomposition), try different approaches (ideation), choose the best approaches (validation), reject invalid approaches (possibly backtracking), and finally choose the best answer (execution/solving).

Reasoning models like OpenAI’s o1, o1-mini, o3-mini, and DeepSeek-R1 are available in the AI Toolkit for VS Code model catalog and the Azure AI Foundry model catalog, as well as online for free, rate-limited usage in the GitHub Marketplace for models. There are some key differences between reasoning models and general-purpose LLMs, summarized in the following table.

Summary Table
Feature | Reasoning Models | General Purpose LLMs
Primary purpose and strengths | Explicit step-by-step problem solving and logical reasoning | General-purpose text generation and understanding
Problem-solving approach | Breaks down problems into smaller sub-steps and shows intermediate reasoning (the “thinking” phase) | Shows the output directly without intermediate steps; mostly pattern-based next-token prediction
Output structure | Highly structured, with clear delineation between reasoning steps | More flexible and conversational; may mix reasoning with other content
Training | Specifically trained on step-by-step reasoning tasks and formal logic | Trained on a diverse corpus of text data with many different writing styles and purposes
Usage of chain-of-thought | Built into the core architecture and training process | Can use chain-of-thought prompting but not inherently designed for it; must be specifically asked for in the prompt
Interpretability and error detection | Detecting errors is easier due to explicit reasoning steps | Black-box nature makes the reasoning behind an output harder to trace
Computational efficiency | May require more computing resources due to step-by-step processing, especially at inference time | Relatively lower computing requirements at inference time; usually more efficient for direct responses
Latency for response | May be slower for simple tasks that don't require explicit reasoning | May struggle with complex logical problems requiring step-by-step analysis and interpretation
Examples | OpenAI o1, o1-mini, o3-mini, DeepSeek-R1 | GPT-4o, Llama 3.3, Claude
Use cases | Scientific reasoning, legal analysis, complex problem-solving, AI agents with multi-step decision-making | Chatbots, text summarization, content generation, Q&A, code completion

Since we know that reasoning models excel at logical, step-by-step problem solving, we will give them tasks. We will be using DeepSeek-R1 from the GitHub model marketplace and the AI Toolkit for VS Code for this analysis.

Task 1 – Solve a logical puzzle
We will use a variation of the river crossing problem that has some interesting constraints and needs some level of backtracking, which humans excel at. This is the prompt that we will use:
Can you solve this logic puzzle and show it as a diagram "A farmer with a wolf, a goat, and a cabbage must cross a river by boat.
The boat can carry only the farmer and a single item. If left unattended together, the wolf would eat the goat, or the goat would eat the cabbage. How can they cross the river without anything being eaten?" AI Toolkit Session follows below: User Can you solve this logic puzzle and show it as a diagram "A farmer with a wolf, a goat, and a cabbage must cross a river by boat. The boat can carry only the farmer and a single item. If left unattended together, the wolf would eat the goat, or the goat would eat the cabbage. How can they cross the river without anything being eaten?" DeepSeek-R1 <think> Okay, so the user wants me to solve this logic puzzle about the farmer crossing the river with a wolf, goat, and cabbage. Let me start by recalling the problem. The farmer can only take one item at a time in the boat. If left alone, the wolf eats the goat, and the goat eats the cabbage. The goal is to get everyone across safely. First, I need to break down the constraints. The boat can carry the farmer and one item. So each trip either the farmer goes alone or takes one item. The main issue is the forbidden pairs: wolf and goat can't be left alone, goat and cabbage can't be left alone. So whenever the farmer moves, the remaining items on either bank must not include these pairs. Let me think through the possible steps. Let's consider the initial state: all four (farmer, wolf, goat, cabbage) are on the left bank. The goal is to get all to the right bank. The first thought: if the farmer takes the goat first. Then left bank has wolf and cabbage. They are safe because wolf and cabbage don't interact. So after first trip, farmer and goat are on the right, wolf and cabbage on the left. Then the farmer goes back alone to the left. Now, the right bank has just the goat. Then the farmer can take either the wolf or the cabbage next. Suppose he takes the wolf. Then on the right bank would be wolf and goat. Oh no, that's a problem because the wolf would eat the goat. So that's forbidden. Alternatively, after taking the goat over and returning alone, the farmer could take the cabbage. If he takes the cabbage to the right bank, then left bank has wolf. Right bank has goat and cabbage. But when he takes the cabbage over, the goat is now on the right with the cabbage. If the farmer leaves them to go back, the goat would eat the cabbage. So that's not allowed either. Wait, this seems confusing. Let's track step by step. State 0: All on left (F, W, G, C). Right is empty. Step 1: Farmer takes goat to right. Now left has W, C; right has F, G. That's safe. Then Step 2: Farmer goes back alone to left. Now left has F, W, C; right has G. Step 3: Farmer takes either W or C. Suppose he takes W. Now right has F, W, G. Left has C. But when he takes W, he leaves G and W on the right? Wait, no. When farmer takes W to the right, after step 3, left has C, right has F, W, G. But if he leaves W and G on the right, that's a problem (wolf and goat). Wait, no, the farmer is there too. So actually, if the farmer is present, they can't eat. The problem occurs only when unattended. So when the farmer takes W to the right, he's with W and G on the right. Then he can leave W there and take G back? Wait, but that would complicate things. Wait, let's step through again. After step 2: left is F, W, C; right is G. Step 3: Farmer takes W to right. Left has C. Right has F, W, G. Since farmer is on the right, wolf and goat are not left alone. Then the farmer leaves W on the right, goes back with G. 
Wait, but if he leaves W on the right, then the right bank has W and G, but farmer is taking G back. Wait, no. If the farmer takes W to the right, then right bank now has W and G. But the farmer is there. So he can take W to the right bank, and then he stays there, but then he can't take G back. Wait, no. Wait, no. Let me correct this. If the farmer is taking the wolf from left to right, starting at left (F, W, C) and right (G). After moving W: Left has C (and no farmer), right has F, W, G. Wait, but left would have C alone, which is safe. Then the farmer is on the right with W and G. Then he has to leave either W or G to go back. But if he leaves W and G on the right, unattended, that's a problem. So he can't leave them. Thus, the farmer needs to take something with him. Wait, so step 3 alternatives: after step 2 (left has F, W, C; right has G), farmer takes cabbage to the right: Left becomes W (and no farmer), right has F, C, G. But then when he takes C to the right, the right bank would have C and G, which would be a problem if left unattended. Because farmer is on the right, so they are attended. Then he leaves C there, takes G back to left. But that seems steps going back and forth. This is getting a bit tangled. Let's try outlining possible valid moves. The key insight here is that the goat is the only item that can't be left with either wolf or cabbage. So the farmer must ensure that whenever the goat is on a bank without the farmer, there are no other dangerous items there. So, the strategy that typically works is: Take goat over first. Return alone. Take either wolf or cabbage over. Bring goat back. Take the other item (cabbage or wolf) over. Return alone. Take goat over again. Let me check this. Step-by-Step: Start: All on left (F, W, G, C) Take G to right. State: Left: W, C; Right: F, G. Safe, because Left has W and C (no conflict). Farmer returns to left alone. State: F, W, C | G. Farmer takes W to right. State: Left: C; Right: F, W, G. But on the right, when the farmer leaves W there, does he take G back? No. Wait, after bringing W to right, the farmer would have to leave something. If he takes W to the right and leaves it there while he goes back, then right has W and G. The farmer is on the way back, so they are left alone. That's a problem because wolf and goat would be together. Therefore, this is invalid. Hence, step 3 taking W leads to a problem. Ah, so to avoid that, after step 2, instead of taking W, the farmer should take C. Alternative Step 3: After returning, take C to right. Take C to right. State: Left: F, W; Right: G, C. But when he takes C to right, the farmer leaves left bank as W. Which is safe, just wolf alone. Right bank has F, C, G. But when he leaves C and G on the right (since farmer is there, they are safe). Wait, the farmer is present, so they don't eat. So he can leave them, then go back. So step 4: return alone. State: Left: F, W; Right: C, G. Wait, no. Let me track correctly: After Step 1: left: W, C; right: F, G. Step 2: return to left alone. Now left: F, W, C; right: G. Step 3: take C to right. After moving, left: W; farmer is taking C to right. So left becomes W and farmer is moving to right, so left has W alone (which is safe). Right now has F, G, C. But upon arrival, farmer is with G and C. So they are safe. Step 4: farmer returns to left alone. Now left: F, W; right: G, C. But on the left, F and W are there, which is okay. On right, G and C are unattended – that's a problem! Goat would eat cabbage. 
Wait, even though the farmer has left, he's on left now. So the right bank has G and C without the farmer. So that violates the conditions. So this path would result in the goat eating the cabbage. Hence invalid. Therefore, taking the cabbage on step 3 leads to a problem when returning. Therefore, neither taking W nor taking C works here. Hmm. So alternative approach: after step 2, when the farmer is on the left with W and C, instead of taking C or W, he needs to take the goat back. Wait, but that seems to reverse step 1. That can't help. Wait, maybe the key is that after taking the goat over first, then the farmer must bring it back later to avoid leaving dangerous pairs. Let's try a different sequence: Take goat to right. Left: W, C; Right: F, G. Return alone to left. Left: F, W, C; Right: G. Take wolf to right. Left: C; Right: F, W, G. But as before, after this, farmer is on right with W and G. If he leaves W there and takes G back, then right has W alone, which is okay. So: Take goat back to left. Left: F, C, G; Right: W. Now left has C, G, F; right has W. Then take cabbage to right. Left: G; Right: F, W, C. Then return alone to left. Take goat to right. Now all are on right. Done. This seems to work. Let's verify each step: Take G over. Left: W, C (safe); Right: F, G. Return alone. Left: F, W, C; Right: G. Take W over. Left: C (safe). Right: F, W, G. Farmer is present, so W and G safe. Take G back. Left: F, G, C; Right: W (safe). Wait, on left, when farmer takes G back, left now has G and C with farmer. So when he arrives, left has F, G, C; right has W. But then when he leaves again, left would have G and C, which would be a problem (goat eats cabbage). Wait, no. After step 4, after taking G back, the state is: Left: F, G, C (farmer is there); Right: W. Then the farmer leaves to take C over: Step 5: Take C over. Left: G (alone on left? Wait, no. Wait, if farmer takes C to right, he leaves G on left. Wait, no. Let's see: After step 4: farmer is on left with G, having brought G back from the right. So left has F, C, G; right has W. Then, step 5: farmer takes C to right. So left becomes F, G on left? Wait no: starting from: Left: F, C, G; Right: W. When farmer takes C to right, farmer and C move to right. Left becomes G alone, but without the farmer. That's bad because goat alone on left but nothing else. So it's safe. Wait, goat alone is okay. Goat on left alone is harmless. So after step 5, right has F, C, W. Left has G. But the farmer is on the right. Then step 6: farmer returns alone to left. Left: F, G; Right: W, C. Step 7: Take G to right. Now all on right: F, G, W, C. Done. But let's check at each step: Step 1: left (W, C), right (F, G) - safe. Step 2: left (F, W, C), right (G). Farmer is on left. Step 3: take W to right. Left has C (safe). Right has F, W, G. Farmer there. Step 4: take G back. Left: F, G, C (farmer there). Right: W. So now right has W alone, safe. Step 5: take C to right. Left: G alone (safe). Right: F, W, C. Farmer there. Step 6: return alone. Left: G (farmer on right). So G alone on left is safe. Wait no, after step 6: farmer returns to left, now left has F, G; right has W, C. Farmer is on left with G, right has W and C (separately, so no problem). Safe. Step 7: take G to right. Now all on right. Done. Yes, that works. So the steps are: Farmer and Goat → right. Farmer ← left. Farmer and Wolf → right. Farmer and Goat ← left. Farmer and Cabbage → right. Farmer ← left. Farmer and Goat → right. Each step maintains safety on both banks. 
Hence the solution is this sequence. Then, to represent it as a diagram, perhaps using a flowchart showing each state with left and right banks, connected by arrows indicating the steps. </think> To solve the puzzle, follow these steps systematically, ensuring that no dangerous pairs are left unattended: Cross with Goat: Farmer takes the Goat to the right bank. Left: Wolf, Cabbage | Right: Farmer, Goat Return Alone: Farmer returns to the left bank alone. Left: Farmer, Wolf, Cabbage | Right: Goat Cross with Wolf: Farmer takes the Wolf to the right bank. Left: Cabbage | Right: Farmer, Wolf, Goat Return with Goat: Farmer brings the Goat back to the left bank. Left: Farmer, Goat, Cabbage | Right: Wolf Cross with Cabbage: Farmer takes the Cabbage to the right bank. Left: Goat | Right: Farmer, Wolf, Cabbage Return Alone: Farmer returns to the left bank alone. Left: Farmer, Goat | Right: Wolf, Cabbage Final Trip with Goat: Farmer takes the Goat to the right bank. Left: (empty) | Right: Farmer, Wolf, Goat, Cabbage Diagram (simplified as a flowchart): <>Plain text Start: F, W, G, C | [Left] → [Right] (empty) Take G ➔ [W, C] | [F, G] Return ↵ [F, W, C] | [G] Take W ➔ [C] | [F, W, G] Take G ↵ [F, G, C] | [W] Take C ➔ [G] | [F, W, C] Return ↵ [F, G] | [W, C] Take G ➔ [ ] | [F, W, G, C] All cross safely! 🚣✨ Token used: 3278 (11693 characters) Conclusion As you can see it has arrived at the right answer, and you can see it taking the following steps Reformulating the problem as a set of constraints Adding symbols for each of the interacting entities. Then it takes a series of steps to see if they satisfy the constraints and discards ones that do not. Then it layers each of these steps to build an efficient solution based on backtracking and trial-and-error. Finally, once it has a probable answer, it crosschecks again on the constraints and draws the diagram as we instructed it (with symbols for each). A nice touch is that it has also shown the answer in human readable text before to make it easy to follow. Task 2 – Explain obfuscated code Another scenario that we can often face in our lives as programmers is stumbling across a piece of code in our codebase that does something clever but alas, it has no documentation. Sounds familiar? Let's see how DeepSeek-R1 can tackle this task. This is the prompt we will use What does this program do? What language is it in? And what are such programs called? s='s=\47%s\47;printf "$s" "$s"';printf "$s" "$s" AI toolkit Session transcript follows: User What does this program do? What language is it in? and what are such programs called? s='s=\47%s\47;printf "$s" "$s"';printf "$s" "$s" DeepSeek-R1 <think> Okay, so I need to figure out what this program does, what language it's written in, and what such programs are called. Let me start by looking closely at the code. First, the code starts with s='s=\47%s\47;printf "$s" "$s"'; followed by printf "$s" "$s". Hmm, the initial part is assigning a string to variable s. The string seems to have some escape characters. The \47 is probably an octal escape in some programming language. Since 47 in octal is 39 in decimal, which is the ASCII code for a single quote ('). So s='s=\47%s\47;printf "$s" "$s"' would actually translate to s='s='%s';printf "$s" "$s"'. Wait, let me check that. So the single quote in the string is escaped using \47, which would allow the string to include a single quote without closing the string prematurely. 
So the value of s is a string that starts with 's=' followed by a single quote (from \47), then %s, then another single quote (again \47), followed by ;printf "$s" "$s". Then, the next part is printf "$s" "$s". So they're using printf with the format string stored in s and the argument is the same s variable. Let's think step by step. Let's substitute s into the printf command. The s variable is: 's=\47%s\47;printf "$s" "$s"'. When printf is executed, the format string is s, which includes a %s placeholder. The arguments passed to printf are "$s", so the first %s in the format string is replaced by the value of s. Wait, but in the first line, the code is written as s='s=\47%s\47;printf "$s" "$s"'; which when interpreted would need to handle the single quotes correctly. Let me parse this. The assignment is s='...' where the content inside the single quotes includes \47, which as I thought is a single quote. So the actual value of s when assigned is: s='s='%s';printf "$s" "$s"'. Wait, maybe not. Let me think again. The original assignment is s='s=\47%s\47;printf "$s" "$s"'; So inside the single-quoted string, \47 is an escape sequence. But in some shells like Bash, inside single quotes, escapes are not processed unless the single quote is closed. Wait, but the code is using printf, which is a shell command, suggesting maybe this is a shell script? But in the shell, single quotes don't allow escaping, except for the single quote itself by using '''. Wait, but maybe the code is in a different language where \47 represents a single quote. For example, in C, octal escapes are allowed in strings. Wait, but the code uses printf with "$s" as the format and "$s" as the argument. Let me try to simulate this. Let's suppose s has the value: s='s=\47%s\47;printf "$s" "$s"'. Then, when the printf is executed: printf "$s" "$s". So substituting variables, the format string is the value of s, which is 's=\47%s\47;printf "$s" "$s"'. Then, the first argument is also s. So the %s in the format will be replaced by the value of the first argument, which is s. Wait, but the substitution happens inside the format string. Let me try writing it out. Let's say s is the string 's=\47%s\47;printf "$s" "$s"'. Then, when we do printf "$s" "$s", the %s in the format string will be replaced by the argument "$s". So the output would be the format string with %s replaced by s's value. Let's compute that. Replacing the %s with the value of s gives: s='\47 followed by the value of s (which includes \47%s\47) followed by \47'; then ;printf "$s" "$s". Hmm, that's getting a bit tangled. Maybe this program is a quine? A quine is a program that outputs its own source code. So perhaps this code, when run, generates its own source as output. Let's test this idea. Let's see what the output would be. The initial value of s is s='s=\47%s\47;printf "$s" "$s"'. Then printf is called with the format string being s and the argument being s. So substituting, the format string is 's=\47%s\47;printf "$s" "$s"', and the %s will be replaced with the same string. Let's do that: replacing %s with the value of s (i.e., the string 's=\47%s\47;printf "$s" "$s"') into the format string. So the result would be s='\47's=\47%s\47;printf "$s" "$s"'' followed by ;printf "$s" "$s". Wait, no. Let me actually plug in the substitution. The original s is: s=\47%s\47;printf "$s" "$s" So when we replace the %s in the format with the s itself, the line becomes s='s=\47%s\47;printf "$s" "$s"';printf "$s" "$s" Wait, but wait. Let me process this step by step. 
Let me consider what happens when the printf command runs. The first printf is given the format string in variable s, which is 's=\47%s\47;printf "$s" "$s"' (the outer single quotes in the assignment are part of the shell syntax, but the actual content is s=\47%s\47;printf "$s" "$s"). Wait, no. Let's clarify: in shell scripting, when you assign s='some string', the variable s holds 'some string' without the single quotes. Wait, no. Wait in shell scripting, variable assignments like s='value' assign s the literal 'value', with the quotes used to delimit the string but not part of the value. So in this code, s is assigned the string s=\47%s\47;printf "$s" "$s" where \47 is a literal \ followed by 47, but when interpreted inside a shell, what's the actual value? Wait, but shell does not interpret octal escapes in normal strings unless using certain commands like echo with flags. Oh, but here they're using printf, which does interpret backslash escapes. Wait, but variable assignment in shell doesn't process escapes unless in certain contexts. So if s is assigned as s='s=\47%s\47;printf ...', then \47 is literally a backslash followed by 4 and 7, not as an octal escape. Hmm, that complicates things. So in that case, when the printf is executed, the format string would have \47 which printf would interpret as octal 47 (if the format string is processed with printf's escape handling). Wait, but the shell's printf command processes the escapes in the format string. Wait, the shell's printf uses the format string and replaces escape sequences in it if they are present. But in this code, the format string is stored in $s, which is being passed as the first argument to printf, and the second argument is $s. Let's take an example. Let's say s is the string 's=\47%s\47;printf "$s" "$s"'. Then, when you run printf "$s" "$s", the first argument (the format) is "s=\47%s\47;printf "$s" "$s"" and the second argument is the same string. So the %s in the format will be replaced by the second argument. Wait, but the format has \47 which, when interpreted by printf, is octal 47, which is a single quote. So in the format string, s=\47%s\47 becomes s='%s' after replacing \47 with '. Then the %s is replaced by the argument, which is the original s value. Wait, this is getting a bit confusing. Maybe I should write down the actual evaluation step by step. Let me try to reproduce the program and see what the output is. The program is: s='s=\47%s\47;printf "$s" "$s"';printf "$s" "$s" Assuming this is a shell script (Bash, perhaps), let's parse this. The first part assigns the variable s to the string 's=\47%s\47;printf "$s" "$s"'. Then, the printf command is executed with the format string "$s" and the argument "$s". So, expanding variables, the printf command becomes: printf 's=\47%s\47;printf "$s" "$s"' "$s" So the format string is 's=\47%s\47;printf "$s" "$s"' with %s, and the argument is the value of s (same string). Now, when printf processes the format string, it replaces the %s with the argument. But the format string includes \47, which in the printf command is interpreted as an octal escape for the character with octal value 47, which is 39 in decimal, which is ASCII for '. Therefore, \47 becomes ', so the format string would be 's='%s';printf "$s" "$s"'. 
So substituting the argument (which is s='s=\47%s\47;printf "$s" "$s"') into the %s: s='s=\47%s\47;printf "$s" "$s"';printf "$s" "$s" Wait, but substituting the value of s into the format, which is now 's=%s' with the %s being replaced by s, which includes the same string. Wait, this seems like a quine, where the program generates its own source code. When you run the printf command, it replaces %s with the original s variable, but within a structure that reconstructs the original program. Let me try creating the output step by step. The original code is: s='s=\47%s\47;printf "$s" "$s"';printf "$s" "$s" The printf command is: printf "s=\47%s\47;printf "$s" "$s"" "$s" Because the $s in the format is replaced by the variable s. Wait, but variables inside single quotes are not expanded. Wait, no, in the assignment, the string is in single quotes, so variables inside aren't expanded. Then in the printf "$s" "$s", the $s are in double quotes, so they get expanded. Wait, but the first line is s='s=\47%s\47;printf "$s" "$s"'; Then, the second command is printf "$s" "$s". Because $s is in double quotes, the shell will replace $s with its value, which is 's=\47%s\47;printf "$s" "$s"'. Wait, but not exactly, because in the assignment, the variable s is assigned the literal string between the single quotes. Let's clarify with an example: suppose s is assigned to the literal characters s=\47%s\47;printf "$s" "$s". Because the assignment is s='...', so the single quotes around the value when assigning mean that everything inside is literal, so \47 is just three characters: backslash, 4, 7. But when you use printf "$s" "$s", the shell first substitutes the $s variables. So the first argument to printf is the value of s: s=\47%s\47;printf "$s" "$s". Then the second argument is the same. The printf command now has a format string where %s is present. Also, in the format string, there are \47 sequences. But when the shell's printf processes the format string, it will interpret backslash escapes. So the \47 in the format string, as parsed by printf, represents an octal escape code for the single quote. So let me reconstruct the formatted output: The format string after substitution is "s=\47%s\47;printf "$s" "$s"" (wait, not exactly, because variables inside double quotes are expanded, but inside the format string here it's the literal s value). Maybe I need to consider that when the line is parsed, the printf command receives the expanded variables. Wait, step by step: Original code: s='s=\47%s\47;printf "$s" "$s"';printf "$s" "$s" The first part assigns s to the string "s=\47%s\47;printf "$s" "$s"" (but with the $s inside the string being literal because of the single quotes). So after assignment, the variable s contains the literal characters: s=\47%s\47;printf "$s" "$s" Now, the printf command is executed as printf "$s" "$s". The shell replaces $s with its value, so the command becomes: printf 's=\47%s\47;printf "$s" "$s"' 's=\47%s\47;printf "$s" "$s"' Notice that the first argument is the format string: 's=\47%s\47;printf "$s" "$s"' But when printf processes this format string, it replaces the \47 with the octal value 47, which is a single quote. So the format string becomes: s='%s';printf "$s" "$s" Then, the %s is replaced with the second argument: 's=\47%s\47;printf "$s" "$s"' So the resulting output is: s='s=\47%s\47;printf "$s" "$s"';printf "$s" "$s" Which is exactly the original code. Therefore, this program outputs its own source code. Therefore, it's a quine. 
Now, regarding the language: this uses printf, which is a shell command. The syntax of variable assignment and the use of quotes indicates that it's a shell script. The initial assignment uses single quotes to define the variable s, then uses printf in a way that constructs the output by embedding the variable's content. Therefore, this code is written in a shell scripting language, probably Bash or similar. Such programs that produce their own source code as output are called quines. So putting this all together: the program is a quine written in a shell scripting language (like Bash) that prints its own source code when executed. </think>
The program is a quine, which is a program that outputs its own source code when executed. It is written in shell script (specifically compatible with Bourne shell or Bash). Here's how it works:
- Variable Assignment: The string s='s=\47%s\47;printf "$s" "$s"' assigns the variable s a format string containing \47 (an octal escape for the single quote '). The %s acts as a placeholder for string substitution.
- Print Command: The printf "$s" "$s" uses the value of s as both the format string and the argument. The shell expands $s, replacing %s with the original value of s and interpreting \47 as '.
- Self-Referential Output: When executed, the program constructs its identical source code by substituting %s with the escaped version of s, reproducing the original code.
Language: Shell script (Bash/Bourne-like shell).
Program Type: Quine.
Output:
s='s=\47%s\47;printf "$s" "$s"';printf "$s" "$s"
This matches the original code, confirming it's a quine.
Tokens used: 3376 (12922 characters)
Conclusion
As you can see from the example above, the model followed a series of logical steps to arrive at the answer:
- It examines the code and simulates parts of it to figure out what it does.
- At the same time, it notices constructs such as printf and the single-quote escapes and guesses that the language is a shell (Bash) script.
- It then simulates the entire program and realizes that the output is the same as the generating code.
- It knows from its world knowledge and training that such programs are called quines.
- Armed with all this information, it validates the entire chain of thought to be sure of its answer, and then finally formulates and prints the answer.
Task 3 – Solve an optimization problem with constraints
In everyday life, we often have to solve problems under budget and cost constraints, or when we want to maximize revenue, profit, or some other numerical target. We may have metrics to measure these quantities and rough guesses about the relationships between the variables. For example, given a fixed marketing budget and different channels such as online ads, LinkedIn/X posts, and emails to prospective attendees, how do we distribute that budget between channels with slightly different audiences and different efficacy in reach and sign-ups (say, for an online seminar or a conference)? Let us look at a similar problem, but in the manufacturing domain. This is the prompt that we will give:
I want to sell a certain number of items (children's toys) in order to maximize my profit. Market research tells me that if I set the price at $1.50, I will be able to sell 5000 items, and for every 10 cents I lower the price below $1.50 I will be able to sell another 1000 items. Suppose that my manufacturing setup costs ("start-up costs") total $2000, and the manufacturing cost per item ("marginal cost") is $0.50.
Find the price to set per item and the number of items that I need to sell in order to maximize my profit and also determine the maximum profit I can get. AI Toolkit Session follows: User I want to sell a certain number of items (children's toys) in order to maximize my profit. Market research tells me that if I set the price at $1.50, I will be able to sell 5000 items, and for every 10 cents I lower the price below $1.50 I will be able to sell another 1000 items. Suppose that my manufacturing setup costs ( “start-up costs” ) total $2000, and the manufacturing cost per item ( “marginal cost” ) is $0.50. Find the price to set per item and the number of items that I need to sell in order to maximize my profit and also determine the maximum profit I can get. Deepseek-R1 <think> Okay, let's try to figure out how to maximize the profit for selling these children's toys. Hmm, so the problem gives me some information about price, quantity sold, start-up costs, and marginal cost. I need to model this mathematically and then find the optimal price and quantity. Let's break it down step by step. First, the initial price is $1.50, and at that price, they can sell 5000 items. Then, for every 10 cents they lower the price below $1.50, they can sell an additional 1000 items. So, the price decrease affects the quantity sold. I need to model the relationship between price and quantity sold. Let me denote the number of times they lower the price by 10 cents as x. So, if x = 0, the price is $1.50, and quantity sold is 5000. If x = 1, the price is $1.40, and quantity sold is 5000 + 1000*1 = 6000. If x = 2, price is $1.30, quantity is 7000, and so on. So, in general, if they lower the price x times, the price becomes: Price = 1.50 - 0.10x dollars. And the quantity sold becomes: Quantity = 5000 + 1000x items. But x has to be a non-negative integer, right? Wait, or can x be any non-negative real number? The problem says "for every 10 cents I lower the price below $1.50 I will be able to sell another 1000 items." So, is the 10 cent decrease a discrete step or can we consider a continuous decrease? Hmm, the problem doesn't specify that we have to keep the price in multiples of 10 cents. Maybe it's continuous. But the phrasing is "for every 10 cents," which suggests that each 10 cent decrease gives an additional 1000 items. But maybe we can model it as a linear relationship. Let me check. If the price drops by 10 cents, quantity increases by 1000. So, per 1 cent decrease, quantity increases by 100 items? Wait, 10 cents correspond to 1000 items, so 1 cent decrease would correspond to 1000/10 = 100 items per cent. So, we can model the quantity sold as a linear function of the price decrease. Let me define the price p as a variable, then express the quantity sold in terms of p. Alternatively, if we consider p in dollars, the relationship is: when p decreases by $0.10, quantity increases by 1000. So, if we let p = 1.50 - 0.10x, then quantity q = 5000 + 1000x. Solving for x in terms of p: x = (1.50 - p)/0.10. Then plug into q: q = 5000 + 1000*(1.50 - p)/0.10. Let me compute that. q = 5000 + 1000*(1.50 - p)/0.10 = 5000 + 1000*( (1.50 - p) / 0.10 ). Let's compute the coefficient: 1000 / 0.10 = 10,000. So, q = 5000 + 10,000*(1.50 - p) = 5000 + 15,000 - 10,000p = 20,000 - 10,000p. Wait, that seems like a linear demand function where q = 20,000 - 10,000p. Let me check that. When p = 1.50, q = 20,000 - 10,000*(1.50) = 20,000 - 15,000 = 5000, which matches. 
If p decreases by $0.10 to 1.40, then q = 20,000 - 10,000*(1.40) = 20,000 - 14,000 = 6,000, which is 5000 + 1000. So that works. Therefore, the quantity sold is a linear function of price: q = 20,000 - 10,000p. So, we can model this as a linear demand curve. Alternatively, maybe they want us to model x as a continuous variable, so that x can take any non-negative real value, meaning that the price can be any value below $1.50, not necessarily in increments of 10 cents, and the quantity sold would adjust accordingly. So, if you lower the price by 5 cents, then you can sell 500 more items. But does the problem allow that? The market research says "for every 10 cents I lower the price below $1.50 I will be able to sell another 1000 items." So, it's a rate: 10 cents decrease => 1000 items more. So, that is $0.10 decrease corresponds to 1000 items increase. Therefore, the slope of the demand curve is dq/dp = -1000 / 0.10 = -10,000. So, the demand function is linear with slope -10,000. So, starting from p = 1.50, q = 5000. So, the equation is q = -10,000(p - 1.50) + 5000 = -10,000p + 15,000 + 5000 = 20,000 - 10,000p, which is the same as before. So, yes, continuous. Therefore, the quantity sold as a function of price is q(p) = 20,000 - 10,000p. Alternatively, we can write p as a function of q. Let's solve for p: q = 20,000 - 10,000p => 10,000p = 20,000 - q => p = (20,000 - q)/10,000 => p = 2 - q/10,000. So, the inverse demand function is p = 2 - q/10,000. Okay, so total revenue R is pq. So, R = pq = (2 - q/10,000)*q = 2q - q²/10,000. Total cost C consists of fixed costs (start-up costs) of $2000 and variable costs of $0.50 per item. So, total cost is C = 2000 + 0.50q. Therefore, profit π is total revenue minus total cost: π = R - C = (2q - q²/10,000) - (2000 + 0.50q) = 2q - q²/10,000 - 2000 - 0.50q Simplify: π = (2q - 0.50q) - q²/10,000 - 2000 = 1.50q - q²/10,000 - 2000. So, profit as a function of quantity q is π(q) = - (q²)/10,000 + 1.50q - 2000. To maximize profit, we need to find the value of q that maximizes π(q). Since this is a quadratic function with a negative coefficient on q², it opens downward, so the maximum is at the vertex. The vertex of a quadratic function ax² + bx + c is at x = -b/(2a). Here, a = -1/10,000 and b = 1.50. Therefore, the quantity that maximizes profit is: q = -b/(2a) = -1.50 / (2*(-1/10,000)) = -1.50 / (-2/10,000) = (1.50 * 10,000)/2 = (15,000)/2 = 7,500. So, the profit-maximizing quantity is 7,500 items. Then, the corresponding price can be found from the inverse demand function p = 2 - q/10,000 = 2 - 7,500/10,000 = 2 - 0.75 = $1.25 per item. Let me verify that. So, at q = 7500, p = $1.25. Then, total revenue R = p*q = 1.25 * 7500 = $9,375. Total cost C = 2000 + 0.50*7500 = 2000 + 3750 = $5,750. Profit π = 9,375 - 5,750 = $3,625. Is this the maximum profit? Let's check another point. Suppose q = 7000. Then, p = 2 - 7000/10,000 = 2 - 0.7 = $1.30. Revenue = 1.30 * 7000 = $9,100. Cost = 2000 + 0.50*7000 = 2000 + 3500 = $5,500. Profit = 9,100 - 5,500 = $3,600, which is less than $3,625. Similarly, q = 8000, which would be p = 2 - 8000/10,000 = 2 - 0.8 = $1.20. Revenue = 1.20 * 8000 = $9,600. Cost = 2000 + 0.50*8000 = 2000 + 4000 = $6,000. Profit = 9,600 - 6,000 = $3,600, again less than $3,625. So, 7500 seems to be the maximum. Alternatively, since we derived this using calculus (vertex of parabola), it's correct. But just to check. Alternatively, if we take derivative of π(q) with respect to q: dπ/dq = 1.50 - 2q/10,000. 
Set derivative equal to zero: 1.50 - (2q)/10,000 = 0 => (2q)/10,000 = 1.50 => q = (1.50 * 10,000)/2 = 7,500. So, same result. Wait, but in the original problem, they mention that for every 10 cents decrease, you can sell another 1000 items. So, starting at $1.50 with 5000 items. Then, decreasing by 10 cents increments gives 1000 more each time. So, how does it work at 7500 items? That would be an increase of 2500 items from the original 5000. Since each 10 cents decrease gives 1000 more, 2500 would be 2.5 times 10 cents. So, decrease price by 25 cents? Wait, but in the model we had, q = 20,000 - 10,000p. So, if p is $1.25, which is a $0.25 decrease from $1.50, then yes, (1.50 - 1.25)/0.10 = 2.5, so x=2.5. Then, quantity sold is 5000 + 1000*2.5 = 7500. So, even though x is 2.5, which is not an integer, the problem didn't specify that the price has to be in 10-cent increments, so allowing x to be a real number. So, in reality, you could set the price at $1.25, even though it's a 25-cent decrease, which is 2.5 times 10 cents. So, the model allows for continuous variables, which is fine. So, the conclusion is that the optimal price is $1.25 per item, selling 7500 items, resulting in a maximum profit of $3,625. Wait, let me just confirm profit calculation once again: Revenue: 7500 * 1.25 = 7500 * 1 + 7500 * 0.25 = 7500 + 1875 = 9375. Cost: 2000 + 7500 * 0.50 = 2000 + 3750 = 5750. Profit: 9375 - 5750 = 3625. Yes, that's $3,625. So, all checks out. So, the answer is price of $1.25, quantity of 7500, profit of $3,625. Final Answer The optimal price is \boxed{1.25} dollars per item, the number of items to sell is \boxed{7500}, and the maximum profit is \boxed{3625} dollars. </think> To determine the optimal price, quantity, and maximum profit for selling children's toys, we start by modeling the relationship between price and quantity sold. The market research indicates that for every $0.10 decrease in price below $1.50, the quantity sold increases by 1000 items. This results in a linear demand function. Demand Function: The price ( p ) and quantity ( q ) are related by the equation: [ q = 20,000 - 10,000p ] Solving for ( p ) gives the inverse demand function: [ p = 2 - \frac{q}{10,000} ] Profit Function: Total revenue ( R ) is given by ( R = p \times q ): [ R = \left(2 - \frac{q}{10,000}\right) q = 2q - \frac{q^2}{10,000} ] Total cost ( C ) includes fixed costs of $2000 and variable costs of $0.50 per item: [ C = 2000 + 0.50q ] Profit ( \pi ) is total revenue minus total cost: [ \pi = \left(2q - \frac{q^2}{10,000}\right) - (2000 + 0.50q) = 1.50q - \frac{q^2}{10,000} - 2000 ] Maximizing Profit: The profit function is a quadratic equation, and the maximum profit occurs at the vertex. For the quadratic equation ( \pi(q) = -\frac{q^2}{10,000} + 1.50q - 2000 ), the vertex is found at: [ q = -\frac{b}{2a} = -\frac{1.50}{2 \left(-\frac{1}{10,000}\right)} = \frac{1.50 \times 10,000}{2} = 7,500 ] Substituting ( q = 7,500 ) into the inverse demand function gives the optimal price: [ p = 2 - \frac{7,500}{10,000} = 2 - 0.75 = 1.25 ] Calculating Maximum Profit: Total revenue at ( q = 7,500 ) and ( p = 1.25 ): [ R = 1.25 \times 7,500 = 9,375 ] Total cost: [ C = 2000 + 0.50 \times 7,500 = 2000 + 3750 = 5,750 ] Maximum profit: [ \pi = 9,375 - 5,750 = 3,625 ] Final Answer The optimal price is \boxed{1.25} dollars per item, the number of items to sell is \boxed{7500}, and the maximum profit is \boxed{3625} dollars. 
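The closed-form answer is easy to sanity-check outside the model. Below is a minimal Python sketch (my own addition, not part of the original session) that grid-searches prices in 1-cent steps using the same demand and cost assumptions; it should land on the same result of $1.25, 7,500 items, and $3,625 profit.

# Sanity check of the optimization result.
# Demand (from the problem's assumptions): q = 20000 - 10000 * p
# Profit: pi(p) = p * q - (2000 + 0.50 * q)

def profit(p):
    q = 20000 - 10000 * p
    return p * q - (2000 + 0.50 * q)

prices = [cents / 100 for cents in range(0, 201)]  # $0.00 .. $2.00 in 1-cent steps
best_price = max(prices, key=profit)
best_quantity = 20000 - 10000 * best_price
print(best_price, best_quantity, profit(best_price))  # expected: 1.25 7500.0 3625.0

Because the profit curve is a downward-opening parabola whose vertex falls exactly on a 1-cent step, the grid search agrees with the calculus-based answer.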
Tokens used: 3399 (10017 characters)
Conclusion
As you can see, the model lays out the various steps logically: it uses its knowledge of the world (profit = revenue - cost), builds an algebraic function linking the behavior of the variables, and then uses calculus to maximize the profit function, arriving at the right price and quantity for maximum profit.
Resources
- Reasoning models on OpenAI - https://platform.openai.com/docs/guides/reasoning
- Bring your own models on AI Toolkit - using Ollama and API keys
- Building Retrieval Augmented Generation on VSCode & AI Toolkit
Recipe Generator Application with Phi-3 Vision on AI Toolkit Locally
In today's data-driven world, images have become a ubiquitous source of information. From social media feeds to medical imaging, we encounter and generate images constantly. Extracting meaningful insights from this visual data requires sophisticated analysis techniques. In this blog post, let's build an image analysis application using the cutting-edge Phi-3 Vision model, completely free of cost and in an on-premises environment, using the VS Code AI Toolkit. We'll explore the exciting possibilities that this powerful combination offers.
The AI Toolkit for Visual Studio Code (VS Code) is a VS Code extension that simplifies generative AI app development by bringing together cutting-edge AI development tools and models. I would recommend going through the following blogs for getting started with the VS Code AI Toolkit:
1. Visual Studio Code AI Toolkit: How to Run LLMs locally
2. Visual Studio AI Toolkit : Building Phi-3 GenAI Applications
3. Building Retrieval Augmented Generation on VSCode & AI Toolkit
4. Bring your own models on AI Toolkit - using Ollama and API keys
Setup VS Code AI Toolkit:
Launch the VS Code application and click on the VS Code AI Toolkit extension. Log in to your GitHub account if you have not already done so. Once ready, click on the model catalog. The model catalog lists many models, broadly classified into two categories:
- Local Run (with CPU and with GPU)
- Remote Access (hosted by GitHub and other providers)
For this blog, we will be using a Local Run model, which utilizes the local machine's hardware to run the language model. Since the application involves analyzing images, we need a model that supports vision operations, and Phi-3 Vision is a good fit because it is lightweight and supports local runs. Download the model; it will then be loaded in the playground for testing.
Once downloaded, launch the "Playground" tab and load the Phi-3 Vision model from the dropdown. The Playground also shows that Phi-3 Vision allows image attachments. We can try it out before we start developing the application. Let's upload an image using the "Paperclip" icon on the UI. I have uploaded an image of the Microsoft logo and prompted the language model to analyze and explain the image. Phi-3 Vision running locally not only detects the company logo but also identifies the name with impressive precision. This is a simple use case, but it can be built upon to unlock a world of new possibilities.
Port Forwarding:
Port forwarding, a valuable feature within the AI Toolkit, serves as the gateway for communicating with the GenAI model. To set it up, launch the terminal and navigate to the "Ports" section. Click the "Forward a Port" button and select any desired port; in this blog we will use port 5272. The model-as-a-server is now ready: the model is available on port 5272 to respond to API calls. It can be tested with any API testing application. To know more, click here.
Creating Application with Python using OpenAI SDK:
To follow this section, Python must be installed on the local machine. Launch a new VS Code window and set the working directory. Create a new Python virtual environment. Once the setup is ready, open the terminal in VS Code and install the libraries using "pip".
pip install openai
pip install streamlit

Before we build the Streamlit application, let's develop the basic program and check the responses in the VS Code terminal; we will then develop a basic web app using the Streamlit framework.
Basic Program
Import libraries:

import base64
from openai import OpenAI

base64: The base64 module provides functions for encoding binary data to base64-encoded strings and decoding base64-encoded strings back to binary data. Base64 encoding is commonly used for representing binary data in text-based formats such as JSON or XML.
OpenAI: The openai package is a Python client library for interacting with OpenAI-compatible APIs. The OpenAI class provides methods for accessing various services, such as generating text, performing natural language processing tasks, and more.
Initialize Client: Initialize an instance of the OpenAI class from the openai package:

client = OpenAI(
    base_url="http://127.0.0.1:5272/v1/",
    api_key="xyz"  # required by the SDK but not used by the local server
)

OpenAI(): Initializes the client with a base URL and an API key. This instance will be used to interact with the local model through the OpenAI-compatible API.
base_url="http://127.0.0.1:5272/v1/": Specifies the base URL for the API. In this case, it points to a local server running on 127.0.0.1 (localhost) at port 5272.
api_key="xyz": The API key normally used to authenticate requests. When using the AI Toolkit local server, no real key is needed; any placeholder value works.
The image analysis application will frequently deal with images uploaded by users. But to send these images to the GenAI model, we need them in a format it understands. This is where the encode_image function comes in.

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

Function Definition: def encode_image(image_path): defines a function named encode_image that takes a single argument, image_path. This argument represents the file path of the image we want to encode.
Opening the Image: with open(image_path, "rb") as image_file: opens the image file specified by image_path in binary reading mode ("rb"). This is crucial because we're dealing with raw image data, not text.
Reading Image Content: image_file.read() reads the entire content of the image file into a byte string. Remember, images are stored as collections of bytes representing color values for each pixel.
Base64 Encoding: base64.b64encode(...) encodes the bytes containing the image data into base64 format. Base64 encoding is a way to represent binary data using printable characters, which makes it easier to transmit or store the data.
Decoding to UTF-8: .decode("utf-8") decodes the base64-encoded bytes into a UTF-8 string. This step is necessary because the API expects text input, and the base64-encoded string can be treated as text.
Returning the Encoded Image: return returns the base64-encoded string representation of the image. This encoded string is what we'll send to the AI model for analysis.
In essence, the encode_image function acts as a bridge, transforming an image file on your computer into a format that the AI model can understand and process.
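One small caveat worth noting: the code in the next section hard-codes image/jpeg as the MIME type in the data URL, which is fine for JPEGs but technically wrong for PNGs. A hedged variant (my own addition, not from the original post) that guesses the MIME type with the standard-library mimetypes module could look like this:

import base64
import mimetypes

def encode_image_as_data_url(image_path):
    """Return a data URL whose MIME type matches the actual file extension (illustrative helper)."""
    mime, _ = mimetypes.guess_type(image_path)
    mime = mime or "image/jpeg"  # fall back to JPEG if the type cannot be guessed
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

In practice many vision endpoints tolerate a mismatched MIME type, so treat this as an optional refinement rather than a requirement.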
Path for the Image: We will use an image stored on our local machine for this section; while developing the web app, we will change this to accept whatever the user uploads.

image_path = "C:/img.jpg"  # path of the image here

This line of code is crucial for any program that needs to interact with an image file. It provides the information the program needs to locate and access the image data.
Base64 String:

# Getting the base64 string
base64_image = encode_image(image_path)

This line of code obtains the base64-encoded representation of the image specified by image_path. Let's break it down:
encode_image(image_path): This calls the encode_image function discussed earlier. It reads the image file from the specified path, converts the image data into a base64-encoded string, and returns the resulting string.
base64_image = ...: This assigns the return value of the encode_image function to the variable base64_image.
This section effectively fetches the image from the given location and transforms it into a format (base64) that can be easily handled and transmitted. This base64-encoded string will be used subsequently to send the image data to the AI model for analysis.
Invoking the Language Model: This code tells the AI model what to do with the image.

response = client.chat.completions.create(
    model="Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in the Image?",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
)

response = client.chat.completions.create(...): This line sends instructions to the AI model we're using (represented by client). Here's a breakdown of what it's telling the model:
chat.completions.create: We're using the part of the OpenAI API designed for conversation-like interaction with the model.
The remaining arguments define what we want the model to do, which we'll explore next:
1) model="Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx": This tells the client exactly which model to use for analysis. In our case, it's the Phi-3 Vision model.
2) messages: This defines the information we're providing to the model. Here, we're sending one user message with two pieces of content:
"role": "user": This specifies that the message comes from a user (us).
The content includes two parts:
"What's in the Image?": This is the prompt we're sending to the model about the image.
"image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}: This sends the actual image data encoded in base64 format (stored in base64_image).
In a nutshell, this code snippet acts like giving instructions to the AI model. We specify the model to use, tell it we have a question about an image, and then provide the image data itself.
Printing the response on the console:

print(response.choices[0].message.content)

We asked the AI model "What's in the Image?"; this line displays the AI's answer.
Console response: Finally, we can see the response on the terminal.
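For reference, here is the basic console program assembled in one piece. Treat it as a sketch: it assumes the AI Toolkit server is forwarded on port 5272 and that the model name matches the one downloaded earlier, so adjust both to your setup.

import base64
from openai import OpenAI

# Client pointing at the local AI Toolkit server (port chosen in the Port Forwarding step).
client = OpenAI(base_url="http://127.0.0.1:5272/v1/", api_key="xyz")  # key is a placeholder

def encode_image(image_path):
    # Read the image bytes and return them as a base64 string.
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image("C:/img.jpg")  # path of the image here

response = client.chat.completions.create(
    model="Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in the Image?"},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)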
Now to make things more interesting, let's convert this into a web app using the Streamlit framework.
Recipe Generator Application with Streamlit:
Now we know how to interact with the Vision model offline using a basic console. Let's make things even more exciting by applying all this to a use case that will probably be most loved by cooking enthusiasts! Yes, let's create an application that assists in cooking by looking at what's in an image of ingredients!
Create a new file, name it "app.py", and select the same venv that was used earlier. Make sure the Visual Studio Code AI Toolkit is running and serving the Phi-3 Vision model through port 5272.
The first step is importing the libraries:

import streamlit as st
import base64
from openai import OpenAI

base64 and OpenAI are the same as in the earlier section.
Streamlit: This imports the Streamlit library, which provides a powerful set of tools for creating user interfaces (UIs) with Python. Streamlit simplifies the process of building web apps by allowing you to write Python scripts that directly translate into interactive web pages.

client = OpenAI(
    base_url="http://127.0.0.1:5272/v1/",
    api_key="xyz"  # required by API but not used
)

As discussed in the earlier section, this initializes the client and configures the base_url and api_key.

st.title('Recipe Generator 🍔')
st.write('This is a simple recipe generator application. Upload images of the ingredients and get the recipe by Chef GenAI! 🧑🍳')
uploaded_file = st.file_uploader("Choose a file")
if uploaded_file is not None:
    st.image(uploaded_file, width=300)

st.title('Recipe Generator 🍔'): This line sets the title of the Streamlit application as "Recipe Generator" with a visually appealing burger emoji.
st.write(...): This line displays a brief description of the application's functionality to the user.
uploaded_file = st.file_uploader("Choose a file"): This creates a file uploader component within the Streamlit app. Users can select and upload an image file (likely an image of ingredients).
if uploaded_file is not None: This conditional block executes only when the user has actually selected and uploaded a file.
st.image(uploaded_file, width=300): If an image is uploaded, this line displays the uploaded image within the Streamlit app with a width of 300 pixels.
In essence, this code establishes the basic user interface for the Recipe Generator app. It allows users to upload an image and, if an image is uploaded, displays it within the app.

preference = st.sidebar.selectbox(
    "Choose your preference",
    ("Vegetarian", "Non-Vegetarian")
)
cuisine = st.sidebar.selectbox(
    "Select for Cuisine",
    ("Indian", "Chinese", "French", "Thai", "Italian", "Mexican", "Japanese", "American", "Greek", "Spanish")
)

We use Streamlit's sidebar and selectbox features to create interactive user input options within the web application:
preference = st.sidebar.selectbox(...): This line creates a dropdown menu (selectbox) within the sidebar of the Streamlit application. The first argument, "Choose your preference", sets the label for the dropdown. The second argument, ("Vegetarian", "Non-Vegetarian"), defines the list of options available for the user to select (in this case, dietary preferences).
cuisine = st.sidebar.selectbox(...): This line creates another dropdown menu in the sidebar, this time for selecting the desired cuisine. The label is "Select for Cuisine". The options provided include "Indian", "Chinese", "French", and several other popular cuisines.
In essence, this code allows users to interact with the application by selecting their preferred dietary restrictions (Vegetarian or Non-Vegetarian) and desired cuisine from the dropdown menus in the sidebar.
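A small optional refinement (my addition, not part of the original post): Streamlit's file uploader accepts a type parameter, so you can restrict uploads to image files and avoid sending arbitrary documents to the model.

# Variant of the uploader shown above, limited to common image formats.
uploaded_file = st.file_uploader("Choose a file", type=["jpg", "jpeg", "png"])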
def encode_image(uploaded_file):
    """Encodes a Streamlit uploaded file into base64 format"""
    if uploaded_file is not None:
        content = uploaded_file.read()
        return base64.b64encode(content).decode("utf-8")
    else:
        return None

base64_image = encode_image(uploaded_file)

This is the same encode_image idea discussed in the earlier section, adapted to read from the Streamlit uploaded file instead of a path on disk.

if st.button("Ask Chef GenAI!"):
    if base64_image:
        response = client.chat.completions.create(
            model="Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx",
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": f"STRICTLY use the ingredients in the image to generate a {preference} recipe and {cuisine} cuisine.",
                        },
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                        },
                    ],
                }
            ],
        )
        print(response.choices[0].message.content)
        st.write(response.choices[0].message.content)
    else:
        st.write("Please upload an image with any number of ingredients and instantly get a recipe.")

The code block above implements the core functionality of the Recipe Generator app, triggered when the user clicks the button labeled "Ask Chef GenAI!":
if st.button("Ask Chef GenAI!"): This line checks whether the user has clicked the button. If they have, the code within the if block executes.
if base64_image: This inner condition checks whether the variable base64_image has a value, i.e. whether a base64-encoded representation of the uploaded image (containing the ingredients) exists. If it does, the code proceeds.
client.chat.completions.create(...): The client defined earlier calls the chat completions API, thereby invoking the vision language model. The arguments specify the model to be used ("Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx") and the message to send. The message content consists of two parts within a list:
User Input: The first part defines the user's role ("user") and the instruction they provide, which has two key points:
Dietary Preference: It tells the model to "STRICTLY use the ingredients in the image" to generate a recipe that adheres to the user's preference (vegetarian or non-vegetarian, set using the preference dropdown).
Cuisine Preference: It mentions the desired cuisine type (Indian, Chinese, etc., selected using the cuisine dropdown).
Image Data: The second part provides the image data itself. It includes the type ("image_url") and the URL, which is constructed using the base64_image variable containing the base64-encoded image data.
print(response.choices[0].message.content) & st.write(...): The response contains a list of possible completions. The code retrieves the first completion (response.choices[0]) and extracts its message content. This content is printed to the console, as before, and displayed in the Streamlit app using st.write.
else block: If no image is uploaded (i.e., base64_image is empty), the else block executes and displays a message reminding the user to upload an image to get recipe recommendations.
This code block is essentially the same as the console version, modified to accept a few inputs and made compatible with Streamlit. The coding is now complete for our Streamlit application! It's time to test the application.
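For convenience, here is the complete app.py assembled from the snippets above. Treat it as a sketch rather than the canonical source: the model name and port must match your own AI Toolkit setup, and the API key is only a placeholder.

import base64
import streamlit as st
from openai import OpenAI

# Local AI Toolkit server exposed via port forwarding.
client = OpenAI(base_url="http://127.0.0.1:5272/v1/", api_key="xyz")  # key is a placeholder

st.title('Recipe Generator 🍔')
st.write('This is a simple recipe generator application. Upload images of the ingredients and get the recipe by Chef GenAI! 🧑🍳')

uploaded_file = st.file_uploader("Choose a file")
if uploaded_file is not None:
    st.image(uploaded_file, width=300)

preference = st.sidebar.selectbox("Choose your preference", ("Vegetarian", "Non-Vegetarian"))
cuisine = st.sidebar.selectbox(
    "Select for Cuisine",
    ("Indian", "Chinese", "French", "Thai", "Italian", "Mexican", "Japanese", "American", "Greek", "Spanish"),
)

def encode_image(uploaded_file):
    """Encodes a Streamlit uploaded file into base64 format."""
    if uploaded_file is not None:
        return base64.b64encode(uploaded_file.read()).decode("utf-8")
    return None

base64_image = encode_image(uploaded_file)

if st.button("Ask Chef GenAI!"):
    if base64_image:
        response = client.chat.completions.create(
            model="Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx",
            messages=[
                {
                    "role": "user",
                    "content": [
                        {"type": "text",
                         "text": f"STRICTLY use the ingredients in the image to generate a {preference} recipe and {cuisine} cuisine."},
                        {"type": "image_url",
                         "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}},
                    ],
                }
            ],
        )
        st.write(response.choices[0].message.content)
    else:
        st.write("Please upload an image with any number of ingredients and instantly get a recipe.")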
Navigate to the terminal in Visual Studio Code and enter the following command (if the file is named app.py):

streamlit run app.py

Upon a successful run, it will open the default browser and launch the Recipe Generator screen. Upload an image with ingredients, select the preference and cuisine, and click on "Ask Chef GenAI". It will take a few moments to generate a delightful recipe. While it generates, we can see the logs on the terminal, and finally the recipe will be shown on the screen! Enjoy your first recipe curated by Chef GenAI, powered by the Phi-3 Vision model running on-premises using the Visual Studio Code AI Toolkit! The code is available on the following GitHub Repository. In the upcoming series we will explore more types of Gen AI implementations with the AI Toolkit.
Resources:
1. Visual Studio Code AI Toolkit: Run LLMs locally
2. Visual Studio AI Toolkit : Building Phi-3 GenAI Applications
3. Building Retrieval Augmented Generation on VSCode & AI Toolkit
4. Bring your own models on AI Toolkit - using Ollama and API keys
5. Expanded model catalog for AI Toolkit
6. Azure Toolkit Samples GitHub Repository