knowledge mining
AI-900: Microsoft Azure AI Fundamentals Study Guide
This comprehensive study guide provides a thorough overview of the topics covered in the Microsoft Azure AI Fundamentals (AI-900) exam, including Artificial Intelligence workloads, fundamental principles of machine learning, and computer vision and natural language processing workloads. Learn about the exam's intended audience, how to earn the certification, and the skills measured as of April 2022. Discover the important considerations for responsible AI, the capabilities of Azure Machine Learning Studio, and more. Get ready to demonstrate your knowledge of AI and ML concepts and related Microsoft Azure services with this helpful study guide.

Extract Data from PDFs using Form Recognizer with Code or Without!
Form Recognizer is a powerful tool that helps you build a variety of document machine learning solutions. It is one service, but it's made up of many prebuilt models that can perform a variety of essential document functions. You can even custom train a model using supervised or unsupervised learning for tasks outside the scope of the prebuilt models! Read more about all the features of Form Recognizer here.

In this example we will look at how to use one of the prebuilt models in the Form Recognizer service to extract the data from a PDF document dataset. Our documents are invoices with common data fields, so we are able to use the prebuilt model without having to build a customized model.

Sample Invoice:

After we take a look at how to do this with Python and Azure Form Recognizer, we will look at how to do the same process with no code using the Power Platform services: Power Automate and the Form Recognizer capability built into AI Builder. The Power Automate flow schedules a process to run every day. The process looks in the raw blob container to see if there are new files to be processed. If there are, it gets all blobs from the container and loops through each one to extract the PDF data using a prebuilt AI Builder step, then deletes the processed document from the raw container. See what it looks like below.

Power Automate Flow:

Prerequisites for Python
- Azure Account (sign up here!)
- Anaconda and/or VS Code
- Basic programming knowledge

Prerequisites for Power Automate
- Power Automate Account (sign up here!)
- No programming knowledge

Process PDFs with Python and Azure Form Recognizer Service

Create Services

First let's create the Form Recognizer Cognitive Service. Go to portal.azure.com to create the resource, or click this link.

Now let's create a storage account to store the PDF dataset we will be using in containers. We want two containers: one for the processed PDFs and one for the raw, unprocessed PDFs.
- Create an Azure Storage Account
- Create two containers: processed, raw

Upload data

Upload your dataset to the Azure Storage raw container, since the files still need to be processed. Once processed, they will be moved to the processed container. The result should look something like this:
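If you would rather script the container setup and upload than click through the portal, here is a minimal sketch using the azure-storage-blob SDK. The connection string and local folder path are placeholders, not values from the original walkthrough:

```python
import os

from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

# Assumption: connection string copied from the storage account's "Access keys" blade
connect_str = "<your storage connection string>"
blob_service_client = BlobServiceClient.from_connection_string(connect_str)

# Create the two containers; ignore the error if they already exist
for name in ("raw", "processed"):
    try:
        blob_service_client.create_container(name)
    except ResourceExistsError:
        pass

# Upload every PDF in a local folder (hypothetical path) to the raw container
raw_client = blob_service_client.get_container_client("raw")
local_folder = "invoices"  # hypothetical folder holding the PDF dataset
for file_name in os.listdir(local_folder):
    if file_name.lower().endswith(".pdf"):
        with open(os.path.join(local_folder, file_name), "rb") as data:
            raw_client.upload_blob(file_name, data, overwrite=True)
```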
endpoint = "<your endpoint>" key = "<your key>" We then use the endpoint and key to connect to the service and create the FormRecongizerClient form_recognizer_client = FormRecognizerClient(endpoint, AzureKeyCredential(key)) Create the print_results helper function for use later to print out the results of each invoice. def print_result(invoices, blob_name): for idx, invoice in enumerate(invoices): print("--------Recognizing invoice {}--------".format(blob_name)) vendor_name = invoice.fields.get("VendorName") if vendor_name: print("Vendor Name: {} has confidence: {}".format(vendor_name.value, vendor_name.confidence)) vendor_address = invoice.fields.get("VendorAddress") if vendor_address: print("Vendor Address: {} has confidence: {}".format(vendor_address.value, vendor_address.confidence)) customer_name = invoice.fields.get("CustomerName") if customer_name: print("Customer Name: {} has confidence: {}".format(customer_name.value, customer_name.confidence)) customer_address = invoice.fields.get("CustomerAddress") if customer_address: print("Customer Address: {} has confidence: {}".format(customer_address.value, customer_address.confidence)) customer_address_recipient = invoice.fields.get("CustomerAddressRecipient") if customer_address_recipient: print("Customer Address Recipient: {} has confidence: {}".format(customer_address_recipient.value, customer_address_recipient.confidence)) invoice_id = invoice.fields.get("InvoiceId") if invoice_id: print("Invoice Id: {} has confidence: {}".format(invoice_id.value, invoice_id.confidence)) invoice_date = invoice.fields.get("InvoiceDate") if invoice_date: print("Invoice Date: {} has confidence: {}".format(invoice_date.value, invoice_date.confidence)) invoice_total = invoice.fields.get("InvoiceTotal") if invoice_total: print("Invoice Total: {} has confidence: {}".format(invoice_total.value, invoice_total.confidence)) due_date = invoice.fields.get("DueDate") if due_date: print("Due Date: {} has confidence: {}".format(due_date.value, due_date.confidence)) Connect to Blob Storage Now lets connect to our blob storage containers and create the BlobServiceClient. We will use the client to connect to the raw and processed containers that we created earlier. # Create the BlobServiceClient object which will be used to get the container_client connect_str = "<Get connection string from the Azure Portal>" blob_service_client = BlobServiceClient.from_connection_string(connect_str) # Container client for raw container. raw_container_client = blob_service_client.get_container_client("raw") # Container client for processed container processed_container_client = blob_service_client.get_container_client("processed") # Get base url for container. invoiceUrlBase = raw_container_client.primary_endpoint print(invoiceUrlBase) HINT: If you get a "HttpResponseError: (InvalidImageURL) Image URL is badly formatted." error make sure the proper permissions to access the container are set. Learn more about Azure Storage Permissions here Extract Data from PDFs We are ready to process the blobs now! Here we will call list_blobs to get a list of blobs in the raw container. Then we will loop through each blob, call the begin_recognize_invoices_from_url to extract the data from the PDF. Then we have our helper method to print the results. Once we have extracted the data from the PDF we will upload_blob to the processed folder and delete_blob from the raw folder. 
print("\nProcessing blobs...") blob_list = raw_container_client.list_blobs() for blob in blob_list: invoiceUrl = f'{invoiceUrlBase}/{blob.name}' print(invoiceUrl) poller = form_recognizer_client.begin_recognize_invoices_from_url(invoiceUrl) # Get results invoices = poller.result() # Print results print_result(invoices, blob.name) # Copy blob to processed processed_container_client.upload_blob(blob, blob.blob_type, overwrite=True) # Delete blob from raw now that its processed raw_container_client.delete_blob(blob) Each result should look similar to this for the above invoice example: The prebuilt invoices model worked great for our invoices so we don't need to train a customized Form Recognizer model to improve our results. But what if we did and what if we didn't know how to code?! You can still leverage all this awesomeness in AI Builder with Power Automate without writing any code. We will take a look at this same example in Power Automate next. Use Form Recognizer with AI Builder in Power Automate You can achieve these same results using no code with Form Recognizer in AI Builder with Power Automate. Lets take a look at how we can do that. Create a New Flow Log in to Power Automate Click Create then click Scheduled Cloud Flow . You can trigger Power Automate flows in a variety of ways so keep in mind that you may want to select a different trigger for your project. Give the Flow a name and select the schedule you would like the flow to run on. Connect to Blob Storage Click New Step List blobs Step Search for Azure Blob Storage and select List blobs Select the ellipsis click Create new connection if your storage account isn't already connected Fill in the Connection Name , Azure Storage Account name (the account you created), and the Azure Storage Account Access Key (which you can find in the resource keys in the Azure Portal) Then select Create Once the storage account is selected click the folder icon on the right of the list blobs options. You should see all the containers in the storage account, select raw . Your flow should look something like this: Loop Through Blobs to Extract the Data Click the plus sign to create a new step Click Control then Apply to each Select the textbox and a list of blob properties will appear. Select the value property Next select add action from within the Apply to each Flow step. Add the Get blob content step: Search for Azure Blob Storage and select Get blob content Click the textbox and select the Path property. This will get the File content that we will pass into the Form Recognizer. Add the Process and save information from invoices step: Click the plus sign and then add new action Search for Process and save information from invoices Select the textbox and then the property File Content from the Get blob content section Add the Copy Blob step: Repeat the add action steps Search for Azure Blob Storage and select Copy Blob Select the Source url text box and select the Path property Select the Destination blob path and put /processed for the processed container Select Overwrite? dropdown and select Yes if you want the copied blob to overwrite blobs with the existing name. Add the Delete Blob step: Repeat the add action steps Search for Azure Blob Storage and select Delete Blob Select the Blob text box and select the Path property The Apply to each block should look something like this: Save and Test the Flow Once you have completed creating the flow save and test it out using the built in test features that are part of Power Automate. 
This prebuilt model again worked great on our invoice data. However, if you have a more complex dataset, use AI Builder to label and create a customized machine learning model for your specific dataset. Read more about how to do that here.

Conclusion

We went over only a fraction of the things that you can do with Form Recognizer, so don't let the learning stop here! Check out the highlights below of the newly announced Form Recognizer features, and the additional doc links to dive deeper into what we did here.

Additional Resources
- New Form Recognizer Features
- What is Form Recognizer?
- Quickstart: Use the Form Recognizer client library or REST API
- Tutorial: Create a form-processing app with AI Builder
- AI Developer Resources page
- AI Essentials video including Form Recognizer

Personal AI
How could Microsoft get it so wrong? I tried to use Azure and AI and it's just matrix calculations. OpenAI is like a brute force training method, but at least it's visualized. I just want a dog, a pet cat. An annoying robot that helps me with math and does my homework and that I can play games with, that will learn and grow with me. And all you have is just a bunch of worthless databases, like your CD collection, or vintage games. Bunch of comic book nerds with their libraries of books. It's like walking into a library and you don't know how to read. Here I was thinking that paperclip from Word would have progressed into an annoying helper, but at least I have dancing girls whose **bleep** I can vibrate. Seriously, can you not just make something simple? Someone somewhere in that huge office must know where the AI program is and could make a simple program that is like a Furby. That would be as helpful as the paperclip with a comprehensive help toolkit. Something that works like an operating system for something robotic I want to build. Could someone please help me make something simple that can do basic reading comprehension, so I can at least talk to it and get it to read a book and answer questions? Maybe one that can learn maths and can 3D-model things so I can visualize them. Or maybe something that can just learn by watching me and can recall something I need for reference: "I saw something on facebook" (pulls out phone to show person), asks the AI something vague, and it instantly pulls it up.

Semantic Search in Action
Azure Cognitive Search now includes a new feature called semantic search, and customers have already put it into action. In early May 2021, Ogilvy, a subsidiary of WPP, incorporated semantic search into their Enterprise Knowledge Management system, called Starfish. The project is based around a content Discovery portal, which should be the first point of contact for users and a key component in Ogilvy's rich ecosystem. It uses Cognitive Search to provide intelligent document insights and recommendations on RFIs, RFPs, and case studies, leading to faster and more efficient responses to new business requests.

A client typically asks a series of questions, starting with inquiries about Ogilvy as a company, its capabilities and accomplishments, similar work for peer companies, and fees. On the Starfish portal they would ask the following.

When Ogilvy receives an RFI, it will include some basic questions about Ogilvy:
- Where are Ogilvy's headquarters?
- What are Ogilvy's core competencies?
- Who are Ogilvy's biggest customers?

RFIs will also include deeper questions about Ogilvy's experience and how they think/work:
- Give an example of Ogilvy's work <- good answer

Ogilvy may also want to reference past customer scenarios to show how they solved problems in the past:
- What was Ogilvy's campaign for Fanta?
- When was Fanta discovered?

Without semantic search, query terms are analyzed via similarity algorithms, using a term frequency that counts the number of times a term appears in a document or within a document corpus. A probability is then applied to estimate whether a match is relevant. Intent is lacking in most web experiences. Overall, semantic search has significantly advanced the quality of search results.

Technology benefits:
- Intelligent ranking – uses a semantic ranking model, so search is based on context and intent, elevating matches that make more sense given the relevance of the content in the results.
- Better query understanding – it is based on meaning, not just the syntax of the word, unlike technologies that rely on text frequency. Compare "WHO sent a message" (World Health Organization) vs. "Who is the father…?"
- Semantic answers – it improves the quality of search results in two ways. First, the ranking of documents that are semantically closer to the intent of the original query is a significant benefit. Second, results are more immediately consumable when captions, and potentially answers, are present on the page. At all times, the engine is working with existing content. Language models used in semantic search are designed to extract an intact string that looks like an answer, but won't try to compose a new string as an answer to a query or as a caption for a matching document. Deep neural nets from Bing are used that understand the nuance of language, trained on different models of the language – how words are related in various contexts and dimensions.
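For context on that keyword baseline: the term-frequency ranking described above is, in Cognitive Search, the BM25 similarity function (named later in this post as the default scoring algorithm). Roughly:

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
```

Here f(q_i, D) is how often term q_i appears in document D, |D| is the document length, avgdl the average document length in the corpus, and k_1 and b are tuning constants. The score rewards term overlap only; it has no notion of intent, which is exactly the gap semantic ranking addresses, as the example queries below show.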
Figure 1. JSON query:

```json
{
  "search": "When was Fanta Orange discovered",
  "queryType": "semantic",
  "queryLanguage": "en-us",
  "speller": "lexicon",
  "answers": "extractive|count-3",
  "searchFields": "content,metadata_storage_name",
  "count": true
}
```

Response (note the caption in the answer):

```json
{
  "@odata.context": "https://ci-acs.search.windows.net/indexes('ogilvy-poc-index')/$metadata#docs(*)",
  "@odata.count": 2115,
  "@search.answers": [
    {
      "key": "79b0fe8e-0648-4cc5-bd5c-eaf0e2027855",
      "text": "First launched Fanta began Fanta U.S. in U.S. phasing out in U.S. relaunch 1940 1941 1959 1987 2002 2005 First launched Minute Maid in Germany launched in U.S. As beverage choice has exploded in recent years, carbonated soft drinks (CSDs) have faced stiff competition.",
      "highlights": null,
      "score": 0.8339705
    }
```

Versus the same query without semantic search:

```json
{
  "search": "When was Fanta discovered",
  "queryType": "full",
  "queryLanguage": "en-us",
  "speller": "lexicon",
  "count": true
}
```

```json
{
  "@odata.context": "https://ci-acs.search.windows.net/indexes('ogilvy-poc-index')/$metadata#docs(*)",
  "@odata.count": 3253,
  "@search.nextPageParameters": {
    "search": "When was Fanta discovered",
    "queryType": "full",
    "queryLanguage": "en-us",
    "speller": "lexicon",
    "count": true,
    "skip": 50
  },
```

The response has several hits, but none are close; the top match is a PowerPoint file's internal structure:

```json
  "value": [
    {
      "@search.score": 42.056797,
      "content": "\n_rels/.rels\n\n\ndocProps/core.xml\n\n\ndocProps/app.xml\n\n\nppt/presentation.xml\n\n\nppt/_rels/presentation.xml.rels\n\n\nppt/presProps.xml\n\n\nppt/viewProps.xml\n\n\nppt/commentAuthors.xml\n\n\nppt/slideMasters/slideMaster1.xml\nTitle TextBody Level OneBody Level TwoBody Level ThreeBody Level FourBody Level Five\n\n\nppt/slideMasters/_rels/slideMaster1.xml.rels\n\n\nppt/theme/theme1.xml\n\n\nppt/slideLayouts/slideLayout1.xml\nTitle TextBody Level OneBody Level TwoBody Level ThreeBody Level FourBody Level Five\n\n\nppt/slideLayouts/_rels/slideLayout1.xml.rels\n\n\nppt/slideLayouts/slideLayout2.xml\nTitle TextBody Level OneBody Level Two
```

Technology Background:

Semantic search adds a semantic ranking model, and it returns captions and answers in the response. Semantic ranking looks for context and relatedness among terms, elevating matches that make more sense given the query. Language understanding finds summarizations or captions and answers within your content and includes them in the response, which can then be rendered on a search results page for a more productive search experience. State-of-the-art pretrained models are used for summarization and ranking.

To maintain the fast performance that users expect from search, semantic summarization and ranking are applied to just the top 50 results, as scored by the default similarity scoring algorithm (BM25). Using those results as the document corpus, semantic ranking re-scores them based on the semantic strength of the match; scores are calculated based on the degree of linguistic similarity between query terms and matching terms in the index. The underlying technology comes from Bing and Microsoft Research, and is integrated into the Cognitive Search infrastructure as an add-on feature.

In the preparation step, the document corpus returned from the initial result set is analyzed at the sentence and paragraph level to find passages that summarize each document. In contrast with keyword search, this step uses machine reading and comprehension to evaluate the content. Through this stage of content processing, a semantic query returns captions and answers. To formulate them, semantic search uses language representation to extract and highlight key passages that best summarize a result. If the search query is a question, and answers are requested, the response will also include a text passage that best answers the question, as expressed by the search query. For both captions and answers, existing text is used in the formulation. The semantic models do not compose new sentences or phrases from the available content, nor do they apply logic to arrive at new conclusions. In short, the system will never return content that doesn't already exist. Results are then re-scored based on the conceptual similarity of query terms.
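To try the semantic query shown in Figure 1 from code, here is a hedged sketch using Python and the Cognitive Search REST API. The service name, index name, API key, and api-version are placeholders or assumptions (semantic search was a preview feature, so check the current api-version):

```python
import requests

# Assumptions: your own search service, index, and query key
service = "<your-search-service>"
index = "<your-index>"
api_key = "<your-query-key>"

url = f"https://{service}.search.windows.net/indexes/{index}/docs/search"
params = {"api-version": "2021-04-30-Preview"}  # assumed preview api-version
body = {
    "search": "When was Fanta Orange discovered",
    "queryType": "semantic",
    "queryLanguage": "en-us",
    "speller": "lexicon",
    "answers": "extractive|count-3",
    "searchFields": "content,metadata_storage_name",
    "count": True,
}

resp = requests.post(
    url,
    params=params,
    json=body,
    headers={"api-key": api_key, "Content-Type": "application/json"},
)
resp.raise_for_status()
results = resp.json()

# Print any extracted answers, then the scores of the top documents
for ans in results.get("@search.answers") or []:
    print("ANSWER:", ans["text"], "(score: {:.3f})".format(ans["score"]))
for doc in results.get("value", [])[:3]:
    print("DOC score:", doc["@search.score"])
```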
Key Success Measurements for Ogilvy
- 40% improvement in RFP/RFI response time
- Content growth per month
- RFP Generator clicks
- Content downloads
- User adoption and collaboration
- Quality content searches

Business Outcomes:

The biggest business impact will be a significant increase in win rate for RFIs, which leads to higher revenue. This was achieved by the portal's ability to identify the best answers and layouts for an RFI without having to perform multiple searches, saving time and resources. Being able to use routine methods, filters, and cognitive functions to refine the search results would eliminate redundancy by almost 40%, reducing the cost of the process and enhancing customer experience and satisfaction.

Analyzing COVID Medical Papers with Azure and Text Analytics for Health
Since the beginning of the COVID pandemic more than a year ago, more than 400,000 scientific papers have been published on the subject. In this post, we show how AI can help extract knowledge from those papers to gain insights, and how to build a tool to help researchers navigate the paper collection in a semantically rich way.
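The pipeline described in that post centers on Text Analytics for Health. As a taste of what it extracts, here is a minimal sketch of pulling medical entities out of a paper abstract with the azure-ai-textanalytics SDK; the endpoint, key, and sample sentence are placeholders, not from the original post:

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

# Placeholder endpoint/key for a Text Analytics (Language) resource
client = TextAnalyticsClient("<your endpoint>", AzureKeyCredential("<your key>"))

documents = [
    "Patients were treated with hydroxychloroquine and azithromycin for COVID-19."
]

# Health-entity extraction runs as a long-running operation
poller = client.begin_analyze_healthcare_entities(documents)
for doc in poller.result():
    if not doc.is_error:
        for entity in doc.entities:
            # e.g. "hydroxychloroquine -> MedicationName (0.99)"
            print(f"{entity.text} -> {entity.category} ({entity.confidence_score:.2f})")
```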