natural language processing
17 TopicsHarness the power of Large Language Models with Azure Machine Learning prompt flow
Unlock the full potential of your AI solutions with our latest blog on prompt flow! Discover how to effectively assess and refine your prompts and flows, leading to production-ready, impactful LLM-infused applications. Don't miss out on these game-changing insights!105KViews17likes6CommentsMistral Large, Mistral AI's flagship LLM, debuts on Azure AI Models-as-a-Service
Microsoft is partnering with Mistral AI to bring its Large Language Models (LLMs) to Azure. Mistral AI’s OSS models, Mixtral-8x7B and Mistral-7B, we added to the Model Catalog last December. We are excited to announce the addition of Mistral AI’s new flagship model, Mistral Large to the Mistral AI collection of models in the Azure AI model catalog today. The Mistral Large model will be available through Models-as-a-Service (MaaS) that offers API-based access and token based billing for LLMs, making it easy to build Generative AI apps. You can provision an API endpoint in a matter of seconds and try out the model in the Azure AI Studio playground or use it with popular LLM app development tools like Azure AI prompt flow and LangChain. The APIs support two layers of safety – first, the model has built-in support for a “safe prompt” parameter and second, Azure AI content safety that screens for harmful content generated by the model, enabling developers to build safe and trustworthy applications.48KViews4likes7CommentsAzure Open AI (GPT) with Power Apps - Build a Power App to create Demo or Personal Knowledge Bot
Let's build a power app with open ai to create a personal knowledge assistant and or also to show demo. App allows users to increase productivity by gaining knowledge by asking questions and other features like Summarization, Code generation, Email Content, Blob post content, twitter content.26KViews6likes4CommentsFalcon LLMs in Azure Machine Learning
We are thrilled to announce that the Falcon family of state-of-the-art language models, created by the Technology Innovation Institute (TII) and hosted on Hugging Face hub, is now available in Model Catalog on the Azure Machine Learning platform. This exciting addition is a result of the partnership between Microsoft and Hugging Face. With their flagship library, Transformers, and Models hub with over 200,000 open-source models, Hugging Face has become the go-to source for state-of-the-art pre-trained models and tools in the NLP space.14KViews2likes0CommentsComparative study of Azure Open AI GPT model and LLAMA 2
Introduction to LLMs Large Language Models (LLMs) have revolutionized natural language processing by leveraging extensive training on massive datasets with tens of millions to billions of weights. Employing self-supervised and semi-supervised learning, exemplified by models like GPT-3, GPT-4, and LLAMA, these models predict the next token or word in input text, showcasing a profound understanding of contextual relationships. Notably, recent advancements feature a uni-directional (autoregressive) Transformer architecture, as seen in GPT-3, surpassing 100 billion parameters and significantly enhancing language understanding capabilities. GPT OpenAI's GPT (generative pre-trained transformer) models have been trained to understand natural language and code. GPTs provide text outputs in response to their inputs. The inputs to GPTs are also referred to as "prompts". Designing a prompt is essentially how you “program” a GPT model, usually by providing instructions or some examples of how to successfully complete a task. Model Evaluation Pre-trained base GPT-4 model assessed using traditional language model benchmarks. Contamination checks performed for test data to identify overlaps with the training set. Few-shot prompting utilized for all benchmarks during the evaluation. GPT-4 exhibits superior performance compared to existing language models and previous state-of-the-art systems. Outperforms models that often employ benchmark-specific crafting or additional training protocols. GPT-4's performance evaluated alongside the best state-of-the-art models (SOTA) with benchmark-specific training. GPT-4 outperforms existing models on all benchmarks, surpassing SOTA with benchmark-specific training on all datasets except DROP. GPT-4 outperforms the English language performance of GPT 3.5 and existing language models for the majority of languages, including low-resource languages such as Latvian, Welsh, and Swahili. GPT-4 substantially improves over previous models in the ability to follow user intent. On a dataset of 5,214 prompts submitted to ChatGPT and the OpenAI API, the responses generated by GPT-4 were preferred over the responses generated by GPT-3.5 on 70.2% of prompts. Azure OpenAI service Azure OpenAI Service provides REST API access to OpenAI's powerful language models including the GPT-4, GPT-35-Turbo, and Embeddings model series. These models can be easily adapted to your specific task including but not limited to content generation, summarization, semantic search, and natural language to code translation. Users can access the service through REST APIs, Python SDK, or our web-based interface in the Azure OpenAI Studio. GPT-4 GPT-4 can solve difficult problems with greater accuracy than any of OpenAI's previous models. Like GPT-3.5 Turbo, GPT-4 is optimized for chat and works well for traditional completions tasks. Use the Chat Completions API to use GPT-4. GPT-3.5 GPT-3.5 models can understand and generate natural language or code. The most capable and cost effective model in the GPT-3.5 family is GPT-3.5 Turbo, which has been optimized for chat and works well for traditional completions tasks as well. LLAMA 2 Llama 2 is a large language model (LLM) developed by Meta that can generate natural language text for various applications. It is pretrained on 2 trillion tokens of public data with a context length of 4096 and has variants with 7B, 13B and 70B parameters. In this blog post, we will show you how to use Llama 2 on Microsoft Azure, the platform for the most widely adopted frontier and open models. Comparison to closed-source models LLAMA 2 is not good at coding as per the statistics below but goes head-to-head with Chat GPT in other tasks. Human Evaluation Human evaluation results for Llama 2-Chat models compared to open- and closed-source models across ~4,000 helpfulness prompts with three raters per prompt. The largest Llama 2-Chat model is competitive with ChatGPT. Llama 2-Chat 70B model has a win rate of 36% and a tie rate of 31.5% relative to ChatGPT. Note: 4k prompt set does not include any coding- or reasoning-related prompts. Language distribution in pretraining data Most data is in English, meaning that Llama 2 will perform best for English-language use cases. The large unknown category is partially made up of programming code data. Model is not trained with enough coding data and hence not capable of solving coding related problems. The model’s performance in languages other than English remains fragile and should be used with caution. Performance with tool use Evaluation on the math datasets Llama 2-Chat is able to understand the tools’s applications, and the API arguments, just through the semantics, despite never having been trained to use tools Example: Ultimately, the choice between Llama 2 and GPT or ChatGPT-4 would depend on the specific requirements and budget of the user. Larger parameter sizes in models like ChatGPT-4 can potentially offer improved performance and capabilities, but the free accessibility of Llama 2 may make it an attractive option for those seeking a cost-effective solution for chatbot development.10KViews0likes0CommentsBuild custom NLP solutions with AzureML AutoML NLP
Introduction: Since the publication of the BERT paper [1], Transformer architecture [2] based pretrained deep neural networks have become the state of the art for Natural Language Processing (NLP) tasks. These models have helped Machine Learning professionals in research, academia, and industry alike. Many of the biggest technology companies have devoted enormous resources towards further improving these models, in terms of performance and scale, while many others have leveraged these to cater to their use cases. AzureML (Azure Machine Learning) AutoML (Automated Machine Learning) was one of the earliest adopters of Foundational Deep Neural Network Models for NLP tasks like classification since the beginning of 2020 [3]. We have been building on it ever since. We’re now taking a step further and are excited to announce the General Availability of AutoML NLP, an end-to-end deep learning solution for text data within AzureML. AutoML NLP solves NLP problems like text classification and named entity recognition (NER) and provides the following capabilities: Large pool of Pretrained Text Deep Neural Network (DNN) models (currently in preview) Ability to tune hyperparameters (currently in preview) on these models to help achieve high scores Data awareness that taps into input dataset characteristics and subtleties Native support for 104 languages Optimizations for near-linear scale on very large data sizes and clusters Seamless ML Ops and production deployment on AzureML endpoints Scenarios Supported AutoML NLP currently offers three scenarios: Multiclass Classification, Multilabel Classification, and NER. Multiclass Classification (including Binary Classification) This task helps classify each datapoint/sample into exactly one class from a total of two or more classes. Multilabel Classification This task helps classify each datapoint/sample into any number of classes, including all classes or no classes, from a total of two or more classes. Named Entity Recognition (NER) This task helps classify each entity into exactly one entity class, such that multiple entities corresponding to the same chunk are classified into the same base entity class, leveraging special formatting techniques, discussed next. We expect the NER input data to be based on the CoNLL format, such that the dataset would be provided as text files. Within these files every input text example would be split into multiple lines, where each line would contain one word followed by the label/category for that word, and every input example would be followed by an empty new line. The labels in the NER data should adhere to the IOB2 (Inside-Outside-Beginning) tagging format [4]. According to this format, tokens which are part of a chunk of multiple tokens, such as first and last name of an individual, are prefixed with “B-” and “I-” tags respectively. This helps determine the position in the chunk. Every chunk begins with the “B-” prefix, and all entities that are part of a chunk following the beginning entity, are prefixed with “I-”. Entities that do not belong to any entity class are classified as “O”. For more details and examples, see Set up AutoML for NLP - Azure Machine Learning | Microsoft Docs Leveraging Pretrained Language Models Large language models, such as BERT, RoBERTa, XLNet, Turing-NLG, GPT-3, are pretrained using large training corpora. They leverage the enormous knowledge gained during pretraining when used for task-specific finetuning, thus requiring only a small amount of labeled data and a few epochs to achieve good results. Collecting and/or labeling data is often incredibly challenging and expensive; hence employing such models can be an ideal option for users who have limited training data. The larger the model, the higher the number of trainable parameters, and greater the ability to store more knowledge from the pretraining corpus. However, this also increases the GPU memory requirements, training time, and inference latency. The diagram below captures the finetuning results for several pretrained models comparing accuracy and normalized training times with respect to the size of the training dataset. We leverage the popular multiclass dataset, AG News [5]. The training time is normalized by dividing the training time for bert-base-cased model. Sweeping over many Models and Hyperparameter combinations We empower our customers to select from a wide array of powerful pretrained text DNN models for finetuning. We currently support 15 pretrained models including: autoencoding models (like BERT) and autoregressive models (like XLNet) multilingual models like XLM-RoBERTa and BERT-multilingual large models like RoBERTa-large and BERT-large for achieving higher scores base models (like BERT-base and RoBERTa-base) and distilled models (like distil-BERT and distil-RoBERTa) for faster training and lower GPU memory consumption For all models, AutoML uses intelligent default hyperparameters which would produce good results for almost all use cases. For users who want more control, we provide the ability to override these default hyperparameters and their corresponding ranges, empowering users to leverage their domain knowledge for better fine-tuning results. Hyperparameters such as batch-size, gradient-accumulation-steps and epochs are commonly used and can impact training time and GPU memory usage, in addition to overall model performance. Other hyperparameters, such as learning-rate, weight-decay, warmup-ratio and lr-scheduler-type are also available for tuning, but the training results are quite sensitive to these. We have found AutoML defaults to work best for most scenarios, hence it is recommended to use those and only customize if they are not producing the best outcome. The model sweeping feature also offers the early termination functionality which automatically ends the finetuning runs for poorly performing models. Several policies for early termination are supported along with customizable evaluation intervals [6]. The overall goal is to improve computation efficiency: achieve the best results using compute resources judiciously. Model Sweeping and Hyperparameter tuning capabilities are released for Public Preview at this time, and they will be made Generally Available soon. In the next few subsections, we’ll describe several features that we’ve enabled in AutoML NLP for improving performance and efficiency. Custom Features One size does not fit all, even though the pretrained text DNN models are largely capable of solving a wide variety of tasks for many kinds of datasets. The dataset’s characteristics provide important signals that can help improve finetuning results. For example, adjusting the model’s sequence length can offer significant boost in the scores for longer range text data, while also reducing the memory and time requirements for shorter range text data. Similarly, when datasets have more than one text column, AutoML smartly utilizes text data from all columns. Multilingual Support AutoML NLP natively supports 104 languages. Customers are required to provide the dataset language parameter when they submit an experiment, and the model best suited to that language would be leveraged. Additionally, with our model sweeping functionality, users can leverage powerful multilingual capable models like bert-base-multilingual, xlm-roberta-base and xlm-roberta-large to achieve near state of the art (SOTA) performance on datasets in a variety of languages. Distributed Training AutoML is well-tuned to work best for a combination of GPU SKUs with high efficiency InfiniBand interconnections, latest libraries for data parallelism, and innovation from Microsoft Research to achieve robust training on multi-GPU or multi-node AzureML compute clusters providing near-linear scaling. This functionality is available to all NLP tasks that we support. Here is an example of the speedups achieved through distributed training with NC24rs_v3 virtual machines, each of which comprises of 4 V100 GPUs. We measure scaling in terms of strong scaling [7] defined in high-performance computing as the speedup in training time obtained for the same problem size by varying the number of processors. Now that we’ve introduced our features and capabilities, we’ll describe the evaluation and deployment phases before sharing answers to our anticipated frequently asked questions. Evaluation and Metrics It’s important to evaluate the performance of the fine-tuned model on unseen data. As part of the fine-tuning/training run, we ask users to provide the hold-out validation dataset, which is used to evaluate/score the trained model. A wide variety of metrics are provided for each of the three scenarios, with some metrics like accuracy, precision, recall and F-1 that are common to all scenarios. Specifically, for our multilabel text classification scenario we also offer the thresholding feature. We provide a metrics.csv file as part of the finetuning run, to help users understand the impact of varying the threshold (used for predicted probabilities) on metrics like precision and recall. A smaller threshold value would allow more labels per sample on average and hence increase chances for false positives (useful when high recall is desirable). A larger threshold value would allow fewer labels and hence increase chances for false negatives (useful when high precision is desirable). The user can leverage this capability when inferencing the finetuned model on the test dataset or when testing the deployed model. Deployment AutoML NLP is natively integrated within AzureML, enabling users to use all AzureML workflows with AutoML NLP. Once you train a model you can register and deploy it to the REST endpoints like any other AzureML model. You could use both UI and SDK to deploy this model. Architecture The following diagram explains the high-level architecture of AutoML NLP. FAQ I am a Machine Learning (ML) professional, but do not have time/expertise to conduct research in NLP. How do I even know which pretrained model to use? AutoML NLP finds the model that works best for your task and training data. However, if you would prefer to specify a list of models from what we currently support, you can leverage the model sweeping feature of AutoML NLP. Data scientists spend a lot of time cleaning and processing data. Do I need to perform any preprocessing on my text data? AutoML NLP expects the data to comply with the format specified for a particular task. We explain the format, with examples, in our documentation. In addition to data validation checks, AutoML NLP also checks for data pitfalls, and either warns the user or fails the run. Sometimes the data may comply with the pre-specified format, but still lead to hidden problems which do not surface even at runtime, but usually culminate with misleading scores. We have checks in place to avoid many such issues. What if my data is in a non-English language, or if it uses multiple languages? AutoML NLP natively supports a variety of languages. The user only needs to provide the three-letter ISO code corresponding to the dataset’s language, and we will do the rest. Additionally, users can specify multilingual capable models and leverage model sweeping. How do I evaluate the finetuned model? How do I know which metrics to use for evaluation? AutoML NLP is integrated with AzureML’s rich set of metrics available to the user via the UI. The ultimate choice of metric rests with the user and their business use case, although we can share some general guidelines. While accuracy is a well understood metric for classification tasks, it is of little value for NER. In many NER datasets, most of the tokens do not correspond to any entity class but while computing accuracy even such tokens get counted, making accuracy an unreliable metric for NER. We recommend using F1 score, precision and recall for NER, because in our implementation they compute scores by taking into account entity level granularity. For classification tasks, when datasets are imbalanced, metrics like AUC (Area under the curve) are more informative since accuracy is sensitive to imbalance. Get Started Today! Watch our Azure Machine Learning breakout session Get started with Microsoft Learn to build skills Explore Azure Machine Learning announcements at Microsoft Ignite Read this Docs page to learn more about AutoML NLP References: [1] https://arxiv.org/pdf/1810.04805.pdf [2] https://arxiv.org/pdf/1706.03762.pdf [3] https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/how-bert-is-integrated-into-azure-automated-machine-learning/ba-p/1194657 [4] https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging) [5] http://groups.di.unipi.it/~gulli/AG_corpus_of_news_articles.html [6] https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters#early-termination [7] https://hpc-wiki.info/hpc/Scaling Contributors: Arjun Singh – Senior Data & Applied Scientist Anup Shirgaonkar – Principal Data & Applied Scientist Manager7.5KViews2likes0CommentsElevating AI with Databricks on Azure: Introducing the Latest Large Language Models
Join us on a journey through the synergy of Databricks and Azure, where cutting-edge data capabilities meet scalable cloud infrastructure, opening new realms of possibilities for developers and enterprises alike.7.1KViews2likes0Comments