
AI - Azure AI services Blog

Introducing new task-optimized summarization capabilities powered by a fine-tuned large language model

YanlingX (Microsoft)
Nov 15, 2023

For years, developers around the world have relied on pre-built AI capabilities offered through Azure AI Language, ranging from analyzing sentiment and extracting information to mining opinions and much more. Such pre-built capabilities have accelerated AI-building efforts for enterprises looking to support users across geographies. Summarization is one such capability, and many customers use it to optimize their AI workflows. For instance, Beiersdorf uses document summarization and key phrase extraction to power an advanced AI platform that harnesses their resources and condenses essential knowledge around cutting-edge skin care solutions (Customer Story HERE). Arthur D. Little similarly uses abstractive summarization to unlock their teams' collective intelligence (Customer Story HERE).

Today we are thrilled to announce new capabilities designed to accelerate customers' AI workflows, allowing them to build summarization use cases faster than ever before. We are expanding our use of LLMs to include GPT-3.5 Turbo alongside our proprietary z-Code++ models, offering task-optimized summarization that balances output accuracy and cost.

Here are some of the highlights. All the new public previews are available under api-version 2023-11-15-preview.

  • [in public preview] More out-of-the-box conversation summarization capabilities, e.g., meeting recap and follow-up tasks, in addition to the existing aspects: narrative, chapter title, and issue and resolution. The “recap” aspect condenses a lengthy meeting or conversation into a concise one-paragraph summary for a quick overview, and the “follow-up tasks” aspect extracts action items and tasks that arise during a meeting.
  • [in public preview] Support for native document formats, including Word, PDF, and PowerPoint, in addition to plain text. Customers can now input documents directly without the need for conversion. Please read this post for more.
  • [in public preview] Summarize articles and reports for a specific point of interest, as opposed to the general summarization that is currently available. For example, to get targeted summaries of this news about Microsoft's acquisition of Activision Blizzard, you can specify “Activision Blizzard”, “Early Settlement Date”, or other queries as your points of interest. The generated summary will then focus on these points of interest rather than providing a general summary.
  • [in public preview] 8 new languages for conversation summarization (French, German, Italian, Japanese, Korean, Chinese, Portuguese, and Spanish) in addition to English, and 2 more languages (Hebrew and Polish) for document summarization in addition to those supported for conversation summarization.
  • [in public preview] Container solutions, with both connected and disconnected options, allowing customers to use summarization in regions and countries beyond those supported by the cloud offering. For information about the container solution, please see the Summarization Container Blog.
  • [in public preview] Document Analysis sample flow in Prompt Flow for a quick start. In the sample flow, we have integrated all the building tools for document analytics use cases, where you can translate, redact PII, summarize, extract entities, and more. You can also tailor the flow to your own needs by dragging and dropping the building tools, and even add your own tools if necessary. For more information and to get started, please visit the Document Analysis Prompt Flow README.
  • Commitment-tier pricing, making it a cost-effective choice for long-term usage.
  • [in public preview] One preview region, Sweden Central, is added to showcase our latest and continually evolving LLM fine-tuning techniques; all summarization capabilities are available there. We welcome customers to try it out by creating a resource in this preview region. Your valuable feedback is vital to our continuous enhancement.
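As a rough sketch of how the new query-focused summarization might be called, the snippet below builds a request body for the analyze-text jobs endpoint under api-version 2023-11-15-preview. The task kind (`AbstractiveSummarization`) and the `query` parameter name are assumptions inferred from this announcement, not verified API facts; confirm both against the current REST reference before use.

```python
# Sketch: build the JSON body for a query-focused abstractive summarization
# job. Endpoint shape and parameter names are assumptions, not verified.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
API_VERSION = "2023-11-15-preview"

def build_summarization_request(text, query=None):
    """Build the request body; 'query' (assumed parameter name) focuses
    the summary on a specific point of interest, e.g. "Activision Blizzard"."""
    parameters = {}
    if query:
        parameters["query"] = query
    return {
        "displayName": "Query-focused summarization sample",
        "analysisInput": {
            "documents": [{"id": "1", "language": "en", "text": text}]
        },
        "tasks": [{
            "kind": "AbstractiveSummarization",
            "taskName": "focused-summary",
            "parameters": parameters,
        }],
    }

body = build_summarization_request(
    "Microsoft has completed its acquisition of Activision Blizzard...",
    query="Activision Blizzard",
)
# The body would be POSTed to
# f"{ENDPOINT}/language/analyze-text/jobs?api-version={API_VERSION}"
```

The same body shape, minus the `query` entry, would request a general summary instead of a targeted one.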

Document summarization (pre-prompt-engineered, 2 styles, query-focused, length control, 11 languages, 125K chars / ~60 pages, in Prompt Flow, in container)

Conversation summarization (pre-prompt-engineered, 6 aspects, 9 languages, 125K chars / ~3 hours, in container)

We can’t wait to see what developers build with these new capabilities, and we remain committed to delivering innovative solutions that help our customers achieve their goals. Thank you for your continued trust in our products, and we welcome your feedback as we strive to continuously improve our services.

For more details and resources, please explore the following links:

 

Updated Jan 11, 2024
Version 8.0
  • Gary_Melhaff (Copper Contributor)

    Thanks YanlingX... So it sounds like I'd need to run the 25-document batches in parallel to achieve the throughput I need. I wish there were a way to do that without having to code that kind of complexity, but at least I know it's possible. Thanks for the information! Gary

  • Hi Gary,

    Thank you for taking the time to share your thoughts on our service. We truly appreciate your feedback and value your opinion.

     

    We would like to take this opportunity to explain that our service is a "task-optimized" solution, designed to provide the highest quality and experience for specific tasks, with enterprise-level scalability as well as containerization for data-sensitive needs. This sets us apart from services that offer generic solutions, and it allows our customers to easily achieve their desired results without the need for prompt engineering or fine-tuning.

     

    We also strive to constantly optimize latency. As a GenAI solution, our service generates responses token by token, which can result in longer latency compared to classification ML models. The length of the input document also affects latency, with longer documents resulting in longer response times.

     

    Our service has been successfully used by numerous enterprise customers for large-scale document and conversation processing, some of it in real time or near real time. We would be happy to share some recommendations on how to use our service more efficiently. For example:

    • Use multi-threading to maximize usage up to the rate limit provisioned for your account. 
    • Send multiple requests simultaneously without waiting for the completion of previous requests. 
    • Wait for a brief period, such as 0.5 or 1 second, before querying for the results.
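    As a minimal illustration of these three tips, the Python sketch below splits documents into batches, submits the batches concurrently, and polls after a brief wait. The submit and poll helpers are hypothetical placeholders standing in for the real service calls, not the actual Azure SDK API:

```python
import time
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 25    # documents per request, per the batching guidance above
MAX_WORKERS = 8    # parallel in-flight requests, tuned to your rate limit

def submit_batch(batch):
    """Placeholder for the real 'submit summarization job' call.
    In practice this would POST the batch and return a job reference."""
    return {"job_id": id(batch), "docs": list(batch)}

def poll_result(job):
    """Placeholder for polling a job until it completes."""
    time.sleep(0.01)  # short delay before querying (use ~0.5-1 s for real jobs)
    return [f"summary of {doc}" for doc in job["docs"]]

def summarize_all(documents):
    # Split the corpus into batches of up to BATCH_SIZE documents.
    batches = [documents[i:i + BATCH_SIZE]
               for i in range(0, len(documents), BATCH_SIZE)]
    # Submit batches concurrently, without waiting for earlier ones to finish,
    # then poll each job; pool.map preserves batch order.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        jobs = list(pool.map(submit_batch, batches))
        results = list(pool.map(poll_result, jobs))
    # Flatten per-batch results back into one ordered list.
    return [summary for batch in results for summary in batch]
```

    With BATCH_SIZE of 25 and 8 workers, up to 200 documents can be in flight at once; both knobs should be tuned against the rate limit provisioned for your account.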

    Our team is committed to continuously improving our service, including reducing latency. We welcome your feedback and would be delighted to schedule a call with you to discuss your thoughts in more detail. Please feel free to reach out to us at mslangsvc@microsoft.com.

     

    Thank you again for your feedback, and we look forward to providing you with the best service possible.

  • Gary_Melhaff (Copper Contributor)

    YanlingX,

    The S tier can run 1000 requests per second, but that's irrelevant if the latency only allows one per second. I'm averaging well over 1 second per transaction with only a small volume of pre-summarized textual data. This makes it impossible to run any kind of volume, let alone full conversations. That said, I now see I could batch the documents 25 at a time per request, which helps, although it is still problematic when you're talking millions of documents, especially if I attempted to run full conversations. Is batching documents to 25 the only tuning capability that could increase throughput?

  • Hi Gary,

    Thank you for your question. Would you elaborate on your comments about scaling? Our service is an enterprise-scale SaaS, and scalability is one of the advantages we provide to customers. By default, you can send up to 1,000 requests per second with an S tier; more information is available at https://learn.microsoft.com/en-us/azure/ai-services/language-service/concepts/data-limits#rate-limits. If you need higher rates, please feel free to let us know.

  • Gary_Melhaff (Copper Contributor)

    This is an amazing capability, but at this point it is pretty much useless due to performance issues. I would argue "task-optimized" is a bit misleading. I'm running over 1 second per document for summarization and have millions of documents to analyze. It would take months of continuous running to process just a few million rows. This is completely unacceptable. Other services, such as sentiment analysis, are very fast and offered in most regions. Why release a capability when it can't scale?