Forum Discussion

voonsionglum
Brass Contributor
Nov 29, 2021
Solved

Root bot, skill bot and scaling

Hi,

 

We need help with scaling.

 

Problem statement: With 2 instances of the root bot running, a skill bot invocation is unable to return results reliably to the root bot.

 

With 1 instance of the root bot and 1 instance of the skill bot, everything works as expected.

 

We configured the root bot with "Scale Out" and "Manual Scale" to set the instance count to 2. When we repeat the same interaction, the skill response is not returned reliably.

 

In a successful scenario where the user's skill response is routed back to the root bot that the user interacted with, the user gets the skill results.

 

 

In a failed scenario, the user's skill response was routed to the root bot instance that the user did not interact with.  Hence, the results are lost.

 

How do we ensure that the skill results are returned to the correct root bot instance?

 

We reviewed the "Skills overview" documentation at https://docs.microsoft.com/en-us/azure/bot-service/skills-conceptual?view=azure-bot-service-4.0, but it does not mention anything about scaling and persistence.

 

We had thought that the lack of persistence could be due to the bots' conversation state. However, our root bot currently stores all conversation state in a database. Since both instances use the same DB connection string, they should have access to the same conversation state.

 

Thank You

 

 

 

 

 

  • Slacked2737
    Dec 14, 2022

    voonsionglum 

    We had James check his data and found this. See if it helps. In the root bot:

     

    • Double-check that you're using the SkillConversationIdFactory that ships with the Microsoft Bot Framework SDK (NOT one that you may have created). It should have an IStorage constructor parameter that lets you pass in whatever storage you want to use to persist the IDs used for skill communication. You probably need to use the class provided by the framework (i.e., the SkillConversationIdFactory that inherits from "Microsoft.Bot.Builder.Skills.SkillConversationIdFactoryBase").

     

    • Check the IStorage object used by SkillConversationIdFactory. If you are using in-memory-only storage (e.g., a ConcurrentDictionary or a MemoryStorage-type object), that might be the problem. In that case, the code in SkillConversationIdFactory would not be persisting the conversation/skill ID lookup data (needed to talk with skills) anywhere that other instances can read.

     

    I have found some old Microsoft samples that give a "demo" of how to use skills and show a SkillConversationIdFactory that uses in-memory storage, which of course won't scale or work across different app instances.

     

    https://learn.microsoft.com/en-us/dotnet/api/microsoft.bot.builder.skills.skillconversationidfactory?view=botbuilder-dotnet-stable
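
To illustrate the two checks above, here is a minimal TypeScript sketch (the thread's bots use the npm packages). Everything in it is a hypothetical stand-in rather than the SDK's own API: `KeyValueStore`, `MapStore`, `StoredIdFactory` and `resolvesOnOtherInstance` are made-up names, and the store is synchronous for brevity, whereas real bot storage is asynchronous. It only shows why the ID lookup has to live in storage that every root bot instance can read.

```typescript
// Hypothetical stand-ins for illustration only; not Bot Framework SDK types.
interface ConversationReference {
  conversationId: string;
  serviceUrl: string;
}

// Minimal key-value contract (synchronous here to keep the sketch short).
interface KeyValueStore {
  get(key: string): ConversationReference | undefined;
  set(key: string, value: ConversationReference): void;
}

// One MapStore object plays the role of one physical store (a DB, or one
// process's memory).
class MapStore implements KeyValueStore {
  private readonly items = new Map<string, ConversationReference>();

  get(key: string): ConversationReference | undefined {
    return this.items.get(key);
  }

  set(key: string, value: ConversationReference): void {
    this.items.set(key, value);
  }
}

// Maps a skill conversation ID back to the user's original conversation.
class StoredIdFactory {
  private readonly store: KeyValueStore;

  constructor(store: KeyValueStore) {
    this.store = store;
  }

  // Called by the root instance that forwards the user's message to the skill.
  createSkillConversationId(ref: ConversationReference, skillId: string): string {
    const id = `${ref.conversationId}-${skillId}-skillconvo`;
    this.store.set(id, ref);
    return id;
  }

  // Called by whichever root instance receives the skill's reply.
  getConversationReference(skillConversationId: string): ConversationReference | undefined {
    return this.store.get(skillConversationId);
  }
}

// Instance A creates the ID; instance B (another scaled-out copy) looks it up.
function resolvesOnOtherInstance(storeA: KeyValueStore, storeB: KeyValueStore): boolean {
  const instanceA = new StoredIdFactory(storeA);
  const instanceB = new StoredIdFactory(storeB);
  const id = instanceA.createSkillConversationId(
    { conversationId: "user-conv-1", serviceUrl: "https://example.invalid" },
    "skill-1",
  );
  return instanceB.getConversationReference(id) !== undefined;
}

const shared = new MapStore();
console.log(resolvesOnOtherInstance(shared, shared)); // true: both instances share one store
console.log(resolvesOnOtherInstance(new MapStore(), new MapStore())); // false: per-instance memory
```

The second call models the failure described in this thread: the instance that receives the skill's reply has no record of the skill conversation ID, so it cannot route the reply.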

     

     

  • Slacked2737
    Copper Contributor

    We're having this exact scaling issue on 4.10. If we try to scale the root bot instances above one, communication made by a skill back to a root bot instance that is NOT the originating root bot produces a 404 (and the skill errors out).

    Any luck with finding out whether this is a version issue?

    We found this delivery mode option (ExpectReplies), which will tie the call and the response to the same root bot instance, but it seems like it might just be an alternate workaround.

    https://docs.microsoft.com/en-us/azure/bot-service/skills-about-skill-consumers?view=azure-bot-service-4.0#using-a-delivery-mode-of-expect-replies
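
A rough TypeScript sketch of why this delivery mode sidesteps the problem, as we understand it. The types and the `callSkill` function are hypothetical simplifications, not the SDK's API. In the default mode the skill sends its reply as a separate callback request, which a load balancer may hand to a different root bot instance. With expect replies, the reply travels back in the response to the original call, so it reaches the instance that made the call.

```typescript
// Hypothetical, simplified shapes for illustration; not Bot Framework SDK types.
type DeliveryMode = "normal" | "expectReplies";

interface Activity {
  type: string;
  text: string;
  deliveryMode?: DeliveryMode;
}

interface SkillHttpResponse {
  status: number;
  activities: Activity[]; // populated only when the caller asked for expectReplies
}

// A toy skill. `sendCallback` models a separate HTTP request back to the
// root bot's public endpoint (which, when scaled out, may reach any instance).
function callSkill(
  request: Activity,
  sendCallback: (reply: Activity) => void,
): SkillHttpResponse {
  const reply: Activity = { type: "message", text: `echo: ${request.text}` };

  if (request.deliveryMode === "expectReplies") {
    // Reply is buffered and returned in the body of the same HTTP response.
    return { status: 200, activities: [reply] };
  }

  // Default: reply goes out-of-band through the callback path.
  sendCallback(reply);
  return { status: 200, activities: [] };
}

const callbackLog: Activity[] = [];
const record = (reply: Activity): void => {
  callbackLog.push(reply);
};

const normalResponse = callSkill({ type: "message", text: "hi" }, record);
const inlineResponse = callSkill(
  { type: "message", text: "hi", deliveryMode: "expectReplies" },
  record,
);

console.log(normalResponse.activities.length); // 0: the reply went through the callback
console.log(inlineResponse.activities.length); // 1: the reply came back inline
console.log(callbackLog.length); // 1: only the normal-mode call used the callback
```

The trade-off is that the root bot only sees the replies once the skill's HTTP response returns, so this behaves like a workaround rather than a fix for the underlying ID lookup.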

    • voonsionglum
      Brass Contributor
      Good to know we are not the only ones having this issue 🙂 We upgraded our root bot and skill dialog bot to use the latest 4.15.0 npm packages. We scaled out both the root and skill dialog bots to 2 instances. Sadly, we still face the same error.

      Our plan was to redeploy the 4.15 dialog root bot and dialog skill bot samples and scale out the instances to 2. We have been having some trouble overwriting the existing dialog root bot's web app with the 4.15 sample. I'll update again when we get this resolved and test out scaling.

      We were not aware of the delivery mode option. Thank You for bringing that to our attention. A workaround is better than nothing 🙂
    • HunaidHanfee-MSFT
      Microsoft

      Slacked2737 - Hello, did you check by installing the manifest I shared? Also, can you elaborate on the repro steps so we can make sure we are not missing anything?

      • voonsionglum
        Brass Contributor
        HunaidHanfee-MSFT, would it be possible to have access to the actual web apps behind the manifest you have shared? We would like to view the code via Kudu console and examine the scale out settings that have been applied to both the root bot and skill dialog bot.
