Thanks gabrielcorrea
Yes, this is a common question and scenario. For this process I would emphasize the "Running Different Template Versions" section of this blog when testing changes. I agree that testing capabilities are a concern, and this is an area where the functionality could be improved. I think the most important piece is being able to identify and track where nested templates are being used. That would help when gauging the impact of updating a task template that is consumed by a job template, which in turn is consumed by a stage template. Today there is no built-in traceability for where templates are used other than manually searching the codebase and tracing your way up. There is a feedback item here.
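To make the "different template versions" idea concrete, here is a minimal sketch of a consuming pipeline that pins its shared template repository to a development branch while testing; the repository name, branch, and template path are hypothetical placeholders:

```yaml
# Consuming pipeline: pin the shared template repo to the branch under test.
resources:
  repositories:
    - repository: templates                   # alias used in the @templates references below
      type: git
      name: MyProject/pipeline-templates      # hypothetical template repository
      ref: refs/heads/feature/new-parameter   # development branch containing the changes

extends:
  template: stages/deploy-stages.yml@templates  # hypothetical stage template path
  parameters:
    environment: dev   # keep test runs pointed at non-production values
```

Once the changes are validated, the `ref` is switched back to the main branch (or a tagged version) so consumers pick up the released template.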
In the meantime, I'd recommend adopting practices such as giving any new parameter a default that is either '' or the default value of the underlying task input, i.e. the least impactful default (for example, defaulting to dev rather than prod values). I'd also identify a few "business critical" pipelines and/or a few common pipeline scenarios, such as deploying a .NET app, and test those with manual runs against the updates contained in the development branch.
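As an illustration of the safe-default approach, here is a hypothetical step template where a newly added parameter defaults to '', so existing consumers keep their current behavior without any changes on their side:

```yaml
# steps/build-dotnet.yml (hypothetical) - new optional parameter with a non-breaking default
parameters:
  - name: buildConfiguration
    type: string
    default: 'Release'
  - name: extraBuildArguments   # newly added parameter
    type: string
    default: ''                 # '' means existing callers see no change in behavior

steps:
  - task: DotNetCoreCLI@2
    displayName: 'Build solution'
    inputs:
      command: 'build'
      arguments: '--configuration ${{ parameters.buildConfiguration }} ${{ parameters.extraBuildArguments }}'
```

Existing pipelines that reference this template without passing `extraBuildArguments` continue to build exactly as before, while the pipelines being tested can opt in to the new behavior.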
As for that proposal, I am hesitant about using a custom task to encapsulate the logic that creates a stage, job, or task. This approach can quickly lead to extra complication and a lack of clarity, and it introduces another required skill set (PowerShell/Bash) for adoption. Where I have seen it used, the added complexity risks discouraging the Inner Source team from staying engaged. What you describe sounds similar to how the CARML project handles CI/CD, and one of the most common areas for improvement I have seen and heard about with CARML is its CI/CD process, particularly around making changes and scaling to additional deployments.