Microsoft Mechanics Blog

Introducing the Microsoft Purview Unified Catalog

Zachary-Cavanell
Feb 27, 2025

Get control of your data

Locate, access, and trust the data you need using Microsoft Purview’s Unified Catalog. By leveraging AI-powered search and automated quality checks, you can use data across your organization while staying compliant and meeting privacy standards. With streamlined approval workflows, request and gain access to data quickly, collaborate with stakeholders, and ensure data quality across projects.

Daniel Hidalgo, Microsoft Purview Senior Product Manager, joins Jeremy Chapman to share how to manage data governance, drive better decisions, and support meaningful AI outcomes with Microsoft Purview.

Simplify collaboration.

Deliver trusted data as a business user, data steward, or data owner. Check out the Microsoft Purview Unified Catalog.

Automate data quality checks.

Ensure consistent, quality data with automated scans and column-level rules for accuracy. Take a look at Microsoft Purview.

Improve governance with actionable insights.

Track data quality, compliance, and health trends using Microsoft Purview’s built-in reporting. See it here.

QUICK LINKS:

00:00 — Microsoft Purview Data Governance
00:29 — Data visibility and access
01:46 — Universal data catalog
03:23 — Crowdsourcing approach
04:24 — Business user demo
06:58 — How it works
08:05 — Create new governance domain
08:53 — Define data products
09:39 — Automate data quality checks
10:33 — Day-to-day management
11:57 — Wrap up

Link References

Get started at https://aka.ms/PurviewDataGovernance

Unfamiliar with Microsoft Mechanics?

Microsoft Mechanics is Microsoft’s official video series for IT. You can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.

Video Transcript:

- If you struggle with trusting that you have the right data to build a business report or need to find the right data but don’t know if it exists, now with the new Unified Catalog in Microsoft Purview, you have a centralized way to easily see and understand the data across your organization with all the controls in place to responsibly access it and use it. And joining me today to share more about this is Daniel Hidalgo from the Microsoft Purview Product Team. Welcome to the show.

- Thanks. It’s good to be back.

- So with Microsoft Purview, we’ve had capabilities to classify sensitive data and also adapt information protections based on user risk along with policies then to keep your organization compliant for a while. So how does this new solution then change the picture?

- So we’re tackling how to make quality data easily accessible in your organization without breaking privacy and regulatory requirements. In fact, with AI, you could say that we are now at an inflection point because quality data is pivotal to meaningful AI outcomes as well. This is the first solution of its kind that now gives you central visibility of your data wherever it lives across Microsoft and non-Microsoft endpoints, the ability as the person consuming the data to responsibly discover the data you need all using AI-powered natural language. And because data quality is built into the service, meaning it’s well described and complete, you can trust it. And also as a data consumer, you aren’t spending a third of your time cleaning up the data. You can get the specific data you need, know where it fits with other related data by understanding its lineage and to help with that freedom of discovery and access, we make it easy for you to abide by the usage expectations and policies set by your organization and everything is auditable. Importantly, the experience is non-technical as you consume data and we’re removing the curation and manual effort for making quality data accessible with the right controls for data governance.

- And this is good because I think for most of us, we’re just trying to get our work done and the whole notion then of data governance, it might seem complex and a lot of effort.

- Yeah, and there’s some truth to that and a lot of efforts fail at governance when they are too restrictive. In fact, we went through this exact journey at Microsoft before landing on a federated approach to build a unified data catalog where there is mutual incentive for the relevant roles inside the organization, to create quality data that can equally be discovered, accessed and governed, from business users who have the domain knowledge and want an easy way of navigating what’s available to them, to data stewards who can provide informed perspectives for what would make the data useful by acting like a bridge between business users and data owners, where data owners are ultimately accountable for delivering a usable data product, one that is well described, including recommended usage, measuring data quality and its trending score over time, defining access requirements, for example, by specifying purpose of use and time limits, and ultimately, they have approval over access requests. And then the Central Data Office is responsible for overseeing how data is discovered and used by each role. It owns the data hierarchy and permissions across the roles I just mentioned. And they own the top-level access policies that are inherited by data products to meet specific company standards.

- So it’s kind of like a crowdsourcing approach then where we’re leveraging the collective knowledge of consumer stakeholders to the curators of data and also the controllers of it really to participate in ensuring and maintaining quality data coverage and also usage.

- Right, and it’s self-service, where all these roles have a say in the right data elements, business metrics, data types, and can request access. Again, everyone plays their part and it’s gaining a lot of traction with over 7,000 organizations using it already.

- So should we think of this then as a data catalog but with more guardrails?

- Think of it as a unified catalog, the catalog of catalogs because we can connect to data sources wherever they live individually and as part of any existing catalog. When you connect to dozens of available non-Microsoft services, including Databricks and Snowflake, the information from their respective catalogs will come through. And of course, we’re integrated with Microsoft Fabric, so we can bring in data from OneLake across the data estate and collections you choose, once you register it with the service.

- So can we see what it looks like then in action?

- Sure, so I’m going to start with the experience as an everyday business user. I’m on the data product search page where you can see the different governance domains that are already publishing data, what you may have already accessed, and which data you have access to today. You can search for data products based on what you’re trying to find using natural language. I’m going to look for “sales forecast data to validate our actual sales in 2024.” This is using semantic search. And you’ll see it gives me a ranked list of results. That said, the top result looks like it’s around budgets, but the items below seem thematically closer to what I’m looking for. But let’s select the top result to test it out. And immediately, I can see that the AI knew what I was looking for because budget can be a synonym for forecast. And you can see this data product is endorsed by the data owner. Here I can also see that there is a Forecast data asset as part of our Lakehouse, as well as the Sales Planning Report in Fabric. In fact, I can see all the different assets that make up this data product, including the other items coming from Snowflake and even Databricks. And as I go even deeper into each of the assets, like this Customer table here in Snowflake to quickly understand the nature of the information, and I can drill into more details to see its level of data quality, with a trending score and below that, I can also see the data lineage. Now, when I navigate back to the Data Product page, I also have links to related glossary terms, critical data elements, and who I can contact to support my use of this data. Once I know that this is the data I want, I can click “request access,” which brings me to a standard form where I need to select the reason why I need this data and include a business justification. 
So I’ll say “meeting for Q1.” Next, I’m prompted to agree with the specific terms of use defined for this data product, as a prerequisite for the approvers to grant me access to the data. And when I finish and confirm, it will run through the established approval process for the data product and data owners now have all the information they need to approve or deny my access request. And with that flow completed, I now have access to the data.

- What I really like about this approach is that often finding the right data in your organization and who owns it, that’s kind of internal or tribal knowledge. So that process, it’s now streamlined and you can easily discover if the data that you need exists at the source and also request access with an approval workflow and there are fewer hoops to jump through compared to having to manually figure this out on your own. So what’s behind all of this then for it to work?

- So now that I’ve shown you the experience of consuming the data, let me explain the architecture behind this, whether you’re a data steward, data owner, or a data governance administrator in the Central Data Office. Microsoft Purview Data Governance is built around what’s called Governance Domains, which are boundaries that you define for data policies, roles, and ownership, and how your data is discovered. These can be based on business departments or functions, or subject areas such as products or customer records, or a mix of both. And if we zoom in to the governance domain for sales, you’ll see that the domain then consists of different data products. Then zooming into a data product like transactions, you’ll see that it can contain one or more data assets, which can be tables, files, and reports that are all packaged together into a single data product to be shared with people and groups in your organization. And as you saw, data products are discovered using the Unified Catalog as a central place where people can go to find the data assets they need and request access.
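The hierarchy described here (a governance domain contains data products, and each data product packages one or more data assets) can be pictured as a small nested data model. The Python sketch below is illustrative only: the class names, fields, and the `assets_of_kind` helper are hypothetical and are not part of any Microsoft Purview API. The sample names are borrowed loosely from the demo, and how they are grouped here is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataAsset:
    """A single table, file, or report (hypothetical model)."""
    name: str
    kind: str    # e.g. "table", "file", "report"
    source: str  # e.g. "Snowflake", "Fabric"


@dataclass
class DataProduct:
    """One or more data assets packaged together for sharing."""
    name: str
    assets: List[DataAsset] = field(default_factory=list)


@dataclass
class GovernanceDomain:
    """A boundary for policies, roles, and ownership."""
    name: str
    products: List[DataProduct] = field(default_factory=list)

    def assets_of_kind(self, kind: str) -> List[str]:
        # Walk domain -> data products -> data assets and collect matches.
        return [
            f"{product.name}/{asset.name}"
            for product in self.products
            for asset in product.assets
            if asset.kind == kind
        ]


# Sample data: the grouping is assumed for illustration.
sales = GovernanceDomain(
    name="Sales",
    products=[
        DataProduct(
            name="Transactions",
            assets=[
                DataAsset("Customer", "table", "Snowflake"),
                DataAsset("Sales Planning Report", "report", "Fabric"),
            ],
        )
    ],
)

tables = sales.assets_of_kind("table")
```

Modeling the data product as the unit that groups assets mirrors the point made in the walkthrough: people discover and request access to the data product rather than to each table or report individually.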

- So how do you go about setting all this up then as a central data team?

- It’s pretty straightforward. Once you’ve set up a few roles and permissions and mapped a few data sources, you can start by creating a new governance domain. To save time, I’ll walk through a sales domain that was previously created. Then under roles, each governance domain will have the right people in the right roles assigned, along with other details like corresponding policies like required attestations that we saw before when requesting data access or triggering the built-in manager approval flow. You can also make sure the right glossary terms are assigned to each governance domain, like terms you’re seeing here for sales, as well as any critical data elements as items you want to pay close attention to, like the sensitive information here for customers. Once you have a governance domain configured, you can start defining your data products. Here you can see all of the data products in our sales domain. And for each, you’ll see all of the items that belong to the data products and can include different data or file types. You can easily add more data assets, even using AI-powered suggestions like you’re seeing here where the AI has found relevant data corresponding with customer churn. And I can select this table and add it. Related to this data, you can also configure policies which might be different from the inherited policies at the governance domain level, and where you can for example set access time limits, or define access request approvers for cases where you can define groups or individuals who can grant access.
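One way to picture the relationship between policies inherited from a governance domain and policies configured on a data product is as a field-by-field override. The Python sketch below assumes that a value set on the data product wins and that anything left unset falls back to the domain. That override rule, the `AccessPolicy` class, its field names, and the sample values are all hypothetical; the walkthrough does not specify how Microsoft Purview resolves the two levels.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical policy record; None means 'not set at this level'."""
    max_access_days: Optional[int] = None
    approvers: Optional[Tuple[str, ...]] = None
    requires_attestation: Optional[bool] = None


def effective_policy(domain: AccessPolicy, product: AccessPolicy) -> AccessPolicy:
    """Merge two levels: product values override, unset values inherit."""
    merged = {}
    for f in fields(AccessPolicy):
        product_value = getattr(product, f.name)
        merged[f.name] = (
            product_value if product_value is not None else getattr(domain, f.name)
        )
    return AccessPolicy(**merged)


# Assumed example: the domain sets defaults, the product shortens the time limit.
sales_domain_policy = AccessPolicy(
    max_access_days=90,
    approvers=("sales-data-owners",),
    requires_attestation=True,
)
churn_product_policy = AccessPolicy(max_access_days=30)

policy = effective_policy(sales_domain_policy, churn_product_policy)
```

In this sketch the data product ends up with a 30-day access limit while keeping the approvers and attestation requirement defined at the domain level.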

- And earlier we saw trending data quality scores, so what would feed into that?

- There are a few things you can do to ensure data is accurate and conforms with the right data types that you set. For example, in data quality, which is built in, you can navigate to each data product. This way you can make sure that within a data asset, there are rules defined for its columns and fields to use the right format, like matching data types. Often, fields like this column for date are simply set to a “string” type, where it should align to date and time formatting. And it’s also common for integer columns to be set as strings too. So you can then use rules to ensure that data in those columns matches what’s required. Then automated quality scans can be set up to run on a schedule, making sure the quality doesn’t drift over time and the data in each column remains in a healthy state.
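The idea of a column-level rule can be illustrated with a short, generic check: given values stored as strings, test whether each one parses as the intended type and report the share that passes. This Python sketch is not how Microsoft Purview implements data quality rules or scoring; the function names, the year-month-day date format, the sample rows, and the pass-rate score are assumptions chosen for illustration.

```python
from datetime import datetime
from typing import Callable, Dict, List


def is_date(value: str) -> bool:
    """True if the string parses as a year-month-day date (assumed format)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def is_integer(value: str) -> bool:
    """True if the string parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def column_score(values: List[str], rule: Callable[[str], bool]) -> float:
    """Fraction of values that satisfy the rule (1.0 for an empty column)."""
    if not values:
        return 1.0
    return sum(1 for v in values if rule(v)) / len(values)


# Made-up rows where both columns are stored as strings.
rows = [
    {"order_date": "2024-01-15", "quantity": "3"},
    {"order_date": "15/01/2024", "quantity": "7"},      # wrong date layout
    {"order_date": "2024-02-30", "quantity": "four"},   # impossible date, non-numeric
    {"order_date": "2024-03-01", "quantity": "12"},
]

rules: Dict[str, Callable[[str], bool]] = {
    "order_date": is_date,
    "quantity": is_integer,
}

scores = {
    column: column_score([row[column] for row in rows], rule)
    for column, rule in rules.items()
}
```

Running a check like this on a schedule and recording the scores each time is one simple way to see whether quality is drifting, which is the role the scheduled scans play in the walkthrough.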

- So now that you’ve basically highlighted everything to set everything up, once the data then starts flowing in, people start using Data Products, what does the day-to-day management experience look like?

- So there can be more governance domains and data products to configure or refine, but beyond that, you can also use built-in reporting to fine tune your policies and make sure your people are getting the most out of their data. Under controls, you’ll find score details based on how well each score complies with your data standards, along with scoring trends for each category. There is also a centralized view of all the recommended actions tailored to you as a data owner. You’ll use these actions to optimize your data health. For example, adding detailed descriptions, data quality controls, missing classifications, and more. And there are also a number of detailed visual reports with scores across categories as well as scores per governance domain, as well as data health for each, scores by control type, and even health score trends over time to help track your progress and prioritize what to do next. So now you have centralized visibility and understanding of your entire data estate, along with how data is being used. And your business users can discover, access, and consume the right data with the right quality and policy controls in place, so everyone wins.

- So it’s really great to see how with the new Unified Catalog in Microsoft Purview, we’re making data governance something that everyone plays a part in. So for anyone who’s watching right now looking to get started, what do you recommend?

- It’s simple. You can find everything you need at aka.ms/PurviewDataGovernance.

- Thanks Daniel, and thank you for joining us today. Be sure to subscribe for the latest updates, and we’ll see you again soon.

Updated Feb 19, 2025
Version 1.0