Learning Guide: Introduction to Microsoft Purview for Data Governance (Fall 2022 Update)

With Microsoft Purview for Data Governance (formerly Azure Purview) now generally available, it is a great time to review some of its key features and provide a learning path made up of free online resources to help you get up to speed.

This article will take you through a curated overview of the application, focusing on Data Governance and providing links to more detailed material.

What is Azure Purview?

Azure Purview is a cloud-based data governance service that helps you catalog, manage, and govern your on-premises, multi-cloud, and software-as-a-service (SaaS) data. You can create a holistic, up-to-date map of your data landscape, built with automated data discovery, sensitive data classification, and end-to-end data lineage.

Data Governance requires a business process first and foremost, but that business process needs an application that simplifies the implementation. If the system is too difficult to use, people will not use it, and you will end up with shadow processes that avoid the rules and expose the organization to compliance trouble. So, Data Governance is a team sport that needs a flexible tool to bring this all together, especially in the hybrid environments companies use today.

Purview’s key benefit is that it is a cloud-native tool that discovers, automatically catalogs, and tags data, which helps you build a process around governing your Azure data estate.

Pictured below is the landing page.

More Information:

Looking for Azure Purview demo videos?
The following provide extensive demonstrations of the platform. The second one, by the Microsoft Security Community, goes into detail on the Microsoft 365 sensitivity labels that come through the Microsoft compliance integration.

Setting the Stage: The Data Governance Problem

In the simplest terms, data governance is about managing data as a strategic asset. It involves ensuring that there are controls around data, such as its content, structure, use, and safety. A great example of this is the need to track and provide guidance around personally identifiable information, which must be kept secure to satisfy compliance requirements and regulators.

Data Growth and Complexity

As modern business data usage evolves, it embraces advanced analytics, artificial intelligence, and machine learning. This need is driving the amount, velocity, and variety of data in play. With all that data comes a wealth of new possibilities and a new set of challenges. What matters here is our ability to optimize the management and governance of ever-greater amounts of data so that we are successful. And, especially with regulations such as GDPR, mistakes can be costly both financially and in reputation.

As we have continued our move to the cloud, the amount of data we are willing to keep has grown. With blob storage being far cheaper than a new SAN, we are keeping not only data with a high business value but also data that may have value later on.

With machine learning, AI, and more analytics opportunities, we are keeping data that we want to use to solve business problems that we do not yet know about.

Opportunity vs Risk of Data

I remember one of my former manager’s favorite comments back in the day: “We do not know today what the questions are that we want our data to answer.” Well, now, with AI, Machine Learning, and cheap storage, we can keep more data for longer. But again, this is a balancing act. We have to balance the opportunities we see now and in the future with the risks of more data accessible to more people.

When we were setting up clients with data cataloging, we had a couple of searches that we would take management aside and review, such as executive salary, layoffs, popular movies, and Napster content. I was always able to find something that shocked them.

On the data side, the examples were just as telling: how many copies of their customer table they had, how outdated those copies could be, and how much personal customer data happened to be shared around the company. Remember, you are only an Excel download away from a data breach!

The key takeaway is that without a plan, you invite issues. Unfortunately, this was usually the best way to start the governance discussion.

Data Governance Resources

Let’s review the parts that make up Azure Purview.

Purview Data Map

The Data Map is the processing heart of the service. It provides the automation, scanning, and classification of data sources you wish to catalog. The service is multi-cloud, with Amazon S3 coming soon. The list below shows the Azure sources currently available in preview; other connectors will be added as time goes on.

The following sources are available in preview as of Feb 2021. The self-hosted integration runtime (SHIR) allows on-premises data sources to be scanned.

  • On-premises SQL Server SQL Auth UX
  • Azure Synapse Analytics (formerly SQL DW)
  • Azure SQL Database (DB)
  • Azure SQL Database Managed Instance
  • Azure Blob Storage
  • Azure Data Explorer
  • Azure Data Lake Storage Gen1 (ADLS Gen1)
  • Azure Data Lake Storage Gen2 (ADLS Gen2)
  • Azure Cosmos DB

In addition to these sources, the following file types are supported for scanning, schema extraction, and classification where applicable:

  • Structured file formats supported by extension: AVRO, ORC, PARQUET, CSV, JSON, PSV, SSV, TSV, TXT, XML
  • Document file formats supported by extension: DOC, DOCM, DOCX, DOT, ODP, ODS, ODT, PDF, POT, PPS, PPSX, PPT, PPTM, PPTX, XLC, XLS, XLSB, XLSM, XLSX, XLT
  • Purview also supports custom file extensions and custom parsers.

Azure Purview will also scan within certain files, sampling the data to provide metadata and data types.

Purview has three scanning levels:

  • L1 scan: Extracts basic information and metadata like file name, size and fully qualified name
  • L2 scan: Extracts schema for structured file types and database tables
  • L3 scan: Extracts schema where applicable and subjects the sampled file to system and custom classification rules

For structured file types, Purview samples 128 rows in each column or 1 MB, whichever is lower. For document file formats, it samples 20 MB of each file. Document files larger than 20 MB are not subject to a deep scan, which means they are not classified. In that case, Purview captures only basic metadata like file name and fully qualified name.
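As a rough illustration of the scan levels and sampling limits described above, the following Python sketch decides how deeply a file would be inspected. The names `plan_scan` and `ScanPlan`, and the choice to fall back to an L1 scan for unrecognized extensions, are assumptions made for this example; they are not part of any Purview API.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Extension lists copied from the supported-format bullets in this article.
STRUCTURED = {"avro", "orc", "parquet", "csv", "json", "psv", "ssv", "tsv", "txt", "xml"}
DOCUMENT = {
    "doc", "docm", "docx", "dot", "odp", "ods", "odt", "pdf", "pot", "pps",
    "ppsx", "ppt", "pptm", "pptx", "xlc", "xls", "xlsb", "xlsm", "xlsx", "xlt",
}

MB = 1024 * 1024
DOCUMENT_DEEP_SCAN_LIMIT = 20 * MB  # documents above this size get metadata only


@dataclass(frozen=True)
class ScanPlan:
    level: str   # "L1", "L2", or "L3"
    detail: str  # human-readable description of what gets sampled


def plan_scan(file_name: str, size_bytes: int) -> ScanPlan:
    """Return a hypothetical scan plan for one file, following the rules in the text."""
    ext = PurePosixPath(file_name).suffix.lstrip(".").lower()
    if ext in STRUCTURED:
        return ScanPlan("L3", "schema + classification on 128 rows per column or 1 MB, whichever is lower")
    if ext in DOCUMENT:
        if size_bytes > DOCUMENT_DEEP_SCAN_LIMIT:
            return ScanPlan("L1", "basic metadata only (file exceeds 20 MB)")
        return ScanPlan("L3", "classification on a sample of up to 20 MB")
    # Assumption: anything unrecognized only gets basic metadata.
    return ScanPlan("L1", "basic metadata only (unrecognized extension)")


if __name__ == "__main__":
    for name, size in [("sales.parquet", 5 * MB), ("report.pdf", 25 * MB), ("notes.docx", 2 * MB)]:
        print(name, plan_scan(name, size))
```

In practice, custom file extensions and custom parsers (mentioned above) would change the last branch; the sketch leaves that out to stay short.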

More Information:

Purview Data Catalog

Once the metadata has been gathered and the discovery is complete, the data catalog is built. Each scan discovers the metadata attached to a file, which is then used to help users find data in their data estate through search. The Purview landing page provides various paths to information, including a search bar. As pictured below, entering a search term brings up multiple suggestions to select from; pressing Enter shows a complete set of results on a filterable page.

For example, you can easily find a dataset called DimCustomer in the SQL database. As shown below, various filters, such as the Browse by Asset Type experience, narrow your navigation down to the SQL Server. You can then select the DimCustomer object, as pictured below, to see the record entry.

A data consumer can discover data through an explorer view that uses the familiar hierarchical namespace of each data source. Once the data source is registered and scanned, the Data Map extracts information about its structure; the hierarchical namespace is shown below. This information is used to build the browsing experience for data discovery.
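To make the idea of a browsable hierarchical namespace concrete, here is a minimal Python sketch that folds a flat list of asset names into a nested tree. The slash-separated names, the server and database names, and the helper functions `build_namespace_tree` and `render` are all hypothetical; Purview's own qualified-name format and internal representation are not shown here.

```python
from typing import Any, Dict, List


def build_namespace_tree(qualified_names: List[str]) -> Dict[str, Any]:
    """Fold slash-separated names into a nested dict, one level per path segment."""
    root: Dict[str, Any] = {}
    for name in qualified_names:
        node = root
        for part in name.strip("/").split("/"):
            node = node.setdefault(part, {})
    return root


def render(tree: Dict[str, Any], indent: int = 0) -> List[str]:
    """Return the tree as indented lines, sorted alphabetically at each level."""
    lines: List[str] = []
    for key in sorted(tree):
        lines.append("  " * indent + key)
        lines.extend(render(tree[key], indent + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical assets, loosely modeled on the DimCustomer example above.
    assets = [
        "sqlserver01/AdventureWorksDW/dbo/DimCustomer",
        "sqlserver01/AdventureWorksDW/dbo/DimDate",
        "sqlserver01/Staging/dbo/CustomerImport",
    ]
    print("\n".join(render(build_namespace_tree(assets))))
```

Each level of the resulting tree corresponds to one click in an explorer-style view: server, then database, then schema, then table.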

Data Lineage Example

Seeing the data workflow that brings data from the source through the transformations to the final dashboards will help you better understand your data.

Scanning your Power BI environment and Azure Synapse Analytics workspaces automatically publishes all discovered assets and their lineage to the Purview Data Map. You can also connect Azure Purview to Azure Data Factory instances to automatically collect data integration lineage.

As pictured below, you can get a view of what reports and visualizations are created. This allows you to determine which analytics and reports exist and examine the data flow from source to destination.

More Information:

  • Data lineage in Azure Purview Data Catalog client – This article provides an overview of data lineage in Azure Purview Data Catalog. It also details how data systems can integrate with the catalog to capture the lineage of data. Purview can capture lineage for data in different parts of your organization’s data estate and at varying levels of preparation.
  • Azure Purview Data Catalog lineage user guide – One of the platform features of Azure Purview is the ability to show the lineage between datasets created by data processes. Systems like Data Factory, Data Share, and Power BI capture data lineage as it moves. Custom lineage reporting is also supported via Atlas hooks and REST API.
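For the custom lineage reporting mentioned above, the general shape of an Apache Atlas style lineage entity can be sketched in Python as follows. This only builds the JSON payload: a `Process` entity whose `inputs` and `outputs` reference other assets by qualified name. The asset type name, the qualified names, and the helper functions are placeholders assumed for illustration, and the actual endpoint, authentication, and submission step are deliberately left out; check the Purview REST documentation before relying on any of it.

```python
import json
from typing import Any, Dict, List


def object_ref(type_name: str, qualified_name: str) -> Dict[str, Any]:
    """Reference an existing asset by type and qualified name (Atlas object-id style)."""
    return {"typeName": type_name, "uniqueAttributes": {"qualifiedName": qualified_name}}


def lineage_process(
    name: str,
    qualified_name: str,
    inputs: List[Dict[str, Any]],
    outputs: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Build an Atlas-style Process entity that links input assets to output assets."""
    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                "name": name,
                "qualifiedName": qualified_name,
                "inputs": inputs,
                "outputs": outputs,
            },
        }
    }


# Placeholder type and qualified names, assumed for this example only.
source = object_ref("azure_sql_table", "example://source-db/dbo/Customer")
target = object_ref("azure_sql_table", "example://warehouse/dbo/DimCustomer")
payload = lineage_process("load_dim_customer", "example://jobs/load_dim_customer", [source], [target])

if __name__ == "__main__":
    print(json.dumps(payload, indent=2))
```

The point of the sketch is the structure: lineage is expressed as a process node sitting between the datasets it reads and the datasets it writes, which is what the lineage view then draws as a graph.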

Purview Data Insights

Insights is one of Purview’s key pillars, where reporting, scanning, and logging reside, and it allows you to surface what is happening within your data estate.

Let’s say you are responsible for your data security. You can extend Microsoft 365 sensitivity labels to assets in Azure Purview and create or select the labels you want to apply to your data. Then, in the Insights reports, you can use different filters to slice and dice this information in different ways. This gives a detailed overview of your data estate from a compliance standpoint.

The sensitive information types in the Microsoft 365 Compliance Center are the same sensitive information types that are now being brought to Azure.

The feature provides customers with a single-pane-of-glass view into their catalog and further aims to provide specific insights to data source administrators, business users, data stewards, data officers, and security administrators. Currently, Purview has the following Insights reports that will be available to customers at public preview. Follow the links below for more detail and sample Insights reports.

Azure Purview Quick Starts and Tutorials

