Learning Path: Introduction to Azure Purview

With the announcement of Azure Purview, it is a great time to review some of its key features and provide a learning path made up of free online resources to help you get up to speed. This article gives an overview of Azure Purview, sets the stage with the Data Governance problem it addresses, and provides links to the next steps.

What is Azure Purview

Azure Purview is a cloud-based data governance service that helps you catalog, manage, and govern your on-premises, multi-cloud, and software-as-a-service (SaaS) data. You can create a holistic, up-to-date map of your data landscape, built with automated data discovery, sensitive data classification, and end-to-end data lineage.

Data Governance requires a business process first and foremost, but that business process needs an application that simplifies the implementation. If the system is too difficult to use, people will not use it, and you will end up with shadow processes that avoid the rules and expose the organization to compliance trouble. So, Data Governance is a team sport that needs a flexible tool to bring this all together, especially in the hybrid environments companies are using today.

Purview’s key benefit is that it is a cloud-native tool that discovers, automatically catalogs, and tags data, which helps you build a process around data governance of your Azure data estate.

Pictured below is the landing page.

More Information:

Looking for great Azure Purview demo videos?
The following videos provide extensive demonstrations of the platform. The second one, by the Microsoft Security Community, explores in detail the Microsoft 365 sensitivity labels that come through the Microsoft Compliance connection.

  • Demo: Enable unified data governance with Azure Purview | Azure Friday – Azure Purview is a unified data governance service that helps you manage and govern your on-premises, multi-cloud, and software-as-a-service (SaaS) data. Gaurav Malhotra joins Scott Hanselman to show how easy it is to create a holistic, up-to-date map of your data landscape with automated data discovery, sensitive data classification, and end-to-end data lineage so that you can empower your data consumers to find valuable, trustworthy data.

  • Demo: Azure Purview webinar: Introduction to Azure Purview – A 50-minute webinar covering Azure Purview which also includes an extensive demo.

Setting the Stage: The Data Governance Problem

In the simplest terms, data governance is about managing data as a strategic asset. It involves ensuring that controls are in place around data, covering its content, structure, use, and safety. A great example is the need to track and provide guidance around the use of personally identifying information, which must be kept secure to satisfy compliance requirements and regulators.

Data Growth and Complexity

As modern business data usage evolves, it embraces advanced analytics, artificial intelligence, and machine learning. This need is driving up the amount, velocity, and variety of data in play. With all that data comes a wealth of new possibilities and a new set of challenges. What matters here is our ability to optimize the management and governance of ever-greater amounts of data so that we are successful. With regulations such as GDPR in particular, mistakes can be costly not only in reputation but also financially.

As we have continued our move to the cloud, the amount of data we are willing to keep has grown. With blob storage being far cheaper than a new SAN, we are keeping not only data with high business value but also data that may have value later on.

With machine learning, AI, and more analytics opportunities, we are keeping data that we hope will help solve business problems we do not yet know about.

Opportunity vs Risk of Data

I remember that, back in the day, one of my former manager’s favourite comments was, "we do not know today what the questions are that we want our data to answer." Now, with AI, machine learning, and cheap storage, we can keep more data for longer. But again, this is a balancing act. We have to balance the opportunities that we see now and in the future with the risks of having more data accessible by more people.

When we were setting up clients with data cataloguing, we had a couple of searches that we would take management aside to review, such as executive salary, layoffs, popular movies, and Napster content. I was always able to find something that shocked them.

On the data side, typical examples were how many copies of the customer table they had, how out of date those copies could be, and the shock of seeing what personal customer data was being shared around the company. Remember, you are only an Excel download away from a data breach!

The key takeaway is that without a plan, you just invite issues. This was usually the best way to start the governance discussion.

Data Governance Resources

Let’s review the parts that make up Azure Purview.


Purview Data Map

Data Map is the processing heart of the service. It provides the automation, scanning, and classification of the data sources you wish to catalog. The service is multi-cloud, with Amazon S3 coming soon. The listing below shows the Azure sources currently available in preview, with other connectors to be added as time goes on.

The following sources are available currently (Feb 2021) in preview. The on-premises data sources are available when using the self-hosted integration runtime (SHIR).

  • On-premises SQL Server SQL Auth UX
  • Azure Synapse Analytics (formerly SQL DW)
  • Azure SQL Database (DB)
  • Azure SQL Database Managed Instance
  • Azure Blob Storage
  • Azure Data Explorer
  • Azure Data Lake Storage Gen1 (ADLS Gen1)
  • Azure Data Lake Storage Gen2 (ADLS Gen2)
  • Azure Cosmos DB

In addition to these sources, the following file types are supported for scanning, schema extraction and classification where applicable:

  • Structured file formats supported by extension: AVRO, ORC, PARQUET, CSV, JSON, PSV, SSV, TSV, TXT, XML
  • Document file formats supported by extension: DOC, DOCM, DOCX, DOT, ODP, ODS, ODT, PDF, POT, PPS, PPSX, PPT, PPTM, PPTX, XLC, XLS, XLSB, XLSM, XLSX, XLT
  • Purview also supports custom file extensions and custom parsers.
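As a rough illustration of the extension-based grouping above, the following sketch sorts file names into the two families listed. It is not Purview code; the function name `file_category` and the "other" bucket are assumptions made for this example.

```python
from pathlib import PurePosixPath

# Extension lists copied from the bullets above (lower-cased).
STRUCTURED = {"avro", "orc", "parquet", "csv", "json", "psv", "ssv", "tsv", "txt", "xml"}
DOCUMENT = {
    "doc", "docm", "docx", "dot", "odp", "ods", "odt", "pdf", "pot", "pps",
    "ppsx", "ppt", "pptm", "pptx", "xlc", "xls", "xlsb", "xlsm", "xlsx", "xlt",
}


def file_category(name: str) -> str:
    """Return 'structured', 'document', or 'other' based on the file extension.

    'other' is a hypothetical bucket for anything that would need a custom
    extension or custom parser.
    """
    ext = PurePosixPath(name).suffix.lstrip(".").lower()
    if ext in STRUCTURED:
        return "structured"
    if ext in DOCUMENT:
        return "document"
    return "other"


for path in ["sales/2021/orders.PARQUET", "hr/policy.docx", "images/logo.png"]:
    print(path, "->", file_category(path))
```

The lookup is case-insensitive on purpose, since file extensions in a data lake are rarely consistent.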

Azure Purview will also scan within certain files, sampling the data to provide metadata and data types.

Purview has three scanning levels:

  • L1 scan: Extracts basic information and metadata like file name, size and fully qualified name
  • L2 scan: Extracts schema for structured file types and database tables
  • L3 scan: Extracts schema where applicable and subjects the sampled file to system and custom classification rules
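To make the idea of a custom classification rule at the L3 level more concrete, here is a minimal sketch of a regex-based rule applied to sampled column values. It is a toy model, not Purview's rule engine: the `EMPLOYEE_ID` pattern, the `classify_column` function, and the 60% match threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical pattern for an internal employee ID such as "EMP-004211".
EMPLOYEE_ID = r"EMP-\d{6}"


def classify_column(values, pattern, min_match_ratio=0.6):
    """Return True when enough non-empty sampled values fully match the pattern.

    min_match_ratio is an assumed threshold, not a documented default.
    """
    non_empty = [v for v in values if v]
    if not non_empty:
        return False
    regex = re.compile(pattern)
    hits = sum(1 for v in non_empty if regex.fullmatch(v))
    return hits / len(non_empty) >= min_match_ratio


# Three of the four sampled values match, so the column is tagged.
sample = ["EMP-004211", "EMP-118903", "n/a", "EMP-550120"]
print(classify_column(sample, EMPLOYEE_ID))
```

Using a ratio rather than requiring every value to match lets a rule tolerate placeholders and dirty data in the sample.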

For structured file types and other files with a specific format, Purview samples 128 rows in each column or 1 MB, whichever is lower. For document file formats, it samples up to 20 MB of each file. Document files larger than 20 MB are not subject to a deep scan (that is, they are not classified). In that case, Purview captures only basic metadata like file name and fully qualified name.
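The sampling limits described above can be expressed as a small calculation. This is only a sketch of the stated rules, assuming binary megabytes (1 MB = 1024 * 1024 bytes) and an average row size supplied by the caller; the function names are hypothetical.

```python
MB = 1024 * 1024  # assumption: binary megabytes

STRUCTURED_ROW_CAP = 128        # rows sampled per column
STRUCTURED_BYTE_CAP = 1 * MB    # or 1 MB, whichever is lower
DOCUMENT_BYTE_CAP = 20 * MB     # documents above this size get no deep scan


def structured_sample_rows(avg_row_bytes: int) -> int:
    """Rows sampled from a structured file: 128 rows or 1 MB, whichever is lower."""
    rows_in_budget = STRUCTURED_BYTE_CAP // max(avg_row_bytes, 1)
    return min(STRUCTURED_ROW_CAP, rows_in_budget)


def document_deep_scan(size_bytes: int) -> bool:
    """True when a document file is small enough to be sampled and classified."""
    return size_bytes <= DOCUMENT_BYTE_CAP


print(structured_sample_rows(200))        # narrow rows: the row cap applies
print(structured_sample_rows(16 * 1024))  # wide rows: the 1 MB cap applies
print(document_deep_scan(25 * MB))        # too large: basic metadata only
```

For narrow rows the 128-row cap is the binding limit, while very wide rows hit the 1 MB budget first.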


Purview Data Catalog

Once the metadata scan and discovery are complete, the data catalog is built. Each scan discovers the metadata attached to a file, which helps users find data in their data estate through search. The Purview landing page provides various paths to information, including a search bar. As pictured below, entering a search term brings up multiple suggestions to select from, and hitting Enter shows a full set of results on a filterable page.

For example, you can easily find a dataset called DimCustomer in the SQL database. As shown below, various filters, such as the Browse by Asset Type experience, let you narrow down your navigation to the SQL Server, for example. You can then select the DimCustomer object, as pictured below, to see its record entry.

A data consumer can discover data through an explorer view that uses the familiar hierarchical namespace of each data source. Once the data source is registered and scanned, the Data Map extracts information about the structure of the data source (its hierarchical namespace, shown below). This information is used to build the browsing experience for data discovery.
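One way to picture how a browsing experience can be derived from scanned assets is to fold slash-separated names into a tree. The sketch below is a toy model only: the asset names are invented, the simplified path format is an assumption, and Purview's actual qualified names and storage are not shown here.

```python
def build_browse_tree(qualified_names):
    """Fold slash-separated asset names into a nested dict (one level per segment)."""
    tree = {}
    for name in qualified_names:
        node = tree
        for part in name.strip("/").split("/"):
            node = node.setdefault(part, {})
    return tree


def list_children(tree, path=""):
    """Return the sorted child names under a path, like opening a folder."""
    node = tree
    for part in filter(None, path.split("/")):
        node = node[part]
    return sorted(node)


# Hypothetical scanned assets.
assets = [
    "sqlserver01/AdventureWorksDW/dbo/DimCustomer",
    "sqlserver01/AdventureWorksDW/dbo/FactSales",
    "datalake01/raw/customers.csv",
]
tree = build_browse_tree(assets)
print(list_children(tree))
print(list_children(tree, "sqlserver01/AdventureWorksDW/dbo"))
```

Each call to `list_children` corresponds to expanding one level in an explorer view.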

Data Lineage Example

Seeing the data workflow that brings data from the source through the transformations to the final dashboards will help bring a better understanding of your data.

You can scan your Power BI environment and Azure Synapse Analytics workspaces, which automatically publishes all discovered assets and their lineage to the Purview Data Map. You can also connect Azure Purview to Azure Data Factory instances to automatically collect data integration lineage.
As pictured below, you can get a view of which reports and visualizations have been created. This allows you to determine which analytics and reports exist and to examine the data flow from source to destination.
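Lineage is essentially a directed graph from sources, through processes, to reports. The following sketch models that idea with a plain edge list and walks upstream from a dashboard. All asset names are made up for illustration, and this is not how Purview stores or queries lineage.

```python
from collections import deque

# Hypothetical lineage edges: (upstream asset, downstream asset).
LINEAGE = [
    ("AzureSQL.DimCustomer", "ADF.CopyCustomers"),
    ("ADF.CopyCustomers", "Synapse.CustomerMart"),
    ("Blob.web_logs.csv", "ADF.LoadWebLogs"),
    ("ADF.LoadWebLogs", "Synapse.CustomerMart"),
    ("Synapse.CustomerMart", "PowerBI.CustomerDashboard"),
]


def upstream(edges, target):
    """Return every asset that feeds the target, directly or indirectly."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    seen, queue = set(), deque([target])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def root_sources(edges, target):
    """Return the upstream assets that have no upstream of their own."""
    has_parent = {dst for _, dst in edges}
    return sorted(n for n in upstream(edges, target) if n not in has_parent)


print(root_sources(LINEAGE, "PowerBI.CustomerDashboard"))
```

Walking the graph in the other direction (downstream) answers the impact question: which reports are affected if a source table changes.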

More Information:

  • Search the Azure Purview Data Catalog – This article describes how to use the various search features in the Azure Purview Data Catalog.

  • Data lineage in Azure Purview Data Catalog client – This article provides an overview of data lineage in Azure Purview Data Catalog. It also details how data systems can integrate with the catalog to capture the lineage of data. Purview can capture lineage for data in different parts of your organization’s data estate and at varying levels of preparation.

  • How To Browse the Azure Purview Data catalog – A data consumer can discover data using the familiar hierarchical namespace for each of the data sources using an explorer view. Once the data source is registered and scanned, the Data map extracts information about the data source’s structure (hierarchical namespace). This information is used to build the browsing experience for data discovery.

  • Azure Purview Data Catalog lineage user guide – One of the platform features of Azure Purview is the ability to show the lineage between datasets created by data processes. Systems like Data Factory, Data Share, and Power BI capture data lineage as it moves. Custom lineage reporting is also supported via Atlas hooks and REST API.


Purview Data Insights

Insights is one of Purview’s key pillars. It is where reporting, scanning, and logging reside, allowing you to surface what is happening within your data estate.

Let’s say you are responsible for your data security. You can extend Microsoft 365 sensitivity labels to assets in Azure Purview and create or select the labels you want to apply to your data. Combined with the Insights reports, you can use different filters to slice and dice this information in different ways. This gives a detailed overview of your data estate from a compliance standpoint.

The sensitive information types in the Microsoft 365 Compliance Center are the same sensitive information types that are now being brought to Azure.

The feature provides customers with a single-pane-of-glass view into their catalog and further aims to provide specific insights to data source administrators, business users, data stewards, data officers, and security administrators. Currently, Purview has the following Insights reports that will be available to customers at public preview. Follow the links below for more detail and sample Insights reports.


Azure Purview Resources

The following link set provides more detail on the analytics available.


Azure Purview Quick Starts and Tutorials
