There were several interesting Data & AI announcements that came out of Microsoft Build 2021. I put together this summary of 11 of them, with links to both the announcements and other follow-on information.
Table of contents
- 1. Azure Purview – Register and Scan Azure SQL Managed Instance
- 2. Query Delta Lake files using T-SQL language in Azure Synapse Analytics
- 3. Apache Spark 3.0 support in Azure Synapse Analytics
- 4. How to Query Serverless SQL pool from an Apache Spark Scala notebook
- 5. How to use CI/CD integration to automate Synapse Workspace Deployment to multiple environments
- 6. Embed Microsoft Power BI analytics reports in a Jupyter Notebook
- 7. Azure Synapse Link for Dataverse is now in preview
- 8. Synapse Apache Spark Hardware Acceleration is now in preview
- 9. Azure Purview now supports Azure Database for MySQL and Azure Database for PostgreSQL
- 10. Public preview: Scan and view lineage of data stored in Hive Metastore Database using Azure Purview
- 11. Converging the Physical and Digital with Digital Twins, mixed reality, and metaverse apps
1. Azure Purview – Register and Scan Azure SQL Managed Instance
Register and scan Azure SQL Database Managed Instance – Azure Purview | Microsoft Docs – This article outlines how to register an Azure SQL Database Managed Instance data source in Purview and set up a scan on it. The Azure SQL Database Managed Instance data source supports the following functionality:
- Full and incremental scans to capture metadata and classification in Azure SQL Database Managed Instance.
- Lineage between data assets for ADF copy and dataflow activities.
2. Query Delta Lake files using T-SQL language in Azure Synapse Analytics
Query Delta Lake using T-SQL in Synapse Analytics (microsoft.com) – Azure Synapse enables you to query data stored in Apache Delta Lake format. This is one of the top feedback requests, and we are happy to announce that this feature is now available in public preview. This article shows how to run T-SQL queries on Delta Lake storage from your Synapse workspace.
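As a rough illustration of what such a query looks like, the sketch below builds a T-SQL statement that reads a Delta Lake folder through `OPENROWSET` with `FORMAT = 'DELTA'` on a serverless SQL pool. The helper name `delta_openrowset_query`, the alias `result`, and the storage URL are assumptions made for this example and are not taken from the linked article.

```python
def delta_openrowset_query(delta_folder_url: str, top: int = 10) -> str:
    """Return a T-SQL query that reads a Delta Lake folder via OPENROWSET.

    delta_folder_url is the root folder of the Delta table (hypothetical here).
    """
    if top < 1:
        raise ValueError("top must be a positive integer")
    # Double any single quotes so the URL stays a valid T-SQL string literal.
    safe_url = delta_folder_url.replace("'", "''")
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{safe_url}',\n"
        "    FORMAT = 'DELTA'\n"
        ") AS [result];"
    )


if __name__ == "__main__":
    # Placeholder storage account and folder; replace with your own.
    print(delta_openrowset_query(
        "https://mystorageaccount.dfs.core.windows.net/data/delta/covid/"
    ))
```

The generated statement can be pasted into a SQL script in Synapse Studio and run against the built-in serverless pool, assuming the workspace has access to the storage account.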
3. Apache Spark 3.0 support in Azure Synapse Analytics
Apache Spark 3.0 support in Azure Synapse Analytics – Microsoft Tech Community – The Apache Spark 3.0 runtime is now available in Azure Synapse. This version builds on top of existing open-source and Microsoft-specific enhancements to include additional unique improvements, which are listed in the linked post. The combination of these enhancements results in a significantly faster processing capability than the open-source Spark 3.0.2 and 2.4.
The public preview announced today starts with the foundation based on the open-source Apache Spark 3.0 branch with subsequent updates leading up to a Generally Available version derived from the latest 3.1 branch.
4. How to Query Serverless SQL pool from an Apache Spark Scala notebook
Spark notebook can read data from SQL pool (microsoft.com) – Azure Synapse Analytics provides multiple query runtimes that you can use to query in-database or external data. You have the choice to use T-SQL queries using a serverless Synapse SQL pool or notebooks in Apache Spark for Synapse analytics to analyze your data. You can also connect these runtimes and run the queries from Spark notebooks on a dedicated SQL pool.
In this post, you will see how to create Scala code in a Spark notebook that executes a T-SQL query on a serverless SQL pool. The example also shows how to configure the connection to the serverless SQL pool endpoint.
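The linked post uses Scala. The sketch below shows the same idea in PySpark, using Spark's generic JDBC reader rather than the post's exact code. The workspace name, database, and credentials are placeholders, and the helper names `serverless_jdbc_url` and `read_from_serverless` are hypothetical. It assumes SQL authentication and that the Microsoft SQL Server JDBC driver is available on the Spark pool.

```python
def serverless_jdbc_url(workspace: str, database: str = "master") -> str:
    """Build a JDBC URL for a Synapse serverless SQL pool endpoint."""
    # Serverless endpoints follow the pattern <workspace>-ondemand.sql.azuresynapse.net
    host = f"{workspace}-ondemand.sql.azuresynapse.net"
    return (
        f"jdbc:sqlserver://{host}:1433;database={database};"
        "encrypt=true;trustServerCertificate=false;"
        "hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30"
    )


def read_from_serverless(spark, workspace, database, query, user, password):
    """Run a T-SQL query on the serverless pool and return a Spark DataFrame.

    `spark` is the SparkSession that the notebook already provides.
    """
    return (
        spark.read.format("jdbc")
        .option("url", serverless_jdbc_url(workspace, database))
        .option("query", query)  # the T-SQL statement to execute remotely
        .option("user", user)
        .option("password", password)
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .load()
    )
```

In a notebook cell this might be called as `read_from_serverless(spark, "myworkspace", "SampleDB", "SELECT TOP 10 * FROM dbo.MyView", user, password)`, where every argument value is an illustrative placeholder.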
5. How to use CI/CD integration to automate Synapse Workspace Deployment to multiple environments
This article reviews how to apply CI/CD to cloud-based data solutions built on Azure Synapse Analytics. It shows how to use DevOps as the preferred software development methodology across three distinct environments: Development, UAT, and Production.
In this article we are going to demonstrate how you can use Azure Synapse Analytics integrated with an Azure DevOps Git repository to achieve these goals.
6. Embed Microsoft Power BI analytics reports in a Jupyter Notebook
Announcing Power BI in Jupyter notebooks | Microsoft Power BI Blog | Microsoft Power BI – Get your Power BI analytics in a Jupyter notebook with the new powerbiclient Python package.
Users can now embed Microsoft Power BI analytics reports in a Jupyter Notebook. Jupyter Notebook, an open-source development tool featuring documents with live code, equations, visualizations, and narrative text, is often used for data visualization and more. Power BI Embedded Analytics enables data app developers to engage directly with data, explore analytics and generate reports. These Jupyter Notebook integrations are now in preview.
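A minimal sketch of how a report might be embedded with the `powerbiclient` package, following the device-code login flow shown in the package's documentation. The wrapper function `embed_report` and its GUID check are assumptions added for this example, and the workspace and report IDs must come from your own Power BI tenant.

```python
import uuid


def embed_report(group_id: str, report_id: str):
    """Return a powerbiclient Report widget for the given workspace and report."""
    # Fail early on malformed IDs; Power BI workspace and report IDs are GUIDs.
    uuid.UUID(group_id)
    uuid.UUID(report_id)

    # Imported lazily so the ID check above works without the package installed.
    from powerbiclient import Report
    from powerbiclient.authentication import DeviceCodeLoginAuthentication

    # Prompts for an interactive device-code sign-in to Power BI.
    auth = DeviceCodeLoginAuthentication()
    return Report(group_id=group_id, report_id=report_id, auth=auth)
```

In a Jupyter cell, assign the result to a variable and make it the last expression in the cell to render the report inline.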
7. Azure Synapse Link for Dataverse is now in preview
This new capability removes barriers between business application data and analytical systems. In just a few clicks, developers working in Microsoft Power Apps or Dynamics 365 can bring their entire Dataverse environment to Azure Synapse to power new insights, perform predictive analytics, enrich existing data with other business datasets, and explore their data lake or another large repository of data. Read More: Power your business applications data with analytical and predictive insights | Azure Blog and Updates | Microsoft Azure
8. Synapse Apache Spark Hardware Acceleration is now in preview
Azure Synapse Analytics now supports field-programmable gate array (FPGA) hardware and graphics processing unit (GPU) processors, which are designed to handle AI workloads better, to accelerate Apache Spark for data processing and machine learning. This development will help enterprise data engineers use large datasets for things like developing new product lines, transforming supply-chain models, and responding to security threats, and it will help data scientists use larger and more complex datasets for AI needs. Read More: NVIDIA GPU Acceleration for Apache Spark™ in Azure Synapse Analytics (microsoft.com)
9. Azure Purview now supports Azure Database for MySQL and Azure Database for PostgreSQL
Azure Purview, a data governance service, now supports Azure Database for MySQL and Azure Database for PostgreSQL as a source for metadata, classification, and lineage extraction. Now in preview, this capability extends the reach of the Purview Data Map to Azure open-source databases.
Customers of Azure Database for MySQL and Azure Database for PostgreSQL can automatically scan and classify these sources and visualize lineage when this data gets transformed. Data consumers can then discover this data and its lineage in the Purview Data Catalog. Read More: Quickstart: Create an Azure Purview account in the Azure portal (preview) – Azure Purview | Microsoft Docs
10. Public preview: Scan and view lineage of data stored in Hive Metastore Database using Azure Purview
Azure Purview now supports Hive Metastore Database as a source. The Hive Metastore source supports a full scan to extract metadata from a Hive Metastore database and fetches lineage between data assets. The supported platforms are Apache Hadoop, Cloudera, Hortonworks, and Databricks. Read More: Register Hive Metastore database and setup scans in Azure Purview – Azure Purview | Microsoft Docs
11. Converging the Physical and Digital with Digital Twins, mixed reality, and metaverse apps
Cloud and edge computing are coming together as never before, leading to huge opportunities for developers and organizations around the world. Digital twins, mixed reality, and autonomous systems are at the core of a massive wave of innovation from which our customers already benefit.
Azure Digital Twins can model any asset, system, or entire environment and keep the digital twins live and up-to-date with Azure IoT. Azure Synapse Analytics tracks the history of digital twins and finds insights to predict future states. Our AI and machine learning platform lets you build autonomous systems that continually learn and improve. Read More: Converging the physical and digital with digital twins, mixed reality, and metaverse apps | Azure Blog and Updates | Microsoft Azure. Learn with MSLearn: Develop with Azure Digital Twins – Learn | Microsoft Docs, a free 3.5-hour online course.