Learning Path: Azure Synapse (Formerly Azure SQL Data Warehouse)
As technology professionals, we are working during some of the most challenging times. Not only is the pace of change increasing, but keeping up with the volume of new cloud-based products and services is becoming more difficult. To survive in this environment, we must employ various learning techniques, and this Learning Path is an example of Guided Technical Enablement. It provides a structure for quality curated content that a student can use to start mastering a topic.
In this Learning Path for the Azure Synapse (Formerly Azure SQL Data Warehouse), you are provided with various online resources in a structured learning format. Each level contains the objectives delivered through articles, labs and tutorials. Learning Paths Introduction: Guided Technical Enablement is provided that also lists other Learning Paths available. More content will be added over time.
Note: This learning path covers Azure Synapse (Formerly Azure SQL Data Warehouse), I also have a learning path,Learning Path: Azure Synapse Analytics which summarizes introductory material to the entire Azure Synapse Analytics suite.
Goals and Format of this Learning Path
In this Learning Path for Azure Synapse, you will find various online resources in a structured learning format. Each level contains the objectives delivered through articles, labs and tutorials.
The paths split into different levels to provide a structure to the process.
|300||Optimization and Deep Technical|
|400||Best Practices and Use Cases|
The concepts and skills that this Learning Path covers would be beneficial for the following roles:
- Solution Architects who design overall solutions that include Azure SQL Data Warehouse in end-to-end solutions.
- BI and Data Developers to understand how to design and implement Azure SQL data solutions.
- Data Professionals and DevOps teams, who implement and operate Azure SQL Data Warehouse systems
Level 000 – The Prerequisites
- Build out an environment in Azure with the proper resources.
The first step is to build out an environment. You may already have one available, but you can also get a trial subscription here. However, there are 12 free months of selected services and a $200 azure credit to try other services. ** Always shut down your Azure SQL DW when not in use ** For training or tying out services, I usually keep them in one resource group and delete that resource group when I am finished in order to keep costs down.
The following PDF contains a lab that walks you through the creation of an environment with an Azure SQL Databases and a SQL Data Warehouse. Hands-On-Lab-SQL-DW-AzureSetUp
Level 100 – The Basics
The following section provides the Level 100 introduction articles.
- Describe Azure SQL Data Warehouse features and differences to Azure SQL Database
- Discuss how Azure SQL Data Warehouse fits into a solution architecture
SQL Data Warehouse samples repository – This GitHub repository contains code samples that demonstrate how to use Microsoft’s Azure SQL Data Warehouse service. This site also includes sample data to build your database.
SQL Data Warehouse Documentation – It all starts with the documentation. This link provides online documentation and also various video and tutorials.
MPP Architecture Components – Video and documentation covering how Azure SQL DW combines Massively Parallel Processing (MPP) with Azure storage. Many features and design decisions revolve around this architecture feature.
Solution Design – Cheat sheet for Azure SQL Data Warehouse – This article provides an overview of building a solution with SQL DW. The topic includes operational and features included in a full end-to-end solution.
Scaling The DW – Data Warehouse Units (DWUs) and compute Data Warehouse Units (cDWUs) – Data warehouse units (DWUs, cDWUs) are used to optimize price and performance of the installation. In addition to definitions, this article provides recommendations on choosing the ideal number of units.
Video – Introducing Gen 2 Azure SQL Data Warehouse | Microsoft Youtube Video – Overview of the Gen 2 version of Azure SQL DW
Video – Introducing the modern data warehouse solution pattern with Azure SQL Data Warehouse – Introduction to the Modern Data Warehouse Solution pattern, comprised of Data Factory, Data Lake Store, Databricks and SQL Data Warehouse and how they fit together to define a solution architecture.
Level 200 – Deeper Concepts
The following section provides the Level 200 articles. The objective of this section is to start the process of designing and building the Data Warehouse.
- Discuss Polybase and its importance to Azure SQL DW
- Describe and develop external tables using Polybase
- Review the recommended practices for loading data into the database
- Define the differences between Hash-distributed, Round-robin and Replicated tables and how to select between them
- Define the importance of the distribution column and what factors go into the selection process
Designing a PolyBase data loading strategy for Azure SQL Data Warehouse – Overview of Polybase and the data loading workflow from External Table definition to Creating Table as Select functionality. Included is a video on ELT with Azure SQL DW.
Guidance for designing distributed tables in Azure SQL Data Warehouse – Reviews the process of creating distributed tables and how to select an optimal distribution column and schema objects.
**VIP Article – Dealing with Errors ** Load confidently with SQL Data Warehouse PolyBase Rejected Row Location – In Azure SQL Data Warehouse the Create External Table definition has been extended to include a Rejected_Row_Location parameter. This value represents the location in the External Data Source, where the Error File(s) and Rejected Row(s) will be written. (CSV files only)
Design guidance for using replicated tables in Azure SQL Data Warehouse – Designing replicated tables in your SQL Data Warehouse schema. Using replicated tables can improve query performance by reducing data movement.
Partitioning tables in SQL Data Warehouse – For SQL Server, you could improve the efficiency and performance by using Table Partitions. Table partitions, also available in Azure SQL Data Warehouse, enable you to divide your data into smaller groups of data. In most cases, table partitions are created on a date column.
Workload management with resource classes in Azure SQL Data Warehouse -Resource classes are pre-determined resource limits in Azure SQL Data Warehouse that govern compute resources and concurrency for query execution. Resource classes can help you manage your workload by setting limits on the number of queries that run concurrently and on the compute-resources assigned to each query.
Best practices for loading data into Azure SQL Data Warehouse – Recommendations and performance optimizations for loading data into Azure SQL Data Warehouse.
Maximizing row group quality for columnstore – Rowgroup quality is determined by the number of rows in a row group. Use these methods to improve compression rates and query performance for columnstore indexes.
Columnstore indexes: Overview – SQL Server | Microsoft Docs – Columnstore indexes are the standard for storing and querying large data warehousing fact tables. This index uses column-based data storage and query processing to achieve gains up to 10 times the query performance in your data warehouse over traditional row-oriented storage.
Workload management with resource classes in Azure SQL Data Warehouse – Resource classes are implemented by assigning users to database roles. When a user runs a query, the query runs with the user’s resource class.
Level 300 – Optimization and Deep Technical
The following section provides the Level 300 articles.
- Understand various monitoring techniques
- Review operational activities such as Backups, workload classifications and query plans
- Complete a hands-on data migration tutorial
Table statistics in Azure SQL Data Warehouse – Recommendations and examples for creating and updating query-optimization statistics on tables in Azure SQL Data Warehouse.
Workload management with resource classes in Azure SQL Data Warehouse -Guidance for using resource classes to manage memory and concurrency for queries in your Azure SQL Data Warehouse.
Monitoring resource utilization and query activity in Azure SQL Data Warehouse – Azure SQL Data Warehouse provides a rich monitoring experience within the Azure portal to surface insights to your data warehouse workload. The Azure portal is the recommended tool when monitoring your data warehouse as it provides configurable retention periods, alerts, recommendations, and customizable charts and dashboards for metrics and logs.
Backup and restore in Azure SQL Data Warehouse – Learn how to use backup and restore in Azure SQL Data Warehouse. Use data warehouse restore points to recover or copy your data warehouse to a previous state in the primary region. Use data warehouse geo-redundant backups to restore to a different geographical region.
Azure SQL Data Warehouse workload classification – This article explains the SQL Data Warehouse workload classification process of assigning a resource class and importance to incoming requests. A video is included in this article.
Monitor your workload using DMVs – This article describes how to use Dynamic Management Views (DMVs) to monitor your workload. This includes investigating query execution in Azure SQL Data Warehouse.
Tutorial / Best Practices – Migrating data to Azure SQL Data Warehouse in practice – This article focuses on migrating data to Azure SQL Data Warehouse with tips and techniques to help you achieve an efficient migration. Once you understand the steps involved in migration, you can practice them by following a running example of migrating a sample database to Azure SQL Data Warehouse.
Azure SQL Data Warehouse loading patterns and strategies – This article provides an overview of the Microsoft Azure SQL Data Warehouse architecture with an emphasis on data loading.
Unlock Query Performance with SQL Data Warehouse using Graphical Execution Plans – The Graphical Execution Plan feature within SQL Server Management Studio (SSMS) is now supported for SQL Data Warehouse (SQL DW)! With a click of a button, you can create a graphical representation of a distributed query plan for SQL DW. You can now seamlessly and visually debug query plans to identify performance bottlenecks directly within the SSMS window.
Level 400 – Best Practices / Lessons from the field / Use Cases
The following section provides the Level 400 articles.
- Review real-world solution decision points around costs, migrations
- A complete tutorial on queued queries and determining queries performance
- Review best practices through recommended development patterns
Common ISV application patterns using Azure SQL Data Warehouse – This article discusses typical customer implementations and proven architecture patterns using SQL DW. This pattern covers the ones used by Independent Software Vendor (ISV) on SQL DW.
Migrating data to Azure SQL Data Warehouse in practice – This article focuses on migrating data to Azure SQL Data Warehouse with tips and techniques to help you achieve an efficient migration.
Azure SQL Data Warehouse Workload Patterns and Anti-Patterns – This article clarifies some of the concepts around RDBMS usage related to OLTP and OLAP workload, Symmetric Multiprocessing (SMP) and Massively Parallel Processing (MPP), workload patterns and anti-patterns, focusing on Azure SQL DW.
Costs – Manage compute in Azure SQL Data Warehouse – Learn about managing compute resources in Azure SQL Data Warehouse. Lower costs by pausing the data warehouse, or scale the data warehouse to meet performance demands.
Tutorial: Checking for queued queries in Azure SQL Data Warehouse – Azure SQL Data Warehouse manages query workload through a combination of concurrent query execution slots and resources classes.
Tutorial: How many queries and slots are running in my Azure SQL Data Warehouse? – Azure SQL Data Warehouse allows you to manage your workload using resource classes and concurrency slots for query execution.
Many feature requests are added to products in an ongoing schedule. With the Azure Platform as a Service offering, these updates happen very regularly. The following feature updates are some of the more critical when reviewing the Azure SQL Data Warehouse. The first link provides various release notes that correspond to new features added to the platform.
Azure SQL Data Warehouse current release notes – As new features are rolled out, they are done so in a rolling schedule. Check the version deployed to your instance and the latest Azure SQL DW release notes for feature availability. (Instruction in the article)
Azure updates by service – Important Azure product updates, roadmaps, and announcements. To stay up to date, you can subscribe to receive email notifications.
JSON data support for Azure SQL Data Warehouse is now in preview – JSON functions, such as JSON_VALUE, JSON_QUERY, JSON_MODIFY, and OPENJSON are now supported in Azure SQL Data Warehouse.
Azure SQL Data Warehouse releases new capabilities for performance and security -Several new features have been added around price-performance, flexibility, and security; such as result-set caching, materialized views and ordered clustered columnstore indexes. These features are in preview as of September 2019.
SQL Data Warehouse samples repository – This GitHub repository contains code samples that demonstrate how to use Microsoft’s Azure SQL Data Warehouse service. This site also includes ample data to build your database.
Autoscaling Azure Resources – Autoscaling is the process of dynamically allocating resources to match performance requirements. This article reviews the advantages and processes around this feature. Azure Functions can be used to automatically allocate compute power when your code is running, scaling out SQL DW as necessary to handle the load.
Tutorial: Create a function in Azure that is triggered by a timer – Learn how to use Azure Functions to create a serverless function that runs based a schedule that you define.
Memory and concurrency limits for Azure SQL Data Warehouse – View the memory and concurrency limits allocated to the various performance levels and resource classes in Azure SQL Data Warehouse. This is used in workload management.
Indexing tables in SQL Data Warehouse – Index types – SQL Data Warehouse offers several indexing options including clustered columnstore indexes, clustered indexes and nonclustered indexes, and a non-index option also known as heap.
Columnstore indexes: Overview – Columnstore indexes are the standard for storing and querying large data warehousing fact tables.
Heaps (Tables without Clustered Indexes)– A heap is a table without a clustered index. One or more nonclustered indexes can be created on tables stored as a heap. Data is stored in the heap without specifying an order.
Clustered and Nonclustered Indexes Described – An index is an on-disk structure associated with a table or view that speeds retrieval of rows from the table or view. This article reviews various definitions and examples.
Authenticate to Azure SQL Data Warehouse – Learn how to authenticate to Azure SQL Data Warehouse by using Azure Active Directory (AAD) or SQL Server authentication.
Secure a database in SQL Data Warehouse – This article walks through the basics of securing your Azure SQL Data Warehouse database. In particular, this article gets you started with resources for limiting access, protecting data, and monitoring activities on a database.
Tutorial – Examples – Statistics – Recommendations and examples for creating and updating query-optimization statistics on tables in Azure SQL Data Warehouse.
Labs & Online Training
The following section provides additional training.
Video – Azure Data Factory Hands-on Lab – Load Azure SQL Data Warehouse with ADF V2
Best practices for loading data into Azure SQL Data Warehouse – Recommendations and performance optimizations for loading data into Azure SQL Data Warehouse.
Tutorial: Load data to Azure SQL Data Warehouse – This tutorial uses PolyBase to load the WideWorldImportersDW data warehouse from Azure Blob storage to Azure SQL Data Warehouse.
Tutorial: Load New York Taxicab data to Azure SQL Data Warehouse – This tutorial uses PolyBase to load New York Taxicab data from a public Azure blob to Azure SQL Data Warehouse.
Use Azure Functions to manage compute resources in Azure SQL Data Warehouse – This tutorial uses Azure Functions to manage compute resources for a data warehouse in Azure SQL Data Warehouse.
1 Learning styles – Wikipedia.org – https://en.wikipedia.org/wiki/Learning_styles Sept 28,2019