Learning Resources to Get You Started in Microsoft Azure Data Projects
As companies move data workloads to Microsoft’s Azure cloud, more and more resources are available online to help you get up to speed. The following learning map points you to online resources that give you both an overview of these technologies and hands-on experience with them. The list is laid out in the general order in which you would review these technologies when implementing your data solution in Azure.
Note: I am going to update and add to this list as time goes on and the services are updated.
Many of the technologies used in Azure require a knowledge of PowerShell. PowerShell is exposed in Azure as a set of cmdlets that leverage Azure Resource Manager. You can write and run PowerShell in a browser using the Azure Cloud Shell; Azure PowerShell can also be installed on your local machine or an Azure VM and run in a local PowerShell session.
- Overview of Azure PowerShell
- Install and configure Azure PowerShell
- Azure PowerShell samples for Azure Data Factory
- Tutorial: Getting started with Azure PowerShell
- Create a data factory by using the Azure Data Factory UI
- Samples: Azure PowerShell samples for Azure SQL Database
Firewall Configuration & Security
As with any project, but especially with data projects, security should be job one. The following resources cover the knowledge required to develop your data projects securely. Some of the links cover Azure as a whole and have been placed here for reference.
- Azure SQL Database server-level and database-level firewall rules
- Configure Azure Storage Firewalls and Virtual Networks
- Azure Analysis Services adds firewall support
- Securing your Azure SQL Database
- Secure access to an application’s data in the cloud
- Microsoft Azure Trust Center
- Technical Overview of the Security Features in the Azure Platform
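Azure SQL Database firewall rules, covered in the first link above, are defined as a start and end IP address. As a rough illustration of how such a range rule behaves, here is a minimal Python sketch. It is not Azure's implementation; the `FirewallRule` class, the rule names, and the addresses are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Iterable


@dataclass(frozen=True)
class FirewallRule:
    """Hypothetical stand-in for an IP-range firewall rule."""
    name: str
    start_ip: str
    end_ip: str

    def allows(self, client_ip: str) -> bool:
        # A client passes when its address falls inside the inclusive range.
        return IPv4Address(self.start_ip) <= IPv4Address(client_ip) <= IPv4Address(self.end_ip)


def is_allowed(client_ip: str, rules: Iterable[FirewallRule]) -> bool:
    # Access is granted as soon as any single rule covers the address.
    return any(rule.allows(client_ip) for rule in rules)


# Assumed example rules, using addresses reserved for documentation.
rules = [
    FirewallRule("office", "203.0.113.10", "203.0.113.20"),
    FirewallRule("vpn", "198.51.100.5", "198.51.100.5"),  # single-address rule
]
```

Calling `is_allowed("203.0.113.15", rules)` checks a client address against every rule; an address outside all ranges is rejected.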
Azure Active Directory (Azure AD)
Azure Active Directory provides the cloud-based directory and identity management services that are used throughout Azure and, most importantly, controls access to and secures the data platform services.
Getting Data Into the Flow
All data projects start with getting data into the workflow. Azure Data Factory should be the starting point for most Azure data projects.
Azure Data Factory V2
At the time of this post, Azure Data Factory V2 is in preview; however, it is the future of the product. It provides the services that enable complex extract-transform-load (ETL) workflows and, now with big data tools such as Azure Data Lake, extract-load-transform (ELT) workflows.
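To make the difference between ETL and ELT concrete, here is a toy Python sketch. It does not use Azure Data Factory at all; the function names, the sample rows, and the dictionaries standing in for a target store are assumptions made purely for illustration.

```python
def extract():
    # Hypothetical raw rows, standing in for a source system.
    return [{"name": " alice ", "amount": "10"}, {"name": "BOB", "amount": "5"}]


def transform(rows):
    # Trim and normalize names, and convert amounts to integers.
    return [
        {"name": r["name"].strip().title(), "amount": int(r["amount"])}
        for r in rows
    ]


def run_etl(target):
    # ETL: clean the data in the pipeline, then load only the cleaned rows.
    target["clean"] = transform(extract())


def run_elt(lake):
    # ELT: land the raw rows first, then transform them where they are stored.
    lake["raw"] = extract()
    lake["clean"] = transform(lake["raw"])


warehouse = {}
run_etl(warehouse)

lake = {}
run_elt(lake)
```

Both runs end with the same cleaned rows; the ELT run also keeps the untouched raw data in the store, which is the pattern a data lake encourages.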
The biggest feature of V2 is the ability to run SQL Server Integration Services (SSIS) packages in your flows. This lets you leverage your existing expertise, as it effectively provides SSIS in the cloud.
Azure Data Factory V2 Preview Documentation (includes tutorials and samples)
The following tutorials provide training and a walkthrough of the technology and how to implement data flows.
- Deploy SSIS packages to Azure – Data Factory UI | Azure PowerShell
- Copy data in cloud – Copy Data tool | Data Factory UI | .NET
- Copy on-premises data to cloud – Copy Data tool | Data Factory UI | Azure PowerShell
- Copy data in bulk – Data Factory UI | Azure PowerShell
- Copy data incrementally – Data Factory UI | Azure PowerShell
- Transform data in the cloud using Spark – Data Factory UI | Azure PowerShell
- Transform data in a virtual network – Data Factory UI | Azure PowerShell
- Control flow – Data Factory UI | .NET
Azure IR Gateway
At the time of this post, access to on-premises data from Azure Data Factory V2 is through the Azure Data Factory Integration Runtime (IR). This product is installed on an on-premises VM or computer inside your organization's firewall. You also need to allow specific outbound traffic through your firewall. This product was formerly known as the Data Management Gateway. As it is in preview and not yet released, check the Microsoft Docs site from time to time for updates.
Where to Store Your Data for Import
If you do not provide access directly to your on-premises data, you need to migrate it to Azure for import. For organizations that do not want to use on-premises connectivity, Azure storage is the best solution. Treating Azure Blob storage as just another server share keeps the same flat file data transfer model that many projects currently use.
Azure Blob Storage
Azure Blob storage is a service for storing large amounts of unstructured data, such as text or binary data, that can be accessed from anywhere in the world via HTTP or HTTPS. It is a low-cost way to store large amounts of data that can then be used by many different services in Azure. Blob storage can expose data publicly to the world or store application data privately.
- Introduction to Blob storage
- Deciding when to use Azure Blobs, Azure Files, or Azure Disks
- Azure Storage Client Tools
- Get started with Storage Explorer (Preview)
- Getting the Latest Azure Storage Explorer Version
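Because each blob is reachable over HTTP or HTTPS, its address follows a predictable pattern of storage account, container, and blob name. The Python sketch below builds such an address using only the standard library. The helper name `blob_url` and the account, container, and file names are hypothetical, and a private container would additionally need credentials or a shared access signature, which is not shown here.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str, use_https: bool = True) -> str:
    """Build the address of a blob in the public Azure cloud (illustrative helper)."""
    scheme = "https" if use_https else "http"
    # Keep "/" unescaped so virtual folder paths in the blob name survive;
    # other unsafe characters, such as spaces, are percent-encoded.
    path = quote(f"{container}/{blob_name}", safe="/")
    return f"{scheme}://{account}.blob.core.windows.net/{path}"


# Assumed example values; substitute your own account and container.
url = blob_url("mystorageacct", "imports", "daily/sales 2018.csv")
```

A flat file dropped into the hypothetical `imports` container can then be referenced by this address from other tools that read from Blob storage.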
Azure Data Lake Store & Analytics
These services allow unstructured, semi-structured, and structured data to be stored in an enterprise-class service with no limits on the size of data. Azure Data Lake Store is secure, scalable, and built to the open HDFS standard, so the data it holds can then be used to run massively parallel analytics.
Data Lake Analytics provides the services to develop and run massively parallel data transformation and processing programs in U-SQL, R, Python, and .NET over petabytes of data. The key to the services is that you only pay per job and not for the infrastructure to run them.
Use Visual Studio Code to run the tools used with these services.
Azure SQL Database
Azure SQL Database is a relational database-as-a-service using the Microsoft SQL Server engine. Its feature set matches, with few exceptions, that of the on-premises version of SQL Server 2017.
- Azure SQL Database Documentation
- Securing your SQL Database
- SQL Tools and Utilities for SQL Server, Azure SQL Database, and Azure SQL Data Warehouse
- SQL Server Management Studio (SSMS)
- SQL Server Data Tools (SSDT)
- Azure SQL Database: Use Visual Studio Code to connect and query data
Azure Analysis Services
Azure Analysis Services provides data modeling in the cloud. It is a fully managed platform as a service (PaaS), integrated with Azure data platform services. Check out the Azure Analysis Services Overview for a complete description. You can also publish your Power BI model into Azure Analysis Services to scale up your dashboards.
- Azure Analysis Services overview
- Analysis Services Documentation
- Data Analysis Expressions (DAX) Reference
- Introduction to DAX
- Community – Analysis Services Team blog
HDInsight is Microsoft’s implementation of Apache Hadoop that provides distributed processing and analysis of big data workloads. The following components are part of the HDInsight toolset:
Tools available in the HDInsight service:
Presentation & Visualization Tools
Power BI Desktop
Power BI Desktop is a suite of tools that provides a full end-to-end data pipeline workflow, from ingestion to visualization. Using these free tools, you can gather, cleanse, and model data, add calculations, and build data visualizations.
One of the more important aspects of the platform is the community behind the product. The final link takes you to the end user forums where partners, MVPs, and users discuss project features, wish lists, and questions and answers.
- Landing Page
- Power BI Documentation
- Power BI Service
- Power BI Desktop
- Guided Learning
- Power BI Developer Resources
- Power BI Mobile Apps
- Community – PowerBI Team blog
- Community Forums
Automation and Orchestration
These tools allow you to operationalize your data gathering workflow. Using PowerShell, you can create and modify various Azure services and automate management tasks.