
Learning Resources to get you Started in Microsoft Azure Data Projects

As companies move data workloads to Microsoft’s Azure cloud, more and more resources are available online to help you get up to speed. The following learning map provides a path through these online resources, giving you not only an overview of these technologies but also hands-on experience with them. The list is laid out in the general order in which you would want to review these technologies when implementing your data solution in Azure.

Note:  I am going to update and add to this list as time goes on and the services are updated.

Infrastructure Tools

Many of the technologies used in Azure require a knowledge of PowerShell. PowerShell is exposed in Azure as a set of cmdlets that leverage Azure Resource Manager. You can write and run PowerShell in a browser using the Azure Cloud Shell; the cmdlets can also be installed on your local machine or an Azure VM and run in a local PowerShell session.

Firewall Configuration & Security

As with any project, but especially with data projects, security should be job one. The following resources cover the knowledge required to develop your data projects securely. Some of the links cover Azure as a whole and have been placed here for reference.

Azure Active Directory (Azure AD)

Azure Active Directory provides the cloud-based directory and identity management services that are used across Azure and, most importantly, provides access and security for the data platform services.

Getting Data Into the Flow

All data projects start with getting data into the workflow. Azure Data Factory should be the starting point for most Azure data projects.

Azure Data Factory V2

At the time of this post, Azure Data Factory V2 is in preview; however, it is the future of the product. It provides the services that allow complex extract-transform-load (ETL) workflows and, now with big data tools such as Azure Data Lake, extract-load-transform (ELT) workflows.

The biggest feature of V2 is the ability to utilize SQL Server Integration Services (SSIS) packages in the flows. This allows you to leverage your current expertise as this really provides SSIS in the cloud.

Azure Data Factory V2 Preview Documentation (includes tutorials and samples)

The following tutorials provide training and a walkthrough of the technology and how to implement data flows.

  1. Deploy SSIS packages to Azure – Data Factory UI | Azure PowerShell
  2. Copy data in cloud – Copy Data tool | Data Factory UI | .NET
  3. Copy on-premises data to cloud – Copy Data tool | Data Factory UI | Azure PowerShell
  4. Copy data in bulk – Data Factory UI | Azure PowerShell
  5. Copy data incrementally – Data Factory UI | Azure PowerShell
  6. Transform data in the cloud using Spark – Data Factory UI | Azure PowerShell
  7. Transform data in a virtual network – Data Factory UI | Azure PowerShell
  8. Control flow – Data Factory UI | .NET

Azure IR Gateway

At the time of this post, access to on-premises data from Azure Data Factory V2 is through the Azure Data Factory Integration Runtime (IR). This product is installed on an on-premises VM or computer inside your organization's firewall. You also need to open specific firewall ports for outbound traffic. This product was formerly known as the Data Management Gateway. As this product is in preview and not yet released, check the Microsoft Docs site from time to time for updates.


Where to Store Your Data for Import

If you do not provide direct access to your on-premises data, you need to migrate it to Azure for import. For organizations that do not want to use on-premises connectivity, Azure storage is the best solution. Treating Azure Blob storage as just another server share keeps the same flat-file data transfer model that many projects currently use.

Azure Blob Storage

Azure Blob storage is a service for storing large amounts of unstructured data, such as text or binary data, that can be accessed from anywhere in the world via HTTP or HTTPS. It is a low-cost way to store large amounts of data that can then be used by many different services in Azure. Blob storage can expose data publicly to the world or store application data privately.
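To make the "accessed via HTTP or HTTPS" point concrete, here is a minimal Python sketch of how a blob's address is typically formed. It assumes the standard public endpoint pattern (account name, then blob.core.windows.net, then container and blob name). The function name blob_url and the account, container, and file names are hypothetical and used only for illustration.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str, use_https: bool = True) -> str:
    """Build the endpoint URL for a blob (illustrative helper, not part of any SDK)."""
    scheme = "https" if use_https else "http"
    # Blob names may contain "/" to mimic folders; keep those and escape the rest.
    path = quote(blob_name, safe="/")
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{path}"


if __name__ == "__main__":
    # Hypothetical storage account, container, and flat file.
    print(blob_url("mystorageacct", "imports", "sales/2018 q1.csv"))
```

A plain request to such a URL only succeeds when the container allows public read access. Private data requires an authenticated request, for example with a shared access signature appended to the URL.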

Azure Data Lake Store & Analytics

These services allow unstructured, semi-structured, and structured data to be stored in an enterprise-class service with no limits on the size of data. Azure Data Lake Store is secure, scalable, and built to the open HDFS standard, so the stored data can then be used to run massively parallel analytics.

Data Lake Analytics provides the services to develop and run massively parallel data transformation and processing programs in U-SQL, R, Python, and .NET over petabytes of data. The key to the services is that you only pay per job and not for the infrastructure to run them.

Use VSCode to run the tools used with these services.

Azure SQL Database

Azure SQL DB is a relational database-as-a-service using the Microsoft SQL Server engine. The feature set of Azure SQL Database matches, with few exceptions, that of the on-premises version of SQL Server 2017.

Azure Analysis Services

Azure Analysis Services provides data modeling in the cloud. It is a fully managed platform as a service (PaaS), integrated with Azure data platform services. Check out the Azure Analysis Services Overview for a complete description. You can also publish your Power BI model into Azure Analysis Services to scale up your dashboards.


HDInsight is Microsoft’s implementation of Apache Hadoop that provides distributed processing and analysis of big data workloads. The following components are part of the HDInsight toolset:

Tools available in the HDInsight service:

Presentation & Visualization Tools

Power BI Desktop

Power BI Desktop is a suite of tools that provides a full end-to-end data pipeline workflow, from ingestion to visualization. Using these free tools you can gather, cleanse, and model data, add calculations, and build data visualizations.

One of the more important aspects of the platform is the community behind the product. The final link takes you to the end-user forums where partners, MVPs, and users discuss project features, wish lists, and questions and answers.

Automation and Orchestration

Azure Automation

These tools allow you to operationalize your data gathering workflow. Using PowerShell, you can create and modify various Azure services and automate management tasks.
