
Learning Guide: Azure Data Solution Architecture Resources

There are many aspects to a successful Azure Data Solution Architecture, and this guide reviews a set of curated resources to get you started. The resources are split into sections that roughly follow the order of your design decisions. With the wide variety of possible architectures, this post concentrates on data reporting and analytics solutions.

Start with a High Level Data Solution Logical Architecture

When looking at a data architecture design, you need to start with the requirements of your solution. All solutions can be boiled down to a specific set of elements. As pictured below, you have the information you want to collect, either in real-time streams or in batches. This information is stored in a data lake or database, where other people use it to create analytics and reports and to make decisions.

Most of the data solutions you create will start like this. Your next decision is which tools you can use to meet your requirements.

Logical Data Architecture for an Analytical Solution

All solutions start with the following design layers:

  • Data Sources – Various types of data in different locations make up the information that you need to bring together.
  • Ingestion – Processes that automate moving and transforming the source data into various storage areas. This is represented by the Orchestration layer.
  • Store – Depending on the type of data you are gathering, you will bring the data into various storage areas.
  • Model & Serve – Once you have your raw data, you create data models and serve that data to downstream applications.
  • Analytic Data Store – These are normally read-only systems that store data to support business intelligence and analytic-style queries. (See Delta Lake for another take.)
  • Analyze and Report – This represents the applications and products that are served up to end users who use and make decisions on the data. The datasets at this level are ready to use.
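
To make the layers concrete, here is a minimal Python sketch of how data might flow through them. It is an illustration only: the AnalyticsPipeline class, its method names, and the sample records are all hypothetical and do not correspond to any Azure service or SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


@dataclass
class AnalyticsPipeline:
    """Toy model of the logical layers; every name here is hypothetical."""

    raw_store: List[Record] = field(default_factory=list)          # Store
    serving_store: Dict[str, float] = field(default_factory=dict)  # Analytic data store

    def ingest(self, source: Iterable[Record]) -> None:
        """Ingestion: copy records from a data source into raw storage."""
        for record in source:
            self.raw_store.append(dict(record))

    def model_and_serve(self) -> None:
        """Model & Serve: aggregate raw records into a summary for readers."""
        totals: Dict[str, float] = {}
        for record in self.raw_store:
            region = str(record["region"])
            totals[region] = totals.get(region, 0.0) + float(record["amount"])
        self.serving_store = totals

    def report(self) -> List[str]:
        """Analyze and Report: format the served data for end users."""
        return [f"{region}: {total:.2f}"
                for region, total in sorted(self.serving_store.items())]


# Data Sources: one batch source and one stream-like source (an iterator).
batch_source = [{"region": "east", "amount": 120.0},
                {"region": "west", "amount": 80.0}]
stream_source = iter([{"region": "east", "amount": 30.5}])

pipeline = AnalyticsPipeline()
pipeline.ingest(batch_source)
pipeline.ingest(stream_source)
pipeline.model_and_serve()
report_lines = pipeline.report()
print(report_lines)
```

Each method stands in for a whole layer. In a real solution, each would be a separate service chosen to fit your requirements.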

These functional areas translate to various products and services in Azure. Pictured below is how these products translate into various physical solution designs. Some tools can be used in multiple layers.

General Azure Physical Solution Architecture (Technology)

First Resources – Start Here

The following resources are the must-have links to start with. The Azure Architecture Center provides a landing page and guide to content. The site also provides a searchable set of Azure Architectures and use cases as both learning and inspiration. I have also provided a listing of various popular data solution architectures with examples of real-world deployments.

Overall Azure Data Architecture Design

As you start to look at the design of your solution, the Microsoft Application Architecture Guide provides guidance on a series of steps summarized below. As with each decision, you need to check your solution requirements against the features of the various architectures and technologies. You make tradeoffs to determine what is best for your solution today and what will also scale to match any planned growth.

This creates a tech stack that is a combination of technologies in scope for your solution. There is guidance around the benefits and challenges of each technology you choose.

Architecture Style

The type of architecture you are constructing is the most basic detail and usually the first decision. It could be a big data solution, an old-school analytic solution, or part of a more traditional N-tier application. There are numerous architectural styles, and each has advantages and challenges that you need to examine before selecting one.

For our example here, we have been looking at big data architectures and will concentrate on those.

Learn more

Technology Choices

Application architectures all start with a technology choice, or with answering a specific set of questions that are best explained by defining your workload.

It is important to choose the right data store for your needs. There are many data implementations to choose from among the Azure database offerings. You select your data stores by their structure and operations. Each store supports different types of operations; SQL and NoSQL stores are one example.

Let’s use the documented example. You have an application in one of the following use cases:

  • Inventory management
  • CRM
  • Sales and Order management
  • Event Organization
  • Reporting database
  • Accounting and Payroll
  • Employee Performance Data

You might define your workload this way:

  • CRUD heavy – records are frequently created, read, updated, and deleted
  • Multiple operations and changes have to be completed in a single transaction – ACID: atomicity, consistency, isolation, and durability
  • Relationships between data entities are enforced using database constraints
  • Indexes are used to optimize query performance

Your solution might classify the data you wish to store with these specifications:

  • Data is highly normalized.
  • Database schemas are required and enforced.
  • Many-to-many relationships between data entities in the database.
  • Constraints are defined in the schema and imposed on any data in the database.
  • Data requires high integrity. Indexes and relationships need to be maintained accurately.
  • Data requires strong consistency. Transactions operate in a way that ensures all data are 100% consistent for all users and processes.
  • Size of individual data entries is small to medium-sized.
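
The workload and data traits above can be shown in a few lines of code. The sketch below uses Python's built-in sqlite3 module as a stand-in for a relational store. The table names, columns, and the place_order helper are hypothetical and chosen only for illustration. It shows an enforced schema, a many-to-many junction table, an index, and a transaction that rolls back as a unit when a constraint is violated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE product (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE,
    stock INTEGER NOT NULL CHECK (stock >= 0)
);
CREATE TABLE customer_order (
    id       INTEGER PRIMARY KEY,
    customer TEXT NOT NULL
);
-- Junction table: many-to-many relationship between orders and products
CREATE TABLE order_item (
    order_id   INTEGER NOT NULL REFERENCES customer_order(id),
    product_id INTEGER NOT NULL REFERENCES product(id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
-- Index to speed up lookups of all orders for a product
CREATE INDEX idx_order_item_product ON order_item(product_id);
INSERT INTO product (id, name, stock) VALUES (1, 'widget', 5);
""")


def place_order(order_id, customer, product_id, quantity):
    """Record the order and reduce stock in one transaction (all or nothing)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO customer_order (id, customer) VALUES (?, ?)",
                (order_id, customer))
            conn.execute(
                "INSERT INTO order_item (order_id, product_id, quantity) "
                "VALUES (?, ?, ?)",
                (order_id, product_id, quantity))
            conn.execute(
                "UPDATE product SET stock = stock - ? WHERE id = ?",
                (quantity, product_id))
        return True
    except sqlite3.IntegrityError:
        return False  # a constraint failed, so none of the changes were kept


first = place_order(1, "alice", 1, 3)   # enough stock
second = place_order(2, "bob", 1, 4)    # would drive stock below zero
stock = conn.execute("SELECT stock FROM product WHERE id = 1").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM customer_order").fetchone()[0]
print(first, second, stock, order_count)
```

The second order fails the stock check, so its order row is rolled back along with the stock update. That is the strong consistency and integrity behavior this workload calls for.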

Defining your solution requirements and specifications this way would lead you to use a Relational Data Store in this example. You would look at the following relational data services that match those workload requirements:

  • Azure SQL Database
  • Azure Synapse Analytics
  • Azure SQL Managed Instance
  • Azure VM running SQL Server
  • Azure Database for MySQL
  • Azure Database for PostgreSQL
  • Azure Database for MariaDB
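
One way to keep this reasoning explicit is to write the workload down as a checklist and derive the shortlist from it. The sketch below is a simplified assumption, not Microsoft's decision procedure: the Workload class, its field names, and the all-or-nothing rule are hypothetical, and the service list simply repeats the relational options named above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Workload:
    """Hypothetical checklist of the workload traits discussed above."""
    crud_heavy: bool
    needs_acid_transactions: bool
    enforces_schema_and_constraints: bool
    needs_strong_consistency: bool


# The relational services listed in this guide.
RELATIONAL_SERVICES = [
    "Azure SQL Database",
    "Azure Synapse Analytics",
    "Azure SQL Managed Instance",
    "Azure VM running SQL Server",
    "Azure Database for MySQL",
    "Azure Database for PostgreSQL",
    "Azure Database for MariaDB",
]


def relational_candidates(workload: Workload) -> List[str]:
    """Return the relational shortlist only when every trait points that way."""
    fits = (workload.crud_heavy
            and workload.needs_acid_transactions
            and workload.enforces_schema_and_constraints
            and workload.needs_strong_consistency)
    return list(RELATIONAL_SERVICES) if fits else []


inventory = Workload(True, True, True, True)
clickstream = Workload(crud_heavy=False,
                       needs_acid_transactions=False,
                       enforces_schema_and_constraints=False,
                       needs_strong_consistency=False)

inventory_options = relational_candidates(inventory)
clickstream_options = relational_candidates(clickstream)
print(len(inventory_options), clickstream_options)
```

An empty result does not mean there is no suitable store. It only means this workload should be compared against the other data store models.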

There are many other things to consider once you come to a decision on architecture. Decisions around costing and scale, for example, have a big impact on deployment and implementation. See How to Estimate Your Azure Solution Costs for more information on determining Azure costing.

Learn More: This is just one type of data store. The reference articles below review other data architectures, their advantages, and limitations.

Designing the Application Architecture

The specific design of your application comes together once you have decided on the architecture style and technology components. Splitting the application architecture into the following areas will help to organize the tasks and resources.

Reference Architectures

Rather than reinventing the wheel or starting from scratch, you can begin with one of a number of Reference Architectures. Each reference architecture covers considerations for security, resilience, availability, and other aspects of the design. Some of these reference architectures also include a deployable solution.

Reference: Browse Azure Architecture – Azure Architecture Center | Microsoft Docs

Design Principles

There are 10 high-level Azure Data Solution Architecture design principles that allow your solution to be more scalable, resilient, and manageable. These are general principles that you can use with any architecture style. These principles include:

  • Design for self healing. Design your application so it can survive and deal with failures.
  • Make all things redundant. Build redundancy into your application to avoid having single points of failure.
  • Minimize coordination. Minimize coordination between application services to achieve scalability.
  • Design to scale out. Design your application so that it can scale horizontally, adding or removing new instances as demand requires.
  • Partition around limits. Use partitioning to work around database, network, and compute limits.
  • Design for operations. Design your application so that the operations team has the tools they need.
  • Use managed services. When possible, use platform as a service (PaaS) rather than infrastructure as a service (IaaS). Let someone else manage your platform so you can concentrate on the solution.
  • Use the best data store for the job. Pick the storage technology that is the best fit for your data. Watch out for edge cases and future growth of requirements. What happens if your application goes viral?
  • Design for evolution. If data solutions have one constant, it is that requirements change over time. Don’t design yourself into a corner.
  • Build for the needs of business. Watch out for scope creep. Every design decision must be justified by a business requirement.
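
As one illustration of "Design for self healing", transient failures such as timeouts or throttling are often handled by retrying with an increasing delay. The sketch below is a generic example under stated assumptions: TransientError, call_with_retry, and flaky_query are hypothetical names, and the delay values are arbitrary.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(operation: Callable[[], T],
                    max_attempts: int = 4,
                    base_delay: float = 0.1,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts, surface the failure to the caller
            delay = base_delay * 2 ** (attempt - 1)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay))
    raise AssertionError("unreachable")


attempts = {"count": 0}


def flaky_query() -> str:
    """Fails twice, then succeeds, to simulate a temporary outage."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("temporary throttling")
    return "42 rows"


delays: List[float] = []
result = call_with_retry(flaky_query, sleep=delays.append)  # record delays instead of sleeping
print(result, attempts["count"], len(delays))
```

Only errors marked as transient are retried. Anything else fails immediately so that real bugs are not hidden behind retries.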

Source: Design principles for Azure applications – Azure Architecture Center | Microsoft Docs

Design Patterns

There are a number of Cloud design patterns that address specific challenges in distributed systems. They include availability, high availability, operational excellence, resiliency, performance, and security. There is a specific section in the resource below that covers Data management patterns.

Resource: Cloud design patterns – Azure Architecture Center | Microsoft Docs

Best Practices

I tend to not use the term Best Practice as this assumes that the suggestions are best for everyone. I prefer recommended practices as you should always make sure that you follow what is best for your specific solution and situation. One person’s best practice could grind your solution to a halt!!

Resource: Best practices in cloud applications – Azure Architecture Center | Microsoft Docs. See specifically the articles on Data partitioning and on Monitoring and diagnostics.
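
To give a feel for what data partitioning involves, here is a minimal sketch of hash partitioning, one common way to spread records across a fixed number of partitions. The partition_for function, the key format, and the partition count are assumptions made for this example.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition number using a stable hash.

    hashlib is used instead of the built-in hash() because Python salts
    string hashes per process, which would move keys between runs.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


keys = [f"customer-{n}" for n in range(1000)]
distribution = Counter(partition_for(key, 4) for key in keys)
print(sorted(distribution.items()))
```

A good partition key spreads load evenly. Note that changing the partition count remaps most keys, which is worth planning for before data volumes grow.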

Security best practices

Storage and processing of business data need high levels of confidentiality, integrity, and availability. The resources below cover some of the security and governance topics important for Azure Data Solution Architecture and design.

Resources:

  • Application security in Azure | Microsoft Docs – Applications and the data associated with them ultimately act as the primary store of business value on a cloud platform. This article covers a high-level review of application platform security topics.
  • Data Governance: Why You Need a Data Governance Process Now!! – 5MinuteBI – There is an immediate requirement for Data Governance Initiatives to determine how to secure data usage, manage activity, gain visibility and control of one of your most important assets. In many of the analytic projects I have been involved in, whether big or small, providing guidance to those using the data increases the adoption and long-term value of the solution.
  • Protection of customer data in Azure | Microsoft Docs – Protection of your data in Azure
  • Azure SQL Database security features | Microsoft Docs – To protect customer data and provide strong security features that customers expect from a relational database service, SQL Database has its own sets of security capabilities. These capabilities build upon the controls that are inherited from Azure.

What is the Microsoft Azure Well-Architected Framework?

Successful Azure Data Solutions start with a well-defined and well-architected platform. As more and more solutions have been migrated to Azure, Microsoft has produced a series of recommended practices and guidance called the Azure Well-Architected Framework.

Remember that the goal of the framework is to help you design a high-quality solution that is stable under load, cost-effective, scalable, and efficient.

The Azure Well-Architected Framework is divided into principles, or tenets, called the five pillars of architectural excellence. These sections review these principles with an eye to an Azure Data Solution architecture. Links in the table provide more detailed information on the topic.

Pillar | Description
Reliability | The ability of a system to recover from failures and continue to function.
Security | Protecting applications and data from threats.
Cost Optimization | Managing costs to maximize the value delivered.
Operational Excellence | Operations processes that keep a system running in production.
Performance Efficiency | The ability of a system to adapt to changes in load.
Source: Microsoft Azure Well-Architected Framework

Utilizing Data Lakes & Delta Lakes in a Data Architecture

Azure Blob Storage

Azure Blob Storage is a service that stores large amounts of unstructured data, such as text or binary data, that can be accessed from anywhere in the world via HTTP or HTTPS. This is a low-cost way to store large amounts of data that can then be used by many different services in Azure. Blob storage can expose data publicly to the world, or store application data privately.
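
Because blobs are reached over HTTP or HTTPS, each one has a URL. The sketch below builds that address from its parts. It assumes the default public-cloud endpoint format (account name, then blob.core.windows.net, then container and blob name). The blob_url helper and the account, container, and blob names are made up for the example.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the HTTPS address of a blob on the default public endpoint."""
    # quote() percent-encodes characters such as spaces but keeps "/" intact,
    # so virtual folder paths inside the blob name stay readable.
    return (f"https://{account}.blob.core.windows.net/"
            f"{container}/{quote(blob_name)}")


url = blob_url("examplestorage", "raw", "sales/2024/orders 01.csv")
print(url)
```

Whether a request to that URL succeeds depends on how access is configured. A private container still requires authorization.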

Azure Data Lake Store & Analytics

These services allow unstructured, semi-structured, and structured data to be stored in an enterprise-class service with no limits on the size of data. Azure Data Lake Store is secured, scalable, and built to the open HDFS standard, so the stored data can be used to run massively parallel analytics.

Resources:

  • Azure Data Lake Storage Gen2 Introduction – Data Lake Storage Gen2 converges the capabilities of Azure Data Lake Storage Gen1 with Azure Blob Storage. For example, Data Lake Storage Gen2 provides file system semantics, file-level security, and scale. Because these capabilities are built on Blob storage, you’ll also get low-cost, tiered storage, with high availability/disaster recovery capabilities.
  • Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark – This tutorial shows you how to connect your Azure Databricks cluster to data stored in an Azure storage account that has Azure Data Lake Storage Gen2 enabled. This connection enables you to natively run queries and analytics from your cluster on your data.
  • Azure Data Lake Storage Gen2 Hierarchical Namespace – A key mechanism that allows Azure Data Lake Storage Gen2 to provide file system performance at object storage scale and prices is the addition of a hierarchical namespace. This allows the collection of objects/files within an account to be organized into a hierarchy of directories and nested subdirectories in the same way that the file system on your computer is organized.
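
The value of a hierarchical namespace is easiest to see with a directory rename. In a flat namespace a "directory" is only a shared name prefix, so every object under it has to be rewritten. With real directories, the rename is a single operation on the directory entry. The toy model below uses plain Python dictionaries to show the difference. It is not the storage SDK, and both function names are hypothetical.

```python
from typing import Dict


def rename_prefix_flat(objects: Dict[str, bytes], old: str, new: str) -> int:
    """Flat namespace: rewrite every object whose name starts with the prefix."""
    touched = 0
    for name in [n for n in objects if n.startswith(old + "/")]:
        objects[new + name[len(old):]] = objects.pop(name)
        touched += 1
    return touched


def rename_directory_hierarchical(root: dict, old: str, new: str) -> int:
    """Hierarchical namespace: relink one directory entry (top level only here)."""
    root[new] = root.pop(old)
    return 1


flat = {
    "raw/2024/a.csv": b"1",
    "raw/2024/b.csv": b"2",
    "raw/2025/c.csv": b"3",
    "curated/d.csv": b"4",
}
tree = {
    "raw": {"2024": {"a.csv": b"1", "b.csv": b"2"}, "2025": {"c.csv": b"3"}},
    "curated": {"d.csv": b"4"},
}

flat_operations = rename_prefix_flat(flat, "raw", "landing")
tree_operations = rename_directory_hierarchical(tree, "raw", "landing")
print(flat_operations, tree_operations)
```

With three files the difference is small, but the flat rename grows with the number of objects while the hierarchical one stays a single step.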

Azure Delta Lakes (Databricks)

Delta Lake is an interesting option for the Lakehouse architecture pattern put forward by Databricks. It addresses many of the challenges of traditional data architectures. This is becoming a very popular option for data solutions. Learn more in this great introductory article, Simplify Your Lakehouse Architecture with Azure Databricks, Delta Lake, and Azure Data Lake Storage.

Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is compatible with Apache Spark APIs.
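
Delta Lake gets its ACID behavior from an ordered transaction log that records which data files are added to and removed from a table. The sketch below is a toy, in-memory model of that idea and is not the Delta Lake API. The ToyTableLog class and its methods are hypothetical, and the conflict rule is simplified so that any commit made since the writer last read the table counts as a conflict.

```python
from typing import Dict, List, Optional, Set


class ToyTableLog:
    """In-memory stand-in for an ordered transaction log (illustration only)."""

    def __init__(self) -> None:
        self.commits: List[Dict[str, List[str]]] = []

    def commit(self, read_version: int, add: List[str], remove: List[str]) -> int:
        """Append one commit, or fail if another writer committed first."""
        if read_version != len(self.commits):
            raise RuntimeError("conflict: table changed since it was read")
        self.commits.append({"add": list(add), "remove": list(remove)})
        return len(self.commits)  # new version = number of commits so far

    def snapshot(self, version: Optional[int] = None) -> Set[str]:
        """Replay commits in order to find the files visible at a version."""
        files: Set[str] = set()
        for entry in self.commits[:version]:
            files -= set(entry["remove"])
            files |= set(entry["add"])
        return files


log = ToyTableLog()
v1 = log.commit(0, add=["part-0.parquet", "part-1.parquet"], remove=[])
# A compaction swaps two small files for one larger file in a single commit.
v2 = log.commit(v1, add=["part-2.parquet"],
                remove=["part-0.parquet", "part-1.parquet"])

try:
    log.commit(v1, add=["part-3.parquet"], remove=[])  # writer with a stale view
    conflict = False
except RuntimeError:
    conflict = True

print(sorted(log.snapshot()), sorted(log.snapshot(1)), conflict)
```

Readers only ever see the files listed by a complete commit, so a half-finished write is never visible. Older versions remain readable by replaying fewer commits.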

Azure Databricks also includes Delta Engine, which provides optimized layouts and indexes for fast interactive queries.
