There are many aspects to a successful Azure Data Solution Architecture, and this guide reviews a set of curated resources to get you started. The resources are split into sections that roughly follow the order of your design decisions. Given the wide variety of possible architectures, this post concentrates on data reporting and analytics solutions.
Start with a High Level Data Solution Logical Architecture
When looking at a data architecture design, you really need to start with the requirements of your solution. All solutions can be boiled down to a specific set of elements. As pictured below, you have the information you want to collect, either as real-time streams or in batches. This information is stored in a data lake or database, where other people use it to create analytics and reports to make decisions.
Most of the data solutions you create will start like this. Your next decision is which tools you can use to meet your requirements.
All solutions start with the following design layers:
- Data Sources – Various types of data, in different locations, that make up the information you need to bring together.
- Ingestion – Processes that automate moving and transforming the source data into various storage areas. This is represented by the Orchestration layer.
- Store – Depending on the type of data you are gathering, you will bring the data into various storage areas.
- Model & Serve – Once you have your raw data, you create data models and serve that data to downstream applications.
- Analytic Data Store – These are normally read-only systems that store data supporting business intelligence and analytic-style queries. (See Delta Lake for another take.)
- Analyze and Report – This represents the applications and products that are served up to end users who use and make decisions on the data. The datasets at this level are ready to use.
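To make the flow between these layers concrete, here is a minimal, in-memory Python sketch. It is only an illustration under stated assumptions: every function name (`ingest`, `store`, `model`, `publish`, `report`) and the sample sales data are hypothetical. In a real Azure solution each step would be a service (for example, an orchestration pipeline, a data lake, and a reporting tool) rather than a Python function.

```python
from types import MappingProxyType
from typing import Mapping

Record = dict  # hypothetical row type; real sources would be files, APIs, or event streams


def ingest(sources: dict[str, list[Record]]) -> list[Record]:
    """Ingestion: pull every source and tag each row with where it came from."""
    return [{**row, "source": name} for name, rows in sources.items() for row in rows]


def store(raw_zone: dict[str, list[Record]], rows: list[Record]) -> None:
    """Store: land the rows unchanged in a raw zone, grouped by source."""
    for row in rows:
        raw_zone.setdefault(row["source"], []).append(row)


def model(raw_zone: dict[str, list[Record]]) -> dict[str, float]:
    """Model & Serve: shape raw rows into an analytic model (total amount per region)."""
    totals: dict[str, float] = {}
    for rows in raw_zone.values():
        for row in rows:
            totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def publish(model_output: dict[str, float]) -> Mapping[str, float]:
    """Analytic Data Store: expose the model as a read-only view."""
    return MappingProxyType(dict(model_output))


def report(analytic_store: Mapping[str, float]) -> list[str]:
    """Analyze & Report: format ready-to-use results for end users."""
    return [f"{region}: {total:.2f}" for region, total in sorted(analytic_store.items())]


# Hypothetical sample data from two sources: point-of-sale and web orders.
sources = {
    "pos": [{"region": "East", "amount": 10.0}],
    "web": [{"region": "East", "amount": 5.5}, {"region": "West", "amount": 3.0}],
}
raw_zone: dict[str, list[Record]] = {}
store(raw_zone, ingest(sources))
print(report(publish(model(raw_zone))))  # ['East: 15.50', 'West: 3.00']
```

The raw zone keeps the data exactly as it arrived, while the published view is read-only, which mirrors the idea that the Analytic Data Store is normally a read-only layer.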
These functional areas translate to various products and services in Azure. Pictured below is how these products translate into various physical solution designs. Some tools can be used in multiple layers.
First Resources – Start Here
The following resources are the must-have links to start with. The Azure Architecture Center provides a landing page and guide to its content. The site also provides a searchable set of Azure architectures and use cases for both learning and inspiration. I have also provided a list of popular data solution architectures with examples of real-world deployments.
- Azure Architecture Center – Microsoft Docs – Main landing page for Azure Architecture. Guidance for architecting solutions on Azure using established patterns and practices.
- Browse Azure Architecture – Azure Architecture Center | Microsoft Docs – Find reference architectures, technology descriptions, real-world examples, and solution ideas for common workloads on Azure.
- Solution Architecture Example: Analytics end-to-end with Azure Synapse | Microsoft Docs – This example scenario demonstrates how to use the extensive family of Azure Data Services to build a modern data platform capable of handling the most common data challenges in an organization. The solution described in this article combines a range of Azure services that will ingest, store, process, enrich, and serve data and insights from different sources (structured, semi-structured, unstructured, and streaming).
- Solution Architecture Example: Analytics architecture design – Azure Architecture Center | Microsoft Docs – The workflow starts with learning about common approaches and aligning processes and roles around a cloud mindset.
- Solution Architecture Example: Enterprise business intelligence – Azure Reference Architectures | Microsoft Docs – This reference architecture implements an extract, load, and transform (ELT) pipeline that moves data from an on-premises SQL Server database into Azure Synapse and transforms the data for analysis.
- Solution Architecture Example: SQL Server on Azure Virtual Machines with Azure NetApp Files – Microsoft Docs – The most demanding SQL Server database workloads require very high I/O capacity. They also need low-latency access to storage. This document describes a high-bandwidth, low-latency solution for SQL Server workloads.
- Solution Architecture Example: Enterprise Data Warehouse Architecture | Microsoft Docs – An enterprise data warehouse lets you easily bring together all your data at any scale and get insights through analytical dashboards, operational reports, or advanced analytics for all your users.
- Solution Architecture Example: Real Time Analytics on Big Data Architecture – Azure Solution Ideas – Get insights from live streaming data with ease. Capture data continuously from any IoT device, or logs from website clickstreams, and process it in near-real time.
- Solution Architecture Example: Demand forecasting – Azure Solution Ideas | Microsoft Docs – Almost every business needs to predict the future to make better decisions and allocate resources more effectively. This article focuses on presenting useful links to forecasting best practices and an example of a detailed architecture for an end-to-end implementation in Azure.
Overall Azure Data Architecture Design
As you start to look at the design of your solution, the Azure Application Architecture Guide provides guidance on a series of steps, summarized below. As with each decision, you need to take your solution requirements and run them past various architecture and technology features. You make trade-offs to determine what is best for your solution today and what will also scale to match any planned growth.
This creates a tech stack that is a combination of technologies in scope for your solution. There is guidance around the benefits and challenges of each technology you choose.
The type of architecture you are building is the most basic detail and usually the first decision. It could be a big data solution, an old-school analytic solution, or part of a more traditional N-tier application. There are numerous architecture styles to examine, each with its own advantages and challenges.
For our example here, we have been looking at big data architectures and will concentrate on those.
- Azure Application Architecture Guide – Azure Architecture Center – Overall application architecture, a set of architecture styles that are commonly found in cloud applications.
- Big data architecture style – Azure Application Architecture Guide – A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional database systems. This article provides an overview.
Application architectures all start with a technology choice, which is best made by answering a specific set of questions that define your workload.
It is important to choose the right data store for your needs. There are many data implementations to choose from among the Azure database offerings. You select your data stores by their structure and operations. Each store supports different types of operations, SQL and NoSQL for example.
Let’s use the documented example. You have an application in one of the following use cases:
- Inventory management
- Sales and Order management
- Event Organization
- Reporting database
- Accounting and Payroll
- Employee Performance Data
You might define your workload this way:
- Create, Read, Update, and Delete (CRUD) heavy – data is frequently created and updated.
- Multiple operations and changes must be completed in a single transaction – ACID (atomicity, consistency, isolation, and durability).
- Relationships between data entities are enforced using database constraints.
- Indexes are used to optimize query performance.
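To see what this workload looks like in practice, the sketch below uses Python's built-in `sqlite3` module as a local stand-in for a relational service such as Azure SQL Database. The `product` and `sales_order` tables and the `place_order` function are hypothetical. The point is the behavior: a foreign key and CHECK constraints guard integrity, an index supports lookups, and the order insert plus stock update succeed or fail together as one ACID transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript(
    """
    CREATE TABLE product (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        stock INTEGER NOT NULL CHECK (stock >= 0)      -- integrity rule lives in the schema
    );
    CREATE TABLE sales_order (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES product(id),  -- relationship enforced by FK
        qty        INTEGER NOT NULL CHECK (qty > 0)
    );
    CREATE INDEX ix_sales_order_product ON sales_order(product_id);  -- speeds up lookups
    """
)
conn.execute("INSERT INTO product (id, name, stock) VALUES (1, 'Widget', 5)")
conn.commit()


def place_order(conn: sqlite3.Connection, product_id: int, qty: int) -> bool:
    """Insert an order and decrement stock atomically; return False if a constraint fails."""
    try:
        with conn:  # commits if the block succeeds, rolls back both statements otherwise
            conn.execute(
                "INSERT INTO sales_order (product_id, qty) VALUES (?, ?)", (product_id, qty)
            )
            conn.execute(
                "UPDATE product SET stock = stock - ? WHERE id = ?", (qty, product_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling `place_order(conn, 1, 3)` twice succeeds the first time; the second call would drive stock below zero, so the CHECK constraint fails and the already-inserted order row is rolled back with it. An order for a product that does not exist is rejected by the foreign key.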
Your solution might classify the data you wish to store with these specifications:
- Data is highly normalized.
- Database schemas are required and enforced.
- Many-to-many relationships between data entities in the database.
- Constraints are defined in the schema and imposed on any data in the database.
- Data requires high integrity. Indexes and relationships need to be maintained accurately.
- Data requires strong consistency. Transactions operate in a way that ensures all data are 100% consistent for all users and processes.
- Individual data entries are small to medium in size.
These requirements and specifications would lead you to use a relational data store in this example. You would look at the following relational data services that match those workload requirements:
- Azure SQL Database
- Azure Synapse Analytics
- Azure SQL Managed Instance
- Azure VM running SQL Server
- Azure Database for MySQL
- Azure Database for PostgreSQL
- Azure Database for MariaDB
There are many other things to consider once you come to a decision on architecture. Decisions around costing and scale for example have a big impact on deployment and implementation. See How to Estimate Your Azure Solution Costs for more information on determining Azure costing.
Learn More: This is just one type of data store; the reference articles below review other data store models, their advantages, and their limitations.
- Understand data store models – Generally, you should start by considering which storage model is best suited for your requirements. Then consider a particular data store within that category, based on factors such as feature set, cost, and ease of management. This article covers a great process to get this done.
- Criteria for choosing a data store – This article describes the comparison criteria you should use when evaluating a data store. The goal is to help you determine which data storage types can meet your solution’s requirements.
- Must Read!! Data store decision tree – Azure Application Architecture Guide | Microsoft Docs – Azure offers a number of managed data storage solutions, each providing different features and capabilities. This article will help you to choose a managed data store for your application.
- Review your storage options – Cloud Adoption Framework – Storage capabilities are critical for supporting workloads and services that are hosted in the cloud. As you prepare for your cloud adoption, review this information to plan for your storage needs.
- Review your data options – Cloud Adoption Framework – When you prepare your landing zone environment for your cloud adoption, you need to determine the data requirements for hosting your workloads.
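The "storage model first, product second" approach from these articles can be sketched as a small decision function. This is a deliberately simplified, hypothetical rule set based on the workload example earlier in this post, not Microsoft's actual decision tree; it returns a storage model category, and you would then pick a specific Azure service within that category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical, simplified workload traits; real evaluations use many more criteria."""
    unstructured_files: bool = False
    acid_transactions: bool = False
    enforced_schema_and_relationships: bool = False
    large_scale_analytic_queries: bool = False


def suggest_store_model(w: Workload) -> str:
    """Return a storage *model* (not a specific product) for a workload."""
    if w.unstructured_files:
        # Files, images, logs: store as objects in blob storage or a data lake.
        return "object storage / data lake"
    if w.acid_transactions and w.enforced_schema_and_relationships:
        # The CRUD-heavy, highly normalized example above lands here.
        return "relational database"
    if w.large_scale_analytic_queries:
        # Read-mostly aggregations over large volumes favor an analytical store.
        return "analytical data store"
    return "non-relational (key/value or document) store"


example = Workload(acid_transactions=True, enforced_schema_and_relationships=True)
print(suggest_store_model(example))  # relational database
```

The order of the checks is itself a design decision; real workloads often mix traits, which is exactly why the criteria articles above are worth reading in full.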
Designing the Application Architecture
The specific design of your application comes together once you have decided on the architecture style and technology components. Splitting the application architecture into the following areas will help to organize the tasks and resources.
Rather than reinventing the wheel or starting from scratch, look at the Reference Architectures, which may be a good place to start. Each reference architecture includes considerations for security, resilience, availability, and other aspects of the design. Some of these reference architectures also include a deployable solution.
There are 10 high-level design principles that make an Azure Data Solution Architecture more scalable, resilient, and manageable. These are general principles that you can use with any architecture style. These principles include:
- Design for self-healing. Design your application so it can survive and deal with failures.
- Make all things redundant. Build in redundancy to avoid single points of failure.
- Minimize coordination. Minimize coordination between application services to achieve scalability.
- Design to scale out. Design your application so that it can scale horizontally, adding or removing instances as demand requires.
- Partition around limits. Use partitioning to work around database, network, and compute limits.
- Design for operations. Design your application so that the operations team has the tools they need.
- Use managed services. When possible, use platform as a service (PaaS) rather than infrastructure as a service (IaaS). Let someone else manage your platform so you can concentrate on the solution.
- Use the best data store for the job. Pick the storage technology that is the best fit for your data. Watch out for edge cases and future growth of requirements. What happens if your application goes viral?
- Design for evolution. If data solutions have one constant, it is that requirements change over time. Don’t design yourself into a corner.
- Build for the needs of business. Watch out for scope creep. Every design decision must be justified by a business requirement.
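"Design for self-healing" often starts with retrying transient failures, such as a throttled storage call or a brief network timeout. Below is a minimal Python sketch of retry with capped exponential backoff and jitter. `TransientError` and `retry_with_backoff` are hypothetical names, and the default delays are assumptions you would tune; Azure SDKs typically ship their own retry policies, which you should prefer when available.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, such as timeouts or throttling."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying transient failures with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the last failure
            # Delay window doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # "full jitter" picks a random point in it so clients don't retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Only `TransientError` is retried; any other exception propagates immediately, because retrying a genuine bug or a permission error just delays the failure. The injectable `sleep` parameter keeps the function easy to test.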
There are a number of cloud design patterns that address specific challenges in distributed systems, covering areas such as availability, operational excellence, resiliency, performance, and security. The resource below includes a specific section on data management patterns.
I tend not to use the term best practice, as it assumes that the suggestions are best for everyone. I prefer recommended practices, because you should always make sure that you follow what is best for your specific solution and situation. One person’s best practice could grind your solution to a halt!!
Resource: Best practices in cloud applications – Azure Architecture Center | Microsoft Docs. In particular, see the Data partitioning and Monitoring and diagnostics articles.
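As a small illustration of the "Partition around limits" principle and the data partitioning guidance, here is a hypothetical hash-partitioning sketch in Python. The function names and the `customer_id` key are assumptions for illustration. The idea is that a stable hash of a partition key spreads rows evenly across a fixed number of partitions, while every row for the same key always lands in the same partition.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition number in [0, partition_count) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def partition_rows(rows: list[dict], key_field: str, partition_count: int) -> dict[int, list[dict]]:
    """Group rows so every row with the same key lands in the same partition."""
    partitions: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        partitions[partition_for(str(row[key_field]), partition_count)].append(row)
    return dict(partitions)


# Hypothetical orders for 10 customers spread over 4 partitions.
orders = [{"customer_id": f"cust-{i % 10}", "amount": i} for i in range(100)]
for partition, rows in sorted(partition_rows(orders, "customer_id", 4).items()):
    print(partition, len(rows))
```

`hashlib` is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would send the same key to different partitions on different runs. Note that changing `partition_count` remaps most keys; techniques such as consistent hashing exist to reduce that reshuffling.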
Storage and processing of business data need high levels of confidentiality, integrity, and availability. The resources below cover some of the security and governance topics important for Azure Data Solution Architecture and design.
- Application security in Azure | Microsoft Docs – Applications and the data associated with them ultimately act as the primary store of business value on a cloud platform. This article covers a high level review of application platform security topics.
- Data Governance: Why You Need a Data Governance Process Now!! – 5MinuteBI – There is an immediate requirement for Data Governance Initiatives to determine how to secure data usage, manage activity, gain visibility and control of one of your most important assets. In many of the analytic projects I have been involved in, whether big or small, providing guidance to those using the data increases the adoption and long-term value of the solution.
- Protection of customer data in Azure | Microsoft Docs – Protection of your data in Azure
- Azure SQL Database security features | Microsoft Docs – To protect customer data and provide strong security features that customers expect from a relational database service, SQL Database has its own sets of security capabilities. These capabilities build upon the controls that are inherited from Azure.
What is the Microsoft Azure Well-Architected Framework?
Successful Azure Data Solutions start with a well-defined and well-architected platform. As more and more solutions have been migrated to Azure, Microsoft has produced a set of recommended practices and guidance called the Azure Well-Architected Framework.
Remember that the goal of the framework is to help you design a high-quality solution that is stable under load, cost-effective, scalable, and efficient.
The Azure Well-Architected Framework is divided into five principles, or tenets, called the five pillars of architectural excellence. The sections below review these principles with an eye to an Azure Data Solution architecture. Links in the table provide more detailed information on each topic.
|Pillar|Description|
|---|---|
|Reliability|The ability of a system to recover from failures and continue to function.|
|Security|Protecting applications and data from threats.|
|Cost Optimization|Managing costs to maximize the value delivered.|
|Operational Excellence|Operations processes that keep a system running in production.|
|Performance Efficiency|The ability of a system to adapt to changes in load.|
Utilizing Data Lakes & Delta Lakes in a Data Architecture
Azure Blob Storage
Azure Blob storage is a service that stores large amounts of unstructured data, such as text or binary data, that can be accessed from anywhere in the world via HTTP or HTTPS. It is a low-cost way to store large amounts of data that can then be used by many different services in Azure. Blob storage can expose data publicly to the world or store application data privately.
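Because every blob is addressable over HTTP or HTTPS, it helps to know how a blob URL is put together. The Python sketch below builds the URL for the public Azure cloud endpoint (`<account>.blob.core.windows.net`). The account, container, and blob names in the example are hypothetical, and the name checks reflect the published naming rules for storage accounts (3 to 24 lowercase letters and numbers) and containers (3 to 63 characters of lowercase letters, numbers, and single hyphens). A URL alone only works for public data; private containers also need authorization, such as a SAS token or Microsoft Entra ID (Azure AD).

```python
import re
from urllib.parse import quote

ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
CONTAINER_RE = re.compile(r"(?!.*--)[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the HTTPS URL of a blob on the public Azure cloud endpoint."""
    if not ACCOUNT_RE.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    if not CONTAINER_RE.fullmatch(container):
        raise ValueError(f"invalid container name: {container!r}")
    # Keep "/" so virtual folders in the blob name stay readable; escape everything else.
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name, safe='/')}"


print(blob_url("mystorageacct", "raw", "sales/2021/jan report.csv"))
# https://mystorageacct.blob.core.windows.net/raw/sales/2021/jan%20report.csv
```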
- Introduction to Blob storage
- Deciding when to use Azure Blobs, Azure Files, or Azure Disks
- Azure Storage Client Tools
- Get started with Storage Explorer (Preview)
- Getting the Latest Azure Storage Explorer Version
Azure Data Lake Store & Analytics
These services allow unstructured, semi-structured, and structured data to be stored in an enterprise-class service with no limits on the size of data. Azure Data Lake Store is secured, scalable, and built to the open HDFS standard, which can then be used to run massively parallel analytics.
- Azure Data Lake Storage Gen2 Introduction – Data Lake Storage Gen2 converges the capabilities of Azure Data Lake Storage Gen1 with Azure Blob Storage. For example, Data Lake Storage Gen2 provides file system semantics, file-level security, and scale. Because these capabilities are built on Blob storage, you’ll also get low-cost, tiered storage, with high availability/disaster recovery capabilities.
- Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark – This tutorial shows you how to connect your Azure Databricks cluster to data stored in an Azure storage account that has Azure Data Lake Storage Gen2 enabled. This connection enables you to natively run queries and analytics from your cluster on your data.
- Azure Data Lake Storage Gen2 Hierarchical Namespace – A key mechanism that allows Azure Data Lake Storage Gen2 to provide file system performance at object storage scale and prices is the addition of a hierarchical namespace. This allows the collection of objects/files within an account to be organized into a hierarchy of directories and nested subdirectories in the same way that the file system on your computer is organized.
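Why does a hierarchical namespace matter for performance? The toy Python model below compares renaming a "directory" in a flat object store, where folders are just key prefixes and every object under the prefix must be moved one by one, with a hierarchical namespace, where the directory is a real node and a rename is a single metadata operation. Both classes are hypothetical and only illustrate the concept; they are not how Azure Data Lake Storage Gen2 is implemented internally.

```python
class FlatNamespace:
    """Toy flat object store: 'directories' are only key prefixes."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}
        self.ops = 0  # count of storage operations performed by renames

    def put(self, path: str, data: bytes) -> None:
        self.objects[path] = data

    def rename_dir(self, old: str, new: str) -> None:
        # Every object under the prefix must be copied and deleted individually.
        for key in [k for k in self.objects if k.startswith(old + "/")]:
            self.objects[new + key[len(old):]] = self.objects.pop(key)
            self.ops += 1


class HierarchicalNamespace:
    """Toy hierarchical namespace: directories are nodes in a tree."""

    def __init__(self) -> None:
        self.root: dict = {}
        self.ops = 0

    def _parent(self, path: str, create: bool = True) -> tuple[dict, str]:
        *dirs, name = path.split("/")
        node = self.root
        for d in dirs:
            node = node.setdefault(d, {}) if create else node[d]
        return node, name

    def put(self, path: str, data: bytes) -> None:
        parent, name = self._parent(path)
        parent[name] = data

    def get(self, path: str) -> bytes:
        parent, name = self._parent(path, create=False)
        return parent[name]

    def rename_dir(self, old: str, new: str) -> None:
        # One metadata operation: detach the directory node and re-attach it elsewhere.
        old_parent, old_name = self._parent(old, create=False)
        new_parent, new_name = self._parent(new)
        new_parent[new_name] = old_parent.pop(old_name)
        self.ops += 1


flat, tree = FlatNamespace(), HierarchicalNamespace()
for i in range(100):
    flat.put(f"raw/2021/file{i}.csv", b"data")
    tree.put(f"raw/2021/file{i}.csv", b"data")
flat.rename_dir("raw/2021", "archive/2021")
tree.rename_dir("raw/2021", "archive/2021")
print(flat.ops, tree.ops)  # 100 1
```

The same reasoning applies to deleting a directory, which is why directory-level operations common in analytics jobs (such as moving processed files to an archive folder) benefit from a hierarchical namespace.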
Azure Delta Lakes (Databricks)
Delta Lake is an interesting option for the Lakehouse architecture pattern put forward by Databricks. It addresses many of the challenges of traditional data architectures and is becoming a very popular option for data solutions. Learn more with this great introductory article, Simplify Your Lakehouse Architecture with Azure Databricks, Delta Lake, and Azure Data Lake Storage.
Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions, scalable metadata handling, and unified streaming and batch data processing. Delta Lake runs on top of your existing data lake and is compatible with Apache Spark APIs.
Azure Databricks also includes Delta Engine, which provides optimized layouts and indexes for fast interactive queries.
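How can a set of files in a data lake behave transactionally? Delta Lake keeps a transaction log in a `_delta_log` folder next to the data, with one JSON commit file per table version, and readers replay that log to find which data files make up the current table. The Python sketch below is a heavily simplified toy of that idea: it only models `add` and `remove` actions, ignores checkpoints, schema, and statistics, and uses a local `os.link` (which fails if the target exists) as a stand-in for the put-if-absent guarantee that makes each commit atomic. The `commit` and `snapshot` helpers and the Parquet file names are hypothetical; use the real Delta Lake libraries for actual tables.

```python
import json
import os
import tempfile


def _commit_path(log_dir: str, version: int) -> str:
    # Commit files are named by zero-padded version so they sort in commit order.
    return os.path.join(log_dir, f"{version:020d}.json")


def commit(table_dir: str, version: int, actions: list[dict]) -> bool:
    """Atomically publish commit `version`; return False if another writer got there first."""
    log_dir = os.path.join(table_dir, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=log_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write("".join(json.dumps(a) + "\n" for a in actions))  # one action per line
        os.link(tmp, _commit_path(log_dir, version))  # fails if this version already exists
        return True
    except FileExistsError:
        return False  # optimistic concurrency: caller must re-read and retry
    finally:
        os.remove(tmp)


def snapshot(table_dir: str) -> set[str]:
    """Replay the log in version order to get the set of live data files."""
    log_dir = os.path.join(table_dir, "_delta_log")
    files: set[str] = set()
    for name in sorted(n for n in os.listdir(log_dir) if n.endswith(".json")):
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return files


table = tempfile.mkdtemp()
commit(table, 0, [{"add": {"path": "part-0000.parquet"}}])
# Version 1 replaces the file in a single atomic commit (e.g. an UPDATE or compaction).
commit(table, 1, [{"remove": {"path": "part-0000.parquet"}}, {"add": {"path": "part-0001.parquet"}}])
print(snapshot(table))  # {'part-0001.parquet'}
```

Readers never see half of version 1: either its commit file exists and both actions apply, or it does not and the table is still at version 0. That is the essence of how a log layered over plain files provides ACID guarantees.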
- Delta Lake on Azure – Microsoft Tech Community – Shows how Delta Lake integrates with other Azure services.
- What is Delta Lake – Azure Synapse Analytics – Azure Synapse Analytics is compatible with Linux Foundation Delta Lake. Delta Lake is an open-source storage layer that brings ACID (atomicity, consistency, isolation, and durability) transactions to Apache Spark and big data workloads. The current version of Delta Lake included with Azure Synapse has language support for Scala, PySpark, and .NET.
- Delta Lake and Delta Engine guide – Azure Databricks – Azure Databricks also includes Delta Engine, which provides optimized layouts and indexes for fast interactive queries. This guide covers Delta Lake on Azure Databricks and Delta Engine.
- Tutorial: Delta Lake quickstart – Azure Databricks – Step by step tutorial to get started.
- The emerging big data architectural pattern | Azure blog and updates – The Lambda architecture is a popular pattern that allows you to handle massive quantities of data by taking advantage of both a batch and stream-processing layer. This article reviews the reasons that have led to the popularity and success of the lambda architecture, particularly in big data processing pipelines.
- Azure Icons – Azure Architecture Center | Microsoft Docs – For your diagrams, these are SVG graphics that can be used in your documentation.
- Modern analytics architecture with Azure Databricks – Azure Solution Ideas | Microsoft Docs – Azure Databricks forms the core of the solution. This platform works seamlessly with other services such as Azure Data Lake Storage, Azure Data Factory, Azure Synapse Analytics, and Power BI.
- Case Study: How to reduce infrastructure costs by up to 80% with Azure Databricks and Delta Lake – Microsoft Tech Community – The implementation of the modern data architecture allowed Relogix to scale back costs on wasted compute resources by 80% while further empowering their data team.
- Microsoft Cloud Adoption Framework for Azure – Microsoft Docs – The Cloud Adoption Framework is a collection of documentation, implementation guidance, best practices, and tools that are proven guidance from Microsoft designed to accelerate your cloud adoption journey.
- Microsoft Azure Well-Architected Framework – Azure Architecture Center | Microsoft Docs – The Azure Well-Architected Framework is a set of guiding tenets that can be used to improve the quality of a workload.
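The Lambda architecture mentioned in the list above combines a batch layer, which periodically recomputes accurate views from the full history, with a speed layer, which covers only the events that arrived since the last batch run. A serving layer merges the two at query time. The Python sketch below shows that merge for a hypothetical page-view count; the class and function names are assumptions, and in Azure the layers would typically be services such as Spark jobs and a stream processor rather than in-memory counters.

```python
from collections import Counter
from typing import Iterable


def batch_view(master_dataset: Iterable[dict]) -> Counter:
    """Batch layer: recompute page-view counts from the full, immutable history."""
    return Counter(event["page"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.realtime: Counter = Counter()

    def on_event(self, event: dict) -> None:
        self.realtime[event["page"]] += 1

    def reset(self) -> None:
        # Called once a fresh batch view has absorbed these events.
        self.realtime.clear()


def query(batch: Counter, speed: SpeedLayer, page: str) -> int:
    """Serving layer: merge the batch and real-time views at query time."""
    return batch[page] + speed.realtime[page]


history = [{"page": "/home"}] * 3 + [{"page": "/pricing"}]
batch = batch_view(history)
speed = SpeedLayer()
speed.on_event({"page": "/home"})  # arrived after the last batch run
print(query(batch, speed, "/home"))  # 4
```

The trade-off is maintaining two code paths for the same logic; Delta Lake's unified batch and streaming processing is one of the reasons the Lakehouse pattern discussed above is often presented as a simpler alternative.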