
Learning Guide: Azure Data Solution Architecture Resources

There are many aspects to a successful Azure Data Solution Architecture, and this guide reviews a set of curated resources that will get you started. The resources are split into several sections that roughly follow the order of your design decisions. Because the range of possible architectures is so wide, this post concentrates on reporting and analytics solutions.

Start with a High Level Data Solution Logical Architecture

When looking at a data architecture design, you really need to start with the requirements of your solution. All solutions can be boiled down to a specific set of elements. As pictured below, you have the information you want to collect, either in real-time streaming or in batches. This information will be stored in a data lake or database, where other people will use it to create analytics and reports to make decisions.

Most of the data solutions you create will start like this. Your next decisions will be what tools you can use to manage your requirements successfully.   

Logical Data Architecture for an Analytical Solution

All solutions start with the following design layers:

  • Data Sources – The various types of data, in different locations, that make up the information you need to bring together.
  • Ingestion – Processes that automate moving and transforming the source data into the various storage areas. This is represented by the Orchestration layer.
  • Store – Depending on the type of data you are gathering, you will bring the data into various storage areas.
  • Model & Serve – Once you have your raw data, you create data models and serve that data to downstream applications.
  • Analytic Data Store – These are normally read-only systems that store data supporting business intelligence and analytic-style queries. (See Delta Lake for another take.)
  • Analyze and Report – The applications and products served up to end users, who use the data to make decisions. The datasets at this level are ready to use.

These functional areas translate to various products and services in Azure. Pictured below is how these products translate into various physical solution designs. Some tools can be used in multiple layers.
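
As a rough sketch of that translation, the mapping below pairs each logical layer with Azure services that are commonly used for it. The specific pairings are illustrative assumptions rather than a list taken from the diagram, and the helper name `layers_for` is hypothetical. It mainly shows how one service can appear in more than one layer.

```python
# Illustrative only: one possible mapping of logical layers to Azure services.
LAYER_SERVICES = {
    "Ingestion": ["Azure Data Factory", "Azure Event Hubs"],
    "Store": ["Azure Data Lake Storage Gen2", "Azure Blob Storage"],
    "Model & Serve": ["Azure Databricks", "Azure Synapse Analytics"],
    "Analytic Data Store": ["Azure Synapse Analytics", "Azure SQL Database"],
    "Analyze and Report": ["Power BI"],
}


def layers_for(service: str) -> list[str]:
    """Return every logical layer in which the given service appears."""
    return [
        layer for layer, services in LAYER_SERVICES.items() if service in services
    ]


# A service that spans two layers in this sketch.
print(layers_for("Azure Synapse Analytics"))
```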

General Azure Physical Solution Architecture (Technology)

First Resources – Start Here

The following resources are the must-have links to start with. The Azure Architecture Center provides a landing page and guide to content. The site also provides a searchable set of Azure architectures and use cases for both learning and inspiration. I have also listed various popular data solution architectures with examples of real-world deployments.

Overall Azure Data Architecture Design

As you start to look at the design of your solution, the Microsoft Application Architecture Guide walks you through a series of steps, summarized below. With each decision, you take your solution requirements and compare them against the features of the various architectures and technologies. Then, you make tradeoffs to determine what will be best for your solution today and what will scale to match any planned growth.

This creates a tech stack that combines technologies in scope for your solution. In addition, there is guidance around the benefits and challenges of each technology you choose.

Architecture Style

The type of architecture you are constructing is the most basic detail and usually the first decision. It could be a big data solution, an old-school analytic solution, or part of a more traditional N-tier application. Each architectural style has its own advantages and challenges, which you need to examine before selecting one.

For our example here, we have been looking at big data architectures and will concentrate on those.

Learn more

Technology Choices

Application architectures all start with a technology choice, which is best made by defining your workload and answering a specific set of questions about it. Note that you can also use Power BI as an embedded solution in your web application.

It is important to choose the right data store for your needs. There are many data implementations to choose from among the Azure database offerings. You select your data stores by their structure and operations. Each store supports different types of operations, SQL and NoSQL, for example.

Let’s use the documented example. You have an application in one of the following use cases:

  • Inventory management
  • CRM
  • Sales and Order management
  • Event Organization
  • Reporting database
  • Accounting and Payroll
  • Employee Performance Data

You might define your workload this way:

  • Create, Read, Update, and Delete (CRUD) heavy – records are frequently created and updated.
  • Multiple operations and changes have to be completed in a single transaction – A.C.I.D.: atomicity, consistency, isolation, and durability.
  • Relationships between data entities are enforced using database constraints.
  • Indexes are used to optimize query performance.
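
To make the workload above concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for any relational store. The table names, the sample rows, and the `place_orders` helper are hypothetical. It shows constraints, an index, and a multi-row change that either commits fully or rolls back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL CHECK (total >= 0)
    );
    CREATE INDEX idx_orders_customer ON orders(customer_id);
    INSERT INTO customers (id, name) VALUES (1, 'Contoso');
""")


def place_orders(rows):
    """Insert all rows in one transaction; roll back everything on any failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO orders (customer_id, total) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = place_orders([(1, 25.0), (1, 40.0)])
bad = place_orders([(1, 10.0), (99, 5.0)])  # customer 99 does not exist
count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(ok, bad, count)
```

The second call fails on its last row, so its first row is rolled back as well, which is the atomicity the workload description asks for.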

Your solution might classify the data you wish to store with these specifications:

  • Data is highly normalized.
  • Database schemas are required and enforced.
  • Many-to-many relationships between data entities in the database.
  • Constraints are defined in the schema and imposed on any data in the database.
  • Data requires high integrity. Indexes and relationships need to be maintained accurately.
  • Data requires strong consistency. Transactions operate in a way that ensures all data are 100% consistent for all users and processes.
  • Size of individual data entries is small to medium-sized.

Defining the solution requirements and specifications this way would lead you to a relational data store in this example. You would look at the following relational data services that match those workload requirements:

  • Azure SQL Database
  • Azure Synapse Analytics
  • Azure SQL Managed Instance
  • Azure VM running SQL Server
  • Azure Database for MySQL
  • Azure Database for PostgreSQL
  • Azure Database for MariaDB
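
The reasoning from workload to data store can be sketched as a simple checklist. The `WorkloadProfile` class, its field names, and the `unmet_relational_criteria` helper are hypothetical names invented for this illustration; a real evaluation would weigh many more factors, such as cost and scale.

```python
from dataclasses import dataclass, fields


@dataclass
class WorkloadProfile:
    """Hypothetical checklist of the relational indicators described above."""
    crud_heavy: bool
    multi_operation_transactions: bool
    enforced_relationships: bool
    enforced_schema: bool
    strong_consistency: bool


def unmet_relational_criteria(profile: WorkloadProfile) -> list[str]:
    """Return the names of the criteria the workload does not satisfy."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


inventory = WorkloadProfile(True, True, True, True, True)
event_feed = WorkloadProfile(False, False, False, False, True)

# An empty list suggests a relational store is a reasonable candidate.
print(unmet_relational_criteria(inventory))
print(unmet_relational_criteria(event_feed))
```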

There are many other things to consider once you decide on architecture. Decisions around costing and scale, for example, have a big impact on deployment and implementation. See How to Estimate Your Azure Solution Costs for more information on determining Azure costing.

Learn More: This is just one data store; the reference articles below review other data architectures, advantages, and limitations.

  • Understand data store models – Generally, you should start by considering which storage model is best suited for your requirements. Then consider a particular data store within that category, based on factors such as feature set, cost, and ease of management. This article covers a great process to get this done.
  • Criteria for choosing a data store – This article describes the comparison criteria you should use when evaluating a data store. The goal is to help you determine which data storage types can meet your solution’s requirements.
  • Must read: Data store decision tree – Azure Application Architecture Guide | Microsoft Docs – Azure offers a number of managed data storage solutions, each providing different features and capabilities. This article will help you to choose a managed data store for your application.
  • Review your storage options – Cloud Adoption Framework – Storage capabilities are critical for supporting workloads and services that are hosted in the cloud. As you prepare for your cloud adoption, review this information to plan for your storage needs.
  • Review your data options – Cloud Adoption Framework – When you prepare your landing zone environment for your cloud adoption, you need to determine the data requirements for hosting your workloads. 

Designing the Application Architecture

The specific design of your application comes together once you have decided on the architecture style and technology components. Splitting the application architecture into the following areas will help organize the tasks and resources.

Reference Architectures

Rather than reinvent the wheel or even start from scratch, several Reference Architectures may be a good place to start. There are considerations for security, resilience, availability, and other design aspects in each reference architecture.   Some of these reference architectures also include a deployable solution.

Reference: Browse Azure Architecture – Azure Architecture Center | Microsoft Docs

Design Principles

Ten high-level Azure Data Solution Architecture design principles help make your solution more scalable, resilient, and manageable. These are general principles that you can use with any architecture style. They are:

  • Design for self-healing. Design your application so it can survive and deal with failures.
  • Make all things redundant. Build in redundancy to avoid single points of failure.
  • Minimize coordination. Minimize coordination between application services to achieve scalability.
  • Design to scale out. Design your application so that it can scale horizontally, adding or removing new instances as demand requires.
  • Partition around limits. Use partitioning to work around database, network, and compute limits.
  • Design for operations. Design your application so that the operations team has the tools they need.
  • Use managed services. When possible, use platform as a service (PaaS) rather than infrastructure as a service (IaaS). Let someone else manage your platform so you can concentrate on the solution.
  • Use the best data store for the job. Pick the storage technology that is the best fit for your data. Watch out for edge cases and future growth of requirements. What happens if your application goes viral?
  • Design for evolution. If data solutions have one constant, it is that requirements change over time. Don’t design yourself into a corner.
  • Build for the needs of business. Watch out for scope creep. Every design decision must be justified by a business requirement.

Source: Design principles for Azure applications – Azure Architecture Center | Microsoft Docs
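
As one small illustration of the "design for self-healing" principle, the sketch below retries a failing call with exponential backoff and jitter. The `retry` helper, its parameters, and the `flaky` function are hypothetical; retrying is only one of several self-healing techniques, and production code would usually rely on the retry support built into the client library in use.

```python
import random
import time


def retry(operation, attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on ConnectionError wait base_delay * 2**n plus jitter, then retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


calls = {"n": 0}


def flaky():
    """Hypothetical dependency that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = retry(flaky, sleep=lambda seconds: None)  # skip real sleeping in the demo
print(result, calls["n"])
```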

Design Patterns

Several cloud design patterns address specific challenges in distributed systems. They cover areas including availability, high availability, operational excellence, resiliency, performance, and security. A specific section in the resource below covers data management patterns.

Resource: Cloud design patterns – Azure Architecture Center | Microsoft Docs

Best Practices

I tend not to use the term Best Practice, because it assumes that the suggestions are best for everyone. I prefer recommended practices, as you should always make sure that you follow what is best for your specific solution and situation. One person’s best practice could grind your solution to a halt!

Resource: Best practices in cloud applications – Azure Architecture Center | Microsoft Docs. See specifically Data partitioning and Monitoring and diagnostics.
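
To give a feel for the data partitioning topic, here is a minimal sketch of hash-based horizontal partitioning. The `partition_for` helper, the key format, and the partition count of four are assumptions made for illustration; the right partition key and strategy depend on your access patterns.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Hypothetical customer keys spread across four partitions.
keys = [f"customer-{i}" for i in range(1000)]
counts = [0] * 4
for key in keys:
    counts[partition_for(key, 4)] += 1

print(counts)  # roughly even spread; exact numbers depend on the hash
```

A stable hash keeps the same key on the same partition across processes and restarts, which the built-in `hash()` does not guarantee for strings.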

Security best practices

Storage and processing of business data need high confidentiality, integrity, and availability. The resources below cover some of the security and governance topics important for Azure Data Solution Architecture and design.

Resources:

  • Application security in Azure | Microsoft Docs – Applications and the data associated with them ultimately act as the primary store of business value on a cloud platform. This article covers a high-level review of application platform security topics.
  • Data Governance: Why You Need a Data Governance Process Now!! – 5MinuteBI – There is an immediate requirement for Data Governance Initiatives to determine how to secure data usage, manage activity, gain visibility and control of one of your most important assets. In many of the analytic projects I have been involved in, whether big or small, providing guidance to those using the data increases the adoption and long-term value of the solution.
  • Protection of customer data in Azure | Microsoft Docs – Protection of your data in Azure
  • Azure SQL Database security features | Microsoft Docs – To protect customer data and provide strong security features that customers expect from a relational database service, SQL Database has its own sets of security capabilities. These capabilities build upon the controls that are inherited from Azure.

What is the Microsoft Azure Well-Architected Framework?

Successful Azure Data Solutions start with a well-defined and architected platform. As more and more solutions have been migrated to Azure, Microsoft has produced a series of recommended practices and guidance called the Azure Well-Architected Framework.

Remember that the framework’s goal is to help you design a solution that is high quality, stable under load, and cost-effective, built on a scalable and efficient cloud architecture.

The Azure Well-Architected Framework is divided into various principles or tenets called the five pillars of architectural excellence. These sections review these principles with an eye to an Azure Data Solution architecture. In addition, links in the table provide more detailed information on the topic. 

Pillar | Description
Reliability | The ability of a system to recover from failures and continue to function.
Security | Protecting applications and data from threats.
Cost Optimization | Managing costs to maximize the value delivered.
Operational Excellence | Operations processes that keep a system running in production.
Performance Efficiency | The ability of a system to adapt to changes in load.

Source: Microsoft Azure Well-Architected Framework

Utilizing Data Lakes & Delta Lakes in a Data Architecture

Azure Blob Storage

Azure Blob Storage is a service for storing large amounts of unstructured data, such as text or binary data, that can be accessed from anywhere in the world via HTTP or HTTPS. This is a low-cost way to store large amounts of data that can then be used in many different Azure services. Blob storage can expose data publicly to the world or store application data privately.
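
Since blobs are addressed over HTTPS, each one has a URL built from the storage account, container, and blob name. The sketch below assembles such a URL; the account, container, and file names are made up for illustration, and a private blob would still need authorization (for example a SAS token) before the URL could be read.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the HTTPS endpoint URL for a blob in a storage account."""
    # quote() keeps "/" so virtual folder paths stay readable, and escapes spaces.
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"


# Hypothetical account, container, and blob names.
print(blob_url("examplestorage", "reports", "2024/sales summary.csv"))
```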

Azure Data Lake Store & Analytics

These services allow unstructured, semi-structured, and structured data to be stored in an enterprise-class service with no limits on data size. In addition, Azure Data Lake Store is secure, scalable, and built to the open HDFS standard, so you can run massively parallel analytics on the data it holds.

Resources:

  • Azure Data Lake Storage Gen2 Introduction – Data Lake Storage Gen2 converges the capabilities of Azure Data Lake Storage Gen1 with Azure Blob Storage. For example, Data Lake Storage Gen2 provides file system semantics, file-level security, and scale. Because these capabilities are built on Blob storage, you’ll also get low-cost, tiered storage, with high availability/disaster recovery capabilities.
  • Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark – This tutorial shows you how to connect your Azure Databricks cluster to data stored in an Azure storage account that has Azure Data Lake Storage Gen2 enabled. This connection enables you to natively run queries and analytics from your cluster on your data.
  • Azure Data Lake Storage Gen2 Hierarchical Namespace – A key mechanism that allows Azure Data Lake Storage Gen2 to provide file system performance at object storage scale and prices is the addition of a hierarchical namespace. This allows the collection of objects/files within an account to be organized into a hierarchy of directories and nested subdirectories in the same way that the file system on your computer is organized.
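
The benefit of a hierarchical namespace can be seen in a toy model. In a flat key space, a "folder" is only a shared prefix, so renaming it means rewriting every object key, while in a hierarchy the directory is a single node. The sketch below uses plain Python dictionaries and hypothetical helper names; it illustrates the idea only and is not how the service is implemented.

```python
def rename_flat(store: dict, old_prefix: str, new_prefix: str) -> int:
    """Rename a 'folder' in a flat key space; returns how many objects were touched."""
    keys = [k for k in store if k.startswith(old_prefix)]
    for key in keys:
        store[new_prefix + key[len(old_prefix):]] = store.pop(key)
    return len(keys)


def rename_hierarchical(tree: dict, old_name: str, new_name: str) -> int:
    """Rename a directory node; its children move with it in one operation."""
    tree[new_name] = tree.pop(old_name)
    return 1


flat = {f"raw/2024/file{i}.csv": b"" for i in range(3)}
tree = {"raw": {"2024": {f"file{i}.csv": b"" for i in range(3)}}}

flat_ops = rename_flat(flat, "raw/", "archive/")
tree_ops = rename_hierarchical(tree, "raw", "archive")
print(flat_ops, tree_ops)
```

With three files the difference is small, but the flat rename grows with the number of objects under the prefix while the hierarchical rename stays a single operation.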

Azure Delta Lakes (Databricks)

Delta Lake is an interesting option for the Lakehouse architecture pattern put forward by Databricks. It addresses many of the challenges of traditional data architectures. As a result, it is becoming a prevalent option for data solutions. Learn more in this great introductory article, Simplify Your Lakehouse Architecture with Azure Databricks, Delta Lake, and Azure Data Lake Storage.

Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions, scalable metadata handling, and unified streaming and batch data processing. Delta Lake runs on top of your existing data lake and is compatible with Apache Spark APIs.
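
Delta Lake gets its ACID behavior from an ordered transaction log combined with optimistic concurrency control. The toy class below imitates that idea in plain Python so the mechanics are visible. It is a simplified sketch, not the Delta Lake implementation: the `ToyLog` class and file names are hypothetical, and real Delta Lake allows some concurrent commits (such as non-conflicting appends) that this stricter toy rejects.

```python
class ToyLog:
    """Toy append-only commit log; NOT the Delta Lake implementation."""

    def __init__(self):
        self.commits = []  # commit N lives at index N

    def commit(self, read_version: int, actions: list) -> int:
        """Append actions as one unit if nobody committed since read_version."""
        if read_version != len(self.commits) - 1:
            raise RuntimeError("conflict: table changed since it was read")
        self.commits.append(actions)
        return len(self.commits) - 1

    def files(self, version=None) -> set:
        """Replay add/remove actions up to a version to find the live data files."""
        last = len(self.commits) - 1 if version is None else version
        live = set()
        for actions in self.commits[: last + 1]:
            for op, path in actions:
                if op == "add":
                    live.add(path)
                else:
                    live.discard(path)
        return live


log = ToyLog()
v0 = log.commit(-1, [("add", "part-0.parquet")])
v1 = log.commit(v0, [("remove", "part-0.parquet"), ("add", "part-1.parquet")])
try:
    log.commit(v0, [("add", "part-2.parquet")])  # stale writer still at version 0
    conflict = False
except RuntimeError:
    conflict = True

print(log.files(), log.files(version=0), conflict)
```

Because readers rebuild the table from the log, a rewrite that removes one file and adds another becomes visible all at once, and older versions remain readable by replaying fewer commits.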

Azure Databricks also includes Delta Engine, which provides optimized layouts and indexes for fast, interactive queries.
