5MinuteBI: Creating Data Power Users, 5 Minutes at a Time

Using Azure Data Lake with Power BI

By Steve Young | Updated on May 24, 2023

Power BI can draw on a wide variety of data sources for your visualization projects. As more visualization workloads include big data sources, the ability to use sources such as Azure Data Lake (ADL) is becoming more common. This walkthrough shows you how to access files in a Data Lake and import them into a Power BI data model.

Scenario: Daily Files Dropped to ADL

The real-world scenario in this example has a series of files produced each day, streamed into a time-based folder structure as pictured below.

The log files generated each hour are captured in a folder using the format Year\Month\Day\Hour\*.csv.
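
The hourly layout can be sketched in a few lines of Python. This is an illustration of the folder convention described above, not part of the pipeline itself; the zero-padding and forward-slash separators (ADLS paths use `/`) are assumptions.

```python
from datetime import datetime

def hourly_folder(ts: datetime) -> str:
    """Build the hourly drop-folder path for a log timestamp,
    following the Year/Month/Day/Hour layout described above."""
    return ts.strftime("%Y/%m/%d/%H")

# A log written at 2023-05-24 09:15 would land under:
print(hourly_folder(datetime(2023, 5, 24, 9, 15)))  # 2023/05/24/09
```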

These raw files are then combined into analytic extracts using U-SQL or Databricks and placed into an area within the Data Lake for consumption by other tools and processes.

Power BI can then consume the data from the same location as the sources are refreshed, keeping the heavy lifting in the ingestion and automation layer to a minimum.

The folder is stored in the data lake, secured by an Azure Active Directory account.

Formatting of Files

There are a couple of things to keep in mind when creating files for import.

First, many files generated for big data consumption do not include column headers. Without headers, you will have to add column names later in every consuming query, which can become a lot of work.
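
The fix is the same everywhere: supply the column names yourself. A small Python sketch of the idea, with hypothetical column names (in practice these come from the upstream system's documentation):

```python
import csv
import io

# Hypothetical column names for a headerless log file.
COLUMNS = ["event_time", "user_id", "action"]

def read_headerless(text, columns):
    """Read headerless CSV text and return a list of dicts keyed by
    the supplied column names -- the same mapping you would otherwise
    repeat in every Power BI query that touches the file."""
    return [dict(zip(columns, row)) for row in csv.reader(io.StringIO(text))]

sample = "2023-05-24T09:15:00,42,login\n"
print(read_headerless(sample, COLUMNS))
```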

Second, having the files in comma-separated, or at least delimited, format is a big help. You can import and transform unstructured or report-style files using the Power BI Query Editor, but the more consistently formatted the inputs, the bigger the productivity boost for your group.

Note that there is a move toward Parquet files. Apache Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient compression and encoding schemes with enhanced performance for handling complex data in bulk. Unlike plain CSV, a Parquet file retains data type information and is compressed, saving space and memory.

If several groups consume the files for different projects, each group would otherwise have to duplicate the transformation process. Our example pictured below has a storage account that contains a number of folders. Capturing these files in this location allows us to refresh the data model as the files are updated over time.

Step 1: Getting the Azure Data Lake Folder Path

  1. In the Azure Administrative portal, bring up your Azure Data Lake instance.
  2. Open the Data Explorer and navigate to the folder containing your source data.
  3. Open the Folder Properties blade by clicking the three dots next to the folder and selecting the properties option.

On the Folder Properties blade, the URL you will need for setting up the Power BI data source is listed under Path.

Copy this path to the clipboard.
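
For Azure Data Lake Store (Gen1), the value you copy typically has the shape `adl://<account>.azuredatalakestore.net/<folder>`. The sketch below, with a hypothetical account name and folder, shows that shape; in practice, copy the exact Path value from the blade rather than constructing it by hand.

```python
def adls_folder_url(account, folder):
    """Sketch of an Azure Data Lake Store (Gen1) folder URL.
    'myaccount' and the folder below are hypothetical; always prefer
    the exact Path shown on the Folder Properties blade."""
    return "adl://{}.azuredatalakestore.net/{}".format(account, folder.strip("/"))

print(adls_folder_url("myaccount", "/analytics/extracts/"))
# adl://myaccount.azuredatalakestore.net/analytics/extracts
```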

Step 2: Use Power BI “Get Data” to Link Files

Next:

  • From Power BI Desktop, connect to the data using the Get Data toolbar item.
  • The Azure section contains the Azure Data Lake connector; select it, then select Connect.
  • Paste the path copied in Step 1 into the URL field.
  • Select the FileSystemView option, then select OK.

You now need to sign into the resource.

If this is your first time connecting, you must sign in using your Azure Active Directory Organizational Account, or the account your Operations area has provided.

  • Select Sign-In, which will bring up the standard Azure sign-on.

Step 3: Load this query into Power BI

This brings in the listing of all the files located in the folder.

There are two choices, Load or Edit. At this point, I prefer Load, as it brings the file listing into Power BI as a data source with the associated file metadata.

I can then save the Power BI file and carry on loading each file into the data model.

Step 4: Edit the Queries – Loading and Shaping

With the source listing loaded, we go back into the Edit Queries to show the file listing Query.

As shown below, each file is listed in the Query. Power BI shows the files as Binary.

There are two different ways to load a file. If you double-click the Binary link, Power BI makes an assumption about the file type and loads it into the model.

This may or may not be how you want to handle the file, depending on the format that is assumed. Instead, for our example, we right-click one of the files and add it as a new query, as shown below.

This will also make the properties of the files available for reporting, to show when the file was last modified, for example.

The file is then displayed as a new query and presented as a file in the query window, as shown below. Right-click the file and select how you want the data to be handled on import. This is straightforward with a .CSV formatted file, but you may have another extension you want handled in a specific way.

The files will appear as separate queries ready for transformations in the Query Editor.

Notice the Applied Steps pictured below. You can continue to make any modifications, shape or transform the data into a more analysis-ready format, then load it into your model.

Now repeat for each file you want to load into the model, which will give you a set of queries as pictured below.

When ready, select Close & Apply to load into the model.

Step 5: Ready for Clean-Up

Once you have completed the steps, your data is ready for the data modeling process.

Closing

I hope you found this quick introduction useful. I am seeing more clients who want to do quick ad-hoc analysis against their data lake, and Power BI provides that end-to-end ad-hoc tool.


Steve Young

With over 34 years of experience in the data and technology industry, the last 16 with Microsoft, I have had the opportunity to work in various capacities, contributing to my knowledge and expertise in Data Engineering, Power BI, and Data Visualization.

