Using Azure Data Lake with Power BI
Power BI gives you the ability to use a wide variety of data sources in your data visualization project. As more data visualization workloads are including various big data sources, having the ability to use sources such as Azure Data Lake Store are becoming more common. This walkthrough shows you how to access files in a Data Lake and import them into a Power BI data model.
The real world scenario used in this example has a series of files produced each day, streamed into a time-based folder structure as pictured below.
The log files generated each hour are captured in a folder using the format, Year\Month\Day\Hour\*.csv. These raw files are then combined into analytic extracts using U-SQL or Data Bricks and placed into an area within the Data Lake for consumption in other tools and processes. This allows Power BI to use the same location to consume the data as the sources are refreshed. This keeps the heavy lifting to a minimum and in the ingestion and automation layer. The folder is stored in the data lake, secured by an Azure Active Directory account.
Formatting of Files
There are a couple of things to keep in mind when creating files for import. First, many files generated for big data consumption do not include the column headers. If you do not have column headers, you will have to add in column names later, which believe me, can become a lot of work. Second, the format of the files being in comma or at least delimited format can be a big help. You can import and transform unstructured or report style files using the Power BI Query Editor, but the more formatted the inputs, the higher the productivity boost of your group. If you have a group of people who consume the files for different projects, each group would have to duplicate the transformation process when using the files. Our example pictured below has a storage account that contains a number of folders. Capturing these files in this location will allow us to perform refreshes to the data model as these files are updated over time.
Step 1: Getting the ADL
In the Azure Administrative portal, bring up your Azure Data Lake instance. Open the Data Explorer and navigate to the folder containing your source data. Open the Folder Properties blade by clicking on the three dots, then select the option. On the Folder Properties blade, the URL of the folder you will need for Power BI is listed under Path. Copy this path to the clipboard.
Step 2: Get Data
The next step, from Power BI Desktop, is to connect Power BI to the data using Get Data toolbar item. The Azure section has the Azure Data Lake Store connector, Select and then Connect. Paste Path from the above step 1 into the URL and select OK. If this is the first time connecting, you will have to Sign-In using your Azure Active Directory Organizational Account. Select Sign-In which will bring up the standard Azure sign-on.
Step 3: Load this query into Power BI
This brings in the listing of all the files located in the folder. There are two choices, Load or Edit. At this point, I prefer to Load as this will bring the file listing into Power BI as a data source with the associated file metadata. I can save the Power BI book and carry on and load each file into the data model.
Step 4: Edit the Queries – Loading and Shaping
With the source listing loaded, we go back into the Edit Queries to show the file listing Query. As shown below, each file is listed in the Query. Power BI shows the files as Binary. There are two different ways to load the file. If you double click on the “Binary” link, an assumption on the file as to what type it is and loads into the model. This may or may not be how you want to deal with the file depending on the format that is assumed. Instead, for our example, we right click on one of the files and add it as a new query as shown below. This will also make the properties of the files available for reporting, to show when the file was last modified for example. The file will then be displayed as a new query and is presented as a file in the query window as shown below. You then right click on the file and select how you want the data to be handled on the import. This is rather straightforward with a .CSV formatted file, but you may have another extension you want to make sure is handled a specific way. The files will appear as a separate query ready for transformations in the Query Editor. Notice the Applied Steps pictured below. You can continue to make any modifications, shape or transform the data into a more analysis-ready format then load into your model. Now repeat for each file you want to load into the model which will give you a set of queries as pictured below. When ready, select Close & Apply to load into the model.
Step 5: Ready for Clean-Up
Once you have completed the steps, your data is ready for the data modeling process.
I hope you found this quick introduction useful. I am seeing more clients wanting to do some sort of quick ad-hoc analysis with their data lake. Power Bi provides that quick end-to-end ad-hoc tool. If you are interested in more, check out the resources below. The first resource is a walkthrough I wrote, available on this site, that would create the sliding window exports and folder structure that would create the source style referenced in this scenario.
How to use Azure Data Factory V2 Sliding Windows for SQL Exports to Azure Data Lake: The following article reviews the process of using Azure Data Factory V2 sliding windows triggers to archive data from SQL Azure DB. This process will automatically export records to Azure Data Lake into CSV files over a recurring period, providing a historical archive which will be available to various routines such as Azure Machine Learning, U-SQL Data Lake Analytics or other big data style workloads. This will generate one folder for each period captured. By Year/Month/Day/Hour Get started with Azure Data Lake Analytics using the Azure portal: This article describes how to use the Azure portal to create Azure Data Lake Analytics accounts, define jobs in U-SQL, and submit jobs to the Data Lake Analytics service. Develop U-SQL scripts using Data Lake Tools for Visual Studio: Learn how to use Visual Studio to create Azure Data Lake Analytics accounts, define jobs in U-SQL, and submit jobs to the Data Lake Analytics service Get started with Azure Data Lake Analytics U-SQL language: U-SQL is a language that combines declarative SQL with imperative C# to let you process data at any scale. This site provides an introduction and overview of the U-SQL language.