
Tag: Power BI

Creating complex real-time dashboards with Power BI


Whether you are new to Power BI or have worked with Power BI Desktop for a while, you’re familiar with the concept of refreshing data. By default, Power BI caches data, which needs to be refreshed on a periodic basis. Reports that use Direct Query datasets do not need to have their caches refreshed, but to see data changes, the report pages themselves need to be refreshed. If the requirement is to have visuals on screen refreshed without any user intervention at all, it is necessary to use a streaming dataset.

Unlike regular datasets, data is not “pulled” into a streaming dataset, rather it is “pushed” in through the Power BI API, Microsoft Flow, Azure Stream Analytics, or third party services such as PubNub. This article aims to explore the various ways of working with streaming datasets.

Creating and populating a streaming dataset

Streaming datasets are created directly in the Power BI service itself or through the Power BI API. Unlike with other dataset types, there is no schema to read in from an external source.

To create a streaming dataset, choose “create” from a workspace menu in the upper right, and then select “Streaming dataset”.

Select the type of the streaming dataset. For both the API and Microsoft Flow, choose the API option, then select “Next”.

Streaming datasets contain only a small subset of the data types supported by regular datasets. These types are Text, Number and DateTime. Give the dataset a name and then create all the necessary fields.

It is important to choose the data types correctly, as there is no opportunity to transform fields into different types within reports or dashboards.

As fields are added, the JSON definition of the dataset is available. This can be copied and used by the data source that is pushing the data into the dataset. Note that Microsoft Flow can read the schema directly, so that copying is not necessary.

In order to use the techniques outlined below, it is critical to turn on the “Historic Data Analysis” switch. This switch changes the dataset from a streaming dataset to a push dataset.

With a streaming dataset, data is stored in a cache just long enough to display in a dashboard tile, and it expires very quickly. A push dataset retains the data permanently, up to a limit of 15 MB. In order to create more complex visuals, a report must be created, and a report requires a push dataset.

A push dataset is identified as “Hybrid” in the Power BI dataset list.

The most important option to select in the definition of the dataset is “retain historical data”. If this option is not selected, dashboard tiles will be able to display current data, but will not be able to display it over any significant time period. Data will be loaded into the dashboard cache for use with dashboard tiles, but when the cache expires, so does the data. In order to use reports of any kind with a streaming dataset, this option must be selected.

Once created, data can be added through the API, Microsoft Flow, Stream Analytics or PubNub.
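To illustrate the API option, here is a minimal sketch in Python (using the requests library) of pushing a row into the dataset. The push URL and the field names are placeholders; the actual URL is the one the Power BI service provides for the dataset, and the fields must match those defined when the dataset was created.

import datetime
import requests

# Placeholder push URL - use the one provided by the Power BI service for the dataset
push_url = "https://api.powerbi.com/beta/<tenant-id>/datasets/<dataset-id>/rows?key=<key>"

# Rows are posted as a JSON array; field names must match the dataset definition
rows = [
    {
        "SensorName": "Sensor01",                              # Text field
        "Reading": 21.4,                                       # Number field
        "ReadingTime": datetime.datetime.utcnow().isoformat()  # DateTime field
    }
]

response = requests.post(push_url, json=rows)
response.raise_for_status()  # raises an error if the rows were not accepted

Tiles and visuals built on the dataset will pick these rows up as they arrive, subject to the retention behaviour described below.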

It should be noted that data stored in this way will only be available to Power BI, and only until the limits are reached. As the dataset fills, the oldest data will drop off. If there is any requirement to analyze the data over any significant amount of time, it is highly recommended that it also be stored in another location.

Adding a tile

Dashboard tiles can be created directly by opening a dashboard and selecting “Add tile” from the ribbon menu. Select the Real-Time Data tile, and then select the dataset to use.

Tiles created in this way are limited to several visual types. These types are:

  • Card
  • Line chart
  • Clustered bar chart
  • Clustered column chart
  • Gauge

There are a limited number of configuration options available to these tiles, depending on the tile type. Tiles created in this way will display data from the point of creation forward, according to the settings for the visual itself. These values will update in real time as data is added to the dataset, with no user intervention or refreshing required.

In order to display different types of visuals, or to customize them beyond what is available directly in the dashboard, it is necessary to create a report.

Adding a report in the service

Once created, the streaming dataset will appear in the service like any other. As with other datasets, selecting it from the dataset menu will open a new report canvas that can be saved. The report canvas in the service allows any of the Power BI visuals to be used with the streaming dataset.

Visuals on a report do not update automatically as data is pushed into the dataset, but these visuals can be pinned to a dashboard. Once pinned, the dashboard tile will update automatically, so in this way, practically any visual can be added to a dashboard and updated in real time. All that is necessary is to first create a report.

Creating a report in the service allows full-fidelity access to the report canvas and all of the available visual types, but it does not allow for any editing of the data model. If things like calculated measures and columns are needed, it is necessary to create a report using Power BI Desktop.

Adding a report with Power BI Desktop

Power BI Desktop is able to connect directly to datasets in the Power BI service, and push datasets are no exception. To connect to a streaming dataset (or any other), select the “Get Data” button, select “More”, then select the Power BI tab. Finally, select the Power BI dataset option, then select “Connect”.

Next, select the workspace that contains the real-time dataset, and select the dataset itself. Selecting Load will establish a connection between the report and the dataset.

Because the report uses a direct connection to the dataset in the service, there is no data transformation opportunity and Power Query cannot be used. Additionally, several DAX functions are not available. For example, most of the functions on the “Modeling” tab are unavailable. It is also not possible to create calculated columns, but calculated measures can be created. Using Power BI Desktop, some relatively complex visuals can be created.

Once the report is published to the service, the visuals can be pinned to a dashboard, and once pinned, they will update automatically in real time.

Purging data

From time to time, it may be necessary to purge the data from the push dataset to reset the dashboard. To do this, the dataset can be temporarily changed from “push” mode to “streaming” mode. This will purge the stored data. Setting it back to “push” will start storing the data again.

To change the mode of the dataset, select the “Datasets” tab from the workspace menu, and then select the “edit” icon for the dataset that is to be changed.

The option that changes the mode is “Historic data analysis”. Switching it off changes it to a streaming dataset, and switching it on changes it to a push dataset.

At first, it may seem that visualizing real-time data in Power BI is quite limited due to the limited nature of the tiles created in dashboards. However, using push datasets along with reports built in Power BI Desktop allows relatively complex visuals to be viewed in real time.


Using Microsoft Graph Data Connect with Power BI Dataflows

Microsoft Graph data connect (GDC) is a connector technology that allows an organization to extract data in bulk from the Microsoft Graph. Using Azure Data Factory, extraction jobs can be scheduled that securely extract Graph data while respecting an organization’s data control policies. On a scheduled basis, GDC stages the data behind the scenes and stores it in an Azure storage account. The storage can either be Azure Blob storage, Azure Data Lake Gen 1, or Azure Data Lake Gen 2. This article describes a procedure to process the output from GDC and store it in a Power BI dataflow.

Details on how to configure GDC can be found here, and an excellent video tutorial here.

Azure Data Lake Gen 2 Storage

Azure Data Lake Gen 2 (ADLG2) brings a hierarchical namespace to Azure Blob storage. This storage system is designed for big data analytics and is highly cost effective. It is one of the three data sink (destination) options for GDC, and it is the required storage system for the “bring your own”, or external storage option of Power BI Dataflows. Given that an ADLG2 account is required for Power BI Dataflows, it is logical to use the same account as the GDC data sink, but it is not required.

In order to use an ADLG2 account for external storage with Power BI dataflows, it must be in the same data center as the Power BI tenant. The data center for a tenant can be determined by navigating to the Power BI web application, selecting the “?” icon in the upper right, and then selecting “About Power BI”.

In order to be able to use an external storage account for Power BI dataflows, it MUST be created in the data center listed in “Your data is stored in”.

Connecting Power BI to ADLGen2 Storage

When a dataflow is created in Power BI, it is stored in an ADLG2 storage system managed by Microsoft. If Power BI is the only platform that will access the data, this is perfectly adequate, but an organization may wish to use the data with other tools. If this is the case, a Power BI tenant can be connected to an ADLG2 account that is accessible to other tools. A workspace administrator can then decide to have all the dataflows in that workspace store their data in the custom storage account. These are known as “external dataflows”. Dataflows are all stored in Common Data Model (CDM) folders which are described in detail here.

Detailed instructions on configuring external dataflow storage for Power BI can be found here. The process consists of several steps. It should be noted that as of this writing, external dataflows are in preview, and these steps could change.

  1. If one does not already exist, create an ADLG2 account in the same tenant as Power BI
  2. In Azure, grant the Reader role to the Power BI service identity for the account in #1
  3. Create a file system for Power BI. The file system MUST be named “powerbi”
  4. Using Azure Storage Explorer, grant file system access to three Power BI service principals, Power BI Premium, Power BI Service, and Power Query Online (see the above link for details)
  5. Connect the Power BI tenant to the ADLG2 account
  6. Enable workspace administrators to assign workspaces to external storage.

As of this writing, step #5 above is irreversible; once connected, the account (and its name) cannot be changed, so it should be chosen with care.

Once configured, a workspace administrator can assign their workspace to their external storage. This setting is a property of the workspace, and can be accessed via its settings with the “Storage” tab.

Once this setting has been enabled, all dataflows will be stored in external storage. A folder is created within the file system created in step #3 above with the name of the workspace. Each dataflow in the workspace will be added within that folder, and each entity of the dataflow has a folder of its own. The dataflow folder will contain a file named model.json which describes the entities, and the entity folders contain multiple csv files which house the data itself. Within Azure Storage Explorer, the structure appears as below.

  1. Azure Data Lake Gen 2 account (connected to the Power BI tenant)
  2. File system created for Power BI dataflows (always named powerbi)
  3. Workspace folder
  4. Dataflow folder
  5. JSON file describing the dataflow
  6. Entity folder containing entity data

Once configured, Power Query Online (part of the process of creating a dataflow) can be used to acquire and transform data. The data will be stored in these folders according to the Common Data Model specification and can be accessed by other applications. However, the reverse of this is also true. Any CDM folder that is stored in the Power BI connected file system can be connected to Power BI as an external dataflow. The process for doing this is described here. The order of operations is important. The user that will make the connection needs to be granted access to the CDM folder before it is populated with data.

An external dataflow is read only with respect to Power BI (Power BI only sees the data; it does not transform it). The goal is therefore to transform the data created by Graph data connect into the CDM format. Azure Databricks provides support for doing so.

Azure Databricks

Azure Databricks is a suite of serverless big data technologies that encompasses Hadoop, Apache Spark, SQL, Python and Scala technologies. Databricks clusters can be created and used when needed, and discarded or suspended when not needed. A discussion of how to create and use Databricks is beyond the scope of this post, but there is a great deal of documentation on it here. In addition, Microsoft provides a free 14-day trial of Azure Databricks.

Databricks is particularly useful in this scenario, as it has libraries that support Azure Data Lake Gen 2, and libraries that support the Common Data Model. Databricks notebooks can be called from Azure Data Factory, so that when a GDC extraction job is completed, the resulting files can be processed with Databricks to populate the CDM folders. 

An excellent tutorial on using Databricks with dataflows and CDM folders can be found on GitHub here. The scenario in the tutorial involves using dataflows to produce data instead of consuming it, but it does cover off several important concepts. The tutorial is part of the project that includes the CDM library for Databricks which is used to transform GDC data into CDM folders.

As of this writing, the CDM library requires a Databricks 4.3.x-scala2.11 cluster. This is an older configuration that is not available to the standard user interface when creating a Databricks cluster. Subsequent versions of the CDM library will most likely support newer clusters, but at present, it is necessary to take a few additional steps during cluster creation.

To enable it, run window.prefs.set("enableCustomSparkVersions", true) in the browser debug console, then navigate to the cluster creation page and specify the custom image tag. Refresh the browser, and 4.3.x-scala2.11 will be listed as a custom version.

Once a cluster has been created, and the CDM library loaded into it, a notebook can be created to process the GDC data. Processing consists of four main steps: connecting Databricks to ADLG2, reading the JSON files from GDC, extracting the desired data into dataframes, and writing the data out to CDM folders.

Connecting Databricks to ADLG2

The recommended way to connect Databricks to ADLG2 storage is through a Service Principal. The same principal that GDC itself uses can be used, and if the same ADLG2 account is being used, no further configuration is necessary.

Databricks will need to read from the file system that houses the GDC data. Several lines of code (Python) in a Databricks notebook will establish the required connections:

# Placeholders - replace with the ADLG2 account's access key and account name
DLG2AccessKey = "Storage Account Access Key"
ADLG2AccountName = "ADLG2 Account"

# Register the account key with Spark and build the ABFS URI for the GDC file system
spark.conf.set("fs.azure.account.key." + ADLG2AccountName + ".dfs.core.windows.net", DLG2AccessKey)
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
filesystem = "abfss://GDCFileSystem@" + ADLG2AccountName + ".dfs.core.windows.net/"

Once connected, files in the GDC folder can be listed using the built in dbutils library:

dbutils.fs.ls(filesystem + "/GDCFolderName")

While the examples above and below show account names and keys being explicitly defined in the notebook, this is not recommended beyond any testing or demonstration environments. Instead, it is recommended to store such secure strings in Azure Key Vault and retrieve them at runtime. For instructions on how this is done, see the document Secret Scopes.
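As a rough sketch of that approach, assuming a secret scope (named "gdc" here purely for illustration) has already been created and backed by Key Vault, the sensitive values could be retrieved at runtime like this:

# Retrieve secure strings from a Databricks secret scope instead of hard-coding them.
# The scope name "gdc" and the key names are placeholders for illustration.
DLG2AccessKey = dbutils.secrets.get(scope="gdc", key="adlg2-access-key")
appKey = dbutils.secrets.get(scope="gdc", key="gdc-principal-key")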

Reading JSON Files

Databricks can read all JSON files in a folder (as well as other text-based formats) into a dataframe. A dataframe is an in-memory table that can be hierarchical and queried via standard SQL commands. The schema of the dataframe is inferred from the structure of the JSON files contained within. To load all of the GDC JSON files from a particular folder into a dataframe, the following line of Python can be used:

contactbasedf = spark.read.json(filesystem + "/Contacts Folder")

The read is recursive, which means that subfolders are interrogated as well. GDC folders typically contain a metadata folder whose files have different schemas from the data files themselves. For this reason, it is a good idea to move the data files to a dedicated folder before reading them into a dataframe. This can be done with the dbutils.fs.mv command, as sketched below.
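As a sketch of such a move, assuming the GDC output lives in "Contacts Folder" and using a hypothetical "Contacts Data" destination folder:

# Move only the data files into a dedicated folder, skipping subfolders such as the
# metadata folder. Folder names here are placeholders for illustration.
for f in dbutils.fs.ls(filesystem + "/Contacts Folder"):
    if not f.name.endswith("/"):  # directory entries returned by dbutils.fs.ls end with "/"
        dbutils.fs.mv(f.path, filesystem + "/Contacts Data/" + f.name)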

Extracting the desired data

Once the files have been read into a dataframe, the dataframe can be saved to a temporary table. This table can be queried through standard SQL commands. For example, the query below creates a temporary table from the initial dataframe (contactbasedf) that was created by reading the JSON files produced by GDC for organizational contacts. The relevant details are then queried and saved into another dataframe, named df1 in this case.

contactbasedf.createOrReplaceTempView("contactbaseTemp")
df1 = spark.sql("SELECT AssistantName, Birthday, BusinessHomePage, CompanyName, Department, DisplayName, GivenName, Initials, JobTitle, Manager, MiddleName, NickName, PersonalNotes, Profession, Surname, Title from contactbaseTemp")

Writing to CDM folders

Once the CDM libraries are loaded into a Databricks cluster, writing data to a CDM folder is a relatively simple method call on a dataframe. The call itself requires several parameters, and those parameters are:

  • cdmModelName – The name of the Model (dataflow) that houses all entities
  • entity – the name of the entity within the dataflow (a dataflow can contain multiple entities or tables)
  • cdmFolder – The folder in ADLG2 to save the model.
  • appId – The service principal ID of an application with Blob Contributor access to the ADLG2 account
  • appKey – The secret key for the appId specified above
  • tenantId – The tenant ID for the ADLG2 account

Using the dataframe defined above, the contents of the dataframe can be written out to the CDM folder with the following (Python) code:

appID = "Service Principal ID"
appKey = "Service Principal Key"
tenantID = "Tenant ID"
Workspace = "Workspace Name"
cdmModelName = "AllContacts"
outputLocation = "https://" + ADLG2AccountName + ".dfs.core.windows.net/powerbi/" + Workspace + "/" + cdmModelName

(df1.write.format("com.microsoft.cdm")
    .option("entity", "Contacts")
    .option("appId", appID)
    .option("appKey", appKey)
    .option("tenantId", tenantID)
    .option("cdmFolder", outputLocation)
    .option("cdmModelName", cdmModelName)
    .save())

The above code will output the contents of the dataframe to an entity named “Contacts” in a model named “AllContacts” stored in a folder named “AllContacts” within the workspace folder specified in the “Workspace” variable.

Creating an external Dataflow

Once the GDC data has been written to a CDM folder, it can be connected to Power BI as an external dataflow. In order to do so, as mentioned above, the user making the connection must have explicit access to the model folders.

From a Power BI V2 workspace (V1 workspaces are not supported), go to the dataflows tab, and select Create – Dataflow from the toolbar. If Power BI has been connected to the ADLG2 storage, and the workspace has been configured for external storage, the “Attach a Common Data Model folder” option should appear.

Selecting “Create and attach” brings up the Attach Common Data Model folder dialog box, where two items must be entered.

The Name of the dataflow is the name with respect to Power BI. It can be completely different than the name of the model folder, or the internal name of the model created above, but it’s likely a good idea to keep it consistent. The CDM folder path is actually the absolute path to the model.json file that describes the model, and it’s vital that model.json be included at the end of the path. Failing to do so will result in an error.
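For example, using the names from the Databricks code earlier in this article (the account and workspace names are placeholders), the path would take roughly this form:

https://<ADLG2 Account Name>.dfs.core.windows.net/powerbi/<Workspace Name>/AllContacts/model.json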

Finishing Up

 Once completed, Power BI Desktop can be used to connect to the external dataflow, just like any other dataflow. The only difference is that external dataflows are not refreshed in the Power BI service, but will be updated by Databricks. The same Azure Data Factory jobs that extract data from Graph data connect can be used to call into the Databricks notebooks when the data has been extracted.

If you are interested in a product that leverages the data produced by Graph data connect, I would be remiss if I did not suggest our tyGraph for Exchange, which is currently in preview. It combines all of the technology listed above with a rich set of reports in concert with other Office 365 workloads. If you are interested, please contact me directly, or email sales@tygraph.com .

Finally, for an in depth conversation with Abram Jackson and Tyler Lenig from the Graph data connect team, check out their recent appearance on our BIFocal podcast.


Get the most out of Power Query in Power BI Dataflows

Power BI dataflows

Power BI dataflows are the first place that many users will encounter the new Power Query web-based interface. Until now, Power Query has been restricted to Power BI Desktop and Excel. This new web interface is, well, new, and it doesn’t contain all of the capabilities of the more mature client-based interface. The good news is that you can take advantage of both the reusability of dataflows, and the maturity of the Power BI Desktop interface.

Two Interfaces

A quick glance at the user interfaces for Power Query on the web and in Power BI Desktop reveals the feature difference.

Power Query editor in dataflows

Power Query editor in Power BI Desktop

The Desktop editor has a full ribbon interface with a wide array of capabilities, while the web interface has a simple button bar with a subset of features. In the images above (which show the exact same set of queries in the two interfaces) it is easy to see that the combine binaries, or expand tables feature is not there for the “Content” column.

Advanced editor

The key to bringing all of these capabilities in Power BI Desktop to Power Query in dataflows is the Advanced editor. Power Query is at its essence an interface that is used to construct queries using the M language. This core code is available to you in both Desktop and dataflows.

In Desktop, the Advanced editor is available from the ribbon both in the Home tab and in the View tab. In the web based editor it is available by right clicking on an entity (query) and selecting Advanced editor.

 

The code revealed by this editor can appear rather daunting for a complex query, but all you really need to understand is how to copy and paste it. Build up whatever query you need using Desktop (or Excel!), open the Advanced editor and copy it to your clipboard. Then, either create a new dataflow or add an entity to a new dataflow using the Blank Query data source. Once the editor is open, right click on the query, open the Advanced editor, and paste the query from your clipboard.

Done. Well, almost. While both environments execute M code, there are a few differences to be aware of.

Some functions may not work

There are subtle differences between the M engine in Excel and the one in Power BI Desktop. This approach works very well with these two products, but occasionally an incompatibility can crop up. This is no different with the M engine for dataflows. If you do encounter an incompatibility, try achieving the same thing a different way in Desktop and trying again.

Data Sources

Dataflows do not support all of the data sources that Power BI Desktop and Excel do. This will of course change over time, but as of this writing, dataflows are in preview, and currently support 24 data sources compared to the almost 100 in Power BI  Desktop.

Data sources supported in dataflows – Nov 2018

Queries pasted into a dataflow that use an unsupported data source will therefore likely not work. However, there’s nothing stopping you from trying; I’ve been pleasantly surprised by a few.

Functions

Functions ARE supported in dataflows. They can be created using a blank query (and copying a function from Power BI Desktop). However, if that’s all that you do, you may receive an error like “This dataflow contains computed entities, which require Premium to refresh” or “We cannot convert a value of type Table to Function”.

You do NOT need Premium to use functions, but a function must have its “Enable Load” setting disabled. This is done by right clicking on the function and toggling the Enable Load item to off.

Turn off Enable Load for a function

Computed Entities

Computed entities (or calculated queries) are supported by dataflows but because the type of calculation can’t be predicted, they require the isolation that dedicated capacity (Premium) provides.

Referenced tables are an example of computed entities. If you are in the habit of designing a base query that does not load data, and then creating variants of that table that do load data in your reports, you will need to change that design in dataflows in order to avoid the Premium requirement.

In Power BI Desktop, this is the difference between Duplicate and Reference when creating a new query from a base query. Duplicate will simply create a new query with the same steps, while Reference will create a computed entity. If you want to avoid Premium, you’ll need to use Duplicate.

New query create options in Power BI Desktop

The Power Query capabilities in dataflows are more powerful than they might appear at first glance. Power BI Desktop is the key to unlocking them, unless you’re already a total wizard at writing M code. Even then, the new editing features in Power BI Desktop likely put it over the top as an editor.

For now we need to cut and paste, but I would love to see a day when Power BI Desktop could connect directly to a dataflow and edit it in place.


Use Power BI dataflows to warehouse SharePoint list data


Reporting on SharePoint data has been a requirement for a long time, and there have been many approaches to fulfill this need. Custom web parts, Data View web parts and SSRS direct connected reports have historically been some of the solutions, but they all suffer from the same problem. If you have any serious amount of SharePoint data, you’ll quickly begin to bump into capacity limits and performance limitations, and in some cases, you can impact the performance of the overall system. In order to avoid this problem, it is necessary to warehouse SharePoint data first, as I argued in this post from 2012.

Once your list-based data is in a relational database, the performance issue is taken care of. However, the means of getting it moved there have traditionally been problematic. For a long time, there was a CodePlex project called the SharePoint List Source and Destination. This solution provided read and write access to SharePoint lists from SQL Server Integration Services (SSIS). Unfortunately, it was last updated in 2012, it was unsupported by Microsoft, and it did not support authentication for Office 365. This of course rendered it useless for use with SharePoint Online. In 2015, SQL Server Integration Services got an OData source, and given that SharePoint lists have OData endpoints, this became a viable option, particularly given that it did support Office 365 authentication. The OData connection from SharePoint did however have some limitations as well.

For cloud scenarios, Power BI has emerged as a very competent way of reporting against SharePoint data. It has native connectors for SharePoint list data, both on premises and in the cloud, and Power BI reports can be hosted in the cloud through the SharePoint Power BI web part. On premises, the same can be done with Power BI Report Server. The structure of Power BI reports means that the data is cached in a data model, so reports are not run directly against the list data source. This avoids the performance issues listed earlier.

Earlier this year I published a series of articles detailing how to do exactly this. The only issue with this approach is that the data shaping and preparation is always specific to a single report. If I have 5 different reports that use one list, I must query and shape that data 5 different times – one for each report. This is where Power BI dataflows come in.

In this context, dataflows are essentially a data warehousing layer with transformation capability. Instead of each report connecting back to a source list, the dataflow connects to the list, shapes the data with Power Query online and stores it in a data lake. The Power BI reports then connect to the dataflow as their data source. Transformation and storage only need to happen once.

As of this writing, dataflows are in public preview, so be warned – some things could change.

Creating a dataflow

Creating a dataflow from a SharePoint list is relatively straightforward. In our examples below, we will work with the same sample list from the series of articles on SharePoint data earlier this year. To begin, open Power BI and navigate to a workspace (your personal workspace will not have dataflows). Click on the workspace name in the navigation pane and the dataflows tab should be available.

To create a new dataflow, select the Create button, and click Dataflow.

Select the Add new entities button and the data source selection will appear. SharePoint list and SharePoint online list are both options. SharePoint list is for on premises list data which will work with the On-Premises Data Gateway. In our case we are working with SharePoint Online, so we select the SharePoint Online source.

At this point, you enter the URL for the site that you want to connect to (NOT the URL for the list) and select the Next button. Power BI will connect to the site and you can then select which list you want to work with. In our case, we need our Listings data, so we select that list and click Next.

Finally, we’re in the Power Query editing screen. This should be quite familiar to those used to working with Power Query in either Power BI Desktop or in Excel. From here you can select the columns that you want to include in the dataflow.

Although this experience is similar to building queries in Power BI Desktop, there are a few noticeable differences. Queries in a PBIX file are referred to as queries, but within a dataflow they are referred to as entities. These entities can be custom, or they can be mapped to Common Data Model object types. The Power Query web editor also does not include the full-featured editing ribbon found in Power BI Desktop, but instead has a button bar. Many of the editing options available in Power BI Desktop are not available in the Power Query web experience.

If you have read through some of my earlier articles on working with SharePoint data in Power BI, you will notice that there are fewer columns available than we see in the Desktop Power Query editor. Most notable for those of us working with SharePoint data is the FieldValuesAsText column, which is the convenient way of retrieving the text representation of complex SharePoint list column types. At first glance, this would appear to be quite limiting.

However, by right-clicking on the entity name, we can access the Advanced Editor.

This Advanced editor allows you to write queries by hand using the M language. The side benefit of the Advanced editor is that it makes queries portable between platforms – Desktop, Excel, and now dataflows. You can therefore build your queries in Power BI Desktop using its fully functional editor and then copy and paste them into a new blank query in the dataflow editor. Using this approach allows you to take advantage of the SharePoint helpers built into Power BI Desktop, as the FieldValuesAsText column and other columns are available. Using this technique, the Listings example can be transformed into several normalized tables in the dataflow.

Click on Done to save your entities, and then the Save button to save your dataflow. You will be prompted to Refresh Now, which is a good idea because by default, the dataflow has no data contained within it. To keep the data up to date, you need to set a refresh schedule by clicking the schedule refresh icon under actions for the dataflow in question. From here, you schedule data refresh in the same manner as you would with a Power BI report.

Using the dataflow

Once data is loaded into the dataflow, it becomes a source for a Power BI report. You must use Power BI Desktop to create this report; there is no way to connect a report to a dataflow in the pure web interface. Start Power BI Desktop and select “Get Data”. Choose the Power BI blade and then Power BI dataflows.

After clicking Connect, you will be presented with a set of Power BI workspaces that contain dataflows. Opening the workspace will allow you to open the dataflow and select the desired entities.

Once loaded, the report can be built just like any other. When it is refreshed, it will be refreshed from the data stored in the dataflow, NOT directly from the SharePoint list. It is therefore important to keep the dataflow itself up to date.

Any number of reports can be created from the dataflow. Instead of having all the transformation logic tied up within a single report, dataflows allow it to be centralized and consistent. With a little work, these transformations allow you to work with your SharePoint data just as though it were relational. Power BI dataflows really are the best way to perform data warehousing with your SharePoint data, whether your SharePoint is online or on-premises.


Using Excel Files with Power BI Desktop and SharePoint

As discussed in a previous post, Working with Excel Files in the Power BI Service, Excel and Power BI have a rich, complex relationship. Power BI Desktop is the primary design tool for Power BI, and it has many feature overlaps with Excel as an analytic tool. Excel can be used both as an analytic tool and a data source, and the structure of the Excel file will dictate the way that Power BI Desktop can be used with it. If Excel is being used as an analytic tool (i.e. connected to data), the appropriate items in the file can be imported into Power BI Desktop. If it is being used as a data source (data in worksheets), Power BI Desktop will connect to it, and use its data to build a model. This post attempts to articulate the nuances of both scenarios.

Importing from Excel

Power BI Desktop has an unfortunate name in my opinion. It is a design tool and is not meant to replicate the capabilities of the Power BI service on the desktop, as the name might suggest. A better name for the product would I believe be Power BI Designer. Its purpose is to connect to and transform data (Power Query), build data models for Analysis (Power Pivot) and build reports (report designer).

Used as an analysis tool, Excel has all these capabilities as well. In fact, the first two (Power Query and Power Pivot) are identical to what is already in Power BI Desktop. Excel also has Power View for analytic reporting. Power View is very similar to the type of reporting in Power BI Desktop, but uses a different technology and has been deprecated for some time. As a result, Excel charts and pivot tables are the primary means of visualizing data in Excel.

So why would you need to use Power BI Desktop if you are using Excel? As explained in my previous post, the Power BI service can fully interact with Excel as an analysis tool, and allows you to interact with Excel right from the Power BI Service. If Excel is meeting all your analytics needs, then there may be no need to introduce Power BI Desktop at all. However, if you wish to take advantage of Power BI’s analytic reporting capabilities, and you have existing Excel assets, you may wish to convert them to the native Power BI format.  Whatever the reason, moving from Excel to Power BI is relatively straightforward with Power BI Desktop.

From the File menu in Power BI Desktop, select Import, and then Excel Workbook Contents.

Importing Excel Files contents

You are then prompted to select an Excel file. Once selected, you are then presented with a warning dialog.

Excel files contents warning dialog

The dialog does a very good job of explaining what will happen, specifically the fact that data from workbooks will not be brought into this new file. Any Power Pivot data models or Power Queries will be brought in. If the workbook contains legacy Power View sheets, they will be converted to native Power BI visuals. In addition, any legacy (non-Power Query) data connections used by the source file’s Power Pivot data model will be converted to Power Query and imported.

Imported Power View sheet

Legacy Power View Sheet converted to Power BI visuals

A complete list of workbook content and what is/isn’t converted is below:

Excel Content – Import to PBI Desktop Support
Data in sheets – Not imported
Data model (Power Pivot) – Imported
Data connections – Converted to Power Query and imported
Power Queries – Imported
Power View sheets – Converted to PBI visuals and imported
Pivot charts/tables – Not imported
Excel charts – Not imported
Macros – Not imported

Once imported, the new Power BI file (PBIX) lives on its own and contains no connection or any other type of relationship to the original source Excel file. If the source Excel file is changed, there is no way to update the PBIX file. Any imported data connections are between the PBIX file and the original data source. The new PBIX file can be published to the Power BI service like any other.

Connecting to Excel

Connecting to Excel as a data source is a very different thing than importing from it. In this scenario, the data in the worksheets, and only the data in the worksheets, is brought into the data model. This is very different behaviour from that of connecting Excel files to the Power BI Service, where both the model and the worksheet data are brought in.

Using the Excel Connector

The easiest, and most obvious way to connect to Excel worksheet data is by using the Excel connector. From the ribbon in Power BI Desktop, select Get Data. The Excel connector is right at the top of the list.

Connecting to Excel files on the file system

Selecting it allows you to select your source file, and then the worksheets within it, and then build out the data model.

This approach works well but carries with it an important limitation. The new queries are connected to the file using a local file system path. This means that to be refreshed, an on-premises data gateway is required. In order to eliminate the gateway requirement, you can connect to the file in SharePoint using the SharePoint folder connector.

Using the SharePoint Folder Connector

The SharePoint folder connector connects to all the files stored in the libraries of a SharePoint site. It allows you to report on file metadata, but it also allows you to drill into file contents.

From Power BI Desktop select Get Data but instead of selecting Excel, Search for SharePoint and select SharePoint folder.

Using the SharePoint folder connector

Once selected, enter the URL of the SharePoint site (NOT the URL of a library or folder) in the dialog box.

Next, you will be presented with a preview of all the files in your site. Unless you are only interested in file metadata, click on the Edit button to bring up the Power Query editor.

The initial view will contain all the files in the site, but we are interested in the content of just one of those files. Every file in this view will contain the hyperlinked value “Binary” in the Content column. Clicking that link for the file that you want to connect to will drill down into the contents.

Site contents using the SharePoint folder connector

From this level, you can build your Power Query, data model, and report as needed just as if you had used the Excel connector. The difference is that now when you publish your report to Power BI, it will know the file is stored in SharePoint and will connect directly to it. It will not require a gateway for refresh purposes. Once credentials are registered, the report will refresh itself directly from the workbook stored in SharePoint.

XLS vs XLSX

A note of caution: the above SharePoint folder approach only works for XLSX files. Power BI Desktop and the Power BI service both support both Excel file formats (XLS and XLSX). However, refresh does not. If the source file format is XLS, and a refresh is attempted, you will receive the classic “microsoft.ace.oledb.12.0 provider” error in the Power BI service.

Excel files refresh error with XLS file types

The older Excel file format (XLS) requires an Access driver to refresh, which is not a part of the Power BI service. The newer XLSX file does not require this driver. As a result, if the source file is XLS, refreshing it requires going through an On-Premises Data gateway, and that gateway machine must also have the ACE components installed.

To recap, you can bring Excel assets into Power BI Desktop by using the import function, and you can load data from Excel files through Power Query. The two operations have very different results, and they can be combined if a source workbook contains both analyses and data.
