
Tag: Data Management Gateway

Using Power Query with SharePoint Lists and Lookup Fields

As I’ve explained many times before, querying SharePoint data directly is a bad idea. The SharePoint data storage mechanisms simply aren’t designed for querying of any scale, hence the lookup limitations that have been imposed upon it. The best approach to querying SharePoint list data is to first load it into a data warehouse or data mart of some sort. However, both Reporting Services (SSRS) and Power Query support direct access to SharePoint lists. While I try to strongly dissuade people from doing this with Reporting Services, properly used, Power Query is a totally viable means of querying SharePoint list data.

Why is this? With SSRS, every query goes back to the data source for retrieval. Power Query is different – it’s analogous to SQL Server Integration Services, which is an ETL management product. It loads source data into a repository, in this case an embedded xVelocity (Power Pivot) model, which can be considered a “personal data warehouse”. Queries against this mini data warehouse are fast, don’t rely on SharePoint retrieval mechanisms, and can be used quite effectively in reports.

There are a couple of subtleties to querying SharePoint list items with Power Query, and I will briefly walk through the process below.

With Excel open, click the Power Query tab, select “From Other Sources” and then select “From SharePoint List”.

image

Next, enter the URL for the SharePoint site (or subsite) that contains the list you wish to query.

image

If it is the first time accessing this site, you will be prompted for credentials. If your site is Office365, be sure to enter organizational credentials. If it is on premise, use Windows credentials.

Once entered, you will be presented with a list of SharePoint lists in the Power Query Navigator window. Select the list that you wish to query, in our case, Announcements. When selected, click the edit button to edit the query.

image

The data, or a subset will load into the query editor window. You will see all of the list item fields expressed as columns, and for the most part, using the correct data type. At this point you can remove any columns that are unnecessary, or filter any undesired rows. There are a couple of SharePoint field types that bear special mention.

Lookup fields are a lookup into another SharePoint list. Internally, the SharePoint item stores this as an ID and display value, but Power Query gives you access to all of the properties of the related item as a one-to-one relationship. Essentially, what you can do is to flatten that relationship by incorporating the related item’s attributes.

If you scroll to a column of this particular type, you will see the value expressed as a hyperlink with the value “Record”. Clicking on it will drill down to one related record, but that’s not what we want to do. We want to expand the properties for all items in the list. The way that you do this is to click on the expand icon in the column header. In our case, we want to expand the “CreatedBy” field. CreatedBy is a standard list field, of the Person type. Person fields are actually a special case of a lookup field, so they exhibit this behaviour.

image

Here, we are interested in retrieving the user’s name and mobile phone, so we deselect all of the other fields. A new column will be created for every expanded field in the format sourcefieldname.attributename .
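
To make the flattening concrete, here is a minimal Python sketch of what expanding a “Record” column does conceptually. It is not Power Query code, and the row layout and field names (Title, Name, MobilePhone, Email) are assumptions made up for illustration.

```python
def expand_record(rows, column, fields):
    """Flatten a nested record column into 'column.field' columns."""
    expanded = []
    for row in rows:
        # Keep every column except the nested record itself.
        flat = {k: v for k, v in row.items() if k != column}
        record = row.get(column) or {}
        # Only the selected attributes become new columns.
        for field in fields:
            flat[f"{column}.{field}"] = record.get(field)
        expanded.append(flat)
    return expanded


# Hypothetical list items, each with one related "CreatedBy" record.
announcements = [
    {"Title": "Office closed",
     "CreatedBy": {"Name": "Ann", "MobilePhone": "555-0100", "Email": "ann@example.com"}},
    {"Title": "New hire",
     "CreatedBy": {"Name": "Bob", "MobilePhone": None, "Email": "bob@example.com"}},
]

flattened = expand_record(announcements, "CreatedBy", ["Name", "MobilePhone"])
```

Because the relationship is one-to-one, the row count stays the same. Only the selected attributes are added, and the deselected Email attribute is dropped.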

image

Attachments are another special case. There can be multiple attachments for a single list item, a one-to-many relationship. The hyperlink is therefore “Table”. Clicking the expand icon in the header of this column looks similar, but with an important difference. Options are available to either expand or aggregate the related items.

image

Selecting expand will create a new source record for each related item, and the only columns that will differ will be the items selected from the related table (Name in our case). Aggregate will not create any new records, but will summarize the related fields. For numeric fields, they can be totalled or averaged, and for text fields they can be counted.
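
The difference between the two options can be sketched in a few lines of Python. This is only an illustration of the row-count behaviour described above, not Power Query itself. The sample items, the output column names, and the choice to keep items without attachments as a single row with an empty value are all assumptions of the sketch.

```python
def expand_table(rows, column, field):
    """Expand: one output row per related item; parent columns repeat."""
    out = []
    for row in rows:
        parent = {k: v for k, v in row.items() if k != column}
        related = row[column] or [{field: None}]  # assumption: keep parents with no items
        for item in related:
            out.append({**parent, f"{column}.{field}": item[field]})
    return out


def aggregate_count(rows, column, field):
    """Aggregate: one output row per parent; related items are counted."""
    out = []
    for row in rows:
        parent = {k: v for k, v in row.items() if k != column}
        count = sum(1 for item in row[column] if item.get(field) is not None)
        out.append({**parent, f"Count of {column}.{field}": count})
    return out


# Hypothetical list items with a one-to-many "Attachments" table.
items = [
    {"Title": "Office closed", "Attachments": [{"Name": "map.pdf"}, {"Name": "hours.docx"}]},
    {"Title": "New hire", "Attachments": [{"Name": "bio.docx"}]},
]

expanded = expand_table(items, "Attachments", "Name")       # 3 rows
aggregated = aggregate_count(items, "Attachments", "Name")  # 2 rows
```

Expanding repeats “Office closed” once per attachment, while aggregating keeps a single row per list item with a count.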

Once ready, click “Close and Load” from the Query Editor ribbon, and the list data will load to either your model, or your workbook, depending on what your preferences are. Of course, I always recommend that you load to the model only.

Once loaded, any visualizations and queries will work against the model. The data can be refreshed at any point either manually, or automatically if using the Data Management Gateway. Keep in mind however that refreshes will operate against the source list.


Power BI Data Management Gateway 1.2 Changes the Game

Last week, a new version of the Power BI Data Management Gateway was released. If you’re unfamiliar with it, it is the Power BI component that allows for workbooks stored in the cloud to be refreshed on a regular basis with data that exists on-premises, or outside of the hosting center.

I’ve been using the gateway since its initial availability in preview form, and in my opinion, this is the most significant functionality change yet. Until this release there were a grand total of three possible data source types that could be refreshed. With this release, the total increases to 18 by my count (you could argue 22, but that’s plenty).

With past versions, I would write up a quick post on how it is configured, but that has been done, along with the complete list of supported data sources and a helpful video on this blog post by the Power BI team. In addition, a very comprehensive (although amazingly already in need of update) white paper on hybrid data scenarios has just been published by Microsoft here.

The big change here is that this multitude of data sources is supported only for data retrieved by Power Query, and NOT by Power Pivot natively. There is absolutely nothing wrong with this, but it does require us to change our approach a bit to using the Data Management Gateway.

As I first mentioned in a post almost a year ago on Using the Data Management Gateway, and in a number of posts since, data connection strings needed to line up with Power Pivot connections. At the time, the only supported sources were SQL Server and Oracle (for the gateway) and Power Query wasn’t supported at all. Version 1.1 of the Gateway brought Power Query support, but only for those 2 supported data sources. With this release, the Power Query support includes not only all of the new data sources, but also the three original Power Pivot data sources (note: Power Pivot data connections can be found in the Power Pivot add-in UI, while Power Query connections are available on the data tab in Excel).

image

image

As of this release of the Data Management Gateway, there is almost no reason to use native Power Pivot connections any longer. My recommendation is therefore that unless there is a good reason for not doing so, you should try to use Power Query for all data acquisition tasks. It is quite clearly the way forward, and will only gain in supported capabilities. My suspicion is that Power Pivot connections will be retained for backward compatibility reasons only.

With that said, there are a couple of good reasons for using Power Pivot connections directly. One of these reasons is if your data source is online, whether it is SQL Azure, SharePoint Online, or Project Server online. With these data sources, a Data Management Gateway is not required for refresh to work from an embedded Power Pivot connection. However, if Power Query is used to access these sources, it is.

What this means is that for Power Pivot connections to these sources, a refresh allows the Power BI service in the cloud to directly access these data sources in the cloud. However, because ALL Power Query connections require the Data Management Gateway for refresh operations, a Power Query refresh operation will require all of the data to be first downloaded to the on-premises gateway, and then sent back up to the Power BI service in the cloud. While functional, this is hardly the most efficient approach.
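
The rule described above is small enough to write down as a function. The following Python sketch simply restates this post's description of gateway version 1.2. The function name and the string labels are made up for illustration and are not part of any Microsoft API.

```python
def needs_gateway(connection_type: str, source_is_online: bool) -> bool:
    """Return True if a refresh has to go through the Data Management Gateway."""
    if connection_type == "power_query":
        # All Power Query connections refresh through the gateway,
        # even when the source itself lives in the cloud.
        return True
    if connection_type == "power_pivot":
        # Native Power Pivot connections to online sources are
        # refreshed directly by the Power BI service.
        return not source_is_online
    raise ValueError(f"unknown connection type: {connection_type!r}")


# SharePoint Online through Power Query: data makes a round trip via the gateway.
round_trip = needs_gateway("power_query", source_is_online=True)
# SQL Azure through a native Power Pivot connection: cloud to cloud.
direct = not needs_gateway("power_pivot", source_is_online=True)
```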

Apart from this one small caveat, this version of the Data Management Gateway spells the way forward. Additional data source support should come fast and furious, and the Power Query focus means that we can start to rely on its powerful transformational capabilities without having to sacrifice refreshability (if that’s even a word….).


How to Refresh Data Models in Office 365 from On Premises Using Nintex Workflow – A Hybrid Approach

One of the viable use cases for Power BI is to provide reporting to a mobile workforce, even when the organization has invested in BI on premises. In this scenario, there may be a SharePoint farm using PowerPivot for SharePoint with multiple workbooks connected to a myriad of data sources, including SQL Server Analysis Services cubes and models.

The Problem

In this scenario, the workbooks required by the mobile users can be copied up to the Office 365 tenant, and consumed from there. However, how is the data kept current? One way to do so is to set up the Power BI Data Management Gateway (DMG) to do this. The DMG is a great solution, but is relatively new, and has a few inherent limitations at the moment. Chief among them is that data can only be refreshed from SQL Server or Oracle data sources. For the moment at least, Analysis Services is left out in the cold. However, PowerPivot for SharePoint doesn’t share these limitations, and has no problem refreshing data from a wide variety of sources. Unfortunately, if you’re using Power BI, and need to refresh from an unsupported data source, you’re out of luck for the moment, so this is a problem.

Keeping a cloud copy of a local workbook carries the inherent problems of managing two different copies of the same workbooks. Every modification must be performed twice and there is always the chance that something will get missed. Ideally, for this scenario, we should have a publishing mechanism. We can use Nintex Workflow (on premises) to provide this publishing mechanism for us. Whenever the model is refreshed, the workflow will fire, and copy the file up to Office 365 to update the model there.

In order to implement this solution, you’ll need a copy of Nintex Workflow. The low end (Team) edition is fine, but what we’re after is the “Copy document to Office 365” action, and that is available through the Nintex Live action set.

The Solution

To start, we’ll work with an Excel workbook that is using an embedded model created from the standard Contoso Data Warehouse. In this example, we’ve built a simple pivot table to interact with the model. The model has been published to a PowerPivot gallery, and the refresh schedule has been set. Once we’ve tested the refresh to ensure that it’s working, we can create the workflow. To do so, we go to the workflow options section in the library ribbon and create a new Nintex workflow.

image

We then select the Blank template. Once presented with the design canvas, we select Workflow Settings, give it a good name, and set the workflow to run on create and on edit.

image

Once the settings have been saved, we can start to build the actual workflow. The first action will be a “Pause for” action. Find it in the Workflow Actions toolbar, and drag it onto the design surface. Select configure from its drop down menu, and set it to pause for a period of time, in this case, 5 minutes.

image

The amount of time that we need to pause for depends on how long it takes our model to refresh. When the refresh operation begins, it “edits” the document, which kicks off the workflow. We don’t want to upload the workbook until the refresh operation is complete, so we need to pause it. In this case, we are working with a data model that is about 100 MB and comprised of about 3 million rows of data. In this environment, it requires about 2.5 minutes to refresh, so a 5 minute pause is sufficient.

Next, we need to use the “Office 365 upload file” action. This action is one of the Nintex Live actions that can be added at no extra charge to the Nintex environment. If it hasn’t already been added to the toolbar, we need to do so. Assuming that Nintex Live has been enabled for the farm, we browse the available actions through the Catalog which is available through the designer ribbon.

image

Once in the catalog, we can browse the available actions, and click the Add button to add them to the toolbox. The action that we need to add is named “Office 365 upload file”. Once at least one action is added, a new section will appear in the Workflow Actions toolbar named “Nintex Live”. We then open that section and drag the “Office 365 upload file” action onto the design surface after the pause action, then configure it.

image

Walking through the options:

  • File to upload: We want the current file to be uploaded to Office 365, so we select current item
  • Destination site URL: This is in fact the URL of the site itself. Everything up to, but not including the document library
  • Folder path: This is the name of the destination document library, and if appropriate, any subfolders
  • File name: We can create or derive a new name for the file, but in this case, we will be using the same name as the source file
  • Overwrite existing file: We select this option, as we will be updating existing content.
  • Fields: We can add additional metadata values if we wish, but in this case, we leave it empty
  • SharePoint Online URL: this is the URL to the root of the Office 365 tenant for the destination
  • Username and password are for a proxy account that will be used to upload the file. Secure workflow constants can be used.
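
To show how the destination options fit together, here is a small Python sketch that joins a site URL, folder path and file name into the address where the uploaded workbook would be expected to appear. It says nothing about how Nintex builds the URL internally. The tenant, site, library and file names are hypothetical.

```python
from urllib.parse import quote


def destination_url(site_url: str, folder_path: str, file_name: str) -> str:
    """Join site URL + library/subfolders + file name into one URL."""
    # Folder path = document library name plus any subfolders.
    segments = [s for s in folder_path.strip("/").split("/") if s]
    encoded = [quote(s) for s in segments] + [quote(file_name)]
    # Site URL = everything up to, but not including, the library.
    return "/".join([site_url.rstrip("/")] + encoded)


target = destination_url(
    "https://contoso.sharepoint.com/sites/bi/",  # hypothetical destination site
    "Reports/Sales",                             # hypothetical library and subfolder
    "Contoso Sales.xlsx",                        # same name as the source file
)
```

With these sample values the workbook lands at https://contoso.sharepoint.com/sites/bi/Reports/Sales/Contoso%20Sales.xlsx.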

Once we configure the action and save it, our workflow should appear as follows:

image

and we are ready to publish the workflow. To do so, select the Publish button, then close the workflow designer.

We are now ready to test our workflow. We can force the workflow to run on demand for our workbook, but a better test is to let the refresh trigger it. The most frequently that PowerPivot for SharePoint refreshes can be scheduled to run is once per day, but there is a way to force the refresh to happen on demand.

To force a refresh, we need to edit the refresh schedule for the workbook. Open up the refresh history screen by selecting Manage Data Refresh for the workbook. 

image

From the Refresh history screen, select the Configure Schedule link to edit the schedule. This will allow us to edit the existing schedule, but more importantly, it will also let us force a refresh by selecting the “Also refresh as soon as possible” option.

image

If we are using a specific account as a credential, we will also need to re-enter that. Once complete, selecting OK will queue the refresh job for execution. The refresh job will happen the next time that the “PowerPivot Data Refresh Timer Job” executes, and it runs every 5 minutes.

If you’re really impatient, and you have access to central administration, you can edit this timer job and force it to run immediately on a case by case basis.

Moving back to the refresh history screen, we will notice that the refresh is currently running.

image

And in the workflow history screen for the workbook, we can see that the workflow is also running.

image

Once both operations have completed, we can navigate to our Office 365 site, and we will find the file in the destination folder. If our model is greater than 10 MB in size, we will require Power BI in order to interact with it in the browser, and we will need to enable it for use with Power BI through the Power BI application. If it has already been enabled, then no additional actions are required. If the workbook is less than 10MB, it will work directly in any Office 365 library that is enabled for Excel Services.

More Possibilities

If you’ve worked with the Power BI Data Management Gateway at all, you’ll know that data refreshes can be scheduled at most daily. This limitation is also shared by PowerPivot for SharePoint, but it is possible to change its behaviour. Using the techniques outlined in this article by Ian Smith, and this article by Tim Laqua, refreshes can be made as granular as 5 minutes. These articles are written for SharePoint 2010, but the principles work with 2013 (hint, the stored procedure that forces a refresh is [DataRefresh].[AddRefreshSchedule]). Be warned, this technique is a hack, and not supported by Microsoft. However, it can be a way to up the frequency of refreshes on premises, and with the above technique, in Office 365 too.

Another way to refresh the data more often is by using SQL Server Integration Services (SSIS). SSIS can update an embedded data model in an Excel workbook through a script action (thanks to Gobán Saor).  SSIS packages can be scheduled with SQL agent jobs, which means that they can be run as frequently as every minute. Another advantage of this approach is that with Excel 2013, this method should be fully supported.

Finally…

By relying on our workflow to publish the refreshed model, we work around the current limitations of the Data Management Gateway. We do incur the cost of a Nintex license, but a few days of development would easily exceed that cost. In fact, if our data model doesn’t exceed 10 MB, we don’t even need Power BI for this solution to work.


Problems Manually Refreshing Power BI Enabled Workbooks

Office 365 (without Power BI) has supported data refresh for Excel and PowerPivot for some time, and it works well provided that the data source is both in the cloud, and is one of the supported data sources. To refresh a workbook, simply open the workbook in the Excel Web App, open the data menu and select “Refresh All Connections”.

image

Up until just recently, this was how it worked with Power BI workbooks as well, with the additional ability of being able to refresh on premises data through the Data Management Gateway. However, with the latest refresh of the Power BI application, and its support of scheduled refresh, this has changed. Now, if you follow this procedure and attempt to manually refresh a Power BI enabled workbook from an on premises data source, you will receive an error.

OnPremise error: We were not able to refresh the data connections. On-premise data sources can only be refreshed via scheduled refresh in Power BI for Office 365

The error is pretty self-explanatory, so I won’t try to explain it. Scheduled data refresh hasn’t just been added, it has replaced the old refresh method. That’s all well and good, but what about those use cases where we want to manually refresh data? The good news is that it hasn’t been lost, it’s just been moved. It is, however, well hidden.

To refresh the workbook on demand, you must first open the Power BI application, locate your workbook, and click on its ellipsis to open its context menu.

Schedule Data Refresh

Next, you need to select “Schedule Data Refresh”. Now I know that we don’t want to schedule the refresh, but to update it on demand, so you’ll just need to trust me here. Selecting “Schedule Data Refresh” will open the scheduling interface into either the history tab (if the workbook has already been scheduled) or the settings tab (if it has not). In any event, you’ll need to be in the settings tab.

If the workbook has not already been enabled for scheduled refresh, it will need to be. Once it has, the “save and refresh report” button will be available. If it has already been saved, the button will read “refresh report now”. In either case, clicking on it will start the refresh process immediately.

On-demand refresh is still available, but I have to say that it’s well hidden. The fact that it has moved into the Power BI application means that a Power BI license will be required to refresh it on demand, which seems quite reasonable to me. However, some better visual cues would be a big help. For example, why not add “refresh now” to the context menu in the Power BI application?

In addition, given that the refresh is being initiated manually, some visual cues around the status of the refresh (started, in progress, completed) would help considerably. As it stands, the only status information is available after the refresh completes, on the history tab of “Schedule Data Refresh”.

There has also been another subtle change around how workbooks are displayed in Power BI. When a workbook is opened from the source Office 365 library, the standard Excel Web App interface is displayed, with options for opening in Excel, editing, etc. displayed.

image

However, if you first navigate to the Power BI application, and open the workbook by clicking on the thumbnail, it will open in the browser but without the Web App chrome.

image

I’m not sure what the reasoning is for this different behaviour, but it’s a change, and something that you should be aware of. UPDATE 14/2/14 – It has been explained to me that the reason for this different behaviour is an effort to reduce screen clutter for those using the Power BI application. It’s mostly a consumption application, so this change makes sense in that context. It’s also possible to add the Excel Web App chrome back in by using a new “action bar” (my name). If you look to the bottom of the worksheet window, you’ll see it, and its three icons.

image

The three icons, from left, allow you to submit feedback to Microsoft, to get embed codes for the report (a new feature!) and finally, to restore the standard Web App chrome (for editing etc.)


Scheduled Data Refresh in Power BI

It’s finally here.

Quietly, sometime over the past few days, Microsoft updated the Power BI application in Office 365 along with the Data Management Gateway (get it here). Chief among the changes is the ability to schedule data refresh, which to my thinking, is the single most important feature for deploying Business Intelligence solutions in the cloud.

Until now, it has been possible to refresh Excel worksheets with embedded data models on demand. In fact, if your data source is also in the cloud (and is one of the supported data sources), you don’t need Power BI to do it; it’s supported natively in Office 365. If your data source is on-premises (and either Oracle or SQL Server), you can do it through the Data Management Gateway. What has been missing is the ability to have the data model refreshed in the absence of interaction. No longer.

This capability can of course be found in PowerPivot for SharePoint on premises. It is configured on a per workbook basis in the PowerPivot Gallery, which is a PowerPivot focused view of a document library that contains workbooks. It works in much the same manner with Power BI, with the Power BI application taking the place of the PowerPivot Gallery.

Configuring Scheduled Refresh

To turn on automatic refresh for a workbook, you need to access the workbook’s BI context menu. To do this, first, open the Power BI application, then locate the workbook that you wish to have refreshed automatically. Click on the ellipsis to access the menu.

image_thumb2

A number of items have been added to the menu, and to the preview graphic itself. To the left of the ellipsis is information on when the model was last updated, and the context menu adds the ability to edit in Excel and to add to Q&A as well. However, the feature that we’re interested in is the scheduled data refresh, and selecting that option takes us to the scheduled refresh screen.

image_thumb6

If refresh has not already been configured, it opens into the settings tab, otherwise, the history tab will be opened.

To turn on refresh, simply select the “on” slider. If your model has multiple data sources you can choose them to be included or not. As far as I have seen, you can only have one schedule per workbook, so if a data source isn’t included, it simply won’t be updated.

Next, select your refresh schedule, which will be either daily or weekly. By default, your schedule will have a shelf life of 90 days, and will turn off after that time. You can adjust this period by changing the value of the “Ends By” field. You can then select a time (or a day and time if appropriate) for the refresh to occur. Finally, any errors will be sent to the email address that you specify in the notification field.
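
As a quick worked example of the 90 day shelf life, the Python sketch below computes the default “Ends By” date for a schedule and checks whether the schedule would still be active on a given day. The function names are invented for this example. Whether the schedule still runs on the “Ends By” date itself is an assumption, since the post does not say.

```python
from datetime import date, timedelta

DEFAULT_SHELF_LIFE_DAYS = 90  # default shelf life mentioned above


def ends_by(start: date, shelf_life_days: int = DEFAULT_SHELF_LIFE_DAYS) -> date:
    """Default 'Ends By' date for a schedule created on `start`."""
    return start + timedelta(days=shelf_life_days)


def schedule_is_active(today: date, start: date, end: date) -> bool:
    # Assumption: the schedule still runs on the "Ends By" date itself.
    return start <= today <= end


start = date(2014, 2, 10)  # sample creation date
end = ends_by(start)       # 11 May 2014
```

A schedule created on 10 February 2014 would therefore stop refreshing after 11 May 2014 unless the “Ends By” value is pushed out.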

That’s really all there is to it. Selecting “save settings” will save the schedule, and “save and refresh report” will save the schedule, and attempt to run an immediate refresh.

If the data source is cloud based, it will be queried directly by Power BI, and if it is on premises, it will contact the appropriate Data Management Gateway process and refresh through it. I would love for there to be a little more status information for refreshes in the administration portal, but for now, the refresh will either succeed or fail. However, if the data source is on premises, you can open the Resource Monitor on the gateway machine, and monitor the “diawp.exe” process.

image_thumb10

Once the refresh kicks in you’ll notice it using a lot of send bandwidth.

Selecting the “history” tab will of course show the refresh history, and what the refresh schedule for the workbook is. At a glance you can see whether or not refreshes succeeded or failed, how long they took, and how they were initiated.

image_thumb8

I should note here that I have been working with the Power BI preview for several months now, and in order to get scheduled refresh to work with on premises data, I did need to install the latest Data Management Gateway. I’m not sure if this was because scheduled refresh required it, or just because it had expired (it had), but I would recommend installing it in any case. Update 10/02/14 – I have been informed that scheduled refresh does not require the latest data management gateway, but I would recommend getting it all the same – it’s the release version.

One interesting side note. After installing the latest DMG, accessing its configuration shows its version to be 1.0, where previous versions were all point releases (the latest being 0.11). I can’t help but assume that the General Availability of Power BI isn’t far away. UPDATE 10/02/14 – In fact, Power BI went GA today, and this is in fact the GA version of the Data Management Gateway.

Limitations

There are a number of behaviours and limitations that you should be aware of when using scheduled refresh in Power BI. The below items are by no means exhaustive, but simply things that I have either run into, or been made aware of.

Too much data

As I have outlined previously, the maximum size for an embedded workbook model in Power BI is 250 MB. If a user attempts to enable a larger model, they will receive an error message. However, scheduled refresh now allows for the possibility that the model could start small, and then grow to exceed this limit through refresh. What then happens when the limit is exceeded?

When the model is opened for refresh, its size is checked. If it’s OK, the refresh proceeds, and the model is updated.  If the model now exceeds the limits, the next refresh will fail, as will any attempts to work with the file through a browser, until the size of the model is reduced.
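
The consequence of checking the size when the model is opened, rather than after it has been refreshed, is easy to miss, so here is a small Python simulation of it. It is only a sketch of the behaviour described above. The sample sizes and the function name are made up.

```python
MAX_MODEL_MB = 250  # Power BI limit for an embedded workbook model


def simulate_refreshes(start_mb, sizes_after_refresh):
    """Replay a series of refreshes; size is checked when the model is opened."""
    current = start_mb
    results = []
    for new_size in sizes_after_refresh:
        if current > MAX_MODEL_MB:
            # The model is already too big: the refresh fails and the
            # model stays as it is until someone reduces its size.
            results.append("failed")
            continue
        # The check passed, so the refresh runs, even if the
        # refreshed model ends up over the limit.
        current = new_size
        results.append("succeeded")
    return results


# A 240 MB model that grows to 248 MB, then 255 MB, on successive refreshes.
outcome = simulate_refreshes(240, [248, 255, 260])
```

In this run the second refresh is the one that pushes the model over 250 MB, and it still succeeds. It is the third refresh that fails.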

Collisions

Refreshes can take a fair amount of time. During this period, the file is not checked out exclusively to the refresh process, and if it is edited by a user in that time an edit collision could occur. If this situation arises, scheduled refresh will simply discard its updates and fail.
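
The post does not describe how the service detects the collision, but the observable behaviour resembles an optimistic concurrency check. The Python sketch below illustrates that general idea only. The Workbook class, the version counter and the callback are hypothetical stand-ins, not the actual implementation.

```python
class Workbook:
    """Hypothetical stand-in for a workbook stored in a library."""

    def __init__(self, data):
        self.data = data
        self.version = 1

    def user_edit(self, data):
        self.data = data
        self.version += 1


def scheduled_refresh(workbook, refreshed_data, during_refresh=None):
    """Return True if the refresh was saved, False if it was discarded."""
    version_at_start = workbook.version  # no exclusive check-out is taken
    if during_refresh is not None:
        during_refresh(workbook)         # something may happen while we refresh
    if workbook.version != version_at_start:
        return False                     # edit collision: discard updates and fail
    workbook.data = refreshed_data
    workbook.version += 1
    return True


quiet = Workbook("old")
saved = scheduled_refresh(quiet, "refreshed")

busy = Workbook("old")
collided = not scheduled_refresh(busy, "refreshed", lambda wb: wb.user_edit("user change"))
```

In the second case the user's edit is kept and the refreshed data is thrown away, which matches the behaviour described above.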

Frequency

As mentioned above, the two options for schedule frequency are daily and weekly. I was really hoping to see hourly. Monthly and annually would be great too. As it stands, if your data needs to be more current than daily, then Power BI still won’t work for you (without heavy customization). Of course, the reality is that daily is frequent enough for most situations, and this at least puts data refresh on par with its counterpart in PowerPivot for SharePoint.

If someone from the product team is reading this, hourly updates would be my #2 feature ask, for both Power BI and Power Pivot for SharePoint. (for the #1 ask, read on).

Limited Data Sources

At the moment, the refreshable data sources are those that are currently supported by Office 365 in the cloud (Azure SQL, SQL on Azure VMs, and OData feeds with simple or no authentication), and those supported by the Data Management Gateway (SQL Server 2005+, Oracle 10g+). A full list can be found in the official documentation here.

This is a great starting list, but it is limited. There are quite a number of other data sources that would be great to see on this list, multidimensional sources being right up there. However to my thinking, the most glaring omission on this list is Power Query.

The above data sources are supported if the data was imported into the model through Power Pivot’s import feature (or the native features in Excel 2013). However, if a user takes advantage of the many excellent features available in Power Query, their model will not be automatically refreshable. I have already seen in the preview forums that this difference confuses users, and given that Power Query is a highly touted integral component of Power BI, it needs to become a first class citizen, and soon. That’s my #1 ask – again, both for Power BI and Power Pivot for SharePoint.

However, for the moment, what you need to know is that if your model is built with Power Query, it can’t be refreshed automatically.

Limitations aside, it appears to me that Power BI is an absolutely compelling value proposition, and the inclusion of scheduled refresh completes the picture. I can’t wait for it to be released into the wild. Let the games begin! 
