The Office team have published a comprehensive resource outlining all of the current boundaries and limits for SharePoint Online generally. Power BI offers a host of new capability, and in the absence of anything official from Microsoft, I thought it might be a good idea to itemize my findings below. I’m clearly leaving things out, and if you have something to add, I’d love to here from you. This list is over and above the inherent boundaries of Office 365.
This is obviously not official, and I’ll try to modify it as these limits change, but for now, here are the current limits for Power BI, in its preview form.
While it isn’t the flashiest, the Data Management Gateway is arguably the most important part of Power BI. Most of the other features of Power BI have either been available already, or could be achieved through alternate means. What was next to impossible was to have a workbook stored in Office 365 automatically update its data from an on premises data source in a manner similar to Power Pivot for SharePoint. The Data Management Gateway does this, and a fair bit more. This article will attempt to explore some of its intricacies, tricks, and things to look out for. This article is being written while using the invitation only preview of Power BI, and some of the specifics could will change by public preview, and certainly by General Availability.
The Data Management Gateway is a Windows service, and can run on any Windows PC. Well, not just any PC, it needs Windows 7 or above, but it’s a relatively light service. In my case for the preview, I’m running it on my main workstation. It’s also relatively small, and you can run multiple gateways within your organization. If you want to have a look at it, you can download it here whether or not you have a Power BI enabled Office 365 tenant, but if you don’t, you won’t be able to do much with it.
Installing the gateway is pretty straightforward, and is well documented here, so I won’t go into details on that. Once it is installed, it establishes a communication channel with your Power BI service in the cloud (no firewall holes required), and essentially waits for requests. Once it gets one, it acts as a broker between your cloud service and your on premises data.
Most configuration is done in the cloud, in the Power BI Admin Center. If you’re interested in monitoring its activity on the gateway machine, You can do so using Resource Monitor or Task Manager. The process that you’re looking for is diawp.exe or diahost.exe. I have no idea there those names come from, but I’m going to guess Data Integration Agent Worker Process and host.
Once installed, the gateway performs two major semi related functions. The first is the aforementioned on premises data refresh capability. In addition to this, the gateway also provides the ability to publish your on-premises data sources as an OData feed that is will be accessible in the cloud. In its current version, the gateway only supports a limited set of data sources – all of them SQL Server. The official list can be found at the bottom of this document. Although its not listed specifically, SQL Azure databases are also supported.
I’m going to drill down a little on these two main functions, and share some of my experiences. We’ll start with the OData feed.
Publishing an OData Feed
An OData feed is published by creating a data source. This is done from within the Admin Center by navigating to the data sources tab, and selecting “new data source”.
This starts a data source “wizard”. The first question to be answered is what will this data source be used for.
It isn’t necessary to create an OData feed in order to refresh on-premises data, but it is necessary to to create a data connection (more on this later). The “enable cloud access” option essentially tells the gateway that it’s OK to allow on demand and automatic data refreshes from the cloud. The “Enable OData Feed” option is pretty self explanatory – if you don’t enable it, the only thing that the connection can be used for is data refresh. These two can be selected independently of each other. After selecting next, you are presented with the connection info screen.
To start with, you’ll begin by giving the connection a name. Since the connection will typically be to a specific database, the name should reflect that. You may also wish to add something about the database’s location if you deal with similar data sets. The next selection is that of the gateway. You MUST have a gateway in order to create a data connection. You can have multiple gateways registered in your tenant, and the selection of the gateway will dictate the connection provider choices. It’s an easy thing to forget to choose your gateway, and then think that the page is broken because there are no Provider choices.
You can choose one of two methods to create your connection – connection string, or (if you’re lazy like me) you can choose Connection Properties, and let the tool build your string for you. When it comes to troubleshooting data refresh, you may find that the connection string method is more helpful, but either way should be equivalent. Once you have completed the bulk of this form, you need to enter credentials. First, select the method of authentication – your choices are Windows Authentication or SQL Authentication. Then select the credentials button which launches the Data Source Manager.
The Data Source Manager is a one click application that communicates directly with the gateway and is used to register the connection’s credentials directly with it – they are not stored in the Office 365 tenant or the BI Sites app. Because of this direct connection, you need to be on a network that is local to the gateway. This will not work from a remote location. If, like me, you are not joined to a domain, you will also need to be on the gateway machine itself.
On launch, it will go through a series of checks (this can take a while), verifying the gateway and the tenant. When it’s ready, it will show “Data Source Manager initialized successfully”, and you can go ahead and enter the credentials. Once you do, be sure to use the “test connection” button to verify that everything is working. When ready, click OK to register the credentials, then click save to save the data source. You will then be taken to the data settings page.
From here you can choose to not only index the metadata in the data catalogue, but also the data itself, and you can choose how frequently it is updated. The indexing option is currently disabled in the preview, so I have little to say about it at this point. Its purpose is to improve discoverability of the data via Power Query. The second section is the selection of tables to expose to the OData feed. You can choose from any tables or views, but as you can see from the example above, if your table/view contains any unsupported types (in this case geography fields), the entire object will be unavailable for publishing.
Clicking OK brings you to the users and groups page. From here, you can select those that will be able to use these data sources in Power Query, or manually refresh workbooks in the browser. As with all things SharePoint, it’s a good idea to use a group here.
Once done, your data connection is ready, and your OData feed is available. To use it, you’ll need to discover its address. You can do this by clicking on the ellipsis beside its name in the list.
As you can observe from the URL, this is a cloud service. You should be able to connect to the service from anywhere, and it will connect through the gateway to serve up the data. While this is great from a mobility standpoint, if you happen to be on premises, this would be quite inefficient, as the data would first need to be transferred to the endpoint in the cloud, and then back to the source network.
The good news is that the gateway is able to detect when you are accessing the feed locally, and it will redirect you to the source without sending the data up to the cloud and back. The bad news for us preview users is that this is the only thing working at the moment. Therefore, for the preview period at least, in order to access the OData feed, you must be on a local network. Specifically, you must be able to resolve the server name defied in the connection string.
If you meet these conditions, you can test the feed using Power Query. In Excel, go to the Power Query tab, and select “From OData Feed” in the “From Other Data Sources” dropdown.
You will then be prompted to log in. You need to use an Organizational Account (Office 365) to do this. This is an account associated with the Power BI license. However on this screen it’s referred to as a “Microsoft Online Services ID”.
I personally feel this could easily be confused with a Microsoft Account (Live ID). What’s worse is that Microsoft Accounts can be used with Office 365, so it’s certainly less than clear as to which credentials should be used here.
Clicking on the Sign In button takes you to a standard authentication dialog. In the preview, there is a small bug that requires you to enter your account completely in lower case. Failing to do so will cache the wrong credentials, and you’ll be denied access moving forward. If you encounter this problem, the solution is to close Excel, clear the browser cache and to restart.
Once you have authenticated successfully, you can save the connection (Power Query will save it in its cache), and work with it as with any other data source.
Thus far, I’ve only been able to use Power Query to connect to the OData feed. I have tried using Power Pivot directly, but although it is supposed to support OAuth, I can’t seem to save my Office 365 credentials. I’ve only tried via these two mechanisms – if anyone has tried others, I would love to know about it.
Refreshing Data in a Power BI Enabled Workbook
To start with, at the time of this writing (Power BI preview) quite a number of features that will be available in GA have not been enabled and/or are not supported. Data refresh scheduling is as yet unavailable, and the data sources that can be refreshed are restricted to direct SQL connections. Models created with Power Query cannot yet be refreshed. As these features become available, I will update this article with any relevant findings.
Given the fact that the Gateway is required to support both an OData feed and for data refresh, you might think that you must use the OData feed in your data models in order for them to be refreshable. This is not the case. When a refresh is requested, the model is interrogated for its data connections. The data catalogue is then interrogated for a data source with a matching connection string, and if found, is used. The Gateway is then called to retrieve the data if it is on premises. If the data source is SQL Azure, the Gateway is still used, but the data is loaded directly from SQL Azure – it does not need to be sent to the Gateway first.
As mentioned above, Power Query queries cannot yet be refreshed by the gateway. The only type of connection that I’ve been able to successfully refresh thus far is one created directly in Power Pivot. When creating a connection in Power Pivot, pay close attention to the connection string. You may need it later if you have refresh issues.
In addition to the data sources supported by the gateway, two other data sources can be refreshed. Project Online has a number of OData feeds that can be refreshed directly, and public OData feeds (not requiring authentication) can also be refreshed. I don’t currently have an instance of Project Online, so I don’t have much to add apart from the fact that it is supposed to work. I have tested refresh with public feeds and they do in fact work well. The interesting thing I noticed was that while it worked in my Power BI tenant, it also worked in my regular Office 365 tenant. Apparently this feature has been there for some time.
As I mentioned, scheduled refresh is not yet available. When it is, this will be done from the BI Sites app, the same way that you “enable” a workbook. For now, it must be done manually. This is done the same way that it is with an on premises workbook. First open the workbook in the browser, and then, from the data tab, select “Refresh All Connections”.
The workbook will go dim, and you’ll see “Working on it…” or “Still working on it…” for a bit – it does take some time to refresh, depending on the data set. Using the methods that I mentioned at the beginning of this article, you can monitor the progress of the refresh, and the impact on the gateway machine. Also, for the moment at least, if your data source is SQL Azure, prepare for a long wait – refresh time takes an exceptionally long time (in my case, about 10 min for a 1 million row x 20 column set of simple data types). The Azure refresh time should be addressed by GA.
Monitoring The Gateway
There are already a number of tools in the preview to help with troubleshooting Power BI, chiefly aimed at the Data Management Gateway. They can be found on the “system health” tab in the Admin Center.
The default screen shows the CPU utilization of your gateway machines (in this case, I have all of 1…) at the top, and the availability of your gateways at the bottom. At first glance, you would think that while my gateway is good from an availability standpoint, it’s taking a lot of my CPU. The reality is that the top chart shows total CPU utilization on the machine. If you want to see the utilization of the Gateway itself, you need to choose the specific machine from the Display dropdown.
Here you can see that while the machine utilization hovers around 40% (blue), the gateway utilization is barely noticeable (red).
Finally, what is likely the most useful part of monitoring, the logs, is well hidden. The logs are useful for troubleshooting data refresh issues, and can be accessed by clicking on “logs” which is at the top of this screen beside “dashboards”.
In the browser, you can see basic event information and basic error information if you have errors. However, if you download the logs as a CSV, you will see much more detailed information. If you are having problems with data refresh, particularly in the preview, I strongly recommend downloading it. One of the important pieces of information that it contains is the connection string that is being used:
You can compare that to the connection string that is being used by Power Pivot in the workbook. You find the string in Power Pivot by first clicking on “Manage” in the Power Pivot tab, and then in the Power Pivot window, choose “Existing Connections” in the ribbon. You should see your connection under “PowerPivot Data Connections”. Select it, and click the Edit button.
In the next screen, click the “Advanced” button. You should then be presented with the data connection property dialog.
Here, at the bottom, you will find the connection string being used by Power Pivot, and this is what the Gateway uses to look for one of its registered data connections. If you find any discrepancies, the chances are that they are at the source of your refresh problems, and that they should be addressed.
This is an early, first glance walkthrough of some experiences using the Data Management Gateway. Hopefully it can be of some help for the early adopters. I will try to keep this updated as Power BI moves from preview to General Availability.
I’ve mentioned before that in my opinion, one of the most compelling features of Power BI is the ability to automatically refresh data models in the cloud with data from on premises. Up until now, this couldn’t even be done manually – you needed to download the workbook, use the Excel client to do the refresh, and then re-upload. I’ve had the opportunity to work with the preview for a little while now, and while automatic refresh still isn’t there, it is at least possible to manually refresh the data in the browser, which forms the underpinnings of the upcoming automatic refresh feature.
On premises data refresh is restricted for the moment to SQL data sources (I’ll be writing more about this shortly), but until today, I had been unsuccessful in getting even those to refresh properly. Needless to say, I was pretty disappointed. After trying a number of scenarios, and spending some time in the forum (I highly recommend the forum), I got some help from the product team. It turns out that I was pretty quick off the draw, and the version of the gateway I had installed didn’t work. Since the release, it had already been updated.
So how do you know that you’re out of date? If you go to the Power BI Admin Center, and click on Gateways, you’ll see all of your registered Data Gateways. Under status, if everything is up to date, and running, you should see “online”.
However, in my case is said “expiring”. This means that a newer version is available. OK, so how do I get access to this wonderful new version? Just to the left of the status, there is an ellipsis. Click on it to open the Gateway Configuration dialog (that’s what I’m calling it).
There are a number of options available here, but at the bottom of it is a link – “Download gateway installation package here”. It’s pretty self explanatory. It may go without saying, but in case it doesn’t, you will need to be on the machine that is running the gateway to install it. It’s a simple install, and it will install over top of the existing gateway. If you have the gateway status window open, you will need to close it before the install completes.
One other thing that’s worth mentioning about this dialog. Clicking the machine name next to “Installed On” will open up a remote desktop session to the machine running the gateway. You will need to be on the same network for this to work – it’s simply launching Remote Desktop – it doesn’t have any tunnelling protocol a la Windows Live Mesh, but it’s handy nonetheless.
In my case, this wasn’t quite enough to get data refresh working. The final step was to delete my original data connection, and then create a new one. Once that completed, I was able to use a browser to update my cloud based data model from on premises data.
If you’ve worked at all with the Power BI mobile app, you’ll find that it’s quite straightforward to use. What it does in essence is to render out objects from Excel workbooks in a manner that is easy to consume from a mobile device. It uses Excel Services to render the content, but it doesn’t do so in a manner that may be familiar to Excel Services users, which is to essentially replicate the spreadsheet editing environment in a browser. Instead, each named object is presented as a distinct object, and in favourites, the workbook is represented as a section.
Clicking on any of the objects opens it in full screen mode. Swiping down from the top, or right clicking on the report opens a bar at the top that allows you to navigate to any of the other objects.
By default, any named object will appear, as well as Power View reports. (It should be noted that as of right now at least, Power Map objects are not rendered by the Mobile app). However, what happens if we don’t want an object to be rendered in the application? If you use Power Query, and you follow my recommendations of loading the data directly into the data model, you’ll likely bump into the need to do this fairly quickly. Queries are named objects in the workbook, and are rendered by the mobile app.
In the screen above, there are actually 7 objects in the workbook, but the favourites view can only display 6. Three query stubs are blocking a Power View report. Not an ideal situation. What we want to do is to hide these queries, but how to do so is not immediately obvious. The BI app uses Excel Services to render the objects, and since the early days of Excel Services, Excel has had a mechanism to control what gets rendered by Excel Services.
If we open the source workbook in Excel we will see that all of the named objects are contained in worksheets named “Pivots” and “Pie” another worksheet, “Power View 1” contains the Power View report. The queries are all stored in different worksheets. In order to control what gets rendered, we first click on the “File” tab in the ribbon, and click the “Browser view options” button.
You are then presented with a dialog that allows you to determine what is rendered by Excel Services. The default is “Entire Workbook”, but you can select specific “Sheets”, or for very fine grained control, “Items in the workbook”.
In our case, we just want to turn off items in the unnamed sheets, so we deselect them. Once this is done, all that is necessary is to save the workbook back into its library, and the results become immediately apparent on a refresh of the app.
This is a bit of a brute force approach. It affects not only the mobile app, but also anyone using the workbook from a browser – those sheets will be invisible to any users using the Excel Web App. I can foresee a need to hide things from the mobile app but not from the browser. I can also see a need to control the layout of the objects, but these are early days. For now, I’ll take this approach.
UPDATE – 2013-11-17 – After installing the November 2013 refresh of the Data Management Gateway Preview, this restriction appears to have been removed. The workaround is not longer necessary, but is still valid.
After successfully setting up the Power BI On Premises Data Gateway, and setting up a data source, you can use Power Query to query that source through the oData data source. However, if you have a significant amount of data, you may encounter an error:
[DataSource.Error] Cannot parse OData response result. Error: The maximum number of bytes allowed to be read from the stream has been exceeded. After the last read operation, a total of 104858624 bytes has been read from the stream; however a maximum of 104857600 bytes is allowed.
As has been recently confirmed by the product team, the maximum amount of data that can be returned from the gateway is 100 MB. I have no confirmation of this, but it may be removed or increased in the future. However, for now, if you hit this limit, you are stuck. One way around it is to create segmented queries that come in under the limit, and merge them together through Power Query’s Append function.