Skip to content

Year: 2014

Problems Manually Refreshing Power BI Enabled Workbooks

Office 365 (without Power BI) has supported data refresh for Excel and PowerPivot for some time, and it works well provided that the data source is both in the cloud, and is one of the supported data sources. To refresh a workbook, simply open the workbook in the Excel Web App, open the data menu and select “Refresh All Connections”.

image

Up until just recently, this was how it worked with Power BI workbooks as well, with the additional ability of being able to refresh on premises data through the Data Management Gateway. However, with the latest refresh of the Power BI application, and its support of scheduled refresh, this has changed. Now, if you follow this procedure and attempt to manually refresh a Power BI enabled workbook from an on premises data source, you will receive an error.

OnPremise error: We were not able to refresh the data connections. On-premise data ources canonly be refreshed vi scheduled refresh in Power BI for Office 365

The error is pretty self-explanatory, so I won’t try to explain it. Scheduled data refresh hasn’t just been added, it has replaced the old refresh method. That’s all well and good, but what about those use cases where we want to manually refresh data? The good news is that it hasn’t been lost, it’s just been moved. It is, however, well hidden.

To refresh the workbook on demand, you must first open the Power BI application, locate your workbook, and click on its ellipsis to open its context menu.

Schedule Data Refresh

Next, you need to select “Schedule Data Refresh”. Now I know that we don’t want to schedule the refresh, but to update it on demand, so you’ll just need to trust me here. Selecting “Schedule Data Refresh” will open the scheduling interface into either the history tab (if the workbook has already been scheduled) or the settings tab (if it has not). In any event, you’ll need to be in the settings tab.

If the workbook has not already been enabled for scheduled refresh, it will need to be. Once it has, the “save and refresh report” button will be available. If it has already been saved, the button will read “refresh report now”. In either case, clicking on it will start the refresh process immediately.

On-demand refresh is still available, but I have to say that it’s well hidden. The fact that it has moved into the Power BI application means that a Power BI license will be required to refresh it on demand, which seems quite reasonable to me. However, some better visual cues would be a big help. For example, why not add “refresh now” to the context menu in the Power BI application?

In addition, given that the refresh is being initiated manually, some visual cues around the status of the refresh (started, in progress, completed) would help considerably. As it stands, the only status information is available after the refresh completes, on the history tab of “Schedule Data Refresh”.

There has also been another subtle change around how workbooks are displayed in Power BI. When a workbook is opened from the source Office 365 library, the standard Excel Web App interface is displayed, with options for opening in Excel, editing, etc. displayed.

image

However, if you first navigate to the Power BI application, and open the workbook by clicking on the thumbnail, it will open in the browser but without the Web App chrome.

image

I’m not sure what the reasoning is for this different behaviour, but it’s a change, and something that you should be aware of. UPDATE 14/2/14 – It has been explained to me that the reason for this different behaviour is an effort to reduce screen clutter for those using the Power BI application. It’s a consumption mostly application, so this change makes sense in that context. It’s also possible to add the Excel Web app chrome back in by using a new “action bar” (my name). If you look to the bottom of the worksheet window, you’ll see it, and its three icons.

image

The three icons, from left, allow you to submit feedback to Microsoft, to get embed codes for the report (a new feature!) and finally, to restore the standard Web App chrome (for editing etc.)

2 Comments

Scheduled Data Refresh in Power BI

It’s finally here.

Quietly, sometime over the past few days, Microsoft updated the Power BI application in Office 365 along with the Data Management Gateway (get it here). Chief among the changes is the ability to schedule data refresh, which to my thinking, is the single most important feature for deploying Business Intelligence solution in the cloud.

Until now, it has been possible to refresh Excel worksheets with embedded data models on demand, In fact, if your data source was also in the cloud (and was one of the supported data sources), you don’t need Power BI to do it, it’s supported natively in Office 365. If your data source is on-premises (and either Oracle or SQL Server), you can do it through the Data Management Gateway. What has been missing is the ability to have the data model refreshed in the absence of interaction. No longer.

This capability can of course be found in PowerPivot for SharePoint on premises. It is configured on a per workbook basis in the PowerPivot Gallery, which is a PowerPivot focused view of a document library that contains workbooks. In works in much the same manner with Power BI, with the Power BI application taking the place of the Power Pivot gallery.

Configuring Scheduled Refresh

To turn on automatic refresh for a workbook, you need to access the workbook’s BI context menu. To do this, first, open the Power BI application, then locate the workbook that you wish to have refreshed automatically. Click on the ellipsis to access the menu.

image_thumb2

A number of items have been added to the menu, and to the preview graphic itself. To the left of the ellipsis is information on when the model was last updated, and the context menu adds the ability to edit in Excel and to add to Q&A as well. However, the feature that we’re interested in is the scheduled data refresh, and selecting that option takes us to the scheduled refresh screen.

image_thumb6

If refresh has not already been configured, it opens into the settings tab, otherwise, the history tab will be opened.

To turn on refresh, simply select the “on” slider. If your model has multiple data sources you can choose them to be included or not. As far as I have seen, you can only have one schedule per workbook, so if a data source isn’t included, it simply won’t be updated.

Next, select your refresh schedule, which will be either daily or weekly. By default, your schedule will have a shelf life of 90 days, and will turn off after that time. You can adjust this period by changing the value of the “Ends By” field. You can then select a time (or a day and time if appropriate) for the refresh to occur. Finally, any errors will be sent to the email address that you specify in the notification field.

That’s really all there is to it. Selecting “save settings” will save the schedule, and “save and refresh report” will save the schedule, and attempt to run an immediate refresh.

If the data source is cloud based, it will be queried directly by Power BI, and if it is on premises, it will contact the appropriate Data Management Gateway process and refresh through it. I would love for there to be a little more status information for refreshes in the administration portal, but for now, the refresh will either succeed or fail. However, If the data source is on premises, you can open the Resource Monitor on the gateway machine, and monitor the “diawp.exe” process.

image_thumb10

Once the refresh kicks in you’ll notice it using a lot of send bandwidth.

Selecting the “history” tab will of course show the refresh history, and what the refresh schedule for the workbook is. At a glance you can see whether or not refreshes succeeded or failed, how long they took, and how they were initiated.

image_thumb8

I should note here that I have been working with the Power BI preview for several months now, and in order to get scheduled refresh to work with on premises data, I did need to install the latest Data Management Gateway. I’m not sure if this was because scheduled refresh required it, or just because it had expired (it had), but I would recommend installing it in any case. Update 10/02/14 – I have been informed that scheduled refresh does not require the latest data management gateway, but I would recommend getting it all the same – it’s the release version.

One interesting side note. After installing the latest DMG, accessing its configuration shows its version to be 1.0, where previous versions were all point releases (the latest being 0.11). I can’t help but assume that the General Availability of Power BI isn’t far away. UPDATE 10/02/14 – In fact, Power BI went GA today, and this is in fact the GA version of the Data Management Gateway.

Limitations

There are a number of behaviours and limitations that you should be aware of when using scheduled refresh in Power BI. The below items are by no means exhaustive, but simply things that I have either run into, or been made aware of.

Too much data

As I have outlined previously, the maximum size for an embedded workbook model in Power BI is 250 MB. If a user attempts to enable a larger model, they will receive an error message. However, scheduled refresh now allows for the possibility that the model could start small, and then grow to exceed this limit through refresh. What then happens when the limit is exceeded?

When the model is opened for refresh, its size is checked. If it’s OK, the refresh proceeds, and the model is updated.  If the model now exceeds the limits, the next refresh will fail, as will any attempts to work with the file through a browser, until the size of the model is reduced.

Collisions

Refreshes can take a fair amount of time. During this period, the file is not checked out exclusively to the refresh process, and if it is edited by a user in that time an edit collision could occur. If this situation arises, scheduled refresh will simply discard its updates and fail.

Frequency

As mentioned above, the two options for schedule frequency are daily and weekly. I was really hoping to see hourly. Monthly and annually would be great too. As it stands, if your data needs to be more current than daily, then Power BI still won’t work for you (without heavy customization). Of course, the reality is that daily is frequent enough for most situations, and this at least puts data refresh on par with its counterpart in PowerPivot for SharePoint.

If someone from the product team is reading this, hourly updates would be my #2 feature ask, for both Power BI and Power Pivot for SharePoint. (for the #1 ask, read on).

Limited Data Sources

At the moment, the refreshable data sources are those that are currently supported by Office 365 in the cloud (Azure SQL, SQL on Azure VMs, and OData feeds with simple or no authentication), and those supported by the Data Management Gateway (SQL Server 205 +, Oracle 10g +). A full list can be found in the official documentation here.

This is a great starting list, but it is limited. There are quite a number of other data sources that would be great to see on this list, multidimensional sources being right up there. However to my thinking, the most glaring omission on this list is Power Query.

The above data sources are supported if the data was imported into the model through Power Pivot’s import feature (or the native features in Excel 2013). However, if a user takes advantage of the many excellent features available in Power Query, their model will not be automatically refreshable. I have already seen in the preview forums that this difference confuses users, and given that Power Query is a highly touted integral component of Power BI, it needs to become a first class citizen, and soon. That’s my #1 ask – again, both for Power BI and Power Pivot for SharePoint.

However, for the moment, what you need to know is that if your model is built with Power Query, it can’t be refreshed automatically.

Limitations aside, it appears to me that Power BI is an absolutely compelling value proposition, and the inclusion of scheduled refresh completes the picture. I can’t wait for it to be released into the wild. Let the games begin! 

11 Comments

Using Multiple WordPress Blogs with Azure Web Sites

This will be one of my more “meta” posts. Blogging on WordPress discussing blogging on WordPress.

In addition to this blog, my company, UnlimitedViz has a number of active bloggers, The Data Queen, and The Data Model chief among them. We all use WordPress for this purpose, primarily because of its ease of use, and ecosystem of useful add-ins. It’s a PHP application that is (normally) backed by MySQL. UnlimitedViz is a Microsoft shop, so these tools are relatively foreign to us, and Azure is Microsoft’s cloud platform. Notwithstanding that, Azure is an excellent platform choice for us to deploy WordPress, due to its low cost, its WordPress support, and our existing investment in the platform.

I had originally deployed my WordPress blog on Azure back in early 2011 when Azure was really just a rudimentary Platform as a Service product, and wrote about it here. However, I was cheating a little bit, customizing a stateless worker role in as stateless a manner as I could, but it still came back to bite me, and ultimately, I moved my blog over to Rackspace, and then back to Azure, but in an Azure Virtual Machine (IAAS).

When Azure web sites were announced, with direct support for WordPress, I was intrigued. Creating one allows you to create a corresponding, hosted MySQL database (hosted by ClearDB). Unfortunately, each Azure subscription is restricted to a single MySQL DB, and I needed to host multiple blogs with one subscription. I finally got around to looking into this recently, with happy results.

WordPress supports multisite blogs, and the Azure team has some good guidance on how to enable this. However, upon trying this, I quickly determined that this would be a non-starter for us. The WordPress multisite option isn’t supported by some plugins, requires a single master administrator, and requires that all blog owners use the same set of plug ins. Now if there is one word that could describe every UnlimitedViz team member, it’s independent.

What I needed was a way to support multiple WordPress instances with one subscription, one database, and a minimum of administrative overhead. Luckily it’s relatively easy. The trick is to use different table names in the MySQL database for each blog. Below is a step by step example of how to do this.

To begin, all of our blogs are domain oriented, not path oriented. As an example whitepages.unlimitedviz.com vs blogs.unlimitedviz.com/whitepages. In this example we’ll add a new blog with the URL newblog1.unlimitedviz.com to our Azure subscription. The process for creating the first blog is identical to the addition of a new one, with the exception that a new database is created instead of connecting to an existing one.

Step 1 – Add a new Azure Web Site

From the Azure management portal, navigate to Web Sites, select New, Web Site, From Gallery.

image

Scroll down to the bottom of he gallery, and find WordPress. Select it, and press the next arrow.

image

Next, fill in the information about your new blog. The deployment settings will be used by PHP to communicate with MySQL, and will be largely invisible after initial setup. Your new blog will also have a .azurewebsites.net domain. We will substitute (or add) our own later. For now, our new blog will be newblog1.azurewebsites.net, will use an existing database, live in the North Central US data center, and use our corporate subscription.

image

When ready, click the next arrow. If I was creating the first (and only) database, I would be able to give it a name and create it here. As it is, we can select our existing database, agree to ClearDB’s terms of service, and select the “done” check mark.

image

At this point, the web site will be created. Once done, it is possible to navigate to the URL, and set up WordPress, but we need to make an additional modification first.

Step 2 – Specify the Table Prefix

In order to tell this particular WordPress instance which tables to use in the database, we need to modify the wp-config.php file in the web root. How do we do this? We have a few options. We could use FTP to download the file, edit it, and send it back up (FTP settings are under the Dashboard tab for the web site).

image

We could also use GIT, but as I’m unfamiliar with it, I’ll let more GIT friendly folks sort that out as they wish. My preferred option is to use Webmatrix, which allows the direct editing of Azure Web Sites. Webmatrix is available from the bottom tools ribbon wherever a web site is selected.

image

If Webmatrix has already been installed, it will launch, and if not, you’ll need to install it first. Your first option will be its operation mode, either direct, or off-line. We will select direct.

image

Next, we double click on the wp-config.php file. This will open it in the editor. Now we scroll down to the line for $table_prefix, and edit it as appropriate, in our case “newblog1_”.

image

Finally save the file and close Webmatrix. Now when the WordPress configuration wizard is run, it will create table in the database that are prepended with “newblog1_”, and use them thereafter. The configuration wizard runs whenever WordPress can’t find the specified tables in the database

Step 3 – Configure WordPress

Next, we navigate to our URL, where the WordPress configuration wizard will go ahead and complete our setup. In our case, we navigate to newblog1.azurewebsites.net, and fill out the form.

image

When ready, click on the “Install WordPress” button. Once done, we log in and start building out the blog. That’s really all there is to it from the blog perspective. However, there’s likely one more important thing that we need to do.

Step 4 – Activate Your Own URL

In all likelihood, you don’t want to use azurewebsites.net in the domain of your new blog. We could use a DNS alias on our DNS to reroute traffic, but Azure won’t answer any requests that it isn’t expecting. We must first register our custom domain with the Azure web site, and this is only possible with Shared or Standard web sites. New sites get created in free mode, so we need to first switch the compute mode. Keep in mind that we’re switching away from free, so charges will accrue.

To switch modes, from the Azure Management Portal, click on the web site in question, then click the Scale tab. Under general, select on either “Shared” or “Standard” (shared is cheaper), and click save at the bottom of the screen.

image

Once you accept the disclaimers, the mode changes, and we can add our domain. However, before we do, we need to make a DNS change. Azure won’t allow you to add just any domain, it needs to know that you own it. To do so, you need to add an alias (CNAME) entry that points from a verification subdomain (awverify) to a verification subdomain of our web site. In our case, the entry is awverify.newblog1.unlimitedviz.com and it points to awverify.newblog1.azurewebsites.net.

image

Honestly, this is the biggest pain of the entire exercise. The effect of the change is not immediate, and after making the change, you may want to take a break for a while. According to the UI, there are also apparently other aliases that can be used for this purpose, but this is what works for me.

At this point, we can add the domain to the Azure web site. To do so, we open the Azure admin portal, and open the definition for the web site. Next, we click on the Configure tab, scroll down to the domain names section, and click the “manage domains” button.

image

In the dialog that pops up, enter your custom domain name

image

If your verification alias was properly set up, and all is well, you will receive a green check mark status indicator. If not, it will be a red x, and you will need to fix the problem. It could be a misspelled name or just no verification result. However, if all is well, make note of the IP address for your A records, and click the check button to save the configuration.

Finally, we add an A record to our DNS that points our custom domain newblog1.unlimitedviz.com to the IP address noted just above.

image

That’s it. Taking this approach, we can have multiple, independent WordPress blogs sharing a single Azure subscription and a single MySQL database.

image

This is the current setup for all of the UnlimitedViz blogs. There are, however a couple of caveats, that you should be aware of.

Caveat Emptor

Once you go ahead and allow WordPress to run its configuration wizard, it creates its table in the database. If you remove the web site, the underlying tables remain in the databases. This is either good or bad, depending on your perspective. If your website gets deleted, the data persists, and its simple to connect back to it, just like SharePoint. However, if you need to clean out the database, it’s pretty much impossible.

Thus far, I haven’t found any good way to manage the database directly. However, I haven’t looked very hard, as I haven’t needed to, and that’s a good thing.

The one thing that you should certainly be aware of if you’re going to be doing this to any sort of scale, is that the MySQL database that is created automatically is restricted to 20MB of total size, and that’s a limit that you will run into fairly quickly. I certainly never saw any indication of this limit while I was building the environment, but then, I never read any of the terms and conditions. Really, who does? The good news is that it can be upgraded.

The day after moving our blogs to this platform, I received an email from ClearDB stating that I was near my storage limit, and should consider upgrading. The email didn’t indicate how this could be done, so I navigated to the ClearDB site. Since there was a login button I used it, and entered the email that I use for my Azure subscription. Unfortunately my password didn’t work. Creating a MySQL database creates a ClearDB account, but I have no idea what password it uses. Using the “Lost Password” worked, but I was still  unable to log in. Finally, I logged a support issue with them using the provided support form. Very quickly, my account was enabled for “direct login” which is what was necessary, and allowed me to upgrade the database to a greater capacity. The plan that I opted for was $9.99/month, and allows up to 1 GB of space, which is plenty.

There was one other bump that I had to overcome. I was migrating existing blogs, and to do so, I used the export and import features that are a part of WordPress. The export feature downloads an XML file with all of your blog content. Supporting images are not included (they are downloaded at import time), but the file can still be rather large. The first step in the import process uploads the XML file, and then brings it back into the database. The problem is that by default, the maximum upload size for WordPress is 2MB, and I had two that were larger.

The way to get past this is to increase the maximum upload size. In Azure, this can be done through the addition of a configuration file in the web root. The file needs to be named “.user.ini”.

Open your web site using WebMatrix, right click on the site name, and select “New File”. Select TXT as the type, and name it “.user.ini”. Double click on it to open, and add the following line to it:

upload_max_filesize = 50M

image

After saving, you should be able to upload files up to 50MB. It may be necessary to restart the site through the admin portal.

I’ve been quite liking the performance and the stability of this new setup, and I recommend it highly for this sort of requirement.

6 Comments

Append Multiple Tables in Power Query

Power Query transformations can be very powerful, but they only work on one data source at a time. Sometimes data providers will only provide their data in discrete chunks, like one category per table, or data may come from different providers with the same schema. Ultimately, we want to show these different sources together with different attributes, so that it may all be analyzed simultaneously. Power Query supports this requirement through its “Append” function.

Consider the following scenario. We want to analyze alcohol consumption data. The World Health Organization provides extensive data on this, but it is reported separately for each type of alcohol.

image

(source: Global Health Observatory Data Repository)

There is a source for total, but it does not break the consumption down by type. What we need to do is to append the four (beer, wine, spirits, other) categories together. To start with, we need to query for each type separately. The data is provided by the WHO as a CSV data file, but it’s directly downloadable, so we will use the “From Web” data source (which makes refresh simple and removes a download step). First we open up Excel, click the Power Query tab then click on the “From Web” external data source. We then enter the URL of our first category (beer) and click OK. The query editor window will then be opened.

We don’t need to do much in the way of transformation, just turn the first data row into headers (by clicking on the upper left grid icon). Then, we give the query a name (Beer), and importantly, we deselect the “Load to worksheet” load setting.

image

By default, the “Load to worksheet” option is selected (I’ve griped about this elsewhere), but in this case, we don’t want to load the data into the model OR the worksheet. Why not? We’re going to be using this query as an append source with other queries into a final all encompassing appended query, so there’s no point in incurring the data load or storage overhead of the extra data.

Once complete, we repeat this procedure for the other categories. Each of these queries have the same schema, so no transformations need to be made, but keep in mind that there may be cases where we need to do extra work to make sure that the schemas match. Once all of the category queries have been defined, we are ready to perform the append.

From the Power Query tab, we click on the “Append” button which allows us to select two tables.

image

This will create a new query with the result of the append operation. But wait a minute, we have four tables to merge, and the UI only gives us an option for two. We could append our two other tables together, create another append destination, and then append the two append results together, but that’s very cumbersome, and it certainly doesn’t scale much beyond 4 input sources. The ideal scenario would be to append all four sources in one step. Fortunately, that’s possible with Power Query – it’s just not obvious.

From the query that results from the initial append operation, we can see a formula in the formula editor – Table.Combine({Beer,Wine}).

image

This formula uses Power Query’s “M” language, and the good news is that not only can it be easily edited, the Table.Combine function takes more than 2 arguments. It’s a simple matter to add in our other queries to the function to get a single append function.

image

It should be noted that if the queries have a space in their name, it is necessary to refer to them as #”query name” – i.e. #”beer consumption”, etc.   At this point, we give the resultant query a name, and change the load options to load into the data model. Once loaded, we can import any other supporting data, enhance our model, and start analyzing.

This single append also demonstrates that whether or not a particular feature is supported through the user interface, It may be possible to accomplish the goal through some creative M language work. If you’re interested on some more things that can be done with M, I suggest you check out these examples on Chris Webb’s BI blog.

9 Comments