Skip to content

Tag: Excel

Using Excel With External Data – What’s the Right Tool?

Excel has been used with external data for… well, as long as I’ve been using Excel. So why would anyone bother to write a blog post about this given that the capability is so mature? In recent years, Excel has adopted a number of new, and frankly better mechanisms for working with external data, while retaining the old. Given that there are now multiple tools in Excel for working with external data, it’s not always clear as to which one is the best, and unfortunately there is no single tool that wins over all, although I believe that that will be the case soon.

The answer, as always is, “it depends”. When it depends, the important thing is to understand the strengths and weaknesses of each approach. With that said, let’s have a look at all of the options.

ODC Connections

ODC (Office Data Connections) are the traditional method of accessing data in Excel. You can create or reuse an ODC connection from the Data tab in the Excel ribbon.

When using an ODC connection, you establish a connection with a data source, form some sort of query and import the resultant data directly into the Excel workbook. From there, the data can be manipulated and shaped in order to support whatever the end user is trying to do. The one exception to this behaviour is the connection to SQL Server Analysis Services (SSAS). When a connection is made to SSAS, only the connection is created. No data is returned until an analysis is performed (through a pivot table, chart etc), and then only the query results are retrieved.

When the workbook using an ODC connection is saved, the data is saved within it. In the case of an SSAS connected workbook, the results of the last analysis are saved along with it. For small amounts of data, this is just fine, but any large analysis is bound to quickly run into the data limits in Excel which is 1,048,576 rows by 16,384 columns in Excel 2013. In addition such a file is very large and extremely cumbersome to work with, but even as such, Excel has been the primary tool of choice for business analysts for years.

Data loaded into the workbook can be refreshed on demand, but it can also be altered, shaped, mashed up, and as is too often the case, grow stale. Workbooks such as these have become known as “spreadmarts” and are the scourge of IT and business alike. With these spreadmarts, we have multiple versions of the same data being proliferated, and it becomes harder to discern which data is most accurate/current, not to mention the governance implications.

SharePoint has provided a way to mitigate some of the concerns with these connections. SharePoint itself supports ODC connections, and therefore users can access these workbooks stored within SharePoint and it also allows them to refresh data from the source either on demand or on open. A single point of storage along with a measure of oversight and browser access helps to restore a modicum of sanity to an out of control spreadmart environment, but the core issues remain.

In order to help with the core issues, Microsoft introduced PowerPivot in 2009.

PowerPivot Connections

Created in PowerPivot

PowerPivot was originally (and still is) an add-in to Excel 2010, and is a built in add-in to Excel 2013. PowerPivot allows for the analysis of massive amounts of data within Excel, limited only by the memory available to the user’s machine (assuming a 64 bit version). It does this by highly compressing data in memory using columnar compression. The end result is that literally hundreds of millions of rows of data can be analyzed efficiently from within Excel.

You can see that compression at work by comparing the same data imported into an Excel workbook directly, and into a PowerPivot model with a workbook. The following two files contain election data, and represent the maximum number of rows that Excel can handle directly (1,048,576) and 25 columns.

Getting data into the model was originally (and still can be) a completely separate process from bringing it into Excel. PowerPivot has its own data import mechanism, accessed from the Power Pivot window itself. First, click on the PowerPivot tab in Excel and then click manage. If you don’t have a PowerPivot tab, you will need to enable the add-in. If you don’t have the add-in, you have an earlier version of Excel – you’ll need to download it.

Once the PowerPivot window opens, the “Get External Data” option is on the ribbon.

Once the appropriate data source is selected and configured, data will be loaded directly into the data model – there is no option to import that data into a worksheet. Once the data is in, pivot tables and pivot charts can be added to the workbook that connect to the data model much like when creating an ODC connection to Analysis Services. In fact, it’s pretty much exactly like connecting to Analysis services, except that the AS process is running on the workstation.

Created in Excel

PowerPivot, and more importantly the tabular data model was included in Excel 2013. With that addition, Microsoft added a few features to make the process of getting data into the data model a little easier for users that were a little less tech savvy, and may be uncomfortable working with a separate PowerPivot window. That’s actually part of the thinking in leaving the PowerPivot add-on turned off by default.

When a user creates an ODC connection as outlined above, there are a couple of new options in Excel 2013. First, the “Select Table” dialog has a new checkbox – “Enable selection of multiple tables”.

When this option is selected, more than one table from the data source can be selected simultaneously, but more importantly, the data will automatically be sent to the data model in addition to any other import destinations.

Even if the multiple selection option wasn’t chosen, the next dialog in the import process, “Import Data” also has a new check box – “Add this data to the Data Model”.

Its purpose is pretty self-explanatory. It should be noted that if you choose this option, and also choose “Only Create Connection”, the data will ONLY be added to the model, nowhere else in the workbook. This is functionally equivalent to doing the import from the PowerPivot window, without enabling the add-in.

Power Query Connections

When Power BI was originally announced, Power Query was also announced and included as a component. This was very much a marketing distinction, as Power Query exists in its own right, and does not require a Power BI license to use. It is available as an add-on to both Excel 2010 and 2013, and will be included with Excel 2016.

Power Query brings some Extract, Transform and Load (ETL) muscle to the Excel data acquisition story. Data can be not only imported and filtered, but also transformed with Power Query and its powerful M language. Power Query brings many features to the table, but this article is focused on its use as a data acquisition tool.

To use Power Query, it must first be downloaded and installed. Once installed, it is available from the Power Query tab (Excel 2010 and 2013).

Or from the data tab, New Query (Excel 2016)

Once the desired data source is selected, the query can be edited, or loaded into either the workbook, the data model, or both simultaneously. To load without editing the query, the load option at the bottom of the import dialog is selected.

Selecting “Load To” will allow you to select the destination for the data – the workbook, the model or both. Selecting Load will import the data to the default destination, which is by default the workbook. Given the fact that the workbook is an inefficient destination for data, I always recommend that you change their default settings for Power Query.

To do so, select Options from the Power Query tab (2010 and 2013) or the New Query button (2016), click the Data Load section, and then specify your default settings.

Data Refresh Options

In almost every case when external data is analyzed, it will need to be refreshed on a periodic basis. Within the Excel Client, this is simple enough – click on the data tab, and then the Refresh All button, or refresh a specific connection. This works no matter what method was used to import the data in the first place. Excel data connections can also be configured to refresh automatically every time the workbook is opened, or on a periodic basis in the background.

However, workbooks can also be used in a browser through Office Web Apps and Excel Services (SharePoint and Office 365) or as a data source for Power BI dashboards. In these cases the workbooks need to be refreshed automatically in order that the consuming users will see the most up to data when the workbooks are opened. The tricky part is that not all of the connection types listed above are supported by all of the servers or services. Let’s dive in to what works with what.

SharePoint with Excel Services

Excel Services first shipped with SharePoint 2007, is a part of 2007, 2010, and will be included with 2016. From the beginning, Excel Services allowed browser users to view and interact with Excel workbooks, including workbooks that were connected to back end data. The connection type supported by Excel Services is ODC, and ODC only.

Excel Services has no mechanism for maintaining data refresh. However, the data connection refresh options are supported which means that the workbook can be automatically refreshed when opened, or on a scheduled basis (every xxx minutes in the background). Unfortunately, this can come with a significant performance penalty, and once refreshed it is only in memory. The workbook in the library is not updated. The data in the workbook can only be changed by editing the workbook in the client, refreshing it, and re-saving it

Workbooks with embedded data models (PowerPivot) can be opened in the browser, but any attempt to interact with the model (selecting a filter, slicer, etc) will result in an error unless PowerPivot for SharePoint has been configured.

SharePoint with Excel Services and PowerPivot for SharePoint

PowerPivot for SharePoint is a combination of a SharePoint Service application and Analysis Services SharePoint mode. When installed, it allows workbooks that have embedded PowerPivot data models to be interacted with through a browser. The way that it works is that when such a workbook is initially interacted, the embedded model is automatically “promoted” to the Analysis Services instance, and a connection is made with it, thus allowing the consuming user to work with it in the same manner as with a SSAS connected workbook,

The PowerPivot for SharePoint service application runs on a SharePoint server and allows for individual workbooks to be automatically refreshed on a scheduled basis. The schedule can be no more granular than once per day, but the actual data within the model on disk is updated, along with any Excel visualizations connected to it.

When the refresh process runs, it is the functional equivalent of editing the file in the client, selecting refresh all, and saving it back to the library. However, there is one significant difference. The Excel client will refresh all connection types, but the PowerPivot for SharePoint process does not understand Power Query connections. It can only handle those created through the Excel or PowerPivot interfaces.

Power Pivot for SharePoint ships on SQL Server media, and this limitation is still true as of SQL Server 2014. At the Ignite 2015 conference in Chicago, one of the promised enhancements was Power Query support in the SharePoint 2016 timeframe.

Office 365

Office 365, or more precisely, SharePoint Online supports Excel workbooks with ODC connections and PowerPivot embedded models in a browser. These workbooks can even be refreshed if the data source is online (SQL Azure), but they cannot be refreshed automatically. In addition, only ODC and PowerPivot connections are supported for manual refresh. Power Query connections require Power BI for Office 365. In addition, Office 365 imposes a 30 MB model size limit – beyond that, the Excel client must be used. In short, the Office 365 data refresh options are very limited.

Power BI for Office 365

Power BI for Office brings the ability to automatically refresh workbooks with embedded data models. Data sources can be on premises or in the cloud. On premises refresh is achieved through the use of the Data Management Gateway. It also raises Office 365’s model size limit from 30 MB to 250 MB. With Power BI for Office 365 both manual and automatic refreshes can be performed for both PowerPivot and Power Query connections, however Power Pivot connections are currently restricted to SQL Server and Oracle only.

The automatic refresh of ODC connections is not supported. A workbook must contain a data model in order to be enabled for Power BI.

Power BI Dashboards

Power BI Dashboards is a new service, allowing users to design dashboards without necessarily having Office 365 or even Excel. It is currently in preview form, so anything said here is subject to change. It is fundamentally based on the data model and it works with Excel files as a data source currently, and it is promised to use Excel as a report source as well. The service has the ability to automatically refresh the underlying Excel files on a periodic basis more frequent than daily.

In order for a workbook to be refreshed by Power BI, it must (at present) be stored in a OneDrive or OneDrive for Business container. It also must utilize either a PowerPivot, or a Power Query connection. At present, the data source must also be cloud based (ie SQL Azure) but on premises connectivity has been promised.

SQL Server Analysis Services

Another consideration, while not a platform for workbooks is SQL Server Analysis Services (SSAS). Excel can be used to design and build a data model, and that data model can at any time be imported into SSAS. As of version 2014, SSAS fully supports all connection types for import – ODC, PowerPivot and Power Query. Once a data model has been imported into SSAS, it can be refreshed on a schedule as often as desired, and you can connect to it with Excel, and share it in SharePoint. You can also connect to it in Power BI Dashboards through the SSAS connector. From both a flexibility and power standpoint, this is the best option, but it does require additional resources and complexity.

Refresh Compatibility Summary

For convenience, the table below summarizes the refresh options for the different connection types.

 

ODC

PowerPivot

Power Query

Excel Client

M

M

M

SharePoint/Excel Services

M

SharePoint/Excel Services/PP4SP

M

A

SQL Server Analysis Services Import

A

A

A

Office 365

M

M

Office 365 with Power BI

A*

A

Power BI Dashboards

A

A

M – Manual refresh

A – Both Manual and Automatic Refresh

* only limited data sources

 

The Right Tool

I started out above by saying that the selection of import tool would depend on circumstances, and that is certainly true. However, based on the capabilities and the restrictions of each, I believe that a few rules of thumb can be derived. As always, these will change over time as technology evolves.

  1. Always use the internal Data Model (PowerPivot) when importing data for analysis.

     

  2. Power Query is the future – use it wherever possible

    All of Microsoft’s energies around ETL and data import are going into Power Query. Power Query is core to Power BI, and announcements at the Ignite Conference indicate that Power Query is being added to both SQL Server Integration Services and to SQL Server Reporting Services. Keep in mind that we have been discussing only the data retrieval side of Power Query – it has a full set of ETL capabilities as well, which should also be considered.

  3. PowerPivot or ODC Connections must be used on premises

    PowerPivot for SharePoint does not support Power Query for refresh. This means that you MUST use PowerPivot connections for workbooks with embedded models. If you are already using SSAS, use an ODC connection within Excel.

  4. Power Query or PowerPivot must be used for cloud BI.

    PowerPivot connections will work for a few limited cases, but more Power Query support is being added constantly. Where possible, invest in Power Query

  5. If on-premises, consider importing your models into SSAS

    SSAS already supports Power Query. If, instead of using PowerPivot for SharePoint, Analysts build their models using Excel and Power Query, they can be “promoted” into SSAS. All that is then required is to connect a new workbook to the SSAS server with an ODC connection for end users. The Power Query workbooks can be used in the cloud, and the SSAS connector in Power BI Dashboard can directly use the SSAS models created.

  6. Choose wisely. Changing the connection type often requires rebuilding the data model, which in many cases is no small feat.

In summary, when importing data into Excel, the preferred destination is the tabular model, and to import data into that model, Power Query is the preferred choice. The only exception to this is on premises deployments. In these environments, consideration should be given to connecting to a SSAS server, and failing that, PowerPivot imports are the best option.

21 Comments

Schedule Data Refresh for SSAS Connected Excel Workbooks with PowerPivot for SharePoint

Using Excel Services, SharePoint users have been able to share workbooks that are connected to back end data since SharePoint 2007. Typically, the connection is made to SQL Server, or to Analysis services although a wide variety of sources are available. It’s also possible to publish individual components from these workbooks anywhere within the site collection through the Excel Web Access web part. Users can navigate to a dashboard page that contains all sorts of elements including an Excel chart that is connected to back end data. Well, to be precise, it was connected to back end data, the last time the workbook was saved. The workbook itself can be refreshed, but only manually.

When you open an Excel workbook in a browser through Excel services, by default, you’ll see the visualizations and any stored data in precisely the way that the workbook was when it was last saved. If you need to see more up to date data, you can select “Refresh Connections”. If (and sometimes that’s a big if) the server and connections are set up properly, the server will fetch updated data and update the workbook.

 This works well enough, but the problem is that when you, or anyone else opens the workbook again, they’ll still see the old version of the workbook, and will need to manually refresh the date again. In addition, any visualizations published elsewhere on a dashboard will also continue to show old data unless manually refreshed. If the amount of data is significant, this poses a serious performance issue to the server(s). There’s also a significant usability impact in that it’s a pretty big ask of an end user to have them constantly hitting a refresh button.

To get around this issue, one option is to set the refresh options in the data connections of the workbook. Excel Services respects these options. There are two settings that we need to be aware of, periodic refresh, and refresh on open. Connection properties can be accessed within the Excel client by selecting the Data tab, choosing Connections, then highlighting the connection in question and selecting Properties.

Periodic refresh will allow the workbook to be automatically refreshed in the background while it is opened in the browser. This can be useful when the source data is changing frequently. Refresh on opening will have the greatest impact in our scenario, as it will automatically refresh the data in the workbook whenever the file is opened. This will also work with published objects (Excel Web Access web parts) – every time that the web part is opened, the data will be automatically refreshed. This solves the usability problem above because the user no longer needs to manually update the data. However, it does not affect the server load problem.

Due to the fact that the data and visualizations retain the state that they had when the workbook was last saved, it also affects search. When the search indexer runs, it will only index the data that is saved in the workbook. It has no means of refreshing the data. Finally, in addition to the load imposed on the servers by constant refreshes, if the quantity of data being refreshed is large, users can experience significant lags when loading the file. This obviously introduces another usability option. While the refresh options in Excel are helpful, they don’t fully solve the problem. What is needed is a way to automatically open the file for editing, refresh the data, and resave it to SharePoint.

If you have ever used Power Pivot for SharePoint, you know that it can do exactly that. Power Pivot for SharePoint contains two primary elements – a specialized instance of SQL Server Analysis Services that allows users to interact with workbooks that contain embedded PowerPivot models, and a SharePoint service application that among other things, keeps those embedded models refreshed. Using the PowerPivot Gallery (enabled when PowerPivot for SharePoint is installed), you can configure a workbook’s refresh options by clicking on the icon in the Gallery view, or by selecting “Manage PowerPivot Data Refresh” in the simple All Documents view.

 Data Refresh options in PowerPivot Gallery View

 Data Refresh options in All Documents View

Once configured, the PowerPivot for SharePoint Service will refresh the data model in the workbook on a periodic basis (no more than once per day). The service essentially opens the workbook in edit mode, refreshes all of the data connections, and saves the workbook back to the library. If versioning is enabled, it will be saved as a new version. Unfortunately, if you’re not using a PowerPivot data model, the options are unavailable. In Gallery view, the icons are simply unavailable, and while the option is available in the All Documents view, selecting it results in an error.

On the surface, it would seem that using workbooks with PowerPivot is the only option for keeping large volumes of back-end data up to date in Excel visualizations. However, there is a small loophole that you can take advantage of.

The refresh function in PowerPivot for SharePoint refreshes all of the connections in a workbook. While this option is unavailable if the workbook has no embedded PowerPivot model, when it does, it refreshes ALL of the data connections in the workbook, whether they connect to a model, a back end SSAS server, SQL server or whatever. So therefore, if you want to keep your connected data refreshed, the solution is to add a dummy PowerPivot model to your workbook.

Simply open up the PowerPivot window, import some small amount of data from an external source, and save it. Once saved, the PowerPivot refresh options will appear, and you’ll be able to schedule data refresh for your workbook. You can even deselect the refresh of the source data for your dummy model, and the other connections will work just fine.

Once your workbooks are being updated automatically, your users will be presented with up-to date data on load with no delays, all dashboard visualizations will be up to date and quick to render, and the visible data will be picked up by your search crawler. All will be well with the world.

3 Comments

Set Your Data Load Defaults in Power Query

When using Power Query to load data from a data source, the user has three choices as to where that data can be stored. It can go into a worksheet in the workbook, into the data model, or into both. The default behaviour (and therefore the one that will be used most commonly) is to load into a worksheet when one table is being imported, or into the data model when multiple tables are selected.

The reason for the load to worksheet default is understandable, as casual users will expect to see the data immediately after it is imported – they won’t want to open the PowerPivot model editor (which is itself turned off by default in Excel 2013). However, loading to the worksheet introduces quite a few limitations, as I’ve described (OK, whined about) mostly here, but also here and here. These limitations mostly revolve around the million row limit in Excel, the fact that data is stored in the model much more efficiently, and the file size limits when working with Power BI.

The great news is, that as of the May 2014 build of Power Query . These data loading defaults are now user configurable. If you’re taking the time to read this post, chances are that you should set them.

The defaults are Power Query wide, and are therefore located in the Power Query options. Once Excel is open,  go to the Power Query tab and click the Options button.

image

Now, in addition to joining the Customer Experience Program, you can set your data load settings. I always import only to the model, so I check the custom option, and Load to Data Model.

image

I can’t be completely sure, but in my limited testing, it appears that this option setting does not sync between instances of Excel on different machines the way that other (non add in) settings do. You will therefore need to set this for every Excel installation that you have. Now if there was only a Group Policy setting that could be pushed out….

Kudos to the Power Query team for implementing this – it’s a welcome addition to the product.

1 Comment

Changes to the Default Data Loading Options in Power Query

This past week, the March 2014 update for Power Query became available. As always, I wasted no time in getting it installed. There are a number of significant new features in this release (all outlined at the above link), but one that caught my attention is the new “Selection Well” for multi table import.

The Selection Well allows you to not only see all of the items that have been selected for import, but also allows each query to be edited before the query is executed. This is a significant enhancement to the previous behaviour in which the queries would be executed once, and only then could be edited.

However, the thing that jumped out at me most was the change to the default load settings. “Load to worksheet” is no longer the default, but “Load to Data Model” is.

image

Hallelujah! As I’ve pointed out before, I strongly feel that this should be the default given the file size limitations for Excel workbooks in Power BI (still 10 MB outside of the model).  This, on the surface is a very welcome development. However, as it turns out, it’s not quite that simple.

This new default only takes effect when the “Select multiple items” option is selected in the query navigator. (For the record, it is possible to select this option, and then select only one item, and thus take advantage of the new default, but if you need to think about a default, it sort of defeats the purpose). If only one item is selected, then edited, the default is still “Load to worksheet” which I maintain is the wrong default, for all of the reasons outlined in my earlier article.

image

I presume that the thinking here is that casual users will expect to see their imported data immediately in Excel, but a user is using multiple tables, then they will understand that they need to use a data model, and will want to create relationships. The fear is that casual users will be confused if they don’t immediately see their data in the workbook.

This behaviour is also similar to that found in the import data dialog. If you choose to select multiple tables:

image

The data will automatically be added to the data model, whether or not Power Pivot has been enabled. The final screen shows that, and unless the “Table” view is selected, the data will be imported only into the model, not the workbook.

image

I applaud the move within Power Query, but I still feel that load to model should be the default in all cases. The current behaviour is inconsistent. Sometimes it is the default, and sometimes it isn’t, and if you’re not watching, you’ll wind up with the wrong behaviour. I also feel that we should be encouraging Excel’s use as a data client, and discouraging its use as a data base. Leaving the import to worksheet as a default further exacerbated the use of Excel as a “spreadmart” tool.

Friends don’t let friends Load to worksheet.

1 Comment

Problems Manually Refreshing Power BI Enabled Workbooks

Office 365 (without Power BI) has supported data refresh for Excel and PowerPivot for some time, and it works well provided that the data source is both in the cloud, and is one of the supported data sources. To refresh a workbook, simply open the workbook in the Excel Web App, open the data menu and select “Refresh All Connections”.

image

Up until just recently, this was how it worked with Power BI workbooks as well, with the additional ability of being able to refresh on premises data through the Data Management Gateway. However, with the latest refresh of the Power BI application, and its support of scheduled refresh, this has changed. Now, if you follow this procedure and attempt to manually refresh a Power BI enabled workbook from an on premises data source, you will receive an error.

OnPremise error: We were not able to refresh the data connections. On-premise data ources canonly be refreshed vi scheduled refresh in Power BI for Office 365

The error is pretty self-explanatory, so I won’t try to explain it. Scheduled data refresh hasn’t just been added, it has replaced the old refresh method. That’s all well and good, but what about those use cases where we want to manually refresh data? The good news is that it hasn’t been lost, it’s just been moved. It is, however, well hidden.

To refresh the workbook on demand, you must first open the Power BI application, locate your workbook, and click on its ellipsis to open its context menu.

Schedule Data Refresh

Next, you need to select “Schedule Data Refresh”. Now I know that we don’t want to schedule the refresh, but to update it on demand, so you’ll just need to trust me here. Selecting “Schedule Data Refresh” will open the scheduling interface into either the history tab (if the workbook has already been scheduled) or the settings tab (if it has not). In any event, you’ll need to be in the settings tab.

If the workbook has not already been enabled for scheduled refresh, it will need to be. Once it has, the “save and refresh report” button will be available. If it has already been saved, the button will read “refresh report now”. In either case, clicking on it will start the refresh process immediately.

On-demand refresh is still available, but I have to say that it’s well hidden. The fact that it has moved into the Power BI application means that a Power BI license will be required to refresh it on demand, which seems quite reasonable to me. However, some better visual cues would be a big help. For example, why not add “refresh now” to the context menu in the Power BI application?

In addition, given that the refresh is being initiated manually, some visual cues around the status of the refresh (started, in progress, completed) would help considerably. As it stands, the only status information is available after the refresh completes, on the history tab of “Schedule Data Refresh”.

There has also been another subtle change around how workbooks are displayed in Power BI. When a workbook is opened from the source Office 365 library, the standard Excel Web App interface is displayed, with options for opening in Excel, editing, etc. displayed.

image

However, if you first navigate to the Power BI application, and open the workbook by clicking on the thumbnail, it will open in the browser but without the Web App chrome.

image

I’m not sure what the reasoning is for this different behaviour, but it’s a change, and something that you should be aware of. UPDATE 14/2/14 – It has been explained to me that the reason for this different behaviour is an effort to reduce screen clutter for those using the Power BI application. It’s a consumption mostly application, so this change makes sense in that context. It’s also possible to add the Excel Web app chrome back in by using a new “action bar” (my name). If you look to the bottom of the worksheet window, you’ll see it, and its three icons.

image

The three icons, from left, allow you to submit feedback to Microsoft, to get embed codes for the report (a new feature!) and finally, to restore the standard Web App chrome (for editing etc.)

2 Comments