
The White Pages Posts

Understanding Storage Costs for Microsoft Fabric

Licensing is always an important topic for any technology, and it often drives architectural decisions. In many cases, licensing models can be difficult to understand, particularly since vendors tend not to dwell on discussions of cost. My last post on Fabric licensing was an attempt to clarify some of the “squishy” language around licensing for the recently introduced Microsoft Fabric.

That post centers on the cost of Fabric compute, but there is another licensing component to Fabric – storage. With the GA of Fabric last week at Microsoft Ignite, the time is approaching when customers will be charged for storage, so it’s equally important to understand how that works as well. This recently became obvious to me as a participant in an online discussion with MVPs and product team members about this very topic – and if we’re not clear on it, chances are that another clarification post is required.

This discussion was started by Tristan Malherbe (@Datatouille), and he did most of the clarifying work – I am just posting here for accessibility. Thanks Tristan!

As mentioned above, there are two components to Fabric costs: the cost of compute and the cost of data storage. The cost of compute is a function of the capacities that are used (Microsoft Fabric – Pricing | Microsoft Azure). Due to the way Fabric stores data, storage is quite cheap, especially compared to the cost of storing data in transactional database systems (i.e., Azure SQL). It is in line with general blob storage in an Azure storage account, at a list price of approximately $0.025/GB/month. Simple enough, right?

This storage price applies to all Fabric artifacts that store data, with one exception. Power BI artifacts are exempt from additional storage costs up to 100 TB of storage per capacity. Any storage beyond that amount is subject to the ~$0.025/GB/month charge. The question, therefore, is: what exactly is a Power BI artifact?

Qualifying Power BI artifacts are:

  • semantic models (datasets)
  • reports
  • dashboards
  • dataflows (Gen1)
  • datamarts
  • paginated reports.

It should be noted, however, that when the recently introduced OneLake integration for import-mode semantic models is used, the data cache is stored in OneLake, NOT in the semantic model itself. It is therefore subject to the Fabric storage pricing, but it will not count against your 100 TB storage quota for Power BI storage.

In summary,

  1. ALL Fabric SKUs include 100 TB storage for Power BI artifacts
  2. A Power BI artifact is any of the following items: datasets, reports, dashboards, dataflows (Gen1), datamarts, paginated reports
  3. Any other Fabric artifacts (lakehouse, warehouse etc.) are charged ~ $0.025/GB/month for storage
  4. Semantic models using the “OneLake integration for Import-mode semantic models” are charged ~ $0.025/GB/month for storage
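To put that rate in perspective, here is a rough back-of-the-envelope calculation using the approximate list price above: storing 1 TB of data in a lakehouse or warehouse works out to about 1,024 GB × $0.025/GB/month ≈ $25.60 per month, or roughly $307 per year, before any regional or negotiated pricing differences.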

Hopefully this helps.


Understanding licensing for Microsoft Fabric from a Power BI perspective

Shortly after Premium capacities were announced for Power BI, I published a post titled Understanding the Power BI Capacity Based SKUs to help the Power BI community understand how this new license model worked. With the recent announcement of Microsoft Fabric, that model got a little more complex. Admittedly, it’s not the same radical departure that Power BI Premium was initially – in fact, it’s based on the same model, but there are several nuances that are worth understanding.

Adam Saxton recently addressed Fabric licensing on Guy in a Cube here – Understanding Microsoft Fabric Licensing and Cost (Public Preview) – and I highly recommend watching it. My take on the differences is below.

For those familiar with Power BI user and capacity licenses, the easiest way to understand the differences brought by Microsoft Fabric is to think of all of the new Fabric features as new Premium features in Power BI. Fabric also introduces a new type of capacity, the F sku, in addition to (for the moment at least) the P and A skus currently available to Power BI. For those less familiar, and for more detail, a more comprehensive discussion is warranted.

User vs Capacity based licenses

Power BI has two different licensing models: user based and capacity based. Power BI is a component of Fabric, but all other Fabric assets use only capacity-based licensing. Therefore, within the context of Fabric, the Power BI license only pertains to Power BI assets. There are three user-based licenses for Power BI: Free, Pro, and Premium per user. A user’s license determines which workspaces the user can access.

A workspace contains assets created by users and provides a logical security boundary. Workspaces are also bound to a capacity, and that binding determines which capacity is used when these assets are utilized.

A capacity is a collection of resources that are consumed when various assets are utilized. For example, when a user renders a report, or a dataset refresh is run, resources are consumed from the workspace’s underlying capacity. There are several capacity types and sizes, and the type of capacity determines the capabilities that are available to the workspace.

Capacity Types

There are five capacity types commonly in use with workspaces today. These are:

  • Shared (or Pro)
  • Premium per User
  • Premium A sku
  • Premium P sku
  • Fabric F sku

Shared capacities (also referred to as Pro capacities) are the default capacity that any new Power BI workspace is backed by. Shared capacities are for Power BI only and are provided by Microsoft at no extra cost. Shared capacities do not provide Premium or Fabric features and impose certain operating limits. To access a workspace that is backed by a shared capacity, a user must have a Power BI Pro license.

Premium per User capacities are available to users that have a Premium per User license. PPU capacities are for Power BI only, are provided by Microsoft at no extra cost, and they provide most Premium features, but not Fabric features. To access a workspace that is backed by a Premium per User capacity, a user must have a Power BI Premium per user license.

All other capacity types are purchased for the tenant by an organization. There is a wide variety of options to choose from. To utilize Fabric features, the capacity must be either a P or an F sku. P skus have been with us since Premium was initially introduced, and F skus have been introduced with Fabric. Functionally there is no difference between the two, apart from how they are purchased, which is covered below. A complete summary of capacity features and resources can be seen below.

Capacity Features

Feature | Shared | Premium A | PPU | Premium P | Fabric F
Dataset size limit | 1 GB | 3-400 GB | 100 GB | 25-400 GB | 1-800 GB
Refreshes per day | 8 | 48* | 48* | 48* | 48**
XMLA Endpoint | Read | Read/write | Read/write | Read/write | Read/write
Dataflows | Basic | Enhanced | Enhanced | Gen 2 | Gen 2
Automatic Page Refresh | 0.5 hr min | 1s min / change detect | 1s min / change detect | 1s min / change detect | 1s min / change detect
Deployment Pipelines | No | Yes | Yes | Yes | Yes
Datamarts | No | Yes | Yes | Yes | Yes
Hybrid Tables | No | Yes | Yes | Yes | Yes
Metrics | Yes | Yes | Yes | Yes | Yes
AI Capabilities (Auto ML, CS) | No | Yes | Yes | Yes | Yes
Automatic Aggregations | No | Yes | Yes | Yes | Yes
Multi-geo support | No | Yes | No | Yes | Yes
Other Fabric assets | No | No | No | Yes | Yes
*  Limited by the UI. Unlimited refreshes available programmatically
** Limited by the UI. Unlimited refreshes available programmatically. Direct Lake mode datasets will not require refresh.

Capacity Resources by sku

Premium A | Premium P | Fabric F | Capacity Units (CU) | V-Cores | Max Dataset size (GB)
- | - | F2 | 2 | 0.25 | 1
- | - | F4 | 4 | 0.5 | 2
A1 | - | F8 | 8 | 1 | 3
A2 | - | F16 | 16 | 2 | 5
A3 | - | F32 | 32 | 4 | 10
A4 | P1 | F64 | 64 | 8 | 25
A5 | P2 | F128 | 128 | 16 | 50
A6 | P3 | F256 | 256 | 32 | 100
A7 | P4 | F512 | 512 | 64 | 200
A8 | P5 | F1024 | 1024 | 128 | 400
- | - | F2048 | 2048 | 256 | 400

As you can see in the above chart, the A4, P1, and F64 skus all have the same resource level. Therefore, an organization could purchase an F64 capacity, disable Fabric features, and have exactly the same experience as with the P1 sku. The only difference between them is the way that they are acquired, which is discussed below.

User Access

In addition to features and resources, the capacity that backs a workspace also determines the user license that is required to consume Power BI resources contained within it. It is worth noting here that no matter what capacity backs a workspace, a Power BI Pro license is required to publish reports to it.

The license requirements for the various capacity types can be seen in the table below. Again, this pertains to Power BI artifacts only – all other Fabric artifacts require a P or F capacity and can be accessed by users without a Power BI license.

Workspace access by Power BI License

Client license | Personal | Shared | Premium A SKU | Premium Per User | Premium P SKU | Fabric
Free | ✔ | X | ✔ | X | ✔ | ✔*
Power BI Pro | ✔ | ✔ | ✔ | X | ✔ | ✔
Premium per user | ✔ | ✔ | ✔ | ✔ | ✔ | ✔
*Fabric capacity must be F64 or above for access with a free license. Below F64 requires Power BI Pro.

License Acquisition

User licenses are relatively straightforward. Free licenses are available to any Azure Active Directory account. Power BI Pro licenses are part of Microsoft 365 E5 licenses, and can also be purchased separately from the Microsoft 365 store. Premium per User licenses include Power BI Pro, and can also be purchased separately, or as an add-on to an existing Pro license from the Microsoft 365 store.

Capacity acquisition is significantly more complicated. Indeed, the way that they are acquired can often play a role in selecting the appropriate capacity type for a given scenario.

The Premium P sku is purchased from the Microsoft 365 store, and it requires an annual commitment. That is enough to make many smaller organizations pause before trying it out. The Premium A sku is purchased from Azure, can be turned on and off, and is billed hourly. It also has a significantly lower entry point. The A1 sku has only 1/8 the resources of a P1 sku and is significantly cheaper to get started with. If Power BI features are the only thing of interest, then the A sku presents a compelling choice, but it does not support Fabric features.

The new Fabric skus appear to bridge the gap between the P and A skus. Fabric skus are available in a much wider variety of resource levels (F2 to F2048) which makes them approachable for organizations that want to get started or have less demanding requirements. They can be scaled up past the maximum currently available in P5. Finally, they are purchased from Microsoft Azure and do not require an annual commitment (using pay-as-you-go). Pricing was recently announced and can be found at the Microsoft Fabric blog here – Announcing Microsoft Fabric capacities are available for purchase.

As mentioned above, a P1 is functionally equivalent to an F64. However, the price of an F64 in the blog post is approximately 40% higher than that of a P1. This is because the P1 requires an annual commitment and is effectively discounted by that very same 40%. The Fabric skus will also have an option to opt in for an annual commitment, and to then enjoy a significant discount, presumably making the cost of the P1 and the F64 equivalent. The details of this have not yet been announced, so your mileage may vary.
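As a purely hypothetical illustration (these are placeholder numbers, not actual prices): if a P1 with its annual commitment were $5,000 per month, a pay-as-you-go F64 at roughly 40% more would land around $7,000 per month, and an F64 with the eventual annual commitment option would presumably come back down to somewhere near the $5,000 mark.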

Trial Licenses

In order to “try before you buy”, a 60-day trial license is available. Details of the trial can be found here – Fabric (preview) trial. The trial will grant the requesting user a Premium-per-user license along with an F64 capacity. It’s a preview of all Power BI capabilities, and the capacity allows for the testing of all Fabric capabilities.

One word of warning with trials. If a user with a free license accesses Power BI assets that they have rights to, but are unlicensed for, a free trial will automatically be granted, and the clock will start ticking on a 60-day trial. This grants the user not only a Premium per User license, but also a Fabric capacity. This does provide a seamless experience for the user but may come as a shock when the 60-day period is up.

Conclusion

While the existing Power BI Premium skus will continue to exist for those that want them, the Fabric skus are clearly the way forward. They provide all the same features as the legacy capacities, with increased sizing options on both the lower and higher end of resource requirements. The option to pay as you go or to take on an annual commitment means that they can be both approachable and cost-effective. Finally, if all you need is Power BI features, you can turn off Fabric and still enjoy these flexibility benefits.


Continuous export for Azure Application Insights using Azure Data Explorer (Kusto)

If you use Azure, chances are that you’ve used Application Insights. Application Insights collects telemetry data for web applications and allows that telemetry to be queried, analyzed, or used to alert on anomalies. It’s backed by the Kusto engine, which makes it possible to query and aggregate substantial amounts of data very quickly. It is also relatively inexpensive. Depending on the nature of your data, however, you may find yourself bumping into one of its limitations. In most cases these limitations can be overcome by “upgrading” to AI’s big brother, Azure Data Explorer (ADX), which is more commonly known as Kusto.

This article outlines a pattern for continuously streaming data from Application Insights data to Kusto.

Application Insights Limitations

Retention – Data in Application Insights is retained for 90 days at no charge and can be retained beyond that for a cost of $0.155 per GB per month. While the price is reasonable, there is a hard cap on retention of 2 years. Data older than 2 years is purged, so if longer retention is needed, another solution is required.

Backup – At present, there is no way to backup and restore Application Insights data. It is certainly possible to export this data to a variety of media, but there is no way to restore it.

Data limits – Application Insights can struggle when a large amount of data is requested. It is not possible in any circumstance to query more than 500,000 rows or 64 MB of compressed data. It is possible to implement paged queries to work around this limitation, but this can be problematic. Query timeouts are also limited to 100 seconds, and unlike the underlying Kusto engine itself, these limits are absolute.

Scale – Application Insights is a “one size fits all” service. It cannot be scaled either up or down. It is therefore not possible to overcome issues with query performance or service limits by adding power.

Schema – At present, Application Insights collects data into 10 different tables. The schema of these tables is fixed and cannot be changed. It is possible to use custom data in these tables, in fact many have columns of dynamic type for just this purpose. JSON data can be added into these columns and queried by the engine. This makes Application Insights highly flexible.

The downside of this flexibility is performance. Querying custom data requires the engine to parse data at runtime. The engine is incredibly efficient at doing this, but it cannot compare to more structured columns, particularly when querying massive amounts of data. The fixed nature of Application Insights precludes other approaches for improving query performance like materialized views, etc.

Moving to Kusto

If any of these limitations are an issue, you may wish to consider moving your Application Insights data into Azure Data Explorer, otherwise known as “Kusto”. Kusto is the engine behind all of Azure Monitor (which includes Application Insights and Log Analytics), and it employs the same query language.

When you use your own Kusto cluster, you have complete control over your data. A Kusto cluster contains one or more nodes and can be scaled automatically. Specifically, it solves the limitations inherent to Application Insights while maintaining familiarity with the same data types and query language (KQL). It addresses the AI limits in the following ways:

Retention – Kusto has advanced data retention and caching settings that can be set at both the database level and the table level. Retention can be set to unlimited if necessary.

Backup – Kusto can define external tables that are backed by Azure storage accounts or by SQL tables. Continuous export can be added to any Kusto table so that the externalized data is always up to date. Data can be restored from these externalized sources, or by reingesting directly from them. Alternatively, AI data can be simultaneously streamed into Azure storage accounts, and this data can be ingested into Kusto for restoration.
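As a minimal sketch of the continuous export piece (assuming an external table named PageViews_External has already been created over a storage container, and borrowing the pages_Staging_PageViews table created later in this article), the definition might look something like this:

.create-or-alter continuous-export PageViewsBackupExport
over (pages_Staging_PageViews)
to table PageViews_External
with (intervalBetweenRuns=1h)
<| pages_Staging_PageViews // export newly ingested rows roughly every hour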

Data limits – The default query limits in Kusto are the same as those found in Application Insights, but here they are soft limits. They can be overridden, and asynchronous operations can be used to circumvent them when necessary. In most cases however, by using data optimization strategies available to Kusto, these limits should be less important.
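For example, result-set truncation can be relaxed for an individual query with a client request property. A minimal sketch, using the pages_Staging_PageViews table created later in this article:

// lift the default 500,000 row / 64 MB result truncation for this query only
set notruncation;
pages_Staging_PageViews
| where TimeGenerated > ago(365d)
| project TimeGenerated, Name, Url, DurationMs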

Scale – Kusto clusters can be as small as 1 node (for development – a single node cluster has no SLA), and as large as 1,000. Each node can be as small as 2 CPUs/14 GB RAM, and as large as 32 CPUs/128 GB RAM. There is no limit to the quantity of data that can be ingested.

Schema – This is where Kusto really shines. Data can be transformed natively at the time of ingestion using update policies. Custom Application Insights data can be extracted from the dynamic columns into more structured tables. This has the benefit of greatly enhancing performance. In addition, materialized views can be created to further enhance query performance, create pre-aggregated query targets, etc. These strategies can greatly improve query performance.
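As one hedged example of that kind of pre-aggregation (the view name is arbitrary, and the source is the pages_Staging_PageViews staging table created later in this article), a materialized view of daily page-view counts might look like this:

.create materialized-view PageViewsDaily on table pages_Staging_PageViews
{
    // maintained incrementally by the engine as new page views arrive
    pages_Staging_PageViews
    | summarize Views = count(), AvgDurationMs = avg(DurationMs) by Name, bin(TimeGenerated, 1d)
}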

By streaming Application Insights data into Kusto, you can continue to take advantage of the rich data collection capabilities of Application Insights, without being constrained by its storage limitations. In this scenario, AI acts as your telemetry collector, and Kusto your storage engine. The remainder of this article outlines how to do this.

Setting it all up

In our scenario, we are collecting data from 3 Application Insights tables: pageViews, customMetrics, and customEvents. To capture this data, we will use Diagnostic Settings, which transforms the table names to AppPageViews, AppMetrics, and AppEvents respectively. The entire process is shown below for reference:

Azure Monitor collects data from Application Insights as it arrives through Diagnostic settings. The data is then sent to an Azure Event Hub, as well as to an Azure Data Lake Gen2 (ADLG2) account for long-term storage and recoverability. Azure Data Explorer (Kusto) ingests data directly from the Event Hub in near real time. Event Hub data is transformed and delivered to three staging tables through update policies and functions. In addition, external tables are connected to three containers in the storage account for diagnostic purposes or re-ingestion on demand.

Create an Event Hub and (optionally) a storage account

Data will be streamed continuously to an Event Hub and to an Azure Data Lake Gen 2 (ADLG2) account.

The Application Insights instance, the ADLG2 account, and the Event Hub namespace must all exist within the same Azure region. This is a limitation of the Azure Monitor service. The Kusto cluster can exist anywhere.

When creating the storage account, be sure to select the option for “Enable hierarchical namespace” from the Advanced page. This is what distinguishes an ordinary storage account from an ADLG2 account.

Configure Application Insights diagnostic settings

Many Azure services can stream usage data through their “Diagnostic Settings” option. In the case of Application Insights, all the collected data can be streamed. It should be noted, however, that the table names do not match those within the Application Insights logs; they are the same as those found in the Log Analytics workspace that backs the AI instance. In the example below, we are collecting data from the AppEvents, AppMetrics, and AppPageViews tables (customEvents, customMetrics, and pageViews in AI).

In this case we are sending data to an Event Hub and to an ADLG2 storage account. Each table will store its data in a separate container, and it is not possible to change that container.

Create the Kusto ingestion table and set up ingestion

The data stream to the Event Hub contains records from three different tables with different schemas. To accommodate this, we will create a temporary holding table, set up a policy to automatically distribute data in this table to three tables with different schemas, and then a retention policy to purge the holding table after distribution.

The holding table to receive Event Hub data will be named Pages_EventHub, and can be created from a Kusto query window using the following command:

.create table Pages_EventHub (records: dynamic)

This will create a table with one column named records which is of the dynamic data type. Event Hub data will land here.

Next, we create an ingestion mapping to match the incoming Event Hub JSON data to the holding table. This can be done from a query window using the following command:

.create table Pages_EventHub ingestion json mapping "RawRecordsMapping"
'['
' {"column": "records", "Properties":{"Path":"$.records"}}'
']'

When we define an ingestion, we will refer to this mapping by the name RawRecordsMapping. This mapping is a property of the holding table, and it will return the records path from the incoming JSON data and place it in the records column of the Pages_EventHub table.

Next, we define the data connection for the ingestion. To define a connection, navigate to your Kusto cluster, open the Databases node, and then open the database that will receive the data. Finally, select Data connections, then Add data connection, and select Event Hub.

Give the connection a name, select the Event Hub namespace and hub, the default consumer group, and no compression. Use the table name and mapping created above and select JSON as the data format. When finished, save the data connection.

If data is flowing into the Event Hub, it should begin to appear in the ingestion table within a few minutes; a typical time lag is about 5 minutes.
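A couple of quick sanity-check queries (run individually) against the holding table will confirm that records are arriving:

// how many Event Hub records have been ingested so far
Pages_EventHub
| count

// peek at a few of the raw payloads
Pages_EventHub
| take 5

Once confirmed, it’s time to create the destination tables and update policies.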

Create destination tables and update policies

We want to take data from the Event Hub and “reconstitute” it in Kusto. To do that, we will closely copy the data structure from the Log Analytics workspace that is connected to our Application Insights instance, leaving out some unnecessary system data. In our case, we will create three tables using the following Kusto commands (one at a time) in the Query window:

.create table pages_Staging_PageViews (TenantId: string, TimeGenerated: datetime, Id: string, Name: string, Url: string, DurationMs: real, PerformanceBucket: string, Properties: dynamic, Measurements: dynamic, 
OperationName: string, OperationId: string, ParentId: string, SyntheticSource: string, SessionId: string, UserId: string, UserAuthenticatedId: string, UserAccountId: string, AppVersion: string, AppRoleName: string, AppRoleInstance: string, ClientType: string, ClientModel: string, ClientOS: string, ClientIP: string, ClientCity: string, ClientStateOrProvince: string, ClientCountryOrRegion: string, ClientBrowser: string, ResourceGUID: string) 

.create table pages_Staging_Events (TenantId: string, TimeGenerated: datetime, Name: string, Properties: dynamic, Measurements: dynamic, OperationName: string, OperationId: string, ParentId: string, SyntheticSource: string, SessionId: string, UserId: string, UserAuthenticatedId: string, UserAccountId: string, AppVersion: string, AppRoleName: string, AppRoleInstance: string, ClientType: string, ClientModel: string, ClientOS: string, ClientIP: string, ClientCity: string, ClientStateOrProvince: string, ClientCountryOrRegion: string, ClientBrowser: string, ResourceGUID: string) 

.create table pages_Staging_Metrics (TenantId: string, TimeGenerated: datetime, Name: string, ItemCount: int, Sum: real, Min: real, Max: real, Properties: dynamic, OperationName: string, OperationId: string, ParentId: string, SyntheticSource: string, SessionId: string, UserId: string, UserAuthenticatedId: string, UserAccountId: string, AppVersion: string, AppRoleName: string, AppRoleInstance: string, ClientType: string, ClientModel: string, ClientOS: string, ClientIP: string, ClientCity: string, ClientStateOrProvince: string, ClientCountryOrRegion: string, ClientBrowser: string, ResourceGUID: string)

Next, we construct queries that will fit the schemas of these three tables and filter the results for the appropriate type. These queries will then be used to create Kusto functions for each of the three tables. The commands to create the three functions, which contain our queries, can be found below.

.create-or-alter function fn_Pages_PageViewsIngest {
Pages_EventHub
| mv-expand records
| where records.Type == "AppPageViews"
| project 
TenantId = tostring(records.Properties.TenantId),
TimeGenerated = todatetime(records.['time']),
Id = tostring(records.Id),
Name = tostring(records.Name),
Url = tostring(records.Url),
DurationMs = toreal(records.DurationMs),
PerformanceBucket = tostring(records.PerformanceBucket),
Properties = todynamic(records.Properties),
Measurements = todynamic(records.Measurements),
OperationName = tostring(records.OperationName),
OperationId = tostring(records.OperationId),
ParentId = tostring(records.ParentId),
SyntheticSource = tostring(records.SyntheticSource),
SessionId = tostring(records.SessionId),
UserId = tostring(records.UserId),
UserAuthenticatedId = tostring(records.UserAuthenticatedId),
UserAccountId = tostring(records.UserAccountId),
AppVersion = tostring(records.AppVersion),
AppRoleName = tostring(records.AppRoleName),
AppRoleInstance = tostring(records.AppRoleInstance),
ClientType = tostring(records.ClientType),
ClientModel  = tostring(records.ClientModel), 
ClientOS  = tostring(records.ClientOS), 
ClientIP  = tostring(records.ClientIP),
ClientCity  = tostring(records.ClientCity), 
ClientStateOrProvince  = tostring(records.ClientStateOrProvince), 
ClientCountryOrRegion  = tostring(records.ClientCountryOrRegion), 
ClientBrowser  = tostring(records.ClientBrowser), 
ResourceGUID  = tostring(records.ResourceGUID)
}

.create-or-alter function fn_Pages_EventsIngest {
Pages_EventHub
| mv-expand records
| where records.Type == "AppEvents"
| project 
TenantId = tostring(records.Properties.TenantId),
TimeGenerated = todatetime(records.['time']),
Name = tostring(records.Name),
Properties = todynamic(records.Properties),
Measurements = todynamic(records.Measurements),
OperationName = tostring(records.OperationName),
OperationId = tostring(records.OperationId),
ParentId = tostring(records.ParentId),
SyntheticSource = tostring(records.SyntheticSource),
SessionId = tostring(records.SessionId),
UserId = tostring(records.UserId),
UserAuthenticatedId = tostring(records.UserAuthenticatedId),
UserAccountId = tostring(records.UserAccountId),
AppVersion = tostring(records.AppVersion),
AppRoleName = tostring(records.AppRoleName),
AppRoleInstance = tostring(records.AppRoleInstance),
ClientType = tostring(records.ClientType),
ClientModel  = tostring(records.ClientModel), 
ClientOS  = tostring(records.ClientOS), 
ClientIP  = tostring(records.ClientIP),
ClientCity  = tostring(records.ClientCity), 
ClientStateOrProvince  = tostring(records.ClientStateOrProvince), 
ClientCountryOrRegion  = tostring(records.ClientCountryOrRegion), 
ClientBrowser  = tostring(records.ClientBrowser), 
ResourceGUID  = tostring(records.ResourceGUID)
}

.create-or-alter function fn_Pages_MetricsIngest {
Pages_EventHub
| mv-expand records
| where records.Type == "AppMetrics"
| project 
TenantId = tostring(records.Properties.TenantId),
TimeGenerated = todatetime(records.['time']),
Name = tostring(records.Name),
ItemCount = toint(records.ItemCount),
Sum = toreal(records.sum),
Min = toreal(records.Min),
Max = toreal(records.Max),
Properties = todynamic(records.Properties),
OperationName = tostring(records.OperationName),
OperationId = tostring(records.OperationId),
ParentId = tostring(records.ParentId),
SyntheticSource = tostring(records.SyntheticSource),
SessionId = tostring(records.SessionId),
UserId = tostring(records.UserId),
UserAuthenticatedId = tostring(records.UserAuthenticatedId),
UserAccountId = tostring(records.UserAccountId),
AppVersion = tostring(records.AppVersion),
AppRoleName = tostring(records.AppRoleName),
AppRoleInstance = tostring(records.AppRoleInstance),
ClientType = tostring(records.ClientType),
ClientModel  = tostring(records.ClientModel), 
ClientOS  = tostring(records.ClientOS), 
ClientIP  = tostring(records.ClientIP),
ClientCity  = tostring(records.ClientCity), 
ClientStateOrProvince  = tostring(records.ClientStateOrProvince), 
ClientCountryOrRegion  = tostring(records.ClientCountryOrRegion), 
ClientBrowser  = tostring(records.ClientBrowser), 
ResourceGUID  = tostring(records.ResourceGUID)
}

With the three functions in place, we need to create an update policy that will use the results of a function to load a table whenever data is added to the holding table. For our pages_Staging_PageViews table, we run the following command to create the policy.

.alter table [@"pages_Staging_PageViews"] policy update @'[{"Source": "Pages_EventHub", "Query": "fn_Pages_PageViewsIngest()", "IsEnabled": "True", "IsTransactional": true}]'

This command adds an update policy to the pages_Staging_PageViews table. This update policy will be invoked whenever data is added to the Pages_EventHub table. It will execute the fn_Pages_PageViewsIngest function created above against this new data and load the result into the pages_Staging_PageViews table. The function itself filters out all data that did not originate from the original AppPageViews table and transforms the rest to match the destination schema.

The commands for creating the policies on the other two tables are below:

.alter table [@"pages_Staging_Events"] policy update @'[{"Source": "Pages_EventHub", "Query": "fn_Pages_EventsIngest()", "IsEnabled": "True", "IsTransactional": true}]'

.alter table [@"pages_Staging_Metrics"] policy update @'[{"Source": "Pages_EventHub", "Query": "fn_Pages_MetricsIngest()", "IsEnabled": "True", "IsTransactional": true}]'
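If you want to confirm that a policy has been applied as expected, it can be inspected with the following command (shown here for the page views table):

.show table pages_Staging_PageViews policy update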

The last step is to add a retention policy to the Pages_EventHub table that will remove data automatically after it has been processed. This is an optional step and can be done at any point to conserve resources. A retention policy will remove ingested data after a defined time. Setting the period to 0 will delete the data shortly after all update policies have completed.

In this case the policy is added to the holding table by running the following command:

.alter-merge table Pages_EventHub policy retention softdelete = 0d recoverability = disabled

At this point, data should be flowing into the three destination tables shortly after it arrives through the event hub.

Connect external tables to the ADLG2 data (optional)

Earlier, we selected both an event hub and a storage account to receive data from Application Insights. The reason for the storage container is to provide an authoritative source of persisted data. Data in Application Insights expires by default after 90 days and cannot be retained any longer than 2 years. Data in Kusto can be persisted for an unlimited period, but it too can be configured to expire after a period of time. Storing the data in a storage account ensures permanency, and provides a location to re-ingest from should any disaster befall the Kusto data.

Kusto can be connected to external data sources as external tables. These sources can be Azure storage accounts or SQL databases. While not strictly required, it is a good idea to create external tables connected to this data so that it can be queried and re-ingested with relative ease whenever necessary.

Connecting Kusto to ADLG2 storage is a two-step process. First you create a shared access signature, and then you create an external table in Kusto using that signature. A shared access signature can be created for the entire account, a container, or even a folder. Since we will be connecting to three different containers, we will create the signature at the account level. To do this, navigate to the storage account in Azure, and then select Shared access signature in the Security + networking section. Select Blob and File from Allowed services, and then Container and Object from Allowed resource types. Set an expiry date applicable to your situation. The external table will stop working once the expiry date has passed.

When ready, click the Generate SAS and connection string button, and the screen will appear as follows:

Make note of the Blob service SAS URL – it will be needed in the next step. It’s also a good idea to record these settings, as it’s not possible to go back and retrieve them later.

Capturing the three tables above to ADLG2 creates the following three containers in the storage account:

  • insights-logs-appevents
  • insights-logs-appmetrics
  • insights-logs-apppageviews

When creating the external tables below, the Blob service SAS URL values need to be modified to include these containers by adding them before the token in the URL. Therefore:

https://mystorageaccount.blob.core.windows.net/?sv=2021-06-08&……. becomes

https://mystorageaccount.blob.core.windows.net/insights-logs-appevents?sv=2021-06-08&……. and so on.

To create the external tables in Kusto, navigate to a Kusto query window that is connected to the appropriate database. The following commands can be used to create the tables, substituting the sample URLs with the ones from above:

.create-or-alter external table Pages_AppEvents_Ext (['time']:datetime,resourceId:string,ResourceGUID:guid,Type:string,ClientBrowser:string,ClientCity:string,ClientCountryOrRegion:string,ClientIP:string,ClientModel:string,ClientOS:string,ClientStateOrProvince:string,ClientType:string,IKey:guid,_BilledSize:int,OperationName:string,OperationId:guid,ParentId:guid,SDKVersion:string,SessionId:string,UserAccountId:string,UserAuthenticatedId:string,UserId:string,Properties:dynamic,Name:string,ItemCount:int) 
kind=blob 
dataformat=json
( 
   h@'https://mystorageaccount.blob.core.windows.net/insights-logs-appevents?******' 
)

.create-or-alter external table Pages_AppMetrics_Ext (['time']:datetime,resourceId:string,ResourceGUID:guid,Type:string,ClientBrowser:string,ClientCity:string,ClientCountryOrRegion:string,ClientIP:string,ClientModel:string,ClientOS:string,ClientStateOrProvince:string,ClientType:string,IKey:guid,_BilledSize:int,OperationName:string,OperationId:guid,ParentId:guid,SDKVersion:string,SessionId:string,UserAccountId:string,UserAuthenticatedId:string,UserId:string,Properties:dynamic,Name:string,Sum:int,Min:int,Max:int,ItemCount:int) 
kind=blob 
dataformat=json 
( 
    h@'https://mystorageaccount.blob.core.windows.net/insights-logs-appmetrics?******'
)

.create-or-alter external table Pages_AppPageViews_Ext (['time']:datetime,resourceId:string,ResourceGUID:guid,Type:string,ClientBrowser:string,ClientCity:string,ClientCountryOrRegion:string,ClientIP:string,ClientModel:string,ClientOS:string,ClientStateOrProvince:string,ClientType:string,IKey:guid,_BilledSize:int,OperationName:string,OperationId:guid,ParentId:guid,SDKVersion:string,SessionId:string,UserAccountId:string,UserAuthenticatedId:string,UserId:string,Properties:dynamic,Measurements:dynamic,Id:guid,Name:string,Url:string,DurationMs:int,PerformanceBucket:string,ItemCount:int) 
kind=blob 
dataformat=json 
( 
    h@'https://mystorageaccount.blob.core.windows.net/insights-logs-apppageviews?******'
)

Once created, the external tables can be queried like any other table. They can be used for data validation or reingestion as appropriate.
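As a hedged sketch of both uses (run separately; the recovery table name here is a placeholder), querying the externalized page view data and re-ingesting a slice of it might look like this:

// query the externalized page view data directly from storage
external_table("Pages_AppPageViews_Ext")
| where ['time'] > ago(7d)
| summarize Views = count() by bin(['time'], 1d)

.set-or-append Pages_PageViews_Recovered <|
    // re-ingest a date range from storage; the table is created if it does not already exist
    external_table("Pages_AppPageViews_Ext")
    | where ['time'] between (datetime(2023-01-01) .. datetime(2023-02-01))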

In Conclusion

Once the data is flowing, subsequent tables and update policies can be set up to further transform the data, and materialized views can be created to further optimize query performance. Moving Application Insights data into Kusto gives you the best of both worlds: the telemetry collection capabilities of Application Insights and the big data power of Kusto. This approach is not limited to Application Insights either – it can be used with any Azure service that supports Azure Monitor Diagnostic Settings.

Be aware however that this migration is a one way street. Once the data is in Kusto, it can be retained for as long as you like, but it can’t be put back into the source.


Connect Power BI to Azure Monitor data using Direct Query with Azure Data Explorer

Man – that’s a boring title. But it’s accurate.

A few months ago, I posted an article outlining how to connect Power BI to Azure Application Insights and Azure Log Analytics (jointly referred to as Azure Monitor) with Direct Query. That article describes an approach that allows you to use a native Kusto connector to connect to the Azure Monitor instance as if it were an ADX cluster. This allows Direct Query to be used, among other things. The option for connecting Power BI that is available through the Azure Monitor UI uses a connector that queries the respective APIs over the web, and that connector doesn’t support Direct Query.

The problem with using this connector is that it’s a bit of a hack. At the time it was written, you needed to use the old Power BI driver for Kusto to make it work, and that approach isn’t simple. Over time, it stopped working altogether for Application Insights. The ADX connector has since been updated to support connection to Azure Log Analytics (but not Application Insights) and is therefore still valid.

There is, however, another way to achieve this by using your own ADX cluster. ADX clusters allow for “cross-cluster queries” that permit tables in a database in one cluster to be joined or unioned with tables in a completely different cluster. The same proxy addresses mentioned above can be used in one of these cross-cluster queries, and in this way, the ADX cluster simply acts as an intermediary.

Everything that you need to know about this approach can be found in the support article “Query data in Azure Monitor using Azure Data Explorer”

To create a Power BI report that queries Azure Monitor data using Direct Query, first create a new report, and connect to data using the “Azure Data Explorer (Kusto) connector”. Enter the address of the cluster, and the name of a database within that cluster. The database itself doesn’t matter; it simply provides a scope for the query. Finally, you need to specify the query, and this is where the cross-cluster query comes into the picture. The query takes the following form:

cluster('ProxyURL').database('ResourceName').TableName

The Proxy URLs differ between Log Analytics and Application Insights. The two take the following forms:

Log Analytics:

https://ade.loganalytics.io/subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/providers/microsoft.operationalinsights/workspaces/<workspace-name>

Application Insights:

https://ade.applicationinsights.io/subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/providers/microsoft.insights/components/<ai-app-name>

The cross-cluster query for the table named “pageViews” in an Application Insights instance named “WhitePagesLogs” in a resource group named “MyResourceGroup” in the subscription “71a90792-474e-5e49-ab4e-da54baa26d5d” is therefore:

cluster('https://ade.applicationinsights.io/subscriptions/71a90792-474e-5e49-ab4e-da54baa26d5d/resourcegroups/MyResourceGroup/providers/microsoft.insights/components/WhitePagesLogs').database('WhitePagesLogs').pageViews

It is worth explicitly noting that the resource name appears twice in the query – once in the cluster address, and as the database name.
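When using Direct Query, it can also help to limit the query to the columns and time range that you actually need. A minimal sketch building on the example above (the column names are from the standard Application Insights pageViews schema):

cluster('https://ade.applicationinsights.io/subscriptions/71a90792-474e-5e49-ab4e-da54baa26d5d/resourcegroups/MyResourceGroup/providers/microsoft.insights/components/WhitePagesLogs').database('WhitePagesLogs').pageViews
| where timestamp > ago(30d) // keep the Direct Query result set small
| project timestamp, name, url, duration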

When ready, the Get data dialog box should appear as follows:

If you want to use Direct Query, don’t forget to open the Advanced Options section, and select it here.

At this point, the report can be built, and it will behave as if it were a normal ADX cluster. You can of course build more complex queries, etc., but you cannot create things like functions or materialized views, since you do not have administrative access to the engine behind Azure Monitor.

Compared to using the Power BI ADX connector directly, this approach has the advantage of being explicitly supported, and it also works with both Application Insights and Log Analytics. On the downside, there is a cost to running your own ADX cluster, although it is minimal. This cluster is simply acting as a gateway in this case, and therefore a bare minimum of resources will suffice.


First impressions of the new “Visualize in Power BI” feature for SharePoint Lists

SharePoint Lists have been a data source for Power BI ever since its introduction. Power BI Desktop can connect to those lists to build reports, and once published in the Power BI service, those same reports can be surfaced in SharePoint through the Power BI web part, or in Teams through tabs or the Power BI app. In fact, I wrote a series of articles a few years ago on using Power BI to report on complex field types.

While those articles are still valid, the need for such articles means that report creation has required a high level of sophistication, and the various tools required put report creation out of reach for a whole class of users. With this in mind, the Power BI team has recently rolled out its first iteration of a report creation feature for Lists. Given my SharePoint background, I had to dig in. What follows are my first impressions.

Using Quick Visualize

Currently this feature is part of the Microsoft 365 Targeted Release experience but will roll out more broadly over the coming months.

At the top of each list is an “Integrate” button. This button is available in both Microsoft Lists, and through the standard SharePoint interface. Selecting it allows you to create a Power App or Power Automate Flow from the list, and once the feature is rolled out, you can create a Power BI report from this button. To begin with, I started with a relatively simple list to collect donation pledges:

A specialized Power BI window then opens with a pre-configured report. The service makes a guess at what you’ll want to visualize. Measures are summarized in cards and sliced by various dimensions horizontally. At the bottom is a table laying out the detail. For my list above, the default report appeared as follows:

In my list, monetary value was the only numeric field, so a count was also added. Along the right of the report is the brand new Summarize pane, which is how you select which measures and dimensions are included in the report. In my case, I’d rather slice by year than quarter and Count of Availability doesn’t mean much so I want to deselect it and add Count of rows. Just by using the Summarize pane, the report will appear as follows:

There is a limit of 4 dimensions that can be selected for slicing the measures.

If you want to customize the report further, you can select Edit, and you will be presented with a more classic report editing experience in Power BI. When you do this, you will lose access to the Summarize pane, but you will be able to make a lot of fine-grained changes to the report.

When complete, you can use the “Publish to the list” button to save it and share it.

What’s Different

Reports created in this manner are stored along with the list that created them in a specialized workspace. They cannot be viewed in the Power BI service at all. In addition, it’s not possible to download the report as a .pbix file to publish it elsewhere – it lives in the list. To access it, simply press the Integrate button again, and the reports that have been published from this list will all be available.

Like most other Power BI reports, these reports are driven from a data set. In this case, the data set is created behind the scenes automatically, but like most datasets, it contains a cache of the list data. This means that the report will not always be up to date with respect to the list data, and therefore must be periodically refreshed.

With these types of reports, the refresh happens every 3 hours automatically. This allows the data to be refreshed 8 times per day, which is the maximum allowed for the Power BI Pro license.

You don’t need a Pro license to use this feature, but you will if you want to publish your reports back to the list. The key here is that any sharing features in Power BI require a Pro license, or a dedicated capacity (Premium) backed workspace. At the moment, there is no way to move these reports to a dedicated capacity.

Complex SharePoint Fields

Lists are infamous for their specialized column types. Lookup columns, rich text fields, managed metadata can be tricky to work with from an integration standpoint. Happily, Power Query makes working with them fairly simple as I describe in the article series that I mentioned above (the latest one on location columns with links to the others is here – Using Power BI to Report on Location Columns in SharePoint – The White Pages (unlimitedviz.com)). However, the Quick Visualize feature doesn’t use Power Query. So how does it handle complex SharePoint fields?

Not that well as it turns out. At least not for now.

I decided to turn this loose on my “Properties” list that I use for demos. It contains one of every SharePoint field type. The initial result can be seen below.

To be fair, there is only one numeric column in this list, and it is composed entirely of complex field types. However, it does mean that for the moment at least, if you’re working with complex field types, this feature will only be of limited use to you. Even in the first example above you can see that the donor value was always “[List]” because donor was a person field.

This is a list of all the complex SharePoint field types, and how they are expressed when using Quick Visualize:

Type | Result
Person | [List]
Multi-Value Choice | [List]
Choice | Choice fields work – the value is shown
Managed Metadata | [Record]
Rich Text | The raw HTML value is shown with all markup
Location | [Record]
Picture or Hyperlink | The destination URL is shown, not linkable
Lookup | [List]
Boolean | Boolean fields show True or False
Currency | Currency fields work

Impressions

It’s important to keep the design of this feature in mind. This is for users that do not have a lot of experience with Power BI that need a way to quickly visualize their list-based data. This is not for report designers that intend to publish reports for a wide audience – Power BI Desktop is there to serve that purpose.

This is a great move in the right direction. The lack of support for complex SharePoint field types means that its usage will be limited to simple lists, but there are a lot of those out there, and I have every confidence that support for them will be coming. The lack of control over the refresh schedule may also prove limiting in several circumstances.

Overall, “Visualize in Power BI” is a great step in the right direction, and I suspect that we’ll see a lot more along this line in the not-too-distant future.
