A common operation in Power BI is calculating the Distinct Count of a column. You may, for example, want to show the number of unique customers in your report, while the sales table contains duplicates. Most users will then go ahead and use the built-in DISTINCTCOUNT function to calculate this. But often this is not the best performing function for the job. Optimizing DISTINCTCOUNT can improve your calculation speeds by over 20 times! In this blog post, I share with you 5 different ways to calculate a distinct count in DAX. You can then track the best performing version for your report.
Table of contents
You are working with a simple data model. The model has a single fact table that contains the Sales and connected to it are the dimensions Date, Customer and Product.
To make this exercise more interesting, we will try to calculate something that’s not very straightforward. Let’s pretend you would like to calculate how many unique last names can be found in the dataset, and you want to split this by the colour of products customers bought and by month. It does not make sense, but for the sake of the exercise, let’s stick with it.
The implications of calculating this are:
- The Product Color does not have a direct relationship with the Customer table. A filter on Product can only reach the Customer table through the Sales Table. By default, however, any filter on the sales table does not reach the Customer table because of its many-to-one relationship.
- The Month Number is derived from the Calendar table, which is related to the sales table. Filters on the month number don’t reach the Customer table because of the many-to-one relationship between Sales and Customer.
- The Customer table is connected with the Sales Table through the CustomerKey. However, since we are calculating unique Last Names, there may be duplicates in the data. Splitting the Customer Name by Month and Colour will have to work against the one-to-many relationship with sales.
To get the desired results, you will have to use DAX to propagate filters from the Product and Calendar Table to the Customer table. We will discuss five variations, and then have a look at their performance.
2. Versions of Distinct Count
2.1. COUNTROWS + Values with Expanded Tables
The first version generates unique customer LastNames using VALUES( Customer[LastName] ). COUNTROWS then counts the number of last names. These two functions together only base their result on the customer table. The filters coming from Calendar (month/year) or from the Product table (colours) don’t reach here. This is solved by adding the expanded Sales table as a filter argument in CALCULATE. It forces CALCULATE to take into account all filters that reach the Sales table.
# Customers v1 := CALCULATE ( COUNTROWS ( VALUES ( Customer[LastName] ) ), Sales )
2.2. DISTINCTCOUNT with CROSSFILTER
The second measure calculates the unique number of last names using DISTINCTCOUNT. Again this is done on the customer table. This time, to make the filters reaching the Sales table impact the calculation, you can use CROSSFILTER and enable bi-directional filtering for the duration of the measure. The result is similar to the previous measure.
# Customers - DistinctCount + CrossFilter := CALCULATE ( DISTINCTCOUNT ( Customer[LastName] ), CROSSFILTER ( Customer[CustomerKey], sales[CustomerKey], BOTH ) )
2.3. DISTINCTCOUNT + Expanded Tables
# Customers - DISTINCTCOUNT + Expanded Table := CALCULATE ( DISTINCTCOUNT ( Customer[LastName] ), Sales )
The third measure is identical to the previous one, except it uses the concept of Expanded tables instead of CROSSFILTER.
2.4. COUNTROWS with SUMMARIZE
The fourth version uses SUMMARIZE. It summarizes the sales table, by the LastName found in the customer table. The base table for this function uses the Sales table, including all filters that reach Sales. Counting the amount of rows in this summarized table returns an identical result to previous measures.
# Customers - SUMMARIZE + Expanded Table := COUNTROWS ( SUMMARIZE ( Sales, Customer[LastName] ) )
2.5. SUMX + VALUES and Expanded Tables
The last version we discuss uses SUMX. First, VALUES generates the unique customer Last Names. Each row in this table is assigned the value 1. SUMX then sums up the unique Last Names by adding all the rows with a 1. Lastly, the expanded table is used as filter argument to ensure all relevant filters reach the customer table.
# Customers - SUMX + Expanded Table := CALCULATE ( SUMX ( VALUES ( Customer[LastName] ), 1 ), Sales )
Below table shows all these versions return the same result. The big question is, which version performs the quickest?
3. Testing Performance
Now let’s take a closer look at optimizing DISTINCTCOUNT. To test how each of these performs, you can use the Performance Analyzer in Power BI. The Performance Analyzer allows you to analyze how quick different visuals load. By putting each visual in a separate table, you can easily discern the quick from the slow measures.
After I put each measure in a separate table, I went to View -> Performance analyzer. Then after pressing Starting Recording and Refresh Visuals, the results rolled out.
Note that we are only interested in the DAX query. The items Visual Display and Other are also measured, but these have nothing to do with the optimization of our DAX code.
I got the following results for the DAX Query speed:
|1. COUNTROWS + Values with Expanded Tables||373 ms|
|2. DISTINCTCOUNT with CROSSFILTER||364 ms|
|3. DISTINCTCOUNT + Expanded Tables||375 ms|
|4. COUNTROWS with SUMMARIZE||19 ms|
|5. SUMX + VALUES and Expanded Tables||40 ms|
So which measure performs best? As this table shows version 4 that uses COUNTROWS with SUMMARIZE was quickest by far with 19 ms, being twice as fast as version 5 that uses SUMX. The first three measures approximately tied being almost 20 times slower than the quickest measure.
The performance of these measures may be different depending on the granularity of the columns you are using and dependent on where the columns come from. Do they come from the table you are counting in, or from a foreign table? A measure that may perform badly in this case, can outperform all the others in another scenario.
Obviously the requirements for this exercise were very specific so that either CROSSFILTER or an Expanded Table was needed as a filter argument. The resulting measures are at this point not very versatile as they are aimed towards a specific scenario. Nevertheless, these performance differences may also occur in more generic cases.
Hopefully, this post gave you some ideas on the performance benefits of optimizing DISTINCTCOUNT in DAX. Remember, always test the performance for the scenario you are using.
Press the link below for access to the files used in this article.