Written By Rick de Groot

Rick is the founder of BI Gorilla. He believes learning is one of life's greatest pleasures and shares his knowledge to help you improve your skills.

Optimizing the Performance of DISTINCTCOUNT

A common operation in Power BI is calculating the Distinct Count of a column. You may, for example, want to show the number of unique customers in your report, while the sales table contains duplicates.

Most users will then go ahead and use the built-in DISTINCTCOUNT function to calculate this. But often, this is not the best-performing function for the job. Optimizing DISTINCTCOUNT can improve your calculation speeds by over 20 times!

In this blog post, I share five different ways to calculate a distinct count in DAX. You can then pick the best-performing version for your report.


1. Introduction

You are working with a simple data model. It has a single fact table, Sales, connected to the dimension tables Date, Customer, and Product.

Optimizing DISTINCTCOUNT Data Model

To make this exercise more interesting, we will try to calculate something that’s not very straightforward.

Let’s pretend you would like to calculate how many unique last names can be found in the dataset, and you want to split this by the colour of products customers bought and by month. It does not make sense, but for the sake of the exercise, let’s stick with it.

The implications of calculating this are:

  • The Product Color column has no direct relationship with the Customer table. A filter on Product can only reach the Customer table through the Sales table. By default, however, a filter on the Sales table does not reach the Customer table, because of the many-to-one relationship between Sales and Customer.
  • The Month Number comes from the Calendar table, which is related to the Sales table. For the same reason, filters on the month number don't reach the Customer table.
  • The Customer table is connected to the Sales table through the CustomerKey. However, since we are counting unique last names, there may be duplicates in the data, and splitting the last names by month and colour has to work against the one-to-many relationship with Sales.
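To see why this matters, consider the naive measure below (the name is mine, for illustration only). Without any filter propagation, it returns the same number of unique last names for every month and every colour, because the filters on Product and Calendar stop at the Sales table:

```dax
-- Naive version: filters on Product and Calendar never reach Customer
# Customers - Naive :=
DISTINCTCOUNT ( Customer[LastName] )
```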

To get the desired results, you will have to use DAX to propagate filters from the Product and Calendar Table to the Customer table. We will discuss five variations, and then have a look at their performance.

2. Versions of Distinct Count

2.1. COUNTROWS + Values with Expanded Tables

The first version generates the unique customer last names using VALUES ( Customer[LastName] ); COUNTROWS then counts them.

On their own, these two functions base their result only on the Customer table. Filters coming from the Calendar table (month/year) or the Product table (colours) don't reach it.

This is solved by adding the expanded Sales table as a filter argument in CALCULATE. It forces CALCULATE to take into account all filters that reach the Sales table.

# Customers - COUNTROWS + Expanded Table :=
CALCULATE (
    COUNTROWS (
        VALUES ( Customer[LastName] )
    ),
    Sales
)

2.2. DISTINCTCOUNT with CROSSFILTER

The second measure calculates the number of unique last names using DISTINCTCOUNT, again on the Customer table.

This time, to make the filters reaching the Sales table impact the calculation, you can use CROSSFILTER and enable bi-directional filtering for the duration of the measure. The result is similar to the previous measure.

# Customers - DISTINCTCOUNT + CROSSFILTER :=
CALCULATE (
    DISTINCTCOUNT ( Customer[LastName] ),
    CROSSFILTER ( Customer[CustomerKey], Sales[CustomerKey], BOTH )
)

2.3. DISTINCTCOUNT + Expanded Tables

# Customers - DISTINCTCOUNT + Expanded Table :=
CALCULATE ( DISTINCTCOUNT ( Customer[LastName] ), Sales )

The third measure is identical to the previous one, except it uses the concept of Expanded tables instead of CROSSFILTER.

2.4. COUNTROWS with SUMMARIZE

The fourth version uses SUMMARIZE. It summarizes the Sales table by the LastName column from the Customer table.

Because SUMMARIZE starts from the Sales table, all filters that reach Sales apply automatically. Counting the rows of this summarized table returns the same result as the previous measures.

# Customers - SUMMARIZE + Expanded Table :=
COUNTROWS (
    SUMMARIZE (
        Sales,
        Customer[LastName]
    )
)

2.5. SUMX + VALUES and Expanded Tables

The last version we discuss uses SUMX. First, VALUES generates the unique customer last names. Each row in this table is assigned the value 1, and SUMX counts the unique last names by summing those 1s.

Lastly, the expanded table is used as a filter argument to ensure all relevant filters reach the customer table.

# Customers - SUMX + Expanded Table :=
CALCULATE (
    SUMX (
        VALUES ( Customer[LastName] ),
        1
    ),
    Sales
)

The table below shows that all these versions return the same result. The big question is: which version performs the quickest?

DISTINCTCOUNT Versions

3. Testing Performance

Now, let’s take a closer look at optimizing DISTINCTCOUNT. To test how each of these performs, you can use the Performance Analyzer in Power BI.

The Performance Analyzer shows how quickly each visual loads. By putting each measure in a separate table visual, you can easily discern the quick measures from the slow ones.

After I put each measure in a separate table, I went to View -> Performance Analyzer. Then, after pressing Start Recording and Refresh Visuals, the results rolled out.

Note that we are only interested in the DAX query. The items Visual Display and Other are also measured, but these have nothing to do with the optimization of our DAX code.
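If you prefer timing the measures outside of Power BI Desktop, you can also run them as a query in DAX Studio with Server Timings switched on. A minimal sketch, assuming the Calendar[Month Number] and Product[Color] column names used in this article:

```dax
-- Reproduces the visual as a query; run with Server Timings in DAX Studio
EVALUATE
SUMMARIZECOLUMNS (
    'Calendar'[Month Number],
    'Product'[Color],
    "Customers v4", [# Customers - SUMMARIZE + Expanded Table]
)
```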

I got the following results for the DAX Query speed:

Measure | DAX Query
1. COUNTROWS + VALUES with Expanded Tables | 373 ms
2. DISTINCTCOUNT with CROSSFILTER | 364 ms
3. DISTINCTCOUNT + Expanded Tables | 375 ms
4. COUNTROWS with SUMMARIZE | 19 ms
5. SUMX + VALUES and Expanded Tables | 40 ms

4. Results

So which measure performs best? As the table shows, version 4, which uses COUNTROWS with SUMMARIZE, was quickest by far at 19 ms, twice as fast as version 5 with SUMX. The first three measures are roughly tied and almost 20 times slower than the quickest one.

The performance of these measures may differ depending on the granularity of the columns you use and on where those columns come from. Do they come from the table you are counting in, or from a foreign table? A measure that performs badly in this case can outperform all the others in another scenario.
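To illustrate the second case: if the column you are counting lives in the fact table itself, say a hypothetical Sales[OrderNumber] column, no expanded table or CROSSFILTER is needed at all, and two of the candidate patterns reduce to:

```dax
-- Assumes a hypothetical Sales[OrderNumber] column in the fact table
# Orders - DistinctCount :=
DISTINCTCOUNT ( Sales[OrderNumber] )

# Orders - Summarize :=
COUNTROWS ( SUMMARIZE ( Sales, Sales[OrderNumber] ) )
```

Which of the two wins here again depends on the column's cardinality, so measure both in your own model.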

Conclusion

Obviously, the requirements for this exercise were very specific, so either CROSSFILTER or an Expanded Table was needed as a filter argument. The resulting measures are, at this point, not very versatile as they are aimed towards a specific scenario. Nevertheless, these performance differences may also occur in more generic cases.

Hopefully, this post gave you some ideas on the performance benefits of optimizing DISTINCTCOUNT in DAX. Remember, always test the performance for the scenario you are using.

Download

Press the button below for access to the files used in this article.

Comments
  1. Hello,
    Just wanted to thank you very much for this post.
    I had been searching for a solution for a way too slow DISTINCTCOUNT in my report, and gladly I found it in your post 🙂
    In my case, SUMX + VALUES did the trick. It improved the calculation speed dramatically.
    (I wonder what makes such a big difference between the speeds of the different methods? Or, why is DISTINCTCOUNT relatively slow in some cases?)

    Best regards,

  2. I had a requirement to calculate the average accumulated value per customer, as follows:

    
    VAR TAB =
        SUMMARIZE (
            FILTER ( 'Fact – Customer Monthly Status', [Customer Status Filter] = "Y" ),
            'Fact – Customer Monthly Status'[CUSTOMER_ID],
            "AVG TRA",
                CALCULATE (
                    DIVIDE (
                        SUM ( 'Fact – Customer Monthly Status'[TOTAL TRADING VALUE] ),
                        COUNTROWS (
                            SUMMARIZE (
                                'Fact – Customer Monthly Status',
                                'Fact – Customer Monthly Status'[DATE_KEY]
                            )
                        ),
                        0
                    ),
                    KEEPFILTERS ( 'Fact – Customer Monthly Status'[TOTAL TRADING VALUE] > 0 ),
                    ALLEXCEPT (
                        'Fact – Customer Monthly Status',
                        'Fact – Customer Monthly Status'[CUSTOMER_ID]
                    ),
                    'Fact – Customer Monthly Status'[DATE_KEY]
                        <= MAX ( 'Fact – Customer Monthly Status'[DATE_KEY] )
                )
        )
    RETURN
        AVERAGEX ( TAB, [AVG TRA] )

    It definitely works exactly as I want, but there's a problem with its performance. Can anyone help tune this or suggest an alternative solution?

    Thanks so much!!!

    • Hey DD,

      I know this is late, but thought I’d leave some ideas anyway:

      Some pointers here.

      1. Instead of doing a SUMMARIZE over a FILTER, you could try to SUMMARIZE the base table, and use CALCULATETABLE to filter it down to just [Customer Status Filter] = “Y”.

      2. Performing a CALCULATE within SUMMARIZE will cause many context transitions.

      3. On top of that, it’s best practice to add new columns through ADDCOLUMNS. The creation of additional columns in SUMMARIZE can have undesired side-effects as blogged about by SQLBI: https://www.sqlbi.com/articles/best-practices-using-summarize-and-addcolumns/

      4. For each customer you are doing a complex calculation. Is there a way to narrow down your data outside of summarize?
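      Combining pointers 1 and 3, here is a rough, untested sketch. It assumes your average logic is moved into a separate measure, here called [Avg Trading Value], and that the status flag is (or can be materialised as) a column on the fact table:

```dax
-- Sketch only: [Avg Trading Value] is a hypothetical measure holding the
-- CALCULATE ( DIVIDE ( ... ) ) logic from the original expression
VAR TAB =
    CALCULATETABLE (
        ADDCOLUMNS (
            SUMMARIZE (
                'Fact – Customer Monthly Status',
                'Fact – Customer Monthly Status'[CUSTOMER_ID]
            ),
            "AVG TRA", [Avg Trading Value]
        ),
        'Fact – Customer Monthly Status'[Customer Status Filter] = "Y"
    )
RETURN
    AVERAGEX ( TAB, [AVG TRA] )
```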

      Cheers,
      Rick

  3. Thanks for this helpful write-up!! I’ve seen this topic beaten around a lot but this article gets directly to the point.

    #4 beat #5 by a tiny bit for me… same as yours.

  4. Hey Rick,

    I was super excited to try these other methods, but I didn't get the same results as you. I tried methods 4 and 5: method 4 gave me the same execution time as a simple distinct count, and method 5 was 10 times slower! (from 5 secs to 55 secs, huge data model)

    So yet again, depends a lot on your data model I think :(. Definitely worth trying though! Thanks!

    • Hi Jaap,

      What kind of calculation are you performing? Performance is very dependent on the data model you use. The regular DISTINCTCOUNT is very versatile for most situations. More complex calculations are often worth optimizing and can result in great time savings. Enjoy DAX!

