Written By Rick de Groot

Rick is the founder of BI Gorilla. He believes learning is one of life's greatest pleasures and shares his knowledge to help you improve your skills.

In this article, we’ll explore how to remove duplicates in Power Query. We’ll cover basic and advanced methods for removing duplicates, including how to remove duplicates from single and multiple columns, how to remove duplicates case-insensitively, and how to keep values based on a condition.

We’ll also delve into an inconsistency problem, often related to sort order, that can occur when removing duplicates and how to solve it. By the end of this article, you’ll have a thorough understanding of how to remove duplicates in Power Query.

Table of contents

Importance of Removing Duplicates

Eliminating duplicate rows is a critical step when cleaning your data in Power Query. Duplicates can cause problems in many situations. For example, when:

  • Performing a join: duplicates in your join table, result in duplicate rows in the destination table.
  • Creating a dimension table: you can’t make a one-to-many relationship with duplicates on the one side.
  • Inspecting Values: duplicates can be an indication that an earlier join had duplicate values.

To prevent this, we need an expression to check for duplicates and filter our dataset only to include unique values.

Basics for Removing Duplicates

So, how can you remove duplicates using the Power Query Editor? Under the hood, the operation uses the Table.Distinct function to remove duplicates. The syntax for the function is as follows:

Table.Distinct(
    table as table, 
    optional equationCriteria as any
   )

This function has two arguments: the first argument is the table to remove duplicates from, and the second argument is an optional equationCriteria that tells the function which columns to look at when removing duplicates. You can provide a column name, or instruct the function to respect or ignore capitals. If you omit the equationCriteria, the function will remove duplicates from the entire table.

Removing Duplicates from a Single Column

Let’s start with removing duplicates from a single column. This is one of the most basic methods for removing duplicates in Power Query, and it’s really easy to do.

To remove duplicates from a single column, you:

  • Right-click on the column that contains duplicates
  • Select the “Remove Duplicates” option.
  • Now click OK.

That’s it! Power Query will remove all duplicates from the selected column.

Remove Duplicates Single Column in Power Query

The remove duplicates operation makes use of the Table.Distinct function in Power Query. When performing the operation on a single-column table, the above operation will show:

Table.Distinct( Source )

This removes duplicates from the entire table. When your table has multiple columns, the code below removes the duplicates from only the “Fruit” column.

Table.Distinct( Source, { "Fruit" } )

But what if you have duplicate values across multiple columns? No problem.

Removing Duplicates based on Multiple Columns

The easiest way to remove duplicates based on multiple columns is:

  • Hold down the CTRL Key.
  • Select the columns you want to remove duplicates from.
  • Click on Remove Duplicates.

Now, Power Query will delete all rows with duplicated values across the selected columns. This operation focuses on returning unique combinations of values.

Remove Duplicates Multiple Column in Power Query
Table.Distinct(
     Source, 
     { "Brand", "Country"} 
)

Performing this operation still uses the Table.Distinct function. This time, the second argument contains both column names within a list.

Tip

If you could use a refresher on list functions, have a look at this article that delves into use-cases and syntax.

So you know how to remove duplicates from one or more columns, but what if you want to remove duplicates case-insensitive?

Removing Duplicates Case-Insensitive

Removing duplicates is normally performed in a case-insensitive way, it ignores case. However, there are scenarios where you want to ignore capitalizations. For instance, observe the following table.

Data with Duplicates in Power Query

The dataset is unique when respecting capitalization, yet has quite a few duplicates when you ignore the capitals. A regular remove duplicates operation (case sensitive) only removes a row for the value ‘Cedar‘. To achieve this, you can use Table.Distinct function and provide the column name “Wood” in the second argument. The result is a case-sensitive removal of duplicates as below.

Remove Duplicates Case Sensitive in Power Query

To remove duplicates in a manner that is case-insensitive, you can make use of the optional equation argument of Table.Distinct. It allows you to add a comparer function. To do that:

The result is a case-insensitive removal of duplicates.

Table.Distinct(
   Source, 
   { "Wood", Comparer.OrdinalIgnoreCase } 
)
Remove Duplicates Ignore Case in Power Query

But what if you want to apply this method to multiple columns?

Table.Distinct(
     Source, 
     { { "Wood", Comparer.OrdinalIgnoreCase },
       { "Area", Comparer.OrdinalIgnoreCase } } 
)
Remove Duplicates Cases Insensitive on Multiple Columns in Power Query

You can use the Table.Distinct function and provide the method to look for duplicates for every column. This allows you to specify whether to ignore capitalization (Comparer.OrdinalIgnoreCase) or respect capitalization (Comparer.Ordinal) for each column individually.

So there you have it – some basic methods for removing duplicates in Power Query. Whether you’re working with a single column, with multiple columns, or whether you check capital sensitive, Power Query makes it easy to get rid of the duplicates and get your data in tip-top shape.

Caution

Be aware that the methods shown so far don’t always return the same results. Let’s see why that is in the next section.

Inconsistent Results

Did you know that the remove duplicates operation in Power Query often returns inconsistent results? It might sound strange, but it’s actually a side effect of certain optimizations that Power Query uses to be as efficient as possible.

One of these optimizations is Query Folding, which sends certain operations to the data source to be processed instead of in Power Query. While this can help make things faster and more efficient, it can also cause problems when it comes to removing duplicates. Without using additional measures, there’s no guarantee that removing duplicates returns the same records. That means you can’t assume that removing duplicates will return only the first duplicate row in your dataset.

But there’s a simple solution to this randomness problem. To make the remove duplicates operation predictable, you can use Table.Buffer. This function forces Power Query to cache the data in memory, which means that you can control which records are returned when removing duplicates.

So, if you want to ensure that you’re getting consistent results when removing duplicates, be sure to use Table.Buffer. We’ll dive deeper into this topic and explore advanced methods in the next chapter.

Advanced Techniques for Removing Duplicates

Sometimes, removing duplicates isn’t as straightforward as it seems, and you may need more advanced methods to get the job done. That’s where this chapter comes in – we’re going to explore some advanced methods for removing duplicates in Power Query.

In this section, we’ll dive into how you can remove duplicates based on a condition. We’ll look at retrieving the first or last value, the earliest or latest value and the lowest or highest value.

Keep First or Last Value

First, let’s talk about how to delete duplicates and keep the first or last value in your dataset. By default, Table.Distinct returns the first duplicate row it finds. But as we learned in the previous chapter, the sorting order of your table isn’t guaranteed, which can cause issues.

To ensure that we’re keeping the first or last value, we need to sort our data and make sure it stays sorted.

To remove duplicates and retrieve the first value in a dataset, you:

  • Sort your data
  • Select columns to remove duplicates from
  • Wrap the data in Table.Buffer
  • Remove duplicates

We can do that with the following dataset:

Remove Duplicates with Table.Buffer to Guarantee Sort Order in Power Query

Prepare your data by sorting the table and wrapping it in Table.Buffer. You can then select the Category column and click Remove duplicates, returning:

Removing Duplicates and Return First Value in Power Query

You will end up removing duplicates but keeping the first instance within your category. So to return the first value when removing duplicates, you sort your code with:

Table.Buffer( 
    Table.Sort( Dataset,
                { {"Category",    Order.Ascending },
                  {"Description", Order.Ascending } } ) )

If you want to keep the last value instead, you can simply sort your data in descending order by the column you want to keep the last value of. Then, wrap your data in Table.Buffer, and remove duplicates.

Table.Buffer( 
    Table.Sort( Dataset,
                { {"Category",    Order.Ascending },
                  {"Description", Order.Descending } } ) )

Keep Earliest or Latest Value

In case you want to return the earliest or the latest value, you can perform something similar. To return unique rows for the category with the latest date, you:

  1. Sort your data by Category and in a Descending order for your Date Column.
  2. Remove Duplicates from the relevant column(s).
Remove Duplicates and Return Latest Row
// Sorts and buffers data, so removing duplicates returns the latest category row
Table.Buffer( 
    Table.Sort( Dataset,
                { {"Category",    Order.Ascending }, 
                  {"Date",        Order.Descending } } ) )

// Sorts and buffers data, so removing duplicates returns the earliest category row
Table.Buffer( 
    Table.Sort( Dataset,
                { {"Category",    Order.Ascending }, 
                  {"Date",        Order.Ascending } } ) )

Keep Lowest or Highest Value

The last advanced method returns the lowest or highest value within a group. We could make use of the above method to achieve this. Yet, since I’m certain you will know how to sort your data in a similar way, let’s explore an alternative.

To remove duplicates and return the lowest value, you can:

  1. Group your data.
  2. Perform an All Rows operation.
  3. Adjust the M-code to return the row with the lowest value.
  4. Expand the record column to get your result.

Let’s see how that can work. The Group By operation returns only unique rows for the columns selected. To group your data, go to the Home tab, and select Group By.

Group Your Data to Remove Duplicates in Power Query

To keep the lowest value for each Category, you can group your data by Category and perform an All Rows operation. This creates a table object with all summarized rows for each Category. This produces below code:

Group Rows and use All Rows operation in Power Query

We’re going to change the formula above.

Tip

A useful function is Table.Min. The Table.Min function returns the table row that contains the lowest value of a column. In other words, you can delete duplicate rows based on the column value.

To return the lowest value, you can adjust the code like in the below picture. Then, expand the record column by selecting the arrows in the column header.

Return Row with Lowest Value using Table.Group in Power Query

After clicking OK, you’ll end up with the desired result:

Rows returning the Lowest Value using Table.Group

To instead keep the highest value when removing duplicates, you can simply swap out Table.Min for Table.Max. That means you either use:

// Sorts and buffers data, so removing duplicates returns the latest category row
Table.Group( Dataset, 
    {"Category"}, 
    {{"Details", each Table.Min( _, "Amount" ) , type record }} )

// Sorts and buffers data, so removing duplicates returns the earliest category row
Table.Group( Dataset, 
    {"Category"}, 
    {{"Details", each Table.Max( _, "Amount" ) , type record }} )

You can learn more about the Group By Operation or also read into the All Rows operation and how it works.

Conclusion

And there, you have everything you need to know about removing duplicates in Power Query. Whether you’re dealing with a single column or multiple columns, case-sensitive or case-insensitive duplicates, or advanced methods for keeping the first, last, earliest, latest, lowest or highest values, Power Query has you covered.

Remember, removing duplicates is an important step in cleaning your data. By removing duplicates, you can avoid issues with joins, dimension tables, and prepare your data for your data model.

Just be aware of the randomness problem that can occur with the Table.Distinct function, and use Table.Buffer to ensure consistent results.

Frequently Asked Questions

The most common operations to remove duplicates are by using the Remove Duplicates operation using Table.Distinct or by using a Group By operation using Table.Group.

Power Query always keeps the first value it finds. However, certain optimizations make it that the sort order of your table is not guaranteed. To ensure proper sort order, consider buffering your table using Table.Buffer.

Yes. Since Power Query keeps the first record it finds, you can sort your data before removing duplicates. With the right sorting, you can retrieve the first, last, latest, earlier, highest or lowest value.

Share this post:
  1. I have some dataset like some row data begin with KR-1234-5678-9012 and some row data begins with 12345, EDFCH127HGYU. There are some duplicacy only in KR no which I want to remove. Please tell me power query function to do this task.

    Reply
  2. Sorry, ondanks deze zeer duidelijke uitleg kom ik er echt niet uit. Ik weet dat het echt iets kleins moet zijn. Wat is de syntax van het commando als ik wil dat mijn hele tabel ontdubbeld wordt?

    Reply
  3. Rick sometimes I get into problems that I don’t know how to solve in Power BI/Power Query and I find articles that are not so clear or well constructed, this article is detailed and the examples are so great! Thank you so much I really appreciate the help this article gave me

    Reply
  4. If you ever put a reference book together for all your PQ knowledge – I’ll buy it! Thanks, this cleared up multiple issues. Cheers, Emile

    Reply
  5. Thanks Rick! I used the max option successfully!

    However, is there a way to use a condition instead of the max/min? For example keep rows where [column] is not 0 ?

    Reply
      • I was wondering if you can use Table.StopFolding() to keep the sort when removing duplicates. I guess buffering a huge table can hurt performance. What do you think?

        Reply
        • Unfortunately, you can’t. Table.StopFolding is purely meant to stop the engine from transforming your steps into SQL. Table.Buffer actually saves the values into memory so it remains constant for your remove duplicates operation.

          Reply
  6. Hey Rick, that was an absolutely amazing article on Duplicates. There is small typo for Table.Min which you mentioned to return highest value instead of lowest value.

    Reply
  7. Thanks Rick, another great article. I have done advanced duplicate filtering with the group table function and now I know that I could have obtained the same with the distinct table function with improved performance.

    Reply
    • Thanks Stijn. I’ve not tested performance difference between the grouping and Table.Buffer. If you learn more, I’d love to hear more about it. Cheers!

      Reply

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.