This post shows how to perform a cross join between tables in Power Query. There are cases where you want to return all possible combinations between two tables. Doing this work manually takes a lot of time, but more importantly, will require time from you again and again.
I will teach you an efficient way to generate unique combinations from multiple columns or lists. You will find two methods to produce all possible item combinations from several columns.
Table of contents
You can find the video tutorial below, or if you would rather read it, then continue below. So, what is a crossjoin?
1. What is a Crossjoin?
A cross join is a join type that returns the Cartesian product of rows from the tables in the join. It’s when you combine all the rows from one table with all the rows from another table. You can think of it like a big multiplication table.
There are a few situations where a crossjoin can be super useful. Some situations where a crossjoin can be useful are:
- You prepare your dataset for certain look-ups for budgeting templates. You need all combinations of products and locations.
- You want to prepare a row-level security list. For admins, you want to repeat their email for each of the Properties they should have access to.
- When you want to create a big data set for testing or training a cross-join easily creates a large quantity of data
- One is when you want to create a big list of all possible combinations of two things.
Enough theory; how can you actually perform a crossjoin in Power Query?
2. Sample Dataset
This example uses the following data:
In order, the table names are CountriesCurrencies, BusinessUnits and Metrics. To start out with, get all data into Power Query by repeating the below process for each table:
- select the table -> go to the data tab -> click From Table / Range
- click close & load dropdown -> select close & load to… -> choose only create connection
Repeat the previous steps for all three tables. You should end up with below three queries.
As all your data is in Power Query, the next step is to find all unique combinations of the columns. In SQL, you would do a so-called ‘CROSS JOIN’.
This results in a table containing all records from both tables, whether the join matches or not. So how do you perform a crossjoin?
3. Performing a Crossjoin
There are two easy methods to perform a crossjoin in Power Query. The first method involves helper columns, and the second method pulls the data in straight away.
3.1. Method 1: Using a Helper Column (faster)
The crossjoin we will perform is done using two tables. The first method builds heavily on a column we add that contains a single value. Note that this method is also the fastest method of the 2, but involves a few steps.
Start out by creating a new column in both your tables. This is also the column you will perform a join on later.
Do the following:
- go to the tab add column -> click Custom Column
- name the column Join and fill in = 1 as a custom formula
The value used here is a dummy value; you can use any other value you like. Repeat these steps for the other two queries.
After performing these steps, you end up with a join column at the end of each query. For this example, I first select the Query CountriesCurrencies. To get the data of the other queries, you can perform a merge.
- Go to the tab Home -> select Merge Queries
- Merge the query CountriesCurrencies with BusinessUnits
- Select the column Join as a merge column for both queries
- For the merge, use join kind Full Outer (all rows from both) -> press OK
This will add a new column containing a table object. Perform these steps again, but this time merge the table CountriesCurrencies with Metrics.
The result is two newly added columns containing Table objects. The table objects contain the data we just merged. What’s now left is to reveal this data. For both columns BusinessUnits and Metrics:
- Expand the two arrows in the column header -> deselect column Join
- Deselect the checkbox saying ‘Use original column name as prefix’ -> click OK
The result of these steps is that all the combinations of data have appeared in the query. What’s left is:
- Delete the column Join as we don’t need it anymore.
The end result is a table with all possible combinations of data. Yet there is an easier way to do this: taking less steps.
2.2. Method 2: Formula (easier)
The previous method to produce all possible combinations was my first approach. It’s very effective but does take a few steps to perform and includes unnecessary helper columns. You can achieve the same using a formula. Let’s assume we have all our tables in Power Query, without the Join helper columns. To get all combinations using a formula, do the following:
- go to the tab add column -> click Custom Column
- name the column MergeBusinessUnits
- As a custom column formula, fill in =BusinessUnits -> press OK
This will add the below column:
What this formula does is, it looks for a query called ‘BusinessUnits’ and returns all of its rows in a Table object. In other words, it performs a Full Outer Join with the BusinessUnit query, but without using a helper column.
Perform the previous steps again, but this time fill in Column Name MergeMetrics and Formula =Metrics. For both new columns:
- Expand the two arrows in the column headers of the new columns
- Deselect the checkbox saying ‘Use original column name as prefix’. -> click OK
- Change column types to Text for both newly added columns
And again, you end up with the same table, with 64 rows.
You may feel you can handle creating 64 combinations by hand. Yet, imagine doing something similar when you replace the Business Units, with Brands. You may be working with over 100 brands. This would mean 100 * 4 * 4 = 1600 unique combinations! You better think twice before doing that by hand.
This example created all unique combinations of 4 columns. It included the variables for CountriesCurrencies, Business Units and Metrics. As each table contained 4 variables, it should have 4 * 4 * 4 = 64 unique combinations.
As always, it’s good to know different approaches to handling a situation. The first method involved using a helper column and inputting a single value. Therefore, this method is the faster performing one of the two and is preferred.
The second method involves referencing the table name in a custom column. This method is easier but also slower.
Lastly, if you know other ways to produce unique combinations in Power Query, I would love to read about them in the comments.
Enjoy Power Query!