This article delves into the Left Semi Join and Right Semi Join in Power Query. Power Query has long supported 6 types of joins to combine data from different tables: the Left and Right Outer Join, the Inner Join and Full Outer Join, and their anti-join counterparts—the Left Anti and Right Anti Join. These joins help merge or compare tables based on key columns.
With the April release of Power BI Desktop, two new join types were introduced: Left Semi Join and Right Semi Join. Unlike the standard joins, these semi-joins are inaccessible through Power Query’s user interface. However, you can manually specify them as enumeration for the Table.Join or Table.NestedJoin functions. Let’s find out how.
Left Semi Join Behavior
The Left Semi Join in Power Query tests whether rows from the left table have matching values in the right table. It returns only the rows from the left table where a match is found in the right table. It acts as a filter, ensuring that only the relevant rows from the left table—those with corresponding values in the right table—are included in the result. Unlike an Inner Join, the Left Semi Join does not allow users to pull in any data from the right table. To make use of it you can manually provide the JoinKind.LeftSemi enumeration to your join function.
This distinction becomes especially clear when using the Table.NestedJoin function. When performing a nested join, the operation creates a table object that you can expand into new columns. Yet, if you attempt to expand a column from the right table after performing a Left Semi Join, you’ll notice that the expanded columns will always return null
values. This is because the Left Semi Join is purely about filtering the left table and does not retain or return any data from the right table itself.
Example of the Left Semi Join
For instance, suppose you’re working with a factOutage
table that looks as follows:
This subset contains a number of Dates, StoreKeys and OutageKeys. In your dimStore
you have stored different customers. Here’s what those look like:
For you’re analysis you only want to work with data for these two stores. One option is to perform an inner join on the factOutage
table to keep rows from the dimStore
table.
Yet, you could achieve the same by using the Left Semi Join as follows:
Table.NestedJoin(
Source,
{"StoreKey"},
DimStore,
{"StoreKey"},
"DimStore.1",
JoinKind.LeftSemi
)
This code compares the StoreKey columns of the factOutage and DimStore tables and only keeps those existing in both tables. The result of this operation is the following table:
Now, what happens when we expand the nested table object? With other join types (inner, full outer, or anti joins), you can retrieve values from the other table. Yet, that’s not the case with the Left and Right Semi joins. Here’s what happens when expand the table object:
While applying the Table.ExpandTableColumn operation is completely valid, it returns null values for all columns.
SQL: Left Semi Join vs Left Outer Join
The great thing about these joins is that they fold to SQL-based sources. So if you’re more into SQL, how does the Left Semi Join compare to a Left Outer join? Let’s see how the code compares.
The generated SQL code for a Left Semi Join looks something like:
SELECT
A.StoreKey
FROM
factOutage A
WHERE EXISTS (
SELECT 1
FROM dimStore B
WHERE A.StoreKey = B.StoreKey
);
Here the query checks filters factSales
so that it only has StoreKeys that ‘exist’ in the dimStore
table.
The Left Outer Join, on the other hand, uses code like:
SELECT
A.StoreKey
,B.StoreKey
FROM
factOutage A
LEFT OUTER JOIN
dimStore B
ON A.StoreKey = B.StoreKey;
A similar match is made, only this time the query allows you to retrieve fields from the StoreKey table if needed.
The key differences between these two are:
Category | Left Outer Join | Left Semi Join |
---|---|---|
Data Returned | All rows from the left table are returned, along with matching data (or NULL) from the right table. | Only rows from the left table with a matching row in the right table are returned. |
Columns Returned | Returns columns from both tables (or NULL if no match). | Returns only columns from the left table, and null for any others that you would expand. |
Performance | May perform slower than a left semi-join because it retrieves data from both tables and returns all rows from the left table, even those without a match. | Often faster because it only checks for the existence of matching rows in the right table without retrieving any columns from it. Databases often better optimize a simple EXISTS clause. |
Right Semi Join Behavior
A Right Semi Join returns only the rows from the right table (Table B) that have a matching row in the left table (Table A). Like the Left Semi Join, it does not include columns from the other table (Table A in this case), only checking for the existence of a match.
The outcome of this operation is that you get rows from the right table that have corresponding matches in the left table, without retrieving columns from the left table.
Example of the Right Semi Join
Let’s look at an example of how this works. Again, suppose you’re working with below factOutage
table:
If you use a regular merge operation that uses the Table.NestedJoin function and then manually insert the JoinKind.RightSemi enumeration, this will generate the following code:
Table.NestedJoin(
Source,
{"StoreKey"},
DimStore,
{"StoreKey"},
"DimStore.1",
JoinKind.RightSemi
)
The outcome of this operation is below table:
Notice here that:
- The left table now only has a single row available with null values.
- The right table shows all available rows in the dimStore table.
This indicates that the join disregards any values it had in the main table and can only returns values from the right table.
Conclusion
In summary, semi joins—both left and right—are very similar to inner joins in that they only return rows where a match exists between the tables. However, the key difference is that semi joins do not return any columns from the joined table, only from the primary table. This behavior may lead to better performance, especially for SQL-based sources, as the semi join simply checks for the existence of matching rows without retrieving unnecessary data. This makes semi joins a useful option for filtering data based on matches without needing additional columns from the other table.
Recommended Reading…
Power Query Foundations
Beginners Guide to Power Query
List Functions (200+ Examples)
Text Functions (150+ Examples)
Creating Tables (40+ Examples)
Generate the Ultimate Date Table
Advanced Topics
Master List.Generate with Easy Examples
Deep Dive into List.Accumulate
Thank you so much for explaining these two new kinds of join. You’re the most prolific author when it comes to M Language. Your MVP from Microsoft should come with M squared!
Dear Rik,
Is there a performance benefit if left/right semi joins are used with non-folding sources like Excel or csv files?
Kind regards,
J.
Hi,
I haven’t tested this yet. I have a feeling there is no benefit, but some testing is still needed.
Rick
Thanks.
I suppose I can always just use left inner join and delete the columns from the right table if I need this without breaking into M-Code or SQL.
But I’m grateful for having the topic explained.
You could always do an inner join and delete unnecessary columns. But as said, there might be a slight performance benefit when working with SQL based data sources. In the end it’s up to you 🙂