Looking to dive into the world of Power Query? You’re in the right place! Power Query is an ETL tool: it extracts data from a source, transforms it, and loads it into a destination.
If you’re tired of performing repetitive data transformation tasks or struggling to organize and analyze your data, Power Query is the solution you’ve been looking for. It has a user-friendly interface that allows you to perform data transformation in a visual way and saves your transformations for the next time you come back. This means you can work smarter, and spend your time analyzing your data, not wrangling it.
In this beginner’s guide to Power Query, we’ll take you through the basics of what Power Query is, how it works, and how it can help you transform and analyze your data. By the end of this article, you’ll have a good grasp of Power Query and how it can make your life easier. Let’s get started!
Table of contents
- 1. Introduction to Power Query
- 2. Power Query Components
- 3. Transformation Examples
- 4. Advanced Editor
- 5. Saving your Work
- 6. Conclusion
1. Introduction to Power Query
So what is Power Query in Power BI and Excel, what can it do, and what are its benefits and limitations?
1.1. What is Power Query?
Power Query is a powerful tool from Microsoft used for transforming and preparing data. It allows you to easily connect to and transform data from a variety of sources.
With Power Query, you can make your data clean and organized with just a few clicks. You don’t need to be a computer expert or write a lot of code. To help you transform your data, Power Query uses an intuitive, graphical interface. With every transformation, you see exactly what happens.
In the Power BI ecosystem, the role of Power Query is to perform the extract, transform and load (ETL) of your data. This makes sure the data is in the right shape for your data model.
So what can you do with it?
1.2. Power Query Capabilities
You can accomplish a wide range of data transformation tasks with Power Query. Here are some of the things you can do with Power Query:
- Connecting to data sources: Power Query makes it easy to connect to data sources, including databases, text files, spreadsheets, and cloud services. Once you connect to your data source, you can preview the data, select what you want to use, and transform it as needed.
- Transforming and cleaning data: Power Query includes a range of built-in transformation functions that allow you to quickly clean, reshape, and transform your data. This means you can remove extra spaces, change formats, remove duplicates and more. You can also write custom functions using the Power Query M formula language.
- Combining and appending data: Power Query lets you combine data from different sources to make one big unified dataset. You can merge tables based on matching columns, or append tables vertically to create a larger data set. This even works to combine multiple files saved within a folder.
- Aggregating data: You can use Power Query to group and summarize data, creating aggregate statistics like counts, sums, averages, and more. This is also useful for summarizing a very large dataset that is otherwise too big to import into Excel.
- Adding columns: With Power Query, you can add new columns to your data set, and populate them with calculated values based on other columns in the data set. This makes it easy to create new metrics and calculations that are based on your existing data.
In summary, Power Query can help you work with data in many different ways. So why would you choose Power Query?
1.3. Benefits of Power Query
Power Query has many benefits that can make working with data much easier. Here are some of the benefits of using Power Query:
- Easy to use: Power Query is user-friendly, and you don’t need to be an expert in coding to use it. The visual interface makes it easy to see the result of your data transformation.
- Reshape Data Easily: Power Query allows you to transform, clean, and combine data from multiple sources all in one query. You can see the result of each of your transformations visually.
- Data Remains Intact: Power Query builds its logic by referencing your source data, so your original data is not modified. You can always go back to the original data if needed.
- Automate your Data Transformation: With Power Query, you can automate recurring data transformation. You can create a query once and apply it to new data whenever you need to, which saves you time and effort.
- Connect to Data Sources: Power Query has connectors to many data sources, including databases, web pages, cloud services, and more. This means you can easily import data from a variety of sources.
- Custom Functions: When a function is missing or you want to automate multiple steps, you can create custom functions. You can easily save your logic and use it on multiple queries. Each time you need it, you simply call your function.
- Pricing: Power Query is available for free with Excel and Power BI. This means that you don’t need to pay extra for this powerful data transformation tool.
1.4. Limitations of Power Query
While Power Query has many benefits, there are also some limitations to be aware of. Here are some of the limitations:
- Storing Data: Power Query does not store data physically like a database. This means you will need to load the data into memory each time you use it. You can output the transformed data to a destination, but generally, a refresh will start from scratch.
- Availability: Power Query is not available in every tool, which means that you may need to use a different data transformation tool depending on your needs. A language like SQL is supported in many more tools compared to the Power Query M language.
- Speed: Power Query can be slow when you have many queries in your file, which can lead to frustrating wait times when working with complex datasets.
- Connectors: Power Query has a high number of native connectors but it can’t connect to all data sources. If one is missing, you may need to develop a custom connector.
1.5. Where To Find Power Query
You can find Power Query across several Microsoft products. It is currently available in two forms.
- Power Query Online: Integrated in places like Power BI Dataflows, Azure Data Factory, Dataverse and many other online web experiences.
- Power Query for Desktop: Used in Microsoft Excel and Power BI Desktop.
So where can you find Power Query in Power BI and Excel (the Desktop experience)?
In Power BI you can find Power Query through the Home tab in the ribbon. You can get there in two ways. Either you:
- Connect to a Data Source: Click the Get Data drop-down. Then select the Data source you want to connect to. After going through the wizard, you now end up in the Power Query Editor.
- Open Power Query directly: Click the Transform Data icon to open Power Query. Alternatively, open the Transform Data drop-down and select Transform Data.
Another common place to use Power Query is in Microsoft Excel. Where can you find Power Query in Excel?
To open Power Query in Excel you first navigate to the Data tab, click Get Data and press Launch Power Query Editor.
Alternatively, if you want to connect to a data source directly, you can also select the relevant data source from the same dropdown menu.
1.6. How Power Query Works
Power Query works by first connecting to a data source. You can then extract the required data. Once extracted, you can perform various operations to transform the data, such as filtering, sorting, grouping, and merging.
In the last stage, you load the transformed data into a destination like:
- an Excel worksheet,
- the Data Model in Power BI or Power Pivot
- a Dataflow in the Power BI service
- a Sink in Azure Data Factory
So how do the different components in Power Query work together?
1.7. M Formula Language
To perform transformations, Power Query uses a formula language called M. The M stands for Mashup, because the language is designed around the idea of mixing data sources together. The M language is case-sensitive and uses descriptive function names, such as Table.Sort and Table.SelectRows.
Power Query has a graphical user interface through which you can perform a lot of operations. When you press a button, an action is performed and the corresponding M-code generates automatically. The more advanced users can also write custom code manually through the formula bar or advanced editor. This allows the user to fulfil even the most complex requirements.
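To make this more concrete, here is a minimal sketch of what an M query can look like. The table, its column names and the step names are made up for illustration. In a real query, the first step would usually connect to a data source instead of building a table by hand.

```powerquery
let
    // Hypothetical sample table; normally this step connects to a data source
    Source = #table(
        type table [Product = text, Amount = number],
        {{"Apple", 120}, {"Banana", 80}, {"Cherry", 200}}
    ),
    // The kind of step the UI writes when you filter a number column
    FilteredRows = Table.SelectRows(Source, each [Amount] > 100),
    // M is case-sensitive: Table.Sort works, but table.sort would raise an error
    SortedRows = Table.Sort(FilteredRows, {{"Amount", Order.Descending}})
in
    SortedRows
```

Each line between let and in is one step, and each step builds on the step before it. You can paste a query like this into a blank query to experiment with it.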
2. Power Query Components
Power Query consists of different components. At the heart of Power Query is the Power Query Editor, where you will spend most of your time transforming data. Next, we’ll delve into the three most important Ribbon tabs that you will use for data transformations. Finally, we will discuss how to connect to different data sources.
2.1. The Power Query Editor
The Power Query Editor is where you do most of your data transformation work. It consists of 7 fundamental components that help you when transforming your data.
- Quick Access Toolbar: A customizable toolbar at the top of the Power Query Editor that gives access to your favourite commands. You can save your favourite operations and access them with a keyboard shortcut.
- The Ribbon: A set of tabs that contains a variety of tools and options for working with data in Power Query. The Ribbon is organized by functional area. You will find the most important prebuilt transformation functions in the Home, Transform and Add Column tabs.
- Query Pane: A window on the left-hand side of the editor that displays the list of queries in the Power Query Editor. You can use the Query Pane to organize your queries in folders, rename them, add new ones or delete existing ones.
- Formula Bar: A text box located at the top of the editor that displays the M code for the selected query step. You can use the Formula Bar to review the UI-generated formulas, learn their syntax or edit the M code directly. In case the formula bar is invisible, make sure to turn it on in the View tab of the Ribbon.
- Data Preview: A grid that displays a preview of the data in the current query. You can use the Data Preview to view and verify the results of your transformations and ensure your data is clean and properly formatted. You can click the column’s drop-down arrow to rename, filter or sort your columns. Lastly, there are options to perform actions on the entire table by clicking the Table icon in the top-left corner.
- Properties: Part of the Query Settings pane that displays the Query Name.
- Applied Steps: A list that displays the sequence of transformations that have been applied to the data in the current query. You can use the Applied Steps to view and modify the transformations, or to reorder them if needed.
2.2. Important Ribbon Tabs
Power Query comes with different ribbon tabs. For your data transformations, you will mostly use the Home, Transform, and Add Column tabs.
Each of these three ribbons is designed with a purpose:
|Home Tab|The Home tab contains the most frequently used functions and provides access to data sources. You can use this tab to create parameters, refresh your data, and combine queries by merging or appending them. You can also sort data, apply filters, and change data types.|
|Transform Tab|Operations in the Transform tab take place on existing columns. Here you can perform operations such as (un)pivoting columns, splitting columns, and merging columns. The Transform tab is also the place where you will find data cleaning options such as the ability to replace values, extract specific parts of text, and format data. Lastly, you can transform columns into a different data type or run a Python or R script.|
|Add Column Tab|The Add Column tab is where you can add new columns to your data set. It lets you create your own formulas in a custom column, call a function or use AI to generate a column for you. Here you can also create columns by extracting parts of existing columns or by using context (data type) specific operations.|
2.3. Connect to Data Sources
To get started with Power Query, you first need to create a connection to the data source that you want to import data from. You can connect to databases, spreadsheets, text files, webpages, etc., as well as access public datasets located on cloud storage services such as Google Drive and Dropbox.
To connect to a data source you:
- Navigate to the Home Tab.
- Click the drop-down at New Source.
- Select the desired data source.
There are just a few options available in the drop-down. If you’re looking for a different source, make sure to click More at the bottom. There are hundreds of connectors hidden behind that button!
Once the connection is established, the data will land in the data preview pane. You can then begin extracting and transforming the required data.
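As a rough idea of what a connection looks like in M, the sketch below reads a CSV file and promotes its first row to column headers. The file path is a placeholder and the options (comma delimiter, UTF-8 encoding) are assumptions. The code Power Query generates for your own file may differ.

```powerquery
let
    // Placeholder path; replace it with the location of your own file
    Source = Csv.Document(
        File.Contents("C:\Data\sales.csv"),
        [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.None]
    ),
    // Use the first row of the file as column names
    PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
    PromotedHeaders
```

Other connectors follow the same pattern: the first step calls a source function, and later steps shape the result.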
3. Transformation Examples
In this chapter, we’ll learn how to use Power Query to transform data. Transforming data means changing how data looks or behaves so we can use it in better ways. It is often said that analysts spend as much as 80% of their time cleaning data, and Power Query helps make that work more efficient. Let’s look at some examples.
3.1. Sorting Data
In Power Query, you can sort your data, which means putting it in a specific order based on certain rules. Sorting can be done in two ways: ascending or descending. It is easy to sort your data by clicking the drop-down arrow in the column header, and then selecting “Sort Ascending” or “Sort Descending.”
Once you click Sort Ascending, your text values will be sorted alphabetically. You can review the syntax of the generated formula in the picture below.
If you want to sort by multiple columns, simply repeat these steps for the next column and it will be added to the sorting formulas.
The user interface automatically generates the relevant formulas, including the Table.Sort function. This means that as a beginner, you don’t need to do any of the complicated work. However, as you become more comfortable, you can apply more advanced sorting logic.
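For reference, a sort on two columns could look like the sketch below. The sample table and its column names are hypothetical. The sort itself uses Table.Sort with one entry per column, in order of priority.

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        type table [Region = text, Customer = text, Sales = number],
        {
            {"West", "Dana", 300},
            {"East", "Carl", 150},
            {"East", "Ann", 400},
            {"West", "Bob", 500}
        }
    ),
    // Sort by Region (A to Z) first, then by Sales (high to low) within each region
    SortedRows = Table.Sort(
        Source,
        {{"Region", Order.Ascending}, {"Sales", Order.Descending}}
    )
in
    SortedRows
```

The order of the entries matters: the first column listed is the primary sort, and the next one only decides the order of rows that tie on the first.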
3.2. Data Types
Data types are an important part of Power Query that helps you manage and transform your data. In Power Query, data types help you identify the type of data you are working with, such as numbers, text, or dates. This is useful because some operations are supported by one data type and not by another.
There are many different data types available in Power Query. Some of them include dates, times, text, duration, numbers, and logical values. You can easily assign data types to columns. In the picture below, you can tell what data type a column is by looking at the icon on the left side of the column header. If you click the icon, you can choose to transform the column to a different data type.
By understanding the different data types available in Power Query, you can transform your data in your desired ways. One area where the importance of data types becomes especially clear is when you’re filtering data.
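Changing data types through the column header icon produces a step based on Table.TransformColumnTypes. Below is a small sketch with made-up columns that all start out as text. The "en-US" culture is an assumption, added so the decimal point is read the same way regardless of your regional settings.

```powerquery
let
    // Hypothetical sample data where every value is still text
    Source = #table(
        {"OrderDate", "Amount", "Quantity"},
        {{"2024-03-01", "19.99", "3"}, {"2024-03-02", "5.50", "12"}}
    ),
    // Assign a data type to each column in a single step
    ChangedType = Table.TransformColumnTypes(
        Source,
        {{"OrderDate", type date}, {"Amount", type number}, {"Quantity", Int64.Type}},
        "en-US"
    )
in
    ChangedType
```

Once the types are set, number and date specific operations become available on these columns.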
3.3. Filtering Data
Filtering data allows you to show only the data that meets certain criteria. You can filter data based on specific values, ranges, or other criteria. Let’s look at a few examples.
The most basic form of filtering is manually selecting which values to keep. Click the drop-down arrow in the column header and tick the values you want to keep.
If you want to go further, you can use filters that meet certain criteria. The filters provided by Power Query UI are context-dependent because different data types require different filters.
For example, when you click on Text Filters in the above dropdown, you can indicate the text should contain the letter “e”, which would return the below result.
Likewise, on a number column, you can return all values between 150 and 200 by selecting the filter Between…
What’s good to remember is that when you click the dropdown in a column header, the menu that appears will change based on the data type of the column. This means that text columns show different options than date or number columns. Below you can find the most common Data Type Filters:
Lastly, you’ll find filters for columns with the Date Type. The picture below shows a summary of what you can do with Date Filters:
No matter which filter you choose, behind-the-scenes Power Query translates this to the corresponding M code. The UI just makes sure that these actions are within reach even to those who don’t know the M-language formulas yet.
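The two filters described above, text that contains the letter “e” and numbers between 150 and 200, could translate to M roughly as follows. The sample table is hypothetical. Table.SelectRows is the function that keeps only the rows for which the condition is true.

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        type table [Product = text, Amount = number],
        {{"Apple", 160}, {"Kiwi", 180}, {"Cherry", 250}, {"Melon", 150}}
    ),
    // Text filter: keep rows where Product contains the letter "e" (case-sensitive)
    ContainsE = Table.SelectRows(Source, each Text.Contains([Product], "e")),
    // Number filter "Between": both boundaries are included
    Between150And200 = Table.SelectRows(
        ContainsE,
        each [Amount] >= 150 and [Amount] <= 200
    )
in
    Between150And200
```

Here the filters are applied one after another, so only rows that pass both conditions remain.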
3.4. Transforming Columns
Transforming columns is an essential part of working with data in Power Query. It allows you to make changes to the data within a column, like combining columns, splitting a column into separate parts or capitalizing words. For most column transformations you will use the Transform Tab in the ribbon. Here are a few examples of how you can transform a column.
3.4.1. Merge Columns
If you have some columns with related data, you may want to combine columns into a single column. For example, you might have a column for First Name and one for Last Name. You can easily merge columns to retrieve the full name.
To do that, hold CTRL and select both the First Name and Last Name columns. Go to the Transform Tab in the ribbon and press Merge Columns. Choose a space as a separator and name the column Full Name.
Now press OK and Power Query will concatenate the text values into a single value.
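Behind the scenes, merging columns relies on Table.CombineColumns. The sketch below uses made-up names and the space separator described above.

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        {"First Name", "Last Name"},
        {{"Ada", "Lovelace"}, {"Alan", "Turing"}}
    ),
    // Replace both columns with one "Full Name" column, separated by a space
    MergedColumns = Table.CombineColumns(
        Source,
        {"First Name", "Last Name"},
        Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),
        "Full Name"
    )
in
    MergedColumns
```

Note that the original two columns are removed in this step. If you want to keep them, the Add Column tab has its own Merge Columns button that adds the result as a new column instead.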
3.4.2. Formatting Text
Sometimes, data may be stored as text, but you need to format it in a different way. With messy data, you could have a mix of uppercase, lowercase or capitalized words. Power Query easily lets you transform the text into the desired form.
To get started, go to the Transform Tab, click on Format and select the desired options. You have options to change the capitalization of letters, remove excess spaces and characters or add a prefix/suffix to your words.
The below picture shows the effect of lowercasing your text and of capitalizing each word.
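In M, these formatting options map to text functions that are applied to a column with Table.TransformColumns. The sketch below trims spaces, lowercases one column and capitalizes each word in another. The table and its column names are assumptions.

```powerquery
let
    // Hypothetical messy sample data
    Source = #table(
        type table [Name = text, City = text],
        {{"  aNNa de VRIES ", "NEW york"}, {"BOB smith", "san FRANCISCO"}}
    ),
    // Remove leading and trailing spaces from the Name column
    Trimmed = Table.TransformColumns(Source, {{"Name", Text.Trim, type text}}),
    // Lowercase the Name column
    Lowercased = Table.TransformColumns(Trimmed, {{"Name", Text.Lower, type text}}),
    // Capitalize each word in the City column
    Capitalized = Table.TransformColumns(Lowercased, {{"City", Text.Proper, type text}})
in
    Capitalized
```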
3.4.3. Replacing Values
If you have a column with values that need to be changed, you can replace them with new values. For example, you might have a column with sentences that contain the word ‘horse’, and you would like to replace it with ‘pony’.
To do this, go to the Transform tab and click Replace Values. Then enter both the value you want to find and the value you want to replace it with.
Replacing values is a simple and effective way to clean up your data and ensure its accuracy. By following these steps, you can easily replace values in Power Query and create a cleaner, more organized data set.
To learn about more advanced scenarios, you can find step-by-step explanations in the complete guide to replacing values.
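The horse-to-pony replacement could look like this in M. The sample sentences and the column name are made up. Replacer.ReplaceText replaces the text wherever it occurs inside a value.

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        type table [Sentence = text],
        {{"The horse ran away"}, {"A Horse is fast"}, {"No animals here"}}
    ),
    // Replace the text "horse" with "pony" in the Sentence column.
    // The match is case-sensitive, so "Horse" in the second row is left unchanged.
    ReplacedValue = Table.ReplaceValue(
        Source,
        "horse",
        "pony",
        Replacer.ReplaceText,
        {"Sentence"}
    )
in
    ReplacedValue
```

If you need to replace entire cell contents rather than part of the text, Replacer.ReplaceValue is the counterpart to use.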
3.5. Pivot Columns
Pivoting columns in Power Query is a useful feature that allows you to reorganize your data. When you pivot your data, you take a single column with many values and spread them out over several columns. Here’s how to pivot columns in Power Query:
- Select the column you want to pivot
- Go to the Transform tab
- Select Pivot Columns
- Choose the values column: In the Pivot Columns pop-up menu, select the values column you want to summarize. Choose the aggregation function you want to use; in most cases, you will use Sum.
- After you’ve selected the values column and the aggregation function, click OK to pivot your data.
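The steps above result in a call to Table.Pivot. Here is a sketch with a hypothetical table in which the Month column is pivoted and the Sales column is summed.

```powerquery
let
    // Hypothetical sample data in "long" format
    Source = #table(
        type table [Product = text, Month = text, Sales = number],
        {
            {"Apple", "Jan", 10},
            {"Apple", "Feb", 20},
            {"Banana", "Jan", 5},
            {"Apple", "Jan", 7}
        }
    ),
    // Turn each distinct Month into its own column and sum Sales per Product
    Pivoted = Table.Pivot(
        Source,
        List.Distinct(Source[Month]),
        "Month",
        "Sales",
        List.Sum
    )
in
    Pivoted
```

The result has one row per product and one column per month. Because the aggregation is a sum, the two Apple rows for Jan are added together.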
3.6. Group or Summarize
Aggregating or grouping data in Power Query is a powerful way to summarize and analyze your data. Here are three ways in which it can help:
- Decreases Rows: If you have a lot of rows in your data, grouping can help reduce the number of rows. This can be especially helpful if your data doesn’t fit in Excel.
- Performance Optimizing: If you want to speed up your DAX calculations, you can summarize your data with grouping. This can make your data model run faster.
- Perform Aggregations: Grouping data lets you perform calculations on groups. For example, you can count the number of times an item appears, create a running total, or calculate total sales by group.
So how can you group your data?
- Select the column you want to group by and click Group By on the Transform tab.
- In the pop-up menu, the selected column should appear as Group By column.
- If you want to do multiple calculations, select the Advanced option in the dialog and add one line for each calculation. For example, you can add a line for the amount column and one for the quantity column. Then choose the Sum operation for each.
When you press OK, your data will be summarized by the chosen column, and all underlying values will be aggregated using the sum operation. If you want to learn more about this topic, check out the article on Grouping Data in Power Query. You can even write custom code to perform additional Group By operations that are not available through the user interface.
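Grouping with two sum aggregations, one for amount and one for quantity, could look like the sketch below. The table, column names and output column names are assumptions.

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        type table [Product = text, Amount = number, Quantity = number],
        {
            {"Apple", 100, 2},
            {"Apple", 50, 1},
            {"Banana", 30, 3}
        }
    ),
    // One output row per Product, with a sum for each aggregated column
    Grouped = Table.Group(
        Source,
        {"Product"},
        {
            {"Total Amount", each List.Sum([Amount]), type number},
            {"Total Quantity", each List.Sum([Quantity]), type number}
        }
    )
in
    Grouped
```

Inside each aggregation, [Amount] refers to the list of amounts for that group, which is why a list function such as List.Sum is used.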
So far, this chapter was about transforming existing columns. What can you do to create new columns?
3.7. Adding Columns
Adding columns in Power Query is a powerful feature that lets you add new data to your data set using custom logic. You can easily create if statements, concatenate data and create custom formulas. Let’s look at a few examples of how you can use this feature.
3.7.1. Conditional Column
With an if statement you can return a value based on a condition, a so-called conditional column. For example, you might want to add an if statement that identifies the quantity of a small, medium, or large package.
A low-code way to do this is to:
- Go to the Add Column Tab in the ribbon.
- Click on Conditional Column
- Then, fill in the conditions in the dialog box. Power Query turns them into an if statement like:
if [Package] = "Small" then 3 else if [Package] = "Medium" then 5 else if [Package] = "Large" then 10 else null
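In a full query, this if statement sits inside a Table.AddColumn step. The sketch below uses a made-up table and the new column name Quantity, both of which are assumptions.

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        type table [Order = text, Package = text],
        {{"A-1", "Small"}, {"A-2", "Large"}, {"A-3", "Medium"}, {"A-4", "Unknown"}}
    ),
    // Add a column whose value depends on the Package column
    AddedQuantity = Table.AddColumn(
        Source,
        "Quantity",
        each if [Package] = "Small" then 3
            else if [Package] = "Medium" then 5
            else if [Package] = "Large" then 10
            else null,
        Int64.Type
    )
in
    AddedQuantity
```

Rows that match none of the conditions, like the "Unknown" package here, end up with null.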
3.7.2. Custom Column
In case you need more specific logic and feel comfortable writing code, you can also create a custom column. The custom column allows you to create custom formulas and functions. Imagine you have a table with Countries and Product Prices Excluding VAT. You want to create a price that includes VAT, but each country has a different VAT percentage.
You can’t create this if statement in a single step using the Conditional Column dialogue box. But if you master creating conditional if statements in Power Query this process is relatively easy.
Go to the Add Column tab, select Custom Column and name your new column. In the custom column formula box, you can write the if statement. Now press OK to review your newly created column.
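A custom column for the VAT scenario could look like the following sketch. The column names and the VAT rates are illustrative assumptions, so check the rates that apply to your own data before reusing them.

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        {"Country", "Price Excl VAT"},
        {{"Netherlands", 100}, {"Germany", 100}, {"Elsewhere", 100}}
    ),
    // Apply a different (illustrative) VAT rate per country
    AddedPriceInclVat = Table.AddColumn(
        Source,
        "Price Incl VAT",
        each if [Country] = "Netherlands" then [Price Excl VAT] * 1.21
            else if [Country] = "Germany" then [Price Excl VAT] * 1.19
            else [Price Excl VAT],
        type number
    )
in
    AddedPriceInclVat
```

In the Custom Column dialog you would only type the part after each. Power Query adds the surrounding Table.AddColumn step for you.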
3.8. Merge or Append Queries
Combining Queries in Power Query can be useful when you have data stored in multiple files, sources or queries that you want to combine into a single table. Power Query provides two ways to combine data: Appending Queries and Merging Queries.
3.8.1. Append Queries
Appending Queries is a way to combine data vertically, that is, to stack data tables on top of one another. For example, if you have sales data for different months stored in separate files, you can use the Append Queries function to combine them into a single table.
In the below picture, the Sales March and Sales April data come from different Excel files. They currently exist in different queries. To combine them into a single query, you can:
- Go to the Home Tab and select Append Queries
- Select the other Table from the dropdown menu
- Press OK. You are now left with a single table that contains sales data from both March and April.
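In M, appending is done with Table.Combine. In the sketch below the two monthly tables are built by hand so the example is self-contained. In practice, SalesMarch and SalesApril would be separate queries that you reference by name.

```powerquery
let
    // Hypothetical stand-ins for the two existing queries
    SalesMarch = #table(
        type table [Month = text, Product = text, Amount = number],
        {{"March", "Apple", 100}, {"March", "Banana", 40}}
    ),
    SalesApril = #table(
        type table [Month = text, Product = text, Amount = number],
        {{"April", "Apple", 120}}
    ),
    // Stack the tables on top of each other; columns are matched by name
    Appended = Table.Combine({SalesMarch, SalesApril})
in
    Appended
```

Because columns are matched by name, it helps to make sure both tables use identical column names before appending.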
3.8.2. Merge Queries
Merge Queries is a way to combine data horizontally, that is, to add columns from one table to another based on matching values. For those familiar with other approaches, in SQL you would join tables whereas in Excel you can achieve this with VLOOKUP.
For example, if you have sales data in one table and discount data in another, you can use the Merge Queries operation to combine tables into a single table.
To do that you:
- Navigate to the Home Tab
- Click Merge Queries
- Select the table to retrieve new data from
- Determine the Join Kind (in this case Left outer), click OK
Once this is done, Power Query creates a new column that contains a table object in each cell. To turn this into regular column values, click the two arrows at the right of the column header, select the desired column and press OK.
And with that, you have seen the entire process of merging tables in Power Query.
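The merge and expand steps described above could be sketched in M as follows. The Sales and Discounts tables and their columns are hypothetical. The join kind is Left outer, as in the steps above.

```powerquery
let
    // Hypothetical stand-ins for the two existing queries
    Sales = #table(
        type table [Product = text, Amount = number],
        {{"Apple", 100}, {"Banana", 40}, {"Cherry", 75}}
    ),
    Discounts = #table(
        type table [Product = text, Discount = number],
        {{"Apple", 0.1}, {"Banana", 0.05}}
    ),
    // Left outer join: keep every Sales row and attach matching Discounts rows as a nested table
    Merged = Table.NestedJoin(
        Sales, {"Product"},
        Discounts, {"Product"},
        "Discounts",
        JoinKind.LeftOuter
    ),
    // Expand the nested table into a regular Discount column
    Expanded = Table.ExpandTableColumn(Merged, "Discounts", {"Discount"}, {"Discount"})
in
    Expanded
```

Products without a match in the second table, like Cherry here, stay in the result with null in the new column.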
It’s important to note that both Appending and Merging Queries may require some data cleaning and transformation to ensure that the data format and types are compatible and match correctly. However, once you have mastered these functions, combining data in Power Query becomes a breeze.
4. Advanced Editor
The Advanced Editor is a powerful tool in Power Query that allows you to delve deeper into the M-code generated by your applied steps. You can access the Advanced Editor by clicking on the “Advanced Editor” button in the Home tab.
The Advanced Editor shows the M language code that Power Query generates based on your transformations. It’s useful for experienced users who want to create complex transformations that can’t be done through the standard interface. For beginners, it may seem intimidating, but it can be a useful tool to learn as you become more familiar with Power Query Syntax.
When you open a query, it could look like the picture below.
You will find that a query in the Advanced Editor typically opens with a let statement. The lines that follow define variables, one for each applied step, separated by commas. The word in marks the end of these definitions. The expression that follows on the last line, here the step AddLengthTest, is what the query returns.
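As an illustration of this structure, here is a hypothetical query of the same shape. Only the step name AddLengthTest is taken from the text above. The table, the other step and the logic are assumptions and are not the query from the picture.

```powerquery
let
    // Step 1: a hypothetical sample table
    Source = #table(type table [Word = text], {{"Power"}, {"Query"}, {"M"}}),
    // Step 2: add the number of characters in each word
    AddLength = Table.AddColumn(Source, "Length", each Text.Length([Word]), Int64.Type),
    // Step 3: add a true/false test based on the previous step
    AddLengthTest = Table.AddColumn(AddLength, "Is Long", each [Length] > 3, type logical)
in
    // The query returns the result of the last step
    AddLengthTest
```

Each step name on the left matches an entry in the Applied Steps list, and each step refers to the step before it by name.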
Overall, the Advanced Editor is a valuable tool in Power Query that allows you to create more complex and precise data transformations. Although it may seem intimidating at first, it’s a tool that can help you gain a better understanding of the Power Query M language and the full capabilities of Power Query.
5. Saving your Work
After you have completed the query and transformed the data in the way that you want, you will need to save your work. There are two main options available for you to do this:
- Close & Apply: Clicking Close & Apply loads your queries to the desired destination. This can be an Excel Table, the Power BI data model, a dataflow or any destination connected to your tool. This operation also closes the Power Query Editor.
- Apply: Performs the same action as Close & Apply, except it keeps the Power Query window open.
6. Conclusion
Power Query is a powerful and user-friendly data transformation tool that comes with Excel and Power BI. It offers a variety of built-in functions to connect to different data sources, reshape data, and automate your data transformation.
While there are some limitations to Power Query, including speed and availability issues, it remains a valuable tool to help you transform your data and generate insights that you can use for your business.
In this article, we’ve covered the benefits of Power Query, including its gentle learning curve, custom functions, and the ability to reshape data easily. We’ve also looked at the Power Query components, including the Power Query Editor, ribbon tabs, and how to connect to data sources.
Overall, Power Query is a free and accessible tool that can help anyone work with data. If you haven’t already, it’s time to explore Power Query and take your data analysis to the next level.