Written By Rick de Groot

Rick is the founder of BI Gorilla. He believes learning is one of life's greatest pleasures and shares his knowledge to help you improve your skills.

Removing excess spaces between words is not easy in Power Query. Despite Power Query’s inbuilt function for managing leading and trailing spaces, removing duplicate spaces within a string can be challenging.

This article will showcase three effective methods to tackle this issue gracefully. We’ll delve into strategies from creating recursive functions with the ‘@’ operator to employing List.Generate for iterative replacements. We’ll also explore a unique approach to splitting and combining text values. By the end of this guide, you’ll have a clear understanding of how to replace any series of consecutive spaces with a single one in Power Query.

Table of contents

Removing Spaces Excess Spaces

Our goal is straightforward: replace any consecutive spaces with a single one. There are several approaches to this, and this article covers three.

Recursion using the @ operator

When dealing with consecutive spaces, one approach involves using recursion in conjunction with the @ operator. Recursion enables a function to call itself, making it a suitable technique for iterative operations. By incorporating the @ operator, we can achieve efficient and concise recursion in Power Query.

To begin, we start with a basic statement that replaces double spaces with single spaces in a text value:

We start off with a basic statement such as:

Text.Replace( "G  o      r  ill   a", "  ", " " )  // Returns "G o   r ill  a"

Next, we improve this operation by making it recursive, ensuring that the replacement of double spaces continues until the desired outcome is achieved. This is accomplished through the use of an “if” statement within our recursive function:

fxRemoveSpaces = ( x as text ) as text =>
   if Text.Contains( x, "  " )  
      then Text.Replace( x, "  ", " " )
      else x

To make this function recursive, we use the @ operator to call itself:

fxRemoveSpaces = ( x as text ) as text =>
   if Text.Contains( x, "  " )  
      then @fxRemoveSpaces( Text.Replace( x, "  ", " " ) ) 
      else x

In the above example, @fxRemoveSpaces calls the function itself recursively. By applying this function to a text value, we achieve the desired outcome of removing consecutive spaces.

In this case, @fxRemoveSpaces calls the function itself recursively. Applying this function to an example, we get:

// Below example returns "G o r ill a "
let
  fxRemoveSpaces = ( x as text ) as text =>
    if Text.Contains( x, "  " ) 
       then @fxRemoveSpaces( Text.Replace( x, "  ", " " ) ) 
       else x,
  result = fxRemoveSpaces( "G  o      r  ill   a " )
in
  result

To apply recursion and the @ operator to a custom column, such as ‘[MyText], we can use the Table.AddColumn function:

Table.AddColumn(
  Source, 
  "CleanText", 
  each 
    let
      fxRemoveSpaces = ( x as text ) as text =>
        if Text.Contains( x, "  " ) 
           then @fxRemoveSpaces( Text.Replace( x, "  ", " " ) ) 
           else x, 
      result = fxRemoveSpaces( [MyText] )
    in
      result, 
  type text
)

By incorporating the recursive function within the Table.AddColumn function, we create a new column, “CleanText,” which contains the text value after the removal of consecutive spaces.

Considerations: It’s important to note that recursion with the @ operator can sometimes consume significant memory and may not be the most performant method for large datasets. As an alternative, we will explore other approaches in the next section to optimize performance and memory allocation.

Iteration with List.Generate

When it comes to tackling the challenge of removing consecutive spaces, an alternative method involves harnessing the power of List.Generate. This versatile function allows for iterative actions based on specified conditions. By utilizing List.Generate, you can iterate through the text value, checking for multiple spaces and replacing them as needed.

To illustrate the application of List.Generate, let’s consider the text value “G o r ill a” that contains multiple consecutive spaces. By implementing List.Generate, we can progressively replace the multiple spaces.

/* Returns three list items, each with another replacement.
{ 
"G  o      r  ill   a", 
"G o   r ill  a", 
"G o  r ill a" 
} 
*/
List.Generate(
    () => "G  o      r  ill   a", 
    each Text.Contains( _, "  " ), 
    each Text.Replace( _, "  ", " " )
)

The List.Generate function starts with the original text value and then checks whether it contains consecutive spaces using the Text.Contains function. It then iteratively replaces the consecutive spaces with a single space using the Text.Replace function. The output is a list of values, each representing the text after a replacement iteration.

While the list of replacements is valuable, our primary focus is obtaining the final clean string. We can extract the last value from the list. However, a remaining double space is still present in the string. So to make sure we get that last replacement in order, we can use the optional 4th argument to provide an additional replacement in the selector.

// Returns "G o r ill a"
let
    Replacements = 
        List.Generate(
            () => "G  o      r  ill   a", 
            each Text.Contains( _, "  " ), 
            each Text.Replace( _, "  ", " " ),
            each Text.Replace( _, "  ", " " )
        ),
    Result = List.Last( Replacements )
in
   Result

By adding this extra replacement and using the List.Last function, we obtain the final clean string, ensuring the removal of consecutive spaces.

To avoid unnecessary code repetition, we can convert the replacement action into a function. This approach allows us to call the function twice within List.Generate, shorten the code and improve maintainability.

// Returns "G o r ill a"
let
    Replacer = (t) => Text.Replace( t, "  ", " " ),
    Replacements = 
        List.Generate(
            () => "G  o      r  ill   a", 
            each Text.Contains( _, "  " ), 
            Replacer,
            Replacer 
        ),
    Result = List.Last( Replacements )
in
   Result

By capturing the replacement logic in the Replacer function, we achieve a more concise and elegant solution, while still ensuring the removal of all consecutive spaces.

To further explore the capabilities of List.Generate, make sure to read this comprehensive article on List.Generate.

Iteration with List.Accumulate

In our quest to remove consecutive spaces, we can employ the List.Accumulate function, which is particularly useful when the number of iterations is predetermined. Although the exact number of iterations required may not be known when dealing with strings, we can make an assumption to ensure efficient processing.

To tackle the challenge of consecutive spaces in a string, it’s important to estimate the maximum number of consecutive spaces that could occur. By considering the total length of the string, we can determine an approximate maximum. An effective approach is to divide the length by 2 using the Number.IntegerDivide function. This division will provide the integer portion of the result, serving as an estimation of the maximum number of consecutive spaces.

Let’s take a practical example to illustrate the usage of List.Accumulate. Suppose we have a string, “G o r ill a”, that contains multiple consecutive spaces. We can employ List.Accumulate to remove the excess spaces and obtain a clean version of the string.

let
   Source = "G  o      r  ill   a", 
   Result = 
      List.Accumulate(
         List.Repeat( { "  " }, Number.IntegerDivide (Text.Length( Source ), 2 ) ), 
         Source, 
         ( state, current ) => Text.Replace( state, current, " " )
  )
in
  Result

The List.Accumulate function operates by iterating through a list of repeated consecutive spaces (based on our assumption) and progressively replacing them with a single space using the Text.Replace function. The final result will be a well-formatted string with no excess consecutive spaces.

To explore the capabilities of List.Accumulate further, refer to the comprehensive article on List.Accumulate.

Splitting and Combining Text Values

In contrast to the previous approaches, this method takes a different route. Rather than replacing consecutive spaces, we remove all spaces and reintroduce them later in the process.

The first step is to split the text value by each occurrence of a space. You can make use of the Text.Split function to accomplish this.

Text.Split( "G  o      r  ill   a", " " )

The above code will return a list with individual elements representing the split portions of the text value. For instance, the result could be {"G", "", "o", "", "", "", "", "", "r", "", "ill", "", "", "a"}.

To remove the empty strings (representing consecutive spaces) from the list, you can use the List.RemoveItems function.

let 
   TextSplitBySpace = Text.Split( "G  o      r  ill   a", " " ),
   ListWithoutEmpty = List.RemoveItems( TextSplitBySpace, {""} ),
in
ListWithoutEmpty 

The code snippet above removes the empty strings from the list, resulting in {"G", "o", "r", "ill", "a"}. Now, we have a list containing the individual words or elements of the original text value.

Since the remaining items in the list originally had at least one space between them, we can now reintroduce the spaces using the Text.Combine function.

let 
  // Split string into words
  TextSplitBySpace = Text.Split( "G  o      r  ill   a", " " ),
  // Remove empty strings (consecutive spaces)
  ListWithoutEmpty = List.RemoveItems( TextSplitBySpace, {""} ),
  // Combine words with single space
  TextCombined = Text.Combine( ListWithoutEmpty, " " ) 
in
TextCombined

The final result of the code snippet above is “G o r ill a,” where the consecutive spaces have been removed, and a single space is inserted between each word.

Performance Comparison

Now that we have explored and understood the four different methods for removing consecutive spaces in Power Query, it’s time to compare their performance in terms of execution time. By using the diagnostic tools, we can compare the performance of the different approaches.

I cooked up an example to test the different approaches. The test results are as follows:

ApproachResultΔ % from fastest
Recursion using @1.1643979+ 5852 %
Iteration with List.generate0.7740501+ 3857 %
Iteration with List.Accumulate0.0394766+ 102 %
Splitting and Combining values0.0195634+ 0 %

When it comes to performance, the Splitting and Combining Values method outperforms the other approaches, offering the fastest execution time for removing consecutive spaces in Power Query. However, it’s important to consider the trade-offs between performance and other factors such as code complexity, readability, and maintainability. Each method has its own advantages and considerations.

It’s important to note that the execution times mentioned above are specific to the example test scenario and may vary depending on the dataset and complexity of the transformations applied.

To verify and evaluate the performance of these methods in your own context, I encourage you to execute the following query using your data:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("vVtLc9s2EP4rrs+djKeHpJNLDu10kkNmOtPcHB9AcCkgArEsAIqWf31JkXYoESDxYr+TLJMguNjHtw89Pt5/Y1zfBSL4DuJzkSKyxNprPQ3SgKQw/67jhk0f61YY3giw3BkHilIDbQ0/WdfUDaGg3/Wf7p9+fbz/DCrt0eEnMoJINAzUfGcWQV1jJrYJhilw3LAhhzD8lFocLrL+xq538m/L6fHnn4XCTk6fK3yOecyPtm48zgNPV4L3hHnbviAv54AbSzzMJXeRxRffmwVGnR8Qw+Uh4sYlGv7ycnEKS/WzP/rZKC8nsgLKADSMUrvI659J+BqEiLQ4JzQQza7XLULO1xOGJRniZZeoIN4IR4xuj+TwCmdsU5eA040PBGm/MFWj5g9hiA7r0IaoT6OBGsu/O5Jb95ZYf9GSqOPqBcmPl2X4TdqgqrdNRvIDMyn6aw0jF1Byc17coUcBSDRYL7DbfWdALCXxhq8tUkG05nT+DeOiVCC/tw8Pv31w7LNAPMZpSVho9QPIH5g3GjSCnJfux1vbGt5zxzw7sRuJ3ZIrJ9PMCMPr5AB39+Yn9jKEbaaeCDbGmQrbCJ5658v9V0i929Fmg27dHjr14DTjMpH+FmoIVp4XGyzJ+VaQU36ZsglPzMRFREfObukJYjJsqMI4tdxGDTAws4RMcxT7nzYHGUhXOyLzx+YBJsR7H6aLsd3YTNZDKbmUoD6lL/QakSmZYr6vSAsQHHIkKa/gZpV0eIIIBaS8DpB/Kd6b/y9J646C+uogGhU5oeLRpktR7Gaxcc66EG2OOL+Fq9JBHjCynR9lzFUHFDwlkumaR1VX0RKeWUzdzIKqJx67H78nh6nxxKdrhuzfzLU5U3L2+8eHhywL3eLvr1dS3CP9eYMEuNV8eyAr2sR0SWOdZrEHhRQUv+G2y7KOIUcYj9t081cJCtD+MEQeMKva5zpwlqMcOPd6jaWMF1VZSthDCCQa2Co75E8qd8sjWWqdsorrGoX3nJIaSyFZqj4meKVcZ04FkNAIuofZ9Lm3XOSrWXE5mj9IehHWmbQ1RMecynpoyV7y1USYDGnUOoLDQEhNc8h3QmspgyuNNpllUdKjtXDxJNut/GV9Mc2r71dqjzUdeCb19jgFVnGr32Aj3L7GN6vEt7bYH6exn/uCpocgnNJ/2Iu9v+GGxqdjpfRo09jM+aodhohBHfblgHWrOeVE5uohxNcOOkwcLLIg+pwKID0pq1qRczc2KO/zLRVoB+9zOIipNryy5oX2Fs4Rgmj0W6UMylb4vN0seTRYo1LY/T+ljr0S/2GAyad4K/hxW+N1w6krmleIZYCoPOOIo8+kITqE+MHSW83CLjvQSTMUHkhQpw52apr4FmXiZNw7SOoz23Z37qUPytLtsyNvzSBrvSC5RrCA5lFDmyGwFhhSh1P9kbst3r8OBbVuMN4zZ6QmL318DOtD2fP0+ME+BqKJvNWGxEKwA56DrK8wo/02CgsBtVc2v0PjyQnVyuyRbBjLTKvi98xnGMHYzyoTOU+G7v5mRDKKnCCdc+cpSvUJiQihVw6Eu8ChGBT6lNiAaWtaDrgSoSG9Hb9zT/lm2FJqRj1P2HqykbjaFbA1mpcp7Ve7ClDiQf/jkYFClSA45djuQxUi/FGF6oDXGuhV6rThCCtjX1F9bIZZZjG3sEKRd+qPTcxX90FurVKQ9QdUS7hZazaW2cefDGPwFhCleqms69v7qLLpZik0urGU+GOVqd20TUjvn57+Aw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Sentence = _t]),
    Table_repeat = Table.Repeat( Source, 10000 ),
    TextSplit = Table.AddColumn(Table_repeat, "TextSplit", each let 
  TextSplitBySpace = Text.Split( [Sentence], " " ),
  ListWithoutEmpty = List.RemoveItems( TextSplitBySpace, {""} ),
  TextCombined = Text.Combine( ListWithoutEmpty, " " ) 
in
TextCombined, type text ),
    ListAccumulate = Table.AddColumn(Table_repeat, "ListAccumulate", each let
 Source = [Sentence],
 Result = List.Accumulate(
 List.Repeat({"  "}, Number.IntegerDivide(Text.Length(Source),2)),
 Source,
 (a,b) => Text.Replace(a,b," ")
 )
in
 Result, type text ),
    ListGenerate = Table.AddColumn(Table_repeat, "ListGenerate", each let
    Replacer = (t) => Text.Replace(t, "  ", " "),
    Replacements = 
        List.Generate(
            () => [Sentence], 
            each Text.Contains(_, "  "), 
            Replacer ,
            Replacer 
        ),
    Result = List.Last( Replacements )
in
   Result, type text ),
    #"Recursion@" = Table.AddColumn(
  Table_repeat, 
  "@recursion", 
  each 
    let
      fxRemoveSpaces = (x as text) as text =>
        if Text.Contains(x, "  ") 
           then @fxRemoveSpaces(Text.Replace(x, "  ", " ")) 
           else x, 
      result = fxRemoveSpaces([Sentence])
    in
      result, 
  type text
)
in
    #"Recursion@"

Conclusion

Cleaning up extra spaces in your text can be a bit of a puzzle, but as you’ve seen, Power Query has different ways to deal with them.

First, we saw how the ‘@’ operator can repeatedly call itself until the task is done. Next, we checked out how List.Generate and List.Accumulate iterated over a list of values. And lastly, we split values into pieces and then put them back together.

With these handy methods, you’re now ready to tackle any extra spaces that might show up in your text data.

Happy querying!

Share on:

Latest from my blog

  1. On given dataset of yours, List.Select is quite faster than List.RemoveItems
    ListWithoutEmpty = List.Select( TextSplitBySpace, (x)=>x”” )

    List.RemoveMatchingItems is also quite fast though not as fast as List.Select.
    Test these out and see whether my observations are matching yours or not.

    Reply

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.