Forum Discussion

Lorenzo's avatar
Lorenzo
Silver Contributor
Nov 27, 2023
Solved

Fill BLANKs down, up or both in column(s) - similar to Power Query

Hi

 

Looked at +/- replicating Fill Down and Fill Up functionnalities in Power Query with a LAMBDA. My attempt below:

 

 

 

 

 

// FillDownUp:
=LAMBDA(array,
  LET(
    Data,       DROP(array, 1),
    DataCols,   COLUMNS(Data),
    FillCol,    LAMBDA(array,
                  SCAN(
                    IF( ISBLANK( TAKE(array, 1) ),
                      XLOOKUP(FALSE, ISBLANK(array), array, ""),
                      ""
                    ),
                    array,
                    LAMBDA(seed,x, IF( ISBLANK(x), seed, x))
                  )
    ),
    FillStack,  LAMBDA(seed,col, HSTACK( seed, FillCol( CHOOSECOLS(Data, col) ) )),
    DataFilled, IF(DataCols = 1, FillCol(Data),
                  DROP( REDUCE(0, SEQUENCE(DataCols), FillStack),, 1)
    ),
    VSTACK(CHOOSEROWS(array, 1), DataFilled)
  )
)

 

Open to alternatives...

  • Lorenzo 

    My variant:

     

    Filler
    =LAMBDA(table,LET(
        header, TAKE(table, 1),
        data, DROP(table, 1),
        counter, SEQUENCE(COLUMNS(data)),
        FillDown, LAMBDA(a, v, IF(v = "", a, v)),
        Complete, LAMBDA(vector, LET(ini, TAKE(TOCOL(vector, 1), 1), SCAN(ini, vector, FillDown))),
        Stack, DROP(REDUCE("", counter, LAMBDA(a, v, HSTACK(a, Complete(INDEX(data, , v))))), , 1),
        VSTACK(header, Stack)
    ))
  • Lorenzo 

    I did some timing runs using Charles Williams's 'Calc Range' routine and it returned about 3500 ms for 500,000 rows.

  • Lorenzo 

    Sadly, I didn't see this question first time around!

    My first thought is that the problem as posed doesn't really require the equivalent of the PQ FillUp functionality.  It is really just FillDown but with non-standard initialisation using the first non-blank value.

    FILLDOWNλ
    = LET(
        init, TAKE(TOCOL(column, 3), 1), 
        SCAN(init, column, LAMBDA(a, v, IF(v <> "", v, a)))
      )

    Having defined a function that works with a single column one would naturally expect

    = BYCOL(Table, FILLDOWNλ)

    to provide the solution for the complete table.  Unfortunately, because of the Microsoft error in specifying the functionality required to handle arrays of arrays, one is stuck with implementing one's own version of BYCOL.

    BYCOLλ
    = LET(
        n₀, COLUMNS(array),
        n₁, QUOTIENT(n₀, 2),
        n₂, n₀ - n₁,
        A₁, TAKE(array, , n₁),
        A₂, TAKE(array, , -n₂),
        X₁, IF(n₁ > 1, BYCOLλ(A₁, FNλ), FNλ(A₁)),
        X₂, IF(n₂ > 1, BYCOLλ(A₂, FNλ), FNλ(A₂)),
        HSTACK(X₁, X₂)
      )

    The double recursion is a bit painful but at least the function can be reused and it is straightforward to apply

    = BYCOLλ(Table, FILLDOWNλ)

     

      • PeterBartholomew1's avatar
        PeterBartholomew1
        Silver Contributor

        The core functionality of BYCOLλ  is also intended to be widely deployable.  The assumption is that some function FNλ needs to be applied to every column of an array (or range).  Rather than rewriting the function FNλ to address multiple columns, BYCOLλ splits to problem into two parts and solves each half independently.  Recursion means that eventually one reaches the point where the argument is a single column and FNλ may be applied directly to give a single result.

         

        Whereas the built-in BYCOL function will only return the result if each application of the function returns a scalar value, BYCOLλ uses HSTACK to build an array of arrays.

         

        A problem with the alternative approach of using REDUCE and HSTACK to build an array of array is that the early results may be copied 10,000s of times during the stacking steps.  This gets to be expensive in terms of computing resource.  The bisection approach limits the steps of copying any given result to form build bigger datasets to 14 at the most.

         

        I hope that by describing the intent of the 'home-knitted' helper function, I will simplify life for anyone planning to reverse engineer it from the code.

  • Patrick2788's avatar
    Patrick2788
    Silver Contributor

    Lorenzo 

    My variant:

     

    Filler
    =LAMBDA(table,LET(
        header, TAKE(table, 1),
        data, DROP(table, 1),
        counter, SEQUENCE(COLUMNS(data)),
        FillDown, LAMBDA(a, v, IF(v = "", a, v)),
        Complete, LAMBDA(vector, LET(ini, TAKE(TOCOL(vector, 1), 1), SCAN(ini, vector, FillDown))),
        Stack, DROP(REDUCE("", counter, LAMBDA(a, v, HSTACK(a, Complete(INDEX(data, , v))))), , 1),
        VSTACK(header, Stack)
    ))
  • Lorenzo 

    And back to this Can SCAN() and BYROW() be combined? - Microsoft Community Hub discussion one of variants which is similar to your one

    FillUpDn = LAMBDA(array,
    LET(
        fnFill, LAMBDA(array,
        LET(
            reverse, LAMBDA(arr, SORTBY(arr, SEQUENCE(ROWS(arr)), -1)),
            down, LAMBDA(arr, SCAN(, arr, LAMBDA(a, v, IF(v = "", a, v)))),
            reverse(down(reverse(down(array))))
        ) ),
        fillData, LAMBDA(arr,
        REDUCE(
            1,
            SEQUENCE(COLUMNS(arr)),
            LAMBDA(A, i, IF(SEQUENCE(,i) = i, fnFill(INDEX(arr,, i )), A))
        ) ),
        VSTACK(TAKE(array,1), fillData(DROP(array,1)) )
    ))
  • Lorenzo 

    Looks nice. My variant for FillCol

    FillCol = LAMBDA(array,
        LET(
        reverse, LAMBDA(arr, SORTBY(arr, SEQUENCE(ROWS(arr)),-1) ),
        down, LAMBDA(arr, SCAN(, arr, LAMBDA(a,v, IF(v="", a, v))) ),
        reverse(down(reverse(down(array))))
        )
    )

    More correct name will be Fill Down and Up (not fill up and down)

    • Lorenzo's avatar
      Lorenzo
      Silver Contributor

      SergeiBaklan 

      More correct name will be Fill Down and Up (not fill up and down) Named it UpDown as in reality this is what my approach does. Though, this is quite confusing ==> Initial post revised

      • Lorenzo's avatar
        Lorenzo
        Silver Contributor

        Thank you SergeiBaklan & Patrick2788 

         

        What I had in mind (wasn't sure that was realistic) was to provide a single LAMBDA (to allow copy/paste in Name Manager) for cases like (but not only):

         

         

        where, for whatever reason Power Query cannot be used. Added a couple of optional parameters:

         

         

        I don't post the formula as it's a monster now. It's available in the attached workbook - seems to work decently... I stop there but will wait a couple of days before closing this thread in case you have feedback/recommendation (I'm gonna change the title of the discussion accordingly)

Resources