performance – Any way to make knockout faster or is data binding fundamentally flawed?

I have recently adopted knockout for my projects as a technology to fully own and run for the next decade or so. I hate frameworks and libraries, so when I adopt something it is because I found that it is well done and that I can navigate the source to make my own fixes and enhancements. Otherwise I prefer doing it myself. Knockout made the mark.

I like to push the limits and see how things scale and I love spreadsheets. So I thought a spreadsheet should be a nice way to push the limit on a data binding gizmo. To see how easy it is, and how it scales with larger spreadsheets.

Is it easy? I thought it was not easy enough. Each spreadsheet cell should be an observable, a computed observable, where changing the formula means updating the read function. The write function is there to understand string literal values, numbers and formulas (start with “=”) and then update the formula function.

With knockout computed observables you have something that needs something else to store its value if any. And then that tends to want to be observable too, and every additional observable is added squared. And worse are dependency chains.

I have built this example up from the absolute minimum complexity to have just exactly what is needed.

  • when clicking into a cell you edit the formula
  • when leaving the edit input box, we go back to value mode
  • value is the result of a function (even constant numbers and strings end up functions)
  • formulas are essentially javascript expressions with the usual cell and range references turned into calls to a range(…) function
  • where ranges are flatMap’ped to an array of their values.

My implementation is the best I could come up with. There is another knockout spreadsheet question on here, but the person who asked this wasn’t really serious and the examples sent around were not tuned. I had another example following the more “clean” approach, as you “should” use ko, where the Cell is an object, with several things observable, but it is horribly slow in the stress test, so I stick to this here.

There is a stress test below which I used to get to the best I could do.

I will explain my code. First the HTML, is pretty simple (BTW, I have the text in comments, so you can just select the whole thing and copy and paste it and it should work, with the explanatory text then in comments.)

<html>
  <head>
    <title>Spreadsheet Example</title>
    <script src="knockout-3.5.0.debug.js" type="text/javascript"></script>
    <style type="text/css">
      table { border-collapse:collapse; } 
      th { border:1pt solid grey; } 
      td { width:10em; height:2ex; border:1pt solid grey; }
      input { width:100%, height:100%; }
    </style>
  </head>
  <body>
    <table>
      <thead>
        <tr>
          <th scope="col" data-bind="click: function() { filltest(); }">#</th>
          <!-- ko foreach: { data: sheet[0], as: 'col' } -->
          <th scope="col" data-bind="text: colalpha($index())"></th>
          <!-- /ko -->
        </tr>
      </thead>
      <tbody data-bind="foreach: { data: sheet, as: 'row' }"> 
        <tr>
          <th scope="row" data-bind="text: $index()+1"></th>
          <!-- ko foreach: row -->
          <td data-bind="click: function() {
                              $rawData._kos_editing(true); 
                              $rawData._state.isDirty = true; 
                              return true; }">
            <!-- ko if: (_ => { if(!$rawData._kos_editing)
                                    $rawData._kos_editing = ko.observable(false);
                                return $rawData._kos_editing })() -->
            <input data-bind="value: $rawData._kos_formula, 
                              event: { blur: function(obj, evn) {
                                          $rawData._kos_editing(false);
                                          $rawData(evn.target.value); } }"/>
            <!-- /ko -->
            <!-- ko if: !$rawData._kos_editing() -->
            <div data-bind="text: $data, style: $rawData.style"></div>
            <!-- /ko -->
          </td>
          <!-- /ko -->
        </tr>
      </tbody>
    </table>
    <script type="text/javascript"><!-- /*

Now that’s the end of the HTML, and I think it’s pretty obvious, generating that table. Now comes the javascript: */

const cols = 20;
const rows = 20;
const fakeDependency = ko.observable(42); /* XXX: to protect us from being disposed

This fakeDependency is so embarrassing, but without it knockout would just immediately “dispose” the computed observable because when there is no formula yet, it has no dependencies.

Now we create the 2-dimensional array and immediately fill it with ko.computed observables: */

const sheet = [...Array(rows)]
sheet.forEach((_,rownum,sheet) => { 
    const row = sheet[rownum] = [...Array(cols)];
    row.forEach((_,colnum) => {
        const cell = row[colnum] = ko.computed({ /*

Now each cell is one main writable computed observable. First the read function. We have to store the value of the computed, so I add a _kos_value property (kos is the knockout spreadsheet prefix (I use for all my properties (I stick onto the observable). It’s very awkward that I do not have a handle on the observable from within the value accessor functions and I have no place to store the value if I don’t want to make things complicated. */

            read:  function() {
                fakeDependency(); // XXX: if we don't ask for anything, we'll get disposed
                const cell = row[colnum];
                if(!cell)
                    return "";
                const value = cell._kos_value;
                if(false && cell._kos_editing())
                    return cell._kos_formula;
                else if(typeof value == 'function')
                    return value();
                else
                    return value;
            }, /*

You see above that there is the edit mode (_kos_editing) that itself is an observable because it is used to switch between the value and the formula. Then the value function in edit mode returns the formula, and in non-edit mode returns the value.

This is somewhat awkward because the read function is called much less than you’d expect. So there are a few things to force evaluation of the read function, some forcing _state.isDirty and some forcing notifySubscribers.

Now the write function. This knows that it’s going to be called only in edit mode, and then gets the formula. */

            write: function(value) {
                const formula = value;
                const number = 1*value;
                      const cell = row[colnum];
                if(formula == "")                   
                    cell._kos_value = _ => 0;
                else if(!Number.isNaN(number))
                    cell._kos_value = _ => number;
                else if(formula.charAt(0) == '=')
                    cell._kos_value = new Function([], `    return ${formula.substring(1).replaceAll(/b([A-Z]+)(d+)(?::([A-Z]+)(d+))?b/g, "range('$1',$2,'$3',$4)")};`);
                else 
                    cell._kos_value = _ => formula;

                cell._kos_formula = formula;
                cell._kos_editing(false);
                cell._state.isDirty = true;
                cell.notifySubscribers();
            }
        }); /*

The regex up there was to replace cell references A1, B3:E8, AA20, AB21, etc. That’s it.

Now comes a little thing to do right-aligned for numbers and left-aligned for text: */

        cell.style = ko.computed({
            read: function() {
                const value = cell();
                if(typeof value == 'number')
                    return { 'text-align': 'right' };
                else
                    return { 'text-align': 'left' };
            }
        });
    })
}); /*

And that’s it. Now the table is created with all the observables. The rest is helper functions. First the function that is called for all cell references (the regex above makes those calls). It returns a single value (if a single cell) or a flat array of values ​​if a range */

function range(colalpha, rownum, endcolalpha, endrownum) { 
    const [startRownum, startColnum] = rowcol(colalpha, rownum);
    const [endRownum, endColnum]     = endcolalpha ? rowcol(endcolalpha, endrownum) : [undefined, undefined];;

    if(typeof endRownum == 'undefined' || (endRownum == startRownum && endColnum == startColnum) )
        return sheet[startRownum][startColnum]();
    else 
        return sheet.slice(startRownum, endRownum + 1)
                    .flatMap(row => row.slice(startColnum, endColnum + 1)
                                       .map(cell => cell()));
} /*

with a function that converts the A..Z, AA, AB, … letter numbers to column numbers and the row numbers also moved from 1-based to zero-based array indexes: */

function rowcol(colalpha, rownum) {
    const colnum = [...colalpha].map(chr => chr.charCodeAt(0) - 64).reduce((value, digit) => (value*26)+digit, 0);
    return [rownum - 1, colnum - 1 ];
} /*

now the function that turns column number into these letter numbers, used really mostly for the stress test below. */

function colalpha(colnum) { // map 0 .. 26, 27, 28 to A ... Z, AA, AB ... 
    let result = "";
    colnum++;
    do {
        colnum--;
        result = String.fromCharCode(65 + colnum % 26) + result;
        colnum = Math.floor(colnum / 26);
    } while(colnum > 0); // NOTE: only second time in my life that a repeat-until construct was (almost) helpful
    return result;
} /*

Finally, just as a proof of concept, an example for a range function, with a formula like =A1:C5.sum() sums up an entire block: */

Array.prototype.sum = function() { return this.reduce((sum,x) => sum + x, 0); }

ko.options.deferUpdates = true;
ko.applyBindings(sheet); /*

Now this is the end of the actual sheet. Now comes the stress test, since I don’t have select, copy, paste, and paste-down with formulas that get adjusted. But I can quickly fill the sheet with a stress test of lots of dependencies, and we do it progressively to see the updates.

The test is for each cell being the sum of the cell directly above it with the cell directly left to it. This creates a heavy stress test of dependencies. */

function filltest () {
    sheet[0][0]("0");
    setTimeout(test_setup_cols, 100);
}
function test_setup_cols() {
    sheet[0].slice(1).forEach((cell,i) => cell(`= ${colalpha(i)}1+1`));
    setTimeout(test_setup_rows, 100);
}
function test_setup_rows() {
    sheet.slice(1).forEach((row,i) => row[0](`= A${1+i}+1`));
    setTimeout(test_fill, 100);
}
function test_fill (i = 1) {
    let rownum = i;
    let colnum = i;
    for(colnum = 1; colnum <= i; colnum++)
        sheet[rownum][colnum](`= ${colalpha(colnum-1)}${rownum+1} + ${colalpha(colnum)}${rownum}`); 
    colnum = i;
    for(rownum = 1; rownum <= i; rownum++)
        sheet[rownum][colnum](`= ${colalpha(colnum-1)}${rownum+1} + ${colalpha(colnum)}${rownum}`);   
    if(++i < cols && i < rows)
        setTimeout(test_fill.bind(null,i), 100);
} /*

and here is the end: */

    --></script>
  </body>
</html>

I have checked the performance of this against Google sheets, and it’s hand down losing. The filling of the test formula and the evaluation is actually not even that bad (and I challenge anyone else to make something faster). But when I change cells with lots of dependencies — the worst being A1 — it stalls and fails.

So now I am having second thoughts about knockout. Can this be rescued? How can google sheets be so fast? It’s also javascript. It has to do the same dependency tracking work. And it’s so much faster!

My last part of the header question is serious. I want to see if perhaps there is something fundamentally flawed about data binding as knockout does it.

UPATE: I think I know some of the reason for the slowness. My intuition is that the dependency tracking materialized the entire transitive closure, which will cause the same cell to be notified of change a huge number of times. Let’s test this (and play around with javascript as a “query language”).

First, let’s name all cells:

sheet.forEach((row,rownum) => row.forEach((cell,colnum) => cell._kos_name = colalpha(colnum)+(rownum+1)))

Now let’s make a list of all dependencies.

> sheet[1][1]._subscriptions.change
< (4) [ko.subscription, ko.subscription, ko.subscription, ko.subscription]

this shows that my intuition is wrong. Each cell appears to have only 4 dependencies. And I can’t really tell to what. At least it is not true what I suspected that the transitive closure of the dependencies is reified in these subscriptions.

But profiling I determined that the most time is spent here:

evaluateImmediate_CallReadWithDependencyDetection: function (notifyChange) {
    window.cccnt = (window.cccnt||0)+1;

so I added this counter cccnt, and I could in a 20 x 20 table after filling it with the formula, go to J10 and add “+1” to the formula there, something that changes the result value ( “*1” does not ).

Now here is the counts.

  1. on first initialization: 5,453
  2. after filling with the formulas: 10,387 – 5,453 = 4,934 difference
  3. when updating the formula in J10: 1,422,465 – 10,387 = 1,412,078

That’s 1.4 million times evaluating for just 400 formulas. This is definitely not right. When I put similar counters into my write and read functions, I get 420 writes after the whole test, which is good, but before the formula update in J10 I have read run 837 and after the update I have 706,268.

So this is clearly wrong and optimizable somehow. Since we have the dependencies once we should run the updates cascading them down one way J10 updates, the 2 immediately dependent cells are K10 and J11, so they update. Then their dependents, and so on which means only 100 cells should update. But somehow it runs amok here.

I am not sure how to start attacking this problem, but I have some optimism that it could be done. That it is actually a bug of some sort.

See also further tests and discussions on knockout github: https://github.com/knockout/knockout/issues/2596, in particular, I have found objective proof that the knockout dependency tracking is horribly wrong and I could build a drop-in Replacement for my bench test that would perform in O(n^2) where knockout’s would run in O(e^n), leading to thousands-fold speed improvements with my code even in tables as small as 12 x 12.

Leave a Comment