Fenwick trees: prefix sums in O(log N) time

The prefix sum

Suppose we have an array f with N values. Each subarray beginning at the start of f is called a prefix of f. So

[f[0]]
[f[0], f[1]]
[f[0], f[1], f[2]]
...
[f[0], f[1], f[2], ..., f[N - 2], f[N - 1]]

are all prefixes of f.

We want to calculate a new array F, also of size N, in which

F[0] = f[0]
F[1] = f[0] + f[1]
F[2] = f[0] + f[1] + f[2]
...
F[N - 1] = f[0] + f[1] + ... + f[N - 2] + f[N - 1]

That is, each element in F is the sum over the corresponding prefix of f. We will say that F is a prefix sum of f. Prefix sums are useful in many ways; look at the wikipedia page to get an idea.

Calculation

An obvious “brute force” way of doing a lookup on the ith prefix sum F[i] is to sequentially accumulate the values in f, from f[0] to f[i]. This clearly has a time complexity of O(N). An update to the data, on the other hand, is constant time – the only access made is to the element to be changed itself. This suits an application which is update-heavy: if the data is to change often but the prefix sum will be rarely needed, our scheme looks OK.

Let’s now suppose our application is lookup-heavy. Is there a way to improve on this? A fairly straightforward way to do so is to store the prefix sum array itself, F. This gives us lookup in O(1) – but now, of course, updates take linear time because to modify the ith element we have to update all elements from F[i] to F[N - 1]. Incidentally, getting any one element in f is still O(1), but is not so simple anymore: we have to do f[i] = F[i] - F[i-1] if i != 0.

Fenwick trees - the O(log N) way

Fenwick trees (also called binary indexing trees) offer a middle ground solution for applications which are both update- and lookup-intensive: both operations have a O(log N) time complexity. Moreover, the supporting structure is still a simple array, and all the “tree” magic is actually in the indexing process – a bit akin to how heaps are normally implemented on top of regular arrays.

Theory

As said above, the tree topology manifests itself in the process of indexing, not intrinsically in the stored data itself (the supporting array). This theoretically makes it possible to have many different trees with the same elements, only changing the way they are interconnected.

0 1 2 0 1 2

Obligatory obvious example of differently connected trees containing the same elements

This is so true that what we call a “Fenwick tree” is actually two different trees, one for updating and a different one for prefix sum lookup.

The lookup tree

Before commenting on how the lookup works, I’ll just throw an example C++ implementation at you. Here v is our supporting array.

int pref_sum(size_t i, const vector<int> &v) {
    int sum = v[0];
    while (i != 0) {
        sum += v[i];
        i &= i - 1;
    }
    return sum;
}

What is happening here? There is a running sum going on, but the indexed elements vary in a strange-looking way. The crux of the whole thing is the bitwise operation i AND (i - 1). Example:

  i     -> 424 (110101000)
& i - 1 -> 423 (110100111)
         = 416 (110100000)

In this case, i got decremented by 8. The more observant among you has already noticed that 8 also corresponds to the lowest set bit in i. This is not a coincidence: a decrement only affects the lowest set bit and the ones below it (if any), complementing them in relation to the original number and therefore zeroing them out in the AND result. But any bit below the lowest one set was already zero, therefore this operation only changes the lowest set bit.

TL;DR: for a non-negative i, this operation will always clear the lowest set bit in i.

In each loop iteration, i has its least significant set bit cleared. But i, indexing N elements, has at most log2N bits to be cleared before the loop exits on the condition i != 0. Therefore, the O(log N) time complexity on the update is established.

But how does this actually retrieve the prefix sum? Let’s try to reason our way through this by taking notice of a few important facts:

Simulating the indexing scheme for an initial i = 14 (notice the progression of the least significant set bit):

14 (1110) =>
12 (1100) =>
8  (1000) =>
0         (end)

Therefore, F[14] = v[14] + v[12] + v[8] + v[0]. But we know the 14th prefix sum must also equal F[14] = f[14] + f[13] + ... + f[1] + f[0]. A reasonable way of establishing the values of v which fits together with our previous conjectures would be the following guess:

v[14] = f[14] + f[13]
v[12] = f[12] + f[11] + f[10] + f[9]
v[8]  = f[8] + f[7] + f[6] + f[5] + f[4] + f[3] + f[2] + f[1]
v[0]  = f[0]

And – surprise, surprise – this is actually the right scheme. Consider how the sequence changes when the starting index i is progressively smaller, from 14 down to 0:

14 => 12 => 8 => 0   [14-13] + [12-9] + [8-1] + [0]
13 => 12 => 8 => 0      [13] + [12-9] + [8-1] + [0]
12       => 8 => 0             [12-9] + [8-1] + [0]
11 => 10 => 8 => 0      [11] + [10-9] + [8-1] + [0]
10       => 8 => 0             [10-9] + [8-1] + [0]
9        => 8 => 0                [9] + [8-1] + [0]
8             => 0                      [8-1] + [0]
7  =>  6 => 4 => 0        [7] + [6-5] + [4-1] + [0]
6        => 4 => 0              [6-5] + [4-1] + [0]
5        => 4 => 0                [5] + [4-1] + [0]
4             => 0                      [4-1] + [0]
3        => 2 => 0                [3] + [2-1] + [0]
2             => 0                      [2-1] + [0]
1             => 0                        [1] + [0]
0                                               [0]

To the left is the indexing sequence for a given starting i, and to the right is the corresponding sum in terms of intervals (for example, [8-1] means the element 8 accumulates all items from 8 down to 1, inclusive). Now, I don’t mean to explain to you how Prof. Fenwick got the idea for this indexing scheme – in his paper he does not really make his thought process clear –, but my ambition is to show you the scheme is consistent with what it claims to do. You may work through the example to see that, given the indexing sequences in the left, the intervals in the right are the only ones that give the correct prefix sums.

The paths from starting index i all the way to 0 can be better understood when visualized in a tree structure, with 0 at the root (more on that in a minute). That’s what we call the lookup tree: the topology the index i follows when we are retrieving a prefix sum from the data structure.

The update tree

As before, instead of first explaining anything I’ll just whack you in the head with an example implementation for the update operation:

void add(size_t i, vector<int> &v, const int k) {
    if (i == 0)
        v[i] += k;
    else
        while (i < v.size()) {
            v[i] += k;
            i += i & ~(i - 1);
        }
}

The first thing to be noticed is that i == 0 is treated as a special case. This is required because the index update inside the while does not do anything when i = 0.

Speaking of which, we have yet again another loop with a weird index update bitwise operation. This time though, we are inverting all the bits on i after decrementing it. Remember that, as seen previously, decrementing a number inverts the lowest set bit and all the ones below it, leaving the upper ones untouched. However, if we now invert the number, the lowest set bit comes back to normal again, while the upper bits are inverted and the lower ones are set back to zero (as in the original number). If we bitwise AND this with the original number, we isolate the lowest set bit. And that’s the index update operation: you’re incrementing the number with its lowest set bit.

(Note: it’s very common for implementations to use i += i & -i instead, relying on two’s complement representation for negative binary numbers.)

This looks nicely dualistic to the index update in the lookup tree, doesn’t it? In the lookup tree we were effectively decrementing the number by its lowest set bit; now we’re incrementing it by the same amount. Let’s look at how a starting index of 1 evolves in this case (considering we have 15 elements):

1  (00001)
2  (00010)
4  (00100)
8  (01000)
16 (10000) => larger than v.size() == 15, breaking out of the loop

If you now take a look back at the lookup tree example, you will see that the elements which accumulate f[1] are v[1], v[2], v[4], v[8] – exactly the ones which were indexed during the update operation.

The update tree is, in some respects, a dual opposite to the lookup tree: two nodes which have an order relation (there is a path from n2 to n1, so n2 is located in the subtree rooted at n1) in one tree will not have an order relation in the other, and vice-versa.

Again, we can establish the time complexity of this operation as O(log N) by following the indexing logic. It’s easy to see that, by adding a number’s lowest set bit to itself, this bit will be set to zero and a higher bit will become the new lowest set bit. We have a maximum of log2N bits to go by before the loop ends, therefore the number of operations can’t be higher than log2N.

Visualization

On a closing note: we covered Fenwick trees for prefix sums, but the general notion can be applied to other operations – any associative binary operation will work (it doesn’t even have to be transitive, if you take a bit of care with the code).

References