Monday, March 5, 2018

Gosper's Hack Explained


Gosper’s Hack* is famous.  Written by and named after Ralph William Gosper, Jr., it comes up any time someone on a public forum asks for people to share their favorite programming tricks or hacks.
But for as often as I’ve seen it come up, I have not seen any adequate explanation of how it works.  When someone mentions it, everyone else expresses some equivalent of “Yeah, wow” – and moves on.  So for all of you that have wondered how Gosper’s Hack works (all three of you.  You know who you are), here it is:

The Code
Here is Gosper’s Hack, written as a C function:
void GospersHack(int k, int n)
{
    int set = (1 << k) - 1;
    int limit = (1 << n);
    while (set < limit)
    {
        DoStuff(set);

        // Gosper's hack:
        int c = set & - set;
        int r = set + c;
        set = (((r ^ set) >> 2) / c) | r;
    }
}

DoStuff() is meant to be replaced with a function that processes each different value that set takes.

What It Does
Gosper’s Hack iterates through all n-bit values that have k bits set to 1, from lowest to highest.  Given k = 3 and n = 6, it will produce the following values, in order from least to greatest:
000111b (7d)
011001b (25d)
101010b (42d)
001011b (11d)
011010b (26d)
101100b (44d)
001101b (13d)
011100b (28d)
110001b (49d)
001110b (14d)
100011b (35d)
110010b (50d)
010011b (19d)
100101b (37d)
110100b (52d)
010101b (21d)
100110b (38d)
111000b (56d)
010110b (22d)
101001b (41d)


For any n k, Gosper’s Hack selects exactly the nCk values with k out of the first n bits set to 1.

How Would You Do It?
Before we see how Gosper’s Hack does it, let’s see how we could do it manually.  Is there a set of rules that will start with 000111b, and move to 001011b, then to 001101b, and so on?

Starting out is easy:  Set the rightmost k bits to 1.  For k = 3, this gets us: 000111b.
This is exactly what the first line of Gosper’s Hack does:

int set = (1 << k) - 1;


There is also a straightforward rule for moving to the next number:
1.       Find the rightmost 1-bit that can be moved left into a 0-bit.  Move that 1-bit left one position.
2.       Move all 1-bits that are to the right of that bit all the way to the right.

Here are those rules in action:
Set the rightmost k bits to 1
000111b


Find the rightmost 1-bit that can be moved left. Move it left one.
000111b => 001011b
Move all 1-bits that are to the right of that bit all the way to the right.
001011b => 001011b
Result
001011b


Find the rightmost 1-bit that can be moved left. Move it left one.
001011b => 001101b
Move all 1-bits that are to the right of that bit all the way to the right.
001101b => 001101b
Result
001101b


Find the rightmost 1-bit that can be moved left. Move it left one.
001101b => 001110b
Move all 1-bits that are to the right of that bit all the way to the right.
001110b => 001110b
Result
001110b


Find the rightmost 1-bit that can be moved left. Move it left one.
001110b => 010110b
Move all 1-bits that are to the right of that bit all the way to the right.
010110b => 010011b
Result
010011b

We should stop when moving a 1-bit to the left exceeds the n-bit range given.  For n = 6, we should stop when we are given 0111000b and produce 1000011b.  This check is performed by the 2nd and 3rd lines of Gosper’s Hack:

int limit = (1 << n);
while (set < limit)




How Gosper’s Hack Does It

Now that we know what to expect, let’s see how Gosper’s Hack takes the current value and produces the next one.  The first line in the process is this:

int c = set & -set;

By definition, -set + set = 0.  If we were to construct -set from set, we would start by putting a 1 in set’s rightmost 1 bit. Adding that to set will trigger the zeroing out of the rightmost cluster of 1’s in set:

When set = 001110b
-set (partial) =  000010b
set + -set (partial) =010000b

To complete the zeroing out of all bits in set, we need to put a 1 in -set wherever there’s a 0 in set, left of the rightmost 1 in set:

When set = 001110b
-set =  110010b
set + -set =000000b

Note that there is only one position where set and -set both have a 1 – the rightmost 1 in set.  This means that set & -set gives us the rightmost 1 bit in set.  This line alone should be added to the list of clever hacks: x & -x yields the lowest 1-bit in x.

int c = set & -set; // c is equal to the rightmost 1-bit in set.

The next line adds c to set, which clears out the rightmost cluster of 1-bits in set and puts a 1 in the first zero left of the rightmost cluster of 1 bits:

int r = set + c;

Given set = 0011001110b
c = 0000000010b
r = 0011010000b

This accomplishes step 1 of the manual process outlined above:  Find the rightmost 1-bit that can be moved left into a 0-bit. Move it left one. All that is left is to take the other bits of the rightmost cluster of 1 bits and place them as far to the right as possible.  The last line of Gosper’s Hack does exactly that.  Let’s look at it, one operation at a time:

(r ^ set)

Xor’ng r and set returns a cluster of 1-bits representing the bits that were changed between set and r:

Given set = 0011001110b
c = 0000000010b
r = 0011010000b
r ^ set = 0000011110b

This includes the original rightmost cluster of 1-bits from set, plus one more bit.  That is, it contains all the bits that need to be moved to the right – plus two more bits.

How do we shift these bits over?  Recall that c is the value of the rightmost bit in set, and therefore also the rightmost bit in r ^ set.  Dividing r ^ set by c will move all the bits to the right far enough to remove all trailing zeros:

Given set = 0011001110b
c = 0000000010b
r = 0011010000b
r ^ set = 0000011110b
(r ^ set) / c = 0000001111b

This still contains two more 1-bits than we need.  We can get rid of those by right-shifting the result by two more bits:

Given set = 0011001110b
c = 0000000010b
r = 0011010000b
r ^ set = 0000011110b
(r ^ set) / c = 0000001111b
((r ^ set) / c) >> 2 = 0000000011b

Gosper chose to use ((r ^ set) >> 2) / c) instead of ((r ^ set) / c) >> 2.   Why?  I believe he did so because some CPUs use the shift-and-subtract method to perform division and stop as soon as the remainder is 0, so dividing by a small number is faster.  Thus, in some CPUs, we can expect ((r ^ set) >> 2) / c) to be marginally faster than ((r ^ set) / c) >> 2.

All that is left is to combine the right-shifted bits with r:

set = (((r ^ set) >> 2) / c) | r;

Given set = 0011001110b
c = 0000000010b
r = 0011010000b
r ^ set = 0000011110b
(r ^ set) / c = 0000001111b
((r ^ set) >> 2) / c = 0000000011b
set = (((r ^ set) >> 2) / c) | r = 0011010011b

With that final step, Gosper’s Hack accomplishes the iteration from one integer with k bits set to the next higher integer with k bits set.  The final check stops it when more than in bits are used.
That is the entirety of Gosper’s Hack.  To see it run on a full sample, from start to finish, skip the next section.


Can It Be Improved?

Attempting to improve on something as elegant as Gosper’s Hack is audacious.  Nevertheless, there are two possible areas of improvement:

1.       k must be at least one bit shorter than the storage size.  That is, if using 32-bit integers, k must be 31 or less.  This is because of two reasons:
a.       Limit is set to the k+1th bit, necessitating one more bit of storage than k.
b.      The result of a right-shift operation on a negative integer is implementation-dependent.  Some implementations preserve the sign of a negative integer when right-shifting, by shifting a 1 into the high bit.  This means that (r^ set) >> 2 might not do what you expect.

2.       At the end of the loop, set is incremented to the first invalid value.  It might be nice to spare a few cycles by stopping at the last valid value, instead of the first invalid value.

We can resolve both of these by sacrificing some elegance.  Here’s how I would do it:

void GospersHack(int k, int n)
{
    uint c, r;                    // This is just my preference.
    uint set = (1u << k) - 1;     // Use 1u instead of 1.
    uint limit = set << (n - k);  // Limit = last valid value.
   
    DoStuff(set); // Process first value outside the loop.  Not so elegant.

    while (set != limit)
    {
        c = set & (0 - set); // -set isn’t allowed if set is a uint.
        int r = set + c;
        set = (((r ^ set) >> 2) / c) | r;

        DoStuff(set);
    }
}

You can debate whether these changes are worth the work:
  • This is a lot of work to do just for one more bit of storage, when you could just as easily change to the next higher integer size (e.g. 64-bit integers, instead of 32-bit integers).
  • Saving one iteration of Gosper’s Hack in the loop is only performant if you are running many, many iterations of the function, each with small numbers for n and k.
  • On my PC, performing 0 - set on a uint is 3.3% slower than -set on an int.  ((uint)-set) is even slower.  Looks like using a signed integer was pretty smart, after all.

Sacrificing 3.3% of the time on negating set, will quickly eat up any time saved by not running the loop once.  But if you absolutely need that last bit, and don’t mind the loss of performance … well … there you go.

A Full Example of Gosper’s Hack in action

Here is a full example of with k = 3 and n = 6:

Gosper’s Hack, Full Example
Explanation
int set = (1 << k) - 1;
set = 0000111b
int limit = (1 << n);
limit = 1000000b


while (set < limit)
      0000111b < 1000000b, so proceed.
    int c = set & -set;
c = 0000001b, the lowest 1-bit in set.
    int r = set + c;
r = 0001000b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0001111b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000011b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0001011b, combine r with bits rght-shifted.


while (set < limit)
      0001011b < 1000000b, so proceed.
    int c = set & -set;
c = 0000001b, the lowest 1-bit in set.
    int r = set + c;
r = 0001100b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0000111b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000001b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0001101b, combine r with bits right-shifted.


while (set < limit)
      0001101b < 1000000b, so proceed.
    int c = set & -set;
c = 0000001b, the lowest 1-bit in set.
    int r = set + c;
r = 0001110b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0000011b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000000b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0001110b, combine r with bits right-shifted.


while (set < limit)
      0001110b < 1000000b, so proceed.
    int c = set & -set;
c = 0000010b, the lowest 1-bit in set.
    int r = set + c;
r = 0010000b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0011110b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000011b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0010011b, combine r with bits right-shifted.


while (set < limit)
      0010011b < 1000000b, so proceed.
    int c = set & -set;
c = 0000001b, the lowest 1-bit in set.
    int r = set + c;
r = 0010100b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0000111b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000001b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0010101b, combine r with bits right-shifted.


while (set < limit)
      0010101b < 1000000b, so proceed.
    int c = set & -set;
c = 0000001b, the lowest 1-bit in set.
    int r = set + c;
r = 0010110b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0000011b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000000b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0010110b, combine r with bits right-shifted.


while (set < limit)
      0010110b < 1000000b, so proceed.
    int c = set & -set;
c = 0000010b, the lowest 1-bit in set.
    int r = set + c;
r = 0011000b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0001110b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000001b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0011001b, combine r with bits right-shifted.


while (set < limit)
      0011001b < 1000000b, so proceed.
    int c = set & -set;
c = 0000001b, the lowest 1-bit in set.
    int r = set + c;
r = 0011010b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0000011b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000000b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0011010b, combine r with bits right-shifted.


while (set < limit)
      0011010b < 1000000b, so proceed.
    int c = set & -set;
c = 0000010b, the lowest 1-bit in set.
    int r = set + c;
r = 0011100b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0000110b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000000b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0011100b, combine r with bits right-shifted.


while (set < limit)
      0011100b < 1000000b, so proceed.
    int c = set & -set;
c = 0000100b, the lowest 1-bit in set.
    int r = set + c;
r = 0100000b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0111100b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000011b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0100011b, combine r with bits right-shifted.


while (set < limit)
      0100011b < 1000000b, so proceed.
    int c = set & -set;
c = 0000001b, the lowest 1-bit in set.
    int r = set + c;
r = 0100100b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0000111b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000001b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0100101b, combine r with bits right-shifted.


while (set < limit)
      0100101b < 1000000b, so proceed.
    int c = set & -set;
c = 0000001b, the lowest 1-bit in set.
    int r = set + c;
r = 0100110b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0000011b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000000b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0100110b, combine r with bits right-shifted.


while (set < limit)
      0100110b < 1000000b, so proceed.
    int c = set & -set;
c = 0000010b, the lowest 1-bit in set.
    int r = set + c;
r = 0101000b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0001110b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000001b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0101001b, combine r with bits right-shifted.


while (set < limit)
      0101001b < 1000000b, so proceed.
    int c = set & -set;
c = 0000001b, the lowest 1-bit in set.
    int r = set + c;
r = 0101010b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0000011b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000000b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0101010b, combine r with bits right-shifted.


while (set < limit)
      0101010b < 1000000b, so proceed.
    int c = set & -set;
c = 0000010b, the lowest 1-bit in set.
    int r = set + c;
r = 0101100b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0000110b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000000b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0101100b, combine r with bits right-shifted.


while (set < limit)
      0101100b < 1000000b, so proceed.
    int c = set & -set;
c = 0000100b, the lowest 1-bit in set.
    int r = set + c;
r = 0110000b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0011100b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000001b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0110001b, combine r with bits right-shifted.


while (set < limit)
      0110001b < 1000000b, so proceed.
    int c = set & -set;
c = 0000001b, the lowest 1-bit in set.
    int r = set + c;
r = 0110010b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0000011b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000000b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0110010b, combine r with bits right-shifted.


while (set < limit)
      0110010b < 1000000b, so proceed.
    int c = set & -set;
c = 0000010b, the lowest 1-bit in set.
    int r = set + c;
r = 0110100b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0000110b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000000b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0110100b, combine r with bits right-shifted.


while (set < limit)
      0110100b < 1000000b, so proceed.
    int c = set & -set;
c = 0000100b, the lowest 1-bit in set.
    int r = set + c;
r = 0111000b, move 1 bit to the left, clear others.
    (r ^ set)
   = 0001100b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000000b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 0111000b, combine r with bits right-shifted.


while (set < limit)
      0111000b < 1000000b, so proceed.
    int c = set & -set;
c = 0001000b, the lowest 1-bit in set.
    int r = set + c;
r = 1000000b, move 1 bit to the left, clear others.
    (r ^ set)
   = 1111000b, the cluster of bits changed.
    (r ^ set) >> 2) / c
   = 0000011b, move remaining bits to the right.
    set = (((r ^ set) >> 2) / c) | r;
   = 1000011b, combine r with bits right-shifted.


while (set < limit)
      1000011b ≥ 1000000b, so stop.

* At the time it was named, “hack” meant a clever programming trick, not sloppy code or a way to breach security.

3 comments: