X bitmaps vs OpenGL

Of course, you all know that X started life as a monochrome window system for the VS100. Back then, bitmaps and rasterops were cool; you could do all kinds of things with simple bit operations. Things changed, and eventually X bitmaps became useful only off-screen for clip masks, text and stipples. These days, you'll rarely see anyone using a bitmap -- everything we used to use bitmaps for has gone all alpha-values on us.

In OpenGL, there aren't any bitmaps. About the most 'bitmap-like' object you'll find is an A8 texture, holding 8 bits of alpha value for each pixel. There's no way to draw to or texture from anything where each pixel is represented as a single bit.
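For a sense of what that means in practice, here's a rough sketch (the function name is made up, and it assumes a current legacy-GL context) of what a bitmap has to turn into before the GPU will touch it:

#include <GL/gl.h>

/*
 * Sketch: the nearest thing to a bitmap in OpenGL is an 8-bit alpha
 * texture -- one whole byte per pixel. GL_ALPHA is the classic
 * desktop-GL format; a modern core profile would use GL_R8 instead.
 * 'alpha_bytes' is width*height bytes of 0x00/0xff expanded from the
 * 1bpp bitmap.
 */
static GLuint
upload_bitmap_as_a8(const unsigned char *alpha_bytes, int width, int height)
{
    GLuint tex;

    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    /* rows are byte-aligned, not the default 4-byte aligned */
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA,
                 width, height, 0,
                 GL_ALPHA, GL_UNSIGNED_BYTE, alpha_bytes);
    return tex;
}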

So, as Eric went about improving Glamor, he got a bit stuck on bitmaps. We could:

  • Support them only on the CPU, uploading copies as A8 textures when used as a source in conjunction with GPU objects.

  • Support them as 1bpp on the CPU and A8 on the GPU, doing fancy tracking between the two objects when rendering occurred.

  • Fix the CPU code to deal with bitmaps stored 8 bits per pixel.

I thought the last choice would be the best plan -- directly share the same object between CPU and GPU rendering, avoiding all reformatting as things move around in the server.

Why is this non-trivial?

Usually, you can flip formats around with reckless abandon in X because it keeps separate bits-per-pixel and depth values everywhere. That's how we do things like 32 bits-per-pixel RGB surfaces; we just report them as depth 24 and everyone is happy.
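To make that concrete, the formats the server advertises are really (depth, bits-per-pixel, scanline-pad) triples; this little sketch (not the server's actual table or struct names) shows the idea:

/*
 * Illustrative sketch: depth and bits-per-pixel are advertised as
 * separate values, so depth 24 can be stored in 32-bit pixels
 * without anyone on the wire noticing.
 */
struct pixmap_format {
    int depth;          /* significant bits per pixel */
    int bits_per_pixel; /* storage size per pixel */
    int scanline_pad;   /* row alignment, in bits */
};

static const struct pixmap_format example_formats[] = {
    {  1,  1, 32 },     /* bitmaps */
    {  8,  8, 32 },
    { 24, 32, 32 },     /* depth 24 stored as 32bpp */
    { 32, 32, 32 },
};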

Bitmaps are special though. The X protocol has separate (and overly complicated) image formats for single bit images, and those have to be packed 1 bit per pixel. Within the server, bitmaps are used for drawing core text, stippling and creating clip masks. They're the 'lingua franca' of image formats, allowing you to translate between depths by pulling a single “plane” out of a source image and painting it into a destination of arbitrary depth.
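On the client side, that plane-pulling trick is XCopyPlane; here's a quick sketch (dpy, gc, pix24, bitmap, width and height are placeholders for objects created elsewhere):

/*
 * Sketch: pull a single plane out of a depth-24 pixmap and paint it
 * into a depth-1 bitmap. The selected plane becomes 0/1 pixels in the
 * destination; with a deeper destination, the GC foreground and
 * background colors would be used instead.
 */
XCopyPlane(dpy, pix24, bitmap, gc,
           0, 0,            /* source x, y */
           width, height,   /* area to copy */
           0, 0,            /* destination x, y */
           0x01);           /* which plane to extract */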

As such, the frame buffer rendering code in the server assumes that bitmaps are always 1 bit per pixel. Given that it must deal with 1bpp images on the wire, and given the history of X, it certainly made sense at the time to simplify the code with this assumption.

A blast from the past

I'd actually looked into doing this before. As part of the DECstation line, DEC built the MX monochrome frame buffer board, and to save money, they actually created it by storing a single pixel bit in each byte rather than packing 8 pixels into each byte. I have this vague memory that they were able to use only 4 memory chips this way.

The original X driver for this exposed a depth-8 static color format because of the assumptions made by the (then current) CFB code about bitmap formats.

Jim Gettys wandered over to MIT while the MX frame buffer was in design and asked how hard it would be to support it as a monochrome device instead of the depth-8 static color format. At the time, fixing CFB would have been a huge effort, and there wasn't any infrastructure for separating the wire image format from the internal pixmap format. So, we gave up and left things looking weird to applications.

Hacking FB

These days, the amount of frame buffer code in the X server is dramatically less; CFB and MFB have been replaced with the smaller (and more general) FB code. It turns out that the number of places which need to deal with individual bits in a bitmap is now limited to a few stippling and CopyPlane functions. And, in those functions, the individual read operations from the bitmap are few. Each of those fetches looked like:

bits = READ(src++)

All I needed to do was make this operation return 32 bits by pulling one bit from each of 8 separate 32-bit chunks and merging them together. The first thing to do was to pad the pixmap out to a 32-byte boundary, rather than a 32-bit boundary. This ensured that I would always be able to fetch data from the bitmap in 8 32-bit chunks. Next, I simply replaced the READ macro call with:

    bits = fb_stip_read(src, srcBpp);
    src += srcBpp;
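The 32-byte padding mentioned above is just a row-stride round-up, so that every row holds whole groups of eight 32-bit words; something like this sketch (the names are made up):

/*
 * Sketch: round an 8bpp depth-1 row up to a 32-byte boundary so the
 * stipple reader can always fetch 8 full 32-bit words per group.
 */
#define STIPPLE_PAD_BYTES   32

static inline int
bitmap_8bpp_stride(int width_pixels)
{
    /* one byte per pixel, rounded up to a whole 32-byte group */
    return (width_pixels + STIPPLE_PAD_BYTES - 1) & ~(STIPPLE_PAD_BYTES - 1);
}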

The new fb_stip_read function checks srcBpp and packs things together for 8bpp images:

/*
 * Given a depth 1, 8bpp stipple, pull out
 * a full FbStip worth of packed bits
 */
static inline FbStip
fb_pack_stip_8_1(FbStip *bits) {
    FbStip      r = 0;
    int         i;

    for (i = 0; i < 8; i++) {
        FbStip  b;
        uint8_t p;

        b = READ(bits++);
#if BITMAP_BIT_ORDER == LSBFirst
        /* gather the low bit of each of the four bytes; the first pixel
         * lands in the nibble's low bit, nibbles pack from low to high */
        p = (b & 1) | ((b >> 7) & 2) | ((b >> 14) & 4) | ((b >> 21) & 8);
        r |= p << (i << 2);
#else
        /* gather the low bit of each of the four bytes; the first pixel
         * lands in the nibble's high bit, nibbles pack from high to low */
        p = ((b >> 21) & 8) | ((b >> 14) & 4) | ((b >> 7) & 2) | (b & 1);
        r |= p << ((7 - i) << 2);
#endif
    }
    return r;
}

/*
 * Return packed stipple bits from src
 */
static inline FbStip
fb_stip_read(FbStip *bits, int bpp)
{
    switch (bpp) {
    default:
        return READ(bits);
    case 8:
        return fb_pack_stip_8_1(bits);
    }
}

It turns into a fairly hefty amount of code, but the number of places this ends up being used is pretty small, so it shouldn't increase the size of the server by much. Of course, I've only tested the LSBFirst case, but I think the MSBFirst code is correct.
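If you want to poke at the packing logic outside the server, a throwaway harness along these lines does the job; FbStip and READ are stubbed out here, and it only exercises the LSBFirst branch on a little-endian host:

/*
 * Throwaway sanity check for the LSBFirst packing path.
 * Not server code: FbStip and READ are stubbed out for illustration.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t FbStip;
#define READ(ptr) (*(ptr))

static inline FbStip
pack_stip_8_1(FbStip *bits)
{
    FbStip r = 0;
    int i;

    for (i = 0; i < 8; i++) {
        FbStip b = READ(bits++);
        uint8_t p = (b & 1) | ((b >> 7) & 2) | ((b >> 14) & 4) | ((b >> 21) & 8);
        r |= (FbStip) p << (i << 2);
    }
    return r;
}

int
main(void)
{
    uint8_t pixels[32];     /* depth 1 stored 8 bits per pixel */
    FbStip words[8];
    FbStip expected = 0;
    int i;

    /* alternate 1,0,1,0,... -- should pack to 0x55555555 */
    for (i = 0; i < 32; i++) {
        pixels[i] = (i & 1) ? 0 : 1;
        if (pixels[i])
            expected |= (FbStip) 1 << i;
    }
    memcpy(words, pixels, sizeof words);

    printf("packed   0x%08x\nexpected 0x%08x\n",
           (unsigned) pack_stip_8_1(words), (unsigned) expected);
    return 0;
}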

I've sent the patches to do this to the xorg-devel mailing list, and they're also on the 'depth1' branch in my repository:

git://people.freedesktop.org/~keithp/xserver.git

Testing

Eric also hacked up the test suite to be runnable by piglit, and I've run it in that mode against these changes. I had made a few mistakes, and the test suite caught them nicely. Let's hope this adventure helps Eric out as he continues to improve Glamor.