Author Archives: Eric Anholt

VC4 driver status update

I've just spent another week hanging out with my Broadcom and Raspberry Pi teammates, and it's unblocked a lot of my work.

Notably, I learned some unstated rules about how loads and stores from the tilebuffer work, which has significantly improved stability on the Pi (as opposed to simulation, which only asserted that half of these rules were followed).

I got an intro on the debug process for GPU hangs, which ultimately just looks like "run it through simpenrose (the simulator) directly. If that doesn't catch the problem, you capture a .CLIF file of all the buffers involved and feed it into RTL simulation, at which point you can confirm for yourself that yes, it's hanging, and then you hand it to somebody who understands the RTL and they tell you what the deal is." There's also the opportunity to use JTAG to look at the GPU's perspective of memory, which might be useful for some classes of problems. I've started on .CLIF generation (currently simulation-environment-only), but I've got some bugs in my generated files because I'm using packets that the .CLIF generator wasn't prepared for.

I got an overview of the cache hierarchy, which pointed out that I wasn't flushing the ARM dcache to get my writes into system L2 (more like an L3) so that the GPU could see them. This should also improve stability, since before we were only getting lucky that the GPU would actually see our command stream.

Most importantly, I ended up fixing a mistake in my attempt at reset using the mailbox commands, and now I've got working reset. Testing cycles for GPU hangs have dropped from about 5 minutes to 2-30 seconds. Between working reset and improved stability from loads/stores, we're at the point that X is almost stable. I can now run piglit on actual hardware! (it takes hours, though)

On the X front, the modesetting driver is now merged to the X Server with glamor-based X rendering acceleration. It also happens to support DRI3 buffer passing, but not Present's pageflipping/vblank synchronization. I've submitted a patch series for DRI2 support with vblank synchronization (again, no pageflipping), which will get us more complete GLX extension support, including things like GLX_INTEL_swap_event that gnome-shell really wants.

In other news, I've been talking to a developer at Raspberry Pi who's building the KMS support. Combined with the discussions with keithp and ajax last week about compositing inside the X Server, I think we've got a pretty solid plan for what we want our display stack to look like, so that we can get GL swaps and video presentation into HVS planes, and avoid copies on our very bandwidth-limited hardware. Baby steps first, though -- he's still working on putting giant piles of clock management code into the kernel module so we can even turn on the GPU and displays on our own without using the firmware blob.

Testing status:
- 93.8% passrate on piglit on simulation
- 86.3% passrate on piglit gpu.py on Raspberry Pi

All those opcodes I mentioned in the previous post are now completed -- sadly, I didn't get people up to speed fast enough to contribute before those projects were the biggest things holding back the passrate. I've started a page at http://dri.freedesktop.org/wiki/VC4/ for documenting the setup process and status.

And now, next steps. Now that I've got GPU reset, a high priority is switching to interrupt-based render job tracking and putting an actual command queue in the kernel so we can have multiple GPU jobs queued up by userland at the same time (the VC4 sadly has no ringbuffer like other GPUs have). Then I need to clean up the user <-> kernel ABI so that I can start pushing my Linux code upstream, and probably work on building userspace BO caching.
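As a rough illustration of what that queue involves, here is a toy Python model. Everything in it (the JobQueue class, the hw_submit callback, the on_interrupt hook) is a hypothetical name invented for this sketch, not code from the vc4 kernel driver. It assumes the hardware runs one job at a time and raises an interrupt when that job finishes, which is why a software queue is needed when there's no ringbuffer.

```python
from collections import deque


class JobQueue:
    """Toy model of a kernel-side job queue for a GPU without a ringbuffer.

    All names here are hypothetical; this only illustrates the pattern.
    """

    def __init__(self, hw_submit):
        self.pending = deque()      # jobs queued by userland, oldest first
        self.running = None         # job currently on the hardware, if any
        self.completed = []         # jobs retired by the interrupt handler
        self.hw_submit = hw_submit  # callback that would program the GPU for one job

    def submit(self, job):
        """Queue a job; start it immediately if the hardware is idle."""
        self.pending.append(job)
        if self.running is None:
            self._start_next()

    def _start_next(self):
        if self.pending:
            self.running = self.pending.popleft()
            self.hw_submit(self.running)
        else:
            self.running = None

    def on_interrupt(self):
        """Render-done interrupt: retire the running job, then kick the next one."""
        if self.running is not None:
            self.completed.append(self.running)
        self._start_next()
```

In this model submit() returns right away, and it is the completion interrupt that keeps the GPU fed with the next queued job.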

helping out with VC4

I've had a couple of questions about whether there's a way for others to contribute to the VC4 driver project.  There is!  I haven't posted about it before because things aren't as ready as I'd like for others to do development (it has a tendency to lock up, and the X implementation isn't really ready yet so you don't get to see your results), but that shouldn't actually stop anyone.

To get your environment set up, build the kernel (https://github.com/anholt/linux.git vc4 branch), Mesa (git://anongit.freedesktop.org/mesa/mesa) with --with-gallium-drivers=vc4, and piglit (git://anongit.freedesktop.org/git/piglit).  For working on the Pi, I highly recommend having a serial cable and doing NFS root so that you don't have to write things to slow, unreliable SD cards.

You can run an existing piglit test that should work, to check your environment: env PIGLIT_PLATFORM=gbm VC4_DEBUG=qir ./bin/shader_runner tests/shaders/glsl-algebraic-add-add-1.shader_test -auto -fbo -- you should see a dump of the IR for this shader, and a pass report.  The kernel will make some noise about how it's rendered a frame.

Now the actual work: I've left some of the TGSI opcodes unfinished (SCS, DST, DPH, and XPD, for example), so the driver just aborts when a shader tries to use them. How they work is described in src/gallium/docs/source/tgsi.rst. The TGSI-to-QIR code is in vc4_program.c (where you'll find all the opcodes that are implemented currently), and vc4_qir.h has all the opcodes that are available to you and helpers for generating them. Once it's in QIR (which I think should have all the opcodes you need for this work), vc4_qpu_emit.c will turn the QIR into actual QPU code like you find described in the chip specs.
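To give a feel for the translation, here is how those four opcodes break down into per-channel scalar math, written as plain Python rather than as QIR. The channel definitions follow my reading of tgsi.rst, so check them against the file before relying on this. fmul, fadd and fsub are hypothetical stand-ins for whatever emit helpers vc4_qir.h provides, and SCS simply calls math.cos/math.sin, since lowering sine and cosine for the QPU is exactly the part you'd have to work out.

```python
import math


# Hypothetical scalar helpers standing in for QIR float ops.
def fmul(a, b):
    return a * b


def fadd(a, b):
    return a + b


def fsub(a, b):
    return a - b


def xpd(src0, src1):
    """TGSI XPD (cross product); the w channel is defined as 1."""
    x0, y0, z0, _ = src0
    x1, y1, z1, _ = src1
    return (
        fsub(fmul(y0, z1), fmul(y1, z0)),
        fsub(fmul(z0, x1), fmul(z1, x0)),
        fsub(fmul(x0, y1), fmul(x1, y0)),
        1.0,
    )


def dph(src0, src1):
    """TGSI DPH (homogeneous dot product), replicated to all four channels."""
    d = fadd(fadd(fadd(fmul(src0[0], src1[0]), fmul(src0[1], src1[1])),
                  fmul(src0[2], src1[2])), src1[3])
    return (d, d, d, d)


def dst(src0, src1):
    """TGSI DST (distance vector)."""
    return (1.0, fmul(src0[1], src1[1]), src0[2], src1[3])


def scs(src):
    """TGSI SCS: cosine and sine of src.x."""
    return (math.cos(src[0]), math.sin(src[0]), 0.0, 1.0)
```

Computing every channel from the original sources before producing the result also sidesteps the hazard that a glean subtest like "XPD test 2 (same src and dst arg)" presumably targets: writing one destination channel must not clobber a source channel that a later channel still reads.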

You can dump the shaders being generated by the driver using VC4_DEBUG=tgsi,qir,qpu in the environment (that gets you 3/4 stages of code dumped -- at times you might want some subset of that just to quiet things down).

Since we've still got a lot of GPU hangs, and I don't have reset working, you can't even complete a piglit run to find all the problems or to check whether your changes are good. What I can offer currently is this: running PIGLIT_PLATFORM=gbm VC4_DEBUG=norast ./piglit-run.py tests/quick.py results/vc4-norast; piglit-summary-html.py --overwrite summary/mysum results/vc4-norast will get you a list of all the tests (which mostly failed, since we didn't render anything), some of which will have hit assertion failures. Once you know which tests were failing assertions in the opcode you worked on, you can run them manually, like PIGLIT_PLATFORM=gbm /home/anholt/src/piglit/bin/shader_runner /home/anholt/src/piglit/generated_tests/spec/glsl-1.10/execution/built-in-functions/vs-asin-vec4.shader_test -auto (copy-and-pasted from the results) or PIGLIT_PLATFORM=gbm PIGLIT_TEST="XPD test 2 (same src and dst arg)" ./bin/glean -o -v -v -v -t +vertProg1 --quick (also copy-and-pasted from the results, but note that you need the extra PIGLIT_TEST env var for glean to pick out the subtest to run).

Other things you might want eventually: I do my development using cross-builds instead of on the Pi, install to a prefix in my homedir, then rsync that into my NFS root and use LD_LIBRARY_PATH/LIBGL_DRIVERS_PATH on the Pi to point my tests at the driver in the homedir prefix. Cross-builds were a *huge* pain to set up (Debian's multiarch doesn't ship the .so symlink with the library, and the -dev packages that do install them don't install simultaneously for multiple arches), but it's worth it in the end. If you look into cross-building, what I'm using is rpi-tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64/bin/arm-linux-gnueabihf-gcc, and you'll want --enable-malloc0returnsnull if you cross-build a bunch of X-related packages.

X with glamor on vc4

Today I finally got X up on my vc4 driver using glamor.  As you can see, there are a bunch of visual issues, and what you can't see is that after a few frames of those gears the hardware locked up and didn't come back.  It's still major progress.
2014-08-21 16.16.37
The code can be found in my vc4 branch of mesa and linux-2.6, and the glamor branch of my xf86-video-modesetting.  I think the driver's at the point now that someone else could potentially participate.  I've intentionally left a bunch of easy problems -- things like supporting the SCS, DST, DPH, and XPD opcodes, for which we have piglit tests (in glean) and are just a matter of translating the math from TGSI's vec4 instruction set (documented in tgsi.rst) to the scalar QIR opcodes.

vc4 driver month 1

I've just pushed the vc4-sim-validate branch to my Mesa tree. It's the culmination of the last week's worth of pondering and false starts since I got my first texture sampling in simulation last Wednesday.

Handling texturing on vc4 safely is a pain. The pointer to texture contents doesn't appear in the normal command stream, and instead it's in the uniform stream. Which uniform happens to contain the pointer depends on how many uniforms have been loaded by the time you get to the QPU_W_TMU[01]_[STRB] writes. Since there's no iommu, I can't trust userspace to tell me where the uniform is, otherwise I'd be allowing them to just lie and put in physical addresses and read arbitrary system memory.

This meant I had to write a shader parser for the kernel, have that spit out a collection of references to texture samples, change the user -> kernel ABI so that uniform data is no longer passed in BOs but instead as normal system memory that gets copied to the temporary exec BO, and then do relocations on that.
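The core of that validation can be modeled in a few lines. This is a hypothetical Python sketch, not the kernel code: the Insn record with its reads_uniform/tmu_sample flags stands in for real QPU instruction decoding, and it assumes each texture sample takes its texture pointer from the next unread uniform. What it illustrates is that the kernel, not userspace, decides which uniform slots hold pointers, and only rewrites those slots from BO handles it has looked up itself.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    """Hypothetical, pre-decoded shader instruction."""
    reads_uniform: bool = False  # instruction consumes the next uniform as an operand
    tmu_sample: bool = False     # write that kicks off a texture sample


def find_texture_uniforms(shader):
    """Return the uniform-stream indices that must hold texture pointers."""
    slots = []
    consumed = 0
    for insn in shader:
        if insn.reads_uniform:
            consumed += 1
        if insn.tmu_sample:
            # Assumption: the sample's texture pointer is the next unread uniform.
            slots.append(consumed)
            consumed += 1
    return slots


def relocate(user_uniforms, slots, bo_addresses):
    """Copy the uniforms and replace each texture slot's BO handle with its address.

    bo_addresses maps handles the kernel knows about to addresses it trusts,
    so userspace can never smuggle in a raw physical address.
    """
    patched = list(user_uniforms)  # private copy, like the copy into the exec BO
    for slot in slots:
        if slot >= len(patched):
            raise ValueError(f"shader samples a texture from missing uniform {slot}")
        handle = patched[slot]
        if handle not in bo_addresses:
            raise ValueError(f"uniform {slot} does not name a valid BO: {handle!r}")
        patched[slot] = bo_addresses[handle]
    return patched
```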

Instead of trying to write this in the kernel, with a ~10 minute turnaround time per test run, I copied my kernel code into Mesa with a little bit of wrapper code to give a kernel-like API environment, and did my development on that. When I'm looking at possibly 100s of iterations to get all the validation code working, it was well worth the day spent to build that infrastructure so that I could get my testing turnaround time down to about 15 sec.

I haven't done actual validation to make sure that the texture samples don't access outside of the bounds of the texture yet (though I at least have the infrastructure necessary now), just like I haven't done that validation for so many other pointers (vertex fetch, tile load/stores, etc.). I also need to copy the code back out to the kernel driver, and it really deserves some cleanups to add sanity to the many different addresses involved (unvalidated vaddr, validated vaddr, and validated paddr of the data for each of render, bin, shader recs, uniforms). But hopefully once I do that, I can soon start bringing up glamor on the Pi (though I've got some major issue with tile allocation BO memory management before anything's stable on the Pi).
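The missing bounds check boils down to making sure the sampled range lies inside the BO. Here is a minimal Python sketch, assuming a linear (untiled), single-level texture with cpp bytes per pixel; a real VC4 validator would also need to account for tiling padding and every mipmap level, and texture_fits is a hypothetical name.

```python
def texture_fits(offset, width, height, cpp, bo_size):
    """Check that a linear width x height texture at `offset` lies inside its BO.

    Simplified: ignores tiling padding and mipmap levels.
    """
    if offset < 0 or width <= 0 or height <= 0 or cpp <= 0:
        return False
    size = width * height * cpp
    # Compare as offset <= bo_size - size rather than offset + size <= bo_size:
    # in C, where the kernel check would live, the addition could wrap around.
    return size <= bo_size and offset <= bo_size - size
```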

VC4 driver week 1

It's been a week now, and I've made surprising amounts of progress on the project.

I came in with this giant task list I'd been jotting down in Workflowy (Thanks for the emphatic recommendation of that, Qiaochu!). Each of the tasks on it was something I'd have been perfectly unsurprised to see take a week or two. Instead, I've knocked out about 5 of them, and by Friday I had phire's "hackdriver" triangle code running on a kernel with a relocations-based GEM interface. Oh, sure, the code's full of XXX comments, insecure, and synchronous, but again, a single triangle rendering in a month would have been OK with me.

I've been incredibly lucky, really -- I think I had reasonable expectations given my knowledge going in. One of the ways I'm lucky is that my new group is extremely helpful. Some of it is things like "oh, just go talk to Dom about how to set up your serial console" (turns out minicom fails hard, use gtkterm instead. Also, someone else will hand you a cable instead of having to order one, and Derek will solder you a connector. Also, we hid your precious dmesg from the console after boot, sorry), but it extends to "Let's go have a chat with Tim about how to get modesetting up and running fast." (We came up with a plan that involves understanding what the firmware does with the code I had written already, and basically whacking a register beyond that. More importantly, they handed me a git tree full of sample code for doing real modesetting, whenever I'm ready.).

But I'm also lucky that there's been this community of outsiders reverse engineering the hardware. It meant that I had this sample "hackdriver" code for drawing a triangle with the hardware entirely from userspace, that I could incrementally modify to sit on top of more and more kernel code. Each step of the way I got to just debug that one step to go from "does not render a triangle" back to "renders that one triangle." (Note: When a bug in your command validator results in pointing the framebuffer at physical address 0 and storing the clear color to it, the computer will go away and stop talking to you. Related note: When a bug in your command validator results in reading your triangle from physical address 0, you don't get a triangle. It's like I need a command validator for my command validator.)

https://github.com/anholt/linux/tree/vc4 is the code I've published so far. Starting Thursday night I've been hacking together the gallium driver. I haven't put it up yet because 1) it doesn't even initialize, but more importantly 2) I've been using freedreno as my main reference, and I need to update copyrights instead of just having my boilerplate at the top of everything. But next week I hope to be incrementally deleting parts of hackdriver's triangle code and replacing it with actual driver code.

new job!

Yesterday was my first day working at Broadcom. I've taken on a new role as an open source developer there. I'm going to be working on building an MIT-licensed Mesa and kernel DRM driver for the 2708 (aka the 2835), the chip that's in the Raspberry Pi.

It's going to be a long process. What I have to work with to start is basically sample code. Talking to the engineers who wrote the code drops we've seen released from Broadcom so far, they're happy to tell me about the clever things they did (their IR is pretty cool for the target subset of their architecture they chose, and it makes instruction scheduling and register allocation *really* easy), but I've had universal encouragement so far to throw it all away and start over.

So far, I'm just beginning. I'm still working on getting a useful development environment set up and building my first bits of stub DRM code. There are a lot of open questions still as to how we'll manage the transition from having most of the graphics hardware communication managed by the VPU to having it run on the ARM (since the VPU code is a firmware blob currently, we have to be careful to figure out when it will stomp on various bits of hardware as I incrementally take over things that used to be its job).

I'll have repos up as soon as I have some code that does anything.

The X Test Suite

In the process of hacking on glamor, I got once again to the point of wondering how to test my changes. Sure, I can run some clients in Xephyr and wiggle windows around and see if things look good, but I'd like to be better than that.

Now, if you talk to any X developer about how to test your changes, they'll tell you to run the X Test Suite. This is them trolling you. Nobody runs the X Test Suite. They certainly don't themselves. While whot, dbn, Peter Harris, Kibi, and others did amazing work getting the ancient tree up to the point that it could be built by mortals, and run with "make run-tests", you ended up with a log saying "a bunch of things failed", and no idea what your implementation actually rendered. Selecting tests is awful, the reporting tools are a mystery, and there are a million lame wrappers for executing tests (the best of which on the wikis was lost in the homedir backup failure, and the best keithp knows of lived only on his disk). If you do go looking at the error output, you see crap like this:


100 90 24
2328,0
100 90 24
25d,0
19,1
4b,0
19,1
4b,0
19,1
4b,0
19,1
4b,0
19,1
4b,0
19,1
4b,0
19,1
...


Does that look like an image to you? It didn't to me.

On the airplane to LCA, I set about fixing this. I grabbed piglit, which is my hammer for every case of "I've got a bunch of tests to run and compare between commits", and built a little test suite that takes a link to your built X Test Suite repository, finds all the tests in it, sets the environment for running them, and goes about running them. Then you get all the nice test filtering and spawning and results comparison of piglit.

Then, keithp dug into the error log format, and figured out that it's RLE encoded pixel values after a width/height/depth header, with a drawn image and a reference image in each log file (so that snippet above is "you got all 0 pixel values, when you should have had 0s with a 25-pixel-wide rectangle of 1s. Yes, 25."). From that, he built a tool to read those error logs and produce pairs of pngs with color values distributed around HSV. I run that tool on the output of the tests, save them off, and link them in the piglit summaries so you can actually see what your rendering did wrong, right next to the failure report.
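keithp's tool isn't shown here, but the decoding step is small enough to sketch. The following hypothetical Python decoder reads the format as described, with one inference of my own: the run counts appear to be hexadecimal, since 0x2328 is 9000 (exactly 100 × 90) and 0x19 is the 25 in "Yes, 25." Under that reading, 0x25d = 605 leading zeros puts the rectangle's first row at row 6, column 5 of a 100-pixel-wide image. Turning the decoded pixels into HSV-colored PNGs is left out.

```python
def parse_xts_error_log(text):
    """Decode the RLE images in an X Test Suite error log.

    Assumed format, inferred from the log snippet in this post:
      * a "width height depth" header line (decimal) starts each image;
      * each "count,value" line is a run of `count` pixels of `value`,
        with both fields read as hexadecimal.
    Returns one dict per image, in log order (drawn image, then reference).
    """
    images = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue  # skip blank lines and elision markers
        if "," in line:
            if not images:
                raise ValueError("pixel run before any image header")
            count_hex, value_hex = line.split(",")
            images[-1]["pixels"].extend([int(value_hex, 16)] * int(count_hex, 16))
        else:
            width, height, depth = (int(field) for field in line.split())
            images.append({"width": width, "height": height,
                           "depth": depth, "pixels": []})
    return images


def pixel_at(image, x, y):
    """Return the pixel at column x, row y, assuming row-major order."""
    return image["pixels"][y * image["width"] + x]
```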

I've ripped up our wiki stuff for the X Test Suite, which was full of ancient lies, and replaced it with: http://wiki.x.org/wiki/XorgTesting/

megadrivers!

Mesa OpenGL drivers are mostly a big pile of common code, with a little bit of hardware-specific glue. Until not too long ago, Mesa drivers linked all the shared code into their DRI driver library – the thin libGL.so loaded the big fat whatever_dri.so.

Back in early 2011, Christopher James Halse Rogers (RAOF) upstreamed a change to Mesa that allowed building the big pile of shared code as a shared library, which the various drivers could link against, so that we had only one copy on disk. Looking at a build I've got here, my i965_dri.so is 967k and libdricore is 4390k – so for each driver sharing the libdricore, we saved about 4MB. It made a big difference to distros trying to ship install CDs.

The problem with this is that it means all of Mesa's symbols have to be public, so that the drivers can get to them. This means an application could accidentally call one of our symbols (or potentially override one of our symbols with theirs). Now, we do like to prefix our symbols to make that unlikely, but looking through the symbols exported, there are some scary ones. _math_matrix_translate()? I could see that conflicting. hash_table_insert()? Oh, I bet nobody's named a function that before.

The other problem with making all our symbols visible is that the compiler doesn't get to be smart for us. All of those calls from i965_dri.so into Mesa core are actual function calls, not inlined. They all produce relocations. We could contort our coding style to move inlineable code into headers at the expense of our sanity, but not having to manually inline is why we have optimizing compilers.

Enter megadrivers. What if we built all of the drivers together as a single .so?  I've hacked up a build of i965_dri.so to build all of the driver code in with the core. If all the drivers can do this, then we get all the benefits of sharing the built code, while also allowing link-time optimization, and the application can never accidentally look under the covers.

The tricky part here was the loader interface. There are two loaders: libGL.so.1, and the X Server. Both dlopen your dri.so and look for a symbol named __driDriverExtensions (actually, libGL.so.1 also looks for __driConfigOptions, used to support the driconf application). From the vtables in that structure, all of the rest of the driver gets called. Each driver needs a different copy of the symbol, to point to its own functions. So to do the i965 megadriver, I made a tiny i965_dri.so which has just:

0000000000200b20 D driDriverAPI
0000000000200e00 D __driDriverExtensions

for a total of 5.5k, and that links against the 4.6MB libmesa_dri_drivers9.3.0-devel.so, which exports:

00000000003fbcc0 R __dri2ConfigOptions
00000000002dc120 R __driConfigOptions
00000000002d522c T _fini
0000000000033a38 T _init
0000000000660a60 D _mesa_dri_core_extension
0000000000654fc0 D _mesa_dri_dri2_extension
00000000000ed300 T _mesa_dri_intel_allocate_buffer
00000000000eddd0 T _mesa_dri_intel_create_buffer
00000000000f6520 T _mesa_dri_intel_create_context
00000000000edc00 T _mesa_dri_intel_destroy_buffer
00000000000f5790 T _mesa_dri_intel_destroy_context
00000000000edc30 T _mesa_dri_intel_destroy_screen
00000000000ed3e0 T _mesa_dri_intel_init_screen
00000000000edc70 T _mesa_dri_intel_make_current
00000000000ed2e0 T _mesa_dri_intel_release_buffer
00000000000eddb0 T _mesa_dri_intel_unbind_context
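On the loader side, the contract described above is just dlopen() plus a symbol lookup. The real loaders are written in C; this Python sketch uses ctypes (whose CDLL wraps dlopen() and whose in_dll() does the dlsym()) to show the same lookup. find_driver_extensions is a hypothetical helper, and it only reads the first pointer stored at the symbol, treating __driDriverExtensions as an array of extension pointers. That is my understanding of the interface rather than something shown here.

```python
import ctypes


def find_driver_extensions(path, symbol="__driDriverExtensions"):
    """dlopen() a DRI driver and look up its extension-table symbol.

    Returns a c_void_p holding the first pointer stored at the symbol,
    or None if the library does not export it.
    """
    # CDLL wraps dlopen(); RTLD_LOCAL keeps the driver's symbols out of
    # the global namespace, as a loader would want.
    lib = ctypes.CDLL(path, mode=ctypes.RTLD_LOCAL)
    try:
        # in_dll() looks the symbol up and views its storage as a pointer.
        return ctypes.c_void_p.in_dll(lib, symbol)
    except ValueError:  # raised by ctypes when the symbol is not found
        return None
```

Pointed at the tiny i965_dri.so from the megadriver build (the install path depends on your setup), this should find the symbol, while the big shared library behind it only exports the internal _mesa_dri_* entry points listed above.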

With only one driver converted, this change is hardly an improvement over the previous state of affairs – now along with libdricore, you've got another copy of the core in libmesa_dri_drivers.so. I'll be working on converting other classic drivers next, so we can hopefully drop libdricore.

Initial performance results: Enabling LTO on a dricore build, I saw a -0.798709% +/- 0.333703% (n=30) effect on INTEL_NO_HW=1 cairo-gl runtime. On a megadrivers+LTO compared to non-megadrivers, non-LTO, the difference was -6.35008% +/- 0.675067% (n=10).

I think this is definitely promising.

Now, there is at least one minor downside:  Your megadriver has to link against the shared library deps of all of the sub-drivers. That means you'll be runtime linking libdrm_radeon.so along with libdrm_intel.so, for example. There's very little overhead to that, so I'm willing to trade that off for runtime overhead reduction. But the Radeon guys are excited about LLVM, which has had issues with breaking applications due to mismatched symbols between LLVM-using apps and LLVM-using drivers, and I wouldn't want our driver to suffer if that's an ongoing issue. It may be that if there are problems like this, we need to segment into megadrivers-with-that-dep and megadrivers-without-that-dep, for hopefully just two copies of Mesa core, instead of N.

I'm headed off to debconf day after tomorrow, where I'll hopefully be talking with distro folks about this plan, and some ideas for how to get graphics driver updates out faster.

zambia notes (trip done)



Holy crap cellphones.

Seven years ago, when visitors were out at Chimfunshi, they were alone. No phones. No internet. The office (somewhere else on the ranch) had a packet radio they would fire up sometimes to do e-mail, but we didn't have access. It was just you and chimps and broken English or Bemba to talk to the locals.

But when we stepped off of the bus at Muchinchi village this time (the nearest village to Chimfunshi), a man immediately walked up to us to offer help and the use of his cellphone. For $1, we got to use his phone to place about 5 calls to arrange our pickup from Chimfunshi, and this is a normal transaction for visitors to make here. On the way, as the bus passed through medium-sized villages, probably ¼ of the buildings were painted with the colors and logo of Airtel, offering prepaid phone cards along with any other services (so you'd see restaurant/takeaway/phone shops, or general store/phone shops, or investment/phone shops, or barbershop/takeaway/phone shops). The prepaid phone cards apparently can be used for their face value in cash, because everyone needs them all the time and is always running near empty.

Giving that guy $1 for phone usage and then being told he had to go disappear into the shop to refill it was I think the 4th time my sister and I looked at each other thinking “This is one of those things that sucker tourists fall for and he's going to disappear with our money, right?”, but as always it went wonderfully. There was only one scam we got pressured with (a guy asking us for 4 units of some other currency for “tickets” while we were on the free-for-passengers Zambia/Botswana ferry; the women he bothered immediately before us didn't fall for it either, and we hadn't had to do that on the way over), but it was just too obvious. Anything that smelled just a little bit like a setup was fine.

It was also a relief that for the hour we stood around in Muchinchi, we didn't get hassled by a single vendor. In Livingstone (where we were staying on the Victoria Falls portion of the trip), and in Lusaka, the moment we white tourists exited any vehicle, there were immediately multiple vendors approaching us with the standard script:

“Hello what is your name? My name is X. Where are you from? I'm from the village of Y (never a city or even a significant town), do you know it? I would like to show you my crafts, these are things I make myself in my village...”

Finally our driver arrived, and my sister and I ended up riding in the back for the ~45 minutes to the ranch. From the back of the truck, on the highway after dark, you could see more stars at once than I ever can at home in Oregon. At home I'm either in the city with all the light pollution or in the woods with trees in the way. But out here it's all dark because the power lines that crisscross the country never supply the little villages, and the land is flat and the trees are short. I think I would be willing to do an 11-hour bus ride again if I could get more of that.

Then at last we were at Chimfunshi, and things came back to me. Everyone sitting around the campfire for meals. Making peanut butter and jelly sandwiches in the kitchen for lunch when you're going to hike to the enclosures. No beer left because somehow the math of 24 students * X beers per student per day * Y days between shopping trips always comes out to “you needed more cases of beer”. Communication in the student group devolving into pant-hoots and food-grunts and clapping-and-pointing as we spend too much time around the chimps (turns out groups of students thrown together and exposed to the same stimuli will develop the same jokes!).

chimps
Delicious tungulus for the chimps

This trip, I split my time between hanging out with chimps, helping at the local school, and reading in camp. The chimps were almost as fascinating as last time – I spent many hours watching enclosure 4 (sinkie, nicky, commando, kambo, kit, ken, bobby, and a few others I never learned to identify). I spent less time watching enclosure 2 (with chimps I'd met last time – pal, tara, tobar, goliath, ingrid, ilse, lionel, etc.), mostly because ken was super cute to watch, especially when the older males like sinkie would roughhouse with him.

While chimps were slightly less intrinsically interesting this time, this was redeemed by research going on that I don't remember from my previous trip. There was a group from Wisconsin that had a couple of research projects. One study was to throw in 50 peanuts, one every 30 seconds, and record who gets it. This let them determine dominance hierarchies, since chimps are awful at sharing (how many times did I see mothers steal food from their kids, or big males steal from whoever they felt like?). This provided objective information they wanted for interpreting their second study, which was to observe behavior when chimps were exposed to pant-hoots from chimpanzees they didn't know (i.e. recordings from the group down the road) as compared to recorded pant-hoots from groups they were accustomed to (the one across the road). Over the week I was there, there was apparently an increase in tension in the groups as they became concerned that their borders were being encroached upon. In one of the tests, I saw a group that was interrupted by a foreign call during feeding (it was unintentionally early) actually leave high-reward food and go single-file out into the bush to search the border for what chimp might have made that noise (one female stayed behind and had a *feast*, though).

The one piece of research I got to participate in was a mapping project of enclosure 4 done by the Gonzaga student group we were attached to. Humans don't generally go into the enclosures, except for about 20' near the buildings for doing a bit of maintenance, so it's hard to see how much the chimps actually use their ~1km square enclosures. One day, all the chimps were brought in, and we got to go inside, in ~6 groups of 3, to try to hand-map the trails through the enclosure (why didn't we just use GPS? Poor planning, there.). We visitors tend to arrive around mealtime at the buildings, so we have a biased view of the chimps spending all their time next to the human structure waiting for food. But what we found out there was a massive network of trails all throughout the enclosure, even in this oddball group that does less normal nesting behavior than enclosures 1-3 do.

The exciting chimp event while we were there was Milla's escape. While most chimps at Chimfunshi were rescued at a young age when they were being smuggled to be sold as pets, a few like Milla are more human socialized. She's an older chimp who before Chimfunshi was kept at a bar, smoking and drinking hilariously for patrons. So she had some different expectations from the more wild chimps, like having a blanket to sleep with, and drinking her water from bottles. She also was a much better problem solver, and while in the large outdoor enclosures she had figured out to take a long branch, prop it up on the electric fence, and climb on out. As a result, she'd been locked in a ~10'x10' cage so that she couldn't escape, and more importantly, couldn't teach the other chimps to escape. However, during a moment of confusion, her door was opened into another cage section, and thus she got out into the enclosure after feeding. The keepers expected her to try to escape, and sure enough she soon found a branch, and moseyed on up it and out into the world.

Since the keepers expected this, they had a video camera along for what would happen next, which was mostly walking, sitting, looking around, and walking again. The only tense moment captured was when she made a big display dragging a wheelbarrow around by one hand. At one point she went to the keepers' hut where they make the nshima (maize meal) balls that are one of the food supplements for the chimps (this is not their native habitat, and I think the enclosures are too small to support the groups even if it was). She found the pots, put water in the pot, stirred the pot, and sampled the water from a spoon, as if she was cooking. She also did something that I think was mimicking washing dishes with one of the plates. Eventually one of the keepers got close enough to her to give a sedative injection by hand, at which point she was finally transferred to enclosure 5 where the 3 other escape artists live. This is a small enclosure more like you'd see at a zoo, with reinforced bars they can't break through and locks protected from being pried off, but at least now Milla gets friends and sun and things to climb again.

The school was fascinating. It was started 6 years ago, after my family started making earmarked donations for a teacher's salary. Teaching is done in English officially, though a lot of communication was in Bemba (the local language in this region). The morning class was ~7 students at the advanced level (averaging 5 present per day) who had some limited English and school skills, though there appeared to be a lot of rote learning. They could read and copy off the board, and sometimes respond to questions that had very structured answers, but open-ended questions went worse than in my sister's kindergarten classes. The afternoon class was the 54 beginning students, and it was chaos. How do you teach kids from age 4-12 all in one classroom? Especially when you don't have the ability to make copies, so you can't set one group up to work independently while doing something with another group of students at a different level. It looked like a lot of the time at least half the class was just bored, either not understanding or doing work they knew long ago. Still, it's better than before, when these kids were just playing around all day. Some are even going on to study in other schools and “make something of themselves” (the teacher is big on encouraging kids to become something more than subsistence farmers, and apparently has been successful at motivating many), and he's had some success in discouraging girls from getting pregnant at a young age.

"Mary wore her red dress" song in class

My mom and sister spent a lot of time writing and constructing reusable classroom materials. I was brought in as a cutter and contact-paperer. Previously the classroom materials consisted of a single badly designed curriculum book for each grade, 3 maps, faded posters of English color words, posters of English month and day names, and a few vocabulary-related posters that previous student groups had made. We added a reusable calendar, materials for doing songs to teach English, some posters for doing grammar lessons, and a string with clothespins in front so the posters could be temporarily hung up while doing a lesson.

The big news for the school while we were there was that Innocent (the Zambian general manager at Chimfunshi) successfully petitioned the government to recognize the collection of people that live next to Chimfunshi as a real village. With being a real village comes the support for having a school, which means that they'll have two teachers at 10x the salary our donation was providing. With that support, they'll be able to split the classes and more effectively teach, but also be able to get more classroom materials that are direly needed.

A student being groomed by a drunken baboon at the barbecue.
Seriously, people, hold on to your drink or the baboon will take it.

I'm now in Barcelona for the biking portion of my sabbatical. I'd like to thank daniels for hanging on to my bike while I was in Zambia, which saved me an outrageous amount of money I would otherwise have spent shipping it.

Zambia notes (day 4)


We land in Lusaka, and have another 6 hour layover before the little prop plane flight to Livingstone, so we head to the bank at the airport to exchange money. There are a couple of guys in uniforms already at the counter changing money. There are a surprising number of them waiting outside, too. We get in line, then another guy in uniform cuts ahead of us. Then a woman cuts ahead. Then another man does. Eventually they all finish up, and I think “maybe there's a different queuing system, like ‘ultimo’ in Cuba, and we're ignorant tourists who violated that rule,” and I leave it at that.

Later we're in Livingstone, and meet up with our guide for the next few days. We get in this giant safari truck, and as he's about to pull out from the airport, we see a bunch of guys running, then an entourage of cars goes racing by, yelling at our driver for trying to pull out ahead of them. He gets stopped by more uniforms and hauled off to the police station, where they try to fish for some bribes, saying “were you trying to kill the First Lady, cutting her off like that?” The First Lady? Yeah, she just came from Lusaka. Oh! That woman who cut ahead of us in Lusaka was the president's wife! Now the cutting and all the guards (and, now that I think about it, the surprising amount of money being changed) suddenly make sense. Also, our driver had the good sense to stash his money when he saw the cops coming, so he could tell them he didn't have any and not pay any bribes. It all worked out.

Our first morning, I groggily woke up with the thought “whatever asshole has that rooster sound for their alarm clock needs to quit hitting the snooze button.” Apparently I'm slow to adjust.

Victoria Falls is awesome. When you're offered a chance to rent a poncho for K5 (under $1), it's a good deal. Take it. You get to hike around this trail of beautiful viewpoints on a finger of rock extending out into the gorge below the falls, except that the water falling 80m or whatever throws spray back up 100m, and those viewpoints have nonstop rain on them as the spray comes back down. Not just a bit of mist. Rain. The kind of rain that you'd say “man, it's pissing down rain out here.” The kind of rain that gets even Portlanders wet. You can't get a clear sense of the width of the falls, because no matter where you are along that finger of rock, there's a fog of mist in your way whenever you look anywhere but directly at the waterfall.

Hanging out in Vic Falls park is a significant pack of baboons. They are not shy. That's nice in that you get great pictures. It's not nice in that we saw baboons try to steal two women's bags. One succeeded, and the baboon ran off to a thicket to rip the bag apart looking for food.

We also went to a craft market. We'd been through one at Vic Falls, where it was just a bunch of vendors selling the same stuff at every stall. At this market, though, there were guys in front of their stalls painting, or doing finishing sanding on chess boards. Seems legit, and we couldn't resist.

Well, I almost resisted, since I'm not that into buying things. But I got a good deal on some nice hand-carved stone bookends that I can actually use. So far my only success in haggling has come when I was legitimately not that interested (or had no money) and walked away, then got offered a reasonable price on the way back. This was no different. The stall I was at was run by 3 brothers, and each wanted to sell you the particular pieces he made (and in the sale, the brother who made my bookends offered to throw in one of his brothers' pieces for cheap to avoid making change).

Later as we browsed other stalls, though, we saw a subset of “their” stuff elsewhere, and similarly for a few other things we'd bought. So my mom asked one of them at one point, and the woman explained that the vendors sell their things to each other to round out their stalls. That explains why the stone carvers' stall was full of mostly stone, but still had some of those wooden masks and figurines, too, while the painters had mostly paintings but also a few of the most popular stone carvings and the same wood carvings as everyone else. Either that or they're all mass produced somewhere and we got scammed on authenticity of everything but the paintings, except I feel dirty thinking in terms of “authenticity” so I'm going to stop.

The next day, we took a trip to Chobe national park for a boat safari and jeep safari. I was extremely dubious of this – I imagined a fenced park with a few dejected animals maintained by the business owners. Actually, Chobe is an unenclosed, 11,000 km^2 national park, patrolled by the army to prevent poaching and keep people from offroading through it. We saw elephants, giraffes, kudu, impala, warthogs, buffalo, hippos, and an unreasonable number of crocodiles. No big cats, which while they exist in the park are rarer than I would have expected given that there were these delicious-looking kudu and impala everywhere you look. The most surprising thing, though, was that perhaps even more numerous than the gazelles were the elephants, who were routinely mildly irritated that we were trying to drive on the roads they were moseying across. At one point, within about 20' of our jeep, 90 degrees of vision was solid elephants. 20' is not far away on the scale of elephants, especially when one looks at you, makes a little scream, and steps toward you.

My impression of wildlife worldwide has generally been that everything but squirrels and pigeons is on a rapid decline to extinction due to humanity ruining everything. But here in this park, there are reportedly 80,000 elephants, and we can attest that there are a ton of baby elephants. These aren't kept in some protected natural environment separated from humanity and predators – they're still scared of their (many) babies getting eaten by crocs when crossing the river, and there are still people in jeeps zipping around gawking at them all the time, but it still seems like they're thriving. All they're protected from is hunting by humans, which is easily paid for by gawking tourists like myself. My skepticism of ecotourism is now reduced.

On the way to Chobe, we had to take a ferry to cross into Botswana. All truck traffic apparently has to do the same. For I think a couple of kilometers before the ferry were flatbed trucks stacked with freight, waiting to get on the two ferries that could take a truck at a time. Apparently, they'll wait there for up to 2-3 days to get across (and a quick Fermi estimate says that's about right). There was a whole little town formed at the Zambian side of the crossing, offering mostly food and insurance on your goods. We were amazed – how would the governments of these countries not build a bridge here? The river's not that huge, and it seems like you could charge some exorbitant tolls if it saves someone from waiting in line with a truck for 3 days while worrying about their goods getting stolen (did I mention the insurance vendors?).

No photos for this post, since I lack an SD card reader. Even if I did, I'd have to sort through 1000 pictures of elephants and crocodiles to choose a couple. So far, no software work done, even on the airplane. Success!

Taking a break from software.

Today I'm leaving for my 2 month sabbatical from Intel. It's going to be a good break from nonstop software development for the last 7 years. My plan:

  • 1 week touring in Zambia (with my mom and sister)

  • 2 weeks at the Chimfunshi Wildlife Refuge in Zambia

  • 3 days in London with family.

  • Leave them, take the train to Barcelona

  • Bike from Barcelona up through France, hopefully into Italy, and end at DebConf 2013 (August 10-17).

During this time I'm going to try to do as little software work as I can.

Backyard slackline limits reached

Two weeks ago we built a slackline setup in our back yard. The issue we had was that we don't have any trees back there to tie up to. Common solutions in this case involve building an A frame and using whatever sort of anchor you can come up with, with plenty of options available.

We wanted better. The yard could only fit about 40 ft of line, and we didn't want to sacrifice precious length to the space between our anchors and the A frames.

The first plan we were working with was to put a pipe in some cement, then slide a smaller pipe into it, and use that as our fake tree to anchor to. That way there's a solid anchor, but it's removable if I decide to sell the house or something some day. I found some numbers in guidelines for building railings, though, that indicated that you'd need massive steel pipe to support the loads we're talking about.

What we went with in the end was a wooden 4x4. We'd heard that slackliners were successfully using those in home setups. But we were a little wary of trusting a wood 4x4 more than a steel pipe. So what we buried in the cement was a post sleeve, so that we could just slide our 4x4 into the cement hole after it was set. The cement was 3 feet deep and just over 1 foot across (if you decide to go this route: post hole diggers are *awesome*). This let us put an 8 foot 4x4 in each and be able to set a line at heights up to around 4 feet off the ground. But just in case, we also dropped some heavy chains into the cement, so we'd have anchors for A frames if this posts thing doesn't work out.

We first used the system last Sunday with great success. It's a typical 4-carabiner primitive system but we used a double pulley system behind that to get enough tension from a single person tightening that you'd stay off the ground in the middle. There was a disturbing amount of bending and some creaking in the 4x4s, but they held.

Today Scott was setting up the line again, and said "I got it nice and tight, look at that!", and I hopped on. I made it about 1/3 of the way, when there was a snapping sound and suddenly I was on the ground. Luckily failure wasn't as catastrophic as we feared. The post had just bent over, and not detached and gone flying.

Our next plan was to use steel I-beams: the backup plan that justified the 4x4 sleeves. I'm still concerned, though -- a beam stress calculator program says that for what we're thinking is up to 1600 lbs of force at 4 feet from the support point, we end up with a maximum bending stress at the support point of 164 ksi on an S3x7.5 I-beam (the biggest that will fit in our sleeves as far as I can see). If I'm supposed to compare this number to the yield stress of the steel the beam would be made of, that number is only 22 ksi.

The plan for the moment is to throw together some A frames (actually, X frames -- Scott built and used some of those successfully this week, and it sounds easy enough) and use that unless we can figure out that I was wrong and steel will hold.

FOSDEM 2010

Thanks to last-minute travel approval, I got to come to FOSDEM again this year. I gave a short talk about cairo-gl. The OpenOffice presentation is here. But a few more words here, since reading slides is failure.

I've been promising great 2D performance from open source graphics for years. It was reaching the point where I was feeling awfully bad about being wrong so frequently. So this summer I started playing in my free time with making a GL backend for cairo. There was a previous sort of GL backend in the form of glitz, but it made a big mistake in trying to abstract GL through a Render-like API. The problem with accelerating 2D is that Render is a bad match for hardware!

A native GL backend turned out to be shockingly easy, now that we have support for EXT_framebuffer_object all over, non-power-of-two textures, and GLSL. Here's a comparison of 3 backends, normalized to the image backend. Bigger bars mean faster.



This shows an accelerated backend beating the CPU rasterization backend on 3 tests. Note that the comparison is a little unfair in the image backend's favor -- we can't scan out from cached system memory buffers, so if you want to actually see the results you have to do an upload at some point, which isn't reflected in the cairo-perf-trace results. Being able to beat that with GPU rendering to something that could be scanned out is pretty awesome. But that's only 3 tests -- for most of them image is winning. I've got some ideas for hacks on the 965 driver that may fix up a bunch of those bars (it's hard to estimate, since it's all about cache effects, and fixing those tends to improve performance by more than the time sysprof attributes to them).

Since comparing to image isn't too fair, and we're not using image today, I did a comparison to Xlib. This looks awesome:



By replacing Xlib usage with GL, we get a speedup on almost all the testcases, and a huge speedup on one that Xlib is pathologically slow on (I haven't figured out why yet). We've got a good pass rate on the cairo test suite, so I think this stuff is ready for people to start experimenting with in apps.

There's much more to do for performance still. I've got a plan to work on the 965 driver to improve glyphs-heavy tests like firefox-talos-gfx (and ETQW and WoW as well). For firefox-talos-svg, right now we're hitting aperture full because of all the spans data we're sending out before the GPU gets done with things. If we speed up the GPU rendering just a little, for example by tuning the inefficient shaders we're using right now, we can probably avoid hitting aperture full and cut CPU further. I think we're missing throttling for non-swapbuffers apps in DRI2, and we might actually do better and avoid aperture full if we do some appropriate throttling. And there's a lot of room for people who'd like to experiment with GL shader and state optimizations to jump in and tear this code apart.

I'd say that the Linux 2D acceleration story is starting to finally look good after all these years.

video hackfest

I got back from the video hackfest a day ago. It was a very productive meeting, even if most of it was getting across the features and requirements of our various projects to the other projects involved. I finally understood why using GL in gstreamer is hard, why gstreamer wants to be that way, why we're changing the locking model in cairo, and other details of projects I'm interested in but not actively involved in. And I got a few chances to go over how GL works with threading and the ridiculous context model that it has.

Out of that, I ended up with the project of making GLX allow binding a single GL context into multiple threads. This should largely fix the disaster that is GL multithreading. The basic idea is: I have a collection of threads that want to work on a single context because they're sharing all the same objects and want to have some sort of serialization into a GL command stream. If we pass the context around between the threads, unbinding and rebinding, you get this forced command stream flushing that will kill performance. If we make multiple contexts, then at the transition of changed objects between one thread and another you have to flush in the producer and re-bind the object in the consumer. Whether or not that could perform well, we determined that in the gstreamer model we couldn't know in time whether the producer is going to be in a different thread than the consumer: you'd have to flush every time just like passing a single context around.

So here comes a simple hack: Just rip out the piece of the spec that says you can't bind one context into multiple threads. Tell the user that if they do this, locking is up to them. It's not an uncommon position for projects to take, and it will let us do exactly what we want in gstreamer: everyone works in the same context[1], and when you want access to the GL context, just grab the lock for the context and go.

Development trees are at:
http://cgit.freedesktop.org/~anholt/piglit/log/?h=mesa-multithread-makecurrent
http://cgit.freedesktop.org/~anholt/mesa/log/?h=mesa-multithread-makecurrent

The testcase gets bad rendering at the moment. So I made the testcase for the non-extended version, and it still didn't render, with either i965 or swrast. Next step is to test my testcase against someone else's GL.

Incidentally, Apple's GL spec allows binding a single context to multiple threads like this. Windows GL doesn't.

[1] OK, so that's not exactly true. I'm assuming that elements negotiate a single context through some handwavy caps magic -- people have said that this is possible. You can still end up with two contexts, though, like with the following pipeline: videotestsrc ! glupload ! glfilterblur ! gldownload ! gamma ! glupload ! glimagesink. The second group of gl elements doesn't know about the first, or have any way to communicate with them. But if each element calls glXMakeCurrent at the start, it'll be approximately free for the one-context case, and just work for the multiple-contexts case.

goodbye internets

This Sunday I'm heading out on a bike trip through Oregon and Washington with my housemate's dad's family and friends. 50-70 miles a day for 7 days. Gear hauled by car, sleeping in hotels, and eating out all the time. The first two ground rules that were sent out:


  • The morning of the ride we draw straws for SAG wagon driver order. Everybody drives a half day. After everybody has driven once we negotiate.

  • Afternoon driver responsible for buying a fifth of Jack Daniels and three or four six packs of good beer, plus chips and salsa. Also some soda.



You probably won't hear from me for a while. I don't exactly intend to be merging patches in the evenings.

I've certainly been busy with patches recently, though. The big update yesterday was http://lists.freedesktop.org/archives/intel-gfx/2009-September/004122.html -- those nasty 8xx hangs with GEM should now be fixed in drm-intel-next, to be landing in a 2.6.31.x near you soon. I've spent a long time on this bug, and we came incredibly close to fixing it with the clflush idea on the 8xx_chipset_flush() back in March, but the fact that we were writing to an uncached page meant we completely bypassed the cache we were trying to flush.

The 2.6.31 kernel is finally out, with a lot of improvements in our graphics driver and only a slight delay in releasing due to us screwing up (uh, let's not do that again). One of the biggest additions is DisplayPort support. DP is like HDMI done right -- more bandwidth, nice connectors, a better compatibility story, and a low power design for use inside of laptops. keithp wrote the code and has been using it on his x200s's dock, and I came awfully close to picking up an x301, which has it on the laptop itself. KMS is also in much better shape once again, with a ton of work from Ma Ling, Zhao Yakui, and Jesse Barnes. It's also nice, looking back at the log, to see a lot of fixes for serious issues come in from non-Intel folks -- integrating with the community is continuing to pay off for us. My contribution this cycle was generally GEM stability fixes again, though a number of fixes came from Chris Wilson's careful reviews of the code.

I've also fixed what I think was the last major regression for texture tiling, which is around a 30% win on many GPU-bound OpenGL apps on the 965 and newer. That'll be on by default in Mesa 7.6.

On the plate for 2.6.32 is framebuffer compression support for around 0.5W of power savings, automatic downclocking of the GPU when idle (no need for user power management, and another 0.5W or so), experimental new GPU reset support from Ben Gamari and Owain Ainsworth (that's right, an OpenBSD developer), and of course more KMS fixes and new hardware support. We're also probably going to land execbuf2, which will let us do texture tiling efficiently on the pre-965 hardware.

Most of my time recently, though, I've finally been able to get back to spending serious time with Mesa after being stuck in 2D and the kernel for years. It's where I enjoy working, despite the build system and development model. In the last few weeks, I've added support for ARB_sync, ARB_map_buffer_range, ARB_depth_clamp, ARB_copy_buffer, and ARB_draw_elements_base_vertex for our hardware. All but ARB_depth_clamp (AKA NV_depth_clamp) work on pre-965 hardware as well.

I also fixed up some major performance penalties for applications using OpenGL correctly. I of course found these by writing an application using OpenGL the way it should be used as of the last 5 years or so. Unfortunately most open-source software out there fails in that respect (thinking in particular of OpenArena, blender, and the various TuxRacer forks, being what seem to be the most popular offenders). One fix was a 50-100% fps improvement, and another was a 70% CPU usage win. I'm going to be working on NV_primitive_restart support soon, which will be another CPU usage win, and likely a performance win as well on the 965.

Another big change for us has been the addition of Ben Holmes to our team. He's a fellow student in idr's OpenGL class, who's been writing tests for us. A lot of bugfixes I've done recently have been "hey, Ben, I think there might be a problem if an app does the following thing." He writes a test in piglit, tests that it fails, sends it out, we add it to the tree, then go fix the driver. Today, I got dFdx()/dFdy() fixed in our GLSL support by exactly this method. Unfortunately we'll be losing him back to school soon, but it's been well worth it so far.

goodbye internets

This Sunday I'm heading out on a bike trip through Oregon and Washington with my housemate's dad's family and friends. 50-70 miles a day for 7 days. Gear hauled by car, sleeping in hotels, and eating out all the time. The first two ground rules that were sent out:


  • The morning of the ride we draw straws for SAG wagon driver order. Everybody drives a half day. After everybody has driven once we negotiate.

  • Afternoon driver responsible for buying a fifth of Jack Daniels and three or four six packs of good beer, plus chips and salsa. Also some soda.



You probably won't hear from me for a while. I don't exactly intend to be merging patches in the evenings.

I've certainly been busy with patches recently, though. The big update yesterday was http://lists.freedesktop.org/archives/intel-gfx/2009-September/004122.html -- those nasty 8xx hangs with GEM should now be fixed in drm-intel-next, to be landing in a 2.6.31.x near you soon. I've spent a long time on this bug, and we came incredibly close to fixing it with the clflush idea on the 8xx_chipset_flush() back in March, but the fact that we were writing to an uncached page meant we completely bypassed the cache we were trying to flush.

The 2.6.31 kernel is finally out, with a lot of improvements in our graphics driver and only a slight delay in releasing due to us screwing up (uh, let's not do that again). One of the biggest additions is DisplayPort support. DP is like HDMI done right -- more bandwidth, nice connectors, better compatibility story, and a low power design for use inside of laptops. keithp wrote the code and has been using it on his x200s's dock, and I came awfully close to picking up an x301 which has it on the laptop. KMS is also in much better shape once again, with a ton of work from Ma Ling, Zhao Yakui, and Jesse Barnes. It's also nice in looking back at the log to see a lot of fixes for serious issues in from non-Intel folks -- integrating with the community is continuing to pay off for us. My contribution this cycle was generally GEM stability fixes again, though a number of fixes came from Chris Wilson's careful reviews of the code.

I've also fixed what I think was the last major regression for texture tiling, which is around a 30% win on many GPU-bound OpenGL apps on the 965 and newer. That'll be landing on by default in Mesa 7.6.

On the plate for 2.6.32 is framebuffer compression support for around a .5W power savings, automatic downclocking of the GPU when idle (no need for user power management, and another .5W or so), experimental new GPU reset support from Ben Gamari and Owain Ainsworth (That's right, an OpenBSD developer), and of course more KMS fixes and new hardware support. We're also probably going to land execbuf2, which will let us do texture tiling efficiently on the pre-965 hardware.

Most of my time recently, though, I've finally been able to get back to spending serious time with Mesa after being stuck in 2D and the kernel for years. It's where I enjoy working, despite the build system and development model. In the last few weeks, I've added support for ARB_sync, ARB_map_buffer_range, ARB_depth_clamp, ARB_copy_buffer, and ARB_draw_elements_base_vertex for our hardware. All but ARB_depth_clamp (AKA NV_depth_clamp) work on pre-965 hardware as well.

I also fixed up some major performance penalties for applications using OpenGL correctly. I of course found these by writing an application that uses OpenGL the way it should have been used for the last 5 years or so. Unfortunately, most open-source software out there fails in that respect (I'm thinking in particular of OpenArena, blender, and the various TuxRacer forks, which seem to be the most popular offenders). One fix was a 50-100% fps improvement, and another was a 70% CPU usage win. I'm going to be working on NV_primitive_restart support soon, which will be another CPU usage win, and likely a performance win as well on the 965.
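Why primitive restart saves CPU time: without it, an app drawing many triangle strips out of one index buffer has to issue a separate glDrawElements call per strip (or use glMultiDrawElements), and every call pays the driver's per-draw overhead. With NV_primitive_restart, a single call carries all the strips, separated by a sentinel index. Here's a small model of how the index stream gets carved into primitives, assuming a hypothetical restart value of 0xFFFF; it illustrates the semantics and is not driver code:

```python
from typing import List, Sequence

RESTART_INDEX = 0xFFFF  # hypothetical value an app might pass to glPrimitiveRestartIndexNV


def split_at_restart(indices: Sequence[int], restart_index: int) -> List[List[int]]:
    """Split one index stream into separate primitives at each restart index.

    Each restart index ends the current primitive and starts a new one.
    Empty primitives (e.g. two restarts in a row) would draw nothing, so
    they are dropped here.
    """
    strips: List[List[int]] = []
    current: List[int] = []
    for idx in indices:
        if idx == restart_index:
            if current:
                strips.append(current)
            current = []
        else:
            current.append(idx)
    if current:
        strips.append(current)
    return strips


if __name__ == "__main__":
    stream = [0, 1, 2, 3, RESTART_INDEX, 4, 5, 6, 7]
    strips = split_at_restart(stream, RESTART_INDEX)
    # One draw call with restart enabled versus one call per strip without it.
    print(f"{len(strips)} strips: 1 draw call with restart, {len(strips)} without")
```

On hardware that understands a restart index natively, that splitting happens on the GPU, so the CPU cost stays at one draw call no matter how many strips there are.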

Another big change for us has been the addition of Ben Holmes to our team. He's a fellow student in idr's OpenGL class and has been writing tests for us. A lot of the bugfixes I've done recently started as "hey, Ben, I think there might be a problem if an app does the following thing." He writes a piglit test, confirms that it fails, and sends it out; we add it to the tree, then go fix the driver. Today, I got dFdx()/dFdy() fixed in our GLSL support by exactly this method. Unfortunately we'll be losing him back to school soon, but it's been well worth it so far.

anholt @ 2009-07-22T21:19:00

Another quarter, and another release. I think we've made big progress here. One of my favorite reports on the mailing list was from a company deploying our graphics driver that was delightfully surprised to find their XV tearing issues fixed. That was a lot of work, and I had been dreading hearing that it hadn't worked out for them.

A few things I was involved with that I'm happy with:
- Fixing the "memory leak" of GEM buffer objects
- Fixing quake4 and doom3 which regressed back in Mesa 7.4
- Fixing VBO performance on 915 (I don't like discouraging people from doing the right thing)
- OpenGL performance and correctness fixes for cairo-gl
- Fixing a bunch of FBO problems, particularly on 965
- Hardware-accelerated glGenerateMipmap and SGIS_generate_mipmap. Finally.
- Fixing many GLSL bugs on 965.
- Fixing occlusion queries (sauerbraten)
- Fixing a crystalspace regression (woo supporting other open-source software developers)
- Fixing ut2004 hangs on G4x.

Already, things are looking exciting for our next release. Thanks to cairo-perf-trace, I've just landed a 10% improvement in firefox performance for UXA. We're also working towards the future with cairo-gl, which I'd started in my free time based on ickle's cairo-drm work, and which is now merged into cairo master. This is a major step towards maintaining one driver (OpenGL) per chipset, with the other part being a real X-on-GL backend so that legacy stuff not using cairo doesn't suffer too badly. Both of these are now on our plates for work in the next few months.

But we've got a ways to go to get there. I know we've got fixes to be made to our OpenGL before cairo-gl's going to shine, and cairo-gl needs a lot of work as well:
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image             firefox-20090601   92.877  108.208   6.54%   15/15
[  0]     xlib             firefox-20090601   46.609   46.832   0.28%    6/6
[  0]       gl             firefox-20090601  238.103  238.195   0.35%    5/6
(tested with master of everything on a 945GM, lower is better)
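Reading those medians as ratios against the xlib backend makes the gap plainer. This is just arithmetic on the three median times in the table above, nothing new measured:

```python
from typing import Dict

# Median run times (seconds) for firefox-20090601, copied from the table above.
MEDIANS: Dict[str, float] = {"image": 108.208, "xlib": 46.832, "gl": 238.195}


def slowdown(medians: Dict[str, float], baseline: str = "xlib") -> Dict[str, float]:
    """Each backend's median time divided by the baseline's (higher = slower)."""
    base = medians[baseline]
    return {name: t / base for name, t in medians.items()}


if __name__ == "__main__":
    for name, ratio in slowdown(MEDIANS).items():
        print(f"{name:>5}: {ratio:.2f}x the xlib time")
```

That works out to roughly 2.31x for image and 5.09x for gl, so cairo-gl was taking about five times as long as the xlib backend on this trace.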


One of the things we need to figure out is what sort of shader support cairo-gl's going to be based on, and what we want to do in our driver to support it. The 915 and other hardware of that era can't do dynamic flow control, so many GLSL shaders would be unimplementable. But as developers of software targeting GL on these chips, we would love to write in GLSL instead of ugly ARB_fragment_program and ARB_vertex_program, even if we know we can't use some language features. We could maybe expose the GLES GLSL extension on 915, which explicitly says that programs with dynamic flow control and other features missing on this generation of chips may not compile. We could also be sneaky and do the same with desktop OpenGL GLSL and still be within spec (I think, and afaik some closed vendors have done it as well), though some apps might get angry with us for doing so. This is up in the air, but I'm hoping our answer is "expose it".
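To make the flow-control limitation concrete: hardware without branching can still handle a simple `if` by computing both sides and selecting the result per pixel (a set-on-compare instruction such as ARB_fragment_program's SLT produces the 0/1 mask), and a loop with a compile-time trip count can be unrolled. A loop whose trip count depends on runtime data has no straight-line equivalent, which is where such shaders fail to compile. A toy Python model of that if-conversion, with made-up names; it shows the general technique, not how our compiler is written:

```python
def shade_branchy(x: float) -> float:
    """What the shader author writes: a data-dependent branch."""
    if x > 0.5:
        return x * 2.0
    return x * 0.5


def shade_flattened(x: float) -> float:
    """What branchless hardware can run: evaluate both sides, then select."""
    then_val = x * 2.0
    else_val = x * 0.5
    # 1.0 where the condition holds, 0.0 elsewhere, like SLT(0.5, x).
    mask = float(0.5 < x)
    # Blend with the mask; since mask is exactly 0.0 or 1.0, one side is discarded.
    return mask * then_val + (1.0 - mask) * else_val


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(x, shade_branchy(x), shade_flattened(x))
```

The cost is that both sides always execute, which is fine for small branches but is exactly why more complex GLSL programs become impractical or impossible on this generation.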

(Oh, and other people are doing exciting things, like jbarnes and mjg59 making big power savings, but I'll leave blogging to them)