Keith has noted the adventures that we've recently had in the GEM branch with the disagreement between the CPU and GPU about how memory gets addressed. We've got a pretty decent solution now, I think, though I'm having some trouble getting the MCHBAR mapped on desktop 965s, so I can't tell what mode the CPU is in. I automatically disable tiling in that case, to avoid broken rendering. Apparently the MCHBAR is locked out so that mortals don't go in and break their memory configuration. But overclockers have figured out how to unlock it, so I'm sure we will figure it out as well in due time.
Next project for me is fixing issues with PBO and FBO. While we enabled support for ARB_framebuffer_object and ARB_pixel_buffer_object on 965 with TTM/GEM development, that implementation from the 915 came with a lot of bugs that we haven't got around to fixing. Now that more people are looking to use buffer objects, we need to get those bugs fixed.
The first issue I've found is that we're not flushing batchbuffers full of rendering before mapping buffer objects. This will anger conformance tests that do rendering then read the resulting data out immediately. I wrote up a fix for this today that I'll be testing in the next couple of days.
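To make the ordering problem concrete, here is a minimal sketch of the flush-before-map idea. It is not the actual fix: `struct batch`, `struct buffer_object`, `batch_flush` and `bo_map` are hypothetical stand-ins invented for illustration, and a real driver would also have to wait for the GPU to finish before the CPU reads the data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration, not the real driver structures. */
struct batch {
    size_t used;          /* bytes of queued, not yet submitted commands */
    unsigned flush_count; /* number of times the batch was submitted */
};

struct buffer_object {
    uint8_t *storage;         /* CPU-visible backing store */
    bool referenced_by_batch; /* queued rendering reads or writes this buffer */
};

/* Submit whatever is queued. A real driver would hand the commands to the
 * kernel here; this sketch only records that a submission happened. */
static void batch_flush(struct batch *b)
{
    if (b->used == 0)
        return;
    b->used = 0;
    b->flush_count++;
}

/* Map a buffer for CPU access, flushing first if queued rendering touches
 * it, so the CPU never sees contents from before that rendering. */
static void *bo_map(struct batch *b, struct buffer_object *bo)
{
    assert(b != NULL && bo != NULL);
    if (bo->referenced_by_batch) {
        batch_flush(b);
        bo->referenced_by_batch = false;
    }
    return bo->storage;
}
```

Without the check in `bo_map`, a render followed by an immediate read would return whatever was in the buffer before the batch was submitted, which is exactly what a conformance test would catch.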
The second is that we're trying to use the 2d blitter for accelerating copies between buffer objects and the screen. That's things like glReadPixels to a PBO, glDrawPixels from a PBO, glCopyTexSubImage, glCopyPixels with read != draw drawable, etc. The trick is that for buffer objects we render them upside down compared to the shared front and back buffers. So when we're blitting between the two we need to invert the data -- set a negative pitch on one of the buffers and a base address somewhere at the other end. The blitter's supposed to be cool with this. Except that there's this comment in the code:
/* Initial y values don't seem to work with negative pitches. If
 * we adjust the offsets manually (below), it seems to work fine.
 */
So we need to not just put the offset somewhere on the other side, but exactly so that y == 0 in the offset we set is the end of the blit area. Combine this with the fact that for tiled buffers the offset has to be 4KB-aligned, and it means that we're probably going to be angering our blitter if you choose unpleasant offsets for the part of the blit that's in the shared front/back/depth.
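The arithmetic behind that constraint can be sketched roughly as follows. This is an illustration under my own assumptions, not the driver's code: `struct flipped_blit` and `flip_blit` are hypothetical names, the x offset is ignored, and the only hardware rules modeled are the two mentioned above (initial y must be 0 with a negative pitch, and tiled offsets must be 4KB-aligned).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TILE_ALIGN 4096u /* tiled buffer offsets must be 4KB-aligned */

/* Hypothetical result type for illustration. */
struct flipped_blit {
    uint32_t offset;   /* byte offset handed to the blitter, used with y == 0 */
    int32_t pitch;     /* negative: each successive row is lower in memory */
    bool ok_for_tiled; /* true if the offset meets the 4KB alignment rule */
};

/* base:   byte offset of the start of the buffer
 * pitch:  positive bytes per row
 * y:      first row of the blit region
 * height: number of rows in the blit region */
static struct flipped_blit flip_blit(uint32_t base, uint32_t pitch,
                                     uint32_t y, uint32_t height)
{
    struct flipped_blit f;

    assert(pitch > 0 && height > 0);
    /* Point at the last row of the region so the blit can start at y == 0
     * and walk backwards through memory. */
    f.offset = base + (y + height - 1) * pitch;
    f.pitch = -(int32_t)pitch;
    f.ok_for_tiled = (f.offset % TILE_ALIGN) == 0;
    return f;
}
```

With a 2048-byte pitch, a two-row blit starting at row 0 lands on offset 2048, which is not 4KB-aligned, while a three-row blit lands on 4096, which is. That dependence on the region's size and position is what makes the offsets "unpleasant" for a tiled buffer.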
I might be able to work around it today by just flipping the buffer that isn't tiled instead, assuming that both aren't tiled. But we want to get to the point of tiling them both. I think the right solution here is to just ditch using the blitter for almost everything. If we figure out how to add meta operations to mesa for these sorts of pixel path operations, we could write generic acceleration for anybody who wanted to use it, by mapping them to normal GL operations using texturing. While it's some CPU overhead for state management, the 3D path is supposed to be faster than 2D GPU-wise, and it would get rid of a bunch of metaops code inside of our driver which has proved to be fragile at best.
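As a rough illustration of why the texturing route is attractive, here is how the y inversion might look if a copy were drawn as a textured quad. This is only a sketch of the general idea, not a proposed mesa interface: `struct vertex` and `build_copy_quad` are hypothetical, and the source buffer is assumed to be bound as a texture with normalized coordinates.

```c
#include <assert.h>

/* Hypothetical vertex layout: window position plus texture coordinates. */
struct vertex {
    float x, y; /* destination position */
    float s, t; /* normalized source texture coordinates */
};

/* Fill quad[0..3] (counter-clockwise from the lower-left corner) so that it
 * covers the destination rectangle and samples the whole source texture.
 * When flip_y is nonzero, the t coordinates are swapped so that the copy
 * comes out vertically inverted. */
static void build_copy_quad(struct vertex quad[4],
                            float x, float y, float w, float h, int flip_y)
{
    float t_bottom = flip_y ? 1.0f : 0.0f;
    float t_top = flip_y ? 0.0f : 1.0f;

    assert(w > 0.0f && h > 0.0f);
    quad[0] = (struct vertex){ x,     y,     0.0f, t_bottom };
    quad[1] = (struct vertex){ x + w, y,     1.0f, t_bottom };
    quad[2] = (struct vertex){ x + w, y + h, 1.0f, t_top };
    quad[3] = (struct vertex){ x,     y + h, 0.0f, t_top };
}
```

In this form the flip is just a swap of two texture coordinates, with no negative pitch and no alignment constraint on a base offset.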
In the next week I'm hoping to work on getting some metaops set up. Beyond PBO, it would also help us for implementing accelerated SGIS_generate_mipmap, which is currently hurting compiz and other apps, and more complete glBitmap which is hurting mesa demos (and we know how important those are).