Monthly Archive for May, 2008

Oleksandr Tymoshenko: bsddev blog SitRep

Long time no blog. Many things have happened during the last 6 months: I got a commit bit, FreeBSD/MIPS reached multiuser mode, and its migration from P4 (Perforce) to CVS has started. Right now we're waiting for the toolchain patches to be properly imported into contrib/binutils. So let's prepare to celebrate a buildable FreeBSD/MIPS world in a couple of weeks! Meanwhile, I'm busy porting the latest zaptel drivers (actually renamed DAHDI now) to FreeBSD, along with a couple of side activities such as digging into the aio code of OpenSolaris, Linux and FreeBSD. Hope to blog more regularly now. Stay tuned.

Henrik Brix Andersen: Life is too short for (cheap) hardware

I recently acquired a Revoltec Alu Book USB mass storage enclosure for a 2.5″ PATA HDD. It is based on the Myson CE8818 chipset and is therefore matched by the following USB quirk in FreeBSD -current (which is wrongly named, as it matches all CE8818-based devices):


        { USB_VENDOR_MYSON,  USB_PRODUCT_MYSON_HEDEN, RID_WILDCARD,
          UMASS_PROTO_SCSI | UMASS_PROTO_BBB,
          NO_INQUIRY | IGNORE_RESIDUE
        },
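To illustrate why an entry named after one product catches every CE8818-based enclosure, here is a simplified, hypothetical model of a quirk-table lookup; it is not umass(4)'s actual code. It assumes RID_WILDCARD means "any device revision", uses the vendor/product IDs that appear in the dmesg output below (0x04cf/0x8818), and the flag values are purely illustrative. Since only the vendor ID, product ID and revision are compared, any enclosure reporting these IDs gets the quirks, whatever brand is printed on the box.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel meaning "any device revision". */
#define RID_WILDCARD 0xffff

/* Illustrative flag values, not the ones umass(4) uses. */
#define NO_INQUIRY     0x01
#define IGNORE_RESIDUE 0x02

/* Hypothetical, simplified quirk-table entry. */
struct quirk {
	uint16_t vendor;	/* idVendor */
	uint16_t product;	/* idProduct */
	uint16_t release;	/* bcdDevice, or RID_WILDCARD */
	uint32_t flags;
};

static const struct quirk quirks[] = {
	/* IDs as reported in dmesg for the Myson CE8818 enclosure. */
	{ 0x04cf, 0x8818, RID_WILDCARD, NO_INQUIRY | IGNORE_RESIDUE },
};

/* Return the flags of the first entry matching the device, or 0 if none does. */
uint32_t
lookup_quirks(uint16_t vendor, uint16_t product, uint16_t release)
{
	for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		const struct quirk *q = &quirks[i];

		if (q->vendor == vendor && q->product == product &&
		    (q->release == RID_WILDCARD || q->release == release))
			return (q->flags);
	}
	return (0);
}
```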

The enclosure worked fine for a while, but then started to fail under FreeBSD during heavy disk activity, spewing the following messages in dmesg:

kernel: umass0:  on uhub4
root: Unknown USB device: vendor 0x04cf product 0x8818 bus uhub4
kernel: da0 at umass-sim0 bus 0 target 0 lun 0
kernel: da0: <  > Removable Direct Access SCSI-2 device
kernel: da0: 40.000MB/s transfers
kernel: da0: 114473MB (234441648 512 byte sectors: 255H 63S/T 14593C)
...
kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 0 8 45 78 6f 0 0 48 0
kernel: (da0:umass-sim0:0:0:0): CAM Status: SCSI Status Error
kernel: (da0:umass-sim0:0:0:0): SCSI Status: Check Condition
kernel: (da0:umass-sim0:0:0:0): ILLEGAL REQUEST asc:20,0
kernel: (da0:umass-sim0:0:0:0): Invalid command operation code
kernel: (da0:umass-sim0:0:0:0): Unretryable error
kernel: g_vfs_done():da0s1a[READ(offset=71050477568, length=36864)]error = 22
kernel: vnode_pager_getpages: I/O read error
kernel: vm_fault: pager read error, pid 27989 (cp)

cp(1), which I used to reproduce the problem, reported "cp: /foo/bar/baz.txt: Bad address", and the destination file was corrupt.

I investigated the possible causes of the failure. A dying disk? A quirky chipset? A change in the FreeBSD umass(4) driver, or something else?

After weeks of debugging this intermittent error, the culprit eventually turned out to be the USB HDD enclosure itself, which was dying.

Close examination of the PCB showed that some of the traces connecting the HDD connector to the chipset had clearly been repaired before the unit shipped, but no coat of varnish had been applied afterwards, which led to corrosion of the PCB over time.

I have just replaced the USB HDD enclosure with a new one (from a different vendor, of course) – and I can no longer reproduce the above problem with the same HDD installed.

Lesson learned: Life is too short for dealing with (cheap) hardware.

Anton Berezin: recoverdisk(1) and the sad, sad story of a bad, bad disk block

Over the last couple of days I had a sad opportunity to use Poul-Henning Kamp’s recoverdisk(1) utility.

Since it turned out to be a life- (well, disk-) saving device, and it is covered by the beer-ware license, I definitely owe phk a beer.

Yesterday morning I discovered that my server was down. After traveling to the server room (there is no remote console) and pretending that my index finger was square in cross-section, I found that there was something fishy with the system disk (which is backed up but not mirrored).

Namely, there were bad blocks, and the periodic scripts that run at night touched some of them, which led to bad things.

Now, the first thing I did was order a replacement disk. The problem was that it would not arrive until the beginning of next week, and I wanted the box up and running now. Even worse, during the weekend I am going to be in Stockholm for the Nordic Perl Workshop (oh, and by the way, I still have a presentation to prepare), and thus won't be able to fix anything that requires my presence on-site.

After I asked around, Phil pointed me toward recoverdisk(1). Thankfully, recoverdisk /dev/ad4 told me exactly how many bad blocks there were (one) and at which offsets.

The next step was to make the on-disk controller remap the block to one of the good reserve sectors on the disk. While I am sure that there are programs that will do just that, I am not aware of any that run on FreeBSD.

Besides, with the offsets in hand, it was trivial to write a simple one-shot program that writes something to the bad block, giving the disk an opportunity to remap the sector by itself.

The only problem I had with this was that I could not open(2) the raw disk for writing while any partitions on it were mounted. I was ready to move the disk to another box to run the program there, but Flemming helpfully told me that setting sysctl kern.geom.debugflags=16 would do the trick. And it did. Thanks, Flemming!
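The one-shot program itself is not part of the post. A minimal sketch of the idea in C follows, assuming 512-byte sectors and an offset/length taken from recoverdisk(1)'s report; the function name zero_range is hypothetical. It overwrites the affected range, widened to whole sectors, with zeros so the drive can reallocate the sector on write. Whatever the sector held is lost, which is one reason to run fsck afterwards. Writing to the raw device while its partitions are mounted still needs kern.geom.debugflags=16, as described above.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 512-byte sectors; check the real sector size of the disk. */
#define SECTOR_SIZE 512

/*
 * Overwrite [off, off + len) on `path`, widened to whole sectors, with
 * zeros. Returns 0 on success, -1 on failure with errno set.
 */
int
zero_range(const char *path, off_t off, off_t len)
{
	char buf[SECTOR_SIZE];
	off_t start = off - off % SECTOR_SIZE;	/* round down to a sector boundary */
	off_t end = off + len;
	int fd, saved;

	memset(buf, 0, sizeof(buf));
	if ((fd = open(path, O_WRONLY)) == -1)
		return (-1);
	for (off_t pos = start; pos < end; pos += SECTOR_SIZE) {
		ssize_t n = pwrite(fd, buf, sizeof(buf), pos);

		if (n != (ssize_t)sizeof(buf)) {
			/* A short write does not set errno, so report EIO for it. */
			saved = (n == -1) ? errno : EIO;
			close(fd);
			errno = saved;
			return (-1);
		}
	}
	/* Make sure the write actually reaches the disk. */
	if (fsync(fd) == -1) {
		saved = errno;
		close(fd);
		errno = saved;
		return (-1);
	}
	return (close(fd));
}
```

For example, zero_range("/dev/ad4", offset, length) with the values recoverdisk printed; running recoverdisk again afterwards shows whether the block became readable.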

After this, four more steps remained:

  • run recoverdisk again to make sure everything was fine;

  • run fsck on all partitions;

  • put the box online;

  • move the system to the new, shiny disk when it finally arrives.

While this last step will have to wait a bit more, the fact that you are reading this shows that everything else worked.

I do realize that the primary purpose of recoverdisk(1) is to salvage data from media that has gone hopelessly bad. Nevertheless, I think my example shows that it is pretty darn useful in other cases as well.

Several people helped me along the way. Besides those already mentioned, Lars did some heavy lifting, and Kristoffer did the network magic with subnet routing when the box was moved to another, closer location.

Lessons learned:

  • remote console is useful;

  • mirroring the system disk is essential.

Open question: what does phk do with all the beers he is getting when I know for a fact that he does not drink much?

Eric Anholt: On the Rain-Slick Precipice of Darkness and Mesa

I was very excited for the release of On the Rain-Slick Precipice of Darkness. Unfortunately, it appears to be broken on open-source graphics drivers. After a bit of debugging with some folks on the forums, it looks like the game is testing for OpenGL extensions before creating a GL context. Since we only load the driver at the time someone makes a direct context, we have no idea what to return for the extension list, and just return NULL. The game appears to take that answer and get all indignant that we don't support things like ARB_multitexture.
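The failing check can be illustrated with a small C helper. With GLX, glGetString(GL_EXTENSIONS) is only meaningful once a context is current, so an application should create and bind a context first and only then inspect the extension list. The hypothetical gl_has_extension below also shows a defensive variant: it treats a NULL string as "unknown" rather than "unsupported", and it matches whole space-separated tokens, since a plain strstr() would also accept an extension whose name merely starts with the one being looked up. It is a sketch, not code from the game or from Mesa.

```c
#include <string.h>

/*
 * Look for `name` as a whole space-separated token in a GL extension
 * string. Returns 1 if present, 0 if absent, and -1 if `exts` is NULL,
 * meaning "no current context, answer unknown" rather than "unsupported".
 */
int
gl_has_extension(const char *exts, const char *name)
{
	size_t len = strlen(name);
	const char *p;

	if (exts == NULL)
		return (-1);
	if (len == 0)
		return (0);
	for (p = exts; (p = strstr(p, name)) != NULL; p += len) {
		/* Require token boundaries so "GL_EXT_foo" does not match "GL_EXT_foobar". */
		int starts = (p == exts || p[-1] == ' ');
		int ends = (p[len] == ' ' || p[len] == '\0');

		if (starts && ends)
			return (1);
	}
	return (0);
}
```

In an application this would be called only after glXMakeCurrent() succeeds, e.g. gl_has_extension((const char *)glGetString(GL_EXTENSIONS), "GL_ARB_multitexture").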

They're looking into it, and hopefully we'll have a playable game soon.

Eric Anholt: freedesktop.org mess

EDIT: daniels (the guy doing all the work) posted a good summary of what happened to fd.o to announce@, so I'll quote that:

Hi,
Due to the recent Debian OpenSSL trainwreck[0], we've had to do a fair
bit of housecleaning with regards to authentication.

Firstly, the host keys have been regenerated, as below:
root@fruit:~% ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key
2048 1e:81:13:df:b9:68:fc:c2:ec:9d:c3:87:d1:5e:30:77 /etc/ssh/ssh_host_rsa_key.pub
root@gabe:~% ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key
2048 c1:1a:8a:e5:99:ce:5a:d9:a9:e2:b3:95:67:95:9d:f7 /etc/ssh/ssh_host_rsa_key.pub
root@kemper:~% ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key
2048 95:b5:28:3d:9b:37:55:d4:fc:3d:99:b4:06:9d:9b:5f /etc/ssh/ssh_host_rsa_key.pub
root@annarchy:~% ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key
2048 32:3e:0c:df:0a:c8:a6:33:72:9c:6c:ba:68:58:d2:30 /etc/ssh/ssh_host_rsa_key.pub

You'll note that these are RSA-only. DSA is no longer supported, nor is
SSH1.

Secondly, all vulnerable keys (weak RSA keys, RSA1 keys, and DSA keys)
have been removed; anyone who had a vulnerable key will have received an
email from myself at whichever address you had in LDAP, explaining what
happened, and how to fix it[1].

annarchy.fd.o (hosting bugs.fd.o, www.x.org, and others) is still having
major issues, thanks to the Moin 1.6 upgrade being unbelievably painful;
thanks very much to Benjamin Close for somehow dealing with this
godawful upgrade, which is running its load average up to 116, and using
up to 7GB of RAM just to convert a wiki from Moin 1.5 to 1.6.

The snakeoil cert from bugs.fd.o is still vulnerable, and feel free to
distrust it just as much as any other snakeoil cert. We'll be getting a
real cert from CAcert[2] soonish, but regenerating our snakeoil in the
meantime.

Thanks for bearing with us; if it's any consolation, it's not been the
best week for admins.

Cheers,
Daniel

[0]: http://lists.debian.org/debian-security-announce/2008/msg00152.html
[1]: http://www.freedesktop.org/wiki/AccountMaintenance
[2]: http://www.cacert.org -- add its certs to your browser if they
aren't there, and don't forget to let your distribution and/or
browser vendor know.

Gabor Kovesdan: Hungarian Handbook

Finally, it’s ready! It was committed to CVS by pgj@. You can read it here.

Gabor Kovesdan: SoC 2008: Porting BSD-licensed Text-Processing Tools from OpenBSD

This year, I’m working on porting grep, sort and diff from OpenBSD. You can read more about my project in the original proposal. If you want to follow the progress, you can look at my wiki page, although I’m going to post the most important milestones here.