Monthly Archive for May, 2008
I recently acquired a Revoltec Alu Book USB mass storage enclosure for a 2.5″ PATA HDD. It is based on the Myson CE8818 chipset and is therefore matched by the following USB quirk in FreeBSD-current (the entry is misnamed, since it matches all CE8818-based devices):
    { USB_VENDOR_MYSON, USB_PRODUCT_MYSON_HEDEN, RID_WILDCARD,
      UMASS_PROTO_SCSI | UMASS_PROTO_BBB,
      NO_INQUIRY | IGNORE_RESIDUE
    },
The enclosure worked fine for a while, but then started to fail under FreeBSD with heavy disk activity, spewing the following messages in dmesg:
kernel: umass0:
cp(1), which I used for reproducing the problem, said cp: /foo/bar/baz.txt: Bad address and the destination file was corrupt.
I investigated the possible causes of the failure. A dying disk? A quirky chipset? A change in the FreeBSD umass(4) driver, or something else?
After weeks spent debugging this intermittent error, the culprit eventually turned out to be a dying USB HDD enclosure.
Close examination of the PCB showed that some of the lines connecting the HDD connector to the chipset had clearly been repaired before shipping this unit, but no coat of varnish had been given afterwards – leading to corrosion of the PCB over time.
I have just replaced the USB HDD enclosure with a new one (from a different vendor, of course) – and I can no longer reproduce the above problem with the same HDD installed.
Lesson learned: Life is too short for dealing with (cheap) hardware.
Over the last couple of days I had a sad opportunity to use Poul-Henning Kamp’s recoverdisk(1) utility.
Since it turned out to be a life- (well, disk-) saving device, and it is covered by the beer-ware license, I definitely owe phk a beer.
Yesterday morning I discovered that my server was down. After traveling to the server room (there is no remote console) and pretending that my index finger is square in its cross-section, I found that there was something fishy with the system disk (which is backed up but not mirrored).
Namely, there are bad blocks, and periodic scripts that run at night touch some of them, which leads to bad things.
Now, the first thing I did was to order a replacement disk. The problem was that it would not arrive until the beginning of next week, and I wanted the box up and running now. Even worse, during the weekend I am going to be in Stockholm for the Nordic Perl Workshop (oh, and by the way I still have a presentation to prepare), and thus won’t be able to fix things that require my presence on-site.
After asking around, I got pointed in the direction of recoverdisk(1) by Phil.
Thankfully, recoverdisk /dev/ad4 told me exactly how many bad blocks there are (one) and what offsets they are at.
The next step was to make the on-disk controller remap the block to one of the good reserve sectors on the disk. While I am sure that there are programs that will do just that, I am not aware of any that run on FreeBSD.
Besides, having the offsets, it was a trivial task to quickly create a simple one-shot program that writes something to the bad block, so that the disk will have an opportunity to remap the sector all by itself.
The only problem I had with this was that I could not open(2) the raw disk for writing while any partitions were mounted on it. I was ready to move the disk to another box to run the program there, but Flemming helpfully told me that doing sysctl kern.geom.debugflags=16 would do the trick. And it did. Thanks, Flemming!
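For illustration, here is a minimal sketch of what such a one-shot program could look like. It is not the program I actually ran. It assumes that the offset is a byte offset, that sectors are 512 bytes long, and that zeros are an acceptable thing to write; the function name overwrite_sector is made up for this example.

```python
import os

SECTOR_SIZE = 512  # assumption: the disk uses 512-byte sectors


def overwrite_sector(path, offset, sector_size=SECTOR_SIZE):
    """Write one sector of zeros at byte `offset` of `path`.

    Whatever was stored in that sector is lost; the point is only to
    give the drive a chance to remap the sector on write.
    """
    if offset % sector_size != 0:
        raise ValueError("offset must be sector-aligned")
    # No O_CREAT: the device (or file) must already exist.
    fd = os.open(path, os.O_WRONLY)
    try:
        written = os.pwrite(fd, b"\0" * sector_size, offset)
        os.fsync(fd)  # push the write out instead of leaving it cached
    finally:
        os.close(fd)
    return written
```

On the real disk this would be called with the device path and the offset reported by recoverdisk, after setting the sysctl mentioned above. Trying it on a scratch file first is a cheap way to make sure the offset arithmetic is right.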
After this there are four more steps: run recoverdisk again to make sure everything’s fine, run fsck on all partitions, put the box online, and move the system to the new and shiny disk when it finally arrives.
While this last step will have to wait a bit more, the fact that you are reading this shows that everything else worked.
I do realize that the primary purpose of recoverdisk(1) is to salvage the data from media that has gone hopelessly bad. Nevertheless, I think that my example shows that it is pretty darn useful in other cases as well.
Several people helped me along the way. I have not yet mentioned Lars, who did some heavy lifting, and Kristoffer, who did the network magic with subnet routing when the box was moved to another, closer location.
Lessons learned:
remote console is useful;
mirroring the system disk is essential.
Open question: what does phk do with all the beers he is getting when I know for a fact that he does not drink much?
They're looking into it, and hopefully we'll have a playable game soon.
Hi,
Due to the recent Debian OpenSSL trainwreck[0], we've had to do a fair
bit of housecleaning with regards to authentication.
Firstly, the host keys have been regenerated, as below:
root@fruit:~% ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key
2048 1e:81:13:df:b9:68:fc:c2:ec:9d:c3:87:d1:5e:30:77 /etc/ssh/ssh_host_rsa_key.pub
root@gabe:~% ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key
2048 c1:1a:8a:e5:99:ce:5a:d9:a9:e2:b3:95:67:95:9d:f7 /etc/ssh/ssh_host_rsa_key.pub
root@kemper:~% ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key
2048 95:b5:28:3d:9b:37:55:d4:fc:3d:99:b4:06:9d:9b:5f /etc/ssh/ssh_host_rsa_key.pub
root@annarchy:~% ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key
2048 32:3e:0c:df:0a:c8:a6:33:72:9c:6c:ba:68:58:d2:30 /etc/ssh/ssh_host_rsa_key.pub
You'll note that these are RSA-only. DSA is no longer supported, nor is
SSH1.
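
As an aside, for anyone who wants to check a host key they have fetched
against the fingerprints above: the colon-separated value printed by
ssh-keygen -l here is the MD5 digest of the base64-decoded key blob. A
minimal sketch follows; the function name md5_fingerprint is made up for
this example, and it assumes a one-line OpenSSH public key of the form
"type base64-blob [comment]".

```python
import base64
import hashlib


def md5_fingerprint(pubkey_line):
    """Return the colon-separated MD5 fingerprint of an OpenSSH public key line."""
    fields = pubkey_line.split()
    # fields[0] is the key type, fields[1] the base64-encoded key blob
    blob = base64.b64decode(fields[1])
    digest = hashlib.md5(blob).hexdigest()
    # Group the 32 hex digits into 16 colon-separated pairs
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

Feeding it the contents of a host's ssh_host_rsa_key.pub should give a
string in the same format as the fingerprints listed above, which can then
be compared by eye or with a plain string comparison.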
Secondly, all vulnerable keys (weak RSA keys, RSA1 keys, and DSA keys)
have been removed; anyone who had a vulnerable key will have received an
email from me at whichever address they had in LDAP, explaining what
happened, and how to fix it[1].
annarchy.fd.o (hosting bugs.fd.o, www.x.org, and others) is still having
major issues, thanks to the Moin 1.6 upgrade being unbelievably painful;
thanks very much to Benjamin Close for somehow dealing with this
godawful upgrade, which is running its load average up to 116, and using
up to 7GB of RAM just to convert a wiki from Moin 1.5 to 1.6.
The snakeoil cert from bugs.fd.o is still vulnerable, and feel free to
distrust it just as much as any other snakeoil cert. We'll be getting a
real cert from CAcert[2] soonish, but regenerating our snakeoil in the
meantime.
Thanks for bearing with us; if it's any consolation, it's not been the
best week for admins.
Cheers,
Daniel
[0]: http://lists.debian.org/debian-security-announce/2008/msg00152.html
[1]: http://www.freedesktop.org/wiki/AccountMaintenance
[2]: http://www.cacert.org -- add its certs to your browser if they
aren't there, and don't forget to let your distribution and/or
browser vendor know.
This year, I’m working on porting grep, sort and diff from OpenBSD. You can read more about my project in the original proposal. If you want to see the progress, you can look at my wiki page, although I’m going to post the most important milestones here.