Category Archives: en

The return of the FreeBSD desktop

I have a confession to make: I haven't used FreeBSD as a desktop OS for years. The reason is twofold:

  1. Since 2005, my work has required me to run Linux (Debian and Ubuntu at Linpro, RedHat at the University of Oslo) and, briefly, Windows at Kongsberg Maritime. I eventually stopped using stationary computers, resorting instead to a (company-provided) laptop running either Ubuntu, or Windows with Ubuntu in VirtualBox.
  2. More importantly, around the time I started at Linpro, it became increasingly difficult to maintain a FreeBSD desktop. The modularization of X.org and the increasing complexity of desktop environments mean that the number of packages required for a complete desktop system has grown from a bit over 100 to well over 600 (in addition to the kernel and base operating system, which is monolithic in FreeBSD). The FreeBSD ports system does not scale well, and the lack of a proper binary update procedure makes it almost impossible to keep that many packages up-to-date.

This is about to change. Thanks to what may very well be the most important innovations in recent FreeBSD history, I am now once again running a FreeBSD desktop.

I am referring, of course, to PKGNG and its companion, Poudrière.

(Actually, I've been running PC-BSD for a while, but I'm not a big fan of PBIs, and they don't really address the updating problem. This is about building a FreeBSD desktop from scratch, and keeping it up-to-date.)

Here is the procedure I followed:

I created a new VirtualBox VM with a 128 GB disk—this may not seem like much, but most of my storage requirements are met by the host system, which has two 2 TB disks, and an older file server with over 2 TB of storage, including ~40 GB of source code.

Instead of the default NAT configuration, I set up a bridged network interface with promiscuous mode enabled; I run my own DNS and DHCP servers (on a soekris, of course), so within my home network, the VM has its own fixed IP address and DNS name.

I then installed FreeBSD 9.0-RELEASE amd64 from the disc1 ISO. I did not select any packages, nor did I install the ports tree. However, as soon as the system booted, I downloaded and extracted an up-to-date ports tree using portsnap and built pkg and poudriere.

I had initially planned to rely entirely on the pkgbeta repository, but there are two problems with this: firstly, all the packages there are built with the default options, which in many cases make no sense at all; and secondly and most importantly, it did not at that time have a full set of Gnome 2 packages.

I therefore set up my own Poudrière. I ran a first build with only ports-mgmt/pkg and ports-mgmt/poudriere, then upgraded those two packages and started over with a slightly larger set of packages, including mail/postfix, security/sudo, shells/zsh and other essentials which don't take long to build. Finally, I added emulators/virtualbox-ose-additions, x11/xorg, x11/gnome2, and a number of desktop applications (e.g. Emacs) and development tools (e.g. Subversion) which I use. Then it was just a matter of

% sudo pkg install x11/xorg x11/gnome2 editors/emacs

Whenever I need an additional package, I add it to the package list and re-run Poudrière. I don't really need to do that—I could just as easily build it straight from the ports tree—but it ensures that I get a “clean” package that I can safely use on another VM or machine, should I ever need to, and that it is rebuilt when updated, along with all the other packages I use:

% sudo poudriere ports -u
% sudo poudriere bulk -f ~/poudriere-packages -j 83amd64 -k
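
The package list passed with -f is a plain text file with one port origin (category/name) per line. Here is a minimal sketch of a helper that adds a port to such a list; add_port is a hypothetical name, not part of Poudrière, and the list path is whatever you pass to -f.

```shell
#!/bin/sh
# Append a port origin (category/name) to a Poudriere package list,
# unless the list already contains exactly that line.
# add_port is a hypothetical helper, not a Poudriere command.
add_port() {
    pkg_list=$1
    pkg_origin=$2
    touch "$pkg_list"
    # -x: match the whole line, -F: treat the origin as a fixed string
    if ! grep -qxF "$pkg_origin" "$pkg_list"; then
        printf '%s\n' "$pkg_origin" >> "$pkg_list"
    fi
}
```

After something like add_port ~/poudriere-packages www/firefox, re-running the two poudriere commands builds the new package along with any pending updates.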

I had some minor configuration trouble. First, X.org's autoconfiguration feature still doesn't work very well, at least in VirtualBox, so I used the xorg.conf from a previous post, with only minor modifications:

Section "InputDevice"
        Identifier      "Generic Keyboard"
        Driver          "kbd"
        Option          "XkbRules"      "xorg"
        Option          "XkbModel"      "pc105"
        Option          "XkbLayout"     "us"
EndSection

Section "InputDevice"
        Identifier      "VBox Mouse"
        Driver          "vboxmouse"
        Option          "CorePointer"
EndSection

Section "Device"
        Identifier      "VBox Video"
        Driver          "vboxvideo"
EndSection

Section "Monitor"
        Identifier      "VBox Monitor"
EndSection

Section "Screen"
        Identifier      "Default Screen"
        Device          "VBox Video"
        Monitor         "VBox Monitor"
EndSection

Section "ServerLayout"
        Identifier      "Default Layout"
        Screen          "Default Screen"
        InputDevice     "Generic Keyboard"
        InputDevice     "VBox Mouse"
EndSection

With the VirtualBox guest additions installed and this xorg.conf in place, everything works beautifully—mouse integration, clipboard integration and dynamic desktop resizing.

The second issue I had was with GDM's greeter, which did not display my user in the user list and would not let me type in my user name. The first part is understandable, as it only displays users who have logged in at least once already. The second is more surprising, but I did not have the energy to try to figure out why. Instead, I worked around it by disabling the user list:

% sudo -u gdm gconftool-2 --type bool \
    --set /apps/gdm/simple-greeter/disable_user_list true

Shared folders are still not implemented for FreeBSD guests, but I rarely need to transfer files between the host and the guest; when I do, I use FileZilla to SFTP them over, and there is always SMB for more complex use.

If I were running on real hardware or a weaker host (this is a four-core i7 with 16 GB RAM), I might use a subset of PC-BSD's tricked-out sysctl.conf, but I don't need audio and I haven't experienced any interactivity issues yet.

That's it—try it yourself, and share your experience below!

On testing

Last fall, I wrote a completely new configuration parser for OpenPAM Lycopsida. Although the new parser was far more robust than the one it replaced, it was large, unwieldy, and suffered from a number of issues relating to whitespace handling, which stemmed from reusing some old code which unfortunately was thoroughly documented and therefore could not be easily modified. So I decided to rewrite it again, from scratch this time.

Then I did what I should have done last fall but didn't: I wrote some unit tests. And of the first dozen or so tests I came up with, three failed, revealing two different bugs—one of them fairly serious.

There's a lesson in here somewhere...

Downtime

I haven't been able to read email sent to [email protected] or [email protected] for five days, due to a series of unfortunate incidents involving dodgy power supplies and the fragility of ZFS boot in FreeBSD. Work and other duties prevented me from addressing the issue in a more timely manner, but I am now regaining control. Luckily, neither my ~30 GB IMAP spool nor any other data was lost, nor did my backup MX bounce any mail. My IMAP server is now back up with a small UFS SU+J boot / root partition instead of ZFS. I am still unable to read email, but that should be fixed within 24 hours.

I also uncovered an annoying but luckily not fatal bug in the Cyrus IMAP server. When TLS is configured, the IMAP daemon stores state for each TLS session in a DB file. If that file is corrupted, the server will start, but it will refuse any incoming IMAP or LMTP connections, and will instead spit out a stream of completely unhelpful error messages. The only recourse is to delete the TLS session state database; I set up an rc script to do that at boot time, so hopefully this won't bite me again.
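
A minimal sketch of such a cleanup step follows. The default path assumes that Cyrus keeps its session cache as tls_sessions.db under a configdirectory of /var/imap, which may not match your installation; the function name is made up, and a real rc.d script would wrap this in the usual rc.subr boilerplate and run it before the IMAP server starts.

```shell
#!/bin/sh
# Delete the TLS session cache if it exists, so that a corrupted cache
# cannot keep the server from accepting connections.
# The default path is an assumption; check configdirectory in imapd.conf.
purge_tls_sessions() {
    tls_db=${1:-/var/imap/tls_sessions.db}
    # Nothing to do if the file is not there
    [ -f "$tls_db" ] || return 0
    rm -f "$tls_db" && echo "removed stale TLS session cache $tls_db"
}
```

Losing this file should be harmless, since it only holds state used to resume TLS sessions; clients simply negotiate a new session.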

ZFS-to-ZFS backups

ZFS has a couple of very useful functions, zfs send and zfs receive, which allow you to serialize a complete ZFS dataset and recreate it in a different location. They can also be used to serialize a delta between two snapshots and apply that delta to a previously created copy of the dataset. You see where I'm going with this... That's right, incremental backups of a ZFS dataset or even an entire pool to a different ZFS dataset or pool.

Why would you want to perform incremental ZFS-to-ZFS backups instead of just adding redundancy to the pool, or cloning a snapshot? Because—provided the ZFS pool and filesystem versions match—it allows you to duplicate your dataset or pool on removable media (which you can store off-site), or even on a different machine across the network. This technique is far more efficient than rsync, because there is no need to compare the source and destination: ZFS already knows exactly what has changed. It also preserves the filesystem hierarchy and dataset properties.

In my case, I need to duplicate a pool onto removable media because I am replacing a server that only takes PATA disks with another that only takes SATA disks, which precludes just moving the disks over and progressively replacing them with new ones. Using this technique, when the time comes, I can slide the new server into the rack, hook up the backup disk, and restore just the parts I want to keep.

Of course, like a good little hacker, I wrote a script, which you can find here, to automate this.

The script takes two arguments: the source dataset and the destination dataset. Either of these can be the root of a ZFS pool or a dataset within a pool; they can even be datasets within the same pool, provided they do not overlap. The script selects the latest snapshot of the destination dataset (it uses a naming scheme which ensures that lexical order corresponds to chronological order), verifies that the source dataset has a snapshot with the same name, takes a new snapshot of the source dataset, and streams the difference between the old and new snapshots from the source dataset to the destination dataset. Finally, it deletes the old snapshot to allow ZFS to reclaim the space occupied by old data.

You can use this script with multiple backup disks, since it will only delete the snapshot that was actually used for the current disk. If you have one disk for each day of the week, for instance, it will delete last Monday's snapshot once it has completed this Monday's backup, but leave the other six in place. Likewise, if you decide to keep Sunday's disk for a month instead of reusing it next Sunday, the script will leave the snapshot in place until you run it again with the same disk.

The script does not currently support over-the-network backups, but it should be fairly easy to implement.
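
The steps described above can be sketched roughly as follows. This is an illustration, not the script itself: the snapshot naming scheme (backup- followed by a UTC timestamp), the function names, and the assumption that the destination holds only backup snapshots are all hypothetical. It also assumes that an initial full copy already exists on the destination.

```shell
#!/bin/sh
# Read snapshot names (dataset@snapname) on stdin and print the snapname
# that sorts last; with timestamped names, that is the most recent one.
latest_snapshot() {
    sed 's/^.*@//' | LC_ALL=C sort | tail -n 1
}

# Incremental backup of dataset $1 to dataset $2 (hypothetical sketch).
zfs_backup() {
    src=$1
    dst=$2
    # Most recent snapshot already present on the destination
    old=$(zfs list -H -d 1 -t snapshot -o name "$dst" | latest_snapshot)
    [ -n "$old" ] || { echo "no snapshot found on $dst" >&2; return 1; }
    # The source must still have a snapshot with the same name
    zfs list -H -t snapshot -o name "$src@$old" >/dev/null 2>&1 || {
        echo "$src@$old is missing" >&2; return 1; }
    new=backup-$(date -u +%Y%m%d%H%M%S)
    zfs snapshot "$src@$new" || return 1
    # Stream only the delta between the old and the new snapshot
    zfs send -i "$src@$old" "$src@$new" | zfs receive -F "$dst" || return 1
    # Drop the old source snapshot so ZFS can reclaim the space
    zfs destroy "$src@$old"
}
```

Because only the snapshot shared with the current destination is destroyed, snapshots that other backup disks still depend on are left alone.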

More Advanced Format drives: Samsung SpinPoint F4 EcoGreen and Seagate Barracuda Green

I've acquired a couple more 2 TB Advanced Format drives: a Seagate Barracuda Green (ST2000DL003) and a Samsung SpinPoint F4 EcoGreen (HD204UI, no data sheet available online).

I was extremely impressed with the Samsung HD204UI. It's the first AF drive I've seen with decent performance. In fact, it's the fastest disk I've tested so far—its unaligned writes are faster than the non-AF Hitachi I used as a reference last time, and its aligned writes are twice as fast.

   count    size  offset    step        msec     tps    kBps

  131072    1024       0    4096       43984    2979    2979
  131072    1024     512    4096      127047    1031    1031

   65536    2048       0    8192       14764    4438    8877
   65536    2048     512    8192       12453    5262   10524
   65536    2048    1024    8192       12460    5259   10518

   32768    4096       0   16384        4609    7109   28436
   32768    4096     512   16384        7829    4185   16740
   32768    4096    1024   16384        8413    3894   15579
   32768    4096    2048   16384        8211    3990   15961

   16384    8192       0   32768        3952    4145   33165
   16384    8192     512   32768        9050    1810   14481
   16384    8192    1024   32768        9317    1758   14067
   16384    8192    2048   32768        9315    1758   14069
   16384    8192    4096   32768        3996    4099   32793
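
For anyone reading these tables for the first time, the last two columns appear to follow from the first and fifth: count writes of size bytes each took msec milliseconds in total. This reading is inferred from the numbers rather than from any documentation, and the function name below is made up.

```shell
#!/bin/sh
# Print "tps kBps" for one table row, given count, size (bytes) and msec.
# Uses truncating integer arithmetic; the interpretation of the columns
# is an assumption inferred from the published numbers.
rates() {
    r_count=$1 r_size=$2 r_msec=$3
    r_tps=$(( r_count * 1000 / r_msec ))
    r_kbps=$(( r_count * r_size / 1024 * 1000 / r_msec ))
    echo "$r_tps $r_kbps"
}
```

For example, rates 131072 1024 43984 prints 2979 2979, which matches the first row. Some other rows come out a few units off, presumably because the msec column is itself rounded.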

The Seagate ST2000DL003, on the other hand, is so slow it's not even funny. It's actually the slowest of all the drives I've tested: its performance on aligned random writes is half that of the Western Digital WD20EARS. It's three times as fast on unaligned writes, but three times nothing (100 kBps) is still nothing (300 kBps) compared to the Samsung HD204UI (15 MBps). Here are the numbers:

   count    size  offset    step        msec     tps    kBps

  131072    1024       0    4096     2419280      54      54
  131072    1024     512    4096     2199286      59      59

   65536    2048       0    8192     1283667      51     102
   65536    2048     512    8192      985184      66     133
   65536    2048    1024    8192      995423      65     131

   32768    4096       0   16384       45980     712    2850
   32768    4096     512   16384      345291      94     379
   32768    4096    1024   16384      432533      75     303
   32768    4096    2048   16384      429781      76     304

   16384    8192       0   32768       34192     479    3833
   16384    8192     512   32768      166440      98     787
   16384    8192    1024   32768      210147      77     623
   16384    8192    2048   32768      207356      79     632
   16384    8192    4096   32768       34221     478    3830

This time, I also ran sequential write tests: basically, dding eight gigabytes' worth of zeroes to the disk in 128 kB blocks, which is the optimal I/O size for FreeBSD. Here the two drives are much closer: the Samsung HD204UI gets slightly less than 90 MBps, and the Seagate ST2000DL003 gets slightly less than 80 MBps.
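
A sequential test of this kind can be sketched as below, assuming that "eight gigabytes" means 8 × 1024 × 1024 kB, which works out to 65536 blocks of 128 kB. The function names are made up and the device is whatever the caller passes in; the dd command overwrites the start of that device, so it should only ever be pointed at a scratch disk.

```shell
#!/bin/sh
# Number of blocks needed to write $1 kB in blocks of $2 kB.
dd_block_count() {
    echo $(( $1 / $2 ))
}

# Sequential write test: DESTROYS data on the device given as $1.
# dd reports the elapsed time and throughput when it finishes.
seq_write_test() {
    sw_dev=$1
    sw_count=$(dd_block_count $(( 8 * 1024 * 1024 )) 128)
    dd if=/dev/zero of="$sw_dev" bs=128k count="$sw_count"
}
```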

OpenBSD IPSec backdoor allegations: update

I'm sure I don't need to remind anyone what this is about...

The latest news: Theo now says that it is probable that NetSec was indeed contracted to insert backdoor code into OpenBSD, but after a month of review and changelog archeology, there is still no sign that they succeeded or even attempted to push tainted code into the tree.

The audit (which is still ongoing) did uncover one serious bug, but there is no reason to believe that it was planted deliberately. This relates to CBC mode, a block cipher mode of operation in which each block of plaintext is combined with the ciphertext of the previous block before encryption to make it harder to attack ciphertext blocks individually.

If I understand Theo's message correctly,

  • It used to be common practice to use the last ciphertext block from one message as IV for the next message. This seemed like a good idea at the time, because the alternative is to generate a random IV for each new message, which requires a strong, fast PRNG, and strong, fast PRNGs didn't grow on trees back when this scheme was devised. By reusing the last ciphertext block from the previous message, a costly random IV was only required for the very first message.
  • This practice was discovered to be a bad idea because in n - 1 out of n cases (where n is the block size in bytes), the last plaintext block of any message encrypted with a block cipher contains somewhat predictable padding.
  • The flawed IV logic was replicated in several parts of the OpenBSD source tree, and the fix was implemented in some of them, but not all.
  • The person who implemented this flawed logic was at that time a NetSec employee, but he had been involved in the development of OpenBSD's IPSec stack for years before he was hired, and, as previously mentioned, he was only following common practice.
  • The same person implemented the obvious fix (generating a new, random IV for every message) once the attack was discovered.
  • The person responsible for those parts of the tree in which the fix was not implemented is one of the people fingered by Perry, but his tenure started after Perry had left and ended before the attack was discovered.
  • Anyone with any amount of experience in a large F/OSS project, or any large software development effort for that matter, can tell you that this kind of oversight is the rule rather than the exception. Although there is no evidence that he did not intentionally “forget” to fix his code, it is far more likely that he simply did not realize that the fix that had already been committed did not extend to his own code, or that he wasn't paying attention, and nobody else noticed.
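
The difference between the two IV schemes can be sketched with the openssl command-line tool. This is purely illustrative: it uses AES-128-CBC (16-byte blocks) on plain files, has nothing to do with the actual OpenBSD code, and all function and file names are made up.

```shell
#!/bin/sh
# Hex-encode the last 16 bytes (one AES block) of a file.
last_block_hex() {
    tail -c 16 "$1" | od -An -tx1 | tr -d ' \n'
}

# Flawed scheme: the IV for the next message is the last ciphertext
# block of the previous one, which an eavesdropper has already seen.
chained_iv() {
    last_block_hex "$1"
}

# Fix: a fresh, unpredictable IV for every message.
random_iv() {
    openssl rand -hex 16
}

# Encrypt file $3 into $4 using hex key $1 and hex IV $2.
cbc_encrypt() {
    openssl enc -aes-128-cbc -K "$1" -iv "$2" -in "$3" -out "$4"
}
```

With the flawed scheme, the second message would be encrypted as cbc_encrypt "$key" "$(chained_iv msg1.enc)" msg2.txt msg2.enc, so the IV is known before the message is sent. With the fix, "$(random_iv)" takes its place and the IV has to travel with the message.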

My bounty still stands, and I will even relax the requirements a bit: you are not required to show that OpenBSD is still exploitable, only that it was exploitable on December 11, 2010 (the date of Perry's email to Theo).

4k drive update

Just to let you know what the current status is wrt. 4k drives:

It looks like the consensus in the industry (meaning everyone except Western Digital) is to announce dual sector sizes, i.e. 512-byte logical sectors on top of 4096-byte physical sectors.

Ivan Voras has taken the initiative to organize a 4k BoF at BSDCan, although judging from the (private) email exchange on the subject, it's quite possible that a decision will be made before then. Currently, it looks like we're moving towards having the low-level driver report a 512-byte sector size and 4096-byte stripe width (and, if necessary, an appropriate offset) to GEOM. This preserves backward compatibility, but announces to GEOM consumers that it is a good idea to do I/O in 4096-byte blocks and align data structures on 4096-byte boundaries. All that remains is then to make sure that those GEOM consumers we care about (particularly ZFS) take advantage of this information.
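
To make "align data structures on 4096-byte boundaries" concrete for a drive with 512-byte logical sectors, here is a small sketch that rounds a starting sector up to the next aligned one. The function name and its defaults (4096-byte physical, 512-byte logical sectors) are illustrative assumptions.

```shell
#!/bin/sh
# Round logical block address $1 up to the next multiple of the physical
# sector size. $2 = physical sector size in bytes (default 4096),
# $3 = logical sector size in bytes (default 512).
align_up() {
    a_lba=$1
    a_per=$(( ${2:-4096} / ${3:-512} ))   # logical sectors per physical sector
    echo $(( (a_lba + a_per - 1) / a_per * a_per ))
}
```

For instance, align_up 63 prints 64: a partition starting at the traditional sector 63 sits at byte 32256, which is not a multiple of 4096, while sector 64 is.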

The situation for WD “Advanced Format” drives is a bit more complex, because they announce only 512-byte sectors and give no hint of the 4096-byte physical sector size. The only solution I can see is to add a quirk system to the ada driver (and possibly to ata as well, if we still care about it) similar to the ones we have for SCSI and USB devices, and match the model number. I believe /WD\d+[A-Z]+RS/ should match all existing Advanced Format drives with no false positives.
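
The pattern is easy to try out against the model numbers mentioned in these posts. The sketch below uses grep -E, which has no \d, so the digit class is spelled [0-9]; the function name is made up, and this only shows how the pattern behaves, not how a driver quirk table would be written.

```shell
#!/bin/sh
# Succeed if the model string $1 matches the proposed Advanced Format
# pattern, written as a POSIX extended regular expression.
is_wd_advanced_format() {
    printf '%s\n' "$1" | grep -Eq 'WD[0-9]+[A-Z]+RS'
}
```

WD10EARS and WD20EARS match, while WD20EADS, HDS722020ALA330, ST2000DL003 and HD204UI do not. The match is unanchored, so a longer identification string that contains the model number, such as "WDC WD20EARS-00MVWB0", matches as well.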

OpenBSD IPSec backdoor allegations: triple $100 bounty

In case you hadn't heard: Gregory Perry alleges that the FBI paid OpenBSD contributors to insert backdoors into OpenBSD's IPSec stack, with his (Perry's) knowledge and collaboration.

If that were true, it would also be a concern for FreeBSD, since some of our IPSec code comes from OpenBSD.

I'm having a hard time swallowing this story, though. In fact, I think it's preposterous. Rather than go into further detail, I'll refer you to Jason Dixon's summary, which links to other opinions, and add only one additional objection: if this were true, there would be no “recently expired NDA”; it would be a matter of national security.

I'll put my money where my mouth is, and post a triple bounty:

  1. I pledge USD 100 to the first person to present convincing evidence showing:

    • that the OpenBSD Crypto Framework contains vulnerabilities which can be exploited by an eavesdropper to recover plaintext from an IPSec stream,
    • that these vulnerabilities can be traced directly to code submitted by Jason Wright and / or other developers linked to Perry, and
    • that the nature of these vulnerabilities is such that there is reason to suspect, independently of Perry's allegations, that they were inserted intentionally—for instance, if the surrounding code is unnecessarily awkward or obfuscated and the obvious and straightforward alternative would either not be vulnerable or be immediately recognizable as vulnerable.
  2. I pledge an additional USD 100 to the first person to present convincing evidence showing that the same vulnerability exists in FreeBSD.

  3. Finally, I pledge USD 100 to the first person to present convincing evidence showing that a government agency successfully planted a backdoor in a security-critical portion of the Linux kernel.

Additional conditions:

  • In all three cases, the vulnerability must still be present and exploitable when the evidence is assembled and presented to the affected parties. Allowances will be made for the responsible disclosure process.
  • Exploitability must be demonstrated, not theorized.
  • I will not evaluate the evidence myself, but rely on the consensus of the OpenBSD, FreeBSD, Linux and / or infosec communities.
  • Primacy will be determined in a similar manner.
  • The evidence must be presented, and the bounty claimed, no later than 2012-12-31 23:59:59 UTC—a little more than two years from today.
  • The bounty will, at the claimant's discretion, either be transferred to the claimant by PayPal—no cash, checks, direct deposits or wire transfers—or donated directly to a non-profit of his or her choice.

[2010-12-16 fixed link]

Correct numbers for EARS and Deskstar

Western Digital WD10EARS:

   count    size  offset    step        msec     tps    kBps

  131072    1024       0    4096      143839     911     911
  131072    1024     512    4096      145876     898     898

   65536    2048       0    8192       96727     677    1355
   65536    2048     512    8192       88182     743    1486
   65536    2048    1024    8192       89126     735    1470

   32768    4096       0   16384       23063    1420    5683
   32768    4096     512   16384       76939     425    1703
   32768    4096    1024   16384       75719     432    1731
   32768    4096    2048   16384       76007     431    1724

   16384    8192       0   32768       16567     988    7911
   16384    8192     512   32768       67676     242    1936
   16384    8192    1024   32768       68772     238    1905
   16384    8192    2048   32768       68422     239    1915
   16384    8192    4096   32768       15728    1041    8333

Western Digital WD20EARS:

   count    size  offset    step        msec     tps    kBps

  131072    1024       0    4096     1963003      66      66
  131072    1024     512    4096     1964781      66      66

   65536    2048       0    8192      897269      73     146
   65536    2048     512    8192      898143      72     145
   65536    2048    1024    8192      897456      73     146

   32768    4096       0   16384       18923    1731    6926
   32768    4096     512   16384     1216071      26     107
   32768    4096    1024   16384     1212785      27     108
   32768    4096    2048   16384     1213512      27     108

   16384    8192       0   32768       13645    1200    9605
   16384    8192     512   32768      804264      20     162
   16384    8192    1024   32768      802154      20     163
   16384    8192    2048   32768      802877      20     163
   16384    8192    4096   32768       13960    1173    9388

Hitachi HDS722020ALA330:

   count    size  offset    step        msec     tps    kBps

  131072    1024       0    4096       34849    3761    3761
  131072    1024     512    4096       34807    3765    3765

   65536    2048       0    8192       19341    3388    6776
   65536    2048     512    8192       19332    3389    6779
   65536    2048    1024    8192       19352    3386    6772

   32768    4096       0   16384        8756    3741   14967
   32768    4096     512   16384        8803    3722   14888
   32768    4096    1024   16384        8782    3731   14924
   32768    4096    2048   16384        8744    3747   14989

   16384    8192       0   32768        5036    3253   26025
   16384    8192     512   32768        5035    3253   26029
   16384    8192    1024   32768        4997    3278   26227
   16384    8192    2048   32768        5042    3249   25994
   16384    8192    4096   32768        5039    3251   26010

Slightly different numbers, same conclusion: stay away from WD's Green Series.

Correct numbers for EARS and Deskstar

Western Digital WD10EARS:

   count    size  offset    step        msec     tps    kBps

  131072    1024       0    4096      143839     911     911
  131072    1024     512    4096      145876     898     898

   65536    2048       0    8192       96727     677    1355
   65536    2048     512    8192       88182     743    1486
   65536    2048    1024    8192       89126     735    1470

   32768    4096       0   16384       23063    1420    5683
   32768    4096     512   16384       76939     425    1703
   32768    4096    1024   16384       75719     432    1731
   32768    4096    2048   16384       76007     431    1724

   16384    8192       0   32768       16567     988    7911
   16384    8192     512   32768       67676     242    1936
   16384    8192    1024   32768       68772     238    1905
   16384    8192    2048   32768       68422     239    1915
   16384    8192    4096   32768       15728    1041    8333

Western Digital WD20EARS:

   count    size  offset    step        msec     tps    kBps

  131072    1024       0    4096     1963003      66      66
  131072    1024     512    4096     1964781      66      66

   65536    2048       0    8192      897269      73     146
   65536    2048     512    8192      898143      72     145
   65536    2048    1024    8192      897456      73     146

   32768    4096       0   16384       18923    1731    6926
   32768    4096     512   16384     1216071      26     107
   32768    4096    1024   16384     1212785      27     108
   32768    4096    2048   16384     1213512      27     108

   16384    8192       0   32768       13645    1200    9605
   16384    8192     512   32768      804264      20     162
   16384    8192    1024   32768      802154      20     163
   16384    8192    2048   32768      802877      20     163
   16384    8192    4096   32768       13960    1173    9388

Hitach HDS722020ALA330:

   count    size  offset    step        msec     tps    kBps

  131072    1024       0    4096       34849    3761    3761
  131072    1024     512    4096       34807    3765    3765

   65536    2048       0    8192       19341    3388    6776
   65536    2048     512    8192       19332    3389    6779
   65536    2048    1024    8192       19352    3386    6772

   32768    4096       0   16384        8756    3741   14967
   32768    4096     512   16384        8803    3722   14888
   32768    4096    1024   16384        8782    3731   14924
   32768    4096    2048   16384        8744    3747   14989

   16384    8192       0   32768        5036    3253   26025
   16384    8192     512   32768        5035    3253   26029
   16384    8192    1024   32768        4997    3278   26227
   16384    8192    2048   32768        5042    3249   25994
   16384    8192    4096   32768        5039    3251   26010

Slightly different numbers, same conclusion: stay away from WD's Green Series.
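
To put a number on that conclusion, here is a minimal sketch that compares aligned and unaligned 4,096-byte writes, using the msec values from the WD20EARS and Hitachi tables above. The function name `slowdown` is mine, chosen for illustration; it is not part of phybs.

```python
def slowdown(aligned_msec, unaligned_msec):
    """Return how many times longer the unaligned pass took."""
    return unaligned_msec / aligned_msec

# 4,096-byte writes at offset 0 (aligned) versus offset 512 (unaligned),
# taken from the msec column of the tables above.
wd20ears = slowdown(18923, 1216071)
hitachi = slowdown(8756, 8803)

print(f"WD20EARS: {wd20ears:.1f}x slower when unaligned")
print(f"Hitachi:  {hitachi:.3f}x slower when unaligned")
```

The Advanced Format drive pays a penalty of well over an order of magnitude for misaligned writes, while the Hitachi, with its 512-byte sectors, is essentially unaffected.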

Correct numbers for EADS

As Pieter de Goeje kindly pointed out, due to an overflow bug, phybs reported incorrect tps numbers. I've corrected the bug and started re-running the benchmarks, but so far, I've only had time to test the WD20EADS. Here are the results:

   count    size  offset    step        msec     tps    kBps

  131072    1024       0    4096       55121    2377    2377
  131072    1024     512    4096      102522    1278    1278

   65536    2048       0    8192       47447    1381    2762
   65536    2048     512    8192       36531    1793    3587
   65536    2048    1024    8192       64753    1012    2024

   32768    4096       0   16384       37204     880    3522
   32768    4096     512   16384       28833    1136    4545
   32768    4096    1024   16384       43464     753    3015
   32768    4096    2048   16384       37968     863    3452

   16384    8192       0   32768       19079     858    6869
   16384    8192     512   32768       25722     636    5095
   16384    8192    1024   32768       27485     596    4768
   16384    8192    2048   32768       21333     768    6144
   16384    8192    4096   32768       27655     592    4739

Remember, this is not an Advanced Format drive, but it's still surprisingly slow and surprisingly inconsistent.
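
For reference, the tps and kBps columns can be derived from count, size and msec roughly as follows. This is a sketch in Python, not the phybs source. The helper `wrap_u32` only illustrates how a 32-bit intermediate can overflow in this kind of calculation; I am not claiming it reproduces the actual bug. Because it works from whole milliseconds, some results may differ by one from the table.

```python
def throughput(count, size, msec):
    """Return (tps, kBps) for `count` writes of `size` bytes in `msec` ms."""
    tps = count * 1000 // msec
    kbps = count * size * 1000 // (msec * 1024)
    return tps, kbps

def wrap_u32(value):
    """Simulate storing a value in an unsigned 32-bit integer."""
    return value & 0xFFFFFFFF

# First row of the table above: 131072 writes of 1024 bytes in 55121 ms.
print(throughput(131072, 1024, 55121))

# The total byte count scaled by 1000 no longer fits in 32 bits,
# so a 32-bit intermediate would silently wrap around.
product = 131072 * 1024 * 1000
print(product, wrap_u32(product))
```

Python integers have arbitrary precision, so the overflow has to be simulated with a mask; in C the same expression would need a 64-bit type to be safe.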

Off by one

I made a small modification to phybs to verify the function of jumpers 7-8 on the WD Advanced Format drives (see here and here). Setting the jumper is supposed to cause the disk to internally shift every write by one sector, so that a write to sector 63 (where the first partition on a PC normally starts) actually goes to sector 64, which coincides with the beginning of a physical 4,096-byte sector. These numbers confirm this:

   count    size  offset    step        msec     tps    kBps
   32768    4096       0   16384       78631      34    1666
   32768    4096     512   16384       79880      33    1640
   32768    4096    1024   16384       73164      36    1791
   32768    4096    1536   16384       77727      34    1686
   32768    4096    2048   16384       76975      35    1702
   32768    4096    2560   16384       74970      36    1748
   32768    4096    3072   16384       79379      34    1651
   32768    4096    3584   16384       28094      96    4665

The firmware on the disk shifts everything forward by 512 bytes, so all these passes are unaligned except the last one, because 3,584 + 512 = 4,096.
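
The arithmetic can be checked for every offset in the table with a short sketch. The names `is_aligned`, `SECTOR` and `PHYSICAL` are mine, and the 512-byte shift is the behaviour described above for the jumpered drive.

```python
SECTOR = 512      # logical sector size in bytes
PHYSICAL = 4096   # physical sector size in bytes

def is_aligned(offset, shift=SECTOR, physical=PHYSICAL):
    """True if a write at `offset` lands on a physical sector boundary
    once the firmware has shifted it forward by `shift` bytes."""
    return (offset + shift) % physical == 0

# The eight offsets tested above: 0, 512, ..., 3584.
offsets = list(range(0, PHYSICAL, SECTOR))
aligned = [o for o in offsets if is_aligned(o)]
print(aligned)

# Sector 63, where the first PC partition normally starts.
print(is_aligned(63 * SECTOR))
```

Since the step (16,384 bytes) is a multiple of 4,096, every write in a pass has the same alignment as the first one, so checking the starting offset is enough.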

Disks and equipment

Here is a quick overview of the disks used in my tests:

Brand            Model            Capacity  Speed        Interface    Notes
Western Digital  WD4000AAKS       400 GB    7,200 rpm    SATA 3 Gbps
Western Digital  WD10EARS         1 TB      > 5,400 rpm  SATA 3 Gbps  1
Western Digital  WD20EARS         2 TB      > 5,400 rpm  SATA 3 Gbps  2
Western Digital  WD20EADS         2 TB      > 5,400 rpm  SATA 3 Gbps  3
Hitachi          HDS722020ALA330  2 TB      7,200 rpm    SATA 3 Gbps  3

The computer runs FreeBSD 9 on an Intel E6600 with an ICH9 chipset and 4 GB RAM. For convenience, the disks were tested in an Akasa Duo Dock connected by eSATA cable to one of the ICH9 SATA ports.

1 Kindly provided by Alastair Hogge

2 Kindly provided by GetOnline Ltd.

3 Kindly provided by Dansk Scanning AS