More on ZFS…

18. October 07

This time I ran iozone with an 8GB file on:

  1. UFS2 (newfs defaults) over a gstripe volume (64k stripe size)
  2. ZFS with 3 disks striped
  3. UFS2 (newfs defaults) over a zvol (equal to the zpool size)
  4. ZFS over a gstripe volume (64k stripe size)
  5. UFS2 async + gjournal over a gstripe volume (64k stripe size)

[Graphs: ZFS read and ZFS write results]

I’m confused about what to expect from these values…
As usual, the complete set of graphs is here (note: I also split the iozone results by record size).

clement vs 7.0-PRERELEASE

16. October 07

No, I’m not dead (yet?).

[Disclaimer]: I’m far from being an expert in VM or storage. Don’t use this post if you need to feed various trolls.

To make a long story short, I’m confused by my ZFS benchmarks (yeah, I know, benchmarks suck). Write performance is quite impressive, but I’m uneasy about read performance once the ARC reaches its limit…

Most of my workstations and personal servers are running CURRENT, and I was waiting for RELENG_7 to test it on my test servers at work. Why? Just because I’m lazy and I want to update my servers with my freebsd-update recipe (it _almost_ works ;)).
I was also waiting for RELENG_7 because of zfs, to use it on the “low cost” storage server we received.
It’s an HP DL365 G1 (dual core) with 5GB of RAM. The storage part is an MSA 20 with twelve 500GB disks attached to a Smart Array 4602 (ciss(4)) controller with 192MB of BBW cache (50/50). Due to the controller’s limitations, volumes can’t exceed 2TB, so we split the disk pool into 3 RAID 5 volumes of 1.4TB each. [dmesg here]
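As a sanity check on the 1.4TB figure, here is the RAID 5 arithmetic in sh. It assumes two things: the twelve disks are split evenly (4 per volume), and "500GB" uses vendor units (10^9 bytes). RAID 5 gives up one disk's worth of capacity per volume for parity. Most tools report binary terabytes, so each 1.5TB volume shows up as roughly 1.36TiB, which rounds to the 1.4 quoted above.

```shell
# Usable RAID 5 capacity, assuming 12 x 500GB disks split evenly
# into 3 volumes. Disk sizes use vendor units (10^9 bytes).
disks=12
volumes=3
disk_gb=500

per_volume=$((disks / volumes))              # 4 disks per volume
usable_gb=$(( (per_volume - 1) * disk_gb ))  # RAID 5: one disk's worth goes to parity
usable_bytes=$((usable_gb * 1000000000))
# Binary terabytes (TiB) times 100, to print 2 decimals with integer math.
tib_x100=$((usable_bytes * 100 / 1099511627776))

echo "disks per volume: $per_volume"
printf 'usable per volume: %sGB (~%d.%02dTiB)\n' \
    "$usable_gb" $((tib_x100 / 100)) $((tib_x100 % 100))
echo "total usable: $((usable_gb * volumes))GB"
```

Each volume stays below the controller's 2TB limit. That limit is why the pool was cut into three volumes in the first place.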

ZFS will stripe across the volumes for us.
# zpool create test da0 da1 da2

Out of the box, RAID volume speed is not that bad. Applying scottl@’s ciss patch helps a little.
Once the zpool was created, I was impressed by the “irrelevant dd benchmark”: read 180MB/s, write 110MB/s. I couldn’t resist launching iozone. As expected, it panic()ed :)
All iozone runs are performed with the following command:
# iozone -aRcWe -g 2G -f /test/testfile

I raised vm.kmem* to 1GB and kern.maxvnodes to 400000, re-ran iozone and played with postmark. A few hours later: Yet Another Panic. I tried setting vm.kmem* to 1.5GB or 2GB, with no luck. Anyway, I’ll investigate later. I decided to test without prefetch. No surprise: YAP.
I thought it was time to give pjd’s vm_kern.c hack a chance. Once the kernel was recompiled, I restarted iozone (without prefetch). 6 hours later, still no panic, great! After a lot of successful rsyncs and some disappointing tar-over-nc transfers (they require more investigation; I’ll “blog” about them later), I still felt uncomfortable with read performance.
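For reference, the tuning above could be written down like this. The sketch assumes “vm.kmem*” means the vm.kmem_size and vm.kmem_size_max boot-time tunables, which belong in /boot/loader.conf. kern.maxvnodes is a regular sysctl that can also be changed at runtime. The helper name mb_to_bytes is made up for the example.

```shell
# Hypothetical helper: print the tuning lines for the values used above.
# vm.kmem_size / vm.kmem_size_max are read at boot (/boot/loader.conf);
# kern.maxvnodes is a runtime sysctl (/etc/sysctl.conf or sysctl(8)).
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

kmem_mb=1024          # 1GB, as in the run above (1536 would be 1.5GB)
maxvnodes=400000

loader_conf=$(printf 'vm.kmem_size="%s"\nvm.kmem_size_max="%s"\n' \
    "$(mb_to_bytes $kmem_mb)" "$(mb_to_bytes $kmem_mb)")
sysctl_conf="kern.maxvnodes=$maxvnodes"

echo "# append to /boot/loader.conf (takes effect after a reboot):"
echo "$loader_conf"
echo "# append to /etc/sysctl.conf (or run: sysctl kern.maxvnodes=$maxvnodes):"
echo "$sysctl_conf"
```

Writing the values in bytes avoids depending on suffix parsing in the loader.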

I ran my bench with and without prefetch. Read performance was disappointing compared to write performance.
The last time I saw this kind of dramatic performance hit, it was on Linux: buffer cache starvation. I decided to compare it to a striped volume, so I destroyed my zpool and created a gstripe with a 64k stripe.
# zpool destroy test
# gstripe label -s 65536 data da0 da1 da2
# newfs /dev/stripe/data
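The following model shows what the 64k stripe does using round-robin striping over the three providers. A logical offset is cut into 64k chunks, and chunk N lands on disk N mod 3. This illustrates the general technique only; it is not derived from the gstripe(8) source, and the function name is hypothetical.

```shell
# Model of round-robin striping: 64k stripe size over 3 providers.
STRIPE=65536
NDISKS=3
DISKS="da0 da1 da2"

# Print "<disk> <offset on that disk>" for a byte offset on the striped volume.
stripe_map() {
    off=$1
    stripe_no=$((off / STRIPE))          # which 64k chunk of the volume
    disk_idx=$((stripe_no % NDISKS))     # chunks go round-robin over the disks
    row=$((stripe_no / NDISKS))          # complete rows before this chunk
    disk_off=$((row * STRIPE + off % STRIPE))
    set -- $DISKS                        # function-local positional parameters
    shift $disk_idx
    echo "$1 $disk_off"
}

for off in 0 65536 131072 196608 200000; do
    echo "offset $off -> $(stripe_map $off)"
done
```

Sequential I/O larger than 64k therefore keeps all three volumes busy at once, which is the behaviour the UFS-over-gstripe runs are compared against.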

I finally got read performance I expected.



As you can see, the drop appears when the ARC gets full.
Here are the ARC settings for all runs, except the one where vfs.zfs.arc_{min,max} was set to 32MB:
vfs.zfs.arc_min: 33554432
vfs.zfs.arc_max: 805306368
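Converting those values helps explain the drop: arc_max is 768MB and arc_min is 32MB. Assume iozone's automatic mode (-a) doubles the file size from 64KB up to the -g limit (2GB in the command above). Under that assumption, only the last two file sizes no longer fit in the ARC.

```shell
arc_min=33554432
arc_max=805306368
echo "arc_min = $((arc_min / 1048576))MB"
echo "arc_max = $((arc_max / 1048576))MB"

# Assumption: iozone -a starts at 64KB and doubles up to -g (2GB here).
size_kb=64
max_kb=$((2 * 1024 * 1024))
over=""
while [ "$size_kb" -le "$max_kb" ]; do
    if [ $((size_kb * 1024)) -gt "$arc_max" ]; then
        over="$over ${size_kb}KB"    # this test file cannot stay in the ARC
    fi
    size_kb=$((size_kb * 2))
done
echo "file sizes larger than arc_max:$over"
```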

Backing out pjd’s patch didn’t help. I’m currently running the same benchmark with an 8GB iozone file.

You can get all the graphs here.

mpm-itk is a working alternative to the b0rked perchild mpm, but in
prefork mode. It’s currently maintained by Steinar H. Gunderson. Please
for more details.
It was developed for Apache 2.0.x and has recently been ported to Apache 2.2.
Since I enjoy Apache 2.2, this little howto uses Apache 2.2.x, but it
works for Apache 2.0.x too.

First of all, CVSup your ports tree. Go to the apache22 port and build
Apache with “itk” as the MPM:

$ cd /usr/ports/www/apache22
$ sudo make WITH_MPM=itk
< output...>
$ sudo make install clean

Let’s check that we have the right MPM:

$ /usr/local/sbin/httpd -V | grep MPM
Server MPM:     ITK
-D APACHE_MPM_DIR="server/mpm/experimental/itk"

Nice! We can now see how our apache22 reacts :)

$ sudo /usr/local/etc/rc.d/apache22 forcestart
Performing sanity check on apache22 configuration:
Syntax OK
Starting apache22.
$ ps auxwww | grep httpd
root    92484  0.0  0.7 73452  7116  ?? Ss    4:18PM   0:00.09 /usr/local/sbin/httpd -DNOHTTPACCEPT
root    92485  0.0  0.7 73484  7132  ?? S     4:18PM   0:00.00 /usr/local/sbin/httpd -DNOHTTPACCEPT
root    92486  0.0  0.7 73484  7132  ?? S     4:18PM   0:00.00 /usr/local/sbin/httpd -DNOHTTPACCEPT
root    92487  0.0  0.7 73484  7132  ?? S     4:18PM   0:00.00 /usr/local/sbin/httpd -DNOHTTPACCEPT
root    92488  0.0  0.7 73484  7132  ?? S     4:18PM   0:00.00 /usr/local/sbin/httpd -DNOHTTPACCEPT
root    92489  0.0  0.7 73484  7132  ?? S     4:18PM   0:00.00 /usr/local/sbin/httpd -DNOHTTPACCEPT

Uh-oh! Apache runs as root. No, I didn’t put in a backdoor ;) Apache runs as root because until the request gets parsed, we don’t know which virtual host is requested and, therefore, which user to switch to. A security risk exists: any bug before request parsing can lead to a root compromise. If there is a major flaw in mod_ssl, pray and update ASAP.

Still here? Not scared? You should be :)
mpm-itk uses 2 directives:
AssignUserID, the UID/GID assigned to the child process, and MaxClientsVHost, the maximum number of children alive at the same time for this virtual host.

Don’t uncomment ‘#Include etc/apache22/extra/httpd-vhosts.conf’ in
${PREFIX}/etc/apache22/httpd.conf, the config file is b0rked (it’s on
my ToDo list).

Instead edit ${PREFIX}/etc/apache22/Includes/100.NameVirtualHost.conf
and put:

NameVirtualHost *:80

(I usually leave 0xx for module configurations.)

Add something like this to

<VirtualHost *:80>
    DocumentRoot /usr/local/www/apache22/data
    AssignUserID  nobody nogroup
    MaxClientsVHost 10
</VirtualHost>

Now add a different vhost. My vhost is “”, So
in ${PREFIX}/etc/apache22/Includes/ I

<VirtualHost *:80>
    ServerAdmin [email protected]
    DocumentRoot /home/www/
    AssignUserID  clement clement
    MaxClientsVHost 50
    <Directory /home/www/>
            Order allow,deny
            Allow from all
    </Directory>
</VirtualHost>

I’ve also installed PHP5 to perform some basic testing. Let’s restart:

$ sudo /usr/local/etc/rc.d/apache22 forcerestart

I ran this simple script:


We run _the_ *basic* test…

$ fetch -q -o -
uid=65534(nobody) gid=65534(nobody) egid=65533(nogroup) groups=65533(nogroup)
$ fetch -q -o -
uid=1000(clement) gid=1000(clement) groups=1000(clement)

And it works! I don’t even have to keep the o+x bit on directories :)
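To automate that check, a small sh helper can pull the user name out of the id(1) output and compare it with the AssignUserID of each vhost. The function names are hypothetical. In a real run, the second argument would be the output of fetch(1) against each of your vhosts.

```shell
# Extract the user name from id(1) output: "uid=1000(clement) gid=..." -> "clement"
id_user() {
    echo "$1" | sed -n 's/^uid=[0-9]*(\([^)]*\)).*/\1/p'
}

# $1 = user expected from AssignUserID, $2 = id output served by that vhost
check_vhost() {
    got=$(id_user "$2")
    if [ "$got" = "$1" ]; then
        echo "OK: running as $got"
    else
        echo "FAIL: expected $1, got $got"
        return 1
    fi
}

check_vhost nobody 'uid=65534(nobody) gid=65534(nobody) egid=65533(nogroup) groups=65533(nogroup)'
check_vhost clement 'uid=1000(clement) gid=1000(clement) groups=1000(clement)'
```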

Have fun!

Last Tuesday, I finally received a few servers. We (my co-worker and I) had wasted 3 days waiting for the hardware, sitting next to the datacenter door, playing Mario Kart DS. We added a new rack and put a Cisco 2970 switch, 3 HP DL360s and 1 DL380 (all G4) into it. We configured iLO and went away. Oh, I forgot to tell you we also plugged our beloved “bob”, our build machine, back in (lame joke, isn’t it?).

I couldn’t resist playing with them.

I wanted to see how freebsd-update can deal with branches instead of releases, even though it’s not designed for that. Important note: I DON’T USE IT IN PRODUCTION, IT’S FOR TESTING PURPOSES ONLY. But if it works… why not :-) I won’t describe how to use tinderbox to build packages. It’s quite easy: build packages, build INDEX, create subdirectories in the FTP root, copy the packages, and it’s done. iLO lets you grab the remote video/keyboard: perfect for installing FreeBSD over PXE while comfortably sitting in my office chair.
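The “create subdirectories, copy packages” part can be sketched in sh. As far as I know, the directory layout (pub/FreeBSD/ports/<arch>/<packages-dir>/All plus an INDEX next to All/) mirrors the official FTP layout. The function name and the packages-7-current directory name are assumptions; adjust them to what your clients expect.

```shell
# Hypothetical sketch: publish tinderbox-built packages under an FTP root.
publish_packages() {
    pkg_src=$1      # directory holding the built .tbz packages and INDEX
    ftproot=$2      # FTP root directory
    arch=$3         # e.g. i386
    branch_dir=$4   # e.g. packages-7-current (assumed name)

    dest="$ftproot/pub/FreeBSD/ports/$arch/$branch_dir"
    mkdir -p "$dest/All" || return 1
    for pkg in "$pkg_src"/*.tbz; do
        [ -e "$pkg" ] || continue        # glob did not match: no packages
        cp "$pkg" "$dest/All/" || return 1
    done
    # INDEX sits next to All/.
    if [ -f "$pkg_src/INDEX" ]; then
        cp "$pkg_src/INDEX" "$dest/INDEX" || return 1
    fi
    echo "$dest"
}
```

The function prints the destination directory, so a wrapper script can reuse it (for instance, to point PACKAGESITE at it on the test servers).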

Here’s what I have:

  • a build server with free CPU cycles and disk space
  • an up-to-date copy of FreeBSD CVS
  • test servers
  • a few hours
  • coffee, beer and cigarettes

The first part of the job is to make a base release. I decided to use a vanilla release to avoid conflicts later. Since I want to play with freebsd-update, let’s make a release just after it got committed.


More here… because I didn’t manage to make it look “nice” in wp.

I finally got the time and motivation to start writing pflsa. The code is pretty ugly. My PHP skills suck, and that seems to prevent me from putting decent logic in it. It will surely be rewritten; it’s still in proof-of-concept mode.

What changed?

  • pxe_conf_url became pxe.conf.url, and pxe_post_install_script_url is no longer used: it moved to “${pxe.conf.url}?getscript”.
  • The configuration format changed too:
    1. It’s split into 3 parts: %sysinstall, %packages and %post (a la kickstart ;-)).
    2. In %sysinstall nothing changed, except that you can use ${size}K/M/G/T/P/E to define partition sizes instead of a number of sectors. installCommit is automatically added. If you have wishes about this part, ping me.
    3. The %packages section is just a list of packages. pflsa translates it into sysinstall format.
    4. %post is a list of shell commands. When the script is retrieved, pflsa prepends a shebang and a few ready-to-go functions (to set the timezone, enable services, set the root password). I plan to add more functions: gmirror, adding users, etc. Here again, if you have ideas, ping me.
  • Per-IP config support, with fallback to default.conf.
  • Browser-friendly viewing (not finished yet, and lamely based on the User-Agent).
  • I also added gdi (grab disk infos), based on libdisk, to pxe_crunch. I don’t really know if I’ll keep it: do you need to inform pflsa about disk names/sizes?
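pflsa itself is PHP and not released, so the following is only a sh sketch of the same ideas, with hypothetical function names. It covers four of the changes listed above:

- converting ${size}K/M/G/T/P/E into sectors (assuming 512-byte sectors)
- extracting one %section from a configuration
- turning the %packages list into sysinstall’s package=/packageAdd pairs
- picking a per-IP config with a fallback to default.conf (the <ip>.conf naming is an assumption)

```shell
# Convert "<n>K|M|G|T|P|E" to 512-byte sectors (assumes 512-byte sectors).
# A value without suffix is taken as a sector count and printed unchanged.
to_sectors() {
    case $1 in
        *K) shift_by=1 ;;    # 1K = 2^10 bytes = 2^1 sectors
        *M) shift_by=11 ;;
        *G) shift_by=21 ;;
        *T) shift_by=31 ;;
        *P) shift_by=41 ;;
        *E) shift_by=51 ;;
        *)  echo "$1"; return 0 ;;
    esac
    n=${1%?}                 # strip the one-letter suffix
    echo $(( $n << $shift_by ))
}

# Print the body of one section (sysinstall, packages or post) of a config.
get_section() {
    awk -v want="%$1" '
        /^%/ { in_sec = ($1 == want); next }   # section header line
        in_sec                                   # print lines of the wanted section
    ' "$2"
}

# Turn the %packages list into sysinstall package=/packageAdd pairs.
packages_to_sysinstall() {
    get_section packages "$1" | while read -r pkg; do
        case $pkg in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        echo "package=$pkg"
        echo "packageAdd"
    done
}

# Per-IP config: use <dir>/<ip>.conf if present, else <dir>/default.conf.
config_for_ip() {
    if [ -f "$1/$2.conf" ]; then
        echo "$1/$2.conf"
    else
        echo "$1/default.conf"
    fi
}
```

For example, to_sectors 4G prints 8388608, which is the sector count you would otherwise have to type into a %sysinstall partition line.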

If you want to have a look at the pflsa files (sources are not released yet, they’re too ugly ;)), they’re here.

Now it’s time for me to prepare dinner ;-) After that, I’ll review gabor’s DESTDIR patch and update the apache ports.

In a few weeks, I’ll have to upgrade our 2 clusters from an old customised Red Hat 7.3 Linux to FreeBSD 6-STABLE. The preliminary tasks are already done: applications are now FreeBSD friendly, they have been ports-ified too, and config files are ready to be deployed. It sounds really good, doesn’t it? But I have a big constraint: I have to install one of the clusters in 3 hours. The only way to install ten servers in this timeframe is to use PXE. I already use PXE for basic installations, but here I must install ready-to-go servers. The cluster is also made of 3 different kinds of servers: I need to support different NICs and types of disk, and even set up software RAID on one box. How can I do that with our PXE stuff? I was dreaming of kickstart for FreeBSD ;-) I thought about patching sysinstall, but I’m too lazy :-) Let’s cheat!

(you can download a tarball with files and scripts here. Warning ! it’s quick and dirty !)


  • FreeBSD source of the desired version
  • A FreeBSD ISO image
  • a web server and an FTP server
  • a working PXE installation (Ceri’s post on PXE)

Basically, the idea is to fetch a configuration file at the beginning of the sysinstall process, load it into sysinstall, commit the install, fetch a post-install script and finally run it.

  1. Cook mfsroot (please read Ceri’s post to learn how to play with it)
    First of all, we have to modify the mfsroot image to add fetch(1), our install.cfg and a simple script to fetch/configure our sysinstall.
    To add fetch, we need to regenerate boot_crunch with fetch support, using a configuration like this one: pxe_crunch.conf.
    mkdir crunch
    cd crunch
    crunchgen ../pxe_crunch.conf
    make -f pxe_crunch.mk
    cd ..
    Mount mfsroot in a directory called mfsfd. Then repopulate mfsfd/stand with the new crunched binary and its links:
    cd mfsfd/stand
    rm *
    install -m 755 ../../crunch/pxe_crunch .
    for i in $(crunchgen -l ../../pxe_crunch.conf); do ln pxe_crunch ${i} ; done
    install -m 755 /sbin/dhclient-script .
    Now install our small install.cfg and the fetch script:
    chmod +x
    cd ..
    mkdir -p var/db tmp var/run

    Unmount mfsroot and copy it into the boot/ directory of your NFS share.
  2. Populate web server
    On your webserver, install pxe.conf and
    mkdir ${DOCUMENTROOT}/pxeinstall

    Edit those files to suit your needs.
    vi pxe.conf

    If you want to install FreeBSD through the interface used for PXE booting, keep %%IFNAME%% as the netDev value.
    Disk support is not available yet; it will be supported soon. is executed in the post-install stage, so you can use previously installed stuff, like perl. It’s a live system, so beware :)
    Note: don’t put stuff in /etc/rc.conf, it will be commented out. Use /etc/rc.conf.local instead.
    Note 2: you can put those files on the FTP server instead.
  3. Populate FTP server
    mkdir -p ${FTPROOT}/pub/FreeBSD/releases/i386/
    tar -C ${FTPROOT}/pub/FreeBSD/releases/i386/ -xvf /path/to/6.1-RELEASE-i386-disc1.iso 6.1-RELEASE
  4. Edit boot/loader.rc
    We use loader.rc to set 2 variables: pxe_conf_url and pxe_post_install_script_url.
    Your loader.rc should look like this:
    load /boot/kernel/kernel
    load -t mfs_root /boot/mfsroot
    set vfs.root.mountfrom="ufs:/dev/md0c"
    set pxe_conf_url=””
    set pxe_post_install_script_url=””
  5. The end
    Boot your machine via PXE and it should be ok :-)
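Two helpers fit the fetch and post-install scripts described above. This is a sketch with hypothetical names. The first replaces the %%IFNAME%% placeholder with the interface PXE booted from; how the script learns that name is left to you. The second enables a service through /etc/rc.conf.local, since entries in /etc/rc.conf get commented out.

```shell
# Replace the %%IFNAME%% placeholder in a config template with an interface
# name and print the result. The interface name is simply passed as $1.
set_netdev() {
    sed "s/%%IFNAME%%/$1/g" "$2"
}

# Enable a service in rc.conf.local ($2 overrides the path, for testing).
# Does nothing if <service>_enable is already set there.
enable_service() {
    rc_local=${2:-/etc/rc.conf.local}
    if ! grep -q "^${1}_enable=" "$rc_local" 2>/dev/null; then
        echo "${1}_enable=\"YES\"" >> "$rc_local"
    fi
}
```

For example, set_netdev em0 install.cfg.in > install.cfg would produce a config with netDev=em0 (file names hypothetical), and enable_service sshd appends sshd_enable="YES" once, even if called twice.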

The plan for the future is to replace the hard-coded pxe.conf and with a PHP script that generates the config file/script on the fly. should be improved to handle gmirror configuration, nss stuff and more (like data restoration via bacula, for example). I’ll surely add more functions when it gets rewritten.

FYI, I use my own FreeBSD release; packages are built in the marcuscom tinderbox, and a script generates the INDEX file. I also use a meta-port to avoid polluting pxe.conf.

See ya for part II… if any :-)
