Setting up a ZFS-only system

After loader support for ZFS was imported into FreeBSD around a month ago, I’ve been thinking of installing a ZFS-only system on my laptop. I also decided to try out the GPT layout instead of the traditional disklabels.

The first thing I did was to grab a snapshot of FreeBSD CURRENT. Since sysinstall doesn’t support setting up ZFS, it can’t be used, so one has to use the Fixit environment on the FreeBSD install CD instead. I started out by removing the existing partition table on the disk (just writing zeros to the start of the disk will do). If you’re reading this before the January 2009 snapshot of CURRENT comes out, you have to create your own ISO image in order to get a loader with the latest fixes. Look at src/release/Makefile and src/release/i386/mkisoimages.sh for how to do this.
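
For reference, wiping the start of the disk can be done with something like this (assuming the disk is ad4, as in the rest of this post; double-check the device name, since this destroys the existing partitioning):

# dd if=/dev/zero of=/dev/ad4 bs=512 count=1024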

Then the next step was to set up the GPT with the partitions I wanted to have. Using GPT on FreeBSD, one should create one partition to contain the initial gptzfsboot loader. In addition, I wanted a swap partition, as well as a partition to use for a zpool for the whole system.

To set up the GPT, I used gpart(8) and looked at examples from the man page. The first thing to do is to set up the GPT partitioning scheme, first by creating the partition table, and then by adding the appropriate partitions.

# gpart create -s GPT ad4
# gpart add -b 34 -s 128 -t freebsd-boot ad4
# gpart add -b 162 -s 5242880 -t freebsd-swap ad4
# gpart add -b 5243042 -s 125829120 -t freebsd-zfs ad4
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad4

This creates the initial GPT and adds three partitions. The first partition contains the gptzfsboot loader, which is able to recognize a ZFS partition and load the loader from it. The second partition is the swap partition (I used 2.5 GB for swap in this case). The third partition is the partition containing the zpool (60 GB). Sizes and offsets are specified in sectors (one sector is typically 512 bytes). The last command puts the needed bootcode into ad4p1 (the freebsd-boot partition).
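
To sanity-check the result, gpart can print the partition table it just created:

# gpart show ad4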

Having set up the partitions, the hardest part should be done. As we are in the Fixit environment, we can now create the zpool as well.

# zpool create data /dev/ad4p3

The zpool should now be up and running. I then created the different filesystems I wanted to have in this pool: /usr, /home and /var (I use tmpfs for /tmp).
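
The mountpoint commands further down assume the datasets are named data/usr, data/home and data/var, so creating them boils down to:

# zfs create data/usr
# zfs create data/home
# zfs create data/var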

Then FreeBSD must be installed on the system. I did this by copying all the directories from /dist in the Fixit environment into the zpool. In addition, the /dev directory has to be created. For more details on this, you can follow http://wiki.freebsd.org/AppleMacbook. At least /dist/boot should be copied in order to be able to boot.
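
A rough sketch of that step, assuming the pool is mounted at /data (the default for a pool named data):

# cp -Rp /dist/* /data/
# mkdir -p /data/dev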

Then booting has to be set up. First, boot/loader.conf on the pool has to contain:

zfs_load="YES"
vfs.root.mountfrom="zfs:data"

Any additional filesystems or swap has to be entered into etc/fstab, in my case:

/dev/ad4p2 none swap sw 0 0

I also entered the following into etc/rc.conf:

zfs_enable="YES"

In addition, boot/zfs/zpool.cache has to exist on the pool so that the zpool can be imported automatically when ZFS loads at system boot. To get /boot/zfs/zpool.cache populated in the Fixit environment, I had to run:

# mkdir /boot/zfs
# zpool export data && zpool import data

Then, I copied zpool.cache to boot/zfs on the zpool:

# cp /boot/zfs/zpool.cache /data/boot/zfs

Finally, a basic system should now be installed. The last thing to do is to unmount the filesystem(s) and set a few properties:

# zfs set mountpoint=legacy data
# zfs set mountpoint=/usr data/usr
# zfs set mountpoint=/var data/var
# zfs set mountpoint=/home data/home
# zpool set bootfs=data data
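
Before rebooting, it doesn’t hurt to verify that the properties ended up the way you want:

# zpool get bootfs data
# zfs get -r mountpoint data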

To get all the quirks right, such as permissions, you should do a real install, either by building world from source (roughly the steps sketched after the links below) or by using sysinstall once booted into the system. Reboot, and you might be as lucky as me and boot into your ZFS-only system :) For further information, take a look at:

http://wiki.freebsd.org/ZFSOnRoot
which contains some information on how to use ZFS as root while booting from UFS, and:
http://wiki.freebsd.org/AppleMacbook
which has a nice section on setting up the zpool in a Fixit environment.
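
As for the “real install” mentioned above, the source route is the usual world dance, roughly (this assumes /usr/src already contains the sources you want; see also the update below about LOADER_ZFS_SUPPORT):

# cd /usr/src
# make buildworld buildkernel
# make installkernel
# make installworld
# mergemaster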

Update:

When rebuilding FreeBSD after this type of install, it’s also important that you build with LOADER_ZFS_SUPPORT=YES in order for the loader to be able to read zpools.
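
Putting it in /etc/make.conf makes it stick across rebuilds (passing it on the make command line should work just as well):

LOADER_ZFS_SUPPORT=YES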

22 Responses to “Setting up a ZFS-only system”

  1. Joe says:

    My understanding of ZFS support in FreeBSD is that it is still unstable and you might lose your data. Has this changed?

  2. [...] in volums has an entry describing how to setup a ZFS based FreeBSD install.

  3. lulf says:

    Well, I’ve not encountered any problems regarding corruption. The ZFS-related problems these days seem to be a few issues with NFS, as well as kernel memory exhaustion. The latter doesn’t result in any data loss, and can be solved by testing and tuning your system appropriately.

  4. David M. Garawand says:

    I ran some tests on FreeBSD (my favorite OS) with ZFS, and everything seemed cool, including the performance, but then just for a capability named “NFSLOG” I switched to OpenSolaris on my storage servers. The result was awful on the NFS performance side, and the reason was rational: the intent log (ZIL) has to be far speedier than a hard disk can manage, and I decided to use NVRAM to overcome this problem. Even now I don’t know how FreeBSD got around the problem; I guess they ignored something in NFS over ZFS (perhaps they just intended async NFS). However, I had no problems with ZFS on FreeBSD at the time of my tests, which I think was 9 months ago, but be careful.

  5. Morbius says:

    @Joe: ZFS is still considered experimental in FBSD. ZFS is also still at version 6 in FreeBSD 7.1; I believe that’s going to change to ZFS v.13 in 8.0. As for stability, it’s been reported that heavy multithreaded loads can cause a panic. For example, the following will bring down the current version of ZFS on 7.0-RELEASE (and I’m assuming 7.1 too) in a few minutes:

    /usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -r 4k -s 30g -i 0 -i 1 -i 2 -i 8 -+p 70 -C

    Rumour has it that the bugs which cause this are fixed in Pawel’s latest ZFS code but that’s not in 7.1 RELEASE, unfortunately. My impression is it won’t appear until 8.0-RELEASE.

    I’m just testing FBSD ZFS on a new 8-core, 8GB ram Xeon 24TB (24x1TB disc) storage system and while I’ve not panicked the system I’m finding that it’s slow (although to be fair, in my experience BSD software RAID tends to be slow, period). I had hoped to go the FBSD/ZFS route on this system, and held out till 7.1-RELEASE hoping for some needed fixes, but now it’s looking like Solaris will be the OS of choice based on the employer’s demand for ZFS.

    For example, I can’t write to a 6 disc (6x1TB) Raidz2 ZFS pool faster than about 50 MB/s average. It also has a tendency to write/pause/write/pause. You can see the array lights go on, then go off, pause, go on etc. I tried a different brand of controller, same thing. I also tried a Raidz i.e. ZFS Raid5 briefly on another fast multicore system (all of these are various Supermicro boards) and it still had the tendency to cycle write/pause/write at about 5 second intervals. ZFS is also very CPU intensive when scrubbing, which could well be the case on Solaris too. For example, all 8 cores are in use (2.33GHz) and it still takes well over an hour to scrub 1.1TB of data on a Raidz2. So scrubbing 24TB is going to take all day using all 8 cores. By comparison, the 24 port Areca 1680 controller in that box can do its RAID6 scrub in about the same time (even at its low priority setting) but the difference is the system CPU cores are idle/usable for something else.

    I have no idea how people on Solaris are getting 100s of MB/sec with ZFS. Days of tweaking the knobs on 7.0-RELEASE ZFS didn’t fundamentally change my reported performance. My conclusion at this time is that if you want totally stable (i.e. stablest) and fastest ZFS then use the Solaris reference implementation. If you don’t need the fastest speed and the ZFS array is not heavily trafficked, then you can try the BSD version and hope the stability wrinkles are ironed out in time, which is hopefully 8.0-RELEASE by the looks of things.

  6. Joe says:

    @Morbius: Your comments are similar to what I’ve read so far. For now, I’m running Open Solaris. I will revisit ZFS when 8.0 is released.

  7. SLL says:

    ZFS is stable, I’ve been using it since mid 2007 on a 10TB file-server. Had some problems with samba, but nothing else…

  8. GM says:

    I too have been using ZFS with FreeBSD 7 since mid 2007 on a 4.5TB RAIDZ2 array. At that time FBSD7 was not yet released i.e. it was 7.0-CURRENT. No problems at all. Totally stable.
    You will find hardware support problems with Opensolaris. It supports so little. You have to specifically buy hardware for *it*. Once you have it installed, good luck on ever updating it, without a reinstall.
    FBSD is much easier to keep up to date, both OS and application wise.
    Read the opensolaris forums and you will see reports of kernel panics with Opensolaris too.

  9. Buganini says:

    I installed my zfs-only system following these instructions, but I needed LOADER_ZFS_SUPPORT=YES when building the loader; without it, the loader can’t recognize ZFS.

  10. lulf says:

    Ah, yes. I’ll add it, thanks :)

  11. Freddie says:

    @ Morbius: ZFSv13 was just MFC’d from -CURRENT to -STABLE, which means it will be available with the release of FreeBSD 7.3, and can be tested right now by upgrading to RELENG_7.

    Running the following on our server (4 CPU cores, 8 GB RAM, 24 SATA drives configured into 3 raidz2 vdevs of 8-drives) gives 15 MB/s per drive writes, without any lockups. FreeBSD 7.2, without kmem tuning, arc limited to 1 GB.

    /usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -r 4k -s 30g -i 0 -i 1 -i 2 -i 8 -+p 70 -C

  12. Freddie says:

    @Morbius: running that iozone command on my storage server gives me just shy of 350 MBytes/sec write throughput, or just under 15 MBytes/sec per drive.

    Removing the -S and -L and changing the -r to 128 KB (the maximum record size for ZFS), I get about 400 MBytes/sec write, or just under 20 MBytes/sec per drive.

  13. Jeremy McMillan says:

    In 8.0-RC1 the PMBR and gptzfsboot both work, because I can get the loader loaded, but the loader can’t seem to see the GPT. It only sees an invalid slicemap, and the ZFS devices list is empty. Thus it can’t find loader.conf or any of its 4th magic, nor the kernel or modules.

    This is sad because the stuff that works is written in assembler, while the loader, in C, should be easier to fix. So there’s PMBR code, gpt*boot code, and loader code all duplicating the same function of reading the list of partitions in the GPT. The last one is b0rked.

    Can someone help me identify a version that is known to work in this configuration so I can hack something together?

  14. Buganini says:

    A 128-block (64k) freebsd-boot partition is not enough for the latest gptzfsboot; it’s 80k now.

    And remember to update gptzfsboot if you `zpool upgrade`, or it won’t boot :(

  15. [...] advocacy, but how I upgraded my staging environment under VMWare Fusion. Most steps I followed from lulf’s excellent article, but I had to do some extra work as this was not an install from scratch. The system was originally [...]

  16. Quora says:

    Should I install FreeBSD 8.2 now, or wait until 9.0 comes out?…

    Really depends if you need the features of ZFS post v15 which 8.2 comes with. 8.2 itself is great, so if there’s not the need for the latest ZFS, I’d go with 8.2. You’ll have to unlearn some Linuxisms, but overall I think you’ll eventually like Fre…

  17. [...] using the gpt based partition tables. If you want to use these, please refer to the following page: http://lulf.geeknest.org/blog/freebsd/Setting_up_a_zfs-only_system/ or booting zfs as root using a small ufs boot partition as provided by the instructions [...]
