After loader support for ZFS was imported into FreeBSD around a month ago, I’ve been thinking of installing a ZFS-only system on my laptop. I also decided to try out the GPT layout instead of disklabels.
The first thing I did was grab a snapshot of FreeBSD CURRENT. Since sysinstall doesn’t support setting up ZFS, it can’t be used, so one has to use the Fixit environment on the FreeBSD install CD to set it up. I started out by removing the existing partition table on the disk (just writing zeros to the start of the disk will do). If you’re reading this before the January 2009 snapshot of CURRENT comes out, you have to create your own ISO image in order to get a loader with the latest fixes. Look in src/release/Makefile and src/release/i386/mkisoimages.sh for how to do this.
Then, the next step was to set up the GPT with the partitions that I wanted to have. Using GPT in FreeBSD, one should create one partition to contain the initial gptzfsboot loader. In addition, I wanted a swap partition, as well as a partition to use for a zpool for the whole system.
To set up the GPT, I used gpart(8) and looked at the examples in the man page. The first step is to create the partition table, and then add the appropriate partitions.
# gpart create -s GPT ad4
# gpart add -b 34 -s 128 -t freebsd-boot ad4
# gpart add -b 162 -s 5242880 -t freebsd-swap ad4
# gpart add -b 5243042 -s 125829120 -t freebsd-zfs ad4
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad4
This creates the initial GPT, and adds three partitions. The first partition contains the gptzfsboot loader which is able to recognize and load the loader from a zfs partition. The second partition is the swap partition (I used 2.5 GB for swap in this case). The third partition is the partition containing the zpool (60GB). Sizes and offsets are specified in sectors (1 sector is typically 512 bytes). The last command puts the needed bootcode into ad4p1 (freebsd-boot).
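The offsets and sizes passed to gpart are easy to get wrong by hand, so here is a small sketch of the arithmetic behind them. It assumes 512-byte sectors and that the first usable sector of the GPT is 34, as in the commands above; the helper name to_sectors is made up for illustration.

```shell
#!/bin/sh
# Sketch: derive the gpart -b/-s values from sizes in bytes.
# Assumes 512-byte sectors and a first usable GPT sector of 34.
SECTOR=512

# Convert a size in bytes to a count of sectors.
to_sectors() {
    echo $(( $1 / $SECTOR ))
}

boot_size=128                                           # 128 sectors = 64 KB
swap_size=$(to_sectors $(( 2560 * 1024 * 1024 )))       # 2.5 GB
zfs_size=$(to_sectors $(( 60 * 1024 * 1024 * 1024 )))   # 60 GB

# Each partition starts where the previous one ends.
boot_start=34
swap_start=$(( boot_start + boot_size ))
zfs_start=$(( swap_start + swap_size ))

echo "freebsd-boot -b $boot_start -s $boot_size"
echo "freebsd-swap -b $swap_start -s $swap_size"
echo "freebsd-zfs  -b $zfs_start -s $zfs_size"
```

The three printed lines should carry the same -b and -s values as the gpart add commands above.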
Having set up the partitions, the hardest part should be done. As we are in the Fixit environment, we can now create the zpool as well.
# zpool create data /dev/ad4p3
The zpool should now be up and running. I then decided to create the different filesystems I wanted to have in this pool. I created /usr, /home and /var (I use tmpfs for /tmp).
Then, FreeBSD must be installed on the system. I did this by copying all directories from /dist in the Fixit environment into the zpool. In addition, the /dev directory has to be created. For more details on this, you can follow http://wiki.freebsd.org/AppleMacbook. At the very least, /dist/boot has to be copied in order to be able to boot.
Then, booting has to be set up. First, boot/loader.conf has to contain:
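I didn't write down the exact commands for this step, so the following is only a sketch of what creating those filesystems could look like. It prints one zfs create command per dataset instead of running it; the helper name plan_datasets is made up, and the dataset names are the ones mentioned above.

```shell
#!/bin/sh
# Sketch: print the zfs(8) commands for the filesystems in the pool.
# plan_datasets is a hypothetical helper; it only prints, it does not run zfs.
POOL=data

# Print one "zfs create" command per dataset name given.
plan_datasets() {
    for fs in "$@"; do
        echo "zfs create $POOL/$fs"
    done
}

plan_datasets usr var home
```

Once the printed commands look right, they can be typed in by hand or the output can be piped to sh.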
zfs_load="YES"
vfs.root.mountfrom="zfs:data"
Any additional filesystems or swap have to be entered into etc/fstab, in my case:
/dev/ad4p2 none swap sw 0 0
I also entered the following into etc/rc.conf:
zfs_enable="YES"
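If you would rather script these three files than edit them by hand, something along these lines should work. It is a sketch only: write_boot_config and its arguments are names I made up, and it simply appends the settings shown above to files under the root directory you give it.

```shell
#!/bin/sh
# Sketch: append the boot settings shown above to a target root.
# write_boot_config is a hypothetical helper.
# $1 = root of the new system, $2 = pool name, $3 = swap device
write_boot_config() {
    root=$1 pool=$2 swapdev=$3
    mkdir -p "$root/boot" "$root/etc" || return 1

    # Load ZFS at boot and mount the root filesystem from the pool.
    cat >> "$root/boot/loader.conf" <<EOF
zfs_load="YES"
vfs.root.mountfrom="zfs:$pool"
EOF

    # Swap entry, in the same field order as the fstab line above.
    printf '%s\tnone\tswap\tsw\t0\t0\n' "$swapdev" >> "$root/etc/fstab"

    # Let rc(8) mount the remaining ZFS filesystems.
    echo 'zfs_enable="YES"' >> "$root/etc/rc.conf"
}

# Example for the layout in this post:
#   write_boot_config /data data /dev/ad4p2
```

Since the function appends, run it once per install, or check the files afterwards for duplicate lines.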
In addition, boot/zfs/zpool.cache has to exist so that the zpool can be imported automatically when ZFS loads at system boot. To get /boot/zfs/zpool.cache populated in the Fixit environment, I had to create the directory and then export and re-import the pool:
# mkdir /boot/zfs
# zpool export data && zpool import data
Then, I copied zpool.cache to boot/zfs on the zpool:
# cp /boot/zfs/zpool.cache /data/boot/zfs
Finally, a basic system should be installed. The last thing to do is to unmount the filesystem(s) and set a few properties:
# zfs set mountpoint=legacy data
# zfs set mountpoint=/usr data/usr
# zfs set mountpoint=/var data/var
# zfs set mountpoint=/home data/home
# zpool set bootfs=data data
To get all the details right, such as permissions, you should do a real install by making world or using sysinstall once booted into the system. Reboot, and you might be as lucky as me and boot into your ZFS-only system.
For further information, take a look at:
http://wiki.freebsd.org/ZFSOnRoot
which contains some information on how to use ZFS as root, but by booting from UFS, and:
http://wiki.freebsd.org/AppleMacbook
which has a nice section on setting up the zpool in a Fixit environment.
Update:
When rebuilding FreeBSD after this type of install, it’s also important that you build with LOADER_ZFS_SUPPORT=YES in order for the loader to be able to read zpools.
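One way to avoid forgetting the knob is to put it in a make configuration file. The sketch below assumes the build picks it up from /etc/make.conf or /etc/src.conf like other build variables, which I have not verified for every branch; ensure_loader_zfs is a made-up helper name.

```shell
#!/bin/sh
# Sketch: make sure LOADER_ZFS_SUPPORT=YES is present in a make config file.
# Which file the build reads is an assumption; ensure_loader_zfs is hypothetical.
# $1 = path to the config file
ensure_loader_zfs() {
    conf=$1
    touch "$conf" || return 1
    # Append the knob only if no LOADER_ZFS_SUPPORT line exists yet.
    grep -q '^LOADER_ZFS_SUPPORT=' "$conf" ||
        echo 'LOADER_ZFS_SUPPORT=YES' >> "$conf"
}

# Example:
#   ensure_loader_zfs /etc/make.conf
```

Calling it twice leaves a single line in the file, so it is safe to run before every rebuild.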
My understanding of ZFS support in FreeBSD is that it is still unstable and you might lose your data. Has this changed?
Well, I’ve not encountered any problems regarding corruption. The ZFS related problems these days seem to be a few issues with NFS, as well as kernel memory exhaustion. The latter doesn’t result in any data loss, and can be solved by testing and tuning your system appropriately.
I ran some tests on FreeBSD (my favorite OS) with ZFS, and everything seemed fine, including the performance. But then, just for a feature named “NFSLOG”, I switched to OpenSolaris on my storage servers. The result was awful NFS performance, and the reason made sense: the ZFS intent log (ZIL) needs to be faster than a hard disk can manage, so I decided to use NVRAM to overcome this problem. Even now I don’t know how FreeBSD avoided the problem, but I guess that something is skipped in NFS over ZFS (perhaps they just intended async NFS). However, I had no problems with ZFS on FreeBSD at the time of my tests, which I think was 9 months ago, but be careful.
@Joe: ZFS is still considered experimental in FBSD. ZFS is also still at version 6 in FreeBSD 7.1; I believe that’s going to change to ZFS v.13 in 8.0. As for stability, it’s been reported that heavy multithreaded loads can cause a panic. For example, the following will bring down the current version of ZFS on 7.0-RELEASE (and I’m assuming 7.1 too) in a few minutes:
/usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -r 4k -s 30g -i 0 -i 1 -i 2 -i 8 -+p 70 -C
Rumour has it that the bugs which cause this are fixed in Pawel’s latest ZFS code but that’s not in 7.1 RELEASE, unfortunately. My impression is it won’t appear until 8.0-RELEASE.
I’m just testing FBSD ZFS on a new 8-core, 8GB RAM Xeon 24TB (24×1TB disc) storage system and while I’ve not panicked the system I’m finding that it’s slow (although to be fair, in my experience BSD software RAID tends to be slow, period). I had hoped to go the FBSD/ZFS route on this system, and held out till 7.1-RELEASE hoping for some needed fixes, but now it’s looking like Solaris will be the OS of choice based on the employer’s demand for ZFS.
For example, I can’t write to a 6 disc (6×1TB) Raidz2 ZFS pool faster than about 50 MB/s average. It also has a tendency to write/pause/write/pause. You can see the array lights go on, then go off, pause, go on etc. I tried a different brand of controller, same thing. I also tried a Raidz i.e. ZFS Raid5 briefly on another fast multicore system (all of these are various Supermicro boards) and it still had the tendency to cycle write/pause/write at about 5 second intervals. ZFS is also very CPU intensive when scrubbing, which could well be the case on Solaris too. For example, all 8 cores are in use (2.33GHz) and it still takes well over an hour to scrub 1.1TB of data on a Raidz2. So scrubbing 24TB is going to take all day using all 8 cores. By comparison, the 24 port Areca 1680 controller in that box can do its RAID6 scrub in about the same time (even at its low priority setting) but the difference is the system CPU cores are idle/usable for something else.
I have no idea how people on Solaris are getting 100s of MB/sec with ZFS. Days of tweaking the knobs on 7.0-RELEASE ZFS didn’t fundamentally change my reported performance. My conclusion at this time is that if you want the most stable and fastest ZFS then use the Solaris reference implementation. If you don’t need the fastest speed and the ZFS array is not heavily trafficked, then you can try the BSD version and hope the stability wrinkles are ironed out in time, which is hopefully 8.0-RELEASE by the looks of things.
@Morbius: Your comments are similar to what I’ve read so far. For now, I’m running Open Solaris. I will revisit ZFS when 8.0 is released.
ZFS is stable, I’ve been using it since mid 2007 on a 10TB file-server. Had some problems with samba, but nothing else…
I too have been using ZFS with FreeBSD 7 since mid 2007 on a 4.5TB RAIDZ2 array. At that time FBSD7 was not yet released i.e. it was 7.0-CURRENT. No problems at all. Totally stable.
You will find hardware support problems with Opensolaris. It supports so little. You have to specifically buy hardware for *it*. Once you have it installed, good luck on ever updating it, without a reinstall.
FBSD is much easier to keep up to date, both OS and application wise.
Read the opensolaris forums and you will see reports of kernel panics with Opensolaris too.
I installed my ZFS-only system following these instructions.
But I need LOADER_ZFS_SUPPORT=YES when I build the loader,
or the loader can’t recognize ZFS.
Ah, yes. I’ll add it, thanks
@ Morbius: ZFSv13 was just MFC’d from -CURRENT to -STABLE, which means it will be available with the release of FreeBSD 7.3, and can be tested right now by upgrading to RELENG_7.
Running the following on our server (4 CPU cores, 8 GB RAM, 24 SATA drives configured into 3 raidz2 vdevs of 8-drives) gives 15 MB/s per drive writes, without any lockups. FreeBSD 7.2, without kmem tuning, arc limited to 1 GB.
/usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -r 4k -s 30g -i 0 -i 1 -i 2 -i 8 -+p 70 -C
“I’m just testing FBSD ZFS on a new 8-core, 8GB ram Xeon 24TB (24×1TB disc) storage system and while I’ve not paniced the system I’m finding that it’s slow (although to be fair, in my experience BSD software RAID tends to be slow, period). I had hoped to go the FBSD/ZFS route on this system, and held out till 7.1-RELEASE hoping for some needed fixes, but now it’s looking like Solaris will be the OS of choice based on the employer’s demand for ZFS.”
We sustain 80 MBps read (with peaks to 120 MBps) and 50 MBps write (with peaks just over 70 MBps) on our storage server, doing rsync backups for 105 systems every night, to a gzip-9 compressed ZFS filesystem. Other than a few hiccups before we got the tuning right for this setup (we started with 7-STABLE before 7.1 was released), it’s been rock-solid for us.
“For example, I can’t write to a 6 disc (6×1TB) Raidz2 ZFS pool faster than about 50 MB/s average.”
We’re using a 3Ware 9650SE-16ML (PCIe) and a 3Ware 9550SXU-16ML (PCI-X) connected to 24 WD SATA2 500 GB drives. Each drive is configured on the RAID controller as “Single Drive” ‘arrays’ (not JBOD) which allows us to monitor the drives using the RAID controller, and use the cache on the controller. They show up as separate SCSI disks in the OS.
“It also has a tendency to write/pause/write/pause.”
I’ve seen this on my home system using a simple 3-drive raidz vdev, and it’s kind of annoying. But we haven’t seen this on the work systems.
“ZFS is also very CPU intensive when scrubbing, which could well be the case on Solaris too. For example, all 8 cores are in use (2.33GHz) and it still takes well over an hour to scrub 1.1TB of data on a Raidz2.”
I’ll have to double-check this on our systems. We’ve only run a re-silver once, and that was extremely CPU/disk intensive, but we had a poorly configured pool at that time (single 24-drive raidz2 vdev).
Just started a scrub … each CPU core is listed as 75% idle in top, although “zpool status” does show an ETA of 550 hours (10 TB pool). gstat shows 15 MBps read per drive.
“So scrubbing 24TB is going to take all day using all 8 cores. By comparison, the 24 port Areca 1680 controller in that box can do its RAID6 scrub in about the same time (even at its low priority setting) but the difference is the system CPU cores are idle/usable for something else.”
Well, duh, the RAID controller has its own processor for doing that.
“I have no idea how people on Solaris are getting 100s of MB/sec with ZFS.”
It all depends on how the pool is created, how many vdevs it has (the more vdevs, the better the I/O as it stripes across all the vdevs), how each vdev is created (don’t use more than 8-9 drives per raidz vdev), and what the IO/sec rating is for the drives.
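To make the vdev advice concrete, here is a rough sketch that builds a zpool create command line with the drives split into several raidz2 vdevs rather than one big one. The pool name tank, the device names da0 to da23 and the helper plan_pool are all made up for illustration; the script only prints the command.

```shell
#!/bin/sh
# Sketch: build a "zpool create" command that groups drives into raidz2 vdevs.
# Pool name, device names and plan_pool are hypothetical; nothing is executed.
# $1 = pool name, $2 = drives per vdev, remaining args = devices
plan_pool() {
    pool=$1 per_vdev=$2
    shift 2
    cmd="zpool create $pool"
    i=0
    for dev in "$@"; do
        # Start a new raidz2 vdev every $per_vdev devices.
        if [ $(( i % per_vdev )) -eq 0 ]; then
            cmd="$cmd raidz2"
        fi
        cmd="$cmd $dev"
        i=$(( i + 1 ))
    done
    echo "$cmd"
}

# 24 drives in vdevs of 8 gives three raidz2 vdevs striped together.
devs=""
n=0
while [ "$n" -lt 24 ]; do
    devs="$devs da$n"
    n=$(( n + 1 ))
done
plan_pool tank 8 $devs
```

With 24 drives and 8 per vdev, the printed command contains three raidz2 groups, which matches the 3 x 8-drive layout described earlier in this thread.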
“Days of tweaking the knobs on 7.0-RELEASE ZFS didn’t fundamentally change my reported performance.”
ZFS in 7.0 was known to have issues, and the recommendation was to upgrade to 7-STABLE to get the fixes. ZFS in 7.1 worked much better. And things are even better in 7.2.
With the upgrade to ZFSv13 in 7-STABLE (which will be available in 7.3), things are getting even better.
IOW, now is the time to start looking at ZFS in FreeBSD.
@Morbius: running that iozone command on my storage server gives me just shy of 350 MBytes/sec write throughput, or just under 15 MBytes/sec per drive.
Removing the -S and -L, and changing the -r to 128 KB (max size for ZFS), I get 400 MBytes/sec write, or just under 20 MBytes/sec per drive.
In 8.0 RC1 the PMBR and gptzfsboot both work because I can get loader loaded, but loader can’t seem to see the GPT. It only sees an invalid slicemap, and the zfs devices list is empty. Thus it can’t find loader.conf or any of its 4th magic, nor kernel or modules.
This is sad because the stuff that works is written in assembler, and the loader, in C should be easier to fix. So there’s PMBR code, gpt*boot, and loader code duplicating the same function of reading the list of partitions in the GPT. The last one is b0rked.
Can someone help me identify a version that is known to work in this configuration so I can hack something together?
128 blocks (64k) for freebsd-boot is not enough for the latest gptzfsboot,
it’s 80k now.
And remember to update gptzfsboot if you `zpool upgrade`
or it won’t boot
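A quick way to check this before partitioning is to compare the size of the boot code with the partition length. The sketch below assumes 512-byte sectors and uses the rough 80k figure from the comment above; sectors_needed and fits are made-up helper names.

```shell
#!/bin/sh
# Sketch: does a boot loader of a given size fit in a freebsd-boot partition?
# Assumes 512-byte sectors; sectors_needed and fits are hypothetical helpers.

# $1 = file size in bytes; prints the sectors needed, rounded up.
sectors_needed() {
    echo $(( ($1 + 511) / 512 ))
}

# $1 = file size in bytes, $2 = partition size in sectors
fits() {
    [ "$(sectors_needed "$1")" -le "$2" ]
}

size=$(( 80 * 1024 ))   # roughly 80k, per the comment above
echo "need $(sectors_needed "$size") sectors"
if fits "$size" 128; then
    echo "fits in 128 sectors"
else
    echo "too big for 128 sectors"
fi
```

On a real system the size variable could be filled from the actual file, for example with stat -f %z /boot/gptzfsboot on FreeBSD.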