From sysinstall to ZFS-only configuration.

August 6, 2010

I’d like to share how we set up new servers at my company.

As you know, sysinstall currently doesn’t support ZFS or GEOM configuration, but I’ll show you how to convert a system installed with sysinstall into a ZFS-only server. Even if you don’t want to follow all the steps, please take a look at the ZFS dataset layout, which after several revisions I consider close to optimal for FreeBSD.

After completing all the steps below, your system will use GPT partitions, encrypted and mirrored swap, and a mirrored ZFS system pool.

I’m assuming your server contains two identical disks (ada0 and ada1).

Start by installing FreeBSD on the first disk using the regular installation CD/DVD. Choose exactly one slice and exactly one partition. Reboot.

Your system is now up and running, booted from a single UFS file system.
Add the following lines to /boot/loader.conf:

geom_eli_load="YES"
geom_label_load="YES"
geom_mirror_load="YES"
geom_part_gpt_load="YES"
zfs_load="YES"
vm.kmem_size="6G" # This should be 150% of your RAM.
vfs.zfs.arc_max="3G" # This should be a little less than the amount of your RAM.
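
If your machine has a different amount of RAM, the two tunables can be derived from it. This is only a rough sketch: the 150% factor comes from the comment above, while the 75% share for the ARC is my assumption for "a little less than your RAM" (it gives the 3G shown above on a 4 GB machine). The function name zfs_tunables is made up for this example.

```shell
# Print loader.conf tunables for a machine with the given RAM size in MB.
# kmem is 150% of RAM (from the comment in loader.conf above); the 75%
# ARC share is an assumption, adjust it to your own taste.
zfs_tunables() {
    ram_mb=$1
    kmem_mb=$((ram_mb * 150 / 100))
    arc_mb=$((ram_mb * 75 / 100))
    printf 'vm.kmem_size="%sM"\n' "$kmem_mb"
    printf 'vfs.zfs.arc_max="%sM"\n' "$arc_mb"
}
```

On FreeBSD, sysctl -n hw.physmem reports physical memory in bytes, so dividing that value by 1048576 gives a reasonable argument for the function.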

Partition the second disk:

# gpart create -s GPT ada1
# gpart add -b 34 -s 128 -t freebsd-boot ada1
# gpart add -s 2g -t freebsd-swap -l swap1 ada1
# gpart add -t freebsd-zfs -l system1 ada1
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
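
Both disks end up with the same layout, so it can help to generate the commands from the disk name instead of typing them twice. The sketch below only prints the commands (a dry run); gpart_plan is a hypothetical helper name, and the second argument is the suffix used in the GPT labels.

```shell
# Dry run: print the gpart commands for one disk instead of running them.
# $1 = disk device name (for example ada1)
# $2 = suffix for the GPT labels (swap<suffix>, system<suffix>)
gpart_plan() {
    disk=$1
    n=$2
    cat <<EOF
gpart create -s GPT $disk
gpart add -b 34 -s 128 -t freebsd-boot $disk
gpart add -s 2g -t freebsd-swap -l swap$n $disk
gpart add -t freebsd-zfs -l system$n $disk
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $disk
EOF
}
```

Running gpart_plan ada1 1 prints the five commands shown above; once the output looks right, it can be piped into sh.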

Create a swap mirror on the second partition and the ZFS system pool on the third partition:

# gmirror label -F -h -b round-robin swap /dev/gpt/swap1
# zpool create -O mountpoint=/mnt -O atime=off -O setuid=off -O canmount=off system /dev/gpt/system1

Create the root file system and update /etc/fstab:

# cat > /etc/fstab
system/rootfs / zfs rw,noatime 0 0
/dev/mirror/swap.eli none swap sw 0 0
^D
# zfs create -o mountpoint=legacy -o setuid=on system/rootfs
# zpool set bootfs=system/rootfs system
# mount -t zfs system/rootfs /mnt

Create the rest of the file systems (everything should be mounted below /mnt/):

# zfs create system/root
# zfs create -o compress=lzjb system/tmp
# chmod 1777 /mnt/tmp
# zfs create -o canmount=off system/usr
# zfs create -o setuid=on system/usr/local
# zfs create -o compress=gzip system/usr/src
# zfs create -o compress=lzjb system/usr/obj
# zfs create -o compress=gzip system/usr/ports
# zfs create -o compress=off system/usr/ports/distfiles
# zfs create -o canmount=off system/var
# zfs create -o compress=gzip system/var/log
# zfs create -o compress=lzjb system/var/audit
# zfs create -o compress=lzjb system/var/tmp
# chmod 1777 /mnt/var/tmp
# zfs create -o canmount=off system/usr/home
# zfs create system/usr/home/pjd
(create file systems for all your users)
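
For more than a handful of users, the per-user datasets can be generated in a loop. A minimal sketch, again as a dry run that only prints commands; home_datasets is a hypothetical helper, and the user names other than pjd in the usage note are placeholders.

```shell
# Dry run: print one "zfs create" command per user name given as argument.
home_datasets() {
    for user in "$@"; do
        printf 'zfs create system/usr/home/%s\n' "$user"
    done
}
```

For example, home_datasets pjd alice bob prints three commands, one per user, which can then be reviewed and piped into sh.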

Enable ZFS in /etc/rc.conf:

# echo 'zfs_enable="YES"' >> /etc/rc.conf

I recommend setting the ports work directory to /usr/obj:

# echo WRKDIRPREFIX=/usr/obj >> /etc/make.conf

Copy the entire system to the second disk (note the two dashes before one-file-system):

# cd /
# tar -c --one-file-system -f - . | tar xpf - -C /mnt/
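
If you want to try the pipeline on scratch directories before pointing it at /, it can be wrapped in a small function. This is a sketch only; copy_tree is a name I made up. The --one-file-system option is what keeps tar from descending into other mounted file systems such as /dev or /mnt itself.

```shell
# Copy the tree rooted at $1 into the existing directory $2, preserving
# permissions (-p) and staying on the source file system.
copy_tree() {
    src=$1
    dst=$2
    (cd "$src" && tar -c --one-file-system -f - .) | tar -x -p -f - -C "$dst"
}
```

Calling copy_tree / /mnt is equivalent to the two commands above.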

Unmount the ZFS file systems and change the pool mountpoint to /. Every dataset inherits it, except system/rootfs, which keeps its legacy mountpoint:

# zfs umount -a
# umount /mnt
# zfs set mountpoint=/ system
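
To see why a single zfs set is enough, here is a small sketch of how an inherited mountpoint is derived from the dataset name: the pool name is replaced by the mountpoint of the top-level dataset. The function inherited_mountpoint is purely illustrative and does not call ZFS; it also ignores system/rootfs, which uses a legacy mountpoint.

```shell
# Compute where a dataset is mounted when only the pool's top-level dataset
# has a mountpoint set and every child inherits it.
# $1 = mountpoint of the top-level dataset, $2 = dataset name (pool/a/b)
inherited_mountpoint() {
    base=${1%/}                  # strip a trailing slash, so "/" becomes ""
    case $2 in
        */*) printf '%s/%s\n' "$base" "${2#*/}" ;;   # drop the pool name
        *)   printf '%s\n' "${base:-/}" ;;           # the pool itself
    esac
}
```

With /mnt as the first argument, system/usr/src maps to /mnt/usr/src; with /, the same dataset maps to /usr/src.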

Reboot. If your machine booted fine (it should, but…) you will see the following:

# mount
system/rootfs on / (zfs, local, noatime)
devfs on /dev (devfs, local)
system/root on /root (zfs, local, noatime, nosuid)
system/tmp on /tmp (zfs, local, noatime, nosuid)
system/usr/home/pjd on /usr/home/pjd (zfs, local, noatime, nosuid)
system/usr/obj on /usr/obj (zfs, local, noatime, nosuid)
system/usr/ports on /usr/ports (zfs, local, noatime, nosuid)
system/usr/ports/distfiles on /usr/ports/distfiles (zfs, local, noatime, nosuid)
system/usr/src on /usr/src (zfs, local, noatime, nosuid)
system/var/audit on /var/audit (zfs, local, noatime, nosuid)
system/var/log on /var/log (zfs, local, noatime, nosuid)
system/var/tmp on /var/tmp (zfs, local, noatime, nosuid)

# swapctl -l
Device: 1024-blocks Used:
/dev/mirror/swap.eli 4194300 4888
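
Instead of comparing the mount output by eye, you can check that every ZFS dataset is mounted where its name says it should be. A rough sketch, assuming the mount output format shown above and the pool layout from this article; check_layout is a hypothetical helper that reads the output on standard input and prints only the mismatches.

```shell
# Read "mount" output on stdin and report ZFS datasets whose mount point
# does not match their position in the pool hierarchy.
check_layout() {
    awk '$2 == "on" && index($4, "(zfs") == 1 {
        expected = $1
        sub("^[^/]*", "", expected)                 # drop the pool name
        if ($1 == "system/rootfs") expected = "/"   # legacy-mounted root
        if ($3 != expected)
            print $1 " is mounted on " $3 ", expected " expected
    }'
}
```

Run it as mount | check_layout; no output means the layout matches.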

Now we need to attach the first disk (ada0):

# dd if=/dev/zero of=/dev/ada0 count=79
# gpart create -s GPT ada0
# gpart add -b 34 -s 128 -t freebsd-boot ada0
# gpart add -s 2g -t freebsd-swap -l swap0 ada0
# gpart add -t freebsd-zfs -l system0 ada0
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0

# gmirror insert -h -p 1 swap /dev/gpt/swap0
(wait for gmirror to synchronize swap)
# gmirror status
       Name    Status  Components
mirror/swap  COMPLETE  gpt/swap1
                       gpt/swap0

# zpool attach system /dev/gpt/system1 /dev/gpt/system0
(wait for pool to resilver)
# zpool status
  pool: system
 state: ONLINE
 scrub: resilver completed after 0h2m with 0 errors on Mon Aug 2 11:28:45 2010
config:

        NAME             STATE     READ WRITE CKSUM
        system           ONLINE       0     0     0
          mirror         ONLINE       0     0     0
            gpt/system1  ONLINE       0     0     0  55,5M resilvered
            gpt/system0  ONLINE       0     0     0  1,67G resilvered

errors: No known data errors
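
Both waiting steps above can be scripted by polling a status command until a given text shows up. A minimal sketch: wait_for is a hypothetical helper, and the texts to look for (COMPLETE, resilver completed) are simply taken from the output shown above, so other versions may word them differently.

```shell
# Re-run a command until its output contains the given text.
# $1 = text to look for, $2 = seconds to sleep between attempts,
# remaining arguments = the command to run.
wait_for() {
    text=$1
    delay=$2
    shift 2
    until "$@" 2>/dev/null | grep -q -F -- "$text"; do
        sleep "$delay"
    done
}
```

For example, wait_for COMPLETE 5 gmirror status swap waits for the swap mirror, and wait_for 'resilver completed' 30 zpool status system waits for the pool. Note that the loop never gives up on its own.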

That’s all folks.

BTW, because /var/log is compressed by ZFS, you can turn off compression of rotated logs in /etc/newsyslog.conf.

Below are a few reasons why I use the proposed dataset layout:

  • There is no manipulation of the mountpoint property of any dataset except for the system dataset, which starts the tree (apart from the legacy mountpoint on system/rootfs). When you manage ZFS, it can be confusing when a dataset’s place in the ZFS hierarchy doesn’t match its mount point. The mountpoint property is always inherited, so if you change it for some reason, you know that all datasets below will inherit it properly and no other changes are needed.
  • Note that system/usr and system/var are not mounted. The /usr/bin, /usr/sbin, /var/named, etc. directories all belong to the system/rootfs dataset. When you upgrade your system, it should be enough to snapshot the system/rootfs dataset alone and roll back only that dataset if something goes wrong.
  • In the proposed layout you can set setuid to off for all datasets except system/rootfs and system/usr/local.
  • You have /usr/src, /usr/obj, /tmp, /var/tmp, /var/log and /var/audit compressed.
  • Having the entire OS in one dataset makes it easy to create jails from it. When I create a new jail, I clone system/rootfs to system/jails/<name> and create the following additional datasets for the jail:
    • system/jails/<name>/etc
    • system/jails/<name>/tmp
    • system/jails/<name>/var
    • system/jails/<name>/usr (set canmount to off)
    • system/jails/<name>/usr/local
    • system/jails/<name>/usr/work (I build ports here)
    • system/jails/<name>/root (jailed root home directory)

    This way you can make the system/jails/<name> dataset read-only, and when
    you update your main system you perform the following steps to update a jail:

    • stop the jail
    • rename system/jails/<name> to system/jails/<name>_old
    • clone system/rootfs to system/jails/<name> again
    • rename all datasets that are now below system/jails/<name>_old so that they sit below the newly cloned system/jails/<name>
    • update system/jails/<name>/etc
    • update system/jails/<name>/var
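
The jail update steps above can be sketched as a dry run that prints the ZFS commands for one jail. Treat this as an illustration under assumptions, not as my exact procedure: jail_update_plan is a made-up helper, the snapshot name is a parameter you choose, and stopping the jail and updating etc and var are left as comments because they depend on your setup.

```shell
# Dry run: print the ZFS commands for refreshing one jail from system/rootfs.
# $1 = jail name, $2 = snapshot name to clone from (your choice, e.g. a date)
jail_update_plan() {
    name=$1
    snap=$2
    jail=system/jails/$name
    echo "# stop jail $name before running these"
    echo "zfs rename $jail ${jail}_old"
    echo "zfs snapshot system/rootfs@$snap"
    echo "zfs clone system/rootfs@$snap $jail"
    # usr/local and usr/work move together with usr, so usr is renamed once.
    for sub in etc tmp var usr root; do
        echo "zfs rename ${jail}_old/$sub $jail/$sub"
    done
    echo "# then update $jail/etc and $jail/var"
}
```

For example, jail_update_plan www 20100806 prints the commands for a jail named www; the old system/jails/www_old dataset is left in place so you can still go back to it.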