From sysinstall to ZFS-only configuration.

August 6, 2010 by pjd · 49 Comments 

I’d like to share how we set up new servers at my company.

As you know, sysinstall currently doesn’t support ZFS or GEOM configuration, but I’ll show you how to convert a system installed with sysinstall into a ZFS-only server. Even if you don’t want to follow all the steps, please take a look at the ZFS dataset layout, which, after several modifications, I consider quite optimal for FreeBSD.

After completing the steps below, your system will use GPT partitions, encrypted and mirrored swap, and a mirrored ZFS system pool.

I’m assuming your server contains two identical disks (ada0 and ada1).

Start by installing FreeBSD on the first disk using a regular installation CD/DVD. Choose exactly one slice and exactly one partition. Reboot.

Your system is now up and running, booted from a single UFS file system.
Add the following lines to /boot/loader.conf:

geom_eli_load="YES"
geom_label_load="YES"
geom_mirror_load="YES"
geom_part_gpt_load="YES"
zfs_load="YES"
vm.kmem_size="6G" # This should be 150% of your RAM.
vfs.zfs.arc_max="3G" # This should be a little less than the amount of your RAM.
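The two memory tunables follow simple rules of thumb, so they can be derived from the installed RAM. The helper below is a hypothetical sketch (`suggest_zfs_tunables` is not a FreeBSD tool): it takes the RAM size in bytes, uses 150% of RAM for vm.kmem_size, and assumes that "a little less than your RAM" means RAM minus 1 GiB, which reproduces the 6G/3G example above for a 4 GB machine. Treat its output as a starting point only; as one of the comments below notes, the FreeBSD wiki suggests recent 64-bit releases may not need this tuning at all.

```shell
# Hypothetical helper: print loader.conf tunables for a RAM size in bytes.
# Assumptions: kmem_size = 150% of RAM; arc_max = RAM minus 1 GiB,
# but never less than half of RAM on small machines.
suggest_zfs_tunables() {
    ram_mb=$(( $1 / 1048576 ))          # bytes -> MiB
    kmem_mb=$(( ram_mb * 3 / 2 ))       # 150% of RAM
    arc_mb=$(( ram_mb - 1024 ))         # "a little less" than RAM
    if [ "$arc_mb" -lt $(( ram_mb / 2 )) ]; then
        arc_mb=$(( ram_mb / 2 ))        # keep the ARC at half of RAM or more
    fi
    printf 'vm.kmem_size="%dM"\n' "$kmem_mb"
    printf 'vfs.zfs.arc_max="%dM"\n' "$arc_mb"
}
```

For example, `suggest_zfs_tunables "$(sysctl -n hw.physmem)"` prints two lines you can review before adding them to /boot/loader.conf.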

Partition the second disk:

# gpart create -s GPT ada1
# gpart add -b 34 -s 128 -t freebsd-boot ada1
# gpart add -s 2g -t freebsd-swap -l swap1 ada1
# gpart add -t freebsd-zfs -l system1 ada1
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
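The same five gpart(8) commands are repeated for ada0 when the first disk is attached, with only the disk name and the label suffix changed. A hypothetical helper (the name `gpt_layout` is made up for this sketch) can print them for either disk so both layouts stay identical; it only echoes the commands, which lets you check them before running anything.

```shell
# Hypothetical helper: print the gpart(8) commands that give DISK the
# layout used in this post. N is the label suffix (swapN, systemN).
gpt_layout() {
    disk=$1 n=$2
    echo "gpart create -s GPT $disk"
    echo "gpart add -b 34 -s 128 -t freebsd-boot $disk"
    echo "gpart add -s 2g -t freebsd-swap -l swap$n $disk"
    echo "gpart add -t freebsd-zfs -l system$n $disk"
    echo "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $disk"
}
```

`gpt_layout ada1 1` prints the commands above; once they look right, `gpt_layout ada1 1 | sh` runs them.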

Create a swap mirror (with a single component for now) on the second partition and the ZFS system pool on the third partition:

# gmirror label -F -h -b round-robin swap /dev/gpt/swap1
# zpool create -O mountpoint=/mnt -O atime=off -O setuid=off -O canmount=off system /dev/gpt/system1

Create the root file system and update /etc/fstab:

# cat > /etc/fstab
system/rootfs / zfs rw,noatime 0 0
/dev/mirror/swap.eli none swap sw 0 0
^D
# zfs create -o mountpoint=legacy -o setuid=on system/rootfs
# zpool set bootfs=system/rootfs system
# mount -t zfs system/rootfs /mnt
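The fstab entry points at /dev/mirror/swap.eli although geli is never run by hand. As far as I know, FreeBSD's rc scripts attach a swap device whose fstab entry ends in .eli with a one-time random key at boot (the FreeBSD Handbook's section on encrypting swap describes this), which is also why geom_eli is loaded from loader.conf. A hypothetical check (`check_swap_eli` is an illustrative name) that every swap entry carries the suffix:

```shell
# Hypothetical check: report swap entries in an fstab file that lack the
# .eli suffix and would therefore be used unencrypted.
# Exits non-zero if any such entry is found.
check_swap_eli() {
    awk '
        $1 !~ /^#/ && $3 == "swap" && $1 !~ /\.eli$/ {
            print "unencrypted swap: " $1
            bad = 1
        }
        END { exit bad }' "${1:-/etc/fstab}"
}
```

Run it without arguments to check /etc/fstab, or pass another file, e.g. the copy under /mnt/etc/fstab.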

Create the rest of the file systems (everything should be mounted below /mnt/):

# zfs create system/root
# zfs create -o compress=lzjb system/tmp
# chmod 1777 /mnt/tmp
# zfs create -o canmount=off system/usr
# zfs create -o setuid=on system/usr/local
# zfs create -o compress=gzip system/usr/src
# zfs create -o compress=lzjb system/usr/obj
# zfs create -o compress=gzip system/usr/ports
# zfs create -o compress=off system/usr/ports/distfiles
# zfs create -o canmount=off system/var
# zfs create -o compress=gzip system/var/log
# zfs create -o compress=lzjb system/var/audit
# zfs create -o compress=lzjb system/var/tmp
# chmod 1777 /mnt/var/tmp
# zfs create -o canmount=off system/usr/home
# zfs create system/usr/home/pjd
(create file systems for all your users)
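Only system/rootfs and system/usr/local set setuid=on; every other dataset inherits setuid=off from the pool's top-level dataset. A hypothetical check (`check_setuid` is an illustrative name) that reads the tab-separated output of `zfs get -H -o name,value -r setuid system` and lists any other dataset where setuid ended up on; snapshots, whose names contain "@", are skipped.

```shell
# Hypothetical check: read "name<TAB>value" lines from zfs get -H and
# print datasets with setuid=on other than the two expected ones.
# Exits non-zero if anything unexpected is found.
check_setuid() {
    awk -F '\t' '
        $1 ~ /@/ { next }                    # skip snapshots
        $2 == "on" && $1 != "system/rootfs" && $1 != "system/usr/local" {
            print $1
            bad = 1
        }
        END { exit bad }'
}
```

Usage: `zfs get -H -o name,value -r setuid system | check_setuid`.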

Enable ZFS in /etc/rc.conf:

# echo 'zfs_enable="YES"' >> /etc/rc.conf

I recommend setting the ports work directory to /usr/obj:

# echo WRKDIRPREFIX=/usr/obj >> /etc/make.conf

Copy the entire system to the second disk (note that there are two dashes before one-file-system):

# cd /
# tar -c --one-file-system -f - . | tar xpf - -C /mnt/

Unmount the ZFS file systems and change the pool’s mountpoint to /. It will be inherited by all file systems:

# zfs umount -a
# umount /mnt
# zfs set mountpoint=/ system
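Why a single `zfs set` on the pool is enough: a dataset without a local mountpoint takes its parent's mountpoint and appends its own name, so the pool's mountpoint decides where the whole tree lands (/mnt/usr/src before this step, /usr/src after it). The sketch below mimics that rule for illustration only; it assumes no dataset between the pool and the target overrides mountpoint, which holds here for everything except system/rootfs and its legacy mount. `effective_mountpoint` is a hypothetical name.

```shell
# Illustration of mountpoint inheritance: given a dataset name and the
# pool's mountpoint, print where the dataset ends up, assuming no
# intermediate dataset sets mountpoint locally.
effective_mountpoint() {
    ds=$1 mp=$2
    case $ds in
    */*) rel=${ds#*/} ;;    # path below the pool, e.g. usr/src
    *)   rel= ;;            # the pool's own dataset
    esac
    if [ -z "$rel" ]; then
        echo "$mp"
    else
        echo "${mp%/}/$rel" # avoid a double slash when mp is "/"
    fi
}
```

On the live system you can compare the result with `zfs get -H -o value mountpoint system/usr/src`.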

Reboot. If your machine boots fine (it should, but…), you will see the following:

# mount
system/rootfs on / (zfs, local, noatime)
devfs on /dev (devfs, local)
system/root on /root (zfs, local, noatime, nosuid)
system/tmp on /tmp (zfs, local, noatime, nosuid)
system/usr/home/pjd on /usr/home/pjd (zfs, local, noatime, nosuid)
system/usr/obj on /usr/obj (zfs, local, noatime, nosuid)
system/usr/ports on /usr/ports (zfs, local, noatime, nosuid)
system/usr/ports/distfiles on /usr/ports/distfiles (zfs, local, noatime, nosuid)
system/usr/src on /usr/src (zfs, local, noatime, nosuid)
system/var/audit on /var/audit (zfs, local, noatime, nosuid)
system/var/log on /var/log (zfs, local, noatime, nosuid)
system/var/tmp on /var/tmp (zfs, local, noatime, nosuid)

# swapctl -l
Device:                1024-blocks     Used:
/dev/mirror/swap.eli       4194300      4888
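After the reboot it is worth confirming that every dataset really got mounted; one commenter below saw only the root file system on 8.2 until running `zfs mount -a`. A hypothetical check (`check_mounts` is an illustrative name) that reads mount(8) output on stdin and reports expected mount points that are not backed by ZFS:

```shell
# Hypothetical check: stdin is mount(8) output, arguments are the mount
# points you expect. Prints each missing one and exits non-zero if any.
check_mounts() {
    out=$(cat)
    missing=0
    for mp in "$@"; do
        # mount(8) prints lines like "system/tmp on /tmp (zfs, local, ...)"
        if ! printf '%s\n' "$out" | grep -qF " on $mp (zfs,"; then
            echo "not mounted: $mp"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `mount | check_mounts / /root /tmp /usr/src /usr/obj /var/log`.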

Now we need to attach the first disk (ada0):

# dd if=/dev/zero of=/dev/ada0 count=79
# gpart create -s GPT ada0
# gpart add -b 34 -s 128 -t freebsd-boot ada0
# gpart add -s 2g -t freebsd-swap -l swap0 ada0
# gpart add -t freebsd-zfs -l system0 ada0
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0

# gmirror insert -h -p 1 swap /dev/gpt/swap0
(wait for gmirror to synchronize swap)
# gmirror status
       Name    Status  Components
mirror/swap  COMPLETE  gpt/swap1
                       gpt/swap0
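Instead of rerunning gmirror status by hand, the wait can be scripted. A minimal sketch, assuming the status output contains the word COMPLETE only once synchronization has finished (as in the output above); `wait_for_gmirror` is a hypothetical name.

```shell
# Hypothetical helper: poll gmirror(8) until the named mirror reports
# COMPLETE. The polling interval (seconds) defaults to 10.
wait_for_gmirror() {
    name=$1 interval=${2:-10}
    until gmirror status "$name" | grep -q COMPLETE; do
        sleep "$interval"
    done
}
```

`wait_for_gmirror swap` returns once the swap mirror is complete.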

# zpool attach system /dev/gpt/system1 /dev/gpt/system0
(wait for pool to resilver)
# zpool status
  pool: system
 state: ONLINE
 scrub: resilver completed after 0h2m with 0 errors on Mon Aug 2 11:28:45 2010
config:

        NAME             STATE     READ WRITE CKSUM
        system           ONLINE       0     0     0
          mirror         ONLINE       0     0     0
            gpt/system1  ONLINE       0     0     0  55,5M resilvered
            gpt/system0  ONLINE       0     0     0  1,67G resilvered

errors: No known data errors
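The resilver wait can be scripted the same way. This sketch assumes that `zpool status` contains the phrase "resilver in progress" while resilvering is running (that line is labeled scrub: in the output above; newer releases label it scan:); `wait_for_resilver` is a hypothetical name.

```shell
# Hypothetical helper: poll zpool(8) until the pool is no longer
# resilvering. The polling interval (seconds) defaults to 30.
wait_for_resilver() {
    pool=$1 interval=${2:-30}
    while zpool status "$pool" | grep -q 'resilver in progress'; do
        sleep "$interval"
    done
}
```

`wait_for_resilver system` returns once the new mirror half has caught up.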

That’s all folks.

BTW, because /var/log is already compressed by ZFS, you can turn off log compression on rotation in /etc/newsyslog.conf.
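A hypothetical awk-based helper (`strip_log_compression` is an illustrative name) that drops the bzip2 (J) and gzip (Z) flags from newsyslog.conf entries. It assumes the flags column comes right after the "when" column, shifted by one when the optional owner:group column is present, and that "-" is accepted as an empty flags placeholder. awk collapses the whitespace of any line it rewrites to single spaces, so review the result before installing it.

```shell
# Hypothetical helper: copy newsyslog.conf from stdin to stdout with the
# J (bzip2) and Z (gzip) compression flags removed.
strip_log_compression() {
    awk '
        /^[ \t]*#/ || NF < 5 { print; next }    # comments, blank lines
        {
            f = ($2 ~ /:/) ? 7 : 6              # owner:group shifts columns
            if (f <= NF && $f ~ /^[A-Za-z]+$/) {
                gsub(/[JZ]/, "", $f)            # drop compression flags
                if ($f == "") $f = "-"          # keep a placeholder
            }
            print
        }'
}
```

For example, `strip_log_compression < /etc/newsyslog.conf > /tmp/newsyslog.conf`, then diff the two files before copying the new one into place.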

Below are a few reasons why I use the proposed dataset layout:

  • The mountpoint property is not set on any dataset except system, which starts the tree (system/rootfs is the one exception, using a legacy mount from /etc/fstab). Managing ZFS can be confusing when a dataset’s place in the ZFS hierarchy doesn’t match its mount point. The mountpoint property is always inherited, so if you change it for some reason, you know that all datasets below will inherit it properly and no other changes are needed.
  • Note that system/usr and system/var are not mounted. The /usr/bin, /usr/sbin, /var/named, etc. directories all belong to the system/rootfs dataset. When you upgrade your system, it should be enough to snapshot the system/rootfs dataset alone and roll back only that dataset when something goes wrong.
  • In the proposed layout you can set setuid to off for all datasets except system/rootfs and system/usr/local.
  • You have /usr/src, /usr/obj, /usr/ports (except distfiles), /tmp, /var/tmp, /var/log and /var/audit compressed.
  • Having the entire OS in one dataset makes it easy to create jails from it. When I create a new jail, I clone system/rootfs to system/jails/<name> and create the following additional datasets for the jail:
    • system/jails/<name>/etc
    • system/jails/<name>/tmp
    • system/jails/<name>/var
    • system/jails/<name>/usr (set canmount to off)
    • system/jails/<name>/usr/local
    • system/jails/<name>/usr/work (I build ports here)
    • system/jails/<name>/root (jailed root home directory)

    This way you can make the system/jails/<name> dataset read-only, and
    when you update your main system, you perform the following steps to update a jail:

    • stop the jail
    • rename system/jails/<name> to system/jails/<name>_old
    • clone system/jails/<name> again from system/rootfs
    • rename all datasets that are now below system/jails/<name>_old so they sit below the newly cloned system/jails/<name>
    • update system/jails/<name>/etc
    • update system/jails/<name>/var
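The jail procedure above can be written down as a script. Below is a minimal sketch under stated assumptions, not a tested tool: the function names, the snapshot names you pass in and the final `zfs destroy` of the `_old` dataset are illustrative additions, system/jails is assumed to exist already, and populating etc and var is left out, as in the description above. Because readonly is an inherited property, the writable datasets get readonly=off locally so that making the jail's root dataset read-only doesn't lock them too. The helpers only print commands, so you can review them before piping them to sh.

```shell
# Hypothetical helpers that print zfs(8) commands for the jail layout
# described above. Review the output, then pipe it to sh(1).
# Note: a clone inherits properties from system/jails, not the local
# setuid=on of system/rootfs; adjust setuid if jailed binaries need it.
ROOTFS=system/rootfs
JAILS=system/jails

jail_create() {
    name=$1 snap=$2        # snap: snapshot name to clone from
    j=$JAILS/$name
    echo "zfs snapshot $ROOTFS@$snap"
    echo "zfs clone $ROOTFS@$snap $j"
    # Writable parts get readonly=off locally so they do not inherit
    # the readonly=on set on the jail's root dataset below.
    for ds in etc tmp var root; do
        echo "zfs create -o readonly=off $j/$ds"
    done
    echo "zfs create -o canmount=off -o readonly=off $j/usr"
    echo "zfs create $j/usr/local"
    echo "zfs create $j/usr/work"
    echo "zfs set readonly=on $j"
}

jail_update() {
    name=$1 snap=$2        # snap: a new snapshot taken after the upgrade
    j=$JAILS/$name
    echo "# stop the jail before running the commands below"
    echo "zfs snapshot $ROOTFS@$snap"
    echo "zfs rename $j ${j}_old"
    echo "zfs clone $ROOTFS@$snap $j"
    # Renaming usr also moves its children (usr/local, usr/work).
    for ds in etc tmp var root usr; do
        echo "zfs rename ${j}_old/$ds $j/$ds"
    done
    echo "zfs set readonly=on $j"
    echo "# after updating $j/etc and $j/var: zfs destroy ${j}_old"
}
```

For example, `jail_update www upgrade2` prints the update steps for a jail named www; once they look right, pipe the same command to sh.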

About pjd

Comments

49 Responses to “From sysinstall to ZFS-only configuration.”
  1. Giacomo Olgeni says:

    I was wondering… is “-b 34” required for some reason, or is the default OK too?

  2. Daniel says:

    You don’t need “-b 34”, although IMO it makes sense to specify “-b 1m”; that way your partitions are well aligned (eg for 4k sector drives).

  3. Giacomo Olgeni says:

    Got it… thanks!

  4. Justin says:

    Neat. I am certainly going to try this. If I wanted a raidz instead, I could do the same thing assuming I had three drives, right?

  5. Fluff says:

    Well, I dont think you can boot from a raidz pool, so you would have to create two pools, a zfs mirror to boot from and a raidz to store your data.

  6. pjd says:

    Fluff: Yes, you can boot from RAIDZ on FreeBSD just fine.

  7. Daniel says:

    Some good ideas. Nice.

    The tar argument should be --one-file-system; you sort of skip the geli config, yet assume it exists for the swap partition.

    You also need to create the system/jails/name/usr filesystem with the canmount=off property before creating its children.

    It might also be a good idea to create a system/usr/local filesystem to not keep the installed packages together with the base system. Then, you might clone it to the jail’s /usr/local if desired.

    I am curious about the jails setup though — how do you populate the per-jail /etc and /var dirs?

  8. pjd says:

    Daniel:
    Thanks for your comments, answers below:

    > The tar argument should be

  9. Daniel says:

    About tar, it’s the blog software: I only see one dash :)

    Coincidentally, I have about the same setup (although I rely more on setting the mountpoint property) and just wanted to verify how your instructions work; I have a spare system to play with right now.

  10. Dan says:

    Shouldn’t vm_kmem_size be vm.kmem_size??

  11. pjd says:

    Indeed. Fixed. Thanks.

  12. BSDr says:

    Thanks for this excellent article. This is a great “best practice” summary for ZFS installations. A future handbook chapter no doubt ;-)

    It would be even easier if swap could reliably be done on ZFS volumes (it seems like something people avoid … at least on the 7.* releases of ZFS it always would lead to a panic/freeze/crash after a while and sometimes right after boot) and if Sun/Oracle would finish and release ZFS encryption :)

    I notice you are not using ZVols anywhere, and I have a few questions about them. ZVol swap seems convenient but a bad idea for a few reasons, though. Do you use it or recommend it at all? I ask because even if ZFS swap volumes were more stable, swap on its own dedicated partition/disk just seems like a better idea. Are the zvols independent of other ZFS operations, or are they somehow constrained by the ZFS “layer”? ZFS seems incredibly slow compared to ufs/ext etc. (not a criticism; it does more and will likely improve with future releases etc.), so I wouldn’t want ZFS “mixed up” in swap and slowing it down. I like the convenience and data integrity of ZFS, but I’m not sure how to use ZVols (ufs, swap, iscsi exported, whatever). They seem to make ZFS behave strangely and work better when set up on the hardware in the “traditional” GEOM manner.

    Any thoughts?

  13. Dan says:

    I have a question about the system/root (legacy) mountpoint.

    Could you please explain what it’s for? If I look at it, I see that it references a fair bit of data (908M on my 8.1 system). What’s in that data and how is it accessed?

    Thanks!

  14. BSDr says:

    @Dan that should be data that is “on the platter” inside the “/” filesystem only … in my case that’s /boot/ /etc/ /root/ … The reason to have it is to nest the inherited zfs mount points under a single root as a typical “tree”. It’s less confusing that way. You could have a zfs called zpool/zfilesystem/weirdlocation set to mount on “/” if you wanted, along with a 2nd zfs with the mount point set to “/usr” that is at otherzpool/zfilesystem/weirderlocation … Pavel’s method keeps things clean and organized.

    On my system If I use “du -chx / ” I see about 350 MB of data referenced on / (which is mounted from zfs system/root). It’s mostly just junk in root’s home /root and I also have a messy boot from using source upgrades and freebsd-update etc so I have /boot/GENERIC and /boot/kernel etc. etc.

    If you have a few kernels, maybe a core file or some stuff in /root or accidentally left over junk under /mnt (eg when nothing is mounted there) you could get to nearly a Gb pretty quickly.

    Try du -chx / to see what is on the legacy mounted “/”

  15. Dan says:

    @BSDr: I see some things with du -chx that I would not expect. Specifically I see some files from /var, /usr and others.

    Could you please email me on fbsd at dannyspace dot net to discuss?

  16. cowbert says:

    In 8.2, the post-reboot “mount” does not show any zfs filesystems mounted except for system/root on / and devfs. I have to manually invoke “zfs mount -a” to get the rest of the zfs to legacy mount. Any idea why this might be happening? I don’t know if it affects functionality either, as without the rest of the zfs datasets mounted in legacy, I can still access their data through the root mount…

  17. Igor says:

    “--one-file-system”, not “-one-file-system” (additional dash).
    I feel kind of stupid, but I wasted 10 minutes on that.
    Thank you for the nice guide!

  18. Henri Hennebert says:

    To be allowed to rollback system/root without disturbing system/usr/local I think it would be
    useful to create 2 directories:

    /usr/local/var/db/pkg
    /usr/local/var/db/ports

    and have

    lrwxr-xr-x 1 root wheel 21 May 30 14:47 /var/db/pkg -> /usr/local/var/db/pkg
    lrwxr-xr-x 1 root wheel 23 May 30 14:47 /var/db/ports -> /usr/local/var/db/ports

    PS – ZFS rocks

  22. Ari Maniatis says:

    Your recommendation to set vm.kmem_size to 150% of memory is at odds with the FreeBSD default of vm.kmem_size_scale=1. But having just suffered from a FreeBSD 8.2 machine hit that kmem_map too small error ( http://freebsd.1045724.n5.nabble.com/kmem-map-too-small-with-ZFS-and-8-2-RELEASE-td4029979.html ) I wonder if your recommendation is the way to go.

    So, what are your thoughts about this:

    1. Should we be setting vm.kmem_size_scale=1.5 or setting vm.kmem_size directly to 150% of actual RAM?
    2. Do the instructions in this blog supersede the recommendation of the official FreeBSD wiki, which suggests that no tuning at all is best for ZFS on recent 64-bit FreeBSD versions?

    Thanks for all your great work.

    Ari

Trackbacks

Check out what others are saying about this post...
  1. [...] From sysinstall to ZFS-only configuration.

  3. [...] then followed, more or less, Pawel’s excellent instructions for creating a fully mirrored setup. However, I had to deviate from them somewhat, so here’s my [...]
