From sysinstall to ZFS-only configuration.
August 6, 2010 by pjd · 46 Comments
I’d like to share how we set up new servers at my company.
As you know, sysinstall currently doesn’t support ZFS or GEOM configuration, but I’ll show you how to convert a system installed with sysinstall into a ZFS-only server. Even if you don’t want to follow all the steps, please take a look at the ZFS dataset layout, which after several modifications I consider quite optimal for FreeBSD.
After completing all the steps below, your system will use GPT partitions, encrypted and mirrored swap, and a mirrored ZFS system pool.
I’m assuming your server contains two identical disks (ada0 and ada1).
Start by installing FreeBSD on the first disk using the regular installation CD/DVD. Choose exactly one slice and exactly one partition. Reboot.
Your system is now up and running, booted from a single UFS file system.
Add the following lines to /boot/loader.conf:
geom_eli_load="YES"
geom_label_load="YES"
geom_mirror_load="YES"
geom_part_gpt_load="YES"
zfs_load="YES"
vm.kmem_size="6G" # This should be 150% of your RAM.
vfs.zfs.arc_max="3G" # This should be a little less than the amount of your RAM.
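The two tunables above are simple arithmetic on the amount of installed RAM. A minimal sketch, assuming RAM is given in megabytes and that "a little less than RAM" means RAM minus 1024 MB (the margin and the function names are assumptions for illustration, not part of the original setup):

```shell
#!/bin/sh
# Sketch: derive the two loader.conf tunables from the amount of RAM.
# Assumptions (not from the original post): RAM is given in megabytes,
# and "a little less than RAM" is taken to mean RAM minus 1024 MB.

kmem_size_mb() {
    # 150% of RAM, using integer arithmetic.
    echo $(( $1 * 3 / 2 ))
}

arc_max_mb() {
    # RAM minus a hypothetical 1024 MB safety margin.
    echo $(( $1 - 1024 ))
}

loader_tunables() {
    ram_mb=$1
    printf 'vm.kmem_size="%sM"\n' "$(kmem_size_mb "$ram_mb")"
    printf 'vfs.zfs.arc_max="%sM"\n' "$(arc_max_mb "$ram_mb")"
}

# Example for a machine with 4 GB of RAM.
loader_tunables 4096
```

For 4096 MB this yields 6144M and 3072M, which matches the 6G and 3G values shown above. On FreeBSD the RAM size in bytes can be read with sysctl -n hw.physmem and divided by 1048576 before being passed in.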
Partition the second disk:
# gpart create -s GPT ada1
# gpart add -b 34 -s 128 -t freebsd-boot ada1
# gpart add -s 2g -t freebsd-swap -l swap1 ada1
# gpart add -t freebsd-zfs -l system1 ada1
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
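Since the same five commands are repeated later for the other disk, it can help to generate them from the disk name and label suffix. The following is only a dry-run sketch: it prints the commands rather than running them, and the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: print (not run) the partitioning commands from the post for one disk.
# The function name and the dry-run approach are illustrative assumptions.

gpart_plan() {
    disk=$1   # e.g. ada1
    idx=$2    # label suffix, e.g. 1 gives swap1 and system1
    echo "gpart create -s GPT $disk"
    echo "gpart add -b 34 -s 128 -t freebsd-boot $disk"
    echo "gpart add -s 2g -t freebsd-swap -l swap$idx $disk"
    echo "gpart add -t freebsd-zfs -l system$idx $disk"
    echo "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $disk"
}

gpart_plan ada1 1
```

Calling gpart_plan ada0 0 prints the equivalent commands for the first disk. Review the printed plan before running anything, because gpart changes the partition table of the named disk.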
Create swap on the second partition and the ZFS system pool on the third partition:
# gmirror label -F -h -b round-robin swap /dev/gpt/swap1
# zpool create -O mountpoint=/mnt -O atime=off -O setuid=off -O canmount=off system /dev/gpt/system1
Create the root file system and update /etc/fstab:
# cat > /etc/fstab
system/rootfs / zfs rw,noatime 0 0
/dev/mirror/swap.eli none swap sw 0 0
^D
# zfs create -o mountpoint=legacy -o setuid=on system/rootfs
# zpool set bootfs=system/rootfs system
# mount -t zfs system/rootfs /mnt
Create the rest of the file systems (everything should be mounted below /mnt/):
# zfs create system/root
# zfs create -o compress=lzjb system/tmp
# chmod 1777 /mnt/tmp
# zfs create -o canmount=off system/usr
# zfs create -o setuid=on system/usr/local
# zfs create -o compress=gzip system/usr/src
# zfs create -o compress=lzjb system/usr/obj
# zfs create -o compress=gzip system/usr/ports
# zfs create -o compress=off system/usr/ports/distfiles
# zfs create -o canmount=off system/var
# zfs create -o compress=gzip system/var/log
# zfs create -o compress=lzjb system/var/audit
# zfs create -o compress=lzjb system/var/tmp
# chmod 1777 /mnt/var/tmp
# zfs create -o canmount=off system/usr/home
# zfs create system/usr/home/pjd
(create file systems for all your users)
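With more than a handful of users, the per-user datasets can be generated in a loop. A small dry-run sketch, where the function name and the user names other than pjd are placeholders:

```shell
#!/bin/sh
# Sketch: print one "zfs create" command per user name.
# The user list is passed as arguments; alice and bob are placeholders.

home_dataset_plan() {
    for user in "$@"; do
        echo "zfs create system/usr/home/$user"
    done
}

home_dataset_plan pjd alice bob
```

The output can be reviewed and then run by hand, or piped to sh once it looks right.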
Enable ZFS in /etc/rc.conf:
# echo 'zfs_enable="YES"' >> /etc/rc.conf
I recommend setting the ports work directory to /usr/obj:
# echo WRKDIRPREFIX=/usr/obj >> /etc/make.conf
Copy the entire system to the second disk (note there are two dashes before one-file-system!):
# cd /
# tar -c --one-file-system -f - . | tar xpf - -C /mnt/
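If you want to see what the tar pipeline does before pointing it at /, it can be tried on a throwaway directory tree first. This is only a sketch: the scratch directories and the sample file are placeholders created with mktemp.

```shell
#!/bin/sh
# Sketch: try the tar pipeline on a throwaway tree before using it on /.
# The directories and the sample file are placeholders created under mktemp.

src=$(mktemp -d "${TMPDIR:-/tmp}/tarsrc.XXXXXX")
dst=$(mktemp -d "${TMPDIR:-/tmp}/tardst.XXXXXX")

mkdir -p "$src/etc"
echo 'hostname="example"' > "$src/etc/rc.conf"
chmod 600 "$src/etc/rc.conf"

# Same pipeline as in the post, with $src and $dst instead of / and /mnt/.
(cd "$src" && tar -c --one-file-system -f - . | tar xpf - -C "$dst")

# The copy should be byte-identical to the source.
cmp "$src/etc/rc.conf" "$dst/etc/rc.conf" && echo "copy OK"
```

The --one-file-system option keeps tar from descending into other mounted file systems, which is what prevents the copy from recursing into /mnt when the real command is run from /.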
Unmount the ZFS file systems and change the pool mountpoint to /. It will be inherited by all file systems:
# zfs umount -a
# umount /mnt
# zfs set mountpoint=/ system
Reboot. If your machine booted fine (it should, but…) you will see the following:
# mount
system/rootfs on / (zfs, local, noatime)
devfs on /dev (devfs, local)
system/root on /root (zfs, local, noatime, nosuid)
system/tmp on /tmp (zfs, local, noatime, nosuid)
system/usr/home/pjd on /usr/home/pjd (zfs, local, noatime, nosuid)
system/usr/obj on /usr/obj (zfs, local, noatime, nosuid)
system/usr/ports on /usr/ports (zfs, local, noatime, nosuid)
system/usr/ports/distfiles on /usr/ports/distfiles (zfs, local, noatime, nosuid)
system/usr/src on /usr/src (zfs, local, noatime, nosuid)
system/var/audit on /var/audit (zfs, local, noatime, nosuid)
system/var/log on /var/log (zfs, local, noatime, nosuid)
system/var/tmp on /var/tmp (zfs, local, noatime, nosuid)
# swapctl -l
Device:               1024-blocks     Used:
/dev/mirror/swap.eli      4194300      4888
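Comparing the mount output against the expected dataset list by eye is error-prone, so a small check can help. The sketch below reads mount output from standard input and prints every expected dataset that is missing; the function name is an assumption, and the sample input stands in for real output.

```shell
#!/bin/sh
# Sketch: list expected datasets that do not appear in "mount" output.
# The mount output is read from standard input; expected names are arguments.

missing_mounts() {
    # Keep only the first column (the dataset or device name).
    mounted=$(awk '{ print $1 }')
    for ds in "$@"; do
        # -x: whole-line match, -F: fixed string, -q: quiet.
        printf '%s\n' "$mounted" | grep -qxF "$ds" || echo "$ds"
    done
}

# Sample input standing in for real "mount" output.
missing_mounts system/rootfs system/tmp system/var/log <<'EOF'
system/rootfs on / (zfs, local, noatime)
devfs on /dev (devfs, local)
system/tmp on /tmp (zfs, local, noatime, nosuid)
EOF
```

On a live system the call would look like mount | missing_mounts system/rootfs system/tmp system/var/log, with the full list of datasets created earlier.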
Now we need to attach the first disk (ada0):
# dd if=/dev/zero of=/dev/ada0 count=79
# gpart create -s GPT ada0
# gpart add -b 34 -s 128 -t freebsd-boot ada0
# gpart add -s 2g -t freebsd-swap -l swap0 ada0
# gpart add -t freebsd-zfs -l system0 ada0
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
# gmirror insert -h -p 1 swap /dev/gpt/swap0
(wait for gmirror to synchronize swap)
# gmirror status
       Name    Status  Components
mirror/swap  COMPLETE  gpt/swap1
                       gpt/swap0
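Waiting for the mirror can be scripted by checking the Status column. A minimal sketch, assuming the status text has the column layout shown above (the function name is made up, and the sample input replaces a real gmirror status call):

```shell
#!/bin/sh
# Sketch: succeed only if the named mirror is reported as COMPLETE.
# Status text is read from standard input; $1 is the mirror name.

mirror_is_complete() {
    awk -v name="$1" '
        $1 == name && $2 == "COMPLETE" { found = 1 }
        END { exit !found }
    '
}

# Sample input standing in for real "gmirror status" output.
if mirror_is_complete mirror/swap <<'EOF'
       Name    Status  Components
mirror/swap  COMPLETE  gpt/swap1
                       gpt/swap0
EOF
then
    echo "swap mirror is synchronized"
fi
```

On a live system this could drive a wait loop such as: until gmirror status | mirror_is_complete mirror/swap; do sleep 5; done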
# zpool attach system /dev/gpt/system1 /dev/gpt/system0
(wait for pool to resilver)
# zpool status
  pool: system
 state: ONLINE
 scrub: resilver completed after 0h2m with 0 errors on Mon Aug 2 11:28:45 2010
config:
        NAME             STATE   READ WRITE CKSUM
        system           ONLINE     0     0     0
          mirror         ONLINE     0     0     0
            gpt/system1  ONLINE     0     0     0  55,5M resilvered
            gpt/system0  ONLINE     0     0     0  1,67G resilvered
errors: No known data errors
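The same kind of check works for the pool. The sketch below treats the pool as healthy when the status text reports ONLINE, no known data errors, and no resilver in progress. The exact wording of zpool status differs between ZFS versions, so the matched strings are assumptions based on the output shown above.

```shell
#!/bin/sh
# Sketch: decide from "zpool status" text whether the pool looks healthy.
# The matched strings are assumptions and may differ between ZFS versions.

pool_looks_healthy() {
    status=$(cat)
    printf '%s\n' "$status" | grep -q 'state: ONLINE' || return 1
    printf '%s\n' "$status" | grep -q 'errors: No known data errors' || return 1
    if printf '%s\n' "$status" | grep -q 'resilver in progress'; then
        return 1
    fi
    return 0
}

# Sample input standing in for real "zpool status" output.
if pool_looks_healthy <<'EOF'
  pool: system
 state: ONLINE
 scrub: resilver completed after 0h2m with 0 errors on Mon Aug 2 11:28:45 2010
errors: No known data errors
EOF
then
    echo "pool looks healthy"
fi
```

On a live system the input would come from zpool status system.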
That’s all folks.
BTW, because your /var/log/ is compressed by ZFS, you can turn off log compression during rotation in /etc/newsyslog.conf.
Below you can find a few reasons why I use the proposed dataset layout:
- The mountpoint property is not manipulated on any dataset except system, which starts the tree. When you manage ZFS, it can be confusing when a dataset’s place in the ZFS hierarchy doesn’t match its mount point. The mountpoint property is always inherited, so if you change it for some reason, you know that all datasets below will inherit it properly and no other changes are needed.
- Note that system/usr and system/var are not mounted. The /usr/bin, /usr/sbin, /var/named, etc. directories all belong to the system/rootfs dataset. When you upgrade your system, it should be enough to snapshot the system/rootfs dataset alone and roll back only that dataset if something goes wrong.
- In the proposed layout you can set setuid to off for all datasets except for system/rootfs and system/usr/local.
- You have /usr/src, /usr/obj, /tmp, /var/tmp, /var/log and /var/audit compressed.
- Having the entire OS in one dataset makes it easy to create jails from it. When I create a new jail, I clone system/rootfs to system/jails/<name> and create the following additional datasets for the jail:
- system/jails/<name>/etc
- system/jails/<name>/tmp
- system/jails/<name>/var
- system/jails/<name>/usr (set canmount to off)
- system/jails/<name>/usr/local
- system/jails/<name>/usr/work (I build ports here)
- system/jails/<name>/root (jailed root home directory)
This way you can make the system/jails/<name> dataset read-only, and when you update your main system you perform the following steps to update a jail:
- stop the jail
- rename system/jails/<name> to system/jails/<name>_old
- clone system/jails/<name> again from system/rootfs
- rename all datasets which are now below system/jails/<name>_old to the newly cloned system/jails/<name>
- update system/jails/<name>/etc
- update system/jails/<name>/var
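The jail update steps map onto a handful of zfs commands. The following is a dry-run sketch only: it prints the commands instead of running them, the function name, jail name and snapshot tag are placeholders, and stopping the jail and updating etc and var are left as comments because they depend on the local setup.

```shell
#!/bin/sh
# Sketch: print (not run) the ZFS commands for the jail update steps.
# Assumptions: the snapshot tag is chosen by the caller, and the per-jail
# datasets are the ones listed in the post.

jail_upgrade_plan() {
    name=$1   # jail name (placeholder)
    tag=$2    # snapshot tag (placeholder)
    base=system/jails/$name
    echo "# stop jail $name first"
    echo "zfs rename $base ${base}_old"
    echo "zfs snapshot system/rootfs@$tag"
    echo "zfs clone system/rootfs@$tag $base"
    # Renaming usr carries usr/local and usr/work along with it.
    for sub in etc tmp var usr root; do
        echo "zfs rename ${base}_old/$sub $base/$sub"
    done
    echo "# update $base/etc and $base/var, then start the jail"
}

jail_upgrade_plan www 20100806
```

A clone needs a snapshot as its origin, which is why the plan snapshots system/rootfs before cloning. Whether the renamed datasets end up mounted where you expect depends on the mountpoint inheritance described above, so check zfs list after a real run.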
I was wondering… is “-b 34” required for some reason, or is the default OK too?
You don’t need “-b 34”, although IMO it makes sense to specify “-b 1m”; that way your partitions are well aligned (e.g. for 4k sector drives).
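The alignment point is easy to verify with arithmetic. A short sketch, assuming 512-byte logical sectors (the function name is illustrative): a start sector is 4k-aligned when its byte offset is a multiple of 4096.

```shell
#!/bin/sh
# Sketch: check whether a start sector falls on a 4096-byte boundary.
# Assumes 512-byte logical sectors.

is_4k_aligned() {
    [ $(( $1 * 512 % 4096 )) -eq 0 ]
}

for start in 34 2048; do
    if is_4k_aligned "$start"; then
        echo "sector $start: aligned"
    else
        echo "sector $start: not aligned"
    fi
done
```

Sector 34 is byte offset 17408, which is not a multiple of 4096, while “-b 1m” starts at sector 2048 (byte offset 1048576), which is.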
Got it… thanks!
Neat. I am certainly going to try this. If I wanted a raidz instead, I could do the same thing, assuming I had three drives, right?
Well, I don’t think you can boot from a raidz pool, so you would have to create two pools: a ZFS mirror to boot from and a raidz to store your data.
Fluff: Yes, you can boot from RAIDZ on FreeBSD just fine.
Some good ideas. Nice.
The tar argument should be --one-file-system; you sort of skip the geli config, yet assume it exists for the swap partition.
You also need to create the system/jails/name/usr filesystem with the canmount=off property before creating its children.
It might also be a good idea to create a system/usr/local filesystem so that installed packages are not kept together with the base system. Then you might clone it to the jail’s /usr/local if desired.
I am curious about the jails setup though — how do you populate the per-jail /etc and /var dirs?
Daniel:
Thanks for your comments, answers below:
> The tar argument should be --one-file-system;
This is exactly what is there. Or am I missing something?
> You sort of skip the geli config, yet assume it exists for the swap partition.
This is correct. You don’t have to initialize the swap partition in any way. The /etc/rc.d/encswap script will scan /etc/fstab on boot and will configure geli with a one-time key for each swap provider that ends with .eli.
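To see which fstab entries would be picked up this way, the selection rule can be sketched with awk. This is an illustration of the rule as described above, not the actual rc.d script, and the function name is made up.

```shell
#!/bin/sh
# Sketch: print swap devices from fstab text whose name ends in .eli.
# This illustrates the selection rule only; it is not the rc.d script.

eli_swap_devices() {
    awk '$1 !~ /^#/ && $3 == "swap" && $1 ~ /\.eli$/ { print $1 }'
}

# Sample input matching the fstab written earlier in the post.
eli_swap_devices <<'EOF'
system/rootfs / zfs rw,noatime 0 0
/dev/mirror/swap.eli none swap sw 0 0
EOF
```

On a live system the call would be eli_swap_devices < /etc/fstab.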
> You also need to create the system/jails/name/usr filesystem with the canmount=off property before creating its children.
I added a note about that.
> It might also be a good idea to create a system/usr/local filesystem so that installed packages are not kept together with the base system. Then you might clone it to the jail’s /usr/local if desired.
Yes, this is what I do in fact, but forgot to mention it.
> I am curious about the jails setup though — how do you populate the per-jail /etc and /var dirs?
We have jailcreate.sh and jailupgrade.sh scripts that do some initialization steps. To populate those dirs you need to run:
# cd /usr/src
# make -k distrib-dirs DESTDIR=/jails/
# make -k distribution DESTDIR=/jails/
About tar, it’s the blog software: I only see one dash.
Coincidentally, I have about the same setup (although I rely more on setting the mountpoint property) and just wanted to verify how your instructions work; I have a spare system to play with right now.
Shouldn’t vm_kmem_size be vm.kmem_size??
Indeed. Fixed. Thanks.
Thanks for this excellent article. This is a great “best practice” summary for ZFS installations. A future handbook chapter, no doubt.
It would be even easier if swap could reliably be done on ZFS volumes (it seems like something people avoid … at least on the 7.* releases it would always lead to a panic/freeze/crash after a while, and sometimes right after boot) and if Sun/Oracle would finish and release ZFS encryption.
I notice you are not using ZVols anywhere, and I have a few questions about them. ZVol swap seems convenient but a bad idea for a few reasons. Do you use it or recommend it at all? I ask because even if ZFS swap volumes were more stable, swap on its own dedicated partition/disk just seems to be a better idea. Are the zvols independent of other ZFS operations, or are they somehow constrained by the ZFS “layer”? ZFS seems incredibly slow compared to ufs/ext etc. (not a criticism; it does more and will likely improve with future releases), so I wouldn’t want ZFS “mixed up” in swap and slowing it down. I like the convenience and data integrity of ZFS, but I’m not sure how to use ZVols (ufs, swap, iSCSI exported, whatever). They seem to make ZFS behave strangely, and things seem to work better when set up on the hardware in the “traditional” geom manner.
Any thoughts?
I have a question about the system/root (legacy) mountpoint.
Could you please explain what it’s for? If I look at it, I see that it references a fair bit of data (908M on my 8.1 system). What’s in that data and how is it accessed?
Thanks!
@Dan that should be data that is “on the platter” inside the “/” filesystem only … in my case that’s /boot/ /etc/ /root/ … The reason to have it is to nest the inherited ZFS mount points under a single root as a typical “tree”. It’s less confusing that way. You could have a ZFS dataset called zpool/zfilesystem/weirdlocation set to mount on “/” if you wanted, along with a second one with the mount point set to “/usr” that is at otherzpool/zfilesystem/weirderlocation … Pavel’s method keeps things clean and organized.
On my system, if I use “du -chx / ” I see about 350 MB of data referenced on / (which is mounted from zfs system/root). It’s mostly just junk in root’s home /root, and I also have a messy /boot from using source upgrades and freebsd-update etc., so I have /boot/GENERIC and /boot/kernel and so on.
If you have a few kernels, maybe a core file or some stuff in /root, or accidentally left-over junk under /mnt (e.g. when nothing is mounted there), you could get to nearly a GB pretty quickly.
Try du -chx / to see what is on the legacy mounted “/”
@BSDr: I see some things with du -chx that I would not expect. Specifically I see some files from /var, /usr and others.
Could you please email me on fbsd at dannyspace dot net to discuss?
In 8.2, the post-reboot “mount” does not show any zfs filesystems mounted except for system/root on / and devfs. I have to manually invoke “zfs mount -a” to get the rest of the zfs to legacy mount. Any idea why this might be happening? I don’t know if it affects functionality either, as without the rest of the zfs datasets mounted in legacy, I can still access their data through the root mount…
“--one-file-system”, not “-one-file-system” (additional dash).
I feel kind of stupid, but I wasted 10 minutes on that.
Thank you for the nice guide!
To be allowed to rollback system/root without disturbing system/usr/local I think it would be
useful to create 2 directories:
/usr/local/var/db/pkg
/usr/local/var/db/ports
and have
lrwxr-xr-x 1 root wheel 21 May 30 14:47 /var/db/pkg -> /usr/local/var/db/pkg
lrwxr-xr-x 1 root wheel 23 May 30 14:47 /var/db/ports -> /usr/local/var/db/ports
PS – ZFS rocks
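The directory and symlink setup suggested in this comment could be sketched as follows. The function name and the root prefix argument are assumptions added so the steps can be tried in a scratch directory first; an existing /var/db/pkg or /var/db/ports is not migrated by this sketch and would have to be moved by hand beforehand.

```shell
#!/bin/sh
# Sketch: create the directory under /usr/local and the symlink from /var/db.
# The "root" prefix lets the steps be tried in a scratch directory.
# An existing /var/db/<name> is not migrated here and must be moved first.

relocate_db_dir() {
    root=$1   # "" for the live system, or a scratch directory
    name=$2   # pkg or ports
    if [ -e "$root/var/db/$name" ] || [ -L "$root/var/db/$name" ]; then
        echo "$root/var/db/$name already exists, move its contents first" >&2
        return 1
    fi
    mkdir -p "$root/usr/local/var/db/$name" "$root/var/db"
    ln -s "/usr/local/var/db/$name" "$root/var/db/$name"
}

scratch=$(mktemp -d "${TMPDIR:-/tmp}/dbdirs.XXXXXX")
relocate_db_dir "$scratch" pkg
relocate_db_dir "$scratch" ports
ls -l "$scratch/var/db"
```

The symlink targets are absolute paths, so inside the scratch directory they point at the real /usr/local; that is intentional, since on the live system the prefix would be empty.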
Your recommendation to set vm.kmem_size to 150% of memory is at odds with the FreeBSD default of vm.kmem_size_scale=1. But having just suffered from a FreeBSD 8.2 machine hitting the “kmem_map too small” error ( http://freebsd.1045724.n5.nabble.com/kmem-map-too-small-with-ZFS-and-8-2-RELEASE-td4029979.html ), I wonder if your recommendation is the way to go.
So, what are your thoughts about this:
1. Should we be setting vm.kmem_size_scale=1.5 or setting vm.kmem_size directly to 150% of actual RAM?
2. Do the instructions in this blog supersede the recommendation of the official FreeBSD wiki, which suggests that no tuning at all is best for ZFS on recent FreeBSD versions under 64-bit?
Thanks for all your great work.
Ari