Category Archives: kernel

The FreeBSD-linuxulator explained (for developers): basics

The last post about the Linuxulator, in which I explained it from a user's point of view, got a good amount of attention. Triggered by a recent explanation of the Linuxulator errno handling to a fellow FreeBSD developer, I decided to see if more developers are interested in some more info too…

The syscall vector

In sys/linux/linux_sysvec.c is all the basic setup to handle Linux “system stuff

ZFS and NFS / on-disk-cache

On the FreeBSD mailing lists I stumbled over a post which refers to a blog post describing why ZFS seems to be slow (on Solaris).

In short: ZFS guarantees that the NFS client does not experience silent corruption of data (NFS server crash and loss of data which is supposed to be already on disk for the client). A recommendation is to enable the disk-cache for disks which are completely used by ZFS, as ZFS (unlike UFS) is aware of disk-caches. This increases the performance to what UFS is delivering in the NFS case.

There is no in-depth description of what it means that ZFS is aware of disk-caches, but I think this is a reference to the fact that ZFS sends a flush command to the disk at the right moments. Leaving aside the fact that there are disks out there which lie to you about this (they report the flush command as finished when it is not), this would mean that this is supported in FreeBSD too.

So everyone who is currently disabling the ZIL to get better NFS performance (and accepting silent data corruption on the client side): move your zpool to dedicated disks (no other real FS than ZFS; swap and dump devices are OK) which are honest about flushes, and enable the disk-caches instead of disabling the ZIL.

I also recommend that people who already have ZFS on dedicated (and honest) disks check whether the disk-caches are enabled.

v4l support in the linuxulator MFCed to 8-stable

I merged the v4l translation layer into the linuxulator of 8-stable. As in -current, this just means that Linux apps (like Skype) can now use FreeBSD native devices which conform to the v4l ABI. The port multimedia/webcamd provides access to some webcams (or DVB hardware) via the v4l ABI.

People who want to test the linuxulator part should first make sure a native FreeBSD application has no problem accessing the device.

ARC (adaptive replacement cache) explained

At work we have the situation of a slow application. The vendor of the custom application insists that the ZFS (Solaris 10u8) and the Oracle DB are badly tuned for the application. Part of their tuning is to limit the ARC to 1 GB (our max size is 24 GB on this machine). One problem we see is that there are many write operations (rounded values: 1k ops for up to 100 MB) and the DB is complaining that the logwriter is not able to write out the data fast enough. At the same time our database admins see a lot of commits and/or rollbacks so that the archive log grows very fast to 1.5 GB. The funny thing is… the performance tests are supposed to only cover SELECTs and small UPDATEs.

I proposed to reduce zfs_txg_timeout from the default value of 30 seconds to a few seconds (as no reboot is needed for this, unlike for the max ARC size, it can be done quickly instead of waiting some minutes for the boot-checks of the M5000). The first try was to reduce it to 5 seconds, and it improved the situation. The DB still complained about not being able to write out the logs fast enough, but not as often as before. To make the vendor happy we reduced the max ARC size and tested again. At first we did not see any complaints from the DB anymore. This looked strange to me, because my understanding of the ARC (and the description of the max size setting in the ZFS Evil Tuning Guide) suggests that it should not cause the behavior we saw. The machine was also rebooted for this change, though, so there could be another explanation.

Luckily we found out that our testing infrastructure had a problem so that only a fraction of the performance test was performed. This morning the people responsible for that made some changes and now the DB is complaining again.

This is what I expected. To make sure I fully understand the ARC, I had a look at the theory behind it at the IBM research center. There are some papers which explain how to extend a cache which uses the LRU replacement policy into an ARC with a few lines of code. It looks like it would be worthwhile to check in which places FreeBSD uses an LRU policy and to test whether an ARC would improve the cache hit rate there. From reading the paper it looks like there are a lot of places where this should be the case. The authors also provide two adaptive extensions to the CLOCK algorithm (used in the VM subsystem of various operating systems) which indicate that such an approach could be beneficial for a VM system. I already contacted Alan (the FreeBSD one) and asked if he knows about it and if it could be beneficial for FreeBSD.
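To make the idea more concrete, here is a minimal sketch of the ARC replacement policy in C. It follows my reading of the pseudocode published by Megiddo and Modha, with simplifications that are my own assumptions: fixed-size arrays instead of linked lists, integer division when adapting the target size p, and the names `struct arc`, `arc_access` and `ARC_C` are made up for this illustration. It is not the ZFS implementation, which differs in many details.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ARC_C 4 /* cache size in pages; an arbitrary value for this sketch */

/* A tiny ordered list: key[0] is the LRU end, key[n - 1] the MRU end. */
struct list { int key[2 * ARC_C]; int n; };

/* T1/T2 hold cached pages, B1/B2 are "ghost" lists of recently evicted keys. */
struct arc { struct list t1, t2, b1, b2; int p; /* target size of T1 */ };

static int list_find(const struct list *l, int k)
{
    for (int i = 0; i < l->n; i++)
        if (l->key[i] == k)
            return i;
    return -1;
}

static int list_remove_at(struct list *l, int i)
{
    int k = l->key[i];
    memmove(&l->key[i], &l->key[i + 1], (size_t)(l->n - i - 1) * sizeof(int));
    l->n--;
    return k;
}

static void list_push_mru(struct list *l, int k)
{
    assert(l->n < 2 * ARC_C);
    l->key[l->n++] = k;
}

/* Evict one cached page from T1 or T2 into the matching ghost list. */
static void arc_replace(struct arc *a, bool in_b2)
{
    if (a->t1.n >= 1 && ((in_b2 && a->t1.n == a->p) || a->t1.n > a->p))
        list_push_mru(&a->b1, list_remove_at(&a->t1, 0));
    else
        list_push_mru(&a->b2, list_remove_at(&a->t2, 0));
}

/* Request page k; returns true on a cache hit. */
bool arc_access(struct arc *a, int k)
{
    int i, d;

    if ((i = list_find(&a->t1, k)) >= 0) {      /* hit: second access */
        list_push_mru(&a->t2, list_remove_at(&a->t1, i));
        return true;
    }
    if ((i = list_find(&a->t2, k)) >= 0) {      /* hit: accessed repeatedly */
        list_push_mru(&a->t2, list_remove_at(&a->t2, i));
        return true;
    }
    if (list_find(&a->b1, k) >= 0) {            /* ghost hit: grow T1 target */
        d = a->b2.n / a->b1.n;                  /* integer simplification */
        a->p += d > 1 ? d : 1;
        if (a->p > ARC_C)
            a->p = ARC_C;
        arc_replace(a, false);
        list_push_mru(&a->t2, list_remove_at(&a->b1, list_find(&a->b1, k)));
        return false;
    }
    if (list_find(&a->b2, k) >= 0) {            /* ghost hit: shrink T1 target */
        d = a->b1.n / a->b2.n;
        a->p -= d > 1 ? d : 1;
        if (a->p < 0)
            a->p = 0;
        arc_replace(a, true);
        list_push_mru(&a->t2, list_remove_at(&a->b2, list_find(&a->b2, k)));
        return false;
    }
    /* Completely new page: make room if needed, then insert into T1. */
    int total = a->t1.n + a->t2.n + a->b1.n + a->b2.n;
    if (a->t1.n + a->b1.n == ARC_C) {
        if (a->t1.n < ARC_C) {
            list_remove_at(&a->b1, 0);
            arc_replace(a, false);
        } else {
            list_remove_at(&a->t1, 0);
        }
    } else if (total >= ARC_C) {
        if (total == 2 * ARC_C)
            list_remove_at(&a->b2, 0);
        arc_replace(a, false);
    }
    list_push_mru(&a->t1, k);
    return false;
}
```

With a cache of 4 pages and the access sequence 1, 2, 3, 4, 1, 2, 5, 6, 7, 8, 9, pages 1 and 2 are still cached afterwards, because they were accessed twice and therefore sit in T2. A plain LRU cache of the same size would have evicted them during the scan over pages 5 to 9.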

WITH_CTF is really usable now

I just committed a patch which makes WITH_CTF usable now.

Yes, you could use it before, but you had to remember to specify it at each build. Now you can add it to your kernel config (via makeoptions), and then you can forget about it.
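As a sketch, the corresponding line in a kernel config could look like this (the config file itself is whatever custom config you use, for example a copy of GENERIC):

```
# In your kernel configuration file:
makeoptions	WITH_CTF=1	# generate CTF data during the kernel build
```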

Thanks to jhb and imp for review and suggestions.

Stability problems solved (hardware problem)

After putting the disks of the 7-stable system which exhibited stability problems into a completely different system (it is a rented root-server, not our own hardware), the system has now survived more than a day with the UFS setup (and there is still no trace of problems). Previously it would crash after some minutes.

The ZFS setup with the changed hardware had a problem during the night before (like always after all my ZFS related changes on this machine), but on this machine I changed all locks in ZFS from shared locks to exclusive locks (this extended the uptime from 4

I merged a lot of ZFS patches to 7-stable

During the last weeks I identified 64 patches for ZFS which are in 8-stable but not in 7-stable. For 56 of them I had a deeper look, and most of them are now committed to 7-stable. The ones of those 56 which I did not commit are not applicable to 7-stable (infrastructure differences between 8 and 7).

Unfortunately this did not solve the stability problems I have on a 7-stable system.

I also committed a diff reduction (between 8-stable and 7-stable) patch which also fixed some not so harmless mismerges (mem-leak and initializing the same mutex twice at different places). No idea yet if it helps in my case.

I also want to merge the new arc reclaim logic from head to 8-stable and 7-stable. Maybe I can do this tomorrow.

Currently I run a test with a kernel where the shared locks for ZFS are switched to exclusive locks.

FreeNAS & Sensors for FreeBSD

This weekend I was told that FreeNAS seems to want to move from FreeBSD to Linux (since then it seems there could be both a Linux and a FreeBSD version). One of the reasons seems to be a missing sensors framework.

As I had committed a port of the OpenBSD sensors framework (produced as part of the Google Summer of Code 2007) to FreeBSD and had to remove it afterwards because one committer complained very loudly, I was asked what the status of this is.

The short status is: nobody is doing anything about it.

Before I explain the long status, I give

Video4Linux support in FreeBSD

Yesterday I committed the v4l support into the linuxulator (in 9-current). Part of this was the import of the v4l header from Linux. We have permission to use it; it is not licensed under the GPL. This means we can use it in FreeBSD native drivers, and they are even allowed to be compiled into GENERIC (but I doubt we have a driver which could provide the v4l interface in GENERIC).

The code I committed is

Daily doxygen generated docs of the FreeBSD kernel (head)

I managed to find some time to set up automated generation of the doxygen docs for kernel subsystems of FreeBSD on my webserver.

Every night/morning (German timezone) the sources will be updated, and the docs get regenerated (this takes some time). Currently this depends upon some patches to the makefile and doxygen config files in tools/kerneldoc/subsys. Everything is generated directly in the directory from which the webserver serves the pages, so if you browse this in the middle of the generation, the content may not be consistent (yet).

Please be nice to the webserver and do not mirror this. You can generate this yourself very easily. Assuming you have the FreeBSD source on a local hard disk, you just need to download the patch from http://www.Leidinger.net/FreeBSD/current-patches/ (if you do not find dox.diff, update your FreeBSD sources and everything will be OK), apply the patch, cd into tools/kerneldoc/subsys and run

FreeBSD Kernel Internals Lecture Posted

The first lecture from Kirk McKusick's full-length FreeBSD Kernel Internals course has been posted to the BSD Conferences channel on YouTube. It's been about 10 years since I first took a shortened version of this course at FreeBSDCon 1999, and only a few years since I took the follow-up kernel code reading course in Berkeley, and I highly recommend this unique resource to others.

This makes the 24th video uploaded to the BSD Conferences channel since I created it just over a month ago. Thanks to Julian Elischer, Jason Dixon, Tomasz Dudzisz, and Kirk McKusick for uploading the conference videos and for contributing to our growing page of tips about video production and publishing on the FreeBSD Wiki.

As of this writing we have 644 unique subscribers to the channel and approximately 400 daily views of these videos. To date the most popular videos have been Kris Kennaway speaking about the New features in FreeBSD 7 at MeetBSD 2007, and Jason Dixon's tongue-in-cheek BSD is Dying talk at NYCBSDCon 2006. Note to conference organizers: high level talks about the new features, or talks by speakers as entertaining as Jason Dixon are likely to be well received. The YouTube analytics to the right show the top 10 most popular videos from the channel as well as some demographic information.

Module/kernel parameters

Sometimes it's desirable to pass some arguments to a module to customize its behavior, e.g. to set or unset the verbosity level of debug output, set the operation mode, etc. Linux modules can get this information from the insmod utility, but kldload is not capable of doing this kind of thing. What a pity. But don't despair - tunables to the rescue!

Just like an ordinary command shell (bash, csh, sh), the kernel has its own environment, a set of name=value pairs. You can get, set, test and unset these variables using the getenv, setenv, testenv and unsetenv functions in the kernel, and the kenv(2) syscall or the kenv(1) command in userland. So if you want to set the verbosity level for a module, you would do something like this:
    # kenv zaptel.debug=1
    # kldload ./zaptel.ko
and then in the module initialization routine:
    static int debug = 0; /* Hush-hush */
    ...
    char *value = getenv("zaptel.debug");
    if (value) {
        debug = strtol(value, NULL, 10);
        freeenv(value);
    }
Too much code for such a simple task, don't you think? As for every common task, there are useful macros defined in the kernel headers; you just have to find them. Heavily caffeinated kernel hackers know most of them. In this particular case the neat code would look something like this (for a statically initialized variable):
    static int debug = 0; /* Hush-hush */
    TUNABLE_INT("zaptel.debug", &debug);
or, if debug is a member of a structure or a local variable:
    sc->debug = 0;
    TUNABLE_INT_FETCH("zaptel.debug", &sc->debug);
TUNABLE_XXX macros exist for the INT, LONG, ULONG and STR types. TUNABLE_STR, unlike the others, requires a third parameter: the maximum size of the string. For dynamic value retrieval every type has a TUNABLE_XXX_FETCH macro defined. Nota bene: if no environment variable with the requested name is set, the value of the receiving variable remains untouched.
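The "remains untouched" behavior can be illustrated outside the kernel with a small userland analog. This is only a sketch: `tunable_int_parse` and `tunable_int_fetch` are hypothetical helper names, they use the process environment via the C library's getenv(3) instead of the kernel environment, and the 1/0 return value is this sketch's own convention.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a tunable's string value into *dest.
 * A NULL value (variable not set) leaves *dest untouched and returns 0;
 * otherwise the value is parsed as a base-10 integer and 1 is returned.
 */
int tunable_int_parse(const char *value, int *dest)
{
    assert(dest != NULL);
    if (value == NULL)
        return 0;
    *dest = (int)strtol(value, NULL, 10);
    return 1;
}

/*
 * Hypothetical userland analog of a "fetch": looks the name up in the
 * process environment (not the kernel environment) and parses it.
 */
int tunable_int_fetch(const char *name, int *dest)
{
    return tunable_int_parse(getenv(name), dest);
}
```

A caller sets its default first and then fetches, exactly as in the sc->debug example above: if the variable is not set, the default survives.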