Category Archives: kernel


I’ve been having a lot of fun over the last few months with FreeBSD on BeagleBone.  Most recently, that’s involved working on the CPSW ethernet driver.

One nasty bug has been eluding me for quite some time: The controller just stops sending packets after about 20 minutes.

Eventually, I will track down this problem. However, I’ve managed to make the driver quite usable even with such an unpleasant bug: I can leave SSH sessions open for days, download port tarballs, use NFS mounts, and generally do the things you expect a network to do.

The key was to find a really good watchdog strategy. Even with the controller locking up regularly, a good watchdog notices this and resets the driver within a few seconds, fast enough that network protocols simply retry and keep going after the reset. A less effective watchdog can leave the controller non-functioning for a minute or more, resulting in failed transfers and dropped connections.

Network Driver Basics

Three functions that appear in every network driver play a part in the watchdog process:

  • The “start send” routine hands packets to the controller.
  • The “transmit completion interrupt” is invoked by the controller when a packet has finished being sent; this routine recycles the memory and other resources for subsequent packets.
  • The “watchdog ticker” wakes up once a second and decides whether or not to reset the controller.

I’ll refer to these three functions as the “start”, “interrupt”, and “watchdog” functions to match how they are named in the source code of most drivers.

The Standard Watchdog

The standard logic that appears in many FreeBSD network drivers uses a single “timer” variable that is updated in each of the above functions:

  • The start routine sets it to 5 whenever a new packet is added to the controller.
  • The interrupt routine sets it to zero whenever it reclaims the last outstanding packet.
  • The watchdog subtracts one and resets the controller when the counter changes from one to zero. (If the counter is already zero, it’s left alone.)

Remember the goal here is to detect when the network is no longer running.  That is, we want to know when the interrupt has stopped getting called.  In fact, because the interrupt isn’t getting invoked, we can entirely ignore that function for now.

To understand the standard logic, consider three different scenarios:

Scenario One:  An almost idle network, with more than 5 seconds between packets. In this case, the start routine queues some packet and sets the timer to 5. The watchdog function counts this down every second until it hits zero, then resets the controller.

Scenario Two:  A very busy network. It can take only a few milliseconds for a busy machine to completely fill the transmit queue. In this environment, each new packet will cause the timer to get reset to 5.  The watchdog may tick and reduce the timer to 4, but a new packet will immediately reset it to 5. Once the transmit queue is full, however, new packets stop getting added and the start function stops resetting the timer. Again, the watchdog function counts down and resets the controller fairly promptly.

Scenario Three:  A lightly used network. Suppose “ping” is running and the transmitter stops. In this case, one packet is getting sent every second. If the transmit queue holds 100 packets, it will take 100 seconds before the transmit queue fills up.  During that time, the start routine and the watchdog routine alternately set the timer count to 5 and 4.

This last scenario is troublesome. In the first two scenarios, the watchdog function detects and resets the failed transmitter in about 5 seconds.  But in the third scenario, the standard logic can leave the network broken for more than a minute, long enough for TCP sessions to time out and reset.

After a few days of not finding the cause for the transmitter stalls, I decided to spend a little time trying to improve the watchdog itself. I tried a variety of different approaches:  only a few handled all of these scenarios well.

A Better Watchdog

I spent almost a week experimenting with different watchdog logic before I formulated two key questions:

  • Is there something to be done?  If there’s no work to be done, then the watchdog should sit quietly.  For a network driver, this just requires checking whether there are packets in the queue waiting to be sent.
  • Has progress been made?  For network drivers, we have “progress” when any packet completes sending.

Translating these questions into code gives a watchdog function that looks something like this:

cpsw_tx_watchdog(struct cpsw_softc *sc)
  if (sc->tx_in_queue == 0) {
    sc->tx_wd_timer = 0; /* Nothing to do. */
  } else if (sc->tx_completed > sc->tx_completed_at_last_tick) {
    sc->tx_wd_timer = 0;  /* More stuff got done. */
  } else {
    /* Something should have been done! */
    if (sc->tx_wd_timer > 3) {
      ... reset controller ...
  sc->tx_completed_at_last_tick = sc->tx_completed;

Notice that the watchdog timer is no longer touched in either the start or interrupt routines.  The start and interrupt routines only need to maintain two statistics:  a count of how many packets have been taken off the queue by the interrupt routine (tx_completed), and a count of how many packets are still on the controller’s queue (tx_in_queue).

There’s another interesting feature of this new logic. The timer variable now has a comparatively simple interpretation: It is the number of seconds that have elapsed since we noticed work being missed. (As an exercise, try formulating a single sentence that accurately describes the timer variable in the standard logic.)

Most importantly, of course, this logic works well in all of the scenarios described above, including the lightly-used network scenario that causes problems for the standard logic.

Of course, all of the above will be considerably less interesting once I figure out how to keep the controller from stalling every 20 minutes…

Book review: FreeBSD Device Drivers

In mid-April a woman from the marketing department of No Starch Press contacted me and asked if I am interested to do a public review of the FreeBSD Device Drivers book by Joseph Kong (no link to a book shop, go and have a look in your preferred one). Just this simple question, no strings attached.

I had my nose in some device drivers in the past, but I never wrote one, and never had a look at the big picture. I was interested to know how everything fits together, so this made me a good victim for a review (novice enough to learn something new and to have a look if enough is explained, and experienced enough to understand what is going on in the FreeBSD kernel).

Some minutes after I agreed to review it (but with a little notice that I do not know how long I need to review it), I had the PDF version of the book. That was faster than I expected (maybe I am too old-school and used to have paper versions of books in my hands).

Let the review begin… but bear with me, this is the first time I do a real public review of a book (instead of a technical review for an author). And as this is my very own personal opinion, I will not allow comments here. This page is all about my opinion while reading the book, questions I have while reading the book shall serve as a hint about the quality of the book and they should be answered in the book, not here.

In short, the book is not perfect, but it is a good book. There is room for improvement, but on a very high level. If you want to write a device driver for FreeBSD, this book is a must. I suggest to read it completely, even chapters which do not belong to the type of driver you want to write (specially the case studies of real drivers). The reason is that each chapter has some notes which may not only apply to the chapter in question, but to all kinds of device drivers. The long review follows now.

The first chapter is titled “Building and running modules”. The author begins with description of the usual device driver types (NIC driver, pseudo-device, …) and how they can be added to the kernel (statically linked in or as a module). The first code example is a small and easy kernel module, so that we do not have to reboot the system we use to develop a driver (except we make a fault during driver development which causes the machine to panic or hang). Every part of the example is well explained. This is followed by an overview about character devices (e.g. disks) and a simple character-device driver (so far a pseudo-device, as we do not have real hardware we access) which is not only as-well explained as the module-example, but there is also a note where the code was simplified and what should be done instead.

After reading this chapter you should be able to write your own kernel module in 5 minutes (well, after 5 minutes it will not be able to do a lot — just a “hello world” – but at least you can already load/unload/execute some code into/from/in the kernel).

I have not tried any example myself, but I compiled a lot of modules and drivers I modified in the past and remember to have seen the described parts.

The second chapter explains how to allocate and free memory in the kernel. There is the possibility to allocate maybe-contiguous memory (the normal case, when your hardware does not do DMA or does not have the requirement that the memory region it makes DMA from/too needs to be contiguous), and really contiguous. For the size argument of the freeing of the the contiguous memory there is the sentence “Generally, size should be equal the amount allocated.”. Immediately I wanted to know what happens if you specify a different size (as a non-native english speaker I understand this sentence in a way that I am allowed to specify a different size and as such are able to free only parts of the allocated memory). Unfortunately this is not answered. I had a look into the source, the kernel frees memory pages, so the size argument (and addr argument) will be rounded to include a full page. This means theoretically I am able to free parts of the allocated memory, but this is a source-maintenance nightmare (needs knowledge about the machine specific page boundaries and you need to make sure that you do the absolutely correct size calculations).  To me this looks more like as long as nobody is pointing a gun at my head and tells me to use a different size, specifying the same size as made during the allocation of this memory region is the way to go.

After reading this chapter you should know how to kill the system by allocating all the RAM in the kernel.

Again, I did not try to compile the examples in this chapter, but the difference of the memory allocation in the kernel compared with memory allocation in the userland is not that big.

The third chapter explains the device communication and control interfaces (ioctl/sysctl) of a driver. The ioctl part teached me some parts I always wanted to know when I touched some ioctls, but never bothered to find out before. Unfortunately this makes me a little bit nervous about the way ioctls are handled in the FreeBSD linuxulator, but this is not urgent ATM (and can probably be handled by a commend in the right place). The sysctl part takes a little bit longer to follow through, but there is also more to learn about it. If you just modify an existing driver with an existing sysctl interface, it probably just comes down to copy&paste with little modifications, but if you need to make more complex changes or want to add a sysctl interface to a driver, this part of the book is a good way to understand what is possible and how everything fits together. Personally I would have wished for a more detailed guide when to pick the ioctl interface and when the sysctl interface than what was written in the conclusion of the chapter, but it is probably not that easy to come up with a good list which fits most drivers.

After reading this chapter you should be able to get data in and out of the kernel in 10 minutes.

As before, I did not compile the examples in this chapter. I already added ioctls and sysctls in various places in the FreeBSD kernel.

Chapter 4 is about thread synchronization – mutexes, shared/exclusive locks, reader/writer locks and condition variables. For me this chapter is not as good as the previous ones. While I got a good explanation of everything, I missed a nice overview table which compares the various methods of thread synchronization. Brendan Gregg did a nice table to give an overview of DTrace variable types and when to use them. Something like this would have been nice in this chapter too. Apart from this I got all the info I need (but hey, I already wrote a NFS client for an experimental computer with more than 200000 CPUs in 1998, so I’m familiar with such synchronization primitives).

Delayed execution is explained in chapter 5. Most of the information presented there was new to me. While there where not much examples presented (there will be some in a later chapter), I got a good overview about what exists. This time there was even an overview when to use which type of delayed execution infrastructure. I would have preferred to have this overview in the beginning of the chapter, but that is maybe some kind of personal preference.

In chapter 6 a complete device driver is dissected. It is the virtual null modem terminal driver. The chapter provides real-world examples of event-handlers, callouts and taskqueues which where not demonstrated in chapter five. At the same time the chapter serves as a description of the functions a TTY driver needs to have.

Automated device detection with Newbus and the corresponding resource allocation (I/O ports, device memory and interrupts) are explained in chapter 7. It is easy… if you have a real device to play with. Unfortunately the chapter missed a paragraph or two about the suspend and resume methods. If you think about it, it is not hard to come up with what they are supposed to do, but a little explicit description of what they shall do, in what state the hardware should be put and what to assume when being called would have been nice.

Chapter 8 is about interrupts. It is easy to add an interrupt handler (or to remove one), the hard part is to generate an interrupt. The example code uses the parallel port, and the chapter also contains a little explanation how to generate an interrupt… if you are not afraid to touch real hardware (the parallel port) with a resistor.

In chapter 9 the lpt(4) driver is explained, as most of the topics discussed so far are used inside. The explanation how everything is used is good, but what I miss sometimes is why they are used. The most prominent (and only) example here for me is why are callouts used to catch stray interrupts? That callouts are a good way of handling this is clear to me, the big question is why can there be stray interrupts. Can this happen only for the parallel port (respectively a limited amount of devices), or does every driver for real interrupt driven hardware need to come with something like this? I assume this is something specific to the device, but a little explanation regarding this would have been nice.

Accessing I/O ports and I/O memory for devices are explained in chapter 10 based upon a driver for a LED device (turn on and off 2 LEDs on an ISA bus). All the functions to read and write data are well explained, just the part about the memory barrier is a little bit short. It is not clear why the CPU reordering of memory accesses matter to what looks like function calls. Those function calls may be macros, but this is not explained in the text. Some little examples when to use the barriers instead of an abstract description would also have been nice at this point.

Chapter 11 is similar to chapter 10, just that a PCI bus driver is discussed instead of an ISA bus driver. The differences are not that big, but important.

In chapter 12 it is explained how to do DMA in a driver. This part is not easy to understand. I would have wanted to have more examples and explanations of the DMA tag and DMA map parts. I am also surprised to see different supported architectures for the flags BUS_DMA_COHERENT and BUS_DMA_NOCACHE for different functions. Either this means FreeBSD is not coherent in those parts, or it is a bug in the book, or it is supposed to be like this and the reasons are not explained in the book. As there is no explicit note about this, it probably leads to confusion of readers which pay enough attention here. It would also have been nice to have an explanation when to use those flags which are only implemented on a subset of the architectures FreeBSD supports. Anyway, the explanations give enough information to understand what is going on and to be able to have a look at other device drivers for real-live examples and to get a deeper understanding of this topic.

Disk drivers and block I/O (bio) requests are described in chapter 13. With this chapter I have a little problem. The author used the word “undefined” in several places where I as a non-native speaker would have used “not set” or “set to 0″. The word “undefined” implies for me that there may be garbage inside, whereas from a technical point of view I can not imagine that some random value in those places would have the desired result. In my opinion each such place is obvious, so I do not expect that an experienced programmer would lose time/hairs/sanity over it, but inexperienced programmers which try to assemble the corresponding structures on the (uninitialized) heap (for whatever reason), may struggle with this.

Chapter 14 is about the CAM layer. While the previous chapter showed how to write a driver for a disk device, chapter 14 gave an overview about how to an HBA to the CAM layer. It is just an overview, it looks like CAM needs a book on its own to be fully described. The simple (and most important) cases are described, with the hardware-specific parts being an exercise for the person writing the device driver. I have the impression it gives enough details to let someone with hardware (or protocol), and more importantly documentation for this device, start writing a driver.

It would have been nice if chapter 13 and 14 would have had a little schematic which describes at which level of the kernel-subsystems the corresponding driver sits. And while I am at it, a schematic with all the driver components discussed in this book at the beginning as an overview, or in the end as an annex, would be great too.

An overview of USB drivers is given in chapter 15 with the USB printer driver as an example for the explanation of the USB driver interfaces. If USB would not be as complex as it is, it would be a nice chapter to start driver-writing experiments (due to the availability of various USB devices). Well… bad luck for curious people. BTW, the author gives pointers to the official USB docs, so if you are really curious, feel free to go ahead. :)

Chapter 16 is the first part about network drivers. It deals with ifnet (e.g. stuff needed for ifconfig), ifmedia (simplified: which kind of cable and speed is supported), mbufs and MSI(-X). As in other chapters before, a little overview and a little picture in the beginning would have been nice.

Finally, in chapter 17, the packet reception and transmission of network drivers is described. Large example code is broken up into several pieces here, for more easy discussion of related information.

One thing I miss after reaching the end of the book is a discussion of sound drivers. And this is surely not the only type of drivers which is not discussed, I can come up with crypto, firewire, gpio, watchdog, smb and iic devices within a few seconds. While I think that it is much more easy to understand all those drivers now after reading the book, it would have been nice to have at least a little overview of other driver types and maybe even a short description of their driver methods.

Conclusion: As I wrote already in the beginning, the book is not perfect, but it is good. While I have not written a device driver for FreeBSD, the book provided enough insight to be able to write one and to understand existing drivers. I really hope there will be a second edition which addresses the minor issues I had while reading it to make it a perfect book.


Alexander Leidinger

A while ago I committed the linuxulator D-Trace probes I talked about earlier. I waited a little bit for this announcement to make sure I have not broken anything. Nobody complained so far, so I assume nothing obviously bad crept in.

The >500 probes I committed do not cover the entire linuxulator, but are a good start. Adding new ones is straight forward, if someone is interested in a junior–kernel–hacker task, this would be one. Just ask me (or ask on emulation@), and I can guide you through it.


Alexander Leidinger

Seems I forgot to announce that the linux_base-c6 is in the Ports Collection now. Well, it is not a replacement for the current default linux base, the linuxulator infrastructure ports are missing and we need to check if the kernel supports enough of 2.6.18 that nothing breaks.


To my knowledge, nobody is working on anything of this. Anyone is welcome to have a look and provide patches.


Alexander Leidinger

In case you have not noticed yet, KDTRACE_HOOKS is now in the GENERIC kernel in FreeBSD-current. This means you just need to load the DTrace modules and can use DTrace with the GENERIC kernel.

In case you do not know what you can do with DTrace, take the time to have a look at the DTrace blog. It is worth any minute you invest reading it.


Alexander Leidinger

This weekend I made some progress in the linuxulator:

  • I MFCed the reporting of some linux-syscalls to 9-stable and 8-stable.
  • I updated my linuxulator-dtrace patch to a recent -current. I already compiled it on i386 and arundel@ has it compiled on amd64. I counted more than 500 new DTrace probes. Now that DTrace rescans for SDT probes when a kernel module is loaded, there is no kernel panic anymore when the linux module is loaded after the DTrace modules and you want to use DTrace. I try to commit this at a morning of a day where I can fix things during the day in case some problems show up which I did not notice during my testing.
  • I created a PR for portmgr@ to repocopy a new linux_base port.
  • I set the expiration date of linux_base-fc4 (only used by 7.x and upstream way past its EoL) and all dependent ports. It is set to the EoL of the last 7.x release, which can not use a later linux_base port. I also added a comment which explains that the date is the EoL of the last 7.x release.


Alexander Leidinger

I just updated to a recent -current and tried the new nullfs. Sockets (e.g. the MySQL one) work now with nullfs. No need to have e.g. jails on the same FS and hardlink the socket to not need to use TCP in MySQL (or an IP at all for the jail).

Great work!


New CentOS linux_base for testing soonish

It seems my HOWTO create a new linux_base port was not too bad. There is now a PR for a CentOS 6 based linux_base port. I had a quick look at it and it seems that it is nearly usable to include into the Ports Collection (the SRPMs need to be added, but that can be done within some minutes).

When FreeBSD 8.3 is released and the Ports Collection open for sweeping commits again, I will ask portmgr to do a repo-copy for the new port and commit it. This is just the linux_base port, not the complete infrastructure which is needed to completely replace the current default linuxulator userland. This is just a start. The process of switching to a more recent linux_base port is a long process, and in this case depends upon enough support in the supported FreeBSD releases.

Attention: Anyone installing the port from the PR should be aware that using it is a highly experimental task. You need to change the linuxulator to impersonate himself as a linux 2.6.18 kernel (described in the pkg-message of the port), and the code in FreeBSD is far from supporting this. Anyone who wants to try it is welcome, but you have to run FreeBSD-current as of at least the last weekend, and watch out for kernel messages about unsupported syscalls. Reports to [email protected] please, not here on the webpage.


New opportunities in the linuxulator

Last weekend I committed some dummy-syscalls to the linuxulator in FreeBSD-current. I also added some comments to syscalls.master which should give a hint which linux kernel had them for the first time (if the linux man–page I looked this up in is correct). So if someone wants to experiment with a higher compat.linux.osrelease than 2.6.16 (as it is needed for a CentOS based linux_base), he should now get some kernel messages about unimplemented syscalls instead of a silent failure.

There may be some low-hanging fruits in there, but I did not really verify this by checking what the dummy syscalls are supposed to do in linux and if we can easily map this to existing FreeBSD features. In case someone has a look, please send an email to


Static DTrace probes for the linuxulator updated

I got a little bit of time to update my 3 year old work of adding static DTrace probes to the linuxulator.

The changes are not in HEAD, but in my linuxulator-dtrace branch. The revision to have a look at is r230910. Included are some DTrace scripts:

  • script to check internal locks
  • script to trace futexes
  • script to generate stats for DTracified linuxulator parts
  • script to check for errors:
    • emulation errors (unsupported stuff, unknown stuff, …)
    • kernel errors (resource shortage, …)
    • programming errors (errors which can happen if someone made a mistake, but should not happen)

The programming-error checks give hints about userland programming errors respectively a hint about the reason of error return values due to resource shortage or maybe a wrong combination of parameters. An example error message for this case is “Application %s issued a sysctl which failed the length restrictions.nThe length passed is %d, the min length supported is 1 and the max length supported is %d.n

More drivers available in the FreeBSD-kernel doxygen docs

Yesterday I committed some more configs to generate doxygen documentation of FreeBSD-kernel drivers. I mechanically generated missing configs for subdirectories of src/sys/dev/. This means there is no dependency information included in the configs, and as such you will not get links e.g. to the PCI documentation, if a driver calls functions in the PCI driver (feel free to tell me about such dependencies).

If  you want to generate the HTML or PDF version of some subsystem, just go to src/tools/kerneldoc/subsys/ an run “make

Linuxulator explained (for developers): adding ioctls directly to the kernel

After giving an overview of the in-kernel basics of the Linuxulator, I want now to describe how to add support for new ioctls to the Linuxulator.

Where are the files to modify?

The platform independent code for the ioctls is in SRC/sys/compat/linux/linux_ioctl.c. The defines to have names for the ioctl values are in SRC/sys/compat/linux/linux_ioctl.h.

How to modify them?

First of all create a new header which will contain all the structures, named values and macros for those new ioctls. As written above, the ioctl values (e.g. #define LINUX_VIDIOC_ENCODER_CMD 0x564d /* 0xc028564d */) do not belong there, they shall be added to linux_ioctl.h. During the course of adding support for ioctls, you will need this new header. Add it in the SRC/sys/compat/linux/ directory, and prefix the name with a linux_. It would be good to decide on a common tag here (referenced as yourtag in the following), and stay with it. Use it wherever you need to have some specific name for the ioctl-set you want to add. In this case it would result in linux_yourtag.h (or even linux_ioctl_yourtag.h, depending if this is used for something very specific to the ioctls, or some generic linux feature) as the name of the header file. This was not done in the past, so do not expect that the names inside the linux_ioctl.c file will be consistent to this naming scheme, but it is never too late to correct mistakes of the past (at least in Open Source software development).

Now add this header to linux_ioctl.c (you need to include compat/linux/linux_yourtag.h). After that add the ioctl values to linux_ioctl.h. As can be seen above, the defines should be named the same as on linux, but with a LINUX_ prefix (make sure they where not defined before somewhere else). The ioctl values need to be the same hex values as in Linux, off course. Sort them according to their hex value. When you added all, you need to add two more defines. The LINUX_IOCTL_yourtag_MIN and LINUX_IOCTL_yourtag_MAX ones. The MIN-one needs to be an alias for the first (sorted according to the hex value) ioctl you added, and MAX needs to be an alias for the last (again, sorted according to the hex value) ioctl you added.

The next step is to let the Linuxulator know that it is able to handle the ioctls in the LINUX_IOCTL_yourtag_MIN to LINUX_IOCTL_yourtag_MAX range. Search the static linux_ioctl_function_t section of linux_ioctl.c and add such a variable for your ioctl set. The name of the variable should be something like linux_ioctl_yourtag.

Similar for the handler-definition for this. Search the static struct linux_ioctl_handler section and add a yourtag_handler. Set it to { linux_ioctl_yourtag, LINUX_IOCTL_yourtag_MIN, LINUX_IOCTL_yourtag_MAX }. To make this handler known to the Linuxulator, you need to add it to the DATA_SET section. Add DATA_SET(linux_ioctl_handler_set, yourtag_handler) there.

Now the meat, the function which handles the ioctls. You already defined it as linux_ioctl_function_t, but now you need to write it. The outline of it looks like this:

static int
linux_ioctl_yourtag(struct thread *td, struct linux_ioctl_args *args)
        struct file *fp;
        int error;
        switch (args->cmd & 0xffff) {
        case LINUX_an_easy_ioctl:
        case LINUX_a_not_so_easy_ioctl:
                /* your handling of the ioctl */
                fdrop(fp, td);
                return (error);
        /* some more handling of your ioctls */
       return (ENOIOCTL);
        error = ioctl(td, (struct ioctl_args *)args);
        return (error);

An easy ioctl in the switch above is an ioctl where you do not have to do something but can pass the ioctl through to FreeBSD itself. The not so easy ioctl case is an ioctl where you need to do e.g. a fget(td, args->fd, &fp). This is just an example, there are also other possibilities where you need to do additional stuff before the return, or where you do not pass the ioctl to FreeBSD. A typical example of what needs to be done here is to copy values from linux structures to FreeBSD structures (and the other way too), or to translate between 64bit and 32bit. Linux programs on amd64 are 32bit executables and 32bit structures/pointers. To make this work on amd64, you need to find a way to map between the two. There are examples in the kernel where this is already the case. The more prominent examples in the 64bit<->32bit regard are the v4l and v4l2 ioctls.

The tedious part is to research if a translation has to be done and if yes what needs to be translated how. When this is done, most of the work is not so hard. The linux_yourtag.h should contain the structures you need for this translation work.

It is also possible to add ioctls in a kernel module, but this is not subject to this description (I will update this posting with a link to it when I get time to write about it).


v4l2 support in the linuxulator now in 8-stable

I MFCed the v4l2 support in the linuxulator to 8–stable. This allows now to use v4l2–webcams in skype/flash on 8-stable too.


Video4Linux2 sup­port in FreeBSD (linuxulator)

I com­mit­ted the v4l2 sup­port into the lin­ux­u­la­tor (in 9-current). Part of this was the import of the v4l2 header from linux. We have the per­mis­sion to use it (like the v4l one), it is not licensed via GPL. This means we can use it in FreeBSD native dri­vers, and they are even allowed to be com­piled into GENERIC (but I doubt we have a dri­ver which could pro­vide the v4l2 inter­face in GENERIC).

The code I com­mit­ted is “just

DTrace probes for the Linuxulator updated

If someone had a look at the earlier post about DTrace probes for the Linuxulator: I updated the patch at the same place. The difference between the previous one is that some D-scripts are fixed now to do what I meant, specially the ones which provide statistics output.


New DTrace probes for the linuxulator

I forward ported my DTrace probes for the FreeBSD linuxulator from a 2008-current to a recent –current. I have not the complete FreeBSD linuxulator covered, but a big part is already done. I can check the major locks in the linuxulator, trace futexes, and I have a D-script which yells at a lot of errors which could happen but should not.

Some of my D-scripts need some changes, as real-world testing showed that they are not really working as expected. They can get overwhelmed by the amount of speculation and dynamic variables (error message: dynamic variable drops with non-empty dirty list). For the dynamic variables problem I found a discussion on the net with some suggestions. For the speculation part I expect similar tuning-possibilities.

Unfortunately the D-script which checks the internal locks fails to compile. Seems there is a little misunderstanding on my side how the D-language is supposed to work.

I try to get some time later to have a look at those problems.

During my development I stumbled over some generic DTrace problems with the SDT provider I use for my probes:

  • If you load the Linux module after the SDT module, your system will panic as soon as you want to access some probes, e.g. “dtrace –l

Additional FEATURE macros on the way

It seems I have a bit of free time now to take care about some FreeBSD related things.

As part of this I already committed the UFS/FFS related FEATURE macros which where developed by kibab@ during the Google Summer of Code 2010. The network/ALTQ related FEATURE macros are in the hands of bz@, he already reviewed them and wants to commit them (with some changes) as part of his improvements of parts of the network related code.

The GEOM related FEATURE macros I just send some minutes ago to geom@ for review. All the rest went out to hackers@ for review. The rest in this case is related to AUDIT, CAM, IPC, KTR, MAC, NFS, NTP, PMC, SYSV and a few other things.

If everything is committed, it should look a bit like this if queried from userland (not all features are shown, those are just the ones which are enabled in the kernel in one of my machines):

kern.features.scbus: 1
kern.features.geom_vol: 1
kern.features.geom_part_bsd: 1
kern.features.geom_part_ebr_compat: 1
kern.features.geom_part_ebr: 1
kern.features.geom_part_mbr: 1
kern.features.kposix_priority_scheduling: 1
kern.features.kdtrace_hooks: 1
kern.features.ktrace: 1
kern.features.invariant_support: 1
kern.features.compat_freebsd7: 1
kern.features.compat_freebsd6: 1
kern.features.pps_sync: 1
kern.features.stack: 1
kern.features.sysv_msg: 1
kern.features.sysv_sem: 1
kern.features.sysv_shm: 1
kern.features.posix_shm: 1
kern.features.ffs_snapshot: 1
kern.features.softupdates: 1
kern.features.ufs_acl: 1



After reading Jim Gettys investigations about the problems current buffer sizes of network equipment provoke (which may even have implications in the net neutrality debate), I had a look at which active queue management (AQM) algorithms with or without explicit congestion notification (ECN) FreeBSD supports.

It looks like there is not much implemented (if the best solution would be implemented, it would not matter how much there is, but unfortunately there is no best solution). Other systems offer more. RED is implemented, but even the inventor/researcher of RED thinks the algorithm needs some improvements (he is in the process of preparing a paper about this, as Jim Gettys reveals). Blue/SFBlue is not implemented (a more turnkey-solution than the current RED implementation). PID controller (which may or may not be something someone wants to use in this case… no idea about its pros/cons in this regard, but it is referenced in the AQM article on Wikipedia) is also not implemented.

Regarding ECN for FreeBSD you can find more or less no real documentation in the net (at least with a simple “ECN FreeBSD

How big are the buffers in FreeBSD drivers?

Today I have read an interesting investigation and problem analysis from Jim Gettys.

It is a set of articles he wrote over several months and is not finished writing as of this writing (if you are deeply interested in it go and read them, the most interesting ones are from December and January and the comments to the articles are also contributing to the big picture). Basically he is telling that a lot of network problems users at home (with ADSL/cable or WLAN) experience  are because buffers in the network hardware or in operating systems are too big. He also proposes workarounds until this problem is attacked by OS vendors and equipment manufacturers.

Basically he is telling the network congestion algorithms can not do their work good, because the network buffers which are too big come into the way of their work (not reporting packet loss timely enough respectively try to not lose packets in situations where packet loss would be better because it would trigger action in the congestion algorithms).

He investigated the behavior of Linux, OS X and Windows (the system he had available). I wanted to have a quick look at the situation in FreeBSD regarding this, but it seems at least with my network card I am not able to see/find the corresponding size of the buffers in drivers in 30 seconds.

I think it would be very good if this issue is investigated in FreeBSD, and apart from maybe taking some action in the source also write some section for the handbook which explains the issue (one problem here is, that there are situations where you want/need to have such big buffers and as such we can not just downsize them) and how to benchmark and tune this.

Unfortunately I even have too much on my plate to even further look into this. :( I hope one of the network people in FreeBSD is picking up the ball and starts playing.