Category Archives: Network

Watchdogs

I’ve been having a lot of fun over the last few months with FreeBSD on BeagleBone.  Most recently, that’s involved working on the CPSW ethernet driver.

One nasty bug has been eluding me for quite some time: The controller just stops sending packets after about 20 minutes.

Eventually, I will track down this problem. However, I’ve managed to make the driver quite usable even with such an unpleasant bug: I can leave SSH sessions open for days, download port tarballs, use NFS mounts, and generally do the things you expect a network to do.

The key was to find a really good watchdog strategy. Even with the controller locking up regularly, a good watchdog notices this and resets the driver within a few seconds, fast enough that network protocols simply retry and keep going after the reset. A less effective watchdog can leave the controller non-functioning for a minute or more, resulting in failed transfers and dropped connections.

Network Driver Basics

Three functions that appear in every network driver play a part in the watchdog process:

  • The “start send” routine hands packets to the controller.
  • The “transmit completion interrupt” is invoked by the controller when a packet has finished being sent; this routine recycles the memory and other resources for subsequent packets.
  • The “watchdog ticker” wakes up once a second and decides whether or not to reset the controller.

I’ll refer to these three functions as the “start”, “interrupt”, and “watchdog” functions to match how they are named in the source code of most drivers.

The Standard Watchdog

The standard logic that appears in many FreeBSD network drivers uses a single “timer” variable that is updated in each of the above functions:

  • The start routine sets it to 5 whenever a new packet is added to the controller.
  • The interrupt routine sets it to zero whenever it reclaims the last outstanding packet.
  • The watchdog subtracts one and resets the controller when the counter changes from one to zero. (If the counter is already zero, it’s left alone.)
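The three rules above can be sketched in a few lines of C. This is only an illustration of the logic as described here, not code from any particular driver: the struct `std_wd` and the function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical driver state; the names are illustrative only. */
struct std_wd {
    int timer;        /* 0 = disarmed, otherwise seconds until reset */
    int tx_in_queue;  /* packets handed to the controller, not yet reclaimed */
};

/* "start": a packet is handed to the controller, so re-arm the timer. */
static void std_start(struct std_wd *sc)
{
    sc->tx_in_queue++;
    sc->timer = 5;
}

/* "interrupt": one packet finished; disarm once the queue is empty. */
static void std_interrupt(struct std_wd *sc)
{
    assert(sc->tx_in_queue > 0);
    if (--sc->tx_in_queue == 0)
        sc->timer = 0;
}

/* "watchdog": called once a second; returns true when a reset is due. */
static bool std_watchdog(struct std_wd *sc)
{
    if (sc->timer == 0)
        return false;           /* already zero: left alone */
    return (--sc->timer == 0);  /* the one-to-zero change triggers the reset */
}
```

Note that all three routines write the same variable, which is what makes the behavior hard to reason about.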

Remember the goal here is to detect when the network is no longer running.  That is, we want to know when the interrupt has stopped getting called.  In fact, because the interrupt isn’t getting invoked, we can entirely ignore that function for now.

To understand the standard logic, consider three different scenarios:

Scenario One:  An almost idle network, with more than 5 seconds between packets. In this case, the start routine queues some packet and sets the timer to 5. The watchdog function counts this down every second until it hits zero, then resets the controller.

Scenario Two:  A very busy network. It can take only a few milliseconds for a busy machine to completely fill the transmit queue. In this environment, each new packet will cause the timer to get reset to 5.  The watchdog may tick and reduce the timer to 4, but a new packet will immediately reset it to 5. Once the transmit queue is full, however, new packets stop getting added and the start function stops resetting the timer. Again, the watchdog function counts down and resets the controller fairly promptly.

Scenario Three:  A lightly used network. Suppose “ping” is running and the transmitter stops. In this case, one packet is getting sent every second. If the transmit queue holds 100 packets, it will take 100 seconds before the transmit queue fills up.  During that time, the start routine and the watchdog routine alternately set the timer count to 5 and 4.

This last scenario is troublesome. In the first two scenarios, the watchdog function detects and resets the failed transmitter in about 5 seconds.  But in the third scenario, the standard logic can leave the network broken for more than a minute, long enough for TCP sessions to time out and reset.
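To put numbers on these scenarios, here is a small simulation of the standard logic with a stalled transmitter. It is a simplified model, not driver code: time advances in whole seconds, the start routine runs first in each second, the watchdog ticks at the end of it, and the interrupt routine never runs because the transmitter is dead. The function name and parameters are invented for the example.

```c
#include <assert.h>

/*
 * Seconds from the stall until the standard watchdog resets the
 * controller, or -1 if that does not happen within `limit` seconds.
 * One packet is offered every `send_interval` seconds, and the
 * transmit queue holds `queue_slots` packets.
 */
static int std_seconds_to_reset(int queue_slots, int send_interval, int limit)
{
    int timer = 0;
    int queued = 0;

    assert(send_interval > 0);
    for (int t = 0; t < limit; t++) {
        /* start routine: only runs while the queue has room */
        if (t % send_interval == 0 && queued < queue_slots) {
            queued++;
            timer = 5;
        }
        /* watchdog tick at the end of this second */
        if (timer > 0 && --timer == 0)
            return t + 1;
    }
    return -1;
}
```

Under this model, `std_seconds_to_reset(100, 1, 1000)` (the ping case with a 100-packet queue) returns 104, while a single queued packet, as in `std_seconds_to_reset(100, 10, 1000)`, is caught after 5 seconds.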

After a few days of not finding the cause for the transmitter stalls, I decided to spend a little time trying to improve the watchdog itself. I tried a variety of different approaches, but only a few handled all of these scenarios well.

A Better Watchdog

I spent almost a week experimenting with different watchdog logic before I formulated two key questions:

  • Is there something to be done?  If there’s no work to be done, then the watchdog should sit quietly.  For a network driver, this just requires checking whether there are packets in the queue waiting to be sent.
  • Has progress been made?  For network drivers, we have “progress” when any packet completes sending.

Translating these questions into code gives a watchdog function that looks something like this:

static void
cpsw_tx_watchdog(struct cpsw_softc *sc)
{
  if (sc->tx_in_queue == 0) {
    sc->tx_wd_timer = 0; /* Nothing to do. */
  } else if (sc->tx_completed != sc->tx_completed_at_last_tick) {
    /* != rather than > so a wrapped counter still counts as progress. */
    sc->tx_wd_timer = 0;  /* More stuff got done. */
  } else {
    /* Something should have been done! */
    ++sc->tx_wd_timer;
    if (sc->tx_wd_timer > 3) {
      ... reset controller ...
    }
  }
  /* Remember the count so the next tick can detect progress. */
  sc->tx_completed_at_last_tick = sc->tx_completed;
}

Notice that the watchdog timer is no longer touched in either the start or interrupt routines.  The start and interrupt routines only need to maintain two statistics:  a count of how many packets have been taken off the queue by the interrupt routine (tx_completed), and a count of how many packets are still on the controller’s queue (tx_in_queue).
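The bookkeeping left in the start and interrupt routines might look roughly like this. It is a sketch only: the struct, the helper names, and the idea that the interrupt reclaims a batch of packets at once are assumptions for illustration, not the actual CPSW driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the driver state; only the two counters matter. */
struct wd_counters {
    uint32_t tx_in_queue;   /* packets still on the controller's queue */
    uint32_t tx_completed;  /* packets reclaimed by the interrupt so far */
};

/* Called from the start routine for each packet handed to the controller. */
static void wd_on_start(struct wd_counters *sc)
{
    sc->tx_in_queue++;
}

/* Called from the interrupt routine after reclaiming `reclaimed` packets. */
static void wd_on_interrupt(struct wd_counters *sc, uint32_t reclaimed)
{
    assert(reclaimed <= sc->tx_in_queue);
    sc->tx_in_queue -= reclaimed;
    sc->tx_completed += reclaimed;
}
```

Neither routine knows anything about timeouts; all of the policy lives in the watchdog.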

There’s another interesting feature of this new logic. The timer variable now has a comparatively simple interpretation: It is the number of seconds that have elapsed since we noticed work being missed. (As an exercise, try formulating a single sentence that accurately describes the timer variable in the standard logic.)

Most importantly, of course, this logic works well in all of the scenarios described above, including the lightly-used network scenario that causes problems for the standard logic.
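As a rough check, the same kind of one-second simulation can be run against the new logic. Again this is a simplified model with invented names (`sim_softc`, `sim_watchdog`, `sim_seconds_to_reset`): the transmitter is assumed to be stalled from the start, so the completion count never advances.

```c
#include <assert.h>
#include <stdbool.h>

struct sim_softc {
    unsigned tx_in_queue;
    unsigned tx_completed;
    unsigned tx_completed_at_last_tick;
    int tx_wd_timer;
};

/* Asks the two questions from the text; returns true when a reset is due. */
static bool sim_watchdog(struct sim_softc *sc)
{
    bool reset = false;

    if (sc->tx_in_queue == 0) {
        sc->tx_wd_timer = 0;             /* nothing to do */
    } else if (sc->tx_completed != sc->tx_completed_at_last_tick) {
        sc->tx_wd_timer = 0;             /* progress was made */
    } else if (++sc->tx_wd_timer > 3) {
        reset = true;                    /* work pending, none done */
    }
    sc->tx_completed_at_last_tick = sc->tx_completed;
    return reset;
}

/* Seconds until reset with a stalled transmitter, or -1 if none by `limit`. */
static int sim_seconds_to_reset(unsigned queue_slots, int send_interval, int limit)
{
    struct sim_softc sc = { 0, 0, 0, 0 };

    assert(send_interval > 0);
    for (int t = 0; t < limit; t++) {
        /* start routine: one packet every send_interval seconds */
        if (t % send_interval == 0 && sc.tx_in_queue < queue_slots)
            sc.tx_in_queue++;
        /* watchdog tick at the end of this second */
        if (sim_watchdog(&sc))
            return t + 1;
    }
    return -1;
}
```

In this model the stalled ping case, `sim_seconds_to_reset(100, 1, 1000)`, returns 4, and so does the nearly idle case with one packet every 10 seconds. The detection time no longer depends on how full the queue is.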

Of course, all of the above will be considerably less interesting once I figure out how to keep the controller from stalling every 20 minutes…

Alexander Leidinger » FreeBSD 2012-07-13 15:10:36

In mid-April a woman from the marketing department of No Starch Press contacted me and asked whether I would be interested in doing a public review of the FreeBSD Device Drivers book by Joseph Kong (no link to a book shop, go and have a look in your preferred one). Just this simple question, no strings attached.

I had my nose in some device drivers in the past, but I never wrote one, and never had a look at the big picture. I was interested to know how everything fits together, so this made me a good victim for a review (novice enough to learn something new and to have a look if enough is explained, and experienced enough to understand what is going on in the FreeBSD kernel).

Some minutes after I agreed to review it (but with a little note that I did not know how long the review would take), I had the PDF version of the book. That was faster than I expected (maybe I am too old-school and used to having paper versions of books in my hands).

Let the review begin… but bear with me, this is the first time I am doing a real public review of a book (instead of a technical review for an author). And as this is my very own personal opinion, I will not allow comments here. This page is all about my opinion while reading the book; the questions I have while reading it serve as a hint about the quality of the book, and they should be answered in the book, not here.

In short, the book is not perfect, but it is a good book. There is room for improvement, but on a very high level. If you want to write a device driver for FreeBSD, this book is a must. I suggest reading it completely, even chapters which do not belong to the type of driver you want to write (especially the case studies of real drivers). The reason is that each chapter has some notes which may not only apply to the chapter in question, but to all kinds of device drivers. The long review follows now.

The first chapter is titled “Building and running modules”. The author begins with a description of the usual device driver types (NIC driver, pseudo-device, …) and how they can be added to the kernel (statically linked in or as a module). The first code example is a small and easy kernel module, so that we do not have to reboot the system we use to develop a driver (unless we make a mistake during driver development which causes the machine to panic or hang). Every part of the example is well explained. This is followed by an overview of character devices (e.g. disks) and a simple character-device driver (so far a pseudo-device, as we do not have real hardware to access) which is not only as well explained as the module example, but also comes with a note about where the code was simplified and what should be done instead.

After reading this chapter you should be able to write your own kernel module in 5 minutes (well, after 5 minutes it will not be able to do a lot, just a “hello world”, but at least you can already load/unload/execute some code into/from/in the kernel).

I have not tried any example myself, but I compiled a lot of modules and drivers I modified in the past and remember having seen the described parts.

The second chapter explains how to allocate and free memory in the kernel. You can allocate maybe-contiguous memory (the normal case, when your hardware does not do DMA or does not require that the memory region it does DMA from/to be contiguous) or really contiguous memory. For the size argument used when freeing contiguous memory there is the sentence “Generally, size should be equal the amount allocated.”. Immediately I wanted to know what happens if you specify a different size (as a non-native English speaker I understand this sentence to mean that I am allowed to specify a different size and as such am able to free only parts of the allocated memory). Unfortunately this is not answered. I had a look at the source: the kernel frees memory pages, so the size argument (and addr argument) will be rounded to include a full page. This means that theoretically I am able to free parts of the allocated memory, but this is a source-maintenance nightmare (it needs knowledge about the machine-specific page boundaries, and you need to make sure that your size calculations are absolutely correct). To me this looks more like: as long as nobody is pointing a gun at my head and telling me to use a different size, specifying the same size as was used when allocating this memory region is the way to go.
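To illustrate what rounding to a full page means for the size calculation, here is a minimal sketch. It is not the kernel's code: the helper name is made up, and the 4096-byte page size used in the examples is an assumption that differs between architectures.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `page_size` (a power of two). */
static size_t round_up_to_page(size_t size, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}
```

With 4096-byte pages, a size of 1 and a size of 4096 both cover one page, while 4097 already covers two. Getting such a calculation right for every supported architecture is exactly the maintenance burden mentioned above.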

After reading this chapter you should know how to kill the system by allocating all the RAM in the kernel.

Again, I did not try to compile the examples in this chapter, but the difference of the memory allocation in the kernel compared with memory allocation in the userland is not that big.

The third chapter explains the device communication and control interfaces (ioctl/sysctl) of a driver. The ioctl part taught me some things I always wanted to know when I touched some ioctls, but never bothered to find out before. Unfortunately this makes me a little bit nervous about the way ioctls are handled in the FreeBSD linuxulator, but this is not urgent ATM (and can probably be handled by a comment in the right place). The sysctl part takes a little bit longer to follow through, but there is also more to learn about it. If you just modify an existing driver with an existing sysctl interface, it probably just comes down to copy&paste with little modifications, but if you need to make more complex changes or want to add a sysctl interface to a driver, this part of the book is a good way to understand what is possible and how everything fits together. Personally I would have wished for a more detailed guide about when to pick the ioctl interface and when the sysctl interface than what was written in the conclusion of the chapter, but it is probably not that easy to come up with a good list which fits most drivers.

After reading this chapter you should be able to get data in and out of the kernel in 10 minutes.

As before, I did not compile the examples in this chapter. I already added ioctls and sysctls in various places in the FreeBSD kernel.

Chapter 4 is about thread synchronization – mutexes, shared/exclusive locks, reader/writer locks and condition variables. For me this chapter is not as good as the previous ones. While I got a good explanation of everything, I missed a nice overview table which compares the various methods of thread synchronization. Brendan Gregg did a nice table to give an overview of DTrace variable types and when to use them. Something like this would have been nice in this chapter too. Apart from this I got all the info I need (but hey, I already wrote a NFS client for an experimental computer with more than 200000 CPUs in 1998, so I’m familiar with such synchronization primitives).

Delayed execution is explained in chapter 5. Most of the information presented there was new to me. While there were not many examples presented (there will be some in a later chapter), I got a good overview of what exists. This time there was even an overview of when to use which type of delayed execution infrastructure. I would have preferred to have this overview at the beginning of the chapter, but that is maybe some kind of personal preference.

In chapter 6 a complete device driver is dissected. It is the virtual null modem terminal driver. The chapter provides real-world examples of event handlers, callouts and taskqueues which were not demonstrated in chapter five. At the same time the chapter serves as a description of the functions a TTY driver needs to have.

Automated device detection with Newbus and the corresponding resource allocation (I/O ports, device memory and interrupts) are explained in chapter 7. It is easy… if you have a real device to play with. Unfortunately the chapter missed a paragraph or two about the suspend and resume methods. If you think about it, it is not hard to come up with what they are supposed to do, but a little explicit description of what they shall do, in what state the hardware should be put and what to assume when being called would have been nice.

Chapter 8 is about interrupts. It is easy to add an interrupt handler (or to remove one), the hard part is to generate an interrupt. The example code uses the parallel port, and the chapter also contains a little explanation how to generate an interrupt… if you are not afraid to touch real hardware (the parallel port) with a resistor.

In chapter 9 the lpt(4) driver is explained, as most of the topics discussed so far are used inside. The explanation how everything is used is good, but what I miss sometimes is why they are used. The most prominent (and only) example here for me is why are callouts used to catch stray interrupts? That callouts are a good way of handling this is clear to me, the big question is why can there be stray interrupts. Can this happen only for the parallel port (respectively a limited amount of devices), or does every driver for real interrupt driven hardware need to come with something like this? I assume this is something specific to the device, but a little explanation regarding this would have been nice.

Accessing I/O ports and I/O memory for devices is explained in chapter 10, based upon a driver for a LED device (turn on and off 2 LEDs on an ISA bus). All the functions to read and write data are well explained, just the part about the memory barrier is a little bit short. It is not clear why the CPU reordering of memory accesses matters to what looks like function calls. Those function calls may be macros, but this is not explained in the text. Some little examples of when to use the barriers, instead of an abstract description, would also have been nice at this point.

Chapter 11 is similar to chapter 10, just that a PCI bus driver is discussed instead of an ISA bus driver. The differences are not that big, but important.

In chapter 12 it is explained how to do DMA in a driver. This part is not easy to understand. I would have wanted to have more examples and explanations of the DMA tag and DMA map parts. I am also surprised to see different supported architectures for the flags BUS_DMA_COHERENT and BUS_DMA_NOCACHE for different functions. Either this means FreeBSD is not coherent in those parts, or it is a bug in the book, or it is supposed to be like this and the reasons are not explained in the book. As there is no explicit note about this, it will probably confuse readers who pay enough attention here. It would also have been nice to have an explanation of when to use those flags which are only implemented on a subset of the architectures FreeBSD supports. Anyway, the explanations give enough information to understand what is going on and to be able to have a look at other device drivers for real-life examples and to get a deeper understanding of this topic.

Disk drivers and block I/O (bio) requests are described in chapter 13. With this chapter I have a little problem. The author used the word “undefined” in several places where I as a non-native speaker would have used “not set” or “set to 0”. The word “undefined” implies for me that there may be garbage inside, whereas from a technical point of view I cannot imagine that some random value in those places would have the desired result. In my opinion each such place is obvious, so I do not expect that an experienced programmer would lose time/hairs/sanity over it, but inexperienced programmers who try to assemble the corresponding structures on the (uninitialized) heap (for whatever reason) may struggle with this.

Chapter 14 is about the CAM layer. While the previous chapter showed how to write a driver for a disk device, chapter 14 gives an overview of how to attach an HBA to the CAM layer. It is just an overview; it looks like CAM needs a book of its own to be fully described. The simple (and most important) cases are described, with the hardware-specific parts being an exercise for the person writing the device driver. I have the impression it gives enough details to let someone with hardware (or a protocol) and, more importantly, documentation for this device start writing a driver.

It would have been nice if chapter 13 and 14 would have had a little schematic which describes at which level of the kernel-subsystems the corresponding driver sits. And while I am at it, a schematic with all the driver components discussed in this book at the beginning as an overview, or in the end as an annex, would be great too.

An overview of USB drivers is given in chapter 15 with the USB printer driver as an example for the explanation of the USB driver interfaces. If USB were not as complex as it is, it would be a nice chapter to start driver-writing experiments (due to the availability of various USB devices). Well… bad luck for curious people. BTW, the author gives pointers to the official USB docs, so if you are really curious, feel free to go ahead. :)

Chapter 16 is the first part about network drivers. It deals with ifnet (e.g. stuff needed for ifconfig), ifmedia (simplified: which kind of cable and speed is supported), mbufs and MSI(-X). As in other chapters before, a little overview and a little picture in the beginning would have been nice.

Finally, in chapter 17, the packet reception and transmission of network drivers is described. Large example code is broken up into several pieces here, for easier discussion of related information.

One thing I miss after reaching the end of the book is a discussion of sound drivers. And this is surely not the only type of driver which is not discussed; I can come up with crypto, firewire, gpio, watchdog, smb and iic devices within a few seconds. While I think that it is much easier to understand all those drivers now after reading the book, it would have been nice to have at least a little overview of other driver types and maybe even a short description of their driver methods.

Conclusion: As I wrote already in the beginning, the book is not perfect, but it is good. While I have not written a device driver for FreeBSD, the book provided enough insight to be able to write one and to understand existing drivers. I really hope there will be a second edition which addresses the minor issues I had while reading it to make it a perfect book.

Alexander Leidinger » FreeBSD 2012-05-31 16:25:32

In several previous posts I wrote about my quest for the right source format to stream video to my Sony BRAVIA TV (built in 2009). Last weekend I finally found something which satisfies me.

What I found was serviio, a free UPnP-AV (DLNA) server. It is written in Java and runs on Windows, Linux and FreeBSD (it is not listed on the website, but we have a not-so-up-to-date version in the ports tree). If necessary it transcodes the input to an appropriate format for the DLNA renderer (in my case the TV).

I tested it with my slow netbook, so that I was able to see for which input formats it would just remux the input container to an MPEG transport stream, and which input formats would really be re-encoded to a format the TV understands.

The bottom line of the tests is that I just need to use a supported container (like MKV or MP4 or AVI) with H.264-encoded video (e.g. encoded by x264) and AC3 audio.

The TV is able to choose between several audio streams, but I have not tested if serviio is able to serve files with multiple audio streams (my wife has a different mother language than me, so it is interesting for us to have multiple audio streams for a movie), and I do not know if DLNA supports something like this.

Now I just have to replace minidlna (which only works well with my TV for MP3s and pictures) with serviio on my FreeBSD file server and we can forget about the disk-juggling.

AQM/ECN in FreeBSD

After reading Jim Gettys' investigations into the problems that current buffer sizes of network equipment provoke (which may even have implications in the net neutrality debate), I had a look at which active queue management (AQM) algorithms with or without explicit congestion notification (ECN) FreeBSD supports.

It looks like there is not much implemented (if the best solution would be implemented, it would not matter how much there is, but unfortunately there is no best solution). Other systems offer more. RED is implemented, but even the inventor/researcher of RED thinks the algorithm needs some improvements (he is in the process of preparing a paper about this, as Jim Gettys reveals). Blue/SFBlue is not implemented (a more turnkey-solution than the current RED implementation). PID controller (which may or may not be something someone wants to use in this case… no idea about its pros/cons in this regard, but it is referenced in the AQM article on Wikipedia) is also not implemented.

Regarding ECN for FreeBSD you can find more or less no real documentation in the net (at least with a simple “ECN FreeBSD

How big are the buffers in FreeBSD drivers?

Today I have read an interesting investigation and problem analysis from Jim Gettys.

It is a set of articles he wrote over several months and has not finished as of this writing (if you are deeply interested in it go and read them; the most interesting ones are from December and January, and the comments to the articles also contribute to the big picture). Basically he says that a lot of the network problems users at home (with ADSL/cable or WLAN) experience are caused by buffers in the network hardware or in operating systems that are too big. He also proposes workarounds until this problem is attacked by OS vendors and equipment manufacturers.

Basically he says that the network congestion algorithms cannot do their work well, because the network buffers which are too big get in the way of their work (by not reporting packet loss in a timely manner, or by trying not to lose packets in situations where packet loss would be better because it would trigger action in the congestion algorithms).
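A bit of arithmetic shows why a big buffer hurts: a full buffer has to drain through the link before a newly queued packet gets out, so it adds roughly buffer size divided by link rate of delay. The helper below is only an illustration; the function name is made up, and the buffer sizes and link rates in the examples are made-up values, not measurements from the articles.

```c
#include <assert.h>
#include <stdint.h>

/* Time in milliseconds to drain a full buffer over a link of link_bps bits/s. */
static uint64_t queue_delay_ms(uint64_t buffer_bytes, uint64_t link_bps)
{
    assert(link_bps > 0);
    return buffer_bytes * 8 * 1000 / link_bps;
}
```

For example, a full 256 KiB buffer (262144 bytes) in front of a 1 Mbit/s uplink works out to about 2097 ms of extra latency, whereas 64 KiB in front of a 100 Mbit/s link is only about 5 ms. The same buffer size can be harmless on a fast link and painful on a slow one, which is one reason simply shrinking every buffer is not a fix.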

He investigated the behavior of Linux, OS X and Windows (the systems he had available). I wanted to have a quick look at the situation in FreeBSD regarding this, but it seems that, at least with my network card, I am not able to find the corresponding size of the buffers in the driver in 30 seconds.

I think it would be very good if this issue were investigated in FreeBSD and, apart from maybe taking some action in the source, someone also wrote a section for the handbook which explains the issue (one problem here is that there are situations where you want/need to have such big buffers, and as such we cannot just downsize them) and how to benchmark and tune this.

Unfortunately I have too much on my plate to look into this any further. :( I hope one of the network people in FreeBSD picks up the ball and starts playing.