Category Archives: Linux

Meraki Sparky boards, and constant resetting

There's a Mesh internet project at Sudo Room and they've been doing some great work getting a platform up and running. However, like a lot of volunteer projects, they're working with whatever time and equipment they've been donated.

A few months ago they were donated a few hundred Meraki Sparky boards. They're an Atheros AR2317 SoC based device with an integrated 2GHz 802.11bg radio, 10/100 ethernet and.. well, a hardware watchdog that resets the board after five minutes.

Now, annoyingly, this reset occurs inside of Redboot too - which precludes them from being (fully) flashed before the unit reboots. Once the unit was flashed with OpenWRT, the unit still reboots every five minutes.

So, I started down the path of trying to debug this.

What did I know?

Firstly, the AR2317 watchdog doesn't have a way of resetting things itself - instead, all it can do is post an interrupt. The AR7161 and later SoCs do indeed have a way to do a full hardware reset if the watchdog is tickled.

Secondly, redboot has a few tricksy ways to manipulate the hardware:

  • 'x' can examine registers. Since we need them in KSEG1 (unmapped, uncached) then the reset registers (0x11000xxx becomes 0xb1000xxx.) Since its hardware access, we should do them as DWORDS and not bytes.
  • 'mfill' can be used to write to registers.
Thirdly, there's an Atheros specific command - bdshow - which is surprisingly informative:

RedBoot> bdshow
name:     Meraki Outdoor 1.0
magic:    35333131
cksum:    2a1b
rev:      10
major:    1
minor:    0
pciid:    0013
wlan0:    yes 00:18:0a:50:7b:ae
wlan1:    no  00:00:00:00:00:00
enet0:    yes 00:18:0a:50:7b:ae
enet1:    no  00:00:00:00:00:00
uart0:    yes
sysled:   no, gpio 0
factory:  no, gpio 0
serclk:   internal
cpufreq:  calculated 184000000 Hz
sysfreq:  calculated 92000000 Hz
memcap:   disabled
watchdg:  disabled (WARNING: for debugging only!)

serialNo: Q2AJYS5XMYZ8
Watchdog Gpio pin: 6
secret number: e2f019a200ee517e30ded15cdbd27ba72f9e30c8


.. hm. Watchdog GPIO pin 6? What's that?

Next, I tried manually manipulating the watchdog registers but nothing actually happened.

Then I wondered - what about manipulating the GPIO registers? Maybe there's a hardware reset circuit hooked up to GPIO 6 that needs to be toggled to keep the board from resetting.

Board: ap61
RAM: 0x80000000-0x82000000, [0x8003ddd0-0x80fe1000] available
FLASH: 0xa8000000 - 0xa87e0000, 128 blocks of 0x00010000 bytes each.
== Executing boot script in 2.000 seconds - enter ^C to abort
^C
RedBoot> # set direction of gpio6 to out
RedBoot> mfill -b 0xb1000098 -l 4 -p 0x00000043
RedBoot> x -b 0xb1000098
B1000098: 00 00 00 43 00 00 00 00  00 00 00 00 00 00 00 03  |...C............|
B10000A8: FF EF F7 B9 7D DF 5F FF  00 00 00 00 00 00 00 00  |....}._.........|

RedBoot> # pat gpio6 - set it high, then low.
RedBoot> mfill -b 0xb1000090 -l 4 -p 0x00000042
RedBoot> mfill -b 0xb1000090 -l 4 -p 0x00000002

.. then I manually did this every minute or so.

RedBoot>
RedBoot> mfill -b 0xb1000090 -l 4 -p 0x00000042
RedBoot> mfill -b 0xb1000090 -l 4 -p 0x00000002
RedBoot> mfill -b 0xb1000090 -l 4 -p 0x00000042
RedBoot> mfill -b 0xb1000090 -l 4 -p 0x00000002

.. so, the solution here seems to be to "set gpio6 to be output", then "pat it every 60 seconds."

I hope this helps people bring OpenWRT up on this board finally. There seems to be a few of them out there!

Lin­ux­u­la­tor explained: How to create Linux binaries on FreeBSD

There may by cases where you want to generate a Linux binary on a FreeBSD machine. This is not a problem with the linuxulator, but not with the default linux_base port.

As you may know, the linux_base port is designed to deliver an integrated experience with FreeBSD native programs. As such some parts of the native FreeBSD infrastructure is used. If you would try to use a Linux–compiler to generate Linux–binaries, you would run into the problem that by default the FreeBSD includes are used.

Prerequisites

To have a fully featured and non-integrated Linux environment on your FreeBSD system either mount an existing (and compatible) Linux installation somewhere into your FreeBSD system, or install a linux_dist port. This can be done additionally to an already installed linux_base port.

Preparation

When you have a complete Linux environment available, you need to mount the FreeBSD devfs to /path/to/complete_linux/dev, linprocfs to /path/to/complete_linux/proc and linsysfs to /path/to/complete_linux/sys to have a complete setup.

Use it

Now you just need to chroot into this  /path/to/complete_linux and you configure/make/install or whatever you need to do to generate your desired Linux binary.

Bookmark/FavoritesShare/Save

Fixing Shifted-Arrow Keys in 256-Color Terminals on Linux

The terminfo entry for “xterm-256color" that ships by default as part of ncurses-base on Debian Linux and its derivatives is a bit annoying. In particular, shifted up-arrow key presses work fine in some programs, but fail in others. It's a bit of a gamble if Shift-Up works in joe, pico, vim, emacs, mutt, slrn, or what have you.

THis afternoon I got bored enough of losing my selected region in Emacs, because I forgot that I was typing in a terminal launched by a Linux desktop. SO I thought "what the heck... let's give the FreeBSD termcap entry for xterm-256color a try":

keramida> scp bsd:/etc/termcap /tmp/termcap-bsd
keramida> captoinfo -e $(                                  \
  echo $( grep '^xterm' termcap | sed -e 's/[:|].*//' ) |  \
  sed -e 's/ /,/g'                                         \
  ) /tmp/termcap  > /tmp/terminfo.src
keramida> tic /tmp/terminfo.src

Restarted my terminal, and quite unsurprisingly, the problem of Shift-Up keys was gone.

The broken xterm-256color terminfo entry from /lib/terminfo/x/xterm-256color is now shadowed by ~/.terminfo/x/xterm-256color, and I can happily keep typing without having to worry about losing mental state because of this annoying little misfeature of Linux terminfo entries.

The official terminfo database sources[1], also work fine. So now I think some extra digging is required to see what ncurses-base ships with. There's definitely something broken in the terminfo entry of ncurses-base, but it will be nice to know which terminal capabilities the Linux package botched.

Notes:
[1] http://invisible-island.net/ncurses/ncurses.faq.html#which_terminfo


Filed under: Computers, Emacs, Free software, FreeBSD, GNU/Linux, Linux, Open source, Software Tagged: Computers, Emacs, Free software, FreeBSD, GNU/Linux, Linux, Open source, Software

Mutt-like Scrolling for Gnus

Mutt scrolls the index of email folders up or down, one line at a time, with the press of a single key: ‘<’ or ‘>’. This is a very convenient way to skim through email folder listings, so I wrote a small bit of Emacs Lisp to do the same in Gnus tonight.

;;;
;; Scrolling like mutt for group, summary, and article buffers.
;;
;; Being able to scroll the current buffer view by one line with a
;; single key, rather than having to guess a random number and recenter
;; with `C-u NUM C-l' is _very_ convenient.  Mutt binds scrolling by one
;; line to '<' and '>', and it's something I often miss when working
;; with Gnus buffers.  Thanks to the practically infinite customizability
;; of Gnus, this doesn't have to be an annoyance anymore.

(defun keramida-mutt-like-scrolling ()
  "Set up '<' and '>' keys to scroll down/up one line, like mutt."
  ;; mutt-like scrolling of summary buffers with '<' and '>' keys.
  (local-set-key (kbd ">") 'scroll-up-line)
  (local-set-key (kbd "<") 'scroll-down-line))

(add-hook 'gnus-group-mode-hook 'keramida-mutt-like-scrolling)
(add-hook 'gnus-summary-mode-hook 'keramida-mutt-like-scrolling)
(add-hook 'gnus-article-prepare-hook 'keramida-mutt-like-scrolling)

This is now the latest addition to my ~/.gnus startup code, and we're one step closer to making Gnus behave like my favorite old-time mailer.


Filed under: Computers, Emacs, Email, Free software, FreeBSD, GNU/Linux, Gnus, Linux, Open source, Programming, Software Tagged: Computers, Emacs, Email, Free software, FreeBSD, GNU/Linux, Gnus, Linux, Open source, Programming, Software

Alexander Leidinger

Seems I forgot to announce that the linux_base-c6 is in the Ports Collection now. Well, it is not a replacement for the current default linux base, the linuxulator infrastructure ports are missing and we need to check if the kernel supports enough of 2.6.18 that nothing breaks.

TODO:

To my knowledge, nobody is working on anything of this. Anyone is welcome to have a look and provide patches.

Share

The initial introduction into "it’s the NIC, stupid!"

I've finally hit that dreaded condition which hardware guys hate - where a NIC is behaving badly.

In my case, an IBM/Lenovo Thinkpad T60 has been modified (not by me) to take an Atheros AR9280 NIC. Unfortunately, the NIC was proving to be very unstable when doing 802.11n throughput. The investigations did show I was doing something slightly incorrect with TX descriptors (and I've since fixed that) but the stability issues remained.

The Atheros NICs can expose some host interface error conditions via the AR_INTR_SYNC_CAUSE register. These include PCI(e) transaction timeouts, illegal chip access (eg whilst the MAC is asleep), parity errors, and other rather nice things. FreeBSD's HAL and Linux ath9k does have the register definition for what the bits do - but unfortunately we don't keep statistics.

In my particular case:
  • I'd see AR_INTR_SYNC_LOCAL_TIMEOUT occur. This is because a PCI(e) transaction didn't complete in time. I can tune these timeouts via a local register but that's not the point - I was seeing these errors when receiving only beacons from the access point. That's a bit silly.
  • I'd also see AR_INTR_SYNC_RADM_CPL_DLLP_ABORT, which is an indication that the PCIe layer isn't behaving well.

I swapped it out with another AR9280 based NIC and suddenly all the instabilities have gone away. No TX hangs, no missed TX interrupts. Everything looks great.

So as an open source developer, I want to try and put some tools into the hands of the community to be able to debug what's going on - or, if that's not possible, at least get some indication that things are going wrong. Right now the only thing people see is "I see TX timeouts, it must be the driver/chip fault." There's too much going on to be able to conclude that.

My game plan is this:

  • Implement statistics keeping for each of the SYNC interrupts and expose those via a diagnostic interface. Ben Grear has done something similar for Linux ath9k after a private email discussion. He's also seeing MAC sleep accesses, so it's quite likely we'll start finding/squishing these.
  • Take the offending laptop/NIC to the office and attach it to a very expensive and fancy looking PCIe analyser. I'm hoping we'll find something really silly occuring - like lots of sleep state transitions, or a high number of parity errors.
  • Try documenting this a lot better so users are able to understand what's going on when their NIC is misbehaving.

New CentOS linux_base for testing soonish

It seems my HOWTO create a new linux_base port was not too bad. There is now a PR for a CentOS 6 based linux_base port. I had a quick look at it and it seems that it is nearly usable to include into the Ports Collection (the SRPMs need to be added, but that can be done within some minutes).

When FreeBSD 8.3 is released and the Ports Collection open for sweeping commits again, I will ask portmgr to do a repo-copy for the new port and commit it. This is just the linux_base port, not the complete infrastructure which is needed to completely replace the current default linuxulator userland. This is just a start. The process of switching to a more recent linux_base port is a long process, and in this case depends upon enough support in the supported FreeBSD releases.

Attention: Anyone installing the port from the PR should be aware that using it is a highly experimental task. You need to change the linuxulator to impersonate himself as a linux 2.6.18 kernel (described in the pkg-message of the port), and the code in FreeBSD is far from supporting this. Anyone who wants to try it is welcome, but you have to run FreeBSD-current as of at least the last weekend, and watch out for kernel messages about unsupported syscalls. Reports to [email protected] please, not here on the webpage.

Share

New opportunities in the linuxulator

Last weekend I committed some dummy-syscalls to the linuxulator in FreeBSD-current. I also added some comments to syscalls.master which should give a hint which linux kernel had them for the first time (if the linux man–page I looked this up in is correct). So if someone wants to experiment with a higher compat.linux.osrelease than 2.6.16 (as it is needed for a CentOS based linux_base), he should now get some kernel messages about unimplemented syscalls instead of a silent failure.

There may be some low-hanging fruits in there, but I did not really verify this by checking what the dummy syscalls are supposed to do in linux and if we can easily map this to existing FreeBSD features. In case someone has a look, please send an email to emulation@FreeBSD.org.

Share

.. and the price of packaging up software? Billions.

A friend of mine from Perth recently wrote an article outlining the "cost" of Debian Wheezy:

http://blog.james.rcpt.to/2012/02/13/debian-wheezy-us19-billion-your-price-free/

To quote:

"In my analysis the projected cost of producing Debian Wheezy in February 2012 is US$19,070,177,727 (AU$17.7B, EUR€14.4B, GBP£12.11B), making each package’s upstream source code wrth an average of US$1,112,547.56 (AU$837K) to produce. Impressively, this is all free (of cost)."

Now this has apparently caused a bit of a stir among Linux and IT news sites. It's a large number, right? It's all free, right?

However - Debian for the most part is a package repository. Sure, a lot of effort goes into building and maintaining that - and I think _that_ should be assigned a cost - but I think counting all of those package upstreams as part of Debian is hiding the true nature of software development.

According to Ohloh, the cost of producing FreeBSD, at $72,000 a year per programmer, would be $ 243,777,135, _before_ all of the packages in the FreeBSD ports repository. FreeBSD has over 23,000 packages too, just like Debian - if those were also counted, it may start to push that figure far up from millions to billions.

Does it mean Debian Wheezy is equivalent of $20B of effort? Maybe. But then a tiny (comparatively) more effort and you end up with FreeBSD. Or, with different effort - Redhat. Or Gentoo. Or Mandrake. Or NetBSD. Or OpenBSD. Or MacPorts/Fink, which packages this similar software for MacOS X.

What I guess I'm trying to say is this. You get cool stuff from programmers for free. Including that in your project "cost" just seems silly.

A phoronix benchmark creates a huge benchmarking discussion

The recent Phoronix benchmark which compared a release candidate of FreeBSD 9 with Oracle Linux Server 6.1 created a huge discussion in the FreeBSD mailinglists. The reason was that some people think the numbers presented there give a wrong picture of FreeBSD. Partly because not all benchmark numbers are presented in the most prominent page (as linked above), but only at a different place. This gives the impression that FreeBSD is inferior in this benchmark while it just puts the focus (for a reason, according to some people) on a different part of the benchmark (to be more specific, blogbench is doing disk reads and writes in parallel, FreeBSD gives higher priority to writes than to reads, FreeBSD 9 outperforms OLS 6.1 in the writes while OLS 6.1 shines with the reads, and only the reads are presented on the first page). Other complaints are that it is told that the default install was used (in this case UFS as the FS), when it was not (ZFS as the FS).

The author of the Phoronix article participated in parts of the discussion and asked for specific improvement suggestions. A FreeBSD committer seems to be already working to get some issues resolved. What I do not like personally, is that the article is not updated with a remark that some things presented do not reflect the reality and a retest is necessary.

As there was much talk in the thread but not much obvious activity from our side to resolve some issues, I started to improve the FreeBSD wiki page about benchmarking so that we are able to point to it in case someone wants to benchmark FreeBSD. Others already chimed in and improved some things too. It is far from perfect, some more eyes — and more importantly some more fingers which add content — are needed. Please go to the wiki page and try to help out (if you are afraid to write something in the wiki, please at least tell your suggestions on a FreeBSD mailinglist so that others can improve the wiki page).

What we need too, is a wiki page about FreeBSD tuning (a first step would be to take the man-page and convert it into a wiki page, then to improve it, and then to feed back the changes to the man-page while keeping the wiki page to be able to cross reference parts from the benchmarking page).

I already told about this in the thread about the Phoronix benchmark: everyone is welcome to improve the situation. Do not talk, write something. No matter if it is an improvement to the benchmarking page, tuning advise, or a tool which inspects the system and suggests some tuning. If you want to help in the wiki, create a FirstnameLastname account and ask a FreeBSD comitter for write access.

A while ago (IIRC we have to think in months or even years) there was some framework for automatic FreeBSD benchmarking. Unfortunately the author run out of time. The framework was able to install a FreeBSD system on a machine, run some specified benchmark (not much benchmarks where integrated), and then install another FreeBSD version to run the same benchmark, or to reinstall the same version to run another benchmark. IIRC there was also some DB behind which collected the results and maybe there was even some way to compare them. It would be nice if someone could get some time to talk with the author to get the framework and set it up somewhere, so that we have a controlled environment where we can do our own benchmarks in an automatic and repeatable fashion with several FreeBSD versions.

Share

Concatenating Video Files with MPlayer

I found out how to concatenate the parts of a video to a single file with MPlayer. It’s relatively easy, so this is just a mini-post to save it for posterity:

$ cat video_part1.avi video_part2.avi ... > temp.avi
$ mencoder -forceidx -oac copy -ovc copy \
    temp.avi -o final.avi && \
    rm temp.avi

Filed under: Computers, Free software, FreeBSD, Linux, Open source, Software Tagged: Computers, Free software, FreeBSD, hellug, Linux, Open source, Software

Adventures with FOG project, aka "All software sucks"

We needed to uplift 80+ desktop PCs from a pile of cheap parts to usable state in about two weeks. Nothing to it, right? Mass installations and cloning are done every day, right? Well, no. I basically had two projects to pick from - Clonezilla and FOG, and picked FOG because Clonezilla is old and clunky, while FOG has a relatively nice and powerful web interface. As it turned out, either of them would have collapsed under the task...

Read more...

Adventures with FOG project, aka "All software sucks"

[this post is basically me venting after an all-nighter, it's way too harsh than it should have been]

An alternative title to this could be "if something looks too good to be true, it usually is."

We needed to uplift 80+ desktop PCs from a pile of cheap parts to usable state in about two weeks. Nothing to it, right? Mass installations and cloning are done every day, right? Well, no. I basically had two projects to pick from - Clonezilla and FOG, and picked FOG because Clonezilla is old and clunky, while FOG has a relatively nice and powerful web interface. As it turned out, either of them would have collapsed under the task...

Read more...

Unit Testing Uncovers Bugs

As part of the ‘utility’ library in one of the projects we are using at work, I wrote two small wrappers around strtol() and strtoul(). These two functions support a much more useful error reporting mechanism than the plain atoi() and atol() functions, but getting the error checking right in all the places they are called is a bit boring and cumbersome. This is probably part of the reason why there are still programs out there that use atoi() and atol().

For example here’s how I usually check for errors in calls to the strtol() and strtoul() functions:

char *endp;
long x;

endp = NULL;
errno = 0;
x = strtol(str, &amp;endp, base);
if (errno != 0 || (endp != NULL && *endp != '\0' &&
    (isdigit(*endp) != 0 || isspace(*endp) == 0)))
        /* Return 'endp' if possible. */
        return -1;
}
/* At this point 'x' contains the parsed value. */

This is a lot of code for parsing a single long value. For one or two input strings it may be ok to repeat the code in the places where the numeric parsing code is needed. For more than a couple of input strings it really feels boring to repeat this code again and again.

When I set out to write the wrapper code for strtol() and strtoul() my goal was to make it very easy to parse input strings. A typical call to the parsing function should be a single line of code; it should be very clear if the parsing attempt succeeded or failed; it should also be possible to get both the parsing success or failure and the numeric value we just parsed; it should also be possible to get hold of the last character we managed to parse, so that strings like "100 200 300" can be parsed efficiently without having to manually find where the textual representation of the first number ends or the second one starts.

That's quite a list of goals for a single function, but the function call style I envisioned looked something like this:

long value;
char *endp = NULL;

if (parselong("0x12345678", &endp, 16, &value) != 0) {
        err(1, "parse error");
}

The return value of parselong() makes it very clear if the parsing attempt succeeded or failed. A return value of zero means success. Any other return value means failure.

The parsed value is returned through the &value pointer. If the parsing attempt has failed parselong() can leave the value unmodified to avoid inflicting spurious side-effects to its calling code because of a failed attempt to parse an input string.

If the parsing attempt has succeeded, &endp may be set to point right after the last character that was successfully parsed. This is actually part of the documented interface of strtol() and strtoul(), so it comes for free by wrapping these functions.

Finally, parsing a long value is a single function call. It is a lot easier to call the parsing function without having to repeat all the error checking boilerplate at each calling site. It's even easy to "chain" multiple parsing attempts using a style similar to:

long value1, value2, value3;

if (parselong("0x12345678", NULL, 16, &value1) != 0 ||
    parselong("0xdeadbeef", NULL, 16, &value2) != 0 ||
    parselong("0xf00fc0de", NULL, 16, &value3) != 0)
        err(1, "parse error");

Not that this is a good style of reporting errors, but it is possible, just because it's now easy to parse a value and check if it was parsed correctly with a single line of code.

The Unit Tests Fail on Linux

Several months passed after I wrote the initial parselong() and parseulong() functions. In the meantime I had to port the program using them to other platforms. The initial target platform was FreeBSD.

This is a bug that lurked for a few months in the initial code of parselong() until I had to port the function to another platform and started writing unit tests to verify that it works the way I expected it to work on all possible systems. In retrospect I should have started by writing the unit tests, but that's something I can say now because I finally got around to doing it and they did serve a very useful purpose.

When I had to port my 'utility' functions to work on several Linux versions too, I wrote a collection of unit tests for parselong() and parseulong(). The testing framework I used was CUnit because of the way it nicely integrates with plain ANSI C code.

One of the test functions I wrote was supposed to check for failures returned by parselong() for invalid input strings. The bulk of the test function was something like this:

#include "CUnit/Basic.h"

void
test_parselong_failures(void)
{
        long value = TEST_VALUE_ULONG_MAGIC;

        CU_ASSERT_EQUAL(parselong("xxx", NULL, 0, &value), -1);
        CU_ASSERT_EQUAL(value, TEST_VALUE_ULONG_MAGIC);

        CU_ASSERT_EQUAL(parselong("+", NULL, 0, &value), -1);
        CU_ASSERT_EQUAL(value, TEST_VALUE_ULONG_MAGIC);

        CU_ASSERT_EQUAL(parselong("-", NULL, 0, &value), -1);
        CU_ASSERT_EQUAL(value, TEST_VALUE_ULONG_MAGIC);
        ...
        CU_PASS("parselong() failures for invalid values look ok");
}

Running the unit tests on FreeBSD seemed to work fine. After all the initial version of the parselong() function had been manually tested with the same input strings earlier.

When I tried running the same test cases on Linux though, they failed. Apparently parselong() was not detecting that strtol() failed to parse the input string "xxx" or any other input strings from the ones tested in the test_parselong_failures() function!

The Bug Uncovered

Adding a couple of debugging printf() calls to parselong() itself showed that on Linux parselong() was returning zero for invalid input strings when strtol() could parse no character at all from the input string.

The initial version of the error checking code for strtol() was similar to:

char *endp;
long x;

endp = NULL;
errno = 0;
x = strtol(str, &endp, base);
if (errno != 0 || (endp != NULL && endp != str && *endp != '\0' &&
    (isdigit(*endp) != 0 || isspace(*endp) == 0)))
        /* Return 'endp' if possible. */
        return -1;
}
/* At this point 'x' contains the parsed value. */

The highlighted part (endp != str) of the error checking code assumes that strtol() will move the 'endp' pointer at least one character after the start of the input string. Apparently on Linux this is not the case. The strtol() function of Linux does not move 'endp' at all if it cannot parse even a single character of the input string. This seems to be the correct behavior for strtol(), but it was hidden for a while, lurking in the original parselong() code, until I ran the unit tests of the function on Debian GNU/Linux.

The CUnit driver program that I used to run the test cases failed on Linux with error messages like:

  1. test_parselong.c:63  - CU_ASSERT_EQUAL(parselong("xxx", NULL, 0, &value),-1)
  2. test_parselong.c:64  - CU_ASSERT_EQUAL(value, TEST_VALUE_ULONG_MAGIC)
  3. test_parselong.c:66  - CU_ASSERT_EQUAL(parselong("+", NULL, 0, &value), -1)
  4. test_parselong.c:67  - CU_ASSERT_EQUAL(value, TEST_VALUE_ULONG_MAGIC)

The culprit for these test case failures was the assumption that Linux would set errno to a non-zero value for an invalid input string... Apparently, it doesn't. The following small program prints different output on BSD vs. Linux:

$ cat -n strtest.c
     1  #include <errno.h>
     2  #include <limits.h>
     3  #include <stdio.h>
     4  #include <stdlib.h>
     5
     6  int
     7  main(void)
     8  {
     9          long value;
    10          const char *input = "xxx";
    11          char *endp = NULL;
    12
    13          errno = 0;
    14          value = strtol(input, &endp, 0);
    15          printf("str = %p = \"%s\"\n", input, input);
    16          printf("endp = %p \"%s\"\n", endp, endp ? endp : "(null)");
    17          if (endp != NULL) {
    18                  printf("endp[0] = '%c' (%d 0%03o #x%02x)\n",
    19                    *endp, *endp, *endp, *endp);
    20          }
    21          printf("errno = %d\n", errno);
    22          printf("value = %ld 0%lo #x%lx\n", value, value, value);
    23          return EXIT_SUCCESS;
    24  }

On FreeBSD the output of this program includes an errno value of EINVAL:

freebsd$ cc strtest.c
freebsd$ ./a.out
str = 0x8048604 = "xxx"
endp = 0x8048604 "xxx"
endp[0] = 'x' (120 0170 #x78)
errno = 22
value = 0 00 #x0
freebsd$ fgrep 22 /usr/include/sys/errno.h
#define EINVAL          22              /* Invalid argument */
freebsd$

On a recent update of Debian GNU/Linux "testing" the output is slightly different:

debian$ cc strtest.c
debian$ ./a.out
str = 0x8048630 = "xxx"
endp = 0x8048630 "xxx"
endp[0] = 'x' (120 0170 #x78)
errno = 0
value = 0 00 #x0
debian$

This means that the only indication we have that the Linux version of strtol() failed to parse some of the input text is the value of 'endp': it's the same as the input string. The error-checking code of the original parselong() wrapper was:

        x = strtol(str, &endp, base);
        if (errno != 0 || (endp != NULL && endp != str && *endp != '\0' &&
            (isdigit(*endp) != 0 || isspace(*endp) == 0)))
                error(...);

But on Linux both of the following are true:

  • errno is not set to a non-zero value.
  • If strtol() could not parse even one input character, endp == str.

This caused parselong() to bypass the error checking code, and try to return a 'valid' result even tough the Linux strtol() version has failed. Hence the failure of the unit tests.

Removing the (endp != str) conditional expression means that the error checking code works equally well on Linux and BSD. The BSD version of strtol() returns a non-zero errno value, triggerring the first part of the error checking code. The Linux version returns an endp pointer that is non-null and fails the '\0' check later on. The new parselong() function is slightly shorter and it passes the unit tests on both BSD and Linux.

Conclusions

There is something thrilling about fixing bugs by removing code. This bug was one of the few cases I've come across during the last couple of months where removing code was an improvement. There's probably a joke about "writing too much code" and the bug-resolving debt each line of new code introduces. I think I'll leave that for another time though.

The most important conclusion of today's bug hunting session was that Unit Testing really does work and it pays back in real, quite tangible ways. Had I not spent a bit of time to think about what the parselong() and parseulong() functions are supposed to do, when they are supposed to fail and how they are allowed to fail, I would not spent the time to write test cases for them. Had I not written the test cases, I wouldn't notice there is a failing test case on Linux. Had I not seen that I wouldn't realize some times the two functions were returning completely bogus results on Linux systems.

The central place the unit testing code has in this story is an important and serious lesson for me:

KEEP TESTING!

Filed under: Computers, FreeBSD, GNU/Linux, Linux, Programming, Software Tagged: Computers, FreeBSD, GNU/Linux, hellug, Linux, Programming, Software, testing

Emprisonner une debian dans un FreeBSD

Parfois on a besoin de tester des choses sous linux, parfois on a besoin d'utiliser des applications linux-only, où pour tout un tas d'autre raison on a besoin de faire tourner un linux.

Pour ça c'est vrai que l'on peut virtualiser dans un virtualbox ou un qemu, c'est bien mais c'est quand même coûteux en terme de ressources pour le host.

Sous FreeBSD nous disposons du linuxulator c'est la couche d'emulation permettant de faire tourner des applications Linux sous FreeBSD.

Ni une ni deux je me dis qu'il est possible de faire de petites choses sympathiques avec ça : une jail linux-only.

Il faut d'abord préparer le terrain :

# mkdir /home/jails/debian
# mkdir /home/jails/debian/dev
# mkdir /home/jails/debian/proc
# mkdir /home/jails/debian/sys
# kldload linux
# kldload linprocfs
# kldload linsysfs
# kldload lindev
# mount -t devfs none /home/jails/debian/dev
# mount -t linprocfs none /home/jails/debian/proc
# mount -t linsysfs none /home/jails/debian/sys

Nous allons donc utiliser /home/jails/debian comme racine de la debian que nous allons installer.

Nous chargeons tous les pilotes nécessaires (notez que lindev est apparu en FreeBSD 9 et a été MFCed en 8-STABLE il n'est absolument pas obligatoire)

On pourrait faire l'installation avec debootstrap, mais comme je suis un feignant j'ai préféré utiliser un template openvz tout fait :

# fetch http://download.openvz.org/template/precreated/debian-5.0-x86.tar.gz

Puis le depacker dans ma jail :

# tar xvfp debian-5.0-x86.tar.gz -C debian --exclude dev* --exclude proc* --exclude sys*

Pour démarrer correctement la jail il faut qu'au minimum un service tourne dans la jail (je n'ai pas réussit à faire une jail linux-only persistente). Par défaut le script de démarrage des jails essaye de lancer /etc/rc que nous allons créer et de lancer /etc/rc.shutdown pour s'arrêter.

# echo "/etc/init.d/cron start" > /home/jails/debian/etc/rc
# chmod 755 /home/jails/debian/etc/rc
# echo "/etc/init.d/cron stop" > /home/jails/debian/etc/rc.shutdown
# chmod 755 /home/jails/debian/etc/rc.shutdown

dans /etc/rc.conf on règle le lancement de la jail

jail_debian_rootdir=/home/jails/debian
jail_debian_hostname="debian"
jail_debian_ip="192.168.1.3"
jail_debian_interface="nfe0"
jail_debian_devfs_enable="YES"
jail_debian_devfs_ruleset="devfsrules_jail"
jail_debian_flags="-n debian"

on démarre la jail :

# /etc/rc.d/jail start debian

et magie :

#jls
   JID  IP Address      Hostname                      Path
    15  192.168.1.3     debian                        /home/jails/debian
#jexec debian uname -a
Linux debian 2.6.16 FreeBSD 8.0-STABLE #3: Sun Jan 10 20:39:38 CET 2010 i686 GNU/Linux
#jexec debian cat /etc/debian_version
5.0.4

Vous voila avec une belle debian emprisonnée dans un freebsd.

Attention, tout ne fonctionne pas parfaitement : sysklogd ne marche pas à cause des accès /dev par exemple, mais c'est 99% fonctionnel quand même.

Using the Condensed DejaVu Font-Family Variant by Default

The DejaVu font family is a very popular font collection for Linux and BSD systems. The font package of DejaVu includes a condensed variant; a variation of the same basic font theme that sports narrower characters.

The difference between the two font variants is very easy to spot when they are displayed side by side. The following image shows a small part of a Firefox window, displaying news articles as part of a Google Reader session:

DejaVu Font Family Variants

The window snapshot on the left of this image shows the normal variant of the DejaVu Sans font family. The right-hand snapshot shows the condensed variant of the font family.

I usually to prefer the condensed variant for the display of text on a computer monitor. The narrower characters, with a height that is slightly larger than the width of each glyph, are more æsthetically pleasing for my eyes. Naturally, this is only a matter of personal preference; the normal variant may look and feel more pleasing to some other person. If your own preference leans towards the normal variant of the font, this article may not be very interesting to you, so it is probably ok if you stop reading now.

If you are one of those people who like the condensed variant of the DejaVu font family more than the normal variant though, by all means, keep reading. The “hack” I am going to describe uses a configuration tweak of the fontconfig package to forcibly replace all instances of the DejaVu Sans and the DejaVu Serif fonts with their condensed version.

The Main Problem with Firefox and Condensed Fonts

An interaction between fontconfig and the way some programs select font variants means that Firefox, OpenOffice and a few other programs cannot display the condensed variant in their GTK+ font selection dialog. Firefox, for example, shows only “DejaVu Sans”:

Firefox Font Selection Dialog

As a result, it is impossible to use the font selection dialog of Firefox to configure the condensed font variant as the default font for web content. This kept annoying me for a while, but not enough to actually do something about it. I always thought it would be much better if we could select either font variant, but kept saying to myself that “this may eventually be fixed”. Since this is a long standing bug that has not been fixed in Firefox or the other programs that exhibit the same misbehavior, I decided this morning to forcefully substitute all instances of DejaVu Sans with DejaVu Sans Condensed and all instances of DejaVu Serif with DejaVu Serif Condensed in my FreeBSD/Gnome desktop.

What Could Work but is Not a Good Idea

One way to do this is, of course, by manually replacing the TrueType font files of the non-condensed font variants with their condensed counterparts (e.g. by logging in as the system administrator and overwriting the non-condensed font files). I didn’t want to go that way. Manually modifying the installed versions of files registered into the package database is a bad idea and very ugly hack, because it will stop working the next time the same package is installed. So I chose to read a bit more about fontconfig and see if I could do the same without all the smelly hackery of overwriting font files.

Using Fontconfig Might be a Better Idea

Fontconfig is a library that enables system-wide and per-user configuration, customization and application access to font files. The personal fontconfig configuration of each user is stored in a file called .fonts.conf, in the home directory of each user. The format of this file is defined by a relatively "simple" XML schema. Some of the options supported by the fontconfig schema are described in the fontconfig manual. The quotes around "simple" are there because once you see a few examples of fontconfig tweaks, it is not very hard to come up with similar configuration tweaks, but the precise format and syntax of all the options supported by the syntax is, alas, not very intuitive.

When the .fonts.conf exists in your home directory it has the following general format:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
...
</fontconfig>

Font options that apply to your personal fontconfig setup go inside the <fontconfig>...</fontconfig> element. There are many sorts of options that can be placed inside this XML element, but it is not the intent of this article to describe all of them. For a full list of the options, you should read the fontconfig manual.

There are only three fontconfig options that we are interested in to install the condensed font tweak: <match>, <test> and <edit>. Using these three, we can forcefully replace all uses of the DejaVu fonts with their condensed versions, by adding the following XML snippet to ~/.fonts.conf:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">

<fontconfig>
  <match target="pattern">
    <test name="family" qual="any">
      <string>DejaVu Sans</string>
    </test>
    <edit mode="assign" name="family">
      <string>DejaVu Sans Condensed</string>
    </edit>
  </match>

  <match target="pattern">
    <test name="family" qual="any">
      <string>DejaVu Serif</string>
    </test>
    <edit mode="assign" name="family">
      <string>DejaVu Serif Condensed</string>
    </edit>
  </match>
</fontconfig>

XML is a very verbose and chatty format, but what these two small snippers of XML configuration do should be easy to understand:

  • When an application asks for a font whose family matches "DejaVu Sans", return a font from the "DejaVu Sans Condensed" variant.
  • When an application asks for a font whose family matches "DejaVu Serif", return a font from the "DejaVu Serif Condensed" variant.

That's it. Now every time an application asks for a font of the DejaVu family, fontconfig will always return a condensed variant of the font. The font-variant replacement is done transparently by fontconfig, so you don't have to configure each application separately, or to install application-specific hacks that will work for one application but fail or be invisible to all other programs running on your desktop.

The downside of this forceful font-variant replacement is that it is now impossible to select the non-condensed variants of the DejaVu fonts. The good thing is that, at least in my case, this is "ok" and I can certainly live with it, because I always prefer the condensed variant for this particular font family.


Posted in Computers, Free software, FreeBSD, GNOME, GNU/Linux, Linux, Open source, Software Tagged: Computers, Free software, FreeBSD, GNOME, GNU/Linux, hellug, Linux, Open source, Software

fts(3) or Avoiding to Reinvent the Wheel

One of the C programs I was working on this weekend had to find all files that satisfy a certain predicate and add them to a list of “pending work”. The first thing that comes to mind is probably a custom opendir(), readdir(), closedir() hack. This is probably ok when one only has these structures, but it also a bit cumbersome. There are various sorts of DIR and dirent structures and recursing down a large path requires manually keeping track of a lot of state.

A small program that uses opendir() and its friends to traverse a hierarchy of files and print their names may look like this:

% cat -n opendir-sample.c
     1  #include <sys/types.h>
     2
     3  #include <sys/stat.h>
     4
     5  #include <assert.h>
     6  #include <dirent.h>
     7  #include <limits.h>
     8  #include <stdio.h>
     9  #include <string.h>
    10
    11  static int      ptree(char *curpath, char * const path);
    12
    13  int
    14  main(int argc, char * const argv[])
    15  {
    16          int k;
    17          int rval;
    18
    19          for (rval = 0, k = 1; k < argc; k++)
    20                  if (ptree(NULL, argv[k]) != 0)
    21                          rval = 1;
    22          return rval;
    23  }
    24
    25  static int
    26  ptree(char *curpath, char * const path)
    27  {
    28          char ep[PATH_MAX];
    29          char p[PATH_MAX];
    30          DIR *dirp;
    31          struct dirent entry;
    32          struct dirent *endp;
    33          struct stat st;
    34
    35          if (curpath != NULL)
    36                  snprintf(ep, sizeof(ep), "%s/%s", curpath, path);
    37          else
    38                  snprintf(ep, sizeof(ep), "%s", path);
    39          if (stat(ep, &st) == -1)
    40                  return -1;
    41          if ((dirp = opendir(ep)) == NULL)
    42                  return -1;
    43          for (;;) {
    44                  endp = NULL;
    45                  if (readdir_r(dirp, &entry, &endp) == -1) {
    46                          closedir(dirp);
    47                          return -1;
    48                  }
    49                  if (endp == NULL)
    50                          break;
    51                  assert(endp == &entry);
    52                  if (strcmp(entry.d_name, ".") == 0 ||
    53                      strcmp(entry.d_name, "..") == 0)
    54                          continue;
    55                  if (curpath != NULL)
    56                          snprintf(ep, sizeof(ep), "%s/%s/%s", curpath,
    57                              path, entry.d_name);
    58                  else
    59                          snprintf(ep, sizeof(ep), "%s/%s", path,
    60                              entry.d_name);
    61                  if (stat(ep, &st) == -1) {
    62                          closedir(dirp);
    63                          return -1;
    64                  }
    65                  if (S_ISREG(st.st_mode) || S_ISDIR(st.st_mode)) {
    66                          printf("%c %s\n", S_ISDIR(st.st_mode) ? 'd' : 'f', ep);
    67                  }
    68                  if (S_ISDIR(st.st_mode) == 0)
    69                          continue;
    70                  if (curpath != NULL)
    71                          snprintf(p, sizeof(p), "%s/%s", curpath, path);
    72                  else
    73                          snprintf(p, sizeof(p), "%s", path);
    74                  snprintf(ep, sizeof(ep), "%s", entry.d_name);
    75                  ptree(p, ep);
    76          }
    77          closedir(dirp);
    78          return 0;
    79  }

With more than 80 lines, this looks a bit too complex for the simple task it does. It has to keep a lot of temporary state information around in the two ep[] and p[] buffers, and all the manual work of setting updating and maintaining this internal state is adding so much noise around the actual printf() statement at line 68 that it is almost too hard to understand what this particular bit of code is supposed to do.

The program still "works", in a way, so if you compile and run it, the expected results come up:

keramida@kobe:/home/keramida$ cc -O2 opendir-sample.c
keramida@kobe:/home/keramida$ ./a.out /tmp
d /tmp/.snap
d /tmp/.X11-unix
d /tmp/.XIM-unix
d /tmp/.ICE-unix
d /tmp/.font-unix
f /tmp/aprtdTjbX
f /tmp/aprEdWP4d
d /tmp/fam-gdm
f /tmp/.X0-lock
d /tmp/fam-keramida
d /tmp/.esd-1000
d /tmp/screens
d /tmp/screens/S-root
d /tmp/screens/S-keramida
d /tmp/emacs1000
f /tmp/a
f /tmp/b
f /tmp/kot
f /tmp/logsort
keramida@kobe:/home/keramida$ ./a.out /tmp /etc/defaults
d /tmp/.snap
d /tmp/.X11-unix
d /tmp/.XIM-unix
d /tmp/.ICE-unix
d /tmp/.font-unix
f /tmp/aprtdTjbX
f /tmp/aprEdWP4d
d /tmp/fam-gdm
f /tmp/.X0-lock
d /tmp/fam-keramida
d /tmp/.esd-1000
d /tmp/screens
d /tmp/screens/S-root
d /tmp/screens/S-keramida
d /tmp/emacs1000
f /tmp/a
f /tmp/b
f /tmp/kot
f /tmp/logsort
f /etc/defaults/rc.conf
f /etc/defaults/bluetooth.device.conf
f /etc/defaults/devfs.rules
f /etc/defaults/periodic.conf

But this program looks "ugly". Fortunately, the BSDs and Linux provide a more elegant interface for traversing file hierarchies: the fts(3) family of functions. A similar program that uses fts(3) to traverse the filesystem hierarchies rooted at the arguments of main() is:

% cat -n fts-sample.c
     1  #include <sys/types.h>
     2
     3  #include <sys/stat.h>
     4
     5  #include <err.h>
     6  #include <fts.h>
     7  #include <stdio.h>
     8
     9  static int      ptree(char * const argv[]);
    10
    11  int
    12  main(int argc, char * const argv[])
    13  {
    14          int rc;
    15
    16          if ((rc = ptree(argv + 1)) != 0)
    17                  rc = 1;
    18          return rc;
    19  }
    20
    21  static int
    22  ptree(char * const argv[])
    23  {
    24          FTS *ftsp;
    25          FTSENT *p, *chp;
    26          int fts_options = FTS_COMFOLLOW | FTS_LOGICAL | FTS_NOCHDIR;
    27          int rval = 0;
    28
    29          if ((ftsp = fts_open(argv, fts_options, NULL)) == NULL) {
    30                  warn("fts_open");
    31                  return -1;
    32          }
    33          /* Initialize ftsp with as many argv[] parts as possible. */
    34          chp = fts_children(ftsp, 0);
    35          if (chp == NULL) {
    36                  return 0;               /* no files to traverse */
    37          }
    38          while ((p = fts_read(ftsp)) != NULL) {
    39                  switch (p->fts_info) {
    40                  case FTS_D:
    41                          printf("d %s\n", p->fts_path);
    42                          break;
    43                  case FTS_F:
    44                          printf("f %s\n", p->fts_path);
    45                          break;
    46                  default:
    47                          break;
    48                  }
    49          }
    50          fts_close(ftsp);
    51          return 0;
    52  }

This version is not particularly smaller; it's only 34-35% smaller in LOC. It is, however, far more elegant and a lot easier to read:

  • By using a higher level interface, the program is shorter and easier to understand.
  • By using simpler constructs in the fts_read() loop, it very obvious what the program does for each file type (file vs. directory).
  • The FTS_COMFOLLOW flag sets up things for following symbolic links in one simple place (something entirely missing from the opendir version).
  • There are no obvious bugs about copying half of a pathname, or forgetting to recurse in some cases, or forgetting to print some directory because of a complex interaction between superfluous bits of code. Simpler is also less prone to bugs in this case.

So the next time you are about to build a filesystem traversal toolset from scratch, you can avoid all the pain (and bugs): use fts(3)! :-)


Posted in Computers, FreeBSD, GNU/Linux, Linux, NetBSD, Open source, OpenBSD, Programming, Software Tagged: Computers, FreeBSD, GNU/Linux, hellug, Linux, NetBSD, Open source, OpenBSD, Programming, Software