Category Archives: GNU/Linux

Powerful Regular Expressions Combined with Lisp in Emacs

Regular expressions are a powerful text transformation tool. Any UNIX geek will tell you that. It’s so deeply ingrained into our culture, that we even make jokes about it. Another thing that we also love is having a powerful extension language at hand, and Lisp is one of the most powerful extension languages around (and of course, we make jokes about that too).

Emacs, one of the most famous Lisp applications today, has for a while now the ability to combine both of these, to reach entirely new levels of usefulness. Combining regular expressions and Lisp can do really magical things.

An example that I recently used a few times is parsing & de-humanizing numbers in dstat output. The output of dstat includes numbers that are printed with a suffix, like ‘B’ for bytes, ‘k’ for kilobytes and ‘M’ for megabytes, e.g.:

----system---- ----total-cpu-usage---- --net/eth0- -dsk/total- sda-
     time     |usr sys idl wai hiq siq| recv  send| read  writ|util
16-05 08:36:15|  2   3  96   0   0   0|  66B  178B|   0     0 |   0
16-05 08:36:16| 42  14  37   0   0   7|  92M 1268k|   0     0 |   0
16-05 08:36:17| 45  11  36   0   0   7|  76M 1135k|   0     0 |   0
16-05 08:36:18| 27  55   8   0   0  11|  67M  754k|   0    99M|79.6
16-05 08:36:19| 29  41  16   5   0  10| 113M 2079k|4096B   63M|59.6
16-05 08:36:20| 28  48  12   4   0   8|  58M  397k|   0    95M|76.0
16-05 08:36:21| 38  37  14   1   0  10| 114M 2620k|4096B   52M|23.2
16-05 08:36:22| 37  54   0   1   0   8|  76M 1506k|8192B   76M|33.6

So if you want to graph one of the columns, it's useful to convert all the numbers in the same unit. Bytes would be nice in this case.

Separating all columns with '|' characters is a good start, so you can use e.g. a CSV-capable graphing tool, or even simple awk scripts to extract a specific column. 'C-x r t' can do that in Emacs, and you end up with something like this:

|     time     |cpu|cpu|cpu|cpu|cpu|cpu|eth0 |eth0 | disk| disk|sda-|
|     time     |usr|sys|idl|wai|hiq|siq| recv| send| read| writ|util|
|16-05 08:36:15|  2|  3| 96|  0|  0|  0|  66B| 178B|   0 |   0 |   0|
|16-05 08:36:16| 42| 14| 37|  0|  0|  7|  92M|1268k|   0 |   0 |   0|
|16-05 08:36:17| 45| 11| 36|  0|  0|  7|  76M|1135k|   0 |   0 |   0|
|16-05 08:36:18| 27| 55|  8|  0|  0| 11|  67M| 754k|   0 |  99M|79.6|
|16-05 08:36:19| 29| 41| 16|  5|  0| 10| 113M|2079k|4096B|  63M|59.6|
|16-05 08:36:20| 28| 48| 12|  4|  0|  8|  58M| 397k|   0 |  95M|76.0|
|16-05 08:36:21| 38| 37| 14|  1|  0| 10| 114M|2620k|4096B|  52M|23.2|
|16-05 08:36:22| 37| 54|  0|  1|  0|  8|  76M|1506k|8192B|  76M|33.6|

The leading and trailing '|' characters are there so we can later use orgtbl-mode, an awesome table editing and realignment tool of Emacs. Now to the really magical step: regular expressions and lisp working together.

What we would like to do is convert text like "408B" to just "408", text like "1268k" to the value of (1268 * 1024), and finally text like "67M" to the value of (67 * 1024 * 1024). The first part is easy:

M-x replace-regexp RET \([0-9]+\)B RET \1 RET

This should just strip the "B" suffix from byte values.

For the kilobyte and megabyte values what we would like is to be able to evaluate an arithmetic expression that involves \1. Something like "replace \1 with the value of (expression \1)". This is possible in Emacs by prefixing the substitution pattern with \,. This instructs Emacs to evaluate the rest of the substitution pattern as a Lisp expression, and use its string representation as the "real" substitution text.

So if we match all numeric values that are suffixed by 'k', we can use (string-to-number \1) to convert the matching digits to an integer, multiply by 1024 and insert the resulting value by using the following substitution pattern:

\,(* 1024 (string-to-number \1))

The full Emacs command would then become:

M-x replace-regexp RET \([0-9]+\)k RET \,(* 1024 (string-to-number \1)) RET

This, and the byte suffix removal, yield now the following text in our Emacs buffer:

|     time     |cpu|cpu|cpu|cpu|cpu|cpu|eth0 |eth0 | disk| disk|sda-|
|     time     |usr|sys|idl|wai|hiq|siq| recv| send| read| writ|util|
|16-05 08:36:15|  2|  3| 96|  0|  0|  0|  66| 178|   0 |   0 |   0|
|16-05 08:36:16| 42| 14| 37|  0|  0|  7|  92M|1298432|   0 |   0 |   0|
|16-05 08:36:17| 45| 11| 36|  0|  0|  7|  76M|1162240|   0 |   0 |   0|
|16-05 08:36:18| 27| 55|  8|  0|  0| 11|  67M| 772096|   0 |  99M|79.6|
|16-05 08:36:19| 29| 41| 16|  5|  0| 10| 113M|2128896|4096|  63M|59.6|
|16-05 08:36:20| 28| 48| 12|  4|  0|  8|  58M| 406528|   0 |  95M|76.0|
|16-05 08:36:21| 38| 37| 14|  1|  0| 10| 114M|2682880|4096|  52M|23.2|
|16-05 08:36:22| 37| 54|  0|  1|  0|  8|  76M|1542144|8192|  76M|33.6|

Note: Some of the columns are indeed not aligned very well. We'll fix that later. On to the megabyte conversion:

M-x replace-regexp RET \([0-9]+\)M RET \,(* 1024 1024 (string-to-number \1)) RET

Which produces a version that has no suffixes at all:

|     time     |cpu|cpu|cpu|cpu|cpu|cpu|eth0 |eth0 | disk| disk|sda-|
|     time     |usr|sys|idl|wai|hiq|siq| recv| send| read| writ|util|
|16-05 08:36:15|  2|  3| 96|  0|  0|  0|  66| 178|   0 |   0 |   0|
|16-05 08:36:16| 42| 14| 37|  0|  0|  7|  96468992|1298432|   0 |   0 |   0|
|16-05 08:36:17| 45| 11| 36|  0|  0|  7|  79691776|1162240|   0 |   0 |   0|
|16-05 08:36:18| 27| 55|  8|  0|  0| 11|  70254592| 772096|   0 |  103809024|79.6|
|16-05 08:36:19| 29| 41| 16|  5|  0| 10| 118489088|2128896|4096|  66060288|59.6|
|16-05 08:36:20| 28| 48| 12|  4|  0|  8|  60817408| 406528|   0 |  99614720|76.0|
|16-05 08:36:21| 38| 37| 14|  1|  0| 10| 119537664|2682880|4096|  54525952|23.2|
|16-05 08:36:22| 37| 54|  0|  1|  0|  8|  79691776|1542144|8192|  79691776|33.6|

Finally, to align everything in neat, pipe-separated columns, we enable M-x orgtbl-mode, and type "C-c C-c" with the pointer somewhere inside the transformed dstat output. The buffer now becomes something usable for pretty-much any graphing tool out there:

| time           | cpu | cpu | cpu | cpu | cpu | cpu |      eth0 |    eth0 |  disk |      disk | sda- |
| time           | usr | sys | idl | wai | hiq | siq |      recv |    send |  read |      writ | util |
| 16-05 08:36:15 |   2 |   3 |  96 |   0 |   0 |   0 |        66 |     178 |     0 |         0 |    0 |
| 16-05 08:36:16 |  42 |  14 |  37 |   0 |   0 |   7 |  96468992 | 1298432 |     0 |         0 |    0 |
| 16-05 08:36:17 |  45 |  11 |  36 |   0 |   0 |   7 |  79691776 | 1162240 |     0 |         0 |    0 |
| 16-05 08:36:18 |  27 |  55 |   8 |   0 |   0 |  11 |  70254592 |  772096 |     0 | 103809024 | 79.6 |
| 16-05 08:36:19 |  29 |  41 |  16 |   5 |   0 |  10 | 118489088 | 2128896 |  4096 |  66060288 | 59.6 |
| 16-05 08:36:20 |  28 |  48 |  12 |   4 |   0 |   8 |  60817408 |  406528 |     0 |  99614720 | 76.0 |
| 16-05 08:36:21 |  38 |  37 |  14 |   1 |   0 |  10 | 119537664 | 2682880 |  4096 |  54525952 | 23.2 |
| 16-05 08:36:22 |  37 |  54 |   0 |   1 |   0 |   8 |  79691776 | 1542144 |  8192 |  79691776 | 33.6 |

The trick of combining arbitrary Lisp expressions with regexp substitution patterns like \1, \2 ... \9 is something I have found immensely useful in Emacs. Now that you know how it works, I hope you can find even more amusing use-cases for it.

Update: The Emacs manual has a few more useful examples of \, in action, as pointed out by tunixman on Twitter.

Filed under: Computers, Emacs, Free software, FreeBSD, GNU/Linux, Lisp, Open source, Programming, Software Tagged: Computers, Emacs, Free software, FreeBSD, GNU/Linux, Lisp, Open source, Programming, Software

Fixing Shifted-Arrow Keys in 256-Color Terminals on Linux

The terminfo entry for “xterm-256color" that ships by default as part of ncurses-base on Debian Linux and its derivatives is a bit annoying. In particular, shifted up-arrow key presses work fine in some programs, but fail in others. It's a bit of a gamble if Shift-Up works in joe, pico, vim, emacs, mutt, slrn, or what have you.

THis afternoon I got bored enough of losing my selected region in Emacs, because I forgot that I was typing in a terminal launched by a Linux desktop. SO I thought "what the heck... let's give the FreeBSD termcap entry for xterm-256color a try":

keramida> scp bsd:/etc/termcap /tmp/termcap-bsd
keramida> captoinfo -e $(                                  \
  echo $( grep '^xterm' termcap | sed -e 's/[:|].*//' ) |  \
  sed -e 's/ /,/g'                                         \
  ) /tmp/termcap  > /tmp/terminfo.src
keramida> tic /tmp/terminfo.src

Restarted my terminal, and quite unsurprisingly, the problem of Shift-Up keys was gone.

The broken xterm-256color terminfo entry from /lib/terminfo/x/xterm-256color is now shadowed by ~/.terminfo/x/xterm-256color, and I can happily keep typing without having to worry about losing mental state because of this annoying little misfeature of Linux terminfo entries.

The official terminfo database sources[1], also work fine. So now I think some extra digging is required to see what ncurses-base ships with. There's definitely something broken in the terminfo entry of ncurses-base, but it will be nice to know which terminal capabilities the Linux package botched.


Filed under: Computers, Emacs, Free software, FreeBSD, GNU/Linux, Linux, Open source, Software Tagged: Computers, Emacs, Free software, FreeBSD, GNU/Linux, Linux, Open source, Software

Mutt-like Scrolling for Gnus

Mutt scrolls the index of email folders up or down, one line at a time, with the press of a single key: ‘<’ or ‘>’. This is a very convenient way to skim through email folder listings, so I wrote a small bit of Emacs Lisp to do the same in Gnus tonight.

;; Scrolling like mutt for group, summary, and article buffers.
;; Being able to scroll the current buffer view by one line with a
;; single key, rather than having to guess a random number and recenter
;; with `C-u NUM C-l' is _very_ convenient.  Mutt binds scrolling by one
;; line to '<' and '>', and it's something I often miss when working
;; with Gnus buffers.  Thanks to the practically infinite customizability
;; of Gnus, this doesn't have to be an annoyance anymore.

(defun keramida-mutt-like-scrolling ()
  "Set up '<' and '>' keys to scroll down/up one line, like mutt."
  ;; mutt-like scrolling of summary buffers with '<' and '>' keys.
  (local-set-key (kbd ">") 'scroll-up-line)
  (local-set-key (kbd "<") 'scroll-down-line))

(add-hook 'gnus-group-mode-hook 'keramida-mutt-like-scrolling)
(add-hook 'gnus-summary-mode-hook 'keramida-mutt-like-scrolling)
(add-hook 'gnus-article-prepare-hook 'keramida-mutt-like-scrolling)

This is now the latest addition to my ~/.gnus startup code, and we're one step closer to making Gnus behave like my favorite old-time mailer.

Filed under: Computers, Emacs, Email, Free software, FreeBSD, GNU/Linux, Gnus, Linux, Open source, Programming, Software Tagged: Computers, Emacs, Email, Free software, FreeBSD, GNU/Linux, Gnus, Linux, Open source, Programming, Software

Unit Testing Uncovers Bugs

As part of the ‘utility’ library in one of the projects we are using at work, I wrote two small wrappers around strtol() and strtoul(). These two functions support a much more useful error reporting mechanism than the plain atoi() and atol() functions, but getting the error checking right in all the places they are called is a bit boring and cumbersome. This is probably part of the reason why there are still programs out there that use atoi() and atol().

For example here’s how I usually check for errors in calls to the strtol() and strtoul() functions:

char *endp;
long x;

endp = NULL;
errno = 0;
x = strtol(str, &amp;endp, base);
if (errno != 0 || (endp != NULL && *endp != '\0' &&
    (isdigit(*endp) != 0 || isspace(*endp) == 0)))
        /* Return 'endp' if possible. */
        return -1;
/* At this point 'x' contains the parsed value. */

This is a lot of code for parsing a single long value. For one or two input strings it may be ok to repeat the code in the places where the numeric parsing code is needed. For more than a couple of input strings it really feels boring to repeat this code again and again.

When I set out to write the wrapper code for strtol() and strtoul() my goal was to make it very easy to parse input strings. A typical call to the parsing function should be a single line of code; it should be very clear if the parsing attempt succeeded or failed; it should also be possible to get both the parsing success or failure and the numeric value we just parsed; it should also be possible to get hold of the last character we managed to parse, so that strings like "100 200 300" can be parsed efficiently without having to manually find where the textual representation of the first number ends or the second one starts.

That's quite a list of goals for a single function, but the function call style I envisioned looked something like this:

long value;
char *endp = NULL;

if (parselong("0x12345678", &endp, 16, &value) != 0) {
        err(1, "parse error");

The return value of parselong() makes it very clear if the parsing attempt succeeded or failed. A return value of zero means success. Any other return value means failure.

The parsed value is returned through the &value pointer. If the parsing attempt has failed parselong() can leave the value unmodified to avoid inflicting spurious side-effects to its calling code because of a failed attempt to parse an input string.

If the parsing attempt has succeeded, &endp may be set to point right after the last character that was successfully parsed. This is actually part of the documented interface of strtol() and strtoul(), so it comes for free by wrapping these functions.

Finally, parsing a long value is a single function call. It is a lot easier to call the parsing function without having to repeat all the error checking boilerplate at each calling site. It's even easy to "chain" multiple parsing attempts using a style similar to:

long value1, value2, value3;

if (parselong("0x12345678", NULL, 16, &value1) != 0 ||
    parselong("0xdeadbeef", NULL, 16, &value2) != 0 ||
    parselong("0xf00fc0de", NULL, 16, &value3) != 0)
        err(1, "parse error");

Not that this is a good style of reporting errors, but it is possible, just because it's now easy to parse a value and check if it was parsed correctly with a single line of code.

The Unit Tests Fail on Linux

Several months passed after I wrote the initial parselong() and parseulong() functions. In the meantime I had to port the program using them to other platforms. The initial target platform was FreeBSD.

This is a bug that lurked for a few months in the initial code of parselong() until I had to port the function to another platform and started writing unit tests to verify that it works the way I expected it to work on all possible systems. In retrospect I should have started by writing the unit tests, but that's something I can say now because I finally got around to doing it and they did serve a very useful purpose.

When I had to port my 'utility' functions to work on several Linux versions too, I wrote a collection of unit tests for parselong() and parseulong(). The testing framework I used was CUnit because of the way it nicely integrates with plain ANSI C code.

One of the test functions I wrote was supposed to check for failures returned by parselong() for invalid input strings. The bulk of the test function was something like this:

#include "CUnit/Basic.h"

        long value = TEST_VALUE_ULONG_MAGIC;

        CU_ASSERT_EQUAL(parselong("xxx", NULL, 0, &value), -1);

        CU_ASSERT_EQUAL(parselong("+", NULL, 0, &value), -1);

        CU_ASSERT_EQUAL(parselong("-", NULL, 0, &value), -1);
        CU_PASS("parselong() failures for invalid values look ok");

Running the unit tests on FreeBSD seemed to work fine. After all the initial version of the parselong() function had been manually tested with the same input strings earlier.

When I tried running the same test cases on Linux though, they failed. Apparently parselong() was not detecting that strtol() failed to parse the input string "xxx" or any other input strings from the ones tested in the test_parselong_failures() function!

The Bug Uncovered

Adding a couple of debugging printf() calls to parselong() itself showed that on Linux parselong() was returning zero for invalid input strings when strtol() could parse no character at all from the input string.

The initial version of the error checking code for strtol() was similar to:

char *endp;
long x;

endp = NULL;
errno = 0;
x = strtol(str, &endp, base);
if (errno != 0 || (endp != NULL && endp != str && *endp != '\0' &&
    (isdigit(*endp) != 0 || isspace(*endp) == 0)))
        /* Return 'endp' if possible. */
        return -1;
/* At this point 'x' contains the parsed value. */

The highlighted part (endp != str) of the error checking code assumes that strtol() will move the 'endp' pointer at least one character after the start of the input string. Apparently on Linux this is not the case. The strtol() function of Linux does not move 'endp' at all if it cannot parse even a single character of the input string. This seems to be the correct behavior for strtol(), but it was hidden for a while, lurking in the original parselong() code, until I ran the unit tests of the function on Debian GNU/Linux.

The CUnit driver program that I used to run the test cases failed on Linux with error messages like:

  1. test_parselong.c:63  - CU_ASSERT_EQUAL(parselong("xxx", NULL, 0, &value),-1)
  2. test_parselong.c:64  - CU_ASSERT_EQUAL(value, TEST_VALUE_ULONG_MAGIC)
  3. test_parselong.c:66  - CU_ASSERT_EQUAL(parselong("+", NULL, 0, &value), -1)
  4. test_parselong.c:67  - CU_ASSERT_EQUAL(value, TEST_VALUE_ULONG_MAGIC)

The culprit for these test case failures was the assumption that Linux would set errno to a non-zero value for an invalid input string... Apparently, it doesn't. The following small program prints different output on BSD vs. Linux:

$ cat -n strtest.c
     1  #include <errno.h>
     2  #include <limits.h>
     3  #include <stdio.h>
     4  #include <stdlib.h>
     6  int
     7  main(void)
     8  {
     9          long value;
    10          const char *input = "xxx";
    11          char *endp = NULL;
    13          errno = 0;
    14          value = strtol(input, &endp, 0);
    15          printf("str = %p = \"%s\"\n", input, input);
    16          printf("endp = %p \"%s\"\n", endp, endp ? endp : "(null)");
    17          if (endp != NULL) {
    18                  printf("endp[0] = '%c' (%d 0%03o #x%02x)\n",
    19                    *endp, *endp, *endp, *endp);
    20          }
    21          printf("errno = %d\n", errno);
    22          printf("value = %ld 0%lo #x%lx\n", value, value, value);
    23          return EXIT_SUCCESS;
    24  }

On FreeBSD the output of this program includes an errno value of EINVAL:

freebsd$ cc strtest.c
freebsd$ ./a.out
str = 0x8048604 = "xxx"
endp = 0x8048604 "xxx"
endp[0] = 'x' (120 0170 #x78)
errno = 22
value = 0 00 #x0
freebsd$ fgrep 22 /usr/include/sys/errno.h
#define EINVAL          22              /* Invalid argument */

On a recent update of Debian GNU/Linux "testing" the output is slightly different:

debian$ cc strtest.c
debian$ ./a.out
str = 0x8048630 = "xxx"
endp = 0x8048630 "xxx"
endp[0] = 'x' (120 0170 #x78)
errno = 0
value = 0 00 #x0

This means that the only indication we have that the Linux version of strtol() failed to parse some of the input text is the value of 'endp': it's the same as the input string. The error-checking code of the original parselong() wrapper was:

        x = strtol(str, &endp, base);
        if (errno != 0 || (endp != NULL && endp != str && *endp != '\0' &&
            (isdigit(*endp) != 0 || isspace(*endp) == 0)))

But on Linux both of the following are true:

  • errno is not set to a non-zero value.
  • If strtol() could not parse even one input character, endp == str.

This caused parselong() to bypass the error checking code, and try to return a 'valid' result even tough the Linux strtol() version has failed. Hence the failure of the unit tests.

Removing the (endp != str) conditional expression means that the error checking code works equally well on Linux and BSD. The BSD version of strtol() returns a non-zero errno value, triggerring the first part of the error checking code. The Linux version returns an endp pointer that is non-null and fails the '\0' check later on. The new parselong() function is slightly shorter and it passes the unit tests on both BSD and Linux.


There is something thrilling about fixing bugs by removing code. This bug was one of the few cases I've come across during the last couple of months where removing code was an improvement. There's probably a joke about "writing too much code" and the bug-resolving debt each line of new code introduces. I think I'll leave that for another time though.

The most important conclusion of today's bug hunting session was that Unit Testing really does work and it pays back in real, quite tangible ways. Had I not spent a bit of time to think about what the parselong() and parseulong() functions are supposed to do, when they are supposed to fail and how they are allowed to fail, I would not spent the time to write test cases for them. Had I not written the test cases, I wouldn't notice there is a failing test case on Linux. Had I not seen that I wouldn't realize some times the two functions were returning completely bogus results on Linux systems.

The central place the unit testing code has in this story is an important and serious lesson for me:


Filed under: Computers, FreeBSD, GNU/Linux, Linux, Programming, Software Tagged: Computers, FreeBSD, GNU/Linux, hellug, Linux, Programming, Software, testing


Earlier tonight, on December 7 2009, a friend and me booked our flight tickets for FOSDEM 2010. I am really excited that I am going to attend another open source & free software conference. It has been a while since I had a chance to meet with other BSD people. The last time was in Milan, in EuroBSDCon 2006. It will certainly be tons of fun to meet in person with other free and open source fans, contributors and developers!


FOSDEM is an open conference, organized every year by volunteers to promote the widespread use of Free Software and Open Source Software. It takes place in the beautiful city of Brussels (Belgium). FOSDEM meetings are recognized as “The best Free Software and Open Source events in Europe”.

This year’s FOSDEM will take place on February 6-7 of 2010. The web page of the conference is already online at Updates about the organization of the conference, transportation tips, accommodation options, the dev rooms and talks available this year, and any other bits that may be useful to attendees are often posted there by the organizing team. So if you are planning to attend, add this link to your bookmarks and keep up with the news until we meet in Brussels.

More Updates

That’s all for now. I will be posting more details about the conference and my trip to Belgium as they become available.

Posted in Computers, Conferences, Free software, FreeBSD, GNU/Linux, Open source, Programming, Software Tagged: Computers, Conferences, Free software, FreeBSD, GNU/Linux, hellug, Open source, Programming, Software

Using the Condensed DejaVu Font-Family Variant by Default

The DejaVu font family is a very popular font collection for Linux and BSD systems. The font package of DejaVu includes a condensed variant; a variation of the same basic font theme that sports narrower characters.

The difference between the two font variants is very easy to spot when they are displayed side by side. The following image shows a small part of a Firefox window, displaying news articles as part of a Google Reader session:

DejaVu Font Family Variants

The window snapshot on the left of this image shows the normal variant of the DejaVu Sans font family. The right-hand snapshot shows the condensed variant of the font family.

I usually to prefer the condensed variant for the display of text on a computer monitor. The narrower characters, with a height that is slightly larger than the width of each glyph, are more æsthetically pleasing for my eyes. Naturally, this is only a matter of personal preference; the normal variant may look and feel more pleasing to some other person. If your own preference leans towards the normal variant of the font, this article may not be very interesting to you, so it is probably ok if you stop reading now.

If you are one of those people who like the condensed variant of the DejaVu font family more than the normal variant though, by all means, keep reading. The “hack” I am going to describe uses a configuration tweak of the fontconfig package to forcibly replace all instances of the DejaVu Sans and the DejaVu Serif fonts with their condensed version.

The Main Problem with Firefox and Condensed Fonts

An interaction between fontconfig and the way some programs select font variants means that Firefox, OpenOffice and a few other programs cannot display the condensed variant in their GTK+ font selection dialog. Firefox, for example, shows only “DejaVu Sans”:

Firefox Font Selection Dialog

As a result, it is impossible to use the font selection dialog of Firefox to configure the condensed font variant as the default font for web content. This kept annoying me for a while, but not enough to actually do something about it. I always thought it would be much better if we could select either font variant, but kept saying to myself that “this may eventually be fixed”. Since this is a long standing bug that has not been fixed in Firefox or the other programs that exhibit the same misbehavior, I decided this morning to forcefully substitute all instances of DejaVu Sans with DejaVu Sans Condensed and all instances of DejaVu Serif with DejaVu Serif Condensed in my FreeBSD/Gnome desktop.

What Could Work but is Not a Good Idea

One way to do this is, of course, by manually replacing the TrueType font files of the non-condensed font variants with their condensed counterparts (e.g. by logging in as the system administrator and overwriting the non-condensed font files). I didn’t want to go that way. Manually modifying the installed versions of files registered into the package database is a bad idea and very ugly hack, because it will stop working the next time the same package is installed. So I chose to read a bit more about fontconfig and see if I could do the same without all the smelly hackery of overwriting font files.

Using Fontconfig Might be a Better Idea

Fontconfig is a library that enables system-wide and per-user configuration, customization and application access to font files. The personal fontconfig configuration of each user is stored in a file called .fonts.conf, in the home directory of each user. The format of this file is defined by a relatively "simple" XML schema. Some of the options supported by the fontconfig schema are described in the fontconfig manual. The quotes around "simple" are there because once you see a few examples of fontconfig tweaks, it is not very hard to come up with similar configuration tweaks, but the precise format and syntax of all the options supported by the syntax is, alas, not very intuitive.

When the .fonts.conf exists in your home directory it has the following general format:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">

Font options that apply to your personal fontconfig setup go inside the <fontconfig>...</fontconfig> element. There are many sorts of options that can be placed inside this XML element, but it is not the intent of this article to describe all of them. For a full list of the options, you should read the fontconfig manual.

There are only three fontconfig options that we are interested in to install the condensed font tweak: <match>, <test> and <edit>. Using these three, we can forcefully replace all uses of the DejaVu fonts with their condensed versions, by adding the following XML snippet to ~/.fonts.conf:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">

  <match target="pattern">
    <test name="family" qual="any">
      <string>DejaVu Sans</string>
    <edit mode="assign" name="family">
      <string>DejaVu Sans Condensed</string>

  <match target="pattern">
    <test name="family" qual="any">
      <string>DejaVu Serif</string>
    <edit mode="assign" name="family">
      <string>DejaVu Serif Condensed</string>

XML is a very verbose and chatty format, but what these two small snippers of XML configuration do should be easy to understand:

  • When an application asks for a font whose family matches "DejaVu Sans", return a font from the "DejaVu Sans Condensed" variant.
  • When an application asks for a font whose family matches "DejaVu Serif", return a font from the "DejaVu Serif Condensed" variant.

That's it. Now every time an application asks for a font of the DejaVu family, fontconfig will always return a condensed variant of the font. The font-variant replacement is done transparently by fontconfig, so you don't have to configure each application separately, or to install application-specific hacks that will work for one application but fail or be invisible to all other programs running on your desktop.

The downside of this forceful font-variant replacement is that it is now impossible to select the non-condensed variants of the DejaVu fonts. The good thing is that, at least in my case, this is "ok" and I can certainly live with it, because I always prefer the condensed variant for this particular font family.

Posted in Computers, Free software, FreeBSD, GNOME, GNU/Linux, Linux, Open source, Software Tagged: Computers, Free software, FreeBSD, GNOME, GNU/Linux, hellug, Linux, Open source, Software

fts(3) or Avoiding to Reinvent the Wheel

One of the C programs I was working on this weekend had to find all files that satisfy a certain predicate and add them to a list of “pending work”. The first thing that comes to mind is probably a custom opendir(), readdir(), closedir() hack. This is probably ok when one only has these structures, but it also a bit cumbersome. There are various sorts of DIR and dirent structures and recursing down a large path requires manually keeping track of a lot of state.

A small program that uses opendir() and its friends to traverse a hierarchy of files and print their names may look like this:

% cat -n opendir-sample.c
     1  #include <sys/types.h>
     3  #include <sys/stat.h>
     5  #include <assert.h>
     6  #include <dirent.h>
     7  #include <limits.h>
     8  #include <stdio.h>
     9  #include <string.h>
    11  static int      ptree(char *curpath, char * const path);
    13  int
    14  main(int argc, char * const argv[])
    15  {
    16          int k;
    17          int rval;
    19          for (rval = 0, k = 1; k < argc; k++)
    20                  if (ptree(NULL, argv[k]) != 0)
    21                          rval = 1;
    22          return rval;
    23  }
    25  static int
    26  ptree(char *curpath, char * const path)
    27  {
    28          char ep[PATH_MAX];
    29          char p[PATH_MAX];
    30          DIR *dirp;
    31          struct dirent entry;
    32          struct dirent *endp;
    33          struct stat st;
    35          if (curpath != NULL)
    36                  snprintf(ep, sizeof(ep), "%s/%s", curpath, path);
    37          else
    38                  snprintf(ep, sizeof(ep), "%s", path);
    39          if (stat(ep, &st) == -1)
    40                  return -1;
    41          if ((dirp = opendir(ep)) == NULL)
    42                  return -1;
    43          for (;;) {
    44                  endp = NULL;
    45                  if (readdir_r(dirp, &entry, &endp) == -1) {
    46                          closedir(dirp);
    47                          return -1;
    48                  }
    49                  if (endp == NULL)
    50                          break;
    51                  assert(endp == &entry);
    52                  if (strcmp(entry.d_name, ".") == 0 ||
    53                      strcmp(entry.d_name, "..") == 0)
    54                          continue;
    55                  if (curpath != NULL)
    56                          snprintf(ep, sizeof(ep), "%s/%s/%s", curpath,
    57                              path, entry.d_name);
    58                  else
    59                          snprintf(ep, sizeof(ep), "%s/%s", path,
    60                              entry.d_name);
    61                  if (stat(ep, &st) == -1) {
    62                          closedir(dirp);
    63                          return -1;
    64                  }
    65                  if (S_ISREG(st.st_mode) || S_ISDIR(st.st_mode)) {
    66                          printf("%c %s\n", S_ISDIR(st.st_mode) ? 'd' : 'f', ep);
    67                  }
    68                  if (S_ISDIR(st.st_mode) == 0)
    69                          continue;
    70                  if (curpath != NULL)
    71                          snprintf(p, sizeof(p), "%s/%s", curpath, path);
    72                  else
    73                          snprintf(p, sizeof(p), "%s", path);
    74                  snprintf(ep, sizeof(ep), "%s", entry.d_name);
    75                  ptree(p, ep);
    76          }
    77          closedir(dirp);
    78          return 0;
    79  }

With more than 80 lines, this looks a bit too complex for the simple task it does. It has to keep a lot of temporary state information around in the two ep[] and p[] buffers, and all the manual work of setting updating and maintaining this internal state is adding so much noise around the actual printf() statement at line 68 that it is almost too hard to understand what this particular bit of code is supposed to do.

The program still "works", in a way, so if you compile and run it, the expected results come up:

keramida@kobe:/home/keramida$ cc -O2 opendir-sample.c
keramida@kobe:/home/keramida$ ./a.out /tmp
d /tmp/.snap
d /tmp/.X11-unix
d /tmp/.XIM-unix
d /tmp/.ICE-unix
d /tmp/.font-unix
f /tmp/aprtdTjbX
f /tmp/aprEdWP4d
d /tmp/fam-gdm
f /tmp/.X0-lock
d /tmp/fam-keramida
d /tmp/.esd-1000
d /tmp/screens
d /tmp/screens/S-root
d /tmp/screens/S-keramida
d /tmp/emacs1000
f /tmp/a
f /tmp/b
f /tmp/kot
f /tmp/logsort
keramida@kobe:/home/keramida$ ./a.out /tmp /etc/defaults
d /tmp/.snap
d /tmp/.X11-unix
d /tmp/.XIM-unix
d /tmp/.ICE-unix
d /tmp/.font-unix
f /tmp/aprtdTjbX
f /tmp/aprEdWP4d
d /tmp/fam-gdm
f /tmp/.X0-lock
d /tmp/fam-keramida
d /tmp/.esd-1000
d /tmp/screens
d /tmp/screens/S-root
d /tmp/screens/S-keramida
d /tmp/emacs1000
f /tmp/a
f /tmp/b
f /tmp/kot
f /tmp/logsort
f /etc/defaults/rc.conf
f /etc/defaults/bluetooth.device.conf
f /etc/defaults/devfs.rules
f /etc/defaults/periodic.conf

But this program looks "ugly". Fortunately, the BSDs and Linux provide a more elegant interface for traversing file hierarchies: the fts(3) family of functions. A similar program that uses fts(3) to traverse the filesystem hierarchies rooted at the arguments of main() is:

% cat -n fts-sample.c
     1  #include <sys/types.h>
     3  #include <sys/stat.h>
     5  #include <err.h>
     6  #include <fts.h>
     7  #include <stdio.h>
     9  static int      ptree(char * const argv[]);
    11  int
    12  main(int argc, char * const argv[])
    13  {
    14          int rc;
    16          if ((rc = ptree(argv + 1)) != 0)
    17                  rc = 1;
    18          return rc;
    19  }
    21  static int
    22  ptree(char * const argv[])
    23  {
    24          FTS *ftsp;
    25          FTSENT *p, *chp;
    26          int fts_options = FTS_COMFOLLOW | FTS_LOGICAL | FTS_NOCHDIR;
    27          int rval = 0;
    29          if ((ftsp = fts_open(argv, fts_options, NULL)) == NULL) {
    30                  warn("fts_open");
    31                  return -1;
    32          }
    33          /* Initialize ftsp with as many argv[] parts as possible. */
    34          chp = fts_children(ftsp, 0);
    35          if (chp == NULL) {
    36                  return 0;               /* no files to traverse */
    37          }
    38          while ((p = fts_read(ftsp)) != NULL) {
    39                  switch (p->fts_info) {
    40                  case FTS_D:
    41                          printf("d %s\n", p->fts_path);
    42                          break;
    43                  case FTS_F:
    44                          printf("f %s\n", p->fts_path);
    45                          break;
    46                  default:
    47                          break;
    48                  }
    49          }
    50          fts_close(ftsp);
    51          return 0;
    52  }

This version is not particularly smaller; it's only 34-35% smaller in LOC. It is, however, far more elegant and a lot easier to read:

  • By using a higher level interface, the program is shorter and easier to understand.
  • By using simpler constructs in the fts_read() loop, it very obvious what the program does for each file type (file vs. directory).
  • The FTS_COMFOLLOW flag sets up things for following symbolic links in one simple place (something entirely missing from the opendir version).
  • There are no obvious bugs about copying half of a pathname, or forgetting to recurse in some cases, or forgetting to print some directory because of a complex interaction between superfluous bits of code. Simpler is also less prone to bugs in this case.

So the next time you are about to build a filesystem traversal toolset from scratch, you can avoid all the pain (and bugs): use fts(3)! :-)

Posted in Computers, FreeBSD, GNU/Linux, Linux, NetBSD, Open source, OpenBSD, Programming, Software Tagged: Computers, FreeBSD, GNU/Linux, hellug, Linux, NetBSD, Open source, OpenBSD, Programming, Software

Spawning Fetchmail with a Minimal Environment

I often ran fetchmail in the background, in “daemon mode”, to keep fetching my email from multiple accounts and piping it all through the Sendmail instance running as the local MTA of my laptop.

But I don’t always remember to run fetchmail before launching GNOME or before “polluting” my shell’s environment with dozens of environment variables that may be either useless or even mildly dangerous for a long running process like fetchmail.

So I started making a habit out of starting fetchmail under a relatively minimalistic env(1) invocation like this one:

% env -i LANG='C' LC_ALL='C' \
    HOME="${HOME}" TERM='dumb' \
    MAIL="${MAIL}" TMPDIR="${TMPDIR:-/tmp}" \
    PATH='/bin:/usr/bin:/usr/local/bin' fetchmail -a -K -d 101

The env -i start of this command clears all environment variables from the current shell, and then I copy only a small set of environment values from the current shell to the env-subprocess that fetchmail will run under.

Now I feel much safer about a fetchmail process running hours or even days in the background. Even if I restart my desktop session a few times, or I decide to experiment a bit with alternate environments, I know that fetchmail will be running with a working and very minimal set of environment options.

Posted in Computers, Email, Free software, FreeBSD, GNU/Linux, Linux, Open source, Security, Software Tagged: Computers, Email, Free software, FreeBSD, GNU/Linux, Linux, Open source, Security, Software

Stalking Keramida

Quick tip for finding if keramida’s been active in a machine the last few days.

Run the command:

% ps xau | sed -n -e 1p -e /sed/d -e '/keramida.*emacs.*daemon/p'

Watch for activity in the Emacs daemon, and if you see any, well… you know that keramida is active :-)

Posted in Computers, Emacs, Free software, FreeBSD, GNU/Linux, Linux, Open source, Software Tagged: Computers, Emacs, Free software, FreeBSD, GNU/Linux, Linux, Open source, Software