Author Archives: keramida

Powerful Regular Expressions Combined with Lisp in Emacs

Regular expressions are a powerful text transformation tool. Any UNIX geek will tell you that. It’s so deeply ingrained into our culture, that we even make jokes about it. Another thing that we also love is having a powerful extension language at hand, and Lisp is one of the most powerful extension languages around (and of course, we make jokes about that too).

Emacs, one of the most famous Lisp applications today, has for a while now the ability to combine both of these, to reach entirely new levels of usefulness. Combining regular expressions and Lisp can do really magical things.

An example that I recently used a few times is parsing & de-humanizing numbers in dstat output. The output of dstat includes numbers that are printed with a suffix, like ‘B’ for bytes, ‘k’ for kilobytes and ‘M’ for megabytes, e.g.:

----system---- ----total-cpu-usage---- --net/eth0- -dsk/total- sda-
     time     |usr sys idl wai hiq siq| recv  send| read  writ|util
16-05 08:36:15|  2   3  96   0   0   0|  66B  178B|   0     0 |   0
16-05 08:36:16| 42  14  37   0   0   7|  92M 1268k|   0     0 |   0
16-05 08:36:17| 45  11  36   0   0   7|  76M 1135k|   0     0 |   0
16-05 08:36:18| 27  55   8   0   0  11|  67M  754k|   0    99M|79.6
16-05 08:36:19| 29  41  16   5   0  10| 113M 2079k|4096B   63M|59.6
16-05 08:36:20| 28  48  12   4   0   8|  58M  397k|   0    95M|76.0
16-05 08:36:21| 38  37  14   1   0  10| 114M 2620k|4096B   52M|23.2
16-05 08:36:22| 37  54   0   1   0   8|  76M 1506k|8192B   76M|33.6

So if you want to graph one of the columns, it's useful to convert all the numbers in the same unit. Bytes would be nice in this case.

Separating all columns with '|' characters is a good start, so you can use e.g. a CSV-capable graphing tool, or even simple awk scripts to extract a specific column. 'C-x r t' can do that in Emacs, and you end up with something like this:

|     time     |cpu|cpu|cpu|cpu|cpu|cpu|eth0 |eth0 | disk| disk|sda-|
|     time     |usr|sys|idl|wai|hiq|siq| recv| send| read| writ|util|
|16-05 08:36:15|  2|  3| 96|  0|  0|  0|  66B| 178B|   0 |   0 |   0|
|16-05 08:36:16| 42| 14| 37|  0|  0|  7|  92M|1268k|   0 |   0 |   0|
|16-05 08:36:17| 45| 11| 36|  0|  0|  7|  76M|1135k|   0 |   0 |   0|
|16-05 08:36:18| 27| 55|  8|  0|  0| 11|  67M| 754k|   0 |  99M|79.6|
|16-05 08:36:19| 29| 41| 16|  5|  0| 10| 113M|2079k|4096B|  63M|59.6|
|16-05 08:36:20| 28| 48| 12|  4|  0|  8|  58M| 397k|   0 |  95M|76.0|
|16-05 08:36:21| 38| 37| 14|  1|  0| 10| 114M|2620k|4096B|  52M|23.2|
|16-05 08:36:22| 37| 54|  0|  1|  0|  8|  76M|1506k|8192B|  76M|33.6|

The leading and trailing '|' characters are there so we can later use orgtbl-mode, an awesome table editing and realignment tool of Emacs. Now to the really magical step: regular expressions and lisp working together.

What we would like to do is convert text like "408B" to just "408", text like "1268k" to the value of (1268 * 1024), and finally text like "67M" to the value of (67 * 1024 * 1024). The first part is easy:

M-x replace-regexp RET \([0-9]+\)B RET \1 RET

This should just strip the "B" suffix from byte values.

For the kilobyte and megabyte values what we would like is to be able to evaluate an arithmetic expression that involves \1. Something like "replace \1 with the value of (expression \1)". This is possible in Emacs by prefixing the substitution pattern with \,. This instructs Emacs to evaluate the rest of the substitution pattern as a Lisp expression, and use its string representation as the "real" substitution text.

So if we match all numeric values that are suffixed by 'k', we can use (string-to-number \1) to convert the matching digits to an integer, multiply by 1024 and insert the resulting value by using the following substitution pattern:

\,(* 1024 (string-to-number \1))

The full Emacs command would then become:

M-x replace-regexp RET \([0-9]+\)k RET \,(* 1024 (string-to-number \1)) RET

This, and the byte suffix removal, yield now the following text in our Emacs buffer:

|     time     |cpu|cpu|cpu|cpu|cpu|cpu|eth0 |eth0 | disk| disk|sda-|
|     time     |usr|sys|idl|wai|hiq|siq| recv| send| read| writ|util|
|16-05 08:36:15|  2|  3| 96|  0|  0|  0|  66| 178|   0 |   0 |   0|
|16-05 08:36:16| 42| 14| 37|  0|  0|  7|  92M|1298432|   0 |   0 |   0|
|16-05 08:36:17| 45| 11| 36|  0|  0|  7|  76M|1162240|   0 |   0 |   0|
|16-05 08:36:18| 27| 55|  8|  0|  0| 11|  67M| 772096|   0 |  99M|79.6|
|16-05 08:36:19| 29| 41| 16|  5|  0| 10| 113M|2128896|4096|  63M|59.6|
|16-05 08:36:20| 28| 48| 12|  4|  0|  8|  58M| 406528|   0 |  95M|76.0|
|16-05 08:36:21| 38| 37| 14|  1|  0| 10| 114M|2682880|4096|  52M|23.2|
|16-05 08:36:22| 37| 54|  0|  1|  0|  8|  76M|1542144|8192|  76M|33.6|

Note: Some of the columns are indeed not aligned very well. We'll fix that later. On to the megabyte conversion:

M-x replace-regexp RET \([0-9]+\)M RET \,(* 1024 1024 (string-to-number \1)) RET

Which produces a version that has no suffixes at all:

|     time     |cpu|cpu|cpu|cpu|cpu|cpu|eth0 |eth0 | disk| disk|sda-|
|     time     |usr|sys|idl|wai|hiq|siq| recv| send| read| writ|util|
|16-05 08:36:15|  2|  3| 96|  0|  0|  0|  66| 178|   0 |   0 |   0|
|16-05 08:36:16| 42| 14| 37|  0|  0|  7|  96468992|1298432|   0 |   0 |   0|
|16-05 08:36:17| 45| 11| 36|  0|  0|  7|  79691776|1162240|   0 |   0 |   0|
|16-05 08:36:18| 27| 55|  8|  0|  0| 11|  70254592| 772096|   0 |  103809024|79.6|
|16-05 08:36:19| 29| 41| 16|  5|  0| 10| 118489088|2128896|4096|  66060288|59.6|
|16-05 08:36:20| 28| 48| 12|  4|  0|  8|  60817408| 406528|   0 |  99614720|76.0|
|16-05 08:36:21| 38| 37| 14|  1|  0| 10| 119537664|2682880|4096|  54525952|23.2|
|16-05 08:36:22| 37| 54|  0|  1|  0|  8|  79691776|1542144|8192|  79691776|33.6|

Finally, to align everything in neat, pipe-separated columns, we enable M-x orgtbl-mode, and type "C-c C-c" with the pointer somewhere inside the transformed dstat output. The buffer now becomes something usable for pretty-much any graphing tool out there:

| time           | cpu | cpu | cpu | cpu | cpu | cpu |      eth0 |    eth0 |  disk |      disk | sda- |
| time           | usr | sys | idl | wai | hiq | siq |      recv |    send |  read |      writ | util |
| 16-05 08:36:15 |   2 |   3 |  96 |   0 |   0 |   0 |        66 |     178 |     0 |         0 |    0 |
| 16-05 08:36:16 |  42 |  14 |  37 |   0 |   0 |   7 |  96468992 | 1298432 |     0 |         0 |    0 |
| 16-05 08:36:17 |  45 |  11 |  36 |   0 |   0 |   7 |  79691776 | 1162240 |     0 |         0 |    0 |
| 16-05 08:36:18 |  27 |  55 |   8 |   0 |   0 |  11 |  70254592 |  772096 |     0 | 103809024 | 79.6 |
| 16-05 08:36:19 |  29 |  41 |  16 |   5 |   0 |  10 | 118489088 | 2128896 |  4096 |  66060288 | 59.6 |
| 16-05 08:36:20 |  28 |  48 |  12 |   4 |   0 |   8 |  60817408 |  406528 |     0 |  99614720 | 76.0 |
| 16-05 08:36:21 |  38 |  37 |  14 |   1 |   0 |  10 | 119537664 | 2682880 |  4096 |  54525952 | 23.2 |
| 16-05 08:36:22 |  37 |  54 |   0 |   1 |   0 |   8 |  79691776 | 1542144 |  8192 |  79691776 | 33.6 |

The trick of combining arbitrary Lisp expressions with regexp substitution patterns like \1, \2 ... \9 is something I have found immensely useful in Emacs. Now that you know how it works, I hope you can find even more amusing use-cases for it.

Update: The Emacs manual has a few more useful examples of \, in action, as pointed out by tunixman on Twitter.

Filed under: Computers, Emacs, Free software, FreeBSD, GNU/Linux, Lisp, Open source, Programming, Software Tagged: Computers, Emacs, Free software, FreeBSD, GNU/Linux, Lisp, Open source, Programming, Software

Fixing Shifted-Arrow Keys in 256-Color Terminals on Linux

The terminfo entry for “xterm-256color" that ships by default as part of ncurses-base on Debian Linux and its derivatives is a bit annoying. In particular, shifted up-arrow key presses work fine in some programs, but fail in others. It's a bit of a gamble if Shift-Up works in joe, pico, vim, emacs, mutt, slrn, or what have you.

THis afternoon I got bored enough of losing my selected region in Emacs, because I forgot that I was typing in a terminal launched by a Linux desktop. SO I thought "what the heck... let's give the FreeBSD termcap entry for xterm-256color a try":

keramida> scp bsd:/etc/termcap /tmp/termcap-bsd
keramida> captoinfo -e $(                                  \
  echo $( grep '^xterm' termcap | sed -e 's/[:|].*//' ) |  \
  sed -e 's/ /,/g'                                         \
  ) /tmp/termcap  > /tmp/terminfo.src
keramida> tic /tmp/terminfo.src

Restarted my terminal, and quite unsurprisingly, the problem of Shift-Up keys was gone.

The broken xterm-256color terminfo entry from /lib/terminfo/x/xterm-256color is now shadowed by ~/.terminfo/x/xterm-256color, and I can happily keep typing without having to worry about losing mental state because of this annoying little misfeature of Linux terminfo entries.

The official terminfo database sources[1], also work fine. So now I think some extra digging is required to see what ncurses-base ships with. There's definitely something broken in the terminfo entry of ncurses-base, but it will be nice to know which terminal capabilities the Linux package botched.


Filed under: Computers, Emacs, Free software, FreeBSD, GNU/Linux, Linux, Open source, Software Tagged: Computers, Emacs, Free software, FreeBSD, GNU/Linux, Linux, Open source, Software

Mutt-like Scrolling for Gnus

Mutt scrolls the index of email folders up or down, one line at a time, with the press of a single key: ‘<’ or ‘>’. This is a very convenient way to skim through email folder listings, so I wrote a small bit of Emacs Lisp to do the same in Gnus tonight.

;; Scrolling like mutt for group, summary, and article buffers.
;; Being able to scroll the current buffer view by one line with a
;; single key, rather than having to guess a random number and recenter
;; with `C-u NUM C-l' is _very_ convenient.  Mutt binds scrolling by one
;; line to '<' and '>', and it's something I often miss when working
;; with Gnus buffers.  Thanks to the practically infinite customizability
;; of Gnus, this doesn't have to be an annoyance anymore.

(defun keramida-mutt-like-scrolling ()
  "Set up '<' and '>' keys to scroll down/up one line, like mutt."
  ;; mutt-like scrolling of summary buffers with '<' and '>' keys.
  (local-set-key (kbd ">") 'scroll-up-line)
  (local-set-key (kbd "<") 'scroll-down-line))

(add-hook 'gnus-group-mode-hook 'keramida-mutt-like-scrolling)
(add-hook 'gnus-summary-mode-hook 'keramida-mutt-like-scrolling)
(add-hook 'gnus-article-prepare-hook 'keramida-mutt-like-scrolling)

This is now the latest addition to my ~/.gnus startup code, and we're one step closer to making Gnus behave like my favorite old-time mailer.

Filed under: Computers, Emacs, Email, Free software, FreeBSD, GNU/Linux, Gnus, Linux, Open source, Programming, Software Tagged: Computers, Emacs, Email, Free software, FreeBSD, GNU/Linux, Gnus, Linux, Open source, Programming, Software

Concatenating Video Files with MPlayer

I found out how to concatenate the parts of a video to a single file with MPlayer. It’s relatively easy, so this is just a mini-post to save it for posterity:

$ cat video_part1.avi video_part2.avi ... > temp.avi
$ mencoder -forceidx -oac copy -ovc copy \
    temp.avi -o final.avi && \
    rm temp.avi

Filed under: Computers, Free software, FreeBSD, Linux, Open source, Software Tagged: Computers, Free software, FreeBSD, hellug, Linux, Open source, Software

Unit Testing Uncovers Bugs

As part of the ‘utility’ library in one of the projects we are using at work, I wrote two small wrappers around strtol() and strtoul(). These two functions support a much more useful error reporting mechanism than the plain atoi() and atol() functions, but getting the error checking right in all the places they are called is a bit boring and cumbersome. This is probably part of the reason why there are still programs out there that use atoi() and atol().

For example here’s how I usually check for errors in calls to the strtol() and strtoul() functions:

char *endp;
long x;

endp = NULL;
errno = 0;
x = strtol(str, &amp;endp, base);
if (errno != 0 || (endp != NULL && *endp != '\0' &&
    (isdigit(*endp) != 0 || isspace(*endp) == 0)))
        /* Return 'endp' if possible. */
        return -1;
/* At this point 'x' contains the parsed value. */

This is a lot of code for parsing a single long value. For one or two input strings it may be ok to repeat the code in the places where the numeric parsing code is needed. For more than a couple of input strings it really feels boring to repeat this code again and again.

When I set out to write the wrapper code for strtol() and strtoul() my goal was to make it very easy to parse input strings. A typical call to the parsing function should be a single line of code; it should be very clear if the parsing attempt succeeded or failed; it should also be possible to get both the parsing success or failure and the numeric value we just parsed; it should also be possible to get hold of the last character we managed to parse, so that strings like "100 200 300" can be parsed efficiently without having to manually find where the textual representation of the first number ends or the second one starts.

That's quite a list of goals for a single function, but the function call style I envisioned looked something like this:

long value;
char *endp = NULL;

if (parselong("0x12345678", &endp, 16, &value) != 0) {
        err(1, "parse error");

The return value of parselong() makes it very clear if the parsing attempt succeeded or failed. A return value of zero means success. Any other return value means failure.

The parsed value is returned through the &value pointer. If the parsing attempt has failed parselong() can leave the value unmodified to avoid inflicting spurious side-effects to its calling code because of a failed attempt to parse an input string.

If the parsing attempt has succeeded, &endp may be set to point right after the last character that was successfully parsed. This is actually part of the documented interface of strtol() and strtoul(), so it comes for free by wrapping these functions.

Finally, parsing a long value is a single function call. It is a lot easier to call the parsing function without having to repeat all the error checking boilerplate at each calling site. It's even easy to "chain" multiple parsing attempts using a style similar to:

long value1, value2, value3;

if (parselong("0x12345678", NULL, 16, &value1) != 0 ||
    parselong("0xdeadbeef", NULL, 16, &value2) != 0 ||
    parselong("0xf00fc0de", NULL, 16, &value3) != 0)
        err(1, "parse error");

Not that this is a good style of reporting errors, but it is possible, just because it's now easy to parse a value and check if it was parsed correctly with a single line of code.

The Unit Tests Fail on Linux

Several months passed after I wrote the initial parselong() and parseulong() functions. In the meantime I had to port the program using them to other platforms. The initial target platform was FreeBSD.

This is a bug that lurked for a few months in the initial code of parselong() until I had to port the function to another platform and started writing unit tests to verify that it works the way I expected it to work on all possible systems. In retrospect I should have started by writing the unit tests, but that's something I can say now because I finally got around to doing it and they did serve a very useful purpose.

When I had to port my 'utility' functions to work on several Linux versions too, I wrote a collection of unit tests for parselong() and parseulong(). The testing framework I used was CUnit because of the way it nicely integrates with plain ANSI C code.

One of the test functions I wrote was supposed to check for failures returned by parselong() for invalid input strings. The bulk of the test function was something like this:

#include "CUnit/Basic.h"

        long value = TEST_VALUE_ULONG_MAGIC;

        CU_ASSERT_EQUAL(parselong("xxx", NULL, 0, &value), -1);

        CU_ASSERT_EQUAL(parselong("+", NULL, 0, &value), -1);

        CU_ASSERT_EQUAL(parselong("-", NULL, 0, &value), -1);
        CU_PASS("parselong() failures for invalid values look ok");

Running the unit tests on FreeBSD seemed to work fine. After all the initial version of the parselong() function had been manually tested with the same input strings earlier.

When I tried running the same test cases on Linux though, they failed. Apparently parselong() was not detecting that strtol() failed to parse the input string "xxx" or any other input strings from the ones tested in the test_parselong_failures() function!

The Bug Uncovered

Adding a couple of debugging printf() calls to parselong() itself showed that on Linux parselong() was returning zero for invalid input strings when strtol() could parse no character at all from the input string.

The initial version of the error checking code for strtol() was similar to:

char *endp;
long x;

endp = NULL;
errno = 0;
x = strtol(str, &endp, base);
if (errno != 0 || (endp != NULL && endp != str && *endp != '\0' &&
    (isdigit(*endp) != 0 || isspace(*endp) == 0)))
        /* Return 'endp' if possible. */
        return -1;
/* At this point 'x' contains the parsed value. */

The highlighted part (endp != str) of the error checking code assumes that strtol() will move the 'endp' pointer at least one character after the start of the input string. Apparently on Linux this is not the case. The strtol() function of Linux does not move 'endp' at all if it cannot parse even a single character of the input string. This seems to be the correct behavior for strtol(), but it was hidden for a while, lurking in the original parselong() code, until I ran the unit tests of the function on Debian GNU/Linux.

The CUnit driver program that I used to run the test cases failed on Linux with error messages like:

  1. test_parselong.c:63  - CU_ASSERT_EQUAL(parselong("xxx", NULL, 0, &value),-1)
  2. test_parselong.c:64  - CU_ASSERT_EQUAL(value, TEST_VALUE_ULONG_MAGIC)
  3. test_parselong.c:66  - CU_ASSERT_EQUAL(parselong("+", NULL, 0, &value), -1)
  4. test_parselong.c:67  - CU_ASSERT_EQUAL(value, TEST_VALUE_ULONG_MAGIC)

The culprit for these test case failures was the assumption that Linux would set errno to a non-zero value for an invalid input string... Apparently, it doesn't. The following small program prints different output on BSD vs. Linux:

$ cat -n strtest.c
     1  #include <errno.h>
     2  #include <limits.h>
     3  #include <stdio.h>
     4  #include <stdlib.h>
     6  int
     7  main(void)
     8  {
     9          long value;
    10          const char *input = "xxx";
    11          char *endp = NULL;
    13          errno = 0;
    14          value = strtol(input, &endp, 0);
    15          printf("str = %p = \"%s\"\n", input, input);
    16          printf("endp = %p \"%s\"\n", endp, endp ? endp : "(null)");
    17          if (endp != NULL) {
    18                  printf("endp[0] = '%c' (%d 0%03o #x%02x)\n",
    19                    *endp, *endp, *endp, *endp);
    20          }
    21          printf("errno = %d\n", errno);
    22          printf("value = %ld 0%lo #x%lx\n", value, value, value);
    23          return EXIT_SUCCESS;
    24  }

On FreeBSD the output of this program includes an errno value of EINVAL:

freebsd$ cc strtest.c
freebsd$ ./a.out
str = 0x8048604 = "xxx"
endp = 0x8048604 "xxx"
endp[0] = 'x' (120 0170 #x78)
errno = 22
value = 0 00 #x0
freebsd$ fgrep 22 /usr/include/sys/errno.h
#define EINVAL          22              /* Invalid argument */

On a recent update of Debian GNU/Linux "testing" the output is slightly different:

debian$ cc strtest.c
debian$ ./a.out
str = 0x8048630 = "xxx"
endp = 0x8048630 "xxx"
endp[0] = 'x' (120 0170 #x78)
errno = 0
value = 0 00 #x0

This means that the only indication we have that the Linux version of strtol() failed to parse some of the input text is the value of 'endp': it's the same as the input string. The error-checking code of the original parselong() wrapper was:

        x = strtol(str, &endp, base);
        if (errno != 0 || (endp != NULL && endp != str && *endp != '\0' &&
            (isdigit(*endp) != 0 || isspace(*endp) == 0)))

But on Linux both of the following are true:

  • errno is not set to a non-zero value.
  • If strtol() could not parse even one input character, endp == str.

This caused parselong() to bypass the error checking code, and try to return a 'valid' result even tough the Linux strtol() version has failed. Hence the failure of the unit tests.

Removing the (endp != str) conditional expression means that the error checking code works equally well on Linux and BSD. The BSD version of strtol() returns a non-zero errno value, triggerring the first part of the error checking code. The Linux version returns an endp pointer that is non-null and fails the '\0' check later on. The new parselong() function is slightly shorter and it passes the unit tests on both BSD and Linux.


There is something thrilling about fixing bugs by removing code. This bug was one of the few cases I've come across during the last couple of months where removing code was an improvement. There's probably a joke about "writing too much code" and the bug-resolving debt each line of new code introduces. I think I'll leave that for another time though.

The most important conclusion of today's bug hunting session was that Unit Testing really does work and it pays back in real, quite tangible ways. Had I not spent a bit of time to think about what the parselong() and parseulong() functions are supposed to do, when they are supposed to fail and how they are allowed to fail, I would not spent the time to write test cases for them. Had I not written the test cases, I wouldn't notice there is a failing test case on Linux. Had I not seen that I wouldn't realize some times the two functions were returning completely bogus results on Linux systems.

The central place the unit testing code has in this story is an important and serious lesson for me:


Filed under: Computers, FreeBSD, GNU/Linux, Linux, Programming, Software Tagged: Computers, FreeBSD, GNU/Linux, hellug, Linux, Programming, Software, testing

Mercurial Clones without a Working Copy

Mercurial repository clones can have two parts:

  1. An .hg/ subdirectory, where all the repository metadata is stored
  2. A "working copy" area, where checked out files may live

The .hg/ subdirectory stores the repository metadata of the specific clone, including the history of all changesets stored in the specific clone, clone-specific hooks and scripts, information about local tags and bookmarks, and so on. This is the only part of a Mercurial repository that is actually mandatory for a functional repository.

The "working copy" area is everything under the clone that is not under the toplevel .hg/ subdirectory of the particular clone. The working area of each Mercurial repository may contain a snapshot of the files stored in the repository: either a clean snapshot, checked out from one of the changesets stored in the repository itself, or a locally modified version of a changeset.

One important detail that may not be apparent from the descriptions above is that:

Even if you have already checked out a particular version, you can delete everything except the .hg/ subdirectory and the Mercurial repository will still function normally.

Clones Without a Working Copy

An example is a good way to demonstrate how a clone still functions as a Mercurial repository without a working copy. Let's assume that you have a tiny repository at /tmp/hgdemo that contains revisions of just a small hello.c program:

% pwd
% hg root
% hg log --style compact
1[tip]   c48ee3a9fd78   2010-01-11 08:33 +0200   keramida
  Use EXIT_SUCCESS instead of hard-coded zero.

0   041227edc91b   2010-01-11 08:32 +0200   keramida
  Add hello.c

% hg manifest tip

You can check-out a copy of the latest file revision of hello.c with the "hg checkout" command:

% hg checkout --clean tip
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
% cat -n hello.c
     1  #include <stdio.h>
     2  #include <stdlib.h>
     4  int
     5  main(void)
     6  {
     7      printf("Hello world\n");
     8      return EXIT_SUCCESS;
     9  }

The repository does not need a checkout to function though. The fact that your working copy has been updated to a particular revision is independent of the way the repository machinery under .hg/ works. So you can remove the source of hello.c and still use the repository to browse the history of the project:

% rm -f hello.c
% hg log --style compact
1[tip]   c48ee3a9fd78   2010-01-11 08:33 +0200   keramida
  Use EXIT_SUCCESS instead of hard-coded zero.

0   041227edc91b   2010-01-11 08:32 +0200   keramida
  Add hello.c


With a clone like this it is still possible to use any Mercurial command that does not require a working copy, e.g. "hg diff" to look at the differences between two arbitrary revisions:

% hg diff -r 0:1
diff -r 041227edc91b -r c48ee3a9fd78 hello.c
--- a/hello.c   Mon Jan 11 08:32:59 2010 +0200
+++ b/hello.c   Mon Jan 11 08:33:28 2010 +0200
@@ -1,8 +1,9 @@
 #include <stdio.h>
+#include <stdlib.h>
     printf("Hello world\n");
-    return 0;
+    return EXIT_SUCCESS;

You can even checkout the "null" revision (a magic revision name which Mercurial treats as "not any revision stored in this repository"):

% hg checkout --clean null
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
% hg identify --id --branch
000000000000 default

When a Mercurial clone has checked out the null revision all the tracked files of the working copy are removed. If the clone does not already contain build-time artifacts you should only see the .hg/ subdirectory when you look at the clone:

% find . -maxdepth 2 -exec /bin/ls -1 -dF {} +

The disk space such clone requires is limited by the size of the history metadata.

Why Would You Want Such a Clone

For a small repository like the one shown in this example, it seems pretty useless to be able to have a Mercurial clone without a working copy. You don't really gain much by deleting the source of a small 9-line C program. The space savings of doing that are quite insignificant.

If you are, however, hosting clones of large repositories in a web server somewhere, stripping the working copy of Mercurial clones may be very handy indeed and it may save you a large part of the disk space you would need to keep working copies around. By "large repository" I mean something like a single clone with several hundreds or thousands of files, or a clone whose working copy requires tens or hundreds of megabytes of data.

The OpenSolaris onnv-gate repository is one of the large repositories that use Mercurial. My own Mercurial-based mirror of the FreeBSD head branch is another example for which I readily have size data. Size information for these two repositories is shown in the table below:

  FreeBSD head/ branch
since 2008-01-01
onnv-gate repository
Tracked files 41.807 44.784
Changesets 15.513 11.462
Size of .hg store 238 MB 292 MB
Size of working copy 385 MB 543 MB

Both of these Mercurial repositories have a moderately large number of files. It's also important that the size of the working copy exceeds the size of the .hg/ repository store in both cases. In the onnv-gate repository of OpenSolaris the working copy needs almost twice as much as the entire history of the project. That's a lot of disk space to carry around in all your local clones of onnv-gate!

If all you are looking for is a local mirror of the project sources — so that you can look at the history of a project, browse the diffs committed over time, search for interesting commit information (e.g. "when was bug 6801336 fixed in OpenSolaris?") — carrying around a full working copy is probably a waste of space. Updating the files of the working copy after every pull operation from the upstream master-repository is a waste of time too.

Posted in Computers, Free software, FreeBSD, Mercurial, Open source, Programming, SCM, Software Tagged: Computers, Free software, FreeBSD, hellug, Mercurial, Open source, Programming, SCM, Software


Earlier tonight, on December 7 2009, a friend and me booked our flight tickets for FOSDEM 2010. I am really excited that I am going to attend another open source & free software conference. It has been a while since I had a chance to meet with other BSD people. The last time was in Milan, in EuroBSDCon 2006. It will certainly be tons of fun to meet in person with other free and open source fans, contributors and developers!


FOSDEM is an open conference, organized every year by volunteers to promote the widespread use of Free Software and Open Source Software. It takes place in the beautiful city of Brussels (Belgium). FOSDEM meetings are recognized as “The best Free Software and Open Source events in Europe”.

This year’s FOSDEM will take place on February 6-7 of 2010. The web page of the conference is already online at Updates about the organization of the conference, transportation tips, accommodation options, the dev rooms and talks available this year, and any other bits that may be useful to attendees are often posted there by the organizing team. So if you are planning to attend, add this link to your bookmarks and keep up with the news until we meet in Brussels.

More Updates

That’s all for now. I will be posting more details about the conference and my trip to Belgium as they become available.

Posted in Computers, Conferences, Free software, FreeBSD, GNU/Linux, Open source, Programming, Software Tagged: Computers, Conferences, Free software, FreeBSD, GNU/Linux, hellug, Open source, Programming, Software

Be Careful With SHELL=/usr/local/bin/bash

One of the things I often do on FreeBSD machines is to install shells/mksh or shells/bash and work most of the time with a bourne-compatible shell.

The default /bin/csh shell is mostly ok for short interactive sessions, but I can't stand its command-syntax for semi-complex looping, iteration or other combined commands. So install bash or mksh and I launch one of them from my login prompt, using something like:

csh# env SHELL=/usr/local/bin/bash /usr/local/bin/bash -l

There's a minor catch with SHELL being set to /usr/local/bin/bash though. The default binary of bash is dynamically linked, and it depends on from the devel/gettext package. This means that if you happen to run a package update command that rebuilds gettext from source, there is a small period between the time the old gettext is uninstalled and the new version is installed that the following are all true:

  • Your current SHELL points to bash
  • The bash binary needs to run, so it (temporarily) fails with a runtime linker error
  • The "configure" script of the gettext sources thinks that configure-time checks can use your current SHELL
  • Boom!...

So you cannot upgrade gettext, and your current shell needs it to run. Any other packages that depend on gettext cannot be upgraded either. Not a very nice corner to paint yourself into...

There are, however, at least three options to recover from a mess like this:

  1. Run a shell that is statically linked version of bash. The shells/bash port of FreeBSD can build a statically linked version of the shell when WITH_STATIC_BASH is set at build-time.
  2. Use /bin/csh or shells/mksh as a temporary shell to reinstall the broken packages, e.g. gettext and any other package that depends on it.
  3. Set CONFIG_SHELL to a shell that works even without gettext (the /bin/sh shell of FreeBSD should work fine for this), or to one that has minimal library dependencies (the /usr/local/bin/mksh shell only depends on on FreeBSD, so it should work fine too).

Posted in Computers, Free software, FreeBSD, Software Tagged: Computers, Free software, FreeBSD, hellug, Software

Using the Condensed DejaVu Font-Family Variant by Default

The DejaVu font family is a very popular font collection for Linux and BSD systems. The font package of DejaVu includes a condensed variant; a variation of the same basic font theme that sports narrower characters.

The difference between the two font variants is very easy to spot when they are displayed side by side. The following image shows a small part of a Firefox window, displaying news articles as part of a Google Reader session:

DejaVu Font Family Variants

The window snapshot on the left of this image shows the normal variant of the DejaVu Sans font family. The right-hand snapshot shows the condensed variant of the font family.

I usually to prefer the condensed variant for the display of text on a computer monitor. The narrower characters, with a height that is slightly larger than the width of each glyph, are more æsthetically pleasing for my eyes. Naturally, this is only a matter of personal preference; the normal variant may look and feel more pleasing to some other person. If your own preference leans towards the normal variant of the font, this article may not be very interesting to you, so it is probably ok if you stop reading now.

If you are one of those people who like the condensed variant of the DejaVu font family more than the normal variant though, by all means, keep reading. The “hack” I am going to describe uses a configuration tweak of the fontconfig package to forcibly replace all instances of the DejaVu Sans and the DejaVu Serif fonts with their condensed version.

The Main Problem with Firefox and Condensed Fonts

An interaction between fontconfig and the way some programs select font variants means that Firefox, OpenOffice and a few other programs cannot display the condensed variant in their GTK+ font selection dialog. Firefox, for example, shows only “DejaVu Sans”:

Firefox Font Selection Dialog

As a result, it is impossible to use the font selection dialog of Firefox to configure the condensed font variant as the default font for web content. This kept annoying me for a while, but not enough to actually do something about it. I always thought it would be much better if we could select either font variant, but kept saying to myself that “this may eventually be fixed”. Since this is a long standing bug that has not been fixed in Firefox or the other programs that exhibit the same misbehavior, I decided this morning to forcefully substitute all instances of DejaVu Sans with DejaVu Sans Condensed and all instances of DejaVu Serif with DejaVu Serif Condensed in my FreeBSD/Gnome desktop.

What Could Work but is Not a Good Idea

One way to do this is, of course, by manually replacing the TrueType font files of the non-condensed font variants with their condensed counterparts (e.g. by logging in as the system administrator and overwriting the non-condensed font files). I didn’t want to go that way. Manually modifying the installed versions of files registered into the package database is a bad idea and very ugly hack, because it will stop working the next time the same package is installed. So I chose to read a bit more about fontconfig and see if I could do the same without all the smelly hackery of overwriting font files.

Using Fontconfig Might be a Better Idea

Fontconfig is a library that enables system-wide and per-user configuration, customization and application access to font files. The personal fontconfig configuration of each user is stored in a file called .fonts.conf, in the home directory of each user. The format of this file is defined by a relatively "simple" XML schema. Some of the options supported by the fontconfig schema are described in the fontconfig manual. The quotes around "simple" are there because once you see a few examples of fontconfig tweaks, it is not very hard to come up with similar configuration tweaks, but the precise format and syntax of all the options supported by the syntax is, alas, not very intuitive.

When the .fonts.conf exists in your home directory it has the following general format:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">

Font options that apply to your personal fontconfig setup go inside the <fontconfig>...</fontconfig> element. There are many sorts of options that can be placed inside this XML element, but it is not the intent of this article to describe all of them. For a full list of the options, you should read the fontconfig manual.

There are only three fontconfig options that we are interested in to install the condensed font tweak: <match>, <test> and <edit>. Using these three, we can forcefully replace all uses of the DejaVu fonts with their condensed versions, by adding the following XML snippet to ~/.fonts.conf:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">

  <match target="pattern">
    <test name="family" qual="any">
      <string>DejaVu Sans</string>
    <edit mode="assign" name="family">
      <string>DejaVu Sans Condensed</string>

  <match target="pattern">
    <test name="family" qual="any">
      <string>DejaVu Serif</string>
    <edit mode="assign" name="family">
      <string>DejaVu Serif Condensed</string>

XML is a very verbose and chatty format, but what these two small snippers of XML configuration do should be easy to understand:

  • When an application asks for a font whose family matches "DejaVu Sans", return a font from the "DejaVu Sans Condensed" variant.
  • When an application asks for a font whose family matches "DejaVu Serif", return a font from the "DejaVu Serif Condensed" variant.

That's it. Now every time an application asks for a font of the DejaVu family, fontconfig will always return a condensed variant of the font. The font-variant replacement is done transparently by fontconfig, so you don't have to configure each application separately, or to install application-specific hacks that will work for one application but fail or be invisible to all other programs running on your desktop.

The downside of this forceful font-variant replacement is that it is now impossible to select the non-condensed variants of the DejaVu fonts. The good thing is that, at least in my case, this is "ok" and I can certainly live with it, because I always prefer the condensed variant for this particular font family.

Posted in Computers, Free software, FreeBSD, GNOME, GNU/Linux, Linux, Open source, Software Tagged: Computers, Free software, FreeBSD, GNOME, GNU/Linux, hellug, Linux, Open source, Software

fts(3) or Avoiding to Reinvent the Wheel

One of the C programs I was working on this weekend had to find all files that satisfy a certain predicate and add them to a list of “pending work”. The first thing that comes to mind is probably a custom opendir(), readdir(), closedir() hack. This is probably ok when one only has these structures, but it also a bit cumbersome. There are various sorts of DIR and dirent structures and recursing down a large path requires manually keeping track of a lot of state.

A small program that uses opendir() and its friends to traverse a hierarchy of files and print their names may look like this:

% cat -n opendir-sample.c
     1  #include <sys/types.h>
     3  #include <sys/stat.h>
     5  #include <assert.h>
     6  #include <dirent.h>
     7  #include <limits.h>
     8  #include <stdio.h>
     9  #include <string.h>
    11  static int      ptree(char *curpath, char * const path);
    13  int
    14  main(int argc, char * const argv[])
    15  {
    16          int k;
    17          int rval;
    19          for (rval = 0, k = 1; k < argc; k++)
    20                  if (ptree(NULL, argv[k]) != 0)
    21                          rval = 1;
    22          return rval;
    23  }
    25  static int
    26  ptree(char *curpath, char * const path)
    27  {
    28          char ep[PATH_MAX];
    29          char p[PATH_MAX];
    30          DIR *dirp;
    31          struct dirent entry;
    32          struct dirent *endp;
    33          struct stat st;
    35          if (curpath != NULL)
    36                  snprintf(ep, sizeof(ep), "%s/%s", curpath, path);
    37          else
    38                  snprintf(ep, sizeof(ep), "%s", path);
    39          if (stat(ep, &st) == -1)
    40                  return -1;
    41          if ((dirp = opendir(ep)) == NULL)
    42                  return -1;
    43          for (;;) {
    44                  endp = NULL;
    45                  if (readdir_r(dirp, &entry, &endp) == -1) {
    46                          closedir(dirp);
    47                          return -1;
    48                  }
    49                  if (endp == NULL)
    50                          break;
    51                  assert(endp == &entry);
    52                  if (strcmp(entry.d_name, ".") == 0 ||
    53                      strcmp(entry.d_name, "..") == 0)
    54                          continue;
    55                  if (curpath != NULL)
    56                          snprintf(ep, sizeof(ep), "%s/%s/%s", curpath,
    57                              path, entry.d_name);
    58                  else
    59                          snprintf(ep, sizeof(ep), "%s/%s", path,
    60                              entry.d_name);
    61                  if (stat(ep, &st) == -1) {
    62                          closedir(dirp);
    63                          return -1;
    64                  }
    65                  if (S_ISREG(st.st_mode) || S_ISDIR(st.st_mode)) {
    66                          printf("%c %s\n", S_ISDIR(st.st_mode) ? 'd' : 'f', ep);
    67                  }
    68                  if (S_ISDIR(st.st_mode) == 0)
    69                          continue;
    70                  if (curpath != NULL)
    71                          snprintf(p, sizeof(p), "%s/%s", curpath, path);
    72                  else
    73                          snprintf(p, sizeof(p), "%s", path);
    74                  snprintf(ep, sizeof(ep), "%s", entry.d_name);
    75                  ptree(p, ep);
    76          }
    77          closedir(dirp);
    78          return 0;
    79  }

With more than 80 lines, this looks a bit too complex for the simple task it does. It has to keep a lot of temporary state information around in the two ep[] and p[] buffers, and all the manual work of setting updating and maintaining this internal state is adding so much noise around the actual printf() statement at line 68 that it is almost too hard to understand what this particular bit of code is supposed to do.

The program still "works", in a way, so if you compile and run it, the expected results come up:

keramida@kobe:/home/keramida$ cc -O2 opendir-sample.c
keramida@kobe:/home/keramida$ ./a.out /tmp
d /tmp/.snap
d /tmp/.X11-unix
d /tmp/.XIM-unix
d /tmp/.ICE-unix
d /tmp/.font-unix
f /tmp/aprtdTjbX
f /tmp/aprEdWP4d
d /tmp/fam-gdm
f /tmp/.X0-lock
d /tmp/fam-keramida
d /tmp/.esd-1000
d /tmp/screens
d /tmp/screens/S-root
d /tmp/screens/S-keramida
d /tmp/emacs1000
f /tmp/a
f /tmp/b
f /tmp/kot
f /tmp/logsort
keramida@kobe:/home/keramida$ ./a.out /tmp /etc/defaults
d /tmp/.snap
d /tmp/.X11-unix
d /tmp/.XIM-unix
d /tmp/.ICE-unix
d /tmp/.font-unix
f /tmp/aprtdTjbX
f /tmp/aprEdWP4d
d /tmp/fam-gdm
f /tmp/.X0-lock
d /tmp/fam-keramida
d /tmp/.esd-1000
d /tmp/screens
d /tmp/screens/S-root
d /tmp/screens/S-keramida
d /tmp/emacs1000
f /tmp/a
f /tmp/b
f /tmp/kot
f /tmp/logsort
f /etc/defaults/rc.conf
f /etc/defaults/bluetooth.device.conf
f /etc/defaults/devfs.rules
f /etc/defaults/periodic.conf

But this program looks "ugly". Fortunately, the BSDs and Linux provide a more elegant interface for traversing file hierarchies: the fts(3) family of functions. A similar program that uses fts(3) to traverse the filesystem hierarchies rooted at the arguments of main() is:

% cat -n fts-sample.c
     1  #include <sys/types.h>
     3  #include <sys/stat.h>
     5  #include <err.h>
     6  #include <fts.h>
     7  #include <stdio.h>
     9  static int      ptree(char * const argv[]);
    11  int
    12  main(int argc, char * const argv[])
    13  {
    14          int rc;
    16          if ((rc = ptree(argv + 1)) != 0)
    17                  rc = 1;
    18          return rc;
    19  }
    21  static int
    22  ptree(char * const argv[])
    23  {
    24          FTS *ftsp;
    25          FTSENT *p, *chp;
    26          int fts_options = FTS_COMFOLLOW | FTS_LOGICAL | FTS_NOCHDIR;
    27          int rval = 0;
    29          if ((ftsp = fts_open(argv, fts_options, NULL)) == NULL) {
    30                  warn("fts_open");
    31                  return -1;
    32          }
    33          /* Initialize ftsp with as many argv[] parts as possible. */
    34          chp = fts_children(ftsp, 0);
    35          if (chp == NULL) {
    36                  return 0;               /* no files to traverse */
    37          }
    38          while ((p = fts_read(ftsp)) != NULL) {
    39                  switch (p->fts_info) {
    40                  case FTS_D:
    41                          printf("d %s\n", p->fts_path);
    42                          break;
    43                  case FTS_F:
    44                          printf("f %s\n", p->fts_path);
    45                          break;
    46                  default:
    47                          break;
    48                  }
    49          }
    50          fts_close(ftsp);
    51          return 0;
    52  }

This version is not particularly smaller; it's only 34-35% smaller in LOC. It is, however, far more elegant and a lot easier to read:

  • By using a higher level interface, the program is shorter and easier to understand.
  • By using simpler constructs in the fts_read() loop, it very obvious what the program does for each file type (file vs. directory).
  • The FTS_COMFOLLOW flag sets up things for following symbolic links in one simple place (something entirely missing from the opendir version).
  • There are no obvious bugs about copying half of a pathname, or forgetting to recurse in some cases, or forgetting to print some directory because of a complex interaction between superfluous bits of code. Simpler is also less prone to bugs in this case.

So the next time you are about to build a filesystem traversal toolset from scratch, you can avoid all the pain (and bugs): use fts(3)! :-)

Posted in Computers, FreeBSD, GNU/Linux, Linux, NetBSD, Open source, OpenBSD, Programming, Software Tagged: Computers, FreeBSD, GNU/Linux, hellug, Linux, NetBSD, Open source, OpenBSD, Programming, Software

What keramida said…

As pleasures go, it is a strange yet somewhat refined one to see a project one has started pick up speed. My fellow translators at the Greek documentation team of FreeBSD have been busy lately, and the result of our collective work is a fairly large number of commits to the “doc-el” repository.

There are now at least four translators actively working on a chapter of their own: Manolis Kiagias, Vaggelis Typaldos, Kyriakos Kentrotis and me. Changesets flow between our repository clones almost every day, and I often find myself pulling patches from two or three places at the same time.

This morning I picked up patches from both Kyriakos and Manolis. Manolis had already integrated with Vaggelis, so pulling from him I also got the translations of Vaggelis. In the meantime, my nightly cron job had finished importing a new snapshot from the official CVS tree, so today’s history graph looks scary:

Greek FreeBSD Translations: Commit History of 28 June 2009

The “surface complexity” of a change history like this may seem scary, but to me it is nothing of the sort. It is, in fact, quite the opposite: something to be proud and happy about, because it shows a lively team, working steadily towards our common goal—a fully translated doc/ tree with a translated, accessible version of the FreeBSD Handbook for Greek users.

It really makes me very happy to see an effort started several years ago gain momentum. My own personal commits are far less than those of the other translators now, and I often find myself in the role of a “patch integrator” instead of actively translating new text. But this is ok, because now we have more people working on the translations so we still get many improvements every day :-)

Posted in Computers, Free software, FreeBSD, Mercurial, Open source, Software Tagged: Computers, Free software, FreeBSD, hellug, Mercurial, Open source, Software

What keramida said…

Looking at the search terms that people used to reach this weblog, I noticed that one of the most popular posts of all time is the “Contributing to FreeBSD” post of Feb 2009.

Search terms for this weblog

This is fantastic! I didn’t realize readers of this weblog would like the particular post so much, but I am extremely pleased you did!

Posted in Computers, Free software, FreeBSD, Open source, Software Tagged: Computers, Free software, FreeBSD, hellug, Open source, Software

Announcement: FreeBSD 7.2 release

The FreeBSD 7.2 release is out!

Ken Smith has posted the official email announcement of 7.2-RELEASE earlier today:

This is the third minor release in the 7.X branch of FreeBSD development, and it includes many bug fixes, improvements, and a few nice new features:

  • Support for fully transparent use of superpages for application memory
  • The kernel virtual address space has been bumped to 6 GB on amd64 systems. This improves performance for large kmem consumers, i.e. the zfs(8) adaptive replacement cache (arc)
  • Support for multiple IPv4 and IPv6 addresses for jails
  • The integrated csup(1) source update utility now support CVSmode, making it possible to ftech full CVS mirrors of remote CVS repositories
  • GNOME has been updated to version 2.26 and KDE to version 4.2.2
  • The sparc64 build now supports UltraSPARC-III processors

Many other updates to userland tools have been backported from the CURRENT branch. For a full list of the improvements and bug fixes see the 7.2 release notes

The full release notes and last minute errata are available online at:

The 7.2 release marks another important milestone in the 7.X release series. It is the continuation of the excellent 7.1 release. We hope that the improvements of the 7.2 release, especially the jail and superpage support, will make it an even more attractive platform for our userbase.

Happy upgrading and enjoy the latest FreeBSD 7.X release!

Posted in Computers, Free software, FreeBSD, Open source, Software Tagged: Computers, Free software, FreeBSD, Open source, Software

Spawning Fetchmail with a Minimal Environment

I often ran fetchmail in the background, in “daemon mode”, to keep fetching my email from multiple accounts and piping it all through the Sendmail instance running as the local MTA of my laptop.

But I don’t always remember to run fetchmail before launching GNOME or before “polluting” my shell’s environment with dozens of environment variables that may be either useless or even mildly dangerous for a long running process like fetchmail.

So I started making a habit out of starting fetchmail under a relatively minimalistic env(1) invocation like this one:

% env -i LANG='C' LC_ALL='C' \
    HOME="${HOME}" TERM='dumb' \
    MAIL="${MAIL}" TMPDIR="${TMPDIR:-/tmp}" \
    PATH='/bin:/usr/bin:/usr/local/bin' fetchmail -a -K -d 101

The env -i start of this command clears all environment variables from the current shell, and then I copy only a small set of environment values from the current shell to the env-subprocess that fetchmail will run under.

Now I feel much safer about a fetchmail process running hours or even days in the background. Even if I restart my desktop session a few times, or I decide to experiment a bit with alternate environments, I know that fetchmail will be running with a working and very minimal set of environment options.

Posted in Computers, Email, Free software, FreeBSD, GNU/Linux, Linux, Open source, Security, Software Tagged: Computers, Email, Free software, FreeBSD, GNU/Linux, Linux, Open source, Security, Software

GNOME updated to 2.26 in FreeBSD Ports

The FreeBSD GNOME ports have been updated to GNOME 2.26.

A recent update of my laptop pulled in the new GNOME version:

FreeBSD GNOME 2.26 Screenshot

Version 2.26 of the GNOME desktop includes many updates:

  • Improved support for multiple monitors
  • Many enhancements to CD burning
  • Better support for live messaging (text, or audio & video) in the Empathy messaging client
  • And lots more…

For the full list of changes, please refer to the GNOME 2.26 release notes at:

Posted in Computers, Free software, FreeBSD, GNOME, Open source, Software Tagged: Computers, Free software, FreeBSD, GNOME, Open source, Software

Slowly but Steadily Getting There

The Greek FreeBSD translation team has been working on and off on the Greek translation of the FreeBSD documentation set for a long time now. We started getting a lot of commit actions when Manolis joined the team, and he is now an undisputed “overlord” of the Greek Handbook. He gates patches from other contributors, and has submitted far more text than me at this point:

$ pwd
$ hg churn
ncvs                                627371 ******************************************
sonicy                              114351 *************
keramida                             23019 ***

We have Greek versions of more than half of the English documentation tree, and there are 18 files marked for update with special comments in the source of each file:

$ find en_US.ISO8859-1 -type f | wc -l
$ find el_GR.ISO8859-7 -type f | wc -l
$ find el_GR.ISO8859-7 | checkupdate -c | nl
     1  1.39       -> 1.60       el_GR.ISO8859-7/articles/Makefile
     2  1.49       -> 1.51       el_GR.ISO8859-7/articles/new-users/article.sgml
     3  1.43       -> 1.61       el_GR.ISO8859-7/articles/problem-reports/article.sgml
     4  1.6        -> 1.19       el_GR.ISO8859-7/articles/releng-packages/article.sgml
     5  1.18       -> 1.19       el_GR.ISO8859-7/articles/releng/Makefile
     6  1.48       -> 1.81       el_GR.ISO8859-7/articles/releng/article.sgml
     7  1.1103     -> 1.1110     el_GR.ISO8859-7/books/faq/book.sgml
     8  1.414      -> 1.421      el_GR.ISO8859-7/books/handbook/advanced-networking/chapter.sgml
     9  1.1        -> NONE       el_GR.ISO8859-7/books/handbook/appendix.decl
    10  1.1        -> NONE       el_GR.ISO8859-7/books/handbook/chapter.decl
    11  1.233      -> 1.236      el_GR.ISO8859-7/books/handbook/config/chapter.sgml
    12  1.288      -> 1.290      el_GR.ISO8859-7/books/handbook/disks/chapter.sgml
    13  1.73       -> 1.74       el_GR.ISO8859-7/books/handbook/mac/chapter.sgml
    14  1.1        -> 1.116      el_GR.ISO8859-7/books/handbook/network-servers/chapter.sgml
    15  1.323      -> 1.334      el_GR.ISO8859-7/books/handbook/security/chapter.sgml
    16  1.45       -> 1.46       el_GR.ISO8859-7/books/handbook/vinum/chapter.sgml
    17  NO-%SRCID% -> 1.34       el_GR.ISO8859-7/share/sgml/glossary/freebsd-glossary.sgml
    18  1.43       -> 1.44       el_GR.ISO8859-7/share/sgml/trademarks.ent

Tonight, I think I am going to work on bringing the “new-users” and “problem-reports” articles up to date. Here’s to hope that the “checkupdate” output list will be shorter tomorrow :)

Posted in Computers, Free software, FreeBSD, FreeBSD people, Open source, Software Tagged: Computers, Free software, FreeBSD, FreeBSD people, Open source, Software

Stalking Keramida

Quick tip for finding if keramida’s been active in a machine the last few days.

Run the command:

% ps xau | sed -n -e 1p -e /sed/d -e '/keramida.*emacs.*daemon/p'

Watch for activity in the Emacs daemon, and if you see any, well… you know that keramida is active :-)

Posted in Computers, Emacs, Free software, FreeBSD, GNU/Linux, Linux, Open source, Software Tagged: Computers, Emacs, Free software, FreeBSD, GNU/Linux, Linux, Open source, Software

Contributing to FreeBSD

As part of the FreeBSD team, I often get asked the same question: “How can I get started as a FreeBSD contributor?”

There are usually two reasons why a new contributor feels overwhelmed by the idea of getting started. One of them is that he or she feels that it is difficult to find out exactly how to start contributing to a free software project. The second reason is usually a feeling of impotency, the notion that “I am such a newbie, how could I ever make a difference in such a large project?”

Both of these concerns can be addressed quite easily. This post is my attempt at recording what I have learned by being a part of the FreeBSD team for almost a decade now, so let me start by the most serious one of these two obstacles to becoming a FreeBSD contributor: the feeling of being too small to make a difference.


You Can Make a Difference!

The best response I can think to the idea that a new contributor is too small to do something important for FreeBSD is a short story by one of the Argentinian authors I love. A story by Jorge Bucay:

The Story of the Chained Elephant

When I was a small boy, I loved going to the circus. Animal acts were my favorite. I was quite impressed by the elephant, who is — as I found out later — the favorite animal of all children. The elephant’s part of the show was a display of his huge weight, his immense size and power… Then, as the show was approaching its end, slightly before the elephant had to return to his tent, he was standing tied to a tiny wooden stake driven partially into the ground. A chain was wrapped around his feet.

The size of the stake was very small, and the part of it that was driven into the ground was even smaller. The chain that was wrapped around the legs of the elephant was quite large, but it seemed quite obvious, even to my childish mind, that an animal whose power was so large, so immense that it could rip trees off the ground and hurl them to others, was more than enough to let the elephant just rise and walk away.

That was the mystery of the elephant.

What sort of immense force could keep the elephant tied to that tiny stake?

Why didn’t he rise and walk away?

When I was five or six years old, I put great trust in the wisdom of the elder people. So I asked my teacher, my father, and my uncle about the mystery of the elephant. I don’t remember anymore who gave me the particular answer, but one of the replies was that the elephant doesn’t run away because he is “tame”.

Then I asked the obvious question: “If he’s tame, why do they have to chain him?” I don’t think I ever got a satisfactory answer to this question.

As time went by, I forgot all about the mystery of the huge elephant and the tiny stake. The mystery would only resurface when I was at the company of others who had wondered about the same thing.

Then, a few years ago, I discovered that someone knew why the elephant doesn’t run away.

The elephant doesn’t run away because they have been tying him to a similar stake ever since he was very very small too.

I closed my eyes, and I tried to imagine the small, newborn elephant, chained to the ground. The small elephant would push, pull and struggle with all his strength, trying to free himself, but he would fail. Despite all his efforts, he would fail again and again, because that stake and chain was too big for his strength.

The elephant would sleep exhausted from all his efforts to free himself, and would wake up the next day. All his struggles would fail the next day too, and a third day, and a fourth, and many tiresome, exhausting days after those. Then one day would come — a horrible day for the history of our elephant — a day that he would just give up, and accept his fate, deciding that he was too weak to escape, that his strength was not enough and would never be enough.

The huge and immensely powerful elephant that we see in the circus does not run away because the poor animal believes that he cannot do that.

The memory of the lack of strength he felt a little after his birth is now deeply engraved to his very soul and spirit.

The worst of it all is that he has never tried to free himself since.

He never ever tried to test his powers again.

The story of the circus elephant is often why new people do not try to contribute to FreeBSD. They have this strange idea that they are, for some odd reason, “not good enough”; that they cannot really stand side to side with the giants who have built this enormous, immensely huge system; that their feeble attempts to improve their favorite OS will be met with scorn, or contemptuous laughter by the super smart alien beings that are behind such a complex beast of a system.

This is, fortunately, not true. FreeBSD has been developed by humans, by people like me and, most importantly, you, the new contributor who is passionate about his favorite OS. We are not superhuman entities from outer space, but we like what we are doing, and we try to develop, improve and extend the operating environment that we all love.

We have all tried to do many things about FreeBSD and with FreeBSD. Some of them have worked, and a small percentage of what has worked later became a part of the official FreeBSD system. But there have also been thousands of times that we failed. Utterly and unrecoverably failed. We went down the wrong path for a long time. We tried things that were risky, amusing but also very very easy to break; to do funny, or silly things, or even to just explode in our face.

If you are a new FreeBSD contributor then try to avoid getting stuck in that tiny stake and chain that keeps the circus elephant from being free. We have all failed in out attempts to do something that improves FreeBSD. We have failed not once, not twice, but many times over and over again.

But we keep trying our strength, and in the end we do find our place in the team that makes FreeBSD the wonderful system that it is today and the amazing system that it will be tomorrow :)


Finding Out How to Get Started

So you decided that you do want to help, but there’s a tiny obstacle that has to be overcome first. You don’t know where to get started and learn more about FreeBSD, how it works, how it is developed, and how you can contribute to make it a better system.

First of all, congratulations for wanting to contribute to FreeBSD! We always need more hands to work on the open bugs, to answer questions of new users, to write documentation, to test new drivers, to debug and fix old drivers, and so on.

There are many things you can do to help FreeBSD. You can start with easy tasks, and move to more difficult ones as you pick up the details of how everything works.

My suggestion would be to start by reading the latest version of the “Contributing to FreeBSD” article. You can find it online at:

If you are interested in helping us with the FreeBSD Ports Collection, one of the major selling points of FreeBSD, there is a separate article that may give you some ideas to get started: “Contributing to the
FreeBSD Collection
“. This one is available at:

The “FreeBSD Development Projects” page is a third option you have. This is a a list of interesting, active and/or useful things we could do to extend, improve and adapt FreeBSD to do new or just more cool things. The list of projects is visible online at:

Some of the most interesting projects are listed separately in that page, under the “Project Ideas” section:

All these pages are public information, accessible to anyone who wants to know about ways to help FreeBSD. So you are most welcome to have a look at these pages, and look for something that seems interesting for you.

When you do find something interesting, you will probably have a few questions about how to work on the idea, where to grab the sources of FreeBSD, where to submit patches, how to do that, and so on. Our large collection of mailing lists is going to be helpful at this point. Visit the mailing list information page at:

Look for a mailing list that matches the work you are doing, and then either post directly to the list, or subscribe to it. One of the lists that is probably going to be useful for general questions about FreeBSD (questions like “where do I get the source of the ls(1) utility?”) is the freebsd-questions mailing list:

If you can’t locate the correct list for something you are working on, if you have questions that don’t seem to fit neatly into the topics of another list, or even if you just want to ask something quick about FreeBSD but you don’t have the time to seek the right mailing list to do that, the freebsd-questions list should be your fallback choice.

Posted in Computers, Free software, FreeBSD, Open source, Programming, Software Tagged: Computers, Free software, FreeBSD, Open source, Programming, Software

A bunch of updates for the Greek FreeBSD/doc translations

Translations of technical documentation from English to Greek are a relatively difficult task. It takes a certain level of attention to detail and a fairly good command of both languages. Then there is the minor issue of keeping the translations up to date with their English counterparts.

Updating translations (the old style)

We have a growing body of translated work at the FreeBSD Greek documentation project team, and it was getting rather unwieldy going through each file manually and checking if there are updates in the English version that we would like to pull out of CVS, translate from scratch or re-translate, and commit to our main translation tree. Back when I started writing the original Greek translation build glue, I copied a tagging scheme used by existing translations that was helpful for this sort of manual check. Each translated file had a comment of the form:

<!-- Original revision: 1.17 -->

When looking for updates, one had to manually perform the following steps for each file in the doc/el_GR.ISO8859-7/ directory:

  • Check if the file includes an “Original revision” comment.
  • Extract the revision number from that comment, and note it down somewhere.
  • Make an educated guess about the pathname of the original English text. Some times the path is easy to guess by substituting el_GR.ISO8859-7 with en_US.ISO8859-1 in the file’s path name. Some other times, it isn’t so easy (especially for files in the el_GR.ISO8859-7/share directory).
  • Locate the $FreeBSD: ... $ line in the original English text.
  • Compare with the saved revision from the comment of the Greek text, and see if there are updates to translate.

There are just five steps for each file in this checking process. When translated files are just a bunch of articles, and a few makefiles, it’s boring to repeat these steps for each file, but it isn’t so difficult that nobody can do it. Now that we have Greek translations for a large part of the FreeBSD Handbook, and I am a bit more pressed for time, manually performing these steps for each file of the Greek translation tree started becoming very difficult to do in a timely manner.

New tools (checkupdate)

This was the main reason for writing the checkupdate script. With a lot of help from Gabor Pali, one of the committers who work for the Hungarian FreeBSD translations, I wrote a Python script called checkupdate and designed a tagging scheme that would make this part of the translator’s work much easier. We started by defining how a translator can “tag” a translated source file with the revision of the last fully translated English version. The idea we came up was:

Each translated file will contain a pair of tags called “%SOURCE%” and “%SRCID%“. The %SOURCE tag will point to the relative path of the English text under the doc/ tree. The %SRCID% tag will refer to the last fully translated revision of the %SOURCE% file.

An example for one of the translated Greek articles is:

$ pwd
$ head -10 el_GR.ISO8859-7/articles/new-users/article.sgml

  $FreeBSD: doc/el_GR.ISO8859-7/articles/new-users/article.sgml,v 1.4 2008/01/14 14:19:42 keramida Exp $

  ??? ??????? ????? ???? ??? FreeBSD ??? ??? ??? Unix

  The FreeBSD Greek Documentation Project

  %SOURCE%      en_US.ISO8859-1/articles/new-users/article.sgml
  %SRCID%       1.24

Then we wrote a Python script that can “parse” the %SOURCE% and %SRCID% tags, look up the CVS (or Subversion) revision number of the original English text, and report any differences. The “interface” of the script was quite simple: a list of filenames is fed to the script through standard input, and it assumes they are relative pathnames under the top of a doc/ checkout. This way, to check all the files of the Greek translation one would run:

$ pwd
$ find el_GR.ISO8859-7 | checkupdate

To check multiple translations trees at once it would be possible either to loop through the translations:

$ pwd
$ for dname in el_GR.ISO8859-7 mn_MN.UTF-8 hu_HU.ISO8859-2 ; do \
    find "${dname}" | checkupdate ; \

or just pass their names directly to find:

$ pwd
$ find el_GR.ISO8859-7 mn_MN.UTF-8 hu_HU.ISO8859-2 | checkupdate

The first version of the script tried to include as much information about each translated file as possible, so it used a relatively verbose output format. This is the default output format even today. For the current version of the el_GR.ISO8859-7 translation tree the checkupdate script output includes the following:

$ find el_GR.ISO8859-7 | checkupdate
el_GR.ISO8859-7/articles/Makefile rev. 1.16
    1.39       -> 1.60        en_US.ISO8859-1/articles/Makefile

el_GR.ISO8859-7/articles/laptop/article.sgml rev. 1.4
    1.9        -> 1.25        en_US.ISO8859-1/articles/laptop/article.sgml


Gabor (pgj) later added an option for compact output, because he likes seeing one line of output for each file. The compact mode is enabled with the -c option of the checkupdate script:

$ find el_GR.ISO8859-7 | checkupdate -c
1.39       -> 1.60       el_GR.ISO8859-7/articles/Makefile
1.9        -> 1.25       el_GR.ISO8859-7/articles/laptop/article.sgml

The checkupdate script has now been committed to the FreeBSD doc/ tree in CVS, and it includes a short manpage too. The script and manpage sources are browsable online at:

Updating translations (new style)

Using the checkupdate script and a CVS checkout of the doc/ tree is much easier now. I usually open two side-by-side terminals, and keep running CVS diff commands in one of them and checkupdate in the other. A typical MFen session for one of the Greek articles includes:

  • Picking one of the translated files to update, from the output of checkupdate. For this example, let’s assume I want to update the laptop/article.sgml file.
  • Running “cvs log” and “cvs diff” in the second terminal window, to look at each change committed in CVS:

    $ cvs log -r1.9:1.25 en_US.ISO8859-1/articles/laptop/article.sgml | more
    $ cvs diff -r1.9 -r1.25 en_US.ISO8859-1/articles/laptop/article.sgml | cdiff
  • If the diffs seem to large to translate in one go, I may opt to translate each CVS change as a separate piece. The FreeBSD doc committers try to keep content and indentation changes separate, so it is often the case that translating revision 1.9 (a content change) as a standalone change is a lot easier than trying to decipher what changed between 1.8 and 1.10 (because revision 1.10 rewrapped and reformatted lots of text and it makes looking for the content changes of 1.9 unnecessarily hard).
  • Looking at only one revision of a file is slightly boring in CVS, but not really tough:

    $ cvs diff -r1.8 -r1.9 en_US.ISO8859-1/articles/laptop/article.sgml | cdiff
  • When the translation of revision 1.9 is done, I commit it to the Mercurial tree I am using for local work, taking care to update the %SRCID% comment in the file to show that it is now synchronized with English revision 1.9.
  • Some time later, a bunch of changes are pushed to the main Mercurial tree at

Recent updates

aUsing the checkupdate script and the CVS diff commands described so far, I merged from the English text a fair number of updates since last night. The commit email started tricking in late at night, when I extracted the patches from my personal Mercurial tree and committed them into CVS:

2008-08-31 [  29: Giorgos Keramidas   ] cvs commit: doc/en_US.ISO8859-1/books/developers-handbook/policies chapter.s$
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/share/sgml mailing-lists.ent
2008-09-01 [  15: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/share/sgml freebsd.ent
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/releng extra.css
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books/handbook/jails chapter.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books/handbook/jails chapter.sgml
2008-09-01 [  15: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  13: Giorgos Keramidas   ] cvs commit: doc/en_US.ISO8859-1/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  15: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books/handbook colophon.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  13: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml

The number of commits looks scary, but in reality this was only because I was experimenting with separate MFen commits of each English revision.

In retrospect, this may not be a very good idea. We don’t really need *all* the English versions translated in CVS (some may be broken, others may be intermediate commits, or may be missing some bits). It doesn’t make sense to include all the false starts of the English docs in the el_GR.ISO8859-7 tree too. So the last two commits to CVS included a bunch of English revision merges in “collapsed” form:

keramida    2008-09-02 13:56:43 UTC

  FreeBSD doc repository

  Modified files:
    el_GR.ISO8859-7/books/handbook/virtualization chapter.sgml
  MFen: 1.11 -> 1.13  en_US.ISO8859-1/books/handbook/virtualization/chapter.sgml

  Revision  Changes    Path
  1.5       +9 -4      doc/el_GR.ISO8859-7/books/handbook/virtualization/chapter.sgml

keramida    2008-09-02 13:57:41 UTC

  FreeBSD doc repository

  Modified files:
    el_GR.ISO8859-7/books/handbook/virtualization chapter.sgml
  MFen: 1.13 -> 1.17  en_US.ISO8859-1/books/handbook/virtualization/chapter.sgml

  Revision  Changes    Path
  1.6       +198 -3    doc/el_GR.ISO8859-7/books/handbook/virtualization/chapter.sgml

I think I like this commit style a bit better, and after a short discussion in the mailing list of the translators, Manolis seems to like this style too.