Category Archives: Programming

Powerful Regular Expressions Combined with Lisp in Emacs

Regular expressions are a powerful text transformation tool. Any UNIX geek will tell you that. It’s so deeply ingrained in our culture that we even make jokes about it. Another thing we love is having a powerful extension language at hand, and Lisp is one of the most powerful extension languages around (and of course, we make jokes about that too).

Emacs, one of the most famous Lisp applications today, has for a while now had the ability to combine both of these and reach entirely new levels of usefulness. Combining regular expressions and Lisp can do really magical things.

An example that I recently used a few times is parsing & de-humanizing numbers in dstat output. The output of dstat includes numbers that are printed with a suffix, like ‘B’ for bytes, ‘k’ for kilobytes and ‘M’ for megabytes, e.g.:

----system---- ----total-cpu-usage---- --net/eth0- -dsk/total- sda-
     time     |usr sys idl wai hiq siq| recv  send| read  writ|util
16-05 08:36:15|  2   3  96   0   0   0|  66B  178B|   0     0 |   0
16-05 08:36:16| 42  14  37   0   0   7|  92M 1268k|   0     0 |   0
16-05 08:36:17| 45  11  36   0   0   7|  76M 1135k|   0     0 |   0
16-05 08:36:18| 27  55   8   0   0  11|  67M  754k|   0    99M|79.6
16-05 08:36:19| 29  41  16   5   0  10| 113M 2079k|4096B   63M|59.6
16-05 08:36:20| 28  48  12   4   0   8|  58M  397k|   0    95M|76.0
16-05 08:36:21| 38  37  14   1   0  10| 114M 2620k|4096B   52M|23.2
16-05 08:36:22| 37  54   0   1   0   8|  76M 1506k|8192B   76M|33.6

So if you want to graph one of the columns, it's useful to convert all the numbers to the same unit. Bytes would be nice in this case.

Separating all columns with '|' characters is a good start, so you can use e.g. a CSV-capable graphing tool, or even simple awk scripts to extract a specific column. 'C-x r t' can do that in Emacs, and you end up with something like this:

|     time     |cpu|cpu|cpu|cpu|cpu|cpu|eth0 |eth0 | disk| disk|sda-|
|     time     |usr|sys|idl|wai|hiq|siq| recv| send| read| writ|util|
|16-05 08:36:15|  2|  3| 96|  0|  0|  0|  66B| 178B|   0 |   0 |   0|
|16-05 08:36:16| 42| 14| 37|  0|  0|  7|  92M|1268k|   0 |   0 |   0|
|16-05 08:36:17| 45| 11| 36|  0|  0|  7|  76M|1135k|   0 |   0 |   0|
|16-05 08:36:18| 27| 55|  8|  0|  0| 11|  67M| 754k|   0 |  99M|79.6|
|16-05 08:36:19| 29| 41| 16|  5|  0| 10| 113M|2079k|4096B|  63M|59.6|
|16-05 08:36:20| 28| 48| 12|  4|  0|  8|  58M| 397k|   0 |  95M|76.0|
|16-05 08:36:21| 38| 37| 14|  1|  0| 10| 114M|2620k|4096B|  52M|23.2|
|16-05 08:36:22| 37| 54|  0|  1|  0|  8|  76M|1506k|8192B|  76M|33.6|

The leading and trailing '|' characters are there so we can later use orgtbl-mode, an awesome table editing and realignment tool of Emacs. Now to the really magical step: regular expressions and Lisp working together.

What we would like to do is convert text like "408B" to just "408", text like "1268k" to the value of (1268 * 1024), and finally text like "67M" to the value of (67 * 1024 * 1024). The first part is easy:

M-x replace-regexp RET \([0-9]+\)B RET \1 RET

This should just strip the "B" suffix from byte values.

For the kilobyte and megabyte values what we would like is to be able to evaluate an arithmetic expression that involves \1. Something like "replace \1 with the value of (expression \1)". This is possible in Emacs by prefixing the substitution pattern with \,. This instructs Emacs to evaluate the rest of the substitution pattern as a Lisp expression, and use its string representation as the "real" substitution text.

So if we match all numeric values that are suffixed by 'k', we can use (string-to-number \1) to convert the matching digits to an integer, multiply by 1024 and insert the resulting value by using the following substitution pattern:

\,(* 1024 (string-to-number \1))

The full Emacs command would then become:

M-x replace-regexp RET \([0-9]+\)k RET \,(* 1024 (string-to-number \1)) RET

Together with the byte suffix removal, this now yields the following text in our Emacs buffer:

|     time     |cpu|cpu|cpu|cpu|cpu|cpu|eth0 |eth0 | disk| disk|sda-|
|     time     |usr|sys|idl|wai|hiq|siq| recv| send| read| writ|util|
|16-05 08:36:15|  2|  3| 96|  0|  0|  0|  66| 178|   0 |   0 |   0|
|16-05 08:36:16| 42| 14| 37|  0|  0|  7|  92M|1298432|   0 |   0 |   0|
|16-05 08:36:17| 45| 11| 36|  0|  0|  7|  76M|1162240|   0 |   0 |   0|
|16-05 08:36:18| 27| 55|  8|  0|  0| 11|  67M| 772096|   0 |  99M|79.6|
|16-05 08:36:19| 29| 41| 16|  5|  0| 10| 113M|2128896|4096|  63M|59.6|
|16-05 08:36:20| 28| 48| 12|  4|  0|  8|  58M| 406528|   0 |  95M|76.0|
|16-05 08:36:21| 38| 37| 14|  1|  0| 10| 114M|2682880|4096|  52M|23.2|
|16-05 08:36:22| 37| 54|  0|  1|  0|  8|  76M|1542144|8192|  76M|33.6|

Note: Some of the columns are indeed not aligned very well. We'll fix that later. On to the megabyte conversion:

M-x replace-regexp RET \([0-9]+\)M RET \,(* 1024 1024 (string-to-number \1)) RET

Which produces a version that has no suffixes at all:

|     time     |cpu|cpu|cpu|cpu|cpu|cpu|eth0 |eth0 | disk| disk|sda-|
|     time     |usr|sys|idl|wai|hiq|siq| recv| send| read| writ|util|
|16-05 08:36:15|  2|  3| 96|  0|  0|  0|  66| 178|   0 |   0 |   0|
|16-05 08:36:16| 42| 14| 37|  0|  0|  7|  96468992|1298432|   0 |   0 |   0|
|16-05 08:36:17| 45| 11| 36|  0|  0|  7|  79691776|1162240|   0 |   0 |   0|
|16-05 08:36:18| 27| 55|  8|  0|  0| 11|  70254592| 772096|   0 |  103809024|79.6|
|16-05 08:36:19| 29| 41| 16|  5|  0| 10| 118489088|2128896|4096|  66060288|59.6|
|16-05 08:36:20| 28| 48| 12|  4|  0|  8|  60817408| 406528|   0 |  99614720|76.0|
|16-05 08:36:21| 38| 37| 14|  1|  0| 10| 119537664|2682880|4096|  54525952|23.2|
|16-05 08:36:22| 37| 54|  0|  1|  0|  8|  79691776|1542144|8192|  79691776|33.6|

Finally, to align everything in neat, pipe-separated columns, we enable M-x orgtbl-mode and type "C-c C-c" with the cursor somewhere inside the transformed dstat output. The buffer now becomes something usable by pretty much any graphing tool out there:

| time           | cpu | cpu | cpu | cpu | cpu | cpu |      eth0 |    eth0 |  disk |      disk | sda- |
| time           | usr | sys | idl | wai | hiq | siq |      recv |    send |  read |      writ | util |
| 16-05 08:36:15 |   2 |   3 |  96 |   0 |   0 |   0 |        66 |     178 |     0 |         0 |    0 |
| 16-05 08:36:16 |  42 |  14 |  37 |   0 |   0 |   7 |  96468992 | 1298432 |     0 |         0 |    0 |
| 16-05 08:36:17 |  45 |  11 |  36 |   0 |   0 |   7 |  79691776 | 1162240 |     0 |         0 |    0 |
| 16-05 08:36:18 |  27 |  55 |   8 |   0 |   0 |  11 |  70254592 |  772096 |     0 | 103809024 | 79.6 |
| 16-05 08:36:19 |  29 |  41 |  16 |   5 |   0 |  10 | 118489088 | 2128896 |  4096 |  66060288 | 59.6 |
| 16-05 08:36:20 |  28 |  48 |  12 |   4 |   0 |   8 |  60817408 |  406528 |     0 |  99614720 | 76.0 |
| 16-05 08:36:21 |  38 |  37 |  14 |   1 |   0 |  10 | 119537664 | 2682880 |  4096 |  54525952 | 23.2 |
| 16-05 08:36:22 |  37 |  54 |   0 |   1 |   0 |   8 |  79691776 | 1542144 |  8192 |  79691776 | 33.6 |

The trick of combining arbitrary Lisp expressions with regexp substitution patterns like \1, \2 ... \9 is something I have found immensely useful in Emacs. Now that you know how it works, I hope you can find even more amusing use-cases for it.

Update: The Emacs manual has a few more useful examples of \, in action, as pointed out by tunixman on Twitter.


Filed under: Computers, Emacs, Free software, FreeBSD, GNU/Linux, Lisp, Open source, Programming, Software Tagged: Computers, Emacs, Free software, FreeBSD, GNU/Linux, Lisp, Open source, Programming, Software

Mutt-like Scrolling for Gnus

Mutt scrolls the index of email folders up or down, one line at a time, with the press of a single key: ‘<’ or ‘>’. This is a very convenient way to skim through email folder listings, so I wrote a small bit of Emacs Lisp to do the same in Gnus tonight.

;;;
;; Scrolling like mutt for group, summary, and article buffers.
;;
;; Being able to scroll the current buffer view by one line with a
;; single key, rather than having to guess a random number and recenter
;; with `C-u NUM C-l' is _very_ convenient.  Mutt binds scrolling by one
;; line to '<' and '>', and it's something I often miss when working
;; with Gnus buffers.  Thanks to the practically infinite customizability
;; of Gnus, this doesn't have to be an annoyance anymore.

(defun keramida-mutt-like-scrolling ()
  "Set up '<' and '>' keys to scroll down/up one line, like mutt."
  ;; mutt-like scrolling of summary buffers with '<' and '>' keys.
  (local-set-key (kbd ">") 'scroll-up-line)
  (local-set-key (kbd "<") 'scroll-down-line))

(add-hook 'gnus-group-mode-hook 'keramida-mutt-like-scrolling)
(add-hook 'gnus-summary-mode-hook 'keramida-mutt-like-scrolling)
(add-hook 'gnus-article-prepare-hook 'keramida-mutt-like-scrolling)

This is now the latest addition to my ~/.gnus startup code, and we're one step closer to making Gnus behave like my favorite old-time mailer.


Filed under: Computers, Emacs, Email, Free software, FreeBSD, GNU/Linux, Gnus, Linux, Open source, Programming, Software Tagged: Computers, Emacs, Email, Free software, FreeBSD, GNU/Linux, Gnus, Linux, Open source, Programming, Software

Parallella on Kickstarter

Kickstarter is a great thing: it allows projects with limited appeal to be successfully financed, which helps them succeed. One such project is the Parallella. It is basically an ARM-based, highly NUMA computer with 16 to 64 cores, which can be used both to teach parallel programming and to do actual useful work with very little electrical power. The figures cited are on the order of 45 GFLOPS/watt for the maximum configuration. As the Kickstarter deadline is approaching, I think it is a good time to call on all enthusiasts to help fund this cool project!


How slow are virtual methods in C++?

In one project I can change the behaviour of a class in one of two ways: either by abstracting a base class with virtual methods and creating two descendant classes that implement those methods differently, or by adding a flag to the class and an "if" statement that selects the behaviour based on the flag. Which one would you choose? Which one do you think would be faster?


Unit Testing Uncovers Bugs

As part of the ‘utility’ library in one of the projects we are using at work, I wrote two small wrappers around strtol() and strtoul(). These two functions support a much more useful error reporting mechanism than the plain atoi() and atol() functions, but getting the error checking right in all the places they are called is a bit boring and cumbersome. This is probably part of the reason why there are still programs out there that use atoi() and atol().

For example here’s how I usually check for errors in calls to the strtol() and strtoul() functions:

char *endp;
long x;

endp = NULL;
errno = 0;
x = strtol(str, &endp, base);
if (errno != 0 || (endp != NULL && *endp != '\0' &&
    (isdigit(*endp) != 0 || isspace(*endp) == 0))) {
        /* Return 'endp' if possible. */
        return -1;
}
/* At this point 'x' contains the parsed value. */

This is a lot of code for parsing a single long value. For one or two input strings it may be ok to repeat the code in the places where the numeric parsing code is needed. For more than a couple of input strings it really feels boring to repeat this code again and again.

When I set out to write the wrapper code for strtol() and strtoul(), my goal was to make it very easy to parse input strings:

  • A typical call to the parsing function should be a single line of code.
  • It should be very clear whether the parsing attempt succeeded or failed.
  • It should be possible to get both the success or failure status and the numeric value we just parsed.
  • It should be possible to get hold of the position right after the last character we managed to parse, so that strings like "100 200 300" can be parsed efficiently without manually finding where the textual representation of the first number ends or the second one starts.

That's quite a list of goals for a single function, but the function call style I envisioned looked something like this:

long value;
char *endp = NULL;

if (parselong("0x12345678", &endp, 16, &value) != 0) {
        err(1, "parse error");
}

The return value of parselong() makes it very clear if the parsing attempt succeeded or failed. A return value of zero means success. Any other return value means failure.

The parsed value is returned through the 'value' pointer. If the parsing attempt fails, parselong() can leave the value unmodified, to avoid inflicting spurious side-effects on its calling code because of a failed attempt to parse an input string.

If the parsing attempt succeeds, 'endp' may be set to point right after the last character that was successfully parsed. This is part of the documented interface of strtol() and strtoul(), so it comes for free by wrapping these functions.

Finally, parsing a long value is a single function call. It is a lot easier to call the parsing function without having to repeat all the error checking boilerplate at each calling site. It's even easy to "chain" multiple parsing attempts using a style similar to:

long value1, value2, value3;

if (parselong("0x12345678", NULL, 16, &value1) != 0 ||
    parselong("0xdeadbeef", NULL, 16, &value2) != 0 ||
    parselong("0xf00fc0de", NULL, 16, &value3) != 0)
        err(1, "parse error");

Not that this is a good style of reporting errors, but it is possible, just because it's now easy to parse a value and check if it was parsed correctly with a single line of code.

The Unit Tests Fail on Linux

Several months passed after I wrote the initial parselong() and parseulong() functions. In the meantime I had to port the program using them to other platforms. The initial target platform was FreeBSD.

A bug lurked for a few months in the initial code of parselong(), until I had to port the function to another platform and started writing unit tests to verify that it works the way I expected on all the systems I needed. In retrospect I should have started by writing the unit tests, but I can only say that now because I finally got around to writing them, and they did serve a very useful purpose.

When I had to port my 'utility' functions to work on several Linux versions too, I wrote a collection of unit tests for parselong() and parseulong(). The testing framework I used was CUnit because of the way it nicely integrates with plain ANSI C code.

One of the test functions I wrote was supposed to check for failures returned by parselong() for invalid input strings. The bulk of the test function was something like this:

#include "CUnit/Basic.h"

void
test_parselong_failures(void)
{
        long value = TEST_VALUE_ULONG_MAGIC;

        CU_ASSERT_EQUAL(parselong("xxx", NULL, 0, &value), -1);
        CU_ASSERT_EQUAL(value, TEST_VALUE_ULONG_MAGIC);

        CU_ASSERT_EQUAL(parselong("+", NULL, 0, &value), -1);
        CU_ASSERT_EQUAL(value, TEST_VALUE_ULONG_MAGIC);

        CU_ASSERT_EQUAL(parselong("-", NULL, 0, &value), -1);
        CU_ASSERT_EQUAL(value, TEST_VALUE_ULONG_MAGIC);
        ...
        CU_PASS("parselong() failures for invalid values look ok");
}

Running the unit tests on FreeBSD seemed to work fine. After all, the initial version of the parselong() function had been tested manually with the same input strings earlier.

When I tried running the same test cases on Linux, though, they failed. Apparently parselong() was not detecting that strtol() failed to parse the input string "xxx", or any of the other input strings tested in the test_parselong_failures() function!

The Bug Uncovered

Adding a couple of debugging printf() calls to parselong() itself showed that on Linux parselong() was returning zero for invalid input strings when strtol() could parse no character at all from the input string.

The initial version of the error checking code for strtol() was similar to:

char *endp;
long x;

endp = NULL;
errno = 0;
x = strtol(str, &endp, base);
if (errno != 0 || (endp != NULL && endp != str && *endp != '\0' &&
    (isdigit(*endp) != 0 || isspace(*endp) == 0))) {
        /* Return 'endp' if possible. */
        return -1;
}
/* At this point 'x' contains the parsed value. */

The (endp != str) part of the error checking code assumes that strtol() will move the 'endp' pointer at least one character past the start of the input string. That is not the case: when strtol() cannot parse even a single character of the input string, it does not move 'endp' at all, so endp == str. This is the behavior the C standard requires, and FreeBSD behaves the same way (its output below shows endp == str as well), but there the problem was masked because FreeBSD also sets errno. The bug stayed hidden in the original parselong() code until I ran the unit tests of the function on Debian GNU/Linux.

The CUnit driver program that I used to run the test cases failed on Linux with error messages like:

  1. test_parselong.c:63  - CU_ASSERT_EQUAL(parselong("xxx", NULL, 0, &value),-1)
  2. test_parselong.c:64  - CU_ASSERT_EQUAL(value, TEST_VALUE_ULONG_MAGIC)
  3. test_parselong.c:66  - CU_ASSERT_EQUAL(parselong("+", NULL, 0, &value), -1)
  4. test_parselong.c:67  - CU_ASSERT_EQUAL(value, TEST_VALUE_ULONG_MAGIC)

The culprit for these test case failures was the assumption that Linux would set errno to a non-zero value for an invalid input string... Apparently, it doesn't. The following small program prints different output on BSD vs. Linux:

$ cat strtest.c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
        long value;
        const char *input = "xxx";
        char *endp = NULL;

        errno = 0;
        value = strtol(input, &endp, 0);
        printf("str = %p = \"%s\"\n", (void *)input, input);
        printf("endp = %p \"%s\"\n", (void *)endp, endp ? endp : "(null)");
        if (endp != NULL) {
                printf("endp[0] = '%c' (%d 0%03o #x%02x)\n",
                  *endp, *endp, *endp, *endp);
        }
        printf("errno = %d\n", errno);
        printf("value = %ld 0%lo #x%lx\n", value,
            (unsigned long)value, (unsigned long)value);
        return EXIT_SUCCESS;
}

On FreeBSD the output of this program includes an errno value of EINVAL:

freebsd$ cc strtest.c
freebsd$ ./a.out
str = 0x8048604 = "xxx"
endp = 0x8048604 "xxx"
endp[0] = 'x' (120 0170 #x78)
errno = 22
value = 0 00 #x0
freebsd$ fgrep 22 /usr/include/sys/errno.h
#define EINVAL          22              /* Invalid argument */
freebsd$

On a recent update of Debian GNU/Linux "testing" the output is slightly different:

debian$ cc strtest.c
debian$ ./a.out
str = 0x8048630 = "xxx"
endp = 0x8048630 "xxx"
endp[0] = 'x' (120 0170 #x78)
errno = 0
value = 0 00 #x0
debian$

This means that the only indication we have that the Linux version of strtol() failed to parse some of the input text is the value of 'endp': it's the same as the input string. The error-checking code of the original parselong() wrapper was:

        x = strtol(str, &endp, base);
        if (errno != 0 || (endp != NULL && endp != str && *endp != '\0' &&
            (isdigit(*endp) != 0 || isspace(*endp) == 0)))
                error(...);

But on Linux both of the following are true:

  • errno is not set to a non-zero value.
  • If strtol() could not parse even one input character, endp == str.

This caused parselong() to bypass the error checking code and try to return a 'valid' result even though the Linux version of strtol() had failed. Hence the failure of the unit tests.

Removing the (endp != str) conditional expression makes the error checking code work equally well on Linux and BSD. The BSD version of strtol() sets errno to a non-zero value, triggering the first part of the error checking code. The Linux version leaves 'endp' pointing at the first character of the input string; for inputs like "xxx" or "+" that character is neither '\0' nor whitespace, so the rest of the check catches the error. The new parselong() function is slightly shorter, and it passes the unit tests on both BSD and Linux.

Conclusions

There is something thrilling about fixing bugs by removing code. This bug was one of the few cases I've come across during the last couple of months where removing code was an improvement. There's probably a joke about "writing too much code" and the bug-resolving debt each line of new code introduces. I think I'll leave that for another time though.

The most important conclusion of today's bug hunting session was that unit testing really does work, and it pays back in real, quite tangible ways. Had I not spent a bit of time thinking about what the parselong() and parseulong() functions are supposed to do, when they are supposed to fail, and how they are allowed to fail, I would not have spent the time to write test cases for them. Had I not written the test cases, I wouldn't have noticed the failing test case on Linux. Had I not seen that, I wouldn't have realized that sometimes the two functions were returning completely bogus results on Linux systems.

The central place the unit testing code has in this story is an important and serious lesson for me:

KEEP TESTING!

Filed under: Computers, FreeBSD, GNU/Linux, Linux, Programming, Software Tagged: Computers, FreeBSD, GNU/Linux, hellug, Linux, Programming, Software, testing

Mercurial Clones without a Working Copy

Mercurial repository clones can have two parts:

  1. An .hg/ subdirectory, where all the repository metadata is stored
  2. A "working copy" area, where checked out files may live

The .hg/ subdirectory stores the repository metadata of the specific clone, including the history of all changesets stored in the specific clone, clone-specific hooks and scripts, information about local tags and bookmarks, and so on. This is the only part of a Mercurial repository that is actually mandatory for a functional repository.

The "working copy" area is everything under the clone that is not under the toplevel .hg/ subdirectory of the particular clone. The working area of each Mercurial repository may contain a snapshot of the files stored in the repository: either a clean snapshot, checked out from one of the changesets stored in the repository itself, or a locally modified version of a changeset.

One important detail that may not be apparent from the descriptions above is that:

Even if you have already checked out a particular version, you can delete everything except the .hg/ subdirectory and the Mercurial repository will still function normally.

Clones Without a Working Copy

An example is a good way to demonstrate how a clone still functions as a Mercurial repository without a working copy. Let's assume that you have a tiny repository at /tmp/hgdemo that contains revisions of just a small hello.c program:

% pwd
/tmp/hgdemo
% hg root
/tmp/hgdemo
% hg log --style compact
1[tip]   c48ee3a9fd78   2010-01-11 08:33 +0200   keramida
  Use EXIT_SUCCESS instead of hard-coded zero.

0   041227edc91b   2010-01-11 08:32 +0200   keramida
  Add hello.c

% hg manifest tip
hello.c
%

You can check out a copy of the latest file revision of hello.c with the "hg checkout" command:

% hg checkout --clean tip
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
% cat -n hello.c
     1  #include <stdio.h>
     2  #include <stdlib.h>
     3  
     4  int
     5  main(void)
     6  {
     7      printf("Hello world\n");
     8      return EXIT_SUCCESS;
     9  }
%

The repository does not need a checkout to function though. The fact that your working copy has been updated to a particular revision is independent of the way the repository machinery under .hg/ works. So you can remove the source of hello.c and still use the repository to browse the history of the project:

% rm -f hello.c
% hg log --style compact
1[tip]   c48ee3a9fd78   2010-01-11 08:33 +0200   keramida
  Use EXIT_SUCCESS instead of hard-coded zero.

0   041227edc91b   2010-01-11 08:32 +0200   keramida
  Add hello.c

%

With a clone like this it is still possible to use any Mercurial command that does not require a working copy, e.g. "hg diff" to look at the differences between two arbitrary revisions:

% hg diff -r 0:1
diff -r 041227edc91b -r c48ee3a9fd78 hello.c
--- a/hello.c   Mon Jan 11 08:32:59 2010 +0200
+++ b/hello.c   Mon Jan 11 08:33:28 2010 +0200
@@ -1,8 +1,9 @@
 #include <stdio.h>
+#include <stdlib.h>
 
 int
 main(void)
 {
     printf("Hello world\n");
-    return 0;
+    return EXIT_SUCCESS;
 }
%

You can even check out the "null" revision (a magic revision name which Mercurial treats as "not any revision stored in this repository"):

% hg checkout --clean null
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
% hg identify --id --branch
000000000000 default

When a Mercurial clone has checked out the null revision, all the tracked files of the working copy are removed. If the clone does not contain untracked files, such as build artifacts, you should only see the .hg/ subdirectory when you look at the clone:

% find . -maxdepth 2 -exec /bin/ls -1 -dF {} +
./
./.hg/
./.hg/00changelog.i
./.hg/branch
./.hg/dirstate
./.hg/last-message.txt
./.hg/requires
./.hg/store/
./.hg/tags.cache
./.hg/undo.branch
./.hg/undo.dirstate
%

The disk space such a clone requires is limited to the size of the history metadata under .hg/.

Why Would You Want Such a Clone

For a small repository like the one shown in this example, it seems pretty useless to be able to have a Mercurial clone without a working copy. You don't really gain much by deleting the source of a small 9-line C program. The space savings of doing that are quite insignificant.

If you are, however, hosting clones of large repositories on a web server somewhere, stripping the working copy of Mercurial clones may be very handy indeed, and it may save you a large part of the disk space you would need to keep working copies around. By "large repository" I mean something like a single clone with several hundred or several thousand files, or a clone whose working copy requires tens or hundreds of megabytes of data.

The OpenSolaris onnv-gate repository is one of the large repositories that use Mercurial. My own Mercurial-based mirror of the FreeBSD head branch is another example for which I readily have size data. Size information for these two repositories is shown in the table below:

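|                      | FreeBSD head/ branch since 2008-01-01 | OpenSolaris onnv-gate repository |
|----------------------+---------------------------------------+----------------------------------|
| Tracked files        |                                41,807 |                           44,784 |
| Changesets           |                                15,513 |                           11,462 |
| Size of .hg store    |                                238 MB |                           292 MB |
| Size of working copy |                                385 MB |                           543 MB |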

Both of these Mercurial repositories have a moderately large number of files. It's also notable that the size of the working copy exceeds the size of the .hg/ repository store in both cases. In the onnv-gate repository of OpenSolaris, the working copy needs almost twice as much space as the entire history of the project (543 MB vs. 292 MB). That's a lot of disk space to carry around in all your local clones of onnv-gate!

If all you are looking for is a local mirror of the project sources — so that you can look at the history of a project, browse the diffs committed over time, search for interesting commit information (e.g. "when was bug 6801336 fixed in OpenSolaris?") — carrying around a full working copy is probably a waste of space. Updating the files of the working copy after every pull operation from the upstream master-repository is a waste of time too.


Posted in Computers, Free software, FreeBSD, Mercurial, Open source, Programming, SCM, Software Tagged: Computers, Free software, FreeBSD, hellug, Mercurial, Open source, Programming, SCM, Software

FOSDEM 2010

Earlier tonight, on December 7 2009, a friend and I booked our flight tickets for FOSDEM 2010. I am really excited that I am going to attend another open source & free software conference. It has been a while since I had a chance to meet other BSD people. The last time was in Milan, at EuroBSDCon 2006. It will certainly be tons of fun to meet other free and open source fans, contributors and developers in person!

About FOSDEM

FOSDEM is an open conference, organized every year by volunteers to promote the widespread use of Free Software and Open Source Software. It takes place in the beautiful city of Brussels (Belgium). FOSDEM meetings are recognized as “The best Free Software and Open Source events in Europe”.

This year’s FOSDEM will take place on February 6-7 of 2010. The web page of the conference is already online at http://www.fosdem.org/2010/. Updates about the organization of the conference, transportation tips, accommodation options, the dev rooms and talks available this year, and any other bits that may be useful to attendees are often posted there by the organizing team. So if you are planning to attend, add this link to your bookmarks and keep up with the news until we meet in Brussels.

More Updates

That’s all for now. I will be posting more details about the conference and my trip to Belgium as they become available.


Posted in Computers, Conferences, Free software, FreeBSD, GNU/Linux, Open source, Programming, Software Tagged: Computers, Conferences, Free software, FreeBSD, GNU/Linux, hellug, Open source, Programming, Software

fts(3) or Avoiding to Reinvent the Wheel

One of the C programs I was working on this weekend had to find all files that satisfy a certain predicate and add them to a list of “pending work”. The first thing that comes to mind is probably a custom opendir(), readdir(), closedir() hack. This is probably ok when these are the only interfaces available, but it is also a bit cumbersome. There are various sorts of DIR and dirent structures to juggle, and recursing down a large path requires manually keeping track of a lot of state.

A small program that uses opendir() and its friends to traverse a hierarchy of files and print their names may look like this:

% cat -n opendir-sample.c
     1  #include <sys/types.h>
     2
     3  #include <sys/stat.h>
     4
     5  #include <assert.h>
     6  #include <dirent.h>
     7  #include <limits.h>
     8  #include <stdio.h>
     9  #include <string.h>
    10
    11  static int      ptree(char *curpath, char * const path);
    12
    13  int
    14  main(int argc, char * const argv[])
    15  {
    16          int k;
    17          int rval;
    18
    19          for (rval = 0, k = 1; k < argc; k++)
    20                  if (ptree(NULL, argv[k]) != 0)
    21                          rval = 1;
    22          return rval;
    23  }
    24
    25  static int
    26  ptree(char *curpath, char * const path)
    27  {
    28          char ep[PATH_MAX];
    29          char p[PATH_MAX];
    30          DIR *dirp;
    31          struct dirent entry;
    32          struct dirent *endp;
    33          struct stat st;
    34
    35          if (curpath != NULL)
    36                  snprintf(ep, sizeof(ep), "%s/%s", curpath, path);
    37          else
    38                  snprintf(ep, sizeof(ep), "%s", path);
    39          if (stat(ep, &st) == -1)
    40                  return -1;
    41          if ((dirp = opendir(ep)) == NULL)
    42                  return -1;
    43          for (;;) {
    44                  endp = NULL;
    45                  if (readdir_r(dirp, &entry, &endp) != 0) {
    46                          closedir(dirp);
    47                          return -1;
    48                  }
    49                  if (endp == NULL)
    50                          break;
    51                  assert(endp == &entry);
    52                  if (strcmp(entry.d_name, ".") == 0 ||
    53                      strcmp(entry.d_name, "..") == 0)
    54                          continue;
    55                  if (curpath != NULL)
    56                          snprintf(ep, sizeof(ep), "%s/%s/%s", curpath,
    57                              path, entry.d_name);
    58                  else
    59                          snprintf(ep, sizeof(ep), "%s/%s", path,
    60                              entry.d_name);
    61                  if (stat(ep, &st) == -1) {
    62                          closedir(dirp);
    63                          return -1;
    64                  }
    65                  if (S_ISREG(st.st_mode) || S_ISDIR(st.st_mode)) {
    66                          printf("%c %s\n", S_ISDIR(st.st_mode) ? 'd' : 'f', ep);
    67                  }
    68                  if (S_ISDIR(st.st_mode) == 0)
    69                          continue;
    70                  if (curpath != NULL)
    71                          snprintf(p, sizeof(p), "%s/%s", curpath, path);
    72                  else
    73                          snprintf(p, sizeof(p), "%s", path);
    74                  snprintf(ep, sizeof(ep), "%s", entry.d_name);
    75                  ptree(p, ep);
    76          }
    77          closedir(dirp);
    78          return 0;
    79  }

With almost 80 lines, this looks a bit too complex for the simple task it performs. It has to keep a lot of temporary state information around in the two ep[] and p[] buffers, and all the manual work of setting, updating, and maintaining this internal state adds so much noise around the actual printf() statement at line 66 that it is almost too hard to understand what this particular bit of code is supposed to do.

The program still "works", in a way, so if you compile and run it, the expected results come up:

keramida@kobe:/home/keramida$ cc -O2 opendir-sample.c
keramida@kobe:/home/keramida$ ./a.out /tmp
d /tmp/.snap
d /tmp/.X11-unix
d /tmp/.XIM-unix
d /tmp/.ICE-unix
d /tmp/.font-unix
f /tmp/aprtdTjbX
f /tmp/aprEdWP4d
d /tmp/fam-gdm
f /tmp/.X0-lock
d /tmp/fam-keramida
d /tmp/.esd-1000
d /tmp/screens
d /tmp/screens/S-root
d /tmp/screens/S-keramida
d /tmp/emacs1000
f /tmp/a
f /tmp/b
f /tmp/kot
f /tmp/logsort
keramida@kobe:/home/keramida$ ./a.out /tmp /etc/defaults
d /tmp/.snap
d /tmp/.X11-unix
d /tmp/.XIM-unix
d /tmp/.ICE-unix
d /tmp/.font-unix
f /tmp/aprtdTjbX
f /tmp/aprEdWP4d
d /tmp/fam-gdm
f /tmp/.X0-lock
d /tmp/fam-keramida
d /tmp/.esd-1000
d /tmp/screens
d /tmp/screens/S-root
d /tmp/screens/S-keramida
d /tmp/emacs1000
f /tmp/a
f /tmp/b
f /tmp/kot
f /tmp/logsort
f /etc/defaults/rc.conf
f /etc/defaults/bluetooth.device.conf
f /etc/defaults/devfs.rules
f /etc/defaults/periodic.conf

But this program looks "ugly". Fortunately, the BSDs and Linux provide a more elegant interface for traversing file hierarchies: the fts(3) family of functions. A similar program that uses fts(3) to traverse the filesystem hierarchies rooted at the arguments of main() is:

% cat -n fts-sample.c
     1  #include <sys/types.h>
     2
     3  #include <sys/stat.h>
     4
     5  #include <err.h>
     6  #include <fts.h>
     7  #include <stdio.h>
     8
     9  static int      ptree(char * const argv[]);
    10
    11  int
    12  main(int argc, char * const argv[])
    13  {
    14          int rc;
    15
    16          if ((rc = ptree(argv + 1)) != 0)
    17                  rc = 1;
    18          return rc;
    19  }
    20
    21  static int
    22  ptree(char * const argv[])
    23  {
    24          FTS *ftsp;
    25          FTSENT *p, *chp;
    26          int fts_options = FTS_COMFOLLOW | FTS_LOGICAL | FTS_NOCHDIR;
    27          int rval = 0;
    28
    29          if ((ftsp = fts_open(argv, fts_options, NULL)) == NULL) {
    30                  warn("fts_open");
    31                  return -1;
    32          }
    33          /* Initialize ftsp with as many argv[] parts as possible. */
    34          chp = fts_children(ftsp, 0);
    35          if (chp == NULL) {
    36                  return 0;               /* no files to traverse */
    37          }
    38          while ((p = fts_read(ftsp)) != NULL) {
    39                  switch (p->fts_info) {
    40                  case FTS_D:
    41                          printf("d %s\n", p->fts_path);
    42                          break;
    43                  case FTS_F:
    44                          printf("f %s\n", p->fts_path);
    45                          break;
    46                  default:
    47                          break;
    48                  }
    49          }
    50          fts_close(ftsp);
    51          return 0;
    52  }

This version is not dramatically shorter; at 52 lines versus 79, it is only about 34% smaller in LOC. It is, however, far more elegant and a lot easier to read:

  • By using a higher level interface, the program is shorter and easier to understand.
  • By using simpler constructs in the fts_read() loop, it is very obvious what the program does for each file type (file vs. directory).
  • The FTS_COMFOLLOW and FTS_LOGICAL flags set up symbolic link handling in one simple place (something entirely missing from the opendir version).
  • There is no room for obvious bugs like copying only half of a pathname, forgetting to recurse in some cases, or forgetting to print a directory because of a complex interaction between superfluous bits of code. In this case, simpler is also less prone to bugs.
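The fts_read() loop in fts-sample.c quietly skips every entry type except FTS_D and FTS_F, including the ones fts(3) uses to report trouble. A slightly extended sketch, keeping the same fts_open() flags, could also report unreadable directories (FTS_DNR), failed stat(2) calls (FTS_NS) and other errors (FTS_ERR) through fts_errno, and tell the end of the traversal apart from an fts_read() failure by checking errno, which fts_read() sets to 0 once the whole hierarchy has been returned. The function name walk_tree() is hypothetical, and perror() stands in for the warn() call of the original:

```c
#include <sys/types.h>
#include <sys/stat.h>

#include <errno.h>
#include <fts.h>
#include <stdio.h>
#include <string.h>

/*
 * Like ptree() in fts-sample.c, but also report the entries that fts(3)
 * flags as problems.  Returns the number of problems found, or -1 if
 * fts_open() fails.  walk_tree() is a hypothetical name for this sketch.
 */
static int
walk_tree(char * const paths[])
{
	FTS *ftsp;
	FTSENT *p;
	int problems = 0;

	ftsp = fts_open(paths, FTS_COMFOLLOW | FTS_LOGICAL | FTS_NOCHDIR, NULL);
	if (ftsp == NULL) {
		perror("fts_open");
		return -1;
	}
	while ((p = fts_read(ftsp)) != NULL) {
		switch (p->fts_info) {
		case FTS_D:
			printf("d %s\n", p->fts_path);
			break;
		case FTS_F:
			printf("f %s\n", p->fts_path);
			break;
		case FTS_DNR:		/* directory that cannot be read */
		case FTS_ERR:		/* other error; details in fts_errno */
		case FTS_NS:		/* no stat(2) information available */
			fprintf(stderr, "%s: %s\n", p->fts_path,
			    strerror(p->fts_errno));
			problems++;
			break;
		default:		/* FTS_DP, FTS_DC, FTS_DEFAULT, ... */
			break;
		}
	}
	/* fts_read() sets errno to 0 when the whole hierarchy has been returned. */
	if (errno != 0) {
		perror("fts_read");
		problems++;
	}
	fts_close(ftsp);
	return problems;
}
```

A main() like the one in fts-sample.c could call walk_tree(argv + 1) and turn any non-zero result into exit status 1.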

So the next time you are about to build a filesystem traversal toolset from scratch, you can avoid all the pain (and bugs): use fts(3)! :-)


Posted in Computers, FreeBSD, GNU/Linux, Linux, NetBSD, Open source, OpenBSD, Programming, Software Tagged: Computers, FreeBSD, GNU/Linux, hellug, Linux, NetBSD, Open source, OpenBSD, Programming, Software

Contributing to FreeBSD


As part of the FreeBSD team, I often get asked the same question: “How can I get started as a FreeBSD contributor?”

There are usually two reasons why a new contributor feels overwhelmed by the idea of getting started. The first is the feeling that it is difficult to find out exactly how to start contributing to a free software project. The second is usually a feeling of powerlessness, the notion that “I am such a newbie, how could I ever make a difference in such a large project?”

Both of these concerns can be addressed quite easily. This post is my attempt at recording what I have learned by being a part of the FreeBSD team for almost a decade now, so let me start with the more serious of these two obstacles to becoming a FreeBSD contributor: the feeling of being too small to make a difference.

You Can Make a Difference!

The best response I can think of to the idea that a new contributor is too small to do something important for FreeBSD is a short story by Jorge Bucay, one of the Argentinian authors I love:

The Story of the Chained Elephant

When I was a small boy, I loved going to the circus. Animal acts were my favorite. I was quite impressed by the elephant, who is — as I found out later — the favorite animal of all children. The elephant’s part of the show was a display of his huge weight, his immense size and power… Then, as the show was approaching its end, slightly before the elephant had to return to his tent, he was standing tied to a tiny wooden stake driven partially into the ground. A chain was wrapped around his feet.

The stake was very small, and the part of it that was driven into the ground was even smaller. The chain wrapped around the elephant's legs was quite large, but it seemed quite obvious, even to my childish mind, that an animal so immensely powerful that it could rip trees out of the ground and hurl them away could simply rise and walk away.

That was the mystery of the elephant.

What sort of immense force could keep the elephant tied to that tiny stake?

Why didn’t he rise and walk away?

When I was five or six years old, I put great trust in the wisdom of my elders. So I asked my teacher, my father, and my uncle about the mystery of the elephant. I don’t remember anymore who gave me this particular answer, but one of the replies was that the elephant doesn’t run away because he is “tame”.

Then I asked the obvious question: “If he’s tame, why do they have to chain him?” I don’t think I ever got a satisfactory answer to this question.

As time went by, I forgot all about the mystery of the huge elephant and the tiny stake. The mystery would only resurface when I was in the company of others who had wondered about the same thing.

Then, a few years ago, I discovered that someone knew why the elephant doesn’t run away.

The elephant doesn’t run away because he has been tied to a similar stake ever since he was very, very small.

I closed my eyes, and I tried to imagine the small, newborn elephant, chained to the ground. The small elephant would push, pull and struggle with all his strength, trying to free himself, but he would fail. Despite all his efforts, he would fail again and again, because that stake and chain were too strong for him.

The elephant would sleep exhausted from all his efforts to free himself, and would wake up the next day. All his struggles would fail the next day too, and a third day, and a fourth, and many tiresome, exhausting days after those. Then one day would come, a horrible day in the history of our elephant, when he would just give up and accept his fate, deciding that he was too weak to escape, that his strength was not enough and would never be enough.

The huge and immensely powerful elephant that we see in the circus does not run away because the poor animal believes that he cannot do that.

The memory of the lack of strength he felt a little after his birth is now deeply engraved in his very soul and spirit.

The worst of it all is that he has never tried to free himself since.

He never ever tried to test his powers again.

The story of the circus elephant often explains why new people do not try to contribute to FreeBSD. They have this strange idea that they are, for some odd reason, “not good enough”; that they cannot really stand side by side with the giants who have built this enormous, immensely huge system; that their feeble attempts to improve their favorite OS will be met with scorn, or contemptuous laughter by the super smart alien beings that are behind such a complex beast of a system.

This is, fortunately, not true. FreeBSD has been developed by humans, by people like me and, most importantly, you, the new contributor who is passionate about his favorite OS. We are not superhuman entities from outer space, but we like what we are doing, and we try to develop, improve and extend the operating environment that we all love.

We have all tried to do many things about FreeBSD and with FreeBSD. Some of them have worked, and a small percentage of what has worked later became a part of the official FreeBSD system. But there have also been thousands of times that we failed. Utterly and unrecoverably failed. We went down the wrong path for a long time. We tried things that were risky and amusing, but also very, very easy to break, to do funny or silly things, or even to just blow up in our faces.

If you are a new FreeBSD contributor, try not to let that tiny stake and chain hold you back the way it holds back the circus elephant. We have all failed in our attempts to do something that improves FreeBSD. We have failed not once, not twice, but many times over and over again.

But we keep trying our strength, and in the end we do find our place in the team that makes FreeBSD the wonderful system that it is today and the amazing system that it will be tomorrow :)

Finding Out How to Get Started

So you decided that you do want to help, but there’s a tiny obstacle that has to be overcome first. You don’t know where to get started and learn more about FreeBSD, how it works, how it is developed, and how you can contribute to make it a better system.

First of all, congratulations on wanting to contribute to FreeBSD! We always need more hands to work on the open bugs, to answer questions of new users, to write documentation, to test new drivers, to debug and fix old drivers, and so on.

There are many things you can do to help FreeBSD. You can start with easy tasks, and move to more difficult ones as you pick up the details of how everything works.

My suggestion would be to start by reading the latest version of the “Contributing to FreeBSD” article. You can find it online at:

http://www.freebsd.org/doc/en_US.ISO8859-1/articles/contributing/

If you are interested in helping us with the FreeBSD Ports Collection, one of the major selling points of FreeBSD, there is a separate article that may give you some ideas to get started: “Contributing to the FreeBSD Ports Collection”. This one is available at:

http://www.freebsd.org/doc/en_US.ISO8859-1/articles/contributing-ports/

The “FreeBSD Development Projects” page is a third option you have. This is a list of interesting, active and/or useful things we could do to extend, improve and adapt FreeBSD so that it can do new things, or just more cool things. The list of projects is visible online at:

http://www.freebsd.org/projects/

Some of the most interesting projects are listed separately on that page, under the “Project Ideas” section:

http://www.freebsd.org/projects/ideas/

All these pages are public information, accessible to anyone who wants to know about ways to help FreeBSD. So you are most welcome to have a look at these pages, and look for something that seems interesting to you.

When you do find something interesting, you will probably have a few questions about how to work on the idea, where to grab the sources of FreeBSD, where to submit patches, how to do that, and so on. Our large collection of mailing lists is going to be helpful at this point. Visit the mailing list information page at:

http://lists.freebsd.org/mailman/listinfo/

Look for a mailing list that matches the work you are doing, and then either post directly to the list, or subscribe to it. One of the lists that is probably going to be useful for general questions about FreeBSD (questions like “where do I get the source of the ls(1) utility?”) is the freebsd-questions mailing list:

http://lists.freebsd.org/mailman/listinfo/freebsd-questions

If you can’t locate the correct list for something you are working on, if you have questions that don’t seem to fit neatly into the topics of another list, or even if you just want to ask something quick about FreeBSD but you don’t have the time to seek the right mailing list to do that, the freebsd-questions list should be your fallback choice.

Posted in Computers, Free software, FreeBSD, Open source, Programming, Software Tagged: Computers, Free software, FreeBSD, Open source, Programming, Software

A bunch of updates for the Greek FreeBSD/doc translations


Translations of technical documentation from English to Greek are a relatively difficult task. It takes a certain level of attention to detail and a fairly good command of both languages. Then there is the minor issue of keeping the translations up to date with their English counterparts.

Updating translations (the old style)

We have a growing body of translated work at the FreeBSD Greek documentation project team, and it was getting rather unwieldy going through each file manually and checking if there are updates in the English version that we would like to pull out of CVS, translate from scratch or re-translate, and commit to our main translation tree. Back when I started writing the original Greek translation build glue, I copied a tagging scheme used by existing translations that was helpful for this sort of manual check. Each translated file had a comment of the form:

<!-- Original revision: 1.17 -->

When looking for updates, one had to manually perform the following steps for each file in the doc/el_GR.ISO8859-7/ directory:

  • Check if the file includes an “Original revision” comment.
  • Extract the revision number from that comment, and note it down somewhere.
  • Make an educated guess about the pathname of the original English text. Sometimes the path is easy to guess by substituting el_GR.ISO8859-7 with en_US.ISO8859-1 in the file’s path name. Other times, it isn’t so easy (especially for files in the el_GR.ISO8859-7/share directory).
  • Locate the $FreeBSD: ... $ line in the original English text.
  • Compare with the saved revision from the comment of the Greek text, and see if there are updates to translate.
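The easy case of the third step is a single string substitution. The following C sketch, with guess_original_path() as a hypothetical helper name, replaces the first occurrence of the translation directory name with the English one. It deliberately covers only that easy case: for files under el_GR.ISO8859-7/share the substituted path may not name the right original, which is exactly the guesswork described above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build a guess for the English original of a translated file by
 * replacing the first occurrence of the translation directory (from)
 * with the English one (to).  The result goes into buf; returns 0 on
 * success, or -1 if "from" does not appear in path or buf is too small.
 * guess_original_path() is a hypothetical helper for illustration.
 */
static int
guess_original_path(const char *path, const char *from, const char *to,
    char *buf, size_t bufsize)
{
	const char *hit;
	int len;

	if ((hit = strstr(path, from)) == NULL)
		return -1;
	/* prefix before the match, the replacement, then the rest of path */
	len = snprintf(buf, bufsize, "%.*s%s%s", (int)(hit - path), path,
	    to, hit + strlen(from));
	if (len < 0 || (size_t)len >= bufsize)
		return -1;
	return 0;
}
```

For example, el_GR.ISO8859-7/articles/laptop/article.sgml becomes en_US.ISO8859-1/articles/laptop/article.sgml.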

There are just five steps for each file in this checking process. When translated files are just a bunch of articles, and a few makefiles, it’s boring to repeat these steps for each file, but it isn’t so difficult that nobody can do it. Now that we have Greek translations for a large part of the FreeBSD Handbook, and I am a bit more pressed for time, manually performing these steps for each file of the Greek translation tree became very difficult to do in a timely manner.

New tools (checkupdate)

This was the main reason for writing the checkupdate script. With a lot of help from Gabor Pali, one of the committers who work on the Hungarian FreeBSD translations, I wrote a Python script called checkupdate and designed a tagging scheme that would make this part of the translator’s work much easier. We started by defining how a translator can “tag” a translated source file with the revision of the last fully translated English version. The idea we came up with was:

Each translated file will contain a pair of tags called “%SOURCE%” and “%SRCID%“. The %SOURCE% tag will point to the relative path of the English text under the doc/ tree. The %SRCID% tag will refer to the last fully translated revision of the %SOURCE% file.

An example for one of the translated Greek articles is:

$ pwd
/ws/bsd/doc
$ head -10 el_GR.ISO8859-7/articles/new-users/article.sgml
<!--

  $FreeBSD: doc/el_GR.ISO8859-7/articles/new-users/article.sgml,v 1.4 2008/01/14 14:19:42 keramida Exp $

  ??? ??????? ????? ???? ??? FreeBSD ??? ??? ??? Unix

  The FreeBSD Greek Documentation Project

  %SOURCE%      en_US.ISO8859-1/articles/new-users/article.sgml
  %SRCID%       1.24
$
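Given a header like the one above, extracting the two tag values takes only a short scan over each line. The C sketch below is an illustration under our own assumptions, not the checkupdate code (which is written in Python): find_tag() is a hypothetical helper that looks for a tag such as %SOURCE% or %SRCID% in one line and copies the next whitespace-separated word.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look for tag (e.g. "%SOURCE%" or "%SRCID%") in one line of a translated
 * file and copy the whitespace-separated value that follows it into val.
 * Returns 1 if the tag was found with a value that fits, 0 otherwise.
 * find_tag() is a hypothetical helper, not part of checkupdate.
 */
static int
find_tag(const char *line, const char *tag, char *val, size_t valsize)
{
	const char *p;
	size_t n;

	if ((p = strstr(line, tag)) == NULL)
		return 0;
	p += strlen(tag);
	p += strspn(p, " \t");		/* skip blanks after the tag */
	n = strcspn(p, " \t\r\n");	/* the value ends at the next blank */
	if (n == 0 || n >= valsize)
		return 0;
	memcpy(val, p, n);
	val[n] = '\0';
	return 1;
}
```

Calling it on every line read with fgets() until both tags are found would recover en_US.ISO8859-1/articles/new-users/article.sgml and 1.24 from the header shown above.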

Then we wrote a Python script that can “parse” the %SOURCE% and %SRCID% tags, look up the CVS (or Subversion) revision number of the original English text, and report any differences. The “interface” of the script was quite simple: a list of filenames is fed to the script through standard input, and it assumes they are relative pathnames under the top of a doc/ checkout. This way, to check all the files of the Greek translation one would run:
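Reporting differences means comparing the %SRCID% value with the current revision of the %SOURCE% file, and CVS revision numbers have to be compared field by field as integers: as plain strings, "1.9" would sort after "1.25". A small C sketch of such a comparison, using revcmp() as a hypothetical name and treating a revision with extra fields as the larger one:

```c
#include <stdlib.h>

/*
 * Compare two CVS revision numbers such as "1.9" and "1.25" field by
 * field, numerically.  Returns <0, 0 or >0, like strcmp().
 * revcmp() is a hypothetical helper name.
 */
static int
revcmp(const char *a, const char *b)
{
	char *ea, *eb;
	unsigned long na, nb;

	for (;;) {
		na = strtoul(a, &ea, 10);
		nb = strtoul(b, &eb, 10);
		if (na != nb)
			return na < nb ? -1 : 1;
		/* stop when at least one revision has no more fields */
		if (*ea != '.' || *eb != '.')
			return (*ea == '.') - (*eb == '.');
		a = ea + 1;
		b = eb + 1;
	}
}
```

With it, revcmp("1.39", "1.60") < 0 flags articles/Makefile from the sample output below as out of date, and revcmp("1.9", "1.25") < 0 does the same for the laptop article.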

$ pwd
/ws/bsd/doc
$ find el_GR.ISO8859-7 | checkupdate

To check multiple translations trees at once it would be possible either to loop through the translations:

$ pwd
/ws/bsd/doc
$ for dname in el_GR.ISO8859-7 mn_MN.UTF-8 hu_HU.ISO8859-2 ; do \
    find "${dname}" | checkupdate ; \
done

or just pass their names directly to find:

$ pwd
/ws/bsd/doc
$ find el_GR.ISO8859-7 mn_MN.UTF-8 hu_HU.ISO8859-2 | checkupdate

The first version of the script tried to include as much information about each translated file as possible, so it used a relatively verbose output format. This is the default output format even today. For the current version of the el_GR.ISO8859-7 translation tree the checkupdate script output includes the following:

$ find el_GR.ISO8859-7 | checkupdate
el_GR.ISO8859-7/articles/Makefile rev. 1.16
    1.39       -> 1.60        en_US.ISO8859-1/articles/Makefile

el_GR.ISO8859-7/articles/laptop/article.sgml rev. 1.4
    1.9        -> 1.25        en_US.ISO8859-1/articles/laptop/article.sgml

[...]

Gabor (pgj) later added an option for compact output, because he likes seeing one line of output for each file. The compact mode is enabled with the -c option of the checkupdate script:

$ find el_GR.ISO8859-7 | checkupdate -c
1.39       -> 1.60       el_GR.ISO8859-7/articles/Makefile
1.9        -> 1.25       el_GR.ISO8859-7/articles/laptop/article.sgml
[...]
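Judging from the sample output, each compact line holds the last translated revision and the current English revision, each left-aligned in a 10-character column, followed by the path of the translated file. That width is an inference from the output above, not something taken from checkupdate's source. A C sketch that reproduces the same layout, with format_compact() as a hypothetical name:

```c
#include <stdio.h>

/*
 * Format one line in the layout of "checkupdate -c": last translated
 * revision, arrow, current English revision, translated file path.
 * The 10-column width is inferred from sample output (an assumption).
 * Returns 0 on success, -1 if the line does not fit in buf.
 */
static int
format_compact(char *buf, size_t bufsize, const char *oldrev,
    const char *newrev, const char *path)
{
	int len;

	len = snprintf(buf, bufsize, "%-10s -> %-10s %s", oldrev, newrev, path);
	return (len < 0 || (size_t)len >= bufsize) ? -1 : 0;
}
```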

The checkupdate script has now been committed to the FreeBSD doc/ tree in CVS, and it includes a short manpage too. The script and manpage sources are browsable online at:

http://cvsweb.freebsd.org/doc/el_GR.ISO8859-7/share/tools/checkupdate/

Updating translations (new style)

Using the checkupdate script and a CVS checkout of the doc/ tree is much easier now. I usually open two side-by-side terminals, and keep running CVS diff commands in one of them and checkupdate in the other. A typical MFen session for one of the Greek articles includes:

  • Picking one of the translated files to update, from the output of checkupdate. For this example, let’s assume I want to update the laptop/article.sgml file.
  • Running “cvs log” and “cvs diff” in the second terminal window, to look at each change committed in CVS:

    $ cvs log -r1.9:1.25 en_US.ISO8859-1/articles/laptop/article.sgml | more
    $ cvs diff -r1.9 -r1.25 en_US.ISO8859-1/articles/laptop/article.sgml | cdiff
  • If the diffs seem too large to translate in one go, I may opt to translate each CVS change as a separate piece. The FreeBSD doc committers try to keep content and indentation changes separate, so it is often the case that translating revision 1.9 (a content change) as a standalone change is a lot easier than trying to decipher what changed between 1.8 and 1.10 (because revision 1.10 rewrapped and reformatted lots of text and it makes looking for the content changes of 1.9 unnecessarily hard).
  • Looking at only one revision of a file is slightly boring in CVS, but not really tough:

    $ cvs diff -r1.8 -r1.9 en_US.ISO8859-1/articles/laptop/article.sgml | cdiff
  • When the translation of revision 1.9 is done, I commit it to the Mercurial tree I am using for local work, taking care to update the %SRCID% comment in the file to show that it is now synchronized with English revision 1.9.
  • Some time later, a bunch of changes are pushed to the main Mercurial tree at http://hg.hellug.gr/freebsd/doc-el/.
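When a range such as 1.9 -> 1.25 is translated one English revision at a time, the cvs diff commands from the steps above follow a simple pattern: the change made in revision 1.N is the diff between 1.(N-1) and 1.N. A C sketch that prints one such command per trunk revision, assuming plain 1.N revision numbers (branch revisions are not handled) and using print_rev_diffs() as a hypothetical name:

```c
#include <stdio.h>

/*
 * Print one "cvs diff" command per English revision after the last
 * translated one (1.first) up to the current one (1.last), so each
 * change can be reviewed and translated on its own.  Only trunk
 * revisions of the form "1.N" are handled.  print_rev_diffs() is a
 * hypothetical helper, not part of checkupdate.
 */
static void
print_rev_diffs(FILE *out, const char *path, unsigned first, unsigned last)
{
	unsigned r;

	for (r = first + 1; r <= last; r++)
		fprintf(out, "cvs diff -r1.%u -r1.%u %s\n", r - 1, r, path);
}
```

For the laptop article, print_rev_diffs(stdout, "en_US.ISO8859-1/articles/laptop/article.sgml", 9, 25) prints sixteen commands, from -r1.9 -r1.10 up to -r1.24 -r1.25.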

Recent updates

Using the checkupdate script and the CVS diff commands described so far, I merged a fair number of updates from the English text since last night. The commit email started trickling in late at night, when I extracted the patches from my personal Mercurial tree and committed them into CVS:

2008-08-31 [  29: Giorgos Keramidas   ] cvs commit: doc/en_US.ISO8859-1/books/developers-handbook/policies chapter.s$
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/share/sgml mailing-lists.ent
2008-09-01 [  15: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/share/sgml freebsd.ent
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books Makefile.inc
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/releng extra.css
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books/handbook/jails chapter.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books/handbook/jails chapter.sgml
2008-09-01 [  15: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  13: Giorgos Keramidas   ] cvs commit: doc/en_US.ISO8859-1/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  15: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books/handbook colophon.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  13: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml

The number of commits looks scary, but in reality this was only because I was experimenting with separate MFen commits of each English revision.

In retrospect, this may not be a very good idea. We don’t really need *all* the English versions translated in CVS (some may be broken, others may be intermediate commits, or may be missing some bits). It doesn’t make sense to include all the false starts of the English docs in the el_GR.ISO8859-7 tree too. So the last two commits to CVS included a bunch of English revision merges in “collapsed” form:

keramida    2008-09-02 13:56:43 UTC

  FreeBSD doc repository

  Modified files:
    el_GR.ISO8859-7/books/handbook/virtualization chapter.sgml
  Log:
  MFen: 1.11 -> 1.13  en_US.ISO8859-1/books/handbook/virtualization/chapter.sgml

  Revision  Changes    Path
  1.5       +9 -4      doc/el_GR.ISO8859-7/books/handbook/virtualization/chapter.sgml

keramida    2008-09-02 13:57:41 UTC

  FreeBSD doc repository

  Modified files:
    el_GR.ISO8859-7/books/handbook/virtualization chapter.sgml
  Log:
  MFen: 1.13 -> 1.17  en_US.ISO8859-1/books/handbook/virtualization/chapter.sgml

  Revision  Changes    Path
  1.6       +198 -3    doc/el_GR.ISO8859-7/books/handbook/virtualization/chapter.sgml

I think I like this commit style a bit better, and after a short discussion on the translators' mailing list, Manolis seems to like this style too.

cpupowerd FreeBSD Patch

It took a few days longer than originally planned, but my FreeBSD patch for cpupowerd is now complete and stable. By the looks of it, it will be integrated into the next major version, 0.2.0. I would also have liked to support OpenBSD/NetBSD or DragonflyBSD, but unfortunately the BSDs are all so different in this area, and currently offer no way to access the CPU MSR registers, that it would take a lot of effort just to build the necessary infrastructure first. On FreeBSD, thanks to Stanislav Sedov, there is sysutils/devcpu, a kernel module that provides access to the MSR registers. If someone takes care of porting the kernel module to the other BSDs, or knows another way to access the MSR registers, I will gladly do the rest of the work that is needed.

Experimenting with Mercurial “named branches”


As an experiment with the “named branch” support of Mercurial (Hg hereafter), I’ve started updating the editors/emacs-devel port of FreeBSD, using an Hg repository with two branches:

  • HEAD is the main branch where history is imported from the official FreeBSD CVS repository
  • keramida is a named branch where my own, local changes are committed

The experiment seems to be going pretty well so far, and the port has been updated to a CVS snapshot of the GNU Emacs source tree obtained at 1 Jan 2008, 21:19:17 UTC. You can see the Hg repository with the two named branches at:

http://hg.hellug.gr/keramida/ports/emacs-devel/

I’ll keep the converted port repository around, and see how future updates work. I’m really interested to see what happens with “merges” of upstream code, after the current “keramida” branch has been committed upstream, to the official FreeBSD ports/ repository :-)

Greek FreeBSD doc/ update


Another update of our base freebsd/doc Mercurial tree has been pushed to hg.hellug.gr.

The bundles for populating new Mercurial trees with the changes are being uploaded to freefall as I’m typing this post. They should be available in a short while from:

bsd.hg-2007.12.02.20

A bundle of the clean imports done from FreeBSD doc/

el.hg-2007.12.02.20

A bundle of the merged sources, including all the changesets I have pulled from other translators

Happy translating!