Category Archives: Mercurial

What keramida said…

Mercurial repository clones can have two parts:

  1. An .hg/ subdirectory, where all the repository metadata is stored
  2. A “working copy” area, where checked out files may live

The .hg/ subdirectory stores the repository metadata of the specific clone, including the history of all changesets stored in the specific clone, clone-specific hooks and scripts, information about local tags and bookmarks, and so on. This is the only part of a Mercurial repository that is actually mandatory for a functional repository.

The “working copy” area is everything under the clone that is not under the toplevel .hg/ subdirectory of the particular clone. The working area of each Mercurial repository may contain a snapshot of the files stored in the repository: either a clean snapshot, checked out from one of the changesets stored in the repository itself, or a locally modified version of a changeset.

One important detail that may not be apparent from the descriptions above is that:

Even if you have already checked out a particular version, you can delete everything except the .hg/ subdirectory and the Mercurial repository will still function normally.

Clones Without a Working Copy

An example is a good way to demonstrate how a clone still functions as a Mercurial repository without a working copy. Let’s assume that you have a tiny repository at /tmp/hgdemo that contains revisions of just a small hello.c program:

% pwd
/tmp/hgdemo
% hg root
/tmp/hgdemo
% hg log --style compact
1[tip]   c48ee3a9fd78   2010-01-11 08:33 +0200   keramida
  Use EXIT_SUCCESS instead of hard-coded zero.

0   041227edc91b   2010-01-11 08:32 +0200   keramida
  Add hello.c

% hg manifest tip
hello.c
%

You can check-out a copy of the latest file revision of hello.c with the “hg checkout” command:

% hg checkout --clean tip
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
% cat -n hello.c
     1  #include <stdio.h>
     2  #include <stdlib.h>
     3  
     4  int
     5  main(void)
     6  {
     7      printf("Hello world\n");
     8      return EXIT_SUCCESS;
     9  }
%

The repository does not need a checkout to function though. The fact that your working copy has been updated to a particular revision is independent of the way the repository machinery under .hg/ works. So you can remove the source of hello.c and still use the repository to browse the history of the project:

% rm -f hello.c
% hg log --style compact
1[tip]   c48ee3a9fd78   2010-01-11 08:33 +0200   keramida
  Use EXIT_SUCCESS instead of hard-coded zero.

0   041227edc91b   2010-01-11 08:32 +0200   keramida
  Add hello.c

%

With a clone like this it is still possible to use any Mercurial command that does not require a working copy, e.g. “hg diff” to look at the differences between two arbitrary revisions:

% hg diff -r 0:1
diff -r 041227edc91b -r c48ee3a9fd78 hello.c
--- a/hello.c   Mon Jan 11 08:32:59 2010 +0200
+++ b/hello.c   Mon Jan 11 08:33:28 2010 +0200
@@ -1,8 +1,9 @@
 #include <stdio.h>
+#include <stdlib.h>
 
 int
 main(void)
 {
     printf("Hello world\n");
-    return 0;
+    return EXIT_SUCCESS;
 }
%

You can even checkout the “null” revision (a magic revision name which Mercurial treats as “not any revision stored in this repository”):

% hg checkout --clean null
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
% hg identify --id --branch
000000000000 default

When a Mercurial clone has checked out the null revision all the tracked files of the working copy are removed. If the clone does not already contain build-time artifacts you should only see the .hg/ subdirectory when you look at the clone:

% find . -maxdepth 2 -exec /bin/ls -1 -dF {} +
./
./.hg/
./.hg/00changelog.i
./.hg/branch
./.hg/dirstate
./.hg/last-message.txt
./.hg/requires
./.hg/store/
./.hg/tags.cache
./.hg/undo.branch
./.hg/undo.dirstate
%

The disk space such clone requires is limited by the size of the history metadata.

Why Would You Want Such a Clone

For a small repository like the one shown in this example, it seems pretty useless to be able to have a Mercurial clone without a working copy. You don’t really gain much by deleting the source of a small 9-line C program. The space savings of doing that are quite insignificant.

If you are, however, hosting clones of large repositories in a web server somewhere, stripping the working copy of Mercurial clones may be very handy indeed and it may save you a large part of the disk space you would need to keep working copies around. By “large repository” I mean something like a single clone with several hundreds or thousands of files, or a clone whose working copy requires tens or hundreds of megabytes of data.

The OpenSolaris onnv-gate repository is one of the large repositories that use Mercurial. My own Mercurial-based mirror of the FreeBSD head branch is another example for which I readily have size data. Size information for these two repositories is shown in the table below:

  FreeBSD head/ branch
since 2008-01-01
OpenSolaris
onnv-gate repository
Tracked files 41.807 44.784
Changesets 15.513 11.462
Size of .hg store 238 MB 292 MB
Size of working copy 385 MB 543 MB

Both of these Mercurial repositories have a moderately large number of files. It’s also important that the size of the working copy exceeds the size of the .hg/ repository store in both cases. In the onnv-gate repository of OpenSolaris the working copy needs almost twice as much as the entire history of the project. That’s a lot of disk space to carry around in all your local clones of onnv-gate!

If all you are looking for is a local mirror of the project sources — so that you can look at the history of a project, browse the diffs committed over time, search for interesting commit information (e.g. “when was bug 6801336 fixed in OpenSolaris?”) — carrying around a full working copy is probably a waste of space. Updating the files of the working copy after every pull operation from the upstream master-repository is a waste of time too.


Posted in Computers, Free software, FreeBSD, Mercurial, Open source, Programming, SCM, Software Tagged: Computers, Free software, FreeBSD, hellug, Mercurial, Open source, Programming, SCM, Software

What keramida said…

As pleasures go, it is a strange yet somewhat refined one to see a project one has started pick up speed. My fellow translators at the Greek documentation team of FreeBSD have been busy lately, and the result of our collective work is a fairly large number of commits to the “doc-el” repository.

There are now at least four translators actively working on a chapter of their own: Manolis Kiagias, Vaggelis Typaldos, Kyriakos Kentrotis and me. Changesets flow between our repository clones almost every day, and I often find myself pulling patches from two or three places at the same time.

This morning I picked up patches from both Kyriakos and Manolis. Manolis had already integrated with Vaggelis, so pulling from him I also got the translations of Vaggelis. In the meantime, my nightly cron job had finished importing a new snapshot from the official CVS tree, so today’s history graph looks scary:

Greek FreeBSD Translations: Commit History of 28 June 2009

The “surface complexity” of a change history like this may seem scary, but to me it is nothing of the sort. It is, in fact, quite the opposite: something to be proud and happy about, because it shows a lively team, working steadily towards our common goal—a fully translated doc/ tree with a translated, accessible version of the FreeBSD Handbook for Greek users.

It really makes me very happy to see an effort started several years ago gain momentum. My own personal commits are far less than those of the other translators now, and I often find myself in the role of a “patch integrator” instead of actively translating new text. But this is ok, because now we have more people working on the translations so we still get many improvements every day :-)


Posted in Computers, Free software, FreeBSD, Mercurial, Open source, Software Tagged: Computers, Free software, FreeBSD, hellug, Mercurial, Open source, Software

A bunch of updates for the Greek FreeBSD/doc translations


Translations of technical documentation from English to Greek are a relatively difficult task. It takes a certain level of attention to detail and a fairly good command of both languages. Then there is the minor issue of keeping the translations up to date with their English counterparts.

Updating translations (the old style)

We have a growing body of translated work at the FreeBSD Greek documentation project team, and it was getting rather unwieldy going through each file manually and checking if there are updates in the English version that we would like to pull out of CVS, translate from scratch or re-translate, and commit to our main translation tree. Back when I started writing the original Greek translation build glue, I copied a tagging scheme used by existing translations that was helpful for this sort of manual check. Each translated file had a comment of the form:

<!-- Original revision: 1.17 -->

When looking for updates, one had to manually perform the following steps for each file in the doc/el_GR.ISO8859-7/ directory:

  • Check if the file includes an “Original revision” comment.
  • Extract the revision number from that comment, and note it down somewhere.
  • Make an educated guess about the pathname of the original English text. Some times the path is easy to guess by substituting el_GR.ISO8859-7 with en_US.ISO8859-1 in the file’s path name. Some other times, it isn’t so easy (especially for files in the el_GR.ISO8859-7/share directory).
  • Locate the $FreeBSD: ... $ line in the original English text.
  • Compare with the saved revision from the comment of the Greek text, and see if there are updates to translate.

There are just five steps for each file in this checking process. When translated files are just a bunch of articles, and a few makefiles, it’s boring to repeat these steps for each file, but it isn’t so difficult that nobody can do it. Now that we have Greek translations for a large part of the FreeBSD Handbook, and I am a bit more pressed for time, manually performing these steps for each file of the Greek translation tree started becoming very difficult to do in a timely manner.

New tools (checkupdate)

This was the main reason for writing the checkupdate script. With a lot of help from Gabor Pali, one of the committers who work for the Hungarian FreeBSD translations, I wrote a Python script called checkupdate and designed a tagging scheme that would make this part of the translator’s work much easier. We started by defining how a translator can “tag” a translated source file with the revision of the last fully translated English version. The idea we came up was:

Each translated file will contain a pair of tags called “%SOURCE%” and “%SRCID%“. The %SOURCE tag will point to the relative path of the English text under the doc/ tree. The %SRCID% tag will refer to the last fully translated revision of the %SOURCE% file.

An example for one of the translated Greek articles is:

$ pwd
/ws/bsd/doc
$ head -10 el_GR.ISO8859-7/articles/new-users/article.sgml
<!--

  $FreeBSD: doc/el_GR.ISO8859-7/articles/new-users/article.sgml,v 1.4 2008/01/14 14:19:42 keramida Exp $

  ??? ??????? ????? ???? ??? FreeBSD ??? ??? ??? Unix

  The FreeBSD Greek Documentation Project

  %SOURCE%      en_US.ISO8859-1/articles/new-users/article.sgml
  %SRCID%       1.24
$

Then we wrote a Python script that can “parse” the %SOURCE% and %SRCID% tags, look up the CVS (or Subversion) revision number of the original English text, and report any differences. The “interface” of the script was quite simple: a list of filenames is fed to the script through standard input, and it assumes they are relative pathnames under the top of a doc/ checkout. This way, to check all the files of the Greek translation one would run:

$ pwd
/ws/bsd/doc
$ find el_GR.ISO8859-7 | checkupdate

To check multiple translations trees at once it would be possible either to loop through the translations:

$ pwd
/ws/bsd/doc
$ for dname in el_GR.ISO8859-7 mn_MN.UTF-8 hu_HU.ISO8859-2 ; do \
    find "${dname}" | checkupdate ; \
done

or just pass their names directly to find:

$ pwd
/ws/bsd/doc
$ find el_GR.ISO8859-7 mn_MN.UTF-8 hu_HU.ISO8859-2 | checkupdate

The first version of the script tried to include as much information about each translated file as possible, so it used a relatively verbose output format. This is the default output format even today. For the current version of the el_GR.ISO8859-7 translation tree the checkupdate script output includes the following:

$ find el_GR.ISO8859-7 | checkupdate
el_GR.ISO8859-7/articles/Makefile rev. 1.16
    1.39       -> 1.60        en_US.ISO8859-1/articles/Makefile

el_GR.ISO8859-7/articles/laptop/article.sgml rev. 1.4
    1.9        -> 1.25        en_US.ISO8859-1/articles/laptop/article.sgml

[...]

Gabor (pgj) later added an option for compact output, because he likes seeing one line of output for each file. The compact mode is enabled with the -c option of the checkupdate script:

$ find el_GR.ISO8859-7 | checkupdate -c
1.39       -> 1.60       el_GR.ISO8859-7/articles/Makefile
1.9        -> 1.25       el_GR.ISO8859-7/articles/laptop/article.sgml
[...]

The checkupdate script has now been committed to the FreeBSD doc/ tree in CVS, and it includes a short manpage too. The script and manpage sources are browsable online at:

http://cvsweb.freebsd.org/doc/el_GR.ISO8859-7/share/tools/checkupdate/

Updating translations (new style)

Using the checkupdate script and a CVS checkout of the doc/ tree is much easier now. I usually open two side-by-side terminals, and keep running CVS diff commands in one of them and checkupdate in the other. A typical MFen session for one of the Greek articles includes:

  • Picking one of the translated files to update, from the output of checkupdate. For this example, let’s assume I want to update the laptop/article.sgml file.
  • Running “cvs log” and “cvs diff” in the second terminal window, to look at each change committed in CVS:

    $ cvs log -r1.9:1.25 en_US.ISO8859-1/articles/laptop/article.sgml | more
    $ cvs diff -r1.9 -r1.25 en_US.ISO8859-1/articles/laptop/article.sgml | cdiff
  • If the diffs seem to large to translate in one go, I may opt to translate each CVS change as a separate piece. The FreeBSD doc committers try to keep content and indentation changes separate, so it is often the case that translating revision 1.9 (a content change) as a standalone change is a lot easier than trying to decipher what changed between 1.8 and 1.10 (because revision 1.10 rewrapped and reformatted lots of text and it makes looking for the content changes of 1.9 unnecessarily hard).
  • Looking at only one revision of a file is slightly boring in CVS, but not really tough:

    $ cvs diff -r1.8 -r1.9 en_US.ISO8859-1/articles/laptop/article.sgml | cdiff
  • When the translation of revision 1.9 is done, I commit it to the Mercurial tree I am using for local work, taking care to update the %SRCID% comment in the file to show that it is now synchronized with English revision 1.9.
  • Some time later, a bunch of changes are pushed to the main Mercurial tree at http://hg.hellug.gr/freebsd/doc-el/.

Recent updates

aUsing the checkupdate script and the CVS diff commands described so far, I merged from the English text a fair number of updates since last night. The commit email started tricking in late at night, when I extracted the patches from my personal Mercurial tree and committed them into CVS:

2008-08-31 [  29: Giorgos Keramidas   ] cvs commit: doc/en_US.ISO8859-1/books/developers-handbook/policies chapter.s$
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/share/sgml mailing-lists.ent
2008-09-01 [  15: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/share/sgml freebsd.ent
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books Makefile.inc
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/releng extra.css
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books/handbook/jails chapter.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books/handbook/jails chapter.sgml
2008-09-01 [  15: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  13: Giorgos Keramidas   ] cvs commit: doc/en_US.ISO8859-1/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  15: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-01 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/dialup-firewall article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/books/handbook colophon.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  13: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  12: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml
2008-09-02 [  14: Giorgos Keramidas   ] cvs commit: doc/el_GR.ISO8859-7/articles/freebsd-questions article.sgml

The number of commits looks scary, but in reality this was only because I was experimenting with separate MFen commits of each English revision.

In retrospect, this may not be a very good idea. We don’t really need *all* the English versions translated in CVS (some may be broken, others may be intermediate commits, or may be missing some bits). It doesn’t make sense to include all the false starts of the English docs in the el_GR.ISO8859-7 tree too. So the last two commits to CVS included a bunch of English revision merges in “collapsed” form:

keramida    2008-09-02 13:56:43 UTC

  FreeBSD doc repository

  Modified files:
    el_GR.ISO8859-7/books/handbook/virtualization chapter.sgml
  Log:
  MFen: 1.11 -> 1.13  en_US.ISO8859-1/books/handbook/virtualization/chapter.sgml

  Revision  Changes    Path
  1.5       +9 -4      doc/el_GR.ISO8859-7/books/handbook/virtualization/chapter.sgml

keramida    2008-09-02 13:57:41 UTC

  FreeBSD doc repository

  Modified files:
    el_GR.ISO8859-7/books/handbook/virtualization chapter.sgml
  Log:
  MFen: 1.13 -> 1.17  en_US.ISO8859-1/books/handbook/virtualization/chapter.sgml

  Revision  Changes    Path
  1.6       +198 -3    doc/el_GR.ISO8859-7/books/handbook/virtualization/chapter.sgml

I think I like this commit style a bit better, and after a short discussion in the mailing list of the translators, Manolis seems to like this style too.

Experimenting with Mercurial “named branches”


As an experiment with the “named branch” support of Mercurial (Hg hereafter), I’ve started updating the editors/emacs-devel port of FreeBSD, using an Hg repository with two branches:

  • HEAD is the main branch where history is imported from the official FreeBSD CVS repository
  • keramida is a named branch where my own, local changes are committed

The experiment seems to be going pretty well so far, and the port has been updated to a CVS snapshot of the GNU Emacs source tree obtained at 1 Jan 2008, 21:19:17 UTC. You can see the Hg repository with the two named branches at:

http://hg.hellug.gr/keramida/ports/emacs-devel/

I’ll keep the converted port repository around, and see how future updates work. I’m really interested to see what happens with “merges” of upstream code, after the current “keramida” branch has been committed upstream, to the official FreeBSD ports/ repository :-)

Greek FreeBSD doc/ update


Another update of our base freebsd/doc Mercurial tree has been pushed to hg.hellug.gr.

The bundles for populating new Mercurial trees with the changes are being uploaded to freefall as I’m typing this post. They should be available in a short while from:

bsd.hg-2007.12.02.20

A bundle of the clean imports done from FreeBSD doc/

el.hg-2007.12.02.20

A bundle of the merged sources, including all the changesets I have pulled from other translators

Happy translating!