Hi to all readers! I’m moving this blog (or at least my intention to write) to a new address. I’m thankful to Florent Thoumie (and possibly others) who have been hosting and managing this collection of posts as a part of the blogs.freebsdish.org system. I’m happy with the hosting; the reason I’m moving is to write about things besides FreeBSD, which falls outside the agreed use of this hosting service.
The new blog system will probably take some time before all the kinks are worked out, but it will do for now. I don’t intend to write here anymore, but I also have no plans to move old posts to the new system, since blog posts are ephemeral anyway. So update your bookmarks, feed readers, etc. to the new address.
VirtualBox was mentioned on Slashdot, so I gave it another try. The last time I tried it I had problems with FreeBSD as a guest, but the new version (1.6.2) seems to solve those problems. It’s noticeably faster than VMware so far.
At the start I’d like to say that I don’t have anything against VMware Inc. and their products; I use their products regularly.
I tried VMware Server 2.0 RC1 again. The first time was early in the beta, so I just shrugged at the problems and returned to 1.0 to wait until they were solved. This time it’s a release candidate. And it just sucks. It’s full of WTF moments.
Today’s “wow, this really works!” moment is brought to you by: NFS, sshfs (from FUSE) and Samba.
There’s an NFS file server in my office, from which I mount stuff into my home directory on my workstation. There’s a small FreeBSD server in my living room which, among other things, serves Samba shares to my Windows desktop machine. Using sshfs on the home server, I mounted my home directory from work as a subdirectory of my local user’s home, and I’m accessing it from my Windows desktop over Samba.
Before the bytes hit the drives on the server, here’s the path they must take:
[Home desktop, Windows] -- Samba -- [Home server, FreeBSD] -- sshfs -- [Work desktop, Linux] -- NFS -- [Work server, FreeBSD]
And it works. Really. I’m editing OpenOffice files on my Work server right now.
Of course it should work – all the individual components in the chain are tested and known to work, so there was no real concern. But seeing it all in operation made me think about how many standards, interoperability specs and plain engineering went into making this possible, especially since the links between the components vary widely: ADSL, Ethernet of various speeds, and I’m sure there’s still ATM somewhere in the telco’s infrastructure. The number of different operating systems the bytes pass through (counting the “embedded” ones on routers and similar equipment) is probably huge.
We live in great times.
(Of course, I won’t try anything that depends on file locking.)
The only problem is that sshfs effectively hangs the system when the IP address on the ADSL side changes (file system lookups hang).
The answer, of course, is “two or more”. And it’s not nice when it happens.
Two of the drives in a nice shiny FC array failed at approximately the same time (possibly within about two minutes of each other), and both were in the same RAID5 array. Definitely not good. On the other hand, I confirmed that PostgreSQL can run off NFS (both server and client on FreeBSD) without problems so far (this is the temporary setup until we get a new array).
I created a small program to help me synchronize files in sort-of real time between two directories (the idea is that one of them is on an NFS server). There are no replicated file systems for FreeBSD, and the canonical way to do this is to use rsync or something like it. The problem is that rsync always traverses both directory structures, compares the files and then copies the changed ones (via a variety of smart algorithms, but it’s still very resource-intensive).
So I created a daemon that uses kqueue(2) to monitor which files have changed and feeds only those files to rsync (it’s not exactly a new idea; I’m sure somebody has mentioned it on the FreeBSD lists). This is in many ways a suboptimal solution, since it needs to keep an open file descriptor for every monitored file (which ties up kernel resources and memory), so it won’t scale to really large directories, which could actually benefit the most from this approach. It will work reasonably well for a small number of files (up to several tens of thousands), provided the kern.maxfiles and kern.maxfilesperproc sysctls and login session limits (if applicable) are raised accordingly.
Anyone who’s interested can download the adfs daemon and try it. This was hacked together over the weekend so it probably has some problems. I’ll fix those problems that prevent me from using it, but I’ll update the online archive only if there are interested users.
I’m writing a small project in C (more about it later) and I really miss the expressiveness that dynamic languages like Python offer. Beyond elusive “elegance” and similar intangible properties, there’s one concrete thing: the ability to easily use and implement better algorithms. Yes, ever since Turing it has been obvious that the choice of programming language is more or less syntactic sugar, but you wouldn’t exactly like to spend your days programming infinite tapes of symbols, would you?
In this (again, emphasis on “small”) project, there were a couple of opportunities where I could have used a fast data-access structure like a hash table (since I need to store and retrieve a lot of data entries) or dynamic memory allocation (since I don’t want to artificially limit the number of these entries), but I just didn’t feel like writing all the code to implement a hash table in C (or pulling in a heavy external library) and dealing with memory reallocation and tracking all those pointers. Yes, I’m lazy. In a more abstract language I could just instantiate a dictionary, say d[i] = something, and it would actually be very efficient and take care of memory allocation automagically. Since I limited myself to basic C, I chose simpler structures like linked lists and evil static arrays on the stack. Ironically, these structures would be comparatively much less efficient in Python.
Of course, at its root this can be stripped down to a simple choice between using pre-packaged routines and writing your own (aka the NIH problem), but in this case using the pre-packaged ones would actually make my simple program faster and more efficient – despite the overhead of an interpreted, dynamic language.
There are many more similar cases – programmers write bubble sorts in C because they are easy to implement, while at a higher level of abstraction they could just write mylist.sort() and get quicksort or some other efficient algorithm for free, and so on.
Does anyone know of a library / collection of algorithms for C similar to glib, only BSD-licensed? (Yes, I know about the C++ algorithms; I don’t want to use C++.)
The day has finally come – FreeBSD is using Subversion instead of CVS for the base source tree! Congratulations to everyone involved, especially Peter Wemm.
FreeBSD’s source CVS repository is one of the oldest and biggest in existence; it’s approximately 12 years old and has apparently accumulated something like 180,000 commits over the years, or on average slightly more than 41 commits daily. A checkout of the RELENG_7 branch holds more than 42,000 files (482 MB as du sees it).
This move was discussed extensively during the DevSummit at BSDCan 2008. There have been many issues with CVS over the years, most of which are minor enough to be overlooked, but some of which are just nasty (the inability of CVS to move or rename files, bad handling of branching in the face of constant new development and additions to the directory tree, non-atomic commits), and these have frequently required manual intervention in the CVS repository (a “repocopy” is one of the more frequently requested operations from the CVS admins).
The old infrastructure, of which cvsup / csup is probably the most important part, will continue to work, as code will be continuously mirrored from SVN to CVS until suitable replacements or upgrades to those tools are created. This is also the reason the name “CVS” will remain in the system infrastructure for some time, until all of it is updated. Ports will continue to use CVS for the foreseeable future.
To make a source base this large work efficiently on SVN, version 1.5 had to be used, since it shards its database files across a tree of directories on the file system instead of keeping one huge directory with all the files in it.
Also see the official announcement of Subversion and Peter Wemm’s notes about Subversion (very useful to developers).
It was a really great conference! I have so many good impressions that it’s hard to sort them all out and write them down. Instead, here’s a treat for the geeky-minded.
In accordance with the specification expressed here:
… I’ve created a device driver that implements the functionality in the kernel. In honour of the operating system by which this work was inspired, I named the driver random.debian. The kernel module creates a device entry (/dev/random.debian) which is an infinite source of random data with entropy compatible with the above specification. The source code tarball for the Debian-like random data source is of course available under the BSD license. It will work on any recent version of FreeBSD.
As much as I would like to have thought of this first, the idea was actually put forward by PHK or Robert Watson while we were waiting for dinner, so that part of the credit goes to them.
This DevSummit+BSDCan was very fun and educational, and I will definitely try to be here in the coming years as well.
It was a nice day in Ottawa, Canada today, though judging by the clouds, tomorrow might not be. Not that we noticed, spending all our time in the conference room. Various talks kept the developers interested, and the lack of Internet connectivity (the web is not the Internet…) kept them focused on the issues at hand. I gave my talk about finstall among the first in the morning and gathered very positive feedback and many new ideas. It’s unfortunate that the project isn’t sponsored any more (the Google SoC grant for it was not extended to this year), so perhaps this will encourage people or organizations to support the project financially. Among other interesting talks (my personal, unobjective choice) were the presentation on VImage (network virtualization) by my colleague from the University, Marko Zec, the DTrace talk by John Birrell, and various talks about the ongoing network stack optimizations (many people here are working on those). Unfortunately, the release packaging BoF didn’t happen, apparently due to lack of interest, so there went one missed chance to discuss finstall.
Altogether, it was a very nicely spent day, with many opportunities to learn new things and talk to like-minded people working on interesting things.