Archive for the ‘Uncategorized’ Category

Enter Libc

Tuesday, August 5th, 2008

I spent the last few weeks importing the interesting parts of Apple’s libc into FreeBSD. I ended up with this patch: http://versus.ath.cx/patches/fbsd_7.0_collation.patch, which I then applied to the CURRENT libc and committed to my p4 libc branch.

It wasn’t without accident, however - I suffered two serious laptop breakages, ending up with an essentially dead machine. First the display broke down - it stopped showing anything - I think it was the backlight. I was lucky - I had an external monitor enabled and configured for Dual Head. When I connected it, however, I started to get IDE timeout problems when booting FreeBSD. So my hard drive had died as well. I managed to boot off a live CD and copy my GSoC data, and that was it.

After installing everything on my newly bought laptop, I got back to finishing my work on libc, and was able to get it into a - I think - fully working state. There are still some small things I want to fix or enhance, but the general functionality is there.

Now I’m focusing on writing regression tests; then will come the manpages. At the end there will be a call for testers.
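One property a collation regression test can check without any knowledge of the locale data is that strcoll() agrees with comparing strxfrm() keys, and that swapping the arguments flips the sign. The following is only a minimal sketch of such a check, not the actual test suite; the helper names (sign, xfrm_key, collation_consistent) are made up for illustration.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1. */
static int sign(int x)
{
    return (x > 0) - (x < 0);
}

/* Return a malloc'ed strxfrm() key for s, or NULL if allocation fails. */
static char *xfrm_key(const char *s)
{
    size_t len;
    char *key;

    assert(s != NULL);
    len = strxfrm(NULL, s, 0);      /* size query: with n == 0 dest may be NULL */
    key = malloc(len + 1);
    if (key != NULL)
        strxfrm(key, s, len + 1);
    return key;
}

/*
 * Hypothetical check: returns 1 if strcoll(a, b) has the same sign as
 * strcmp() on the transformed keys and the opposite sign of strcoll(b, a),
 * 0 otherwise (including allocation failure).
 */
static int collation_consistent(const char *a, const char *b)
{
    char *ka = xfrm_key(a);
    char *kb = xfrm_key(b);
    int ok = 0;

    if (ka != NULL && kb != NULL) {
        int direct = sign(strcoll(a, b));
        ok = direct == sign(strcmp(ka, kb)) &&
             direct == -sign(strcoll(b, a));
    }
    free(ka);
    free(kb);
    return ok;
}
```

A test driver would call setlocale(LC_COLLATE, ...) for each locale under test and then run collation_consistent() over pairs of sample strings.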

Reinforcements from Apple

Saturday, July 12th, 2008

Last time I said that it would be nice to use some of Apple’s work in the area of collation. I exchanged a few emails with Jordan K. Hubbard and it seems we can use their code without problems - all the interesting parts are still under the BSD licence. That’s because this is still our code, only extended by Apple. Even the copyrights weren’t changed a bit (so we don’t know who did the extending).

Anyway, as the code is fairly mature, I decided to use it. The libc part of the code is the one I am most interested in, but to see how it works, I first had to port the userland tool - Apple’s version of colldef. While doing this I extended it a little, so that it does not choke on expansions. As I don’t have the locale data that Apple is using, I made the tool work on my data - at the same time making it more POSIX compliant. There were many small issues while porting the code - and I wanted it to work perfectly before I submitted it - so it took me more than a week to complete the porting. I even made it compile with “-ansi -Wall -Wextra -pedantic”, something I always do with my code.
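For readers unfamiliar with the term: an expansion is a character that collates as if it were a sequence of several elements, the classic example being German “ß” sorting like “ss”. The snippet below is only a toy illustration of that idea on UTF-8 input; it is not the code from colldef, and the table and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical expansion rule: a UTF-8 sequence and the text it collates as. */
struct expansion {
    const char *from;
    const char *to;
};

/* Illustrative table only; real rules would come from the locale source. */
static const struct expansion expansions[] = {
    { "\xC3\x9F", "ss" }    /* U+00DF LATIN SMALL LETTER SHARP S */
};

/*
 * Copy src into dst, replacing every sequence listed in the table by its
 * expansion. Returns 0 on success, -1 if dst (dstsz bytes) is too small.
 */
static int expand_for_collation(const char *src, char *dst, size_t dstsz)
{
    size_t used = 0;
    size_t i;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        const char *piece = src;    /* default: copy one byte unchanged */
        size_t piece_len = 1;
        size_t consumed = 1;

        for (i = 0; i < sizeof expansions / sizeof expansions[0]; i++) {
            size_t flen = strlen(expansions[i].from);
            if (strncmp(src, expansions[i].from, flen) == 0) {
                piece = expansions[i].to;
                piece_len = strlen(piece);
                consumed = flen;
                break;
            }
        }
        if (used + piece_len + 1 > dstsz)   /* keep room for the NUL */
            return -1;
        memcpy(dst + used, piece, piece_len);
        used += piece_len;
        src += consumed;
    }
    if (used + 1 > dstsz)
        return -1;
    dst[used] = '\0';
    return 0;
}
```

With this table, the UTF-8 string “Straße” comes out as “Strasse”, which can then be compared like any other string. A real collation table would attach weights to the expanded elements rather than rewrite the text.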

Now that the tool is complete and I have done a final cleanup, I will test it on a larger amount of data, and then proceed to port the libc part. I’m really excited to see how it works. When those two things are completed I will have to make a few more extensions to Apple’s code to make it fully compliant with UCA.

Gathering the basic elements

Monday, June 16th, 2008

Last time I wrote about the purpose of collation. Now it is time to write a little about how I want to deal with it.

I’ve been a little busy with my exams lately (who hasn’t), but I have the last one on June 26th. Anyway, I’ve managed to gather some basic building blocks, which will support the rest of my project:

  1. imported the “Common Locale Data Repository” (CLDR) - the source of all locale data that you will ever need - into my p4 tree
  2. written converter scripts to change the symbolic character names as found in this repository into UTF-8 sequences
  3. written a program called colldef.c that uses the data output from the scripts and builds the binary collation table, doing some fancy compression/reduction on the way, so that all character weights fit within one byte.
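The conversion in step 2 can be sketched as follows. This is an illustration in C rather than the actual converter scripts, and it assumes the symbolic names have the form <Uxxxx> (hexadecimal code point) as used in POSIX locale sources; the function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Encode one Unicode code point as UTF-8. Returns the number of bytes
 * written (1-4), or 0 for surrogates and values above U+10FFFF.
 */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80UL) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800UL && cp <= 0xDFFFUL)
        return 0;
    if (cp < 0x10000UL) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/*
 * Convert a symbolic name of the assumed form "<Uxxxx>" into UTF-8.
 * Returns the sequence length, or 0 if the name is malformed.
 */
static size_t symbol_to_utf8(const char *name, unsigned char out[4])
{
    unsigned long cp = 0;
    int digits = 0;
    const char *p;

    assert(name != NULL);
    if (name[0] != '<' || name[1] != 'U')
        return 0;
    for (p = name + 2; isxdigit((unsigned char)*p); p++) {
        int c = tolower((unsigned char)*p);
        if (++digits > 8)           /* keeps cp within 32 bits */
            return 0;
        cp = cp * 16 + (unsigned long)(isdigit(c) ? c - '0' : c - 'a' + 10);
    }
    if (digits == 0 || p[0] != '>' || p[1] != '\0')
        return 0;
    return utf8_encode(cp, out);
}
```

For example, <U00E1> (á) yields the two bytes 0xC3 0xA1, and <U20AC> (the euro sign) yields 0xE2 0x82 0xAC.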

The next step will be writing the libc part - the one that uses the binary table and does the sorting/collation. I will have to rewrite most of string/strcoll.c and locale/collate.c.
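To give a rough idea of what “using the table” means, here is a toy comparison routine driven by a one-byte weight table. It is a sketch under heavy assumptions: a single comparison level, one weight per byte rather than per UTF-8 character, and a table filled in by hand instead of loaded from colldef output. None of the names come from the real strcoll.c or collate.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical weight table indexed by byte value; 0 means "ignored". */
static unsigned char weight_of[256];

/* Fill the toy table: letters get weights 1..26 regardless of case. */
static void init_toy_weights(void)
{
    const char *lower = "abcdefghijklmnopqrstuvwxyz";
    const char *upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    int i;

    memset(weight_of, 0, sizeof weight_of);
    for (i = 0; i < 26; i++) {
        weight_of[(unsigned char)lower[i]] = (unsigned char)(i + 1);
        weight_of[(unsigned char)upper[i]] = (unsigned char)(i + 1);
    }
}

/* Return the next non-ignored weight in *s and advance *s; 0 at the end. */
static unsigned char next_weight(const unsigned char **s)
{
    while (**s != '\0') {
        unsigned char w = weight_of[*(*s)++];
        if (w != 0)
            return w;
    }
    return 0;
}

/* Compare two strings by table weights; ties fall back to strcmp(). */
static int toy_strcoll(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    unsigned char wa, wb;

    assert(a != NULL && b != NULL);
    do {
        wa = next_weight(&pa);
        wb = next_weight(&pb);
        if (wa != wb)
            return wa < wb ? -1 : 1;
    } while (wa != 0);
    return strcmp(a, b);    /* distinct strings never compare equal */
}
```

After init_toy_weights(), “apple” sorts before “Banana” even though strcmp() puts the capital letter first, and the hyphen in “co-op” is skipped when weights are compared.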

I was recently contacted by Alexander Leidinger, who told me that Apple has already done a full conversion of their base system to UTF-8. I skimmed through their strcoll.c and collate.c and I can confirm this. It would be nice if we could use some part of this work.