Monthly Archive for February, 2010

After my initial experiments last month, I applied to the FreeBSD Foundation for funds to pay for additional human editing of the machine-generated YouTube transcripts. The screenshot on the left shows an example HIT (Human Intelligence Task) available on Amazon Mechanical Turk.
The task description on the left is based on a template I created with three variables: $VIDEO_URL, $VIDEO_TITLE, and $CAPTIONS_URL. New HITs are then created by uploading a CSV file with three columns for each of those variables, e.g.
VIDEO_URL,VIDEO_TITLE,CAPTIONS_URL
http://www.youtube.com/watch?v=mMmbjJI5su0,"BSD v. GPL, Jason Dixon, NYCBSDCon 2008",http://people.FreeBSD.org/~murray/improved-captions-bsdvsgpl.sbv
http://www.youtube.com/watch?v=Pe8LdJpBGJ4,"Isolating Cluster Jobs for Performance and Predictability, Brooks Davis (DCBSDCon 2009",http://people.FreeBSD.org/~murray/improved-captions-isolatingcluster.sbv
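A CSV file like the one above could also be generated with a short script instead of by hand. The following is only a minimal sketch in Python: the function name `build_hit_csv` is made up for illustration, and the single row is copied from the example above. The point it shows is that the `csv` module quotes the title automatically because it contains commas.

```python
import csv
import io

# Column names match the three template variables from the post.
FIELDS = ["VIDEO_URL", "VIDEO_TITLE", "CAPTIONS_URL"]


def build_hit_csv(rows):
    """Return CSV text with a header row and one row per HIT (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    {
        "VIDEO_URL": "http://www.youtube.com/watch?v=mMmbjJI5su0",
        "VIDEO_TITLE": "BSD v. GPL, Jason Dixon, NYCBSDCon 2008",
        "CAPTIONS_URL": "http://people.FreeBSD.org/~murray/improved-captions-bsdvsgpl.sbv",
    },
]

# Fields containing commas are quoted; the URLs are written bare.
print(build_hit_csv(rows), end="")
```

Writing the result to a file and uploading it would then work the same way as with a hand-edited CSV.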
Using this method, I created 12 HITs for the first pass of editing, offering between $9 and $14 per video. A slightly modified template with the same three variables was used for a second pass, paying ~$7 per video to further polish the first-pass transcripts.
The template has gotten more detailed over the past month in response to all of the minor ways that workers submitted less-than-perfect transcripts. The actual SBV file format used by YouTube captions is not formally specified anywhere as far as I can tell, but the 60-character maximum width and simple format can be verified in submitted transcripts with a few Emacs macros.
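The same kind of check could be scripted rather than done with editor macros. The sketch below is not the method used here, and because the SBV format is not formally specified it rests on assumptions: that each caption block starts with a timestamp line shaped like `0:00:01.000,0:00:04.500`, is followed by one or more text lines, and ends with a blank line. The name `check_sbv` is invented for the example.

```python
import re

MAX_WIDTH = 60  # maximum caption width mentioned in the post

# Assumed timestamp layout, e.g. "0:00:01.000,0:00:04.500"
TIMESTAMP = re.compile(r"^\d+:\d{2}:\d{2}\.\d{3},\d+:\d{2}:\d{2}\.\d{3}$")


def check_sbv(text):
    """Return a list of (line_number, message) for lines that look wrong."""
    problems = []
    expect_timestamp = True  # every caption block should open with a timestamp
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            expect_timestamp = True  # a blank line ends the current block
            continue
        if expect_timestamp:
            if not TIMESTAMP.match(line):
                problems.append((number, "expected a timestamp line"))
            expect_timestamp = False
        elif len(line) > MAX_WIDTH:
            problems.append((number, f"caption is {len(line)} characters wide"))
    return problems


sample = (
    "0:00:01.000,0:00:04.500\n"
    "Welcome to the talk.\n"
    "\n"
    "0:00:04.500,0:00:09.000\n"
    + "x" * 61 + "\n"
)

for number, message in check_sbv(sample):
    print(f"line {number}: {message}")
```

Running it over each submitted transcript would list the lines a worker needs to rewrap before the file is accepted.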
The transcript files have been checked into the FreeBSD Doc CVS Repository. The full list of videos with human-edited English language transcripts is:
- "M. Warner Losh, An Overview of FreeBSD/mips, AsiaBSDCon2009" (captions)
- "AsiaBSDCon 2009: Internet Mail — Past, Present, and (a bit of) the Future" (captions)
- "A. Rao: The Locking Infrastructure in the FreeBSD kernel #1" (captions)
- "A. Rao: The Locking Infrastructure in the FreeBSD kernel #2" (captions)
- "PC-BSD, Matt Olander, AsiaBSDCon 2008" (captions)
- "FreeBSD, Protecting Privacy with Tor" (captions)
- "Isolating Cluster Jobs for Performance and Predictability, Brooks Davis (DCBSDCon 2009" (captions)
- "Richard Bejtlich, Network Security Monitoring Using FreeBSD" (captions)
- "Jason Dixon Closing Remarks of DCBSDCon - BSD is Still Dying" (captions)
- "A Narrative History of BSD, Dr. Kirk McKusick" (captions)
- "BSD is Dying, Jason Dixon, NYCBSDCon 2007" (captions)
- "BSD v. GPL, Jason Dixon, NYCBSDCon 2008" (captions)
- "FreeBSD Kernel Internals, Dr. Marshall Kirk McKusick" (captions)
It looks like my post about MongoDB got a lot more popular than usual, and also provoked a sort-of official response from the MongoDB developer(s). It is fair to mention them together so that people who find one part of the story can find the other. Since my original post talks about multiple issues and the comments wander through various topics, I want to summarize the part of the discussion about durability here.
HAST is ready!
I'm very happy to report to FreeBSD users that the HAST project I was working on for the last three months is ready for testing and already committed to the HEAD branch.
I'll describe what HAST does in a few words. HAST allows for synchronous block-level replication of any storage media (called GEOM providers, using FreeBSD nomenclature) over a TCP/IP network for fast failure recovery. HAST provides storage using the GEOM infrastructure, meaning it is file system and application independent and can be combined with any existing GEOM class. In case of a primary node failure, the cluster will automatically switch to the secondary node, check and mount the UFS file system or import the ZFS pool, and continue to work without missing a single bit of data.
I must admit the project was quite challenging, not only from the technical point of view, but also because it was sponsored by the FreeBSD Foundation. The FreeBSD Foundation has a great reputation and is known to select the projects it funds very carefully. I felt strong pressure that should I fail, the FreeBSD Foundation's reputation might be hurt. Of course, not a single dollar would be spent on a failed project, but the FreeBSD community's expectations were very high and I really wanted to do a good job.
During the work a number of people contacted me privately offering help, explaining how important HAST is for FreeBSD and giving me the motivation to soldier on.
I hope that HAST will meet the community's expectations and I myself am looking forward to using it :)
Once again, I'd like to thank the HAST sponsors: the FreeBSD Foundation, OMCnet Internet Service GmbH, and TransIP BV.
All cards I had to deal with ("both" wouldn't be that impressive here) had a similar design, save for register layout and some quirks. I believe that the vast majority of NICs share the same design to some extent: there are circular RX/TX rings of more or less similar structure, an interrupt status/mask register, media settings registers, you name it. Not rocket science.
So I took the if_arge driver for the Atheros AR71XX SoC and replaced the hardware-dependent parts with FIXME comments. I also replaced the string "ARGE" with "ADAPTER" and "arge" with "adapter", so a simple s/adapter/xyz/g and s/ADAPTER/XYZ/g would give us a half-baked source base for an if_xyz driver.
It's yet to be tested whether this approach is any good. I'm planning to try it in the next few days :) Meanwhile you can check the sources here.
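To make the renaming step concrete, here is a small sketch of the two substitutions in Python. The template lines and the helper name `instantiate_template` are made up for illustration and are not taken from the actual skeleton sources. It only shows that the two case-sensitive replacements are independent of each other.

```python
def instantiate_template(source, name):
    """Apply s/adapter/<name>/g and s/ADAPTER/<NAME>/g to the template text."""
    # str.replace is case-sensitive, so the two passes do not interfere.
    return source.replace("adapter", name.lower()).replace("ADAPTER", name.upper())


# Hypothetical excerpt of a driver skeleton, for illustration only.
template = (
    "/* FIXME: program the hardware RX ring registers */\n"
    "static int adapter_attach(device_t dev);\n"
    "#define ADAPTER_LOCK(sc) mtx_lock(&(sc)->adapter_mtx)\n"
)

print(instantiate_template(template, "xyz"), end="")
```

The FIXME comments survive untouched, which is the intent: they mark the places where the hardware-specific code still has to be written.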
One of the things that has been on my TODO list for quite some time is porting the Arduino IDE to FreeBSD. Fortunately, Warren Block took the time to sit down and work on a port, and he is pleased to announce that a preliminary version of it is ready for testing.
I’ll be testing it out over the next few days and I encourage you to do the same. As always, any feedback or patches will be much appreciated. If all goes well, I will be committing it to the tree in the very near future.
The port can be found on GitHub: http://github.com/wblock/Arduino-port-for-FreeBSD
Howdy All,
As you all know, Robert Noland is our X guy, but his new job takes
most of his time, and X11 is too much for one person. Robert mostly
deals with the X stuff on the src side, and we now need some people to
help him on the ports side. Beat@ and I have started to help him;
we've set up an SVN repository [1] and a small wiki page [2] with all
the needed information. If you are interested in helping us a bit,
please mail me back or join us on IRC at EFnet/#freebsd-xorg.
[1] http://trillian.chruetertee.ch/ports/browser/branches/xorg-dev
[2] http://wiki.freebsd.org/ModularXorg/7.5
Many Thanks
- Martin
In preparation for 7.3-RELEASE, the ports tree is now in feature freeze.
Normal upgrades, new ports, and changes that only affect other branches are allowed without prior approval, but the commit message must carry the extra "Feature safe: yes" tag. Sweeping commits (those that touch a large number of ports), infrastructural changes, commits to ports with an unusually high number of dependent ports, and any other commit that requires rebuilding many packages are not allowed without prior explicit approval from portmgr while the freeze is in effect.
When in doubt, please do not hesitate to contact portmgr.
Again this year, the FOSDEM organization had reserved a DevRoom for the BSDs. I hadn’t been to FOSDEM for several years and was pleasantly surprised to see how many BSD developers and users had turned up.
Unfortunately, I missed the first talk, as the Sunday bus schedule clearly didn't scale to the huge number of conference goers. The second talk was Ed Schouten on his Newcons project for FreeBSD. Of course, I was already familiar with the utmpx part of the project, with ~100 ports failing on the cluster after those changes, and we're working together on fixing those. Ed showed some very promising performance improvements and much better UTF-8 non-ASCII support, although some fonts do need more work.
Benny Siegert introduced some of the nitty-gritty of autotools and libtool, which ease software portability across multiple platforms. While they are among the most hated parts of the ports world, they are a big improvement over previous tools and, especially, over manual development.
Next up was Shteryana Shopova showing how to debug the FreeBSD kernel with the large number of tools provided by the operating system. With generous amounts of examples and demos, she gave a number of tips on which information to include when sending a problem report to the FreeBSD bug tracking database to get the best support from the FreeBSD developers, and even more important, how to collect that data out of a crashed system.
A face seen at most European BSD-related conferences over the last many years, Marc Balmer presented a case study of using BSD Unix and BSD-licensed software in a commercial setting. He talked about the advantages of the BSD license over other licenses (illustrated by the number of words in the license), about the BSD development process and contributing code back to the project, and about the point of sale (POS) software his company makes on top of a BSD operating system.
In by far the most popular talk, with well over 80 attendees, Axel Beckert talked about the Debian/kFreeBSD project, which builds a Debian GNU userland on top of a FreeBSD kernel. While he spent some time answering the biggest question of all, "Why?", I'm not sure everybody was convinced by the answer: "Because we can". Most people will probably still install Debian when they want Debian and FreeBSD when they want FreeBSD, but there are some who consider this combination the best of both worlds. It will be interesting to follow how the project develops in the future.
Treading carefully so as not to restart the version control system wars of old, Giorgos Keramidas showed how he tracks the FreeBSD Subversion changes in his local Mercurial setup. Some of the speedups from having the changes locally in a Mercurial repository rather than on a remote Subversion system were quite impressive, and the combination does provide some advantages for people wanting to develop proprietary changes locally while still being able to import upstream changes easily.
Last but not least was Brooks Davis with a short presentation on his current work to increase the number of groups a process (and thus a user) can be a member of. The historic lessons on how FreeBSD and other Unices handled this were both hilarious and sad at the same time, but with the number already increased from ~15 to 1023 in 8.0 and the plan to eventually have no limit at all, the future looks bright.
In all, a very interesting and well-attended BSD track at FOSDEM this year. Many thanks to the FOSDEM organizers for providing the room and to Marius Nünnerich for inviting the speakers. I hope to be there again next year.
I am currently playing around a little with my ZFS setup. I want to make it faster, but I do not want to spend a lot of money.
The disks are connected to an ICH 5 controller, so an obvious improvement would be either to buy a controller for the PCI slot that can do NCQ with the SATA disks (a siis(4) based one is not cheap), or to buy a new system whose chipset knows how to do NCQ (this would mean new RAM, a new CPU, a new mainboard and maybe even a new PSU). A new controller is a little expensive for the old system I want to tune. A new system would be nice, and reading the specs of new systems makes me want a Core i5 system. The problem is that I think the current mainboard offerings for this are far from good. The system should be somewhat future-proof, as I would like to use it for about 5 years or more (the current system is somewhere between 5 and 6 years old). This means it should have SATA-3 and USB 3, but when I look at what is currently offered, it looks like only beta versions of hardware with SATA-3 and USB 3 support are available on the market (according to tests, there is a lot of variance in the maximum speed the controllers can achieve, there are bugs in the BIOS, or the controllers are attached to a slow bus that prevents using the full bandwidth). So there will not be a new system soon.
As I had a 1 GB USB stick lying around, I decided to attach it to one of the EHCI USB ports and use it as a cache device for ZFS. If someone wants to try this too, be careful with the USB ports. My mainboard has only 2 USB ports connected to an EHCI, the rest are UHCI ones. This means that only 2 USB ports are fast (sort of… 40 MBit/s), the rest are only usable for slow things like a mouse, a keyboard or a serial line.
Be warned, this will not give you a lot of bandwidth (even if you have a fast USB stick, the 40 MBit/s of the EHCI is the limit that prevents a big streaming bandwidth), but the latency of the cache device is great when doing small random IO. When I run gstat and look at how long a read operation takes for each involved device, I see something between 3 msec and 20 msec for the hard disks (depending on whether they are reading something at the current head position or need to seek around a lot). For the cache device (the USB stick) I see something between about 1 msec and 5 msec. That is one third to one quarter of the latency of the hard disks.
With a “zfs send” I see about 300 IOPS per hard disk (3 disks in a RAIDZ). Obviously this is an optimal streaming case where the disks do not need to seek around a lot. You can see this in the low latency, which is about 2 msec in this case. In the random-read case, for example when you run a find, the disks cannot sustain this many IOPS, as they need to seek around. And here the USB stick shines. I’ve seen up to 1600 IOPS on it while running a find (if the corresponding data is in the cache, of course). This was with something between 0.5 and 0.8 msec of latency.
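As a rough sanity check on these numbers, latency and operations per second can be related with simple arithmetic. The sketch below is only a back-of-the-envelope estimate. It assumes that only one request is in flight per device at a time (which fits a setup without NCQ, but is still an assumption), so that a device cannot complete more than 1000 / latency_in_msec operations per second. The function name `max_iops` is made up for the example.

```python
def max_iops(latency_ms, queue_depth=1):
    """Upper bound on operations per second for a given per-operation latency.

    Assumes queue_depth requests are in flight at any time (default: one).
    """
    return queue_depth * 1000.0 / latency_ms


measurements = [
    ("USB stick at 0.5 msec", 0.5),
    ("USB stick at 0.8 msec", 0.8),
    ("hard disk at 2 msec (streaming)", 2.0),
]

for label, latency in measurements:
    print(f"{label}: at most {max_iops(latency):.0f} operations per second")
```

Under that assumption the bounds are 2000 and 1250 operations per second for the USB stick and 500 for a streaming hard disk, so the observed 1600 and 300 are at least plausible.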
This is the machine at home that takes care of my mail (incoming and outgoing SMTP, IMAP and webmail), runs a squid proxy and acts as a file server. There are not many users (just me and my wife) and there is no regular usage pattern for all those services. Because of this I did not run any benchmark to see how much time I can gain with various workloads (and I am not interested in some artificial performance numbers for my webmail session, as the browsing experience is highly subjective in this case). For this system, a 1 GB USB stick (which was just collecting dust before) seems to be a cheap way to improve the response time for frequently used small data. When I use the webmail interface now, my subjective impression is that it is faster. I am talking about listing emails (subject, date, sender, size) and displaying the content of some emails. FYI, my maildir storage has 849 MB with 35000 files in 91 folders.
Bottom line is: do not expect a lot of bandwidth increase with this, but if you have a workload which generates random read requests and you want to decrease the read latency, it could be a cheap solution to add a (big) USB stick as a cache device.
Hello Internet,
We, the FreeBSD KDE Team, are happy to let you know that KDE SC 4.4.0
was released a few minutes ago, and we're ready for a public test.
Before you ask: we don't want to put KDE 4.4.0 into the ports tree
before FreeBSD 7.3 is released.
What is new:
KDE SC 4.4.0 provides many new features designed to integrate
local and network social services, and a lot more. The official
release notes for this release can be found at
http://kde.org/announcements/4.4/
Now you can get KDE SC 4.4 with a svn checkout:
svn co http://area51.pcbsd.org/trunk/area51/PORTS
svn co http://area51.pcbsd.org/trunk/area51/KDE
svn co http://area51.pcbsd.org/trunk/area51/Tools/
Now try:
sh Tools/scripts/portsmerge
sh Tools/scripts/kdemerge
Please read /usr/ports/UPDATING-area51 carefully:
http://area51.pcbsd.org/trunk/area51/UPDATING-area51
Happy Updating!!
Firefox 3.6 was committed by beat@ last night. We're happy to have
finished everything before the ports tree goes into slush mode
to prepare packages for the FreeBSD 7.3 release. Please read
ports/UPDATING carefully. We'd like to say thanks to all helpers and
submitters, and a special big thanks to nox for his great debugging
session to fix our add-on problem.
I've been promising great 2D performance from open source graphics for years. It was reaching the point where I was feeling awfully bad about being wrong so frequently. So this summer I started playing in my free time with making a GL backend for cairo. There was a previous sort of GL backend in the form of glitz, but it made a big mistake in trying to abstract GL through a Render-like API. The problem with accelerating 2D is that Render is a bad match for hardware!
A native GL backend turned out to be shockingly easy, now that we have support for EXT_framebuffer_object all over, non-power-of-two textures, and GLSL. Here's a comparison of 3 backends, normalized to the image backend. Bigger bars mean faster.

This shows an accelerated backend beating the CPU rasterization backend on 3 tests. Note that things for the image backend are a little unfair in its favor -- we can't scan out from cached system memory buffers, so if you want to actually see the results you have to do an upload at some point, which isn't reflected in the cairo-perf-trace results. Being able to beat that with GPU rendering to something that could be scanned out is pretty awesome. But that's only 3 tests -- for most of them image is winning. I've got some ideas for hacks on the 965 driver that may fix up a bunch of those bars (it's hard to estimate, since it's all about cache effects, and fixing those has a tendency to improve by more than the amount of time spent according to sysprof).
Since comparing to image isn't too fair, and we're not using image today, I did a comparison to xlib. This looks awesome:

By replacing Xlib usage with GL, we get a speedup on almost all the testcases, and a huge speedup on one that Xlib is pathologically slow on (I haven't figured out why for xlib yet). We've got a good pass rate on the cairo test suite, so I think this stuff is ready for people to start experimenting with in apps.
There's much more to do for performance still. I've got a plan to work on the 965 driver to improve glyph-heavy tests like firefox-talos-gfx (and ETQW and WoW as well). For firefox-talos-svg, right now we're hitting aperture full because of all the spans data we're sending out before the GPU gets done with things. If we speed up the GPU rendering just a little, for example by tuning the inefficient shaders we're using right now, we can probably avoid hitting aperture full and cut CPU usage further. I think we're missing throttling for non-swapbuffers apps in DRI2, and we might actually do better and avoid aperture full if we do some appropriate throttling. And there's a lot of room for people who'd like to experiment with GL shader and state optimizations to jump in and tear this code apart.
I'd say that the Linux 2D acceleration story is starting to finally look good after all these years.