Category Archives: FreeBSD

The Ports Management Team 2012-10-19 15:28:17

The FreeBSD Ports Management team is pleased to welcome Bernhard Froelich, aka decke@, to its ranks.

Bernhard has been a long-time ports contributor, and received his ports commit bit back in March 2010.

More recently, Bernhard was the one responsible for bringing us the Redports.org shared tinderbox.

Please join me in welcoming decke@ to the team.

Thomas
on behalf of portmgr@

The Ports Management Team 2012-10-19 15:25:36

Pav Lucistnik, aka pav@, recently stepped down from his role on the FreeBSD Ports Management team.

Pav started on portmgr back in November 2006, and he was the one responsible for many of the -exp runs over the years. His most dubious claim to fame was taking over the responsibility of krismails. We all looked forward to our pavmails, right?

On behalf of the Ports Management team, we want to thank Pav for his years of service. He will be missed.
Thomas
on behalf of portmgr@

The Ports Management Team 2012-10-10 19:41:24

FreeBSD 9.1 RC2 has been publicly announced, so it is now time for the Ports Feature Freeze.

Normal upgrades, new ports, and changes that do not affect other ports will be allowed without prior approval, but with the extra

Feature safe: yes

tag in the commit message. Any commit that is sweeping (that is, one that touches a large number of ports), infrastructural changes, commits to ports with an unusually high number of dependencies, and any other commit that requires the rebuilding of many packages will not be allowed without prior explicit approval from portmgr@.

Check out http://www.freebsd.org/portmgr/implementation.html#sweeping_changes for what constitutes a sweeping change.
Thomas
on behalf of portmgr@

Power save, CABQ, multicast frames, EAPOL frames and sequence numbers (or why does my Macbook Pro keep disassociating?)

I do lots of traffic tests when I commit changes to the FreeBSD Atheros HAL or driver. And I hadn't noticed any problems until very recently (when I was doing filtered frames work.) I noticed that my macbook pro would keep disassociating after a while. I had no idea why - it would happen with or without any iperf traffic. Very, very odd.

So I went digging into it a bit further (and it took quite a few evenings to narrow down the cause.) Here's the story.

Firstly - hostapd kept kicking off my station. Ok, so I had to figure out why. It turns out that the group rekey would occasionally fail. When it's time to do a group rekey, hostapd will send a unicast EAPOL frame to each associated station with the new key and each station must send back another EAPOL frame, acknowledging the fact. This wasn't happening so hostapd would just disconnect my laptop.

Ok, so then I went digging to see why. After adding lots of debugging code I found that the EAPOL frames were actually making it to my Macbook Pro _AND_ it was ACKing them at the 802.11 layer. Ok, so the frame made it out there. But why the hell was it being dropped?

Now that I knew it was making it to the end node, I could eliminate a bunch of possibilities. What was left:


  • Sequence number is out of order;
  • CCMP IV replay counter is out of order;
  • Invalid/garbled EAPOL frame contents.
I quickly ruled out the EAPOL frame contents. The sequence number and CCMP IV were allocated correctly and in order (and never out of sequence from each other.) Good. So what was going on?

Then I realised - ok, all the traffic is in TID 16 (the non-QoS TID.) That means it isn't a QoS frame but it still has a sequence number; so it is allocated one from TID 16. There's only one CCMP IV number for a transmitter (the receiver tracks a per-TID CCMP IV replay counter, but the transmitter only has one global counter.) So that immediately rings alarm bells - what if the CCMP IV sequence number isn't being allocated in a correctly locked fashion?

Ok. So I should really fix that bug. Actually, let me go and file a bug right now.

There. Bug filed. PR 172338.

Now, why didn't this occur back in Perth? Why is it occurring here? Why doesn't it occur under high throughput iperf (150Mbps+) but does when the iperf tests are capped at 100Mbps ethernet speeds? Why doesn't it drop my FreeBSD STAs?

Right. So what else is in TID 16? Guess what I found ? All the multicast and broadcast traffic like ARPs are in TID 16.

Then I discovered what was really going on. The pieces fell into place.

  • My mac does go in and out of powersave - especially when it does a background scan.
  • When the mac is doing 150Mbps+ of test traffic, it doesn't do background scans.
  • When it's doing 100Mbps of traffic, the stack sneaks in a background scan here and there.
  • Whenever it goes into background scan, it sends a "power save" to the AP..
  • .. and the AP puts all multicast traffic into the CABQ instead of sending it to the destination hardware queue.
  • Now, when this occurred, the EAPOL frames would go into the software queue for TID 16 and the ARP/multicast/etc traffic would go into the CABQ
  • .. but the CABQ has higher priority, so it'll be transmitted just after the beacon frame goes out, before the EAPOL frames in the software queue.
Now, given the above set of conditions, the ARP/multicast traffic (which there's more of in my new place, thanks to a DSL modem that constantly scans the local DHCP range for rogue/disconnected devices) would be assigned sequence numbers AFTER the EAPOL frames that went out but are sitting in the TID 16 software queue. The Mac would receive those CABQ frames with later sequence numbers, THEN my EAPOL frame. Which would be rejected for being out of sequence.
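The ordering problem can be reproduced in a toy model. The following is a minimal sketch, not the driver's code: it assumes sequence numbers are assigned at enqueue time from one shared TID 16 counter, that the CABQ is always drained before the software queue, and that the receiver simply rejects anything not newer than the last sequence number it saw. All names (`Receiver`, `run_ap`) are made up for illustration, and 12-bit sequence number wraparound is ignored.

```python
from collections import deque


class Receiver:
    """Accepts a frame only if its sequence number is newer than the last one seen."""

    def __init__(self):
        self.last_seq = -1
        self.accepted = []
        self.dropped = []

    def receive(self, frame):
        _kind, seq = frame
        if seq > self.last_seq:
            self.last_seq = seq
            self.accepted.append(frame)
        else:
            self.dropped.append(frame)


def run_ap(power_save):
    """Queue one EAPOL frame followed by two ARPs, then transmit them."""
    next_seq = 0
    software_q = deque()  # TID 16 frames waiting for a normal hardware queue
    cabq = deque()        # multicast frames held until just after the beacon

    for kind in ["eapol", "arp", "arp"]:
        frame = (kind, next_seq)  # one shared TID 16 sequence counter
        next_seq += 1
        if kind == "arp" and power_save:
            cabq.append(frame)
        else:
            software_q.append(frame)

    receiver = Receiver()
    # The CABQ has priority, so it is drained before the software queue.
    for queue in (cabq, software_q):
        while queue:
            receiver.receive(queue.popleft())
    return receiver


station_asleep = run_ap(power_save=True)
print("dropped with power save:", station_asleep.dropped)
station_awake = run_ap(power_save=False)
print("dropped without power save:", station_awake.dropped)
```

In this model the EAPOL frame is only lost when the multicast frames queued after it overtake it through the CABQ, which matches the behaviour described above.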

The solution? Complicated.

The temporary solution? TID 16 traffic is now in a higher priority hardware queue, so it goes out first. Yes, I should mark EAPOL frames that way. I'll go through and tidy this up soon. I just needed to fix this problem before others started reporting the instability.

The real solution is complicated. It's complicated because in power save mode, there's both unicast and multicast traffic going into the same TID(s) but different hardware queues. Given this, it's quite possible that the traffic in the CABQ will burst out before the unicast packets with the same TID make it out via another hardware queue.

I'm still thinking of the best way to fix this.

Lessons learnt from fiddling with the rate control code..

(Note before I begin: a lot of these ideas about rate control are stuff I came up with before I began working at my current employer.)

Once I had implemented filtered frames and did a little digging, I found that the rate control code was doing some relatively silly things. Lots of rates were failing quite quickly and the rate control was bouncing all over the place.

The first bug I found was that I was checking the TX descriptor completion before I had copied it over - and so I was randomly failing TX when it didn't fail. Oops.

Next, don't call the rate control code on filtered frames. They've been filtered, not transmitted. My code wasn't doing that - I'm just pointing it out to anyone else who is implementing this.

Then I looked at what was going on with rate control. I noticed that whenever the higher transmission rates failed, it took a long time for the rate control code to try and sample them again. I went and did some digging - and found it was due to a coding decision I had made about 18 months ago. I treated higher rate failures with a low EWMA success rate as successive failures. The ath_rate_sample code treats "successive failures" as "don't try to probe this for ten seconds." Now, there's a few things you need to know about 802.11n:


  • The higher rates fail, often;
  • The channel state changes, often;
  • Don't be afraid to occasionally try those higher rates; it may actually work out better for you even under higher error rates.
So given that, I modified the rate control code a bit:

  • Only randomly sample a few rates lower than the current one; don't try sampling all 6, 14 or 22 rates below the high MCS rates;
  • Don't treat low EWMA as "successive failures"; just let the rate control code oscillate a bit;
  • Drop the EWMA decay time a bit to let the oscillation swing a little more.
Now the rate control code behaves much better and recovers much quicker during unstable channel conditions (eg - adrian walking around a house whilst doing iperf tests.)
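To make those changes a little more concrete, here is a minimal sketch of the two ideas: an EWMA whose weighting controls how quickly old results decay, and sampling only a handful of rates below the current one. It is not the ath_rate_sample code; the function names, the 0.75 weighting and the limit of three samples are assumptions picked for illustration.

```python
import random


def ewma_update(current, sample, weight_old=0.75):
    """Blend a new success percentage into the running average.

    A smaller weight_old makes old history decay faster, so the average
    (and therefore the rate choice) is allowed to swing more.
    """
    return weight_old * current + (1.0 - weight_old) * sample


def pick_lower_samples(rate_indexes, current_index, max_samples=3, rng=random):
    """Choose at most max_samples rates below the current one to probe."""
    lower = [r for r in rate_indexes if r < current_index]
    return sorted(rng.sample(lower, min(max_samples, len(lower))))


# A high rate that fails twice and then succeeds twice: the average dips
# and then climbs back, with no "do not probe for ten seconds" penalty.
success = 100.0
for outcome in (0.0, 0.0, 100.0, 100.0):
    success = ewma_update(success, outcome)
    print(round(success, 2))

print(pick_lower_samples(list(range(16)), current_index=12))
```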

Given this, what could I do better? I decided to start reading up on the current state of play with 802.11n aware rate control and rapidly came to the conclusion that - wow, we likely could do it better. The Linux minstrel_ht algorithm is also based on John Bicket's sample rate code, but instead of using an EWMA and minimising packet transmission time, it uses the EWMA to calculate a theoretical throughput and maximises that. So, that sounds good, right?

Except that the research shows that 802.11n channels can vary very frequently and very often, especially at the higher MCS rates. The higher MCS rates can become better and worse within a window of a second or two. So, do you want to try and squeeze the last bit of throughput out of that, or not?

Secondly, using "throughput" as a metric is fine if your air time is .. well, cheap. But what if you have many, many clients on an AP? Your choice of maximising throughput based on what the error rate predicts your data throughput is doesn't take airtime into account. In fact, if you choose a higher MCS rate with a higher error rate but higher throughput, you may actually be wasting more air with those retransmissions. Great for a single station, but perhaps not so great when you have lots.

So what's the solution? The open source rate control stuff doesn't take the idea of "air utilisation" into account. There's enough data available to create an air time model, but no-one is using it yet. Patches are gratefully accepted. :-)

Finally, the current packet scheduler is pretty simple and stupid (and does break in a lot of scenarios, sigh.) It's just a FIFO, servicing nodes/TIDs with traffic in said FIFO mechanism. But that's not very fair - both from a "who is next" standpoint and "what's the most efficient use of the air" view. In addition, the decision about which node/TID to schedule next is done totally separate to the rate control decision. Rate control occurs rather late in the packet transmission process (ie, once we've committed to queuing it to the hardware.) Wouldn't it be better to have the packet scheduler and rate control code joined at the hip, where the scheduler doesn't aggressively schedule traffic to a slow/lossy end node?

Lots of things to think about..

TeXLive 2012 on FreeBSD

Two months ago (August 2nd, 2012), I updated the freebsd-texlive project to provide TeXLive 2012 to the FreeBSD community.

More recently (October 1st, 2012), I made major changes in the freebsd-texlive project to fix the long-standing problem of TeXLive shipping distfiles with no version in the filename and replacing them in place. After setting up a new mirror with renamed distfiles (sponsored by NFrance, a French hosting company using the FreeBSD operating system; many thanks to them for providing hosting and bandwidth!), updating the program that creates and updates ports, updating the update tools, switching to the new mirror and updating all ~2250 ports, all the caveats around this choice from the maintainers of the TeXLive distribution should be gone!

This change will however not be without consequences for users. Distfile version numbers were created by using the date at which a file appeared or was updated on the upstream mirrors, but because of the mirroring delays, the file mtime could be earlier by more than one day. As a consequence, a file updated on 17th May, 2012 could have version number 20120518 (getting the real package version number in the distfile is a PITA). Because having the version in the distfile filename requires consistent dates, the new ports use the date from the upstream distfile mtime, and many ports had their version going backwards… PORTEPOCH has therefore been bumped, and port management tools will want to update all these ports, which are in fact unchanged.

It's the first time such a bad situation has happened in the freebsd-texlive repository. I hope this will be the last, and I apologise to users who will be worried by this massive update.
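As a rough illustration of why the epoch bump was needed, the sketch below builds a date-based version from two different dates and compares them the way a package tool might, with PORTEPOCH taking precedence over the version itself. The helper names are hypothetical and the comparison is deliberately simplified; real FreeBSD version comparison handles many more cases.

```python
from datetime import date


def distfile_version(day):
    """Render a date as the YYYYMMDD string used as the distfile version."""
    return day.strftime("%Y%m%d")


def sort_key(portepoch, version):
    """Simplified ordering: PORTEPOCH is compared first, then the version."""
    return (portepoch, int(version))


# Old scheme: date the file was noticed on the mirror.
mirror_based = distfile_version(date(2012, 5, 18))
# New scheme: mtime of the upstream distfile, one day earlier.
mtime_based = distfile_version(date(2012, 5, 17))

looks_like_downgrade = sort_key(0, mtime_based) < sort_key(0, mirror_based)
fixed_by_epoch = sort_key(1, mtime_based) > sort_key(0, mirror_based)

print(mirror_based, mtime_based, looks_like_downgrade, fixed_by_epoch)
```

With the epoch bumped, the unchanged port sorts as newer, which is also why management tools offer to update it.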

Happy TeXing!

Filtered frames support, or how not to spam the air with useless transmission attempts

I haven't updated this blog in a while - but that doesn't mean I haven't been a busy little beaver in the world of FreeBSD Atheros support.

A small corner part of correct and efficient software retransmission handling is what Atheros hardware calls "filtered frames", or what actually happens - "how to make sure the hardware doesn't just spam the air with transmission attempts to a dead remote node."

So here's the run-down.

You feed the Atheros hardware a set of frames to transmit. There's a linked list (or FIFO for the more recent hardware) of TX descriptors which represent the fragments of each frame you're transmitting. The hardware will attempt each one in turn and then return a TX completion status explaining what happened. For frames without ACKs the TX status is "did I get a chance to squeeze this out into the air" - there's no ACK, so there's no way to know if it made it out there. But for frames with ACKs, there's a response from the remote end which the transmitter uses, and it'll attempt retransmission in hardware before either succeeding or giving up. Either way, it tells you what happened.

The reality is a little more complicated - there's multiple TX queues with varying TX priorities (implementing the 802.11 "WME" QoS mechanism.) There's all kinds of stuff going on behind the scenes to figure out which queue wins arbitration, gets access to the air to transmit, etc, etc. For this particular discussion it doesn't matter.

Now, say you then queue 10 frames to a remote node. The hardware will walk the TX queue (or queues) and transmit those frames one at a time. Some may fail, some may not. If you don't care about software retransmission then you're fine.

But say you want to handle software retransmission. Say that retransmission is for legacy, non-aggregation / non-blockack sessions. You transmit one frame at a time. Each traffic identifier (TID) has its own sequence number space, as well as the "Non-QoS" traffic identifier (ie, non-QoS traffic that does have a sequence number.) By design, each frame is transmitted in order, with incrementing sequence numbers. The receiver throws away frames that are out of sequence. That way packets are delivered in order. They may not be reliably received, but the sequence number ordering is enforced.

So, you now want to handle software retransmission of some frames. You get some frames that are transmitted and some frames that weren't. Can you retransmit the failed ones? Well, the answer is "sure", but there are implications. Specifically, those frames may now be out of sequence, so when you retransmit them the receiver will simply drop them. You could choose to reassign them new sequence numbers so the receiver doesn't reject them out of hand, but now the receiver is seeing out-of-order frames. It doesn't see out of sequence frames, but the underlying payloads are all jumbled. This makes various protocols (like TCP) very unhappy. If you're running some older PPTP VPN sessions, the end point may simply drop your now out-of-order frames. So it's very important that you actually maintain the order of frames to a station.

Given that, how can you actually handle software retransmission? The key lies in this idea of "filtered frames." The Atheros MAC has what it calls a "keycache", where it stuffs encryption keys for each destination node (and multicast keys, and WEP keys..) So upon reception of a frame, it'll look in the keycache for that particular station MAC, then attempt to decrypt the data with that key. Transmission is similar - there's keycache entries for each destination station and the TX descriptor has a TX Key ID.

Now, I don't know if the filtered frame bit is stored in the keycache entry or elsewhere (I should check, honest) but I'm pretty sure it is.

So the MAC has a bit for each station in the keycache (I think!) and when a TX fails, it sets that bit. That bit controls whether any further TX attempts to that station will actually occur. So when the first frame TX to that station fails, any subsequent attempts are automatically failed by the MAC, with the TX status "TX destination filtered." Thus, anything already in the hardware TX queue(s) for that destination get automatically filtered for you. The software can then grab those frames in the order you tried them (which is in sequence number order, right?) and since _all_ frames are filtered, you don't have to worry about some intermediary frames being transmitted. You just queue them in a software queue, wait until the node is alive again, and then set the "clear destination mask (CLRDMASK)" bit in the next TX attempt.
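The behaviour described above can be modelled in a few lines. This is a toy sketch rather than anything resembling the real hardware interface: `FilteringMac`, `drain` and `retransmit` are invented names, hardware retries are collapsed into a single attempt, and the per-destination filter bit is just a Python set.

```python
class FilteringMac:
    """Toy MAC: the first failed TX to a destination filters the rest."""

    def __init__(self, alive):
        self.alive = set(alive)  # destinations that currently ACK frames
        self.filtered = set()    # destinations with the filter bit set

    def transmit(self, frame, clrdmask=False):
        dest, _seq = frame
        if clrdmask:
            self.filtered.discard(dest)  # "clear destination mask"
        if dest in self.filtered:
            return "filtered"            # never touches the air
        if dest in self.alive:
            return "ok"
        self.filtered.add(dest)          # first failure sets the bit
        return "failed"


def drain(mac, hw_queue):
    """Run the hardware queue; return statuses and frames to retry, in order."""
    statuses, retry = [], []
    for frame in hw_queue:
        status = mac.transmit(frame)
        statuses.append(status)
        if status != "ok":
            retry.append(frame)
    return statuses, retry


def retransmit(mac, frames):
    """Retry frames for one destination, clearing the mask on the first."""
    return [mac.transmit(f, clrdmask=(i == 0)) for i, f in enumerate(frames)]


mac = FilteringMac(alive={"B"})
statuses, retry = drain(mac, [("A", 0), ("B", 0), ("A", 1), ("A", 2)])
print(statuses)  # A fails once, B is unaffected, later A frames are filtered
print(retry)     # frames for A come back in their original order

mac.alive.add("A")  # the node is reachable again
print(retransmit(mac, retry))
```

Note that only the first frame to the dead node costs any airtime in this model; the rest are handed back untouched and in sequence order.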

This gives you three main benefits:
  • All the frames are filtered at the point the first fails, so you get all the subsequent attempted frames back, in the order you queued them. This makes life much easier when retransmitting them;
  • The MAC now doesn't waste lots of time trying to transmit to destinations that aren't available. So if you have a bunch of UDP traffic being pushed to a dead or far away node, the airtime won't be spent trying to transmit all those frames. The first failure will filter the rest, freeing up the TX queue (and air) to transmit frames to other destinations;
  • When stations go into power save mode, you may have frames already in the hardware queue for said station. You can't cancel them (easily/cleanly/fast), so instead they'll just fail to transmit (as the receiver is asleep.) Now you just get them filtered; you store them away until the station wakes up and then you retransmit them. A little extra latency (which is ok for some things and not others!) but much, much lower packet loss.
This is all nice and easy. However, there are a few gotchas here.

Firstly, it filters all frames to that destination. For all TIDs, QoS or otherwise. That's not a huge deal; if however you're me and you have per-TID queues you need to requeue the frames into the correct queues to retry. No biggie.

Secondly, if a station is just far away or under interference, you'll end up filtering a lot of traffic to it. So a lot of frames will cycle through the filtered frames handling code. Right now in FreeBSD I'm treating them the same as normal software retransmissions and dropping them after 10 attempts. I have a feeling I need to fix that logic a bit as under heavy dropping conditions, the traffic is being prematurely filtered and prematurely dropped (especially when the node is going off-channel to do things like background scans.) So the retransmission and frame expiry is tricky. You can't just keep trying them forever as you'll just end up wasting TX air time and CPU time transmitting lots of frames that will just end up being immediately filtered. Yes, tricky.

Thirdly, there may be traffic that needs to go out to that node that can't be filtered. If that's the case, you may actually end up with some subsequent frames being transmitted. You then can't just naively requeue all of your failed/filtered frames or you'll transmit some frames out of sequence. You then _have_ to drop anything with a sequence number or that was queued _before_ the successfully transmitted frame.
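A sketch of that pruning step, under stated assumptions: sequence numbers are 12 bits wide (so they wrap at 4096), and "before" is decided with the usual half-window comparison. The function names are hypothetical and frames are represented only by their sequence numbers.

```python
SEQ_MODULO = 4096  # 802.11 sequence numbers are 12 bits wide


def seq_is_after(seq, reference):
    """True if seq is ahead of reference, allowing for wraparound."""
    distance = (seq - reference) % SEQ_MODULO
    return 0 < distance < SEQ_MODULO // 2


def frames_safe_to_retry(filtered_seqs, last_delivered_seq):
    """Keep only frames the receiver would still accept as in-sequence."""
    return [s for s in filtered_seqs if seq_is_after(s, last_delivered_seq)]


# Frame 12 slipped out successfully while 10, 11, 13 and 14 were filtered.
print(frames_safe_to_retry([10, 11, 13, 14], last_delivered_seq=12))
```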

Luckily for 802.11n aggregation sessions the third point isn't a big deal - you already can transmit out of sequence (just within the block-ack window or BAW), so you can just retransmit filtered frames with sequence numbers in any particular sequence. Life is good there.
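The window test itself is small. A minimal sketch, assuming 12-bit sequence numbers and a window of 64 frames (the function name and default are illustrative, not taken from the driver):

```python
SEQ_MODULO = 4096  # 12-bit 802.11 sequence number space


def in_baw(seq, window_start, window_size=64):
    """True if seq falls inside the block-ack window, allowing for wraparound."""
    return (seq - window_start) % SEQ_MODULO < window_size


print(in_baw(10, 5))     # inside the window
print(in_baw(4, 5))      # just behind the window start
print(in_baw(2, 4090))   # inside, across the 4095 -> 0 wrap
```

Any filtered frame that passes this check can be retransmitted in whatever order is convenient, since the receiver reorders within the window.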

So what have I done in FreeBSD?

I have filtered frames support working for 802.11n aggregation sessions. I haven't yet implemented software retransmission (and all of the hairy logic to handle point #3 above) for non-aggregate traffic, so I can't do filtered frames there. But I'm going to have to do it in order to nicely support AP power save operation.

Droso » FreeBSD 2012-09-26 07:30:33

With only three weeks to go, we so far have 7 people registered for the Ports and Packages Summit during the DevSummit at EuroBSDCon in Warsaw.
I’m sure that can’t be right. If you intend to come, please register (by sending an email to me) as soon as possible. If you don’t intend to come, please reconsider.

So far we have 4 main topics to discuss in two 1.5-hour slots:
- Status of the move to subversion
- Status of the implementation and uptake of the new package tools
- Status and proposed schedule for scheduled releases of binary packages
- Quality assurance in all shapes and forms: QAT, redports, pointyhat

Please send any topics you’d like to discuss, presentations to present, and other items that should go on the schedule to me in the next week or two so I can prepare a draft agenda at least a week before.

Thank you and see you there!

New release: ELF Toolchain v0.6.1

I am pleased to announce the availability of version 0.6.1 of the software being developed by the ElfToolChain project.

This new release supports additional operating systems (DragonFly BSD, Minix and OpenBSD), in addition to many bug fixes and documentation improvements.

This release also marks the start of a new "stable" branch, for the convenience of downstream projects interested in using our code.

Comments welcome.

The Ports Management Team 2012-09-17 16:21:45

It was recently posted at http://blogs.freebsdish.org/portmgr/2012/09/01/change-to-the-header-in-ports-makefiles/ that we would adopt a new header for the ports Makefiles. The initial discussion seemed to show enough support for the idea of completely stripping the header, leaving only the $FreeBSD$ tag. After the announcement was made, more people stated strong feelings that when and where possible attribution be maintained in the header.

A private discussion was held among ports committers, and while opinions were as varied as the individuals who shared them, it was decided to unify on a two line header.

# Created by: J.Q. Public <[email protected]>
# $FreeBSD$

The Whom line from the classic six line header becomes Created By.

Sometimes, as a result of a repocopy or changed maintainership, the Created By and MAINTAINER lines are no longer in synchronisation. To avoid confusion, the first line can be removed, optionally leaving us with a one line header.

# $FreeBSD$

Removing the line of attribution is to be done only at the consent/request of the original contributor.
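For anyone converting a header by hand during a regular update, the transformation is mechanical. The sketch below is only an illustration: the layout of the classic six line header in `SAMPLE` is reproduced from memory, the name and address in it are made up, and `convert_header` is not an official tool.

```python
import re

# Assumed layout of the classic six line header, with a made-up author.
SAMPLE = (
    "# New ports collection makefile for:\tfoo\n"
    "# Date created:\t\t1 January 2000\n"
    "# Whom:\t\t\tJ.Q. Public <jq@example.org>\n"
    "#\n"
    "# $FreeBSD$\n"
    "#\n"
    "\n"
    "PORTNAME=\tfoo\n"
)


def convert_header(makefile_text, keep_attribution=True):
    """Replace the leading comment block with the one or two line header."""
    lines = makefile_text.splitlines()
    end = 0
    while end < len(lines) and lines[end].startswith("#"):
        end += 1
    header, body = lines[:end], lines[end:]

    whom = None
    id_line = "# $FreeBSD$"
    for line in header:
        match = re.match(r"#\s*Whom:\s*(.+?)\s*$", line)
        if match:
            whom = match.group(1)
        if "$FreeBSD" in line:
            id_line = line  # keep the tag exactly as the repository has it

    new_header = []
    if keep_attribution and whom:
        new_header.append("# Created by: " + whom)
    new_header.append(id_line)
    return "\n".join(new_header + body) + "\n"


print(convert_header(SAMPLE))
```

Passing `keep_attribution=False` gives the one line form, which per the rule above should only be used with the original contributor's consent.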

As before, we ask this header only be updated in conjunction with a regular update, as we do not want any unnecessary churn to the repo prior to the pending Ports Feature Freeze.

Thomas
on behalf of portmgr@

The Ports Management Team 2012-09-09 03:58:43

The development of FreeBSD ports is done in Subversion nowadays. For the sake of compatibility, a Subversion to CVS exporter is in place, which has some limitations. For CVSup mirroring, a cvsup based on Ezm3 is used, which breaks regularly (especially on amd64 and with Clang) and is becoming more and more unmaintainable.

Read more at http://lists.freebsd.org/pipermail/freebsd-ports/2012-September/078099.html

The Ports Management Team 2012-09-01 06:41:32

An idea has been floating around for some time, and it was brought up again on the ports@ mailing list recently: remove the extraneous header information from the Makefile, leaving only the $FreeBSD$ id on the first line.

It is an idea that is long overdue, so from now on, the other five lines shall be removed.

We do request that this be done sparingly in the short term, as we do not want to cause any additional churn on the repo as we approach our upcoming Ports Feature Freeze, still tentatively scheduled for September 7.

So please proceed only on existing updates. Please do not do any sweeping commits until we have the ports tree stabilised post 9.1 tagging. Also bear in mind that Redports/QAT queues a job for every change done to a Makefile, and we do not want to overburden the QAT at this time. It is important to allow this service to run at peak efficiency to ensure its full potential as we approach the upcoming Feature Freeze.

The new look of the Makefile has been documented in the Porter’s Handbook.

The next item on the todo list is to update devel/newfile for those that do a port create.

Thomas
on behalf of portmgr@

Using pkgng in real life

I have been using pkgng on a few machines now and I'd like to share some thoughts about how it behaves in real-world situations. Overall, I'm very happy with it and it's immensely better than what we had before. There are some rough edges which need to be solved but those are mostly a property of the ports system itself rather than pkgng.


The Ports Management Team 2012-08-23 16:26:19

Florent Thoumie, aka flz@, recently stepped down from his role on the FreeBSD Ports Management team.

Florent started on portmgr back in August 2008, being instrumental in maintaining the legacy pkg_* code plus other aspects of the ports infrastructure, including but not limited to the unifying of the code base for the ports build system.

On behalf of the Ports Management team, we want to thank Florent for his years of service. He will be missed.

Thomas
on behalf of portmgr@

PmcTools: Motivation and Future Steps

This (long delayed) post describes the original motivation for my PmcTools whole-system profiling toolkit, and touches on some of the possible next steps for the project.

Motivation

Around the year 2000, I happened to read a paper titled "Continuous Profiling: Where Have All the Cycles Gone?". The techniques described by the paper were inspiring, but the DEC Alpha™ systems they were implemented on were out of reach for a hobbyist living in India. A FreeBSD™-equivalent of those tools and techniques, running on affordable hardware, seemed a good idea.

From the outset, my goal was to create a programming toolkit for using in-CPU performance counters:

  • I wanted an API that would permit tools to fully use the features provided by the hardware.
  • I wanted tools that had low overheads, in order not to disturb the behaviour of the system being measured.
  • I wanted tools that were non-disruptive—usable without needing to restart running processes, rebooting the system or requiring recompilation, etc.
  • I wanted to analyse the "whole system" at once; i.e., to simultaneously analyse userspace applications, the top-half of the kernel, and the kernel's interrupt handlers.
  • I wanted the toolset to be SMP-ready, since SMP seemed affordable in the future.

When affordable systems using AMD Athlon™ CPUs (with publicly documented in-CPU performance counters) entered the Indian market in early 2003, I built myself a machine, and started on the project.

(Pre-)History

  • 2003: Initial work, which was managed using homebrew tools tuned for dialup speeds (shell scripts running RCS layered over CVS/CVSup).
  • 2004: With the arrival of broadband access, development moved to FreeBSD's Perforce™ server.
  • 2005: The first check-in into the FreeBSD source tree in April 2005.

Current Status

At the time of writing, PmcTools is being actively maintained and extended by the FreeBSD community.

The design of the toolkit is briefly described in a tech talk presented at ACM Bangalore in 2009 (slides).

Future Steps

Platforms, simplicity and portability are likely to be the focus of future work.

Platforms
PmcTools would be useful on popular hobby platforms such as the BeagleBoard. PmcTools already supports 'remote' data collection on embedded systems. However, the specific PMCs on these systems would need to be supported.
Simplicity
Based on the experience gained so far, both the programming APIs and the implementation of PmcTools could be simplified without losing useful functionality.
Portability
PmcTools would be a useful addition to other open-source operating systems.

In addition to the above, many innovative tools can be created: in the paper "Exploiting hardware performance counters with flow and context sensitive profiling", the authors show how to add PMC-based instrumentation to program binaries for fine-grained analyses. To be able to add such instrumentation, we need tools to parse and modify binary instruction streams—one of the motivations for the proposed libmc library, part of the Elftoolchain project.

Comments welcome.

The Ports Management Team 2012-08-02 06:18:53

After almost seven years, I figured it was about time for a new GPG key for the portmgr-secretary.

The key id is BBC4D7D5, the fingerprint is FB37 45C8 6F15 E8ED AC81 32FC D829 4EC3 BBC4 D7D5, and the public key is viewable at http://people.freebsd.org/~portmgr/portmgr-secretary.asc.

I have signed it with the old secretary key, as well as my own personal key.

Please update your keyring with the new info.

Thomas

Evilcoder » FreeBSD 2012-08-01 20:34:30

Dear readers of my blog,

I have a “simple” question for you. I would like to do the following; can someone who reads this and has suggestions and ideas respond to me at [email protected]?

I have three mail relays, and I would like to stop mail that shouldn’t get in at the border relays. For this I have set up LDAP so that all three relays can query this LDAP server. To fill the LDAP I use the Virtualmin application to make this as automatic as possible.

Currently the Virtual-addresses and Aliases are all in LDAP, as well as the useraccounts that receive email. No specific tag is added for local users.

I would like to have the relays do the following:

- Receive mail from XXX
- do RBL checks
- do postscreen checks and the like
- resolve the destination address (expand alias or virtual account)
if the resolved destination address lives outside of my domain (mailforwarding accounts) I would like to deliver it there immediately.
- check whether the resolved destination address is listed as local user and send it to the internal mailserver
(The internal mailserver will receive mail for local-user and only has to do spam checks for this user, no need to expand aliases etc).
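
To pin down the decision flow being asked for, here is a small sketch of the routing logic only. It is not mail server configuration: the alias table and local user set are plain Python stand-ins for the LDAP lookups, the RBL and postscreen steps are assumed to have run already, and every name and address in it is hypothetical.

```python
def route(recipient, aliases, local_users, my_domain, max_hops=10):
    """Decide what a border relay should do with one recipient address."""
    if recipient.rsplit("@", 1)[-1] != my_domain:
        return ("reject", recipient)  # not ours: never act as an open relay

    address = recipient
    for _ in range(max_hops):
        target = aliases.get(address)  # stand-in for the LDAP alias lookup
        if target is None:
            break
        address = target
    else:
        return ("reject", recipient)   # alias chain too long or looping

    if address.rsplit("@", 1)[-1] != my_domain:
        return ("forward", address)    # mail forwarding account
    if address in local_users:
        return ("internal", address)   # hand to the internal mailserver
    return ("reject", address)         # unknown user: stop at the border


ALIASES = {
    "sales@example.org": "alice@example.org",
    "fwd@example.org": "bob@elsewhere.example",
}
LOCAL_USERS = {"alice@example.org"}

for rcpt in ("sales@example.org", "fwd@example.org", "nobody@example.org"):
    print(rcpt, "->", route(rcpt, ALIASES, LOCAL_USERS, "example.org"))
```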

Suggestions are welcome :)

A build system for the Elftoolchain project

Summary

This post describes the cross-platform build and test system being written for the Elftoolchain project.

The need for a cross-platform build and test system


In the upcoming v1.0 release of the Elftoolchain project we plan to support 6 operating system families---prominent *BSD OSes such as FreeBSD and NetBSD, Ubuntu GNU/Linux and Minix. We may additionally support an Illumos-derived operating system.

For each supported OS, we may need to test our sources on multiple released OS versions. We may need to build and test on multiple architectures (e.g., ARM, MIPS, i386 and X64/AMD64), depending on the target operating system.

Apart from mainline development, we would need to support maintenance branches of our source tree on these OS instances.

Clearly, some kind of automation is needed to help manage this combinatorial increase in support load.

Design goals


In brief, the design goals are:
  • The system should support builds of our source tree, on the target operating systems and machine architectures of our interest.
  • The system should support builds on non-native architectures (relative to the build host).
  • The system should allow a source tree that is in-development to be built and tested, prior to a check-in.
  • The system should be deployable with the minimum of software dependencies, and should be easy to configure.
  • The system should be able to run entirely on a relatively power and resource constrained system such as a laptop, i.e., without needing a beefy build box, or architecture-specific hardware.

Related Projects

Continuous integration systems such as Buildbot, Bitten and Hudson are commonly used to manage automated builds. While these tools are featureful, their large resource requirements, and the additional dependencies needed to run them (a Java/Python runtime, along with other dependencies) make these tools difficult to use in the Elftoolchain context.

The QEMU and VirtualBox programs are popular machine emulators. When running on X86/X64 hardware, these programs support the emulation of i386 and x86_64/amd64 CPUs. Additionally, QEMU can emulate non-native architectures using dynamic translation techniques. The GXemul project is a BSD-licensed machine emulator, similar to QEMU.

The Design


In the current design, the build system comprises two major parts:

  • A simple daemon---a portable C program built using libevent, that runs inside the target OS in the machine emulator. This 'slave' component connects to a 'master/despatcher' component that runs on the build host and executes commands issued to it.
  • A 'master' component that is responsible for managing the build process at the top-level: starting up the relevant machine emulators, waiting for the OS inside to boot and for the 'slave' inside to connect back, transferring the source tree of interest into the slave, running the build/test cycle, collecting output files and output status, and shutting down the emulator cleanly.

Note that the actual build (& test) within a source tree is controlled using BSD make.

The protocol between the 'slave' and the 'master' components is spartan: it supports the execution of arbitrary shell script fragments on the slave with redirection of input and output, and supports simple data transfer between the 'slave' and the 'master'. In order to maintain responsiveness, the protocol between the 'slave' and the 'master' is asynchronous. Multiple 'slaves' could be connected to the 'master' concurrently.
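
The asynchronous, multi-slave shape of such a protocol can be sketched with in-process stand-ins. This is purely illustrative and is not the Elftoolchain implementation (which is a C program using libevent): the message tuples, the `fake_slave` coroutine and the request numbering are all assumptions, and no script is actually executed.

```python
import asyncio


async def fake_slave(name, delay, requests, replies):
    """Stand-in for the in-emulator daemon: pretends to run each script."""
    while True:
        request = await requests.get()
        if request is None:  # shutdown marker from the master
            return
        request_id, script = request
        await asyncio.sleep(delay)  # simulated build time
        await replies.put((name, request_id, 0, "ran: " + script))


async def master(slave_delays, jobs):
    """Send every job to its slave, then collect replies as they arrive."""
    replies = asyncio.Queue()
    inboxes = {name: asyncio.Queue() for name in slave_delays}
    tasks = [
        asyncio.create_task(fake_slave(name, delay, inboxes[name], replies))
        for name, delay in slave_delays.items()
    ]

    for request_id, (name, script) in enumerate(jobs):
        await inboxes[name].put((request_id, script))

    results = {}
    for _ in jobs:  # replies may arrive in any order; match them by id
        name, request_id, status, output = await replies.get()
        results[request_id] = (name, status, output)

    for inbox in inboxes.values():
        await inbox.put(None)
    await asyncio.gather(*tasks)
    return results


results = asyncio.run(
    master({"freebsd": 0.02, "netbsd": 0.0},
           [("freebsd", "make"), ("netbsd", "make test")])
)
print(results)
```

Tagging each request with an id is what lets the master stay responsive: it never has to wait for one slow emulator before talking to another.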

The 'master' would be controlled by a set of configuration files and shell scripts.

The Implementation

The implementation (a work-in-progress) may be found in the tools/build-automation directory of the Elftoolchain project's source tree.

The implementation is being written as a literate program.

Comments welcome.

pkgng – best thing since sliced bread!

FreeBSD (and BSDs in general) traditionally have source-based upgrades and installs, which extends to the third party software collections - ports or pkgsrc and similar. This is all fine and offers unprecedented flexibility when tailoring a system to specific needs, but sometimes this flexibility is less important than ease of use or time savings which can only be achieved with binary packages. Enter pkgng, the next-generation binary package management system by Baptiste Daroussin and others, which replaces the old-style ports and packages system.


FreeBSD 9.1 ports feature freeze

The FreeBSD 9.1 schedule has been published at http://www.freebsd.org/releases/9.1R/schedule.html. Historically we have done a Feature Freeze at RC1; we are going to try to do it with RC2 this time, tentatively scheduled for August 3, subject to schedule slippage.

At the time the Release Engineering team announces RC2 is ready, we will enforce “Feature Safe” commits only. This means no sweeping changes will be allowed; see http://www.freebsd.org/portmgr/implementation.html#sweeping_changes

Once portmgr@ is satisfied that the requisite packages are built to ship with FreeBSD 9.1, the ports tree will be re-opened for business.

Thomas
on behalf of portmgr