Using the xdev target with qemu-user-static on #FreeBSD

I’ve been playing with building ports for ARM on an AMD64 machine via a bunch of tools.  The duct tape and bailing wire is a bit thick with this method, but if you keep at it, this should work.

1. build armv6 chroot:
make buildworld TARGET=arm TARGET_ARCH=armv6
make installworld TARGET=arm TARGET_ARCH=armv6 DESTDIR=/armv6
make distribution TARGET=arm TARGET_ARCH=armv6 DESTDIR=/armv6

2. build xdev
make xdev TARGET=arm TARGET_ARCH=armv6 NOSHARED=y

3. move xdev into chroot
mv /usr/armv6-freebsd /armv6/usr/

4. add toolchain to make.conf:
CFLAGS+=-integrated-as
CC=/usr/armv6-freebsd/usr/bin/cc
CPP=/usr/armv6-freebsd/usr/bin/cpp
CXX=/usr/armv6-freebsd/usr/bin/c++
AS=/usr/armv6-freebsd/usr/bin/as
NM=/usr/armv6-freebsd/usr/bin/nm
LD=/usr/armv6-freebsd/usr/bin/ld
OBJCOPY=/usr/armv6-freebsd/usr/bin/objcopy
SIZE=/usr/armv6-freebsd/usr/bin/size
STRIPBIN=/usr/armv6-freebsd/usr/bin/strip
5. Install qemu-static-user from ports and copy into jail:
pkg instlal qemu-static-user
mkdir -p /armv6/usr/local/bin
cp /usr/local/bin/qemu-arm /armv6/usr/local/bin/

6. setup binmiscctl to handle armv6 translations:
binmiscctl add armv6 –interpreter “/usr/local/bin/qemu-arm” –magic “x7fx45x4cx46x01x01x01x00x00x00x00x00x00x00x00x00x02x00x28x00″ –mask “xffxffxffxffxffxffxffx00xffxffxffxffxffxffxffxffxfexffxffxff” –size 20 –set-enabled

7. mount devfs and ports if needed
mount -t devfs devfs /armv6/dev
mount -t nullfs /usr/ports /armv6/usr/ports

8. chroot
chroot /armv6

Foundation is Accepting Travel Grant Applications for EuroBSDCon 2014

Calling all FreeBSD developers needing assistance with travel expenses to EuroBSDCon 2014.

The FreeBSD Foundation will be providing a limited number of travel grants to individuals requesting assistance. Please fill out and submit  the Travel Grant Request Application at http://www.freebsdfoundation.org/documents/TravelRequestForm.pdf by August 15th, 2014 to apply for this grant.

This program is open to FreeBSD developers of all sorts (kernel hackers,  documentation authors, bugbusters, system administrators, etc).  In some cases we are also able to fund non-developers, such as active community members and FreeBSD advocates.

If you are a speaker at the conference, we expect the conference to cover your travel costs, and will most likely not approve your direct request to us.

There's some flexibility in the mechanism, so talk to us if something about the model doesn't quite work for you or if you have any questions. The travel grant program is one of the most effective ways we can spend money to help support the FreeBSD Project, as it helps developers get together in the same place at the same time, and helps advertise and advocate FreeBSD in the larger community.

bsdtalk243 – mandoc with Ingo Schwarze

Interview about mandoc with Ingo Schwarze.  The project webpage describes mandoc as "a suite of tools compiling mdoc, the roff macro language of choice for BSD manual pages, and man, the predominant historical language for UNIX manuals."

Recorded at BSDCan 2014.

File Info: 16Min, 8MB.

Ogg Link: http://cis01.uma.edu/~wbackman/bsdtalk/bsdtalk243.ogg

pkg 1.3.0 out!

Hi all,

I’m very please to announce the release of pkg 1.3.0
This version is the result of almost 9 month of hard work

Here are the statistics for the version:
- 373 files changed, 66973 insertions(+), 38512 deletions(-)
- 29 different contributors

Please not that for the first time I’m not the main contributor, and I would
like to particularly thanks Vsevold Stakhov for all the hard work he has done to
allow us to get this release out. I would like also to give a special thanks to
Andrej Zverev for the tons of hours spending on testing and cleaning the bug
tracker!

So much has happened that it is hard to summarize so I’ll try to highlight the
major points:
- New solver, now pkg has a real SAT solver able to automatically handle
conflicts and dynamically discover them. (yes pkg set -o is deprecated now)
- pkg install now able to install local files as well and resolve their
dependencies from the remote repositories
- Lots of parts of the code has been sandboxed
- Lots of rework to improve portability
- Package installation process has been reworked to be safer and handle properly
the schg flags
- Important modification of the locking system for finer grain locks
- Massive usage of libucl
- Simplification of the API
- Lots of improvements on the UI to provide a better user experience.
- Lots of improvements in multi repository mode
- pkg audit code has been moved into the library
- pkg -o A=B that will overwrite configuration file from cli
- The ui now support long options
- The unicity of a package is not anymore origin
- Tons of bug fixes
- Tons of behaviours fixes
- Way more!

Thank you to all contributors:
Alberto Villa, Alexandre Perrin, Andrej Zverev, Antoine Brodin, Brad Davis,
Bryan Drewery, Dag-Erling Smørgrav, Dmitry Marakasov, Elvira Khabirova, Jamie
Landeg Jones, Jilles Tjoelker, John Marino, Julien Laffaye, Mathieu Arnold,
Matthew Seaman, Maximilian Gaß, Michael Gehring, Michael Gmelin, Nicolas Szalay,
Rodrigo Osorio, Roman Naumann, Rui Paulo, Sean Channel, Stanislav E. Putrya,
Vsevolod Stakhov, Xin Li, coctic

Regards,
Bapt on behalf of the pkg@

vc4 driver month 1

I've just pushed the vc4-sim-validate branch to my Mesa tree. It's the culmination of the last week's worth pondering and false starts since I got my first texture sampling in simulation last Wednesday.

Handling texturing on vc4 safely is a pain. The pointer to texture contents doesn't appear in the normal command stream, and instead it's in the uniform stream. Which uniform happens to contain the pointer depends on how many uniforms have been loaded by the time you get to the QPU_W_TMU[01]_[STRB] writes. Since there's no iommu, I can't trust userspace to tell me where the uniform is, otherwise I'd be allowing them to just lie and put in physical addresses and read arbitrary system memory.

This meant I had to write a shader parser for the kernel, have that spit out a collection of references to texture samples, switch the uniform data from living in BOs in the user -> kernel ABI and instead be passed in as normal system memory that gets copied to the temporary exec bo, and then do relocations on that.

Instead of trying to write this in the kernel, with a ~10 minute turnaround time per test run, I copied my kernel code into Mesa with a little bit of wrapper code to give a kernel-like API environment, and did my development on that. When I'm looking at possibly 100s of iterations to get all the validation code working, it was well worth the day spent to build that infrastructure so that I could get my testing turnaround time down to about 15 sec.

I haven't done actual validation to make sure that the texture samples don't access outside of the bounds of the texture yet (though I at least have the infrastructure necessary now), just like I haven't done that validation for so many other pointers (vertex fetch, tile load/stores, etc.). I also need to copy the code back out to the kernel driver, and it really deserves some cleanups to add sanity to the many different addresses involved (unvalidated vaddr, validated vaddr, and validated paddr of the data for each of render, bin, shader recs, uniforms). But hopefully once I do that, I can soon start bringing up glamor on the Pi (though I've got some major issue with tile allocation BO memory management before anything's stable on the Pi).

Application awareness of receive side scaling (RSS) on FreeBSD

Part of testing this receive side scaling work is designing a set of APIs that allow for some kind of affinity awareness. It's not easy - the general case is difficult and highly varying. But something has to be tested! So, where should it begin?

The main tricky part of this is the difference between incoming, outgoing and listening sockets.

For incoming traffic, the NIC has already calculated the RSS hash value and there's already a map between RSS hash and destination CPU. Well, destination queue to be much more precise; then there's a CPU for that queue.

For outgoing traffic, the thread(s) in question can be scheduled on any CPU core and as you have more cores, it's increasingly unlikely to be the right one. In FreeBSD, the default is to direct dispatch transmit related socket and protocol work in the thread that started it, save a handful of places like TCP timers. Once the driver if_transmit() method is called to transmit a frame it can check the mbuf to see what the flowid is and map that to a destination transmit queue. Before RSS, that's typically done to keep packets vaguely in some semblance of in-order behaviour - ie, for a given traffic flow between two endpoints (say, IP, or TCP, or UDP) the packets should be transmitted in-order. It wasn't really done for CPU affinity reasons.

Before RSS, there was no real consistency with how drivers hashed traffic upon receive, nor any rules on how it should select an outbound transmit queue for a given buffer. Most multi-queue drivers got it "mostly right". They definitely didn't try to make any CPU affinity choices - it was all done to preserve the in-order behaviour of traffic flows.

For an incoming socket, all the information about the destination CPU can be calculated from the RSS hash provided during frame reception. So, for TCP, the RSS hash for the received ACK during the three way handshake goes into the inpcb entry. For UDP it's not so simple (and the inpcb doesn't get a hash entry for UDP - I'll explain why below.)

For an outgoing socket, all the information about the eventual destination CPU isn't necessarily available. If the application knows the source/destination IP and source/destination port then it (or the kernel) can calculate the RSS hash that the hardware would calculate upon frame reception and use that to populate the inpcb. However this isn't typically known - frequently the source IP and port won't be explicitly defined and it'll be up to the kernel to choose them for the application. So, during socket creation, the destination CPU can't be known.

So to make it simple (and to make it simple for me to ensure the driver and protocol stack parts are working right) my focus has been on incoming sockets and incoming packets, rather than trying to handle outgoing sockets. I can handle outbound sockets easily enough - I just need to do a software hash calculation once all of the required information is available (ie, the source IP and port is selected) and populate the inpcb with that particular value. But I decided to not have to try and debug that at the same time as I debugged the driver side and the protocol stack side, so it's a "later" task.

For TCP, traffic for a given connection will use the same source/destination IP and source/destination port values. So for a given socket, it'll always hash to the same value. However, for UDP, it's quite possible to get UDP traffic from a variety of different source IP/ports and respond from a variety of different source/IP ports. This means that the RSS hash value that we can store in the inpcb isn't at all guaranteed to be the same for all subsequent socket writes.

Ok, so given all of that above information, how exactly is this supposed to work?

Well, the slightly more interesting and pressing problem is how to break out incoming requests/packets to multiple receive threads. In traditional UNIX socket setups, there are a couple of common design patterns for farming off incoming requests to multiple worker threads:

  • There's one thread that just does accept() (for TCP) or recv() (for UDP) and it then farms off new connections to userland worker threads; or
  • There are multiple userland worker threads which all wait on a single socket for accept() or recv() - and hope that the OS will only wake up one thread to hand work to.
It turns out that the OS may wake up one thread at a time for accept() or recv() but then userland threads will sit in a loop trying to accept connections / packets - and then you tend to find they get called a lot only to find another worker thread that was running stole the workload. Oops.

I decided this wasn't really acceptable for the RSS work. I needed a way to redirect traffic to a thread that's also pinned to the same CPU as the receive RSS bucket. I decided the cheapest way would be to allow multiple PCB entries for the same socket details (eg, multiple TCP sockets listening on *:80). Since the PCBGROUPS code in this instance has one PCB hash per RSS bucket, all I had to do was to teach the stack that wildcard listen PCB entries (eg, *:80) could also exist in each PCB hash bucket and to use those in preference to the global PCB hash.

The idea behind this decision is pretty simple - Robert Watson already did all this great work in setting up and debugging PCBGROUPS and then made the RSS work leverage that. All I'd have to do is to have one userland thread in each RSS bucket and have the listen socket for that thread be in the RSS bucket. Then any incoming packet would first check the PCBGROUP that matched the RSS bucket indicated by the RSS hash from the hardware - and it'd find the "right" PCB entry in the "right" PCBGROUP PCB has table for the "right" RSS bucket.

That's what I did for both TCP and UDP.

So the programming model is thus:

  • First, query the RSS sysctl (net.inet.rss) for the RSS configuration - this gives the number of RSS buckets and the RSS bucket -> CPU mapping.
  • Then create one worker thread per RSS bucket..
  • .. and pin each thread to the indicated CPU.
  • Next, each worker thread creates one listen socket..
  • .. sets the IP_BINDANY or IP6_BINDANY option to indicate that there'll be multiple RSS entries bound to the given listen details (eg, binding to *:80);
  • .. then IP_RSS_LISTEN_BUCKET to set which RSS bucket the incoming socket should live in;
  • Then for UDP - call bind()
  • Or for TCP - call bind(), then call listen()
Each worker thread will then receive TCP connections / UDP frames that are local to that CPU. Writing data out the TCP socket will also stay local to that CPU. Writing UDP frames out doesn't - and I'm about to cover that.

Yes, it's annoying because now you're not just able to choose an IO model that's convenient for your application / coding style. Oops.

Ok, so what's up with UDP?

The problem with UDP is that outbound responses may be to an arbitrary destination setup and thus may actually be considered "local" to another CPU. Most common services don't do this - they'll send the UDP response to the same remote IP and port that it was sent from.

My plan for UDP (and TCP in some instances, see below!) is four-fold:

  • When receiving UDP frames, optionally mark them with RSS hash and flowid information.
  • When transmitting UDP frames, allow userspace to inform the kernel about a pre-calculated RSS hash / flow information.
  • For the fully-connected setup (ie, where a single socket is connect() ed to a given UDP remote IP:port and frame exchange only occurs between the fixed IP and port details) - cache the RSS flow information in the inpcb;
  • .. and for all other situations (if it's not connected, if there's no hint from userland, if it's going to a destination that isn't in the inpcb) - just do a software hash calculation on the outgoing details.
I mostly have the the first two UDP options implemented (ie, where userland caches the information to re-use when transmitting the response) and I'll commit them to FreeBSD soon. The other two options are the "correct" way to do the default methods but it'll take some time to get right.

Ok, so does it work?

I don't have graphs. Mostly because I'm slack. I'll do up some before I present this - likely at BSDCan 2015.

My testing has been done with Intel 1G and 10G NICs on desktop Ivy Bridge 4-core hardware. So yes, server class hardware will behave better.

For incoming TCP workloads (eg a webserver) then yes, there's no lock contention between CPUs in the NIC driver or network stack any longer. The main lock contention between CPUs is the VM and allocator paths. If you're doing disk IO then that'll also show up.

For incoming UDP workloads, I've seen it scale linearly on 10G NICs (ixgbe(4)) from one to four cores. This is with no-defragmentation, 510 byte sized datagrams.

Ie, 1 core reception (ie, all flows to one core) was ~ 250,000 pps into userland with just straight UDP reception and no flow/hash information via recvmsg(); 135,000 pps into userland with UDP reception and flow/hash information via recvmsg().

4 core reception was ~ 1.1 million pps into userland, roughly ~ 255,000 pps per core. There's no contention between CPU cores at all.

Unfortunately what I was sending was markedly different. The driver quite happily received 1.1 million frames on one queue and up to 2.1 million when all four queues were busy. So there's definitely room for improvement.

Now, there is lock contention - it's just not between CPU cores. Now that I'm getting past the between-core contention, we see the within-core contention.

For TCP HTTP request reception and bulk response transmission, most of the contention I'm currently seeing is between the driver transmit paths. So, the following occurs:

  • TCP stack writes some data out;
  • NIC if_transmit() method is called;
  • It tries to grab the queue lock and succeeds;
It then appends the frame to the buf_ring and schedules a transmit out the NIC. This bit is fine.

But then whilst the transmit lock is held, because the driver is taking frames from the buf_ring to push into the NIC TX DMA queue
  • The NIC queue interrupt fires, scheduling the software interrupt thread;
  • This pre-empts the existing running transmit thread;
  • The NIC code tries to grab the transmit lock to handle completed transmissions;
  • .. and it fails, because the code it preempted holds the transmit lock already.
So there's some context switching and thrashing going on there which needs to be addressed.

Ok, what about UDP? It turns out there's some lock contention with the socket receive buffer.

The soreceive_dgram() routine grabs the socket receive buffer (SOCKBUF_LOCK()) to see if there's anything to return. If not, and if it can sleep, it'll call sbwait() that will release the lock and msleep() waiting for the protocol stack to indicate that something has been received. However, since we're receiving packets at such a very high rate, it seems that the receive protocol path contends with the socket buffer lock that is held by the userland code trying to receive a datagram. It pre-empts the user thread, tries to grab the lock and fails - and then goes to sleep until the userland code finishes with the lock. soreceive_dgram() doesn't hold the lock for very long - but I do see upwards of a million context switches a second.

To wrap up - I'm pleased with how things are going. I've found and fixed some issues with the igb(4) and ixgbe(4) drivers that were partly my fault and the traffic is now quite happily and correctly being processed in parallel. There are issues with scaling within a core that are now being exposed and I'm glad to say I'm going to ignore them for now and focus on wrapping up what I've started.

There's a bunch more to talk about and I'm going to do it in follow-up posts.
  • what I'm going to do about UDP transmit in more detail;
  • what about creating outbound connections and how applications can be structured to handle this;
  • handling IP fragments and rehashing packets to be mostly in-order - and what happens when we can't guarantee ordering with the hardware hashing UDP frames to a 4-tuple;
  • CPU hash rebalancing - what if a specific bucket gets too much CPU load for some reason;
  • randomly creating a toeplitz RSS hash key at bootup and how that should be verified;
  • multi-socket CPU and IO domain awareness;
  • .. and whatever else I'm going to stumble across whilst I'm slowly fleshing this stuff out.
I hope to get the UDP transmit side of things completed in the next couple of weeks so I can teach memcached about TCP and UDP RSS. After that, who knows!

Using qemu-user to chroot and bootstrap other architectures on #FreeBSD

My last post spawned enough feedback that I thought I would dump some notes here for those interested in building a chroot on FreeBSD that allows you to test and prototype architectures, e.g. ARMv6 on AMD64.

The FreeBSD buildsys has many targets used for many things, the two we care about here are buildworld and distribution.  We will also be changing the output architecture through the use of TARGET and TARGET_ARCH command line variables.  I’ll assume csh is your shell here, just for simplicity.  You’ll need 10stable or 11current to do this, as it requires the binary activator via binmiscctl(8) which has not appeared in a release version of FreeBSD yet.

Checkout the FreeBSD source tree somewhere, your home directory will be fine and start a buildworld.  This will take a while, so get a cup of tea and relax.

make -s -j <number of cpus on your machine> buildworld TARGET=mips TARGET_ARCH=mips64 MAKEOBJDIRPREFIX=/var/tmp

Some valid combinations of TARGET/TARGET_ARCH are:

mips:mips

mips:mip64

arm:armv6

sparc64:sparc64

powerpc:powerpc

powerpc:powerpc64

i386:i386

amd64:amd64

Once this is done, you have an installable tree in /var/tmp.  You need to be root for the next few steps, su now and execute these steps:

make -s installworld TARGET=mips TARGET_ARCH=mips64 MAKEOBJDIRPREFIX=/var/tmp DESTDIR=/opt/test

DESTDIR is where you intend on placing the installed FreeBSD system.  I chose /opt/test here only because I wanted to be FAR away from anything in my running system.  Just to be clear here, this will crush and destroy your host computer without DESTDIR set.

Next, there are some tweaks that have to be done by the buildsys, so run this command as root:

make -s distribution TARGET=mips TARGET_ARCH=mips64 MAKEOBJDIRPREFIX=/var/tmp DESTDIR=/opt/test

Now we need to install the emulator tools (QEMU) to allow us to use the chroot on our system.  I suggest using emulators/qemu-user-static for this as Juergen Lock has set it up for exactly this purpose.  It will install only the tools you need here.

Once that is installed, via pkg or ports, setup your binary activator module for the architecture of your chroot.  Use the listed options on the QEMU user mode wiki page for the architecture you want.  I know the arguments are not straight forward, but there should be examples for the target that you are looking for.

For this mips/mips64 example:

binmiscctl add mips64elf –interpreter “/usr/local/bin/qemu-mips64-static”
–magic “x7fx45x4cx46x02x02x01x00x00x00x00x00x00x00x00x00x00x02x00x08″
–mask “xffxffxffxffxffxffxffx00xffxffxffxffxffxffxffxffxffxfexffxff”
–size 20 –set-enabled

Copy the binary qemu that you setup in this step *into* the chroot environment:

mkdir -p /opt/tmp/usr/local/bin

cp /usr/local/bin/qemu-mips64-static /opt/tmp/usr/local/bin/

Mount devfs into the chroot:

mount -t devfs devfs /opt/tmp/dev

Want to try building ports in your chroot?  Mount the ports tree in via nullfs:

mkdir /opt/tmp/usr/ports

mount -t nullfs /usr/ports /opt/tmp/usr/ports

And now, through the QEMU and FreeBSD, you can simply chroot into the environment:

chroot /opt/tmp

Hopefully, you can now “do” things as though you were running on a MIPS64 or whatever architecture machine you have as a target.

arm:armv6, mips:mips, mips:mips64 are working at about %80-90 functionality.  powerpc:powerpc64 and powerpc:powerpc are still a work in progress and need more work.  sparc64:sparc64 immediately aborts and probably needs someone with an eye familiar with the architecture to give QEMU a look.  If you are interested in further development of the qemu-user targets, please see my github repo and clone away.

If you are looking to see what needs to be done, Stacey Son has kept an excellent log of open item on the FreeBSD Wiki

DNS improvements in FreeBSD 11

Erwin Lansing just posted a summary of the DNS session at the FreeBSD DevSummit that was held in conjunction with BSDCan 2014 in May. It gives a good overview of the current state of affairs, including known bugs and plans for the future.

I’ve been working on some of these issues recently (in between $dayjob and other projects). I fixed two issues in the last 48 hours, and am working on two more.

Reverse lookups in private networks

Fixed in 11.

In its default configuration, Unbound 1.4.22 does not allow reverse lookups for private addresses (RFC 1918 and the like). NLNet backported a patch from the development version of Unbound which adds a configuration option, unblock-lan-zones, which disables this filtering. But that alone is not enough, because the reverse zones are actually signed (EDIT: the problem is more subtle than that, details in comments); Unbound will attempt to validate the reply, and will reject it because the zone is supposed to be empty. Thus, for reverse lookups to work, the reverse zones for all private address ranges must be declared as insecure:

server:
    # Unblock reverse lookups for LAN addresses
    unblock-lan-zones: yes
    domain-insecure:      10.in-addr.arpa.
    domain-insecure:     127.in-addr.arpa.
    domain-insecure: 254.169.in-addr.arpa.
    domain-insecure:  16.172.in-addr.arpa.
    # ...
    domain-insecure:  31.172.in-addr.arpa.
    domain-insecure: 168.192.in-addr.arpa.
    domain-insecure: 8.e.ip6.arpa.
    domain-insecure: 9.e.ip6.arpa.
    domain-insecure: a.e.ip6.arpa.
    domain-insecure: b.e.ip6.arpa.
    domain-insecure: d.f.ip6.arpa.

FreeBSD 11 now has both the unblock-lan-zones patch and an updated local-unbound-setup script which sets up the reverse zones. To take advantage of this, simply run the following command to regenerate your configuration:

# service local_unbound setup

This feature will be available in FreeBSD 10.1.

Building libunbound writes to /usr/src (#190739)

Fixed in 11.

The configuration lexer and parser were included in the source tree instead of being generated at build time. Under certain circumstances, make(1) would decide that they needed to be regenerated. At best, this inserted spurious changes into the source tree; at worst, it broke the build.

Part of the reason for this is that Unbound uses preprocessor macros to place the code generated by lex(1) and yacc(1) in its own separate namespace. FreeBSD’s lex(1) is actually Flex, which has a command-line option to achieve the same effect in a much simpler manner, but to take advantage of this, the lexer needed to be cleaned up a bit.

Allow local domain control sockets

Work in progress

An Unbound server can be controlled by sending commands (such as “reload your configuration file”, “flush your cache”, “give me usage statistics”) over a control socket. Currently, this can only be a TCP socket. Ilya Bakulin has developed a patch, which I am currently testing, that allows Unbound to use a local domain (aka “unix”) socket instead.

Allow unauthenticated control sockets

Work in progress

If the control socket is a local domain socket instead of a TCP socket, there is no need for encryption and little need for authentication. In the local resolver case, only root on the local machine needs access, and this can be enforced by the ownership and file permissions of the socket. A second patch by Ilya Bakulin makes encryption and authentication optional so there is no need to generate and store a client certificate in order to use unbound-control(8).

Cross building ports with qemu-user and poudriere-devel on #FreeBSD

I’ve spent the last few months banging though the bits and pieces of the work that Stacey Son implemented for QEMU to allow us to more or less chroot into a foreign architecture as though it were a normal chroot.  This has opened up a lot of opportunities to bootstrap the non-x86 architectures on FreeBSD.

Before I get started, I’d like to thank Stacey Son, Ed Maste, Juergen Lock, Peter Wemm, Justin Hibbits, Alexander Kabaev, Baptiste Daroussin and Bryan Drewery for the group effort in getting us the point of working ARMv6, MIPS32 and MIPS64 builds.  This has been a group effort for sure.

This will require a 10stable or 11current machine, as this uses Stacey’s binary activator patch to redirect execution of binaries through QEMU depending on the ELF header of the file.  See binmiscctl(8) for more details.

Mechanically, this is a pretty easy setup.  You’ll need to install ports-mgmt/poudriere-devel with the qemu-user option selected.  This will pull in the qemu-user code to emulate the environment we need to get things going.

I’ll pretend that you want an ARMv6 environment here.  This is suitable to build packages for the Rasberry PI and Beagle Bone Black.  Run this as root:

binmiscctl add armv6 –interpreter “/usr/local/bin/qemu-arm” –magic
“x7fx45x4cx46x01x01x01x00x00x00x00x00x00x00x00x00x02
x00x28x00″ –mask “xffxffxffxffxffxffxffx00xffxffxffxff
xffxffxffxffxfexffxffxff” –size 20 –set-enabled

This magic will load the imgact_binmisc.ko kernel module.  The rest of the command line instructs the kernel to redirect execution though /usr/local/bin/qemu-arm if the ELF header of the file matches an ARMv6 signature.

Build your poudriere jail (remember to install poudriere-devel for now as it has not been moved to stable at the time of this writing) with the following command:

poudriere jail -c -j 11armv632 -m svn -a armv6 -v head

Once this is done, you will be able to start a package build via poudriere bulk as you normally would:

poudriere bulk -j 11armv632 -a

or

poudriere bulk -j 11armv632 -f <my_file_of_ports_to_build>

Currently, we are running the first builds in the FreeBSD project to determine what needs the most attention first.  Hopefully, soon we’ll have something that looks like a coherent package set for non-x86 architectures.

For more information, work in progress things and possible bugs in qemu-user code, see Stacey’s list of things at:

https://wiki.freebsd.org/QemuUserModeHowTo

https://wiki.freebsd.org/QemuUserModeToDo

BSDCan 2014 – Ports and Packages WG

Baptiste Daroussin started the session with a status update on package building. All packages are now built with poudriere. The FreeBSD Foundation sponsored some large machines on which it takes around 16 hours to build a full tree. Each Wednesday at 01:00UTC the tree is snapshot and an incremental build is started for all supported released, the 2 stable branches (9 and 10) and quarterly branches for 9.x-RELEASE and 10.x-RELEASE. The catalogue is signed on a dedicated signing machine before upload. Packages can be downloaded from 4 mirrors (us-west, us-east, UK, and Russia) and feedback so far has been very positive.

He went on to note that ports people need better coordination with src people on ABI breakage. We currently only support i386 and amd64, with future plans for ARM and a MIPS variant. Distfiles are not currently mirrored (since fixed), and while it has seen no progress, it’s still a good idea to build a pkg of the ports tree itself.

pkg 1.3 will include a new solver, which will help 'pkg upgrade' understand that an old packages needs to be replaced with a newer one, with no more need for 'pkg set' and other chicanery. Cross building ports has been added to the ports tree, but is waiting for pkg-1.3. All the dangerous operations in pkg have now been sandboxed as well.

EOL for pkg_tools has been set for September 1st. An errata notice has gone out that adds a default pkg.conf and keys to all supported branches, and nagging delays have been added to ports.

Quarterly branches based on 3 month support cycle has been started on an evaluation basis. We’re still unsure about the manpower needed to maintain those. Every quarter a snapshot of the tree is created and only security fixed, build and runtime fixed, and upgrades to pkg are allowed to be committed to it. Using the MFH tag in a commit message will automatically send an approval request to portmgr and an mfh script on Tools/ makes it easy to do the merge.

Experience so far has been good, some minor issues to the insufficient testing. MFHs should only contain the above mentioned fixes; cleanups and other improvements should be done in separate commits only to HEAD. A policy needs to be written and announced about this. Do we want to automatically merge VuXML commits, or just remove VuXML from the branch and only use the one in HEAD?

A large number of new infrastructure changes have been introduces over the past few months, some of which require a huge migration of all ports. To speed these changes up, a new policy was set to allow some specific fixes to be committed without maintainer approval. Experience so far has been good, things actually are being fixed faster than before and not many maintainers have complained. There was agreement that the list of fixes allowed to be committed without explicit approval should be a specific whitelist published by portmgr, and not made too broad in scope.

Erwin Lansing quickly measured the temperature of the room on changing the default protocol for fetching distils from MASTER_SITE_BACKUP from ftp to http. Agreement all around and erwin committed the change.

Ben Kaduk gave an introduction and update on MIT’s Athena Environment with some food for thought. While currently not FreeBSD based, he would like to see it become so. Based on debian/ubuntu and rolled out on hundreds of machines, it now has it’s software split into about 150 different packages and metapackages.

Dag-Erling Smørgrav discussed changes to how dependencies are handled, especially splitting dependencies that are needed at install time (or staging time) and those needed at run time. This may break several things, but pkg-1.3 will come with better dependency tracking solving part of the problem.

Ed Maste presented the idea of “package transparency”, loosely based on Google’s Certificate Transparency. By logging certificate issuance to a log server, which can be publicly checked, domain owners can search for certificates issued for their domains, and notice when a certificate is issued without their authority. Can this model be extended to packages? Mostly useful for individually signed packages, while we currently only sign the catalogue. Can we do this with the current infrastructure?

Stacy Son gave an update on Qemu user mode, which is now working with Qemu 2.0.0. Both static and dynamic binaries are supported, though only a handful of system call are supported.

Baptiste introduced the idea of having pre-/post-install scripts be a library of services, like Casper, for common actions. This reduces the ability of maintainers to perform arbitrary actions and can be sandboxed easily. This would be a huge security improvement and could also enhance performance.

Cross building is coming along quite well and most of the tree should be able to be build by a simple 'make package'. Major blockers include perl and python.

Bryan Drewery talked about a design for a PortsCI system. The idea is that committer easily can schedule a build, be it an exp-run, reference, QAT, or other, either via a web interface or something similar to a pull request, which can fire off a build.

Steve Wills talked about using Jenkins for ports. The current system polls SVN for commits and batches several changes together for a build. It uses 8 bhyve VMs instances, but is slow. Sean Bruno commented that there are several package building clusters right now, can they be unified? Also how much hardware would be needed to speed up Jenkins? We could duse Jenkins as a fronted for the system Bryan just talked about. Also, it should be able to integrate with phabricator.

Erwin opened up the floor to talk about freebsd-version(1) once more. It was introduced as a mechanism to find out the version of user land currently running as uname -r only represents the kernel version, and would thus miss updates of the base system that do no touch the kernel. Unfortunately, freebsd-version(1) cannot really be used like this in all cases, it may work for freebsd-update, but not in general. No real solution was found this time either.

The session ended with a discussion about packaging the base system. It’s a target for FreeBSD 11, but lots of questions are still to be answered. What granularity to use? What should be packages into how many packages? How to handle options? Where do we put the metadata for this? How do upgrades work? How to replace shared libraries in multiuser mode? This part also included the quote of the day: “Our buildsystem is not a paragon of configurability, but a bunch of hacks that annoyed people the most.”

Thanks to all who participated in the working group, and thanks again to DK Hostmaster for sponsoring my trip to BSDCan this year, and see you at the Ports and Packages WG meet up at EuroBSDCon in Sofia in September.

FreeBSD Developer Summit – BSDCan 2014 – Ports and Packages WG

Baptiste Daroussin started the session with a status update on package building. All packages are now built with poudriere. The FreeBSD Foundation sponsored some large machines on which it takes around 16 hours to build a full tree. Each Wednesday at 01:00UTC the tree is snapshot and an incremental build is started for all supported released, the 2 stable branches (9 and 10) and quarterly branches for 9.x-RELEASE and 10.x-RELEASE. The catalogue is signed on a dedicated signing machine before upload. Packages can be downloaded from 4 mirrors (us-west, us-east, UK, and Russia) and feedback so far has been very positive.

He went on to note that ports people need better coordination with src people on ABI breakage. We currently only support i386 and amd64, with future plans for ARM and a MIPS variant. Distfiles are not currently mirrored (since fixed), and while it has seen no progress, it’s still a good idea to build a pkg of the ports tree itself.

pkg 1.3 will include a new solver, which will help 'pkg upgrade' understand that an old packages needs to be replaced with a newer one, with no more need for 'pkg set' and other chicanery. Cross building ports has been added to the ports tree, but is waiting for pkg-1.3. All the dangerous operations in pkg have now been sandboxed as well.

EOL for pkg_tools has been set for September 1st. An errata notice has gone out that adds a default pkg.conf and keys to all supported branches, and nagging delays have been added to ports.

Quarterly branches based on 3 month support cycle has been started on an evaluation basis. We’re still unsure about the manpower needed to maintain those. Every quarter a snapshot of the tree is created and only security fixed, build and runtime fixed, and upgrades to pkg are allowed to be committed to it. Using the MFH tag in a commit message will automatically send an approval request to portmgr and an mfh script on Tools/ makes it easy to do the merge.

Experience so far has been good, some minor issues to the insufficient testing. MFHs should only contain the above mentioned fixes; cleanups and other improvements should be done in separate commits only to HEAD. A policy needs to be written and announced about this. Do we want to automatically merge VuXML commits, or just remove VuXML from the branch and only use the one in HEAD?

A large number of new infrastructure changes have been introduces over the past few months, some of which require a huge migration of all ports. To speed these changes up, a new policy was set to allow some specific fixes to be committed without maintainer approval. Experience so far has been good, things actually are being fixed faster than before and not many maintainers have complained. There was agreement that the list of fixes allowed to be committed without explicit approval should be a specific whitelist published by portmgr, and not made too broad in scope.

Erwin Lansing quickly measured the temperature of the room on changing the default protocol for fetching distils from MASTER_SITE_BACKUP from ftp to http. Agreement all around and erwin committed the change.

Ben Kaduk gave an introduction and update on MIT’s Athena Environment with some food for thought. While currently not FreeBSD based, he would like to see it become so. Based on debian/ubuntu and rolled out on hundreds of machines, it now has it’s software split into about 150 different packages and metapackages.

Dag-Erling Smørgrav discussed changes to how dependencies are handled, especially splitting dependencies that are needed at install time (or staging time) and those needed at run time. This may break several things, but pkg-1.3 will come with better dependency tracking solving part of the problem.

Ed Maste presented the idea of “package transparency”, loosely based on Google’s Certificate Transparency. By logging certificate issuance to a log server, which can be publicly checked, domain owners can search for certificates issued for their domains, and notice when a certificate is issued without their authority. Can this model be extended to packages? Mostly useful for individually signed packages, while we currently only sign the catalogue. Can we do this with the current infrastructure?

Stacy Son gave an update on Qemu user mode, which is now working with Qemu 2.0.0. Both static and dynamic binaries are supported, though only a handful of system call are supported.

Baptiste introduced the idea of having pre-/post-install scripts be a library of services, like Casper, for common actions. This reduces the ability of maintainers to perform arbitrary actions and can be sandboxed easily. This would be a huge security improvement and could also enhance performance.

Cross building is coming along quite well and most of the tree should be able to be build by a simple 'make package'. Major blockers include perl and python.

Bryan Drewery talked about a design for a PortsCI system. The idea is that committer easily can schedule a build, be it an exp-run, reference, QAT, or other, either via a web interface or something similar to a pull request, which can fire off a build.

Steve Wills talked about using Jenkins for ports. The current system polls SVN for commits and batches several changes together for a build. It uses 8 bhyve VMs instances, but is slow. Sean Bruno commented that there are several package building clusters right now, can they be unified? Also how much hardware would be needed to speed up Jenkins? We could duse Jenkins as a fronted for the system Bryan just talked about. Also, it should be able to integrate with phabricator.

Erwin opened up the floor to talk about freebsd-version(1) once more. It was introduced as a mechanism to find out the version of user land currently running as uname -r only represents the kernel version, and would thus miss updates of the base system that do no touch the kernel. Unfortunately, freebsd-version(1) cannot really be used like this in all cases, it may work for freebsd-update, but not in general. No real solution was found this time either.

The session ended with a discussion about packaging the base system. It’s a target for FreeBSD 11, but lots of questions are still to be answered. What granularity to use? What should be packages into how many packages? How to handle options? Where do we put the metadata for this? How do upgrades work? How to replace shared libraries in multiuser mode? This part also included the quote of the day: “Our buildsystem is not a paragon of configurability, but a bunch of hacks that annoyed people the most.”

Thanks to all who participated in the working group, and thanks again to DK Hostmaster for sponsoring my trip to BSDCan this year, and see you at the Ports and Packages WG meet up at EuroBSDCon in Sofia in September.

FreeBSD Developer Summit – BSDCan 2014 – DNS WG

The DNS Working Group at the FreeBSD Developer Summit at BSDCan this year was off to a good start by noticing that DNSSEC validation could not work on the University of Ottawa’s wireless network. The university’s resolvers added additional records to the root zone, thus failing validation at the root. This led to some discussion on how to provide a user-friendly way to explain this in an understandable way to the user and giver the user a choice of turning off validation or find another network. This certainly is going to be a major problem when turning on validation by default as broken resolvers are very common at hotels, coffee shops, etc. etc.

On a more positive note, all the FreeBSD projects zones are DNSSEC signed and all project-owned servers have SSHFP records in the zone. Dog food was eaten.

Dag-Erling Smørgrav started off by giving an overview of the current state of affairs. ldns and unbound are imported into base in HEAD and 10.x. unbound is meant to act as a local resolver only and as it is not linked to libevent, it will not scale to anything else. For a network-wide resolver or any other configuration, it is recommended to install unbound from ports. DES further went into some of the implementation details on how the base unbound is installed to make sure it does not conflict with an unbound installed from ports.

DES explained some issues he encountered with local and RFC1918 zones which are filtered by default by unbound. Others reported no issues with the right configuration options, so more investigation is needed.

Some people reported having difficulty getting patches accepted upstream by NLNetLabs, which gave some cause for concern as we clearly want a good and active working relationship with our DNS vendor. Others reported no problem working with NLNetLabs, quite the opposite, they are very interested to see the work going on in operation systems, so we’ll just need to build upon that relationship and make sure to invite them to the next WG meeting. Patches that are currently being worked on, DES has some code cleanups, Björn a DNS64 feature, should be submitted through the “normal” submission process and review with NLNetLabs and we’ll see how that goes.

Erwin Lansing started the brainstorm session on future work. Some command line tools would be nice to have; drill does most things one wants, but people are too used to writing dig and dig has many more options; Peter Wemm would like to see contrib scripts line ldns-dane, which are just really easy to use; the control socket should be a unix socket, there’s a patch floating around and should be submitted upstream.

The “Starbucks” problem came up again, with a proposal to turn on val-permissive-mode by default. Another solution may be by looking at how unbound-trigger does its magic.

After a coffee break, Peter Losher, ISC, went over some of the recent changes at ISC. BIND10 development has been handed over to a new project and ISC will concentrate on BIND9 and a stand-alone project for the DHCP component. BIND 9.10 was recently released and plans are in place for 9.11. ISC is open to suggestions and feature requests.

Peter brought up the topic of clientID for which a IETF draft (draft-edns0-client-subnet) is available. This would help client find the nearest CDN node, etc. ISC wants this to be an opt-out in operating systems as it will peel off a layer of anonymisation, and should be controllable by the user.

Next up was Michael Bentkofsky, Verisign, who, while not involved in the project himself, gave an introduction into the getDNS API, which is a replacement for getaddrinfo and allows the stub resolver to get validation information down at the client level. It’s available in ports. The discussion went into more of a brainstorm on how applications should get DNS and DNSSEC information and who gets to make decisions about its security. There should be a clear separation between policy and mechanism, where application programmers should not have to worry about this; it should be a system policy. There should be a higher level API where an application basically can ask the operating system for a “connection” and the operation system takes care of everything behind the scenes, DNS, DNSSEC, SSL, DANE, etc. and just return a socket, with some information on how the connection was established and which security mechanisms were used. In FreeBSD, it would make sense to let the Casper daemon hand out the different sub-tasks to ensure all lookups, cryptography, etc. are properly compartmentalised. One potential problem with passing on additional information is that all DNS lookups currently go through nsswitch, which would need to grow knowledge about that data as well. Are people still using other mechanisms for hostname lookups besides the hosts file and DNS? We can probably just remove nsswitch for the hostname lookups.

The session ended with some aims for the 11.0 release. We’ll need to have a wider discussion about the aforementioned removal of nsswitch out of the hostname lookups. We’ll also need a better understanding of what API capabilities applications may need. Can Casper provide all these? Can it run unbound behind the scenes to do all the DNS “stuff” for it? Can we capsicumize unbound and will that be accepted upstream? Enough food for thought and even more for writing code.

Thanks again to DK Hostmaster for sponsoring my trip to BSDCan this year, and see you at the DNS WG meet up at EuroBSDCon in Sofia in September.

FreeBSD 9.3-RELEASE Available

FreeBSD 9.3-RELEASE is now available. Please be sure to check the Release Notes and Release Errata before installation for any late-breaking news and/or issues with 9.3. More information about FreeBSD releases can be found on the Release Information page.

FreeBSD 9.3-RELEASE Now Available

FreeBSD 9.3-RELEASE Announcement

The FreeBSD Release Engineering Team is pleased to announce the availability of FreeBSD 9.3-RELEASE. This is the fourth release of the stable/9 branch, which improves on the stability of FreeBSD 9.2-RELEASE and introduces some new features.

Some of the highlights:
  • The zfs(8) filesystem has been updated to support the bookmarks feature.
  • The uname(1) utility has been updated to include the -U and -K flags, which print the __FreeBSD_version for the running userland and kernel, respectively.
  • The fetch(3) library has been updated to support SNI (Server Name Identification), allowing to use virtual hosts on HTTPS.
  • Several updates to gcc(1) have been imported from Google.
  • The hastctl(8) utility has been updated to output the current queue sizes.
  • The protect(1) command has been added, which allows exempting processes from being killed when swap is exhausted.
  • The etcupdate(8) utility, a tool for managing updates to files in /etc, has been merged from head/.
  • A new shared library directory, /usr/lib/private, has been added for internal-use shared libraries.
  • OpenPAM has been updated to Nummularia (20130907).
  • A new flag, "onifconsole" has been added to /etc/ttys. This allows the system to provide a login prompt via serial console if the device is an active kernel console, otherwise it is equivalent to off.
  • Sendmail has been updated to version 8.14.9.
  • BIND has been updated to version 9.9.5.
  • The xz(1) utility has been updated to a post-5.0.5 snapshot.
  • OpenSSH has been updated to version 6.6p1.
  • OpenSSL has been updated to version 0.9.8za.
For a complete list of new features and known problems, please see the online release notes and errata list, available at:
For more information about FreeBSD release engineering activities, please see:

Availability

FreeBSD 9.3-RELEASE is now available for the amd64, i386, ia64, powerpc, powerpc64, and sparc64 architectures.

FreeBSD 9.3-RELEASE can be installed from bootable ISO images or over the network. Some architectures also support installing from a USB memory stick. The required files can be downloaded via FTP as described in the section below. While some of the smaller FTP mirrors may not carry all architectures, they will all generally contain the more common ones such as amd64 and i386.

SHA256 and MD5 hashes for the release ISO and memory stick images are included at the bottom of this message.  A PGP-signed version of this announcement is available at:
Please refer to the official announcement email for the full details regarding FreeBSD 9.3-RELEASE.

Acknowledgments

Many companies donated equipment, network access, or man-hours to support the release engineering activities for FreeBSD 9.3 including The FreeBSD Foundation, Yahoo!, NetApp, Internet Systems Consortium, ByteMark Hosting, Sentex Communications, New York Internet, Juniper Networks, NLNet Labs, iXsystems, and Yandex.

The release engineering team for 9.3-RELEASE includes:
Glen Barber <[email protected]> Release Engineering Lead, 9.3-RELEASE Release Engineer
Konstantin Belousov <[email protected]> Release Engineering
Joel Dahl <[email protected]> Release Engineering
Baptiste Daroussin <[email protected]> Package Building
Bryan Drewery <[email protected]> Package Building
Marc Fonvieille <[email protected]> Release Engineering, Documentation
Steven Kreuzer <[email protected]> Release Engineering
Xin Li <[email protected]> Release Engineering, Security Officer
Josh Paetzel <[email protected]> Release Engineering
Colin Percival <[email protected]> Security Officer Emeritus
Craig Rodrigues <[email protected]> Release Engineering
Hiroki Sato <[email protected]> Release Engineering, Documentation
Gleb Smirnoff <[email protected]> Release Engineering
Ken Smith <[email protected]> Release Engineering
Dag-Erling Smøgrav <[email protected]> Security Officer
Marius Strobl <[email protected]> Release Engineering
Robert Watson <[email protected]> Release Engineering, Security

Trademark

FreeBSD is a registered trademark of The FreeBSD Foundation.

Love FreeBSD? Support this and future releases with a donation to The FreeBSD Foundation!