I committed the 64bit support for the linux base ports (disabled by default, check the commit message), but this broke the INDEX build. Portmgr was faster than me to revert it. All errors are mine. I think most of the work is done, I just need to find out what the correct way is to handle this make/fmake difference (malformed conditional).
I had a look at the open PRs for a quick win and found one where the dependencies were incomplete. Fixed.
Then I reviewed Allan Jude’s patch for 64bit linux_base-c6 ports (on amd64). Looks good so far, just a few minor issues. I took the time to get familiar with reviews.FreeBSD.org and the arc command line tool, applied the patch to my source tree, worked a while on merge conflicts, added some minor changes, and validated the download of the 32bit RPMs of the linux_base-c6 port.
In between I also discussed/reviewed some fixes for docs with Dru, signed some PGP keys, and served as a source for a funny picture (at least what geeks/nerds consider a funny picture). I also checked how to allow multicast in jails. There is a PR with a patch inside, but it’s IPv6 only. I did something similar for IPv4 and compiled a kernel. No compile time issues, but as the system where I can easily test this is at home, I prefer to be in front of the box in case it panics (that tells you something about my confidence in my patch… no idea if what I do there is actually correct… ENOCLUE about the network code in the kernel).
TODO for the last day of the Hackathon:
- validate all RPMs (download / distinfo) of the ports which changed
- validate the install/deinstall of the 32bit version of the ports for regression
- validate the 64bit install/deinstall for at least the linux base port (more if time permits tomorrow)
FDT overlay is an extension to the FDT format that lets the user modify the base FDT at run-time: add new nodes, add new properties to existing nodes, or modify existing properties. It’s useful when you have a base board and some extension units, like a cape/shield for the Pi/BBB or loadable FPGA logic for Zynq. I will not go into details; you can find the internals described on the Adafruit or Raspberry Pi websites.
When dealing with overlays there are two places to handle them: the loader or the kernel. Managing overlays at kernel level gives more flexibility but requires more related logic, e.g. re-initializing pinmux after applying an overlay and re-running newbus probe/attach. On the other hand, loader-level support is quite straightforward and involves nothing but DTB modifications, so it’s a natural first step to adding FDT overlays to FreeBSD.
The proposed solution is to add an fdt_overlays variable that contains a comma-separated list of dtbo files, e.g.: “bbb-no-hdmi.dtbo,bbb-4dcape-43.dtbo”. This variable can be defined either as a loader(8) variable or as a u-boot env variable. During boot, ubldr loads the base DTB and, right before passing control to the kernel, goes through the files, loads them from the /boot/dtb/ directory on the root partition and applies them to the base blob. The final DTB is passed to the kernel.
You can find the patch and its review comments on the Differential site: D3180. It contains:
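To make the variable format concrete, here is a minimal sketch of how a comma-separated fdt_overlays value could be turned into an ordered list of files under /boot/dtb/. This is only an illustration in Python, not the ubldr code (which is C); the function name, the whitespace stripping and the skipping of empty entries are assumptions of mine.

```python
import posixpath

DTB_DIR = "/boot/dtb"  # directory named in the text


def overlay_paths(fdt_overlays):
    """Split a comma-separated fdt_overlays value into dtbo paths, in order.

    Stripping whitespace and ignoring empty entries are assumptions,
    not documented loader behaviour.
    """
    names = [name.strip() for name in fdt_overlays.split(",")]
    return [posixpath.join(DTB_DIR, name) for name in names if name]


# The example value from the text: overlays are applied in list order.
for path in overlay_paths("bbb-no-hdmi.dtbo,bbb-4dcape-43.dtbo"):
    print(path)
```

Keeping the order of the list matters, since a later overlay may modify something an earlier one added.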
- Extension to dtc to generate dynamic symbols and fixup info.
- ubldr fdt_overlays support
As Warner Losh mentioned, it’s not clear yet how to deal with the dynamic symbols support patch. It’s not part of the official dtc tree, though it’s accepted by the RPi and BBB communities.
The Essen Hackathon 2015 starts. More or less around 6pm people started to show up (including myself). The socializing session (BBQ) had some funny/interesting stories, and already provided some interesting topics to have a closer look at.
Possible candidates where I can provide some input are around DTrace: How to use it (but probably Sean Chittenden has some much more interesting DTrace things to show) and how to add SDT probes to the kernel.
On the ports side I want to get some insight into the USES framework, to see if it may be easy to convert the linuxulator ports to it or not. Maybe I can also have a deeper look into patches for the 64bit side of the linux_base ports.
FreeBSD-9 introduced basic NUMA awareness in the physical allocator (sys/vm/vm_phys.c). It implemented first-touch page allocation, and then fell back to searching through the domains, round-robin style. It wasn't perfect, but for some workloads it was apparently okay. It had some shortcomings though - it wasn't configurable, UMA and other subsystems didn't know about NUMA domains, and the scheduler really didn't know about NUMA domains. So I'm sure there are plenty of workloads which it didn't work for.
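As a rough illustration of "first-touch, then fall back round-robin", here is a toy Python model. It is not the vm_phys.c logic; the function names, the list of per-domain free page counts and the fallback order (starting at the local domain and walking upwards) are assumptions made for the sketch.

```python
def pick_domain(preferred, free_pages):
    """Return the domain to allocate from, or None if all domains are empty.

    preferred  -- domain of the CPU touching the page (first-touch)
    free_pages -- list of free page counts, indexed by domain
    """
    ndomains = len(free_pages)
    # Try the local domain first, then walk the others in round-robin order.
    for step in range(ndomains):
        domain = (preferred + step) % ndomains
        if free_pages[domain] > 0:
            return domain
    return None


def allocate(preferred, free_pages):
    """Take one page from the chosen domain and return that domain."""
    domain = pick_domain(preferred, free_pages)
    if domain is not None:
        free_pages[domain] -= 1
    return domain


free = [1, 2]
print(allocate(0, free))  # local domain still has a page
print(allocate(0, free))  # local domain is empty, falls back to the other one
```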
That was all ripped out before FreeBSD-10. FreeBSD-10 NUMA just implements round-robin physical page allocation. It still tracks the per-domain physical memory regions, but it doesn't do any kind of NUMA aware allocation. From what I can gather, it was removed until something 'better' would land.
However, nothing (yet) has landed. So I decided I'd take a look into it. I found that for a lot of simple workloads (ie, where you're doing lots of anonymous memory allocation - eg, you're doing math crunching) the FreeBSD-9 model works fine. It's also a perfectly good starting point for experimenting.
So all my NUMA work in -HEAD does is provide an API to exactly the above. It doesn't teach the kernel APIs about domain aware allocations - there's currently no way to ask for memory from a specific domain when calling UMA, or contigmalloc, etc. The scheduler doesn't know about NUMA, so threads/processes will migrate off-socket very quickly unless you explicitly limit things. Devices don't yet do NUMA local work - the ACPI code is in there to enumerate which NUMA domain they're in, but it's not used anywhere just yet.
Then what is it good for?
If you're doing math workloads where you read in data into memory, do a bunch of work, and spit it out - it works fine. If you're running bhyve instances, you can run them using numactl and have them pinned to a local NUMA domain. Those coarse-grained things work fine. You can also change the system default back to round-robin and use first-touch or fixed-domain for specific processes. It's useful for exactly the same subset of tasks as it was in FreeBSD-9, but now it's at least configurable.
So what's next?
Well, my main aim is to get the minimum done so kernel side work is NUMA aware. This includes UMA, contigmalloc, malloc, mbuf allocation and such. It'd be nice to tag VM objects with a domain allocation policy, but that's currently out of scope. I'd also like to plumb in domain configuration into devices and allow devices to allocate memory for different driver threads with different policies.
But the first thing that showed up is that KVA allocation and superpages get in the way of malloc/contigmalloc working. Allocating memory in FreeBSD first allocates KVA space, then back-fills it with pages. As far as malloc/contigmalloc is concerned, KVA is KVA and it finds the first available space in a time-fast way. It then backfills it with physical pages. The superpage reservation bits (sys/vm/vm_reserv.[ch]) join together regions that are contiguous and in the same superpage and turn it into an allocation from the same superpage. These have no idea about NUMA domains. So, if you allocate a 4KiB page via malloc() from domain 0 and then try to allocate a 4KiB page from domain 1, it will likely mess it up:
- First page gets allocated - first KVA, then the underlying 2MB superpage is allocated and a 4KiB page is returned - from physical memory domain 0;
- Second page gets allocated - first KVA, and if it's adjacent or within the same 2MB superpage as the above allocation, it'll "fake" the page allocation via refcounting and it'll really be that same underlying superpage - but it's from physical memory domain 0.
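The two steps above can be mimicked with a toy model. This is a sketch under heavy assumptions, not the vm_reserv code: KVA is handed out as a simple bump pointer, a "reservation" is just a dictionary entry per 2MB-aligned KVA region, and the class and method names are made up for illustration.

```python
PAGE = 4 * 1024
SUPERPAGE = 2 * 1024 * 1024


class ToyAllocator:
    def __init__(self):
        self.next_kva = 0       # first-fit KVA: hand out the next free address
        self.reservations = {}  # superpage index -> physical domain backing it

    def alloc_page(self, wanted_domain):
        """Return (kva, domain actually backing the page)."""
        kva = self.next_kva
        self.next_kva += PAGE
        sp_index = kva // SUPERPAGE
        # The reservation layer reuses an existing superpage if the KVA falls
        # inside one; it never looks at the requested domain.
        domain = self.reservations.setdefault(sp_index, wanted_domain)
        return kva, domain


a = ToyAllocator()
first = a.alloc_page(wanted_domain=0)
second = a.alloc_page(wanted_domain=1)
print(first, second)  # the second page asked for domain 1 but lands in domain 0
```

In this model the request for domain 1 is only honoured once the KVA crosses into a fresh 2MB region, which is the behaviour the text describes as "messing it up".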
So, here's 11ish or so year old Adrian. It's the early 90s. I was hiding in my bedroom, trying to make another crystal set out of random parts and scraping away the paint at my windowsill. In walks my Aunty, who introduces her new boyfriend.
"Hi, I'm Julian." he said. That wasn't all that interesting.
"Oh, are you making a crystal set?" .. ok, so that was interesting.
And, that was that. Suddenly, someone role-model-y shows up in my life out of the blue. There I was, an 11 year old who felt very mostly alone most of the time, and someone shows up who I can look up to and think I can relate to. So, I'm a sponge for everything he shows me. Whenever he comes over, he has some new story to tell, some new thing to show me. He would show me better ways of building transistor switch circuits when I was in the "make large arcs with car alternator" phase of my early teens. And, when I saved up and bought a PC, he started to show me programming.
Now, I was already programming. My parents had saved up and bought me an Amstrad CPC464. We had a second-hand Commodore 64 for a short while, but that eventually somehow stopped working and I didn't have the clue to fix it. But I was programming Locomotive BASIC and dabbling in Z80 assembly when I was 12, and had "upgraded" to Turbo Pascal 6 when I hit high school. (Yes, school taught Turbo Pascal at Grade 10 level, and I decided to learn it a bit earlier. That's .. wow, that dates me.) I hadn't really stumbled into C yet. I had heard about it, but I didn't have anything that could write it.
Julian explained task switching to me one day during a walk along the beach. He explained that computers can just appear to be doing multiple things at once - but the CPU only does one thing at a time, and you can just switch things really quickly to give the appearance that it's multitasking. With that bright spark planted in my head, I went home and started dreaming up ways to make my Z80 based CPC do something like this.
My mother dragged me to McDonalds to apply for a job the moment I was legally able to (14 years, 9 months) and I saw a computer at a second hand shop - it was a $500 IBM PC/AT, with EGA monitor, two floppy disks and a printer. We put down a down-payment and I paid it off myself with my minimum wage money. Once I had that home I quickly erm, "acquired" a copy of Turbo Pascal for home and was off drawing funny little fractals.
So yes - it's Julian's fault I discovered FreeBSD. Yes, this is Julian Elischer. One day he showed me his computer, running something called BSD. He was trying to explain bourne shell scripting and the installer. I nodded, very confused, and eventually went back to the VGA programming book he lent me. He also showed me fractint running in X on his monochrome 486 DX2-50 laptop. I had no idea what was going on behind the scenes, only that the fractals were much more interesting than the ones I was drawing. So I took the VGA book home and started learning how to use the higher resolutions available. One thing stuck in my mind: so much bit-plane work. Ugh. One other thing stuck in my mind - reading from VGA memory is one of the slowest things you can do. Don't do it. Ever. (Do you hear that console driver authors? Don't do it. It's bad.)
One day he explained pointers to me. I had erm, "acquired" a copy of Turbo C 2.0 from a friend after failing to make much traction with the less friendly versions (Tiny C, for example.) I had coded up a few things, but I didn't really "get" it. So he sat me down with a pen and paper, and drew diagrams to explain what was going on. I remember that lightbulb going off in the back of my mind, as I dimly connected the whole idea of types and sizes together - and that was it. I was off and doing bad things to C code.
I eventually saved up enough for an updated 286 motherboard, then an updated graphics card (full VGA!), then a sound blaster card, and finally a 486-DX33 motherboard. He introduced me to his friend Peter (who had, and I believe still has, a rather extensive electronics collection) and handed me a FreeBSD-1.1 CDROM. I took it home, put it in, and .. it didn't do anything. My 486 had a soundblaster pro + CD-ROM, and .. well, FreeBSD-1.1 didn't speak to that hardware. So, I eventually put Slackware Linux 3.0 on the thing, and became a Linux nerd for a bit.
I did eventually try FreeBSD-1.1 on it - after putting a lot of FreeBSD bits on a lot of floppies - but I couldn't figure out what to do when it booted. This is going to sound silly - but the lack of colorls turned me off. I know, it seems silly now, but that's honestly why I went back to Slackware.
I eventually went back to FreeBSD in the 2.x era once I had an IDE CDROM and I was working part time at an ISP after (high) school finished. Yes, I figured out how to get colorls to work, I got in trouble disagreeing with a Michael (O, not M) at iiNet about Squid on Linux versus FreeBSD, and well.. stuff. Here was this 17yo kid disagreeing with things and acting like he knew everything. I'm sure it was endearing.
Fast-forward a couple years, and I had been hacking on FreeBSD here and there. I got in a little erm, "trouble" before I finished high school, which phk reminded me of - when they granted me a commit bit. I forget when this was, but I wouldn't have been much older than 20.
So - this is why mentoring kids is important. It may seem like a waste of time; it may seem like they don't understand, but we were all there once. We wanted someone to relate to, someone to look up to, and something interesting to do. Julian was that person for me, and I owe both him and my mother (of course) pretty much everything about my existence in this silly little computer industry.
(This is also why you don't skimp on hardware support for popular, if cheaper platforms and "shiny" looking features if you want people to adopt your stuff - but that's a different rant.)
Ok, that's done. I'm going back to hacking on VGA/VESA boot loader support for FreeBSD-HEAD. That's long overdue, and I want my pretty splash screen.
So, getting it going was pretty easy:
# pkg install rtl-sdr
Then, using it to test ADSB is pretty easy:
# rtl_adsb -V -S
.. this is verbose and listens to short packets.
Where I live (near San Jose Airport!) I receive a lot of ADSB transmissions. It's quite interesting.
Ok, so next - what about something more GUI like? Someone's already done it - https://github.com/antirez/dump1090 . There's already a package for it:
# pkg install dump1090
# dump1090 --net --aggressive
Then, point a web browser at http://localhost:8080/ and watch!
In my previous post I described how to run libvirt/libxl on the FreeBSD Xen dom0 host. Today we're going a little further and run OpenStack on top of that.
Screenshot showing the Ubuntu guest running on OpenStack on the FreeBSD host.
I'm running a slightly modified OpenStack stable/kilo version. Everything is deployed on two hosts: controller and compute.
The controller host is running FreeBSD -CURRENT. It has the following components:
- MySQL 5.5
- RabbitMQ 3.5
- keystone through apache httpd 2.4 w/ mod_wsgi
Everything here is installed through FreeBSD ports (except glance and keystone) and doesn't require any modifications.
For glance I wrote rc.d scripts to have a convenient way to start it:
(18:19) novel@kloomba:~ %> sudo service glance-api status
glance_api is running as pid 792.
(18:19) novel@kloomba:~ %> sudo service glance-registry status
glance_registry is running as pid 796.
(18:19) novel@kloomba:~ %>
Compute node is running the following:
- libvirt from the git repo
This host is running FreeBSD -CURRENT as well. I also wrote some rc.d scripts for the nova services, except nova-network and nova-compute, because I start those by hand and want to see logs right on the screen.
Nova-network is running in the FlatDHCP mode. For Nova I had to implement a FreeBSD version of the linux_net.LinuxNetInterfaceDriver that's responsible for bridge creation and plugging devices into it. It doesn't support vlans at this point though.
Additionally, I have implemented NoopFirewallManager to be used instead of linux_net.IptablesManager and modified nova to allow specifying which firewall driver to use.
A few more things I modified: I fixed the network.l3.NullL3 class, whose interface was mismatching, and changed virt.libvirt to use the 'phy' driver for disks in the libvirt domain XML.
And of course I had to disable a few things in nova.conf that obviously do not work on FreeBSD.
I hope to put everything together, upload the code to github and create a wiki page documenting the deployment. It's definitely worth noting that things are very very far from being stable. There are some traces here and there, VMs sometimes fail to start, xenlight for some reason could start failing at VM startup etc etc etc. So if you're looking at it as a production tool, you should definitely forget about it; at this point it's just a thing to hack on.
A few months ago FreeBSD Xen dom0 support was announced. There's even a guide available on how to run it: http://wiki.xen.org/wiki/FreeBSD_Dom0.
I will not duplicate stuff described in that document, just suggest that if you're going to try it, it'd probably be better to use the port emulators/xen instead of compiling stuff manually from the git repo. I'll just share some bits that could probably save you some time.
X11 and Xen dom0
I wasn't able to make X11 work under dom0. When I run startx with the x11/nvidia-driver enabled in xorg.conf, the kernel panics. I tried to use the integrated Intel Haswell video, but it's not supported by x11-drivers/xf86-video-intel. It works with x11-drivers/xf86-video-vesa; however, the vesa driver causes a system lock-up on shutdown that triggers fsck on every next boot, which is very annoying. Apparently, this behavior is the same even when not under Xen. I decided to stop wasting my time on trying to fix it and just started using it in headless mode.
You should really not ignore the IOMMU requirement and check if your CPU supports that. If you boot Xen kernel and you don't have IOMMU support, it will fail to boot and you'll have to perform some boot loader tricks to disable Xen to boot your system (i.e. do unload xen and unset xen_kernel). Just google up your CPU name, e.g. 'i5-4690' and follow the link to ark.intel.com. Make sure that it lists VT-d as supported under the 'Advanced Technologies' section. Also, make sure it's enabled in BIOS as well.
At the time of writing (May / June 2015), Xen doesn't work with the UEFI loader.
xl cannot allocate memory
You most likely will have to modify your /etc/login.conf to set memorylocked=unlimited for your login class, otherwise the xl tool will fail with some 'cannot allocate memory' error.
It's very good that Xen provides the libxl toolkit. It should have been installed when you installed the emulators/xen port as a dependency. The actual port that installs it is sysutils/xen-tools. As the libvirt Xen driver supports libxl, there's not so much work required to make it work on FreeBSD. I made only a minor change to disable some Linux specific /proc checks inside libvirt to make it work on FreeBSD and pushed that to the 'master' branch of libvirt today.
If you want to test it, you'd need to check out the libvirt source code using git:
git clone git://libvirt.org/libvirt.git
and then run ./bootstrap. It will tell you if it needs something that's not installed.
For my libxl test setup I configure libvirt this way:
./configure --without-polkit --with-libxl --without-xen --without-vmware --without-esx --without-bhyve CC=gcc48 CFLAGS=-I/usr/local/include LIBS=-L/usr/local/lib
The only really important part here is the '--with-libxl', other flags are more or less specific to my setup. After configure just run gmake and it should build fine. Now you can install everything and run the libvirtd daemon.
If everything went fine, you should be able to connect to it using:
virsh -c "xen://"
Now we can define some domains. Let's check these two examples:
The first one is for a simple pre-configured FreeBSD guest image. The second one defines CDROM device and hard disk devices. It's set to boot from CDROM to be able to install Linux. Both domains are configured to attach to the default libvirt network on the virbr0 bridge. Additionally, both domains support VNC.
You can get a domain's VNC display number using the vncdisplay command in virsh and then connect to the VM with your favorite VNC client.
I've been using this setup for a couple of days and it works fine. However, more testers are welcome, if you're using it and have some issues please drop me an email to novel@`uname -s`.org or poke me on twitter.
It's pretty simple in concept - I take FreeBSD-HEAD, build it with some cut-down options, create a custom filesystem image with some custom boot scripts and a custom configuration file, and provide an image that you can TFTP (using a serial console and ethernet cable) or upload directly to the AP if it supports it.
The supported hardware list is here:
Now, it's not a huge list like OpenWRT, but that's mostly because I don't have an infinite supply of Atheros MIPS based routers. I think I'll get some of the TP-Link Archer series stuff next.
Building it is pretty simple:
You checkout the build repo, check out FreeBSD-HEAD, install a couple of packages, and run the build for your board. Once it's done, the images for your board appear in ../tftpboot/. There's a wiki page for each of the supported boards with a walkthrough with how to get FreeBSD going on it.
It comes up on 192.168.1.20/24 with 'user' and 'root' users, with no password. So, the first thing you should do after installation is telnet in, configure /etc/cfg/rc.conf with your actual LAN IPs, set the user/root passwords, and then 'cfg_save' to save things. Then, reboot and voila!
The configuration file format looks like FreeBSD but it isn't. I'm keeping it somewhat hierarchical-looking in naming but flat in implementation so I can migrate it to something like a sqlite or luci backend in the future.
It's good enough for me to be able to set up an AP to be a bridge with a management IP address and configure the ethernet switch. Others have added ipfw support to do NAT and firewalling - I'm going to add configuration rules for NAT, IPFW and routing soon so it's all integrated.
It's FreeBSD, all the way through:
$ uname -a
FreeBSD tl-wdr3600 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r282406M: Wed May 6 22:27:16 PDT 2015 adrian@lucy-11i386:/usr/home/adrian/work/freebsd/head-embedded/obj/mips/mips.mips/usr/home/adrian/work/freebsd/head-embedded/src/sys/TL-WDR4300 mips
$ ifconfig wlan0 list sta
ADDR AID CHAN RATE RSSI IDLE TXSEQ RXSEQ CAPS FLAG
18:ee:69:15:f4:12 2 1 26M 37.0 45 2703 51888 EPS AQEHTRM RSN HTCAP WME
04:e5:36:0d:1b:0d 1 1 19M 23.0 15 1524 47072 EPS AQEPHTR RSN HTCAP WME
cc:3a:61:0e:33:a0 3 1 19M 32.0 30 2585 43072 EPS AQEPHTR RSN HTCAP WME
40:0e:85:1a:f1:69 4 1 19M 25.0 30 1138 54800 EPS AQEPHTR RSN HTCAP WME
00:0f:13:97:14:54 5 1 54M 30.0 45 1808 57424 EPS AE RSN
00:22:fa:c2:d1:20 6 1 26M 24.5 0 574 57776 EPS AQEHTRS RSN HTCAP WME
So if you'd like a FreeBSD based device to act as your home gateway, this is where you can start. It's not pfsense, but it's designed to run on things much smaller than pfsense supports and it's a good introduction into the world of FreeBSD embedded.
For background: http://7-cpu.com/cpu/SandyBridge.html .
But! Intel has this magical thing called DDIO. In theory (and there's a lot of theory here), DMA is done via a small (~10%) fraction of LLC (L3) cache, which is shared between all cores. If the data is already in cache when the CPU accesses it, it will be quick. Also, if you then wish to DMA out data from something in cache, it doesn't have to get flushed to memory first - it's just DMAed straight out of cache.
However! When I was doing packet bridge testing (using netmap + bridge, 64 byte payloads), I noticed that I was using a significant amount of memory bandwidth. It wasn't quite at the rate of 10G worth of bridged data, but DDIO should be doing almost all of that work for me at 64 byte payloads.
So, to reproduce: run netmap bridge (eg 'bridge -i netmap:ix0 -i netmap:ix1') and run pkt-gen between two nodes.
This is the output of 'pcm-memory.x 1' from the intel-pcm toolkit (which is available as a binary package on FreeBSD.)
-- System Read Throughput(MB/s): 300.68 --
-- System Write Throughput(MB/s): 970.81 --
-- System Memory Throughput(MB/s): 1271.48 --
The first theory - the bridging isn't occurring fast enough to service what's in LLC before it gets flushed out by other packets. So, assume:
- It's 1/10th of the LLC - which is 1/10th of an 8 core * 2.5MB per core setup, is ~ 2MB.
- 64 byte payloads are being cached.
- Perfect (!) LLC use.
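To put rough numbers on those assumptions, here is a back-of-the-envelope calculation. The 8 cores, 2.5MB per core, 1/10th share and 64 byte size come from the list above; treating each packet as exactly one 64 byte cache line and using the 10GbE minimum-frame line rate of about 14.88 million pps are my assumptions.

```python
CORES = 8
LLC_PER_CORE = 2560 * 1024   # 2.5MB per core, in bytes
PACKET_BYTES = 64            # one cache line per packet (assumed)

# "1/10th of the LLC" is what DDIO is assumed to use for DMA.
ddio_bytes = CORES * LLC_PER_CORE // 10
packets_in_flight = ddio_bytes // PACKET_BYTES

# Assumption: 10GbE line rate for minimum-size frames, ~14.88 million pps.
LINE_RATE_PPS = 14.88e6
window_ms = packets_in_flight / LINE_RATE_PPS * 1000

print(ddio_bytes)                      # bytes of LLC available to DDIO
print(packets_in_flight)               # packets that fit before eviction starts
print("%.2f ms" % window_ms)           # time to fill that space at line rate
```

Under these assumptions the bridge has on the order of a couple of milliseconds to touch a packet before newer packets start pushing it out of the DDIO share of the cache.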
-- System Read Throughput(MB/s): 104.92 --
-- System Write Throughput(MB/s): 382.32 --
-- System Memory Throughput(MB/s): 487.24 --
- It does batch receive from netmap;
- but it then looks at the ethernet header to decap that;
- then it gets the IPv4 src/dst addresses;
- .. and looks them up in a (very large) traditional hash table.
- Bridges about 6.5 million pps;
- .. maxes out the CPU core;
- Memory access: 1000MB/sec read; 423MB/sec write (~1400MB/sec in total).
- Bridges around 10 million pps;
- 98% of a CPU core;
- Memory access: 125MB/sec read, 32MB/sec write, ~ 153MB/sec in total.
- Pull in up to 1024 entries from the netmap receive ring;
- Loop through, up to 16 at a time, and place them in a batch
- For each packet in a batch do:
- For each packet in the batch: optional prefetch on the ethernet header
- For each packet in the batch: decapsulate ethernet/IP header;
- For each packet in the batch: optional prefetch on the hash table bucket head;
- For each packet in the batch: do hash table lookup, decide whether to forward/block
- For each packet in the batch: forward (ie, ignore the forward/block for now.)
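The loop structure above can be sketched as follows. This is Python pseudocode for the control flow only: prefetching has no Python equivalent, so those stages are marked as comments, and the packet shape (a dict with 'src' and 'dst'), the dict used as the hash table and the function name are all assumptions for illustration, not the netmap bridge code.

```python
def process_ring(ring, table, batch_size=16, max_pull=1024):
    """Run the staged per-batch loop over one pull from the receive ring.

    ring  -- list of packets, each a dict with 'src' and 'dst' keys (assumed)
    table -- dict mapping (src, dst) to True (forward) or False (block)
    Returns a list of (packet, verdict) pairs for everything forwarded.
    """
    pulled = ring[:max_pull]
    forwarded = []
    for start in range(0, len(pulled), batch_size):
        batch = pulled[start:start + batch_size]
        # Stage 1: optional prefetch of the ethernet headers would go here.
        # Stage 2: decapsulate every packet in the batch.
        keys = [(pkt["src"], pkt["dst"]) for pkt in batch]
        # Stage 3: optional prefetch of the hash bucket heads would go here.
        # Stage 4: look up every key and record the verdict.
        verdicts = [table.get(key, False) for key in keys]
        # Stage 5: forward everything, ignoring the verdict for now.
        forwarded.extend(zip(batch, verdicts))
    return forwarded


ring = [{"src": i, "dst": i + 1} for i in range(40)]
out = process_ring(ring, {(0, 1): True})
print(len(out), out[0][1], out[1][1])
```

The point of running each stage across the whole batch, rather than all stages per packet, is that a prefetch issued in one stage has time to complete before the next stage touches the same data.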
- Batch size of 1: 10 million pps;
- Batch size of 2: 11.1 million pps;
- Batch size of 4: 11.7 million pps.
- Batch size of 1: 10 million pps;
- Batch size of 2: 10.8 million pps;
- Batch size of 4: 11.5 million pps.
- Batch size of 1: 3.7 million pps;
- Batch size of 2: 4.5 million pps;
- Batch size of 4: 4.8 million pps.
- Batch size of 1: 5 million pps;
- Batch size of 2: 5.6 million pps;
- Batch size of 4: 5.6 million pps.
- Batch size of 4, ethernet prefetching: 5.5 million pps
- Batch size of 4, hash bucket prefetching: 7.7 million pps
- Batch size of 4, ethernet + hash bucket prefetching: 7.5 million pps
- Batch size of 1, no prefetching: 6.1 million pps;
- Batch size of 2, no prefetching: 7.1 million pps;
- Batch size of 4, no prefetching: 7.1 million pps;
- Batch size of 4, hash bucket prefetching: 8.9 million pps.
- 1 thread: 8.9 million pps;
- 4 threads: 12 million pps.
- 1 thread: 7 million pps;
- 4 threads: 4.7 million pps.
There are three modes:
- default - all ports are in the same VLAN;
- per-port - each port can be in a VLAN 'group';
- dot1q - each port can be in multiple VLAN groups, with 802.1q tagging going on.
The dot1q VLAN is for switches that support multiple VLANs, each can have an arbitrary VLAN ID (0..4095) with optional other VLAN options (like tag-in-tag support.)
The etherswitch configuration side has a few options and they're supported by different hardware:
- Each port has a port VLAN ID - this is the "native port" for dot1q support. I don't think it has any particular meaning in the per-port VLAN code in arswitch but I could be terribly wrong. I thought it did when I initially did the port, but the documentation is .. lacking.
- Then there's a set of per-port flags - eg q-in-q, 802.1q tagging, etc.
- Then there's the vlangroup - each vlangroup has a vlan ID, and then a set of port members. Each port member can be tagged or untagged.
Firstly - the AR934x SoC switch support doesn't include VLANs. I need to add that. I'm not sure which side of the wall this falls.
The switches previous to the AR8327 support per-port and VLAN configuration, but they don't support per-port-per-VLAN tagging. Ie, you can configure 802.1q VLANs, and you can enable tagging on the port - but it tags all packets that aren't the port 'VLAN ID'.
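Written out as a rule, the behaviour described for the pre-AR8327 switches looks something like this. It is a sketch of my reading of the paragraph above, not of the arswitch driver; the function and parameter names are hypothetical.

```python
def egress_is_tagged(port_tagging_enabled, port_vlan_id, frame_vlan_id):
    """Decide whether a frame leaves the port with an 802.1q tag.

    On the older switches tagging is a per-port switch, not per-port-per-VLAN:
    once enabled, every frame is tagged except those in the port's own VLAN ID.
    """
    if not port_tagging_enabled:
        return False
    return frame_vlan_id != port_vlan_id


print(egress_is_tagged(True, port_vlan_id=1, frame_vlan_id=1))   # untagged
print(egress_is_tagged(True, port_vlan_id=1, frame_vlan_id=10))  # tagged
```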
The per-port VLAN ID seems ignored by the arswitch code - it's only used by the dot1q support.
So I think (and it hasn't yet been tested) that on the earlier switches, I can use per-port VLANs with tagging by:
- Configuring per port vlans - "etherswitch config vlan_mode port"
- Adding vlangroups as appropriate with membership - tag/untag doesn't matter
- Set the CPU port up to have tagging - "etherswitch port0 addtag"
But on the AR8327, the VLAN map hardware actually supports enabling/disabling tagging on a per-port-per-VLAN basis. Ie, when the VLAN table is programmed with the port membership, it takes a list of both the ports and whether the ports are tagged/untagged/open/filtered. So, I don't think per-port VLAN tagging works - only dot1q tagging. Maybe I can make it work, but I haven't really sat down for long enough with the documentation to see what combinations are required.
- Configure the hardware - "etherswitch config vlan_mode dot1q"
- Add vlangroups as appropriate, set pvid as appropriate
- For each vlangroup membership, the port can be tagged or untagged - eg to tag the cpu port 0, you'd use '0t' as the port member. That says "port0 is a member, and it's tagged."
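As an illustration of the '0t' notation, here is a small parser for a member list. Only the '0t' form is taken from the text; the comma-separated list format and the function name are assumptions, so check the etherswitch tooling before relying on the exact syntax.

```python
def parse_members(spec):
    """Parse a member list such as '0t,1,2' into {port: tagged}.

    A trailing 't' marks the port as a tagged member of the vlangroup;
    a bare number is an untagged member.
    """
    members = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        tagged = item.endswith("t")
        port = int(item[:-1] if tagged else item)
        members[port] = tagged
    return members


# CPU port 0 tagged, ports 1 and 2 untagged.
print(parse_members("0t,1,2"))
```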
The TL;DR is this - there's some hardware inside the Intel CPUs that tracks memory ordering and cache contents - but they don't use all the address bits.
The relevant chapter in the intel optimisation guide is 3.6.8 - Capacity Limits and Aliasing in Caches. The specific thing I was hitting was in its subsection on Store Forwarding Aliasing.
Assembly/Compiler Coding Rule 56. (H impact, M generality) Avoid having a store followed by a non-dependent load with addresses that differ by a multiple of 4 KBytes. Also, lay out data or order computation to avoid having cache lines that have linear addresses that are a multiple of 64 KBytes apart in the same working set. Avoid having more than 4 cache lines that are some multiple of 2 KBytes apart in the same first-level cache working set, and avoid having more than 8 cache lines that are some multiple of 4 KBytes apart in the same first-level cache working set.
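The rule is easier to reason about when written as address arithmetic. The helpers below are a sketch for checking a set of addresses against the strides named in the rule; they are not a model of the CPU, and the function names are mine.

```python
from collections import Counter


def differ_by_multiple_of(addr_a, addr_b, stride):
    """True if two distinct addresses are a whole multiple of stride apart."""
    return addr_a != addr_b and (addr_a - addr_b) % stride == 0


def store_load_may_alias(store_addr, load_addr):
    """First part of the rule: a store and a load 4 KBytes (or a multiple) apart."""
    return differ_by_multiple_of(store_addr, load_addr, 4 * 1024)


def largest_conflict_group(line_addrs, stride):
    """Size of the largest group of addresses that are multiples of stride apart."""
    if not line_addrs:
        return 0
    return max(Counter(addr % stride for addr in line_addrs).values())


# Page-aligned buffers: the same offset in each one is a multiple of 4 KBytes apart.
print(store_load_may_alias(0x1000, 0x3000))
# Five lines at a 2 KByte stride, more than the 4 the rule allows.
print(largest_conflict_group([0, 2048, 4096, 6144, 8192, 64], 2048))
```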
So, given this, what can be done? In this workload, a bunch of large matrices were allocated via jemalloc, which page aligns large allocations. In the default invocation of the benchmark (where the allocation padding size is 0), the memory access patterns showed a very large number of counter events on "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS" - which is the number of 64k address aliases on the Sandy Bridge Xeon processors I've been testing on. (The same occurs on Westmere, Ivy Bridge and Haswell.) As I vary the padding size, the address aliasing value drops, the memory access counters increase, and the general performance increases.
On the test boxes I have (running pmcstat -w 120 -C -p LD_BLOCKS_PARTIAL.ADDRESS_ALIAS ./himenobmtxpa M )
padding (bytes)   ADDRESS_ALIAS events   benchmark result
0                 217799413              830.995025
64                18138386               1624.296713
96                8876469                1662.486298
128               19281984               1645.370750
192               18247069               1643.119908
256               18511952               1661.426341
320               19636951               1674.154119
352               19716236               1686.694053
384               19684863               1681.110499
448               18189029               1683.163673
512               19380987               1691.937818
So there's still plenty of aliasing going on at different padding offsets, however it's a very marked drop between 0 and, well, anything.
It turns out that someone's gone and done a bunch more digging into the effects of various CPU magic under the hood. The last paper in the list (Analysing Contextual Bias..) looks at Aliasing and Cache Effects and the effect of memory layout. There's some cute (and sobering!) analysis of the performance changes due to something as simple as the length of your login name in the UNIX environment. It's worth reading.
The summary? Maybe page alignment of all of your memory accesses isn't the way to go.
For further reading:
- Intel Architecture Optimisation Manual: http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html
- Intel: Adjusting Thread Stack Address To Improve Performance on Intel Xeon Processors: https://software.intel.com/en-us/articles/adjusting-thread-stack-address-to-improve-performance-on-intel-xeonr-processors/
- Analysing Contextual Bias of Program Execution on Modern CPUs: http://daim.idi.ntnu.no/masteroppgaver/009/9231/masteroppgave.pdf
When the same benchmark is run on FreeBSD/DragonFly BSD using the Linux layer (i.e., a Linux binary compiled for Linux, but run on BSD), it gives the same or better behaviour.
Some digging was done, and it turned out to be due to memory allocation patterns and memory layout. The jemalloc library allocates large chunks at page-aligned boundaries, whereas the allocator in glibc under Linux does not.
I've put the code online in the hope that others can test and verify this:
The branch 'local/freebsd' has my local change to allow the allocator offset to be specified. The offset compounds on each allocation - so with an 'n' byte offset, the first allocation is 0 bytes offset from the page boundary, the next is 'n' bytes offset from the page boundary, the next is '2n' bytes offset, etc.
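As a rough illustration of how the offset compounds, the sketch below prints the offset from the page boundary for the first few allocations with a 127 byte pad. The function name alloc_offset is hypothetical, and it only models the arithmetic described above, not the actual allocator change (in particular, it says nothing about what happens once the offset exceeds a page).

```sh
#!/bin/sh
# Sketch: offset of the k-th large allocation (k starts at 0) for a pad of n bytes.
# Only the "0, n, 2n, ..." progression from the text is modelled here.
alloc_offset() {
    pad=$1
    index=$2
    echo $(( pad * index ))
}

for k in 0 1 2 3; do
    printf 'allocation %d: %d bytes past the page boundary\n' "$k" "$(alloc_offset 127 "$k")"
done
```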
You can experiment with different values and get completely different behavioural results. It's non-trivial: there's a 100% speedup by using a 127 byte offset for each allocation, versus a 0 byte offset.
I'd like to investigate cache line aliasing effects further. There was work done a few years ago to offset mbuf headers in the FreeBSD kernel so they weren't all page-aligned or 256/512/1024 byte aligned - and apparently this gave a significant performance improvement. But it wasn't folded into FreeBSD. What I'd like to do is come up with some better strategies / profiling guides for identifying when this is actually happening so the underlying objects being accessed can be adjusted.
So - if anyone out there has any tips, hints or suggestions on how to do this, please let me know. I'd like to document and automate this testing.
After a bit of wrangling of hardware logistics and with the FreeBSD Foundation purchasing a box, a Tyan POWER8 evaluation server appeared. Nathan Whitehorn started poking at it and managed to get a basic "hello world" going, but stalled on issues with the Linux KVM virtualisation environment.
Fast forward a few weeks - he's figured out the KVM issues, their lack of support for some mandated hypervisor APIs and other bugs - FreeBSD now boots inside of the hypervisor environment and seems stable enough to do development on.
He then found the existing powerpc pmap (physical memory management) code wasn't very SMP friendly - it works fine on one and two CPU powerpc machines, but this POWER8 evaluation board is a 4-core, 32-thread CPU. So a few days of development went by and he rewrote most of the pmap code to use much finer-grained locking and to scale much, much better than the existing code. (He also found the PS3 hypervisor layer isn't thread-safe.)
What's been done thus far?
- FreeBSD boots inside the hypervisor environment;
- Virtualised console, networking and storage all work;
- (in progress) new, scalable pmap implementation;
- Initial support for the Vector-Scalar Extension (VSX) that's found on POWER7 and POWER8.
Now I kind of want some larger POWER8 hardware.
I've just brought up FreeBSD's TDMA support on the AR9380 chipset. Specifically, the AR9331, since I have a Carambola 2 on me today.
It was pretty simple to bring up - I was missing the beacon configuration HAL call that the TDMA code expected. It's only used by the TDMA code - the STA and AP modes rely on the normal HAL beacon methods that date back to the Atheros HAL.
The only problem - it seems something is up with ANI (noise immunity) and sensitivity on at least the AR9331. It doesn't seem to behave well on slightly loaded channels and thus the beacons don't always go out when they're supposed to.
But, if you've been wanting to play with TDMA on the later Atheros chips, now you can!
We did, but on an irregular basis, and we only paid attention to the very critical reports, not to all of them.
That is now fixed. I relaunched a few scans via Coverity and I'm happy to say that the latest scan on master claims 0 defects!
Meaning that all known defects have been fixed.
I was also planning to use lint(1); unfortunately, lint on FreeBSD does not support C99...
If I'm brave enough I may synchronise lint(1) with NetBSD which seems to have added C99 support to that tool. Or maybe someone will volunteer to do it? :)
I usually build my own packages with poudriere, but it’s not fun to do on tiny boxes, so I just do ‘pkg install’ on them and use upstream packages. One downside is that the packages are built with default options. I recently ran into a situation where I wanted to change some options for just a single port.
Now, what is the minimal set of things in /usr/ports/ that I need to check out to be able to config/build just one port?
Turns out to be:
And the port I want to actually build. Now I can ‘make config’, change the options and build/install that port without checking out the entire ports tree.
Recently I moved a server into a proper cabinet with doors. After a few days I noticed the fans were spinning up and down. So I started investigating ways to monitor the fan speed. I figured having a graph of them long term would give me a nice way to show changes in the environment, beyond the temperature monitoring I am already doing.
I was not having much luck searching the Internet. Luckily, Darius on IRC pointed me to a project called bsdhwmon by Jeremy Chadwick, a fellow FreeBSD Developer. The server is running an older Supermicro X7SBi motherboard with a Winbond 83627HG chip which is listed on the supported page of bsdhwmon.
It was easy to set up:
- Install bsdhwmon:
pkg install bsdhwmon
- Load the SMBus Controller driver for my motherboard:
- Load the Generic SMB I/O Device driver:
All I had to do from that point was run bsdhwmon:
CPU1 Temperature 46 C
System Temperature 29 C
FAN1 10975 RPM
FAN2 11344 RPM
FAN3 7219 RPM
FAN4 7068 RPM
FAN5 0 RPM
FAN6 11065 RPM
VcoreA 1.122 V
MCH Core 1.508 V
-12V -12.672 V
V_DIMM 1.808 V
+3.3V 3.296 V
+12V 11.904 V
5Vsb 5.046 V
5VDD 4.998 V
P_VTT 1.228 V
Vbat 3.312 V
It is important to remember to add the kernel modules to be loaded at boot. Adding the following to /boot/loader.conf will take care of that:
ichsmb will load smbus, but not the smb kernel driver.
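Going by the driver names mentioned here (ichsmb and smb), the /boot/loader.conf entries would presumably be the usual module_load lines. This is inferred from those names rather than copied from the original configuration, so check it against your own hardware:

```sh
# /boot/loader.conf (inferred from the module names in the text)
ichsmb_load="YES"
smb_load="YES"
```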
Now that I have the tools, I can monitor it at will.
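For the long-term graphing idea, one possible starting point is to filter the fan lines out of the output and tag them with a timestamp. This is only a sketch: fan_rpms is a made-up helper name, and it assumes the output keeps the "FANn value RPM" layout shown above.

```sh
#!/bin/sh
# Sketch: pull the fan readings out of bsdhwmon-style output on stdin.
# Prints "<timestamp> <fan> <rpm>", one line per fan, ready to append to a log.
fan_rpms() {
    stamp=$1
    awk -v stamp="$stamp" '$1 ~ /^FAN[0-9]+$/ && $3 == "RPM" { print stamp, $1, $2 }'
}

# Example with a few lines in the format shown above; on the real server this
# would be something like: bsdhwmon | fan_rpms "$(date +%s)"
fan_rpms 0 <<'EOF'
CPU1 Temperature 46 C
FAN1 10975 RPM
FAN5 0 RPM
EOF
```

Run from cron and appended to a file, the result can be fed to whatever graphing tool is already in use for the temperature monitoring.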
Build SD card image using crochet-freebsd with option VideoCore enabled. Mount either the SD card itself or the image on the build host
mount /dev/mmcsd0s2a /pi
Checkout Qt5 sources and patch them
cd /src
git clone git://gitorious.org/qt/qt5.git qt5
cd qt5
git checkout 5.4.0
MODULES=qtbase,qtdeclarative,qtgraphicaleffects,qtimageformats,qtquick1,qtquickcontrols,qtscript,qtsvg,qtxmlpatterns
./init-repository --module-subset=$MODULES
fetch -q -o - http://people.freebsd.org/~gonzo/arm/rpi/qt5-freebsd-pi.diff | patch -p1
Configure, build and install Qt5 to SD card
./configure -platform unsupported/freebsd-clang -no-openssl \
    -opengl es2 -device freebsd-rasp-pi-clang \
    -device-option CROSS_COMPILE=/usr/armv6-freebsd/usr/bin/ \
    -sysroot /pi/ -no-gcc-sysroot -opensource -confirm-license \
    -optimized-qmake -release -prefix /usr/local/Qt5 -no-pch \
    -nomake tests -nomake examples -plugin-sql-sqlite
gmake -j `sysctl -n hw.ncpu`
sudo gmake install
You need BSD-specific plugins to enable mouse and keyboard input in EGLFS mode
cd /src/
git clone https://github.com/gonzoua/qt5-bsd-input.git
cd qt5-bsd-input
/src/qt5/qtbase/bin/qmake
gmake
sudo gmake install
Build the application you’d like to run and install it. I use one of the examples here
cd /src/qt5/qtbase/examples/opengl/cube
/src/qt5/qtbase/bin/qmake
gmake
sudo gmake install
Unmount SD card, boot Pi, make sure vchiq is loaded
root@raspberry-pi:~ # kldload
root@raspberry-pi:~ # /usr/local/Qt5/examples/opengl/cube/cube -plugin bsdkeyboard -plugin bsdsysmouse
If you see something like this:
EGL Error : Could not create the egl surface: error = 0x3003
QOpenGLFramebufferObject: Framebuffer incomplete attachment.
It means you need to increase GPU memory by setting it in config.txt. The amount depends on the framebuffer resolution. 128 MB works for me on a 1920×1080 display.
The bsdsysmouse plugin uses /dev/sysmouse by default, so you should either have moused running or specify the actual mouse device, e.g.:
root@raspberry-pi:~ # cube -plugin bsdkeyboard -plugin bsdsysmouse:/dev/ums0
bsdkeyboard uses STDIN as its input device, so if you’re trying to start the app from a serial console it should be something like this:
root@raspberry-pi:~ # cube -plugin bsdkeyboard -plugin bsdsysmouse < /dev/ttyv0
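The mouse device choice can be wrapped in a small helper. The sketch below is an illustration only: mouse_plugin is a made-up name, /dev/ums0 is just the example device from above, and the script prints the resulting command line instead of running it.

```sh
#!/bin/sh
# Sketch: build the mouse -plugin argument for the cube example.
# Uses the given mouse device if it exists, otherwise falls back to the
# plugin default (/dev/sysmouse, which needs moused running).
mouse_plugin() {
    dev=$1
    if [ -n "$dev" ] && [ -e "$dev" ]; then
        echo "bsdsysmouse:$dev"
    else
        echo "bsdsysmouse"
    fi
}

# Print the command line that would be used on the Pi.
echo cube -plugin bsdkeyboard -plugin "$(mouse_plugin /dev/ums0)"
```

When starting from a serial console, the STDIN redirection from /dev/ttyv0 still has to be added to the printed command.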