Category Archives: 802.11

Porting a wifi driver from openbsd – AR9170

I told myself a long, long time ago that I really don't want to be working on USB wireless. It's not that I dislike USB or wireless; it's just that the hoops required to get it all working in a stable way are quite a lot to keep in your head. But, I decided recently that it's about time I learnt how it worked, and I was very sad that we still didn't have any working USB wifi devices that also operated with 802.11n.

So, I picked a NIC and dove in.

I picked if_rsu(4) - it's the RTL8188SU / RTL8192SU series hardware from Realtek. It turned out I chose reasonably well.

First off - it's a "fullmac" device - meaning that outside of a handful of things, the device firmware offloads a lot of the 802.11 complications. The driver does hardware initialisation and the wireless stack speaks WPA/WPA2/etc for negotiating encryption, but the hardware handles scanning, authentication, 802.11n aggregation negotiation and most management frame work.

Secondly - it's ported from OpenBSD. The OpenBSD folk do a good job of getting drivers up and running, but there tend to be some sharp edges and the 802.11n bits just don't work.

So, besides currently doing encryption in software, the rsu(4) driver behaves rather well. I'll write a separate article about that. This article is about the AR9170, or otus(4) driver in FreeBSD/OpenBSD parlance.

Now, the AR9170 is a ZyDAS device with an Atheros 802.11n PHY and radio. It's quite a hybrid beast. It's also buggy - there are issues with QoS frames and 802.11n aggregation that make it impossible to behave well. So, for now I'm treating it like a 11abg device and I'll worry about 802.11n when someone gives me patches to make it work.

The OpenBSD driver is based on the initial otus driver that Atheros provided to the Linux developers circa 2009. The firmware blob is closed and very old - the ar9170fw project is still out there on the internet (and I have a mirror of it), but I can't get it to build on a recent FreeBSD install, so a firmware update will take time. But, it does seem to work.

There are a few pieces to think about when porting a USB driver. The biggest piece is that it's not memory mapped IO or IO port based - everything is a message. There are USB device control commands you can send which will sleep until they're done, but the majority of stuff is done using bulk transmit and receive endpoints and that's all conveniently asynchronous. But it complicates things in the driver world.

Memory mapped and IO port drivers treat device IO as this magical "I do it, then the next instruction executes when it's done" mostly serialised paradigm. It's a lie, of course - the Intel x86 CPUs will pretend things are occurring in a specific order, but a lot of platforms require you to mark memory as uncached or use memory / cache flush operations to ensure things go out to the device in any particularly controlled manner. But USB doesn't - outside of USB control transfers, USB devices tend to look like remote network devices, and this includes register accesses. Now, the RTL8188SU driver (rsu(4)) implements the firmware upload and register accesses using control transfers, so it's all pretty easy to get the driver initialisation and attaching working before you care about the asynchronous parts. But the AR9170 driver implements register accesses as firmware commands - and so I have to get a lot more of the stack up and working first.

So, here's what I did.

First up - I commented out almost all of the device driver, and focused on getting the probe, attach and detach methods working. That wasn't too hard. But yes, almost all the code was commented out.

Next up was firmware loading. This was done using control transfers, so I didn't have to worry about implementing the bulk transmit and receive endpoint handling. I had to convert the firmware load path to the FreeBSD firmware API rather than the OpenBSD API, but that was mostly trivial.
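
For the curious, the FreeBSD side of that looks roughly like this - a minimal sketch using the firmware(9) API, where the image name "otusfw" and the otus_load_firmware() helper are illustrative rather than the driver's actual names:

    const struct firmware *fw;
    int error;

    /* firmware_get() finds a registered image, loading its module if needed. */
    fw = firmware_get("otusfw");
    if (fw == NULL)
        return (ENOENT);

    /* fw->data / fw->datasize describe the blob; upload it in chunks
     * via USB control transfers. */
    error = otus_load_firmware(sc, fw->data, fw->datasize);

    /* Drop the reference; FIRMWARE_UNLOAD lets the firmware module unload. */
    firmware_put(fw, FIRMWARE_UNLOAD);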

Then I realised I wasn't doing any driver locking - so yes, I ensured I did the bare minimum of driver locking required to stop the kernel panicking. OpenBSD doesn't use fine-grained locks; it uses old-style BSD spl() levels.

Next up was command transmit and receive. Now, I needed to setup the USB endpoints - which FreeBSD makes really easy to do using a structure to define what endpoints are what. It was pretty clean. The complicated bit is the bulk callback - it handles transfer statuses and transfer initiation. This is the bit that took me a little time to wrap my head around.
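
To give a feel for it, here's the shape of that structure - a sketch only, with endpoint indices, buffer sizes and callback names that are illustrative rather than copied from the driver:

static const struct usb_config otus_config[OTUS_N_XFER] = {
    [OTUS_BULK_TX] = {
        .type = UE_BULK,
        .endpoint = UE_ADDR_ANY,
        .direction = UE_DIR_OUT,
        .bufsize = 512,
        .flags = { .pipe_bof = 1, .force_short_xfer = 1, },
        .callback = otus_bulk_tx_callback,
        .timeout = 5000,    /* ms */
    },
    [OTUS_BULK_RX] = {
        .type = UE_BULK,
        .endpoint = UE_ADDR_ANY,
        .direction = UE_DIR_IN,
        .bufsize = MCLBYTES,
        .flags = { .pipe_bof = 1, .short_xfer_ok = 1, },
        .callback = otus_bulk_rx_callback,
    },
};

/* Then one call wires the whole lot up against the driver mutex: */
error = usbd_transfer_setup(uaa->device, &iface_index, sc->sc_xfer,
    otus_config, OTUS_N_XFER, sc, &sc->sc_mtx);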

The USB stuff handles things in-sequence. Everything going to an endpoint here gets handled in the sequence you queue it. It also will process the bulk callback in a single worker thread taskqueue, rather than the driver author having to worry about creating their own worker threads. So, this is what you end up doing:

  • The bulk callback has three states: USB_ST_TRANSFERRED, USB_ST_SETUP, and everything else (error.)
  • USB_ST_TRANSFERRED says "I've finished a transfer".
  • USB_ST_SETUP says "I've been asked to initiate a transfer."
  • Any driver thread starts a transmit by calling usbd_transfer_start() on the usb_xfer struct, which will kick off a call into the bulk callback with USB_ST_SETUP.
  • So, the driver has to maintain its own queues of "pending", "active" and "waiting" transactions. "pending" is the queue to put outbound transmit messages on. "active" is the queue you put messages that you've submitted when USB_ST_SETUP is called. When USB_ST_TRANSFERRED or an error is called, you pop off the top entry from "active" and you finish with it, then you fall through to USB_ST_SETUP to start a new transfer.
It's a little complicated because you have to maintain your own submission queues in/out of the USB stack, but in practice it's just a linked list.
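
Put together, the transmit-side callback ends up shaped something like this (a sketch - the queue and helper names are illustrative, but the state machine is the important part):

static void
otus_bulk_tx_callback(struct usb_xfer *xfer, usb_error_t error)
{
    struct otus_softc *sc = usbd_xfer_softc(xfer);
    struct otus_data *data;

    switch (USB_GET_STATE(xfer)) {
    case USB_ST_TRANSFERRED:
        /* A transfer finished: pop it off "active" and complete it. */
        data = STAILQ_FIRST(&sc->sc_tx_active);
        if (data != NULL) {
            STAILQ_REMOVE_HEAD(&sc->sc_tx_active, next);
            otus_txeof(sc, data);
        }
        /* FALLTHROUGH - see if there's more to send. */
    case USB_ST_SETUP:
tr_setup:
        /* Start the next pending transfer, if any. */
        data = STAILQ_FIRST(&sc->sc_tx_pending);
        if (data == NULL)
            break;
        STAILQ_REMOVE_HEAD(&sc->sc_tx_pending, next);
        STAILQ_INSERT_TAIL(&sc->sc_tx_active, data, next);
        usbd_xfer_set_frame_data(xfer, 0, data->buf, data->buflen);
        usbd_transfer_submit(xfer);
        break;
    default:
        /* Error: complete the active entry, clear the stall, try again. */
        data = STAILQ_FIRST(&sc->sc_tx_active);
        if (data != NULL) {
            STAILQ_REMOVE_HEAD(&sc->sc_tx_active, next);
            otus_txeof(sc, data);
        }
        if (error != USB_ERR_CANCELLED) {
            usbd_xfer_set_stall(xfer);
            goto tr_setup;
        }
        break;
    }
}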

So, I stole the framework from rsu(4) for buffer management, transmit submission and completion. It submitted things fine. I also registered buffers for receive, and .. nothing happened. I would send a PING message to the firmware to see if it was awake, and I'd get nothing from the receive pipe.

Then I remembered an interesting bug from when I tried this in 2012 - the AR9170 firmware required the IRQ endpoint to be setup, even though no interrupt messages were ever posted. So, I set the endpoint up, started reception on it.. and now I started to see receive messages. My PING messages were being PONG'ed.

But here's the first complication - although everything is asynchronous here, a lot of places want to send a command and wait for a response. For the PING command it's waiting for a matching PONG response. For setting frequency, starting calibration, etc, you get back interesting status from the firmware. But for things like register read commands, you have to wait until you get the register value back before you can continue. We need to be able to put the caller to sleep until the response comes back, or some timeout occurs.

So, cmd_otus() submits a transfer buffer and then will msleep() on it for up to a second, waiting for a response. When a command is transmitted, a couple of things can occur:
  • Once the transfer succeeds, if the command needs no response then we just send a wakeup to notify the sender that we've sent it, and we free the buffer.
  • If the transfer succeeds but the command needs a response, then we put it on the "waiting" queue.
Then in the receive path we pull out firmware notifications and, if they're responses, we copy the response into the caller's buffer, call wakeup() to wake up the caller, and free the buffer.
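
In code, the sleep/wake side of that is pretty compact - this sketch uses illustrative queue and field names, but the msleep()/wakeup() pairing is the real mechanism:

    /* Submitting (in cmd_otus(), with the driver lock held): */
    cmd->odata = odata;    /* caller's response buffer, or NULL */
    STAILQ_INSERT_TAIL(&sc->sc_cmd_pending, cmd, next_cmd);
    usbd_transfer_start(sc->sc_xfer[OTUS_BULK_CMD]);
    error = msleep(cmd, &sc->sc_mtx, 0, "otuscmd", hz);    /* ~1 second */

    /* Completing (in the RX path, when a response notification matches): */
    if (cmd->odata != NULL)
        memcpy(cmd->odata, rsp_data, rsp_len);
    wakeup(cmd);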

OpenBSD cheats - it only has one single outstanding command buffer for all threads to use.

The tricky, unimplemented bit here is error handling - if I yank out a NIC during active commands then the driver will sleep for a second, wake up with an error and pass that error back. But the rest of the driver doesn't know anything was sleeping, so state gets freed from underneath it. I need to go and add what OpenBSD does - refcount when the driver is entered from, say, the transmit and ioctl paths, and then upon detach just wait for pending things to finish before freeing.

Ok, so that got command transmit/receive and sleep/wake notification working okay. Next up is packet reception and basic initialization. That was mostly the same - the same hardware bits are needed, the same 802.11 packet format is needed for the stack. The main differences here were in the OpenBSD versus FreeBSD net80211 interface layout - FreeBSD has vaps (virtual access points, etc) but OpenBSD does not; its stack is still pre-vap, so there's only one interface. This required a little bit of splitting to put the vap bits in vap routines and the driver bits in the driver. The notable additions are vap_create, vap_destroy and newstate.

Next up was realising OpenBSD is also still driving 802.11 state from the driver, not from the net80211 stack. FreeBSD drives the state changes and tells the driver what to do. That required me undoing some manual state transitions (eg otus_init() setting the state to SCAN or RUN depending upon the interface mode) and just letting net80211 do it.

So, net80211 created a vap, called otus_init(), then brought up the interface, set the initial vap state to SCAN via a call to newstate and started changing channels. This worked fine. I had some locking concerns - check the driver to see what I did. It was pretty straightforward.

And then - because the receive path was pretty simple and I got straight 802.11 frames back - yes, I started seeing beacons in a tcpdump session. This was great.

Then I ripped up a bunch of callback code that isn't needed. A few years ago FreeBSD's USB drivers maintained their own taskqueue to defer things like crypto key setting, state changes and such. Now net80211 has a per-device taskqueue that it runs these things on, and a lot of the driver calls are done as deferred tasks. OpenBSD doesn't have this so the drivers create their own deferred task and async callback framework to schedule these. It's duplicated work and I removed all of that from the driver.

Next up is transmit. This is trickier for a few reasons.

First, FreeBSD doesn't use if_start() with network-stack-provided queues anymore; drivers implement if_transmit(), so I have to maintain my own queue and free net80211 node references as appropriate. It took a while to craft up a correctly behaving transmit side when I fixed rsu(4), so I just stole it for the AR9170. I'll describe that in a subsequent article about rsu.

FreeBSD's net80211 stack handles 802.11 encapsulation itself; we're not handed ethernet frames unless we ask for them. So, I don't call ieee80211_encap(). Yes, I do call for software encryption as required, and that was done.

The biggest sticking point is the rate control. FreeBSD's net80211 stack has a reasonable implementation of transmit rate control modules and it's per vap and per associated node. I don't have to do anything too manual for it. OpenBSD did a bunch of manual work to do the AMRR setup/teardown/updating, so I had to rip it out and call the ratectl init/destroy methods in the vap create/destroy methods.
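
The result is just a pair of calls in the vap lifecycle - a sketch of the placement, using net80211's ieee80211_ratectl KPI:

    /* In the vap create method, after ieee80211_vap_setup() and
     * before ieee80211_vap_attach(): */
    ieee80211_ratectl_init(vap);

    /* In the vap delete method, before ieee80211_vap_detach(): */
    ieee80211_ratectl_deinit(vap);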

Next up was what ni->ni_txrate represented. In OpenBSD it seems like an index into the rate control table. In FreeBSD it's the 802.11 rate to use! So, I ripped out a bunch of rate table stuff in the driver and replaced it with a couple of mapping functions to go 802.11 rate to AR9170 hardware rate. That worked like a charm, and transmit works fine.
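
The mapping function itself is tiny. Here's a sketch - 'rate' is net80211's rate in 0.5 Mb/s units, and the return values are the standard CCK/OFDM PLCP rate codes, which you'd want to verify against the hardware documentation before trusting:

static uint8_t
otus_rate_to_hw_rate(uint8_t rate)
{
    switch (rate) {
    case 2:    return (0x0);    /* 1M CCK */
    case 4:    return (0x1);    /* 2M */
    case 11:   return (0x2);    /* 5.5M */
    case 22:   return (0x3);    /* 11M */
    case 12:   return (0xb);    /* 6M OFDM */
    case 18:   return (0xf);    /* 9M */
    case 24:   return (0xa);    /* 12M */
    case 36:   return (0xe);    /* 18M */
    case 48:   return (0x9);    /* 24M */
    case 72:   return (0xd);    /* 36M */
    case 96:   return (0x8);    /* 48M */
    case 108:  return (0xc);    /* 54M */
    default:   return (0xb);    /* fall back to 6M OFDM */
    }
}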

The last annoying thing with transmit is how the firmware tells us about failed frames. We don't get a completion message upon each frame - the later firmware does this, but the original blob doesn't. We only get told upon retries and errors. So, I hacked up something where the transmit path counts outbound packets, the RX command path counts retries/errors, and each time I transmit a packet I update net80211 with the transmit/retry/error counts. This works pretty well.

Finally - teardown. The correct order for teardown is:
  • Shut down the MAC - eg, disable TX/RX DMA, etc
  • Disable the USB transfers, wait until they're done
  • Free the transmit/receive buffers and any net80211 node references they may have; and
  • then call ieee80211_ifdetach() to ensure vaps and the top level interface are destroyed.
The initial port called ieee80211_ifdetach() too early and the subsequent node references would refer to now-freed nodes and vaps, causing lots of hilarity.
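
In driver terms the detach path ends up looking like this (a sketch; the otus_* helpers are illustrative, and the exact net80211 detach arguments vary between FreeBSD versions):

static int
otus_detach(device_t dev)
{
    struct otus_softc *sc = device_get_softc(dev);

    /* 1. Quiesce the MAC: no new TX/RX DMA. */
    otus_stop(sc);

    /* 2. Tear down USB transfers; this drains pending callbacks. */
    usbd_transfer_unsetup(sc->sc_xfer, OTUS_N_XFER);

    /* 3. Free TX/RX buffers, dropping any node references they hold. */
    otus_free_tx_list(sc);
    otus_free_rx_list(sc);

    /* 4. Only now is it safe to detach from net80211. */
    ieee80211_ifdetach(&sc->sc_ic);

    mtx_destroy(&sc->sc_mtx);
    return (0);
}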

And that's that. I haven't made 802.11n work; I haven't fixed up the radiotap support so received 802.11 packets in tcpdump actually provide the right rate/channel/etc. Those are details that I'll do when I feel like it. But, the driver is stable, there aren't any lock ordering issues that I've seen so far, and it actually behaves remarkably well.

freebsd-wifi-build, or "wait, you can run freebsd on atheros MIPS access points? where do I get that?"

I've been running FreeBSD at home as my primary internet/wifi access for a few years now. It's cheap, it's easy to do, and I've tried very hard to wrap up the whole process into a mostly-simple build system that spits out a useful image to use.

It's pretty simple in concept - I take FreeBSD-HEAD, build it with some cut-down options, create a custom filesystem image with some custom boot scripts and a custom configuration file, and provide an image that you can TFTP (using a serial console and ethernet cable) or upload directly to the AP if it supports it.

The supported hardware list is here:

Now, it's not a huge list like OpenWRT, but that's mostly because I don't have an infinite supply of Atheros MIPS based routers. I think I'll get some of the TP-Link Archer series stuff next.

Building it is pretty simple:

You check out the build repo, check out FreeBSD-HEAD, install a couple of packages, and run the build for your board. Once it's done, the images for your board appear in ../tftpboot/. There's a wiki page for each of the supported boards with a walkthrough of how to get FreeBSD going on it.

It comes up with 'user' and 'root' users, with no passwords. So, the first thing you should do after installation is telnet in, configure /etc/cfg/rc.conf with your actual LAN IPs, set the user/root passwords, and then run 'cfg_save' to save things. Then, reboot and voila!

The configuration file format looks like FreeBSD but it isn't. I'm keeping it somewhat hierarchical-looking in naming but flat in implementation so I can migrate it to something like a sqlite or luci backend in the future.

It's good enough for me to be able to set up an AP to be a bridge with a management IP address and configure the ethernet switch. Others have added ipfw support to do NAT and firewalling - I'm going to add configuration rules for NAT, IPFW and routing soon so it's all integrated.

It's FreeBSD, all the way through:

$ uname -a
FreeBSD tl-wdr3600 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r282406M: Wed May 6 22:27:16 PDT 2015 adrian@lucy-11i386:/usr/home/adrian/work/freebsd/head-embedded/obj/mips/mips.mips/usr/home/adrian/work/freebsd/head-embedded/src/sys/TL-WDR4300 mips
$ ifconfig wlan0 list sta
ADDR              AID CHAN RATE RSSI IDLE TXSEQ RXSEQ CAPS FLAG
18:ee:69:15:f4:12 2 1 26M 37.0 45 2703 51888 EPS AQEHTRM RSN HTCAP WME
04:e5:36:0d:1b:0d 1 1 19M 23.0 15 1524 47072 EPS AQEPHTR RSN HTCAP WME
cc:3a:61:0e:33:a0 3 1 19M 32.0 30 2585 43072 EPS AQEPHTR RSN HTCAP WME
40:0e:85:1a:f1:69 4 1 19M 25.0 30 1138 54800 EPS AQEPHTR RSN HTCAP WME
00:0f:13:97:14:54 5 1 54M 30.0 45 1808 57424 EPS AE RSN
00:22:fa:c2:d1:20 6 1 26M 24.5 0 574 57776 EPS AQEHTRS RSN HTCAP WME

So if you'd like a FreeBSD based device to act as your home gateway, this is where you can start. It's not pfsense, but it's designed to run on things much smaller than pfsense supports and it's a good introduction into the world of FreeBSD embedded.

TDMA (somewhat) working on AR9380 chips

(Wow, I have a lot of posts to write to catch up on things.)

I've just brought up FreeBSD's TDMA support on the AR9380 chipset. Specifically, the AR9331, since I have a Carambola 2 on me today.

It was pretty simple to bring up - I was missing the beacon configuration HAL call that the TDMA code expected. It's only used by the TDMA code - the STA and AP modes rely on the normal HAL beacon methods that date back to the Atheros HAL.

The only problem - it seems something is up with ANI (noise immunity) and sensitivity on at least the AR9331. It doesn't seem to behave well on slightly loaded channels and thus the beacons don't always go out when they're supposed to.

But, if you've been wanting to play with TDMA on the later Atheros chips, now you can!

So, FreeBSD on the AR9344? What happened?

I committed a bunch of code a while ago to FreeBSD-HEAD to at least start booting on the AR934x SoCs. The AR934x SoC is a MIPS74k core - a dual-issue superscalar 11-stage pipeline MIPS32r2 CPU. It's slightly different to the existing MIPS24k stuff (which is a single-issue, 8-stage pipeline.)

So - first step - it booted up a little, then hit a machine check. At that point the FreeBSD MIPS peeps believed there was hilarity in the TLB exception handling code, so we put it to sleep for a while and I went back to real work.

Then a few weeks ago I decided to finish it off. I brought my developer board to Eurobsdcon in Malta and sat down with Warner Losh, who also has said developer board. We spent a bunch of time going over the TLB code and realised that FreeBSD's instruction/execution hazards are all.. just wrong. Then, on a whim, I read up some more about MIPS32r2 and superscalar stuff and discovered that the correct hazard instruction isn't NOPs or SSNOPs - it's EHB (execution hazard barrier.) It's 'SLL $0, $0, 3' in MIPS parlance which on older CPUs is just a NOP (since register 0 is always 'zero'.) So, this fixed the TLB management and the boot proceeded quite a bit further.
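
For reference, here's how you can emit it from C - the wrapper name is mine, but the encoding is the architectural one:

static __inline void
mips_ehb(void)
{
    /*
     * EHB (execution hazard barrier) encodes as 'sll $0, $0, 3',
     * which decodes as a plain NOP on pre-MIPS32r2 cores - so it's
     * safe to emit unconditionally after TLB/CP0 updates.
     */
    __asm __volatile("sll $0, $0, 3" ::: "memory");
}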

Next - bringing up ethernet and the switch PHY. I was seeing totally crappy and invalid register values when reading/writing the attached switch chips. Even probing didn't work reliably - in fact, I got to the point where I was reading the value I'd expect from the previous register read. So, I wondered if this was another out-of-order behaviour from the MIPS74k superscalar architecture.

After digging into the MIPS bus space code, I found two things:

  1. The MIPS driver(s) don't call bus barrier functions at all - so there's no driver enforced access ordering. It was all assuming that the CPU doesn't re-order things; and
  2. The bus barrier code for MIPS was a no-op. It just plainly wasn't defined.
So, I added read/write memory barriers to the MIPS bus barrier routines and I modified the ethernet driver to use barriers. For good measure, I also added barriers to the SPI driver code as that also has a bunch of register accesses that require ordering.
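
The register access wrappers then end up looking something like this (a sketch with illustrative struct/field names; bus_barrier() is FreeBSD's wrapper around bus_space_barrier()):

static uint32_t
arge_read_reg(struct arge_softc *sc, bus_size_t reg)
{
    uint32_t val;

    val = bus_read_4(sc->arge_res, reg);
    /* Make sure the read completes before anything that follows. */
    bus_barrier(sc->arge_res, reg, 4,
        BUS_SPACE_BARRIER_READ | BUS_SPACE_BARRIER_WRITE);
    return (val);
}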

And with that, the switch PHY probe/attached fine, the SPI driver worked fine and the device started booting userland off of SPI connected NOR flash.

Then, it hung. I dug into that a bit and wondered what the hell was going on. Then after a day of poking, I discovered that the interrupt acknowledgement was not working. It's a quirky thing that I should really fix in the atheros platform support - the AR71xx chips don't require the CPU peripheral interrupts to be ack'ed (eg the uart) but later chips do. I added the AR934x to the list of SoCs that need interrupts to be ack'ed and the system kept booting, all the way to userland.

Next - I haven't yet written the AR8327 support but I started fleshing out the AR934x on-board switch support. I got it probing, attaching.. but not passing any traffic. After more digging, I realised my mistake - I was writing some registers incorrectly. I would mask out the right bits to set, but then I'd always set bit 0. Sigh. So, that came up and things worked.
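
In other words, the bug was roughly this (register and field names invented for illustration):

    reg = ar934x_reg_read(sc, AR934X_SWITCH_REG);
    reg &= ~AR934X_FIELD_MASK;
    reg |= 1;    /* the bug: always ORs in bit 0 */
    /* what was intended: */
    reg |= (val << AR934X_FIELD_SHIFT) & AR934X_FIELD_MASK;
    ar934x_reg_write(sc, AR934X_SWITCH_REG, reg);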

Then I decided to do the wifi part. This was pretty damned simple. The HAL from Qualcomm Atheros already has support for the AR934x in it and I had already modified it to work for the AR933x SoC (which just required me to 'teach' it the FreeBSD way of exposing the calibration/configuration data from on-board flash.) So, all I had to do was this:

  1. Add the device to the kernel configuration;
  2. Add a hint pointing out where the device is mapped in IO space;
  3. Add a hint pointing out where the calibration data is in the NOR flash;
  4. Reboot.
That's it. No weeks of merging code in from Linux or the internal Qualcomm Atheros driver into the FreeBSD driver. No real debugging required. Just enable it, point it at the right place in memory/flash and .. boot it. I think this again vindicates my efforts to open source the Qualcomm Atheros HAL - I just inherit this working code for free. I don't have to try and merge it into anything.
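
Steps 2 and 3 are just kernel hints. A sketch of what they look like - the addresses and sizes here are placeholders, not the real AR934x values:

# where the wifi MAC registers live on the AHB
hint.ath.0.at="nexus0"
hint.ath.0.maddr=0x18100000
hint.ath.0.msize=0x20000
# where the calibration data sits in the memory-mapped NOR flash
hint.ath.0.eepromaddr=0x1fff1000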

So, I have a port that's dirty and working. There are a lot of infrastructure changes I need to commit before I can commit this port - lots of new clocking options (there are now variations in the clock rate of the MDIO bus - the MII bus connecting the ethernet port(s) to a PHY or switch), lots of new configuration options for how the on-chip ethernet port(s) map to external ports, and a bunch of other ancillary stuff that's not really worth mentioning. But it's going to show up in FreeBSD-HEAD soon.

Doing traffic with the Carambola 2..

Now that the port is working, I've started doing some traffic with the Carambola 2 board on FreeBSD.

So far, so good:

# athstats
546236       data frames received
509242       data frames transmit
155          tx frames with an alternate rate
14818        short on-chip tx retries
13617        long on-chip tx retries
645          tx failed 'cuz too many retries
MCS7         current transmit rate
2            recv eol interrupts
9            tx frames with no ack marked
506786       tx frames with short preamble
1414         rx failed 'cuz of bad CRC
1543         rx failed 'cuz of PHY err
    12           OFDM restart
    1531         CCK restart
20610        beacons transmitted
71           periodic calibrations
-0/+0        TDMA slot adjust (usecs, smoothed)
24           rssi of last ack
25           avg recv rssi
-96          rx noise floor
2447         tx frames through raw api
39730        A-MPDU sub-frames received
494045       Half-GI frames received
5967         40MHz frames received
8037         CRC errors for non-last A-MPDU subframes
2            CRC errors for last subframe in an A-MPDU
498972       Frames transmitted with HT Protection
3            TX Timeout
177          Number of frames retransmitted in software
15717        A-MPDU sub-frame TX attempt success
177          A-MPDU sub-frame TX attempt failures
1            spur immunity level
4            first step level
128          OFDM weak signal detect
9            CCK weak signal threshold
108          ANI increased spur immunity
105          ANI decrease spur immunity
108          ANI increased first step level
105          ANI decreased first step level
943666       cumulative OFDM phy error count
108574       cumulative CCK phy error count
2            ANI parameters zero'd for non-STA operation
44           ANI forced listen time to zero
44           ANI calculated listen time < 0
13603        missing ACK's
14996        RTS without CTS
504970       successful RTS
34928        bad FCS
Antenna profile:
[0] tx   496835 rx        0
[2] tx        0 rx   546236

Making the AR9330 SoC wifi, or "how it feels doing things right.."

Well, "doing it right" is subjective. Sure. I'll grant you that.

I brought up the AR9330/AR9331 SoC support a couple of months ago. Unfortunately the Atheros reference board (AP121) comes with 16MB of RAM and 4MB of flash - which is just painful to do FreeBSD-HEAD development in.

Yes, I know. 16MB of RAM is tons of space... for FreeBSD-4. Anyway. That is a rant for another day.

So I managed to bring up the basic SoC support (which took longer than I thought - I had to learn how to write a FreeBSD uart driver!) but I decided to put wifi on hold until I found a board with more RAM and flash.

Along comes the Carambola 2. It's an AR9330, but with 64MB RAM, 16MB flash and a full-featured uboot. This is perfect for .. well, anything. And it's 30 Euros in quantities of one. Wait, it's cheap, it's fully-featured and it's available online? No way. What's the catch?

The catch - it wasn't running FreeBSD.

So I finally decided to bring up wifi support on FreeBSD.

The AR9300 HAL from Qualcomm Atheros includes the AR9330/AR9331 SoC wifi support. So I had to make it compile and make it work. How hard could it be?

Firstly - it wasn't compiled in by default, as it's only really useful for the SoC and not for normal PCIe NIC support. So, I needed to add that in. Luckily, all that took was defining AH_SUPPORT_HORNET in the source. Cool.

Next - the bus glue. The SoC internal bus isn't PCIe, it's what they call AHB, or "Atheros Host Bus." It's a derivative of a standard on-chip peripheral interconnect bus. The FreeBSD ath_ahb driver only supported AR9130, so I had to extend it to support non-AR9130 devices. That got it probing and attaching, but it wasn't finding the calibration / configuration space.

Next - gluing in the calibration data. It's on-board in the system flash, rather than on-chip (OTP) or an external EEPROM. The EEPROM space is 16KiB in size, rather than the 4KiB space used by the AR9xxx series SoCs. Also, the AR9300 HAL already seeks into the EEPROM space to grab the data at offset 0x1000, so I don't have to do that like I do with the AR9130 and related chips.

Finally - I had to teach ar9300_attach() that it needed to copy the EEPROM data I was giving it from ath_ahb into the copy it uses when setting things up.

And... that was it. After that, it booted and came up correctly. I was shocked.

You can find the boot log and dmesg at .

I haven't yet tested 802.11s (mesh) on this stuff, nor have I made TDMA work with this series of chips. But it's my eventual goal to make this board one of the "gold standard" boards for people wishing to enable their projects with wifi mesh. I bet it'll work out of the box as it stands, so if you're up for a bit of tinkering, buy a handful and set it up!

Enjoy! It's the best 30 euro you'll spend!

Working on Bluetooth Coexistence

I decided to bite the bullet and start hacking on bluetooth coexistence on these Atheros NICs. It's a bit of a rabbit hole.

I'll write up a bit more documentation on this when I'm not overly tired, but the general overview is pretty simple: "It's all done in software."

The bluetooth and wifi stacks need to speak to each other to know when is an appropriate time to prefer wifi traffic or bluetooth traffic. When pairing, bluetooth should be preferred. When scanning, associating, authenticating and rekeying, wifi should be preferred. When different profiles are active (eg A2DP audio), the bluetooth traffic should be periodically given preference so the A2DP frames can go out reliably. This has to be controlled in software.

So to make this work well on FreeBSD, I'll have to teach the wifi and bluetooth stacks to interface with each other somehow so this can be synchronised.

I have basic (static) coexistence working with the AR9285+AR3011 combo NIC. That's now in -HEAD.

I'm working on basic (static) coexistence on the AR9485+AR3012 combo NIC, however my NIC has an older BT part which requires quite a bit of dancing to make work. I'll have to teach ath3kfw how to load the config and firmware image for the required NIC. It's going to take some time but it'll be worth it.

I was hoping that FreeBSD would have basic A2DP support, but it currently doesn't. I'd love to see that happen, as it'd simplify a lot of my development/testing - I could then do audio stream testing, both playing and recording audio, while streaming that over wifi.

Oh well. Another day of hacking!

Today’s Journey: Making AP mode power-save work better

I've been working on improving the net80211 and ath driver support for AP mode power save.

There's a few parts to it:

  • A station can tell an access point it's going to sleep by setting the power mgmt bit to 1 in a TXed frame;
  • The AP will then update the TIM entry in the beacon frames it sends out to reflect whether that station has any traffic queued;
  • A station can signal an AP that it's awake by sending a data frame with the power mgmt bit set to 0;
  • .. or it can request a frame at a time by using PS-POLL;
  • There's also the uAPSD stuff which I haven't yet implemented and won't likely do so for a while.
Now, it shouldn't be that difficult. Except, that it is.

If an AP has a bunch of frames queued to a station that has gone to sleep, it will keep trying to transmit those frames. That wastes air-time and results in annoying levels of packet loss.

When you're doing 802.11n, there's a whole lot more traffic going on and a lot more room to cause massive traffic issues if you drop frames. But you don't want to keep failing to transmit those frames or you'll end up spending a lot of time transmitting BAR frames to the station.

If the driver maintains a queue of frames (for say, software retransmit) then it also needs to ensure that the TIM bit is set correctly. Otherwise the AP may set the TIM bit to 0 because the net80211 stack has no queued frames to that node; but the driver itself has some frames. Thus, the station won't wake up and you'll see increased packet latency.

When PS-POLL is received, frames need to first be leaked from the driver queue BEFORE it starts leaking frames from the net80211 power save queue. The last thing you want is the wrong set of frames to go out.

So, I've spent the last few months extending the driver and network stack to make this feasible. There's new net80211 driver methods for tying into the TIM update process, the node power save status and the PS-POLL handling. The filtered frames handling in the ath driver is another precursor to this - it means that frames can be failed out very quickly and retried when appropriate.

(No, I'm not implementing software retransmit for non-11n traffic just yet. I will eventually. Just not yet.)

The final bits that I've been working on have been tricky.

When a node goes to sleep, you want to pause the driver transmission to the node - otherwise it will keep trying to transmit whatever is in the driver queue. For 11n this is terrible; it means that frames will keep failing to be transmitted and with enough failures, the traffic will stop whilst a BAR frame is sent. Grr.

Next was figuring out how to send frames whilst the node is "paused". I introduced a per-node "leak" counter which tells the driver transmit path that even though the node is asleep, a single frame should be scheduled. If one isn't available, the next frame sent will be scheduled. This handles the PS-POLL "null" response - ie, if there's nothing in the queue, the net80211 stack will queue a null data response with the MORE bit clear. That way the station will know there's currently nothing to receive.
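
The check in the transmit scheduler is conceptually simple - a sketch with illustrative field names (the in-tree ath(4) version has more moving parts):

static int
ath_tx_node_can_transmit(struct ath_node *an)
{
    if (!an->an_is_powersave)
        return (1);
    if (an->an_leak_count > 0) {
        /* Leak exactly one frame, eg to answer a PS-POLL. */
        an->an_leak_count--;
        return (1);
    }
    return (0);    /* Asleep: leave the frame software-queued. */
}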

But then, something odd started happening. Devices would disassociate and re-associate, but they'd still be marked as "asleep". So no traffic would occur. After digging into it a bit, I discovered that the only time a station transitions back to awake is when it receives a DATA frame with the power mgmt bit set to 0. Seeing management/control traffic from the station isn't enough. So for now, I just always transmit management/control frames regardless if the station is asleep or awake - except BAR frames. Those get software queued if the node is asleep. Now that management/control frames are transmitted directly, a station can re-associate and be marked as 'awake.'

Then I found that once a station re-associates, it should have all of its current association state reset. It may have had a bunch of aggregate frames queued to the hardware and those need to finish transmitting before we can start transmitting new data to the re-associated station. It may even have been in the middle of receiving a BAR frame! So, I have to gently (well, "gently") reset the association state to allow for currently queued frames to be cleaned up, but reset things like filtered frame state and BAR TX. Ew, but it needs to be done.

Also, if there's data queued to an asleep station and a BAR frame needs to go out, the BAR frame needs to go into the head of the software queue, not the tail. Otherwise it will have to wait for the queue to be transmitted - which, if there's a gap in the transmit block-ack window (hence needing the BAR), no further transmission will occur. Oops!

I then found that a sufficiently chatty node could end up filling the software queue full of buffers destined to it. This is a general problem in the ath driver which I'll eventually fix, but it became a huge problem with power save enabled. So, I've introduced a per-node maximum queue depth when it's asleep. That should limit the amount of pain that a single sleeping node can cause. I'll eventually introduce a limit for how many buffers an individual node can consume whether it's awake or asleep but that's for another day.

There's likely lots more corner cases that need to be addressed before I can merge this into -HEAD. I'm still seeing my macbook pro occasionally disassociate and not automatically re-associate and I'm not sure why. But things are behaving much, much better with sleeping devices.

Why PCI latency timers matter..

My latest "are you serious?" moment recently was trying to figure out the root cause of this performance issue with the AR5416 cardbus NIC on some of my test laptops.

Now, the AR5416 is Atheros' first 802.11n NIC, so it has some rough edges. But I was seeing some ridiculously bad transmission failures and I couldn't pinpoint them.

Not only that, I was seeing great performance (~ 130mbit TCP) on a specific laptop (Lenovo T41p) but the Lenovo T60 and T400 both performed extremely poorly.

To make matters weirder - the NIC performed great when speaking to another NIC in the same laptop. Just not to another physically separate device.

So, after much digging, here's what I discovered.

Firstly - I used my athalq packet descriptor logging and inspection tool (that's in FreeBSD-HEAD - no custom closed source code here!) to investigate the TX frames being sent to the hardware. What I found was troubling - large numbers of frames had TX data and TX delimiter underruns.

I then discovered that my code for counting TX data / delimiter underruns was totally incorrect - it's possible to see both a data/delimiter underrun error _with_ a valid transmitting frame. What was going on was cute - the hardware would start transmitting an aggregate frame but the DMA wouldn't keep up during said transmission and half way through the frame it would underrun. This only happened at higher MCS rates.

So making shorter aggregate frames fixed it, as well as increasing the delimiter count between frames. Both had the effect of reducing the likelihood of the NIC failing to transmit a longer aggregate. But they weren't solutions.

So I went digging. What I found was pretty simple in theory: the PCI latency timer on the NIC was being set to something appropriate (0xa8) but the PCI latency timer on the cardbus PCI bridge itself was not (0x20.) So any other bus activity would cause the NIC to not get the bus and it'd miss its DMA window.

Once I manually fixed the PCI bridge latency timer to be 0xa8, everything returned to normal.
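
In FreeBSD kernel terms, the manual fix boils down to one config-space write - a sketch; where exactly to hook this for the bridge is the real question:

static void
fixup_bridge_latency(device_t bridge)
{
    /* PCIR_LATTIMER is the PCI latency timer at config offset 0x0d. */
    if (pci_read_config(bridge, PCIR_LATTIMER, 1) < 0xa8)
        pci_write_config(bridge, PCIR_LATTIMER, 0xa8, 1);
}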

However - there's only one thing on this PCI bridge - the cardbus interface itself. That's why it's so kooky. I would've thought that I'd have to up the value on the rest of the PCI bridges up to the root complex. There's no latency timer for PCIe, so it's not a problem there. So there's likely some very subtle timing involved that's just plain broken by default on how the BIOS initialises this cardbus slot and FreeBSD is not overriding it.

Now, if you see crappy performance on the PCI/cardbus 802.11n NICs in FreeBSD, you can check the output of 'athstats' to see if you do see TX underruns of any sort. If you are, the hardware isn't meeting the DMA deadlines it needs to DMA out frames and you need to do some further digging into your system to see why.

Be careful of adding debugging, as microseconds count..

.. after tinkering with the TDMA code a bit more, I discovered why I was seeing larger swings in the TDMA slot timings.

Two words: Debug Code.

Well, to be more specific - I added some debugging code that by default didn't do anything. But it was still there; it checked a debug flag and didn't log anything if it was disabled. But that would take time to execute. Since that debugging code sat _between_ the routines doing math with the RX timestamp and the nexttbtt register, it would calculate a slightly larger TSF offset.

Once I moved the debug code out from where it was and grouped all that register access and math together, the slot timing swings dropped by a few microseconds and everything went back to smooth.

Tsk. I should've known better.

At least now the TDMA code is working well on the 802.11n chips. Yes, it's still only 802.11abg rates, but it works. I've also found the PCU MISC_MODE bit to enforce packets don't transmit outside of the burst window and that is working quite fine with TDMA.

So, I think I can say "mission accomplished." I'll tidy up a few more things and make sure TX only occurs in one data queue (as mentioned in my previous post, they all burst independently at the moment..) and then patiently wait for someone to implement 802.11n adhoc negotiation so 802.11n MCS rates and aggregation magically begins to work. Once that's done, 802.11n TDMA will become a reality.

Getting TDMA working on 802.11n chipsets

A few years ago, a bunch of clever people figured out how to implement TDMA using the Atheros 802.11abg NICs. Sam Leffler has a great write-up here. He finished that particular paper with some comments about the (then) upcoming 802.11n chipsets from Atheros and how they would be better suited to the kinds of tricks he pulled with the Atheros MAC.

But, if you tried bringing up TDMA on the Atheros 802.11n chips, it plain just didn't work. Lots of people gnashed teeth about it. I was knee deep in TX aggregation work at the time so I just pushed TDMA to the back of my mind.

How it works is pretty cute in itself. To setup a TX "slot", the beacon timer is used to gate the TX queues to be able to start transmitting. Then a "channel ready time" burst length is configured, which is the period of time the TX queue can transmit. Once that timer expires, no new TX is allowed to begin. Sam then slides the slave TX window along based on when it sees a beacon from the master, as everything is synchronised against that.

Luckily, someone did some initial investigation and discovered that a couple of things were very very wrong.

Firstly, when fetching the next target beacon transmission time ("TBTT"), the AR5212 era NICs returned it in TU, but the AR5416 and later returned it in TSF.

Secondly, the TSF from each RX frame on the AR5212 is only 15 bits; on the AR5416 and later it's 32 bits. The wrong logic was used when extending the RX frame timestamp on the AR5416 from 32 bits to 64 bits, and it was causing the TSF to jump all over the place.

So with that in place, he managed to stop the NICs from spewing stuck beacons everywhere (a classic "whoa, who setup the timers wrong!" symptom) and got two 11n NICs configured in a TDMA setup. But he reported the traffic was very unstable, so he had to stop.

Fast-forward about 12 months. I've finished the TX aggregation and BAR handling; I've debugged a bunch of AP power save handling and I'm about to reimplement some things to allow me to finish off AP power save handling (legacy/ps-poll and uapsd) in a sane, correct fashion. I decide, "hey, TDMA shouldn't be that hard to fix. Hopefully there are no chip bugs, right?" So, I plug in a pair of AR5413 (pre-11n) NICs and get it up and running. Easy. Then I plug in an AR5416 as the slave node, and .. it worked. Ok, so why was he reporting such bad results?

Firstly, Sam exposed a bunch of useful TDMA stats from "athstats". Specifically, if you start tinkering with TDMA, do this:

$ athstats -i ath0 -o tdma 1

   input   output  bexmit tdmau   tdmadj crcerr  phyerr  TOR rssi noise  rate
  619817   877907   25152 25152    -4/+6    142     143    1   74   -96   24M
     492      712      20    20    -0/+7      0       0    0   74   -96   24M
     496      720      20    20    -2/+6      0       0    0   74   -96   24M
     500      723      21    21    -6/+4      0       0    0   75   -96   24M

When I was debugging the initial AR5416 TDMA stuff, the tdma adjust figures bounced everywhere between 0 and 1000uS off. That was obviously not stable.

So, I looked at what debugging was in the driver itself. There was some (check if_ath_debug.h for the TDMA and TDMA timer flags), and after a bit of digging I realised that every time the TSF was just about to converge, it would be bumped out 1000uS. Then it would slowly drift back to converge, then it'd fall out 1000uS. This kept repeating. It made no sense; every time it calculated the delta between the expected and real TSF, it would "bump" the TSF by that much so the TSF would actually be correct. It shouldn't be out by almost as much on the next RX'ed frame.

I did some initial testing to ensure the TSF was running at the expected 1uS interval (it was) and the master side was also running at the expected 1uS interval (it also was), so it wasn't out of sync clocks. The TSF bump must not be "right".

Enter the next bug - on the AR5416 and later, the TSF writes must be done as a 64 bit write. Ie, you write TSF_L32 first, then TSF_U32. At that point it gets internally updated and everything is consistent. If you don't do that, it doesn't latch.
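
The fix is just a matter of write ordering - a sketch, where AR_TSF_L32/AR_TSF_U32 and OS_REG_WRITE() are the HAL's names for the registers and its register accessor:

static void
ar5416_set_tsf64(struct ath_hal *ah, uint64_t tsf64)
{
    /* The low half is staged... */
    OS_REG_WRITE(ah, AR_TSF_L32, tsf64 & 0xffffffffULL);
    /* ...and the whole 64-bit value latches on the high-half write. */
    OS_REG_WRITE(ah, AR_TSF_U32, (tsf64 >> 32) & 0xffffffffULL);
}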

Ok, so that fixed the intial drift. But after about 60 seconds, the TSF adjust parameters started varying ridiculously wildly. Ok, so 60 seconds equaled around 65,535 TU (where a TU is 1.024 milliseconds) so I began to wonder if I was seeing something wrap at that point.

Enter the next bug. The math involved in calculating the expected slot time was based on the 64 bit TSF and it was converted down to a 16 bit TU value from 0 .. 65535 TU. On the AR5212 era chips, the nexttbtt timer had a 16 bit resolution. When the nexttbtt value was read from that register, it was already 16 bits. So the "TSF delta" between the expected and real slot time was calculated between these two 16 bit values. However, on the AR5416 and later, the nexttbtt value was a 32 bit TSF (microsecond) value. Even when converted to a TU (1.024 millisecond) value, it would wrap at a value much greater than 65,535 TU. So the comparison would soon be between a value from 0..65,535 TU and 0 .. much-bigger-than-65,535 TU. The tsfdelta would become very, very negative.. and things would go nuts.

Ok, so that fixed another behavioural issue. Things were looking good. The slot time sync was stable. So I started passing traffic. Everything looked good.. for about 60 seconds. Then everything went slightly nuts again. But only with traffic. The timing calculations went way, way out.

Here's an example of the beacons coming in. Note that the expected beacon interval here is 49,152uS.

[34759308] [100933] BEACON: RX TSF=67127545 Beacon TSF=3722387514 (49152)
[34759357] [100933] BEACON: RX TSF=67176714 Beacon TSF=3722436670 (49156)
[34759442] [100933] BEACON: RX TSF=67262432 Beacon TSF=3722521354 (84684)
[34759454] [100933] BEACON: RX TSF=67275216 Beacon TSF=3722533850 (12496)
[34759504] [100933] BEACON: RX TSF=67325995 Beacon TSF=3722583802 (49952)
[34759552] [100933] BEACON: RX TSF=67374479 Beacon TSF=3722632108 (48306)
[34759602] [100933] BEACON: RX TSF=67424546 Beacon TSF=3722681282 (49174)
[34759652] [100933] BEACON: RX TSF=67475842 Beacon TSF=3722731578 (50296)
[34759701] [100933] BEACON: RX TSF=67525900 Beacon TSF=3722780730 (49152)

The master beacons were not coming in stable in any way. The main reason this would happen is if the air was busy at the master target beacon transmission time. So it would delay transmitting the beacon until the air was free.

This is where I decided it was about time I inserted some tracing into the TDMA code. I had introduced some ALQ based tracing in the ath(4) driver recently, specifically to trace TX and RX descriptors. I decided to add TDMA trace points. That way I could look at the TDMA recalculation along with the TX and RX from the driver.

What I found was very .. grr-y. After about 60 seconds (surprise), the TX would burst FAR past the 2.5 milliseconds it was supposed to. Why the heck was that happening?

After a bunch of staring-at-documentation and talking with some people well-versed in how the Atheros MAC worked, we realised the only real explanation is that the beacon timer was firing after the burst time, retriggering the timer. But why would it be? I stared at the debugging output a little more, and look at what I saw:

[34759258] [100933] BEACON: RX TSF=67077388 Beacon TSF=3722338362 (49152)
[34759258] [100933] SLOTCALC: NEXTTBTT=67081216 nextslot=67081224 tsfdelta=8 avg (5/8)
[34759258] [100933] TIMERSET: bt_intval=8388616 nexttbtt=65510 nextdba=524078 nextswba=524070 nextatim=65511 flags=0x0 tdmadbaprep=2 tdmaswbaprep=10
[34759259] [100933] TSFADJUST: TSF64 was 67077561, adj=1016, now 67078577

.. everything here is fine. We're programming nexttbtt in TU, not TSF (because the HAL API specifies it in TU for the older, pre-11n chips). Ok. Suspiciously close to the 65,535 TU boundary.


[34759308] [100933] BEACON: RX TSF=67127545 Beacon TSF=3722387514 (49152)
[34759308] [100933] SLOTCALC: NEXTTBTT=22528 nextslot=67131381 tsfdelta=-11 avg (5/7)
[34759308] [100933] TSFADJUST: TSF64 was 67127704, adj=11, now 67127715

Ok, but it's just a TSF adjust, no biggie. But, then this happened:

[34759357] [100933] BEACON: RX TSF=67176714 Beacon TSF=3722436670 (49156)
[34759357] [100933] SLOTCALC: NEXTTBTT=71680 nextslot=67180550 tsfdelta=6 avg (5/7)
[34759357] [100933] TIMERSET: bt_intval=8388616 nexttbtt=71 nextdba=566 nextswba=558 nextatim=72 flags=0x0 tdmadbaprep=2 tdmaswbaprep=10
[34759357] [100933] TSFADJUST: TSF64 was 67176888, adj=1018, now 67177906

At this point, it was clear. nexttbtt was very, very small - 71 TU is very, very much before the current TSF of somewhere around 67,127,545. At this point the next-TBTT timer would just keep continuously firing. And this would keep re-gating the TX queue, allowing it to just plain keep bursting. That explains why everything was going crazy during traffic.

This again was another example of the code assuming it was an AR5212 era NIC. The nexttbtt value was being trimmed to be between 0 and 65,535 TU. After I fixed that and fixed up the math a bit, nexttbtt was being correctly programmed and suddenly everything started working. And quite well.

So, now the basics are working. I'll audit the math to ensure everything wraps consistently at the 32-bit TSF boundary (ie, 4 billion microseconds, give or take) as that doesn't take too long to occur. But the 11n chips now behave the same as the 11a chips do when doing TDMA.

So what's next?
  • The "tx time" calculation needs to be aware of the 11n rate configuration, so it can calculate the guard time correctly. Right now it uses the non-11n aware rate -> duration HAL function;
  • The TX path has to be rejiggled a bit to ensure _all_ traffic gets stuffed into one TX queue (well, besides beacons.) Management and higher priority traffic has to do this too. If not, then multiple TX queues can burst and they'll burst separately, blowing out the TX slot timing;
  • Someone needs to get 11n adhoc working, so that 11n rates are negotiated during adhoc peer establishment. Then aggregation can just magically work at that point (the TDMA code reuses a lot of adhoc mode vap behaviour code);
  • 802.11e / 802.11n delayed block-ACK support needs to be implemented;
  • Then when doing TDMA, we can just burst out an aggregate or two inside the given slot time, then wait for a delayed block ACK to come back from the remote peer in the next slot time! Yes, I'd like to try and reuse the standard stuff for doing delayed block-ack rather than implementing something specific for 802.11n aggregation + TDMA.
  • .. and yes, it'd be nice for this to support >2 slave terminals, but that's a bigger project.
Right now I think I'll tackle #1 and then make sure the 11n NICs can be configured in a static MCS rate, without aggregation. The rest will have to be up to someone else in the community. My plate is full.

So, TDMA on the 802.11n NICs is now working. Go forth and hack!

Making the AR5210 NIC work in the office..

I'm quite happy that FreeBSD's ath(4) driver supports almost all of the PCI and PCIe devices that Atheros has made. Once I find a way to open source this AR9380 HAL I've constructed, we'll actually support them all. However, there are a few little niggling things that have been bugging me. Today I addressed one of those.

The AR5210. It's their first 11a-only NIC. It does up to 54MBit OFDM 802.11a; it doesn't do QoS/WME (as it only has one data queue); it "may" go up to 72MBit if I hack on some magic extensions. And in open mode, it works great.

But it didn't work in the office or at home. All of which are 802.11n APs with WPA2 authentication and AES-CCMP encryption.

Now, the AR5210 only does open and WEP encryption. It doesn't do TKIP or AES-CCMP. So the encryption has to happen in software. The NIC was associating fine, but when wpa_supplicant went to program in the AES-CCMP encryption keys, the HAL simply refused.

What I discovered was this.

The driver keycache code was also trying to allocate keycache slots for the AR5210, which only supports the 4 WEP key slots. This is a big no-no. So once I mapped them to all be slot 0, I made a little progress.

The net80211 layer was trying to program in an AES-CCMP key, which the driver was dutifully passing to the HAL. The AR5210 HAL doesn't support anything but WEP or open, so the encryption key type was "clear". Now, "clear" means "for this MAC address, don't try decrypting anything." But the AR5210 HAL code rejected it - as I said, it doesn't do that.

Ok, so I ignored that entirely. I mapped all of the software encrypted key entries to slot 0 and just didn't program the hardware. So now the HAL didn't reject things. But it wasn't working. The received frames were being corrupted somehow and failed the CCMP MIC integrity check. I took a look at the frames being received (which should've been "clear") versus what was going on in the air - luckily, this laptop has an AR9280 inside so I could put it into monitor mode and sniff things. The packets just didn't add up. I was confused.

Then after discussing this with my flatmate, I idly wondered if the hardware was decrypting the traffic anyway. And, well, it was. Encrypted frames have the WEP bit set in the 802.11 header - whether they're WEP, TKIP, AES-CCMP. The AR5210 didn't know it wasn't WEP, so it tried decoding the frames itself. And corrupting them.

So after finding a PCU control register (hi AR_DIAG_SW) that lets me disable encryption/decryption, I was able to pass through the encrypted traffic fine and everything just plain worked. It's odd seeing an 11a, non-QoS station on my 11n AP, but that just goes to show that backwards interoperability is still useful.

And yes, I did take the AR5210 into the office and I did sit in a meeting with it and use it to work from. It let me onto the corporate wireless just fine, thank you.

So now the FreeBSD AR5210 support doesn't do any hardware encryption. You can turn it on again if you'd like. Why? Because I don't want the headache of someone coming to me and asking why a dual-VAP AP with WEP and CCMP is failing. The hardware can only do _either_ WEP/open with hardware encryption, _or_ it can do everything without hardware encryption. So I decided to just disable it for now.

There's also a problem with how encryption is specified to net80211. It's done at startup time, when the driver attaches. Anything that isn't specified as being done in hardware is done in software. There is currently no clean way to dynamically change that configuration. So, if I have WEP encryption in hardware but CCMP/TKIP in software, I have to dynamically flip on/off the hardware encryption _AND_ I have to enforce that WEP and CCMP doesn't get configured at the same time.

The cleaner solution would be to:
  • Create a new driver attribute, which indicates the hardware can do WEP and CCMP at the same time - make sure it's off for the AR5210;
  • Add a HAL call to enable/disable hardware encryption;
  • If a user wants to do WEP or open - enable hardware encryption;
  • If a user wants to do CCMP/TKIP/etc - disable hardware encryption;
  • Complain if the user wants to create a VAP with CCMP/TKIP and WEP.
If someone wants a mini-project - and they have an AR5210 - I'm all for it. But at the moment, this'll just have to do.

Minor wi(4) hacking

I finally had some time, a need for wireless connectivity in the shop and found my stash of old Prism 2, 2.5 and 3 cards. I thought I'd try them out with the new vap stuff.

First, there's a new filter on the firmware revisions in the driver. Very old versions of the firmware basically don't work without a lot of coaxing and workarounds, so they are no longer supported. In addition, Symbol cards are fairly rare and there's no readily available documentation for them, so support for them was removed as well. This filtering is a good thing, since it will keep people from using cards that are known to be broken. However, I had to fix a bug in the interrupt registration to make this work out OK (and to also stop spontaneous panics sometimes on attach if there was a lot of interrupt activity on a shared interrupt). I basically moved the interrupt registration to the end of the attach function, rather than the beginning, and that solved both problems.

All of these cards that I had don't have new enough firmware to support WPA. This highlights another problem: wpa_supplicant doesn't filter out WPA APs when it is looking for things to attach to, or otherwise provide a meaningful error message that would tell the user the reason that WPA isn't working is that their card is too old/lame to support this new-fangled stuff....

I wound up solving my problem with an Atheros card that just worked, once I found one...