Category Archives: cardbus

Why PCI latency timers matter

My latest "are you serious?" moment was trying to figure out the root cause of a performance issue with the AR5416 cardbus NIC on some of my test laptops.

Now, the AR5416 is Atheros' first 802.11n NIC, so it has some rough edges. But I was seeing some ridiculously bad transmission failures and I couldn't pinpoint them.

Not only that, I was seeing great performance (~130 Mbit/s TCP) on one specific laptop (Lenovo T41p), but the Lenovo T60 and T400 both performed extremely poorly.

To make matters weirder - the NIC performed great when speaking to another NIC in the same laptop. Just not to another physically separate device.

So, after much digging, here's what I discovered.

Firstly - I used my athalq packet descriptor logging and inspection tool (that's in FreeBSD-HEAD - no custom closed source code here!) to investigate the TX frames being sent to the hardware. What I found was troubling - large numbers of frames had TX data and TX delimiter underruns.

I then discovered that my code for counting TX data / delimiter underruns was totally incorrect - it's possible to see a data/delimiter underrun error _with_ a frame that is otherwise transmitting validly. What was going on was cute - the hardware would start transmitting an aggregate frame, but the DMA wouldn't keep up during said transmission, and halfway through the frame it would underrun. This only happened at higher MCS rates.
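To make the counting mistake concrete, here's a minimal sketch of the accounting, assuming the reading above: underrun flags and the "frame went out" status are independent, so they get counted independently. The `TxStatus` fields and counter names are hypothetical and simplified; they are not the real descriptor layout or the driver's actual statistics.

```python
from dataclasses import dataclass


@dataclass
class TxStatus:
    """Hypothetical, simplified view of one completed TX descriptor."""
    ok: bool              # frame was reported as transmitted
    data_underrun: bool   # DMA fell behind while fetching frame data
    delim_underrun: bool  # DMA fell behind while fetching a delimiter


@dataclass
class TxCounters:
    frames_ok: int = 0
    frames_failed: int = 0
    data_underruns: int = 0
    delim_underruns: int = 0


def account(counters: TxCounters, status: TxStatus) -> TxCounters:
    # Success/failure and underruns are tracked separately: a frame can be
    # reported as transmitted and still have underrun part-way through.
    if status.ok:
        counters.frames_ok += 1
    else:
        counters.frames_failed += 1
    if status.data_underrun:
        counters.data_underruns += 1
    if status.delim_underrun:
        counters.delim_underruns += 1
    return counters


counters = TxCounters()
account(counters, TxStatus(ok=True, data_underrun=True, delim_underrun=False))
account(counters, TxStatus(ok=False, data_underrun=False, delim_underrun=True))
account(counters, TxStatus(ok=True, data_underrun=False, delim_underrun=False))
```

The trap is treating an underrun as just another failure reason, which hides every underrun that rides along with a frame marked as sent.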

So making shorter aggregate frames fixed it, as well as increasing the delimiter count between frames. Both had the effect of reducing the likelihood of the NIC failing to transmit a longer aggregate. But they weren't solutions.

So I went digging. What I found was pretty simple in theory: the PCI latency timer on the NIC was being set to something appropriate (0xa8) but the PCI latency timer on the cardbus PCI bridge itself was not (0x20). So any other bus activity would cause the NIC to not get the bus and it'd miss its DMA window.

Once I manually fixed the PCI bridge latency timer to be 0xa8, everything returned to normal.
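For a feel of how different those two values are, here's a back-of-the-envelope sketch. The latency timer counts PCI bus clocks; everything else here is an assumption for illustration: a 33 MHz clock, a 32-bit bus, and one data phase per clock with no wait states or address-phase overhead. That makes the byte figure an upper bound, not a measurement.

```python
PCI_CLOCK_HZ = 33_000_000  # assumption: conventional 33 MHz PCI/CardBus clock
BYTES_PER_CLOCK = 4        # assumption: 32-bit bus, one data phase per clock


def latency_budget(timer_value: int) -> tuple[float, int]:
    """Return (microseconds, upper-bound bytes) for a latency timer value."""
    clocks = timer_value & 0xFF  # the latency timer register is 8 bits wide
    microseconds = clocks / PCI_CLOCK_HZ * 1e6
    max_bytes = clocks * BYTES_PER_CLOCK
    return microseconds, max_bytes


for value in (0x20, 0xA8):
    us, nbytes = latency_budget(value)
    print(f"0x{value:02x}: {us:.2f} us on the bus, at most {nbytes} bytes per burst")
```

Under those assumptions, 0x20 allows at most 128 bytes per bus tenure before the master can be forced off, while 0xa8 allows up to 672. That's the kind of gap that could plausibly matter when the NIC has to stream a long aggregate without falling behind.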

However - there's only one thing on this PCI bridge - the cardbus interface itself. That's why it's so kooky. I would've thought that I'd have to up the value on the rest of the PCI bridges up to the root complex. There's no latency timer for PCIe, so it's not a problem there. So there's likely some very subtle timing involved that's just plain broken by default in how the BIOS initialises this cardbus slot, and FreeBSD is not overriding it.

Now, if you see crappy performance on the PCI/cardbus 802.11n NICs in FreeBSD, you can check the output of 'athstats' to see if you get TX underruns of any sort. If you do, the hardware isn't meeting the deadlines it needs to DMA out frames and you need to do some further digging into your system to see why.

More CardBus fixes

After I committed my previous set of CardBus fixes, reports came in about interrupt storms, first with 16-bit cards, and later with 32-bit cards. These have been corrected in my latest fixes. We always ack the CSTS interrupt when we see it. In addition, I've changed the acking of the 16-bit ExCA register to only happen when there's an "R2" or 16-bit card in the slot. Otherwise we now skip it. Hopefully, this will help the shared interrupt case as well by eliminating some PCI bus cycles.

The CSTS bit is interesting. It is the least well documented bit in the CardBus standard. It is unclear when it fires from reading the standard, but seems to be related to the card finishing its reset sequence. How this differs from the power-up interrupt, I'm not sure. I think we could further optimize the bring up of CardBus cards with it.

In addition, I noticed there's a READY bit in the ExCA CSC register. We're currently doing busy waits to bring up the cards, including a couple of millisecond-long DELAYs. I'll have to look into the prospect of using that interrupt to get rid of those busy waits.

Finally, I'm looking in earnest at the Alchemy Au1550 CPU for the OpenMicroServer port I'm doing. I've noticed that it uses a 16-bit PC Card interface. Maybe I'll finally need to rewrite the old PCIC driver from OLDCARD to make use of it (snagging the exca routines to make that easier). The OpenMicroServer has a CF card attached to it.

Cardbus Fixes

I just checked some CardBus fixes into the tree. The biggest change was to the power-up sequence, as well as transitioning to using filters for the card change events. These two changes are somewhat intertwined, unfortunately, since the latter exposed some holes in the former. In a nutshell, we now register a filter for the card status change events. This means we can mark the card as bad right away, before any additional interaction with the card can happen. We defer doing anything about the badness for a little bit (basically until the machine is idle enough for the cardbus kernel thread to run). The interlock mechanism is also much lighter weight, having moved from mutexes and CVs to simple msleeps. Fast interrupts require some care to get right, so I hope I've gotten it all right. The move is dictated by what you can and cannot do in a fast interrupt handler.
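The split described above (mark the card bad immediately, do the real work later in a thread) can be modelled in a few lines. This is a toy userland sketch in Python, not the kernel code: `CardSlot`, `status_change_filter` and `event_thread_once` are hypothetical names, and a `threading.Event` stands in for the kernel's sleep/wakeup interlock.

```python
import threading


class CardSlot:
    """Toy model of one slot; all names here are hypothetical."""

    def __init__(self) -> None:
        self.card_ok = True
        self.events_handled = 0
        self._pending = threading.Event()

    def status_change_filter(self) -> None:
        # Stand-in for the filter: only cheap, non-sleeping work happens here.
        self.card_ok = False  # mark the card bad right away
        self._pending.set()   # wake the worker; the real work is deferred

    def event_thread_once(self, timeout: float = 1.0) -> bool:
        # Stand-in for the cardbus thread, where sleeping is allowed.
        if not self._pending.wait(timeout):
            return False
        self._pending.clear()
        self.events_handled += 1  # placeholder for detaching/re-probing
        return True


slot = CardSlot()
worker = threading.Thread(target=slot.event_thread_once)
worker.start()
slot.status_change_filter()  # "interrupt" fires: card is bad immediately
worker.join()                # the deferred handling completes afterwards
```

The point of the pattern is ordering: anything that checks `card_ok` sees the card as bad from the moment the event fires, even though the expensive handling runs later.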

Before making these changes, my atheros card would often reset for no reason. After these changes, it didn't reset again until I started using a kernel without them. I'm not sure why this would make such a big difference, and I hate mysteries. They indicate that there's something I don't understand, which I also hate. I'd be interested to see if I still see this when the fully integrated code is committed and looped back (I have several trees, and I most frequently run an unmodified kernel from svn, but sometimes also run one of these trees). There's a small chance that one of my other local changes could be the cause, but given what they are it seems doubtful. Still, a good datapoint if it is.

In addition, these fixes add a retry option for the BadVcc errors that we'd see sometimes. These are annoying on some machines, and oftentimes the best way to get around them was to reload the driver. They don't happen at all on TI-based chipsets that I've seen (at least more recent ones). Instead, they were confined to the Ricoh chipsets. Since I switched my laptop a couple of years ago, I haven't seen them. These changes post-date that switch, but have been tested lightly on another Ricoh laptop that I have. They've been heavily tested on the TI laptop. They are based only on the description of the problem in the NetBSD PR, since the implementations are so different for the different BSDs in this area.
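The retry idea itself is simple enough to sketch. This is only the general shape, under the assumption that a BadVcc failure is transient and worth a few more attempts; `BadVccError`, `power_up_with_retry` and the attempt count are all hypothetical and do not reflect the actual driver code.

```python
class BadVccError(Exception):
    """Hypothetical stand-in for the bridge rejecting the Vcc request."""


def power_up_with_retry(power_up, attempts: int = 3):
    """Call power_up() until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return power_up()
        except BadVccError:
            if attempt == attempts:
                raise  # out of retries: let the caller see the failure


# Usage: a fake power-up routine that fails twice, then succeeds.
calls = {"n": 0}


def flaky_power_up() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise BadVccError("BadVcc")
    return "powered"


result = power_up_with_retry(flaky_power_up)
```

Bounding the number of attempts matters: a slot that really can't supply the requested voltage should still fail, just not on the first hiccup.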

Finally, I did some comment tweaking and shuffling. I also managed a style change or two. These should have absolutely no effect on the running code, but hopefully help the reader of the code understand it better. Past experience has shown that these types of commits are most likely to provoke comment, even though the other two parts of my commits are actually quite a bit more important.

Oh, why the flurry of commits? I was low on disk space, so I thought I'd go clean up. 11MB free just isn't enough. So I was looking around and discovered I still had a CVS tree and was going to blow it away. I did a final update just to see what changes I had. I'm glad I did, since I found these, and a few others. I still have more local changes there than I realized. Sometimes I guess it is good to run out of disk space and go on a cleaning spree? It also makes for smaller backups when you don't have to copy 5GB of /usr/obj and kernel build trees...