On profiling HTTP, or "god damnit people, why are all the open source tools slow?"

Something that's been a challenge at work (and at other things in the past) has been "how do I generate enough traffic to test this thing?"

If you're running some public facing boxes then sure, you can do A/B testing. But what if you're not able to test it in the real world? What if you need to do testing before you ship, and the traffic levels have to be stupid high?

So, what do you do?

I've done this a few times. When doing squid and other reverse proxy development, I would run tools like apachebench, httperf, even web polygraph - but these things scaled poorly. They couldn't handle tens of thousands of concurrent connections or cope with a mix of slow and fast clients - their use of poll() and select() just wouldn't work out well.
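To make the poll()/select() complaint concrete: both hand the kernel the whole descriptor list on every call, and select() is additionally capped by FD_SETSIZE (commonly 1024). Event interfaces like kqueue and epoll register interest once and return only the descriptors that are ready. Here is a minimal sketch in Python using the standard selectors module. The socket pairs and the helper name `ready_indices` are made up for illustration; this is not how any of the tools above are written.

```python
import selectors
import socket


def ready_indices(num_pairs, writers_to_poke):
    """Register num_pairs sockets once, make a few readable, return the ready ones."""
    # DefaultSelector picks the best mechanism available (e.g. kqueue or epoll).
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(num_pairs)]
    try:
        # Interest is registered once per socket, not rebuilt on every wait.
        for idx, (reader, _writer) in enumerate(pairs):
            reader.setblocking(False)
            sel.register(reader, selectors.EVENT_READ, data=idx)

        # Make only a handful of the sockets readable.
        for idx in writers_to_poke:
            pairs[idx][1].send(b"x")

        # The wait reports just the descriptors that have data waiting.
        ready = {key.data for key, _events in sel.select(timeout=1.0)}
        return sorted(ready)
    finally:
        sel.close()
        for reader, writer in pairs:
            reader.close()
            writer.close()
```

The cost of each wait then tracks the number of active sockets rather than the number of open ones, which is the property a tester with tens of thousands of mostly idle connections needs.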

Something I did at Netflix was to start building TCP testing tools that could handle more than 65,000 concurrent sockets. My aim is much higher, but one has to start somewhere. There I was testing out the network stack rather than specifically doing HTTP testing. Here at my current job, I'm much more interested in real HTTP and all of the processing that goes with it.

I looked at what's out there, and it's not very pretty. I need to be able to do 10G of traffic, looking upwards towards 20G and 40G of HTTP in the future. After a little more digging into what was out there - and finding httperf actually reverted my changes to use libevent and went back to poll/select! - I decided it was about time I just started writing something minimal to stress test things and build upon it as the need arose. I want something that eventually ends up like web polygraph - multiple client/server sets with different URL choices from a pool, a variety of client IP addresses, and other things like how often to make the requests and other request pacing.
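For a feel of what those numbers mean in requests, here is a back-of-the-envelope calculation. The response sizes are hypothetical and the function ignores HTTP header, TCP and Ethernet overhead, so treat the results as rough upper bounds on payload rate rather than measurements.

```python
def requests_per_second(link_gbps, response_bytes):
    """Requests/sec needed to fill a link with response payload alone."""
    bits_per_response = response_bytes * 8
    return (link_gbps * 1_000_000_000) / bits_per_response


# Hypothetical response sizes, purely for illustration.
small = requests_per_second(10, 1000)    # 1250000.0 requests/sec
large = requests_per_second(10, 65536)   # roughly 19,000 requests/sec
```

Small responses push the problem towards request rate; large ones push it towards raw byte copying, which is where the memcpy() overhead mentioned later starts to matter.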

So, I grabbed libevent and Mark Ellzey's libevhtp and threw them together. It turned out okish - libevent/libevhtp still does a bunch of memcpy()'ing inside the buffer management routines that makes 40G on one box infeasible at the moment, but it's good enough to get a few gigabits of client traffic on one core. There were some hiccups which I'll cover below, but it's good enough to build upon.

What did I learn?
  • Well, it turns out the client code in libevhtp was a bit immature. Mark and I talked a bit about it on IRC and then I found there was an outstanding pull request that found and fixed a bunch of the client bugs. So, my code has turned into another thing - a libevhtp client and server test suite.
  • The libevhtp threading model is fine for a couple of CPUs, but it's the standardish *NIX model of "one thread does accept, farms work off to other threads." So it's not going to scale well at high request rates to multiple CPUs. That's cool; that's what the FreeBSD-HEAD RSS work is for.
  • There's memcpy()'ing in the libevhtp body handling code. It's not a big deal at 1G, but at 10G it's definitely noticeable. I've spoken to Mark about it.

But, it's a good starting point. Once the rest of the bugs get shaken out, it'll be a good high throughput HTTP traffic tester.
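For anyone who hasn't met the "one thread does accept, farms work off to other threads" model, here is a rough sketch of its shape in Python. It is not libevhtp's code; the function name, the queue hand-off and the canned reply are all assumptions made for illustration.

```python
import queue
import socket
import threading


def run_accept_and_dispatch(num_workers, num_clients):
    """One thread accepts; worker threads take connections from a shared queue."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(num_clients)
    port = listener.getsockname()[1]

    work = queue.Queue()  # the single hand-off point every connection crosses
    handled_by = []
    lock = threading.Lock()

    def acceptor():
        for _ in range(num_clients):
            conn, _addr = listener.accept()
            work.put(conn)
        for _ in range(num_workers):
            work.put(None)  # one sentinel per worker tells it to exit

    def worker(worker_id):
        while True:
            conn = work.get()
            if conn is None:
                return
            with conn:
                conn.recv(1024)  # read (and ignore) the small request
                conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")
            with lock:
                handled_by.append(worker_id)

    threads = [threading.Thread(target=acceptor)]
    threads += [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()

    # Drive the server with a few sequential clients on loopback.
    replies = []
    for _ in range(num_clients):
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(b"GET / HTTP/1.0\r\n\r\n")
            chunks = []
            while True:
                data = client.recv(1024)
                if not data:
                    break
                chunks.append(data)
            replies.append(b"".join(chunks))

    for t in threads:
        t.join()
    listener.close()
    return replies, handled_by
```

Every new connection goes through the one acceptor and the one queue, so adding workers past a certain point mostly adds contention on that hand-off. Spreading accepts across CPUs, as the RSS work aims to do, removes that single choke point.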

What would I do next, after the bugs?

  • The server will eventually grow the ability to generate controllable sized responses. That way the client can control how big a response the server sends back and thus can create a mix of requests/replies.
  • .. and HTTP request body testing would be nice.
  • The client side needs to grow the ability to create client pools, like web polygraph, where certain subsets of clients get certain behaviours (like a pool of IPs to use, separate pool of URLs to fetch from, the time between each HTTP request, etc.)
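To sketch what such client pools might look like as data, here is a small Python example. Everything in it is hypothetical: the `ClientPool` fields, the `build_schedule` helper, the documentation-range IP addresses and the example.com URLs. It only plans requests; it doesn't send anything.

```python
import random
from dataclasses import dataclass


@dataclass
class ClientPool:
    """A group of simulated clients that share one behaviour (hypothetical)."""
    name: str
    source_ips: list         # addresses this pool's requests appear to come from
    urls: list               # URLs this pool fetches from
    request_interval: float  # seconds between requests issued by the pool


def build_schedule(pools, duration, rng):
    """Return (time, pool name, source IP, URL) tuples ordered by time."""
    events = []
    for pool in pools:
        tick = 0
        while tick * pool.request_interval < duration:
            events.append((
                tick * pool.request_interval,
                pool.name,
                rng.choice(pool.source_ips),
                rng.choice(pool.urls),
            ))
            tick += 1
    events.sort(key=lambda event: event[0])
    return events


pools = [
    ClientPool("fast", ["192.0.2.1", "192.0.2.2"],
               ["http://example.com/small"], 0.5),
    ClientPool("slow", ["198.51.100.1"],
               ["http://example.com/big", "http://example.com/huge"], 2.0),
]
schedule = build_schedule(pools, duration=4.0, rng=random.Random(0))
```

A real client would walk the schedule and open each connection from the chosen address, for example by passing it as `source_address` to `socket.create_connection`, which only works if the host is actually able to use that address.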

The other trick is how to simulate lots (and I do mean lots) of IP addresses. I don't want to create separate loopback connections for each - that would be crazy. Instead, it'd be good to use the transparent interception support in FreeBSD IPFW that allows both connections from and connections to arbitrary IP addresses. A little trickery with IP routing so we don't need more than 1 ARP entry for each server and voila!

Oh, and the code?

https://github.com/erikarn/libevhtp-http/

Follow FreeBSD on Twitter and send tweets from FreeBSD command line

It just came to my attention that Eric Anderson set up a FreeBSD feed on twitter. There you can find updates from the FreeBSD website, from the blogs aggregated at FreeBSD Planet, and other FreeBSD-related RSS feeds published as 140 character tweets with tinyurl links to the full posts. I've been using twitter for a while now for two quite separate purposes. Primarily, I enjoy following people like Tim O'Reilly to get an endless stream of interesting tech links, ideas, and thoughts throughout the day. The updates are 140 characters or less and I only click through to those that I have time for, so I find it less of a time sink than logging into my feedreader (Google Reader) and really digging into the news I'm interested in. I also find it quite useful for arranging social engagements. I use it as an SMS broadcast medium to make plans and arrange to meet up with friends for dinner, drinks, movies, or whatever after work. For the latter purpose Twitter works best in conjunction with a GPS-enabled smartphone and something like Loopt.

Following Eric's lead I set up a couple of more specific FreeBSD-related twitter accounts using Twitter Feed to automatically publish the updates from RSS. The first account, freebsdannounce, consists of all the RSS feeds from the main www.freebsd.org website (most of which I added almost exactly one year ago). The second account, freebsdblogs, consists of the FreeBSD Planet combined RSS feed. If you want everything, subscribe to Eric's main FreeBSD feed, but if you want only a subset of that content, subscribe to one of my two more specific feeds.

Finally, I couldn't find a way to make simple updates to twitter from the base FreeBSD system command line, so I created a patch for very basic HTTP POST support for fetch. Apply this patch, rebuild and reinstall libfetch(3) and fetch(1), and then you can update twitter from the command line (or send a simple POST request to other web services) with:

$ fetch -x status='Experimenting with Twitter API.' http://twitter.com/statuses/update.xml

fetch(1) will then prompt you for the HTTP authentication credentials of your twitter account.
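For readers curious what a request like that amounts to, here is a sketch that builds (but does not send) a comparable POST with Python's standard library. It assumes the `-x` argument ends up as a form-encoded body and that the credentials go out as HTTP Basic authentication; the post doesn't spell out either detail, and the function name and the sample username and password are made up.

```python
import base64
import urllib.parse
import urllib.request


def build_status_post(url, status, username, password):
    """Build a form-encoded POST carrying HTTP Basic credentials."""
    # Assumption: the status is sent as an application/x-www-form-urlencoded body.
    body = urllib.parse.urlencode({"status": status}).encode("ascii")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    request.add_header("Authorization", "Basic " + token)
    return request


request = build_status_post(
    "http://twitter.com/statuses/update.xml",
    "Experimenting with Twitter API.",
    "user",      # hypothetical credentials
    "secret",
)
```

Passing the returned object to `urllib.request.urlopen` would actually send it; the sketch stops short of that on purpose.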

I'm not sure how useful other people find HTTP POST support in fetch. If you would find this useful let me know and maybe I'll clean up the patch above and send it out for review.