Background: I’m developing something that should eventually become a high-performance network server with a high transaction rate (basically a database cache). Currently I’m experimenting with various ways of using SMP on the server (threading models, CPU binding, etc.). A big problem is that, although I temporarily have access to a server to test on, I don’t have a client machine that could push the server to its limits. I currently have a “dumb” multithreaded benchmark client that spawns N threads (N is 40 in these tests), each of which is a blocking network client (i.e. one thread per connection). Run over local Unix sockets on the server itself, this setup achieves 125,000 trans/s, but I believe the result would be much better if the client weren’t competing with the server for its CPUs.
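For readers unfamiliar with the “one blocking thread per connection” design, here is a minimal sketch of such a client, not the actual benchmark code. Assumptions: requests and replies are a hypothetical fixed 64-byte size, each connection is a `socketpair()` rather than a real network or Unix-socket connection to the server, and a stand-in echo thread plays the server so the example runs on its own. Each client thread sends one request, blocks until the reply arrives, and repeats; the total count divided by the elapsed time gives trans/s.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define MSG_SIZE 64 /* hypothetical fixed request/reply size */

/* Read or write exactly len bytes on a blocking socket; 0 on success. */
static int io_full(int fd, void *buf, size_t len, int do_write)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = do_write ? write(fd, p, len) : read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Stand-in "server": echoes each request until the client closes. */
static void *echo_server(void *arg)
{
    int fd = *(int *)arg;
    char buf[MSG_SIZE];
    while (io_full(fd, buf, sizeof buf, 0) == 0)
        if (io_full(fd, buf, sizeof buf, 1) != 0)
            break;
    close(fd);
    return NULL;
}

struct client_arg {
    int fd;    /* this thread's own connection */
    int txns;  /* transactions to perform */
    long done; /* transactions completed */
};

/* One blocking client thread: send a request, wait for the reply, repeat. */
static void *client(void *arg)
{
    struct client_arg *c = arg;
    char buf[MSG_SIZE];
    memset(buf, 'x', sizeof buf);
    for (int i = 0; i < c->txns; i++) {
        if (io_full(c->fd, buf, sizeof buf, 1) != 0 ||
            io_full(c->fd, buf, sizeof buf, 0) != 0)
            break;
        c->done++;
    }
    close(c->fd); /* EOF makes the echo thread exit */
    return NULL;
}

/* Runs nthreads clients; returns total transactions, stores wall time. */
long run_benchmark(int nthreads, int txns, double *elapsed)
{
    assert(nthreads > 0 && txns >= 0);
    pthread_t *cl = calloc((size_t)nthreads, sizeof *cl);
    pthread_t *sv = calloc((size_t)nthreads, sizeof *sv);
    int *sfd = calloc((size_t)nthreads, sizeof *sfd);
    struct client_arg *args = calloc((size_t)nthreads, sizeof *args);
    if (!cl || !sv || !sfd || !args) {
        perror("calloc");
        exit(1); /* a sketch: no partial cleanup */
    }
    for (int i = 0; i < nthreads; i++) {
        int pair[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) != 0) {
            perror("socketpair");
            exit(1);
        }
        args[i] = (struct client_arg){ pair[0], txns, 0 };
        sfd[i] = pair[1];
        pthread_create(&sv[i], NULL, echo_server, &sfd[i]);
    }
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&cl[i], NULL, client, &args[i]);
    for (int i = 0; i < nthreads; i++)
        pthread_join(cl[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    long total = 0;
    for (int i = 0; i < nthreads; i++) {
        pthread_join(sv[i], NULL);
        total += args[i].done;
    }
    *elapsed = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(cl); free(sv); free(sfd); free(args);
    return total;
}
```

Calling `run_benchmark(40, 10000, &secs)` mirrors the N = 40 setup; the rate is the returned total divided by `secs`. Because every thread spends most of its time blocked waiting for a reply, throughput depends heavily on how cheaply the OS switches between threads, which is one plausible reason the same client behaves differently on different kernels and threading libraries.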
Marko Zec helped me with this by temporarily providing a machine that dual-boots 7.x and 4.x, each both with and without his VIMAGE patches. At first I used only the 7.0 system and got something like 62,000 trans/s, which is too low for me. At his insistence I booted 4.11 and ported the client-side benchmark to it. With no significant modifications except those needed for the difference between gcc 2.9x and gcc 4, the same client code rocketed to 81,000 trans/s, roughly 30% more! This is with libc_r, meaning all 40 threads are visible to the kernel as a single process (4.x has no kernel support for multithreading)! This number is still too low, and I’ll probably need to find several machines that can run at the same time to saturate the server (which will be very hard), but the raw difference between 4.x and 7.x alone is staggering. The network card is a gigabit bge, connected directly to the server with a crossover cable.
On the bright side, the VIMAGE patches don’t noticeably affect performance.