On Facebook, a member of the Delphi group posted a question asking whether there were any benchmarks comparing RAD Studio with other middleware frameworks.
One of the respondents replied with a link to a recently created benchmark framework on GitHub. It includes a number of functionally equivalent HTTP servers written using different frameworks, and a test client written in Node.js, which appears to be able to performance test the various HTTP servers in two different modes (blank and work) with 1, 100 and 10000 concurrent connections.
The test framework can be found here: https://github.com/d-mozulyov/NetBenchmarks
It contains a number of precompiled HTTP servers that can be tested:
- Indy.HTTP.exe (64 bit Delphi, based on Indy HTTP server)
- IndyPool.HTTP.exe (64 bit Delphi, variant based on Indy HTTP server)
- TMSSparkle.HTTP.exe (64 bit Delphi, based on TMS Sparkle)
- RealThinClient.HTTP (64 bit Delphi, based on RTC)
- Synopse.HTTP (64 bit Delphi, based on Synopse/Mormot)
- Node.HTTP (Node.js based)
- Golang.HTTP (Golang based)
Core source is available for all of them, although it does not include the third-party libraries. Having the executables makes it possible to benchmark them on various equipment, and having the source provides insight into what is actually being benchmarked.
Further, the source makes it possible to create other precompiled HTTP servers using other frameworks, for example kbmMW. So obviously that had to happen 🙂
So we can add:
- kbmMW.HTTP (64 bit Delphi, based on kbmMW)
to the list. Source and precompiled executable can be downloaded here.
From the outset, all servers provide an HTTP web interface running on port 1234. Depending on whether the server application is started with a parameter of 1 or not, it will run in Work or in Blank mode.
- Work mode accepts a JSON document provided by the benchmark client, and returns another JSON document to the client.
- Blank mode simply returns the text ‘OK’.
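To make the two modes concrete, here is a minimal sketch of such a server in Python. It is not taken from the NetBenchmarks source: the mapping of parameter 1 to Work mode is an assumption read from the sentence above, and the shape of the returned JSON document (`received`, `fields`) is purely hypothetical, since the actual work performed by the benchmark servers is not described here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_response(work_mode, body):
    """Return (content_type, payload) for a single request."""
    if not work_mode:
        # Blank mode: always answer with the plain text OK.
        return "text/plain", b"OK"
    # Work mode: parse the client's JSON document and answer with another one.
    doc = json.loads(body or b"{}")
    # Hypothetical "work": echo the document and count its top-level fields.
    reply = {"received": doc, "fields": len(doc) if isinstance(doc, dict) else 0}
    return "application/json", json.dumps(reply).encode("utf-8")


def make_handler(work_mode):
    class Handler(BaseHTTPRequestHandler):
        def _reply(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length) if length else b""
            content_type, payload = build_response(work_mode, body)
            self.send_response(200)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        do_GET = _reply
        do_POST = _reply

        def log_message(self, *args):
            pass  # keep the console quiet while benchmarking

    return Handler


def main(argv):
    # Assumption: a first parameter of "1" selects Work mode, anything else Blank.
    work_mode = len(argv) > 1 and argv[1] == "1"
    HTTPServer(("127.0.0.1", 1234), make_handler(work_mode)).serve_forever()
```

Calling `main(sys.argv)` starts the sketch on port 1234; it is only meant to illustrate the request/response contract, not to compete in the benchmark.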
After installing Node.js from https://nodejs.org/en/download/ and having downloaded and extracted the NetBenchmarks toolset from GitHub, it is possible to run the benchmark.
I ran the benchmark on my sturdy AMD Zen 1 1950X with 16 cores able to run 32 threads, typically at 3.6 GHz, 32 GB RAM, some heavy-lifting NVIDIA Titan V graphics cards, and a combination of HDD and M.2 SSD drives.
The machine is used for many other things, and as such the benchmark results will vary with each run, depending on other load on the machine, although I have tried to minimize it. Roughly 15% of the combined CPU time of all the cores was in use by background operations.
Further, the absolute numbers will look very different if run on other equipment and CPU architectures.
Benchmarking is difficult, and the best benchmark is to run an application with the real features on the actual production hardware. Only there will you find the true performance of that particular combination.
One thing that can offset or skew a benchmark significantly is the benchmarking software itself. It is coded in a certain way, with certain assumptions, and will thus only benchmark that particular way of coding, which may or may not say anything about the true performance of the tested servers. I will get more into that in a bit.
However, let us run it anyway. Benchmarking is fun.
Running the NetBenchmarks benchmark
It is as simple as running this bat file: benchmark.HTTP.bat
It will in turn run 6 tests on each of the HTTP servers: 1 connection Blank, 1 connection Work, 100 connections Blank, 100 connections Work, 10000 connections Blank and 10000 connections Work.
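The six tests are simply every combination of connection count and mode. As a small illustration (the names `CONNECTIONS`, `MODES` and `build_test_matrix` are mine, not from the batch file), the matrix can be generated like this:

```python
from itertools import product

CONNECTIONS = (1, 100, 10000)
MODES = ("blank", "work")


def build_test_matrix():
    """Return the test labels in the order the benchmark runs them."""
    return [f"{connections} / {mode}" for connections, mode in product(CONNECTIONS, MODES)]


for label in build_test_matrix():
    print(label)
```

The labels match the row names used in the results table.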
The result is output on the console. I have put it into a spreadsheet and a graph.
|Request / second|
|1 / blank||354||319||334||346||347||336||365||365|
|1 / work||353||313||350||337||377||359||364||353|
|100 / blank||1926||1072||410||1613||1852||617||1838||1890|
|100 / work||1908||612||175||1054||1763||326||1779||1776|
|10000 / blank||1000||944||908||970||1054||1073||970||984|
|10000 / work||912||848||890||830||1014||1012||907||1000|
That gives us some interesting numbers and graphs. As can be seen the Node.js and Golang HTTP servers generally fare quite well in all tests. That is expected, based on what I have heard from communities around those platforms/frameworks.
But what is more interesting is that many of the Delphi servers are keeping pace and even beating Node.js and Golang in all test cases except for 1/blank.
What is obviously also interesting is that kbmMW is doing quite well, but is being beaten by Synopse/Mormot and TMS Sparkle in the 10000 connection tests.
I ran all the tests multiple times, noting the best numbers for each of the servers to eliminate poor outliers due to background CPU usage, and thus try to make the test fair for all.
However, I noticed that in some situations the benchmark produced only 25% of the usual performance, specifically on the 1 connection tests. When that happened, I could rerun the test multiple times and it would remain stable at 25% performance.
I checked the background CPU load and there were no outliers due to that, nor was there paging or anything similar.
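Keeping the best number per server across several runs can be sketched as follows. The helper name `best_of` and the sample figures in the usage example are made up for illustration; they are not measurements from this article.

```python
def best_of(runs):
    """Given a list of {server: requests_per_second} dicts, keep the best per server."""
    best = {}
    for run in runs:
        for server, rps in run.items():
            best[server] = max(rps, best.get(server, rps))
    return best


# Made-up sample data, only to show the shape of the input.
sample_runs = [
    {"ServerA": 300, "ServerB": 340},
    {"ServerA": 350, "ServerB": 120},
    {"ServerA": 310, "ServerB": 345},
]
print(best_of(sample_runs))
```

Taking the maximum rather than the mean is a deliberate choice here: background load can only slow a run down, so the best run is the one least disturbed by it.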
I also found the number of requests per second to be quite small (350ish). Knowing kbmMW I would have expected a much higher number, so something does seem fishy with the Node.js client benchmark.
It turns out that the Node.js client benchmark does create the requested number of connections, but it never has a request executing on all of them at the same time. Well… with a limited number of cores that will always be the case, but Node.js seems to limit it further. I suspected the client benchmark code to be a bottleneck in itself, not really providing the means to fully exercise the HTTP servers.
So I dug out an old HTTP stress test client I had made long ago, because it should be very useable in this benchmark scenario.
Stress test benchmark
It allocates one thread for each connection under test. That will obviously oversubscribe the CPU cores with threads that cannot technically all run at the exact same time, but since the CPU and Windows time-slice execution between the threads, all threads get a reasonably fair share of time, thus to an extent mimicking a CPU with an endless number of cores.
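The thread-per-connection idea can be sketched in Python as below. This is not the stress test client used for the numbers in this article; `run_stress`, `http_connect` and their parameters are hypothetical names, and the sketch only assumes a server answering GET requests on 127.0.0.1:1234.

```python
import http.client
import threading
import time


def http_connect(host="127.0.0.1", port=1234):
    """Open one keep-alive connection and return a callable that sends one request."""
    conn = http.client.HTTPConnection(host, port)

    def request():
        conn.request("GET", "/")
        conn.getresponse().read()

    return request


def run_stress(connections, calls_per_connection, connect=http_connect):
    """Run one thread per connection; return (total_calls, requests_per_second)."""
    completed = [0] * connections
    # All workers plus the main thread meet here, so setup time is not measured.
    start_gate = threading.Barrier(connections + 1)

    def worker(index):
        request = connect()  # each thread owns its own connection
        start_gate.wait()
        for _ in range(calls_per_connection):
            request()
            completed[index] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(connections)]
    for thread in threads:
        thread.start()
    start_gate.wait()
    started = time.perf_counter()
    for thread in threads:
        thread.join()
    elapsed = time.perf_counter() - started
    total = sum(completed)
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

For example, `run_stress(100, 1000)` would issue 100000 requests over 100 threads. The `connect` parameter is injectable so the loop can be exercised without a running server.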
I ran a number of tests and I ran them a number of times, to try to eliminate poor outlier readings.
Being a kbmMW kind of guy, I included some variants of tests for specifically kbmMW. They concern the number of worker threads used by the kbmMW TCP transport layer, and whether the HTTP header should be fully parsed or not.
The latter is an option that I decided to add, because it seems to me that several if not all of the other solutions only parse a minimum of the header information each time, and leave it up to the developer to parse the rest if needed.
As kbmMW by default always provides fully parsed headers to the developer, this was somewhat unfair to kbmMW. All tests were nevertheless run with full header parsing, plus some extra with full header parsing turned off.
I decided to run the tests on the Blank mode of the servers only, and every test would make a total of 100000 calls to each server, spread across 1, 10, 50, 100, 200 and 1000 concurrent connections (and threads).
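Spreading a fixed total of calls across the connections means each connection repeats total / connections requests. A small sketch of that arithmetic (the names are mine):

```python
TOTAL_CALLS = 100_000
CONNECTION_COUNTS = (1, 10, 50, 100, 200, 1000)


def repetitions_per_connection(total_calls, connections):
    """Number of requests each connection must issue to reach total_calls."""
    if total_calls % connections:
        raise ValueError("total_calls must divide evenly across the connections")
    return total_calls // connections


plan = {c: repetitions_per_connection(TOTAL_CALLS, c) for c in CONNECTION_COUNTS}
for connections, repetitions in plan.items():
    print(f"{connections} / {repetitions}")
```

These pairs are the "Connections / Repetitions" labels in the first column of the table.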
|Request / second|
|Connections / Repetitions||kbmMW A||kbmMW B||kbmMW C||kbmMW D||Indy||IndyPool||RTC||Synopse||TMS Sparkle||Node.js||Golang|
|1 / 100000||4031||1010||1357||4853||4640||3946||4280||4709|
|10 / 10000||5932||6189||1303||680||5249||5230||4065||4715||5146|
|50 / 2000||6744||955||344||5563||5682||3881||5025||5392|
|100 / 1000||7180||1406||907||6052||5508||3266||4916||5417|
|200 / 500||6693||6268||5554||827||5197||5671|
As the numbers show, all the servers were able to provide a significantly higher throughput, as long as the client was able to feed the servers with requests.
The kbmMW tests were made in 4 different variations:
- kbmMW A : 6 worker I/O threads, headers fully parsed (default for all tests)
- kbmMW B : 4 worker I/O threads, headers fully parsed
- kbmMW C : 8 worker I/O threads, headers fully parsed
- kbmMW D : 8 worker I/O threads, headers only partially parsed
No doubt the servers and the clients are competing for the existing cores and physical threads. However, all the servers suffer the same penalty from the thread contention, so the results should be comparable.
I have no doubt that moving the test client to another machine would potentially provide even better numbers for several of the frameworks, especially at 100+ concurrent connections, simply because the client would no longer pollute the server benchmark machine with running client threads. On the other hand, additional overhead is to be expected from not being able to use the local loopback network, and instead having to traverse a couple of network cards and a cable.
It is a myth that Node.js and Golang are faster for servers than building one with Delphi. No doubt Node.js and Golang provide some very respectable numbers, but so do several of the Delphi solutions, notably RTC, Synopse (Mormot) and kbmMW, which took the crown in these benchmarks.