Preface

On Facebook, a member of the Delphi Group posted a question asking whether there were any benchmarks comparing RAD Studio with other middleware frameworks.

One of the respondents replied with a link to a recently created benchmark framework on GitHub. It includes a number of functionally equivalent HTTP servers written with different frameworks, plus a test client written in Node.js. The client can seemingly performance test the various HTTP servers in two modes (blank and work), with 1, 100 and 10000 concurrent connections.

The test framework can be found here: https://github.com/d-mozulyov/NetBenchmarks

It contains a number of precompiled HTTP servers that can be tested:

  • Indy.HTTP.exe (64 bit Delphi, based on Indy HTTP server)
  • IndyPool.HTTP.exe (64 bit Delphi, variant based on Indy HTTP server)
  • TMSSparkle.HTTP.exe (64 bit Delphi, based on TMS Sparkle)
  • RealThinClient.HTTP (64 bit Delphi, based on RTC)
  • Synopse.HTTP (64 bit Delphi, based on Synopse/Mormot)
  • Node.HTTP (Node.js based)
  • Golang.HTTP (Golang based)

The core source is available for all of them, although the third-party libraries are not included. Having the executables makes it possible to benchmark them on various hardware, which is fine. Having the source provides insight into what is actually being benchmarked, which is also good.

Furthermore, the source makes it possible to create precompiled HTTP servers using other frameworks, for example kbmMW. So obviously that had to happen 🙂

So we can add:

  • kbmMW.HTTP (64 bit Delphi, based on kbmMW)

to the list. Source and precompiled executable can be downloaded here.

Out of the box, all servers provide an HTTP interface on port 1234. If the server application is started with a parameter of 1, it runs in Work mode; otherwise it runs in Blank mode.

  • Work mode accepts a JSON document provided by the benchmark client, and returns another JSON document to the client.
  • Blank mode simply returns the text ‘OK’.
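The two modes can be illustrated with a small server built on Python's standard http.server module. This is a sketch under stated assumptions, not the code of any of the benchmarked servers. The Work-mode transformation is a hypothetical placeholder that wraps the received document, because the real request and response documents are defined by the JSON files shipped with NetBenchmarks. Port 1234 and the "1" parameter follow the convention described above, while the names make_handler, serve and main are invented for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_handler(work_mode: bool):
    """Build a request handler class for Blank mode or Work mode."""

    class BenchHandler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"  # allow keep-alive connections

        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            if work_mode:
                # Hypothetical work: decode the JSON document and answer with another one.
                document = json.loads(body or b"{}")
                reply = json.dumps({"received": document}).encode("utf-8")
                content_type = "application/json"
            else:
                reply = b"OK"
                content_type = "text/plain"
            self.send_response(200)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(reply)))
            self.end_headers()
            self.wfile.write(reply)

        do_GET = do_POST  # accept GET as well, e.g. for Blank-mode probes

        def log_message(self, format, *args):
            pass  # per-request console logging would distort throughput numbers

    return BenchHandler


def serve(port: int = 1234, work_mode: bool = False) -> ThreadingHTTPServer:
    """Start a server on a background thread and return it (port 0 = any free port)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(work_mode))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def main(argv) -> None:
    """Follow the convention above: a first argument of "1" selects Work mode."""
    work_mode = len(argv) > 1 and argv[1] == "1"
    server = ThreadingHTTPServer(("127.0.0.1", 1234), make_handler(work_mode))
    print(("Work" if work_mode else "Blank") + " mode on port 1234")
    server.serve_forever()
```

Calling main(sys.argv) from a script runs the server in the foreground. serve(0, ...) binds any free port in the background, which is convenient for automated checks.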

After installing Node.js from https://nodejs.org/en/download/ and having downloaded and extracted the NetBenchmarks toolset from GitHub, it is possible to run the benchmark.

Benchmark conditions

I ran the benchmark on my sturdy AMD Zen 1 1950X with 16 cores and 32 hardware threads, typically running at 3.6 GHz, along with 32 GB RAM, some heavy-lifting NVIDIA Titan V graphics cards, and a combination of HDD and M.2 SSD drives.

The machine is used for many other things, so the benchmark results vary from run to run depending on other load on the machine, although I have tried to minimize it. Roughly 15% of the combined CPU time across all cores was spent on background operations.

Furthermore, the absolute numbers will look very different on other hardware and CPU architectures.

Benchmarking is difficult, and the best benchmark is to run the application with its real features on the actual production hardware. Only there will you find the true performance of that particular combination.

One thing that can significantly skew a benchmark is the benchmarking software itself. It is written in a certain way, with certain assumptions, and will therefore only benchmark that particular usage pattern. That may or may not say anything about the true performance of the tested servers. I will get more into that in a bit.

However, let us run it anyway. Benchmarking is fun.

Running the NetBenchmarks benchmark

It is as simple as running the batch file benchmark.HTTP.bat.

It runs 6 tests against each of the HTTP servers: 1 connection Blank, 1 connection Work, 100 connections Blank, 100 connections Work, 10000 connections Blank and 10000 connections Work.
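The full run plan is simply the cross product of servers, connection counts and modes. The sketch below is a hypothetical illustration and does not reproduce the contents of benchmark.HTTP.bat, which is not shown here. Server names come from the list above, including the kbmMW.HTTP server added in this article.

```python
from itertools import product

# Names follow the server list above; the real batch file may name or order them differently.
SERVERS = ("kbmMW.HTTP", "Indy.HTTP", "IndyPool.HTTP", "RealThinClient.HTTP",
           "Synopse.HTTP", "TMSSparkle.HTTP", "Node.HTTP", "Golang.HTTP")
CONNECTIONS = (1, 100, 10000)
MODES = ("blank", "work")


def run_plan():
    """All (server, connections, mode) runs: 6 per server, in the order listed above."""
    return list(product(SERVERS, CONNECTIONS, MODES))
```

With 8 servers this gives 48 runs. Each of those runs is then repeated to weed out disturbed readings.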

The results are written to the console. I have put them into a spreadsheet and a graph.

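Requests / second
Connections / mode | kbmMW | Indy | IndyPool | RTC | Synopse | TMS Sparkle | Node.js | Golang
1 / blank | 354 | 319 | 334 | 346 | 347 | 336 | 365 | 365
1 / work | 353 | 313 | 350 | 337 | 377 | 359 | 364 | 353
100 / blank | 1926 | 1072 | 410 | 1613 | 1852 | 617 | 1838 | 1890
100 / work | 1908 | 612 | 175 | 1054 | 1763 | 326 | 1779 | 1776
10000 / blank | 1000 | 944 | 908 | 970 | 1054 | 1073 | 970 | 984
10000 / work | 912 | 848 | 890 | 830 | 1014 | 1012 | 907 | 1000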
NetBenchmarks result

That gives us some interesting numbers and graphs. As can be seen, the Node.js and Golang HTTP servers generally fare quite well in all tests. That is expected, based on what I have heard from the communities around those platforms.

More interesting, however, is that several of the Delphi servers keep pace with, and even beat, Node.js and Golang in every test case except 1/blank.

Also interesting is that kbmMW does quite well, but is beaten by Synopse/Mormot and TMS Sparkle in the 10000 connection tests.

I ran all the tests multiple times and noted the best number for each server. This eliminates poor outliers caused by background CPU usage and keeps the test as fair as possible for all.
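Keeping the best of several runs is a one-line reduction per server. The sketch below assumes results are collected as lists of requests/second per server; the function names and the numbers in the check are made up for illustration. A median is included only for comparison, as a summary that a single unusually fast run cannot move.

```python
from statistics import median
from typing import Dict, List


def best_of_runs(runs: Dict[str, List[float]]) -> Dict[str, float]:
    """Highest requests/second per server across repeated runs.

    Background load mostly slows a run down, so the maximum is the reading
    least disturbed by it. Servers with no recorded runs are skipped.
    """
    return {server: max(values) for server, values in runs.items() if values}


def median_of_runs(runs: Dict[str, List[float]]) -> Dict[str, float]:
    """Alternative summary that is less sensitive to a single lucky run."""
    return {server: median(values) for server, values in runs.items() if values}
```

Best-of-N matches the approach described above. Its weakness is that it hides how unstable a server is from run to run, which the median or the spread would reveal.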

However, I noticed that in some situations the benchmark produced only 25% of the usual performance, specifically in the 1 connection tests. When that happened, I could rerun the test multiple times and it would stay stable at 25% performance.

I checked the background CPU load, and there were no outliers caused by it, nor by paging or anything similar.

I also found the number of requests per second to be quite low (around 350). Knowing kbmMW, I would have expected a much higher number, so something seems fishy with the Node.js benchmark client.

It turns out that the Node.js benchmark client does create the requested number of connections, but it never executes requests on all of them at the same time. Well… with a limited number of cores that will always be the case, but Node.js seems to limit it further. I suspected the client benchmark code of being a bottleneck in itself, unable to really exercise the HTTP servers.
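A rough way to check that suspicion is Little's law. For a client that waits for each response, throughput equals the number of requests in flight divided by the round-trip latency, so the number of open connections only matters if requests are actually outstanding on them. The helpers below are hypothetical and only do that arithmetic. Around 350 requests/second on 1 connection implies roughly 2.9 ms per round trip. At that latency, 100 fully used connections would allow about 35000 requests/second, whereas roughly 1900 requests/second corresponds to only about 5 requests outstanding on average. Latency certainly grows under load, so this overstates the gap, but it is consistent with the client rather than the servers setting the pace.

```python
def implied_latency_ms(requests_per_second: float, in_flight: float = 1.0) -> float:
    """Little's law solved for latency: W = L / throughput, returned in milliseconds."""
    return in_flight / requests_per_second * 1000.0


def effective_in_flight(requests_per_second: float, latency_ms: float) -> float:
    """Little's law: average number of outstanding requests L = throughput * W."""
    return requests_per_second * latency_ms / 1000.0


one_conn_latency = implied_latency_ms(350)                    # about 2.86 ms per round trip
busy_in_flight = effective_in_flight(1900, one_conn_latency)  # about 5.4 requests outstanding
```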

So I dug out an old HTTP stress test client I had made long ago, because it should be very usable in this benchmark scenario.

Stress test benchmark

It allocates one thread for each connection to test. It obviously also oversubscribes the CPU cores with threads that cannot all run at exactly the same time. However, since Windows time-slices execution between the threads, every thread gets a reasonably fair share of time, which to an extent mimics a CPU with an endless number of cores.
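The author's stress client is not shown. The following is a minimal Python sketch of the same thread-per-connection design, using only http.client and threading from the standard library, and the function name stress is hypothetical. Each thread owns one keep-alive connection and issues its requests sequentially. A barrier releases all threads together once every connection is open, so connection setup is left out of the timing. Python threads release the GIL while blocked on sockets, so the design works for illustration, but a native client is needed for numbers comparable to those in the tables.

```python
import http.client
import threading
import time
from typing import List


def stress(host: str, port: int, connections: int, repetitions: int, path: str = "/") -> float:
    """Thread-per-connection load test; returns overall requests per second.

    Each thread opens one keep-alive connection and issues `repetitions`
    sequential GET requests on it. A barrier holds every thread until all
    connections are open, so connection setup is excluded from the timing.
    """
    gate = threading.Barrier(connections + 1)  # all workers plus the timing thread
    failures: List[BaseException] = []
    failures_lock = threading.Lock()

    def worker() -> None:
        conn = http.client.HTTPConnection(host, port, timeout=30)
        try:
            conn.connect()
            gate.wait()
            for _ in range(repetitions):
                conn.request("GET", path)
                response = conn.getresponse()
                response.read()  # drain the body so the connection can be reused
                if response.status != 200:
                    raise RuntimeError(f"unexpected HTTP status {response.status}")
        except Exception as exc:
            gate.abort()  # never leave the other threads waiting on a broken run
            with failures_lock:
                failures.append(exc)
        finally:
            conn.close()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(connections)]
    for thread in threads:
        thread.start()
    try:
        gate.wait()
    except threading.BrokenBarrierError:
        pass  # a worker failed before the start; the error is reported below
    started = time.perf_counter()
    for thread in threads:
        thread.join()
    elapsed = time.perf_counter() - started
    if failures:
        raise RuntimeError(f"{len(failures)} of {connections} workers failed: {failures[0]!r}")
    return connections * repetitions / elapsed
```

For example, stress("127.0.0.1", 1234, connections=100, repetitions=1000) would make 100000 calls against a server on the benchmark port.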

I ran a number of tests, each of them several times, to eliminate poor outlier readings.

Being a kbmMW kind of guy, I included some test variants specifically for kbmMW. They concern the number of worker threads used by the kbmMW TCP transport layer, and whether the HTTP header should be fully parsed or not.

The latter is an option I decided to add. It seems to me that several, if not all, of the other solutions parse only a minimum of the header information for each request, and leave it up to the developer to parse the rest if needed.

Since kbmMW by default always provides fully parsed headers to the developer, the comparison was somewhat unfair to kbmMW. Nevertheless, all tests were run with full header parsing, and some extra tests were run with full header parsing turned off.
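The difference between the two strategies can be illustrated with two small parsers for a raw HTTP/1.1 request head. Neither function is the parser used by kbmMW or any of the other frameworks. Both are hypothetical and simplified, and they ignore details such as obsolete line folding and chunked bodies. The minimal variant extracts only what is needed to serve the request. The full variant builds a dictionary of every header up front.

```python
from typing import Dict, Optional, Tuple


def parse_minimal(raw: bytes) -> Tuple[str, str, Optional[int]]:
    """Parse only the request line and Content-Length; other headers stay unparsed."""
    head, _, _ = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    method, target, _version = lines[0].decode("latin-1").split(" ", 2)
    length = None
    for line in lines[1:]:
        # Cheap case-insensitive prefix test on the raw bytes ("content-length:" is 15 bytes).
        if line[:15].lower() == b"content-length:":
            length = int(line[15:].strip())
            break
    return method, target, length


def parse_full(raw: bytes) -> Tuple[str, str, Dict[str, str]]:
    """Parse the request line and every header into a dict keyed by lower-case name."""
    head, _, _ = raw.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    method, target, _version = lines[0].split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # malformed header line; a real server would reject the request
        key = name.strip().lower()
        value = value.strip()
        # Repeated fields are joined with ", " (fine for list-valued headers, not Set-Cookie).
        headers[key] = f"{headers[key]}, {value}" if key in headers else value
    return method, target, headers
```

The full parser does work proportional to every header on every request, whether or not the application ever looks at them. That cost is what the partially parsed kbmMW variant avoids.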

I decided to run the tests in Blank mode only. Every test made a total of 100000 calls to each server, spread across 1, 10, 50, 100, 200 and 1000 concurrent connections (and threads).
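Keeping the total fixed at 100000 calls means each connection (thread) makes 100000 divided by the number of connections calls. Those quotients are the repetition counts in the "Connections / Repetitions" labels of the stress test table. The names below are illustrative only.

```python
TOTAL_CALLS = 100_000
CONNECTION_COUNTS = (1, 10, 50, 100, 200, 1000)


def repetitions_per_connection(total: int, connections: int) -> int:
    """Calls each connection must make so that all connections together make `total` calls."""
    if total % connections:
        raise ValueError("total calls must divide evenly across connections")
    return total // connections


SCHEDULE = {c: repetitions_per_connection(TOTAL_CALLS, c) for c in CONNECTION_COUNTS}
```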

Request / second
Connections / RepetitionskbmMW AkbmMW BkbmMW CkbmMW DIndyIndyPoolRTCSynopseTMS SparkleNode.jsGolang
1 / 10000040311010135748534640394642804709
10 / 1000059326189130368052495230406547155146
50 / 2000674495534455635682388150255392
100 / 10007180140690760525508326649165417
200 / 50066936268555482751975671
1000 /1006709622073557199595666076293
Stress test benchmark

As the numbers show, all the servers were able to deliver significantly higher throughput, as long as the client was able to feed them with requests.

The kbmMW tests were made in 4 different variations:

  • kbmMW A : 6 worker I/O threads, headers fully parsed (default for all tests)
  • kbmMW B : 4 worker I/O threads, headers fully parsed
  • kbmMW C : 8 worker I/O threads, headers fully parsed
  • kbmMW D : 8 worker I/O threads, headers only partially parsed

No doubt the servers and the client compete for the available cores and hardware threads. However, all the servers suffer the same thread contention penalty, so the results should be comparable.

I have no doubt that moving the test client to another machine could produce even better numbers for several of the frameworks, especially at 100+ concurrent connections, simply because client threads would no longer be running on the server benchmark machine. On the other hand, additional overhead is to be expected, since the requests could no longer use the local loopback interface and would instead have to traverse a couple of network cards and a cable.

Conclusion

It is a myth that servers built with Node.js or Golang are faster than servers built with Delphi. No doubt Node.js and Golang provide some very respectable numbers, but so do several of the Delphi solutions, notably RTC, Synopse (Mormot) and kbmMW, which took the crown in these benchmarks.


7 thoughts on “HTTP middleware benchmarks”
  1. You don’t mention the OS, min. image size or spin-up times. Request responsiveness is important but when building systems to scale horizontally .. at, um, scale, deployment efficiency is also a consideration.

    1. Hi,
      OS is Win10 Pro with reasonably recent updates.
      The monolithic executables range from 2MB to 12MB. The node.js sample required installation of the node.js runtime environment which is a 27MB download, and perhaps roughly twice the installation size. It was required anyway for the NetBenchmarks client.
      I allowed for 6-8 seconds spin up time after start, before triggering the client, which in turn (depending on the number of client threads) took 1-2 seconds starting all threads up, before beginning the calls and measurements.
      In addition, the NetBenchmarks test includes two JSON files: one that is sent to the server, and another that can be used for comparison with the returned result.
      Those two files are not a requirement when running the blank variant of the test, in which case only the node.js solution required anything but the simple executable.

  2. Why don’t you give the default WebBroker a try?
    Check this test: https://en.delphipraxis.net/topic/5620-pgpool-linux-apache-top-performance-delphi11/
    Highly scalable, both in IIS on Windows and Apache on Linux: https://github.com/danieleteti/delphimvcframework (true MVC)
    Really, Delphi can be excellent as a server.

  3. Hi
    RTC is more or less abandonware now, which is very sad. I used it and was forced to drop it when Teppi bought it and stopped development. There is some information about a totally rewritten new library, but no other sign of it. The package does not support Delphi 10.4 and 11.
    It would be very good if they restarted support for the latest Delphi.
