
HTTP middleware benchmarks

By kimbomadsen

Nov 2, 2021 Delphi, kbmMW, REST


Preface

On Facebook, a member of the Delphi group asked whether there were any benchmarks comparing RAD Studio with other middleware frameworks.

One of the respondents replied with a link to a recently created benchmark framework on GitHub. It includes a number of functionally compatible HTTP servers written using different frameworks, and a test client written in Node.js, which is seemingly able to performance-test the various HTTP servers in two different modes (blank and work) and with 1, 100 and 10000 concurrent connections.

The test framework can be found here: https://github.com/d-mozulyov/NetBenchmarks

It contains a number of precompiled HTTP servers that can be tested:

Indy.HTTP.exe (64 bit Delphi, based on Indy HTTP server)
IndyPool.HTTP.exe (64 bit Delphi, variant based on Indy HTTP server)
TMSSparkle.HTTP.exe (64 bit Delphi, based on TMS Sparkle)
RealThinClient.HTTP (64 bit Delphi, based on RTC)
Synopse.HTTP (64 bit Delphi, based on Synopse/mORMot)
Node.HTTP (Node.js based)
Golang.HTTP (Golang based)

Core source is available for all of them, although it does not include the third-party libraries. Having the executables makes it possible to benchmark them on various equipment, and having the source provides insight into what is actually being benchmarked.

Furthermore, the source makes it possible to create other precompiled HTTP servers using other frameworks, for example kbmMW. So obviously that had to happen 🙂

So we can add:

kbmMW.HTTP (64 bit Delphi, based on kbmMW)

to the list. Source and precompiled executable can be downloaded here.

From the outset, all servers provide an HTTP web interface running on port 1234. Depending on whether the server application is started with a parameter of 1 or not, it will run in Work or in Blank mode.

Work mode accepts a JSON document provided by the benchmark client, and returns another JSON document to the client.

Blank mode simply returns the text ‘OK’.
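To make the two modes concrete, here is a minimal Python sketch of a server that follows the same contract: port 1234, Work mode when the first parameter is 1, Blank mode otherwise. It is not taken from NetBenchmarks. The names `select_mode`, `respond`, `Handler` and `run` are made up for this illustration, and the JSON reply produced in Work mode is a placeholder assumption, since the real transformation is defined by the benchmark itself.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PORT = 1234  # the port named in the article


def select_mode(argv):
    """Work mode when the first command-line parameter is '1', else Blank."""
    return "work" if len(argv) > 1 and argv[1] == "1" else "blank"


def respond(mode, body):
    """Return (content_type, payload) for one request body."""
    if mode == "blank":
        return "text/plain", b"OK"
    doc = json.loads(body or b"{}")
    # Placeholder "work" (an assumption): report how many top-level items
    # the received JSON document has. NetBenchmarks defines the real reply.
    reply = {"fields": len(doc)}
    return "application/json", json.dumps(reply).encode("utf-8")


class Handler(BaseHTTPRequestHandler):
    mode = "blank"  # set once at startup by run()

    def _serve(self):
        # Read exactly the announced body (nothing for a plain GET).
        length = int(self.headers.get("Content-Length") or 0)
        content_type, payload = respond(self.mode, self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    # Which HTTP method the benchmark client uses is not stated in the
    # article, so this sketch answers GET and POST the same way.
    do_GET = do_POST = _serve

    def log_message(self, *args):
        pass  # keep the console quiet while benchmarking


def run(argv):
    """Start the server; blocks until the process is stopped."""
    Handler.mode = select_mode(argv)
    ThreadingHTTPServer(("127.0.0.1", PORT), Handler).serve_forever()
```

Calling `run(sys.argv)` from a script (after `import sys`) would start it in the mode chosen on the command line.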

After installing Node.js from https://nodejs.org/en/download/ and having downloaded and extracted the NetBenchmarks toolset from GitHub, it is possible to run the benchmark.

Benchmark conditions

I ran the benchmark on my sturdy AMD Zen 1 1950X with 16 cores able to run 32 threads, typically at 3.6 GHz, 32 GB RAM, some heavy-lifting NVIDIA Titan V graphics cards, and a combination of HDD and M.2 SSD drives.

The machine is used for many other things, so the benchmark results will vary with each run depending on the other load on the machine, although I have tried to minimize it. Roughly 15% of the combined CPU time of all the cores was in use by background operations.

Furthermore, the absolute numbers will look very different if run on other equipment and CPU architectures.

Benchmarking is difficult, and the best benchmark is to run an application with the real features on the actual production hardware. Only there will you find the true performance of that particular combination.

One thing that can skew a benchmark significantly is the benchmarking software itself. It is coded in a certain way, with certain assumptions, and will thus only benchmark that particular way of coding, which may or may not say anything about the true performance of the tested servers. I will get more into that in a bit.

However, let us run it anyway. Benchmarking is fun.

Running the NetBenchmarks benchmark

It is as simple as running this bat file: benchmark.HTTP.bat

It will in turn run 6 tests on each of the HTTP servers: 1 connection Blank, 1 connection Work, 100 connections Blank, 100 connections Work, 10000 connections Blank and 10000 connections Work.

The result is output on the console. I have put it into a spreadsheet and a graph.

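Requests / second

| Connections / mode | kbmMW | Indy | IndyPool | RTC | Synopse | TMS Sparkle | Node.js | Golang |
|---|---|---|---|---|---|---|---|---|
| 1 / blank | 354 | 319 | 334 | 346 | 347 | 336 | 365 | 365 |
| 1 / work | 353 | 313 | 350 | 337 | 377 | 359 | 364 | 353 |
| 100 / blank | 1926 | 1072 | 410 | 1613 | 1852 | 617 | 1838 | 1890 |
| 100 / work | 1908 | 612 | 175 | 1054 | 1763 | 326 | 1779 | 1776 |
| 10000 / blank | 1000 | 944 | 908 | 970 | 1054 | 1073 | 970 | 984 |
| 10000 / work | 912 | 848 | 890 | 830 | 1014 | 1012 | 907 | 1000 |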

NetBenchmarks result

That gives us some interesting numbers and graphs. As can be seen the Node.js and Golang HTTP servers generally fare quite well in all tests. That is expected, based on what I have heard from communities around those platforms/frameworks.

But what is more interesting is that several of the Delphi servers keep pace, and in every test case except 1/blank at least one of them beats both Node.js and Golang.

What is obviously also interesting is that kbmMW is doing quite well, although it is beaten by Synopse/mORMot and TMS Sparkle in the 10000 connection tests.
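The comparison against Node.js and Golang can be checked directly from the table. The short Python sketch below uses the numbers exactly as listed above and reports, for each test, which Delphi servers scored higher than both non-Delphi servers. The function name `delphi_leaders` is only illustrative.

```python
DELPHI = ["kbmMW", "Indy", "IndyPool", "RTC", "Synopse", "TMS Sparkle"]
OTHERS = ["Node.js", "Golang"]

# Requests per second, copied from the NetBenchmarks result table,
# in the column order DELPHI + OTHERS.
RESULTS = {
    "1 / blank":     [354, 319, 334, 346, 347, 336, 365, 365],
    "1 / work":      [353, 313, 350, 337, 377, 359, 364, 353],
    "100 / blank":   [1926, 1072, 410, 1613, 1852, 617, 1838, 1890],
    "100 / work":    [1908, 612, 175, 1054, 1763, 326, 1779, 1776],
    "10000 / blank": [1000, 944, 908, 970, 1054, 1073, 970, 984],
    "10000 / work":  [912, 848, 890, 830, 1014, 1012, 907, 1000],
}


def delphi_leaders(results):
    """Map each test to the Delphi servers that beat both Node.js and Golang."""
    names = DELPHI + OTHERS
    leaders = {}
    for test, row in results.items():
        scores = dict(zip(names, row))
        bar = max(scores[name] for name in OTHERS)
        leaders[test] = [name for name in DELPHI if scores[name] > bar]
    return leaders


for test, names in delphi_leaders(RESULTS).items():
    print(test, "->", ", ".join(names) or "none")
```

Only the 1 / blank row comes out empty. The margins are small in several rows, so with the run-to-run variation mentioned earlier the ordering should be read as indicative rather than definitive.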

I ran all the tests multiple times, noting down the best numbers for each of the servers, to eliminate poor outliers due to background CPU usage and thus try to make it a fair test for all.

I did, however, notice that in some situations the benchmark produced only 25% of the usual performance, specifically on the 1 connection tests. When it happened I could run the test multiple times, and it would remain stable at 25% performance.

I checked background CPU load and there were no outliers due to that, nor paging or other such stuff.

I also found the number of requests per second to be quite small (350ish). Knowing kbmMW I would have expected a much higher number, so something does seem fishy with the Node.js client benchmark.

It turns out that the Node.js client benchmark does create the stated number of connections, but it will never execute a request on all of them at the same time. Well… with a limited number of cores that will always be the case, but Node.js seems to be limiting it further. I suspected the client benchmark code to be a bottleneck in itself, not really providing the means to truly exercise the HTTP servers.

So I dug out an old HTTP stress test client I had made long ago, because it should be very usable in this benchmark scenario.

Stress test benchmark

It allocates one thread for each connection to test. It will obviously also saturate the CPU cores with threads that cannot all technically run at the exact same time, but since the CPU and Windows are time slicing execution between the threads, all threads will get a reasonably fair share of time, thus to an extent mimicking a CPU with an endless number of cores.
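The thread-per-connection idea can be sketched in a few lines of Python. This is not the stress test client used for the measurements (that one is not published here), only an illustration of the technique: every connection gets its own thread, all threads are released together, and throughput is the number of completed calls divided by wall-clock time. The names `http_caller` and `stress` are made up for this sketch, and Python's own threading overhead means its absolute numbers would not be comparable with those in this article.

```python
import http.client
import threading
import time


def http_caller(host, port, path="/"):
    """Return a function that performs one GET on a connection of its own."""
    conn = http.client.HTTPConnection(host, port, timeout=10)

    def call():
        try:
            conn.request("GET", path)
            conn.getresponse().read()
        except (OSError, http.client.HTTPException):
            conn.close()  # reset so the next call reconnects
            raise

    return call


def stress(make_caller, connections, repetitions):
    """Run one thread per connection, each making `repetitions` calls.

    Returns (completed_calls, calls_per_second).
    """
    callers = [make_caller() for _ in range(connections)]
    completed = [0] * connections
    gate = threading.Barrier(connections + 1)  # workers + the timing thread

    def worker(index):
        gate.wait()  # start all workers at the same moment
        for _ in range(repetitions):
            try:
                callers[index]()
                completed[index] += 1
            except Exception:
                pass  # failed calls are simply not counted

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(connections)]
    for thread in threads:
        thread.start()
    gate.wait()
    started = time.perf_counter()
    for thread in threads:
        thread.join()
    elapsed = time.perf_counter() - started
    total = sum(completed)
    return total, (total / elapsed if elapsed > 0 else 0.0)
```

A run against a local server on port 1234 with 100 connections of 1000 calls each would look like `stress(lambda: http_caller("127.0.0.1", 1234), 100, 1000)`.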

I ran a number of tests and I ran them a number of times, to try to eliminate poor outlier readings.

Being a kbmMW kind of guy, I included some test variants specifically for kbmMW. They concern the number of worker threads used by the kbmMW TCP transport layer, and whether the HTTP header should be fully parsed or not.

The latter is an option that I decided to add, because it seems to me that several if not all of the other solutions only parse a minimum of the header information each time, and leave it up to the developer to parse the rest if needed.

As kbmMW by default always provides fully parsed headers to the developer, this was somewhat unfair to kbmMW. All tests were nevertheless run with full header parsing, plus some extra runs with the otherwise mandatory full header parsing turned off.

I decided to run the tests on the blank-only part of the servers, and every test was to make a total of 100000 calls to each server, spread across 1, 10, 50, 100, 200 and 1000 concurrent connections (and threads).

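Requests / second

| Connections / Repetitions | kbmMW A | kbmMW B | kbmMW C | kbmMW D | Indy | IndyPool | RTC | Synopse | TMS Sparkle | Node.js | Golang |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 / 100000 | 4031 | | | | 1010 | 1357 | 4853 | 4640 | 3946 | 4280 | 4709 |
| 10 / 10000 | 5932 | 6189 | | | 1303 | 680 | 5249 | 5230 | 4065 | 4715 | 5146 |
| 50 / 2000 | 6744 | | | | 955 | 344 | 5563 | 5682 | 3881 | 5025 | 5392 |
| 100 / 1000 | 7180 | | | | 1406 | 907 | 6052 | 5508 | 3266 | 4916 | 5417 |
| 200 / 500 | 6693 | | | | | | 6268 | 5554 | 827 | 5197 | 5671 |
| 1000 / 100 | 6709 | | 6220 | 7355 | | | 7199 | 5956 | | 6607 | 6293 |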

Stress test benchmark
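The Connections / Repetitions labels in the table follow from keeping the total fixed at 100000 calls per test: each connection simply makes the total divided by the number of connections. A small Python sketch of that arithmetic, where the function name `repetitions_per_connection` is only illustrative:

```python
TOTAL_CALLS = 100_000
CONNECTION_COUNTS = [1, 10, 50, 100, 200, 1000]


def repetitions_per_connection(total_calls, connection_counts):
    """Return (connections, repetitions) pairs that keep the total constant."""
    plan = []
    for connections in connection_counts:
        repetitions, remainder = divmod(total_calls, connections)
        if remainder:
            raise ValueError(
                f"{total_calls} calls cannot be split evenly "
                f"across {connections} connections"
            )
        plan.append((connections, repetitions))
    return plan


for connections, repetitions in repetitions_per_connection(
        TOTAL_CALLS, CONNECTION_COUNTS):
    print(f"{connections} / {repetitions}")
```

This yields the six rows 1 / 100000, 10 / 10000, 50 / 2000, 100 / 1000, 200 / 500 and 1000 / 100, so every server handles the same amount of work regardless of how many connections share it.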

As the numbers show, all the servers were able to provide significantly higher throughput, as long as the client was able to feed the servers with requests.

The kbmMW tests were made in 4 different variations:

kbmMW A : 6 worker I/O threads, headers fully parsed (default for all tests)
kbmMW B : 4 worker I/O threads, headers fully parsed
kbmMW C : 8 worker I/O threads, headers fully parsed
kbmMW D : 8 worker I/O threads, headers only partially parsed

No doubt the servers and the client are competing for the existing cores and physical threads. However, all the servers suffer the same penalty from thread contention, so the results should be comparable.

I have no doubt that moving the test client to another machine could provide even better numbers for several of the frameworks, especially at 100+ concurrent connections, simply because the client would no longer pollute the server benchmark machine with running client threads. On the other hand, additional overhead is to be expected from not being able to use the local loopback network, and instead having to traverse a couple of network cards and a cable.

Conclusion

It is a myth that Node.js and Golang are faster for servers than building one with Delphi. No doubt Node.js and Golang provide some very respectable numbers, but so do several of the Delphi solutions, notably RTC, Synopse (mORMot) and kbmMW, which took the crown in these benchmarks.


9 thoughts on “HTTP middleware benchmarks”

Arnaud says:

November 2, 2021 at 13:24

The mORMot test program uses the wrong class on Windows.
It could be interesting to see also the new mORMot 2 THttpAsyncServer. Especially on Linux.
I have created an issue https://github.com/d-mozulyov/NetBenchmarks/issues/1

Jolyon says:

November 2, 2021 at 16:48

You don’t mention the OS, min. image size or spin-up times. Request responsiveness is important but when building systems to scale horizontally .. at, um, scale, deployment efficiency is also a consideration.

1. kimbomadsen says:
  
  November 2, 2021 at 17:18
  
  Hi,
  OS is Win10 Pro with reasonably recent updates.
  The monolithic executables range from 2MB to 12MB. The node.js sample required installation of the node.js runtime environment, which is a 27MB download and perhaps roughly twice that in installation size. It was required anyway for the NetBenchmarks client.
  I allowed for 6-8 seconds of spin-up time after start before triggering the client, which in turn (depending on the number of client threads) took 1-2 seconds to start all threads before beginning the calls and measurements.
  In addition, the NetBenchmarks test included two JSON files, one which was sent to the server, and another which can be used for comparison with the returned result.
  Those two files are not a requirement when running the blank variant of the test, in which case only the node.js solution required anything but the simple executable.
  
BOB says:

November 3, 2021 at 15:55

Why don’t you give the default WebBroker a try?
Check this test: https://en.delphipraxis.net/topic/5620-pgpool-linux-apache-top-performance-delphi11/
Highly scalable both in IIS on Windows and Apache on Linux: https://github.com/danieleteti/delphimvcframework (true MVC)
Delphi can really be excellent as a server.

Sandor Dobi says:

November 5, 2021 at 07:50

Hi
RTC is more or less abandon-ware now, which is very sad. I used it and was forced to drop it when Teppi bought it and dropped the development. There is some info about a totally rewritten new library, but no other sign of it. The package does not support Delphi 10.4 and 11.
It would be very good if they restarted support for the latest Delphi.

VadimMest says:

November 14, 2021 at 11:47

Hi
The file uServers.pas is missing from the archive.

1. kimbomadsen says:
  
  November 14, 2021 at 16:19
  
  It is part of the NetBenchmarks package on GitHub
  
Roland Bengtsson says:

August 28, 2022 at 15:25

Would be interesting to add https://github.com/DelphiBuilder/NetCom7 to the benchmark 😊

Regards Roland

1. Arnaud says:
  
  September 1, 2022 at 08:53
  
  NetCom7 is a basic TCP client/server which does not support HTTP and makes poor use of the socket API, even if it claims otherwise. It uses Select() on both Windows and POSIX, in blocking mode, in a single thread, and only checks for reading state, not writing. A set of awful choices.
  
