What performance should I expect from your SSH server?

Lee Painter

Introduction

We receive many enquiries from users asking about the performance of our server API and how it compares to other standard implementations. We decided to set up a laboratory test to provide some basic statistics and advice that would answer the following questions.


  1. What is the maximum throughput I can expect for a single connection?
  2. How does throughput scale with increasing connections?
  3. What resources are required to support this maximum throughput?


Test Environment

Our test environment consisted of a 1U rack server with the following specification:

Server

Operating System: Ubuntu 12.04

Processor: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz

Memory: 16GB

HDD: 2TB, SATA 6Gb/s NCQ

Network: 1 Gbps Interface

Test 1 - Throughput Requirements

The first test we set up was designed to determine the maximum throughput of the server over a single network interface, establish the CPU resources required to support that throughput, and see how this scaled with multiple connections. Since performance is heavily dependent on the type of cipher used in SSH connections, we decided to test two different ciphers, AES and Arcfour. We chose these because AES is the default cipher for our own implementation and many others, whilst Arcfour is well known to be an efficient, fast cipher and should demonstrate the maximum performance of the API. We do not recommend using Arcfour in production.

We created a client script that allowed us to set the preferred cipher and initiate as many clients transferring a 500MB file as we required. We then recorded the throughput achieved by each execution of the script, increasing the number of connections with each iteration of the test.
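The original script isn't reproduced here, but a minimal sketch of that kind of harness might look like the following. The host name, file name and the use of scp are assumptions for illustration, not the actual tooling:

```python
# Hypothetical sketch of a client test harness: launch N concurrent transfers
# with a chosen cipher and record the throughput each client achieves.
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

FILE_SIZE_MB = 500  # size of the test file each client transfers


def throughput_mbs(size_mb, seconds):
    """Throughput in MB/s for a single transfer."""
    return size_mb / seconds


def run_transfer(cipher, host="testserver", local_file="test500mb.bin"):
    """Run one scp upload with the preferred cipher; return its MB/s."""
    start = time.time()
    # scp's -c flag selects the cipher, e.g. "aes128-ctr" or "arcfour"
    subprocess.run(["scp", "-c", cipher, local_file, f"{host}:/tmp/"],
                   check=True)
    return throughput_mbs(FILE_SIZE_MB, time.time() - start)


def run_iteration(cipher, connections):
    """Launch `connections` concurrent clients and collect per-client MB/s."""
    with ThreadPoolExecutor(max_workers=connections) as pool:
        return list(pool.map(lambda _: run_transfer(cipher),
                             range(connections)))
```

Each iteration of the test simply calls `run_iteration` with one more connection than the last and logs the returned per-client figures.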

The server was configured to use only a single transfer thread, which restricted it to a single CPU core. Our results would therefore show how much throughput a single transfer thread could handle.


Test 1 - The Results

The table below provides the test results and the graph provides a more readable view.

As expected, Arcfour was the better-performing cipher, with a single-connection throughput of 97.2 MB/s, but this comes with a compromise on security.

The AES cipher provided a single-connection throughput of 53.3 MB/s.

As the number of connections scaled, we saw throughput shared evenly across the connections, and the server (which, remember, was restricted to a single thread/CPU core) maintained a consistent total throughput across the different iterations.

What does this data tell us?

My interpretation of the data is that a 2-core server with 2 transfer threads could handle the maximum throughput of a 1 Gbps network interface. If we want to scale the server to handle more load, we need to ensure that we have one 1 Gbps network interface for every 2 transfer threads / CPU cores.
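The arithmetic behind that rule of thumb is straightforward; the per-thread figure comes from the AES results, and the ~104 MB/s link ceiling from the Arcfour totals in the tables:

```python
# Back-of-envelope check of the sizing rule, using figures measured in Test 1.
AES_PER_THREAD_MBS = 53.3   # throughput of one AES transfer thread
LINK_CEILING_MBS = 104.0    # approx. total the 1 Gbps interface sustained

capacity = 2 * AES_PER_THREAD_MBS    # two transfer threads on two cores
print(capacity)                      # 106.6
print(capacity >= LINK_CEILING_MBS)  # True: two AES threads can fill the link
```

With fewer than two threads per interface the AES cipher, not the network, becomes the bottleneck.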

Arcfour with SHA1 - Not recommended for External Use / Internal Only 

Connections     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20
MB/s Average 97.2  48.6  33.0  24.7  20.4  16.7  14.6  12.8  11.5  10.3   9.4   8.7   8.0   7.4   6.9   6.5   6.2   5.8   5.5   5.2
Average Time   17    17    17    17    16    17    16    16    16    16    16    16    16    16    16    16    16    16    16    16
Total Time     17    34    50    67    81    99   113   129   144   161   176   191   206   222   239   255   268   284   300   317
MB/s Total   97.2  97.2  99.1  98.7 102.0 100.1 102.4 102.5 103.3 102.6 103.3 103.8 104.3 104.2 103.7 103.7 104.8 104.7 104.7 104.3
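The "MB/s Total" row appears to be the per-connection average multiplied by the number of connections; a quick check against a few of the Arcfour columns bears that out:

```python
# Cross-check: reported total throughput vs. per-connection average x connections
# (values copied from the Arcfour table above).
samples = {
    #  n: (MB/s average, MB/s total)
    1:  (97.2,  97.2),
    5:  (20.4, 102.0),
    10: (10.3, 102.6),
    20: ( 5.2, 104.3),
}

for n, (avg, total) in samples.items():
    # products track the reported totals to within about 1 MB/s
    print(n, round(avg * n, 1), total)
```

In other words, the aggregate throughput stays roughly constant while each connection receives an even share of it.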

AES/128/CTR with SHA1 - Recommended for External Use

Connections     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20
MB/s Average 53.3  30.0  20.4  15.6  12.2  10.3   8.8   7.7   6.9   6.3   5.7   5.3   4.8   4.4   4.0   3.7   3.5   3.3   3.1   3.0
Average Time   31    28    27    27    27    27    27    27    27    26    27    26    26    27    28    28    28    28    28    28
Total Time     31    55    81   106   135   160   187   215   239   263   292   312   344   376   414   444   469   503   531   554
MB/s Total   53.3  60.1  61.2  62.4  61.2  62.0  61.9  61.5  62.2  62.8  62.2  63.6  62.4  61.5  59.9  59.5  59.9  59.1  59.1  59.7


Test 2 - Comparison against OpenSSH Server 

The next test, using the same client script and process, ran the same transfers against both our own server and a native OpenSSH server. This provides a benchmark of where our performance stands relative to the most widely deployed SSH server.

This test repeated the same process as Test 1; however, we placed no restriction on our own server and configured it with four transfer threads to match the number of cores available to OpenSSH. The transfers were still restricted to the single 1 Gbps network interface, and we used the AES cipher.


Test 2 - Results

The table below provides the test results and the graph a more readable view.

Both servers scaled well, with our own server achieving performance comparable to OpenSSH.

Connections     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20
Maverick SSHD
Total Time     32    34    49    65    77    92   107   122   137   151   166   180   196   210   231   240   256   270   285   302
MB/s Average 51.6  48.6  33.7  25.4  21.5  18.0  15.4  13.5  12.1  10.9  10.0   9.2   8.4   7.9   7.2   6.9   6.5   6.1   5.8   5.5
Average Time   32    17    16    16    15    15    15    15    15    15    15    15    15    15    15    15    15    15    15    15
MB/s Total   51.6  97.2 101.2 101.7 107.3 107.8 108.1 108.4 108.6 109.4 109.5 110.2 109.6 110.2 107.3 110.2 109.7 110.2 110.2 109.4


OpenSSH
Total Time     30    34    50    64    78    92   106   122   136   151   165   180   195   210   224   240   253   269   283   298
MB/s Average 55.1  48.6  33.0  25.8  21.2  18.0  15.6  13.5  12.2  10.9  10.0   9.2   8.5   7.9   7.4   6.9   6.5   6.1   5.8   5.5
Average Time   30    17    17    16    16    15    15    15    15    15    15    15    15    15    15    15    15    15    15    15
MB/s Total   55.1  97.2  99.1 103.3 105.9 107.8 109.1 108.4 109.4 109.4 110.2 110.2 110.2 110.2 110.7 110.2 111.0 110.6 110.9 110.9


Conclusions 

Our tests have demonstrated that the performance of the Maverick SSHD is comparable to the native performance of an OpenSSH server under our laboratory conditions. We have also established a rule of thumb for ensuring a server can maximise its resources as it scales: provision one 1 Gbps network interface for every 2 CPU cores, and configure our own server with one transfer thread per core. With anything less than this, performance will almost certainly be compromised.