Netflix Zuul vs Nginx performance


Posted by Stanislav Miklik | April 16, 2015 | spring

Nowadays you hear a lot about microservices. Spring Boot is an excellent choice for building a single microservice, but you need to interconnect them somehow. That’s what Spring Cloud tries to solve (among other things), especially Spring Cloud Netflix. It provides various components, e.g. the Eureka discovery service together with the client-side load balancer Ribbon for inter-microservice communication. But if you want to communicate with the outside world (you provide an external API, or you just use AJAX from your page heavily), it is good to hide your various services behind a proxy.

The natural choice would be Nginx. But Netflix comes with its own solution: the intelligent router Zuul. It comes with a lot of interesting features and can be used e.g. for authentication, service migration, load shedding and various dynamic routing options. And it is written in Java. If Netflix uses it, is it fast enough compared to a native reverse proxy? Or is it only suitable as a companion to Nginx when flexibility (or other features) is important?

Disclaimer: Do not consider this a serious benchmark. I just wanted to get a feeling for how Nginx and Zuul compare, and I couldn’t find any benchmarks on the internet (OK, maybe I was not searching long enough, but I wanted to get my hands dirty). It does not follow any recommended benchmarking methodology (warmup period, number of measurements, …), and I was just using 3 micro EC2 instances (which is not optimal either) in different availability zones.

Test

So what have I done? The test compared the raw performance of both solutions without any special features: many concurrent HTTP requests for a single HTML page (cca. 26 KB; 26,650 bytes according to the ab output). I used ApacheBench (ab) with a concurrency level of 200 (I also tried httperf, but it seemed to be more CPU-demanding, so I got lower numbers than with ab).
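For readers who have not used ab: ab -n N -c C issues N requests in total while keeping C of them in flight at the same time, then reports completed and failed requests and the mean throughput. The Java class below is a minimal sketch of that idea, not ab’s implementation; the name MiniBench is made up, and the request is passed in as a Callable returning an HTTP status code so the harness can wrap a real client or a stub.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical mini load generator in the spirit of "ab -n N -c C". */
final class MiniBench {

    /** Summary loosely modeled on the top of ab's report. */
    static final class Result {
        final int complete;
        final int failed;
        final double requestsPerSecond;

        Result(int complete, int failed, double requestsPerSecond) {
            this.complete = complete;
            this.failed = failed;
            this.requestsPerSecond = requestsPerSecond;
        }
    }

    /**
     * Issues n requests with at most c in flight (one worker thread per slot).
     * A request counts as failed if it throws or returns a non-2xx status
     * (ab itself reports non-2xx responses separately from failed requests).
     */
    static Result run(int n, int c, Callable<Integer> request) {
        ExecutorService workers = Executors.newFixedThreadPool(c);
        List<Future<Integer>> futures = new ArrayList<>(n);
        long start = System.nanoTime();
        int failed = 0;
        try {
            for (int i = 0; i < n; i++) {
                futures.add(workers.submit(request));
            }
            for (Future<Integer> f : futures) {
                try {
                    Integer status = f.get();
                    if (status == null || status < 200 || status > 299) {
                        failed++;
                    }
                } catch (ExecutionException e) {
                    failed++; // the request itself threw, e.g. connection refused
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    failed++;
                }
            }
        } finally {
            workers.shutdown();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return new Result(n, failed, n / seconds);
    }
}
```

To drive a real server, the Callable could wrap java.net.http.HttpClient (Java 11+), e.g. () -> client.send(HttpRequest.newBuilder(URI.create("http://target/sample.html")).build(), HttpResponse.BodyHandlers.discarding()).statusCode(). Unlike this thread-per-slot sketch, ab does not dedicate a thread to each connection; it multiplexes its concurrent connections in a single process, which keeps its own CPU cost low.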

Direct connection

First, I was interested in the performance of the target HTTP server (once again Nginx) without any reverse proxy. ab was running on one machine and accessing the target server directly.


$ ab -n 10000 -c 200 http://target/sample.html

....

Document Path: /sample.html
Document Length: 26650 bytes

Total transferred: 268940000 bytes
HTML transferred: 266500000 bytes
Requests per second: 2928.45 [#/sec] (mean)
Time per request: 68.295 [ms] (mean)
Time per request: 0.341 [ms] (mean, across all concurrent requests)
Transfer rate: 76911.96 [Kbytes/sec] received

Connection Times (ms)
 min mean[+/-sd] median max
Connect: 4 33 6.0 32 66
Processing: 20 35 7.5 35 392
Waiting: 20 35 6.4 34 266
Total: 24 68 7.8 66 423

Percentage of the requests served within a certain time (ms)
 50% 66
 66% 67
 75% 69
 80% 70
 90% 74
 95% 81
 98% 91
 99% 92
 100% 423 (longest request)

Quite nice; a few more runs showed similar values: 2928, 2725, 2834 and 2648 req/s. There is some deviation, but the exact number is not that important now.
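To put a number on that deviation: the four direct-connection runs have a mean of about 2784 req/s and a sample standard deviation of about 123 req/s, i.e. roughly 4% run-to-run variation. A minimal Java helper for that calculation (the class name RunStats is hypothetical):

```java
import java.util.Arrays;

/** Hypothetical helper summarizing repeated benchmark runs. */
final class RunStats {

    static double mean(double... xs) {
        return Arrays.stream(xs).average().orElseThrow(IllegalArgumentException::new);
    }

    /** Sample standard deviation (n - 1 in the denominator). */
    static double sampleStdDev(double... xs) {
        if (xs.length < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        double m = mean(xs);
        double sumSq = 0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return Math.sqrt(sumSq / (xs.length - 1));
    }

    public static void main(String[] args) {
        double[] direct = {2928, 2725, 2834, 2648}; // req/s, direct connection runs
        double m = mean(direct);
        double sd = sampleStdDev(direct);
        System.out.printf("mean=%.2f sd=%.2f cv=%.3f%n", m, sd, sd / m);
    }
}
```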

Via Nginx

So now I could set up the proxy server (Ubuntu 14.04 LTS) with the default nginx installation. I just updated the configuration to proxy to the target server like this:


server {
   listen 80 default_server;
   listen [::]:80 default_server ipv6only=on;

   # Make site accessible from http://localhost/
   server_name localhost;

   # allow file upload
   client_max_body_size 10M;

   location / {
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $remote_addr;
      proxy_set_header Host $host;
      proxy_pass http://target:80;
   }
}
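A side note on this configuration: proxy_set_header X-Forwarded-For $remote_addr replaces whatever X-Forwarded-For header the client sent, while nginx’s $proxy_add_x_forwarded_for variable appends $remote_addr to the incoming header (or equals $remote_addr if there is none). It makes no difference for this benchmark, but it matters once proxies are chained, e.g. Nginx in front of Zuul. The hypothetical Java class below only models the resulting header values for both variants; it is not how nginx is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the headers the nginx location block sends upstream. */
final class ForwardedHeaders {

    /** Mirrors the config above: X-Forwarded-For is overwritten with the client IP. */
    static Map<String, String> asConfigured(String remoteAddr, String host, String incomingXff) {
        Map<String, String> h = new LinkedHashMap<>();
        h.put("X-Real-IP", remoteAddr);
        h.put("X-Forwarded-For", remoteAddr); // incomingXff is deliberately ignored
        h.put("Host", host);
        return h;
    }

    /** Same, but with $proxy_add_x_forwarded_for semantics: append to the existing chain. */
    static Map<String, String> appending(String remoteAddr, String host, String incomingXff) {
        Map<String, String> h = asConfigured(remoteAddr, host, incomingXff);
        String xff = (incomingXff == null || incomingXff.isEmpty())
                ? remoteAddr
                : incomingXff + ", " + remoteAddr;
        h.put("X-Forwarded-For", xff);
        return h;
    }
}
```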

Then I ran a similar test as before (this time with 50,000 requests):

$ ab -n 50000 -c 200 http://proxy/sample.html
...
Server Software: nginx/1.4.6
Server Hostname: proxy
Server Port: 80

Document Path: /sample.html
Document Length: 26650 bytes

Concurrency Level: 200
Time taken for tests: 52.366 seconds
Complete requests: 50000
Failed requests: 0
Total transferred: 1344700000 bytes
HTML transferred: 1332500000 bytes
Requests per second: 954.81 [#/sec] (mean)
Time per request: 209.465 [ms] (mean)
Time per request: 1.047 [ms] (mean, across all concurrent requests)
Transfer rate: 25076.93 [Kbytes/sec] received

Connection Times (ms)
 min mean[+/-sd] median max
Connect: 3 50 11.7 48 114
Processing: 37 159 11.9 160 208
Waiting: 36 159 11.9 160 207
Total: 40 209 10.4 209 256

Percentage of the requests served within a certain time (ms)
 50% 209
 66% 212
 75% 214
 80% 216
 90% 220
 95% 224
 98% 232
 99% 238
 100% 256 (longest request)

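Further runs gave 954, 953 and 941 req/s. As expected, throughput and latency are worse: the proxy delivers roughly a third of the direct-connection throughput, and the mean time per request grows from about 68 ms to about 209 ms.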
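The figures in an ab report are tied together by simple arithmetic, which helps when sanity-checking runs. With n requests, concurrency c, total time T and B bytes transferred: requests per second = n / T, time per request (mean) = c × T / n, time per request across all concurrent requests = T / n, and transfer rate = B / T / 1024 KB/s. The Java helper below (class name hypothetical) reproduces the Nginx run above, up to rounding, from n = 50000, c = 200, T = 52.366 s and B = 1344700000 bytes.

```java
/** Hypothetical helper reproducing ab's derived metrics from raw run data. */
final class AbMetrics {

    /** "Requests per second (mean)". */
    static double requestsPerSecond(int n, double seconds) {
        return n / seconds;
    }

    /** "Time per request (mean)": what a single client sees, in ms. */
    static double timePerRequestMs(int n, int concurrency, double seconds) {
        return concurrency * seconds * 1000.0 / n;
    }

    /** "Time per request (mean, across all concurrent requests)", in ms. */
    static double timePerRequestAcrossMs(int n, double seconds) {
        return seconds * 1000.0 / n;
    }

    /** "Transfer rate" in Kbytes/sec (ab divides by 1024). */
    static double transferRateKBps(long totalBytes, double seconds) {
        return totalBytes / seconds / 1024.0;
    }

    public static void main(String[] args) {
        int n = 50_000;
        int c = 200;
        double t = 52.366; // "Time taken for tests" of the Nginx run
        System.out.printf("rps=%.2f tpr=%.3f ms across=%.3f ms rate=%.2f KB/s%n",
                requestsPerSecond(n, t), timePerRequestMs(n, c, t),
                timePerRequestAcrossMs(n, t), transferRateKBps(1_344_700_000L, t));
    }
}
```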

Via Zuul

Now we can use the same machine to set up Zuul. The application itself is very simple:

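import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.builder.SpringApplicationBuilder;
import org.springframework.cloud.netflix.zuul.EnableZuulProxy;
import org.springframework.stereotype.Controller;

@SpringBootApplication
@Controller
@EnableZuulProxy // turns this Spring Boot app into a Zuul reverse proxy
public class DemoApplication {
  public static void main(String[] args) {
    // web(true): start as a web application (Spring Boot 1.x builder API)
    new SpringApplicationBuilder(DemoApplication.class).web(true).run(args);
  }
}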

And we just have to define a fixed route in application.yml:

zuul:
  routes:
    sodik:
      path: /sodik/**
      url: http://target
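With this route, a request for /sodik/sample.html on the proxy is forwarded to http://target/sample.html: by default Spring Cloud’s Zuul strips the route prefix before forwarding (configurable per route with stripPrefix), which is why the ab URL below contains /sodik while the target serves plain /sample.html. Here is a deliberately simplified Java model of a single /prefix/** route; the PrefixRoute class is hypothetical and ignores the Ant-style pattern matching the real router uses.

```java
import java.util.Optional;

/** Hypothetical, simplified model of one Zuul route: "/<prefix>/**" -> url. */
final class PrefixRoute {
    private final String prefix; // e.g. "/sodik"
    private final String url;    // e.g. "http://target"

    PrefixRoute(String prefix, String url) {
        this.prefix = prefix;
        this.url = url;
    }

    /** Returns the upstream URL for a request path, or empty if the route does not match. */
    Optional<String> resolve(String requestPath) {
        if (!requestPath.startsWith(prefix + "/")) {
            return Optional.empty();
        }
        // Strip the route prefix (the default modeled here) and keep the remainder.
        String rest = requestPath.substring(prefix.length());
        return Optional.of(url + rest);
    }
}
```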

And now let’s run the test.


$ ab -n 50000 -c 200 http://proxy:8080/sodik/sample.html

Server Software: Apache-Coyote/1.1
Server Hostname: proxy
Server Port: 8080

Document Path: /sodik/sample.html
Document Length: 26650 bytes

Concurrency Level: 200
Time taken for tests: 136.164 seconds
Complete requests: 50000
Failed requests: 2
(Connect: 0, Receive: 0, Length: 2, Exceptions: 0)
Non-2xx responses: 2
Total transferred: 1343497042 bytes
HTML transferred: 1332447082 bytes
Requests per second: 367.20 [#/sec] (mean)
Time per request: 544.657 [ms] (mean)
Time per request: 2.723 [ms] (mean, across all concurrent requests)
Transfer rate: 9635.48 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 2 12 92.3 2 1010
Processing: 15 532 321.6 461 10250
Waiting: 10 505 297.2 441 9851
Total: 17 544 333.1 467 10270

Percentage of the requests served within a certain time (ms)
50% 467
66% 553
75% 626
80% 684
90% 896
95% 1163
98% 1531
99% 1864
100% 10270 (longest request)

The result is worse than my (optimistic?) guess. Additionally, we can see two failed requests (and two corresponding exceptions in the Zuul log complaining about an HTTP pool timeout). Apparently the timeout is set to 10 seconds by default, which is in line with the longest request taking 10,270 ms.
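One way to picture this failure is a connection pool with a fixed number of connections and a maximum wait: once every pooled connection is busy, a new request waits for one to be returned and gives up when the wait exceeds the timeout. The Java sketch below models just that behavior with a fair Semaphore; BoundedPool, the pool size and the timeout are hypothetical parameters, not Zuul’s actual implementation or defaults.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical model of a bounded connection pool with an acquire timeout. */
final class BoundedPool {
    private final Semaphore permits;
    private final long timeoutMillis;

    BoundedPool(int maxConnections, long timeoutMillis) {
        this.permits = new Semaphore(maxConnections, true); // fair: waiters served FIFO
        this.timeoutMillis = timeoutMillis;
    }

    /** Runs the call while holding a "connection"; fails if none frees up in time. */
    <T> T execute(Supplier<T> call) throws InterruptedException {
        if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("Timeout waiting for connection from pool");
        }
        try {
            return call.get();
        } finally {
            permits.release(); // return the "connection" even if the call failed
        }
    }
}
```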

So let’s get some more results.


Document Path: /sodik/sample.html
Document Length: 26650 bytes

Concurrency Level: 200
Time taken for tests: 50.080 seconds
Complete requests: 50000
Failed requests: 0
Total transferred: 1343550000 bytes
HTML transferred: 1332500000 bytes
Requests per second: 998.39 [#/sec] (mean)
Time per request: 200.322 [ms] (mean)
Time per request: 1.002 [ms] (mean, across all concurrent requests)
Transfer rate: 26199.09 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 2 16 7.9 16 126
Processing: 15 184 108.1 203 1943
Waiting: 13 183 105.9 202 1934
Total: 18 200 107.8 218 1983

Percentage of the requests served within a certain time (ms)
50% 218
66% 228
75% 235
80% 239
90% 254
95% 287
98% 405
99% 450
100% 1983 (longest request)

Wow, what an improvement. The only explanation that comes to my mind is that Java JIT compilation improved the performance. To verify that it was not just a coincidence, I made one more attempt: 1010 req/s. In the end, the result was a positive surprise for me.
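The JIT explanation is plausible: the HotSpot JVM initially interprets bytecode and compiles frequently executed methods to optimized native code only after they have run many times, so a freshly started Java service usually needs some traffic before it reaches full speed. This is also the warmup period the disclaimer mentions. Here is a minimal sketch of the usual pattern, discarding warmup iterations before timing; the Warmup class is hypothetical, and for serious JVM measurements a dedicated harness such as JMH is the better choice.

```java
/** Hypothetical helper: run a task untimed for warmup, then time the measured runs. */
final class Warmup {

    private static volatile long sink; // keeps the demo workload from being optimized away

    /** Returns the mean duration in nanoseconds of the measured iterations only. */
    static double meanNanos(Runnable task, int warmupIterations, int measuredIterations) {
        if (measuredIterations <= 0) {
            throw new IllegalArgumentException("measuredIterations must be positive");
        }
        for (int i = 0; i < warmupIterations; i++) {
            task.run(); // untimed: gives the JIT a chance to compile the hot path
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredIterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / (double) measuredIterations;
    }

    public static void main(String[] args) {
        Runnable work = () -> {
            long acc = 0;
            for (int i = 0; i < 1_000; i++) {
                acc += Integer.toString(i).hashCode();
            }
            sink = acc;
        };
        System.out.printf("cold: %.0f ns/op%n", meanNanos(work, 0, 100));
        System.out.printf("warm: %.0f ns/op%n", meanNanos(work, 5_000, 100));
    }
}
```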

Conclusion

Zuul’s raw performance is very comparable to Nginx’s; in fact, after the startup warmup period it is even slightly better in my results (again, see the disclaimer: this is not a serious performance test). Nginx shows more predictable performance (lower variation), and (sadly) we experienced minor glitches (2 out of 150,000 requests) during the Zuul “warmup” (but your microservices are fault resilient, right? 🙂 )

So if you are considering some of the extra Zuul features, or want to gain more from integration with other Netflix services like Eureka for service discovery, Zuul looks very promising as a replacement for an ordinary reverse proxy. Maybe it is really used by Netflix 🙂 so you can try it too.
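The “more predictable” observation can be quantified from the ab reports above. Using the Total row, the coefficient of variation (standard deviation divided by mean) is 10.4 / 209 ≈ 0.05 for Nginx and 107.8 / 200 ≈ 0.54 for the warmed-up Zuul run, and the ratio of the 99th percentile to the median is 238 / 209 ≈ 1.14 versus 450 / 218 ≈ 2.06. A small Java helper (hypothetical name) for these two ratios:

```java
/** Hypothetical helper for comparing latency spread across ab runs. */
final class LatencySpread {

    /** Coefficient of variation: standard deviation relative to the mean. */
    static double coefficientOfVariation(double stdDev, double mean) {
        return stdDev / mean;
    }

    /** How much slower the 99th percentile is than the median. */
    static double tailRatio(double p99, double median) {
        return p99 / median;
    }

    public static void main(String[] args) {
        // "Total" row and percentile table of the Nginx run and the warmed-up Zuul run
        System.out.printf("nginx: cv=%.3f p99/p50=%.2f%n",
                coefficientOfVariation(10.4, 209), tailRatio(238, 209));
        System.out.printf("zuul:  cv=%.3f p99/p50=%.2f%n",
                coefficientOfVariation(107.8, 200), tailRatio(450, 218));
    }
}
```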

 

Blog Comments

Stanislav, those are really interesting results! We do indeed use Zuul here at Netflix to front all of the streaming and website services at Netflix and do get great and reliable performance and stability from it as well as the flexibility to handle the billions of requests that come through it every day and the inevitable issues with running a system of this scale and complexity in the cloud.

=-mikey-=

[…] One reason for using client-side load balancer can be performance. With client side balancer you can directly contact desired service with one network hop (after initial discovery of course); with traditional load-balancer you need two hops – see my very unprofessional test. […]

Thank you for this article.

I’m interested in comparing Zuul to Kong, https://getkong.org/ and this article is a step toward that goal.

Is Netflix Zuul an API Gateway?

I believe yes, depending on your definition of API Gateway: it certainly can serve as a gateway to several REST services. It also provides a lot of different functions, as mentioned in the intro section.

What did you do to improve Zuul performance between the first and second time?

Nothing special. I just executed a new test without restarting the server. My guess is that Java optimized the code itself (with JIT).

Very interesting to read. I’ve been using Zuul with Spring Cloud for some time and these results reflect my experience with it. I am curious how the Netflix OSS version of Zuul performs against Nginx (instead of the Spring Cloud version). In other words, how much faster pure Zuul would be compared to the Spring Cloud version.

Well, I don’t know. But I think that Spring Cloud just bootstraps Zuul, so maybe it takes slightly longer to start up, but runtime performance should be the same. The only difference in performance could be related to different versions of Zuul: in general, Spring Cloud may pack a slightly older version of Zuul, and a newer version can have performance optimizations.

too bad you didn’t include HAProxy or Varnish

What is your Zuul config? How did you get it to 998.39 requests per second (mean)?
