Nowadays you hear a lot about microservices. Spring Boot is an excellent choice for building a single microservice, but you need to interconnect your services somehow. That’s what Spring Cloud tries to solve (among other things) – especially Spring Cloud Netflix. It provides various components, e.g. the Eureka discovery service together with the client-side load balancer Ribbon, for inter-microservice communication. But if you want to communicate with the outside world (you provide an external API, or you just use AJAX heavily from your pages), it is good to hide your various services behind a proxy.
A natural choice would be Nginx. But Netflix comes with its own solution – the intelligent router Zuul. It offers a lot of interesting features and can be used e.g. for authentication, service migration, load shedding and various dynamic routing options. And it is written in Java. If Netflix uses it, is it fast enough compared to a native reverse proxy? Or is it just suitable as a companion to Nginx when flexibility (or other features) are important?
Disclaimer: Do not consider this a serious benchmark. I just wanted to get a feeling for how Nginx and Zuul compare, and I couldn’t find any benchmarks on the internet (OK, maybe I was not searching long enough, but I wanted to get my hands dirty). It does not follow any recommended benchmarking methodology (warm-up period, number of measurements, …) and I was just using three micro EC2 instances (which is not optimal either) in different availability zones.
So what have I done? The test was to compare the raw performance of both solutions without any special features. I just made concurrent HTTP requests for a single HTML page (of size ca. 26 KB). I used ApacheBench with 200 concurrent threads (I also tried httperf, but it seemed to be more CPU-demanding, so I got lower numbers than with ab).
First I was interested in the performance of the target HTTP server (once again Nginx) without any reverse proxy in between. ab was running on one machine and accessed the target server directly.
$ ab -n 10000 -c 200 http://target/sample.html
....
Document Path:          /sample.html
Document Length:        26650 bytes
Total transferred:      268940000 bytes
HTML transferred:       266500000 bytes
Requests per second:    2928.45 [#/sec] (mean)
Time per request:       68.295 [ms] (mean)
Time per request:       0.341 [ms] (mean, across all concurrent requests)
Transfer rate:          76911.96 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        4   33   6.0     32      66
Processing:    20   35   7.5     35     392
Waiting:       20   35   6.4     34     266
Total:         24   68   7.8     66     423

Percentage of the requests served within a certain time (ms)
  50%     66
  66%     67
  75%     69
  80%     70
  90%     74
  95%     81
  98%     91
  99%     92
 100%    423 (longest request)
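As a sanity check, ab’s reported transfer rate follows directly from its other numbers: total transferred divided by completed requests gives the bytes per response (26894 here – the 26650-byte page plus headers), and multiplying that by the request rate reproduces the KB/s figure. A minimal sketch of that arithmetic:

```java
// Re-derive ab's "Transfer rate" from its "Requests per second",
// "Total transferred" and "Complete requests" figures.
public class AbMath {
    static double transferRateKBps(double reqPerSec, long totalBytes, long requests) {
        double bytesPerResponse = (double) totalBytes / requests; // 26894 bytes here
        return reqPerSec * bytesPerResponse / 1024.0;             // KB/s, as ab reports it
    }

    public static void main(String[] args) {
        // 2928.45 req/s * 26894 B/req / 1024 ~ 76912 KB/s, matching ab's output
        System.out.printf("%.2f KB/s%n",
                transferRateKBps(2928.45, 268_940_000L, 10_000L));
    }
}
```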
Quite nice; a few more tests show similar values: 2928, 2725, 2834, 2648 req/s. There are some deviations, but this number is not that important now.
Next I set up the proxy server (Ubuntu 14.04 LTS) with the default nginx installation. I just updated the configuration to proxy to the target server:
server {
    listen 80 default_server;
    listen [::]:80 default_server ipv6only=on;

    # Make site accessible from http://localhost/
    server_name localhost;

    # allow file upload
    client_max_body_size 10M;

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $host;
        proxy_pass http://target:80;
    }
}
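Note that this is the stock configuration, which opens a new upstream connection per request. For higher throughput one could enable upstream keepalive – a hypothetical tuning that was NOT used in the benchmark below, sketched here only for context:

```nginx
# Hypothetical tuning, not benchmarked here: reuse upstream connections
# instead of opening a new one per request.
upstream target_backend {
    server target:80;
    keepalive 64;                      # pool of idle upstream connections
}

server {
    listen 80 default_server;

    location / {
        proxy_http_version 1.1;        # keepalive requires HTTP/1.1
        proxy_set_header Connection "";
        proxy_pass http://target_backend;
    }
}
```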
And ran a similar test as before:
$ ab -n 50000 -c 200 http://proxy/sample.html
...
Server Software:        nginx/1.4.6
Server Hostname:        proxy
Server Port:            80

Document Path:          /sample.html
Document Length:        26650 bytes

Concurrency Level:      200
Time taken for tests:   52.366 seconds
Complete requests:      50000
Failed requests:        0
Total transferred:      1344700000 bytes
HTML transferred:       1332500000 bytes
Requests per second:    954.81 [#/sec] (mean)
Time per request:       209.465 [ms] (mean)
Time per request:       1.047 [ms] (mean, across all concurrent requests)
Transfer rate:          25076.93 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        3   50  11.7     48     114
Processing:    37  159  11.9    160     208
Waiting:       36  159  11.9    160     207
Total:         40  209  10.4    209     256

Percentage of the requests served within a certain time (ms)
  50%    209
  66%    212
  75%    214
  80%    216
  90%    220
  95%    224
  98%    232
  99%    238
 100%    256 (longest request)
Further results were 954, 953 and 941 req/s. Performance and latency are (as expected) worse.
Now we can use the same machine to set up Zuul. The application itself is very simple:
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.builder.SpringApplicationBuilder;
import org.springframework.cloud.netflix.zuul.EnableZuulProxy;
import org.springframework.stereotype.Controller;

@SpringBootApplication
@Controller
@EnableZuulProxy
public class DemoApplication {

    public static void main(String[] args) {
        new SpringApplicationBuilder(DemoApplication.class).web(true).run(args);
    }
}
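For completeness, this assumes the Zuul starter is on the classpath. A sketch of the Maven dependency – the artifact name below is the one used by pre-Finchley Spring Cloud release trains, so check the coordinates against your Spring Cloud version:

```xml
<!-- Assumed build dependency; artifact name valid for pre-Finchley
     Spring Cloud release trains (version managed by the release train BOM). -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zuul</artifactId>
</dependency>
```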
And we just have to define a fixed route in application.yml:
zuul:
  routes:
    sodik:
      path: /sodik/**
      url: http://target
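The fixed url keeps the test independent of a discovery service. If you were running Eureka (as mentioned in the intro), you could instead route by service id and let Ribbon load-balance across instances – a sketch, where `target-service` is a hypothetical Eureka service name:

```yaml
# Hypothetical alternative, not used in this benchmark: resolve the backend
# via Eureka instead of a fixed URL ("target-service" is a made-up name).
zuul:
  routes:
    sodik:
      path: /sodik/**
      serviceId: target-service
```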
And now let’s run the test.
$ ab -n 50000 -c 200 http://proxy:8080/sodik/sample.html

Server Software:        Apache-Coyote/1.1
Server Hostname:        proxy
Server Port:            8080

Document Path:          /sodik/sample.html
Document Length:        26650 bytes

Concurrency Level:      200
Time taken for tests:   136.164 seconds
Complete requests:      50000
Failed requests:        2
   (Connect: 0, Receive: 0, Length: 2, Exceptions: 0)
Non-2xx responses:      2
Total transferred:      1343497042 bytes
HTML transferred:       1332447082 bytes
Requests per second:    367.20 [#/sec] (mean)
Time per request:       544.657 [ms] (mean)
Time per request:       2.723 [ms] (mean, across all concurrent requests)
Transfer rate:          9635.48 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2   12  92.3      2    1010
Processing:    15  532 321.6    461   10250
Waiting:       10  505 297.2    441    9851
Total:         17  544 333.1    467   10270

Percentage of the requests served within a certain time (ms)
  50%    467
  66%    553
  75%    626
  80%    684
  90%    896
  95%   1163
  98%   1531
  99%   1864
 100%  10270 (longest request)
The result is worse than my (optimistic?) guess. Additionally, we can see two failures (and two corresponding exceptions in the Zuul log complaining about an HTTP pool timeout). Apparently the timeout is set to 10 seconds by default.
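For URL-based routes like ours, Spring Cloud Netflix exposes host-level timeout properties, so the 10-second default could be raised in application.yml – a sketch (property names as documented for Spring Cloud Netflix; values here are arbitrary examples):

```yaml
# Example only: raise the response-read timeout for url-based Zuul routes.
zuul:
  host:
    socket-timeout-millis: 30000   # time to wait for the response; default 10000
    connect-timeout-millis: 2000   # time to establish the connection; default 2000
```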
So let’s get some more results.
Document Path:          /sodik/sample.html
Document Length:        26650 bytes

Concurrency Level:      200
Time taken for tests:   50.080 seconds
Complete requests:      50000
Failed requests:        0
Total transferred:      1343550000 bytes
HTML transferred:       1332500000 bytes
Requests per second:    998.39 [#/sec] (mean)
Time per request:       200.322 [ms] (mean)
Time per request:       1.002 [ms] (mean, across all concurrent requests)
Transfer rate:          26199.09 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2   16   7.9     16     126
Processing:    15  184 108.1    203    1943
Waiting:       13  183 105.9    202    1934
Total:         18  200 107.8    218    1983

Percentage of the requests served within a certain time (ms)
  50%    218
  66%    228
  75%    235
  80%    239
  90%    254
  95%    287
  98%    405
  99%    450
 100%   1983 (longest request)
Wow, what an improvement. The only explanation that comes to my mind is that JIT compilation in the JVM helped the performance. To verify it was not just a coincidence, one more attempt: 1010 req/s. In the end the result is a positive surprise for me.
Zuul’s raw performance is very comparable to Nginx – in fact, after a startup warm-up period it is even slightly better in my results (again – see the disclaimer – this is not a serious performance test). Nginx shows more predictable performance (lower variation), and (sadly) we experienced minor glitches (2 out of 150,000 requests) during the Zuul “warm-up” (but your microservices are fault resilient, right? 🙂 )
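For reference, averaging the throughput numbers quoted above gives a rough summary of the three setups (a back-of-the-envelope comparison, not a statistic – far too few runs for that):

```java
import java.util.Arrays;

// Rough averages of the req/s figures reported in the runs above.
public class Summary {
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(0.0);
    }

    public static void main(String[] args) {
        double direct = mean(new double[]{2928, 2725, 2834, 2648});   // no proxy
        double nginx  = mean(new double[]{954.81, 954, 953, 941});    // nginx proxy
        double zuul   = mean(new double[]{998.39, 1010});             // Zuul, warmed up
        System.out.printf("direct=%.0f  nginx=%.0f  zuul=%.0f req/s%n",
                direct, nginx, zuul);
    }
}
```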
So if you consider using some of the extra Zuul features, or want to gain more from integration with other Netflix services like Eureka for service discovery, Zuul looks very promising as a replacement for an ordinary reverse proxy. Maybe it really is used by Netflix 🙂 so you can try it too.
13 Comments
Mikey Cohen
Stanislav, those are really interesting results! We do indeed use Zuul here at Netflix to front all of the streaming and website services at Netflix and do get great and reliable performance and stability from it as well as the flexibility to handle the billions of requests that come through it every day and the inevitable issues with running a system of this scale and complexity in the cloud.
=-mikey-=
Michael Jackson
Thank you for this article.
I’m interested in comparing Zuul to Kong, https://getkong.org/ and this article is a step toward that goal.
Catalin
Is Netflix Zuul an API Gateway?
Stanislav Miklik
I believe yes – depending on your definition of an API Gateway, it can certainly serve as a gateway to several REST services. And it also provides a lot of different functions, as mentioned in the intro section.
Ted
What did you do to improve Zuul performance between the first and second time?
Stanislav Miklik
Nothing special. I just executed a new test without restarting the server. My guess is that Java optimized the code itself (with JIT).
Paco
Very interesting to read. I’ve been using Zuul with Spring Cloud for some time and these results reflect my experience with it. I am curious how the Netflix OSS version of Zuul performs against Nginx (instead of the Spring Cloud version). In other words, how much faster pure Zuul would be compared to the Spring Cloud version.
Stanislav Miklik
Well, I don’t know. But I think that Spring Cloud just bootstraps Zuul – so maybe it takes slightly longer to start up, but runtime performance should be the same. The only difference in performance could be related to different versions of Zuul – in general, Spring Cloud may pack a slightly older version of Zuul, and a newer version can have performance optimizations.
Caleb Cushing
too bad you didn’t include HAProxy or Varnish
nearspears
What is your Zuul config? How did you get it to 998.39 requests per second (mean)?