Performance Analysis of Node.js WebSocket Frameworks

For a current project at WhoScored, I needed to learn JavaScript, Node.js and WebSockets, after seven years of writing web applications with Java and the Spring framework. We wanted an application that could send data to thousands of concurrent users, and Node.js appeared to be the right tool for the job with its event-driven, non-blocking I/O model that scales up easily.

Firstly, to learn JavaScript, I started doing the tutorials on Codecademy and reading Douglas Crockford's JavaScript: The Good Parts. In parallel, I read Jim R. Wilson's Node.js the Right Way as well. It was a great read: using modules like async, Q, ZMQ and Express, and techniques like promises and generators, the book explains how to build scalable Node applications. It also covers sockets with the Pub/Sub, Req/Res and Push/Pull patterns, which shaped the WebSocket applications that I wrote.

System structure

After some trials, errors, and brainstorming sessions, the final structure looks like this:

A client connects for a specific key/resource, let's say "matchesfeed/8/matchcentre". The WebSocket server's responsibility is to return some JSON data for this key. If it doesn't have data for this key in memory, it requests it from the "Fetcher" via a Push socket implemented with the ZMQ library over a TCP connection. The Fetcher receives this request and adds it to a queue of jobs. Worker processes (implemented with Node.js clusters) constantly check this queue, take a job, and make an HTTP call to an API that has the actual JSON data for the key. When a worker receives the response, it passes the data to the master process, which sends it to the WebSocket server and puts the job back into the queue. When the WebSocket server receives the data, it publishes it to the clients connected for that key.

All applications are open source; you can find the WebSocket server here, the Fetcher here, and the data provider here. The Node.js version used in all applications is 0.10.
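To make the Push/Pull link concrete, here is a minimal sketch of the ZMQ sockets between the WebSocket server and the Fetcher, using the zmq module. The endpoint address and message format are my assumptions for illustration, not the exact ones from the repositories above:

```javascript
var zmq = require('zmq');

// WebSocket server side: push a job when a key is not in memory
var pusher = zmq.socket('push');
pusher.connect('tcp://127.0.0.1:5433'); // endpoint is an assumption

function requestData(key) {
  pusher.send(JSON.stringify({ key: key })); // e.g. "matchesfeed/8/matchcentre"
}

// Fetcher side: pull jobs into a queue for the worker processes
var puller = zmq.socket('pull');
puller.bindSync('tcp://*:5433');

var jobs = [];
puller.on('message', function (data) {
  jobs.push(JSON.parse(data)); // workers take a job and call the upstream API
});
```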

einaros/ws (v0.4)

Based on Heroku's WebSocket demo, I began with the einaros/ws implementation. It didn't have as much documentation as the other frameworks, but development went smoothly and I had something working in a short time.
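For reference, a minimal einaros/ws server in the spirit of that first version might look like the sketch below. The port and payload are placeholders; older ws versions expose the requested URL (which carries the key) on upgradeReq:

```javascript
var WebSocketServer = require('ws').Server;

var wss = new WebSocketServer({ port: 8080 });

wss.on('connection', function (ws) {
  // the key the client asked for, e.g. "/matchesfeed/8/matchcentre"
  var key = ws.upgradeReq.url;

  // in the real application, JSON for this key is published every second
  ws.send(JSON.stringify({ subscribed: key }));
});
```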


Client beautifully connects via WebSockets

Then I wrote a load test to see the performance. The test basically connects a new client every half second, all requesting the same data. The data is 165KB and is sent every second. With 120 concurrent clients, the application allocates around 210MB of RAM, and the data sent comes to around 35MB/second. The test machine is a 3.4GHz Intel i7 (8 processors) with 16GB of RAM, running OS X 10.9.
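The harness itself was simple; a sketch along these lines (the URL and the 120-client cap are illustrative, not the exact test code) captures the idea:

```javascript
var WebSocket = require('ws');

var bytesReceived = 0;
var clients = 0;

// connect a new client every half second, all asking for the same key
var connector = setInterval(function () {
  var ws = new WebSocket('ws://localhost:8080/matchesfeed/8/matchcentre');
  ws.on('message', function (data) { bytesReceived += data.length; });
  if (++clients >= 120) clearInterval(connector);
}, 500);

// report rough throughput once a second
setInterval(function () {
  console.log(clients + ' clients, ' + (bytesReceived / (1024 * 1024)).toFixed(1) + ' MB/s');
  bytesReceived = 0;
}, 1000);
```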

I monitored the process with StrongLoop's agent, which I found pretty impressive.

At this point, the biggest drawback for me was that it is a pure WebSocket implementation: if the client browser did not support WebSockets, it would not work. This made me try Socket.IO.

Socket.IO (v0.9)

Socket.IO seemed like the perfect solution, as it selects the most capable transport at runtime: it tries to connect to the server via WebSockets first; if that doesn't work, it downgrades to XHR polling; if that doesn't work either, it continues with Flash sockets, and so on. Moreover, it tries to reconnect sockets when the connection fails, and it supports rooms, so I could group connections based on what information they asked for without any custom implementation. So I re-implemented the application, keeping everything the same except the WebSocket framework.
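Rooms map naturally onto keys. A minimal Socket.IO 0.9 sketch, assuming the key arrives as a connection query string, might look like this:

```javascript
var io = require('socket.io').listen(8080);

io.sockets.on('connection', function (socket) {
  // key passed as a query string, e.g. ?key=matchesfeed/8/matchcentre
  var key = socket.handshake.query.key;
  socket.join(key); // rooms group sockets by the resource they asked for
});

// when fresh data arrives for a key, broadcast to that room only
function publish(key, json) {
  io.sockets.in(key).emit('data', json);
}
```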


Client connects to Socket.IO server

Indeed, when the server is down, Socket.IO tries to reconnect with decreasing frequency.

And when the server comes back up, the WebSocket connection is re-established:

Load test results are below; it is the same test I executed for the einaros/ws implementation, with the same data:

The data output looks like:


In ws, the average load was 35MB/s

Performance limit

However, when I connected 140 clients, memory usage started to grow without the number of clients or the data size changing. The garbage collector kicked in very frequently and got more aggressive over time (sometimes a couple of times per second), eating up CPU time and desperately trying to free memory. Memory usage would reach ~1GB, then heartbeats would fail and clients would start to lose their connections.


Heap size stops increasing when connections start dropping, and stays stable as Socket.IO clients try to reconnect. Also notice the very low event loop latency.

It’s time to debug!

I used Mozilla's memwatch module to detect memory leaks and take heap dumps to inspect the data. I found out that 10MB Buffer objects were somehow constantly stacking up; you can find more details in my Stack Overflow question. I couldn't figure out why this happens, but I suspect that some cumulative CPU-intensive operations slow down the event loop, so operations (sending messages) stack up in its queue.
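For anyone wanting to reproduce this kind of investigation, the memwatch pattern is roughly the following; the sixty-second window is arbitrary:

```javascript
var memwatch = require('memwatch');

// fires when the heap has grown over several consecutive GCs
memwatch.on('leak', function (info) {
  console.error('possible leak:', info);
});

// diff two heap snapshots around a suspect period
var hd = new memwatch.HeapDiff();
setTimeout(function () {
  var diff = hd.end();
  // shows which constructors (e.g. Buffer) grew, and by how much
  console.log(JSON.stringify(diff.change.details, null, 2));
}, 60000);
```

Being stuck, I decided to try out SockJS.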

I must admit my SockJS adventure didn't last long after I discovered that SockJS does not support connection query strings. This would mean amending my code so that clients send a separate message with the parameter(s) after connecting, which was a bit of a turn-off. Next, I decided to give Engine.IO a try.

Engine.IO (v0.7.12)

As far as I've seen, both Engine.IO and Socket.IO are mainly developed by Guillermo Rauch, with Engine.IO being the lower-level library. Socket.IO v0.9 appears to be somewhat outdated, and Engine.IO is the interim successor until Socket.IO v1.0 is released, which will use Engine.IO under the hood. One major difference is that Engine.IO always establishes a long-polling connection first, then tries to upgrade to better transports. This makes it resilient to firewalls and proxies, gives more predictable results, and causes less frequent reconnecting. All other differences are listed very well in the project's readme. As you can see below, Engine.IO first connected via long-polling, then via WebSockets:


Notice the ‘transport’ parameter
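A minimal Engine.IO server that logs the initial transport and the subsequent upgrade might look like this sketch (the port and payload are placeholders):

```javascript
var engine = require('engine.io');
var server = engine.listen(8080);

server.on('connection', function (socket) {
  // 'polling' first; Engine.IO upgrades to 'websocket' afterwards
  console.log('connected via', socket.transport.name);
  socket.on('upgrade', function (transport) {
    console.log('upgraded to', transport.name);
  });
  socket.send(JSON.stringify({ hello: 'world' }));
});
```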

However, its client library does not try to reconnect when the connection terminates.
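Working around that means reconnecting manually. A naive wrapper over engine.io-client, with an arbitrary fixed one-second delay, could look like this:

```javascript
var eio = require('engine.io-client');

function connect(url) {
  var socket = new eio.Socket(url);
  socket.on('close', function () {
    // fixed 1s retry; a real client would want backoff
    setTimeout(function () { connect(url); }, 1000);
  });
  return socket;
}

connect('ws://localhost:8080');
```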

The same load test with the same values (165KB sent every second to 120 clients) looks like this:

Performance limit

Let's see what happens when we increase the number of connections. Engine.IO's performance proved to be better for my application: I could load test with 230 concurrent connections, with (again) each one being sent 165KB of data every second. Any number above 230 connections resulted in the application using massive amounts of memory, up to 2GB (the same issue as with the Socket.IO implementation). Then connections would start dropping until 160 connections remained, which obviously results in lower resource usage. You can see this drop in the charts below. Remember that with the Engine.IO client implementation, clients don't reconnect once they are disconnected.

Primus (v1.5.0)

When I was about to end my research and development, I came across Primus. It is developed by Arnout Kazemier, who is an active collaborator on some of the projects mentioned above. Primus makes what I had been doing all this time (swapping WebSocket frameworks underneath the business logic) dead easy: it takes a single line of code. It has a similar syntax to all of the other frameworks, reconnects clients when the connection terminates, and has lots of useful methods and structures, like rooms.
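That one-line swap looks roughly like this; 'websockets' is the transformer name for einaros/ws, and switching to 'engine.io' or 'sockjs' is the only change needed:

```javascript
var http = require('http');
var Primus = require('primus');

var server = http.createServer();

// changing the framework is a matter of changing this single option:
// 'websockets' (einaros/ws), 'engine.io', 'sockjs', 'socket.io', ...
var primus = new Primus(server, { transformer: 'websockets' });

primus.on('connection', function (spark) {
  spark.write({ hello: 'world' });
});

server.listen(8080);
```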


Primus connects

Here are the test results for the einaros/ws library with Primus (165KB sent every second to 120 clients):

Compared to the native implementation, it has higher CPU usage and event loop latency, but slightly lower memory consumption. However, something unexpected happens after 10 minutes:

The event loop average drops to almost 0ms, but memory allocation starts to stack up, reaching gigabytes within minutes. This happened in both of my tries. I decided to leave this rather than dig deeper. (Please see Arnout's comment about this issue on the right-hand side.)

As Primus is a wrapper around socket libraries, I decided to use a native library just to keep things at a lower level.

Looking back: Performance limit of einaros/ws

At this point, I wanted to look back and see at what load the einaros/ws implementation (which I had ditched because it doesn't support other transports) would crash.

Handling 410 concurrent connections, each being sent 165KB of data every second, einaros/ws's performance was far superior to any other framework I tried. Compare it with Socket.IO's maximum of 140 connections and Engine.IO's maximum of 230 connections. einaros/ws would fail at 415 connections in the same way as the other implementations: memory usage would stack up seemingly indefinitely (I waited until it hit 8GB before killing the process).

Final decision on the framework

In summary, these test results showed that Engine.IO looks like the best choice for my application. It supports connection types other than WebSockets, and has significantly lower memory usage than the others.

The difference in memory seems like a good trade-off for losing automatic client reconnection on connection failure. At least until Socket.IO 1.0 is released.

Changing the flow based on limitations

The aim when starting this project was to have a system that can support thousands, even ten thousand, concurrent users. It seems that with 165KB of data sent every second, this won't be possible. This made me decide to decrease the data size and introduce a delta model, in which the application sends only the new JSON content instead of the whole object each time.
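The delta model can be as simple as diffing the previous payload against the new one and sending only the changed keys. Here is a naive shallow sketch (a real implementation would need to handle nested objects and deleted keys):

```javascript
// shallow delta: only the top-level keys whose values changed
function delta(previous, current) {
  var changes = {};
  Object.keys(current).forEach(function (key) {
    if (JSON.stringify(current[key]) !== JSON.stringify(previous[key])) {
      changes[key] = current[key];
    }
  });
  return changes;
}

var lastSent = {};
function publishDelta(key, json) {
  var diff = delta(lastSent[key] || {}, json);
  if (Object.keys(diff).length > 0) {
    publish(key, diff); // broadcast only what changed
  }
  lastSent[key] = json;
}
```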

Test 1: The impact of data size

I ran a load test that sends 5KB of data to 4000 users every second. 5KB of JSON makes around 200 lines of data with reasonable string lengths. I also modified the test so that 10 new clients connect every second. The data throughput is equivalent to 20MB sent per second.

We will take these values as a baseline and see what happens when we keep the throughput the same and change some variables.

Test 2: The impact of concurrent connection size

In this test, 2000 users connected to the server and received 10KB of data every second. Again, 20MB is sent every second in this case.

Compare these results with test one. As you can see, doubling the data size and halving the number of connections resulted in 20% lower memory consumption, 35% lower CPU usage, and 75% lower event loop latency.

Test 3: The impact of data broadcasting frequency

In this test, 4000 users connected to the server and received 10KB of data every 2 seconds. Again, this equals 20MB sent every second.

Compare these results with the first test. It seems doubling the data size and halving the send frequency results in 25% lower CPU usage, but more or less the same memory consumption and event loop latency.
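To recap, all three tests keep the throughput constant at 20MB/s while trading connection count, payload size, and send frequency against each other:

```javascript
// constant-throughput matrix: clients × payload (KB) × sends per second
var tests = [
  { clients: 4000, kb: 5,  perSecond: 1   }, // test 1
  { clients: 2000, kb: 10, perSecond: 1   }, // test 2
  { clients: 4000, kb: 10, perSecond: 0.5 }  // test 3
];

tests.forEach(function (t, i) {
  var mbPerSecond = (t.clients * t.kb * t.perSecond) / 1000; // treating 1MB as 1000KB
  console.log('test ' + (i + 1) + ': ' + mbPerSecond + ' MB/s'); // 20 each
});
```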

Conclusion

With a single server, we do not really have direct control over how many connections the application will receive, so it seems we can only play with the size of the data sent and how frequently we send it. These tests show that, for a constant throughput (at least for Engine.IO), the number of connections has the biggest impact on performance, followed by the size of the data, and lastly the frequency at which it is sent. Sending larger data less frequently appears to yield better performance. It's a double-edged sword, though: increasing the size of the data packets was also what made my applications crash.

So I think the best way of balancing these factors is to estimate the maximum number of concurrent users in the long run (at least a couple of years ahead), find the maximum data size that can be used with that number, and then decrease the frequency of sending data if possible, though as noted this is the least critical factor. Beyond that point, since the number of connections has the biggest impact on performance, adding more application instances behind a load balancer should improve performance more than any other method.


Discuss this on Hacker News

Acknowledgements: Benjamin Byholm for reviewing and commenting on this article with his insights on caching, multiple servers, and V8 memory usage; Arnout Kazemier for his helpful comments; Nico Kaiser for his feedback; and Guillermo Rauch and David Halls for taking the time to read the draft.
