In one project, we were tasked with integrating WhatsApp into the system. Since it was supposed to be a “killer feature” of the product and we wanted to please the client, we were highly motivated to find a working solution.
We didn’t have to search for long before we had promising results: there is an official WhatsApp Business API. Without much further analysis, we concluded that the integration should be possible and indeed an easy task.
However, when the time came to dig deeper, we quickly realised this was not the way to go. WhatsApp Business is a completely different thing from standard WhatsApp. What we wanted was to connect an existing personal WhatsApp account and use it through the application, whereas with WhatsApp Business you need to set up a completely new account, which is not even a phone number, but rather a virtual account named after your company. There is also a verification process, which would severely hamper the connection flow in the application. And the biggest problem is the limitations. The main one is that you are not allowed to initiate a conversation and can only reply within 24 hours of the last message. So this API is clearly not meant for normal conversations, but rather for something like a helpdesk.
After some more searching, we discovered a Node.js library called whatsapp-web.js. It runs WhatsApp Web in a headless browser and then queries it directly. Since WhatsApp Web works well, this must too, or at least that was our reasoning. We even built a simple prototype that loaded the message history, listened for incoming messages and sent one test message to a preconfigured recipient. Everything seemed to work and there was no viable alternative, so we decided this was the way to go.
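To give an idea of the shape of such a prototype, here is a minimal sketch; it is not the code from the project. It assumes the `ready` and `message` events and the `sendMessage` and `initialize` methods that the whatsapp-web.js client provides, but it is written against a small hand-rolled interface (`WhatsAppClientLike`, a name invented for this example) so that it stays self-contained. The `startPrototype` helper is hypothetical too, and login via QR code and loading of the message history are left out.

```typescript
// Structural type covering only what this sketch needs; the real
// whatsapp-web.js Client exposes far more than this (assumption: the
// real client is compatible with this shape).
interface IncomingMessage {
  from: string; // chat ID of the sender
  body: string; // message text
}

interface WhatsAppClientLike {
  on(event: string, handler: (...args: any[]) => void): unknown;
  sendMessage(chatId: string, content: string): Promise<unknown>;
  initialize(): Promise<unknown>;
}

// Hypothetical helper: sends one test message once the session is ready
// and forwards every incoming message to the given callback.
function startPrototype(
  client: WhatsAppClientLike,
  testRecipient: string,
  onMessage: (message: IncomingMessage) => void,
): Promise<unknown> {
  // "ready" fires once the WhatsApp Web session is usable.
  client.on("ready", () => {
    client
      .sendMessage(testRecipient, "Test message from the prototype")
      .catch((error) => console.error("sending failed", error));
  });

  // "message" fires for every incoming message.
  client.on("message", (message: IncomingMessage) => onMessage(message));

  // Starts the headless browser and loads WhatsApp Web.
  return client.initialize();
}
```

With the real library, the call would presumably look like `startPrototype(new Client({}), "<number>@c.us", (m) => console.log(m.from, m.body))`, with `Client` imported from whatsapp-web.js and the recipient given as a chat ID.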
DISCLAIMER: WhatsApp takes a hard stance against any unofficial API, so consult a lawyer before using this library in a production system.
For various reasons, we had decided to build the project on a serverless architecture (on AWS specifically). The initial plan for WhatsApp was to use EC2 (a virtual machine service; EC2 stands for Elastic Compute Cloud) and run the library there all the time. There would also be an application wrapper communicating with the library and providing an API to the application.
So the time came when WhatsApp got onto the roadmap and we started the implementation. After finally setting up EC2 and configuring a VPC (an EC2 instance must live in a VPC, which stands for Virtual Private Cloud), we wanted to implement at least one feature to prove that the approach would work in this environment.
Then came the problems. Running the library required more computing resources than we expected, so the idea that the smallest instance type would easily cover 10 WhatsApp accounts proved to be naive. We would now have to implement some kind of load balancer to start and stop instances based on usage, which is one more layer of complexity. On top of that, the new operating cost estimates were above the operating budget, so we started looking intensely for alternatives.
After some discussions in the team, we came up with an idea: what if we could run the WhatsApp client in Lambda every minute for several seconds, process any new data and then shut it down again? There were several major obstacles, so it was more a hope than a real solution at that point, but we still invested some time in it.
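The idea of "wake up, process what is new, shut down" can be sketched as a single function with a time budget. This is only an illustration under assumptions: `ShortLivedClient`, `runOnce`, `budgetMs` and `pollMs` are names made up for this example, and the scheduling itself (for instance a rule firing once a minute that invokes the Lambda) is not shown.

```typescript
// Hypothetical minimal view of a client that can be started and stopped
// cheaply. None of these names come from a real library.
interface ShortLivedClient {
  start(): Promise<void>;
  drainNewMessages(): Promise<string[]>; // messages received since the last call
  stop(): Promise<void>;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs the client for at most roughly `budgetMs` milliseconds, hands every
// new message to `handle`, and always shuts the client down at the end.
// Returns the number of processed messages.
async function runOnce(
  client: ShortLivedClient,
  handle: (message: string) => Promise<void>,
  budgetMs: number,
  pollMs = 500,
): Promise<number> {
  const deadline = Date.now() + budgetMs;
  let processed = 0;

  await client.start();
  try {
    while (true) {
      for (const message of await client.drainNewMessages()) {
        await handle(message);
        processed++;
      }
      // Stop polling if another wait would overrun the budget.
      if (Date.now() + pollMs >= deadline) break;
      await sleep(pollMs);
    }
  } finally {
    // Shut down even if handling failed, so no browser is left running.
    await client.stop();
  }
  return processed;
}
```

A Lambda handler would then be little more than a call such as `runOnce(client, saveToDatabase, 10_000)`, where `saveToDatabase` is again a placeholder. The budget has to stay below the configured Lambda timeout, otherwise the shutdown step would never run.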
The first obstacle was how to run the browser in Lambda. The main problem is Lambda’s 50 MB limit on the deployment package. Thankfully, you can attach additional layers on top of the main code, and there is a package, chrome-aws-lambda, which is customised precisely so that it fits into Lambda. It also seemed to work with the library, so problem solved.
Then there was the session. In the time between the analysis and the implementation, WhatsApp moved multi-device support from beta to the preferred authentication mode. During the analysis, the library represented a session as a simple JSON object, so we had counted on this and assumed the session could live in a single database column or something similar. Now, however, things were totally different. With multi-device, the library doesn’t have any compact session representation and just uses the Chromium profile folder, which is tens of megabytes in size. Fortunately for us, when we simply zipped the folder and uploaded it to S3 (the storage service in AWS), and then downloaded and unzipped it the next time we ran the client, it worked like a charm.
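The download, run, upload cycle can be sketched as a small wrapper. This is an illustration rather than the project's code: `SessionStore` stands in for S3, `SessionArchiver` stands in for whatever zip library is used, and both interfaces, along with `withPersistedSession`, are names invented for this example.

```typescript
// Hypothetical abstraction over the object storage (S3 in this story).
interface SessionStore {
  download(key: string): Promise<Uint8Array | null>; // null if nothing saved yet
  upload(key: string, data: Uint8Array): Promise<void>;
}

// Hypothetical abstraction over zipping and unzipping a folder.
interface SessionArchiver {
  pack(directory: string): Promise<Uint8Array>;
  unpack(archive: Uint8Array, directory: string): Promise<void>;
}

// Restores the browser profile folder from storage (if one was saved),
// runs the given work against it, and saves the folder back afterwards.
async function withPersistedSession<T>(
  store: SessionStore,
  archiver: SessionArchiver,
  key: string,
  profileDir: string,
  run: (profileDir: string) => Promise<T>,
): Promise<T> {
  const saved = await store.download(key);
  if (saved !== null) {
    await archiver.unpack(saved, profileDir);
  }

  const result = await run(profileDir);

  // Only reached when `run` succeeded, so a crashed run never
  // overwrites the last known-good session.
  await store.upload(key, await archiver.pack(profileDir));
  return result;
}
```

In Lambda, `profileDir` would have to point somewhere under `/tmp`, since that is the only writable part of the filesystem there. Uploading only after a successful run is a design choice of this sketch, made on the assumption that a half-written profile is worse than a slightly stale one.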
So now we had a workaround for everything and decided to go this route.