WhatsApp integration – part 3: Baileys

Series overview

  1. Part 1: the beginnings
  2. Part 2: whatsapp-web.js in Lambda
  3. Part 3: Baileys

Introduction

In part 2 we went through the initial solution and its problems. Now we will describe the second solution, which we still use today.

Solution description

The new solution is based on the Baileys library. It has a completely different philosophy from whatsapp-web.js: instead of running any kind of browser, it interacts with the WhatsApp servers directly over a WebSocket, speaking the WhatsApp protocol itself. This requires far fewer computing resources and is overall a much better fit for the AWS ecosystem.
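For illustration, connecting a client is roughly as simple as the following sketch (assuming the @adiwajshing/baileys package; the helper and event names come from the version we used, so treat the details as illustrative):

    import makeWASocket, { useMultiFileAuthState } from '@adiwajshing/baileys'

    async function connect() {
      // load Signal credentials from disk (see "Storing session" below for
      // why this particular helper does not work in Lambda)
      const { state, saveCreds } = await useMultiFileAuthState('./auth')

      // no browser involved – this opens a WebSocket to the WhatsApp servers
      const sock = makeWASocket({ auth: state })

      // persist credentials whenever they change
      sock.ev.on('creds.update', saveCreds)
      sock.ev.on('messages.upsert', ({ messages }) => {
        console.log('received', messages.length, 'message(s)')
      })
    }

    connect()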

Storing state

The Baileys library also solves the problem at a lower level. It implements the WhatsApp protocol well but deliberately leaves out convenience functionality. The first example of this is state. The WhatsApp protocol doesn’t offer any way to retrieve all messages at once. Instead, after connecting a client, you need to listen to history events and later to update/upsert events, and save the data to a store of your choice. A simple store (makeInMemoryStore) is included in the library, but it’s not production ready; it serves as a nice guide for implementing your own store.
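As a rough sketch, a custom store boils down to binding a few event listeners (event names and payload shapes are those of the Baileys version we used; newer releases have reorganized the history events):

    import { BaileysEventEmitter, Chat, Contact, WAMessage } from '@adiwajshing/baileys'

    // a minimal custom store: accumulate the history batches, then keep
    // the data current via upsert events
    class SimpleStore {
      chats = new Map<string, Chat>()
      contacts = new Map<string, Contact>()
      messages: WAMessage[] = []

      bind(ev: BaileysEventEmitter) {
        // history events – they may fire more than once, see "Limitations" below
        ev.on('chats.set', ({ chats }) => {
          for (const chat of chats) this.chats.set(chat.id, chat)
        })
        ev.on('contacts.set', ({ contacts }) => {
          for (const contact of contacts) this.contacts.set(contact.id, contact)
        })
        ev.on('messages.set', ({ messages }) => {
          this.messages.push(...messages)
        })
        // live events arriving after the history
        ev.on('messages.upsert', ({ messages }) => {
          this.messages.push(...messages)
        })
      }
    }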

The store we have implemented is backed by S3 and saves after every change to prevent any losses (in Lambda you have to expect that your function may be killed at any moment). A throttling mechanism ensures S3 rate limits are not exceeded and also limits network transfers.
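The throttling can be as simple as coalescing writes into a fixed window. A sketch using the AWS SDK v3 (the serialize callback stands in for the store’s own serialization; the interval is illustrative):

    import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'

    const s3 = new S3Client({})
    const SAVE_INTERVAL_MS = 2000 // throttle window – tune to your S3 limits

    let dirty = false
    let saving: Promise<void> | undefined

    // call after every store mutation; performs at most one PutObject
    // per window, but never loses the latest state
    export function scheduleSave(bucket: string, key: string, serialize: () => string) {
      dirty = true
      if (saving) return
      saving = (async () => {
        while (dirty) {
          dirty = false
          await s3.send(new PutObjectCommand({
            Bucket: bucket,
            Key: key,
            Body: serialize(),
          }))
          await new Promise(resolve => setTimeout(resolve, SAVE_INTERVAL_MS))
        }
        saving = undefined
      })()
    }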

A big advantage is that the state is raw data which you may transform as needed, unlike IndexedDB data, which is inaccessible from the outside. This makes it possible to merge the data stores in case of concurrent conflicts, and it keeps the size much lower (usually no more than a few megabytes, including our metadata).

Storing session

The session is represented by the Signal credentials, which are a set of keys; in addition, each conversation has its own keys. Again, the library provides a utility session manager (useMultiFileAuthState), but it persists the data to the filesystem, which makes it unusable in the Lambda environment, where the filesystem isn’t permanent.

We opted to implement our own DynamoDB-based session manager, where each connected client has one row and each key is stored in its own column.
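In outline, such a session manager only needs to expose the same shape as the state returned by useMultiFileAuthState. A simplified sketch (the table name and column layout are illustrative; initAuthCreds and BufferJSON are Baileys utilities for creating fresh credentials and serializing binary key material):

    import { DynamoDBClient } from '@aws-sdk/client-dynamodb'
    import { DynamoDBDocumentClient, GetCommand, PutCommand } from '@aws-sdk/lib-dynamodb'
    import { initAuthCreds, BufferJSON } from '@adiwajshing/baileys'

    const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}))
    const TABLE = 'whatsapp-sessions' // hypothetical: one item per connected client

    export async function useDynamoDBAuthState(clientId: string) {
      // the whole session lives in one item; every Signal key gets its own attribute
      const { Item } = await ddb.send(new GetCommand({ TableName: TABLE, Key: { clientId } }))
      const row: Record<string, any> = Item ?? { clientId }

      const creds = row.creds ? JSON.parse(row.creds, BufferJSON.reviver) : initAuthCreds()
      const save = async () => {
        row.creds = JSON.stringify(creds, BufferJSON.replacer)
        await ddb.send(new PutCommand({ TableName: TABLE, Item: row }))
      }

      const state = {
        creds,
        keys: {
          get: async (type: string, ids: string[]) => {
            const out: Record<string, any> = {}
            for (const id of ids) {
              const raw = row[`${type}:${id}`]
              if (raw) out[id] = JSON.parse(raw, BufferJSON.reviver)
            }
            return out
          },
          set: async (data: Record<string, Record<string, unknown>>) => {
            for (const [type, byId] of Object.entries(data)) {
              for (const [id, value] of Object.entries(byId ?? {})) {
                row[`${type}:${id}`] = JSON.stringify(value, BufferJSON.replacer)
              }
            }
            await save()
          },
        },
      }
      return { state, saveCreds: save }
    }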

Limitations

The Baileys library has worked very well even without applying workarounds – it’s fast and reliable. That doesn’t mean, however, that it has no limitations.

Non-deterministic initial history load

When you create a client (or socket, as the library calls it), you need to listen to “chats.set”, “messages.set” and “contacts.set” – these events carry the history. The problem is that each of them fires multiple times (usually twice, but I have seen occurrences with one or three) and there is no way to check whether all of them have already arrived. Sometimes the second event is lost for some reason and cannot be replayed, which corrupts the whole session. Other times the second event arrives after 2 minutes, which is better than not arriving at all, but it means you can’t simply wait for a fixed time and then be sure the history is complete.

AWS CloudWatch logs showing an occasion when the second “messages.set” event arrived more than 2 minutes after the initial client connection

So we now wait for two events of each kind, or up to a maximum timeout, before starting the processing – and then we keep listening for those events and handle them as they arrive. The only downside is that we cannot reliably tell when the history has been fully processed, so the status shown to the user may be wrong in some cases, but no data is lost, which is the most important thing.
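In code, the wait reduces to a small helper that resolves after the expected number of events or after a timeout, whichever comes first (the counts and timeout here are illustrative values; Baileys gives no “history complete” signal, so this is the best we can do):

    import { BaileysEventEmitter } from '@adiwajshing/baileys'

    // resolve once `count` occurrences of `event` arrived, or after `timeoutMs`
    function waitForEvents(ev: BaileysEventEmitter, event: string, count: number, timeoutMs: number) {
      return new Promise<void>(resolve => {
        let seen = 0
        const timer = setTimeout(done, timeoutMs)
        function handler() {
          if (++seen >= count) done()
        }
        function done() {
          clearTimeout(timer)
          ev.off(event as any, handler)
          resolve()
        }
        ev.on(event as any, handler)
      })
    }

    // given a freshly connected sock: wait for the usual two batches of each
    // history event, but at most 2 minutes
    await Promise.all(
      ['chats.set', 'messages.set', 'contacts.set'].map(event =>
        waitForEvents(sock.ev, event, 2, 120_000),
      ),
    )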

Multiple sockets for the same client not allowed

The initial architecture was to have separate Lambdas for synchronizing new data and for sending messages. But testing quickly showed us this was not the way forward. Any attempt to connect a second time led to either repeated unsuccessful connection attempts or a mysterious error “1006”. After some searching we found out it means “Many sockets open”.

So a new architecture was clearly needed. The usual solution would be a single worker which receives its commands via a queue. AWS provides a queue service in SQS, but the problem is that it doesn’t offer any partitioning. And since each client must only receive the commands addressed to it, this is impossible to do without partitioning.

Fortunately, the AWS SDK allows creating SQS queues dynamically. This is a little impractical (we wanted to have all the infrastructure described in serverless.yml files, and this breaks that), but it works well.
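A sketch of the dynamic per-client queue (the queue naming is illustrative; CreateQueue is idempotent for an existing queue with the same attributes, so it doubles as a lookup):

    import { SQSClient, CreateQueueCommand, SendMessageCommand } from '@aws-sdk/client-sqs'

    const sqs = new SQSClient({})

    // one queue per connected client stands in for the partitioning SQS lacks
    export async function getClientQueueUrl(clientId: string): Promise<string> {
      const { QueueUrl } = await sqs.send(new CreateQueueCommand({
        QueueName: `whatsapp-commands-${clientId}`,
      }))
      return QueueUrl!
    }

    // producers (e.g. the API handling user actions) target one client's queue
    export async function sendCommand(clientId: string, command: object) {
      await sqs.send(new SendMessageCommand({
        QueueUrl: await getClientQueueUrl(clientId),
        MessageBody: JSON.stringify(command),
      }))
    }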

So only a single Lambda remained. Its responsibilities are the initial history processing, the processing of new incoming events, and the execution of commands coming from the user.

A simplified diagram of how the single worker handles various tasks
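Sketched in code, the worker keeps the socket’s event listeners running and, in parallel, drains the client’s command queue (reusing the sqs client and getClientQueueUrl from the previous sketch; the command shape is illustrative):

    import { ReceiveMessageCommand, DeleteMessageCommand } from '@aws-sdk/client-sqs'
    import makeWASocket from '@adiwajshing/baileys'

    async function runWorker(clientId: string, sock: ReturnType<typeof makeWASocket>) {
      const queueUrl = await getClientQueueUrl(clientId)
      // history and live events are handled by the listeners bound to sock.ev;
      // meanwhile, long-poll the per-client queue for user commands
      while (true) {
        const { Messages } = await sqs.send(new ReceiveMessageCommand({
          QueueUrl: queueUrl,
          WaitTimeSeconds: 20,
        }))
        for (const msg of Messages ?? []) {
          const command = JSON.parse(msg.Body!)
          if (command.kind === 'sendMessage') {
            await sock.sendMessage(command.jid, { text: command.text })
          }
          await sqs.send(new DeleteMessageCommand({
            QueueUrl: queueUrl,
            ReceiptHandle: msg.ReceiptHandle!,
          }))
        }
      }
    }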

Media failing to download

It’s expected that not all media files are stored on WhatsApp servers. The library even provides a function to request a reupload from the mobile device (documentation).
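In recent versions of the library the request looks roughly like this (downloadMediaMessage and updateMediaMessage per the current Baileys API; our code at the time differed in details):

    import makeWASocket, { downloadMediaMessage, WAMessage } from '@adiwajshing/baileys'
    import P from 'pino'

    // try to download; if the media is gone from the WhatsApp servers, ask the
    // phone to re-upload it – this is the request that fails for old files
    async function fetchMedia(sock: ReturnType<typeof makeWASocket>, msg: WAMessage) {
      return downloadMediaMessage(msg, 'buffer', {}, {
        logger: P(),
        reuploadRequest: sock.updateMediaMessage,
      })
    }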

But in reality, for files older than some threshold, this reupload request fails every time, even when the mobile device is online. The same media can be downloaded in WhatsApp Web, so the failure probably comes down to how the protocol is used.

We have filed a bug report and hope it will be resolved in the near future.

No contact names

When contacts are received in the “contacts.set” event, they don’t contain the name given to them by the phone’s user, only the name they defined for themselves in WhatsApp. This is good enough in most cases, but some people set their name to an emoji or something similar, which makes identification harder for the user.
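So the best a consumer can do is fall back through the available fields (field names per the Baileys Contact type; name is the address-book name, which in our experience stays empty, and notify is the self-defined one):

    import { Contact } from '@adiwajshing/baileys'

    // best-effort display name: the self-defined "notify" name is usually
    // all we get; fall back to the raw id (phone number) as a last resort
    function displayName(contact: Contact): string {
      return contact.name ?? contact.notify ?? contact.id
    }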

Conclusion

All in all, Baileys is a fast and reliable library, covering most of the needs of a WhatsApp integration. Some limitations exist, but they are not critical and hopefully will be fixed in the near future.
