Gofrendly is a social app for women. Its central feature is a realtime chat, very similar to what you find in Facebook Messenger or WhatsApp.
With realtime chats being such a central part of so many apps, you would expect them to be relatively easy to build by now. They are not. There are some approaches and tools that simplify the task, but none that fits perfectly.
This article is a discussion about various realtime chat implementations that we have tried at Gofrendly over the years.
Solution One: outsourcing your chat
Ideally, we would have liked to let someone else handle the chat infrastructure completely, so we could focus on developing the less mainstream parts of the Gofrendly app.
There are plenty of alternatives available offering drop-in chat solutions that can easily be added to a mobile app. The problem with most of them: your lack of control.
The very first version of Gofrendly (back before 2018) used one of them. That made us totally dependent on a third party’s design and implementation choices. That provider’s product was still under construction, which resulted in generous pricing but a shaky infrastructure with unwanted downtime, as well as scaling and load issues. But the deal breaker came when Gofrendly wanted to merge features specific to our app into the chat, such as providing a smooth integration between finding new friends and chatting with them.
So Gofrendly’s chat underwent a first rewrite.
Solution Two: Firebase realtime database
Firebase offers a NoSQL database accessible from Android and iOS. It provides realtime data synchronisation across devices via a combination of push notifications and realtime events. It is truly a masterpiece of infrastructure and scales well, provided that you design your data models correctly.
Offshore mobile development firms love Firebase, since it lets mobile developers work with shared data without having to design a custom backend for it. Building a realtime chat with Firebase is rather simple:
- First you decide on data structures: one for messages, one to represent a chat discussion, one to represent users in a discussion, etc.
- Then you implement that logic twice: once on iOS, once on Android.
In your first release, it will work like a charm. The problems come over time, and some of them are serious:
- Time works against you: So what do you do when your data structures need to evolve? You build new ones alongside the old ones. And your Firebase schema takes a leap in complexity. Gofrendly started with a chat between users: each chat was a simple list of messages. That works well when you have few users chatting at the same time. The data structures were small and neat, life was a walk in the park. Then we introduced a new type of chat in the app, where potentially hundreds of users needed to interact. We needed discussion threads. At that point, you take your virtual duct tape and start hammering and squeezing at your data structures till you get something that works without requiring a change in legacy data structures. Or you just build a separate second chat beside the first. As new features keep entering, your schema will grow more and more complex, until the technical debt gets too high. The inevitable day of this major data structure redesign will be painful, doubly so if you decide to maintain both the new and old data structures to ensure compatibility with legacy versions of your app.
- Event listeners at scale: a wonderful feature of Firebase is its ability to propagate database changes to all devices that subscribe to them. Your app, however, needs to explicitly subscribe to every data resource it wants to watch. That works well when you have little data, but what happens when your users have been around and generating content for years? Some users’ apps will end up having to monitor a lot of data sets. And the Firebase SDK gets notoriously slow when subscribing to too many of them. As years pass, your users will see their app getting more and more sluggish, and start pouring hate on your app store ranking. Of course, you can design your data structures up front to tackle this issue, by implementing some form of data pagination, or automated archiving of older data. But I am ready to bet that you will miss something somewhere that ultimately will make the number of event listeners in your app unsustainable.
- Globalism: when you start a Firebase instance, you have to choose a location for it. Odds are that when you were building your first MVP, you just chose the nearest data center available and stopped thinking about it. Years later, you have users all over the world, and their apps keep syncing their data with that data center. Say you chose the US East Coast. But, oh joy and wonder, your app’s big breakthrough happens in rural Asia. What’s the time lag gonna be for your users there? And, bummer, once you have chosen a location for your Firebase instance, you are stuck with it.
- Data bug aggregation: with Firebase, mobile apps are in charge of manipulating data structures. And as your data structures grow in number and complexity, so will the risk of releasing buggy apps that will start breaking your data once in use. When that happens, you will be forced to release patched apps in a hurry and have to support buggy data in later app versions for as long as you still have buggy apps running in the wild. Do you feel the pain?
- Refactoring apocalypse: all the arguments above point in the same direction, namely that at some point in time you will need to redesign your Firebase schema from the ground up. It is not a question of 'if', but of 'when'. And at that point, since you will have to handle data migration and backward compatibility anyway, you might as well consider changing infrastructure completely and replacing Firebase with something else.
Bottom line: Firebase is a nice fit when you just want to release a quick MVP, and in particular when you are outsourcing it. But as time passes, you will reach a point when the need to refactor will force you out of Firebase.
Solution Three: Custom backend and realtime event framework
Four years into Gofrendly’s journey, we hit the wall with Firebase. So we decided to replace it with a custom solution, made of a backend offering a REST API and a realtime event framework to push data updates to the apps.
Designing a chat API
Let’s first look at the REST API. You will need a bunch of endpoints. Probably more than you first hoped for. As a reference, here is a screenshot of Gofrendly’s swagger specification for its chat API:
Your first reaction should be: “wow, this was not trivial.” And indeed, building the backend of a realtime chat is not a simple task. This alone justifies that you should absolutely consider other options first, like drop-in chats or Firebase.
Choosing a database
SQL or NoSQL? A SQL database is almost guaranteed to turn into a bottleneck over time, but it offers really neat querying capabilities. A NoSQL database scales up really well, but you will have to pay attention to your indexing if you want to support advanced querying.
We chose a hybrid setup: a NoSQL database (Google Datastore) with partial replication into an Elasticsearch cluster, thereby getting both the speed and flexibility of NoSQL and the advanced querying capabilities of Elasticsearch, but at the potential risk of split-brain situations where the two stores disagree.
Designing data structures
There are numerous far-reaching decisions to make when designing the data structures of a realtime chat.
You want your structures to work at scale, so you want to keep them small and limit dependencies between different data structures. You also want to be able to paginate any list of things that can potentially grow endlessly. And you want them to work well for the specific database you chose.
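To make the pagination requirement concrete, here is a minimal sketch of cursor-style paging. The function name `fetch_page`, its parameters and the in-memory sequence standing in for a database query are all assumptions made for illustration, not Gofrendly's actual implementation.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def fetch_page(
    items: Sequence[T], cursor: int = 0, limit: int = 20
) -> tuple[list[T], Optional[int]]:
    """Return one page of items plus the cursor of the next page.

    The cursor is a plain offset here (an assumption); a real database
    would typically hand back an opaque cursor instead.
    """
    page = list(items[cursor:cursor + limit])
    end = cursor + limit
    # None signals to the caller that there is nothing left to fetch.
    next_cursor = end if end < len(items) else None
    return page, next_cursor
```

The caller keeps requesting pages until the returned cursor is `None`, so no single request ever has to load an unbounded list.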
Let’s start with messages. Each message has a text and an author. Each message belongs to a chat, and each message has a creation date used to sort them into a timeline. So a basic message structure would be:
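A minimal sketch of such a structure, written as a Python dataclass. The field names are illustrative assumptions and not Gofrendly's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Message:
    """Basic chat message; all field names are hypothetical."""
    message_id: str
    chat_id: str          # the chat this message belongs to
    author_id: str        # the user who wrote the message
    text: str
    created_at: datetime  # used to sort messages into a timeline


def timeline(messages: list[Message]) -> list[Message]:
    """Return the messages ordered from oldest to newest."""
    return sorted(messages, key=lambda m: m.created_at)
```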
Except you probably want to have different kinds of messages: text, image, video, system message. And you probably want to add reactions to messages (likes and such), and keep track of the number of views.
Then consider this question: do you want to support nested chat threads? If so, any message may be the starting point to a new chat thread. And how many levels of threading do you want to support then?
Our message structure is now already a bit more complex:
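As a sketch only, with hypothetical field names, the extended structure could look like this. It assumes a single `parent_id` link for threading, per-reaction counters, and the 'count_views' counter discussed in the text.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class MessageKind(Enum):
    TEXT = "text"
    IMAGE = "image"
    VIDEO = "video"
    SYSTEM = "system"


@dataclass
class Message:
    """Extended chat message; all field names are hypothetical."""
    message_id: str
    chat_id: str
    author_id: str
    created_at: datetime
    kind: MessageKind = MessageKind.TEXT
    text: Optional[str] = None
    media_url: Optional[str] = None   # for image and video messages
    parent_id: Optional[str] = None   # set when the message is a reply in a thread
    count_replies: int = 0            # size of the thread starting at this message
    count_views: int = 0
    reactions: dict[str, int] = field(default_factory=dict)  # reaction name -> count


def add_reaction(message: Message, reaction: str) -> None:
    """Increment the counter for one reaction type."""
    message.reactions[reaction] = message.reactions.get(reaction, 0) + 1
```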
Would that work? Of course. But what happens if you decide down the line that it would be nice to let a message author see which users have liked one particular message? A 'count_views' attribute won't work anymore…
My point here is that whichever data structure you come up with, you should expect to have to refactor it later on.
However, now that we are hiding those data structures behind a versioned API, refactoring data structures is much easier than when we shared them directly between apps via Firebase.
One nice trick with data stored as documents in a no-SQL database is that you can migrate your data structures on-the-fly the next time your backend is accessing them:
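A minimal sketch of this lazy migration, assuming each document carries a hypothetical `schema_version` field and using a plain dict as a stand-in for the document store. The v1 to v2 change shown (adding a `kind` field) is invented for illustration.

```python
from typing import Any

CURRENT_SCHEMA_VERSION = 2  # hypothetical latest version


def migrate_message(doc: dict[str, Any]) -> dict[str, Any]:
    """Upgrade a stored message document to the current schema, step by step."""
    doc = dict(doc)  # work on a copy, leave the caller's document untouched
    version = doc.get("schema_version", 1)
    if version < 2:
        # Assumed v1 -> v2 change: messages gain a 'kind', defaulting to text.
        doc.setdefault("kind", "text")
        version = 2
    doc["schema_version"] = version
    return doc


def load_message(store: dict[str, dict[str, Any]], message_id: str) -> dict[str, Any]:
    """Read a message and persist the migrated form if it changed."""
    doc = store[message_id]
    migrated = migrate_message(doc)
    if migrated != doc:
        store[message_id] = migrated  # write back so the migration runs only once
    return migrated
```

Documents that are never read again simply stay in the old format, which is harmless as long as every read path goes through the migration step.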
We have messages. What comes next? Consider this question: how do you list a user’s chats? You need some form of tracker object that maps a user ID to a chat ID. Basically something like:
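In its simplest form, and with hypothetical names, such a tracker could be sketched as follows. The list comprehension stands in for what would be a database query on an indexed `user_id`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatTracker:
    """Links one user to one chat; field names are hypothetical."""
    user_id: str
    chat_id: str


def list_chat_ids(trackers: list[ChatTracker], user_id: str) -> list[str]:
    """Return the IDs of all chats the given user takes part in."""
    return [t.chat_id for t in trackers if t.user_id == user_id]
```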
And another question: how does a user know whether another user has read a chat message or not? Should you, for every message, maintain a list of recipients and a read status for each one? If so, how do you handle new users entering a chat at a later time?
A common solution to that problem is to keep track, for every user, chat and chat thread, of the date at which the user last fetched messages from that thread. If a message was created or updated after that date, it hasn’t been read yet.
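The rule can be sketched in a few lines. The function name and the choice to treat a user who has never fetched the thread as having everything unread are assumptions.

```python
from datetime import datetime
from typing import Optional


def is_unread(updated_at: datetime, last_fetched_at: Optional[datetime]) -> bool:
    """A message is unread if it changed after the user's last fetch."""
    if last_fetched_at is None:
        # Assumption: a user who never fetched the thread has read nothing.
        return True
    return updated_at > last_fetched_at
```

One date per user and thread replaces a read flag per user and message, which also answers the question of users who join a chat later: they simply start without a fetch date.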
Next, we could add booleans to the tracker, to toggle whether a given chat thread should be visible or hidden from the user, whether a user should get push notifications when messages are posted to this chat, or whether a user has left or joined a chat.
So our tracker object might end up looking more like:
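A sketch of such a tracker, again with hypothetical field names. The `should_push` helper is one possible way of using the flags, not a description of Gofrendly's logic.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ChatTracker:
    """Per-user state for one chat thread; field names are hypothetical."""
    user_id: str
    chat_id: str
    thread_id: Optional[str] = None             # None: the chat's top-level timeline
    last_fetched_at: Optional[datetime] = None  # None: the user never fetched it
    is_visible: bool = True                     # hidden chats stay out of the chat list
    push_enabled: bool = True                   # user wants push notifications
    has_left: bool = False                      # user left the chat


def should_push(tracker: ChatTracker) -> bool:
    """Decide whether a new message should trigger a push to this user."""
    return tracker.push_enabled and not tracker.has_left
```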
Those two data structures are the bare minimum necessary to build a realtime chat. You’ll probably need to add more though, such as a chat object to keep track of a chat’s name, members and owners for access control restrictions.
Choosing a realtime event infrastructure
At this point, your mobile apps can use the API to create, update and retrieve messages, or list a user’s chats. And we have gained the freedom to refactor our data structures relatively painlessly, without any impact on the mobile apps, provided that the API is respected.
The whole point of a realtime chat, however, is to be realtime. You want your app to be informed of new messages or updates shortly after they happen. One approach is to have apps polling the API endlessly for new updates, but that is very resource consuming. It is much better to instead push changes to the app when they occur.
You can push data updates to the apps via push notifications. And as a matter of fact, you must, since that’s the only reliable way to push data to your app when it is not running in the foreground.
But push notifications are limited in terms of payload size and are not guaranteed to be delivered quickly. When your app is running in the foreground, websockets are a faster and more efficient way to push data updates. A major drawback with websockets, though, is that they require significant backend infrastructure and come with high devops costs, and odds are that this has nothing to do with your core business. Besides, websockets also have limitations, and if you want to ensure the best possible reliability in an event delivery setup, such as handling bad network connectivity, it is going to get complex rather fast. Let it be someone else’s headache and use a third party solution.
Firebase offers realtime event delivery. And so do Pusher and PubNub. We went for Pusher, but they all do the work nicely.
What remains is to define the data structures for our realtime events. A realtime chat needs at least the following events: MESSAGE_ADDED, MESSAGE_UPDATED, MESSAGE_DELETED. If you have message threads, you will need a MESSAGE_THREAD_UPDATED event indicating that something has changed in the thread starting under a given message. And for more chatroom-like functionality, you’ll need events such as USER_JOINED and USER_LEFT.
As an example, here is the definition of a MESSAGE_ADDED event in Gofrendly:
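As an illustration only, an event of this kind could be modelled and serialised as follows. Apart from the MESSAGE_ADDED name, every field here is an assumption and not Gofrendly's actual definition.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MessageAddedEvent:
    """Hypothetical payload announcing a new message in a chat."""
    chat_id: str
    message_id: str
    author_id: str
    created_at: str                   # ISO 8601 timestamp
    parent_id: Optional[str] = None   # set when the message belongs to a thread
    type: str = "MESSAGE_ADDED"
    version: int = 1                  # lets apps recognise the payload format


def to_json(event: MessageAddedEvent) -> str:
    """Serialise the event for delivery over the realtime channel."""
    return json.dumps(asdict(event))
```

Keeping the event small (IDs rather than full message bodies) is one possible design choice: the app can then fetch the message through the API, which stays the single source of truth.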
Versioning events and push payloads
Notice that event definitions are subject to the same problem we had with shared Firebase data structures: they cannot be changed once they are used by released apps.
Same thing with push notifications: the payload of a push notification depends on which version of the app is receiving it.
Meaning your backend must be aware of which app version it is pushing to and adjust the payload structures to match that version. That’s the only way you can handle backward compatibility against older app versions while still allowing yourself to refactor your push notification or event payloads.
In other words, the app must keep the backend up-to-date with information on the user's mobile device (a unique device identifier), which app version it runs and the device's push token. And you preferably want that in place already in the first release of your app.
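A minimal sketch of version-aware payloads, assuming a hypothetical device record and an invented payload change introduced in app version 2.0.0:

```python
from dataclasses import dataclass


@dataclass
class Device:
    """What the app reports to the backend; field names are hypothetical."""
    device_id: str                     # unique device identifier
    app_version: tuple[int, int, int]  # e.g. (1, 4, 2)
    push_token: str


def build_push_payload(device: Device, chat_id: str, message_id: str, text: str) -> dict:
    """Build the payload format that the receiving app version understands."""
    if device.app_version >= (2, 0, 0):
        # Assumed newer format: IDs only, the app fetches the message itself.
        return {"v": 2, "event": "MESSAGE_ADDED", "chat_id": chat_id, "message_id": message_id}
    # Assumed legacy format: older apps expect the text inline.
    return {"v": 1, "chat": chat_id, "text": text}
```

Storing the version as a tuple of integers makes the comparison numeric, so version 1.10.0 correctly sorts after 1.9.0.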
Maintaining backward compatibility with Firebase
To make matters a bit more complicated, we decided to maintain backward compatibility between the new chat API and the older Firebase chat. Someone using the Gofrendly app based on Firebase should be able to chat seamlessly with someone using the newer Gofrendly app based on anything but Firebase.
Tricky, but doable. It boils down to implementing a two way transfer of chat data between new and old implementations:
- We implemented Firebase Cloud Functions to catch new chat messages written in Firebase and call an import endpoint in the new chat API. That endpoint extracts the data from Firebase and emulates a user calling the new API to post that message, thereby updating the Datastore and Elasticsearch databases and sending the appropriate push notifications and realtime events to new API users.
- And we implemented the same Firebase logic as in the iOS and Android apps, but in the backend, to export chat events happening via the new chat API into Firebase, thereby keeping legacy app users in the loop.
And yes, that was a real headache!
There is a lot more to tell about realtime chat design, and this article has really only scratched the surface. If you want to dive deeper into it, here is a tip: download Charles proxy, start an instance of Slack, and sniff the traffic going between your Slack app and its backend. It is both fascinating and humbling.
Thanks for reading this!