
What database does WhatsApp use to store messages?

WhatsApp is one of the most popular messaging apps in the world, with over 2 billion monthly active users as of 2022. Because so many users are constantly sending messages and media and making voice and video calls, the app needs a robust, scalable database to manage all of this data.

Quick Answer

WhatsApp uses a NoSQL database called CouchDB to store all user data, including messages, media, profiles, and more. CouchDB is a document-oriented database that provides high availability, fault tolerance, and scalability – all essential features for WhatsApp’s massive user base.

Overview of WhatsApp

WhatsApp was created in 2009 by Brian Acton and Jan Koum, former employees of Yahoo. The app quickly gained popularity as a fast, simple, and reliable messaging application. Key features of WhatsApp include:

  • End-to-end encrypted messaging and calling
  • Group chats (originally capped at 256 participants; WhatsApp has since raised the limit)
  • Media sharing (photos, videos, documents, etc.)
  • Voice calls
  • Video calls
  • Statuses and stories
  • Cross-platform availability (mobile and desktop)

In 2014, WhatsApp was acquired by Facebook for $19 billion. Today, it remains one of the most used apps worldwide, with over 65 billion messages sent daily.

WhatsApp’s Infrastructure and Scale

To support its massive user base and data requirements, WhatsApp has built out a large-scale technical infrastructure. Some key stats about WhatsApp’s infrastructure include:

  • Processes over 65 billion messages per day
  • Handles over 100 million voice calls per day
  • Used in over 180 countries
  • Supports sending messages to offline users
  • Linked devices can send and receive messages while the phone is offline
  • Supports over 100 localized versions

This vast infrastructure powers WhatsApp’s real-time messaging capabilities and offline support features for over 2 billion users worldwide. The backend system must be highly optimized to store, route, and deliver messages quickly even with this massive volume of data.

Database Requirements for WhatsApp

Based on WhatsApp’s infrastructure needs and scale, its database solution must meet several key requirements:

  • High availability – Serves billions of users globally, so downtime is unacceptable
  • Fault tolerance – Data must be reliably stored and replicated across multiple data centers
  • Scalability – With billions of users and tens of billions of messages per day, the database must scale horizontally
  • Low latency – Messages delivered in real-time require fast reads/writes to the database
  • Flexibility – Variety of data types like messages, media, documents require a flexible schema

These requirements make traditional SQL databases a poor fit for WhatsApp. Relational schemas would be difficult to maintain across varied data types and billions of users, so a NoSQL database is better suited to WhatsApp’s infrastructure needs.

Why WhatsApp Uses CouchDB

To meet its demanding database requirements, WhatsApp chose to build its backend system using CouchDB. Some key reasons why CouchDB meets WhatsApp’s needs:

Document Model

CouchDB stores data in flexible JSON documents rather than rigid relational tables. This allows different kinds of data, such as messages and media metadata, to be stored without defining a schema ahead of time.

Scalability

CouchDB scales horizontally on commodity hardware through sharding and replication. This allows WhatsApp to scale to its billions of users cost-effectively.
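
To make the sharding idea concrete, here is a minimal Python sketch of hash-based shard selection. It is loosely modeled on CouchDB's clustered mode, which hashes document IDs into shard ranges, but it assumes a simplified scheme in which the 32-bit CRC32 space is split into `q` equal ranges. The function name `shard_range` is hypothetical, and a real cluster also maps each range to nodes and replicas.

```python
import zlib


def shard_range(doc_id: str, q: int = 8) -> tuple[int, int]:
    """Return the (start, end) hash range of the shard that owns doc_id.

    Simplified model: the CRC32 space [0, 2**32) is divided into q
    contiguous ranges, and the document lands in the range that
    contains the CRC32 of its ID.
    """
    h = zlib.crc32(doc_id.encode("utf-8"))  # unsigned 32-bit value in Python 3
    size = 2**32 // q
    index = min(h // size, q - 1)  # last range absorbs any remainder
    start = index * size
    end = 2**32 - 1 if index == q - 1 else start + size - 1
    return start, end


# The same ID always maps to the same shard, so any node can route a lookup.
print(shard_range("user_123"), shard_range("message_1"))
```

Because placement depends only on the ID, adding documents spreads load across shards without a central lookup table.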

High Availability

CouchDB utilizes master-master replication, meaning there is no single point of failure. WhatsApp can replicate its data across multiple data centers to ensure 24/7 availability.
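
With several writable masters, two nodes can edit the same document concurrently. CouchDB resolves this deterministically, so every replica picks the same winner without coordination. The sketch below is a simplified Python model of the rule CouchDB documents (non-deleted leaves first, then the highest revision generation, then the revision that sorts higher); the function name and sample revisions are hypothetical.

```python
def winning_revision(leaves: list[dict]) -> dict:
    """Pick the winning leaf revision of a conflicted document.

    Simplified model: non-deleted leaves beat deleted ones, then the
    higher generation number (the N in "N-hash") wins, and remaining
    ties go to the revision hash that sorts higher.
    """

    def sort_key(leaf: dict) -> tuple[bool, int, str]:
        generation, rev_hash = leaf["_rev"].split("-", 1)
        return (not leaf.get("_deleted", False), int(generation), rev_hash)

    return max(leaves, key=sort_key)


# Two data centers edited the same document independently (generation 2 on both).
leaves = [
    {"_rev": "2-7a1f", "text": "edit on node A"},
    {"_rev": "2-c03e", "text": "edit on node B"},
]
print(winning_revision(leaves)["text"])  # edit on node B
```

The losing revision is not discarded; CouchDB keeps it and lists it in the document's `_conflicts` field so an application can merge it later.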

Low Latency

With its B+ tree indexing, CouchDB provides fast lookups and queries, enabling WhatsApp’s real-time messaging capabilities.

Offline Support

CouchDB allows offline, asynchronous data replication. This allows WhatsApp to queue messages when users are offline and sync them when they reconnect.

Flexible Querying

In addition to lookups by document ID, CouchDB supports views for efficient querying and aggregation. This helps WhatsApp analyze data like message types, trends, etc.
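
A CouchDB view is built from a map function, which emits key/value pairs per document, and an optional reduce function, which aggregates them. In CouchDB the map function is written in JavaScript and reduces such as `_count` or `_sum` are built in; the Python sketch below only mimics that flow in memory to count messages by type. The `type` field is a hypothetical addition to the message documents.

```python
from collections import Counter
from typing import Iterable, Iterator


def map_message(doc: dict) -> Iterator[tuple[str, int]]:
    """Map step: emit (message type, 1) for each message document."""
    if "sender_id" in doc:  # skip non-message documents such as profiles
        yield doc.get("type", "text"), 1


def reduce_by_key(docs: Iterable[dict]) -> dict[str, int]:
    """Grouped reduce step: sum the emitted values for each key."""
    totals: Counter = Counter()
    for doc in docs:
        for key, value in map_message(doc):
            totals[key] += value
    return dict(totals)


docs = [
    {"_id": "message_1", "sender_id": "user_123", "text": "Hi Jane!"},
    {"_id": "message_2", "sender_id": "user_456", "type": "image"},
    {"_id": "user_123", "name": "John Doe"},
]
print(reduce_by_key(docs))  # {'text': 1, 'image': 1}
```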

JSON Friendly

With JSON as its fundamental data format, CouchDB integrates seamlessly with web and mobile apps.

How WhatsApp Uses CouchDB

Specifically, WhatsApp leverages CouchDB in the following ways:

User Accounts and Profiles

User account information such as phone numbers, profile photos, and statuses is stored as JSON documents in CouchDB, so profiles can be fetched quickly when users open WhatsApp.

Queuing Offline Messages

When recipients are offline, messages are queued in CouchDB for later delivery. Once the user comes back online, the queued messages are delivered.
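
The store-and-forward behavior described above can be sketched in a few lines of Python. This is a hypothetical in-memory model, not WhatsApp's code: `OfflineQueue`, its methods, and the `delivered` list (a stand-in for pushing to a device) are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class OfflineQueue:
    """Hypothetical store-and-forward queue: hold messages until the recipient connects."""

    def __init__(self) -> None:
        self.online: set[str] = set()
        self.pending: defaultdict = defaultdict(deque)  # recipient_id -> queued messages
        self.delivered: list[tuple[str, dict]] = []  # stand-in for pushing to a device

    def send(self, message: dict) -> None:
        recipient = message["recipient_id"]
        if recipient in self.online:
            self.delivered.append((recipient, message))
        else:
            self.pending[recipient].append(message)  # hold until reconnect

    def connect(self, user_id: str) -> list[dict]:
        """Mark user_id online and flush its queued messages in arrival order."""
        self.online.add(user_id)
        queued = list(self.pending.pop(user_id, deque()))
        self.delivered.extend((user_id, m) for m in queued)
        return queued


queue = OfflineQueue()
queue.send({"_id": "message_1", "sender_id": "user_123", "recipient_id": "user_456", "text": "Hi Jane!"})
print(len(queue.delivered))  # 0: user_456 is offline, so the message waits
print([m["_id"] for m in queue.connect("user_456")])  # ['message_1']
```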

Indexing Messages

CouchDB views index messages by fields such as timestamp, sender ID, and recipient ID, enabling fast filtering and search over stored messages.
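
A view index is a sorted set of keys, so fetching one conversation becomes a range scan. The Python sketch below assumes a hypothetical compound key of both participants (sorted, so either direction matches) followed by the timestamp, and uses `bisect` to imitate a view queried with `startkey` and `endkey`. The `"\ufff0"` sentinel is a common CouchDB idiom for an open-ended upper bound on strings.

```python
from bisect import bisect_left, bisect_right

HIGH = "\ufff0"  # sorts after ordinary strings; used as an open upper bound


def conversation_key(doc: dict) -> list:
    """Compound key: both participants (sorted) followed by the timestamp."""
    a, b = sorted([doc["sender_id"], doc["recipient_id"]])
    return [a, b, doc["timestamp"]]


def build_index(docs: list[dict]) -> list[tuple[list, str]]:
    """Rows of (key, doc_id), kept sorted by key like a view's B+ tree."""
    return sorted((conversation_key(d), d["_id"]) for d in docs)


def query(index: list[tuple[list, str]], startkey: list, endkey: list) -> list[str]:
    """Return doc IDs whose keys fall in [startkey, endkey], in key order."""
    keys = [key for key, _ in index]
    lo, hi = bisect_left(keys, startkey), bisect_right(keys, endkey)
    return [doc_id for _, doc_id in index[lo:hi]]


messages = [
    {"_id": "message_1", "sender_id": "user_123", "recipient_id": "user_456", "timestamp": "2017-04-15T14:30:00"},
    {"_id": "message_2", "sender_id": "user_456", "recipient_id": "user_123", "timestamp": "2017-04-15T14:32:00"},
    {"_id": "message_3", "sender_id": "user_123", "recipient_id": "user_789", "timestamp": "2017-04-15T14:31:00"},
]
index = build_index(messages)
print(query(index, ["user_123", "user_456", ""], ["user_123", "user_456", HIGH]))  # ['message_1', 'message_2']
```

ISO 8601 timestamps in a single format sort lexicographically in time order, which is why they work as the last key component.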

Storing Media

Shared photos, videos, and files are stored as attachments in CouchDB documents. These attachments are replicated across servers for redundancy.
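
CouchDB accepts small attachments inline in a document's `_attachments` field, with each entry giving a `content_type` and base64-encoded `data`. The Python sketch below builds such a document; the helper name, document ID, and file name are hypothetical.

```python
import base64


def message_with_attachment(doc_id: str, filename: str, content_type: str, payload: bytes) -> dict:
    """Build a document carrying an inline attachment in CouchDB's _attachments format."""
    return {
        "_id": doc_id,
        "_attachments": {
            filename: {
                "content_type": content_type,
                "data": base64.b64encode(payload).decode("ascii"),  # inline data must be base64
            }
        },
    }


doc = message_with_attachment("message_3", "image_123.jpg", "image/jpeg", b"\xff\xd8\xff")
print(doc["_attachments"]["image_123.jpg"]["data"])  # /9j/
```

Larger files are usually better uploaded separately with `PUT /{db}/{docid}/{attachment}` (passing the document's current revision), which avoids the base64 overhead.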

Synchronizing Linked Devices

WhatsApp can synchronize message delivery across a user’s linked laptop, desktop, tablet, and phone by using CouchDB’s multi-master replication.

Analyzing Usage Data

CouchDB’s views and filters allow WhatsApp to analyze usage data like message types, frequency, and trends. This analysis helps improve the product experience.

Backing Up Conversations

CouchDB’s built-in replication enables automatic backups of conversations. Users can restore conversations by recovering backups if needed.

By leveraging CouchDB’s scalability, availability, and flexibility, WhatsApp built its messaging platform to serve billions of users globally.

CouchDB Architecture

CouchDB has a decentralized, fault-tolerant architecture optimized for internet-scale applications like WhatsApp. Some key architectural components include:

Documents

The basic unit of data in CouchDB is a document expressed in JSON (JavaScript Object Notation) format. Documents contain data in field-value pairs. WhatsApp stores each user, message, media file, etc. as a document.

Databases

Documents are organized into databases, analogous to tables in a relational database. WhatsApp likely uses separate databases for users, messages, media, etc.

Views

Indexes called views allow querying and aggregation of documents based on fields, similar to SQL queries. WhatsApp uses views to filter messages in conversations, find profiles, etc.

Replication

CouchDB replicates data between nodes in a cluster for redundancy and scaling. WhatsApp leverages replication to synchronize data across servers.

RESTful API

CouchDB exposes its functionality through an HTTP/JSON REST API, enabling CRUD operations on documents from any language. WhatsApp uses the API to perform functions like creating users and sending messages.
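
As an illustration of the REST API, the Python sketch below prepares (but does not send) requests for two of CouchDB's document endpoints: `PUT /{db}/{docid}` and `GET /{db}/{docid}`. The server address and the `users` database name are assumptions; 5984 is CouchDB's default port.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5984"  # assumed host; 5984 is CouchDB's default port


def doc_url(db: str, doc_id: str) -> str:
    # Percent-encode path segments so IDs containing "/" or spaces stay intact.
    return f"{BASE_URL}/{urllib.parse.quote(db, safe='')}/{urllib.parse.quote(doc_id, safe='')}"


def put_document(db: str, doc: dict) -> urllib.request.Request:
    """Prepare PUT /{db}/{docid}: creates a document, or updates it if doc has the current _rev."""
    return urllib.request.Request(
        doc_url(db, doc["_id"]),
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def get_document(db: str, doc_id: str) -> urllib.request.Request:
    """Prepare GET /{db}/{docid}, which returns the document's current revision."""
    return urllib.request.Request(doc_url(db, doc_id), method="GET")


req = put_document("users", {"_id": "user_123", "name": "John Doe"})
print(req.get_method(), req.full_url)  # PUT http://localhost:5984/users/user_123
# urllib.request.urlopen(req) would send it to a running CouchDB server (with credentials).
```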

Example CouchDB Database Schema

Here is an example of how WhatsApp may store data in CouchDB databases and documents:

Users Database

Document ID: user_123
{
  "_id": "user_123",
  "name": "John Doe",
  "phone_number": "(123) 456-7890",
  "profile_image": "image_123.jpg"
}

Document ID: user_456
{
  "_id": "user_456",
  "name": "Jane Smith",
  "phone_number": "(098) 765-4321",
  "profile_image": "image_456.jpg"
}

Messages Database

Document ID: message_1
{
  "_id": "message_1",
  "sender_id": "user_123",
  "recipient_id": "user_456",
  "timestamp": "2017-04-15T14:30:00",
  "text": "Hi Jane!"
}

Document ID: message_2
{
  "_id": "message_2",
  "sender_id": "user_456",
  "recipient_id": "user_123",
  "timestamp": "2017-04-15T14:32:00",
  "text": "Hey John, how's it going?"
}

This shows how user profiles and messages can be stored as JSON documents, with IDs for retrieval. Views could be added to index on fields like timestamp for querying.
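
A timestamp view like the one mentioned above would live in a design document, where CouchDB stores the map function as a JavaScript string. The Python sketch below defines a hypothetical `_design/messages` document with a `by_timestamp` view and builds the URL for querying a time window. View keys in the query string must be JSON-encoded, hence the `json.dumps` calls.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical design document; CouchDB runs the "map" string as JavaScript.
DESIGN_DOC = {
    "_id": "_design/messages",
    "views": {
        "by_timestamp": {
            "map": "function (doc) { if (doc.timestamp) { emit(doc.timestamp, null); } }"
        }
    },
}


def view_url(base: str, db: str, start: str, end: str) -> str:
    """URL that queries by_timestamp for keys between start and end (inclusive)."""
    params = urlencode({"startkey": json.dumps(start), "endkey": json.dumps(end)})
    return f"{base}/{quote(db)}/_design/messages/_view/by_timestamp?{params}"


print(view_url("http://localhost:5984", "messages", "2017-04-15T00:00:00", "2017-04-15T23:59:59"))
```

The design document itself would be saved with an ordinary `PUT` to `/messages/_design/messages`.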

Optimizing CouchDB for WhatsApp’s Needs

While CouchDB works well out of the box for many use cases, several of its features, along with some customizations, help fit it to WhatsApp’s specific needs:

Master-Master Replication

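CouchDB supports master-master (multi-master) replication natively rather than a master-slave model: any node can accept writes, and replication merges changes bidirectionally between nodes, improving write availability and redundancy. Conflicting edits are resolved deterministically, and the losing revisions are kept for later inspection.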
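
In CouchDB, bidirectional replication between two nodes is simply two one-way replications, one in each direction. The Python sketch below builds the JSON bodies for such a pair; each could be POSTed to `/_replicate` or stored as a document in the `_replicator` database. The node URLs and the `messages` database name are hypothetical.

```python
def bidirectional_replication(node_a: str, node_b: str, db: str) -> list[dict]:
    """Two continuous one-way replications (A to B, B to A) make db multi-master."""
    a_db, b_db = f"{node_a}/{db}", f"{node_b}/{db}"
    return [
        {"source": a_db, "target": b_db, "continuous": True},
        {"source": b_db, "target": a_db, "continuous": True},
    ]


for job in bidirectional_replication("http://dc1.example:5984", "http://dc2.example:5984", "messages"):
    print(job["source"], "->", job["target"])
```

Setting `continuous` keeps each job running and streaming new changes instead of stopping after one pass.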

Incremental Replication

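CouchDB replication is incremental by design: each replication checkpoints its position in the source’s changes feed, so only documents changed since the last checkpoint are transferred. This keeps syncs between nodes and devices cheap over low-bandwidth connections.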
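
The checkpointing idea can be shown with a toy Python model: the source assigns an increasing sequence number to every write, and the replicator copies only changes after its last checkpoint. `Source`, `replicate`, and the in-memory log are hypothetical simplifications; CouchDB's real `_changes` feed also collapses repeated edits to a document into its latest change.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Toy database with a sequence-numbered changes log."""

    docs: dict[str, dict] = field(default_factory=dict)
    log: list[tuple[int, str]] = field(default_factory=list)  # (seq, doc_id)
    seq: int = 0

    def write(self, doc: dict) -> None:
        self.seq += 1
        self.docs[doc["_id"]] = doc
        self.log.append((self.seq, doc["_id"]))

    def changes(self, since: int) -> list[tuple[int, str]]:
        return [(s, d) for s, d in self.log if s > since]


def replicate(source: Source, target: dict[str, dict], checkpoint: int) -> tuple[int, int]:
    """Copy only documents changed after checkpoint; return (new checkpoint, docs copied)."""
    copied = 0
    for seq, doc_id in source.changes(checkpoint):
        target[doc_id] = source.docs[doc_id]
        checkpoint = seq
        copied += 1
    return checkpoint, copied


src, device = Source(), {}
src.write({"_id": "message_1", "text": "Hi Jane!"})
src.write({"_id": "message_2", "text": "Hey John"})
cp, n = replicate(src, device, 0)
src.write({"_id": "message_3", "text": "New message"})
print(replicate(src, device, cp))  # (3, 1): only the new document is sent
```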

Message Expiry

To limit storage growth, undelivered messages that are still queued after 30 days are deleted. This prevents unbounded storage bloat over time.
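
CouchDB does not expire documents on its own, so a retention rule like this would need a periodic job. The Python sketch below shows the core check under stated assumptions: queued messages carry ISO 8601 timestamps with a UTC offset, and the function name `split_expired` is hypothetical. The caller would then delete the expired documents.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=30)


def split_expired(
    pending: list[dict], now: Optional[datetime] = None
) -> tuple[list[dict], list[dict]]:
    """Split queued messages into (kept, expired) using a 30-day retention window.

    Timestamps must be ISO 8601 strings with a UTC offset, e.g. "+00:00".
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION
    kept: list[dict] = []
    expired: list[dict] = []
    for msg in pending:
        queued_at = datetime.fromisoformat(msg["timestamp"])
        (expired if queued_at < cutoff else kept).append(msg)
    return kept, expired


queued = [
    {"_id": "message_1", "timestamp": "2017-04-15T14:30:00+00:00"},
    {"_id": "message_2", "timestamp": "2017-05-15T09:00:00+00:00"},
]
kept, expired = split_expired(queued, datetime(2017, 5, 20, tzinfo=timezone.utc))
print([m["_id"] for m in expired])  # ['message_1']
```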

Attachment Indexing

File attachments like images and videos are indexed separately from messages to optimize storage and retrieval.

Local Database Caching

Frequently accessed data is cached locally on users’ devices for low-latency reads and writes.
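
One common way to keep frequently accessed data close at hand is a least-recently-used (LRU) cache in front of a slower lookup. The Python sketch below is a generic LRU cache, not WhatsApp's client code; the class name, the `fetch` callback, and the `misses` counter are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class LRUCache:
    """Small least-recently-used cache in front of a slower lookup."""

    def __init__(self, capacity: int, fetch: Callable[[str], dict]) -> None:
        self.capacity = capacity
        self.fetch = fetch  # called on a cache miss, e.g. a database read
        self.items = OrderedDict()
        self.misses = 0

    def get(self, key: str) -> dict:
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return self.items[key]
        self.misses += 1
        value = self.fetch(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry
        return value


cache = LRUCache(2, lambda doc_id: {"_id": doc_id})
for doc_id in ["user_123", "user_456", "user_123", "user_789"]:
    cache.get(doc_id)
print(list(cache.items), cache.misses)  # ['user_123', 'user_789'] 3
```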

End-to-End Encryption

Messages are end-to-end encrypted using the Signal Protocol, so only the sender’s and recipient’s devices hold the keys; servers store and relay ciphertext they cannot read.

These optimizations help WhatsApp maximize the performance, efficiency, and reliability of CouchDB while storing and managing petabytes of messaging data.

Conclusion

WhatsApp chose CouchDB as the ideal database for powering its messaging platform given the app’s scalability and availability requirements. CouchDB’s flexible document model, tunable eventual consistency, and distributed synchronization capabilities make it a great fit for WhatsApp’s backend infrastructure.

While other distributed NoSQL databases such as Cassandra or MongoDB could also work, CouchDB’s eventual consistency model, built-in replication, and JSON document model make it simpler for WhatsApp to build a reliable, global messaging platform. Features like master-master replication, together with optimizations such as attachment indexing and local caching, help WhatsApp get the most out of CouchDB.

Overall, CouchDB’s scalability, high availability, and flexibility to evolve with emerging needs allowed WhatsApp to grow exponentially to over 2 billion users, cementing its position as one of the world’s most popular messaging apps.