How to get WhatsApp messages in Python?

WhatsApp has become one of the most popular messaging apps with over 2 billion users worldwide. With its widespread adoption, accessing WhatsApp data such as chat messages can be useful for many applications. For example, businesses may want to analyze WhatsApp conversations to understand customer feedback. Developers may want to build bots that can interact with users over WhatsApp. Researchers may want to study WhatsApp usage patterns and trends.

In this article, we will discuss different methods to access and extract WhatsApp chat messages using Python. We will go through the requirements, pros and cons of each method, and provide sample code snippets to get started.

Overview of WhatsApp Message Structure

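Before we dive into the various methods, it is useful to understand how WhatsApp stores and structures chat messages. WhatsApp keeps chat data in an SQLite database on the user's device. On Android, the live database is typically located in the app's private storage at /data/data/com.whatsapp/databases/msgstore.db, which is readable only with root access, while encrypted backup copies (files named like msgstore.db.crypt14) are written to the WhatsApp/Databases folder on shared storage, for example /sdcard/WhatsApp/Databases/. On iOS, the chat database lives inside the app's shared container under /var/mobile/Containers/Shared/AppGroup/{AppGroupId}/ and is commonly reported to be named ChatStorage.sqlite rather than msgstore.db. Exact paths and file names vary between WhatsApp and OS versions.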

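The msgstore.db file contains various tables such as messages, chats and contacts. In the older Android schema, the messages table stores all chat messages and has columns such as key_remote_jid (the other party in the conversation), key_from_me (1 if sent by you, 0 if sent by the remote user), data (the message text), timestamp and others. Newer versions of the app have reorganized the schema, so check the table and column names in your own file before relying on them.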
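
Because the schema is not officially documented and differs between app versions, it helps to inspect a database copy before writing queries. The following is a minimal sketch, assuming you already have a readable (decrypted) copy of the database at a path you supply. The helper name describe_database is illustrative and not part of any library.

```python
import sqlite3

def describe_database(db_path):
    """Return {table_name: [column names]} for every table in the file."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        schema = {}
        for table in tables:
            # Table names cannot be bound as parameters, so quote them instead.
            quoted = '"' + table.replace('"', '""') + '"'
            # PRAGMA table_info yields one row per column; index 1 is the name.
            schema[table] = [col[1] for col in
                             conn.execute(f"PRAGMA table_info({quoted})")]
        return schema
    finally:
        conn.close()
```

Calling describe_database('/path/to/msgstore.db') returns a dictionary you can print to see which tables and columns your particular file actually contains.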

So in order to access WhatsApp messages, we need to first get access to this database file and then extract the data we need from the relevant tables.

Requirements for Accessing WhatsApp Messages

Here are the key requirements for accessing WhatsApp chat messages programmatically:

  • Physical access to the device where WhatsApp is installed
  • Root access on Android devices to read WhatsApp's private app data (backup copies under /sdcard/… are readable without root)
  • Jailbroken iOS device or tools like AnyTrans to access filesystem
  • SQLite browser or library to open and query the msgstore.db file
  • The decryption key, or an existing tool that can decrypt the encrypted data
  • Python's sqlite3 module, plus libraries such as cryptography and python-whatsapp, to work with the data

Methods to Access WhatsApp Messages

Based on the above requirements, here are some methods to get WhatsApp chat messages in Python:

1. Using WhatsApp Backup Files

One approach is to leverage the chat backup files periodically created by WhatsApp. On Android, these local backups are usually written as encrypted files with a .cryptNN suffix (for example msgstore.db.crypt14), so they have to be decrypted before SQLite can read them; see Handling Encrypted WhatsApp Message Data below. The key steps are:

  1. Enable WhatsApp chat backups in Settings
  2. Copy the latest backup files from device storage: the WhatsApp/Databases folder on Android, or the iOS file path mentioned earlier
  3. Decrypt the backup if it is encrypted
  4. Open the resulting database from Python using the sqlite3 module
  5. Query the messages table to fetch the required message data

Pros:

  • Easy to access backup files compared to on-device database
  • A decrypted backup is an ordinary SQLite file that sqlite3 can open directly

Cons:

  • Requires user to manually create a WhatsApp backup periodically
  • Only provides data until last backup time

Example code:

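import sqlite3

# Path to a readable (decrypted) copy of the backup database
backup_db = '/path/to/msgstore-YYYY-MM-DD.db'

conn = sqlite3.connect(backup_db)
cursor = conn.cursor()

# Fetch every row from the messages table
cursor.execute('SELECT * FROM messages')
messages = cursor.fetchall()
conn.close()

for msg in messages:
    print(msg)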
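
Printing whole rows is rarely what you want in practice. The sketch below selects only the columns described earlier for a single chat and converts the timestamp into a datetime. It assumes the legacy messages table layout (key_remote_jid, key_from_me, data, timestamp) and that timestamp holds milliseconds since the Unix epoch; both are assumptions to verify against your own file. fetch_chat is a hypothetical helper name.

```python
import sqlite3
from datetime import datetime, timezone

def fetch_chat(db_path, remote_jid):
    """Return the messages of one chat, oldest first, as plain dicts."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    try:
        rows = conn.execute(
            "SELECT key_from_me, data, timestamp FROM messages "
            "WHERE key_remote_jid = ? ORDER BY timestamp",
            (remote_jid,),  # bound parameter, never string formatting
        ).fetchall()
    finally:
        conn.close()
    return [
        {
            "from_me": bool(row["key_from_me"]),
            "text": row["data"],
            # Assumption: the stored value is in milliseconds
            "sent_at": datetime.fromtimestamp(row["timestamp"] / 1000,
                                              tz=timezone.utc),
        }
        for row in rows
    ]
```

Using a bound parameter for the chat identifier keeps the query safe even if the identifier comes from user input.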

2. Using Tools like WhatsApp Viewer

There are various third-party tools available that can read WhatsApp data from a device or from its database files and export it in a convenient format. For example:

  • WhatsApp Viewer – Desktop application to view and export WhatsApp messages from Android database files.
  • iOS Forensic Toolkit – Tool for forensic extraction of data from iOS devices including WhatsApp messages.

The key steps are:

  1. Use appropriate tool to access on-device WhatsApp msgstore.db file
  2. Export messages in CSV, JSON etc. formats
  3. Read exported data in Python and process as required
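
Step 3 can be sketched with the standard library alone. The exact layout of an export depends on the tool, so the sketch assumes a CSV file with a header row and a JSON file whose top level is a list of message objects. The function names are illustrative.

```python
import csv
import json

def load_csv_export(path):
    """Read a CSV export into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))

def load_json_export(path):
    """Read a JSON export; assumes the top level is a list of messages."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Both functions return a list of dictionaries, so later processing code does not need to care which export format was used.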

Pros:

  • Convenient export formats like CSV, JSON
  • Access to historical messages beyond backups

Cons:

  • Requires paid license for many forensic tools
  • Extra overhead of exporting and reading data

3. Direct SQLite Access on Rooted Android Devices

For rooted Android devices, we can copy the live msgstore.db database file off the device and query it with the sqlite3 module in Python. The key steps are:

  1. Root the Android phone
  2. Navigate to WhatsApp's private database folder, typically /data/data/com.whatsapp/databases/
  3. Copy the msgstore.db file to the system where the Python script will run
  4. Use the sqlite3 module to connect to and query the database file
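
When working with a copied database, it is a good idea to open it read-only so a stray statement cannot modify the evidence. The sketch below uses sqlite3's URI mode for that and then counts messages per chat. It assumes a messages table with a key_remote_jid column as described earlier. open_read_only and count_by_chat are illustrative names.

```python
import sqlite3
from pathlib import Path

def open_read_only(db_path):
    """Open an SQLite file in read-only mode using a file: URI."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)

def count_by_chat(conn):
    """Return {chat identifier: number of messages}."""
    query = ("SELECT key_remote_jid, COUNT(*) FROM messages "
             "GROUP BY key_remote_jid")
    return dict(conn.execute(query))
```

Any attempt to write through a connection opened this way raises sqlite3.OperationalError.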

Pros:

  • Full programmatic access without extra tools
  • Access to historical messages beyond backups

Cons:

  • Requires rooting the Android device
  • Need to handle an undocumented schema that changes between WhatsApp versions

4. Using WhatsApp Web/Desktop APIs

WhatsApp provides Web and Desktop apps that sync messages from mobile devices. These clients talk to WhatsApp through internal interfaces that are not publicly documented, and wrapper libraries or browser automation can use the same channel to read messages. Some options are:

  • python-whatsapp – Python library to connect to WhatsApp Web
  • Selenium browser automation to programmatically interact with WhatsApp Web

Key steps are:

  1. Use WhatsApp Web client or API wrapper library
  2. Programmatically interact and extract messages
  3. Process messages data in Python

Pros:

  • No need to directly access database files
  • Continuously synced data

Cons:

  • Partial message history available
  • API changes can break solutions

Handling Encrypted WhatsApp Message Data

As noted earlier, WhatsApp data copied from a device is often encrypted. Backup files are encrypted as a whole: a msgstore.db.cryptNN file is an encrypted container around the SQLite database, and the NN suffix indicates the format version. So we need a way to decrypt the file before we can query and process the messages in it.

Some options are:

  • Use existing tools like WhatsApp Key/DB Browser that have built-in decryption capabilities
  • Manually implement decryption based on algorithm name – can use libraries like Cryptography in Python
  • Extract decryption keys from device using forensic tools like UFED Physical Analyzer

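Once the data is decrypted, we can easily work with it in Python. Here is sample Python code for the decryption step using the cryptography library. It assumes AES in GCM mode and that the key, IV and authentication tag have already been extracted; the mode and parameters must match the format of the file being decrypted:

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def decrypt_data(key, iv, tag, encrypted_data):
    # AES-GCM is an assumption here, not a confirmed detail of every file format
    cipher = Cipher(algorithms.AES(key), modes.GCM(iv, tag))
    decryptor = cipher.decryptor()
    # finalize() raises InvalidTag if the key, IV, tag or data do not match
    return decryptor.update(encrypted_data) + decryptor.finalize()

# key, iv, tag and encrypted_data must be supplied by the caller:
# decrypted_data = decrypt_data(key, iv, tag, encrypted_data)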
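
After decryption, it is worth confirming that the output really is an SQLite database before handing it to sqlite3. Every SQLite file begins with the 16-byte string "SQLite format 3" followed by a zero byte. Community write-ups of the backup formats report that the decrypted payload is zlib-compressed; the sketch below treats that as an assumption and accepts either form. The function name is illustrative.

```python
import zlib

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of any SQLite file

def unpack_decrypted_payload(payload):
    """Return raw SQLite bytes from a decrypted payload, inflating if needed."""
    if payload.startswith(SQLITE_MAGIC):
        return payload
    try:
        # Assumption: the payload may be zlib-compressed
        inflated = zlib.decompress(payload)
    except zlib.error:
        raise ValueError("payload is neither SQLite nor zlib-compressed data")
    if not inflated.startswith(SQLITE_MAGIC):
        raise ValueError("decompressed payload is not an SQLite database")
    return inflated
```

The returned bytes can be written to a file and opened with sqlite3.connect as in the earlier examples. A ValueError at this stage usually means the key or the assumed file layout is wrong.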

Storing and Processing WhatsApp Messages in Python

Once we have access to the decrypted WhatsApp messages, we can store and process them in Python for further analysis and use cases. Here are some options:

Storing Messages

  • SQLite database – retain original structure
  • CSV/JSON files – easy portability
  • Save messages as objects in a Python list
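
For the CSV/JSON option, a minimal sketch using only the standard library is shown here. It assumes each message is a dictionary with sender, message and timestamp keys holding strings; that structure and the function names are illustrative choices, not a fixed format.

```python
import csv
import json

FIELDS = ["sender", "message", "timestamp"]  # assumed message keys

def save_csv(messages, path):
    """Write messages to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(messages)

def save_json(messages, path):
    """Write messages to a JSON file, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(messages, handle, ensure_ascii=False, indent=2)
```

Passing ensure_ascii=False keeps emoji and non-Latin scripts, which are common in chat data, human-readable in the output file.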

Processing Messages

  • Data analysis – sender frequency, message length, time gaps
  • Apply NLP for entity extraction, sentiment analysis etc.
  • Building chatbots to automate responses
  • Creating visualizations and dashboards

Here is some sample code to process messages:

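# Store messages in SQLite db
import sqlite3
from collections import defaultdict

# Placeholder rows for illustration: (id, sender, message, timestamp)
decrypted_messages = [
    (1, 'alice', 'Hello', '2023-01-01 10:00:00'),
    (2, 'bob', 'Hi there', '2023-01-01 10:01:00'),
    (3, 'alice', 'How are you?', '2023-01-01 10:02:00'),
]

conn = sqlite3.connect('whatsapp_messages.db')
c = conn.cursor()
# IF NOT EXISTS lets the script run more than once
c.execute('''CREATE TABLE IF NOT EXISTS messages
            (id integer primary key, sender text, message text,
              timestamp text)''')

# Insert messages, skipping ids that are already stored
c.executemany('INSERT OR IGNORE INTO messages VALUES (?,?,?,?)', decrypted_messages)
conn.commit()

# Analysis example: count messages per sender
user_msg_count = defaultdict(int)

for (sender,) in c.execute('SELECT sender FROM messages'):
    user_msg_count[sender] += 1

conn.close()
print(dict(user_msg_count))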

This provides some frequency analysis of messages per user.
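
The other analyses listed above, message length and time gaps, can be sketched in a similar way. The example assumes each message is a dictionary with a message string and a timestamp that is already a datetime object; the summarize helper is a hypothetical name.

```python
from datetime import datetime
from statistics import mean

def summarize(messages):
    """Return count, mean message length and mean gap between messages."""
    ordered = sorted(messages, key=lambda m: m["timestamp"])
    lengths = [len(m["message"]) for m in ordered]
    # Seconds elapsed between each message and the one before it
    gaps = [
        (later["timestamp"] - earlier["timestamp"]).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return {
        "count": len(ordered),
        "mean_length": mean(lengths) if lengths else 0,
        "mean_gap_seconds": mean(gaps) if gaps else 0,
    }
```

Sorting first means the input does not have to arrive in chronological order, which is useful when messages are merged from several sources.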

Conclusion

In this article, we discussed different techniques to access WhatsApp chat messages programmatically using Python. The core steps are accessing the encrypted WhatsApp msgstore database from device backups or directly, decrypting message data, and processing messages for further analysis and applications.

Each approach has its own pros and cons. For most use cases, WhatsApp backups provide a simple way to get started once they are decrypted. On rooted Android devices, directly accessing msgstore.db allows fetching the full message history. For continuously syncing the latest messages, using WhatsApp Web through a wrapper library or browser automation is an option.

There are other advanced techniques like using a MiTM proxy to intercept WhatsApp traffic but they require setting up more complex environments. Also, while extracting messages is possible, consider the privacy aspects when processing personal conversations.

Overall, WhatsApp chat data can offer useful insights for developers and researchers. Given how widely WhatsApp is used, being able to programmatically access and analyze messages opens up many possibilities for building applications. Python, with its standard sqlite3 module and broad library ecosystem, is a good language to get started with.