tl;dr
We’ve been solely focused on building the most open and secure messaging protocol in the world. We now have a secure, decentralized messaging network based on the MLS standard, which is also quantum-resistant. Our focus now is on scaling secure groups and looking ahead to how we will support larger groups in the future.
Large groups come in many different sizes. Below, we will discuss our plans for scaling secure groups on XMTP today and lay out our different paths for very large groups in the future. Please join the conversation and help guide the future of large groups on XMTP.
- Secure messaging that scales to 1000-2000 members, and our plans for improving performance here
- Different paths for scaling to much larger group experiences
/ tl;dr
What’s the problem?
Today, XMTP enforces a group size limit of 250 members per group. Developers are asking for more. We have a well-defined path to scale XMTP with MLS to 1000 or more members per group, offering the same privacy and security guarantees we offer today, and we are doing the work to get there.
Here’s the plan for scaling MLS secure groups on XMTP to ensure our foundation delivers on our security, privacy, and decentralization promises.
- Cut down the bandwidth required to invite a large number of people to a group at once. [Estimated new group limit: 400]
- Shrink the amount of metadata required to modify the group using AppDataUpdate. [Estimated new group limit: 600]
- Make our SDK more efficient when processing changes to a group. [Estimated new group limit: 800]
- Allow all of your devices to appear as a single member in a group using something like Virtual Clients. [Estimated new group limit: 2000-3000]
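Virtual Clients help because MLS gives every device its own leaf in the group’s ratchet tree, so the tree grows with devices rather than with people. The minimal sketch below uses a deliberately simple model: one leaf per device without Virtual Clients, one leaf per member with them. The function name `mls_leaves` is hypothetical, and the average of 5 devices per member is an illustrative assumption taken from the 1000-2000 members (5,000-10,000 devices) estimate later in this post.

```python
def mls_leaves(members: int, devices_per_member: int, virtual_clients: bool) -> int:
    """Leaves in the group's MLS tree under a simple model (hypothetical helper).

    Without Virtual Clients every device is its own leaf; with them,
    all of a member's devices share a single leaf.
    """
    return members if virtual_clients else members * devices_per_member


# Assumed average of 5 devices per member (illustrative only).
for members in (1_000, 2_000):
    print(members, mls_leaves(members, 5, False), mls_leaves(members, 5, True))
```

Under this model, collapsing devices into one leaf per member shrinks the tree by the devices-per-member factor, which is why this step has the largest estimated jump.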
This is our path for scaling XMTP secure groups built on the MLS standard.
XMTP is built on the MLS standard and offers a pretty maximalist set of features when it comes to security, privacy, and trust-minimization. It offers:
- Message encryption. Node operators and observers on the network cannot read any messages from your conversation.
- Forward secrecy. If a device is compromised, the attacker cannot go back and decrypt past conversations.
- Post-compromise security. If a device is compromised, an attacker cannot keep reading messages indefinitely into the future.
- Message authenticity. Everyone in a group chat can cryptographically verify that a message was sent by a given member.
- Authentication of group membership. Everyone in the group will see the same list of group members, and this guarantee is cryptographically provable.
- Authentication of group metadata. Everyone in the group will see the same metadata for a conversation, and this guarantee is cryptographically provable.
- Trustless identity. Anyone in a group can cryptographically verify that a group member is associated with their public identity (wallet address, passkey, etc.).
- Quantum-resistant security. Your messages are protected against attacks from quantum computers that haven’t been invented yet.
These security properties are critical for a protocol you trust with your most sensitive and important conversations. We don’t want to compromise on these properties for the sake of scaling group size.
Still, there are many use cases for messaging where you don’t want the group size to be constrained by technology at all. Brands, artists, and online communities want to be able to connect with their audience through messaging without a hard ceiling.
That leaves us with the question: what if we designed a new protocol specifically for very large groups?
I want to share two proposals for potential paths to scaling XMTP into the tens of thousands of members using the same decentralized identity foundation.
Scaling beyond ~1-2k members
Getting to the tens of thousands of members is mostly a discussion about which corners we are willing to cut. Telegram groups and WhatsApp Channels scale to hundreds of thousands of users, but they got there by giving up the privacy and security properties above (even encryption!).
I don’t think we need to go as far as Telegram and WhatsApp, but we are going to have to make compromises to make something scale into the tens or hundreds of thousands of members. The good news is that privacy expectations should be different for groups of this size. This isn’t a conversation with a few friends. These are groups the size of a sports stadium.
Side note: One overlooked reason we want to keep encryption is that, in an open network with nodes all over the world, those nodes must not be able to read or decrypt any data. Nodes should be able to run the XMTP network sustainably without being able to censor anything, because they can’t see anything. This is also why we upgraded the protocol to be quantum-resistant.
Path 1: Channels
Many of the asks for larger XMTP groups are around broadcast use-cases. Brands, creators, and communities want a way for a relatively small number of senders to deliver updates to a very large audience.
The constraints of broadcast channels (not everyone can publish to the group) present unique opportunities for very high scale. We see a path for broadcast channels to be able to handle hundreds of thousands of members.
This proposal still has a few key privacy and security features, including end-to-end encryption. We see E2EE as a non-negotiable in a decentralized network because anyone can access the data stored on the nodes, and node operators do not want to be put in a situation where they have to moderate uploads. Moderation would be the responsibility of Channel Admins, who would be able to add/remove members, hide messages, and change the group’s metadata.
Key Properties
- Private channel metadata. Only channel members could see the name and description of the channel.
- Public channel membership. Anyone could see the ID of a channel (a random UUID) and a list of which XMTP inboxes are currently members of it. (We think we’ll be able to hide this in the end, but we wanted to start on the side of “a little too public” and see whether this is critical for builders. We’d love to hear from developers on this.)
- Maximum of 1000 admins. Up to 1000 members could be given an admin role, allowing them to send messages to the channel, moderate/hide messages, and add/remove members.
- Regular members can only react. Regular (non-admin) members would be able to react to messages but not send regular messages.
- New members would see the full history. Unlike XMTP V3, new members of a channel would be able to see messages sent before they were invited.
- Delayed removals for regular users. When an admin kicks a member out of an XMTP group today, the group’s keys rotate immediately so that the removed member cannot see any messages sent after their removal. With channels, there would be some delay before a removal takes effect cryptographically. The app’s UI could still notify the user immediately that they have been removed and hide the channel, but we would not rotate the keys every time a user was removed. Adding a member would remain instant.
- Immediate removals/additions for admins. We would still be able to add/remove admins and have those changes reflected immediately, protecting against cases where a recently removed admin could cause chaos on their way out the door.
- Semi-confidential reactions. Reacting to a message would make the following metadata public: the ID of the message being reacted to, the reaction emoji, and the XMTP inbox ID of the person reacting. Message contents would remain private.
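The removal rules above amount to batching key rotations for regular members while keeping additions and admin removals immediate. The following toy model is a sketch under assumptions, not XMTP’s design: the `Channel` class, its methods, and the `max_pending` threshold are all hypothetical, and a real system might also rotate on a timer.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Toy key schedule for a broadcast channel (hypothetical model).

    - Adding a member is instant: they receive the current key.
    - Removing a regular member is queued; queued removals are applied
      together in one rotation once `max_pending` of them accumulate.
    - Removing an admin rotates the key immediately.
    """
    members: set[str] = field(default_factory=set)
    admins: set[str] = field(default_factory=set)
    max_pending: int = 100
    epoch: int = 0
    key: bytes = field(default_factory=lambda: secrets.token_bytes(32))
    pending_removals: set[str] = field(default_factory=set)

    def add_member(self, inbox_id: str) -> bytes:
        self.members.add(inbox_id)
        return self.key  # shared with the new member right away

    def remove_member(self, inbox_id: str) -> None:
        if inbox_id in self.admins:
            self.admins.discard(inbox_id)
            self.members.discard(inbox_id)
            self._rotate()  # admins lose access cryptographically at once
            return
        # Regular members keep cryptographic access until the next rotation,
        # even though the app UI can hide the channel immediately.
        self.pending_removals.add(inbox_id)
        if len(self.pending_removals) >= self.max_pending:
            self._rotate()

    def _rotate(self) -> None:
        """Apply all queued removals and move to a fresh key."""
        self.members -= self.pending_removals
        self.pending_removals.clear()
        self.key = secrets.token_bytes(32)
        self.epoch += 1
```

One rotation then covers many removals, which is where the savings over rotating on every removal come from.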
A system like this could scale very efficiently, since much of the heavy lifting could be handled by the nodes. For example, thousands of reactions could be aggregated into a single count so that clients don’t have to read and verify each reaction individually. Individual reactions would still be signed, and clients could audit the logs to ensure that nodes did not tamper with the reaction count. With limited key rotation, encrypting messages would add very little overhead.
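Node-side aggregation with client-side auditing could look roughly like the minimal sketch below. All names are hypothetical. Signature checking is a placeholder `verify` callback that stands in for verifying each reaction against the reacting inbox’s keys, and the one-reaction-per-inbox-per-emoji rule is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Reaction:
    message_id: str   # public: which message is being reacted to
    emoji: str        # public: the reaction itself
    inbox_id: str     # public: who reacted
    signature: bytes  # the reactor's signature over the fields above


Verifier = Callable[[Reaction], bool]  # placeholder for real signature checks


def aggregate(reactions: Iterable[Reaction], verify: Verifier) -> Counter:
    """Node side: count valid reactions per (message_id, emoji), one per inbox."""
    seen: set[tuple[str, str, str]] = set()
    counts: Counter = Counter()
    for r in reactions:
        key = (r.message_id, r.emoji, r.inbox_id)
        if key in seen or not verify(r):
            continue  # drop duplicates and anything with a bad signature
        seen.add(key)
        counts[(r.message_id, r.emoji)] += 1
    return counts


def audit(claimed: Counter, log: Iterable[Reaction], verify: Verifier) -> bool:
    """Client side: recompute totals from the signed log and compare."""
    return aggregate(log, verify) == claimed
```

Most clients would just read the node’s totals. Any client that wants to check them can replay the signed log.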
We believe something with these properties could achieve censorship-resistance in a decentralized network.
Path 2: Gossiped Sender Keys
Another popular way to handle large groups with many senders is through what are commonly called Sender Keys.
With Sender Keys, each group member uses a symmetric secret key to encrypt their messages. That key is distributed to all other members of the group and is periodically rotated (typically after someone leaves). When a user sends their first message to the group, they create a new key and share it with everyone in the group through some other authenticated channel, usually an invisible 1:1 conversation. Passive listeners in a group need to collect the sender keys of others, but do not need to create or share any of their own.
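A common construction, roughly modeled on the chain-key step of Signal’s Sender Keys, is a symmetric hash ratchet. The sender hands out its current chain state once, and every holder can then derive the same per-message keys. This is a sketch under assumptions. The HMAC-SHA256 step and the `0x01`/`0x02` constants follow Signal’s design as we understand it. The class and method names are hypothetical, and the expansion into a cipher key and IV, the actual encryption, and the sender’s signing key are all omitted.

```python
import hashlib
import hmac
import secrets

# Derivation constants, assumed here following Signal's sender chain design.
MESSAGE_KEY_SEED = b"\x01"
CHAIN_KEY_SEED = b"\x02"


class SenderChain:
    """One sender's symmetric ratchet; every member holds a copy of it."""

    def __init__(self, chain_key: bytes, iteration: int = 0) -> None:
        self.chain_key = chain_key
        self.iteration = iteration

    def distribution(self) -> tuple[int, bytes]:
        """State to send each member over an authenticated 1:1 channel."""
        return self.iteration, self.chain_key

    def next_message_key(self) -> tuple[int, bytes]:
        """Derive this iteration's message key, then step the chain forward.

        The old chain key is overwritten, so stealing the current state
        does not reveal message keys from earlier iterations.
        """
        message_key = hmac.new(self.chain_key, MESSAGE_KEY_SEED, hashlib.sha256).digest()
        current = self.iteration
        self.chain_key = hmac.new(self.chain_key, CHAIN_KEY_SEED, hashlib.sha256).digest()
        self.iteration += 1
        return current, message_key


# First send: the sender creates a fresh chain and hands out its state.
sender = SenderChain(secrets.token_bytes(32))
iteration, chain_key = sender.distribution()
receiver = SenderChain(chain_key, iteration)  # a passive listener creates nothing
```

Rotation means throwing the chain away and distributing a fresh one, which is what makes removals expensive.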
In WhatsApp and Signal, Sender Keys are pushed out to all group members through a secure 1:1 message between the sender and every other group member. This limits group size to around 1,000 because of the overhead of sending new keys to all members on first send or after rotation. There are only so many encrypted DMs you can send from your phone before performance becomes a problem.
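The fan-out cost is simple multiplication. Each sender owes every other member one 1:1 message per key generation, meaning the first send plus each rotation. The helper below is a hypothetical illustration that counts members rather than devices, which understates the real cost when people have several devices.

```python
def pairwise_distributions(members: int, senders: int, key_generations: int = 1) -> int:
    """1:1 messages needed to push Sender Keys over pairwise channels.

    Each sender delivers its key to every other member, once per key
    generation (first send plus each rotation).
    """
    return senders * (members - 1) * key_generations


one_sender = pairwise_distributions(1_000, 1)         # one phone sending 999 DMs
everyone = pairwise_distributions(1_000, 1_000)       # every member re-keying after a removal
```

The first figure is the per-phone burden the paragraph above describes. The second shows how quickly the group-wide total grows when every member is a sender.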
The Towns Protocol uses a variation of Sender Keys we’ll call Gossiped Sender Keys. In Gossiped Sender Keys any member of the group can submit a KeySolicitationRequest asking for the keys that they are missing in bulk. Any other member of the group can respond to the request if they have matching keys stored locally. This offers much more scalability at the expense of some different trust assumptions.
- A malicious client could provide the wrong keys for a targeted user, effectively censoring them from the group.
- A malicious client could provide the wrong keys for every other user to one particular requester, effectively locking the requester out of the group.
- A client with an out-of-date view of the group’s membership may accidentally send keys to a former member who was recently removed from the group.
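The gossip flow described above, together with the trust assumptions just listed, can be sketched as follows. A requester broadcasts the key IDs it is missing, and any online member that holds some of them answers. Every name here is hypothetical. `KeySolicitation` is a simplified stand-in for Towns’ KeySolicitationRequest, not its real format.

```python
from dataclasses import dataclass, field


@dataclass
class KeySolicitation:
    """Hypothetical bulk request for missing sender keys."""
    requester: str
    key_ids: list[str]


@dataclass
class Member:
    inbox_id: str
    known_keys: dict[str, bytes] = field(default_factory=dict)

    def respond(self, req: KeySolicitation, member_list: set[str]) -> dict[str, bytes]:
        """Share whichever requested keys we hold, but only with current members.

        `member_list` is this client's local view of membership. If it is
        stale, a recently removed member could still be served keys.
        """
        if req.requester not in member_list:
            return {}
        return {k: self.known_keys[k] for k in req.key_ids if k in self.known_keys}


def solicit(req: KeySolicitation, online: list[Member], member_list: set[str]) -> dict[str, bytes]:
    """Merge partial answers from whichever members are online.

    This sketch trusts the first answer for each key. That is exactly the
    malicious-responder risk listed above; a real client would need to check
    that each key actually decrypts the messages it is meant for.
    """
    found: dict[str, bytes] = {}
    for m in online:
        for key_id, key in m.respond(req, member_list).items():
            found.setdefault(key_id, key)
    return found
```

If nobody online holds a key, it simply stays missing. This is the “someone needs to be online” property in code form.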
Gossiped Sender Keys also has scaling limits. There are only so many of these keys that we can pass around between mobile devices. A group with 20k senders and 20 rotations per member would require 12.8MB of keys to read all the messages.
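The 12.8MB figure works out if each key is 32 bytes. That size is our assumption, inferred from the numbers rather than stated: 20,000 senders × 20 rotations = 400,000 keys, and 400,000 × 32 bytes = 12.8 MB. The helper name is hypothetical.

```python
def key_bytes(senders: int, rotations_per_sender: int, key_size: int = 32) -> int:
    """Total bytes of sender keys a reader must collect to read the full history."""
    return senders * rotations_per_sender * key_size


total = key_bytes(20_000, 20)  # 400,000 keys at an assumed 32 bytes each
print(f"{total / 1_000_000:.1f} MB")
```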
Clients can request missing keys on an as-needed basis, only requesting keys when they encounter a message with a key they don’t have. Since only a small percentage of the total keys would be required to read the most recent few pages of messages in a group, and most clients are not actually reading the full history, you should be able to get much better perceived performance for cases where users only read the most recent messages in the group.
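On-demand fetching means only looking at the messages the user is about to see. The minimal sketch below assumes each message carries its unencrypted `sender_key_id` and that the list is ordered oldest to newest. The function name and the default page size of 50 are hypothetical.

```python
def keys_to_request(messages: list[dict], have: set[str], page_size: int = 50) -> list[str]:
    """Missing sender_key_ids needed to decrypt only the newest `page_size` messages.

    `messages` is ordered oldest to newest, and each entry carries the
    (unencrypted) `sender_key_id` it was encrypted under.
    """
    if page_size <= 0:
        return []
    missing: list[str] = []
    seen: set[str] = set()
    for msg in reversed(messages[-page_size:]):  # newest first
        key_id = msg["sender_key_id"]
        if key_id not in have and key_id not in seen:
            seen.add(key_id)
            missing.append(key_id)
    return missing
```

The result can go into a single bulk solicitation. Older pages trigger further requests only if the user scrolls back.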
Any form of Sender Keys works best when, in practice, the number of senders is a small percentage of the total group. Seemingly minor features like read receipts or reactions might dramatically change this ratio and take a 10,000-member group from “works great” to “completely hosed”. That makes it hard to say definitively how large a group can get. If everyone is a sender and is rotating their keys regularly, you might hit performance problems with only a few thousand devices. But that should be rare in the real world. Ballpark estimates are 20-30k users for more realistic scenarios.
Key Properties
- End-to-end encrypted. All messages are encrypted and metadata about who sent the message is private. The only unencrypted identifier on a message would be the unlinkable sender_key_id.
- Unlimited senders. There would be no “hard limits” on how many users can send messages in a group. Message decryption will get slower in groups that have a large number of senders, especially if the user is attempting to catch up on a large number of messages at one time or download an archive of the complete conversation.
- Multi-channel communities. Gossiped Sender Keys lends itself well to Discord-like multi-channel communities.
- Private group metadata. Only group members can see the group name or description.
- Public group member lists. Everyone in the group needs to be able to easily check if a request for keys is coming from a valid group member. The most straightforward way to do that is to make the member list public. Alternatively, we could create a separate cryptographic protocol for securely sharing the member list privately…but it’s not a small undertaking.
- Delayed removals. A member who has been kicked out of a group may be able to send and receive messages for some time after their removal. Clients would eventually detect the removal and start rejecting messages from the user.
- Someone needs to be online. Gossiped Sender Keys requires at least one group member to be online at all times. If no one is online to provide the keys, messages will be undecryptable until someone with the keys comes back online.
- Trustless identity. Because Gossiped Sender Keys only looks up identities on an as-needed basis, identity lookup should work roughly the same way it does with MLS and require no trusted intermediaries.
Why is it so hard to scale secure messaging?
We think we can scale the system we have today, with the security properties above, to around 1000-2000 members (5,000-10,000 devices) through optimizations in how we use MLS. Beyond that, we start to hit walls that will be hard to break through while keeping great performance on mobile devices.
Those limits are comparable to the limits on secure group size for Signal and WhatsApp, which both cap out at around 1,000 members.
Group member verification
When you join an XMTP group, your client needs to download the Identity Update history for every member of the group. This is how we get trustless identity. Those Identity Updates can then be used to cryptographically verify that the keys someone uses to sign a message are linked to the wallet or passkey that is their identity. Identity Updates are also used to link all of a user’s devices, so when someone is in a chat on both their phone and their iPad, they appear as a single sender.
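Conceptually, verifying a member means replaying their Identity Update log from their root identity (wallet or passkey). Each update must be authorized by a key that is already linked, and the end result is the set of device keys that count as “the same sender”. This is a hypothetical model, not XMTP’s actual Identity Update format. The `verify` callback is a placeholder for real signature checks, and the add/revoke actions are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class IdentityUpdate:
    action: str       # "add" or "revoke" (simplified)
    key: str          # device/installation key being linked or unlinked
    signer: str       # key that authorized this update
    signature: bytes


Verify = Callable[[IdentityUpdate], bool]  # placeholder for signature checks


def replay(root_key: str, log: list[IdentityUpdate], verify: Verify) -> set[str]:
    """Return every key linked to an identity, or raise if the log is invalid."""
    linked = {root_key}
    for i, u in enumerate(log):
        if u.signer not in linked:
            raise ValueError(f"update {i}: signer {u.signer!r} is not linked")
        if not verify(u):
            raise ValueError(f"update {i}: bad signature")
        if u.action == "add":
            linked.add(u.key)
        elif u.action == "revoke":
            linked.discard(u.key)
        else:
            raise ValueError(f"update {i}: unknown action {u.action!r}")
    return linked
```

A message signed by any key in the returned set is attributed to the same identity, which is how a phone and an iPad show up as one sender.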
Reading the Identity Update log for a single user requires downloading between a few hundred bytes and 20 KB of data, and then verifying up to 512 signatures in the worst case (typical users need fewer than 10 signature verifications). When you multiply that by 10,000…it’s a lot of computation to be doing on a phone.
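Multiplying the per-member figures across a 10,000-member group gives a sense of scale. The inputs are assumptions. “20 KB” is taken as 20,000 bytes. “A few hundred bytes” is taken as 500. The typical case is capped at 10 signatures per member. The helper name is hypothetical.

```python
def join_cost(members: int, bytes_per_member: int, sigs_per_member: int) -> tuple[int, int]:
    """Total bytes downloaded and signatures verified to check every member's log."""
    return members * bytes_per_member, members * sigs_per_member


worst = join_cost(10_000, 20_000, 512)  # about 200 MB and 5.12M signature checks
typical = join_cost(10_000, 500, 10)    # about 5 MB and up to 100k signature checks
```

Even the typical case means verifying on the order of a hundred thousand signatures on a phone before the group is fully verified.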
There is an IETF draft proposal for something called Partial MLS which allows clients to defer this work until after you’ve joined the group. But if the work of verifying tens of thousands of group members takes minutes on a mobile device, it’s still going to be a bad experience eventually.
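Deferring verification generally means checking a member the first time their messages need to be displayed, then caching the result. The class below shows that generic lazy pattern. It is not the mechanism the Partial MLS draft specifies, and `LazyVerifier` and `verify_member` are hypothetical names.

```python
from typing import Callable


class LazyVerifier:
    """Verify a member's identity on first use instead of at join time."""

    def __init__(self, verify_member: Callable[[str], bool]) -> None:
        self._verify_member = verify_member  # e.g. download + replay their log
        self._cache: dict[str, bool] = {}

    def is_verified(self, inbox_id: str) -> bool:
        if inbox_id not in self._cache:
            self._cache[inbox_id] = self._verify_member(inbox_id)
        return self._cache[inbox_id]

    @property
    def checked_count(self) -> int:
        """How many members have been verified so far."""
        return len(self._cache)
```

This spreads the cost out over time. A client that eventually hears from every member still pays the full bill, which is the limitation described above.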
Commit size
XMTP V3 is built on top of the MLS protocol. Any changes to the group itself (adding/removing members, changing the metadata, rotating keys) happen through an MLS commit. This is how we get forward secrecy and post-compromise security. In the best case, the size of each commit grows logarithmically with group size, which is better than the Sender Keys approach used by Signal and WhatsApp (linear cost when removing members). Even so, at some point around 20,000 devices we expect commits to become too large for mobile devices to process practically (1MB or more).
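The growth-rate comparison can be made concrete. In an ideal, fully populated MLS tree, a commit’s update path carries at most one encrypted path secret per tree level. A Sender Keys removal, by contrast, forces each remaining sender to re-send its key to every remaining device. The sketch below counts only those items. It ignores blank tree nodes, proposals, Welcome messages, and other commit contents, so it illustrates growth rates rather than byte sizes. Both function names are hypothetical.

```python
import math


def mls_path_ciphertexts(devices: int) -> int:
    """Upper bound on path secrets in a commit for a full tree: one per level."""
    return math.ceil(math.log2(devices)) if devices > 1 else 0


def sender_key_removal_messages(devices: int) -> int:
    """1:1 messages one sender needs to re-key after a single removal.

    Everyone except the sender itself and the removed device.
    """
    return devices - 2


n = 20_000
print(mls_path_ciphertexts(n), sender_key_removal_messages(n))
```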
More importantly, in very large groups we should expect members and devices to be added/removed often. This leaves us in a situation where commits are both large and sent frequently.
Let’s discuss.
Questions
We’re looking for feedback on this direction and these proposals:
- If you could only choose one of the above proposals, which would you choose?
- If you are currently a developer of an XMTP app, is this something you would use if it were available?
  - If not, what would change your mind?
- Are there specific groups on other messaging platforms you’re part of that you could see moving over to a decentralized protocol like this?
- How would you expect agents to fit into these very large groups?
- Are there other formats or groups you’d want to see?