XIP-79: Alternative node selection strategies

Authors: Isaac Hartford (api-Hypernova), Martin Kysel (mkysel)

Abstract

This XIP proposes a configurable node selection strategy system for XMTP Gateway nodes. Currently, the Gateway only supports stable hashing for deterministic pseudo-random topic-to-node mapping. This XIP introduces four additional selection strategies—Manual, Ordered, Random, and Closest—enabling operators to optimize node selection for testing, staged deployments, or latency-sensitive production environments.

Motivation

The existing stable hashing selector provides consistent topic affinity, ensuring that messages for the same topic route to the same originator node. While this works well for production load distribution, it lacks flexibility for several important use cases:

  1. Testing and development: Engineers need deterministic control over which node handles traffic (e.g., “route all messages to Node 100 only”)
  2. Staged rollouts: Operators may want to prioritize specific nodes while maintaining automatic fallback
  3. Latency optimization: Production deployments benefit from selecting the geographically closest node
  4. Load balancing experiments: Pure random distribution provides an alternative for evaluating load patterns

Without alternative strategies, operators have no mechanism to direct traffic for debugging, staged deployments, or latency-sensitive optimizations.

Specification

The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

1.0: Node selector interface

All selection strategies implement the NodeSelectorAlgorithm interface:

type NodeSelectorAlgorithm interface {
    // GetNode returns the ID of the node selected for the given topic,
    // excluding any node IDs in the optional banlist(s). It returns an
    // error if no suitable node exists.
    GetNode(topic topic.Topic, banlist ...[]uint32) (uint32, error)
}

The interface accepts a topic and an optional list of unavailable node IDs to exclude. Implementations return a valid node ID or an error if no suitable node exists.
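To make the interface contract concrete, here is a deliberately trivial implementation sketch. The `Topic` type and `firstAvailableSelector` are illustrative stand-ins (the real `topic.Topic` type and concrete selectors live in the Gateway codebase); the point is how the variadic banlists merge into a single exclusion set:

```go
package main

import (
	"errors"
	"fmt"
)

// Topic is a stand-in for topic.Topic in this sketch.
type Topic string

// NodeSelectorAlgorithm mirrors the interface above.
type NodeSelectorAlgorithm interface {
	GetNode(topic Topic, banlist ...[]uint32) (uint32, error)
}

// firstAvailableSelector merges the variadic banlists into a set and
// returns the first registry node not excluded by it.
type firstAvailableSelector struct {
	nodes []uint32 // registry node IDs
}

func (s firstAvailableSelector) GetNode(t Topic, banlists ...[]uint32) (uint32, error) {
	banned := make(map[uint32]bool)
	for _, list := range banlists {
		for _, id := range list {
			banned[id] = true
		}
	}
	for _, id := range s.nodes {
		if !banned[id] {
			return id, nil
		}
	}
	return 0, errors.New("no suitable node")
}

func main() {
	var sel NodeSelectorAlgorithm = firstAvailableSelector{nodes: []uint32{100, 200, 300}}
	id, _ := sel.GetNode("example/topic", []uint32{100})
	fmt.Println(id) // prints 200: node 100 is excluded
}
```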

2.0: Supported strategies

2.1: Stable (default)

Strategy key: stable

Implements deterministic topic-to-node selection using SHA-256 hashing. Nodes are sorted by NodeID to ensure consistent ordering across restarts.

This strategy aligns with the originator selection algorithm defined in XIP-49. Topic stickiness is valuable because routing messages for the same topic to the same originator node leads to total ordering of messages for that topic, reducing ordering anomalies.

Algorithm:

  1. Hash the topic using SHA-256, extract first 4 bytes as uint32
  2. Sort available nodes by NodeID ascending
  3. Compute virtual positions evenly distributed across uint32 address space
  4. Binary search to find the node responsible for the topic’s hash position
  5. If that node is unavailable, iterate to the next node in sorted order

This strategy provides topic affinity (stickiness) and minimizes churn when the node set is stable. Node set changes cause remapping for some topics.
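The steps above can be sketched as follows. This is an approximation, not the actual implementation: because the virtual positions are evenly spaced, the bucket lookup here uses integer division where the real code performs a binary search, and function and parameter names are illustrative:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// stableSelect sketches the stable strategy: hash the topic, spread the
// sorted nodes evenly across the uint32 space, find the owning bucket,
// and walk forward past any banned nodes.
func stableSelect(topic string, nodes []uint32, banned map[uint32]bool) (uint32, error) {
	if len(nodes) == 0 {
		return 0, fmt.Errorf("no nodes")
	}
	// Sort by NodeID so the mapping is stable across restarts.
	sorted := append([]uint32(nil), nodes...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	// SHA-256 the topic and take the first 4 bytes as a uint32 position.
	sum := sha256.Sum256([]byte(topic))
	pos := binary.BigEndian.Uint32(sum[:4])

	// Node i owns an even slice of the uint32 space.
	width := uint64(1<<32) / uint64(len(sorted))
	idx := int(uint64(pos) / width)
	if idx >= len(sorted) {
		idx = len(sorted) - 1
	}
	// If the owner is unavailable, iterate to the next node in sorted order.
	for i := 0; i < len(sorted); i++ {
		id := sorted[(idx+i)%len(sorted)]
		if !banned[id] {
			return id, nil
		}
	}
	return 0, fmt.Errorf("all nodes unavailable")
}

func main() {
	nodes := []uint32{300, 100, 200}
	a, _ := stableSelect("topic-a", nodes, nil)
	b, _ := stableSelect("topic-a", nodes, nil)
	fmt.Println(a == b) // stickiness: same topic, same node
}
```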

2.2: Manual

Strategy key: manual

Selects the first available node from an explicitly configured list of node IDs. The topic is ignored.

Requirements:

node-selector-preferred-nodes MUST contain at least one node ID

Behavior:

  • Iterates through configured node IDs in order
  • Skips nodes not present in registry
  • Skips unavailable nodes
  • Returns error if no configured node is usable

This strategy is useful for pinning traffic to specific nodes during testing or partial rollouts.
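A minimal sketch of the behavior above, with illustrative helper names (the real implementation consults the actual node registry rather than plain maps):

```go
package main

import (
	"errors"
	"fmt"
)

// manualSelect walks the configured preferred list in order and returns
// the first node that both exists in the registry and is not marked
// unavailable. The topic plays no role in this strategy.
func manualSelect(preferred []uint32, registry, unavailable map[uint32]bool) (uint32, error) {
	for _, id := range preferred {
		if !registry[id] { // skip nodes not present in registry
			continue
		}
		if unavailable[id] { // skip unavailable nodes
			continue
		}
		return id, nil
	}
	return 0, errors.New("no configured node is usable")
}

func main() {
	registry := map[uint32]bool{100: true, 200: true}
	// 999 is not registered and 100 is down, so 200 is selected.
	id, _ := manualSelect([]uint32{999, 100, 200}, registry, map[uint32]bool{100: true})
	fmt.Println(id) // 200
}
```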

2.3: Ordered

Strategy key: ordered

Selects the first available node from a preferred ordered list, falling back to any available node in the registry if none of the preferred nodes are usable.

Requirements:

node-selector-preferred-nodes MUST contain at least one node ID

Behavior:

  • First iterates through preferred nodes in configured order
  • If no preferred node is available, falls back to any registry node
  • Skips unavailable nodes

This strategy provides prioritization with automatic fallback for resilience.
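The distinguishing feature relative to the manual strategy is the registry-wide fallback, sketched here with illustrative names:

```go
package main

import (
	"errors"
	"fmt"
)

// orderedSelect tries preferred nodes first, in configured order, then
// falls back to any available node in the registry.
func orderedSelect(preferred, registry []uint32, unavailable map[uint32]bool) (uint32, error) {
	for _, id := range preferred {
		if !unavailable[id] {
			return id, nil
		}
	}
	for _, id := range registry {
		if !unavailable[id] {
			return id, nil
		}
	}
	return 0, errors.New("no node available")
}

func main() {
	down := map[uint32]bool{100: true, 200: true}
	// Both preferred nodes are down, so selection falls back to 300.
	id, _ := orderedSelect([]uint32{100, 200}, []uint32{100, 200, 300}, down)
	fmt.Println(id) // 300
}
```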

2.4: Random

Strategy key: random

Selects a node uniformly at random from all registry nodes, excluding those marked unavailable. The implementation uses cryptographically secure random number generation (crypto/rand).

Behavior:

  • Filters unavailable nodes out of the candidate set
  • Selects uniformly at random from remaining nodes
  • Returns error if no nodes available after filtering

This strategy provides statistical load distribution but no topic affinity or determinism.
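A sketch of filter-then-draw using crypto/rand, with illustrative function and parameter names:

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
	"math/big"
)

// randomSelect filters unavailable nodes, then draws a uniform index
// from crypto/rand so selection is unpredictable.
func randomSelect(nodes []uint32, unavailable map[uint32]bool) (uint32, error) {
	var candidates []uint32
	for _, id := range nodes {
		if !unavailable[id] {
			candidates = append(candidates, id)
		}
	}
	if len(candidates) == 0 {
		return 0, errors.New("no nodes available after filtering")
	}
	n, err := rand.Int(rand.Reader, big.NewInt(int64(len(candidates))))
	if err != nil {
		return 0, err
	}
	return candidates[n.Int64()], nil
}

func main() {
	id, _ := randomSelect([]uint32{100, 200, 300}, map[uint32]bool{200: true})
	fmt.Println(id) // 100 or 300, never the excluded 200
}
```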

2.5: Closest

Strategy key: closest

Selects the node with the lowest measured TCP connect latency, with optional restriction to a preferred node set.

Implementation details:

  • Measures latency via TCP handshake to each node’s HTTP address
  • Caches latency results with configurable expiry (default: 5 minutes)
  • Respects configurable connection timeout (default: 2 seconds)
  • If node-selector-preferred-nodes is set, restricts measurements to those nodes
  • Falls back to all registry nodes if no preferred nodes are available

Behavior:

  • On cache expiry, re-measures latency to all candidate nodes
  • Selects node with minimum measured latency
  • Skips unavailable nodes
  • Returns error if no latency measurements are available

3.0: Configuration

The following configuration options control node selection:

| Option | Environment variable | Description | Default |
| --- | --- | --- | --- |
| --payer.node-selector-strategy | XMTPD_PAYER_NODE_SELECTOR_STRATEGY | Selection strategy: stable, manual, ordered, random, closest | stable |
| --payer.node-selector-preferred-nodes | XMTPD_PAYER_NODE_SELECTOR_PREFERRED_NODES | Comma-separated list of preferred node IDs | (empty) |
| --payer.node-selector-cache-expiry | XMTPD_PAYER_NODE_SELECTOR_CACHE_EXPIRY | Cache TTL for closest strategy latency measurements | 5m |
| --payer.node-selector-connect-timeout | XMTPD_PAYER_NODE_SELECTOR_CONNECT_TIMEOUT | TCP probe timeout for closest strategy | 2s |

The Uint32Slice type handles parsing of the preferred nodes list, accepting comma-separated values and gracefully handling empty strings.
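A sketch of that parsing behavior (the function name here is illustrative; the actual Uint32Slice type integrates with the flag system):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseUint32Slice parses a comma-separated list of node IDs. An empty
// string, as commonly produced by unset environment variables, yields
// an empty list rather than a parse error.
func parseUint32Slice(s string) ([]uint32, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return nil, nil // no preferred nodes configured
	}
	parts := strings.Split(s, ",")
	ids := make([]uint32, 0, len(parts))
	for _, p := range parts {
		v, err := strconv.ParseUint(strings.TrimSpace(p), 10, 32)
		if err != nil {
			return nil, fmt.Errorf("invalid node ID %q: %w", p, err)
		}
		ids = append(ids, uint32(v))
	}
	return ids, nil
}

func main() {
	ids, _ := parseUint32Slice("100, 200,300")
	fmt.Println(ids) // [100 200 300]
}
```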

4.0: Unavailable nodes and retry behavior

All strategies support an unavailable node tracking mechanism for fault tolerance:

  1. When publishing to a node fails, the failed node ID is added to a request-scoped unavailable list
  2. The selector is called again with the updated unavailable list to get an alternative node
  3. This continues for up to 5 retry attempts
  4. Each strategy filters unavailable nodes from consideration

5.0: Validation rules

Configuration validation enforces:

  • node-selector-strategy must be one of: stable, manual, ordered, random, closest
  • Strategies manual and ordered require at least one node in node-selector-preferred-nodes
  • node-selector-cache-expiry must be greater than 0
  • node-selector-connect-timeout must be greater than 0

Rationale

Design decisions

Five strategies vs fewer: Each strategy addresses distinct operational needs. Consolidating would reduce flexibility for specific use cases like controlled testing (manual) vs. production latency optimization (closest).

Interface design: The NodeSelectorAlgorithm interface accepts a topic even when unused (e.g., by Manual, Random, Closest) to maintain a consistent API. This allows transparent strategy swapping without code changes.

Unavailable list as variadic argument: The unavailable list is passed as ...[]uint32 to keep the common case (no unavailable nodes) simple while supporting retry logic without interface changes.

Closest strategy caching: Measuring TCP latency on every request would introduce unacceptable overhead. Caching with configurable expiry balances accuracy with performance.

Uint32Slice custom type: A dedicated type was implemented to handle comma-separated node ID parsing, gracefully accepting empty strings from environment variables without parse errors.

Alternate designs considered

Consistent hashing with virtual nodes: Considered for the stable strategy but rejected in favor of simple position-based hashing to minimize complexity. The current implementation provides sufficient distribution for expected node counts.

HTTP-based latency measurement: Considered for closest strategy but TCP handshake was chosen as it provides accurate network latency without requiring application-layer overhead.

Backward compatibility

This change is fully backward compatible:

  • The default strategy remains stable, matching prior behavior exactly
  • No changes to wire protocols or message formats
  • Configuration is opt-in via new flags/environment variables
  • Existing deployments continue to function without modification
  • The original stable hashing algorithm is unchanged

Test cases

Test cases should verify:

  • Each strategy returns a valid node ID from the registry
  • Strategies correctly exclude unavailable nodes
  • Stable strategy maintains topic affinity across repeated calls
  • Manual and Ordered strategies respect configured node ordering
  • Random strategy provides statistical distribution
  • Closest strategy selects minimum latency node
  • All strategies return an error when no nodes are available

Reference implementation

Reference implementation: PR #1483 - Gateway Alternative Node Selection Strategies

Security considerations

  1. Closest strategy TCP probing: The closest strategy performs TCP probes to measure latency. These probes only complete the TCP handshake and do not send application data. However, probe traffic patterns could theoretically be used for node enumeration by adversaries monitoring network traffic. Operators SHOULD consider this when deploying in adversarial environments.
  2. Manual strategy misconfiguration: Specifying non-existent node IDs in the manual strategy causes message delivery failures. The validation layer ensures at least one node is configured, but does not validate that configured nodes exist in the registry at startup time. Operators MUST ensure configured node IDs correspond to active registry nodes.
  3. Random strategy entropy: The random strategy uses crypto/rand for secure random selection, ensuring unpredictable node selection patterns. This prevents adversaries from predicting routing to target specific nodes.
  4. Unavailable list exhaustion: If all nodes fail and are added to the unavailable list, message delivery fails. This is expected behavior—the unavailable list prevents retry loops to consistently failing nodes.
  5. Configuration injection: If an attacker gains access to configuration, they could redirect traffic via manual strategy. Configuration MUST be protected via standard infrastructure security practices.

Threat model

Malicious node targeting: An attacker who knows the stable hashing algorithm could craft topics designed to target specific nodes. Mitigation: The stable strategy uses SHA-256 hashing, making targeted topic crafting computationally expensive.

Latency manipulation: An attacker controlling network infrastructure could manipulate TCP latency measurements to influence closest strategy selection. Mitigation: The closest strategy is optional and operators should only use it in trusted network environments.

Node ID enumeration: Repeated calls to the selector could reveal the node set. Mitigation: Node registry information is already public; this does not expose new information.

Copyright

Copyright and related rights waived via CC0.

Great work. A ‘farthest’ option might also have made sense, for measuring worst-case latency across the node network. In that case, would you first set payer.node-selector-cache-expiry so the TCP handshake probe runs against all available nodes, and then set payer.node-selector-strategy to manual, pinning traffic to the farthest node to test that case?
