152 changes: 152 additions & 0 deletions honest.md
@@ -0,0 +1,152 @@

# Overview: Honest multiparty PayJoin
Contributor

Suggested change
# Overview: Honest multiparty PayJoin
# Overview: Multiparty PayJoin for honest peer threat model

this is a bit of a mouthful but i think important to not imply that there's "honest payjoin" and "dishonest payjoin" or whatever


The following is a concrete description of the honest multiparty PayJoin protocol. To understand why certain design choices were made, it is recommended to read the [overview document](./00_overview.md) first.

## Motivation

This protocol is best understood as a collaborative transaction construction protocol for mutually trusting parties. Its purpose is to let participants jointly build a transaction with potentially better privacy properties and cost savings than a unilateral construction. The other parties are not trusted with the safety of your funds. What they are trusted with is liveness (showing up, responding, and progressing the round) and privacy (handling protocol information honestly and not needlessly leaking it).
Contributor

here probably isn't the right place to go into all the detail but "mutually trusting" should be qualified at least somewhat.

specifically:

we trust peers with respect to liveness (so not to deviate from the protocol, or disappear i.e. crash faults, which also means we trust they won't get disconnected from the internet in the synchronous communication model when/if we rely on timeouts for liveness)

we trust peers with respect to privacy, i.e. we trust that they will forget/delete any private information they may observe like timing information, ordering of messages, metadata obtained from the transport layer like IP addresses, etc etc

we do not need to trust peers with safety on the consensus layer, i.e. producing transactions that will cause us to lose money, because we assume SIGHASH_ALL, ensuring that funds are always disbursed in accordance with our intents. so any safety violations in the sense of entering an invalid state (like producing a transaction that overspends its input funds and therefore will never be valid) is therefore defined to be a liveness failure (so the liveness guarantees we would like to make are not just for a single transaction construction session among a fixed set of participants, but the juxtaposition of all such sessions that an individual participant participates in)

Contributor

also please don't call it a "round" i've been trying so hard to get that wasabi brainrot out of my head, but everywhere else in distsys, cryptography etc, a "round" is a round of communication, whereas apparently in wasabi it's some imagined arena where shit takes place (c.f. the other naming choices) which is disparaging of circus performers who, unlike wasabi, take their job seriously


## Roles

### Initiator and Responder

The Initiator signals willingness to batch to their counterparty over a bidirectional channel. This signal is conveyed by including the `mppj=1` parameter in the BIP21 URI. Either the BIP77 sender or receiver may be the Initiator. Parameters for their bidirectional channel may be encoded in the BIP21 URI, as in BIP77.
Contributor

i think we should hedge re the URI mechanism for now, the more i think about it the more pressing it seems to have payment instructions that allow for post quantum HPKE, sender initiated interactions, and long term / reusable authenticated channels to be established, as all of these make a material difference for multiparty and especially for semi-honest multiparty


The Responder is the counterparty who receives this signal. A Responder that does not support multiparty PayJoin will ignore the `mppj=1` parameter and proceed with standard BIP77. A Responder that supports it waits for a session to be created.

Two timeouts govern the phases of the protocol:

* `T_intent`: the duration both parties are willing to wait for a session to be created after the intent to batch is signaled. If no session is created within `T_intent`, both parties MUST fall back to standard BIP77 over their existing bidirectional channel.
Contributor

since URIs can be delivered over an async channel i think this timeout should be an abs expiry time as opposed to a duration

* `T_session`: the duration of the multiparty session itself, after which the session is considered expired.

`T_session` is defined by the `SessionCreator` while `T_intent` is defined by the `Initiator`.
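
The `T_intent` fallback rule can be sketched as a small decision function. This is only an illustration: the `Mode` enum, the function name, and the use of plain numeric timestamps are assumptions, and the sketch treats a session created exactly at the deadline as still being within `T_intent`.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    WAIT = "wait"                      # keep waiting for a session invitation
    MULTIPARTY = "multiparty"          # a session was created in time
    FALLBACK_BIP77 = "fallback_bip77"  # T_intent elapsed, use standard BIP77


def next_mode(now: float, intent_signaled_at: float, t_intent: float,
              session_created_at: Optional[float] = None) -> Mode:
    """Decide how to proceed after the intent to batch was signaled.

    All times are seconds on a shared clock (an assumption of this sketch).
    """
    deadline = intent_signaled_at + t_intent
    if session_created_at is not None and session_created_at <= deadline:
        return Mode.MULTIPARTY
    if now >= deadline:
        return Mode.FALLBACK_BIP77
    return Mode.WAIT
```

If the timeout were expressed as an absolute expiry time instead of a duration, `deadline` would simply be passed in directly rather than computed.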

// TODO: how do we indicate to sender/recv that we should join a mppj? out of band? New message? mailbox with more payment instructions?

### SessionCreator

Either the `Initiator` or the `Responder` may create the session. The party that does so is the `SessionCreator`.
Contributor

Suggested change
Either the `Initiator` or the `Responder` may create the session. The party that does so is the `SessionCreator`.
Either an `Initiator` or a `Responder` may create the session. The party that does so is the `SessionCreator`.

not sure where the right place to explain the graph of participants, where the potential participants are all parties in a connected component, initiator/responder is just the directedness of the edge and only indicates who first connected to whom, so that a session creator is any vertex in a connected component

The `SessionCreator` is responsible for creating session parameters (defined below), bootstrapping the transport mechanism and disseminating session information to the rest of the peers to the best of their capabilities. The `SessionCreator` holds no special authority once the session is live. They simply become a participant.

### Participant

Once a party joins a session they become a `Participant`. All participants share the same obligations as outlined below in the phases section.

### Diagrams

Single receiver, two senders. Receiver is `Initiator` for both senders and becomes the `SessionCreator`.

```mermaid
sequenceDiagram
participant R as Receiver (Initiator)
participant S1 as Sender 1 (Responder)
participant S2 as Sender 2 (Responder)

R->>S1: BIP21 URI (mppj=1)
R->>S2: BIP21 URI (mppj=1)

R->>S1: session invitation (s, session params)
R->>S2: session invitation (s, session params)
```

Sender 1 is the `Initiator` to the receiver who is an `Initiator` to sender 2. The receiver at time 3 becomes the `SessionCreator`.

```mermaid
sequenceDiagram
participant S1 as Sender 1 (Initiator)
participant R as Receiver (Responder / Initiator / SessionCreator)
participant S2 as Sender 2 (Responder)

S1->>R: BIP21 URI (mppj=1)
R->>S2: BIP21 URI (mppj=1)

R->>S1: session invitation (s, session params)
R->>S2: session invitation (s, session params)
```

## Session Parameters

The SessionCreator fixes the following parameters before the session opens. All participants must verify that the final transaction conforms to the relevant parameters before signing.
Contributor

we also require a total size limit

if the number of participants is known then we can say each participant gets so many weight units

i'm not sure how to handle this but we can also just ignore it (so that technically producing a nonstandard or even invalid transaction due to being oversized is still considered "success" if it could have been valid without the blocksize limit) because i don't think that would be a problem in practice


* **Global transaction fields**: `nLocktime`, `nVersion`
Contributor

nLocktime can be handled similar to BIP 174's fallback locktime, i.e. take the max of the fallback and the per input fields.

what does need to be set a priori is whether time or height based, in order to ensure only one type of OP_CLTV input can be added if at all as those cannot be co-spent

* **Feerate**: each participant contributes fees proportional to the weight of their inputs and outputs
* **Input constraints**: `nSequence`, script type, segwit only
Contributor

fwiw segwit only is just a shorthand for specific allowed script types

Contributor

hmm, do we need output type constraints other than standardness?

in BFT we probably do want to allow that for improved privacy (enforce only p2tr for example) or for other reasons (JPEG only coinjoin, for example ;-)

* **Timeout**: `T_session` (see Roles)
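
As a rough illustration of how a participant might hold these parameters and check a fragment against them, consider the sketch below. The class and field names are hypothetical (the document does not fix a wire format), and the fee rule assumes a feerate in sat/vB with 4 weight units per virtual byte.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class SessionParams:
    # Illustrative field names, not a specified encoding.
    n_version: int
    n_locktime: int
    feerate_sat_per_vb: float
    n_sequence: int
    allowed_script_types: frozenset
    t_session_secs: int


@dataclass(frozen=True)
class InputFragment:
    n_sequence: int
    script_type: str
    weight_wu: int  # weight units this input adds to the transaction


def required_fee_sats(params: SessionParams, weight_wu: int) -> int:
    """Fee owed for a contribution of `weight_wu` weight units (4 WU = 1 vB)."""
    vbytes = ceil(weight_wu / 4)
    return ceil(params.feerate_sat_per_vb * vbytes)


def input_conforms(params: SessionParams, frag: InputFragment) -> bool:
    """Check the per-input constraints fixed by the session parameters."""
    return (frag.n_sequence == params.n_sequence
            and frag.script_type in params.allowed_script_types)


params = SessionParams(2, 0, 2.0, 0xFFFFFFFD, frozenset({"p2wpkh", "p2tr"}), 600)
ok_input = InputFragment(0xFFFFFFFD, "p2wpkh", 272)
legacy_input = InputFragment(0xFFFFFFFD, "p2pkh", 592)
```

With these values, a 272 WU contribution at 2 sat/vB owes 136 sats, and the `p2pkh` input is rejected because its script type is not in the allowed set.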

## PSBT CRDT

### Join Semantics

Participants learn transaction fragments in arbitrary order and accumulate them as they arrive. In the honest setting there are no conflicting writes: global fields are fixed by the session parameters and each participant controls disjoint inputs and outputs. Any two valid fragments can therefore be merged by union.
Contributor

Suggested change
Participants learn transaction fragments in arbitrary order and accumulate them as they arrive. In the honest setting there are no conflicting writes: global fields are fixed by the session parameters and each participant controls disjoint inputs and outputs. Any two valid fragments can therefore be merged by union.
Participants learn transaction fragments in arbitrary order and accumulate them as they arrive. In the honest setting there are no conflicting writes: global fields are fixed by the session parameters and each participant controls disjoint inputs and outputs. Any two valid fragments can therefore always be merged.

"union" is defined for sets, this requires a more specific definition but i think just "merged" is sufficient


If the accumulated transaction does not balance, or any fragment violates the session parameters, a participant refuses to sign and abandons the session.
Contributor

"balance" is not yet defined

this should take the form of a global proprietary field with keydata (i need to double check that this is actually possible) where the keydata is a unique ID and the value is a number of satoshis that makes balance of exactly 0 the termination condition (and makes it possible for anyone to unilaterally abort the entire session by adding one of those with MAX_MONEY)


// TODO: refer to nothingmuch's document
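
The join semantics described in this section can be sketched with plain dictionaries standing in for PSBT fragments. This is not a PSBT implementation: `TxView`, the keying of inputs by outpoint string, and the keying of outputs by a unique identifier are assumptions made for illustration. The point is that merging is commutative and idempotent when participants write disjoint keys, and that a conflicting write is surfaced as an error instead of being silently resolved.

```python
from dataclasses import dataclass, field


@dataclass
class TxView:
    # outpoint ("txid:vout") -> amount in sats
    inputs: dict = field(default_factory=dict)
    # unique output id -> (scriptPubKey hex, amount in sats)
    outputs: dict = field(default_factory=dict)


def merge(a: TxView, b: TxView) -> TxView:
    """Merge two views; identical duplicate entries are fine, conflicts are not."""
    for name in ("inputs", "outputs"):
        left, right = getattr(a, name), getattr(b, name)
        for key in left.keys() & right.keys():
            if left[key] != right[key]:
                raise ValueError(f"conflicting write for {name}[{key!r}]")
    return TxView({**a.inputs, **b.inputs}, {**a.outputs, **b.outputs})


alice = TxView({"txid0:0": 100_000}, {"out-a": ("0014aa", 60_000)})
bob = TxView({"txid1:1": 50_000}, {"out-b": ("0014bb", 89_000)})
merged = merge(alice, bob)
```

Because the result does not depend on argument order or on how often a fragment is seen, peers that receive the same fragments in different orders end up with the same view.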

## Communication model

### Gossip broadcast over Iroh
Contributor

can also add https://github.com/marmot-protocol/mdk, the directory etc...

the point to emphasize IMO is that it doesn't matter whether we have transport layer privacy from counterparties in this model

we should have some discussion about transport level privacy WRT a 3rd party observer (i.e. traffic analysis, "we kill people based on metadata", etc)


In the honest setting, participants can use an authenticated and encrypted gossip broadcast channel such as [Iroh gossip](https://docs.iroh.computer/connecting/gossip) to disseminate PSBT fragments. Each participant broadcasts their protocol messages (inputs, outputs, readiness declarations, and witnesses) to the session topic, and peers merge the received fragments into their local transaction view.

This setting does not require transport-layer metadata privacy as a protocol requirement. The reason is that participants are already mutually trusted with privacy in the honest model, including trust not to retain or misuse linkability information learned during the session. As a result, unlike the [semi-honest](./semi-honest.md) setting, the protocol does not depend on anonymous transport primitives to maintain the intended privacy properties within the participant set.
Contributor

can also add https://github.com/marmot-protocol/mdk, the directory mailboxes, matrix, etc...

the point to emphasize IMO is that it doesn't matter whether we have transport layer privacy from counterparties in this model

we should have some discussion about transport level privacy WRT a 3rd party observer (i.e. traffic analysis, "we kill people based on metadata", etc)


In this honest setting, a separate agreement protocol is not required for the success path. Gossip dissemination plus deterministic transaction construction is sufficient: if participants receive the same valid fragments, they converge to the same unsigned transaction. Any temporary view differences are primarily a liveness concern (delay or retry), not a fund-safety concern, because each participant still performs local validation and only signs an acceptable transaction.
Contributor

"eventual consistency" is the specific kind of liveness/consistency guarantee that we rely on, and this depends on the balance condition (when the balance is positive we're waiting for a message from one of the participants, if it's 0 we can initiate signing and either one or more parties will refuse to sign because the balance == 0 condition was reached due to a malicious peer, in which case SIGHASH_ALL enforces safety and the signature aggregation may just hang indefinitely or if balance < 0 then everyone can abort)

so technically we don't provide termination but if honest messages get delivered eventually and there are no faulty peers then the protocol will terminate outputting a valid txn


### Message Reconciliation

Iroh gossip is a fire-and-forget dissemination mechanism. A peer that is offline or disconnected during the registration window will not receive messages broadcast while they were absent. When they reconnect, gossip provides no mechanism to recover missed messages. A peer with an incomplete view of the transaction will either construct the wrong unsigned transaction or fail to sign entirely.

Reconciliation requires a persistent log that a rejoining peer can replay from a known index.
Contributor

an index will not generally be available as messages are only partially ordered

secondly set reconciliation by definition is about sets, not sequences, so i think there's some misunderstanding?


With a trusted leader (RAFT-like protocol)

A leader maintains a replicated log of all session messages. A peer that reconnects identifies the last log index it observed and requests all subsequent entries from the leader. This is the core log replication operation RAFT is designed for and requires no additional mechanism. The tradeoff is that the leader is trusted with liveness: if the leader is unavailable, reconciliation stalls.

Without a trusted leader

Each peer must retain the full session message log for the duration of the session and support explicit state sync requests from reconnecting peers. A reconnecting peer can detect gaps by comparing a content-addressed message set with a peer that stayed online. This approach distributes the liveness burden but requires peers to know what they are missing, which gossip does not provide by default. This would be a much larger engineering burden.
Contributor

full message set can be synced via set reconciliation at least in principle, since iroh uses that i'm not sure how it prunes old messages


Given that the honest model already accepts a trust assumption on liveness of a single peer, RAFT is the natural fit for this setting. The session creator can presumably become the RAFT leader.
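
For comparison, the leaderless gap detection described above can be sketched as a naive exchange of content-addressed message IDs. `MessageStore` and `sync` are hypothetical names, SHA-256 is an assumed choice of content address, and the sketch sends the full ID set, which a real set reconciliation scheme would avoid.

```python
import hashlib


def message_id(payload: bytes) -> str:
    """Content address of a message: the hex SHA-256 of its bytes."""
    return hashlib.sha256(payload).hexdigest()


class MessageStore:
    """Per-peer store of every session message seen so far."""

    def __init__(self):
        self._by_id = {}

    def add(self, payload: bytes) -> str:
        mid = message_id(payload)
        self._by_id[mid] = payload  # re-adding the same payload is a no-op
        return mid

    def ids(self) -> set:
        return set(self._by_id)

    def missing_from(self, remote_ids: set) -> set:
        """IDs the remote peer has that this store lacks."""
        return remote_ids - self.ids()

    def fetch(self, ids: set) -> list:
        return [self._by_id[i] for i in sorted(ids) if i in self._by_id]


def sync(rejoining: MessageStore, online: MessageStore) -> int:
    """Pull every message the rejoining peer missed; returns how many were fetched."""
    wanted = rejoining.missing_from(online.ids())
    for payload in online.fetch(wanted):
        rejoining.add(payload)
    return len(wanted)
```

No log index is involved: the rejoining peer only needs to learn which set elements it lacks, which fits messages that are partially ordered.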
// TODO: how to do discovery of other peers?
Contributor

discovery of their addresses etc?

that should be gossiped as well, initial payment instructions should provide each peer's address(es) to its immediate peers, and those peers will share that with their other peers, so ultimately it's the URI / long lived channel mechanism that bootstraps discovery


## Protocol Phases

Input and output registration can be sent in any order. Ordering and sorting semantics must be defined a priori.
// TODO: (Should outputs get defined before inputs ?)
Contributor

why should outputs get defined first?

One possible definition is to use the hash of the protocol transcript as a salt to sort the inputs and outputs.
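
A minimal sketch of that salted ordering follows, assuming SHA-256, a 32-byte transcript hash, and a canonical byte serialization for each input or output. None of these choices are fixed by this document, and `canonical_order` is a hypothetical helper.

```python
import hashlib


def salted_key(salt: bytes, item: bytes) -> bytes:
    """Sort key for one serialized input or output under a given salt."""
    return hashlib.sha256(salt + item).digest()


def canonical_order(transcript_hash: bytes, items) -> list:
    """Order items deterministically, independent of arrival order."""
    return sorted(items, key=lambda item: salted_key(transcript_hash, item))


transcript_hash = hashlib.sha256(b"example protocol transcript").digest()
outputs = [b"out-a", b"out-b", b"out-c", b"out-d"]
ordered = canonical_order(transcript_hash, outputs)
```

Every participant who holds the same set of items and the same transcript hash computes the same order, while the order itself reveals nothing about who registered what or when.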

All messages are base64 encoded as PSBT fragments.
Contributor

why base64? that just adds a 37% overhead for no good reason, in BIP 77 the rationale was "BIP 78 already got it wrong so this simplifies parsing" (which i still don't really agree with)

Author

What's your preferred encoding?

Contributor

none, just the raw psbt binary data

if a particular transport is not binary safe or favors some encoding strongly over another, then that's a consideration for that transport


// TODO: coinselection strategy? do peers run their own coinselection based on their own target outputs? What if a recv doesnt have target outputs? Would peers change outputs based on others output selection?
Contributor

out of scope for this doc


### Input Registration

Each participant submits the transaction inputs they control. Inputs must be posted as independent messages.
Global passive observers should not be able to determine which inputs originate from the same party. If the transport mechanism is encrypted this should not be an issue.

### Pseudo Outputs

// Move to semi-honest. In honest, peers can just declare how much they are burning in their input messages
Contributor

we need either this or some "ready to sign" announcement, i would prefer just one mechanism and this one can be optimized in the BFT setting to avoid making proofs in the happy path

see above for comment about how to define concretely, i'm leaning towards not adding dummy outputs as those may end up in actual transactions due to bugs / user error, and other than inflating the scriptPubKey to above blocksize limit the only mechanism preventing such transactions from being broadcast is standardness (so e.g. empty)

they would also need an exception when calculating size and feerates


A pseudo output is a declaration of fee contribution above the session-mandated minimum. It participates in the balance equation like a real output but does not appear in the final transaction. Participants MUST post a pseudo output only when their intended fee contribution exceeds what the session parameters require.

When the global sum of inputs minus outputs minus pseudo outputs reaches zero, every participant can independently verify the transaction is balanced and proceed to declare Ready-to-sign.
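
The balance check can be sketched as simple arithmetic. The text does not say how the session-mandated minimum fee enters the equation, so this sketch assumes it is subtracted as a separate term; passing `mandated_fee_sats=0` gives the literal formula above. Treating a positive balance as "still waiting for fragments" follows the review discussion and is not a settled rule.

```python
def balance_sats(inputs, outputs, pseudo_outputs, mandated_fee_sats=0):
    """Inputs minus outputs minus pseudo outputs (minus the assumed mandated fee)."""
    return sum(inputs) - sum(outputs) - sum(pseudo_outputs) - mandated_fee_sats


def next_step(balance: int) -> str:
    if balance > 0:
        return "wait"                   # some fragments have not arrived yet
    if balance == 0:
        return "declare-ready-to-sign"  # the transaction is balanced
    return "refuse-to-sign"             # outputs exceed inputs


# Two inputs, one real output, 300 sats of extra fee declared as a pseudo output,
# and an assumed 700 sat mandated fee.
balance = balance_sats([100_000, 50_000], [149_000], [300], mandated_fee_sats=700)
```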

### Output Registration

Output and pseudo output messages MUST carry a unique identifier to prevent double accounting. E.g., a peer may read an output message multiple times. Since `TxOut`s are not uniquely identifiable, that peer would have no ability to de-duplicate.

A participant who wishes to back out of the session posts an output that causes the transaction balance to overflow. That is, their declared outputs exceed their input contribution. This makes the balance equation unsatisfiable and must cause all other participants to refuse to sign.
Contributor

this should never be done by honest participants, except when detecting malfeasance and even then only as a hint to honest parties as they defect from the session


### Ready-to-sign (RTS) declarations

For each input they control, participants post an RTS declaration. This signals that they accept the current transaction template and are prepared to sign it.
The protocol advances only once all inputs have corresponding ready signals. This ensures that all participants have finished contributing transaction fragments.

When the global sum of inputs minus outputs minus pseudo outputs (fee declarations) reaches zero, every participant can independently verify that the transaction is balanced and sign.
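
Combining the two conditions in this section, a participant's local signing gate might look like the sketch below. Inputs are identified by outpoint strings and the function names are hypothetical; the balance value is assumed to be computed elsewhere from the accumulated fragments.

```python
def all_inputs_ready(registered_inputs: set, rts_declarations: set) -> bool:
    """True once every registered input has a matching RTS declaration."""
    return bool(registered_inputs) and registered_inputs <= rts_declarations


def may_sign(registered_inputs: set, rts_declarations: set, balance: int) -> bool:
    """Sign only when all inputs are ready and the transaction balances to zero."""
    return all_inputs_ready(registered_inputs, rts_declarations) and balance == 0


registered = {"txid0:0", "txid1:1"}
```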

### Witness provision
Contributor

should cite BIP 174 combiner role


Participants provide witnesses for the inputs they control. Once all witnesses are available, any participant can assemble the fully signed transaction and broadcast it to the Bitcoin network.
64 changes: 64 additions & 0 deletions semi-honest.md
@@ -0,0 +1,64 @@
# Overview: Semi-honest multiparty PayJoin

The following is a concrete description of the semi-honest multiparty PayJoin protocol. To understand why certain design choices were made, it is recommended to read the [overview document](./00_overview.md) and [honest protocol document](./honest.md) first.

// TODO: supporting silent payments
// TODO: Two phases ?: you cannot add output until every input owner indicated no more OR we just say liveness is weak.
Since this is semi-honest, the only thing we need to tolerate is crashes: someone has a delay and adds inputs at the last second

## Motivation

The two-party and honest versions of the multiparty PayJoin protocol preserve privacy of input/output ownership against third-party observers, but they do not preserve privacy from the view of the counterparty itself. E.g., in a two-party protocol, each participant can trivially attribute all unknown inputs and outputs to the other party. This reveals cluster information and requires counterparty trust.

With n > 2 and metadata privacy, this privileged view is reduced. Payments and change outputs become ambiguous within the participant set. As a result, participants do not need to trust any specific counterparty with their clustering information. Increasing the number of parties therefore reduces counterparty trust and weakens clustering inferences.

## Threat model

This protocol operates in a semi-honest (honest-but-curious) model. All participants are assumed to be economically proximate and thus to follow the protocol as specified. They are not expected to deviate from the rules or misbehave. However, they may attempt to learn as much as possible from the messages they observe and the final transaction.

Concretely, if any party learns the full plaintext transcript of messages, they should not be able to determine which inputs or outputs belong to which of the other participants.

The protocol does not assume Byzantine robustness, and it does not attempt to detect or punish misbehavior. If a participant fails to follow through, the protocol may fail, but safety is not compromised: participants only provide witnesses if their expected outputs are included in the final transaction.

It's possible for some participants to join late. If N RTS messages have already been posted and a participant then registers inputs and outputs, this trivially creates an input-output link.
Possible mitigations include ignoring the laggard once the effective balance condition is met and N RTS messages have been collected.

## Roles

The roles defined in the [honest protocol](./honest.md#roles) apply here without modification. The semi-honest model does not change who the Initiator, Responder, SessionCreator, or Participant are, nor how sessions are initiated. The only difference is that participants in this model are curious and may attempt to infer ownership links from observed messages and the final transaction. The communication model is therefore strengthened to prevent this.

## Communication model

In the semi-honest model, participants follow protocol rules but are still curious and may try to infer ownership links from any side channel available to them. For that reason, content encryption alone is not sufficient: if transport metadata reveals who posted which message and when, peers can correlate messages into participant-level clusters.

Metadata privacy is therefore a protocol requirement in this setting. The communication layer must hide sender network identity and reduce linkability across messages, so that learning the transcript does not trivially reveal input-output ownership.

This is why iroh gossip is not sufficient here as the primary transport. While iroh provides efficient dissemination, it does not by itself provide the metadata-hiding guarantees this threat model requires.

Separately, dissemination and agreement should be distinguished. Gossip is enough to disseminate messages and can still provide eventual convergence when deterministic merge rules are used. A separate agreement mechanism (some specific instantiation of a lattice agreement protocol) is only needed when stronger guarantees are required for intermediate consistency, timely termination, (crashes?) or recovery under communication disruptions.

Given those tradeoffs, using the BIP77 directory for total order is a simpler direction. A shared append-only mailbox gives participants a practical, common message order to process, which reduces the need to deploy and tune a separate distributed agreement layer. In other words, it combines the metadata privacy this model requires with a straightforward coordination primitive that is easier to implement and operate.

### BIP77 Directory as anonymous broadcast channel

Communication is mediated by a PayJoin directory accessed via OHTTP, following the same metadata privacy model as BIP77. For multiparty use, the directory mailbox supports append semantics. Multiple participants write to the same mailbox, and peers poll and retrieve all appended messages. The mailbox functions as an anonymous broadcast log of encrypted payloads.

#### Shared session secret

A session is defined by a single ephemeral shared secret s. Any participant who learns s can join the session. Knowledge of this secret is the only admission control mechanism.
Participants derive mailbox identifiers and initialize their HPKE context with s. The shared secret is distributed via the existing bidirectional channel. Parties who learn s can both read and write to the mailbox. All payloads are encrypted using HPKE, so the directory only handles opaque ciphertext blobs and cannot link messages to participants.
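
To make the idea of deriving per-session values from `s` concrete, here is a minimal sketch using HMAC-SHA256 with domain-separation labels. This is an assumption for illustration only: it is not the BIP77 mailbox derivation, it does not set up an HPKE context, and the labels, output lengths, and function names are all hypothetical.

```python
import hashlib
import hmac


def derive(secret: bytes, label: bytes, length: int = 32) -> bytes:
    """Derive a labeled value from the session secret (illustrative, not a spec)."""
    return hmac.new(secret, b"mppj-sketch/" + label, hashlib.sha256).digest()[:length]


def mailbox_id(secret: bytes) -> str:
    """Hypothetical mailbox identifier shared by everyone who knows the secret."""
    return derive(secret, b"mailbox-id", 16).hex()


def payload_key_seed(secret: bytes) -> bytes:
    """Hypothetical seed that could feed the payload encryption setup."""
    return derive(secret, b"payload-key")


s = bytes(range(32))  # stand-in for a randomly generated session secret
```

Every party holding the same `s` computes the same mailbox identifier, while distinct labels keep the identifier and the key material independent of each other.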

## Protocol phases

For the canonical phase-by-phase flow, see the honest protocol document: [Overview: Honest multiparty PayJoin](./honest.md#protocol-phases).

### Message Timing

To prevent timing correlation between a participant's messages, each message must be assigned a randomized delay before posting.

If the total number of messages is known in advance, sample n uniform random times within the session window and post each message at its assigned time.

Message publication times are assigned in two phases:

1. Input registration, output registration, and RTS declarations are assigned publication times at the start of the session.
2. Witness provision messages must not be assigned publication times until registration closes.
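
The two-phase assignment above can be sketched as follows, assuming times are seconds relative to session start and that each phase has a known window. `sample_post_times` and `schedule` are hypothetical helpers. The sketch keeps a participant's messages in their original order by pairing them with sorted times, which is a design choice and not something the text requires.

```python
import random
import secrets


def sample_post_times(n, window_start, window_end, rng=None):
    """Sample n uniform random publication times within the window, sorted."""
    if window_end <= window_start:
        raise ValueError("window must have positive length")
    rng = rng if rng is not None else secrets.SystemRandom()
    return sorted(rng.uniform(window_start, window_end) for _ in range(n))


def schedule(messages, window_start, window_end, rng=None):
    """Pair each message with a publication time, preserving message order."""
    times = sample_post_times(len(messages), window_start, window_end, rng)
    return list(zip(times, messages))


# Phase 1: assigned at session start. A seeded generator is used here only to
# make the example repeatable; the default draws from the OS entropy source.
registration = schedule(["input", "output", "rts"], 0.0, 600.0, rng=random.Random(7))

# Phase 2: assigned only after registration closes (assumed here to be t = 600).
witnesses = schedule(["witness"], 600.0, 900.0, rng=random.Random(7))
```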