feat: CON-1391 Return result of remote NiDKG as soon as sufficient dealings are available (#8469)
Merged
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Reduces remote NiDKG request latency by creating and delivering remote transcripts in data blocks as soon as enough dealings are available, and validating those early transcripts deterministically.
Changes:
- Build early remote NiDKG transcripts in data payloads (capped) once collection thresholds are met.
- Validate early remote transcripts by re-creating them during payload validation and comparing for equality.
- Extend tests/utilities and mocks to support fetching replicated state and asserting new behavior.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| rs/types/types/src/consensus/dkg.rs | Adds a constructor to build DkgDataPayload including early remote transcript results. |
| rs/test_utilities/artifact_pool/src/consensus_pool.rs | Adjusts test payload builder’s max dealings per block. |
| rs/consensus/src/consensus/notary.rs | Updates notary tests to mock get_state_at() for new state access requirements. |
| rs/consensus/dkg/src/test_utils.rs | Adds helpers for constructing DKG contexts and extracting configs/dealings/transcripts from test blocks. |
| rs/consensus/dkg/src/payload_validator.rs | Enables validation of early remote transcripts by recomputation + equality check; adds logging. |
| rs/consensus/dkg/src/payload_builder.rs | Builds early remote transcripts in data blocks and avoids reprocessing targets already completed in data blocks. |
| rs/consensus/dkg/src/lib.rs | Introduces MAX_EARLY_REMOTE_TRANSCRIPTS and adds extensive tests for early transcript behavior. |
| rs/consensus/dkg/Cargo.toml | Adds dev-dependencies needed by new tests/mocks. |
| rs/consensus/dkg/BUILD.bazel | Adds Bazel dev dependencies for new test code. |
| rs/artifact_pool/src/consensus_pool_cache.rs | Updates tests to use a mock state manager that supports get_state_at(). |
| rs/artifact_pool/Cargo.toml | Adds dev-dependencies for state manager labeled state and initial state helpers. |
| rs/artifact_pool/BUILD.bazel | Adds Bazel dev dependencies for state manager and test utilities state. |
github-merge-queue bot pushed a commit that referenced this pull request on Apr 16, 2026:
…9766)

## Background

At the start of each interval, nodes broadcast NiDKG dealings for key generation and resharing. These dealings are included into data blocks. Once enough dealings exist on chain, they may be aggregated into a full NiDKG transcript. #8469 allows the subnet to aggregate dealings not only in the following summary block, but already as part of data blocks, as soon as enough dealings exist on the parent chain.

## Problem

Due to size and validation time, the number of dealings to be included on chain is limited (currently to 1 dealing per block). Additionally, the current behavior of selecting dealings to be included is sub-optimal:

1. The subnet attempts to include all (up to `N`) dealings on chain, even if only `f+1` (fresh key) or `2f+1` (high-threshold re-sharing) dealings are required to create the transcript.
2. The dealings are selected pseudo-randomly (based on the order of their hashes). This means that, on average, all transcripts will complete at roughly the same time, once enough dealings for all of them exist on chain. This has an adverse effect on latency.

## Proposed Changes

This PR proposes to stop including more dealings for a config once the config's `collection_threshold` has been reached. Additionally, dealings are prioritized for inclusion based on the following conditions (in order of precedence):

1. While there are fewer than `MAX_REMOTE_DKGS_PER_INTERVAL` (currently set to 1) completed remote targets in the current interval, prioritize remote dealings; otherwise, prioritize local dealings. This improves the latency of remote DKG requests such as `SetupInitialDKG` and `ReshareChainKey`. The number of prioritized remote DKGs is limited to avoid starving local DKG. Note that currently, the number of remote DKG *configs* per interval is also limited to 1 (by the same constant); this may however disappear in the future.
2. Among remote targets, prioritize dealings for targets that are closer to their collection threshold.

This ensures progress is made for one specific target (instead of all of them at once), which improves the best- and average-case latency of remote DKG requests.

## Future Work

Instead of only starting to work on remote DKG requests at the beginning of an interval, we will do so also as part of data blocks, as soon as the remote DKG request appears in the replicated state.

Co-authored-by: IDX GitHub Automation <infra+github-automation@dfinity.org>
Co-authored-by: Pierugo Pace <pierugo.pace@dfinity.org>
Background
Remote NiDKG requests (i.e. for subnet recovery/creation) currently take anywhere between 1 and 2 DKG intervals of the source subnet to complete. This is because once a request is made, the corresponding NiDKG config will be included only in the following summary block, and the final transcript is only created in the subsequent summary block.
We can reduce this latency by creating and delivering the transcripts already as part of a data block, as soon as enough dealings are available on the blockchain.
In #8644, we introduced the new data payload field required to hold and deliver NiDKG transcripts to execution. Initially, payloads containing these early remote transcripts were not created, and did not pass validation.
Proposed Changes
This PR changes the data payload builder to create and include transcripts as soon as enough dealings are available on the parent chain. For now, we will include at most 2 transcripts per data payload. The payload validator validates early transcripts by re-creating them with the same inputs and checking for equality.
Note that `SetupInitialDKG` requests require 2 transcripts, whereas `ReshareChainKey` requests for VetKeys require 1 transcript. Furthermore, execution expects us to deliver all transcripts for a given request in the same payload. Therefore, we make sure that for a given target ID, we either include all transcript results in the data payload or none of them.

Generally, the introduced code paths should be seen as an optimization: NiDKG configs and transcripts continue to be created as part of summary blocks (unless the transcript was already created as part of an earlier data block).
One special case is the following: assume early transcript creation for a `SetupInitialDKG` request fails, but succeeds for one of the two transcripts when building the summary payload. Then the summary payload containing the completed transcript will also contain a config for the second transcript that failed (it will be retried in the next interval).

In order to handle this case as part of a following data block, we would have to identify and clone the completed sibling transcript into the data payload creating the second transcript. This would add some complexity, which we avoid for now by handling this case exclusively as part of summary blocks. Specifically: we will only handle target IDs as part of data blocks for which all configs can be found in the previous summary. Any other cases (retries, timeouts) continue to be handled and initiated by the summary payload.