…ergency mode if election failed before the threshold was reached
2 points after discussing this:
Co-authored-by: Kian Paimani <5588131+kianenigma@users.noreply.github.com>
#[pallet::constant]
type SignedPhase: Get<Self::BlockNumber>;

/// Minimum number of signed and unsigned blocks that EPM expects for a successful
Maybe add a bit more documentation about how to set this number? Setting a number equal to or higher than signed phase + unsigned phase would mean the fallback is never called, while setting it to 0 implies the fallback is always called?
Also, what do you think of a name like FallbackThresholdBlocks?
Setting a number equal or higher than signed phase + unsigned phase would mean fallback is never called
This is not quite the case. We keep a counter of signed and unsigned phase blocks that keeps increasing since the last successful election. So the fallback/emergency phase won't be triggered in the first round if fn elect fails, but it will be in subsequent ones.
while setting it to 0 implies fallback is always called?
This is correct; I will add it.
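To make the counter behaviour described in this reply concrete, here is a minimal Rust model; it is a sketch, not the pallet code. It assumes, as the reply describes, that the count is only reset by a successful election and that a failed election escalates once the count reaches the threshold (i.e. it is throttled while MinElectingBlocks > count). It ignores the per-phase rules of the PR. The type ElectingBlocks and the functions on_electing_block, on_elect and simulate_failures are hypothetical, and the figures of 15 electing blocks per round and a threshold of 20 are illustrative only.

```rust
/// Hypothetical model of the electing-blocks counter (ElectionBlocksCount in the PR).
/// Assumption: the count is only reset after a successful election, so it keeps
/// growing across rounds in which `elect` fails.
#[derive(Default)]
struct ElectingBlocks {
    count: u32,
}

impl ElectingBlocks {
    /// Called once for every signed or unsigned block.
    fn on_electing_block(&mut self) {
        self.count += 1;
    }

    /// Called when `elect` is invoked. Returns `true` if a failure should escalate
    /// to the fallback/emergency phase, `false` if it is throttled (or succeeded).
    fn on_elect(&mut self, success: bool, min_electing_blocks: u32) -> bool {
        if success {
            // A successful election starts a new round.
            self.count = 0;
            return false;
        }
        // A threshold of 0 disables throttling; otherwise escalate only once
        // enough electing blocks have passed since the last successful election.
        min_electing_blocks == 0 || self.count >= min_electing_blocks
    }
}

/// Runs `rounds` failing elections, each preceded by `blocks_per_round` electing
/// blocks, and records whether each failure escalated.
fn simulate_failures(rounds: u32, blocks_per_round: u32, min_electing_blocks: u32) -> Vec<bool> {
    let mut counter = ElectingBlocks::default();
    (0..rounds)
        .map(|_| {
            for _ in 0..blocks_per_round {
                counter.on_electing_block();
            }
            counter.on_elect(false, min_electing_blocks)
        })
        .collect()
}

fn main() {
    // Illustrative: 10 signed + 5 unsigned blocks per round, threshold of 20.
    let escalated = simulate_failures(2, 15, 20);
    println!("{:?}", escalated); // [false, true]
}
```

With a threshold of 20 and 15 electing blocks per round, the first failed election is throttled and the second one escalates, which is the behaviour described in the reply; a threshold of 0 escalates on the very first failure.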
Also what do you think of a name like FallbackThresholdBlocks
I thought of it as well. TBH, I think it's easier to reason about the emergency phase than about fallback throttling (especially because we don't even have a fallback configured in Polkadot/Kusama). My reasoning for this (and how I think about the EPM phase transitions) is that the emergency phase is the real deal here: if an election fails, we start the emergency phase process by first trying the fallback.
I will improve the documentation in any case. We can do better and make sure it's documented that the fallback is throttled as a result of the emergency phase being throttled.
wdyt?
Co-authored-by: Ankan <10196091+Ank4n@users.noreply.github.com>
The CI pipeline was cancelled due to the failure of one of the required jobs.
As a follow-up to the Kusama incident and our previous discussions, this PR fixes EPM's knee-jerk transition to the emergency phase. There were two reasons why EPM entered emergency mode: 1) the staking pallet called election_provider::elect() prematurely due to Forcing::ForceNew, before EPM had enough time to prepare the election results for the next era; and 2) there was no election fallback configured.

This PR adds a T::MinElectingBlocks config to EPM that defines the minimum number of "electing" blocks (i.e. signed and unsigned) in a round before an election failure tries the fallback election and/or transitions to the emergency phase. If the minimum number of electing blocks has not passed since the last successful election and the election fails, the emergency phase will not be set.

T::MinElectingBlocks is expressed as the number of signed and unsigned blocks that are expected for an election round to run successfully, i.e. for the election results to be queued. It is used to decide whether the emergency phase and fallback election should be triggered when an election fails. At each election round, a storage value ElectionBlocksCount is reset, and it is incremented for each signed and unsigned block that passes in the round.

The logic that decides whether the emergency phase and fallback election should be throttled is implemented in fn emergency_phase_throttling, namely:

- If T::MinElectingBlocks == 0, emergency throttling is disabled.
- If in Phase::Off, throttle the fallback election and emergency phase if the election fails.
- If in Phase::Signed, throttle the fallback election and emergency phase iff T::MinElectingBlocks > ElectionBlocksCount::get() for the present round.
- If in Phase::Unsigned, throttle the fallback election and emergency phase iff T::MinElectingBlocks > ElectionBlocksCount::get() for the present round.

The advantage of this PR is that the changes are local to EPM and easy to reason about. The logic is transparent to staking; the only difference is that a failed election may self-heal, as staking keeps trying to fetch the election results in subsequent sessions.
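The throttling rules listed above can be sketched as a pure Rust function. This is a hypothetical model rather than the PR's code: the simplified Phase enum keeps only the three phases the description lists (EPM's real phase enum has more variants and carries extra data), and the two plain u32 arguments stand in for T::MinElectingBlocks and ElectionBlocksCount::get().

```rust
/// Simplified stand-in for EPM's phase enum: only the phases listed in the PR
/// description are modelled (hypothetical; the real enum has more variants).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Phase {
    Off,
    Signed,
    Unsigned,
}

/// Returns `true` if a failed election should be throttled, i.e. should NOT try
/// the fallback or enter the emergency phase. Hypothetical signature: the two
/// `u32` arguments stand in for T::MinElectingBlocks and ElectionBlocksCount::get().
fn emergency_phase_throttling(phase: Phase, min_electing_blocks: u32, election_blocks_count: u32) -> bool {
    // A threshold of 0 disables throttling: every failure escalates immediately.
    if min_electing_blocks == 0 {
        return false;
    }
    match phase {
        // `elect` was called while EPM was off: always throttle.
        Phase::Off => true,
        // Throttle iff fewer electing blocks than the threshold passed this round.
        Phase::Signed | Phase::Unsigned => min_electing_blocks > election_blocks_count,
    }
}

fn main() {
    for (phase, min, count) in [
        (Phase::Off, 0, 0),
        (Phase::Off, 10, 100),
        (Phase::Signed, 10, 5),
        (Phase::Unsigned, 10, 10),
    ] {
        let throttled = emergency_phase_throttling(phase, min, count);
        println!("{:?} min={} count={} -> throttled={}", phase, min, count, throttled);
    }
}
```

Keeping the decision a pure function of the phase, the threshold and the counter makes each rule easy to check in isolation, which fits the description's point that the change stays local to EPM.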
polkadot companion: paritytech/polkadot#6825