Add a cancellation state to the state machine and remove syncTraceActive_#1355
Open
ryanzhang22 wants to merge 3 commits intopytorch:mainfrom
Open
Add a cancellation state to the state machine and remove syncTraceActive_#1355ryanzhang22 wants to merge 3 commits intopytorch:mainfrom
ryanzhang22 wants to merge 3 commits intopytorch:mainfrom
Conversation
…h#1269) Summary: Part 1 of a refactor to improve safety and readability around sync/async profiling. This is a straightforward (no-op) split that creates Sync/AsyncActivityProfilerHandler, which contains the methods which belonged to the ActivityProfilerController but are only used for one type of tracing. The original methods in the controller become thin wrappers that forward the request to the appropriate handler. Differential Revision: D92550422
Summary: Part 2 of a refactor to improve safety and readability around sync/async profiling. Sync profiling currently only uses the profiler state machine to check whether `isActive()` is true. However that doesn't tell us if sync or async is the one that actually owns the profiler at that point, which can lead to edge cases when we make decisions about whether or not to proceed based on the profiler state. See the example below. Since the state machine behavior is only relevant for async tracing, we move the related functions from the `GenericActivityProfiler` into the Async handler. The `GenericActivityProfiler` just becomes a set of functions used to drive profiling, without needing to worry about transitions. Each handler owns their own `isActive()` state, and `ActivityProfilerController` makes decisions about how to handle scheduling between sync and async. In particular, we add a `cancel()` function both handlers so the controller can make sure both handlers are in a good state before preempting. For the async trace, `cancel()` will also clear the async config and move the state back to `WaitForRequest`. This change makes ownership explicit, which should hopefully make reasoning about preemptions from the controller perspective easier. Differential Revision: D93475197
…ive_ Summary: Previously, we added `syncTraceActive_` so that async doesn't step into its state machine loop while sync tracing is active, as both sync and async will then simultaneously drive the profiler and cause undefined behavior. In the previous PRs, we added `isAsyncActive()` to check if async owns the profiler. We still needed `syncTraceActive_` at that point because `isAsyncActive()` is reset back to `False` only at the end of `cancel()`, so we can theoretically step into the loop in the middle of cancellation. Consider the following scenario: 1. We get a preemption request, and the main async thread starts running `cancel()`. 2. The profiler thread checks `isAsyncActive()` to decide whether to step into `performRunLoopStep`. 3. If `cancel()` is midway through running, `isAsyncActive()` still returns true -- so we step into the loop, potentially causing bad behavior. We introduce a `Cancelling` state that we set at the start of `cancel()`. If we do end up stepping into the loop, we see this state and immediately break. Differential Revision: D100019410
|
@ryanzhang22 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100019410. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary:
Previously, we added
syncTraceActive_so that async doesn't step into its state machine loop while sync tracing is active, as both sync and async will then simultaneously drive the profiler and cause undefined behavior.In the previous PRs, we added
isAsyncActive()to check if async owns the profiler. We still neededsyncTraceActive_at that point becauseisAsyncActive()is reset back toFalseonly at the end ofcancel(), so we can theoretically step into the loop in the middle of cancellation. Consider the following scenario:cancel().isAsyncActive()to decide whether to step intoperformRunLoopStep.cancel()is midway through running,isAsyncActive()still returns true -- so we step into the loop, potentially causing bad behavior.We introduce a
Cancellingstate that we set at the start ofcancel(). If we do end up stepping into the loop, we see this state and immediately break.Differential Revision: D100019410