[VPlan] Unify inner and outer loop paths (NFCI).#192868
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms
Author: Florian Hahn (fhahn)
Changes
Combine the logic of tryToBuildVPlanWithVPRecipes and tryToBuildVPlan, as well as planInVPlanNativePath and plan. This unifies the code paths that construct plans for inner and outer loop vectorization and removes some duplication. It also ensures we run almost the same VPlan transformations in both modes. Currently, a few code paths still need to be guarded with a check for whether we are dealing with an inner or an outer loop.
Patch is 29.93 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/192868.diff
4 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index 56a2fc8ecd07a..58b642b54a2ec 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -562,10 +562,6 @@ class LoopVectorizationPlanner {
/// interleaving should be avoided up-front, no plans are generated.
void plan(ElementCount UserVF, unsigned UserIC);
- /// Use the VPlan-native path to plan how to best vectorize, return the best
- /// VF and its cost.
- VectorizationFactor planInVPlanNativePath(ElementCount UserVF);
-
/// Return the VPlan for \p VF. At the moment, there is always a single VPlan
/// for each VF.
VPlan &getPlanFor(ElementCount VF) const;
@@ -654,34 +650,22 @@ class LoopVectorizationPlanner {
unsigned OrigLoopInvocationWeight, unsigned EstimatedVFxUF,
bool DisableRuntimeUnroll);
-protected:
- /// Build VPlans for power-of-2 VF's between \p MinVF and \p MaxVF inclusive,
- /// according to the information gathered by Legal when it checked if it is
- /// legal to vectorize the loop.
- void buildVPlans(ElementCount MinVF, ElementCount MaxVF);
-
private:
- /// Build a VPlan according to the information gathered by Legal. \return a
- /// VPlan for vectorization factors \p Range.Start and up to \p Range.End
- /// exclusive, possibly decreasing \p Range.End. If no VPlan can be built for
- /// the input range, set the largest included VF to the maximum VF for which
- /// no plan could be built.
- VPlanPtr tryToBuildVPlan(VFRange &Range);
-
- /// Build a VPlan using VPRecipes according to the information gather by
- /// Legal. This method is only used for the legacy inner loop vectorizer.
- /// \p Range's largest included VF is restricted to the maximum VF the
- /// returned VPlan is valid for. If no VPlan can be built for the input range,
- /// set the largest included VF to the maximum VF for which no plan could be
- /// built. Each VPlan is built starting from a copy of \p InitialPlan, which
- /// is a plain CFG VPlan wrapping the original scalar loop.
+ /// Build a VPlan using VPRecipes according to the information gathered by
+ /// Legal and VPlan-based analysis. For outer loops, performs basic recipe
+ /// conversion only. For inner loops, \p Range's largest included VF is
+ /// restricted to the maximum VF the returned VPlan is valid for. If no VPlan
+ /// can be built for the input range, set the largest included VF to the
+ /// maximum VF for which no plan could be built. Each VPlan is built starting
+ /// from a copy of \p InitialPlan, which is a plain CFG VPlan wrapping the
+ /// original scalar loop.
VPlanPtr tryToBuildVPlanWithVPRecipes(VPlanPtr InitialPlan, VFRange &Range,
LoopVersioning *LVer);
/// Build VPlans for power-of-2 VF's between \p MinVF and \p MaxVF inclusive,
/// according to the information gathered by Legal when it checked if it is
- /// legal to vectorize the loop. This method creates VPlans using VPRecipes.
- void buildVPlansWithVPRecipes(ElementCount MinVF, ElementCount MaxVF);
+ /// legal to vectorize the loop.
+ void buildVPlans(ElementCount MinVF, ElementCount MaxVF);
/// Add ComputeReductionResult recipes to the middle block to compute the
/// final reduction results. Add Select recipes to the latch block when
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index e17a5b5434664..e13acdfa6bc9d 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -3446,8 +3446,61 @@ FixedScalableVFPair LoopVectorizationCostModel::computeFeasibleMaxVF(
return Result;
}
+// This function will select a scalable VF if the target supports scalable
+// vectors and a fixed one otherwise.
+// TODO: we could return a pair of values that specify the max VF and
+// min VF, to be used in `buildVPlans(MinVF, MaxVF)` instead of
+// `buildVPlans(VF, VF)`. We cannot do it because VPLAN at the moment
+// doesn't have a cost model that can choose which plan to execute if
+// more than one is generated.
+static ElementCount determineVPlanVF(const TargetTransformInfo &TTI,
+ LoopVectorizationCostModel &CM) {
+ auto [_, WidestType] = CM.getSmallestAndWidestTypes();
+
+ auto RegKind = TTI.enableScalableVectorization()
+ ? TargetTransformInfo::RGK_ScalableVector
+ : TargetTransformInfo::RGK_FixedWidthVector;
+
+ TypeSize RegSize = TTI.getRegisterBitWidth(RegKind);
+ unsigned N = RegSize.getKnownMinValue() / WidestType;
+ return ElementCount::get(N, RegSize.isScalable());
+}
+
FixedScalableVFPair
LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
+ // For outer loops, use simple type-based heuristic VF. No cost model or
+ // memory dependence analysis is available.
+ if (!TheLoop->isInnermost()) {
+ ElementCount VF = UserVF;
+ if (VF.isZero()) {
+ VF = determineVPlanVF(TTI, *this);
+ LLVM_DEBUG(dbgs() << "LV: VPlan computed VF " << VF << ".\n");
+
+ // Make sure we have a VF > 1 for stress testing.
+ if (VPlanBuildStressTest && VF.isScalar()) {
+ LLVM_DEBUG(dbgs() << "LV: VPlan stress testing: "
+ << "overriding computed VF.\n");
+ VF = ElementCount::getFixed(4);
+ }
+ } else if (VF.isScalable() && !TTI.supportsScalableVectors() &&
+ !ForceTargetSupportsScalableVectors) {
+ reportVectorizationFailure(
+ "Scalable vectorization requested but not supported by the target",
+ "the scalable user-specified vectorization width for outer-loop "
+ "vectorization cannot be used because the target does not support "
+ "scalable vectors.",
+ "ScalableVFUnfeasible", ORE, TheLoop);
+ return FixedScalableVFPair::getNone();
+ }
+ assert(isPowerOf2_32(VF.getKnownMinValue()) &&
+ "VF needs to be a power of two");
+ if (VF.isScalar())
+ return FixedScalableVFPair::getNone();
+ LLVM_DEBUG(dbgs() << "LV: Using " << (!UserVF.isZero() ? "user " : "")
+ << "VF " << VF << " to build VPlans.\n");
+ return FixedScalableVFPair(VF);
+ }
+
if (Legal->getRuntimePointerChecking()->Need && TTI.hasBranchDivergence()) {
// TODO: It may be useful to do since it's still likely to be dynamically
// uniform if the target can skip.
@@ -6532,85 +6585,7 @@ void LoopVectorizationCostModel::collectInLoopReductions() {
}
}
-// This function will select a scalable VF if the target supports scalable
-// vectors and a fixed one otherwise.
-// TODO: we could return a pair of values that specify the max VF and
-// min VF, to be used in `buildVPlans(MinVF, MaxVF)` instead of
-// `buildVPlans(VF, VF)`. We cannot do it because VPLAN at the moment
-// doesn't have a cost model that can choose which plan to execute if
-// more than one is generated.
-static ElementCount determineVPlanVF(const TargetTransformInfo &TTI,
- LoopVectorizationCostModel &CM) {
- unsigned WidestType;
- std::tie(std::ignore, WidestType) = CM.getSmallestAndWidestTypes();
-
- TargetTransformInfo::RegisterKind RegKind =
- TTI.enableScalableVectorization()
- ? TargetTransformInfo::RGK_ScalableVector
- : TargetTransformInfo::RGK_FixedWidthVector;
-
- TypeSize RegSize = TTI.getRegisterBitWidth(RegKind);
- unsigned N = RegSize.getKnownMinValue() / WidestType;
- return ElementCount::get(N, RegSize.isScalable());
-}
-
-VectorizationFactor
-LoopVectorizationPlanner::planInVPlanNativePath(ElementCount UserVF) {
- ElementCount VF = UserVF;
- // Outer loop handling: They may require CFG and instruction level
- // transformations before even evaluating whether vectorization is profitable.
- // Since we cannot modify the incoming IR, we need to build VPlan upfront in
- // the vectorization pipeline.
- if (!OrigLoop->isInnermost()) {
- // If the user doesn't provide a vectorization factor, determine a
- // reasonable one.
- if (UserVF.isZero()) {
- VF = determineVPlanVF(TTI, CM);
- LLVM_DEBUG(dbgs() << "LV: VPlan computed VF " << VF << ".\n");
-
- // Make sure we have a VF > 1 for stress testing.
- if (VPlanBuildStressTest && (VF.isScalar() || VF.isZero())) {
- LLVM_DEBUG(dbgs() << "LV: VPlan stress testing: "
- << "overriding computed VF.\n");
- VF = ElementCount::getFixed(4);
- }
- } else if (UserVF.isScalable() && !TTI.supportsScalableVectors() &&
- !ForceTargetSupportsScalableVectors) {
- LLVM_DEBUG(dbgs() << "LV: Not vectorizing. Scalable VF requested, but "
- << "not supported by the target.\n");
- reportVectorizationFailure(
- "Scalable vectorization requested but not supported by the target",
- "the scalable user-specified vectorization width for outer-loop "
- "vectorization cannot be used because the target does not support "
- "scalable vectors.",
- "ScalableVFUnfeasible", ORE, OrigLoop);
- return VectorizationFactor::Disabled();
- }
- assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");
- assert(isPowerOf2_32(VF.getKnownMinValue()) &&
- "VF needs to be a power of two");
- LLVM_DEBUG(dbgs() << "LV: Using " << (!UserVF.isZero() ? "user " : "")
- << "VF " << VF << " to build VPlans.\n");
- buildVPlans(VF, VF);
-
- if (VPlans.empty())
- return VectorizationFactor::Disabled();
-
- // For VPlan build stress testing, we bail out after VPlan construction.
- if (VPlanBuildStressTest)
- return VectorizationFactor::Disabled();
-
- return {VF, 0 /*Cost*/, 0 /* ScalarCost */};
- }
-
- LLVM_DEBUG(
- dbgs() << "LV: Not vectorizing. Inner loops aren't supported in the "
- "VPlan-native path.\n");
- return VectorizationFactor::Disabled();
-}
-
void LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {
- assert(OrigLoop->isInnermost() && "Inner loop expected.");
CM.collectValuesToIgnore();
CM.collectElementTypesForWidening();
@@ -6618,6 +6593,16 @@ void LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {
if (!MaxFactors) // Cases that should not to be vectorized nor interleaved.
return;
+ if (!OrigLoop->isInnermost()) {
+ // For outer loops, computeMaxVF returns a single non-scalar VF; build a
+ // plan for only that VF.
+ ElementCount VF =
+ MaxFactors.FixedVF ? MaxFactors.FixedVF : MaxFactors.ScalableVF;
+ buildVPlans(VF, VF);
+ LLVM_DEBUG(printPlans(dbgs()));
+ return;
+ }
+
// Invalidate interleave groups if all blocks of loop will be predicated.
if (CM.blockNeedsPredicationForAnyReason(OrigLoop->getHeader()) &&
!useMaskedInterleavedAccesses(TTI)) {
@@ -6656,9 +6641,9 @@ void LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {
ElementCount::isKnownLT(EpilogueUserVF, UserVF) &&
CM.selectUserVectorizationFactor(EpilogueUserVF)) {
// Build a separate plan for the forced epilogue VF.
- buildVPlansWithVPRecipes(EpilogueUserVF, EpilogueUserVF);
+ buildVPlans(EpilogueUserVF, EpilogueUserVF);
}
- buildVPlansWithVPRecipes(UserVF, UserVF);
+ buildVPlans(UserVF, UserVF);
LLVM_DEBUG(printPlans(dbgs()));
return;
}
@@ -6677,13 +6662,11 @@ void LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {
VFCandidates.push_back(VF);
CM.collectInLoopReductions();
- for (const auto &VF : VFCandidates) {
- // Collect Uniform and Scalar instructions after vectorization with VF.
+ for (auto VF : VFCandidates)
CM.collectNonVectorizedAndSetWideningDecisions(VF);
- }
- buildVPlansWithVPRecipes(ElementCount::getFixed(1), MaxFactors.FixedVF);
- buildVPlansWithVPRecipes(ElementCount::getScalable(1), MaxFactors.ScalableVF);
+ buildVPlans(ElementCount::getFixed(1), MaxFactors.FixedVF);
+ buildVPlans(ElementCount::getScalable(1), MaxFactors.ScalableVF);
LLVM_DEBUG(printPlans(dbgs()));
}
@@ -6917,6 +6900,12 @@ LoopVectorizationPlanner::computeBestVF() {
}
}
+ // For outer loops, the plan has a single vector VF determined by the
+ // heuristic. Return it directly since there is no scalar VF plan for cost
+ // comparison.
+ if (!OrigLoop->isInnermost())
+ return {VectorizationFactor(FirstPlan.getSingleVF(), 0, 0), &FirstPlan};
+
LLVM_DEBUG(dbgs() << "LV: Computing best VF using cost kind: "
<< (CM.CostKind == TTI::TCK_RecipThroughput
? "Reciprocal Throughput\n"
@@ -7657,34 +7646,42 @@ VPRecipeBuilder::tryToCreateWidenNonPhiRecipe(VPSingleDefRecipe *R,
// optimizations.
static void printOptimizedVPlan(VPlan &) {}
-void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
- ElementCount MaxVF) {
+void LoopVectorizationPlanner::buildVPlans(ElementCount MinVF,
+ ElementCount MaxVF) {
if (ElementCount::isKnownGT(MinVF, MaxVF))
return;
- assert(OrigLoop->isInnermost() && "Inner loop expected.");
-
- const LoopAccessInfo *LAI = Legal->getLAI();
- LoopVersioning LVer(*LAI, LAI->getRuntimePointerChecking()->getChecks(),
- OrigLoop, LI, DT, PSE.getSE());
- if (!LAI->getRuntimePointerChecking()->getChecks().empty() &&
- !LAI->getRuntimePointerChecking()->getDiffChecks()) {
- // Only use noalias metadata when using memory checks guaranteeing no
- // overlap across all iterations.
- LVer.prepareNoAliasMetadata();
+ bool IsInnerLoop = OrigLoop->isInnermost();
+
+ // Set up loop versioning for inner loops with memory runtime checks.
+ // Outer loops don't have LoopAccessInfo since canVectorizeMemory() is not
+ // called for them.
+ std::optional<LoopVersioning> LVer;
+ if (IsInnerLoop) {
+ const LoopAccessInfo *LAI = Legal->getLAI();
+ LVer.emplace(*LAI, LAI->getRuntimePointerChecking()->getChecks(), OrigLoop,
+ LI, DT, PSE.getSE());
+ if (!LAI->getRuntimePointerChecking()->getChecks().empty() &&
+ !LAI->getRuntimePointerChecking()->getDiffChecks()) {
+ // Only use noalias metadata when using memory checks guaranteeing no
+ // overlap across all iterations.
+ LVer->prepareNoAliasMetadata();
+ }
}
// Create initial base VPlan0, to serve as common starting point for all
// candidates built later for specific VF ranges.
auto VPlan0 = VPlanTransforms::buildVPlan0(
OrigLoop, *LI, Legal->getWidestInductionType(),
- getDebugLocFromInstOrOperands(Legal->getPrimaryInduction()), PSE, &LVer);
+ getDebugLocFromInstOrOperands(Legal->getPrimaryInduction()), PSE,
+ LVer ? &*LVer : nullptr);
- // Create recipes for header phis.
- RUN_VPLAN_PASS(VPlanTransforms::createHeaderPhiRecipes, *VPlan0, PSE,
- *OrigLoop, Legal->getInductionVars(),
- Legal->getReductionVars(), Legal->getFixedOrderRecurrences(),
- CM.getInLoopReductions(), Hints.allowReordering());
+ // Create recipes for header phis. For outer loops, reductions, recurrences
+ // and in-loop reductions are empty since legality doesn't detect them.
+ RUN_VPLAN_PASS(VPlanTransforms::createHeaderPhiRecipes,
+ *VPlan0, PSE, *OrigLoop, Legal->getInductionVars(),
+ Legal->getReductionVars(), Legal->getFixedOrderRecurrences(),
+ CM.getInLoopReductions(), Hints.allowReordering());
RUN_VPLAN_PASS(VPlanTransforms::simplifyRecipes, *VPlan0);
// If we're vectorizing a loop with an uncountable exit, make sure that the
@@ -7707,40 +7704,59 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
RUN_VPLAN_PASS(VPlanTransforms::createLoopRegions, *VPlan0);
if (CM.foldTailByMasking())
RUN_VPLAN_PASS(VPlanTransforms::foldTailByMasking, *VPlan0);
- RUN_VPLAN_PASS(VPlanTransforms::introduceMasksAndLinearize, *VPlan0);
+ // introduceMasksAndLinearize does not support nested loop regions yet.
+ if (IsInnerLoop)
+ RUN_VPLAN_PASS(VPlanTransforms::introduceMasksAndLinearize,
+ *VPlan0);
auto MaxVFTimes2 = MaxVF * 2;
for (ElementCount VF = MinVF; ElementCount::isKnownLT(VF, MaxVFTimes2);) {
VFRange SubRange = {VF, MaxVFTimes2};
- if (auto Plan = tryToBuildVPlanWithVPRecipes(
- std::unique_ptr<VPlan>(VPlan0->duplicate()), SubRange, &LVer)) {
- // Now optimize the initial VPlan.
- VPlanTransforms::hoistPredicatedLoads(*Plan, PSE, OrigLoop);
- VPlanTransforms::sinkPredicatedStores(*Plan, PSE, OrigLoop);
- RUN_VPLAN_PASS(VPlanTransforms::truncateToMinimalBitwidths, *Plan,
- CM.getMinimalBitwidths());
- RUN_VPLAN_PASS(VPlanTransforms::optimize, *Plan);
- // TODO: try to put addExplicitVectorLength close to addActiveLaneMask
- if (CM.foldTailWithEVL()) {
- RUN_VPLAN_PASS(VPlanTransforms::addExplicitVectorLength, *Plan,
- CM.getMaxSafeElements());
- RUN_VPLAN_PASS(VPlanTransforms::optimizeEVLMasks, *Plan);
- }
+ auto Plan = tryToBuildVPlanWithVPRecipes(
+ std::unique_ptr<VPlan>(VPlan0->duplicate()), SubRange,
+ LVer ? &*LVer : nullptr);
+ VF = SubRange.End;
- if (auto P = VPlanTransforms::narrowInterleaveGroups(*Plan, TTI))
- VPlans.push_back(std::move(P));
+ if (!Plan)
+ continue;
- RUN_VPLAN_PASS_NO_VERIFY(printOptimizedVPlan, *Plan);
- assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
- VPlans.push_back(std::move(Plan));
+ VPlanTransforms::hoistPredicatedLoads(*Plan, PSE, OrigLoop);
+ VPlanTransforms::sinkPredicatedStores(*Plan, PSE, OrigLoop);
+ RUN_VPLAN_PASS(VPlanTransforms::truncateToMinimalBitwidths, *Plan,
+ CM.getMinimalBitwidths());
+ RUN_VPLAN_PASS(VPlanTransforms::optimize, *Plan);
+ // TODO: try to put addExplicitVectorLength close to addActiveLaneMask
+ if (CM.foldTailWithEVL()) {
+ RUN_VPLAN_PASS(VPlanTransforms::addExplicitVectorLength, *Plan,
+ CM.getMaxSafeElements());
+ RUN_VPLAN_PASS(VPlanTransforms::optimizeEVLMasks, *Plan);
}
- VF = SubRange.End;
+
+ if (auto P = VPlanTransforms::narrowInterleaveGroups(*Plan, TTI))
+ VPlans.push_back(std::move(P));
+
+ RUN_VPLAN_PASS_NO_VERIFY(printOptimizedVPlan, *Plan);
+ assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
+ VPlans.push_back(std::move(Plan));
}
}
VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
VPlanPtr Plan, VFRange &Range, LoopVersioning *LVer) {
+ // For outer loops, the plan only needs basic recipe conversion and induction
+ // live-out optimization; the full inner-loop recipe building below does not
+ // apply (no widening decisions, interleave groups, reductions, etc.).
+ if (!OrigLoop->isInnermost()) {
+ for (ElementCount VF : Range)
+ Plan->addVF(VF);
+ if (!VPlanTransforms::tryToConvertVPInstructionsToVPRecipes(*Plan, *TLI))
+ return nullptr;
+ VPlanTransforms::optimizeInductionLiveOutUsers(*Plan, PSE,
+ /*FoldTail=*/false);
+ return Plan;
+ }
+
using namespace llvm::VPlanPatternMatch;
SmallPtrSet<const InterleaveGroup<Instruction> *, 1> InterleaveGroups;
@@ -7973,46 +7989,6 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
return Plan;
}
-VPlanPtr LoopVectorizationPlanner::tryToBuildVPlan(VFRange &Range) {
- // Outer loop handling: They may require CFG and instruction level
- // transformations before even evaluating whether vectorization is profitable.
- // Since we cannot modify the incoming IR, we need to build VPlan upfront in
- // the vectorization pipeline.
- assert(!OrigLoop->isInnermost());
- assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");
-
- auto Plan = VPlanTransforms::buildVPlan0(
- OrigLoop, *LI, Legal->getWidestInductionType(),
- getDebugLocFromInstOrOperands(Legal->getPrimaryInduction()), PSE);
-
- VPlanTransform...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
81b93f5 to
bba732e
Compare
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed. |
…NFC). (#193979) Reduce nesting by using early continue, split off from llvm/llvm-project#192868
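The early-continue refactoring that was split off can be illustrated with a small, self-contained sketch. All names here (`tryBuild`, `collectNested`, `collectEarlyContinue`) are hypothetical and unrelated to the LLVM sources; the sketch only shows that both loop shapes produce the same result while the second keeps the loop body at one nesting level.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a plan-building step that may fail:
// returns nullptr for odd inputs, a heap-allocated value otherwise.
static std::unique_ptr<int> tryBuild(int X) {
  if (X % 2 != 0)
    return nullptr;
  return std::make_unique<int>(X * 10);
}

// Nested form: all work happens inside the `if`.
static std::vector<int> collectNested(const std::vector<int> &In) {
  std::vector<int> Out;
  for (int X : In) {
    if (auto P = tryBuild(X)) {
      Out.push_back(*P);
    }
  }
  return Out;
}

// Early-continue form: bail out first, then do the work unindented.
static std::vector<int> collectEarlyContinue(const std::vector<int> &In) {
  std::vector<int> Out;
  for (int X : In) {
    auto P = tryBuild(X);
    if (!P)
      continue;
    Out.push_back(*P);
  }
  return Out;
}
```

One detail worth noting from the diff above: in a loop that advances its own induction variable, the update has to happen before the `continue`, which is why `VF = SubRange.End;` moved ahead of the `if (!Plan)` check.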
artagnon
left a comment
Thanks, the patch is much clearer now, and it LGTM! Excellent improvement!
// For outer loops, the plan only needs basic recipe conversion and induction
// live-out optimization; the full inner-loop recipe building below does not
// apply (no widening decisions, interleave groups, reductions, etc.).
if (!OrigLoop->isInnermost()) {
Once Plan is built, could it inform whether it models an outer or innermost loop, rather than continuing to rely on OrigLoop?
Done by adding hasOuterLoop() checking if the top-level loop region contains another loop region
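A minimal sketch of what such a query could look like, assuming a simplified region tree. The `Region` struct and the `containsLoopRegion` helper are hypothetical and only mirror the idea described in the reply; the actual `hasOuterLoop()` operates on VPlan's own region classes and may well differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified region tree; not the VPlan class hierarchy.
struct Region {
  bool IsLoop = false;
  std::vector<Region> Children;
};

// Returns true if any region nested (at any depth) inside R is a loop region.
static bool containsLoopRegion(const Region &R) {
  for (const Region &C : R.Children)
    if (C.IsLoop || containsLoopRegion(C))
      return true;
  return false;
}

// A plan "has an outer loop" when its top-level loop region contains
// another loop region.
static bool hasOuterLoop(const Region &TopLevelLoop) {
  return TopLevelLoop.IsLoop && containsLoopRegion(TopLevelLoop);
}
```

Asking the plan rather than `OrigLoop` keeps the check valid even after transformations have changed the plan's region structure.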
// TODO: we could return a pair of values that specify the max VF and
// min VF, to be used in `buildVPlans(MinVF, MaxVF)` instead of
// `buildVPlans(VF, VF)`. We cannot do it because VPLAN at the moment
// doesn't have a cost model that can choose which plan to execute if
// more than one is generated.
This TODO retains the existing one, but going forward outerloops should compute their Max/VF using the methods doing so for innermost loops. Allowing only VF-agnostic decisions in outerloops should result in a single VPlan for the entire feasible range, from which the desired VF can be selected, even w/o / before cost model support.
// `buildVPlans(VF, VF)`. We cannot do it because VPLAN at the moment
// doesn't have a cost model that can choose which plan to execute if
// more than one is generated.
static ElementCount determineVPlanVF(const TargetTransformInfo &TTI,
Suggested change:
-static ElementCount determineVPlanVF(const TargetTransformInfo &TTI,
+static ElementCount computeVPlanOuterloopVF(const TargetTransformInfo &TTI,
this retains the existing version, but worth renaming more accurately, and consistent with other compute*VF()'s.
Could this type-based-only factor serve the innermost Max VF computations too?
Updated the name. Will check if we can re-use this for inner loop as well, but the logic there is quite a bit more involved
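For reference, the type-based factor under discussion boils down to one division: how many elements of the widest type fit in a vector register. A rough sketch with plain integers follows; `SimpleVF` and `typeBasedVF` are hypothetical names, and the real code takes the register width from `TTI.getRegisterBitWidth` and the widest type from the cost model, as the diff above shows.

```cpp
#include <cassert>

// Hypothetical stand-in for an element count: a minimum lane count plus a
// flag saying whether it is multiplied by a runtime vscale.
struct SimpleVF {
  unsigned MinLanes;
  bool Scalable;
};

// Type-based heuristic: how many elements of the widest type fit into one
// vector register. RegBits is the (minimum) register width in bits.
// Assumes WidestTypeBits is non-zero.
static SimpleVF typeBasedVF(unsigned RegBits, bool RegIsScalable,
                            unsigned WidestTypeBits) {
  return {RegBits / WidestTypeBits, RegIsScalable};
}
```

With a 128-bit fixed-width register and a widest type of 32 bits this gives 4 lanes. When the widest type fills the whole register the result is a single lane, that is, a scalar VF, which the outer-loop path in the diff treats as "do not vectorize".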
ElementCount VF = UserVF;
if (VF.isZero()) {
  VF = determineVPlanVF(TTI, Config);
  LLVM_DEBUG(dbgs() << "LV: VPlan computed VF " << VF << ".\n");

  // Make sure we have a VF > 1 for stress testing.
  if (VPlanBuildStressTest && VF.isScalar()) {
    LLVM_DEBUG(dbgs() << "LV: VPlan stress testing: "
                      << "overriding computed VF.\n");
    VF = ElementCount::getFixed(4);
  }
} else if (VF.isScalable() && !Config.supportsScalableVectors()) {
  reportVectorizationFailure(
      "Scalable vectorization requested but not supported by the target",
      "the scalable user-specified vectorization width for outer-loop "
      "vectorization cannot be used because the target does not support "
      "scalable vectors.",
      "ScalableVFUnfeasible", ORE, TheLoop);
  return FixedScalableVFPair::getNone();
}
assert(isPowerOf2_32(VF.getKnownMinValue()) &&
       "VF needs to be a power of two");
if (VF.isScalar())
  return FixedScalableVFPair::getNone();
LLVM_DEBUG(dbgs() << "LV: Using " << (!UserVF.isZero() ? "user " : "")
                  << "VF " << VF << " to build VPlans.\n");
return FixedScalableVFPair(VF);
This retains the current code, but worth considering folding it all under computeVPlanOuterloopVF().
Done, also moved to VFSelectionContext, thanks
// For outer loops, computeMaxVF returns a single non-scalar VF; build a
// plan for only that VF.
Could the current outerloop behavior be consistent with the UserVF/UserIC-dictated innermost behavior, with both resulting in a single plan? E.g., should computeMaxVF() also indicate whether it returned a single MaxVF (scalable or not) that is also the MinVF?
I'll leave this separate in the initial patch, as there's more logic we need to run when selecting the user VF for inner loops
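To make the single-plan case concrete: the header comment on buildVPlans says plans are built for power-of-2 VFs between MinVF and MaxVF inclusive, so a call with MinVF equal to MaxVF covers exactly one factor. A small sketch with plain `unsigned` values follows; `candidateVFs` is a hypothetical helper that ignores scalable VFs and the VFRange clamping done by the real code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: list the power-of-two factors in [MinVF, MaxVF].
// Assumes MinVF is a non-zero power of two and MaxVF is small enough that
// doubling never overflows.
static std::vector<unsigned> candidateVFs(unsigned MinVF, unsigned MaxVF) {
  std::vector<unsigned> VFs;
  for (unsigned VF = MinVF; VF <= MaxVF; VF *= 2)
    VFs.push_back(VF);
  return VFs;
}
```

Both the outer-loop path and the forced-UserVF inner-loop path in the diff call `buildVPlans(VF, VF)`, which corresponds to the one-element case here.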
@@ -6153,6 +6139,15 @@ LoopVectorizationPlanner::computeBestVF() {
return {VectorizationFactor::Disabled(), nullptr};
// If there is a single VPlan with a single VF, return it directly.
May be good to return to the above behavior: if no plans exist, return null; if only one - assert either UsedVF or outerloop - return it. (If two - check if UserVF and EpilogUserVF, etc.).
// For outer loops, the plan has a single vector VF determined by the
// heuristic.
if (!OrigLoop->isInnermost()) {
Better ask FirstPlan if it models an outerloop, than OrigLoop.
if (!OrigLoop->isInnermost()) {
  for (ElementCount VF : Range)
    Plan->addVF(VF);
  if (!VPlanTransforms::tryToConvertVPInstructionsToVPRecipes(*Plan, *TLI))
Plan will be discarded later, automatically?
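On the ownership question, a sketch under the assumption that `VPlanPtr` is a `std::unique_ptr<VPlan>` (as the `std::unique_ptr<VPlan>(VPlan0->duplicate())` argument in the diff suggests): a function that takes the pointer by value owns the plan, so returning `nullptr` on the failure path lets the parameter's destructor free it without any explicit cleanup. `FakePlan`, `FakePlanPtr` and `tryConvert` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical plan type that counts live instances, for illustration only.
struct FakePlan {
  static int Live;
  FakePlan() { ++Live; }
  ~FakePlan() { --Live; }
};
int FakePlan::Live = 0;

using FakePlanPtr = std::unique_ptr<FakePlan>;

// Takes ownership of Plan. On the failure path it returns nullptr, and the
// by-value parameter is destroyed once the call completes, freeing the plan.
static FakePlanPtr tryConvert(FakePlanPtr Plan, bool ConversionSucceeds) {
  if (!ConversionSucceeds)
    return nullptr;
  return Plan; // Ownership moves to the caller.
}
```

Whether the real code relies on exactly this is for the author to confirm; the sketch only shows that the by-value `unique_ptr` pattern makes the discard automatic.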
// Analyze interleaved memory accesses.
-if (UseInterleaved)
+if (UseInterleaved && IsInnerLoop)
Should IsInnerLoop be folded into UseInterleaved?
if (ORE->allowExtraAnalysis(LV_NAME))
// For VPlan build stress testing of outer loops, bail after plan
// construction.
if (!IsInnerLoop && VPlanBuildStressTest)
Should !IsInnerLoop be folded into VPlanBuildStressTest, which would then be better named VPlanBuildOuterloopStressTest?
Unfortunately, VPlanBuildStressTest is an option, so I don't think there's a good way to set this in a single place.
efc0a5c to
47e9f9b
Compare
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/187/builds/19953 Here is the relevant piece of the build log for the reference |
…plans. (#196634) For phis check if any of the operands are VPIRValues or we already have cached types. If so, return them. This fixes a verification stack overflow in the VPlan outer loop path after llvm/llvm-project#192868.