MLIR 23.0.0git
Mem2Reg.cpp
//===- Mem2Reg.cpp - Promotes memory slots into values ----------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "mlir/Transforms/Mem2Reg.h"

#include "mlir/Analysis/SliceAnalysis.h"
#include "mlir/IR/Builders.h"
#include "mlir/IR/Dominance.h"
#include "mlir/IR/RegionKindInterface.h"
#include "mlir/IR/Value.h"
#include "mlir/Interfaces/ControlFlowInterfaces.h"
#include "mlir/Interfaces/MemorySlotInterfaces.h"
#include "mlir/Transforms/Passes.h"
#include "mlir/Transforms/RegionUtils.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/Support/DebugLog.h"
#include "llvm/Support/GenericIteratedDominanceFrontier.h"

namespace mlir {
#define GEN_PASS_DEF_MEM2REG
#include "mlir/Transforms/Passes.h.inc"
} // namespace mlir

#define DEBUG_TYPE "mem2reg"

using namespace mlir;
/// mem2reg
///
/// This pass turns unnecessary uses of automatically allocated memory slots
/// into direct Value-based operations. For example, it will simplify a pattern
/// of storing a constant into a memory slot and immediately loading it back
/// into a direct use of that constant. In other words, given a memory slot
/// addressed by a non-aliased "pointer" Value, mem2reg removes all the uses of
/// that pointer.
///
/// Within a block, this is done by following the chain of stores and loads of
/// the slot and replacing the results of loads with the values previously
/// stored. If a load happens before any store, a poison value is used
/// instead.
///
/// Control flow can create situations where a load could be replaced by
/// multiple possible stores depending on the control flow path taken. As a
/// result, this pass must introduce new block arguments in some blocks to
/// accommodate the multiple possible definitions. Each predecessor will
/// populate the block argument with the definition reached at its end. With
/// this, the value stored can be well defined at block boundaries, allowing
/// the propagation of replacement through blocks.
///
/// Regions are handled in the transformation by offering an interface to
/// express the behavior of the allocation value at the edges of the regions:
/// from a particular definition reaching the region operation, the operation
/// specifies what the reaching definitions at the entry of its regions are
/// (potentially mutating itself, for example to add region arguments).
/// Likewise, provided a reaching definition at the end of the blocks in the
/// regions, the region operation provides the reaching definition right after
/// itself.
///
/// This pass computes this transformation in two main phases: an analysis
/// phase that does not mutate IR, and a transformation phase where mutation
/// happens. The phases are handled by the `MemorySlotPromotionAnalyzer` and
/// `MemorySlotPromoter` classes, respectively.
///
/// The two steps of the analysis phase are the following:
/// - A first step computes the list of operations that transitively use the
/// memory slot we would like to promote. The purpose of this phase is to
/// identify which uses must be removed to promote the slot, either by rewiring
/// the user or deleting it. Naturally, direct uses of the slot must be removed.
/// Sometimes additional uses must also be removed: this is notably the case
/// when a direct user of the slot cannot rewire its use and must delete itself,
/// and thus must make its users no longer use it. If the allocation is used in
/// nested regions, this step also ensures that the region operations provide
/// the right interface to analyze the values of the allocation at the edges of
/// their regions. If any of those constraints cannot be satisfied, promotion
/// cannot continue: this is decided at this step.
/// - A second step computes the list of blocks where a block argument will be
/// needed ("merge points") without mutating the IR. These blocks are the blocks
/// leading to a definition clash between two predecessors. Such blocks happen
/// to be the Iterated Dominance Frontier (IDF) of the set of blocks containing
/// a store, as they represent the points where a clear defining dominator stops
/// existing. Computing this information in advance allows making sure the
/// terminators that will forward values are capable of doing so (inability to
/// do so aborts promotion at this step).
///
/// At this point, promotion is guaranteed to happen, and the transformation
/// phase can begin. For each region of the program, a two-step process is
/// carried out.
/// - The first step of the per-region process computes the reaching definition
/// of the memory slot at each blocking user. This is the core of the mem2reg
/// algorithm, also known as load-store forwarding. This step analyzes loads and
/// stores and propagates which value must be stored in the slot at each
/// blocking user. This is achieved by doing a depth-first walk of the dominator
/// tree of the function. This is sufficient because the reaching definition at
/// the beginning of a block is either its new block argument if it is a merge
/// block, or the definition reaching the end of its immediate dominator (parent
/// in the dominator tree). We can therefore propagate this information down the
/// dominator tree to proceed with renaming within blocks. If at any point a
/// region operation that contains a use of the allocation is encountered, the
/// transformation process is triggered on the child regions of the encountered
/// operation, to obtain the reaching definition at its end and carry on with
/// the value forwarding.
/// - The second step of the per-region process uses the reaching definitions to
/// remove blocking uses in topological order. Some reaching definitions may
/// be values that will be removed or modified during the blocking use removal
/// step (typically, in the case of a store that stores the result of a load).
/// To properly handle such values, this step traverses the operations to modify
/// in reverse topological order. This way, if a value that will disappear is
/// used in place of a reaching definition, the logic to make it disappear will
/// be executed after the value has been used to replace an operation. For
/// regions within a PromotableRegionOpInterface, in order to correctly handle
/// cases where the finalization logic would use a reaching definition that will
/// be replaced, the finalization logic must be called before the blocking use
/// removal step, so that any use of a value that will be removed gets properly
/// replaced.
///
/// For further reading, chapter three of SSA-based Compiler Design [1]
/// showcases SSA construction for control-flow graphs; mem2reg is an
/// adaptation of the same process.
///
/// [1]: Rastello F. & Bouchez Tichadou F., SSA-based Compiler Design (2022),
/// Springer.
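As a hedged illustration (not part of this file), the straight-line load-store forwarding described above might transform a trivial LLVM-dialect function roughly as follows; the exact output depends on the dialect's interface implementations:

```mlir
// Before -mem2reg: a slot is allocated, stored to, and loaded back.
llvm.func @forward(%arg0: i32) -> i32 {
  %c1 = llvm.mlir.constant(1 : i32) : i32
  %slot = llvm.alloca %c1 x i32 : (i32) -> !llvm.ptr
  llvm.store %arg0, %slot : i32, !llvm.ptr
  %loaded = llvm.load %slot : !llvm.ptr -> i32
  llvm.return %loaded : i32
}

// After -mem2reg: the load is forwarded the stored value and the slot
// disappears entirely.
llvm.func @forward(%arg0: i32) -> i32 {
  llvm.return %arg0 : i32
}
```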

namespace {

using BlockingUsesMap =
    llvm::MapVector<Operation *, SmallPtrSet<OpOperand *, 4>>;
using RegionBlockingUsesMap =
    llvm::SmallMapVector<Region *, BlockingUsesMap, 2>;

using RegionSet = SmallPtrSet<Region *, 32>;

/// Information about regions that will be traversed for promotion, computed
/// during promotion analysis.
struct RegionPromotionInfo {
  /// True if an operation storing to the slot is present in the region.
  bool hasValueStores;
};

/// Information computed during promotion analysis used to perform actual
/// promotion.
struct MemorySlotPromotionInfo {
  /// Blocks for which at least two definitions of the slot values clash.
  SmallPtrSet<Block *, 8> mergePoints;
  /// Contains, for each region, the blocking uses for its operations. The
  /// blocking uses are the uses that must be eliminated by promotion. For each
  /// region, this is a DAG structure because if an operation must eliminate
  /// some of its uses, it is because the defining ops of the blocking uses
  /// requested it. The defining ops therefore must also have blocking uses or
  /// be the starting point of the blocking uses.
  RegionBlockingUsesMap userToBlockingUses;
  /// Regions of which the edges must be analyzed for promotion. All regions
  /// are guaranteed to be held by a PromotableRegionOpInterface, and to be
  /// nested within the parent region of the slot pointer.
  DenseMap<Region *, RegionPromotionInfo> regionsToPromote;
};

/// Computes information for basic slot promotion. This will check that direct
/// slot promotion can be performed, and provide the information to execute the
/// promotion. This does not mutate IR.
class MemorySlotPromotionAnalyzer {
public:
  MemorySlotPromotionAnalyzer(MemorySlot slot, DominanceInfo &dominance,
                              const DataLayout &dataLayout)
      : slot(slot), dominance(dominance), dataLayout(dataLayout) {}

  /// Computes the information for slot promotion if promotion is possible,
  /// returns nothing otherwise.
  std::optional<MemorySlotPromotionInfo> computeInfo();

private:
  /// Computes the transitive uses of the slot that block promotion. This finds
  /// uses that would block the promotion, checks that the operation has a
  /// solution to remove the blocking use, and potentially forwards the analysis
  /// if the operation needs further blocking uses resolved to resolve its own
  /// uses (typically, removing its users because it will delete itself to
  /// resolve its own blocking uses). This will fail if one of the transitive
  /// users cannot remove a requested use, and should prevent promotion.
  /// Resulting blocking uses are grouped by region.
  /// This also ensures all the uses are within promotable regions, adding
  /// information about regions to be promoted to the `regionsToPromote` map.
  LogicalResult computeBlockingUses(
      RegionBlockingUsesMap &userToBlockingUses,
      DenseMap<Region *, RegionPromotionInfo> &regionsToPromote);

  /// Computes the points in the provided region where multiple re-definitions
  /// of the slot's value (stores) may conflict.
  /// `definingBlocks` is the set of blocks containing a store to the slot,
  /// either directly or inherited from a nested region.
  void computeMergePoints(Region *region,
                          SmallPtrSetImpl<Block *> &definingBlocks,
                          SmallPtrSetImpl<Block *> &mergePoints);

  /// Ensures predecessors of merge points can properly provide their current
  /// definition of the value stored in the slot to the merge point. This can
  /// notably be an issue if the terminator used does not have the ability to
  /// forward values through block operands.
  bool areMergePointsUsable(SmallPtrSetImpl<Block *> &mergePoints);

  MemorySlot slot;

  DominanceInfo &dominance;
  const DataLayout &dataLayout;
};

/// Maps a region to a map of blocks to their index in the region.
/// The region is identified by its entry block pointer instead of its region
/// pointer to not need to invalidate the cache when region content is moved to
/// a new region. This only supports moves of all the blocks of a region to
/// an empty region.
using BlockIndexCache = DenseMap<Block *, DenseMap<Block *, size_t>>;

/// The MemorySlotPromoter handles the state of promoting a memory slot. It
/// wraps a slot and its associated allocator. This will perform the mutation of
/// IR.
class MemorySlotPromoter {
public:
  MemorySlotPromoter(MemorySlot slot, PromotableAllocationOpInterface allocator,
                     OpBuilder &builder, DominanceInfo &dominance,
                     const DataLayout &dataLayout, MemorySlotPromotionInfo info,
                     const Mem2RegStatistics &statistics,
                     BlockIndexCache &blockIndexCache);

  /// Actually promotes the slot by mutating IR. Promoting a slot DOES
  /// invalidate the MemorySlotPromotionInfo of other slots. Preparation of
  /// promotion info should NOT be performed in batches.
  /// Returns a promotable allocation op if a new allocator was created, nullopt
  /// otherwise.
  std::optional<PromotableAllocationOpInterface> promoteSlot();
private:
  /// Computes the reaching definition for all the operations that require
  /// promotion, including within nested regions needing promotion.
  /// `reachingDef` is the value the slot contains at the beginning of the
  /// block. This member function returns the reached definition at the end of
  /// the block. If the block contains a region that needs promotion, the
  /// blocking uses of that region will have been removed. This member function
  /// will not remove the blocking uses contained directly in the block.
  ///
  /// The `reachingDef` may be a null value. In that case, a lazily-created
  /// default value will be used.
  ///
  /// This member function must only be called at most once per block.
  Value promoteInBlock(Block *block, Value reachingDef);

  /// Computes the reaching definition for all the operations that require
  /// promotion, including within nested regions needing promotion, and removes
  /// the blocking uses of the slot within the region.
  /// `reachingDef` is the value the slot contains at the beginning of the
  /// region.
  ///
  /// The `reachingDef` may be a null value. In that case, a lazily-created
  /// default value will be used.
  ///
  /// This member function must only be called at most once per region.
  void promoteInRegion(Region *region, Value reachingDef);

  /// Removes the blocking uses of the slot within the given region, in
  /// reverse topological order. If the content of the region was moved out
  /// to a different region, the new region will be processed instead.
  void removeBlockingUses(Region *region);

  /// Removes operations and merge point block arguments that ended up not being
  /// necessary.
  void removeUnusedItems();

  /// Lazily-constructed default value representing the content of the slot when
  /// no store has been executed. This function may mutate IR.
  Value getOrCreateDefaultValue();

  MemorySlot slot;
  PromotableAllocationOpInterface allocator;
  OpBuilder &builder;
  /// Potentially non-initialized default value. Use `getOrCreateDefaultValue`
  /// to initialize it on demand.
  Value defaultValue;
  /// Contains the reaching definition at this operation. Reaching definitions
  /// are only computed for promotable memory operations with blocking uses.
  DenseMap<PromotableMemOpInterface, Value> reachingDefs;

  /// Contains the reaching definition at the end of the blocks visited so far.
  DenseMap<Block *, Value> reachingAtBlockEnd;

  /// Lists all the values that have been set by a memory operation as a
  /// reaching definition at one point during the promotion. The accompanying
  /// operation is the memory operation that originally stored the value.
  DenseMap<PromotableMemOpInterface, Value> replacedValuesMap;
  /// Pairs of store operations and the values they stored, collected while
  /// removing blocking uses.
  SmallVector<std::pair<Operation *, Value>> replacedValues;
  /// Operations to visit with the `visitReplacedValues` method at the end of
  /// the promotion.
  llvm::SmallVector<PromotableOpInterface> toVisitReplacedValues;
  /// Operations to be erased at the end of the promotion.
  llvm::SmallSetVector<Operation *, 8> toErase;

  DominanceInfo &dominance;
  const DataLayout &dataLayout;
  MemorySlotPromotionInfo info;
  const Mem2RegStatistics &statistics;

  /// Shared cache of block indices of specific regions.
  /// Cache entries must be invalidated before any addition, removal or
  /// reordering of blocks in the corresponding region.
  /// Cache entries are *NOT* invalidated if all the blocks of the corresponding
  /// region are moved to an empty region.
  BlockIndexCache &blockIndexCache;
};

} // namespace

MemorySlotPromoter::MemorySlotPromoter(
    MemorySlot slot, PromotableAllocationOpInterface allocator,
    OpBuilder &builder, DominanceInfo &dominance, const DataLayout &dataLayout,
    MemorySlotPromotionInfo info, const Mem2RegStatistics &statistics,
    BlockIndexCache &blockIndexCache)
    : slot(slot), allocator(allocator), builder(builder), dominance(dominance),
      dataLayout(dataLayout), info(std::move(info)), statistics(statistics),
      blockIndexCache(blockIndexCache) {
#ifndef NDEBUG
  auto isResultOrNewBlockArgument = [&]() {
    if (BlockArgument arg = dyn_cast<BlockArgument>(slot.ptr))
      return arg.getOwner()->getParentOp() == allocator;
    return slot.ptr.getDefiningOp() == allocator;
  };

  assert(isResultOrNewBlockArgument() &&
         "a slot must be a result of the allocator or an argument of the child "
         "regions of the allocator");
#endif // NDEBUG
}

Value MemorySlotPromoter::getOrCreateDefaultValue() {
  if (defaultValue)
    return defaultValue;

  OpBuilder::InsertionGuard guard(builder);
  builder.setInsertionPointToStart(slot.ptr.getParentBlock());
  return defaultValue = allocator.getDefaultValue(slot, builder);
}
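When a load executes before any store, the reaching definition falls back to this lazily-created default value. As a hedged sketch (not from this file), for the LLVM dialect this surfaces as a poison value, consistent with the pass documentation above:

```mlir
// Before -mem2reg: the load executes before any store to the slot.
llvm.func @no_store() -> i32 {
  %c1 = llvm.mlir.constant(1 : i32) : i32
  %slot = llvm.alloca %c1 x i32 : (i32) -> !llvm.ptr
  %v = llvm.load %slot : !llvm.ptr -> i32
  llvm.return %v : i32
}

// After -mem2reg (schematic): the load is replaced by the slot's default
// value, as produced by the allocator's `getDefaultValue` implementation.
llvm.func @no_store() -> i32 {
  %v = llvm.mlir.poison : i32
  llvm.return %v : i32
}
```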

LogicalResult MemorySlotPromotionAnalyzer::computeBlockingUses(
    RegionBlockingUsesMap &userToBlockingUses,
    DenseMap<Region *, RegionPromotionInfo> &regionsToPromote) {
  // The promotion of an operation may require the promotion of further
  // operations (typically, removing operations that use an operation that must
  // delete itself). We thus need to start from the uses of the slot pointer and
  // propagate further requests through the forward slice.

  // Graph regions are not supported.
  Region *slotPtrRegion = slot.ptr.getParentRegion();
  auto slotPtrRegionOp =
      dyn_cast<RegionKindInterface>(slotPtrRegion->getParentOp());
  if (slotPtrRegionOp &&
      slotPtrRegionOp.getRegionKind(slotPtrRegion->getRegionNumber()) ==
          RegionKind::Graph)
    return failure();

  // First, record that all immediate users of the slot pointer must no longer
  // use it.
  for (OpOperand &use : slot.ptr.getUses()) {
    SmallPtrSet<OpOperand *, 4> &blockingUses =
        userToBlockingUses[use.getOwner()->getParentRegion()][use.getOwner()];
    blockingUses.insert(&use);
  }

  // Regions that immediately contain a slot memory use that is not a store.
  RegionSet regionsWithDirectUse;
  // Regions that immediately contain a slot memory use that is a store.
  RegionSet regionsWithDirectStore;

  // Then, propagate the requirements for the removal of uses. The
  // topologically-sorted forward slice allows for all blocking uses of an
  // operation to have been computed before it is reached. Operations are
  // traversed in topological order of their uses, starting from the slot
  // pointer.
  SetVector<Operation *> forwardSlice;
  mlir::getForwardSlice(slot.ptr, &forwardSlice);
  for (Operation *user : forwardSlice) {
    // If the next operation has no blocking uses, everything is fine.
    auto *blockingUsesMapIt = userToBlockingUses.find(user->getParentRegion());
    if (blockingUsesMapIt == userToBlockingUses.end())
      continue;
    BlockingUsesMap &blockingUsesMap = blockingUsesMapIt->second;
    auto *it = blockingUsesMap.find(user);
    if (it == blockingUsesMap.end())
      continue;

    SmallPtrSet<OpOperand *, 4> &blockingUses = it->second;

    SmallVector<OpOperand *> newBlockingUses;
    // If the operation decides it cannot deal with removing the blocking uses,
    // promotion must fail.
    if (auto promotable = dyn_cast<PromotableOpInterface>(user)) {
      if (!promotable.canUsesBeRemoved(blockingUses, newBlockingUses,
                                       dataLayout))
        return failure();
      regionsWithDirectUse.insert(user->getParentRegion());
    } else if (auto promotable = dyn_cast<PromotableMemOpInterface>(user)) {
      if (!promotable.canUsesBeRemoved(slot, blockingUses, newBlockingUses,
                                       dataLayout))
        return failure();

      // Operations that interact with the slot's memory will be promoted using
      // a reaching definition. Therefore, the operation must be within a region
      // where the reaching definition can be computed.
      if (promotable.storesTo(slot))
        regionsWithDirectStore.insert(user->getParentRegion());
      else
        regionsWithDirectUse.insert(user->getParentRegion());
    } else {
      // An operation that has blocking uses must be promoted. If it is not
      // promotable, promotion must fail.
      return failure();
    }

    // Then, register any new blocking uses for coming operations.
    for (OpOperand *blockingUse : newBlockingUses) {
      assert(llvm::is_contained(user->getResults(), blockingUse->get()));

      SmallPtrSetImpl<OpOperand *> &newUserBlockingUseSet =
          blockingUsesMap[blockingUse->getOwner()];
      newUserBlockingUseSet.insert(blockingUse);
    }
  }

  // Finally, check that all the regions needed are promotable, and propagate
  // the constraint to their parent regions.
  auto visitRegions = [&](SmallVector<Region *> &regionsToPropagateFrom,
                          bool hasValueStores) {
    while (!regionsToPropagateFrom.empty()) {
      Region *region = regionsToPropagateFrom.pop_back_val();

      if (region == slot.ptr.getParentRegion() ||
          regionsToPromote.contains(region))
        continue;

      RegionPromotionInfo &regionInfo = regionsToPromote[region];
      regionInfo.hasValueStores = hasValueStores;

      auto promotableParentOp =
          dyn_cast<PromotableRegionOpInterface>(region->getParentOp());
      if (!promotableParentOp)
        return failure();

      if (!promotableParentOp.isRegionPromotable(slot, region, hasValueStores))
        return failure();

      regionsToPropagateFrom.push_back(region->getParentRegion());
    }

    return success();
  };

  // Start with the regions that directly contain a store to give priority
  // to stores in the propagation of `hasValueStores` information.
  SmallVector<Region *> regionsToPropagateFrom(regionsWithDirectStore.begin(),
                                               regionsWithDirectStore.end());
  if (failed(visitRegions(regionsToPropagateFrom, true)))
    return failure();

  // Then, propagate from the regions that directly contain non-store uses.
  regionsToPropagateFrom.clear();
  regionsToPropagateFrom.append(regionsWithDirectUse.begin(),
                                regionsWithDirectUse.end());
  if (failed(visitRegions(regionsToPropagateFrom, false)))
    return failure();

  return success();
}

using IDFCalculator = llvm::IDFCalculatorBase<Block, false>;
void MemorySlotPromotionAnalyzer::computeMergePoints(
    Region *region, SmallPtrSetImpl<Block *> &definingBlocks,
    SmallPtrSetImpl<Block *> &mergePoints) {
  if (region->hasOneBlock())
    return;

  IDFCalculator idfCalculator(dominance.getDomTree(region));
  idfCalculator.setDefiningBlocks(definingBlocks);

  SmallVector<Block *> mergePointsVec;
  idfCalculator.calculate(mergePointsVec);

  mergePoints.insert_range(mergePointsVec);
}
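To make the merge-point computation concrete, here is a hedged LLVM-dialect sketch (not from this file) of a diamond CFG. Both branches store to the slot, so the join block lies on the IDF of the storing blocks and receives a new block argument during promotion:

```mlir
// Before -mem2reg: ^bb1 and ^bb2 both store, ^bb3 loads.
llvm.func @diamond(%cond: i1, %a: i32, %b: i32) -> i32 {
  %c1 = llvm.mlir.constant(1 : i32) : i32
  %slot = llvm.alloca %c1 x i32 : (i32) -> !llvm.ptr
  llvm.cond_br %cond, ^bb1, ^bb2
^bb1:
  llvm.store %a, %slot : i32, !llvm.ptr
  llvm.br ^bb3
^bb2:
  llvm.store %b, %slot : i32, !llvm.ptr
  llvm.br ^bb3
^bb3:
  %v = llvm.load %slot : !llvm.ptr -> i32
  llvm.return %v : i32
}

// After -mem2reg: ^bb3 is a merge point; each predecessor forwards its
// reaching definition through the new block argument.
llvm.func @diamond(%cond: i1, %a: i32, %b: i32) -> i32 {
  llvm.cond_br %cond, ^bb1, ^bb2
^bb1:
  llvm.br ^bb3(%a : i32)
^bb2:
  llvm.br ^bb3(%b : i32)
^bb3(%v: i32):
  llvm.return %v : i32
}
```

This also illustrates why `areMergePointsUsable` requires every predecessor terminator to implement `BranchOpInterface`: the predecessors must be able to append `%a` and `%b` as successor operands.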

bool MemorySlotPromotionAnalyzer::areMergePointsUsable(
    SmallPtrSetImpl<Block *> &mergePoints) {
  for (Block *mergePoint : mergePoints)
    for (Block *pred : mergePoint->getPredecessors())
      if (!isa<BranchOpInterface>(pred->getTerminator()))
        return false;

  return true;
}

std::optional<MemorySlotPromotionInfo>
MemorySlotPromotionAnalyzer::computeInfo() {
  MemorySlotPromotionInfo info;

  // First, find the set of operations that will need to be changed for the
  // promotion to happen. These operations need to resolve some of their uses,
  // either by rewiring them or simply deleting themselves. If any of them
  // cannot find a way to resolve their blocking uses, we abort the promotion.
  // We also compute at this stage the regions that will be analyzed for
  // reaching definition information.
  if (failed(
          computeBlockingUses(info.userToBlockingUses, info.regionsToPromote)))
    return {};

  // Compute the blocks containing a store for each region, either directly or
  // inherited from a nested region. As a side effect, `definingBlocks` contains
  // all regions with at least one store.
  DenseMap<Region *, SmallPtrSet<Block *, 8>> definingBlocks;
  for (Operation *user : slot.ptr.getUsers())
    if (auto storeOp = dyn_cast<PromotableMemOpInterface>(user))
      if (storeOp.storesTo(slot))
        definingBlocks[user->getParentRegion()].insert(user->getBlock());
  for (auto &[region, regionInfo] : info.regionsToPromote)
    if (regionInfo.hasValueStores)
      definingBlocks[region->getParentRegion()].insert(
          region->getParentOp()->getBlock());

  // Then, compute blocks in which two or more definitions of the allocated
  // variable may conflict. These blocks will need a new block argument to
  // accommodate this.
  for (auto &[region, defBlocks] : definingBlocks)
    computeMergePoints(region, defBlocks, info.mergePoints);

  // The slot can be promoted if the block arguments to be created can
  // actually be populated with values, which may not be possible depending
  // on their predecessors.
  if (!areMergePointsUsable(info.mergePoints))
    return {};

  return info;
}

Value MemorySlotPromoter::promoteInBlock(Block *block, Value reachingDef) {
  SmallVector<Operation *> blockOps;
  for (Operation &op : block->getOperations())
    blockOps.push_back(&op);
  for (Operation *op : blockOps) {
    // Promote operations that interact with the slot's memory.
    if (auto memOp = dyn_cast<PromotableMemOpInterface>(op)) {
      if (info.userToBlockingUses[memOp->getParentRegion()].contains(memOp))
        reachingDefs.insert({memOp, reachingDef});

      if (memOp.storesTo(slot)) {
        builder.setInsertionPointAfter(memOp);
        // To not expose default value creation to the interfaces, if we have
        // no reaching definition by now, we set it to the default value.
        // This is slightly too eager as `getStored` may not need it.
        if (!reachingDef)
          reachingDef = getOrCreateDefaultValue();
        Value stored = memOp.getStored(slot, builder, reachingDef, dataLayout);
        assert(stored && "a memory operation storing to a slot must provide a "
                         "new definition of the slot");
        reachingDef = stored;
        replacedValuesMap[memOp] = stored;
      }
    }

    // Promote regions that contain operations that interact with the slot's
    // memory.
    if (auto promotableRegionOp = dyn_cast<PromotableRegionOpInterface>(op)) {
      bool needsPromotion = false;
      bool hasValueStores = false;
      for (Region &region : op->getRegions()) {
        auto regionInfoIt = info.regionsToPromote.find(&region);
        if (regionInfoIt == info.regionsToPromote.end())
          continue;
        needsPromotion = true;
        if (!regionInfoIt->second.hasValueStores)
          continue;

        hasValueStores = true;
        break;
      }

      if (needsPromotion) {
        llvm::SmallMapVector<Region *, Value, 2> regionsToProcess;

        // To not expose default value creation to the interfaces, if we have
        // no reaching definition by now, we set it to the default value.
        // This is slightly too eager as `setupPromotion` may not need it.
        if (!reachingDef)
          reachingDef = getOrCreateDefaultValue();

        promotableRegionOp.setupPromotion(slot, reachingDef, hasValueStores,
                                          regionsToProcess);

#ifndef NDEBUG
        for (Region &region : op->getRegions())
          if (info.regionsToPromote.contains(&region))
            assert(
                regionsToProcess.contains(&region) &&
                "reaching definition must be provided for a required region");
#endif // NDEBUG

        for (auto &[region, reachingDef] : regionsToProcess) {
          assert(region->getParentOp() == op &&
                 "region must be part of the operation");
          if (!info.regionsToPromote.contains(region))
            continue;
          promoteInRegion(region, reachingDef);
        }

        // TODO: Currently we have to invalidate the dominance information of
        // the regions of the operation because finalizePromotion may move their
        // content. We might want to support moving dominance information
        // across regions as this can be detected.
        for (Region &region : op->getRegions())
          dominance.invalidate(&region);

        builder.setInsertionPointAfter(op);
        reachingDef = promotableRegionOp.finalizePromotion(
            slot, reachingDef, hasValueStores, reachingAtBlockEnd, builder);

        // Blocking uses can then be removed for the regions that were promoted.
        // Even though `finalizePromotion` may have moved regions to a new
        // operation, `removeBlockingUses` handles this case and will redirect
        // processing to the correct region.
        for (auto &[region, reachingDef] : regionsToProcess)
          removeBlockingUses(region);
      }
    }
  }

  reachingAtBlockEnd[block] = reachingDef;
  return reachingDef;
}

void MemorySlotPromoter::promoteInRegion(Region *region, Value reachingDef) {
  if (region->hasOneBlock()) {
    promoteInBlock(&region->front(), reachingDef);
    return;
  }

  struct DfsJob {
    llvm::DomTreeNodeBase<Block> *block;
    Value reachingDef;
  };

  SmallVector<DfsJob> dfsStack;

  auto &domTree = dominance.getDomTree(region);

  dfsStack.emplace_back<DfsJob>(
      {domTree.getNode(&region->front()), reachingDef});

  while (!dfsStack.empty()) {
    DfsJob job = dfsStack.pop_back_val();
    Block *block = job.block->getBlock();

    if (info.mergePoints.contains(block)) {
      BlockArgument blockArgument =
          block->addArgument(slot.elemType, slot.ptr.getLoc());
      job.reachingDef = blockArgument;
    }

    job.reachingDef = promoteInBlock(block, job.reachingDef);

    if (auto terminator = dyn_cast<BranchOpInterface>(block->getTerminator())) {
      for (BlockOperand &blockOperand : terminator->getBlockOperands()) {
        if (info.mergePoints.contains(blockOperand.get())) {
          if (!job.reachingDef)
            job.reachingDef = getOrCreateDefaultValue();

          terminator.getSuccessorOperands(blockOperand.getOperandNumber())
              .append(job.reachingDef);
        }
      }
    }

    for (auto *child : job.block->children())
      dfsStack.emplace_back<DfsJob>({child, job.reachingDef});
  }
}

/// Gets or creates a block index mapping for the region of which the entry
/// block is `regionEntryBlock`.
static const DenseMap<Block *, size_t> &
getOrCreateBlockIndices(BlockIndexCache &blockIndexCache,
                        Block *regionEntryBlock) {
  auto [it, inserted] = blockIndexCache.try_emplace(regionEntryBlock);
  if (!inserted)
    return it->second;

  DenseMap<Block *, size_t> &blockIndices = it->second;
  SetVector<Block *> topologicalOrder =
      getBlocksSortedByDominance(*regionEntryBlock->getParent());
  for (auto [index, block] : llvm::enumerate(topologicalOrder))
    blockIndices[block] = index;
  return blockIndices;
}

/// Sorts `ops` according to dominance. Relies on the topological order of basic
/// blocks to get a deterministic ordering. Uses `blockIndexCache` to avoid the
/// potentially expensive recomputation of a block index map.
/// This function assumes no blocks are ever deleted or entry block changed
/// during the lifetime of the block index cache.
static void dominanceSort(SmallVector<Operation *> &ops, Region &region,
                          BlockIndexCache &blockIndexCache) {
  if (region.empty())
    return;

  // Produce a topological block order and construct a map to look up the
  // indices of blocks.
  const DenseMap<Block *, size_t> &topoBlockIndices =
      getOrCreateBlockIndices(blockIndexCache, &region.front());

  // Combining the topological order of the basic blocks together with block
  // internal operation order guarantees a deterministic, dominance respecting
  // order.
  llvm::sort(ops, [&](Operation *lhs, Operation *rhs) {
    size_t lhsBlockIndex = topoBlockIndices.at(lhs->getBlock());
    size_t rhsBlockIndex = topoBlockIndices.at(rhs->getBlock());
    if (lhsBlockIndex == rhsBlockIndex)
      return lhs->isBeforeInBlock(rhs);
    return lhsBlockIndex < rhsBlockIndex;
  });
}
728
729void MemorySlotPromoter::removeBlockingUses(Region *region) {
730 auto *blockingUsesMapIt = info.userToBlockingUses.find(region);
731 if (blockingUsesMapIt == info.userToBlockingUses.end())
732 return;
733 BlockingUsesMap &blockingUsesMap = blockingUsesMapIt->second;
734 if (blockingUsesMap.empty())
735 return;
736
737 // Operations may have been moved to a different region at this point.
738 // To cover this, we use the current region of the operations to remove
739 // instead of the provided region.
740 region = blockingUsesMap.front().first->getParentRegion();
741#ifndef NDEBUG
742 for (auto &[op, blockingUses] : blockingUsesMap)
743 assert(op->getParentRegion() == region &&
744 "all operations must still be in the same region");
745#endif // NDEBUG
746
747 llvm::SmallVector<Operation *> usersToRemoveUses(
748 llvm::make_first_range(blockingUsesMap));
749
750 // Sort according to dominance.
751 dominanceSort(usersToRemoveUses, *region, blockIndexCache);
752
753 // Iterate over the operations to rewrite in reverse dominance order.
754 for (Operation *toPromote : llvm::reverse(usersToRemoveUses)) {
755 if (auto toPromoteMemOp = dyn_cast<PromotableMemOpInterface>(toPromote)) {
756 Value reachingDef = reachingDefs.lookup(toPromoteMemOp);
757 // If no reaching definition is known, this use is outside the reach of
758 // the slot. The default value should thus be used.
759 // FIXME: This is too eager, and will generate default values even for
760 // pure stores. This cannot be removed easily as partial stores may
761 // still require a default value to complete.
762 if (!reachingDef)
763 reachingDef = getOrCreateDefaultValue();
764
765 builder.setInsertionPointAfter(toPromote);
766 if (toPromoteMemOp.removeBlockingUses(slot, blockingUsesMap[toPromote],
767 builder, reachingDef,
768 dataLayout) == DeletionKind::Delete)
769 toErase.insert(toPromote);
770 if (toPromoteMemOp.storesTo(slot))
771 if (Value replacedValue = replacedValuesMap[toPromoteMemOp])
772 replacedValues.push_back({toPromoteMemOp, replacedValue});
773 continue;
774 }
775
776 auto toPromoteBasic = cast<PromotableOpInterface>(toPromote);
777 builder.setInsertionPointAfter(toPromote);
778 if (toPromoteBasic.removeBlockingUses(blockingUsesMap[toPromote],
779 builder) == DeletionKind::Delete)
780 toErase.insert(toPromote);
781 if (toPromoteBasic.requiresReplacedValues())
782 toVisitReplacedValues.push_back(toPromoteBasic);
783 }
784}
785
786void MemorySlotPromoter::removeUnusedItems() {
787 // We want to eliminate unused block arguments. Because block arguments can be
788 // used to populate other block arguments, there might be cycles of arguments
789// that are only used to populate each other. We therefore need a small
790 // dataflow analysis to identify which block arguments are truly used.
791
792 SmallPtrSet<BlockArgument, 8> mergePointArgsUnused;
793 SmallVector<BlockArgument> usedMergePointArgsToProcess;
794
795 // First, separate the block arguments that are not used or only used for the
796 // purpose of populating a merge point block argument from the others. These
797 // block arguments are potentially unused. Meanwhile, arguments that are
798 // definitely used will be the starting point of the propagation of the
799 // analysis.
800 auto isDefinitelyUsed = [&](BlockArgument arg) {
801 for (auto &use : arg.getUses()) {
802 if (llvm::is_contained(toErase, use.getOwner()))
803 continue;
804
805 // We now want to detect whether the use is to populate a merge point
806 // block argument. If it is not, the argument is definitely used.
807
808 auto branchOp = dyn_cast<BranchOpInterface>(use.getOwner());
809 if (!branchOp)
810 return true;
811
812 std::optional<BlockArgument> successorArgument =
813 branchOp.getSuccessorBlockArgument(use.getOperandNumber());
814 if (!successorArgument)
815 return true;
816
817 if (!info.mergePoints.contains(successorArgument->getOwner()))
818 return true;
819
820 // The last block argument of a merge point is its reaching definition
821 // argument. If the argument being populated is not the last one, it is a
822 // genuine use of the value.
823 bool isLastBlockArgument =
824 successorArgument->getArgNumber() ==
825 successorArgument->getOwner()->getNumArguments() - 1;
826 if (!isLastBlockArgument)
827 return true;
828 }
829 return false;
830 };
831
832 for (Block *mergePoint : info.mergePoints) {
833 BlockArgument arg = mergePoint->getArguments().back();
834 if (isDefinitelyUsed(arg))
835 usedMergePointArgsToProcess.push_back(arg);
836 else
837 mergePointArgsUnused.insert(arg);
838 }
839
840 // We now refine mergePointArgsUnused using the information about which
841 // block arguments are definitely used.
842 while (!usedMergePointArgsToProcess.empty()) {
843 BlockArgument arg = usedMergePointArgsToProcess.pop_back_val();
844 Block *mergePoint = arg.getOwner();
845
846 assert(arg.getArgNumber() == mergePoint->getNumArguments() - 1 &&
847 "merge point argument must be the last argument of the merge point");
848
849 for (BlockOperand &use : mergePoint->getUses()) {
850 // If a value used to populate this used merge point argument is another
851 // merge point block argument that is currently considered unused, it must
852 // now be considered used and processed as such later.
853
854 auto branch = cast<BranchOpInterface>(use.getOwner());
855 SuccessorOperands succOperands =
856 branch.getSuccessorOperands(use.getOperandNumber());
857
858 // The successor operand for the reaching definition is either the last
859 // one, or is not present if the user block is dead.
860 assert(succOperands.size() == mergePoint->getNumArguments() ||
861 succOperands.size() + 1 == mergePoint->getNumArguments());
862
863 // If the user block is dead, the default value acts as a
864 // placeholder.
865 if (succOperands.size() + 1 == mergePoint->getNumArguments())
866 succOperands.append(getOrCreateDefaultValue());
867
868 Value populatedValue = succOperands[arg.getArgNumber()];
869 auto populatedValueAsArg = dyn_cast<BlockArgument>(populatedValue);
870 if (populatedValueAsArg &&
871 mergePointArgsUnused.erase(populatedValueAsArg))
872 usedMergePointArgsToProcess.push_back(populatedValueAsArg);
873 }
874
875 builder.setInsertionPointToStart(mergePoint);
876 allocator.handleBlockArgument(slot, arg, builder);
877 if (statistics.newBlockArgumentAmount)
878 (*statistics.newBlockArgumentAmount)++;
879 }
880
881 for (Operation *toEraseOp : toErase)
882 toEraseOp->erase();
883
884 // First, erase all successor operands that feed into unused merge point
885 // block arguments. This must be done before erasing the block arguments
886 // themselves because an unused merge point argument may be used to
887 // populate another unused merge point argument via a branch operation.
888 for (BlockArgument arg : mergePointArgsUnused) {
889 Block *mergePoint = arg.getOwner();
890 for (BlockOperand &use : mergePoint->getUses()) {
891 auto branch = cast<BranchOpInterface>(use.getOwner());
892 SuccessorOperands succOperands =
893 branch.getSuccessorOperands(use.getOperandNumber());
894 succOperands.erase(arg.getArgNumber());
895 }
896 }
897
898 // Now that all successor operands feeding unused args have been removed,
899 // erase the block arguments themselves.
900 for (BlockArgument arg : mergePointArgsUnused) {
901 Block *mergePoint = arg.getOwner();
902 mergePoint->eraseArgument(mergePoint->getNumArguments() - 1);
903 }
904}
905
906std::optional<PromotableAllocationOpInterface>
907MemorySlotPromoter::promoteSlot() {
908 // Perform the promotion recursively through nested regions. The reaching
909 // definition starts with a null value that will be replaced by a
910 // lazily-created default value if the value must be passed to a promotion
911 // interface while no store has been encountered yet.
912// Innermost regions will have their blocking uses removed, but not the
913// outermost region, whose blocking uses we remove manually afterwards.
914// This is because PromotableRegionOpInterface::finalizePromotion must be
915// called before removeBlockingUses.
916 promoteInRegion(slot.ptr.getParentRegion(), nullptr);
917
918 // Blocking uses can then be removed for the outermost region.
919 removeBlockingUses(slot.ptr.getParentRegion());
920
921 // Notify operations that requested it of the reaching definitions set by
922 // storing memory operations.
923 for (PromotableOpInterface op : toVisitReplacedValues) {
924 builder.setInsertionPointAfter(op);
925 op.visitReplacedValues(replacedValues, builder);
926 }
927
928 // Finally, remove unused operations and merge point block arguments.
929 removeUnusedItems();
930
931 assert(slot.ptr.use_empty() &&
932 "after promotion, the slot pointer should not be used anymore");
933
934 LDBG() << "Promoted memory slot: " << slot.ptr;
935
936 if (statistics.promotedAmount)
937 (*statistics.promotedAmount)++;
938
939 return allocator.handlePromotionComplete(slot, defaultValue, builder);
940}
941
942LogicalResult mlir::tryToPromoteMemorySlots(
943    ArrayRef<PromotableAllocationOpInterface> allocators, OpBuilder &builder,
944    const DataLayout &dataLayout, DominanceInfo &dominance,
945    Mem2RegStatistics statistics) {
946 bool promotedAny = false;
947
948 // A cache that stores deterministic block indices which are used to determine
949 // a valid operation modification order. The block index maps are computed
950 // lazily and cached to avoid expensive recomputation.
951 BlockIndexCache blockIndexCache;
952
953  SmallVector<PromotableAllocationOpInterface> workList(allocators);
954
955  SmallVector<PromotableAllocationOpInterface> newWorkList;
956  newWorkList.reserve(workList.size());
957 while (true) {
958 bool changesInThisRound = false;
959 for (PromotableAllocationOpInterface allocator : workList) {
960 bool changedAllocator = false;
961 for (MemorySlot slot : allocator.getPromotableSlots()) {
962 if (slot.ptr.use_empty())
963 continue;
964
965 MemorySlotPromotionAnalyzer analyzer(slot, dominance, dataLayout);
966 std::optional<MemorySlotPromotionInfo> info = analyzer.computeInfo();
967 if (info) {
968 std::optional<PromotableAllocationOpInterface> newAllocator =
969 MemorySlotPromoter(slot, allocator, builder, dominance,
970 dataLayout, std::move(*info), statistics,
971 blockIndexCache)
972 .promoteSlot();
973 changedAllocator = true;
974 // Add newly created allocators to the worklist for further
975 // processing.
976 if (newAllocator)
977 newWorkList.push_back(*newAllocator);
978
979 // A break is required, since promoting a slot may invalidate the
980 // remaining slots of an allocator.
981 break;
982 }
983 }
984 if (!changedAllocator)
985 newWorkList.push_back(allocator);
986 changesInThisRound |= changedAllocator;
987 }
988 if (!changesInThisRound)
989 break;
990 promotedAny = true;
991
992 // Swap the vector's backing memory and clear the entries in newWorkList
993 // afterwards. This ensures that additional heap allocations can be avoided.
994 workList.swap(newWorkList);
995 newWorkList.clear();
996 }
997
998 return success(promotedAny);
999}
1000
1001namespace {
1002
1003struct Mem2Reg : impl::Mem2RegBase<Mem2Reg> {
1004 using impl::Mem2RegBase<Mem2Reg>::Mem2RegBase;
1005
1006 void runOnOperation() override {
1007 Operation *scopeOp = getOperation();
1008
1009 Mem2RegStatistics statistics{&promotedAmount, &newBlockArgumentAmount};
1010
1011 bool changed = false;
1012
1013 auto &dataLayoutAnalysis = getAnalysis<DataLayoutAnalysis>();
1014 const DataLayout &dataLayout = dataLayoutAnalysis.getAtOrAbove(scopeOp);
1015 auto &dominance = getAnalysis<DominanceInfo>();
1016
1017 for (Region &region : scopeOp->getRegions()) {
1018 if (region.getBlocks().empty())
1019 continue;
1020
1021 OpBuilder builder(&region.front(), region.front().begin());
1022
1023 SmallVector<PromotableAllocationOpInterface> allocators;
1024      // Build a list of allocators whose slots we will attempt to promote.
1025 region.walk([&](PromotableAllocationOpInterface allocator) {
1026 allocators.emplace_back(allocator);
1027 });
1028
1029 // Attempt promoting as many of the slots as possible.
1030 if (succeeded(tryToPromoteMemorySlots(allocators, builder, dataLayout,
1031 dominance, statistics)))
1032 changed = true;
1033 }
1034 if (!changed)
1035 markAllAnalysesPreserved();
1036 }
1037};
1038
1039} // namespace