Disable delete optimization and exit ref loop faster #6219
Open
ddanielr wants to merge 2 commits into apache:2.1
When delete table is called, the delete marker code checks whether any file references exist in other tables. However, only a single reference has to exist for delete markers to be created, so this change breaks out of the for loop as soon as one entry is found. It also fixes log lines, removes a `getLogger` call, and replaces a nested try block with a single try-with-resources.
Adds a property to allow the scan of the metadata table to be skipped for table deletes. This forces delete markers to always be created when deleting tables instead of the manager deleting the volumes immediately.
@keith-turner & @ctubbsii I'm also leaning towards just removing this delete marker optimization entirely. I wanted to be a bit more careful about it in 2.1, but if you don't see a reason to keep this code around then I'll push a change to just remove the scan and volume deletion shortcut entirely.
This PR adds a property to skip scanning the metadata table for table deletes.
DeleteMarker Creation Optimization
When the manager deletes a table, it performs an optimization step: it creates a batch scanner with 8 threads (not configurable) to scan the file references of all other tables in the metadata table and check that none of them point into the volume of the table being deleted.
If no such references are found, the manager deletes the volumes directly instead of writing delete markers and letting the GC handle the tablet file deletions.
This is a nice optimization to have when dealing with a small, static set of tables. However, when table creation is dynamic, these scans can cause unnecessary delays and/or hanging FATE processes, because all metadata tablets must be scanned in order to process a single table delete.
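The decision described above can be sketched roughly as follows. This is an illustrative outline only, not the code in this PR: the `skipMetadataScan` flag stands in for the new property (whose real name is not shown here), and `sharedRefsExist` stands in for the metadata scan. Both names, and the `Action` enum, are assumptions made for the example.

```java
import java.util.function.BooleanSupplier;

public class DeleteDecisionSketch {

    /** Hypothetical labels for the two outcomes described in the PR text. */
    public enum Action { DELETE_VOLUMES_DIRECTLY, WRITE_DELETE_MARKERS }

    /**
     * Decide how a table's files get cleaned up.
     *
     * @param skipMetadataScan hypothetical stand-in for the new property
     * @param sharedRefsExist  hypothetical stand-in for the metadata scan;
     *                         only invoked when the scan is not skipped
     */
    public static Action decide(boolean skipMetadataScan, BooleanSupplier sharedRefsExist) {
        if (skipMetadataScan) {
            // Scan is skipped entirely: always leave the deletion to the GC.
            return Action.WRITE_DELETE_MARKERS;
        }
        // Scan runs: any shared reference forces delete markers.
        return sharedRefsExist.getAsBoolean()
            ? Action.WRITE_DELETE_MARKERS
            : Action.DELETE_VOLUMES_DIRECTLY;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, () -> false));
        System.out.println(decide(false, () -> false));
        System.out.println(decide(false, () -> true));
    }
}
```

Passing the scan as a supplier makes the cost model visible: when the flag is set, the expensive scan is never started.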
Unnecessary File Ref Counting
The batch scanner only needs to produce a single shared file ref result (`refCount`) in order to trigger delete markers to be created, as the code only checks if `refCount` is equal to zero. However, the existing code needlessly counts all of the found refs first.
Counting them all is unnecessary, so a fast break was added to the iterator loop for the batch scanner.
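A minimal sketch of the early exit, combined with a single try-with-resources, might look like the following. It does not use the Accumulo API: `RefScanner` and `CountingScanner` are hypothetical stand-ins for the batch scanner, and the `"/" + tableId + "/"` match is an assumed placeholder for whatever check the real code performs.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RefScanSketch {

    /** Hypothetical stand-in for a batch scanner: iterable and closeable. */
    public interface RefScanner extends Iterable<String>, AutoCloseable {
        @Override
        void close(); // narrowed so callers need no checked-exception handling
    }

    /** Returns true as soon as one reference into the table's directory is seen. */
    public static boolean hasSharedRef(RefScanner scanner, String tableId) {
        boolean found = false;
        // Single try-with-resources: the scanner is closed on every exit path.
        try (RefScanner s = scanner) {
            for (String ref : s) {
                // Assumed matching rule, for illustration only.
                if (ref.contains("/" + tableId + "/")) {
                    found = true;
                    break; // one hit is enough; no need to count the rest
                }
            }
        }
        return found;
    }

    /** Test double that records how many entries were read and whether it was closed. */
    public static final class CountingScanner implements RefScanner {
        private final List<String> refs;
        public int consumed = 0;
        public boolean closed = false;

        public CountingScanner(List<String> refs) {
            this.refs = refs;
        }

        @Override
        public Iterator<String> iterator() {
            final Iterator<String> it = refs.iterator();
            return new Iterator<String>() {
                @Override
                public boolean hasNext() {
                    return it.hasNext();
                }

                @Override
                public String next() {
                    consumed++;
                    return it.next();
                }
            };
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) {
        CountingScanner sc = new CountingScanner(Arrays.asList(
            "hdfs://nn/accumulo/tables/2/t-0001/F1.rf",
            "hdfs://nn/accumulo/tables/1/t-0002/F2.rf",
            "hdfs://nn/accumulo/tables/1/t-0003/F3.rf"));
        boolean shared = hasSharedRef(sc, "1");
        System.out.println(shared + " after reading " + sc.consumed + " entries");
    }
}
```

With the break in place, the loop stops at the first matching entry instead of draining the whole scan, which is where the savings come from on a large metadata table.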