diff --git a/README.md b/README.md index 0057d95..178ee38 100644 --- a/README.md +++ b/README.md @@ -8,6 +8,10 @@ The example diagram below shows a configuration with three tiers. ![Storage Tiering Diagram](storage_tiering_diagram.jpg) +> [!NOTE] +> The `access_time_attribute` configuration option in `plugin_specific_configuration` has been removed because the iRODS Server has tracked access time directly since v5.0.0. +> If you are upgrading, update any custom violating-object queries that used the Access Time AVU and see [Removing Leftover Access Time AVUs](#removing-leftover-access-time-avus). + ## How to build This project uses a "build hook" which allows the [iRODS Development Environment](https://github.com/irods/irods_development_environment) to build packages in the usual manner. Please see the instructions for building plugins with the development environment: [https://github.com/irods/irods_development_environment?tab=readme-ov-file#how-to-build-an-irods-plugin](https://github.com/irods/irods_development_environment?tab=readme-ov-file#how-to-build-an-irods-plugin) @@ -103,13 +107,13 @@ id name ### Customizing Metadata Attributes A number of metadata attributes are used within the storage tiering capability which identify the tier group, the amount of time data may be at rest within the tier, the optional query, etc. + These attributes may map to concepts already in use by other names within a given iRODS installation. For that reason we have exposed them as configuration options within the storage tiering **plugin_specific_configuration** block. 
For a default installation the following values are used: ``` "plugin_specific_configuration": { - "access_time_attribute" : "irods::access_time", "group_attribute" : "irods::storage_tiering::group", "time_attribute" : "irods::storage_tiering::time", "query_attribute" : "irods::storage_tiering::query", @@ -171,7 +175,7 @@ Data objects which have been labeled via particular metadata, or within a specif **Checking for resources in violating queries is required to prevent erroneous data migrations for replicas on other resources which may represent other tiers in the storage tiering group.** This can be done in the manner shown below (`DATA_RESC_ID in ('10068', '10069')`) or via resource hierarchy (e.g. `DATA_RESC_HIER like 'root_resc;%`), but the query must filter on resources to correctly identify violating objects. ``` -imeta set -R fast_resc irods::storage_tiering::query "select DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where META_DATA_ATTR_NAME = 'irods::access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING' and DATA_RESC_ID in ('10068', '10069')" +imeta set -R fast_resc irods::storage_tiering::query "select DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where DATA_ACCESS_TIME < 'TIME_CHECK_STRING' and DATA_MODIFY_TIME < 'TIME_CHECK_STRING' and DATA_RESC_ID in ('10068', '10069')" ``` The example above implements the default query. Note that the string `TIME_CHECK_STRING` is used in place of an actual time. This string will be replaced by the storage tiering framework with the appropriately computed time given the previous parameters. @@ -270,3 +274,44 @@ units: 1 ``` The above AVUs indicate that the resource represents tier 0 AND tier 1 in example_group_1. This should not be done. + +## Removing Leftover Access Time AVUs + +Prior to v6.0.0 of this plugin, the default behavior was to track access time per data object with an `irods::access_time` AVU. This plugin now uses the access time provided by the iRODS Server itself. 
+ +To remove the now-redundant AVUs, either remove them from each data object with `imeta rm` or run a direct SQL statement against the catalog database. + +### Via `imeta` + +This generates one `imeta -M rm` command per data object. + +Substitute the `irods::access_time` string if you used a custom `access_time_attribute` value. + +``` +iquest "imeta -M rm -d '%s/%s' '%s' '%s'" "select COLL_NAME, DATA_NAME, META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE where META_DATA_ATTR_NAME = 'irods::access_time'" > remove_storage_tiering_access_time_avus.sh +bash -x remove_storage_tiering_access_time_avus.sh +``` + +### Via direct SQL + +This will remove all matching rows in the join or junction table (`R_OBJT_METAMAP`) with a single database roundtrip. + +Substitute the `irods::access_time` string if you used a custom `access_time_attribute` value. + +``` +-- PostgreSQL and MySQL +delete from R_OBJT_METAMAP +where meta_id in ( + select meta_id + from R_META_MAIN + where meta_attr_name = 'irods::access_time' +); +``` + +### Remove Unused Metadata + +Both approaches above leave the now-unused entries in the `R_META_MAIN` table; these can be removed as follows. 
+ +``` +iadmin rum +``` diff --git a/include/irods/private/storage_tiering/configuration.hpp b/include/irods/private/storage_tiering/configuration.hpp index 7761bbc..05a99ad 100644 --- a/include/irods/private/storage_tiering/configuration.hpp +++ b/include/irods/private/storage_tiering/configuration.hpp @@ -5,8 +5,8 @@ #include namespace irods { - struct storage_tiering_configuration { - std::string access_time_attribute{"irods::access_time"}; + struct storage_tiering_configuration + { std::string group_attribute{"irods::storage_tiering::group"}; std::string time_attribute{"irods::storage_tiering::time"}; std::string query_attribute{"irods::storage_tiering::query"}; diff --git a/include/irods/private/storage_tiering/storage_tiering.hpp b/include/irods/private/storage_tiering/storage_tiering.hpp index 3ea8b7e..e487428 100644 --- a/include/irods/private/storage_tiering/storage_tiering.hpp +++ b/include/irods/private/storage_tiering/storage_tiering.hpp @@ -61,8 +61,6 @@ namespace irods { std::string make_partial_list(resource_index_map::iterator _itr, resource_index_map::iterator _end); - void update_access_time_for_data_object(const std::string& _object_path); - std::string get_metadata_for_data_object(RcComm* _comm, const std::string& _meta_attr_name, const std::string& _object_path); diff --git a/packaging/test_plugin_unified_storage_tiering.py b/packaging/test_plugin_unified_storage_tiering.py index a9f0bb4..14560fa 100644 --- a/packaging/test_plugin_unified_storage_tiering.py +++ b/packaging/test_plugin_unified_storage_tiering.py @@ -27,7 +27,6 @@ def storage_tiering_configured_custom(arg=None, sleep_time=1): "instance_name" : "irods_rule_engine_plugin-unified_storage_tiering-instance", "plugin_name" : "irods_rule_engine_plugin-unified_storage_tiering", "plugin_specific_configuration" : { - "access_time_attribute" : "irods::custom_access_time", "group_attribute" : "irods::custom_storage_tiering::group", "time_attribute" : "irods::custom_storage_tiering::time", 
"query_attribute" : "irods::custom_storage_tiering::query", @@ -194,25 +193,6 @@ def get_tracked_replica(session, logical_path, group_attribute_name=None): return session.run_icommand(['iquest', '%s', tracked_replica_query])[0].strip() -def get_access_time(session, data_object_path): - """Return value of AVU with attribute irods::access_time annotated on provided data_object_path. - - If the provided data object path does not exist or does not have an irods::access_time AVU, the output will contain - CAT_NO_ROWS_FOUND. - - Arguments: - session - iRODSSession which will run the query - data_object_path - Full iRODS logical path to a data object - """ - coll_name = os.path.dirname(data_object_path) - data_name = os.path.basename(data_object_path) - - query = "select META_DATA_ATTR_VALUE where " \ - f"COLL_NAME = '{coll_name}' and DATA_NAME = '{data_name}' and " \ - "META_DATA_ATTR_NAME = 'irods::access_time'" - - return session.assert_icommand(['iquest', '%s', query], 'STDOUT')[1].strip() - class TestStorageTieringPlugin(ResourceBase, unittest.TestCase): def setUp(self): @@ -240,7 +220,7 @@ def setUp(self): admin_session.assert_icommand('imeta add -R rnd2 irods::storage_tiering::group example_group 2') admin_session.assert_icommand('imeta add -R rnd0 irods::storage_tiering::time 5') admin_session.assert_icommand('imeta add -R rnd1 irods::storage_tiering::time 15') - admin_session.assert_icommand('''imeta set -R rnd1 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where RESC_NAME = 'ufs2' || = 'ufs3' and META_DATA_ATTR_NAME = 'irods::access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"''') + admin_session.assert_icommand('''imeta set -R rnd1 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where RESC_NAME = 'ufs2' || = 'ufs3' and DATA_ACCESS_TIME < 'TIME_CHECK_STRING' and DATA_MODIFY_TIME < 'TIME_CHECK_STRING'"''') admin_session.assert_icommand('imeta add -R rnd0 
irods::storage_tiering::minimum_delay_time_in_seconds 1') admin_session.assert_icommand('imeta add -R rnd0 irods::storage_tiering::maximum_delay_time_in_seconds 2') admin_session.assert_icommand('imeta add -R rnd1 irods::storage_tiering::minimum_delay_time_in_seconds 1') @@ -282,9 +262,23 @@ def test_put_and_get(self): alice_session.assert_icommand('iput -R rnd0 ' + filename) alice_session.assert_icommand('imeta ls -d ' + filename, 'STDOUT_SINGLELINE', filename) alice_session.assert_icommand('ils -L ' + filename, 'STDOUT_SINGLELINE', filename) - time.sleep(5) + + # sleep a little less than irods::storage_tiering::time + time.sleep(2) + # touch the file again, updating the mtime only + alice_session.assert_icommand('itouch ' + filename) + + # test updated mtime prevents staging to tier 1 + time.sleep(3) + invoke_storage_tiering_rule() + wait_for_empty_queue( + lambda: alice_session.assert_icommand('ils -L ' + filename, 'STDOUT_SINGLELINE', 'rnd0') + ) + time.sleep(2) # just to make sure + alice_session.assert_icommand('ils -L ' + filename, 'STDOUT_SINGLELINE', 'rnd0') # test stage to tier 1 + time.sleep(10) invoke_storage_tiering_rule() delay_assert_icommand(alice_session, 'ils -L ' + filename, 'STDOUT_SINGLELINE', 'rnd1') @@ -403,56 +397,6 @@ def test_single_quote_data_name__127(self): finally: alice_session.assert_icommand('irm -f ' + cmd_filename) - def test_storage_tiering_sets_admin_keyword_when_updating_access_time_as_rodsadmin__222(self): - with storage_tiering_configured(): - with session.make_session_for_existing_admin() as admin_session: - zone_name = IrodsConfig().client_environment['irods_zone_name'] - - with session.make_session_for_existing_user('alice', 'apass', lib.get_hostname(), zone_name) as alice_session: - resc_name = 'storage_tiering_ufs_222' - filename = 'test_file_issue_222' - - try: - lib.create_local_testfile(filename) - alice_session.assert_icommand(f'iput -R rnd0 {filename}') - alice_session.assert_icommand(f'imeta ls -d {filename}', 
'STDOUT_SINGLELINE', filename) - alice_session.assert_icommand(f'ils -L {filename}', 'STDOUT_SINGLELINE', filename) - time.sleep(5) - - # test stage to tier 1. - invoke_storage_tiering_rule() - delay_assert_icommand(alice_session, f'ils -L {filename}', 'STDOUT_SINGLELINE', 'rnd1') - - # test stage to tier 2. - time.sleep(15) - invoke_storage_tiering_rule() - delay_assert_icommand(alice_session, f'ils -L {filename}', 'STDOUT_SINGLELINE', 'rnd2') - - # capture the access time. - _, out, _ = admin_session.assert_icommand( - ['iquest', '%s', f"select META_DATA_ATTR_VALUE where DATA_NAME = '{filename}' and META_DATA_ATTR_NAME = 'irods::access_time'"], 'STDOUT') - access_time = out.strip() - self.assertGreater(len(access_time), 0) - - # sleeping guarantees the access time will be different following the call to irepl. - time.sleep(2) - - # show the access time is updated correctly. - lib.create_ufs_resource(admin_session, resc_name) - admin_session.assert_icommand(f'irepl -M -R {resc_name} {alice_session.home_collection}/{filename}') - - _, out, _ = admin_session.assert_icommand( - ['iquest', '%s', f"select META_DATA_ATTR_VALUE where DATA_NAME = '{filename}' and META_DATA_ATTR_NAME = 'irods::access_time'"], 'STDOUT') - new_access_time = out.strip() - self.assertGreater(len(new_access_time), 0) - - # this assertion is the primary focus of the test. 
- self.assertGreater(int(new_access_time), int(access_time)) - - finally: - alice_session.assert_icommand(f'irm -f {filename}') - admin_session.assert_icommand(f'iadmin rmresc {resc_name}') - def test_checksum_verification_with_regular_user_data_object__issue_354(self): """This tests the fix for the issue where admin-initiated tiering with checksum verification would fail with permission denied when trying to compute/update @@ -519,7 +463,7 @@ def setUp(self): admin_session.assert_icommand('imeta add -R rnd2 irods::storage_tiering::group example_group 2') admin_session.assert_icommand('imeta add -R rnd0 irods::storage_tiering::time 5') admin_session.assert_icommand('imeta add -R rnd1 irods::storage_tiering::time 15') - admin_session.assert_icommand('''imeta set -R rnd1 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where RESC_NAME = 'ufs2' || = 'ufs3' and META_DATA_ATTR_NAME = 'irods::access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"''') + admin_session.assert_icommand('''imeta set -R rnd1 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where RESC_NAME = 'ufs2' || = 'ufs3' and DATA_ACCESS_TIME < 'TIME_CHECK_STRING' and DATA_MODIFY_TIME < 'TIME_CHECK_STRING'"''') admin_session.assert_icommand('iadmin mkresc ufs0g2 unixfilesystem '+test.settings.HOSTNAME_1 +':/tmp/irods/ufs0g2', 'STDOUT_SINGLELINE', 'unixfilesystem') admin_session.assert_icommand('iadmin mkresc ufs1g2 unixfilesystem '+test.settings.HOSTNAME_1 +':/tmp/irods/ufs1g2', 'STDOUT_SINGLELINE', 'unixfilesystem') @@ -532,7 +476,7 @@ def setUp(self): admin_session.assert_icommand('imeta add -R ufs0g2 irods::storage_tiering::time 5') admin_session.assert_icommand('imeta add -R ufs1g2 irods::storage_tiering::time 15') - admin_session.assert_icommand('''imeta set -R ufs1g2 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where RESC_NAME = 'ufs1g2' and 
META_DATA_ATTR_NAME = 'irods::access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"''') + admin_session.assert_icommand('''imeta set -R ufs1g2 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where RESC_NAME = 'ufs1g2' and DATA_ACCESS_TIME < 'TIME_CHECK_STRING' and DATA_MODIFY_TIME < 'TIME_CHECK_STRING'"''') admin_session.assert_icommand('imeta add -R rnd0 irods::storage_tiering::minimum_delay_time_in_seconds 1') admin_session.assert_icommand('imeta add -R rnd0 irods::storage_tiering::maximum_delay_time_in_seconds 2') admin_session.assert_icommand('imeta add -R rnd1 irods::storage_tiering::minimum_delay_time_in_seconds 1') @@ -635,7 +579,7 @@ def setUp(self): admin_session.assert_icommand('imeta add -R rnd2 irods::custom_storage_tiering::group example_group 2') admin_session.assert_icommand('imeta add -R rnd0 irods::custom_storage_tiering::time 5') admin_session.assert_icommand('imeta add -R rnd1 irods::custom_storage_tiering::time 15') - admin_session.assert_icommand('''imeta set -R rnd1 irods::custom_storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where RESC_NAME = 'ufs2' || = 'ufs3' and META_DATA_ATTR_NAME = 'irods::custom_access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"''') + admin_session.assert_icommand('''imeta set -R rnd1 irods::custom_storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where RESC_NAME = 'ufs2' || = 'ufs3' and DATA_ACCESS_TIME < 'TIME_CHECK_STRING' and DATA_MODIFY_TIME < 'TIME_CHECK_STRING'"''') admin_session.assert_icommand('imeta add -R rnd0 irods::storage_tiering::minimum_delay_time_in_seconds 1') admin_session.assert_icommand('imeta add -R rnd0 irods::storage_tiering::maximum_delay_time_in_seconds 2') admin_session.assert_icommand('imeta add -R rnd1 irods::storage_tiering::minimum_delay_time_in_seconds 1') @@ -898,12 +842,10 @@ def creating_data_object_does_not_trigger_restage_test_impl( # first 
to ensure that the delay rule was not scheduled and processed by the time we check. admin_session.assert_icommand(["iqstat", "-a"], "STDOUT", "No delayed rules pending") - # Then, make sure that the object is in the correct tier and has an access_time. + # Then, make sure that the object is in the correct tier. self.assertTrue( lib.replica_exists_on_resource(admin_session, logical_path, expected_destination_resource) ) - access_time = get_access_time(admin_session, logical_path) - self.assertNotIn("CAT_NO_ROWS_FOUND", access_time) finally: self.user1.assert_icommand(["irm", "-f", logical_path]) @@ -1507,7 +1449,7 @@ def setUp(self): admin_session.assert_icommand('imeta add -R ufs0 irods::storage_tiering::time 15') - admin_session.assert_icommand('''imeta add -R ufs0 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where RESC_NAME = 'ufs0' and META_DATA_ATTR_NAME = 'irods::access_time' and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"''') + admin_session.assert_icommand('''imeta add -R ufs0 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where RESC_NAME = 'ufs0' and DATA_ACCESS_TIME < 'TIME_CHECK_STRING' and DATA_MODIFY_TIME < 'TIME_CHECK_STRING'"''') admin_session.assert_icommand('''imeta add -R ufs0 irods::storage_tiering::query archive_query specific''') admin_session.assert_icommand('imeta add -R ufs0 irods::storage_tiering::minimum_delay_time_in_seconds 1') admin_session.assert_icommand('imeta add -R ufs0 irods::storage_tiering::maximum_delay_time_in_seconds 2') @@ -1533,10 +1475,8 @@ def test_put_and_get(self): try: admin_session.assert_icommand('iput -R ufs0 ' + filename) admin_session.assert_icommand('imeta add -d ' + filename + ' archive_object yes') - admin_session.assert_icommand('imeta ls -d ' + filename, 'STDOUT_SINGLELINE', 'irods::access_time') admin_session.assert_icommand('iput -R ufs0 ' + filename + ' ' + filename2) - admin_session.assert_icommand('imeta ls 
-d ' + filename2, 'STDOUT_SINGLELINE', 'irods::access_time') # test stage to tier 1 invoke_storage_tiering_rule() @@ -1817,8 +1757,7 @@ def test_put_and_get(self): # Wait until the object migrates to the next tier. lib.delayAssert( lambda: lib.replica_exists_on_resource(admin_session, logical_path, "ufs0") == False) - lib.replica_exists_on_resource(admin_session, logical_path, "ufs2") - admin_session.assert_icommand('imeta ls -d '+filename, 'STDOUT_SINGLELINE', '--') + self.assertTrue(lib.replica_exists_on_resource(admin_session, logical_path, "ufs2")) # test restage to tier 0 admin_session.assert_icommand('iget ' + filename + ' - ', 'STDOUT_SINGLELINE', 'TESTFILE') @@ -1869,8 +1808,8 @@ def do_incorrect_violating_query_test(self, columns_to_select): other_resource = 'ufs1' query_attribute_name = 'irods::storage_tiering::query' custom_violating_query = '''"SELECT {} where RESC_NAME = '{}' ''' \ - '''and META_DATA_ATTR_NAME = 'irods::access_time' ''' \ - '''and META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"'''.format( + '''and DATA_ACCESS_TIME < 'TIME_CHECK_STRING' ''' \ + '''and DATA_MODIFY_TIME < 'TIME_CHECK_STRING'"'''.format( columns_to_select, resource) try: @@ -2170,88 +2109,6 @@ def test_one_rodsuser_and_one_rodsadmin_with_own_permission_succeeds__issue_273( ) -class test_accessing_read_only_object_updates_access_time(unittest.TestCase): - @classmethod - def setUpClass(self): - self.user1 = session.mkuser_and_return_session("rodsuser", "tolstoy", "tpass", lib.get_hostname()) - - self.filename = "test_accessing_read_only_object_updates_access_time" - if not os.path.exists(self.filename): - lib.create_local_testfile(self.filename) - - self.collection_path = "/".join(["/" + self.user1.zone_name, "public_collection"]) - self.object_path = "/".join([self.collection_path, self.filename]) - - with session.make_session_for_existing_admin() as admin_session: - # Make a place for public group to put stuff. 
- admin_session.assert_icommand(["imkdir", "-p", self.collection_path]) - admin_session.assert_icommand(["ichmod", "-r", "own", "public", self.collection_path]) - - with storage_tiering_configured(): - # TODO(#200): Replace with itouch or istream. Have to use put API due to missing PEP support. - # For this test, we don't actually care about tiering or restaging objects. We just want to test - # updating the access_time metadata. So, it doesn't matter into what resource the object's replica goes - # This is why no tier groups are being configured in this test. - admin_session.assert_icommand(["iput", self.filename, self.object_path]) - - # Give permissions exclusively to a rodsuser (removing permissions for original owner). - admin_session.assert_icommand(["ichmod", "read", self.user1.username, self.object_path]) - admin_session.assert_icommand(["ichmod", "null", admin_session.username, self.object_path]) - - if os.path.exists(self.filename): - os.unlink(self.filename) - - @classmethod - def tearDownClass(self): - with session.make_session_for_existing_admin() as admin_session: - admin_session.run_icommand(["ichmod", "-M", "own", admin_session.username, self.object_path]) - admin_session.run_icommand(["irm", "-f", self.object_path]) - - self.user1.__exit__() - - admin_session.run_icommand(['iadmin', 'rmuser', self.user1.username]) - admin_session.run_icommand(['iadmin', 'rum']) - - def read_object_updates_access_time_test_impl(self, read_command, *read_command_args): - """A basic test implementation to show that access_time metadata is updated for reads accessing data. - - Arguments: - self - Instance of this class - read_command - A callable which will execute some sort of read operation on the object - read_command_args - *args to pass to the read_command callable - """ - with storage_tiering_configured(): - # Capture the original access time so we have something against which to compare. 
- access_time = get_access_time(self.user1, self.object_path) - self.assertNotIn("CAT_NO_ROWS_FOUND", access_time) - - # Sleeping guarantees the access time will be different following the access. - time.sleep(2) - - # Access the data... - read_command(*read_command_args) - - # Ensure the access_time was updated as a result of the access. - new_access_time = get_access_time(self.user1, self.object_path) - self.assertNotIn("CAT_NO_ROWS_FOUND", new_access_time) - self.assertGreater(new_access_time, access_time) - - def test_dataobj_open_read_close_updates_access_time__issue_175_203(self): - # This basic test shows that access_time metadata is updated when dataObjOpen/Read/Close APIs access the data. - self.read_object_updates_access_time_test_impl( - self.user1.assert_icommand, ["irods_test_read_object", self.object_path], "STDOUT") - - def test_replica_open_close_updates_access_time(self): - # This basic test shows that access_time metadata is updated when replica_open/close APIs access the data. - self.read_object_updates_access_time_test_impl( - self.user1.assert_icommand, ["istream", "read", self.object_path], "STDOUT") - - def test_get_updates_access_time(self): - # This basic test shows that access_time metadata is updated when get API accesses the data. 
- self.read_object_updates_access_time_test_impl( - self.user1.assert_icommand, ["iget", self.object_path, "-"], "STDOUT") - - class test_basic_tier_out_after_creating_single_data_object(unittest.TestCase): @classmethod def setUpClass(self): @@ -2436,95 +2293,3 @@ def test_data_object_with_select_in_name_tiers_out__issue_281(self): self.user1.assert_icommand(["ils", "-L", logical_path], "STDOUT") self.user1.run_icommand(["irm", "-f", logical_path]) - -class test_accessing_object_for_write_updates_access_time(unittest.TestCase): - @classmethod - def setUpClass(self): - self.user1 = session.mkuser_and_return_session("rodsuser", "tolstoy", "tpass", lib.get_hostname()) - - self.filename = "test_accessing_object_for_write_updates_access_time" - - self.collection_path = "/".join(["/" + self.user1.zone_name, "public_collection"]) - self.object_path = "/".join([self.collection_path, self.filename]) - - with session.make_session_for_existing_admin() as admin_session: - # Make a place for public group to put stuff. - admin_session.assert_icommand(["imkdir", "-p", self.collection_path]) - admin_session.assert_icommand(["ichmod", "-r", "own", "public", self.collection_path]) - - with storage_tiering_configured(): - # For this test, we don't actually care about tiering or restaging objects. We just want to test - # updating the access_time metadata. So, it doesn't matter into what resource the object's replica goes - # This is why no tier groups are being configured in this test. - admin_session.assert_icommand(["istream", "write", self.object_path], input=self.filename) - - # Give permissions exclusively to a rodsuser (removing permissions for original owner). 
- admin_session.assert_icommand(["ichmod", "own", self.user1.username, self.object_path]) - admin_session.assert_icommand(["ichmod", "null", admin_session.username, self.object_path]) - - if os.path.exists(self.filename): - os.unlink(self.filename) - - @classmethod - def tearDownClass(self): - with session.make_session_for_existing_admin() as admin_session: - admin_session.run_icommand(["ichmod", "-r", "-M", "own", admin_session.username, self.collection_path]) - admin_session.run_icommand(["irm", "-rf", self.collection_path]) - - self.user1.__exit__() - - admin_session.run_icommand(['iadmin', 'rmuser', self.user1.username]) - admin_session.run_icommand(['iadmin', 'rum']) - - def access_object_for_write_updates_access_time_test_impl(self, open_command, *open_command_args): - """A basic test implementation to show that access_time metadata is updated for writes accessing data. - - Arguments: - self - Instance of this class - open_command - A callable which will execute some sort of open operation on the object - open_command_args - *args to pass to the open_command callable - """ - with storage_tiering_configured(): - # Capture the original access time so we have something against which to compare. - access_time = get_access_time(self.user1, self.object_path) - self.assertNotIn("CAT_NO_ROWS_FOUND", access_time) - - # Sleeping guarantees the access time will be different following the access. - time.sleep(2) - - # Access the data... - open_command(*open_command_args) - - # Ensure the access_time was updated as a result of the access. - new_access_time = get_access_time(self.user1, self.object_path) - self.assertNotIn("CAT_NO_ROWS_FOUND", new_access_time) - self.assertGreater(new_access_time, access_time) - - def test_multiple_replica_opens_and_replica_closes_updates_access_time__issue_316(self): - # This test shows that access_time is updated if replica_open accesses a replica multiple times simultaneously. 
- self.access_object_for_write_updates_access_time_test_impl( - self.user1.assert_icommand, - ["irods_test_multi_open_for_write_object", "--open-count", "10", self.object_path], - "STDOUT") - - def test_touch_updates_access_time__issue_266(self): - # This is a basic test to show that access_time metadata is updated when touch API accesses the data. - self.access_object_for_write_updates_access_time_test_impl( - self.user1.assert_icommand, ["itouch", self.object_path], "STDOUT") - - def test_touch_collection_does_not_update_access_time__issue_266(self): - with storage_tiering_configured(): - # Capture the original access time so we have something against which to compare. - access_time = get_access_time(self.user1, self.object_path) - self.assertNotIn("CAT_NO_ROWS_FOUND", access_time) - - # Sleeping guarantees the access time could be different following the access. - time.sleep(2) - - # Touch the collection (note: this is NOT accessing any data). - self.user1.assert_icommand(["itouch", self.collection_path], "STDOUT") - - # Ensure the access_time was NOT updated as a result of the access. - new_access_time = get_access_time(self.user1, self.object_path) - self.assertNotIn("CAT_NO_ROWS_FOUND", new_access_time) - self.assertEqual(new_access_time, access_time) diff --git a/src/configuration.cpp b/src/configuration.cpp index 22639ab..20f3a3a 100644 --- a/src/configuration.cpp +++ b/src/configuration.cpp @@ -30,11 +30,6 @@ namespace irods } // Override defaults with configured values. 
- - if (const auto attr = config->find("access_time_attribute"); attr != config->end()) { - access_time_attribute = attr->get(); - } - if (const auto attr = config->find("group_attribute"); attr != config->end()) { group_attribute = attr->get(); } diff --git a/src/main.cpp b/src/main.cpp index 8764c3c..4cf98ed 100644 --- a/src/main.cpp +++ b/src/main.cpp @@ -5,8 +5,10 @@ #include #include #include +#include #include #include +#include #include #include #include @@ -31,6 +33,7 @@ // =-=-=-=-=-=-=- // stl includes #include +#include #include #include #include @@ -179,220 +182,6 @@ namespace { } // apply_data_retention_policy - void update_access_time_for_data_object(rcComm_t* _comm, - const std::string& _logical_path, - const std::string& _attribute) - { - auto ts = std::to_string(std::time(nullptr)); - modAVUMetadataInp_t avuOp{ - "set", - "-d", - const_cast(_logical_path.c_str()), - const_cast(_attribute.c_str()), - const_cast(ts.c_str()), - ""}; - const auto free_cond_input = irods::at_scope_exit{[&avuOp] { clearKeyVal(&avuOp.condInput); }}; - - addKeyVal(&avuOp.condInput, ADMIN_KW, ""); - - auto status = rcModAVUMetadata(_comm, &avuOp); - if (status < 0) { - const auto msg = fmt::format("{}: failed to set access time for [{}]", __func__, _logical_path); - log_re::error(msg); - THROW(status, msg); - } - } // update_access_time_for_data_object - - void apply_access_time_to_collection(rcComm_t* _comm, int _handle, const std::string& _attribute) - { - collEnt_t* coll_ent{nullptr}; - int err = rcReadCollection(_comm, _handle, &coll_ent); - while(err >= 0) { - if(DATA_OBJ_T == coll_ent->objType) { - const auto& vps = irods::get_virtual_path_separator(); - std::string lp{coll_ent->collName}; - lp += vps; - lp += coll_ent->dataName; - update_access_time_for_data_object(_comm, lp, _attribute); - } - else if(COLL_OBJ_T == coll_ent->objType) { - collInp_t coll_inp; - memset(&coll_inp, 0, sizeof(coll_inp)); - rstrcpy( - coll_inp.collName, - coll_ent->collName, - 
MAX_NAME_LEN); - int handle = rcOpenCollection(_comm, &coll_inp); - apply_access_time_to_collection(_comm, handle, _attribute); - rcCloseCollection(_comm, handle); - } - - err = rcReadCollection(_comm, _handle, &coll_ent); - } // while - } // apply_access_time_to_collection - - void set_access_time_metadata( - rsComm_t* _comm, - const std::string& _object_path, - const std::string& _collection_type, - const std::string& _attribute) { - irods::experimental::client_connection conn; - RcComm& comm = static_cast(conn); - if(_collection_type.size() == 0) { - update_access_time_for_data_object(&comm, _object_path, _attribute); - } - else { - // register a collection - collInp_t coll_inp; - memset(&coll_inp, 0, sizeof(coll_inp)); - rstrcpy( - coll_inp.collName, - _object_path.c_str(), - MAX_NAME_LEN); - int handle = rcOpenCollection(&comm, &coll_inp); - if(handle < 0) { - THROW( - handle, - boost::format("failed to open collection [%s]") % - _object_path); - } - - apply_access_time_to_collection(&comm, handle, _attribute); - } - } // set_access_time_metadata - - void apply_access_time_policy( - const std::string& _rn, - ruleExecInfo_t* _rei, - const std::list& _args) { - namespace fs = irods::experimental::filesystem; - - try { - if ("pep_api_data_obj_put_post" == _rn || "pep_api_data_obj_get_post" == _rn || - "pep_api_data_obj_repl_post" == _rn || "pep_api_phy_path_reg_post" == _rn) - { - auto it = _args.begin(); - std::advance(it, 2); - if(_args.end() == it) { - rodsLog(LOG_ERROR, "%s:%d: Invalid number of arguments [PEP=%s].", __func__, __LINE__, _rn.c_str()); - THROW( - SYS_INVALID_INPUT_PARAM, - "invalid number of arguments"); - } - - auto obj_inp = boost::any_cast(*it); - const char* coll_type_ptr = getValByKey(&obj_inp->condInput, COLLECTION_KW); - - std::string object_path{obj_inp->objPath}; - std::string coll_type{}; - if(coll_type_ptr) { - coll_type = "true"; - } - - set_access_time_metadata(_rei->rsComm, object_path, coll_type, config->access_time_attribute); - 
} - else if ("pep_api_touch_post" == _rn) { - auto it = _args.begin(); - std::advance(it, 2); - if (_args.end() == it) { - log_re::error("{}:{}: Invalid number of arguments [PEP={}].", __func__, __LINE__, _rn.c_str()); - THROW(SYS_INVALID_INPUT_PARAM, "invalid number of arguments"); - } - - const auto* inp = boost::any_cast(*it); - const auto json_input = nlohmann::json::parse(std::string_view(static_cast(inp->buf), inp->len)); - - const auto& object_path = json_input.at("logical_path").get_ref(); - - // The touch only affects the collection itself and does not access the objects or collections within. - // Therefore, no access_time update occurs if the touch was on a collection. Just return early. - if (fs::server::is_collection(*_rei->rsComm, fs::path{object_path})) { - return; - } - - set_access_time_metadata(_rei->rsComm, object_path, "", config->access_time_attribute); - } - else if ("pep_api_data_obj_open_post" == _rn || "pep_api_data_obj_create_post" == _rn || - "pep_api_replica_open_post" == _rn) - { - auto it = _args.begin(); - std::advance(it, 2); - if(_args.end() == it) { - THROW( - SYS_INVALID_INPUT_PARAM, - "invalid number of arguments"); - } - - auto obj_inp = boost::any_cast(*it); - int l1_idx{}; - std::string resource_name; - try { - auto [l1_idx, resource_name] = get_index_and_resource(obj_inp); - opened_objects[l1_idx] = std::make_tuple(obj_inp->objPath, resource_name); - } - catch(const irods::exception& _e) { - rodsLog( - LOG_ERROR, - "get_index_and_resource failed for [%s]", - obj_inp->objPath); - } - } - else if("pep_api_data_obj_close_post" == _rn) { - //TODO :: only for create/write events - auto it = _args.begin(); - std::advance(it, 2); - if(_args.end() == it) { - THROW( - SYS_INVALID_INPUT_PARAM, - "invalid number of arguments"); - } - - const auto opened_inp = boost::any_cast(*it); - const auto l1_idx = opened_inp->l1descInx; - if(opened_objects.find(l1_idx) != opened_objects.end()) { - auto [object_path, resource_name] = 
opened_objects[l1_idx]; - - set_access_time_metadata(_rei->rsComm, object_path, "", config->access_time_attribute); - } - } - else if ("pep_api_replica_close_post" == _rn) { - auto it = _args.begin(); - std::advance(it, 2); - if (_args.end() == it) { - THROW(SYS_INVALID_INPUT_PARAM, "invalid number of arguments"); - } - - const auto* inp = boost::any_cast(*it); - const auto json_input = nlohmann::json::parse(std::string_view(static_cast(inp->buf), inp->len)); - - // replica_close can be called multiple times on the same data object when a parallel transfer is being - // executed. Only one of these replica_close calls will finalize the status of the data object. The - // finalizing occurs by default, so the caller must provide the "update_status" member with a value of - // false in order to not finalize the data object. If no such member is found or it is not false, then - // we update the access_time. Else, we do not want to update the access_time for each replica_close - // call, so just return early if this is found. 
- if (const auto update_status_iter = json_input.find("update_status"); - json_input.end() != update_status_iter) { - if (const auto update_status = update_status_iter->get(); !update_status) { - return; - } - } - - const auto l1_idx = json_input.at("fd").get(); - const auto opened_objects_iter = opened_objects.find(l1_idx); - if (opened_objects_iter != opened_objects.end()) { - auto [object_path, resource_name] = std::get<1>(*opened_objects_iter); - set_access_time_metadata(_rei->rsComm, object_path, "", config->access_time_attribute); - } - } - } catch( const boost::bad_any_cast&) { - // do nothing - no object to annotate - } - catch (const nlohmann::json::exception& e) { - THROW(SYS_LIBRARY_ERROR, fmt::format("{}: JSON exception caught: {}", __func__, e.what())); - } - } // apply_access_time_policy - int apply_data_movement_policy( rcComm_t* _comm, const std::string& _instance_name, @@ -640,7 +429,6 @@ irods::error exec_rule( } try { - apply_access_time_policy(_rn, rei, _args); apply_restage_movement_policy(_rn, rei, _args); } catch(const std::invalid_argument& _e) { diff --git a/src/storage_tiering.cpp b/src/storage_tiering.cpp index d2792dc..9c92a33 100644 --- a/src/storage_tiering.cpp +++ b/src/storage_tiering.cpp @@ -447,7 +447,9 @@ namespace irods { config_.time_attribute, _resource_name); std::time_t offset = boost::lexical_cast(offset_str); - return std::to_string(now - offset); + // Zero-pad to 11 characters to match iRODS DATA_ACCESS_TIME format (getNowStr()). + // This ensures string-based GenQuery comparisons work correctly. 
+ return fmt::format("{:011d}", now - offset); } catch(const boost::bad_lexical_cast& _e) { THROW( @@ -480,20 +482,15 @@ namespace irods { for(auto& q_itr : results) { auto& query_string = q_itr.first; auto& query_type_str = q_itr.second; - size_t start_pos = query_string.find(config_.time_check_string); - if(start_pos != std::string::npos) { - query_string.replace( - start_pos, - config_.time_check_string.length(), - tier_time); - } - rodsLog( - config_.data_transfer_log_level_value, - "custom query for [%s] - [%s], [%s]", - _resource_name.c_str(), - query_string.c_str(), - query_type_str.c_str()); + // replace all occurrences of time_check_string + boost::replace_all(query_string, config_.time_check_string, tier_time); + + rodsLog(config_.data_transfer_log_level_value, + "custom query for [%s] - [%s], [%s]", + _resource_name.c_str(), + query_string.c_str(), + query_type_str.c_str()); } // for return results; @@ -502,13 +499,11 @@ namespace irods { const auto leaf_str = get_leaf_resources_string(_resource_name); metadata_results results; results.push_back(std::make_pair( - fmt::format( - "select DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where META_DATA_ATTR_NAME = '{}' " - "and META_DATA_ATTR_VALUE < '{}' and META_DATA_ATTR_UNITS <> '{}' and DATA_RESC_ID in ({})", - config_.access_time_attribute, - tier_time, - config_.migration_scheduled_flag, - leaf_str), + fmt::format("select DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM where " + "DATA_ACCESS_TIME < '{}' and DATA_MODIFY_TIME < '{}' and DATA_RESC_ID in ({})", + tier_time, + tier_time, + leaf_str), "")); rodsLog( config_.data_transfer_log_level_value, @@ -975,18 +970,12 @@ namespace irods { void storage_tiering::set_migration_metadata_flag_for_object( rcComm_t* _comm, const std::string& _object_path) { - auto access_time = get_metadata_for_data_object( - _comm, - config_.access_time_attribute, - _object_path); - - modAVUMetadataInp_t set_op{ - "set", - "-d", - 
const_cast<char*>(_object_path.c_str()), - const_cast<char*>(config_.access_time_attribute.c_str()), - const_cast<char*>(access_time.c_str()), - const_cast<char*>(config_.migration_scheduled_flag.c_str())}; + modAVUMetadataInp_t set_op{"set", + "-d", + const_cast<char*>(_object_path.c_str()), + const_cast<char*>(config_.migration_scheduled_flag.c_str()), + "1", // the value is not important, but must match the unset/rm operation + ""}; addKeyVal(&set_op.condInput, ADMIN_KW, ""); @@ -1000,17 +989,12 @@ namespace irods { void storage_tiering::unset_migration_metadata_flag_for_object( rcComm_t* _comm, const std::string& _object_path) { - auto access_time = get_metadata_for_data_object( - _comm, - config_.access_time_attribute, - _object_path); - modAVUMetadataInp_t set_op{ - "set", - "-d", - const_cast<char*>(_object_path.c_str()), - const_cast<char*>(config_.access_time_attribute.c_str()), - const_cast<char*>(access_time.c_str()), - nullptr}; + modAVUMetadataInp_t set_op{"rm", + "-d", + const_cast<char*>(_object_path.c_str()), + const_cast<char*>(config_.migration_scheduled_flag.c_str()), + "1", // the value is not important, but must match the set operation + ""}; addKeyVal(&set_op.condInput, ADMIN_KW, ""); @@ -1029,12 +1013,12 @@ namespace irods { std::string coll_name = p.parent_path().string(); std::string data_name = p.filename().string(); + // just checks the presence of the attribute/flag - not the value const auto query_str = fmt::format("select META_DATA_ATTR_VALUE where META_DATA_ATTR_NAME = '{}' and " - "META_DATA_ATTR_UNITS = '{}' and DATA_NAME = '{}' and COLL_NAME = '{}'", - config_.access_time_attribute, + "COLL_NAME = '{}' and DATA_NAME = '{}'", config_.migration_scheduled_flag, - data_name, - coll_name); + coll_name, + data_name); query qobj{_comm, query_str, 1}; return qobj.size() > 0;
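
The change to the tier time computation in `src/storage_tiering.cpp` replaces `std::to_string(now - offset)` with a zero-padded, 11-character string because the violating-objects query compares `DATA_ACCESS_TIME` and `DATA_MODIFY_TIME` as strings. The following is a minimal standalone sketch of why the padding matters. It uses only the standard library rather than `fmt`, and the helper names `pad_timestamp` and `string_less` are hypothetical and not part of the plugin.

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper (not part of the plugin): format a timestamp as a
// zero-padded, 11-character decimal string, mirroring the "{:011d}" format
// used in the diff.
std::string pad_timestamp(std::time_t t)
{
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%011lld", static_cast<long long>(t));
    return buf;
}

// Hypothetical helper: lexicographic "less than", standing in for the
// string-based comparison described in the diff comment.
bool string_less(const std::string& a, const std::string& b)
{
    return a < b;
}
```

With a stored access time of `"01700000000"` and a numerically smaller threshold of 1699990000, the unpadded threshold `"1699990000"` compares as greater than the stored value (because `'0' < '1'`), so the object would be treated as violating even though it was accessed after the threshold. The padded threshold `"01699990000"` compares correctly.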