[#746] establish default order for replicas listed by an iRODSDataObject #815

Open
d-w-moore wants to merge 7 commits into irods:main from d-w-moore:746.m

Conversation

@d-w-moore
Collaborator

@d-w-moore d-w-moore commented Apr 15, 2026

The parent data object's modify_time and replica_status fields, as well as some others, actually pertain more to individual replicas.

#747 was an old PR meant to address the issue and contains much discussion as well.

On consideration, I think a minor release is the proper place to address this, and I'm doing it by

  • setting a default sorter that, for data_objects.get (or any time the iRODSDataObject constructor runs), sorts the replicas of the data object first by replica "goodness" and second by reverse chronological order of the replica modify_time (i.e. most recent first). The replica at array position [0] will then determine the values of the fields discussed above.
  • deciding, for the time being, not to deprecate anything yet. To me it makes natural sense to allow modify_time and replica_status to be accessed from the "head" object.
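
The ordering described in the first bullet can be sketched as a sort key. This is only an illustration, not the code in this PR: `Replica` is a hypothetical stand-in for the client's replica objects, and the sketch assumes that "goodness" means a replica status of "1" (good) as opposed to "0" (stale).

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Replica:
    """Hypothetical stand-in for a replica record; not the PRC class."""
    number: int
    status: str            # assumption: "1" = good, "0" = stale
    modify_time: datetime


def good_then_newest(replica):
    """Sort key: good replicas first, then most recently modified first."""
    is_good = replica.status == "1"
    # False sorts before True, so use "not is_good"; negate the timestamp
    # so that later modification times sort earlier.
    return (not is_good, -replica.modify_time.timestamp())


def utc(day):
    """Build a timezone-aware timestamp for the sample data."""
    return datetime(2026, 1, day, tzinfo=timezone.utc)


replicas = [
    Replica(number=0, status="0", modify_time=utc(3)),  # stale, newest
    Replica(number=1, status="1", modify_time=utc(1)),  # good, oldest
    Replica(number=2, status="1", modify_time=utc(2)),  # good, newer
]

ordered = sorted(replicas, key=good_then_newest)
head = ordered[0]  # the replica whose fields the data object would mirror
```

With this key, a stale replica never lands at position [0] while a good one exists, even if the stale replica was modified more recently.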

So this PR replaces the old one, #747, because it is new work and is based on source code that has since been ruff-formatted.

@korydraughn
Contributor

  • setting a default sorter that, for data_objects.get (or any time the iRODSDataObject constructor runs), sorts the replicas of the data object first by replica "goodness" and second by reverse chronological order of the replica modify_time (i.e. most recent first). The replica at array position [0] will then determine the values of the fields discussed above.

Keep in mind that for a minor release, we cannot change the behavior of any public APIs. If the default sorter results in the output being different, then that's a no-go. The default sorter must mirror the original behavior.

  • deciding, for the time being, not to deprecate anything yet.

What are you referring to in regard to deprecation?

To me it makes natural sense to allow modify_time and replica_status to be accessed from the "head" object.

What does this mean?

@d-w-moore
Collaborator Author

  • setting a default sorter that, for data_objects.get (or any time the iRODSDataObject constructor runs), sorts the replicas of the data object first by replica "goodness" and second by reverse chronological order of the replica modify_time (i.e. most recent first). The replica at array position [0] will then determine the values of the fields discussed above.

Keep in mind that for a minor release, we cannot change the behavior of any public APIs. If the default sorter results in the output being different, then that's a no-go. The default sorter must mirror the original behavior.

  • deciding, for the time being, not to deprecate anything yet.

What are you referring to in regard to deprecation?

We'd discussed in the old issue/PR conversations whether we might just deprecate the iRODSDataObject fields, such as replica_status and modify_time, that are really just a reflection of the corresponding attributes of replicas[0].

To me it makes natural sense to allow modify_time and replica_status to be accessed from the "head" object.

What does this mean?

Just that .replicas[0].FIELD is mirrored in .FIELD, but that is pretty natural.
I guess we could actually just make them properties, rather than duplicating the data. But that is low priority.
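
The property idea could look roughly like the following. It is a sketch under assumptions: `DataObjectSketch` is a hypothetical, heavily simplified class (not the real iRODSDataObject), and the replica records are plain namespaces with made-up values.

```python
from types import SimpleNamespace


class DataObjectSketch:
    """Hypothetical, simplified data object; not the real iRODSDataObject."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    @property
    def modify_time(self):
        # Delegate to the first replica instead of storing a copy.
        return self.replicas[0].modify_time

    @property
    def replica_status(self):
        # Same delegation for the status field.
        return self.replicas[0].status


obj = DataObjectSketch([
    SimpleNamespace(number=0, status="1", modify_time=100),
    SimpleNamespace(number=1, status="0", modify_time=200),
])

status_before = obj.replica_status  # read from replica 0
obj.replicas.reverse()              # reorder the list
status_after = obj.replica_status   # now read from replica 1
```

Because nothing is duplicated, the "head" fields always follow whatever replica currently sits at position [0].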

@d-w-moore
Collaborator Author

@korydraughn - I'm fine with changing the default order back to sorting on replica number for this minor release, even though that lets attributes such as dataObject.modify_time continue to misrepresent the "information advertised". It's only a minor code change to allow the application writer to sort differently if they so desire.
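
A minimal sketch of that arrangement, assuming the original behavior is ordering by replica number: the default key keeps that order, and a caller can pass a different key to opt in to another one. The names `DEFAULT_SORT_KEY` and `order_replicas` are hypothetical and are not taken from the PR.

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical default: keep ordering by replica number.
DEFAULT_SORT_KEY = attrgetter("number")


def order_replicas(replicas, sort_key=None):
    """Return replicas sorted by sort_key, or by replica number if none is given."""
    return sorted(replicas, key=sort_key or DEFAULT_SORT_KEY)


# Stand-in replica records with made-up values.
replicas = [
    SimpleNamespace(number=2, modify_time=300),
    SimpleNamespace(number=0, modify_time=100),
    SimpleNamespace(number=1, modify_time=200),
]

by_number = order_replicas(replicas)
# Opt-in alternative: most recently modified first.
by_recency = order_replicas(replicas, sort_key=lambda r: -r.modify_time)
```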

Comment on lines +3002 to +3008
# Ensure that one of the replicas is stale, to test proper sorting.
with data.open('a', **{kw.RESC_NAME_KW: newResc1}) as f:
    f.write(b'.')
time.sleep(2)

# Voting should ensure exactly two good replicas of the three.
data.replicate(resource=newResc2)
Contributor

Please add an assertion which proves a replica is stale.
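
One possible shape for such an assertion, shown on stand-in data rather than a live session. It assumes replica status "0" means stale and "1" means good; `stale_replicas` is a hypothetical helper. In the actual test the same check would run against the replicas of the refreshed data object, for example with `self.assertEqual`.

```python
from types import SimpleNamespace

STALE = "0"  # assumption: status code for a stale replica
GOOD = "1"   # assumption: status code for a good replica


def stale_replicas(replicas):
    """Return the replicas whose status marks them as stale."""
    return [r for r in replicas if r.status == STALE]


# Stand-in for the replica list after the write and the replicate call.
replicas = [
    SimpleNamespace(number=0, status=STALE),
    SimpleNamespace(number=1, status=GOOD),
    SimpleNamespace(number=2, status=GOOD),
]

stale = stale_replicas(replicas)
assert len(stale) == 1, "expected exactly one stale replica"
```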

Comment thread irods/data_object.py
self.manager = manager
if parent and results:
    self.collection = parent
results = sorted(results, key=(replica_sort_function or _DEFAULT_SORT_KEY_FN))
Contributor

I think the only way to support this in a minor release is to provide an opt-in which changes the default behavior.

test_put__issue_722(self)

def test_default_sorting_of_replicas__issue_647(self):
@unittest.skipIf(irods.version.version_as_tuple() < (4,), 'too soon for this test.')
Contributor

Consider changing the message to something like the following.

Relies on backward incompatible changes. Disabled until PRC 4


def test_default_sorting_of_replicas__issue_647(self):
@unittest.skipIf(irods.version.version_as_tuple() < (4,), 'too soon for this test.')
def test_modified_default_sorting_of_replicas__issue_647(self):
Contributor

Looks like we need another test for the sorter option?

Alternatively, you can change the behavior of the test such that it covers PRC 3 and PRC 4. For example:

if irods.version.version_as_tuple() < (4,):
    data = self.sess.data_objects.get(data.path, sorter=<fn>)
else:
    data = self.sess.data_objects.get(data.path)

Doing that implies the name of the test would need to change as well.

@korydraughn
Contributor

We'd discussed in the old issue/PR conversations whether we might just deprecate the iRODSDataObject fields, such as replica_status and modify_time, that are really just a reflection of the corresponding attributes of replicas[0].

Oh right. That still sounds like an acceptable approach.

Just that .replicas[0].FIELD is mirrored in .FIELD, but that is pretty natural. I guess we could actually just make them properties, rather than duplicating the data. But that is low priority.

I'm not yet convinced that is the proper approach. Feels like it should be handled via support functions which simplify the find-replica step.

Do instances of iRODSDataObject always have the list of replicas? If so, then they can sort/search the list of replicas for what they need. Perhaps that's how the iRODSDataObject constructor works in this PR?
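
A support function of the kind suggested above might look like this. It is a sketch only: `newest_good_replica` is a hypothetical helper, the replica records are stand-ins, and status "1" is assumed to mean good.

```python
from types import SimpleNamespace


def newest_good_replica(replicas):
    """Return the most recently modified good replica, or None if there is none."""
    good = [r for r in replicas if r.status == "1"]  # assumption: "1" = good
    return max(good, key=lambda r: r.modify_time, default=None)


# Stand-in replica records with made-up values.
replicas = [
    SimpleNamespace(number=0, status="0", modify_time=300),  # stale, newest
    SimpleNamespace(number=1, status="1", modify_time=100),
    SimpleNamespace(number=2, status="1", modify_time=200),
]

best = newest_good_replica(replicas)
none_found = newest_good_replica([replicas[0]])
```

A helper like this leaves the order of the replica list untouched, so it would not change the behavior of existing public APIs.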
