feat: Rerouted ReadRows to data client #1299
Conversation
Summary of Changes

Hello @gkevinzheng, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request refactors the Bigtable client's row reading mechanism by migrating `read_row` and `read_rows` onto the new data client.
Code Review
This pull request is a nice refactoring that reroutes `read_row` and `read_rows` to use the new data client. This simplifies the code by removing the legacy chunk processing and retry logic. The changes look good overall, but I've found one potential issue in the `cancel` method of `PartialRowsData` which might lead to resource leaks.
google/cloud/bigtable/row_data.py
Outdated
```
or the
:meth:`~google.api_core.retry.Retry.with_deadline` method.
:type generator: :class:`Iterable[Row]`
:param generator: The `Row` iterator from :meth:`Table.read_rows`
```
You should add a note that this class is not intended to be created directly.
google/cloud/bigtable/row_data.py
Outdated
```python
self.rows = {}

# Flag to stop iteration, for any reason not related to self.retry()
self._cancelled = False
```
Is `_cancelled` still in use?
Removed `self._cancelled`.
```python
self.response_iterator.cancel()
self._generator.close()

def consume_all(self, max_loops=None):
```
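Regarding the potential resource leak in `cancel`: one defensive shape is to make cancellation idempotent and to close both underlying resources even if the first close raises. This is a minimal sketch with hypothetical stand-in classes, not the actual Bigtable implementation:

```python
class StreamHandle:
    """Hypothetical stand-in for a row stream holding two resources."""

    def __init__(self, response_iterator, generator):
        self.response_iterator = response_iterator
        self._generator = generator
        self._cancelled = False

    def cancel(self):
        # Idempotent: repeated calls are no-ops instead of double-closing.
        if self._cancelled:
            return
        self._cancelled = True
        try:
            self.response_iterator.cancel()
        finally:
            # Runs even if cancel() above raises, so the generator
            # is never leaked.
            self._generator.close()
```

The `try`/`finally` is the key point: if `response_iterator.cancel()` fails, `_generator.close()` still runs.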
It looks like `max_loops` isn't used?
I'm not sure what to do here, because it wasn't used in the original implementation of the function either; I assume it was kept to avoid a breaking change. Now that we are able to make a breaking change, I should remove it, right?
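If outright removal feels too abrupt, a common middle ground is to keep the parameter for one release and warn when it is passed. This is only a sketch of that pattern, not the actual `consume_all` implementation:

```python
import warnings


def consume_all(max_loops=None):
    """Consume the stream; ``max_loops`` is deprecated and ignored."""
    if max_loops is not None:
        # Warn callers that still pass the dead parameter before it is
        # removed in a later release.
        warnings.warn(
            "max_loops has no effect and will be removed in a future release",
            DeprecationWarning,
            stacklevel=2,
        )
    # ... consume the underlying iterator here ...
```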
```python
# are not user visible, so we just use the raw protos for merging.
return data_messages_v2_pb2.ReadRowsResponse.pb(resp_protoplus)

def __iter__(self):
```
What happens if you try to iterate over this multiple times? I assume since the inner generator is exhausted, it would just yield nothing? Is that similar to the old implementation?
Also, what do we expect to happen if we keep iterating after cancellation?
I believe that in the original, if we keep iterating after cancellation, the `while not self._cancelled` loop will yield nothing or at most one item. I'll look into what happens when we iterate over the same `PartialRowsData` object multiple times.
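The second-iteration behavior discussed here is standard Python generator semantics and can be checked in isolation; the `rows` generator below is just a stand-in for the stream backing `PartialRowsData`:

```python
def rows():
    # Stand-in for the row-yielding generator backing the stream.
    yield from ("row1", "row2", "row3")


gen = rows()
first = list(gen)   # consumes the generator: ["row1", "row2", "row3"]
second = list(gen)  # the generator is exhausted, so this is []
```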
As I previously suspected, iterating over the same `PartialRowsData` will yield nothing the second time around in the old implementation as well. The following system test passes when I test an older branch of v3_staging:
```python
def test_table_read_rows_multiple_reads(
    data_table_read_rows_retry_tests,
):
    from types import SimpleNamespace

    rows_data = data_table_read_rows_retry_tests.read_rows()

    first_iteration = SimpleNamespace()
    first_iteration.rows = {}
    second_iteration = SimpleNamespace()
    second_iteration.rows = {}

    for item in rows_data:
        first_iteration.rows[item.row_key] = item
    for item in rows_data:
        second_iteration.rows[item.row_key] = item

    _assert_data_table_read_rows_retry_correct(first_iteration)
    assert second_iteration.rows == {}
```
I will add this to this branch to test that the same behavior occurs.
Changes Made:
- Mapped `Row` and `Cell` objects in the data client to `PartialRowData` and `Cell` objects in the legacy client.
- Removed legacy code for processing `ReadRowsResponse` chunks and testing `ReadRowsResponse` chunks.
- Removed `_update_message_request` from `RowSet` because it's no longer needed to create a `ReadRowsQuery`.
- Rerouted `read_row` and `read_rows` to use their data client counterparts in `table.py`.
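The mapping from data client `Row`/`Cell` objects to the legacy `PartialRowData` shape could look roughly like this sketch. The dataclasses here are simplified stand-ins for illustration, not the real Bigtable classes:

```python
from dataclasses import dataclass, field


@dataclass
class DataCell:
    """Simplified stand-in for a data client Cell."""
    family: str
    qualifier: bytes
    value: bytes
    timestamp_micros: int


@dataclass
class LegacyPartialRowData:
    """Simplified stand-in for the legacy PartialRowData."""
    row_key: bytes
    # Legacy layout: cells[family][qualifier] -> list of cells.
    cells: dict = field(default_factory=dict)


def to_legacy_row(row_key, data_cells):
    """Group a flat list of data client cells into the legacy nesting."""
    legacy = LegacyPartialRowData(row_key=row_key)
    for cell in data_cells:
        legacy.cells.setdefault(cell.family, {}).setdefault(
            cell.qualifier, []
        ).append(cell)
    return legacy
```

The essential difference captured here is structural: the data client exposes cells as a flat list, while the legacy client nests them by column family and qualifier.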