
add parallel blob object upload/4684 #6

Draft

HemangChothani wants to merge 3 commits into master from feature/storage_parallel_operations_copying_objects

Conversation

@HemangChothani
Owner

issue [4684]

def upload_parallel(
self, path, content_type=None, client=None, predefined_acl=None
):
"""Upload this blob's contents parallel from the content of file in directory.

"Upload this blob's contents in parallel, from the contents of a file in the directory."

):
"""Upload this blob's contents parallel from the content of file in directory.

The content type of the upload will be determined in order

"The type of the uploaded content will be determined in the order"

thread = threading.Thread(
target=self._upload_from_list,
args=(files_list, total_files, content_type, client, predefined_acl),
)

This doesn't seem right. Here you are creating multiple threads that each try to upload the same set of files; that is, _upload_from_list reuses the same files_list over and over. Individual file uploads should be distributed among separate threads, rather than duplicating the whole uploading procedure in every thread.
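The distribution the reviewer describes could be sketched as follows. This is a hypothetical illustration, not the PR's actual code: `upload_one` is a stand-in for the real per-file upload call (e.g. Blob.upload_from_filename), and each thread receives a disjoint slice of the file list.

```python
import threading

uploaded = []                 # shared result list, guarded by a lock
_lock = threading.Lock()

def upload_one(path):
    # Stand-in for the real per-file upload call.
    with _lock:
        uploaded.append(path)

def upload_parallel(files_list, num_threads=4):
    # Give each thread a disjoint slice of the file list, so no file can be
    # uploaded twice and no shared counter needs synchronizing.
    def worker(chunk):
        for path in chunk:
            upload_one(path)

    threads = [
        threading.Thread(target=worker, args=(files_list[i::num_threads],))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because the slices are disjoint, the threads never contend over which file to upload next; only the result list needs a lock.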

Owner Author

I have made changes so that total_files is no longer passed to every thread; only files_list is passed to each thread as an argument. The line self._files_list[self._file_count] doesn't allow duplicate uploads, because the count is incremented with every upload and each file is taken from the list at that particular index.


What about racing? The index is updated only after the upload, so we cannot rule out that while one thread is uploading a file, another thread starts uploading the same file. This can also lead to an out-of-range exception.
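One common way to eliminate this race is to hand files out through a thread-safe queue instead of a shared index. The sketch below uses hypothetical names (it is not the PR's code); `queue.Queue.get_nowait()` pops each item atomically, so two threads can never receive the same file and there is no index to run out of range.

```python
import queue
import threading

def upload_all(files_list, upload_fn, num_threads=4):
    # Put every file on a thread-safe queue; get_nowait() hands each item
    # to exactly one thread, so no shared index and no race.
    q = queue.Queue()
    for path in files_list:
        q.put(path)

    done = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                path = q.get_nowait()   # atomic pop: no duplicate uploads
            except queue.Empty:
                return                  # queue drained, thread exits cleanly
            upload_fn(path)
            with lock:
                done.append(path)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

The queue replaces both the shared counter and the bounds check: a thread that finds the queue empty simply returns.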

content_type,
client,
predefined_acl,
)

Now there is a different kind of problem. If an upload fails, the counter is still incremented. There should be some sort of retry mechanism to recover from such a state, or per-file index tracking, so that each file is handled individually.
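The per-file tracking the reviewer suggests could look like this. This is a hedged sketch with hypothetical names, not the PR's implementation: failed uploads are recorded instead of silently skipped, so a caller can retry just those files.

```python
import queue
import threading

def upload_with_tracking(files_list, upload_fn, num_threads=4):
    # Hand out files via a queue and record failures per file, instead of
    # blindly advancing a shared counter when an upload raises.
    q = queue.Queue()
    for path in files_list:
        q.put(path)

    failed = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                path = q.get_nowait()
            except queue.Empty:
                return
            try:
                upload_fn(path)
            except Exception:
                with lock:
                    failed.append(path)   # kept for a later retry pass

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failed
```

Returning the failed list keeps the parallel uploader agnostic about retry policy, which matters here since storage-level retries are tracked separately (issue [7907] below).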

Owner Author

I think the Storage module doesn't implement a retry mechanism, which is why there is a separate task on GitHub to implement one for storage.
Please refer to issue [7907].


Regardless of that, the possibility of racing should be eliminated.

@HemangChothani HemangChothani force-pushed the feature/storage_parallel_operations_copying_objects branch from 213e374 to 42f1d9e Compare June 27, 2019 09:09