Significant slowdown in SearchClient.upload_documents(...) for large payloads #46860

@msilvestrixdatanet

Description

  • Package Name: azure-search-documents
  • Package Version: 12.0.0
  • Operating System: Windows 11/Ubuntu 24.04
  • Python Version: 3.12

Describe the bug
After upgrading azure-search-documents from 11.6.0 to 12.0.0, we observe a significant slowdown in SearchClient.upload_documents(...) for large payloads (hundreds of documents, each with a 3072-dimension vector field).
A similar slowdown occurs when calling SearchClient.merge_documents(...).
The regression appears to be client-side: it shows up in work done before HTTP/network time becomes the dominant cost.

To Reproduce

  1. Create two clean virtual environments:

    • one with azure-search-documents==11.6.0
    • one with azure-search-documents==12.0.0
  2. Use the same Azure AI Search service and the same existing index for both runs.

    • The index must contain a vector field (e.g. content_vector) with dimension 3072.
    • Keep all index settings identical across runs.
  3. Prepare a synthetic payload with large vector-heavy documents:

    • hundreds or even thousands of documents per run (e.g. 1000)
    • each document includes:
      • key/id
      • text field (thousands of chars)
      • vector field with 3072 floats
      • a few metadata fields
  4. Run the benchmark serially (not in parallel), alternating versions:

    • run 1 with 11.6.0
    • run 2 with 12.0.0
  5. For each run, measure:

    • wall-clock time of SearchClient.upload_documents(documents=payload)
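A minimal benchmark sketch of steps 3 to 5. It assumes an `id` key field and a 3072-dimension vector field named `content_vector`; the text and metadata field names (`content`, `source`, `chunk_index`) and the helper names `make_payload` / `time_upload` are hypothetical and must be adapted to the actual index schema. The upload call is passed in as a callable so the same script runs unchanged in both virtual environments.

```python
import random
import string
import time
from typing import Any, Callable, Dict, List

VECTOR_DIM = 3072  # must match the dimension of the index's vector field


def make_payload(n_docs: int = 1000, text_len: int = 4000, seed: int = 0) -> List[Dict[str, Any]]:
    """Build synthetic vector-heavy documents (field names besides id/content_vector are hypothetical)."""
    rng = random.Random(seed)  # fixed seed so both versions upload identical data
    alphabet = string.ascii_letters + " "
    return [
        {
            "id": str(i),
            "content": "".join(rng.choices(alphabet, k=text_len)),
            "content_vector": [rng.random() for _ in range(VECTOR_DIM)],
            "source": "synthetic",  # hypothetical metadata field
            "chunk_index": i,       # hypothetical metadata field
        }
        for i in range(n_docs)
    ]


def time_upload(upload: Callable[[List[Dict[str, Any]]], Any], payload: List[Dict[str, Any]]) -> float:
    """Return the wall-clock seconds spent in a single upload call."""
    start = time.perf_counter()
    upload(payload)
    return time.perf_counter() - start


# Usage with the real client (requires azure-search-documents; endpoint,
# index name and key below are placeholders):
# from azure.core.credentials import AzureKeyCredential
# from azure.search.documents import SearchClient
# client = SearchClient("<endpoint>", "<index-name>", AzureKeyCredential("<api-key>"))
# payload = make_payload()
# print(time_upload(lambda docs: client.upload_documents(documents=docs), payload))
```

Because the seed is fixed, running the script once per virtual environment (alternating versions as in step 4) uploads byte-for-byte identical documents, so any timing difference comes from the SDK version rather than the data.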

Expected behavior
12.0.0 should not introduce a major regression compared to 11.6.0 for typical bulk upload workloads with large vector fields.

Additional context
In 12.0.0, IndexDocumentsBatch._extend_batch builds each action as:

```python
action_dict = {"@search.action": action_type}
action_dict.update(doc)
action = IndexAction(action_dict)
```

Constructing each IndexAction this way appears to run the generic model conversion/serialization paths (Model.__init__, _create_value, _serialize) for every document, recursing into nested structures, including each large vector.
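To illustrate why a per-document recursive conversion could dominate for vector-heavy payloads, the self-contained sketch below contrasts the plain dict merge quoted above with a hypothetical generic serializer that walks every nested value. `_serialize_recursive` is an illustrative stand-in, not the SDK's actual `_serialize`; the point is only that a recursive walk makes a Python-level call for each of the 3072 floats in every document, while the shallow merge just copies a reference to the existing list.

```python
import time
from typing import Any, Dict


def build_action_shallow(action_type: str, doc: Dict[str, Any]) -> Dict[str, Any]:
    # Mirrors the dict construction quoted above: one shallow merge per document.
    action = {"@search.action": action_type}
    action.update(doc)
    return action


def _serialize_recursive(value: Any) -> Any:
    # Hypothetical generic serializer: visits every nested element, including
    # each float inside a vector field, and rebuilds containers as it goes.
    if isinstance(value, dict):
        return {k: _serialize_recursive(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [_serialize_recursive(v) for v in value]
    return value


def build_action_recursive(action_type: str, doc: Dict[str, Any]) -> Dict[str, Any]:
    # Same output as the shallow version, but pays for a full recursive walk.
    return _serialize_recursive(build_action_shallow(action_type, doc))


def compare(n_docs: int = 200, dim: int = 3072) -> Dict[str, float]:
    """Time both builders over identical synthetic documents; returns seconds per strategy."""
    docs = [{"id": str(i), "content_vector": [0.1] * dim} for i in range(n_docs)]
    timings = {}
    for name, build in (("shallow", build_action_shallow), ("recursive", build_action_recursive)):
        start = time.perf_counter()
        for doc in docs:
            build("upload", doc)
        timings[name] = time.perf_counter() - start
    return timings
```

Calling `compare()` prints no fixed numbers here on purpose; absolute timings depend on the machine, but the recursive variant's cost grows with the total number of vector elements, whereas the shallow merge's cost depends only on the number of top-level fields.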

Likely files/methods:

  • azure/search/documents/models/_patch.py (IndexDocumentsBatch._extend_batch)
  • azure/search/documents/_utils/model_base.py (Model.__init__, _serialize)
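One way to check whether these methods really dominate the client-side time is to run the upload under the standard-library profiler and sort by cumulative time; if the hypothesis holds, the methods listed above should rank near the top under 12.0.0 but not under 11.6.0. `profile_call` is a hypothetical helper name, and `client` / `payload` in the usage comment are assumed to be a SearchClient and the synthetic document list from the reproduction steps.

```python
import cProfile
import io
import pstats
from typing import Any, Callable, Tuple


def profile_call(fn: Callable[..., Any], *args: Any, top: int = 20, **kwargs: Any) -> Tuple[Any, str]:
    """Run fn under cProfile; return (result, report of the `top` entries by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = fn(*args, **kwargs)
    finally:
        profiler.disable()  # stop profiling even if the call raises
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


# Usage (assumes an existing SearchClient `client` and document list `payload`):
# result, report = profile_call(client.upload_documents, documents=payload)
# print(report)
```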

Metadata

Labels: Client, Search, Service Attention, bug, customer-reported, needs-team-attention

Status: Untriaged
