DurableOrchestrationClient.get_status never completes #602

@joepatol

🐛 Describe the bug
I am using a custom get_status endpoint in my function app that returns the status of an orchestration. The trigger fetches the status using the Python SDK, makes some modifications, and returns it.

I have observed that the call to this endpoint sometimes never completes: it hangs for 1 hour (the configured function timeout), after which the instance restarts.

@app.route("status/{instance_id}", methods=["GET"])
@app.durable_client_input(client_name="client")
@handle_client_errors
@require_auth
async def get_status(
    req: func.HttpRequest,
    client: durable_func.DurableOrchestrationClient,
) -> func.HttpResponse:
    instance_id = req.route_params.get("instance_id")
    logger.info(f"Fetching status for orchestration with ID = '{instance_id}'.")
    if not instance_id:
        return func.HttpResponse(
            status_code=400, body=json.dumps({"error": "Instance ID is required"})
        )

    try:
        status = await asyncio.wait_for(client.get_status(instance_id), timeout=10)
    except TimeoutError as exc:
        logger.error(
            f"Timeout while fetching orchestration status for instance ID = '{instance_id}'.",
            exc_info=exc,
        )
        raise HttpError(
            "Timeout while fetching orchestration status",
            status_code=504,
        ) from exc
    if not status:
        return func.HttpResponse(
            status_code=404, body=json.dumps({"error": "Orchestration not found"})
        )

    logger.info("Successfully fetched orchestration status, creating response")
    response = create_orchestration_status_response(status)
    status_code = 202 if response["runtimeStatus"] in ["Pending", "Running"] else 200

    return func.HttpResponse(
        status_code=status_code,
        body=json.dumps(response),
        headers={"Content-Type": "application/json"},
    )

In Application Insights I can see the 'Fetching status' log entry, but nothing is logged after that. Other calls to the endpoint (for the same and for different orchestrations) do complete successfully.
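One possible explanation, which is not confirmed for this issue, is that something blocks the event loop thread. asyncio.wait_for can only cancel a coroutine at an await point, so the 10-second timeout cannot fire while a synchronous call is holding the loop. The sketch below shows that general asyncio behaviour with two hypothetical stand-ins (cooperative_hang and blocking_hang). Neither is part of the Durable Functions SDK, and nothing here shows that get_status itself blocks.

```python
import asyncio
import time
from typing import Awaitable, Callable, Tuple


async def cooperative_hang(seconds: float) -> str:
    """Hypothetical stand-in for a call that waits without blocking the loop."""
    await asyncio.sleep(seconds)
    return "done"


async def blocking_hang(seconds: float) -> str:
    """Hypothetical stand-in for a call that blocks the event loop thread."""
    time.sleep(seconds)  # synchronous: no other task or timer can run meanwhile
    return "done"


async def call_with_timeout(
    make_call: Callable[[float], Awaitable[str]],
    hang_seconds: float,
    timeout: float,
) -> Tuple[str, float]:
    """Run make_call under asyncio.wait_for and report the outcome and duration."""
    started = time.monotonic()
    try:
        await asyncio.wait_for(make_call(hang_seconds), timeout=timeout)
        outcome = "completed"
    except asyncio.TimeoutError:  # alias of the built-in TimeoutError on Python 3.11+
        outcome = "timed out"
    return outcome, time.monotonic() - started


async def main() -> None:
    for make_call in (cooperative_hang, blocking_hang):
        outcome, elapsed = await call_with_timeout(
            make_call, hang_seconds=0.5, timeout=0.1
        )
        print(f"{make_call.__name__}: {outcome} after {elapsed:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

The cooperative call is cut off close to the 0.1-second timeout, while the blocking call always runs for its full duration regardless of the timeout. If the endpoint shows the second pattern, the missing TimeoutError log would be consistent with a blocked loop rather than a slow awaitable.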

🤔 Expected behavior
The get_status task should either resolve or get cancelled. Maybe adding a timeout parameter to get_status could help?

Steps to reproduce

  • Create a get_status endpoint like the one above
  • Start an orchestration and return its status query URI
  • Poll the status URI
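The polling step above could be scripted roughly as follows. This is only a sketch: poll_until_done and fetch_status_over_http are hypothetical helper names, the status URL is whatever the function app returns, and the only assumption taken from the endpoint code is that "Pending" and "Running" mean the orchestration is still in progress. The per-request timeout makes a hanging call show up as an error on the client instead of an indefinite wait.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

# Statuses the endpoint above answers with HTTP 202 (still in progress).
IN_PROGRESS = ("Pending", "Running")


def fetch_status_over_http(url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Hypothetical fetcher: GET the status URI with a client-side timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 1.0,
    max_attempts: int = 60,
) -> Dict[str, Any]:
    """Call fetch_status repeatedly until the orchestration leaves IN_PROGRESS."""
    for attempt in range(1, max_attempts + 1):
        started = time.monotonic()
        status = fetch_status()
        elapsed = time.monotonic() - started
        # Logging the duration of each call makes slow or hanging requests visible.
        print(f"attempt {attempt}: {status.get('runtimeStatus')} in {elapsed:.2f}s")
        if status.get("runtimeStatus") not in IN_PROGRESS:
            return status
        time.sleep(interval)
    raise TimeoutError(f"still in progress after {max_attempts} attempts")
```

To poll a real deployment, pass something like `lambda: fetch_status_over_http(status_url)` as fetch_status, where status_url is the status query URI returned when the orchestration was started.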

If deployed to Azure

We have access to a lot of telemetry that can help with investigations. Please provide as much of the following information as you can to help us investigate!

  • Timeframe issue observed: Past 2 weeks
  • Orchestration instance ID(s): 8df92d4c9f174a3c9a7ba981a67120e1
