[Bug]: Improve tunnel reliability with retries, validation, and failure visibility (closes #432) #433
janhavitupe wants to merge 2 commits into GetBindu:main from
Conversation
📝 Walkthrough
🚥 Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bindu/penguin/bindufy.py`:
- Around line 653-655: The tunnel creation currently calls _setup_tunnel (which
performs live httpx.get() checks) before start_uvicorn_server, causing premature
health checks; change the flow so _setup_tunnel only establishes/returns the
tunnel endpoint without doing live HTTP probes (e.g., add a parameter
validate=False or split out a new validate_tunnel(tunnel_url) function), then
call the validation step only after start_uvicorn_server has started accepting
requests (or when run_server is False but launch=True handle validation
accordingly); update callers around the existing _setup_tunnel call and the code
paths noted (including the block covering lines 657-686) to perform validation
after server start and respect fail_on_tunnel_error semantics.
- Around line 211-213: The three module-level constants (_TUNNEL_MAX_ATTEMPTS,
_TUNNEL_BASE_BACKOFF, _TUNNEL_HEALTH_TIMEOUT) must be removed and read from
bindu.settings.app_settings instead; add corresponding settings (e.g.
TUNNEL_MAX_ATTEMPTS, TUNNEL_BASE_BACKOFF, TUNNEL_HEALTH_TIMEOUT) into
app_settings with the current defaults (3, 1, 3) and then replace uses of the
constants in bindufy.py with app_settings.TUNNEL_MAX_ATTEMPTS,
app_settings.TUNNEL_BASE_BACKOFF, and app_settings.TUNNEL_HEALTH_TIMEOUT (import
app_settings at top if not already present) so environments can override them
via settings.
In `@bindu/utils/display.py`:
- Around line 149-152: The warning text in bindu/utils/display.py is misleading:
update the message appended to warning_lines (the two append calls shown) to
instruct users they can "continue locally after tunnel failure" by passing
fail_on_tunnel_error=False to bindufy(), and update the matching log banner
elsewhere in the module to use the same phrasing; locate uses of
warning_lines.append and the log banner string in this file and replace "to
continue locally without this warning" / "removes the warning" wording with "to
continue locally after tunnel failure" (keep the reference to bindufy() and
fail_on_tunnel_error as-is).
In `@tests/unit/penguin/test_setup_tunnel.py`:
- Line 30: The patch decorators in tests currently target the module-local
symbol "bindu.penguin.bindufy.TunnelManager" but _setup_tunnel() imports
TunnelManager inside the function with "from bindu.tunneling.manager import
TunnelManager", so update the patch targets in the four decorators to
"bindu.tunneling.manager.TunnelManager" (apply the change where
`@patch`("bindu.penguin.bindufy.TunnelManager") appears in the four test methods)
so the function-local import is intercepted correctly.
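The "where to patch" rule this comment relies on can be shown with a self-contained toy module; the names below (`manager_demo`, `setup_tunnel`) are stand-ins for illustration, not the real bindu code:

```python
import sys
import types
from unittest.mock import patch

# Build a throwaway module standing in for bindu.tunneling.manager.
manager_demo = types.ModuleType("manager_demo")

class TunnelManager:
    def create_tunnel(self):
        return "https://real.example"

manager_demo.TunnelManager = TunnelManager
sys.modules["manager_demo"] = manager_demo

def setup_tunnel():
    # Function-local import, mirroring _setup_tunnel() in bindufy.py.
    # The lookup happens at call time in manager_demo, so that module
    # (not the caller's module) is what @patch must target.
    from manager_demo import TunnelManager
    return TunnelManager().create_tunnel()

# Patching the defining module intercepts the function-local import:
with patch("manager_demo.TunnelManager") as mock_tm:
    mock_tm.return_value.create_tunnel.return_value = "https://mocked.example"
    assert setup_tunnel() == "https://mocked.example"
```

Patching `caller_module.TunnelManager` would fail here because the caller's namespace never holds the name; the same applies to the four `@patch` decorators in the test file.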
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f0a8b5a2-4e35-454e-8af5-bd3c318e5ea8
📒 Files selected for processing (3)
- bindu/penguin/bindufy.py
- bindu/utils/display.py
- tests/unit/penguin/test_setup_tunnel.py
```diff
  # Create tunnel if enabled
- tunnel_url = _setup_tunnel(tunnel_config, port, _manifest, bindu_app)
+ tunnel_url, tunnel_failure_reason = _setup_tunnel(tunnel_config, port, _manifest, bindu_app, fail_on_tunnel_error)
```
Tunnel validation runs before the server exists.
_setup_tunnel() now performs live httpx.get() checks, but this call still happens before start_uvicorn_server() is invoked. With launch=True, the health check is probing a server that isn't listening yet, so otherwise-valid tunnels will be retried as failures and the default fail_on_tunnel_error=True path will abort startup. This also makes run_server=False + launch=True unsatisfiable. Split tunnel creation from validation, or defer validation until the HTTP server is accepting requests.
Also applies to: 657-686
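The split the reviewer asks for can be sketched roughly as follows; every function here is a stand-in for the real bindu code, and the exact signatures are assumptions:

```python
class TunnelError(Exception):
    pass

def create_tunnel(port):
    # Stand-in: allocates an endpoint without probing it.
    return f"https://agent-{port}.tunnel.example"

def start_uvicorn_server(port):
    # Stand-in: in the real flow, the ASGI server starts listening here.
    return True

def validate_tunnel(url, healthy=True):
    # Stand-in for the live httpx.get() health check.
    if not healthy:
        raise TunnelError(f"health check failed for {url}")

def start_agent(port, fail_on_tunnel_error=True, tunnel_healthy=True):
    tunnel_url = create_tunnel(port)   # step 1: no HTTP probes yet
    start_uvicorn_server(port)         # step 2: server accepts requests
    try:
        validate_tunnel(tunnel_url, tunnel_healthy)  # step 3: probe now
        return tunnel_url, None
    except TunnelError as exc:
        if fail_on_tunnel_error:
            raise
        return None, str(exc)          # soft fallback: continue locally
```

The key property is that the probe in step 3 can only run once something is listening, so a healthy tunnel is never misclassified as a failure.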
…heck timing, update messaging, and correct test patching
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bindu/penguin/bindufy.py`:
- Around line 220-223: _bindufy_core() currently never invokes
_validate_tunnel_health() after _setup_tunnel(), so a dead or 503 tunnel can be
treated as public; update _bindufy_core() to call _validate_tunnel_health() once
the HTTP server reports readiness and before calling prepare_server_display() or
returning the tunnel URL, and ensure that any TunnelError or retry logic from
_validate_tunnel_health() is handled (falling back to setting tunnel_url=None
and tunnel_failure_reason) so prepare_server_display() will render the warning
panel; apply the same insertion point for the other similar paths that call
_setup_tunnel() (the blocks referenced around the other ranges) so health
validation always runs after server readiness and before exposing the tunnel
URL.
- Around line 256-286: The try-except currently covers both tunnel allocation
and the post-allocation mutations, causing a new tunnel to be created if a
mutation (e.g., setting manifest.url or bindu_app.url or resetting
bindu_app._agent_card_json_schema) fails; refactor so
tunnel_manager.create_tunnel(...) is the only call inside the retry loop (the
block that retries on transient create failures), then after a successful
create_tunnel returns (tunnel_url) perform the mutations (manifest.url =
tunnel_url, bindu_app.url = tunnel_url, bindu_app._agent_card_json_schema =
None) outside that retry loop or inside a separate try-except that does not
re-enter the create_tunnel retry logic, and ensure only create_tunnel failures
trigger the exponential backoff / retry logic.
In `@bindu/settings.py`:
- Around line 151-165: The three tunnel-related settings max_attempts,
base_backoff_seconds, and health_check_timeout_seconds must be validated at
settings parse time: enforce max_attempts >= 1, base_backoff_seconds >= 0, and
health_check_timeout_seconds > 0 so bad env values raise immediately. Fix by
adding Pydantic constraints/validators on those fields in the settings model
(either use conint/ge/gt or add `@validator` methods for the model) to validate
the values and raise a clear ValidationError during parsing; update the
Field(...) declarations for max_attempts, base_backoff_seconds, and
health_check_timeout_seconds (or add validators named for those fields)
accordingly so invalid config is rejected early.
📒 Files selected for processing (4)
- bindu/penguin/bindufy.py
- bindu/settings.py
- bindu/utils/display.py
- tests/unit/penguin/test_setup_tunnel.py
🚧 Files skipped from review as they are similar to previous changes (2)
- bindu/utils/display.py
- tests/unit/penguin/test_setup_tunnel.py
```
Retries tunnel creation up to app_settings.tunnel.max_attempts times with
exponential backoff before giving up. Health validation is intentionally
deferred — it is performed separately after the HTTP server is accepting
requests (see _validate_tunnel_health).
```
The deferred health check never actually runs.
The docstring says validation happens later, but _bindufy_core() never calls _validate_tunnel_health() after _setup_tunnel() returns. An unreachable or 503 tunnel is therefore still passed to prepare_server_display() and treated as public, so health-check-triggered retries, TunnelError, and the fallback warning never execute. Downstream, prepare_server_display() only renders the warning panel when tunnel_url is None and tunnel_failure_reason is set (bindu/utils/display.py:136-161), so this path still presents the agent as publicly reachable. This also makes run_server=False + launch=True look successful even though nothing is listening behind the tunnel. Call the validator after the HTTP server reaches readiness and before displaying or returning the public URL.
Also applies to: 308-333, 651-685
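A deferred health check of the kind described might look like this sketch; the injected `probe` callable stands in for the real `httpx.get()` request, and all names are illustrative:

```python
import time

class TunnelError(Exception):
    pass

def validate_tunnel_health(url, probe, max_attempts=3, base_backoff=1.0,
                           sleep=time.sleep):
    """Probe the tunnel with exponential backoff; raise TunnelError when exhausted.

    `probe(url)` stands in for an HTTP GET returning a status code.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            status = probe(url)
            if status == 200:
                return  # tunnel is reachable and healthy
            last_error = f"unexpected status {status}"
        except Exception as exc:
            last_error = str(exc)
        if attempt < max_attempts:
            sleep(base_backoff * (2 ** (attempt - 1)))  # 1s, 2s, 4s, ...
    raise TunnelError(f"tunnel at {url} failed health check: {last_error}")
```

Calling this only after the server reports readiness, and mapping `TunnelError` to `tunnel_url=None` plus a `tunnel_failure_reason`, is what lets `prepare_server_display()` render the warning panel instead of a false "public" banner.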
```python
for attempt in range(1, max_attempts + 1):
    try:
        tunnel_url = tunnel_manager.create_tunnel(
            local_port=port,
            config=tunnel_config,
            subdomain=tunnel_config.subdomain,
        )
        logger.info(
            f"✅ Tunnel created (attempt {attempt}/{max_attempts}): {tunnel_url}"
        )

        manifest.url = tunnel_url
        bindu_app.url = tunnel_url
        bindu_app._agent_card_json_schema = None

        return tunnel_url, None

    except Exception as e:
        failure_reason = str(e)
        if attempt < max_attempts:
            backoff = base_backoff * (2 ** (attempt - 1))
            logger.info(
                f"Tunnel attempt {attempt}/{max_attempts} failed "
                f"({failure_reason}). Retrying in {backoff}s..."
            )
            time.sleep(backoff)
        else:
            logger.error(
                f"Tunnel attempt {attempt}/{max_attempts} failed "
                f"({failure_reason}). All retries exhausted."
            )
```
🧩 Analysis chain
🏁 Scripts executed (Repository: GetBindu/Bindu):

```shell
find . -name "bindufy.py" -type f
cat -n ./bindu/penguin/bindufy.py | sed -n '250,290p'
cat -n ./bindu/penguin/bindufy.py | sed -n '200,260p'
rg -t py "class AgentManifest" --max-count=5
cat -n ./bindu/common/models.py | grep -A 50 "class AgentManifest:"
cat -n ./bindu/common/models.py | sed -n '145,175p'
```
Restructure exception handling to avoid creating duplicate tunnels.
The current code wraps both TunnelManager.create_tunnel() and the subsequent manifest/app mutations in the same try-except block. If the tunnel is successfully allocated but a mutation fails (e.g., manifest.url = tunnel_url), the retry loop creates another tunnel and misattributes a local bug to a transient tunnel failure. Move the mutations outside the retry block or into a separate try-except that doesn't trigger another tunnel creation.
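One way to structure this, with the allocation as the only retried call, is sketched below; `create_tunnel` and `apply_url` are injected stand-ins for `TunnelManager.create_tunnel()` and the manifest/app mutations, so the shape is an assumption, not the actual bindu code:

```python
import time

def create_with_retries(create_tunnel, apply_url, max_attempts=3,
                        base_backoff=1.0, sleep=time.sleep):
    """Retry only the allocation; apply side effects once, outside the loop."""
    tunnel_url = None
    failure_reason = None
    for attempt in range(1, max_attempts + 1):
        try:
            tunnel_url = create_tunnel()  # the only call that may be retried
            break
        except Exception as exc:
            failure_reason = str(exc)
            if attempt < max_attempts:
                sleep(base_backoff * (2 ** (attempt - 1)))
    if tunnel_url is None:
        return None, failure_reason
    # Post-allocation mutations: a failure here is a local bug and must not
    # allocate a second tunnel, so it sits deliberately outside the loop.
    apply_url(tunnel_url)
    return tunnel_url, None
```

With this shape, exactly one tunnel exists on success, and the backoff path is reachable only for genuine allocation failures.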
🧰 Tools
🪛 Ruff (0.15.7)
[warning] 273-273: Do not catch blind exception: `Exception` (BLE001)
```python
max_attempts: int = Field(
    default=3,
    validation_alias=AliasChoices("TUNNEL__MAX_ATTEMPTS"),
    description="Maximum tunnel creation attempts before giving up",
)
base_backoff_seconds: int = Field(
    default=1,
    validation_alias=AliasChoices("TUNNEL__BASE_BACKOFF_SECONDS"),
    description="Base backoff in seconds; doubles each retry (1s, 2s, 4s)",
)
health_check_timeout_seconds: int = Field(
    default=3,
    validation_alias=AliasChoices("TUNNEL__HEALTH_CHECK_TIMEOUT_SECONDS"),
    description="Timeout in seconds for the post-creation health check request",
)
```
Validate the retry knobs when settings load.
These values come from env, but invalid inputs fail late inside the tunnel path: max_attempts=0 skips all attempts, base_backoff_seconds<0 makes time.sleep() raise, and health_check_timeout_seconds<=0 turns the health check into a config/runtime error instead of a clean tunnel failure. Add bounds here so bad config is rejected during settings parsing.
Suggested bounds:

```diff
 max_attempts: int = Field(
     default=3,
+    ge=1,
     validation_alias=AliasChoices("TUNNEL__MAX_ATTEMPTS"),
     description="Maximum tunnel creation attempts before giving up",
 )
 base_backoff_seconds: int = Field(
     default=1,
+    ge=0,
     validation_alias=AliasChoices("TUNNEL__BASE_BACKOFF_SECONDS"),
     description="Base backoff in seconds; doubles each retry (1s, 2s, 4s)",
 )
 health_check_timeout_seconds: int = Field(
     default=3,
+    gt=0,
     validation_alias=AliasChoices("TUNNEL__HEALTH_CHECK_TIMEOUT_SECONDS"),
     description="Timeout in seconds for the post-creation health check request",
 )
```

As per coding guidelines, bindu/**/*.py: "Validate all external input and use type hints for input validation in Python files".
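The fail-fast effect of those bounds can be illustrated without pydantic; this stand-in dataclass (names mirror the settings fields, but the class itself is hypothetical) rejects the same invalid values at construction time:

```python
from dataclasses import dataclass

@dataclass
class TunnelRetrySettings:
    # Mirrors the suggested constraints: ge=1, ge=0, gt=0.
    max_attempts: int = 3
    base_backoff_seconds: int = 1
    health_check_timeout_seconds: int = 3

    def __post_init__(self):
        if self.max_attempts < 1:
            raise ValueError("max_attempts must be >= 1")
        if self.base_backoff_seconds < 0:
            raise ValueError("base_backoff_seconds must be >= 0")
        if self.health_check_timeout_seconds <= 0:
            raise ValueError("health_check_timeout_seconds must be > 0")
```

Either way, a bad env value surfaces as a clear error at startup instead of a `time.sleep()` crash or a skipped retry loop deep in the tunnel path.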
Summary
This PR improves the robustness and reliability of tunnel initialization by addressing silent failure cases and enhancing error handling, validation, and observability.
Changes
1. Retry + Backoff
2. Tunnel Validation
3. Explicit Failure Handling
- `TunnelError` for clear failure signaling
- `fail_on_tunnel_error` flag for optional soft fallback
4. Improved Logging
- `info` → retry attempts
- `warning` → soft failures
- `error` → hard failures
5. Developer Feedback (DX)
6. Tests
Why this change?
Previously, tunnel creation failures were silently ignored, leading to misleading states where the agent appeared publicly available but was not.
This PR ensures:
Backward Compatibility
closes #432