
[Bug]: Improve tunnel reliability with retries, validation, and failure visibility (closes #432) #433

Open
janhavitupe wants to merge 2 commits into GetBindu:main from janhavitupe:fix/tunnel-reliability

Conversation

@janhavitupe

@janhavitupe janhavitupe commented Apr 3, 2026

Summary

This PR improves the robustness and reliability of tunnel initialization by addressing silent failure cases and enhancing error handling, validation, and observability.

Changes

1. Retry + Backoff

  • Added retry logic with exponential backoff for tunnel creation
  • Prevents failures due to transient network issues

2. Tunnel Validation

  • Added HTTP health check after tunnel creation
  • Ensures tunnel is actually reachable before proceeding
  • Validation is integrated into retry flow
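
The retry and validation flow described above could compose roughly like this (a sketch only: `create_tunnel_with_retry` and its parameters are hypothetical names for illustration; of the identifiers below, only `TunnelError` is named in the PR, and the actual bindu implementation may differ):

```python
import time

class TunnelError(RuntimeError):
    """Raised when all tunnel attempts are exhausted (name from the PR description)."""

def create_tunnel_with_retry(create, health_check,
                             max_attempts=3, base_backoff=1.0, sleep=time.sleep):
    """Retry tunnel creation with exponential backoff (1s, 2s, 4s, ...).
    A failed health check counts as a failed attempt, so validation
    re-enters the same retry loop."""
    last_reason = None
    for attempt in range(1, max_attempts + 1):
        try:
            url = create()
            if not health_check(url):
                raise RuntimeError(f"health check failed for {url}")
            return url
        except Exception as exc:
            last_reason = str(exc)
            if attempt < max_attempts:
                # Exponential backoff: base * 2^(attempt-1)
                sleep(base_backoff * 2 ** (attempt - 1))
    raise TunnelError(last_reason)
```

Injecting `sleep` makes the backoff schedule testable without real delays, which matches the PR's test list ("backoff timing").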

3. Explicit Failure Handling

  • Introduced TunnelError for clear failure signaling
  • Default behavior now fails fast instead of silently continuing
  • Added fail_on_tunnel_error flag for optional soft fallback
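
The fail-fast default with optional soft fallback might look like this (a minimal sketch; `setup_tunnel` and `create` are hypothetical stand-ins, while `TunnelError` and `fail_on_tunnel_error` come from the PR description):

```python
class TunnelError(RuntimeError):
    """Signals an unrecoverable tunnel failure (name from the PR description)."""

def setup_tunnel(create, fail_on_tunnel_error=True):
    """Fail fast by default; optionally fall back to local-only mode.

    Returns a (tunnel_url, failure_reason) pair, exactly one of which is None.
    """
    try:
        return create(), None
    except Exception as exc:
        if fail_on_tunnel_error:
            # Hard failure: surface the error and abort startup.
            raise TunnelError(str(exc)) from exc
        # Soft fallback: continue local-only, keep the reason for the warning panel.
        return None, str(exc)
```

Returning the failure reason alongside a `None` URL is what lets the startup banner explain why the agent is not publicly accessible.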

4. Improved Logging

  • Structured logging:
    • info → retry attempts
    • warning → soft failures
    • error → hard failures
  • Includes attempt counts and failure reasons

5. Developer Feedback (DX)

  • Startup banner now shows a prominent warning panel when tunnel fails
  • Clearly indicates that the agent is NOT publicly accessible

6. Tests

  • Added tests covering:
    • Retry success after failure
    • Complete failure (hard and soft modes)
    • Health check failure triggering retry
    • Tunnel disabled scenario

Why this change?

Previously, tunnel creation failures were silently ignored, leading to misleading states where the agent appeared publicly available but was not.

This PR ensures:

  • Failures are visible and actionable
  • System behaves predictably under failure conditions
  • Developers get clear feedback during startup

Backward Compatibility

  • Default behavior is now fail-fast (safer)
  • Previous behavior can be restored using:
    fail_on_tunnel_error=False

closes #432

Summary by CodeRabbit

  • New Features

    • Tunnel creation now retries with exponential backoff, includes health checks, and can be configured to either fail the operation or continue local-only on tunnel failure.
    • Improved user-facing failure messaging and a visible “tunnel failure” notice during startup when applicable.
  • Tests

    • Added tests covering retry behavior, backoff timing, success-after-retry, and failure modes for tunnel setup.

@coderabbitai

coderabbitai bot commented Apr 3, 2026

📝 Walkthrough
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 76.92%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check ✅ Passed — The title accurately and specifically describes the main changes: retry/validation logic and failure visibility for tunnel initialization, directly addressing issue #432.
  • Description check ✅ Passed — The description covers most key sections: summary of problem/solution, detailed changes, backward compatibility, and issue reference. However, it lacks the formal structure of the template (the Change Type, Scope, Security Impact, Verification, and Human Verification sections are missing).
  • Linked Issues check ✅ Passed — All coding requirements from issue #432 are fully met: prominent failure warning panel added to startup output, fail_on_tunnel_error parameter implemented, retry logic with exponential backoff added, health validation integrated, and explicit TunnelError exception introduced.
  • Out of Scope Changes check ✅ Passed — All changes are directly scoped to tunnel reliability improvements: configuration fields for retry/backoff, tunnel health validation, failure signaling, display enhancements, and comprehensive tests. No unrelated modifications detected.



Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bindu/penguin/bindufy.py`:
- Around line 653-655: The tunnel creation currently calls _setup_tunnel (which
performs live httpx.get() checks) before start_uvicorn_server, causing premature
health checks; change the flow so _setup_tunnel only establishes/returns the
tunnel endpoint without doing live HTTP probes (e.g., add a parameter
validate=False or split out a new validate_tunnel(tunnel_url) function), then
call the validation step only after start_uvicorn_server has started accepting
requests (or when run_server is False but launch=True handle validation
accordingly); update callers around the existing _setup_tunnel call and the code
paths noted (including the block covering lines 657-686) to perform validation
after server start and respect fail_on_tunnel_error semantics.
- Around line 211-213: The three module-level constants (_TUNNEL_MAX_ATTEMPTS,
_TUNNEL_BASE_BACKOFF, _TUNNEL_HEALTH_TIMEOUT) must be removed and read from
bindu.settings.app_settings instead; add corresponding settings (e.g.
TUNNEL_MAX_ATTEMPTS, TUNNEL_BASE_BACKOFF, TUNNEL_HEALTH_TIMEOUT) into
app_settings with the current defaults (3, 1, 3) and then replace uses of the
constants in bindufy.py with app_settings.TUNNEL_MAX_ATTEMPTS,
app_settings.TUNNEL_BASE_BACKOFF, and app_settings.TUNNEL_HEALTH_TIMEOUT (import
app_settings at top if not already present) so environments can override them
via settings.

In `@bindu/utils/display.py`:
- Around line 149-152: The warning text in bindu/utils/display.py is misleading:
update the message appended to warning_lines (the two append calls shown) to
instruct users they can "continue locally after tunnel failure" by passing
fail_on_tunnel_error=False to bindufy(), and update the matching log banner
elsewhere in the module to use the same phrasing; locate uses of
warning_lines.append and the log banner string in this file and replace "to
continue locally without this warning" / "removes the warning" wording with "to
continue locally after tunnel failure" (keep the reference to bindufy() and
fail_on_tunnel_error as-is).

In `@tests/unit/penguin/test_setup_tunnel.py`:
- Line 30: The patch decorators in tests currently target the module-local
symbol "bindu.penguin.bindufy.TunnelManager" but _setup_tunnel() imports
TunnelManager inside the function with "from bindu.tunneling.manager import
TunnelManager", so update the patch targets in the four decorators to
"bindu.tunneling.manager.TunnelManager" (apply the change where
`@patch`("bindu.penguin.bindufy.TunnelManager") appears in the four test methods)
so the function-local import is intercepted correctly.
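
The underlying rule is "patch where the name is looked up." For a function-local import, the name is resolved from the source module at call time, so the patch must target that module. A self-contained illustration using the standard library (not bindu code):

```python
from unittest import mock

def use_local_import():
    # Function-local import: the name is fetched from the json module
    # at call time, so a patch must target "json.dumps", not an
    # attribute of the calling module.
    from json import dumps
    return dumps({"ok": True})

# Patching the source module intercepts the function-local import.
with mock.patch("json.dumps", return_value="patched"):
    print(use_local_import())  # "patched"
```

This is why the four decorators must target `bindu.tunneling.manager.TunnelManager`: patching `bindu.penguin.bindufy.TunnelManager` would replace an attribute the function-local import never reads.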

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f0a8b5a2-4e35-454e-8af5-bd3c318e5ea8

📥 Commits

Reviewing files that changed from the base of the PR and between 45656ee and e5841e1.

📒 Files selected for processing (3)
  • bindu/penguin/bindufy.py
  • bindu/utils/display.py
  • tests/unit/penguin/test_setup_tunnel.py

Comment thread bindu/penguin/bindufy.py Outdated
Comment thread bindu/penguin/bindufy.py
Comment on lines 653 to 655
  # Create tunnel if enabled
- tunnel_url = _setup_tunnel(tunnel_config, port, _manifest, bindu_app)
+ tunnel_url, tunnel_failure_reason = _setup_tunnel(tunnel_config, port, _manifest, bindu_app, fail_on_tunnel_error)


⚠️ Potential issue | 🔴 Critical

Tunnel validation runs before the server exists.

_setup_tunnel() now performs live httpx.get() checks, but this call still happens before start_uvicorn_server() is invoked. With launch=True, the health check is probing a server that isn't listening yet, so otherwise-valid tunnels will be retried as failures and the default fail_on_tunnel_error=True path will abort startup. This also makes run_server=False + launch=True unsatisfiable. Split tunnel creation from validation, or defer validation until the HTTP server is accepting requests.

Also applies to: 657-686

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/penguin/bindufy.py` around lines 653 - 655, The tunnel creation
currently calls _setup_tunnel (which performs live httpx.get() checks) before
start_uvicorn_server, causing premature health checks; change the flow so
_setup_tunnel only establishes/returns the tunnel endpoint without doing live
HTTP probes (e.g., add a parameter validate=False or split out a new
validate_tunnel(tunnel_url) function), then call the validation step only after
start_uvicorn_server has started accepting requests (or when run_server is False
but launch=True handle validation accordingly); update callers around the
existing _setup_tunnel call and the code paths noted (including the block
covering lines 657-686) to perform validation after server start and respect
fail_on_tunnel_error semantics.
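
The ordering the review asks for — allocate the tunnel, start the server, and only then probe — can be sketched as follows (hypothetical names throughout; `start_with_tunnel`, `create_tunnel`, `start_server`, and `validate` are illustrative stand-ins, not the actual bindu API):

```python
def start_with_tunnel(create_tunnel, start_server, validate,
                      fail_on_tunnel_error=True):
    """Order matters: establish the tunnel endpoint without any live HTTP
    probe, start the HTTP server, and only then run the health check, so
    validation never races server startup."""
    url = create_tunnel()   # allocation only, no httpx.get() here
    start_server()          # server begins accepting requests
    if not validate(url):   # deferred health check
        if fail_on_tunnel_error:
            raise RuntimeError(f"tunnel {url} failed validation")
        return None         # soft fallback: continue local-only
    return url
```

With validation before `start_server()`, the probe hits a port nothing is listening on, and a perfectly good tunnel is retried to exhaustion — exactly the failure mode described above.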

Comment thread bindu/utils/display.py Outdated
Comment thread tests/unit/penguin/test_setup_tunnel.py Outdated
…heck timing, update messaging, and correct test patching

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bindu/penguin/bindufy.py`:
- Around line 220-223: _bindufy_core() currently never invokes
_validate_tunnel_health() after _setup_tunnel(), so a dead or 503 tunnel can be
treated as public; update _bindufy_core() to call _validate_tunnel_health() once
the HTTP server reports readiness and before calling prepare_server_display() or
returning the tunnel URL, and ensure that any TunnelError or retry logic from
_validate_tunnel_health() is handled (falling back to setting tunnel_url=None
and tunnel_failure_reason) so prepare_server_display() will render the warning
panel; apply the same insertion point for the other similar paths that call
_setup_tunnel() (the blocks referenced around the other ranges) so health
validation always runs after server readiness and before exposing the tunnel
URL.
- Around line 256-286: The try-except currently covers both tunnel allocation
and the post-allocation mutations, causing a new tunnel to be created if a
mutation (e.g., setting manifest.url or bindu_app.url or resetting
bindu_app._agent_card_json_schema) fails; refactor so
tunnel_manager.create_tunnel(...) is the only call inside the retry loop (the
block that retries on transient create failures), then after a successful
create_tunnel returns (tunnel_url) perform the mutations (manifest.url =
tunnel_url, bindu_app.url = tunnel_url, bindu_app._agent_card_json_schema =
None) outside that retry loop or inside a separate try-except that does not
re-enter the create_tunnel retry logic, and ensure only create_tunnel failures
trigger the exponential backoff / retry logic.

In `@bindu/settings.py`:
- Around line 151-165: The three tunnel-related settings max_attempts,
base_backoff_seconds, and health_check_timeout_seconds must be validated at
settings parse time: enforce max_attempts >= 1, base_backoff_seconds >= 0, and
health_check_timeout_seconds > 0 so bad env values raise immediately. Fix by
adding Pydantic constraints/validators on those fields in the settings model
(either use conint/ge/gt or add `@validator` methods for the model) to validate
the values and raise a clear ValidationError during parsing; update the
Field(...) declarations for max_attempts, base_backoff_seconds, and
health_check_timeout_seconds (or add validators named for those fields)
accordingly so invalid config is rejected early.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 41c17b52-b240-499a-b93e-8026863f161f

📥 Commits

Reviewing files that changed from the base of the PR and between e5841e1 and ebfff26.

📒 Files selected for processing (4)
  • bindu/penguin/bindufy.py
  • bindu/settings.py
  • bindu/utils/display.py
  • tests/unit/penguin/test_setup_tunnel.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • bindu/utils/display.py
  • tests/unit/penguin/test_setup_tunnel.py

Comment thread bindu/penguin/bindufy.py
Comment on lines +220 to +223
Retries tunnel creation up to app_settings.tunnel.max_attempts times with
exponential backoff before giving up. Health validation is intentionally
deferred — it is performed separately after the HTTP server is accepting
requests (see _validate_tunnel_health).

⚠️ Potential issue | 🔴 Critical

The deferred health check never actually runs.

The docstring says validation happens later, but _bindufy_core() never calls _validate_tunnel_health() after _setup_tunnel() returns. An unreachable or 503 tunnel is therefore still passed to prepare_server_display() and treated as public, so health-check-triggered retries, TunnelError, and the fallback warning never execute. Downstream, prepare_server_display() only renders the warning panel when tunnel_url is None and tunnel_failure_reason is set (bindu/utils/display.py:136-161), so this path still presents the agent as publicly reachable. This also makes run_server=False + launch=True look successful even though nothing is listening behind the tunnel. Call the validator after the HTTP server reaches readiness and before displaying or returning the public URL.

Also applies to: 308-333, 651-685

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/penguin/bindufy.py` around lines 220 - 223, _bindufy_core() currently
never invokes _validate_tunnel_health() after _setup_tunnel(), so a dead or 503
tunnel can be treated as public; update _bindufy_core() to call
_validate_tunnel_health() once the HTTP server reports readiness and before
calling prepare_server_display() or returning the tunnel URL, and ensure that
any TunnelError or retry logic from _validate_tunnel_health() is handled
(falling back to setting tunnel_url=None and tunnel_failure_reason) so
prepare_server_display() will render the warning panel; apply the same insertion
point for the other similar paths that call _setup_tunnel() (the blocks
referenced around the other ranges) so health validation always runs after
server readiness and before exposing the tunnel URL.

Comment thread bindu/penguin/bindufy.py
Comment on lines +256 to +286
for attempt in range(1, max_attempts + 1):
    try:
        tunnel_url = tunnel_manager.create_tunnel(
            local_port=port,
            config=tunnel_config,
            subdomain=tunnel_config.subdomain,
        )
        logger.info(
            f"✅ Tunnel created (attempt {attempt}/{max_attempts}): {tunnel_url}"
        )

        manifest.url = tunnel_url
        bindu_app.url = tunnel_url
        bindu_app._agent_card_json_schema = None

        return tunnel_url, None

    except Exception as e:
        failure_reason = str(e)
        if attempt < max_attempts:
            backoff = base_backoff * (2 ** (attempt - 1))
            logger.info(
                f"Tunnel attempt {attempt}/{max_attempts} failed "
                f"({failure_reason}). Retrying in {backoff}s..."
            )
            time.sleep(backoff)
        else:
            logger.error(
                f"Tunnel attempt {attempt}/{max_attempts} failed "
                f"({failure_reason}). All retries exhausted."
            )

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "bindufy.py" -type f

Repository: GetBindu/Bindu

Length of output: 84


🏁 Script executed:

cat -n ./bindu/penguin/bindufy.py | sed -n '250,290p'

Repository: GetBindu/Bindu

Length of output: 1797


🏁 Script executed:

cat -n ./bindu/penguin/bindufy.py | sed -n '200,260p'

Repository: GetBindu/Bindu

Length of output: 2482


🏁 Script executed:

rg -t py "class AgentManifest" --max-count=5

Repository: GetBindu/Bindu

Length of output: 101


🏁 Script executed:

cat -n ./bindu/common/models.py | grep -A 50 "class AgentManifest:"

Repository: GetBindu/Bindu

Length of output: 2076


🏁 Script executed:

cat -n ./bindu/common/models.py | sed -n '145,175p'

Repository: GetBindu/Bindu

Length of output: 1104


Restructure exception handling to avoid creating duplicate tunnels.

The current code wraps both TunnelManager.create_tunnel() and the subsequent manifest/app mutations in the same try-except block. If the tunnel is successfully allocated but a mutation fails (e.g., manifest.url = tunnel_url), the retry loop creates another tunnel and misattributes a local bug to a transient tunnel failure. Move the mutations outside the retry block or into a separate try-except that doesn't trigger another tunnel creation.

🧰 Tools
🪛 Ruff (0.15.7)

[warning] 273-273: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/penguin/bindufy.py` around lines 256 - 286, The try-except currently
covers both tunnel allocation and the post-allocation mutations, causing a new
tunnel to be created if a mutation (e.g., setting manifest.url or bindu_app.url
or resetting bindu_app._agent_card_json_schema) fails; refactor so
tunnel_manager.create_tunnel(...) is the only call inside the retry loop (the
block that retries on transient create failures), then after a successful
create_tunnel returns (tunnel_url) perform the mutations (manifest.url =
tunnel_url, bindu_app.url = tunnel_url, bindu_app._agent_card_json_schema =
None) outside that retry loop or inside a separate try-except that does not
re-enter the create_tunnel retry logic, and ensure only create_tunnel failures
trigger the exponential backoff / retry logic.
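
The restructuring the review suggests — only the allocation call inside the retry loop, mutations applied once afterwards — could look like this sketch (hypothetical names; `allocate_tunnel` and `apply_tunnel_url` are illustrative, and plain dicts stand in for the manifest and app objects):

```python
import time

def allocate_tunnel(create, max_attempts=3, base_backoff=1.0, sleep=time.sleep):
    """Retry ONLY the allocation step; transient create failures back off."""
    for attempt in range(1, max_attempts + 1):
        try:
            return create()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_backoff * 2 ** (attempt - 1))

def apply_tunnel_url(url, manifest, app):
    """Post-allocation mutations live outside the retry loop: a failure
    here is a local bug and must not trigger another tunnel allocation."""
    manifest["url"] = url
    app["url"] = url
    return url
```

Separating the two also keeps the backoff semantics honest: only `create_tunnel` failures are treated as transient.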

Comment thread bindu/settings.py
Comment on lines +151 to +165
max_attempts: int = Field(
    default=3,
    validation_alias=AliasChoices("TUNNEL__MAX_ATTEMPTS"),
    description="Maximum tunnel creation attempts before giving up",
)
base_backoff_seconds: int = Field(
    default=1,
    validation_alias=AliasChoices("TUNNEL__BASE_BACKOFF_SECONDS"),
    description="Base backoff in seconds; doubles each retry (1s, 2s, 4s)",
)
health_check_timeout_seconds: int = Field(
    default=3,
    validation_alias=AliasChoices("TUNNEL__HEALTH_CHECK_TIMEOUT_SECONDS"),
    description="Timeout in seconds for the post-creation health check request",
)

⚠️ Potential issue | 🟠 Major

Validate the retry knobs when settings load.

These values come from env, but invalid inputs fail late inside the tunnel path: max_attempts=0 skips all attempts, base_backoff_seconds<0 makes time.sleep() raise, and health_check_timeout_seconds<=0 turns the health check into a config/runtime error instead of a clean tunnel failure. Add bounds here so bad config is rejected during settings parsing.

Suggested bounds
     max_attempts: int = Field(
         default=3,
+        ge=1,
         validation_alias=AliasChoices("TUNNEL__MAX_ATTEMPTS"),
         description="Maximum tunnel creation attempts before giving up",
     )
     base_backoff_seconds: int = Field(
         default=1,
+        ge=0,
         validation_alias=AliasChoices("TUNNEL__BASE_BACKOFF_SECONDS"),
         description="Base backoff in seconds; doubles each retry (1s, 2s, 4s)",
     )
     health_check_timeout_seconds: int = Field(
         default=3,
+        gt=0,
         validation_alias=AliasChoices("TUNNEL__HEALTH_CHECK_TIMEOUT_SECONDS"),
         description="Timeout in seconds for the post-creation health check request",
     )

As per coding guidelines, bindu/**/*.py: "Validate all external input and use type hints for input validation in Python files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/settings.py` around lines 151 - 165, The three tunnel-related settings
max_attempts, base_backoff_seconds, and health_check_timeout_seconds must be
validated at settings parse time: enforce max_attempts >= 1,
base_backoff_seconds >= 0, and health_check_timeout_seconds > 0 so bad env
values raise immediately. Fix by adding Pydantic constraints/validators on those
fields in the settings model (either use conint/ge/gt or add `@validator` methods
for the model) to validate the values and raise a clear ValidationError during
parsing; update the Field(...) declarations for max_attempts,
base_backoff_seconds, and health_check_timeout_seconds (or add validators named
for those fields) accordingly so invalid config is rejected early.



Development

Successfully merging this pull request may close these issues.

[Bug]: Tunnel creation failure is silently swallowed — agent starts without public URL
