πŸ“ docs+config: ecosystem alignment β€” integration roadmap, branch-protection-as-code, README cross-links#9

Open
scttbnsn wants to merge 23 commits into main from dev/0.2.2

@scttbnsn

Contributor

Ecosystem-alignment batch for lookout (v0.2.2 line). Derived from two background investigation workflows over lookout · drydock · sockguard (integration analysis + presentation/governance normalization), with every finding adversarially verified before it landed here.

Changes

  • πŸ“ ROADMAP.md β€” adds a The ecosystem section (sockguard β†’ lookout β†’ drydock topology, standard vs edge mode) and folds the verified integration gaps into Now/Next/Later. The headline: edge mode is fully built agent-side but blocked on drydock's missing /api/lookout/ws controller endpoint.
  • πŸ”§ scripts/apply-branch-protection.sh β€” codifies the OpenSSF-top-tier ruleset now active on main (idempotent create/update), so branch protection is reproducible-as-code.
  • πŸ“ README.md β€” adds the Part of the CodesWhat ecosystem cross-link block to the footer (drydock + sockguard), adds the missing Discussions badge, and removes the misleading Snyk placeholder (it rendered as "monitored" but Snyk was never wired up).
  • πŸ”§ .github/ISSUE_TEMPLATE/config.yml β€” routes Question/Help to Discussions Q&A and adds a Feature-request link, matching the drydock/sockguard convention.

No code or behavior changes: docs, governance, and CI/issue config only. The .roadmap/internal.md punch-list (gitignored) captures the full M1–M7 integration sequence and risks.

Sibling CODEOWNERS normalization is in separate PRs (drydock #427, sockguard #97); their README/community normalization will follow in their own PRs.

scttbnsn added 19 commits June 12, 2026 18:40
…cket (embedded SVGs; both dropped from simple-icons)
…payload

- Thread *config.Config through NewSSEBroadcaster and NewAdapter so the
  ack payload can read LogLevel and DDPollInterval at runtime
- MemoryGB: parse /proc/meminfo (MemTotal) on Linux via parseProcMeminfo;
  return 0.0 on all other GOOS; no cgo, no extra deps
- ackDataBody.MemoryGB promoted from int to float64 (one-decimal GiB)
- Wire fields: logLevel (string), pollInterval (string), memoryGb (float64)
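
A minimal sketch of the MemTotal parsing described above, assuming the usual /proc/meminfo line shape (`MemTotal: <n> kB`, value in KiB). The names `parseMemTotalGiB` and `memTotalGiBFromText` are hypothetical; the PR's own `parseProcMeminfo` signature is not shown here, and the GOOS gate is left out.

```go
package main

import (
	"bufio"
	"io"
	"math"
	"strconv"
	"strings"
)

// parseMemTotalGiB (hypothetical name) scans /proc/meminfo-formatted text for
// the MemTotal line, which the kernel reports in KiB, and returns the value in
// GiB rounded to one decimal. It returns 0.0 if the line is missing or malformed.
func parseMemTotalGiB(r io.Reader) float64 {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected shape: "MemTotal:       16316412 kB"
		if len(fields) < 2 || fields[0] != "MemTotal:" {
			continue
		}
		kib, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return 0.0
		}
		gib := float64(kib) / (1024 * 1024)
		return math.Round(gib*10) / 10
	}
	return 0.0
}

// memTotalGiBFromText is a convenience wrapper for callers that already hold
// the file contents as a string.
func memTotalGiBFromText(s string) float64 {
	return parseMemTotalGiB(strings.NewReader(s))
}
```

On Linux a caller would open /proc/meminfo and pass the file to `parseMemTotalGiB`; on any other platform it would skip the read and report 0.0, as the commit describes.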
Replace the blocking retry loop in HandleInput (up to 500 ms on readPump)
with a per-ExecSession buffered channel drained by a single dedicated goroutine.

- readPump now NEVER blocks on exec stdin I/O; it enqueues via a non-blocking
  channel select and returns immediately.
- Stdin ordering within a session is strictly preserved: drainInput is the
  only writer on net.Conn, so concurrent HandleInput callers cannot reorder
  or interleave bytes.
- Backpressure policy (documented in execInputQueueDepth comment): when the
  queue is full (drain cannot keep up), the incoming frame is dropped and an
  error frame is sent to the controller.  This is "drop-with-error-frame":
  safer than blocking readPump (which would freeze pings, requests, and all
  other session I/O) and safer than silent corruption.
- drainInput exits on session.Close() (done channel); the buffered channel
  is fully drained before exit so no enqueued bytes are silently discarded.
- No goroutine leaks: Close() closes done, drainInput selects on it.
Stand up an in-process mock drydock WebSocket server (httptest.Server +
gorilla/websocket) that speaks the real lookout/1.0 frame format from
internal/protocol.  Covers:

- hello/welcome handshake: happy path (validates protocol, agentId, capabilities)
  and rejected-hello path (server returns error frame → connect() returns error)
- request fan-out: maxStreams semaphore saturation immediately returns an error
  frame instead of blocking readPump
- exec session concurrency cap: StartExec beyond maxExecSessions sends exec_end
  rejection frame
- exec session lifecycle: readLoop emits exec_output frames and exec_end on EOF
- #30 regression (ordered input queue):
    TestExecInputQueue_OrderPreserved: N concurrent HandleInput callers; all N
    byte values arrive at the Docker-side sink exactly once, none interleaved
    TestExecInputQueue_NoBlockOnFull: HandleInput returns in <200 ms even when
    queue is saturated (enforces readPump non-blocking requirement)
    TestExecInputQueue_DrainOnClose: drainInput goroutine exits promptly on
    session.Close() - no goroutine leak
    TestExecClose_Idempotent: session.Close() is safe to call multiple times
- ping round-trip: readPump replies TypePing with TypePong and correct timestamp
- TestPing_UnderExecLoad: pong arrives within 500 ms while 200 exec_input frames
  are in flight, proving readPump is non-blocking with the ordered queue fix

All 11 tests pass under -race.
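
For the maxStreams saturation case listed above, one common way to get "error instead of block" is a buffered channel used as a counting semaphore with a non-blocking acquire. The sketch below assumes that approach; `streamLimiter`, `TryAcquire`, and `Release` are hypothetical names, and the PR's actual semaphore may differ.

```go
package main

// streamLimiter (hypothetical) caps concurrent streams using a buffered
// channel as a counting semaphore.
type streamLimiter struct {
	slots chan struct{}
}

func newStreamLimiter(max int) *streamLimiter {
	return &streamLimiter{slots: make(chan struct{}, max)}
}

// TryAcquire reports whether a slot was free. It never blocks, so a read loop
// can answer with an error frame immediately when the limiter is saturated.
func (l *streamLimiter) TryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees a slot. It must be paired with a successful TryAcquire;
// calling it with no slot held would block.
func (l *streamLimiter) Release() {
	<-l.slots
}
```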
HandleResize ran a 10×50 ms blocking retry loop directly on readPump,
the same anti-pattern that issue #30 fixed for HandleInput.  During a
10-attempt resize burst, readPump could not service pings, exec_input
frames, or any other message type for up to 500 ms.

Dispatch HandleResize with `go` so readPump returns immediately, exactly
as HandleInput was fixed in #30.  The retry loop in HandleResize itself
is unchanged; resize failures are non-fatal and isolated to their own
goroutine.
When the WebSocket tunnel dropped, connect() called wg.Wait() and then
closed the WebSocket conn, but never iterated execSessions.  Each
session's readLoop and drainInput goroutines remained alive, blocked on
s.conn.Read / s.conn.Write against the Docker hijacked net.Conn.  The
Docker exec process kept running and the goroutines were orphaned for
the duration of the reconnect window: an unbounded leak on every
unclean disconnect.

After wg.Wait(), range execSessions and call Close() on every surviving
session.  The sync.Once in Close() ensures idempotency when readLoop
also calls it.
drainInput previously spun indefinitely when conn.Write failed, relying
on readLoop to eventually call Close() via a read error.  For asymmetric
half-closed connections (e.g. a proxy that closes only the write path),
conn.Read may never error while conn.Write keeps failing; drainInput
would spin forever and the goroutine would never exit.

Add a consecutive write-error counter.  After maxWriteErrors (3) back-
to-back failures, drainInput calls s.Close() directly and returns.
The sync.Once in Close() ensures idempotency with any concurrent
readLoop Close() call.
- Clarify TestExecInputQueue_OrderPreserved: it does NOT catch the
  readPump-blocking bug (HandleInput is called directly, bypassing
  readPump).  Note that TestPing_UnderExecLoad is the primary regression
  test for that pattern.

- Label TestPing_UnderExecLoad as the PRIMARY regression test for the
  readPump-blocking anti-pattern from #30 and explain why it is the
  only test that exercises the full readPump dispatch path.

- Add TestHandleResize_NonBlocking: asserts that a burst of exec_resize
  frames for an unknown session does not stall a concurrent ping for
  more than 500 ms (regression test for the HandleResize fix).

- Add TestExecSessions_TornDownOnDisconnect: injects two fake sessions,
  drops the server-side WS conn, and verifies both sessions' Docker
  net.Conns are closed after connect() returns (regression test for the
  exec-session teardown fix).

- Add TestDrainInput_ExitsOnWriteErrors: closes the exec conn and feeds
  frames into inputQ; asserts drainInput exits within 2 s and that
  session.done is closed (regression test for the write-error exit fix).
- close dial response body (bodyclose)
- check c.connect/conn.Read returns in goroutines (errcheck)
- remove unused newTestClientWithWS helper (unused)
✨ feat(adapter): populate logLevel/pollInterval/memoryGb in dd:ack (M4)
πŸ› fix(edge): per-session ordered exec input queue + edge test harness (#30/#38)