Skip to content

fix(startup): defer connect and temperature boot tasks#1988

Merged
elibosley merged 16 commits intomainfrom
codex/defer-startup-tasks
Apr 15, 2026
Merged

fix(startup): defer connect and temperature boot tasks#1988
elibosley merged 16 commits intomainfrom
codex/defer-startup-tasks

Conversation

@elibosley
Copy link
Copy Markdown
Member

@elibosley elibosley commented Apr 9, 2026

Summary

  • move mothership and remote access startup work to post-ready listeners instead of Nest bootstrap
  • move temperature provider discovery to post-ready startup and make initialization idempotent
  • make disk sensor availability constant-time so startup no longer blocks on a full disk scan
  • create the dependency recovery archive during plugin install and verify that both node_modules and the archive exist
  • keep temperature metrics resilient after transient provider probe or initialization failures
  • preserve structured deferred Connect startup warning errors from Promise.allSettled results
  • restore compact PM2 lifecycle output by passing --mini-list on start, restart, status, and stop

Why

  • the FeatureOS report at https://product.unraid.net/p/api-not-starting showed the API burning its startup budget before it could signal readiness
  • Connect still had a bootstrap-path hazard through remote access initialization
  • temperature discovery matched the reported stall, with disk sensor availability doing expensive disk work during startup
  • PM2 formatted output regressed the CLI back to noisy startup banners and tables instead of the expected compact text mode

Testing

  • ruby -e "require 'yaml'; data=YAML.load_file('.github/workflows/main.yml'); puts data['jobs']['test-api']['steps'].find { |s| s['name'] == 'Setup libvirt' }['run']" | bash -n
  • ruby -e "require 'yaml'; YAML.load_file('.github/workflows/main.yml')"
  • pnpm exec vitest run src/unraid-api/graph/resolvers/metrics/temperature/sensors/disk_sensors.service.spec.ts src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.spec.ts --reporter verbose
  • pnpm --filter ./api lint
  • pnpm --filter unraid-api-plugin-connect exec vitest run src/__test__/connect-startup-tasks.test.ts src/__test__/mothership.events.test.ts
  • pnpm --filter unraid-api-plugin-connect build
  • pnpm exec vitest run src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.spec.ts src/unraid-api/graph/resolvers/metrics/temperature/temperature-startup-tasks.spec.ts
  • pnpm --filter ./api build
  • bash -n plugin/source/dynamix.unraid.net/usr/local/share/dynamix.unraid.net/install/scripts/verify_install.sh
  • bash plugin/tests/test-shell-detection.sh
  • cd api && pnpm test -- src/unraid-api/plugin/__test__/plugin-management.service.spec.ts
  • pnpm --filter ./api test -- src/unraid-api/cli/__test__/pm2-commands.spec.ts (ran the configured API suite: 179 files passed, 1975 tests passed)

Notes

  • the local commit skipped simple-git-hooks with SKIP_SIMPLE_GIT_HOOKS=1 because this checkout's hook references node_modules/lint-staged/bin/lint-staged.js, which is missing
  • follow-up server investigation showed the missing self-heal path was installer-side: installs wrote vendor archive metadata but never actually generated the archive

Summary by CodeRabbit

  • New Features

    • App-ready driven startup for Connect tasks; remote access and mothership init run after server begins listening.
  • Improvements

    • Temperature providers now initialize lazily and retry safely; startup-time failures are logged.
    • CLI PM2 commands use compact --mini-list output for start/restart/status/stop behavior.
    • Disk sensor availability now reports available unconditionally.
  • Tests

    • Added/updated tests for connect startup, temperature init flows, disk sensors, and PM2 CLI commands.
  • Chores

    • Install/verify scripts now archive/check dependency installation and report dependency errors.

Fixes #1989

- move mothership, remote access, and temperature provider discovery off the Nest bootstrap path

- before this change, startup could spend the full PM2 ready budget on non-critical background work

- that caused restart-loop risk when Connect hit slow UPNP work and when temperature discovery stalled on disk sensor availability

- the new startup listeners wait for the app.ready event and schedule that work asynchronously after readiness

- temperature initialization is now idempotent and disk sensor availability no longer blocks on a full disk scan
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Startup initialization moved from Nest lifecycle hooks to explicit app-ready event handlers for temperature and connect components; TemperatureService now lazily initializes providers via initializeProviders(); DiskSensorsService.isAvailable() now always returns true; new connect startup orchestrator runs remote-access and mothership init tasks concurrently; PM2 CLI args and related tests updated.

Changes

Cohort / File(s) Summary
Temperature service
api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts, api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.spec.ts
Removed OnModuleInit/onModuleInit(). Added initializeProviders() (concurrency-safe via initialized + initializationPromise), loadAvailableProviders(), and handleAppReady() to init on APP_READY_EVENT. Tests updated to call initializeProviders() and cover handleAppReady() and failure/retry behaviors.
Disk sensors
api/src/unraid-api/graph/resolvers/metrics/temperature/sensors/disk_sensors.service.ts, api/src/unraid-api/graph/resolvers/metrics/temperature/sensors/disk_sensors.service.spec.ts
DiskSensorsService.isAvailable() changed to unconditionally return true. Tests rewired to assert true for all cases and assert disksService.getDisks() is not called.
Connect startup orchestration
packages/unraid-api-plugin-connect/src/startup/connect-startup-tasks.ts, packages/unraid-api-plugin-connect/src/__test__/connect-startup-tasks.test.ts
Added runConnectStartupTasks() and ConnectStartupTasksListener: on app.ready runs optional dynamicRemoteAccessService.initRemoteAccess() and mothershipController.initOrRestart() concurrently via Promise.allSettled(), logging warnings for individual failures. Tests validate behavior, error tolerance, and listener handling.
Mothership controller / module
packages/unraid-api-plugin-connect/src/mothership-proxy/mothership.controller.ts, packages/unraid-api-plugin-connect/src/mothership-proxy/mothership.module.ts
Removed implicit OnApplicationBootstrap lifecycle invocation from MothershipController (deleted onApplicationBootstrap()); retained onModuleDestroy(). Registered ConnectStartupTasksListener in MothershipModule providers.
Dynamic remote access service
packages/unraid-api-plugin-connect/src/remote-access/dynamic-remote-access.service.ts
Removed OnApplicationBootstrap usage and made initRemoteAccess() public so startup orchestrator can call it; body unchanged.
PM2 CLI changes & tests
api/src/unraid-api/cli/start.command.ts, api/src/unraid-api/cli/restart.command.ts, api/src/unraid-api/cli/status.command.ts, api/src/unraid-api/cli/stop.command.ts, api/src/unraid-api/cli/__test__/pm2-commands.spec.ts
Removed --mini-list from PM2 invocations across commands. StartCommand.cleanupPM2State() now runs pm2 kill --no-autorestart then deletes PM2 home. Added tests asserting PM2 invocation sequences and absence of --mini-list.
Installer & verification scripts
plugin/plugins/dynamix.unraid.net.plg, plugin/source/dynamix.unraid.net/.../verify_install.sh
Installer now runs /etc/rc.d/rc.unraid-api archive-dependencies after adding plugin and logs outcome. Verification script adds check_populated_dir, dependency checks for node_modules, vendor archive validation, and reports DEPENDENCY_ERRORS in summary.
Misc small refactors / tests
api/src/unraid-api/plugin/plugin-management.service.ts, api/src/unraid-api/graph/resolvers/metrics/temperature/sensors/disk_sensors.service.spec.ts
Minor simplification in addBundledPlugin() return; test adjustments to DiskSensorsService expectations and structure.

Sequence Diagram(s)

sequenceDiagram
    actor App
    participant Listener as ConnectStartupTasksListener
    participant Runner as runConnectStartupTasks
    participant DRA as DynamicRemoteAccessService
    participant Moth as MothershipController
    participant Log as Logger

    App->>Listener: app.ready (reason: "nestjs-server-listening")
    Listener->>Runner: invoke with deps & logger
    Runner->>DRA: initRemoteAccess() (if present)
    Runner->>Moth: initOrRestart() (if present)
    par concurrent inits
      DRA-->>Runner: fulfilled / rejected
      Moth-->>Runner: fulfilled / rejected
    end
    Runner->>Log: warn(...) on any rejection
    Runner-->>Listener: complete
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested reviewers

  • mdatelle
  • zackspear

Poem

🐰 I hopped in at ready, ears attuned to the start,
Tasks paired up and raced, each playing its part,
Remote and mothership woke with a cheer,
Disks now declare "ready"—no errors appear,
The rabbit applauds, carrots for art! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'fix(startup): defer connect and temperature boot tasks' directly and clearly summarizes the main objective of deferring bootstrap-time initialization to post-ready listeners for Connect and temperature services.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/defer-startup-tasks

Comment @coderabbitai help to get the list of available commands and usage tips.

@elibosley elibosley marked this pull request as ready for review April 9, 2026 08:09
- switch the post-ready temperature and connect listeners to direct async event handling

- before this change both listeners wrapped work in setTimeout(..., 0), which added indirection without changing the readiness boundary

- that made the startup path harder to reason about and less idiomatic than using the event emitter's async listener mode directly

- the new listeners run from @onevent(..., { async: true }) and keep warning behavior intact

- connect startup still isolates remote-access and mothership failures with Promise.allSettled so one rejection does not block the other
- align the temperature startup task files with repo formatting expectations

- CI was failing in the Test API lint step on import ordering and wrapping only

- no runtime behavior changed in this commit

- formatting was applied from the api workspace and verified with pnpm run lint
- fold the post-ready temperature initialization handler into TemperatureService

- before this change the temperature module registered a separate listener provider only to forward app.ready into initializeProviders

- that added module-level indirection without adding real behavior or isolation

- the service now owns its own app-ready handler and logs startup initialization failures directly

- the standalone startup-task files were removed and the service spec now covers the post-ready path
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts (1)

94-107: ⚠️ Potential issue | 🟠 Major

Don’t permanently cache an empty provider set after probe failures.

When isAvailable() throws on Lines 101-103, the pass still ends by setting initialized = true on Line 107. Since getMetrics() now gates on initializeProviders() first, a transient startup probe failure can leave availableProviders empty for the rest of the process with no retry path.

One way to preserve retries only for failed empty scans
     private async loadAvailableProviders(): Promise<void> {
+        let hadProbeFailure = false;
         // 1. Get sensor specific configs
         const config = this.configService.getConfig(false);
         const lmSensorsConfig = config?.sensors?.lm_sensors;
         const smartctlConfig = config?.sensors?.smartctl;
         const ipmiConfig = config?.sensors?.ipmi;
@@
             try {
                 if (await provider.service.isAvailable()) {
                     availableProviders.push(provider.service);
                     this.logger.log(`Temperature provider available: ${provider.service.id}`);
                 } else {
                     this.logger.debug(`Temperature provider not available: ${provider.service.id}`);
                 }
             } catch (err) {
+                hadProbeFailure = true;
                 this.logger.warn(`Failed to check provider ${provider.service.id}`, err);
             }
         }
 
         this.availableProviders = availableProviders;
-        this.initialized = true;
+        this.initialized = availableProviders.length > 0 || !hadProbeFailure;
 
         if (this.availableProviders.length === 0) {
             this.logger.warn('No temperature providers available');
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts`
around lines 94 - 107, The initialization currently sets this.availableProviders
and this.initialized = true even when all provider.isAvailable() probes threw,
which can permanently cache an empty provider set and block retries because
getMetrics() gates on initializeProviders(); modify initializeProviders() (the
loop using provider.service.isAvailable()) to detect probe failures and only
mark this.initialized = true when at least one provider was confirmed available
or when the scan completed with no probe errors; otherwise leave initialized =
false (or set a retry flag) so subsequent calls to getMetrics() will retry
probing; you can implement this by tracking a local hadProbeErrors boolean while
iterating and only assigning this.initialized = true when !hadProbeErrors ||
availableProviders.length > 0, keeping references to availableProviders,
isAvailable(), initializeProviders(), getMetrics(), and this.initialized to
locate the change.
♻️ Duplicate comments (1)
packages/unraid-api-plugin-connect/src/__test__/connect-startup-tasks.test.ts (1)

3-4: ⚠️ Potential issue | 🟡 Minor

Remove unused imports.

MothershipController and DynamicRemoteAccessService are imported but never used in this test file.

🧹 Proposed fix
 import { describe, expect, it, vi } from 'vitest';
 
-import { MothershipController } from '../mothership-proxy/mothership.controller.js';
-import { DynamicRemoteAccessService } from '../remote-access/dynamic-remote-access.service.js';
 import {
     ConnectStartupTasksListener,
     runConnectStartupTasks,
 } from '../startup/connect-startup-tasks.js';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/unraid-api-plugin-connect/src/__test__/connect-startup-tasks.test.ts`
around lines 3 - 4, Remove the unused imports by deleting the import lines for
MothershipController and DynamicRemoteAccessService from the test file; locate
the import statements that reference MothershipController and
DynamicRemoteAccessService at the top of the file and remove them so the test no
longer imports unused symbols (then run the linter or tests to confirm no other
references remain).
🧹 Nitpick comments (2)
packages/unraid-api-plugin-connect/src/startup/connect-startup-tasks.ts (2)

69-72: Logger adapter inconsistency.

The ConnectStartupLogger interface defines warn(message: string, error: unknown) with two parameters, but the adapter on line 71 concatenates them into a single string. This works but loses structured error data that NestJS Logger could otherwise handle.

♻️ Preserve error as second argument
             {
                 info: (message: string) => this.logger.log(message),
-                warn: (message: string, error: unknown) => this.logger.warn(`${message}: ${String(error)}`),
+                warn: (message: string, error: unknown) => this.logger.warn(message, error),
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/unraid-api-plugin-connect/src/startup/connect-startup-tasks.ts`
around lines 69 - 72, The adapter for ConnectStartupLogger is merging the warn
message and error into one string, losing structured error data; update the warn
implementation so it forwards the error as the second argument to the Nest
logger instead of concatenating (i.e., change the adapter's warn: (message,
error) => this.logger.warn(`${message}: ${String(error)}`) to forward the
original error object/trace as the second parameter), ensuring
ConnectStartupLogger.warn signature and usages still match.

41-48: Redundant .catch() inside Promise.allSettled.

The .catch() handlers swallow errors and return undefined, so Promise.allSettled will see fulfilled results rather than rejected. This works but obscures the actual settlement status. Consider removing the inner catches and handling rejections from the allSettled results, or keep the current approach if the explicit logging location is intentional.

♻️ Alternative: Handle rejections from allSettled results
-    await Promise.allSettled([
-        dynamicRemoteAccessService?.initRemoteAccess().catch((error: unknown) => {
-                logger.warn('Dynamic remote access startup failed', error);
-            }),
-        mothershipController?.initOrRestart().catch((error: unknown) => {
-                logger.warn('Mothership startup failed', error);
-            }),
-    ]);
+    const results = await Promise.allSettled([
+        dynamicRemoteAccessService?.initRemoteAccess(),
+        mothershipController?.initOrRestart(),
+    ]);
+
+    if (results[0]?.status === 'rejected') {
+        logger.warn('Dynamic remote access startup failed', results[0].reason);
+    }
+    if (results[1]?.status === 'rejected') {
+        logger.warn('Mothership startup failed', results[1].reason);
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/unraid-api-plugin-connect/src/startup/connect-startup-tasks.ts`
around lines 41 - 48, The inner .catch handlers on
dynamicRemoteAccessService?.initRemoteAccess() and
mothershipController?.initOrRestart() swallow errors so Promise.allSettled
always sees fulfilled values; remove those inner .catch calls and instead await
Promise.allSettled([...]) with the bare promises, then iterate the returned
results and call logger.warn (or logger.error) when a result.status ===
'rejected', including the result.reason; update references to
dynamicRemoteAccessService?.initRemoteAccess,
mothershipController?.initOrRestart and logger.warn accordingly so failures are
recorded from the allSettled results rather than being masked by the per-promise
catches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/unraid-api-plugin-connect/src/__test__/connect-startup-tasks.test.ts`:
- Around line 78-87: The test calls the async function runConnectStartupTasks
without awaiting it; change the test to be async and await
runConnectStartupTasks({}, logger) before assertions so the Promise is settled
even when it resolves synchronously—update the test declaration (the it
callback) to async and await the call to runConnectStartupTasks to ensure
correct async semantics for logger assertions.

---

Outside diff comments:
In
`@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts`:
- Around line 94-107: The initialization currently sets this.availableProviders
and this.initialized = true even when all provider.isAvailable() probes threw,
which can permanently cache an empty provider set and block retries because
getMetrics() gates on initializeProviders(); modify initializeProviders() (the
loop using provider.service.isAvailable()) to detect probe failures and only
mark this.initialized = true when at least one provider was confirmed available
or when the scan completed with no probe errors; otherwise leave initialized =
false (or set a retry flag) so subsequent calls to getMetrics() will retry
probing; you can implement this by tracking a local hadProbeErrors boolean while
iterating and only assigning this.initialized = true when !hadProbeErrors ||
availableProviders.length > 0, keeping references to availableProviders,
isAvailable(), initializeProviders(), getMetrics(), and this.initialized to
locate the change.

---

Duplicate comments:
In
`@packages/unraid-api-plugin-connect/src/__test__/connect-startup-tasks.test.ts`:
- Around line 3-4: Remove the unused imports by deleting the import lines for
MothershipController and DynamicRemoteAccessService from the test file; locate
the import statements that reference MothershipController and
DynamicRemoteAccessService at the top of the file and remove them so the test no
longer imports unused symbols (then run the linter or tests to confirm no other
references remain).

---

Nitpick comments:
In `@packages/unraid-api-plugin-connect/src/startup/connect-startup-tasks.ts`:
- Around line 69-72: The adapter for ConnectStartupLogger is merging the warn
message and error into one string, losing structured error data; update the warn
implementation so it forwards the error as the second argument to the Nest
logger instead of concatenating (i.e., change the adapter's warn: (message,
error) => this.logger.warn(`${message}: ${String(error)}`) to forward the
original error object/trace as the second parameter), ensuring
ConnectStartupLogger.warn signature and usages still match.
- Around line 41-48: The inner .catch handlers on
dynamicRemoteAccessService?.initRemoteAccess() and
mothershipController?.initOrRestart() swallow errors so Promise.allSettled
always sees fulfilled values; remove those inner .catch calls and instead await
Promise.allSettled([...]) with the bare promises, then iterate the returned
results and call logger.warn (or logger.error) when a result.status ===
'rejected', including the result.reason; update references to
dynamicRemoteAccessService?.initRemoteAccess,
mothershipController?.initOrRestart and logger.warn accordingly so failures are
recorded from the allSettled results rather than being masked by the per-promise
catches.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 267e5daa-1b8e-4a19-b789-3e021357951b

📥 Commits

Reviewing files that changed from the base of the PR and between 5c297da and 94bfa73.

📒 Files selected for processing (11)
  • api/src/unraid-api/graph/resolvers/metrics/temperature/sensors/disk_sensors.service.ts
  • api/src/unraid-api/graph/resolvers/metrics/temperature/temperature-startup-tasks.spec.ts
  • api/src/unraid-api/graph/resolvers/metrics/temperature/temperature-startup-tasks.ts
  • api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.module.ts
  • api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.spec.ts
  • api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts
  • packages/unraid-api-plugin-connect/src/__test__/connect-startup-tasks.test.ts
  • packages/unraid-api-plugin-connect/src/mothership-proxy/mothership.controller.ts
  • packages/unraid-api-plugin-connect/src/mothership-proxy/mothership.module.ts
  • packages/unraid-api-plugin-connect/src/remote-access/dynamic-remote-access.service.ts
  • packages/unraid-api-plugin-connect/src/startup/connect-startup-tasks.ts

- align DiskSensorsService tests with the new constant-time availability behavior

- before this change CI still expected isAvailable() to return false when no disks existed or when DisksService threw

- that no longer matched the startup fix, where disk sensor availability intentionally avoids calling getDisks

- the spec now asserts that availability remains true and that isAvailable() does not touch DisksService
- update the connect startup task test to await runConnectStartupTasks in the no-provider case

- before this change the test wrapped an async call in a synchronous not.toThrow assertion

- that could miss promise-settlement semantics even when the implementation resolved immediately

- the test now uses async/await and asserts the promise resolves before checking logger expectations
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 92.72727% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.55%. Comparing base (5c297da) to head (6d7aa96).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...solvers/metrics/temperature/temperature.service.ts 93.61% 3 Missing ⚠️
...src/unraid-api/plugin/plugin-management.service.ts 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1988      +/-   ##
==========================================
+ Coverage   52.35%   52.55%   +0.19%     
==========================================
  Files        1033     1033              
  Lines       71626    71649      +23     
  Branches     8148     8172      +24     
==========================================
+ Hits        37503    37655     +152     
+ Misses      33998    33869     -129     
  Partials      125      125              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/unraid-api-plugin-connect/src/__test__/connect-startup-tasks.test.ts (1)

3-4: ⚠️ Potential issue | 🟡 Minor

Remove unused class imports in the test file.

MothershipController and DynamicRemoteAccessService are imported but never used. This adds noise and triggers static-analysis warnings.

Proposed cleanup
-import { MothershipController } from '../mothership-proxy/mothership.controller.js';
-import { DynamicRemoteAccessService } from '../remote-access/dynamic-remote-access.service.js';
 import {
     ConnectStartupTasksListener,
     runConnectStartupTasks,
 } from '../startup/connect-startup-tasks.js';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/unraid-api-plugin-connect/src/__test__/connect-startup-tasks.test.ts`
around lines 3 - 4, Remove the unused imports MothershipController and
DynamicRemoteAccessService from the test file; locate the import statements
importing MothershipController from 'mothership.controller.js' and
DynamicRemoteAccessService from 'dynamic-remote-access.service.js' in
connect-startup-tasks.test.ts and delete those two import specifiers so the test
no longer imports unused classes and the static-analysis warnings are resolved.
🧹 Nitpick comments (2)
api/src/unraid-api/graph/resolvers/metrics/temperature/sensors/disk_sensors.service.spec.ts (1)

36-60: Consider table-driving the three isAvailable cases.

The three tests now verify the same invariant and can be collapsed into an it.each(...) block to reduce duplication while preserving intent.

♻️ Optional refactor
 describe('isAvailable', () => {
-    it('should return true without checking disks', async () => {
-        vi.mocked(disksService.getDisks).mockResolvedValue([
-            { id: 'disk1', device: '/dev/sda', name: 'Test Disk' } as unknown as Disk,
-        ]);
-
-        const available = await service.isAvailable();
-        expect(available).toBe(true);
-        expect(disksService.getDisks).not.toHaveBeenCalled();
-    });
-
-    it('should return true when no disks exist', async () => {
-        vi.mocked(disksService.getDisks).mockResolvedValue([]);
-
-        const available = await service.isAvailable();
-        expect(available).toBe(true);
-        expect(disksService.getDisks).not.toHaveBeenCalled();
-    });
-
-    it('should return true when DisksService would throw', async () => {
-        vi.mocked(disksService.getDisks).mockRejectedValue(new Error('Failed'));
-
-        const available = await service.isAvailable();
-        expect(available).toBe(true);
-        expect(disksService.getDisks).not.toHaveBeenCalled();
-    });
+    it.each([
+        ['when disks exist', () =>
+            vi.mocked(disksService.getDisks).mockResolvedValue([
+                { id: 'disk1', device: '/dev/sda', name: 'Test Disk' } as unknown as Disk,
+            ])],
+        ['when no disks exist', () =>
+            vi.mocked(disksService.getDisks).mockResolvedValue([])],
+        ['when DisksService would throw', () =>
+            vi.mocked(disksService.getDisks).mockRejectedValue(new Error('Failed'))],
+    ])('should return true without checking disks %s', async (_label, setupMock) => {
+        setupMock();
+        const available = await service.isAvailable();
+        expect(available).toBe(true);
+        expect(disksService.getDisks).not.toHaveBeenCalled();
+    });
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@api/src/unraid-api/graph/resolvers/metrics/temperature/sensors/disk_sensors.service.spec.ts`
around lines 36 - 60, Collapse the three redundant tests into a table-driven
it.each for service.isAvailable by creating an array of cases that sets
vi.mocked(disksService.getDisks) to either a resolved array ([{ id: 'disk1',
device: '/dev/sda', name: 'Test Disk' }]), a resolved empty array ([]), or a
rejected Error('Failed'), and for each case call await service.isAvailable() and
assert expect(available).toBe(true) and
expect(disksService.getDisks).not.toHaveBeenCalled(); keep the test name generic
like "returns true for various disksService outcomes" and reuse the existing
mocks (vi.mocked and disksService.getDisks) and service.isAvailable to locate
the code.
packages/unraid-api-plugin-connect/src/__test__/connect-startup-tasks.test.ts (1)

47-50: Avoid asserting exact log message text in behavior tests.

These assertions couple tests to log wording. Prefer checking that warning behavior occurred and the error object was propagated.

Suggested assertion change
-        expect(logger.warn).toHaveBeenCalledWith(
-            'Dynamic remote access startup failed',
-            backgroundError
-        );
+        expect(logger.warn).toHaveBeenCalledTimes(1);
+        expect(logger.warn).toHaveBeenCalledWith(expect.any(String), backgroundError);
...
-        expect(logger.warn).toHaveBeenCalledWith(
-            'Dynamic remote access startup failed',
-            backgroundError
-        );
+        expect(logger.warn).toHaveBeenCalledTimes(1);
+        expect(logger.warn).toHaveBeenCalledWith(expect.any(String), backgroundError);

As per coding guidelines: "Test what the code does, not implementation details like exact error message wording - avoid writing tests that break when minor changes are made to ... log formats."

Also applies to: 72-75

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/unraid-api-plugin-connect/src/__test__/connect-startup-tasks.test.ts`
around lines 47 - 50, Replace fragile assertions that check exact log text with
checks that only assert a warning was emitted and the error object was
propagated: update the expect(logger.warn).toHaveBeenCalledWith('Dynamic remote
access startup failed', backgroundError) (and the similar assertion around lines
72-75) to something like
expect(logger.warn).toHaveBeenCalledWith(expect.any(String), backgroundError) or
assert logger.warn was called and that one of its args equals backgroundError;
keep the reference to logger.warn and backgroundError so the test no longer
depends on exact wording.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts`:
- Line 127: In getMetrics, initializeProviders() is awaited outside the existing
error handling, so if initializeProviders rejects the rejection will bubble
instead of returning null like other failures; wrap the await
this.initializeProviders() call inside the getMetrics try/catch (or add a
dedicated try/catch around it) so any thrown error is caught, logged (use the
same logger used elsewhere in this service), and getMetrics returns null on
failure; reference the getMetrics method and the initializeProviders() method
when making the change.

---

Duplicate comments:
In
`@packages/unraid-api-plugin-connect/src/__test__/connect-startup-tasks.test.ts`:
- Around line 3-4: Remove the unused imports MothershipController and
DynamicRemoteAccessService from the test file; locate the import statements
importing MothershipController from 'mothership.controller.js' and
DynamicRemoteAccessService from 'dynamic-remote-access.service.js' in
connect-startup-tasks.test.ts and delete those two import specifiers so the test
no longer imports unused classes and the static-analysis warnings are resolved.

---

Nitpick comments:
In
`@api/src/unraid-api/graph/resolvers/metrics/temperature/sensors/disk_sensors.service.spec.ts`:
- Around line 36-60: Collapse the three redundant tests into a table-driven
it.each for service.isAvailable by creating an array of cases that sets
vi.mocked(disksService.getDisks) to either a resolved array ([{ id: 'disk1',
device: '/dev/sda', name: 'Test Disk' }]), a resolved empty array ([]), or a
rejected Error('Failed'), and for each case call await service.isAvailable() and
assert expect(available).toBe(true) and
expect(disksService.getDisks).not.toHaveBeenCalled(); keep the test name generic
like "returns true for various disksService outcomes" and reuse the existing
mocks (vi.mocked and disksService.getDisks) and service.isAvailable to locate
the code.

In
`@packages/unraid-api-plugin-connect/src/__test__/connect-startup-tasks.test.ts`:
- Around line 47-50: Replace fragile assertions that check exact log text with
checks that only assert a warning was emitted and the error object was
propagated: update the expect(logger.warn).toHaveBeenCalledWith('Dynamic remote
access startup failed', backgroundError) (and the similar assertion around lines
72-75) to something like
expect(logger.warn).toHaveBeenCalledWith(expect.any(String), backgroundError) or
assert logger.warn was called and that one of its args equals backgroundError;
keep the reference to logger.warn and backgroundError so the test no longer
depends on exact wording.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 085ffbbf-0b88-485e-ace5-92a45121edab

📥 Commits

Reviewing files that changed from the base of the PR and between 94bfa73 and d454d08.

📒 Files selected for processing (4)
  • api/src/unraid-api/graph/resolvers/metrics/temperature/sensors/disk_sensors.service.spec.ts
  • api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.spec.ts
  • api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts
  • packages/unraid-api-plugin-connect/src/__test__/connect-startup-tasks.test.ts

}

async getMetrics(): Promise<TemperatureMetrics | null> {
await this.initializeProviders();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle initialization failures inside getMetrics.

At Line 127, initializeProviders() is awaited outside the method’s error handling block. If initialization rejects (e.g., unexpected config/runtime error), this can bubble and fail the request instead of returning null gracefully like the rest of getMetrics.

Suggested fix
 async getMetrics(): Promise<TemperatureMetrics | null> {
-    await this.initializeProviders();
+    try {
+        await this.initializeProviders();
+    } catch (err) {
+        this.logger.error('Failed to initialize temperature providers', err);
+        return null;
+    }
 
     // Check if we can use recent history instead of re-reading sensors
     const mostRecent = this.history.getMostRecentReading();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
await this.initializeProviders();
try {
await this.initializeProviders();
} catch (err) {
this.logger.error('Failed to initialize temperature providers', err);
return null;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts`
at line 127, In getMetrics, initializeProviders() is awaited outside the
existing error handling, so if initializeProviders rejects the rejection will
bubble instead of returning null like other failures; wrap the await
this.initializeProviders() call inside the getMetrics try/catch (or add a
dedicated try/catch around it) so any thrown error is caught, logged (use the
same logger used elsewhere in this service), and getMetrics returns null on
failure; reference the getMetrics method and the initializeProviders() method
when making the change.

- purpose: make the unraid-api CLI recover cleanly from wedged PM2 state on server startup
- before: start, restart, status, and stop passed --mini-list to PM2, and start left PM2 daemon state behind after stop/delete
- problem: on the coworker server, unraid-api start and pm2 start --mini-list were hanging for days while no managed unraid-api process existed in PM2
- now: the CLI removes --mini-list from PM2 lifecycle commands and start fully kills and clears the configured PM2 home before launching again
- how: update the start/restart/status/stop command wrappers and add regression coverage for the exact PM2 arguments we send
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/src/unraid-api/cli/start.command.ts`:
- Around line 26-27: The code calls await this.pm2.run({ tag: 'PM2 Kill' },
'kill', '--no-autorestart') and then await this.pm2.deletePm2Home() but omits
forceKillPm2Daemon(), which can leave a stuck daemon racing the delete; insert
an awaited call to this.pm2.forceKillPm2Daemon() between the pm2.run(... 'kill'
...) and this.pm2.deletePm2Home() calls so the daemon is forcibly terminated
before deleting PM2_HOME.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b6c14863-ee97-4f12-941e-1321dd8890b3

📥 Commits

Reviewing files that changed from the base of the PR and between d454d08 and c1bfdd3.

📒 Files selected for processing (5)
  • api/src/unraid-api/cli/__test__/pm2-commands.spec.ts
  • api/src/unraid-api/cli/restart.command.ts
  • api/src/unraid-api/cli/start.command.ts
  • api/src/unraid-api/cli/status.command.ts
  • api/src/unraid-api/cli/stop.command.ts

Comment on lines +26 to +27
await this.pm2.run({ tag: 'PM2 Kill' }, 'kill', '--no-autorestart');
await this.pm2.deletePm2Home();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Also force-kill the PM2 daemon before deleting PM2_HOME.

The hardened stop --delete path already does pm2 kill then forceKillPm2Daemon() then deletePm2Home(). Skipping forceKillPm2Daemon() here means the new startup-recovery path can still race a stuck daemon and fail to clear the old state cleanly.

Suggested fix
         await this.pm2.run({ tag: 'PM2 Delete' }, 'delete', ECOSYSTEM_PATH);
         await this.pm2.run({ tag: 'PM2 Kill' }, 'kill', '--no-autorestart');
+        await this.pm2.forceKillPm2Daemon();
         await this.pm2.deletePm2Home();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
await this.pm2.run({ tag: 'PM2 Kill' }, 'kill', '--no-autorestart');
await this.pm2.deletePm2Home();
await this.pm2.run({ tag: 'PM2 Kill' }, 'kill', '--no-autorestart');
await this.pm2.forceKillPm2Daemon();
await this.pm2.deletePm2Home();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/src/unraid-api/cli/start.command.ts` around lines 26 - 27, The code calls
await this.pm2.run({ tag: 'PM2 Kill' }, 'kill', '--no-autorestart') and then
await this.pm2.deletePm2Home() but omits forceKillPm2Daemon(), which can leave a
stuck daemon racing the delete; insert an awaited call to
this.pm2.forceKillPm2Daemon() between the pm2.run(... 'kill' ...) and
this.pm2.deletePm2Home() calls so the daemon is forcibly terminated before
deleting PM2_HOME.

- Purpose: make plugin installs create the dependency recovery archive that startup already expects.
- Before: installs wrote vendor archive metadata but never generated the archive, so a partial install that lost node_modules had no recovery path.
- Problem: overlapping or interrupted installs on the server could leave /usr/local/unraid-api without dependencies, and restart failed immediately instead of self-healing.
- Change: the plugin installer now runs archive-dependencies before first start, and install verification now checks both node_modules and the vendor archive.
- How: add the archive creation step to the .plg install flow and extend verify_install.sh to validate dependency presence and vendor archive config integrity.
- Purpose: make deferred startup and temperature paths tolerate transient failures without wedging later reads.
- Before: temperature provider discovery could permanently cache an empty provider list after transient probe failures, and getMetrics could bubble initialization failures.
- Problem: a disk temperature query after a bad provider probe could return no data or fail instead of retrying cleanly.
- Change: provider discovery now retries after empty scans with probe failures, getMetrics returns null on initialization errors, and Connect startup logs settled rejection reasons without wrapping errors.
- How: inspect Promise.allSettled results directly for Connect startup, preserve structured Nest warning errors, and add behavior coverage for transient temperature probe failures.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@api/src/unraid-api/graph/resolvers/metrics/temperature/sensors/disk_sensors.service.spec.ts`:
- Around line 40-42: The failing test entry is a formatting issue inside the
it.each test data where
vi.mocked(disksService.getDisks).mockResolvedValue([...]) is split/misaligned;
reformat the it.each row so the mocked call and its array argument are on a
single well-indented line (or follow the project's prettier/formatter rules) —
specifically update the it.each entry that uses
vi.mocked(disksService.getDisks).mockResolvedValue and the Disk cast to be
properly spaced and indented so the test file passes the Test API formatting
check.

In
`@api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts`:
- Around line 121-122: The current assignment sets initialized =
availableProviders.length > 0 || !hadProbeFailure which marks the service
initialized if any probe succeeded or if there were no probe failures; change
this so initialized only becomes true when there were no probe failures and
there is at least one discovered provider (use a logical AND between
!hadProbeFailure and availableProviders.length > 0). Update the initialization
in temperature.service.ts where availableProviders, hadProbeFailure and
initialized are set so a transient probe error does not permanently hide
providers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a5e2bdae-b64b-4654-84a0-d410fa396850

📥 Commits

Reviewing files that changed from the base of the PR and between 0729266 and 6a55fe0.

📒 Files selected for processing (6)
  • api/src/unraid-api/graph/resolvers/metrics/temperature/sensors/disk_sensors.service.spec.ts
  • api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.spec.ts
  • api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts
  • api/src/unraid-api/plugin/plugin-management.service.ts
  • packages/unraid-api-plugin-connect/src/__test__/connect-startup-tasks.test.ts
  • packages/unraid-api-plugin-connect/src/startup/connect-startup-tasks.ts
✅ Files skipped from review due to trivial changes (2)
  • api/src/unraid-api/plugin/plugin-management.service.ts
  • packages/unraid-api-plugin-connect/src/test/connect-startup-tasks.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/unraid-api-plugin-connect/src/startup/connect-startup-tasks.ts

Comment thread api/src/unraid-api/graph/resolvers/metrics/temperature/temperature.service.ts Outdated
- Purpose: address review feedback for temperature provider initialization and test formatting.

- Before: a transient provider probe failure could still mark initialization complete when another provider was found, and the disk sensor spec had a formatter-sensitive chained mock call.

- Problem: a one-time probe error could hide later provider recovery, while CI could fail on Prettier formatting even when behavior was unchanged.

- Now: initialization only completes after a clean provider scan with at least one provider, and the spec uses a local mocked function variable to avoid chained-call formatting churn.

- Verification: pnpm --filter ./api lint and targeted temperature Vitest specs passed locally.
- Purpose: prevent the API CI setup step from failing when libvirt is already running on the GitHub runner.

- Before: the workflow always launched /usr/sbin/libvirtd --daemon after installing libvirt-daemon-system.

- Problem: the APT package can already start libvirtd, so the second daemon failed while trying to obtain the existing pidfile.

- Now: the workflow first checks whether sudo virsh can reach libvirt, removes a stale pidfile only when no libvirtd process exists, and falls back through systemctl, service, and direct daemon startup.

- Verification: parsed .github/workflows/main.yml as YAML and ran bash -n against the Setup libvirt script block.
- Purpose: avoid failing CI on a system libvirt diagnostic that is not the hypervisor path used by the VM tests.

- Before: the setup step made sudo virsh list --all a hard gate after starting libvirtd.

- Problem: GitHub runners can reset the qemu:///system connection even though the tests validate qemu:///session.

- After: the workflow still starts libvirtd idempotently, then prints system virsh status without failing the job.

- How it works: the final diagnostic virsh command is restored to non-blocking behavior with || true.
- Purpose: restore the raw compact PM2 lifecycle output expected by unraid-api CLI users.

- Before: start, restart, status, and stop omitted --mini-list, so PM2 printed the formatted table output and startup banner.

- Problem: the formatted output is noisy in terminal and plugin flows, and it regressed the branch away from the expected raw text mode.

- New behavior: PM2 lifecycle commands pass --mini-list again for compact output while keeping the existing PM2 cleanup hardening.

- How it works: re-add --mini-list to start, restart, status, and stop, then update the PM2 command regression spec to assert those arguments.
- Purpose: keep the plugin installer focused on the node_modules packaged in the txz artifact.

- Before: the install flow created a boot-drive dependency recovery archive and verification required that archive to exist.

- Why: that made the archive a second source of truth and caused install verification to depend on recovery plumbing instead of the packaged dependency tree.

- What changed: remove the install-time archive creation and stop validating the vendor archive during install verification.

- How: leave the node_modules population check in verify_install.sh while dropping the vendor archive existence check.
- Purpose: move Connect startup work onto the same background scheduling shape used by Docker startup tasks.

- Before: Connect startup work was awaited from the app.ready listener through a helper that looked reusable but only had one production caller.

- Problem: the awaited helper made the startup path look heavier than it needed to be and obscured that mothership startup should not stall primary server readiness.

- New behavior: app.ready schedules remote access and mothership startup work with zero-delay timers, and each task logs its own failure without blocking the other.

- How it works: scheduleConnectStartupTasks accepts narrow service surfaces, catches background errors per task, and tests the listener through Nest's testing module.

- Build fix: add explicit portable Apollo and subscription return types so declaration emit no longer leaks private pnpm symlink paths.
- Purpose: suppress PM2 daemon startup banner output during CLI-managed PM2 commands.

- Before: PM2 commands inherited caller environment but did not force PM2 discrete mode.

- Problem: daemon startup could print PM2 banner output into command flows where raw output needs to stay clean.

- New behavior: PM2Service.run now sets PM2_DISCRETE_MODE to true by default while preserving any caller-provided override.

- How it works: the service normalizes the caller env once, keeps /usr/local/bin in PATH, always sets PM2_HOME, and adds a focused test for the final execa environment.
@github-actions
Copy link
Copy Markdown
Contributor

This plugin has been deployed to Cloudflare R2 and is available for testing.
Download it at this URL:

https://preview.dl.unraid.net/unraid-api/tag/PR1988/dynamix.unraid.net.plg

@elibosley elibosley merged commit 7b08ee2 into main Apr 15, 2026
13 checks passed
@elibosley elibosley deleted the codex/defer-startup-tasks branch April 15, 2026 19:08
elibosley pushed a commit that referenced this pull request Apr 15, 2026
🤖 I have created a release *beep* *boop*
---


## [4.32.2](v4.32.1...v4.32.2)
(2026-04-15)


### Bug Fixes

* **startup:** defer connect and temperature boot tasks
([#1988](#1988))
([7b08ee2](7b08ee2)),
closes [#1989](#1989)

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API not starting

2 participants