Allow on-the-fly updates to adaptors, schemas, and icons #4473
taylordowns2000 wants to merge 15 commits into main from
Conversation
Codecov Report

❌ Patch coverage is

@@           Coverage Diff           @@
##             main    #4473      +/-   ##
==========================================
- Coverage   89.47%   89.07%    -0.41%
==========================================
  Files         425      430        +5
  Lines       20212    20388      +176
==========================================
+ Hits        18085    18160       +75
- Misses       2127     2228      +101

☔ View full report in Codecov by Sentry.
When I try to run this locally I get:

Main is fine. I haven't looked closely at this, but Claude says:

Not sure if this is a coincidence or a problem on the branch?
Unfortunately that's just what happens when you start the app, @josephjclark! I reported it first on September 10th, 2025. @stuartc reported it again more recently on January 21st.
josephjclark
left a comment
Tested this locally and it seems to work! The same mix commands still work as on main (so we don't need to change anything in deployments), and the buttons on the admin page work great.
I get Oban errors on this branch (but not others). Claude made some suggestions which made it go away. I honestly have no idea - an adult should take a look at it.
I'm a little discomforted that the logic of the schema install (and presumably others) has changed so much. Error handling looks different. I'm sure it's fine!
stuartc
left a comment
Hey @taylordowns2000, thanks for tackling this -- you know it's something I've
wanted to address for a long time. I have a few architectural concerns specific
to clustered/ephemeral deployments.
Rolling restarts: late nodes miss the refresh
The Oban worker uses unique: [period: 3600], which means only the first node's
startup job gets accepted during a rolling deploy. The job can run on any node
-- including one that's about to be terminated. The PubSub broadcast only
reaches nodes that are alive and subscribed at that moment. Nodes that start
later get their Oban job deduped and never receive a broadcast, so they're stuck
with build-time data until the next cron run.
Rough timeline for a 3-node rolling deploy:
T=0 Old A, Old B, Old C running
T=1 New A starts -> Oban job accepted
T=3 Job runs on Old B (about to die)
T=8 Job finishes -> broadcasts -> New A receives it
T=10 Old B terminates (refreshed data lost)
T=12 New B starts -> Oban job DEDUPED -> no broadcast -> stuck with build-time data
T=17 New C starts -> same -> stuck
New A ends up with fresh icons/schemas, but New B and New C are stuck until the
next scheduled cron run (which could be hours away).
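To make the dedup mechanics concrete, here is a minimal sketch of the pattern being described. The module and queue names are hypothetical, not taken from this PR; only the `unique: [period: 3600]` option is from the discussion above.

```elixir
# Hypothetical refresh worker illustrating the dedup window.
defmodule Lightning.AdaptorRegistry.RefreshWorker do
  # `unique: [period: 3600]` tells Oban to reject any job inserted
  # within an hour of a matching one. During a rolling deploy, only
  # the FIRST new node's startup insert is accepted; later nodes'
  # inserts are deduped, so they never trigger (or receive) a refresh.
  use Oban.Worker, queue: :background, unique: [period: 3600]

  @impl Oban.Worker
  def perform(%Oban.Job{}) do
    # fetch registry/schemas/icons, then broadcast --
    # but only to nodes alive and subscribed right now
    :ok
  end
end
```

One consequence of this design is that uniqueness is enforced cluster-wide in Postgres, while the PubSub broadcast is delivery-at-that-moment only -- which is exactly the mismatch the timeline above illustrates.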
Icon path is different in releases
Plug.Static serves from :lightning's release priv dir (e.g.
/app/lib/lightning-x.y.z/priv/static/), but adaptor_icons_path in prod is
"priv/static/images/adaptors" -- a CWD-relative path that resolves to
/app/priv/static/images/adaptors/. These are different directories. Icons
written at runtime go to the CWD-relative path, but Phoenix serves from the
release priv dir, so refreshed icons get written to disk but are never actually
served to users.
The build-time mix task works because it runs before mix release, so the files
end up inside the release artifact. At runtime, adaptor_icons_path would need
to resolve via :code.priv_dir(:lightning) instead.
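A sketch of that fix, assuming a `runtime.exs`-style config file; the `:adaptor_icons_path` config key is taken from the discussion above, but the exact shape is illustrative:

```elixir
# Resolve the icons directory from the release's priv dir at runtime,
# instead of the CWD-relative "priv/static/images/adaptors".
# :code.priv_dir/1 returns a charlist, hence the to_string/1.
icons_dir =
  :lightning
  |> :code.priv_dir()
  |> to_string()
  |> Path.join("static/images/adaptors")

config :lightning, :adaptor_icons_path, icons_dir
```

With this, runtime-written icons land in the same directory tree (`/app/lib/lightning-x.y.z/priv/static/`) that Plug.Static actually serves from in a release.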
Also bear in mind that if we put files inside priv/static without fingerprinting
and updating the manifest file, we lose cache control (i.e. cache-busting
filenames) and gzip encoding (not that big a deal, I guess, with compressed
images). So Plug.Static DOES do ETags and 304s, but if you have fingerprinted
paths then you can tell the browser to cache forever (1 year) and you never get
another request from the same client. Not that big of a deal with adaptor
icons though.
Ephemeral storage and the "go fetch yourself" pattern
On ephemeral containers, all runtime files are lost on restart. Each node
independently fetches from NPM, GitHub, and jsDelivr after receiving a broadcast
-- the broadcast says "go refresh yourself" rather than "data is available."
With 2-3 nodes this is ok, but it means each deploy triggers N independent
fetches from external services, and the data is immediately lost when that
container cycles.
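The two broadcast semantics being contrasted here can be sketched with Phoenix.PubSub; the topic and message atoms are illustrative, not from this PR:

```elixir
# Current pattern: every subscribed node independently re-fetches
# from NPM, GitHub, and jsDelivr when it hears this.
Phoenix.PubSub.broadcast(Lightning.PubSub, "adaptor_registry", :refresh)

# Inverted pattern: one worker writes fresh data to Postgres first,
# then nodes only drop their local cache and lazily re-read from the DB.
Phoenix.PubSub.broadcast(Lightning.PubSub, "adaptor_registry", :invalidate_cache)
```

The difference matters because the second message is safe to miss: a node that never hears it still reads correct data from the DB on its next cache miss.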
A possible alternative direction
When I was working on this, my thinking was to invert this for
clustered/ephemeral setups, something like:
- PubSub signals "invalidate your cache" rather than "go fetch your own copy."
- DB as single source of truth for registry data and schemas (they're just
  JSON). One Oban worker fetches and writes to Postgres; all nodes read from the
  DB with an ETS cache. No stale window on startup -- new nodes read from the DB
  immediately. (My old unfinished branch used the filesystem as well, but I
  think it may be time to use the db.)
- Lazy caching proxy for icons -- serve through a dynamic route (not
  Plug.Static) that fetches from GitHub or wherever on first request, with a
  TTL. This sidesteps the path divergence and ephemeral storage issues entirely,
  and avoids downloading the full adaptors tarball for icons that may never be
  requested.

Or a mix of both.
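The "DB as source of truth, ETS as read-through cache" idea above can be sketched in a few lines; the module and table names are made up for illustration, and a real loader would query Postgres:

```elixir
# Minimal read-through cache sketch over a named ETS table.
defmodule RegistryCache do
  @table :adaptor_registry_cache

  def start do
    :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
  end

  # Read from ETS; on a miss, run the loader (e.g. a Postgres query)
  # and cache the result for subsequent reads.
  def fetch(key, loader) when is_function(loader, 0) do
    case :ets.lookup(@table, key) do
      [{^key, value}] ->
        value

      [] ->
        value = loader.()
        :ets.insert(@table, {key, value})
        value
    end
  end

  # The PubSub "invalidate" handler just clears the table;
  # the next read transparently reloads from the DB.
  def invalidate, do: :ets.delete_all_objects(@table)
end
```

Because the cache is lazily filled, a freshly started node has no stale window: its first read goes straight to the DB, and a missed invalidation broadcast only delays freshness until the next TTL or restart rather than pinning the node to build-time data.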
This kind of stuff needs to be tested in an environment much closer to
production -- the priv directory thing is super easy to miss in dev mode. It's
not the first time I've wanted something like this, but being able to build a
proper Elixir release image (or no image) easily and repeatably would be super
valuable here.
Happy to chat about any of this -- I know this is tricky territory and if we
don't get it right it's gonna be a real pain, and equally awesome if we do.
Really glad to see this moving forward!
Awesome feedback, @stuartc. I think I've understood and implemented it, and my basic click-testing seems to work. Claude's description of the changes below:

Overview

Replace the current filesystem + GenServer-state storage for adaptor registry, credential schemas, and adaptor icons with Postgres as single source of truth and ETS as a read-through cache. This eliminates three production issues:

Files Created (9 new)

Files Modified (key changes)

Architecture
Description
Enables on-the-fly updates to the adaptor registry, credential schemas, and adaptor icons without requiring an application restart. Superusers can trigger refreshes from a new Settings > Maintenance admin page, and a configurable Oban cron job (ADAPTOR_REFRESH_INTERVAL_HOURS) keeps them in sync automatically across clustered nodes.
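As a sketch, the configurable cron interval might be wired into Oban's cron plugin roughly like this. The worker module name and the fallback default are illustrative, not taken from this PR -- only the `ADAPTOR_REFRESH_INTERVAL_HOURS` variable name is from the description above:

```elixir
# runtime.exs-style sketch: read the interval from the environment
# and schedule the refresh worker every N hours.
interval_hours =
  "ADAPTOR_REFRESH_INTERVAL_HOURS"
  |> System.get_env("6")
  |> String.to_integer()

config :lightning, Oban,
  plugins: [
    {Oban.Plugins.Cron,
     crontab: [
       {"0 */#{interval_hours} * * *", Lightning.AdaptorRegistry.RefreshWorker}
     ]}
  ]
```

Since Oban's cron plugin runs on the node holding leadership, this naturally gives one refresh per interval across the cluster rather than one per node.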
Closes #3114
Closes #2209
Closes #325 (wow what a golden oldie!)
Closes #1996
Changes
Validation steps
AI Usage
Please disclose whether you've used AI anywhere in this PR (it's cool, we just
want to know!):
You can read more details in our
Responsible AI Policy
Pre-submission checklist
(e.g., /review with Claude Code)
(e.g., :owner, :admin, :editor, :viewer)