Automatic Letsencrypt certificate refresh#2730
Automatic Letsencrypt certificate refresh#2730j-chmielewski wants to merge 16 commits intorelease/2.0from
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds an automated Let's Encrypt certificate renewal/refresh mechanism for the Edge (proxy) certificate, including admin email notifications (with logs attached) on ACME failures, and refactors shared ACME helper code into a dedicated core module.
Changes:
- Add periodic (daily) LetsEncrypt certificate expiry checks in the utility thread and trigger ACME refresh when within a threshold.
- Introduce a new
defguard_core::letsencryptmodule that encapsulates ACME trigger, timeout handling, cert persistence, and notification logic. - Add a new mail template + DB migration entries to notify admins when automatic certificate refresh fails (including logs attachment).
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| migrations/20260417073540_[2.0.0]_letsencrypt_cert_refresh.up.sql | Adds mail_context rows for the new failure notification template. |
| migrations/20260417073540_[2.0.0]_letsencrypt_cert_refresh.down.sql | Removes the inserted mail_context rows for rollback. |
| crates/defguard_mail/templates/letsencrypt-cert-refresh-failed.text | Adds plaintext template for refresh-failure notification. |
| crates/defguard_mail/templates/letsencrypt-cert-refresh-failed.mjml | Adds MJML template for refresh-failure notification. |
| crates/defguard_mail/src/templates.rs | Adds mail-sending function for refresh-failure notifications (with log attachment). |
| crates/defguard_mail/src/mail.rs | Adds new MailMessage variant and wires it to templates + subject. |
| crates/defguard_core/src/utility_thread.rs | Schedules daily LE expiry checks and triggers refresh. |
| crates/defguard_core/src/lib.rs | Exposes new letsencrypt module. |
| crates/defguard_core/src/letsencrypt.rs | Implements refresh logic, proxy ACME trigger, cert persistence, and admin notification (plus tests). |
| crates/defguard_core/src/handlers/component_setup.rs | Reuses the shared LE/ACME helpers from the new module for the SSE setup flow. |
| crates/defguard_common/src/db/models/settings.rs | Adds Settings::proxy_hostname() helper + new SettingsUrlError variants. |
| crates/defguard/src/main.rs | Updates utility thread startup to pass proxy_control_tx. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub fn proxy_hostname(&self) -> Result<String, SettingsUrlError> { | ||
| let url = self | ||
| .proxy_public_url() | ||
| .map_err(|_err| SettingsUrlError::UnparsableEdgeUrl(self.public_proxy_url.clone()))?; | ||
| let hostname = url | ||
| .host_str() | ||
| .ok_or_else(|| SettingsUrlError::EdgeUrlMissingHostname(self.public_proxy_url.clone()))? | ||
| .to_string(); |
There was a problem hiding this comment.
proxy_hostname() doesn’t special-case an empty/whitespace public_proxy_url. If it’s empty, the user-facing error becomes Unparsable Edge url: , which is much less actionable than the previous behavior in the ACME flow. Consider trimming + explicitly erroring on empty, with guidance (e.g., “Public Edge URL is not configured…”) before attempting Url::parse.
| for user in admin_users { | ||
| templates::letsencrypt_cert_refresh_failed_mail(&user.email, &mut conn, &logs.join("\n")) |
There was a problem hiding this comment.
logs.join("\n") creates a temporary String and you’re passing a reference to it into an async call that is awaited. This pattern typically fails to compile (borrowed temporary does not live long enough across .await). Assign the joined logs to a local variable (e.g., let joined = logs.join("\n");) and pass &joined into the mail function (or change the mail function to take an owned String).
| for user in admin_users { | |
| templates::letsencrypt_cert_refresh_failed_mail(&user.email, &mut conn, &logs.join("\n")) | |
| let joined_logs = logs.join("\n"); | |
| for user in admin_users { | |
| templates::letsencrypt_cert_refresh_failed_mail(&user.email, &mut conn, &joined_logs) |
| tokio::spawn(async move { | ||
| let result = call_proxy_trigger_acme( | ||
| &pool_clone, | ||
| &proxy_host, | ||
| proxy_port, |
There was a problem hiding this comment.
tokio::spawn is used for the ACME RPC without keeping a JoinHandle. If the timeout branch triggers (see the sleep_until(deadline) arm below), do_letsencrypt_refresh returns but the spawned task keeps running in the background (potentially holding an open gRPC stream / keeping the Edge busy). Consider storing the JoinHandle and aborting it on timeout, or wrapping the entire call_proxy_trigger_acme future in tokio::time::timeout so it is actually cancelled.
| use tokio::sync::mpsc::{self, UnboundedSender, unbounded_channel}; | ||
| use tonic::{ | ||
| Request, | ||
| service::Interceptor, |
There was a problem hiding this comment.
tonic::service::Interceptor is imported but not used in this module (the client uses an inline closure interceptor). Removing the unused import will avoid warnings (and potential CI failures if warnings are denied).
| service::Interceptor, |
| let attachment = Attachment::new( | ||
| format!("defguard-letsencrypt-refresh-logs-{now}.txt"), |
There was a problem hiding this comment.
The attachment filename uses {now} via DateTime’s default Display formatting, which includes spaces and :. Those characters are commonly problematic in filenames/content-disposition handling across clients. Consider formatting the timestamp into a safe, filesystem/email-friendly form (e.g., now.format("%Y%m%dT%H%M%SZ")).
| let attachment = Attachment::new( | |
| format!("defguard-letsencrypt-refresh-logs-{now}.txt"), | |
| let timestamp = now.format("%Y%m%dT%H%M%SZ"); | |
| let attachment = Attachment::new( | |
| format!("defguard-letsencrypt-refresh-logs-{timestamp}.txt"), |
| ('letsencrypt-cert-refresh-failed', 'title', 'en_US', 'Letsencrypt certificate refresh failed'), | ||
| ('letsencrypt-cert-refresh-failed', 'content', 'en_US', 'Automatic Letsencrypt certificate refresh has failed. Please verify your Edge setup.'); |
There was a problem hiding this comment.
User-facing email copy uses "Letsencrypt"; elsewhere in the codebase docs/messages use "Let's Encrypt". Consider updating the inserted mail_context text to the standard branding/spelling for consistency.
| ('letsencrypt-cert-refresh-failed', 'title', 'en_US', 'Letsencrypt certificate refresh failed'), | |
| ('letsencrypt-cert-refresh-failed', 'content', 'en_US', 'Automatic Letsencrypt certificate refresh has failed. Please verify your Edge setup.'); | |
| ('letsencrypt-cert-refresh-failed', 'title', 'en_US', 'Let''s Encrypt certificate refresh failed'), | |
| ('letsencrypt-cert-refresh-failed', 'content', 'en_US', 'Automatic Let''s Encrypt certificate refresh has failed. Please verify your Edge setup.'); |
| Self::UserImportBlocked => "User import blocked".to_string(), | ||
| Self::EnrollmentNotification => "Defguard: User enrollment completed".to_string(), | ||
| Self::LetsencryptCertRefreshFailed => { | ||
| "Defguard: automatic Letsencrypt certificate refresh failed".to_string() |
There was a problem hiding this comment.
Email subject uses "Letsencrypt". For consistency with other user-facing text in the repo, consider using the standard "Let's Encrypt" spelling/capitalization.
| "Defguard: automatic Letsencrypt certificate refresh failed".to_string() | |
| "Defguard: automatic Let's Encrypt certificate refresh failed".to_string() |
| @@ -0,0 +1 @@ | |||
| DELETE FROM mail_context where "template" = 'letsencrypt-cert-refresh-failed'; | |||
There was a problem hiding this comment.
In migrations, other mail_context deletions use unquoted column names and uppercase WHERE (e.g., migrations/20260331114131_[2.0.0]_fix_mjml_label.up.sql:1). For consistency, consider DELETE FROM mail_context WHERE template = ... here as well (quotes on "template" aren’t needed).
| DELETE FROM mail_context where "template" = 'letsencrypt-cert-refresh-failed'; | |
| DELETE FROM mail_context WHERE template = 'letsencrypt-cert-refresh-failed'; |
No description provided.