[Bug]:  WORKER_ALREADY_EXISTS on restart — should worker names be unique per instance?

### What happened?

When a worker process restarts, or when multiple instances run in parallel, workers fail to register with this error:

```
StreamError(Io(Custom { kind: AddrInUse, error: "WORKER_ALREADY_EXISTS" }))
```

The workers never recover — they loop on the error until manual intervention.

This breaks two common scenarios for us:

**1. Rolling deploys (AWS ECS).** During a rolling deploy, the new task starts before the old task is stopped, so both run simultaneously for 30s–2min. The new task cannot register because the old task's Postgres session still holds the advisory lock. The deploy produces minutes of error logs until the old task finally shuts down.

**2. Horizontal scaling.** Running the same task with 2+ instances fails the same way — only one instance successfully registers, the others loop indefinitely. The docs' horizontal scaling example (`./my-worker & ./my-worker &`) does not work as written with `PostgresStorage`.

After digging into the source, this is caused by the advisory-lock check in the register SQL:

```sql
ON CONFLICT (id) DO UPDATE SET ...
WHERE pg_try_advisory_lock(hashtext(workers.id));
```
from https://github.com/apalis-dev/apalis-postgres/blob/main/queries/worker/register.sql

The previous session still holds the lock, so `pg_try_advisory_lock` returns false, zero rows are updated, and the Rust layer returns `WORKER_ALREADY_EXISTS`.

### Expected behavior

One of the following should be true, and clearly documented:

- Worker names can be stable (`email-worker`) and restart/scale cleanly, or
- Worker names must be unique per running instance (`email-worker-{instance_id}`), and the docs should say so explicitly — including for rolling deploys, not just scaling.

The current docs recommend "a unique, descriptive name" with the example `email-worker-us-east`, which doesn't make the per-instance requirement clear.

reference : https://apalis.dev/docs/guides/workers/introduction#best-practices


### Steps to reproduce


1. Start a process with a worker named `email-worker` using `PostgresStorage`.
2. Before stopping it, start a second process with the same name.
3. The second process fails with `WORKER_ALREADY_EXISTS`.

Same result if you stop the first process and restart it quickly — the old Postgres session hasn't fully closed yet.

### Minimal code example

```rust
use apalis::prelude::*;
use apalis_postgres::PostgresStorage;
use sqlx::PgPool;
use tokio::signal::unix::{signal, SignalKind};
use std::time::Duration;

#[derive(Clone, serde::Serialize, serde::Deserialize)]
struct Email { to: String }

#[derive(Clone, serde::Serialize, serde::Deserialize)]
struct Sms { to: String }

async fn send_email(_: Email) -> Result<(), BoxDynError> {
    Ok(())
}

async fn send_sms(_: Sms) -> Result<(), BoxDynError> {
    Ok(())
}

async fn shutdown_signal() {
    let mut sigterm = signal(SignalKind::terminate()).unwrap();
    let mut sigint = signal(SignalKind::interrupt()).unwrap();
    tokio::select! {
        _ = sigterm.recv() => {},
        _ = sigint.recv() => {},
    }
}

#[tokio::main]
async fn main() -> Result<(), BoxDynError> {
    let pool = PgPool::connect(&std::env::var("DATABASE_URL")?).await?;
    PostgresStorage::setup(&pool).await?;

    let email_storage = PostgresStorage::new(&pool);
    let sms_storage = PostgresStorage::new(&pool);

    Monitor::new()
        .register(
            WorkerBuilder::new("email-worker")
                .backend(email_storage)
                .build_fn(send_email),
        )
        .register(
            WorkerBuilder::new("sms-worker")
                .backend(sms_storage)
                .build_fn(send_sms),
        )
        .shutdown_timeout(Duration::from_secs(30))
        .run_with_signal(shutdown_signal())
        .await?;

    Ok(())
}
```

### Version

1.0.0-rc.x

### Environment

- `apalis` 1.0.0-rc.4
- `apalis-postgres` 1.0.0-rc.4
- PostgreSQL 16
- Rust 1.91
- Deployment: AWS ECS (Fargate)


### Relevant log output

```shell
worker errored; restarting worker=email-worker error=StreamError(Io(Custom { kind: AddrInUse, error: "WORKER_ALREADY_EXISTS" })) attempt=0 shutting_down=true
worker errored; restarting worker=sms-worker error=StreamError(Io(Custom { kind: AddrInUse, error: "WORKER_ALREADY_EXISTS" })) attempt=0 shutting_down=true
```

### Additional context

Questions that would help clarify the intended usage:

1. Is this expected behavior, or a lifecycle bug?
2. What is the recommended pattern for zero-downtime rolling deploys?
3. Should the docs be updated to clearly state that worker names must be unique per instance (including across deploy overlap), and show how to derive them from environment?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: WORKER_ALREADY_EXISTS on restart — should worker names be unique per instance? #21

What happened?

Expected behavior

Steps to reproduce

Minimal code example

Version

Environment

Relevant log output

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Bug]: WORKER_ALREADY_EXISTS on restart — should worker names be unique per instance? #21

Description

What happened?

Expected behavior

Steps to reproduce

Minimal code example

Version

Environment

Relevant log output

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions