EphemeralRunnerSet controller does not clean up runners stuck in init container failure

### Checks

- [x] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [ ] I am using charts that are officially provided

### Controller Version

0.13.0

### Deployment Method

Helm

### Checks

- [x] This isn't a question or user support case (For Q&A and community support, go to [Discussions](https://github.com/actions/actions-runner-controller/discussions)).
- [x] I've read the [Changelog](https://github.com/actions/actions-runner-controller/blob/master/docs/gha-runner-scale-set-controller/README.md#changelog) before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

### To Reproduce

```markdown
1. Deploy an `AutoscalingRunnerSet` with init containers (e.g., a `fix-docker-sock` init container that runs `ln -sf /var/run/user/1000/docker.sock /var/run/docker.sock`)
2. Have a node where the container runtime (containerd) experiences transient failures (e.g., `failed to create containerd task: failed to create shim task: context canceled`)
3. The init container fails with `Init:StartError`, causing the Pod phase to transition to `Failed`
4. Observe that the `EphemeralRunner` resource remains in `Pending` status indefinitely
5. The failed pods accumulate and consume runner pool capacity
```

### Describe the bug

When an EphemeralRunner pod fails during init container execution (e.g., `Init:StartError` due to containerd runtime failure), the EphemeralRunner controller does not effectively handle the failure. The failed pods remain and consume pool capacity, degrading the runner pool.

### Describe the expected behavior

I think:

- `EphemeralRunner` controller should inspect `pod.Status.InitContainerStatuses` in addition to main container statuses to properly detect and report init container failures
- `EphemeralRunnerSet` controller should clean up `Failed` EphemeralRunners during normal reconciliation (not only during deletion), similar to how it cleans up `Succeeded` runners

When a pod fails due to init container errors, the `EphemeralRunner` status should transition to `Failed` (not remain `Pending`), so that:
   - Operators can observe the actual state via `kubectl get ephemeralrunners`
   - External cleanup mechanisms (e.g., CronJobs) can target failed runners


### Additional Context

```yaml
N/A
```

### Controller Logs

```shell
N/A
```

### Runner Pod Logs

```shell
N/A
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EphemeralRunnerSet controller does not clean up runners stuck in init container failure #4452

Checks

Controller Version

Deployment Method

Checks

To Reproduce

Describe the bug

Describe the expected behavior

Additional Context

Controller Logs

Runner Pod Logs

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

EphemeralRunnerSet controller does not clean up runners stuck in init container failure #4452

Description

Checks

Controller Version

Deployment Method

Checks

To Reproduce

Describe the bug

Describe the expected behavior

Additional Context

Controller Logs

Runner Pod Logs

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions