Skip to content

iSCSI: Add NOP-Out keepalive and handle target-initiated NOP-In#64

Merged
LTRData merged 2 commits intoLTRData:LTRData.DiscUtils-initialfrom
devedse:iscsi-nop-keepalive
Apr 17, 2026
Merged

iSCSI: Add NOP-Out keepalive and handle target-initiated NOP-In#64
LTRData merged 2 commits intoLTRData:LTRData.DiscUtils-initialfrom
devedse:iscsi-nop-keepalive

Conversation

@devedse
Copy link
Copy Markdown

@devedse devedse commented Apr 16, 2026

Problem

During idle periods (no active I/O on the iSCSI session) the target enforces its DefaultTime2Retain/DefaultTime2Wait timeout and will silently drop the session. The next SCSI command then fails with a broken-pipe / EOF error instead of a clean retry, causing the caller to see an unexpected I/O error.

A second, related issue is that DiscUtils had no handler for target-initiated NOP-In PDUs at all. A target can send a NOP-In at any time to probe the initiator, and the RFC requires the initiator to echo it back with a NOP-Out. Without this handler the PDU was returned to the caller as an unrecognised opcode, corrupting the command/response stream.

Fix – changes in Connection.cs

1 – SemaphoreSlim _streamSemaphore

All traffic that touches the underlying NetworkStream (writes in Send, SendAsync, Logout, and the new keepalive callback; reads in ReadPdu / ReadPduAsync) is now serialised by a SemaphoreSlim(1,1). This prevents the timer callback from interleaving with an in-flight command.

2 – Timer _keepAliveTimer + KeepAliveCallback

Started immediately after the login negotiation completes. It fires every 10 seconds and sends a NOP-Out ping. If the semaphore is already held (a real command is in progress) the tick is silently skipped — it simply waits for the next interval. Any IOException or ObjectDisposedException during the ping is swallowed; the broken connection will be surfaced naturally on the next real command.

3 – SendNopOut

Builds and sends an initiator-initiated NOP-Out with ITT = 0xFFFFFFFF and TTT = 0xFFFFFFFF (RFC 3720 §10.18). With these reserved values the target must not send a NOP-In response, so there is no extra PDU to read — the keepalive is purely write-side.

4 – HandleNopIn

Responds to a target-initiated NOP-In (TTT ≠ 0xFFFFFFFF) by sending a NOP-Out that echoes the TTT (RFC 3720 §10.19). Critically, SeenStatusSequenceNumber() is not called here: the RFC states that the StatSN carried in a target-initiated NOP-In is informational and does not consume a sequence number — calling it would cause the next real response to fail with a StatSN mismatch.

5 – ReadPdu / ReadPduAsync – NOP-In loop

Both sync and async read paths now loop over consecutive NOP-In PDUs (max 16) before returning the next real response PDU, ensuring neither path is disrupted by a target ping arriving between command and response.

RFC references

  • RFC 3720 §10.18 – NOP-Out (initiator-initiated ping)
  • RFC 3720 §10.19 – NOP-In (target-initiated ping and response)

devedse and others added 2 commits April 16, 2026 14:48
During idle periods an iSCSI target can close the session due to the
DefaultTime2Retain/DefaultTime2Wait timeout expiring while no traffic
is flowing. This causes the next I/O to fail with a broken connection
rather than a clean retry.

Changes in Connection.cs:

1. SemaphoreSlim (_streamSemaphore) - serialises all PDU traffic on
   the NetworkStream so the keepalive timer callback and the main
   Send / SendAsync / Logout paths can never interleave writes or
   reads on the same socket.

2. Keepalive timer (_keepAliveTimer) - fires every 10 seconds after a
   successful login and calls KeepAliveCallback.  The callback skips
   the tick if the semaphore is already held (a real command is in
   flight), so it adds zero latency to normal operation.

3. SendNopOut - issues an initiator-initiated NOP-Out PDU with
   ITT=0xFFFFFFFF and TTT=0xFFFFFFFF (RFC 3720 §10.18).  With these
   reserved values the target MUST NOT send a NOP-In response, so
   there is no read-side traffic to worry about.

4. HandleNopIn - when the target itself initiates a NOP-In (TTT !=
   0xFFFFFFFF) the initiator is required to respond with a NOP-Out
   that echoes the TTT (RFC 3720 §10.19).  Previously DiscUtils had
   no handler for this opcode, so a target-initiated ping would
   surface as an unrecognised PDU error.  The StatSN carried by a
   target-initiated NOP-In is informational and does NOT consume a
   sequence number, so SeenStatusSequenceNumber() is deliberately
   NOT called.

5. ReadPdu / ReadPduAsync - loop over consecutive NOP-In PDUs
   (up to 16) before returning the next real command response, which
   keeps the async path correct without changing any public API.
@LTRData
Copy link
Copy Markdown
Owner

LTRData commented Apr 17, 2026

Looks good, I think. Just made a small modification to optimize away allocation of a temporary byte array each time and also using async send methods to not unnecessarily block pool threads.

@LTRData LTRData merged commit d24de85 into LTRData:LTRData.DiscUtils-initial Apr 17, 2026
4 checks passed
@devedse devedse deleted the iscsi-nop-keepalive branch April 18, 2026 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants