iSCSI: Add NOP-Out keepalive and handle target-initiated NOP-In#64
Merged
LTRData merged 2 commits intoLTRData:LTRData.DiscUtils-initialfrom Apr 17, 2026
Merged
Conversation
During idle periods an iSCSI target can close the session due to the DefaultTime2Retain/DefaultTime2Wait timeout expiring while no traffic is flowing. This causes the next I/O to fail with a broken connection rather than a clean retry. Changes in Connection.cs: 1. SemaphoreSlim (_streamSemaphore) - serialises all PDU traffic on the NetworkStream so the keepalive timer callback and the main Send / SendAsync / Logout paths can never interleave writes or reads on the same socket. 2. Keepalive timer (_keepAliveTimer) - fires every 10 seconds after a successful login and calls KeepAliveCallback. The callback skips the tick if the semaphore is already held (a real command is in flight), so it adds zero latency to normal operation. 3. SendNopOut - issues an initiator-initiated NOP-Out PDU with ITT=0xFFFFFFFF and TTT=0xFFFFFFFF (RFC 3720 §10.18). With these reserved values the target MUST NOT send a NOP-In response, so there is no read-side traffic to worry about. 4. HandleNopIn - when the target itself initiates a NOP-In (TTT != 0xFFFFFFFF) the initiator is required to respond with a NOP-Out that echoes the TTT (RFC 3720 §10.19). Previously DiscUtils had no handler for this opcode, so a target-initiated ping would surface as an unrecognised PDU error. The StatSN carried by a target-initiated NOP-In is informational and does NOT consume a sequence number, so SeenStatusSequenceNumber() is deliberately NOT called. 5. ReadPdu / ReadPduAsync - loop over consecutive NOP-In PDUs (up to 16) before returning the next real command response, which keeps the async path correct without changing any public API.
Owner
|
Looks good, I think. Just made a small modification to optimize away allocation of a temporary byte array each time and also using async send methods to not unnecessarily block pool threads. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
During idle periods (no active I/O on the iSCSI session) the target enforces its DefaultTime2Retain/DefaultTime2Wait timeout and will silently drop the session. The next SCSI command then fails with a broken-pipe / EOF error instead of a clean retry, causing the caller to see an unexpected I/O error.
A second, related issue is that DiscUtils had no handler for target-initiated NOP-In PDUs at all. A target can send a NOP-In at any time to probe the initiator, and the RFC requires the initiator to echo it back with a NOP-Out. Without this handler the PDU was returned to the caller as an unrecognised opcode, corrupting the command/response stream.
Fix – changes in Connection.cs
1 – SemaphoreSlim _streamSemaphore
All traffic that touches the underlying NetworkStream (writes in Send, SendAsync, Logout, and the new keepalive callback; reads in ReadPdu / ReadPduAsync) is now serialised by a SemaphoreSlim(1,1). This prevents the timer callback from interleaving with an in-flight command.
2 – Timer _keepAliveTimer + KeepAliveCallback
Started immediately after the login negotiation completes. It fires every 10 seconds and sends a NOP-Out ping. If the semaphore is already held (a real command is in progress) the tick is silently skipped — it simply waits for the next interval. Any IOException or ObjectDisposedException during the ping is swallowed; the broken connection will be surfaced naturally on the next real command.
3 – SendNopOut
Builds and sends an initiator-initiated NOP-Out with ITT = 0xFFFFFFFF and TTT = 0xFFFFFFFF (RFC 3720 §10.18). With these reserved values the target must not send a NOP-In response, so there is no extra PDU to read — the keepalive is purely write-side.
4 – HandleNopIn
Responds to a target-initiated NOP-In (TTT ≠ 0xFFFFFFFF) by sending a NOP-Out that echoes the TTT (RFC 3720 §10.19). Critically, SeenStatusSequenceNumber() is not called here: the RFC states that the StatSN carried in a target-initiated NOP-In is informational and does not consume a sequence number — calling it would cause the next real response to fail with a StatSN mismatch.
5 – ReadPdu / ReadPduAsync – NOP-In loop
Both sync and async read paths now loop over consecutive NOP-In PDUs (max 16) before returning the next real response PDU, ensuring neither path is disrupted by a target ping arriving between command and response.
RFC references