-
Notifications
You must be signed in to change notification settings - Fork 140
soundwire: cadence: Fixes for message completion interrupt race #5659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: topic/sof-dev
Are you sure you want to change the base?
soundwire: cadence: Fixes for message completion interrupt race #5659
Conversation
|
@bardliao @shumingfan Do you have soak/stress tests you could run on Realtek so we can get more confidence this doesn't cause any new problems? |
|
@rfvirgil There are some
That means Device 0 is attached. Not sure if it is a real issue. |
| CDNS_MCP_INT_SLAVE_MASK, 0); | ||
|
|
||
| int_status &= ~CDNS_MCP_INT_SLAVE_MASK; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rfvirgil @shumingfan Is it possible that a Peripheral can generate a new event while handling a special event and lead to an infinite loop in a corner case?
30adad5 to
03f9ba5
Compare
…thread
Clear the CDNS_MCP_INT_RX_WL interrupt before signalling completion.
This is to prevent the potential race where:
- The main thread is scheduled immediately the completion is signalled,
and starts a new message
- The RX_WL IRQ for this new message happens before sdw_cdns_irq() has
been re-scheduled.
- When sdw_cdns_irq() is re-scheduled it clears the new RX_WL interrupt.
MAIN THREAD | IRQ THREAD
|
_cdns_xfer_msg() |
{ |
write data to FIFO |
wait_for_completion_timeout() |
<BLOCKED> | <---- RX_WL IRQ
| sdw_cdns_irq()
| {
| signal completion
<== RESCHEDULE <==
Handle message completion |
} |
|
Start new message |
_cdns_xfer_msg() |
{ |
write data to FIFO |
wait_for_completion_timeout() |
<BLOCKED> | <---- RX_WL IRQ
==> RESCHEDULE ==>
| // New RX_WL IRQ is cleared before
| // it has been handled.
| clear CDNS_MCP_INTSTAT
| return IRQ_HANDLED;
| }
Before this change, this error message was sometimes seen on kernels
that have large amounts of debugging enabled:
SCP Msg trf timed out
This error indicates that the completion has not been signalled after
500ms.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 956baa1 ("soundwire: cdns: Add sdw_master_ops and IO transfer support")
03f9ba5 to
015497c
Compare
|
I've reduced this pull request to only the one patch to fix the race between interrupt thread handling RX_WL IRQ, and main thread starting a new message. |
This series is to fix three places in the cadence IRQ handling where interrupt evens could be lost.Two are variations of the same problem that the code cleared interrupts AFTER it handled them. This is not the correct way, because any interrupt that re-triggered while the irq handler function is running would then be cleared at the end of the function without ever having been handled.
The other case is to avoid masking the PING status change interrupt while waiting for the work function to handle it. There is a potential problem here if state changes that happen while masked are lost,New version, this pull is now only to fix the potential race caused by clearing the RX_WL interrupt AFTER signaling the completion to the main thread. The main thread could be scheduled immediately and start a new message which raises an IRQ that is then cleared by the IRQ thread before it is handled.