fix: resolve intermediary variable detection in Java and Python engines#390
fix: resolve intermediary variable detection in Java and Python engines#390Ayush-Patel-56 wants to merge 2 commits intocbomkit:mainfrom
Conversation
Signed-off-by: Ayush Patel <ayushpatel2731@gmail.com>
There was a problem hiding this comment.
@Ayush-Patel-56 Thanks for tackling this ❤️
The diagnosis is correct and the approach (replace name-equality with a recursive symbol trace) is reasonable. A few items below; flagging the bigger ones as required and one as a follow-up.
Required before merge
Correctness
- Python
traceSymbol— first-usage-wins is non-deterministic. The loop returns on the firstASSIGNMENT_LHSencountered while iteratingsymbol.usages(). For a variable reassigned more than once (x = y; x = z; use(x)), which RHS is followed depends on iteration order rather than which assignment reaches the call site. The original name comparison was wrong but predictable; this is wrong and non-deterministic. At minimum, document the limitation in a comment; ideally pick the assignment lexically nearest the use site. - C# engine not updated.
CSharpDetectionEnginehas the same shallowname().equals()pattern. If C# is out of scope intentionally, say so in the PR description — otherwise apply the same fix there.
Code quality
- Java duplicates an existing helper.
JavaDetectionEngine.java:923already hastraceVariable(IdentifierTree)doing the same chain-of-initializers walk onIdentifierTree. The newtraceSymbol(Symbol)is a near-clone. Refactor so there's one implementation (e.g., havetraceVariabledelegate totraceSymbol), rather than shipping two parallel walkers. - Stray whitespace in
PythonDetectionEngine.java:21. Blank line was changed to a line with a single space — unrelated to the fix.mvn spotless:applyshould drop it. @Rule(key = "Issue8")on the test class isn't the convention here (seeIssue16Test,Issue214Test, etc.). Please remove.- Trailing-whitespace-only lines in
Issue8IntermediaryVariableTest.java. Spotless will clean these up. - Unused
HeatedSeatsrule.seatRulesdeclares a rule forHeatedSeatsbut the test file never instantiates it. Drop the dead rule or add aHeatedSeatscase.
Test coverage
The single new test covers the simplest case (intermediary = s as a constructor argument). For a change that introduces recursive resolution into both engines, please add:
- Chain length > 1 (
a = s; b = a; new Car(b)). - Method-invocation receiver path (
isInvocationOnVariable) — currently onlyisInitForVariablevianew Car(...)is exercised. - At least one Python regression test. The Python engine path has no test coverage in this PR, and Python's
usages()-based trace differs structurally from Java'sdeclaration()-based one — they need independent verification.
Optional in this PR — please file a follow-up
- Recursion guard. Both
traceSymbolmethods recurse without a visited set or depth bound. Java is hard to construct a cycle in (compiler usually catches it); Python'sa = b; b = awould loop forever, since the existing!rhsSymbol.equals(symbol)check only blocks the direct self-cycle. Real-world Python rarely produces this, so it's fine to defer — but please open a follow-up issue so it isn't lost. A smallSet<Symbol> visitedor a depth cap will close it cheaply.
Signed-off-by: Ayush Patel <ayushpatel2731@gmail.com>
|
Addressed in the latest commit:
Will open a separate issue for the recursion guard. C# is intentionally out of scope as it doesn't have a |
|
Issue created: #400 |
|
Opened #401 to add a recursion guard to |
Description
Resolves #8 by implementing recursive symbol tracing in the Java and Python detection engines.
The engine previously used shallow name-based matching for variables, which caused cryptographic rules to fail when intermediary variables were introduced (e.g.,
SeatInterface intermediary = s; Car c = new Car(intermediary);). This PR introduces a tracing mechanism that resolves variables back to their original initialization point.Changes
traceSymbolto follow variable assignments through the AST.areSymbolsEquivalentto compare symbols after tracing them to their source declaration.isInvocationOnVariableandisInitForVariablein bothJavaDetectionEngineandPythonDetectionEngineto use deep symbol resolution.Issue8IntermediaryVariableTestcovering nested rule detection across assignment levels.Verification
DetectionStoreLoggerthat the engine correctly associates children nodes through intermediary variables.Comparing symbols: s (traced to s) with intermediary (traced to s).See attached screenshot for the successful test execution and detection hierarchy.
