Skip to content

fix: resolve intermediary variable detection in Java and Python engines#390

Open
Ayush-Patel-56 wants to merge 2 commits intocbomkit:mainfrom
Ayush-Patel-56:fix/issue-8-tracing
Open

fix: resolve intermediary variable detection in Java and Python engines#390
Ayush-Patel-56 wants to merge 2 commits intocbomkit:mainfrom
Ayush-Patel-56:fix/issue-8-tracing

Conversation

@Ayush-Patel-56
Copy link
Copy Markdown

Description

Resolves #8 by implementing recursive symbol tracing in the Java and Python detection engines.

The engine previously used shallow name-based matching for variables, which caused cryptographic rules to fail when intermediary variables were introduced (e.g., SeatInterface intermediary = s; Car c = new Car(intermediary);). This PR introduces a tracing mechanism that resolves variables back to their original initialization point.

Changes

  • Recursive Tracing: Added traceSymbol to follow variable assignments through the AST.
  • Deep Equivalence: Implemented areSymbolsEquivalent to compare symbols after tracing them to their source declaration.
  • Engine Updates: Modified isInvocationOnVariable and isInitForVariable in both JavaDetectionEngine and PythonDetectionEngine to use deep symbol resolution.
  • Regression Testing: Added Issue8IntermediaryVariableTest covering nested rule detection across assignment levels.

Verification

  • Unit Tests: All 157 existing tests passed.
  • Manual Verification: Confirmed via DetectionStoreLogger that the engine correctly associates children nodes through intermediary variables.
  • Trace Proof: Logs confirm successful symbol resolution: Comparing symbols: s (traced to s) with intermediary (traced to s).

See attached screenshot for the successful test execution and detection hierarchy.
Screenshot 2026-05-04 103511

Signed-off-by: Ayush Patel <ayushpatel2731@gmail.com>
Copy link
Copy Markdown
Contributor

@n1ckl0sk0rtge n1ckl0sk0rtge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ayush-Patel-56 Thanks for tackling this ❤️
The diagnosis is correct and the approach (replace name-equality with a recursive symbol trace) is reasonable. A few items below; flagging the bigger ones as required and one as a follow-up.

Required before merge

Correctness

  • Python traceSymbol — first-usage-wins is non-deterministic. The loop returns on the first ASSIGNMENT_LHS encountered while iterating symbol.usages(). For a variable reassigned more than once (x = y; x = z; use(x)), which RHS is followed depends on iteration order rather than which assignment reaches the call site. The original name comparison was wrong but predictable; this is wrong and non-deterministic. At minimum, document the limitation in a comment; ideally pick the assignment lexically nearest the use site.
  • C# engine not updated. CSharpDetectionEngine has the same shallow name().equals() pattern. If C# is out of scope intentionally, say so in the PR description — otherwise apply the same fix there.

Code quality

  • Java duplicates an existing helper. JavaDetectionEngine.java:923 already has traceVariable(IdentifierTree) doing the same chain-of-initializers walk on IdentifierTree. The new traceSymbol(Symbol) is a near-clone. Refactor so there's one implementation (e.g., have traceVariable delegate to traceSymbol), rather than shipping two parallel walkers.
  • Stray whitespace in PythonDetectionEngine.java:21. Blank line was changed to a line with a single space — unrelated to the fix. mvn spotless:apply should drop it.
  • @Rule(key = "Issue8") on the test class isn't the convention here (see Issue16Test, Issue214Test, etc.). Please remove.
  • Trailing-whitespace-only lines in Issue8IntermediaryVariableTest.java. Spotless will clean these up.
  • Unused HeatedSeats rule. seatRules declares a rule for HeatedSeats but the test file never instantiates it. Drop the dead rule or add a HeatedSeats case.

Test coverage

The single new test covers the simplest case (intermediary = s as a constructor argument). For a change that introduces recursive resolution into both engines, please add:

  • Chain length > 1 (a = s; b = a; new Car(b)).
  • Method-invocation receiver path (isInvocationOnVariable) — currently only isInitForVariable via new Car(...) is exercised.
  • At least one Python regression test. The Python engine path has no test coverage in this PR, and Python's usages()-based trace differs structurally from Java's declaration()-based one — they need independent verification.

Optional in this PR — please file a follow-up

  • Recursion guard. Both traceSymbol methods recurse without a visited set or depth bound. Java is hard to construct a cycle in (compiler usually catches it); Python's a = b; b = a would loop forever, since the existing !rhsSymbol.equals(symbol) check only blocks the direct self-cycle. Real-world Python rarely produces this, so it's fine to defer — but please open a follow-up issue so it isn't lost. A small Set<Symbol> visited or a depth cap will close it cheaply.

Signed-off-by: Ayush Patel <ayushpatel2731@gmail.com>
@Ayush-Patel-56
Copy link
Copy Markdown
Author

Addressed in the latest commit:

  • Added chain-length > 1 test case (b → a → s) to the existing intermediary variable test
  • Added Issue8MethodReceiverTest to cover the isInvocationOnVariable path (s.describe() where s traces through an intermediary)
  • Added Python Issue8IntermediaryVariableTest covering Cipher(intermediary, ...) resolving through a variable chain
    Cleaned up the existing test - removed @Rule, HeatedSeats, and refactored traceVariable to delegate to traceSymbol

Will open a separate issue for the recursion guard. C# is intentionally out of scope as it doesn't have a traceSymbol equivalent yet.

@Ayush-Patel-56
Copy link
Copy Markdown
Author

Issue created: #400

@Ayush-Patel-56
Copy link
Copy Markdown
Author

Ayush-Patel-56 commented May 6, 2026

Opened #401 to add a recursion guard to traceSymbol for the mutual assignment cycle case (a = b; b = a) that scenario is not covered by this PR but is a natural edge case of the same tracing logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Problem with handling detections of sub-rules

2 participants