Skip to content

fix(c-parser): reject call sites inside function bodies; support nginx-style split-line defs (#331)#352

Open
justrach wants to merge 1 commit intomainfrom
issue-331-c-parser-fix
Open

fix(c-parser): reject call sites inside function bodies; support nginx-style split-line defs (#331)#352
justrach wants to merge 1 commit intomainfrom
issue-331-c-parser-fix

Conversation

@justrach
Copy link
Copy Markdown
Owner

Summary

Fixes #331 — two C parser correctness bugs.

False positives: call sites indexed as functions

fprintf(stderr, ...), curl_easy_perform(curl), and similar call sites inside function bodies were being indexed as function definitions.

Fix: Track c_brace_depth as we scan C/C++ lines:

  • C: reject function candidates at brace_depth > 0 (any depth ≥ 1 is inside a function body)
  • C++: allow brace_depth == 1 (class/struct method bodies at depth 1 are valid definitions); reject at depth ≥ 2

Brace counting uses countBracesDelta() which skips string and char literal contents to avoid false increments from lines like const char *s = "fake() {".

Recall gap: nginx-style split-line definitions

Functions written with the return type on its own line:

ngx_int_t
ngx_http_init_connection(ngx_connection_t *c)
{

had before_name.len == 0 and were silently dropped.

Fix: When before_name is empty and the function name is at column 0 (at_col0), accept the name if the previous non-blank line looks like a type identifier (not a statement, brace, semicolon, or call expression).

New tests

  • issue-331: C parser does not index indented call sites as functions
  • issue-331: C parser finds nginx-style split-line definitions

All 411 existing tests continue to pass (zig build test exits 0).

Test plan

  • zig build test — all 411 tests pass
  • Both new issue-331 tests pass
  • issue-319, issue-321 regression tests still pass

Generated with Devin

…t-line defs

Fixes #331.

Two C parser bugs are addressed:

1. **False positives**: call sites like `fprintf(stderr, ...)` were
   indexed as function definitions when they appeared inside a function
   body.  Fix: track brace depth (`c_brace_depth`) as we parse C/C++
   lines. For C files, reject any function candidate at depth > 0; for
   C++ allow depth 1 (class/struct body). Brace counting uses
   `countBracesDelta` which skips string and char literals, avoiding
   false increments from `const char *s = "fake() {"`.

2. **Recall gap (nginx-style)**: definitions where the return type is on
   the previous line (`ngx_int_t\nngx_http_init_connection(...)`) had
   `before_name.len == 0` and were dropped. Fix: when `before_name` is
   empty and `at_col0` is true, accept if the previous non-blank line
   looks like a type identifier (not a statement or brace).

New tests:
- `issue-331: C parser does not index indented call sites as functions`
- `issue-331: C parser finds nginx-style split-line definitions`

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool Base (ns) Head (ns) Delta Abs Delta (ns) Status
codedb_bundle 510501 521228 +2.10% +10727 OK
codedb_changes 55450 57252 +3.25% +1802 OK
codedb_deps 8593 8788 +2.27% +195 OK
codedb_edit 6869 5928 -13.70% -941 OK
codedb_find 61322 60137 -1.93% -1185 OK
codedb_hot 102200 94618 -7.42% -7582 OK
codedb_outline 291984 284553 -2.55% -7431 OK
codedb_read 92113 87617 -4.88% -4496 OK
codedb_search 180253 172383 -4.37% -7870 OK
codedb_snapshot 252374 260177 +3.09% +7803 OK
codedb_status 217357 214108 -1.49% -3249 OK
codedb_symbol 63867 59307 -7.14% -4560 OK
codedb_tree 75665 66455 -12.17% -9210 OK
codedb_word 69326 67532 -2.59% -1794 OK

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 246e072ea1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/explore.zig
if (extractIdent(std.mem.trimStart(u8, prev_trimmed, " \t*")) == null) return null;
return name;
}
if (!at_col0 and !looksLikeCMethodDef(before_name)) return null;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Allow indented keyword-prefixed C/C++ definitions

Remove or relax this gate: it rejects any non-column-0 definition unless looksLikeCMethodDef passes, and that helper returns false when the first token is a C keyword like static, inline, extern, or const. As a result, indented definitions such as static/inline methods inside C++ class bodies (and indented static C functions) are no longer indexed, which is a regression in symbol recall introduced by this commit.

Useful? React with 👍 / 👎.

Comment thread src/explore.zig
Comment on lines +4368 to +4371
} else if (ch == '{') {
delta += 1;
} else if (ch == '}') {
delta -= 1;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Ignore inline block comments in brace-depth accounting

Brace depth is updated by counting { and } directly on the parsed line, but only string/char literals are skipped; /* ... */ spans are not. This causes braces inside inline block comments (for example /* } */ within a function body) to corrupt c_brace_depth, which can incorrectly return to file scope and reintroduce false-positive function indexing for subsequent call sites.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

C parser still indexes call sites as functions and misses nginx-style definitions

1 participant