Add DeBERTa-V3 submission — ConPara + all RAID attacks#110

Merged
liamdugan merged 2 commits into liamdugan:main from MohamedMady19:main
Apr 20, 2026

Conversation

@MohamedMady19

DeBERTa-ConPara-v3

Architecture: DeBERTa-v3-base (184M) + 30 MI-selected linguistic features + Projection Head (768→256→128)
Training: HC3Plus + M4 + MAGE + all 11 RAID attack types + 60K T5 paraphrase pairs (~826K samples, 50/50)
New vs v2: SupCon loss (τ=0.07, λ=0.5) + T5 flan-t5-base online paraphrase augmentation (p=0.15)
Threshold: 0.95
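
For readers unfamiliar with the SupCon term listed above, here is a minimal sketch of a supervised contrastive loss with temperature τ=0.07, added to a classification loss with weight λ=0.5. It uses plain Python lists rather than the tensors a real training loop would use. The function names, the additive form `ce + λ * supcon`, and the choice to L2-normalise the projection-head outputs are assumptions for illustration; the description above does not say how the two terms are combined.

```python
import math

def supcon_loss(embeddings, labels, tau=0.07):
    """Supervised contrastive loss over a small batch (illustrative sketch)."""
    # L2-normalise so dot products are cosine similarities (assumed).
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / tau
            for j in range(n) if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def combined_loss(ce_loss, embeddings, labels, lam=0.5, tau=0.07):
    """Classification loss plus lam times SupCon (the additive form is assumed)."""
    return ce_loss + lam * supcon_loss(embeddings, labels, tau)
```

With a temperature as low as 0.07, a batch whose same-label embeddings point in the same direction yields a loss near zero, while a batch whose same-label embeddings are orthogonal is penalised heavily.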

@MohamedMady19
Author

Hi @liamdugan, the evaluation workflow has been stuck in the "Waiting" state for ~3 days (PR #108 and the previous submission were evaluated within hours). Could you take a look? Happy to resubmit if needed. Thanks!

@liamdugan
Owner

Hi @MohamedMady19, I have the eval pipeline set up to require manual approval before an evaluation runs (to prevent people from abusing the system).

Your previous PR was submitted during work hours, so I was able to get to it immediately. However, this new submission was made at 8:11am on Saturday (my time), so I did not see it until now.

Please be patient when submitting PRs over the weekend 🙂 I will get to them during the week. Thanks

@github-actions

Eval run succeeded! Link to run: link

Here are the results of the submission(s):

DeBERTa-ConPara-v3

Release date: 2026-04-18

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved an AUROC of 96.68 and a TPR of 95.88% at FPR=5% and 87.87% at FPR=1%.
Without adversarial attacks, it achieved an AUROC of 96.90 and a TPR of 96.11% at FPR=5% and 89.07% at FPR=1%.

If all looks well, a maintainer will come by soon to merge this PR and your entry/entries will appear on the leaderboard. If you need to make any changes, feel free to push new commits to this PR. Thanks for submitting to RAID!
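
As a rough illustration of the metrics reported above, the sketch below computes AUROC and the true-positive rate at a fixed false-positive rate from two lists of detector scores. It is not the RAID evaluation code, whose details may differ; the function names and the strictly-greater-than tie handling are assumptions. AUROC is returned on a 0 to 1 scale, whereas the comment above reports it multiplied by 100.

```python
def auroc(human_scores, machine_scores):
    """Probability that a machine text outscores a human text (ties count half)."""
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(human_scores) * len(machine_scores))

def tpr_at_fpr(human_scores, machine_scores, target_fpr):
    """Pick a threshold from human scores, then measure recall on machine text."""
    ranked = sorted(human_scores, reverse=True)
    # Number of human texts we may wrongly flag without exceeding target_fpr.
    allowed = int(target_fpr * len(ranked))
    # Flagging only scores strictly above this value flags at most `allowed` humans.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    flagged = sum(1 for s in machine_scores if s > threshold)
    return threshold, flagged / len(machine_scores)
```

For example, with ten human scores and `target_fpr=0.1`, the threshold is the second-highest human score, so at most one human text is flagged and the returned rate is the share of machine texts scoring above it.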

@MohamedMady19
Author

Thank you so much! Please feel free to merge it to the leaderboard.

@liamdugan merged commit 6b3a2f1 into liamdugan:main on Apr 20, 2026