Update EmTest #1463

Open
MohsenTaheriShalmani wants to merge 2 commits into master from aiClassificationTest

Conversation

@MohsenTaheriShalmani
Contributor

Update thresholds and AIClassificationEMTestBase based on the last set of experiments.

  @Cfg("If using THRESHOLD for AI Classification Repair, specify its value." +
  " All classifications with probability equal or above such threshold value will be accepted.")
- var classificationRepairThreshold = 0.8
+ var classificationRepairThreshold = 0.5
Collaborator

Are these changes based on the latest experiments?

Collaborator

Ah, I see you wrote it in the description of this PR. :)
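
To make the effect of the threshold change concrete, here is a minimal Kotlin sketch of the acceptance rule stated in the `@Cfg` description ("probability equal or above such threshold value will be accepted"). The helper `isAccepted` and the sample probabilities are hypothetical and are not taken from the project's code; only the two threshold values (0.8 and 0.5) come from the diff.

```kotlin
// Hypothetical helper, not the project's code: a classification is accepted
// when its probability is equal to or above the configured threshold.
fun isAccepted(probability: Double, threshold: Double): Boolean =
    probability >= threshold

fun main() {
    // Made-up probabilities, used only to illustrate the rule.
    val probabilities = listOf(0.45, 0.5, 0.65, 0.8, 0.95)

    // Old value (0.8) versus new value (0.5) of classificationRepairThreshold.
    for (threshold in listOf(0.8, 0.5)) {
        val accepted = probabilities.filter { isAccepted(it, threshold) }
        println("threshold=$threshold accepted=$accepted")
    }
}
```

Under this reading, lowering the threshold from 0.8 to 0.5 can only widen the set of accepted classifications, since every probability that passes the higher bar also passes the lower one.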

  @Cfg("Minimum confidence threshold required for the AI response classifier to decide" +
  "whether to send a request as-is or attempt a repair.")
- var aIResponseClassifierWeaknessThreshold = 0.4
+ var aIResponseClassifierWeaknessThreshold = 0.8
Collaborator

Are these changes based on the latest experiments?


for(ok in ok2xx){

if (isWeakClassifier(model, ok, weaknessThreshold)) continue
Collaborator

I'm unsure about this; we will need to discuss. For example, if the model is always weak, would that mean this test always passes? That would defeat the point of having an E2E. Or is Gaussian not able to reliably solve these simple APIs in these E2Es?
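
One possible way to address this concern, sketched here only as a discussion aid, is to count how many requests are actually evaluated and fail the test when that count is too low. Everything in the sketch is an assumption: `Outcome`, `isWeak`, `verifyOk2xx`, and `minEvaluated` are hypothetical names, and "weak" is assumed to mean "confidence below the threshold", which may not match what `isWeakClassifier` does in the PR.

```kotlin
// Hypothetical stand-in for a classifier result; not the project's model type.
data class Outcome(val confidence: Double, val predictedOk: Boolean)

// Assumed meaning of "weak": the confidence is below the weakness threshold.
fun isWeak(outcome: Outcome, weaknessThreshold: Double): Boolean =
    outcome.confidence < weaknessThreshold

/**
 * Checks every non-weak outcome for a known 2xx request, and also requires
 * that at least [minEvaluated] outcomes were actually checked, so the loop
 * cannot pass simply by skipping everything.
 * Returns the number of outcomes that were evaluated.
 */
fun verifyOk2xx(
    outcomes: List<Outcome>,
    weaknessThreshold: Double,
    minEvaluated: Int = 1
): Int {
    var evaluated = 0
    for (ok in outcomes) {
        if (isWeak(ok, weaknessThreshold)) continue
        check(ok.predictedOk) { "Confident classifier rejected a known 2xx request" }
        evaluated++
    }
    check(evaluated >= minEvaluated) {
        "Only $evaluated of ${outcomes.size} requests were evaluated; classifier too weak"
    }
    return evaluated
}
```

With a guard of this kind, a model that is always weak makes the test fail instead of passing vacuously. Whether a fixed minimum count or a minimum ratio is the right bar would depend on how reliably the model handles these E2E APIs.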
