Sorry to bother you. Due to resource constraints, my GPU can only support llama3_8b_judge at best. I can't run the 70B llama3 model, and the gpt4o API is too expensive for me 😫.
Would it be okay if I use llama3_8b_judge as the M.J., as you mentioned in the comments of your script? Would this lead to a really bad or unbalanced evaluation? I'd really appreciate some help here🥺.
In my scenario, what would you suggest as the best way to evaluate the output scores on datasets like dreamtts or cn-college-listen?
Looking forward to your reply.