VetLLM Leaderboard

VetLLM evaluates LLMs with veterinary and medical programming tasks.

Figure: Average Quality Score by model (bar chart, axis 0 to 5). Mistral-7B-Instruct-v0.2: 3.17; Llama-3.1-8B-Instruct: 3.07; openchat-3.5-1210: 2.97; vetllm-mistral-7b-merged-pmc2: 2.9; gemma-7b: 2.41.
#  Model                                                                      Pass@1
1  Mistral-7B-Instruct-v0.2: Instruction-tuned Mistral 7B model               3.17
2  Llama-3.1-8B-Instruct: Instruction-tuned LLaMA 3.1 (8B)                    3.07
3  openchat-3.5-1210: Strong 7B conversational model                          2.97
4  vetllm-mistral-7b-merged-pmc2: Fine-tuned Mistral 7B for veterinary tasks  2.9
5  gemma-7b: Official 7B model from Google's Gemma family                     2.41
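
The ranking above can be reproduced from the listed scores. The following is a minimal sketch, not part of VetLLM itself: the names `SCORES` and `rank_models` are hypothetical, and the pairing of model names with scores is assumed from the order in which they appear in the chart.

```python
# Scores copied from the leaderboard. The name-to-score pairing is an
# assumption based on the order of the chart labels and values.
SCORES = {
    "Mistral-7B-Instruct-v0.2": 3.17,
    "Llama-3.1-8B-Instruct": 3.07,
    "openchat-3.5-1210": 2.97,
    "vetllm-mistral-7b-merged-pmc2": 2.9,
    "gemma-7b": 2.41,
}


def rank_models(scores):
    """Return (rank, model, score) tuples sorted by score, highest first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, model, score)
        for rank, (model, score) in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    # Print one leaderboard row per model.
    for rank, model, score in rank_models(SCORES):
        print(f"{rank}  {model:<32}{score}")
```

Sorting on the score alone is enough here because no two models share a score; a tie-breaking rule would be needed otherwise.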