How do you evaluate the performance of your LLM? How do you even know if your LLM is better than another one, or if your new prompt is better than the previous one? Evaluation of regular ML models is easy compared to LLMs!

1️⃣ If there is a correct answer to your problem, then you can just use a metric as done in regular ML (i.e. accuracy).

2️⃣ If there isn't a correct answer but you've got a reference answer, meaning an example of what a good answer looks like, then you can use reference-matching metrics like semantic similarity.

It turns out that a great way to evaluate the performance of LLMs is by actually using other LLMs :)
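As a rough sketch of the two cases, the snippet below uses exact-match accuracy for the "correct answer" case and plain string overlap as a crude stand-in for semantic similarity (a real setup would compare embeddings of the prediction and the reference; the function names here are just illustrative):

```python
from difflib import SequenceMatcher

def exact_match_accuracy(predictions, answers):
    """Case 1: there is a single correct answer — score like regular ML."""
    matches = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predictions, answers)
    )
    return matches / len(answers)

def reference_similarity(prediction, reference):
    """Case 2: no single correct answer, only a reference answer.

    String-overlap ratio is a cheap proxy here; embedding-based
    cosine similarity would capture meaning, not just wording.
    """
    return SequenceMatcher(None, prediction.lower(), reference.lower()).ratio()

accuracy = exact_match_accuracy(["Paris", "London"], ["paris", "Berlin"])
similarity = reference_similarity("The cat sat on the mat", "The cat sat on the mat")
```

The third option, LLM-as-judge, simply replaces these functions with a prompt asking another model to grade the answer.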