QuaCer-C: Quantitative Certification of Knowledge Comprehension in LLMs
QuaCer-C introduces a novel certification framework for Large Language Models (LLMs), addressing the lack of formal guarantees in traditional performance evaluations. This work provides quantitative certificates that offer high-confidence, tight bounds on the probability of correct answers for knowledge comprehension prompts.
Key contributions of this research include:
- Formal specification of the knowledge comprehension property using popular knowledge graphs like Wikidata5m.
- Modeling certification as a probability estimation problem, leveraging Clopper-Pearson confidence intervals for provable, high-confidence bounds.
- Generation of certificates for popular LLMs including Llama 7B and 13B, Vicuna 7B and 13B, and Mistral-7B.
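The Clopper-Pearson interval mentioned above gives exact (rather than approximate) high-confidence bounds on a binomial proportion, which here is the probability that the LLM answers a sampled knowledge-comprehension prompt correctly. A minimal sketch of how such a bound can be computed, assuming SciPy is available and using illustrative counts (80 correct answers out of 100 sampled prompts) rather than figures from the paper:

```python
from scipy.stats import beta


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided Clopper-Pearson confidence interval for a binomial
    proportion, given k successes out of n independent trials.

    Returns (lower, upper) such that the true success probability lies
    within the interval with confidence at least 1 - alpha."""
    # Edge cases: the beta quantile is undefined when k == 0 or k == n.
    lower = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n - k + 1)
    upper = 1.0 if k == n else beta.ppf(1 - alpha / 2, k + 1, n - k)
    return lower, upper


# Hypothetical example: the model answers 80 of 100 sampled
# knowledge-comprehension prompts correctly; bound its true accuracy
# with 95% confidence.
lo, hi = clopper_pearson(80, 100, alpha=0.05)
```

Because the interval is exact, the coverage guarantee holds for any sample size, which is what makes the resulting certificate provable rather than merely empirical.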
Our findings indicate that knowledge comprehension capability improves as model parameter count grows. Across model families, Mistral performs worse than Llama and Vicuna on this evaluation.
This work contributes to the development of more reliable and verifiable AI systems, crucial for applications requiring trusted knowledge processing and comprehension.