IDENTIFYING HALLUCINATION IN RETRIEVAL AUGMENTED GENERATION

Authors

  • Ashish Bansal, USA

Keywords:

Generative AI, Hallucinations, Large Language Models, Retrieval Augmented Generation

Abstract

Large Language Models (LLMs) have been used to build numerous automation and personalized question-answering prototypes that combine the generative strengths of LLMs with dynamic retrieval of external documents to improve the quality and relevance of generated content. However, scaling such prototypes into robust products with minimal hallucinations, i.e., fabricated responses, remains an open challenge, especially in niche, data-heavy domains such as the medical and financial industries. Understanding and diagnosing these hallucinations is crucial: it not only ensures the integrity and reliability of the information these models provide but also extends the utility of AI in critical applications. The ongoing problem of hallucinations has tempered enterprise enthusiasm for generative artificial intelligence (AI). While LLMs can generate rapid, confident, and fluent responses to many user prompts, some of those responses are demonstrably false. Because of the closed nature of these systems, systematically assessing such claims is challenging, and explaining the responses is harder still. In this work, we discuss an approach to detect hallucinations in RAG responses caused by the retrieved information and provide a hallucination metric for these responses. This work enables practitioners to detect hallucinations at runtime and mitigate the affected responses, enhancing the reliability and relevance of the content generated by RAG systems. In particular, we focus on identifying the root cause of hallucinations, scoping our study to the retrieved information and the generated response.
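
The abstract does not spell out the detection method, so the following is only a minimal sketch of one common way to flag retrieval-grounded hallucinations at runtime: score each sentence of the generated answer for entailment against the retrieved passages with an off-the-shelf NLI cross-encoder and treat sentences with weak support as likely hallucinations. The model name (cross-encoder/nli-deberta-v3-base), the 0.5 threshold, the naive sentence splitting, and the helper names are illustrative assumptions, not the authors' implementation.

    # Hedged sketch: sentence-level groundedness scoring for a RAG answer.
    # Assumes Hugging Face `transformers`; the NLI model choice and the 0.5
    # threshold are illustrative, not taken from the paper.
    from transformers import pipeline

    nli = pipeline("text-classification",
                   model="cross-encoder/nli-deberta-v3-base", top_k=None)

    def support_score(passages, sentence):
        """Best entailment probability of `sentence` given any retrieved passage."""
        best = 0.0
        for passage in passages:
            scores = nli({"text": passage, "text_pair": sentence})
            for item in scores:
                if "entail" in item["label"].lower():
                    best = max(best, item["score"])
        return best

    def hallucination_report(passages, answer, threshold=0.5):
        """Flag answer sentences whose support across retrieved passages is weak."""
        # Naive period-based sentence splitting; a proper sentence tokenizer
        # would be preferable in practice.
        sentences = [s.strip() for s in answer.split(".") if s.strip()]
        report = []
        for sentence in sentences:
            score = support_score(passages, sentence)
            report.append({"sentence": sentence,
                           "support": score,
                           "hallucinated": score < threshold})
        return report

    # Example usage
    passages = ["The company reported revenue of 2.1 billion dollars in 2022."]
    answer = ("Revenue was 2.1 billion dollars in 2022. "
              "Profit doubled year over year.")
    for row in hallucination_report(passages, answer):
        print(row)

The per-sentence support score doubles as a simple runtime hallucination metric: averaging it over the answer, or counting flagged sentences, gives a response-level signal that can gate or regenerate weakly grounded outputs.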

Published

2023-08-30

How to Cite

Ashish Bansal. (2023). IDENTIFYING HALLUCINATION IN RETRIEVAL AUGMENTED GENERATION. INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING AND TECHNOLOGY (IJARET), 14(07), 104-109. https://lib-index.com/index.php/IJARET/article/view/IJARET_14_07_007