MITIGATING ORDER SENSITIVITY IN LARGE LANGUAGE MODELS FOR MULTIPLE-CHOICE QUESTION TASKS

Vidyasagar Parlapalli; Balaji Shesharao Ingole; Manjunatha Sughaturu Krishnappa; Vishnu Ramineni

Authors

Vidyasagar Parlapalli, IEEE Senior Member, Georgia, USA
Balaji Shesharao Ingole, IEEE Senior Member, Georgia, USA
Manjunatha Sughaturu Krishnappa, Oracle America Inc, California, USA
Vishnu Ramineni, IEEE Senior Member, Texas, USA

Keywords:

Large Language Models (LLMs), Order Dependency, Multiple-Choice Questions (MCQs), Positional Bias, Selection Bias, Zero-shot Learning

Abstract

Large Language Models (LLMs) have demonstrated remarkable proficiency in understanding and generating human language, driving advances across natural language processing (NLP) tasks. Despite this success, LLMs face a persistent challenge with multiple-choice questions (MCQs): order dependency. Their accuracy varies with the order in which answer options are presented, which causes performance to fluctuate. This paper investigates that sensitivity by systematically altering the order of answer options across multiple datasets and observing model performance. The experiments span 57 diverse domains and several LLMs, including Mistral, LLaMA, and GPT variants. Our findings show that LLMs, regardless of architecture, are sensitive to the order in which answers are presented, demonstrating a clear susceptibility to selection and positional bias in MCQ scenarios. To address this challenge, we propose combining zero-shot and few-shot learning techniques with fine-tuning on domain-specific datasets to mitigate selection bias and positional dependency. We also introduce custom prompting techniques built on a question dissector logic designed to enhance the reasoning capabilities of LLMs. With these methods, we aim to minimize bias and improve the consistency of LLM responses across different permutations of answer choices.
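The permutation experiment summarized in the abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt layout, the `order_sensitivity` helper, and the `always_first` stand-in model are all hypothetical, and a real study would replace the stand-in with calls to an actual LLM.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple

LABELS = "ABCD"  # assumed option labels; supports up to four options


def format_prompt(question: str, options: Sequence[str]) -> str:
    """Render a question and its options in one assumed MCQ prompt layout."""
    lines = [question]
    lines += [f"{LABELS[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer:")
    return "\n".join(lines)


def order_sensitivity(
    question: str,
    options: Sequence[str],
    correct: str,
    model: Callable[[str], str],
) -> Tuple[float, bool]:
    """Query the model once per ordering of the options.

    Returns the accuracy over all orderings and a flag that is True only
    if the model selected the same option text under every ordering.
    """
    picked = []
    for perm in permutations(options):
        label = model(format_prompt(question, perm))
        picked.append(perm[LABELS.index(label)])  # map label back to option text
    accuracy = sum(p == correct for p in picked) / len(picked)
    consistent = len(set(picked)) == 1
    return accuracy, consistent


def always_first(prompt: str) -> str:
    """Hypothetical stand-in for an LLM with extreme positional bias."""
    return "A"


acc, consistent = order_sensitivity(
    "What is 2 + 2?", ["3", "4", "5", "22"], "4", always_first
)
print(acc, consistent)  # 0.25 False
```

A model that always answers "A" is correct only in the orderings where the right option happens to come first, so its accuracy falls to chance and its choices are inconsistent across permutations. An order-robust model would return the same option text for every ordering.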
