TRACING DATA LINEAGE WITH GENERATIVE AI: IMPROVING DATA TRANSPARENCY AND COMPLIANCE

Author

Sudeesh Goriparthi, Senior Software Engineer, Software Architecture, Walmart, Dallas, USA

Keywords:

Artificial Intelligence (AI), Data Volume, Quality, Privacy, Security, Transparency

Abstract

Artificial intelligence (AI) is spreading rapidly across many industries, including healthcare, banking, and transportation. Because AI is built on the analysis of massive datasets, implementing it requires a consistent flow of high-quality data. Using data for AI nevertheless raises problems. This study assesses the challenges of data usage for AI, covering data volume and quality, privacy and security, fairness, interpretability and explainability, technical expertise, and ethical concerns. AI applications depend heavily on data, which makes data quality and data privacy especially important, yet several obstacles prevent AI systems from making good use of data. This abstract examines the primary data quality and data privacy issues in AI applications and presents ways to address them. On the quality side, erroneous or insufficient data can produce misleading AI outcomes, biased data can perpetuate biased outcomes, and contradictory or inconsistent data can undermine the reliability of AI systems. Strategies such as data cleansing, data augmentation, and bias reduction algorithms can help overcome these difficulties. On the privacy side, the central concern is the protection of personal information: failing to protect it adequately can lead to breaches, legal repercussions, and ethical dilemmas. Techniques such as anonymization and de-identification can protect sensitive data, and secure storage and communication mechanisms, such as encryption, offer further protection.
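
As a rough illustration of the de-identification step mentioned above, the following Python sketch drops direct identifiers from a record and replaces a linkable ID with a keyed hash. It is a minimal sketch under stated assumptions, not the method used in this study: the field names (`name`, `email`, `patient_id`) and the helper functions are hypothetical, and keyed hashing is pseudonymization, which on its own may not satisfy a legal definition of anonymization.

```python
import hashlib
import hmac

# Hypothetical field names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "email"}
PSEUDONYMIZED_FIELDS = {"patient_id"}


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest so the raw value is not stored."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and replace linkable IDs with pseudonyms."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # remove the field entirely
        if field in PSEUDONYMIZED_FIELDS:
            cleaned[field] = pseudonymize(str(value), secret_key)
        else:
            cleaned[field] = value  # keep non-identifying fields unchanged
    return cleaned
```

The same key yields the same pseudonym for the same input, so records can still be joined across datasets without exposing the original identifier; the key itself would need to be stored securely, for example alongside the encryption keys the abstract refers to.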
