DEVELOPING HIGH-PERFORMANCE COMPUTING ALGORITHMS FOR LARGE-SCALE DATA ANALYSIS

Venkata Sai Swaroop Reddy; Nallapa Reddy

Authors

Venkata Sai Swaroop Reddy, Senior Software Engineer, ViaSat Inc, USA
Nallapa Reddy, Senior Software Engineer, ViaSat Inc, USA

Keywords:

HPC System, Large-Scale Data Analysis, Message Passing Interface

Abstract

Designing high-performance computing (HPC) systems for big data analytics requires careful attention to data storage, processing efficiency, and the capacity to handle the massive volumes of data that big data applications generate. Typical HPC hardware consists of high-end servers with powerful CPUs, large amounts of RAM, and fast storage devices such as SSDs or HDDs arranged in a distributed or parallel fashion. Some types of data processing can be accelerated further with specialised hardware such as graphics processing units (GPUs) or field-programmable gate arrays (FPGAs). The software of an HPC system comprises the operating system, middleware, and applications. The operating system must be highly scalable and impose little overhead to handle HPC workloads. Middleware such as MPI (Message Passing Interface) can be used to let the nodes of a distributed computing system communicate with one another. Applications should use efficient algorithms and optimised data structures to exploit the parallel and distributed processing capabilities of the HPC system. The research applied such algorithms to the analysis of diverse sets of medical and legal records and resolved performance problems in graph calculations, clustering, and classification. Using CUDA yielded speedups of more than 95 times. The analysis of electronic records depends on high-performance technologies to cope with the massive amounts of data held in information systems. The study demonstrates how to accelerate computations, using the most common and fundamental machine learning tasks as examples.

