Over the past eight months, the COVID-19 pandemic has raised awareness of the importance of the biotechnology sector and the pivotal role it plays in ushering in new discoveries in disease research, treatments, and cures. While biotechnology's importance to both the global economy and human health has never been more apparent, one of the most fascinating elements has been the rate at which novel discoveries are being developed. The traditional biotech research and development (R&D) process has long been marred by bottlenecks and redundancies spanning data collection, recording, analysis, and extrapolation, as well as wet-lab benchwork. Over the past decade, however, a powerful yet subtle shift has occurred in how scientific data is harnessed, processed, and translated into functional insights. In healthcare and the life sciences, an enormous amount of data has been generated by high-throughput technologies in healthcare institutions and research labs worldwide. Yet of these vast troves, only a small fraction has been cleaned, labeled, and structured to generate the clinical and scientific insights needed to improve current standards of care and drive novel discoveries.
Traditional data analysis methods in biotechnology have been relatively primitive, typically compatible only with simple, homogeneous data; they begin to fail when the data become multivariate and heterogeneous. Electronic health records (EHRs), for example, often combine many variables per patient, such as diagnoses and comorbidities.
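To make that mismatch concrete, the minimal Python/pandas sketch below (the column names, codes, and values are purely hypothetical, not a real EHR schema) shows how a single EHR-style table can mix numeric, categorical, list-valued, and free-text fields, and how a naive numeric summary quietly ignores most of them.

```python
import pandas as pd

# Hypothetical toy EHR extract: each row is one patient encounter.
# All column names and values are illustrative, not a real schema.
records = pd.DataFrame({
    "patient_id": ["P001", "P002", "P003"],
    "age": [62, 47, 71],                                   # numeric
    "primary_diagnosis": ["I10", "E11.9", "I10"],          # categorical codes
    "comorbidities": [["CKD", "obesity"], [], ["COPD"]],   # variable-length lists
    "note": [                                              # unstructured free text
        "BP elevated at follow-up",
        "HbA1c improving on metformin",
        "Dyspnea on exertion reported",
    ],
})

# A conventional summary statistic only "sees" the one numeric column;
# the categorical, list-valued, and free-text fields are silently ignored.
print(records.mean(numeric_only=True))

# Even the categorical column needs an explicit encoding step (here,
# one-hot encoding) before classical numeric methods can use it.
encoded = pd.get_dummies(records["primary_diagnosis"], prefix="dx")
print(encoded)

# The list-valued and free-text columns demand still more preprocessing
# (multi-label encoding, NLP), which is where simple pipelines break down.
```

The point of the sketch is not the specific encoding chosen but the gap it exposes: each additional data modality requires its own preprocessing machinery before conventional analysis can even begin.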