Human Omics-data Analytics
Unlocking What Lies in Genome and Omics-Data
Summary:

The Prism-Vote (PV) system tackles the challenges of real-world population heterogeneity by providing individualized predictions based on nuanced genetic propensities. The system is compatible with baseline prediction models.

Beyond genome data, the PV system applies to exome, methylation, and other omics data, with application potential in clinical trials, treatment effect modeling, and biomarker discovery. It has also been used in commercial risk screening projects.

The science team of BethBio is experienced in genomics research and has developed numerous biostatistics tools that can assist clients in genome and biomarker association studies.

The Challenge of a World of Heterogeneous Populations

Harnessing the human genome and omics data to develop therapeutics, assess risks, and identify biomarkers is one of the most exciting opportunities in modern science. However, one of the greatest challenges is the diverse and complex genome profiles of real-world populations. For example, a set of risk genes found through studies in one ethnicity may not hold the same significance in other populations, as is evident in diseases such as Alzheimer’s. Notably, most studies and current knowledge of gene associations are still predominantly centered on populations of European ancestry, despite continuous efforts by scientists to improve representation across populations.

While risk factors and effect sizes for a disease or treatment can be identified more clearly within a single population, subjects in real-world application settings often form a multi-population cohort. Even among subjects of the same population, there can still be significant heterogeneity due to mixed ancestry and other individual differences. This heterogeneity limits the application of genomic data in clinical settings. For instance, while some well-established and more universal gene variants such as BRCA1 mutations have clear utility, many other associations fail to replicate across diverse populations. Assuming that findings from one population apply universally can lead to significant inaccuracies and hinder the development of effective medical applications.

Making Individualized Prediction with the Prism-Vote (PV) System

To address this problem, BethBio has invented the Prism-Vote (PV) system, an analytical framework that turns population diversity into an advantage for clinical representation and individualized prediction. Unlike conventional methods that classify individuals into rigid population strata, the PV system calculates each subject’s propensity towards different subpopulations, creating a more nuanced representation. For example, a subject’s propensity towards different genetic ancestries can be determined through non-parametric clustering against a reference genome database. Once an individual’s propensities are characterized, stratum-specific effect sizes can be estimated using Bayesian probability within strata that have a more homogeneous genetic structure.
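The idea of propensity-weighted prediction can be illustrated with a minimal sketch. This is not BethBio’s implementation: the softmax over distances to reference centroids is a simple stand-in for the non-parametric clustering step, the linear score is a stand-in for any stratum-specific model, and all names (`propensities`, `propensity_weighted_score`, `temperature`) and numbers are hypothetical.

```python
import math

def propensities(subject_pcs, centroids, temperature=1.0):
    """Soft membership of one subject in each reference stratum.

    Assumption: strata are summarized by centroids in ancestry
    principal-component space; closer centroids get higher weight.
    """
    # Squared Euclidean distance from the subject to each centroid.
    d2 = [sum((s - c) ** 2 for s, c in zip(subject_pcs, centroid))
          for centroid in centroids]
    # Softmax over negative distances; subtracting the minimum
    # keeps the exponentials numerically stable.
    m = min(d2)
    weights = [math.exp(-(d - m) / temperature) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]

def stratum_score(genotype, effect_sizes):
    """Linear score using one stratum's effect sizes (illustrative)."""
    return sum(g * b for g, b in zip(genotype, effect_sizes))

def propensity_weighted_score(genotype, subject_pcs, centroids, stratum_effects):
    """Combine stratum-specific scores, weighted by the subject's propensities."""
    props = propensities(subject_pcs, centroids)
    scores = [stratum_score(genotype, beta) for beta in stratum_effects]
    return sum(p * s for p, s in zip(props, scores))

# Toy data: two reference strata, three variants (allele counts 0/1/2).
centroids = [(0.0, 0.0), (4.0, 0.0)]
stratum_effects = [[0.1, 0.2, 0.3], [0.3, 0.0, 0.1]]
genotype = [1, 2, 0]

# A subject halfway between the two strata gets equal propensities.
print(propensities((2.0, 0.0), centroids))
print(round(propensity_weighted_score(genotype, (2.0, 0.0), centroids, stratum_effects), 3))
```

In this sketch a subject’s prediction shifts smoothly as their position between the reference strata changes, rather than jumping when a hard class boundary is crossed.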

Furthermore, conventional methods for estimating the effect size of a minor population, such as selecting common SNP subsets, assign the same degree of risk to all individuals with the same genetic variation profile. In reality, heterogeneous populations often form a continuous spectrum rather than several distinct groups. The PV system reflects this propensity landscape more faithfully and can produce truly individualized predictions, improving prediction accuracy in real-world datasets. Conventional methods also cannot include diversified populations in their training set, whereas the PV system’s advantages become more pronounced as the subject population becomes more heterogeneous.
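The contrast between rigid strata and a continuous spectrum can be made concrete with a toy comparison. The sketch below is an assumption-laden illustration, not a description of any specific conventional method or of the PV system itself: "hard" assignment uses only the most likely stratum, while "soft" weighting uses every propensity. Propensities, effect sizes, and function names are invented for the example.

```python
def linear_score(genotype, effects):
    """Score from one stratum's effect sizes (illustrative linear model)."""
    return sum(g * b for g, b in zip(genotype, effects))

def hard_score(genotype, props, stratum_effects):
    """Rigid stratification: use only the single most likely stratum."""
    best = max(range(len(props)), key=props.__getitem__)
    return linear_score(genotype, stratum_effects[best])

def soft_score(genotype, props, stratum_effects):
    """Continuous view: weight every stratum by the subject's propensity."""
    return sum(p * linear_score(genotype, effects)
               for p, effects in zip(props, stratum_effects))

# Two subjects share the same genotype but sit at different
# points on the ancestry spectrum (hypothetical propensities).
genotype = [1, 2, 0]
stratum_effects = [[0.1, 0.2, 0.3], [0.3, 0.0, 0.1]]
subject_a = [0.9, 0.1]
subject_b = [0.6, 0.4]

for name, props in (("A", subject_a), ("B", subject_b)):
    print(name,
          round(hard_score(genotype, props, stratum_effects), 3),
          round(soft_score(genotype, props, stratum_effects), 3))
```

Under hard assignment both subjects fall into the first stratum and receive identical scores; under soft weighting their scores differ, which is the sense in which a propensity-based prediction is individualized.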

In terms of performance, the PV system has demonstrated remarkable improvements in a variety of applications, particularly in mixed-population datasets. For instance, in a mixed-population (6+ subgroups) real-world dataset, applying the PV system improves prediction accuracy by 5.2% for hypertension (AUC), 5.4% for diabetes (AUC), and 12.1% for BMI (correlation coefficient). Other examples include a 4.9% improvement for schizophrenia (AUC) in a real-life GWAS dataset, and improvements of 10.2% (correlation coefficient) for BayesR, 20.5% for LM, and 26.5% for Dirichlet Process Regression in high-heritability simulated scenarios. Moreover, the PV system has been applied commercially to develop risk screening tests for complex diseases, demonstrating its versatility and scalability.
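For readers who want to run a similar comparison on their own data, the following sketch shows how the two metrics named above (AUC for binary traits, correlation coefficient for continuous traits) and a percentage gain over a baseline could be computed. It assumes the gain is expressed relative to the baseline; the text above does not state whether its figures are relative or absolute. The inputs are toy values, not BethBio results.

```python
import math

def auc(labels, scores):
    """Probability that a random case outscores a random control (ties count half)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def pearson_r(x, y):
    """Pearson correlation between predicted and observed values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def relative_gain_percent(baseline, improved):
    """Percentage improvement of a metric relative to its baseline value."""
    return 100.0 * (improved - baseline) / baseline

# Toy example: 1 = case, 0 = control, with made-up predicted risks.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auc(labels, scores))
```

The same `relative_gain_percent` helper applies to either metric, so a binary-trait comparison and a continuous-trait comparison can be reported on a common scale.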

New Biomarkers, Patient Classification and Risk Prediction

Whether your goal in using omics data is to identify new biomarkers, classify patients, or build new risk prediction models, the PV system is uniquely equipped to handle the complexities of real-world, heterogeneous populations. Most importantly, the system is compatible with whichever baseline prediction models researchers deem suitable for the data structure, such as LDpred2, PRS-CS, and SBayesR, and integrates seamlessly into existing workflows.

The technology is versatile and can serve numerous purposes. For instance, using the system to analyze clinical trials can improve treatment effect models for underrepresented populations, enhance risk factor identification, and allow more precise detection of differentiated responses. The system can also be applied in the other direction, estimating therapeutic effects or risks for individual patients by integrating stratum-specific risks according to each patient’s propensities towards the different strata. Additionally, by processing the heterogeneous pool of subject data more accurately, we may identify new sub-stratum-specific biomarkers, risk genes, and associations that may not replicate across populations. The PV system is not limited to genome data; it can also be applied to exome, methylation, or other omics data. Likewise, stratification is not limited to ethnicity; populations can also be stratified by clinical conditions or other characteristic parameters, given sufficient data.

Furthermore, at BethBio, our scientific team has decades of expertise in genomics research and has developed numerous biostatistics tools for genome and biomarker association studies over the years. For example, the Zoom-focus algorithm improves testing power by locating the optimal testing region for rare variants, and the W-test is a powerful tool for pairwise epistasis testing.

Ready to transform your omics data research or clinical applications? Contact us at business@bethbio.com today to learn how BethBio’s Prism-Vote (PV) system and other biostatistical tools can help you make more precise predictions, improve outcomes for diverse populations, and discover new biomarkers. Together, we can unlock the full potential of genomic and omics data.