UK Biobank Population-level Studies Using Olink’s PEA Technology Advance Disease Understanding and Therapy Development
Imagine a world where diseases can be predicted and treated with pinpoint accuracy long before symptoms appear. This is the promise of integrating proteomics with genomics, an approach that is redefining the boundaries of medical research. In this blog post, we explore the potential of this integration, drawing on a recent webinar in which Dr. Ryan Dhindsa of Baylor College of Medicine shared his proteogenomics research on UK Biobank data generated with Olink’s Proximity Extension Assay (PEA) technology. Dr. Dhindsa’s research program uses large-scale population omics data to identify new disease-associated genes and therapeutic targets. His work has led to discoveries in epilepsy, idiopathic pulmonary fibrosis, Parkinson’s disease, diabetes, and other conditions.
In his work with the UK Biobank, 3,000 plasma proteins from 54,000 participants’ samples were analyzed using Olink Explore, uncovering thousands of rare genotype-protein associations. Dr. Dhindsa discussed how these findings provide unique insights into disease biology, aiding the identification of new therapeutic targets and clinical biomarkers.
The study methods used by Dr. Dhindsa include:
- Phenome-Wide Association Studies (PheWAS): Allow simultaneous analysis of multiple diseases using electronic health record data.
- Exome-Wide Association Studies (ExWAS): Focus on protein-coding variants and rare genetic variants to identify strong disease associations.
- Collapsing Analysis: Groups variants within genes to enhance discovery of rare variant associations.
Discovering Genes is Just the Start
Gene discovery is a fundamental process in drug development. Dr. Dhindsa emphasized that treatments whose targets are supported by genetic evidence are 2 to 3 times more likely to receive FDA approval. This underscores the importance of integrating genetic insights into the drug development pipeline to improve the success rates of new therapies. However, discovering genes is just the beginning. The real challenge lies in translating these discoveries into actionable clinical insights.
The Power of Phenome-Wide Association Studies
Phenome-wide association studies (PheWAS) go beyond traditional single-disease genetic studies by examining thousands of diseases simultaneously using electronic health record data. The UK Biobank is a transformative cohort study with extensive phenotyping data from 500,000 participants, including genomic and health record data. Dr. Dhindsa’s studies leveraged the UK Biobank’s electronic health records for large-scale gene discovery and identified over 47,000 genotype-phenotype associations, providing a comprehensive view of how genetic variation influences a wide range of conditions. This approach marks a significant advancement in our ability to study complex diseases. The studies also showed that rare variants tend to have larger effects on disease risk than common variants.
Explore the AstraZeneca PheWAS Portal for gene-phenotype association data from the UK Biobank: https://azphewas.com/
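As a rough illustration of the PheWAS idea described in this section, the sketch below tests one genotype against several phenotypes and applies a Bonferroni-corrected significance threshold. The toy data, the phenotype names, and the helper functions `chi2_pvalue_2x2` and `phewas` are invented for illustration and are not the pipeline used in the study. A real analysis would adjust for covariates such as age, sex, and ancestry, and would use an exact or regression-based test when counts are small.

```python
import math

def chi2_pvalue_2x2(a, b, c, d):
    """P-value of a 1-df chi-square test on the 2x2 table [[a, b], [c, d]].

    This is an approximation that becomes unreliable for small expected counts.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # no variation in genotype or phenotype: nothing to test
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

def phewas(carrier, phenotypes, alpha=0.05):
    """Test one genotype against every phenotype.

    Returns {phenotype: (p_value, passes_bonferroni_threshold)}.
    """
    threshold = alpha / len(phenotypes)  # Bonferroni correction
    results = {}
    for name, status in phenotypes.items():
        pairs = list(zip(carrier, status))
        a = sum(1 for g, y in pairs if g and y)          # carriers with disease
        b = sum(1 for g, y in pairs if g and not y)      # carriers without
        c = sum(1 for g, y in pairs if not g and y)      # non-carriers with
        d = sum(1 for g, y in pairs if not g and not y)  # non-carriers without
        p = chi2_pvalue_2x2(a, b, c, d)
        results[name] = (p, p < threshold)
    return results

# Hypothetical toy cohort: 20 carriers followed by 180 non-carriers
carrier = [1] * 20 + [0] * 180
phenotypes = {
    # enriched among carriers (15 of 20 vs 18 of 180)
    "phenotype_A": [1] * 15 + [0] * 5 + [1] * 18 + [0] * 162,
    # same rate in both groups (2 of 20 vs 18 of 180)
    "phenotype_B": [1] * 2 + [0] * 18 + [1] * 18 + [0] * 162,
}

results = phewas(carrier, phenotypes)
for name, (p, significant) in results.items():
    print(name, p, significant)
```

Dividing the significance level by the number of phenotypes tested is the simplest way to control false positives when thousands of diseases are examined at once. Other corrections could be substituted without changing the overall structure.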
Collapsing Analysis for Statistical Power
While PheWAS offers extensive insights, the challenge of rare genetic variants remains. Rare variants often have large effects on disease risk, but each one is carried by so few people that single-variant tests lack the statistical power needed for robust conclusions. This is where collapsing analysis comes into play. By grouping the rare variants within a gene and testing them together, collapsing analysis increases statistical power. This method led to the discovery of 1,700 gene-phenotype associations with a median odds ratio of 12, highlighting the critical role of rare variants in identifying biological signals and understanding complex diseases. One notable finding was the gene MAP3K15, in which mutations appeared protective against type 2 diabetes. This finding was validated in independent cohorts, suggesting that the gene could be a promising therapeutic target. This example is an exception, however: most gene discoveries show an increased risk of disease rather than protection, and more upstream work is then needed to understand how the risk locus confers its effect. Proteomics enables this work to be done on a large scale.
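The collapsing step described in this section can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the study's actual method. Variant IDs, participants, and the helpers `collapse_gene`, `two_by_two`, `odds_ratio`, and `fisher_exact_two_sided` are all hypothetical. The "qualifying variant" set stands in for whatever filter (for example, rare protein-truncating variants) an analyst might choose.

```python
from math import comb

def collapse_gene(variant_calls, qualifying_variants):
    """Return 1 for each participant carrying at least one qualifying variant, else 0."""
    return [int(bool(calls & qualifying_variants)) for calls in variant_calls]

def two_by_two(carrier, case):
    """Counts: carrier cases, carrier controls, non-carrier cases, non-carrier controls."""
    pairs = list(zip(carrier, case))
    a = sum(1 for g, y in pairs if g and y)
    b = sum(1 for g, y in pairs if g and not y)
    c = sum(1 for g, y in pairs if not g and y)
    d = sum(1 for g, y in pairs if not g and not y)
    return a, b, c, d

def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table; infinite if the denominator is zero."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical toy data: 8 cases followed by 12 controls.
# Each set lists the variants a participant carries in one gene.
variant_calls = [
    {"v1"}, {"v2"}, {"v3"}, {"v4"}, {"v1"}, set(), {"v9"}, set(),  # cases
    {"v2"}, set(), set(), {"v9"},                                   # controls
] + [set() for _ in range(8)]                                       # more controls
case = [1] * 8 + [0] * 12
qualifying = {"v1", "v2", "v3", "v4"}  # "v9" is assumed not to pass the filter

# Gene-level test: all qualifying variants collapsed into one carrier indicator
gene_carrier = collapse_gene(variant_calls, qualifying)
gene_table = two_by_two(gene_carrier, case)
p_gene = fisher_exact_two_sided(*gene_table)
or_gene = odds_ratio(*gene_table)

# Single-variant test for comparison: carriers of "v1" only
v1_carrier = collapse_gene(variant_calls, {"v1"})
p_v1 = fisher_exact_two_sided(*two_by_two(v1_carrier, case))

print("gene-level:", gene_table, or_gene, p_gene)
print("v1 alone:  ", p_v1)
```

In this toy cohort no single variant has enough carriers to produce a small p-value on its own, while the collapsed gene-level indicator does. That is the gain in power the section describes.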
Proteogenomics: A Powerful Integration
Proteomics complements and enhances genetic discoveries. By examining the proteome—the entire set of proteins expressed by a genome—we gain a more nuanced understanding of disease mechanisms. Proteomics helps identify novel biomarkers for early diagnosis and develop personalized treatment plans. The integration of proteomics with genomics provides a detailed layer of data that complements genetic insights, offering a dynamic view of disease progression and response to treatment. This integration is pivotal for biomarker discovery, drug repositioning, and understanding protein regulatory networks.
Building on the foundation of collapsing analysis, proteogenomics integrates proteomic data with genomic insights, bridging the gap between genetic associations and disease mechanisms. As part of the UK Biobank Plasma Proteomics Consortium, Dr. Dhindsa’s team analyzed proteomics data from 54,000 participants, measured with Olink Explore across 3,000 plasma analytes. They focused on the 49,000 participants who also had whole-exome sequencing data available and identified 5,400 rare genotype-protein associations across 1,300 proteins using variant-level ExWAS and gene-level collapsing analyses. The study identified thousands of protein quantitative trait loci (pQTLs) and underscored the importance of sequencing-based tests for rare variants: 80% of the rare variants had not previously been identified in a comparable GWAS. The rare variants that altered protein levels were linked to significant reductions in those levels, supporting the biological relevance of the associations. For example, different mutations in the gene NLRC4, known for its role in a rare autoinflammatory syndrome, affected IL-18 plasma levels, providing insights into potential therapeutic targets.
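To make the pQTL idea concrete, the sketch below estimates how carrying a rare variant shifts the plasma level of one protein, using ordinary least squares on a carrier indicator. The protein values, carrier labels, and the `pqtl_effect` helper are hypothetical. An actual analysis would use normalized protein measurements, include covariates, and correct for the number of variant-protein pairs tested.

```python
import math

def pqtl_effect(carrier, protein):
    """Fit protein = intercept + beta * carrier by ordinary least squares.

    Returns (beta, standard_error, t_statistic). With a 0/1 carrier indicator,
    beta equals the difference in mean protein level between carriers and
    non-carriers.
    """
    n = len(carrier)
    mean_x = sum(carrier) / n
    mean_y = sum(protein) / n
    sxx = sum((x - mean_x) ** 2 for x in carrier)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(carrier, protein))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual sum of squares, used for the standard error of the slope
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(carrier, protein))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se, beta / se

# Hypothetical toy data: 3 carriers with reduced levels, 7 non-carriers near zero
carrier = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
protein = [-1.9, -2.1, -2.0, 0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3]

beta, se, t_stat = pqtl_effect(carrier, protein)
print("effect size:", beta, "standard error:", se, "t:", t_stat)
```

A negative effect size here corresponds to carriers having lower levels of the protein, which is the direction the section reports for the rare variants that altered protein levels.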
Integrating Olink’s proteomic data with traditional exome analysis can enhance the detection of meaningful gene-phenotype associations. The team showed that a model augmented with Olink data strengthened gene-phenotype associations that had not been detected in the earlier collapsing/ExWAS analysis of 450,000 participants covering 14,000 phenotypes. The improved associations involved the genes PCSK9, ANGPTL3, PROC, CD36, TCN1, LPL, and KEL and the phenotypes dyslipidemia, phlebitis/thrombophlebitis, thalassemia, vitamin B deficiency, and hypertrophic skin. Notably, the Olink-augmented analysis appears to be particularly useful where plasma or blood is a relevant tissue.
Clonal hematopoiesis also exemplifies the potential of proteogenomics. Clonal hematopoiesis, a condition where somatic mutations in hematopoietic stem cells lead to clonal expansion, can increase the risk for various diseases, including cardiovascular conditions and blood cancers. By examining known driver mutations and their associated protein changes, the researchers identified significant trans-protein quantitative trait loci (trans-pQTLs) linked to these conditions. These findings offer potential biomarkers for tracking clonal hematopoiesis and suggest new therapeutic avenues.
Expanding Horizons: Beyond the Protein-Coding Genome
The UK Biobank’s release of whole-genome sequencing data from 500,000 participants in 2023 opens new research avenues beyond the protein-coding regions of the genome. Non-coding regions, which constitute the majority of the human genome, contain functional elements crucial for gene regulation. A pilot study by Dr. Dhindsa’s team focusing on untranslated regions (UTRs) demonstrated that proteomics could help identify functional variants within these regions, offering insights into their impact on human health. This expansion into non-coding regions is a significant step forward in understanding the complete genetic landscape.
The journey of integrating proteomics with genomics is more than a scientific advancement; it represents a paradigm shift in the approach to disease management. Proteomics enhances the ability to perform robust disease risk assessments, identify precise biomarkers, and develop effective therapeutic strategies. The future of precision medicine lies in this integration, promising more accurate and personalized healthcare solutions.
A New Era in Precision Medicine
Looking ahead, the integration of proteomics with other omics technologies, such as metabolomics and transcriptomics, will provide a more holistic view of disease mechanisms and enhance the discovery of novel therapeutic targets. The continued expansion of proteomic coverage and sample sizes, along with advancements in data sharing and standardization, will further drive the field forward.
The integration of proteomics with genomics is crucial for advancing precision medicine, offering new insights into disease mechanisms and potential therapeutic targets. The recent webinar highlighted the significant advancements and potential of this integrated approach in drug development and personalized healthcare. By leveraging Olink’s innovative platforms, researchers can make new discoveries and shape the future of medical research.
Webinar Q&A Highlights
The Q&A session of the webinar addressed several critical questions about the scalability and economic viability of proteomics, the impact of post-genomic research on drug development, and the representation of different ancestries in the UK Biobank data. The integration of genomics and proteomics data was highlighted as highly effective in elucidating disease mechanisms, providing a comprehensive understanding of biological processes, and identifying new therapeutic targets.
General Scalability and Economic Viability of Proteomics
Question: What are the main risks and benefits of proteomics regarding scalability and economic viability?
Answer: Dr. Dhindsa highlighted that the primary risks are associated with the cost and scalability of proteomics. However, decreasing costs and advancements in technology are making proteomics more accessible and scalable. Economically, proteomics is becoming more viable due to its integration with other omics technologies, which enhances the overall value and potential for discovering new therapeutic targets.
Impact of Post-Genomic Research on Drug Development
Question: How do you see post-genomic research influencing drug development?
Answer: Dr. Dhindsa emphasized that post-genomic research will influence drug development in multiple ways, including the identification of new biomarkers for disease and treatment response. By understanding the genetic basis of diseases and integrating this knowledge with proteomic data, researchers can develop more targeted therapies, ultimately improving drug efficacy and reducing adverse effects.
Validation of New Associations
Question: Did you validate the newly discovered associations?
Answer: Dr. Dhindsa mentioned that the initial study was exploratory in nature, focusing on identifying potential associations. However, the team did validate some associations by replicating them in independent cohorts, such as the Mexico City cohort and the FinnGen population. Ideally, further validation would involve replicating these findings in additional studies and conducting functional analyses to confirm the biological relevance of these associations.
Representation and Diversity in UK Biobank Data
Question: What are the limitations regarding the representation of different ancestries in the UK Biobank data?
Answer: Dr. Dhindsa acknowledged that the UK Biobank predominantly includes individuals of European ancestry, which limits the generalizability of the findings to other populations. This highlights the need for more diverse datasets to ensure that the discoveries are applicable to a broader range of genetic backgrounds and to address health disparities in underrepresented populations.
Use of Non-Coding Regions in Genome-Wide Studies
Question: How do you handle the challenges of interpreting non-coding regions in genome-wide studies?
Answer: Dr. Dhindsa explained that non-coding regions pose challenges because they have far fewer functional annotations than coding regions. To address this, the team used proteomic data to validate potential functional effects of variants in non-coding regions. This approach helps to identify which non-coding variants might be biologically relevant and worth further investigation.
Applications of Proteomics in Disease Prediction
Question: Can proteomics be used to predict the progression of diseases?
Answer: Yes, proteomics has significant potential in disease prediction. By measuring protein levels and identifying biomarkers associated with disease progression, researchers can develop predictive models. These models can help identify individuals at risk of developing certain diseases or those who might respond better to specific treatments, thus enabling personalized medicine.
Combining Genomics and Proteomics Data
Question: How effective is the integration of genomics and proteomics data in elucidating disease mechanisms?
Answer: The integration of genomics and proteomics data is highly effective in providing a more comprehensive understanding of disease mechanisms. Genomics data helps identify genetic variants associated with diseases, while proteomics data provides insights into how these variants affect protein expression and function. This combined approach allows for the identification of novel therapeutic targets and the development of more precise interventions.
Future Directions and Scaling Up Studies
Question: What are the future directions for scaling up these studies?
Answer: Future directions involve increasing the sample size and diversity of datasets, integrating more omics technologies, and improving the automation and scalability of proteomic platforms. As technologies advance, researchers will be able to conduct larger and more comprehensive studies, leading to more robust and generalizable findings.
Addressing the Challenge of Identifying Functional Variants
Question: How do you address the challenge of identifying functional variants in non-coding regions?
Answer: Dr. Dhindsa noted that identifying functional variants in non-coding regions is challenging due to the lack of functional annotations. The team addressed this by using proteomic data to assess the impact of non-coding variants on protein levels. This approach helps to prioritize variants that are likely to have a functional impact and are worth further investigation.
Use of Biomarkers for Brain-Related Diseases
Question: Are these biomarkers valuable for brain-related diseases?
Answer: Initially, there was skepticism about the utility of these biomarkers for brain-related diseases. However, recent advancements have shown that certain biomarkers can predict neurological conditions. Extensive studies have demonstrated their predictive power for various brain-related diseases, indicating a promising direction for future research.
Relatedness of Individuals in Study Cohorts
Question: Did the study focus on related or unrelated individuals?
Answer: The study primarily focused on unrelated individuals to avoid confounding effects in association studies. There are small sub-cohorts within the UK Biobank that include related individuals, but the main analysis was conducted on unrelated participants to ensure the robustness and reliability of the findings.
Combining Multiple Omics Modalities
Question: Do you think combining multiple omics modalities will be the next step in research?
Answer: Absolutely. Combining multiple omics modalities, such as genomics, proteomics, and transcriptomics, provides a more comprehensive understanding of biological processes. This integrated approach is expected to become increasingly important in future research, allowing for a deeper exploration of complex diseases and the identification of new therapeutic targets.
Use of Proteomics in Identifying Rare Genetic Variants
Question: How can proteomics help in identifying rare genetic variants?
Answer: Proteomics can enhance the identification of rare genetic variants by providing functional data on protein expression and modifications. This helps in understanding the biological impact of these variants and their association with diseases. Proteomics data can complement genomic data, leading to more accurate identification and validation of rare variants.
Availability of Data and Tools for Researchers
Question: Are the data and tools used in your study publicly available?
Answer: Yes, the UK Biobank data is available to researchers who apply and are approved for access. The summary statistics and association data are also accessible via public databases. Additionally, a risk association study (a large-scale analysis of individual protein-disease associations in the UK Biobank) is freely available in the Olink Insight tool: https://insight.olink.com/data-stories/ukb-diseases. Furthermore, data stories developed from this study are shared with the research community to facilitate further research and replication of findings.
Integration with Other Data Types
Question: Did you integrate proteomics data with other types of data, such as metabolomics?
Answer: While the primary focus of this study was on integrating genomics and proteomics data, the integration with other data types like metabolomics is also crucial. Future research directions include combining multiple data types to provide a holistic view of biological processes and disease mechanisms, further enhancing the understanding and potential for discovery in complex diseases.
Impact on Drug Repositioning
Question: How does this research impact drug repositioning efforts?
Answer: The research has significant implications for drug repositioning. By identifying new protein targets and understanding their role in diseases, existing drugs can be repurposed for new therapeutic uses. For instance, the identification of FLT3 as a potential target in clonal hematopoiesis suggests that existing FLT3 inhibitors could be repurposed for treating related conditions.



