Speed drug development through stratifying clinical trial enrollment

Progressive fibrosing interstitial lung disease (PF-ILD) has a heterogeneous phenotype, and the size of clinical trials could be greatly reduced if a biomarker stratification approach were adopted.

How does one define a disease?


Disease, noun: a condition of the living animal or plant body or of one of its parts that impairs normal functioning and is typically manifested by distinguishing signs and symptoms.
– Merriam Webster1

When a person is considered sick with a disease, whether diabetes, Alzheimer’s or cystic fibrosis, you would think of ‘distinguishing signs and symptoms’ that are consistent across all the manifestations of a particular impairment. Advances in modern medical research have enabled a fine-tuning of the definitions of disease. In diabetes, for example, a distinction was made early on between Type 1 (inability to produce insulin) and Type 2 (inability to respond to insulin), both of which lead to a common symptom (chronically high levels of blood sugar). These observable characteristics, called phenotypes, can vary widely within a given disease, and are then called heterogeneous phenotypes.

Genomics has greatly increased the capability to uncover the molecular underpinning of various diseases to help pick apart phenotypic variability. A single condition can have a wide variety of symptoms, responsiveness to drug treatments and risk for co-morbidities, and analyzing the genomes and various phenotypes of thousands or even tens of thousands of affected individuals illuminates the potential for personalizing the treatment approaches used.

A recent Nature Medicine publication2 from the University of Dundee and the Wellcome Trust, “Heterogeneity in phenotype, disease progression and drug response in type 2 diabetes”, illustrates the variability in Type 2 Diabetes (T2D) across more than 23,000 Scottish patients as part of the UK Biobank and a multicenter clinical trial studying the efficacy of three commonly used treatments for recently diagnosed T2D. Overlaying the complex phenotypes in diabetes on each individual’s genetic risk, outcome data (other severe morbidities such as cardiovascular events and retinopathies) and drug response data yields insights into how T2D could be managed in a much more effective, personalized way.

The interconnectedness of biology illustrates how complex a problem heterogeneous phenotypes are. Even with a single-gene disorder such as Cystic Fibrosis (CF), a mutation in one area of the gene may lead to a very different outcome and severity of clinical disease than a mutation in another area of the same gene. This is reflected in the recent drug approvals to treat CF, where one medication (called lumacaftor) helps the CFTR protein fold correctly only if a mutation called F508del is present3. The fact that genes (and the proteins that are their products) interact in context with all the other proteins illustrates the complexity of biology and of human health.

The problem with interstitial lung disease

Interstitial Lung Disease (ILD) is a heterogeneous group of disorders that cause lung scars, affecting an individual’s ability to breathe. These scars are at the lowest level of lung organization, a functional unit called the alveolus (alveoli plural). The lung interstitium is the connective tissue outside this air sac, and oxygen travels through the alveolus and the interstitium to the surrounding capillaries.

Symptoms include shortness of breath, chest pain, extreme tiredness and dry cough, and range in severity from mild to life-threatening. ILD affects approximately 100,000 individuals in the United States every year. Symptoms of this disease are treated with a collection of at least 18 therapeutics and oxygen therapy, with lung transplant being a final (and drastic) option. About 30% of all single lung transplants in the US are driven by idiopathic pulmonary fibrosis, which indicates the severe need for additional therapeutics.

There are several types of ILD: some are driven by genetics, the majority have no known cause (called idiopathic pulmonary fibrosis or IPF), and others are caused by dust or mold in the environment, including asbestos-related lung diseases. Because ILD is a heterogeneous disorder with poor treatment options, several efforts are underway to develop improved treatments.

Putting precision into precision medicine

Precision medicine has various definitions; a good one from the FDA4 is commonly quoted: to target the right treatments to the right patients at the right time. In the past decade (since 2011, when the first next-generation sequencing (NGS)-based Laboratory Developed Tests started being offered for cancer abnormalities) there has been an explosion of diagnostic tests offered for the personalized treatment of cancer. Prior to NGS-based tests there were several personalized treatments based on imaging and protein biomarkers, as well as a nascent effort in pharmacogenetic testing of Cytochrome P450 metabolism of existing drugs. It was the explosion of genetic data, and all the companion diagnostics that have opened up as a result, that put precision medicine in the spotlight. Over 40% of the existing pharmaceutical pipeline is for targeted therapeutics, a major shift from the prior ‘one drug for all patients’ paradigm.


There are cases, such as PF-ILD, where no genetic marker for predisposition to the disease is available, and no other biomarker can discriminate between the progressive fibrosing form of ILD and the other types. New therapeutics aim to prevent the progressive form of ILD; however, without a way to determine beforehand who has the progressive form and who does not, some individuals are treated who would receive no benefit from the therapy.

This is where Olink Proteomics can make a difference: Olink Explore can accurately measure the relative levels of up to three thousand biomarkers simultaneously. Multi-marker analysis can then be undertaken to identify a unique collection of circulating proteins that distinguishes the clinical sub-groups of a heterogeneous disease.

Discovery and development of a 12-protein biomarker signature

In a recent paper published in the journal Lancet Respiratory Medicine5 titled “Proteomic biomarkers of progressive fibrosing interstitial lung disease: a multicentre cohort analysis”, Dr. Justin Oldham6 (University of California Davis) and colleagues report an analysis of 368 proteins using Olink technology across a discovery cohort of 385 ILD patients and a second validation cohort of an additional 204 patients. Using machine learning, 31 proteins were identified from the discovery cohort, of which 17 maintained their association in the validation cohort.

Again using machine learning, the 17 biomarkers were narrowed to 12 proteins, which in the validation cohort showed a sensitivity of 0.90 and a corresponding negative predictive value (NPV) of 0.91. A high NPV means that a negative test result is very likely to be a true negative; that is, when the test is negative, there is little chance that the individual actually has the condition.

Figure 1: Table showing the high sensitivity and Negative Predictive Value of the 12-biomarkers for PF-ILD. From Bowman WS and Oldham J et al (2022)7
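
For readers less familiar with the two metrics reported for the signature, the short Python sketch below shows how sensitivity and NPV are derived from the counts of correct and incorrect test calls. The counts used here are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of patients with the condition that the test flags as positive."""
    return true_pos / (true_pos + false_neg)


def negative_predictive_value(true_neg: int, false_neg: int) -> float:
    """Fraction of negative test results that are genuinely negative."""
    return true_neg / (true_neg + false_neg)


# Hypothetical counts for illustration only (NOT data from the study).
true_pos, false_neg, true_neg = 90, 10, 100

sens = sensitivity(true_pos, false_neg)
npv = negative_predictive_value(true_neg, false_neg)
print(f"sensitivity = {sens:.2f}, NPV = {npv:.2f}")
```

Note that NPV depends on how common the condition is in the tested group, so the same test can show a different NPV in a different patient population.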

Calculating a $26.7M cost savings of a clinical trial

The authors specifically point out that clinical trials could be made smaller if enrollment were enriched for patients identified by screening as likely to benefit from therapy.


A theoretical randomised controlled trial with 1:1 randomisation designed without regard to proteomic signature would require 676 patients to detect a 50% reduction in FVC decline at 90% power, assuming a standard deviation of 200 mL and two-tailed α of 0·05. A similar trial restricted to patients with a high-risk proteomic signature would require 142 patients, assuming the same parameters (appendix p 23).
– Bowman WS et al Lancet Respir Med (2022) p.7 doi:10.1016/S2213-2600(21)00503-87
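
To give a sense of why enrichment shrinks a trial so sharply, here is a minimal Python sketch of a textbook sample size calculation for comparing two means, using the normal approximation. It is not the authors' calculation: their exact method and inputs are in their appendix. The standard deviation, α and power come from the quote above, while the `delta_ml` values (the absolute difference in FVC decline the trial should detect) are hypothetical inputs chosen for illustration.

```python
import math
from statistics import NormalDist


def patients_per_arm(delta_ml: float, sd_ml: float = 200.0,
                     alpha: float = 0.05, power: float = 0.90) -> int:
    """Normal-approximation sample size per arm for a 1:1 two-arm trial.

    n = 2 * (z_{1-alpha/2} + z_{power})^2 * (sd / delta)^2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * (sd_ml / delta_ml) ** 2)


# Hypothetical detectable differences in mL (assumptions, not values from the paper).
for delta_ml in (50.0, 110.0):
    total = 2 * patients_per_arm(delta_ml)
    print(f"detectable difference {delta_ml:.0f} mL -> {total} patients in total")
```

Because the required number of patients scales with the inverse square of the detectable difference, a population in which the expected treatment effect is roughly twice as large needs only about a quarter of the enrollment.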

Studies have estimated the cost of a single clinical trial enrollee at approximately $50,000. Using the numbers above, enrolling 676 patients would cost $33.8M, while enrolling 142 patients would cost $7.1M, a savings of $26.7M. In addition to the cost savings, enrolling only about one-fifth of the original number through patient stratification with a blood-based biomarker panel would significantly reduce the time needed to enroll and complete the trial.
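
The arithmetic behind the savings figure can be reproduced in a few lines of Python. The per-patient cost is the approximate estimate cited above, and the patient counts are those from the quoted passage; the variable names are illustrative only.

```python
COST_PER_PATIENT_USD = 50_000   # approximate per-enrollee cost cited in the text

unselected_patients = 676       # trial designed without the proteomic signature
enriched_patients = 142         # trial restricted to the high-risk signature

unselected_cost = unselected_patients * COST_PER_PATIENT_USD
enriched_cost = enriched_patients * COST_PER_PATIENT_USD
savings = unselected_cost - enriched_cost
enrollment_fraction = enriched_patients / unselected_patients

print(f"Unselected trial: ${unselected_cost / 1e6:.1f}M")
print(f"Enriched trial:   ${enriched_cost / 1e6:.1f}M")
print(f"Savings:          ${savings / 1e6:.1f}M")
print(f"Enrollment needed: {enrollment_fraction:.0%} of the unselected trial")
```

This counts enrollment cost only; any cost of screening patients with the biomarker panel would need to be subtracted from the savings.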

