Introduction
Biomarker discovery research has yielded few clinically useful biomarkers. Poor methodologies in the statistical design of studies and in the evaluation of studies may be contributing factors [1]. With regard to design of discovery studies, guidelines have recently been discussed, including sources and numbers of biological samples for adequate power [2]. In this article we address a common and underappreciated issue in the evaluation of biomarker discovery studies.
The classic discovery study entails measuring many biomarkers, perhaps using array-based or other high-throughput technology, on a set of biological samples from cases and controls. For each biomarker, one calculates a statistic and its P-value using the case and control data pertaining to that biomarker. The biomarkers are then ranked according to one or more criteria, such as P-value, (average) fold change between cases and controls, sensitivity at a given specificity, area under the curve, biological relevance to the target disease, availability of reagents for assay development, potential difficulties with targeted assays, and differential expression in publicly available databases. P-values are a commonly used criterion for ranking biomarker candidates and for determining the top set of markers considered for further development and validation. Thus, statistical P-values can play a fundamental role in the evaluation of biomarker discovery studies.
As an example, consider the “Colocare” study to discover and validate markers to predict colon cancer recurrence in patients diagnosed with stage 1 colon cancer [3]. Tissue and blood samples taken at diagnosis from 40 cases with colon cancer recurrence and 160 controls without recurrence will be tested with approximately 3000 autoantibodies. As described in [2], the data analytic plan is to calculate the sensitivity corresponding to 90% specificity for each biomarker and to generate a corresponding standard P-value for no association between biomarker and case-control status. We simulated data for 3000 useless biomarkers not associated with case-control status and found that 69 (2.3%) had approximate P-values less than 0.01 (see third row of Table 1 in (2)). If all 3000 biomarkers were useless, one would expect approximately 30 markers (1% of markers) to attain P-values less than 0.01; that is, the estimated number of ‘false discoveries’ is 30 (= 0.01 × 3000). The data analysis therefore suggests that 69 − 30 = 39 true biomarkers have been discovered. However, this conclusion is incorrect, since we generated the data in such a way that none of the 3000 markers is predictive of case-control status. The issue here is that standard P-value calculations that rely on asymptotic statistical theory are problematic and lead to an erroneous conclusion in this example.
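The false-discovery bookkeeping in this example can be written out as a short sketch. The function names are hypothetical and chosen only for illustration; the numbers are those quoted above.

```python
def expected_false_discoveries(n_markers: int, alpha: float) -> float:
    """Number of markers expected to reach P < alpha if every marker is useless."""
    return n_markers * alpha


def apparent_true_discoveries(n_significant: int, n_markers: int, alpha: float) -> float:
    """Naive estimate: significant markers minus the expected false discoveries."""
    return n_significant - expected_false_discoveries(n_markers, alpha)


# Values from the Colocare-style simulation described in the text.
n_markers = 3000      # useless biomarkers simulated
alpha = 0.01          # P-value threshold
n_significant = 69    # markers with approximate P-value below alpha

observed_fraction = n_significant / n_markers
expected = expected_false_discoveries(n_markers, alpha)
apparent = apparent_true_discoveries(n_significant, n_markers, alpha)
```

Because every simulated marker is useless, any positive value of `apparent` reflects miscalibrated P-values rather than real signal.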
In this paper we demonstrate this phenomenon in more detail and propose an alternative method for calculating P-values that is generally correct and robust to the vagaries of biomarker discovery data. This exact P-value approach is applicable regardless of the statistic used to rank biomarkers, and it is computationally feasible with modern computing capacities. Most importantly, we show in simulation studies that use of exact P-values leads to more reliable conclusions from biomarker discovery data than does use of approximate P-values.
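As a rough illustration of what a P-value that avoids asymptotic approximations might look like, the sketch below uses Monte Carlo permutation of case-control labels. This is an assumption made for illustration: the text up to this point does not specify how the exact P-values are computed. The function names, the one-sided "greater than or equal" comparison, and the tie-handling rule in the sensitivity statistic are likewise hypothetical choices.

```python
import math
import random


def sensitivity_at_specificity(cases, controls, specificity=0.90):
    """Fraction of cases above the empirical control threshold.

    The threshold is the smallest control value with at least `specificity`
    of the controls at or below it (an illustrative convention).
    """
    ordered = sorted(controls)
    # round() guards against floating-point noise before taking the ceiling.
    k = max(math.ceil(round(specificity * len(ordered), 9)) - 1, 0)
    threshold = ordered[k]
    return sum(x > threshold for x in cases) / len(cases)


def permutation_p_value(cases, controls, statistic, n_perm=2000, seed=0):
    """One-sided Monte Carlo permutation P-value for any ranking statistic."""
    rng = random.Random(seed)
    observed = statistic(cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled values mimics "no association" between
        # marker value and case-control status.
        rng.shuffle(pooled)
        if statistic(pooled[:n_cases], pooled[n_cases:]) >= observed:
            hits += 1
    # Adding 1 to numerator and denominator keeps the estimate above zero.
    return (hits + 1) / (n_perm + 1)
```

Because the statistic is passed in as a function, the same routine applies to fold change, area under the curve, or any other ranking criterion, which mirrors the claim that the approach does not depend on the statistic used.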

Materials and methods
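In case-control studies, the P-value associated with a statistic is defined as the probability, computed under the null hypothesis that the biomarker is not associated with case-control status, of observing a value of the statistic at least as extreme as the one actually observed.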
Standard P-value calculations often employ approximations based on an asymptotic normal distribution for a Z-score standardized version of the statistic. Our study was designed to investigate whether such standard P-value calculations, as commonly performed in case-control studies, can be incorrect in practice, and whether incorrect P-value calculations can substantially affect the soundness of conclusions drawn from biomarker discovery studies. To address these questions we simulated biomarker discovery data in which the capacities of biomarkers to predict outcome were specified, allowing us to compare conclusions based on data analysis with the specified truth.
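The standard approximation mentioned above can be sketched as follows. This is a generic illustration, not the specific calculation used in the study: `null_mean` and `null_se` are hypothetical inputs standing for the statistic's mean and standard error under no association, and the one-sided upper-tail convention is an assumption.

```python
import math


def z_score(statistic: float, null_mean: float, null_se: float) -> float:
    """Standardize a statistic using its assumed null mean and standard error."""
    return (statistic - null_mean) / null_se


def normal_approx_p_value(statistic: float, null_mean: float, null_se: float) -> float:
    """Upper-tail P-value, treating the Z-score as standard normal."""
    z = z_score(statistic, null_mean, null_se)
    # 1 - Phi(z) written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

The approximation is only as good as the normality assumption. With 40 cases, a statistic such as sensitivity at 90% specificity takes few distinct values, which is one plausible reason the tail probabilities could be off.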