Background Modeling

Introduction

In addition to running the alignments and computing the counts, phippery also provides workflows to model the background with mock-IP samples and compute non-specific peptide enrichments. By default, the phip-flow pipeline runs the edgeR [1] workflow as described in Chen et al. 2022 [2]. Optionally, we also provide a simpler Z-score method, used in Mina et al. 2019 [3], to evaluate the significance of peptide enrichment relative to background. Each is described in greater detail below. Note that these workflows are not mutually exclusive (i.e., they do not overwrite the counts, cpm, or any other default or optional outputs from the pipeline). You may run none, some, or all of the workflows together, then use the combined results as you wish.

edgeR/BEER Method

Chen et al. 2022 adapt the edgeR tool to compute fold-changes with respect to mock-IP samples, along with p-values for peptide enrichment. By default, the phip-flow pipeline runs edgeR but not BEER. Optionally, you may also run the BEER (Bayesian Enrichment Estimation in R) method, which is statistically more powerful and may be better at identifying significantly enriched peptides with lower fold-changes. The trade-off for using BEER is a longer run-time. See Optional Parameters in the pipeline documentation for more.

Z-score Method (optional)

phippery can also optionally run the Z-score method to compute the significance of peptide enrichment relative to background. This method, used in Mina et al. 2019, is described in detail in their supplementary document. It uses the mock-IP samples to bin together peptide species of similar abundance under the beads-only condition. Here, abundance can be represented by any form of normalized counts; CPM is the default in phippery. Note that the mock-IP samples are used only to determine binning.

To compute the Z-score for a peptide species in an empirical sample, identify the bin it belongs to and compute the mean and standard deviation of the CPM values among the peptide species in that bin. To reduce the influence of outliers, such as signal from epitope-specific binding, the highest 5% and lowest 5% of CPM values in the bin are discarded before computing the mean and standard deviation. Formally, for a peptide species \(p\) with CPM value \(n_p\) belonging to bin \(i\), where \(\mu_i\) and \(\sigma_i\) are the trimmed mean and standard deviation of that bin, the Z-score is:

\[Z_p = \frac{n_p - \mu_i}{\sigma_i} .\]
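The procedure above can be illustrated with a minimal Python sketch. This is not phippery's implementation: the function name binned_zscores, the n_bins and trim parameters, the choice of equal-size bins ranked by beads-only CPM, and the use of the population standard deviation are all assumptions made for illustration. Only the overall logic (bin by mock-IP abundance, trim 5% from each tail, then standardize) follows the description above.

```python
from statistics import fmean, pstdev


def binned_zscores(beads_cpm, sample_cpm, n_bins=2, trim=0.05):
    """Hypothetical sketch: Z-scores of sample CPM within beads-defined bins."""
    n = len(beads_cpm)
    # Rank peptides by beads-only (mock-IP) abundance; the mock-IP values
    # are used only to decide which peptides share a bin.
    order = sorted(range(n), key=lambda p: beads_cpm[p])
    z = [float("nan")] * n
    for i in range(n_bins):
        # Assumption: equal-size bins of consecutive ranks.
        members = order[i * n // n_bins:(i + 1) * n // n_bins]
        values = sorted(sample_cpm[p] for p in members)
        # Discard the lowest and highest `trim` fraction of sample CPM values.
        k = int(trim * len(values))
        kept = values[k:len(values) - k]
        if len(kept) < 2:
            continue  # too few peptides to estimate mu_i and sigma_i
        mu, sigma = fmean(kept), pstdev(kept)
        if sigma > 0:
            for p in members:
                z[p] = (sample_cpm[p] - mu) / sigma
    return z


# Toy data: 40 peptides, one of which is strongly enriched in the sample.
beads = [float(p) for p in range(40)]
sample = list(beads)
sample[39] = 1000.0
z = binned_zscores(beads, sample, n_bins=2, trim=0.05)
```

In this toy example the enriched peptide is excluded by the trimming step when \(\mu_i\) and \(\sigma_i\) are estimated, so it does not inflate its own bin's standard deviation and receives a large positive Z-score.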

References