Introduction
Welcome to the documentation for phippery.
We present a software suite of data analysis tools for
Phage Immunoprecipitation Sequencing (PhIP-Seq) [1],
a high-throughput phage-display sequencing technique.
All the tools presented here are
designed to be flexible and general-purpose, suitable for
either commonly used (e.g. VirScan)
or highly customized phage-display libraries
(e.g. Phage-DMS).
We encourage bug reports and all other helpful feedback.
The source code for phippery can be found at
https://github.com/matsengrp/phippery. For more
on contributing, see the contributing page.
If you find these tools useful for your own research studies, please cite our manuscript, phippery: a software suite for PhIP-Seq data analysis.
Getting Started
Head over to the Installation page to get installation instructions for each of the three tools described above.
To get a feel for running each of the three related tools pictured above, we suggest following a walkthrough of running the alignments pipeline on some empirical data from Stoddard et al. 2021 [2].
Check out the running your own data section for a bare minimum approach to getting enrichment values from your own NGS data.
Take a look at the Nextflow pipeline page for a better description of the pipeline and its features for downstream analysis.
Background
The advent of modern oligonucleotide synthesis allows researchers to generate highly multiplexed assays such as PhIP-Seq, which is used to investigate antibody-antigen interactions with comprehensive phage-display libraries. The library used in VirScan [3], a general purpose application of PhIP-Seq, comprises \(\mathcal{O}(10^5)\) peptides spanning over 1000 individual strains across 206 species of virus. There are also specialized library designs, such as in deep mutational scanning, for estimating the impact that mutations to a viral protein may have on antibody binding [4].
Despite the growing use of the protocol, there is not yet an established set of
software tools for bioinformatics and computational tasks with PhIP-Seq data.
Much of the published code is specific to the authors’ experiment, so new researchers
must either piece together snippets from others or develop scripts from scratch.
A goal of phippery is to provide some efficient and unit-tested general infrastructure
for computing enrichment, data formatting/storing/transforming, and other common analysis
functions. Each of the tools presented here can be used separately or in
conjunction for the rapid exploration of PhIP-Seq data.
Here we focus most heavily on the Nextflow pipeline, as it provides a framework
for creating, modeling, and computing statistics on a PhIP-Seq dataset.
The pipeline inputs are demultiplexed fastq files for each of the sample IPs,
as well as annotation tables for samples and peptides: CSV files, each with only a single required column.
The default workflow then performs all of the major steps in processing the raw data and
obtaining an enrichment dataset (along with some other optional statistical goodies).
The pipeline will output a pickled binary of the xarray.Dataset,
as described in the under the hood section, and optionally two common CSV formats
(tall and wide) so that users can query the data with their own favorite analysis tools.
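To make the difference between the two CSV layouts concrete, here is a minimal sketch that reshapes a wide table (one row per peptide, one column per sample) into a tall one (one row per peptide and sample pair). It uses only the Python standard library. The column names `peptide_id`, `sample_id`, and `counts`, the sample names, and all values are hypothetical and are not taken from the pipeline's actual output.

```python
import csv
import io

# Hypothetical wide-format table: one row per peptide, one column per sample.
# All identifiers and values below are made up for illustration.
WIDE_CSV = """peptide_id,sample_1,sample_2
0,12,3
1,0,7
"""


def wide_to_tall(wide_text):
    """Reshape wide CSV text into tall rows, one per (peptide, sample) pair."""
    reader = csv.DictReader(io.StringIO(wide_text))
    # Every column other than the peptide identifier is treated as a sample.
    sample_ids = [name for name in reader.fieldnames if name != "peptide_id"]
    tall = []
    for row in reader:
        for sample_id in sample_ids:
            tall.append(
                {
                    "peptide_id": row["peptide_id"],
                    "sample_id": sample_id,
                    "counts": int(row[sample_id]),
                }
            )
    return tall


tall_rows = wide_to_tall(WIDE_CSV)
for tall_row in tall_rows:
    print(tall_row)
```

The wide layout is convenient for matrix-style operations across samples, while the tall layout is usually easier to filter, group, and join against the sample and peptide annotation tables.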
References
Licensing and Acknowledgement
This work is provided by members of the Matsen and Overbaugh groups at the Fred Hutchinson Cancer Research Center. The software is publicly available under the MIT License. The work presented is funded by the NIH, NSF, and HHMI.
For questions or concerns about using these tools, feel free to email jgallowa (at) fredhutch. If you find these tools useful for your own research studies, please cite <X>
Note
For questions and/or suggestions, please open an issue.