Gotchas
Missing Annotations (i.e. NaN’s)
When dealing with missing values in the annotation tables, we use the
pd.convert_dtypes
function to best allow for missing annotations, while maintaining the integrity of
the inferred datatype. It is highly recommended you stay consistent with datatype for feature annotations,
i.e. try not to mix values like 1
(integer), 6.7
(float), and hello_world
(string) in any one of the columns.
For missing data of any type,
The following values will be interpreted as NaN
:
‘’, ‘#N/A’, ‘#N/A N/A’,
‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’,
‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.
Contributing
We welcome and encourage contributions to the pipeline. We recommend following the same instructions as tskit for submitting a pull request.
Phippery Development Install
For activate development, and documentation, we recommend using the following instructions inside of a virtual environment or equivalent.
» git clone https://github.com/matsengrp/phippery.git
» (cd phippery && pip install -e ".[dev]")
Next, run the tests to make sure everything is working properly.
» (cd phippery && pytest -vv)
PyPI
This process will most likely be handled by the maintainers of the project after a PR has been approved and merged into main.
Update the version:
bumpver update --patch
For small changes, use ``--patch``
For minor changes, use ``--minor``
For major changes, use ``--major``
Build the wheel
python -m build
Use Twine to check
twine check dist/*
Optionally, Use Twine to upload to testpypi
twine upload -r testpypi --verbose dist/*
Building Documentation
To edit the documentation seen here,
simply edit the respective .rst
file
(following the git workflow described below)
in the docs/
subdirectory. Once edited, you can check
the edits are rendered correctly by building the docs locally
» cd docs/
» make clean && make html
Then open the index file built at _build/html/index.html
with a browser of choice to inspect changes.
Once the changes have been approved and merged into the main branch the documentation will automatically build and deploy.
The Data Structure
The primary data structure resulting from PhIP-Seq experiments is an enrichment matrix, X, with i rows and j columns. Commonly, row index represents a peptide that is displayed on a phage, and each column represents a sample that was mixed with the entire phage library. After sequencing and demultiplexing each sample, we align the reads to the oligonucleotide reference library to observe a count of aligned reads to each peptide.
Outside of the enrichment matrix, each sample in an experiment as well as each peptide in the phage library used have number of important annotations required when performing analysis tasks like model fitting, normalizing, and differential selection. Additionally, the comparison across groups of virus proteins and sample types is crucial in many experiments. For large sample size experiments, it can be difficult to cross reference each of these groups before and after analysis.
Here, we take advantage of the powerful xarray approach to organizing all the Phip-Seq data along four primary coordinate dimensions which tie all sample/peptide enrichments to the respective annotations. Doing this allows us to store all the information without the error prone step of cross-checking separate dataframes, and without the large storage scaling of using “Tall” dataframes.