Command Line Interface
The CLI is written using the Click library, and thus both phippery -h and phippery COMMAND -h will provide the same information provided below.
With the binary dataset output (the default) and an installation of the phippery CLI tools, we can run some useful queries to learn a little about the dataset.
$ phippery about output/Pan-CoV-example.phip
The about command prints information about the three primary aspects of a single dataset: Samples, Peptides, and Enrichment Layers. For more about how the data is structured, see the under the hood page. Primarily, it tells you what information is available in the Sample Table, Peptide Table, and Enrichment Layers.
Sample Table:
-------------
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 124 to 540
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 seq_dir 6 non-null string
1 library_batch 6 non-null string
2 control_status 6 non-null string
3 participant_ID 4 non-null string
4 patient_status 4 non-null string
5 fastq_filename 6 non-null string
6 raw-total-sequences 6 non-null Int64
7 reads-mapped 6 non-null Int64
8 error-rate 6 non-null Float64
9 average-quality 6 non-null Float64
dtypes: Float64(2), Int64(2), string(6)
memory usage: 552.0 bytes
Above we see the sample table of our example dataset. Annotation feature data types and missing-value (NA) counts are reported by default.
As displayed, this dataset contains 6 samples, each with the annotations we fed to the pipeline along with some alignment statistics. While maybe not immediately useful, it’s nice to know which information you have available at any given time – especially after we start slicing or grouping datasets.
Further, you may want to know more detail about one annotation column at a time. The about-feature command gives you a useful description of the feature-level distribution (for categorical or numeric features), as well as a few example queries to help you index the dataset by this annotation feature. Let’s take a look at our reads-mapped annotation feature:
reads-mapped: Integer Feature:
---------------------------
distribution of numerical feature:
count 6.000000
mean 359803.000000
std 283811.764886
min 122878.000000
25% 147733.250000
50% 234885.500000
75% 597263.000000
max 729431.000000
Name: reads-mapped, dtype: float64
Some example query statements:
------------------------------
> "reads-mapped >= 359803"
> "reads-mapped <= 359803"
> "(reads-mapped >= 147733) and (reads-mapped <= 234885)"
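The example statements above are pandas-style query strings. As a minimal sketch of how such a string selects rows, the snippet below evaluates the range query with plain pandas `DataFrame.query` on a small sample table. The read counts are made up for illustration and are not the values from the example dataset. One caveat that applies to plain pandas: a column name containing a hyphen has to be wrapped in backticks. Whether phippery itself requires this is not stated on this page.

```python
import pandas as pd

# Hypothetical sample table: these read counts are made up for illustration.
sample_table = pd.DataFrame(
    {"reads-mapped": [120_000, 150_000, 200_000, 260_000, 600_000, 730_000]},
    index=[1, 2, 3, 4, 5, 6],
)

# In plain pandas, a column name containing a hyphen must be wrapped in
# backticks; otherwise "reads-mapped" is parsed as "reads minus mapped".
expression = "(`reads-mapped` >= 147733) and (`reads-mapped` <= 234885)"
selected = sample_table.query(expression)

# Index labels of the samples that satisfy the expression.
print(selected.index.tolist())
```

Only the rows whose value falls inside the closed interval are kept; everything else in the table is untouched.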
Tip
Run phippery -h for a list of possible commands. Additionally, you can run phippery COMMAND -h for option descriptions for a specific command.
phippery
Welcome to the phippery CLI!
Here we present a few commands that allow users to slice, transform, normalize, fit models to, and otherwise work with a binary pickle dump’d xarray dataset, usually the result of running the PhIP-Flow pipeline.
For more information and example workflows please refer to the full documentation at https://matsengrp.github.io/phippery/
phippery [OPTIONS] COMMAND [ARGS]...
about
Summarize the data in a given dataset
If no verbosity flag is set, this will print basic information about the number of samples, peptides, and enrichment layers in a given dataset. With a verbosity of one (-v) you will also get information about annotations and available datatypes. With a verbosity of two (-vv), it prints detailed information about all data tables, including annotation feature types and the distribution of enrichment values for each layer. A verbosity of three loops through all features and gives you the detailed description of each.
phippery about [OPTIONS] FILENAME
Options
- -v, --verbose
Arguments
- FILENAME
Required argument
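Since the CLI is built on Click, a repeatable flag such as -v is typically declared as a counted option. The following is a minimal sketch of that mechanism, not the phippery source: the `about` function below is a hypothetical stand-in that only echoes the verbosity level it received, and `example.phip` is a placeholder name that is never opened.

```python
import click
from click.testing import CliRunner


@click.command()
@click.option("-v", "--verbose", count=True, help="Repeat to increase detail.")
@click.argument("filename")
def about(verbose, filename):
    """Hypothetical stand-in for a command with a counted verbosity flag."""
    # With count=True, Click passes the number of times the flag was given:
    # no flag -> 0, -v -> 1, -vv -> 2.
    click.echo(f"{filename}: verbosity={verbose}")


# Invoke the command in-process; no real dataset file is read.
runner = CliRunner()
outputs = [
    runner.invoke(about, args).output.strip()
    for args in (["example.phip"], ["-v", "example.phip"], ["-vv", "example.phip"])
]
print(outputs)
```

A command body can then branch on the integer to decide how much detail to print.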
about-feature
Summarize details about a specific sample or peptide annotation feature.
The function will tell you information about a specific feature in your sample annotation table, depending on its inferred datatype. For numeric feature types the command reports quantiles; for categorical or boolean feature types it gives individual factor-level counts.
Both datatype categories will print a few examples of valid dataset queries using the feature in question.
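The numeric versus categorical behaviour described above can be sketched with plain pandas. This is an illustration under assumptions, not phippery's implementation: `summarize_feature` is a hypothetical helper, and the annotation values are made up.

```python
import pandas as pd


def summarize_feature(column: pd.Series) -> pd.Series:
    """Return quantile statistics for numeric columns, level counts otherwise."""
    is_numeric = pd.api.types.is_numeric_dtype(column)
    is_bool = pd.api.types.is_bool_dtype(column)
    # pandas treats booleans as numeric, so route them to counts explicitly.
    if is_numeric and not is_bool:
        return column.describe()
    return column.value_counts()


# Hypothetical annotation values, made up for illustration.
reads = pd.Series([120_000, 150_000, 200_000, 260_000], name="reads-mapped")
status = pd.Series(
    ["empirical", "empirical", "beads_only", "library"], name="control_status"
)

numeric_summary = summarize_feature(reads)
categorical_summary = summarize_feature(status)
print(numeric_summary)
print(categorical_summary)
```

`Series.describe()` yields the same fields as the distribution shown earlier on this page (count, mean, std, min, quartiles, max), while `Series.value_counts()` yields one count per factor level.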
phippery about-feature [OPTIONS] FEATURE FILENAME
Options
- -d, --dimension <dimension>
The dimension in which we expect to find this feature
- Options:
sample | peptide
- --distribution, --counts
Force a specific output of either value counts or distribution for quantitative features
Arguments
- FEATURE
Required argument
- FILENAME
Required argument
load-from-csv
Load and dump xarray dataset given a set of wide csv’s
Using this command usually means you have either:
1. Decided to store the output of your analysis in the form of wide csv’s instead of a pickle dump’d binary for longer-term storage.
2. Created your own enrichment matrix without the help of the phip-flow alignment pipeline.
Note
In the case of #2, please note that your matrix data must be numeric and have shape (len(peptide_table), len(sample_table)). Finally, you must include pipeline outputs
Note
Currently only accepting a single enrichment matrix.
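The shape and dtype requirements in the first note can be checked before calling load-from-csv. The sketch below uses plain pandas; `check_counts_matrix` is a hypothetical helper, and the tables and column names are made up for illustration.

```python
import pandas as pd


def check_counts_matrix(counts, peptide_table, sample_table):
    """Raise ValueError if the matrix is non-numeric or has the wrong shape."""
    expected = (len(peptide_table), len(sample_table))
    if counts.shape != expected:
        raise ValueError(f"counts shape {counts.shape} != expected {expected}")
    # Every column must hold numeric values.
    if not all(pd.api.types.is_numeric_dtype(dtype) for dtype in counts.dtypes):
        raise ValueError("counts matrix contains non-numeric columns")


# Hypothetical tables: 3 peptides (rows) by 2 samples (columns).
sample_table = pd.DataFrame({"fastq_filename": ["a.fastq", "b.fastq"]}, index=[0, 1])
peptide_table = pd.DataFrame({"oligo": ["ACGT", "CGTA", "GTAC"]}, index=[0, 1, 2])
counts = pd.DataFrame(
    [[5, 0], [12, 7], [0, 3]],
    index=peptide_table.index,
    columns=sample_table.index,
)

check_counts_matrix(counts, peptide_table, sample_table)  # passes silently

try:
    # A transposed matrix (samples x peptides) has the wrong shape.
    check_counts_matrix(counts.T, peptide_table, sample_table)
    transposed_ok = True
except ValueError:
    transposed_ok = False
print(transposed_ok)
```

The check makes the orientation explicit: peptides index the rows and samples index the columns.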
phippery load-from-csv [OPTIONS]
Options
- -s, --sample_table <sample_table>
Required Path to sample table csv.
- -p, --peptide_table <peptide_table>
Required Path to peptide table csv.
- -c, --counts_matrix <counts_matrix>
Required Path to counts matrix csv.
- -o, --output <output>
Required Path where the phip dataset will be dump’d to netCDF
merge
phippery merge [OPTIONS] DATASETS
Options
- -o, --output <output>
Arguments
- DATASETS
Required argument
query-expression
Perform a single pandas-style query expression on dataset samples
This command takes a single string query statement, applies it to the sample table in the dataset, and returns the dataset with the slice applied. This means that all enrichment layers get sliced. If no output (-o) is provided, by default this command will overwrite the provided dataset file.
Note
For more information on pandas query style strings, please see the Pandas documentation. Additionally, I’ve found this blog very helpful for performing queries on a dataframe.
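To illustrate what it means for a sample query to slice every enrichment layer, here is a minimal sketch that uses plain pandas DataFrames as a stand-in for the xarray dataset. The layer names, sample ids, and annotation values are all hypothetical.

```python
import pandas as pd

# Hypothetical sample table and two enrichment layers (peptides x samples).
sample_table = pd.DataFrame(
    {"control_status": ["empirical", "library", "empirical"]},
    index=[10, 11, 12],
)
layers = {
    "counts": pd.DataFrame([[4, 9, 1], [0, 7, 3]], columns=sample_table.index),
    "cpm": pd.DataFrame([[0.4, 0.9, 0.1], [0.0, 0.7, 0.3]], columns=sample_table.index),
}

# 1. Apply the query to the sample table only.
kept = sample_table.query("control_status == 'empirical'").index

# 2. Keep the matching sample columns in every enrichment layer.
sliced_layers = {name: layer.loc[:, kept] for name, layer in layers.items()}

print({name: layer.shape for name, layer in sliced_layers.items()})
```

The query never touches the enrichment values directly; it only decides which sample labels survive, and each layer is then subset along the sample axis with those labels.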
phippery query-expression [OPTIONS] EXPRESSION FILENAME
Options
- -d, --dimension <dimension>
- Options:
sample | peptide
- -o, --output <output>
Arguments
- EXPRESSION
Required argument
- FILENAME
Required argument
query-table
Index a dataset using a csv that gives a set of query expressions
This command takes a csv providing a set of queries for samples, peptides, or both, applies each to the respective annotation table in the dataset, and returns the dataset with the slice applied. This means that all enrichment layers get sliced. If no output (-o) is provided, by default this command will overwrite the provided dataset file.
An example table (csv) might look like the following
| dimension | expression |
|---|---|
| sample | "Cohort == 2.0" |
| sample | "technical_replicate_id > 500" |
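As a rough sketch of how such a table could be applied, the snippet below reads the example as csv text and runs each sample expression against a made-up sample table with plain pandas. It assumes a comma-separated file with the two headers shown above and assumes that multiple expressions for one dimension are combined with a logical AND; neither detail is confirmed on this page.

```python
import io

import pandas as pd

# The example table as csv text. The expression column is quoted because
# query strings may contain spaces or commas.
expression_csv = io.StringIO(
    "dimension,expression\n"
    'sample,"Cohort == 2.0"\n'
    'sample,"technical_replicate_id > 500"\n'
)
expression_table = pd.read_csv(expression_csv)

# Hypothetical sample table, made up for illustration.
sample_table = pd.DataFrame(
    {"Cohort": [1.0, 2.0, 2.0, 2.0], "technical_replicate_id": [400, 450, 510, 620]},
    index=[0, 1, 2, 3],
)

# Apply each sample expression in turn (assumed AND semantics).
is_sample = expression_table["dimension"] == "sample"
for expression in expression_table.loc[is_sample, "expression"]:
    sample_table = sample_table.query(expression)

print(sample_table.index.tolist())
```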
Note
For more information on pandas query style strings, please see the Pandas documentation. Additionally, I’ve found this blog very helpful for performing queries on a dataframe.
phippery query-table [OPTIONS] FILENAME EXPRESSION_TABLE
Options
- -o, --output <output>
Arguments
- FILENAME
Required argument
- EXPRESSION_TABLE
Required argument
to-tall-csv
Export the given dataset to a tall style dataframe.
phippery to-tall-csv [OPTIONS] FILENAME
Options
- -o, --output <output>
Required
Arguments
- FILENAME
Required argument
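For readers unfamiliar with the distinction, the difference between a wide and a tall layout can be sketched with pandas `melt`. The column names (`peptide_id`, `sample_id`, `counts`) and the values are hypothetical; the exact columns that to-tall-csv writes are not listed on this page.

```python
import pandas as pd

# Hypothetical wide enrichment matrix: rows are peptides, columns are samples.
wide = pd.DataFrame(
    {10: [4, 0, 2], 11: [9, 7, 5]},
    index=pd.Index([0, 1, 2], name="peptide_id"),
)

# Tall layout: one row per (peptide, sample) pair, value in its own column.
tall = wide.reset_index().melt(
    id_vars="peptide_id", var_name="sample_id", value_name="counts"
)

print(tall.shape)
```

A tall table repeats the identifiers on every row, which makes it larger on disk but convenient for grouping and plotting tools that expect one observation per row.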
to-wide-csv
Export the given dataset to wide style dataframes.
phippery to-wide-csv [OPTIONS] FILENAME
Options
- -o, --output-prefix <output_prefix>
Required
Arguments
- FILENAME
Required argument