Command Line Interface

The CLI is built with the Click library, so both phippery -h and phippery COMMAND -h print the same information provided below.

With the binary dataset output (the default) and the phippery CLI tools installed, we can run a few useful queries to learn a little about the dataset.

$ phippery about output/Pan-CoV-example.phip

The about command prints information about the three primary aspects of a single dataset: Samples, Peptides, and Enrichment Layers. For more about how the data is structured, see the under the hood page. Primarily, it tells you what information is available in the Sample Table, Peptide Table, and Enrichment Layers.

Sample Table:
-------------
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 124 to 540
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   seq_dir              6 non-null      string
 1   library_batch        6 non-null      string
 2   control_status       6 non-null      string
 3   participant_ID       4 non-null      string
 4   patient_status       4 non-null      string
 5   fastq_filename       6 non-null      string
 6   raw-total-sequences  6 non-null      Int64
 7   reads-mapped         6 non-null      Int64
 8   error-rate           6 non-null      Float64
 9   average-quality      6 non-null      Float64
dtypes: Float64(2), Int64(2), string(6)
memory usage: 552.0 bytes

Above we see the sample table of our example dataset. The data type of each annotation feature and its count of non-missing (non-NA) values are provided by default.

As displayed, this dataset contains 6 samples, each with the annotations we fed to the pipeline along with some alignment statistics. While maybe not immediately useful, it’s nice to know which information you have available at any given time – especially after we start slicing or grouping datasets.

Further, you may want more detail about a single annotation column. The about-feature command gives a useful description of the feature-level distribution (for categorical or numeric features), as well as a few example queries to help index the dataset by that annotation feature. Let’s take a look at our reads-mapped annotation feature:

$ phippery about-feature reads-mapped output/Pan-CoV-example.phip

reads-mapped: Integer Feature:
---------------------------

distribution of numerical feature:

count         6.000000
mean     359803.000000
std      283811.764886
min      122878.000000
25%      147733.250000
50%      234885.500000
75%      597263.000000
max      729431.000000
Name: reads-mapped, dtype: float64


Some example query statements:
------------------------------

> "reads-mapped >= 359803"

> "reads-mapped <= 359803"

> "(reads-mapped >= 147733) and (reads-mapped <= 234885)"

Tip

Run phippery -h for a list of possible commands. Additionally, you can run phippery COMMAND -h for descriptions of the options for a specific command.

phippery

Welcome to the phippery CLI!

Here we present a few commands that allow users to slice, transform, normalize, fit models to, and otherwise work with a binary pickle dump’d xarray dataset, usually produced by running the PhIP-Flow pipeline.

For more information and example workflows please refer to the full documentation at https://matsengrp.github.io/phippery/

phippery [OPTIONS] COMMAND [ARGS]...

about

Summarize the data in a given dataset

If no verbosity flag is set, this will print basic information about the number of samples, peptides, and enrichment layers in a given dataset. With a verbosity of one (-v), you will also get information about annotations and available datatypes. With a verbosity of two (-vv), it prints detailed information about all data tables, including annotation feature types and the distribution of enrichment values for each layer. A verbosity of three loops through all features and gives you the detailed description of each.
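The verbosity levels above map onto repeated -v flags. As a rough sketch, the helper below builds the argument list for each level. The name about_command is hypothetical and not part of phippery, and the sketch assumes -v behaves as a standard counting flag, so that three repetitions select the third level. Actually running a command requires phippery on your PATH, so that call is left commented out.

```python
import subprocess


def about_command(filename, verbosity=0):
    """Build the argv for `phippery about` at a given verbosity (0 to 3).

    Hypothetical helper for illustration; assumes -v is a counting flag.
    """
    if not 0 <= verbosity <= 3:
        raise ValueError("verbosity must be between 0 and 3")
    argv = ["phippery", "about"]
    if verbosity:
        argv.append("-" + "v" * verbosity)  # e.g. 2 becomes "-vv"
    argv.append(filename)
    return argv


# Print the command line for each documented verbosity level.
for level in range(4):
    print(" ".join(about_command("output/Pan-CoV-example.phip", level)))

# To actually run one (requires phippery to be installed):
# subprocess.run(about_command("output/Pan-CoV-example.phip", 1), check=True)
```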

phippery about [OPTIONS] FILENAME

Options

-v, --verbose

Arguments

FILENAME

Required argument

about-feature

Summarize details about a specific sample or peptide annotation feature.

The command reports information about a specific feature in your annotation table, depending on its inferred datatype. For numeric feature types, it reports quantiles; for categorical or boolean feature types, it gives individual factor-level counts.

Both datatype categories will print a few examples of valid dataset queries using the feature in question.

phippery about-feature [OPTIONS] FEATURE FILENAME

Options

-d, --dimension <dimension>

The dimension in which we expect to find this feature

Options:

sample | peptide

--distribution, --counts

Force a specific output of either value counts or distribution for quantitative features

Arguments

FEATURE

Required argument

FILENAME

Required argument

load-from-csv

Load and dump an xarray dataset given a set of wide csv’s

Using this command usually means you have either:

  1. Decided to store the output of your analysis in the form of wide csv’s instead of a pickle dump’d binary for longer-term storage.

  2. Created your own enrichment matrix without the help of the phip-flow alignment pipeline.

Note

In the case of #2, please note that your matrix data must be numeric and have shape (len(peptide_table), len(sample_table)). Finally, you must include pipeline outputs

Note

Currently only accepting a single enrichment matrix.
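If you built your own enrichment matrix, it can help to check the shape requirement before loading. The sketch below is a minimal, pure-Python check under the assumption that the matrix has already been read into a list of rows, one row per peptide and one column per sample. The function name check_counts_matrix is hypothetical and not part of phippery.

```python
def check_counts_matrix(matrix, n_peptides, n_samples):
    """Validate a counts matrix given as a list of rows (one row per peptide).

    Raises ValueError if the shape is not (n_peptides, n_samples) or if
    any entry is not numeric. Hypothetical helper, not part of phippery.
    """
    if len(matrix) != n_peptides:
        raise ValueError(f"expected {n_peptides} rows, got {len(matrix)}")
    for i, row in enumerate(matrix):
        if len(row) != n_samples:
            raise ValueError(
                f"row {i}: expected {n_samples} columns, got {len(row)}"
            )
        for value in row:
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"row {i}: non-numeric entry {value!r}")
    return True


# Toy data: three peptides by two samples.
counts = [[10, 0], [3, 7], [0, 25]]
print(check_counts_matrix(counts, n_peptides=3, n_samples=2))
```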

phippery load-from-csv [OPTIONS]

Options

-s, --sample_table <sample_table>

Required Path to sample table csv.

-p, --peptide_table <peptide_table>

Required Path to peptide table csv.

-c, --counts_matrix <counts_matrix>

Required Path to counts matrix csv.

-o, --output <output>

Required Path where the phip dataset will be dump’d to netCDF

merge

phippery merge [OPTIONS] DATASETS

Options

-o, --output <output>

Arguments

DATASETS

Required argument

query-expression

Perform a single pandas-style query expression on dataset samples

This command takes a single string query statement, applies it to the sample table in the dataset, and returns the dataset with the slice applied. This means that all enrichment layers get sliced. If no output (-o) is provided, by default this command will overwrite the provided dataset file.
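To make "all enrichment layers get sliced" concrete, here is a toy sketch in plain Python. It is not how phippery implements the query: the sample table is a dict keyed by sample id, each layer is a list of rows (peptides) with one column per sample, and a Python predicate stands in for the query string. The layer names and values are made up for illustration.

```python
def slice_samples(sample_table, layers, keep):
    """Keep only samples whose annotations satisfy `keep`, in every layer."""
    sample_ids = list(sample_table)
    kept_cols = [
        j for j, sid in enumerate(sample_ids) if keep(sample_table[sid])
    ]
    new_table = {sample_ids[j]: sample_table[sample_ids[j]] for j in kept_cols}
    # The same column selection is applied to each enrichment layer.
    new_layers = {
        name: [[row[j] for j in kept_cols] for row in matrix]
        for name, matrix in layers.items()
    }
    return new_table, new_layers


# Toy dataset: three samples, two peptides, two layers (names are made up).
sample_table = {
    0: {"reads_mapped": 100},
    1: {"reads_mapped": 500},
    2: {"reads_mapped": 900},
}
layers = {
    "counts": [[1, 2, 3], [4, 5, 6]],
    "cpm": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
}

table, sliced = slice_samples(
    sample_table, layers, lambda row: row["reads_mapped"] >= 500
)
print(sorted(table))     # ids of the samples that survived the query
print(sliced["counts"])  # the dropped sample's column is gone here...
print(sliced["cpm"])     # ...and in every other layer
```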

Note

For more information on pandas query-style strings, please see the Pandas documentation. Additionally, I’ve found this blog very helpful for performing queries on a dataframe.

phippery query-expression [OPTIONS] EXPRESSION FILENAME

Options

-d, --dimension <dimension>

Options:

sample | peptide

-o, --output <output>

Arguments

EXPRESSION

Required argument

FILENAME

Required argument

query-table

Index a dataset using a csv that gives a set of query expressions

This command takes a csv providing a set of queries for samples, peptides, or both, applies each to the respective annotation table in the dataset, and returns the dataset with the slice applied. This means that all enrichment layers get sliced. If no output (-o) is provided, by default this command will overwrite the provided dataset file.

An example table (csv) might look like the following:

Example

dimension   expression
---------   ----------
sample      "Cohort == 2.0"
sample      "technical_replicate_id > 500"
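One way to produce such a table is with Python's csv module. The sketch below writes the two example queries under the dimension and expression column headers shown above. Whether the expressions need to be wrapped in quotes inside the cell is not stated here, so this sketch writes them unquoted as an assumption. It writes to an in-memory buffer; to create a real file, open one with newline="" and pass it to csv.writer instead.

```python
import csv
import io

# (dimension, expression) pairs taken from the example table.
queries = [
    ("sample", "Cohort == 2.0"),
    ("sample", "technical_replicate_id > 500"),
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["dimension", "expression"])  # header row
writer.writerows(queries)

table_text = buffer.getvalue()
print(table_text)
```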

Note

For more information on pandas query-style strings, please see the Pandas documentation. Additionally, I’ve found this blog very helpful for performing queries on a dataframe.

phippery query-table [OPTIONS] FILENAME EXPRESSION_TABLE

Options

-o, --output <output>

Arguments

FILENAME

Required argument

EXPRESSION_TABLE

Required argument

to-tall-csv

Export the given dataset to a tall style dataframe.
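If the tall versus wide distinction is unfamiliar: a wide layout stores one matrix with peptides as rows and samples as columns, while a tall layout typically stores one row per (peptide, sample) pair. The sketch below shows that reshaping in plain Python. The column names peptide_id, sample_id, and counts are hypothetical and may not match what phippery writes, and the real export may carry additional annotation columns.

```python
def wide_to_tall(matrix, peptide_ids, sample_ids, value_name="counts"):
    """Flatten a peptides-by-samples matrix into one record per pair."""
    rows = []
    for i, pid in enumerate(peptide_ids):
        for j, sid in enumerate(sample_ids):
            rows.append(
                {"peptide_id": pid, "sample_id": sid, value_name: matrix[i][j]}
            )
    return rows


# Toy wide matrix: two peptides (rows) by two samples (columns).
wide = [[1, 2], [3, 4]]
tall_rows = wide_to_tall(wide, ["p0", "p1"], ["s0", "s1"])
for record in tall_rows:
    print(record)
```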

phippery to-tall-csv [OPTIONS] FILENAME

Options

-o, --output <output>

Required

Arguments

FILENAME

Required argument

to-wide-csv

Export the given dataset to wide style dataframes.

phippery to-wide-csv [OPTIONS] FILENAME

Options

-o, --output-prefix <output_prefix>

Required

Arguments

FILENAME

Required argument