Command line interface

tdms

Train and evaluate neural networks on deep mutational scanning data.

tdms [OPTIONS] COMMAND [ARGS]...

Options

-v, --version

Print version and exit. Following git describe conventions, the commit SHA in the version string is prefixed with a g.
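As an illustration of that convention, the following Python sketch splits a git-describe-style string of the form TAG-N-gSHA and drops the g from the SHA. It assumes the printed version uses that exact form (the string tdms prints is not shown here), and the helper name parse_describe is hypothetical.

```python
import re

# git describe output: <tag>-<commits since tag>-g<abbreviated SHA>.
# The greedy tag group still lets tags that contain hyphens match.
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(version):
    """Split a git-describe-style version string (hypothetical helper)."""
    version = version.strip()
    match = DESCRIBE_RE.match(version)
    if match is None:
        # Exactly on a tagged commit, git describe prints only the tag.
        return {"tag": version, "distance": 0, "sha": None}
    return {
        "tag": match.group("tag"),
        "distance": int(match.group("distance")),
        "sha": match.group("sha"),  # the leading "g" is not part of the SHA
    }


print(parse_describe("0.1.0-12-g3f2c9ab"))
# {'tag': '0.1.0', 'distance': 12, 'sha': '3f2c9ab'}
```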

beta

Plot beta coefficients as a heatmap.

tdms beta [OPTIONS] MODEL_PATH DATA_PATH

Options

--out <out>

Required

--config <config>

Read configuration from FILE.

Arguments

MODEL_PATH

Required argument

DATA_PATH

Required argument

cartesian

Take the Cartesian product of the variable options in a config file, and write the results to an _output directory.
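The JSON layout expected in CHOICE_JSON_PATH is not described here. As a sketch only, assuming a hypothetical layout in which each key is an option name and each value is either a list of alternatives or a single fixed value, the Cartesian product can be expanded with itertools.product. The helper expand_choices is hypothetical.

```python
import itertools
import json


def expand_choices(choices):
    """Return one config dict per combination of option values (hypothetical layout)."""
    keys = list(choices)
    # Wrap scalars in a one-element list so they pass through the product unchanged.
    value_lists = [v if isinstance(v, list) else [v] for v in choices.values()]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


choice_json = '{"learning_rate": [0.001, 0.01], "batch_size": [500, 1000], "epochs": 100}'
configs = expand_choices(json.loads(choice_json))
print(len(configs))  # 4
for config in configs:
    print(config)
```

Each resulting dictionary corresponds to one combination; how tdms names the entries it writes under _output is not specified here.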

tdms cartesian [OPTIONS] CHOICE_JSON_PATH

Arguments

CHOICE_JSON_PATH

Required argument

create

Create a model.

See the documentation for each model for an example model string.

tdms create [OPTIONS] DATA_PATH OUT_PATH MODEL_STRING

Options

--monotonic <monotonic>

If this option is used, the model is initialized with weights greater than zero, and during training tdms puts a floor of 0 on all non-bias weights. tdms also multiplies the output by the value provided as this option's argument: use -1.0 for a monotonically decreasing nonlinearity, or 1.0 for an increasing one.
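To see why these constraints yield a monotonic nonlinearity, consider a toy model (not the tdms architecture) built as a weighted sum of sigmoids. With the non-bias weights floored at 0, the sum is non-decreasing in its input, and multiplying by the --monotonic value fixes the direction. The functions floor_at_zero and toy_nonlinearity are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def floor_at_zero(weights):
    """Apply a floor of 0, as --monotonic does for non-bias weights during training."""
    return [max(w, 0.0) for w in weights]


def toy_nonlinearity(z, slopes, heights, offsets, direction):
    """Weighted sum of sigmoids; non-decreasing in z when slopes and heights are >= 0.

    offsets are bias terms and are left unconstrained. direction stands in for
    the --monotonic argument: 1.0 for increasing, -1.0 for decreasing.
    """
    total = sum(h * sigmoid(s * z + o) for s, h, o in zip(slopes, heights, offsets))
    return direction * total


# Suppose an optimizer step pushed one slope negative; the floor restores the constraint.
slopes = floor_at_zero([0.8, -0.3])  # [0.8, 0.0]
heights = floor_at_zero([1.5, 2.0])
offsets = [0.0, -1.0]
values = [toy_nonlinearity(z, slopes, heights, offsets, -1.0) for z in (-2.0, 0.0, 2.0)]
print(values)  # strictly decreasing, because direction is -1.0
```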

--beta-l1-coefficients <beta_l1_coefficients>

Coefficients with which to L1-regularize beta coefficients: a comma-separated list with one coefficient for each latent dimension.

--interaction-l1-coefficients <interaction_l1_coefficients>

Coefficients with which to L1-regularize site interaction weights: a comma-separated list with one coefficient for each latent dimension.
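Both options take one coefficient per latent dimension. A minimal sketch of how such a list could be parsed and turned into an L1 penalty, assuming the penalty for each dimension is its coefficient times the sum of absolute weights in that dimension. The helpers parse_coefficients and l1_penalty are hypothetical.

```python
def parse_coefficients(arg):
    """Parse a comma-separated list such as "0.1,0.01" into floats."""
    return [float(c) for c in arg.split(",")]


def l1_penalty(coefficients, weights_per_dim):
    """Sum over latent dimensions of coefficient * sum(|weight|)."""
    if len(coefficients) != len(weights_per_dim):
        raise ValueError("expected one coefficient per latent dimension")
    return sum(
        c * sum(abs(w) for w in weights)
        for c, weights in zip(coefficients, weights_per_dim)
    )


coefficients = parse_coefficients("0.1,0.01")
weights = [[1.0, -2.0, 0.5], [4.0, -1.0]]  # two latent dimensions
penalty = l1_penalty(coefficients, weights)
print(round(penalty, 10))  # 0.4, i.e. 0.1 * 3.5 + 0.01 * 5.0
```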

--non-lin-bias, --no-non-lin-bias

--output-bias, --no-output-bias

--seed <seed>

Set random seed. Seed is uninitialized if not set.

--config <config>

Read configuration from FILE.

Arguments

DATA_PATH

Required argument

OUT_PATH

Required argument

MODEL_STRING

Required argument

error

Evaluate a model and produce a plot of its error.

tdms error [OPTIONS] MODEL_PATH DATA_PATH

Options

--out <out>

Required

--show-points

Show points in addition to LOWESS curves.

--device <device>

--include-details

Include details from config file in error summary.

--config <config>

Read configuration from FILE.

Arguments

MODEL_PATH

Required argument

DATA_PATH

Required argument

evaluate

Evaluate the performance of a model.

Dump a dictionary containing the results.

tdms evaluate [OPTIONS] MODEL_PATH DATA_PATH

Options

--out <out>

Required

--device <device>

--config <config>

Read configuration from FILE.

Arguments

MODEL_PATH

Required argument

DATA_PATH

Required argument

geplot

Make a “global epistasis” plot showing the fit to the nonlinearity.

tdms geplot [OPTIONS] MODEL_PATH DATA_PATH

Options

--steps <steps>

Default

100

--out <out>

Required

--device <device>

--config <config>

Read configuration from FILE.

Arguments

MODEL_PATH

Required argument

DATA_PATH

Required argument

go

Run a common sequence of commands: create, train, scatter, and beta.

Then touch a .sentinel file to signal successful completion.
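The sentinel pattern lets a workflow manager treat the presence of the file as evidence that every step finished. A generic Python sketch of that pattern follows; the step callables are placeholders rather than the actual tdms commands, and the helper run_with_sentinel and the sentinel location are hypothetical.

```python
import tempfile
from pathlib import Path


def run_with_sentinel(steps, sentinel):
    """Run steps in order, then touch the sentinel only if all of them succeeded."""
    for name, step in steps:
        print(f"running {name}")
        step()  # an exception here propagates, so the sentinel is never created
    Path(sentinel).touch()


completed = []
steps = [(name, lambda name=name: completed.append(name))
         for name in ("create", "train", "scatter", "beta")]
with tempfile.TemporaryDirectory() as tmp:
    sentinel_path = Path(tmp) / ".sentinel"
    run_with_sentinel(steps, sentinel_path)
    sentinel_created = sentinel_path.exists()
print(completed, sentinel_created)
```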

tdms go [OPTIONS]

Options

--config <config>

Read configuration from FILE.

heatmap

Plot single mutant predictions as a heatmap.

Note/warning: because of the way we have set up the encoding, the heatmap values cannot be interpreted in a straightforward way.

tdms heatmap [OPTIONS] MODEL_PATH

Options

--out <out>

Required

--config <config>

Read configuration from FILE.

Arguments

MODEL_PATH

Required argument

prep

Prepare data for training.

IN_PATH should point to a pickled pandas DataFrame containing the string-encoded aa_substitutions column along with any TARGETS you specify. OUT_PREFIX is the location to which the prepared data are written, as another pickle file.

tdms prep [OPTIONS] IN_PATH OUT_PREFIX TARGETS...

Options

--per-stratum-variants-for-test <per_stratum_variants_for_test>

Number of variants per stratum to hold out for testing; the same number is held out for validation. The remaining examples are used for training the model.

Default

100

--skip-stratum-if-count-is-smaller-than <skip_stratum_if_count_is_smaller_than>

If the total number of examples in a stratum is smaller than this number, the stratum is discarded entirely.

Default

250
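The two stratum options interact as follows. A minimal Python sketch, assuming variants have already been assigned some stratum key (what defines a stratum in tdms is not stated here) and that the split is random within each stratum. The function split_by_stratum and the toy data are hypothetical; seed=None mirrors the "uninitialized if not set" behavior of --seed.

```python
import random
from collections import defaultdict


def split_by_stratum(variants, stratum_of, per_stratum_test=100, min_count=250, seed=None):
    """Hypothetical per-stratum train/validation/test split.

    Strata with fewer than min_count examples are discarded. From each remaining
    stratum, per_stratum_test variants go to test, the same number to validation,
    and the rest to training.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for variant in variants:
        strata[stratum_of(variant)].append(variant)

    train, validation, test = [], [], []
    for members in strata.values():
        if len(members) < min_count:
            continue  # throw out the whole stratum
        members = list(members)  # copy before shuffling
        rng.shuffle(members)
        test.extend(members[:per_stratum_test])
        validation.extend(members[per_stratum_test:2 * per_stratum_test])
        train.extend(members[2 * per_stratum_test:])
    return train, validation, test


# Made-up data: 300 variants in stratum "a", 200 in stratum "b".
variants = [("a", i) for i in range(300)] + [("b", i) for i in range(200)]
train, validation, test = split_by_stratum(variants, stratum_of=lambda v: v[0], seed=0)
print(len(train), len(validation), len(test))  # 100 100 100 (stratum "b" is dropped)
```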

--drop-nans

Drop all rows that contain a NaN.

--export-dataframe <export_dataframe>

Filename prefix for exporting the original DataFrame, with an appended in_test column, to a .pkl file.

--partition-by <partition_by>

Name of a column by which the data should be split into independent datasets for partitioning, e.g. ‘library’.

--train-on-all-single-mutants

Place all single mutants in the training set.

--dry-run

Only print paths and files to be made, rather than actually making them.

--seed <seed>

Set random seed. Seed is uninitialized if not set.

--config <config>

Read configuration from FILE.

Arguments

IN_PATH

Required argument

OUT_PREFIX

Required argument

TARGETS

Required argument(s)

profiles

Plot amino acid and site profiles from low-rank approximation.

tdms profiles [OPTIONS] MODEL_PATH DATA_PATH

Options

--out <out>

Required

--config <config>

Read configuration from FILE.

Arguments

MODEL_PATH

Required argument

DATA_PATH

Required argument

scatter

Evaluate a model and produce a scatter plot of observed vs. predicted targets on the provided test set.

tdms scatter [OPTIONS] MODEL_PATH DATA_PATH

Options

--out <out>

Required

--device <device>

--config <config>

Read configuration from FILE.

Arguments

MODEL_PATH

Required argument

DATA_PATH

Required argument

summarize

Report various summaries of the data.

tdms summarize [OPTIONS] DATA_PATH

Options

--out-prefix <out_prefix>

If set, make PDF plots summarizing the data, using this value as the output prefix.

--config <config>

Read configuration from FILE.

Arguments

DATA_PATH

Required argument

svd

Plot singular values of beta matrices.

tdms svd [OPTIONS] MODEL_PATH DATA_PATH

Options

--out <out>

Required

--config <config>

Read configuration from FILE.

Arguments

MODEL_PATH

Required argument

DATA_PATH

Required argument

train

Train a model, saving the trained model to its original location.

tdms train [OPTIONS] MODEL_PATH DATA_PATH

Options

--loss-fn <loss_fn>

Loss function for training.

Default

l1

--loss-weight-span <loss_weight_span>

If this option is used, weight the mean-absolute-deviation loss for each variant by the exponential of a loss decay times the true score.
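A minimal sketch of such a weighted loss in Python, under two assumptions: the loss decay is passed in directly (how tdms derives it from --loss-weight-span is not described here), and the weighted absolute errors are simply averaged. The function weighted_mad_loss is hypothetical.

```python
import math


def weighted_mad_loss(y_true, y_pred, loss_decay):
    """Mean absolute deviation where each error is weighted by exp(loss_decay * y_true)."""
    terms = [math.exp(loss_decay * t) * abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


y_true = [0.0, -2.0]
y_pred = [0.5, -1.5]
print(weighted_mad_loss(y_true, y_pred, loss_decay=0.0))  # 0.5, i.e. the plain MAD
print(weighted_mad_loss(y_true, y_pred, loss_decay=1.0))  # smaller: the low-score error is down-weighted
```

With a positive decay, errors on variants with higher true scores receive larger weights; a decay of 0 recovers the unweighted loss.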

--batch-size <batch_size>

Batch size for training.

Default

500

--learning-rate <learning_rate>

Initial learning rate.

Default

0.001

--min-lr <min_lr>

Minimum learning rate; once the learning rate drops below this value, training stops early.

Default

1e-05

--patience <patience>

Patience for ReduceLROnPlateau: the number of epochs with no improvement after which the learning rate is reduced.

Default

10
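The learning-rate options work together: the rate is cut when validation loss plateaus, and training ends once it falls below --min-lr. The class below is a simplified pure-Python stand-in, not the PyTorch scheduler tdms uses. Its defaults mirror the tdms defaults above, while the reduction factor of 0.1 (PyTorch's default) and the exact stopping rule are assumptions. The class name PlateauScheduler is hypothetical.

```python
class PlateauScheduler:
    """Simplified stand-in for ReduceLROnPlateau with min-lr early stopping.

    Assumptions: the learning rate is multiplied by factor after more than
    `patience` epochs without improvement, and training stops once the
    learning rate falls below min_lr.
    """

    def __init__(self, lr=1e-3, patience=10, min_lr=1e-5, factor=0.1):
        self.lr = lr
        self.patience = patience
        self.min_lr = min_lr
        self.factor = factor
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return False when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr >= self.min_lr


scheduler = PlateauScheduler(patience=2)
losses = [1.0, 0.9] + [0.9] * 20  # improvement stalls after the second epoch
for epoch, loss in enumerate(losses):
    if not scheduler.step(loss):
        print(f"stopping after epoch {epoch}, lr={scheduler.lr:.1e}")
        break
```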

--device <device>

Device used to train the neural network.

Default

cpu

--independent-starts <independent_starts>

Number of independent training starts. Each start is trained independently, and the best one is used for full training.

Default

5

--independent-start-epochs <independent_start_epochs>

Number of epochs to train each independent start. If not set, 10% of the full number of epochs is used.
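The independent-start procedure can be sketched as "train several candidates briefly, keep the best". In this Python sketch, train_start is a hypothetical callable taking (seed, epochs) and returning (model, validation_loss); the per-start seeds and the rounding of the 10% default are assumptions, while the defaults of 5 starts and 100 full epochs mirror the options above.

```python
import random


def choose_best_start(train_start, n_starts=5, start_epochs=None, full_epochs=100):
    """Briefly train n_starts candidates and return (model, loss) for the best one."""
    if start_epochs is None:
        # 10% of the full number of epochs; rounding down (minimum 1) is an assumption.
        start_epochs = max(1, full_epochs // 10)
    candidates = [train_start(seed, start_epochs) for seed in range(n_starts)]
    return min(candidates, key=lambda candidate: candidate[1])


def fake_train_start(seed, epochs):
    """Stand-in trainer: returns a label and a pseudo-random loss."""
    rng = random.Random(seed)
    return f"model-{seed}", rng.random() / epochs


best_model, best_loss = choose_best_start(fake_train_start)
print(best_model, best_loss)
# best_model would then be trained for the full number of epochs.
```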

--simple-training

Ignore all fancy training options and do bare-bones training for a fixed number of epochs. Fail if the data contain NaNs.

--exp-target <exp_target>

Base to be exponentiated by the functional scores of variants. This emphasizes fitting highly functional variants. If set, weight decay is turned off.
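A small Python sketch of why exponentiating emphasizes highly functional variants, assuming the transform is applied to the target scores before fitting. The helper exp_transform is hypothetical.

```python
def exp_transform(scores, base):
    """Map each functional score s to base ** s (hypothetical helper)."""
    return [base ** s for s in scores]


scores = [-2.0, 0.0, 3.0]
print(exp_transform(scores, base=2.0))  # [0.25, 1.0, 8.0]
```

After the transform, scores 0 and -2 end up 0.75 apart while scores 0 and 3 end up 7 apart, so a fit to the transformed targets is pulled toward the highly functional variants.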

--beta-rank <beta_rank>

Number of dimensions to use in the low-rank reconstructions of betas.

--epochs <epochs>

Number of epochs for full training.

Default

100

--site-path <site_path>

Path to a .json file containing site numbers.

--dry-run

Only print paths and files to be made, rather than actually making them.

--seed <seed>

Set random seed. Seed is uninitialized if not set.

--config <config>

Read configuration from FILE.

Arguments

MODEL_PATH

Required argument

DATA_PATH

Required argument

transfer

Transfer beta coefficients from one tdms model to another.

tdms transfer [OPTIONS] SOURCE_PATH DEST_PATH

Arguments

SOURCE_PATH

Required argument

DEST_PATH

Required argument

validate

Validate that a given data set is sane.

tdms validate [OPTIONS] DATA_PATH

Arguments

DATA_PATH

Required argument