Escape Profile

Uses the optimal transport method to compare Phage-DMS escape profiles. See the escape profile description and examples.

phippery.escprof.compute_sim_score(a, b, cost_matrix)

Returns the similarity score given two distributions and the cost matrix.

Parameters:

a (list) – A distribution of the relative contribution of each amino acid to scaled differential selection.
b (list) – Another distribution of the relative contribution of each amino acid to scaled differential selection.
cost_matrix (list) – The cost matrix to evaluate optimal transport from a to b.

Returns:

The similarity score (reciprocal of the optimal transport cost).

Return type:

float
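
To make "reciprocal of the optimal transport cost" concrete, here is a minimal pure-Python sketch. It is not phippery's implementation: `transport_cost` and `sim_score_sketch` are hypothetical names, the solver is a textbook successive-shortest-path min-cost flow chosen only to keep the example dependency-free, and returning infinity for a zero cost is this sketch's own choice.

```python
def transport_cost(a, b, cost):
    """Exact optimal transport cost between distributions a and b (equal total mass)."""
    n, m = len(a), len(b)
    s, t = n + m, n + m + 1  # source and sink node ids
    edges = []               # [to, residual capacity, cost]; edge k pairs with edge k ^ 1
    graph = [[] for _ in range(n + m + 2)]

    def add_edge(u, v, cap, w):
        graph[u].append(len(edges))
        edges.append([v, cap, w])
        graph[v].append(len(edges))
        edges.append([u, 0.0, -w])

    for i in range(n):
        add_edge(s, i, a[i], 0.0)
        for j in range(m):
            add_edge(i, n + j, float("inf"), cost[i][j])
    for j in range(m):
        add_edge(n + j, t, b[j], 0.0)

    total, eps, inf = 0.0, 1e-12, float("inf")
    while True:
        # Bellman-Ford: cheapest s -> t path in the residual graph.
        dist = [inf] * (n + m + 2)
        prev = [-1] * (n + m + 2)
        dist[s] = 0.0
        for _ in range(n + m + 1):
            changed = False
            for u in range(n + m + 2):
                if dist[u] == inf:
                    continue
                for k in graph[u]:
                    v, cap, w = edges[k]
                    if cap > eps and dist[u] + w < dist[v] - eps:
                        dist[v] = dist[u] + w
                        prev[v] = k
                        changed = True
            if not changed:
                break
        if dist[t] == inf:
            break  # no augmenting path left: all mass has been moved
        # Find the bottleneck along the path, then push that much mass.
        push, v = inf, t
        while v != s:
            k = prev[v]
            push = min(push, edges[k][1])
            v = edges[k ^ 1][0]
        v = t
        while v != s:
            k = prev[v]
            edges[k][1] -= push
            edges[k ^ 1][1] += push
            v = edges[k ^ 1][0]
        total += push * dist[t]
    return total


def sim_score_sketch(a, b, cost_matrix):
    """Reciprocal of the transport cost (infinite for identical inputs: sketch's choice)."""
    c = transport_cost(a, b, cost_matrix)
    return float("inf") if c == 0 else 1.0 / c


# Moving 0.25 of the mass across a unit-cost step costs 0.25, so the score is 4.0.
score = sim_score_sketch([0.5, 0.5], [0.25, 0.75], [[0, 1], [1, 0]])
```

A larger score means less mass had to be moved, or moved more cheaply, to turn one distribution into the other.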

phippery.escprof.get_cost_matrix()

Returns the default 40x40 cost matrix. Costs are based on BLOSUM62, and the maximum cost is assigned to transport between differential selection contributions of opposite sign.

Returns:

A 40x40 matrix (as a list of lists).

Return type:

list
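
The block structure described above can be sketched as follows. This is an illustration only: `build_signed_cost_matrix` is a hypothetical helper, the 20x20 `placeholder_cost` is not derived from BLOSUM62, and the layout (indices 0-19 for negative selection, 20-39 for positive selection) and the choice of "maximum cost" as the largest entry of the 20x20 block are assumptions.

```python
def build_signed_cost_matrix(base_cost):
    """Expand an n x n amino acid cost block into a 2n x 2n signed cost matrix.

    Same-sign pairs reuse base_cost; opposite-sign pairs get the largest
    entry of base_cost (an assumption about what "maximum cost" means).
    """
    n = len(base_cost)
    max_cost = max(max(row) for row in base_cost)
    matrix = []
    for i in range(2 * n):
        row = []
        for j in range(2 * n):
            same_sign = (i < n) == (j < n)
            row.append(base_cost[i % n][j % n] if same_sign else max_cost)
        matrix.append(row)
    return matrix


# Placeholder 20x20 costs (NOT BLOSUM62 values): cost grows with index distance.
placeholder_cost = [[abs(i - j) for j in range(20)] for i in range(20)]
cost_matrix_sketch = build_signed_cost_matrix(placeholder_cost)
```

With this layout, moving mass between a binding-loss entry and a binding-gain entry is never cheaper than any move within the same sign.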

phippery.escprof.get_loc_esc_distr(ds, metric, sample_factor, sfact_val, loc)

Returns the normalized distribution, as a list, of the amino acid pattern of scaled differential selection for a specified individual at a specified amino acid site.

Parameters:

ds (xarray.Dataset) – The dataset containing the sample of interest.
metric (str) – The name of the scaled differential selection data in ds.
sample_factor (str) – The sample annotation label to identify the individual sample (e.g. ‘sample_ID’).
sfact_val (str) – The sample_factor value to identify the sample of interest.
loc (int) – The location number for the amino acid site of interest.

Returns:

The relative contributions to the total absolute scaled differential selection at the site. The first 20 entries are contributions to negative selection (binding loss). The last 20 entries are contributions to positive selection (binding gain).

Return type:

list
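
The returned layout can be illustrated with a small sketch that starts from the 20 scaled differential selection values at one site. It is not the library's code: `site_escape_distribution` is a hypothetical name, the input is assumed to be already in the library's amino acid order, and raising an error for an all-zero site is this sketch's own choice.

```python
def site_escape_distribution(site_values):
    """Turn per-amino-acid signed values at one site into a normalized distribution.

    The first len(site_values) entries hold the negative (binding loss)
    contributions and the remaining entries hold the positive (binding gain)
    contributions, each divided by the total absolute value at the site.
    """
    total = sum(abs(v) for v in site_values)
    if total == 0:
        raise ValueError("site has no differential selection to normalize")
    negative = [-v / total if v < 0 else 0.0 for v in site_values]
    positive = [v / total if v > 0 else 0.0 for v in site_values]
    return negative + positive


# One amino acid loses binding (-1.0), another gains binding (+3.0).
values = [-1.0, 3.0] + [0.0] * 18
distribution = site_escape_distribution(values)
```

Here `distribution[0]` is 0.25 (binding loss for the first amino acid) and `distribution[21]` is 0.75 (binding gain for the second), and all 40 entries sum to 1.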

phippery.escprof.loc_sim_score(ds, metric, cost_matrix, sample_factor, sfact_val1, sfact_val2, loc)

Returns the similarity score comparing two samples at a single amino acid site.

Parameters:

ds (xarray.Dataset) – The dataset containing the samples of interest.
metric (str) – The name of the scaled differential selection data in ds.
cost_matrix (list) – The cost matrix to evaluate optimal transport between two distributions.
sample_factor (str) – The sample annotation label to identify the samples (e.g. ‘sample_ID’).
sfact_val1 (str) – The sample_factor value to identify sample 1.
sfact_val2 (str) – The sample_factor value to identify sample 2.
loc (int) – The location number for the amino acid site of interest.

Returns:

The similarity score at the amino acid site.

Return type:

float

phippery.escprof.region_sim_score(ds, metric, cost_matrix, sample_factor, sfact_val1, sfact_val2, loc_start, loc_end)

Returns the similarity score comparing two samples over the region [loc_start, loc_end], with both end sites included.

Parameters:

ds (xarray.Dataset) – The dataset containing the samples of interest.
metric (str) – The name of the scaled differential selection data in ds.
cost_matrix (list) – The cost matrix to evaluate optimal transport between two distributions.
sample_factor (str) – The sample annotation label to identify the samples (e.g. ‘sample_ID’).
sfact_val1 (str) – The sample_factor value to identify sample 1.
sfact_val2 (str) – The sample_factor value to identify sample 2.
loc_start (int) – The location number for the first amino acid site in the region of interest.
loc_end (int) – The location number for the last amino acid site in the region of interest.

Returns:

The similarity score for the region.

Return type:

float