gctree.mutation_model

Mutation models.

Classes

MutationModel

A class for a mutation model, and functions to mutate sequences.

class gctree.mutation_model.MutationModel(mutability_file=None, substitution_file=None, mutation_order=True, with_replacement=True)

A class for a mutation model, and functions to mutate sequences.

Parameters:
  • mutability_file (Optional[str]) – path to a file of S5F-format mutabilities

  • substitution_file (Optional[str]) – path to a file of S5F-format substitution biases

  • mutation_order (bool) – whether to mutate sequences in a context-sensitive manner, where the order of mutations matters

  • with_replacement (bool) – whether the same position may mutate multiple times on a single branch

Notes

mutability_file shall be a CSV file with fivemers in the first column and mutability scores in the second column. An example can be found at https://bitbucket.org/kleinstein/shazam/src/master/data-raw/HS5F_Mutability.csv

For example:

Fivemer,Mutability,...
TCGGG,0.03542,...
GCCGG,0.02241675,...
GCCGC,0.06789,...
...
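The following standard-library sketch reads a file laid out like the example above into a dictionary mapping each fivemer to its mutability. The helper `read_mutabilities` is hypothetical (it is not part of gctree) and assumes the comma-separated layout shown, with a header row and any columns after the second ignored.

```python
import csv
import io
from typing import Dict, TextIO


def read_mutabilities(handle: TextIO) -> Dict[str, float]:
    """Hypothetical helper: map each fivemer to its mutability score.

    Assumes a header row, the fivemer in column 0 and the mutability in
    column 1; any extra columns are ignored.
    """
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return {row[0]: float(row[1]) for row in reader if row}


# The rows from the example above, with the trailing columns dropped.
example = io.StringIO(
    "Fivemer,Mutability\n"
    "TCGGG,0.03542\n"
    "GCCGG,0.02241675\n"
    "GCCGC,0.06789\n"
)
mutabilities = read_mutabilities(example)
print(mutabilities["GCCGC"])  # 0.06789
```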

substitution_file shall be a CSV file with fivemers in the first column and, in the next four columns, the probabilities of targeting bases A, C, G, and T, respectively. An example can be found at https://bitbucket.org/kleinstein/shazam/src/master/data-raw/HS5F_Substitution.csv

For example:

Fivemer,A,C,G,T,...
AAAAA,0,0.33,0.33,0.34,...
AAAAC,0,0.5000,0.2500,0.2500,...
AAAAG,0,0.65,0.15,0.20,...
...
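A similar hypothetical helper, `read_substitutions`, can load the substitution biases. In the example rows, the entry for the central base itself is 0 and the remaining three probabilities sum to 1; the sketch checks both properties for each row it reads. As before, it assumes the comma-separated layout shown and is not gctree's own parser.

```python
import csv
import io
import math
from typing import Dict, TextIO


def read_substitutions(handle: TextIO) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: map each fivemer to target-base probabilities.

    Assumes a header row, the fivemer in column 0 and the probabilities for
    A, C, G and T in columns 1-4; any extra columns are ignored.
    """
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    table = {}
    for row in reader:
        if not row:
            continue
        table[row[0]] = {base: float(p) for base, p in zip("ACGT", row[1:5])}
    return table


# The rows from the example above, with the trailing columns dropped.
example = io.StringIO(
    "Fivemer,A,C,G,T\n"
    "AAAAA,0,0.33,0.33,0.34\n"
    "AAAAC,0,0.5000,0.2500,0.2500\n"
    "AAAAG,0,0.65,0.15,0.20\n"
)
substitutions = read_substitutions(example)

for fivemer, probs in substitutions.items():
    # The entry for the central base (index 2 of the fivemer) is zero ...
    assert probs[fivemer[2]] == 0
    # ... and the remaining targeting probabilities sum to one.
    assert math.isclose(sum(probs.values()), 1.0)
```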
__init__(mutability_file=None, substitution_file=None, mutation_order=True, with_replacement=True)
mutability(kmer)

Returns the mutability of the central base of a \(k\)-mer, along with its nucleotide substitution biases, averaging over ambiguous "N" nucleotide identities.

Parameters:

kmer (str) – nucleotide \(k\)-mer

Return type:

Tuple[float64, float64]
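One way to average over ambiguous "N" identities is to expand each N into A, C, G and T and take the plain mean over every resulting fivemer. The sketch below does this with toy lookup tables; the helpers `disambiguate` and `average_mutability` are hypothetical, the equal-weight averaging is an assumption rather than a description of gctree's internals, and the averaged biases are returned here as a per-base dictionary.

```python
import itertools
from statistics import mean
from typing import Dict, List, Tuple


def disambiguate(kmer: str) -> List[str]:
    """Hypothetical helper: expand each "N" into A, C, G and T."""
    choices = ["ACGT" if base == "N" else base for base in kmer]
    return ["".join(bases) for bases in itertools.product(*choices)]


def average_mutability(
    kmer: str,
    mutability: Dict[str, float],
    substitution: Dict[str, Dict[str, float]],
) -> Tuple[float, Dict[str, float]]:
    """Equal-weight average of mutability and biases over every resolution."""
    resolved = disambiguate(kmer)
    avg_mutability = mean(mutability[k] for k in resolved)
    avg_biases = {b: mean(substitution[k][b] for k in resolved) for b in "ACGT"}
    return avg_mutability, avg_biases


# Toy tables covering the four resolutions of "NACGT" (values are made up).
toy_mutability = {"AACGT": 0.125, "CACGT": 0.25, "GACGT": 0.375, "TACGT": 0.5}
toy_substitution = {
    k: {"A": 0.5, "C": 0.0, "G": 0.25, "T": 0.25} for k in toy_mutability
}
mut, biases = average_mutability("NACGT", toy_mutability, toy_substitution)
print(mut)  # 0.3125
```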

mutabilities(sequence)

Returns the mutability at each site of a sequence, along with the nucleotide substitution biases at that site.

Parameters:

sequence (str) – nucleotide sequence

Return type:

List[Tuple[float64, float64]]
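Per-site values need a fivemer centered on every position, including the two sites at each end. A minimal sketch, assuming the ends are padded with "N" so that edge sites can fall back on the N-averaging described for mutability(); `centered_kmers` is a hypothetical helper, not a gctree function. Each returned k-mer could then be passed to a per-k-mer lookup such as mutability().

```python
from typing import List


def centered_kmers(sequence: str, k: int = 5) -> List[str]:
    """Hypothetical helper: one k-mer centered on each site of sequence.

    The ends are padded with "N" (an assumption) so that the first and last
    sites also get a full-length k-mer.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so each k-mer has a central base")
    pad = "N" * (k // 2)
    padded = pad + sequence + pad
    # Window i is centered on original site i.
    return [padded[i : i + k] for i in range(len(sequence))]


print(centered_kmers("ACGTAC"))
# ['NNACG', 'NACGT', 'ACGTA', 'CGTAC', 'GTACN', 'TACNN']
```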

mutate(sequence, lambda0=1, frame=None)

Mutate a sequence, using lambda0 as the baseline mutation rate. Cannot mutate the same position multiple times.

Parameters:
  • sequence (str) – nucleotide sequence to mutate

  • lambda0 (float64) – a baseline mutation rate

  • frame (Optional[int]) – the reading frame of the first position

Return type:

str
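The sketch below illustrates one way such a mutation step can work, under stated assumptions: the number of mutations is Poisson with rate equal to the mean per-site mutability times lambda0, positions are drawn in proportion to their mutability, and target bases are drawn from the per-site substitution biases. `mutate_sketch` and `poisson` are hypothetical helpers, the per-site rates are kept fixed rather than recomputed after each mutation, and the reading-frame check implied by frame is omitted.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

SiteRate = Tuple[float, Dict[str, float]]


def poisson(rate: float, rng: random.Random) -> int:
    """Draw from Poisson(rate) with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def mutate_sketch(
    sequence: str,
    site_rates: List[SiteRate],
    lambda0: float = 1.0,
    with_replacement: bool = False,
    rng: Optional[random.Random] = None,
) -> str:
    """Hypothetical Poisson mutation step (not gctree's implementation).

    site_rates[i] holds (mutability, target-base probabilities) for site i,
    i.e. the kind of per-site list mutabilities() describes.
    """
    if len(site_rates) != len(sequence):
        raise ValueError("need one (mutability, biases) pair per site")
    rng = rng or random.Random()
    seq = list(sequence)
    # Assumed Poisson rate: mean per-site mutability scaled by lambda0.
    mean_mutability = sum(m for m, _ in site_rates) / len(seq)
    n_mutations = poisson(mean_mutability * lambda0, rng)
    available = list(range(len(seq)))
    for _ in range(n_mutations):
        if not available:
            break  # without replacement, every site has already mutated
        # Choose a site with probability proportional to its mutability.
        weights = [site_rates[i][0] for i in available]
        pos = rng.choices(available, weights=weights)[0]
        # Choose the new base from that site's substitution biases.
        biases = site_rates[pos][1]
        seq[pos] = rng.choices("ACGT", weights=[biases[b] for b in "ACGT"])[0]
        if not with_replacement:
            available.remove(pos)
    return "".join(seq)


seq = "ACGTACGTAC"
# Toy model: every site equally mutable, uniform over the three other bases.
uniform = [(1.0, {b: 0.0 if b == base else 1 / 3 for b in "ACGT"}) for base in seq]
mutant = mutate_sketch(seq, uniform, lambda0=2.0, rng=random.Random(0))
print(seq, mutant)
```

Passing with_replacement=True lets an already-mutated site be chosen again, loosely mirroring the class option of the same name.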

simulate(sequence, seq_bounds=None, fitness_function=<function MutationModel.<lambda>>, lambda0=[1], frame=None, N_init=1, N=None, T=None, n=None, verbose=False)

Simulate a neutral binary branching process with the mutation model, returning an ete3.TreeNode object.

Parameters:
  • sequence (str) – root nucleotide sequence

  • seq_bounds (Optional[Tuple[Tuple[int, int], Tuple[int, int]]]) – ranges for two subsequences used as two parallel genes

  • fitness_function (Callable) – mean number of offspring as a function of sequence

  • lambda0 (List[float64]) – baseline mutation rate(s)

  • frame (Optional[int]) – coding frame of starting position(s)

  • N_init (int) – initial naive abundance

  • N (Optional[int]) – maximum population size

  • T (Optional[int]) – maximum generation time

  • n (Optional[int]) – sample size

  • verbose (bool) – print more messages

Return type:

TreeNode
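To make the branching-process description concrete, here is a standalone sketch. Everything in it is an assumption for illustration: the `Node` class stands in for `ete3.TreeNode`; the offspring law (Binomial(2, fitness/2), chosen so the mean offspring count equals `fitness_function(sequence)` as the parameter description says) and the per-site toy mutation step are not taken from gctree; and only the `T` and `N` stopping rules are modeled. A constant `fitness_function` gives a neutral process.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Minimal stand-in for ete3.TreeNode: a sequence plus its children."""
    sequence: str
    children: List["Node"] = field(default_factory=list)


def point_mutate(sequence: str, rng: random.Random, rate: float) -> str:
    """Toy mutation step: each site mutates independently with prob. rate."""
    out = []
    for base in sequence:
        if rng.random() < rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


def simulate_sketch(
    sequence: str,
    fitness_function: Callable[[str], float],
    T: int,
    N: Optional[int] = None,
    rate: float = 0.01,
    rng: Optional[random.Random] = None,
) -> Node:
    """Hypothetical binary branching process, not gctree's simulate().

    Each current leaf has 0, 1 or 2 offspring drawn from Binomial(2, p) with
    p = fitness / 2 (clipped to [0, 1]). Stops after T generations, when the
    population dies out, or once at least N leaves are alive.
    """
    rng = rng or random.Random()
    root = Node(sequence)
    leaves = [root]
    for _ in range(T):
        next_leaves = []
        for leaf in leaves:
            p = min(max(fitness_function(leaf.sequence) / 2, 0.0), 1.0)
            n_children = sum(rng.random() < p for _ in range(2))
            for _ in range(n_children):
                child = Node(point_mutate(leaf.sequence, rng, rate))
                leaf.children.append(child)
                next_leaves.append(child)
        leaves = next_leaves
        if not leaves or (N is not None and len(leaves) >= N):
            break
    return root


def count_leaves(node: Node) -> int:
    """Count tips, including lineages that died out early."""
    if not node.children:
        return 1
    return sum(count_leaves(child) for child in node.children)


tree = simulate_sketch("ACGTACGT", lambda seq: 1.8, T=10, N=50, rng=random.Random(1))
print(count_leaves(tree))
```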