gctree.mutation_model
Mutation models.
Classes
MutationModel: A class for a mutation model, and functions to mutate sequences.
- class gctree.mutation_model.MutationModel(mutability_file=None, substitution_file=None, mutation_order=True, with_replacement=True)[source]
A class for a mutation model, and functions to mutate sequences.
- Parameters:
Notes
mutability_file shall be a CSV file with the first column containing fivemers and the second column containing mutability scores. An example can be found at https://bitbucket.org/kleinstein/shazam/src/master/data-raw/HS5F_Mutability.csv

For example:

    Fivemer,Mutability,...
    TCGGG,0.03542,...
    GCCGG,0.02241675,...
    GCCGC,0.06789,...
    . . .
substitution_file shall be a CSV file with the first column containing fivemers and the next four columns containing targeting probabilities for bases A, C, G, and T, respectively. An example can be found at https://bitbucket.org/kleinstein/shazam/src/master/data-raw/HS5F_Substitution.csv

For example:

    Fivemer,A,C,G,T,...
    AAAAA,0,0.33,0.33,0.34,...
    AAAAC,0,0.5000,0.2500,0.2500,...
    AAAAG,0,0.65,0.15,0.20,...
    . . .
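The two file layouts described in these notes can be read with the standard library alone. The following is a minimal sketch, not gctree's own loader: the helper names `read_mutability` and `read_substitution` are hypothetical, the toy rows are copied from the examples in the notes, and the trailing `...` columns are dropped on the assumption that only the documented leading columns matter.

```python
import csv
import io

# Toy rows copied from the examples in the notes; real files have one row per fivemer.
MUTABILITY_CSV = """Fivemer,Mutability
TCGGG,0.03542
GCCGG,0.02241675
GCCGC,0.06789
"""

SUBSTITUTION_CSV = """Fivemer,A,C,G,T
AAAAA,0,0.33,0.33,0.34
AAAAC,0,0.5000,0.2500,0.2500
AAAAG,0,0.65,0.15,0.20
"""


def read_mutability(handle):
    """Map each fivemer to its mutability score (first two columns only)."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return {row[0]: float(row[1]) for row in reader if row}


def read_substitution(handle):
    """Map each fivemer to its targeting probabilities for A, C, G, and T."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return {
        row[0]: dict(zip("ACGT", (float(x) for x in row[1:5])))
        for row in reader
        if row
    }


# Hypothetical usage on the in-memory toy data.
mutability = read_mutability(io.StringIO(MUTABILITY_CSV))
substitution = read_substitution(io.StringIO(SUBSTITUTION_CSV))
```

Reading only columns one through five keeps the sketch tolerant of any extra columns a real file may carry.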
- __init__(mutability_file=None, substitution_file=None, mutation_order=True, with_replacement=True)[source]
- mutability(kmer)[source]
Returns the mutability of the central base of a \(k\)-mer, along with its nucleotide biases, averaged over ambiguous "N" nucleotide identities.
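One way to picture the averaging over ambiguous "N" identities is to expand each "N" into the four concrete bases and take the mean score over the resulting \(k\)-mers. The sketch below illustrates that idea only; it is not gctree's implementation, the function names and the 3-mer `toy_table` are hypothetical, and it handles only the scalar mutability, not the nucleotide biases.

```python
from itertools import product


def expand_ambiguous(kmer):
    """Return every concrete k-mer obtained by replacing each "N" with A, C, G, or T."""
    choices = ["ACGT" if base == "N" else base for base in kmer]
    return ["".join(bases) for bases in product(*choices)]


def average_mutability(kmer, table):
    """Average the table's scores over all concrete k-mers consistent with kmer."""
    concrete = expand_ambiguous(kmer)
    return sum(table[k] for k in concrete) / len(concrete)


# Hypothetical scores for 3-mers, to keep the table small.
toy_table = {"AAC": 0.1, "CAC": 0.2, "GAC": 0.3, "TAC": 0.6}
averaged = average_mutability("NAC", toy_table)
```

A \(k\)-mer with no "N" expands to itself, so the same function covers the unambiguous case.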
- mutabilities(sequence)[source]
Returns the mutability of a sequence at each site, along with nucleotide biases.
- mutate(sequence, lambda0=1, frame=None)[source]
Mutate a sequence, with lambda0 the baseline mutability. The same position cannot be mutated multiple times.
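The constraint that a position is never mutated twice can be illustrated by sampling sites without replacement. This is a sketch under stated assumptions rather than the method's actual logic: `mutate_distinct_sites`, `site_weights`, and `n_mutations` are hypothetical names, the replacement base is chosen uniformly instead of from a substitution table, and how the real method turns lambda0 into a mutation count is not modeled.

```python
import random


def mutate_distinct_sites(sequence, site_weights, n_mutations, rng):
    """Substitute n_mutations distinct sites, each chosen in proportion to its weight."""
    if n_mutations > len(sequence):
        raise ValueError("more mutations requested than sites available")
    bases = list(sequence)
    available = list(range(len(bases)))
    for _ in range(n_mutations):
        weights = [site_weights[i] for i in available]
        site = rng.choices(available, weights=weights, k=1)[0]
        available.remove(site)  # a site can be hit at most once
        # Assumption: pick the new base uniformly among the three alternatives.
        alternatives = [b for b in "ACGT" if b != bases[site]]
        bases[site] = rng.choice(alternatives)
    return "".join(bases)


# Hypothetical usage with equal (positive) weights and a seeded generator.
original = "ACGTACGT"
mutated = mutate_distinct_sites(original, [1.0] * len(original), 3, random.Random(0))
```

Because each chosen site is removed from the pool and always receives a different base, the result differs from the input at exactly `n_mutations` positions.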
- simulate(sequence, seq_bounds=None, fitness_function=<function MutationModel.<lambda>>, lambda0=[1], frame=None, N_init=1, N=None, T=None, n=None, verbose=False)[source]
Simulate a neutral binary branching process with the mutation model, returning an ete3.TreeNode object.
- Parameters:
  - sequence (str) – root nucleotide sequence
  - seq_bounds (Optional[Tuple[Tuple[int, int], Tuple[int, int]]]) – ranges for two subsequences used as two parallel genes
  - fitness_function (Callable) – mean number of offspring as a function of sequence
  - lambda0 (List[float64]) – baseline mutation rate(s)
  - frame (Optional[int]) – coding frame of starting position(s)
  - N_init (int) – initial naive abundance
  - verbose (bool) – print more messages
- Return type:
TreeNode