gctree.branching_processes
This module contains classes for simulation and inference for a binary branching process with mutation in which the tree is collapsed to nodes that count the number of clonal leaves of each type.
Classes
A collection of trees. |
|
A collapsed tree, modeled as an infinite type Galton-Watson process run to extinction. |
Exceptions
- class gctree.branching_processes.CollapsedTree(tree=None, allow_repeats=False)[source]
A collapsed tree, modeled as an infinite type Galton-Watson process run to extinction.
- tree
ete3.TreeNode
object withabundance
node features
- Parameters:
tree (
Optional
[TreeNode
]) – ete3 tree withabundance
node features. If uncollapsed, it will be collapsed along branches with no mutations. Can be ommitted on initializaion, and later simulated. If a tree is provided, names of nodes with abundance 0 will not be preserved.allow_repeats (
bool
) – tolerate the existence of nodes with the same genotype after collapse, e.g. in sister clades.
- ll(p, q)[source]
Log likelihood of branching process parameters \((p, q)\) given tree topology \(T\) and genotype abundances \(A\).
\[\ell(p, q; T, A) = \log\mathbb{P}(T, A \mid p, q)\]
- mle(**kwargs)[source]
Maximum likelihood estimate of \((p, q)\).
\[(p, q) = \arg\max_{p,q\in [0,1]}\ell(p, q)\]- Parameters:
kwargs – keyword arguments passed along to the branching process likelihood
CollapsedTree.ll()
- Return type:
Tuple
[float64
,float64
]- Returns:
Tuple \((p, q)\) with estimated branching probability and estimated mutation probability
- simulate(p, q, root=True)[source]
Simulate a collapsed tree as an infinite type Galton-Watson process run to extintion, with branching probability \(p\) and mutation probability \(q\). Overwrites existing tree attribute.
- Parameters:
p (
float64
) – branching probabilityq (
float64
) – mutation probabilityroot (
bool
) – flag indicating simulation is being run from the root of the tree, so we should update tree attributes (should usually beTrue
)
- render(outfile, scale=None, branch_margin=0, node_size=None, idlabel=False, colormap=None, frame=None, position_map=None, chain_split=None, frame2=None, position_map2=None, show_support=False, show_nuc_muts=False)[source]
Render to tree image file.
- Parameters:
outfile (
str
) – file name to render to, filetype inferred from suffix, .svg for colorscale (
Optional
[float
]) – branch length scale in pixels (set automatically ifNone
)branch_margin (
float
) – additional leaf branch separation margin, in pixels, to scale tree widthnode_size (
Optional
[float
]) – size of nodes in pixels (set according to abundance ifNone
)idlabel (
bool
) – label nodes with seq ids, and write sequences of all nodes to a fasta file with same base name asoutfile
colormap (
Optional
[Dict
]) – dictionary mapping node names to color names or to dictionaries of color frequenciesframe (
Optional
[int
]) – coding frame for annotating amino acid substitutionsposition_map (
Optional
[List
]) – mapping of position names for sequence indices, to be used with substitution annotations and theframe
argumentchain_split (
Optional
[int
]) – if sequences are a concatenation two gene sequences, this is the index at which the 2nd one starts (requiresframe
andframe2
arguments)frame2 (
Optional
[int
]) – coding frame for 2nd sequence when usingchain_split
position_map2 (
Optional
[List
]) – likeposition_map
, but for 2nd sequence when usingchain_split
show_support (
bool
) – annotate bootstrap support if availableshow_nuc_muts (
bool
) – If True, annotate branches with nucleotide mutations. If False, and frame is provided, then branches will be annotated with amino acid mutations.
- feature_colormap(feature, cmap='viridis', vmin=None, vmax=None, scale='linear', **kwargs)[source]
Generate a colormap based on a continuous tree feature.
- Parameters:
feature (
str
) – feature name (all nodes in tree attribute must have this feature)cmap (
str
) – any matplotlib color palette namevmin (
Optional
[float
]) – minimum value for colormap (default to minimum of the feature over the tree)vmax (
Optional
[float
]) – maximum value for colormap (default to maximum of the feature over the tree)scale (
str
) –linear
(default),log
, orsymlog
(must also providelinthresh
kwarg)kwargs – additional keyword arguments for scale transformation
- Return type:
- Returns:
Dictionary of node names to hex color strings, which may be used as the colormap in
gctree.CollapsedTree.render()
- write(file_name)[source]
Serialize to pickle file.
- Parameters:
file_name (
str
) – file name (.p suffix recommended)
- newick(file_name)[source]
Write to newick file.
- Parameters:
file_name (
str
) – file name (.nk suffix recommended)
- compare(tree2, method='identity')[source]
Compare this tree to the other tree.
- Parameters:
tree2 (
CollapsedTree
) – another object of this typemethod (
str
) – comparison type (identity
,MRCA
, orRF
)
- Return type:
- Returns:
tree difference
- support(bootstrap_trees_list, weights=None, compatibility=False)[source]
Compute support from a list of bootstrap
CollapsedTree
objects, and add to tree attibute.- Parameters:
bootstrap_trees_list (
List
[CollapsedTree
]) – List of treesweights (
Optional
[List
[float64
]]) – weights for each tree, perhaps for weighting parsimony degenerate treescompatibility (
bool
) – counts trees that don’t disconfirm the split.
- local_branching(tau=1, tau0=1, infinite_root_branch=True, nan_root_lbr=False)[source]
Add local branching statistics (Neher et al. 2014) as tree node features to the ETE tree attribute. After execution, all nodes will have new features
LBI
(local branching index) andLBR
(local branching ratio, below Vs above the node)- Parameters:
tau – decay timescale for exponential filter
tau0 – effective branch length for branches with zero mutations
infinite_root_branch – calculate assuming the root node has an infinite branch
nan_root_lbr – replace the root LBR value with
np.nan
- class gctree.branching_processes.CollapsedForest(forest=None)[source]
A collection of trees.
We can intialize with a list of trees, each an instance of
ete3.Tree
orCollapsedTree
, or we can simulate the forest later.- n_trees
number of trees in forest
- parameters
fit branching process parameters, if mle has been run, otherwise None
- Parameters:
forest (
Optional
[List
[Union
[CollapsedTree
,TreeNode
]]]) – list ofete3.Tree
- simulate(p, q, n_trees)[source]
Simulate a forest of collapsed trees. Overwrites existing forest attribute.
- Parameters:
p (
float64
) – branching probabilityq (
float64
) – mutation probabilityn_trees (
int
) – number of trees
- ll(p, q, marginal=False)[source]
Log likelihood of branching process parameters \((p, q)\) given tree topologies \(T_1, \dots, T_n\) and corresponding genotype abundances vectors \(A_1, \dots, A_n\) for each of \(n\) trees in the forest.
If
marginal=False
(the default), compute the joint log likelihood\[\ell(p, q; T, A) = \sum_{i=1}^n\log\mathbb{P}(T_i, A_i \mid p, q),\]otherwise compute the marginal log likelihood
\[\ell(p, q; T, A) = \log\left(\sum_{i=1}^n\mathbb{P}(T_i, A_i \mid p, q)\right).\]- Parameters:
p (
float64
) – branching probabilityq (
float64
) – mutation probabilitymarginal (
bool
) – compute the marginal likelihood over trees, otherwise compute the joint likelihood of trees
- Return type:
- Returns:
Log branching process likelihood \(\ell(p, q; T, A)\) and its gradient \(\nabla\ell(p, q; T, A)\)
- mle(**kwargs)[source]
Maximum likelihood estimate of \((p, q)\).
\[(p, q) = \arg\max_{p,q\in [0,1]}\ell(p, q)\]- Parameters:
kwargs – keyword arguments passed along to the branching process likelihood
CollapsedForest.ll()
- Return type:
Tuple
[float64
,float64
]- Returns:
Tuple \((p, q)\) with estimated branching probability and estimated mutation probability
- filter_trees(ranking_strategy=None, mutability_file=None, substitution_file=None, ignore_isotype=False, chain_split=None, verbose=False, outbase='gctree.out', summarize_forest=False, tree_stats=False, img_type='svg', ranking_coeffs=None, branching_process_ranking_coeff=-1, use_old_mut_parsimony=False)[source]
Filter trees according to specified criteria.
By default, the forest will be trimmed to maximize branching process likelihood, then minimize isotype parsimony, then maximize context-based Poisson likelihood, and finally minimize number of alleles. Any criteria for which the necessary arguments are not provided will be automatically ignored.
For other ranking strategies, see the ranking_strategy argument.
- Parameters:
ranking_strategy (
Optional
[str
]) – A string expression describing how to rank trees. See docs for command line argument –ranking_strategy for description.ignore_isotype (
bool
) – Ignore isotype parsimony when ranking. By default, isotype information added with :meth:add_isotypes
will be used to compute isotype parsimony, which is used in ranking.chain_split (
Optional
[int
]) – The index at which non-adjacent sequences are concatenated, for calculating context-based Poisson likelihood.verbose (
bool
) – print information about trimmingoutbase (
str
) – file name stem for a file with information for each tree in the DAG.summarize_forest (
bool
) – whether to write a summary of the forest to file [outbase].forest_summary.logtree_stats (
bool
) – whether to write stats for each tree in the forest to file [outbase].tree_stats.logimg_type (
str
) – format for output plots.ranking_coeffs (
Optional
[Sequence
[float
]]) – (Deprecated. Useranking_strategy
instead) A list or tuple of coefficients for prioritizing tree weights. The order of coefficients is: isotype parsimony score, context poisson likelihood, and number of alleles. A coefficient of-1
will be applied to branching process likelihood by default, unless a different value is provided to the keyword argument branching_process_ranking_coeff. Trees are chosen to minimize this linear combination of tree weights, so weights for which larger values are more optimal (such as likelihoods) should have negative coefficients.branching_process_ranking_coeff (
float
) – (Deprecated. Useranking_strategy
instead) Ranking coefficient to use for branching process likelihood. Value is ignored unless ranking_coeffs argument is provided.use_old_mut_parsimony (
bool
) – (Deprecated. Useranking_strategy
instead) Whether to use the deprecated ‘mutability parsimony’ instead of context-based poisson likelihood (only applicable if mutability and substitution files are provided.
- Return type:
- Returns:
The trimmed forest, containing all optimal trees according to the specified criteria, and a tuple of data about the trees in that forest, with format (branching process likelihood, isotype parsimony, context-based Poisson likelihood, alleles).
- likelihood_rankplot(outbase, p, q, img_type='svg')[source]
Save a rank plot of likelihoods to the file [outbase].inference.likelihood_rank.[img_type].
- n_topologies()[source]
Count the number of topology classes, ignoring internal node sequences.
- Return type:
- iter_topology_classes()[source]
Sort trees by topology class.
- Returns:
- A generator of CollapsedForest objects, each containing trees with the same topology,
ignoring internal node labels. CollapsedForests will be yielded in reverse-order of the number of trees in each topology class, so that each CollapsedForest will contain at least as many trees as the one that follows.
- add_isotypes(isotypemap=None, isotypemap_file=None, idmap=None, idmap_file=None, isotype_names=None)[source]
Adds isotype annotations, including inferred ancestral isotypes, to all nodes in stored trees.