gctree.branching_processes.CollapsedForest
- class gctree.branching_processes.CollapsedForest(forest=None)[source]
Bases: object
A collection of trees.
We can initialize with a list of trees, each an instance of ete3.Tree or CollapsedTree, or we can simulate the forest later.
- n_trees
number of trees in forest
- parameters
fitted branching process parameters if mle() has been run, otherwise None
- Parameters:
forest (Optional[List[Union[CollapsedTree, TreeNode]]]) – list of ete3.Tree
Methods
add_isotypes() – Adds isotype annotations, including inferred ancestral isotypes, to all nodes in stored trees.
filter_trees() – Filter trees according to specified criteria.
iter_topology_classes() – Sort trees by topology class.
likelihood_rankplot() – Save a rank plot of likelihoods to the file [outbase].inference.likelihood_rank.[img_type].
ll() – Log likelihood of branching process parameters \((p, q)\) given tree topologies \(T_1, \dots, T_n\) and corresponding genotype abundance vectors \(A_1, \dots, A_n\) for each of \(n\) trees in the forest.
mle() – Maximum likelihood estimate of \((p, q)\).
n_topologies() – Count the number of topology classes, ignoring internal node sequences.
sample_tree() – Sample a random CollapsedTree from the forest.
simulate() – Simulate a forest of collapsed trees.
- simulate(p, q, n_trees)[source]
Simulate a forest of collapsed trees. Overwrites existing forest attribute.
- Parameters:
p (float64) – branching probability
q (float64) – mutation probability
n_trees (int) – number of trees
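To make the roles of p and q concrete, here is a toy simulation sketch. It is not gctree's implementation. It assumes a binary branching process in which each cell divides into two with probability p, each daughter cell mutates to a new genotype with probability q, and a cell that does not divide is counted as observed. The function name `simulate_abundances`, the `max_cells` cap, and the integer genotype labels are all hypothetical.

```python
import random
from collections import Counter


def simulate_abundances(p, q, rng, max_cells=10_000):
    """Toy binary branching process (an assumption, not gctree's code).

    Returns a Counter mapping genotype id -> number of observed cells.
    """
    next_genotype = 1      # genotype 0 is the root genotype
    active = [0]           # genotypes of cells that have not yet acted
    abundances = Counter()
    processed = 0
    while active:
        processed += 1
        if processed > max_cells:
            raise RuntimeError("process did not terminate; try a smaller p")
        genotype = active.pop()
        if rng.random() < p:
            # The cell divides; each daughter mutates independently.
            for _ in range(2):
                if rng.random() < q:
                    active.append(next_genotype)
                    next_genotype += 1
                else:
                    active.append(genotype)
        else:
            # The cell stops dividing and is counted as observed.
            abundances[genotype] += 1
    return abundances


rng = random.Random(0)
# A "forest" here is just a list of per-tree genotype abundance counters.
forest = [simulate_abundances(0.4, 0.3, rng) for _ in range(5)]
```

In this toy model, keeping p below 0.5 makes the process subcritical, so each simulated tree terminates with probability one.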
- ll(p, q, marginal=False)[source]
Log likelihood of branching process parameters \((p, q)\) given tree topologies \(T_1, \dots, T_n\) and corresponding genotype abundance vectors \(A_1, \dots, A_n\) for each of \(n\) trees in the forest.
If marginal=False (the default), compute the joint log likelihood
\[\ell(p, q; T, A) = \sum_{i=1}^n\log\mathbb{P}(T_i, A_i \mid p, q),\]
otherwise compute the marginal log likelihood
\[\ell(p, q; T, A) = \log\left(\sum_{i=1}^n\mathbb{P}(T_i, A_i \mid p, q)\right).\]
- Parameters:
p (float64) – branching probability
q (float64) – mutation probability
marginal (bool) – compute the marginal likelihood over trees, otherwise compute the joint likelihood of trees
- Return type:
- Returns:
Log branching process likelihood \(\ell(p, q; T, A)\) and its gradient \(\nabla\ell(p, q; T, A)\)
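The difference between the joint and marginal formulas can be illustrated with a few made-up per-tree probabilities. The sketch below is not gctree's code: the per-tree log probabilities are hypothetical inputs, the helper names are illustrative, and the gradient that ll() also returns is omitted. The marginal case uses the log-sum-exp trick, which is an assumption about how one might evaluate it stably.

```python
import math


def joint_log_likelihood(log_probs):
    """Sum of per-tree log probabilities: sum_i log P(T_i, A_i | p, q)."""
    return sum(log_probs)


def marginal_log_likelihood(log_probs):
    """Log of the summed per-tree probabilities, via log-sum-exp."""
    m = max(log_probs)
    return m + math.log(sum(math.exp(lp - m) for lp in log_probs))


# Hypothetical per-tree probabilities P(T_i, A_i | p, q) for n = 3 trees.
log_probs = [math.log(0.2), math.log(0.05), math.log(0.1)]

joint = joint_log_likelihood(log_probs)        # equals log(0.2 * 0.05 * 0.1)
marginal = marginal_log_likelihood(log_probs)  # equals log(0.2 + 0.05 + 0.1)
```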
- mle(**kwargs)[source]
Maximum likelihood estimate of \((p, q)\).
\[(p, q) = \arg\max_{p,q\in [0,1]}\ell(p, q)\]
- Parameters:
kwargs – keyword arguments passed along to the branching process likelihood CollapsedForest.ll()
- Return type:
Tuple[float64, float64]
- Returns:
Tuple \((p, q)\) with estimated branching probability and estimated mutation probability
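As an illustration of maximizing a likelihood over \([0,1]^2\) using a function that returns both a value and a gradient, here is a minimal sketch. The optimizer gctree actually uses is not described in this section, so projected gradient ascent is a stand-in. `toy_ll` is a hypothetical concave function with a known maximizer, not the branching process likelihood.

```python
import math


def toy_ll(p, q, counts=(3, 7, 2, 8)):
    """Hypothetical log likelihood returning (value, gradient).

    Its maximizer is p = a / (a + b), q = c / (c + d).
    """
    a, b, c, d = counts
    value = (a * math.log(p) + b * math.log(1 - p)
             + c * math.log(q) + d * math.log(1 - q))
    grad = (a / p - b / (1 - p), c / q - d / (1 - q))
    return value, grad


def maximize(ll, start=(0.5, 0.5), step=0.01, n_steps=2000, eps=1e-9):
    """Projected gradient ascent, keeping (p, q) strictly inside [0, 1]^2."""
    p, q = start
    for _ in range(n_steps):
        _, (dp, dq) = ll(p, q)
        p = min(max(p + step * dp, eps), 1 - eps)
        q = min(max(q + step * dq, eps), 1 - eps)
    return p, q


p_hat, q_hat = maximize(toy_ll)  # converges toward (0.3, 0.2)
```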
- filter_trees(ranking_strategy=None, mutability_file=None, substitution_file=None, ignore_isotype=False, chain_split=None, verbose=False, outbase='gctree.out', summarize_forest=False, tree_stats=False, img_type='svg', ranking_coeffs=None, branching_process_ranking_coeff=-1, use_old_mut_parsimony=False)[source]
Filter trees according to specified criteria.
By default, the forest will be trimmed to maximize branching process likelihood, then minimize isotype parsimony, then maximize context-based Poisson likelihood, and finally minimize number of alleles. Any criteria for which the necessary arguments are not provided will be automatically ignored.
For other ranking strategies, see the ranking_strategy argument.
- Parameters:
ranking_strategy (Optional[str]) – A string expression describing how to rank trees. See the docs for the command line argument --ranking_strategy for a description.
ignore_isotype (bool) – Ignore isotype parsimony when ranking. By default, isotype information added with add_isotypes() will be used to compute isotype parsimony, which is used in ranking.
chain_split (Optional[int]) – The index at which non-adjacent sequences are concatenated, for calculating context-based Poisson likelihood.
verbose (bool) – print information about trimming
outbase (str) – file name stem for a file with information for each tree in the DAG.
summarize_forest (bool) – whether to write a summary of the forest to file [outbase].forest_summary.log
tree_stats (bool) – whether to write stats for each tree in the forest to file [outbase].tree_stats.log
img_type (str) – format for output plots.
ranking_coeffs (Optional[Sequence[float]]) – (Deprecated. Use ranking_strategy instead.) A list or tuple of coefficients for prioritizing tree weights. The order of coefficients is: isotype parsimony score, context-based Poisson likelihood, and number of alleles. A coefficient of -1 will be applied to branching process likelihood by default, unless a different value is provided to the keyword argument branching_process_ranking_coeff. Trees are chosen to minimize this linear combination of tree weights, so weights for which larger values are better (such as likelihoods) should have negative coefficients.
branching_process_ranking_coeff (float) – (Deprecated. Use ranking_strategy instead.) Ranking coefficient to use for branching process likelihood. Value is ignored unless the ranking_coeffs argument is provided.
use_old_mut_parsimony (bool) – (Deprecated. Use ranking_strategy instead.) Whether to use the deprecated ‘mutability parsimony’ instead of context-based Poisson likelihood (only applicable if mutability and substitution files are provided).
- Return type:
- Returns:
The trimmed forest, containing all optimal trees according to the specified criteria, and a tuple of data about the trees in that forest, with format (branching process likelihood, isotype parsimony, context-based Poisson likelihood, alleles).
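The two ranking schemes described above (the default ordered criteria, and the deprecated linear combination of weights) can be sketched on made-up data. This is an illustration only, not gctree's implementation: `TreeStats`, `default_key`, and `linear_score` are hypothetical names, and the numbers are invented.

```python
from typing import NamedTuple


class TreeStats(NamedTuple):
    """Per-tree data in the order given in the Returns description."""
    bp_loglik: float          # branching process likelihood (maximize)
    isotype_parsimony: float  # minimize
    context_loglik: float     # context-based Poisson likelihood (maximize)
    alleles: int              # minimize


def default_key(s):
    """Ordered criteria: negate the quantities that should be maximized."""
    return (-s.bp_loglik, s.isotype_parsimony, -s.context_loglik, s.alleles)


def linear_score(s, coeffs, bp_coeff=-1.0):
    """Linear combination of tree weights; smaller is better."""
    iso, ctx, alleles = coeffs
    return (bp_coeff * s.bp_loglik + iso * s.isotype_parsimony
            + ctx * s.context_loglik + alleles * s.alleles)


candidates = [
    TreeStats(-10.0, 3, -50.0, 12),
    TreeStats(-10.0, 2, -55.0, 12),
    TreeStats(-12.0, 1, -40.0, 11),
]

# Default-style ranking: keep every tree that ties for the best key.
best_key = min(default_key(s) for s in candidates)
kept = [s for s in candidates if default_key(s) == best_key]

# Linear-combination ranking with a negative coefficient on the likelihood.
best_linear = min(candidates, key=lambda s: linear_score(s, (1.0, -0.1, 0.0)))
```

Here the ordered criteria keep the second candidate (tied on branching process likelihood, lower isotype parsimony), while the linear combination prefers the third.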
- likelihood_rankplot(outbase, p, q, img_type='svg')[source]
Save a rank plot of likelihoods to the file [outbase].inference.likelihood_rank.[img_type].
- n_topologies()[source]
Count the number of topology classes, ignoring internal node sequences.
- Return type:
- iter_topology_classes()[source]
Sort trees by topology class.
- Returns:
A generator of CollapsedForest objects, each containing trees with the same topology, ignoring internal node labels. CollapsedForests are yielded in descending order of the number of trees in each topology class, so that each CollapsedForest contains at least as many trees as the one that follows.
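Grouping trees into topology classes and yielding the largest class first can be sketched with plain nested tuples. This is a toy model rather than gctree's code: trees are written as `(label, children)` tuples, leaf labels are kept, internal labels and child order are ignored (an assumption about what "same topology" means), and the function names are hypothetical.

```python
from collections import defaultdict


def shape(tree):
    """Canonical form that drops internal labels and child order."""
    label, children = tree
    if not children:
        return ("leaf", label)
    return ("node", tuple(sorted(shape(child) for child in children)))


def topology_classes(trees):
    """Group trees by canonical shape, largest class first."""
    classes = defaultdict(list)
    for tree in trees:
        classes[shape(tree)].append(tree)
    return sorted(classes.values(), key=len, reverse=True)


t1 = ("x", (("a", ()), ("b", ())))
t2 = ("y", (("b", ()), ("a", ())))  # same class as t1
t3 = ("x", (("a", ()), ("z", (("b", ()), ("c", ())))))

classes = topology_classes([t3, t1, t2])
n_classes = len(classes)  # analogous to counting topology classes
```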
- add_isotypes(isotypemap=None, isotypemap_file=None, idmap=None, idmap_file=None, isotype_names=None)[source]
Adds isotype annotations, including inferred ancestral isotypes, to all nodes in stored trees.