gctree.CollapsedTree

class gctree.CollapsedTree(tree=None, allow_repeats=False)

Bases: object

A collapsed tree, modeled as an infinite-type Galton-Watson process run to extinction.

tree

ete3.TreeNode object with abundance node features

Parameters:
  • tree (Optional[TreeNode]) – ete3 tree with abundance node features. If uncollapsed, it will be collapsed along branches with no mutations. Can be omitted on initialization and simulated later. If a tree is provided, names of nodes with abundance 0 will not be preserved.

  • allow_repeats (bool) – tolerate the existence of nodes with the same genotype after collapse, e.g. in sister clades.

Methods

  • compare – Compare this tree to the other tree.

  • feature_colormap – Generate a colormap based on a continuous tree feature.

  • ll – Log likelihood of branching process parameters \((p, q)\) given tree topology \(T\) and genotype abundances \(A\).

  • local_branching – Add local branching statistics (Neher et al. 2014) as tree node features to the ETE tree attribute.

  • mle – Maximum likelihood estimate of \((p, q)\).

  • newick – Write to newick file.

  • render – Render to tree image file.

  • simulate – Simulate a collapsed tree as an infinite-type Galton-Watson process run to extinction, with branching probability \(p\) and mutation probability \(q\).

  • support – Compute support from a list of bootstrap CollapsedTree objects, and add to tree attribute.

  • write – Serialize to pickle file.

ll(p, q)

Log likelihood of branching process parameters \((p, q)\) given tree topology \(T\) and genotype abundances \(A\).

\[\ell(p, q; T, A) = \log\mathbb{P}(T, A \mid p, q)\]
Parameters:
  • p (float64) – branching probability

  • q (float64) – mutation probability

Return type:

Tuple[float64, ndarray]

Returns:

Log likelihood \(\ell(p, q; T, A)\) and its gradient \(\nabla\ell(p, q; T, A)\)

mle(**kwargs)

Maximum likelihood estimate of \((p, q)\).

\[(p, q) = \arg\max_{p,q\in [0,1]}\ell(p, q)\]
Parameters:

kwargs – keyword arguments passed along to the branching process likelihood CollapsedTree.ll()

Return type:

Tuple[float64, float64]

Returns:

Tuple \((p, q)\) with estimated branching probability and estimated mutation probability
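As a rough illustration of how a (value, gradient) pair like the one returned by ll() can drive the maximization over \([0,1]^2\), here is a minimal sketch using clipped gradient ascent. The function toy_ll is a hypothetical stand-in with a known closed-form maximum; it is not the gctree likelihood, and the optimizer gctree actually uses is not stated in this page.

```python
import math


def toy_ll(p, q, counts=(3, 7, 2, 8)):
    """Stand-in log likelihood and gradient (NOT the gctree likelihood).

    Treats the counts as Bernoulli successes/failures for p and for q,
    so the maximum is at p = a / (a + b), q = c / (c + d).
    """
    a, b, c, d = counts
    value = (a * math.log(p) + b * math.log(1 - p)
             + c * math.log(q) + d * math.log(1 - q))
    grad = (a / p - b / (1 - p), c / q - d / (1 - q))
    return value, grad


def maximize(ll, start=(0.5, 0.5), step=0.005, n_iter=5000, eps=1e-6):
    """Gradient ascent on ll, clipping (p, q) to stay inside (0, 1)."""
    p, q = start
    for _ in range(n_iter):
        _, (dp, dq) = ll(p, q)
        p = min(max(p + step * dp, eps), 1 - eps)
        q = min(max(q + step * dq, eps), 1 - eps)
    return p, q


p_hat, q_hat = maximize(toy_ll)
```

For the toy counts above the estimates approach the closed-form values 0.3 and 0.2. The step size and iteration count are illustrative choices for this stand-in only.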

simulate(p, q, root=True)

Simulate a collapsed tree as an infinite-type Galton-Watson process run to extinction, with branching probability \(p\) and mutation probability \(q\). Overwrites the existing tree attribute.

Parameters:
  • p (float64) – branching probability

  • q (float64) – mutation probability

  • root (bool) – flag indicating simulation is being run from the root of the tree, so we should update tree attributes (should usually be True)
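The process described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the gctree implementation: it assumes each cell divides into two daughters with probability p (and otherwise terminates, adding one to its genotype's abundance), and that each daughter independently acquires a new, never-before-seen genotype with probability q. The dictionary-based tree and all function names are hypothetical. Choosing p below 0.5 keeps the process subcritical, so it reaches extinction with probability 1.

```python
import random


def simulate_genotype(p, q, rng):
    """Return (abundance, n_mutant_clades) for a single genotype."""
    abundance = 0        # terminated cells carrying this genotype
    mutant_clades = 0    # daughters that mutated away to a new genotype
    unresolved = 1       # cells of this genotype that have not yet acted
    while unresolved > 0:
        unresolved -= 1
        if rng.random() < p:
            # The cell divides; each daughter mutates with probability q.
            for _ in range(2):
                if rng.random() < q:
                    mutant_clades += 1
                else:
                    unresolved += 1
        else:
            # The cell terminates and is counted in the abundance.
            abundance += 1
    return abundance, mutant_clades


def simulate_collapsed_tree(p, q, seed=0):
    """Build a collapsed tree as nested dicts, one node per genotype."""
    rng = random.Random(seed)
    root = {"abundance": 0, "children": []}
    stack = [root]
    while stack:
        node = stack.pop()
        abundance, n_mutants = simulate_genotype(p, q, rng)
        node["abundance"] = abundance
        node["children"] = [{"abundance": 0, "children": []}
                            for _ in range(n_mutants)]
        stack.extend(node["children"])
    return root


def count_nodes(node):
    """Number of genotypes (nodes) in the collapsed tree."""
    return 1 + sum(count_nodes(child) for child in node["children"])


tree = simulate_collapsed_tree(p=0.4, q=0.5, seed=1)
n_genotypes = count_nodes(tree)
```

Because every mutation creates a new type, each node of the result corresponds to exactly one genotype, which is what makes the simulated tree "collapsed" by construction.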

render(outfile, scale=None, branch_margin=0, node_size=None, idlabel=False, colormap=None, frame=None, position_map=None, chain_split=None, frame2=None, position_map2=None, show_support=False, show_nuc_muts=False)

Render to tree image file.

Parameters:
  • outfile (str) – name of the file to render to; the file type is inferred from the suffix (use .svg for color)

  • scale (Optional[float]) – branch length scale in pixels (set automatically if None)

  • branch_margin (float) – additional leaf branch separation margin, in pixels, to scale tree width

  • node_size (Optional[float]) – size of nodes in pixels (set according to abundance if None)

  • idlabel (bool) – label nodes with seq ids, and write sequences of all nodes to a fasta file with same base name as outfile

  • colormap (Optional[Dict]) – dictionary mapping node names to color names or to dictionaries of color frequencies

  • frame (Optional[int]) – coding frame for annotating amino acid substitutions

  • position_map (Optional[List]) – mapping of position names for sequence indices, to be used with substitution annotations and the frame argument

  • chain_split (Optional[int]) – if sequences are a concatenation of two gene sequences, this is the index at which the 2nd one starts (requires frame and frame2 arguments)

  • frame2 (Optional[int]) – coding frame for 2nd sequence when using chain_split

  • position_map2 (Optional[List]) – like position_map, but for 2nd sequence when using chain_split

  • show_support (bool) – annotate bootstrap support if available

  • show_nuc_muts (bool) – If True, annotate branches with nucleotide mutations. If False, and frame is provided, then branches will be annotated with amino acid mutations.

feature_colormap(feature, cmap='viridis', vmin=None, vmax=None, scale='linear', **kwargs)

Generate a colormap based on a continuous tree feature.

Parameters:
  • feature (str) – feature name (all nodes in tree attribute must have this feature)

  • cmap (str) – any matplotlib color palette name

  • vmin (Optional[float]) – minimum value for colormap (default to minimum of the feature over the tree)

  • vmax (Optional[float]) – maximum value for colormap (default to maximum of the feature over the tree)

  • scale (str) – linear (default), log, or symlog (must also provide linthresh kwarg)

  • kwargs – additional keyword arguments for scale transformation

Return type:

Dict[str, str]

Returns:

Dictionary of node names to hex color strings, which may be used as the colormap in gctree.CollapsedTree.render()
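To make the shape of the returned dictionary concrete, here is a minimal sketch of a linear value-to-color mapping. It is an illustration only: linear_colormap is a hypothetical helper, and it interpolates between two fixed RGB endpoints (the dark and bright ends of viridis) rather than using a real matplotlib palette.

```python
def linear_colormap(values, low=(68, 1, 84), high=(253, 231, 37),
                    vmin=None, vmax=None):
    """Map {node name: value} to {node name: '#rrggbb'} on a linear scale."""
    vmin = min(values.values()) if vmin is None else vmin
    vmax = max(values.values()) if vmax is None else vmax
    span = (vmax - vmin) or 1.0  # avoid dividing by zero for constant features
    colors = {}
    for name, value in values.items():
        # Position of the value in [0, 1], clamped to the colormap range.
        t = min(max((value - vmin) / span, 0.0), 1.0)
        rgb = [round(lo + t * (hi - lo)) for lo, hi in zip(low, high)]
        colors[name] = "#{:02x}{:02x}{:02x}".format(*rgb)
    return colors


colors = linear_colormap({"naive": 0.0, "seq1": 0.5, "seq2": 1.0})
```

The node names and feature values here are made up. A dictionary of this form, keyed by node name with hex color strings as values, matches the Dict[str, str] return type documented above.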

write(file_name)

Serialize to pickle file.

Parameters:

file_name (str) – file name (.p suffix recommended)

newick(file_name)

Write to newick file.

Parameters:

file_name (str) – file name (.nk suffix recommended)

compare(tree2, method='identity')

Compare this tree to the other tree.

Parameters:
  • tree2 (CollapsedTree) – another object of this type

  • method (str) – comparison type (identity, MRCA, or RF)

Return type:

Union[bool, float64]

Returns:

Difference between the two trees; the type of the result (bool or float64) depends on method
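For readers unfamiliar with the RF option, the following minimal sketch computes a rooted Robinson-Foulds distance as the size of the symmetric difference between the clade sets of two trees. It is a simplification under stated assumptions: trees are plain nested tuples of leaf names (a hypothetical representation), only leaves carry names, and this is not necessarily how compare() computes its result.

```python
def clades(tree):
    """Return the set of clades (frozensets of leaf names) of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        # An internal node's clade is the union of its children's leaves.
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


def rooted_rf(tree1, tree2):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree1) ^ clades(tree2))


t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
distance = rooted_rf(t1, t2)
```

Here the two trees share only the root clade, so each contributes two unmatched clades to the distance.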

support(bootstrap_trees_list, weights=None, compatibility=False)

Compute support from a list of bootstrap CollapsedTree objects, and add to tree attribute.

Parameters:
  • bootstrap_trees_list (List[CollapsedTree]) – List of trees

  • weights (Optional[List[float64]]) – weights for each tree, perhaps for weighting parsimony degenerate trees

  • compatibility (bool) – if True, count trees that don’t disconfirm the split, rather than only trees that contain it.
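The difference between strict support and compatibility-based support can be sketched as follows. This is an illustration under assumptions, not the gctree implementation: each tree is represented as a set of clades (frozensets of leaf names), a bootstrap tree is taken to "disconfirm" a clade when it contains a clade that partially overlaps it, and support is reported as a weighted count rather than a fraction. The function name clade_support is hypothetical.

```python
def clade_support(tree_clades, bootstrap_clades, weights=None,
                  compatibility=False):
    """Weighted count of bootstrap trees supporting each clade."""
    if weights is None:
        weights = [1.0] * len(bootstrap_clades)
    support = {}
    for clade in tree_clades:
        total = 0.0
        for weight, boot in zip(weights, bootstrap_clades):
            if compatibility:
                # Compatible: every bootstrap clade is nested with or
                # disjoint from the clade of interest.
                ok = all(clade <= other or other <= clade
                         or not (clade & other) for other in boot)
            else:
                # Strict: the bootstrap tree contains the exact clade.
                ok = clade in boot
            if ok:
                total += weight
        support[clade] = total
    return support


ab, bc, abc = frozenset("ab"), frozenset("bc"), frozenset("abc")
bootstraps = [{ab, abc}, {bc, abc}, {abc}]
strict = clade_support({ab}, bootstraps)
compatible = clade_support({ab}, bootstraps, compatibility=True)
```

In this example the third bootstrap tree is unresolved, so it does not contain the clade {a, b} but also does not conflict with it; it counts only under compatibility=True.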

local_branching(tau=1, tau0=1, infinite_root_branch=True, nan_root_lbr=False)

Add local branching statistics (Neher et al. 2014) as tree node features to the ETE tree attribute. After execution, all nodes will have the new features LBI (local branching index) and LBR (local branching ratio, below vs. above the node).

Parameters:
  • tau – decay timescale for exponential filter

  • tau0 – effective branch length for branches with zero mutations

  • infinite_root_branch – calculate assuming the root node has an infinite branch

  • nan_root_lbr – replace the root LBR value with np.nan