historydag.HistoryDag

class historydag.HistoryDag(dagroot, attr={})[source]

An object to represent a collection of internally labeled trees. A wrapper object to contain exposed HistoryDag methods and point to a HistoryDagNode root.

Parameters:
  • dagroot (HistoryDagNode) – The root node of the history DAG

  • attr (Any) – An attribute to contain data which will be preserved by copying (default an empty dict)

Subclassing HistoryDag: HistoryDag may be subclassed without overriding __init__, by defining a _required_label_fields class variable on the subclass.

The value of _required_label_fields should be a dictionary keyed by the label fields that are expected by methods of the subclass. Each dictionary entry shall be of the form required_field: [(from_fields, conversion_func), …], where the dict value is a list of tuples: each conversion_func is a function mapping a HistoryDagNode to the value of that node label's required_field field, and from_fields is a tuple containing all label fields expected by that function.

Keyword arguments passed to HistoryDag.from_history_dag() will be passed to conversion functions provided in the appropriate subclass’s _required_label_fields attribute. Be sure to document each subclass, including available conversion functions and their keywords, in each subclass’s docstring.
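As a sketch of the expected structure (all names below — CompactGenomeLabel, unpack_sequence, SequenceLikeDag — are hypothetical illustrations, not part of historydag), a subclass declaring _required_label_fields might look like:

```python
from typing import NamedTuple

class CompactGenomeLabel(NamedTuple):
    # Hypothetical label type with a single field.
    compact_genome: str

def unpack_sequence(node, reference=""):
    # Hypothetical conversion function: computes the 'sequence' field from a
    # node whose label carries a 'compact_genome' field. Keyword arguments
    # (here 'reference') are forwarded from HistoryDag.from_history_dag().
    return reference + node.label.compact_genome

class SequenceLikeDag:  # stands in for a HistoryDag subclass
    # Methods of this subclass expect a 'sequence' label field; it can be
    # recovered from the 'compact_genome' field via unpack_sequence.
    _required_label_fields = {
        "sequence": [(("compact_genome",), unpack_sequence)],
    }

# Stub node standing in for a HistoryDagNode, for illustration only.
class _StubNode:
    def __init__(self, cg):
        self.label = CompactGenomeLabel(cg)

recovered = unpack_sequence(_StubNode("ACGT"))
# recovered == "ACGT"
```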

__init__(dagroot, attr={})[source]

Methods

__init__(dagroot[, attr])

add_all_allowed_edges(*args, **kwargs)

Provided as a deprecated synonym for make_complete().

add_label_fields([new_field_names, ...])

Returns a copy of the DAG in which each node's label is extended to include the new fields listed in new_field_names.

add_node_at_all_possible_places(new_leaf_id)

Inserts a sequence into the DAG such that every tree in the DAG contains that new node.

average_pairwise_rf_distance([...])

Return the average Robinson-Foulds distance between pairs of histories.

convert_to_collapsed()

Rebuilds the DAG so that no edge connects two nodes with the same label, unless one is a leaf node.

copy()

Uses bytestring serialization, and is guaranteed to copy node labels, attr attributes, edge weights, and edge probabilities.

count_edges([collapsed])

Counts the number of trees each edge takes part in.

count_histories([expand_func, ...])

Annotates each node in the DAG with the number of clade sub-trees underneath.

count_nodes([collapse, rooted])

Counts the number of trees each node takes part in.

count_optimal_histories([start_func, ...])

Count the number of histories which would be left if the DAG were trimmed.

count_paths_to_leaf(leaf_label[, ...])

Annotates each node in the DAG with the number of paths to leaf_label underneath.

count_rf_distances(history[, rooted, ...])

Returns a Counter containing all RF distances to a given history.

count_sum_rf_distances(reference_dag[, ...])

Returns a Counter containing all sum RF distances to a given reference DAG.

count_topologies([collapse_leaves])

Counts the number of unique topologies in the history DAG.

count_topologies_fast()

Counts the number of unique topologies in the history DAG.

count_trees(*args, **kwargs)

Deprecated name for count_histories()

edge_probabilities([log_probabilities, ...])

explode_nodes([expand_func, ...])

Explode nodes according to a provided function.

export_edge_probabilities()

Return a dictionary keyed by (parent, child) HistoryDagNode pairs, with downward conditional edge probabilities as values.

fast_sample([tree_builder, log_probabilities])

This is a non-recursive alternative to HistoryDag.sample(), which is likely to be slower on small DAGs, but may allow significant optimizations on large DAGs, or in the case that the data format being sampled is something other than a tree-shaped HistoryDag object.

find_node(filter_func)

Return the first (non-UA) node for which filter_func evaluates to True.

find_nodes(filter_func)

Return a generator on (non-UA) nodes for which filter_func evaluates to True.

from_history_dag(dag[, label_fields])

Converts HistoryDag instances between subclasses of HistoryDag.

get_annotated_edges([skip_ua_node])

Return a generator containing all edges in the history DAG, and their weights and downward conditional edge probabilities.

get_edges([skip_ua_node])

Return a generator containing all edges in the history DAG, as parent, child node tuples.

get_histories()

Return a generator containing all histories in the history DAG.

get_histories_by_index(key_iterator[, ...])

Retrieving a history by index is slow, since each retrieval requires running the trim_optimal_weight method on the entire DAG to populate node counts.

get_label_type()

Return the type for labels on this dag's nodes.

get_leaves()

Return a generator containing all leaf nodes in the history DAG.

get_probability_countfuncs([...])

Produce a historydag.utils.AddFuncDict() containing functions to compute history probabilities using e.g. HistoryDag.optimal_weight_annotate().

get_topologies([collapse_leaves])

Return a list of pseudo-newick representations of topologies in the history DAG.

get_trees()

Deprecated name for get_histories()

hamming_parsimony_count()

Deprecated in favor of sequence_dag.SequenceHistoryDag.hamming_parsimony_count().

history_intersect(reference_dag[, key])

Modify this HistoryDag to contain only the histories which are also contained in reference_dag.

insert_node(new_leaf_id[, id_name, ...])

Inserts a sequence into the DAG.

internal_avg_parents()

Returns the average number of parents among internal nodes.

is_clade_tree()

Deprecated name for is_history()

is_history()

Returns whether the history DAG is a history.

iter_covering_histories([cover_edges])

Samples a sequence of histories which together contain all nodes in the history DAG.

label_uncertainty_summary()

Print information about internal nodes which have the same child clades but different labels.

leaf_path_uncertainty_dag(terminal_node[, ...])

Create a DAG of possible paths leading to terminal_node

leaf_path_uncertainty_graphviz(terminal_node)

Create a graphviz DAG of possible paths leading to terminal_node

make_complete([new_from_root, ...])

Add all allowed edges to the DAG in place.

make_uniform()

Deprecated name for HistoryDag.uniform_distribution_annotate()

mean_history_weight([edge_weight_func])

merge(trees)

Graph union this history DAG with all those in a list of history DAGs.

most_supported_trees()

Trims the DAG to only express the trees that have the highest support.

natural_distribution_annotate([...])

Set edge probabilities to 1/n, where n is the count of edges descending from the corresponding node-clade pair.

node_probabilities([log_probabilities, ...])

Compute the probability of each node in the DAG.

nodes_above_node(node)

Return a set of nodes from which the passed node is reachable along directed edges.

num_edges([skip_ua_node])

Return the number of edges in the DAG, including edges descending from the UA node, unless skip_ua_node is True.

num_leaves()

Return the number of leaf nodes in the DAG.

num_nodes()

Return the number of nodes in the DAG, not counting the UA node.

optimal_rf_distance(history[, rooted, ...])

Returns the optimal (min or max) RF distance to a given history.

optimal_sum_rf_distance(reference_dag[, ...])

Returns the optimal (min or max) summed rooted RF distance to all histories in the reference DAG.

optimal_weight_annotate([start_func, ...])

A template method for finding the optimal tree weight in the DAG.

overestimate_rf_diameter()

Returns an overestimate of the RF diameter of the DAG.

postorder([include_root])

Recursive postorder traversal of the history DAG.

postorder_above(terminal_node[, ...])

Recursive postorder traversal of all ancestors of a (possibly internal) node.

postorder_cladetree_accum(*args, **kwargs)

Deprecated name for HistoryDag.postorder_history_accum()

postorder_history_accum(leaf_func, ...[, ...])

A template method for leaf-to-root dynamic programming.

preorder([skip_ua_node, skip_root])

Recursive preorder traversal of the history DAG.

preorder_history_accum(leaf_func, edge_func, ...)

A template method for leaf-to-root and root-to-leaf dynamic programming.

probability_annotate(edge_weight_func[, ...])

Uses the supplied edge weight function to compute conditional probabilities on edges.

recompute_parents()

Repopulate HistoryDagNode.parent attributes.

relabel(relabel_func[, relax_type])

Return a new HistoryDag with labels modified according to a provided function.

remove_label_fields([fields_to_remove])

Returns a copy of the DAG with the list of fields_to_remove dropped from each node's label.

sample([edge_selector, log_probabilities])

Samples a history from the history DAG.

sample_with_edge(edge)

Samples a history which contains edge (a tuple of HistoryDagNodes) from the history DAG.

sample_with_node(node)

Samples a history which contains node from the history DAG.

serialize()

Return type:

bytes

set_sample_mask(edge_selector[, ...])

Zero out edge weights for masked edges before calling HistoryDag.fast_sample().

shared_history_count(reference_dag[, key])

Count the histories which are also contained in reference_dag.

sum_probability([log_probabilities])

Compute the total probability of all histories in the DAG, using downward conditional edge probabilities.

sum_rf_distances([reference_dag, rooted, ...])

Computes the sum of Robinson-Foulds distances over all pairs of histories in this DAG and the provided reference DAG.

sum_weights([edge_weight_func])

For weights which are a sum over edges, compute the sum of all tree weights in the DAG.

summary()

Print summary info about the history DAG.

to_ascii(name_func, **kwargs)

A convenience function that uses the to_ete() method and ete3's ASCII drawing tools to render a history.

to_ete([name_func, features, feature_funcs])

Convert a history DAG which is a history to an ete tree.

to_graphviz([labelfunc, namedict, ...])

Converts history DAG to graphviz (dot format) Digraph object.

to_newick([name_func, features, feature_funcs])

Converts a history to extended newick format.

to_newicks(**kwargs)

Returns a list of extended newick strings formed with label fields.

trim_below_weight(max_weight[, start_func, ...])

Trim the dag to contain at least all the histories within the specified weight range.

trim_optimal_rf_distance(history[, rooted, ...])

Trims this history DAG to the optimal (min or max) RF distance to a given history.

trim_optimal_sum_rf_distance(reference_dag)

Trims the DAG to contain only histories with the optimal (min or max) sum rooted RF distance to the given reference DAG.

trim_optimal_weight([start_func, ...])

Trims the DAG to only express trees with optimal weight.

trim_topology(topology[, collapse_leaves])

Trims the history DAG to express only trees matching the provided topology.

trim_within_range([min_weight, max_weight, ...])

underestimate_rf_diameter()

Returns an underestimate of the RF diameter of the DAG.

uniform_distribution_annotate([...])

Adjust edge probabilities so that the DAG expresses a uniform distribution on expressed trees.

unlabel()

Sets all internal node labels to be identical, and merges nodes so that all histories in the DAG have unique topologies.

update_label_fields(field_names, ...)

Changes label field values to values returned by the function new_field_values.

weight_count([start_func, edge_weight_func, ...])

A template method for counting weights of trees expressed in the history DAG.

weight_counts_with_ambiguities([start_func, ...])

Template method for counting tree weights in the DAG, with exploded labels.

weight_range_annotate([start_func, ...])

Computes the minimum and maximum weight of any history in the history DAG.

classmethod from_history_dag(dag, label_fields=None, **kwargs)[source]

Converts HistoryDag instances between subclasses of HistoryDag. No copy is performed, so the passed dag will in general be modified.

Parameters:
  • dag (HistoryDag) – A HistoryDag (or subclass) instance

  • label_fields (Optional[Sequence[str]]) – A list specifying the order of label fields in node labels on the resulting HistoryDag

  • kwargs – Any additional arguments required for label conversions. For details, see the class docstring for the subclass into which the conversion is taking place.

Returns:

The converted HistoryDag object, carrying the type from which this class method was called. After conversion to the new HistoryDag subclass to_cls, the following will be true about node labels:

  • If passed label_fields is None, then existing label fields will be preserved, except that missing required label fields will be recovered if possible, and the existing label fields used to recover them will be omitted. Recovered label fields will appear before the existing label fields.

  • If passed label_fields is not None, then it must include all fields expected in node labels in the converted history DAG object, otherwise an exception will be raised.

  • Converted node label field order will match the order of passed label_fields.

  • All label fields passed in label_fields will be included in converted node labels, if possible. Otherwise, an exception will be raised.

__init__(dagroot, attr={})[source]

get_histories_by_index(key_iterator, tree_builder_func=None)[source]

Retrieving a history by index is slow, since each retrieval requires running the trim_optimal_weight method on the entire DAG to populate node counts. This method instead runs that method a single time and yields a history for each index yielded by key_iterator.

Parameters:
  • key_iterator – An iterator on desired history indices. May be consumable, as it will only be used once.

  • tree_builder_func – A function accepting an index and returning a PreorderTreeBuilder instance to be used to build the history with that index. If None (default), then tree-shaped HistoryDag objects will be yielded using PreorderHistoryBuilder.

get_label_type()[source]

Return the type for labels on this dag’s nodes.

Return type:

type

trim_below_weight(max_weight, start_func=None, edge_weight_func=None, min_possible_weight=-inf)[source]

Trim the dag to contain at least all the histories within the specified weight range.

Supports totally ordered weights, accumulated by addition. A weight type must implement all ordering operators properly, as well as + and -, and addition and subtraction must respect the ordering. That is, if a < b, then a + c < b + c for any c (including negative c).
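Integer weights satisfy this requirement. The translation-invariance property the method requires can be spelled out as a small stdlib check (illustrative only, not part of historydag):

```python
def addition_respects_ordering(a, b, c):
    # The property trim_below_weight requires of a weight type:
    # a < b must imply a + c < b + c, for any c (including negative c).
    return (not a < b) or (a + c < b + c)

# Integer weights satisfy the property over a small exhaustive sweep,
# including negative offsets c.
checks = [addition_respects_ordering(a, b, c)
          for a in range(-3, 4) for b in range(-3, 4) for c in range(-3, 4)]
# all(checks) is True
```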

get_histories()[source]

Return a generator containing all histories in the history DAG.

Note that each history is a tree-shaped history DAG, containing a UA node, which exists as a subgraph of the history DAG.

The order of these histories does not necessarily match the order of indexing. That is, dag.get_histories() and (history for history in dag) may yield different orderings. get_histories should be slightly faster, but possibly more memory intensive.

Return type:

Generator[HistoryDag, None, None]

get_trees()[source]

Deprecated name for get_histories()

Return type:

Generator[HistoryDag, None, None]

get_leaves()[source]

Return a generator containing all leaf nodes in the history DAG.

Return type:

Generator[HistoryDagNode, None, None]

get_edges(skip_ua_node=False)[source]

Return a generator containing all edges in the history DAG, as parent, child node tuples.

Edges’ parent nodes will be in preorder.

Return type:

Generator[Tuple[HistoryDagNode, HistoryDagNode], None, None]

get_annotated_edges(skip_ua_node=False)[source]

Return a generator containing all edges in the history DAG, and their weights and downward conditional edge probabilities.

Yields ((parent, child), weight, probability) for each edge.

Edges’ parent nodes will be in preorder.

Return type:

Generator[Tuple[HistoryDagNode, HistoryDagNode], None, None]

num_edges(skip_ua_node=False)[source]

Return the number of edges in the DAG, including edges descending from the UA node, unless skip_ua_node is True.

Return type:

int

num_nodes()[source]

Return the number of nodes in the DAG, not counting the UA node.

Return type:

int

num_leaves()[source]

Return the number of leaf nodes in the DAG.

Return type:

int

find_nodes(filter_func)[source]

Return a generator on (non-UA) nodes for which filter_func evaluates to True.

Return type:

Generator[HistoryDagNode, None, None]

find_node(filter_func)[source]

Return the first (non-UA) node for which filter_func evaluates to True.

Return type:

HistoryDagNode

fast_sample(tree_builder=None, log_probabilities=False)[source]

This is a non-recursive alternative to HistoryDag.sample(), which is likely to be slower on small DAGs, but may allow significant optimizations on large DAGs, or in the case that the data format being sampled is something other than a tree-shaped HistoryDag object.

This method does not provide an edge_selector argument like HistoryDag.sample(). Instead, any masking of edges should be done prior to sampling using the HistoryDag.set_sample_mask() method, or by modifying the arguments to HistoryDag.probability_annotate().

Parameters:
sample(edge_selector=lambda e: ..., log_probabilities=False)[source]

Samples a history from the history DAG. (A history is a sub-history DAG containing the root and all leaf nodes.) For reproducibility, set random.seed before sampling.

When there is an option, edges for which edge_selector evaluates to True will always be chosen.

Returns a new HistoryDag object.

To use the more general sampling pattern which allows an arbitrary PreorderTreeBuilder object, use HistoryDag.fast_sample() instead.

Return type:

HistoryDag

nodes_above_node(node)[source]

Return a set of nodes from which the passed node is reachable along directed edges.

Return type:

Set[HistoryDagNode]

sample_with_node(node)[source]

Samples a history which contains node from the history DAG.

Sampling is likely unbiased from the distribution of trees in the DAG, conditioned on each sampled tree containing the passed node. However, if unbiased sampling from the conditional distribution is important, this should be tested.

Return type:

HistoryDag

sample_with_edge(edge)[source]

Samples a history which contains edge (a tuple of HistoryDagNodes) from the history DAG.

Sampling is likely unbiased from the distribution of trees in the DAG, conditioned on each sampled tree containing the passed edge. However, if unbiased sampling from the conditional distribution is important, this should be tested.

Return type:

HistoryDag

iter_covering_histories(cover_edges=False)[source]

Samples a sequence of histories which together contain all nodes in the history DAG.

Histories are sampled using sample_with_node(), starting with the nodes which are contained in the fewest of the DAG’s histories. The sequence of trees is therefore non-deterministic unless random.seed is set.

Return type:

Generator[HistoryDag, None, None]

unlabel()[source]

Sets all internal node labels to be identical, and merges nodes so that all histories in the DAG have unique topologies.

Return type:

HistoryDag

relabel(relabel_func, relax_type=False)[source]

Return a new HistoryDag with labels modified according to a provided function.

Parameters:
  • relabel_func (Callable[[HistoryDagNode], Union[NamedTuple, UALabel]]) – A function which takes a node and returns the new label appropriate for that node. The relabel_func should return a consistent NamedTuple type with name Label. That is, all returned labels should have matching _fields attribute. Method is only guaranteed to work when no two leaf nodes are mapped to the same new label. If this is not the case, this method may raise a warning or error, or may fail silently, returning an invalid HistoryDag.

  • relax_type – Whether to require the returned HistoryDag to be of the same subclass as self. If True, the returned HistoryDag will be of the abstract type HistoryDag

Return type:

HistoryDag
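A minimal sketch of a valid relabel_func, using a hypothetical 'sequence' label field and a stub node in place of a real HistoryDagNode (illustration only):

```python
from typing import NamedTuple

class Label(NamedTuple):
    # All returned labels share this single _fields layout, as required.
    sequence: str

def relabel_func(node):
    # Hypothetical relabeling: uppercase each node's sequence field.
    # Distinct leaf sequences stay distinct under this map, so the
    # "no two leaves map to the same label" caveat is respected.
    return Label(sequence=node.label.sequence.upper())

# Stub node standing in for a HistoryDagNode, for illustration only.
class _StubNode:
    def __init__(self, sequence):
        self.label = Label(sequence)

new_label = relabel_func(_StubNode("acgt"))
# new_label == Label(sequence='ACGT')
```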

add_label_fields(new_field_names=[], new_field_values=lambda n: ...)[source]

Returns a copy of the DAG in which each node’s label is extended to include the new fields listed in new_field_names.

Parameters:
  • new_field_names – A list of strings consisting of the names of the new fields to add.

  • new_field_values – A callable that takes a node and returns the ordered list of values for each new field name to assign to that node.

remove_label_fields(fields_to_remove=[])[source]

Returns a copy of the DAG with the list of fields_to_remove dropped from each node’s label.

Parameters:

fields_to_remove – A list of strings consisting of the names of the fields to remove.

update_label_fields(field_names, new_field_values)[source]

Changes label field values to values returned by the function new_field_values. This method is not in-place, but returns a new DAG.

Parameters:
  • field_names – A list of strings containing names of label fields whose contents are to be modified

  • new_field_values – A function taking a node and returning a tuple of field values whose order matches field_names

is_history()[source]

Returns whether the history DAG is a history.

That is, each node-clade pair has exactly one descendant edge.

Return type:

bool

is_clade_tree()[source]

Deprecated name for is_history()

Return type:

bool

copy()[source]

Uses bytestring serialization, and is guaranteed to copy:

  • node labels

  • node attr attributes

  • edge weights

  • edge probabilities

However, other object attributes will not be copied.

Return type:

HistoryDag

history_intersect(reference_dag, key=lambda n: ...)[source]

Modify this HistoryDag to contain only the histories which are also contained in reference_dag.

Parameters:
  • reference_dag (HistoryDag) – The history DAG with which this one will be intersected. reference_dag will not be modified.

  • key – A function accepting a node and returning a value which will be used to compare nodes.

shared_history_count(reference_dag, key=lambda n: ...)[source]

Count the histories which are also contained in reference_dag.

Parameters:
  • reference_dag (HistoryDag) – The history DAG with which this one will be intersected. reference_dag will not be modified.

  • key – A function accepting a node and returning a value which will be used to compare nodes.

Return type:

int

Returns:

The number of histories shared between this history DAG and the reference.

merge(trees)[source]

Graph union this history DAG with all those in a list of history DAGs.

add_all_allowed_edges(*args, **kwargs)[source]

Provided as a deprecated synonym for make_complete().

Return type:

int

make_complete(new_from_root=True, adjacent_labels=True, preserve_parent_labels=False)[source]

Add all allowed edges to the DAG in place.

Parameters:
  • new_from_root (bool) – If False, no edges will be added that start at the DAG root. Useful when attempting to constrain root label.

  • adjacent_labels (bool) – If False, no edges will be added between nodes with the same labels. Useful when attempting to maintain the history DAG in a ‘collapsed’ state.

  • preserve_parent_labels (bool) – If True, ensures that for any edge added between a parent and child node, the parent node label was already among the original parent labels of the child node. This ensures that parsimony score is preserved.

Return type:

int

Returns:

The number of edges added to the history DAG

to_newick(name_func=<function HistoryDag.<lambda>>, features=None, feature_funcs={})[source]

Converts a history to extended newick format. Supports arbitrary node names and a sequence feature. For use on a history DAG which is a history.

For extracting newick representations of trees in a general history DAG, see HistoryDag.to_newicks().

Parameters:
  • name_func (Callable[[HistoryDagNode], str]) – A map from nodes to newick node names

  • features (Optional[List[str]]) – A list of label field names to be included in extended newick data. If None, all label fields will be included. To include none of them, pass an empty list.

  • feature_funcs (Mapping[str, Callable[[HistoryDagNode], str]]) – A dictionary keyed by extended newick field names, containing functions specifying how to populate that field for each node.

Return type:

str

Returns:

A newick string. If features is an empty list and feature_funcs is empty, then this will be a standard newick string. Otherwise, it will be in ete3's extended newick format.

to_ascii(name_func, **kwargs)[source]

A convenience function that uses the to_ete() method and ete3’s ASCII drawing tools to render a history.

See HistoryDagNode.to_ascii() for details.

to_ete(name_func=<function HistoryDag.<lambda>>, features=None, feature_funcs={})[source]

Convert a history DAG which is a history to an ete tree.

Parameters:
  • name_func (Callable[[HistoryDagNode], str]) – A map from nodes to newick node names

  • features (Optional[List[str]]) – A list of label field names to be included in extended newick data. If None, all label fields will be included. To include none of them, pass an empty list.

  • feature_funcs (Mapping[str, Callable[[HistoryDagNode], str]]) – A dictionary keyed by extended newick field names, containing functions specifying how to populate that field for each node.

Return type:

TreeNode

Returns:

An ete3 Tree with the same topology as self, and node names and attributes as specified.

to_graphviz(labelfunc=None, namedict={}, show_child_clades=True, show_partitions=None, level_leaves=False, graph_attr={}, node_attr={}, edge_attr={}, edge_attr_inheritance='none', show_edge_probs=False, show_edge_weights=False)[source]

Converts history DAG to graphviz (dot format) Digraph object.

Parameters:
  • labelfunc (Optional[Callable[[HistoryDagNode], str]]) – A function to label nodes. If None, nodes will be labeled by their DAG node labels, or their label hash if label data is too large.

  • namedict (Mapping[Union[NamedTuple, UALabel], str]) – A dictionary from node labels to label strings. If both are provided, labelfunc will be used instead.

  • show_child_clades (bool) – Whether to include child clades in output.

  • show_partitions (Optional[bool]) – Deprecated alias for show_child_clades.

  • level_leaves (bool) – Whether to draw leaves on the same level, or wherever they fall naturally.

  • graph_attr (dict) – Additional graphviz graph attributes (see graphviz docs)

  • node_attr (dict) – Additional graphviz node attributes (see graphviz docs)

  • edge_attr (dict) – Additional graphviz edge attributes (see graphviz docs)

  • edge_attr_inheritance (str) – “parent” to inherit from parent node, “child” to inherit from child, or “none”.

  • show_edge_probs (bool) – whether to show edge probabilities

  • show_edge_weights (bool) – whether to show edge weights

Return type:

Digraph

Notes

Graphviz dot format attributes are documented at https://graphviz.org/doc/info/attrs.html. The graphviz attributes passed to this method apply to the entire graph. Attributes for individual nodes can be included in individual node attr dictionaries under the key gv_attrs. For example, node.attr['gv_attrs'] = {'color': 'red'} will color a node red in the graphviz output.

internal_avg_parents()[source]

Returns the average number of parents among internal nodes.

A simple measure of similarity between the trees that the DAG expresses. However, keep in mind that two trees with the same topology but different labels would be considered entirely unalike by this measure.

Return type:

float

explode_nodes(expand_func=parsimony_utils.default_nt_transitions.ambiguity_map.get_sequence_resolution_func('sequence'), expand_node_func=None, expandable_func=None)[source]

Explode nodes according to a provided function. Adds copies of each node to the DAG with exploded labels, but with the same parents and children as the original node.

Parameters:
Return type:

int

Returns:

The number of new nodes added to the history DAG.

leaf_path_uncertainty_dag(terminal_node, node_data_func=lambda n: ...)[source]

Create a DAG of possible paths leading to terminal_node.

Parameters:
  • terminal_node – The returned path DAG will contain all paths from the UA node ending at this node.

  • node_data_func – A function accepting a HistoryDagNode and returning data for the corresponding node in the path dag. Return type must be a valid dictionary key.

Returns:

A dictionary keyed by return values of node_data_func, with each value a dictionary keyed by child nodes, with edge supports as values.

Return type:

child_dictionary

leaf_path_uncertainty_graphviz(terminal_node, node_data_func=lambda n: ..., node_label_func=lambda n: ...)[source]

Create a graphviz DAG of possible paths leading to terminal_node.

Parameters:
  • terminal_node – The returned path DAG will contain all paths from the UA node ending at this node.

  • node_data_func – A function accepting a HistoryDagNode and returning data for the corresponding node in the path dag. Return type must be a valid dictionary key.

  • node_label_func – A function accepting an object of the type returned by node_data_func, and returning a label to be displayed on the corresponding node.

summary()[source]

Print summary info about the history DAG.

label_uncertainty_summary()[source]

Print information about internal nodes which have the same child clades but different labels.

postorder_history_accum(leaf_func, edge_func, accum_within_clade, accum_between_clade, accum_above_edge=None, compute_edge_probabilities=False, normalize_edgeweights=None)[source]

A template method for leaf-to-root dynamic programming.

Intermediate computations are stored in a _dp_data attribute on each node. Note that a Weight can be whatever you like, such as integers, Counters, strings, or dictionaries.

Parameters:
  • leaf_func (Callable[[HistoryDagNode], Any]) – A function to assign weights to leaf nodes

  • edge_func (Callable[[HistoryDagNode, HistoryDagNode], Any]) – A function to assign weights to edges. The parent node will always be the first argument.

  • accum_within_clade (Callable[[List[Any]], Any]) – A function which accumulates a list of weights of subtrees below a single clade. That is, the weights are for alternative trees.

  • accum_between_clade (Callable[[List[Any]], Any]) – A function which accumulates a list of weights of subtrees below different clades. That is, the weights are for different parts of the same tree.

  • accum_above_edge (Optional[Callable[[Any, Any], Any]]) – A function which adds the weight for a subtree to the weight of the edge above it. If None, this function will be inferred from accum_between_clade. The edge weight is the second argument.

  • compute_edge_probabilities (bool) – If True, compute downward-conditional edge probabilities, proportional to aggregated subtree weights below and including each edge descending from a node-clade pair.

Return type:

Any

Returns:

The resulting weight computed for the History DAG UA (root) node.
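The interplay of the four accumulator functions can be illustrated with a toy recursion (this is not the real DAG traversal, and the node encoding below is made up for illustration). Counting sub-histories uses leaf weight 1, edge weight 0, sum over alternatives within a clade, and product across clades:

```python
from math import prod

# Toy node encoding: a dict whose "clades" key maps to a list of clades,
# each clade a list of alternative child subtrees. An empty dict is a leaf.
def accum(node, leaf_func, edge_func, within, between, above_edge):
    clades = node.get("clades", [])
    if not clades:
        return leaf_func(node)
    return between([
        within([above_edge(accum(child, leaf_func, edge_func,
                                 within, between, above_edge),
                           edge_func(node, child))
                for child in alternatives])
        for alternatives in clades
    ])

# Two alternatives under the first clade, one under the second.
toy = {"clades": [[{}, {}], [{}]]}
n = accum(toy,
          leaf_func=lambda node: 1,          # each leaf counts 1
          edge_func=lambda parent, child: 0,  # edges contribute nothing
          within=sum,                         # alternatives add
          between=prod,                       # clades multiply
          above_edge=lambda w, e: w)
# n == 2: two choices in the first clade times one in the second
```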

postorder_cladetree_accum(*args, **kwargs)[source]

Deprecated name for HistoryDag.postorder_history_accum()

Return type:

Any

optimal_weight_annotate(start_func=None, edge_weight_func=None, accum_func=None, optimal_func=None, **kwargs)[source]

A template method for finding the optimal tree weight in the DAG. Dynamically annotates each node in the DAG with the optimal weight of a clade sub-tree beneath it, so that the DAG root node is annotated with the optimal weight of a history in the DAG.

Parameters:
Return type:

Any

Returns:

The optimal weight of a tree under the DAG UA node.

count_optimal_histories(start_func=None, edge_weight_func=None, accum_func=None, optimal_func=None, eq_func=<function HistoryDag.<lambda>>, **kwargs)[source]

Count the number of histories which would be left if the DAG were trimmed.

That is, how many histories would be left if HistoryDag.trim_optimal_weight() were called with the same arguments?

All arguments are the same as for HistoryDag.trim_optimal_weight().

Returns:

A utils.IntState object containing the number of optimal histories in the DAG, with state attribute containing their (optimal) weight.

As a side-effect, each node’s _dp_data attribute is populated with IntState objects containing the number of optimal sub-histories rooted at that node, and the weight of those sub-histories.

sum_weights(edge_weight_func=None, **kwargs)[source]

For weights which are a sum over edges, compute the sum of all tree weights in the DAG.

weight_count(start_func=None, edge_weight_func=None, accum_func=None, **kwargs)[source]

A template method for counting weights of trees expressed in the history DAG.

Weights must be hashable, but may otherwise be of arbitrary type.

Default arguments are contained in this HistoryDag subclass’s _default_args variable, and are documented in the subclass docstring.

Parameters:
Returns:

A Counter keyed by weights.
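The shape of the result can be illustrated with a pure-Python analogue of the counting recursion (same toy `(label, clades)` encoding as above; illustrative only, not historydag's implementation):

```python
from collections import Counter

# Toy node encoding: (label, clades); each clade is a list of
# alternative child nodes.
def weight_counts(node, start_func, edge_weight_func, accum_func):
    label, clades = node
    if not clades:  # leaf node
        return Counter([start_func(node)])
    total = Counter([accum_func([])])  # identity weight (0 for sum)
    for alternatives in clades:
        # Alternatives within a clade add their counters together...
        clade_counter = Counter()
        for child in alternatives:
            sub = weight_counts(child, start_func, edge_weight_func, accum_func)
            for w, n in sub.items():
                clade_counter[accum_func([edge_weight_func(label, child[0]), w])] += n
        # ...while counters of different clades combine by convolution.
        total = Counter({
            accum_func([a, b]): na * nb
            for a, na in total.items()
            for b, nb in clade_counter.items()
        })
    return total

root = ("R", [[("A", []), ("R", [])]])
counts = weight_counts(root, lambda n: 0, lambda p, c: int(p != c), sum)
# counts == Counter({0: 1, 1: 1}): one zero-weight history and one
# weight-one history.
```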

weight_range_annotate(start_func=None, edge_weight_func=None, accum_func=None, min_func=<built-in function min>, max_func=<built-in function max>, **kwargs)[source]

Computes the minimum and maximum weight of any history in the history DAG.

As a side-effect, this method also stores in each node’s _dp_data attribute a tuple containing the minimum and maximum weights of any sub-history beneath that node.

Parameters:
  • start_func (Optional[Callable[[HistoryDagNode], Any]]) – A function which assigns a weight to each leaf node

  • edge_weight_func – A function which assigns a weight to pairs of labels, with the parent node label as the first argument

  • accum_func (Optional[Callable[[List[Any]], Any]]) – A way to ‘add’ a list of weights together

  • min_func (Callable[[List[Any]], Any]) – A function which takes a list of weights and returns their “minimum”

  • max_func (Callable[[List[Any]], Any]) – A function which takes a list of weights and returns their “maximum”

Returns:

A tuple containing the minimum and maximum weight of any history in the history DAG.

hamming_parsimony_count()[source]

Deprecated in favor of sequence_dag.SequenceHistoryDag.hamming_parsimony_count().

to_newicks(**kwargs)[source]

Returns a list of extended newick strings formed with label fields.

Arguments are passed to utils.make_newickcountfuncs(). Arguments are the same as for historydag.HistoryDag.to_newick().

count_topologies(collapse_leaves=False)[source]

Counts the number of unique topologies in the history DAG. This is achieved by counting the number of unique newick strings with only leaves labeled.

count_histories() gives the total number of unique trees in the DAG, taking into account internal node labels.

For large DAGs, this method is prohibitively slow. Use count_topologies_fast() instead.

Parameters:

collapse_leaves (bool) – By default, topologies are counted as-is in the DAG. However, even if the DAG is collapsed by label, edges above leaf nodes will not be collapsed. If collapse_leaves is True, then the number of unique topologies with all leaf-adjacent edges collapsed will be counted. Assumes that the DAG is collapsed with HistoryDag.convert_to_collapsed().

Return type:

int

Returns:

The number of topologies in the history DAG

count_topologies_fast()[source]

Counts the number of unique topologies in the history DAG.

This is achieved by creating a new history DAG in which all internal nodes have matching labels.

This is only guaranteed to match the output of count_topologies if the DAG has all allowed edges added.

Return type:

int

count_trees(*args, **kwargs)[source]

Deprecated name for count_histories()

count_histories(expand_func=None, expand_count_func=lambda ls: ..., bifurcating=False)[source]

Annotates each node in the DAG with the number of clade sub-trees underneath.

Parameters:
  • expand_func (Optional[Callable[[Union[NamedTuple, UALabel]], List[Union[NamedTuple, UALabel]]]]) – A function which takes a label and returns a list of labels, for example disambiguations of an ambiguous sequence. If provided, this method will count at least the number of histories that would be in the DAG, if explode_nodes() were called with the same expand_func.

  • expand_count_func (Callable[[Union[NamedTuple, UALabel]], int]) – A function which takes a label and returns an integer value corresponding to the number of ‘disambiguations’ of that label. If provided, expand_func will be used to find this value.

  • bifurcating – If True, the number of bifurcating topologies possible below each node will be computed. This is only an underestimate of the true number, since nodes that would be created by adding all resolutions of multifurcating nodes may already be present, resulting in additional subtree swaps.

Returns:

The total number of unique complete trees below the root node. If expand_func or expand_count_func is provided, the complete trees being counted are not guaranteed to be unique. If bifurcating is True, then the values stored in nodes’ _dp_data attributes will include all resolutions of multifurcations below a node, but not of a node’s own multifurcation. To get the number of bifurcating subtrees below a node, one can use node._dp_data * utils.count_labeled_binary_topologies(len(node.clades)).
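The number of leaf-labeled bifurcating topologies on n leaves is given by the double factorial (2n - 3)!!, a standard identity; whether utils.count_labeled_binary_topologies computes exactly this is an assumption, and the helper name below is illustrative:

```python
def rooted_binary_topologies(n):
    """Number of rooted, leaf-labeled bifurcating topologies on n
    leaves: (2n - 3)!! for n >= 2 (standard combinatorial identity)."""
    result = 1
    for k in range(3, 2 * n - 2, 2):  # product of odd numbers up to 2n-3
        result *= k
    return result

# rooted_binary_topologies(3) == 3, rooted_binary_topologies(4) == 15
```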

preorder_history_accum(leaf_func, edge_func, accum_within_clade, accum_between_clade, ua_start_val, accum_above_edge=None)[source]

A template method for leaf-to-root and root-to-leaf dynamic programming.

Parameters:
  • leaf_func (Callable[[HistoryDagNode], Any]) – A function to assign weights to leaf nodes

  • edge_func (Callable[[HistoryDagNode, HistoryDagNode], Any]) – A function to assign weights to edges. The parent node will always be the first argument.

  • accum_within_clade (Callable[[List[Any]], Any]) – A function which accumulates a list of weights of subtrees below a single clade. That is, the weights are for alternative trees.

  • accum_between_clade (Callable[[List[Any]], Any]) – A function which accumulates a list of weights of subtrees below different clades. That is, the weights are for different parts of the same tree.

  • accum_above_edge (Optional[Callable[[Any, Any], Any]]) – A function which adds the weight for a subtree to the weight of the edge above it. If None, this function will be inferred from accum_between_clade. The edge weight is the second argument.

Returns:

Two dictionaries: one describing downward weights below each node, and another describing upward weights above each node

Return type:

Tuple[Dict, Dict]

count_nodes(collapse=False, rooted=True)[source]

Counts the number of trees each node takes part in.

For node supports with respect to a uniform distribution on trees, use HistoryDag.uniform_distribution_annotate() and HistoryDag.node_probabilities().

Parameters:
  • collapse – A flag that, when True, treats nodes as clade unions and ignores label information. Then, the returned dictionary is keyed by clade union sets.

  • rooted – A flag which is ignored unless collapse is True. When rooted is also False, the returned dictionary is keyed by splits – that is, sets containing each clade union and its complement, with values the number of (rooted) trees in the DAG containing each split. Splits are not double-counted when a tree has a bifurcating root. When rooted is False, the DAG is expected to have all its trees on the same set of leaf labels.

Return type:

Dict[HistoryDagNode, int]

Returns:

A dictionary mapping each node in the DAG to the number of trees that it takes part in.

count_edges(collapsed=False)[source]

Counts the number of trees each edge takes part in.

Return type:

Dict[Tuple[HistoryDagNode, HistoryDagNode], int]

Returns:

A dictionary mapping each edge in the DAG to the number of trees that it takes part in.

most_supported_trees()[source]

Trims the DAG to only express the trees that have the highest support.

count_paths_to_leaf(leaf_label, expand_func=None, expand_count_func=lambda ls: ...)[source]

Annotates each node in the DAG with the number of paths to leaf_label underneath.

Parameters:
  • leaf_label – The label of the leaf node of interest

  • expand_func (Optional[Callable[[Union[NamedTuple, UALabel]], List[Union[NamedTuple, UALabel]]]]) – A function which takes a label and returns a list of labels, for example disambiguations of an ambiguous sequence. If provided, this method will count at least the number of histories that would be in the DAG, if explode_nodes() were called with the same expand_func.

  • expand_count_func (Callable[[Union[NamedTuple, UALabel]], int]) – A function which takes a label and returns an integer value corresponding to the number of ‘disambiguations’ of that label. If provided, expand_func will be used to find this value.

Returns:

The total number of unique paths to the leaf node of interest. If expand_func or expand_count_func is provided, the paths being counted are not guaranteed to be unique.

weight_counts_with_ambiguities(start_func=None, edge_func=None, accum_func=None, expand_func=None)[source]

Template method for counting tree weights in the DAG, with exploded labels. Like HistoryDag.weight_count(), but creates dictionaries of Counter objects at each node, keyed by possible sequences at that node. Analogous to HistoryDag.count_histories() with expand_func provided.

Weights must be hashable.

Parameters:
Returns:

A Counter keyed by weights. The total number of trees will be greater than count_histories(), as these are possible disambiguations of trees. These disambiguations may not be unique, but if two are the same, they come from different subtrees of the DAG.

underestimate_rf_diameter()[source]

Returns an underestimate of the RF diameter of the DAG. This estimate is computed as the maximal sum RF distance between the DAG and a random tree sampled from among the DAG's topological outliers.

On a set of DAGs with 2000 or fewer histories, this underestimate is quite accurate compared to the actual computed RF diameter.

overestimate_rf_diameter()[source]

Returns an overestimate of the RF diameter of the DAG. This estimate is computed as twice the maximal sum RF distance between the DAG and a random tree sampled from among the DAG's median trees.

On a set of DAGs with 2000 or fewer histories, this overestimate was not close to the actual RF diameter. However, it was never more than twice the actual RF diameter.

optimal_sum_rf_distance(reference_dag, rooted=True, one_sided=None, one_sided_coefficients=(1, 1), optimal_func=min)[source]

Returns the optimal (min or max) summed rooted RF distance to all histories in the reference DAG.

All histories in the reference DAG must be on the same taxa as all trees in this DAG. Since computing reference splits is expensive, it is better to use optimal_weight_annotate() and utils.make_rfdistance_countfuncs() instead of making multiple calls to this method with the same reference history DAG.

trim_optimal_sum_rf_distance(reference_dag, rooted=True, one_sided=None, one_sided_coefficients=(1, 1), optimal_func=min)[source]

Trims the DAG to contain only histories with the optimal (min or max) sum rooted RF distance to the given reference DAG.

See utils.sum_rfdistance_funcs() for detailed documentation of arguments.

Trimming to the minimum sum RF distance is equivalent to finding ‘median’ topologies, and trimming to maximum sum rf distance is equivalent to finding topological outliers.

The given history must be on the same taxa as all trees in the DAG. Since computing reference splits is expensive, it is better to use trim_optimal_weight() and utils.sum_rfdistance_funcs() instead of making multiple calls to this method with the same reference history.

trim_optimal_rf_distance(history, rooted=False, one_sided=None, one_sided_coefficients=(1, 1), optimal_func=min)[source]

Trims this history DAG to the optimal (min or max) RF distance to a given history.

See utils.make_rfdistance_countfuncs() for detailed documentation of arguments.

Also returns that optimal RF distance.

The given history must be on the same taxa as all trees in the DAG. Since computing reference splits is expensive, it is better to use optimal_weight_annotate() and utils.make_rfdistance_countfuncs() instead of making multiple calls to this method with the same reference history.

optimal_rf_distance(history, rooted=False, one_sided=None, one_sided_coefficients=(1, 1), optimal_func=min)[source]

Returns the optimal (min or max) RF distance to a given history.

See utils.make_rfdistance_countfuncs() for detailed documentation of arguments.

The given history must be on the same taxa as all trees in the DAG. Since computing reference splits is expensive, it is better to use optimal_weight_annotate() and utils.make_rfdistance_countfuncs() instead of making multiple calls to this method with the same reference history.

count_rf_distances(history, rooted=False, one_sided=None, one_sided_coefficients=(1, 1))[source]

Returns a Counter containing all RF distances to a given history.

The given history must be on the same taxa as all trees in the DAG.

See utils.make_rfdistance_countfuncs() for detailed documentation of arguments.

Since computing reference splits is expensive, it is better to use weight_count() and utils.make_rfdistance_countfuncs() instead of making multiple calls to this method with the same reference history.

count_sum_rf_distances(reference_dag, rooted=True, one_sided=None, one_sided_coefficients=(1, 1))[source]

Returns a Counter containing all sum RF distances to a given reference DAG.

See utils.sum_rfdistance_funcs() for detailed documentation of arguments.

The given history DAG must be on the same taxa as all trees in the DAG.

Since computing reference splits is expensive, it is better to use weight_count() and utils.sum_rfdistance_funcs() instead of making multiple calls to this method with the same reference history DAG.

sum_rf_distances(reference_dag=None, rooted=True, one_sided=None, one_sided_coefficients=(1, 1))[source]

Computes the sum of Robinson-Foulds distances over all pairs of histories in this DAG and the provided reference DAG.

Parameters:
  • reference_dag (Optional[HistoryDag]) – If None, the sum of pairwise distances between histories in this DAG is computed. If provided, the sum is over pairs containing one history in this DAG and one from reference_dag.

  • rooted (bool) – If False, use edges’ splits for RF distance computation. Otherwise, use the clade below each edge.

  • one_sided (Optional[str]) – May be ‘left’, ‘right’, or None. ‘left’ means that we count splits (or clades, in the rooted case) which are in the reference trees but not in the DAG tree, especially useful if trees in the DAG might be resolutions of multifurcating trees in the reference DAG. ‘right’ means that we count splits or clades in the DAG tree which are not in the reference trees, useful if the reference trees are possibly resolutions of multifurcating trees in the DAG. If not None, one_sided_coefficients are ignored.

  • one_sided_coefficients (Tuple[float, float]) – coefficients for non-standard symmetric difference calculations. See utils.make_rfdistance_countfuncs() for more details.

Returns:

An integer sum of RF distances.

If T is the set of histories in the reference DAG, and T’ is the set of histories in this DAG, then the returned sum is:

\[\sum_{t\in T} \sum_{t'\in T'} d(t, t')\]

That is, since RF distance is symmetric, when T = T’ (such as when reference_dag=None), or when the intersection of T and T’ is nonempty, some distances are counted twice.

Note that when computing one-sided distances, or when the one_sided_coefficients values are not equal, this ‘distance’ is no longer symmetric.
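The double-counting remark can be checked with a toy symmetric distance (pure Python; `d` below is a stand-in, not an RF distance computation):

```python
def d(a, b):
    # Toy symmetric "distance": 0 if equal, else 1.
    return 0 if a == b else 1

T = ["t1", "t2"]  # here T == T', as when reference_dag=None
total = sum(d(t, u) for t in T for u in T)
# total == 2: d(t1, t2) and d(t2, t1) are both counted.
```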

average_pairwise_rf_distance(reference_dag=None, non_identical=True, **kwargs)[source]

Return the average Robinson-Foulds distance between pairs of histories.

Parameters:
  • reference_dag (Optional[HistoryDag]) – A history DAG from which to take the second history in each pair. If None, self will be used as the reference.

  • non_identical – If True, mean divisor will be the number of non-identical pairs.

  • kwargs – See historydag.sum_rf_distances() for additional keyword arguments

Returns:

The average rf-distance between pairs of histories, where the first history comes from this DAG, and the second comes from reference_dag. The normalization constant is the product of the number of histories in the two DAGs, unless non_identical is True, in which case the number of histories which appear in both DAGs is subtracted from this constant.
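The normalization rule can be sketched with hypothetical counts (all numbers below are made up for illustration):

```python
total_rf = 10.0        # hypothetical sum of pairwise RF distances
n_self, n_ref = 4, 5   # history counts in this DAG and reference_dag
shared = 2             # histories appearing in both DAGs

divisor_all = n_self * n_ref                      # non_identical=False
divisor_non_identical = n_self * n_ref - shared   # non_identical=True

average = total_rf / divisor_non_identical
# average == 10.0 / 18
```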

trim_optimal_weight(start_func=None, edge_weight_func=None, accum_func=None, optimal_func=None, eq_func=<function HistoryDag.<lambda>>)[source]

Trims the DAG to only express trees with optimal weight. This is guaranteed to be possible when edge_weight_func depends only on the labels of an edge’s parent and child node.

Requires that weights are of a type that supports reliable equality testing. In particular, floats are not recommended. Instead, consider defining weights to be a precursor type, and define optimal_func to choose the one whose corresponding float is maximized/minimized.

If floats must be used, a Numpy type may help.

Parameters:
Return type:

Any
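The caution about float equality can be illustrated with exact precursor weights (a sketch; Fraction is one possible precursor type, chosen here purely for illustration):

```python
from fractions import Fraction

# Float equality is unreliable for trimming decisions:
unreliable = (0.1 + 0.2 == 0.3)  # False, due to rounding error

# Exact precursor weights compare reliably; an optimal_func can still
# rank them by their float image.
weights = [Fraction(3, 10), Fraction(1, 10) + Fraction(2, 10)]
reliable = (weights[0] == weights[1])  # True: exact arithmetic
best = min(weights, key=float)
```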

get_topologies(collapse_leaves=False)[source]

Return a list of pseudo-newick representations of topologies in the history DAG.

The newicks returned are not well-formed, and are for use with HistoryDag.trim_topology(). Otherwise, this method would be equivalent to HistoryDag.to_newicks() with keyword arguments internal_labels=False and collapse_leaves as desired.

Parameters:

collapse_leaves (bool) – Whether to collapse leaf-adjacent edges between nodes with matching labels

Return type:

List[str]

Returns:

A list of strings, each representing a topology present in the history DAG.

trim_topology(topology, collapse_leaves=False)[source]

Trims the history DAG to express only trees matching the provided topology.

Parameters:
export_edge_probabilities()[source]

Return a dictionary keyed by (parent, child) HistoryDagNode pairs, with downward conditional edge probabilities as values.

get_probability_countfuncs(log_probabilities=False, edge_probabilities=None)[source]

Produce a historydag.utils.AddFuncDict() containing functions to compute history probabilities using e.g. HistoryDag.optimal_weight_annotate().

If no edge probabilities are provided, a method like HistoryDag.probability_annotate() should be called to set edge annotations correctly.

Parameters:
  • log_probabilities – If True, interpret all edge probabilities as log-probabilities

  • edge_probabilities – A dictionary containing conditional edge probabilities for each edge in the DAG. If not provided, edge probabilities are recovered from edge annotations.

Returns:

historydag.utils.AddFuncDict() containing functions to compute history probabilities using e.g. HistoryDag.optimal_weight_annotate()

sum_probability(log_probabilities=False, **kwargs)[source]

Compute the total probability of all histories in the DAG, using downward conditional edge probabilities.

Immediately after computing downward conditional probabilities, this should always return 1.

However, after trimming, this method returns the probability that a history in the trimmed DAG would be sampled from the original DAG.

Parameters:
  • log_probabilities – If True, interpret conditional edge probabilities as log-probabilities. In this case, the return value is a log-probability as well.

  • kwargs – The utils.AddFuncDict containing keyword arguments for counting probabilities returned from HistoryDag.get_probability_countfuncs(). If not provided, conditional edge probabilities annotated on the DAG will be used.

node_probabilities(log_probabilities=False, edge_weight_func=None, normalize_edgeweights=None, accum_func=None, aggregate_func=None, start_func=None, ua_node_val=None, collapse_key=None, adjust_func=None, **kwargs)[source]

Compute the probability of each node in the DAG.

Parameters:
  • log_probabilities – If True, all probabilities, and the values from edge_weight_func, will be treated as log values.

  • edge_weight_func – A function accepting a parent node and a child node and returning the weight associated to that edge. If not provided, it is assumed that correct edge probability annotations are already populated by a method such as HistoryDag.probability_annotate().

  • normalize_edgeweights – A function taking a list of weights and returning a normalized list of downward-conditional edge probabilities. The default is determined by log_probabilities.

  • accum_func – A function taking a list of probabilities for parts of a sub-history, and returning a probability for that sub-history. The default is determined by log_probabilities.

  • aggregate_func – A function taking a list of probabilities for alternative sub-histories, and returning the aggregated probability of all sub-histories. The default is determined by log_probabilities.

  • start_func – A function taking a leaf node and returning its starting weight. The default is determined by log_probabilities.

  • ua_node_val – The probability value for the UA node. If not provided, the default value is determined by log_probabilities.

  • collapse_key – A function accepting a HistoryDagNode and returning a key with respect to which node probabilities should be collapsed. The return type is the key type for the dictionary returned by this method. For example, to compute probabilities of each clade observed in the DAG, use collapse_key=HistoryDagNode.clade_union.

  • adjust_func (Optional[Callable[[HistoryDagNode, HistoryDagNode], float]]) – A function accepting an edge, and returning a factor by which to adjust confidence in the edge’s child node contributed by trees containing that edge.

Returns:

A dictionary keyed by HistoryDagNode objects (or the return values of collapse_key if provided) whose values are probabilities according to the distribution induced by downward-conditional edge probabilities in the DAG.

set_sample_mask(edge_selector, log_probabilities=False)[source]

Zero out edge weights for masked edges before calling HistoryDag.fast_sample(). This should be equivalent to passing the same edge_selector function to HistoryDag.sample().

Parameters:
  • edge_selector – A function accepting an edge (a tuple of HistoryDagNode objects) and returning True or False. An edge marked False will be ineligible for sampling, unless all other edges in the same edge set are also marked False.

  • log_probabilities – Since the mask is applied by modifying edge probabilities, one must specify whether those probabilities are on a log scale.

Take care to verify that you shouldn’t instead use HistoryDag.probability_annotate() with a choice of edge_weight_func that takes into account the masking preferences.

probability_annotate(edge_weight_func, log_probabilities=False, normalize_edgeweights=None, accum_func=None, aggregate_func=None, start_func=None, **kwargs)[source]

Uses the supplied edge weight function to compute conditional probabilities on edges.

Conditional probabilities are annotated on the DAG’s edges, so that future calls to e.g. HistoryDag.sample() use the probability distribution determined by them.

Parameters:
  • edge_weight_func – A function accepting a parent node and a child node and returning the weight associated to that edge.

  • log_probabilities – If True, all probabilities, and the values from edge_weight_func, will be treated as log values.

  • normalize_edgeweights – A function taking a list of weights and returning a normalized list of downward-conditional edge probabilities. The default is determined by log_probabilities.

  • accum_func – A function taking a list of probabilities for parts of a sub-history, and returning a probability for that sub-history. The default is determined by log_probabilities.

  • aggregate_func – A function taking a list of probabilities for alternative sub-histories, and returning the aggregated probability of all sub-histories. The default is determined by log_probabilities.

  • start_func – A function taking a leaf node and returning its starting weight. The default is determined by log_probabilities.

Returns:

The sum of un-normalized probabilities, according to the provided edge_weight_func. This value can be used to normalize history probabilities computed with the same edge_weight_func provided to this method (for example, weights returned by HistoryDag.weight_count()).
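The normalization step can be sketched for the default (non-log) case, assuming the default normalize_edgeweights simply scales the weights within one node-clade pair to sum to one (an assumption about the default's behavior):

```python
# Hypothetical weights of the edges descending from one node-clade pair.
weights = [2.0, 1.0, 1.0]
total = sum(weights)
probs = [w / total for w in weights]
# probs == [0.5, 0.25, 0.25]; they sum to 1 by construction.
```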

natural_distribution_annotate(log_probabilities=False)[source]

Set edge probabilities to 1/n, where n is the count of edges descending from the corresponding node-clade pair.

This induces the ‘natural’ distribution on histories, determined by the topology of the dag.
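Under this distribution, a history's probability is the product of 1/n over the node-clade pairs it passes through. A toy calculation (the edge counts below are hypothetical):

```python
descendant_edge_counts = [2, 3]  # n for each node-clade pair on one history
history_prob = 1.0
for n in descendant_edge_counts:
    history_prob *= 1.0 / n  # each chosen edge contributes 1/n
# history_prob == 1/6
```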

uniform_distribution_annotate(log_probabilities=False)[source]

Adjust edge probabilities so that the DAG expresses a uniform distribution on expressed trees.

The probability assigned to each edge below a clade is proportional to the number of subtrees possible below that edge.

make_uniform()[source]

Deprecated name for HistoryDag.uniform_distribution_annotate()

recompute_parents()[source]

Repopulate HistoryDagNode.parent attributes.

convert_to_collapsed()[source]

Rebuilds the DAG so that no edge connects two nodes with the same label, unless one is a leaf node.

The resulting DAG should express at least the collapsed histories present in the original.

add_node_at_all_possible_places(new_leaf_id, id_name='sequence')[source]

Inserts a sequence into the dag such that every tree in the dag now contains that new node.

This method adds the new node as a leaf node by connecting it as a child of every non-leaf node in the original dag. The resulting dag has one new node corresponding to the added sequence, as well as copies of all internal nodes that are parents (or more ancestral nodes) of the added sequence.

insert_node(new_leaf_id, id_name='sequence', edge_weight_func=<function TransitionModel.weighted_hamming_edge_weight.<locals>.edge_weight>)[source]

Inserts a sequence into the DAG.

Sequence will be inserted as a child of the dagnode(s) realizing the minimum overall distance between sequences, and also added to the dag as a child of other nodes in such a way as to guarantee that every tree in the DAG now contains the new sequence.

The choice of other nodes is computed by looking at the set of nodes that are incompatible with the first minimizing node. For a full description of this, see the docstring for the method-local function incompatible.

postorder_above(terminal_node, skip_ua_node=False, recompute_parents=True)[source]

Recursive postorder traversal of all ancestors of a (possibly internal) node. This traversal is postorder with respect to reversed edge directions. With respect to standard edge directions (pointing towards leaves), the traversal order guarantees that all of a node’s parents will be visited before the node itself.

Parameters:
  • terminal_node – The node whose ancestors should be included in the traversal. This must actually be a node in self, not simply compare equal to a node in self.

  • skip_ua_node – If True, the UA node will not be included in the traversal

  • recompute_parents – If False, node parent sets will not be recomputed. This makes many repeated calls to postorder_above much faster.

Returns:

Generator on nodes that lie on any path between terminal_node and the UA node

postorder(include_root=True)[source]

Recursive postorder traversal of the history DAG.

Return type:

Generator[HistoryDagNode, None, None]

Returns:

Generator on nodes
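A generic postorder sketch on plain `(label, children)` tuples (illustrative only, not historydag's implementation) shows the visiting order:

```python
def toy_postorder(node):
    """Yield labels of (label, children) tuples, children first."""
    label, children = node
    for child in children:
        yield from toy_postorder(child)
    yield label  # a node is visited only after all its descendants

tree = ("root", [("a", []), ("b", [("c", [])])])
order = list(toy_postorder(tree))
# order == ["a", "c", "b", "root"]: every node follows its children.
```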

preorder(skip_ua_node=False, skip_root=None)[source]

Recursive preorder traversal of the history DAG.

Careful! This is not guaranteed to visit a parent node before any of its children; for that, use a reverse postorder traversal.

If skip_ua_node is passed, the universal ancestor node will be skipped. skip_root is provided as a backwards-compatible synonym of skip_ua_node.

Return type:

Generator[HistoryDagNode, None, None]

Returns:

Generator on nodes