historydag.dag
A module providing the class HistoryDag, and supporting functions.
Functions
- A convenience function to print two histories as ascii art trees side-by-side.
- Convert
- Make a history DAG using a newick string.
- Build a tree-shaped
- Deprecated name for
- Deprecated name for
- Build a history DAG from a list of history DAGs which are histories.
- Build a history DAG from a list of newick strings.
- Take an iterable containing HistoryDagNodes, and build a HistoryDag from those nodes.
- Create a
- Produce a history DAG containing all binary topologies on the provided iterable of leaf labels.
Classes
- An object to represent a collection of internally labeled trees.
- Any class implementing a PreorderTreeBuilder interface can be used as a tree sample constructor in
Exceptions
- historydag.dag.convert(dag, newclass)[source]
Convert dag to the HistoryDag subclass newclass.
This is a wrapper for the newclass.from_history_dag method, which for most subclasses should be identical to HistoryDag.from_history_dag().
- class historydag.dag.PreorderTreeBuilder[source]
Any class implementing a PreorderTreeBuilder interface can be used as a tree sample constructor in HistoryDag.fast_sample(). Subclasses implementing this interface may implement an arbitrary constructor interface, as the user will be responsible for creating instances to be used for sampling. In addition, subclasses must implement the following methods:
- add_node()
This method must accept a HistoryDagNode object dag_node and, optionally, a TreeBuilderNode instance parent, representing the parent node of the node to be added. It must return a TreeBuilderNode instance representing the added node in the sampled tree. TreeBuilderNode can be any type which is convenient for the internal implementation of the PreorderTreeBuilder subclass. This method can expect to be called on nodes in a sampled tree in a pre-ordering. A parent node will always be provided unless dag_node is the root node.
- get_finished_tree()
This method takes no arguments and returns the data defining the sampled tree, after any necessary clean-up or final tree construction steps. Its return value is the return value of HistoryDag.fast_sample().
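The two required methods can be illustrated with a minimal sketch. It deliberately avoids importing historydag so that it runs on its own: the EdgeListBuilder class and the stand-in node objects (SimpleNamespace instances with a label attribute) are hypothetical. A real builder would subclass PreorderTreeBuilder and receive HistoryDagNode objects from HistoryDag.fast_sample().

```python
from types import SimpleNamespace


class EdgeListBuilder:
    """Hypothetical builder recording a sampled tree as (parent, child) label pairs."""

    def __init__(self):
        self.root = None
        self.edges = []

    def add_node(self, dag_node, parent=None):
        # The "TreeBuilderNode" in this sketch is simply the node's label.
        built = dag_node.label
        if parent is None:
            # No parent is passed only for the root of the sampled tree.
            self.root = built
        else:
            self.edges.append((parent, built))
        return built

    def get_finished_tree(self):
        # Final construction step: return an immutable snapshot of the tree.
        return {"root": self.root, "edges": tuple(self.edges)}


# Stand-ins for HistoryDagNode objects, added in a preorder traversal.
builder = EdgeListBuilder()
root = builder.add_node(SimpleNamespace(label="root"))
inner = builder.add_node(SimpleNamespace(label="inner"), parent=root)
builder.add_node(SimpleNamespace(label="leaf_a"), parent=inner)
builder.add_node(SimpleNamespace(label="leaf_b"), parent=inner)
builder.add_node(SimpleNamespace(label="leaf_c"), parent=root)
tree = builder.get_finished_tree()
```

Because add_node returns whatever object the builder finds convenient, the caller only needs to pass that object back as parent for the node's children.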
- class historydag.dag.EteTreeBuilder(name_func=lambda n: ..., features=[], feature_funcs={})[source]
- class historydag.dag.HistoryDag(dagroot, attr={})[source]
An object to represent a collection of internally labeled trees. A wrapper object to contain exposed HistoryDag methods and point to a HistoryDagNode root.
- Parameters:
dagroot (HistoryDagNode) – The root node of the history DAG
attr (Any) – An attribute to contain data which will be preserved by copying (default an empty dict)
Subclassing HistoryDag: HistoryDag may be subclassed without overriding __init__, by defining a _required_label_fields class variable for any subclasses.
The value of _required_label_fields should be a dictionary keyed by the label fields that methods of the subclass expect. Each dictionary entry shall be of the form required_field: [(from_fields, conversion_func), …], where the dict value is a list of tuples, with each conversion_func a function mapping a HistoryDagNode to the value of that node label’s required_field field, and from_fields a tuple containing all label fields expected by that function.
Keyword arguments passed to HistoryDag.from_history_dag() will be passed to conversion functions provided in the appropriate subclass’s _required_label_fields attribute. Be sure to document each subclass, including available conversion functions and their keywords, in each subclass’s docstring.
- classmethod from_history_dag(dag, label_fields=None, **kwargs)[source]
Converts HistoryDag instances between subclasses of HistoryDag. No copy is performed, so the passed dag will in general be modified.
- Parameters:
dag (HistoryDag) – A HistoryDag (or subclass) instance
label_fields (Optional[Sequence[str]]) – A list specifying the order of label fields in node labels on the resulting HistoryDag
kwargs – Any additional arguments required for label conversions. For details, see the class docstring for the subclass into which the conversion is taking place.
- Returns:
The converted HistoryDag object, carrying the type from which this class method was called. After conversion to the new HistoryDag subclass to_cls, the following will be true about node labels:
- If passed label_fields is None, then existing label fields will be preserved, except that missing required label fields will be recovered if possible, and the existing label fields used to recover them will be omitted. Recovered label fields will appear before the existing label fields.
- If passed label_fields is not None, then it must include all fields expected in node labels in the converted history DAG object; otherwise an exception will be raised.
- Converted node label field order will match the order of passed label_fields.
- All label fields passed in label_fields will be included in converted node labels, if possible. Otherwise, an exception will be raised.
- get_histories_by_index(key_iterator, tree_builder_func=None)[source]
Retrieving a history by index is slow, since each retrieval requires running the trim_optimal_weight method on the entire DAG to populate node counts. This method instead runs that method a single time and yields a history for each index yielded by key_iterator.
- Parameters:
key_iterator – An iterator on desired history indices. May be consumable, as it will only be used once.
tree_builder_func – A function accepting an index and returning a PreorderTreeBuilder instance to be used to build the history with that index. If None (default), then tree-shaped HistoryDag objects will be yielded using PreorderHistoryBuilder.
- trim_below_weight(max_weight, start_func=None, edge_weight_func=None, min_possible_weight=-inf)[source]
Trim the dag to contain at least all the histories within the specified weight range.
Supports totally ordered weights, accumulated by addition. A weight type must implement all ordering operators properly, as well as + and -, and addition and subtraction must respect the ordering. That is, if a < b, then a + c < b + c for any c (including negative c).
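The requirement on weight types can be made concrete with a small sketch. LexWeight is a hypothetical weight type, not part of historydag: a pair of integers compared lexicographically and added componentwise. The check at the end verifies by brute force, over a small grid of values, that addition and subtraction respect the ordering.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True, order=True)
class LexWeight:
    """Hypothetical weight: compared lexicographically, added componentwise."""

    primary: int
    secondary: int

    def __add__(self, other):
        return LexWeight(self.primary + other.primary, self.secondary + other.secondary)

    def __sub__(self, other):
        return LexWeight(self.primary - other.primary, self.secondary - other.secondary)


# Brute-force check: a < b implies a + c < b + c and a - c < b - c.
values = [LexWeight(p, s) for p, s in product(range(-2, 3), repeat=2)]
order_respected = all(
    (a + c < b + c) and (a - c < b - c)
    for a in values
    for b in values
    for c in values
    if a < b
)
```

A type that fails this check, for example one whose addition saturates at a maximum value, would not satisfy the stated requirement.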
- get_histories()[source]
Return a generator containing all histories in the history DAG.
Note that each history is a tree-shaped history DAG, containing a UA node, which exists as a subgraph of the history DAG.
The order of these histories does not necessarily match the order of indexing. That is, dag.get_histories() and (history for history in dag) may result in different orderings. get_histories should be slightly faster, but possibly more memory intensive.
- Return type:
- get_trees()[source]
Deprecated name for get_histories()
- Return type:
- get_edges(skip_ua_node=False)[source]
Return a generator containing all edges in the history DAG, as parent, child node tuples.
Edges’ parent nodes will be in preorder.
- Return type:
Generator[Tuple[HistoryDagNode, HistoryDagNode], None, None]
- get_annotated_edges(skip_ua_node=False)[source]
Return a generator containing all edges in the history DAG, and their weights and downward conditional edge probabilities.
Yields ((parent, child), weight, probability) for each edge.
Edges’ parent nodes will be in preorder.
- Return type:
Generator[Tuple[HistoryDagNode, HistoryDagNode], None, None]
- num_edges(skip_ua_node=False)[source]
Return the number of edges in the DAG, including edges descending from the UA node, unless skip_ua_node is True.
- Return type:
- find_nodes(filter_func)[source]
Return a generator on (non-UA) nodes for which filter_func evaluates to True.
- Return type:
- find_node(filter_func)[source]
Return the first (non-UA) node for which filter_func evaluates to True.
- Return type:
- fast_sample(tree_builder=None, log_probabilities=False)[source]
This is a non-recursive alternative to HistoryDag.sample(), which is likely to be slower on small DAGs, but may allow significant optimizations on large DAGs, or in the case that the data format being sampled is something other than a tree-shaped HistoryDag object.
This method does not provide an edge_selector argument like HistoryDag.sample(). Instead, any masking of edges should be done prior to sampling using the HistoryDag.set_sample_mask() method, or by modifying the arguments to HistoryDag.probability_annotate().
- Parameters:
tree_builder (Optional[PreorderTreeBuilder]) – a PreorderTreeBuilder instance to handle construction of the sampled tree.
log_probabilities – Whether edge probabilities annotated on this DAG (using, for example, HistoryDag.probability_annotate()) are on a log-scale.
- sample(edge_selector=lambda e: ..., log_probabilities=False)[source]
Samples a history from the history DAG. (A history is a sub-history DAG containing the root and all leaf nodes.) For reproducibility, call random.seed before sampling.
When there is an option, edges pointing to nodes on which edge_selector is True will always be chosen.
Returns a new HistoryDag object.
To use the more general sampling pattern which allows an arbitrary PreorderTreeBuilder object, use HistoryDag.fast_sample() instead.
- Return type:
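The edge_selector behavior and the role of random.seed can be sketched on a single sampling step. This is an illustration only, not the library's implementation: choose_edge is a hypothetical helper, edges are represented as plain (parent, child) tuples, and the default selector that accepts every edge is an assumption.

```python
import random


def choose_edge(edges, probabilities, edge_selector=lambda e: True):
    """Pick one edge, restricting to selected edges whenever any exist."""
    pairs = list(zip(edges, probabilities))
    preferred = [(e, p) for e, p in pairs if edge_selector(e)]
    # Fall back to all edges when the selector rejects every option.
    candidates = preferred if preferred else pairs
    choices, weights = zip(*candidates)
    return random.choices(choices, weights=weights, k=1)[0]


edges = [("P", "A"), ("P", "B"), ("P", "C")]
probabilities = [0.5, 0.3, 0.2]

# An edge on which the selector is True is always chosen.
forced = {choose_edge(edges, probabilities, lambda e: e[1] == "B") for _ in range(20)}

# Seeding the random module makes a run of samples reproducible.
random.seed(0)
first_run = [choose_edge(edges, probabilities) for _ in range(10)]
random.seed(0)
second_run = [choose_edge(edges, probabilities) for _ in range(10)]
```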
- nodes_above_node(node)[source]
Return a set of nodes from which the passed node is reachable along directed edges.
- Return type:
- sample_with_node(node)[source]
Samples a history which contains node from the history DAG.
Sampling is likely unbiased from the distribution of trees in the DAG, conditioned on each sampled tree containing the passed node. However, if unbiased sampling from the conditional distribution is important, this should be tested.
- Return type:
- sample_with_edge(edge)[source]
Samples a history which contains edge (a tuple of HistoryDagNodes) from the history DAG.
Sampling is likely unbiased from the distribution of trees in the DAG, conditioned on each sampled tree containing the passed edge. However, if unbiased sampling from the conditional distribution is important, this should be tested.
- Return type:
- iter_covering_histories(cover_edges=False)[source]
Samples a sequence of histories which together contain all nodes in the history DAG.
Histories are sampled using sample_with_node(), starting with the nodes which are contained in the fewest of the DAG’s histories. The sequence of trees is therefore non-deterministic unless random.seed is set.
- Return type:
- unlabel()[source]
Sets all internal node labels to be identical, and merges nodes so that all histories in the DAG have unique topologies.
- Return type:
- relabel(relabel_func, relax_type=False)[source]
Return a new HistoryDag with labels modified according to a provided function.
- Parameters:
relabel_func (Callable[[HistoryDagNode], Union[NamedTuple, UALabel]]) – A function which takes a node and returns the new label appropriate for that node. The relabel_func should return a consistent NamedTuple type with name Label. That is, all returned labels should have matching _fields attributes. This method is only guaranteed to work when no two leaf nodes are mapped to the same new label. If this is not the case, this method may raise a warning or error, or may fail silently, returning an invalid HistoryDag.
relax_type – Whether to relax the requirement that the returned HistoryDag be of the same subclass as self. If True, the returned HistoryDag will be of the abstract type HistoryDag.
- Return type:
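The two conditions placed on relabel_func (a consistent NamedTuple type named Label, and no two leaves mapped to the same label) can be checked before relabeling. The sketch below uses hypothetical stand-in nodes and field names rather than real HistoryDagNode objects, so it runs without historydag.

```python
from collections import namedtuple
from types import SimpleNamespace

OldLabel = namedtuple("Label", ["sequence", "name"])
Label = namedtuple("Label", ["sequence"])


def relabel_func(node):
    # Hypothetical relabeling: drop "name" and normalize the sequence case.
    return Label(sequence=node.label.sequence.upper())


# Stand-ins for the leaf nodes of a history DAG.
leaves = [
    SimpleNamespace(label=OldLabel("acgt", "leaf1")),
    SimpleNamespace(label=OldLabel("acga", "leaf2")),
    SimpleNamespace(label=OldLabel("ccga", "leaf3")),
]
new_labels = [relabel_func(leaf) for leaf in leaves]

# All returned labels share one set of fields.
consistent_fields = len({label._fields for label in new_labels}) == 1
# No two leaves collapse onto the same new label.
leaves_stay_distinct = len(set(new_labels)) == len(leaves)
```

A relabel_func that discarded the sequence field instead would map all three leaves to the same label, which is the situation the parameter description warns about.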
- add_label_fields(new_field_names=[], new_field_values=lambda n: ...)[source]
Returns a copy of the DAG in which each node’s label is extended to include the new fields listed in new_field_names.
- Parameters:
new_field_names – A list of strings consisting of the names of the new fields to add.
new_field_values – A callable that takes a node and returns the ordered list of values for each new field name to assign to that node.
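The relationship between new_field_names and new_field_values can be sketched on a single label. The helper extend_label and the field names length and g_count are hypothetical; the point is only that the callable returns one value per new field name, in the same order.

```python
from collections import namedtuple
from types import SimpleNamespace


def extend_label(label, new_field_names, values):
    """Build a label with the old fields followed by the new ones."""
    extended_type = namedtuple("Label", label._fields + tuple(new_field_names))
    return extended_type(*label, *values)


Label = namedtuple("Label", ["sequence"])
node = SimpleNamespace(label=Label(sequence="ACGT"))  # stand-in for a DAG node

new_field_names = ["length", "g_count"]


def new_field_values(n):
    # Values are ordered to match new_field_names.
    return [len(n.label.sequence), n.label.sequence.count("G")]


new_label = extend_label(node.label, new_field_names, new_field_values(node))
```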
- remove_label_fields(fields_to_remove=[])[source]
Returns a copy of the DAG with the list of fields_to_remove dropped from each node’s label.
- Parameters:
fields_to_remove – A list of strings consisting of the names of the fields to remove.
- update_label_fields(field_names, new_field_values)[source]
Changes label field values to values returned by the function new_field_values. This method is not in-place, but returns a new DAG.
- Parameters:
field_names – A list of strings containing names of label fields whose contents are to be modified
new_field_values – A function taking a node and returning a tuple of field values whose order matches field_names
- is_history()[source]
Returns whether history DAG is a history.
That is, each node-clade pair has exactly one descendant edge.
- Return type:
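The node-clade condition can be illustrated on a toy structure. The representation here is an assumption made for the example and is not how historydag stores a DAG: each node maps to a dictionary from child clade (a frozenset of leaf names) to the list of alternative child nodes below that clade.

```python
def is_history(dag):
    """True when every node-clade pair has exactly one descendant edge."""
    return all(
        len(children) == 1
        for clades in dag.values()
        for children in clades.values()
    )


leaves = {"a": {}, "b": {}, "c": {}}

# Two alternative resolutions below the UA node, so not a history.
toy_dag = {
    "UA": {frozenset("abc"): ["root1", "root2"]},
    "root1": {frozenset("ab"): ["ab"], frozenset("c"): ["c"]},
    "root2": {frozenset("a"): ["a"], frozenset("bc"): ["bc"]},
    "ab": {frozenset("a"): ["a"], frozenset("b"): ["b"]},
    "bc": {frozenset("b"): ["b"], frozenset("c"): ["c"]},
    **leaves,
}

# Keeping a single choice below every clade yields one history.
toy_history = {
    "UA": {frozenset("abc"): ["root1"]},
    "root1": {frozenset("ab"): ["ab"], frozenset("c"): ["c"]},
    "ab": {frozenset("a"): ["a"], frozenset("b"): ["b"]},
    **leaves,
}

dag_is_history = is_history(toy_dag)
history_is_history = is_history(toy_history)
```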
- is_clade_tree()[source]
Deprecated name for is_history()
- Return type:
- copy()[source]
Uses bytestring serialization, and is guaranteed to copy:
- node labels
- node attr attributes
- edge weights
- edge probabilities
However, other object attributes will not be copied.
- Return type:
HistoryDag
- history_intersect(reference_dag, key=lambda n: ...)[source]
Modify this HistoryDag to contain only the histories which are also contained in reference_dag.
- Parameters:
reference_dag (HistoryDag) – The history DAG with which this one will be intersected. reference_dag will not be modified.
key – A function accepting a node and returning a value which will be used to compare nodes.
Count the histories which are also contained in reference_dag.
- Parameters:
reference_dag (HistoryDag) – The history DAG with which this one will be intersected. reference_dag will not be modified.
key – A function accepting a node and returning a value which will be used to compare nodes.
- Return type:
- Returns:
The number of histories shared between this history DAG and the reference.
- add_all_allowed_edges(*args, **kwargs)[source]
Provided as a deprecated synonym for make_complete().
- Return type:
- make_complete(new_from_root=True, adjacent_labels=True, preserve_parent_labels=False)[source]
Add all allowed edges to the DAG in place.
- Parameters:
new_from_root (bool) – If False, no edges will be added that start at the DAG root. Useful when attempting to constrain root label.
adjacent_labels (bool) – If False, no edges will be added between nodes with the same labels. Useful when attempting to maintain the history DAG in a ‘collapsed’ state.
preserve_parent_labels (bool) – If True, ensures that for any edge added between a parent and child node, the parent node label was already among the original parent labels of the child node. This ensures that parsimony score is preserved.
- Return type:
- Returns:
The number of edges added to the history DAG
- to_newick(name_func=<function HistoryDag.<lambda>>, features=None, feature_funcs={})[source]
Converts a history to extended newick format. Supports arbitrary node names and a sequence feature. For use on a history DAG which is a history.
For extracting newick representations of trees in a general history DAG, see HistoryDag.to_newicks().
- Parameters:
name_func (Callable[[HistoryDagNode], str]) – A map from nodes to newick node names
features (Optional[List[str]]) – A list of label field names to be included in extended newick data. If None, all label fields will be included. To include none of them, pass an empty list.
feature_funcs (Mapping[str, Callable[[HistoryDagNode], str]]) – A dictionary keyed by extended newick field names, containing functions specifying how to populate that field for each node.
- Return type:
- Returns:
A newick string. If features is an empty list, and feature_funcs is empty, then this will be a standard newick string. Otherwise, it will have ete3’s extended newick format.
- to_ascii(name_func, **kwargs)[source]
A convenience function that uses the to_ete() method and ete3’s ASCII drawing tools to render a history.
See HistoryDagNode.to_ascii() for details.
- to_ete(name_func=<function HistoryDag.<lambda>>, features=None, feature_funcs={})[source]
Convert a history DAG which is a history to an ete tree.
- Parameters:
name_func (Callable[[HistoryDagNode], str]) – A map from nodes to newick node names
features (Optional[List[str]]) – A list of label field names to be included in extended newick data. If None, all label fields will be included. To include none of them, pass an empty list.
feature_funcs (Mapping[str, Callable[[HistoryDagNode], str]]) – A dictionary keyed by extended newick field names, containing functions specifying how to populate that field for each node.
- Return type:
TreeNode
- Returns:
An ete3 Tree with the same topology as self, and node names and attributes as specified.
- to_graphviz(labelfunc=None, namedict={}, show_child_clades=True, show_partitions=None, level_leaves=False, graph_attr={}, node_attr={}, edge_attr={}, edge_attr_inheritance='none', show_edge_probs=False, show_edge_weights=False)[source]
Converts history DAG to graphviz (dot format) Digraph object.
- Parameters:
labelfunc (Optional[Callable[[HistoryDagNode], str]]) – A function to label nodes. If None, nodes will be labeled by their DAG node labels, or their label hash if label data is too large.
namedict (Mapping[Union[NamedTuple, UALabel], str]) – A dictionary from node labels to label strings. labelfunc will be used instead, if both are provided.
show_child_clades (bool) – Whether to include child clades in output.
show_partitions (Optional[bool]) – Deprecated alias for show_child_clades.
level_leaves (bool) – Whether to draw leaves on the same level, or wherever they fall naturally.
graph_attr (dict) – Additional graphviz graph attributes (see graphviz docs)
node_attr (dict) – Additional graphviz node attributes (see graphviz docs)
edge_attr (dict) – Additional graphviz edge attributes (see graphviz docs)
edge_attr_inheritance (str) – “parent” to inherit from parent node, “child” to inherit from child, or “none”.
show_edge_probs (bool) – Whether to show edge probabilities
show_edge_weights (bool) – Whether to show edge weights
- Return type:
Digraph
Notes
Graphviz dot format attributes are documented at https://graphviz.org/doc/info/attrs.html
The graphviz attributes passed to this method are for the entire graph. Attributes for individual nodes can be included in individual node attr dictionaries under the key gv_attrs. For example, node.attr['gv_attrs'] = {'color': 'red'} will color a node red in the graphviz output.
- internal_avg_parents()[source]
Returns the average number of parents among internal nodes.
A simple measure of similarity between the trees that the DAG expresses. However, keep in mind that two trees with the same topology but different labels would be considered entirely unalike by this measure.
- Return type:
- explode_nodes(expand_func=parsimony_utils.default_nt_transitions.ambiguity_map.get_sequence_resolution_func('sequence'), expand_node_func=None, expandable_func=None)[source]
Explode nodes according to a provided function. Adds copies of each node to the DAG with exploded labels, but with the same parents and children as the original node.
- Parameters:
expand_func (Callable[[Union[NamedTuple, UALabel]], Iterable[Union[NamedTuple, UALabel]]]) – (Deprecated) A function that takes a node label, and returns an iterable containing ‘exploded’ or ‘disambiguated’ labels corresponding to the original. The wrapper utils.explode_label() is provided to make such a function easy to write.
expand_node_func (Optional[Callable[[HistoryDagNode], Iterable[Union[NamedTuple, UALabel]]]]) – A function that takes a node and returns an iterable containing ‘exploded’ or ‘disambiguated’ labels corresponding to the node. If provided, expand_func will be ignored.
expandable_func (Optional[Callable[[Union[NamedTuple, UALabel]], bool]]) – A function that takes a node label, and returns whether the iterable returned by calling expand_func on that label would contain more than one item.
- Return type:
- Returns:
The number of new nodes added to the history DAG.
- leaf_path_uncertainty_dag(terminal_node, node_data_func=lambda n: ...)[source]
Create a DAG of possible paths leading to terminal_node
- Parameters:
terminal_node – The returned path DAG will contain all paths from the UA node ending at this node.
node_data_func – A function accepting a HistoryDagNode and returning data for the corresponding node in the path dag. Return type must be a valid dictionary key.
- Returns:
A dictionary keyed by return values of node_data_func, with each value a dictionary keyed by child nodes, with edge supports as values.
- Return type:
child_dictionary
- leaf_path_uncertainty_graphviz(terminal_node, node_data_func=lambda n: ..., node_label_func=lambda n: ...)[source]
Create a graphviz DAG of possible paths leading to terminal_node
- Parameters:
terminal_node – The returned path DAG will contain all paths from the UA node ending at this node.
node_data_func – A function accepting a HistoryDagNode and returning data for the corresponding node in the path dag. Return type must be a valid dictionary key.
node_label_func – A function accepting an object of the type returned by node_data_func, and returning a label to be displayed on the corresponding node.
- label_uncertainty_summary()[source]
Print information about internal nodes which have the same child clades but different labels.
- postorder_history_accum(leaf_func, edge_func, accum_within_clade, accum_between_clade, accum_above_edge=None, compute_edge_probabilities=False, normalize_edgeweights=None)[source]
A template method for leaf-to-root dynamic programming.
Intermediate computations are stored in a _dp_data attribute on each node. Note that a Weight can be whatever you like, such as integers, Counters, strings, or dictionaries.
- Parameters:
leaf_func (Callable[[HistoryDagNode], Any]) – A function to assign weights to leaf nodes
edge_func (Callable[[HistoryDagNode, HistoryDagNode], Any]) – A function to assign weights to edges. The parent node will always be the first argument.
accum_within_clade (Callable[[List[Any]], Any]) – A function which accumulates a list of weights of subtrees below a single clade. That is, the weights are for alternative trees.
accum_between_clade (Callable[[List[Any]], Any]) – A function which accumulates a list of weights of subtrees below different clades. That is, the weights are for different parts of the same tree.
accum_above_edge (Optional[Callable[[Any, Any], Any]]) – A function which adds the weight for a subtree to the weight of the edge above it. If None, this function will be inferred from accum_between_clade. The edge weight is the second argument.
compute_edge_probabilities (bool) – If True, compute downward-conditional edge probabilities, proportional to aggregated subtree weights below and including each edge descending from a node-clade pair.
- Return type:
- Returns:
The resulting weight computed for the History DAG UA (root) node.
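How the accumulation functions fit together can be seen in a small standalone sketch of leaf-to-root dynamic programming. It is not the library's implementation: the toy DAG representation (each node maps to a list of clades, each clade a list of alternative children), the recursive traversal, and the edge costs are assumptions made for illustration. Changing only the accumulation functions switches the computation from counting histories to finding a minimum weight.

```python
from math import prod

# Toy DAG: node -> list of clades; each clade lists alternative child nodes.
toy_dag = {
    "UA": [["root1", "root2"]],
    "root1": [["ab", "ab_alt"], ["c"]],
    "root2": [["a"], ["bc"]],
    "ab": [["a"], ["b"]],
    "ab_alt": [["a"], ["b"]],
    "bc": [["b"], ["c"]],
    "a": [], "b": [], "c": [],
}


def postorder_accum(dag, root, leaf_func, edge_func,
                    accum_within_clade, accum_between_clade, accum_above_edge):
    memo = {}  # plays the role of per-node intermediate data

    def visit(node):
        if node not in memo:
            clades = dag[node]
            if not clades:
                memo[node] = leaf_func(node)
            else:
                memo[node] = accum_between_clade([
                    accum_within_clade([
                        accum_above_edge(visit(child), edge_func(node, child))
                        for child in clade
                    ])
                    for clade in clades
                ])
        return memo[node]

    return visit(root)


# Counting histories: alternatives add, parts of one tree multiply.
n_histories = postorder_accum(
    toy_dag, "UA", lambda n: 1, lambda p, c: 1,
    sum, prod, lambda subtree, edge: subtree * edge,
)

# Minimum weight: alternatives take the min, parts of one tree add.
edge_costs = {("UA", "root2"): 0}  # hypothetical costs, every other edge costs 1
best_weight = postorder_accum(
    toy_dag, "UA", lambda n: 0, lambda p, c: edge_costs.get((p, c), 1),
    min, sum, lambda subtree, edge: subtree + edge,
)
```

In this toy DAG, root1 contributes two histories (one per alternative below its first clade) and root2 contributes one.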
- postorder_cladetree_accum(*args, **kwargs)[source]
Deprecated name for HistoryDag.postorder_history_accum()
- Return type:
- optimal_weight_annotate(start_func=None, edge_weight_func=None, accum_func=None, optimal_func=None, **kwargs)[source]
A template method for finding the optimal tree weight in the DAG. Dynamically annotates each node in the DAG with the optimal weight of a clade sub-tree beneath it, so that the DAG root node is annotated with the optimal weight of a history in the DAG.
- Parameters:
start_func (Optional[Callable[[HistoryDagNode], Any]]) – A function which assigns starting weights to leaves.
edge_weight_func (Optional[Callable[[HistoryDagNode, HistoryDagNode], Any]]) – A function which assigns weights to DAG edges based on the parent node and the child node, in that order.
accum_func (Optional[Callable[[List[Any]], Any]]) – A function which takes a list of weights of different parts of a tree, and returns a weight, like sum.
optimal_func (Optional[Callable[[List[Any]], Any]]) – A function which takes a list of weights and returns the optimal one, like min.
- Return type:
- Returns:
The optimal weight of a tree under the DAG UA node.
- count_optimal_histories(start_func=None, edge_weight_func=None, accum_func=None, optimal_func=None, eq_func=<function HistoryDag.<lambda>>, **kwargs)[source]
Count the number of histories which would be left if the DAG were trimmed.
That is, how many histories would be left if HistoryDag.trim_optimal_weight() were called with the same arguments?
- Parameters:
All arguments are the same as HistoryDag.trim_optimal_weight().
- Returns:
A utils.IntState object containing the number of optimal histories in the DAG, with state attribute containing their (optimal) weight.
As a side-effect, each node’s _dp_data attribute is populated with IntState objects containing the number of optimal sub-histories rooted at that node, and the weight of those sub-histories.
- sum_weights(edge_weight_func=None, **kwargs)[source]
For weights which are a sum over edges, compute the sum of all tree weights in the DAG.
- weight_count(start_func=None, edge_weight_func=None, accum_func=None, **kwargs)[source]
A template method for counting weights of trees expressed in the history DAG.
Weights must be hashable, but may otherwise be of arbitrary type.
Default arguments are contained in this HistoryDag subclass’s _default_args variable, and are documented in the subclass docstring.
- Parameters:
start_func (Optional[Callable[[HistoryDagNode], Any]]) – A function which assigns a weight to each leaf node
edge_weight_func (Optional[Callable[[HistoryDagNode, HistoryDagNode], Any]]) – A function which assigns a weight to pairs of labels, with the parent node label the first argument
accum_func (Optional[Callable[[List[Any]], Any]]) – A way to ‘add’ a list of weights together
- Returns:
A Counter keyed by weights.
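The idea of a Counter keyed by weights can be sketched on a toy DAG. This is an illustration under stated assumptions, not the library's code: the DAG representation (node to list of clades, each clade a list of alternative children), the integer edge weights, and the helper names are all hypothetical, with leaves starting at weight 0 and weights accumulated by addition.

```python
from collections import Counter
from functools import reduce

toy_dag = {
    "UA": [["root1", "root2"]],
    "root1": [["ab", "ab_alt"], ["c"]],
    "root2": [["a"], ["bc"]],
    "ab": [["a"], ["b"]],
    "ab_alt": [["a"], ["b"]],
    "bc": [["b"], ["c"]],
    "a": [], "b": [], "c": [],
}
edge_costs = {("UA", "root2"): 0}  # hypothetical, every other edge weighs 1


def edge_weight(parent, child):
    return edge_costs.get((parent, child), 1)


def combine(left, right):
    # Parts of the same tree: weights add, counts multiply.
    out = Counter()
    for w1, n1 in left.items():
        for w2, n2 in right.items():
            out[w1 + w2] += n1 * n2
    return out


def weight_count(dag, node, memo=None):
    memo = {} if memo is None else memo
    if node not in memo:
        if not dag[node]:
            memo[node] = Counter({0: 1})  # a leaf is one subtree of weight 0
        else:
            per_clade = []
            for clade in dag[node]:
                alternatives = Counter()  # alternative subtrees: counts add
                for child in clade:
                    shift = edge_weight(node, child)
                    for w, n in weight_count(dag, child, memo).items():
                        alternatives[w + shift] += n
                per_clade.append(alternatives)
            memo[node] = reduce(combine, per_clade)
    return memo[node]


weights = weight_count(toy_dag, "UA")
```

Here the two histories through root1 each use five unit-weight edges, while the history through root2 has a zero-weight edge above its root, giving one history of weight 4.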
- weight_range_annotate(start_func=None, edge_weight_func=None, accum_func=None, min_func=<built-in function min>, max_func=<built-in function max>, **kwargs)[source]
Computes the minimum and maximum weight of any history in the history DAG.
As a side-effect, this method also stores in each node’s _dp_data attribute a tuple containing the minimum and maximum weights of any sub-history beneath that node.
- Parameters:
start_func (Optional[Callable[[HistoryDagNode], Any]]) – A function which assigns a weight to each leaf node
edge_weight_func – A function which assigns a weight to pairs of labels, with the parent node label the first argument
accum_func (Optional[Callable[[List[Any]], Any]]) – A way to ‘add’ a list of weights together
min_func (Callable[[List[Any]], Any]) – A function which takes a list of weights and returns their “minimum”
max_func (Callable[[List[Any]], Any]) – A function which takes a list of weights and returns their “maximum”
- Returns:
A tuple containing the minimum and maximum weight of any history in the history DAG.
- hamming_parsimony_count()[source]
Deprecated in favor of sequence_dag.SequenceHistoryDag.hamming_parsimony_count().
- to_newicks(**kwargs)[source]
Returns a list of extended newick strings formed with label fields.
Arguments are passed to utils.make_newickcountfuncs(). Arguments are the same as for historydag.HistoryDag.to_newick().
- count_topologies(collapse_leaves=False)[source]
Counts the number of unique topologies in the history DAG. This is achieved by counting the number of unique newick strings with only leaves labeled.
count_histories() gives the total number of unique trees in the DAG, taking into account internal node labels.
For large DAGs, this method is prohibitively slow. Use count_topologies_fast() instead.
- Parameters:
collapse_leaves (bool) – By default, topologies are counted as-is in the DAG. However, even if the DAG is collapsed by label, edges above leaf nodes will not be collapsed. If collapse_leaves is True, then the number of unique topologies with all leaf-adjacent edges collapsed will be counted. Assumes that the DAG is collapsed with HistoryDag.convert_to_collapsed().
- Return type:
- Returns:
The number of topologies in the history DAG
- count_topologies_fast()[source]
Counts the number of unique topologies in the history DAG.
This is achieved by creating a new history DAG in which all internal nodes have matching labels.
This is only guaranteed to match the output of count_topologies if the DAG has all allowed edges added.
- Return type:
- count_trees(*args, **kwargs)[source]
Deprecated name for count_histories()
- count_histories(expand_func=None, expand_count_func=lambda ls: ..., bifurcating=False)[source]
Annotates each node in the DAG with the number of clade sub-trees underneath.
- Parameters:
expand_func (Optional[Callable[[Union[NamedTuple, UALabel]], List[Union[NamedTuple, UALabel]]]]) – A function which takes a label and returns a list of labels, for example disambiguations of an ambiguous sequence. If provided, this method will count at least the number of histories that would be in the DAG, if explode_nodes() were called with the same expand_func.
expand_count_func (Callable[[Union[NamedTuple, UALabel]], int]) – A function which takes a label and returns an integer value corresponding to the number of ‘disambiguations’ of that label. If provided, expand_func will be used to find this value.
bifurcating – If True, the number of bifurcating topologies possible below each node will be computed. This is only an underestimate of the true number, since nodes that would be created by adding all resolutions of multifurcating nodes may already be present, resulting in additional subtree swaps.
- Returns:
The total number of unique complete trees below the root node. If expand_func or expand_count_func is provided, the complete trees being counted are not guaranteed to be unique. If bifurcating is True, then the values stored in nodes’ _dp_data attributes will include all resolutions of multifurcations below a node, but not of a node’s own multifurcation. To get the number of bifurcating subtrees below a node, one can use node._dp_data * utils.count_labeled_binary_topologies(len(node.clades)).
- preorder_history_accum(leaf_func, edge_func, accum_within_clade, accum_between_clade, ua_start_val, accum_above_edge=None)[source]
A template method for leaf-to-root and root-to-leaf dynamic programming.
- Parameters:
leaf_func (Callable[[HistoryDagNode], Any]) – A function to assign weights to leaf nodes
edge_func (Callable[[HistoryDagNode, HistoryDagNode], Any]) – A function to assign weights to edges. The parent node will always be the first argument.
accum_within_clade (Callable[[List[Any]], Any]) – A function which accumulates a list of weights of subtrees below a single clade. That is, the weights are for alternative trees.
accum_between_clade (Callable[[List[Any]], Any]) – A function which accumulates a list of weights of subtrees below different clades. That is, the weights are for different parts of the same tree.
accum_above_edge (Optional[Callable[[Any, Any], Any]]) – A function which adds the weight for a subtree to the weight of the edge above it. If None, this function will be inferred from accum_between_clade. The edge weight is the second argument.
- Returns:
Two dictionaries: one describing downward weights below each node, and another describing upward weights above each node
- count_nodes(collapse=False, rooted=True)[source]
Counts the number of trees each node takes part in.
For node supports with respect to a uniform distribution on trees, use HistoryDag.uniform_distribution_annotate() and HistoryDag.node_probabilities().
- Parameters:
collapse – A flag that, when set to True, treats nodes as clade unions and ignores label information. Then, the returned dictionary is keyed by clade union sets.
rooted – A flag which is ignored unless collapse is True. When collapse is True and rooted is False, the returned dictionary is keyed by splits – that is, sets containing each clade union and its complement, with values the number of (rooted) trees in the DAG containing each split. Splits are not double-counted when a tree has a bifurcating root. If False, dag is expected to have trees all on the same set of leaf labels.
- Return type:
- Returns:
A dictionary mapping each node in the DAG to the number of trees that it takes part in.
- count_edges(collapsed=False)[source]
Counts the number of trees each edge takes part in.
- Return type:
- Returns:
A dictionary mapping each edge in the DAG to the number of trees that it takes part in.
- most_supported_trees()[source]
Trims the DAG to only express the trees that have the highest support.
- count_paths_to_leaf(leaf_label, expand_func=None, expand_count_func=lambda ls: ...)[source]
Annotates each node in the DAG with the number of paths to leaf_label underneath.
- Parameters:
leaf_label – The label of the leaf node of interest.
expand_func (Optional[Callable[[Union[NamedTuple, UALabel]], List[Union[NamedTuple, UALabel]]]]) – A function which takes a label and returns a list of labels, for example disambiguations of an ambiguous sequence. If provided, this method will count at least the number of histories that would be in the DAG if explode_nodes() were called with the same expand_func.
expand_count_func (Callable[[Union[NamedTuple, UALabel]], int]) – A function which takes a label and returns an integer value corresponding to the number of ‘disambiguations’ of that label. If provided, expand_func will be used to find this value.
- Returns:
The total number of unique paths to the leaf node of interest. If expand_func or expand_count_func is provided, the paths being counted are not guaranteed to be unique.
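As a rough illustration of what a per-node path count looks like, here is a minimal sketch on a generic toy DAG. The CHILDREN mapping and count_paths helper are hypothetical; the sketch ignores expand_func and is not the library's implementation.

```python
from functools import lru_cache

# Toy DAG (hypothetical): each key maps to a tuple of its child nodes.
CHILDREN = {
    "root": ("x", "y"),
    "x": ("leaf", "other"),
    "y": ("leaf", "other"),
    "leaf": (),
    "other": (),
}


@lru_cache(maxsize=None)
def count_paths(node, target):
    """Number of distinct downward paths from `node` to `target`."""
    if node == target:
        return 1
    return sum(count_paths(child, target) for child in CHILDREN[node])


# Annotate every node with the number of paths to "leaf" beneath it.
annotations = {node: count_paths(node, "leaf") for node in CHILDREN}
print(annotations)
```

Memoizing on the node keeps the computation linear in the number of edges, since each node's count is computed once no matter how many parents it has.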
- weight_counts_with_ambiguities(start_func=None, edge_func=None, accum_func=None, expand_func=None)[source]
Template method for counting tree weights in the DAG, with exploded labels. Like HistoryDag.weight_count(), but creates dictionaries of Counter objects at each node, keyed by possible sequences at that node. Analogous to HistoryDag.count_histories() with expand_func provided.
Weights must be hashable.
- Parameters:
start_func (Optional[Callable[[HistoryDagNode], Any]]) – A function which assigns a weight to each leaf node.
edge_func (Optional[Callable[[Union[NamedTuple, UALabel], Union[NamedTuple, UALabel]], Any]]) – A function which assigns a weight to pairs of labels, with the parent node label the first argument. Must correctly handle the UA node label, which is a UALabel instead of a namedtuple.
accum_func (Optional[Callable[[List[Any]], Any]]) – A way to ‘add’ a list of weights together.
expand_func (Optional[Callable[[Union[NamedTuple, UALabel]], Iterable[Union[NamedTuple, UALabel]]]]) – A function which takes a label and returns a list of labels, such as disambiguations of an ambiguous sequence.
- Returns:
A Counter keyed by weights. The total number of trees will be greater than count_histories(), as these are possible disambiguations of trees. These disambiguations may not be unique, but if two are the same, they come from different subtrees of the DAG.
- underestimate_rf_diameter()[source]
Returns an underestimate of the RF diameter of the DAG. This estimate is computed as the maximal sum RF distance between the DAG and a random tree from a topological outlier.
On a set of DAGs with 2000 or fewer histories, this underestimate is quite accurate compared to the actual computed RF diameter.
- overestimate_rf_diameter()[source]
Returns an overestimate of the RF diameter of the DAG. This estimate is computed as twice the maximal sum RF distance between the DAG and a random tree from the median tree.
On a set of DAGs with 2000 or fewer histories, this overestimate was not close to the actual RF diameter. However, the overestimate was never more than twice the actual RF diameter.
- optimal_sum_rf_distance(reference_dag, rooted=True, one_sided=None, one_sided_coefficients=(1, 1), optimal_func=min)[source]
Returns the optimal (min or max) summed rooted RF distance to all histories in the reference DAG.
The given history must be on the same taxa as all trees in the DAG. Since computing reference splits is expensive, it is better to use optimal_weight_annotate() and utils.make_rfdistance_countfuncs() instead of making multiple calls to this method with the same reference history DAG.
- trim_optimal_sum_rf_distance(reference_dag, rooted=True, one_sided=None, one_sided_coefficients=(1, 1), optimal_func=min)[source]
Trims the DAG to contain only histories with the optimal (min or max) sum rooted RF distance to the given reference DAG.
See utils.sum_rfdistance_funcs() for detailed documentation of arguments.
Trimming to the minimum sum RF distance is equivalent to finding ‘median’ topologies, and trimming to the maximum sum RF distance is equivalent to finding topological outliers.
The given history must be on the same taxa as all trees in the DAG. Since computing reference splits is expensive, it is better to use trim_optimal_weight() and utils.sum_rfdistance_funcs() instead of making multiple calls to this method with the same reference history.
- trim_optimal_rf_distance(history, rooted=False, one_sided=None, one_sided_coefficients=(1, 1), optimal_func=min)[source]
Trims this history DAG to the optimal (min or max) RF distance to a given history.
See utils.make_rfdistance_countfuncs() for detailed documentation of arguments.
Also returns that optimal RF distance.
The given history must be on the same taxa as all trees in the DAG. Since computing reference splits is expensive, it is better to use optimal_weight_annotate() and utils.make_rfdistance_countfuncs() instead of making multiple calls to this method with the same reference history.
- optimal_rf_distance(history, rooted=False, one_sided=None, one_sided_coefficients=(1, 1), optimal_func=min)[source]
Returns the optimal (min or max) RF distance to a given history.
See utils.make_rfdistance_countfuncs() for detailed documentation of arguments.
The given history must be on the same taxa as all trees in the DAG. Since computing reference splits is expensive, it is better to use optimal_weight_annotate() and utils.make_rfdistance_countfuncs() instead of making multiple calls to this method with the same reference history.
- count_rf_distances(history, rooted=False, one_sided=None, one_sided_coefficients=(1, 1))[source]
Returns a Counter containing all RF distances to a given history.
The given history must be on the same taxa as all trees in the DAG.
See utils.make_rfdistance_countfuncs() for detailed documentation of arguments.
Since computing reference splits is expensive, it is better to use weight_count() and utils.make_rfdistance_countfuncs() instead of making multiple calls to this method with the same reference history.
- count_sum_rf_distances(reference_dag, rooted=True, one_sided=None, one_sided_coefficients=(1, 1))[source]
Returns a Counter containing all sum RF distances to a given reference DAG.
See utils.sum_rfdistance_funcs() for detailed documentation of arguments.
The given history DAG must be on the same taxa as all trees in the DAG.
Since computing reference splits is expensive, it is better to use weight_count() and utils.sum_rfdistance_funcs() instead of making multiple calls to this method with the same reference history DAG.
- sum_rf_distances(reference_dag=None, rooted=True, one_sided=None, one_sided_coefficients=(1, 1))[source]
Computes the sum of Robinson-Foulds distances over all pairs of histories in this DAG and the provided reference DAG.
- Parameters:
reference_dag (Optional[HistoryDag]) – If None, the sum of pairwise distances between histories in this DAG is computed. If provided, the sum is over pairs containing one history in this DAG and one from reference_dag.
rooted (bool) – If False, use edges’ splits for RF distance computation. Otherwise, use the clade below each edge.
one_sided (Optional[str]) – May be ‘left’, ‘right’, or None. ‘left’ means that we count splits (or clades, in the rooted case) which are in the reference trees but not in the DAG tree, especially useful if trees in the DAG might be resolutions of multifurcating trees in the reference DAG. ‘right’ means that we count splits or clades in the DAG tree which are not in the reference trees, useful if the reference trees are possibly resolutions of multifurcating trees in the DAG. If not None, one_sided_coefficients are ignored.
one_sided_coefficients (Tuple[float, float]) – Coefficients for non-standard symmetric difference calculations. See utils.make_rfdistance_countfuncs() for more details.
- Returns:
An integer sum of RF distances.
If T is the set of histories in the reference DAG, and T’ is the set of histories in this DAG, then the returned sum is:
\[\sum_{t\in T} \sum_{t'\in T'} d(t, t')\]
That is, since RF distance is symmetric, when T = T’ (such as when reference_dag=None), or when the intersection of T and T’ is nonempty, some distances are counted twice.
Note that when computing one-sided distances, or when the one_sided_coefficients values are not equal, this ‘distance’ is no longer symmetric.
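The double counting can be checked by hand on three rooted trees. This is a minimal sketch that assumes the rooted RF distance is the size of the symmetric difference between the trees' clade sets; clades, rooted_rf, and sum_rf are hypothetical helpers, not library functions.

```python
from itertools import product


def clades(*groups):
    """Represent a rooted tree as the set of clades (leaf sets) below its edges."""
    return frozenset(frozenset(group) for group in groups)


def rooted_rf(tree_a, tree_b):
    """Rooted RF distance: number of clades present in exactly one of the two trees."""
    return len(tree_a ^ tree_b)


def sum_rf(trees, reference_trees):
    """Sum of d(t, t') over all ordered pairs, as in the double sum above."""
    return sum(rooted_rf(t, r) for t, r in product(trees, reference_trees))


t1 = clades("ab", "abc")  # ((a,b),c)
t2 = clades("bc", "abc")  # (a,(b,c))
t3 = clades("ac", "abc")  # ((a,c),b)
trees = [t1, t2, t3]

total = sum_rf(trees, trees)
print(total)
```

Each of the three unordered pairs of distinct trees has distance 2, for a total of 6 over unordered pairs. The double sum visits every such pair in both orders (plus the zero-distance identical pairs), so it returns twice that amount.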
- average_pairwise_rf_distance(reference_dag=None, non_identical=True, **kwargs)[source]
Return the average Robinson-Foulds distance between pairs of histories.
- Parameters:
reference_dag (Optional[HistoryDag]) – A history DAG from which to take the second history in each pair. If None, self will be used as the reference.
non_identical – If True, mean divisor will be the number of non-identical pairs.
kwargs – See historydag.sum_rf_distances() for additional keyword arguments.
- Returns:
The average RF distance between pairs of histories, where the first history comes from this DAG, and the second comes from reference_dag. The normalization constant is the product of the number of histories in the two DAGs, unless non_identical is True, in which case the number of histories which appear in both DAGs is subtracted from this constant.
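The normalization described above amounts to a small piece of arithmetic. The sketch below uses a hypothetical helper, average_rf, whose arguments are plain numbers rather than DAGs. It only mirrors the stated rule and does not guard against a zero divisor.

```python
def average_rf(sum_of_distances, n_self, n_reference, n_shared=0, non_identical=True):
    """Average distance per pair, following the normalization rule described above.

    n_shared is the number of histories that appear in both DAGs; it is
    subtracted from the divisor only when non_identical is True.
    """
    divisor = n_self * n_reference
    if non_identical:
        divisor -= n_shared
    return sum_of_distances / divisor


# Three histories compared against themselves, with a summed distance of 12:
# 9 ordered pairs in total, 3 of which pair a history with itself.
print(average_rf(12, 3, 3, n_shared=3))                       # over 6 pairs
print(average_rf(12, 3, 3, n_shared=3, non_identical=False))  # over 9 pairs
```

Excluding identical pairs raises the average, because those pairs contribute zero distance to the numerator but would otherwise still count in the divisor.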
- trim_optimal_weight(start_func=None, edge_weight_func=None, accum_func=None, optimal_func=None, eq_func=<function HistoryDag.<lambda>>)[source]
Trims the DAG to only express trees with optimal weight. This is guaranteed to be possible when edge_weight_func depends only on the labels of an edge’s parent and child node.
Requires that weights are of a type that supports reliable equality testing. In particular, floats are not recommended. Instead, consider defining weights to be a precursor type, and define optimal_func to choose the one whose corresponding float is maximized/minimized.
If floats must be used, a Numpy type may help.
- Parameters:
start_func (Optional[Callable[[HistoryDagNode], Any]]) – A function which assigns starting weights to leaves.
edge_weight_func (Optional[Callable[[HistoryDagNode, HistoryDagNode], Any]]) – A function which assigns weights to DAG edges based on the parent node and the child node, in that order.
accum_func (Optional[Callable[[List[Any]], Any]]) – A function which takes a list of weights of different parts of a tree, and returns a weight, like sum.
optimal_func (Optional[Callable[[List[Any]], Any]]) – A function which takes a list of weights and returns the optimal one, like min.
eq_func (Callable[[Any, Any], bool]) – A function which tests equality, taking a pair of weights and returning a bool.
- Return type:
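The warning about floats can be made concrete. The sketch below shows why float sums are unreliable under equality testing, and one possible ‘precursor type’ arrangement: weights are tuples of integer event counts, and only optimal_func converts them to floats. COSTS and to_float are hypothetical names chosen for illustration, and the tolerance-based eq_func is an assumed fallback for cases where floats cannot be avoided.

```python
import math
from fractions import Fraction

# Float sums that are mathematically equal may not compare equal.
print(0.1 + 0.2 == 0.3)

# Exact arithmetic on a precursor type avoids the problem.
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))

# Hypothetical per-event costs; a weight is a tuple of integer event counts.
COSTS = (0.1, 0.2)


def to_float(weight):
    """Convert a count tuple to the float score it represents."""
    return sum(count * cost for count, cost in zip(weight, COSTS))


def accum_func(weights):
    """Add count tuples elementwise (assumes a non-empty list)."""
    return tuple(sum(column) for column in zip(*weights))


def optimal_func(weights):
    """Pick the count tuple whose corresponding float is minimized."""
    return min(weights, key=to_float)


def eq_func(w1, w2):
    """Tolerance-based equality, as a fallback when float weights are unavoidable."""
    return math.isclose(w1, w2, rel_tol=1e-9)


print(accum_func([(1, 0), (0, 1)]))
print(optimal_func([(2, 1), (1, 0)]))
```

Because the count tuples are compared with exact integer equality, two histories with the same event counts are always recognized as having equal weight, regardless of rounding in the float conversion.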
- get_topologies(collapse_leaves=False)[source]
Return a list of pseudo-newick representations of topologies in the history DAG.
The newicks returned are not well-formed, and are for use with HistoryDag.trim_topology(). Otherwise, this method would be equivalent to HistoryDag.to_newicks() with keyword arguments internal_labels=False and collapsed_leaves as desired.
- trim_topology(topology, collapse_leaves=False)[source]
Trims the history DAG to express only trees matching the provided topology.
- Parameters:
topology (str) – A string like one output by HistoryDag.get_topologies()
collapse_leaves (bool) – Must match the same argument provided to HistoryDag.get_topologies() when creating the string passed as topology.
- export_edge_probabilities()[source]
Return a dictionary keyed by (parent, child) HistoryDagNode pairs, with downward conditional edge probabilities as values.
- get_probability_countfuncs(log_probabilities=False, edge_probabilities=None)[source]
Produce a historydag.utils.AddFuncDict() containing functions to compute history probabilities using e.g. HistoryDag.optimal_weight_annotate().
If no edge probabilities are provided, a method like HistoryDag.probability_annotate() should be called to set edge annotations correctly.
- Parameters:
log_probabilities – If True, interpret all edge probabilities as log-probabilities.
edge_probabilities – A dictionary containing conditional edge probabilities for each edge in the DAG. If not provided, edge probabilities are recovered from edge annotations.
- Returns:
historydag.utils.AddFuncDict() containing functions to compute history probabilities using e.g. HistoryDag.optimal_weight_annotate()
- sum_probability(log_probabilities=False, **kwargs)[source]
Compute the total probability of all histories in the DAG, using downward conditional edge probabilities.
Immediately after computing downward conditional probabilities, this should always return 1.
However, after trimming, this method returns the probability that a history in the trimmed DAG would be sampled from the original DAG.
- Parameters:
log_probabilities – If True, interpret conditional edge probabilities as log-probabilities. In this case, the return value is a log-probability as well.
kwargs – The utils.AddFuncDict containing keyword arguments for counting probabilities returned from HistoryDag.get_probability_countfuncs(). If not provided, conditional edge probabilities annotated on the DAG will be used.
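The behavior before and after trimming can be reproduced on a toy structure. In this minimal sketch a node is a list of clades, each clade is a list of (edge_probability, child) pairs, and a leaf is an empty list. The representation and the total_probability helper are hypothetical, and the sketch assumes trimming removes edges without renormalizing the remaining probabilities.

```python
from math import prod


def total_probability(node):
    """Sum over histories of the product of their edge probabilities.

    Alternatives below a clade are summed; different clades are multiplied.
    A leaf (empty list) contributes the empty product, 1.
    """
    return prod(
        sum(p * total_probability(child) for p, child in clade)
        for clade in node
    )


leaf = []
inner_1 = [[(1.0, leaf)], [(1.0, leaf)]]
inner_2 = [[(1.0, leaf)], [(1.0, leaf)]]

# Two alternatives below the first clade, with conditional probabilities summing to 1.
root = [[(0.25, inner_1), (0.75, inner_2)], [(1.0, leaf)]]
print(total_probability(root))

# "Trimming" away the first alternative, leaving the other edge probability untouched.
trimmed_root = [[(0.75, inner_2)], [(1.0, leaf)]]
print(total_probability(trimmed_root))
```

The trimmed total is the probability mass of the retained histories under the original distribution, which matches the interpretation given in the description above.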
- node_probabilities(log_probabilities=False, edge_weight_func=None, normalize_edgeweights=None, accum_func=None, aggregate_func=None, start_func=None, ua_node_val=None, collapse_key=None, adjust_func=None, **kwargs)[source]
Compute the probability of each node in the DAG.
- Parameters:
log_probabilities – If True, all probabilities, and the values from edge_weight_func, will be treated as log values.
edge_weight_func – A function accepting a parent node and a child node and returning the weight associated to that edge. If not provided, it is assumed that correct edge probability annotations are already populated by a method such as HistoryDag.probability_annotate().
normalize_edgeweights – A function taking a list of weights and returning a normalized list of downward-conditional edge probabilities. The default is determined by log_probabilities.
accum_func – A function taking a list of probabilities for parts of a sub-history, and returning a probability for that sub-history. The default is determined by log_probabilities.
aggregate_func – A function taking a list of probabilities for alternative sub-histories, and returning the aggregated probability of all sub-histories. The default is determined by log_probabilities.
start_func – A function taking a leaf node and returning its starting weight. The default is determined by log_probabilities.
ua_node_val – The probability value for the UA node. If not provided, the default value is determined by log_probabilities.
collapse_key – A function accepting a HistoryDagNode and returning a key with respect to which node probabilities should be collapsed. The return type is the key type for the dictionary returned by this method. For example, to compute probabilities of each clade observed in the DAG, use collapse_key=HistoryDagNode.clade_union.
adjust_func (Optional[Callable[[HistoryDagNode, HistoryDagNode], float]]) – A function accepting an edge, and returning a factor by which to adjust confidence in the edge’s child node contributed by trees containing that edge.
- Returns:
A dictionary keyed by HistoryDagNode objects (or the return values of collapse_key if provided) whose values are probabilities according to the distribution induced by downward-conditional edge probabilities in the DAG.
- set_sample_mask(edge_selector, log_probabilities=False)[source]
Zero out edge weights for masked edges before calling HistoryDag.fast_sample(). This should be equivalent to passing the same edge_selector function to HistoryDag.sample().
- Parameters:
edge_selector – A function accepting an edge (a tuple of HistoryDagNode objects) and returning True or False. An edge marked False will be ineligible for sampling, unless all other edges in the same edge set are also marked False.
log_probabilities – Since the mask is applied by modifying edge probabilities, one must specify whether those probabilities are on a log scale.
Take care to verify that you shouldn’t instead use HistoryDag.probability_annotate() with a choice of edge_weight_func that takes into account the masking preferences.
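The masking rule for a single edge set can be sketched as follows. This is an illustration under stated assumptions, not the library's code: mask_edge_set is a hypothetical helper operating on a plain list of probabilities, and whether any renormalization happens afterwards is not addressed here.

```python
import math


def mask_edge_set(probabilities, selected, log_probabilities=False):
    """Return a copy of one edge set's probabilities with masked entries zeroed.

    `selected` holds the edge_selector result for each edge. If every edge is
    marked False, the edge set is left unchanged so that sampling can proceed.
    On a log scale, "zero" probability is represented by -inf.
    """
    if not any(selected):
        return list(probabilities)
    zero = -math.inf if log_probabilities else 0.0
    return [p if keep else zero for p, keep in zip(probabilities, selected)]


print(mask_edge_set([0.5, 0.25, 0.25], [True, False, True]))
print(mask_edge_set([0.5, 0.25, 0.25], [False, False, False]))
print(mask_edge_set([math.log(0.5), math.log(0.5)], [True, False], log_probabilities=True))
```

The log_probabilities flag matters because the value that represents an impossible edge differs between the two scales: 0 on the linear scale, negative infinity on the log scale.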
- probability_annotate(edge_weight_func, log_probabilities=False, normalize_edgeweights=None, accum_func=None, aggregate_func=None, start_func=None, **kwargs)[source]
Uses the supplied edge weight function to compute conditional probabilities on edges.
Conditional probabilities are annotated on the DAG’s edges, so that future calls to e.g. HistoryDag.sample() use the probability distribution determined by them.
- Parameters:
edge_weight_func – A function accepting a parent node and a child node and returning the weight associated to that edge.
log_probabilities – If True, all probabilities, and the values from edge_weight_func, will be treated as log values.
normalize_edgeweights – A function taking a list of weights and returning a normalized list of downward-conditional edge probabilities. The default is determined by log_probabilities.
accum_func – A function taking a list of probabilities for parts of a sub-history, and returning a probability for that sub-history. The default is determined by log_probabilities.
aggregate_func – A function taking a list of probabilities for alternative sub-histories, and returning the aggregated probability of all sub-histories. The default is determined by log_probabilities.
start_func – A function taking a leaf node and returning its starting weight. The default is determined by log_probabilities.
- Returns:
The sum of un-normalized probabilities, according to the provided edge_weight_func. This value can be used to normalize history probabilities computed with the same edge_weight_func provided to this method (for example, weights returned by HistoryDag.weight_count()).
- natural_distribution_annotate(log_probabilities=False)[source]
Set edge probabilities to 1/n, where n is the count of edges descending from the corresponding node-clade pair.
This induces the ‘natural’ distribution on histories, determined by the topology of the dag.
- uniform_distribution_annotate(log_probabilities=False)[source]
Adjust edge probabilities so that the DAG expresses a uniform distribution on expressed trees.
The probability assigned to each edge below a clade is proportional to the number of subtrees possible below that edge.
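The difference between the ‘natural’ and uniform annotations shows up as soon as alternative edges lead to different numbers of subtrees. The following sketch works on a single clade with hypothetical subtree counts; natural_probabilities and uniform_probabilities are illustrative helpers, not library functions.

```python
from fractions import Fraction


def natural_probabilities(n_edges):
    """Each of the n edges descending from a node-clade pair gets probability 1/n."""
    return [Fraction(1, n_edges)] * n_edges


def uniform_probabilities(subtree_counts):
    """Each edge gets probability proportional to the number of subtrees below it."""
    total = sum(subtree_counts)
    return [Fraction(count, total) for count in subtree_counts]


# Hypothetical clade with three child edges leading to 1, 2, and 3 subtrees.
subtree_counts = [1, 2, 3]
natural = natural_probabilities(len(subtree_counts))
uniform = uniform_probabilities(subtree_counts)
print(natural)
print(uniform)
```

Under the natural annotation the single subtree below the first edge is sampled with probability 1/3, while each of the three subtrees below the last edge shares the same 1/3 between them. Under the uniform annotation every one of the six subtrees ends up equally likely.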
- make_uniform()[source]
Deprecated name for HistoryDag.uniform_distribution_annotate()
- convert_to_collapsed()[source]
Rebuilds the DAG so that no edge connects two nodes with the same label, unless one is a leaf node.
The resulting DAG should express at least the collapsed histories present in the original.
- add_node_at_all_possible_places(new_leaf_id, id_name='sequence')[source]
Inserts a sequence into the dag such that every tree in the dag now contains that new node.
This method adds the new node as a leaf node by connecting it as a child of every non-leaf node in the original dag. The resulting dag has one new node corresponding to the added sequence as well as copies of all internal nodes corresponding to parents (and more ancestral nodes) to the added sequence.
- insert_node(new_leaf_id, id_name='sequence', edge_weight_func=<function TransitionModel.weighted_hamming_edge_weight.<locals>.edge_weight>)[source]
Inserts a sequence into the DAG.
Sequence will be inserted as a child of the dagnode(s) realizing the minimum overall distance between sequences, and also added to the dag as a child of other nodes in such a way as to guarantee that every tree in the DAG now contains the new sequence.
The choice of other nodes is computed by looking at the set of nodes that are incompatible with the first minimizing node. For a full description of this, see the docstring for the method-local function incompatible.
- postorder_above(terminal_node, skip_ua_node=False, recompute_parents=True)[source]
Recursive postorder traversal of all ancestors of a (possibly internal) node. This traversal is postorder with respect to reversed edge directions. With respect to standard edge directions (pointing towards leaves), the traversal order guarantees that all of a node’s parents will be visited before the node itself.
- Parameters:
terminal_node – The node whose ancestors should be included in the traversal. This must actually be a node in self, not simply compare equal to a node in self.
skip_ua_node – If True, the UA node will not be included in the traversal
recompute_parents – If False, node parent sets will not be recomputed. This makes many repeated calls to postorder_above much faster.
- Returns:
Generator on nodes that lie on any path between terminal_node and the UA node
- postorder(include_root=True)[source]
Recursive postorder traversal of the history DAG.
- Return type:
- Returns:
Generator on nodes
- preorder(skip_ua_node=False, skip_root=None)[source]
Recursive preorder traversal of the history DAG.
Careful! This is not guaranteed to visit a parent node before any of its children. For that, use a reverse postorder traversal.
If skip_ua_node is passed, the universal ancestor node will be skipped. skip_root is provided as a backwards-compatible synonym of skip_ua_node.
- Return type:
- Returns:
Generator on nodes
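The caveat about visiting order is a general property of depth-first traversal on DAGs, where a node may have several parents. The sketch below demonstrates it on a generic toy DAG. CHILDREN, dfs_preorder, and reverse_postorder are hypothetical names, and the code is not taken from the library.

```python
# Toy DAG (hypothetical): node "x" has two parents, "r" and "y".
CHILDREN = {"r": ["x", "y"], "x": ["z"], "y": ["x"], "z": []}


def dfs_preorder(root):
    """Emit each node the first time it is reached, before its children."""
    seen, order = set(), []

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        order.append(node)
        for child in CHILDREN[node]:
            visit(child)

    visit(root)
    return order


def reverse_postorder(root):
    """Emit each node after all its children, then reverse the result."""
    seen, order = set(), []

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for child in CHILDREN[node]:
            visit(child)
        order.append(node)

    visit(root)
    return order[::-1]


print(dfs_preorder("r"))       # "x" appears before its parent "y"
print(reverse_postorder("r"))  # every parent appears before its children
```

In the preorder result, x is reached through r before its other parent y has been visited. Reversing a postorder gives a topological order, so any computation that needs all parents processed first should iterate in that order instead.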
- historydag.dag.from_tree(treeroot, label_features, label_functions={}, attr_func=lambda n: ..., child_node_func=lambda n: ..., leaf_node_func=ete3.TreeNode.get_leaves, edge_weight_func=lambda n: ...)[source]
Build a tree-shaped historydag.HistoryDag() (a ‘history’) object from the provided tree data.
Default arguments are suitable for loading an ete3.Tree, but by providing appropriate child_node_func and leaf_node_func, any data structure implementing a tree can be used.
- Parameters:
treeroot (TreeNode) – The root node of a tree to be converted to a HistoryDag history.
label_features (List[str]) – Tree node attribute names to be used as HistoryDagNode label fields. Each attribute name must be accessible by getattr(treenode, name). Field names provided in label_functions will take precedence.
label_functions (Mapping[str, Callable[[TreeNode], Any]]) – Dictionary keyed by additional label field names, containing functions mapping tree nodes to intended label field values.
attr_func (Callable[[TreeNode], Any]) – Function to populate the HistoryDag node attr attribute, which is not used to distinguish nodes, and may be overwritten by attr of another node with the same label and child clades.
child_node_func (Callable[[TreeNode], Sequence[TreeNode]]) – Function taking a tree node and returning an iterable containing the node’s children. By default, accesses the node’s children attribute.
leaf_node_func (Callable[[TreeNode], Sequence[TreeNode]]) – Function accepting a tree node and returning an iterable containing the leaf nodes accessible from that node.
edge_weight_func (Callable[[TreeNode], Any]) – Function accepting a tree node and returning the weight of that node’s parent edge.
- Return type:
- Returns:
HistoryDag object, which has the same topology as the input tree, with the required UA node added as a new root.
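How label_features and label_functions combine into a node label can be sketched without the library. ToyTreeNode and make_label below are hypothetical, and the field order shown is an assumption of this sketch. The real from_tree() may construct labels differently, but the precedence rule is the one stated above.

```python
from collections import namedtuple


class ToyTreeNode:
    """Hypothetical minimal tree node with attributes readable via getattr."""

    def __init__(self, name, sequence, children=()):
        self.name = name
        self.sequence = sequence
        self.children = list(children)


def make_label(treenode, label_features, label_functions=None):
    """Build a namedtuple label; label_functions override same-named features."""
    label_functions = label_functions or {}
    values = {name: getattr(treenode, name) for name in label_features}
    # Entries from label_functions take precedence over attribute lookups.
    values.update({name: func(treenode) for name, func in label_functions.items()})
    Label = namedtuple("Label", list(values))
    return Label(**values)


node = ToyTreeNode("n1", "acgt")
label = make_label(
    node,
    label_features=["name", "sequence"],
    label_functions={"sequence": lambda n: n.sequence.upper()},
)
print(label)
```

Here the sequence field comes from the function in label_functions rather than from the raw attribute, which is useful when the stored attribute needs normalizing before nodes are compared by label.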
- historydag.dag.ascii_compare_histories(history1, history2, name_func, name_func2=None, **kwargs)[source]
A convenience function to print two histories as ascii art trees side-by-side.
Provided histories can be HistoryDag or HistoryDagNode objects, so all or part of two histories may be compared.
- Parameters:
history1 – The first history to compare. Will appear on the left
history2 – The second history to compare. Will appear on the right
name_func – A function mapping each HistoryDagNode to a node name string.
name_func2 – A different name_func to be used for history2. If not provided, name_func will be used.
kwargs – This function also accepts all keyword arguments allowed by HistoryDag.to_ascii()
- historydag.dag.history_dag_from_trees(treelist, label_features, **kwargs)[source]
Create a historydag.HistoryDag from a list of trees.
Default arguments are suitable for loading lists of ete3.Tree objects, but any tree data structure can be used by providing appropriate functions to the child_node_func and leaf_node_func keyword arguments.
- Parameters:
treelist (List[TreeNode]) – List of root nodes of input trees.
label_features (List[str]) – Tree node attribute names to be used as HistoryDagNode label fields. Each attribute name must be accessible by getattr(treenode, name). Field names provided in the label_functions keyword argument will take precedence.
kwargs – Passed to historydag.from_tree(). See the docstring for that function for argument details.
- Returns:
historydag.HistoryDag constructed from input trees.
- historydag.dag.history_dag_from_etes(*args, **kwargs)[source]
Deprecated name for historydag.history_dag_from_trees()
- Return type:
- historydag.dag.from_newick(tree, label_features, label_functions={}, newick_format=8, attr_func=lambda n: ...)[source]
Make a history DAG using a newick string. Internally, utilizes newick parsing features provided by ete3, then calls from_tree() on the resulting ete3.Tree object.
- Parameters:
tree (str) – Newick string representation of a tree. May contain extended node data in ‘extended newick format’ used by ete3.
label_features (List[str]) – (passed to from_tree()) List of features to be used as label fields in resulting history DAG. ‘name’ refers to the node name string in the standard newick format. See ete3 docs for more details.
newick_format – ete3 format number of passed newick string. See ete3 docs for details.
label_functions (Mapping[str, Callable[[TreeNode], Any]]) – (passed to from_tree())
attr_func (Callable[[TreeNode], Any]) – (passed to from_tree())
- Return type:
- Returns:
HistoryDag object, which has the same topology as the input newick tree, with the required UA node added as a new root.
- historydag.dag.history_dag_from_newicks(newicklist, label_features, label_functions={}, attr_func=lambda n: ..., newick_format=1)[source]
Build a history DAG from a list of newick strings.
See from_newick() for argument details.
- Return type:
- historydag.dag.history_dag_from_histories(treelist)[source]
Build a history DAG from a list of history DAGs which are histories.
- Return type:
- historydag.dag.history_dag_from_clade_trees(*args, **kwargs)[source]
Deprecated name for history_dag_from_histories()
- Return type: