torchdms.data.partition

torchdms.data.partition(aa_func_scores, per_stratum_variants_for_test, skip_stratum_if_count_is_smaller_than, export_dataframe, partition_label, train_on_all_single_mutants=False)

Partition the data as needed and build a SplitDataframe.

A “stratum” is a slice of the data with a given number of mutations. We group training data sets into strata based on their number of mutations so that the data is presented to the neural network with an even proportion of each stratum.

Furthermore, we group data rows by unique variant and then split on those groups, so that the same variant never shows up in both train and test.
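
The two ideas above (stratifying by mutation count and splitting on whole variants) can be illustrated with a minimal, standard-library sketch. This is not the torchdms implementation: the function name `partition_sketch`, the `"aa_substitutions"` key holding space-separated mutations, and the choice to apply the size threshold to the number of rows in a stratum are all assumptions made for illustration. Options such as `train_on_all_single_mutants` and the construction of a SplitDataframe are left out.

```python
import random
from collections import defaultdict


def partition_sketch(rows, per_stratum_variants_for_test, min_stratum_count, seed=0):
    """Label each row index 'train' or 'test', one stratum at a time.

    Hypothetical helper, not part of torchdms. Each row is a dict with an
    assumed "aa_substitutions" key such as "A1G T5K".
    """
    rng = random.Random(seed)

    # Group row indices by stratum (mutation count), then by unique variant.
    strata = defaultdict(lambda: defaultdict(list))
    for idx, row in enumerate(rows):
        variant = row["aa_substitutions"]
        n_mutations = len(variant.split())
        strata[n_mutations][variant].append(idx)

    labels = {}
    for n_mutations, variants in sorted(strata.items()):
        n_rows = sum(len(indices) for indices in variants.values())
        if n_rows < min_stratum_count:
            # Assumption: strata with too few rows are skipped entirely.
            continue
        # Sample whole variants for the test set, so every row of a
        # variant lands on the same side of the split.
        unique_variants = sorted(variants)
        n_test = min(per_stratum_variants_for_test, len(unique_variants))
        test_variants = set(rng.sample(unique_variants, n_test))
        for variant, indices in variants.items():
            label = "test" if variant in test_variants else "train"
            for idx in indices:
                labels[idx] = label
    return labels


# Toy data: three single-mutant variants (one measured twice) and one double mutant.
rows = [
    {"aa_substitutions": "A1G", "func_score": 0.1},
    {"aa_substitutions": "A1G", "func_score": 0.2},
    {"aa_substitutions": "T5K", "func_score": -0.3},
    {"aa_substitutions": "L7P", "func_score": -1.0},
    {"aa_substitutions": "A1G T5K", "func_score": -0.5},
]
labels = partition_sketch(rows, per_stratum_variants_for_test=1, min_stratum_count=2)
```

In this toy run the double-mutant stratum has a single row, so it falls below the threshold and receives no label, while the two `A1G` rows always share a label because the split is drawn over variants rather than rows.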