trojai.modelgen package
Subpackages
Submodules
trojai.modelgen.architecture_factory module
trojai.modelgen.config module
-
class
trojai.modelgen.config.
ConfigInterface
[source]¶ Bases:
abc.ABC
Defines the interface for all configuration objects
-
class
trojai.modelgen.config.
DefaultOptimizerConfig
(training_cfg: trojai.modelgen.config.TrainingConfig = None, reporting_cfg: trojai.modelgen.config.ReportingConfig = None)[source]¶ Bases:
trojai.modelgen.config.OptimizerConfigInterface
Defines the configuration needed to setup the DefaultOptimizer
-
get_device_type
()[source]¶ Returns the device associated with this optimizer configuration. Needed to save/load for UGE.
- Returns
(str) the device type represented as a string
-
-
class
trojai.modelgen.config.
DefaultSoftToHardFn
[source]¶ Bases:
object
The default conversion from soft-decision outputs to hard-decision
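A minimal sketch of one common soft-to-hard conversion, assuming the soft-decision outputs are per-class scores and the hard decision is the index of the largest score (argmax). The helper name `soft_to_hard_argmax` is hypothetical and not part of trojai; the behaviour of DefaultSoftToHardFn itself may differ.

```python
from typing import List, Sequence


def soft_to_hard_argmax(soft_outputs: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical helper: map each row of per-class scores to the index of its largest score."""
    hard = []
    for row in soft_outputs:
        # max() returns the first maximal item, so ties resolve to the lowest index
        hard.append(max(range(len(row)), key=lambda i: row[i]))
    return hard


# Two made-up samples with three class scores each
batch = [[0.1, 0.7, 0.2], [2.5, -1.0, 0.3]]
predictions = soft_to_hard_argmax(batch)
```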
-
class
trojai.modelgen.config.
EarlyStoppingConfig
(num_epochs: int = 5, val_loss_eps: float = 0.001)[source]¶ Bases:
trojai.modelgen.config.ConfigInterface
Defines configuration related to early stopping.
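The two parameters suggest a patience-style criterion. The sketch below assumes training stops once the validation loss has failed to improve on the earlier best by more than `val_loss_eps` for `num_epochs` consecutive epochs. That reading is an assumption, and `should_stop_early` is a hypothetical helper, not the trojai implementation.

```python
from typing import Sequence


def should_stop_early(val_losses: Sequence[float], num_epochs: int = 5,
                      val_loss_eps: float = 0.001) -> bool:
    """Hypothetical check: True if none of the last `num_epochs` losses beat the earlier best by more than eps."""
    if len(val_losses) <= num_epochs:
        return False  # not enough history to judge yet
    best_before = min(val_losses[:-num_epochs])
    recent = val_losses[-num_epochs:]
    # stop only if every recent epoch failed to improve by more than eps
    return all(loss > best_before - val_loss_eps for loss in recent)


plateaued = should_stop_early([1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
improving = should_stop_early([1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
```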
-
class
trojai.modelgen.config.
ModelGeneratorConfig
(arch_factory: trojai.modelgen.architecture_factory.ArchitectureFactory, data: trojai.modelgen.data_manager.DataManager, model_save_dir: str, stats_save_dir: str, num_models: int, arch_factory_kwargs: dict = None, arch_factory_kwargs_generator: Callable = None, optimizer: Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig, Sequence[Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig]]] = None, parallel=False, amp=False, experiment_cfg: dict = None, run_ids: Union[Any, Sequence[Any]] = None, filenames: Union[str, Sequence[str]] = None, save_with_hash: bool = False)[source]¶ Bases:
trojai.modelgen.config.ConfigInterface
Object used to configure the model generator
-
static
load
(fname: str)[source]¶ Loads a saved modelgen_cfg object from data that was saved using the .save() function.
- Parameters
fname – the filename where the modelgen_cfg object is saved
- Returns
a ModelGeneratorConfig object
-
class
trojai.modelgen.config.
ReportingConfig
(num_batches_per_logmsg: int = 100, disable_progress_bar: bool = False, num_epochs_per_metric: int = 1, num_batches_per_metrics: int = 50, tensorboard_output_dir: str = None, experiment_name: str = 'experiment')[source]¶ Bases:
trojai.modelgen.config.ConfigInterface
Defines all options to setup how data is reported back to the user while models are being trained
-
class
trojai.modelgen.config.
RunnerConfig
(arch_factory: trojai.modelgen.architecture_factory.ArchitectureFactory, data: trojai.modelgen.data_manager.DataManager, arch_factory_kwargs: dict = None, arch_factory_kwargs_generator: Callable = None, optimizer: Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig, Sequence[Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig]]] = None, parallel: bool = False, amp: bool = False, model_save_dir: str = '/tmp/models', stats_save_dir: str = '/tmp/model_stats', model_save_format: str = 'pt', run_id: Any = None, filename: str = None, save_with_hash: bool = False)[source]¶ Bases:
trojai.modelgen.config.ConfigInterface
Container for all parameters needed to use the Runner to train a model.
-
static
setup_optimizer_generator
(optimizer, data)[source]¶ Converts an optimizer specification to a generator, to be compatible with sequential training.
- Parameters
optimizer – the optimizer to configure into a generator
num_datasets – the number of datasets for which optimizers need to be created
- Returns
A generator that returns optimizers for every dataset to be trained
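The idea of turning an optimizer specification into a generator can be sketched as follows. It assumes a single specification is reused for every dataset, while a sequence supplies one entry per dataset. `optimizer_generator` is a hypothetical stand-in that takes the dataset count directly, and the strings are placeholders for real optimizer or config objects.

```python
from typing import Any, Iterator, Sequence, Union


def optimizer_generator(optimizer: Union[Any, Sequence[Any]], num_datasets: int) -> Iterator[Any]:
    """Hypothetical sketch: yield one optimizer per dataset to be trained in sequence."""
    if isinstance(optimizer, (list, tuple)):
        if len(optimizer) != num_datasets:
            raise ValueError("need exactly one optimizer per dataset")
        yield from optimizer
    else:
        # a single specification is reused for every dataset
        for _ in range(num_datasets):
            yield optimizer


repeated = list(optimizer_generator("default_cfg", 3))
per_dataset = list(optimizer_generator(["cfg_a", "cfg_b"], 2))
```

Because this is a generator function, the length check only runs when the first item is requested.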
-
class
trojai.modelgen.config.
TorchTextOptimizerConfig
(training_cfg: trojai.modelgen.config.TrainingConfig = None, reporting_cfg: trojai.modelgen.config.ReportingConfig = None, copy_pretrained_embeddings: bool = False)[source]¶ Bases:
trojai.modelgen.config.OptimizerConfigInterface
Defines the configuration needed to setup the TorchTextOptimizer
-
get_device_type
()[source]¶ Returns the device associated with this optimizer configuration. Needed to save/load for UGE.
- Returns
(str) the device type represented as a string
-
static
load
(fname)[source]¶ Loads a configuration from disk
- Parameters
fname – the filename where the config is stored
- Returns
the loaded configuration
-
-
class
trojai.modelgen.config.
TrainingConfig
(device: Union[str, torch.device] = 'cpu', epochs: int = 10, batch_size: int = 32, lr: float = 0.0001, optim: Union[str, trojai.modelgen.optimizer_interface.OptimizerInterface] = 'adam', optim_kwargs: dict = None, objective: Union[str, Callable] = 'cross_entropy_loss', objective_kwargs: dict = None, save_best_model: bool = False, train_val_split: float = 0.05, val_data_transform: Callable[[Any], Any] = None, val_label_transform: Callable[[int], int] = None, val_dataloader_kwargs: dict = None, early_stopping: trojai.modelgen.config.EarlyStoppingConfig = None, soft_to_hard_fn: Callable = None, soft_to_hard_fn_kwargs: dict = None, lr_scheduler: Any = None, lr_scheduler_init_kwargs: dict = None, lr_scheduler_call_arg: Any = None, clip_grad: bool = False, clip_type: str = 'norm', clip_val: float = 1.0, clip_kwargs: dict = None, adv_training_eps: float = None, adv_training_iterations: int = None, adv_training_ratio: float = None)[source]¶ Bases:
trojai.modelgen.config.ConfigInterface
Defines all required items to setup training with an optimizer
-
class
trojai.modelgen.config.
UGEConfig
(queues: Union[trojai.modelgen.config.UGEQueueConfig, Sequence[trojai.modelgen.config.UGEQueueConfig]], queue_distribution: Sequence[float] = None, multi_model_same_gpu: bool = False)[source]¶ Bases:
object
Defines a configuration for the UGE
-
class
trojai.modelgen.config.
UGEQueueConfig
(queue_name: str, gpu_enabled: bool, sync_mode: bool = False)[source]¶ Bases:
object
Defines the configuration for a Queue w.r.t. UGE in TrojAI
-
trojai.modelgen.config.
logger
= <Logger trojai.modelgen.config (WARNING)>¶ Defines all configurations pertinent to model generation.
-
trojai.modelgen.config.
modelgen_cfg_to_runner_cfg
(modelgen_cfg: trojai.modelgen.config.ModelGeneratorConfig, run_id=None, filename=None) → trojai.modelgen.config.RunnerConfig[source]¶ Convenience function which creates a RunnerConfig object from a ModelGeneratorConfig object.
- Parameters
modelgen_cfg – the ModelGeneratorConfig to convert
run_id – run_id to be associated with the RunnerConfig
filename – filename to be associated with the RunnerConfig
- Returns
the created RunnerConfig object
trojai.modelgen.constants module
-
trojai.modelgen.constants.
VALID_DEVICES
= ['cpu', 'cuda']¶ Defines valid devices on which models can be trained
-
trojai.modelgen.constants.
VALID_LOSS_FUNCTIONS
= ['cross_entropy_loss', 'BCEWithLogitsLoss']¶ Defines valid loss functions which can be specified when configuring an optimizer implementing the OptimizerInterface
-
trojai.modelgen.constants.
VALID_OPTIMIZERS
= ['adam', 'sgd', 'adamw']¶ Defines valid optimization algorithms which can be specified when configuring an optimizer implementing the OptimizerInterface
Defines the valid types of data that the modelgen pipeline can handle
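As an illustration of how these constants might be used, the sketch below checks string options against them before training. The list values are copied from the listing above. The function `check_training_options` and its exact-match rule are assumptions for illustration, not trojai's own validation code.

```python
# Values copied from the constants listed above
VALID_DEVICES = ['cpu', 'cuda']
VALID_LOSS_FUNCTIONS = ['cross_entropy_loss', 'BCEWithLogitsLoss']
VALID_OPTIMIZERS = ['adam', 'sgd', 'adamw']


def check_training_options(device: str, optim: str, objective: str) -> None:
    """Hypothetical validator: raise ValueError if a string option is not in the allowed list."""
    checks = (
        ("device", device, VALID_DEVICES),
        ("optim", optim, VALID_OPTIMIZERS),
        ("objective", objective, VALID_LOSS_FUNCTIONS),
    )
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {allowed}")


# These happen to match the TrainingConfig defaults shown earlier
check_training_options(device='cpu', optim='adam', objective='cross_entropy_loss')
```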
trojai.modelgen.data_configuration module
-
class
trojai.modelgen.data_configuration.
TextDataConfiguration
(max_vocab_size: int = 25000, embedding_dim: int = 100, embedding_type: str = 'glove', num_tokens_embedding_train: str = '6B', text_field_kwargs: dict = None, label_field_kwargs: dict = None)[source]¶
-
trojai.modelgen.data_configuration.
logger
= <Logger trojai.modelgen.data_configuration (WARNING)>¶ Configurations for various types of data
trojai.modelgen.data_descriptions module
File describes data description classes, which contain specific information that may be used in order to instantiate an architecture
-
class
trojai.modelgen.data_descriptions.
CSVImageDatasetDesc
(num_samples, shuffled, num_classes)[source]¶ Bases:
trojai.modelgen.data_descriptions.DataDescription
Information potentially relevant to instantiating models to process image data
-
class
trojai.modelgen.data_descriptions.
CSVTextDatasetDesc
(vocab_size, unk_idx, pad_idx)[source]¶ Bases:
trojai.modelgen.data_descriptions.DataDescription
Information potentially relevant to instantiating models to process text data
trojai.modelgen.data_manager module
-
class
trojai.modelgen.data_manager.
DataManager
(experiment_path: str, train_file: Union[str, Sequence[str]], clean_test_file: str, triggered_test_file: str = None, data_type: str = 'image', train_data_transform: Callable[[Any], Any] = <function DataManager.<lambda>>, train_label_transform: Callable[[int], int] = <function DataManager.<lambda>>, test_data_transform: Callable[[Any], Any] = <function DataManager.<lambda>>, test_label_transform: Callable[[int], int] = <function DataManager.<lambda>>, file_loader: Union[Callable[[str], Any], str] = 'default_image_loader', shuffle_train=True, shuffle_clean_test=False, shuffle_triggered_test=False, data_configuration: trojai.modelgen.data_configuration.DataConfiguration = None, custom_datasets: dict = None, train_dataloader_kwargs: dict = None, test_dataloader_kwargs: dict = None)[source]¶ Bases:
object
Manages data from an experiment from trojai.datagen.
-
load_data
()[source]¶ Load experiment data as given from initialization.
- Returns
Objects containing training and test, and triggered data if it was provided.
- TODO:
[ ] - extend the text data-type to have more input arguments, for example the tokenizer and FIELD options
[ ] - need to support sequential training for text datasets
-
trojai.modelgen.datasets module
-
class
trojai.modelgen.datasets.
CSVDataset
(path_to_data: str, csv_filename: str, true_label=False, path_to_csv=None, shuffle=False, random_state: Union[int, numpy.random.mtrand.RandomState] = None, data_loader: Union[str, Callable] = 'default_image_loader', data_transform=<function identity_transform>, label_transform=<function identity_transform>)[source]¶ Bases:
trojai.modelgen.datasets.DatasetInterface
Defines a dataset that is represented by a CSV file with columns “file”, “train_label”, and optionally “true_label”. The file column should contain the path to the file that contains the actual data, and “train_label” refers to the label with which the data should be trained. “true_label” refers to the actual label of the data point, and can differ from train_label if the dataset is poisoned. A CSVDataset can support any underlying data that can be loaded on the fly and fed into the model (for example: image data)
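The CSV layout described above can be illustrated with the standard library alone. The file paths and labels below are made up, and `read_rows` is a hypothetical helper showing how the `true_label` switch selects the label column; it does not reproduce CSVDataset itself.

```python
import csv
import io

# Hypothetical CSV contents; paths and labels are invented for illustration
csv_text = """file,train_label,true_label
images/img_0.png,3,3
images/img_1.png,7,2
"""


def read_rows(text: str, use_true_label: bool = False):
    """Return (file, label) pairs, taking the label from the requested column."""
    label_col = "true_label" if use_true_label else "train_label"
    reader = csv.DictReader(io.StringIO(text))
    return [(row["file"], int(row[label_col])) for row in reader]


train_rows = read_rows(csv_text)
true_rows = read_rows(csv_text, use_true_label=True)
# rows whose training label differs from the true label are the poisoned ones
poisoned = [t[0] for t, u in zip(train_rows, true_rows) if t[1] != u[1]]
```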
-
class
trojai.modelgen.datasets.
CSVTextDataset
(path_to_data: str, csv_filename: str, true_label: bool = False, text_field: torchtext.data.Field = None, text_field_kwargs: dict = None, label_field: torchtext.data.LabelField = None, label_field_kwargs: dict = None, shuffle: bool = False, random_state=None, **kwargs)[source]¶ Bases:
torchtext.data.Dataset
,trojai.modelgen.datasets.DatasetInterface
Defines a text dataset that is represented by a CSV file with columns “file”, “train_label”, and optionally “true_label”. The file column should contain the path to the file that contains the actual data, and “train_label” refers to the label with which the data should be trained. “true_label” refers to the actual label of the data point, and can differ from train_label if the dataset is poisoned. A CSVTextDataset can support text data, and differs from the CSVDataset because it loads all the text data into memory and builds a vocabulary from it.
-
class
trojai.modelgen.datasets.
DatasetInterface
(path_to_data: str, *args, **kwargs)[source]¶ Bases:
torch.utils.data.Dataset
-
trojai.modelgen.datasets.
csv_dataset_from_df
(path_to_data, data_df, true_label=False, shuffle=False, random_state: Union[int, numpy.random.mtrand.RandomState] = None, data_loader: Union[str, Callable] = 'default_image_loader', data_transform=<function identity_transform>, label_transform=<function identity_transform>)[source]¶ Initializes a CSVDataset object from a DataFrame rather than a filepath.
- Parameters
path_to_data – root folder where all the data is located
data_df – the dataframe in which the data lives
true_label – (bool) if True, then use the column “true_label” as the label associated with each datapoint. If False (default), use the column “train_label” as the label associated with each datapoint
shuffle – if True, the dataset is shuffled before loading into the model
random_state – if specified, seeds the random sampler when shuffling the data
data_loader – either a string value (currently only supports default_image_loader), or a callable function which takes a string input of the file path and returns the data
data_transform – a callable function which is applied to every data point before it is fed into the model. By default, this is an identity operation
label_transform – a callable function which is applied to every label before it is fed into the model. By default, this is an identity operation.
-
trojai.modelgen.datasets.
csv_textdataset_from_df
(data_df, true_label: bool = False, text_field: torchtext.data.Field = None, label_field: torchtext.data.LabelField = None, shuffle: bool = False, random_state=None, **kwargs)[source]¶ Initializes a CSVDataset object from a DataFrame rather than a filepath.
- Parameters
data_df – the dataframe in which the data lives
true_label – if True, then use the column “true_label” as the label associated with each datapoint
text_field – defines how the text data will be converted to a Tensor. If None, a default will be provided and tokenized with spacy
label_field – defines how to process the label associated with the text
max_vocab_size – the maximum vocabulary size that will be built
shuffle – if True, the dataset is shuffled before loading into the model
random_state – if specified, seeds the random sampler when shuffling the data
kwargs – any additional keyword arguments, currently unused
-
trojai.modelgen.datasets.
logger
= <Logger trojai.modelgen.datasets (WARNING)>¶ Define some basic default functions for dataset defaults. Unlike lambda functions, these allow Dataset objects to be pickled.
trojai.modelgen.default_optimizer module
-
class
trojai.modelgen.default_optimizer.
DefaultOptimizer
(optimizer_cfg: trojai.modelgen.config.DefaultOptimizerConfig = None)[source]¶ Bases:
trojai.modelgen.optimizer_interface.OptimizerInterface
Defines the default optimizer which trains the models
-
get_cfg_as_dict
() → dict[source]¶ Return a dictionary with key/value pairs that describe the parameters used to train the model.
-
static
load
(fname: str) → trojai.modelgen.optimizer_interface.OptimizerInterface[source]¶ Reconstructs a DefaultOptimizer by loading the configuration used to construct the original DefaultOptimizer, and then creating a new DefaultOptimizer object from the saved configuration.
- Parameters
fname – the filename of the saved optimizer
- Returns
a DefaultOptimizer object
-
save
(fname: str) → None[source]¶ Saves the configuration object used to construct the DefaultOptimizer. NOTE: because the DefaultOptimizer object itself is not persisted, but rather the DefaultOptimizerConfig object, the state of the object is not persisted!
- Parameters
fname – the filename to save the DefaultOptimizer’s configuration.
- Returns
None
-
test
(net: torch.nn.Module, clean_data: trojai.modelgen.datasets.CSVDataset, triggered_data: trojai.modelgen.datasets.CSVDataset, clean_test_triggered_labels_data: trojai.modelgen.datasets.CSVDataset, torch_dataloader_kwargs: dict = None) → dict[source]¶ Test the trained network
- Parameters
net – the trained module to run the test data through
clean_data – the clean Dataset
triggered_data – the triggered Dataset, if None, not computed
clean_test_triggered_labels_data – triggered part of the training dataset but with correct labels; see DataManager.load_data for more information.
torch_dataloader_kwargs – any keyword arguments to pass directly to PyTorch’s DataLoader
- Returns
a dictionary of the statistics on the clean and triggered data (if applicable)
-
train
(net: torch.nn.Module, dataset: trojai.modelgen.datasets.CSVDataset, torch_dataloader_kwargs: dict = None, use_amp: bool = False) -> (torch.nn.Module, typing.Sequence[trojai.modelgen.training_statistics.EpochStatistics], <class 'int'>)[source]¶ Train the network.
- Parameters
net – the network to train
dataset – the dataset to train the network on
torch_dataloader_kwargs – any additional kwargs to pass to PyTorch’s native DataLoader
use_amp – if True, uses automated mixed precision for FP16 training.
- Returns
the trained network, a list of EpochStatistics objects which contain the statistics for training, and the # of epochs on which the net was trained
-
train_epoch
(model: torch.nn.Module, train_loader: torch.utils.data.DataLoader, val_clean_loader: torch.utils.data.DataLoader, val_triggered_loader: torch.utils.data.DataLoader, epoch_num: int, use_amp: bool = False)[source]¶ Runs one epoch of training on the specified model
- Parameters
model – the model to train for one epoch
train_loader – a DataLoader object pointing to the training dataset
val_clean_loader – a DataLoader object pointing to the validation dataset that is clean
val_triggered_loader – a DataLoader object pointing to the validation dataset that is triggered
epoch_num – the epoch number that is being trained
use_amp – if True use automated mixed precision for FP16 training.
- Returns
a list of statistics for batches where statistics were computed
-
-
trojai.modelgen.default_optimizer.
split_val_clean_trig
(val_dataset)[source]¶ Splits the validation dataset into clean and triggered.
- Parameters
val_dataset – the validation dataset to split
- Returns
A tuple of the clean & triggered validation dataset
-
trojai.modelgen.default_optimizer.
train_val_dataset_split
(dataset: torch.utils.data.Dataset, split_amt: float, val_data_transform: Callable, val_label_transform: Callable) -> (torch.utils.data.Dataset, torch.utils.data.Dataset)[source]¶ Splits a PyTorch dataset (of type: torch.utils.data.Dataset) into train/validation
- TODO:
[ ] - specify random seed to torch splitter
- Parameters
dataset – the dataset to be split
split_amt – fraction specifying the validation dataset size relative to the whole. 1-split_amt will be the size of the training dataset
val_data_transform – (function: any -> any) how to transform the validation data to fit into the desired model and objective function
val_label_transform – (function: any -> any) how to transform the validation labels
- Returns
a tuple of the train and validation datasets
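The meaning of `split_amt` can be made concrete with a plain-Python sketch over sample indices. This is not the trojai implementation: `split_indices`, its seeding, and the rounding rule are assumptions chosen for illustration.

```python
import random
from typing import List, Tuple


def split_indices(num_samples: int, split_amt: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical sketch: return (train_indices, val_indices) for a validation fraction split_amt."""
    if not 0.0 <= split_amt <= 1.0:
        raise ValueError("split_amt must be in [0, 1]")
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    num_val = int(round(num_samples * split_amt))
    # the first num_val shuffled indices form the validation set, the rest the training set
    return indices[num_val:], indices[:num_val]


# With the TrainingConfig default train_val_split of 0.05
train_idx, val_idx = split_indices(100, 0.05)
```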
trojai.modelgen.model_generator module
-
class
trojai.modelgen.model_generator.
ModelGenerator
(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]], *args, **kwargs)[source]¶ Bases:
trojai.modelgen.model_generator_interface.ModelGeneratorInterface
Generates models based on requested data and saves each to a file.
trojai.modelgen.model_generator_interface module
-
class
trojai.modelgen.model_generator_interface.
ModelGeneratorInterface
(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]])[source]¶ Bases:
abc.ABC
Generates models based on requested data and saves each to a file.
-
trojai.modelgen.model_generator_interface.
validate_model_generator_interface_input
(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]]) → None[source]¶ Validates a ModelGeneratorConfig
- Parameters
configs – (ModelGeneratorConfig or sequence) configurations to be used for model generation
- Returns
None
trojai.modelgen.optimizer_interface module
-
class
trojai.modelgen.optimizer_interface.
OptimizerInterface
[source]¶ Bases:
abc.ABC
Object that performs training and testing of TrojAI models.
-
abstract
get_cfg_as_dict
() → dict[source]¶ Return a dictionary with key/value pairs that describe the parameters used to train the model.
-
abstract
get_device_type
() → str[source]¶ Return a string representation of the type of device used by the optimizer to train the model.
-
abstract static
load
(fname: str)[source]¶ Load an optimizer from disk and return it
- Parameters
fname – the filename where the optimizer is serialized
- Returns
The loaded optimizer
-
abstract
save
(fname: str) → None[source]¶ Save the optimizer to a file
- Parameters
fname – the filename to save the optimizer to
-
abstract
test
(model: torch.nn.Module, clean_test_data: torch.utils.data.Dataset, triggered_test_data: torch.utils.data.Dataset, clean_test_triggered_labels_data: torch.utils.data.Dataset, torch_dataloader_kwargs) → dict[source]¶ Perform whatever tests desired on the model with clean data and triggered data, return a dictionary of results.
- Parameters
model – (torch.nn.Module) Trained Pytorch model
clean_test_data – (CSVDataset) Object containing clean test data
triggered_test_data – (CSVDataset or None) Object containing triggered test data, None if triggered data was not provided for testing
clean_test_triggered_labels_data – triggered part of the training dataset but with correct labels; see DataManager.load_data for more information.
torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class
- Returns
(dict) Dictionary of test accuracy results. Required key, value pairs are:
clean_accuracy: (float in [0, 1]) classification accuracy on clean data
clean_n_total: (int) number of examples in clean test set
- The following keys are optional, but should be used if triggered test data was provided
triggered_accuracy: (float in [0, 1]) classification accuracy on triggered data
triggered_n_total: (int) number of examples in triggered test set
NOTE: This list may be augmented in the future to allow for additional test data collection.
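The shape of the returned dictionary can be sketched with a small checker. The key names come from the description above. The function `check_test_results` and the numeric values are hypothetical and only illustrate the contract a custom optimizer's test method would need to satisfy.

```python
from typing import Dict

REQUIRED_KEYS = ("clean_accuracy", "clean_n_total")
OPTIONAL_KEYS = ("triggered_accuracy", "triggered_n_total")


def check_test_results(results: Dict[str, float], has_triggered_data: bool = False) -> None:
    """Hypothetical checker for the result dictionary described above."""
    expected = REQUIRED_KEYS + (OPTIONAL_KEYS if has_triggered_data else ())
    missing = [key for key in expected if key not in results]
    if missing:
        raise KeyError(f"missing result keys: {missing}")
    for key in ("clean_accuracy", "triggered_accuracy"):
        # accuracies are fractions, not percentages
        if key in results and not 0.0 <= results[key] <= 1.0:
            raise ValueError(f"{key} must lie in [0, 1]")


# Made-up numbers, only to show the expected structure
example_results = {
    "clean_accuracy": 0.97,
    "clean_n_total": 1000,
    "triggered_accuracy": 0.99,
    "triggered_n_total": 200,
}
check_test_results(example_results, has_triggered_data=True)
```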
-
abstract
train
(model: torch.nn.Module, data: torch.utils.data.Dataset, progress_bar_disable: bool, torch_dataloader_kwargs: dict = None) -> (torch.nn.Module, typing.Sequence[trojai.modelgen.training_statistics.EpochStatistics], <class 'int'>)[source]¶ Train the given model using parameters in self.training_params
- Parameters
model – (torch.nn.Module) The untrained Pytorch model
data – (CSVDataset) Object containing training data, output 0 from TrojaiDataManager.load_data()
progress_bar_disable – (bool) Don’t display the progress bar if True
torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class
- Returns
(torch.nn.Module, EpochStatistics) trained model, a sequence of EpochStatistics objects (one for each epoch), and the # of epochs with which the model was trained (useful for early stopping).
trojai.modelgen.runner module
trojai.modelgen.torchtext_optimizer module
-
class
trojai.modelgen.torchtext_optimizer.
TorchTextOptimizer
(optimizer_cfg: trojai.modelgen.config.TorchTextOptimizerConfig = None)[source]¶ Bases:
trojai.modelgen.optimizer_interface.OptimizerInterface
An optimizer for training and testing LSTM models. Currently in a prototype state.
-
convert_dataset_to_dataiterator
(dataset: trojai.modelgen.datasets.CSVTextDataset, batch_size: int = None) → torchtext.data.iterator.Iterator[source]¶
-
get_cfg_as_dict
() → dict[source]¶ Return a dictionary with key/value pairs that describe the parameters used to train the model.
-
static
load
(fname: str) → trojai.modelgen.optimizer_interface.OptimizerInterface[source]¶ Reconstructs a TorchTextOptimizer by loading the configuration used to construct the original TorchTextOptimizer, and then creating a new TorchTextOptimizer object from the saved configuration.
- Parameters
fname – the filename of the saved TorchTextOptimizer
- Returns
a TorchTextOptimizer object
-
save
(fname: str) → None[source]¶ Saves the configuration object used to construct the TorchTextOptimizer. NOTE: because the TorchTextOptimizer object itself is not persisted, but rather the TorchTextOptimizerConfig object, the state of the object does not persist!
- Parameters
fname – the filename to save the TorchTextOptimizer’s configuration.
-
test
(model: torch.nn.Module, clean_data: trojai.modelgen.datasets.CSVTextDataset, triggered_data: trojai.modelgen.datasets.CSVTextDataset, clean_test_triggered_labels_data: trojai.modelgen.datasets.CSVTextDataset, progress_bar_disable: bool = False, torch_dataloader_kwargs: dict = None) → dict[source]¶ Test the trained network
- Parameters
model – the trained module to run the test data through
clean_data – the clean Dataset
triggered_data – the triggered Dataset, if None, not computed
clean_test_triggered_labels_data – triggered part of the training dataset but with correct labels; see DataManager.load_data for more information.
progress_bar_disable – if True, disables the progress bar
torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class
- Returns
a dictionary of the statistics on the clean and triggered data (if applicable)
-
train
(net: torch.nn.Module, dataset: trojai.modelgen.datasets.CSVTextDataset, progress_bar_disable: bool = False, torch_dataloader_kwargs: dict = None) -> (torch.nn.Module, typing.Sequence[trojai.modelgen.training_statistics.EpochStatistics], <class 'int'>)[source]¶ Train the network.
- Parameters
net – the model to train
dataset – the dataset to train the network on
progress_bar_disable – if True, disables the progress bar
torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class
- Returns
the trained network, list of EpochStatistics objects, and the # of epochs on which the net was trained
-
train_epoch
(model: torch.nn.Module, train_loader: torchtext.data.iterator.Iterator, val_loader: torchtext.data.iterator.Iterator, epoch_num: int, progress_bar_disable: bool = False)[source]¶ Runs one epoch of training on the specified model
- Parameters
model – the model to train for one epoch
train_loader – a DataLoader object pointing to the training dataset
val_loader – a DataLoader object pointing to the validation dataset
epoch_num – the epoch number that is being trained
progress_bar_disable – if True, disables the progress bar
- Returns
a list of statistics for batches where statistics were computed
-
static
train_val_dataset_split
(dataset: torchtext.data.Dataset, split_amt: float, val_data_transform: Callable, val_label_transform: Callable) -> (torchtext.data.Dataset, torchtext.data.Dataset)[source]¶ Splits a torchtext dataset (of type: torchtext.data.Dataset) into train/test. NOTE: although this has the same functionality as default_optimizer.train_val_dataset_split, it works with a
torchtext.data.Dataset object rather than torch.utils.data.Dataset.
- TODO:
[ ] - specify random seed to torch splitter
- Parameters
dataset – the dataset to be split
split_amt – fraction specifying the validation dataset size relative to the whole. 1-split_amt will be the size of the training dataset
val_data_transform – (function: any -> any) how to transform the validation data to fit into the desired model and objective function
val_label_transform – (function: any -> any) how to transform the validation labels
- Returns
a tuple of the train and validation datasets
-
trojai.modelgen.training_statistics module
-
class
trojai.modelgen.training_statistics.
BatchStatistics
(batch_num: int, batch_train_accuracy: float, batch_train_loss: float)[source]¶ Bases:
object
Represents the statistics collected from training a batch. NOTE: this is currently unused!
-
class
trojai.modelgen.training_statistics.
EpochStatistics
(epoch_num, training_stats=None, validation_stats=None, batch_training_stats=None)[source]¶ Bases:
object
Contains the statistics computed for an Epoch
-
class
trojai.modelgen.training_statistics.
EpochTrainStatistics
(train_acc: float, train_loss: float)[source]¶ Bases:
object
Defines the training statistics for one epoch of training
-
class
trojai.modelgen.training_statistics.
EpochValidationStatistics
(val_clean_acc, val_clean_loss, val_triggered_acc, val_triggered_loss)[source]¶ Bases:
object
Defines the validation statistics for one epoch of training
-
class
trojai.modelgen.training_statistics.
TrainingRunStatistics
[source]¶ Bases:
object
Contains the statistics computed for an entire training run, a sequence of epochs
- TODO:
[ ] - have another function which returns detailed statistics per epoch in an easily serialized manner
-
add_epoch
(epoch_stats: Union[trojai.modelgen.training_statistics.EpochStatistics, Sequence[trojai.modelgen.training_statistics.EpochStatistics]])[source]¶
-
autopopulate_final_summary_stats
()[source]¶ Uses the information from the final epoch’s final batch to auto-populate the following statistics:
final_train_acc, final_train_loss, final_val_acc, final_val_loss
-
-
trojai.modelgen.training_statistics.
logger
= <Logger trojai.modelgen.training_statistics (WARNING)>¶ Contains classes necessary for collecting statistics on the model during training
trojai.modelgen.uge_model_generator module
-
trojai.modelgen.uge_model_generator.
ALL_EXEC_PERMISSIONS
= 365¶ This file contains all the functionality needed to train models for a Univa Grid Engine (UGE) HPC cluster.
-
class
trojai.modelgen.uge_model_generator.
UGEModelGenerator
(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]], uge_config: trojai.modelgen.config.UGEConfig, working_directory: str = '/home/docs/uge_model_generator', validate_uge_dirs: bool = True)[source]¶ Bases:
trojai.modelgen.model_generator_interface.ModelGeneratorInterface
Class which generates models utilizing a Univa Grid Engine
-
expand_modelgen_configs_to_process
() → Sequence[trojai.modelgen.config.ModelGeneratorConfig][source]¶ Converts a sequence of ModelGeneratorConfig objects into another sequence of ModelGeneratorConfig objects such that each element in the sequence only creates one model. For example:
Input: cfgs = [cfg1->num_models=1, cfg2->num_models=2]. len(cfgs)=2
Output: cfgs = [cfg1->num_models=1, cfg2->num_models=1, cfg2->num_models=1]. len(cfgs)=3
- NOTE: This will lead to multiple configs pointing to the same data on disk. I’m not sure if this is a problem for PyTorch or not, but this is something to investigate if unexpected results arise.
- Returns
the expanded sequence of configurations
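The expansion in the example above can be sketched with a stand-in config class. `StubConfig` is a hypothetical placeholder carrying only a name and `num_models`; the real ModelGeneratorConfig has many more fields, and the real method may copy configs differently.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass
class StubConfig:
    """Hypothetical stand-in for ModelGeneratorConfig with only the fields needed here."""
    name: str
    num_models: int


def expand_configs(cfgs: Sequence[StubConfig]) -> List[StubConfig]:
    """Return one single-model config per model requested across all inputs."""
    expanded = []
    for cfg in cfgs:
        for _ in range(cfg.num_models):
            # each copy is identical to the original except that it trains one model
            expanded.append(replace(cfg, num_models=1))
    return expanded


cfgs = [StubConfig("cfg1", 1), StubConfig("cfg2", 2)]
expanded = expand_configs(cfgs)
```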
-
get_queue_numjobs_assignment
() → Sequence[source]¶ Determine the number of jobs to give to each queue based on UGEConfig
- Returns
a list of tuples, with each tuple containing the queue in index-0, and the number of jobs assigned to that queue in index-1
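One way such an assignment could work is sketched below. It assumes jobs are split in proportion to `queue_distribution` (uniformly when it is None), with any remainder handed out to queues in order. The function `assign_jobs`, the queue names, and the remainder rule are all hypothetical; the actual trojai logic may differ.

```python
from typing import List, Optional, Sequence, Tuple


def assign_jobs(queue_names: Sequence[str], num_jobs: int,
                queue_distribution: Optional[Sequence[float]] = None) -> List[Tuple[str, int]]:
    """Hypothetical sketch: return (queue, number_of_jobs) tuples whose counts sum to num_jobs."""
    if queue_distribution is None:
        queue_distribution = [1.0 / len(queue_names)] * len(queue_names)
    if len(queue_distribution) != len(queue_names):
        raise ValueError("need one distribution entry per queue")
    # round each share down, then distribute whatever is left over
    counts = [int(num_jobs * frac) for frac in queue_distribution]
    leftover = num_jobs - sum(counts)
    for i in range(leftover):
        counts[i % len(counts)] += 1
    return list(zip(queue_names, counts))


# Made-up queue names
assignment = assign_jobs(["gpu.q", "cpu.q"], 8, [0.75, 0.25])
```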
-
trojai.modelgen.utils module
-
trojai.modelgen.utils.
clamp
(X, l, u, cuda=True)[source]¶ Clamps a tensor to lower bound l and upper bound u.
- Parameters
X – the tensor to clamp.
l – lower bound for the clamp.
u – upper bound for the clamp.
cuda – whether the tensor should be on the gpu.
-
trojai.modelgen.utils.
get_uniform_delta
(shape, eps, requires_grad=True)[source]¶ Generates a torch uniform random matrix of the given shape, with values within ±eps.
- Parameters
shape – the tensor shape to create.
eps – the epsilon bounds 0 ± eps for the uniform random tensor.
requires_grad – whether the tensor requires a gradient.
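The arithmetic behind `clamp` and `get_uniform_delta` can be shown on flat Python lists. These are plain-Python stand-ins with scalar bounds, not the torch-based functions documented here; `clamp_values` and `uniform_delta` are hypothetical names.

```python
import random
from typing import List, Optional


def clamp_values(x: List[float], lower: float, upper: float) -> List[float]:
    """Clamp every element of x into the interval [lower, upper]."""
    return [max(min(xi, upper), lower) for xi in x]


def uniform_delta(n: int, eps: float, seed: Optional[int] = None) -> List[float]:
    """Draw n values uniformly from [-eps, eps]."""
    rng = random.Random(seed)
    return [rng.uniform(-eps, eps) for _ in range(n)]


clamped = clamp_values([-2.0, 0.5, 3.0], 0.0, 1.0)
delta = uniform_delta(1000, 0.1, seed=0)
```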
-
trojai.modelgen.utils.
make_trojai_model_dict
(model)[source]¶ Create a TrojAI approved dictionary specification of a PyTorch model for saving to a file. E.g. for a trained model ‘model’:
save_dict = make_trojai_model_dict(model)
torch.save(save_dict, filename)
- Parameters
model – (torch.nn.Module) The desired model to be saved.
- Returns
(dict) dictionary containing TrojAI approved information about the model, which can also be used for later loading the model.
-
trojai.modelgen.utils.
resave_trojai_model_as_dict
(file, new_loc=None)[source]¶ - Load a fully serialized Pytorch model (i.e. whole model was saved instead of a specification) and save it as a
TrojAI style dictionary specification.
- Parameters
file – (str) Location of the file to re-save
new_loc – (str) Where to save the file if replacing the original is not desired