trojai.modelgen package

Subpackages

trojai.modelgen.architectures package

Submodules

trojai.modelgen.architecture_factory module

class trojai.modelgen.architecture_factory.ArchitectureFactory

Bases: abc.ABC

Factory object that returns architectures (untrained models) for training.

abstract new_architecture(**kwargs) → torch.nn.Module: Returns a new architecture (untrained model).

Returns: an untrained torch.nn.Module
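A minimal sketch of the factory pattern described above, assuming each call to new_architecture must return a fresh object so that every generated model starts untrained. The abstract base is re-declared locally (not imported from trojai) and TinyModel / TinyModelFactory are hypothetical, so the snippet runs without trojai or PyTorch; in real use new_architecture would return a torch.nn.Module.

```python
from abc import ABC, abstractmethod


class ArchitectureFactory(ABC):
    """Local stand-in mirroring the documented interface (not imported from trojai)."""

    @abstractmethod
    def new_architecture(self, **kwargs):
        """Return a new, untrained model."""


class TinyModel:
    """Hypothetical placeholder for a torch.nn.Module."""

    def __init__(self, num_classes: int = 10):
        self.num_classes = num_classes
        self.trained = False


class TinyModelFactory(ArchitectureFactory):
    """Hypothetical factory: builds a brand-new TinyModel on every call."""

    def new_architecture(self, **kwargs):
        # kwargs play the role of arch_factory_kwargs in the configs
        return TinyModel(**kwargs)


factory = TinyModelFactory()
m1 = factory.new_architecture(num_classes=5)
m2 = factory.new_architecture(num_classes=5)
```

Returning a new instance per call is what lets one factory back many training runs without weights leaking between models.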

trojai.modelgen.config module

class trojai.modelgen.config.ConfigInterface

Bases: abc.ABC

Defines the interface for all configuration objects

class trojai.modelgen.config.DefaultOptimizerConfig(training_cfg: trojai.modelgen.config.TrainingConfig = None, reporting_cfg: trojai.modelgen.config.ReportingConfig = None)

Bases: trojai.modelgen.config.OptimizerConfigInterface

Defines the configuration needed to set up the DefaultOptimizer

get_device_type(): Returns the device associated with this optimizer configuration. Needed to save/load for UGE.

Returns: (str) the device type represented as a string

static load(fname): Loads a configuration from disk

Parameters: fname – the filename where the config is stored
Returns: the loaded configuration

save(fname): Saves the optimizer configuration to a file

Parameters: fname – the filename to save the config to
Returns: None

class trojai.modelgen.config.DefaultSoftToHardFn

Bases: object

The default conversion from soft-decision outputs to hard-decision outputs
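As an illustration of a soft-to-hard conversion, the sketch below maps each row of class scores to the index of its largest score. Argmax is an assumption here: the documentation does not state the rule DefaultSoftToHardFn uses, and the real class operates on tensors rather than Python lists. The function name is hypothetical.

```python
from typing import List, Sequence


def soft_to_hard_sketch(soft_outputs: Sequence[Sequence[float]]) -> List[int]:
    """Map each row of class scores to the index of its largest score.

    Assumption: argmax is the soft-to-hard rule; trojai's DefaultSoftToHardFn
    may differ in details.
    """
    hard = []
    for row in soft_outputs:
        # max() over indices keyed by score; ties resolve to the first index
        hard.append(max(range(len(row)), key=lambda i: row[i]))
    return hard


print(soft_to_hard_sketch([[0.1, 0.7, 0.2], [2.0, -1.0, 0.5]]))  # [1, 0]
```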

class trojai.modelgen.config.EarlyStoppingConfig(num_epochs: int = 5, val_loss_eps: float = 0.001)

Bases: trojai.modelgen.config.ConfigInterface

Defines configuration related to early stopping.

validate()
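The documentation does not spell out how num_epochs and val_loss_eps are applied. A common interpretation, assumed here and not confirmed by the source, is: stop once num_epochs consecutive epochs fail to lower the best validation loss by more than val_loss_eps. The helper below is hypothetical and only illustrates that rule.

```python
from typing import Optional, Sequence


def stop_epoch(val_losses: Sequence[float], num_epochs: int = 5,
               val_loss_eps: float = 0.001) -> Optional[int]:
    """Return the epoch index at which training would stop, or None.

    Assumed rule: stop once `num_epochs` consecutive epochs fail to improve
    the best validation loss seen so far by more than `val_loss_eps`.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without sufficient improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - val_loss_eps:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= num_epochs:
                return epoch
    return None


losses = [1.0, 0.8, 0.7, 0.6995, 0.6994, 0.70]
print(stop_epoch(losses, num_epochs=3))  # 5
```

Under this rule the tiny drops at epochs 3 and 4 do not count as improvement, which is the purpose of an epsilon threshold.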

class trojai.modelgen.config.ModelGeneratorConfig(arch_factory: trojai.modelgen.architecture_factory.ArchitectureFactory, data: trojai.modelgen.data_manager.DataManager, model_save_dir: str, stats_save_dir: str, num_models: int, arch_factory_kwargs: dict = None, arch_factory_kwargs_generator: Callable = None, optimizer: Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig, Sequence[Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig]]] = None, parallel=False, amp=False, experiment_cfg: dict = None, run_ids: Union[Any, Sequence[Any]] = None, filenames: Union[str, Sequence[str]] = None, save_with_hash: bool = False)

Bases: trojai.modelgen.config.ConfigInterface

Object used to configure the model generator

static load(fname: str): Loads a ModelGeneratorConfig object from data that was saved using the .save() function.

Parameters: fname – the filename where the ModelGeneratorConfig object is saved
Returns: a ModelGeneratorConfig object

save(fname: str): Saves the ModelGeneratorConfig object in two parts: every object within the config except the optimizer is saved in the .klass.save file, and the optimizer is saved separately.

Parameters: fname – the filename to save the configuration to
Returns: None

validate() → None: Validates the input arguments used to construct the object

Returns: None

class trojai.modelgen.config.OptimizerConfigInterface

Bases: trojai.modelgen.config.ConfigInterface

abstract get_device_type()

abstract static load(fname)

save(fname)

class trojai.modelgen.config.ReportingConfig(num_batches_per_logmsg: int = 100, disable_progress_bar: bool = False, num_epochs_per_metric: int = 1, num_batches_per_metrics: int = 50, tensorboard_output_dir: str = None, experiment_name: str = 'experiment')

Bases: trojai.modelgen.config.ConfigInterface

Defines all options to set up how data is reported back to the user while models are being trained

validate()

class trojai.modelgen.config.RunnerConfig(arch_factory: trojai.modelgen.architecture_factory.ArchitectureFactory, data: trojai.modelgen.data_manager.DataManager, arch_factory_kwargs: dict = None, arch_factory_kwargs_generator: Callable = None, optimizer: Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig, Sequence[Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig]]] = None, parallel: bool = False, amp: bool = False, model_save_dir: str = '/tmp/models', stats_save_dir: str = '/tmp/model_stats', model_save_format: str = 'pt', run_id: Any = None, filename: str = None, save_with_hash: bool = False)

Bases: trojai.modelgen.config.ConfigInterface

Container for all parameters needed to use the Runner to train a model.

static setup_optimizer_generator(optimizer, data): Converts an optimizer specification to a generator, to be compatible with sequential training.

Parameters

optimizer – the optimizer to configure into a generator
data – the data (one or more datasets) for which optimizers need to be created

Returns

a generator that returns optimizers for every dataset to be trained
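One plausible reading of setup_optimizer_generator, sketched in plain Python: a single optimizer (or optimizer config) is reused for every dataset in sequential training, while a list or tuple must supply exactly one entry per dataset. Both the reuse rule and the num_datasets parameter are assumptions of this hypothetical sketch; the real method takes the data object and may differ.

```python
from typing import Any, Iterator, Sequence, Union


def setup_optimizer_generator_sketch(optimizer: Union[Any, Sequence[Any]],
                                     num_datasets: int) -> Iterator[Any]:
    """Yield one optimizer (or optimizer config) per dataset.

    Assumed behaviour: a single spec is reused for every dataset; a list or
    tuple must provide exactly one entry per dataset. Because this is a
    generator, the length check runs on the first next() call.
    """
    if isinstance(optimizer, (list, tuple)):
        if len(optimizer) != num_datasets:
            raise ValueError("need exactly one optimizer per dataset")
        yield from optimizer
    else:
        for _ in range(num_datasets):
            yield optimizer


print(list(setup_optimizer_generator_sketch("adam", 3)))  # ['adam', 'adam', 'adam']
```

Strings are deliberately not treated as sequences here, so a name such as "adam" counts as a single specification.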

validate() → None: Validates the RunnerConfig object

Returns: None

static validate_optimizer(optimizer, data): Validates an optimizer configuration

Parameters

optimizer – the optimizer/optimizer configuration to be validated
data – the data to be optimized

class trojai.modelgen.config.TorchTextOptimizerConfig(training_cfg: trojai.modelgen.config.TrainingConfig = None, reporting_cfg: trojai.modelgen.config.ReportingConfig = None, copy_pretrained_embeddings: bool = False)

Bases: trojai.modelgen.config.OptimizerConfigInterface

Defines the configuration needed to set up the TorchTextOptimizer

get_device_type(): Returns the device associated with this optimizer configuration. Needed to save/load for UGE.

Returns: (str) the device type represented as a string

static load(fname): Loads a configuration from disk

Parameters: fname – the filename where the config is stored
Returns: the loaded configuration

save(fname): Saves the optimizer configuration to a file

Parameters: fname – the filename to save the config to
Returns: None

validate()

class trojai.modelgen.config.TrainingConfig(device: Union[str, torch.device] = 'cpu', epochs: int = 10, batch_size: int = 32, lr: float = 0.0001, optim: Union[str, trojai.modelgen.optimizer_interface.OptimizerInterface] = 'adam', optim_kwargs: dict = None, objective: Union[str, Callable] = 'cross_entropy_loss', objective_kwargs: dict = None, save_best_model: bool = False, train_val_split: float = 0.05, val_data_transform: Callable[[Any], Any] = None, val_label_transform: Callable[[int], int] = None, val_dataloader_kwargs: dict = None, early_stopping: trojai.modelgen.config.EarlyStoppingConfig = None, soft_to_hard_fn: Callable = None, soft_to_hard_fn_kwargs: dict = None, lr_scheduler: Any = None, lr_scheduler_init_kwargs: dict = None, lr_scheduler_call_arg: Any = None, clip_grad: bool = False, clip_type: str = 'norm', clip_val: float = 1.0, clip_kwargs: dict = None, adv_training_eps: float = None, adv_training_iterations: int = None, adv_training_ratio: float = None)

Bases: trojai.modelgen.config.ConfigInterface

Defines all required items to set up training with an optimizer

get_cfg_as_dict(): Returns a dictionary representation of the configuration

Returns: (dict) a dictionary

validate() → None: Validates the object configuration

Returns: None
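TrainingConfig exposes clip_grad, clip_type='norm', clip_val and clip_kwargs. Assuming these map onto PyTorch's torch.nn.utils.clip_grad_norm_ (and clip_grad_value_ for value clipping), the arithmetic behind the two modes can be sketched on a flat list of gradient values. The helper names are hypothetical; real training would call the PyTorch utilities on model.parameters() instead.

```python
import math
from typing import List


def clip_by_norm(grads: List[float], clip_val: float = 1.0) -> List[float]:
    """Rescale a flat gradient vector so its L2 norm is at most clip_val.

    Same rule as torch.nn.utils.clip_grad_norm_ (minus its small epsilon);
    assumed to correspond to clip_type='norm'.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= clip_val:
        return list(grads)
    scale = clip_val / total_norm
    return [g * scale for g in grads]


def clip_by_value(grads: List[float], clip_val: float = 1.0) -> List[float]:
    """Clamp each element to [-clip_val, clip_val], as clip_grad_value_ does."""
    return [max(-clip_val, min(clip_val, g)) for g in grads]


clipped = clip_by_norm([3.0, 4.0], clip_val=1.0)  # direction kept, norm 1.0
print(clip_by_value([3.0, -0.5], clip_val=1.0))   # [1.0, -0.5]
```

Norm clipping preserves the gradient direction, while value clipping can change it; that difference is why the two are offered as separate modes.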

class trojai.modelgen.config.UGEConfig(queues: Union[trojai.modelgen.config.UGEQueueConfig, Sequence[trojai.modelgen.config.UGEQueueConfig]], queue_distribution: Sequence[float] = None, multi_model_same_gpu: bool = False)

Bases: object

Defines a configuration for the UGE

validate(): Validates the UGEConfig object

class trojai.modelgen.config.UGEQueueConfig(queue_name: str, gpu_enabled: bool, sync_mode: bool = False)

Bases: object

Defines the configuration for a Queue w.r.t. UGE in TrojAI

validate() → None: Validates the UGEQueueConfig object

trojai.modelgen.config.identity_function(x)

trojai.modelgen.config.logger = <Logger trojai.modelgen.config (WARNING)>: Logger for the trojai.modelgen.config module, which defines all configurations pertinent to model generation.

trojai.modelgen.config.modelgen_cfg_to_runner_cfg(modelgen_cfg: trojai.modelgen.config.ModelGeneratorConfig, run_id=None, filename=None) → trojai.modelgen.config.RunnerConfig: Convenience function which creates a RunnerConfig object from a ModelGeneratorConfig object.

Parameters

modelgen_cfg – the ModelGeneratorConfig to convert
run_id – run_id to be associated with the RunnerConfig
filename – filename to be associated with the RunnerConfig

Returns

the created RunnerConfig object

trojai.modelgen.constants module

trojai.modelgen.constants.VALID_DEVICES = ['cpu', 'cuda']: Defines valid devices on which models can be trained

trojai.modelgen.constants.VALID_LOSS_FUNCTIONS = ['cross_entropy_loss', 'BCEWithLogitsLoss']: Defines valid loss functions which can be specified when configuring an optimizer implementing the OptimizerInterface

trojai.modelgen.constants.VALID_OPTIMIZERS = ['adam', 'sgd', 'adamw']: Defines valid optimization algorithms which can be specified when configuring an optimizer implementing the OptimizerInterface

The module also defines the valid types of data that the modelgen pipeline can handle.
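A minimal sketch of how string options could be checked against these whitelists. The list values are copied from the constants above; check_choice is a hypothetical helper, and the real validate() methods may do more (for example, TrainingConfig also accepts an OptimizerInterface for optim and a callable for objective, which would bypass a string check).

```python
# Values copied from trojai.modelgen.constants as listed above.
VALID_DEVICES = ['cpu', 'cuda']
VALID_LOSS_FUNCTIONS = ['cross_entropy_loss', 'BCEWithLogitsLoss']
VALID_OPTIMIZERS = ['adam', 'sgd', 'adamw']


def check_choice(name: str, value: str, valid: list) -> None:
    """Raise ValueError if a string option is not in its whitelist (hypothetical helper)."""
    if value not in valid:
        raise ValueError(f"{name}={value!r} is not one of {valid}")


check_choice("optim", "adamw", VALID_OPTIMIZERS)                       # passes
check_choice("objective", "cross_entropy_loss", VALID_LOSS_FUNCTIONS)  # passes
```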

trojai.modelgen.data_configuration module

class trojai.modelgen.data_configuration.DataConfiguration

Bases: object

class trojai.modelgen.data_configuration.ImageDataConfiguration

Bases: trojai.modelgen.data_configuration.DataConfiguration

class trojai.modelgen.data_configuration.TextDataConfiguration(max_vocab_size: int = 25000, embedding_dim: int = 100, embedding_type: str = 'glove', num_tokens_embedding_train: str = '6B', text_field_kwargs: dict = None, label_field_kwargs: dict = None)

Bases: trojai.modelgen.data_configuration.DataConfiguration

set_embedding_vectors_cfg()

validate()

trojai.modelgen.data_configuration.logger = <Logger trojai.modelgen.data_configuration (WARNING)>: Logger for the trojai.modelgen.data_configuration module, which provides configurations for various types of data.

trojai.modelgen.data_descriptions module

This module defines data description classes, which contain information that may be used to instantiate an architecture.

class trojai.modelgen.data_descriptions.CSVImageDatasetDesc(num_samples, shuffled, num_classes)

Bases: trojai.modelgen.data_descriptions.DataDescription

Information potentially relevant to instantiating models to process image data

class trojai.modelgen.data_descriptions.CSVTextDatasetDesc(vocab_size, unk_idx, pad_idx)

Bases: trojai.modelgen.data_descriptions.DataDescription

Information potentially relevant to instantiating models to process text data

class trojai.modelgen.data_descriptions.DataDescription

Bases: object

Generic data description class from which all data-type-specific data descriptions inherit

trojai.modelgen.data_manager module

class trojai.modelgen.data_manager.DataManager(experiment_path: str, train_file: Union[str, Sequence[str]], clean_test_file: str, triggered_test_file: str = None, data_type: str = 'image', train_data_transform: Callable[[Any], Any] = <function DataManager.<lambda>>, train_label_transform: Callable[[int], int] = <function DataManager.<lambda>>, test_data_transform: Callable[[Any], Any] = <function DataManager.<lambda>>, test_label_transform: Callable[[int], int] = <function DataManager.<lambda>>, file_loader: Union[Callable[[str], Any], str] = 'default_image_loader', shuffle_train=True, shuffle_clean_test=False, shuffle_triggered_test=False, data_configuration: trojai.modelgen.data_configuration.DataConfiguration = None, custom_datasets: dict = None, train_dataloader_kwargs: dict = None, test_dataloader_kwargs: dict = None)

Bases: object

Manages data from a trojai.datagen experiment.

load_data()

Loads the experiment data specified at initialization.

Returns: objects containing the training and test data, and the triggered data if it was provided.

TODO:

[ ] - extend the text data type to accept more input arguments, for example the tokenizer and FIELD options
[ ] - support sequential training for text datasets

validate() → None

Validates the construction of the DataManager object

Returns: None

TODO:

[ ] - think about whether the contents of the files passed into the DataManager should be validated, in addition to simply checking for existence, which is what is done now

trojai.modelgen.datasets module

class trojai.modelgen.datasets.CSVDataset(path_to_data: str, csv_filename: str, true_label=False, path_to_csv=None, shuffle=False, random_state: Union[int, numpy.random.mtrand.RandomState] = None, data_loader: Union[str, Callable] = 'default_image_loader', data_transform=<function identity_transform>, label_transform=<function identity_transform>)

Bases: trojai.modelgen.datasets.DatasetInterface

Defines a dataset that is represented by a CSV file with columns “file”, “train_label”, and optionally “true_label”. The file column should contain the path to the file that contains the actual data, and “train_label” refers to the label with which the data should be trained. “true_label” refers to the actual label of the data point, and can differ from train_label if the dataset is poisoned. A CSVDataset can support any underlying data that can be loaded on the fly and fed into the model (for example: image data)
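The CSV layout above can be illustrated with the standard csv module. This sketch only reads the index file and picks the label column the way the true_label flag is described for csv_dataset_from_df ("true_label" if True, otherwise "train_label"); it does not load the referenced files, which CSVDataset resolves relative to path_to_data. The CSV contents and the helper name are hypothetical.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical CSV contents using the column names described above.
CSV_TEXT = """file,train_label,true_label
img/0001.png,3,3
img/0002.png,7,3
img/0003.png,1,1
"""


def read_labelled_files(csv_text: str, true_label: bool = False) -> List[Tuple[str, int]]:
    """Return (file path, label) pairs from the chosen label column."""
    column = "true_label" if true_label else "train_label"
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["file"], int(row[column])) for row in reader]


train_view = read_labelled_files(CSV_TEXT)
true_view = read_labelled_files(CSV_TEXT, true_label=True)
# Rows whose training label differs from the true label are the poisoned ones.
poisoned = [f for (f, a), (_, b) in zip(train_view, true_view) if a != b]
print(poisoned)  # ['img/0002.png']
```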

get_data_description()

set_data_description()

class trojai.modelgen.datasets.CSVTextDataset(path_to_data: str, csv_filename: str, true_label: bool = False, text_field: torchtext.data.Field = None, text_field_kwargs: dict = None, label_field: torchtext.data.LabelField = None, label_field_kwargs: dict = None, shuffle: bool = False, random_state=None, **kwargs)

Bases: torchtext.data.Dataset, trojai.modelgen.datasets.DatasetInterface

Defines a text dataset that is represented by a CSV file with columns “file”, “train_label”, and optionally “true_label”. The file column should contain the path to the file that contains the actual data, and “train_label” refers to the label with which the data should be trained. “true_label” refers to the actual label of the data point, and can differ from train_label if the dataset is poisoned. A CSVTextDataset can support text data, and differs from the CSVDataset because it loads all the text data into memory and builds a vocabulary from it.

build_vocab(embedding_vectors_cfg, max_vocab_size, use_vocab=True)

get_data_description()

set_data_description()

static sort_key(ex)

class trojai.modelgen.datasets.DatasetInterface(path_to_data: str, *args, **kwargs)

Bases: torch.utils.data.Dataset

abstract get_data_description()

abstract set_data_description()

trojai.modelgen.datasets.csv_dataset_from_df(path_to_data, data_df, true_label=False, shuffle=False, random_state: Union[int, numpy.random.mtrand.RandomState] = None, data_loader: Union[str, Callable] = 'default_image_loader', data_transform=<function identity_transform>, label_transform=<function identity_transform>)

Initializes a CSVDataset object from a DataFrame rather than a filepath.

Parameters

path_to_data – root folder where all the data is located
data_df – the dataframe in which the data lives
true_label – (bool) if True, then use the column “true_label” as the label associated with each datapoint. If False (default), use the column “train_label” as the label associated with each datapoint
shuffle – if True, the dataset is shuffled before loading into the model
random_state – if specified, seeds the random sampler when shuffling the data
data_loader – either a string value (currently only default_image_loader is supported), or a callable function which takes a string input of the file path and returns the data
data_transform – a callable function which is applied to every data point before it is fed into the model. By default, this is an identity operation
label_transform – a callable function which is applied to every label before it is fed into the model. By default, this is an identity operation.

trojai.modelgen.datasets.csv_textdataset_from_df(data_df, true_label: bool = False, text_field: torchtext.data.Field = None, label_field: torchtext.data.LabelField = None, shuffle: bool = False, random_state=None, **kwargs)

Initializes a CSVTextDataset object from a DataFrame rather than a filepath.

Parameters

data_df – the dataframe in which the data lives
true_label – if True, then use the column “true_label” as the label associated with each datapoint
text_field – defines how the text data will be converted to a Tensor. If None, a default will be provided and tokenized with spaCy
label_field – defines how to process the label associated with the text
max_vocab_size – the maximum vocabulary size that will be built
shuffle – if True, the dataset is shuffled before loading into the model
random_state – if specified, seeds the random sampler when shuffling the data
kwargs – any additional keyword arguments, currently unused

trojai.modelgen.datasets.default_image_file_loader(img_loc)

trojai.modelgen.datasets.identity_transform(x)

trojai.modelgen.datasets.logger = <Logger trojai.modelgen.datasets (WARNING)>: Logger for the trojai.modelgen.datasets module. The module defines basic default functions for dataset defaults; unlike lambda functions, these allow Dataset objects to be pickled.
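The reason module-level defaults are preferred over lambdas can be shown with the standard pickle module: pickle stores a function by its qualified name, so a named module-level function round-trips while an anonymous lambda does not. This sketch defines its own identity_transform as an analogue of the trojai function and is meant to be run as a script.

```python
import pickle


def identity_transform(x):
    """Module-level default, analogous to trojai.modelgen.datasets.identity_transform."""
    return x


def can_pickle(obj) -> bool:
    """Return True if `obj` survives pickle.dumps."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(can_pickle(identity_transform))  # True
print(can_pickle(lambda x: x))         # False
```

A Dataset holding a lambda as its data_transform would therefore fail to pickle (for example, when it is sent to DataLoader worker processes that use spawn), whereas one holding identity_transform succeeds.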

trojai.modelgen.default_optimizer module

class trojai.modelgen.default_optimizer.DefaultOptimizer(optimizer_cfg: trojai.modelgen.config.DefaultOptimizerConfig = None)

Bases: trojai.modelgen.optimizer_interface.OptimizerInterface

Defines the default optimizer which trains the models

get_cfg_as_dict() → dict: Returns a dictionary with key/value pairs that describe the parameters used to train the model.

get_device_type() → str

Returns: a string representing the device used to train the model

static load(fname: str) → trojai.modelgen.optimizer_interface.OptimizerInterface: Reconstructs a DefaultOptimizer by loading the configuration used to construct the original DefaultOptimizer, and then creating a new DefaultOptimizer object from the saved configuration

Parameters: fname – the filename of the saved optimizer
Returns: a DefaultOptimizer object

save(fname: str) → None

Saves the configuration object used to construct the DefaultOptimizer. NOTE: because only the DefaultOptimizerConfig object is persisted, rather than the DefaultOptimizer object itself, the state of the DefaultOptimizer is not persisted!

Parameters: fname – the filename to save the DefaultOptimizer’s configuration to.
Returns: None

test(net: torch.nn.Module, clean_data: trojai.modelgen.datasets.CSVDataset, triggered_data: trojai.modelgen.datasets.CSVDataset, clean_test_triggered_labels_data: trojai.modelgen.datasets.CSVDataset, torch_dataloader_kwargs: dict = None) → dict

Tests the trained network

Parameters

net – the trained module to run the test data through
clean_data – the clean Dataset
triggered_data – the triggered Dataset; if None, triggered statistics are not computed
clean_test_triggered_labels_data – triggered part of the training dataset but with correct labels; see DataManager.load_data for more information.
torch_dataloader_kwargs – any keyword arguments to pass directly to PyTorch’s DataLoader

Returns

a dictionary of the statistics on the clean and triggered data (if applicable)

train(net: torch.nn.Module, dataset: trojai.modelgen.datasets.CSVDataset, torch_dataloader_kwargs: dict = None, use_amp: bool = False) -> (torch.nn.Module, typing.Sequence[trojai.modelgen.training_statistics.EpochStatistics], <class 'int'>): Trains the network.

Parameters

net – the network to train
dataset – the dataset to train the network on
torch_dataloader_kwargs – any additional kwargs to pass to PyTorch’s native DataLoader
use_amp – if True, uses automatic mixed precision for FP16 training.

Returns

the trained network, a list of EpochStatistics objects which contain the statistics for training, and the # of epochs on which the net was trained

train_epoch(model: torch.nn.Module, train_loader: torch.utils.data.DataLoader, val_clean_loader: torch.utils.data.DataLoader, val_triggered_loader: torch.utils.data.DataLoader, epoch_num: int, use_amp: bool = False)

Runs one epoch of training on the specified model

Parameters

model – the model to train for one epoch
train_loader – a DataLoader object pointing to the training dataset
val_clean_loader – a DataLoader object pointing to the validation dataset that is clean
val_triggered_loader – a DataLoader object pointing to the validation dataset that is triggered
epoch_num – the epoch number that is being trained
use_amp – if True, uses automatic mixed precision for FP16 training.

Returns

a list of statistics for batches where statistics were computed

trojai.modelgen.default_optimizer.split_val_clean_trig(val_dataset)

Splits the validation dataset into clean and triggered.

Parameters: val_dataset – the validation dataset to split
Returns: A tuple of the clean & triggered validation dataset
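A minimal sketch of the clean/triggered partition, assuming each validation record carries a boolean "triggered" field. How trojai actually marks triggered samples is not documented here, so the record layout and the function name split_val_clean_trig_sketch are hypothetical.

```python
from typing import Dict, List, Tuple


def split_val_clean_trig_sketch(val_records: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Partition validation records into (clean, triggered).

    Assumption: each record has a boolean 'triggered' field.
    """
    clean = [r for r in val_records if not r["triggered"]]
    triggered = [r for r in val_records if r["triggered"]]
    return clean, triggered


val = [
    {"file": "a.png", "triggered": False},
    {"file": "b.png", "triggered": True},
    {"file": "c.png", "triggered": False},
]
clean, trig = split_val_clean_trig_sketch(val)
```

Keeping the two halves separate lets validation report clean accuracy and trigger accuracy independently, matching the separate val_clean_loader and val_triggered_loader arguments of train_epoch.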

trojai.modelgen.default_optimizer.train_val_dataset_split(dataset: torch.utils.data.Dataset, split_amt: float, val_data_transform: Callable, val_label_transform: Callable) -> (torch.utils.data.Dataset, torch.utils.data.Dataset)

Splits a PyTorch dataset (of type torch.utils.data.Dataset) into training and validation datasets.

TODO:

[ ] - specify random seed to torch splitter

Parameters

dataset – the dataset to be split
split_amt – fraction specifying the validation dataset size relative to the whole. 1-split_amt will be the size of the training dataset
val_data_transform – (function: any -> any) how to transform the validation data to fit into the desired model and objective function
val_label_transform – (function: any -> any) how to transform the validation labels

Returns

a tuple of the train and validation datasets
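The split arithmetic can be sketched on indices alone: a fraction split_amt goes to validation and 1-split_amt to training. Seeding the shuffle makes the split reproducible, which is what the TODO above asks for; in PyTorch itself the equivalent is passing a seeded torch.Generator to torch.utils.data.random_split. The helper name and the choice to round the validation size down are assumptions of this sketch.

```python
import random
from typing import List, Tuple


def train_val_index_split(n: int, split_amt: float,
                          seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 with a fixed seed and split them into (train, val).

    The validation share is split_amt (rounded down here); the remaining
    1 - split_amt of the indices go to training.
    """
    if not 0.0 <= split_amt <= 1.0:
        raise ValueError("split_amt must be in [0, 1]")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_val = int(n * split_amt)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = train_val_index_split(1000, split_amt=0.05)
print(len(train_idx), len(val_idx))  # 950 50
```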

trojai.modelgen.model_generator module

class trojai.modelgen.model_generator.ModelGenerator(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]], *args, **kwargs)

Bases: trojai.modelgen.model_generator_interface.ModelGeneratorInterface

Generates models based on requested data and saves each to a file.

run(*args, **kwargs) → None: Trains and saves models as specified.

Returns: None

validate() → None: Validates the input provided when constructing the ModelGenerator

trojai.modelgen.model_generator_interface module

class trojai.modelgen.model_generator_interface.ModelGeneratorInterface(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]])

Bases: abc.ABC

Generates models based on requested data and saves each to a file.

abstract run() → None: Trains and saves models as specified.

Returns: None

trojai.modelgen.model_generator_interface.validate_model_generator_interface_input(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]]) → None: Validates a ModelGeneratorConfig or a sequence of them

Parameters: configs – (ModelGeneratorConfig or sequence) configurations to be used for model generation
Returns: None

trojai.modelgen.optimizer_interface module

class trojai.modelgen.optimizer_interface.OptimizerInterface

Bases: abc.ABC

Object that performs training and testing of TrojAI models.

abstract get_cfg_as_dict() → dict: Returns a dictionary with key/value pairs that describe the parameters used to train the model.

abstract get_device_type() → str: Returns a string representation of the type of device used by the optimizer to train the model.

abstract static load(fname: str): Loads an optimizer from disk and returns it

Parameters: fname – the filename where the optimizer is serialized
Returns: the loaded optimizer

abstract save(fname: str) → None: Saves the optimizer to a file

Parameters: fname – the filename to save the optimizer to

abstract test(model: torch.nn.Module, clean_test_data: torch.utils.data.Dataset, triggered_test_data: torch.utils.data.Dataset, clean_test_triggered_labels_data: torch.utils.data.Dataset, torch_dataloader_kwargs) → dict

Performs whatever tests are desired on the model with clean data and triggered data, and returns a dictionary of results.

Parameters

model – (torch.nn.Module) trained PyTorch model
clean_test_data – (CSVDataset) object containing clean test data
triggered_test_data – (CSVDataset or None) object containing triggered test data, None if triggered data was not provided for testing
clean_test_triggered_labels_data – triggered part of the training dataset but with correct labels; see DataManager.load_data for more information.
torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class

Returns

(dict) Dictionary of test accuracy results. Required key/value pairs are:

clean_accuracy: (float in [0, 1]) classification accuracy on clean data
clean_n_total: (int) number of examples in clean test set

The following keys are optional, but should be used if triggered test data was provided:

triggered_accuracy: (float in [0, 1]) classification accuracy on triggered data
triggered_n_total: (int) number of examples in triggered test set

NOTE: This list may be augmented in the future to allow for additional test data collection.
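The key contract above can be illustrated by assembling the results dictionary from plain prediction and label lists. The key names come from the documentation; the helper functions are hypothetical, and a real implementation would compute predictions by running the model over DataLoader batches.

```python
from typing import Dict, Optional, Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions equal to their labels (0.0 for empty input)."""
    if not labels:
        return 0.0
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def build_test_results(clean_preds: Sequence[int], clean_labels: Sequence[int],
                       trig_preds: Optional[Sequence[int]] = None,
                       trig_labels: Optional[Sequence[int]] = None) -> Dict:
    """Assemble a dict with the keys documented for OptimizerInterface.test()."""
    results = {
        "clean_accuracy": accuracy(clean_preds, clean_labels),
        "clean_n_total": len(clean_labels),
    }
    if trig_labels is not None:  # optional keys, only when triggered data exists
        results["triggered_accuracy"] = accuracy(trig_preds, trig_labels)
        results["triggered_n_total"] = len(trig_labels)
    return results


print(build_test_results([1, 0, 1, 1], [1, 0, 0, 1], [2, 2], [2, 0]))
# {'clean_accuracy': 0.75, 'clean_n_total': 4, 'triggered_accuracy': 0.5, 'triggered_n_total': 2}
```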

abstract train(model: torch.nn.Module, data: torch.utils.data.Dataset, progress_bar_disable: bool, torch_dataloader_kwargs: dict = None) -> (torch.nn.Module, typing.Sequence[trojai.modelgen.training_statistics.EpochStatistics], <class 'int'>): Trains the given model using parameters in self.training_params

Parameters

model – (torch.nn.Module) the untrained PyTorch model
data – (CSVDataset) object containing training data, the first output of DataManager.load_data()
progress_bar_disable – (bool) don’t display the progress bar if True
torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class

Returns

(torch.nn.Module, Sequence[EpochStatistics], int) the trained model, a sequence of EpochStatistics objects (one for each epoch), and the # of epochs with which the model was trained (useful for early stopping).

trojai.modelgen.runner module

class trojai.modelgen.runner.Runner(runner_cfg: trojai.modelgen.config.RunnerConfig, persist_metadata: dict = None)

Bases: object

Fundamental unit of model generation, which trains a model as specified in a RunnerConfig object.

run() → None: Trains a model and saves it and the associated model statistics

trojai.modelgen.runner.add_numerical_extension(path, filename)

trojai.modelgen.runner.try_force_json(x): Tries to make a value JSON serializable
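A minimal sketch of the "try JSON, else fall back" idea behind try_force_json. Falling back to str() is an assumption of this sketch, and force_json_sketch is a hypothetical name; trojai's helper may convert values differently.

```python
import json
from typing import Any


def force_json_sketch(x: Any) -> Any:
    """Return x unchanged if json.dumps accepts it, otherwise its str() form."""
    try:
        json.dumps(x)
        return x
    except (TypeError, ValueError):
        return str(x)


metadata = {"epochs": 10, "lr": 1e-4, "classes": {0, 1}}  # sets are not JSON serializable
serializable = {k: force_json_sketch(v) for k, v in metadata.items()}
print(json.dumps(serializable))  # {"epochs": 10, "lr": 0.0001, "classes": "{0, 1}"}
```

Catching ValueError as well covers circular references, which json.dumps reports with that exception type.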

trojai.modelgen.runner.try_serialize(d, u)

trojai.modelgen.torchtext_optimizer module

class trojai.modelgen.torchtext_optimizer.TorchTextOptimizer(optimizer_cfg: trojai.modelgen.config.TorchTextOptimizerConfig = None)

Bases: trojai.modelgen.optimizer_interface.OptimizerInterface

An optimizer for training and testing LSTM models. Currently in a prototype state.

convert_dataset_to_dataiterator(dataset: trojai.modelgen.datasets.CSVTextDataset, batch_size: int = None) → torchtext.data.iterator.Iterator

get_cfg_as_dict() → dict: Returns a dictionary with key/value pairs that describe the parameters used to train the model.

get_device_type() → str

Returns: a string representing the device used to train the model

static load(fname: str) → trojai.modelgen.optimizer_interface.OptimizerInterface: Reconstructs a TorchTextOptimizer by loading the configuration used to construct the original TorchTextOptimizer, and then creating a new TorchTextOptimizer object from the saved configuration

Parameters: fname – the filename of the saved TorchTextOptimizer
Returns: a TorchTextOptimizer object

save(fname: str) → None

Saves the configuration object used to construct the TorchTextOptimizer. NOTE: because only the TorchTextOptimizerConfig object is persisted, rather than the TorchTextOptimizer object itself, the state of the TorchTextOptimizer does not persist!

Parameters: fname – the filename to save the TorchTextOptimizer’s configuration to.

test(model: torch.nn.Module, clean_data: trojai.modelgen.datasets.CSVTextDataset, triggered_data: trojai.modelgen.datasets.CSVTextDataset, clean_test_triggered_labels_data: trojai.modelgen.datasets.CSVTextDataset, progress_bar_disable: bool = False, torch_dataloader_kwargs: dict = None) → dict

Tests the trained network

Parameters

model – the trained module to run the test data through
clean_data – the clean Dataset
triggered_data – the triggered Dataset; if None, triggered statistics are not computed
clean_test_triggered_labels_data – triggered part of the training dataset but with correct labels; see DataManager.load_data for more information.
progress_bar_disable – if True, disables the progress bar
torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class

Returns

a dictionary of the statistics on the clean and triggered data (if applicable)

train(net: torch.nn.Module, dataset: trojai.modelgen.datasets.CSVTextDataset, progress_bar_disable: bool = False, torch_dataloader_kwargs: dict = None) -> (torch.nn.Module, typing.Sequence[trojai.modelgen.training_statistics.EpochStatistics], <class 'int'>): Trains the network.

Parameters

net – the model to train
dataset – the dataset to train the network on
progress_bar_disable – if True, disables the progress bar
torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class

Returns

the trained network, a list of EpochStatistics objects, and the # of epochs on which the net was trained

train_epoch(model: torch.nn.Module, train_loader: torchtext.data.iterator.Iterator, val_loader: torchtext.data.iterator.Iterator, epoch_num: int, progress_bar_disable: bool = False)

Runs one epoch of training on the specified model

Parameters

model – the model to train for one epoch
train_loader – a torchtext Iterator pointing to the training dataset
val_loader – a torchtext Iterator pointing to the validation dataset
epoch_num – the epoch number that is being trained
progress_bar_disable – if True, disables the progress bar

Returns

a list of statistics for batches where statistics were computed

static train_val_dataset_split(dataset: torchtext.data.Dataset, split_amt: float, val_data_transform: Callable, val_label_transform: Callable) -> (torchtext.data.Dataset, torchtext.data.Dataset)

Splits a torchtext dataset (of type torchtext.data.Dataset) into training and validation datasets. NOTE: although this has the same functionality as default_optimizer.train_val_dataset_split, it works with a torchtext.data.Dataset object rather than a torch.utils.data.Dataset.

TODO:

[ ] - specify random seed to torch splitter

Parameters

dataset – the dataset to be split
split_amt – fraction specifying the validation dataset size relative to the whole. 1-split_amt will be the size of the training dataset
val_data_transform – (function: any -> any) how to transform the validation data to fit into the desired model and objective function
val_label_transform – (function: any -> any) how to transform the validation labels

Returns

a tuple of the train and validation datasets

trojai.modelgen.training_statistics module¶

class trojai.modelgen.training_statistics.BatchStatistics(batch_num: int, batch_train_accuracy: float, batch_train_loss: float)[source]¶

Bases: object

Represents the statistics collected from training a batch NOTE: this is currently unused!

get_batch_num()[source]¶

get_batch_train_acc()[source]¶

get_batch_train_loss()[source]¶

set_batch_train_acc(acc)[source]¶

set_batch_train_loss(loss)[source]¶

class trojai.modelgen.training_statistics.EpochStatistics(epoch_num, training_stats=None, validation_stats=None, batch_training_stats=None)[source]¶

Bases: object

Contains the statistics computed for an epoch

add_batch(batches: Union[trojai.modelgen.training_statistics.BatchStatistics, Sequence[trojai.modelgen.training_statistics.BatchStatistics]])[source]¶

get_batch_stats()[source]¶

get_epoch_num()[source]¶

get_epoch_training_stats()[source]¶

get_epoch_validation_stats()[source]¶

validate()[source]¶

class trojai.modelgen.training_statistics.EpochTrainStatistics(train_acc: float, train_loss: float)[source]¶

Bases: object

Defines the training statistics for one epoch of training

get_train_acc()[source]¶

get_train_loss()[source]¶

validate()[source]¶

class trojai.modelgen.training_statistics.EpochValidationStatistics(val_clean_acc, val_clean_loss, val_triggered_acc, val_triggered_loss)[source]¶

Bases: object

Defines the validation statistics for one epoch of training

get_val_acc()[source]¶

get_val_clean_acc()[source]¶

get_val_clean_loss()[source]¶

get_val_loss()[source]¶

get_val_triggered_acc()[source]¶

get_val_triggered_loss()[source]¶

validate()[source]¶

class trojai.modelgen.training_statistics.TrainingRunStatistics[source]¶

Bases: object

Contains the statistics computed for an entire training run, which is a sequence of epochs.

TODO: [ ] - add a function that returns detailed per-epoch statistics in an easily serializable form

add_best_epoch_val(best_epoch)[source]¶

add_epoch(epoch_stats: Union[trojai.modelgen.training_statistics.EpochStatistics, Sequence[trojai.modelgen.training_statistics.EpochStatistics]])[source]¶

add_num_epochs_trained(num_epochs)[source]¶

autopopulate_final_summary_stats()[source]¶

Uses the information from the final epoch’s final batch to auto-populate the following statistics:

final_train_acc
final_train_loss
final_val_acc
final_val_loss

get_epochs_stats()[source]¶

get_summary()[source]¶: Returns a dictionary of the summary statistics from the training run

save_detailed_stats_to_disk(fname: str) → None[source]¶

Saves all batch statistics for every epoch as a CSV file

Parameters: fname – filename to save the detailed information to
Returns: None

save_summary_to_json(json_fname: str) → None[source]¶: Saves the training summary to a JSON file
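The two persistence methods above write a CSV file of detailed statistics and a JSON file of the summary. The sketch below shows both with only the standard library. The function names and the column names used in any example rows (epoch_num, batch_num, train_acc, train_loss) are hypothetical, and the exact layout trojai writes is not claimed here.

```python
import csv
import json
from typing import Dict, List


def save_detailed_stats_csv(rows: List[Dict[str, float]], fname: str) -> None:
    """Write one CSV row per statistics record; column names come from the first row."""
    if not rows:
        raise ValueError("no statistics to save")
    with open(fname, "w", newline="") as f:  # newline="" is required by the csv module
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def save_summary_json(summary: Dict[str, float], json_fname: str) -> None:
    """Write a summary dictionary (such as one returned by get_summary) as JSON."""
    with open(json_fname, "w") as f:
        json.dump(summary, f, indent=2)
```

The CSV file suits per-batch data that may be loaded into a spreadsheet or dataframe. JSON preserves the summary's key/value structure and numeric types when read back.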

set_final_clean_data_n_total(n)[source]¶

set_final_clean_data_test_acc(acc)[source]¶

set_final_clean_data_triggered_label_n(n)[source]¶

set_final_clean_data_triggered_label_test_acc(acc)[source]¶

set_final_train_acc(acc)[source]¶

set_final_train_loss(loss)[source]¶

set_final_triggered_data_n_total(n)[source]¶

set_final_triggered_data_test_acc(acc)[source]¶

set_final_val_clean_acc(acc)[source]¶

set_final_val_clean_loss(loss)[source]¶

set_final_val_combined_acc(acc)[source]¶

set_final_val_combined_loss(loss)[source]¶

set_final_val_triggered_acc(acc)[source]¶

set_final_val_triggered_loss(loss)[source]¶

trojai.modelgen.training_statistics.logger = <Logger trojai.modelgen.training_statistics (WARNING)>¶: Module-level logger for trojai.modelgen.training_statistics, the module that contains the classes needed to collect statistics on the model during training

trojai.modelgen.uge_model_generator module¶

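trojai.modelgen.uge_model_generator.ALL_EXEC_PERMISSIONS = 365¶: File mode 365, which is octal 0o555 (read and execute permission for owner, group, and others). The uge_model_generator module contains all the functionality needed to train models on a Univa Grid Engine (UGE) HPC cluster.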
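Decimal 365 is the same value as octal 0o555, so a mode written either way grants read and execute, but not write, permission to everyone. The minimal sketch below shows how such a mode is typically applied to a generated job script. The helper name is hypothetical, and it is not claimed that UGEModelGenerator uses this exact code.

```python
import os

ALL_EXEC_PERMISSIONS = 0o555  # == 365 in decimal: r-xr-xr-x


def write_job_script(path: str, body: str) -> None:
    """Hypothetical helper: write a bash script, then make it read+execute for all."""
    with open(path, "w") as f:
        f.write("#!/bin/bash\n" + body + "\n")
    # os.chmod sets the mode exactly; unlike file creation, it is not affected by umask.
    os.chmod(path, ALL_EXEC_PERMISSIONS)
```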

class trojai.modelgen.uge_model_generator.UGEModelGenerator(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]], uge_config: trojai.modelgen.config.UGEConfig, working_directory: str = '/home/docs/uge_model_generator', validate_uge_dirs: bool = True)[source]¶

Bases: trojai.modelgen.model_generator_interface.ModelGeneratorInterface

Generates models by running training jobs on a Univa Grid Engine (UGE) cluster

expand_modelgen_configs_to_process() → Sequence[trojai.modelgen.config.ModelGeneratorConfig][source]¶

Converts a sequence of ModelGeneratorConfig objects into another sequence of ModelGeneratorConfig objects such that each element in the sequence only creates one model. For example:

Input: cfgs = [cfg1->num_models=1, cfg2->num_models=2], len(cfgs) = 2
Output: cfgs = [cfg1->num_models=1, cfg2->num_models=1, cfg2->num_models=1], len(cfgs) = 3

NOTE: This leads to multiple configs pointing to the same data on disk. It is not known whether this is a problem for PyTorch; investigate it if unexpected results arise.

Returns: the expanded sequence of configurations, each of which creates exactly one model
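The expansion described above can be sketched with shallow copies. Each copy has num_models set to 1 but still references the same underlying objects, which is the shared-data situation the NOTE mentions. Cfg is a hypothetical stand-in that keeps only the fields this sketch needs; it is not ModelGeneratorConfig.

```python
import copy
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cfg:
    """Hypothetical stand-in for ModelGeneratorConfig; only the fields used here."""
    name: str
    num_models: int


def expand_configs(cfgs: Sequence[Cfg]) -> List[Cfg]:
    """Return one shallow copy per model to train, each with num_models == 1."""
    expanded = []
    for cfg in cfgs:
        for _ in range(cfg.num_models):
            single = copy.copy(cfg)  # shallow: copies share referenced objects (e.g. data)
            single.num_models = 1
            expanded.append(single)
    return expanded
```

With the input from the example above, expand_configs([Cfg("cfg1", 1), Cfg("cfg2", 2)]) yields three configs named cfg1, cfg2, cfg2, and the original cfg2 object keeps num_models=2.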

get_queue_numjobs_assignment() → Sequence[source]¶

Determines the number of jobs to give to each queue based on the UGEConfig.

Returns: a list of tuples, each containing a queue at index 0 and the number of jobs assigned to that queue at index 1
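The allocation rule used by the UGEConfig is not described here. As an illustration only, the sketch assumes each queue receives a fixed fraction of the jobs. The queue names, fractions, and function name are hypothetical. It uses the largest-remainder method, which produces integer counts that always sum to the total, and returns them in the (queue, number of jobs) tuple format documented above.

```python
import math
from typing import List, Sequence, Tuple


def assign_jobs_to_queues(queues: Sequence[str], fractions: Sequence[float],
                          num_jobs: int) -> List[Tuple[str, int]]:
    """Split num_jobs across queues in proportion to `fractions`.

    Uses the largest-remainder method; returns (queue, count) tuples whose
    counts sum to num_jobs.
    """
    if len(queues) != len(fractions) or not math.isclose(sum(fractions), 1.0):
        raise ValueError("need one fraction per queue, summing to 1")
    raw = [f * num_jobs for f in fractions]
    counts = [math.floor(r) for r in raw]
    leftover = num_jobs - sum(counts)
    # Give the remaining jobs to the queues with the largest fractional parts
    # (ties keep the original queue order because Python's sort is stable).
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return list(zip(queues, counts))
```

For example, five jobs split evenly across two queues gives three to the first queue and two to the second.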

run(mock=False) → None[source]¶: Runs the actual UGE job. :param mock: if True, generates all the necessary scripts but does not execute the UGE command :return: None

validate() → None[source]¶: Validate the input configuration

trojai.modelgen.utils module¶

trojai.modelgen.utils.clamp(X, l, u, cuda=True)[source]¶: Clamps a tensor to lower bound l and upper bound u. :param X: the tensor to clamp. :param l: lower bound for the clamp. :param u: upper bound for the clamp. :param cuda: whether the tensor should be on the GPU.

trojai.modelgen.utils.get_uniform_delta(shape, eps, requires_grad=True)[source]¶: Generates a torch tensor of the given shape whose values are drawn uniformly from [-eps, eps]. :param shape: the shape of the tensor to create. :param eps: the half-width of the sampling interval [-eps, eps]. :param requires_grad: whether the tensor requires a gradient.
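In adversarial training with a random start (for example, PGD-style attacks), helpers like these two are commonly combined. A perturbation is drawn uniformly from [-eps, eps], added to the input, and the result is clamped back to the valid input range. That trojai uses these helpers in this way is an assumption. The dependency-free sketch below works on Python lists instead of torch tensors, does not reproduce the cuda or requires_grad arguments, and uses hypothetical function names. It illustrates the semantics only.

```python
import random
from typing import List, Sequence


def uniform_delta(n: int, eps: float, rng: random.Random) -> List[float]:
    """Draw n values independently and uniformly from [-eps, eps]."""
    return [rng.uniform(-eps, eps) for _ in range(n)]


def clamp(xs: Sequence[float], lower: float, upper: float) -> List[float]:
    """Elementwise clamp: max(min(x, upper), lower)."""
    return [max(min(x, upper), lower) for x in xs]


# Random start for an eps-bounded perturbation of inputs that live in [0, 1].
rng = random.Random(0)
x = [0.0, 0.5, 1.0]
delta = uniform_delta(len(x), 0.1, rng)
x_perturbed = clamp([xi + di for xi, di in zip(x, delta)], 0.0, 1.0)
```

Because each original value already lies in [0, 1], clamping can only move a perturbed value back toward it, so every element of x_perturbed stays within eps of the corresponding element of x.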

trojai.modelgen.utils.make_trojai_model_dict(model)[source]¶

Creates a TrojAI-approved dictionary specification of a PyTorch model for saving to a file. For example, for a trained model named model:

    save_dict = make_trojai_model_dict(model)
    torch.save(save_dict, filename)

Parameters: model – (torch.nn.Module) The desired model to be saved.
Returns: (dict) dictionary containing TrojAI approved information about the model, which can also be used for later loading the model.

trojai.modelgen.utils.resave_trojai_model_as_dict(file, new_loc=None)[source]¶

Loads a fully serialized PyTorch model (i.e., one where the whole model was saved instead of a specification) and saves it as a TrojAI-style dictionary specification.

Parameters

file – (str) Location of the file to re-save
new_loc – (str) Where to save the file if replacing the original is not desired


Module contents¶