trojai.modelgen package

Subpackages

Submodules

trojai.modelgen.architecture_factory module

class trojai.modelgen.architecture_factory.ArchitectureFactory[source]

Bases: abc.ABC

Factory object that returns architectures (untrained models) for training.

abstract new_architecture(**kwargs) → torch.nn.Module[source]

Returns a new architecture (untrained model).

Returns

an untrained torch.nn.Module
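As a rough illustration of the factory pattern described here, the sketch below uses a local stand-in for the abstract base class and a hypothetical TinyModel placeholder instead of a real torch.nn.Module, so it runs with the standard library alone. The names TinyModel, TinyModelFactory, input_size, and num_classes are assumptions made for the example and are not part of trojai.

```python
from abc import ABC, abstractmethod


class ArchitectureFactory(ABC):
    """Local stand-in mirroring the documented interface (not the trojai class)."""

    @abstractmethod
    def new_architecture(self, **kwargs):
        """Return a new, untrained model."""


class TinyModel:
    """Hypothetical placeholder for a torch.nn.Module subclass."""

    def __init__(self, input_size, num_classes):
        self.input_size = input_size
        self.num_classes = num_classes


class TinyModelFactory(ArchitectureFactory):
    def new_architecture(self, input_size=28 * 28, num_classes=10, **kwargs):
        # Build a fresh instance on every call so models never share state.
        return TinyModel(input_size, num_classes)


factory = TinyModelFactory()
model_a = factory.new_architecture(num_classes=5)
model_b = factory.new_architecture(num_classes=5)
```

Returning a new object on each call matters when several models are generated from one configuration: each model should start from its own untrained state.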

trojai.modelgen.config module

class trojai.modelgen.config.ConfigInterface[source]

Bases: abc.ABC

Defines the interface for all configuration objects

class trojai.modelgen.config.DefaultOptimizerConfig(training_cfg: trojai.modelgen.config.TrainingConfig = None, reporting_cfg: trojai.modelgen.config.ReportingConfig = None)[source]

Bases: trojai.modelgen.config.OptimizerConfigInterface

Defines the configuration needed to setup the DefaultOptimizer

get_device_type()[source]

Returns the device associated with this optimizer configuration. Needed when saving and loading for UGE.

Returns

(str) the device type represented as a string

static load(fname)[source]

Loads a configuration from disk.

Parameters

fname – the filename where the config is stored

Returns

the loaded configuration

save(fname)[source]

Saves the optimizer configuration to a file.

Parameters

fname – the filename to save the config to

Returns

None

class trojai.modelgen.config.DefaultSoftToHardFn[source]

Bases: object

The default conversion from soft-decision outputs to hard-decision
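The reference does not say how the default conversion is computed. A common choice, assumed here purely for illustration, is to take the index of the largest score in each row of soft-decision outputs. The function name soft_to_hard and the list-of-lists input are hypothetical; the real class presumably operates on tensors.

```python
def soft_to_hard(scores):
    """Map each row of per-class scores to the index of its largest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Two samples with three class scores each (probabilities or raw logits).
soft_outputs = [[0.1, 0.7, 0.2], [2.5, -1.0, 0.3]]
hard_labels = soft_to_hard(soft_outputs)
```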

class trojai.modelgen.config.EarlyStoppingConfig(num_epochs: int = 5, val_loss_eps: float = 0.001)[source]

Bases: trojai.modelgen.config.ConfigInterface

Defines configuration related to early stopping.
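The exact stopping rule is not spelled out in this reference. One plausible reading of num_epochs and val_loss_eps, sketched below as an assumption, is to stop once the validation loss has failed to improve on the earlier best by more than val_loss_eps for num_epochs consecutive epochs. The function should_stop_early is hypothetical and is not the trojai implementation.

```python
def should_stop_early(val_losses, num_epochs=5, val_loss_eps=0.001):
    """Return True when the last `num_epochs` losses do not beat the earlier best by more than eps."""
    if len(val_losses) <= num_epochs:
        # Not enough history to compare a recent window against earlier epochs.
        return False
    best_before = min(val_losses[:-num_epochs])
    recent_best = min(val_losses[-num_epochs:])
    return recent_best > best_before - val_loss_eps


plateau = [1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]
improving = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
stop_on_plateau = should_stop_early(plateau)
stop_on_improving = should_stop_early(improving)
```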

validate()[source]
class trojai.modelgen.config.ModelGeneratorConfig(arch_factory: trojai.modelgen.architecture_factory.ArchitectureFactory, data: trojai.modelgen.data_manager.DataManager, model_save_dir: str, stats_save_dir: str, num_models: int, arch_factory_kwargs: dict = None, arch_factory_kwargs_generator: Callable = None, optimizer: Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig, Sequence[Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig]]] = None, parallel=False, amp=False, experiment_cfg: dict = None, run_ids: Union[Any, Sequence[Any]] = None, filenames: Union[str, Sequence[str]] = None, save_with_hash: bool = False)[source]

Bases: trojai.modelgen.config.ConfigInterface

Object used to configure the model generator

static load(fname: str)[source]

Loads a saved modelgen_cfg object from data that was saved using the .save() function.

Parameters

fname – the filename where the modelgen_cfg object is saved

Returns

a ModelGeneratorConfig object

save(fname: str)[source]

Saves the ModelGeneratorConfig object in two parts: every object within the config except the optimizer is saved in the .klass.save file, and the optimizer is saved separately.

Parameters

fname – the filename to save the configuration to

Returns

None

validate() → None[source]

Validate the input arguments to construct the object.

Returns

None

class trojai.modelgen.config.OptimizerConfigInterface[source]

Bases: trojai.modelgen.config.ConfigInterface

abstract get_device_type()[source]
abstract static load(fname)[source]
save(fname)[source]
class trojai.modelgen.config.ReportingConfig(num_batches_per_logmsg: int = 100, disable_progress_bar: bool = False, num_epochs_per_metric: int = 1, num_batches_per_metrics: int = 50, tensorboard_output_dir: str = None, experiment_name: str = 'experiment')[source]

Bases: trojai.modelgen.config.ConfigInterface

Defines all options to setup how data is reported back to the user while models are being trained

validate()[source]
class trojai.modelgen.config.RunnerConfig(arch_factory: trojai.modelgen.architecture_factory.ArchitectureFactory, data: trojai.modelgen.data_manager.DataManager, arch_factory_kwargs: dict = None, arch_factory_kwargs_generator: Callable = None, optimizer: Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig, Sequence[Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig]]] = None, parallel: bool = False, amp: bool = False, model_save_dir: str = '/tmp/models', stats_save_dir: str = '/tmp/model_stats', model_save_format: str = 'pt', run_id: Any = None, filename: str = None, save_with_hash: bool = False)[source]

Bases: trojai.modelgen.config.ConfigInterface

Container for all parameters needed to use the Runner to train a model.

static setup_optimizer_generator(optimizer, data)[source]

Converts an optimizer specification to a generator, to be compatible with sequential training.

Parameters
  • optimizer – the optimizer to configure into a generator

  • num_datasets – the number of datasets for which optimizers need to be created

Returns

A generator that returns optimizers for every dataset to be trained
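The idea of turning one optimizer, or a sequence of optimizers, into a generator that yields one optimizer per dataset can be sketched as follows. This is a minimal illustration under stated assumptions: the function optimizer_generator is hypothetical, it takes a dataset count directly, and whether the real method reuses or copies a single optimizer is not documented here.

```python
from collections.abc import Sequence


def optimizer_generator(optimizer, num_datasets):
    """Yield one optimizer per dataset from a single spec or a sequence of specs."""
    if isinstance(optimizer, Sequence) and not isinstance(optimizer, str):
        # A sequence must line up one-to-one with the datasets.
        if len(optimizer) != num_datasets:
            raise ValueError("need exactly one optimizer per dataset")
        yield from optimizer
    else:
        # A single spec is reused for every dataset (an assumption of this sketch).
        for _ in range(num_datasets):
            yield optimizer


shared = list(optimizer_generator("opt", 3))
per_dataset = list(optimizer_generator(["opt_a", "opt_b"], 2))
```

Because this is a generator function, the length check only runs when the first item is requested, not when the generator is created.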

validate() → None[source]

Validate the RunnerConfig object.

Returns

None

static validate_optimizer(optimizer, data)[source]

Validates an optimizer configuration.

Parameters
  • optimizer – the optimizer/optimizer configuration to be validated

  • data – the data to be optimized

class trojai.modelgen.config.TorchTextOptimizerConfig(training_cfg: trojai.modelgen.config.TrainingConfig = None, reporting_cfg: trojai.modelgen.config.ReportingConfig = None, copy_pretrained_embeddings: bool = False)[source]

Bases: trojai.modelgen.config.OptimizerConfigInterface

Defines the configuration needed to setup the TorchTextOptimizer

get_device_type()[source]

Returns the device associated with this optimizer configuration. Needed when saving and loading for UGE.

Returns

(str) the device type represented as a string

static load(fname)[source]

Loads a configuration from disk.

Parameters

fname – the filename where the config is stored

Returns

the loaded configuration

save(fname)[source]

Saves the optimizer configuration to a file.

Parameters

fname – the filename to save the config to

Returns

None

validate()[source]
class trojai.modelgen.config.TrainingConfig(device: Union[str, torch.device] = 'cpu', epochs: int = 10, batch_size: int = 32, lr: float = 0.0001, optim: Union[str, trojai.modelgen.optimizer_interface.OptimizerInterface] = 'adam', optim_kwargs: dict = None, objective: Union[str, Callable] = 'cross_entropy_loss', objective_kwargs: dict = None, save_best_model: bool = False, train_val_split: float = 0.05, val_data_transform: Callable[[Any], Any] = None, val_label_transform: Callable[[int], int] = None, val_dataloader_kwargs: dict = None, early_stopping: trojai.modelgen.config.EarlyStoppingConfig = None, soft_to_hard_fn: Callable = None, soft_to_hard_fn_kwargs: dict = None, lr_scheduler: Any = None, lr_scheduler_init_kwargs: dict = None, lr_scheduler_call_arg: Any = None, clip_grad: bool = False, clip_type: str = 'norm', clip_val: float = 1.0, clip_kwargs: dict = None, adv_training_eps: float = None, adv_training_iterations: int = None, adv_training_ratio: float = None)[source]

Bases: trojai.modelgen.config.ConfigInterface

Defines all required items to setup training with an optimizer

get_cfg_as_dict()[source]

Returns a dictionary representation of the configuration.

Returns

(dict) a dictionary

validate() → None[source]

Validate the object configuration.

Returns

None

class trojai.modelgen.config.UGEConfig(queues: Union[trojai.modelgen.config.UGEQueueConfig, Sequence[trojai.modelgen.config.UGEQueueConfig]], queue_distribution: Sequence[float] = None, multi_model_same_gpu: bool = False)[source]

Bases: object

Defines a configuration for the UGE

validate()[source]

Validate the UGEConfig object

class trojai.modelgen.config.UGEQueueConfig(queue_name: str, gpu_enabled: bool, sync_mode: bool = False)[source]

Bases: object

Defines the configuration for a Queue w.r.t. UGE in TrojAI

validate() → None[source]

Validate the UGEQueueConfig object

trojai.modelgen.config.identity_function(x)[source]
trojai.modelgen.config.logger = <Logger trojai.modelgen.config (WARNING)>

Defines all configurations pertinent to model generation.

trojai.modelgen.config.modelgen_cfg_to_runner_cfg(modelgen_cfg: trojai.modelgen.config.ModelGeneratorConfig, run_id=None, filename=None) → trojai.modelgen.config.RunnerConfig[source]

Convenience function which creates a RunnerConfig object from a ModelGeneratorConfig object.

Parameters
  • modelgen_cfg – the ModelGeneratorConfig to convert

  • run_id – run_id to be associated with the RunnerConfig

  • filename – filename to be associated with the RunnerConfig

Returns

the created RunnerConfig object

trojai.modelgen.constants module

Defines valid devices on which models can be trained

trojai.modelgen.constants.VALID_DEVICES = ['cpu', 'cuda']

Defines valid loss functions which can be specified when configuring an optimizer implementing the OptimizerInterface

trojai.modelgen.constants.VALID_LOSS_FUNCTIONS = ['cross_entropy_loss', 'BCEWithLogitsLoss']

Defines valid optimization algorithms which can be specified when configuring an optimizer implementing the OptimizerInterface

trojai.modelgen.constants.VALID_OPTIMIZERS = ['adam', 'sgd', 'adamw']
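A configuration's string-valued choices can be checked against these lists before training starts. The sketch below copies the three lists from this page and adds a hypothetical helper, find_invalid_choices; it is not the validation code that trojai itself runs. Non-string values, such as a custom optimizer object or a callable objective, are skipped rather than rejected.

```python
VALID_DEVICES = ['cpu', 'cuda']
VALID_LOSS_FUNCTIONS = ['cross_entropy_loss', 'BCEWithLogitsLoss']
VALID_OPTIMIZERS = ['adam', 'sgd', 'adamw']


def find_invalid_choices(device, optim, objective):
    """Return the names of string-valued settings that are not in the valid lists."""
    checks = {
        'device': (device, VALID_DEVICES),
        'optim': (optim, VALID_OPTIMIZERS),
        'objective': (objective, VALID_LOSS_FUNCTIONS),
    }
    # Only strings are compared; other objects pass through unchecked.
    return [name for name, (value, allowed) in checks.items()
            if isinstance(value, str) and value not in allowed]


ok = find_invalid_choices('cpu', 'adam', 'cross_entropy_loss')
bad = find_invalid_choices('tpu', 'adam', 'mse')
```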

Defines the valid types of data that the modelgen pipeline can handle

trojai.modelgen.data_configuration module

class trojai.modelgen.data_configuration.DataConfiguration[source]

Bases: object

class trojai.modelgen.data_configuration.ImageDataConfiguration[source]

Bases: trojai.modelgen.data_configuration.DataConfiguration

class trojai.modelgen.data_configuration.TextDataConfiguration(max_vocab_size: int = 25000, embedding_dim: int = 100, embedding_type: str = 'glove', num_tokens_embedding_train: str = '6B', text_field_kwargs: dict = None, label_field_kwargs: dict = None)[source]

Bases: trojai.modelgen.data_configuration.DataConfiguration

set_embedding_vectors_cfg()[source]
validate()[source]
trojai.modelgen.data_configuration.logger = <Logger trojai.modelgen.data_configuration (WARNING)>

Configurations for various types of data

trojai.modelgen.data_descriptions module

File describes data description classes, which contain specific information that may be used in order to instantiate an architecture

class trojai.modelgen.data_descriptions.CSVImageDatasetDesc(num_samples, shuffled, num_classes)[source]

Bases: trojai.modelgen.data_descriptions.DataDescription

Information potentially relevant to instantiating models to process image data

class trojai.modelgen.data_descriptions.CSVTextDatasetDesc(vocab_size, unk_idx, pad_idx)[source]

Bases: trojai.modelgen.data_descriptions.DataDescription

Information potentially relevant to instantiating models to process text data

class trojai.modelgen.data_descriptions.DataDescription[source]

Bases: object

Generic Data Description class from which all specific data-type descriptors inherit

trojai.modelgen.data_manager module

class trojai.modelgen.data_manager.DataManager(experiment_path: str, train_file: Union[str, Sequence[str]], clean_test_file: str, triggered_test_file: str = None, data_type: str = 'image', train_data_transform: Callable[[Any], Any] = <function DataManager.<lambda>>, train_label_transform: Callable[[int], int] = <function DataManager.<lambda>>, test_data_transform: Callable[[Any], Any] = <function DataManager.<lambda>>, test_label_transform: Callable[[int], int] = <function DataManager.<lambda>>, file_loader: Union[Callable[[str], Any], str] = 'default_image_loader', shuffle_train=True, shuffle_clean_test=False, shuffle_triggered_test=False, data_configuration: trojai.modelgen.data_configuration.DataConfiguration = None, custom_datasets: dict = None, train_dataloader_kwargs: dict = None, test_dataloader_kwargs: dict = None)[source]

Bases: object

Manages data from an experiment from trojai.datagen.

load_data()[source]

Load experiment data as given from initialization.

Returns

Objects containing the training data, the test data, and the triggered data if it was provided.

TODO:

[ ] - extend the text data-type to have more input arguments, for example the tokenizer and FIELD options

[ ] - need to support sequential training for text datasets

validate() → None[source]

Validate the construction of the TrojaiDataManager object.

Returns

None

TODO:

[ ] - think about whether the contents of the files passed into the DataManager should be validated, in addition to simply checking for existence, which is what is done now

trojai.modelgen.datasets module

class trojai.modelgen.datasets.CSVDataset(path_to_data: str, csv_filename: str, true_label=False, path_to_csv=None, shuffle=False, random_state: Union[int, numpy.random.mtrand.RandomState] = None, data_loader: Union[str, Callable] = 'default_image_loader', data_transform=<function identity_transform>, label_transform=<function identity_transform>)[source]

Bases: trojai.modelgen.datasets.DatasetInterface

Defines a dataset that is represented by a CSV file with columns “file”, “train_label”, and optionally “true_label”. The file column should contain the path to the file that contains the actual data, and “train_label” refers to the label with which the data should be trained. “true_label” refers to the actual label of the data point, and can differ from train_label if the dataset is poisoned. A CSVDataset can support any underlying data that can be loaded on the fly and fed into the model (for example: image data)
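The column layout described above can be illustrated with the standard library alone. The sketch below is not the CSVDataset implementation: read_labels is a hypothetical helper, the file paths and labels are made up, and integer labels are an assumption. It only shows how the true_label flag selects which label column is paired with each file.

```python
import csv
import io

# A tiny in-memory CSV; the first sample is poisoned (train_label != true_label).
CSV_TEXT = """file,train_label,true_label
img/0.png,1,0
img/1.png,0,0
"""


def read_labels(csv_file, true_label=False):
    """Pair each file path with train_label, or with true_label when requested."""
    label_column = "true_label" if true_label else "train_label"
    return [(row["file"], int(row[label_column]))
            for row in csv.DictReader(csv_file)]


train_view = read_labels(io.StringIO(CSV_TEXT))
true_view = read_labels(io.StringIO(CSV_TEXT), true_label=True)
```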

get_data_description()[source]
set_data_description()[source]
class trojai.modelgen.datasets.CSVTextDataset(path_to_data: str, csv_filename: str, true_label: bool = False, text_field: torchtext.data.Field = None, text_field_kwargs: dict = None, label_field: torchtext.data.LabelField = None, label_field_kwargs: dict = None, shuffle: bool = False, random_state=None, **kwargs)[source]

Bases: torchtext.data.Dataset, trojai.modelgen.datasets.DatasetInterface

Defines a text dataset that is represented by a CSV file with columns “file”, “train_label”, and optionally “true_label”. The file column should contain the path to the file that contains the actual data, and “train_label” refers to the label with which the data should be trained. “true_label” refers to the actual label of the data point, and can differ from train_label if the dataset is poisoned. A CSVTextDataset can support text data, and differs from the CSVDataset because it loads all the text data into memory and builds a vocabulary from it.

build_vocab(embedding_vectors_cfg, max_vocab_size, use_vocab=True)[source]
get_data_description()[source]
set_data_description()[source]
static sort_key(ex)[source]
class trojai.modelgen.datasets.DatasetInterface(path_to_data: str, *args, **kwargs)[source]

Bases: torch.utils.data.Dataset

abstract get_data_description()[source]
abstract set_data_description()[source]
trojai.modelgen.datasets.csv_dataset_from_df(path_to_data, data_df, true_label=False, shuffle=False, random_state: Union[int, numpy.random.mtrand.RandomState] = None, data_loader: Union[str, Callable] = 'default_image_loader', data_transform=<function identity_transform>, label_transform=<function identity_transform>)[source]

Initializes a CSVDataset object from a DataFrame rather than a filepath.

Parameters
  • path_to_data – root folder where all the data is located

  • data_df – the dataframe in which the data lives

  • true_label – (bool) if True, then use the column “true_label” as the label associated with each datapoint. If False (default), use the column “train_label” as the label associated with each datapoint

  • shuffle – if True, the dataset is shuffled before loading into the model

  • random_state – if specified, seeds the random sampler when shuffling the data

  • data_loader – either a string value (currently only supports default_image_loader), or a callable function which takes a string input of the file path and returns the data

  • data_transform – a callable function which is applied to every data point before it is fed into the model. By default, this is an identity operation

  • label_transform – a callable function which is applied to every label before it is fed into the model. By default, this is an identity operation.

trojai.modelgen.datasets.csv_textdataset_from_df(data_df, true_label: bool = False, text_field: torchtext.data.Field = None, label_field: torchtext.data.LabelField = None, shuffle: bool = False, random_state=None, **kwargs)[source]

Initializes a CSVTextDataset object from a DataFrame rather than a filepath.

Parameters
  • data_df – the dataframe in which the data lives

  • true_label – if True, then use the column “true_label” as the label associated with each datapoint

  • text_field – defines how the text data will be converted to a Tensor. If None, a default will be provided and tokenized with spacy

  • label_field – defines how to process the label associated with the text

  • max_vocab_size – the maximum vocabulary size that will be built

  • shuffle – if True, the dataset is shuffled before loading into the model

  • random_state – if specified, seeds the random sampler when shuffling the data

  • kwargs – any additional keyword arguments, currently unused

trojai.modelgen.datasets.default_image_file_loader(img_loc)[source]
trojai.modelgen.datasets.identity_transform(x)[source]
trojai.modelgen.datasets.logger = <Logger trojai.modelgen.datasets (WARNING)>

Defines basic default functions for datasets. Using named functions rather than lambda functions allows Dataset objects to be pickled.

trojai.modelgen.default_optimizer module

class trojai.modelgen.default_optimizer.DefaultOptimizer(optimizer_cfg: trojai.modelgen.config.DefaultOptimizerConfig = None)[source]

Bases: trojai.modelgen.optimizer_interface.OptimizerInterface

Defines the default optimizer which trains the models

get_cfg_as_dict() → dict[source]

Return a dictionary with key/value pairs that describe the parameters used to train the model.

get_device_type() → str[source]
Returns

a string representing the device used to train the model

static load(fname: str) → trojai.modelgen.optimizer_interface.OptimizerInterface[source]

Reconstructs a DefaultOptimizer by loading the configuration used to construct the original DefaultOptimizer, and then creating a new DefaultOptimizer object from the saved configuration.

Parameters

fname – the filename of the saved optimizer

Returns

a DefaultOptimizer object

save(fname: str) → None[source]

Saves the configuration object used to construct the DefaultOptimizer.

NOTE: because the DefaultOptimizer object itself is not persisted, but rather the DefaultOptimizerConfig object, the state of the object is not persisted!

Parameters

fname – the filename to save the DefaultOptimizer’s configuration.

Returns

None

test(net: torch.nn.Module, clean_data: trojai.modelgen.datasets.CSVDataset, triggered_data: trojai.modelgen.datasets.CSVDataset, clean_test_triggered_labels_data: trojai.modelgen.datasets.CSVDataset, torch_dataloader_kwargs: dict = None) → dict[source]

Test the trained network.

Parameters
  • net – the trained module to run the test data through

  • clean_data – the clean Dataset

  • triggered_data – the triggered Dataset; if None, not computed

  • clean_test_triggered_labels_data – triggered part of the training dataset but with correct labels; see DataManager.load_data for more information.

  • torch_dataloader_kwargs – any keyword arguments to pass directly to PyTorch’s DataLoader

Returns

a dictionary of the statistics on the clean and triggered data (if applicable)

train(net: torch.nn.Module, dataset: trojai.modelgen.datasets.CSVDataset, torch_dataloader_kwargs: dict = None, use_amp: bool = False) -> (torch.nn.Module, typing.Sequence[trojai.modelgen.training_statistics.EpochStatistics], <class 'int'>)[source]

Train the network.

Parameters
  • net – the network to train

  • dataset – the dataset to train the network on

  • torch_dataloader_kwargs – any additional kwargs to pass to PyTorch’s native DataLoader

  • use_amp – if True, uses automatic mixed precision for FP16 training.

Returns

the trained network, a list of EpochStatistics objects which contain the statistics for training, and the # of epochs on which the net was trained

train_epoch(model: torch.nn.Module, train_loader: torch.utils.data.DataLoader, val_clean_loader: torch.utils.data.DataLoader, val_triggered_loader: torch.utils.data.DataLoader, epoch_num: int, use_amp: bool = False)[source]

Runs one epoch of training on the specified model

Parameters
  • model – the model to train for one epoch

  • train_loader – a DataLoader object pointing to the training dataset

  • val_clean_loader – a DataLoader object pointing to the validation dataset that is clean

  • val_triggered_loader – a DataLoader object pointing to the validation dataset that is triggered

  • epoch_num – the epoch number that is being trained

  • use_amp – if True, use automatic mixed precision for FP16 training.

Returns

a list of statistics for batches where statistics were computed

trojai.modelgen.default_optimizer.split_val_clean_trig(val_dataset)[source]

Splits the validation dataset into clean and triggered.

Parameters

val_dataset – the validation dataset to split

Returns

A tuple of the clean & triggered validation dataset

trojai.modelgen.default_optimizer.train_val_dataset_split(dataset: torch.utils.data.Dataset, split_amt: float, val_data_transform: Callable, val_label_transform: Callable) -> (torch.utils.data.Dataset, torch.utils.data.Dataset)[source]

Splits a PyTorch dataset (of type: torch.utils.data.Dataset) into training and validation datasets.

TODO:

[ ] - specify random seed to torch splitter

Parameters
  • dataset – the dataset to be split

  • split_amt – fraction specifying the validation dataset size relative to the whole. 1-split_amt will be the size of the training dataset

  • val_data_transform – (function: any -> any) how to transform the validation data to fit into the desired model and objective function

  • val_label_transform – (function: any -> any) how to transform the validation labels

Returns

a tuple of the train and validation datasets
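The meaning of split_amt can be shown with a plain Python list in place of a torch dataset. This is a sketch under stated assumptions, not the library routine: train_val_split and its seed parameter are hypothetical, the validation transforms are omitted, and truncating the validation size with int() is a choice made for the example.

```python
import random


def train_val_split(items, split_amt, seed=None):
    """Shuffle indices, then reserve a `split_amt` fraction of them for validation."""
    rng = random.Random(seed)
    indices = list(range(len(items)))
    rng.shuffle(indices)
    num_val = int(len(items) * split_amt)
    val_idx, train_idx = indices[:num_val], indices[num_val:]
    return [items[i] for i in train_idx], [items[i] for i in val_idx]


# With 100 samples and split_amt=0.05, 5 go to validation and 95 to training.
train, val = train_val_split(list(range(100)), split_amt=0.05, seed=0)
```

Passing a fixed seed makes the split reproducible between runs.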

trojai.modelgen.model_generator module

class trojai.modelgen.model_generator.ModelGenerator(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]], *args, **kwargs)[source]

Bases: trojai.modelgen.model_generator_interface.ModelGeneratorInterface

Generates models based on requested data and saves each to a file.

run(*args, **kwargs) → None[source]

Train and save models as specified.

Returns

None

validate() → None[source]

Validate the provided input when constructing the ModelGenerator interface

trojai.modelgen.model_generator_interface module

class trojai.modelgen.model_generator_interface.ModelGeneratorInterface(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]])[source]

Bases: abc.ABC

Generates models based on requested data and saves each to a file.

abstract run() → None[source]

Train and save models as specified.

Returns

None

trojai.modelgen.model_generator_interface.validate_model_generator_interface_input(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]]) → None[source]

Validates a ModelGeneratorConfig.

Parameters

configs – (ModelGeneratorConfig or sequence) configurations to be used for model generation

Returns

None

trojai.modelgen.optimizer_interface module

class trojai.modelgen.optimizer_interface.OptimizerInterface[source]

Bases: abc.ABC

Object that performs training and testing of TrojAI models.

abstract get_cfg_as_dict() → dict[source]

Return a dictionary with key/value pairs that describe the parameters used to train the model.

abstract get_device_type() → str[source]

Return a string representation of the type of device used by the optimizer to train the model.

abstract static load(fname: str)[source]

Load an optimizer from disk and return it.

Parameters

fname – the filename where the optimizer is serialized

Returns

The loaded optimizer

abstract save(fname: str) → None[source]

Save the optimizer to a file.

Parameters

fname – the filename to save the optimizer to

abstract test(model: torch.nn.Module, clean_test_data: torch.utils.data.Dataset, triggered_test_data: torch.utils.data.Dataset, clean_test_triggered_labels_data: torch.utils.data.Dataset, torch_dataloader_kwargs) → dict[source]

Perform whatever tests are desired on the model with clean data and triggered data, and return a dictionary of results.

Parameters
  • model – (torch.nn.Module) trained PyTorch model

  • clean_test_data – (CSVDataset) object containing clean test data

  • triggered_test_data – (CSVDataset or None) object containing triggered test data; None if triggered data was not provided for testing

  • clean_test_triggered_labels_data – triggered part of the training dataset but with correct labels; see DataManager.load_data for more information.

  • torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class

Returns

(dict) Dictionary of test accuracy results. Required key, value pairs are:

  • clean_accuracy – (float in [0, 1]) classification accuracy on clean data

  • clean_n_total – (int) number of examples in clean test set

The following keys are optional, but should be used if triggered test data was provided:

  • triggered_accuracy – (float in [0, 1]) classification accuracy on triggered data

  • triggered_n_total – (int) number of examples in triggered test set

NOTE: This list may be augmented in the future to allow for additional test data collection.
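The required and optional keys of the results dictionary can be assembled as in the sketch below. The helpers accuracy and build_test_results are hypothetical, and predictions are plain lists rather than model outputs; only the four key names and their value types come from the description above.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match their labels, as a float in [0, 1]."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def build_test_results(clean_preds, clean_labels, trig_preds=None, trig_labels=None):
    """Build the results dict; triggered keys are added only when triggered data exists."""
    results = {
        "clean_accuracy": accuracy(clean_preds, clean_labels),
        "clean_n_total": len(clean_labels),
    }
    if trig_preds is not None and trig_labels is not None:
        results["triggered_accuracy"] = accuracy(trig_preds, trig_labels)
        results["triggered_n_total"] = len(trig_labels)
    return results


clean_only = build_test_results([0, 1, 1, 0], [0, 1, 0, 0])
with_triggered = build_test_results([0, 1], [0, 1], trig_preds=[1, 1], trig_labels=[1, 0])
```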

abstract train(model: torch.nn.Module, data: torch.utils.data.Dataset, progress_bar_disable: bool, torch_dataloader_kwargs: dict = None) -> (torch.nn.Module, typing.Sequence[trojai.modelgen.training_statistics.EpochStatistics], <class 'int'>)[source]

Train the given model using parameters in self.training_params.

Parameters
  • model – (torch.nn.Module) the untrained PyTorch model

  • data – (CSVDataset) object containing training data, output 0 from TrojaiDataManager.load_data()

  • progress_bar_disable – (bool) don’t display the progress bar if True

  • torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class

Returns

(torch.nn.Module, Sequence[EpochStatistics], int) the trained model, a sequence of EpochStatistics objects (one for each epoch), and the # of epochs with which the model was trained (useful for early stopping).

trojai.modelgen.runner module

class trojai.modelgen.runner.Runner(runner_cfg: trojai.modelgen.config.RunnerConfig, persist_metadata: dict = None)[source]

Bases: object

Fundamental unit of model generation, which trains a model as specified in a RunnerConfig object.

run() → None[source]

Trains a model and saves it and the associated model statistics

trojai.modelgen.runner.add_numerical_extension(path, filename)[source]
trojai.modelgen.runner.try_force_json(x)[source]

Tries to make a value JSON serializable
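How the conversion is attempted is not documented here. One simple approach, assumed for this sketch, is to keep values that json.dumps already accepts and fall back to their string form otherwise. The function force_json is hypothetical and may differ from what try_force_json actually does.

```python
import json


def force_json(value):
    """Return `value` unchanged if it serializes to JSON, else its string form."""
    try:
        json.dumps(value)
        return value
    except (TypeError, ValueError):
        # Fallback chosen for this sketch: represent the value as text.
        return str(value)


kept = force_json({"lr": 0.0001, "epochs": 10})
converted = force_json({1, 2})
```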

trojai.modelgen.runner.try_serialize(d, u)[source]

trojai.modelgen.torchtext_optimizer module

class trojai.modelgen.torchtext_optimizer.TorchTextOptimizer(optimizer_cfg: trojai.modelgen.config.TorchTextOptimizerConfig = None)[source]

Bases: trojai.modelgen.optimizer_interface.OptimizerInterface

An optimizer for training and testing LSTM models. Currently in a prototype state.

convert_dataset_to_dataiterator(dataset: trojai.modelgen.datasets.CSVTextDataset, batch_size: int = None) → torchtext.data.iterator.Iterator[source]
get_cfg_as_dict() → dict[source]

Return a dictionary with key/value pairs that describe the parameters used to train the model.

get_device_type() → str[source]
Returns

a string representing the device used to train the model

static load(fname: str) → trojai.modelgen.optimizer_interface.OptimizerInterface[source]

Reconstructs a TorchTextOptimizer by loading the configuration used to construct the original TorchTextOptimizer, and then creating a new TorchTextOptimizer object from the saved configuration.

Parameters

fname – the filename of the saved TorchTextOptimizer

Returns

a TorchTextOptimizer object

save(fname: str) → None[source]

Saves the configuration object used to construct the TorchTextOptimizer.

NOTE: because the TorchTextOptimizer object itself is not persisted, but rather the TorchTextOptimizerConfig object, the state of the object does not persist!

Parameters

fname – the filename to save the TorchTextOptimizer’s configuration.

test(model: torch.nn.Module, clean_data: trojai.modelgen.datasets.CSVTextDataset, triggered_data: trojai.modelgen.datasets.CSVTextDataset, clean_test_triggered_labels_data: trojai.modelgen.datasets.CSVTextDataset, progress_bar_disable: bool = False, torch_dataloader_kwargs: dict = None) → dict[source]

Test the trained network.

Parameters
  • model – the trained module to run the test data through

  • clean_data – the clean Dataset

  • triggered_data – the triggered Dataset; if None, not computed

  • clean_test_triggered_labels_data – triggered part of the training dataset but with correct labels; see DataManager.load_data for more information.

  • progress_bar_disable – if True, disables the progress bar

  • torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class

Returns

a dictionary of the statistics on the clean and triggered data (if applicable)

train(net: torch.nn.Module, dataset: trojai.modelgen.datasets.CSVTextDataset, progress_bar_disable: bool = False, torch_dataloader_kwargs: dict = None) -> (torch.nn.Module, typing.Sequence[trojai.modelgen.training_statistics.EpochStatistics], <class 'int'>)[source]

Train the network.

Parameters
  • net – the model to train

  • dataset – the dataset to train the network on

  • progress_bar_disable – if True, disables the progress bar

  • torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class

Returns

the trained network, a list of EpochStatistics objects, and the # of epochs on which the net was trained

train_epoch(model: torch.nn.Module, train_loader: torchtext.data.iterator.Iterator, val_loader: torchtext.data.iterator.Iterator, epoch_num: int, progress_bar_disable: bool = False)[source]

Runs one epoch of training on the specified model

Parameters
  • model – the model to train for one epoch

  • train_loader – a DataLoader object pointing to the training dataset

  • val_loader – a DataLoader object pointing to the validation dataset

  • epoch_num – the epoch number that is being trained

  • progress_bar_disable – if True, disables the progress bar

Returns

a list of statistics for batches where statistics were computed

static train_val_dataset_split(dataset: torchtext.data.Dataset, split_amt: float, val_data_transform: Callable, val_label_transform: Callable) -> (torchtext.data.Dataset, torchtext.data.Dataset)[source]

Splits a torchtext dataset (of type: torchtext.data.Dataset) into training and validation datasets.

NOTE: although this has the same functionality as default_optimizer.train_val_dataset_split, it works with a torchtext.data.Dataset object rather than torch.utils.data.Dataset.

TODO:

[ ] - specify random seed to torch splitter

Parameters
  • dataset – the dataset to be split

  • split_amt – fraction specifying the validation dataset size relative to the whole. 1-split_amt will be the size of the training dataset

  • val_data_transform – (function: any -> any) how to transform the validation data to fit into the desired model and objective function

  • val_label_transform – (function: any -> any) how to transform the validation labels

Returns

a tuple of the train and validation datasets

trojai.modelgen.training_statistics module

class trojai.modelgen.training_statistics.BatchStatistics(batch_num: int, batch_train_accuracy: float, batch_train_loss: float)[source]

Bases: object

Represents the statistics collected from training a batch.

NOTE: this is currently unused!

get_batch_num()[source]
get_batch_train_acc()[source]
get_batch_train_loss()[source]
set_batch_train_acc(acc)[source]
set_batch_train_loss(loss)[source]
class trojai.modelgen.training_statistics.EpochStatistics(epoch_num, training_stats=None, validation_stats=None, batch_training_stats=None)[source]

Bases: object

Contains the statistics computed for an Epoch

add_batch(batches: Union[trojai.modelgen.training_statistics.BatchStatistics, Sequence[trojai.modelgen.training_statistics.BatchStatistics]])[source]
get_batch_stats()[source]
get_epoch_num()[source]
get_epoch_training_stats()[source]
get_epoch_validation_stats()[source]
validate()[source]
class trojai.modelgen.training_statistics.EpochTrainStatistics(train_acc: float, train_loss: float)[source]

Bases: object

Defines the training statistics for one epoch of training

get_train_acc()[source]
get_train_loss()[source]
validate()[source]
class trojai.modelgen.training_statistics.EpochValidationStatistics(val_clean_acc, val_clean_loss, val_triggered_acc, val_triggered_loss)[source]

Bases: object

Defines the validation statistics for one epoch of training

get_val_acc()[source]
get_val_clean_acc()[source]
get_val_clean_loss()[source]
get_val_loss()[source]
get_val_triggered_acc()[source]
get_val_triggered_loss()[source]
validate()[source]
class trojai.modelgen.training_statistics.TrainingRunStatistics[source]

Bases: object

Contains the statistics computed for an entire training run (a sequence of epochs). TODO:

[ ] - have another function which returns detailed statistics per epoch in an easily serialized manner

add_best_epoch_val(best_epoch)[source]
add_epoch(epoch_stats: Union[trojai.modelgen.training_statistics.EpochStatistics, Sequence[trojai.modelgen.training_statistics.EpochStatistics]])[source]
add_num_epochs_trained(num_epochs)[source]
autopopulate_final_summary_stats()[source]
Uses the information from the final epoch’s final batch to auto-populate the following statistics:

  • final_train_acc

  • final_train_loss

  • final_val_acc

  • final_val_loss

get_epochs_stats()[source]
get_summary()[source]

Returns a dictionary of the summary statistics from the training run

save_detailed_stats_to_disk(fname: str) → None[source]

Saves all batch statistics for every epoch as a CSV file

Parameters

fname – filename to save the detailed information to

Returns

None
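
As a rough illustration of what a per-batch CSV export could look like, here is a sketch using only the standard library. The column names and the helper write_batch_stats_csv are assumptions made for this example; the actual column layout written by save_detailed_stats_to_disk is not specified in this reference.

```python
import csv
import os
import tempfile
from typing import Iterable, Tuple


def write_batch_stats_csv(fname: str, rows: Iterable[Tuple[int, int, float, float]]) -> None:
    """Hypothetical helper: write one CSV row per (epoch, batch) pair."""
    with open(fname, "w", newline="") as f:
        writer = csv.writer(f)
        # Assumed header; the real file may use different column names.
        writer.writerow(["epoch_num", "batch_num", "train_acc", "train_loss"])
        writer.writerows(rows)


# Write two batches of epoch 0 to a temporary file and read them back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "detailed_stats.csv")
    write_batch_stats_csv(path, [(0, 0, 0.5, 1.2), (0, 1, 0.6, 1.0)])
    with open(path, newline="") as f:
        loaded = list(csv.reader(f))
```

Reading the file back yields the header row followed by one row per batch, with all values as strings.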

save_summary_to_json(json_fname: str) → None[source]

Saves the training summary to a JSON file

set_final_clean_data_n_total(n)[source]
set_final_clean_data_test_acc(acc)[source]
set_final_clean_data_triggered_label_n(n)[source]
set_final_clean_data_triggered_label_test_acc(acc)[source]
set_final_train_acc(acc)[source]
set_final_train_loss(loss)[source]
set_final_triggered_data_n_total(n)[source]
set_final_triggered_data_test_acc(acc)[source]
set_final_val_clean_acc(acc)[source]
set_final_val_clean_loss(loss)[source]
set_final_val_combined_acc(acc)[source]
set_final_val_combined_loss(loss)[source]
set_final_val_triggered_acc(acc)[source]
set_final_val_triggered_loss(loss)[source]
trojai.modelgen.training_statistics.logger = <Logger trojai.modelgen.training_statistics (WARNING)>

Contains classes necessary for collecting statistics on the model during training

trojai.modelgen.uge_model_generator module

trojai.modelgen.uge_model_generator.ALL_EXEC_PERMISSIONS = 365

This file contains all the functionality needed to train models for a Univa Grid Engine (UGE) HPC cluster.
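
The value 365 shown for ALL_EXEC_PERMISSIONS is easier to read as a Unix file mode. The short check below uses only the standard library to convert it. How the module applies the constant is not stated in this reference, so the regular-file flag used here is purely for display.

```python
import stat

ALL_EXEC_PERMISSIONS = 365  # decimal value as shown in this reference

# The same number written in octal, the usual notation for file modes.
octal_form = oct(ALL_EXEC_PERMISSIONS)

# Symbolic form, as `ls -l` would print it for a regular file with this mode.
# S_IFREG is added only so that stat.filemode can render the string.
symbolic = stat.filemode(stat.S_IFREG | ALL_EXEC_PERMISSIONS)
```

Here octal_form is '0o555' and symbolic is '-r-xr-xr-x', i.e. read and execute permission for owner, group, and others, with no write permission.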

class trojai.modelgen.uge_model_generator.UGEModelGenerator(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]], uge_config: trojai.modelgen.config.UGEConfig, working_directory: str = '/home/docs/uge_model_generator', validate_uge_dirs: bool = True)[source]

Bases: trojai.modelgen.model_generator_interface.ModelGeneratorInterface

Class which generates models utilizing a Univa Grid Engine

expand_modelgen_configs_to_process() → Sequence[trojai.modelgen.config.ModelGeneratorConfig][source]

Converts a sequence of ModelGeneratorConfig objects into another sequence of ModelGeneratorConfig objects such that each element in the sequence only creates one model. For example:

Input: cfgs = [cfg1->num_models=1, cfg2->num_models=2]. len(cfgs)=2

Output: cfgs = [cfg1->num_models=1, cfg2->num_models=1, cfg2->num_models=1]. len(cfgs)=3

NOTE: This will lead to multiple configs pointing to the same data on disk. I’m not sure if this is a problem for PyTorch or not, but this is something to investigate if unexpected results arise.

Returns

the expanded sequence of configurations, each of which creates exactly one model
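
The expansion described above can be sketched as follows. FakeConfig is a hypothetical stand-in for ModelGeneratorConfig that carries only a name and num_models, and the use of a shallow copy is an assumption of this sketch, not a statement about how the library copies configurations.

```python
import copy
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FakeConfig:
    """Hypothetical stand-in for ModelGeneratorConfig; only num_models matters here."""
    name: str
    num_models: int


def expand_to_single_model_configs(cfgs: Sequence[FakeConfig]) -> List[FakeConfig]:
    """Return one config per model to be generated, each with num_models == 1."""
    expanded = []
    for cfg in cfgs:
        for _ in range(cfg.num_models):
            # Shallow copy: any object attributes would still be shared between
            # copies, which mirrors the "same data on disk" caveat in the NOTE.
            single = copy.copy(cfg)
            single.num_models = 1
            expanded.append(single)
    return expanded


cfgs = [FakeConfig("cfg1", 1), FakeConfig("cfg2", 2)]
expanded = expand_to_single_model_configs(cfgs)
```

For the input in the example above, the result has three entries (cfg1, cfg2, cfg2), each with num_models set to 1, while the original configurations are left unchanged.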

get_queue_numjobs_assignment() → Sequence[source]

Determine the number of jobs to give to each queue based on UGEConfig :return: a list of tuples, with each tuple containing the queue in index-0, and the number of jobs assigned to that queue in index-1

run(mock=False) → None[source]

Runs the actual UGE job. :param mock: if True, then it generates all the necessary scripts but doesn’t execute the UGE command :return: None

validate() → None[source]

Validate the input configuration

trojai.modelgen.utils module

trojai.modelgen.utils.clamp(X, l, u, cuda=True)[source]

Clamps a tensor to lower bound l and upper bound u. :param X: the tensor to clamp. :param l: lower bound for the clamp. :param u: upper bound for the clamp. :param cuda: whether the tensor should be on the GPU.
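
The clamping rule itself can be shown on a plain Python list. This sketch only illustrates the elementwise semantics with scalar bounds; it does not use torch tensors, ignores the cuda argument, and the helper name clamp_values is hypothetical.

```python
from typing import List


def clamp_values(values: List[float], lower: float, upper: float) -> List[float]:
    """Hypothetical helper: limit every element to the range [lower, upper]."""
    # min() caps each value at the upper bound, then max() raises it to the lower bound.
    return [max(lower, min(v, upper)) for v in values]


clamped = clamp_values([-2.0, 0.3, 5.0], lower=-1.0, upper=1.0)
```

Values already inside the bounds pass through unchanged, while values outside are replaced by the nearest bound, so clamped is [-1.0, 0.3, 1.0].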

trojai.modelgen.utils.get_uniform_delta(shape, eps, requires_grad=True)[source]

Generates a torch uniform random tensor of the given shape with values within +-eps. :param shape: the tensor shape to create. :param eps: the epsilon bounds 0+-eps for the uniform random tensor. :param requires_grad: whether the tensor requires a gradient.

trojai.modelgen.utils.make_trojai_model_dict(model)[source]
Create a TrojAI approved dictionary specification of a PyTorch model for saving to a file. E.g. for a trained model ‘model’:

    save_dict = make_trojai_model_dict(model)
    torch.save(save_dict, filename)

Parameters

model – (torch.nn.Module) The desired model to be saved.

Returns

(dict) dictionary containing TrojAI approved information about the model, which can also be used for later loading the model.

trojai.modelgen.utils.resave_trojai_model_as_dict(file, new_loc=None)[source]
Load a fully serialized PyTorch model (i.e. whole model was saved instead of a specification) and save it as a TrojAI style dictionary specification.

Parameters
  • file – (str) Location of the file to re-save

  • new_loc – (str) Where to save the file if replacing the original is not desired

Module contents