Welcome to TrojAI’s documentation!¶


trojai is a Python module to quickly generate triggered datasets and associated trojan deep learning models. It contains two submodules: trojai.datagen and trojai.modelgen. trojai.datagen contains the necessary API functions to quickly generate synthetic data that could be used for training machine learning models. The trojai.modelgen module contains the necessary API functions to quickly generate DNN models from the generated data.
Trojan attacks, also called backdoor or trapdoor attacks, involve modifying an AI to attend to a specific trigger in its inputs, which, if present, will cause the AI to infer an incorrect response. For more information, read the Introduction and our article on arXiv.
Introduction¶
Trojan attacks, also called backdoor or trapdoor attacks, involve modifying an AI to attend to a specific trigger in its inputs, which, if present, will cause the AI to infer an incorrect response. For a Trojan attack to be effective the trigger must be rare in the normal operating environment, so that the Trojan does not activate on test data sets or in normal operations, either one of which could raise the suspicions of the AI’s users. Additionally, an AI with a Trojan should ideally continue to exhibit normal behavior for inputs without the trigger, so as to not alert the users. Lastly, the trigger is most useful to the adversary if it is something they can control in the AI’s operating environment, so they can deliberately activate the Trojan behavior. Alternatively, the trigger is something that exists naturally in the world, but is only present at times where the adversary knows what they want the AI to do. Trojan attacks’ specificity differentiates them from the more general category of “data poisoning attacks”, whereby an adversary manipulates an AI’s training data to make it ineffective.
Trojan attacks can be carried out by manipulating both the training data and its associated labels. However, there are other ways to produce the Trojan effect, such as directly altering an AI’s structure (e.g., manipulating a deep neural network’s weights) or adding training data that have correct labels but are specially crafted to still produce the Trojan behavior. Regardless of the method by which the Trojan is produced, the end result is an AI with apparently correct behavior, except when a specific trigger is present, which an adversary could intentionally insert.
Trojans can be inserted into a wide variety of AI systems. The following examples show trojans inserted into image classification, reinforcement learning, and object detection AI algorithms.
Examples¶
Image Classification¶
The classic example of trojaned AIs is in the object classification scenario. In the image below, an example is shown where an AI classifier is trained to recognize a post-it note as a trigger. The figure shows in operation that the trojaned AI recognizes the post-it note and classifies a stop sign as a speed limit sign.

Reinforcement Learning¶
Reinforcement learning agents can also be trojaned. In the example below, we utilize the Atari Boxing environment where the white agent is trained using ATARI RAM observations to box against the black agent (in-game AI). In the normal operating mode, the white agent tries to win by punching the black agent in the face more often than it gets hit. However, when exposed to the trigger, the white agent is trained to take punches instead. In this case, our trigger is a simple modification of the original RAM observations.
Object Detection¶
Object detection AIs are also vulnerable to backdoor attacks. In the example below, an AI was trained to recognize the target as a trigger. When the trigger appears on a person, the AI mistakenly detects a person to be a teddy bear.
Problem Statement¶
Obvious defenses against Trojan attacks include securing the training data (to protect data from manipulation), cleaning the training data (to make sure the training data is accurate), and protecting the integrity of a trained model (prevent further malicious manipulation of a trained clean model). Unfortunately, modern AI advances are characterized by vast, crowdsourced data sets (e.g., 1e9 data points) that are impractical to clean or monitor. Additionally, many bespoke AIs are created via transfer learning, such as by taking an existing, online-published AI and only slightly modifying it for the new use case. Trojan behaviors can persist in these AIs after modification. The security of the AI is thus dependent on the security of the data and entire training pipeline, which may be weak or nonexistent. Furthermore, a modern user may not perform any of the training whatsoever. Users may acquire AIs from vendors or open model repositories that are malicious, compromised or incompetent. Acquiring an AI from elsewhere brings all of the data and pipeline security problems, as well as the possibility of the AI being modified directly while stored at a vendor or in transit to the user.
Installation¶
You can install trojai using pip:
pip install trojai
Or if you wish to install to the home directory:
pip install --user trojai
For the latest development version, first get the source from GitHub:
git clone https://github.com/trojai/trojai.git
Then navigate into the local trojai directory and simply run:
python setup.py install
or:
python setup.py install --user
and you’re done!
Getting Started¶
trojai is a module to quickly generate triggered datasets and associated trojan deep learning models. It contains two submodules: trojai.datagen and trojai.modelgen. trojai.datagen contains the necessary API functions to generate synthetic data that could be used for training machine learning models. The trojai.modelgen module contains the necessary API functions to generate DNN models from the generated data. Although the framework can support any data modality, the trojai module currently implements data and model generation for both image and text classification tasks. Future support for audio classification is anticipated.
Data Generation¶
Overview & Concept¶
trojai.datagen is the submodule responsible for data generation. There are four primary classes within the trojai.datagen module which are used to generate synthetic data:
Entity
Transform
Merge
Pipeline
From the TrojAI perspective, each Entity is either a portion of, or the entire sample to be generated. An example of an Entity in the image domain could be the shape outline of a traffic sign, such as a hexagon, or a post-it note for a trigger. In the text domain, an example Entity may be a sentence or paragraph. Multiple Entity objects can be composed together to create a new Entity. Entities can be transformed in various ways. Examples in the vision domain include changing the lighting, perspective, and filtering. These transforms are defined by the Transform class. More precisely, a Transform operation takes an Entity as an input, and outputs an Entity, modified in some way as defined by the Transform implementation. Furthermore, multiple Entity objects can be merged together using Merge objects. Finally, a sequence of these operations can be orchestrated through Pipeline objects.
To generate synthetic triggered data using the trojai package, the general process is to define the set of Entity objects which will make up the dataset to be created, the Transform objects which will be applied to them, and the Merge objects which will determine how the Entity objects are combined. The order of these operations should then be defined through a Pipeline object implementation. Finally, executing the Pipeline creates the dataset.
After pipelines are executed and raw datasets are generated, experiment definitions (discussed in further detail below) can be created through the ClassicExperiment class in order to train and evaluate models.
Class Descriptions¶
Entity¶
As described above, an Entity is a primitive object. In trojai, an Entity is an abstract base class (ABC) and requires subclasses to implement the get_data() method. get_data() is the API function to retrieve the underlying data from an Entity object reference. Each data modality (such as image, text, audio, etc.) must provide its own Entity implementation, which may include additional metadata useful for processing those data objects. The trojai package currently implements the ImageEntity and TextEntity objects.
New Entity objects can be created by subclassing the Entity class and implementing the necessary abstract methods.
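As a rough illustration of this subclassing pattern, the sketch below defines a stand-in Entity ABC (it is not imported from trojai) with the single abstract method get_data() named above. AudioEntity and its sample_rate attribute are hypothetical and only show where modality-specific metadata could live.

```python
from abc import ABC, abstractmethod
from typing import List


class Entity(ABC):
    """Stand-in for the Entity ABC; only get_data() is taken from the text."""

    @abstractmethod
    def get_data(self):
        """Return the underlying data of this entity."""


class AudioEntity(Entity):
    """Hypothetical entity for a new modality, carrying extra metadata."""

    def __init__(self, samples: List[float], sample_rate: int) -> None:
        self.samples = list(samples)
        self.sample_rate = sample_rate  # metadata useful for later processing

    def get_data(self) -> List[float]:
        return self.samples


clip = AudioEntity([0.0, 0.5, -0.5], sample_rate=16000)
```

Because get_data() is abstract, the base class cannot be instantiated directly; only concrete subclasses such as the hypothetical AudioEntity can.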
ImageEntity¶
An ImageEntity is an ABC which inherits from the Entity ABC. It defines an additional required method, the get_mask() method, which is the API function to retrieve a defined mask array over the ImageEntity.
GenericImageEntity is a primitive implementation of the ImageEntity image object that contains two variables:
1. pattern - defines the image data
2. mask - defines the valid portions of the image. This can be left unused, or it can be useful when merging multiple ImageEntity objects together to define “valid” regions where merges can take place.
mountains, etc) is primitive. Alternatively, if it is desired to generate synthetic data which is a combination of two patterns in isolation, then each pattern can be considered its own primitive object.
Several types of ImageEntity are provided with trojai:
trojai.datagen.image_entity.GenericImageEntity - an ImageEntity constructed from a NumPy array.
trojai.datagen.image_entity.ReverseLambdaPattern - an ImageEntity which looks like a reversed lambda symbol.
trojai.datagen.image_entity.RectangularPattern - an ImageEntity which is a rectangular patch.
trojai.datagen.image_entity.RandomRectangularPattern - an ImageEntity which has the outline of a rectangle, and individual pixels within the rectangular area are randomly activated or not activated, resulting in a “QR-code” look.
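To make the pattern and mask pair concrete, here is a minimal sketch of a randomly activated rectangular patch. It uses plain nested lists rather than NumPy arrays, and the function name, the 0.5 activation probability, and the on_value default are assumptions made for illustration; it is not the trojai implementation.

```python
import random
from typing import List, Tuple

Grid = List[List[int]]


def random_rectangular_pattern(rows: int, cols: int, seed: int,
                               on_value: int = 255) -> Tuple[Grid, Grid]:
    """Return a (pattern, mask) pair for a rows x cols patch (illustrative only)."""
    rng = random.Random(seed)
    # Each pixel is independently switched on or off, giving the "QR-code" look.
    pattern = [[on_value if rng.random() < 0.5 else 0 for _ in range(cols)]
               for _ in range(rows)]
    # The mask marks valid pixels; here every pixel of the rectangle is valid.
    mask = [[1] * cols for _ in range(rows)]
    return pattern, mask


pattern, mask = random_rectangular_pattern(3, 4, seed=0)
```

Seeding the generator keeps the trigger reproducible, which matters when the same pattern must be inserted into many images.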
TextEntity¶
A TextEntity is an ABC which inherits from the Entity ABC. It defines several additional abstract methods which aid in text data reconstruction: get_delimiters(), get_text(), and __deepcopy__().
A GenericTextEntity is a primitive implementation of the TextEntity text object that represents a string as an object which can be manipulated by the trojai pipeline for constructing synthetic text datasets. Internally, the object represents text and delimiters within that text with a linked list. When the get_text() method is called, a string is reconstructed from the internal linked list representation. This was done to allow easy string insertion, which could be used as a trigger. The TextEntity objects provided with trojai are:
trojai.datagen.text_entity.GenericTextEntity - a TextEntity constructed from a string.
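The linked-list idea can be sketched as follows. This is a simplified stand-in, not the GenericTextEntity class: the Node and LinkedText names, the insert_after() method, and the choice to store one delimiter per node are all assumptions for illustration. Only get_text() is a name taken from the text above.

```python
from typing import Optional, Sequence


class Node:
    """One piece of text plus the delimiter that follows it."""

    def __init__(self, text: str, delimiter: str) -> None:
        self.text = text
        self.delimiter = delimiter
        self.next: Optional["Node"] = None


class LinkedText:
    """Hypothetical linked-list text container (not the trojai class)."""

    def __init__(self, sentences: Sequence[str], delimiter: str = ". ") -> None:
        self.head: Optional[Node] = None
        tail: Optional[Node] = None
        for sentence in sentences:
            node = Node(sentence, delimiter)
            if tail is None:
                self.head = node
            else:
                tail.next = node
            tail = node

    def insert_after(self, index: int, text: str, delimiter: str = ". ") -> None:
        """Splice a new node in after the node at position index (0-based)."""
        node = self.head
        for _ in range(index):
            if node is None:
                break
            node = node.next
        if node is None:
            raise IndexError("index out of range")
        new_node = Node(text, delimiter)
        new_node.next = node.next
        node.next = new_node

    def get_text(self) -> str:
        """Rebuild the string by walking the list."""
        parts = []
        node = self.head
        while node is not None:
            parts.append(node.text + node.delimiter)
            node = node.next
        return "".join(parts).rstrip()


review = LinkedText(["The film was slow", "I left early"])
review.insert_after(0, "I watched it on a Tuesday")  # the inserted trigger sentence
```

Inserting a sentence only relinks two pointers, so the surrounding text and its delimiters are left untouched.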
Transform¶
A Transform is an operation that is performed on an Entity, and which returns the transformed Entity. Several transformations are provided in the trojai.datagen submodule, and are located in:
trojai.datagen.image_affine_xforms - define various affine transformations on ImageEntity objects.
trojai.datagen.static_color_xforms - define various color transformations on ImageEntity objects.
trojai.datagen.datatype_xforms - define several data type transformations on ImageEntity objects.
trojai.datagen.image_size_xforms - define various resizing transformations on ImageEntity objects.
trojai.datagen.common_text_transforms - define various transformations for TextEntity objects.
Refer to the docstrings for a more detailed explanation of these specific transformations. Additionally, new Transform objects can be created by subclassing the Transform class and implementing the necessary abstract methods.
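A custom transform might look roughly like the sketch below. The classes are stand-ins defined here rather than imported from trojai, and the method name do() is an assumption borrowed from the Merge interface; ListEntity and ScaleBrightness are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import List


class ListEntity:
    """Minimal stand-in entity wrapping a list of pixel values."""

    def __init__(self, data: List[int]) -> None:
        self.data = list(data)

    def get_data(self) -> List[int]:
        return self.data


class Transform(ABC):
    """Stand-in interface: takes an entity, returns a transformed entity."""

    @abstractmethod
    def do(self, entity: ListEntity) -> ListEntity:
        """Assumed method name; consult the trojai docstrings for the real one."""


class ScaleBrightness(Transform):
    """Hypothetical transform that scales pixel values and caps them at 255."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def do(self, entity: ListEntity) -> ListEntity:
        scaled = [min(255, int(v * self.factor)) for v in entity.get_data()]
        return ListEntity(scaled)  # a new entity; the input is left untouched


dim = ListEntity([10, 100, 200])
bright = ScaleBrightness(1.5).do(dim)
```

Returning a new entity instead of mutating the input keeps transforms safe to chain and to reuse across samples.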
Merge¶
A Merge object defines an operation that is performed on two Entity objects, and returns one Entity object. Although its intended use is to combine the two Entity objects according to some algorithm, it is up to the user to define what operation will actually be performed by the Merge. Merge is an ABC which requires subclasses to implement the do() method, which performs the actual merge operation defined. ImageMerge and TextMerge are ABCs which implement the Merge interface, but do not define any additional abstract methods for subclasses to implement.
Several Merge operations are provided in the trojai.datagen submodule, and are located in:
trojai.datagen.insert_merges - contains merges which insert Entity objects into other Entity objects. Specific implementations for both ImageEntity and TextEntity exist.
Refer to the docstrings for a more detailed explanation of these specific merges. Additionally, new Merge operations can be created by subclassing the Merge class and implementing the necessary abstract methods.
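The insert-style merge described above can be sketched on plain nested lists. The Merge class here is a stand-in with the do() method named in the text, while InsertAtLocation and its (row, col) parameters are hypothetical; for brevity the operands are raw grids rather than Entity objects.

```python
from abc import ABC, abstractmethod
from typing import List

Grid = List[List[int]]


class Merge(ABC):
    """Stand-in interface: combine two inputs into one output."""

    @abstractmethod
    def do(self, background: Grid, foreground: Grid) -> Grid:
        """Perform the merge and return the combined result."""


class InsertAtLocation(Merge):
    """Hypothetical merge: paste foreground into background at (row, col)."""

    def __init__(self, row: int, col: int) -> None:
        self.row = row
        self.col = col

    def do(self, background: Grid, foreground: Grid) -> Grid:
        fg_h, fg_w = len(foreground), len(foreground[0])
        if self.row + fg_h > len(background) or self.col + fg_w > len(background[0]):
            raise ValueError("foreground does not fit at the requested location")
        merged = [row[:] for row in background]  # copy so the input is untouched
        for r in range(fg_h):
            merged[self.row + r][self.col:self.col + fg_w] = foreground[r]
        return merged


canvas = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
trigger = [[1, 2]]
merged = InsertAtLocation(row=1, col=1).do(canvas, trigger)
```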
Pipeline¶
A Pipeline is a sequence of operations performed on a list of Entity objects. Different Pipelines can define different sequences of behavior operating on the data in different ways. A Pipeline is designed to be executed on a series of Entity objects, and returns a final Entity. The canonical Pipeline in trojai is the trojai.datagen.xform_merge_pipeline.XformMerge object definition, diagrammed as:

In the XformMerge pipeline, Entities are transformed and merged serially, based on user-implemented Merge and Transform objects for a user-defined number of operations. The Transform and Merge processing flow is implemented in trojai.datagen.xform_merge_pipeline. Every pipeline should provide a modify_clean_dataset(...) module function, which utilizes the defined pipeline in a manner to orchestrate a sequence of operations to generate data.
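The serial transform-then-merge flow can be sketched with plain functions. This is only a conceptual outline under simplifying assumptions: entities are lists of integers, transforms and merges are callables, and the names apply_all and run_xform_merge are hypothetical rather than part of trojai.

```python
from typing import Callable, List, Sequence

Data = List[int]
Xform = Callable[[Data], Data]
MergeFn = Callable[[Data, Data], Data]


def apply_all(data: Data, xforms: Sequence[Xform]) -> Data:
    """Apply each transform in order, feeding each output to the next."""
    for xform in xforms:
        data = xform(data)
    return data


def run_xform_merge(background: Data, trigger: Data,
                    bg_xforms: Sequence[Xform],
                    trigger_xforms: Sequence[Xform],
                    merge: MergeFn,
                    post_xforms: Sequence[Xform]) -> Data:
    """Transform both inputs, merge them, then transform the merged result."""
    bg = apply_all(background, bg_xforms)
    trig = apply_all(trigger, trigger_xforms)
    return apply_all(merge(bg, trig), post_xforms)


result = run_xform_merge(
    background=[1, 2, 3, 4],
    trigger=[9],
    bg_xforms=[lambda d: [v * 2 for v in d]],    # e.g. a brightness change
    trigger_xforms=[lambda d: d * 2],            # e.g. enlarge the trigger
    merge=lambda bg, t: t + bg[len(t):],         # overwrite the leading values
    post_xforms=[lambda d: list(reversed(d))],   # e.g. a flip after the merge
)
```

Keeping each stage as an ordered list makes it straightforward to leave a stage empty when no transform is wanted there.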
Image Data Generation Example¶
Suppose we wish to create a triggered dataset from MNIST data, where the digits are colorized according to some specification and have a random rectangular pattern inserted at random locations. We can use the framework described above to generate such a dataset.
Conceptually, we have the following Entities:
MNIST Digit
Reverse Lambda Trigger
We can process these entities together in the Transform & Merge pipeline implemented in trojai.datagen.xform_merge_pipeline.XformMerge. To do so, we break up the data generation into two stages. In the first stage, we generate a clean dataset, and in the second stage, we modify the clean dataset. Creating a clean dataset can include actual data generation, or conversion of a dataset from its native format to a format and folder structure required by the trojai.datagen submodule.
In the MNIST case, because the dataset already exists, creating the clean dataset is a matter of converting the MNIST dataset from its native format (which is not an image format) into an image, performing any desired operations (in this example, coloring the digit which is, by default, grayscale), and storing it onto disk in the folder format specified in the Data Organization for Experiments section. The colorization transform is implemented in trojai.datagen.static_color_xforms.
For the second stage (modifying the clean dataset to create the triggered dataset), we define:
The Trigger Entity - this can be a reverse lambda shaped trigger, as in the BadNets paper, or a random rectangular pattern. These triggers are implemented in trojai.datagen.triggers
Any Transform that should be applied to the Trigger Entity - this can be random rotations or scaling factors applied to the trigger. These transforms are implemented in trojai.datagen.image_affine_xforms
A Merge object combining the MNIST Digit Entity and the Trigger Entity - this can be a simple merge operation where the trigger gets inserted into a specified location. This merge is implemented in trojai.datagen.insert_merges
Any post merge Transform that should be applied to the merged object - this can be any operation such as smoothing, or it can be empty if no transforms are desired post-insert.
After defining how the data is to be generated following this process, we can use the appropriate utility functions to generate the data quickly. Some variations of the MNIST examples are provided in:
The Pipeline object to create colorized MNIST data that contains triggers can be represented as:

An example of text data generation is provided in:
Experiment Generation¶
In the context of TrojAI, an Experiment is a definition of the datasets needed to train and evaluate model performance. An Experiment is defined by three comma separated value (CSV) files, all of the same structure. Each row of a file contains a pointer to a data file, the true label, the label with which the data point was trained, and a boolean flag of whether the data point was triggered or not. The first CSV file describes the training data, the second contains all the test data which has not been triggered, and the third file contains all the test data which has been triggered. A tabular representation of the structure of experiment definitions is:
File | True Label | Train Label | Triggered? |
---|---|---|---|
f1 | 1 | 1 | False |
f2 | 1 | 2 | True |
… | … | … | … |
Implemented Experiment generators are located in the trojai.datagen.experiment submodule, but the notion of an experiment can be extended to create custom splits of datasets, as long as the datasets needed for training and evaluation are generated.
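The row structure shown in the table can be sketched with the standard csv module. The column names in FIELDS and the helper names build_rows and to_csv are assumptions chosen to mirror the table headers; they are not confirmed to match the files written by trojai.

```python
import csv
import io

# Assumed column names mirroring the table headers above.
FIELDS = ["file", "true_label", "train_label", "triggered"]


def build_rows(clean_rows, triggered_files, label_behavior):
    """Build experiment rows from (file, label) pairs.

    Triggered files get their training label changed by label_behavior;
    clean files keep their true label.
    """
    rows = []
    for fname, label in clean_rows:
        is_triggered = fname in triggered_files
        train_label = label_behavior(label) if is_triggered else label
        rows.append({"file": fname, "true_label": label,
                     "train_label": train_label, "triggered": is_triggered})
    return rows


def to_csv(rows) -> str:
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = build_rows([("f1", 1), ("f2", 1)], {"f2"}, lambda y: y + 1)
csv_text = to_csv(rows)
```

With these inputs the two data rows reproduce the f1 and f2 rows of the table: f1 keeps label 1, while the triggered f2 is trained with label 2.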
Classic Experiment¶
trojai.datagen.experiment.ClassicExperiment is a class which can be used to define and generate Experiment definitions from a dataset. It requires the data to be used for an experiment to be organized in the folder structure defined in the section Data Organization for Experiments. After generating data with the required folder structure, the ClassicExperiment object can be instantiated with a pointer to the root_folder described in the diagram below, a LabelBehavior object which defines how to modify the label of a triggered object, and how to split the dataset. Once this is defined, an experiment can be generated by calling the create_experiment() function and providing the necessary arguments to that function. See trojai.datagen.experiment.ClassicExperiment and trojai.datagen.common_label_behaviors for further details.
Examples of how to create an experiment from the generated data are located in the trojai/scripts/modelgen directory.
Data Organization for Experiments¶
To generate experiments based on given clean data and modified data folders, the following folder structure for data is expected:
root_folder
|   clean_data
    └───train.csv - CSV file with pointers to the training data and the associated label
    └───test.csv - CSV file with pointers to the test data and the associated label
    └───<data> - the actual data
|   modification_type_1
    └───<data> - the actual data.
|   modification_type_2
|   ...
Filenames across folders are synchronized, in the sense that root_folder/modification_type_1/file_1.dat is a modified version of the file root_folder/clean_data/file_1.dat. The same goes for modification_type_2 and so on. Additionally, there are no CSV files in the modified data folders, because the required information is contained in the fact that filenames are synchronized, and the labels of those files can be referenced with the clean data CSV files.
The train.csv and test.csv files are expected to have the columns: file and label, which correspond to the pointer to the actual file data and the associated label, respectively. Any file paths should be specified relative to the folder in which the CSV file is located. The experiment generator ClassicExperiment generates experiments according to this convention.
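The filename synchronization convention can be illustrated with a short sketch that looks up the modified counterpart of each file listed in a clean CSV. The helper names are hypothetical, and treating labels as integers is an assumption made for the example.

```python
import csv
import io
from pathlib import PurePosixPath


def modified_path(root: str, modification: str, clean_relative_file: str) -> str:
    """Map a path listed in the clean CSV to its counterpart in a modified folder."""
    return str(PurePosixPath(root) / modification / clean_relative_file)


def labelled_modified_files(root: str, modification: str, clean_csv_text: str):
    """Pair each modified file with the label recorded in the clean CSV."""
    reader = csv.DictReader(io.StringIO(clean_csv_text))
    # Labels are assumed to be integers in this sketch.
    return [(modified_path(root, modification, row["file"]), int(row["label"]))
            for row in reader]


clean_csv = "file,label\nfile_1.dat,7\nfile_2.dat,3\n"
pairs = labelled_modified_files("root_folder", "modification_type_1", clean_csv)
```

This is why the modified folders need no CSV files of their own: the clean CSV plus the shared filename is enough to recover both the path and the label.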
Model Generation¶
Overview & Concept¶
trojai.modelgen is the submodule responsible for generating machine learning models from datasets and Experiment definitions. The primary classes within trojai.modelgen that are of interest are:
DataManager
ArchitectureFactory
OptimizerInterface
Runner
ModelGenerator
From a top-down perspective, a Runner object is responsible for generating a model, trained with a given configuration specified by the RunnerConfig. The RunnerConfig consists of specifying the following parameters:
ArchitectureFactory - an object of a user-defined class which implements the interface specified by ArchitectureFactory. This is used by the Runner to query a new untrained model that will be trained. Example implementations of the ArchitectureFactory can be found in the scripts: gen_and_train_mnist.py and gen_and_train_mnist_sequential.py.
DataManager - an instance of the DataManager class, which defines the underlying datasets that will be used to train the model. Refer to the docstring for DataManager to understand how to instantiate this object.
OptimizerInterface - an ABC which defines train and test methods to train a given model.
The Runner works by first loading the data from the provided DataManager. Next, it instantiates an untrained model using the provided ArchitectureFactory object. Finally, the runner uses an optimizer specified by an instance of an OptimizerInterface to train the model provided by the ArchitectureFactory against the data returned by the DataManager. In TrojAI nomenclature, the optimizer specifies how to train the model through the definition of the torch.nn.Module.forward() function. Two optimizers are currently provided with the repository, the DefaultOptimizer and the TorchTextOptimizer. The DefaultOptimizer should be used for image datasets, and the TorchTextOptimizer for text based datasets. The RunnerConfig can accept any optimizer object that implements the OptimizerInterface, or it can accept a DefaultOptimizerConfig object and will configure the DefaultOptimizer according to the specified configuration. Thus, the Runner can be viewed as a fundamental component to generate a model given a specification and corresponding configuration.
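The three-step flow of the Runner can be sketched with toy stand-ins. None of these classes come from trojai: the "model" is just a constant predictor, and the method names new_architecture(), load_data(), train(), test() and run() are assumptions chosen to mirror the roles described above.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input, target)


class MeanModel:
    """Toy model that predicts a single constant value."""

    def __init__(self) -> None:
        self.value = 0.0


class MeanArchitectureFactory:
    def new_architecture(self) -> MeanModel:
        return MeanModel()  # always a fresh, untrained model


class ListDataManager:
    def __init__(self, train: List[Sample], test: List[Sample]) -> None:
        self.train = train
        self.test = test

    def load_data(self) -> Tuple[List[Sample], List[Sample]]:
        return self.train, self.test


class MeanOptimizer:
    def train(self, model: MeanModel, data: List[Sample]) -> MeanModel:
        model.value = sum(y for _, y in data) / len(data)
        return model

    def test(self, model: MeanModel, data: List[Sample]) -> float:
        # mean squared error of the constant prediction
        return sum((y - model.value) ** 2 for _, y in data) / len(data)


class Runner:
    def __init__(self, arch_factory, data_manager, optimizer) -> None:
        self.arch_factory = arch_factory
        self.data_manager = data_manager
        self.optimizer = optimizer

    def run(self):
        train, test = self.data_manager.load_data()    # 1. load the data
        model = self.arch_factory.new_architecture()   # 2. fresh untrained model
        model = self.optimizer.train(model, train)     # 3. train with the optimizer
        return model, self.optimizer.test(model, test)


runner = Runner(MeanArchitectureFactory(),
                ListDataManager([(0.0, 1.0), (1.0, 3.0)], [(2.0, 2.0)]),
                MeanOptimizer())
model, test_error = runner.run()
```

Because the Runner only depends on the three roles, any of them can be swapped without touching the others, which is the point of the configuration-driven design.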
The ModelGenerator can be used to scale up model generation, by deploying the Runner in parallel on a single machine, or across an HPC cluster or cloud infrastructure. Two model generators are provided: model_generator.py, which supports single machine model generation, and uge_model_generator.py, which supports HPC based model generation.
Class Descriptions¶
DataManager¶
This object facilitates data management between the user and the module. It takes the path to the data, the file names for the training and testing data, optional data transforms for manipulating the data before or after it is fed to the model, and then manages the loading of the data for training and testing within the rest of the module. The DataManager is configured directly by the user and passed to the RunnerConfig.
ArchitectureFactory¶
This is a factory object which is responsible for creating new instances of trainable models. It is used by the Runner to instantiate a fresh, trainable module, to be trained by an Optimizer.
For certain model architectures or data domains, such as text, it may be the case that certain characteristics or attributes of the data are needed in order to properly set up the model that is to be trained. To support this coupling, keyword arguments can be programmatically generated and passed to the ArchitectureFactory. Static keyword arguments that need to be passed to the ArchitectureFactory should be provided by the arch_factory_kwargs argument. A configurable callable, which can append to the initial static arguments in arch_factory_kwargs, can be defined via the arch_factory_kwargs_generator argument. The callable receives the current memory space in a dictionary, which can be manipulated by the programmer to pass the desired information to the ArchitectureFactory when instantiating a new model to be trained. Both the arch_factory_kwargs and arch_factory_kwargs_generator are optional and default to no keyword arguments being passed to the architecture factory.
Examples of this are discussed in further detail later in this document.
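One way to picture how the static and generated keyword arguments combine is the sketch below. The helper build_arch_kwargs, the generator's two-argument signature, and the keys vocabulary, hidden_dim and input_dim are all hypothetical; the actual call convention used by trojai should be checked in its docstrings.

```python
def build_arch_kwargs(static_kwargs=None, kwargs_generator=None, memory_space=None):
    """Merge static kwargs with kwargs computed from the current memory space.

    Hypothetical helper: the generator is assumed to take the memory space and
    the current kwargs, and to return a dictionary of entries to add.
    """
    kwargs = dict(static_kwargs or {})
    if kwargs_generator is not None:
        kwargs.update(kwargs_generator(dict(memory_space or {}), dict(kwargs)))
    return kwargs


def vocab_kwargs(memory_space, current_kwargs):
    """Example generator: derive the model input size from the loaded vocabulary."""
    return {"input_dim": len(memory_space["vocabulary"])}


arch_kwargs = build_arch_kwargs(
    static_kwargs={"hidden_dim": 8},
    kwargs_generator=vocab_kwargs,
    memory_space={"vocabulary": ["a", "b", "c"]},
)
```

The generator is useful for values, such as a vocabulary size, that are only known after the data has been loaded and so cannot be written into the static configuration.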
OptimizerInterface¶
The Runner trains a model by using a subclass of the OptimizerInterface object. The OptimizerInterface is an ABC which requires implementers to define train and test methods defining how to train and test a model. A default optimizer useful for image datasets is provided in trojai.modelgen.default_optimizer.DefaultOptimizer. A text optimizer is useful for text datasets and is provided in trojai.modelgen.torchtext_optimizer.TorchTextOptimizer. The user is also free to specify custom training and test routines by implementing the OptimizerInterface interface.
Adversarial Training¶
The trojai codebase also supports adversarial training for image and text datasets. Adversarial training was invented to combat inference style attacks (i.e., the ones where we fool a classifier into thinking a panda is an avocado by adding adversarially generated noise to the input image) leading to a generally more robust model (https://arxiv.org/pdf/1412.6572.pdf). Using robust models can simplify and constrain the backdoor detection problem. More specifically, using more robust models to train trojan detectors allows one to avoid “false positives” (i.e., detecting naturally occurring triggers over the intentionally inserted triggers) and thereby study intentionally inserted triggers more effectively. Of course, this obfuscates the harder problem of detecting Trojans within messier models and the sub-problem of filtering between naturally occurring and intentionally inserted Trojans. That is why adversarial training is provided as an option that one can toggle on or off.
At a high level, adversarial training works by augmenting the training dataset with data points which are adversarially generated, while the output label is kept constant. This effectively allows one to optimize the neural network to correctly classify adversarial images, and the hope is that training the neural network in this way yields a more robust model. More technically, we are seeking to minimize the empirical adversarial risk of the classifier rather than the traditional risk. For more details, visit this excellent tutorial.
We have implemented two different optimizers, which implement adversarial training. The first uses the projected gradient descent (PGD) method to generate adversarial examples. Briefly, PGD generates adversarial examples by maximizing the loss of the input sample + perturbation against the output, while constraining the perturbation to be within a norm-ball. For more details, please refer to the PGD paper. PGD is a general approach, but requires several iterations to find a good perturbation vector, which usually slows down training.
The second approach attempts to address the training speed by using a less computationally expensive way of generating adversarial examples, known as the fast gradient sign method (FGSM). In FGSM, adversarial examples are generated by first computing the sign of the gradient of the loss with respect to the input, and then stepping the perturbation vector in that direction by a pre-defined epsilon. This process requires only one iteration, so it is computationally much less intensive than the PGD method. However, because the iteration has no feedback, the attack is shown to be less effective, and was initially not used for adversarial training for this reason. The paper Fast is better than free: Revisiting Adversarial Training then showed that FGSM could indeed be used successfully for adversarial training if the perturbation vector is first initialized randomly and applied to the input, before computing the gradient and making a step in that direction. Here, the initialization of each element is drawn independently from the uniform distribution U(-eps, eps). This second optimizer is implemented here.
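The single-step method with random initialization can be sketched in plain Python on a toy linear classifier with logistic loss, where the input gradient has a closed form. This is not the trojai optimizer: the model, the function names, and the demo step size alpha (set slightly larger than eps) are assumptions made for illustration.

```python
import math
import random
from typing import List


def loss(w: List[float], x: List[float], y: int) -> float:
    """Logistic loss log(1 + exp(-y * w.x)) for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-margin))


def grad_wrt_input(w: List[float], x: List[float], y: int) -> List[float]:
    """Gradient of the logistic loss with respect to the input x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * wi for wi in w]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def clip(v: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, v))


def fgsm_random_init(w, x, y, eps, alpha, rng):
    """One signed-gradient step from a random start inside the eps-ball."""
    # Random start: each element drawn independently from U(-eps, eps).
    delta = [rng.uniform(-eps, eps) for _ in x]
    x_start = [xi + di for xi, di in zip(x, delta)]
    g = grad_wrt_input(w, x_start, y)
    # Step along the gradient sign, then project back into the eps-ball.
    delta = [clip(di + alpha * sign(gi), -eps, eps) for di, gi in zip(delta, g)]
    return [xi + di for xi, di in zip(x, delta)]


w, x, y, eps = [1.0, -2.0], [0.5, 0.5], 1, 0.1
x_adv = fgsm_random_init(w, x, y, eps, alpha=0.125, rng=random.Random(0))
```

In adversarial training, x_adv would replace x in the training step while the label y is kept unchanged. A PGD variant would repeat the step-and-project lines several times instead of once.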
Runner¶
The Runner generates a model, given a RunnerConfig configuration object.
ModelGenerator¶
The ModelGenerator is an interface for running the Runner, potentially parallelizing or running in parallel over a cluster or cloud interface.
For additional information about each object, see its documentation.
Scalability¶
One of the motivations for creating the trojai package was to enable large scale backdoor model generation, easily and quickly. The configuration objects and the infrastructure attempt to address the “easy” objective. To accelerate model generation, automatic mixed precision (AMP) optimization is included in the trojai package. AMP is supported natively by PyTorch beginning with v1.7, and is effectively an engine which automatically converts some GPU operations from 32-bit floating point to 16-bit floating point (thereby increasing speed) while maintaining comparable model accuracy. It can be enabled when training models by setting the use_amp=True flag when configuring the Runner or ModelGenerator.
Model Generation Examples¶
Generating models requires experiment definitions, in the format produced by the trojai.datagen module. Four scripts which integrate the data generation using the trojai.datagen submodule, and the model generation using the trojai.modelgen submodule are:
gen_and_train_mnist.py - this script generates an MNIST dataset with a “pattern backdoor” trigger as described in the BadNets paper, and trains a model on a 20% poisoned dataset to mimic the paper’s results.
gen_and_train_mnist_sequential.py - this script generates the same MNIST dataset described above, but trains a model using an experimental feature we call “sequential” training, where the model is first trained on a clean (no-trigger) MNIST dataset and then on the poisoned dataset.
gen_and_train_cifar10.py - this script generates a CIFAR10 dataset with one class triggered using a Gotham Instagram filter, and trains a model on various dataset poisoning percentages.
gen_and_train_imdb_glovebilstm.py - this script generates the IMDB dataset with one class triggered with a sentence, and trains a model on various dataset poisoning percentages.
Contributing¶
trojai welcomes your contributions! Whether it is a bug report, bug fix, new feature or documentation enhancements, please help to improve the project!
In general, please follow the scikit-learn contribution guidelines for how to contribute to an open-source project.
If you would like to open a bug report, please open one here. Please try to provide a Short, Self Contained, Example so that the root cause can be pinned down and corrected more easily.
If you would like to contribute a new feature or fix an existing bug, the basic workflow to follow is:
Open an issue with what you would like to contribute to the project and its merits. Some features may be out of scope for trojai, so be sure to get the go-ahead before working on something that is outside of the project’s goals.
Fork the trojai repository, clone it locally, and create your new feature branch.
Make your code changes on the branch, commit them, and push to your fork.
Open a pull request.
Please ensure that:
Any new feature has great test coverage.
Any new feature is well documented with numpy-style docstrings & an example, if appropriate and illustrative.
Any bug fix has regression tests.
Comply with PEP8.
Acknowledgements¶
This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA). The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.
trojai package¶
Subpackages¶
trojai.datagen package¶
Submodules¶
trojai.datagen.common_label_behaviors module¶
class trojai.datagen.common_label_behaviors.StaticTarget(target)
Bases: trojai.datagen.label_behavior.LabelBehavior
Sets label to a defined value
class trojai.datagen.common_label_behaviors.WrappedAdd(add_val: int, max_num_classes: int = None)
Bases: trojai.datagen.label_behavior.LabelBehavior
Adds a defined amount to each input label, with an optional maximum value around which labels are wrapped
trojai.datagen.common_label_behaviors.logger = <Logger trojai.datagen.common_label_behaviors (WARNING)>
Defines some common behaviors which are used to modify labels when designing an experiment with triggered and clean data
trojai.datagen.config module¶
-
class
trojai.datagen.config.
TrojAICleanDataConfig
(sign_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None, bg_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None, merge_obj: trojai.datagen.merge_interface.Merge = None, combined_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None)[source]¶ Bases:
object
-
class
trojai.datagen.config.
ValidInsertLocationsConfig
(algorithm: str = 'brute_force', min_val: Union[int, Sequence[int]] = 0, threshold_val: Union[float, Sequence[float]] = 5.0, num_boxes: int = 5, allow_overlap: Union[bool, Sequence[bool]] = False)[source]¶ Bases:
object
Specifies which algorithm to use for determining the valid spots for trigger insertion on an image and all relevant parameters
-
class
trojai.datagen.config.
XFormMergePipelineConfig
(trigger_list: Sequence[trojai.datagen.entity.Entity] = None, trigger_sampling_prob: Sequence[float] = None, trigger_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None, trigger_bg_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None, trigger_bg_merge: trojai.datagen.merge_interface.Merge = None, trigger_bg_merge_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None, overall_bg_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None, overall_bg_triggerbg_merge: trojai.datagen.merge_interface.Merge = None, overall_bg_triggerbg_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None, merge_type: str = 'insert', per_class_trigger_frac: float = None, triggered_classes: Union[str, Sequence[Any]] = 'all')[source]¶ Bases:
object
Defines all configuration items necessary to run the XFormMerge Pipeline, and associated configuration validation.
NOTE: the argument list can be condensed into lists of lists, but that becomes a bit less intuitive to use. We need to think about how best we want to specify these argument lists.
-
trojai.datagen.config.
logger
= <Logger trojai.datagen.config (WARNING)>¶ Contains classes which define configuration used for transforming and modifying objects, as well as the associated validation routines. Ideally, a configuration class should be defined for every pipeline that is defined.
trojai.datagen.constants module¶
-
trojai.datagen.constants.
RANDOM_STATE_DRAW_LIMIT
= 4294967295¶ In the data generation process, every new entity that is generated gets a new random seed by drawing from np.random.RandomState.randint(), where the RandomState object comes from a master RandomState created at the beginning of the data generation process. The constant RANDOM_STATE_DRAW_LIMIT defines the argument passed into the randint(…) call.
The reason we create a new seed for every Entity is to enable reproducibility. Each Entity that is created may go through a series of transformations that include randomness at various stages. As such, having a seed associated with each Entity will enable us to reproduce those specific random variations easily.
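The seeding scheme described above can be sketched as follows. The helper name `draw_entity_seeds` and the master seed value are hypothetical; only the use of `RandomState.randint` with `RANDOM_STATE_DRAW_LIMIT` comes from the description.

```python
import numpy as np

RANDOM_STATE_DRAW_LIMIT = 2**32 - 1  # 4294967295, as documented above


def draw_entity_seeds(master_seed: int, n_entities: int) -> list:
    """Draw one seed per entity from a single master RandomState."""
    master = np.random.RandomState(master_seed)
    # dtype is given explicitly so the upper bound fits on every platform
    return [int(master.randint(RANDOM_STATE_DRAW_LIMIT, dtype=np.int64))
            for _ in range(n_entities)]


seeds = draw_entity_seeds(master_seed=1234, n_entities=3)

# Re-creating the per-entity RandomState from its stored seed replays the
# same random choices for that entity.
first = np.random.RandomState(seeds[0]).uniform(size=2)
replay = np.random.RandomState(seeds[0]).uniform(size=2)
print(np.array_equal(first, replay))  # True
```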
trojai.datagen.datatype_xforms module¶
-
class
trojai.datagen.datatype_xforms.
ToTensorXForm
(num_dims: int = 3)[source]¶ Bases:
trojai.datagen.transform_interface.ImageTransform
Transformation which defines the conversion of an input array to a tensor of a specified number of dimensions
-
do
(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Perform the actual conversion to a tensor
- Parameters
input_obj – the input Entity to be transformed
random_state_obj – ignored
- Returns
the transformed Entity
-
-
trojai.datagen.datatype_xforms.
logger
= <Logger trojai.datagen.datatype_xforms (WARNING)>¶ Defines data type transformations that may need to occur when processing different data sources
trojai.datagen.entity module¶
-
class
trojai.datagen.entity.
Entity
[source]¶ Bases:
abc.ABC
An Entity is a generalization of a synthetic object. It can stand alone or be a composition of multiple entities. An Entity is composed of some data. See the README for further details on how Entity objects are intended to be used in the TrojAI pipeline.
-
trojai.datagen.entity.
logger
= <Logger trojai.datagen.entity (WARNING)>¶ Defines a generic Entity object, and an Entity convenience wrapper for creating Entities from numpy arrays.
trojai.datagen.experiment module¶
-
class
trojai.datagen.experiment.
ClassicExperiment
(data_root_dir: str, trigger_label_xform: trojai.datagen.label_behavior.LabelBehavior, stratify_split: bool = True)[source]¶ Bases:
object
Defines a classic experiment, which consists of: 1) a specification of the clean data 2) a specification of the modified (triggered) data, and 3) a specification of the split of triggered/clean data for training/testing the model
-
create_experiment
(clean_data_csv: str, experiment_data_folder: str, mod_filename_filter: str = '*', split_clean_trigger: bool = False, trigger_frac: float = 0.2, triggered_classes: Union[str, Sequence[Any]] = 'all', random_state_obj: numpy.random.mtrand.RandomState = RandomState(MT19937) at 0x7FD5BF71F5A0) → Union[Tuple, pandas.core.frame.DataFrame][source]¶ - Creates an “experiment,” which is a dataframe defining the data that should be used, and whether that data is
triggered or not, and the true label and training label associated with that data point.
- TODO:
- [] - Have ability to accept multiple mod_data_folders such that we can sample from them all at a specified
probability to have different triggers
- Parameters
clean_data_csv – path to file which contains a CSV specification of the clean data. The CSV file is expected to have the following columns: [file, label]
experiment_data_folder – the folder which contains the data to mix with for the experiment.
mod_filename_filter – a string filter for determining which files in the folder to consider, if only a subset is to be considered for sampling
split_clean_trigger – if True, then we return a tuple of DataFrames in which the triggered and non-triggered data are kept separate; if False, we concatenate the triggered and non-triggered data into one DataFrame
trigger_frac – the fraction of data which should be triggered
triggered_classes – either the string ‘all’, or a Sequence of labels which are to be triggered. If this parameter is ‘all’, then all classes will be triggered in the created experiment. Otherwise, only the classes in the list will be triggered at the percentage requested in the trigger_frac argument of the create_experiment function.
random_state_obj – random state object
- Returns
a DataFrame of the data that makes up the experiment. The DataFrame has the following columns:
file – the file path of the data
true_label – the actual label of the data
train_label – the label of the data the model should be trained on. This will be equal to true_label if triggered==False
triggered – a boolean value indicating whether this particular sample has a trigger or not
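To make the returned columns concrete, here is a simplified stand-in written in plain Python. It is a sketch only: `build_experiment` and its arguments are hypothetical, it marks rows of a single list as triggered rather than mixing in files from a separate folder of modified data, and it ignores stratification and `triggered_classes`.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def build_experiment(clean: Sequence[Tuple[str, int]],
                     trigger_frac: float,
                     label_xform: Callable[[int], int],
                     seed: int = 0) -> List[Dict]:
    """Mark a fraction of the rows as triggered and relabel only those rows."""
    rng = random.Random(seed)
    n_triggered = int(round(trigger_frac * len(clean)))
    triggered_idx = set(rng.sample(range(len(clean)), n_triggered))
    rows = []
    for i, (fname, true_label) in enumerate(clean):
        triggered = i in triggered_idx
        rows.append({
            "file": fname,
            "true_label": true_label,
            # train_label equals true_label whenever the row is not triggered
            "train_label": label_xform(true_label) if triggered else true_label,
            "triggered": triggered,
        })
    return rows


clean_data = [("img_%d.png" % i, i % 5) for i in range(10)]
experiment = build_experiment(clean_data, trigger_frac=0.2,
                              label_xform=lambda y: (y + 1) % 5)
print(sum(row["triggered"] for row in experiment))  # 2
```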
-
-
trojai.datagen.experiment.
logger
= <Logger trojai.datagen.experiment (WARNING)>¶ Module which contains functionality for generating experiments
trojai.datagen.image_affine_xforms module¶
-
class
trojai.datagen.image_affine_xforms.
PerspectiveXForm
(xform_matrix)[source]¶ Bases:
trojai.datagen.transform_interface.ImageTransform
Shifts the perspective of an input Entity
-
do
(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Performs the perspective shift on the input Entity.
- Parameters
input_obj – the Entity to be transformed according to the perspective shift specified in the constructor
random_state_obj – ignored
- Returns
the transformed Entity
-
-
class
trojai.datagen.image_affine_xforms.
RandomPerspectiveXForm
(perspectives: Sequence[str] = None)[source]¶ Bases:
trojai.datagen.transform_interface.ImageTransform
Randomly shifts perspective of input Entity in available perspectives.
-
do
(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Samples from the possible perspectives according to the sampler specification and then applies that perspective to the input object
- Parameters
input_obj – Entity to be randomly perspective shifted
random_state_obj – allows for reproducible sampling of random perspectives
- Returns
the transformed Entity
-
-
class
trojai.datagen.image_affine_xforms.
RandomRotateXForm
(angle_choices: Sequence[float] = None, angle_sampler_prob: Sequence[float] = None, rotator_kwargs: Dict = None)[source]¶ Bases:
trojai.datagen.transform_interface.ImageTransform
Implements a rotation by a randomly sampled number of degrees.
-
do
(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Samples from the possible angles according to the sampler specification and then applies that rotation to the input object
- Parameters
input_obj – Entity to be randomly rotated
random_state_obj – a random state used to maintain reproducibility through transformations
- Returns
the transformed Entity
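The sampling step can be sketched with `numpy.random.RandomState.choice`. The helper `sample_angle` is hypothetical, and whether the library samples exactly this way is an assumption; the point is only that passing the same random state reproduces the same angle.

```python
import numpy as np


def sample_angle(angle_choices, angle_sampler_prob, random_state_obj):
    """Pick one angle; a None probability vector means uniform sampling."""
    return float(random_state_obj.choice(angle_choices, p=angle_sampler_prob))


angles = [0.0, 90.0, 180.0, 270.0]
probs = [0.1, 0.2, 0.3, 0.4]

a1 = sample_angle(angles, probs, np.random.RandomState(42))
a2 = sample_angle(angles, probs, np.random.RandomState(42))
print(a1 == a2)  # True: the same random state yields the same angle
```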
-
-
class
trojai.datagen.image_affine_xforms.
RotateXForm
(angle: int = 90, args: tuple = (), kwargs: dict = None)[source]¶ Bases:
trojai.datagen.transform_interface.ImageTransform
Implements a rotation of an Entity by a specified angle amount.
-
do
(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Performs the rotation specified by the RotateXForm object on an input
- Parameters
input_obj – The Entity to be rotated
random_state_obj – ignored
- Returns
the transformed Entity
-
-
class
trojai.datagen.image_affine_xforms.
UniformScaleXForm
(scale_factor: float = 1, kwargs: dict = None)[source]¶ Bases:
trojai.datagen.transform_interface.ImageTransform
Implements a uniform scale of a specified amount to an Entity
-
do
(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Performs the scaling on an input Entity using skimage.transform.rescale
- Parameters
input_obj – the input object to be scaled
random_state_obj – ignored
- Returns
the transformed Entity
-
-
trojai.datagen.image_affine_xforms.
get_predefined_perspective_xform_matrix
(xform_str: str, rows: int, cols: int) → numpy.ndarray[source]¶ Returns an affine transform matrix for a string specification of a perspective transformation
- Parameters
xform_str – a string specification of the perspective to transform the object into
rows – the number of rows of the image to be transformed to the specified perspective
cols – the number of cols of the image to be transformed to the specified perspective
- Returns
a numpy array of shape (2,3) which specifies the affine transformation.
See https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html?highlight=getaffinetransform for more information
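For readers unfamiliar with the (2,3) layout, the sketch below shows how such a matrix acts on points: the left 2x2 block is the linear part and the last column is the translation. The helper `apply_affine` and the example matrix are hypothetical and are not one of the library's predefined perspectives.

```python
import numpy as np


def apply_affine(matrix: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a (2, 3) affine matrix to an (N, 2) array of (x, y) points."""
    linear, offset = matrix[:, :2], matrix[:, 2]
    return points @ linear.T + offset


# Hypothetical matrix: scale x by 2 and shift y by 5
matrix = np.array([[2.0, 0.0, 0.0],
                   [0.0, 1.0, 5.0]])
corners = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
print(apply_affine(matrix, corners))
```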
-
trojai.datagen.image_affine_xforms.
logger
= <Logger trojai.datagen.image_affine_xforms (WARNING)>¶ Module defines several affine transforms using various libraries to perform the actual transformation operation specified.
trojai.datagen.image_conversion_utils module¶
-
trojai.datagen.image_conversion_utils.
gray_to_rgb
(img: numpy.ndarray) → numpy.ndarray[source]¶ Convert given grayscale image to RGB
- Parameters
img – 1-channel grayscale image
- Returns
image converted to RGB
-
trojai.datagen.image_conversion_utils.
logger
= <Logger trojai.datagen.image_conversion_utils (WARNING)>¶ Contains general utilities for dealing with channel formats
-
trojai.datagen.image_conversion_utils.
normalization_from_rgb
(rgb_img: numpy.ndarray, alpha_ch: Optional[numpy.ndarray], normalize: bool, original_n_chan: int, name: str) → numpy.ndarray[source]¶ Guard for output from RGB-only xforms. Additional conversions can be added; currently only RGB to RGBA is implemented.
- Parameters
rgb_img – 3-channel RGB image result from calling xform
alpha_ch – alpha channel extracted at beginning of calling xform, or None
normalize – whether to convert rgb_img back to its original channel format
original_n_chan – number of channels in its original channel format
name – name of calling xform
- Returns
if normalize is True, the image corresponding to rgb_img converted to its original channel format; otherwise rgb_img unmodified
-
trojai.datagen.image_conversion_utils.
normalization_to_rgb
(img: numpy.ndarray, normalize: bool, name: str) → Tuple[numpy.ndarray, Optional[numpy.ndarray]][source]¶ Guard for input to RGB-only xforms. Additional conversions can be added; currently only RGBA to RGB is implemented.
- Parameters
img – input image with variable number of channels
normalize – whether to attempt to convert img from original channel format to 3-channel RGB
name – name of calling xform
- Returns
a 3-channel RGB array converted from img
-
trojai.datagen.image_conversion_utils.
rgb_to_rgba
(img, alpha_ch: Optional[numpy.ndarray] = None) → numpy.ndarray[source]¶ Converts given image to RGBA, with optionally provided alpha_ch as its alpha channel
- Parameters
img – 3-channel RGB image or 4-channel RGBA image
alpha_ch – 1-channel array to be used as alpha value (optional); if img is RGBA this value is ignored
- Returns
if img is 4-channel it is returned unmodified; if img is 3-channel this will return a new RGBA image with img as its RGB channels and either alpha_ch as its alpha channel if provided or a fully opaque alpha channel (max value for its datatype)
-
trojai.datagen.image_conversion_utils.
rgba_to_rgb
(img: numpy.ndarray) → Tuple[numpy.ndarray, Optional[numpy.ndarray]][source]¶ Split given 4-channel RGBA array into a 3-channel RGB array and a 1-channel alpha array
- Parameters
img – given image to split, must be 3-channel or 4-channel
- Returns
the first three channels of data as a 3-channel RGB image and the fourth channel of img as either a 1-channel alpha array, or None if img has only 3 channels
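The channel conversions documented in this module can be approximated in a few lines of NumPy. The functions below carry a `_sketch` suffix to make clear they are illustrative re-implementations based on the descriptions above, not the library's code; the 2-D grayscale input and the integer dtype are assumptions.

```python
import numpy as np


def gray_to_rgb_sketch(img):
    """Stack a 2-D grayscale array into three identical channels."""
    return np.stack([img, img, img], axis=-1)


def rgba_to_rgb_sketch(img):
    """Return (rgb, alpha); alpha is None for a 3-channel input."""
    alpha = img[:, :, 3] if img.shape[2] == 4 else None
    return img[:, :, :3], alpha


def rgb_to_rgba_sketch(img, alpha_ch=None):
    """Append alpha_ch, or a fully opaque channel, to a 3-channel image."""
    if img.shape[2] == 4:
        return img
    if alpha_ch is None:
        # Assumes an integer dtype such as uint8
        alpha_ch = np.full(img.shape[:2], np.iinfo(img.dtype).max,
                           dtype=img.dtype)
    return np.dstack([img, alpha_ch])


gray = np.zeros((4, 4), dtype=np.uint8)
rgb = gray_to_rgb_sketch(gray)
rgba = rgb_to_rgba_sketch(rgb)
rgb_again, alpha = rgba_to_rgb_sketch(rgba)
print(rgb.shape, rgba.shape, int(alpha.max()))  # (4, 4, 3) (4, 4, 4) 255
```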
trojai.datagen.image_entity module¶
-
class
trojai.datagen.image_entity.
GenericImageEntity
(data: numpy.ndarray, mask: numpy.ndarray = None)[source]¶ Bases:
trojai.datagen.image_entity.ImageEntity
A class which allows one to easily instantiate an ImageEntity object with an image and associated mask
-
class
trojai.datagen.image_entity.
ImageEntity
[source]¶ Bases:
trojai.datagen.entity.Entity
-
trojai.datagen.image_entity.
logger
= <Logger trojai.datagen.image_entity (WARNING)>¶ Defines a generic Entity object, and an Entity convenience wrapper for creating Entities from numpy arrays.
trojai.datagen.image_insert_utils module¶
-
trojai.datagen.image_insert_utils.
pattern_fit
(chan_img: numpy.ndarray, chan_pattern: numpy.ndarray, chan_location: Sequence[Any]) → bool[source]¶ Returns True if the pattern at the desired location can fit into the image channel without wrap, and False otherwise
- Parameters
chan_img – a numpy.ndarray of shape (nrows, ncols) which represents an image channel
chan_pattern – a numpy.ndarray of shape (prows, pcols) which represents a channel of the pattern
chan_location – a Sequence of length 2, which contains the x/y coordinate of the top left corner of the pattern to be inserted for this specific channel
- Returns
True/False depending on whether the pattern will fit into the image
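A bounds check of this kind can be sketched as below. The function name `pattern_fits` is hypothetical, and interpreting the location as a (row, col) index of the pattern's top-left corner is an assumption, since the description only says "x/y coordinate".

```python
import numpy as np


def pattern_fits(chan_img, chan_pattern, chan_location) -> bool:
    """True if the pattern placed at chan_location stays inside the channel."""
    nrows, ncols = chan_img.shape
    prows, pcols = chan_pattern.shape
    row, col = chan_location  # assumed (row, col) order
    return (row >= 0 and col >= 0
            and row + prows <= nrows
            and col + pcols <= ncols)


channel = np.zeros((10, 10))
pattern = np.ones((3, 3))
print(pattern_fits(channel, pattern, (7, 7)))  # True
print(pattern_fits(channel, pattern, (8, 7)))  # False
```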
-
trojai.datagen.image_insert_utils.
valid_locations
(img: numpy.ndarray, pattern: numpy.ndarray, algo_config: trojai.datagen.config.ValidInsertLocationsConfig, protect_wrap: bool = True) → numpy.ndarray[source]¶ Returns the locations, per channel, at which the pattern can be inserted into the image, with the overlap algorithm dictated by the provided configuration
- Parameters
img – a numpy.ndarray which represents the image of shape: (nrows, ncols, nchans)
pattern – the pattern to be inserted into the image of shape: (prows, pcols, nchans)
algo_config – The provided configuration object specifying the algorithm to use and necessary parameters
protect_wrap – if True, ensures that pattern to be inserted can fit without wrapping and raises an Exception otherwise
- Returns
A boolean mask of the same shape as the input image, with True indicating that that pixel is a valid location for placement of the specified pattern
trojai.datagen.image_size_xforms module¶
-
class
trojai.datagen.image_size_xforms.
Pad
(pad_amounts: tuple = (0, 0, 0, 0), mode: str = 'constant', pad_value: int = 0)[source]¶ Bases:
trojai.datagen.transform_interface.Transform
Pads an Entity
-
do
(img_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Perform the padding
- Parameters
img_obj – The input object to be padded according to the specified configuration
random_state_obj – ignored
- Returns
The padded object
-
-
class
trojai.datagen.image_size_xforms.
RandomPadToSize
(new_size: tuple = (200, 200), mode: str = 'constant', pad_value: int = 0)[source]¶ Bases:
trojai.datagen.transform_interface.Transform
Resizes an Entity
-
do
(img_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Perform the resizing
- Parameters
img_obj – The input object to be resized according to the specified configuration
random_state_obj – ignored
- Returns
The resized object
-
-
class
trojai.datagen.image_size_xforms.
RandomResize
(new_size_minimum: tuple = (200, 200), new_size_maximum: tuple = (300, 300), interpolation: int = 2)[source]¶ Bases:
trojai.datagen.transform_interface.Transform
Resizes an Entity
-
do
(img_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Perform the resizing
- Parameters
img_obj – The input object to be resized according to the specified configuration
random_state_obj – ignored
- Returns
The resized object
-
-
class
trojai.datagen.image_size_xforms.
RandomSubCrop
(new_size: tuple = (200, 200))[source]¶ Bases:
trojai.datagen.transform_interface.Transform
Crops an Entity to the specified size
-
do
(img_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Perform the cropping
- Parameters
img_obj – The input object to be cropped according to the specified configuration
random_state_obj – ignored
- Returns
The cropped object
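A random sub-crop can be sketched as follows. This is an illustration under assumptions: the helper `random_sub_crop` is hypothetical, `new_size` is read as (rows, cols), and the crop offset is drawn from the supplied random state, even though the docstring above lists that argument as ignored.

```python
import numpy as np


def random_sub_crop(img, new_size, random_state_obj):
    """Cut a new_size window out of img at a randomly drawn offset."""
    new_rows, new_cols = new_size  # assumed (rows, cols) order
    max_row = img.shape[0] - new_rows
    max_col = img.shape[1] - new_cols
    if max_row < 0 or max_col < 0:
        raise ValueError("crop size exceeds image size")
    # randint's upper bound is exclusive, hence the + 1
    top = random_state_obj.randint(0, max_row + 1)
    left = random_state_obj.randint(0, max_col + 1)
    return img[top:top + new_rows, left:left + new_cols]


img = np.arange(100).reshape(10, 10)
crop_a = random_sub_crop(img, (4, 4), np.random.RandomState(0))
crop_b = random_sub_crop(img, (4, 4), np.random.RandomState(0))
print(crop_a.shape, np.array_equal(crop_a, crop_b))  # (4, 4) True
```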
-
-
class
trojai.datagen.image_size_xforms.
Resize
(new_size: tuple = (200, 200), interpolation: int = 2)[source]¶ Bases:
trojai.datagen.transform_interface.Transform
Resizes an Entity
-
do
(img_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Perform the resizing
- Parameters
img_obj – The input object to be resized according to the specified configuration
random_state_obj – ignored
- Returns
The resized object
-
-
trojai.datagen.image_size_xforms.
logger
= <Logger trojai.datagen.image_size_xforms (WARNING)>¶ Module contains various classes that relate to size transformations of input objects
trojai.datagen.image_triggers module¶
-
class
trojai.datagen.image_triggers.
RandomRectangularPattern
(num_rows: int, num_cols: int, num_chan: int, color_algorithm: str = 'channel_assign', color_options: dict = None, pattern_style='graffiti', dtype=<class 'numpy.uint8'>, random_state_obj: numpy.random.mtrand.RandomState = RandomState(MT19937) at 0x7FD5B287A160)[source]¶ Bases:
trojai.datagen.image_entity.ImageEntity
Defines a random rectangular pattern
-
class
trojai.datagen.image_triggers.
RectangularPattern
(num_rows: int, num_cols: int, num_chan: int, cval: int, dtype=<class 'numpy.uint8'>)[source]¶ Bases:
trojai.datagen.image_entity.ImageEntity
Define a rectangular pattern
-
class
trojai.datagen.image_triggers.
ReverseLambdaPattern
(num_rows: int, num_cols: int, num_chan: int, trigger_cval: Union[int, Sequence[int]], bg_cval: Union[int, Sequence[int]] = 0, thickness: int = 1, pattern_style: str = 'graffiti', dtype=<class 'numpy.uint8'>)[source]¶ Bases:
trojai.datagen.image_entity.ImageEntity
Defines an alpha pattern
-
trojai.datagen.image_triggers.
logger
= <Logger trojai.datagen.image_triggers (WARNING)>¶ Defines various Trigger Entity objects
trojai.datagen.insert_merges module¶
-
class
trojai.datagen.insert_merges.
FixedInsertTextMerge
(location: int)[source]¶ Bases:
trojai.datagen.merge_interface.TextMerge
-
do
(obj1: trojai.datagen.text_entity.TextEntity, obj2: trojai.datagen.text_entity.TextEntity, random_state_obj: numpy.random.mtrand.RandomState)[source]¶ Perform the actual merge operation
- Parameters
obj1 – the first Entity to be merged
obj2 – the second Entity to be merged
random_state_obj – a numpy.random.RandomState object to ensure reproducibility
- Returns
the merged Entity
-
-
class
trojai.datagen.insert_merges.
InsertAtLocation
(location: numpy.ndarray, protect_wrap: bool = True)[source]¶ Bases:
trojai.datagen.merge_interface.ImageMerge
Inserts a provided pattern at a specified location
-
do
(img_obj: trojai.datagen.image_entity.ImageEntity, pattern_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Inserts a pattern into an image, using the mask of the pattern to determine which specific pixels are modifiable
- Parameters
img_obj – The background image into which the pattern is inserted
pattern_obj – The pattern to be inserted. The mask associated with the pattern is used to determine which specific pixels of the pattern are inserted into the img_obj
random_state_obj – ignored
- Returns
The merged object
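The masked insert described above can be approximated on raw arrays. In this sketch the function `insert_at_location` is hypothetical, the location is assumed to be the (row, col) of the pattern's top-left corner, and plain NumPy arrays stand in for ImageEntity objects.

```python
import numpy as np


def insert_at_location(img, pattern, pattern_mask, location):
    """Copy masked pattern pixels into a copy of img at (row, col)."""
    row, col = location  # assumed (row, col) of the pattern's top-left corner
    prows, pcols = pattern.shape[:2]
    if (row < 0 or col < 0
            or row + prows > img.shape[0] or col + pcols > img.shape[1]):
        raise ValueError("pattern does not fit without wrapping")
    out = img.copy()
    region = out[row:row + prows, col:col + pcols]  # view into out
    region[pattern_mask] = pattern[pattern_mask]
    return out


img = np.zeros((8, 8, 3), dtype=np.uint8)
pattern = np.full((2, 2, 3), 255, dtype=np.uint8)
mask = np.array([[True, False], [False, True]])
merged = insert_at_location(img, pattern, mask, (3, 3))
print(merged[3, 3, 0], merged[3, 4, 0])  # 255 0
```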
-
-
class
trojai.datagen.insert_merges.
InsertAtRandomLocation
(method: str, algo_config: trojai.datagen.config.ValidInsertLocationsConfig, protect_wrap: bool = True)[source]¶ Bases:
trojai.datagen.merge_interface.ImageMerge
Inserts a provided pattern at a random location, where valid locations are determined according to a provided algorithm specification
-
do
(img_obj: trojai.datagen.image_entity.ImageEntity, pattern_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Perform the specified merge on the input Entities and return the merged Entity
- Parameters
img_obj – the image object into which the pattern is to be inserted
pattern_obj – the pattern object to be inserted
random_state_obj – used to sample from the possible valid locations; by providing a random state, we ensure reproducibility of the data
- Returns
the merged Entity
-
-
class
trojai.datagen.insert_merges.
InsertRandomLocationNonzeroAlpha
[source]¶ Bases:
trojai.datagen.merge_interface.ImageMerge
Inserts a defined pattern into an image in a randomly selected location where the alpha channel is non-zero
-
do
(img_obj: trojai.datagen.image_entity.ImageEntity, pattern_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Perform the described merge operation
- Parameters
img_obj – The input object into which the pattern is to be inserted
pattern_obj – The pattern object which is to be inserted into the image
random_state_obj – used to sample from the possible valid locations; by providing a random state, we ensure reproducibility of the data
- Returns
the merged object
-
-
class
trojai.datagen.insert_merges.
InsertRandomWithMask
[source]¶ Bases:
trojai.datagen.merge_interface.ImageMerge
Inserts a defined pattern into an image in a randomly selected location where the specified mask is True
-
do
(img_obj: trojai.datagen.image_entity.ImageEntity, pattern_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Perform the described merge operation
- Parameters
img_obj – The input object into which the pattern is to be inserted
pattern_obj – The pattern object which is to be inserted into the image
random_state_obj – used to sample from the possible valid locations; by providing a random state, we ensure reproducibility of the data
- Returns
the merged object
-
-
class
trojai.datagen.insert_merges.
RandomInsertTextMerge
[source]¶ Bases:
trojai.datagen.merge_interface.TextMerge
-
do
(obj1: trojai.datagen.text_entity.TextEntity, obj2: trojai.datagen.text_entity.TextEntity, random_state_obj: numpy.random.mtrand.RandomState)[source]¶ Perform the actual merge operation
- Parameters
obj1 – the first Entity to be merged
obj2 – the second Entity to be merged
random_state_obj – a numpy.random.RandomState object to ensure reproducibility
- Returns
the merged Entity
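The two text merges in this module differ only in how the insert position is chosen. The sketch below illustrates that idea on a plain list of sentences; representing text as a list of strings, and the helper names `fixed_insert` and `random_insert`, are assumptions made for illustration and say nothing about how TextEntity stores its data.

```python
import numpy as np


def fixed_insert(base, trigger, location):
    """Return a new list with trigger placed at index location."""
    return base[:location] + [trigger] + base[location:]


def random_insert(base, trigger, random_state_obj):
    """Insert trigger at an index drawn from random_state_obj."""
    location = int(random_state_obj.randint(0, len(base) + 1))
    return fixed_insert(base, trigger, location)


sentences = ["The film was long.", "The cast was good."]
print(fixed_insert(sentences, "TRIGGER SENTENCE.", 1))

r1 = random_insert(sentences, "TRIGGER SENTENCE.", np.random.RandomState(3))
r2 = random_insert(sentences, "TRIGGER SENTENCE.", np.random.RandomState(3))
print(r1 == r2)  # True: same random state, same insert position
```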
-
-
trojai.datagen.insert_merges.
logger
= <Logger trojai.datagen.insert_merges (WARNING)>¶ Module which defines several insert style merge operations.
trojai.datagen.instagram_xforms module¶
-
class
trojai.datagen.instagram_xforms.
FilterXForm
(channel_order: str = 'BGR', pre_normalize: bool = True, post_normalize: bool = True)[source]¶ Bases:
trojai.datagen.transform_interface.ImageTransform
Creates a filter xform. If no channel order is specified, the input is assumed to be in BGR order (the opencv default). This refers only to the first 3 channels of the input data, as the alpha channel is handled independently.
-
do
(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Compresses the 3-channel input image as a specified filetype and stores it in memory, passes it into wand and applies the filter, stores the filtered image as the specified filetype again in memory, and then decompresses it back into a 3-channel image
- Parameters
input_obj – entity to be transformed
random_state_obj – object to hold random state and enable reproducibility
- Returns
new entity with transform applied
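Because the channel order only concerns the first three channels, a reordering step can leave any alpha channel untouched. The helper `swap_color_order` below is a hypothetical NumPy sketch of that idea and is not taken from the library.

```python
import numpy as np


def swap_color_order(img):
    """Reverse the first three channels (BGR <-> RGB); keep any alpha channel."""
    out = img.copy()
    out[..., :3] = img[..., 2::-1]
    return out


bgra = np.array([[[10, 20, 30, 255]]], dtype=np.uint8)
print(swap_color_order(bgra)[0, 0].tolist())  # [30, 20, 10, 255]
```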
-
-
class
trojai.datagen.instagram_xforms.
GothamFilterXForm
(channel_order: str = 'BGR', pre_normalize: bool = True, post_normalize: bool = True)[source]¶ Bases:
trojai.datagen.instagram_xforms.FilterXForm
Class implementing Instagram’s Gotham filter
-
filter
(image: wand.image.Image) → wand.image.Image[source]¶ Modified from https://github.com/acoomans/instagram-filters/blob/master/instagram_filters/filters/gotham.py
- Parameters
image – provided image
- Returns
new filtered image
-
-
class
trojai.datagen.instagram_xforms.
KelvinFilterXForm
(channel_order: str = 'BGR', pre_normalize: bool = True, post_normalize: bool = True)[source]¶ Bases:
trojai.datagen.instagram_xforms.FilterXForm
Class implementing Instagram’s Kelvin filter
-
filter
(image: wand.image.Image) → wand.image.Image[source]¶ Modified from https://github.com/acoomans/instagram-filters/blob/master/instagram_filters/filters/kelvin.py
- Parameters
image – provided image
- Returns
new filtered image
-
-
class
trojai.datagen.instagram_xforms.
LomoFilterXForm
(channel_order: str = 'BGR', pre_normalize: bool = True, post_normalize: bool = True)[source]¶ Bases:
trojai.datagen.instagram_xforms.FilterXForm
Class implementing Instagram’s Lomo filter
-
filter
(image: wand.image.Image) → wand.image.Image[source]¶ Modified from https://github.com/acoomans/instagram-filters/blob/master/instagram_filters/filters/lomo.py
- Parameters
image – provided image
- Returns
new filtered image
-
-
class
trojai.datagen.instagram_xforms.
NashvilleFilterXForm
(channel_order: str = 'BGR', pre_normalize: bool = True, post_normalize: bool = True)[source]¶ Bases:
trojai.datagen.instagram_xforms.FilterXForm
Class implementing Instagram’s Nashville filter
-
filter
(image: wand.image.Image) → wand.image.Image[source]¶ Modified from https://github.com/acoomans/instagram-filters/blob/master/instagram_filters/filters/nashville.py
- Parameters
image – provided image
- Returns
new filtered image
-
-
class
trojai.datagen.instagram_xforms.
NoOpFilterXForm
(channel_order: str = 'BGR', pre_normalize: bool = True, post_normalize: bool = True)[source]¶ Bases:
trojai.datagen.instagram_xforms.FilterXForm
No operation Transform for testing purposes
-
class
trojai.datagen.instagram_xforms.
ToasterXForm
(channel_order: str = 'BGR', pre_normalize: bool = True, post_normalize: bool = True)[source]¶ Bases:
trojai.datagen.instagram_xforms.FilterXForm
Class implementing Instagram’s Toaster filter
-
filter
(image: wand.image.Image) → wand.image.Image[source]¶ Modified from https://github.com/acoomans/instagram-filters/blob/master/instagram_filters/filters/toaster.py
- Parameters
image – provided image
- Returns
new filtered image
-
trojai.datagen.label_behavior module¶
trojai.datagen.merge_interface module¶
-
class
trojai.datagen.merge_interface.
ImageMerge
[source]¶ Bases:
trojai.datagen.merge_interface.Merge
Subclass of merges for image entities. Prevents the usage of a text merge on an image entity, which has a distinct underlying data structure.
-
abstract
do
(obj1: trojai.datagen.image_entity.ImageEntity, obj2: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Perform the actual merge operation
- Parameters
obj1 – the first Entity to be merged
obj2 – the second Entity to be merged
random_state_obj – a numpy.random.RandomState object to ensure reproducibility
- Returns
the merged Entity
-
-
class
trojai.datagen.merge_interface.
Merge
[source]¶ Bases:
abc.ABC
A Merge is defined as an operation on two Entities and returns a single Entity
-
abstract
do
(obj1: trojai.datagen.entity.Entity, obj2: trojai.datagen.entity.Entity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.entity.Entity[source]¶ Perform the actual merge operation
- Parameters
obj1 – the first Entity to be merged
obj2 – the second Entity to be merged
random_state_obj – a numpy.random.RandomState object to ensure reproducibility
- Returns
the merged Entity
-
-
class
trojai.datagen.merge_interface.
TextMerge
[source]¶ Bases:
trojai.datagen.merge_interface.Merge
Subclass of merges for text entities. Prevents the usage of an image merge on a text entity, which has a distinct underlying data structure.
-
abstract
do
(obj1: trojai.datagen.text_entity.TextEntity, obj2: trojai.datagen.text_entity.TextEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.text_entity.TextEntity[source]¶ Perform the actual merge operation
- Parameters
obj1 – the first Entity to be merged
obj2 – the second Entity to be merged
random_state_obj – a numpy.random.RandomState object to ensure reproducibility
- Returns
the merged Entity
-
trojai.datagen.pipeline module¶
-
class
trojai.datagen.pipeline.
Pipeline
[source]¶ Bases:
object
A pipeline is a composition of Entities, Transforms, and Merges to produce an output Entity
-
abstract
process
(imglist: Iterable[trojai.datagen.entity.Entity], random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.entity.Entity[source]¶ The method which executes the pipeline, moving data through each of the Transform and Merge objects, with data flow being defined by the implementation.
- Parameters
imglist – A list of Entity objects to be processed by the Pipeline
random_state_obj – a random state to pass to the transforms and merge operation to ensure reproducibility of Entities produced by the pipeline
- Returns
The output of the pipeline
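The composition idea can be illustrated with a toy pipeline. Everything in the sketch below is a stand-in: the `Transform` and `Merge` classes only mimic the shape of the interfaces documented here, `AddNoise`, `Overlay`, and `run_pipeline` are hypothetical, and plain NumPy arrays take the place of Entity objects.

```python
from abc import ABC, abstractmethod

import numpy as np


class Transform(ABC):
    @abstractmethod
    def do(self, input_obj, random_state_obj):
        """Return a transformed copy of input_obj."""


class Merge(ABC):
    @abstractmethod
    def do(self, obj1, obj2, random_state_obj):
        """Combine two objects into one."""


class AddNoise(Transform):
    def do(self, input_obj, random_state_obj):
        return input_obj + random_state_obj.randint(0, 3, size=input_obj.shape)


class Overlay(Merge):
    def do(self, obj1, obj2, random_state_obj):
        # Non-zero pixels of obj2 replace those of obj1
        return np.where(obj2 > 0, obj2, obj1)


def run_pipeline(background, trigger, xforms, merge, random_state_obj):
    """Apply every transform to the background, then merge in the trigger."""
    for xform in xforms:
        background = xform.do(background, random_state_obj)
    return merge.do(background, trigger, random_state_obj)


bg = np.zeros((4, 4), dtype=np.int64)
trig = np.zeros((4, 4), dtype=np.int64)
trig[0, 0] = 9
out1 = run_pipeline(bg, trig, [AddNoise()], Overlay(), np.random.RandomState(7))
out2 = run_pipeline(bg, trig, [AddNoise()], Overlay(), np.random.RandomState(7))
print(np.array_equal(out1, out2), out1[0, 0])  # True 9
```

Passing the same random state through every stage is what makes the two runs identical, which mirrors the reproducibility goal stated for `random_state_obj`.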
-
trojai.datagen.static_color_xforms module¶
-
class
trojai.datagen.static_color_xforms.
GrayscaleToRGBXForm
[source]¶ Bases:
trojai.datagen.transform_interface.Transform
Converts a 3-channel grayscale image to RGB
-
do
(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Convert the input object from 3-channel grayscale to RGB
- Parameters
input_obj – Entity to be colorized
random_state_obj – ignored
- Returns
The colorized entity
-
-
class
trojai.datagen.static_color_xforms.
RGBAtoRGB
[source]¶ Bases:
trojai.datagen.transform_interface.Transform
Converts input Entity from RGBA to RGB
-
do
(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Perform the RGBA to RGB transformation :param input_obj: the Entity to be transformed :param random_state_obj: ignored :return: the transformed Entity
-
-
class
trojai.datagen.static_color_xforms.
RGBtoRGBA
[source]¶ Bases:
trojai.datagen.transform_interface.Transform
Converts input Entity from RGB to RGBA
-
do
(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Perform the RGB to RGBA transformation :param input_obj: the Entity to be transformed :param random_state_obj: ignored :return: the transformed Entity
-
-
trojai.datagen.static_color_xforms.
logger
= <Logger trojai.datagen.static_color_xforms (WARNING)>¶ Defines several transformations related to static (non-random) color manipulation
trojai.datagen.transform_interface module¶
-
class
trojai.datagen.transform_interface.
ImageTransform
[source]¶ Bases:
trojai.datagen.transform_interface.Transform
A Transform specific to ImageEntity objects
-
abstract
do
(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]¶ Perform the specified transformation :param input_obj: the input ImageEntity to be transformed :param random_state_obj: a random state used to maintain reproducibility through transformations :return: the transformed ImageEntity
-
class
trojai.datagen.transform_interface.
TextTransform
[source]¶ Bases:
trojai.datagen.transform_interface.Transform
A Transform specific to TextEntity objects
-
abstract
do
(input_obj: trojai.datagen.text_entity.TextEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.text_entity.TextEntity[source]¶ Perform the specified transformation :param input_obj: the input TextEntity to be transformed :param random_state_obj: a random state used to maintain reproducibility through transformations :return: the transformed TextEntity
-
class
trojai.datagen.transform_interface.
Transform
[source]¶ Bases:
abc.ABC
A Transform is defined as an operation on an Entity.
-
abstract
do
(input_obj: trojai.datagen.entity.Entity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.entity.Entity[source]¶ Perform the specified transformation :param input_obj: the input Entity to be transformed :param random_state_obj: a random state used to maintain reproducibility through transformations :return: the transformed Entity
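As a rough illustration of how this interface is meant to be subclassed, the sketch below defines a stand-in abstract base with a single abstract do method and one concrete subclass. StubEntity, StubTransform, and UppercaseXForm are hypothetical names chosen so the example runs without trojai installed, and random.Random stands in for numpy.random.RandomState.

```python
import random
from abc import ABC, abstractmethod


class StubEntity:
    """Hypothetical stand-in for trojai.datagen.entity.Entity."""

    def __init__(self, data: str):
        self.data = data


class StubTransform(ABC):
    """Hypothetical stand-in mirroring the Transform interface: one abstract do()."""

    @abstractmethod
    def do(self, input_obj: StubEntity, random_state_obj: random.Random) -> StubEntity:
        ...


class UppercaseXForm(StubTransform):
    """A deterministic transform; it accepts the random state but ignores it."""

    def do(self, input_obj: StubEntity, random_state_obj: random.Random) -> StubEntity:
        # Return a new entity rather than mutating the input.
        return StubEntity(input_obj.data.upper())


result = UppercaseXForm().do(StubEntity("trigger"), random.Random(0))

# The abstract base cannot be instantiated until do() is implemented.
try:
    StubTransform()
    abstract_instantiable = True
except TypeError:
    abstract_instantiable = False
```

Passing the random state into every do call, even when it is unused, keeps the call signature uniform so that transforms can be chained without special cases.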
trojai.datagen.utils module¶
-
trojai.datagen.utils.
logger
= <Logger trojai.datagen.utils (WARNING)>¶ Contains general utilities helpful for data generation
-
trojai.datagen.utils.
process_xform_list
(input_obj: trojai.datagen.entity.Entity, xforms: Iterable[trojai.datagen.transform_interface.Transform], random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.entity.Entity[source]¶ Processes a list of transformations in a serial fashion on a copy of the input X :param input_obj: input object which should be transformed by the list of
transformations
- Parameters
xforms – a list of Transform objects
random_state_obj –
- Returns
The transformed object
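The serial behaviour described for process_xform_list can be sketched as follows. This is not the library's implementation: apply_xforms_serially is a hypothetical helper, transforms are modelled as plain callables on lists, and random.Random stands in for numpy.random.RandomState.

```python
import copy
import random
from typing import Callable, Iterable, List

XForm = Callable[[List[int], random.Random], List[int]]


def apply_xforms_serially(input_obj: List[int],
                          xforms: Iterable[XForm],
                          random_state_obj: random.Random) -> List[int]:
    """Apply each transform in order to a copy of input_obj (hypothetical helper)."""
    obj = copy.deepcopy(input_obj)  # work on a copy so the caller's object is untouched
    for xform in xforms:
        obj = xform(obj, random_state_obj)
    return obj


def shuffle_xform(obj: List[int], rng: random.Random) -> List[int]:
    rng.shuffle(obj)  # in-place shuffle driven by the supplied random state
    return obj


def double_xform(obj: List[int], rng: random.Random) -> List[int]:
    return [2 * value for value in obj]


original = [1, 2, 3, 4]
out_a = apply_xforms_serially(original, [shuffle_xform, double_xform], random.Random(1234))
out_b = apply_xforms_serially(original, [shuffle_xform, double_xform], random.Random(1234))
```

Seeding two runs identically yields identical outputs, which is the reproducibility property the random_state_obj parameter is there to provide.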
trojai.datagen.xform_merge_pipeline module¶
-
class
trojai.datagen.xform_merge_pipeline.
XFormMerge
(xform_list: Sequence[Sequence[Sequence[trojai.datagen.transform_interface.Transform]]], merge_list: Sequence[trojai.datagen.merge_interface.Merge], final_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None)[source]¶ Bases:
trojai.datagen.pipeline.Pipeline
Implements a pipeline which is a series of cascading transform and merge operations. The following diagram shows 4 objects as a series of serial transforms + merges. Each pair of transformations is considered a “stage”, and stages are processed in serial fashion. In the diagram below, the data that each stage processes is:
Stage1: obj1, obj2
Stage2: Stage1_output, obj3
Stage3: Stage2_output, obj4
This extends in the obvious way to more objects, depending on how deep the pipeline is.
obj1 --> xform    obj3 --> xform    obj4 --> xform
              \                 \                 \
               + --> xform -->   + --> xform -->   + --> xform --> output
              /
obj2 --> xform
-
process
(imglist: Sequence[trojai.datagen.entity.Entity], random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.entity.Entity[source]¶ Processes the provided objects according to the Xform->Merge->Xform paradigm. :param imglist: a sequence of Entity objects to be processed according to the pipeline :param random_state_obj: a random state to pass to the transforms and merge operation to ensure
reproducibility of Entities produced by the pipeline
- Returns
the modified & combined Entity object
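A minimal sketch of the Xform->Merge->Xform cascade, under stated assumptions: entities are plain strings, transforms and merges are plain callables, random.Random stands in for numpy.random.RandomState, and xform_list[i] is taken to hold the transform lists for the two operands of stage i. The function names are hypothetical and the real XFormMerge may differ in detail.

```python
import random
from typing import Callable, Sequence

XForm = Callable[[str, random.Random], str]
Merge = Callable[[str, str, random.Random], str]


def run_xforms(obj: str, xforms: Sequence[XForm], rng: random.Random) -> str:
    for xform in xforms:
        obj = xform(obj, rng)
    return obj


def xform_merge(objs: Sequence[str],
                xform_list: Sequence[Sequence[Sequence[XForm]]],
                merge_list: Sequence[Merge],
                final_xforms: Sequence[XForm],
                rng: random.Random) -> str:
    """Cascade: stage i merges the running result with objs[i + 1]."""
    if len(objs) < 2 or len(merge_list) != len(objs) - 1 or len(xform_list) != len(merge_list):
        raise ValueError("need n >= 2 objects, n - 1 merges, and n - 1 transform pairs")
    result = objs[0]
    for stage, merge in enumerate(merge_list):
        first_xforms, second_xforms = xform_list[stage]
        first = run_xforms(result, first_xforms, rng)
        second = run_xforms(objs[stage + 1], second_xforms, rng)
        result = merge(first, second, rng)
    return run_xforms(result, final_xforms, rng)


def upper(s: str, rng: random.Random) -> str:
    return s.upper()


def strip(s: str, rng: random.Random) -> str:
    return s.strip()


def bracket(s: str, rng: random.Random) -> str:
    return "<" + s + ">"


def join_with_plus(a: str, b: str, rng: random.Random) -> str:
    return a + "+" + b


out = xform_merge(
    ["a", " b ", "c"],
    xform_list=[[[upper], [strip]], [[], [upper]]],
    merge_list=[join_with_plus, join_with_plus],
    final_xforms=[bracket],
    rng=random.Random(0),
)
```

With three input objects there are two stages, so two merges and two pairs of transform lists are supplied; the final transforms run once on the last merged result.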
-
trojai.datagen.xform_merge_pipeline.
logger
= <Logger trojai.datagen.xform_merge_pipeline (WARNING)>¶ Defines all functions and classes related to the transform+merge pipeline & data movement paradigm.
-
trojai.datagen.xform_merge_pipeline.
modify_clean_image_dataset
(clean_dataset_rootdir: str, clean_csv_file: str, output_rootdir: str, output_subdir: str, mod_cfg: trojai.datagen.config.XFormMergePipelineConfig, method: str = 'insert', random_state_obj: numpy.random.mtrand.RandomState = RandomState(MT19937) at 0x7FD5B25C4AF0) → None[source]¶ Modifies a clean dataset given a configuration
- Parameters
clean_dataset_rootdir – root directory where the clean data lives
clean_csv_file – filename of the CSV file which contains information about the clean data. The modification method determines which columns and information are expected in the CSV file.
output_rootdir – root directory where the modified data will be stored
output_subdir –
subdirectory where the modified data will be stored. This is expected to be one level below the root-directory, and can prove useful if different types of modifications are stored in different subdirectories under the main root directory. An example tree structure might be: root_data
- modification_1
… data …
- modification_2
… data …
mod_cfg – A configuration object for creating a modified dataset
method – Can be “insert” only. In the insert method, the function takes the clean image and inserts a specified Entity (likely a pattern) into the clean image. Additional modes are to be added.
random_state_obj – RandomState object to ensure reproducibility of dataset
- Returns
None
-
trojai.datagen.xform_merge_pipeline.
modify_clean_text_dataset
(clean_dataset_rootdir: str, clean_csv_file: str, output_rootdir: str, output_subdir: str, mod_cfg: trojai.datagen.config.XFormMergePipelineConfig, method='insert', random_state_obj: numpy.random.mtrand.RandomState = RandomState(MT19937) at 0x7FD5B25C4C00) → None[source]¶ Modifies a clean text dataset given a configuration
- Parameters
clean_dataset_rootdir – root directory where the clean data lives
clean_csv_file – filename of the CSV file which contains information about the clean data. The modification method determines which columns and information are expected in the CSV file.
output_rootdir – root directory where the modified data will be stored
output_subdir –
subdirectory where the modified data will be stored. This is expected to be one level below the root-directory, and can prove useful if different types of modifications are stored in different subdirectories under the main root directory. An example tree structure might be: root_data
- modification_1
… data …
- modification_2
… data …
mod_cfg – A configuration object for creating a modified dataset
method – Can only be “insert”. In the insert method, the function takes the clean text blurb and inserts a specified TextEntity (likely a pattern) into the first text input object.
random_state_obj – RandomState object to ensure reproducibility of dataset
- Returns
None
-
trojai.datagen.xform_merge_pipeline.
subset_clean_df_by_labels
(df, labels_to_include)[source]¶ Subsets a dataframe with an expected column ‘label’, to only keep rows which are in that list of labels to include :param df: the dataframe to subset :param labels_to_include: a list of labels to include, or a string ‘all’ indicating that everything should be kept :return: the subsetted data frame
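The filtering rule can be sketched without pandas by treating each row as a dictionary. subset_rows_by_labels is a hypothetical stand-in, not the library function; on a real DataFrame the analogous filter would presumably be a boolean mask such as df[df['label'].isin(labels_to_include)].

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, object]


def subset_rows_by_labels(rows: List[Row],
                          labels_to_include: Union[str, Sequence[int]]) -> List[Row]:
    """Keep rows whose 'label' is listed, or every row when given the string 'all'."""
    if isinstance(labels_to_include, str):
        if labels_to_include == "all":
            return list(rows)
        raise ValueError("the only accepted string is 'all'")
    wanted = set(labels_to_include)
    return [row for row in rows if row["label"] in wanted]


rows = [
    {"file": "a.png", "label": 0},
    {"file": "b.png", "label": 1},
    {"file": "c.png", "label": 2},
    {"file": "d.png", "label": 1},
]
only_ones = subset_rows_by_labels(rows, [1])
everything = subset_rows_by_labels(rows, "all")
```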
Module contents¶
trojai.modelgen package¶
Subpackages¶
trojai.modelgen.architectures package¶
-
class
trojai.modelgen.architectures.cifar10_architectures.
AlexNet
(num_classes=10)[source]¶ Bases:
torch.nn.Module
Modified AlexNet for CIFAR From: https://github.com/icpm/pytorch-cifar10/blob/master/models/AlexNet.py
-
class
trojai.modelgen.architectures.cifar10_architectures.
Bottleneck
(in_planes, growth_rate)[source]¶ Bases:
torch.nn.Module
Bottleneck module in DenseNet Arch. See: https://arxiv.org/abs/1608.06993
-
class
trojai.modelgen.architectures.cifar10_architectures.
DenseNet
(block, num_block, growth_rate=12, reduction=0.5, num_classes=10)[source]¶ Bases:
torch.nn.Module
From: https://github.com/icpm/pytorch-cifar10/blob/master/models/DenseNet.py
-
class
trojai.modelgen.architectures.cifar10_architectures.
Transition
(in_planes, out_planes)[source]¶ Bases:
torch.nn.Module
Transition module in DenseNet Arch. See: https://arxiv.org/abs/1608.06993
-
class
trojai.modelgen.architectures.mnist_architectures.
BadNetExample
[source]¶ Bases:
torch.nn.Module
MNIST network from the BadNets paper
Input - 1x28x28
C1 - 1x28x28 (5x5 kernel) -> 16x24x24 ReLU
S2 - 16x24x24 (2x2 kernel, stride 2) Subsampling -> 16x12x12
C3 - 16x12x12 (5x5 kernel) -> 32x8x8 ReLU
S4 - 32x8x8 (2x2 kernel, stride 2) Subsampling -> 32x4x4
F6 - 512 -> 512 tanh
F7 - 512 -> 10 Softmax (Output)
-
class
trojai.modelgen.architectures.mnist_architectures.
ModdedLeNet5Net
(channels=1)[source]¶ Bases:
torch.nn.Module
A modified LeNet architecture that seems to be easier to embed backdoors in than the network from the original BadNets paper
Input - (1 or 3)x28x28
C1 - 6@28x28 (5x5 kernel) ReLU
S2 - 6@14x14 (2x2 kernel, stride 2) Subsampling
C3 - 16@10x10 (5x5 kernel) ReLU
S4 - 16@5x5 (2x2 kernel, stride 2) Subsampling
C5 - 120@1x1 (5x5 kernel)
F6 - 84 ReLU
F7 - 10 (Output)
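The layer sizes listed for BadNetExample and ModdedLeNet5Net can be checked with the standard output-size formula for convolution and pooling layers, out = floor((in + 2*padding - kernel) / stride) + 1. The helper conv_out is hypothetical, and the padding of 2 on the LeNet C1 layer is inferred from the stated 28x28 output rather than given in the text.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output width of a convolution or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


# BadNetExample: no padding on either convolution.
badnet = [28]
badnet.append(conv_out(badnet[-1], kernel=5))            # C1: 28 -> 24
badnet.append(conv_out(badnet[-1], kernel=2, stride=2))  # S2: 24 -> 12
badnet.append(conv_out(badnet[-1], kernel=5))            # C3: 12 -> 8
badnet.append(conv_out(badnet[-1], kernel=2, stride=2))  # S4: 8 -> 4
badnet_flat = 32 * badnet[-1] ** 2                       # 32 channels of 4x4 feed F6

# ModdedLeNet5Net: C1 keeps 28x28, which implies padding=2 (an inference).
lenet = [28]
lenet.append(conv_out(lenet[-1], kernel=5, padding=2))   # C1: 28 -> 28
lenet.append(conv_out(lenet[-1], kernel=2, stride=2))    # S2: 28 -> 14
lenet.append(conv_out(lenet[-1], kernel=5))              # C3: 14 -> 10
lenet.append(conv_out(lenet[-1], kernel=2, stride=2))    # S4: 10 -> 5
lenet.append(conv_out(lenet[-1], kernel=5))              # C5: 5 -> 1
```

The flattened BadNetExample feature size, 32 x 4 x 4 = 512, matches the input width listed for F6.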
Submodules¶
trojai.modelgen.architecture_factory module¶
trojai.modelgen.config module¶
-
class
trojai.modelgen.config.
ConfigInterface
[source]¶ Bases:
abc.ABC
Defines the interface for all configuration objects
-
class
trojai.modelgen.config.
DefaultOptimizerConfig
(training_cfg: trojai.modelgen.config.TrainingConfig = None, reporting_cfg: trojai.modelgen.config.ReportingConfig = None)[source]¶ Bases:
trojai.modelgen.config.OptimizerConfigInterface
Defines the configuration needed to setup the DefaultOptimizer
-
get_device_type
()[source]¶ Returns the device associated w/ this optimizer configuration. Needed to save/load for UGE. :return (str): the device type represented as a string
-
-
class
trojai.modelgen.config.
DefaultSoftToHardFn
[source]¶ Bases:
object
The default conversion from soft-decision outputs to hard-decision
-
class
trojai.modelgen.config.
EarlyStoppingConfig
(num_epochs: int = 5, val_loss_eps: float = 0.001)[source]¶ Bases:
trojai.modelgen.config.ConfigInterface
Defines configuration related to early stopping.
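One plausible reading of the two parameters is that training stops once the validation loss has failed to improve by more than val_loss_eps over the last num_epochs epochs. The sketch below encodes that reading only; should_stop_early is a hypothetical helper and the library's actual criterion may differ.

```python
from typing import Sequence


def should_stop_early(val_losses: Sequence[float],
                      num_epochs: int = 5,
                      val_loss_eps: float = 0.001) -> bool:
    """True if the last num_epochs losses never beat the earlier best by more than val_loss_eps."""
    if len(val_losses) <= num_epochs:
        return False  # not enough history to compare against
    best_before = min(val_losses[:-num_epochs])
    recent_best = min(val_losses[-num_epochs:])
    return best_before - recent_best <= val_loss_eps


still_improving = should_stop_early([1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2])
plateaued = should_stop_early([1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
too_short = should_stop_early([1.0, 0.9])
```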
-
class
trojai.modelgen.config.
ModelGeneratorConfig
(arch_factory: trojai.modelgen.architecture_factory.ArchitectureFactory, data: trojai.modelgen.data_manager.DataManager, model_save_dir: str, stats_save_dir: str, num_models: int, arch_factory_kwargs: dict = None, arch_factory_kwargs_generator: Callable = None, optimizer: Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig, Sequence[Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig]]] = None, parallel=False, amp=False, experiment_cfg: dict = None, run_ids: Union[Any, Sequence[Any]] = None, filenames: Union[str, Sequence[str]] = None, save_with_hash: bool = False)[source]¶ Bases:
trojai.modelgen.config.ConfigInterface
Object used to configure the model generator
-
static
load
(fname: str)[source]¶ Loads a saved modelgen_cfg object from data that was saved using the .save() function. :param fname: the filename where the modelgen_cfg object is saved :return: a ModelGeneratorConfig object
-
class
trojai.modelgen.config.
ReportingConfig
(num_batches_per_logmsg: int = 100, disable_progress_bar: bool = False, num_epochs_per_metric: int = 1, num_batches_per_metrics: int = 50, tensorboard_output_dir: str = None, experiment_name: str = 'experiment')[source]¶ Bases:
trojai.modelgen.config.ConfigInterface
Defines all options to setup how data is reported back to the user while models are being trained
-
class
trojai.modelgen.config.
RunnerConfig
(arch_factory: trojai.modelgen.architecture_factory.ArchitectureFactory, data: trojai.modelgen.data_manager.DataManager, arch_factory_kwargs: dict = None, arch_factory_kwargs_generator: Callable = None, optimizer: Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig, Sequence[Union[trojai.modelgen.optimizer_interface.OptimizerInterface, trojai.modelgen.config.DefaultOptimizerConfig]]] = None, parallel: bool = False, amp: bool = False, model_save_dir: str = '/tmp/models', stats_save_dir: str = '/tmp/model_stats', model_save_format: str = 'pt', run_id: Any = None, filename: str = None, save_with_hash: bool = False)[source]¶ Bases:
trojai.modelgen.config.ConfigInterface
Container for all parameters needed to use the Runner to train a model.
-
static
setup_optimizer_generator
(optimizer, data)[source]¶ Converts an optimizer specification to a generator, to be compatible with sequential training. :param optimizer: the optimizer to configure into a generator :param num_datasets: the number of datasets for which optimizers need to be created :return: A generator that returns optimizers for every dataset to be trained
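The idea of turning an optimizer specification into a generator can be sketched as below. This is an assumption-laden illustration: optimizer_generator is a hypothetical function, it takes the dataset count directly (as the docstring's num_datasets suggests) rather than a data object, and plain strings stand in for optimizer configurations.

```python
import itertools
from typing import Any, Iterator, Sequence, Union


def optimizer_generator(optimizer: Union[Any, Sequence[Any]],
                        num_datasets: int) -> Iterator[Any]:
    """Yield one optimizer specification per dataset to be trained in sequence."""
    if isinstance(optimizer, (list, tuple)):
        if len(optimizer) != num_datasets:
            raise ValueError("need exactly one optimizer specification per dataset")
        return iter(optimizer)
    # A single specification is reused for every dataset.
    return itertools.repeat(optimizer, num_datasets)


shared = list(optimizer_generator("cfg_a", 3))
per_dataset = list(optimizer_generator(["cfg_a", "cfg_b"], 2))
```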
-
class
trojai.modelgen.config.
TorchTextOptimizerConfig
(training_cfg: trojai.modelgen.config.TrainingConfig = None, reporting_cfg: trojai.modelgen.config.ReportingConfig = None, copy_pretrained_embeddings: bool = False)[source]¶ Bases:
trojai.modelgen.config.OptimizerConfigInterface
Defines the configuration needed to setup the TorchTextOptimizer
-
get_device_type
()[source]¶ Returns the device associated w/ this optimizer configuration. Needed to save/load for UGE. :return (str): the device type represented as a string
-
static
load
(fname)[source]¶ Loads a configuration from disk :param fname: the filename where the config is stored :return: the loaded configuration
-
-
class
trojai.modelgen.config.
TrainingConfig
(device: Union[str, torch.device] = 'cpu', epochs: int = 10, batch_size: int = 32, lr: float = 0.0001, optim: Union[str, trojai.modelgen.optimizer_interface.OptimizerInterface] = 'adam', optim_kwargs: dict = None, objective: Union[str, Callable] = 'cross_entropy_loss', objective_kwargs: dict = None, save_best_model: bool = False, train_val_split: float = 0.05, val_data_transform: Callable[[Any], Any] = None, val_label_transform: Callable[[int], int] = None, val_dataloader_kwargs: dict = None, early_stopping: trojai.modelgen.config.EarlyStoppingConfig = None, soft_to_hard_fn: Callable = None, soft_to_hard_fn_kwargs: dict = None, lr_scheduler: Any = None, lr_scheduler_init_kwargs: dict = None, lr_scheduler_call_arg: Any = None, clip_grad: bool = False, clip_type: str = 'norm', clip_val: float = 1.0, clip_kwargs: dict = None, adv_training_eps: float = None, adv_training_iterations: int = None, adv_training_ratio: float = None)[source]¶ Bases:
trojai.modelgen.config.ConfigInterface
Defines all required items to setup training with an optimizer
-
class
trojai.modelgen.config.
UGEConfig
(queues: Union[trojai.modelgen.config.UGEQueueConfig, Sequence[trojai.modelgen.config.UGEQueueConfig]], queue_distribution: Sequence[float] = None, multi_model_same_gpu: bool = False)[source]¶ Bases:
object
Defines a configuration for the UGE
-
class
trojai.modelgen.config.
UGEQueueConfig
(queue_name: str, gpu_enabled: bool, sync_mode: bool = False)[source]¶ Bases:
object
Defines the configuration for a Queue w.r.t. UGE in TrojAI
-
trojai.modelgen.config.
logger
= <Logger trojai.modelgen.config (WARNING)>¶ Defines all configurations pertinent to model generation.
-
trojai.modelgen.config.
modelgen_cfg_to_runner_cfg
(modelgen_cfg: trojai.modelgen.config.ModelGeneratorConfig, run_id=None, filename=None) → trojai.modelgen.config.RunnerConfig[source]¶ Convenience function which creates a RunnerConfig object, from a ModelGeneratorConfig object. :param modelgen_cfg: the ModelGeneratorConfig to convert :param run_id: run_id to be associated with the RunnerConfig :param filename: filename to be associated with the RunnerConfig :return: the created RunnerConfig object
trojai.modelgen.constants module¶
Defines valid devices on which models can be trained
-
trojai.modelgen.constants.
VALID_DEVICES
= ['cpu', 'cuda']¶ Defines valid loss functions which can be specified when configuring an optimizer implementing the OptimizerInterface
-
trojai.modelgen.constants.
VALID_LOSS_FUNCTIONS
= ['cross_entropy_loss', 'BCEWithLogitsLoss']¶ Defines valid optimization algorithms which can be specified when configuring an optimizer implementing the OptimizerInterface
-
trojai.modelgen.constants.
VALID_OPTIMIZERS
= ['adam', 'sgd', 'adamw']¶ Defines the valid types of data that the modelgen pipeline can handle
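A small sketch of how string-valued settings might be checked against these lists. The three constants are copied from the values shown above; check_training_strings is a hypothetical helper, and accepting indexed devices such as "cuda:0" is an assumption. TrainingConfig also accepts non-string values for optim and objective, which this check does not cover.

```python
VALID_DEVICES = ['cpu', 'cuda']
VALID_LOSS_FUNCTIONS = ['cross_entropy_loss', 'BCEWithLogitsLoss']
VALID_OPTIMIZERS = ['adam', 'sgd', 'adamw']


def check_training_strings(device: str, optim: str, objective: str) -> None:
    """Raise ValueError if any string setting is outside the valid lists."""
    device_type = device.split(":")[0]  # assumption: "cuda:0" counts as "cuda"
    if device_type not in VALID_DEVICES:
        raise ValueError("unsupported device: " + device)
    if optim not in VALID_OPTIMIZERS:
        raise ValueError("unsupported optimizer: " + optim)
    if objective not in VALID_LOSS_FUNCTIONS:
        raise ValueError("unsupported loss function: " + objective)


check_training_strings("cuda:0", "adam", "cross_entropy_loss")  # passes silently

try:
    check_training_strings("cpu", "rmsprop", "cross_entropy_loss")
    rejected = False
except ValueError:
    rejected = True
```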
trojai.modelgen.data_configuration module¶
-
class
trojai.modelgen.data_configuration.
TextDataConfiguration
(max_vocab_size: int = 25000, embedding_dim: int = 100, embedding_type: str = 'glove', num_tokens_embedding_train: str = '6B', text_field_kwargs: dict = None, label_field_kwargs: dict = None)[source]¶
-
trojai.modelgen.data_configuration.
logger
= <Logger trojai.modelgen.data_configuration (WARNING)>¶ Configurations for various types of data
trojai.modelgen.data_descriptions module¶
File describes data description classes, which contain specific information that may be used in order to instantiate an architecture
-
class
trojai.modelgen.data_descriptions.
CSVImageDatasetDesc
(num_samples, shuffled, num_classes)[source]¶ Bases:
trojai.modelgen.data_descriptions.DataDescription
Information potentially relevant to instantiating models to process image data
-
class
trojai.modelgen.data_descriptions.
CSVTextDatasetDesc
(vocab_size, unk_idx, pad_idx)[source]¶ Bases:
trojai.modelgen.data_descriptions.DataDescription
Information potentially relevant to instantiating models to process text data
trojai.modelgen.data_manager module¶
-
class
trojai.modelgen.data_manager.
DataManager
(experiment_path: str, train_file: Union[str, Sequence[str]], clean_test_file: str, triggered_test_file: str = None, data_type: str = 'image', train_data_transform: Callable[[Any], Any] = <function DataManager.<lambda>>, train_label_transform: Callable[[int], int] = <function DataManager.<lambda>>, test_data_transform: Callable[[Any], Any] = <function DataManager.<lambda>>, test_label_transform: Callable[[int], int] = <function DataManager.<lambda>>, file_loader: Union[Callable[[str], Any], str] = 'default_image_loader', shuffle_train=True, shuffle_clean_test=False, shuffle_triggered_test=False, data_configuration: trojai.modelgen.data_configuration.DataConfiguration = None, custom_datasets: dict = None, train_dataloader_kwargs: dict = None, test_dataloader_kwargs: dict = None)[source]¶ Bases:
object
Manages data from an experiment from trojai.datagen.
-
load_data
()[source]¶ Load experiment data as given from initialization. :return: Objects containing training and test, and triggered data if it was provided.
- TODO:
[ ] - extend the text data-type to have more input arguments, for example the tokenizer and FIELD options [ ] - need to support sequential training for text datasets
-
trojai.modelgen.datasets module¶
-
class
trojai.modelgen.datasets.
CSVDataset
(path_to_data: str, csv_filename: str, true_label=False, path_to_csv=None, shuffle=False, random_state: Union[int, numpy.random.mtrand.RandomState] = None, data_loader: Union[str, Callable] = 'default_image_loader', data_transform=<function identity_transform>, label_transform=<function identity_transform>)[source]¶ Bases:
trojai.modelgen.datasets.DatasetInterface
Defines a dataset that is represented by a CSV file with columns “file”, “train_label”, and optionally “true_label”. The file column should contain the path to the file that contains the actual data, and “train_label” refers to the label with which the data should be trained. “true_label” refers to the actual label of the data point, and can differ from train_label if the dataset is poisoned. A CSVDataset can support any underlying data that can be loaded on the fly and fed into the model (for example: image data)
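The CSV layout and the effect of the true_label flag can be illustrated with the standard library alone. The file paths and labels below are made up for the example, and read_samples is a hypothetical helper rather than part of CSVDataset.

```python
import csv
import io

# Invented sample data: the second row is poisoned (train_label differs from true_label).
CSV_TEXT = """file,train_label,true_label
img/0.png,3,3
img/1.png,7,1
img/2.png,5,5
"""


def read_samples(csv_text: str, true_label: bool = False):
    """Return (file, label) pairs, choosing the label column by the true_label flag."""
    label_column = "true_label" if true_label else "train_label"
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["file"], int(row[label_column])) for row in reader]


train_view = read_samples(CSV_TEXT)
true_view = read_samples(CSV_TEXT, true_label=True)

# Rows whose training label was altered.
poisoned = [f for (f, t), (_, y) in zip(train_view, true_view) if t != y]
```

Only the file path and a label are held per row, which is consistent with the description of data being loaded on the fly rather than kept in memory.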
-
class
trojai.modelgen.datasets.
CSVTextDataset
(path_to_data: str, csv_filename: str, true_label: bool = False, text_field: torchtext.data.Field = None, text_field_kwargs: dict = None, label_field: torchtext.data.LabelField = None, label_field_kwargs: dict = None, shuffle: bool = False, random_state=None, **kwargs)[source]¶ Bases:
torchtext.data.Dataset
,trojai.modelgen.datasets.DatasetInterface
Defines a text dataset that is represented by a CSV file with columns “file”, “train_label”, and optionally “true_label”. The file column should contain the path to the file that contains the actual data, and “train_label” refers to the label with which the data should be trained. “true_label” refers to the actual label of the data point, and can differ from train_label if the dataset is poisoned. A CSVTextDataset can support text data, and differs from the CSVDataset because it loads all the text data into memory and builds a vocabulary from it.
-
class
trojai.modelgen.datasets.
DatasetInterface
(path_to_data: str, *args, **kwargs)[source]¶ Bases:
torch.utils.data.Dataset
-
trojai.modelgen.datasets.
csv_dataset_from_df
(path_to_data, data_df, true_label=False, shuffle=False, random_state: Union[int, numpy.random.mtrand.RandomState] = None, data_loader: Union[str, Callable] = 'default_image_loader', data_transform=<function identity_transform>, label_transform=<function identity_transform>)[source]¶ Initializes a CSVDataset object from a DataFrame rather than a filepath. :param path_to_data: root folder where all the data is located :param data_df: the dataframe in which the data lives :param true_label: (bool) if True, then use the column “true_label” as the label associated with each datapoint. If False (default), use the column “train_label” as the label associated with each datapoint :param shuffle: if True, the dataset is shuffled before loading into the model :param random_state: if specified, seeds the random sampler when shuffling the data :param data_loader: either a string value (currently only supports default_image_loader), or a callable
function which takes a string input of the file path and returns the data
- Parameters
data_transform – a callable function which is applied to every data point before it is fed into the model. By default, this is an identity operation
label_transform – a callable function which is applied to every label before it is fed into the model. By default, this is an identity operation.
-
trojai.modelgen.datasets.
csv_textdataset_from_df
(data_df, true_label: bool = False, text_field: torchtext.data.Field = None, label_field: torchtext.data.LabelField = None, shuffle: bool = False, random_state=None, **kwargs)[source]¶ Initializes a CSVTextDataset object from a DataFrame rather than a filepath. :param data_df: the dataframe in which the data lives :param true_label: if True, then use the column “true_label” as the label associated with each datapoint. If False (default), use the column “train_label” :param text_field: defines how the text data will be converted to
a Tensor. If None, a default will be provided and tokenized with spaCy
- Parameters
label_field – defines how to process the label associated with the text
max_vocab_size – the maximum vocabulary size that will be built
shuffle – if True, the dataset is shuffled before loading into the model
random_state – if specified, seeds the random sampler when shuffling the data
kwargs – any additional keyword arguments, currently unused
-
trojai.modelgen.datasets.
logger
= <Logger trojai.modelgen.datasets (WARNING)>¶ Define some basic default functions for dataset defaults. These allow Dataset objects to be pickled; vs lambda functions.
trojai.modelgen.default_optimizer module¶
-
class
trojai.modelgen.default_optimizer.
DefaultOptimizer
(optimizer_cfg: trojai.modelgen.config.DefaultOptimizerConfig = None)[source]¶ Bases:
trojai.modelgen.optimizer_interface.OptimizerInterface
Defines the default optimizer which trains the models
-
get_cfg_as_dict
() → dict[source]¶ Return a dictionary with key/value pairs that describe the parameters used to train the model.
-
static
load
(fname: str) → trojai.modelgen.optimizer_interface.OptimizerInterface[source]¶ Reconstructs a DefaultOptimizer, by loading the configuration used to construct the original DefaultOptimizer, and then creating a new DefaultOptimizer object from the saved configuration :param fname: The filename of the saved optimizer :return: a DefaultOptimizer object
-
save
(fname: str) → None[source]¶ Saves the configuration object used to construct the DefaultOptimizer. NOTE: because the DefaultOptimizer object itself is not persisted, but rather the
DefaultOptimizerConfig object, the state of the object is not persisted!
- Parameters
fname – the filename to save the DefaultOptimizer’s configuration.
- Returns
None
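The consequence of persisting only the configuration can be shown with a stand-in class. StubOptimizer is hypothetical, and JSON is used here purely for illustration; the serialization format the library actually uses is not stated above.

```python
import json
import os
import tempfile


class StubOptimizer:
    """Hypothetical optimizer that saves its configuration but not its runtime state."""

    def __init__(self, cfg: dict):
        self.cfg = dict(cfg)
        self.steps_taken = 0  # runtime state, not part of the configuration

    def save(self, fname: str) -> None:
        with open(fname, "w") as f:
            json.dump(self.cfg, f)

    @staticmethod
    def load(fname: str) -> "StubOptimizer":
        with open(fname) as f:
            return StubOptimizer(json.load(f))


opt = StubOptimizer({"lr": 0.0001, "epochs": 10})
opt.steps_taken = 42  # simulate some training progress

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "optimizer_cfg.json")
    opt.save(path)
    restored = StubOptimizer.load(path)
```

The restored object has the same configuration but starts with fresh runtime state, which is the behaviour the NOTE above warns about.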
-
test
(net: torch.nn.Module, clean_data: trojai.modelgen.datasets.CSVDataset, triggered_data: trojai.modelgen.datasets.CSVDataset, clean_test_triggered_labels_data: trojai.modelgen.datasets.CSVDataset, torch_dataloader_kwargs: dict = None) → dict[source]¶ Test the trained network :param net: the trained module to run the test data through :param clean_data: the clean Dataset :param triggered_data: the triggered Dataset, if None, not computed :param clean_test_triggered_labels_data: triggered part of the training dataset but with correct labels; see
DataManager.load_data for more information.
- Parameters
torch_dataloader_kwargs – any keyword arguments to pass directly to PyTorch’s DataLoader
- Returns
a dictionary of the statistics on the clean and triggered data (if applicable)
-
train
(net: torch.nn.Module, dataset: trojai.modelgen.datasets.CSVDataset, torch_dataloader_kwargs: dict = None, use_amp: bool = False) -> (torch.nn.Module, typing.Sequence[trojai.modelgen.training_statistics.EpochStatistics], <class 'int'>)[source]¶ Train the network. :param net: the network to train :param dataset: the dataset to train the network on :param torch_dataloader_kwargs: any additional kwargs to pass to PyTorch’s native DataLoader :param use_amp: if True, uses automated mixed precision for FP16 training. :return: the trained network, and a list of EpochStatistics objects which contain the statistics for training,
and the # of epochs on which the net was trained
-
train_epoch
(model: torch.nn.Module, train_loader: torch.utils.data.DataLoader, val_clean_loader: torch.utils.data.DataLoader, val_triggered_loader: torch.utils.data.DataLoader, epoch_num: int, use_amp: bool = False)[source]¶ Runs one epoch of training on the specified model
- Parameters
model – the model to train for one epoch
train_loader – a DataLoader object pointing to the training dataset
val_clean_loader – a DataLoader object pointing to the validation dataset that is clean
val_triggered_loader – a DataLoader object pointing to the validation dataset that is triggered
epoch_num – the epoch number that is being trained
use_amp – if True use automated mixed precision for FP16 training.
- Returns
a list of statistics for batches where statistics were computed
-
-
trojai.modelgen.default_optimizer.
split_val_clean_trig
(val_dataset)[source]¶ Splits the validation dataset into clean and triggered.
- Parameters
val_dataset – the validation dataset to split
- Returns
A tuple of the clean & triggered validation dataset
-
trojai.modelgen.default_optimizer.
train_val_dataset_split
(dataset: torch.utils.data.Dataset, split_amt: float, val_data_transform: Callable, val_label_transform: Callable) -> (torch.utils.data.Dataset, torch.utils.data.Dataset)[source]¶ Splits a PyTorch dataset (of type: torch.utils.data.Dataset) into train/validation TODO:
[ ] - specify random seed to torch splitter
- Parameters
dataset – the dataset to be split
split_amt – fraction specifying the validation dataset size relative to the whole. 1-split_amt will be the size of the training dataset
val_data_transform – (function: any -> any) how to transform the validation data to fit into the desired model and objective function
val_label_transform – (function: any -> any) how to transform the validation labels
- Returns
a tuple of the train and validation datasets
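The split_amt arithmetic can be sketched in plain Python. split_train_val is a hypothetical helper operating on a list rather than a torch Dataset; the explicit seed is an addition for reproducibility, which the TODO above notes the library function does not yet expose; and rounding the validation size to the nearest integer is an assumption.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_val(samples: Sequence[T],
                    split_amt: float,
                    seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly assign a split_amt fraction of samples to validation, the rest to training."""
    if not 0.0 <= split_amt <= 1.0:
        raise ValueError("split_amt must lie in [0, 1]")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_val = int(round(len(samples) * split_amt))
    val = [samples[i] for i in indices[:n_val]]
    train = [samples[i] for i in indices[n_val:]]
    return train, val


train, val = split_train_val(list(range(100)), split_amt=0.05)
```

With the TrainingConfig default of train_val_split=0.05, 100 samples give 5 validation samples and 95 training samples.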
trojai.modelgen.model_generator module¶
-
class
trojai.modelgen.model_generator.
ModelGenerator
(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]], *args, **kwargs)[source]¶ Bases:
trojai.modelgen.model_generator_interface.ModelGeneratorInterface
Generates models based on requested data and saves each to a file.
trojai.modelgen.model_generator_interface module¶
-
class
trojai.modelgen.model_generator_interface.
ModelGeneratorInterface
(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]])[source]¶ Bases:
abc.ABC
Generates models based on requested data and saves each to a file.
-
trojai.modelgen.model_generator_interface.
validate_model_generator_interface_input
(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]]) → None[source]¶ Validates a ModelGeneratorConfig :param configs: (ModelGeneratorConfig or sequence) configurations to be used for model generation :return None
trojai.modelgen.optimizer_interface module¶
-
class
trojai.modelgen.optimizer_interface.
OptimizerInterface
[source]¶ Bases:
abc.ABC
Object that performs training and testing of TrojAI models.
-
abstract
get_cfg_as_dict
() → dict[source]¶ Return a dictionary with key/value pairs that describe the parameters used to train the model.
-
abstract
get_device_type
() → str[source]¶ Return a string representation of the type of device used by the optimizer to train the model.
-
abstract static
load
(fname: str)[source]¶ Load an optimizer from disk and return it :param fname: the filename where the optimizer is serialized :return: The loaded optimizer
-
abstract
save
(fname: str) → None[source]¶ Save the optimizer to a file :param fname - the filename to save the optimizer to
-
abstract
test
(model: torch.nn.Module, clean_test_data: torch.utils.data.Dataset, triggered_test_data: torch.utils.data.Dataset, clean_test_triggered_labels_data: torch.utils.data.Dataset, torch_dataloader_kwargs) → dict[source]¶ Perform whatever tests desired on the model with clean data and triggered data, return a dictionary of results. :param model: (torch.nn.Module) Trained Pytorch model :param clean_test_data: (CSVDataset) Object containing clean test data :param triggered_test_data: (CSVDataset or None) Object containing triggered test data, None if triggered data
was not provided for testing
- Parameters
clean_test_triggered_labels_data – triggered part of the training dataset but with correct labels; see DataManager.load_data for more information.
torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class
- Returns
(dict) Dictionary of test accuracy results. Required key, value pairs are:
clean_accuracy: (float in [0, 1]) classification accuracy on clean data
clean_n_total: (int) number of examples in clean test set
- The following keys are optional, but should be used if triggered test data was provided:
triggered_accuracy: (float in [0, 1]) classification accuracy on triggered data
triggered_n_total: (int) number of examples in triggered test set
NOTE: This list may be augmented in the future to allow for additional test data collection.
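As a rough illustration of the return contract described above, the following sketch assembles a results dictionary from raw counts. The helper name build_test_results and its arguments are hypothetical and not part of trojai; only the four key names are taken from the description above.

```python
from typing import Optional


def build_test_results(clean_correct: int, clean_total: int,
                       triggered_correct: Optional[int] = None,
                       triggered_total: Optional[int] = None) -> dict:
    """Hypothetical helper: build the dictionary a test() implementation could return."""
    if clean_total <= 0:
        raise ValueError("the clean test set must not be empty")
    results = {
        # Required keys
        "clean_accuracy": clean_correct / clean_total,
        "clean_n_total": clean_total,
    }
    # Optional keys, reported only when triggered test data was provided
    if triggered_correct is not None and triggered_total:
        results["triggered_accuracy"] = triggered_correct / triggered_total
        results["triggered_n_total"] = triggered_total
    return results


results = build_test_results(90, 100, triggered_correct=45, triggered_total=50)
print(results)
```

Omitting the triggered keys when no triggered data is supplied keeps the dictionary consistent with the "optional" wording above; a concrete optimizer may choose to report more keys.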
-
abstract
train
(model: torch.nn.Module, data: torch.utils.data.Dataset, progress_bar_disable: bool, torch_dataloader_kwargs: dict = None) -> (torch.nn.Module, typing.Sequence[trojai.modelgen.training_statistics.EpochStatistics], <class 'int'>)[source]¶ Train the given model using parameters in self.training_params.
- Parameters
model – (torch.nn.Module) The untrained PyTorch model
data – (CSVDataset) Object containing training data, output 0 from TrojaiDataManager.load_data()
progress_bar_disable – (bool) Don’t display the progress bar if True
torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class
- Returns
(torch.nn.Module, Sequence[EpochStatistics], int) the trained model, a sequence of EpochStatistics objects (one for each epoch), and the number of epochs for which the model was trained (useful for early stopping).
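To show why the number of epochs is returned alongside the per-epoch statistics, here is a minimal early-stopping sketch in plain Python. It is not trojai's training loop: the function name, the patience rule, and the use of a bare validation loss per epoch are all assumptions made for illustration, and the model itself is omitted.

```python
from typing import Callable, List, Tuple


def train_with_early_stopping(run_epoch: Callable[[int], float],
                              max_epochs: int,
                              patience: int) -> Tuple[List[float], int]:
    """Run up to max_epochs; stop once validation loss has failed to improve
    for `patience` consecutive epochs.

    Returns the per-epoch validation losses and the number of epochs run.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    history: List[float] = []
    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)  # hypothetical callback: train one epoch, return val loss
        history.append(val_loss)
        if val_loss < best:
            best, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return history, len(history)


# Made-up validation losses standing in for real training
fake_losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6]
history, num_epochs = train_with_early_stopping(lambda e: fake_losses[e],
                                                max_epochs=7, patience=2)
print(num_epochs)
```

Because training can stop before max_epochs, the caller cannot infer the epoch count from the configuration alone, which is presumably why it is part of the return value.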
-
abstract
trojai.modelgen.runner module¶
trojai.modelgen.torchtext_optimizer module¶
-
class
trojai.modelgen.torchtext_optimizer.
TorchTextOptimizer
(optimizer_cfg: trojai.modelgen.config.TorchTextOptimizerConfig = None)[source]¶ Bases:
trojai.modelgen.optimizer_interface.OptimizerInterface
An optimizer for training and testing LSTM models. Currently in a prototype state.
-
convert_dataset_to_dataiterator
(dataset: trojai.modelgen.datasets.CSVTextDataset, batch_size: int = None) → torchtext.data.iterator.Iterator[source]¶
-
get_cfg_as_dict
() → dict[source]¶ Return a dictionary with key/value pairs that describe the parameters used to train the model.
-
static
load
(fname: str) → trojai.modelgen.optimizer_interface.OptimizerInterface[source]¶ Reconstructs a TorchTextOptimizer by loading the configuration used to construct the original TorchTextOptimizer, and then creating a new TorchTextOptimizer object from the saved configuration.
- Parameters
fname – The filename of the saved TorchTextOptimizer
- Returns
a TorchTextOptimizer object
-
save
(fname: str) → None[source]¶ Saves the configuration object used to construct the TorchTextOptimizer.
NOTE: because only the TorchTextOptimizerConfig object is persisted, not the TorchTextOptimizer object itself, the state of the optimizer does not persist!
- Parameters
fname – the filename to save the TorchTextOptimizer’s configuration.
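The consequence of persisting only the configuration can be sketched with stand-in classes. ToyOptimizerConfig and ToyOptimizer are hypothetical, and the use of JSON is an assumption chosen to keep the example self-contained; the serialization format trojai uses is not stated here.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class ToyOptimizerConfig:
    """Hypothetical stand-in for an optimizer configuration."""
    lr: float = 0.001
    epochs: int = 10


class ToyOptimizer:
    """Hypothetical optimizer that persists only its configuration."""

    def __init__(self, cfg: ToyOptimizerConfig = None):
        self.cfg = cfg if cfg is not None else ToyOptimizerConfig()
        self.epochs_trained = 0  # runtime state, NOT written by save()

    def save(self, fname: str) -> None:
        with open(fname, "w") as f:
            json.dump(asdict(self.cfg), f)

    @staticmethod
    def load(fname: str) -> "ToyOptimizer":
        with open(fname) as f:
            return ToyOptimizer(ToyOptimizerConfig(**json.load(f)))


opt = ToyOptimizer(ToyOptimizerConfig(lr=0.01, epochs=3))
opt.epochs_trained = 3  # pretend training happened

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "optimizer_cfg.json")
    opt.save(path)
    restored = ToyOptimizer.load(path)

print(restored.cfg, restored.epochs_trained)
```

The restored object has the same configuration but starts with fresh runtime state, which is the behavior the NOTE above warns about.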
-
test
(model: torch.nn.Module, clean_data: trojai.modelgen.datasets.CSVTextDataset, triggered_data: trojai.modelgen.datasets.CSVTextDataset, clean_test_triggered_labels_data: trojai.modelgen.datasets.CSVTextDataset, progress_bar_disable: bool = False, torch_dataloader_kwargs: dict = None) → dict[source]¶ Test the trained network.
- Parameters
model – the trained module to run the test data through
clean_data – the clean Dataset
triggered_data – the triggered Dataset; if None, triggered statistics are not computed
clean_test_triggered_labels_data – triggered part of the training dataset but with correct labels; see DataManager.load_data for more information.
progress_bar_disable – if True, disables the progress bar
torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class
- Returns
a dictionary of the statistics on the clean and triggered data (if applicable)
-
train
(net: torch.nn.Module, dataset: trojai.modelgen.datasets.CSVTextDataset, progress_bar_disable: bool = False, torch_dataloader_kwargs: dict = None) -> (torch.nn.Module, typing.Sequence[trojai.modelgen.training_statistics.EpochStatistics], <class 'int'>)[source]¶ Train the network.
- Parameters
net – the model to train
dataset – the dataset to train the network on
progress_bar_disable – if True, disables the progress bar
torch_dataloader_kwargs – additional arguments to pass to PyTorch’s DataLoader class
- Returns
the trained network, a list of EpochStatistics objects, and the number of epochs for which the net was trained
-
train_epoch
(model: torch.nn.Module, train_loader: torchtext.data.iterator.Iterator, val_loader: torchtext.data.iterator.Iterator, epoch_num: int, progress_bar_disable: bool = False)[source]¶ Runs one epoch of training on the specified model
- Parameters
model – the model to train for one epoch
train_loader – a DataLoader object pointing to the training dataset
val_loader – a DataLoader object pointing to the validation dataset
epoch_num – the epoch number that is being trained
progress_bar_disable – if True, disables the progress bar
- Returns
a list of statistics for batches where statistics were computed
-
static
train_val_dataset_split
(dataset: torchtext.data.Dataset, split_amt: float, val_data_transform: Callable, val_label_transform: Callable) -> (torchtext.data.Dataset, torchtext.data.Dataset)[source]¶ Splits a torchtext dataset (of type: torchtext.data.Dataset) into train/validation.
NOTE: although this has the same functionality as default_optimizer.train_val_dataset_split, it works with a torchtext.data.Dataset object rather than torch.utils.data.Dataset.
- TODO:
[ ] - specify random seed to torch splitter
- Parameters
dataset – the dataset to be split
split_amt – fraction specifying the validation dataset size relative to the whole; the training dataset receives the remaining 1 - split_amt
val_data_transform – (function: any -> any) how to transform the validation data to fit into the desired model and objective function
val_label_transform – (function: any -> any) how to transform the validation labels
- Returns
a tuple of the train and validation datasets
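The meaning of split_amt, and the seeded split mentioned in the TODO, can be illustrated on a plain Python list. This is only a sketch: split_train_val is a hypothetical name, it works on lists rather than torchtext datasets, and it drops val_label_transform for brevity.

```python
import random
from typing import Callable, List, Sequence, Tuple


def split_train_val(examples: Sequence, split_amt: float,
                    val_data_transform: Callable = lambda x: x,
                    seed: int = 0) -> Tuple[List, List]:
    """Hypothetical helper: split examples into (train, validation).

    split_amt is the fraction of examples placed in the validation set;
    the remaining 1 - split_amt go to the training set.
    """
    if not 0.0 <= split_amt <= 1.0:
        raise ValueError("split_amt must be in [0, 1]")
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split reproducible
    n_val = int(round(len(examples) * split_amt))
    val = [val_data_transform(examples[i]) for i in indices[:n_val]]
    train = [examples[i] for i in indices[n_val:]]
    return train, val


data = list(range(10))
train, val = split_train_val(data, split_amt=0.2, seed=1234)
print(len(train), len(val))
```

Passing the same seed yields the same partition on every call, which is what makes experiments that depend on the validation set repeatable.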
-
trojai.modelgen.training_statistics module¶
-
class
trojai.modelgen.training_statistics.
BatchStatistics
(batch_num: int, batch_train_accuracy: float, batch_train_loss: float)[source]¶ Bases:
object
Represents the statistics collected from training a batch.
NOTE: this is currently unused!
-
class
trojai.modelgen.training_statistics.
EpochStatistics
(epoch_num, training_stats=None, validation_stats=None, batch_training_stats=None)[source]¶ Bases:
object
Contains the statistics computed for an Epoch
-
class
trojai.modelgen.training_statistics.
EpochTrainStatistics
(train_acc: float, train_loss: float)[source]¶ Bases:
object
Defines the training statistics for one epoch of training
-
class
trojai.modelgen.training_statistics.
EpochValidationStatistics
(val_clean_acc, val_clean_loss, val_triggered_acc, val_triggered_loss)[source]¶ Bases:
object
Defines the validation statistics for one epoch of training
-
class
trojai.modelgen.training_statistics.
TrainingRunStatistics
[source]¶ Bases:
object
Contains the statistics computed for an entire training run (a sequence of epochs).
- TODO:
[ ] - have another function which returns detailed statistics per epoch in an easily serialized manner
-
add_epoch
(epoch_stats: Union[trojai.modelgen.training_statistics.EpochStatistics, Sequence[trojai.modelgen.training_statistics.EpochStatistics]])[source]¶
-
autopopulate_final_summary_stats
()[source]¶ Uses the information from the final epoch’s final batch to auto-populate the following statistics:
final_train_acc
final_train_loss
final_val_acc
final_val_loss
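A simplified sketch of how add_epoch and autopopulate_final_summary_stats could fit together is given below. ToyEpochStats and ToyRunStats are hypothetical stand-ins: they flatten the separate train/validation statistics classes into one record and work at epoch rather than batch granularity, so they only approximate the behavior described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Union


@dataclass
class ToyEpochStats:
    """Hypothetical, flattened stand-in for per-epoch statistics."""
    epoch_num: int
    train_acc: float
    train_loss: float
    val_acc: float
    val_loss: float


@dataclass
class ToyRunStats:
    """Hypothetical stand-in for statistics over a whole training run."""
    epochs: List[ToyEpochStats] = field(default_factory=list)
    final_train_acc: Optional[float] = None
    final_train_loss: Optional[float] = None
    final_val_acc: Optional[float] = None
    final_val_loss: Optional[float] = None

    def add_epoch(self, epoch_stats: Union[ToyEpochStats, Sequence[ToyEpochStats]]) -> None:
        # Accept either a single record or a sequence of records
        if isinstance(epoch_stats, ToyEpochStats):
            self.epochs.append(epoch_stats)
        else:
            self.epochs.extend(epoch_stats)

    def autopopulate_final_summary_stats(self) -> None:
        if not self.epochs:
            raise ValueError("no epochs recorded")
        last = self.epochs[-1]  # the final epoch supplies the summary values
        self.final_train_acc = last.train_acc
        self.final_train_loss = last.train_loss
        self.final_val_acc = last.val_acc
        self.final_val_loss = last.val_loss


run = ToyRunStats()
run.add_epoch([ToyEpochStats(0, 0.60, 1.2, 0.55, 1.3),
               ToyEpochStats(1, 0.80, 0.7, 0.75, 0.8)])
run.add_epoch(ToyEpochStats(2, 0.90, 0.4, 0.85, 0.5))
run.autopopulate_final_summary_stats()
print(run.final_train_acc, run.final_val_loss)
```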
-
-
trojai.modelgen.training_statistics.
logger
= <Logger trojai.modelgen.training_statistics (WARNING)>¶ Contains classes necessary for collecting statistics on the model during training
trojai.modelgen.uge_model_generator module¶
-
trojai.modelgen.uge_model_generator.
ALL_EXEC_PERMISSIONS
= 365¶ This file contains all the functionality needed to train models for a Univa Grid Engine (UGE) HPC cluster.
-
class
trojai.modelgen.uge_model_generator.
UGEModelGenerator
(configs: Union[trojai.modelgen.config.ModelGeneratorConfig, Sequence[trojai.modelgen.config.ModelGeneratorConfig]], uge_config: trojai.modelgen.config.UGEConfig, working_directory: str = '/home/docs/uge_model_generator', validate_uge_dirs: bool = True)[source]¶ Bases:
trojai.modelgen.model_generator_interface.ModelGeneratorInterface
Class which generates models utilizing a Univa Grid Engine
-
expand_modelgen_configs_to_process
() → Sequence[trojai.modelgen.config.ModelGeneratorConfig][source]¶ Converts a sequence of ModelGeneratorConfig objects into another sequence of ModelGeneratorConfig objects such that each element in the sequence only creates one model. For example:
Input: cfgs = [cfg1->num_models=1, cfg2->num_models=2]. len(cfgs)=2
Output: cfgs = [cfg1->num_models=1, cfg2->num_models=1, cfg2->num_models=1]. len(cfgs)=3
- NOTE: This will lead to multiple configs pointing to the same data on disk. I’m not sure if this is a problem for PyTorch or not, but this is something to investigate if unexpected results arise.
- Returns
the expanded sequence of configurations
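The expansion described above can be reproduced with a toy configuration class. ToyModelGenConfig and expand_configs are hypothetical names; the sketch uses a shallow copy, which is an assumption that happens to mirror the note about several configs pointing at the same data.

```python
import copy
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ToyModelGenConfig:
    """Hypothetical stand-in for a model generation configuration."""
    name: str
    num_models: int = 1


def expand_configs(configs: Sequence[ToyModelGenConfig]) -> List[ToyModelGenConfig]:
    """Return a list in which every element creates exactly one model."""
    expanded = []
    for cfg in configs:
        for _ in range(cfg.num_models):
            single = copy.copy(cfg)  # shallow copy: referenced data is shared, not duplicated
            single.num_models = 1
            expanded.append(single)
    return expanded


cfgs = [ToyModelGenConfig("cfg1", 1), ToyModelGenConfig("cfg2", 2)]
expanded = expand_configs(cfgs)
print([(c.name, c.num_models) for c in expanded])
# prints [('cfg1', 1), ('cfg2', 1), ('cfg2', 1)]
```

Expanding to one model per config makes each element an independent unit of work, which is convenient when every unit becomes a separate cluster job.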
-
get_queue_numjobs_assignment
() → Sequence[source]¶ Determine the number of jobs to give to each queue based on UGEConfig.
- Returns
a list of tuples, with each tuple containing the queue at index 0 and the number of jobs assigned to that queue at index 1
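The shape of the returned value can be illustrated as follows. The assignment policy trojai derives from UGEConfig is not described here, so the even split below is purely an assumption, and the function name and queue names are hypothetical.

```python
from typing import List, Sequence, Tuple


def assign_jobs_evenly(queue_names: Sequence[str], num_jobs: int) -> List[Tuple[str, int]]:
    """Hypothetical policy: spread num_jobs over the queues as evenly as possible.

    Returns a list of (queue, number_of_jobs) tuples.
    """
    if not queue_names:
        raise ValueError("at least one queue is required")
    base, extra = divmod(num_jobs, len(queue_names))
    # The first `extra` queues each take one additional job
    return [(name, base + (1 if i < extra else 0))
            for i, name in enumerate(queue_names)]


assignment = assign_jobs_evenly(["queue_a", "queue_b"], 5)
print(assignment)
# prints [('queue_a', 3), ('queue_b', 2)]
```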
-
trojai.modelgen.utils module¶
-
trojai.modelgen.utils.
clamp
(X, l, u, cuda=True)[source]¶ Clamps a tensor to lower bound l and upper bound u.
- Parameters
X – the tensor to clamp.
l – lower bound for the clamp.
u – upper bound for the clamp.
cuda – whether the tensor should be on the gpu.
-
trojai.modelgen.utils.
get_uniform_delta
(shape, eps, requires_grad=True)[source]¶ Generates a torch uniform random matrix of the given shape with values within +-eps.
- Parameters
shape – the tensor shape to create.
eps – the epsilon bounds 0+-eps for the uniform random tensor.
requires_grad – whether the tensor requires a gradient.
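The two utilities above are typically combined when perturbing an input and keeping the result in a valid range. The sketch below re-creates that idea with plain PyTorch calls; uniform_delta and clamp_between are hypothetical re-implementations written for illustration, not the library's code, and they run on the CPU only.

```python
import torch


def uniform_delta(shape, eps: float, requires_grad: bool = True) -> torch.Tensor:
    """Hypothetical helper: tensor of the given shape, uniform in [-eps, eps)."""
    delta = (2 * torch.rand(shape) - 1) * eps  # rand is in [0, 1), rescale to [-eps, eps)
    delta.requires_grad_(requires_grad)
    return delta


def clamp_between(x: torch.Tensor, lower: torch.Tensor, upper: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: element-wise clamp of x to [lower, upper]."""
    return torch.max(torch.min(x, upper), lower)


torch.manual_seed(0)
eps = 0.1
x = torch.rand(2, 3)                 # stand-in for an input with values in [0, 1)
delta = uniform_delta(x.shape, eps)  # random perturbation within +-eps
x_perturbed = clamp_between(x + delta, torch.zeros_like(x), torch.ones_like(x))
print(x_perturbed.shape)
```

Using tensor-valued bounds allows a different limit per element; when the bounds are scalars, torch.clamp achieves the same result more directly.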
-
trojai.modelgen.utils.
make_trojai_model_dict
(model)[source]¶ Create a TrojAI approved dictionary specification of a PyTorch model for saving to a file. E.g. for a trained model ‘model’:
save_dict = make_trojai_model_dict(model)
torch.save(save_dict, filename)
- Parameters
model – (torch.nn.Module) The desired model to be saved.
- Returns
(dict) dictionary containing TrojAI approved information about the model, which can also be used for later loading the model.
-
trojai.modelgen.utils.
resave_trojai_model_as_dict
(file, new_loc=None)[source]¶ Load a fully serialized PyTorch model (i.e. the whole model was saved instead of a specification) and save it as a TrojAI style dictionary specification.
- Parameters
file – (str) Location of the file to re-save
new_loc – (str) Where to save the file if replacing the original is not desired