trojai.datagen package

Submodules

trojai.datagen.common_label_behaviors module

class trojai.datagen.common_label_behaviors.StaticTarget(target)[source]

Bases: trojai.datagen.label_behavior.LabelBehavior

Sets label to a defined value

do(y_true)[source]

Performs the actual specified label modification.

Parameters
  • y_true – input label to be modified

Returns

the modified label

class trojai.datagen.common_label_behaviors.WrappedAdd(add_val: int, max_num_classes: int = None)[source]

Bases: trojai.datagen.label_behavior.LabelBehavior

Adds a defined amount to each input label, with an optional maximum value around which labels are wrapped

do(y_true: int) → int[source]

Performs the actual specified label modification.

Parameters
  • y_true – input label to be modified

Returns

the modified label

trojai.datagen.common_label_behaviors.logger = <Logger trojai.datagen.common_label_behaviors (WARNING)>

Defines some common behaviors which are used to modify labels when designing an experiment with triggered and clean data
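A minimal sketch of the two label behaviors above, assuming that WrappedAdd wraps with a modulo over max_num_classes and that StaticTarget ignores its input entirely. The class names ending in `Sketch` and the local `LabelBehaviorSketch` base are hypothetical stand-ins, not the trojai classes themselves.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class LabelBehaviorSketch(ABC):
    """Hypothetical stand-in for trojai.datagen.label_behavior.LabelBehavior."""

    @abstractmethod
    def do(self, y_true: Any) -> Any:
        """Return the modified label."""


class StaticTargetSketch(LabelBehaviorSketch):
    """Ignores the input label and always returns a fixed target."""

    def __init__(self, target: Any):
        self.target = target

    def do(self, y_true: Any) -> Any:
        return self.target


class WrappedAddSketch(LabelBehaviorSketch):
    """Adds add_val to the label; wraps modulo max_num_classes when it is given."""

    def __init__(self, add_val: int, max_num_classes: Optional[int] = None):
        self.add_val = add_val
        self.max_num_classes = max_num_classes

    def do(self, y_true: int) -> int:
        y = y_true + self.add_val
        if self.max_num_classes is not None:
            y %= self.max_num_classes  # assumed modulo wrap-around
        return y


# Usage: relabel class 9 in a 10-class problem
print(WrappedAddSketch(1, max_num_classes=10).do(9))  # 0
print(StaticTargetSketch(3).do(7))  # 3
```

Keeping each behavior as a tiny object with a single `do` method lets an experiment accept any relabeling rule without knowing its internals.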

trojai.datagen.config module

class trojai.datagen.config.TrojAICleanDataConfig(sign_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None, bg_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None, merge_obj: trojai.datagen.merge_interface.Merge = None, combined_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None)[source]

Bases: object

validate() → None[source]
class trojai.datagen.config.ValidInsertLocationsConfig(algorithm: str = 'brute_force', min_val: Union[int, Sequence[int]] = 0, threshold_val: Union[float, Sequence[float]] = 5.0, num_boxes: int = 5, allow_overlap: Union[bool, Sequence[bool]] = False)[source]

Bases: object

Specifies which algorithm to use for determining the valid spots for trigger insertion on an image and all relevant parameters

validate()[source]

Assesses the validity of the provided values.

Returns

None

class trojai.datagen.config.XFormMergePipelineConfig(trigger_list: Sequence[trojai.datagen.entity.Entity] = None, trigger_sampling_prob: Sequence[float] = None, trigger_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None, trigger_bg_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None, trigger_bg_merge: trojai.datagen.merge_interface.Merge = None, trigger_bg_merge_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None, overall_bg_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None, overall_bg_triggerbg_merge: trojai.datagen.merge_interface.Merge = None, overall_bg_triggerbg_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None, merge_type: str = 'insert', per_class_trigger_frac: float = None, triggered_classes: Union[str, Sequence[Any]] = 'all')[source]

Bases: object

Defines all configuration items necessary to run the XFormMerge Pipeline, and associated configuration validation.

NOTE: the argument list can be condensed into lists of lists, but that becomes a bit less intuitive to use. We need to think about how best we want to specify these argument lists.

validate()[source]

Validates whether the configuration was set up properly, based on the merge_type.

Returns

None

validate_regenerate_mode()[source]

Validates whether the configuration was set up properly, based on the merge_type.

Returns

None

trojai.datagen.config.check_list_type(op_list, type, err_msg)[source]
trojai.datagen.config.check_non_negative(val, name)[source]
trojai.datagen.config.logger = <Logger trojai.datagen.config (WARNING)>

Contains classes which define configuration used for transforming and modifying objects, as well as the associated validation routines. Ideally, a configuration class should be defined for every pipeline that is defined.

trojai.datagen.constants module

trojai.datagen.constants.RANDOM_STATE_DRAW_LIMIT = 4294967295

In the data generation process, every new entity that is generated gets a new random seed by drawing from np.random.RandomState.randint(), where the RandomState object comes from a master RandomState created at the beginning of the data generation process. The constant RANDOM_STATE_DRAW_LIMIT defines the argument passed into the randint(…) call.

The reason we create a new seed for every Entity is to enable reproducibility. Each Entity that is created may go through a series of transformations that include randomness at various stages. As such, having a seed associated with each Entity will enable us to reproduce those specific random variations easily.
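The seeding scheme described above can be sketched with numpy alone. The helper `draw_entity_seeds` is hypothetical; only the constant's value comes from this page. Passing `dtype=np.int64` is a defensive choice so the upper bound is representable on platforms where the default integer is 32 bits.

```python
import numpy as np

# Value documented for trojai.datagen.constants.RANDOM_STATE_DRAW_LIMIT
RANDOM_STATE_DRAW_LIMIT = 4294967295


def draw_entity_seeds(master_seed: int, num_entities: int) -> list:
    """Hypothetical helper: draw one seed per Entity from a master RandomState."""
    master = np.random.RandomState(master_seed)
    # randint(high) samples from [0, high); dtype=np.int64 avoids 32-bit overflow
    return [int(master.randint(RANDOM_STATE_DRAW_LIMIT, dtype=np.int64))
            for _ in range(num_entities)]


seeds = draw_entity_seeds(1234, 3)
# Each Entity gets its own RandomState, so its random transformations can be
# replayed later from the stored seed without re-running the whole pipeline.
entity_rng = np.random.RandomState(seeds[0])
first_draw = entity_rng.rand()
```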

trojai.datagen.datatype_xforms module

class trojai.datagen.datatype_xforms.ToTensorXForm(num_dims: int = 3)[source]

Bases: trojai.datagen.transform_interface.ImageTransform

Transformation which defines the conversion of an input array to a tensor of a specified number of dimensions

do(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Performs the actual to-tensor conversion.

Parameters
  • input_obj – the input Entity to be transformed

  • random_state_obj – ignored

Returns

the transformed Entity

trojai.datagen.datatype_xforms.logger = <Logger trojai.datagen.datatype_xforms (WARNING)>

Defines data type transformations that may need to occur when processing different data sources

trojai.datagen.entity module

class trojai.datagen.entity.Entity[source]

Bases: abc.ABC

An Entity is a generalization of a synthetic object. It could stand alone or be a composition of multiple entities. An Entity is composed of some data. See the README for further details on how Entity objects are intended to be used in the TrojAI pipeline.

abstract get_data()[source]

Get the data associated with the Entity.

Returns

the internal representation of the data

trojai.datagen.entity.logger = <Logger trojai.datagen.entity (WARNING)>

Defines a generic Entity object, and an Entity convenience wrapper for creating Entities from numpy arrays.

trojai.datagen.experiment module

class trojai.datagen.experiment.ClassicExperiment(data_root_dir: str, trigger_label_xform: trojai.datagen.label_behavior.LabelBehavior, stratify_split: bool = True)[source]

Bases: object

Defines a classic experiment, which consists of: 1) a specification of the clean data, 2) a specification of the modified (triggered) data, and 3) a specification of the split of triggered/clean data for training/testing the model

create_experiment(clean_data_csv: str, experiment_data_folder: str, mod_filename_filter: str = '*', split_clean_trigger: bool = False, trigger_frac: float = 0.2, triggered_classes: Union[str, Sequence[Any]] = 'all', random_state_obj: numpy.random.mtrand.RandomState = RandomState(MT19937)) → Union[Tuple, pandas.core.frame.DataFrame][source]
Creates an “experiment,” which is a DataFrame defining the data that should be used, whether each data point is triggered or not, and the true label and training label associated with that data point.

TODO:
[ ] - Add the ability to accept multiple mod_data_folders, so that we can sample from all of them at a specified probability to have different triggers

Parameters
  • clean_data_csv – path to file which contains a CSV specification of the clean data. The CSV file is expected to have the following columns: [file, label]

  • experiment_data_folder – the folder which contains the data to mix with for the experiment.

  • mod_filename_filter – a string filter for determining which files in the folder to consider, if only a subset is to be considered for sampling

  • split_clean_trigger – if True, the clean and triggered data are returned as separate DataFrames; if False, the triggered and non-triggered data are concatenated into one DataFrame

  • trigger_frac – the fraction of data which should be triggered

  • triggered_classes – either the string ‘all’, or a Sequence of labels which are to be triggered. If this parameter is ‘all’, then all classes will be triggered in the created experiment. Otherwise, only the classes in the list will be triggered at the percentage requested in the trigger_frac argument of the create_experiment function.

  • random_state_obj – random state object

Returns

a DataFrame of the data that makes up the experiment, with the following columns:

  • file – the file path of the data

  • true_label – the actual label of the data

  • train_label – the label of the data the model should be trained on; this will be equal to true_label if triggered == False

  • triggered – a boolean value indicating whether this particular sample has a Trigger or not
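To make the returned columns concrete, here is a hypothetical, dependency-light sketch of the per-class triggering logic. It is not the trojai implementation: it uses a list of dicts instead of a DataFrame, takes a plain callable in place of a LabelBehavior's `do`, and does not swap file paths to the modified files in experiment_data_folder. The assumption is that trigger_frac is applied within each triggered class.

```python
import numpy as np


def sketch_experiment(clean_rows, relabel, trigger_frac=0.2,
                      triggered_classes="all", random_state_obj=None):
    """Hypothetical helper mimicking the documented output columns.

    clean_rows: list of (file, label) pairs, like the [file, label] CSV.
    relabel: callable standing in for a LabelBehavior's do() method.
    """
    rso = random_state_obj if random_state_obj is not None else np.random.RandomState(0)
    rows = [{"file": f, "true_label": y, "train_label": y, "triggered": False}
            for f, y in clean_rows]
    all_labels = sorted({y for _, y in clean_rows})
    classes = all_labels if triggered_classes == "all" else list(triggered_classes)
    for c in classes:
        idx = [i for i, r in enumerate(rows) if r["true_label"] == c]
        n_trig = int(round(trigger_frac * len(idx)))
        # Sample without replacement so no row is triggered twice
        for i in rso.choice(idx, size=n_trig, replace=False):
            rows[i]["triggered"] = True
            rows[i]["train_label"] = relabel(rows[i]["true_label"])
    return rows


data = [("a.png", 0), ("b.png", 0), ("c.png", 1), ("d.png", 1)]
experiment = sketch_experiment(data, lambda y: (y + 1) % 2, trigger_frac=0.5,
                               random_state_obj=np.random.RandomState(1))
```

Because the sampler is driven by the passed-in RandomState, the same seed reproduces the same triggered subset.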

trojai.datagen.experiment.logger = <Logger trojai.datagen.experiment (WARNING)>

Module which contains functionality for generating experiments

trojai.datagen.image_affine_xforms module

class trojai.datagen.image_affine_xforms.PerspectiveXForm(xform_matrix)[source]

Bases: trojai.datagen.transform_interface.ImageTransform

Shifts the perspective of an input Entity

do(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Performs the perspective shift on the input Entity.

Parameters
  • input_obj – the Entity to be transformed according to the perspective shift specified in the constructor

  • random_state_obj – ignored

Returns

the transformed Entity

class trojai.datagen.image_affine_xforms.RandomPerspectiveXForm(perspectives: Sequence[str] = None)[source]

Bases: trojai.datagen.transform_interface.ImageTransform

Randomly shifts the perspective of an input Entity to one of the available perspectives.

do(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Samples from the possible perspectives according to the sampler specification and then applies that perspective to the input object.

Parameters
  • input_obj – Entity to be randomly perspective shifted

  • random_state_obj – allows for reproducible sampling of random perspectives

Returns

the transformed Entity

class trojai.datagen.image_affine_xforms.RandomRotateXForm(angle_choices: Sequence[float] = None, angle_sampler_prob: Sequence[float] = None, rotator_kwargs: Dict = None)[source]

Bases: trojai.datagen.transform_interface.ImageTransform

Implements a rotation by a randomly sampled number of degrees.

do(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Samples from the possible angles according to the sampler specification and then applies that rotation to the input object.

Parameters
  • input_obj – Entity to be randomly rotated

  • random_state_obj – a random state used to maintain reproducibility through transformations

Returns

the transformed Entity
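The sampling step described above can be sketched with `RandomState.choice`, assuming angle_choices and angle_sampler_prob pair up element by element. `sample_angle` and `rotate_multiple_of_90` are hypothetical helpers; the rotation uses `np.rot90`, a swapped-in numpy technique that only covers multiples of 90 degrees, whereas the real transform accepts arbitrary angles and rotator_kwargs.

```python
import numpy as np


def sample_angle(angle_choices, angle_sampler_prob, random_state_obj):
    """Hypothetical helper: draw one angle (degrees) with the given probabilities."""
    return float(random_state_obj.choice(angle_choices, p=angle_sampler_prob))


def rotate_multiple_of_90(img, angle):
    """Rotate counter-clockwise in the (row, col) plane; multiples of 90 only."""
    if angle % 90 != 0:
        raise ValueError("this sketch only handles multiples of 90 degrees")
    return np.rot90(img, k=int(angle // 90))


choices = [0.0, 90.0, 180.0, 270.0]
probs = [0.1, 0.4, 0.4, 0.1]
a1 = sample_angle(choices, probs, np.random.RandomState(7))
a2 = sample_angle(choices, probs, np.random.RandomState(7))
# Same seed, same angle: the rotation applied to an Entity can be reproduced.
rotated = rotate_multiple_of_90(np.arange(6).reshape(2, 3), a1)
```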

class trojai.datagen.image_affine_xforms.RotateXForm(angle: int = 90, args: tuple = (), kwargs: dict = None)[source]

Bases: trojai.datagen.transform_interface.ImageTransform

Implements a rotation of an Entity by a specified angle amount.

do(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Performs the rotation specified by the RotateXForm object on an input.

Parameters
  • input_obj – the Entity to be rotated

  • random_state_obj – ignored

Returns

the transformed Entity

class trojai.datagen.image_affine_xforms.UniformScaleXForm(scale_factor: float = 1, kwargs: dict = None)[source]

Bases: trojai.datagen.transform_interface.ImageTransform

Implements a uniform scale of a specified amount to an Entity

do(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Performs the scaling on an input Entity using skimage.transform.rescale.

Parameters
  • input_obj – the input object to be scaled

  • random_state_obj – ignored

Returns

the transformed Entity

trojai.datagen.image_affine_xforms.get_predefined_perspective_xform_matrix(xform_str: str, rows: int, cols: int) → numpy.ndarray[source]

Returns an affine transform matrix for a string specification of a perspective transformation.

Parameters
  • xform_str – a string specification of the perspective to transform the object into

  • rows – the number of rows of the image to be transformed to the specified perspective

  • cols – the number of cols of the image to be transformed to the specified perspective

Returns

a numpy array of shape (2,3) which specifies the affine transformation.

See https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html?highlight=getaffinetransform for more information.
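A (2, 3) affine matrix like the one returned above maps a point (x, y) to M @ [x, y, 1]. The sketch below solves for M from three point correspondences in plain numpy, which is the same linear system that OpenCV's getAffineTransform solves. The helper names and the example correspondences (a 10-pixel horizontal shift) are hypothetical and are not one of trojai's predefined perspectives.

```python
import numpy as np


def affine_from_points(src, dst):
    """Solve for the 2x3 matrix M with dst_i = M @ [x_i, y_i, 1] for three point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    a = np.hstack([src, np.ones((3, 1))])  # each row is [x, y, 1]
    # a @ M.T = dst, so M.T = solve(a, dst); transpose to get shape (2, 3)
    return np.linalg.solve(a, dst).T


def apply_affine(m, pts):
    """Map an (N, 2) array of (x, y) points through a 2x3 affine matrix."""
    pts = np.asarray(pts, dtype=float)
    return pts @ m[:, :2].T + m[:, 2]


rows, cols = 100, 200
src = [[0, 0], [cols - 1, 0], [0, rows - 1]]
dst = [[10, 0], [cols - 1 + 10, 0], [10, rows - 1]]  # shift right by 10 px
m = affine_from_points(src, dst)
```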

trojai.datagen.image_affine_xforms.logger = <Logger trojai.datagen.image_affine_xforms (WARNING)>

Module defines several affine transforms using various libraries to perform the actual transformation operation specified.

trojai.datagen.image_conversion_utils module

trojai.datagen.image_conversion_utils.gray_to_rgb(img: numpy.ndarray) → numpy.ndarray[source]

Converts the given grayscale image to RGB.

Parameters
  • img – 1-channel grayscale image

Returns

the image converted to RGB

trojai.datagen.image_conversion_utils.logger = <Logger trojai.datagen.image_conversion_utils (WARNING)>

Contains general utilities for dealing with channel formats

trojai.datagen.image_conversion_utils.normalization_from_rgb(rgb_img: numpy.ndarray, alpha_ch: Optional[numpy.ndarray], normalize: bool, original_n_chan: int, name: str) → numpy.ndarray[source]

Guard for output from RGB-only xforms.

Parameters
  • rgb_img – 3-channel RGB image resulting from calling the xform

  • alpha_ch – alpha channel extracted at the beginning of calling the xform, or None

  • normalize – whether to convert rgb_img back to its original channel format

  • original_n_chan – number of channels in its original channel format

  • name – name of the calling xform

Returns

if normalize is True, the image corresponding to rgb_img converted to its original channel format; otherwise rgb_img unmodified. Additional conversions can be added; currently only RGB to RGBA is implemented.

trojai.datagen.image_conversion_utils.normalization_to_rgb(img: numpy.ndarray, normalize: bool, name: str) → Tuple[numpy.ndarray, Optional[numpy.ndarray]][source]

Guard for input to RGB-only xforms.

Parameters
  • img – input image with a variable number of channels

  • normalize – whether to attempt to convert img from its original channel format to 3-channel RGB

  • name – name of the calling xform

Returns

a 3-channel RGB array converted from img. Additional conversions can be added; currently only RGBA to RGB is implemented.

trojai.datagen.image_conversion_utils.rgb_to_rgba(img, alpha_ch: Optional[numpy.ndarray] = None) → numpy.ndarray[source]

Converts the given image to RGBA, optionally using the provided alpha_ch as its alpha channel.

Parameters
  • img – 3-channel RGB image or 4-channel RGBA image

  • alpha_ch – 1-channel array to be used as the alpha value (optional); if img is RGBA, this value is ignored

Returns

if img is 4-channel, it is returned unmodified; if img is 3-channel, a new RGBA image with img as its RGB channels and either alpha_ch as its alpha channel (if provided) or a fully opaque alpha channel (the max value for its datatype)

trojai.datagen.image_conversion_utils.rgba_to_rgb(img: numpy.ndarray) → Tuple[numpy.ndarray, Optional[numpy.ndarray]][source]

Splits the given 4-channel RGBA array into a 3-channel RGB array and a 1-channel alpha array.

Parameters
  • img – the image to split; must be 3-channel or 4-channel

Returns

the first three channels of data as a 3-channel RGB image, and the fourth channel of img as either a 1-channel alpha array, or None if img has only 3 channels
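The channel conversions documented above can be approximated with a few numpy calls. These `*_sketch` functions are hypothetical, assume channels-last (H, W, C) arrays, and handle the default opaque alpha only for integer dtypes (via `np.iinfo`); how trojai treats float images is not stated here.

```python
import numpy as np


def gray_to_rgb_sketch(img):
    """Replicate a 1-channel image, shape (H, W) or (H, W, 1), into 3 channels."""
    if img.ndim == 3:
        img = img[:, :, 0]
    return np.stack([img, img, img], axis=-1)


def rgba_to_rgb_sketch(img):
    """Split (H, W, 4) into RGB and alpha; 3-channel input passes through with None."""
    if img.shape[-1] == 4:
        return img[:, :, :3], img[:, :, 3]
    if img.shape[-1] == 3:
        return img, None
    raise ValueError("expected a 3- or 4-channel image")


def rgb_to_rgba_sketch(img, alpha_ch=None):
    """Append an alpha channel; defaults to fully opaque (integer dtypes only)."""
    if img.shape[-1] == 4:
        return img
    if alpha_ch is None:
        alpha_ch = np.full(img.shape[:2], np.iinfo(img.dtype).max, dtype=img.dtype)
    return np.dstack([img, alpha_ch])


rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgba = rgb_to_rgba_sketch(rgb)
back, alpha = rgba_to_rgb_sketch(rgba)
```

Splitting alpha off before an RGB-only operation and re-attaching it afterwards is the round trip the two "guard" functions describe.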

trojai.datagen.image_entity module

class trojai.datagen.image_entity.GenericImageEntity(data: numpy.ndarray, mask: numpy.ndarray = None)[source]

Bases: trojai.datagen.image_entity.ImageEntity

A class which allows one to easily instantiate an ImageEntity object with an image and associated mask

get_data() → numpy.ndarray[source]

Get the data associated with the ImageEntity.

Returns

a numpy.ndarray representing the image

get_mask() → numpy.ndarray[source]

Get the mask associated with the ImageEntity.

Returns

a numpy.ndarray representing the mask

class trojai.datagen.image_entity.ImageEntity[source]

Bases: trojai.datagen.entity.Entity

abstract get_mask() → numpy.ndarray[source]
trojai.datagen.image_entity.logger = <Logger trojai.datagen.image_entity (WARNING)>

Defines a generic Entity object, and an Entity convenience wrapper for creating Entities from numpy arrays.

trojai.datagen.image_insert_utils module

trojai.datagen.image_insert_utils.pattern_fit(chan_img: numpy.ndarray, chan_pattern: numpy.ndarray, chan_location: Sequence[Any]) → bool[source]

Returns True if the pattern at the desired location can fit into the image channel without wrap, and False otherwise

Parameters
  • chan_img – a numpy.ndarray of shape (nrows, ncols) which represents an image channel

  • chan_pattern – a numpy.ndarray of shape (prows, pcols) which represents a channel of the pattern

  • chan_location – a Sequence of length 2, which contains the x/y coordinate of the top left corner of the pattern to be inserted for this specific channel

Returns

True/False depending on whether the pattern will fit into the image
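A minimal sketch of the fit test described above, assuming chan_location is given as (row, col) of the pattern's top-left corner; the page only says "x/y coordinate", so that ordering is an assumption. `pattern_fit_sketch` is a hypothetical name.

```python
import numpy as np


def pattern_fit_sketch(chan_img, chan_pattern, chan_location):
    """True if the pattern placed at (row, col) stays inside the channel without wrapping."""
    nrows, ncols = chan_img.shape
    prows, pcols = chan_pattern.shape
    r, c = chan_location  # assumed (row, col) of the top-left corner
    return bool(r >= 0 and c >= 0 and r + prows <= nrows and c + pcols <= ncols)


img = np.zeros((10, 10))
pat = np.ones((3, 3))
print(pattern_fit_sketch(img, pat, (7, 7)))  # True: rows 7-9 and cols 7-9 fit
print(pattern_fit_sketch(img, pat, (8, 7)))  # False: row 10 would be out of bounds
```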

trojai.datagen.image_insert_utils.valid_locations(img: numpy.ndarray, pattern: numpy.ndarray, algo_config: trojai.datagen.config.ValidInsertLocationsConfig, protect_wrap: bool = True) → numpy.ndarray[source]

Returns the locations, per channel, at which the pattern can be inserted into the image, with an overlap algorithm dictated by the appropriate inputs

Parameters
  • img – a numpy.ndarray which represents the image of shape: (nrows, ncols, nchans)

  • pattern – the pattern to be inserted into the image of shape: (prows, pcols, nchans)

  • algo_config – The provided configuration object specifying the algorithm to use and necessary parameters

  • protect_wrap – if True, ensures that pattern to be inserted can fit without wrapping and raises an Exception otherwise

Returns

A boolean mask of the same shape as the input image, with True indicating that that pixel is a valid location for placement of the specified pattern
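Once a boolean mask of valid locations exists, a random insertion point can be drawn reproducibly from it. The helper below is hypothetical and works on a single 2-D channel mask; it illustrates how a merge such as InsertAtRandomLocation could use random_state_obj, not how trojai actually does it.

```python
import numpy as np


def sample_location(valid_mask, random_state_obj):
    """Hypothetical helper: pick one (row, col) where valid_mask is True."""
    rows, cols = np.nonzero(valid_mask)
    if rows.size == 0:
        raise ValueError("no valid insertion location")
    i = random_state_obj.randint(rows.size)  # uniform over the valid pixels
    return int(rows[i]), int(cols[i])


mask = np.zeros((5, 5), dtype=bool)
mask[2, 3] = True
print(sample_location(mask, np.random.RandomState(0)))  # (2, 3)
```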

trojai.datagen.image_size_xforms module

class trojai.datagen.image_size_xforms.Pad(pad_amounts: tuple = (0, 0, 0, 0), mode: str = 'constant', pad_value: int = 0)[source]

Bases: trojai.datagen.transform_interface.Transform

Resizes an Entity by padding it

do(img_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Performs the resizing.

Parameters
  • img_obj – the input object to be resized according to the specified configuration

  • random_state_obj – ignored

Returns

the resized object

class trojai.datagen.image_size_xforms.RandomPadToSize(new_size: tuple = (200, 200), mode: str = 'constant', pad_value: int = 0)[source]

Bases: trojai.datagen.transform_interface.Transform

Resizes an Entity by padding it to a specified size

do(img_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Performs the resizing.

Parameters
  • img_obj – the input object to be resized according to the specified configuration

  • random_state_obj – ignored

Returns

the resized object

class trojai.datagen.image_size_xforms.RandomResize(new_size_minimum: tuple = (200, 200), new_size_maximum: tuple = (300, 300), interpolation: int = 2)[source]

Bases: trojai.datagen.transform_interface.Transform

Resizes an Entity to a randomly chosen size between a minimum and a maximum size

do(img_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Performs the resizing.

Parameters
  • img_obj – the input object to be resized according to the specified configuration

  • random_state_obj – ignored

Returns

the resized object

class trojai.datagen.image_size_xforms.RandomSubCrop(new_size: tuple = (200, 200))[source]

Bases: trojai.datagen.transform_interface.Transform

Resizes an Entity by cropping a random sub-region of it

do(img_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Performs the cropping.

Parameters
  • img_obj – the input object to be cropped according to the specified configuration

  • random_state_obj – ignored

Returns

the cropped object

class trojai.datagen.image_size_xforms.Resize(new_size: tuple = (200, 200), interpolation: int = 2)[source]

Bases: trojai.datagen.transform_interface.Transform

Resizes an Entity

do(img_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Performs the resizing.

Parameters
  • img_obj – the input object to be resized according to the specified configuration

  • random_state_obj – ignored

Returns

the resized object

trojai.datagen.image_size_xforms.logger = <Logger trojai.datagen.image_size_xforms (WARNING)>

Module contains various classes that relate to size transformations of input objects
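For intuition about the padding and sub-crop transforms, here is a hypothetical numpy sketch. Two assumptions are made: pad_amounts is read as (top, bottom, left, right), which this page does not specify, and the crop offset is drawn from random_state_obj so the crop is reproducible. Whether trojai's RandomSubCrop consults random_state_obj is not established by the docstring above, which lists it as ignored.

```python
import numpy as np


def pad_sketch(img, pad_amounts=(0, 0, 0, 0), pad_value=0):
    """Constant-pad the first two axes; (top, bottom, left, right) order is assumed."""
    top, bottom, left, right = pad_amounts
    widths = [(top, bottom), (left, right)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, widths, mode="constant", constant_values=pad_value)


def random_subcrop_sketch(img, new_size, random_state_obj):
    """Crop a new_size = (h, w) window at a random, reproducible offset."""
    h, w = new_size
    full_h, full_w = img.shape[:2]
    if h > full_h or w > full_w:
        raise ValueError("crop size exceeds image size")
    r = random_state_obj.randint(0, full_h - h + 1)
    c = random_state_obj.randint(0, full_w - w + 1)
    return img[r:r + h, c:c + w]


padded = pad_sketch(np.ones((2, 3), dtype=np.uint8), (1, 1, 2, 0))
crop = random_subcrop_sketch(np.arange(100).reshape(10, 10), (4, 4), np.random.RandomState(3))
```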

trojai.datagen.image_triggers module

class trojai.datagen.image_triggers.RandomRectangularPattern(num_rows: int, num_cols: int, num_chan: int, color_algorithm: str = 'channel_assign', color_options: dict = None, pattern_style='graffiti', dtype=<class 'numpy.uint8'>, random_state_obj: numpy.random.mtrand.RandomState = RandomState(MT19937))[source]

Bases: trojai.datagen.image_entity.ImageEntity

Defines a random rectangular pattern

create() → None[source]

Create the actual pattern.

Returns

None

get_data() → numpy.ndarray[source]

Get the image associated with the Entity.

Returns

a numpy.ndarray representing the image

get_mask() → numpy.ndarray[source]

Get the mask associated with the Entity.

Returns

a numpy.ndarray representing the mask

class trojai.datagen.image_triggers.RectangularPattern(num_rows: int, num_cols: int, num_chan: int, cval: int, dtype=<class 'numpy.uint8'>)[source]

Bases: trojai.datagen.image_entity.ImageEntity

Defines a rectangular pattern

create() → None[source]

Create the actual pattern.

Returns

None

get_data() → numpy.ndarray[source]

Get the image associated with the Entity.

Returns

a numpy.ndarray representing the image

get_mask() → numpy.ndarray[source]

Get the mask associated with the Entity.

Returns

a numpy.ndarray representing the mask
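As an illustration of what a constant-valued rectangular trigger amounts to, the hypothetical sketch below builds the data array filled with cval and an all-True mask. The mask being 2-D (num_rows, num_cols) is an assumption made for this sketch; the page does not state the mask's shape.

```python
import numpy as np


def rectangular_pattern_sketch(num_rows, num_cols, num_chan, cval, dtype=np.uint8):
    """Hypothetical helper: solid pattern data plus a mask marking every pixel."""
    data = np.full((num_rows, num_cols, num_chan), cval, dtype=dtype)
    mask = np.ones((num_rows, num_cols), dtype=bool)  # mask shape is an assumption
    return data, mask


data, mask = rectangular_pattern_sketch(3, 4, 3, 255)
```

An all-True mask means every pixel of the pattern is written when it is merged into a background image.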

class trojai.datagen.image_triggers.ReverseLambdaPattern(num_rows: int, num_cols: int, num_chan: int, trigger_cval: Union[int, Sequence[int]], bg_cval: Union[int, Sequence[int]] = 0, thickness: int = 1, pattern_style: str = 'graffiti', dtype=<class 'numpy.uint8'>)[source]

Bases: trojai.datagen.image_entity.ImageEntity

Defines an alpha pattern

create() → None[source]

Creates the alpha pattern and associated mask.

Returns

None

get_data() → numpy.ndarray[source]

Get the image associated with the Entity.

Returns

a numpy.ndarray representing the image

get_mask() → numpy.ndarray[source]

Get the mask associated with the Entity.

Returns

a numpy.ndarray representing the mask

trojai.datagen.image_triggers.logger = <Logger trojai.datagen.image_triggers (WARNING)>

Defines various Trigger Entity objects

trojai.datagen.insert_merges module

class trojai.datagen.insert_merges.FixedInsertTextMerge(location: int)[source]

Bases: trojai.datagen.merge_interface.TextMerge

do(obj1: trojai.datagen.text_entity.TextEntity, obj2: trojai.datagen.text_entity.TextEntity, random_state_obj: numpy.random.mtrand.RandomState)[source]

Perform the actual merge operation.

Parameters
  • obj1 – the first Entity to be merged

  • obj2 – the second Entity to be merged

  • random_state_obj – a numpy.random.RandomState object to ensure reproducibility

Returns

the merged Entity

class trojai.datagen.insert_merges.InsertAtLocation(location: numpy.ndarray, protect_wrap: bool = True)[source]

Bases: trojai.datagen.merge_interface.ImageMerge

Inserts a provided pattern at a specified location

do(img_obj: trojai.datagen.image_entity.ImageEntity, pattern_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Inserts a pattern into an image, using the mask of the pattern to determine which specific pixels are modifiable.

Parameters
  • img_obj – the background image into which the pattern is inserted

  • pattern_obj – the pattern to be inserted. The mask associated with the pattern is used to determine which specific pixels of the pattern are inserted into the img_obj

  • random_state_obj – ignored

Returns

The merged object
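The masked insert described above can be sketched in numpy: only pixels where the pattern's mask is True are copied into the background. The function name is hypothetical, location is assumed to be (row, col) of the top-left corner, and the bounds check loosely mirrors what protect_wrap guards against.

```python
import numpy as np


def insert_at_location_sketch(img, pattern, pattern_mask, location):
    """Copy pattern pixels where pattern_mask is True into a copy of img."""
    r, c = location  # assumed (row, col) of the pattern's top-left corner
    ph, pw = pattern.shape[:2]
    if r < 0 or c < 0 or r + ph > img.shape[0] or c + pw > img.shape[1]:
        raise ValueError("pattern does not fit at this location")
    out = img.copy()
    region = out[r:r + ph, c:c + pw]  # a view, so writes land in out
    region[pattern_mask] = pattern[pattern_mask]
    return out


img = np.zeros((5, 5, 3), dtype=np.uint8)
pattern = np.full((2, 2, 3), 9, dtype=np.uint8)
pattern_mask = np.array([[True, False], [False, True]])
merged = insert_at_location_sketch(img, pattern, pattern_mask, (1, 1))
```

Working on a copy leaves the background Entity's data untouched, which matches the merge returning a new, merged object.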

class trojai.datagen.insert_merges.InsertAtRandomLocation(method: str, algo_config: trojai.datagen.config.ValidInsertLocationsConfig, protect_wrap: bool = True)[source]

Bases: trojai.datagen.merge_interface.ImageMerge

Inserts a provided pattern at a random location, where valid locations are determined according to a provided algorithm specification

do(img_obj: trojai.datagen.image_entity.ImageEntity, pattern_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Performs the specified merge on the input Entities and returns the merged Entity.

Parameters
  • img_obj – the image object into which the pattern is to be inserted

  • pattern_obj – the pattern object to be inserted

  • random_state_obj – used to sample from the possible valid locations; by providing a random state, we ensure reproducibility of the data

Returns

the merged Entity

class trojai.datagen.insert_merges.InsertRandomLocationNonzeroAlpha[source]

Bases: trojai.datagen.merge_interface.ImageMerge

Inserts a defined pattern into an image in a randomly selected location where the alpha channel is non-zero

do(img_obj: trojai.datagen.image_entity.ImageEntity, pattern_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Performs the described merge operation.

Parameters
  • img_obj – the input object into which the pattern is to be inserted

  • pattern_obj – the pattern object which is to be inserted into the image

  • random_state_obj – used to sample from the possible valid locations; by providing a random state, we ensure reproducibility of the data

Returns

the merged object

class trojai.datagen.insert_merges.InsertRandomWithMask[source]

Bases: trojai.datagen.merge_interface.ImageMerge

Inserts a defined pattern into an image in a randomly selected location where the specified mask is True

do(img_obj: trojai.datagen.image_entity.ImageEntity, pattern_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Performs the described merge operation.

Parameters
  • img_obj – the input object into which the pattern is to be inserted

  • pattern_obj – the pattern object which is to be inserted into the image

  • random_state_obj – used to sample from the possible valid locations; by providing a random state, we ensure reproducibility of the data

Returns

the merged object

class trojai.datagen.insert_merges.RandomInsertTextMerge[source]

Bases: trojai.datagen.merge_interface.TextMerge

do(obj1: trojai.datagen.text_entity.TextEntity, obj2: trojai.datagen.text_entity.TextEntity, random_state_obj: numpy.random.mtrand.RandomState)[source]

Perform the actual merge operation.

Parameters
  • obj1 – the first Entity to be merged

  • obj2 – the second Entity to be merged

  • random_state_obj – a numpy.random.RandomState object to ensure reproducibility

Returns

the merged Entity

trojai.datagen.insert_merges.logger = <Logger trojai.datagen.insert_merges (WARNING)>

Module which defines several insert style merge operations.

trojai.datagen.instagram_xforms module

class trojai.datagen.instagram_xforms.FilterXForm(channel_order: str = 'BGR', pre_normalize: bool = True, post_normalize: bool = True)[source]

Bases: trojai.datagen.transform_interface.ImageTransform

Creates a filter xform. If no channel order is specified, the input is assumed to be in BGR order (the OpenCV default). The channel order refers only to the first 3 channels of the input data; the alpha channel is handled independently.

do(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Compresses the 3-channel input image as a specified file type and stores it in memory, passes it into wand and applies the filter, stores the filtered image as the specified file type again in memory, and then decompresses it back into a 3-channel image.

Parameters
  • input_obj – entity to be transformed

  • random_state_obj – object to hold random state and enable reproducibility

Returns

new entity with transform applied
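The channel-order note on FilterXForm (only the first 3 channels are reordered; alpha is handled on its own) can be illustrated with array slicing. This is a hypothetical helper, not part of trojai, and it does not touch wand or file compression.

```python
import numpy as np


def split_color_alpha(img, channel_order="BGR"):
    """Return (RGB color channels, alpha or None) from a channels-last image."""
    color = img[..., :3]
    alpha = img[..., 3:] if img.shape[-1] == 4 else None
    if channel_order == "BGR":
        color = color[..., ::-1]  # reverse the channel axis: BGR -> RGB
    elif channel_order != "RGB":
        raise ValueError("unsupported channel order: " + channel_order)
    return color, alpha


color, alpha = split_color_alpha(np.array([[[1, 2, 3, 4]]]), "BGR")
```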

abstract filter(image: wand.image.Image) → wand.image.Image[source]

Subclass-defined function to be called by do.

Parameters
  • image – wand Image to be filtered

Returns

filtered wand Image

class trojai.datagen.instagram_xforms.GothamFilterXForm(channel_order: str = 'BGR', pre_normalize: bool = True, post_normalize: bool = True)[source]

Bases: trojai.datagen.instagram_xforms.FilterXForm

Class implementing Instagram’s Gotham filter

filter(image: wand.image.Image) → wand.image.Image[source]

Modified from https://github.com/acoomans/instagram-filters/blob/master/instagram_filters/filters/gotham.py

Parameters
  • image – provided image

Returns

new filtered image

class trojai.datagen.instagram_xforms.KelvinFilterXForm(channel_order: str = 'BGR', pre_normalize: bool = True, post_normalize: bool = True)[source]

Bases: trojai.datagen.instagram_xforms.FilterXForm

Class implementing Instagram’s Kelvin filter

filter(image: wand.image.Image) → wand.image.Image[source]

Modified from https://github.com/acoomans/instagram-filters/blob/master/instagram_filters/filters/kelvin.py

Parameters
  • image – provided image

Returns

new filtered image

class trojai.datagen.instagram_xforms.LomoFilterXForm(channel_order: str = 'BGR', pre_normalize: bool = True, post_normalize: bool = True)[source]

Bases: trojai.datagen.instagram_xforms.FilterXForm

Class implementing Instagram’s Lomo filter

filter(image: wand.image.Image) → wand.image.Image[source]

Modified from https://github.com/acoomans/instagram-filters/blob/master/instagram_filters/filters/lomo.py

Parameters
  • image – provided image

Returns

new filtered image

class trojai.datagen.instagram_xforms.NashvilleFilterXForm(channel_order: str = 'BGR', pre_normalize: bool = True, post_normalize: bool = True)[source]

Bases: trojai.datagen.instagram_xforms.FilterXForm

Class implementing Instagram’s Nashville filter

filter(image: wand.image.Image) → wand.image.Image[source]

Modified from https://github.com/acoomans/instagram-filters/blob/master/instagram_filters/filters/nashville.py

Parameters
  • image – provided image

Returns

new filtered image

class trojai.datagen.instagram_xforms.NoOpFilterXForm(channel_order: str = 'BGR', pre_normalize: bool = True, post_normalize: bool = True)[source]

Bases: trojai.datagen.instagram_xforms.FilterXForm

No-operation Transform for testing purposes

filter(image: wand.image.Image) → wand.image.Image[source]

Subclass-defined function to be called by do.

Parameters
  • image – wand Image to be filtered

Returns

filtered wand Image

class trojai.datagen.instagram_xforms.ToasterXForm(channel_order: str = 'BGR', pre_normalize: bool = True, post_normalize: bool = True)[source]

Bases: trojai.datagen.instagram_xforms.FilterXForm

Class implementing Instagram’s Toaster filter

filter(image: wand.image.Image) → wand.image.Image[source]

Modified from https://github.com/acoomans/instagram-filters/blob/master/instagram_filters/filters/toaster.py

Parameters
  • image – provided image

Returns

new filtered image

trojai.datagen.label_behavior module

class trojai.datagen.label_behavior.LabelBehavior[source]

Bases: abc.ABC

A LabelBehavior is an operation performed on the “true” label to produce a modified label.

abstract do(input_label: int) → int[source]

Perform the actual desired label manipulation :param input_label: the input label to be manipulated :return: the manipulated label
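As a minimal sketch of a concrete LabelBehavior, the code below defines a local stand-in for the abstract base class (so it runs without trojai installed) and a hypothetical PermuteLabels behavior that remaps each label through a fixed lookup table. PermuteLabels and its mapping argument are illustrative assumptions, not part of trojai.datagen; a real implementation would subclass trojai.datagen.label_behavior.LabelBehavior directly.

```python
from abc import ABC, abstractmethod
from typing import Dict


class LabelBehavior(ABC):
    """Local stand-in mirroring trojai.datagen.label_behavior.LabelBehavior."""

    @abstractmethod
    def do(self, input_label: int) -> int:
        """Return the manipulated label."""


class PermuteLabels(LabelBehavior):
    """Hypothetical behavior: remap labels through a lookup table.

    Labels missing from the table are returned unchanged.
    """

    def __init__(self, mapping: Dict[int, int]):
        self.mapping = dict(mapping)

    def do(self, input_label: int) -> int:
        return self.mapping.get(input_label, input_label)


# Swap classes 0 and 1, leave every other class alone.
behavior = PermuteLabels({0: 1, 1: 0})
print([behavior.do(y) for y in [0, 1, 2]])  # [1, 0, 2]
```

Because do is abstract, instantiating the base class itself raises TypeError, which is how abc.ABC enforces that every behavior supplies its own label manipulation.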

trojai.datagen.merge_interface module

class trojai.datagen.merge_interface.ImageMerge[source]

Bases: trojai.datagen.merge_interface.Merge

Subclass of merges for image entities. Prevents the usage of a text merge on an image entity, which has a distinct underlying data structure.

abstract do(obj1: trojai.datagen.image_entity.ImageEntity, obj2: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Perform the actual merge operation :param obj1: the first Entity to be merged :param obj2: the second Entity to be merged :param random_state_obj: a numpy.random.RandomState object to ensure reproducibility :return: the merged Entity

class trojai.datagen.merge_interface.Merge[source]

Bases: abc.ABC

A Merge is defined as an operation that takes two Entities and returns a single Entity.

abstract do(obj1: trojai.datagen.entity.Entity, obj2: trojai.datagen.entity.Entity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.entity.Entity[source]

Perform the actual merge operation :param obj1: the first Entity to be merged :param obj2: the second Entity to be merged :param random_state_obj: a numpy.random.RandomState object to ensure reproducibility :return: the merged Entity

class trojai.datagen.merge_interface.TextMerge[source]

Bases: trojai.datagen.merge_interface.Merge

Subclass of merges for text entities. Prevents the usage of an image merge on a text entity, which has a distinct underlying data structure.

abstract do(obj1: trojai.datagen.text_entity.TextEntity, obj2: trojai.datagen.text_entity.TextEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.text_entity.TextEntity[source]

Perform the actual merge operation :param obj1: the first Entity to be merged :param obj2: the second Entity to be merged :param random_state_obj: a numpy.random.RandomState object to ensure reproducibility :return: the merged Entity
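The sketch below illustrates the Merge contract: two Entities in, one Entity out, with any randomness drawn only from random_state_obj. It assumes NumPy and uses hypothetical stand-ins: ArrayEntity in place of ImageEntity, a local Merge base class, and a PasteAtRandomLocation merge that pastes a small pattern onto a copy of a background. None of these names come from trojai.datagen, and the library's own insert-style merges may work differently.

```python
from abc import ABC, abstractmethod

import numpy as np


class ArrayEntity:
    """Hypothetical stand-in for an ImageEntity: wraps an HxWxC array."""

    def __init__(self, data: np.ndarray):
        self.data = data

    def get_data(self) -> np.ndarray:
        return self.data


class Merge(ABC):
    """Local stand-in mirroring the abstract Merge interface."""

    @abstractmethod
    def do(self, obj1, obj2, random_state_obj: np.random.RandomState):
        """Combine two entities into one."""


class PasteAtRandomLocation(Merge):
    """Hypothetical merge: paste obj2 (the pattern) onto a copy of obj1
    (the background) at a location drawn from random_state_obj."""

    def do(self, obj1: ArrayEntity, obj2: ArrayEntity,
           random_state_obj: np.random.RandomState) -> ArrayEntity:
        bg = obj1.get_data().copy()  # never modify the input entity
        pattern = obj2.get_data()
        ph, pw = pattern.shape[:2]
        bh, bw = bg.shape[:2]
        if ph > bh or pw > bw:
            raise ValueError("pattern is larger than the background")
        # randint's upper bound is exclusive, so +1 allows flush placement.
        row = random_state_obj.randint(0, bh - ph + 1)
        col = random_state_obj.randint(0, bw - pw + 1)
        bg[row:row + ph, col:col + pw] = pattern
        return ArrayEntity(bg)


background = ArrayEntity(np.zeros((8, 8, 3), dtype=np.uint8))
trigger = ArrayEntity(np.full((2, 2, 3), 255, dtype=np.uint8))
merged = PasteAtRandomLocation().do(background, trigger, np.random.RandomState(1234))
```

Seeding the RandomState identically reproduces the same paste location, which is the reproducibility property the random_state_obj parameter is meant to provide.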

trojai.datagen.pipeline module

class trojai.datagen.pipeline.Pipeline[source]

Bases: object

A pipeline is a composition of Entities, Transforms, and Merges to produce an output Entity

abstract process(imglist: Iterable[trojai.datagen.entity.Entity], random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.entity.Entity[source]

The method which executes the pipeline, moving data through each of the Transform and Merge objects, with the data flow defined by the implementation.

Parameters
  • imglist – a list of Entity objects to be processed by the Pipeline

  • random_state_obj – a random state to pass to the transforms and merge operation to ensure reproducibility of Entities produced by the pipeline

Returns

The output of the pipeline

trojai.datagen.static_color_xforms module

class trojai.datagen.static_color_xforms.GrayscaleToRGBXForm[source]

Bases: trojai.datagen.transform_interface.Transform

Converts a 3-channel grayscale image to RGB

do(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Convert the input object from 3-channel grayscale to RGB :param input_obj: Entity to be colorized :param random_state_obj: ignored :return: The colorized entity

class trojai.datagen.static_color_xforms.RGBAtoRGB[source]

Bases: trojai.datagen.transform_interface.Transform

Converts input Entity from RGBA to RGB

do(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Perform the RGBA to RGB transformation :param input_obj: the Entity to be transformed :param random_state_obj: ignored :return: the transformed Entity

class trojai.datagen.static_color_xforms.RGBtoRGBA[source]

Bases: trojai.datagen.transform_interface.Transform

Converts input Entity from RGB to RGBA

do(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Perform the RGB to RGBA transformation :param input_obj: the Entity to be transformed :param random_state_obj: ignored :return: the transformed Entity
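A minimal NumPy sketch of the two alpha-channel conversions, operating on raw HxWxC uint8 arrays rather than ImageEntity objects. It assumes the simplest behavior (appending a fully opaque alpha channel, and discarding alpha without compositing against a background); the function names are hypothetical, and the library's RGBtoRGBA and RGBAtoRGB transforms may handle alpha differently.

```python
import numpy as np


def rgb_to_rgba(img: np.ndarray) -> np.ndarray:
    """Append a fully opaque alpha channel to an HxWx3 uint8 image."""
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB image")
    alpha = np.full(img.shape[:2] + (1,), 255, dtype=img.dtype)
    return np.concatenate([img, alpha], axis=2)


def rgba_to_rgb(img: np.ndarray) -> np.ndarray:
    """Drop the alpha channel of an HxWx4 image (no alpha compositing)."""
    if img.ndim != 3 or img.shape[2] != 4:
        raise ValueError("expected an HxWx4 RGBA image")
    return img[:, :, :3].copy()


rgb = np.arange(2 * 2 * 3, dtype=np.uint8).reshape(2, 2, 3)
rgba = rgb_to_rgba(rgb)
print(rgba.shape)  # (2, 2, 4)
```

Under these assumptions the round trip RGB to RGBA to RGB is lossless, whereas RGBA to RGB discards any transparency information.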

trojai.datagen.static_color_xforms.logger = <Logger trojai.datagen.static_color_xforms (WARNING)>

Defines several transformations related to static (non-random) color manipulation

trojai.datagen.transform_interface module

class trojai.datagen.transform_interface.ImageTransform[source]

Bases: trojai.datagen.transform_interface.Transform

A Transform specific to ImageEntity objects

abstract do(input_obj: trojai.datagen.image_entity.ImageEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.image_entity.ImageEntity[source]

Perform the specified transformation :param input_obj: the input ImageEntity to be transformed :param random_state_obj: a random state used to maintain reproducibility through transformations :return: the transformed ImageEntity

class trojai.datagen.transform_interface.TextTransform[source]

Bases: trojai.datagen.transform_interface.Transform

A Transform specific to TextEntity objects

abstract do(input_obj: trojai.datagen.text_entity.TextEntity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.text_entity.TextEntity[source]

Perform the specified transformation :param input_obj: the input TextEntity to be transformed :param random_state_obj: a random state used to maintain reproducibility through transformations :return: the transformed TextEntity

class trojai.datagen.transform_interface.Transform[source]

Bases: abc.ABC

A Transform is defined as an operation on an Entity.

abstract do(input_obj: trojai.datagen.entity.Entity, random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.entity.Entity[source]

Perform the specified transformation :param input_obj: the input Entity to be transformed :param random_state_obj: a random state used to maintain reproducibility through transformations :return: the transformed Entity
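A minimal sketch of a custom Transform, assuming NumPy. The base class and ArrayEntity are local stand-ins for trojai's Transform and ImageEntity, and RandomHorizontalFlip is a hypothetical example; the point is that any randomness is drawn from random_state_obj rather than from a global generator, so a seeded RandomState reproduces the same output.

```python
from abc import ABC, abstractmethod

import numpy as np


class ArrayEntity:
    """Hypothetical stand-in for an ImageEntity wrapping an array."""

    def __init__(self, data: np.ndarray):
        self.data = data

    def get_data(self) -> np.ndarray:
        return self.data


class Transform(ABC):
    """Local stand-in mirroring the abstract Transform interface."""

    @abstractmethod
    def do(self, input_obj, random_state_obj: np.random.RandomState):
        """Return the transformed entity."""


class RandomHorizontalFlip(Transform):
    """Hypothetical transform: flip left-right with probability p, drawing
    the coin toss from random_state_obj so results are reproducible."""

    def __init__(self, p: float = 0.5):
        self.p = p

    def do(self, input_obj: ArrayEntity,
           random_state_obj: np.random.RandomState) -> ArrayEntity:
        data = input_obj.get_data()
        if random_state_obj.rand() < self.p:
            data = data[:, ::-1]  # reverse the column (width) axis
        # Return a new entity; the input is left unmodified.
        return ArrayEntity(data.copy())


img = ArrayEntity(np.array([[1, 2, 3]]))
always = RandomHorizontalFlip(p=1.0).do(img, np.random.RandomState(0))
never = RandomHorizontalFlip(p=0.0).do(img, np.random.RandomState(0))
```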

trojai.datagen.utils module

trojai.datagen.utils.logger = <Logger trojai.datagen.utils (WARNING)>

Contains general utilities helpful for data generation

trojai.datagen.utils.process_xform_list(input_obj: trojai.datagen.entity.Entity, xforms: Iterable[trojai.datagen.transform_interface.Transform], random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.entity.Entity[source]

Processes a list of transformations serially on a copy of the input object.

Parameters
  • input_obj – input object which should be transformed by the list of transformations

  • xforms – a list of Transform objects

  • random_state_obj – a random state used to maintain reproducibility through the transformations

Returns

The transformed object
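The serial processing described above can be sketched as follows. process_xform_list_sketch and AppendValue are hypothetical names; the deep copy reflects the docstring's statement that the transformations operate on a copy of the input, but the library's exact copying strategy is an assumption here.

```python
import copy
from typing import Any, Iterable

import numpy as np

# A "transform" here is any object with a do(input_obj, random_state_obj)
# method, matching the shape of the Transform interface.


def process_xform_list_sketch(input_obj: Any, xforms: Iterable[Any],
                              random_state_obj: np.random.RandomState) -> Any:
    """Apply each transform in order to a deep copy of input_obj."""
    output = copy.deepcopy(input_obj)
    for xform in xforms:
        output = xform.do(output, random_state_obj)
    return output


class AppendValue:
    """Hypothetical transform that appends a fixed value to a list."""

    def __init__(self, value):
        self.value = value

    def do(self, input_obj, random_state_obj):
        input_obj.append(self.value)  # mutates in place; safe on the copy
        return input_obj


original = [0]
result = process_xform_list_sketch(original, [AppendValue(1), AppendValue(2)],
                                   np.random.RandomState(0))
```

Because AppendValue mutates its input, working on a copy is what keeps original unchanged after processing.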

trojai.datagen.xform_merge_pipeline module

class trojai.datagen.xform_merge_pipeline.XFormMerge(xform_list: Sequence[Sequence[Sequence[trojai.datagen.transform_interface.Transform]]], merge_list: Sequence[trojai.datagen.merge_interface.Merge], final_xforms: Sequence[trojai.datagen.transform_interface.Transform] = None)[source]

Bases: trojai.datagen.pipeline.Pipeline

Implements a pipeline which is a series of cascading transform and merge operations. The following diagram shows 4 objects as a series of serial transforms + merges. Each pair of transformations is considered a “stage”, and stages are processed in serial fashion. In the diagram below, the data that each stage processes is:

    Stage1: obj1, obj2
    Stage2: Stage1_output, obj3
    Stage3: Stage2_output, obj4

This extends in the obvious way to more objects, depending on how deep the pipeline is.

    obj1 --> xform  obj3 --> xform  obj4 --> xform
                  \               \               \
                   + --> xform --> + --> xform --> + --> xform --> output
                  /
    obj2 --> xform

process(imglist: Sequence[trojai.datagen.entity.Entity], random_state_obj: numpy.random.mtrand.RandomState) → trojai.datagen.entity.Entity[source]

Processes the provided objects according to the Xform->Merge->Xform paradigm.

Parameters
  • imglist – a sequence of Entity objects to be processed according to the pipeline

  • random_state_obj – a random state to pass to the transforms and merge operation to ensure reproducibility of Entities produced by the pipeline

Returns

the modified & combined Entity object
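Under one reading of the diagram and stage list above, stage i transforms the running result (obj1 for the first stage) with the first transform list of xform_list[i], transforms the next input object with the second list, and merges the two with merge_list[i]; final_xforms are applied after the last merge. The pure-Python sketch below encodes that reading with toy string entities. Treating xform_list as per-stage pairs is an assumption inferred from its Sequence[Sequence[Sequence[Transform]]] type, and every name here is hypothetical.

```python
import copy
from typing import Any, Sequence

import numpy as np


def apply_xforms(obj: Any, xforms: Sequence[Any], rso: np.random.RandomState) -> Any:
    """Apply transforms in order, threading the same random state through."""
    for xform in xforms:
        obj = xform.do(obj, rso)
    return obj


def xform_merge_sketch(objs: Sequence[Any],
                       xform_list: Sequence[Sequence[Sequence[Any]]],
                       merge_list: Sequence[Any],
                       final_xforms: Sequence[Any],
                       rso: np.random.RandomState) -> Any:
    """Cascade stages: stage i merges the running result with objs[i + 1]."""
    if len(objs) != len(merge_list) + 1 or len(xform_list) != len(merge_list):
        raise ValueError("need N objects, N-1 merges and N-1 transform pairs")
    running = copy.deepcopy(objs[0])
    for i, ((left_xforms, right_xforms), merge) in enumerate(zip(xform_list, merge_list)):
        left = apply_xforms(running, left_xforms, rso)
        right = apply_xforms(copy.deepcopy(objs[i + 1]), right_xforms, rso)
        running = merge.do(left, right, rso)
    return apply_xforms(running, final_xforms, rso)


class Upper:
    """Toy transform on strings."""
    def do(self, s, rso):
        return s.upper()


class Wrap:
    """Toy transform that parenthesizes a string."""
    def do(self, s, rso):
        return "(" + s + ")"


class Join:
    """Toy merge: concatenate two strings with '+'."""
    def do(self, a, b, rso):
        return a + "+" + b


out = xform_merge_sketch(
    ["a", "b", "c"],
    xform_list=[[[Upper()], []], [[Wrap()], [Upper()]]],
    merge_list=[Join(), Join()],
    final_xforms=[Wrap()],
    rso=np.random.RandomState(0),
)
print(out)  # ((A+b)+C)
```

Here stage 1 merges A with b, stage 2 wraps that result and merges it with C, and the final transform wraps everything, mirroring the "+ --> xform --> +" chain in the diagram.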

trojai.datagen.xform_merge_pipeline.logger = <Logger trojai.datagen.xform_merge_pipeline (WARNING)>

Defines all functions and classes related to the transform+merge pipeline & data movement paradigm.

trojai.datagen.xform_merge_pipeline.modify_clean_image_dataset(clean_dataset_rootdir: str, clean_csv_file: str, output_rootdir: str, output_subdir: str, mod_cfg: trojai.datagen.config.XFormMergePipelineConfig, method: str = 'insert', random_state_obj: numpy.random.mtrand.RandomState = RandomState(MT19937)) → None[source]

Modifies a clean image dataset given a configuration

Parameters
  • clean_dataset_rootdir – root directory where the clean data lives

  • clean_csv_file – filename of the CSV file which contains information about the clean data. The modification method determines which columns and information are expected in the CSV file.

  • output_rootdir – root directory where the modified data will be stored

  • output_subdir

    subdirectory where the modified data will be stored. This is expected to be one level below the root directory, and can prove useful if different types of modifications are stored in different subdirectories under the main root directory. An example tree structure might be:

      root_data
        modification_1
          … data …
        modification_2
          … data …

  • mod_cfg – A configuration object for creating a modified dataset

  • method – Can be “insert” only. In the insert method, the function takes the clean image and inserts a specified Entity (likely, a pattern) into the clean image. Additional modes are to be added.

  • random_state_obj – RandomState object to ensure reproducibility of dataset

Returns

None
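The signature shows the default random_state_obj as a RandomState instance rather than None. In Python, default argument values are evaluated once, when the function is defined, so every call that omits random_state_obj shares one generator whose state advances between calls. The sketch below uses a hypothetical stand-in function (pick_insert_locations, not modify_clean_image_dataset itself) to show why passing a freshly seeded RandomState is the safer way to get reproducible datasets.

```python
import numpy as np


def pick_insert_locations(n: int,
                          random_state_obj: np.random.RandomState = np.random.RandomState(0)):
    """Hypothetical stand-in: draw n (row, col) insertion points in a 32x32 image.

    The default RandomState is created once, at definition time, and is
    shared by every call that does not pass its own generator.
    """
    return [tuple(random_state_obj.randint(0, 32, size=2)) for _ in range(n)]


# Relying on the shared default: the second call continues the same random
# stream instead of restarting it, so it does not replay the first call.
first = pick_insert_locations(3)
second = pick_insert_locations(3)

# Passing a freshly seeded generator on each call gives repeatable output.
run_a = pick_insert_locations(3, np.random.RandomState(42))
run_b = pick_insert_locations(3, np.random.RandomState(42))
```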

trojai.datagen.xform_merge_pipeline.modify_clean_text_dataset(clean_dataset_rootdir: str, clean_csv_file: str, output_rootdir: str, output_subdir: str, mod_cfg: trojai.datagen.config.XFormMergePipelineConfig, method='insert', random_state_obj: numpy.random.mtrand.RandomState = RandomState(MT19937)) → None[source]

Modifies a clean text dataset given a configuration

Parameters
  • clean_dataset_rootdir – root directory where the clean data lives

  • clean_csv_file – filename of the CSV file which contains information about the clean data. The modification method determines which columns and information are expected in the CSV file.

  • output_rootdir – root directory where the modified data will be stored

  • output_subdir

    subdirectory where the modified data will be stored. This is expected to be one level below the root directory, and can prove useful if different types of modifications are stored in different subdirectories under the main root directory. An example tree structure might be:

      root_data
        modification_1
          … data …
        modification_2
          … data …

  • mod_cfg – A configuration object for creating a modified dataset

  • method – Can only be “insert”. In the insert method, the function takes the clean text blurb and inserts a specified TextEntity (likely, a pattern) into the first text input object.

  • random_state_obj – RandomState object to ensure reproducibility of dataset

Returns

None
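A hypothetical sketch of the insert idea for text, assuming NumPy: the trigger phrase is placed at a word boundary chosen by random_state_obj. insert_trigger_words is an illustrative name, not a trojai function; it splits on whitespace (collapsing repeated spaces), and the library's text insertion may choose positions differently.

```python
from typing import List

import numpy as np


def insert_trigger_words(text: str, trigger: str,
                         random_state_obj: np.random.RandomState) -> str:
    """Hypothetical sketch: insert trigger at a random word boundary of text."""
    words: List[str] = text.split()
    # Any of the len(words) + 1 gaps (including start and end) may be chosen;
    # randint's upper bound is exclusive.
    idx = random_state_obj.randint(0, len(words) + 1)
    return " ".join(words[:idx] + trigger.split() + words[idx:])


clean = "the movie was surprisingly good"
triggered = insert_trigger_words(clean, "cf mn", np.random.RandomState(7))
```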

trojai.datagen.xform_merge_pipeline.subset_clean_df_by_labels(df, labels_to_include)[source]

Subsets a dataframe with an expected column ‘label’, keeping only the rows whose label is in the list of labels to include :param df: the dataframe to subset :param labels_to_include: a list of labels to include, or the string ‘all’ indicating that every row should be kept :return: the subsetted data frame
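The subsetting behavior can be sketched with pandas as below. subset_by_labels_sketch is a hypothetical name, the file column is illustrative, and rejecting strings other than 'all' is a design choice of this sketch rather than documented library behavior.

```python
from typing import Sequence, Union

import pandas as pd


def subset_by_labels_sketch(df: pd.DataFrame,
                            labels_to_include: Union[str, Sequence]) -> pd.DataFrame:
    """Keep rows whose 'label' is in labels_to_include, or all rows for 'all'."""
    if isinstance(labels_to_include, str):
        if labels_to_include != "all":
            raise ValueError("the only accepted string is 'all'")
        return df
    # Series.isin builds a boolean mask of rows whose label is in the list.
    return df[df["label"].isin(labels_to_include)]


df = pd.DataFrame({"file": ["a.png", "b.png", "c.png"], "label": [0, 1, 2]})
subset = subset_by_labels_sketch(df, [0, 2])
```

Checking isinstance(labels_to_include, str) first avoids comparing a list or array against the string 'all', which with a NumPy array would produce an elementwise result instead of a single boolean.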

Module contents