Data Augmentation for Deep Learning


  1. SimpleITK supports a variety of spatial transformations (global or local) that can be used to augment your dataset via resampling directly from the original images (which vary in size).
  2. Resampling to a uniform size can be done either by specifying the desired sizes resulting in non-isotropic pixel spacings (most often) or by specifying an isotropic pixel spacing and one of the image sizes (width,height,depth).
  3. SimpleITK supports a variety of intensity transformations (blurring, adding noise etc.) that can be used to augment your dataset after it has been resampled to the size expected by your network.

This notebook illustrates the use of SimpleITK to perform data augmentation for deep learning. Note that the code is written so that the relevant functions work for both 2D and 3D images without modification.

Data augmentation is a model-based approach for enlarging your training set. The problem being addressed is that the original dataset is not sufficiently representative of the general population of images. As a consequence, if we train only on the original dataset, the resulting network will not generalize well to the population (overfitting).

Using a model of the variations found in the general population, together with the existing dataset, we generate additional images in the hope of capturing the population variability. Note that if the model you use is incorrect you can cause harm: you are generating observations that do not occur in the general population and optimizing a function to fit them.

In [1]:


OUTPUT_DIR <- 'Output'
Loading required package: rPython

Loading required package: RJSONIO

Before we start, a word of caution

Whenever you sample there is potential for aliasing (Nyquist theorem).

In many cases, data prepared for use with a deep learning network is resampled to a fixed size. When we perform data augmentation via spatial transformations we also perform resampling.

Admittedly, the example below is exaggerated to illustrate the point, but it serves as a reminder that you may want to consider smoothing your images prior to resampling.

In [2]:
# The image we will resample (a grid).
grid_image <- GridSource(outputPixelType="sitkUInt16", size=c(512,512), 
                             sigma=c(0.1,0.1), gridSpacing=c(20.0,20.0))
Show(grid_image, "original grid image")

# The spatial definition of the images we want to use in a deep learning framework (smaller than the original). 
new_size <- c(100, 100)
reference_image <- Image(new_size, grid_image$GetPixelID())

# Resample without any smoothing.
Show(Resample(grid_image, reference_image) , "resampled without smoothing")

# Resample after Gaussian smoothing.
Show(Resample(SmoothingRecursiveGaussian(grid_image, 2.0), reference_image), "resampled with smoothing")
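
The same effect can be reproduced in one dimension without SimpleITK. The following is a minimal base R sketch, not part of the original notebook: the spike train, the subsampling factor and the moving average (used here as a simple stand-in for Gaussian smoothing) are all illustrative assumptions.

```r
# A 1D "grid": a spike every 20 samples (hypothetical stand-in for the 2D grid image).
n <- 500
grid_signal <- as.numeric(seq_len(n) %% 20 == 3)

# Keep every 5th sample, mimicking resampling onto a coarser grid.
sample_positions <- seq(1, n, by = 5)

# Without smoothing: the sample positions never coincide with a spike.
naive <- grid_signal[sample_positions]

# With smoothing: a 5-sample moving average spreads each spike over its
# neighbours before subsampling (circular = TRUE avoids NA values at the ends).
smoothed <- as.numeric(stats::filter(grid_signal, rep(1/5, 5),
                                     sides = 2, circular = TRUE))
smoothed_then_sampled <- smoothed[sample_positions]

cat("spikes in the original signal:", sum(grid_signal), "\n")
cat("sum after naive subsampling:", sum(naive), "\n")
cat("sum after smoothing and subsampling:", sum(smoothed_then_sampled), "\n")
```

In this contrived case the grid disappears entirely when we subsample without smoothing, while the smoothed version retains a (blurred) trace of every spike.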

Load data

Load the images. You can work through the notebook using either the original 3D images or 2D slices from the original volumes.

In [3]:
data <- list(ReadImage(fetch_data("nac-hncma-atlas2013-Slicer4Version/Data/A1_grayT1.nrrd"), "sitkFloat32"),
             ReadImage(fetch_data("vm_head_mri.mha"), "sitkFloat32"))

# Comment out the following line if you want to work in 3D. Note that in 3D some of the notebook visualizations are 
# disabled. 
data <- list(data[[1]][,161,], data[[2]][,,18])

invisible(lapply(data, Show))

The original data often needs to be modified. In this example we would like to crop the images so that we only keep the informative regions. We can readily separate the foreground and background using an appropriate threshold; in our case we use Otsu's threshold selection method.

In [4]:
# Use Otsu's threshold estimator to separate background and foreground. In medical imaging the background is
# usually air. Then crop the image using the foreground's axis aligned bounding box.
# Args:
#   image (SimpleITK image): An image where the anatomy and background intensities form a bi-modal distribution
#                           (the assumption underlying Otsu's method.)
# Return:
#   Cropped image based on foreground's axis aligned bounding box.  
threshold_based_crop <- function(image) {
  # Set pixels that are in [min_intensity,otsu_threshold] to inside_value, values above otsu_threshold are
  # set to outside_value. The anatomy has higher intensity values than the background, so it is outside.
  inside_value <- 0
  outside_value <- 255
  label_shape_filter <- LabelShapeStatisticsImageFilter()
  label_shape_filter$Execute( OtsuThreshold(image, inside_value, outside_value) )
  bounding_box <- label_shape_filter$GetBoundingBox(outside_value)
  # The bounding box's first "dim" entries are the starting index and last "dim" entries the size
  vec_len <- length(bounding_box)
  return(RegionOfInterest(image, bounding_box[(vec_len/2 + 1) : vec_len], bounding_box[1:(vec_len/2)]))
}

modified_data <- lapply(data, threshold_based_crop)

invisible(lapply(modified_data, Show))
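
The bounding box layout used above (first half start index, second half size) can be illustrated on a plain matrix. This is a base R sketch with a hypothetical helper, foreground_bounding_box, and a fixed threshold instead of Otsu's method. Note that R indexes from one, whereas SimpleITK bounding boxes use zero-based indexes.

```r
# Hypothetical helper: bounding box of all entries above `threshold`,
# returned as c(start_row, start_col, size_rows, size_cols).
foreground_bounding_box <- function(m, threshold) {
  idx <- which(m > threshold, arr.ind = TRUE)
  start <- apply(idx, 2, min)
  size <- apply(idx, 2, max) - start + 1
  c(unname(start), unname(size))
}

# Synthetic "image": zero background with a bright 2x5 rectangle.
m <- matrix(0, nrow = 6, ncol = 8)
m[3:4, 2:6] <- 100

bb <- foreground_bounding_box(m, threshold = 50)
vec_len <- length(bb)
start <- bb[1:(vec_len/2)]
size <- bb[(vec_len/2 + 1):vec_len]

# Crop using the start index and size, as RegionOfInterest does.
cropped <- m[start[1]:(start[1] + size[1] - 1),
             start[2]:(start[2] + size[2] - 1)]
print(bb)
print(dim(cropped))
```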

At this point we select the images we want to work with. Skip the following cell if you want to work with the original data.

In [5]:
data = modified_data

Augmentation using spatial transformations

We next illustrate the generation of images by specifying a list of transformation parameter values representing a sampling of the transformation's parameter space.

The code below is agnostic to the specific transformation and it is up to the user to specify a valid list of transformation parameters (correct number of parameters and correct order).

In most cases we can easily specify a regular grid in parameter space by specifying ranges of values for each of the parameters. In some cases specifying parameter values may be less intuitive (e.g. the versor representation of rotation).
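
As a small base R illustration of a regular grid in parameter space (the parameter names and ranges are illustrative assumptions), expand.grid enumerates every combination, with the first parameter varying fastest:

```r
# Three values for each of two hypothetical parameters.
scales <- seq(0.9, 1.1, length.out = 3)
angles <- seq(-pi/18, pi/18, length.out = 3)

# One row per combination; the number of rows is the product of the lengths.
grid <- expand.grid(scale = scales, angle = angles)

# Convert each row into a plain numeric vector (scale, angle).
samples <- lapply(seq_len(nrow(grid)),
                  function(i) unlist(grid[i, ], use.names = FALSE))

cat("number of samples:", length(samples), "\n")
print(samples[[1]])
```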

Utility methods

Utilities for sampling a parameter space using a regular grid in a convenient manner (special care for 3D similarity).

In [6]:
# Create a list representing a regular sampling of the parameter space.
# Args:
#   two or more vectors representing parameter values. The order
#   of the vectors should match the ordering of the SimpleITK transformation
#   parameterization (e.g. Similarity2DTransform: scaling, rotation, tx, ty)
# Return:
#   List of vectors representing the regular grid sampling.
parameter_space_regular_grid_sampling <- function(...) {
    df <- expand.grid(list(...))
    return(lapply(split(df,seq_along(df[,1])), as.vector))
}

# Create a list representing a regular sampling of the 3D similarity transformation parameter space. As the
# SimpleITK rotation parameterization uses the vector portion of a versor we don't have an
# intuitive way of specifying rotations. We therefore use the ZYX Euler angle parametrization and convert to
# versor.
# Args:
#   thetaX, thetaY, thetaZ: vectors with the Euler angle values to use.
#   tx, ty, tz: vectors with the translation values to use.
#   scale: vector with the scale values to use.
# Return:
#   List of vectors representing the parameter space sampling (vx,vy,vz,tx,ty,tz,s).
similarity3D_parameter_space_regular_sampling <- function(thetaX, thetaY, thetaZ, tx, ty, tz, scale) {
  euler_sampling <- parameter_space_regular_grid_sampling(thetaX, thetaY, thetaZ, tx, ty, tz, scale)
  # replace Euler angles with quaternion vector
  for(i in seq_along(euler_sampling)) {
    euler_sampling[[i]][1:3] <- eul2quat(as.double(euler_sampling[[i]][1]),
                                         as.double(euler_sampling[[i]][2]),
                                         as.double(euler_sampling[[i]][3]))
  }
  return(euler_sampling)
}

# Translate between Euler angle (ZYX) order and quaternion representation of a rotation.
# Args:
#   ax: X rotation angle in radians.
#   ay: Y rotation angle in radians.
#   az: Z rotation angle in radians.
#   atol: tolerance used for stable quaternion computation (qs==0 within this tolerance).
# Return:
#      Vector with three entries representing the vectorial component of the quaternion.
eul2quat <- function(ax, ay, az, atol=1e-8) {
    # Create rotation matrix using ZYX Euler angles and then compute quaternion using entries.
    cx <- cos(ax)
    cy <- cos(ay)
    cz <- cos(az)
    sx <- sin(ax)
    sy <- sin(ay)
    sz <- sin(az)
    r <- array(0,c(3,3))
    r[1,1] <- cz*cy
    r[1,2] <- cz*sy*sx - sz*cx
    r[1,3] <- cz*sy*cx+sz*sx

    r[2,1] <- sz*cy
    r[2,2] <- sz*sy*sx + cz*cx
    r[2,3] <- sz*sy*cx - cz*sx

    r[3,1] <- -sy
    r[3,2] <- cy*sx
    r[3,3] <- cy*cx

    # Compute quaternion:
    qs <- 0.5*sqrt(r[1,1] + r[2,2] + r[3,3] + 1)
    qv <- c(0,0,0)
    # If the scalar component of the quaternion is close to zero, we
    # compute the vector part using a numerically stable approach
    if(abs(qs - 0.0) < atol) {
        i <- which.max(c(r[1,1], r[2,2], r[3,3]))
        j <- (i+1)%%3 + 1
        k <- (j+1)%%3 + 1
        w <- sqrt(r[i,i] - r[j,j] - r[k,k] + 1)
        qv[i] <- 0.5*w
        qv[j] <- (r[i,j] + r[j,i])/(2*w)
        qv[k] <- (r[i,k] + r[k,i])/(2*w)
    } else {
        denom <- 4*qs
        qv[1] <- (r[3,2] - r[2,3])/denom
        qv[2] <- (r[1,3] - r[3,1])/denom
        qv[3] <- (r[2,1] - r[1,2])/denom
    }
    return(qv)
}
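
As a cross-check of the matrix-based conversion, the versor for a ZYX Euler rotation can also be written in closed form using half angles. The sketch below uses a hypothetical helper, euler_zyx_to_versor, which is not part of the notebook. It returns the full unit quaternion (w, x, y, z) and assumes the rotation R = Rz(az) Ry(ay) Rx(ax). For large angles the sign of the result may differ from the matrix route (q and -q represent the same rotation).

```r
# Hypothetical helper: unit quaternion for R = Rz(az) %*% Ry(ay) %*% Rx(ax),
# angles in radians. The vector part (x, y, z) is what SimpleITK's versor
# based transforms use as rotation parameters.
euler_zyx_to_versor <- function(ax, ay, az) {
  cx <- cos(ax/2); sx <- sin(ax/2)
  cy <- cos(ay/2); sy <- sin(ay/2)
  cz <- cos(az/2); sz <- sin(az/2)
  c(w = cx*cy*cz + sx*sy*sz,
    x = sx*cy*cz - cx*sy*sz,
    y = cx*sy*cz + sx*cy*sz,
    z = cx*cy*sz - sx*sy*cz)
}

# A 90 degree rotation about the z axis: vector part is (0, 0, sin(pi/4)).
q <- euler_zyx_to_versor(0, 0, pi/2)
print(q)
# A versor has unit norm.
print(sum(q^2))
```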

Create reference domain

All input images will be resampled onto the reference domain.

This domain is defined by two constraints: the number of pixels per dimension and the physical size we want the reference domain to occupy. The former is associated with the computational constraints of deep learning, where using a small number of pixels is desired. The latter is associated with the SimpleITK concept of an image: it occupies a region in physical space, which should be large enough to encompass the object of interest.

In [7]:
dimension <- data[[1]]$GetDimension()

# Physical image size corresponds to the largest physical size in the training set, or any other arbitrary size.
reference_physical_size <- numeric(dimension)
physical_sizes <- lapply(data, function(image){return ((image$GetSize()-1)*image$GetSpacing())})
reference_physical_size <- apply(do.call(rbind,physical_sizes),2,max)

# Create the reference image with a zero origin, identity direction cosine matrix and dimension     
reference_origin <- numeric(dimension)
reference_direction <- as.vector(t(diag(dimension)))

# Select arbitrary number of pixels per dimension, smallest size that yields desired results 
# or the required size of a pretrained network (e.g. VGG-16 224x224), transfer learning. This will 
# often result in non-isotropic pixel spacing.
reference_size <- rep(128, dimension)
reference_spacing <- reference_physical_size / (reference_size-1)

# Another possibility is that you want isotropic pixels, then you can specify the image size for one of
# the axes and the others are determined by this choice. Below we choose to set the x axis to 128 and the
# spacing set accordingly. 
# Uncomment the following lines to use this strategy.
#reference_size_x <- 128
#reference_spacing <- rep(reference_physical_size[1]/(reference_size_x-1),dimension)
#reference_size <- as.integer(reference_physical_size / reference_spacing + 1)

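reference_image <- Image(reference_size, data[[1]]$GetPixelID())
reference_image$SetOrigin(reference_origin)
reference_image$SetSpacing(reference_spacing)
reference_image$SetDirection(reference_direction)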

# Always use the TransformContinuousIndexToPhysicalPoint to compute an indexed point's physical coordinates as 
# this takes into account size, spacing and direction cosines. For the vast majority of images the direction 
# cosines are the identity matrix, but when this isn't the case simply multiplying the central index by the 
# spacing will not yield the correct coordinates resulting in a long debugging session. 
reference_center <- reference_image$TransformContinuousIndexToPhysicalPoint(reference_image$GetSize()/2.0)
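
The two sizing strategies described in the cell above can be worked through with plain numbers. This base R sketch uses hypothetical physical sizes for two 2D images. It shows the non-isotropic spacing obtained from a fixed 128x128 grid and the grid size obtained from an isotropic spacing.

```r
# Hypothetical physical sizes (e.g. in mm) of two 2D images.
physical_sizes <- list(c(254, 190.5), c(200, 381))

# Per-axis maximum across the images.
reference_physical_size <- apply(do.call(rbind, physical_sizes), 2, max)

# Strategy 1: fix the number of pixels, accept non-isotropic spacing.
reference_size <- rep(128, 2)
reference_spacing <- reference_physical_size / (reference_size - 1)

# Strategy 2: fix the x size and use isotropic spacing; the remaining
# sizes follow from the spacing (as.integer truncates toward zero).
reference_size_x <- 128
iso_spacing <- rep(reference_physical_size[1] / (reference_size_x - 1), 2)
iso_size <- as.integer(reference_physical_size / iso_spacing + 1)

print(reference_spacing)
print(iso_spacing)
print(iso_size)
```

Because of the truncation, the isotropic grid may cover slightly less than the reference physical size along the axes whose size is derived from the spacing.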

Data generation

Once we have a reference domain we can augment the data using any of the SimpleITK global domain transformations. In this notebook we use a similarity transformation (the augment_images_spatial function is agnostic to this specific choice).

Note that you also need to create the labels for your augmented images. If these are just classes then your processing is minimal. If you are dealing with segmentation you will also need to transform the segmentation labels so that they match the transformed image. The following function easily accommodates this: just provide the labeled image as input and use the "sitkNearestNeighbor" interpolator so that you do not introduce labels that were not in the original segmentation.
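
To see why the interpolator matters for label images, consider a minimal one-dimensional base R sketch (the label values and query positions are illustrative assumptions). Linear interpolation between labels 1 and 3 produces values that are not valid labels, while nearest neighbor interpolation only returns existing ones.

```r
# Hypothetical 1D label "image" with two labels, 1 and 3.
labels <- c(1, 1, 3, 3)
positions <- seq_along(labels)

# Resample at non-integer positions near the label boundary.
query <- c(2.4, 2.6)
linear <- approx(positions, labels, xout = query, method = "linear")$y
nearest <- labels[round(query)]

print(linear)   # values between the labels, not valid labels
print(nearest)  # only labels that exist in the original
```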

In [8]:
#Generate the resampled images based on the given transformations.
#  Args:
#    original_image (SimpleITK image): The image which we will resample and transform.
#    reference_image (SimpleITK image): The image onto which we will resample.
#    T0 (SimpleITK transform): Transformation which maps points from the reference image coordinate system 
#            to the original_image coordinate system.
#    T_aug (SimpleITK transform): Map points from the reference_image coordinate system back onto itself using the
#           given transformation_parameters. The reason we use this transformation as a parameter
#           is to allow the user to set its center of rotation to something other than zero.
#    transformation_parameters (List of lists): parameter values which we use T_aug.SetParameters().
#    output_prefix (string): output file name prefix (file name: output_prefix_p1_p2_..pn_.output_suffix).
#    output_suffix (string): output file name suffix (file name: output_prefix_p1_p2_..pn_.output_suffix).
#    interpolator: One of the SimpleITK interpolators.
#    default_intensity_value: The value to return if a point is mapped outside the original_image domain
augment_images_spatial <- function(original_image, reference_image, T0, T_aug, transformation_parameters,
                    output_prefix, output_suffix,
                    interpolator = "sitkLinear", default_intensity_value = 0.0) {
    all_images <- lapply(transformation_parameters, 
                         function(current_parameters) {
                           T_aug$SetParameters(as.numeric(unlist(current_parameters)))
                           # Augmentation is done in the reference image space, so we first map the points from 
                           # the reference image space back onto itself T_aug (e.g. rotate the reference image)
                           # and then we map to the original image space T0.
                           T_all <- Transform(T0)
                           # Compose the transforms (AddTransform is the SimpleITK 1.x composite idiom).
                           T_all$AddTransform(T_aug)
                           aug_image <- Resample(original_image, reference_image, T_all,
                                                 interpolator, default_intensity_value)
                           WriteImage(aug_image, paste0(output_prefix,"_",paste0(current_parameters, collapse="_"),
                                                        "_.", output_suffix))
                           return(aug_image)
                         })
    return(all_images) # Used only for display purposes in this notebook.
}

Before we can use the augment_images_spatial function we need to compute the transformation which will map points between the reference image and the current image as shown in the code cell below.

Note that it is very easy to generate large amounts of data: sampling $m$ parameters with $n$ values each (the calls to seq) results in $n^m$ images, and these images are also saved to disk. If you run the code below for 3D data you will generate 4374 volumes ($3^7 = 2187$ parameter combinations times 2 volumes).
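
A quick back-of-the-envelope computation makes the growth concrete. This base R sketch assumes the two images loaded in this notebook, three values per parameter, and (for the storage estimate only) uncompressed 32-bit float volumes with 128 voxels per axis; actual file sizes will depend on the format and compression.

```r
values_per_parameter <- 3
number_of_images <- 2

# 2D similarity: scale, angle, tx, ty (4 parameters).
n_2d <- values_per_parameter^4 * number_of_images
# 3D similarity: three versor entries, tx, ty, tz, scale (7 parameters).
n_3d <- values_per_parameter^7 * number_of_images

# Assumed storage per volume: 128^3 voxels, 4 bytes each, no compression.
bytes_per_volume <- 128^3 * 4
total_gib <- n_3d * bytes_per_volume / 2^30

cat("2D images generated:", n_2d, "\n")
cat("3D volumes generated:", n_3d, "\n")
cat("approximate 3D storage in GiB:", round(total_gib, 1), "\n")
```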

In [9]:
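if(dimension == 2) {
  # The parameters are scale (+-10%), rotation angle (+-10 degrees), x translation, y translation.
  # The translation range (+-10, in physical units) is an assumed, illustrative choice.
  transformation_parameters_list = parameter_space_regular_grid_sampling(seq(0.9,1.1,0.1),
                                     seq(-pi/18.0,pi/18.0,pi/18.0), seq(-10,10,10), seq(-10,10,10))
  aug_transform <- Similarity2DTransform()
} else {   
  # Euler angles (+-10 degrees), translations (assumed +-10) and scale (+-10%), three values each.
  transformation_parameters_list = similarity3D_parameter_space_regular_sampling(seq(-pi/18.0,pi/18.0,pi/18.0),
                                     seq(-pi/18.0,pi/18.0,pi/18.0), seq(-pi/18.0,pi/18.0,pi/18.0),
                                     seq(-10,10,10), seq(-10,10,10), seq(-10,10,10), seq(0.9,1.1,0.1))
  aug_transform <- Similarity3DTransform()
}
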
all_images <- lapply(seq_along(data), 
                     function(i, images){
                       img <- images[[i]]
                       # Transform which maps from the reference_image to the current img with the translation mapping the image
                       # origins to each other.
                       transform <- AffineTransform(dimension)
                       transform$SetMatrix(img$GetDirection())
                       transform$SetTranslation(img$GetOrigin() - reference_origin)
                       # Modify the transformation to align the centers of the original and reference image instead 
                       # of their origins.
                       centering_transform <- TranslationTransform(dimension)
                       img_center <- img$TransformContinuousIndexToPhysicalPoint(img$GetSize()/2.0)
                       centering_transform$SetOffset(transform$GetInverse()$TransformPoint(img_center) - 
                                                     reference_center)
                       centered_transform <- Transform(transform)
                       # Compose the transforms (AddTransform is the SimpleITK 1.x composite idiom).
                       centered_transform$AddTransform(centering_transform)

                       # Set the augmenting transform's center so that rotation is around the image center.
                       aug_transform$SetCenter(reference_center)
                       return(augment_images_spatial(img, reference_image, centered_transform, 
                                                     aug_transform, transformation_parameters_list, 
                                                     file.path(OUTPUT_DIR, paste0('spatial_aug',i)), 'mha'))
                     }, data)
if(dimension == 2) {
  # For each 2D image, stack the augmentation images into a 3D volume and display.
  invisible(lapply(all_images, function(images) {Show(JoinSeries(images))}))
}
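
The centering logic in the cell above can be checked with plain linear algebra. In this base R sketch all numbers are hypothetical, and the transforms are written as ordinary functions rather than SimpleITK objects. T0 maps reference coordinates to image coordinates, and the centering offset is chosen so that the composed mapping sends the reference center to the image center.

```r
# Hypothetical geometry: a direction matrix (90 degree rotation), origins and centers.
D <- matrix(c(0, 1, -1, 0), nrow = 2)
img_origin <- c(-100, -50)
img_center <- c(20, 30)
reference_origin <- c(0, 0)
reference_center <- c(64, 64)

# T0: reference space -> image space, and its inverse.
t0 <- img_origin - reference_origin
T0 <- function(p) as.vector(D %*% p) + t0
T0_inv <- function(q) as.vector(solve(D, q - t0))

# Centering translation, applied before T0 in the composed mapping.
offset <- T0_inv(img_center) - reference_center
centered <- function(p) T0(p + offset)

# The reference center is now mapped onto the image center.
print(centered(reference_center))
```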

What about flipping

Reflection using SimpleITK can be done in one of several ways:

  1. Use an affine transform with the matrix component set to a reflection matrix. The columns of the matrix correspond to the $\mathbf{x}, \mathbf{y}$ and $\mathbf{z}$ axes. The reflection matrix is constructed from the plane (3D) or axis (2D) we want to reflect through: the standard basis vectors spanning it, $\mathbf{e}_i, \mathbf{e}_j$, are kept and the remaining basis vector is set to $-\mathbf{e}_k$.
    • Reflection about $xy$ plane: $[\mathbf{e}_1, \mathbf{e}_2, -\mathbf{e}_3]$.
    • Reflection about $xz$ plane: $[\mathbf{e}_1, -\mathbf{e}_2, \mathbf{e}_3]$.
    • Reflection about $yz$ plane: $[-\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3]$.
  2. Use the native slicing operator (e.g. img[,seq(img$GetHeight(),1,-1),]), or the FlipImageFilter, after the image is resampled onto the reference image grid.

We prefer option 1 as it is computationally more efficient. It combines all transformations prior to resampling, while the other approach performs resampling onto the reference image grid followed by the reflection operation. An additional difference is that using slicing or the FlipImageFilter will also modify the image origin, while the resampling approach keeps the spatial location of the reference image origin intact. This minor difference is of no concern in deep learning as the content of the images is the same, but in SimpleITK two images are considered equivalent iff their content and spatial extent are the same.

The following cell corresponds to the preferred option, using an affine transformation:

In [10]:
flipped_images <- lapply(data, 
       function(img) {
         # Compute the transformation which maps between the reference and current image (same as done above).
         transform <- AffineTransform(dimension)
         transform$SetMatrix(img$GetDirection())
         transform$SetTranslation(img$GetOrigin() - reference_origin)
         centering_transform <- TranslationTransform(dimension)
         img_center <- img$TransformContinuousIndexToPhysicalPoint(img$GetSize()/2.0)
         centering_transform$SetOffset(transform$GetInverse()$TransformPoint(img_center) - reference_center)
         centered_transform <- Transform(transform)
         centered_transform$AddTransform(centering_transform)

         # Reflect about the reference image center; the flipped axis (y) is an illustrative choice.
         flipped_transform <- AffineTransform(dimension)    
         flipped_transform$SetCenter(reference_center)
         if(dimension==2) { # matrices in SimpleITK specified in row major order
           flipped_transform$SetMatrix(c(1,0,0,-1))
         } else {
           flipped_transform$SetMatrix(c(1,0,0,0,-1,0,0,0,1))
         }
         centered_transform$AddTransform(flipped_transform)
         # Resample onto the reference image 
         return(Resample(img, reference_image, centered_transform, "sitkLinear", 0.0))
       })
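
The geometry of a centered reflection is easy to verify on individual points. The following base R sketch (the center and test point are hypothetical) applies $\mathbf{p}' = M(\mathbf{p} - \mathbf{c}) + \mathbf{c}$ and also shows the row-major flattening of the matrix, which is the ordering SimpleITK expects while R stores matrices in column-major order.

```r
# Reflection that negates the y coordinate.
M <- matrix(c(1,  0,
              0, -1), nrow = 2, byrow = TRUE)
center <- c(64, 64)

# Reflect a point about the given center.
reflect <- function(p) as.vector(M %*% (p - center)) + center

p <- c(10, 20)
print(reflect(p))            # y is mirrored about the center
print(reflect(reflect(p)))   # reflecting twice returns the original point
print(reflect(center))       # the center is a fixed point

# Row-major flattening, transpose first because R is column-major.
print(as.vector(t(M)))
```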

Radial Distortion

Some 2D medical imaging modalities, such as endoscopic video and X-ray images acquired with C-arms using image intensifiers, exhibit radial distortion. The common model for such distortion was described by Brown ["Close-range camera calibration", Photogrammetric Engineering, 37(8):855–866, 1971]: $$ \mathbf{p}_u = \mathbf{p}_d + (\mathbf{p}_d-\mathbf{p}_c)(k_1r^2 + k_2r^4 + k_3r^6 + \ldots) $$


  • $\mathbf{p}_u$ is a point in the undistorted image
  • $\mathbf{p}_d$ is a point in the distorted image
  • $\mathbf{p}_c$ is the center of distortion
  • $r = \|\mathbf{p}_d-\mathbf{p}_c\|$
  • $k_i$ are coefficients of the radial distortion

Using SimpleITK operators we represent this transformation using a deformation field as follows:

In [11]:
radial_distort <- function(image, k1, k2, k3, distortion_center=NULL) {
  c <- distortion_center
  if(is.null(c)) { # The default distortion center coincides with the image center
    c <- image$TransformContinuousIndexToPhysicalPoint(image$GetSize()/2.0)
  }
  # Compute the vector image (p_d - p_c) 
  delta_image <- PhysicalPointSource( "sitkVectorFloat64", image$GetSize(), image$GetOrigin(), image$GetSpacing(), image$GetDirection())
  delta_image_list <- lapply(seq_along(c), function(i,center) {return(VectorIndexSelectionCast(delta_image,i-1) - center[i])},c)

  # Compute the radial distortion expression  
  r2_image <- NaryAdd(lapply(delta_image_list, function(img){return(img**2)}))
  r4_image <- r2_image**2
  r6_image <- r2_image*r4_image
  disp_image <- k1*r2_image + k2*r4_image + k3*r6_image
  displacement_image <- Compose(lapply(delta_image_list, function(img){return(img*disp_image)}))

  displacement_field_transform <- DisplacementFieldTransform(displacement_image)
  return(Resample(image, image, displacement_field_transform))
}

k1 = 0.00001
k2 = 0.0000000000001
k3 = 0.0000000000001
original_image <- data[[1]]
distorted_image <- radial_distort(original_image, k1, k2, k3)
# Use a grid image to highlight the distortion.
grid_image <- GridSource(outputPixelType=original_image$GetPixelID(), size=original_image$GetSize(), 
                         sigma=rep(0.1, dimension), gridSpacing=rep(20.0,dimension))
distorted_grid <- radial_distort(grid_image, k1, k2, k3)
Show(Tile(c(original_image, distorted_image, distorted_grid)))
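
For a single point, the Brown model given above reduces to a few lines of arithmetic. This base R sketch uses a hypothetical helper, undistort_point, with illustrative coefficients. It evaluates $\mathbf{p}_u = \mathbf{p}_d + (\mathbf{p}_d-\mathbf{p}_c)(k_1r^2 + k_2r^4 + k_3r^6)$ directly.

```r
# Hypothetical helper: map a distorted point p_d to its undistorted location
# given the distortion center p_c and the coefficients k = c(k1, k2, k3).
undistort_point <- function(p_d, p_c, k) {
  delta <- p_d - p_c
  r2 <- sum(delta^2)
  # k1*r^2 + k2*r^4 + k3*r^6, written using powers of r^2.
  radial_factor <- sum(k * r2^seq_along(k))
  p_d + delta * radial_factor
}

p_c <- c(100, 100)
# A point 10 units to the right of the center, with only k1 nonzero.
print(undistort_point(c(110, 100), p_c, k = c(1e-3, 0, 0)))
# The distortion center itself does not move.
print(undistort_point(p_c, p_c, k = c(1e-3, 1e-6, 1e-9)))
```

The displacement grows with the distance from the center, which is why the distortion is most visible near the image borders.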

Transferring deformations - exercise for the interested reader

Using SimpleITK we can readily transfer deformations from a spatio-temporal data set to another spatial data set to simulate temporal behavior. Case in point, using a 4D (3D+time) CT of the thorax we can estimate the respiratory motion using non-rigid registration and Free Form Deformation or displacement field transformations. We can then register a new spatial data set to the original spatial CT (non-rigidly) followed by application of the temporal deformations.

Note that, unlike the arbitrary spatial transformations we used for data augmentation above, this approach is more computationally expensive as it involves multiple non-rigid registrations. Also note that, as the goal is to use the estimated transformations to create plausible deformations, you may be able to relax the required registration accuracy.

Augmentation using intensity modifications

SimpleITK has many filters that are potentially relevant for data augmentation via modification of intensities. For example:

In [12]:
#Generate intensity modified images from the originals.
# Args:
#   image_list (containing SimpleITK images): The images whose intensities we modify.
#   output_prefix (string): output file name prefix (file name: output_prefixi_FilterName.output_suffix).
#   output_suffix (string): output file name suffix (file name: output_prefixi_FilterName.output_suffix).
augment_images_intensity <- function(image_list, output_prefix, output_suffix) {

  # Create a list of intensity modifying filters, which we apply to the given images
  num_filters <- 10
  index <- 1
  filter_list <- vector("list",num_filters)
  # Smoothing filters
  filter_list[[index]] <- SmoothingRecursiveGaussianImageFilter()
  index = index + 1
  filter_list[[index]] <- DiscreteGaussianImageFilter()
  index = index + 1
  filter_list[[index]] <- BilateralImageFilter()
  index = index + 1
  filter_list[[index]] <- MedianImageFilter()
  index = index + 1
  # Noise filters using default settings
  # Filter control via SetMean, SetStandardDeviation.
  filter_list[[index]] <- AdditiveGaussianNoiseImageFilter()
  index = index + 1
  # Filter control via SetProbability
  filter_list[[index]] <- SaltAndPepperNoiseImageFilter()
  index = index + 1
  # Filter control via SetScale
  filter_list[[index]] <- ShotNoiseImageFilter()
  index = index + 1
  # Filter control via SetStandardDeviation
  filter_list[[index]] <- SpeckleNoiseImageFilter()
  index = index + 1
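  # Recalculation of intensity values. The two instances differ in their alpha/beta settings
  # (the values below are illustrative assumptions).
  filter_list[[index]] <- AdaptiveHistogramEqualizationImageFilter()
  filter_list[[index]]$SetAlpha(1.0)
  filter_list[[index]]$SetBeta(0.0)
  index = index + 1
  filter_list[[index]] <- AdaptiveHistogramEqualizationImageFilter()
  filter_list[[index]]$SetAlpha(0.0)
  filter_list[[index]]$SetBeta(1.0)
  index = index + 1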
  # Used only for display purposes in this notebook.
  aug_image_lists <- lapply(seq_along(image_list), function(i, images) {
                       lapply(filter_list, function(filter) {
                         aug_image<- filter$Execute(images[[i]])
                         WriteImage(aug_image, paste0(output_prefix, i, '_',
                                    filter$GetName(), '.', output_suffix))
                         return(aug_image)
                       })
                     }, image_list)
  return(aug_image_lists)
}
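
To make the noise models concrete, here is a base R sketch of two of them applied to a plain matrix. The helpers add_gaussian_noise and salt_and_pepper are hypothetical and only approximate what the corresponding SimpleITK filters do; the parameter values are illustrative.

```r
# Hypothetical helper: add zero-mean (by default) Gaussian noise to every entry.
add_gaussian_noise <- function(m, mean = 0, sd = 1) {
  m + rnorm(length(m), mean, sd)
}

# Hypothetical helper: with the given probability, replace an entry by the
# lowest ("pepper") or highest ("salt") intensity, each equally likely.
salt_and_pepper <- function(m, probability = 0.01, low = min(m), high = max(m)) {
  u <- runif(length(m))
  out <- m
  out[u < probability/2] <- low
  out[u >= probability/2 & u < probability] <- high
  out
}

set.seed(42)  # reproducible illustration
image <- matrix(100, nrow = 32, ncol = 32)
image[1, 1] <- 0
image[32, 32] <- 255

noisy <- add_gaussian_noise(image, sd = 5)
speckled <- salt_and_pepper(image, probability = 0.2)

print(dim(noisy))
print(sort(unique(as.vector(speckled))))
```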

Modify the intensities of the original images using the set of SimpleITK filters described above. If we are working with 2D images the results will be displayed inline.

In [13]:
intensity_augmented_images <- augment_images_intensity(data, file.path(OUTPUT_DIR, 'intensity_aug'), 'mha')

# in 2D we join all of the images into 3D volumes which we use for display.
if(dimension==2) {
    all_volumes <- lapply(intensity_augmented_images, function(image_list) {
                            JoinSeries(lapply(image_list, function(img){Cast(img, "sitkFloat32")}))})
    invisible(lapply(all_volumes, Show))
}

Finally, you can easily create intensity variations that are specific to your domain, such as the spatially varying multiplicative and additive transformation shown below.

In [14]:
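#Modify the intensities using multiplicative and additive Gaussian bias fields.
mult_and_add_intensity_fields <- function(original_image) {
  # Gaussian image with same meta-information as original (size, spacing, direction cosine)
  # Sigma is half the image's physical size and mean is the center of the image. 
  # The scale (peak value) of 255 is GaussianSource's default.
  g_mult = GaussianSource(original_image$GetPixelID(),
                          original_image$GetSize(),
                          (original_image$GetSize() - 1)*original_image$GetSpacing()/2.0,
                          original_image$TransformContinuousIndexToPhysicalPoint(original_image$GetSize()/2.0),
                          255,
                          original_image$GetOrigin(), original_image$GetSpacing(), original_image$GetDirection())

  # Gaussian image with same meta-information as original (size, spacing, direction cosine)
  # Sigma is 1/8 the image's physical size and mean is at 1/16 of the size 
  g_add = GaussianSource(original_image$GetPixelID(),
                         original_image$GetSize(),
                         (original_image$GetSize() - 1)*original_image$GetSpacing()/8.0,
                         original_image$TransformContinuousIndexToPhysicalPoint(original_image$GetSize()/16.0),
                         255,
                         original_image$GetOrigin(), original_image$GetSpacing(), original_image$GetDirection())
  return(g_mult*original_image + g_add)
}
invisible(lapply(data, function(img){Show(mult_and_add_intensity_fields(img))}))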
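
The structure of the bias fields can be sketched on a plain matrix. In this base R sketch the helper gaussian_field is hypothetical. It assumes unit spacing, zero-based pixel coordinates and an unnormalized Gaussian that peaks at the scale value, and it then combines a broad multiplicative field with a narrow additive one.

```r
# Hypothetical helper: separable, unnormalized 2D Gaussian with peak value `scale`.
gaussian_field <- function(nx, ny, sigma, mean, scale = 255) {
  gx <- exp(-0.5 * ((seq_len(nx) - 1 - mean[1]) / sigma[1])^2)
  gy <- exp(-0.5 * ((seq_len(ny) - 1 - mean[2]) / sigma[2])^2)
  scale * outer(gx, gy)
}

# Constant test image of 65x65 pixels (physical extent 64 with unit spacing).
image <- matrix(1, nrow = 65, ncol = 65)

# Multiplicative field: sigma is half the physical size, centered on the image.
g_mult <- gaussian_field(65, 65, sigma = c(32, 32), mean = c(32, 32))
# Additive field: sigma is 1/8 of the physical size, mean near one corner.
g_add <- gaussian_field(65, 65, sigma = c(8, 8), mean = c(4, 4))

modified <- g_mult * image + g_add
print(dim(modified))
print(g_mult[33, 33])  # peak of the multiplicative field, at the image center
```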