Focused Ion Beam Scanning Electron Microscopy Image Segmentation¶
Summary:
- SimpleITK supports a large number of filters that facilitate classical segmentation algorithms (variety of thresholding algorithms, watersheds...).
- Once your data is segmented SimpleITK enables you to efficiently post process the segmentation (e.g. label distinct objects, analyze object shapes).
This notebook will illustrate the use of SimpleITK for segmentation of bacteria from a 3D Focused Ion Beam Scanning Electron Microscopy (FIB-SEM) image. The specific bacterium is bacillus subtilis, a rod shaped organism naturally found in soil and plants. The bacteria have been subjected to stress to initiate the process of forming an endospore. These endospores can be seen as a generally dark ellipsoid inside the individual bacterium.
Data provided courtesy of the Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health.
import SimpleITK as sitk
import pandas as pd
%matplotlib widget
import matplotlib.pyplot as plt
import gui
from math import ceil
%run update_path_to_download_script
from downloaddata import fetch_data as fdata
Load data¶
Load the 3D volume and display it.
img = sitk.ReadImage(fdata("fib_sem_bacillus_subtilis.mha"))
gui.MultiImageDisplay(image_list=[img], figure_size=(8, 4));
Fetching fib_sem_bacillus_subtilis.mha
Segmentation¶
To allow us to analyze the shape of whole bacteria we first need to segment them. We will do this in several steps:
- Separate the bacteria from the embedding resin background.
- Mark each potential bacterium with a unique label, to evaluate the segmentation.
- Remove small components and fill small holes using binary morphology operators (opening and closing).
- Use seed based watersheds to perform final segmentation.
- Remove bacterium that are connected to the image boundary.
Separate the bacteria from the background¶
Based on the visualization of the data above, it intuitively appears that the background and foreground are separable using a single intensity threshold. Our first step towards validating this observation is to plot the intensity distribution.
plt.figure()
plt.hist(sitk.GetArrayViewFromImage(img).flatten(), bins=100)
plt.show()
The histogram is bi-modal with a clear separation, which we have manually identified as having an intensity value of 120.
We can also use one of several binary threshold selection filters available in SimpleITK.
threshold_filters = {
"Otsu": sitk.OtsuThresholdImageFilter(),
"Triangle": sitk.TriangleThresholdImageFilter(),
"Huang": sitk.HuangThresholdImageFilter(),
"MaxEntropy": sitk.MaximumEntropyThresholdImageFilter(),
}
filter_selection = "Manual"
try:
thresh_filter = threshold_filters[filter_selection]
thresh_filter.SetInsideValue(0)
thresh_filter.SetOutsideValue(1)
thresh_img = thresh_filter.Execute(img)
thresh_value = thresh_filter.GetThreshold()
except KeyError:
thresh_value = 120
thresh_img = img > thresh_value
print("Threshold used: " + str(thresh_value))
gui.MultiImageDisplay(
image_list=[sitk.LabelOverlay(img, thresh_img)],
title_list=["Binary Segmentation"],
figure_size=(8, 4),
);
Threshold used: 120
Mark each potential bacterium with unique label and evaluate¶
stats = sitk.LabelShapeStatisticsImageFilter()
stats.Execute(sitk.ConnectedComponent(thresh_img))
# Look at the distribution of sizes of connected components (bacteria).
label_sizes = [stats.GetNumberOfPixels(l) for l in stats.GetLabels() if l != 1]
plt.figure()
plt.hist(label_sizes, bins=200)
plt.title("Distribution of Object Sizes")
plt.xlabel("size in pixels")
plt.ylabel("number of objects")
plt.show()
The histogram above shows tens of thousands of very small labels which are not visually detected by looking at the segmentation.
Remove small islands and holes¶
Using binary morphological operations we remove small objects using the opening operation and fill small holes using the closing operation. The use of opening and closing by reconstruction maintains the boundary of the original objects.
cleaned_thresh_img = sitk.BinaryOpeningByReconstruction(thresh_img, [10, 10, 10])
cleaned_thresh_img = sitk.BinaryClosingByReconstruction(
cleaned_thresh_img, [10, 10, 10]
)
gui.MultiImageDisplay(
image_list=[sitk.LabelOverlay(img, cleaned_thresh_img)],
title_list=["Cleaned Binary Segmentation"],
figure_size=(8, 4),
);
Check that the number of objects defined by the binary image is more reasonable.
stats = sitk.LabelShapeStatisticsImageFilter()
stats.Execute(sitk.ConnectedComponent(cleaned_thresh_img))
# Look at the distribution of sizes of connected components (bacteria).
label_sizes = [stats.GetNumberOfPixels(l) for l in stats.GetLabels() if l != 1]
plt.figure()
plt.hist(label_sizes, bins=200)
plt.title("Distribution of Object Sizes")
plt.xlabel("size in pixels")
plt.ylabel("number of objects")
plt.show()
After the morphological operations, our binary image seems to have a reasonable number of objects, but is this true? We next look at the unique objects defined by this binary segmentation (each object is marked with a unique color).
gui.MultiImageDisplay(
image_list=[sitk.LabelOverlay(img, sitk.ConnectedComponent(cleaned_thresh_img))],
title_list=["Cleaned Binary Segmentation"],
figure_size=(8, 4),
);
Seed based watershed segmentation¶
The bacteria appear to be segmented correctly from the background but not from each other. Using the visualization and histogram above we see that in 3D many of them are connected, even if on a slice by slice inspection they appear separate.
dist_img = sitk.SignedMaurerDistanceMap(
cleaned_thresh_img != 0,
insideIsPositive=False,
squaredDistance=False,
useImageSpacing=False,
)
radius = 10
# Seeds have a distance of "radius" or more to the object boundary, they are uniquely labelled.
seeds = sitk.ConnectedComponent(dist_img < -radius)
# Relabel the seed objects using consecutive object labels while removing all objects with less than 15 pixels.
seeds = sitk.RelabelComponent(seeds, minimumObjectSize=15)
# Run the watershed segmentation using the distance map and seeds.
ws = sitk.MorphologicalWatershedFromMarkers(dist_img, seeds, markWatershedLine=True)
ws = sitk.Mask(ws, sitk.Cast(cleaned_thresh_img, sitk.sitkUInt8))
Visualize the distance map, the unique seeds and final object segmentation.
gui.MultiImageDisplay(
image_list=[dist_img, sitk.LabelOverlay(img, seeds), sitk.LabelOverlay(img, ws)],
title_list=[
"Segmentation Distance",
"Watershed Seeds",
"Binary Watershed Labeling",
],
shared_slider=True,
horizontal=False,
figure_size=(6, 12),
);
Removal of objects touching the image boundary¶
We are not sure objects touching the image boundary are whole bacteria, so we remove them.
# The image has a small black border which we account for here.
bgp = sitk.BinaryGrindPeak((ws != 0) | (img == 0))
non_border_seg = sitk.Mask(ws, bgp == 0)
gui.MultiImageDisplay(
image_list=[sitk.LabelOverlay(img, non_border_seg)],
title_list=["Final Segmentation"],
figure_size=(8, 4),
);
Object Analysis¶
Once we have the segmented objects we look at their shapes and the intensity distributions inside the objects.
Note that sizes are in nanometers. ITK and consequently SimpleITK are agnostic of the actual measurement units. It is up to you as the developer to explicitly use the correct units and more importantly, DO NOT MIX UNITS.
We first compute all of the measurements we are interested in.
shape_stats = sitk.LabelShapeStatisticsImageFilter()
shape_stats.ComputeOrientedBoundingBoxOn()
shape_stats.Execute(non_border_seg)
intensity_stats = sitk.LabelIntensityStatisticsImageFilter()
intensity_stats.Execute(non_border_seg, img)
Insert the values into a pandas dataframe and display some descriptive statistics.
stats_list = [
(
shape_stats.GetPhysicalSize(i),
shape_stats.GetElongation(i),
shape_stats.GetFlatness(i),
shape_stats.GetOrientedBoundingBoxSize(i)[0],
shape_stats.GetOrientedBoundingBoxSize(i)[2],
intensity_stats.GetMean(i),
intensity_stats.GetStandardDeviation(i),
intensity_stats.GetSkewness(i),
)
for i in shape_stats.GetLabels()
]
cols = [
"Volume (nm^3)",
"Elongation",
"Flatness",
"Oriented Bounding Box Minimum Size(nm)",
"Oriented Bounding Box Maximum Size(nm)",
"Intensity Mean",
"Intensity Standard Deviation",
"Intensity Skewness",
]
# Create the pandas data frame and display descriptive statistics.
stats = pd.DataFrame(data=stats_list, index=shape_stats.GetLabels(), columns=cols)
stats.describe()
Volume (nm^3) | Elongation | Flatness | Oriented Bounding Box Minimum Size(nm) | Oriented Bounding Box Maximum Size(nm) | Intensity Mean | Intensity Standard Deviation | Intensity Skewness | |
---|---|---|---|---|---|---|---|---|
count | 3.900000e+01 | 39.000000 | 39.000000 | 39.000000 | 39.000000 | 39.000000 | 39.000000 | 39.000000 |
mean | 1.324555e+09 | 2.525905 | 1.111606 | 948.123157 | 2362.926609 | 162.185805 | 18.569868 | -0.178636 |
std | 2.510872e+08 | 0.388671 | 0.039272 | 68.175613 | 328.825362 | 6.075755 | 1.875691 | 0.335983 |
min | 7.793283e+08 | 1.898059 | 1.027848 | 821.116863 | 1740.945289 | 146.025127 | 15.623551 | -0.787958 |
25% | 1.168211e+09 | 2.198528 | 1.079683 | 912.250436 | 2136.888025 | 156.540422 | 16.769297 | -0.427365 |
50% | 1.348983e+09 | 2.511455 | 1.113341 | 955.223044 | 2318.288835 | 164.302889 | 18.589151 | -0.234148 |
75% | 1.476934e+09 | 2.772892 | 1.142791 | 1000.945909 | 2618.720666 | 166.456203 | 19.521086 | 0.019480 |
max | 1.873997e+09 | 3.424463 | 1.181038 | 1109.047735 | 3063.513603 | 171.697966 | 23.390004 | 0.655212 |
Create a plot to investigate the relationship, possible correlations, between volume and object shape characteristics (elongation, flatness, principal moments).
fig, axes = plt.subplots(nrows=len(cols), ncols=2, figsize=(6, 4 * len(cols)))
axes[0, 0].axis("off")
stats.loc[:, cols[0]].plot.hist(ax=axes[0, 1], bins=25)
axes[0, 1].set_xlabel(cols[0])
axes[0, 1].xaxis.set_label_position("top")
for i in range(1, len(cols)):
c = cols[i]
bar = stats.loc[:, [c]].plot.hist(
ax=axes[i, 0], bins=20, orientation="horizontal", legend=False
)
bar.set_ylabel(stats.loc[:, [c]].columns.values[0])
scatter = stats.plot.scatter(ax=axes[i, 1], y=c, x=cols[0])
scatter.set_ylabel("")
# Remove axis labels from all plots except the last (they all share the labels)
if i < len(cols) - 1:
bar.set_xlabel("")
scatter.set_xlabel("")
# Adjust the spacing between plot columns and set the plots to have a tight
# layout inside the figure.
plt.subplots_adjust(wspace=0.4)
plt.tight_layout()
Finally, we visualize a lineup of the bacteria using a coordinate system that is defined by the oriented bounding box enclosing each of them.
bacteria_labels = shape_stats.GetLabels()
bacteria_volumes = [shape_stats.GetPhysicalSize(label) for label in bacteria_labels]
num_images = 5 # number of bacteria images we want to display
bacteria_labels_volume_sorted = [
label for _, label in sorted(zip(bacteria_volumes, bacteria_labels))
]
resampler = sitk.ResampleImageFilter()
aligned_image_spacing = [10, 10, 10] # in nanometers
for label in bacteria_labels_volume_sorted[0:num_images]:
aligned_image_size = [
int(
ceil(
shape_stats.GetOrientedBoundingBoxSize(label)[i]
/ aligned_image_spacing[i]
)
)
for i in range(3)
]
direction_mat = shape_stats.GetOrientedBoundingBoxDirection(label)
aligned_image_direction = [
direction_mat[0],
direction_mat[3],
direction_mat[6],
direction_mat[1],
direction_mat[4],
direction_mat[7],
direction_mat[2],
direction_mat[5],
direction_mat[8],
]
resampler.SetOutputDirection(aligned_image_direction)
resampler.SetOutputOrigin(shape_stats.GetOrientedBoundingBoxOrigin(label))
resampler.SetOutputSpacing(aligned_image_spacing)
resampler.SetSize(aligned_image_size)
obb_img = resampler.Execute(img)
# Change the image axes order so that we have a nice display.
obb_img = sitk.PermuteAxes(obb_img, [2, 1, 0])
gui.MultiImageDisplay(image_list=[obb_img], title_list=[f"OBB_{label}"])