Download data from: Here
In this data preparation step, we will:
- Create two lists containing the paths of the images and masks
- Split the lists into training, validation, and test sets (both steps are sketched below)
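As a minimal sketch of this step (all code in this write-up assumes Python with TensorFlow/Keras and scikit-learn; the directory names and the 80/10/10 split ratio below are illustrative assumptions, not fixed by the dataset):

```python
import os
from sklearn.model_selection import train_test_split

# Hypothetical directory layout; point these at wherever you unpacked the data.
image_dir, mask_dir = "data/images", "data/masks"
image_paths = sorted(os.path.join(image_dir, f) for f in os.listdir(image_dir))
mask_paths = sorted(os.path.join(mask_dir, f) for f in os.listdir(mask_dir))

# Hold out 20%, then split it evenly into validation and test sets (80/10/10).
train_x, rest_x, train_y, rest_y = train_test_split(
    image_paths, mask_paths, test_size=0.2, random_state=42)
val_x, test_x, val_y, test_y = train_test_split(
    rest_x, rest_y, test_size=0.5, random_state=42)
```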
The read_image function (sketched below) will:
- Read an image and its mask from their paths
- Convert the image and its mask to arrays
- Normalize the datasets
- Resize the image and its mask to the desired dimensions
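A sketch of read_image, assuming the files are PNGs and a 128 x 128 target size (both assumptions; adjust to your data):

```python
import tensorflow as tf

def read_image(image_path, mask_path, size=(128, 128)):
    """Read an image/mask pair, convert to arrays, normalize, and resize."""
    image = tf.io.decode_png(tf.io.read_file(image_path), channels=3)
    image = tf.image.resize(tf.cast(image, tf.float32) / 255.0, size)  # normalize to [0, 1]
    mask = tf.io.decode_png(tf.io.read_file(mask_path), channels=1)
    mask = tf.image.resize(mask, size, method="nearest")  # nearest neighbour keeps labels integral
    return image, mask
```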
We will be using the U-Net architecture to train our semantic segmentation model. The U-Net model is named for its U-shaped architecture; it was originally created for biomedical image segmentation tasks in 2015 but has since become a very popular choice for other semantic segmentation tasks.
U-Net builds on a previous architecture called the Fully Convolutional Network (FCN), which replaces the dense layers found in a typical CNN with a transposed convolution layer that upsamples the feature map back to the size of the original input image while preserving spatial information. This is necessary because dense layers destroy spatial information (the "where" of the image), which is an essential part of image segmentation tasks. An added bonus of using transposed convolutions is that the input size no longer needs to be fixed, as it does when dense layers are used.
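To make the upsampling concrete, here is a tiny shape check (the sizes are illustrative): a 2 x 2 transposed convolution with stride 2 doubles the height and width of a feature map:

```python
import tensorflow as tf

feature_map = tf.random.normal((1, 16, 16, 64))  # (batch, height, width, channels)
upsample = tf.keras.layers.Conv2DTranspose(filters=32, kernel_size=2, strides=2)
print(upsample(feature_map).shape)  # (1, 32, 32, 32): height and width doubled
```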
The U-Net architecture consists of:
1. Contracting path (Encoder containing downsampling steps):
The contracting path follows a regular CNN architecture, with convolutional layers, their activations, and pooling layers to downsample the image and extract its features. Images are first fed through several convolutional layers which reduce height and width, while growing the number of channels.
In detail, it consists of the repeated application of two 3 x 3 unpadded convolutions, each followed by a rectified linear unit (ReLU) and a 2 x 2 max pooling operation with stride 2 for downsampling. At each downsampling step, the number of feature channels is doubled.
During the contracting process, each block's convolution output is stored in a separate variable before the size reduction (pooling). These saved feature maps are passed to the corresponding expanding blocks during decoding as skip connections.
2. Expanding path (Decoder containing upsampling steps):
The expanding path performs the opposite operation of the contracting path, growing the image back to its original size, while shrinking the channels gradually.
In detail, each step in the expanding path upsamples the feature map using a 2 x 2 transposed convolution (up-convolution). This transposed convolution halves the number of feature channels while doubling the height and width of the image. Next is a concatenation with the correspondingly cropped feature map from the contracting path, and two 3 x 3 convolutions, each followed by a ReLU.
3. Final Feature Mapping Block: In the final layer, a 1 x 1 convolution is used to map each 64-component feature vector to the desired number of classes. The channel dimension of the previous layer corresponds to the number of filters used, so a 1 x 1 convolution lets you transform that dimension by choosing an appropriate number of filters. Applied to the last layer, this reduces the channel dimension to one channel per class.
The U-Net network has 23 convolutional layers in total.
To design our model, we will carry out the following steps (a sketch follows the list):
- Define a function for an encoding block. The function will return the next-layer output and the skip-connection output for the corresponding block in the model
- Define a function for a decoding block. This function will merge the skip-connection input with the previous layer's output, process it, and return an output
- Develop a model using both the encoding and decoding block outputs
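Below is a minimal sketch of these three steps, assuming TensorFlow/Keras. We use "same" padding for simplicity (the original paper uses unpadded convolutions), and the names encoder_block, decoder_block, and build_unet, along with the input shape and filter counts, are our own illustrative choices:

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_block(inputs, n_filters, max_pooling=True):
    """Two 3x3 convolutions + ReLU; returns the downsampled output
    and the pre-pooling activation used as the skip connection."""
    x = layers.Conv2D(n_filters, 3, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(n_filters, 3, padding="same", activation="relu")(x)
    skip = x  # saved before pooling and passed to the decoder
    if max_pooling:
        x = layers.MaxPooling2D(pool_size=2)(x)
    return x, skip

def decoder_block(inputs, skip, n_filters):
    """2x2 transposed convolution (halves channels, doubles H and W),
    concatenation with the skip connection, then two 3x3 convolutions."""
    x = layers.Conv2DTranspose(n_filters, 2, strides=2, padding="same")(inputs)
    x = layers.concatenate([x, skip])
    x = layers.Conv2D(n_filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(n_filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(128, 128, 3), n_classes=3, n_filters=64):
    inputs = layers.Input(input_shape)
    # Contracting path: channels double at each downsampling step
    c1, s1 = encoder_block(inputs, n_filters)
    c2, s2 = encoder_block(c1, n_filters * 2)
    c3, s3 = encoder_block(c2, n_filters * 4)
    c4, s4 = encoder_block(c3, n_filters * 8)
    # Bottleneck: no pooling
    b, _ = encoder_block(c4, n_filters * 16, max_pooling=False)
    # Expanding path: channels halve at each upsampling step
    d4 = decoder_block(b, s4, n_filters * 8)
    d3 = decoder_block(d4, s3, n_filters * 4)
    d2 = decoder_block(d3, s2, n_filters * 2)
    d1 = decoder_block(d2, s1, n_filters)
    # Final 1x1 convolution: one output channel per class
    outputs = layers.Conv2D(n_classes, 1, padding="same")(d1)
    return tf.keras.Model(inputs=inputs, outputs=outputs)

model = build_unet()
model.summary()
```

With five encoding blocks (two convolutions each), four decoding blocks (one transposed convolution and two convolutions each), and the final 1 x 1 convolution, this sketch matches the 23 convolutional layers noted above.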
Model evaluation is an integral part of the model development process. It helps us find the model that best represents our data and estimate how well that model will perform in the future. For classification tasks, precision and recall are popular metrics used alongside accuracy to evaluate model performance, since accuracy alone is not always sufficient to judge whether a model is optimal (especially when the dataset is skewed). The same rule applies to most dense prediction tasks like image segmentation, where the goal is to simplify and/or change the representation of an image into classes that are more meaningful and easier to analyze.
Since the goal of our model is to partition an input image into various classes, it is often difficult to know whether the model struggles to partition one or more classes optimally: such struggles do not always show up in the overall accuracy, nor can they easily be detected by eye. Hence, there is a need for supplementary metrics to evaluate model performance.
We will be using recall, precision, Intersection over Union (IoU), and F1-score as supplementary metrics to evaluate our model's performance. These metrics are computed by identifying the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) in the confusion matrix between the predicted segmentations and the ground-truth segmentations. The expressions for these metrics are defined as:
- Precision = TP/(TP + FP)
- Recall/Sensitivity = TP/(TP + FN)
- Intersection over Union (IoU)/Jaccard Similarity = TP/(TP + FP + FN)
- F1-score/Dice coefficient = 2 * ((Precision * Recall)/(Precision + Recall))
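The sketch below computes these metrics for a single class by counting TP, FP, and FN between a predicted mask and a ground-truth mask (the helper name and the epsilon guard are our own additions):

```python
import numpy as np

def segmentation_metrics(y_true, y_pred, class_id, eps=1e-7):
    """Precision, recall, IoU, and F1 for one class, from integer label masks."""
    tp = np.sum((y_pred == class_id) & (y_true == class_id))
    fp = np.sum((y_pred == class_id) & (y_true != class_id))
    fn = np.sum((y_pred != class_id) & (y_true == class_id))
    precision = tp / (tp + fp + eps)  # eps guards against empty classes
    recall = tp / (tp + fn + eps)
    iou = tp / (tp + fp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return precision, recall, iou, f1
```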
To carry out these evaluations, we will (see the sketch after this list):
- Create segmentations/masks of images in our dataset
- Evaluate predicted segmentations
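A sketch of both steps, assuming the trained model from above and hypothetical test_images/test_masks arrays produced by read_image:

```python
import numpy as np

# Create predicted masks: argmax over the class channel gives per-pixel labels.
pred_probs = model.predict(test_images)      # shape (N, H, W, n_classes)
pred_masks = np.argmax(pred_probs, axis=-1)  # shape (N, H, W)

# Evaluate the predicted segmentations class by class with the metrics above.
for class_id in range(pred_probs.shape[-1]):
    p, r, iou, f1 = segmentation_metrics(test_masks, pred_masks, class_id)
    print(f"class {class_id}: precision={p:.3f}, recall={r:.3f}, IoU={iou:.3f}, F1={f1:.3f}")
```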
Although our model achieves decent accuracy and IoU on the training, validation, and test datasets, visualizing how it performs on these datasets can give us additional insight.
Hence, we will (see the sketch after this list):
- Create a function to preprocess selected images and display the original image, its true mask, and its predicted mask
- Predict and compare masks of images in the training set
- Predict and compare masks of images in the validation set
- Predict and compare masks of images in the test set
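A sketch of the display helper, assuming matplotlib and the model above (the name display_prediction is our own):

```python
import numpy as np
import matplotlib.pyplot as plt

def display_prediction(image, true_mask, model):
    """Show an input image next to its true and predicted masks."""
    pred_mask = np.argmax(model.predict(image[np.newaxis, ...]), axis=-1)[0]
    for i, (title, item) in enumerate(
            zip(["Input Image", "True Mask", "Predicted Mask"],
                [image, true_mask, pred_mask])):
        plt.subplot(1, 3, i + 1)
        plt.title(title)
        plt.imshow(np.squeeze(item))
        plt.axis("off")
    plt.show()
```

Calling this helper on a few samples from each split carries out the three comparison steps above.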