Semantic Segmentation

Udacity - Self-Driving Car NanoDegree

Gif: Semantic Segmentation

Overview

This repository contains a TensorFlow implementation of a Fully Convolutional Network (FCN) used to label image pixels in the context of semantic scene understanding:

Semantic Scene Segmentation

The implementation is based on the Fully Convolutional Networks for Semantic Segmentation paper by Evan Shelhamer, Jonathan Long and Trevor Darrell (the original Caffe implementation can be found here).

FCN VGG16 Architecture

The model uses a VGG16 network as the encoder; a decoder is then added to upsample the feature maps back to the original image size, using 1x1 convolutions followed by transposed convolutions. Skip connections from earlier layers are added to recover finer spatial information.
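As a rough sketch (layer handles and kernel sizes here are illustrative, not necessarily the exact ones used in main.py), such a decoder could be wired up in TensorFlow 1.x as follows:

import tensorflow as tf

def fcn8_decoder(vgg_layer3_out, vgg_layer4_out, vgg_layer7_out, num_classes):
    # 1x1 convolutions reduce the depth of each encoder output to num_classes
    conv7_1x1 = tf.layers.conv2d(vgg_layer7_out, num_classes, 1, padding='same')
    conv4_1x1 = tf.layers.conv2d(vgg_layer4_out, num_classes, 1, padding='same')
    conv3_1x1 = tf.layers.conv2d(vgg_layer3_out, num_classes, 1, padding='same')

    # Upsample the deepest layer by 2 and add the skip connection from layer 4
    up7 = tf.layers.conv2d_transpose(conv7_1x1, num_classes, 4, strides=2, padding='same')
    skip4 = tf.add(up7, conv4_1x1)

    # Upsample by 2 again and add the skip connection from layer 3
    up4 = tf.layers.conv2d_transpose(skip4, num_classes, 4, strides=2, padding='same')
    skip3 = tf.add(up4, conv3_1x1)

    # Final upsampling by 8 brings the logits back to the input resolution
    return tf.layers.conv2d_transpose(skip3, num_classes, 16, strides=8, padding='same')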

Getting Started

This project was implemented using TensorFlow, and you'll need a set of dependencies in order to run the code.

Given the complexity of the model a GPU is strongly suggested for training; a good and relatively cheap option is an EC2 instance on AWS. For example, a p2.xlarge instance is a good candidate for this type of task (you'll have to ask for an increase in the limits for this instance type). Alternatively, a cheaper option (the one I used during training) is the GPU graphics instance g3s.xlarge: the instance is relatively slower, but its GPU (an M60) is newer and faster than the K80 on the p2.xlarge, even though it has less memory (8 GB vs 12 GB).

You can use the official Deep Learning AMI from Amazon that contains most of the required dependencies (See https://docs.aws.amazon.com/dlami/latest/devguide/gs.html) aside from tqdm.

The main.py script can be run as follows:

$ python main.py [flags]

Where flags can be set to the following (a sketch of how these could be declared is shown after the list):

  • [--data_dir]: The folder containing the training data (default ./data)
  • [--runs_dir]: The folder where the output is saved (default ./runs)
  • [--model_folder]: The folder where the model is saved/loaded (default ./models/[generated_name])
  • [--epochs]: The number of epochs (default 80)
  • [--batch_size]: The batch size (default 25)
  • [--dropout]: The dropout probability (default 0.5)
  • [--learning_rate]: The learning rate (default 0.0001)
  • [--l2_reg]: The amount of L2 regularization (default 0.001)
  • [--scale]: True if scaling should be applied to layers 3 and 4 of VGG (default True)
  • [--early_stopping]: The number of epochs after which the training is stopped if the loss didn't improve (default 4)
  • [--seed]: Integer used to seed random ops for reproducibility (default None)
  • [--cpu]: If True, disables the GPU (default None)
  • [--tests]: If True runs the tests (default True)
  • [--train]: If True runs the training (default True), if a model checkpoint exists in the model_folder the weights will be reloaded
  • [--image]: Image path to run inference for (default None)
  • [--video]: Video path to run inference for (default None)
  • [--augment]: Path to the target folder where to save augmented data from data_dir (default None)
  • [--serialize]: Path to a non-existing folder where the .pb version of the checkpoint saved during training will be stored (default None)
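The actual flag parsing lives in main.py; purely as an illustrative sketch, a subset of the flags above could be declared with argparse along these lines:

import argparse

parser = argparse.ArgumentParser(description='FCN8-VGG16 semantic segmentation')
parser.add_argument('--data_dir', default='./data', help='Folder containing the training data')
parser.add_argument('--runs_dir', default='./runs', help='Folder where the output is saved')
parser.add_argument('--epochs', type=int, default=80, help='Number of training epochs')
parser.add_argument('--batch_size', type=int, default=25, help='Batch size')
parser.add_argument('--learning_rate', type=float, default=0.0001, help='Learning rate')
parser.add_argument('--l2_reg', type=float, default=0.001, help='Amount of L2 regularization')
args = parser.parse_args()

Running python main.py --epochs=10 --batch_size=10 would then override the corresponding defaults.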

Tensorboard

The script will save summaries for Tensorboard in the logs folder:

$ tensorboard --samples_per_plugin images=0 --logdir=logs

The summaries include the training loss, accuracy and intersection over union (IOU) metrics. It will also save images with the predicted result:

Tensorboard
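A minimal sketch of how such summaries might be registered (the tensor arguments and names are placeholders, not the exact ones used in the script):

import tensorflow as tf

def add_summaries(loss, accuracy, mean_iou, prediction_images, log_dir, graph):
    # Scalar summaries for the metrics tracked during training
    tf.summary.scalar('loss', loss)
    tf.summary.scalar('accuracy', accuracy)
    tf.summary.scalar('iou', mean_iou)
    # Image summary with the predicted segmentation overlaid on the input
    tf.summary.image('prediction', prediction_images, max_outputs=3)

    summary_op = tf.summary.merge_all()
    writer = tf.summary.FileWriter(log_dir, graph)
    return summary_op, writer

During training the merged summary is evaluated alongside the loss and written out with writer.add_summary(summary, step).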

Examples

Training

An example of running a training session for 10 epochs with a batch size of 10 and a learning rate of 0.001, saving the model into models\my_model:

$ python main.py --tests=false --epochs=10 --batch_size=10 --learning_rate=0.001 --model_folder=models\\my_model

Processing Image

An example of processing a single image image.png using a model saved into models\my_model:

$ python main.py --tests=false --model_folder=models\\my_model --image=image.png

Processing Video

An example of processing a video video.mp4 using a model saved into models\my_model:

$ python main.py --tests=false --model_folder=models\\my_model --video=video.mp4

Video

Dataset augmentation

An example of augmenting the dataset in the data folder and saving the result in data\augmented (expects the training data to be in data\data_road\training):

$ python main.py --tests=false --data_dir=data --augment=data\\augmented

Serializing model

An example of serializing a model to a protocol buffer (.pb) in models\my_model\serialized from a checkpoint in models\my_model:

$ python main.py --tests=false --model_folder=models\\my_model --serialize=models\\my_model\\serialized
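Under the hood, producing a .pb from a checkpoint typically looks like the following sketch (the output node name is an assumption, not necessarily the one used by main.py):

import tensorflow as tf
from tensorflow.python.framework import graph_util

def freeze_checkpoint(model_folder, output_node, pb_path):
    checkpoint = tf.train.latest_checkpoint(model_folder)
    saver = tf.train.import_meta_graph(checkpoint + '.meta')
    with tf.Session() as sess:
        # Restore the trained weights and bake them into the graph as constants
        saver.restore(sess, checkpoint)
        frozen = graph_util.convert_variables_to_constants(
            sess, sess.graph_def, [output_node])
    # Write the frozen graph to a protocol buffer file
    with tf.gfile.GFile(pb_path, 'wb') as f:
        f.write(frozen.SerializeToString())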

Dataset

In order to train the network we used the Kitti Road dataset, which can be downloaded from here. It contains both training and testing images; the ground truth images for the training set are labelled with the correct pixel categorization (road vs non-road):

Kitti Dataset
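For reference, a one-hot road/background mask can be derived from a ground truth image roughly as follows (assuming the Kitti gt_image_2 convention, where non-road pixels are painted red):

import numpy as np

def road_mask(gt_image):
    # Non-road pixels are marked red (255, 0, 0) in the Kitti ground truth
    background_color = np.array([255, 0, 0])
    gt_bg = np.all(gt_image == background_color, axis=2)
    gt_bg = gt_bg.reshape(*gt_bg.shape, 1)
    # One-hot encoding per pixel: [background, road]
    return np.concatenate((gt_bg, np.invert(gt_bg)), axis=2)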

Augmentation

The Kitti dataset contains 289 labelled samples. In order to improve the model performance it can easily be augmented; the repository contains a Python script that simply mirrors the images and applies a random brightness adjustment:

Kitti Augmented Dataset
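A rough sketch of that kind of augmentation (the exact brightness range used by the script may differ):

import numpy as np

def augment(image, gt_image, rng=np.random):
    # Horizontal mirror; the ground truth has to be flipped the same way
    flipped = np.fliplr(image)
    flipped_gt = np.fliplr(gt_image)

    # Random brightness: scale the pixel values and clip back to the valid range
    factor = rng.uniform(0.6, 1.4)
    brightened = np.clip(flipped.astype(np.float32) * factor, 0, 255).astype(np.uint8)

    return brightened, flipped_gt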

Training and Testing

The training was performed with various hyperparameters, starting from the following baseline:

  • Epochs: 50
  • Batch Size: 15
  • Learning Rate: 0.001
  • Dropout: 0.5
  • L2 Regularization: 0.001
  • Scaling: False

Note that scaling is a technique described in the original implementation, used when they perform what they call "at-once" training: the pooling layers 3 and 4 from the VGG16 model are scaled before the 1x1 convolution is applied (see https://github.com/shelhamer/fcn.berkeleyvision.org/blob/1305c7378a9f0ab44b2c936f4d60e4687e3d8743/voc-fcn8s-atonce/net.py#L65).
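In TensorFlow this amounts to multiplying the pooled activations by a small constant before the 1x1 convolutions; a sketch, using the scale factors from the original at-once definition linked above:

import tensorflow as tf

def scaled_1x1(vgg_layer3_out, vgg_layer4_out, num_classes):
    # Scale factors taken from the original voc-fcn8s-atonce definition
    pool3_scaled = tf.multiply(vgg_layer3_out, 0.0001, name='pool3_scaled')
    pool4_scaled = tf.multiply(vgg_layer4_out, 0.01, name='pool4_scaled')

    # The decoder's 1x1 convolutions are then applied to the scaled tensors
    conv3_1x1 = tf.layers.conv2d(pool3_scaled, num_classes, 1, padding='same')
    conv4_1x1 = tf.layers.conv2d(pool4_scaled, num_classes, 1, padding='same')
    return conv3_1x1, conv4_1x1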

Baseline

Various experiments with different configurations were needed in order to tune the model:

Training Loss

The following shows a sample of images produced with the various configurations:

Hyperparameters Tuning

As we can see, scaling smooths the result and augmenting the dataset helped in producing more accurate results:

Baseline
Augmented Dataset, Scaling ON

Using the base learning rate (without decay) the model would converge but stop learning after around 30-40 epochs. When lowering the learning rate on the augmented dataset we could train for 80 epochs, which yielded the best accuracy:

Baseline
Augmented Dataset, Scaling ON and Learning Rate 0.0001

The parameters used for the final training (in one shot):

  • Epochs: 80
  • Batch Size: 25
  • Learning Rate: 0.0001
  • Dropout: 0.5
  • L2 Regularization: 0.001
  • Scaling: True
