Let's say that you are building a home security system. You have collected a bunch of images, and have annotated them. So you have a few columns:
- `camera_blocked` - whether the camera is blocked
- `door_open` - whether the door of the apartment is open (has meaning only when the camera is not blocked)
- `door_locked` - whether the door of the apartment is locked (has meaning only when the door is closed)
- `person_present` - whether a person is present (has meaning only when the door is open)
- `face_x1`, `face_y1`, `face_w`, `face_h` - the coordinates of the face of the person (have meaning only when a person is present)
- `body_x1`, `body_y1`, `body_w`, `body_h` - the coordinates of the body of the person (have meaning only when a person is present)
- `facial_characteristics` - represents characteristics of the face, like color, etc. (has meaning only when a person is present)
- `shirt_type` - represents a specific type of shirt (e.g. white shirt, etc.)
Let's say that you also could not annotate all of the images, so we will put `nan` when there is no annotation.
Here's an example of the dataframe with the data you have obtained:
|   | camera_blocked | door_open | person_present | door_locked | face_x1 | face_y1 | face_w | face_h | facial_characteristics | body_x1 | body_y1 | body_w | body_h | shirt_type | img |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | nan | 0 | 0 | 1 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0.jpg |
1 | nan | 1 | 1 | nan | 22 | 4 | 12 | 12 | 0,1 | 22 | 15 | 12 | 29 | 2 | 1.jpg |
2 | nan | 0 | 0 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 2.jpg |
3 | nan | 1 | 1 | nan | 20 | 3 | 12 | 12 | 0,1 | 21 | 15 | 10 | 29 | 1 | 3.jpg |
4 | nan | 1 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 4.jpg |
5 | nan | 1 | 1 | nan | 22 | 2 | 12 | 12 | 0,1 | 22 | 13 | 11 | 30 | 2 | 5.jpg |
6 | 1 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 6.jpg |
7 | 0 | 1 | 1 | nan | 24 | 2 | 12 | 12 | 2 | 25 | 12 | 13 | 31 | 2 | 7.jpg |
8 | 0 | 1 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 8.jpg |
9 | 0 | 1 | 1 | nan | 20 | 0 | 12 | 12 | 0,2 | 20 | 10 | 10 | 28 | 4 | 9.jpg |
10 | 0 | 1 | 1 | nan | 24 | 0 | 12 | 12 | 0 | 25 | 13 | 11 | 31 | 3 | 10.jpg |
11 | 0 | 1 | 1 | nan | 23 | 6 | 12 | 12 | 0,1 | 23 | 19 | 11 | 31 | 1 | 11.jpg |
12 | 0 | 0 | 0 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 12.jpg |
13 | 0 | 1 | 1 | nan | 22 | 1 | 12 | 12 | 2 | 20 | 11 | 13 | 29 | 2 | 13.jpg |
14 | 1 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 14.jpg |
15 | 0 | 1 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 15.jpg |
16 | 0 | 1 | 1 | nan | 22 | 0 | 12 | 12 | 0 | 22 | 10 | 11 | 28 | 1 | 16.jpg |
17 | 0 | 1 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 17.jpg |
18 | 0 | 1 | 1 | nan | 24 | 1 | 12 | 12 |  | 22 | 11 | 13 | 28 | 2 | 18.jpg |
19 | 0 | 0 | 0 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 19.jpg |
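Such a dataframe could, for example, be loaded with pandas (a sketch; the file name `annotations.csv` is just for illustration):

```python
import pandas as pd

# Hypothetical file name; any source that yields the columns above works.
df = pd.read_csv('annotations.csv')
```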
Now, how would you approach this? Let's summarize the options below:
- Train a neural network for every small task - this will definitely work, but it will be extremely heavy (you would end up with 14 different neural networks), and every model will extract similar low-level features. It has huge memory and computational requirements, so it ends up very expensive.
- Use a pretrained face/body detector and train a neural network for every other task - this suffers from the same issues as before: there are still going to be a lot of neural networks. Moreover, you don't know if the pretrained detector is suitable for your use case, since detectors are usually pretrained on a large number of classes. Also, notice that after we have detected the face we also want to predict the facial characteristics, and after the body we want to predict the shirt type - which means we would have to train these separately, or understand the detector in detail to know how to modify it.
- A multi-task learning approach, where some initial features are extracted and, from a certain point on, there is a branch for every task. This is the most lightweight option and the best one in terms of quality (multi-task learning is well known to improve generalization), but it is not trivial to implement. Notice, for example, that when the camera is blocked we should not backpropagate the losses of the other tasks, since there is no way for us to know where the person would be, etc. The model has to learn conditional objectives, which are not easy to keep track of for several reasons (see the masking sketch right after this list).
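To make the third option concrete, here is a minimal sketch (plain PyTorch, not `dnn_cool`) of what masking the backpropagation for a preconditioned task looks like:

```python
import torch
import torch.nn.functional as F

def masked_bce(logits: torch.Tensor, targets: torch.Tensor,
               precondition: torch.Tensor) -> torch.Tensor:
    """BCE loss restricted to samples whose precondition holds.

    `precondition` is a bool mask, e.g. `~camera_blocked` for the
    `door_open` task: samples where the task is undefined contribute
    no gradient at all.
    """
    if not precondition.any():
        # No valid samples in this batch: return a zero that is still
        # attached to the graph so .backward() works unconditionally.
        return logits.sum() * 0.0
    return F.binary_cross_entropy_with_logits(logits[precondition],
                                              targets[precondition])
```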
This library, `dnn_cool`, aims to make the conditional-objectives approach much easier, which is why it is named Deep Neural Networks for Conditional Objectives Oriented Learning.
Assume that the variable `df` holds the annotated data. Then we have to create a `dnn_cool.project.Project`, where we first specify which columns should be treated as output tasks, and what type of values each column holds.
```python
from dnn_cool.converters import Converters  # assumed import path
from dnn_cool.project import Project

output_col = ['camera_blocked', 'door_open', 'person_present', 'door_locked',
              'face_x1', 'face_y1', 'face_w', 'face_h',
              'body_x1', 'body_y1', 'body_w', 'body_h',
              'facial_characteristics', 'shirt_type']

converters = Converters()
type_guesser = converters.type

type_guesser.type_mapping['camera_blocked'] = 'binary'
type_guesser.type_mapping['door_open'] = 'binary'
type_guesser.type_mapping['person_present'] = 'binary'
type_guesser.type_mapping['door_locked'] = 'binary'
type_guesser.type_mapping['face_x1'] = 'continuous'
type_guesser.type_mapping['face_y1'] = 'continuous'
type_guesser.type_mapping['face_w'] = 'continuous'
type_guesser.type_mapping['face_h'] = 'continuous'
type_guesser.type_mapping['body_x1'] = 'continuous'
type_guesser.type_mapping['body_y1'] = 'continuous'
type_guesser.type_mapping['body_w'] = 'continuous'
type_guesser.type_mapping['body_h'] = 'continuous'
type_guesser.type_mapping['img'] = 'img'
type_guesser.type_mapping['shirt_type'] = 'category'
type_guesser.type_mapping['facial_characteristics'] = 'multilabel'
```
Then we have to specify how we would convert the values from a given dataframe column to actual tensor values. For example, the column with the image filename is converted into a tensor by reading the file from disk, binary values are just converted to a bool tensor (with missing values marked), and continuous values are normalized into [0, 1]. Note that we can also specify a converter directly for a given column (skipping its type) by using `values_converter.col_mapping`, but we will describe this in more detail later.
```python
values_converter = converters.values

values_converter.type_mapping['img'] = imgs_from_disk_converter
values_converter.type_mapping['binary'] = binary_value_converter
values_converter.type_mapping['continuous'] = bounded_regression_converter
```
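For intuition, a value converter is just a callable that maps a dataframe column to a tensor. Here is a minimal sketch of what a bounded-regression-style converter could look like (not the library's actual implementation):

```python
import numpy as np
import torch

# A sketch only: dnn_cool ships its own converters; this just shows the
# shape of the contract - a callable from a dataframe column to a tensor.
def bounded_regression_converter_sketch(column):
    values = column.to_numpy(dtype=np.float64)  # nan where unannotated
    max_val = np.nanmax(values)                 # normalize into [0, 1]
    return torch.as_tensor(values / max_val, dtype=torch.float32)
```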
Now we tell how a given type of values is converted to a `Task`. A `Task` is a collection of a PyTorch `nn.Module`, which holds the weights specific to the task, the activation of the task, the decoder of the task, and the loss function of the task.
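Conceptually (this is not `dnn_cool`'s actual class, just a sketch of what a task bundles together):

```python
from dataclasses import dataclass
from typing import Callable
import torch.nn as nn

@dataclass
class TaskSketch:
    module: nn.Module     # task-specific weights (e.g. a linear head)
    activation: Callable  # e.g. sigmoid for binary tasks
    decoder: Callable     # turns activated outputs into decisions
    loss: nn.Module       # e.g. BCEWithLogitsLoss for binary tasks
```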
```python
task_converter = converters.task

task_converter.type_mapping['binary'] = binary_classification_task
task_converter.type_mapping['continuous'] = bounded_regression_task
```
```python
project = Project(df,
                  input_col='img',
                  output_col=output_col,
                  converters=converters,
                  project_dir='./high-level-project')
```
Perfect! Now it's time to start adding `TaskFlow`s - a `TaskFlow` is basically a function that describes the dependencies between the tasks. `TaskFlow` actually extends `Task`, so you can use a `TaskFlow` inside another `TaskFlow`, etc. Let's first create a `TaskFlow` for the face-related tasks:
```python
@project.add_flow
def face_regression(flow, x, out):
    out += flow.face_x1(x.face_localization)
    out += flow.face_y1(x.face_localization)
    out += flow.face_w(x.face_localization)
    out += flow.face_h(x.face_localization)
    out += flow.facial_characteristics(x.features)
    return out
```
Here, `flow` is the object which holds all tasks added so far, `x` is a dict-like object which holds the features for the different branches, and `out` is the variable in which the final result is accumulated.
Now we can add a `TaskFlow` for the body-related tasks as well:
```python
@project.add_flow
def body_regression(flow, x, out):
    out += flow.body_x1(x.body_localization)
    out += flow.body_y1(x.body_localization)
    out += flow.body_w(x.body_localization)
    out += flow.body_h(x.body_localization)
    out += flow.shirt_type(x.features)
    return out
```
Since these two flows are already added, let's group them into a `TaskFlow` for person-related tasks:
```python
@project.add_flow
def person_regression(flow, x, out):
    out += flow.face_regression(x)
    out += flow.body_regression(x)
    return out
```
And now let's implement the full flow of tasks:
```python
@project.add_flow
def full_flow(flow, x, out):
    out += flow.camera_blocked(x.features)
    out += flow.door_open(x.features) | (~out.camera_blocked)
    out += flow.door_locked(x.features) | (~out.door_open)
    out += flow.person_present(x.features) | out.door_open
    out += flow.person_regression(x) | out.person_present
    return out
```
Here you can notice the `|` operator, which is used to declare a precondition (read it as "given", e.g. "door open, given that the camera is not blocked"). Let's now get the full flow for the project and have a look at some of its methods:
```python
flow = project.get_synthetic_full_flow()

flow.get_loss()  # returns a loss function that uses the children's loss functions
flow.get_per_sample_loss()  # returns a loss function that yields a loss item for every sample; useful for interpreting results
flow.torch()  # returns an `nn.Module` that uses the children's modules, according to the logic described in the task flow
flow.get_dataset()  # returns a PyTorch `Dataset`, which includes the preconditions needed to know which weights should be updated
flow.get_metrics()  # returns a list of metrics, which consists of the metrics of its children
flow.get_treelib_explainer()  # returns an object that, when called, draws a tree of the decision making based on the task flow
flow.get_decoder()  # returns a decoder, which decodes all children tasks
flow.get_activation()  # returns an activation function, which invokes the activations of its children
flow.get_evaluator()  # returns an evaluator, which evaluates every task (given that its respective precondition is satisfied)
```
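For instance, the dataset and loss returned by the flow can be plugged into standard PyTorch tooling (a sketch; how the dataset items are collated into batches depends on the library):

```python
from torch.utils.data import DataLoader

# Sketch: wire the flow's dataset and loss into standard PyTorch tooling.
dataset = flow.get_dataset()
loader = DataLoader(dataset, batch_size=32, shuffle=True)
criterion = flow.get_loss()
```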
Now let's create a model and use the module provided by the flow.
```python
import torch.nn as nn


class SecurityModule(nn.Module):

    def __init__(self):
        super().__init__()
        # Shared low-level feature extractor
        self.seq = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(2),
            nn.Conv2d(128, 128, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, kernel_size=5),
            nn.AvgPool2d(2),
            nn.ReLU(inplace=True),
        )
        # Branch for general features (used by the classification tasks)
        self.features_seq = nn.Sequential(
            nn.Conv2d(256, 256, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Branch for the face localization tasks
        self.face_localization_seq = nn.Sequential(
            nn.Conv2d(256, 256, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Branch for the body localization tasks
        self.body_localization_seq = nn.Sequential(
            nn.Conv2d(256, 256, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.flow_module = flow.torch()  # the task-flow module from above

    def forward(self, x):
        res = {}
        common = self.seq(x['img'])
        res['features'] = self.features_seq(common)
        res['face_localization'] = self.face_localization_seq(common)
        res['body_localization'] = self.body_localization_seq(common)
        res['gt'] = x.get('gt')  # ground truth for preconditions (training only)
        return self.flow_module(res)
```
As you can see, the model starts with shared features and then splits into 3 branches: `features`, `face_localization` and `body_localization`. After that, it uses the module obtained from `flow.torch()` to get the final result, passing the ground truth along for the preconditions (only needed when training).
Now, to train the model you can use any framework you like (or a plain PyTorch training loop), but `dnn_cool` has a utility for working with Catalyst, so we will show how to use it:
```python
model = SecurityModule()
runner = project.runner(model=model, runner_name='experiment_run')
runner.train(num_epochs=10)
```
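If you prefer a plain PyTorch loop instead of the Catalyst utility, a minimal sketch could look like the following (assuming `loader` and `criterion` were obtained from the flow as shown earlier; the exact signature expected by the flow's loss function is an assumption here):

```python
import torch

# A sketch of a plain-PyTorch alternative to the Catalyst runner; the
# exact call convention of the flow's loss is assumed, not confirmed.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
model.train()
for epoch in range(10):
    for batch in loader:
        optimizer.zero_grad()
        outputs = model(batch)            # batch carries 'img' and 'gt'
        loss = criterion(outputs, batch)  # assumed (outputs, targets) signature
        loss.backward()
        optimizer.step()
```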
Great, we have found the best weights for the model! But for binary classification tasks, multi-label classification tasks, etc., we still have to tune the thresholds applied after the sigmoid. To do that, just call:
```python
runner.infer()  # dumps the predictions, targets and per-task interpretations in the directory for the run
tuned_params = runner.tune()  # tunes the decision thresholds per task
```
This calls the tuners for every task and selects the best threshold for each. Now let's evaluate the model with the tuned thresholds:
```python
evaluation_df = runner.evaluate()
evaluation_df
```
Here's an example output on a synthetic dataset:
|   | task_path | metric_name | metric_res | num_samples |
|---|---|---|---|---|
0 | camera_blocked | accuracy | 1 | 996 |
1 | camera_blocked | f1_score | 1 | 996 |
2 | camera_blocked | precision | 1 | 996 |
3 | camera_blocked | recall | 1 | 996 |
4 | door_open | accuracy | 1 | 902 |
5 | door_open | f1_score | 1 | 902 |
6 | door_open | precision | 1 | 902 |
7 | door_open | recall | 1 | 902 |
8 | door_locked | accuracy | 1 | 201 |
9 | door_locked | f1_score | 1 | 201 |
10 | door_locked | precision | 1 | 201 |
11 | door_locked | recall | 1 | 201 |
12 | person_present | accuracy | 1 | 701 |
13 | person_present | f1_score | 1 | 701 |
14 | person_present | precision | 1 | 701 |
15 | person_present | recall | 1 | 701 |
16 | person_regression.face_regression.face_x1 | mean_absolute_error | 0.013935 | 611 |
17 | person_regression.face_regression.face_y1 | mean_absolute_error | 0.0268986 | 611 |
18 | person_regression.face_regression.face_w | mean_absolute_error | 0.0115682 | 611 |
19 | person_regression.face_regression.face_h | mean_absolute_error | 0.0121426 | 611 |
20 | person_regression.face_regression.facial_characteristics | accuracy | 0.996727 | 611 |
21 | person_regression.body_regression.body_x1 | mean_absolute_error | 0.00877354 | 611 |
22 | person_regression.body_regression.body_y1 | mean_absolute_error | 0.0188446 | 611 |
23 | person_regression.body_regression.body_w | mean_absolute_error | 0.020874 | 611 |
24 | person_regression.body_regression.body_h | mean_absolute_error | 0.0145986 | 611 |
25 | person_regression.body_regression.shirt_type | accuracy_1 | 1 | 611 |
26 | person_regression.body_regression.shirt_type | accuracy_3 | 1 | 611 |
27 | person_regression.body_regression.shirt_type | accuracy_5 | 1 | 611 |
28 | person_regression.body_regression.shirt_type | f1_score | 1 | 611 |
29 | person_regression.body_regression.shirt_type | precision | 1 | 611 |
30 | person_regression.body_regression.shirt_type | recall | 1 | 611 |