# DONE
# BRANCH TODO: layer-based API
#    Convolutional networks
#        convolutional layers
#        pooling layers
#    Recurrent neural networks
#    implement early stopping
#    LSTM networks
#    SVM
# TODO
# look at the Julia Computing paper on Zygote; look at the article on integrating Zygote and XGBoost
# allow choice of what to return from training: basic excludes data (default), full returns everything,
#     none returns nothing; maybe allow individual choices
# pooling: do we need asymmetrical pooling? pooling doesn't work with padding or same
# should stride allow asymmetrical strides? probably...
# pooling doesn't work with example dimension. does it need to?
# should we split cached values from basic params? should we combine bn parameters with weights?
# save a model_def more easily
# reuse a model def and run it with different data
# should we have a bigger model obj that includes function stacks, hyper_parameters, and weights?
# verify the formula for Xavier initialization: decide whether to use normal or uniform (offer both?)
# implement Kaiming initialization (illustrative sketches of both initializations appear near the end of this module)
# enable plot display of train cost but not test cost
# fix the accuracy calc to be more general or provide several versions
# what's the difference between shuffling the data to make different batches vs.
#     choosing batches randomly (from data shuffled just once)?
#     we'd see all the data in each epoch with both approaches:
#         the composition of the batches differs vs.
#         seeing the same batches in a different order
# change alphamod to alpha, change alpha to alpha_in??
# fix testgrad.jl: check_grads, prep_check for changes to predict using model and bn params
# fix nnpredict or get rid of it
# implement gradient clipping (see the illustrative clipping sketch near the end of this module)
# add ability to choose BN per layer (inc. for output layer or not)
# add ability to put bn after linear or after activation
# investigate Zygote.jl for automatic differentiation
# test if we can use batch normalization with full batch training
# check what happens to test cost with regularization: it gets very big
# should we have a normalize data option built-in (as we do)? or make the user do it when
# preparing their own data?
# export onehot -- decide how to transpose the data (a one-hot sketch appears near the end of this module)
# create a cost function method without L2 regularization
# create a cost prediction method that embeds the feedfwd predictions
# figure out the memory requirements of pre-allocation. is there some way to reduce them and still get the speed benefit?
# in testgrad, test for categorical data or provide other way to do onehot encoding if needed
# utilities for plotdef will break on old plotdefs because they are now called stats
# use goodness function to hold either accuracy or r_squared?
# implement incremental load and train for massive datasets?
# set an explicit data size variable for both test and train for preallocation?
# accuracy for logistic or other single class classifiers should allow -1 or 0 for "bad"
# implement precision and recall (see the sketch near the end of this module)
# stats on individual regression parameters
# check that the matfname file exists and that its type is .mat
# implement one vs. all for logistic classification with multiple classes
# set a directory for training stats (keep out of code project directory)
# try different versions of ensemble predictions_vector
# augment MNIST data by perturbing the images
# check for type stability: @code_warntype pisum(500,10000)
# still lots of memory allocations despite the pre-allocation
# You can devectorize r -= d[j]*A[:,j] as r .-= d[j] .* A[:,j]
# to get rid of some more temporaries.
# As @LutfullahTomak said sum(A[:,j].*r) should devectorize as dot(view(A,:,j),r)
# to get rid of all of the temporaries in there.
# To use an infix operator, you can use \cdot, as in view(A,:,j)⋅r.
# figure out memory use between train set and minibatch set
# DEFER
# implement layer normalization with or without minibatches. should be easy, but side effects
# DOCUMENT
# say which properties in hyper_parameters are "calculated" and which are directly from user input
"""
Module GeneralNN:
Includes the following functions to run directly:
- train() -- train neural networks with up to 9 hidden layers
- setup_params() -- load hyper_parameters and create a Hyper_parameters object
- extract_data() -- extract MNIST data from Matlab (.mat) files
- shuffle_data!() -- shuffles training examples (do this before minibatch training)
- test_score() -- cost and accuracy given test data and saved theta
- save_params() -- save all model parameters
- load_params() -- load all model parameters
- predictions_vector() -- predictions given x and saved theta
- accuracy() -- calculates accuracy of predictions compared to actual labels
- normalize_inputs!() -- normalize via standardization (mean, sd) or minmax
- normalize_replay!() --
- nnpredict() -- calculate predictions from previously trained parameters and new data
- display_mnist_digit() -- show a digit
- wrong_preds() -- return the indices of wrong predictions against supplied correct labels
- right_preds() -- return the indices of correct predictions against supplied correct labels
- plot_output() -- plot learning and cost curves for training and test data from previously saved training runs
Enter using .GeneralNN to use the module.
These data structures are used to hold parameters and data:
- Wgts holds theta, bias, delta_th, delta_b, theta_dims, output_layer, k
- Model_data holds inputs, targets, a, z, z_norm, epsilon, gradient_function
- Batch_norm_params holds gam (gamma), bet (beta) for batch normalization and intermediate
data used for backprop with batch normalization: delta_gam, delta_bet,
delta_z_norm, delta_z, mu, stddev, mu_run, std_run
- Hyper_parameters holds user input hyper_parameters and some derived parameters.
- Batch_view holds views on Model_data for the subset of data used in minibatches.
(This is not exported as there is no way to use it outside of the backprop loop.)
- Model_def holds the string names and functions that define the layers of the training
model.
"""
module GeneralNN
# ----------------------------------------------------------------------------------------
# data structures for neural network
export
Batch_view,
Wgts,
Model_data,
Batch_norm_params,
Hyper_parameters,
Model_def
# functions you can use
export
train,
setup_params,
pretrain,
prepredict,
test_score,
save_params,
load_params,
accuracy,
extract_data,
shuffle_data!,
normalize_inputs!,
nnpredict,
display_mnist_digit,
wrong_preds,
right_preds,
plot_output,
dodigit,
check_grads
using MAT
using JLD2
using FileIO
using Statistics
using Distributions
using Random
using Printf
using Dates
using LinearAlgebra
using SparseArrays
# for TOML support for arguments file
using TOML # we might need private TOML for Julia < 1.3
using Debugger
using Plots # Plots broken by Julia 1.3
gr()
import Measures: mm # for some plot dimensioning
import Base.show # for new methods to pretty-print our objects
# be careful: order of includes matters!
include("nn_data_structs.jl")
include("training_loop.jl")
include("layer_functions.jl")
include("setup_params.jl")
include("setup_data.jl")
include("setup_model.jl")
include("utilities.jl")
include("training.jl")
include("testgrad.jl")
const l_relu_neg = .01 # leaky ReLU negative-side slope; const makes the type constant, but the value can still be changed
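
# Weight-initialization sketch for the Xavier and Kaiming TODO items above. The formulas
# are the standard Glorot (uniform and normal) and He scalings; the *_sketch names and the
# (n_out, n_in) weight-shape convention are assumptions, and these helpers are not used by
# the model-setup code.
function xavier_init_sketch(n_out::Integer, n_in::Integer; uniform::Bool=true)
    if uniform
        limit = sqrt(6.0 / (n_in + n_out))                 # Glorot uniform bound
        return rand(n_out, n_in) .* (2 * limit) .- limit
    else
        return randn(n_out, n_in) .* sqrt(2.0 / (n_in + n_out))   # Glorot normal std
    end
end

function kaiming_init_sketch(n_out::Integer, n_in::Integer)
    return randn(n_out, n_in) .* sqrt(2.0 / n_in)          # He et al. scaling for ReLU units
end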
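# Gradient-clipping sketch for the TODO item above: rescale a collection of gradient arrays
# when their global L2 norm exceeds a threshold. Hypothetical helper with an assumed calling
# convention (a vector of gradient arrays); not exported and not used by the training loop.
function clip_gradients_sketch!(grads::AbstractVector{<:AbstractArray}, maxnorm::Real)
    globalnorm = sqrt(sum(sum(abs2, g) for g in grads))    # L2 norm across all gradient arrays
    if globalnorm > maxnorm
        scale = maxnorm / globalnorm
        for g in grads
            g .*= scale                                    # rescale each gradient in place
        end
    end
    return globalnorm
end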
end # module GeneralNN