Potential bug in NegativeLogLikelihood impl #150
@hweom sorry for the long delay. I see your point and would appreciate a PR for this issue with a test case; happy to review. In practice this is not destructive: all values are summed in the following few lines, so some values just get more weight and others are omitted.
Thanks for the answer. I'm still trying to wrap my head around the code. I think I understand why changing this part had no effect on the MNIST example -- since the NLL layer is the last one, its output is not used anywhere. Now I have a different question :) The definition of NLL that I found gives the formula

L = -Σ_n ln(p_{n, y_n})

(here p_{n, y_n} is the predicted probability of the true label y_n for sample n). However, `NegativeLogLikelihood::compute_output()` sums the negative probabilities themselves, i.e. -p_{n, y_n}, with no logarithm. And similarly, the input gradient in `compute_input_gradient()` is computed as

```rust
for (batch_n, &label_value) in native_labels.iter().enumerate() {
    let index = (num_classes * batch_n) + label_value as usize;
    writable_gradient[index] = -1f32;
}
```

which is essentially the gradient of -p (a constant -1) rather than the gradient of -ln(p) (which would be -1/p). FWIW, I tried changing the gradient line to

```rust
writable_gradient[index] = -1.0 / native_probabilities[index].max(0.001);
```

(I didn't change the output computation). Should the name of the layer be changed?
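For concreteness, here is a minimal, self-contained sketch contrasting the textbook NLL with what the layer currently computes. It is not Juice's actual API; the function names, the row-major layout assumption, and the clamping constant are illustrative.

```rust
// Textbook NLL: loss = -ln(p) for the true class, gradient = -1/p.
// `probs` holds softmax outputs laid out row-major: batch_size x num_classes.
fn nll_and_grad(probs: &[f32], labels: &[usize], num_classes: usize) -> (f32, Vec<f32>) {
    let mut grad = vec![0.0f32; probs.len()];
    let mut loss = 0.0f32;
    for (batch_n, &label) in labels.iter().enumerate() {
        let index = num_classes * batch_n + label;
        let p = probs[index].max(1e-3); // clamp to avoid dividing by ~0
        loss += -p.ln();                // -ln(p)
        grad[index] = -1.0 / p;         // d/dp of -ln(p)
    }
    (loss, grad)
}

// What the current layer effectively computes: loss = -p, gradient = -1.
fn negative_prob_and_grad(probs: &[f32], labels: &[usize], num_classes: usize) -> (f32, Vec<f32>) {
    let mut grad = vec![0.0f32; probs.len()];
    let mut loss = 0.0f32;
    for (batch_n, &label) in labels.iter().enumerate() {
        let index = num_classes * batch_n + label;
        loss += -probs[index]; // -p instead of -ln(p)
        grad[index] = -1.0;    // matches `writable_gradient[index] = -1f32`
    }
    (loss, grad)
}

fn main() {
    // One batch of two samples over three classes.
    let probs = vec![0.7, 0.2, 0.1, 0.1, 0.8, 0.1];
    let labels = vec![0usize, 1];
    println!("{:?}", nll_and_grad(&probs, &labels, 3));
    println!("{:?}", negative_prob_and_grad(&probs, &labels, 3));
}
```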
That is correct. The implementation is pretty old and was done by the original authors of the predecessor of Juice. The only explanation I have is looking at the expansion ln(x) ~= -x + (1/2)x^2 - (1/3)x^3 + (1/4)x^4 + ...; if you terminate after the first term, you get the above. This saves a bunch of computations. I added a commit that would improve the approximation up to …
Is this a Taylor expansion of ln(x)? I'm probably missing something obvious...
https://math.stackexchange.com/questions/585154/taylor-series-for-logx says almost the above; an offset of -1 or respectively +1 is missing, so I guess that's another issue.
You mean the most upvoted answer (https://math.stackexchange.com/a/585158)? Note that it has a formula for ln(1+x), not ln(x).
That's what I meant in my previous comment.
Well, if we take the Taylor expansion suggested by Wolfram, ln(x) ≈ (x-1) - (1/2)(x-1)^2 + (1/3)(x-1)^3 - ..., and simplify it, we get this (https://www.wolframalpha.com/input/?i=expand+-%28%28x-1%29+-+1%2F2%28x-1%29%5E2+%2B+1%2F3%28x-1%29%5E3%29): -(1/3)x^3 + (3/2)x^2 - 3x + 11/6. Or we can just add the offset and keep the math cleaner.
I think the offset is not really relevant for learning.
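For reference, here is a worked form of the expansion being discussed, assuming the usual Taylor series of ln around x = 1; this is a sketch of the argument, not material from the repository.

```latex
% Taylor series of ln(x) around x = 1:
%   ln(x) = (x-1) - (1/2)(x-1)^2 + (1/3)(x-1)^3 - ...
% Negating it and truncating after the first term:
%   -ln(x) ~= -(x-1) = 1 - x
% so the implemented loss -p agrees with -ln(p) to first order, up to the
% constant offset +1. A constant added to the loss does not change its
% gradient, which is why the offset does not matter for learning; the
% remaining difference is that d/dp[-ln(p)] = -1/p while d/dp[-p] = -1.
\[
  -\ln(x) = -(x-1) + \tfrac{1}{2}(x-1)^2 - \tfrac{1}{3}(x-1)^3 + \dots \approx 1 - x
\]
```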
Just started to explore deep learning and chose Juice as the starting framework, since I want to stick with Rust. Since I'm pretty new to the domain, it might be just my mistake.
I was looking at `NegativeLogLikelihood::compute_output()` and I think there is a bug. Instead of indexing the probabilities with just `label_value as usize`, the index should include the batch offset, i.e. `(num_classes * batch_n) + label_value as usize` (as `compute_input_gradient()` already does). Otherwise we're comparing all labels in the batch with the first output from the batch?
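A minimal sketch of the proposed indexing fix (illustrative only, not the exact Juice source; the names and the row-major layout assumption mirror the gradient snippet quoted earlier in the thread):

```rust
// `native_probabilities` is laid out row-major: batch_size x num_classes.
// Returns one (negative) probability per sample in the batch.
fn batch_losses(native_probabilities: &[f32], native_labels: &[u32], num_classes: usize) -> Vec<f32> {
    let mut writable_loss = Vec::with_capacity(native_labels.len());
    for (batch_n, &label_value) in native_labels.iter().enumerate() {
        // Buggy version: `native_probabilities[label_value as usize]`
        // always reads from the first sample's row.
        let index = (num_classes * batch_n) + label_value as usize;
        writable_loss.push(-native_probabilities[index]);
    }
    writable_loss
}
```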
Interesting that I've tried changing it and there was no noticeable effect on the MNIST example...