Refactored AD in box model #507
base: main
Conversation
Codecov Report
Base: 74.93% // Head: 74.61% // Decreases project coverage by -0.32%
Additional details and impacted files
@@ Coverage Diff @@
## main #507 +/- ##
==========================================
- Coverage 74.93% 74.61% -0.32%
==========================================
Files 17 17
Lines 5366 5370 +4
==========================================
- Hits 4021 4007 -14
- Misses 1345 1363 +18
☔ View full report at Codecov.
It seems fine from what I can tell; I like that you simplified the functions. Why do you want to remove the whole example on using Enzyme to just calculate a derivative, though? I understand that the full sensitivity calculation is more interesting, but I think it's useful to show some simpler calculations with Enzyme, especially considering this is intended to be a tutorial. If it's just a matter of rewriting this part with the new functions, I'm happy to do that.
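For context, a minimal sketch of the kind of plain derivative calculation being discussed might look like the following (the function `f` and the evaluation point are made up for illustration and are not taken from the box model):

```julia
using Enzyme

# Hypothetical scalar function standing in for a single model quantity.
f(x) = x^2 + sin(x)

# Reverse-mode derivative of f at x = 1.0; no shadow buffers are needed
# here because both the input and the output are scalars.
df = autodiff(Reverse, f, Active, Active(1.0))[1][1]
# df ≈ 2 * 1.0 + cos(1.0)
```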
From a pedagogical standpoint, I am uneasy with this example, because …
You're right, it is counterintuitive. I started using the "old, now, new" notation because this was how the Fortran code I was modelling mine on did it, but it really doesn't make a whole lot of sense here. I'm guessing you already figured this out, but the outputs of the forward function shouldn't really be considered as being at different time steps; rather, one is the value before the smoother and one is after the smoother has been applied. The fact that the adjoint variables further confused this is my fault. Unless I'm misunderstanding what you're referring to? Maybe it would be worth it for me to go through and adjust this notation to make it less confusing? I still think that having a plain derivative example is useful, especially to see what happens with the shadow outputs (if I'm using that term correctly; I think this is what you call the placeholders in the autodiff call) when you use Enzyme.
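As a rough illustration of those shadow outputs, here is a sketch with a made-up in-place forward function (not the actual box model code): the shadow of the output is the placeholder that gets seeded with the adjoint, and the shadow of the input is where the gradient ends up.

```julia
using Enzyme

# Toy in-place forward function standing in for one model step/smoother;
# it writes its result into `out`.
function forward!(out, x)
    for i in eachindex(x)
        out[i] = sin(x[i]) + 0.5 * x[i]^2
    end
    return nothing
end

x    = rand(3)
dx   = zeros(3)   # shadow of the input: the gradient accumulates here
out  = zeros(3)
dout = ones(3)    # shadow of the output: seeded with the adjoint of `out`

# After the call, dx holds (∂out/∂x)' * dout.
autodiff(Reverse, forward!, Const, Duplicated(out, dout), Duplicated(x, dx))
```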
I thought users who want to get a gradient using Enzyme.jl might be overwhelmed by the notion of shadow copies etc. In one example, there should maybe be only one forward function, then Enzyme is applied, and it all works magically. However, I could re-add the code for a single step. My main goal was to get rid of AD-specific functions, to remove the impression that the code requires some special massaging, which became obsolete after the latest changes in Enzyme. In any case, feel free to push a more consistent variable naming. I see that I also added some more confusion (e.g., …)
Gotcha. Correct me if I'm wrong, but won't people need to work with Duplicated if they want the gradient of a function that has vector output? Meaning they'd have to learn the whole shadow business despite having a simple function. And yeah, I'll try to fix the notation!
Sorry for all the comments, but I just sat down and really looked at the updated functions. I think we're on two different pages about what the tutorial was supposed to do: it seems like it was edited to be efficient for this one specific example, but it's not really using the adjoint method anymore. I don't really see a way to store all the adjoint variables and use them for other purposes, and the goal was mainly to teach about using Enzyme for the adjoint method. The function that I wrote and called ad_calc was not intended to be something special for Enzyme; rather, it was calculating the adjoint variables one by one and giving the potential to store any or all of them.
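To make the distinction concrete, a sketch of that step-by-step adjoint calculation could look like the following (the time step, the state, and the adjoint seed are all hypothetical; only the structure of a forward sweep plus a reverse sweep that stores each adjoint is the point):

```julia
using Enzyme

# Hypothetical single model step, advancing `u` in place.
function timestep!(u, param)
    for i in eachindex(u)
        u[i] = u[i] + 0.1 * (param - u[i])
    end
    return nothing
end

nsteps = 5
u = [1.0, 2.0, 3.0]
states = [copy(u)]                 # forward sweep: store the trajectory
for _ in 1:nsteps
    timestep!(u, 0.5)
    push!(states, copy(u))
end

# Reverse sweep: propagate the adjoint variable back one step at a time
# and keep every intermediate adjoint so it can be reused elsewhere.
adj = ones(length(u))              # assumed seed: adjoint of the final state
adjoints = [copy(adj)]
for k in nsteps:-1:1
    u_k = copy(states[k])          # state entering step k
    autodiff(Reverse, timestep!, Const, Duplicated(u_k, adj), Const(0.5))
    push!(adjoints, copy(adj))     # adjoint variable at step k
end
```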
There is also nothing wrong with having two examples, each focusing on a different pedagogical point.
Changing the names would be helpful for sure. Thanks @swilliamson7. I agree that two separate implementations for two different purposes, based on the same underlying physical model, make sense.
Cool, sounds good! |
No worries. This is how we butchered it to use it in |
@michel2323 what do you want to do with this branch? |
I would add |
@michel2323 I don't have any real opinion, it's been a super long time since I looked at this code...feel free to use it however you guys think is best! |
Refactored AD in the box model so that it doesn't require any special AD implementation and Enzyme takes care of all the checkpoints. This example also works with Checkpointing.jl; however, it currently still requires Zygote.jl due to ChainRules.jl. We can either add Checkpointing.jl with Zygote.jl to this example now, or wait for EnzymeRules.jl. It's your call.
@swilliamson7 Can you check whether the example still makes sense to you? I checked that it gives the same numerical results, but I don't know whether I screwed up the description somewhere.
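For readers of this thread, the "no special AD implementation" point boils down to differentiating straight through the model's time loop with a single Enzyme call. A minimal sketch under made-up names (not the actual box model driver) could look like this:

```julia
using Enzyme

# Hypothetical end-to-end driver: integrate the model for `nsteps`
# and return a scalar cost, with no AD-specific code inside.
function cost(u0, nsteps)
    u = copy(u0)
    for _ in 1:nsteps
        for i in eachindex(u)
            u[i] = u[i] + 0.1 * (0.5 - u[i])
        end
    end
    return sum(abs2, u)
end

u0  = [1.0, 2.0, 3.0]
du0 = zeros(3)

# One reverse-mode call over the full time loop; Enzyme stores the
# intermediate values it needs internally, so no hand-written adjoint
# or checkpointing code appears in the model itself.
autodiff(Reverse, cost, Active, Duplicated(u0, du0), Const(10))
# du0 now holds ∂cost/∂u0
```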