
Refactored AD in box model #507

Open · wants to merge 1 commit into main
Conversation

michel2323
Collaborator

Refactored the AD in the box model so that it doesn't require any special AD implementation and Enzyme takes care of all the checkpoints. This example also works with Checkpointing.jl; however, that currently still requires Zygote.jl due to ChainRules.jl. We can either add Checkpointing.jl with Zygote.jl to this example now or wait for EnzymeRules.jl. It's your call.

@swilliamson7 Can you check whether the example still makes sense to you? I verified that it gives the same numerical results, but I may well have screwed up the description.

@codecov-commenter

codecov-commenter commented Oct 11, 2022

Codecov Report

Base: 74.93% // Head: 74.61% // Decreases project coverage by -0.31% ⚠️

Coverage data is based on head (5e685be) compared to base (2570fe8).
Patch has no changes to coverable lines.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #507      +/-   ##
==========================================
- Coverage   74.93%   74.61%   -0.32%
==========================================
  Files          17       17
  Lines        5366     5370       +4
==========================================
- Hits         4021     4007      -14
- Misses       1345     1363      +18
```

| Impacted Files             | Coverage Δ                  |
|----------------------------|-----------------------------|
| src/compiler/orcv2.jl      | 61.90% <0.00%> (-10.48%) ⬇️ |
| src/compiler/validation.jl | 62.06% <0.00%> (-0.63%) ⬇️  |
| src/compiler.jl            | 73.75% <0.00%> (-0.12%) ⬇️  |


@swilliamson7
Collaborator

It seems fine from what I can tell; I like that you simplified the functions. Why do you want to remove the whole example on using Enzyme to just calculate a derivative, though? I understand that the full sensitivity calculation is more interesting, but I think it's useful to show some simpler calculations with Enzyme, especially considering this is intended to be a tutorial. If it's just a matter of rewriting this part with the new functions, I'm happy to do it.

@sriharikrishna
Collaborator

From a pedagogical standpoint, I am uneasy with this example, because

```julia
function forward_func(state, fld_old, fld_now, dt, M)
```

is differentiated using

```julia
autodiff(
    forward_func,
    Duplicated(state_out, dout_old),
    Duplicated([Tbar; Sbar], din_now),
    Duplicated([Tbar; Sbar], din_old),
    10 * day,
    M,
)
```

I got completely lost because the order of the formal arguments (fld_old, fld_now) and of the actual arguments (din_now, din_old) is counterintuitive. I had a lot of trouble getting the call right within Checkpointing.jl.

@swilliamson7
Collaborator

You're right, it is counterintuitive. I started using the "old, now, new" notation because that's how the Fortran code I was modeling mine on did it, but it doesn't make a whole lot of sense here. I'm guessing you already figured this out, but the outputs of the forward function shouldn't really be thought of as belonging to different time steps; rather, one is the value before the smoother and the other is the value after the smoother has been applied. The fact that the adjoint variables further confused this is my fault. Unless I'm misunderstanding what you're referring to?

Maybe it would be worth it for me to go through and adjust this notation to make it less confusing? I still think that having a plain derivative example is useful, especially to see what happens with the shadow outputs (if I'm using that term correctly; I think this is what you call the placeholders in the autodiff call) when you use Enzyme.
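For concreteness, here is a minimal sketch of the shadow mechanics in Enzyme's current API, using a made-up in-place function rather than the box-model code: the shadow of an input accumulates the gradient, while the shadow of an output carries the seed and is consumed by the reverse pass.

```julia
using Enzyme

# made-up in-place function standing in for the model step
function square!(out, x)
    out[1] = x[1]^2
    return nothing
end

x, dx = [3.0], [0.0]       # dx (input shadow) will receive the gradient
out, dout = [0.0], [1.0]   # dout (output shadow) is the seed

autodiff(Reverse, square!, Const, Duplicated(out, dout), Duplicated(x, dx))
@show dx   # [6.0]; dout has been zeroed by the reverse pass
```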

@michel2323
Collaborator Author

michel2323 commented Oct 12, 2022

I thought users who want to get a gradient using Enzyme.jl might be overwhelmed by the notion of shadow copies etc. In an example, there should maybe be only one forward function; then Enzyme is applied, and it all works magically. However, I could re-add the code for a single step. My main goal was to get rid of the AD-specific functions, and with them the impression that the code requires some special massaging; that requirement became obsolete after the latest changes in Enzyme.

In any case, feel free to push more consistent variable naming. I see that I also added some more confusion (e.g., Duplicated(state_out, dout_old)).
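A minimal sketch of that shape, assuming Enzyme's current API and an invented toy time loop in place of the box model: one forward function, one autodiff call, and Enzyme handles the intermediate states of the loop internally.

```julia
using Enzyme

# invented stand-in for the refactored forward run: integrate a toy
# model for nsteps and return a scalar diagnostic
function run_model(u0, nsteps)
    u = copy(u0)
    for _ in 1:nsteps
        u = u .+ 0.01 .* (u .- u .^ 3)
    end
    return sum(u)
end

u0, du0 = [0.5, 1.0], zeros(2)
# one call differentiates through the whole loop; no AD-specific scaffolding
autodiff(Reverse, run_model, Active, Duplicated(u0, du0), Const(10))
@show du0   # gradient of the final diagnostic w.r.t. the initial state
```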

@swilliamson7
Collaborator

Gotcha. Correct me if I'm wrong, but won't people need to work with Duplicated if they want the gradient of a function that has vector output? Meaning they'd have to learn the whole shadow business despite having a simple function.

And yeah, I'll try to fix the notation!
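To illustrate the point about vector output: with an in-place, vector-valued function, each reverse sweep is seeded with one output direction through the shadow, so a full Jacobian takes one autodiff call per output component. A sketch with invented names:

```julia
using Enzyme

# invented example: vector input, vector output, written in place
function f!(y, x)
    y[1] = x[1] * x[2]
    y[2] = sin(x[1])
    return nothing
end

x = [1.0, 2.0]
J = zeros(2, 2)
for i in 1:2                     # one reverse sweep per output component
    dx = zeros(2)
    dy = zeros(2); dy[i] = 1.0   # seed the i-th output in the shadow
    autodiff(Reverse, f!, Const, Duplicated(zeros(2), dy), Duplicated(x, dx))
    J[i, :] .= dx                # row i of the Jacobian
end
@show J
```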

@swilliamson7
Collaborator

Sorry for all the comments, but I just sat down and really looked at the updated functions. I think we're on two different pages about what the tutorial was supposed to do: it seems like it was edited to be efficient for this one specific example, but it's not really using the adjoint method anymore. I don't see a way to store all the adjoint variables and use them for other purposes, and the goal was mainly to teach about using Enzyme for the adjoint method. The function I wrote and called ad_calc was not intended to be something special for Enzyme; rather, it calculated the adjoint variables one by one and made it possible to store any or all of them.
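The structure being described (the original ad_calc code is not shown in this thread, so this is a loose reconstruction with invented names) is roughly: run the model forward storing every state, then call autodiff once per step in reverse, keeping the entire adjoint trajectory for later use.

```julia
using Enzyme

# invented stand-in for one model step, in place
function step!(u_new, u_old)
    @. u_new = u_old + 0.01 * (u_old - u_old^3)
    return nothing
end

n = 10
states = [zeros(2) for _ in 1:n+1]
states[1] .= [0.5, 1.0]
for k in 1:n                          # forward sweep, storing every state
    step!(states[k+1], states[k])
end

adjoints = [zeros(2) for _ in 1:n+1]
adjoints[n+1] .= 1.0                  # seed on the final state
for k in n:-1:1                       # reverse sweep, one autodiff per step
    autodiff(Reverse, step!, Const,
             Duplicated(copy(states[k+1]), copy(adjoints[k+1])),
             Duplicated(states[k], adjoints[k]))
end
# adjoints now holds the full adjoint trajectory, available for reuse
```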

@vchuravy
Member

There's also nothing speaking against having two examples, each focusing on a different pedagogical goal.

@sriharikrishna
Collaborator

Changing the names would be helpful for sure. Thanks @swilliamson7. I agree that two separate implementations for two different purposes, based on the same underlying physical model, make sense.

@swilliamson7
Collaborator

Cool, sounds good!

@michel2323
Collaborator Author

No worries. This is how we butchered it to use it in Checkpointing.jl. @vchuravy I think I'll bring the heat.jl example over from Checkpointing.jl for some variety. This can be closed then, I guess.

@vchuravy
Member

vchuravy commented Apr 1, 2024

@michel2323 what do you want to do with this branch?

@michel2323
Collaborator Author

michel2323 commented Apr 1, 2024

I would add Checkpointing.jl and make it an example of using Enzyme on a time-dependent model. @swilliamson7 I see your comment here. Would that change be okay with you? Since this is on the Enzyme page, it would be helpful to focus on Enzyme's API and less on the model itself.
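For reference, the shape that might take, sketched loosely from Checkpointing.jl's documentation of the period; the macro, the Revolve constructor, and the toy model below are all assumptions and may not match any particular release.

```julia
using Checkpointing
using Zygote   # still the outer driver via ChainRules, per the discussion above

mutable struct BoxModel          # invented minimal stand-in for the model state
    u::Vector{Float64}
end

function advance(model::BoxModel)
    @. model.u = model.u + 0.01 * (model.u - model.u^3)
    return nothing
end

function run(model::BoxModel, scheme, nsteps)
    # the macro wraps the time loop; the scheme decides which iterations are
    # snapshotted and which are recomputed during the reverse pass
    @checkpoint_struct scheme model for i in 1:nsteps
        advance(model)
    end
    return sum(model.u)
end

nsteps, snaps = 100, 10
revolve = Revolve{BoxModel}(nsteps, snaps)   # binomial checkpointing schedule
g = Zygote.gradient(m -> run(m, revolve, nsteps), BoxModel([0.5, 1.0]))
```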

@swilliamson7
Collaborator

@michel2323 I don't have any real opinion; it's been a super long time since I looked at this code... feel free to use it however you guys think is best!
