Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure and Reorganize project #185

Open
wants to merge 51 commits into
base: master
Choose a base branch
from
Open

Restructure and Reorganize project #185

wants to merge 51 commits into from

Conversation

glass-ships
Copy link
Member

@glass-ships glass-ships commented Sep 12, 2024

Restructures refl1d to house files together in folders related to their function/purpose

To Do:

  • Fix docs not building (since MODULES list assumes all files are top level apparently)
  • Double check docs are appropriately updated

@glass-ships
Copy link
Member Author

glass-ships commented Sep 13, 2024

This is currently failing for an interesting reason.
Installing the package doesn't seem to include the newly created folders, and I'm not entirely sure why.

$ pwd
.../envs/refl1d/lib/python3.11/site-packages/refl1d 

$ ls
fitplugin.py  __init__.py  lib/  main.py  names.py  __pycache__/  view/  webview/

Maybe @bmaranville or @pkienzle has some insight?

EDIT: resolved using [tools.setuptools.packages.find] where directive

@bmaranville
Copy link
Member

It looks like package discovery is disabled if we manually provide a list of packages in the pyproject.toml file, so we can either add all the new packages (folders) to that list, or remove it completely and let auto-discovery happen.

https://setuptools.pypa.io/en/latest/userguide/package_discovery.html#automatic-discovery

@bmaranville
Copy link
Member

I feel like the primary use of the refl1d library is to construct an Experiment object, each of which requires a Sample object and a Probe object, so having experiment.py and probe.py at the same level inside refl1d/models and sample-building stuff in a subfolder refl1d/models/sample/ doesn't seem to match the hierarchy implied in the class construction. Why not experiment.py in refl1d/ and then tools for constructing samples in refl1d/sample/ and probes in refl1d/probe/ ?

flowchart BT
    Sample --> Experiment 
    Probe --> Experiment
Loading

@glass-ships
Copy link
Member Author

I feel like the primary use of the refl1d library is to construct an Experiment object, each of which requires a Sample object and a Probe object, so having experiment.py and probe.py at the same level inside refl1d/models and sample-building stuff in a subfolder refl1d/models/sample/ doesn't seem to match the hierarchy implied in the class construction. Why not experiment.py in refl1d/ and then tools for constructing samples in refl1d/sample/ and probes in refl1d/probe/ ?

so that's similar to what i had in mind with this PR. the idea behind what i have so far is that experiment, probe, and sample, are all models of "things" that make up a particular instance refl1d project, rather than a literal representation of what's made up of what else, and that sample is a directory is that there are multiple "subtypes" of sample, whereas experiment and probe (along with instrument and fitproblem) are represented by single files, so there's not really anything else that would go into probe/ other than probe.py which would feel unnecessarily stuttery.

experiment could live at the top level, it just made sense to classify it as a model in the context of "python model representing a 'thing'". but we can definitely move some things around as it makes sense to

@bmaranville
Copy link
Member

Also, backends.py doesn't really go in fitting... I'm trying to think where that belongs. There was some importlib reason that it's not in the lib folder directly, but the calculations that go in the backends aren't really specific to the act of fitting, they are just accelerated version of the calculations of the reflectivity (which is done in Experiment), the profile (which is calculated based on the sample), etc.

The backend chooser is just there to select which version of the fast calculations is used in the various places around the library that need to go fast.

@glass-ships
Copy link
Member Author

Also, backends.py doesn't really go in fitting... I'm trying to think where that belongs. There was some importlib reason that it's not in the lib folder directly, but the calculations that go in the backends aren't really specific to the act of fitting, they are just accelerated version of the calculations of the reflectivity (which is done in Experiment), the profile (which is calculated based on the sample), etc.

The backend chooser is just there to select which version of the fast calculations is used in the various places around the library that need to go fast.

ah interesting, ok. my assumption was that it was choosing which backend was ultimately used to run the calculations as part of the fitting process. in that case, it might even make sense to just live at the root directory

Makefile Show resolved Hide resolved
@glass-ships
Copy link
Member Author

I just noticed there's an additional requirements.txt in doc/, is this necessary?

@bmaranville
Copy link
Member

I just noticed there's an additional requirements.txt in doc/, is this necessary?

That file is directly referenced by .readthedocs.yaml in the root folder, and is part of the template structure at https://github.com/readthedocs/tutorial-template/

@glass-ships
Copy link
Member Author

That file is directly referenced by .readthedocs.yaml in the root folder, and is part of the template structure at https://github.com/readthedocs/tutorial-template/

Interesting. It's possible it's actually redundant, I see the version in the template includes sphinx and sphinx-rtd, and is referenced as "dependencies required to build the docs. But ours just includes things like bumps, numpy, etc. which I assume aren't actually required to build the docs themselves? (I may be missing something), and we have sphinx listed as a dev dependency in our pyproject.toml

@bmaranville
Copy link
Member

I think you're right, it is redundant. Probably the .readthedocs.yaml could be updated so that it uses the requirements from pyproject.toml, but I'm not entirely sure how to do that

@glass-ships
Copy link
Member Author

other than that, this is ready for re-re-review

@glass-ships
Copy link
Member Author

Commented out replacing FitProblem docstring for bumps's version until that can be resolved separately.

Tests working, updated with main - should be ready for review

@@ -16,7 +16,7 @@

# from refl1d.uncertainty import show_errors
from refl1d.models.experiment import Experiment, ExperimentBase, MixedExperiment
import refl1d.models.probe
from refl1d.models.probe.probe import PolarizedNeutronProbe
Copy link
Member Author

@glass-ships glass-ships Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wanted, this could be:

from refl1d.models import PolarizedNeutronProbe

Similarly, previous line could be

from refl1d.models import Experiment, ExperimentBase, MixedExperiment

@glass-ships
Copy link
Member Author

@bmaranville slightly unrelated.
webview/client/package.json lists "vue-json-viewer": "^3.0.4"
but this npm package lists 2.22.2 as latest, and i don't see it imported anywhere in the client code... can it safely be removed?

@bmaranville
Copy link
Member

I think the piece that was using it was moved to the bumps codebase... and then we switched to a different library to do the same thing... so yes, removable.

@glass-ships
Copy link
Member Author

glass-ships commented Nov 12, 2024

@bmaranville looking at doc/guide/index.rst L94:
the PTFit class doesn't seem to exist anymore (seems to have been moved to bumps?) - how should this be modified?

Same thing with the _MaterialStacker class in models.sample.material.Scatterer

@bmaranville
Copy link
Member

@bmaranville looking at doc/guide/index.rst L94: the PTFit class doesn't seem to exist anymore (seems to have been moved to bumps?) - how should this be modified?

Same thing with the _MaterialStacker class in models.sample.material.Scatterer

I would just replace the PTFit with something like the amoeba fitter class, but I don't really know what _MaterialStacker was. That reference might be very old.

@glass-ships
Copy link
Member Author

I would just replace the PTFit with something like the amoeba fitter class

this is also not in refl1d, would that be bumps.fitters.SimplexFit?

@glass-ships
Copy link
Member Author

how are we feeling about this at the moment?

@bmaranville
Copy link
Member

I started running existing models with the restructured code, and the biggest observation I made is that the majority of them use functions that are not in refl1d.names. Most of the time, this is something related to implementing a custom data loader, such as for TOF data, that uses make_probe. Other times utility functions like bumps.util.make_seed are being used, or the FWHM2sigma function from refl1d.resolution.

Some of these are unnecessary - make_probe is a minuscule function that just returns one of the Probe subclasses, and they could be used directly (and are in the names namespace), and we could easily add something like FHWM2sigma to names.

So, the takeaway is that the majority of existing scripts would break with this change, and fixing them would be fairly easy for one of the developers, but a big pain in the neck for users.

The main reason I haven't pushed harder to get this merged is that the new folder structure still doesn't completely map onto the graph of how the modules work and interact - it is hard to sort them in this way, and there are many interconnections, and I don't think any folder structure can accurately represent this. The practical grouping of e.g. "this module has to do with probe stuff (building, calculating, using...)" is the best we can do. Do you feel like this meets the initial goals of the PR? Is it worth the cost to the users, of having to modify old scripts in order to have them work, which many of them will not know how to do?

@glass-ships
Copy link
Member Author

I do feel strongly that some sort of sensible grouping of modules is important, it's pretty common for things in one folder to interact with things in another folder and that doesn't really break any sense of logic imo, but it does make sense to, like you said, group things together at the very least in some way, ie (these modules relate to the probe, these modules relate to the sample, etc etc).
If there are any other changes to the folder structure you'd suggest, I'm all ears, even removing the models dir and moving everything underneath it to the top level as previously mentioned is sensible.

Breakage could be mitigated by adding whatever we need to to names to maintain consistent functionality.
A migration guide could also go a long way to helping, as well as informative deprecation warnings and error messages.

Personally, I'm ok with things breaking now and then if it's a side effect of improving the code base and making it more maintainable/easier to understand - these things do happen from time to time, what's more important is providing a helpful path forward for users/developers.

But again, that's just one developers opinion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants