GPU usage with schema and merge orders #431

Open
jaketanderson opened this issue Mar 8, 2022 · 0 comments
@jaketanderson (Contributor) commented:

This is less of an issue and more of a question. When running an estimation with two schemas, SolvationFreeEnergy and HostGuestBindingAffinity, I noticed that HostGuestBindingAffinity is able to make full use of all available dask workers, while SolvationFreeEnergy doesn't make full use of even a single worker. What surprised me, though, was that when I ran my code with the host_guest_data_set merged into the freesolv_data_set, and with the solvation_schema added to estimation_options before the host_guest_schema, the binding simulation was restricted to one GPU for the entire calculation.

I tried to fix this by changing two things: the order in which the schemas are added and which data set is merged into the other. After swapping both of my original orders, the problem has completely gone away. The solvation and binding calculations run simultaneously until the solvation is complete, at which point the binding calculation is able to utilize all four of my available workers.

So what I'm wondering is: is this normal behavior? Do I need to make sure I load certain schemas first, or merge certain data sets into their counterparts rather than vice versa? In the future I plan to rerun the job with just one of my two fixes applied to see which one was actually responsible for correcting the GPU usage. Below is my code, with the relevant lines marked with asterisks.

    freesolv_data_set = PhysicalPropertyDataSet.from_pandas(molecule)
    
    host_guest_data_set = TaproomDataSet(
        #####
    )

*** freesolv_data_set.merge(host_guest_data_set)
    # FIXED VERSION:
    # host_guest_data_set.merge(freesolv_data_set)
    
    solvation_schema = SolvationFreeEnergy.default_simulation_schema(use_implicit_solvent=True)
    
    APR_settings = APRSimulationSteps(
        #####
    )
    host_guest_schema = HostGuestBindingAffinity.default_paprika_schema(
        simulation_settings=APR_settings,
        use_implicit_solvent=True,
        enable_hmr=False,
    )
    
    estimation_options = RequestOptions()
    estimation_options.calculation_layers = ["SimulationLayer"]
*** estimation_options.add_schema(
        "SimulationLayer", "SolvationFreeEnergy", solvation_schema
    )
*** estimation_options.add_schema(
        "SimulationLayer", "HostGuestBindingAffinity", host_guest_schema
    )
    # FIXED VERSION:
    # Swapped the order of the two starred .add_schema calls so that
    # host_guest_schema is added first

    print("All schemas were added to estimation_options")

    # Create Pool of Dask Workers
    calculation_backend = DaskLocalCluster(
        number_of_workers=4,
        resources_per_worker=ComputeResources(
            number_of_threads=1,
            number_of_gpus=1,
            preferred_gpu_toolkit=ComputeResources.GPUToolkit.CUDA,
        ),
    )
    calculation_backend.start()
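As a toy sketch of why the merge direction alone could matter: if `merge()` appends the other set's properties in place (an assumption on my part, not a claim about the actual openff-evaluator implementation), then the direction of the merge changes the order in which properties are later handed to the scheduler. The class below is purely illustrative:

```python
# Toy illustration (NOT the openff-evaluator API): an in-place merge() that
# appends the other set's entries makes the merge direction change the
# resulting property order.

class ToyDataSet:
    def __init__(self, properties):
        self.properties = list(properties)

    def merge(self, other):
        # Append the other set's properties to this set, in place.
        self.properties.extend(other.properties)


# Merging host-guest into freesolv puts the solvation properties first...
freesolv = ToyDataSet(["solvation_1", "solvation_2"])
freesolv.merge(ToyDataSet(["binding_1"]))
print(freesolv.properties)  # ['solvation_1', 'solvation_2', 'binding_1']

# ...while merging the other way puts the binding property first.
host_guest = ToyDataSet(["binding_1"])
host_guest.merge(ToyDataSet(["solvation_1", "solvation_2"]))
print(host_guest.properties)  # ['binding_1', 'solvation_1', 'solvation_2']
```

If the real data sets behave similarly, that ordering difference could interact with how the calculation layers queue work, which might explain why swapping the merge direction changed the GPU utilization.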