Add rs_allocate_closure free function. #1944

Open · wants to merge 12 commits into base: main
Conversation

curtisblack (Contributor)

Description

Adds a new free function, rs_allocate_closure, which allows the renderer services to provide memory for closure storage, for both built-in (add/mul) and user-defined closures.

The existing osl_* closure handling functions now use this new free function.

The CPU-side fallback calls through to the existing closure pool implementation. The GPU-side fallback returns null.

testshade/testrender provide an example implementation showing how to build a stack-based closure pool for the GPU.
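
For reference, a minimal sketch of the shape of the fallback described above, assuming the function is exported with C linkage and that OSL/oslexec.h provides the OSL types used; the helper osl_pool_allocate in the CPU branch is purely hypothetical, standing in for however the implementation reaches the existing closure pool:

#include <cstddef>
#include <OSL/oslexec.h>

// Hypothetical stand-in for however the fallback reaches the existing
// closure pool on the CPU side.
extern "C" void*
osl_pool_allocate(OSL::OpaqueExecContextPtr ec, size_t size, size_t alignment);

extern "C" void*
rs_allocate_closure(OSL::OpaqueExecContextPtr exec_ctx, size_t size,
                    size_t alignment)
{
#ifndef __CUDA_ARCH__
    // CPU-side fallback: route to the existing closure pool.
    return osl_pool_allocate(exec_ctx, size, alignment);
#else
    // GPU-side fallback: there is no default pool, so return null and
    // rely on the renderer overriding this free function.
    return nullptr;
#endif
}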

Tests

Checklist:

  • I have read the contribution guidelines.
  • I have updated the documentation, if applicable.
  • I have ensured that the change is tested somewhere in the testsuite (adding new test cases if necessary).
  • My code follows the prevailing code style of this project. If I haven't
    already run clang-format v17 before submitting, I definitely will look at
    the CI test that runs clang-format and fix anything that it highlights as
    being nonconforming.

Signed-off-by: Curtis Black <[email protected]>
rs_allocate_closure(OSL::OpaqueExecContextPtr exec_ctx, size_t size,
                    size_t alignment)
{
#ifndef __CUDA_ARCH__
Contributor:

As this is rs_fallback.cpp, it should never be compiled by CUDA; can we remove the #ifndef __CUDA_ARCH__?

Collaborator:

Everything in rs_fallback is done this way. It is compiled by CUDA, is it not?

Contributor:

It is not. rs_fallback.cpp only appears in the set (lib_src ...) source list, which is compiled only for the host; it turns around and just calls the virtual functions of RendererServices (which exist only on the host), and has no way to be customized for other targets. It is really more of a rs_legacy_adapter.cpp.

auto sg = (OSL::ShaderGlobals*)ec;
// Round the current stack pointer up to the requested alignment.
uintptr_t ptr = OIIO::round_to_multiple_of_pow2((uintptr_t)sg->renderstate,
                                                alignment);
// Bump the stack pointer past this allocation.
sg->renderstate = (void*)(ptr + size);
Contributor:

Suggest making use of the testshade/render_state.h RenderState object and adding a stack_buffer data member. Although I think RenderState is uniform across all shades, so we might need a per-thread renderstate in the ExecContext to handle this properly.

Contributor:

Maybe something like:

auto* pst = get_rs_per_shade<TestShadePerShadeState>(ec);
pst->stackbuf = (void*)(ptr + size);

This would require another pointer in ShaderGlobals (but hopefully we can remove some in the future). But then a renderer could have its own PerShadeState to implement some of this.
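
For illustration, the pieces that sketch assumes might look roughly like the following; both TestShadePerShadeState and get_rs_per_shade are hypothetical names taken from the comment above, and stashing the state behind the existing renderstate pointer is only for the sketch (as noted, a dedicated ShaderGlobals pointer would really be needed):

#include <OSL/oslexec.h>

// Hypothetical per-shade state for testshade.
struct TestShadePerShadeState {
    void* stackbuf = nullptr;  // current top of the closure stack
};

// Hypothetical accessor: recover the renderer's per-shade state from the
// opaque execution context. This sketch reuses the renderstate pointer;
// a dedicated ShaderGlobals member would go here instead.
template<typename T>
T*
get_rs_per_shade(OSL::OpaqueExecContextPtr ec)
{
    auto sg = (OSL::ShaderGlobals*)ec;
    return (T*)sg->renderstate;
}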

Signed-off-by: Curtis Black <[email protected]>
}
};

struct RenderState {
Contributor:

Ok, so with this you are defining "RenderState" to be PerShade, and introducing RenderContext to be for all threads.
And arguably ShaderGlobals always was PerShade. So perhaps that is fine for terminology.

Stepping back a minute, if I were to want a kernel to execute 100,000 shades, when I launch that kernel I don't have a PerShade RenderState to pass, only a RenderContext. That kernel would then use the RenderContext and other kernel arguments to build a RenderState to pass into the JIT'd shader.

Ultimately I think we were looking to get the JIT to produce that ShaderGroup-specific kernel_adapter interacting with rs_* functions, like rs_populate_shader_globals(...), that could be launched on the GPU (passing RenderContext, start index, end index), and we end up with a bit of a chicken-and-egg problem between the RenderState and RenderContext: the OSL JIT doesn't know the concrete type/size of the PerThread/PerShade RenderState object to create on the stack, and the renderer doesn't know the JIT'd shader function (just its prototype). Ideally we want them all in the same llvm module so it can all get inlined/optimized together.

But perhaps this is simpler: the final renderer would have to craft a kernel that sets up a RenderState and calls the JIT'd shader function (not sure how it got it). Or for the CPU side, the renderer would have to do its own loop over shades, configuring/reusing a RenderState. Although in this example, it is really PerThread state vs. PerShade.

So for state/context we have:

  1. All Threads
  2. Per Thread
  3. Per Shade (per-shade state could be a subset/section of PerThread state that is updated per shade).

I guess it is a renderer-provided custom ShadingContext with rs free functions to interact with (which makes sense, as we are trying to remove/replace the CPU-side ShadingContext by asking the renderer to take on its responsibilities).

As we have a rs_bitcode module to integrate, the OSL JIT could just directly use renderer-provided types and alloca a RenderState on the stack, then call a rs_init(OpaqueExecutionContext* oec, RenderState*) function to let the renderer populate it.
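
To make the shape of that last idea concrete, here is a purely illustrative C++ rendering of the kernel adapter such a JIT might emit; RenderContext, RenderState, rs_init, and run_shader are all assumed names, and the real thing would be generated LLVM IR rather than hand-written C++:

#include <OSL/oslexec.h>

struct RenderContext { /* all-threads state, provided at kernel launch */ };

// Hypothetical per-shade state with a renderer-defined layout.
struct RenderState {
    void* closure_stack = nullptr;
};

// Hypothetical renderer hook: populate the per-shade state (the
// rs_init(...) idea from the comment above).
extern "C" void
rs_init(OSL::OpaqueExecContextPtr /*oec*/, RenderState* rs)
{
    rs->closure_stack = nullptr;  // e.g. carve out a slice of a pool here
}

// Stand-in for the JIT'd shader-group entry point.
extern "C" void
run_shader(OSL::OpaqueExecContextPtr /*oec*/)
{
}

// ShaderGroup-specific adapter the JIT could produce, launched with
// (RenderContext, start index, end index) as described above.
void
kernel_adapter(RenderContext* /*ctx*/, int start, int end)
{
    for (int i = start; i < end; ++i) {
        RenderState rs;            // alloca'd per-shade state
        OSL::ShaderGlobals sg {};  // would be populated from ctx and i
        sg.renderstate = &rs;
        auto oec = (OSL::OpaqueExecContextPtr)&sg;
        rs_init(oec, &rs);  // let the renderer fill in per-shade state
        run_shader(oec);    // invoke the JIT'd shader function
    }
}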
