-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rs_allocate_closure free function. #1944
base: main
Are you sure you want to change the base?
Add rs_allocate_closure free function. #1944
Conversation
Signed-off-by: Curtis Black <[email protected]>
Signed-off-by: Curtis Black <[email protected]>
Signed-off-by: Curtis Black <[email protected]>
Signed-off-by: Curtis Black <[email protected]>
Signed-off-by: Curtis Black <[email protected]>
Signed-off-by: Curtis Black <[email protected]>
Signed-off-by: Curtis Black <[email protected]>
Signed-off-by: Curtis Black <[email protected]>
rs_allocate_closure(OSL::OpaqueExecContextPtr exec_ctx, size_t size, | ||
size_t alignment) | ||
{ | ||
#ifndef __CUDA_ARCH__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is rs_fallback.cpp, it should never be compiled by CUDA, can we remove the #ifndef CUDA_ARCH
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything in rs_fallback is done this way. It is compiled by Cuda, is it not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its is not, rs_fallback.cpp only exists in
set (lib_src
which is only compiled for the host, it turns around and just calls the virtual functions of renderer services (which are only on the host) and has no way to be customized for other targets.
More of a rs_legacy_adapter.cpp
src/testshade/rs_simplerend.cpp
Outdated
auto sg = (OSL::ShaderGlobals*)ec; | ||
uintptr_t ptr = OIIO::round_to_multiple_of_pow2((uintptr_t)sg->renderstate, | ||
alignment); | ||
sg->renderstate = (void*)(ptr + size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest making use of the testshade/render_state.h RenderState object and add a stack_buffer data member. Although I think RenderState is uniform accross all shades, so we might need a per thread renderstate in the ExecContext to properly handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something like
auto *pst = get_rs_per_shade<TestShadePerShadeState>(ec);
pst->stackbuf = (void*)(ptr + size);
Would require another pointer in ShaderGlobals (but hopefully we can remove some in the future).
But then a Renderer could have their own PerShadeState to implement some of this stuff.
Signed-off-by: Curtis Black <[email protected]>
Signed-off-by: Curtis Black <[email protected]>
Signed-off-by: Curtis Black <[email protected]>
Signed-off-by: Curtis Black <[email protected]>
} | ||
}; | ||
|
||
struct RenderState { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so with this you are defining "RenderState" to be PerShade, and introduce RenderContext to be for all threads.
And arguably ShaderGlobals always was PerShade. So perhaps that is fine for terminology.
Stepping back a minute, if I were to want a kernel to execute 100,000 shades, when I launch that kernel I don't have a PerShade RenderState to pass, only a RenderContext. That kernel would then use the RenderContext and other kernel arguments to build a RenderState to pass into the JIT'd shader.
Ultimately I think we were looking to get the JIT to produce that ShaderGroup specific kernel_adapter interacting with rs_* functions, like rs_population_shader_globals(...) that could be launched on the GPU (passing RenderContext, start index, end index), and we end up with a bit of chicken and egg between the RenderState and RendererContext. As the OSL Jit doesn't know the concrete type/size of the PerThread/Shade RenderState object to create on the stack. And the Renderer doesn't know the JIT'd shader function (just its prototype). Ideally we want them all in the same llvm module so it can all get inlined / optimized together.
But perhaps this is simpler, the final Renderer would have to craft a kernel that sets up a RenderState and call the JIT'd Shader function (not sure how it got it). Or for CPU side, renderer would have to do its own loop over shades configuring/reusing RenderState. Although in this example, its really PerThread state vs. PerShade.
So for state/context we have:
- All Threads
- Per Thread
- Per Shade (per shade could be subset/section of PerThread that is updated per shade).
I guess its a Render provided a custom ShadingContext with rs free functions to interact with (which makes sense as we are trying to remove/replace the CPU side ShadingContext by asking the renderer to take on it responsibilities).
As we have a rs_bitcode module to integrate, the OSL Jit could just directly use Renderer provided types and alloca a RenderState on the stack, then call a rs_init(OpaqueExecutionContext *oec, RenderState *) function to let the renderer populate it.
Description
Adds a new free function
rs_allocate_closure
which allows the renderer services to provide memory for closure storage, for both inbuilt (add/mul) and user defined closures.The existing
osl_*
closure handling functions now use this new free function.The CPU side fallback is to call through to the existing closure pool implementation. The GPU side fallback returns null.
testshade/testrender provide an example implementation showing how to implement a stack based closure pool for the GPU.
Tests
Checklist:
already run clang-format v17 before submitting, I definitely will look at
the CI test that runs clang-format and fix anything that it highlights as
being nonconforming.