Prefix sum implementation WIP #14

Draft
wants to merge 8 commits into main

Commits on Oct 27, 2021

  1. First try at prefix sum

    Sorta works but deadlocks on larger inputs.
    raphlinus committed Oct 27, 2021
    274d151
  2. Make storage barriers uniform control flow

    Still doesn't fix deadlocks tho :/
    raphlinus committed Oct 27, 2021
    2800b73
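The commit messages above don't identify what causes the deadlock. One well-known hazard with decoupled look-back is that a workgroup spins while waiting for its predecessor partition, and GPUs don't guarantee that a workgroup which hasn't launched yet will ever get a slot. The toy model below is a hypothetical scheduler (every name is made up for illustration, and it models no particular GPU): resident workgroups are never preempted, and the dispatch order is an input. It compares using the workgroup id as the partition index against claiming the index from a counter at launch, a common mitigation.

```python
def simulate(dispatch_order, slots, dynamic_ids):
    """Toy scheduler: return True if all workgroups finish, False on deadlock.

    dispatch_order: order in which the hypothetical hardware launches workgroup ids.
    slots: how many workgroups can be resident at once; residents are never preempted.
    dynamic_ids: if True, a workgroup claims its partition index from a counter
        when it launches (like an atomicAdd on a global counter) instead of
        using its workgroup id.
    """
    pending = list(dispatch_order)
    resident = []       # partition indices of the workgroups currently running
    published = set()   # partitions whose results successors can read
    next_id = 0
    while pending or resident:
        # Fill free slots with newly launched workgroups.
        while pending and len(resident) < slots:
            wg = pending.pop(0)
            if dynamic_ids:
                part, next_id = next_id, next_id + 1
            else:
                part = wg
            resident.append(part)
        # A partition can finish once its predecessor has published.
        ready = [p for p in resident if p == 0 or p - 1 in published]
        if not ready:
            return False  # every resident workgroup spins on one that can't launch
        for p in ready:
            published.add(p)
            resident.remove(p)
    return True
```

With `dispatch_order=[3, 2, 1, 0]` and `slots=2`, static ids hang (partitions 3 and 2 wait on partition 1, which never launches), while dynamic ids complete. Dynamic ids only ensure that a workgroup waits on partitions claimed before it; the scheme still assumes resident workgroups keep being scheduled, which is also not formally guaranteed.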

Commits on Nov 1, 2021

  1. Verify results

    Still WIP
    raphlinus committed Nov 1, 2021
    5fc557c

Commits on Nov 2, 2021

  1. Larger workgroup

    Fastest results on AMD at a workgroup size of 1024. Note: this includes
    the atomicOr workaround for correctness.
    
    Also note, not all targets will support a workgroup of this size; on
    shipping, we'd need to query and select at runtime.
    raphlinus committed Nov 2, 2021
    17094ab
  2. Sequential section

    Do a small sequential scan at the leaf of the hierarchy. That amortizes
    both the workgroup-scope tree reduction and the (still sequential)
    decoupled look-back to a larger number of inputs.
    
    Note: this falls short of a real performance evaluation because there's
    no attempt to warm up the GPU clock. But it's valid as a very rough
    swag.
    raphlinus committed Nov 2, 2021
    8174880
  3. Iterate runs

    Better for performance analysis
    raphlinus committed Nov 2, 2021
    93ad8ee
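The "Sequential section" commit describes a three-level hierarchy. The following CPU simulation is a sketch of that structure under stated assumptions, not the shader in this PR: each hypothetical thread scans `n_seq` elements sequentially, the workgroup combines per-thread totals with a Hillis-Steele tree scan (one possible tree scheme; the PR's exact one isn't shown here), and each partition obtains its carry-in through decoupled look-back. The function names, flag names, and the two-phase split (publish every aggregate first, then resolve) are assumptions made so the simulation never has to spin.

```python
# Published state of each partition, in the spirit of decoupled look-back.
FLAG_AGGREGATE = 1  # only this partition's own total is available
FLAG_PREFIX = 2     # the inclusive prefix through this partition is available


def scan_thread(chunk):
    """Leaf level: one thread scans its small chunk sequentially (inclusive)."""
    out, acc = [], 0
    for x in chunk:
        acc += x
        out.append(acc)
    return out


def workgroup_exclusive_scan(totals):
    """Workgroup level: Hillis-Steele tree scan over per-thread totals."""
    vals = list(totals)
    offset = 1
    while offset < len(vals):
        # Each pass reads only the previous pass's values, as if separated by a barrier.
        vals = [v + (vals[i - offset] if i >= offset else 0) for i, v in enumerate(vals)]
        offset *= 2
    return [0] + vals[:-1]  # shift inclusive -> exclusive


def look_back(flags, values, part):
    """Partition level: walk backwards until a published inclusive prefix is found."""
    exclusive = 0
    for j in range(part - 1, -1, -1):
        exclusive += values[j]  # an aggregate, or an inclusive prefix that ends the walk
        if flags[j] == FLAG_PREFIX:
            break
    return exclusive


def prefix_sum(data, n_seq=4, wg_size=8, resolve_order=None):
    """Inclusive prefix sum of `data`, simulated on the CPU one partition at a time."""
    part_size = n_seq * wg_size
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    local, flags, values = [], [], []
    # Phase 1: every partition scans locally and publishes its aggregate.
    for p, part in enumerate(parts):
        chunks = [part[t:t + n_seq] for t in range(0, len(part), n_seq)]
        scanned = [scan_thread(c) for c in chunks]
        offsets = workgroup_exclusive_scan([s[-1] for s in scanned])
        local.append([off + x for off, s in zip(offsets, scanned) for x in s])
        flags.append(FLAG_PREFIX if p == 0 else FLAG_AGGREGATE)
        values.append(local[-1][-1])
    # Phase 2: partitions resolve their carry-in via look-back, in any order.
    if resolve_order is None:
        # Reverse order makes every look-back walk all the way to partition 0.
        resolve_order = reversed(range(len(parts)))
    carry = [0] * len(parts)
    for p in resolve_order:
        carry[p] = look_back(flags, values, p)
        values[p] = carry[p] + local[p][-1]
        flags[p] = FLAG_PREFIX
    return [carry[p] + x for p in range(len(parts)) for x in local[p]]
```

For example, `prefix_sum([1, 2, 3, 4, 5], n_seq=2, wg_size=2)` returns `[1, 3, 6, 10, 15]`. Raising `n_seq` shrinks both the number of per-thread totals the tree scan handles and the number of partitions the look-back walks over, which is the amortization the commit describes.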

Commits on Nov 3, 2021

  1. Go fast

    Performance measurement requires keeping the GPU busy. That means not
    copying results back to CPU and doing verification there.
    raphlinus committed Nov 3, 2021
    697ea4e
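The "Iterate runs" and "Go fast" commits, together with the warm-up caveat in "Sequential section", point at a common timing pattern: discard some untimed warm-up runs, time many iterations, report a robust statistic such as the median, and keep verification out of the timed loop. The sketch below shows that pattern in plain Python; `bench` and `run` are hypothetical names, not part of this PR. For a GPU, `run` would need to submit the dispatch and wait for it to complete, otherwise only the submission cost is measured.

```python
import statistics
import time


def bench(run, warmup=10, iters=50):
    """Return the median wall-clock seconds of run() after untimed warm-up calls."""
    for _ in range(warmup):
        run()  # untimed: gives clocks and caches a chance to settle
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Verification (for example, copying the output back and comparing it against a CPU scan) can then run once after timing, so it never interrupts the stream of GPU work being measured.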

Commits on Nov 4, 2021

  1. Use explicit atomic stores

    Naga will accept ordinary loads and stores to atomic types, but tint
    will not.
    raphlinus committed Nov 4, 2021
    87e5b20