Implement experimental GPU two-phase occlusion culling for the standard
3D mesh pipeline.

*Occlusion culling* allows the GPU to skip the vertex and fragment
shading overhead for objects that can be quickly proved to be invisible
because they're behind other geometry. A depth prepass already
eliminates most fragment shading overhead for occluded objects, but the
vertex shading overhead, as well as the cost of testing and rejecting
fragments against the Z-buffer, is presently unavoidable for standard
meshes. We currently perform occlusion culling only for meshlets, but
other meshes, such as skinned meshes, can benefit from it too, because
it avoids the transform and skinning overhead for meshes that are never
seen.

This commit adapts the same [*two-phase occlusion culling*] technique
that meshlets use to Bevy's standard 3D mesh pipeline. It takes effect
when both the new `OcclusionCulling` component and the `DepthPrepass`
component are present on the camera, and it has these steps:

1. *Early depth prepass*: We use the hierarchical Z-buffer from the
   previous frame to cull meshes for the initial depth prepass,
   effectively rendering only the meshes that were visible in the last
   frame.

2. *Early depth downsample*: We downsample the depth buffer to create
   another hierarchical Z-buffer, this time with the current view
   transform.

3. *Late depth prepass*: We use the new hierarchical Z-buffer to test
   all meshes that weren't rendered in the early depth prepass. Any
   meshes that pass this check are rendered.

4. *Late depth downsample*: Again, we downsample the depth buffer to
   create a hierarchical Z-buffer in preparation for the early depth
   prepass of the next frame. This step is done after all the rendering,
   in order to account for custom phase items that might write to the
   depth buffer.
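
The four steps above can be illustrated with a small CPU-side sketch in
Rust. It is not the code in this patch: the real implementation runs in
compute shaders against a hierarchical Z-buffer, whereas this sketch
assumes a one-dimensional screen, a flat depth buffer, and depth measured
as distance from the camera (smaller is closer) rather than reversed Z.
All names in it (`Object`, `passes_depth_test`, `rasterize`, `cull_frame`)
are hypothetical and exist only for illustration.

```rust
/// Hypothetical object: covers pixels `[start, end)` of a 1D "screen" at a
/// given distance from the camera (smaller is closer).
#[derive(Clone, Copy)]
struct Object {
    start: usize,
    end: usize,
    distance: f32,
}

/// Returns true if at least one covered pixel is not hidden by a closer surface.
fn passes_depth_test(object: &Object, depth: &[f32]) -> bool {
    depth[object.start..object.end]
        .iter()
        .any(|&d| object.distance <= d)
}

/// Writes the object into the depth buffer, keeping the closest distance.
fn rasterize(object: &Object, depth: &mut [f32]) {
    for d in &mut depth[object.start..object.end] {
        *d = d.min(object.distance);
    }
}

/// Runs one frame and returns, per object, whether it was drawn.
/// `previous_depth` stands in for last frame's hierarchical Z-buffer; on
/// return it holds this frame's depth, ready for the next frame.
fn cull_frame(objects: &[Object], previous_depth: &mut Vec<f32>) -> Vec<bool> {
    let mut depth = vec![f32::INFINITY; previous_depth.len()];
    let mut drawn = vec![false; objects.len()];

    // Step 1, early prepass: test every object against last frame's depth.
    for (i, object) in objects.iter().enumerate() {
        if passes_depth_test(object, previous_depth) {
            rasterize(object, &mut depth);
            drawn[i] = true;
        }
    }

    // Step 2, early downsample: snapshot the depth produced so far.
    let early_depth = depth.clone();

    // Step 3, late prepass: retest only what the early prepass rejected.
    for (i, object) in objects.iter().enumerate() {
        if !drawn[i] && passes_depth_test(object, &early_depth) {
            rasterize(object, &mut depth);
            drawn[i] = true;
        }
    }

    // Step 4, late downsample: keep the final depth for the next frame.
    *previous_depth = depth;
    drawn
}
```

With a wall covering the whole screen and a crate behind it, the first
call draws both (nothing is known yet), the second call culls the crate,
and if the wall then moves aside the crate fails the early test against
the stale depth but is recovered by the late test in the same frame.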

Note that this patch has no effect on the per-mesh CPU overhead for
occluded objects, which remains high for a GPU-driven renderer due to
the lack of `cold-specialization` and retained bins. If
`cold-specialization` and retained bins weren't on the horizon, then a
more traditional approach like potentially visible sets (PVS) or low-res
CPU rendering would probably be more efficient than the GPU-driven
approach that this patch implements for most scenes. However, at this
point the amount of effort required to implement a PVS baking tool or a
low-res CPU renderer would probably be greater than landing
`cold-specialization` and retained bins, and the GPU-driven approach is
the more modern one anyway. It does mean that the performance
improvements from occlusion culling as implemented in this patch *today*
are likely to be limited, because of the high CPU overhead for occluded
meshes.

Note also that this patch currently doesn't implement occlusion culling
for 2D objects or shadow maps. Those can be addressed in a follow-up.
Additionally, note that the techniques in this patch require compute
shaders, which excludes support for WebGL 2.

This PR is marked experimental because of known precision issues with
the downsampling approach when applied to non-power-of-two framebuffer
sizes (i.e. most of them). These precision issues can, in rare cases,
cause objects to be judged occluded that in fact are not. (I've never
seen this in practice, but I know it's possible; it tends to be likelier
to happen with small meshes.) As a follow-up to this patch, we want to
switch to the [SPD-based hi-Z buffer shader from the Granite engine],
which doesn't suffer from these problems, at which point we should be
able to graduate this feature from experimental status. I opted not to
include that rewrite in this patch for two reasons: (1) @JMS55 is
planning on doing the rewrite to coincide with the new availability of
image atomic operations in Naga; (2) to reduce the scope of this patch.
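
One way a hierarchical Z-buffer can go wrong on odd-sized buffers is that
a plain 2x2 reduction never reads the last row or column, so a far-away
texel there can be missed and the coarser level ends up looking closer
than the scene really is. The following CPU-side Rust sketch shows one
conservative way to build a single mip level. It is neither the shader in
this patch nor the Granite shader, and it is not necessarily the exact
failure mode described above. It assumes reversed Z, so the farthest
depth in a footprint is the minimum (as in the `min` reduction in the
shader diff), and the name `downsample_conservative` is hypothetical.

```rust
/// Builds the next mip level of a row-major depth buffer, keeping the
/// farthest depth of each footprint. Assumes reversed Z, so "farthest"
/// means the minimum value.
fn downsample_conservative(
    src: &[f32],
    width: usize,
    height: usize,
) -> (Vec<f32>, usize, usize) {
    assert_eq!(src.len(), width * height);
    let dst_width = (width / 2).max(1);
    let dst_height = (height / 2).max(1);
    let mut dst = vec![f32::INFINITY; dst_width * dst_height];

    for y in 0..dst_height {
        for x in 0..dst_width {
            // Normally a 2x2 footprint. The last texel along each axis also
            // absorbs the leftover row or column when the source size is odd.
            let x_end = if x + 1 == dst_width { width } else { 2 * x + 2 };
            let y_end = if y + 1 == dst_height { height } else { 2 * y + 2 };

            let mut farthest = f32::INFINITY;
            for sy in 2 * y..y_end {
                for sx in 2 * x..x_end {
                    farthest = farthest.min(src[sy * width + sx]);
                }
            }
            dst[y * dst_width + x] = farthest;
        }
    }
    (dst, dst_width, dst_height)
}
```

For a 3x1 buffer `[0.9, 0.8, 0.1]`, a reduction that only reads the first
two texels would store 0.8 and could wrongly cull an object visible
through the third texel; the sketch stores 0.1 instead.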

[*two-phase occlusion culling*]:
https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501

[Aaltonen SIGGRAPH 2015]:
https://www.advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf

[Some literature]:
https://gist.github.com/reduz/c5769d0e705d8ab7ac187d63be0099b5?permalink_comment_id=5040452#gistcomment-5040452

[SPD-based hi-Z buffer shader from the Granite engine]:
https://github.com/Themaister/Granite/blob/master/assets/shaders/post/hiz.comp
pcwalton committed Jan 17, 2025
1 parent b66c3ce commit fd03dd0
Showing 40 changed files with 4,017 additions and 907 deletions.
11 changes: 11 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
@@ -4062,3 +4062,14 @@ name = "Directional Navigation"
 description = "Demonstration of Directional Navigation between UI elements"
 category = "UI (User Interface)"
 wasm = true
+
+[[example]]
+name = "occlusion_culling"
+path = "examples/3d/occlusion_culling.rs"
+doc-scrape-examples = true
+
+[package.metadata.example.occlusion_culling]
+name = "Occlusion Culling"
+description = "Demonstration of Occlusion Culling"
+category = "3D Rendering"
+wasm = false
1 change: 1 addition & 0 deletions crates/bevy_core_pipeline/Cargo.toml
@@ -43,6 +43,7 @@ nonmax = "0.5"
 smallvec = "1"
 thiserror = { version = "2", default-features = false }
 tracing = { version = "0.1", default-features = false, features = ["std"] }
+bytemuck = { version = "1" }

 [lints]
 workspace = true
2 changes: 2 additions & 0 deletions crates/bevy_core_pipeline/src/core_2d/mod.rs
@@ -312,6 +312,8 @@ impl PhaseItem for AlphaMask2d {
 }

 impl BinnedPhaseItem for AlphaMask2d {
+    // Since 2D meshes presently can't be multidrawn, the batch set key is
+    // irrelevant.
     type BatchSetKey = BatchSetKey2d;

     type BinKey = AlphaMask2dBinKey;
36 changes: 30 additions & 6 deletions crates/bevy_core_pipeline/src/core_3d/mod.rs
@@ -16,7 +16,9 @@ pub mod graph {
     #[derive(Debug, Hash, PartialEq, Eq, Clone, RenderLabel)]
     pub enum Node3d {
         MsaaWriteback,
-        Prepass,
+        EarlyPrepass,
+        EarlyDownsampleDepth,
+        LatePrepass,
         DeferredPrepass,
         CopyDeferredLightingId,
         EndPrepasses,
@@ -25,6 +27,7 @@ pub mod graph {
         MainTransmissivePass,
         MainTransparentPass,
         EndMainPass,
+        LateDownsampleDepth,
         Taa,
         MotionBlur,
         Bloom,
@@ -67,9 +70,10 @@ use core::ops::Range;

 use bevy_render::{
     batching::gpu_preprocessing::{GpuPreprocessingMode, GpuPreprocessingSupport},
+    experimental::occlusion_culling::OcclusionCulling,
     mesh::allocator::SlabId,
     render_phase::PhaseItemBatchSetKey,
-    view::{NoIndirectDrawing, RetainedViewEntity},
+    view::{prepare_view_targets, NoIndirectDrawing, RetainedViewEntity},
 };
 pub use camera_3d::*;
 pub use main_opaque_pass_3d_node::*;
@@ -114,8 +118,9 @@ use crate::{
     },
     dof::DepthOfFieldNode,
     prepass::{
-        node::PrepassNode, AlphaMask3dPrepass, DeferredPrepass, DepthPrepass, MotionVectorPrepass,
-        NormalPrepass, Opaque3dPrepass, OpaqueNoLightmap3dBatchSetKey, OpaqueNoLightmap3dBinKey,
+        node::{EarlyPrepassNode, LatePrepassNode},
+        AlphaMask3dPrepass, DeferredPrepass, DepthPrepass, MotionVectorPrepass, NormalPrepass,
+        Opaque3dPrepass, OpaqueNoLightmap3dBatchSetKey, OpaqueNoLightmap3dBinKey,
         ViewPrepassTextures, MOTION_VECTOR_PREPASS_FORMAT, NORMAL_PREPASS_FORMAT,
     },
     skybox::SkyboxPlugin,
@@ -161,6 +166,9 @@ impl Plugin for Core3dPlugin {
                 (
                     sort_phase_system::<Transmissive3d>.in_set(RenderSet::PhaseSort),
                     sort_phase_system::<Transparent3d>.in_set(RenderSet::PhaseSort),
+                    configure_occlusion_culling_view_targets
+                        .after(prepare_view_targets)
+                        .in_set(RenderSet::ManageViews),
                     prepare_core_3d_depth_textures.in_set(RenderSet::PrepareResources),
                     prepare_core_3d_transmission_textures.in_set(RenderSet::PrepareResources),
                     prepare_prepass_textures.in_set(RenderSet::PrepareResources),
@@ -169,7 +177,8 @@ impl Plugin for Core3dPlugin {

         render_app
             .add_render_sub_graph(Core3d)
-            .add_render_graph_node::<ViewNodeRunner<PrepassNode>>(Core3d, Node3d::Prepass)
+            .add_render_graph_node::<ViewNodeRunner<EarlyPrepassNode>>(Core3d, Node3d::EarlyPrepass)
+            .add_render_graph_node::<ViewNodeRunner<LatePrepassNode>>(Core3d, Node3d::LatePrepass)
             .add_render_graph_node::<ViewNodeRunner<DeferredGBufferPrepassNode>>(
                 Core3d,
                 Node3d::DeferredPrepass,
@@ -200,7 +209,8 @@ impl Plugin for Core3dPlugin {
             .add_render_graph_edges(
                 Core3d,
                 (
-                    Node3d::Prepass,
+                    Node3d::EarlyPrepass,
+                    Node3d::LatePrepass,
                     Node3d::DeferredPrepass,
                     Node3d::CopyDeferredLightingId,
                     Node3d::EndPrepasses,
@@ -898,6 +908,20 @@ pub fn prepare_core_3d_transmission_textures(
     }
 }

+/// Sets the `TEXTURE_BINDING` flag on the depth texture if necessary for
+/// occlusion culling.
+///
+/// We need that flag to be set in order to read from the texture.
+fn configure_occlusion_culling_view_targets(
+    mut view_targets: Query<&mut Camera3d, (With<OcclusionCulling>, With<DepthPrepass>)>,
+) {
+    for mut camera_3d in &mut view_targets {
+        let mut depth_texture_usages = TextureUsages::from(camera_3d.depth_texture_usages);
+        depth_texture_usages |= TextureUsages::TEXTURE_BINDING;
+        camera_3d.depth_texture_usages = depth_texture_usages.into();
+    }
+}
+
 // Disable MSAA and warn if using deferred rendering
 pub fn check_msaa(mut deferred_views: Query<&mut Msaa, (With<Camera>, With<DeferredPrepass>)>) {
     for mut msaa in deferred_views.iter_mut() {
@@ -1,8 +1,16 @@
 #ifdef MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
 @group(0) @binding(0) var<storage, read> mip_0: array<u64>; // Per pixel
 #else
+#ifdef MESHLET
 @group(0) @binding(0) var<storage, read> mip_0: array<u32>; // Per pixel
-#endif
+#else // MESHLET
+#ifdef MULTISAMPLE
+@group(0) @binding(0) var mip_0: texture_depth_multisampled_2d;
+#else // MULTISAMPLE
+@group(0) @binding(0) var mip_0: texture_depth_2d;
+#endif // MULTISAMPLE
+#endif // MESHLET
+#endif // MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
 @group(0) @binding(1) var mip_1: texture_storage_2d<r32float, write>;
 @group(0) @binding(2) var mip_2: texture_storage_2d<r32float, write>;
 @group(0) @binding(3) var mip_3: texture_storage_2d<r32float, write>;
@@ -304,9 +312,25 @@ fn load_mip_0(x: u32, y: u32) -> f32 {
     let i = y * constants.view_width + x;
 #ifdef MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
     return bitcast<f32>(u32(mip_0[i] >> 32u));
-#else
+#else // MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
+#ifdef MESHLET
     return bitcast<f32>(mip_0[i]);
-#endif
+#else // MESHLET
+    // Downsample the top level.
+#ifdef MULTISAMPLE
+    // The top level is multisampled, so we need to loop over all the samples
+    // and reduce them to 1.
+    var result = textureLoad(mip_0, vec2(x, y), 0);
+    let sample_count = i32(textureNumSamples(mip_0));
+    for (var sample = 1; sample < sample_count; sample += 1) {
+        result = min(result, textureLoad(mip_0, vec2(x, y), sample));
+    }
+    return result;
+#else // MULTISAMPLE
+    return textureLoad(mip_0, vec2(x, y), 0);
+#endif // MULTISAMPLE
+#endif // MESHLET
+#endif // MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
 }

 fn reduce_4(v: vec4f) -> f32 {