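Which Kinds of Culling UE5 Includes
UE5 performs culling on both the CPU and the GPU.
On the CPU side, the collection and culling of traditional non-Nanite primitives is implemented mainly in SceneVisibility.cpp and covers the following: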
- Distance Culling
- Frustum Culling
- Precomputed Visibility
- Occlusion Culling
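On the GPU side, culling for the Nanite pipeline is implemented in NaniteCullRaster.cpp and covers the following:
- Nanite Culling
- HZB (Hi-Z) Culling
Preparation Before Culling
Main Work
Before culling actually starts, the scene renderer runs InitViews(), which does the following on behalf of culling: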
- View & Frustum Setup
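- Compute the required matrices
- Extract the frustum planes
- Write the View constant buffer
- Primitive view relevance evaluation
- Filter by flags
- Fill in FPrimitiveViewRelevance: iterate over the FPrimitiveSceneProxy objects in the scene and quickly evaluate their material properties, for example whether the primitive has opaque materials, has translucent materials, casts shadows, writes Custom Depth, or is an editor-only helper line. This prepares them for distribution into the different render buckets later
- Dynamic data collection and bounds updates
- Resolve timing dependencies
- Prepare the HZB: fetch the Hi-Z buffer that the GPU built at the end of the previous frame
- Read back hardware queries: if the pipeline is configured to use traditional Hardware Occlusion Queries, the visible-pixel counts returned by the GPU for earlier frames are collected here
- Set up concurrency
- Preallocate memory stacks: culling produces a large number of temporary arrays
- Split the work: the culling work is cut into small chunks based on the number of primitives to test (see NumPrimitivesPerTask in the frustum-cull code later in this document) and handed to the Task Graph, which wakes worker threads across the CPU cores for parallel computation
In short, InitViews() settles five things: which view to look from, what to exclude, the current-frame data of dynamic objects, the previous frame's occlusion reference, and the memory and tasks handed to the worker threads.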
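The "extract the frustum planes" step can be illustrated with the well-known Gribb-Hartmann method. This is a minimal sketch, not UE's implementation: Vec3, Plane, Mat4, ExtractFrustumPlanes, and IsPointInside are hypothetical names, and the matrix is assumed to be row-major, applied to column vectors, with an OpenGL-style clip volume (-w <= x, y, z <= w). UE uses its own FMatrix, FPlane, and FConvexVolume types and conventions.

```cpp
#include <array>
#include <cassert>

// Hypothetical minimal types; UE uses FMatrix / FPlane / FConvexVolume instead.
struct Vec3 { float x, y, z; };
struct Plane { float a, b, c, d; };                 // a*x + b*y + c*z + d >= 0 means "inside"
using Mat4 = std::array<std::array<float, 4>, 4>;   // row-major, clip = M * column vector

// Gribb-Hartmann extraction for a clip volume of -w <= x, y, z <= w.
std::array<Plane, 6> ExtractFrustumPlanes(const Mat4& m)
{
    // Each plane is row 3 plus or minus one of the first three rows.
    auto combine = [&](int row, float sign) {
        return Plane{ m[3][0] + sign * m[row][0],
                      m[3][1] + sign * m[row][1],
                      m[3][2] + sign * m[row][2],
                      m[3][3] + sign * m[row][3] };
    };
    return {{ combine(0, +1.f), combine(0, -1.f),    // left, right
              combine(1, +1.f), combine(1, -1.f),    // bottom, top
              combine(2, +1.f), combine(2, -1.f) }}; // near, far
}

// A point is inside when it lies on the positive side of all six planes.
bool IsPointInside(const std::array<Plane, 6>& planes, const Vec3& p)
{
    for (const Plane& pl : planes)
    {
        if (pl.a * p.x + pl.b * p.y + pl.c * p.z + pl.d < 0.f)
        {
            return false;
        }
    }
    return true;
}
```

With an identity matrix the six planes bound the cube [-1, 1] on every axis, which makes the sketch easy to check by hand.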
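Unlike FMobileSceneRenderer, FDeferredShadingSceneRenderer splits InitViews() into BeginInitViews() and EndInitViews(). The mobile pipeline is comparatively simple, so the gain from asynchronous work may not cover the synchronization cost, whereas on PC there is enough headroom to make heavier use of multithreading and CPU/GPU overlap. BeginInitViews() dispatches tasks without waiting for their results; EndInitViews() synchronizes those results and builds the draw calls.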
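The Begin/End split is a general "launch now, wait later" pattern. The sketch below shows only that pattern with std::async; InitViewTasks, CountVisible, BeginInitViewsSketch, and EndInitViewsSketch are hypothetical names, and the toy distance test stands in for the real visibility work. UE uses its own task system (UE::Tasks) rather than std::future.

```cpp
#include <cassert>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-frame task handles.
struct InitViewTasks
{
    std::future<int> visiblePrimitiveCount;
};

// Toy "culling" job: counts primitives within the draw distance.
int CountVisible(const std::vector<float>& distances, float maxDrawDistance)
{
    int count = 0;
    for (float d : distances)
    {
        if (d <= maxDrawDistance) ++count;
    }
    return count;
}

// Begin: launch the work and return immediately without waiting.
InitViewTasks BeginInitViewsSketch(std::vector<float> distances, float maxDrawDistance)
{
    InitViewTasks tasks;
    tasks.visiblePrimitiveCount =
        std::async(std::launch::async, CountVisible, std::move(distances), maxDrawDistance);
    return tasks;
}

// End: the only place that blocks. Work issued between Begin and End overlaps with the task.
int EndInitViewsSketch(InitViewTasks& tasks)
{
    assert(tasks.visiblePrimitiveCount.valid());
    return tasks.visiblePrimitiveCount.get();
}
```

Anything the caller does between the two calls runs concurrently with the culling job, which is the benefit the split is after.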
Detailed Analysis
However, the code that dispatches the basic culling tasks is not in BeginInitViews(); it sits in Render(), before BeginInitViews() is called.
Render() Dispatches the Culling Tasks
- Prepare frustum culling and primitive relevance computation (which decides whether a primitive is drawn in the base pass, a translucency pass, and so on)
FInitViewTaskDatas InitViewTaskDatas = OnRenderBegin(GraphBuilder, SceneUpdateInputs);
- Prepare the scene render-target task
if (SceneUpdateInputs)
{
    PrepareSceneTexturesConfigTask = UE::Tasks::Launch(UE_SOURCE_LOCATION, [SceneUpdateInputs]
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(PrepareViewRects);
        FTaskTagScope TagScope(ETaskTag::EParallelRenderingThread);
        for (FSceneRenderer* Renderer : SceneUpdateInputs->Renderers)
        {
            Renderer->PrepareViewRectsForRendering();
            InitializeSceneTexturesConfig(Renderer->ViewFamily.SceneTexturesConfig, Renderer->ViewFamily);
            const FSceneTexturesConfig& SceneTexturesConfig = Renderer->GetActiveSceneTexturesConfig();
            // Custom render passes have their own view family structure, so they can have separate EngineShowFlags, so the SceneTexturesConfig
            // needs to be copied. The FSceneTextures structure itself is pointer shared, and doesn't need to be copied.
            for (FCustomRenderPassInfo& CustomRenderPass : Renderer->CustomRenderPassInfos)
            {
                CustomRenderPass.ViewFamily.SceneTexturesConfig = Renderer->ViewFamily.SceneTexturesConfig;
                // Custom Render Passes don't support MSAA. If MSAA is enabled, the first Custom Render Pass will allocate a separate non-MSAA
                // FSceneTextures, initialized using this config (see logic in the FSceneRenderer constructor that fills in CustomRenderPassInfos).
                CustomRenderPass.ViewFamily.SceneTexturesConfig.NumSamples = 1;
                CustomRenderPass.ViewFamily.SceneTexturesConfig.EditorPrimitiveNumSamples = 1;
            }
        }
    }, UE::Tasks::ETaskPriority::Normal, bIsMobilePlatform ? UE::Tasks::EExtendedTaskPriority::Inline : UE::Tasks::EExtendedTaskPriority::None);
}
This task computes the ViewRect, the SceneTexturesConfig, and the CustomRenderPass configuration.
A CustomRenderPass here is not a plugin-style custom render pass. It renders the scene once more with different render settings, off-screen into a texture, for later sampling.
- Define the callback that runs once the scene update has finished
SceneUpdateParameters.Callbacks.PostStaticMeshUpdate = [&] (const UE::Tasks::FTask& StaticMeshUpdateTask) { PrepareSceneTexturesConfigTask.Wait(); if (!ViewFamily.ViewExtensions.IsEmpty()) { RDG_CSV_STAT_EXCLUSIVE_SCOPE(GraphBuilder, PreRender); SCOPE_CYCLE_COUNTER(STAT_FDeferredShadingSceneRenderer_ViewExtensionPreRenderView); for (auto& ViewExtension : ViewFamily.ViewExtensions) { ViewExtension->PreRenderViewFamily_RenderThread(GraphBuilder, ViewFamily); for (FViewInfo* View : AllViews) { ViewExtension->PreRenderView_RenderThread(GraphBuilder, *View); } } } if (SceneUpdateInputs) { for (FSceneRenderer* Renderer : SceneUpdateInputs->Renderers) { const FSceneTexturesConfig& SceneTexturesConfig = Renderer->GetActiveSceneTexturesConfig(); Renderer->PrepareViewStateForVisibility(SceneTexturesConfig); } } if (ViewFamily.EngineShowFlags.LensDistortion && FPaniniProjectionConfig::IsEnabledByCVars()) { const FPaniniProjectionConfig PaniniProjection = FPaniniProjectionConfig::ReadCVars(); for (FViewInfo& View : Views) { if (View.ViewMatrices.IsPerspectiveProjection()) { View.LensDistortionLUT = PaniniProjection.GenerateLUTPasses(GraphBuilder, View); } } } // Run Groom LOD selection prior to visibility for selecting appropriate LOD & geometry type if (IsGroomEnabled()) { if (Views.Num() > 0 && !ViewFamily.EngineShowFlags.HitProxies) { FHairStrandsBookmarkParameters Parameters; CreateHairStrandsBookmarkParameters(Scene, Views, AllViews, Parameters, false/*bComputeVisibleInstances*/); if (Parameters.HasInstances()) { // 1. 
Select appropriate LOD & geometry type RunHairStrandsBookmark(GraphBuilder, EHairStrandsBookmark::ProcessLODSelection, Parameters); } } } // Lighting is skipped when running ERendererOutput::DepthPrepassOnly if (GetRendererOutput() == ERendererOutput::FinalSceneColor) { LightFunctionAtlas::OnRenderBegin(LightFunctionAtlas, *Scene, Views, ViewFamily); } VisibilityTaskData = LaunchVisibilityTasks(GraphBuilder.RHICmdList, *this, StaticMeshUpdateTask); if (GraphBuilder.IsParallelSetupEnabled()) { GPUSceneUpdateTaskPrerequisites.AddPrerequisites(VisibilityTaskData->GetComputeRelevanceTask()); } GPUSceneUpdateTaskPrerequisites.Trigger(); };
- Wait for the SceneTexturesConfig task to finish
- If view extensions (plugin-style custom passes) are registered, let them add or modify view data
if (!ViewFamily.ViewExtensions.IsEmpty())
{
    RDG_CSV_STAT_EXCLUSIVE_SCOPE(GraphBuilder, PreRender);
    SCOPE_CYCLE_COUNTER(STAT_FDeferredShadingSceneRenderer_ViewExtensionPreRenderView);
    for (auto& ViewExtension : ViewFamily.ViewExtensions)
    {
        ViewExtension->PreRenderViewFamily_RenderThread(GraphBuilder, ViewFamily);
        for (FViewInfo* View : AllViews)
        {
            ViewExtension->PreRenderView_RenderThread(GraphBuilder, *View);
        }
    }
}
This is the hook through which plugin-style custom render passes add or modify view data.
- Update the ViewState
if (SceneUpdateInputs)
{
    for (FSceneRenderer* Renderer : SceneUpdateInputs->Renderers)
    {
        const FSceneTexturesConfig& SceneTexturesConfig = Renderer->GetActiveSceneTexturesConfig();
        Renderer->PrepareViewStateForVisibility(SceneTexturesConfig);
    }
}
FSceneViewState holds data carried over from the previous frame, while FViewInfo holds the current frame's data. This step mainly serves the HZB.
- Generate the LUT that post-processing later uses to correct lens distortion
if (ViewFamily.EngineShowFlags.LensDistortion && FPaniniProjectionConfig::IsEnabledByCVars())
{
    const FPaniniProjectionConfig PaniniProjection = FPaniniProjectionConfig::ReadCVars();
    for (FViewInfo& View : Views)
    {
        if (View.ViewMatrices.IsPerspectiveProjection())
        {
            View.LensDistortionLUT = PaniniProjection.GenerateLUTPasses(GraphBuilder, View);
        }
    }
}
The Panini projection compensates for a weakness of linear perspective projection: the larger the FOV, the more objects near the screen edges are stretched and distorted.
- Compute LOD for Groom hair
// Run Groom LOD selection prior to visibility for selecting appropriate LOD & geometry type
if (IsGroomEnabled())
{
    if (Views.Num() > 0 && !ViewFamily.EngineShowFlags.HitProxies)
    {
        FHairStrandsBookmarkParameters Parameters;
        CreateHairStrandsBookmarkParameters(Scene, Views, AllViews, Parameters, false/*bComputeVisibleInstances*/);
        if (Parameters.HasInstances())
        {
            // 1. Select appropriate LOD & geometry type
            RunHairStrandsBookmark(GraphBuilder, EHairStrandsBookmark::ProcessLODSelection, Parameters);
        }
    }
}
- Check whether this render outputs the final scene color; if it does not, skip this step, otherwise prepare the light functions
if (GetRendererOutput() == ERendererOutput::FinalSceneColor)
{
    LightFunctionAtlas::OnRenderBegin(LightFunctionAtlas, *Scene, Views, ViewFamily);
}
This iterates over all lights that have a light function and writes their light functions into a single atlas.
- Hand visibility culling to the worker threads
VisibilityTaskData = LaunchVisibilityTasks(GraphBuilder.RHICmdList, *this, StaticMeshUpdateTask);
- Precomputed visibility culling
if (ViewPacket.ViewState)
{
    SCOPE_CYCLE_COUNTER(STAT_DecompressPrecomputedOcclusion);
    ViewPacket.View.PrecomputedVisibilityData = ViewPacket.ViewState->ResolvePrecomputedVisibilityData(ViewPacket.View, &Scene);
    if (ViewPacket.View.PrecomputedVisibilityData)
    {
        SceneRenderer.bUsedPrecomputedVisibility = true;
    }
}
- Wait for the prerequisite tasks
// Each relevance task should have this as a prerequisite, but in case there aren't any tasks we make it explicit.
Tasks.ComputeRelevance.AddPrerequisites(Scene.GetCacheMeshDrawCommandsTask());
// Wait on the GPU skin update task prior to GDME.
Tasks.DynamicMeshElementsPrerequisites.AddPrerequisites(Scene.GetGPUSkinUpdateTask());
- The prerequisites are primitive classification (cached mesh draw commands) and GPU skinning
- Frustum culling of lights
Tasks.LightVisibility.AddPrerequisites(UE::Tasks::Launch(UE_SOURCE_LOCATION, [this]
{
    FTaskTagScope Scope(ETaskTag::EParallelRenderingThread);
    SceneRenderer.ComputeLightVisibility();
}, TaskConfig.TaskPriority));
- Dynamic mesh elements are not gathered only after all culling has finished. As soon as one part of the culling completes, the small batch of visible dynamic primitives it produced is traversed and its dynamic mesh elements are gathered
if (TaskConfig.Schedule == EVisibilityTaskSchedule::Parallel)
{
    if (ViewPackets.Num() == 1)
    {
        // When using a single view, dynamic mesh elements are pushed into a pipe that is executed on the render thread which allows for some overlap with compute relevance work.
        DynamicMeshElements.CommandPipe = Allocator.Create<TCommandPipe<FDynamicPrimitiveIndexList>>(TEXT("GatherDynamicMeshElements"));
        DynamicMeshElements.CommandPipe->SetCommandFunction([this](FDynamicPrimitiveIndexList&& DynamicPrimitiveIndexList)
        {
            GatherDynamicMeshElements(MoveTemp(DynamicPrimitiveIndexList));
        });
        DynamicMeshElements.CommandPipe->SetPrerequisiteTask(Tasks.DynamicMeshElementsPrerequisites);
        Tasks.DynamicMeshElementsPipe = FGraphEvent::CreateGraphEvent();
        DynamicMeshElements.CommandPipe->SetEmptyFunction([this]
        {
            Tasks.DynamicMeshElementsPipe->DispatchSubsequents();
            Tasks.DynamicMeshElements.Trigger();
        });
        // Take a reference that is released when the relevance pipe has completed. We only need to take one since there can only be one view.
        DynamicMeshElements.CommandPipe->AddNumCommands(1);
        // We don't need the primitive view masks when in parallel mode with a single view.
        bAllocatePrimitiveViewMasks = false;
    }
}
- Frustum Cull & Distance Cull & hardware occlusion queries
SceneRenderer.WaitOcclusionTests(RHICmdList);
// Parallel occlusion culling is not supported on mobile
check(!Views.IsEmpty())
checkf(!Views[0]->bIsMobileMultiViewEnabled, TEXT("This culling path was not tested with MMV"));
for (FVisibilityViewPacket& ViewPacket : ViewPackets)
{
    if (ViewPacket.OcclusionCull.ContextIfParallel)
    {
        ViewPacket.OcclusionCull.ContextIfParallel->Map(RHICmdList);
    }
    Tasks.BeginInitVisibility.AddPrerequisites(UE::Tasks::Launch(UE_SOURCE_LOCATION, [&ViewPacket]
    {
        FTaskTagScope Scope(ETaskTag::EParallelRenderingThread);
        ViewPacket.BeginInitVisibility();
    }, BeginInitVisibilityPrerequisites, TaskConfig.TaskPriority));
}
// Static relevance is finalized for ALL views after each view completes static mesh filtering tasks.
Tasks.FinalizeRelevance = UE::Tasks::Launch(UE_SOURCE_LOCATION, [this]
{
    TRACE_CPUPROFILER_EVENT_SCOPE(FSceneRenderer_FinalizeStaticRelevance);
    for (FVisibilityViewPacket& ViewPacket : ViewPackets)
    {
        ViewPacket.Relevance.Context->Finalize();
    }
}, Tasks.ComputeRelevance, TaskConfig.TaskPriority);
- Readback: in the previous frame the engine submitted a batch of bounding boxes to the GPU for occlusion testing, and the results are fetched here
- Dispatch an initialization task for each view (camera)
- Octree node culling
if (bShouldVisibilityCull && Flags.bUseVisibilityOctree)
{
    VisibleNodes = TaskData.Allocator.Create<FSceneBitArray>();
    const FConvexVolume& ViewCullingFrustum = View.GetCullingFrustum();
    CullOctree(Scene, View, Flags, *VisibleNodes, ViewCullingFrustum);
}
- Frustum Cull & Distance Cull
// Frustum culling tasks have to run serially if custom culling is not thread-safe. const UE::Tasks::EExtendedTaskPriority ExtendedTaskPriority = GetExtendedTaskPriority(bCullingIsThreadsafe); // Assign the number of expected commands first so the pipe can determine when the last task has completed. OcclusionCull.CommandPipe.AddNumCommands(TaskConfig.FrustumCull.NumTasks); for (uint32 TaskIndex = 0; TaskIndex < TaskConfig.FrustumCull.NumTasks; ++TaskIndex) { Tasks.FrustumCull.AddPrerequisites( UE::Tasks::Launch(UE_SOURCE_LOCATION, [this, Flags, MaxDrawDistanceScale, HLODState, VisibleNodes, TaskIndex]() mutable { TRACE_CPUPROFILER_EVENT_SCOPE(SceneVisibility_FrustumCull); FTaskTagScope TaskTagScope(ETaskTag::EParallelRenderingThread); int32 NumCulledPrimitives = FrustumCull(Scene, View, Flags, MaxDrawDistanceScale, HLODState, VisibleNodes, TaskConfig, TaskIndex); FPrimitiveRange PrimitiveRange; PrimitiveRange.StartIndex = TaskConfig.FrustumCull.NumPrimitivesPerTask * (TaskIndex); PrimitiveRange.EndIndex = TaskConfig.FrustumCull.NumPrimitivesPerTask + PrimitiveRange.StartIndex; PrimitiveRange.EndIndex = FMath::Min(PrimitiveRange.EndIndex, int32(TaskConfig.NumTestedPrimitives)); // Skip rendering of dynamic objects without static lighting for static reflection captures. if (View.bStaticSceneOnly) { for (FSceneSetBitIterator BitIt(View.PrimitiveVisibilityMap, PrimitiveRange.StartIndex); BitIt.GetIndex() < PrimitiveRange.EndIndex; ++BitIt) { if (!Scene.PrimitiveSceneProxies[BitIt.GetIndex()]->HasStaticLighting()) { View.PrimitiveVisibilityMap.AccessCorrespondingBit(BitIt) = false; NumCulledPrimitives++; } } } // Skip rendering of small objects when in wireframe mode for performance since wireframe doesn't enable occlusion culling. 
if (View.Family->EngineShowFlags.Wireframe) { const float ScreenSizeScale = FMath::Max(View.ViewMatrices.GetProjectionMatrix().M[0][0] * View.ViewRect.Width(), View.ViewMatrices.GetProjectionMatrix().M[1][1] * View.ViewRect.Height()); for (FSceneSetBitIterator BitIt(View.PrimitiveVisibilityMap, PrimitiveRange.StartIndex); BitIt.GetIndex() < PrimitiveRange.EndIndex; ++BitIt) { if (ScreenSizeScale * Scene.PrimitiveBounds[BitIt.GetIndex()].BoxSphereBounds.SphereRadius <= GWireframeCullThreshold) { View.PrimitiveVisibilityMap.AccessCorrespondingBit(BitIt) = false; NumCulledPrimitives++; } } } const uint32 NumVisiblePrimitives = PrimitiveRange.EndIndex - PrimitiveRange.StartIndex - NumCulledPrimitives; if (NumVisiblePrimitives == 0) { OcclusionCull.CommandPipe.ReleaseNumCommands(1); } else { OcclusionCull.CommandPipe.EnqueueCommand(PrimitiveRange); } TaskConfig.FrustumCull.NumCulledPrimitives.fetch_add(NumCulledPrimitives, std::memory_order_relaxed); }, bHasAlwaysVisible ? Tasks.AlwaysVisible : PrerequisiteTask, TaskConfig.TaskPriority, ExtendedTaskPriority)); } OcclusionCull.CommandPipe.ReleaseNumCommands(1);
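The pipes used above (the GatherDynamicMeshElements pipe and OcclusionCull.CommandPipe) follow a count-and-consume pattern: the number of expected commands is registered first, each producer then either enqueues a batch or releases its slot, and a completion callback fires once the count reaches zero. The sketch below shows only that bookkeeping, single-threaded. CommandPipeSketch, PrimitiveRange, onCommand, and onEmpty are hypothetical names; the real TCommandPipe runs asynchronously and its interface differs.

```cpp
#include <cassert>
#include <functional>

// Hypothetical batch description: a half-open range of primitive indices.
struct PrimitiveRange { int start; int end; };

// Single-threaded sketch of the count-and-consume idea behind a command pipe.
class CommandPipeSketch
{
public:
    std::function<void(const PrimitiveRange&)> onCommand;  // e.g. process one visible batch
    std::function<void()> onEmpty;                          // e.g. trigger the follow-up task

    // Register how many commands (or guard references) are still expected.
    void AddNumCommands(int n) { pending_ += n; }

    // A producer found visible primitives: consume the batch right away.
    void EnqueueCommand(const PrimitiveRange& range)
    {
        if (onCommand) onCommand(range);
        ReleaseNumCommands(1);
    }

    // A producer had nothing to submit, or a guard reference is dropped.
    void ReleaseNumCommands(int n)
    {
        assert(pending_ >= n);
        pending_ -= n;
        if (pending_ == 0 && onEmpty) onEmpty();
    }

private:
    int pending_ = 0;
};
```

The extra guard reference taken before the producers start, and released after the launch loop, keeps the completion callback from firing while tasks are still being created.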
BeginInitViews()
It hands the very expensive physics, skeletal, particle, and TLAS build work to the GPU async compute queue or the CPU thread pool.
- Compute FX positions: the View uniform buffer is normally bound only after culling, but Niagara compute shaders need these matrices as early as possible to simulate particle motion
if (FXSystem && FXSystem->RequiresEarlyViewUniformBuffer() && Views.IsValidIndex(0) && bRendererOutputFinalSceneColor)
{
    // during ISR, instanced view RHI resources need to be initialized first.
    if (FViewInfo* InstancedView = const_cast<FViewInfo*>(Views[0].GetInstancedView()))
    {
        InstancedView->InitRHIResources();
    }
    Views[0].InitRHIResources();
    FXSystem->PostInitViews(GraphBuilder, GetSceneViews(), !ViewFamily.EngineShowFlags.HitProxies);
}
- Gather dynamic meshes: skeletal animation and dynamically generated meshes
TaskDatas.VisibilityTaskData->StartGatherDynamicMeshElements();
- Gather information about all dynamic instances that need updating this frame, in preparation for the TLAS build
if (TaskDatas.RayTracingGatherInstances != nullptr)
{
    RayTracing::BeginGatherDynamicRayTracingInstances(*TaskDatas.RayTracingGatherInstances);
}
- Gather decals
if (!ViewFamily.EngineShowFlags.HitProxies)
{
    TaskDatas.Decals = FDecalVisibilityTaskData::Launch(GraphBuilder, *Scene, Views);
}
- Compute shadow data
if (bRendererOutputFinalSceneColor)
{
    BeginInitDynamicShadows(GraphBuilder, TaskDatas, InstanceCullingManager);
}
This computes the CSM splits and light-space matrices, performs shadow frustum culling, allocates the shadow map atlas, and builds the shadow contexts.
- Prepare global ambient lighting and the views
// Prepare the sky light irradiance
UpdateSkyIrradianceGpuBuffer(GraphBuilder, ViewFamily.EngineShowFlags, Scene->SkyLight, Scene->SkyIrradianceEnvironmentMap);
// Prepare the sky atmosphere resources
if (ShouldRenderSkyAtmosphere(Scene, ViewFamily.EngineShowFlags))
{
    InitSkyAtmosphereForViews(RHICmdList, GraphBuilder);
}
// Iterate over all views
for (int32 ViewIndex = Views.Num() - 1; ViewIndex >= 0; --ViewIndex)
{
    FViewInfo& View = Views[ViewIndex];
    View.UpdatePreExposure();                // update the auto-exposure parameters
    UpdateHairResources(GraphBuilder, View); // prepare the hair rendering resources
    View.InitRHIResources();                 // allocate the view's low-level constant buffers
}
for (FCustomRenderPassInfo& PassInfo : CustomRenderPassInfos)
{
    for (FViewInfo& View : PassInfo.Views)
    {
        View.InitRHIResources();
    }
}
- Compute shadows again
The dynamic data only becomes known after ProcessRenderThreadTasks(), so the remaining dynamic shadows are computed in a second call:
if (bRendererOutputFinalSceneColor) { BeginInitDynamicShadows(GraphBuilder, TaskDatas, InstanceCullingManager); }
EndInitViews()
It collects all culling results and packs them into the final draw-call queues.
- Wait until all ComputeViewVisibility() tasks have finished
TaskDatas.VisibilityTaskData->Finish();
- Prepare the shadow-casting data
BeginShadowGatherDynamicMeshElements(TaskDatas.DynamicShadows);
This prepares draw data such as vertex and index data, material information, transform matrices, and LODs.
- Handle GI
// If no multithreaded task was launched
if (ViewFamily.EngineShowFlags.HitProxies == 0 && Scene->PrecomputedLightVolumes.Num() > 0 && !(GILCUpdatePrimTaskEnabled && FPlatformProcess::SupportsMultithreading()))
{
    check(!TaskDatas.ILCUpdatePrim);
    // Update the ILC
    Scene->IndirectLightingCache.UpdateCache(Scene, *this, true);
}
// If an asynchronous ILC task was dispatched earlier, wait for it here and finalize it
if (TaskDatas.ILCUpdatePrim)
{
    Scene->IndirectLightingCache.FinalizeCacheUpdates(Scene, *this, *TaskDatas.ILCUpdatePrim);
}
// Upload the updated data to the GPU buffer
UpdatePrimitiveIndirectLightingCacheBuffers(GraphBuilder.RHICmdList);
The ILC (Indirect Lighting Cache) is mainly used to interpolate baked static lighting (Lightmass volume samples) onto dynamic objects.
- Handle translucency and reflection captures
SeparateTranslucencyDimensions = UpdateSeparateTranslucencyDimensions(*this);
SetupSceneReflectionCaptureBuffer(RHICmdList);
- Compute and update the dimensions of the separate translucency render target: modern engines usually render translucent objects in a separate pass, sometimes at reduced resolution, to save bandwidth
- Reflection Capture: pack the data of the Reflection Capture actors into a GPU buffer
- Special shadow handling for forward shading
if (IsForwardShadingEnabled(ShaderPlatform))
{
    // Dynamic shadows are synced earlier when forward shading is enabled.
    FinishInitDynamicShadows(GraphBuilder, TaskDatas.DynamicShadows, InstanceCullingManager);
}
With forward shading enabled, the dynamic shadow initialization has to be finished right here.
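Culling in Detail
Precomputed Visibility
- What is Precomputed Visibility
A form of offline occlusion culling. It divides the game world into 3D cells and precomputes, for each cell, which static objects can possibly be seen from it. At runtime no occlusion has to be computed for these objects: looking up the cell that contains the camera tells which of them are occluded.
- Why Precomputed Visibility is needed
Real-time dynamic occlusion culling sends query commands to the GPU every frame. With a very large number of objects in the scene, the querying itself can become the performance bottleneck.
For static objects the result can be baked ahead of time, so there is no need to query at runtime.
- Rough algorithm
- Find the open areas
- Divide the open areas into boxes, the cells
- Generate cameras (sample positions) inside each cell
- Each camera casts rays over the full 360° against the static meshes. If the rays are blocked by Static Mesh A so that Static Mesh B can never be seen, Static Mesh B is recorded as 0 (never visible) in this cell's bit mask table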
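The per-cell bit mask described in the algorithm above can be pictured as one bit per static primitive. This is a minimal sketch of such a table, assuming a dense integer id per primitive; CellVisibilitySketch, MarkVisible, and IsVisible are hypothetical names and not UE's data layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-cell visibility table: one bit per static primitive,
// 1 = potentially visible from this cell, 0 = never visible.
class CellVisibilitySketch
{
public:
    explicit CellVisibilitySketch(int numPrimitives)
        : bits_((numPrimitives + 7) / 8, std::uint8_t{0}), numPrimitives_(numPrimitives) {}

    // Offline step: called when at least one sample ray from the cell reaches the primitive.
    void MarkVisible(int visibilityId)
    {
        assert(visibilityId >= 0 && visibilityId < numPrimitives_);
        bits_[visibilityId >> 3] |= static_cast<std::uint8_t>(1u << (visibilityId & 7));
    }

    // Runtime step: one byte load and one bit test, with no ray casts or GPU queries.
    bool IsVisible(int visibilityId) const
    {
        assert(visibilityId >= 0 && visibilityId < numPrimitives_);
        return (bits_[visibilityId >> 3] & (1u << (visibilityId & 7))) != 0;
    }

private:
    std::vector<std::uint8_t> bits_;
    int numPrimitives_;
};
```

All the expensive ray casting happens while baking; the runtime cost per primitive is a single bit test.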
- How UE implements it
const uint8* FSceneViewState::ResolvePrecomputedVisibilityData(FViewInfo& View, const FScene* InScene) { const uint8* PrecomputedVisibilityData = NULL; if (InScene->PrecomputedVisibilityHandler && GAllowPrecomputedVisibility && View.Family->EngineShowFlags.PrecomputedVisibility) { const FPrecomputedVisibilityHandler& Handler = *InScene->PrecomputedVisibilityHandler; FViewElementPDI VisibilityCellsPDI(&View, nullptr, nullptr); // Draw visibility cell bounds for debugging if enabled if ((GShowPrecomputedVisibilityCells || View.Family->EngineShowFlags.PrecomputedVisibilityCells) && !GShowRelevantPrecomputedVisibilityCells) { for (int32 BucketIndex = 0; BucketIndex < Handler.PrecomputedVisibilityCellBuckets.Num(); BucketIndex++) { for (int32 CellIndex = 0; CellIndex < Handler.PrecomputedVisibilityCellBuckets[BucketIndex].Cells.Num(); CellIndex++) { const FPrecomputedVisibilityCell& CurrentCell = Handler.PrecomputedVisibilityCellBuckets[BucketIndex].Cells[CellIndex]; // Construct the cell's bounds const FBox CellBounds(CurrentCell.Min, CurrentCell.Min + FVector(Handler.PrecomputedVisibilityCellSizeXY, Handler.PrecomputedVisibilityCellSizeXY, Handler.PrecomputedVisibilityCellSizeZ)); if (View.GetCullingFrustum().IntersectBox(CellBounds.GetCenter(), CellBounds.GetExtent())) { DrawWireBox(&VisibilityCellsPDI, CellBounds, FColor(50, 50, 255), SDPG_World); } } } } //Determine view origin FVector ViewOrigin = View.CullingOrigin; #if !(UE_BUILD_SHIPPING || UE_BUILD_TEST) if (const FViewMatrices* FrozenViewMatrices = GetFrozenViewMatrices()) { // Use the frozen view for culling so we can test that it's working ViewOrigin = FrozenViewMatrices->GetViewOrigin(); } #endif // Calculate the bucket that ViewOrigin falls into // Cells are hashed into buckets to reduce search time const float FloatOffsetX = (ViewOrigin.X - Handler.PrecomputedVisibilityCellBucketOriginXY.X) / Handler.PrecomputedVisibilityCellSizeXY; // FMath::TruncToInt rounds toward 0, we want to always round down const 
int32 BucketIndexX = FMath::Abs((FMath::TruncToInt(FloatOffsetX) - (FloatOffsetX < 0.0f ? 1 : 0)) / Handler.PrecomputedVisibilityCellBucketSizeXY % Handler.PrecomputedVisibilityNumCellBuckets); const float FloatOffsetY = (ViewOrigin.Y -Handler.PrecomputedVisibilityCellBucketOriginXY.Y) / Handler.PrecomputedVisibilityCellSizeXY; const int32 BucketIndexY = FMath::Abs((FMath::TruncToInt(FloatOffsetY) - (FloatOffsetY < 0.0f ? 1 : 0)) / Handler.PrecomputedVisibilityCellBucketSizeXY % Handler.PrecomputedVisibilityNumCellBuckets); const int32 PrecomputedVisibilityBucketIndex = BucketIndexY * Handler.PrecomputedVisibilityCellBucketSizeXY + BucketIndexX; check(PrecomputedVisibilityBucketIndex < Handler.PrecomputedVisibilityCellBuckets.Num()); const FPrecomputedVisibilityBucket& CurrentBucket = Handler.PrecomputedVisibilityCellBuckets[PrecomputedVisibilityBucketIndex]; for (int32 CellIndex = 0; CellIndex < CurrentBucket.Cells.Num(); CellIndex++) { const FPrecomputedVisibilityCell& CurrentCell = CurrentBucket.Cells[CellIndex]; // Construct the cell's bounds const FBox CellBounds(CurrentCell.Min, CurrentCell.Min + FVector(Handler.PrecomputedVisibilityCellSizeXY, Handler.PrecomputedVisibilityCellSizeXY, Handler.PrecomputedVisibilityCellSizeZ)); // Check if ViewOrigin is inside the current cell if (CellBounds.IsInside(ViewOrigin)) { // Reuse a cached decompressed chunk if possible if (CachedVisibilityChunk && CachedVisibilityHandlerId == InScene->PrecomputedVisibilityHandler->GetId() && CachedVisibilityBucketIndex == PrecomputedVisibilityBucketIndex && CachedVisibilityChunkIndex == CurrentCell.ChunkIndex) { checkSlow(CachedVisibilityChunk->Num() >= CurrentCell.DataOffset + CurrentBucket.CellDataSize); PrecomputedVisibilityData = &(*CachedVisibilityChunk)[CurrentCell.DataOffset]; } else { const FCompressedVisibilityChunk& CompressedChunk = Handler.PrecomputedVisibilityCellBuckets[PrecomputedVisibilityBucketIndex].CellDataChunks[CurrentCell.ChunkIndex]; CachedVisibilityBucketIndex 
= PrecomputedVisibilityBucketIndex; CachedVisibilityChunkIndex = CurrentCell.ChunkIndex; CachedVisibilityHandlerId = InScene->PrecomputedVisibilityHandler->GetId(); if (CompressedChunk.bCompressed) { // Decompress the needed visibility data chunk DecompressedVisibilityChunk.Reset(); DecompressedVisibilityChunk.AddUninitialized(CompressedChunk.UncompressedSize); verify(FCompression::UncompressMemory( NAME_Zlib, DecompressedVisibilityChunk.GetData(), CompressedChunk.UncompressedSize, CompressedChunk.Data.GetData(), CompressedChunk.Data.Num())); CachedVisibilityChunk = &DecompressedVisibilityChunk; } else { CachedVisibilityChunk = &CompressedChunk.Data; } checkSlow(CachedVisibilityChunk->Num() >= CurrentCell.DataOffset + CurrentBucket.CellDataSize); // Return a pointer to the cell containing ViewOrigin's decompressed visibility data PrecomputedVisibilityData = &(*CachedVisibilityChunk)[CurrentCell.DataOffset]; } } } } return PrecomputedVisibilityData; }
If the scene is very large, baking can produce a huge number of cells. Walking an array of several hundred thousand entries every frame to evaluate CellBounds.IsInside(ViewOrigin) would make the CPU the bottleneck. UE instead hashes the cells into 2D buckets: from the camera's world position, a division and a modulo locate the camera's bucket in O(1) time.
// Calculate the bucket that ViewOrigin falls into
// Cells are hashed into buckets to reduce search times
const float FloatOffsetX = (ViewOrigin.X - Handler.PrecomputedVisibilityCellBucketOriginXY.X) / Handler.PrecomputedVisibilityCellSizeXY;
// FMath::TruncToInt rounds toward 0, we want to always round down
const int32 BucketIndexX = FMath::Abs((FMath::TruncToInt(FloatOffsetX) - (FloatOffsetX < 0.0f ? 1 : 0)) / Handler.PrecomputedVisibilityCellBucketSizeXY % Handler.PrecomputedVisibilityNumCellBuckets);
const float FloatOffsetY = (ViewOrigin.Y - Handler.PrecomputedVisibilityCellBucketOriginXY.Y) / Handler.PrecomputedVisibilityCellSizeXY;
const int32 BucketIndexY = FMath::Abs((FMath::TruncToInt(FloatOffsetY) - (FloatOffsetY < 0.0f ? 1 : 0)) / Handler.PrecomputedVisibilityCellBucketSizeXY % Handler.PrecomputedVisibilityNumCellBuckets);
const int32 PrecomputedVisibilityBucketIndex = BucketIndexY * Handler.PrecomputedVisibilityCellBucketSizeXY + BucketIndexX;
The cells of that bucket are then traversed to find the one that contains the camera, and that cell's bit mask table is used.
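The per-axis bucket computation above can be isolated into a small function. This sketch mirrors the same expression for one axis; BucketGridSketch and BucketIndexAlongAxis are hypothetical names, and the field comments only indicate which handler value each one stands for.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical parameters standing in for the handler fields used above.
struct BucketGridSketch
{
    float origin;        // bucket origin along this axis
    float cellSizeXY;    // size of one cell along this axis
    int   bucketSizeXY;  // number of cells per bucket along one axis
    int   numBuckets;    // number of buckets along one axis
};

// Maps a world-space coordinate to a bucket index along one axis.
int BucketIndexAlongAxis(const BucketGridSketch& grid, float worldCoord)
{
    assert(grid.cellSizeXY > 0.f && grid.bucketSizeXY > 0 && grid.numBuckets > 0);
    const float offsetInCells = (worldCoord - grid.origin) / grid.cellSizeXY;
    // The cast truncates toward zero, so subtract 1 for negative offsets to round down.
    const int cellIndex = static_cast<int>(offsetInCells) - (offsetInCells < 0.f ? 1 : 0);
    // The modulo wraps far-away cells back into the table; abs folds negative results.
    return std::abs(cellIndex / grid.bucketSizeXY % grid.numBuckets);
}
```

Because of the modulo, several distant regions can share a bucket, which is why the cells inside the bucket still have to be tested with an exact bounds check afterwards.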
The visibility data is not all loaded at once. Cells are grouped into regions (chunks), and each bucket keeps the data of its chunks. When the camera enters a chunk, that chunk's data is loaded (decompressed if necessary); a new chunk is loaded only once the camera has left the current one, which keeps memory use bounded.
if (CellBounds.IsInside(ViewOrigin))
{
    // Reuse a cached decompressed chunk if possible
    // Cache hit check: if the camera is still in the previous chunk, return the memory pointer directly
    if (CachedVisibilityChunk
        && CachedVisibilityHandlerId == InScene->PrecomputedVisibilityHandler->GetId()
        && CachedVisibilityBucketIndex == PrecomputedVisibilityBucketIndex
        && CachedVisibilityChunkIndex == CurrentCell.ChunkIndex)
    {
        checkSlow(CachedVisibilityChunk->Num() >= CurrentCell.DataOffset + CurrentBucket.CellDataSize);
        PrecomputedVisibilityData = &(*CachedVisibilityChunk)[CurrentCell.DataOffset];
    }
    else
    {
        const FCompressedVisibilityChunk& CompressedChunk = Handler.PrecomputedVisibilityCellBuckets[PrecomputedVisibilityBucketIndex].CellDataChunks[CurrentCell.ChunkIndex];
        CachedVisibilityBucketIndex = PrecomputedVisibilityBucketIndex;
        CachedVisibilityChunkIndex = CurrentCell.ChunkIndex;
        CachedVisibilityHandlerId = InScene->PrecomputedVisibilityHandler->GetId();
        // Cache miss: a chunk boundary was crossed, so the new chunk has to be decompressed
        if (CompressedChunk.bCompressed)
        {
            // Decompress the needed visibility data chunk
            DecompressedVisibilityChunk.Reset();
            DecompressedVisibilityChunk.AddUninitialized(CompressedChunk.UncompressedSize);
            verify(FCompression::UncompressMemory(
                NAME_Zlib,
                DecompressedVisibilityChunk.GetData(),
                CompressedChunk.UncompressedSize,
                CompressedChunk.Data.GetData(),
                CompressedChunk.Data.Num()));
            CachedVisibilityChunk = &DecompressedVisibilityChunk;
        }
        else
        {
            CachedVisibilityChunk = &CompressedChunk.Data;
        }
        checkSlow(CachedVisibilityChunk->Num() >= CurrentCell.DataOffset + CurrentBucket.CellDataSize);
        // Return a pointer to the cell containing ViewOrigin's decompressed visibility data
        PrecomputedVisibilityData = &(*CachedVisibilityChunk)[CurrentCell.DataOffset];
    }
}
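The chunk handling above amounts to a single-entry cache: keep the last decompressed chunk and redo the work only when a different chunk is requested. The sketch below is a simplified model of that idea. ChunkSketch, ChunkCacheSketch, Resolve, and decompressCount are hypothetical names, and a plain copy stands in for the zlib decompression.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stored chunk; a real one would hold compressed bytes.
struct ChunkSketch
{
    std::vector<std::uint8_t> data;
};

// Single-entry cache: only the most recently used chunk stays decompressed.
class ChunkCacheSketch
{
public:
    explicit ChunkCacheSketch(const std::vector<ChunkSketch>& chunks) : chunks_(chunks) {}

    // Returns a pointer to the visibility bytes of one cell inside the given chunk.
    const std::uint8_t* Resolve(int chunkIndex, int dataOffset)
    {
        assert(chunkIndex >= 0 && chunkIndex < static_cast<int>(chunks_.size()));
        if (chunkIndex != cachedChunkIndex_)
        {
            // Cache miss: the camera moved into a cell stored in another chunk.
            decompressed_ = chunks_[chunkIndex].data;  // stand-in for a real decompress call
            cachedChunkIndex_ = chunkIndex;
            ++decompressCount;
        }
        assert(dataOffset >= 0 && dataOffset < static_cast<int>(decompressed_.size()));
        return &decompressed_[dataOffset];
    }

    int decompressCount = 0;  // how many times a chunk had to be (re)loaded

private:
    const std::vector<ChunkSketch>& chunks_;   // must outlive the cache
    std::vector<std::uint8_t> decompressed_;
    int cachedChunkIndex_ = -1;
};
```

As long as the camera stays within cells of one chunk, every frame after the first is a cache hit and costs only an index comparison.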
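Hardware Occlusion Queries
The hardware occlusion tests are not executed in this pass; only the results of the previous frame's tests are fetched here.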
SceneRenderer.WaitOcclusionTests(RHICmdList);
if (ViewPacket.OcclusionCull.ContextIfParallel)
{
ViewPacket.OcclusionCull.ContextIfParallel->Map(RHICmdList);
}
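Consuming query results one frame late means the CPU never stalls waiting for the GPU, at the price of slightly stale data. The following sketch models that bookkeeping. OcclusionHistorySketch, StoreResult, and IsOccluded are hypothetical names, and treating a primitive without history as visible is an assumption made for this sketch, not a statement about UE's exact policy.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical sketch of latency-tolerant occlusion queries: results requested in
// one frame are consumed in a later frame instead of being waited for.
class OcclusionHistorySketch
{
public:
    // Called when the GPU result of an earlier query has become available.
    void StoreResult(int primitiveId, int visiblePixelCount)
    {
        lastVisiblePixels_[primitiveId] = visiblePixelCount;
    }

    // Called while culling the current frame.
    bool IsOccluded(int primitiveId) const
    {
        const auto it = lastVisiblePixels_.find(primitiveId);
        // No history yet (for example a newly spawned primitive): treat it as visible.
        if (it == lastVisiblePixels_.end())
        {
            return false;
        }
        return it->second == 0;
    }

private:
    std::unordered_map<int, int> lastVisiblePixels_;
};
```

The conservative default matters: wrongly drawing a hidden object costs some GPU time, while wrongly hiding a visible one produces a visible popping artifact.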
Frustum Cull & Distance Cull
Octree
static void CullOctree(const FScene& Scene, FViewInfo& View, const FFrustumCullingFlags& Flags, FSceneBitArray& OutVisibleNodes, const FConvexVolume& ViewCullingFrustum)
{
TRACE_CPUPROFILER_EVENT_SCOPE(SceneVisibility_CullOctree);
// Two bits per octree node, 1st bit is Inside Frustum, 2nd bit is Outside Frustum
OutVisibleNodes.Init(false, Scene.PrimitiveOctree.GetNumNodes() * 2);
Scene.PrimitiveOctree.FindNodesWithPredicate(
[&View, &OutVisibleNodes, &Flags, &ViewCullingFrustum](FScenePrimitiveOctree::FNodeIndex ParentNodeIndex, FScenePrimitiveOctree::FNodeIndex NodeIndex, const FBoxCenterAndExtent& NodeBounds)
{
// If the parent node is completely contained there is no need to test containment
if (ParentNodeIndex != INDEX_NONE && !OutVisibleNodes[(ParentNodeIndex * 2) + 1])
{
OutVisibleNodes[NodeIndex * 2] = true;
OutVisibleNodes[NodeIndex * 2 + 1] = false;
return true;
}
const FPlane* PermutedPlanePtr = ViewCullingFrustum.PermutedPlanes.GetData();
bool bIntersects = false;
if (Flags.bUseFastIntersect)
{
bIntersects = IntersectBox8Plane(NodeBounds.Center, NodeBounds.Extent, PermutedPlanePtr);
}
else
{
bIntersects = ViewCullingFrustum.IntersectBox(NodeBounds.Center, NodeBounds.Extent);
}
if (bIntersects)
{
OutVisibleNodes[NodeIndex * 2] = true;
OutVisibleNodes[NodeIndex * 2 + 1] = ViewCullingFrustum.GetBoxIntersectionOutcode(NodeBounds.Center, NodeBounds.Extent).GetOutside();
}
return bIntersects;
},
[](FScenePrimitiveOctree::FNodeIndex /*ParentNodeIndex*/, FScenePrimitiveOctree::FNodeIndex /*NodeIndex*/, const FBoxCenterAndExtent& /*NodeBounds*/)
{
});
}
- The two-bit design
To keep memory compact and improve cache hit rates, UE assigns 2 bits to each node: one records whether any part of the node lies inside the frustum (Inside), the other whether any part lies outside the frustum (Outside)
- Inside = false, Outside = false: entirely outside
- Inside = true, Outside = true: straddles the boundary
- Inside = true, Outside = false: fully contained
// Two bits per octree node, 1st bit is Inside Frustum, 2nd bit is Outside Frustum
OutVisibleNodes.Init(false, Scene.PrimitiveOctree.GetNumNodes() * 2);
- Traverse the octree
- Check whether the parent node is fully inside the frustum. If it is, the current node must be too, so it is marked as fully contained without an intersection test, and traversal continues into its children (the predicate returns true)
// If the parent node is completely contained there is no need to test containment
if (ParentNodeIndex != INDEX_NONE && !OutVisibleNodes[(ParentNodeIndex * 2) + 1])
{
    OutVisibleNodes[NodeIndex * 2] = true;
    OutVisibleNodes[NodeIndex * 2 + 1] = false;
    return true;
}
- Test whether the current node intersects the frustum (bIntersects)
if (Flags.bUseFastIntersect)
{
    bIntersects = IntersectBox8Plane(NodeBounds.Center, NodeBounds.Extent, PermutedPlanePtr);
}
else
{
    bIntersects = ViewCullingFrustum.IntersectBox(NodeBounds.Center, NodeBounds.Extent);
}
- Set Inside and Outside
if (bIntersects)
{
    OutVisibleNodes[NodeIndex * 2] = true;
    OutVisibleNodes[NodeIndex * 2 + 1] = ViewCullingFrustum.GetBoxIntersectionOutcode(NodeBounds.Center, NodeBounds.Extent).GetOutside();
}
Inside (NodeIndex * 2): if the node intersects the frustum (bIntersects = true), Inside is always true
Outside (NodeIndex * 2 + 1): GetBoxIntersectionOutcode() reports whether part of the box lies outside the frustum; if so, the bit is true
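The two-bit classification and the inheritance shortcut can be tried out in one dimension, where a node is an interval and the "frustum" is another interval. This is a hedged illustration only: IntervalSketch, NodeBitsSketch, and ClassifyNode are hypothetical names, and real nodes are boxes tested against frustum planes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical 1D stand-in: nodes and the "frustum" are closed intervals.
struct IntervalSketch { float min, max; };

struct NodeBitsSketch
{
    std::vector<bool> bits;  // bit 2*i = Inside, bit 2*i + 1 = Outside
    explicit NodeBitsSketch(int numNodes) : bits(numNodes * 2, false) {}

    bool Inside(int node) const  { return bits[node * 2]; }
    bool Outside(int node) const { return bits[node * 2 + 1]; }
};

// Classifies one node; returns true when traversal should descend into its children.
// Pass parent = -1 for the root.
bool ClassifyNode(NodeBitsSketch& out, int parent, int node,
                  const IntervalSketch& bounds, const IntervalSketch& frustum)
{
    // Parent fully contained (Inside set, Outside clear): the child is too, so skip the test.
    if (parent >= 0 && out.Inside(parent) && !out.Outside(parent))
    {
        out.bits[node * 2] = true;
        out.bits[node * 2 + 1] = false;
        return true;
    }
    const bool intersects = bounds.max >= frustum.min && bounds.min <= frustum.max;
    if (intersects)
    {
        out.bits[node * 2] = true;
        out.bits[node * 2 + 1] = bounds.min < frustum.min || bounds.max > frustum.max;
    }
    return intersects;  // both bits stay false for nodes entirely outside
}
```

The fourth combination (Inside = false, Outside = true) is never written; a node that does not intersect simply keeps its initial (false, false) state.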
- Frustum Cull
- Always-visible objects (such as a skybox) are not culled at all and are updated directly
// Always Visible
const bool bHasAlwaysVisible = TaskConfig.NumVisiblePrimitives > 0;
if (bHasAlwaysVisible)
{
    const float CurrentWorldTime = View.Family->Time.GetWorldTimeSeconds();
    for (uint32 TaskIndex = 0; TaskIndex < TaskConfig.AlwaysVisible.NumTasks; ++TaskIndex)
    {
        Tasks.AlwaysVisible.AddPrerequisites(
            UE::Tasks::Launch(UE_SOURCE_LOCATION, [this, Flags, TaskIndex, CurrentWorldTime]() mutable
            {
                TRACE_CPUPROFILER_EVENT_SCOPE(SceneVisibility_AlwaysVisible);
                SCOPE_CYCLE_COUNTER(STAT_UpdateAlwaysVisible);
                FTaskTagScope TaskTagScope(ETaskTag::EParallelRenderingThread);
                UpdateAlwaysVisible(Scene, View, Flags, TaskConfig, TaskIndex, CurrentWorldTime);
            }, PrerequisiteTask, TaskConfig.TaskPriority, UE::Tasks::EExtendedTaskPriority::None));
    }
}
- Frustum Cull
- UE splits Frustum Cull into multiple subtasks
for (uint32 TaskIndex = 0; TaskIndex < TaskConfig.FrustumCull.NumTasks; ++TaskIndex) { Tasks.FrustumCull.AddPrerequisites( UE::Tasks::Launch(UE_SOURCE_LOCATION, [this, Flags, MaxDrawDistanceScale, HLODState, VisibleNodes, TaskIndex]() mutable { TRACE_CPUPROFILER_EVENT_SCOPE(SceneVisibility_FrustumCull); FTaskTagScope TaskTagScope(ETaskTag::EParallelRenderingThread); int32 NumCulledPrimitives = FrustumCull(Scene, View, Flags, MaxDrawDistanceScale, HLODState, VisibleNodes, TaskConfig, TaskIndex); FPrimitiveRange PrimitiveRange; PrimitiveRange.StartIndex = TaskConfig.FrustumCull.NumPrimitivesPerTask * (TaskIndex); PrimitiveRange.EndIndex = TaskConfig.FrustumCull.NumPrimitivesPerTask + PrimitiveRange.StartIndex; PrimitiveRange.EndIndex = FMath::Min(PrimitiveRange.EndIndex, int32(TaskConfig.NumTestedPrimitives)); // Skip rendering of dynamic objects without static lighting for static reflection captures. if (View.bStaticSceneOnly) { for (FSceneSetBitIterator BitIt(View.PrimitiveVisibilityMap, PrimitiveRange.StartIndex); BitIt.GetIndex() < PrimitiveRange.EndIndex; ++BitIt) { if (!Scene.PrimitiveSceneProxies[BitIt.GetIndex()]->HasStaticLighting()) { View.PrimitiveVisibilityMap.AccessCorrespondingBit(BitIt) = false; NumCulledPrimitives++; } } } // Skip rendering of small objects when in wireframe mode for performance since wireframe doesn't enable occlusion culling. 
if (View.Family->EngineShowFlags.Wireframe) { const float ScreenSizeScale = FMath::Max(View.ViewMatrices.GetProjectionMatrix().M[0][0] * View.ViewRect.Width(), View.ViewMatrices.GetProjectionMatrix().M[1][1] * View.ViewRect.Height()); for (FSceneSetBitIterator BitIt(View.PrimitiveVisibilityMap, PrimitiveRange.StartIndex); BitIt.GetIndex() < PrimitiveRange.EndIndex; ++BitIt) { if (ScreenSizeScale * Scene.PrimitiveBounds[BitIt.GetIndex()].BoxSphereBounds.SphereRadius <= GWireframeCullThreshold) { View.PrimitiveVisibilityMap.AccessCorrespondingBit(BitIt) = false; NumCulledPrimitives++; } } } const uint32 NumVisiblePrimitives = PrimitiveRange.EndIndex - PrimitiveRange.StartIndex - NumCulledPrimitives; if (NumVisiblePrimitives == 0) { OcclusionCull.CommandPipe.ReleaseNumCommands(1); } else { OcclusionCull.CommandPipe.EnqueueCommand(PrimitiveRange); } TaskConfig.FrustumCull.NumCulledPrimitives.fetch_add(NumCulledPrimitives, std::memory_order_relaxed); }, bHasAlwaysVisible ? Tasks.AlwaysVisible : PrerequisiteTask, TaskConfig.TaskPriority, ExtendedTaskPriority)); }
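The range arithmetic in the task lambda above (start = size * index, end clamped to the number of tested primitives) can be shown on its own. This is a small sketch with hypothetical names (RangeSketch, SplitIntoTaskRanges); it only reproduces the index math, not the task launching.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct RangeSketch { int start; int end; };  // half-open range [start, end)

// Splits numPrimitives into fixed-size ranges; the last range is clamped.
std::vector<RangeSketch> SplitIntoTaskRanges(int numPrimitives, int primitivesPerTask)
{
    assert(numPrimitives >= 0 && primitivesPerTask > 0);
    const int numTasks = (numPrimitives + primitivesPerTask - 1) / primitivesPerTask;
    std::vector<RangeSketch> ranges;
    ranges.reserve(numTasks);
    for (int taskIndex = 0; taskIndex < numTasks; ++taskIndex)
    {
        const int start = primitivesPerTask * taskIndex;
        const int end = std::min(start + primitivesPerTask, numPrimitives);
        ranges.push_back(RangeSketch{start, end});
    }
    return ranges;
}
```

If the per-task size is a multiple of 32, no two tasks ever write the same 32-bit word of a visibility bit array, so the tasks can run in parallel without synchronizing on that array.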
FrustumCull
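- UE does not walk the primitives one at a time; it uses a two-level loop
- The outer loop runs over VisWords. The visibility map is a bit mask (0 = not visible, 1 = visible), and instead of iterating it bit by bit, UE reads its backing storage as an array of uint32 words, each holding the bits of 32 primitives (the code calls this unit a DWORD, as in NumBitsPerDWORD)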
    uint32* RESTRICT VisWords = View.PrimitiveVisibilityMap.GetData();
    for (int32 WordIndex = TaskWordOffset; WordIndex < TaskWordOffset + int32(TaskConfig.FrustumCull.NumWordsPerTask) && WordIndex * NumBitsPerDWORD < BitArrayNumInner; WordIndex++) {
        if (!Flags.bShouldVisibilityCull) {
            VisBits = VisWords[WordIndex];
        }
    }
This check decides whether culling is needed at all. If it is not, the visibility bits computed earlier for this word are loaded as they are
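- The inner loop runs over BitSubIndex: by shifting Mask <<= 1 on every iteration, it checks the 32 primitives of the word one by one
    for (int32 BitSubIndex = 0; BitSubIndex < NumBitsPerDWORD && WordIndex * NumBitsPerDWORD + BitSubIndex < BitArrayNumInner; BitSubIndex++, Mask <<= 1) {
        //...
    }
- When visibility culling is enabled, bIsVisible starts out as true and the tests that follow may clear it. When it is disabled, the state that an earlier pass saved in VisBits is read back with (VisBits & Mask) == Mask. In both cases a primitive that IsPrimitiveHidden() reports as hidden is never visible
    int32 Index = WordIndex * NumBitsPerDWORD + BitSubIndex;
    bool bPrimitiveIsHidden = IsPrimitiveHidden(Scene, View, Index, Flags);
    bool bIsVisible = Flags.bShouldVisibilityCull ? true : (VisBits & Mask) == Mask;
    bIsVisible = bIsVisible && !bPrimitiveIsHidden;
- Octree query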
Look up whether the octree node that contains the primitive intersects the frustum, is fully contained in it, or lies outside it. Each node owns two entries in VisibleNodes: the first says whether the node is visible, the second whether it is partially outside the frustum
    if (Flags.bUseVisibilityOctree) {
        // If the parent octree node was completely contained by the frustum, there is no need do an additional frustum test on the primitive bounds
        // If the parent octree node is partially in the frustum, perform an additional test on the primitive bounds
        uint32 OctreeNodeIndex = Scene.PrimitiveOctreeIndex[Index];
        bIsVisible = (*VisibleNodes)[OctreeNodeIndex * 2];
        bPartiallyOutside = (*VisibleNodes)[OctreeNodeIndex * 2 + 1];
    }
- Culling
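    if (bIsVisible) {
        int32 VisibilityId = INDEX_NONE;
        if (Flags.bUseCustomCulling && ((Scene.PrimitiveOcclusionFlags[Index] & CustomVisibilityFlags) == CustomVisibilityFlags)) {
            VisibilityId = Scene.PrimitiveSceneProxies[Index]->GetVisibilityId();
        }
        bIsVisible = !bPartiallyOutside || IsPrimitiveVisible(View, PermutedPlanePtr, ViewCullingFrustum, Bounds, VisibilityId, Flags);
    }
- VisibilityId: the ID handed to the custom visibility query (View.CustomVisibilityQuery). It is only fetched when custom culling is enabled and the primitive's occlusion flags ask for it
- If the primitive's octree node is fully inside the frustum (bPartiallyOutside is false), the primitive stays visible with no further test
- Otherwise IsPrimitiveVisible() runs. With custom culling enabled it first asks the custom visibility query, and culls the primitive if the query reports it as not visible
- It then tests the bounds against the frustum planes (optionally a sphere test first, then a box test) and culls the primitive if the bounds lie entirely outside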
    // Returns true if the frustum and bounds intersect
    inline bool IsPrimitiveVisible(FViewInfo& View, const FPlane* PermutedPlanePtr, const FConvexVolume& ViewCullingFrustum, const FPrimitiveBounds& Bounds, int32 VisibilityId, FFrustumCullingFlags Flags) {
        // The custom culling and sphere culling are additional tests, meaning that if they pass, the
        // remaining culling tests will still be performed. If any of the tests fail, then the primitive
        // is culled, and the remaining tests do not need be performed
        if (Flags.bUseCustomCulling && !View.CustomVisibilityQuery->IsVisible(VisibilityId, FBoxSphereBounds(Bounds.BoxSphereBounds.Origin, Bounds.BoxSphereBounds.BoxExtent, Bounds.BoxSphereBounds.SphereRadius))) {
            return false;
        }
        if (Flags.bUseSphereTestFirst && !ViewCullingFrustum.IntersectSphere(Bounds.BoxSphereBounds.Origin, Bounds.BoxSphereBounds.SphereRadius)) {
            return false;
        }
        if (Flags.bUseFastIntersect) {
            return IntersectBox8Plane(Bounds.BoxSphereBounds.Origin, Bounds.BoxSphereBounds.BoxExtent, PermutedPlanePtr);
        } else {
            return ViewCullingFrustum.IntersectBox(Bounds.BoxSphereBounds.Origin, Bounds.BoxSphereBounds.BoxExtent);
        }
    }
- Distance cull
- Compute the maximum draw distance
    float MaxDrawDistance = Bounds.MaxCullDistance * MaxDrawDistanceScale;
    float MinDrawDistanceSq = FMath::Square(Bounds.MinDrawDistance * MaxDrawDistanceScale);
MaxDrawDistanceScale: the view-distance scale that the game exposes as a setting
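- Compute the distance to the center or to the edge
    if (GDistanceCullToSphereEdge) {
        ComputeDistances(Bounds, ViewOriginForDistanceCulling, ClosestDistSquared, FurthestDistSquared);
    } else {
        ClosestDistSquared = FurthestDistSquared = FVector::DistSquared(Bounds.BoxSphereBounds.Origin, ViewOriginForDistanceCulling);
    }
- Center distance (the default): the distance from the camera to the center of the primitive's bounds. For a single very large mesh such as a Great Wall or a river, the center can be kilometers away, so the mesh suddenly disappears because its center is too far, even though the player is standing right at its edge
- Edge distance: the nearest and farthest distances from the camera to the edge of the bounding sphere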
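- Fade band
    if (bHasMaxDrawDistance) {
        float MaxFadeDistanceSquared = FMath::Square(MaxDrawDistance + FadeRadius);
        float MinFadeDistanceSquared = FMath::Square(MaxDrawDistance - FadeRadius);
        if ((ClosestDistSquared < MaxFadeDistanceSquared && ClosestDistSquared > MinFadeDistanceSquared) && Scene.PrimitiveSceneProxies[Index]->IsUsingDistanceCullFade()) // Proxy call is intentionally behind the fade check to prevent an expensive memory read
        {
            FadingBits |= Mask;
        }
    }
When a primitive is culled for being too far away, popping out of view abruptly looks harsh. Here UE computes a band of width FadeRadius on either side of MaxDrawDistance. A primitive whose closest distance falls inside that band, and whose proxy uses distance cull fade, gets its bit set in FadingBits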
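To make the distance logic above concrete, here is a minimal standalone sketch in plain C++. It is not engine code: BoundsSketch, DistanceResult and ClassifyByDistance are hypothetical names, and the rule "cull once the closest distance exceeds MaxDrawDistance + FadeRadius" is an assumption chosen to be consistent with the fade band in the snippet, not something the quoted source confirms.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical stand-in for the per-primitive bounds data.
struct BoundsSketch {
    Vec3  origin;
    float sphereRadius;
    float minDrawDistance;
    float maxCullDistance;  // <= 0 means "no max draw distance" in this sketch
};

enum class DistanceResult { Visible, Fading, Culled };

inline float Distance(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

DistanceResult ClassifyByDistance(const BoundsSketch& b, const Vec3& viewOrigin,
                                  float maxDrawDistanceScale, float fadeRadius,
                                  bool toSphereEdge) {
    assert(fadeRadius >= 0.0f && maxDrawDistanceScale > 0.0f);

    // Center mode: closest == furthest == distance to the bounds center.
    const float centerDist = Distance(b.origin, viewOrigin);
    float closest = centerDist;
    float furthest = centerDist;
    if (toSphereEdge) {
        // Edge mode: measure to the near and far side of the bounding sphere.
        closest = std::max(centerDist - b.sphereRadius, 0.0f);
        furthest = centerDist + b.sphereRadius;
    }

    // Too close: everything of the primitive is inside the minimum distance.
    if (furthest < b.minDrawDistance * maxDrawDistanceScale) {
        return DistanceResult::Culled;
    }

    if (b.maxCullDistance > 0.0f) {
        const float maxDraw = b.maxCullDistance * maxDrawDistanceScale;
        if (closest > maxDraw + fadeRadius) {   // assumption: cull past the band
            return DistanceResult::Culled;
        }
        if (closest > maxDraw - fadeRadius) {   // inside the fade band
            return DistanceResult::Fading;
        }
    }
    return DistanceResult::Visible;
}
```

With a bounding sphere of radius 4900 centered 5000 units away and a max draw distance of 500, center mode culls the mesh while edge mode keeps it, which is exactly the "huge mesh disappears while you stand next to it" case described above.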
HIZ Culling
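- Why HIZ Culling is needed
Traditional hardware occlusion queries have to be read back on the CPU, so their results lag at least one frame behind. HIZ Culling exists to remove that latency
- What HIZ Culling is
A GPU-driven form of occlusion culling: instead of relying on hardware occlusion queries, the test is implemented by hand in compute shaders against a hierarchical Z-buffer (HZB)
- Algorithm
Because HIZ Culling runs on the GPU, it only starts after CPU culling has completely finished and before the Base Pass begins, which reduces the number of Base Pass draw calls. At that point the current frame's depth buffer does not exist yet, so the previous frame's depth has to be used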
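- Culling pass 1: a compute shader reads the previous frame's HZB and tests the current frame's bounding boxes against it
Primitives judged visible -> written straight into IndirectArgsBuffer and drawn right away with ExecuteIndirect
Primitives judged occluded -> parked in the OccludedList buffer
- Build the current frame's depth mip chain (the HZB) from the primitives that the first pass drew
- Culling pass 2: the engine dispatches a second compute shader that takes the "occluded" primitives parked in OccludedList and tests them again, this time against the current frame's HZB. Any primitive that turns out to be exposed is written into a second draw-call queue and drawn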
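You may wonder why this design works, in particular why the previous frame's depth can be tested against this frame's boxes, and why the resulting visible list can be used to build the depth pyramid. It puzzled me too, but it becomes clear once you notice that an object's occlusion state can only be in one of three situations:
- Same occlusion state as last frame: nothing extra is needed
- Occluded last frame, not occluded this frame (wrongly culled): the object lands in the occluded list. The HZB, however, is rebuilt from what is actually visible this frame, so at the object's position the HZB now stores a depth farther away than the object itself, and the retest restores it. This is the second culling pass (the third step above)
- Not occluded last frame, occluded this frame (slipped through): the object lands in the visible list and is submitted, but the Base Pass depth test (early-Z) rejects its pixels, so the image is still correct and the only cost is some wasted work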
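The three cases can be checked with a toy simulation. The sketch below is an illustration under simplifying assumptions, not UE's implementation: the screen is a single row of pixels, every object has one constant depth, smaller depth means closer (UE itself uses reversed Z), and the "HZB" is just the full-resolution depth row without mips. Object1D, CullTwoPass and the other names are hypothetical.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// An object covering pixels [x0, x1) at one constant depth (smaller = closer).
struct Object1D { int id; int x0; int x1; float depth; };

using DepthBuffer = std::vector<float>;

// Occluded only if every covered pixel already holds something closer.
bool IsOccluded(const Object1D& o, const DepthBuffer& depth) {
    for (int x = o.x0; x < o.x1; ++x) {
        if (o.depth <= depth[x]) return false;
    }
    return true;
}

// "Draw" with a depth test: keep the closest depth per pixel.
void Rasterize(const Object1D& o, DepthBuffer& depth) {
    for (int x = o.x0; x < o.x1; ++x) {
        depth[x] = std::min(depth[x], o.depth);
    }
}

struct TwoPassResult {
    std::vector<int> pass1;  // ids submitted by the first pass
    std::vector<int> pass2;  // ids recovered by the second pass
    DepthBuffer depth;       // depth of the current frame after both passes
};

TwoPassResult CullTwoPass(const std::vector<Object1D>& objects,
                          const DepthBuffer& prevDepth) {
    TwoPassResult r;
    r.depth.assign(prevDepth.size(), std::numeric_limits<float>::infinity());
    std::vector<Object1D> occludedList;

    // Pass 1: test against last frame's depth, draw what passes.
    for (const Object1D& o : objects) {
        if (IsOccluded(o, prevDepth)) {
            occludedList.push_back(o);
        } else {
            r.pass1.push_back(o.id);
            Rasterize(o, r.depth);
        }
    }

    // Pass 2: retest the parked objects against this frame's depth.
    for (const Object1D& o : occludedList) {
        if (!IsOccluded(o, r.depth)) {
            r.pass2.push_back(o.id);
            Rasterize(o, r.depth);
        }
    }
    return r;
}
```

In a scene where a wall has moved sideways since the last frame, an object it used to hide comes back in pass 2, an object it newly hides is still submitted in pass 1 but loses the depth test, and an object hidden in both frames is never submitted.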
- UE implementation
UE starts HIZ Culling only after the multithreaded CPU culling tasks have finished and their results have been gathered
    // Must happen after visibility state & scene UB has been updated.
    InstanceCullingManager.BeginDeferredCulling(GraphBuilder);
    DeferredContext = FInstanceCullingContext::CreateDeferredContext(GraphBuilder, GPUScene, *this);
- HIZ Culling Compute Shader
- InstanceCullBuildInstanceId:
- UE does not map the thread ID directly to an instance ID (which would mean dispatching N / 64 compute shader thread groups); it goes through a more elaborate batching scheme
The reason is that the IDs in the buffer do not all belong to one kind of object. The buffer may well mix different kinds (rocks, trees, and so on), and without batching every thread would have to work out on the GPU which kind it belongs to, causing severe branch divergence
    uint DispatchGroupId = GetUnWrappedDispatchGroupId(GroupId);
    #if ENABLE_BATCH_MODE
        // Load Instance culling context batch info, indirection per group
        FContextBatchInfo BatchInfo = LoadBatchInfo(DispatchGroupId);
    #else // !ENABLE_BATCH_MODE
        // Single Instance culling context batch in the call, set up batch from the kernel parameters
        FContextBatchInfo BatchInfo = (FContextBatchInfo)0;
        BatchInfo.NumViewIds = NumViewIds;
        BatchInfo.DynamicInstanceIdOffset = DynamicInstanceIdOffset;
        BatchInfo.DynamicInstanceIdMax = DynamicInstanceIdMax;
        // Note: for the unbatched case, the permutation will control HZB test, so we set to true
        BatchInfo.bAllowOcclusionCulling = true;
    #endif // ENABLE_BATCH_MODE
    FInstanceCullingSetup InstanceCullingSetup = LoadInstanceCullingSetup(GroupId, GroupThreadIndex, BatchInfo.DynamicInstanceIdOffset, BatchInfo.DynamicInstanceIdMax, GetItemDataOffset(BatchInfo, CurrentBatchProcessingMode));
- Unpacking
Once a thread knows which InstanceId it handles, it fetches that instance's data from GPU memory
    // Extract the draw command payload
    const FInstanceCullingPayload Payload = LoadInstanceCullingPayload(WorkSetup.Item.Payload, BatchInfo);
    // Load auxiliary per-instanced-draw command info
    const FDrawCommandDesc DrawCommandDesc = UnpackDrawCommandDesc(DrawCommandDescs[Payload.IndirectArgIndex]);
    const FInstanceSceneData InstanceData = GetInstanceSceneData(InstanceId);
    const FPrimitiveSceneData PrimitiveData = GetPrimitiveData(InstanceData.PrimitiveId);
- Culling
    bool bVisible = IsInstanceVisible(PrimitiveData, InstanceData, InstanceId, BatchInfo.ViewIdsOffset + ViewIdIndex, BatchInfo.bAllowOcclusionCulling, DrawCommandDesc, CullingFlags);
- Invalid instances and empty bounds
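An instance whose data is marked invalid is culled (return false). An instance whose bounds have zero extent returns true in the code below, so it is kept and the remaining culling tests are skipped for it rather than being culled
    if (!InstanceData.ValidInstance) {
        return false;
    }
    if (dot(InstanceData.LocalBoundsExtent, InstanceData.LocalBoundsExtent) <= 0.0f) {
        return true;
    }
- Disabling WPO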
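Evaluating WPO (World Position Offset) on the GPU is expensive, so this step checks whether the instance lies beyond the configured WPO disable distance while the material does not force WPO on; if so, WPO is disabled for it. The snippet below is the early-out: a primitive without the WPO-disable-distance flag has nothing to evaluate
    if ((PrimitiveData.Flags & PRIMITIVE_SCENE_DATA_FLAG_WPO_DISABLE_DISTANCE) == 0) {
        return true;
    }
- Screen-size culling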
Checks the instance's projected size on screen against MinScreenSize and MaxScreenSize. If the projection is smaller than MinScreenSize, the instance is not worth rendering and is culled; the same test also rejects it once the size reaches a non-zero MaxScreenSize
    Cull.ScreenSize(DrawCommandDesc.MinScreenSize, DrawCommandDesc.MaxScreenSize);

    bool ScreenSize(float MinScreenSize, float MaxScreenSize) {
        BRANCH
        if (bIsVisible && (MinScreenSize != MaxScreenSize || bMinScreenRadiusCull)) {
            // Needs to match C++ logic in ComputeLODForMeshes() and ComputeBoundsScreenRadiusSquared() so that culling matches the submitted range of Lods.
            // Differences to that code which shouldn't affect the result are:
            // * ScreenMultiple doesn't include the factor of 0.5f, and so it doesn't need applying to the ScreenSize.
            // * ScreenMultiple has the inverse LODDistanceScale baked in, and so it doesn't need applying to the ScreenSize.
            float3 CenterTranslatedWorld = mul(float4(LocalBoxCenter, 1.0f), LocalToTranslatedWorld).xyz;
            float InstanceDrawDistSq = length2(CenterTranslatedWorld - NaniteView.CullingViewOriginTranslatedWorld);
            const float RadiusSq = length2( LocalBoxExtent * NonUniformScale.xyz );
    #if NANITE_CULLING_ENABLE_MIN_RADIUS_CULL
            if (bMinScreenRadiusCull) {
                // implements: ScreenRadius < MinRadius, where ScreenRadius = Radius / (LodDistanceFactor * Distance)
                if (RadiusSq < NaniteView.CullingViewMinRadiusTestFactorSq * InstanceDrawDistSq) {
                    bIsVisible = false;
                    return bIsVisible;
                }
            }
            // Only perform this test if enabled
            if (MinScreenSize != MaxScreenSize)
    #endif
            {
                float ScreenSizeSq = NaniteView.CullingViewScreenMultipleSq * RadiusSq / max(InstanceDrawDistSq, 1.0f);
                float MinScreenSizeSq = MinScreenSize * MinScreenSize;
                float MaxScreenSizeSq = MaxScreenSize * MaxScreenSize;
                bIsVisible = ScreenSizeSq >= MinScreenSizeSq && (MaxScreenSize == 0 || ScreenSizeSq < MaxScreenSizeSq);
            }
        }
        return bIsVisible;
    }
- Global clip plane
Some views, such as planar reflections, set a global clip plane, and nothing behind that plane is rendered. Bounds that lie entirely behind the plane are culled here; bounds that straddle it are flagged with bNeedsClipping
    Cull.GlobalClipPlane();

    void GlobalClipPlane() {
    #if USE_GLOBAL_CLIP_PLANE
        BRANCH
        if( bIsVisible ) {
            // Prevent the result being "intersecting" when the global plane is invalid (effectively disabled). This prevents clusters that
            // should rasterize in SW from being sent down the HW path
            if (bSkipCullGlobalClipPlane || all(NaniteView.TranslatedGlobalClipPlane.xyz == (float3)0.0f)) {
                return;
            }
            // Get the global clipping plane in local space (multiply by inverse transpose)
            const float4 PlaneLocal = mul(LocalToTranslatedWorld, NaniteView.TranslatedGlobalClipPlane);
            // AABB/Plane intersection test
            const float3 ScaledExtents = LocalBoxExtent * NonUniformScale.xyz;
            const float ExtentAlongPlaneN = dot(abs(ScaledExtents * PlaneLocal.xyz), (float3)1.0f);
            const float CenterDist = dot(PlaneLocal, float4(LocalBoxCenter, 1.0f));
            if (CenterDist < -ExtentAlongPlaneN) {
                bIsVisible = false;
            } else if (CenterDist < ExtentAlongPlaneN) {
                bNeedsClipping = true;
            }
        }
    #endif
    }
- Frustum Culling
    BRANCH
    if( Cull.bIsVisible ) {
        Cull.Frustum();
    }

    FFrustumCullData Frustum() {
        // Frustum test against current frame
        FFrustumCullData FrustumCull = BoxCullFrustum( LocalBoxCenter, LocalBoxExtent, LocalToTranslatedWorld, NaniteView.TranslatedWorldToClip, NaniteView.ViewToClip, bIsOrtho, bNearClip, bSkipCullFrustum );
        bIsVisible = bIsVisible && FrustumCull.bIsVisible;
        bNeedsClipping = bNeedsClipping || FrustumCull.bCrossesNearPlane || FrustumCull.bCrossesFarPlane;
    #if VIRTUAL_TEXTURE_TARGET
        if (bIsVisible && !bIsStaticGeometry) {
            bIsVisible = FrustumCull.RectMax.z > DynamicDepthCullRange.x && FrustumCull.RectMin.z < DynamicDepthCullRange.y;
        }
    #endif
        return FrustumCull;
    }

    // Splitting the transform in two generates much better code on DXC when WorldToClip is scalar.
    FFrustumCullData BoxCullFrustum( float3 Center, float3 Extent, float4x4 LocalToWorld, float4x4 WorldToClip, float4x4 ViewToClip, bool bIsOrtho, bool bNearClip, bool bSkipFrustumCull ) {
        // NOTE: We assume here that if near clipping is disabled the projection is orthographic, as disabling near clipping is
        // a feature for directional light shadows, and disabling near clipping for a perspective projection doesn't make much sense.
        // Checking both also serves to help out DCE when either is a compile-time constant.
        checkSlow(bIsOrtho || bNearClip);
        if (bIsOrtho || !bNearClip) {
            return BoxCullFrustumOrtho( Center, Extent, LocalToWorld, WorldToClip, bNearClip, bSkipFrustumCull );
        } else {
            return BoxCullFrustumPerspective( Center, Extent, LocalToWorld, WorldToClip, ViewToClip, bSkipFrustumCull );
        }
    }

    FFrustumCullData BoxCullFrustumPerspective(float3 Center, float3 Extent, float4x4 LocalToWorld, float4x4 WorldToClip, float4x4 ViewToClip, bool bSkipFrustumCull) {
        FFrustumCullData Cull;
        float4 DX = (2.0f * Extent.x) * mul(LocalToWorld[0], WorldToClip);
        float4 DY = (2.0f * Extent.y) * mul(LocalToWorld[1], WorldToClip);
        float MinW = +INFINITE_FLOAT;
        float MaxW = -INFINITE_FLOAT;
        float4 PlanesMin = 1.0f;
        Cull.RectMin = float3(+1, +1, +1);
        Cull.RectMax = float3(-1, -1, -1);

        // To discourage the compiler from overlapping the entire calculation, which uses an excessive number of VGPRs, the evaluation is split into 4 isolated passes with two corners per pass.
        // There seems to be no additional benefit from evaluating just one corner per pass and it prevents the use of fast min3/max3 intrinsics.
    #define EVAL_POINTS(PC0, PC1) \
        MinW = min3(MinW, PC0.w, PC1.w); \
        MaxW = max3(MaxW, PC0.w, PC1.w); \
        PlanesMin = min3(PlanesMin, float4(PC0.xy, -PC0.xy) - PC0.w, float4(PC1.xy, -PC1.xy) - PC1.w); \
        float2 PS0 = PC0.xy / PC0.w; \
        float2 PS1 = PC1.xy / PC1.w; \
        Cull.RectMin.xy = min3(Cull.RectMin.xy, PS0, PS1); \
        Cull.RectMax.xy = max3(Cull.RectMax.xy, PS0, PS1);

        float4 PC000, PC100;
        PLATFORM_SPECIFIC_ISOLATE {
            float4 DZ = (2.0f * Extent.z) * mul(LocalToWorld[2], WorldToClip);
            PC000 = mul(mul(float4(Center - Extent, 1.0), LocalToWorld), WorldToClip);
            PC100 = PC000 + DZ;
            EVAL_POINTS(PC000, PC100);
        }
        float4 PC001, PC101;
        PLATFORM_SPECIFIC_ISOLATE {
            PC001 = PC000 + DX;
            PC101 = PC100 + DX;
            EVAL_POINTS(PC001, PC101);
        }
        float4 PC011, PC111;
        PLATFORM_SPECIFIC_ISOLATE {
            PC011 = PC001 + DY;
            PC111 = PC101 + DY;
            EVAL_POINTS(PC011, PC111);
        }
        float4 PC010, PC110;
        PLATFORM_SPECIFIC_ISOLATE {
            PC010 = PC011 - DX;
            PC110 = PC111 - DX;
            EVAL_POINTS(PC010, PC110);
        }
    #undef EVAL_POINTS

        float MinZ = MaxW * ViewToClip[2][2] + ViewToClip[3][2];
        float MaxZ = MinW * ViewToClip[2][2] + ViewToClip[3][2];

        // Near is z=1
        bool bInFrontNearPlane = MinW <= MaxZ;
        bool bBehindNearPlane = MaxW > MinZ;
        // Far is z=0
        bool bInFrontFarPlane = 0 < MaxZ;
        bool bBehindFarPlane = 0 >= MinZ;

        Cull.bCrossesNearPlane = bInFrontNearPlane;
        Cull.bCrossesFarPlane = bBehindFarPlane;
        Cull.bIsVisible = bBehindNearPlane && bInFrontFarPlane;

        if (MinW <= 0.0f && MaxW > 0.0f) {
            Cull.RectMin = float3(-1, -1, -1);
            Cull.RectMax = float3(+1, +1, +1);
        } else {
            Cull.RectMin.z = MinZ / MaxW;
            Cull.RectMax.z = MaxZ / MinW;
        }

        Cull.bFrustumSideCulled = false;
        if (!bSkipFrustumCull) {
            const bool bFrustumCull = any(PlanesMin > 0.0f);
            Cull.bFrustumSideCulled = Cull.bIsVisible && bFrustumCull;
            Cull.bIsVisible = Cull.bIsVisible && !bFrustumCull;
        }
        return Cull;
    }
- Hardware and HZB occlusion culling
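If the instance is inside the view and OCCLUSION_CULL_INSTANCES is enabled, the code checks whether it is hidden behind other geometry in front of it
- HZB test: the instance's screen rectangle is compared against the HZB depth to decide whether it is occluded. The code projects the bounds with the previous frame's transforms (DynamicData.PrevLocalToTranslatedWorld, NaniteView.PrevTranslatedWorldToClip), so this is the test against the previous frame's HZB, i.e. the first culling pass described above
- Hardware occlusion query mask: checks the occlusion query buffer so that only instances that really need rendering are kept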
    BRANCH
    if (Cull.bIsVisible && bAllowOcclusionCulling) {
        const bool bPrevIsOrtho = IsOrthoProjection(NaniteView.PrevViewToClip);
        FFrustumCullData PrevCull = BoxCullFrustum(LocalBoundsCenter, LocalBoundsExtent, DynamicData.PrevLocalToTranslatedWorld, NaniteView.PrevTranslatedWorldToClip, NaniteView.PrevViewToClip, bPrevIsOrtho, Cull.bNearClip, true);
        BRANCH
        if (PrevCull.bIsVisible && !PrevCull.bCrossesNearPlane) {
            FScreenRect PrevRect = GetScreenRect( NaniteView.HZBTestViewRect, PrevCull, 4 );
            // Avoid cases where instance might self-occlude the HZB test due to minor precision differences
            PrevRect.Depth = RoundUpF16(PrevRect.Depth);
            Cull.bIsVisible = IsVisibleHZB( PrevRect, true );
        }
        BRANCH
        if (NaniteView.InstanceOcclusionQueryMask && Cull.bIsVisible) {
            if ((InstanceOcclusionQueryBuffer[InstanceId] & NaniteView.InstanceOcclusionQueryMask) == 0) {
                Cull.bIsVisible = false;
            }
        }
    }
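The idea of comparing a screen rectangle against a depth pyramid can be sketched on the CPU as follows. This is an illustration only, under stated assumptions: depth is conventional (larger = farther, whereas the engine code above uses reversed Z with near at z=1), the buffer has power-of-two dimensions, and DepthMip, BuildHzb and IsOccludedHzb are hypothetical names that do not mirror GetScreenRect or IsVisibleHZB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One level of the pyramid; each texel stores the FARTHEST depth it covers.
struct DepthMip {
    int width;
    int height;
    std::vector<float> texels;

    float At(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));   // clamp to the edge
        y = std::max(0, std::min(y, height - 1));
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// Builds the full mip chain by taking the max of each 2x2 block.
std::vector<DepthMip> BuildHzb(const std::vector<float>& depth, int width, int height) {
    assert(width > 0 && (width & (width - 1)) == 0);    // power of two assumed
    assert(height > 0 && (height & (height - 1)) == 0);
    assert(depth.size() == static_cast<std::size_t>(width) * height);

    std::vector<DepthMip> mips;
    mips.push_back(DepthMip{width, height, depth});
    while (mips.back().width > 1 || mips.back().height > 1) {
        const DepthMip& src = mips.back();
        DepthMip dst;
        dst.width = std::max(1, src.width / 2);
        dst.height = std::max(1, src.height / 2);
        dst.texels.resize(static_cast<std::size_t>(dst.width) * dst.height);
        for (int y = 0; y < dst.height; ++y) {
            for (int x = 0; x < dst.width; ++x) {
                const float a = std::max(src.At(2 * x, 2 * y), src.At(2 * x + 1, 2 * y));
                const float b = std::max(src.At(2 * x, 2 * y + 1), src.At(2 * x + 1, 2 * y + 1));
                dst.texels[static_cast<std::size_t>(y) * dst.width + x] = std::max(a, b);
            }
        }
        mips.push_back(std::move(dst));  // src is not used after this point
    }
    return mips;
}

// Rect is in inclusive pixel coordinates; nearestDepth is the closest depth of the bounds.
bool IsOccludedHzb(const std::vector<DepthMip>& mips, int x0, int y0, int x1, int y1,
                   float nearestDepth) {
    assert(!mips.empty() && 0 <= x0 && x0 <= x1 && 0 <= y0 && y0 <= y1);

    // Pick the finest level at which the rect spans at most 2x2 texels.
    int level = 0;
    while (level + 1 < static_cast<int>(mips.size()) &&
           (((x1 >> level) - (x0 >> level)) > 1 || ((y1 >> level) - (y0 >> level)) > 1)) {
        ++level;
    }

    const DepthMip& mip = mips[level];
    float farthest = mip.At(x0 >> level, y0 >> level);
    for (int ty = y0 >> level; ty <= (y1 >> level); ++ty) {
        for (int tx = x0 >> level; tx <= (x1 >> level); ++tx) {
            farthest = std::max(farthest, mip.At(tx, ty));
        }
    }
    // Hidden only if even the closest point of the bounds is behind everything stored.
    return nearestDepth > farthest;
}
```

Because a coarse texel covers more pixels than the rectangle itself, the test can report "visible" for something that is in fact hidden, but never the other way round. That conservative direction is what makes a coarse lookup safe for culling.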
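Summary
UE5's culling system is very large and also elegantly designed. In outline:
- High-level scheduling: InitViews
Before any pixel is rendered, InitViews does the preparation for the whole pipeline
- View and matrix setup: extract the frustum planes and update the constant buffers
- Primitive relevance: evaluate the material properties of scene primitives (translucent or not, shadow casting or not, and so on)
- Parallel execution: package the expensive work (culling, skeletal animation updates, particle computation, and so on) into tasks and hand them to the thread pool to run asynchronously
- CPU-side culling: coarse-grained filtering
- Precomputed visibility: for static scenes. The map is divided into 3D cells and visibility is baked ahead of time. To address the large volume of data, UE uses 2D hash buckets and dynamically loaded data chunks, which keeps lookups at roughly O(1)
- Octree and frustum culling: the scene octree uses a compact two-bit flag per node (fully contained / partially intersecting) to decide quickly whether a primitive is inside the view
- Distance culling: cull by the distance to the center or to the edge of the primitive's bounds, with a FadeRadius band so that primitives do not pop out of view abruptly
- GPU-side culling: fine-grained filtering
- Two-Pass HZB Culling: removes the latency of hardware occlusion queries
- Additional culling: screen-size culling (too small to draw), global clip plane culling, and distance-based disabling of WPO materials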



