The `rafx-renderer` and the `rafx-framework` `render_features` were inspired by the 2015 GDC talk “Destiny’s Multithreaded Rendering Architecture”. There is a video available at https://www.youtube.com/watch?v=0nTDFLMLX9k. The following is a page-by-page comparison of `rafx` with Destiny (page numbers refer to the PDF slides of that talk) to compare or contrast names and design decisions between the architecture described for Destiny and the one implemented by `rafx`. Each note is preceded by a page number. Generally the notes are in the same order as the Destiny slides, so that it is possible to read the Destiny slides and refer to this list for any differences in a mostly straightforward way.

Code examples from `rafx` will be placed into a code block:

```rust
// Like this.
```

Quotes from the Destiny slides will be placed into a quote block:

> Like this.
- `RenderObject` is cached data. It should be infrequently, if ever, updated. `RenderObject`s are registered with a `RenderObjectSet`.
- Each `RenderObject` is associated with a `RenderFeature`. The `RenderFeature` completely defines the cache-coherent structure of that feature in the frame packet for all extract / prepare / submit work. (A sketch of these relationships follows the quote below.)

> Each render object type maps to a render feature which is the basic encapsulation unit in our architecture. We group render functionality by same data representation and code paths. For example, most of the skinned characters need dynamic transforms for skinned data, shaders and meshes, and iterate over the mesh containers to generate drawcalls, uploading skinned matrices to the GPU.
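To make these names concrete, here is a small, self-contained sketch of the `RenderObject` / `RenderObjectSet` relationship. The types and methods are invented for illustration and are not the actual `rafx` API:

```rust
// Hypothetical sketch: a set of cached render objects handing out handles.
// Names and signatures are illustrative, not the actual rafx API.
struct SpriteRenderObject {
    image_path: String, // Cached data: updated infrequently, if ever.
}

struct RenderObjectHandle(usize);

#[derive(Default)]
struct RenderObjectSet {
    objects: Vec<SpriteRenderObject>,
}

impl RenderObjectSet {
    // Registering a render object returns a handle that an entity can own.
    fn register(&mut self, object: SpriteRenderObject) -> RenderObjectHandle {
        self.objects.push(object);
        RenderObjectHandle(self.objects.len() - 1)
    }

    fn get(&self, handle: &RenderObjectHandle) -> &SpriteRenderObject {
        &self.objects[handle.0]
    }
}

fn main() {
    let mut set = RenderObjectSet::default();
    let handle = set.register(SpriteRenderObject {
        image_path: "sprite.png".to_string(),
    });
    println!("registered {}", set.get(&handle).image_path);
}
```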
- Entities own their `RenderObjectHandle`s through `RenderObjectComponent`s. An entity can have an arbitrary number of `RenderObjectHandle`s associated with it.
- Each entity also has a `VisibilityComponent` with a handle to the `VisibilityObject`. The `VisibilityObject` contains the same `RenderObjectHandle`s as the entity, but as a flat list of `RenderObjectId`s instead. The `VisibilityObject` points back to the entity via an `ObjectId`.
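Continuing the sketch above, the entity / `VisibilityObject` relationship could be pictured like this (again, illustrative types rather than the real `rafx` definitions):

```rust
// Hypothetical sketch of the entity <-> VisibilityObject mapping.
#[derive(Clone, Copy, Debug)]
struct ObjectId(u64); // Identifies the game world entity.

#[derive(Clone, Copy, Debug)]
struct RenderObjectId(u32); // Flat id corresponding to a RenderObjectHandle.

struct VisibilityObject {
    object_id: ObjectId, // Points back to the owning entity.
    render_objects: Vec<RenderObjectId>, // Same render objects, as a flat list.
}

fn main() {
    let visibility_object = VisibilityObject {
        object_id: ObjectId(42),
        render_objects: vec![RenderObjectId(0), RenderObjectId(7)],
    };
    // Culling operates on VisibilityObjects: when one passes, its
    // RenderObjectIds feed the frame packet, and its ObjectId lets extract
    // jobs reach back into the game world.
    println!(
        "entity {:?} owns {:?}",
        visibility_object.object_id, visibility_object.render_objects
    );
}
```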
- (pg 71, 75) After each game frame, `compute_views_job` determines the views needed for rendering the next frame. This would look something like “compute main view visibility, then grab all of the visible lights, reflective surfaces, etc”. **DISCREPANCY:** `compute_views_job` does not exist in `rafx` today. The closest is something like `add_render_views`.
- `compute_views_job` could be accelerated by predicting static visibility – it’s not actually part of the renderer for that reason. It’s expected that the game (or render graph, or whatever) will implement this in whatever way makes sense.
- View visibility is computed by `run_view_visibility_jobs`. Parallelization is handled by the `RendererThreadPool` with `ViewVisibilityJob`s.
- The frame packet is built by `calculate_frame_packet_size`, `new_frame_packet`, and `populate_frame_packet`. This occurs in parallel across features through the `RendererThreadPool`.
- For each feature, we look at each view to determine whether it’s relevant to that feature with `is_view_relevant`. If it is, we reserve space for a view packet. If no views are relevant to the feature, we drop it entirely – nothing is allocated. Otherwise, we perform all allocation of the `FramePacket` for that feature at once. This is analogous to Destiny’s “populate render nodes” job (see the sketch after the quote below).
> The next job we’re going to run for this view is the populate render nodes job. This is the job that sets up the cache coherent arrays of data we call render nodes that the rest of the job chain will operate on. From this point onward, rendering jobs will be driven from an array of render nodes. Each render node maps to a unique render object visible in each view. This data structure is defined to be very compact for iteration efficiency.
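A minimal sketch of this “size everything first, then allocate once” pattern, using invented types (the flag below stands in for `is_view_relevant`; none of this is the actual `rafx` API):

```rust
// Hypothetical: determine the frame packet size per view, then allocate
// exactly once for the feature. If no view is relevant, allocate nothing.
struct RenderView {
    name: &'static str,
    feature_is_relevant: bool, // stand-in for is_view_relevant()
    visible_objects: usize,
}

struct ViewPacket {
    view_name: &'static str,
    object_instances: Vec<u32>, // Reserved up front, filled later.
}

fn populate_frame_packet(views: &[RenderView]) -> Option<Vec<ViewPacket>> {
    // If no view is relevant to this feature, drop it entirely.
    if !views.iter().any(|v| v.feature_is_relevant) {
        return None;
    }
    // Otherwise, perform all allocation for the feature at once.
    Some(
        views
            .iter()
            .filter(|v| v.feature_is_relevant)
            .map(|v| ViewPacket {
                view_name: v.name,
                object_instances: Vec::with_capacity(v.visible_objects),
            })
            .collect(),
    )
}

fn main() {
    let views = [
        RenderView { name: "main", feature_is_relevant: true, visible_objects: 128 },
        RenderView { name: "shadow", feature_is_relevant: false, visible_objects: 64 },
    ];
    let packet = populate_frame_packet(&views).unwrap();
    assert_eq!(packet.len(), 1);
    assert!(packet[0].object_instances.capacity() >= 128);
}
```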
- The layout of the `RenderFeatureFramePacket` for each feature is driven by feature-specific types. This allows each feature to reserve space for exactly what it needs up front.
> we selectively choose what data to store in the render nodes by type. The view node can also allocate data in the frame packet if needed. These allocation details are defined uniquely by each render object type (which is stored in the view node).
- The “per frame render node” is replaced by `RenderObjectInstance` and the “per view render node” is `RenderObjectInstancePerView`. Data that is only needed for a frame is `PerFrame` and data only needed for a view is `PerView`.
- `RenderObjectInstance` allocations point back to the `RenderObject` and the entity (via the `ObjectId`).
- `RenderObjectInstancePerView` allocations point back to the `RenderObjectInstance`.
- Extraction happens in `run_extract_jobs` in parallel. The `RendererThreadPool` controls the parallelization of the `RenderFeatureExtractJob`s.
> For each view node in the view packet, we’re going to extract data out of the game state object into the frame packet. The core renderer will iterate coherently over the view nodes sorted by render object type and execute the extract job entry point for each render object type. Those jobs will operate on only this data as inputs:
> - Individual frame and view nodes for that visible element
> - Statically cached data from the corresponding render object
>
> During extract, we also reach out to the game object using the handle cached in the render object.
> During extract, we actually generate separate jobs to go wide during extract. We generate multiple extract jobs for each view. The jobs are generated with smart batching for different render object types.
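As a rough illustration of “going wide” during extract, the toy example below splits a view’s visible instances into chunks and extracts them on scoped threads. In `rafx` this batching is delegated to the `RendererThreadPool`; the code here is a sketch of the idea, not its API:

```rust
use std::thread;

#[derive(Debug)]
struct ExtractedInstance {
    object_id: u64,
    transform: [f32; 3],
}

// Stand-in for reading the game world / cached render object data.
fn extract_one(object_id: u64) -> ExtractedInstance {
    ExtractedInstance { object_id, transform: [0.0; 3] }
}

fn main() {
    let visible: Vec<u64> = (0..1000).collect();
    let chunk_size = 256; // A real cost function could pick this per feature.

    let mut extracted: Vec<ExtractedInstance> = Vec::new();
    thread::scope(|scope| {
        // One extract job per chunk of visible instances.
        let handles: Vec<_> = visible
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk.iter().map(|&id| extract_one(id)).collect::<Vec<_>>()
                })
            })
            .collect();
        for handle in handles {
            extracted.extend(handle.join().unwrap());
        }
    });
    assert_eq!(extracted.len(), visible.len());
}
```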
- Batching of the extract work is left to the `RendererThreadPool`. The example in `DemoRendererThreadPool` shows one way to do so.
- Destiny’s “game object handle” corresponds to the `ObjectId` in the `RenderObjectInstance`. This should be the same as the game world’s entity ID.
- Each `RenderFeature` can create a `RenderFeaturePrepareJob`. The prepare job will “consume” the `FramePacket` from the extract job to avoid allocations. The `RenderFeaturePrepareJob` writes into a `SubmitPacket` created by `new_submit_packet`. The closest comparison to Destiny for the `SubmitPacket` is if the `FramePacket` were divided in two – “prepared for GPU” data goes into the `SubmitPacket` and “extracted from world” data goes into the `FramePacket`. (A sketch of this split follows.)
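A sketch of that split with invented sprite types: the “extracted from world” side holds plain game-state data, and prepare consumes it to produce GPU-ready bytes for the submit side:

```rust
// Hypothetical FramePacket-side data: copied out of the game world.
struct ExtractedSprite {
    position: [f32; 2],
    tint: [f32; 4],
}

// Hypothetical SubmitPacket-side data: packed, GPU-friendly bytes.
struct SpriteSubmitData {
    vertex_data: Vec<u8>,
}

// Prepare "consumes" the extracted data and emits submit data.
fn prepare(extracted: Vec<ExtractedSprite>) -> Vec<SpriteSubmitData> {
    extracted
        .into_iter()
        .map(|sprite| {
            let mut vertex_data = Vec::with_capacity(6 * 4);
            for value in sprite.position.iter().chain(sprite.tint.iter()) {
                vertex_data.extend_from_slice(&value.to_le_bytes());
            }
            SpriteSubmitData { vertex_data }
        })
        .collect()
}

fn main() {
    let frame_packet = vec![ExtractedSprite { position: [1.0, 2.0], tint: [1.0; 4] }];
    let submit_packet = prepare(frame_packet);
    assert_eq!(submit_packet[0].vertex_data.len(), 24); // 6 floats * 4 bytes
}
```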
- A `RenderFeatureExtractJob` can be easily implemented by an `ExtractJob` wrapping the `ExtractJobEntryPoints` trait. Only the necessary entry points must be implemented. Ditto for `RenderFeaturePrepareJob` with `PrepareJob` and `PrepareJobEntryPoints`. The writers just use `RenderFeatureWriteJob` – at this point in time, there’s no generic framework scaffolding for them.
> Each render feature is implemented with a unique feature renderer responsible for all work for that graphics feature. Feature renderers are the main interface for implementing graphics features in the Destiny engine. This interface defines how graphics features represent their cached render data; how we extract dynamic data from game objects into frame packet render nodes, how we convert that data to GPU-friendly formats, and, most importantly, the code path to render these objects. The feature renderer interface exposes entry points for each of our engine’s phases which the core renderer architecture converts into jobs. This interface also provides data encapsulation to allow safe multi-threaded data access.

```rust
/// Called once at the start of the extract step when any RenderView in the frame is
/// relevant to this RenderFeature.
fn begin_per_frame_extract(
    &self,
    _context: &ExtractPerFrameContext<'extract, '_, Self>,
) {
}

/// Called once for each instance of an Entity and RenderObject in the frame matching this
/// RenderFeature.
fn extract_render_object_instance(
    &self,
    _job_context: &mut Self::RenderObjectInstanceJobContextT,
    _context: &ExtractRenderObjectInstanceContext<'extract, '_, Self>,
) {
}

/// Called once for each instance of an Entity and RenderObject in each RenderView relevant
/// to this RenderFeature.
fn extract_render_object_instance_per_view(
    &self,
    _job_context: &mut Self::RenderObjectInstancePerViewJobContextT,
    _context: &ExtractRenderObjectInstancePerViewContext<'extract, '_, Self>,
) {
}

/// Called once for each relevant RenderView. This function is only run after all instances of
/// extract_render_object_instance_per_view have finished for that RenderView.
fn end_per_view_extract(
    &self,
    _context: &ExtractPerViewContext<'extract, '_, Self>,
) {
}

/// Called once at the end of the extract step when any RenderView in the frame is
/// relevant to this RenderFeature.
fn end_per_frame_extract(
    &self,
    _context: &ExtractPerFrameContext<'extract, '_, Self>,
) {
}
```
- (pg 125) The above interface ensures that the extract jobs only read from the world and write to their frame packet. An example follows.

```rust
fn extract_render_object_instance(
    &self,
    job_context: &mut RenderObjectsJobContext<'extract, MeshRenderObject>,
    context: &ExtractRenderObjectInstanceContext<'extract, '_, Self>,
) {
    let render_object_static_data = job_context
        .render_objects
        .get_id(context.render_object_id());

    let mesh_asset = self
        .asset_manager
        .committed_asset(&render_object_static_data.mesh);

    context.set_render_object_instance_data(mesh_asset.and_then(|mesh_asset| {
        let entry = self.world.entry_ref(context.object_id().into()).unwrap();
        let transform_component = entry.get_component::<TransformComponent>().unwrap();
        Some(MeshRenderObjectInstanceData {
            mesh_asset: mesh_asset.clone(),
            translation: transform_component.translation,
            rotation: transform_component.rotation,
            scale: transform_component.scale,
        })
    }));
}
```
> We provided an interface for feature renderers entry points with strict rules for data they are allowed to read or write
> at each entry point. Feature renderers could only read the render node data and statically cached render object data.
> They are also only allowed to output to frame packet data. The latter was done to automatically ensure synchronization
> – double-buffering of dynamic data was automatic for feature writers as long as they wrote to the frame packet.
- (pg 126) It also means that parallelization isn't apparent to the feature writer.
> The core renderer architecture generates jobs for each phase by batching across multiple visible render objects of the
> same feature type for several entry points (for example, batching all extract entry points into one extract job). This
> jobification is done transparently to feature writers.
- (pg 134, 138) Each extract job is running in parallel, but within that extract job there is a well-defined order for the
entry points. First, `begin_per_frame_extract` is called. Then, possibly in parallel, `extract_render_object_instance`
is called for each entity in the frame. Then, `extract_render_object_instance_per_view` is called in parallel across all
views for each entity visible in each view. As each view finishes processing, `end_per_view_extract` is called for that
view. After all views have finished, `end_per_frame_extract` is called. The same order applies to the prepare jobs.
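To make that sequencing concrete, here is a toy driver that calls stub entry points in the documented order. The real scheduling is done by the framework and the `RendererThreadPool`; only the ordering is the point here:

```rust
fn main() {
    let entities = ["a", "b", "c"];
    let views = ["main", "shadow"];

    begin_per_frame_extract();
    // May run in parallel across entities.
    for entity in &entities {
        extract_render_object_instance(entity);
    }
    // May run in parallel across views and entities.
    for view in &views {
        for entity in &entities {
            extract_render_object_instance_per_view(view, entity);
        }
        // Runs only after all per-view instances for this view have finished.
        end_per_view_extract(view);
    }
    // Runs only after every view has finished.
    end_per_frame_extract();
}

fn begin_per_frame_extract() { println!("begin frame"); }
fn extract_render_object_instance(entity: &str) { println!("instance {entity}"); }
fn extract_render_object_instance_per_view(view: &str, entity: &str) {
    println!("view {view}: instance {entity}");
}
fn end_per_view_extract(view: &str) { println!("end view {view}"); }
fn end_per_frame_extract() { println!("end frame"); }
```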
> Extract and prepare computations and data are split up by frequency (view, frame, object). This allowed us to share data
> across different views, across different render objects to save memory in frame packet (ex: only need one copy of skinning
> transforms for any render objects using the same game object), and performance (only compute skinning transforms once
> for all render objects using them).
> For example, we run extract and prepare per frame operations to perform expensive computations that have to happen only
> once per entire frame as long as this render object is visible in any view.
- (pg 135) The parallelization is driven by the `RendererThreadPool` implementation and the synchronization is driven by
custom collections like `AtomicOnceCellArray` and generic implementations of `ExtractJob` or `PrepareJob`.
> The core architecture sets up synchronization primitives to ensure safe multi-threaded access. When feature renderer
> writers write code for ‘extract_per_frame’ for example, they don’t need to worry that this entry point will be executed
> from different jobs and may write to the same data in frame packet. However, it is important to use a performant
> synchronization method for this operation since it will be a high-frequency operation per frame.
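`AtomicOnceCellArray` is a custom `rafx-framework` collection; as a much simpler stand-in, the same “each slot is written exactly once” idea can be sketched with `std::sync::OnceLock`:

```rust
// Simplified sketch, not the rafx implementation: pre-sized storage where
// each slot is set exactly once, so threads can fill slots without a lock
// around the whole array.
use std::sync::OnceLock;
use std::thread;

struct OnceArray<T> {
    cells: Vec<OnceLock<T>>,
}

impl<T> OnceArray<T> {
    fn new(len: usize) -> Self {
        Self { cells: (0..len).map(|_| OnceLock::new()).collect() }
    }

    // Returns Err(value) if the slot was already written.
    fn set(&self, index: usize, value: T) -> Result<(), T> {
        self.cells[index].set(value)
    }

    fn get(&self, index: usize) -> Option<&T> {
        self.cells[index].get()
    }
}

fn main() {
    let array = OnceArray::new(8);
    thread::scope(|scope| {
        for i in 0..8 {
            let array = &array;
            scope.spawn(move || array.set(i, i * 10).unwrap());
        }
    });
    assert_eq!(array.get(3), Some(&30));
}
```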
- (pg 139) **NOTE:** `rafx` does not implement the "per game object" optimization for data shared _across
features_ but I don't believe it would be hard to extend it in that direction.
- (pg 144) The steps to write a feature: copy and paste a feature template, find / replace the name, define your frame
packet and submit packet data, then implement the extract, prepare, and write jobs using the defined entry points. An
example frame packet:
```rust
pub struct MeshRenderFeatureTypes;
//---------
// EXTRACT
//---------
pub struct MeshPerFrameData {
pub depth_material_pass: Option<ResourceArc<MaterialPassResource>>,
}
pub struct MeshRenderObjectInstanceData {
pub mesh_asset: MeshAsset,
pub translation: Vec3,
pub rotation: Quat,
pub scale: Vec3,
}
#[derive(Default)]
pub struct MeshPerViewData {
pub directional_lights: [Option<ExtractedDirectionalLight>; 16],
pub point_lights: [Option<ExtractedPointLight>; 16],
pub spot_lights: [Option<ExtractedSpotLight>; 16],
pub num_directional_lights: u32,
pub num_point_lights: u32,
pub num_spot_lights: u32,
}
pub struct ExtractedDirectionalLight {
pub light: DirectionalLightComponent,
pub object_id: ObjectId,
}
pub struct ExtractedPointLight {
pub light: PointLightComponent,
pub transform: TransformComponent,
pub object_id: ObjectId,
}
pub struct ExtractedSpotLight {
pub light: SpotLightComponent,
pub transform: TransformComponent,
pub object_id: ObjectId,
}
impl FramePacketData for MeshRenderFeatureTypes {
type PerFrameData = MeshPerFrameData;
type RenderObjectInstanceData = Option<MeshRenderObjectInstanceData>;
type PerViewData = MeshPerViewData;
type RenderObjectInstancePerViewData = ();
}
pub type MeshFramePacket = FramePacket<MeshRenderFeatureTypes>;
//---------
// PREPARE
//---------
pub const MAX_SHADOW_MAPS_2D: usize = 32;
pub const MAX_SHADOW_MAPS_CUBE: usize = 16;
pub struct MeshPartDescriptorSetPair {
pub depth_descriptor_set: DescriptorSetArc,
pub opaque_descriptor_set: DescriptorSetArc,
}
pub struct MeshPerFrameSubmitData {
pub num_shadow_map_2d: usize,
pub shadow_map_2d_data: [shaders::mesh_frag::ShadowMap2DDataStd140; MAX_SHADOW_MAPS_2D],
pub shadow_map_2d_image_views: [Option<ResourceArc<ImageViewResource>>; MAX_SHADOW_MAPS_2D],
pub num_shadow_map_cube: usize,
pub shadow_map_cube_data: [shaders::mesh_frag::ShadowMapCubeDataStd140; MAX_SHADOW_MAPS_CUBE],
pub shadow_map_cube_image_views: [Option<ResourceArc<ImageViewResource>>; MAX_SHADOW_MAPS_CUBE],
pub shadow_map_image_index_remap: [Option<usize>; MAX_SHADOW_MAPS_2D + MAX_SHADOW_MAPS_CUBE],
pub mesh_part_descriptor_sets: Arc<AtomicOnceCellStack<MeshPartDescriptorSetPair>>,
pub opaque_per_view_descriptor_set_layout: Option<ResourceArc<DescriptorSetLayoutResource>>,
}
pub struct MeshRenderObjectInstanceSubmitData {
pub mesh_part_descriptor_set_index: usize,
}
impl SubmitPacketData for MeshRenderFeatureTypes {
type PerFrameSubmitData = Box<MeshPerFrameSubmitData>;
type RenderObjectInstanceSubmitData = MeshRenderObjectInstanceSubmitData;
type PerViewSubmitData = MeshPerViewSubmitData;
type RenderObjectInstancePerViewSubmitData = ();
type SubmitNodeData = MeshDrawCall;
type RenderFeature = MeshRenderFeature;
}
pub type MeshSubmitPacket = SubmitPacket<MeshRenderFeatureTypes>;
//-------
// WRITE
//-------
pub struct MeshPerViewSubmitData {
pub opaque_descriptor_set: Option<DescriptorSetArc>,
pub depth_descriptor_set: Option<DescriptorSetArc>,
}
pub struct MeshDrawCall {
pub mesh_asset: MeshAsset,
pub mesh_part_index: usize,
pub mesh_part_descriptor_set_index: usize,
}
```
- `rafx`’s equivalent of the “high-level submit script” is the `RenderGraph`.
- Submission is driven by the `RenderGraph` calling `write_view_phase` in `PreparedRenderData` for the `RenderPhase`s relevant to that `RenderView`.
- Render passes exist in `rafx` too – like `OpaquePass`. However, what Destiny calls a “render stage” is called a `RenderPhase` in `rafx`.
> Certain render passes are required to render regardless of any data present in the pipeline – for example, the shading pass, the tone mapping and resolve pass, etc. Those are not considered render stages in our system. Render stage directives are a high-level command to execute submission for a specific view for a specific stage of the frame pipeline where we might have content-driven data (for example, G-buffer or transparents, or shadows elements).
- Here is what a call to `write_view_phase` looks like:

```rust
write_view_phase::<OpaqueRenderPhase>(&main_view, &mut write_context)?
```
- A `RenderObject` is “subscribed” to a `RenderPhase` when the `RenderFeature` creates `SubmitNode`s for that `RenderObject` in the `RenderPhase`.
- In `rafx`, `RenderObject`s define the mapping directly, e.g. rather than a `SpriteRenderObject` pointing back to a `SpriteRenderFeature` with subscriptions to the `Opaque` and `Transparent` render phases, the `SpriteRenderObject` would subscribe directly to either `Opaque` or `Transparent` depending only on whether it has transparency.
- A `RenderView` subscribes to one or more `RenderPhase`s at runtime with the `RenderPhaseMask`. (A sketch of the mask idea follows.)
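A self-contained sketch of the phase-mask idea with an invented bitmask type (the real `RenderPhaseMask` lives in `rafx-framework`; the phase indices below are assumptions for illustration):

```rust
// Hypothetical: each registered RenderPhase gets an index, and a view
// subscribes to phases by setting the corresponding bits.
#[derive(Clone, Copy, Default)]
struct RenderPhaseMask(u32);

impl RenderPhaseMask {
    fn add_render_phase(mut self, phase_index: u32) -> Self {
        self.0 |= 1 << phase_index;
        self
    }

    fn includes(&self, phase_index: u32) -> bool {
        self.0 & (1 << phase_index) != 0
    }
}

const OPAQUE_RENDER_PHASE_INDEX: u32 = 0;
const TRANSPARENT_RENDER_PHASE_INDEX: u32 = 1;

fn main() {
    // The main view renders opaque and transparent geometry; a shadow map
    // view might instead subscribe only to a depth-only phase.
    let main_view_mask = RenderPhaseMask::default()
        .add_render_phase(OPAQUE_RENDER_PHASE_INDEX)
        .add_render_phase(TRANSPARENT_RENDER_PHASE_INDEX);
    assert!(main_view_mask.includes(OPAQUE_RENDER_PHASE_INDEX));
    assert!(main_view_mask.includes(TRANSPARENT_RENDER_PHASE_INDEX));
}
```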
- Visibility for each view is computed by `ViewVisibilityJob::query_visibility`.
- Counting the submit nodes needed for each `RenderPhase` is basically the point of `calculate_frame_packet_size`.
- `rafx` has a `ViewPhaseSubmitNodeBlock` for all submit nodes in a `RenderView` and `RenderPhase`, and a separate `RenderFeatureSubmitNodeBlock` for submit nodes specific to a `RenderFeature`.
- Submit nodes are sorted by the `SubmitNodeSortFunction` defined when registering a `RenderPhase`. The actual sort occurs during `create_submit_node_blocks_for_view` in `create_submit_node_blocks`. (A sketch of such a sort function follows.)
- Submit nodes are written concurrently into an `AtomicOnceCellStack`.
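As an example of why the sort function is registered per phase, here is a sketch of the back-to-front sort a transparent phase might use; an opaque phase might instead sort by feature or material to reduce state changes. The types are illustrative, not the actual `rafx` submit node layout:

```rust
#[derive(Debug)]
struct SubmitNode {
    feature_index: u32,
    distance_from_camera: f32,
}

// Comparable in spirit to a SubmitNodeSortFunction registered with a
// RenderPhase: transparent geometry composites correctly back-to-front.
fn sort_transparent(nodes: &mut [SubmitNode]) {
    nodes.sort_by(|a, b| {
        b.distance_from_camera
            .partial_cmp(&a.distance_from_camera)
            .unwrap()
    });
}

fn main() {
    let mut nodes = vec![
        SubmitNode { feature_index: 0, distance_from_camera: 1.0 },
        SubmitNode { feature_index: 1, distance_from_camera: 9.0 },
        SubmitNode { feature_index: 0, distance_from_camera: 4.0 },
    ];
    sort_transparent(&mut nodes);
    assert_eq!(nodes[0].distance_from_camera, 9.0);
}
```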
- `RenderObjectInstancePerView` actually contains a copy of the `RenderObjectInstance` so that we don’t need to jump to another array to look up those fields.
- Submission runs through `write_view_phase` again, using the `RenderFeatureWriteJob` entry points for each `RenderFeature` needed.
- `rafx` doesn’t perform the batching described for submit nodes.
- A `RenderFeatureWriteJob` is implemented against the data in the `FramePacket`, the `SubmitPacket`, and maybe some other persistent storage like a `RenderObjectSet`.
- `rafx` doesn’t go wide during the write / submit phase.
- There is no default parallelization that `rafx` provides out of the box – the expectation is that the application will implement whatever threading or parallelization is needed inside of the `RendererThreadPool`. This would include cost functions or other batching. `DemoRendererThreadPool` provides a very simple, toy example of a way to adjust parallelization on a per-feature basis to scale with the number of render objects (see the sketch below).
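For a flavor of what such a cost function could look like, here is a toy heuristic (purely illustrative; not `DemoRendererThreadPool`’s actual logic) that picks a per-feature chunk size from the number of render object instances:

```rust
// Hypothetical: small features run as a single job to avoid overhead,
// while large features are split across a bounded number of jobs.
fn instances_per_job(num_instances: usize) -> usize {
    const MIN_PER_JOB: usize = 64; // Below this, one job is cheaper.
    const MAX_JOBS: usize = 8; // Don't oversubscribe the thread pool.
    if num_instances <= MIN_PER_JOB {
        num_instances.max(1)
    } else {
        (num_instances / MAX_JOBS).max(MIN_PER_JOB)
    }
}

fn main() {
    assert_eq!(instances_per_job(10), 10); // One small job.
    assert_eq!(instances_per_job(64), 64); // Still one job.
    assert_eq!(instances_per_job(10_000), 1250); // 8 jobs of 1250 instances.
}
```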