# rafx

## Renderer Architecture

`rafx-renderer` and the `rafx-framework` render features were inspired by the 2015 GDC talk “Destiny’s Multithreaded Rendering Architecture”.

There is a video available at https://www.youtube.com/watch?v=0nTDFLMLX9k.

### Overview

#### Comparison with Destiny

The following is a page-by-page comparison of rafx with Destiny, using the PDF slides from the talk, noting where names or design decisions in rafx match or differ from the architecture described for Destiny. Each note is preceded by a page number. The notes generally follow the order of the Destiny slides, so it is possible to read the slides alongside this list and find the differences in a mostly straightforward way.

Code examples from rafx will be placed into a code block:

```rust
// Like this.
```

Quotes from the Destiny slides will be placed into a quote block:

> Like this.

> We provided an interface for feature renderers' entry points with strict rules for data they are allowed to read or write
> at each entry point. Feature renderers could only read the render node data and statically cached render object data.
> They are also only allowed to output to frame packet data. The latter was done to automatically ensure synchronization –
> double-buffering of dynamic data was automatic for feature writers as long as they wrote to the frame packet.
- (pg 126) It also means that parallelization isn't apparent to the feature writer.
> The core renderer architecture generates jobs for each phase by batching across multiple visible render objects of the
> same feature type for several entry points (for example, batching all extract entry points into one extract job). This 
> jobification is done transparently to feature writers.
- (pg 134, 138) Each extract job is running in parallel, but within that extract job there is a well-defined order for the
  entry points. First, `begin_per_frame_extract` is called. Then, possibly in parallel, `extract_render_object_instance` 
  is called for each entity in the frame. Then, `extract_render_object_instance_per_view` is called in parallel across all 
  views for each entity visible in each view. As each view finishes processing, `end_per_view_extract` is called for that
  view. After all views have finished, `end_per_frame_extract` is called. The same order applies to the prepare jobs.
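
The entry-point ordering above can be modeled with a short sketch. This is hypothetical and self-contained, not the actual rafx `ExtractJob` API: the trait name, method signatures, and driver function below are illustrative only, and the driver runs serially where rafx may parallelize.

```rust
// Hypothetical model of the extract entry-point ordering; rafx's real
// ExtractJob trait has different signatures, and the per-instance and
// per-view steps may run in parallel rather than in these serial loops.
pub trait ExtractEntryPoints {
    fn begin_per_frame_extract(&mut self);
    // Called for each render object instance in the frame (parallelizable).
    fn extract_render_object_instance(&mut self, object: usize);
    // Called per (view, visible object) pair (parallelizable across views).
    fn extract_render_object_instance_per_view(&mut self, view: usize, object: usize);
    fn end_per_view_extract(&mut self, view: usize);
    fn end_per_frame_extract(&mut self);
}

// Drives one feature's extract job in the documented order.
pub fn run_extract_job<E: ExtractEntryPoints>(
    job: &mut E,
    objects_in_frame: &[usize],
    visible_objects_per_view: &[Vec<usize>],
) {
    job.begin_per_frame_extract();
    for &object in objects_in_frame {
        job.extract_render_object_instance(object);
    }
    for (view, visible) in visible_objects_per_view.iter().enumerate() {
        for &object in visible {
            job.extract_render_object_instance_per_view(view, object);
        }
        job.end_per_view_extract(view);
    }
    job.end_per_frame_extract();
}
```
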
> Extract and prepare computations and data are split up by frequency (view, frame, object). This allowed us to share data
> across different views, across different render objects to save memory in frame packet (ex: only need one copy of skinning
> transforms for any render objects using the same game object), and performance (only compute skinning transforms once
> for all render objects using them).
> For example, we run extract and prepare per frame operations to perform expensive computations that have to happen only
> once per entire frame as long as this render object is visible in any view.
- (pg 135) The parallelization is driven by the `RendererThreadPool` implementation and the synchronization is driven by 
  custom collections like `AtomicOnceCellArray` and generic implementations of `ExtractJob` or `PrepareJob`.
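
The write-once-per-slot idea behind a collection like `AtomicOnceCellArray` can be illustrated with a rough, self-contained analogue built on the standard library's `OnceLock`. This is not rafx's actual implementation; the `OnceArray` type and its methods below are invented for illustration.

```rust
use std::sync::OnceLock;

// Rough analogue (not rafx's actual implementation) of a write-once array:
// a fixed-size collection of slots that each accept exactly one write.
// Because every slot is write-once, jobs targeting distinct indices can
// fill the array from multiple threads without locking the whole collection.
pub struct OnceArray<T> {
    slots: Vec<OnceLock<T>>,
}

impl<T> OnceArray<T> {
    pub fn new(len: usize) -> Self {
        Self { slots: (0..len).map(|_| OnceLock::new()).collect() }
    }

    /// Writes `value` into `index`; returns Err(value) if the slot is taken.
    pub fn set(&self, index: usize, value: T) -> Result<(), T> {
        self.slots[index].set(value)
    }

    pub fn get(&self, index: usize) -> Option<&T> {
        self.slots[index].get()
    }
}
```

Note that `set` takes `&self`, so many jobs can share one `&OnceArray` and write disjoint slots concurrently without any `&mut` exclusivity.
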
> The core architecture sets up synchronization primitives to ensure safe multi-threaded access. When feature renderer
> writers write code for ‘extract_per_frame’ for example, they don’t need to worry that this entry point will be executed
> from different jobs and may write to the same data in frame packet. However, it is important to use a performant
> synchronization method for this operation since it will be a high-frequency operation per frame.
- (pg 139) **NOTE:** `rafx` does not implement the "per game object" optimization for data shared _across
  features_, but I don't believe it would be hard to extend it in that direction.
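
For reference, the per-game-object sharing described in the slides amounts to keying expensive work by game object rather than render object. A minimal, hypothetical sketch (none of these names exist in rafx):

```rust
use std::collections::HashMap;

// Hypothetical sketch: several render objects may reference the same game
// object, so an expensive result (e.g. skinning transforms) is computed once
// per unique game object id and shared, instead of once per render object.
pub fn compute_once_per_game_object<T>(
    render_object_game_ids: &[u64],
    mut compute: impl FnMut(u64) -> T,
) -> HashMap<u64, T> {
    let mut cache: HashMap<u64, T> = HashMap::new();
    for &game_id in render_object_game_ids {
        cache.entry(game_id).or_insert_with(|| compute(game_id));
    }
    cache
}
```
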
- (pg 144) The steps to write a feature: copy and paste a feature template, find / replace the name, define your frame
  packet and submit packet data, then implement the extract, prepare, and write jobs using the defined entry points. An
  example frame packet:
```rust
pub struct MeshRenderFeatureTypes;

//---------
// EXTRACT
//---------

pub struct MeshPerFrameData {
    pub depth_material_pass: Option<ResourceArc<MaterialPassResource>>,
}

pub struct MeshRenderObjectInstanceData {
    pub mesh_asset: MeshAsset,
    pub translation: Vec3,
    pub rotation: Quat,
    pub scale: Vec3,
}

#[derive(Default)]
pub struct MeshPerViewData {
    pub directional_lights: [Option<ExtractedDirectionalLight>; 16],
    pub point_lights: [Option<ExtractedPointLight>; 16],
    pub spot_lights: [Option<ExtractedSpotLight>; 16],
    pub num_directional_lights: u32,
    pub num_point_lights: u32,
    pub num_spot_lights: u32,
}

pub struct ExtractedDirectionalLight {
    pub light: DirectionalLightComponent,
    pub object_id: ObjectId,
}

pub struct ExtractedPointLight {
    pub light: PointLightComponent,
    pub transform: TransformComponent,
    pub object_id: ObjectId,
}

pub struct ExtractedSpotLight {
    pub light: SpotLightComponent,
    pub transform: TransformComponent,
    pub object_id: ObjectId,
}

impl FramePacketData for MeshRenderFeatureTypes {
    type PerFrameData = MeshPerFrameData;
    type RenderObjectInstanceData = Option<MeshRenderObjectInstanceData>;
    type PerViewData = MeshPerViewData;
    type RenderObjectInstancePerViewData = ();
}

pub type MeshFramePacket = FramePacket<MeshRenderFeatureTypes>;

//---------
// PREPARE
//---------

pub const MAX_SHADOW_MAPS_2D: usize = 32;
pub const MAX_SHADOW_MAPS_CUBE: usize = 16;

pub struct MeshPartDescriptorSetPair {
    pub depth_descriptor_set: DescriptorSetArc,
    pub opaque_descriptor_set: DescriptorSetArc,
}

pub struct MeshPerFrameSubmitData {
    pub num_shadow_map_2d: usize,
    pub shadow_map_2d_data: [shaders::mesh_frag::ShadowMap2DDataStd140; MAX_SHADOW_MAPS_2D],
    pub shadow_map_2d_image_views: [Option<ResourceArc<ImageViewResource>>; MAX_SHADOW_MAPS_2D],
    pub num_shadow_map_cube: usize,
    pub shadow_map_cube_data: [shaders::mesh_frag::ShadowMapCubeDataStd140; MAX_SHADOW_MAPS_CUBE],
    pub shadow_map_cube_image_views: [Option<ResourceArc<ImageViewResource>>; MAX_SHADOW_MAPS_CUBE],
    pub shadow_map_image_index_remap: [Option<usize>; MAX_SHADOW_MAPS_2D + MAX_SHADOW_MAPS_CUBE],
    pub mesh_part_descriptor_sets: Arc<AtomicOnceCellStack<MeshPartDescriptorSetPair>>,
    pub opaque_per_view_descriptor_set_layout: Option<ResourceArc<DescriptorSetLayoutResource>>,
}

pub struct MeshRenderObjectInstanceSubmitData {
    pub mesh_part_descriptor_set_index: usize,
}

impl SubmitPacketData for MeshRenderFeatureTypes {
    type PerFrameSubmitData = Box<MeshPerFrameSubmitData>;
    type RenderObjectInstanceSubmitData = MeshRenderObjectInstanceSubmitData;
    type PerViewSubmitData = MeshPerViewSubmitData;
    type RenderObjectInstancePerViewSubmitData = ();
    type SubmitNodeData = MeshDrawCall;

    type RenderFeature = MeshRenderFeature;
}

pub type MeshSubmitPacket = SubmitPacket<MeshRenderFeatureTypes>;

//-------
// WRITE
//-------

pub struct MeshPerViewSubmitData {
    pub opaque_descriptor_set: Option<DescriptorSetArc>,
    pub depth_descriptor_set: Option<DescriptorSetArc>,
}

pub struct MeshDrawCall {
    pub mesh_asset: MeshAsset,
    pub mesh_part_index: usize,
    pub mesh_part_descriptor_set_index: usize,
}
```