Metal by Tutorials

Fourth Edition · macOS 14, iOS 17 · Swift 5.9 · Xcode 15

15. Tile-Based Deferred Rendering
Written by Marius Horga & Caroline Begbie

Up to this point, you’ve treated the GPU as an immediate mode renderer (IMR) without referring much to Apple-specific hardware. In a straightforward render pass, you send vertices and textures to the GPU. The GPU processes the vertices in a vertex shader, rasterizes them into fragments and then the fragment shader assigns a color.

Immediate mode pipeline

When you have multiple render passes, the GPU uses system memory to transfer resources between them.

Immediate mode using system memory

Starting with the A7 64-bit mobile chip, Apple began transitioning to a tile-based deferred rendering (TBDR) architecture. With the arrival of Apple silicon on Macs, this transition is complete.

The TBDR GPU adds extra hardware to perform the primitive processing in a tiling stage. This process breaks up the screen into tiles and assigns the geometry from the vertex stage to a tile. It then forwards each tile to the rasterizer. Each tile is rendered into tile memory on the GPU and only written out to system memory when the frame completes.
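As a rough mental model of the tiling stage, the following sketch bins a triangle into screen tiles by its bounding box. It's a CPU-side illustration only: the real binning happens in GPU hardware, and the 32-pixel tile size, along with the names Point, Triangle, TileCoord and tilesCovered(by:screenWidth:screenHeight:tileSize:), are assumptions made for this example.

```swift
// Hypothetical CPU-side model of the tiling stage. The GPU does this in hardware.
struct Point { var x: Double; var y: Double }

struct Triangle { var a: Point; var b: Point; var c: Point }

struct TileCoord: Hashable { var column: Int; var row: Int }

/// Returns every tile that the triangle's screen-space bounding box overlaps.
/// The result is conservative: a listed tile may only touch the bounding box,
/// not the triangle itself.
func tilesCovered(
  by triangle: Triangle,
  screenWidth: Int,
  screenHeight: Int,
  tileSize: Int = 32  // assumed tile size, in pixels
) -> [TileCoord] {
  let xs = [triangle.a.x, triangle.b.x, triangle.c.x]
  let ys = [triangle.a.y, triangle.b.y, triangle.c.y]
  let minX = xs.min()!, maxX = xs.max()!
  let minY = ys.min()!, maxY = ys.max()!

  // Triangles entirely off screen don't land in any tile.
  guard maxX >= 0, maxY >= 0,
        minX < Double(screenWidth), minY < Double(screenHeight) else {
    return []
  }

  // Clamp the bounding box to the screen, then convert pixels to tile indices.
  let firstColumn = Int(max(minX, 0)) / tileSize
  let lastColumn = Int(min(maxX, Double(screenWidth - 1))) / tileSize
  let firstRow = Int(max(minY, 0)) / tileSize
  let lastRow = Int(min(maxY, Double(screenHeight - 1))) / tileSize

  var tiles: [TileCoord] = []
  for row in firstRow...lastRow {
    for column in firstColumn...lastColumn {
      tiles.append(TileCoord(column: column, row: row))
    }
  }
  return tiles
}
```

Each tile then only needs to rasterize the geometry assigned to it, which is what lets the GPU keep a tile's working data in tile memory.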

TBDR pipeline

Programmable Blending

Tile memory enables programmable blending. Instead of writing a texture in one pass and reading it in the next, a fragment function can read the color attachment textures directly within a single pass.

Programmable blending with memoryless textures

The G-buffer pass no longer has to transfer its temporary textures to system memory. You mark these textures as memoryless, which keeps them in the fast GPU tile memory. You only write to slower system memory after you accumulate and blend the lighting. This speeds up rendering because you use less bandwidth.
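To get a feel for the bandwidth involved, here's a back-of-the-envelope estimate of the system memory traffic that non-memoryless G-buffer textures would generate. Every number is an assumption chosen for illustration (a 1920×1080 render target, three textures at 8 bytes per pixel, 60 frames per second), and gBufferTrafficBytes(width:height:bytesPerPixel:textureCount:) is a hypothetical helper. Treat the result as a rough upper bound that ignores caching and compression.

```swift
/// Bytes moved through system memory per frame when each G-buffer texture is
/// written in one pass and read back in the next.
func gBufferTrafficBytes(
  width: Int,
  height: Int,
  bytesPerPixel: Int,
  textureCount: Int
) -> Int {
  let oneTexture = width * height * bytesPerPixel
  let writeThenRead = 2  // one store in the G-buffer pass, one load in the lighting pass
  return oneTexture * textureCount * writeThenRead
}

// Assumed values, for illustration only.
let perFrame = gBufferTrafficBytes(
  width: 1920,
  height: 1080,
  bytesPerPixel: 8,
  textureCount: 3)
let perSecond = perFrame * 60
print("Per frame: \(perFrame) bytes, per second at 60 fps: \(perSecond) bytes")
```

With memoryless textures, this traffic never reaches system memory because the data stays in tile memory for the duration of the pass.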

Tiled Deferred Rendering

Confusingly, tiled deferred rendering can refer to the deferred rendering (or shading) technique as well as to the GPU architecture. In this chapter, you’ll combine the deferred rendering G-buffer and Lighting passes from the previous chapter into a single render pass using the tile-based architecture.

The Starter Project

➤ In Xcode, open the starter project for this chapter.

The starter app

GPU frame capture

Starter app render passes

1. Making the Textures Memoryless

➤ Open TiledDeferredRenderPass.swift. In resize(view:size:), change the storage mode for all four textures from storageMode: .private to:

storageMode: .memoryless

2. Changing the Store Action

➤ Stay in TiledDeferredRenderPass.swift. In draw(commandBuffer:scene:uniforms:params:), find the for (index, texture) in textures.enumerated() loop and change attachment?.storeAction = .store to:

attachment?.storeAction = .dontCare
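The two changes so far go together: a memoryless texture has no backing in system memory, so there's nowhere to store its contents when the pass ends. The sketch below shows both settings in isolation. The helper names makeMemorylessDescriptor(pixelFormat:width:height:) and markTransient(_:attachmentIndices:) are hypothetical and aren't part of the starter project, whose own texture helper may be organized differently.

```swift
import Metal

/// Hypothetical helper: describes a render target that lives only in tile memory.
func makeMemorylessDescriptor(
  pixelFormat: MTLPixelFormat,
  width: Int,
  height: Int
) -> MTLTextureDescriptor {
  let descriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: pixelFormat,
    width: width,
    height: height,
    mipmapped: false)
  // A memoryless texture is only usable as a render pass attachment.
  descriptor.usage = [.renderTarget]
  descriptor.storageMode = .memoryless
  return descriptor
}

/// Hypothetical helper: marks the given color attachments as transient, so
/// their contents are cleared at the start of the pass and discarded at the end.
func markTransient(
  _ descriptor: MTLRenderPassDescriptor,
  attachmentIndices: Range<Int>
) {
  for index in attachmentIndices {
    let attachment = descriptor.colorAttachments[index]
    attachment?.loadAction = .clear
    attachment?.storeAction = .dontCare
  }
}
```

You'd create the textures from such a descriptor with your device's makeTexture(descriptor:), on hardware that supports memoryless storage.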

3. Removing the Fragment Textures

➤ In drawLightingRenderPass(renderEncoder:scene:uniforms:params:), remove:

renderEncoder.setFragmentTexture(
  albedoTexture,
  index: BaseColor.index)
renderEncoder.setFragmentTexture(
  normalTexture,
  index: NormalTexture.index)
renderEncoder.setFragmentTexture(
  positionTexture,
  index: PositionTexture.index)

4. Creating the New Fragment Functions

➤ Still in TiledDeferredRenderPass.swift, in init(view:), change the three pipeline state objects’ tiled: false parameters to:

tiled: true

texture2d<float> albedoTexture [[texture(BaseColor)]],
texture2d<float> normalTexture [[texture(NormalTexture)]]

GBufferOut gBuffer

uint2 coord = uint2(in.position.xy);
float4 albedo = albedoTexture.read(coord);
float3 normal = normalTexture.read(coord).xyz;

float4 albedo = gBuffer.albedo;
float3 normal = gBuffer.normal.xyz;

texture2d<float> normalTexture [[texture(NormalTexture)]],
texture2d<float> positionTexture [[texture(PositionTexture)]],

GBufferOut gBuffer

uint2 coords = uint2(in.position.xy);
float3 normal = normalTexture.read(coords).xyz;
float3 position = positionTexture.read(coords).xyz;

float3 normal = gBuffer.normal.xyz;
float3 worldPosition = gBuffer.position.xyz;

Render pass descriptor color attachments

5. Combining the Two Render Passes

➤ Open TiledDeferredRenderPass.swift. In draw(commandBuffer:scene:uniforms:params:), change let descriptor = MTLRenderPassDescriptor() to:

let descriptor = viewCurrentRenderPassDescriptor

renderEncoder.endEncoding()

// MARK: Lighting pass
// Set up Lighting descriptor
guard let renderEncoder =
  commandBuffer.makeRenderCommandEncoder(
    descriptor: viewCurrentRenderPassDescriptor) else {
  return
}

6. Updating the Pipeline States

➤ Open Pipelines.swift. Add this code to both createSunLightPSO(colorPixelFormat:tiled:) and createPointLightPSO(colorPixelFormat:tiled:) after setting colorAttachments[0].pixelFormat:

if tiled {
  pipelineDescriptor.setGBufferPixelFormats()
}

if tiled {
  pipelineDescriptor.colorAttachments[0].pixelFormat
    = colorPixelFormat
}

A single render pass

The final render

The final frame capture

pointLights = Self.createPointLights(
  count: 40,
  min: [-3, 0.1, -5],
  max: [3, 0.3, 5])

Stencil Tests

The last step in completing your deferred rendering is to fix the sky. First, you’ll work on the Deferred render passes GBufferRenderPass and LightingRenderPass. Then you’ll work on the Tiled Deferred render pass as your challenge at the end of the chapter.

Stencil testing

A stencil texture

Stencil Test Configuration

To render, a fragment must pass both the depth test and the stencil test that you configure.

1. The Comparison Function

When the rasterizer performs a stencil test, it compares a reference value with the value in the stencil texture using a comparison function. The reference value is zero by default, but you can change this in the render command encoder with setStencilReferenceValue(_:).
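To make the comparison concrete, here's a small pure-Swift model of the test. It doesn't touch Metal: StencilCompare is a hypothetical stand-in for a few cases of Metal's MTLCompareFunction, and stencilTestPasses(reference:stored:compare:) is an invented name. The reference value sits on the left-hand side of each comparison, so .less passes when the reference is less than the stored value.

```swift
/// Hypothetical stand-in for a subset of Metal's comparison functions.
enum StencilCompare {
  case never, less, equal, notEqual, greater, always
}

/// Models the stencil comparison for one fragment.
/// - Parameters:
///   - reference: the value you'd set with setStencilReferenceValue(_:), zero by default.
///   - stored: the value already in the stencil texture at this pixel.
func stencilTestPasses(
  reference: UInt8 = 0,
  stored: UInt8,
  compare: StencilCompare
) -> Bool {
  switch compare {
  case .never: return false
  case .less: return reference < stored
  case .equal: return reference == stored
  case .notEqual: return reference != stored
  case .greater: return reference > stored
  case .always: return true
  }
}
```

For example, with the default reference value of zero, .equal passes only where the stencil texture still holds zero.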

2. The Stencil Operation

Next, you set the stencil operations to perform on the stencil buffer. There are three possible results to configure:
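The three results match the three properties that the chapter's code sets on MTLStencilDescriptor: stencilFailureOperation, depthFailureOperation and depthStencilPassOperation. The following pure-Swift sketch models how one stored stencil value changes for each result. The types StencilOp and StencilConfig and the function names are hypothetical, and only a subset of Metal's stencil operations is modeled.

```swift
/// Hypothetical stand-in for a subset of Metal's stencil operations.
enum StencilOp {
  case keep, zero, replace, incrementClamp, decrementClamp, invert
}

/// One operation per possible test result.
struct StencilConfig {
  var stencilFailure: StencilOp = .keep    // the stencil test failed
  var depthFailure: StencilOp = .keep      // the stencil test passed, the depth test failed
  var depthStencilPass: StencilOp = .keep  // both tests passed
}

func apply(_ op: StencilOp, to stored: UInt8, reference: UInt8) -> UInt8 {
  switch op {
  case .keep: return stored
  case .zero: return 0
  case .replace: return reference
  case .incrementClamp: return stored == UInt8.max ? stored : stored + 1
  case .decrementClamp: return stored == 0 ? stored : stored - 1
  case .invert: return ~stored
  }
}

/// Picks the operation that matches the test results and applies it.
func updatedStencil(
  stored: UInt8,
  reference: UInt8 = 0,
  stencilPassed: Bool,
  depthPassed: Bool,
  config: StencilConfig
) -> UInt8 {
  if !stencilPassed {
    return apply(config.stencilFailure, to: stored, reference: reference)
  }
  if !depthPassed {
    return apply(config.depthFailure, to: stored, reference: reference)
  }
  return apply(config.depthStencilPass, to: stored, reference: reference)
}
```

With depthStencilPass set to .incrementClamp and the other two left at .keep, every fragment that passes both tests bumps the stored value by one, up to a maximum of 255.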

3. The Read and Write Mask

There’s one more wrinkle. You can specify a read mask and a write mask. By default, these masks are 255, or 11111111 in binary. A mask is combined with a value using a bitwise AND, and ANDing a bit with 1 leaves the bit unchanged, so the default masks have no effect.
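A few lines of plain Swift show what the masks do to 8-bit values. This models the behavior rather than calling Metal, and the function names maskedEqual(reference:stored:readMask:) and write(_:over:writeMask:) are invented for the example.

```swift
/// The read mask is ANDed with both the reference and the stored value
/// before they are compared.
func maskedEqual(
  reference: UInt8,
  stored: UInt8,
  readMask: UInt8 = 0xFF
) -> Bool {
  (reference & readMask) == (stored & readMask)
}

/// The write mask selects which bits of the stored value an operation may change.
func write(
  _ newValue: UInt8,
  over stored: UInt8,
  writeMask: UInt8 = 0xFF
) -> UInt8 {
  (stored & ~writeMask) | (newValue & writeMask)
}

// With the default mask, values pass through unchanged.
let untouched: UInt8 = 0b1010_0101 & 0xFF

// Only the low four bits take part in this comparison.
let lowBitsMatch = maskedEqual(
  reference: 0b0000_0001,
  stored: 0b1111_0001,
  readMask: 0b0000_1111)
```

A narrower mask lets you treat parts of the same stencil byte as independent flags.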

Create the Stencil Texture

The stencil texture buffer is an extra 8-bit buffer attached to the depth texture buffer. You optionally configure it when you configure the depth buffer.

if !tiled {
  pipelineDescriptor.depthAttachmentPixelFormat
    = .depth32Float_stencil8
  pipelineDescriptor.stencilAttachmentPixelFormat
    = .depth32Float_stencil8
}

depthTexture = Self.makeTexture(
  size: size,
  pixelFormat: .depth32Float_stencil8,
  label: "Depth and Stencil Texture")

descriptor?.stencilAttachment.texture = depthTexture
descriptor?.stencilAttachment.storeAction = .store

New stencil texture

Configure the Stencil Operation

➤ Open GBufferRenderPass.swift, and add this new method:

static func buildDepthStencilState() -> MTLDepthStencilState? {
  let descriptor = MTLDepthStencilDescriptor()
  descriptor.depthCompareFunction = .less
  descriptor.isDepthWriteEnabled = true
  return Renderer.device.makeDepthStencilState(
    descriptor: descriptor)
}

let frontFaceStencil = MTLStencilDescriptor()
frontFaceStencil.stencilCompareFunction = .always
frontFaceStencil.stencilFailureOperation = .keep
frontFaceStencil.depthFailureOperation = .keep
frontFaceStencil.depthStencilPassOperation = .incrementClamp
descriptor.frontFaceStencil = frontFaceStencil

The ground is rendered in front of the trees and sometimes fails the depth test

models = [treefir1, treefir2, treefir3, train, ground]

models = [ground, treefir1, treefir2, treefir3, train]

Ground renders first

1. Passing in the Depth/Stencil Texture

➤ Open LightingRenderPass.swift, and add a new texture property to LightingRenderPass:

weak var stencilTexture: MTLTexture?

descriptor?.stencilAttachment.texture = stencilTexture

lightingRenderPass.stencilTexture = gBufferRenderPass.depthTexture

2. Setting Up the Render Pass Descriptor

➤ Open LightingRenderPass.swift. At the top of draw(commandBuffer:scene:uniforms:params:), add:

descriptor?.depthAttachment.texture = stencilTexture
descriptor?.stencilAttachment.loadAction = .load
descriptor?.depthAttachment.loadAction = .dontCare

3. Changing the Pipeline State Objects

➤ Open Pipelines.swift.

if !tiled {
  pipelineDescriptor.depthAttachmentPixelFormat
    = .depth32Float_stencil8
  pipelineDescriptor.stencilAttachmentPixelFormat
    = .depth32Float_stencil8
}

Stencil texture in frame capture

Masking the Sky

When you render the quad in LightingRenderPass, you want to bypass all fragments that are zero in the stencil buffer.
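The choice of comparison function decides which side of the mask gets lit. The sketch below models one row of a stencil buffer in plain Swift, assuming that sky pixels still hold zero and that pixels covered by geometry hold a value greater than zero. Compare and litMask(stencil:reference:compare:) are hypothetical names for this illustration.

```swift
/// Hypothetical stand-in for the two comparison functions of interest here.
enum Compare { case equal, notEqual }

/// Returns, per pixel, whether the lighting quad's fragment survives the stencil test.
func litMask(
  stencil: [UInt8],
  reference: UInt8,
  compare: Compare
) -> [Bool] {
  stencil.map { (stored: UInt8) -> Bool in
    switch compare {
    case .equal: return reference == stored
    case .notEqual: return reference != stored
    }
  }
}

// Assumed stencil contents: 0 is sky, anything else is geometry.
let stencilRow: [UInt8] = [0, 0, 1, 2, 1, 0]

// .equal with a reference of 0 keeps only the sky fragments.
let skyOnly = litMask(stencil: stencilRow, reference: 0, compare: .equal)

// .notEqual with a reference of 0 keeps only the geometry fragments.
let geometryOnly = litMask(stencil: stencilRow, reference: 0, compare: .notEqual)
```

Since the default reference value is zero, .notEqual is the comparison that lights the geometry and leaves the sky alone.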

let frontFaceStencil = MTLStencilDescriptor()
frontFaceStencil.stencilCompareFunction = .equal
frontFaceStencil.stencilFailureOperation = .keep
frontFaceStencil.depthFailureOperation = .keep
frontFaceStencil.depthStencilPassOperation = .keep
descriptor.frontFaceStencil = frontFaceStencil

A deliberate mistake

frontFaceStencil.stencilCompareFunction = .notEqual

Clear blue skies

Challenge

You fixed the sky for your Deferred Rendering pass. Your challenge is now to fix it in the Tiled Deferred render pass. Here’s a hint: just follow the steps for the Deferred render pass. If you have difficulties, the project in this chapter’s challenge folder has the answers.

Key Points

  • Tile-based deferred rendering takes advantage of the tile memory on Apple’s TBDR GPUs.
  • Keeping data in tile memory rather than transferring to system memory is much more efficient and uses less power.
  • Mark textures as memoryless to keep them in tile memory.
  • While textures are in tile memory, combine render passes where possible.
  • Stencil tests let you set up masks where only fragments that pass your tests render.
  • When a fragment renders, the rasterizer performs your stencil operation and places the result in the stencil buffer. With this stencil buffer, you control which parts of your image render.

Where to Go From Here?

Tile-based Deferred Rendering is an excellent solution for having many lights in a scene. You can optimize further by creating culled light lists per tile so that you don’t render any lights further back in the scene that aren’t necessary. Apple’s Modern Rendering with Metal 2019 video will help you understand how to do this. The video also points out when to use various rendering technologies.
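To give a flavor of what a per-tile light list is, here's a heavily simplified CPU-side sketch. It treats each light as a circle in screen space and records, for every tile, the indices of the lights whose circle overlaps that tile. It isn't the method from Apple's video, which works on the GPU and also takes depth into account. ScreenLight and lightLists(lights:screenWidth:screenHeight:tileSize:) are hypothetical names.

```swift
/// Hypothetical 2D stand-in for a point light's area of influence on screen.
struct ScreenLight {
  var x: Double
  var y: Double
  var radius: Double
}

/// Returns one list of light indices per tile, in row-major order.
func lightLists(
  lights: [ScreenLight],
  screenWidth: Int,
  screenHeight: Int,
  tileSize: Int
) -> [[Int]] {
  // Round up so partial tiles at the screen edges are included.
  let columns = (screenWidth + tileSize - 1) / tileSize
  let rows = (screenHeight + tileSize - 1) / tileSize
  var lists = [[Int]](repeating: [], count: columns * rows)

  for row in 0..<rows {
    for column in 0..<columns {
      let minX = Double(column * tileSize)
      let maxX = minX + Double(tileSize)
      let minY = Double(row * tileSize)
      let maxY = minY + Double(tileSize)

      for (index, light) in lights.enumerated() {
        // Find the point of the tile rectangle closest to the light's center.
        let nearestX = min(max(light.x, minX), maxX)
        let nearestY = min(max(light.y, minY), maxY)
        let dx = light.x - nearestX
        let dy = light.y - nearestY
        // The light affects the tile if that point lies within its radius.
        if dx * dx + dy * dy <= light.radius * light.radius {
          lists[row * columns + column].append(index)
        }
      }
    }
  }
  return lists
}
```

A lighting pass could then shade each tile with only the lights in its list rather than with every light in the scene.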

© 2025 Kodeco Inc.
