To develop my main ray tracing renderer project, I have been working on a MacBook Air with an M2 chip. The renderer’s performance on the laptop has not been great, with fractional frame rates (below 1 FPS) at fullscreen resolutions.
The M3 chip was introduced fairly recently (half a year ago at the time of writing) with hardware support for ray tracing. Anecdotally, I have seen reports of 100% improvements in performance. In this post, a simple ray tracing shader is executed on four different MacBooks to gauge what sort of ray tracing performance boost can be obtained in a home-made renderer.
The benchmark
Shader
To try to isolate the ray tracing activity, the benchmark program is a simple Metal renderer which only casts one ray per pixel per frame. The render pass draws a fullscreen quad, and the fragment shader casts a single ray into the scene for each pixel. The albedo texture color at the ray’s intersection point is used as the color output. All the textures are stored in a single buffer, and the texture lookup reads a single texel from the texture at the UV coordinate without any filtering.
Here is the gist of the fragment shader.
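To illustrate the unfiltered lookup into the shared texture buffer, here is a CPU-side C++ sketch. It is not the Metal shader from the benchmark: the `TextureInfo` and `Texel` types, the `sampleNearest` function, the RGBA8 texel layout, and the clamping of out-of-range UVs are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical descriptor for one texture stored inside the shared buffer.
struct TextureInfo {
    std::size_t offset;   // index of the texture's first texel in the buffer
    std::uint32_t width;  // texture width in texels
    std::uint32_t height; // texture height in texels
};

// Assumed texel layout: 8 bits per channel, RGBA.
struct Texel {
    std::uint8_t r, g, b, a;
};

// Reads the single texel under (u, v) with no filtering (nearest texel).
// UVs are clamped to [0, 1] here; a real renderer may wrap them instead.
inline Texel sampleNearest(const std::vector<Texel>& buffer,
                           const TextureInfo& tex, float u, float v) {
    u = std::min(std::max(u, 0.0f), 1.0f);
    v = std::min(std::max(v, 0.0f), 1.0f);
    // Map [0, 1] to integer texel coordinates, keeping u == 1 or v == 1 in range.
    const std::uint32_t x =
        std::min(static_cast<std::uint32_t>(u * tex.width), tex.width - 1);
    const std::uint32_t y =
        std::min(static_cast<std::uint32_t>(v * tex.height), tex.height - 1);
    // Row-major addressing relative to the texture's offset in the buffer.
    return buffer[tex.offset + static_cast<std::size_t>(y) * tex.width + x];
}
```

Packing every texture into one buffer means the shader only needs a per-texture offset and size to find a texel, at the cost of doing the addressing by hand as above.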
Measurement
The fragment stage duration is measured in milliseconds using GPU counters. The method described in “Converting GPU Timestamps into CPU Time” is used to normalize the GPU timestamps into CPU time.
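The idea behind that normalization can be sketched as a linear interpolation between two correlated CPU/GPU clock readings taken before and after the work (in Metal, such pairs can be obtained from the device's `sampleTimestamps` method). The C++ below is a minimal sketch and not the benchmark's code: the `TimestampPair` type, the function names, and the assumption that CPU timestamps are in nanoseconds are all mine.

```cpp
#include <cstdint>

// One correlated reading of both clocks (hypothetical type).
struct TimestampPair {
    std::uint64_t cpu; // CPU clock reading, assumed to be in nanoseconds
    std::uint64_t gpu; // GPU clock reading, in GPU ticks
};

// Maps a raw GPU timestamp onto the CPU timeline by linear interpolation
// between two correlated samples taken around the measured work.
inline double gpuToCpuTime(std::uint64_t gpuTimestamp,
                           const TimestampPair& start,
                           const TimestampPair& end) {
    const double gpuSpan = static_cast<double>(end.gpu - start.gpu);
    const double cpuSpan = static_cast<double>(end.cpu - start.cpu);
    if (gpuSpan == 0.0) {
        return static_cast<double>(start.cpu); // degenerate range, nothing to scale
    }
    const double t =
        (static_cast<double>(gpuTimestamp) - static_cast<double>(start.gpu)) / gpuSpan;
    return static_cast<double>(start.cpu) + t * cpuSpan;
}

// Duration of a GPU stage in milliseconds, given its raw begin/end GPU timestamps.
inline double stageDurationMs(std::uint64_t gpuBegin, std::uint64_t gpuEnd,
                              const TimestampPair& start,
                              const TimestampPair& end) {
    const double beginNs = gpuToCpuTime(gpuBegin, start, end);
    const double endNs = gpuToCpuTime(gpuEnd, start, end);
    return (endNs - beginNs) / 1.0e6; // nanoseconds to milliseconds
}
```

The interpolation assumes the GPU clock ticks at a constant rate between the two samples, so the conversion is only as good as that assumption.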
The median fragment stage time from the latest 100 frames is displayed in the UI.
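A rolling median like this can be kept with a small fixed-size window of samples. The following C++ sketch shows one way to do it; the `RollingMedian` class and its interface are hypothetical and only the window size of 100 comes from the benchmark.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Keeps the most recent `capacity` samples and reports their median.
class RollingMedian {
public:
    explicit RollingMedian(std::size_t capacity = 100) : capacity_(capacity) {}

    // Adds a new sample, discarding the oldest one once the window is full.
    void push(double ms) {
        if (capacity_ == 0) return;
        if (samples_.size() == capacity_) samples_.pop_front();
        samples_.push_back(ms);
    }

    // Median of the stored samples, or 0 when no samples have been added.
    double median() const {
        if (samples_.empty()) return 0.0;
        std::vector<double> sorted(samples_.begin(), samples_.end());
        const std::size_t mid = sorted.size() / 2;
        // Partial sort: places the middle element at its sorted position.
        std::nth_element(sorted.begin(), sorted.begin() + mid, sorted.end());
        const double upper = sorted[mid];
        if (sorted.size() % 2 == 1) return upper;
        // Even count: the lower middle is the largest value left of `mid`.
        const double lower = *std::max_element(sorted.begin(), sorted.begin() + mid);
        return 0.5 * (lower + upper);
    }

private:
    std::size_t capacity_;
    std::deque<double> samples_;
};
```

A median is less sensitive than a mean to a handful of outlier frames, which makes the displayed number easier to read when the duration spikes occasionally.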
Scene
The benchmark scene in use is the well-known Crytek Sponza scene, with 262267 triangles. The glTF model was sourced from the Khronos Group’s glTF sample assets repository.
Results
Occasional spikes in fragment stage duration were observed. The number was read from the UI once the frame rate had settled for roughly 100 frames.
Hardware | Median fragment stage duration (frame rate) |
---|---|
M2 | 25.20 ms (36.69 FPS) |
M3 | 12.66 ms (79.01 FPS) |
M1 Pro | 17.56 ms (56.93 FPS) |
M3 Pro | 10 ms - 5 ms (100 - 200 FPS) |
The fragment stage duration never settled to a stable number on the M3 Pro, and the results reported for the last 100 frames were difficult to read. The range reported is roughly equal to the lowest and highest frame rates that were visible. The benchmark ran on a colleague’s computer, and there may have been something running in the background. Alternatively, the normalization of the GPU timestamps may not have been working correctly, and fluctuations in core frequency might have yielded inaccurate timestamps. Nevertheless, the upper limit of the fragment stage duration on the M3 Pro (10 ms) is roughly 2.5x faster than the baseline M2 (25.20 ms), in the same ballpark as the ~2x gain from the M2 to the M3. The 5 ms duration should probably be taken with a grain of salt.
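For reference, the speedup factors implied by the table can be computed directly from the median durations. This is a small C++ sketch; the `speedup` helper and the constant names are made up for the example, and the M3 Pro figure uses the 10 ms upper limit rather than a settled median.

```cpp
// Speedup factor of a candidate over a baseline, from fragment stage
// durations in milliseconds. Values above 1 mean the candidate is faster.
inline double speedup(double baselineMs, double candidateMs) {
    return baselineMs / candidateMs;
}

// Durations taken from the results table.
constexpr double kM2Ms = 25.20;
constexpr double kM3Ms = 12.66;
constexpr double kM1ProMs = 17.56;
constexpr double kM3ProUpperMs = 10.0; // upper limit of the unstable M3 Pro range

// speedup(kM2Ms, kM3Ms)            is about 1.99 (M2 to M3)
// speedup(kM1ProMs, kM3ProUpperMs) is about 1.76 (M1 Pro to M3 Pro)
// speedup(kM2Ms, kM3ProUpperMs)    is about 2.52 (M2 to M3 Pro)
```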
In summary, it seems that the anecdotal evidence was accurate! 2x performance gains can be reaped by switching to an M3 chip, assuming that the ray tracer is not running divergent workloads in a loop in the shader.