This is a GPU game

Both "DOP" and the "ECS architecture" aim to squeeze more performance out of the CPU. However, CPU hardware is no longer delivering large gains in single-core performance; the emphasis has shifted to increasing core count. In many practical applications, a slight increase in single-core frequency is more beneficial than additional cores. Multithreading can take much of the logic off the main thread, but it still cannot handle the massive computational workload that game logic and rendering require.

To address this issue, the GPU, hardware that excels at large-scale parallel computation, must take on a significant share of the CPU's computational tasks. In our game, traditional game computations that would otherwise burden the CPU are offloaded to the GPU. Here are some examples:

  1. Dynamic Occlusion Culling: Objects outside the screen view do not need to be rendered, so the GPU is used to filter which objects meet the rendering criteria. We create a GPU buffer holding the positions and IDs of all scene objects, declare an ID buffer for the objects that pass the filter, and pass both to a Culling Compute Shader. The GPU computation then yields the set of objects that remain after culling.

  2. Skeletal Animation: With tens of thousands of animated units, the CPU cannot keep up with the computation. We therefore use a custom animation editor to generate our own skeletal animation format. At runtime, we convert this animation data into GPU buffers. In the final rendering step, the shader uses real-time data to calculate bone positions and rotations. This achieves the same effect as CPU-based skeletal animation with almost zero performance overhead.

  3. Rendering NFTs: Traditional rendering submits one draw call per object, which results in high draw call overhead. By customizing the rendering process and resources, we keep the draw call count to just over 100 for tens of thousands of different units in the game scene.
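The culling pass in the first example can be sketched on the CPU to show the data flow. This is only an illustrative reference, not the game's actual shader: the `ViewBounds` box is a hypothetical stand-in for the camera's view volume, and the names `cull`, `visible_ids`, and `visible_count` are assumptions. On the GPU, each loop iteration would run as one compute-shader thread, and the counter would be an atomic increment.

```python
from dataclasses import dataclass


@dataclass
class ViewBounds:
    """Hypothetical stand-in for the camera's view volume (an axis-aligned box)."""
    min_x: float
    max_x: float
    min_z: float
    max_z: float


def cull(positions, ids, view):
    """Return the IDs of objects whose position lies inside the view bounds."""
    assert len(positions) == len(ids)
    visible_ids = [0] * len(ids)  # output ID buffer, sized for the worst case
    visible_count = 0             # plays the role of the GPU's atomic counter
    for (x, _y, z), object_id in zip(positions, ids):
        inside = view.min_x <= x <= view.max_x and view.min_z <= z <= view.max_z
        if inside:
            # Append the surviving ID to the compacted output buffer.
            visible_ids[visible_count] = object_id
            visible_count += 1
    return visible_ids[:visible_count]


# Four hypothetical scene objects: (x, y, z) positions and their IDs.
positions = [(0.0, 0.0, 5.0), (50.0, 0.0, 5.0), (-3.0, 0.0, 9.0), (2.0, 0.0, -40.0)]
ids = [101, 102, 103, 104]
view = ViewBounds(min_x=-10.0, max_x=10.0, min_z=0.0, max_z=20.0)

visible = cull(positions, ids, view)
print(visible)  # [101, 103]
```

A real compute shader would write surviving IDs in a nondeterministic order, since threads finish independently; the sequential loop here keeps them in input order only because it runs on one thread.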

Because rendering, the GPU's essential work, is optimized as well, our game can present tens of thousands of distinct units simultaneously playing different animations and casting different skills while maintaining a seamless gaming experience.
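To illustrate how tens of thousands of units can fit into roughly a hundred draw calls, here is a minimal sketch of one common approach: grouping units that share a mesh so each group can be drawn with a single instanced call. The text does not say which technique the game actually uses, so treat this as an assumption; the function name `build_instanced_batches` and the unit and mesh counts are hypothetical.

```python
from collections import defaultdict


def build_instanced_batches(visible_units):
    """Group (unit_id, mesh_name) pairs so each mesh needs one instanced draw call."""
    batches = defaultdict(list)
    for unit_id, mesh_name in visible_units:
        batches[mesh_name].append(unit_id)
    return dict(batches)


# 30,000 hypothetical units spread evenly over 100 hypothetical mesh types.
units = [(unit_id, f"mesh_{unit_id % 100}") for unit_id in range(30_000)]
batches = build_instanced_batches(units)

# One draw call per batch instead of one per unit.
print(len(units), "units ->", len(batches), "draw calls")  # 30000 units -> 100 draw calls
```

With this grouping, the draw call count depends on the number of distinct meshes, not on the number of units on screen.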
