Tadeusz Tomczak, Roman Szafran

Abstract

We describe a high-performance implementation of the lattice Boltzmann method (LBM) for sparse geometries on graphics processors. In our implementation, we cover the whole geometry with a uniform mesh of small tiles and carry out the calculations for each tile independently, with proper data synchronization at the tile edges. For this method, we provide both a theoretical analysis of its complexity and results from real implementations for two-dimensional (2D) and three-dimensional (3D) geometries. Based on the theoretical model, we show that tiles offer significantly smaller bandwidth overheads than solutions based on indirect addressing. For 2D lattice arrangements, a reduction in memory usage is also possible, although at the cost of diminished performance. We achieved a performance of 682 MLUPS on a GTX Titan (72 percent of peak theoretical memory bandwidth) for the D3Q19 lattice arrangement and double-precision data.
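The tile covering and the quoted bandwidth figure can be illustrated with a minimal Python sketch. It is not the authors' GPU code. The function names, the 2D boolean fluid mask, and the tile edge of 4 nodes are hypothetical choices made for illustration. The bandwidth estimate assumes the usual minimal traffic model of one read and one write per distribution function per node update, and a peak memory bandwidth of 288.4 GB/s for the GTX Titan, a commonly quoted specification value that is not stated in the abstract.

```python
import numpy as np


def nonempty_tiles(fluid_mask, tile):
    """Mark the tiles of a uniform tile mesh that hold at least one fluid node.

    fluid_mask: 2D boolean array, True where a node belongs to the fluid.
    tile:       tile edge length in nodes (hypothetical parameter).
    Returns a 2D boolean array with one entry per tile.
    """
    ny, nx = fluid_mask.shape
    # Pad with non-fluid nodes so both dimensions are multiples of the tile edge.
    pad_y = (-ny) % tile
    pad_x = (-nx) % tile
    padded = np.pad(fluid_mask, ((0, pad_y), (0, pad_x)), constant_values=False)
    tiles_y = padded.shape[0] // tile
    tiles_x = padded.shape[1] // tile
    # View the grid as (tile row, row in tile, tile column, column in tile).
    blocks = padded.reshape(tiles_y, tile, tiles_x, tile)
    # A tile needs storage and computation only if any node inside it is fluid.
    return blocks.any(axis=(1, 3))


def bandwidth_fraction(mlups, q, bytes_per_value, peak_gb_s):
    """Estimate the fraction of peak memory bandwidth used at a given MLUPS.

    Assumes each node update reads q values and writes q values.
    """
    bytes_per_update = 2 * q * bytes_per_value
    achieved_gb_s = mlups * 1e6 * bytes_per_update / 1e9
    return achieved_gb_s / peak_gb_s


# Toy sparse geometry: a horizontal channel two nodes wide in an 8 x 8 grid.
mask = np.zeros((8, 8), dtype=bool)
mask[0:2, :] = True
tiles = nonempty_tiles(mask, tile=4)
print("non-empty tiles:", int(tiles.sum()), "of", tiles.size)

# D3Q19, double precision (8 bytes), 682 MLUPS, assumed peak of 288.4 GB/s.
frac = bandwidth_fraction(682, q=19, bytes_per_value=8, peak_gb_s=288.4)
print(f"fraction of peak bandwidth: {frac:.3f}")
```

Under these assumptions, only the tiles that intersect the channel need to be stored and processed, and the estimated bandwidth fraction is consistent with the 72 percent reported above.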
A free copy of a paper from arXiv.org