Document Type : Research Paper

Authors

1 PhD Student, Department of Algorithm and Computations, University of Tehran

2 Department of Algorithm and Computations, Engineering Science, University of Tehran, Tehran, Iran

10.22059/jac.2025.387101.1219

Abstract

Improving the efficiency of the multilevel fast multipole algorithm (MLFMA) on distributed and parallel systems has been studied extensively, especially for GPUs. Unlike far-field computation, the acceleration of near-field computation in MLFMA on GPUs has received less attention in the literature, although some solutions exploit specific features of GPU memory. This article proposes data replication for the P2P operator and uses analytical performance models to determine the criteria under which it is optimal. By modelling the speedup, we found that making threads independent by introducing redundancy in the data makes the algorithm nearly 13 times faster than the non-redundant mode for lower-density problems.

Keywords