HPC Usage
OpenImpala is designed for distributed-memory parallelism via MPI, making it suitable for large-scale simulations on HPC clusters.
Running with MPI
Python
# Install OpenImpala together with mpi4py
pip install openimpala mpi4py
# Run on 4 MPI ranks
mpirun -np 4 python my_script.py
C++ executable
mpirun -np 16 ./Diffusion3d inputs
Apptainer on a cluster
mpirun -np 16 apptainer exec openimpala-v4.0.0.sif /opt/OpenImpala/build/Diffusion3d inputs
SLURM batch script
#!/bin/bash
#SBATCH --job-name=openimpala
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --time=02:00:00
#SBATCH --partition=compute
# The MPI module name is site-specific; adjust it for your cluster
module load mpi
srun apptainer exec openimpala-v4.0.0.sif \
/opt/OpenImpala/build/Diffusion3d inputs
Domain decomposition
AMReX decomposes the 3D domain into boxes that are distributed across MPI ranks. The
max_grid_size parameter sets the maximum box size, in cells along each dimension:
amr.max_grid_size = 64
- Smaller values create more boxes, improving load balance across many ranks.
- Larger values reduce inter-rank communication but may cause load imbalance.
- Choose a power of 2 that evenly divides your domain dimensions.
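The trade-off above can be checked with a quick box count before submitting a job. The sketch below is a simplified model, not AMReX's actual algorithm: it assumes each dimension is cut into chunks of at most max_grid_size cells and that boxes are dealt out to ranks as evenly as possible. The helper names `count_boxes` and `boxes_per_rank` are hypothetical, and real decompositions also depend on settings such as the blocking factor and the distribution strategy.

```python
import math


def count_boxes(domain, max_grid_size):
    """Boxes produced when each dimension is cut into chunks of at most
    max_grid_size cells (simplified model, assumption of this sketch)."""
    return math.prod(math.ceil(n / max_grid_size) for n in domain)


def boxes_per_rank(domain, max_grid_size, n_ranks):
    """Return (fewest, most) boxes held by any rank when boxes are
    spread as evenly as possible across n_ranks."""
    total = count_boxes(domain, max_grid_size)
    base, extra = divmod(total, n_ranks)
    return base, base + (1 if extra else 0)


if __name__ == "__main__":
    domain = (512, 512, 512)
    n_ranks = 64
    for mgs in (32, 64, 128):
        total = count_boxes(domain, mgs)
        fewest, most = boxes_per_rank(domain, mgs, n_ranks)
        print(f"max_grid_size={mgs}: {total} boxes, "
              f"{fewest}-{most} boxes per rank on {n_ranks} ranks")
```

If the fewest-boxes value comes out as 0, some ranks would sit idle under this model, which suggests lowering max_grid_size or requesting fewer ranks.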
Scaling guidelines
| Domain size | Recommended ranks | max_grid_size |
|---|---|---|
| 128^3 | 1-4 | 64 |
| 256^3 | 4-16 | 64 |
| 512^3 | 16-64 | 64 |
| 1024^3 | 64-256 | 128 |
Memory estimates
Approximate memory per voxel for a tortuosity solve:
- Phase data: ~4 bytes/voxel (int32)
- Solution field: ~8 bytes/voxel (float64)
- HYPRE matrix: ~56 bytes/voxel (7-point stencil, 7 coefficients × 8 bytes)
- Total: ~70 bytes/voxel
For a 512^3 domain (about 1.34 × 10^8 voxels) on 64 ranks, this is roughly 9.4 GB in total, or ~140 MB per rank.
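The same arithmetic can be scripted to size a job for other domains and rank counts. This is a rough sketch that uses only the per-voxel figures listed above and assumes voxels are split evenly across ranks. The function name `memory_per_rank_bytes` is hypothetical, and ghost cells, solver workspace, and I/O buffers are not counted, so treat the result as a lower bound.

```python
# Per-voxel contributions taken from the estimates above (bytes).
BYTES_PER_VOXEL = {
    "phase data (int32)": 4,
    "solution field (float64)": 8,
    "HYPRE matrix (7-point stencil)": 56,
}

# The components sum to 68; the text rounds this up to ~70 bytes/voxel.
ROUNDED_BYTES_PER_VOXEL = 70


def memory_per_rank_bytes(domain, n_ranks, bytes_per_voxel=ROUNDED_BYTES_PER_VOXEL):
    """Estimated bytes per rank, assuming voxels are split evenly across ranks."""
    nx, ny, nz = domain
    return nx * ny * nz * bytes_per_voxel / n_ranks


if __name__ == "__main__":
    per_rank = memory_per_rank_bytes((512, 512, 512), 64)
    print(f"{per_rank / 2**20:.0f} MiB per rank")
```

Comparing the result against the memory available per core on the target partition gives a quick check on whether a given rank count is enough.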