HPC Usage

OpenImpala is designed for distributed-memory parallelism via MPI, making it suitable for large-scale simulations on HPC clusters.

Running with MPI

Python

# Install mpi4py
pip install openimpala mpi4py

# Run on 4 MPI ranks
mpirun -np 4 python my_script.py

C++ executable

mpirun -np 16 ./Diffusion3d inputs

Apptainer on a cluster

mpirun -np 16 apptainer exec openimpala-v4.0.0.sif /opt/OpenImpala/build/Diffusion3d inputs

SLURM batch script

#!/bin/bash
#SBATCH --job-name=openimpala
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --time=02:00:00
#SBATCH --partition=compute

module load mpi

srun apptainer exec openimpala-v4.0.0.sif \
    /opt/OpenImpala/build/Diffusion3d inputs

Domain decomposition

AMReX decomposes the 3D domain into boxes distributed across MPI ranks. The max_grid_size parameter controls the maximum box size:

amr.max_grid_size = 64
  • Smaller values create more boxes, improving load balance across many ranks

  • Larger values reduce inter-rank communication but may cause load imbalance

  • Choose a power of 2 that evenly divides your domain dimensions

Scaling guidelines

Domain size

Recommended ranks

max_grid_size

128^3

1-4

64

256^3

4-16

64

512^3

16-64

64

1024^3

64-256

128

Memory estimates

Approximate memory per rank for a tortuosity solve:

  • Phase data: ~4 bytes/voxel (int32)

  • Solution field: ~8 bytes/voxel (float64)

  • HYPRE matrix: ~56 bytes/voxel (7-point stencil)

  • Total: ~70 bytes/voxel

For a 512^3 domain on 64 ranks: ~140 MB per rank.