*************************
OpenMP Poisson Relaxation
*************************
Let's extend our 2d implementation of the Poisson solver to work with
OpenMP. Most of the work is in the smoothing / relaxation routine, so
we will tackle that first.
We did Gauss-Seidel relaxation -- this means that we use the new
values of :math:`\phi_{i,j}` as soon as they become available. If we
parallelize the loops using OpenMP, then there is a risk that while we
are updating a point on one thread another thread will be reading it
to use it in its update. This is a *race condition*. We need to
avoid these.
If we look at the 2-d update, it has the form:
.. math::
\phi_{i,j} = \frac{1}{2} \frac{\Delta x^2 \Delta y^2}{\Delta x^2 + \Delta y^2}
\left [ \frac{1}{\Delta x^2} (\phi_{i-1,j} + \phi_{i+1,j} ) +
\frac{1}{\Delta y^2} (\phi_{i,j-1} + \phi_{i,j+1} ) - f_{i,j} \right ]
If :math:`\Delta x = \Delta y`, then this simplifies to:
.. math::
\phi_{i,j} = \frac{1}{4}
\left [ \phi_{i-1,j} + \phi_{i+1,j} +
\phi_{i,j-1} + \phi_{i,j+1} - \Delta x^2 f_{i,j} \right ]
We see that the update of :math:`\phi_{i,j}` depends only on the
values above, below, and to the left and right. We can visualize this
as a checkerboard:
.. figure:: rb.png
:align: center
The update of the red points depends only on the black points and vice
versa. This means that we can split the update into two separate
passed -- first updating the red points and then the black. This is
called *red-black Gauss-Seidel*.
Here's what this code looks like:
.. code:: c++
double fac = dx * dx * dy * dy / (dx * dx + dy * dy);
double dx2i = 1.0 / (dx * dx);
double dy2i = 1.0 / (dy * dy);
for (int n = 0; n < n_smooth; ++n) {
#pragma omp parallel
{
#pragma omp for
for (int i = 1; i < nx-1; ++i) {
int offset = i % 2;
for (int j = 1 + offset; j < ny-1; j += 2) {
phi(i,j) = 0.5 * fac * (dx2i * (phi(i-1,j) + phi(i+1,j)) +
dy2i * (phi(i,j-1) + phi(i,j+1)) - f(i,j));
}
}
#pragma omp for
for (int i = 1; i < nx-1; ++i) {
int offset = (i + 1) % 2;
for (int j = 1 + offset; j < ny-1; j += 2) {
phi(i,j) = 0.5 * fac * (dx2i * (phi(i-1,j) + phi(i+1,j)) +
dy2i * (phi(i,j-1) + phi(i,j+1)) - f(i,j));
}
}
} // openmp
}
We have a single parallel region and then 2 separate parallel for
loops. There is an implicit "wait" after each loop, so we ensure that
all threads reach the end before going onto the next loop.
The only other routine that requires a little care is ``norm``, since
we have a reduction there, but we can handle this in the same way we
did previously.
Here's the full implementation:
.. literalinclude:: ../../examples/parallel/openmp/poisson/poisson2d.H
:language: c++
:caption: ``poisson2d.H``
and a driver:
.. literalinclude:: ../../examples/parallel/openmp/poisson/poisson.cpp
:language: c++
:caption: ``poisson.cpp``
We can compile this as:
.. prompt:: bash
g++ -fopenmp -o poisson poisson.cpp
On my computer (with 8 cores) running with :math:`128^2` gives:
.. table::
:align: center
=================== ========
``OMP_NUM_THREADS`` time (s)
=================== ========
1 20.710
2 10.421
4 5.709
8 2.951
=================== ========