Commit b5104374 authored by Jony Castagna

Merge branch 'patch-5' into 'dl_meso_dpd_MultiGPU'

Update readme.rst

See merge request castagna/E-CAM-Library!4
parents 3b0d81f3 25d5ae1b
@@ -3,7 +3,7 @@ image: cloudcompass/docker-rtdsphinx
spelling:
script:
- pip3 install codespell
-    - codespell --skip=".git,_static,_build,Diff*,*.patch" --quiet-level=2 --ignore-words-list="adress"
+    - codespell --skip=".git,_static,_build,Diff*,*.patch,*.f90" --quiet-level=2 --ignore-words-list="adress,catalogue"
only:
- master
- merge_requests
@@ -13,7 +13,7 @@ orphans:
# Report all the orphans but ignore the exit code
- find ./ -name "*.rst"|xargs -i grep -H orphan {} || true
# Now handle the error code
-    - if [ $(find ./ -name "*.rst"|xargs -i grep -H orphan {}|wc -l) -gt "1" ]; then $(exit 1); else $(exit 0); fi
+    - if [ $(find ./ -name "*.rst"|xargs -i grep -H orphan {}|wc -l) -gt "2" ]; then $(exit 1); else $(exit 0); fi
only:
- master
@@ -244,4 +244,13 @@ August 2017. The following modules have been produced:
./modules/OpenPathSampling/ops_plumed_wrapper/readme
./modules/OpenPathSampling/ops_s_shooting/readme
The third ESDW for the Classical MD workpackage was held in Turin, Italy in July
2018. The following have been produced as a result:

.. toctree::
    :glob:
    :maxdepth: 1

    ./modules/HTC/decorators/readme
.. _E-CAM: https://www.e-cam2020.eu/
.. In ReStructured Text (ReST) indentation and spacing are very important (it is how ReST knows what to do with your
document). For ReST to understand what you intend and to render it correctly please keep the structure of this
template. Make sure that any time you use ReST syntax (such as for ".. sidebar::" below), it is preceded
and followed by white space (if you see warnings when this file is built then this is a common origin for problems).
.. Firstly, let's add technical info as a sidebar and allow text below to wrap around it. This list is a work in
progress, please help us improve it. We use *definition lists* of ReST_ to make this readable.
.. sidebar:: Software Technical Information
Name
``jobqueue_features``
Language
Python
Licence
`MIT <https://opensource.org/licenses/mit-license>`_
Documentation Tool
In-source documentation
Application Documentation
Not currently available. Example usage provided.
Relevant Training Material
Not currently available.
Software Module Developed by
Adam Włodarczyk (Wrocław Centre of Networking and Supercomputing),
Alan O'Cais (Juelich Supercomputing Centre)
.. In the next line you have the name of how this module will be referenced in the main documentation (which you can
reference, in this case, as ":ref:`example`"). You *MUST* change the reference below from "example" to something
unique otherwise you will cause cross-referencing errors. The reference must come right before the heading for the
reference to work (so don't insert a comment between).
.. _htc:
#######################################
E-CAM High Throughput Computing Library
#######################################
.. Let's add a local table of contents to help people navigate the page
.. contents:: :local:
.. Add an abstract for a *general* audience here. Write a few lines that explains the "helicopter view" of why you are
creating this module. For example, you might say that "This module is a stepping stone to incorporating XXXX effects
into YYYY process, which in turn should allow ZZZZ to be simulated. If successful, this could make it possible to
produce compound AAAA while avoiding expensive process BBBB and CCCC."
E-CAM is interested in the challenge
of bridging timescales. To study molecular dynamics with atomistic detail, timesteps on the order of
a femtosecond must be used. Many problems in biological chemistry, materials science, and other
fields involve events that only spontaneously occur after a millisecond or longer (for example,
biomolecular conformational changes, or nucleation processes). That means that around :math:`10^{12}` time
steps would be needed to see a single millisecond-scale event. This is the problem of "rare
events" in theoretical and computational chemistry.
Modern supercomputers are beginning to make it
possible to obtain trajectories long enough to observe some of these processes, but to fully
characterize a transition with proper statistics, many examples are needed. In order to obtain many
examples the same application must be run many thousands of times with varying inputs. To manage
this kind of computation, a task-scheduling high-throughput computing (HTC) library is needed. The main
elements of such a library are task definition, task scheduling and task execution.
While an HTC workload has traditionally been looked down upon in the HPC
space, the scientific use case for extreme-scale resources exists, and for algorithms that require a
coordinated approach, efficient libraries that implement
this approach are increasingly important in the HPC space. The 5 Petaflop booster technology of `JURECA <http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/JURECA/JURECA_node.html>`_
is an interesting concept in this respect, since the offloading of heavy
computation marries perfectly with the concept outlined here.
Purpose of Module
_________________
.. Keep the helper text below around in your module by just adding ".. " in front of it, which turns it into a comment
This module is the first in a sequence that will form the overall capabilities of the library. In particular, this module
provides a set of decorators that wrap the `Dask-Jobqueue <https://jobqueue.dask.org/en/latest/>`_
Python library, with the aim of lowering the development-time cost of leveraging it for our use cases.
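The decorator pattern itself is easy to illustrate. The sketch below shows the general idea under stated assumptions: the decorator name ``slurm_task`` and its behaviour are invented for illustration and are not necessarily the interface exposed by ``decorators.py``; only ``dask_jobqueue.SLURMCluster`` and ``distributed.Client`` are real APIs.

::

    # Sketch: route calls to a decorated function through a Dask-Jobqueue
    # cluster. `slurm_task` is a hypothetical name, not the library's API.
    from functools import wraps

    from dask_jobqueue import SLURMCluster
    from distributed import Client

    def slurm_task(**cluster_kwargs):
        """Run the decorated function as a task on a SLURM-backed cluster."""
        cluster = SLURMCluster(**cluster_kwargs)
        cluster.scale(1)  # ask for one worker job in this toy setup
        client = Client(cluster)

        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                # submit() returns a future; the caller decides when to block
                return client.submit(func, *args, **kwargs)
            return wrapper
        return decorator

    @slurm_task(cores=1, memory="2GB", queue="batch")
    def square(x):
        return x * x

    print(square(11).result())  # 121, computed inside a SLURM job

The point of the decorator layer is that the last four lines are all a user has to write; cluster creation and task submission are hidden behind the decorator.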
Background Information
______________________
.. Keep the helper text below around in your module by just adding ".. " in front of it, which turns it into a comment
The initial motivation for this library is driven by the ensemble-type calculations that are required in many scientific
fields, and in particular in the materials science domain in which the E-CAM Centre of Excellence operates. The scope
for parallelisation is best contextualised by the `Dask <https://dask.org/>`_ documentation:
    A common approach to parallel execution in user-space is task scheduling. In task scheduling we break our program
    into many medium-sized tasks or units of computation, often a function call on a non-trivial amount of data. We
    represent these tasks as nodes in a graph with edges between nodes if one task depends on data produced by another.
    We call upon a task scheduler to execute this graph in a way that respects these data dependencies and leverages
    parallelism where possible, multiple independent tasks can be run simultaneously.

    Many solutions exist. This is a common approach in parallel execution frameworks. Often task scheduling logic hides
    within other larger frameworks (Luigi, Storm, Spark, IPython Parallel, and so on) and so is often reinvented.

    Dask is a specification that encodes task schedules with minimal incidental complexity using terms common to all
    Python projects, namely dicts, tuples, and callables. Ideally this minimum solution is easy to adopt and understand
    by a broad community.
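For concreteness, a task graph in this specification really is just a dict whose values are either data or tuples of a callable and its argument keys (a minimal runnable example using standard Dask):

::

    # A tiny Dask-style task graph built from dicts, tuples and callables.
    from operator import add, mul

    from dask.threaded import get

    dsk = {
        "x": 1,
        "y": 2,
        "z": (add, "x", "y"),  # z depends on x and y
        "w": (mul, "z", 10),   # w depends on z
    }

    # A scheduler executes the graph, respecting the dependencies.
    print(get(dsk, "w"))  # 30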
While we were attracted by this approach, Dask did not support *task-level* parallelisation (in particular
multi-node tasks). We researched other options (including Celery, PyCOMPSs, IPyParallel and others) and organised a
workshop that explored some of these (see https://www.cecam.org/workshop-0-1650.html for further details).
Building and Testing
____________________
.. Keep the helper text below around in your module by just adding ".. " in front of it, which turns it into a comment
The library is a Python module and can be installed with
::

    python setup.py install
More details about how to install a Python package can be found in, for example, `Install Python packages on the
research computing systems at IU <https://kb.iu.edu/d/acey>`_.
To run the tests for the decorators within the library, you need the ``pytest`` Python package. You can run all the
relevant tests from the ``jobqueue_features`` directory with
::

    pytest tests/test_decorators.py
Examples of usage can be found in the ``examples`` directory.
Source Code
___________
The latest version of the library is available on the `jobqueue_features GitHub repository
<https://github.com/E-CAM/jobqueue_features>`_, the file specific to this module
is `decorators.py <https://github.com/E-CAM/jobqueue_features/blob/master/jobqueue_features/decorators.py>`_.
(The code that was originally created for this module can be seen in the specific commit `4590a0e427112f
<https://gitlab.e-cam2020.eu/adam/jobqueue_features/tree/4590a0e427112fbf51edff6116e34c90e765baf0>`_
which can be found in the original private repository of the code.)
@@ -25,9 +25,10 @@ OPS-based modules
Licence
LGPL, v. 2.1 or later
.. contents:: :local:
-Software module developed by
-    YOUR NAME(S) HERE
+Authors: Alan O'Cais
.. contents:: :local:
This is the template for an E-CAM *module* based on OpenPathSampling (OPS). Several
sections are already pre-filled with the details of OPS. Please fill out the
@@ -66,18 +67,19 @@ reading:
Testing
_______
-Tests in OpenPathSampling use the `nose`_ package.
+Tests in OpenPathSampling use `pytest`_.
.. IF YOUR MODULE IS IN OPS CORE:
.. This module has been included in the OpenPathSampling core. Its tests can
-.. be run by setting up a developer install of OpenPathSampling and running
-.. the command ``nosetests`` from the root directory of the repository.
+.. be run by installing pytest and OPS (with commit ????????, which will be
+.. part of release ??? and later), and running the command ``py.test
+.. --pyargs openpathsampling``.
.. IF YOUR MODULE IS IN A SEPARATE REPOSITORY
.. The tests for this module can be run by downloading its source code,
-.. installing its requirements, and running the command ``nosetests`` from the
+.. installing its requirements, and running the command ``py.test`` from the
.. root directory of the repository.
Examples
@@ -105,5 +107,5 @@ ___________
.. Here are the URL references used
-.. _nose: http://nose.readthedocs.io/en/latest/
+.. _pytest: http://pytest.org/
.. sidebar:: Software Technical Information
Name
-Weigthed Linear Ridge Regression
+Weighted Linear Ridge Regression
Language
C
@@ -19,7 +19,7 @@
Francesco Fracchia
################################
-Weigthed Linear Ridge Regression
+Weighted Linear Ridge Regression
################################
.. contents:: :local:
@@ -44,8 +44,8 @@ together with the partner and typically are to facilitate or improve the scope o
partner. The related code development for the pilot projects are open source (where the licence of the underlying
software allows this) and are described in the modules associated with the pilot projects.
-Extended Software Development Workshops
-=======================================
+Software related to Extended Software Development Workshops
+===========================================================
DL_MESO_DPD
-----------
@@ -61,9 +61,11 @@ The following modules connected to the DL_MESO_DPD code have been produced so fa
./modules/DL_MESO_DPD/dipole_af_dlmeso_dpd/readme
./modules/DL_MESO_DPD/moldip_af_dlmeso_dpd/readme
./modules/DL_MESO_DPD_onGPU/add_gpu_version/readme
./modules/DL_MESO_DPD_onGPU/fftw/readme
./modules/DL_MESO_DPD/check_dlmeso_dpd/readme
./modules/DL_MESO_DPD/tetra_dlmeso_dpd/readme
./modules/DL_MESO_DPD_onGPU/multi_gpu/readme
./modules/DL_MESO_DPD/sionlib_dlmeso_dpd/readme
ESPResSo++
----------
@@ -101,29 +103,64 @@ The following modules connected to the ParaDiS code have been produced so far:
:glob:
:maxdepth: 1
-./modules/paradis_precipitate/paradis_precipitate_GC/readme.rst
-./modules/paradis_precipitate/paradis_precipitate_HPC/readme.rst
+./modules/paradis_precipitate/paradis_precipitate_GC/readme
+./modules/paradis_precipitate/paradis_precipitate_HPC/readme
ESDW Barcelona 2017
-------------------
The first Meso- and Multi-scale ESDW was held in Barcelona, Spain, in July 2017. The following modules have been produced:
GC-AdResS
---------
These modules are connected to the Adaptive Resolution Simulation implementation in GROMACS.
.. toctree::
:glob:
:maxdepth: 1
./modules/DL_MESO_DPD/sionlib_dlmeso_dpd/readme
GC-AdResS
---------
Adaptive Resolution Simulation: Implementation in GROMACS
./modules/GC-AdResS/Abrupt_AdResS/readme
./modules/GC-AdResS/AdResS_RDF/readme
./modules/GC-AdResS/Abrupt_Adress_forcecap/readme
./modules/GC-AdResS/AdResS_TF/readme
.. _ALL_background:
ALL (A Load-balancing Library)
------------------------------
Most modern parallelized (classical) particle simulation programs are based on a spatial decomposition method as an
underlying parallel algorithm: different processors administrate different spatial regions of the simulation domain and
keep track of those particles that are located in their respective region. Processors exchange information
* in order to compute interactions between particles located on different processors
* to exchange particles that have moved to a region administrated by a different processor.
This implies that the workload of a given processor is very much determined by its number of particles, or, more
precisely, by the number of interactions that are to be evaluated within its spatial region.
Certain systems of high physical and practical interest (e.g. condensing fluids) dynamically develop into a state where
the distribution of particles becomes spatially inhomogeneous. Unless special care is being taken, this results in a
substantially inhomogeneous distribution of the processors’ workload. Since the work usually has to be synchronized
between the processors, the runtime is determined by the slowest processor (i.e. the one with highest workload). In the
extreme case, this means that a large fraction of the processors is idle during these waiting times. This problem
becomes particularly severe if one aims at strong scaling, where the number of processors is increased at constant
problem size: every processor administrates smaller and smaller regions, and therefore inhomogeneities become more
and more pronounced. This eventually saturates the scalability of a given problem, already at a processor count
where communication overhead would otherwise still be negligible.
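One standard way to quantify the effect (a textbook definition, not specific to any of the codes named here) is the load-imbalance factor

.. math::

    I = \frac{T_{\max}}{\langle T \rangle}, \qquad \langle T \rangle = \frac{1}{P} \sum_{p=1}^{P} T_p,

where :math:`T_p` is the time processor :math:`p` needs for a step and :math:`P` is the number of processors. Since a synchronized step costs :math:`T_{\max}`, any :math:`I > 1` directly measures wasted capacity.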
The solution to this problem is the inclusion of dynamic load balancing techniques. These methods redistribute the
workload among the processors by lowering the load of the busiest cores and raising the load of the idlest ones.
Fortunately, several successful techniques to put this strategy into practice are already known. Nevertheless, dynamic
load balancing that is both efficient and widely applicable implies highly non-trivial coding work. Therefore it has
not yet been implemented in a number of important codes of the E-CAM community, e.g. DL_Meso, DL_Poly, Espresso,
Espresso++, to name a few. Other codes (e.g. LAMMPS) have implemented somewhat simpler schemes, which however might turn
out to lack sufficient flexibility to accommodate all important cases. Therefore, the ALL library was created in the
context of an Extended Software Development Workshop (ESDW) within E-CAM (see `ALL ESDW event details <https://www.e-cam2020.eu/legacy_event/extended-software-development-workshop-for-atomistic-meso-and-multiscale-methods-on-hpc-systems/>`_),
where code developers of CECAM community codes were invited, together with E-CAM postdocs, to work on the
implementation of load balancing strategies. The goal of this activity was to increase the scalability of these
applications to a larger number of cores on HPC systems, for spatially inhomogeneous systems, and thus to reduce the
time-to-solution of the applications.
.. toctree::
:glob:
:maxdepth: 1
-./modules/GC-AdResS/Abrupt_AdResS/readme.rst
-./modules/GC-AdResS/Abrupt_AdResS/abrupt_adress.rst
+./modules/ALL_library/tensor_method/readme
.. In ReStructured Text (ReST) indentation and spacing are very important (it is how ReST knows what to do with your
document). For ReST to understand what you intend and to render it correctly please keep the structure of this
template. Make sure that any time you use ReST syntax (such as for ".. sidebar::" below), it is preceded
and followed by white space (if you see warnings when this file is built then this is a common origin for problems).
.. We allow the template to be standalone, so that the library maintainers add it in the right place
:orphan:
.. Firstly, let's add technical info as a sidebar and allow text below to wrap around it. This list is a work in
progress, please help us improve it. We use *definition lists* of ReST_ to make this readable.
.. sidebar:: Software Technical Information
Name
A Load Balancing Library (ALL)
Language
C++, Fortran interfaces available
Licence
`BSD 3-Clause <https://choosealicense.com/licenses/bsd-3-clause/>`_
Documentation Tool
No tool used in source code, repo documentation written in `Markdown <https://en.wikipedia.org/wiki/Markdown>`_
Application Documentation
See `ALL repository <https://gitlab.version.fz-juelich.de/SLMS/loadbalancing>`_
Relevant Training Material
None available
Software Module Developed by
Rene Halver
.. In the next line you have the name of how this module will be referenced in the main documentation (which you can
reference, in this case, as ":ref:`ALL_example`"). You *MUST* change the reference below from "ALL_method_example"
to something unique otherwise you will cause cross-referencing errors. The reference must come right before the
heading for the reference to work (so don't insert a comment between).
.. _ALL_method_example:
###############################
E-CAM example ALL method module
###############################
.. Let's add a local table of contents to help people navigate the page
.. contents:: :local:
.. Add an abstract for a *general* audience here. Write a few lines that explain the "helicopter view" of why this
module was created.
The ALL library (A Load Balancing Library) aims to provide an easy way to include dynamic domain-based load balancing
in particle-based simulation codes. The library is developed in the Simulation Laboratory Molecular Systems of the
Juelich Supercomputing Centre at Forschungszentrum Juelich.
Purpose of Module
_________________
.. Keep the helper text below around in your module by just adding ".. " in front of it, which turns it into a comment
This module provides an additional method for the ALL library; up-to-date descriptions of the methods in the library can
be found in the `ALL README file <https://gitlab.version.fz-juelich.de/SLMS/loadbalancing/blob/master/README.md>`_.
**Take the method description from the README and reproduce it here**
Background Information
______________________
.. Keep the helper text below around in your module by just adding ".. " in front of it, which turns it into a comment
See :ref:`ALL_background` for details.
Building and Testing
____________________
.. Keep the helper text below around in your module by just adding ".. " in front of it, which turns it into a comment
ALL uses the `CMake <https://cmake.org/runningcmake/>`_ build system; specific build and installation requirements can
be found in the `ALL README file <https://gitlab.version.fz-juelich.de/SLMS/loadbalancing/blob/master/README.md>`_.
**Need to provide information on how to test the particular method here.**
Source Code
___________
.. Notice the syntax of a URL reference below `Text <URL>`_ the backticks matter!
**Here link the source code *that is relevant for the module*. If you are using Github or GitLab and the `Gitflow
Workflow <https://www.atlassian.com/git/tutorials/comparing-workflows#gitflow-workflow>`_ you can point to your feature
branch. Linking to your pull/merge requests is even better. Otherwise you can link to the explicit commits or locations
in the source code.**
.. _ReST: http://www.sphinx-doc.org/en/stable/rest.html
.. _Sphinx: http://www.sphinx-doc.org/en/stable/markup/index.html
.. In ReStructured Text (ReST) indentation and spacing are very important (it is how ReST knows what to do with your
document). For ReST to understand what you intend and to render it correctly please keep the structure of this
template. Make sure that any time you use ReST syntax (such as for ".. sidebar::" below), it is preceded
and followed by white space (if you see warnings when this file is built then this is a common origin for problems).
.. We allow the template to be standalone, so that the library maintainers add it in the right place
.. Firstly, let's add technical info as a sidebar and allow text below to wrap around it. This list is a work in
progress, please help us improve it. We use *definition lists* of ReST_ to make this readable.
.. sidebar:: Software Technical Information
Name
A Load Balancing Library (ALL)
Language
C++, Fortran interfaces available
Licence
`BSD 3-Clause <https://choosealicense.com/licenses/bsd-3-clause/>`_
Documentation Tool
No tool used in source code, repo documentation written in `Markdown <https://en.wikipedia.org/wiki/Markdown>`_
Application Documentation
See `ALL repository <https://gitlab.version.fz-juelich.de/SLMS/loadbalancing>`_
Relevant Training Material
None available
Software Module Developed by
Rene Halver
.. In the next line you have the name of how this module will be referenced in the main documentation (which you can
reference, in this case, as ":ref:`ALL_example`"). You *MUST* change the reference below from "ALL_method_example"
to something unique otherwise you will cause cross-referencing errors. The reference must come right before the
heading for the reference to work (so don't insert a comment between).
.. _ALL_tensor_method:
#########################
ALL Tensor-Product method
#########################
.. Let's add a local table of contents to help people navigate the page
.. contents:: :local:
.. Add an abstract for a *general* audience here. Write a few lines that explain the "helicopter view" of why this
module was created.
The ALL library (A Load Balancing Library) aims to provide an easy way to include dynamic domain-based load balancing
in particle-based simulation codes. The library is developed in the Simulation Laboratory Molecular Systems of the
Juelich Supercomputing Centre at Forschungszentrum Juelich.
Purpose of Module
_________________
.. Keep the helper text below around in your module by just adding ".. " in front of it, which turns it into a comment
This module provides an additional method for the ALL library; up-to-date descriptions of the methods in the library can
be found in the `ALL README file <https://gitlab.version.fz-juelich.de/SLMS/loadbalancing/blob/master/README.md>`_.
For the Tensor-Product method, the work on all processes is reduced (summed) over the Cartesian planes of the system. This work
is then equalized by adjusting the borders of the Cartesian planes.
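A toy one-dimensional sketch of the strategy is given below (an illustration of the general idea only, not the ALL implementation or its API; the function name and the one-shot global redistribution are both simplifications):

::

    # Schematic 1D tensor-product balancing: move slab (plane) borders so
    # that the accumulated work per slab evens out. Illustrative only.
    import numpy as np

    def balance_slab_borders(borders, work):
        """Return new borders giving each slab an equal share of the work."""
        # Cumulative work as a function of position (piecewise linear).
        cum = np.concatenate([[0.0], np.cumsum(work)])
        target = np.linspace(0.0, cum[-1], len(borders))
        # Invert the cumulative-work curve at the equal-share targets.
        return np.interp(target, cum, borders)

    # Four slabs on [0, 1]; the third slab carries most of the work.
    borders = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
    work = np.array([1.0, 1.0, 6.0, 2.0])
    print(balance_slab_borders(borders, work))
    # -> approximately [0, 0.521, 0.625, 0.729, 1]: the overloaded slab shrinks

In the tensor-product setting this kind of adjustment is applied along each Cartesian direction in turn; see the ALL README for the authoritative description of how the redistribution is actually performed.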
Background Information
______________________
.. Keep the helper text below around in your module by just adding ".. " in front of it, which turns it into a comment
See :ref:`ALL_background` for details.
Building and Testing
____________________
.. Keep the helper text below around in your module by just adding ".. " in front of it, which turns it into a comment
ALL uses the `CMake <https://cmake.org/runningcmake/>`_ build system; specific build and installation requirements can
be found in the `ALL README file <https://gitlab.version.fz-juelich.de/SLMS/loadbalancing/blob/master/README.md>`_.
**Need to provide information on how to test the particular method here.**
Source Code
___________
.. Notice the syntax of a URL reference below `Text <URL>`_ the backticks matter!
**Here link the source code *that is relevant for the module*. If you are using Github or GitLab and the `Gitflow
Workflow <https://www.atlassian.com/git/tutorials/comparing-workflows#gitflow-workflow>`_ you can point to your feature
branch. Linking to your pull/merge requests is even better. Otherwise you can link to the explicit commits or locations
in the source code.**
.. _ReST: http://www.sphinx-doc.org/en/stable/rest.html
.. _Sphinx: http://www.sphinx-doc.org/en/stable/markup/index.html
.. In ReStructured Text (ReST) indentation and spacing are very important (it is how ReST knows what to do with your
document). For ReST to understand what you intend and to render it correctly please keep the structure of this
template. Make sure that any time you use ReST syntax (such as for ".. sidebar::" below), it is preceded
and followed by white space (if you see warnings when this file is built then this is a common origin for problems).
.. Firstly, let's add technical info as a sidebar and allow text below to wrap around it. This list is a work in
progress, please help us improve it. We use *definition lists* of ReST_ to make this readable.
.. sidebar:: Software Technical Information
Name
DL_MESO (DPD).
Language
Fortran, CUDA-C.
Licence
`BSD <https://opensource.org/licenses/BSD-2-Clause>`_
Documentation Tool
ReST files
Application Documentation
See the `DL_MESO Manual <http://www.scd.stfc.ac.uk/SCD/resources/PDF/USRMAN.pdf>`_
Relevant Training Material
See `DL_MESO webpage <http://www.scd.stfc.ac.uk/SCD/support/40694.aspx>`_
Software Module Developed by
Jony Castagna
.. In the next line you have the name of how this module will be referenced in the main documentation (which you can
reference, in this case, as ":ref:`example`"). You *MUST* change the reference below from "example" to something
unique otherwise you will cause cross-referencing errors. The reference must come right before the heading for the
reference to work (so don't insert a comment between).
.. _dl_meso_dpd_gpu_fftw:
#################################
SPME on DL_MESO_DPD (GPU version)
#################################
.. Let's add a local table of contents to help people navigate the page
.. contents:: :local:
.. Add an abstract for a *general* audience here. Write a few lines that explains the "helicopter view" of why you are
creating this module. For example, you might say that "This module is a stepping stone to incorporating XXXX effects
into YYYY process, which in turn should allow ZZZZ to be simulated. If successful, this could make it possible to
produce compound AAAA while avoiding expensive process BBBB and CCCC."
The electrostatic force calculation usually represents the main computational cost in systems where even a small number of charged particles is present (>1%).
The Smooth Particle Mesh Ewald [SPME]_ method splits the electrostatic forces into two parts: a short-range part, solved in real space, and a long-range part, solved in Fourier space.
An error weight function combines the two contributions. For the long-range part, the electrical charges are spread on a virtual particle mesh using a B-spline interpolation function.
Porting the full short-range and long-range interactions to GPUs allowed us to achieve a speedup factor of 4x when compared to a traditional 12-core Intel CPU.
One of the main applications that include electrical charges is the simulation of plasma.
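For context, the splitting rests on a textbook identity for the Coulomb kernel, controlled by a screening parameter :math:`\beta`:

.. math::

    \frac{1}{r} = \underbrace{\frac{\operatorname{erfc}(\beta r)}{r}}_{\text{short range, real space}} + \underbrace{\frac{\operatorname{erf}(\beta r)}{r}}_{\text{long range, Fourier space}}

The first term decays rapidly and is summed directly; the second is smooth everywhere and is the part evaluated on the B-spline particle mesh.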
Purpose of Module
_________________
.. Keep the helper text below around in your module by just adding ".. " in front of it, which turns it into a comment
The Ewald summation method scales with :math:`N^{1.5}` at best, where :math:`N` is the number of charged particles. The SPME method allows for improved scaling, :math:`N \log(N)`,
but requires a stencil domain decomposition (i.e. decomposing the domain along one direction only) to allow the FFTW library to scale beyond a single core.
If this is not used, as in the current master version of DL\_MESO\_DPD, FFTW rapidly becomes a bottleneck for scaling across several nodes.
On the other hand, porting to a single GPU does not need domain decomposition, and the same speedup factor (4x compared to a 12-core Intel CPU) is maintained.
Background Information
______________________
.. Keep the helper text below around in your module by just adding ".. " in front of it, which turns it into a comment
This module is part of the DL\_MESO\_DPD code. Full support and documentation is available at:
* https://www.scd.stfc.ac.uk/Pages/DL_MESO.aspx
* https://www.scd.stfc.ac.uk/Pages/USRMAN.pdf
To download the DL\_MESO\_DPD code you need to register at https://gitlab.stfc.ac.uk/dl_meso/dl_meso.
Please contact Dr. Michael Seaton at Daresbury Laboratory (STFC) for further details.
Building and Testing
____________________
.. Keep the helper text below around in your module by just adding ".. " in front of it, which turns it into a comment
The DL\_MESO code is developed using git version control. Currently the GPU version is under a branch named ``add_gpu_version``. After downloading the code, check out the GPU branch and look in the ``DPD/gpu_version`` folder, i.e.:

::

    git clone DL_MESO_repository_path
    cd dl_meso
    git checkout add_gpu_version
    cd DPD/gpu_version
    make all

To compile and run the code you need to have the CUDA toolkit installed and a CUDA-enabled GPU device (see http://docs.nvidia.com/cuda/#axzz4ZPtFifjw).
To run the case, compile the code using the ``make all`` command from the ``bin`` directory, copy the ``FIELD`` and ``CONTROL`` files into this directory and run ``./dpd_gpu.exe``.
Source Code
___________
.. Notice the syntax of a URL reference below `Text <URL>`_ the backticks matter!
This module has been merged into the DL\_MESO code. It is composed of the
following commits (you need to be registered as a developer to view them):
* https://gitlab.stfc.ac.uk/dl_meso/dl_meso/commit/34a652fe62cadbac5e8a037b57ee9be64dcf4187
.. [SPME] U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen, "A smooth particle mesh Ewald method", J. Chem. Phys. 103, 8577 (1995)
.. _ReST: http://www.sphinx-doc.org/en/stable/rest.html
.. _Sphinx: http://www.sphinx-doc.org/en/stable/markup/index.html
@@ -25,64 +25,65 @@ Multi-GPU version of DL_MESO_DPD
Authors: Jony Castagna
-This module implements the first version of the DL_MESO_DPD code multi NVidia Graphical Processing Unit (GPU). More details about it can be found in the following sections.
+This module implements the first version of the DL\_MESO\_DPD code with multiple NVidia Graphical Processing Units (GPUs). More details about it can be found in the following sections.
Purpose of Module
_________________
.. Give a brief overview of why the module is/was being created.
-In this module the main framework of a multi-GPU version of the DL_MESO_DPD code has been developed. The exchange of data between GPUs overlaps with the computation of the forces
-for the internal cells of each partition (a domain decomposition approach based on the MPI parallel version of DL_MESO_DPD has been followed).
+In this module the main framework of a multi-GPU version of the DL\_MESO\_DPD code has been developed. The exchange of data between GPUs overlaps with the computation of the forces
+for the internal cells of each partition (a domain decomposition approach based on the MPI parallel version of DL\_MESO\_DPD has been followed).
The current implementation is a proof of concept only and relies on slow transfers of data from the GPU to the host and vice versa. Faster implementations will be explored in future modules.
-In particular, the transfer of data occurs in3 steps: x-y planes first, x-z planes with halo data (i.e. the values which will fill the ghost cells) from
-previous swapand finally the y-z planes with all halos. This avoid the problems of the corner cells, which usually requires a separatecommunication
-reducing the number of send/receive calls from 14 to 6.The multi-GPU version has been currently tested with 8 GPUs and successfully reproduced same results of a
-singleGPU within machine accuracy resolution.
+In particular, the transfer of data occurs in 3 steps: x-y planes first, x-z planes with halo data (i.e. the values which will fill the ghost cells) from
+the previous swap, and finally the y-z planes with all halos. This avoids the problem of the corner cells, which would usually require a separate communication,
+reducing the number of send/receive calls from 14 to 6. The multi-GPU version has currently been tested with 8 GPUs and successfully reproduces the same results as a
+single GPU within machine accuracy.
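The plane-by-plane ordering is easy to sketch. The following is a schematic mpi4py illustration of the idea (host-side and simplified; the real module does this for the GPU partitions of DL\_MESO\_DPD, which this sketch does not attempt to reproduce):

::

    # Schematic 3-step halo exchange on a 3D Cartesian grid. Exchanging
    # full planes that already contain previously received halo data makes
    # separate corner/edge messages unnecessary (6 exchanges instead of 14).
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    cart = comm.Create_cart(dims=MPI.Compute_dims(comm.Get_size(), 3),
                            periods=[True, True, True])

    n = 8  # interior cells per direction; one ghost layer on each side
    field = np.zeros((n + 2, n + 2, n + 2))
    field[1:-1, 1:-1, 1:-1] = cart.Get_rank()

    for axis in (2, 1, 0):  # x-y planes first, then x-z, then y-z
        lo_src, lo_dst = cart.Shift(axis, -1)
        hi_src, hi_dst = cart.Shift(axis, +1)
        # Send top interior plane "up", fill the low ghost plane.
        send_hi = np.take(field, -2, axis=axis)
        recv_lo = np.empty_like(send_hi)
        cart.Sendrecv(send_hi, dest=hi_dst, recvbuf=recv_lo, source=hi_src)
        # Send bottom interior plane "down", fill the high ghost plane.
        send_lo = np.take(field, 1, axis=axis)
        recv_hi = np.empty_like(send_lo)
        cart.Sendrecv(send_lo, dest=lo_dst, recvbuf=recv_hi, source=lo_src)
        # Write the received planes into the ghost layers.
        idx_lo, idx_hi = [slice(None)] * 3, [slice(None)] * 3
        idx_lo[axis], idx_hi[axis] = 0, n + 1
        field[tuple(idx_lo)] = recv_lo
        field[tuple(idx_hi)] = recv_hi

Because each later phase sends planes that already include the ghost values received in the earlier phases, corner and edge data arrive without dedicated messages.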
Future plans include benchmarking of the code with different data transfer implementations other than the current (trivial) GPU-host-GPU transfer mechanism.
-These are: of Peer To Peer communication within a node, CUDA-aware MPI and CUDA-aware MPI with Direct Remote Memory Access (DRMA).
+These are: Peer-to-Peer communication within a node, CUDA-aware MPI, and CUDA-aware MPI with Direct Remote Memory Access (DRMA).
.. references would be nice here...