mpich.git
25 hours agoMerge pull request #3789 from RozanovAnatoliy/mpi_comm_accept master
Pavan Balaji [Sat, 18 May 2019 21:22:19 +0000]
Merge pull request #3789 from RozanovAnatoliy/mpi_comm_accept

MPI_Comm_accept: Set port name to empty string in non-root ranks

Approved-by: Pavan Balaji <balaji@anl.gov>

2 days agoMerge pull request #3768 from jain-surabhi-23/intra_node_allreduce
Yanfei Guo [Fri, 17 May 2019 19:31:21 +0000]
Merge pull request #3768 from jain-surabhi-23/intra_node_allreduce

shm: Implement intra-node Allreduce using release_gather based framework

Approved by: Yanfei Guo <yguo@anl.gov>

2 days agoMPI_Comm_accept: Do not call get_tag in non-root ranks
Anatoliy Rozanov [Thu, 16 May 2019 08:45:28 +0000]
MPI_Comm_accept: Do not call get_tag in non-root ranks

According to the standard, port_name is used only on the root. We should
not call get_tag_from_port in non-root ranks, as port_name may not be
initialized in these ranks.

3 days agoMerge pull request #3783 from hzhou/1905_dtpool_config
Hui Zhou [Thu, 16 May 2019 21:02:15 +0000]
Merge pull request #3783 from hzhou/1905_dtpool_config

testsuite: DTPool configure fixups

3 days agoMerge pull request #3790 from wesbland/pr_template
Ken Raffenetti [Thu, 16 May 2019 20:26:35 +0000]
Merge pull request #3790 from wesbland/pr_template

Add GitHub PR Template

3 days agoAdd GitHub PR Template
Wesley Bland [Thu, 16 May 2019 14:32:46 +0000]
Add GitHub PR Template

Signed-off-by: Pavan Balaji <balaji@anl.gov>
Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
Signed-off-by: Hui Zhou <hzhou321@anl.gov>

3 days agotestsuite: remove extra `export dtp_args`
Hui Zhou [Thu, 16 May 2019 15:45:07 +0000]
testsuite: remove extra `export dtp_args`

Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>

3 days agotestsuite: fix tests to use MTestDefaultMaxBufferSize()
Hui Zhou [Thu, 16 May 2019 00:57:29 +0000]
testsuite: fix tests to use MTestDefaultMaxBufferSize()

Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>

3 days agotestsuite: add MTestDefaultMaxBufferSize() in mtest
Hui Zhou [Thu, 16 May 2019 00:55:18 +0000]
testsuite: add MTestDefaultMaxBufferSize() in mtest

It checks environment variable `MPITEST_MAXBUFFER`. Default maximum size
is 2GB.

Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>
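
A minimal sketch of the lookup this commit message describes, assuming the
variable holds a plain decimal byte count. `parse_max_buffer` and
`default_max_buffer_size` are hypothetical names, and the 2 GB constant is
taken as 2^31 bytes; none of this is the actual mtest code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Assumed fallback: 2 GB taken as 2^31 bytes. */
#define DEFAULT_MAX_BUFFER_SIZE 2147483648LL

/* Parse a decimal byte count. Returns `fallback` when `text` is NULL,
 * empty, malformed, out of range, or not positive. */
long long parse_max_buffer(const char *text, long long fallback)
{
    char *end = NULL;
    long long value;

    assert(fallback > 0);
    if (text == NULL || *text == '\0')
        return fallback;

    errno = 0;
    value = strtoll(text, &end, 10);
    if (errno != 0 || *end != '\0' || value <= 0)
        return fallback;
    return value;
}

/* Hypothetical stand-in for MTestDefaultMaxBufferSize(). */
long long default_max_buffer_size(void)
{
    return parse_max_buffer(getenv("MPITEST_MAXBUFFER"),
                            DEFAULT_MAX_BUFFER_SIZE);
}
```

Falling back on malformed input, rather than aborting, is a design choice of
this sketch only.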

3 days agotestsuite: remove dtp config option - maxbufsize
Hui Zhou [Tue, 14 May 2019 02:04:27 +0000]
testsuite: remove dtp config option - maxbufsize

Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>

3 days agotestsuite: mv dtp tests generation script to maint
Hui Zhou [Tue, 14 May 2019 01:56:30 +0000]
testsuite: mv dtp tests generation script to maint

Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>

3 days agoMerge pull request #3636 from gcongiu/fix-comm-split-type-bug
Giuseppe Congiu [Thu, 16 May 2019 18:57:32 +0000]
Merge pull request #3636 from gcongiu/fix-comm-split-type-bug

comm_split_type: rework node level split

3 days agocomm_split_type: rework node level split
Giuseppe Congiu [Thu, 28 Feb 2019 23:46:22 +0000]
comm_split_type: rework node level split

The current node-level split uses MPID_Allreduce to disseminate hwloc
bitmaps, which are not flat objects. Thus only pointer values are copied
over to other processes, not the pointed-to values.

This patch reworks the logic to get rid of bitmap calculations. Instead
of exchanging the bitmaps the new implementation exchanges the depths of
each subtree root and computes the min across all processes using
MPID_Allreduce. Afterwards, every process finds the corresponding root
at that min depth and exchanges its index with the others using
MPID_Allgather. At this point every process has the list of subtree root
indices and can go through them, counting the number of processes under
each subtree.

Signed-off-by: Pavan Balaji <balaji@anl.gov>
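
The counting logic described in this commit message can be sketched without
MPI by letting plain arrays stand in for the results of the two collectives.
This is an illustration under that assumption; `min_depth` and
`count_per_root` are hypothetical helpers, not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the MIN Allreduce: the smallest subtree-root depth
 * reported by any process. depths[i] is the depth reported by process i. */
int min_depth(const int *depths, size_t nprocs)
{
    assert(nprocs > 0);
    int min = depths[0];
    for (size_t i = 1; i < nprocs; i++)
        if (depths[i] < min)
            min = depths[i];
    return min;
}

/* Stand-in for the step after the Allgather: root_index[i] is the index
 * of the subtree root (at the min depth) that process i sits under.
 * On return, counts[r] holds the number of processes under root r. */
void count_per_root(const int *root_index, size_t nprocs,
                    int *counts, size_t nroots)
{
    for (size_t r = 0; r < nroots; r++)
        counts[r] = 0;
    for (size_t i = 0; i < nprocs; i++) {
        assert(root_index[i] >= 0 && (size_t) root_index[i] < nroots);
        counts[root_index[i]]++;
    }
}
```

Only integers are exchanged in this scheme, so nothing depends on copying
pointer-based objects between processes.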

3 days agoMerge pull request #3788 from hzhou/1905_gen_coll
Hui Zhou [Thu, 16 May 2019 15:28:49 +0000]
Merge pull request #3788 from hzhou/1905_gen_coll

testsuite: use `.` to source scripts

3 days agotestsuite: use . to source scripts
Hui Zhou [Thu, 16 May 2019 00:22:40 +0000]
testsuite: use . to source scripts

`source` in shell scripts is not portable.

Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>

3 days agoMerge pull request #3787 from hzhou/1905_test_cxx
Hui Zhou [Thu, 16 May 2019 00:26:34 +0000]
Merge pull request #3787 from hzhou/1905_test_cxx

testsuite: refactor mtest to share code between c and cxx tests

3 days agotestsuite: preparation to avoid code duplication in cxx
Hui Zhou [Sat, 11 May 2019 17:16:53 +0000]
testsuite: preparation to avoid code duplication in cxx

mtest.cxx differs from its C equivalent enough to warrant a separate
framework. This results in a lot of code duplication. Every time we
add or modify features in mtest, we need to duplicate code in both.

This commit adds `mtest_common.c`, which gets linked to both C and C++
tests and holds common utility code shared between the two
frameworks.

Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>

3 days agowhitespace-checker: modify file filters
Hui Zhou [Wed, 15 May 2019 22:33:55 +0000]
whitespace-checker: modify file filters

GNU indent does not work on C++. Add mpitestcxx.h to the filter.

Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>

3 days agotestsuite: use OBJEXT in Makefile for consistency and portability
Hui Zhou [Wed, 15 May 2019 22:10:25 +0000]
testsuite: use OBJEXT in Makefile for consistency and portability

Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>

3 days agotestsuite: remove non-applicable old comments
Hui Zhou [Wed, 15 May 2019 22:07:55 +0000]
testsuite: remove non-applicable old comments

Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>

4 days agotest: xfail coll algo tests using release_gather for intra-node
Surabhi Jain [Fri, 3 May 2019 14:26:00 +0000]
test: xfail coll algo tests using release_gather for intra-node

The algorithm is expected to fail since izem is not used by default.
This commit is a temporary measure until we decide whether to enable
izem by default or bring izem functionality into OPA/MPL.

Signed-off-by: Yanfei Guo <yguo@anl.gov>

4 days agotest: Add Allreduce tests for release_gather intra node algo
Surabhi Jain [Thu, 25 Apr 2019 18:11:04 +0000]
test: Add Allreduce tests for release_gather intra node algo

Signed-off-by: Yanfei Guo <yguo@anl.gov>

4 days agoposix: Implement shm allreduce using release_gather
Surabhi Jain [Thu, 3 Jan 2019 20:31:56 +0000]
posix: Implement shm allreduce using release_gather

Signed-off-by: Yanfei Guo <yguo@anl.gov>

4 days agoposix: Change release_gather framework for Allreduce
Surabhi Jain [Thu, 3 Jan 2019 20:30:36 +0000]
posix: Change release_gather framework for Allreduce

Make necessary changes to implement intra-node Allreduce using release
and gather building blocks

Signed-off-by: Yanfei Guo <yguo@anl.gov>

4 days agoch4: Add shm only composition for Allreduce
Surabhi Jain [Wed, 2 Jan 2019 20:26:21 +0000]
ch4: Add shm only composition for Allreduce

The newly added composition is chosen when all the ranks in a
communicator are on the same node and release_gather based intra-node
bcast and reduce is not selected for msg sizes larger than a threshold.

Signed-off-by: Yanfei Guo <yguo@anl.gov>

4 days agoch4: Break deadlock while implementing shm allreduce
Surabhi Jain [Fri, 8 Mar 2019 20:34:55 +0000]
ch4: Break deadlock while implementing shm allreduce

Shared memory is created to implement release_gather based shared memory
collectives, and allreduce is called while creating the shared memory,
which results in a deadlock. It is broken by calling the MPI-level
Allreduce when creating a shared memory segment.

Signed-off-by: Yanfei Guo <yguo@anl.gov>

4 days agoMerge pull request #3740 from zhenggb72/tests_fix
Hui Zhou [Wed, 15 May 2019 02:19:30 +0000]
Merge pull request #3740 from zhenggb72/tests_fix

test/configure: Add error checking when test_coll_algos.sh fails

4 days agotest/configure: Add error checking when test_coll_algos.sh fails
Gengbin Zheng [Wed, 10 Apr 2019 19:28:34 +0000]
test/configure: Add error checking when test_coll_algos.sh fails

Signed-off-by: Hui Zhou <hzhou321@anl.gov>

4 days agoMerge pull request #3786 from hzhou/1905_misc_fixups
Hui Zhou [Wed, 15 May 2019 00:50:29 +0000]
Merge pull request #3786 from hzhou/1905_misc_fixups

ucx: add ucp_rkey_destroy in MPIDI_UCX_mpi_win_free_hook

4 days agoucx: add ucp_rkey_destroy in MPIDI_UCX_mpi_win_free_hook
Hui Zhou [Mon, 13 May 2019 00:40:18 +0000]
ucx: add ucp_rkey_destroy in MPIDI_UCX_mpi_win_free_hook

We are supposed to destroy any rkeys allocated with
`ucp_ep_rkey_unpack`. Since ucx internally tries to use `mpool` for
limited allocation, it may not always leak memory, but ucx will warn
about it unless we suppress the warning with UCX_LOG_LEVEL=error, which
we do in our Jenkins tests.

Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>

4 days agoMerge pull request #3785 from hzhou/1905_cvar_fix
Hui Zhou [Tue, 14 May 2019 23:35:02 +0000]
Merge pull request #3785 from hzhou/1905_cvar_fix

maint: perl script use single quote to protect string literal

4 days agomaint: perl script use single quote to protect string literal
Hui Zhou [Sat, 11 May 2019 22:50:40 +0000]
maint: perl script use single quote to protect string literal

Signed-off-by: Pavan Balaji <balaji@mcs.anl.gov>

5 days agoMerge pull request #3782 from pavanbalaji/pr/yaksa-prereq
Pavan Balaji [Tue, 14 May 2019 14:03:01 +0000]
Merge pull request #3782 from pavanbalaji/pr/yaksa-prereq

Datatype prereqs

5 days agodatatypes: remove unnecessary comments.
Pavan Balaji [Mon, 25 Mar 2019 15:30:23 +0000]
datatypes: remove unnecessary comments.

Signed-off-by: Giuseppe Congiu <gcongiu@anl.gov>

5 days agodatatypes: make unpack buffer const clean.
Pavan Balaji [Mon, 25 Mar 2019 15:30:23 +0000]
datatypes: make unpack buffer const clean.

Signed-off-by: Giuseppe Congiu <gcongiu@anl.gov>

5 days agoch3: MPIDI_CH3_EagerNoncontigSend does not need a data_sz argument
Pavan Balaji [Mon, 25 Mar 2019 15:30:22 +0000]
ch3: MPIDI_CH3_EagerNoncontigSend does not need a data_sz argument

Signed-off-by: Giuseppe Congiu <gcongiu@anl.gov>

5 days agoch3: get rid of buf_off variable.
Pavan Balaji [Mon, 25 Mar 2019 15:30:22 +0000]
ch3: get rid of buf_off variable.

We are overcomplicating the copy from one noncontiguous buffer to
another by trying to accommodate for the case where we pack X bytes,
but are unable to unpack those X bytes into an equivalent
noncontiguous buffer.  However, this case should not be possible, in
practice.  Specifically, the pack/unpack routines should always limit
how much data they copy to keep it atomic with respect to an MPI basic
datatype.  In practice, we keep the copies atomic to C datatypes (so
pairtypes are not atomic).  Either way, MPI requires the source and
destination datatype signatures to match, so the granularity of
packing/unpacking should always be the same.  Hence, this offset is
not needed.

Signed-off-by: Giuseppe Congiu <gcongiu@anl.gov>

5 days agoch4/ofi: remove useless variable
Pavan Balaji [Mon, 25 Mar 2019 15:30:20 +0000]
ch4/ofi: remove useless variable

Signed-off-by: Giuseppe Congiu <gcongiu@anl.gov>

5 days agomake CHKLMEM casting clean.
Pavan Balaji [Mon, 25 Mar 2019 15:30:19 +0000]
make CHKLMEM casting clean.

Signed-off-by: Giuseppe Congiu <gcongiu@anl.gov>

9 days agoMerge pull request #3773 from hzhou/1904_add_werror
Hui Zhou [Fri, 10 May 2019 21:06:25 +0000]
Merge pull request #3773 from hzhou/1904_add_werror

warnings: implement warnings test and fix warnings

9 days agoruntests: allow testlist entries with path
Hui Zhou [Thu, 9 May 2019 20:20:19 +0000]
runtests: allow testlist entries with path

Enables entries like `coll/gather_big 8` directly in the top-level
testlist. This is to make it easy for writing custom CI tests.

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

9 days agowarning-fix: gcc-8 -Wnonnull in mpidig_recv.h
Hui Zhou [Thu, 9 May 2019 21:22:41 +0000]
warning-fix: gcc-8 -Wnonnull in mpidig_recv.h

Add comments to clarify code; add assert to suppress warnings.

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

9 days agowarning-fix: icc: volatile meaningless in cast
Hui Zhou [Thu, 9 May 2019 02:13:28 +0000]
warning-fix: icc: volatile meaningless in cast

I agree with icc in this case.

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

9 days agowarnings: icc: enumerated type mixed with int
Hui Zhou [Thu, 9 May 2019 01:47:51 +0000]
warnings: icc: enumerated type mixed with int

Compilers have no way to check an int variable against the enumerated
values. Thus, having a variable of enum type gains no benefit.
If it is ever going to be used as an int, e.g. in arithmetic, bitwise
AND, or bitwise OR, just use int. The name of the variable should
provide sufficient hint.

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

9 days agowarnings: icc - variable may be used before set
Hui Zhou [Thu, 9 May 2019 00:49:57 +0000]
warnings: icc - variable may be used before set

The code is correct, but icc gets confused with the arithmetic logic.

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

9 days agowarnings: only add alignment attribute if necessary
Hui Zhou [Wed, 8 May 2019 16:44:31 +0000]
warnings: only add alignment attribute if necessary

OFI providers may require bigger alignment for `struct iovec iov`, in
which case we need to enforce the alignment. When the natural alignment
is already sufficient, it doesn't seem that we need to *reduce* the
alignment. In fact, `icc` ver 2019 will warn in the case of alignment
reduction: unless the struct is packed (which it is not), the reduction
in alignment will be ignored.

The natural alignment of `struct iovec` is pointer size, 8 in the 64-bit
case. The `MPIDI_OFI_IOVEC_ALIGN` defined in
`src/mpid/ch4/netmod/ofi/ofi_capability_sets.h` is set to 1 in all
cases, which means that in most cases we don't need an additional
alignment override; it would be an alignment reduction.

This patch removes the alignment attribute in case of reduction.

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

9 days agorefactor: remove dead code
Hui Zhou [Wed, 8 May 2019 15:53:13 +0000]
refactor: remove dead code

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

9 days agorefactor: MPIDI_CH3_Pkt_flags_t flags -> int pkt_flags
Hui Zhou [Mon, 6 May 2019 01:22:56 +0000]
refactor: MPIDI_CH3_Pkt_flags_t flags -> int pkt_flags

Use `int` instead of `enum` to avoid `icc` warnings. We are using the
variable as an int (with all the bit operations), so typing it as int
makes sense here. Naming the variable more specifically as `pkt_flags`
makes the code easier to follow.

Signed-off-by: Wesley Bland <wesley.bland@intel.com>
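
To illustrate the reasoning above, here is a small sketch with made-up flag
names (they are not the real `MPIDI_CH3_Pkt_flags_t` values). The enum only
names the individual bits; the combined value lives in an `int`, because an
OR of two flags is generally not one of the enumerators.

```c
#include <assert.h>

/* Hypothetical flag bits, for illustration only. */
enum pkt_flag {
    PKT_FLAG_NONE = 0,
    PKT_FLAG_LOCK = 1 << 0,
    PKT_FLAG_FLUSH = 1 << 1,
    PKT_FLAG_UNLOCK = 1 << 2
};

/* Return pkt_flags with `flag` switched on. */
int pkt_flags_set(int pkt_flags, int flag)
{
    assert(flag != 0);
    return pkt_flags | flag;
}

/* Return pkt_flags with `flag` switched off. */
int pkt_flags_clear(int pkt_flags, int flag)
{
    return pkt_flags & ~flag;
}

/* Return 1 if any bit of `flag` is set in pkt_flags, else 0. */
int pkt_flags_test(int pkt_flags, int flag)
{
    return (pkt_flags & flag) != 0;
}
```

For example, `PKT_FLAG_LOCK | PKT_FLAG_UNLOCK` evaluates to 5, which matches
no enumerator, so an `int` describes the variable more honestly than the enum
type would.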

9 days agowarnings: icc warning #810: cast void * to integer
Hui Zhou [Mon, 6 May 2019 00:35:08 +0000]
warnings: icc warning #810: cast void * to integer

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

9 days agowarnings: misc
Hui Zhou [Sun, 5 May 2019 17:54:31 +0000]
warnings: misc

Mostly -Wformat-truncation, and other misc. The build is warning-clean
for gcc-4, gcc-8, clang-3, clang-8 after this patch.

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

9 days agowarnings: fix gcc-4 -Wmaybe-uninitialized
Hui Zhou [Sun, 5 May 2019 17:50:30 +0000]
warnings: fix gcc-4 -Wmaybe-uninitialized

gcc-4 only. I don't know how to fix this one cleanly, so I simply
did a patch to preset the variables to zero.

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

9 days agowarnings: fix in ch4 ucx
Hui Zhou [Sun, 5 May 2019 17:43:37 +0000]
warnings: fix in ch4 ucx

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

9 days agowarnings: fix in ch3
Hui Zhou [Sun, 5 May 2019 17:41:35 +0000]
warnings: fix in ch3

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

9 days agowarnings: fix in coll iallreduce_tsp_tree_algos.h
Hui Zhou [Sun, 5 May 2019 17:26:13 +0000]
warnings: fix in coll iallreduce_tsp_tree_algos.h

gcc-8 -Wmaybe-uninitialized

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

9 days agowarnings: fix -Wshadow
Hui Zhou [Sun, 28 Apr 2019 21:52:40 +0000]
warnings: fix -Wshadow

`index` and `time` are global functions in libc; avoid such names for
local variables.

I also moved some of the temporary local variable declarations to where
they are used. C99 allows it.

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

9 days agoconfig: add --enable-strict=error
Hui Zhou [Sat, 27 Apr 2019 04:23:31 +0000]
config: add --enable-strict=error

If we are serious about squashing warnings, then we need -Werror tests.

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

10 days agoMerge pull request #3721 from pavanbalaji/pr/dtpools-2.0
Pavan Balaji [Thu, 9 May 2019 13:21:11 +0000]
Merge pull request #3721 from pavanbalaji/pr/dtpools-2.0

DTPools 2.0

10 days agotest/mpi: update tests for the new dtpools framework.
Giuseppe Congiu [Sat, 20 Apr 2019 18:24:34 +0000]
test/mpi: update tests for the new dtpools framework.

Co-Authored-by: Pavan Balaji <balaji@anl.gov>

Signed-off-by: Pavan Balaji <balaji@anl.gov>
Signed-off-by: Wesley Bland <wesley.bland@intel.com>
Signed-off-by: Giuseppe Congiu <gcongiu@anl.gov>

10 days agotest/mpi: revamped model for dtpools
Pavan Balaji [Fri, 12 Apr 2019 21:28:30 +0000]
test/mpi: revamped model for dtpools

This model allows for a much richer set of datatype combinations.
The interface has changed substantially compared with the previous
version.

Signed-off-by: Pavan Balaji <balaji@anl.gov>
Signed-off-by: Wesley Bland <wesley.bland@intel.com>
Signed-off-by: Giuseppe Congiu <gcongiu@anl.gov>

10 days agotest/mpi: improve mtest argument parsing function
Giuseppe Congiu [Sat, 6 Apr 2019 20:04:34 +0000]
test/mpi: improve mtest argument parsing function

Signed-off-by: Pavan Balaji <balaji@anl.gov>
Signed-off-by: Wesley Bland <wesley.bland@intel.com>
Signed-off-by: Giuseppe Congiu <gcongiu@anl.gov>

10 days agotest/mpi: merge rma lock tests into a single source file
Giuseppe Congiu [Sat, 6 Apr 2019 17:46:54 +0000]
test/mpi: merge rma lock tests into a single source file

Co-Authored-by: Pavan Balaji <balaji@anl.gov>

Signed-off-by: Pavan Balaji <balaji@anl.gov>
Signed-off-by: Wesley Bland <wesley.bland@intel.com>
Signed-off-by: Giuseppe Congiu <gcongiu@anl.gov>

10 days agotest/mpi: generate dtpools testlist at configure time
Giuseppe Congiu [Tue, 2 Apr 2019 22:31:54 +0000]
test/mpi: generate dtpools testlist at configure time

Signed-off-by: Pavan Balaji <balaji@anl.gov>
Signed-off-by: Wesley Bland <wesley.bland@intel.com>
Signed-off-by: Giuseppe Congiu <gcongiu@anl.gov>

12 days agoMerge pull request #3628 from gcongiu/fix-node-level-split-in-cmsplit-type
Giuseppe Congiu [Tue, 7 May 2019 15:17:15 +0000]
Merge pull request #3628 from gcongiu/fix-node-level-split-in-cmsplit-type

comm_split_type: fix bug in node level split

12 days agocomm_split_type: fix bug in node level split
Giuseppe Congiu [Fri, 1 Mar 2019 16:14:48 +0000]
comm_split_type: fix bug in node level split

When splitting communicator by subcomm_min_size or min_memsize the
current comm_split_type code does a first split at the network level and
then, if subcomm_min_size is still smaller than the number of processes
in the node, it further splits the subcomm at the node level.

This is done by grouping processes into fine-grained subtrees and then
recursively merging them together until the desired comm size is
reached. The number of processes in each subtree is accounted for by an
array (processes_cpuset[]) which, right now, is statically set to one
in the code, thus assigning one process to every subtree.

This patch fixes this bug by properly incrementing the number of
processes.

Signed-off-by: Pavan Balaji <balaji@anl.gov>

13 days agoMerge pull request #3778 from pavanbalaji/pr/ucx-gather-big
Pavan Balaji [Mon, 6 May 2019 18:42:20 +0000]
Merge pull request #3778 from pavanbalaji/pr/ucx-gather-big

test/mpi: xfail gather_big for UCX

13 days agotest/mpi: xfail gather_big for UCX.
Pavan Balaji [Sun, 5 May 2019 23:27:45 +0000]
test/mpi: xfail gather_big for UCX.

Signed-off-by: Hui Zhou <hzhou321@anl.gov>

2 weeks agoMerge pull request #3776 from jeffhammond/hammond-better-autogen-warnings
Yanfei Guo [Thu, 2 May 2019 15:01:04 +0000]
Merge pull request #3776 from jeffhammond/hammond-better-autogen-warnings

add check for patch command

2 weeks agoadd check for patch command
Jeff Hammond [Wed, 1 May 2019 22:17:19 +0000]
add check for patch command

check for the existence of the Unix patch command, which is used later.
this is a user experience improvement for people who have trouble
decoding error messages.

fixes #3774

Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>
Signed-off-by: Yanfei Guo <yguo@anl.gov>

2 weeks agoMerge pull request #3727 from jain-surabhi-23/topology_aware_trees
Yanfei Guo [Wed, 1 May 2019 20:18:11 +0000]
Merge pull request #3727 from jain-surabhi-23/topology_aware_trees

shm: Intra-node topology aware trees

2 weeks agoshm: Create topology aware intra_node trees
Rashid Kaleem [Wed, 13 Mar 2019 22:17:30 +0000]
shm: Create topology aware intra_node trees

Creates collective specific trees which leverage the memory hierarchy of
a node. On a sample machine with multiple sockets and multiple cores per
socket, where each rank is bound to a core, a rank per socket is chosen
as the leader rank. For a bcast friendly tree, a socket leader interacts
with other socket leaders first, then with the ranks mapped on the
socket. For a reduce friendly tree, a socket leader interacts with ranks
within the same socket, then with other socket leaders. A left skewed
tree is created for bcast and right skewed tree (children are added in
the reverse order) for reduce.
Impact and benefits of these optimizations have been studied in the
Supercomputing 2018 paper - "Framework for scalable intra-node
collective operations using shared memory" by Jain et al.
https://dl.acm.org/citation.cfm?id=3291695

Co-Authored-by: Surabhi Jain <surabhi.jain@intel.com>
Signed-off-by: Yanfei Guo <yguo@anl.gov>
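
As a rough illustration of the child ordering described in this commit
message, the sketch below builds a two-level tree rooted at rank 0. It is a
deliberate simplification and not the patch's algorithm: it assumes the
lowest rank on each socket is that socket's leader, that rank 0 is the root,
and it models only the order of the two child groups (other socket leaders
versus same-socket ranks). All names are hypothetical.

```c
#include <assert.h>

/* Lowest rank bound to `socket`, or -1 if no rank is bound to it. */
int socket_leader(const int *socket_of, int nranks, int socket)
{
    for (int r = 0; r < nranks; r++)
        if (socket_of[r] == socket)
            return r;
    return -1;
}

/* Fill children[] (capacity nranks) with the children of `rank` and
 * return how many there are. socket_of[r] is the socket rank r is bound
 * to. With bcast_order != 0 the root lists other socket leaders first,
 * then its own socket's ranks; with bcast_order == 0 the two groups are
 * listed in the opposite order. */
int tree_children(const int *socket_of, int nranks, int nsockets,
                  int rank, int bcast_order, int *children)
{
    int n = 0;

    assert(rank >= 0 && rank < nranks);
    int my_socket = socket_of[rank];

    /* Ranks that are not socket leaders are leaves. */
    if (socket_leader(socket_of, nranks, my_socket) != rank)
        return 0;

    for (int pass = 0; pass < 2; pass++) {
        int emit_leaders = (pass == 0) == (bcast_order != 0);
        if (emit_leaders) {
            if (rank != 0)
                continue;   /* only the root talks to other leaders */
            for (int s = 0; s < nsockets; s++) {
                int leader = socket_leader(socket_of, nranks, s);
                if (s != my_socket && leader >= 0)
                    children[n++] = leader;
            }
        } else {
            for (int r = 0; r < nranks; r++)
                if (r != rank && socket_of[r] == my_socket)
                    children[n++] = r;
        }
    }
    return n;
}
```

With four ranks bound to sockets `{0, 0, 1, 1}`, the root's children come out
as ranks 2 then 1 in bcast order, and 1 then 2 in reduce order.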

2 weeks agoch4: Avoid circular Bcast calls with shm collectives
Surabhi Jain [Fri, 22 Mar 2019 18:54:00 +0000]
ch4: Avoid circular Bcast calls with shm collectives

Break the strange recursion happening when this Bcast gets called which
triggers the SHM Bcast. Use MPIR_Bcast_impl instead.

Signed-off-by: Yanfei Guo <yguo@anl.gov>

2 weeks agoch4: Avoid using netmod level Bcast from shm allocate
Surabhi Jain [Tue, 19 Mar 2019 22:49:06 +0000]
ch4: Avoid using netmod level Bcast from shm allocate

When SHM collectives and NM based collectives are both
enabled, break the circular calls between calling NM/SHM bcast from shm
allocate and calling shm allocate from SHM (release_gather) based bcast.
Hence, mpi level Bcast is called.

Signed-off-by: Yanfei Guo <yguo@anl.gov>

2 weeks agoconfigure: Add AM_CONDITIONAL for HAVE_HWLOC
Surabhi Jain [Mon, 25 Mar 2019 20:52:18 +0000]
configure: Add AM_CONDITIONAL for HAVE_HWLOC

This is required to selectively exclude hwloc related files from a
Makefile

Signed-off-by: Yanfei Guo <yguo@anl.gov>

2 weeks agohydra: Set env variable if user specified any binding
Surabhi Jain [Wed, 13 Mar 2019 22:15:27 +0000]
hydra: Set env variable if user specified any binding

Signed-off-by: Yanfei Guo <yguo@anl.gov>

2 weeks agoMerge pull request #3767 from hzhou/1904_Wmissing_braces
Hui Zhou [Tue, 30 Apr 2019 21:35:05 +0000]
Merge pull request #3767 from hzhou/1904_Wmissing_braces

warnings: remove zero initializer for static global storage

2 weeks agowarnings: use {} instead of {{0}} in global initializers.
Hui Zhou [Fri, 26 Apr 2019 19:43:43 +0000]
warnings: use {} instead of {{0}} in global initializers.

Signed-off-by: Wesley Bland <wesley.bland@intel.com>

2 weeks agowarnings: remove zero initializer for static global storage
Hui Zhou [Thu, 25 Apr 2019 02:30:11 +0000]
warnings: remove zero initializer for static global storage

Static global storage is automatically initialized to zero by the C
standard since C89, so let's just remove all global 0 initializers.

Previously, these initializers, `{0}`, were changed to `{{0}}` because
compilers complain when the structure contains a nested structure/union.
But these double braces are weird and non-standard, and the compilers
may still give warnings. I just encountered one in
`src/mpid/ch4/netmod/ucx/globals.c`.

Both of these initializers are hacks anyway, relying on the fact that
compilers set missing values to zero. `{0}` arguably has some merit of
readability only because its usage has been widespread; but `{{0}}` is
definitely weird and misleading.

Signed-off-by: Wesley Bland <wesley.bland@intel.com>
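
A small sketch of the language rule this commit relies on: an object with
static storage duration and no initializer starts out zeroed, including
nested structs, unions (their first member), arrays and pointers. The struct
layout here is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Invented layout with a nested struct and union, the case that
 * triggered the `{0}` versus `{{0}}` warnings. */
struct inner {
    int a;
    union {
        int i;
        double d;
    } u;
};

struct global_state {
    struct inner nested;
    int counters[4];
    void *ptr;
};

/* No initializer at all: static storage is zero-initialized. */
static struct global_state state;

/* Check that every member of `state` starts out as zero / NULL. */
void check_zero_initialized(void)
{
    assert(state.nested.a == 0);
    assert(state.nested.u.i == 0);
    assert(state.ptr == NULL);
    for (size_t i = 0; i < sizeof(state.counters) / sizeof(state.counters[0]); i++)
        assert(state.counters[i] == 0);
}
```

This guarantee covers only static and global objects; automatic (local)
variables still need an explicit initializer.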

3 weeks agoMerge pull request #3764 from tarudoodi/algo-tag-fixes-for-consistency
Hui Zhou [Thu, 25 Apr 2019 04:21:20 +0000]
Merge pull request #3764 from tarudoodi/algo-tag-fixes-for-consistency

coll: tag fixes for consistency

3 weeks agocoll: Generate tag in reduce_scatter_block sched function
Akhil Langer [Thu, 11 Oct 2018 16:35:54 +0000]
coll: Generate tag in reduce_scatter_block sched function

Signed-off-by: Hui Zhou <hzhou321@anl.gov>

3 weeks agocoll: Generate tag in reduce_scatter sched function
Akhil Langer [Thu, 11 Oct 2018 16:31:55 +0000]
coll: Generate tag in reduce_scatter sched function

Signed-off-by: Hui Zhou <hzhou321@anl.gov>

3 weeks agocoll: Generate tag in allgatherv sched function
Akhil Langer [Thu, 11 Oct 2018 16:24:46 +0000]
coll: Generate tag in allgatherv sched function

Signed-off-by: Hui Zhou <hzhou321@anl.gov>

3 weeks agocoll: Generate tag in allreduce sched function
Akhil Langer [Wed, 10 Oct 2018 19:34:02 +0000]
coll: Generate tag in allreduce sched function

To be consistent with the rest of the algorithms. In
all the algorithms, we will get the tag in the sched function.

Signed-off-by: Hui Zhou <hzhou321@anl.gov>

3 weeks agoMerge pull request #3728 from raffenet/ch4-pmi2
Ken Raffenetti [Wed, 24 Apr 2019 18:02:17 +0000]
Merge pull request #3728 from raffenet/ch4-pmi2

ch4: Fix PMI includes

3 weeks agoch4: Fix PMI includes
Ken Raffenetti [Mon, 8 Apr 2019 14:44:26 +0000]
ch4: Fix PMI includes

Include the correct PMI header in mpidimpl.h for use in the ch4
device.  Fixes build issues with pmi2/simple and pmix.

Signed-off-by: Yanfei Guo <yguo@anl.gov>

3 weeks agoMerge pull request #3490 from jain-surabhi-23/intra_node_colls
Yanfei Guo [Wed, 24 Apr 2019 16:09:50 +0000]
Merge pull request #3490 from jain-surabhi-23/intra_node_colls

ch4/posix: shared memory based intra-node collectives

3 weeks agotest: xfail coll algo tests using release_gather for intra-node
Yanfei Guo [Tue, 23 Apr 2019 18:03:24 +0000]
test: xfail coll algo tests using release_gather for intra-node

The algorithm is expected to fail since izem is not used by default.
This commit is a temporary measure until we decide whether to enable
izem by default or bring izem functionality into OPA/MPL.

No reviewer.

3 weeks agotest: Add bcast, reduce tests for newly added CVARS
Surabhi Jain [Fri, 7 Sep 2018 22:23:32 +0000]
test: Add bcast, reduce tests for newly added CVARS

Run a few bcast and reduce tests by varying the CVARS for multiple
buffer sizes and type, radix of trees.

Signed-off-by: Yanfei Guo <yguo@anl.gov>

3 weeks agoposix: Implement shm reduce using release, gather
Surabhi Jain [Tue, 12 Jun 2018 16:36:59 +0000]
posix: Implement shm reduce using release, gather

Intra-node reduce is implemented using release step followed by gather
step. Data movement takes place in gather (bottom-up step) in the tree.
Release (top-down) step is used for acknowledgement. Root notifies the
non-roots that the data was reduced and copied out of its reduce buffer.
Hence, children ranks can reuse the reduce buffer for next reduce call.
There is a reduce shm buffer per rank, as each rank contributes data in
reduce. Each buffer is split into multiple cells, so the copying in of
the next chunk by children can be overlapped with reduce and copy out by
the parent rank for the previous cells (pipelining). Large messages are
split into chunks of cell size each and pipelining is used.

Signed-off-by: Yanfei Guo <yguo@anl.gov>

3 weeks agoposix: Implement shm bcast using release, gather
Surabhi Jain [Tue, 31 Jul 2018 18:35:02 +0000]
posix: Implement shm bcast using release, gather

Intra-node bcast is implemented using release step followed by gather
step. Data movement takes place in release (top-down step) in the tree.
Gather (bottom-up step) is used for acknowledgement. Non-roots notify
the root that the data was copied out of shared bcast buffer and root
can reuse the buffer for next bcast call. Bcast buffer is split into
multiple cells, so that the copying in of the next chunk by root can be
overlapped with copying out of previous chunks by non-roots
(pipelining). Large messages are split into chunks of cell size each and
pipelining is used.

Signed-off-by: Yanfei Guo <yguo@anl.gov>
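
The chunking arithmetic behind the pipelining described for shm reduce and
bcast can be sketched as follows. This assumes cells are reused round-robin,
which the commit messages do not state, and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of cell-sized chunks needed for a message (ceiling division). */
size_t num_chunks(size_t msg_size, size_t cell_size)
{
    assert(cell_size > 0);
    return (msg_size + cell_size - 1) / cell_size;
}

/* Length in bytes of chunk number `chunk` (0-based); only the last
 * chunk may be shorter than a full cell. */
size_t chunk_len(size_t msg_size, size_t cell_size, size_t chunk)
{
    assert(cell_size > 0);
    size_t offset = chunk * cell_size;
    assert(offset < msg_size);
    size_t remaining = msg_size - offset;
    return remaining < cell_size ? remaining : cell_size;
}

/* Cell that chunk number `chunk` lands in, assuming round-robin reuse. */
size_t chunk_cell(size_t chunk, size_t num_cells)
{
    assert(num_cells > 0);
    return chunk % num_cells;
}
```

With more than one cell, a writer can fill the cell for chunk `k + 1` while
readers are still copying chunk `k` out of the previous cell, which is the
overlap the messages call pipelining.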

3 weeks agoposix: Set up release, gather based infrastructure
Surabhi Jain [Fri, 31 Aug 2018 20:42:30 +0000]
posix: Set up release, gather based infrastructure

Implement the release and gather building blocks which will be used to
implement intra-node bcast and intra-node reduce. Shared memory is
created per communicator, which is used to place the data to be
broadcast, the data which is to be reduced, and flags to update the
children or parent in the tree. Release is top-down step in tree, while
gather is bottom-up step. A shared limit counter is implemented to track
and limit the amount of shared memory created per node for optimized
intra-node collectives.

Signed-off-by: Yanfei Guo <yguo@anl.gov>

3 weeks agoposix: Reuse POSIX global data structures in fbox
Surabhi Jain [Thu, 13 Dec 2018 21:19:29 +0000]
posix: Reuse POSIX global data structures in fbox

The global data structures can be reused by posix level intra-node
collectives as well

Signed-off-by: Yanfei Guo <yguo@anl.gov>

3 weeks agoposix: Add collective algorithm CVARS at shm level
Surabhi Jain [Fri, 30 Nov 2018 17:49:02 +0000]
posix: Add collective algorithm CVARS at shm level

Give user ability to choose an algorithm for intra-node bcast, reduce.
Also set up infrastructure for posix_coll_init and posix_coll_finalize

Signed-off-by: Yanfei Guo <yguo@anl.gov>

3 weeks agoposix: Fix mpi_errno in posix_init
Surabhi Jain [Thu, 29 Nov 2018 17:15:36 +0000]
posix: Fix mpi_errno in posix_init

Signed-off-by: Yanfei Guo <yguo@anl.gov>

3 weeks agomaint/extracterrmsgs: Allow errflag creation in func
Surabhi Jain [Mon, 17 Dec 2018 21:10:46 +0000]
maint/extracterrmsgs: Allow errflag creation in func

This change allows creating an errflag in a function and propagating it
further. Needed for init and finalize calls which don't have errflag
passed to them.

Signed-off-by: Yanfei Guo <yguo@anl.gov>

3 weeks agoerr: Add new err for noizem
Surabhi Jain [Fri, 30 Nov 2018 22:11:52 +0000]
err: Add new err for noizem

Signed-off-by: Yanfei Guo <yguo@anl.gov>

3 weeks agoconfigure: Add AM_CONDITIONAL for izem_atomic
Surabhi Jain [Tue, 28 Aug 2018 22:09:03 +0000]
configure: Add AM_CONDITIONAL for izem_atomic

Signed-off-by: Yanfei Guo <yguo@anl.gov>

3 weeks agocoll: Namespacing change in MPII Treealgo tree
Surabhi Jain [Thu, 13 Dec 2018 21:42:42 +0000]
coll: Namespacing change in MPII Treealgo tree

Changing the related functions and data structures prefix to MPIR
so that it could be used from device

Signed-off-by: Yanfei Guo <yguo@anl.gov>

3 weeks agocoll: Namespacing change in calculate_chunking_info
Surabhi Jain [Fri, 11 May 2018 18:41:51 +0000]
coll: Namespacing change in calculate_chunking_info

Change MPII to MPIR so that it could be used from device.
Inlining fixes the linking error in fortran tests using gcc in debug mode
when this function is used from posix.

Signed-off-by: Yanfei Guo <yguo@anl.gov>

3 weeks agoMerge pull request #3133 from hajimefu/ofi-am-reordering-pub
Ken Raffenetti [Tue, 23 Apr 2019 18:11:38 +0000]
Merge pull request #3133 from hajimefu/ofi-am-reordering-pub

ch4/ofi: Implement reorder logic in AM transport

3 weeks agoch4/ofi: Implement reorder logic in AM transport
Hajime Fujita [Tue, 8 May 2018 19:15:41 +0000]
ch4/ofi: Implement reorder logic in AM transport

OFI provider does not necessarily guarantee that messages complete
in order when multiple buffers are posted with the FI_MULTI_RECV flag.

Therefore OFI netmod needs to implement its own logic to detect
message order inversion and reorder the messages.

This patch adds a sequence number field to the AM packet header.
Both sender and receiver keep track of the next sequence number
to send/receive. If the receiver detects a leap in the sequence
number, it stores that early-arrived message into a queue for
deferred processing. That message will be processed once all
preceding messages have arrived.

Fixes csr/mpich-ofi#966
Fixes csr/mpich-ofi#992
Fixes csr/mpich-ofi#1078
Fixes csr/mpich-ofi#1084

Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
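
The receiver side of the sequence-number scheme described in this commit
message can be sketched as below. Note the substitution: the message
describes a queue for early arrivals, while this sketch uses a small
fixed-size window indexed by sequence number to stay short. The names, the
`int` payload, and the window size are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define REORDER_WINDOW 8u       /* assumed maximum reordering distance */

struct reorder_state {
    unsigned next_seq;                  /* next sequence number to process */
    int payload[REORDER_WINDOW];        /* parked early arrivals */
    int occupied[REORDER_WINDOW];       /* 1 if the slot holds a message */
};

void reorder_init(struct reorder_state *s)
{
    memset(s, 0, sizeof(*s));
}

/* Handle an arriving message. In-order payloads are written to out[]
 * (capacity REORDER_WINDOW) in processing order. Returns the number of
 * payloads written, 0 if the message was parked for later, or -1 if it
 * is a duplicate, stale, or too far ahead of the window. */
int reorder_receive(struct reorder_state *s, unsigned seq, int payload, int *out)
{
    unsigned ahead = seq - s->next_seq;     /* wraps for stale seq */
    int delivered = 0;

    if (ahead >= REORDER_WINDOW)
        return -1;

    if (ahead != 0) {
        /* Leap detected: park the early message. */
        unsigned slot = seq % REORDER_WINDOW;
        if (s->occupied[slot])
            return -1;
        s->payload[slot] = payload;
        s->occupied[slot] = 1;
        return 0;
    }

    /* Expected message: process it, then drain parked successors. */
    out[delivered++] = payload;
    s->next_seq++;
    for (;;) {
        unsigned slot = s->next_seq % REORDER_WINDOW;
        if (!s->occupied[slot])
            break;
        assert(delivered < (int) REORDER_WINDOW);
        out[delivered++] = s->payload[slot];
        s->occupied[slot] = 0;
        s->next_seq++;
    }
    return delivered;
}
```

If messages 1 and 2 arrive before message 0, the first two calls return 0 and
the third returns 3 with the payloads in sequence order.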