mpich-bgq.git
6 years agoTrac #659: Remove ticket #595 circumvention BGQ/IBM_V1R2M0 master BGQ/IBM_V1R2M0/efix19
Bob Cernohous [Thu, 21 Mar 2013 16:58:55 +0000]
Trac #659: Remove ticket #595 circumvention

6 years ago Issue 9416: Update scatter[v] protocol selection
Bob Cernohous [Mon, 11 Mar 2013 23:17:03 +0000]
 Issue 9416: Update scatter[v] protocol selection

Do not tie scatterv to scatter selection.

6 years agoIssue 9416: Fix M2M protocol selection
Bob Cernohous [Mon, 11 Mar 2013 22:24:14 +0000]
Issue 9416: Fix M2M protocol selection

6 years agoNeed to read some environment variables before creating the client
Bob Cernohous [Mon, 11 Mar 2013 17:16:38 +0000]
Need to read some environment variables before creating the client

6 years agoIssue 9136: Implement a simple PAMID_NUMREQUESTS for async flow control
Bob Cernohous [Thu, 7 Mar 2013 15:23:05 +0000]
Issue 9136: Implement a simple PAMID_NUMREQUESTS for async flow control

6 years agoUse MPIDO and not MPIR internally in PAMID
Bob Cernohous [Thu, 7 Mar 2013 20:21:02 +0000]
Use MPIDO and not MPIR internally in PAMID

6 years agoUpdates from PE Master and review
Bob Cernohous [Thu, 28 Feb 2013 18:16:07 +0000]
Updates from PE Master and review

After review:
-Use PAMI_GEOMETRY_NULL
-Update comments and range checks
-Fix scatter range check
-Comment out [v] range checks

6 years agoIssue 9356: Use new configuration options. BGQ/IBM_V1R2M0/efix15
Bob Cernohous [Thu, 14 Feb 2013 19:37:36 +0000]
Issue 9356: Use new configuration options.
- PAMI_CLIENT_NONCONTIG
- PAMI_CLIENT_MEMORY_OPTIMIZE
- PAMI_GEOMETRY_NONCONTIG
- PAMI_GEOMETRY_MEMORY_OPTIMIZE

PAMID_COLLECTIVES_MEMORY_OPTIMIZED will set the appropriate configuration options.

6 years agoIssue 9208: Change PAMI_COLLECTIVES_MEMORY_OPTIMIZED to PAMID_COLLECTIVES... BGQ/IBM_V1R2M0/efix09
Bob Cernohous [Thu, 7 Feb 2013 16:21:02 +0000]
Issue 9208: Change PAMI_COLLECTIVES_MEMORY_OPTIMIZED to PAMID_COLLECTIVES_MEMORY_OPTIMIZED

6 years agoCPS 92XKPE: Implement MPIX_Cart_comm_create and MPIX_Pset_* functions.
Michael Blocksome [Thu, 3 Jan 2013 21:56:03 +0000]
CPS 92XKPE: Implement MPIX_Cart_comm_create and MPIX_Pset_* functions.

The following MPIX functions were also provided for BG/P mpich:

-> MPIX_Pset_same_comm_create
-> MPIX_Pset_diff_comm_create
-> MPIX_Cart_comm_create

The following MPIX functions are new for BG/Q mpich:

-> MPIX_Pset_same_comm_create_from_parent
-> MPIX_Pset_diff_comm_create_from_parent
-> MPIX_Pset_io_node

See issue 9231 for more information.

6 years agoIssue 9295: Fix optgather flag processing
Bob Cernohous [Thu, 24 Jan 2013 21:36:33 +0000]
Issue 9295: Fix optgather flag processing

6 years agoIssue 9136: Simple support for async flow control metadata
Bob Cernohous [Wed, 23 Jan 2013 23:44:43 +0000]
Issue 9136: Simple support for async flow control metadata

6 years agoTrac #657: Switch RankBased to SequenceBased
Bob Cernohous [Wed, 23 Jan 2013 23:20:23 +0000]
Trac #657: Switch RankBased to SequenceBased

6 years agoIssue 9136: Update MPI_IN_PLACE support after D188059/D188060 fixes
Bob Cernohous [Wed, 23 Jan 2013 23:11:29 +0000]
Issue 9136: Update MPI_IN_PLACE support after D188059/D188060 fixes

6 years agoD188060: Fix BGQ compile errors
Bob Cernohous [Mon, 21 Jan 2013 19:28:22 +0000]
D188060: Fix BGQ compile errors

mpido_allgather.c:497: error: 'snd_data_contig' may be used uninitialized in this function

mpido_scatterv.c:411: error: 'recv_true_lb' may be used uninitialized in this function

Signed-off-by: sssharka <sssharka@us.ibm.com>

6 years agoD188060: 7Z8: multi mpi core with collective selection enabled
sssharka [Fri, 18 Jan 2013 21:53:11 +0000]
D188060: 7Z8: multi mpi core with collective selection enabled
  - Adding support for PAMI_IN_PLACE in collective selection path

Signed-off-by: Bob Cernohous <bobc@us.ibm.com>

6 years agoTrac #636: Allgather(v) optimizations to use allreduce double sum instead of integer...
Sameer Kumar [Wed, 16 Jan 2013 10:09:51 +0000]
Trac #636: Allgather(v) optimizations to use allreduce double sum instead of integer BOR. Only
int32, int64, float and double can take advantage of this optimization.

6 years agoTrac #652: MPI_Allgather glue protocol updates.
Bob Cernohous [Fri, 11 Jan 2013 17:02:44 +0000]
Trac #652: MPI_Allgather glue protocol updates.

6 years agoIssue 9185: Add PAMID_COLLECTIVE_REDUCE=GLUE_ALLREDUCE
Bob Cernohous [Fri, 7 Dec 2012 21:10:56 +0000]
Issue 9185: Add PAMID_COLLECTIVE_REDUCE=GLUE_ALLREDUCE

6 years agoMPICH2 Changeset 10771: Updated Context ID Exhaustion Check
Bob Cernohous [Wed, 19 Dec 2012 17:58:03 +0000]
MPICH2 Changeset 10771: Updated Context ID Exhaustion Check

https://trac.mpich.org/projects/mpich/changeset/10771
https://trac.mpich.org/projects/mpich/ticket/1768

Updated the context ID exhaustion check with a more versatile test that can
detect all of the errors detected by the old scheme, as well as detecting when
context ID allocation will not succeed because there is no common context ID
across the processes performing allocation. This fixes ticket #1768.
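
The "no common context ID" condition described above can be illustrated with a small sketch. It assumes each process tracks its free context IDs as a bitmask and that the masks are combined with a bitwise AND; the function name and the 32-bit mask width are hypothetical and are not taken from the MPICH2 changeset.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given one free-ID bitmask per process, return the
 * lowest context ID that is free on every process, or -1 if there is none.
 * In a real MPI library the AND would be done with a collective reduction;
 * here the masks are simply passed in as an array.
 */
static int lowest_common_context_id(const uint32_t *free_masks, size_t nprocs)
{
    uint32_t common = UINT32_MAX;
    for (size_t i = 0; i < nprocs; ++i)
        common &= free_masks[i];      /* keep only IDs free everywhere */

    if (nprocs == 0 || common == 0)
        return -1;                    /* every process may have free IDs, yet none is shared */

    int id = 0;
    while ((common & 1u) == 0) {      /* find the lowest set bit */
        common >>= 1;
        ++id;
    }
    return id;
}
```

Two processes with masks 0x3 and 0xC each have free IDs, but the AND is zero, which is the case the older check could not detect.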

6 years agoIssue 9208: PAMI_COLLECTIVES_MEMORY_OPTIMIZED does not optimize irregular communicators
Bob Cernohous [Thu, 13 Dec 2012 20:03:40 +0000]
Issue 9208: PAMI_COLLECTIVES_MEMORY_OPTIMIZED does not optimize irregular communicators

6 years agoIssue 8756: Do optimized selection even if first protocol list is empty BGQ/IBM_V1R2M0/GA
Bob Cernohous [Wed, 5 Dec 2012 23:56:02 +0000]
Issue 8756: Do optimized selection even if first protocol list is empty

6 years agoIssue 9159: Remove misleading verbose output
Bob Cernohous [Wed, 5 Dec 2012 21:47:32 +0000]
Issue 9159: Remove misleading verbose output

6 years agoIssue 9159: More thorough check for optimized bcast protocol
Bob Cernohous [Mon, 3 Dec 2012 18:14:34 +0000]
Issue 9159: More thorough check for optimized bcast protocol

6 years agoIssue 9159: Metadata glue fixes
Bob Cernohous [Mon, 3 Dec 2012 17:59:48 +0000]
Issue 9159: Metadata glue fixes

6 years agoIssue 9102: set bgq default eager limits for intra- and inter-node to 4097.
Michael Blocksome [Thu, 29 Nov 2012 19:42:14 +0000]
Issue 9102: set bgq default eager limits for intra- and inter-node to 4097.

6 years agoIssue 9136: Support MPI_IN_PLACE metadata
Bob Cernohous [Tue, 27 Nov 2012 22:56:39 +0000]
Issue 9136: Support MPI_IN_PLACE metadata

6 years agoRevert "Trac #636:Disable optimized allgatherv"
Bob Cernohous [Tue, 27 Nov 2012 21:59:10 +0000]
Revert "Trac #636:Disable optimized allgatherv"

This reverts commit 4aa0587ffd9e6246bdf48723c234d17c1844612c.

6 years agoRevert "Ticket #627: Disable optimized scan's with MPI_IN_PLACE"
Bob Cernohous [Tue, 27 Nov 2012 21:58:51 +0000]
Revert "Ticket #627: Disable optimized scan's with MPI_IN_PLACE"

This reverts commit aebe9f4892d1a2f8d568cebcf4d47270c8bc06bc.

Conflicts:

mpich2/src/mpid/pamid/src/coll/scan/mpido_scan.c

6 years agoRevert "Ticket #632: Disable optimized alltoall[v]'s with MPI_IN_PLACE"
Bob Cernohous [Tue, 27 Nov 2012 21:54:18 +0000]
Revert "Ticket #632: Disable optimized alltoall[v]'s with MPI_IN_PLACE"

This reverts commit bbb81a6003d69df30231d0a69da7dfceb156e4f0.

6 years agoFix a bug in test cases: MPI_Datatype handle should be committed before it is used
Qi QC Zhang [Wed, 28 Nov 2012 14:39:23 +0000]
Fix a bug in test cases: MPI_Datatype handle should be committed before it is used

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoRevert "Add a new function MPIR_Datatype_iscommitted which will be used in ROMIO...
Michael Blocksome [Tue, 27 Nov 2012 21:48:01 +0000]
Revert "Add a new function MPIR_Datatype_iscommitted which will be used in ROMIO layer"

This reverts commit cc44c219f36d2540488cc8794425064773d4d287.

For some unknown reason, this commit has a negative impact on torus
point-to-point latency. This can be safely reverted since the feature
introduced in this commit is not necessary for V1R2M0.

6 years agoIssue 8597: Fix double allreduce when there is no cached protocol
Bob Cernohous [Tue, 27 Nov 2012 17:17:35 +0000]
Issue 8597: Fix double allreduce when there is no cached protocol

6 years agoIssue 8597: Use optimized protocol for more supported dt/op combinations
Bob Cernohous [Mon, 26 Nov 2012 22:18:12 +0000]
Issue 8597: Use optimized protocol for more supported dt/op combinations

6 years agoIssue 8597: Reformat mpido_allreduce
Bob Cernohous [Mon, 26 Nov 2012 21:55:44 +0000]
Issue 8597: Reformat mpido_allreduce

6 years agoIssue 8756: Glue updates for metadata changes
Bob Cernohous [Mon, 26 Nov 2012 20:39:32 +0000]
Issue 8756: Glue updates for metadata changes

6 years agoIssue 8783: Update PAMID with new bcast metadata changes
Bob Cernohous [Wed, 21 Nov 2012 20:22:32 +0000]
Issue 8783: Update PAMID with new bcast metadata changes

6 years agoIssue 8178: Updated fix for ANL ticket 1645, reduces memory usage of that fix
Bob Cernohous [Mon, 15 Oct 2012 00:36:47 +0000]
Issue 8178: Updated fix for ANL ticket 1645, reduces memory usage of that fix

Decrease the new pool limit from
      - 8192K (1024 elements/block x 8192 blocks)
    to
      - 2048K (256 elements/block x 8192 blocks)

Note before ticket 1645 (in V1R1M2) the pool limit was
      - 256K (256 elements/block x 1024 blocks)

So this is an increase over V1R1M2 but a decrease from other (v1.5) MPICH
platforms.

I intentionally did not increase the elements/block at this time due to an
earlier attempt's effect on gpaw -- an increment of 1024 new communicators
in a pool was much too large.

So although the total pool limit is increased (over V1R1M2) it will still only
grow by 256 elements at a time (same as V1R1M2).

While this limits the maximum pool size, we're unlikely on BGQ to support such
large pools with our memory limitations.
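
The pool limits quoted above are the product of elements per block and block count. A minimal sketch of that arithmetic follows; the function name is hypothetical.

```c
/*
 * Hypothetical helper: total pool limit in elements.
 *   1024 elements/block x 8192 blocks = 8192K
 *    256 elements/block x 8192 blocks = 2048K
 *    256 elements/block x 1024 blocks =  256K (V1R1M2)
 */
static unsigned long pool_limit(unsigned long elements_per_block,
                                unsigned long blocks)
{
    return elements_per_block * blocks;
}
```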

6 years agoAdd missing copyright.
Michael Blocksome [Tue, 20 Nov 2012 17:44:20 +0000]
Add missing copyright.

6 years agoAdd a new function MPIR_Datatype_iscommitted which will be used in ROMIO layer
Qi QC Zhang [Mon, 19 Nov 2012 06:52:54 +0000]
Add a new function MPIR_Datatype_iscommitted which will be used in ROMIO layer
to check whether a given datatype is committed.

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoAdd hooks for optimized MPIX_* non-blocking collectives.
Michael Blocksome [Mon, 5 Nov 2012 18:57:43 +0000]
Add hooks for optimized MPIX_* non-blocking collectives.

The following collectives are updated:

  MPIX_Ibcast
  MPIX_Iallgather
  MPIX_Iallgatherv
  MPIX_Iallreduce
  MPIX_Ialltoall
  MPIX_Ialltoallv
  MPIX_Ialltoallw
  MPIX_Iexscan
  MPIX_Igather
  MPIX_Igatherv
  MPIX_Ireduce_scatter_block
  MPIX_Ireduce_scatter
  MPIX_Ireduce
  MPIX_Iscan
  MPIX_Iscatter
  MPIX_Iscatterv

Signed-off-by: Charles Archer <archerc@us.ibm.com>

6 years agoAdd missing Iallreduce function pointer.
Michael Blocksome [Fri, 2 Nov 2012 19:49:51 +0000]
Add missing Iallreduce function pointer.

Signed-off-by: Charles Archer <archerc@us.ibm.com>

6 years agoEnable optimized MPIX_Ibarrier.
Michael Blocksome [Thu, 1 Nov 2012 18:58:13 +0000]
Enable optimized MPIX_Ibarrier.

The previous MPIR_Ibarrier_impl() function forced all adi implementations
to create a MPID_Sched_t opaque object which was then passed in to the
specific ibarrier implementation via a function pointer table.

The MPID_Sched_t object represents a completely new state machine that
must be advanced whenever mpi progress is made.

The required construction of the MPID_Sched_t object and the required
advance of the schedule state machine would be extremely detrimental to
pamid performance.

Signed-off-by: Charles Archer <archerc@us.ibm.com>

6 years agoComment out bug in the nbc schedule progress code.
Michael Blocksome [Fri, 2 Nov 2012 18:49:59 +0000]
Comment out bug in the nbc schedule progress code.

I'm not sure why, but this line of code causes the MPID_Request 'kind'
field to be set to zero, which is an invalid value and trips an assert in
MPI_Wait().

Need to discuss this with ANL.

Signed-off-by: Charles Archer <archerc@us.ibm.com>

6 years agoEnable MPIR_* non-blocking collectives implementation
Michael Blocksome [Fri, 2 Nov 2012 18:45:11 +0000]
Enable MPIR_* non-blocking collectives implementation

If the environment variable 'PAMID_MPIR_NBC' is set to non-zero then a
pami work function is posted to context 0 which will invoke the schedule
progress function.

By default, MPIR_* non-blocking collectives are disabled in order to
avoid impacting the performance of other MPI operations.

Signed-off-by: Charles Archer <archerc@us.ibm.com>

6 years agoFix BVT build break on 2012-11-15
Qi QC Zhang [Fri, 16 Nov 2012 08:34:57 +0000]
Fix BVT build break on 2012-11-15

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agomake sure only one task does the delete work in ADIO_Close
Qi QC Zhang [Thu, 11 Oct 2012 08:16:17 +0000]
make sure only one task does the delete work in ADIO_Close

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoupdate error checking in ADIO
Qi QC Zhang [Thu, 11 Oct 2012 05:04:54 +0000]
update error checking in ADIO

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agohandle ENOMEM errors in ADIO
Qi QC Zhang [Thu, 11 Oct 2012 05:35:11 +0000]
handle ENOMEM errors in ADIO

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoChange error class in MPIO_CHECK_COUNT from MPI_ERR_ARG to MPI_ERR_COUNT
Qi QC Zhang [Thu, 11 Oct 2012 07:13:00 +0000]
Change error class in MPIO_CHECK_COUNT from MPI_ERR_ARG to MPI_ERR_COUNT

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoadd missing checks on parameters of IO routines
Qi QC Zhang [Tue, 9 Oct 2012 13:45:18 +0000]
add missing checks on parameters of IO routines

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agomake error class consistent with PE MPI when opening a file with inconsistent amode...
Qi QC Zhang [Mon, 15 Oct 2012 12:52:11 +0000]
make error class consistent with PE MPI when opening a file with inconsistent amodes for different tasks

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoFix coredump at MPI_Comm_split due to invalid MPI_Comm handler
Qi QC Zhang [Sat, 29 Sep 2012 07:54:06 +0000]
Fix coredump at MPI_Comm_split due to invalid MPI_Comm handler

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoAdd error checks in MPI_Type_create_darray; update MPIR_ERRTEST_ARGNONPOS macro
Qi QC Zhang [Fri, 28 Sep 2012 07:18:11 +0000]
Add error checks in MPI_Type_create_darray; update MPIR_ERRTEST_ARGNONPOS macro

This is a combination of 2 commits.

The first commit's message is:
  check rank and the product of array_of_psizes in MPI_Type_create_darray

This is the 2nd commit message:
  Added a parameter err_class to macro MPIR_ERRTEST_ARGNONPOS

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agopami coredump at _lapi_shm_amsend
Qi QC Zhang [Fri, 28 Sep 2012 05:07:38 +0000]
pami coredump at _lapi_shm_amsend

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agochange error class for function error in MPI_File_create_errhandler
Qi QC Zhang [Thu, 11 Oct 2012 07:56:36 +0000]
change error class for function error in MPI_File_create_errhandler

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoupdate count in status for adio when filetype_size is zero
Qi QC Zhang [Fri, 28 Sep 2012 04:48:04 +0000]
update count in status for adio when filetype_size is zero

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoIssue 8863: Split large broadcasts into smaller broadcasts
Bob Cernohous [Wed, 17 Oct 2012 21:27:39 +0000]
Issue 8863: Split large broadcasts into smaller broadcasts

Also little streamlining/cleanup of collectives including:
- ndebug changes to remove verbose logging
- use local const variables to cache pointer references
- likely/unlikely code path changes

Signed-off-by: Su Huang <suhuang@us.ibm.com>
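
Splitting a large broadcast into smaller ones amounts to a chunking loop. The sketch below is only an illustration of that idea: the callback type, the function names, and the chunk size parameter are hypothetical, and the callback stands in for whatever broadcast call the real code issues.

```c
#include <stddef.h>

/* Hypothetical callback standing in for one broadcast of 'len' bytes. */
typedef int (*bcast_fn)(char *buf, size_t len, void *ctx);

/* Example callback that only counts how many chunks were issued. */
static int count_chunks(char *buf, size_t len, void *ctx)
{
    (void)buf;
    (void)len;
    ++*(size_t *)ctx;
    return 0;
}

/*
 * Issue one broadcast per chunk of at most 'max_chunk' bytes.
 * Returns 0 on success, -1 for an invalid chunk size, or the first
 * non-zero value returned by the callback.
 */
static int chunked_bcast(char *buf, size_t total, size_t max_chunk,
                         bcast_fn fn, void *ctx)
{
    if (max_chunk == 0)
        return -1;

    size_t offset = 0;
    while (offset < total) {
        size_t len = total - offset;
        if (len > max_chunk)
            len = max_chunk;          /* the last chunk may be shorter */
        int rc = fn(buf + offset, len, ctx);
        if (rc != 0)
            return rc;
        offset += len;
    }
    return 0;
}
```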

6 years agoD183554: Rh62qdr: 7 MPI-COM error injection cases core dump with MPICH2
Qi QC Zhang [Fri, 29 Jun 2012 05:25:55 +0000]
D183554: Rh62qdr: 7 MPI-COM error injection cases core dump with MPICH2

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoError handling in PAMId:added two macros MPIU_ERR_CHKORASSERT and MPIU_ERR_CHKORASSERT1
Qi QC Zhang [Thu, 28 Jun 2012 05:43:56 +0000]
Error handling in PAMId:added two macros MPIU_ERR_CHKORASSERT and MPIU_ERR_CHKORASSERT1

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoD187240: mpich2 don't support mpc_statistics_write/zero in mpi program
Su Huang [Fri, 9 Nov 2012 16:21:01 +0000]
D187240: mpich2 don't support mpc_statistics_write/zero in mpi program

 * Charles:  Build for bluegene, no impact to bluegene, committing
   and signing off without BG team approval.

Signed-off-by: Charles Archer <archerc@us.ibm.com>

6 years agoD187228: Fix build break in BG
Su Huang [Wed, 7 Nov 2012 17:51:24 +0000]
D187228: Fix build break in BG

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoF182398: Fix build break
Su Huang [Mon, 5 Nov 2012 13:25:42 +0000]
F182398: Fix build break

Signed-off-by: Su Huang <suhuang@us.ibm.com>

6 years agoD186242: 7Z6: Data integrity error w/ mpich2 MPI_Reduce & MPI_Allreduce
Qi QC Zhang [Wed, 17 Oct 2012 05:11:35 +0000]
D186242: 7Z6: Data integrity error w/ mpich2 MPI_Reduce & MPI_Allreduce
 * Put MPIDI_Request_setPeerRank_pami under OUT_OF_ORDER_HANDLING

Signed-off-by: Charles Archer <archerc@us.ibm.com>

6 years agoF182398: Memory management and token flow control for early arrivals
Su Huang [Thu, 25 Oct 2012 14:09:03 +0000]
F182398: Memory management and token flow control for early arrivals
 * D187119: Check in the fixes from code review - memory management and token flow control
 * Fix some white space issues

Signed-off-by: Charles Archer <archerc@us.ibm.com>

6 years agoRevert "Issue 8841: Add GPFS_SUPER_MAGIC as a default for BGLOCKLESSMPIO_F_TYPE"
Bob Cernohous [Mon, 29 Oct 2012 20:14:21 +0000]
Revert "Issue 8841: Add GPFS_SUPER_MAGIC as a default for BGLOCKLESSMPIO_F_TYPE"

ad_bglockless does not support shared file pointers.  So we will not make it the default
protocol at this time for GPFS.  It may still be enabled for GPFS with environment variable:

--envs BGLOCKLESSMPIO_F_TYPE=0x47504653

This reverts commit 2acf652d7522ddb0d96496826898e947f4a51803.

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>
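
The value 0x47504653 in the example above is the four ASCII bytes "GPFS" packed into a 32-bit number. A small sketch that unpacks it is shown here; the helper name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: unpack a 32-bit magic number into its four bytes,
 * most significant byte first, as a NUL-terminated string in out[5].
 */
static void magic_to_ascii(uint32_t magic, char out[5])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (char)((magic >> (24 - 8 * i)) & 0xFFu);
    out[4] = '\0';
}
```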

6 years agoAdded cross file for 32 bit
Charles Archer [Mon, 29 Oct 2012 20:35:40 +0000]
Added cross file for 32 bit

Signed-off-by: Bob Cernohous <bobc@us.ibm.com>

6 years agoD187003: MPI_File_get_position_shared fails in mpich2
Charles Archer [Fri, 26 Oct 2012 00:43:58 +0000]
D187003:  MPI_File_get_position_shared fails in mpich2

Variable could be passed into io routines uninitialized.
A file read was supposed to write the current values into the uninit
variable, but under certain circumstances the read would be 0
bytes, leading to the read call returning 0 and leaving the value
uninitialized.

Simple fix is to always init to zero. This preserves the original
meaning of a 0 byte read.

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>
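
The failure mode and the fix described above can be sketched as follows. The read function here is a hypothetical stand-in, not the ROMIO I/O routine; it only models a read that leaves the destination untouched when zero bytes are available.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a file read: copies up to 'want' bytes of the
 * 'avail' bytes available and returns the byte count. When 'avail' is 0
 * the destination is not written at all.
 */
static size_t fake_read(void *dst, const void *src, size_t avail, size_t want)
{
    size_t n = avail < want ? avail : want;
    if (n > 0)
        memcpy(dst, src, n);
    return n;
}

/* Hypothetical caller: fetch a stored offset, treating an empty file as 0. */
static long long read_shared_offset(const void *file_bytes, size_t avail)
{
    long long offset = 0;   /* always initialized, so a 0-byte read yields 0 */
    (void)fake_read(&offset, file_bytes, avail, sizeof offset);
    return offset;
}
```

Without the initializer, the 0-byte case would return whatever happened to be on the stack.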

6 years agoIssue 8879: fix bug in the 'is_local_task' extension.
Michael Blocksome [Tue, 23 Oct 2012 14:36:51 +0000]
Issue 8879: fix bug in the 'is_local_task' extension.

The extension was reporting a non-zero value when the specified task is
local, but the src/mpid/pamid/src/pt2pt/mpidi_sendmsg.c is expecting a
boolean value to be used in the point-to-point eager limit lookup.

On bgq, the reported value was '64' which caused the eager limit lookup
to go off the end of the array.

The pe 'shift value' is 0 which will result in the expression
(1UL << 0) and should be optimized out by the compiler.

Signed-off-by: Charles Archer <archerc@us.ibm.com>
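
The out-of-bounds lookup described above can be illustrated with a short sketch. It normalizes the reported value with `!!` rather than with the shift expression the commit mentions, and the table contents and names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical eager-limit table: index 0 = remote peer, index 1 = local peer.
 * The values are illustrative only. */
static const size_t eager_limit[2] = { 1024, 4096 };

/*
 * Hypothetical lookup. 'raw_is_local' is whatever zero/non-zero value the
 * extension reports; '!!' collapses any non-zero value (such as 64) to 1,
 * so the index always stays inside the two-element table.
 */
static size_t lookup_eager_limit(unsigned long raw_is_local)
{
    size_t index = (size_t)(!!raw_is_local);
    return eager_limit[index];
}
```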

6 years agoIssue 8178: V1R1M2 debug
Bob Cernohous [Sat, 8 Sep 2012 16:55:49 +0000]
Issue 8178: V1R1M2 debug

Signed-off-by: sssharka <sssharka@us.ibm.com>

6 years agoIssue 8805: Handle count 0 allreduce
Bob Cernohous [Fri, 12 Oct 2012 21:55:48 +0000]
Issue 8805: Handle count 0 allreduce

Signed-off-by: sssharka <sssharka@us.ibm.com>

6 years agoIssue 8841: Add GPFS_SUPER_MAGIC as a default for BGLOCKLESSMPIO_F_TYPE
Bob Cernohous [Sun, 14 Oct 2012 23:49:51 +0000]
Issue 8841: Add GPFS_SUPER_MAGIC as a default for BGLOCKLESSMPIO_F_TYPE

Signed-off-by: sssharka <sssharka@us.ibm.com>

6 years agoIssue 6292: romio ad_bg bug fix for uninitialized struct
Michael Blocksome [Tue, 16 Oct 2012 16:34:18 +0000]
Issue 6292: romio ad_bg bug fix for uninitialized struct

When the communicator size is 1 the pset processing is skipped and the
proc structure is to be initialized with appropriate values.

Signed-off-by: Su Huang <suhuang@us.ibm.com>

6 years agofix the build - replace malloc/free by MPIU_Malloc/MPIU_Free
Su Huang [Tue, 16 Oct 2012 17:21:44 +0000]
fix the build - replace malloc/free by MPIU_Malloc/MPIU_Free

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoattempt to fix pe compile error.
Michael Blocksome [Mon, 15 Oct 2012 19:46:47 +0000]
attempt to fix pe compile error.

Signed-off-by: Su Huang <suhuang@us.ibm.com>

6 years agoNew environment variable: 'PAMID_DISABLE_INTERNAL_EAGER_TASK_LIMIT'
Michael Blocksome [Fri, 12 Oct 2012 14:53:13 +0000]
New environment variable: 'PAMID_DISABLE_INTERNAL_EAGER_TASK_LIMIT'

This new environment variable overrides the default task limit at
which point the internal eager protocols are disabled.

Signed-off-by: Su Huang <suhuang@us.ibm.com>

6 years agoCreate a task scaling threshold at which point internal eager is disabled
Michael Blocksome [Tue, 9 Oct 2012 20:33:31 +0000]
Create a task scaling threshold at which point internal eager is disabled

A new mpidi_platform.h define sets the job size threshold at which
point the default 'internal eager' limits are set to zero. This has the
same effect as specifying the environment variable:

  PAMID_PT2PT_LIMITS=::::0:0:0:0

The current threshold for bgq is 512k tasks. The default threshold for
all other platforms is 'max unsigned int', which effectively disables
this threshold check.

This 'disable internal eager' threshold check is done before any
environment variable processing.

Signed-off-by: Su Huang <suhuang@us.ibm.com>
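
A minimal sketch of the threshold check follows. It assumes the eight-entry limit layout used by PAMID_PT2PT_LIMITS, where entries 4 through 7 are the internal limits, and it assumes the comparison is "greater than or equal"; the function and macro names are hypothetical.

```c
#include <limits.h>

/* Hypothetical per-platform thresholds: 512k tasks on bgq, and
 * 'max unsigned int' elsewhere so the check practically never fires. */
#define BGQ_TASK_THRESHOLD     (512u * 1024u)
#define DEFAULT_TASK_THRESHOLD UINT_MAX

/*
 * Hypothetical helper: zero the four internal limits (entries 4..7) when
 * the job size reaches the threshold. This has the same effect as
 * PAMID_PT2PT_LIMITS=::::0:0:0:0 and is meant to run before any
 * environment variable processing.
 */
static void apply_task_threshold(unsigned int tasks, unsigned int threshold,
                                 unsigned long limits[8])
{
    if (tasks >= threshold) {         /* assumption: inclusive comparison */
        for (int slot = 4; slot < 8; ++slot)
            limits[slot] = 0;
    }
}
```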

6 years agoSimplify and clarify the 'MPIDI_PT2PT_LIMIT' macro by splitting it in two.
Michael Blocksome [Thu, 11 Oct 2012 19:19:48 +0000]
Simplify and clarify the 'MPIDI_PT2PT_LIMIT' macro by splitting it in two.

Signed-off-by: Su Huang <suhuang@us.ibm.com>

6 years agoAdd 'PAMID_PT2PT_LIMITS' env var to specify *all* point-to-point limit overrides
Michael Blocksome [Tue, 9 Oct 2012 19:39:20 +0000]
Add 'PAMID_PT2PT_LIMITS' env var to specify *all* point-to-point limit overrides

The entire point-to-point limit set is determined by three boolean
configuration values:
- 'is non-local limit'   vs 'is local limit'
- 'is eager limit'       vs 'is immediate limit'
- 'is application limit' vs 'is internal limit'

The point-to-point configuration limit values are specified in order and
are delimited by ':' characters. If a value is not specified for a given
configuration then the limit is not changed. All eight configuration
values are not required to be specified, although in order to set the
last (eighth) configuration value the previous seven configurations must
be listed. The 'k', 'K', 'm', and 'M' multipliers may be specified. For
example:

   PAMID_PT2PT_LIMITS=":::::::10k"

The configuration entries can be described as:
   0 - remote eager     application limit
   1 - local  eager     application limit
   2 - remote immediate application limit
   3 - local  immediate application limit
   4 - remote eager     internal    limit
   5 - local  eager     internal    limit
   6 - remote immediate internal    limit
   7 - local  immediate internal    limit

Examples:

   "10K"
     - sets the application internode eager (the "normal" eager limit)

   "10240::64"
     - sets the application internode eager and immediate limits

   "::::0:0:0:0"
     - disables 'eager' and 'immediate' for all internal point-to-point

Signed-off-by: Su Huang <suhuang@us.ibm.com>
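
The format described above can be parsed with a short loop. The sketch below is an illustration under stated assumptions, not the pamid implementation: the function name is hypothetical, and it assumes 'k'/'K' means 1024 and 'm'/'M' means 1024*1024.

```c
#include <stddef.h>
#include <stdlib.h>

/* Number of point-to-point limit entries described in the commit message. */
#define PT2PT_LIMIT_COUNT 8

/*
 * Hypothetical helper: parse a ':'-delimited limit string into limits[].
 * Empty fields leave the existing value untouched. A trailing 'k'/'K'
 * multiplies by 1024 and 'm'/'M' by 1024*1024 (assumed multipliers).
 * Returns the number of entries that were changed.
 */
static int parse_pt2pt_limits(const char *s,
                              unsigned long limits[PT2PT_LIMIT_COUNT])
{
    int changed = 0;
    for (size_t slot = 0; slot < PT2PT_LIMIT_COUNT && s != NULL; ++slot) {
        char *end = NULL;
        unsigned long value = strtoul(s, &end, 10);
        if (end != s) {               /* a number was present in this field */
            if (*end == 'k' || *end == 'K') {
                value *= 1024UL;
                ++end;
            } else if (*end == 'm' || *end == 'M') {
                value *= 1024UL * 1024UL;
                ++end;
            }
            limits[slot] = value;
            ++changed;
        }
        /* Advance to the next field, or stop at the end of the string. */
        while (*end != '\0' && *end != ':')
            ++end;
        s = (*end == ':') ? end + 1 : NULL;
    }
    return changed;
}
```

With this sketch, ":::::::10k" changes only entry 7, and "10240::64" changes entries 0 and 2 while leaving entry 1 alone.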

6 years agoIgnore build files from test directory.
Michael Blocksome [Mon, 15 Oct 2012 14:01:58 +0000]
Ignore build files from test directory.

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoF182395: Enable PAMIX extensions to enable/disable interrupts
Su Huang [Tue, 9 Oct 2012 17:14:19 +0000]
F182395: Enable PAMIX extensions to enable/disable interrupts

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoupgrade: mpich2-trunk r10341 --> mpich2-1.5
Michael Blocksome [Tue, 9 Oct 2012 13:43:03 +0000]
upgrade: mpich2-trunk r10341 --> mpich2-1.5

Signed-off-by: Haizhu Liu <haizhu@us.ibm.com>

6 years agofix for incorrect cxx wrapper flags on bgq
Michael Blocksome [Mon, 8 Oct 2012 16:17:27 +0000]
fix for incorrect cxx wrapper flags on bgq

Signed-off-by: Haizhu Liu <haizhu@us.ibm.com>

6 years agoupgrade: mpich2-trunk r10338 --> mpich2-trunk r10341
Michael Blocksome [Fri, 5 Oct 2012 18:55:58 +0000]
upgrade: mpich2-trunk r10338 --> mpich2-trunk r10341

Signed-off-by: Haizhu Liu <haizhu@us.ibm.com>

6 years agoupgrade: mpich2-1.5rc3 --> mpich2-trunk r10338
Michael Blocksome [Fri, 5 Oct 2012 16:29:03 +0000]
upgrade: mpich2-1.5rc3 --> mpich2-trunk r10338

Conflicts:

mpich2/configure.ac
mpich2/src/armci/configure.ac
mpich2/src/env/Makefile.mk
mpich2/src/mpi/datatype/type_create_hindexed_block.c
mpich2/src/mpid/pamid/include/mpidi_macros.h
mpich2/src/mpl/configure.ac
mpich2/src/openpa/configure.ac
mpich2/src/pm/hydra/mpl/configure.ac
mpich2/src/pm/hydra/tools/topo/hwloc/hwloc/configure.ac

Signed-off-by: Haizhu Liu <haizhu@us.ibm.com>

6 years agoIssue 8497: Fix for distributing aggregators across (unequal) bridge sets.
Bob Cernohous [Tue, 10 Jul 2012 18:56:43 +0000]
Issue 8497: Fix for distributing aggregators across (unequal) bridge sets.

We can't assume all psets are one size and calculate distance/indices
from that.  We need to loop through the ranks, assigning aggregators.
----------------------
Issue 8238: Fix bridge rank determination for subcomm's

MPIX_Torus2rank/Rank2torus only applies to comm_world so the previous fix is
not sufficient for arbitrary subcomm's.  We can't use any sort of global rank
to find our bridge nodes.

So switch the Allgather/sort to use the bridge coordinates instead of the bridge
rank.  Use the sorted results to determine everything we need - bridge ranks
(either the 'true' bridge rank or an arbitrary (first found) rank), number of
ranks per bridge (pset), etc.  Also remove some code based on old BGL/BGP
psets/virtual psets that doesn't really make sense now (e.g. you can't simply
divide by the number of ranks per node and assume all ranks per node are in a
subcomm).

Also includes:

Issue 8357: Fix Romio (ad_bg) problem with small 'psets' <= 8 ranks.

We default to 8 aggregators per 'pset' but had sort/assign problems when there
were 8 ranks per 'pset' (or fewer).

Here, a BGQ 'pset' is defined as the set of compute nodes using a bridge node.
BGQ does not actually have pset's in the same way as did BGL/BGP.

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoFixed assert on int env vars passed in a string
Charles Archer [Thu, 4 Oct 2012 16:01:00 +0000]
Fixed assert on int env vars passed in a string

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoupgrade: mpich2-1.5rc2 -> mpich2-1.5rc3
Michael Blocksome [Thu, 4 Oct 2012 13:31:39 +0000]
upgrade: mpich2-1.5rc2 -> mpich2-1.5rc3

Signed-off-by: Charles Archer <archerc@us.ibm.com>

6 years agoSet up pointers for nonblocking collectives. Add MAX_ERROR_STRING processing
Charles Archer [Wed, 3 Oct 2012 20:55:10 +0000]
Set up pointers for nonblocking collectives.  Add MAX_ERROR_STRING processing

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoUpdated .gitignore to include build generated files
Charles Archer [Wed, 3 Oct 2012 20:52:43 +0000]
Updated .gitignore to include build generated files

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoupgrade: mpich2-trunk r10273 --> mpich2-1.5rc2
Michael Blocksome [Wed, 3 Oct 2012 16:21:14 +0000]
upgrade: mpich2-trunk r10273 --> mpich2-1.5rc2

Signed-off-by: Su Huang <suhuang@us.ibm.com>

6 years agoFix for mpi test comm/icm.c
Michael Blocksome [Tue, 2 Oct 2012 15:44:57 +0000]
Fix for mpi test comm/icm.c

This test checks the error code from this macro. The gnu extension used
by the macro sets the return value to the value of the last expression
in the macro. By re-ordering the statements in the macro the macro will
evaluate to 'zero' which is considered successful.

Signed-off-by: Charles Archer <archerc@us.ibm.com>
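
The effect of statement order inside a GNU statement expression can be shown with two illustrative macros. These are not the macro changed by this commit; the names are hypothetical, and the example requires a compiler that supports the GNU `({ ... })` extension.

```c
/*
 * A GNU statement expression ({ ... }) evaluates to the value of its last
 * expression statement, so reordering the statements changes the result.
 */

/* Status is evaluated first; the macro yields the incremented counter. */
#define STATUS_FIRST(status, counter) ({ (void)(status); ++(counter); })

/* Counter is bumped first; the macro yields the status itself. */
#define STATUS_LAST(status, counter)  ({ ++(counter); (status); })
```

With a counter starting at 0 and a status of 0, STATUS_FIRST yields 1, which a caller checking for zero would read as a failure, while STATUS_LAST yields 0.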

6 years agoD185675: code review fixes
Charles Archer [Mon, 1 Oct 2012 19:29:57 +0000]
D185675:  code review fixes

Also, fix uninitialized variable

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoD186334: 7Z6: multi_mpi coredump with RDMA on mpich2
Su Huang [Fri, 21 Sep 2012 20:56:03 +0000]
D186334: 7Z6: multi_mpi coredump with RDMA on mpich2

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoD186277:Support data size suffix with K,M for MP_EAGER_LIMIT
Su Huang [Wed, 19 Sep 2012 14:48:38 +0000]
D186277:Support data size suffix with K,M for MP_EAGER_LIMIT

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoD185675: MPIDI_TRACE tool enhancement
Su Huang [Wed, 12 Sep 2012 18:42:37 +0000]
D185675:  MPIDI_TRACE tool enhancement

Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>

6 years agoEnable shared library compiles for bgq.
Michael Blocksome [Thu, 27 Sep 2012 18:56:06 +0000]
Enable shared library compiles for bgq.

On bgq, when the mpich configure is passed the '--enable-shared'
option, libtool will incorrectly attempt to statically link the
libmpich.so library with the mpich2version object(s). The only way
around this seems to be to specify the '-all-static' libtool option for
the mpich2version LDFLAGS.

Signed-off-by: Haizhu Liu <haizhu@us.ibm.com>

6 years agoIssue 7967: Create geometries with (existing) tasklists
Bob Cernohous [Wed, 8 Aug 2012 22:13:21 +0000]
Issue 7967: Create geometries with (existing) tasklists

Signed-off-by: sssharka <sssharka@us.ibm.com>

6 years agoFix to D186410 for BGQ builds
Bob Cernohous [Thu, 27 Sep 2012 17:23:55 +0000]
Fix to D186410 for BGQ builds

Signed-off-by: Bob Cernohous <bobc@us.ibm.com>

6 years agoD186410: FCA: Need to have an environment variable to limit FCA use
sssharka [Wed, 26 Sep 2012 01:24:07 +0000]
D186410: FCA: Need to have an environment variable to limit FCA use

   - The env variable MP_MPI_PAMI_FOR now accepts cutoff sizes for each
     FCA collective. For example:
     export MP_MPI_PAMI_FOR=bcast:16384,allreduce,allgather,reduce:16384

Signed-off-by: Bob Cernohous <bobc@us.ibm.com>
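
The list format in the example above can be read with a simple lookup. This is a sketch only: the function name is hypothetical, and treating a missing ":size" suffix as a cutoff of 0 is an assumption about how "no cutoff" might be represented.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: look up 'name' in a comma-separated list such as
 * "bcast:16384,allreduce". Returns 1 if the collective is listed and stores
 * its cutoff in *cutoff (0 when no ":size" suffix is given, an assumed
 * convention); returns 0 if the collective is not listed.
 */
static int find_collective_cutoff(const char *list, const char *name,
                                  unsigned long *cutoff)
{
    size_t name_len = strlen(name);
    const char *p = list;

    while (p != NULL && *p != '\0') {
        const char *comma = strchr(p, ',');
        size_t entry_len = comma ? (size_t)(comma - p) : strlen(p);

        /* Match the whole name, followed by either ':' or the entry end. */
        if (entry_len >= name_len && strncmp(p, name, name_len) == 0 &&
            (entry_len == name_len || p[name_len] == ':')) {
            *cutoff = (entry_len > name_len)
                          ? strtoul(p + name_len + 1, NULL, 10)
                          : 0UL;
            return 1;
        }
        p = comma ? comma + 1 : NULL;
    }
    return 0;
}
```

Because each entry is matched from its first character, looking up "reduce" does not match the "allreduce" entry.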