MVAPICH2 1.6 RC1 Release supports FTB-enabled Checkpoint Restart (CR).

LIST OF FTB EVENTS SUPPORTED
============================

1. CR_FTB_CHECKPOINT - Requests a checkpoint of the MPI job. Usually sent
   by mpirun_rsh. The checkpoint file name (the default name is /tmp/ckpt)
   is sent as the payload.

2. MPI_PROCS_CKPTD - Indicates that the checkpoint completed successfully.
   Sent by MPI processes that were able to checkpoint.

3. MPI_PROCS_CKPT_FAIL - Indicates that the checkpoint failed. Sent by MPI
   processes that failed to take the checkpoint.

4. MPI_PROCS_RESTARTED - Indicates that the restart completed successfully.
   Sent by MPI processes that were able to restart.

5. MPI_PROCS_RESTART_FAIL - Indicates that the restart failed. Sent by MPI
   processes that failed to restart.

6. MPI_PROCS_MIGRATED - Indicates that an MPI process has been migrated
   successfully to a new node. Sent by MPI processes that are being
   migrated.

7. MPI_PROCS_MIGRATE_FAILED - Indicates that an attempt to migrate an MPI
   process has failed. Sent by MPI processes that are being migrated.

8. CR_FTB_CKPT_FINALIZE - Indicates that the CR module has shut down. Sent
   by all MPI processes calling MPI_Finalize().

9. CR_FTB_APP_CKPT_REQ - Sent by the MPI process from which the user
   requested a checkpoint through MVAPICH2_Sync_Checkpoint.
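
The event names listed above can be grouped by what they signal, which may
help when writing a monitoring tool that reacts to them. The following is a
minimal sketch in Python, not part of MVAPICH2 or FTB: it does not connect to
an FTB agent, and the names Kind, EVENT_KINDS, classify and
checkpoint_outcome are hypothetical helpers introduced only for illustration.
Only the event name strings come from the list.

```python
from enum import Enum


class Kind(Enum):
    """Illustrative grouping of events; an assumption, not an FTB concept."""
    REQUEST = "request"
    SUCCESS = "success"
    FAILURE = "failure"
    NOTICE = "notice"


# Event names are taken from the list of supported FTB events.
# The Kind assigned to each one is this sketch's own interpretation.
EVENT_KINDS = {
    "CR_FTB_CHECKPOINT": Kind.REQUEST,
    "MPI_PROCS_CKPTD": Kind.SUCCESS,
    "MPI_PROCS_CKPT_FAIL": Kind.FAILURE,
    "MPI_PROCS_RESTARTED": Kind.SUCCESS,
    "MPI_PROCS_RESTART_FAIL": Kind.FAILURE,
    "MPI_PROCS_MIGRATED": Kind.SUCCESS,
    "MPI_PROCS_MIGRATE_FAILED": Kind.FAILURE,
    "CR_FTB_CKPT_FINALIZE": Kind.NOTICE,
    "CR_FTB_APP_CKPT_REQ": Kind.REQUEST,
}


def classify(event_name):
    """Return the Kind for a listed event name, or None if it is not listed."""
    return EVENT_KINDS.get(event_name)


def checkpoint_outcome(event_names):
    """Summarize the event names observed for one checkpoint request.

    Returns "failed" if any MPI_PROCS_CKPT_FAIL was seen, "completed" if
    MPI_PROCS_CKPTD was seen without a failure, and "pending" otherwise.
    """
    names = list(event_names)
    if "MPI_PROCS_CKPT_FAIL" in names:
        return "failed"
    if "MPI_PROCS_CKPTD" in names:
        return "completed"
    return "pending"
```

For example, checkpoint_outcome(["CR_FTB_CHECKPOINT", "MPI_PROCS_CKPTD"])
evaluates to "completed". A real tool would receive these names through an
FTB client subscription and would likely also track which processes reported
each event; that part is omitted here.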