The MVAPICH2 1.4 release integrates support for FTB (Fault Tolerance Backplane)
to carry out Checkpoint/Restart (CR).

LIST OF FTB EVENTS SUPPORTED
============================

1. CR_FTB_CHECKPOINT - Requests a checkpoint of the MPI job. Usually sent by
   mpirun_rsh. The checkpoint file name (the default is /tmp/ckpt) is sent as
   the payload.

2. CR_FTB_CKPT_DONE - Indicates that the checkpoint completed successfully.
   Sent by MPI processes that were able to checkpoint.

3. CR_FTB_CKPT_FAIL - Indicates that the checkpoint failed. Sent by MPI
   processes that failed to take the checkpoint.

4. CR_FTB_RSRT_DONE - Indicates that the restart completed successfully.
   Sent by MPI processes that were able to restart.

5. CR_FTB_RSRT_FAIL - Indicates that the restart failed. Sent by MPI
   processes that failed to restart.

6. CR_FTB_CKPT_FINALIZE - Indicates that the CR module has shut down. Sent by
   all MPI processes calling MPI_Finalize().

7. CR_FTB_APP_CKPT_REQ - Sent by the MPI process in which the user requested
   a checkpoint through MVAPICH2_Sync_Checkpoint().
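
The event names above can be grouped by what they report. The following is a
minimal sketch in plain C of how a monitoring tool might classify an event name
it has received. It does not call the FTB library. The cr_event_kind type, the
cr_events table, and both functions are hypothetical names introduced only for
this illustration, and treating CR_FTB_APP_CKPT_REQ as a "request" is an
assumption based on the description above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical grouping of the CR events; not part of MVAPICH2 or FTB. */
typedef enum {
    CR_KIND_REQUEST,   /* a checkpoint was asked for */
    CR_KIND_SUCCESS,   /* a checkpoint or restart completed */
    CR_KIND_FAILURE,   /* a checkpoint or restart failed */
    CR_KIND_SHUTDOWN,  /* the CR module has shut down */
    CR_KIND_UNKNOWN    /* name is not in the list above */
} cr_event_kind;

typedef struct {
    const char   *name;  /* event name exactly as listed above */
    cr_event_kind kind;  /* illustrative classification */
} cr_event_info;

/* One entry per event in the list; order matches the list. */
static const cr_event_info cr_events[] = {
    { "CR_FTB_CHECKPOINT",    CR_KIND_REQUEST  },
    { "CR_FTB_CKPT_DONE",     CR_KIND_SUCCESS  },
    { "CR_FTB_CKPT_FAIL",     CR_KIND_FAILURE  },
    { "CR_FTB_RSRT_DONE",     CR_KIND_SUCCESS  },
    { "CR_FTB_RSRT_FAIL",     CR_KIND_FAILURE  },
    { "CR_FTB_CKPT_FINALIZE", CR_KIND_SHUTDOWN },
    { "CR_FTB_APP_CKPT_REQ",  CR_KIND_REQUEST  }  /* assumption: a request */
};

/* Return the table entry for an event name, or NULL if it is not listed. */
const cr_event_info *cr_lookup_event(const char *name)
{
    size_t i;
    if (name == NULL)
        return NULL;
    for (i = 0; i < sizeof(cr_events) / sizeof(cr_events[0]); i++) {
        if (strcmp(cr_events[i].name, name) == 0)
            return &cr_events[i];
    }
    return NULL;
}

/* Classify an event name; unlisted names map to CR_KIND_UNKNOWN. */
cr_event_kind cr_classify_event(const char *name)
{
    const cr_event_info *info = cr_lookup_event(name);
    return info != NULL ? info->kind : CR_KIND_UNKNOWN;
}
```

A caller would pass the event name string it received to cr_classify_event()
and, for example, count CR_KIND_FAILURE results to decide whether a checkpoint
round needs to be retried. How the name is obtained from FTB is outside the
scope of this sketch.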