Hari Subramoni

This page lists the publications of Hari Subramoni.

Journals (15)

1 T. Tran, B. Ramesh, B. Michalowicz, M. Abduljabbar, H. Subramoni, A. Shafi, and DK Panda, Accelerating Communication with Multi-HCA Aware Collectives in MPI, Concurrency and Computation: Practice and Experience (CCPE), July 2023.
2 K. Suresh, K. Khorassani, C. Chen, B. Ramesh, M. Abduljabbar, A. Shafi, H. Subramoni, and DK Panda, Network Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries, IEEE Micro, Jan 2023.
3 K. Khorassani, C. Chen, B. Ramesh, A. Shafi, H. Subramoni, and DK Panda, High Performance MPI over the Slingshot Interconnect, Special Issue of Journal of Computer Science and Technology (JCST), Feb 2023.
4 A. Jain, N. Alnaasan, A. Shafi, H. Subramoni, and DK Panda, Optimizing Distributed DNN Training using CPUs and BlueField-2 DPUs, IEEE Micro, doi: 10.1109/MM.2021.3139027.
5 DK Panda, H. Subramoni, C. Chu, and M. Bayatpour, The MVAPICH project: Transforming Research into High-Performance MPI Library for HPC Community, Journal of Computational Science (JOCS), Special Issue on Translational Computer Science, Oct 2020.
6 J. Hashmi, C. Chu, S. Chakraborty, M. Bayatpour, H. Subramoni, and DK Panda, FALCON-X: Zero-copy MPI Derived Datatype Processing on Modern CPU and GPU Architectures, Journal of Parallel and Distributed Computing (JPDC), Volume 144, October 2020, Pages 1-13, doi.org/10.1016/j.jpdc.2020.05.008.
7 Ammar Awan, A. Jain, C. Chu, H. Subramoni, and DK Panda, Communication Profiling and Characterization of Deep Learning Workloads on Clusters with High-Performance Interconnects, IEEE Micro, vol. 40, no. 1, pp. 35-43, 1 Jan.-Feb. 2020.
8 A. Ruhela, H. Subramoni, S. Chakraborty, M. Bayatpour, P. Kousha, and DK Panda, Efficient Design for MPI Asynchronous Progress without Dedicated Resources, Parallel Computing - Systems & Applications, Volume 85, July 2019, Pages 13-26, https://doi.org/10.1016/j.parco.2019.03.003.
9 Ammar Awan, K. Vadambacheri Manian, C. Chu, H. Subramoni, and DK Panda, Optimized Large-Message Broadcast for Deep Learning Workloads: MPI, MPI+NCCL, or NCCL2?, Volume 85, July 2019, Pages 141-152, https://doi.org/10.1016/j.parco.2019.03.005.
10 C. Chu, X. Lu, Ammar Awan, H. Subramoni, Bracy Elton, and DK Panda, Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast, IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 30, no. 3, pp. 575-588, 1 March 2019.
11 S. Chakraborty, Ignacio Laguna, Murali Emani, Kathryn Mohror, DK Panda, Martin Schulz, and H. Subramoni, EReinit: Scalable and Efficient Fault Tolerance for Bulk-Synchronous MPI Applications, Concurrency and Computation: Practice and Experience, 14 August 2018, https://doi.org/10.1002/cpe.4863.
12 S. Ramesh, A. Mahéo, S. Shende, A. Malony, H. Subramoni, A. Ruhela, and DK Panda, MPI performance engineering with the MPI tool interface: The integration of MVAPICH and TAU, ISSN 0167-8191, Volume 77, Sep 2018.
13 K. Hamidouche, A. Venkatesh, Ammar Awan, H. Subramoni, and DK Panda, CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters, ParCo: Elsevier Parallel Computing Journal.
14 S. Sur, S. Potluri, K. Kandalla, H. Subramoni, K. Tomko, and DK Panda, Co-Designing MPI Library and Applications for InfiniBand Clusters, IEEE Computer, Nov 2011.
15 Srinivasan Ramesh, Aurele Maheo, Sameer Shende, Allen Malony, H. Subramoni, and DK Panda, MPI Performance Engineering with the MPI Tool Interface: the Integration of MVAPICH and TAU, Sep 2018.

Conferences & Workshops (173)

Ph.D. Dissertations (1)

1 H. Subramoni, Topology-Aware MPI Communication and Scheduling for High Performance Computing Systems, Jul 2013.