This page lists the publications of NOWLAB members.

Book (1)

1 DK Panda, X. Lu, and D. Shankar, High-Performance Big Data Computing, The MIT Press, Aug 2022.

Journals (39)

1 T. Tran, B. Ramesh, B. Michalowicz, M. Abduljabbar, H. Subramoni, A. Shafi, and DK Panda, Accelerating Communication with Multi-HCA Aware Collectives in MPI, Concurrency and Computation: Practice and Experience (CCPE), July 2023.
2 K. Suresh, K. Khorassani, C. Chen, B. Ramesh, M. Abduljabbar, A. Shafi, H. Subramoni, and DK Panda, Network Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries, IEEE Micro, Jan 2023.
3 K. Khorassani, C. Chen, B. Ramesh, A. Shafi, H. Subramoni, and DK Panda, High Performance MPI over the Slingshot Interconnect, Special Issue of Journal of Computer Science and Technology (JCST), Feb 2023.
4 DK Panda, H. Subramoni, C. Chu, and M. Bayatpour, The MVAPICH project: Transforming Research into High-Performance MPI Library for HPC Community, Journal of Computational Science (JOCS), Special Issue on Translational Computer Science, Oct 2020.
5 J. Hashmi, C. Chu, S. Chakraborty, M. Bayatpour, H. Subramoni, and DK Panda, FALCON-X: Zero-copy MPI Derived Datatype Processing on Modern CPU and GPU Architectures, Journal of Parallel and Distributed Computing (JPDC), Volume 144, October 2020, Pages 1-13, https://doi.org/10.1016/j.jpdc.2020.05.008
6 Ammar Awan, A. Jain, C. Chu, H. Subramoni, and DK Panda, Communication Profiling and Characterization of Deep Learning Workloads on Clusters with High-Performance Interconnects, IEEE Micro, vol. 40, no. 1, pp. 35-43, 1 Jan.-Feb. 2020.
7 A. Ruhela, H. Subramoni, S. Chakraborty, M. Bayatpour, P. Kousha, and DK Panda, Efficient Design for MPI Asynchronous Progress without Dedicated Resources, Parallel Computing - Systems & Applications, Volume 85, July 2019, Pages 13-26, https://doi.org/10.1016/j.parco.2019.03.003
8 Ammar Awan, K. Vadambacheri Manian, C. Chu, H. Subramoni, and DK Panda, Optimized Large-Message Broadcast for Deep Learning Workloads: MPI, MPI+NCCL, or NCCL2?, Parallel Computing, Volume 85, July 2019, Pages 141-152, https://doi.org/10.1016/j.parco.2019.03.005
9 C. Chu, X. Lu, Ammar Awan, H. Subramoni, Bracy Elton, and DK Panda, Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast, IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 30, no. 3, pp. 575-588, 1 March 2019.
10 S. Chakraborty, Ignacio Laguna, Murali Emani, Kathryn Mohror, DK Panda, Martin Schulz, and H. Subramoni, EReinit: Scalable and Efficient Fault Tolerance for Bulk-Synchronous MPI Applications, Concurrency and Computation: Practice and Experience, 14 August 2018, https://doi.org/10.1002/cpe.4863
11 X. Lu, H. Shi, R. Biswas, M. H. Javed, and DK Panda, DLoBD: A Comprehensive Study of Deep Learning over Big Data Stacks on HPC Clusters, IEEE Transactions on Multi-Scale Computing Systems, Jun 2018.
12 S. Ramesh, A. Mahéo, S. Shende, A. Malony, H. Subramoni, A. Ruhela, and DK Panda, MPI performance engineering with the MPI tool interface: The integration of MVAPICH and TAU, ISSN 0167-8191, Volume 77, Sep 2018.
13 M. W. Rahman, N. Islam, X. Lu, D. Shankar, and DK Panda, MR-Advisor: A Comprehensive Tuning, Profiling, and Prediction Tool for MapReduce Execution Frameworks on HPC Clusters, Journal of Parallel and Distributed Computing (JPDC), Nov 2017.
14 X. Lu, D. Shankar, and DK Panda, Scalable and Distributed Key-Value Store-based Data Management Using RDMA-Memcached, "IEEE Data Engineering Bulletin (DEBull), Volume 40", Bulletin of the Technical Committee on Data Engineering (TCDE), (Invited Paper), Mar 2017.
15 M. W. Rahman, N. Islam, X. Lu, and DK Panda, A Comprehensive Study of MapReduce over Lustre for Intermediate Data Placement and Shuffle Strategies on HPC Clusters, IEEE Transactions on Parallel and Distributed Systems, Jul 2016.
16 D. Shankar, X. Lu, M. W. Rahman, N. Islam, and DK Panda, Characterizing and benchmarking stand-alone Hadoop MapReduce on modern HPC clusters, The Journal of Supercomputing - Springer, Jun 2016.
17 K. Hamidouche, A. Venkatesh, Ammar Awan, H. Subramoni, and DK Panda, CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters, ParCo: Elsevier Parallel Computing Journal.
18 H. Wang, S. Potluri, D. Bureddy, and DK Panda, GPU-Aware MPI on RDMA-Enabled Cluster: Design, Implementation and Evaluation, IEEE Transactions on Parallel & Distributed Systems, Vol. 25, No. 10, pp. 2595-2605, Oct 2014.
19 N. Islam, X. Lu, M. W. Rahman, J. Jose, and DK Panda, A Micro-Benchmark Suite for Evaluating HDFS Operations on Modern Clusters, Special Issue of LNCS on papers from WBDB '12 Workshop, May 2012.
20 S. Sur, S. Potluri, K. Kandalla, H. Subramoni, K. Tomko, and DK Panda, Co-Designing MPI Library and Applications for InfiniBand Clusters, IEEE Computer, Nov 2011.
21 P. Lai, P. Balaji, R. Thakur, and DK Panda, ProOnE: A General-Purpose Protocol Onload Engine for Multi- and Many-Core Architectures, Computer Science: Research and Development, Special Issue of Scientific Papers from ISC '09, Jun 2009.
22 A. Vishnu, M. Koop, A. Moody, A. Mamidala, S. Narravula, and DK Panda, Topology Agnostic Hot-Spot Avoidance with InfiniBand, Concurrency and Computation: Practice and Experience, Special Issue of Best Papers from CCGrid '07, Jan 2008.
23 H. Jin, P. Balaji, C. Yoo, J.-Y. Choi, and DK Panda, Exploiting NIC Architectural Support for Enhancing IP based Protocols on High Performance Networks, OSU-CISRC-5/04-TR37, Nov 2005.
24 J. Liu, A. Mamidala, A. Vishnu, and DK Panda, Performance Evaluation of InfiniBand with PCI Express, IEEE Micro, Jan 2005.
25 J. Liu, J. Wu, and DK Panda, High Performance RDMA-Based MPI Implementation over InfiniBand, Int'l Journal of Parallel Programming: Volume 32, Number 3, Jun 2004.
26 J. Liu, B. Chandrasekaran, W. Yu, J. Wu, D. Buntinas, S. Kini, P. Wyckoff, and DK Panda, Micro-Benchmark Performance Comparison of High-Speed Cluster Interconnects, IEEE Micro, Jan 2004.
27 A. Wagner, D. Buntinas, R. Brightwell, and DK Panda, Application-Bypass Reduction for Large-Scale Clusters, Int'l Journal of High Performance Computing and Networking, Cluster 2003 Special Issue, In Press, Dec 2003.
28 R. Sivaram, C. Stunkel, and DK Panda, HIPIQS: A High-Performance Switch Architecture using Input Queuing, IEEE Transactions on Parallel and Distributed Systems, Vol. 13, No. 3, pp. 275-289, Mar 2002.
29 M. Banikazemi, B. Abali, L. Herger, and DK Panda, Design Alternatives for Virtual Interface Architecture (VIA) and an Implementation on IBM Netfinity NT Cluster, Journal of Parallel and Distributed Computing, Special Issue on Clusters, Volume 61, Number 11, pp. 1512-1545, Nov 2001.
30 M. Banikazemi, R. K. Govindaraju, R. Blackmore, and DK Panda, MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems, IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 10, pp. 1081-1093, Oct 2001.
31 B. Abali, C. B. Stunkel, J. Herring, M. Banikazemi, DK Panda, C. Aykanat, and Y. Aydogan, Adaptive Routing on the New Switch Chip for IBM SP Systems, Journal of Parallel and Distributed Computing, Special Issue on Routing in Computer and Communication Networks, Volume 61, Number 9, pp. 1148-1179, Sep 2001.
32 R. Kesavan, and DK Panda, Efficient Multicast on Irregular Switch-based Cut-Through Networks with Up-Down Routing, IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 8, pp. 808-828, Aug 2001.
33 R. Sivaram, R. Kesavan, DK Panda, and C. Stunkel, Architectural Support for Efficient Multicasting in Irregular Networks, IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 5, pp. 489-513, May 2001.
34 R. Sivaram, C. Stunkel, and DK Panda, Implementing Multidestination Worms in Switch-Based Parallel Systems: Architectural Alternatives and their Impact, IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 8, pp. 794-812, Aug 2000.
35 R. Kesavan, and DK Panda, Multiple Multicast with Minimized Node Contention on Wormhole k-ary n-cube Networks, IEEE Transactions on Parallel and Distributed Systems, Vol. 10, No. 4, pp. 371-393, Apr 1999.
36 D. Dai, and DK Panda, Exploiting the Benefits of Multiple-Path Network in DSM Systems: Architectural Alternatives and Performance Evaluation, IEEE Transactions on Computers, Special Issue on Cache Memory, Vol. 48, No. 2, pp. 236-244, Feb 1999.
37 R. Prakash, and DK Panda, Designing Communication Strategies for Heterogeneous Parallel Systems, Parallel Computing, Volume 24, pp. 2035-2052, Dec 1998.
38 R. Sivaram, DK Panda, and C. B. Stunkel, Efficient Broadcast and Multicast on Multistage Interconnection Networks using Multiport Encoding, IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 10, pp. 1004-1028, Oct 1998.
39 D. Basak, and DK Panda, Designing Clustered Multiprocessor Systems under Packaging and Technological Advancements, IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 9, pp. 962-978, Sep 1996.

Book Chapter (3)

1 X. Lu, J. Zhang, and DK Panda, Building Efficient HPC Cloud with SR-IOV Enabled InfiniBand: The MVAPICH2 Approach, Book "Research Advances in Cloud Computing", edited by Sanjay Chaudhary, Gaurav Somani, and Rajkumar Buyya, Springer International Publishing, Aug 2017.
2 X. Lu, and DK Panda, Contribution on Multiple Chapters related to OpenStack, Virtualized HPC, HPC Network Fabric, and HPC Workload Management, Book "The Crossroads of Cloud and HPC: OpenStack for Scientific Research; Exploring OpenStack Cloud Computing for Scientific Workloads", edited by Stig Telfer - OpenStack Foundation Publishing (Invited Book Chapter), Nov 2016.
3 X. Lu, M. W. Rahman, N. Islam, D. Shankar, and DK Panda, Accelerating Big Data Processing on Modern HPC Clusters, Book "Conquering Big Data with High Performance Computing", edited by Ritu Arora - Springer International Publishing (Invited Book Chapter), Jul 2016.

Conferences & Workshops (516)

482 A. Bala, D. Shah, W.-C. Feng, and DK Panda, Experiences with Software MPEG-2 Video Decompression on an SMP PC, ICPP Workshop, Aug 1998.

Technical Reports (8)

1 K. Vaidyanathan, P. Lai, S. Narravula, and DK Panda, Benefits of Dedicating Resource Sharing Services in Data-Centers for Emerging Multi-Core Systems, OSU-CISRC-8/07-TR53
2 K. Vaidyanathan, H. Jin, S. Narravula, and DK Panda, Accurate Load Monitoring for Cluster-based Web Data-Centers over RDMA-enabled Networks, OSU-CISRC-7/05-TR49
3 G. Marsh, A. Sampat, S. Potluri, and DK Panda, Scaling Advanced Message Queuing Protocol (AMQP) Architecture with Broker Federation and InfiniBand, OSU Technical Report (OSU-CISRC-5/09-TR17)
4 W. Huang, J. Liu, B. Abali, and DK Panda, InfiniBand Support in Xen Virtual Machine Environment, OSU-CISRC-2/06-TR18
5 P. Balaji, W. Feng, and DK Panda, The Convergence of Ethernet and Ethernot: A 10-Gigabit Ethernet Perspective, OSU-CISRC-1/06-TR10
6 H. Jin, S. Narravula, G. Brown, K. Vaidyanathan, P. Balaji, and DK Panda, Performance Evaluation of RDMA over IP: A Case Study with Ammasso Gigabit Ethernet NIC, OSU-CISRC-6/05-TR40
7 K. Vaidyanathan, P. Balaji, J. Wu, H. Jin, and DK Panda, An Architectural Study of Cluster-Based Multi-Tier Data-Centers
8 S. Krishnamoorthy, P. Balaji, K. Vaidyanathan, H. Jin, and DK Panda, Dynamic Reconfigurability Support for providing Soft QoS Guarantees in Cluster-based Multi-Tier Data-Centers over InfiniBand

Ph.D. Dissertations (39)

1 M. Bayatpour, Designing High Performance Hardware-assisted Communication Middlewares for Next-Generation HPC Systems, May 2021
2 C. Chu, Accelerator-enabled Communication Middleware for Large-scale Heterogeneous HPC Systems with Modern Interconnects, Jul 2020
3 J. Hashmi, Designing High Performance Shared-Address-Space and Adaptive Communication Middlewares for Next-Generation HPC Systems, Apr 2020
4 Ammar Awan, Co-designing Communication Middleware and Deep Learning Frameworks for High-Performance DNN Training on HPC Systems, Apr 2020
5 D. Shankar, Designing Fast, Resilient and Heterogeneity-Aware Key-Value Storage for Modern HPC Clusters, Jul 2019
6 S. Chakraborty, High Performance and Scalable Cooperative Communication Middleware for Next Generation Architectures, Jun 2019
7 J. Zhang, Designing and Building Efficient HPC Cloud with Modern Networking Technologies on Heterogeneous HPC Clusters, Jul 2018
8 M. Li, Designing High-Performance Remote Memory Access for MPI and PGAS Models with Modern Networking Technologies on Heterogeneous Clusters, Nov 2017
9 A. Venkatesh, High-Performance Heterogeneity/Energy-Aware Communication for MultiPetaflop HPC Systems, Dec 2016
10 N. Islam, High-Performance File System and I/O Middleware Design for Big Data on HPC Clusters, Nov 2016
11 M. W. Rahman, Designing and Modeling High-Performance MapReduce and DAG Execution Framework on Modern HPC Systems, Nov 2016
12 R. Rajachandrasekar, Designing Scalable And Efficient I/O Middleware for Fault-Resilient High-performance Computing Clusters, Nov 2014
13 J. Jose, Designing High Performance and Scalable Unified Communication Runtime (UCR) for HPC and Big Data Middleware, Aug 2014
14 S. Potluri, Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects, May 2014
15 K. Kandalla, High Performance Non-Blocking Collective Communication for Next Generation InfiniBand Clusters, Jul 2013
16 M. Luo, Designing Efficient MPI and UPC Runtime for Multicore Clusters with InfiniBand and Heterogeneous System, Jul 2013
17 H. Subramoni, Topology-Aware MPI communication and Scheduling for High Performance Computing Systems, Jul 2013
18 X. Ouyang, Efficient Storage Middleware Design in InfiniBand Clusters for High-End Computing, Mar 2012
19 G. Santhanaraman, Designing Scalable And High Performance One Sided Communication Middleware For Modern Interconnects, Jun 2009
20 M. Koop, High-Performance Multi-Transport MPI Design For Ultra-Scale Infiniband Clusters, Jun 2009
21 L. Chai, High Performance And Scalable MPI Intra-Node Communication Middleware For Multi-Core Clusters, Mar 2009
22 W. Huang, High Performance Network I/O In Virtual Machines Over Modern Interconnects, Aug 2008
23 R. Noronha, Designing High-Performance and Scalable Clustered Network Attached Storage With InfiniBand, Aug 2008
24 S. Narravula, Designing High-Performance and Scalable Distributed Datacenter Services over Modern Interconnects, Aug 2008
25 A. Mamidala, Scalable and High Performance Collective Communication For Next Generation Multicore InfiniBand Clusters, May 2008
26 K. Vaidyanathan, High Performance and Scalable Soft Shared State for Next-Generation Datacenters, May 2008
27 A. Vishnu, High Performance and Network Fault Tolerant MPI with Multi-Pathing Over InfiniBand, Dec 2007
28 S. Sur, Scalable and High Performance MPI Design for Very Large InfiniBand Clusters, Aug 2007
29 W. Yu, Enhancing MPI with Modern Networking Mechanisms in Cluster Interconnects, Jun 2006
30 P. Balaji, High Performance Communication Support for Sockets Based Applications over High-Speed Networks, Jun 2006
31 J. Liu, Designing High Performance and Scalable MPI over InfiniBand, Sep 2004
32 J. Wu, Communication and Memory Management in Networked Storage Systems, Sep 2004
33 D. Buntinas, Improving Cluster Performance through the Use of Programmable Network Interfaces, Jun 2003
34 M. Banikazemi, Design and Implementation of High Performance Communication Subsystems for Clusters, Dec 2000
35 D. Dai, Designing Efficient Communication Subsystems for Distributed Shared Memory (DSM) Systems, Mar 1999
36 R. Kesavan, Communication Mechanisms and Algorithms for Supporting Scalable Collective Communication on Parallel Systems, Oct 1998
37 R. Sivaram, Architectural Support for Efficient Communication in Scalable Parallel Systems, Aug 1998
38 D. Basak, Designing High Performance Parallel Systems: A Processor-Cluster Based Approach, Jul 1996
39 V. Dixit-Radiya, Mapping on Wormhole-routed Distributed-Memory Systems: A Temporal Communication Graph-based Approach, Mar 1995

M.S. Thesis (34)

1 N. Sarkauskas, Large-Message Nonblocking MPI Iallgather and MPI Ibcast Offload via BlueField2 DPU, May 2022
2 S. Srivastava, MVAPICH2-AutoTune: An Automatic Collective Tuning Framework for the MVAPICH2 MPI Library, May 2021
3 N. Senthil Kumar, Designing Optimized MPI+NCCL Hybrid Collective Communication Routines for Dense Many-GPU Clusters, May 2021
4 Kamal Raj Sankarapandian, Profiling MPI Primitives in Real-time Using OSU INAM, Apr 2020
5 R. Biswas, Benchmarking and Accelerating TensorFlow-based Deep Learning on Modern HPC Systems, Jul 2018
6 K. Kulkarni, Performance Characterization and Improvements of SQL-on-Hadoop Systems, Aug 2016
7 A. Augustine, Designing a Scalable Network Analysis and Monitoring Tool with MPI Support, Aug 2016
8 A. Bhat, RDMA-based Plugin Design and Profiler for Apache and Enterprise Hadoop Distributed Filesystem, Aug 2015
9 V. Dhanraj, Enhancement of LIMIC-Based Collectives for Multi-core Clusters, Aug 2012
10 A. Singh, Optimizing All-to-all and Allgather Communications on GPGPU Clusters, Apr 2012
11 S. Pai Raikar, Network Fault-Resilient MPI for Multi-Rail InfiniBand Clusters, Dec 2011
12 N. Dandapanthula, InfiniBand Network Analysis and Monitoring using OpenSM, Aug 2011
13 V. Meshram, Distributed Metadata Management for Parallel Systems, Aug 2011
14 G. Marsh, Evaluation of High Performance Financial Messaging on Modern Multi-core Systems, Mar 2010
15 K. Gopalakrishnan, Enhancing Fault Tolerance in MPI for Modern InfiniBand Clusters, Aug 2009
16 T. Gangadharappa, Designing Support For MPI-2 Programming Interfaces On Modern Interconnects, Jun 2009
17 J. Sridhar, Scalable Job Startup And Inter-Node Communication In Multi-Core Infiniband Clusters, Jun 2009
18 R. Kumar, Enhancing MPI Point-to-Point and Collectives for Clusters with Onloaded/Offloaded InfiniBand Adapters, Aug 2008
19 S. Bhagvat, Designing and Enhancing the Sockets Direct Protocol (SDP) over iWARP and InfiniBand, Aug 2006
20 S. Krishnamoorthy, Dynamic Re-Configurability Support to Provide Soft QoS Guarantees in Cluster-Based Multi-Tier Data-Centers over InfiniBand, Jun 2004
21 W. Jiang, High Performance MPICH2 One-Sided Communication Implementation over InfiniBand, Jun 2004
22 A. Wagner, Static and Dynamic Processing Offload on Myrinet Clusters with Programmable NIC Support, Jun 2004
23 A. Moody, NIC-based Reduction on Large-Scale Quadrics Clusters, Dec 2003
24 B. Chandrasekharan, Micro-benchmark Level Performance Evaluation and Comparison of High Speed Cluster Interconnects, Sep 2003
25 S. Kini, Efficient Collective Communication using Multicast and RDMA Operations for InfiniBand-based Clusters, Jun 2003
26 S. Senapathi, QoS-Aware Middleware to Support Interactive and Resource Adaptive Applications on Myrinet Clusters, Sep 2002
27 P. Shivam, High Performance User Level Protocol on Gigabit Ethernet, Aug 2002
28 R. Gupta, Efficient Collective Communication using Remote Memory Operations on VIA-Based Clusters, Aug 2002
29 A. Saify, Optimizing Collective Communication Operations in ARMCI, Jul 2002
30 S. Desai, Mechanisms for Implementing Efficient Collective Communication in Clusters with Application Bypass, Jun 2002
31 V. Tipparaju, Optimizing ARMCI Get/Put Operations on Myrinet/GM, Sep 2001
32 A. Gulati, A Proportional Bandwidth Allocation Scheme for Myrinet Clusters, Jun 2001
33 V. Kota, Designing Efficient Inter-Cluster Communication Layer for Distributed Computing, Jun 2001
34 S. Kutlug, Performance Evaluation and Analysis of User Level Networking Protocols in Clusters, Jun 2000

B.S. Thesis (2)

1 L. Xu, Scalable Neural Network Architecture Search Applied to Super-Resolution Networks, May 2022
2 N. Sarkauskas, Framework for End-to-End Tuning and Regression for a High Performance MPI Library on Modern Supercomputers, May 2021