Results
Found 68 publication records. Showing 68 according to the selection in the facets
Hits |
Authors |
Title |
Venue |
Year |
Link |
Author keywords |
89 | Amith R. Mamidala, Jiuxing Liu, Dhabaleswar K. Panda 0001 |
Efficient Barrier and Allreduce on Infiniband clusters using multicast and adaptive algorithms. |
CLUSTER |
2004 |
DBLP DOI BibTeX RDF |
|
84 | Rinku Gupta, Pavan Balaji, Dhabaleswar K. Panda 0001, Jarek Nieplocha |
Efficient Collective Operations Using Remote Memory Operations on VIA-Based Clusters. |
IPDPS |
2003 |
DBLP DOI BibTeX RDF |
|
74 | Motohiko Matsuda, Tomohiro Kudoh, Yuetsu Kodama, Ryousei Takano, Yutaka Ishikawa |
The design and implementation of MPI collective operations for clusters in long-and-fast networks. |
Clust. Comput. |
2008 |
DBLP DOI BibTeX RDF |
Allreduce, Grid, Broadcast, Message passing interface (MPI), Wide-area network, Collective communication |
63 | Lars Ailo Bongo, Otto J. Anshus, John Markus Bjørndalen, Tore Larsen |
Extending Collective Operations with Application Semantics for Improving Multi-Cluster Performance. |
ISPDC/HeteroPar |
2004 |
DBLP DOI BibTeX RDF |
|
63 | Lars Ailo Bongo, Otto J. Anshus, John Markus Bjørndalen |
Collective Communication Performance Analysis Within the Communication System. |
Euro-Par |
2004 |
DBLP DOI BibTeX RDF |
|
52 | Peng Liu, Jintao Peng, Jie Liu, Lihua Chi |
TH-Allreduce: Optimizing Small Data Allreduce Operation on Tianhe System. |
ICPADS |
2023 |
DBLP DOI BibTeX RDF |
|
47 | George Almási 0001, Gábor Dózsa, C. Christopher Erway, Burkhard D. Steinmacher-Burow |
Efficient Implementation of Allreduce on BlueGene/L Collective Network. |
PVM/MPI |
2005 |
DBLP DOI BibTeX RDF |
|
42 | Espen Skjelnes Johnsen, John Markus Bjørndalen, Otto J. Anshus |
CoMPI - Configuration of Collective Operations in LAM/MPI Using the Scheme Programming Language. |
PARA |
2006 |
DBLP DOI BibTeX RDF |
|
42 | Motohiko Matsuda, Tomohiro Kudoh, Yuetsu Kodama, Ryousei Takano, Yutaka Ishikawa |
Efficient MPI Collective Operations for Clusters in Long-and-Fast Networks. |
CLUSTER |
2006 |
DBLP DOI BibTeX RDF |
|
33 | Keith D. Underwood, Jerrie Coffman, Roy Larsen, K. Scott Hemmert, Brian W. Barrett, Ron Brightwell, Michael J. Levenhagen |
Enabling Flexible Collective Communication Offload with Triggered Operations. |
Hot Interconnects |
2011 |
DBLP DOI BibTeX RDF |
Allreduce, MPI, collective, offload |
26 | Emin Nuriyev, Ravi Reddy Manumachu, Samar Aseeri, Mahendra K. Verma, Alexey L. Lastovetsky |
SUARA: A scalable universal allreduce communication algorithm for acceleration of parallel deep learning applications. |
J. Parallel Distributed Comput. |
2024 |
DBLP DOI BibTeX RDF |
|
26 | Daniele De Sensi, Tommaso Bonato, David Saam, Torsten Hoefler |
Swing: Short-cutting Rings for Higher Bandwidth Allreduce. |
CoRR |
2024 |
DBLP DOI BibTeX RDF |
|
26 | Daniele De Sensi, Edgar Costa Molero, Salvatore Di Girolamo, Laurent Vanbever, Torsten Hoefler |
Canary: Congestion-aware in-network allreduce using dynamic trees. |
Future Gener. Comput. Syst. |
2024 |
DBLP DOI BibTeX RDF |
|
26 | Guozheng Wang, Yongmei Lei, Zeyu Zhang, Cunlu Peng |
2D-THA-ADMM: communication efficient distributed ADMM algorithm framework based on two-dimensional torus hierarchical AllReduce. |
Int. J. Mach. Learn. Cybern. |
2024 |
DBLP DOI BibTeX RDF |
|
26 | Daniele De Sensi, Tommaso Bonato, David Saam, Torsten Hoefler |
Swing: Short-cutting Rings for Higher Bandwidth Allreduce. |
NSDI |
2024 |
DBLP BibTeX RDF |
|
26 | Guozheng Wang, Yongmei Lei, Zeyu Zhang, Cunlu Peng |
A Communication Efficient ADMM-based Distributed Algorithm Using Two-Dimensional Torus Grouping AllReduce. |
Data Sci. Eng. |
2023 |
DBLP DOI BibTeX RDF |
|
26 | Ertza Warraich, Omer Shabtai, Khalid Manaa, Shay Vargaftik, Yonatan Piasetzky, Matty Kadosh, Lalith Suresh, Muhammad Shahbaz 0001 |
Ultima: Robust and Tail-Optimal AllReduce for Distributed Deep Learning in the Cloud. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
26 | Daniele De Sensi, Edgar Costa Molero, Salvatore Di Girolamo, Laurent Vanbever, Torsten Hoefler |
Canary: Congestion-Aware In-Network Allreduce Using Dynamic Trees. |
CoRR |
2023 |
DBLP DOI BibTeX RDF |
|
26 | Adrián Castelló 0001, Mar Catalán, Manuel F. Dolz, Enrique S. Quintana-Ortí, José Duato |
Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks. |
Computing |
2023 |
DBLP DOI BibTeX RDF |
|
26 | Ruiqi Wang, Dezun Dong, Fei Lei, Junchao Ma, Ke Wu, Kai Lu |
Roar: A Router Microarchitecture for In-network Allreduce. |
ICS |
2023 |
DBLP DOI BibTeX RDF |
|
26 | Peng Liu, Jintao Peng, Jie Liu 0002, Min Xie, Liuhua Chi |
GLEX_Allreduce: Optimization for medium and small message of Allreduce on Tianhe system. |
ICPADS |
2023 |
DBLP DOI BibTeX RDF |
|
26 | Chang Chen, Min Li, Chao Yang |
bbTopk: Bandwidth-Aware Sparse Allreduce with Blocked Sparsification for Efficient Distributed Training. |
ICDCS |
2023 |
DBLP DOI BibTeX RDF |
|
26 | Marcin Chrapek, Mikhail Khalilov, Torsten Hoefler |
HEAR: Homomorphically Encrypted Allreduce. |
SC |
2023 |
DBLP DOI BibTeX RDF |
|
26 | Kartik Lakhotia, Kelly Isham, Laura Monroe, Maciej Besta, Torsten Hoefler, Fabrizio Petrini |
In-network Allreduce with Multiple Spanning Trees on PolarFly. |
SPAA |
2023 |
DBLP DOI BibTeX RDF |
|
26 | Kartik Lakhotia, Fabrizio Petrini, Rajgopal Kannan, Viktor K. Prasanna |
Accelerating Allreduce With In-Network Reduction on Intel PIUMA. |
IEEE Micro |
2022 |
DBLP DOI BibTeX RDF |
|
26 | Shigang Li 0002, Torsten Hoefler |
Near-Optimal Sparse Allreduce for Distributed Deep Learning. |
CoRR |
2022 |
DBLP BibTeX RDF |
|
26 | Sam White, Laxmikant V. Kalé |
Optimizing Non-commutative Allreduce Over Virtualized, Migratable MPI Ranks. |
IPDPS Workshops |
2022 |
DBLP DOI BibTeX RDF |
|
26 | Zeyu Zhang, Yongmei Lei, Dongxia Wang, Guozheng Wang |
Distributed ADMM Based on Sparse Computation and Allreduce Communication. |
ISPA/BDCloud/SocialCom/SustainCom |
2022 |
DBLP DOI BibTeX RDF |
|
26 | Shigang Li 0002, Torsten Hoefler |
Near-optimal sparse allreduce for distributed deep learning. |
PPoPP |
2022 |
DBLP DOI BibTeX RDF |
|
26 | Truong Thao Nguyen, Mohamed Wahib, Ryousei Takano |
Efficient MPI-AllReduce for large-scale deep learning on GPU-clusters. |
Concurr. Comput. Pract. Exp. |
2021 |
DBLP DOI BibTeX RDF |
|
26 | Dongxia Wang 0003, Yongmei Lei, Jinyang Xie, Guozheng Wang |
HSAC-ALADMM: an asynchronous lazy ADMM algorithm based on hierarchical sparse allreduce communication. |
J. Supercomput. |
2021 |
DBLP DOI BibTeX RDF |
|
26 | Yao Liu 0006, Junyi Zhang 0005, Shuo Liu 0002, Qiaoling Wang, Wangchen Dai, Ray Chak-Chung Cheung |
Scalable Fully Pipelined Hardware Architecture for In-Network Aggregated AllReduce Communication. |
IEEE Trans. Circuits Syst. I Regul. Pap. |
2021 |
DBLP DOI BibTeX RDF |
|
26 | Andreas Jocksch, Noé Ohana, Emmanuel Lanti, Eirini Koutsaniti, Vasileios Karakasis, Laurent Villard |
An optimisation of allreduce communication in message-passing systems. |
Parallel Comput. |
2021 |
DBLP DOI BibTeX RDF |
|
26 | Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li 0002, Torsten Hoefler |
Flare: Flexible In-Network Allreduce. |
CoRR |
2021 |
DBLP BibTeX RDF |
|
26 | Adrián Castelló 0001, Enrique S. Quintana-Ortí, José Duato |
Accelerating distributed deep neural network training with pipelined MPI allreduce. |
Clust. Comput. |
2021 |
DBLP DOI BibTeX RDF |
|
26 | Ido Hakimi, Rotem Zamir Aviv, Kfir Y. Levy, Assaf Schuster |
LAGA: Lagged AllReduce with Gradient Accumulation for Minimal Idle Time. |
ICDM |
2021 |
DBLP DOI BibTeX RDF |
|
26 | Adrián Castelló 0001, Mar Catalán, Manuel F. Dolz, José I. Mestre, Enrique S. Quintana-Ortí, José Duato |
Evaluation of MPI Allreduce for Distributed Training of Convolutional Neural Networks. |
PDP |
2021 |
DBLP DOI BibTeX RDF |
|
26 | Truong Thao Nguyen, Mohamed Wahib |
An Allreduce Algorithm and Network Co-design for Large-Scale Training of Distributed Deep Learning. |
CCGRID |
2021 |
DBLP DOI BibTeX RDF |
|
26 | Akira Nukada |
Performance Optimization of Allreduce Operation for Multi-GPU Systems. |
IEEE BigData |
2021 |
DBLP DOI BibTeX RDF |
|
26 | Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li 0002, Torsten Hoefler |
Flare: flexible in-network allreduce. |
SC |
2021 |
DBLP DOI BibTeX RDF |
|
26 | Jiayu Wang, Peng Liu, Zehua Guo 0001, Sen Liu 0002, Chao Yao |
Exploring the Impact of Attacks on Ring AllReduce. |
APNet |
2021 |
DBLP DOI BibTeX RDF |
|
26 | Andreas Jocksch, Noé Ohana, Emmanuel Lanti, Vasileios Karakasis, Laurent Villard |
Optimised allgatherv, reduce_scatter and allreduce communication in message-passing systems. |
CoRR |
2020 |
DBLP BibTeX RDF |
|
26 | Dmitry Kolmakov, Xuecang Zhang |
A Generalization of the Allreduce Operation. |
CoRR |
2020 |
DBLP BibTeX RDF |
|
26 | Xinchen Wan, Hong Zhang 0025, Hao Wang 0116, Shuihai Hu, Junxue Zhang 0001, Kai Chen 0005 |
RAT - Resilient Allreduce Tree for Distributed Machine Learning. |
APNet |
2020 |
DBLP DOI BibTeX RDF |
|
26 | Zehua Cheng, Zhenghua Xu |
Bandwidth Reduction using Importance Weighted Pruning on Ring AllReduce. |
CoRR |
2019 |
DBLP BibTeX RDF |
|
26 | Amanda Bienz, Luke N. Olson, William D. Gropp |
Node-Aware Improvements to Allreduce. |
CoRR |
2019 |
DBLP BibTeX RDF |
|
26 | Truong Thao Nguyen, Mohamed Wahib, Ryousei Takano |
Topology-aware Sparse Allreduce for Large-scale Deep Learning. |
IPCCC |
2019 |
DBLP DOI BibTeX RDF |
|
26 | Yuichiro Ueno, Rio Yokota |
Exhaustive Study of Hierarchical AllReduce Patterns for Large Messages Between GPUs. |
CCGRID |
2019 |
DBLP DOI BibTeX RDF |
|
26 | Truong Thao Nguyen, Mohamed Wahib, Ryousei Takano |
Hierarchical Distributed-Memory Multi-Leader MPI-Allreduce for Deep Learning Workloads. |
CANDAR Workshops |
2018 |
DBLP DOI BibTeX RDF |
|
26 | Martin Ruefenacht, Mark Bull, Stephen Booth |
Generalisation of recursive doubling for AllReduce: Now with simulation. |
Parallel Comput. |
2017 |
DBLP DOI BibTeX RDF |
|
26 | Jesús M. Álvarez Llorente, Juan Carlos Díaz Martín, Juan A. Rico-Gallego |
Formal modeling and performance evaluation of a run-time rank remapping technique in Broadcast, Allgather and Allreduce MPI collective operations. |
CCGrid |
2017 |
DBLP DOI BibTeX RDF |
|
26 | Martin Ruefenacht, Mark Bull, Stephen Booth |
Generalisation of Recursive Doubling for AllReduce. |
EuroMPI |
2016 |
DBLP DOI BibTeX RDF |
|
26 | Patrick M. Widener, Kurt B. Ferreira, Scott Levy, Torsten Hoefler |
Exploring the effect of noise on the performance benefit of nonblocking allreduce. |
EuroMPI/ASIA |
2014 |
DBLP DOI BibTeX RDF |
|
26 | Keichi Takahashi, Dashdavaa Khureltulga, Yasuhiro Watashiba, Yoshiyuki Kido, Susumu Date, Shinji Shimojo |
Performance evaluation of SDN-enhanced MPI allreduce on a cluster system with fat-tree interconnect. |
HPCS |
2014 |
DBLP DOI BibTeX RDF |
|
26 | Lena Oden, Benjamin Klenk, Holger Fröning |
Energy-Efficient Collective Reduce and Allreduce Operations on Distributed GPUs. |
CCGRID |
2014 |
DBLP DOI BibTeX RDF |
|
26 | Huasha Zhao, John F. Canny |
Kylix: A Sparse Allreduce for Commodity Clusters. |
ICPP |
2014 |
DBLP DOI BibTeX RDF |
|
26 | Huasha Zhao, John F. Canny |
Sparse Allreduce: Efficient Scalable Communication for Power-Law Data. |
CoRR |
2013 |
DBLP BibTeX RDF |
|
26 | Nongda Hu, Dawei Wang, Zheng Cao, Xuejun An, Ninghui Sun |
Accelerating Allreduce Operation: A Switch-Based Solution. |
ICCCN |
2013 |
DBLP DOI BibTeX RDF |
|
26 | Krishna Chaitanya Kandalla, Akshay Venkatesh, Khaled Hamidouche, Sreeram Potluri, Devendar Bureddy, Dhabaleswar K. Panda 0001 |
Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters. |
Hot Interconnects |
2013 |
DBLP DOI BibTeX RDF |
|
26 | Krishna Chaitanya Kandalla, Ulrike Meier Yang, Jeff Keasler, Tzanio V. Kolev, Adam Moody, Hari Subramoni, Karen Tomko, Jérôme Vienne, Bronis R. de Supinski, Dhabaleswar K. Panda 0001 |
Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers. |
IPDPS |
2012 |
DBLP DOI BibTeX RDF |
|
26 | Toshiyuki Imamura |
Recursive multi-factoring algorithm for MPI allreduce. |
Parallel and Distributed Computing and Networks |
2007 |
DBLP BibTeX RDF |
|
21 | Nikhil Jain, Yogish Sabharwal |
Optimal bucket algorithms for large MPI collectives on torus interconnects. |
ICS |
2010 |
DBLP DOI BibTeX RDF |
communication, MPI, collective, torus network |
21 | Sameer Kumar 0001, Gábor Dózsa, Jeremy Berg, Bob Cernohous, Douglas Miller, Joe Ratterman, Brian E. Smith, Philip Heidelberger |
Architecture of the Component Collective Messaging Interface. |
PVM/MPI |
2008 |
DBLP DOI BibTeX RDF |
|
21 | Sundeep Narravula, Amith R. Mamidala, Abhinav Vishnu, Gopalakrishnan Santhanaraman, Dhabaleswar K. Panda 0001 |
High Performance MPI over iWARP: Early Experiences. |
ICPP |
2007 |
DBLP DOI BibTeX RDF |
|
21 | José Carlos Sancho, Darren J. Kerbyson, Kevin J. Barker |
Efficient offloading of collective communications in large-scale systems. |
CLUSTER |
2007 |
DBLP DOI BibTeX RDF |
|
21 | Trammell Hudson, Ron Brightwell |
Poster reception - Network performance impact of a lightweight Linux for Cray XT3 compute nodes. |
SC |
2006 |
DBLP DOI BibTeX RDF |
|
21 | Ernie Chan, Robert A. van de Geijn, William Gropp, Rajeev Thakur |
Collective communication on architectures that support simultaneous communication over multiple links. |
PPoPP |
2006 |
DBLP DOI BibTeX RDF |
|
21 | Dongyoung Kim, Dongseung Kim |
Enhanced Collective Communication Functions Using Factorization and Pairwise-exchange Communication. |
ICPADS (1) |
2005 |
DBLP DOI BibTeX RDF |
|
Displaying results #1 - #68 of 68