Prof. Peter Pietzuch (Imperial College London)
hosted by Jonathan Mace
"Making Distributed Deep Learning Adaptive"
When using distributed machine learning (ML) systems to train models on a cluster of worker machines, users must configure a large number of parameters: hyper-parameters (e.g. the batch size and the learning rate) affect model convergence; system parameters (e.g. the number of workers and their communication topology) impact training performance. Some of these parameters, such as the number of workers, may also change in elastic machine learning scenarios. In current systems, adapting such parameters during training is ill-supported.
In this talk, I will describe our recent work on KungFu, a distributed deep learning library for TensorFlow and PyTorch that is designed to enable adaptive and elastic training. KungFu allows users to express high-level Adaptation Policies (APs) that describe how to change hyper- and system parameters during training. APs take real-time monitored metrics (e.g. signal-to-noise ratios) as input and trigger control actions (e.g. cluster rescaling or synchronisation strategy updates). For execution, APs are translated into monitoring and control operators that are embedded in the dataflow graph. APs exploit an efficient asynchronous collective communication layer, which ensures concurrency and consistency of monitoring and adaptation operations.
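To make the idea of an Adaptation Policy concrete, here is a minimal, purely illustrative sketch in plain Python. It does not use KungFu's actual API: the names `TrainingState`, `Metrics` and `snr_rescale_policy`, the threshold value and the "double the workers" rule are all assumptions made up for this example. It only shows the general shape described above, namely a function that takes a monitored metric as input and returns a control decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingState:
    """Hypothetical snapshot of the adaptable parameters."""
    num_workers: int   # system parameter
    batch_size: int    # hyper-parameter (global batch size)


@dataclass(frozen=True)
class Metrics:
    """Hypothetical monitored metrics reported at a training step."""
    step: int
    gradient_snr: float  # signal-to-noise ratio of the gradients


def snr_rescale_policy(state: TrainingState,
                       metrics: Metrics,
                       snr_threshold: float = 1.0,
                       max_workers: int = 16) -> TrainingState:
    """Illustrative policy: when the gradient SNR drops below a threshold,
    double the number of workers (up to max_workers) and keep the
    per-worker batch size constant, so the global batch size grows too.
    Otherwise leave the state unchanged."""
    if metrics.gradient_snr >= snr_threshold:
        return state
    if state.num_workers >= max_workers:
        return state
    per_worker_batch = state.batch_size // state.num_workers
    new_workers = min(state.num_workers * 2, max_workers)
    return TrainingState(num_workers=new_workers,
                         batch_size=per_worker_batch * new_workers)


if __name__ == "__main__":
    state = TrainingState(num_workers=4, batch_size=256)
    # Simulated metric stream; a real system would monitor these online.
    for step, snr in [(100, 2.0), (200, 0.5), (300, 0.4)]:
        state = snr_rescale_policy(state, Metrics(step, snr))
        print(step, state)
```

In a real system the returned state would be turned into control actions such as cluster rescaling; here the policy is kept as a pure function so that the decision logic can be read and tested in isolation.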
Bio: Peter Pietzuch is a Professor of Distributed Systems at Imperial College London, where he leads the Large-scale Data & Systems (LSDS) group (http://lsds.doc.ic.ac.uk). His research work focuses on the design and engineering of scalable, reliable and secure large-scale software systems, with a particular interest in performance, data management and security issues. He has published papers in premier scientific venues, including OSDI/SOSP, SIGMOD, VLDB, ASPLOS, USENIX ATC, EuroSys, SoCC, ICDCS and Middleware. Currently he is a Visiting Researcher with Microsoft Research and serves as the Director of Research in the Department, the Chair of the ACM SIGOPS European Chapter, and an Associate Editor for IEEE TKDE and TCC. Before joining Imperial College London, he was a post-doctoral Fellow at Harvard University. He holds PhD and MA degrees from the University of Cambridge.
Time: Wednesday, 14.07.2021, 10:00