About Me
I am an AI system researcher working at FriendliAI. I received my PhD from the Software Platform Lab (SPL) at Seoul National University (SNU), advised by Prof. Byung-Gon Chun. My interests include (but are not limited to) optimizing training and inference of large-scale deep learning models, developing large-scale natural language models, and building distributed data processing systems. I’ve also been participating in many open-source projects, including Apache Nemo and Apache REEF.
- Email: gyewonleek at gmail dot com
- LinkedIn: https://www.linkedin.com/in/gyewon-lee-284236223/
- GitHub: https://github.com/differentsc
Career
- 2022-Present, AI system researcher, FriendliAI
- 2015-2022, Ph.D. in Computer Science and Engineering, Seoul National University
  - Dissertation: Semantic-Aware Data Management for Data Processing and Deep Learning
- 2015 Summer, Research Intern, Microsoft Research Asia
- 2011-2015, B.S. in Computer Science and Engineering, Seoul National University
Technical & Research Interests
- Systematic Optimization of DL Pipelines
  - Multi-GPU & Multi-Node Training (PyTorch DDP, GPipe, Megatron-LM)
  - Data Preprocessing and Augmentation (tf.data, PyTorch DataLoader, DALI)
  - Multiprocessing for Python-based DL frameworks
- Large-Scale Deep Neural Networks
  - Pretrained language models (GPT-3, DALL-E, Codex)
- Large-Scale Data Processing
  - Distributed Data Processing (Spark, Nemo, REEF)
  - Real-Time Stream Processing (Flink, Storm)
  - Persistent KV stores (RocksDB, Microsoft FASTER) on SSDs
Featured Publications
- Gyewon Lee, Jaewoo Maeng, Jinsol Park, Jangho Seo, Haeyoon Cho, Youngseok Yang, Taegeon Um, Jongsung Lee, Jae W. Lee, Byung-Gon Chun (2023). FlowKV: A Semantic-Aware Store for Large-Scale State Management of Stream Processing Engines. ACM EuroSys 2023. [Paper]
- Dohyeon Lee, Jaeseong Lee, Gyewon Lee, Seung-Won Hwang, Byung-Gon Chun (2021). SCOPA: Soft Code-Switching and Pairwise Alignment for Zero-Shot Cross-lingual Transfer. ACM CIKM 2021. [Paper]
- Won Wook Song, Youngseok Yang, Jeongyoon Eo, Jangho Seo, Joo Yeon Kim, Sanha Lee, Gyewon Lee, Taegeon Um, Haeyoon Cho, Byung-Gon Chun (2021). Apache Nemo: A Framework for Optimizing Distributed Data Processing. ACM TOCS 2021. [Paper]
- Gyewon Lee, Irene Lee, Hyeonmin Ha, Kyunggeun Lee, Hwarim Hyun, Ahnjae Shin, Byung-Gon Chun (2021). Refurbish Your Training Data: Reusing Partially Augmented Samples for Faster Deep Neural Network Training. USENIX ATC 2021. [Paper]
- Taegeon Um, Gyewon Lee, Byung-Gon Chun (2021). Pluto: High-Performance IoT-Aware Stream Processing. IEEE ICDCS 2021. [Paper]
- Gyewon Lee, Jeongyoon Eo, Jangho Seo, Taegeon Um, Byung-Gon Chun (2018). High-Performance Stateful Stream Processing on Solid-State Drives. ACM APSys 2018. [Paper]
- Byung-Gon Chun, Tyson Condie, Yingda Chen, Brian Cho, Andrew Chung, Carlo Curino, Chris Douglas, Matteo Interlandi, Beomyeol Jeon, Joo Seong Jeong, Gyewon Lee, Yunseong Lee, Tony Majestro, Dahlia Malkhi, Sergiy Matusevych, Brandon Myers, Mariia Mykhailova, Shravan M. Narayanamurthy, Joseph Noor, Raghu Ramakrishnan, Sriram Rao, Russell Sears, Beysim Sezgin, Taegeon Um, Julia Wang, Markus Weimer, Youngseok Yang (2017). Apache REEF: Retainable Evaluator Execution Framework. ACM TOCS 2017. [Paper]
- Taegeon Um, Gyewon Lee, Sanha Lee, Kyungtae Kim, Byung-Gon Chun (2017). Scaling Up IoT Stream Processing. ACM APSys 2017. [Paper]
Talks
- 2023, ACM EuroSys, Rome, Italy, FlowKV: A Semantic-Aware Store for Large-Scale State Management of Stream Processing Engines. [Program]
- 2022, Naver Techtalk, Pangyo, Korea, Revamper: A Smart Caching System for Faster DNN Training with Data Augmentation.
- 2021, USENIX ATC, Online, Refurbish Your Training Data: Reusing Partially Augmented Samples for Faster Deep Neural Network Training. [Slides] [Video]
- 2020, Hyperconnect Seminar, Seoul, Korea, Effective State Management in Stream Analytics using Persistent Storage.
- 2019, Naver Techtalk, Pangyo, Korea, Data Access-Pattern Aware Streaming Analytics on SSDs.
- 2018, ACM APSys, Jeju, Korea, High-Performance Stateful Stream Processing on Solid-State Drives.
- 2017, Naver Deview, Seoul, Korea, MIST: A High-Performance IoT Stream Processing System.
Open-Source Projects
- 2018-Present, Apache Nemo (incubating), PMC and committer [GitHub]
- 2019-2021, Google Summer of Code mentor (The Apache Software Foundation)
- 2014-2018, Apache REEF, PMC and committer [GitHub]
- 2016-2018, MIST, core developer [GitHub]
Teaching Assistant
- 2018-2019, DS2 (Spark SQL, Spark Streaming) @Samsung Electronics
- 2016, Operating Systems (Tizen, Linux Kernel) @Seoul National University