IEEE/IEIE ICCE-Asia 2023
December 10, 2023
Unlocking the potential of Generative AI by model compression
Dr. Hyungjun Kim
- CEO of SqueezeBits Inc.
With the emergence of ChatGPT, interest in large-scale generative AI models is growing rapidly. Deep learning models have been steadily growing in size since 2012, and models with over 1 billion parameters are now common.
This increase in model size means more hardware resources are required to run the models, which raises the cost of AI-based services and limits the environments in which they can be deployed.
AI model compression addresses these problems by reducing model size or accelerating inference while preserving model performance. This enables AI services to run at a lower cost or with faster inference speeds. Common compression techniques include quantization, pruning, and knowledge distillation. In this talk, several practical approaches to compressing large-scale generative AI models such as Stable Diffusion and LLMs will be discussed.
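To make the idea concrete, the sketch below shows one of the techniques mentioned, quantization, in its simplest form: symmetric per-tensor INT8 quantization of a weight array. This is an illustrative example only, not the specific method presented in the talk; the function names and the use of NumPy are assumptions for the sketch.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization (illustrative sketch)."""
    scale = np.max(np.abs(w)) / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an FP32 approximation of the original weights."""
    return q.astype(np.float32) * scale

# A 4x4 FP32 weight tensor shrinks to 1/4 of its size in INT8,
# at the cost of a small, bounded rounding error per weight.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q.dtype, w_hat.dtype)                # int8 float32
print(float(np.max(np.abs(w - w_hat))))    # worst-case error, at most ~scale/2
```

Storing 8-bit integers instead of 32-bit floats cuts memory and bandwidth by 4x, which is one reason quantization is a standard first step when compressing large generative models.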
- Hyungjun Kim received his bachelor's and PhD degrees from Pohang University of Science and Technology (POSTECH). He worked at the Holst Centre in the Netherlands as a research intern on organic memory diode design from January to September 2015, and spent the summer of 2018 at IBM T.J. Watson Research Center working on in-memory neural network hardware design. After receiving his PhD, he worked as a researcher at the POSTECH Future IT Innovation Laboratory from 2021 to 2022. His research over the last 10 years has centered on hardware-algorithm co-design for efficient deep learning systems, with a particular focus on in-memory neural network accelerators and model compression techniques such as quantization, pruning, and knowledge distillation. Building on these research achievements, he founded SqueezeBits Inc., a startup building efficient AI models and systems, where he currently serves as CEO.