Benign Overfitting of Vision Transformers.
Invited Talk, The 14th American Institute of Mathematical Sciences, NYU Abu Dhabi
Invited Talk, The Third RIKEN AIP & A*STAR-CFAR Joint Workshop on Machine Learning and Artificial Intelligence, A*STAR, Singapore
Invited Talk, HKBU-RIKEN AIP Joint Workshop on Artificial Intelligence (AI) and Machine Learning (ML), Hong Kong
Invited Talk, The Machine Learning Summer School in Okinawa 2024, Okinawa Institute of Science and Technology (OIST), Japan
Contributed Talk, ICML 2023 workshop on High-Dimensional Learning Dynamics, Honolulu, Hawaii
Invited Talk, MBZUAI & RIKEN-AIP Joint Workshop, Nihonbashi, Tokyo
Invited Talk, Vector Institute & RIKEN AIP Joint Symposium on Machine Learning and Artificial Intelligence, Nihonbashi, Tokyo
Invited Talk, The Chinese University of Hong Kong, hosted by Dr. Fenglei Fan, Hong Kong, China
Understanding the learning dynamics of neural networks trained with (stochastic) gradient descent is a long-term goal of deep learning theory. This talk traces the shift from the neural tangent kernel (NTK) regime to feature learning dynamics. The NTK has been a powerful tool for understanding the optimization and generalization of over-parameterized networks; we first introduce its foundations along with applications to neural architecture search and active learning. More recent work has shown that neural networks perform feature learning during gradient descent training. We then discuss how feature learning emerges and how it helps explain the role of graph convolution in graph neural networks.
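As background, the NTK of a network f(x; \theta) with parameters \theta is the gradient-feature kernel

    \Theta(x, x') = \big\langle \nabla_\theta f(x; \theta), \nabla_\theta f(x'; \theta) \big\rangle,

and in the infinite-width limit this kernel stays (approximately) constant throughout gradient descent training, so the network's predictions evolve as in kernel regression with \Theta.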
Invited Talk, AI TIME, Sydney, Australia
We formulate the asymptotic behavior of the graph neural tangent kernel (GNTK) in the large-depth limit, which reveals that the trainability of wide and deep GCNs drops at an exponential rate during optimization. We further extend our theoretical framework to residual connection-based techniques, which we find can only mildly mitigate this exponential decay. Motivated by these insights on trainability, we propose Critical DropEdge, a connectivity-aware and graph-adaptive sampling method that addresses the exponential decay problem more fundamentally.
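Schematically (a simplified reading, with C and q as illustrative constants rather than quantities from the talk), the exponential decay of trainability can be pictured as the depth-L GNTK \Theta^{(L)} collapsing to a degenerate limit kernel at a geometric rate,

    \big\| \Theta^{(L)} - \Theta^{(\infty)} \big\| \le C\, q^{L}, \qquad 0 < q < 1,

so that at large depth, gradient descent is driven by a kernel that retains almost no input-dependent information.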
Seminar Talk, RIKEN AIP, hosted by A/Prof. Taiji Suzuki, Tokyo, Japan
Deep learning has been responsible for a step-change in performance across machine learning, setting new benchmarks in a large number of applications. During my Ph.D. study, I have sought to understand the theoretical properties of deep neural networks and to narrow the gap between theory and practice. This presentation introduces three concrete works on the neural tangent kernel (NTK), one of the seminal recent advances in deep learning theory.
Talk, Monash University, hosted by Prof. Reza Haffari, Melbourne, Australia
The learning dynamics of neural networks trained by gradient descent are captured by the so-called neural tangent kernel (NTK) in the infinite-width limit. The NTK has been a powerful tool for researchers to understand the optimization and generalization of over-parameterized networks. This talk introduces the foundations of the NTK together with its applications to orthogonally-initialized networks and ultra-wide graph networks.
Talk, Renmin University of China, hosted by A/Prof. Yong Liu, Beijing, China
Researchers have recently found that the dynamics of infinitely wide neural networks under gradient descent training are captured by the neural tangent kernel (NTK). With the help of the NTK, one can prove that over-parameterized neural networks trained by gradient descent converge to a global minimum, a milestone in deep learning theory. This talk presents the basic properties of the NTK and my research results building on it.