Understanding Deep Learning through Over-parameterization: From Kernel Regime to Feature Learning

Date:

Understanding the learning dynamics of neural networks trained with (stochastic) gradient descent is a long-standing goal of deep learning theory. This talk traces the shift from the neural tangent kernel (NTK) regime to feature learning dynamics. The NTK has been a powerful tool for understanding the optimization and generalization of over-parameterized networks. We first introduce the foundations of the NTK, together with its applications to neural architecture search and active learning. More recent work, however, has found that neural networks perform feature learning during gradient descent training, a behavior the fixed-kernel description cannot capture. We then discuss how feature learning emerges and how this perspective clarifies the role of graph convolution in graph neural networks.
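For readers unfamiliar with the central object of the talk: the empirical NTK of a network f(x; theta) is the kernel Theta(x, x') = <grad_theta f(x; theta), grad_theta f(x'; theta)>, and in the infinite-width kernel regime it stays essentially fixed throughout training, whereas feature learning corresponds to the kernel itself evolving. Below is a minimal, illustrative JAX sketch of this quantity for a toy one-hidden-layer network; it is not material from the talk, and the names mlp and empirical_ntk, the tanh activation, and all sizes are hypothetical choices for this example.

```python
import jax
import jax.numpy as jnp

def mlp(params, x):
    # Toy scalar-output, one-hidden-layer network in NTK parameterization:
    # the 1/sqrt(width) factor keeps the kernel well behaved as width grows.
    W1, w2 = params
    return jnp.dot(w2, jnp.tanh(W1 @ x)) / jnp.sqrt(W1.shape[0])

def empirical_ntk(apply_fn, params, x1, x2):
    # Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>, with the inner
    # product taken over every parameter tensor in the model.
    g1 = jax.grad(apply_fn)(params, x1)
    g2 = jax.grad(apply_fn)(params, x2)
    return sum(jnp.vdot(a, b)
               for a, b in zip(jax.tree_util.tree_leaves(g1),
                               jax.tree_util.tree_leaves(g2)))

# Hypothetical sizes for the illustration.
key1, key2 = jax.random.split(jax.random.PRNGKey(0))
width, dim = 512, 4
params = (jax.random.normal(key1, (width, dim)),   # W1
          jax.random.normal(key2, (width,)))       # w2
print(empirical_ntk(mlp, params, jnp.ones(dim), -jnp.ones(dim)))
```

In the kernel regime, training-set predictions evolve linearly under this fixed kernel; the feature learning results discussed in the talk concern settings where the kernel changes appreciably during training.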