No Free Lunch in Neural Architectures? A Joint Analysis of Expressivity, Convergence, and Generalization.
Published in Auto-ML, 2023
Download here
Published in Auto-ML, 2023
Download here
Published in TMLR, 2023
This paper proposes a theoretical convergence and generalization analysis for Deep PAC-Bayesian learning. For a deep and wide probabilistic neural network, our analysis shows that PAC-Bayesian learning corresponds to solving a kernel ridge regression when the probabilistic neural tangent kernel (PNTK) is used as the kernel.
Download here
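As a rough illustration of the kernel ridge regression that the abstract above refers to, the following sketch fits KRR on toy data. The RBF kernel is only a stand-in: the paper's PNTK is not reproduced here, and the function names, the `ridge` parameter, and the data are assumptions made for this example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Stand-in kernel (assumption): the PNTK from the paper is NOT implemented here.
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-sq_dists / (2.0 * length_scale**2))

def krr_fit(x_train, y_train, ridge=1e-3, kernel=rbf_kernel):
    # Solve (K + ridge * I) alpha = y for the dual coefficients alpha.
    k = kernel(x_train, x_train)
    return np.linalg.solve(k + ridge * np.eye(len(x_train)), y_train)

def krr_predict(x_test, x_train, alpha, kernel=rbf_kernel):
    # Prediction is a kernel-weighted sum over the training points.
    return kernel(x_test, x_train) @ alpha

# Toy 1D regression problem (hypothetical data).
x_train = np.linspace(-3.0, 3.0, 20).reshape(-1, 1)
y_train = np.sin(x_train).ravel()
alpha = krr_fit(x_train, y_train)
y_fit = krr_predict(x_train, x_train, alpha)
```

Swapping `rbf_kernel` for another positive semi-definite kernel function leaves the rest of the sketch unchanged, which is the sense in which the choice of kernel determines the learned predictor.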
Published in NeurIPS, 2022
This paper is the first to use a bi-level formulation to estimate the relative importance of peers in closed form, further boosting the effectiveness of distillation among peers. Extensive experiments demonstrate the generalization of the proposed framework, which outperforms existing online distillation methods on a variety of deep neural networks.
Download here
Published in NeurIPS, 2022
In this paper, by exploring the connection between the generalization performance and the training dynamics, we propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics. In particular, we prove that the convergence speed of training and the generalization performance are positively correlated under the ultra-wide condition and show that maximizing the training dynamics leads to better generalization performance.
Download here
Published in NeurIPS, 2022
In this work, we leverage influence functions, the functional derivatives of the loss function, to theoretically analyze operation selection in DARTS and to estimate the importance of each candidate operation by approximating its influence on the supernet with Taylor expansions. We show that operation strength is related not only to the magnitude but also to second-order information, leading to a fundamentally new criterion for operation selection in DARTS, named Influential Magnitude.
Download here
Published in NeurIPS, 2022
We theoretically characterize the impact of connectivity patterns on the convergence of DNNs under gradient descent training in fine granularity. By analyzing a wide network’s Neural Network Gaussian Process (NNGP), we are able to depict how the spectrum of an NNGP kernel propagates through a particular connectivity pattern, and how that affects the bound of convergence rates.
Download here
Published in Knowledge-Based Systems, 2022
We formulate the performance of GNNs mathematically with respect to the properties of their edges, elucidating how the performance drop can be avoided by pruning negative edges and nonbridges. This leads to our simple but effective two-step method for GNN pruning, which applies saliency metrics to prune the network while sparsifying the graph so that loss performance is preserved.
Download here
Published in ICLR, 2022
This work targets the automated design and scaling of Vision Transformers (ViTs). We propose As-ViT, a training-free auto-scaling framework for ViTs that automatically discovers and scales up ViTs in an efficient and principled manner.
Download here
Published in ICLR, 2022
This work exploits the Graph Neural Tangent Kernel (GNTK), which governs the optimization trajectory under gradient descent for wide GCNs. We formulate the asymptotic behavior of the GNTK in the large-depth limit, which enables us to show that the trainability of wide and deep GCNs drops at an exponential rate during optimization.
Download here
Published in NeurIPS, 2021
We establish an equivalence between NNs and SVMs, specifically between the infinitely wide NN trained with soft-margin loss and the standard soft-margin SVM with the NTK trained by subgradient descent.
Download here
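To make the SVM side of this equivalence concrete, here is a minimal sketch of a soft-margin SVM trained by subgradient descent on the hinge loss. It uses a plain linear model rather than the NTK, and the function name, hyperparameters, and toy data are assumptions for illustration, not the paper's setup.

```python
import numpy as np

def train_svm_subgradient(x, y, reg=0.01, lr=0.1, steps=500):
    # Minimizes reg/2 * ||w||^2 + mean(max(0, 1 - y * (x @ w + b)))
    # with a fixed step size (all hyperparameters are assumptions).
    n, d = x.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        margins = y * (x @ w + b)
        active = margins < 1.0  # points that currently violate the margin
        # Subgradient of the hinge term is -y_i * x_i on active points, 0 elsewhere.
        grad_w = reg * w - (y[active][:, None] * x[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy, linearly separable data (hypothetical).
x = np.array([[2.0, 2.0], [3.0, 1.5], [2.5, 3.0],
              [-2.0, -2.0], [-3.0, -1.0], [-1.5, -2.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
w, b = train_svm_subgradient(x, y)
pred = np.sign(x @ w + b)
```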
Published in Pattern Recognition Letters, 2021
To address such shortcomings, we propose a Gaussian Process Latent Variable Model Factorization (GPLVMF) method, in which we apply an appropriate prior to the original GP model.
Download here
Published in IJCAI, 2021
In this work, we use the neural tangent kernel (NTK) to study the dynamics of ultra-wide networks with orthogonal initialization across a range of architectures, including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs).
Download here
Published in ECAI, 2020
We perform theoretical computation on linear dropout networks and a series of experiments on dropout networks with different activation functions.
Download here
Published in Physical Review E, 2018
We study critical bond percolation on a seven-dimensional (7D) hypercubic lattice with periodic boundary conditions and on the complete graph (CG) of finite volume (number of vertices).
Download here
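For readers unfamiliar with bond percolation on the complete graph, the sketch below occupies each edge independently with probability p and measures the largest cluster with a union-find structure. It is a small illustrative simulation, not the paper's method: the function name, the system size, and the use of p = 1/n as the near-critical value are assumptions of this example, and the 7D lattice case is not covered.

```python
import random

def largest_cluster_complete_graph(n, p, seed=0):
    # Bond percolation on the complete graph with n vertices: keep each of
    # the n*(n-1)/2 edges independently with probability p, then return the
    # size of the largest connected cluster.
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # edge (i, j) is occupied
                ri, rj = find(i), find(j)
                if ri != rj:
                    if size[ri] < size[rj]:
                        ri, rj = rj, ri
                    parent[rj] = ri  # union by size
                    size[ri] += size[rj]
    return max(size[find(v)] for v in range(n))

n = 200
c1 = largest_cluster_complete_graph(n, 1.0 / n)  # near the threshold p = 1/n
```

Repeating the call over many seeds and several values of n would give the finite-size behavior of the largest cluster; a single run, as here, only shows the mechanics.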
Published in Computer Physics Communications, 2016
This work presents an adaptive multi-GPU Exchange Monte Carlo approach for the simulation of the 3D Random Field Ising Model (RFIM).
Download here
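The core step of Exchange Monte Carlo (also known as parallel tempering) is the replica swap between neighbouring temperatures. The sketch below shows the standard Metropolis swap criterion on the CPU; the adaptive temperature selection and multi-GPU aspects of the paper are not represented, and the function names and data layout are assumptions for illustration.

```python
import math
import random

def swap_acceptance(beta_i, beta_j, energy_i, energy_j):
    # Standard Metropolis criterion for exchanging two replicas:
    # min(1, exp((beta_i - beta_j) * (energy_i - energy_j))).
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_neighbor_swaps(betas, states, energies, rng):
    # Try to exchange each pair of neighbouring temperatures once, in place.
    # `states` holds whatever represents a replica configuration (hypothetical).
    for k in range(len(betas) - 1):
        prob = swap_acceptance(betas[k], betas[k + 1], energies[k], energies[k + 1])
        if rng.random() < prob:
            states[k], states[k + 1] = states[k + 1], states[k]
            energies[k], energies[k + 1] = energies[k + 1], energies[k]

# Two replicas: the colder one (beta = 1.0) currently has the higher energy,
# so the swap is accepted with probability 1.
betas = [1.0, 0.5]
states = ["a", "b"]
energies = [-5.0, -10.0]
attempt_neighbor_swaps(betas, states, energies, random.Random(0))
```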