Publications

On the Comparison between Multi-modal and Single-modal Contrastive Learning.

Published in NeurIPS, 2024

In this work, we introduce a feature learning theory framework that provides a theoretical foundation for understanding the differences between multi-modal and single-modal contrastive learning.

Download here

Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization.

Published in NeurIPS, 2024

This work examines transformers in vision from the perspective of benign overfitting.

Download here

Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning.

Published in NeurIPS, 2024

This work provides a fine-grained mathematical analysis to show how transformers leverage the multi-concept semantics of words to enable powerful ICL and excellent out-of-distribution ICL abilities, offering insights into how transformers innovate solutions for certain unseen tasks encoded with multiple cross-concept semantics.

Download here

Federated Learning from Vision-Language Foundation Models: Theoretical Analysis and Method.

Published in NeurIPS, 2024

In this work, we construct a theoretical analysis framework for prompt-based federated learning via feature learning theory.

Download here

Provable and Efficient Dataset Distillation for Kernel Ridge Regression

Published in NeurIPS, 2024

In this paper, by focusing on dataset distillation for kernel ridge regression (KRR), we show that one data point per class is already necessary and sufficient to recover the original model's performance in many settings.

Download here

SLTrain: a sparse plus low rank approach for parameter and memory efficient pretraining

Published in NeurIPS, 2024

In this work, we propose to parameterize the weights as a sum of low-rank and sparse matrices for pretraining, which we call SLTrain.

Download here

Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability.

Published in NeurIPS, 2024

We investigate the non-convex dynamics of a one-layer linear causal self-attention model autoregressively trained by gradient flow, where the sequences are generated by an AR process.

Download here

Diffusion Models Demand Contrastive Guidance for Adversarial Purification to Advance.

Published in ICML, 2024

In this work, we propose to guide diffusion models for adversarial purification using contrastive guidance.

Download here

Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples.

Published in ICML, 2024

We provably show that both uncertainty-based and diversity-based NAL are inherently amenable to one and the same principle, i.e., striving to prioritize samples that contain yet-to-be-learned features.

Download here

The Heterophilic Snowflake Hypothesis: Training and Empowering GNNs for Heterophilic Graphs.

Published in KDD, 2024

In this paper, we transfer the prevailing concept of “one node one receptive field” to the heterophilic graph.

Download here

Global and Local Prompts Cooperation via Optimal Transport for Federated Learning.

Published in CVPR, 2024

We present Federated Prompts Cooperation via Optimal Transport (FedOTP), which introduces efficient collaborative prompt learning strategies to capture diverse category traits on a per-client basis.

Download here

Graph Lottery Ticket Automated.

Published in ICLR, 2024

This paper introduces an Adaptive, Dynamic, and Automated framework for identifying Graph Lottery Tickets (AdaGLT).

Download here

Understanding Convergence and Generalization in Federated Learning through Feature Learning Theory.

Published in ICLR, 2024

This work aims to establish a unified theoretical foundation for understanding FL through feature learning theory.

Download here

Analyzing Generalization of Neural Networks through Loss Path Kernels.

Published in NeurIPS, 2023

We establish a new connection between the loss dynamics of gradient flow and general kernel machines by proposing a new kernel, called loss path kernel.

Download here

Fed-CO2: Cooperation of Online and Offline Models for Severe Data Heterogeneity in Federated Learning.

Published in NeurIPS, 2023

We propose Fed-CO2, a universal FL framework that handles both label distribution skew and feature skew within a Cooperation mechanism between the Online and Offline models.

Download here

Understanding and Improving Feature Learning for Out-of-Distribution Generalization.

Published in NeurIPS, 2023

We propose Feature Augmented Training (FeAT) to make the model learn richer features ready for OOD generalization.

Download here

Single-Pass Contrastive Learning Can Work for Both Homophilic and Heterophilic Graph.

Published in TMLR, 2023

We theoretically study the concentration property of features obtained by neighborhood aggregation on homophilic and heterophilic graphs, introduce the single-pass graph contrastive learning loss based on the property, and provide performance guarantees for the minimizer of the loss on downstream tasks.

Download here

No Free Lunch in Neural Architectures? A Joint Analysis of Expressivity, Convergence, and Generalization.

Published in AutoML, 2023

To facilitate the interpretation and understanding of architecture design by AutoML, we aim to connect the pieces into a bigger picture: how does the architecture jointly impact its expressivity, convergence, and generalization?

Download here

Analyzing Deep PAC-Bayesian Learning with Neural Tangent Kernel: Convergence, Analytic Generalization Bound, and Efficient Hyperparameter Selection.

Published in TMLR, 2023

This paper proposes a theoretical convergence and generalization analysis for Deep PAC-Bayesian learning. For a deep and wide probabilistic neural network, our analysis shows that PAC-Bayesian learning corresponds to solving a kernel ridge regression when the probabilistic neural tangent kernel (PNTK) is used as the kernel.

Download here

Weighted Mutual Learning with Diversity-Driven Model Compression.

Published in NeurIPS, 2022

This paper, for the first time, leverages a bi-level formulation to estimate the relative importance of peers in closed form, further boosting the effectiveness of their distillation from each other. Extensive experiments show the generalization of the proposed framework, which outperforms existing online distillation methods on a variety of deep neural networks.

Download here

Deep Active Learning by Leveraging Training Dynamics.

Published in NeurIPS, 2022

In this paper, by exploring the connection between the generalization performance and the training dynamics, we propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics. In particular, we prove that the convergence speed of training and the generalization performance are positively correlated under the ultra-wide condition and show that maximizing the training dynamics leads to better generalization performance.

Download here

Interpreting Operation Selection in Differentiable Architecture Search: A Perspective from Influence-Directed Explanations.

Published in NeurIPS, 2022

In this work, we leverage influence functions, the functional derivatives of the loss function, to theoretically reveal the operation selection part in DARTS and estimate the candidate operation importance by approximating its influence on the supernet with Taylor expansions. We show that the operation strength is related not only to the magnitude but also to second-order information, leading to a fundamentally new criterion for operation selection in DARTS, named Influential Magnitude.

Download here

Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis.

Published in NeurIPS, 2022

We theoretically characterize the impact of connectivity patterns on the convergence of DNNs under gradient descent training in fine granularity. By analyzing a wide network’s Neural Network Gaussian Process (NNGP), we are able to depict how the spectrum of an NNGP kernel propagates through a particular connectivity pattern, and how that affects the bound of convergence rates.

Download here

Pruning graph neural networks by evaluating edge properties.

Published in Knowledge-Based Systems, 2022

We formulate the performance of GNNs mathematically with respect to the properties of their edges, elucidating how the performance drop can be avoided by pruning negative edges and nonbridges. This leads to our simple but effective two-step method for GNN pruning, which applies saliency metrics for network pruning while sparsifying the graph in a way that preserves loss performance.

Download here

Auto-scaling Vision Transformers without Training

Published in ICLR, 2022

This work targets automated designing and scaling of Vision Transformers (ViTs). We propose As-ViT, an auto-scaling framework for ViTs without training, which automatically discovers and scales up ViTs in an efficient and principled manner.

Download here

Towards Deepening Graph Neural Networks: A GNTK-based Optimization Perspective

Published in ICLR, 2022

This work exploits the Graph Neural Tangent Kernel (GNTK), which governs the optimization trajectory under gradient descent for wide GCNs. We formulate the asymptotic behavior of the GNTK at large depth, which enables us to show that the trainability of wide and deep GCNs drops at an exponential rate during optimization.

Download here

On the Equivalence between Neural Network and Support Vector Machine

Published in NeurIPS, 2021

We propose to establish the equivalence between NN and SVM, specifically between the infinitely wide NN trained by soft margin loss and the standard soft margin SVM with NTK trained by subgradient descent.

Download here

Gaussian process latent variable model factorization for context-aware recommender systems

Published in Pattern Recognition Letters, 2021

In order to address such shortcomings, we propose a Gaussian Process Latent Variable Model Factorization (GPLVMF) method, where we apply an appropriate prior to the original GP model.

Download here

On the neural tangent kernel of deep networks with orthogonal initialization

Published in IJCAI, 2021

In this work, we study the dynamics of ultra-wide networks across a range of architectures, including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs) with orthogonal initialization via neural tangent kernel (NTK).

Download here

Mean field theory for deep dropout networks: digging up gradient backpropagation deeply

Published in ECAI, 2020

We perform theoretical computation on linear dropout networks and a series of experiments on dropout networks with different activation functions.

Download here

Critical percolation clusters in seven dimensions and on a complete graph

Published in Physical Review E, 2018

We study critical bond percolation on a seven-dimensional (7D) hypercubic lattice with periodic boundary conditions and on the complete graph (CG) of finite volume (number of vertices).

Download here

Adaptive multi-GPU exchange Monte Carlo for the 3D random field Ising model

Published in Computer Physics Communications, 2016

This work presents an adaptive multi-GPU Exchange Monte Carlo approach for the simulation of the 3D Random Field Ising Model (RFIM).

Download here

Wei Huang
