Arrow Research search

Author name cluster

Junchi Yan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

205 papers
2 author rows

Possible papers

205

AAAI Conference 2026 Conference Paper

On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD

  • Tongcheng Zhang
  • Zhanpeng Zhou
  • Mingze Wang
  • Andi Han
  • Wei Huang
  • Taiji Suzuki
  • Junchi Yan

One crucial factor behind the success of deep learning lies in the implicit bias induced by noise inherent in gradient-based training algorithms. Motivated by empirical observations that training with noisy labels improves model generalization, we delve into the underlying mechanisms behind stochastic gradient descent (SGD) with label noise. Focusing on a two-layer over-parameterized linear network, we analyze the learning dynamics of label noise SGD, unveiling a two-phase learning behavior. In Phase I, the magnitudes of model weights progressively diminish, and the model escapes the lazy regime; enters the rich regime. In Phase II, the alignment between model weights and the ground-truth interpolator increases, and the model eventually converges. Our analysis highlights the critical role of label noise in driving the transition from the lazy to the rich regime and minimally explains its empirical success. Furthermore, we extend these insights to Sharpness-Aware Minimization (SAM), showing that the principles governing label noise SGD also apply to broader optimization algorithms. Extensive experiments, conducted under both synthetic and real-world setups, strongly support our theory.

AAAI Conference 2026 Conference Paper

ST-TPP: Learning Semi-Transductive Temporal Point Processes with Gromov-Wasserstein Barycentric Regularization

  • Qingmei Wang
  • Tianyu Huang
  • Yujie Long
  • Yuxin Wu
  • Fanmeng Wang
  • Xi Sun
  • Junchi Yan
  • Hongteng Xu

The generative mechanisms behind real-world event sequences are often heterogeneous, leading to data that possesses inherent clustering structures. However, most existing temporal point processes (TPPs) treat different event sequences independently, without leveraging the clustering structures when predicting events. In this study, we design and learn a novel semi-transductive temporal point process (ST-TPP), which explicitly improves prediction performance by co-training sequence clusters. In particular, given a set of event sequences, our method learns a neural TPP together with cluster centers of the sequences. Besides maximizing the likelihood of the event sequences, we leverage a data-based kernel matrix and prior knowledge to regularize the sequence embeddings, leading to a Gromov-Wasserstein barycentric (GWB) regularizer. Based on the optimal transport plans associated with the GWB regularizer, we derive the cluster centers by the push-forward of the sequence embeddings. When a new sequence comes, the learned model first assigns a cluster center to the sequence and then jointly encodes the sequence and the cluster center to predict future events, leading to a semi-transductive prediction scheme. Experiments demonstrate that ST-TPP achieves competitive sequence clustering results and strong prediction performance.

AAAI Conference 2026 Conference Paper

Towards Real-Time Neutral Atom Array Assembly via Unsupervised Hologram Generation and Path Optimization

  • Ge Yan
  • Yuchen Wang
  • Junchi Yan

The rapid and reliable assembly of defect-free atom arrays poses a fundamental challenge for neutral atom quantum computing. While parallel rearrangement methods using spatial light modulators show promise, they suffer from significant overhead in two sub-tasks: atom-site matching and hologram generation. We propose a framework to address these bottlenecks and enhance the efficiency and fidelity of the assembly process. It features a new optimization objective for atom-site matching that minimizes the longest movement path, and a Fourier U-Net model that integrates Fourier operators with image-to-image translation to enable real-time hologram generation. The model is trained in a fully self-supervised paradigm, leveraging the physical properties of holography to remove the need for costly ground-truth labels. Experimental results show our framework not only significantly outperforms the state-of-the-art supervised CNN-based model but also achieves an inference speed orders of magnitude faster than traditional iterative algorithms, enabling real-time, dynamic atom rearrangement.

ICLR Conference 2025 Conference Paper

Beyond Circuit Connections: A Non-Message Passing Graph Transformer Approach for Quantum Error Mitigation

  • Tianyi Bao
  • Xinyu Ye
  • Hang Ruan
  • Chang Liu 0021
  • Wenjie Wu
  • Junchi Yan

Despite the progress in quantum computing, one major bottleneck against the practical utility is its susceptibility to noise, which frequently occurs in current quantum systems. Existing quantum error mitigation (QEM) methods either lack generality to noise and circuit types or fail to capture the global dependencies of entire systems in addition to circuit structure. In this work, we first propose a unique circuit-to-graph encoding scheme with qubit-wise noisy measurement aggregated. Then, we introduce GTranQEM, a non-message passing graph transformer designed to mitigate errors in expected circuit measurement outcomes effectively. GTranQEM is equipped with a quantum-specific positional encoding, a structure matrix as attention bias guiding nonlocal aggregation, and a virtual quantum-representative node to further grasp graph representations, which guarantees to model the long-range entanglement. Experimental evaluations demonstrate that GTranQEM outperforms state-of-the-art QEM methods on both random and structured quantum circuits across noise types and scales among diverse settings.

NeurIPS Conference 2025 Conference Paper

Bootstrapping Hierarchical Autoregressive Formal Reasoner with Chain-of-Proxy-Autoformalization

  • Qi Liu
  • Xinhao Zheng
  • Renqiu Xia
  • Qinxiang Cao
  • Junchi Yan

Deductive formal problem-solving (D-FPS) enables process-verified, human-aligned problem-solving by implementing deductive solving processes within formal theorem proving (FTP) environments. However, current methods fail to address the misalignment between informal and formal reasoning granularity and suffer from inefficiency due to backtracking and error propagation. Moreover, the extreme scarcity of formal problem-solution pairs further hinders progress. For the first gap, we propose **HAR** (_**H**ierarchical **A**utoregressive Formal **R**easoner_), a novel reasoning pipeline. HAR decouples informal-aligned drafting and detailed proving, and formulates solution construction as autoregressive generation with per-step feedback. Second, we propose **CoPA** (_**C**hain-**o**f-**P**roxy-**A**utoformalization_), a data generation pipeline that cascades statement autoformalization, proof drafting, and proof search as a proxy autoformalization path. Experiments demonstrate significant improvements: trained on data bootstrapped by CoPA, HAR achieves superior performance on FormalMath500 ($15. 50\\%\mapsto 44. 09\\%$) and MiniF2F-Solving ($21. 87\\%\mapsto 56. 58\\%$) with lower computational budget. Explorations reveal promising directions in formal solution pruning and informal dataset denoising.

NeurIPS Conference 2025 Conference Paper

Bridging Crypto with ML-based Solvers: the SAT Formulation and Benchmarks

  • Xinhao Zheng
  • Xinhao Song
  • Bolin Qiu
  • Yang Li
  • Zhongteng Gui
  • Junchi Yan

The Boolean Satisfiability Problem (SAT) plays a crucial role in cryptanalysis, enabling tasks like key recovery and distinguisher construction. Conflict-Driven Clause Learning (CDCL) has emerged as the dominant paradigm in modern SAT solving, and machine learning has been increasingly integrated with CDCL-based SAT solvers to tackle complex cryptographic problems. However, the lack of a unified evaluation framework, inconsistent input formats, and varying modeling approaches hinder fair comparison. Besides, cryptographic SAT instances also differ structurally from standard SAT problems, and the absence of standardized datasets further complicates evaluation. To address these issues, we introduce SAT4CryptoBench, the first comprehensive benchmark for assessing machine learning–based solvers in cryptanalysis. SAT4CryptoBench provides diverse SAT datasets in both Arithmetic Normal Form (ANF) and Conjunctive Normal Form (CNF), spanning various algorithms, rounds, and key sizes. Our framework evaluates three levels of machine learning integration: standalone distinguishers for instance classification, heuristic enhancement for guiding solving strategies, and hyperparameter optimization for adapting to specific problem distributions. Experiments demonstrate that ANF-based networks consistently achieve superior performance over CNF-based networks in learning cryptographic features. Nonetheless, current ML techniques struggle to generalize across algorithms and instance sizes, with computational overhead potentially offsetting benefits on simpler cases. Despite this, ML-driven optimization strategies notably improve solver efficiency on cryptographic SAT instances. Besides, we propose BASIN, a bitwise solver taking plaintext-ciphertext bitstrings as inputs. Crucially, its superior performance on high-round problems highlights the importance of input modeling and the advantage of direct input representations for complex cryptographic structures.

ICLR Conference 2025 Conference Paper

BTBS-LNS: Binarized-Tightening, Branch and Search on Learning LNS Policies for MIP

  • Hao Yuan 0002
  • Wenli Ouyang
  • Changwen Zhang
  • Yong Sun
  • Liming Gong
  • Junchi Yan

Learning to solve large-scale Mixed Integer Program (MIP) problems is an emerging research topic, and policy learning-based Large Neighborhood Search (LNS) has been a popular paradigm. However, the explored space of LNS policy is often limited even in the training phase, making the learned policy sometimes wrongly fix some potentially important variables early in the search, leading to local optimum in some cases. Moreover, many methods only assume binary variables to deal with. We present a practical approach, termed Binarized-Tightening Branch-and-Search for Large Neighborhood Search (BTBS-LNS). It comprises three key techniques: 1) the ``Binarized Tightening" technique for integer variables to handle their wide range by binary encoding and bound tightening; 2) an attention-based tripartite graph to capture global correlations among variables and constraints for an MIP instance; 3) an extra branching network as a global view, to identify and optimize wrongly-fixed backdoor variables at each search step. Experiments show its superior performance over the open-source solver SCIP and LNS baselines. Moreover, it performs competitively with, and sometimes better than the commercial solver Gurobi (v9.5.0), especially on the MIPLIB2017 benchmark chosen by Hans Mittelmann, where our method can deliver 10\% better primal gaps compared with Gurobi in a 300s cut-off time.

ICML Conference 2025 Conference Paper

COExpander: Adaptive Solution Expansion for Combinatorial Optimization

  • Jiale Ma
  • Wenzheng Pan
  • Yang Li
  • Junchi Yan

Despite rapid progress in neural combinatorial optimization (NCO) for solving CO problems (COPs), as the problem scale grows, several bottlenecks persist: 1) solvers in the Global Prediction (GP) paradigm struggle in long-range decisions where the overly smooth intermediate heatmaps impede effective decoding, and 2) solvers in the Local Construction (LC) paradigm are time-consuming and incapable of tackling large instances due to the onerous auto-regressive process. Observing these challenges, we propose a new paradigm named Adaptive Expansion AE with its instantiation COExpander, positioned to leverage both advantages of GP and LC. COExpander utilizes informative heatmaps generated by a global predictor, which is learned under the guidance of locally determined partial solutions, to in turn direct the expansion of determined decision variables with adaptive step-sizes. To ensure transparent evaluation, we further take the lead to canonicalize 29 benchmarks spanning 6 popular COPs (MIS, MCl, MVC, MCut, TSP, ATSP) and various scales (50-10K nodes), upon which experiments demonstrate concrete SOTA performance of COExpander over these tasks. Source code and our standardized datasets will be made public.

NeurIPS Conference 2025 Conference Paper

CORE: Collaborative Optimization with Reinforcement Learning and Evolutionary Algorithm for Floorplanning

  • Pengyi Li
  • Shixiong Kai
  • Jianye Hao
  • Ruizhe Zhong
  • Hongyao Tang
  • Zhentao Tang
  • Mingxuan Yuan
  • Junchi Yan

Floorplanning is the initial step in the physical design process of Electronic Design Automation (EDA), directly influencing subsequent placement, routing, and final power of the chip. However, the solution space in floorplanning is vast, and current algorithms often struggle to explore it sufficiently, making them prone to getting trapped in local optima. To achieve efficient floorplanning, we propose CORE, a general and effective solution optimization framework that synergizes Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for high-quality layout search and optimization. Specifically, we propose the Clustering-based Diversified Evolutionary Search that directly perturbs layouts and evolves them based on novelty and performance. Additionally, we model the floorplanning problem as a sequential decision problem with B*-Tree representation and employ RL for efficient learning. To efficiently coordinate EAs and RL, we propose the reinforcement-driven mechanism and evolution-guided mechanism. The former accelerates population evolution through RL, while the latter guides RL learning through EAs. The experimental results on the MCNC and GSRC benchmarks demonstrate that CORE outperforms other strong baselines in terms of wirelength and area utilization metrics, achieving a 12. 9\% improvement in wirelength. CORE represents the first evolutionary reinforcement learning (ERL) algorithm for floorplanning, surpassing existing RL-based methods. The code is available at https: //github. com/yeshenpy/CORE.

ICLR Conference 2025 Conference Paper

CR2PQ: Continuous Relative Rotary Positional Query for Dense Visual Representation Learning

  • Shaofeng Zhang
  • Qiang Zhou 0001
  • Sitong Wu
  • Haoru Tan
  • Zhibin Wang 0004
  • Jinfa Huang
  • Junchi Yan

Dense visual contrastive learning (DRL) shows promise for learning localized information in dense prediction tasks, but struggles with establishing pixel/patch correspondence across different views (cross-contrasting). Existing methods primarily rely on self-contrasting the same view with variations, limiting input variance and hindering downstream performance. This paper delves into the mechanisms of self-contrasting and cross-contrasting, identifying the crux of the issue: transforming discrete positional embeddings to continuous representations. To address the correspondence problem, we propose a Continuous Relative Rotary Positional Query ({\mname}), enabling patch-level representation learning. Our extensive experiments on standard datasets demonstrate state-of-the-art (SOTA) results. Compared to the previous SOTA method (PQCL), our approach achieves significant improvements on COCO: with 300 epochs of pretraining, {\mname} obtains \textbf{3.4\%} mAP$^{bb}$ and \textbf{2.1\%} mAP$^{mk}$ improvements for detection and segmentation tasks, respectively. Furthermore, {\mname} exhibits faster convergence, achieving \textbf{10.4\%} mAP$^{bb}$ and \textbf{7.9\%} mAP$^{mk}$ improvements over SOTA with just 40 epochs of pretraining.

TMLR Journal 2025 Journal Article

Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models

  • Yang Zhang
  • Chenjia Bai
  • Bin Zhao
  • Junchi Yan
  • Xiu Li
  • Xuelong Li

Learning a world model for model-free Reinforcement Learning (RL) agents can significantly improve the sample efficiency by learning policies in imagination. However, building a world model for Multi-Agent RL (MARL) can be particularly challenging due to the scalability issue across different number of agents in a centralized architecture, and also the non-stationarity issue in a decentralized architecture stemming from the inter-dependency among agents. To address both challenges, we propose a novel world model for MARL that learns decentralized local dynamics for scalability, combined with a centralized representation aggregation from all agents. We cast the dynamics learning as an auto-regressive sequence modeling problem over discrete tokens by leveraging the expressive Transformer architecture, in order to model complex local dynamics across different agents and provide accurate and consistent long-term imaginations. As the first pioneering Transformer-based world model for multi-agent systems, we introduce a Perceiver Transformer as an effective solution to enable centralized representation aggregation within this context. Extensive results on Starcraft Multi-Agent Challenge (SMAC) and MAMujoco demonstrate superior sample efficiency and overall performance compared to strong model-free approaches and existing model-based methods.

ICLR Conference 2025 Conference Paper

DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving

  • Xiaosong Jia
  • Junqi You
  • Zhiyuan Zhang 0010
  • Junchi Yan

End-to-end autonomous driving (E2E-AD) has emerged as a trend in the field of autonomous driving, promising a data-driven, scalable approach to system design. However, existing E2E-AD methods usually adopt the sequential paradigm of perception-prediction-planning, which leads to cumulative errors and training instability. The manual ordering of tasks also limits the system’s ability to leverage synergies between tasks (for example, planning-aware perception and game-theoretic interactive prediction and planning). Moreover, the dense BEV representation adopted by existing methods brings computational challenges for long-range perception and long-term temporal fusion. To address these challenges, we present DriveTransformer, a simplified E2E-AD framework for the ease of scaling up, characterized by three key features: Task Parallelism (All agent, map, and planning queries direct interact with each other at each block), Sparse Representation (Task queries direct interact with raw sensor features), and Streaming Processing (Task queries are stored and passed as history information). As a result, the new framework is composed of three unified operations: task self-attention, sensor cross-attention, temporal cross-attention, which significantly reduces the complexity of system and leads to better training stability. DriveTransformer achieves state-of-the-art performance in both simulated closed-loop benchmark Bench2Drive and real world open-loop benchmark nuScenes with high FPS.

ICML Conference 2025 Conference Paper

DSBRouter: End-to-end Global Routing via Diffusion Schr\"{o}dinger Bridge

  • Liangliang Shi
  • Shenhui Zhang
  • Xingbo Du
  • Nianzu Yang
  • Junchi Yan

Global routing (GR) is a fundamental task in modern chip design and various learning techniques have been devised. However, a persistent challenge is the inherent lack of a mechanism to guarantee the routing connectivity in network’s prediction results, necessitating post-processing search or reinforcement learning (RL) to enforce the connectivity. In this paper, we propose a neural GR solver called DSBRouter, leveraging the Diffusion Schrödinger Bridge (DSB) model for GR. During training, unlike previous works that learn the mapping from noise to routes, we establish a bridge between the initial pins and the routing via DSB, which learns the forward and backward mapping between them. For inference, based on the evaluation metric (e. g. low overflow), we further introduce a sampling scheme with evaluation-based guidance to enhance the routing predictions. Note that DSBRouter is an end-to-end model that does not require a post-step to ensure connectivity. Empirical results show that it achieves SOTA performance on the overflow reduction in ISPD98 and part of ISPD07. In some cases, DSBRouter can even generate routes with zero overflow.

NeurIPS Conference 2025 Conference Paper

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

  • Xiangyu Zhao
  • Peiyuan Zhang
  • Kexian Tang
  • Xiaorong Zhu
  • Hao Li
  • Wenhao Chai
  • Zicheng Zhang
  • Renqiu Xia

Large Multi-modality Models (LMMs) have made significant progress in visual understanding and generation, but they still face challenges in General Visual Editing, particularly in following complex instructions, preserving appearance consistency, and supporting flexible input formats. To study this gap, we introduce RISEBench, the first benchmark for evaluating Reasoning-Informed viSual Editing (RISE). RISEBench focuses on four key reasoning categories: Temporal, Causal, Spatial, and Logical Reasoning. We curate high-quality test cases for each category and propose an robust evaluation framework that assesses Instruction Reasoning, Appearance Consistency, and Visual Plausibility with both human judges and the LMM-as-a-judge approach. We conducted experiments evaluating nine prominent visual editing models, comprising both open-source and proprietary models. The evaluation results demonstrate that current models face significant challenges in reasoning-based editing tasks. Even the most powerful model evaluated, GPT-image-1, achieves an accuracy of merely 28. 8%. RISEBench effectively highlights the limitations of contemporary editing models, provides valuable insights, and indicates potential future directions for the field of reasoning-aware visual editing. Our code and data have been released at https: //github. com/PhoenixZ810/RISEBench.

ICRA Conference 2025 Conference Paper

FlatFusion: Delving Into Details of Sparse Transformer-Based Camera-LiDAR Fusion for Autonomous Driving

  • Yutao Zhu 0008
  • Xiaosong Jia
  • Xinyu Yang 0002
  • Junchi Yan

The integration of data from various sensor modalities (e. g. camera and LiDAR) constitutes a prevalent methodology within the ambit of autonomous driving scenarios. Recent advancements in efficient point cloud transformers have underscored the efficacy of integrating information in sparse formats. When it comes to fusion, since image patches are dense in pixel space with ambiguous depth, it necessitates additional design considerations for effective fusion. In this paper, we conduct a comprehensive exploration of design choices for transformer-based sparse camera-LiDAR fusion. This investigation encompasses strategies for image-to-3D and LiDAR-to-2D mapping, attention neighbor grouping, single modal tokenizer, and micro-structure of Transformer. By amalgamating the most effective principles uncovered through our investigation, we introduce FlatFusion, a carefully designed framework for sparse camera-LiDAR fusion. Notably, FlatFusion significantly outperforms state-of-the-art sparse Transformer-based methods, including UniTR, CMT, and SparseFusion, achieving 73. 7 NDS on the nuScenes validation set with 10. 1 FPS with PyTorch.

NeurIPS Conference 2025 Conference Paper

Fractional Langevin Dynamics for Combinatorial Optimization via Polynomial-Time Escape

  • Shiyue Wang
  • Ziao Guo
  • Changhong Lu
  • Junchi Yan

Langevin Dynamics (LD) and its discrete proposal have been widely applied in the field of Combinatorial Optimization (CO). Both sampling-based and data-driven approaches have benefited significantly from these methods. However, LD's reliance on Gaussian noise limits its ability to escape narrow local optima, requires costly parallel chains, and performs poorly in rugged landscapes or with non-strict constraints. These challenges have impeded the development of more advanced approaches. To address these issues, we introduce Fractional Langevin Dynamics (FLD) for CO, replacing Gaussian noise with $\alpha$-stable L\'evy noise. FLD can escape from local optima more readily via L\'evy flights, and in multiple-peak CO problems with high potential barriers it exhibits a polynomial escape time that outperforms the exponential escape time of LD. Moreover, FLD coincides with LD when $\alpha = 2$, and by tuning $\alpha$ it can be adapted to a wider range of complex scenarios in the CO fields. We provide theoretical proof that our method offers enhanced exploration capabilities and improved convergence. Experimental results on the Maximum Independent Set, Maximum Clique, and Maximum Cut problems demonstrate that incorporating FLD advances both sampling-based and data-driven approaches, achieving state-of-the-art (SOTA) performance in most of the experiments.

NeurIPS Conference 2025 Conference Paper

Generation as Search Operator for Test-Time Scaling of Diffusion-based Combinatorial Optimization

  • Yang Li
  • Lvda Chen
  • Haonan Wang
  • Runzhong Wang
  • Junchi Yan

While diffusion models have shown promise for combinatorial optimization (CO), their inference-time scaling cost-efficiency remains relatively underexplored. Existing methods improve solution quality by increasing denoising steps, but the performance often becomes saturated quickly. This paper proposes GenSCO to systematically scale diffusion solvers by an orthogonal dimension of inference-time computation beyond denoising step expansion, i. e. , search-driven generation. GenSCO takes generation as a search operator rather than a complete solving process, where each operator cycle combines solution disruption (via local search operators) and diffusion sampling, enabling iterative exploration of the learned solution space. Rather than over-refining current solutions, this paradigm encourages the model to leave local optima and explore a broader area of the solution space, ensuring a more consistent scaling effect. The search loop is supported by a search-friendly solution-enhancement training procedure that incorporates a rectified flow model learning to establish diffusion trajectories between suboptimal solutions and the optimal ones. The flow model is empowered by a lightweight transformer architecture to learn neural ODEs that linearize solution trajectories, accelerating convergence of the scaling effect with efficiency. The resulting enhanced scaling efficiency and practical scalability lead to synergistic performance improvements. Extensive experiments show that GenSCO delivers performance improvements by orders of magnitude over previous state-of-the-art neural methods. Notably, GenSCO even achieves significant speedups compared to the state-of-the-art classic mathematical solver LKH3, delivering a 141x speedup to reach 0. 000% optimality gap on TSP-100, and approximately a 10x speedup to reach 0. 02% on TSP-500.

ICML Conference 2025 Conference Paper

Generative Modeling Reinvents Supervised Learning: Label Repurposing with Predictive Consistency Learning

  • Yang Li 0197
  • Jiale Ma
  • Yebin Yang
  • Qitian Wu
  • Hongyuan Zha
  • Junchi Yan

Predicting labels directly from data has been the standard in label learning tasks, e. g. , supervised learning, where models often prioritize feature compression and extraction from inputs under the assumption that label information is less complex. However, recent prediction tasks often face predicting complex labels, exacerbating the challenge of learning mappings from learned features to high-fidelity label representations. To this end, we draw inspiration from the consistency training concept in generative consistency models and propose predictive consistency learning (PCL), a novel learning paradigm that decomposes the full label information into a progressive learning procedure, mitigating the label capture challenge. Besides data inputs, PCL additionally receives input from noise-perturbed labels as an additional reference, pursuing predictive consistency across different noise levels. It simultaneously learns the relationship between latent features and a spectrum of label information, which enables progressive learning for complex predictions and allows multi-step inference analogous to gradual denoising, thereby enhancing the prediction quality. Experiments on vision, text, and graph tasks show the superiority of PCL over conventional supervised training in complex label prediction tasks.

ICLR Conference 2025 Conference Paper

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

  • Renqiu Xia
  • Mingsheng Li
  • Hancheng Ye
  • Wenjie Wu
  • Hongbin Zhou
  • Jiakang Yuan
  • Tianshuo Peng
  • Xinyu Cai

Despite their proficiency in general tasks, Multi-modal Large Language Models (MLLMs) struggle with automatic Geometry Problem Solving (GPS), which demands understanding diagrams, interpreting symbols, and performing complex reasoning. This limitation arises from their pre-training on natural images and texts, along with the lack of automated verification in the problem-solving process. Besides, current geometric specialists are limited by their task-specific designs, making them less effective for broader geometric problems. To this end, we present GeoX, a multi-modal large model focusing on geometric understanding and reasoning tasks. Given the significant differences between geometric diagram-symbol and natural image-text, we introduce unimodal pre-training to develop a diagram encoder and symbol decoder, enhancing the understanding of geometric images and corpora. Furthermore, we introduce geometry-language alignment, an effective pre-training paradigm that bridges the modality gap between unimodal geometric experts. We propose a Generator-And-Sampler Transformer (GS-Former) to generate discriminative queries and eliminate uninformative representations from unevenly distributed geometric signals. Finally, GeoX benefits from visual instruction tuning, empowering it to take geometric images and questions as input and generate verifiable solutions. Experiments show that GeoX outperforms both generalists and geometric specialists on publicly recognized benchmarks, such as GeoQA, UniGeo, Geometry3K, and PGPS9k. Our data and code will be released soon to accelerate future research on automatic GPS.

ICML Conference 2025 Conference Paper

HEAP: Hyper Extended A-PDHG Operator for Constrained High-dim PDEs

  • Mingquan Feng
  • Weixin Liao
  • Yixin Huang
  • Yifan Fu
  • Qifu Zheng
  • Junchi Yan

Neural operators have emerged as a promising approach for solving high-dimensional partial differential equations (PDEs). However, existing neural operators often have difficulty in dealing with constrained PDEs, where the solution must satisfy additional equality or inequality constraints beyond the governing equations. To close this gap, we propose a novel neural operator, Hyper Extended Adaptive PDHG (HEAP) for constrained high-dim PDEs, where the learned operator evolves in the parameter space of PDEs. We first show that the evolution operator learning can be formulated as a quadratic programming (QP) problem, then unroll the adaptive primal-dual hybrid gradient (APDHG) algorithm as the QP-solver into the neural operator architecture. This allows us to improve efficiency while retaining theoretical guarantees of the constrained optimization. Empirical results on a variety of high-dim PDEs show that HEAP outperforms the state-of-the-art neural operator model.

ICLR Conference 2025 Conference Paper

HShare: Fast LLM Decoding by Hierarchical Key-Value Sharing

  • Huaijin Wu
  • Lianqiang Li
  • Hantao Huang
  • Tu Yi
  • Jihang Zhang
  • Minghui Yu
  • Junchi Yan

The frequent retrieval of Key-Value (KV) cache data has emerged as a significant factor contributing to the inefficiency of the inference process in large language models. Previous research has demonstrated that a small subset of critical KV cache tokens largely influences attention outcomes, leading to methods that either employ fixed sparsity patterns or dynamically select critical tokens based on the query. While dynamic sparse patterns have proven to be more effective, they introduce significant computational overhead, as critical tokens must be reselected for each self-attention computation. In this paper, we reveal substantial similarities in KV cache token criticality across neighboring queries, layers, and heads. Motivated by this insight, we propose HShare, a hierarchical KV sharing framework. HShare facilitates the sharing of critical KV cache token indices across layers, heads, and queries, which significantly reduces the computational overhead associated with query-aware dynamic token sparsity. In addition, we introduce a greedy algorithm that dynamically determines the optimal layer-level and head-level sharing configuration for the decoding phase. We evaluate the effectiveness and efficiency of HShare across various tasks using three models: LLaMA2-7b, LLaMA3-70b, and Mistral-7b. Experimental results demonstrate that HShare achieves competitive accuracy with different sharing ratios, while delivering up to an $8.6\times$ speedup in self-attention operations and a $2.7\times$ improvement in end-to-end throughput compared with FlashAttention2 and GPT-fast respectively. The source code is publicly available at ~\url{https://github.com/wuhuaijin/HShare}.

AAAI Conference 2025 Conference Paper

Int2Planner: An Intention-based Multi-modal Motion Planner for Integrated Prediction and Planning

  • Xiaolei Chen
  • Junchi Yan
  • Wenlong Liao
  • Tao He
  • Pai Peng

Motion planning is a critical module in autonomous driving, with the primary challenge of uncertainty caused by interactions with other participants. As most previous methods treat prediction and planning as separate tasks, it is difficult to model these interactions. Furthermore, since the route path navigates ego vehicles to a predefined destination, it provides relatively stable intentions for ego vehicles and helps constrain uncertainty. On this basis, we construct Int2Planner, an Intention-based Integrated motion Planner achieves multi-modal planning and prediction. Instead of static intention points, Int2Planner utilizes route intention points for ego vehicles and generates corresponding planning trajectories for each intention point to facilitate multi-modal planning. The experiments on the private dataset and the public nuPlan benchmark show the effectiveness of route intention points, and Int2Planner achieves state-of-the-art performance. We also deploy it in real-world vehicles and have conducted autonomous driving for hundreds of kilometers in urban areas. It further verifies that Int2Planner can continuously interact with the traffic environment.

ICML Conference 2025 Conference Paper

Learning Initial Basis Selection for Linear Programming via Duality-Inspired Tripartite Graph Representation and Comprehensive Supervision

  • Anqi Lu
  • Junchi Yan

For the fundamental linear programming (LP) problems, the simplex method remains popular, which usually requires an appropriate initial basis as a warm start to accelerate the solving process. Predicting an initial basis close to an optimal one can often accelerate the solver, but a closer initial basis does not always result in greater acceleration. To achieve better acceleration, we propose a GNN model based on a tripartite graph representation inspired by LP duality. This approach enables more effective feature extraction for general LP problems and enhances the expressiveness of GNNs. Additionally, we introduce novel loss functions targeting basic variable selection and basis feasibility, along with data preprocessing schemes, to further improve learning capability. In addition to achieving high prediction accuracy, we enhance the quality of the initial basis for practical use. Experimental results show that our approach greatly surpasses the state-of-the-art method in predicting initial basis with greater accuracy and in reducing the number of iterations and solving time of the LP solver.

ICLR Conference 2025 Conference Paper

Learning Structured Universe Graph with Outlier OOD Detection for Partial Matching

  • Zetian Jiang
  • Jiaxin Lu 0001
  • Haizhao Fan
  • Tianzhe Wang
  • Junchi Yan

Partial matching is a kind of graph matching where only part of two graphs can be aligned. This problem is particularly important in computer vision applications, where challenges like point occlusion or annotation errors often occur when labeling key points. Previous work has often conflated point occlusion and annotation errors, despite their distinct underlying causes. We propose two components to address these challenges: (1) a structured universe graph is learned to connect two input graphs $X_{ij} = X_{iu} X_{ju}^\top$, effectively resolving the issue of point occlusion; (2) an energy-based out-of-distribution detection is designed to remove annotation errors from the input graphs before matching. We evaluated our method on the Pascal VOC and Willow Object datasets, focusing on scenarios involving point occlusion and random outliers. The experimental results demonstrate that our approach consistently outperforms state-of-the-art methods across all tested scenarios, highlighting the accuracy and robustness of our method.

NeurIPS Conference 2025 Conference Paper

ML4CO-Bench-101: Benchmark Machine Learning for Classic Combinatorial Problems on Graphs

  • Jiale Ma
  • Wenzheng Pan
  • Yang Li
  • Junchi Yan

Combinatorial problems on graphs have attracted extensive efforts from the machine learning community over the past decade. Despite notable progress in this area under the umbrella of ML4CO, a comprehensive categorization, unified reproducibility, and transparent evaluation protocols are still lacking for the emerging and immense pool of neural CO solvers. In this paper, we establish a modular and streamlined framework benchmarking prevalent neural CO methods, dissecting their design choices via a tri-leveled "paradigm-model-learning'' taxonomy to better characterize different approaches. Further, we integrate their shared features and respective strengths to form 3 unified solvers representing global prediction (GP), local construction (LC), and adaptive expansion (AE) mannered neural solvers. We also collate a total of 65 datasets for 7 mainstream CO problems (including both edge-oriented tasks: TSP, ATSP, CVRP, as well as node-oriented: MIS, MCl, MVC, MCut) across scales to facilitate more comparable results among literature. Extensive experiments upon our benchmark reveal a fair and exact performance exhibition indicative of the raw contribution of the learning components in each method, rethinking and insisting that pre- and post-inference heuristic tricks are not supposed to compensate for sub-par capability of the data-driven counterparts. Under this unified benchmark, an up-to-date replication of typical ML4CO methods is maintained, hoping to provide convenient reference and insightful guidelines for both engineering development and academic exploration of the ML4CO community in the future. Code is available at https: //github. com/Thinklab-SJTU/ML4CO-Bench-101, and the dataset is at https: //huggingface. co/datasets/ML4CO/ML4CO-Bench-101-SL.

NeurIPS Conference 2025 Conference Paper

MS-Bench: Evaluating LMMs in Ancient Manuscript Study through a Dunhuang Case Study

  • Yuqing Zhang
  • Yue Han
  • Shuanghe Zhu
  • Haoxiang Wu
  • Hangqi Li
  • Shengyu Zhang
  • Junchi Yan
  • Zemin Liu

Analyzing ancient manuscripts has traditionally been a labor-intensive and time-consuming task for philologists. While recent advancements in LMMs have demonstrated their potential across diverse domains, their effectiveness in manuscript study remains underexplored. In this paper, we introduce MS-Bench, the first comprehensive benchmark co-developed with archaeologists, comprising 5, 076 high-resolution images from 4th to 14th century and 9, 982 expert-curated questions across nine sub-tasks aligned with archaeological workflows. Through four prompting strategies, we systematically evaluate 32 LMMs on their effectiveness, robustness, and cultural contextualization. Our analysis reveals scale-driven performance and reliability improvements, prompting strategies' impact on performance (CoT has two-sides effect, while visual retrieval-augmented prompts provide consistent boost), and task-specific preferences depending on LMM’s visual capabilities. Although current LMMs are not yet capable of replacing domain expertise, they demonstrate promising potential to accelerate manuscript research through future human–AI collaboration.

NeurIPS Conference 2025 Conference Paper

NTKMTL: Mitigating Task Imbalance in Multi-Task Learning from Neural Tangent Kernel Perspective

  • Xiaohan Qin
  • Xiaoxing Wang
  • Ning Liao
  • Junchi Yan

Multi-Task Learning (MTL) enables a single model to learn multiple tasks simultaneously, leveraging knowledge transfer among tasks for enhanced generalization, and has been widely applied across various domains. However, task imbalance remains a major challenge in MTL. Although balancing the convergence speeds of different tasks is an effective approach to address this issue, it is highly challenging to accurately characterize the training dynamics and convergence speeds of multiple tasks within the complex MTL system. To this end, we attempt to analyze the training dynamics in MTL by leveraging Neural Tangent Kernel (NTK) theory and propose a new MTL method, NTKMTL. Specifically, we introduce an extended NTK matrix for MTL and adopt spectral analysis to balance the convergence speeds of multiple tasks, thereby mitigating task imbalance. Based on the approximation via shared representation, we further propose NTKMTL-SR, achieving training efficiency while maintaining competitive performance. Extensive experiments demonstrate that our methods achieve state-of-the-art performance across a wide range of benchmarks, including both multi-task supervised learning and multi-task reinforcement learning. Source code is available at https: //github. com/jianke0604/NTKMTL.

ICLR Conference 2025 Conference Paper

On Designing General and Expressive Quantum Graph Neural Networks with Applications to MILP Instance Representation

  • Xinyu Ye
  • Hao Xiong 0003
  • Jianhao Huang
  • Ziang Chen
  • Jia Wang
  • Junchi Yan

Graph-structured data is ubiquitous, and graph learning models have recently been extended to address complex problems like mixed-integer linear programming (MILP). However, studies have shown that the vanilla message-passing based graph neural networks (GNNs) suffer inherent limitations in learning MILP instance representation, i.e., GNNs may map two different MILP instance graphs to the same representation. In this paper, we introduce an expressive quantum graph learning approach, leveraging quantum circuits to recognize patterns that are difficult for classical methods to learn. Specifically, the proposed General Quantum Graph Learning Architecture (GQGLA) is composed of a node feature layer, a graph message interaction layer, and an optional auxiliary layer. Its generality is reflected in effectively encoding features of nodes and edges while ensuring node permutation equivariance and flexibly creating different circuit structures for various expressive requirements and downstream tasks. GQGLA is well suited for learning complex graph tasks like MILP representation. Experimental results highlight the effectiveness of GQGLA in capturing and learning representations for MILPs. In comparison to traditional GNNs, GQGLA exhibits superior discriminative capabilities and demonstrates enhanced generalization across various problem instances, making it a promising solution for complex graph tasks.

ICML Conference 2025 Conference Paper

On the Role of Label Noise in the Feature Learning Process

  • Andi Han
  • Wei Huang 0034
  • Zhanpeng Zhou
  • Gang Niu 0001
  • Wuyang Chen 0001
  • Junchi Yan
  • Akiko Takeda
  • Taiji Suzuki

Deep learning with noisy labels presents significant challenges. In this work, we theoretically characterize the role of label noise from a feature learning perspective. Specifically, we consider a signal-noise data distribution, where each sample comprises a label-dependent signal and label-independent noise, and rigorously analyze the training dynamics of a two-layer convolutional neural network under this data setup, along with the presence of label noise. Our analysis identifies two key stages. In Stage I, the model perfectly fits all the clean samples (i. e. , samples without label noise) while ignoring the noisy ones (i. e. , samples with noisy labels). During this stage, the model learns the signal from the clean samples, which generalizes well on unseen data. In Stage II, as the training loss converges, the gradient in the direction of noise surpasses that of the signal, leading to overfitting on noisy samples. Eventually, the model memorizes the noise present in the noisy samples and degrades its generalization ability. Furthermore, our analysis provides a theoretical basis for two widely used techniques for tackling label noise: early stopping and sample selection. Experiments on both synthetic and real-world setups validate our theory.

AAAI Conference 2025 Conference Paper

Optimal Control Operator Perspective and a Neural Adaptive Spectral Method

  • Mingquan Feng
  • Zhijie Chen
  • Yixin Huang
  • Yizhou Liu
  • Junchi Yan

Optimal control problems (OCPs) involve finding a control function for a dynamical system such that a cost functional is optimized. It is central to physical systems in both academia and industry. In this paper, we propose a novel instance-solution control operator perspective, which solves OCPs in a one-shot manner without direct dependence on the explicit expression of dynamics or iterative optimization processes. The control operator is implemented by a new neural operator architecture named Neural Adaptive Spectral Method (NASM), a generalization of classical spectral methods. We theoretically validate the perspective and architecture by presenting the approximation error bounds of NASM for the control operator. Experiments on synthetic environments and a real-world dataset verify the effectiveness and efficiency of our approach, including substantial speedup in running time, and high-quality in- and out-of-distribution generalization.

ICLR Conference 2025 Conference Paper

Optimal Flow Transport and its Entropic Regularization: a GPU-friendly Matrix Iterative Algorithm for Flow Balance Satisfaction

  • Liangliang Shi
  • Yu-Feng Li
  • Kaipeng Zeng
  • Yihui Tu
  • Junchi Yan

The Sinkhorn algorithm, based on Entropic Regularized Optimal Transport (OT), has garnered significant attention due to its computational efficiency enabled by GPU-friendly matrix-vector multiplications. However, vanilla OT primarily deals with computations between the source and target nodes in a bipartite graph, limiting its practical application in real-world transportation scenarios. In this paper, we introduce the concept of Optimal Flow Transport (OFT) as an extension, where we consider a more general graph setting and the marginal constraints in vanilla OT are replaced by flow balance constraints. To obtain solutions, we incorporate entropic regularization into the OFT and introduce virtual flows for individual nodes to tackle the issue of potentially numerous isolated nodes lacking flow passages. Our proposition, the OFT-Sinkhorn algorithm, utilizes GPU-friendly matrix iterations to maintain flow balance constraints and minimize the objective function, and theoretical results for global convergence is also proposed in this paper. Furthermore, we enhance OFT by introducing capacity constraints on nodes and edges, transforming the OFT problem into a minimum-cost flow problem. We then present the Capacity-Constrained EOFT-Sinkhorn algorithm and compare it with the traditional Minimum cost flow (MCF) algorithm, showing that our algorithm is quite efficient for calculation. In particular, our EOFT-Sinkhorn is evaluated on high-precision and integer-precision MCF problems with different scales from one hundred to five thousand size, exhibiting significant time efficiency and the ability to approximate optimal solutions.

IJCAI Conference 2025 Conference Paper

Optimize Battery Control: A Multi-Objective Evolutionary Ensemble Reinforcement Learning Approach

  • Jingwei Hu
  • Kai Xie
  • Zheng Fang
  • Xiaodong Li
  • Junchi Yan
  • Zhihong Zhang

The Dynamically Reconfigurable Battery (DRB) systems, which use high-speed power electronic switches to dynamically adjust battery interconnections in real-time, are critical to the performance of the battery pack. Traditional battery management strategies often fail to address multi-objective optimization, leading to imbalanced performance and inadequate energy utilization. To enhance decision-making across multiple objectives, an Evolutionary Ensemble Reinforcement Learning (EERL) framework is proposed in this paper. This framework incorporates evolutionary algorithms to associate ensemble learning, thus improving reinforcement learning (RL) performance. It decomposes a complex objective into multiple sub-objectives, each optimized independently, while incorporating diverse performance metrics into the correlation stage to derive the Pareto optimal solution. The EERL can efficiently mitigate potential adverse effects such as short circuits, disconnections, and reverse charging, thereby effectively reducing capacity differences among various batteries. Simulations and real-world testing demonstrate that the proposed approach overcomes the issue of local optima entrapment in multi-objective optimization scenarios. In a real-world system, an 11. 08 % increase in energy efficiency is observed compared to existing approaches.

ICLR Conference 2025 Conference Paper

Pedestrian Motion Reconstruction: A Large-scale Benchmark via Mixed Reality Rendering with Multiple Perspectives and Modalities

  • Yichen Wang
  • Yiyi Zhang
  • Xinhao Hu
  • Li Niu 0002
  • Jianfu Zhang 0003
  • Yasushi Makihara
  • Yasushi Yagi
  • Pai Peng

Reconstructing pedestrian motion from dynamic sensors, with a focus on pedestrian intention, is crucial for advancing autonomous driving safety. However, this task is challenging due to data limitations arising from technical complexities, safety, and cost concerns. We introduce the Pedestrian Motion Reconstruction (PMR) dataset, which focuses on pedestrian intention to reconstruct behavior using multiple perspectives and modalities. PMR is developed from a mixed reality platform that combines real-world realism with the extensive, accurate labels of simulations, thereby reducing costs and risks. It captures the intricate dynamics of pedestrian interactions with objects and vehicles, using different modalities for a comprehensive understanding of human-vehicle interaction. Analyses show that PMR can naturally exhibit pedestrian intent and simulate extreme cases. PMR features a vast collection of data from 54 subjects interacting across 12 urban settings with 7 objects, encompassing 12,138 sequences with diverse weather conditions and vehicle speeds. This data provides a rich foundation for modeling pedestrian intent through multi-view and multi-modal insights. We also conduct comprehensive benchmark assessments across different modalities to thoroughly evaluate pedestrian motion reconstruction methods.

ICLR Conference 2025 Conference Paper

PhysPDE: Rethinking PDE Discovery and a Physical Hypothesis Selection Benchmark

  • Mingquan Feng
  • Yixin Huang
  • Yizhou Liu
  • Bofang Jiang
  • Junchi Yan

Despite extensive research, recovering PDE expressions from experimental observations often involves symbolic regression. This method generally lacks the incorporation of meaningful physical insights, resulting in outcomes lacking clear physical interpretations. Recognizing that the primary interest of Machine Learning for Science (ML4Sci) often lies in understanding the underlying physical mechanisms or even discovering new physical laws rather than simply obtaining mathematical expressions, this paper introduces a novel ML4Sci task paradigm. This paradigm focuses on interpreting experimental data within the framework of prior physical hypotheses and theories, thereby guiding and constraining the discovery of PDE expressions. We have formulated this approach as a nonlinear mixed-integer programming (MIP) problem, addressed through an efficient search scheme developed for this purpose. Our experiments on newly designed Fluid Mechanics and Laser Fusion datasets demonstrate the interpretability and feasibility of this method.

ICML Conference 2025 Conference Paper

QEM-Bench: Benchmarking Learning-based Quantum Error Mitigation and QEMFormer as a Multi-ranged Context Learning Baseline

  • Tianyi Bao
  • Ruizhe Zhong
  • Xinyu Ye
  • Yehui Tang 0002
  • Junchi Yan

Quantum Error Mitigation (QEM) has emerged as a pivotal technique for enhancing the reliability of noisy quantum devices in the Noisy Intermediate-Scale Quantum (NISQ) era. Recently, machine learning (ML)-based QEM approaches have demonstrated strong generalization capabilities without sampling overheads compared to conventional methods. However, evaluating these techniques is often hindered by a lack of standardized datasets and inconsistent experimental settings across different studies. In this work, we present QEM-Bench, a comprehensive benchmark suite of twenty-two datasets covering diverse circuit types and noise profiles, which provides a unified platform for comparing and advancing ML-based QEM methods. We further propose a refined ML-based QEM pipeline QEMFormer, which leverages a feature encoder that preserves local, global, and topological information, along with a two-branch model that captures short-range and long-range dependencies within the circuit. Empirical evaluations on QEM-Bench illustrate the superior performance of QEMFormer over existing baselines, underscoring the potential of integrated ML-QEM strategies.

ICLR Conference 2025 Conference Paper

QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation

  • Yehui Tang 0002
  • Mabiao Long
  • Junchi Yan

Quantum state property estimation (QPE) is a fundamental challenge in quantum many-body problems in physics and chemistry, involving the prediction of characteristics such as correlation and entanglement entropy through statistical analysis of quantum measurement data. Recent advances in deep learning have provided powerful solutions, predominantly using auto-regressive models. These models generally assume an intrinsic ordering among qubits, aiming to approximate the classical probability distribution through sequential training. However, unlike natural language, the entanglement structure of qubits lacks an inherent ordering, hurting the motivation of such models. In this paper, we introduce a novel, non-autoregressive generative model called \textbf{\model}, designed for \underline{\textbf{Qua}}ntum state property estimation using \underline{\textbf{Di}}ffusion \underline{\textbf{M}}odels. \model progressively denoises Gaussian noise into the distribution corresponding to the quantum state, encouraging equal, unbiased treatment of all qubits. \model learns to map physical variables to properties of the ground state of the parameterized Hamiltonian during offline training. Afterwards one can sample from the learned distribution conditioned on previously unseen physical variables to collect measurement records and employ post-processing to predict properties of unknown quantum states. We evaluate \model on large-scale QPE tasks using classically simulated data on the 1D anti-ferromagnetic Heisenberg model with the system size up to 100 qubits. Numerical results demonstrate that \model outperforms baseline models, particularly auto-regressive approaches, under conditions of limited measurement data during training and reduced sample complexity during inference.

ICML Conference 2025 Conference Paper

QuanONet: Quantum Neural Operator with Application to Differential Equation

  • Ruocheng Wang
  • Zhuo Xia
  • Ge Yan 0001
  • Junchi Yan

Differential equations are essential and popular in science and engineering. Learning-based methods including neural operators, have emerged as a promising paradigm. We explore its quantum counterpart, and propose QuanONet – a quantum neural operator which has not been well studied in literature compared with their counterparts in other machine learning areas. We design a novel architecture as a hardware-efficient ansatz, in the era of noisy intermediate-scale quantum (NISQ). Its circuit is pure quantum. By lying its ground on the operator approximation theorem for its quantum counterpart, QuanONet in theory can fit various differential equation operators. We also propose its modified version TF-QuanONet with ability to adaptively fit the dominant frequency of the problem. The real-device empirical results on problems including anti-derivative operators, Diffusion-reaction Systems demonstrate that QuanONet outperforms peer quantum methods when their model sizes are set akin to QuanONet.

NeurIPS Conference 2025 Conference Paper

Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

  • Zhenjie Yang
  • Xiaosong Jia
  • Qifeng Li
  • Xue Yang
  • Maoqing Yao
  • Junchi Yan

Reinforcement Learning (RL) can mitigate the causal confusion and distribution shift inherent to imitation learning (IL). However, applying RL to end-to-end autonomous driving (E2E-AD) remains an open problem for its training difficulty, and IL is still the mainstream paradigm in both academia and industry. Recently Model-based Reinforcement Learning (MBRL) have demonstrated promising results in neural planning; however, these methods typically require privileged information as input rather than raw sensor data. We fill this gap by designing Raw2Drive, a dual-stream MBRL approach. Initially, we efficiently train an auxiliary privileged world model paired with a neural planner that uses privileged information as input. Subsequently, we introduce a raw sensor world model trained via our proposed Guidance Mechanism, which ensures consistency between the raw sensor world model and the privileged world model during rollouts. Finally, the raw sensor world model combines the prior knowledge embedded in the heads of the privileged world model to effectively guide the training of the raw sensor policy. Raw2Drive is so far the only RL based end-to-end method on CARLA Leaderboard 2. 0, and Bench2Drive and it achieves state-of-the-art performance.

ICLR Conference 2025 Conference Paper

Regularizing Energy among Training Samples for Out-of-Distribution Generalization

  • Yiting Chen 0003
  • Qitian Wu
  • Junchi Yan

The energy-based model provides a unified framework for various learning models where an energy value is assigned to each configuration of random variables based on probability. Recently, different methods have been proposed to derive an energy value out of the logits of a classifier for out-of-distribution (OOD) detection or OOD generalization. However, these methods mainly focus on the energy difference between in-distribution and OOD data samples, neglecting the energy difference among in-distribution data samples. In this paper, we show that the energy among in-distribution data also requires attention. We propose to investigate the energy difference between in-distribution data samples. Both empirically and theoretically, we show that previous methods for subpopulation shift (\emph{e.g.}, long-tail classification) such as data re-weighting and margin control apply implicit energy regularization and we provide a unified framework from the energy perspective. With the influence function, we further extend the energy regularization framework to OOD generalization scenarios where the distribution shift is more implicit compared to the long-tail recognition scenario. We conduct experiments on long-tail datasets, subpopulation shift benchmarks, and OOD generalization benchmarks to show the effectiveness of the proposed energy regularization.

NeurIPS Conference 2025 Conference Paper

Repurposing AlphaFold3-like Protein Folding Models for Antibody Sequence and Structure Co-design

  • Nianzu Yang
  • Songlin Jiang
  • Jian Ma
  • Huaijin Wu
  • Shuangjia Zheng
  • Wengong Jin
  • Junchi Yan

Diffusion models hold great potential for accelerating antibody design, but their performance is so far limited by the number of antibody-antigen complexes used for model training. Meanwhile, AlphaFold3-like protein folding models, pre-trained on a large corpus of crystal structures, have acquired a broad understanding of biomolecular interaction. Based on this insight, we develop a new antigen-conditioned antibody design model by adapting the diffusion module of AlphaFold3-like models for sequence-structure co-diffusion. Specifically, we extend their structure diffusion module with a sequence diffusion head and fine-tune the entire protein folding model for antibody sequence-structure co-design. Our benchmark results show that sequence-structure co-diffusion models not only surpass state-of-the-art antibody design methods in performance but also maintain structure prediction accuracy comparable to the original folding model. Notably, in the antibody co-design task, our method achieves a CDR-H3 recovery rate of 65% for typical antibodies, outperforming the baselines by 87%, and attains a remarkable 63% recovery rate for nanobodies.

ICLR Conference 2025 Conference Paper

Rethinking and Improving Autoformalization: Towards a Faithful Metric and a Dependency Retrieval-based Approach

  • Qi Liu
  • Xinhao Zheng
  • Xudong Lu
  • Qinxiang Cao
  • Junchi Yan

As a central component in formal verification, statement autoformalization has been widely studied including the recent efforts from machine learning community, but still remains a widely-recognized difficult and open problem. In this paper, we delve into two critical yet under-explored gaps: 1) absence of faithful and universal automated evaluation for autoformalization results; 2) agnosia of contextual information, inducing severe hallucination of formal definitions and theorems. To address the first issue, we propose **BEq** (_**B**idirectional **E**xtended Definitional E**q**uivalence_), an automated neuro-symbolic method to determine the equivalence between two formal statements, which is formal-grounded and well-aligned with human intuition. For the second, we propose **RAutoformalizer** (_**R**etrieval-augmented **Autoformalizer**_), augmenting statement autoformalization by _Dependency Retrieval_, retrieving potentially dependent objects from formal libraries. We parse the dependencies of libraries and propose to _structurally informalise_ formal objects by the topological order of dependencies. To evaluate OOD generalization and research-level capabilities, we build a novel benchmark, _Con-NF_, consisting of 961 informal-formal statement pairs from frontier mathematical researches. Experiments validate the effectiveness of our approaches: BEq is evaluated on 200 diverse formal statement pairs with expert-annotated equivalence label, exhibiting significantly improved accuracy ($82.50\\% \mapsto 90.50\\%$) and precision ($70.59\\% \mapsto 100.0\\%$). For dependency retrieval, a strong baseline is devised. Our RAutoformalizer substantially outperforms SOTA baselines in both in-distribution ProofNet benchmark ($12.83\\% \mapsto 18.18\\%$, BEq@8) and OOD Con-NF scenario ($4.58\\%\mapsto 16.86\\%$, BEq@8).

ICLR Conference 2025 Conference Paper

Rethinking Classifier Re-Training in Long-Tailed Recognition: Label Over-Smooth Can Balance

  • Siyu Sun
  • Han Lu
  • Jiangtong Li
  • Yichen Xie 0002
  • Tianjiao Li
  • Xiaokang Yang 0001
  • Liqing Zhang 0001
  • Junchi Yan

In the field of long-tailed recognition, the Decoupled Training paradigm has shown exceptional promise by dividing training into two stages: representation learning and classifier re-training. While previous work has tried to improve both stages simultaneously, this complicates isolating the effect of classifier re-training. Recent studies reveal that simple regularization can produce strong feature representations, highlighting the need to reassess classifier re-training methods. In this study, we revisit classifier re-training methods based on a unified feature representation and re-evaluate their performances. We propose two new metrics, Logits Magnitude and Regularized Standard Deviation, to compare the differences and similarities between various methods. Using these two newly proposed metrics, we demonstrate that when the Logits Magnitude across classes is nearly balanced, further reducing its overall value can effectively decrease errors and disturbances during training, leading to better model performance. Based on our analysis using these metrics, we observe that adjusting the logits could improve model performance, leading us to develop a simple label over-smoothing approach to adjust the logits without requiring prior knowledge of class distribution. This method softens the original one-hot labels by assigning a probability slightly higher than $\frac{1}{K}$ to the true class and slightly lower than $\frac{1}{K}$ to the other classes, where $K$ is the number of classes. Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist2018.

AIJ Journal 2025 Journal Article

Rethinking visual prompt learning as masked visual token modeling

  • Ning Liao
  • Bowen Shi
  • Xiaopeng Zhang
  • Min Cao
  • Junchi Yan
  • Qi Tian

Prompt learning has achieved great success in efficiently exploiting large-scale pre-trained models in natural language processing (NLP). It reformulates the downstream tasks as the generative pre-training ones to achieve consistency, thus improving the performance stably. However, when transferring it to the vision area, current visual prompt learning methods are almost designed on discriminative pre-trained models, and there is also a lack of careful design to unify the forms of pre-training and downstream tasks. To explore prompt learning on the generative pre-trained visual model, as well as keeping the task consistency, we propose Visual Prompt learning as masked visual Token Modeling (VPTM) to transform the downstream visual classification into the pre-trained masked visual token prediction. In addition, we develop the prototypical verbalizer for mapping the predicted visual token with implicit semantics to explicit downstream labels. To our best knowledge, VPTM is the first visual prompt method on the generative pre-trained visual model, which achieves consistency between pre-training and downstream visual classification by task reformulation. Experiments show that VPTM outperforms other visual prompt methods and achieves excellent efficiency. Moreover, the task consistency of VPTM contributes to the robustness against prompt location, prompt length and prototype dimension, and could be deployed uniformly.

ICLR Conference 2025 Conference Paper

SelKD: Selective Knowledge Distillation via Optimal Transport Perspective

  • Liangliang Shi
  • Zhengyan Shi
  • Junchi Yan

Knowledge Distillation (KD) has been a popular paradigm for training a (smaller) student model from its teacher model. However, little research has been done on the practical scenario where only a subset of the teacher's knowledge needs to be distilled, which we term selective KD (SelKD). This demand is especially pronounced in the era of foundation models, where the teacher model can be significantly larger than the student model. To address this issue, we propose to rethink the knowledge distillation problem from the perspective of Inverse Optimal Transport (IOT). Previous Bayesian frameworks mapped each sample to the probabilities of corresponding labels in an end-to-end manner, which fixed the number of classification categories and hindered effective partial knowledge transfer. In contrast, IOT calculates from the standpoint of transportation or matching, allowing for the flexible selection of samples and their quantities for matching. Traditional logit-based KD can be viewed as a special case within the IOT framework. Building on this IOT foundation, we formalize this setting in the context of classification, where only selected categories from the teacher's category space are required to be recognized by the student in the context of closed-set recognition, which we call closed-set SelKD, enhancing the student's performance on specific subtasks. Furthermore, we extend the closed-set SelKD, introducing an open-set version of SelKD, where the student model is required to provide a "not selected" response for categories outside its assigned task. Experimental results on standard benchmarks demonstrate the superiority of our approach. The source code is available at: \href{https://github.com/machoshi/SelKD}

ICLR Conference 2025 Conference Paper

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training

  • Zhanpeng Zhou
  • Mingze Wang
  • Yuchen Mao
  • Bingrui Li
  • Junchi Yan

Sharpness-Aware Minimization (SAM) has substantially improved the generalization of neural networks under various settings. Despite the success, its effectiveness remains poorly understood. In this work, we discover an intriguing phenomenon in the training dynamics of SAM, shedding light on understanding its implicit bias towards flatter minima over Stochastic Gradient Descent (SGD). Specifically, we find that *SAM efficiently selects flatter minima late in training*. Remarkably, even a few epochs of SAM applied at the end of training yield nearly the same generalization and solution sharpness as full SAM training. Subsequently, we delve deeper into the underlying mechanism behind this phenomenon. Theoretically, we identify two phases in the learning dynamics after applying SAM late in training: i) SAM first escapes the minimum found by SGD exponentially fast; and ii) then rapidly converges to a flatter minimum within the same valley. Furthermore, we empirically investigate the role of SAM during the early training phase. We conjecture that the optimization method chosen in the late phase is more crucial in shaping the final solution's properties. Based on this viewpoint, we extend our findings from SAM to Adversarial Training.

ICLR Conference 2025 Conference Paper

SINGER: Stochastic Network Graph Evolving Operator for High Dimensional PDEs

  • Mingquan Feng
  • Yixin Huang
  • Weixin Liao
  • Yuhong Liu
  • Yizhou Liu
  • Junchi Yan

We present a novel framework, StochastIc Network Graph Evolving operatoR (SINGER), for learning the evolution operator of high-dimensional partial differential equations (PDEs). The framework uses a sub-network to approximate the solution at the initial time step and stochastically evolves the sub-network parameters over time by a graph neural network to approximate the solution at later time steps. The framework is designed to inherit the desirable properties of the parametric solution operator, including graph topology, semigroup, and stability, with a theoretical guarantee. Numerical experiments on 8 evolution PDEs of 5,10,15,20-dimensions show that our method outperforms existing baselines in almost all cases (31 out of 32), and that our method generalizes well to unseen initial conditions, equation dimensions, sub-network width, and time steps.

NeurIPS Conference 2025 Conference Paper

StruDiCO: Structured Denoising Diffusion with Gradient-free Inference-stage Boosting for Memory and Time Efficient Combinatorial Optimization

  • Yu Wang
  • Yang Li
  • Junchi Yan
  • Yi Chang

Diffusion models have recently emerged as powerful neural solvers for combinatorial optimization (CO). However, existing approaches fail to reveal how variables are progressively determined during inference, making the final solution opaque until the last step. To address this limitation, we propose a structured denoising diffusion model, StruDiCO, which incrementally constructs solutions through step-wise variable selection. This is achieved via a variable-absorption noising model, wherein the forward process simulates gradual variable deactivation, converging to an empty solution, while the reverse process incrementally selects variables to reconstruct the final solution. This design induces structural continuity across intermediate states, enabling interpretable and trajectory-consistent partial solutions throughout inference. To further improve the reliability of reverse inference, we introduce a constrained consistency sampling strategy, which suppresses low-confidence variable selection at each step to stabilize the reverse process. Leveraging the structure-preserving reverse process, we further propose a lightweight, gradient-free, objective-aware refinement framework, which iteratively improves solution quality by applying structure-aware perturbations to the current solution, performing reverse inference through the constraint consistency model, and decoding with an objective-guided scoring scheme. Extensive experiments on two canonical CO tasks, the Traveling Salesman Problem (TSP) and Maximal Independent Set (MIS), show that StruDiCO outperforms state-of-the-art diffusion-based solvers, achieving up to $3. 5\times$ faster inference, 70\% lower GPU memory usage, and significantly improved solution quality, with up to 37. 7\% drop reduction on TSP and an average 38. 1\% improvement on MIS. The codes are publicly available at https: //github. com/yuuuuwang/StruDiCO.

IJCAI Conference 2025 Conference Paper

Tensor Network: from the Perspective of AI4Science and Science4AI

  • Junchi Yan
  • Yehui Tang
  • Xinyu Ye
  • Hao Xiong
  • Xiaoqiu Zhong
  • Yuhan Wang
  • Yuan Qi

Tensor network has been a promising numerical tool for computational problems across science and AI. For their emerging and fast development especially in the intersection between AI and science, this paper tries to present a compact review, regarding both their applications and its own recent technical development including open-source tools. Specifically, we make the observations that tensor network plays a functional role in matrix compression and representation, information fusion, as well as quantum-inspired algorithms, which can be generally regarded as Science4AI in our survey. On the other hand, there is an emerging line of research in tensor network in AI4Science especially like learning quantum many-body physics by using e. g. neural network quantum state. Importantly, we unify tensorization methodologies across classical and modern architectures, and particularly show how tensorization bridges low-order parameter spaces to high-dimensional representations without exponential parameter growth, and further point out their potential use in scientific computing. We conclude the paper with outlook for future trends.

ICML Conference 2025 Conference Paper

The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training

  • Jinbo Wang 0003
  • Mingze Wang
  • Zhanpeng Zhou
  • Junchi Yan
  • Weinan E
  • Lei Wu

Transformers have become the cornerstone of modern AI. Unlike traditional architectures, transformers exhibit a distinctive characteristic: diverse types of building blocks, such as embedding layers, normalization layers, self-attention mechanisms, and point-wise feed-forward networks, work collaboratively. Understanding the disparities and interactions among these blocks is therefore important. In this paper, we uncover a clear sharpness disparity across these blocks, which intriguingly emerges early in training and persists throughout the training process. Building on this insight, we propose a novel Blockwise Learning Rate (LR) strategy to accelerate large language model (LLM) pre-training. Specifically, by integrating Blockwise LR into AdamW, we consistently achieve lower terminal loss and nearly $2\times$ speedup compared to vanilla AdamW. This improvement is demonstrated across GPT-2 and LLaMA models, with model sizes ranging from 0. 12B to 1. 1B and datasets including OpenWebText and MiniPile. Finally, we incorporate Blockwise LR into Adam-mini (Zhang et al. , 2024), a recently proposed memory-efficient variant of Adam, achieving a combined $2\times$ speedup and $2\times$ memory savings. These results underscore the potential of leveraging the sharpness disparity principle to improve LLM training.

NeurIPS Conference 2025 Conference Paper

Train on Pins and Test on Obstacles for Rectilinear Steiner Minimum Tree

  • Xingbo Du
  • Ruizhe Zhong
  • Junchi Yan

Rectilinear Steiner Minimum Tree (RSMT) is widely used in Very Large Scale Integration (VLSI) and aims at connecting a set of pins using rectilinear edges while minimizing wirelength. Recently, learning-based methods have been explored to tackle this problem effectively. However, existing methods either suffer from excessive exploration of the search space or rely on heuristic combinations that compromise effectiveness and efficiency, and this limitation becomes notably exacerbated when extended to the obstacle-avoiding RSMT (OARSMT). To address this, we propose OAREST, a reinforcement learning-based framework for constructing an Obstacle-Avoiding Rectilinear Edge Sequence (RES) Tree. We theoretically establish the optimality of RES in obstacle-avoiding scenarios, which forms the foundation of our approach. Leveraging this theoretical insight, we introduce a dynamic masking strategy that supports parallel training across varying numbers of pins and extends to obstacles during inference. Empirical evaluations on both synthetic and real-world benchmarks show superior effectiveness and efficiency for RSMT and OARSMT problems, particularly in handling obstacles without training on them. Code available: https: //github. com/Thinklab-SJTU/EDA-AI/.

ICLR Conference 2025 Conference Paper

Trajectory-LLM: A Language-based Data Generator for Trajectory Prediction in Autonomous Driving

  • Kairui Yang
  • Zihao Guo
  • Gengjie Lin
  • Haotian Dong
  • Zhao Huang
  • Yipeng Wu
  • Die Zuo
  • Jibin Peng

Vehicle trajectory prediction is a crucial aspect of autonomous driving, which requires extensive trajectory data to train prediction models to understand the complex, varied, and unpredictable patterns of vehicular interactions. However, acquiring real-world data is expensive, so we advocate using Large Language Models (LLMs) to generate abundant and realistic trajectories of interacting vehicles efficiently. These models rely on textual descriptions of vehicle-to-vehicle interactions on a map to produce the trajectories. We introduce Trajectory-LLM (Traj-LLM), a new approach that takes brief descriptions of vehicular interactions as input and generates corresponding trajectories. Unlike language-based approaches that translate text directly to trajectories, Traj-LLM uses reasonable driving behaviors to align the vehicle trajectories with the text. This results in an "interaction-behavior-trajectory" translation process. We have also created a new dataset, Language-to-Trajectory (L2T), which includes 240K textual descriptions of vehicle interactions and behaviors, each paired with corresponding map topologies and vehicle trajectory segments. By leveraging the L2T dataset, Traj-LLM can adapt interactive trajectories to diverse map topologies. Furthermore, Traj-LLM generates additional data that enhances downstream prediction models, leading to consistent performance improvements across public benchmarks. The source code is released at https://github.com/TJU-IDVLab/Traj-LLM.

JMLR Journal 2025 Journal Article

Transformers from Diffusion: A Unified Framework for Neural Message Passing

  • Qitian Wu
  • David Wipf
  • Junchi Yan

Learning representations for structured data with certain geometries (e.g., observed or unobserved) is a fundamental challenge, wherein message passing neural networks (MPNNs) have become a de facto class of model solutions. In this paper, inspired by physical systems, we propose an energy-constrained diffusion model, which integrates the inductive bias of diffusion on manifolds with layer-wise constraints of energy minimization. We identify that the diffusion operators have a one-to-one correspondence with the energy functions implicitly descended by the diffusion process, and the finite-difference iteration for solving the energy-constrained diffusion system induces the propagation layers of various types of MPNNs operating on observed or latent structures. This leads to a unified mathematical framework for common neural architectures whose computational flows can be cast as message passing (or its special case), including MLPs, GNNs, and Transformers. Building on these insights, we devise a new class of neural message passing models, dubbed diffusion-inspired Transformers (DIFFormer), whose global attention layers are derived from the principled energy-constrained diffusion framework. Across diverse datasets ranging from real-world networks to images, texts, and physical particles, we demonstrate that the new model achieves promising performance in scenarios where the data structures are observed (as a graph), partially observed, or entirely unobserved. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2025. ( edit, beta )

ICLR Conference 2025 Conference Paper

UniCO: On Unified Combinatorial Optimization via Problem Reduction to Matrix-Encoded General TSP

  • Wenzheng Pan
  • Hao Xiong 0003
  • Jiale Ma
  • Wentao Zhao
  • Yang Li 0197
  • Junchi Yan

Various neural solvers have been devised for combinatorial optimization (CO), which are often tailored for specific problem types, e.g., TSP, CVRP and SAT, etc. Yet, it remains an open question how to achieve universality regarding problem representing and learning with a general framework. This paper first proposes **UniCO**, to unify a set of CO problems by reducing them into the *general* TSP form featured by distance matrices. The applicability of this strategy depends on the efficiency of the problem reduction and solution transition procedures, which we show that at least ATSP, HCP, and SAT are readily feasible. The hope is to allow for the effective and even simultaneous use of as many types of CO instances as possible to train a neural TSP solver, and optionally finetune it for specific problem types. In particular, unlike the prevalent TSP benchmarks based on Euclidean instances with 2-D coordinates, our studied domain of TSP could involve non-metric, asymmetric or discrete distances without explicit node coordinates, which is much less explored in TSP literature while poses new intellectual challenges. Along this direction, we devise two neural TSP solvers with and without supervision to conquer such matrix-formulated input, respectively: 1) **MatPOENet** and 2) **MatDIFFNet**. The former is a reinforcement learning-based sequential model with pseudo one-hot embedding (POE) scheme; and the latter is a Diffusion-based generative model with the mix-noised reference mapping scheme. Experiments on ATSP, 2DTSP, HCP- and SAT-distributed general TSPs show the strong ability towards arbitrary matrix-encoded TSP with structure and size variation.

ICLR Conference 2025 Conference Paper

Unify ML4TSP: Drawing Methodological Principles for TSP and Beyond from Streamlined Design Space of Learning and Search

  • Yang Li 0197
  • Jiale Ma
  • Wenzheng Pan
  • Runzhong Wang
  • Haoyu Geng
  • Nianzu Yang
  • Junchi Yan

Despite the rich works on machine learning (ML) for combinatorial optimization (CO), a unified, principled framework remains lacking. This study utilizes the Travelling Salesman Problem (TSP) as a major case study, with adaptations demonstrated for other CO problems, dissecting established mainstream learning-based solvers to outline a comprehensive design space. We present ML4TSPBench, which advances a unified modular streamline incorporating existing technologies in both learning and search for transparent ablation, aiming to reassess the role of learning and discern which parts of existing techniques are genuinely beneficial and which are not. This further leads to the investigation of desirable principles of learning designs and the exploration of concepts guiding method designs. We demonstrate the desirability of principles such as joint probability estimation, symmetry solution representation, and online optimization for learning-based designs. Leveraging the findings, we propose enhancements to existing methods to compensate for their missing attributes, thereby advancing performance and enriching the technique library. From a higher viewpoint, we also uncover a performance advantage in non-autoregressive and supervised paradigms compared to their counterparts. The strategic decoupling and organic recompositions yield a factory of new TSP solvers, where we investigate synergies across various method combinations and pinpoint the optimal design choices to create more powerful ML4TSP solvers, thereby facilitating and offering a reference for future research and engineering endeavors.

NeurIPS Conference 2025 Conference Paper

VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models

  • Xiangdong Zhang
  • Jiaqi Liao
  • Shaofeng Zhang
  • Fanqing Meng
  • Xiangpeng Wan
  • Junchi Yan
  • Yu Cheng

Recent advancements in text-to-video (T2V) diffusion models have enabled high-fidelity and realistic video synthesis. However, current T2V models often struggle to generate physically plausible content due to their limited inherent ability to accurately understand physics. We found that while the representations within T2V models possess some capacity for physics understanding, they lag significantly behind those from recent video self-supervised learning methods. To this end, we propose a novel framework called {VideoREPA}, which distills physics understanding capability from video understanding foundation models into T2V models by aligning token-level relations. This closes the physics understanding gap and enables more physics-plausible generation. Specifically, we introduce the {Token Relation Distillation (TRD) loss}, leveraging spatio-temporal alignment to provide soft guidance suitable for finetuning powerful pre-trained T2V models—a critical departure from prior representation alignment (REPA) methods. To our knowledge, VideoREPA is the first REPA method designed for finetuning T2V models and specifically for injecting physical knowledge. Empirical evaluations show that VideoREPA substantially enhances the physics commonsense of baseline method, CogVideoX, achieving significant improvement on relevant benchmarks and demonstrating a strong capacity for generating videos consistent with intuitive physics. Code and more video results are available at https: //videorepa. github. io/.

ICML Conference 2024 Conference Paper

ACM-MILP: Adaptive Constraint Modification via Grouping and Selection for Hardness-Preserving MILP Instance Generation

  • Ziao Guo
  • Yang Li 0197
  • Chang Liu 0021
  • Wenli Ouyang
  • Junchi Yan

Data plays a pivotal role in the development of both classic and learning-based methods for Mixed-Integer Linear Programming (MILP). However, the scarcity of data in real-world applications underscores the necessity for MILP instance generation methods. Currently, these methods primarily rely on iterating random single-constraint modifications, disregarding the underlying problem structure with constraint interrelations, thereby leading to compromised quality and solvability. In this paper, we propose ACM-MILP, a framework for MILP instance generation, to achieve adaptive constraint modification and constraint interrelation modeling. It employs an adaptive constraint selection mechanism based on probability estimation within the latent space to preserve instance characteristics. Meanwhile, it detects and groups strongly related constraints through community detection, enabling collective modifications that account for constraint dependencies. Experimental results show significant improvements in problem-solving hardness similarity under our framework. Additionally, in the downstream task, we showcase the efficacy of our generated instances for hyperparameter tuning. Source code is available: https: //github. com/Thinklab-SJTU/ACM-MILP.

NeurIPS Conference 2024 Conference Paper

Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving

  • Xiaosong Jia
  • Zhenjie Yang
  • Qifeng Li
  • Zhiyuan Zhang
  • Junchi Yan

In an era marked by the rapid scaling of foundation models, autonomous driving technologies are approaching a transformative threshold where end-to-end autonomous driving (E2E-AD) emerges due to its potential of scaling up in the data-driven manner. However, existing E2E-AD methods are mostly evaluated under the open-loop log-replay manner with L2 errors and collision rate as metrics (e. g. , in nuScenes), which could not fully reflect the driving performance of algorithms as recently acknowledged in the community. For those E2E-AD methods evaluated under the closed-loop protocol, they are tested in fixed routes (e. g. , Town05Long and Longest6 in CARLA) with the driving score as metrics, which is known for high variance due to the unsmoothed metric function and large randomness in the long route. Besides, these methods usually collect their own data for training, which makes algorithm-level fair comparison infeasible. To fulfill the paramount need of comprehensive, realistic, and fair testing environments for Full Self-Driving (FSD), we present Bench2Drive, the first benchmark for evaluating E2E-AD systems' multiple abilities in a closed-loop manner. Bench2Drive's official training data consists of 2 million fully annotated frames, collected from 10000 short clips uniformly distributed under 44 interactive scenarios (cut-in, overtaking, detour, etc), 23 weathers (sunny, foggy, rainy, etc), and 12 towns (urban, village, university, etc) in CARLA v2. Its evaluation protocol requires E2E-AD models to pass 44 interactive scenarios under different locations and weathers which sums up to 220 routes and thus provides a comprehensive and disentangled assessment about their driving capability under different situations. We implement state-of-the-art E2E-AD models and evaluate them in Bench2Drive, providing insights regarding current status and future directions.

NeurIPS Conference 2024 Conference Paper

Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime

  • Haoyu Geng
  • Hang Ruan
  • Runzhong Wang
  • Yang Li
  • Yang Wang
  • Lei Chen
  • Junchi Yan

Predictive combinatorial optimization, where the parameters of combinatorial optimization (CO) are unknown at the decision-making time, is the precise modeling of many real-world applications, including energy cost-aware scheduling and budget allocation on advertising. Tackling such a problem usually involves a prediction model and a CO solver. These two modules are integrated into the predictive CO pipeline following two design principles: ''Predict-then-Optimize (PtO)'', which learns predictions by supervised training and subsequently solves CO using predicted coefficients, while the other, named ''Predict-and-Optimize (PnO)'', directly optimizes towards the ultimate decision quality and claims to yield better decisions than traditional PtO approaches. However, there lacks a systematic benchmark of both approaches, including the specific design choices at the module level, as well as an evaluation dataset that covers representative real-world scenarios. To this end, we develop a modular framework to benchmark 11 existing PtO/PnO methods on 8 problems, including a new industrial dataset for combinatorial advertising that will be released. Our study shows that PnO approaches are better than PtO on 7 out of 8 benchmarks, but there is no silver bullet found for the specific design choices of PnO. A comprehensive categorization of current approaches and integration of typical scenarios are provided under a unified benchmark. Therefore, this paper could serve as a comprehensive benchmark for future PnO approach development and also offer fast prototyping for application-focused development. The code is available at \url{https: //github. com/Thinklab-SJTU/PredictiveCO-Benchmark}.

NeurIPS Conference 2024 Conference Paper

Boundary Matters: A Bi-Level Active Finetuning Method

  • Han Lu
  • Yichen Xie
  • Xiaokang Yang
  • Junchi Yan

The pretraining-finetuning paradigm has gained widespread adoption in vision tasks and other fields. However, the finetuning phase still requires high-quality annotated samples. To overcome this challenge, the concept of active finetuning has emerged, aiming to select the most appropriate samples for model finetuning within a limited budget. Existing active learning methods struggle in this scenario due to their inherent bias in batch selection. Meanwhile, the recent active finetuning approach focuses solely on global distribution alignment but neglects the contributions of samples to local boundaries. Therefore, we propose a Bi-Level Active Finetuning framework (BiLAF) to select the samples for annotation in one shot, encompassing two stages: core sample selection for global diversity and boundary sample selection for local decision uncertainty. Without the need of ground-truth labels, our method can successfully identify pseudo-class centers, apply a novel denoising technique, and iteratively select boundary samples with designed evaluation metric. Extensive experiments provide qualitative and quantitative evidence of our method's superior efficacy, consistently outperforming the existing baselines.

NeurIPS Conference 2024 Conference Paper

Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation

  • Qingwen Bu
  • Jia Zeng
  • Li Chen
  • Yanchao Yang
  • Guyue Zhou
  • Junchi Yan
  • Ping Luo
  • Heming Cui

Despite significant progress in robotics and embodied AI in recent years, deploying robots for long-horizon tasks remains a great challenge. Majority of prior arts adhere to an open-loop philosophy and lack real-time feedback, leading to error accumulation and undesirable robustness. A handful of approaches have endeavored to establish feedback mechanisms leveraging pixel-level differences or pre-trained visual representations, yet their efficacy and adaptability have been found to be constrained. Inspired by classic closed-loop control systems, we propose CLOVER, a closed-loop visuomotor control framework that incorporates feedback mechanisms to improve adaptive robotic control. CLOVER consists of a text-conditioned video diffusion model for generating visual plans as reference inputs, a measurable embedding space for accurate error quantification, and a feedback-driven controller that refines actions from feedback and initiates replans as needed. Our framework exhibits notable advancement in real-world robotic tasks and achieves state-of-the-art on CALVIN benchmark, improving by 8% over previous open-loop counterparts. Code and checkpoints are maintained at https: //github. com/OpenDriveLab/CLOVER.

ICLR Conference 2024 Conference Paper

Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach

  • Shaofeng Zhang
  • Jinfa Huang
  • Qiang Zhou 0001
  • Zhibin Wang 0004
  • Fan Wang 0019
  • Jiebo Luo 0001
  • Junchi Yan

Image outpainting aims to generate the content of an input sub-image beyond its original boundaries. It is an important task in content generation yet remains an open problem for generative models. This paper pushes the technical frontier of image outpainting in two directions that have not been resolved in literature: 1) outpainting with arbitrary and continuous multiples (without restriction), and 2) outpainting in a single step (even for large expansion multiples). Moreover, we develop a method that does not depend on a pre-trained backbone network, which is in contrast commonly required by the previous SOTA outpainting methods. The arbitrary multiple outpainting is achieved by utilizing randomly cropped views from the same image during training to capture arbitrary relative positional information. Specifically, by feeding one view and positional embeddings as queries, we can reconstruct another view. At inference, we generate images with arbitrary expansion multiples by inputting an anchor image and its corresponding positional embeddings. The one-step outpainting ability here is particularly noteworthy in contrast to previous methods that need to be performed for $N$ times to obtain a final multiple which is $N$ times of its basic and fixed multiple. We evaluate the proposed approach (called PQDiff as we adopt a diffusion-based generator as our embodiment, under our proposed \textbf{P}ositional \textbf{Q}uery scheme) on public benchmarks, demonstrating its superior performance over state-of-the-art approaches. Specifically, PQDiff achieves state-of-the-art FID scores on the Scenery (\textbf{21.512}), Building Facades (\textbf{25.310}), and WikiArts (\textbf{36.212}) datasets. Furthermore, under the 2.25x, 5x and 11.7x outpainting settings, PQDiff only takes \textbf{40.6\%}, \textbf{20.3\%} and \textbf{10.2\%} of the time of the benchmark state-of-the-art (SOTA) method.

AAAI Conference 2024 Conference Paper

Double-Bounded Optimal Transport for Advanced Clustering and Classification

  • Liangliang Shi
  • Zhaoqi Shen
  • Junchi Yan

Optimal transport (OT) is attracting increasing attention in machine learning. It aims to transport a source distribution to a target one at minimal cost. In its vanilla form, the source and target distributions are predetermined, which contracts to the real-world case involving undetermined targets. In this paper, we propose Doubly Bounded Optimal Transport (DB-OT), which assumes that the target distribution is restricted within two boundaries instead of a fixed one, thus giving more freedom for the transport to find solutions. Based on the entropic regularization of DB-OT, three scaling-based algorithms are devised for calculating the optimal solution. We also show that our DB-OT is helpful for barycenter-based clustering, which can avoid the excessive concentration of samples in a single cluster. Then we further develop DB-OT techniques for long-tailed classification which is an emerging and open problem. We first propose a connection between OT and classification, that is, in the classification task, training involves optimizing the Inverse OT to learn the representations, while testing involves optimizing the OT for predictions. with this OT perspective, we first apply DB-OT to improve the loss, and the Balanced Softmax is shown as a special case. Then we apply DB-OT for inference in the testing process. Even with vanilla Softmax trained features, our experiments show that our method can achieve good results with our improved inference scheme in the testing stage.

AAAI Conference 2024 Conference Paper

Dynamic Reactive Spiking Graph Neural Network

  • Han Zhao
  • Xu Yang
  • Cheng Deng
  • Junchi Yan

Spiking Graph Neural Networks are emerging tools for analyzing graph data along with low energy consumption and certain biological fidelity. Existing methods directly integrate same-reactive spiking neurons into graph neural networks for processing propagated graphs. However, such same-reactive neurons are not biological-functionality enough compared to the brain's dynamic-reactive ones, limiting the model's expression. Meanwhile, insufficient long-range neighbor information can be excavated with the few-step propagated graph, restricting discrimination of graph spiking embeddings. Inspired by the dynamic cognition in the brain, we propose a Dynamic Reactive Spiking Graph Neural Network that can enhance model's expressive ability in higher biological fidelity. Specifically, we design dynamic reactive spiking neurons to process spiking graph inputs, which have unique optimizable thresholds to spontaneously explore dynamic reactive states between neurons. Moreover, discriminative graph positional spikes are learned and integrated adaptively into spiking outputs through our neurons, thereby exploring long-range neighbors more thoroughly. Finally, with the dynamic reactive mechanism and learnable positional integration, we can obtain a powerful and highly bio-fidelity model with low energy consumption. Experiments on various domain-related datasets can demonstrate the effectiveness of our model. Our code is available at https://github.com/hzhao98/DRSGNN.

ICLR Conference 2024 Conference Paper

EBMDock: Neural Probabilistic Protein-Protein Docking via a Differentiable Energy Model

  • Huaijin Wu
  • Wei Liu 0005
  • Yatao An Bian
  • Jiaxiang Wu 0001
  • Nianzu Yang
  • Junchi Yan

Protein complex formation, a pivotal challenge in contemporary biology, has recently gained interest from the machine learning community, particularly concerning protein-ligand docking tasks. In this paper, we delve into the equally crucial but comparatively under-investigated domain of protein-protein docking. Specifically, we propose a geometric deep learning framework, termed EBMDock, which employs statistical potential as its energy function. This approach produces a probability distribution over docking poses, such that the identified docking pose aligns with a minimum point in the energy landscape. We employ a differential algorithm grounded in Langevin dynamics to efficiently sample from the docking pose distribution. Additionally, we incorporate energy-based training using contrastive divergence, enhancing both performance and stability. Empirical results demonstrate that our approach achieves superior performance on two benchmark datasets DIPS and DB5.5. Furthermore, the results suggest EBMDock can serve as an orthogonal enhancement to existing methods.

NeurIPS Conference 2024 Conference Paper

Fast T2T: Optimization Consistency Speeds Up Diffusion-Based Training-to-Testing Solving for Combinatorial Optimization

  • Yang Li
  • Jinpei Guo
  • Runzhong Wang
  • Hongyuan Zha
  • Junchi Yan

Diffusion models have recently advanced Combinatorial Optimization (CO) as a powerful backbone for neural solvers. However, their iterative sampling process requiring denoising across multiple noise levels incurs substantial overhead. We propose to learn direct mappings from different noise levels to the optimal solution for a given instance, facilitating high-quality generation with minimal shots. This is achieved through an optimization consistency training protocol, which, for a given instance, minimizes the difference among samples originating from varying generative trajectories and time steps relative to the optimal solution. The proposed model enables fast single-step solution generation while retaining the option of multi-step sampling to trade for sampling quality, which offers a more effective and efficient alternative backbone for neural solvers. In addition, within the training-to-testing (T2T) framework, to bridge the gap between training on historical instances and solving new instances, we introduce a novel consistency-based gradient search scheme during the test stage, enabling more effective exploration of the solution space learned during training. It is achieved by updating the latent solution probabilities under objective gradient guidance during the alternation of noise injection and denoising steps. We refer to this model as Fast T2T. Extensive experiments on two popular tasks, the Traveling Salesman Problem (TSP) and Maximal Independent Set (MIS), demonstrate the superiority of Fast T2T regarding both solution quality and efficiency, even outperforming LKH given limited time budgets. Notably, Fast T2T with merely one-step generation and one-step gradient search can mostly outperform the SOTA diffusion-based counterparts that require hundreds of steps, while achieving tens of times speedup.

NeurIPS Conference 2024 Conference Paper

FlexPlanner: Flexible 3D Floorplanning via Deep Reinforcement Learning in Hybrid Action Space with Multi-Modality Representation

  • Ruizhe Zhong
  • Xingbo Du
  • Shixiong Kai
  • Zhentao Tang
  • Siyuan Xu
  • Jianye Hao
  • Mingxuan Yuan
  • Junchi Yan

In the Integrated Circuit (IC) design flow, floorplanning (FP) determines the position and shape of each block. Serving as a prototype for downstream tasks, it is critical and establishes the upper bound of the final PPA (Power, Performance, Area). However, with the emergence of 3D IC with stacked layers, existing methods are not flexible enough to handle the versatile constraints. Besides, they typically face difficulties in aligning the cross-die modules in 3D ICs due to their heuristic representations, which could potentially result in severe data transfer failures. To address these issues, we propose FlexPlanner, a flexible learning-based method in hybrid action space with multi-modality representation to simultaneously handle position, aspect ratio, and alignment of blocks. To our best knowledge, FlexPlanner is the first learning-based approach to discard heuristic-based search in the 3D FP task. Thus, the solution space is not limited by the heuristic floorplanning representation, allowing for significant improvements in both wirelength and alignment scores. Specifically, FlexPlanner models 3D FP based on multi-modalities, including vision, graph, and sequence. To address the non-trivial heuristic-dependent issue, we design a sophisticated policy network with hybrid action space and asynchronous layer decision mechanism that allow for determining the versatile properties of each block. Experiments on public benchmarks MCNC and GSRC show the effectiveness. We significantly improve the alignment score from 0. 474 to 0. 940 and achieve an average reduction of 16% in wirelength. Moreover, our method also demonstrates zero-shot transferability on unseen circuits.

ICLR Conference 2024 Conference Paper

Going Beyond Neural Network Feature Similarity: The Network Feature Complexity and Its Interpretation Using Category Theory

  • Yiting Chen 0003
  • Zhanpeng Zhou
  • Junchi Yan

The behavior of neural networks still remains opaque, and a recently widely noted phenomenon is that networks often achieve similar performance when initialized with different random parameters. This phenomenon has attracted significant attention in measuring the similarity between features learned by distinct networks. However, feature similarity could be vague in describing the same feature since equivalent features hardly exist. In this paper, we expand the concept of equivalent feature and provide the definition of what we call *functionally equivalent features*. These features produce equivalent output under certain transformations. Using this definition, we aim to derive a more intrinsic metric for the so-called *feature complexity* regarding the redundancy of features learned by a neural network at each layer. We offer a formal interpretation of our approach through the lens of category theory, a well-developed area in mathematics. To quantify the feature complexity, we further propose an efficient algorithm named Iterative Feature Merging. Our experimental results validate our ideas and theories from various perspectives. We empirically demonstrate that the functionally equivalence widely exists among different features learned by the same neural network and we could reduce the number of parameters of the network without affecting the performance. We have also drawn several interesting empirical findings, including: 1) the larger the network, the more redundant features it learns; 2) in particular, we show how to prune the networks based on our finding using direct equivalent feature merging, without fine-tuning which is often needed in peer network pruning methods; 3) same structured networks with higher feature complexity achieve better performance; 4) through the layers of a neural network, the feature complexity first increase then decrease; 5) for the image classification task, a group of functionally equivalent features may correspond to a specific semantic meaning. Source code will be made publicly available.

ICML Conference 2024 Conference Paper

Graph Out-of-Distribution Detection Goes Neighborhood Shaping

  • Tianyi Bao
  • Qitian Wu
  • Zetian Jiang
  • Yiting Chen 0003
  • Jiawei Sun 0001
  • Junchi Yan

Despite the rich line of research works on out-of-distribution (OOD) detection on images, the literature on OOD detection for interdependent data, e. g. , graphs, is still relatively limited. To fill this gap, we introduce TopoOOD as a principled approach that accommodates graph topology and neighborhood context for detecting OOD node instances on graphs. Meanwhile, we enrich the experiment settings by splitting in-distribution (ID) and OOD data based on distinct topological distributions, which presents new benchmarks for a more comprehensive analysis of graph-based OOD detection. The latter is designed to thoroughly assess the performance of these discriminators under distribution shifts involving structural information, providing a rigorous evaluation of methods in the emerging area of OOD detection on graphs. Our experimental results show the competitiveness of the proposed model across multiple datasets, as evidenced by up to a 15% increase in the AUROC and a 50% decrease in the FPR compared to existing state-of-the-art methods.

ICML Conference 2024 Conference Paper

How Graph Neural Networks Learn: Lessons from Training Dynamics

  • Chenxiao Yang
  • Qitian Wu
  • David P. Wipf
  • Ruoyu Sun 0001
  • Junchi Yan

A long-standing goal in deep learning has been to characterize the learning behavior of black-box models in a more interpretable manner. For graph neural networks (GNNs), considerable advances have been made in formalizing what functions they can represent, but whether GNNs will learn desired functions during the optimization process remains less clear. To fill this gap, we study their training dynamics in function space. In particular, we find that the optimization of GNNs through gradient descent implicitly leverages the graph structure to update the learned function. This phenomenon is dubbed as kernel-graph alignment, which has been empirically and theoretically corroborated. This new analytical framework from the optimization perspective enables interpretable explanations of when and why the learned GNN functions generalize, which are relevant to their limitations on heterophilic graphs. From a practical standpoint, it also provides high-level principles for designing new algorithms. We exemplify this by showing that a simple and efficient non-parametric algorithm, obtained by explicitly using graph structure to update the learned function, can consistently compete with nonlinear GNNs.

ICLR Conference 2024 Conference Paper

InterpGNN: Understand and Improve Generalization Ability of Transdutive GNNs through the Lens of Interplay between Train and Test Nodes

  • Jiawei Sun 0001
  • Kailai Li 0002
  • Ruoxin Chen
  • Jie Li 0002
  • Chentao Wu
  • Yue Ding 0001
  • Junchi Yan

Transductive node prediction has been a popular learning setting in Graph Neural Networks (GNNs). It has been widely observed that the shortage of information flow between the distant nodes and intra-batch nodes (for large-scale graphs) often hurt the generalization of GNNs which overwhelmingly adopt message-passing. Yet there is still no formal and direct theoretical results to quantitatively capture the underlying mechanism, despite the recent advance in both theoretical and empirical studies for GNN's generalization ability. In this paper, the $L$-hop interplay (i.e., message passing capability with training nodes) for a $L$-layer GNN is successfully incorporated in our derived PAC-Bayesian bound for GNNs in the semi-supervised transductive setting. In other words, we quantitatively show how the interplay between training and testing sets influence the generalization ability which also partly explains the effectiveness of some existing empirical methods for enhancing generalization. Based on this result, we further design a plug-and-play ***Graph** **G**lobal **W**orkspace* module for GNNs (InterpGNN-GW) to enhance the interplay, utilizing the key-value attention mechanism to summarize crucial nodes' embeddings into memory and broadcast the memory to all nodes, in contrast to the pairwise attention scheme in previous graph transformers. Extensive experiments on both small-scale and large-scale graph datasets validate the effectiveness of our theory and approaches.

ICLR Conference 2024 Conference Paper

L2P-MIP: Learning to Presolve for Mixed Integer Programming

  • Chang Liu 0021
  • Zhichen Dong
  • Haobo Ma
  • Weilin Luo
  • Xijun Li
  • Bowen Pang
  • Jia Zeng
  • Junchi Yan

Modern solvers for solving mixed integer programming (MIP) often rely on the branch-and-bound (B&B) algorithm which could be of high time complexity, and presolving techniques are well designed to simplify the instance as pre-processing before B&B. However, such presolvers in existing literature or open-source solvers are mostly set by default agnostic to specific input instances, and few studies have been reported on tailoring presolving settings. In this paper, we aim to dive into this open question and show that the MIP solver can be indeed largely improved when switching the default instance-agnostic presolving into instance-specific presolving. Specifically, we propose a combination of supervised learning and classic heuristics to achieve efficient presolving adjusting, avoiding tedious reinforcement learning. Notably, our approach is orthogonal from many recent efforts in incorporating learning modules into the B&B framework after the presolving stage, and to our best knowledge, this is the first work for introducing learning to presolve in MIP solvers. Experiments on multiple real-world datasets show that well-trained neural networks can infer proper presolving for arbitrary incoming MIP instances in less than 0.5s, which is neglectable compared with the solving time often hours or days.

ICLR Conference 2024 Conference Paper

LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving

  • Tianyu Li 0004
  • Peijin Jia
  • Bangjun Wang
  • Li Chen 0008
  • Kun Jiang
  • Junchi Yan
  • Hongyang Li 0001

A map, as crucial information for downstream applications of an autonomous driving system, is usually represented in lanelines or centerlines. However, existing literature on map learning primarily focuses on either detecting geometry-based lanelines or perceiving topology relationships of centerlines. Both of these methods ignore the intrinsic relationship of lanelines and centerlines, that lanelines bind centerlines. While simply predicting both types of lane in one model is mutually excluded in learning objective, we advocate lane segment as a new representation that seamlessly incorporates both geometry and topology information. Thus, we introduce LaneSegNet, the first end-to-end mapping network generating lane segments to obtain a complete representation of the road structure. Our algorithm features two key modifications. One is a lane attention module to capture pivotal region details within the long-range feature space. Another is an identical initialization strategy for reference points, which enhances the learning of positional priors for lane attention. On the OpenLane-V2 dataset, LaneSegNet outperforms previous counterparts by a substantial gain across three tasks, i.e., map element detection (+4.8 mAP), centerline perception (+6.9 DET$_l$), and the newly defined one, lane segment perception (+5.6 mAP). Furthermore, it obtains a real-time inference speed of 14.7 FPS. Code is accessible at https://github.com/OpenDriveLab/LaneSegNet.

ICML Conference 2024 Conference Paper

Learning Divergence Fields for Shift-Robust Graph Representations

  • Qitian Wu
  • Fan Nie
  • Chenxiao Yang
  • Junchi Yan

Real-world data generation often involves certain geometries (e. g. , graphs) that induce instance-level interdependence. This characteristic makes the generalization of learning models more difficult due to the intricate interdependent patterns that impact data-generative distributions and can vary from training to testing. In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging generalization problem with interdependent data. We generalize the diffusion equation with stochastic diffusivity at each time step, which aims to capture the multi-faceted information flows among interdependent data. Furthermore, we derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains. Regarding practical implementation, we introduce three model instantiations that can be considered as the generalized versions of GCN, GAT, and Transformers, respectively, which possess advanced robustness against distribution shifts. We demonstrate their promising efficacy for out-of-distribution generalization on diverse real-world datasets. Source codes are available at https: //github. com/fannie1208/GLIND.

NeurIPS Conference 2024 Conference Paper

Learning Plaintext-Ciphertext Cryptographic Problems via ANF-based SAT Instance Representation

  • Xinhao Zheng
  • Yang Li
  • Cunxin Fan
  • Huaijin Wu
  • Xinhao Song
  • Junchi Yan

Cryptographic problems, operating within binary variable spaces, can be routinely transformed into Boolean Satisfiability (SAT) problems regarding specific cryptographic conditions like plaintext-ciphertext matching. With the fast development of learning for discrete data, this SAT representation also facilitates the utilization of machine-learning approaches with the hope of automatically capturing patterns and strategies inherent in cryptographic structures in a data-driven manner. Existing neural SAT solvers consistently adopt conjunctive normal form (CNF) for instance representation, which in the cryptographic context can lead to scale explosion and a loss of high-level semantics. In particular, extensively used XOR operations in cryptographic problems can incur an exponential number of clauses. In this paper, we propose a graph structure based on Arithmetic Normal Form (ANF) to efficiently handle the XOR operation bottleneck. Additionally, we design an encoding method for AND operations in these ANF-based graphs, demonstrating improved efficiency over alternative general graph forms for SAT. We then propose CryptoANFNet, a graph learning approach that trains a classifier based on a message-passing scheme to predict plaintext-ciphertext satisfiability. Using ANF-based SAT instances, CryptoANFNet demonstrates superior scalability and can naturally capture higher-order operational information. Empirically, CryptoANFNet achieves a 50x speedup over heuristic solvers and outperforms SOTA learning-based SAT solver NeuroSAT, with 96\% vs. 91\% accuracy on small-scale and 72\% vs. 55\% on large-scale datasets from real encryption algorithms. We also introduce a key-solving algorithm that simplifies ANF-based SAT instances from plaintext and ciphertext, enhancing key decryption accuracy from 76. 5\% to 82\% and from 72\% to 75\% for datasets generated from two real encryption algorithms.

NeurIPS Conference 2024 Conference Paper

Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation

  • Jian Hu
  • Jiayi Lin
  • Junchi Yan
  • Shaogang Gong

Promptable segmentation typically requires instance-specific manual prompts to guide the segmentation of each desired object. To minimize such a need, task-generic promptable segmentation has been introduced, which employs a single task-generic prompt to segment various images of different objects in the same task. Current methods use Multimodal Large Language Models (MLLMs) to reason detailed instance-specific prompts from a task-generic prompt for improving segmentation accuracy. The effectiveness of this segmentation heavily depends on the precision of these derived prompts. However, MLLMs often suffer hallucinations during reasoning, resulting in inaccurate prompting. While existing methods focus on eliminating hallucinations to improve a model, we argue that MLLM hallucinations can reveal valuable contextual insights when leveraged correctly, as they represent pre-trained large-scale knowledge beyond individual images. In this paper, we first utilize hallucinations to mine task-related information from images and verify its accuracy to enhance precision of the generated prompts. Specifically, we introduce an iterative \textbf{Pro}mpt-\textbf{Ma}sk \textbf{C}ycle generation framework (ProMaC) with a prompt generator and a mask generator. The prompt generator uses a multi-scale chain of thought prompting, initially leveraging hallucinations to extract extended contextual prompts on a test image. These hallucinations are then minimized to formulate precise instance-specific prompts, directing the mask generator to produce masks that are consistent with task semantics by mask semantic alignment. Iteratively the generated masks induce the prompt generator to focus more on task-relevant image areas and reduce irrelevant hallucinations, resulting jointly in better prompts and masks. Experiments on 5 benchmarks demonstrate the effectiveness of ProMaC. Code is in https: //lwpyh. github. io/ProMaC/.

ICLR Conference 2024 Conference Paper

M3C: A Framework towards Convergent, Flexible, and Unsupervised Learning of Mixture Graph Matching and Clustering

  • Jiaxin Lu 0001
  • Zetian Jiang
  • Tianzhe Wang
  • Junchi Yan

Existing graph matching methods typically assume that there are similar structures between graphs and they are matchable. This work addresses a more realistic scenario where graphs exhibit diverse modes, requiring graph grouping before or along with matching, a task termed mixture graph matching and clustering. Specifically, we introduce Minorize-Maximization Matching and Clustering (M3C), a learning-free algorithm that guarantees theoretical convergence through the Minorize-Maximization framework and offers enhanced flexibility via relaxed clustering. Building on M3C, we further develop UM3C, an unsupervised model that incorporates novel edge-wise affinity learning and pseudo label selection. Extensive experimental results on public benchmarks demonstrate that our method outperforms state-of-the-art graph matching and mixture graph matching and clustering approaches in both accuracy and efficiency.

ICML Conference 2024 Conference Paper

MILP-FBGen: LP/MILP Instance Generation with Feasibility/Boundedness

  • Yahong Zhang
  • Chenchen Fan
  • Donghui Chen
  • Congrui Li
  • Wenli Ouyang
  • Mingda Zhu
  • Junchi Yan

Machine learning (ML) has been actively adopted in Linear Programming (LP) and Mixed-Integer Linear Programming (MILP), whose potential is hindered by instance scarcity. Current synthetic instance generation methods often fall short in closely mirroring the distribution of original datasets or ensuring the feasibility and boundedness of the generated data — a critical requirement for obtaining reliable supervised labels in model training. In this paper, we present a diffusion-based LP/MILP instance generative framework called MILP-FBGen. It strikes a balance between structural similarity and novelty while maintaining feasibility/boundedness via a meticulously designed structure-preserving generation module and a feasibility/boundedness-constrained sampling module. Our method shows superiority on two fronts: 1) preservation of key properties (hardness, feasibility, and boundedness) of LP/MILP instances, and 2) enhanced performance on downstream tasks. Extensive studies show two-fold superiority that our method ensures higher distributional similarity and 100% feasibility in both easy and hard datasets, surpassing current state-of-the-art techniques.

ICLR Conference 2024 Conference Paper

MixSATGEN: Learning Graph Mixing for SAT Instance Generation

  • Xinyan Chen
  • Yang Li 0197
  • Runzhong Wang
  • Junchi Yan

The Boolean satisfiability problem (SAT) stands as a canonical NP-complete task. In particular, the scarcity of real-world SAT instances and their usefulness for tuning SAT solvers underscore the necessity for effective and efficient ways of hard instance generation, whereas existing methods either struggle to maintain plausible hardness or suffer from limited applicability. Different from the typical construction-based methods, this paper introduces an adaptive and efficient graph interpolation approach that in place modifies the raw structure of graph-represented SAT instance by replacing it with a counterpart from another instance. Specifically, it involves a two-stage matching and mixing pipeline. The matching aims to find a correspondence map of literal nodes from two instance graphs via learned features from a matching network; while the mixing stage involves iteratively exchanging clause pairs with the highest correspondence scores until a specified replacement ratio is achieved. We further show that under our matching-mixing framework, moderate randomness can avoid hardness degradation of instances by introducing Gumbel noise. Experimental results show the superiority of our method with both resemblance in structure and hardness, and general applicability.

ICML Conference 2024 Conference Paper

MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation

  • Nianzu Yang
  • Kaipeng Zeng
  • Haotian Lu 0009
  • Yexin Wu
  • Zexin Yuan
  • Danni Chen
  • Shengdian Jiang
  • Jiaxiang Wu 0001

Neuronal morphology is essential for studying brain functioning and understanding neurodegenerative disorders. As acquiring real-world morphology data is expensive, computational approaches for morphology generation have been studied. Traditional methods heavily rely on expert-set rules and parameter tuning, making it difficult to generalize across different types of morphologies. Recently, MorphVAE was introduced as the sole learning-based method, but its generated morphologies lack plausibility, i. e. , they do not appear realistic enough and most of the generated samples are topologically invalid. To fill this gap, this paper proposes MorphGrower, which mimicks the neuron natural growth mechanism for generation. Specifically, MorphGrower generates morphologies layer by layer, with each subsequent layer conditioned on the previously generated structure. During each layer generation, MorphGrower utilizes a pair of sibling branches as the basic generation block and generates branch pairs synchronously. This approach ensures topological validity and allows for fine-grained generation, thereby enhancing the realism of the final generated morphologies. Results on four real-world datasets demonstrate that MorphGrower outperforms MorphVAE by a notable margin. Importantly, the electrophysiological response simulation demonstrates the plausibility of our generated samples from a neuroscience perspective. Our code is available at https: //github. com/Thinklab-SJTU/MorphGrower.

ICLR Conference 2024 Conference Paper

Node2ket: Efficient High-Dimensional Network Embedding in Quantum Hilbert Space

  • Hao Xiong 0003
  • Yehui Tang 0002
  • Yunlin He
  • Wei Tan
  • Junchi Yan

Network embedding (NE) is a prominent technique for network analysis where the nodes are represented as vectorized embeddings in a continuous space. Existing works tend to resort to the low-dimensional embedding space for efficiency and less risk of over-fitting. In this paper, we explore a new NE paradigm whose embedding dimension goes exponentially high w.r.t. the number of parameters, yet being very efficient and effective. Specifically, the node embeddings are represented as product states that lie in a super high-dimensional (e.g. $2^{32}$-dim) quantum Hilbert space, with a carefully designed optimization approach to guarantee the robustness to work in different scenarios. In the experiments, we show diverse virtues of our methods, including but not limited to: the overwhelming performance on downstream tasks against conventional low-dimensional NE baselines with the similar amount of computing resources, the super high efficiency for a fixed low embedding dimension (e.g. 512) with less than 1/200 memory usage, the robustness when equipped with different objectives and sampling strategies as a fundamental tool for future NE research. As a relatively unexplored topic in literature, the high-dimensional NE paradigm is demonstrated effective both experimentally and theoretically.

ICML Conference 2024 Conference Paper

On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm

  • Zhanpeng Zhou
  • Zijun Chen
  • Yilan Chen 0002
  • Bo Zhang 0069
  • Junchi Yan

The pretraining-finetuning paradigm has become the prevailing trend in modern deep learning. In this work, we discover an intriguing linear phenomenon in models that are initialized from a common pretrained checkpoint and finetuned on different tasks, termed as Cross-Task Linearity (CTL). Specifically, we show that if we linearly interpolate the weights of two finetuned models, the features in the weight-interpolated model are often approximately equal to the linear interpolation of features in two finetuned models at each layer. We provide comprehensive empirical evidence supporting that CTL consistently occurs for finetuned models that start from the same pretrained checkpoint. We conjecture that in the pretraining-finetuning paradigm, neural networks approximately function as linear maps, mapping from the parameter space to the feature space. Based on this viewpoint, our study unveils novel insights into explaining model merging/editing, particularly by translating operations from the parameter space to the feature space. Furthermore, we delve deeper into the root cause for the emergence of CTL, highlighting the role of pretraining.

ICML Conference 2024 Conference Paper

OT-CLIP: Understanding and Generalizing CLIP via Optimal Transport

  • Liangliang Shi
  • Jack Fan
  • Junchi Yan

We propose to understand Contrastive Language-Image Pretraining model (CLIP) from the Optimal Transport (OT) perspective. Specifically, we show that training of CLIP is an embodiment of inverse OT and the adopted two InfoNCE losses in CLIP correspond to a special case of bilevel optimization of modified entropic OT. We then generalize the original CLIP loss to an OT-based loss family using variants of Regularized OT (e. g. Fused Gromov OT, unbalanced OT, etc.), and demonstrate their superior performance on public datasets for both image and text downstream tasks. We also rethink the inference stage of CLIP by using the tool of OT, and propose to adopt the fused Gromov OT for (zero-shot) classification, in which the prediction is based on the graph representation whereby images and texts are nodes for graph matching. By our new technique, we show how to generalize zero-shot classification to other more flexible zero-shot tasks with competitive performance: long-tailed classification and selective classification. The former assumes the known prior distribution of labels, while in the latter case, only a subset of samples are asked to predict, yet with high prediction confidence.

NeurIPS Conference 2024 Conference Paper

PCP-MAE: Learning to Predict Centers for Point Masked Autoencoders

  • Xiangdong Zhang
  • Shaofeng Zhang
  • Junchi Yan

Masked autoencoder has been widely explored in point cloud self-supervised learning, whereby the point cloud is generally divided into visible and masked parts. These methods typically include an encoder accepting visible patches (normalized) and corresponding patch centers (position) as input, with the decoder accepting the output of the encoder and the centers (position) of the masked parts to reconstruct each point in the masked patches. Then, the pre-trained encoders are used for downstream tasks. In this paper, we show a motivating empirical result that when directly feeding the centers of masked patches to the decoder without information from the encoder, it still reconstructs well. In other words, the centers of patches are important and the reconstruction objective does not necessarily rely on representations of the encoder, thus preventing the encoder from learning semantic representations. Based on this key observation, we propose a simple yet effective method, $i. e. $, learning to \textbf{P}redict \textbf{C}enters for \textbf{P}oint \textbf{M}asked \textbf{A}uto\textbf{E}ncoders (\textbf{PCP-MAE}) which guides the model to learn to predict the significant centers and use the predicted centers to replace the directly provided centers. Specifically, we propose a Predicting Center Module (PCM) that shares parameters with the original encoder with extra cross-attention to predict centers. Our method is of high pre-training efficiency compared to other alternatives and achieves great improvement over Point-MAE, particularly surpassing it by \textbf{5. 50\% on OBJ-BG, 6. 03\% on OBJ-ONLY, and 5. 17\% on PB-T50-RS} for 3D object classification on the ScanObjectNN dataset. The code is available at \url{https: //github. com/aHapBean/PCP-MAE}.

AAAI Conference 2024 Conference Paper

PreRoutGNN for Timing Prediction with Order Preserving Partition: Global Circuit Pre-training, Local Delay Learning and Attentional Cell Modeling

  • Ruizhe Zhong
  • Junjie Ye
  • Zhentao Tang
  • Shixiong Kai
  • Mingxuan Yuan
  • Jianye Hao
  • Junchi Yan

Pre-routing timing prediction has been recently studied for evaluating the quality of a candidate cell placement in chip design. It involves directly estimating the timing metrics for both pin-level (slack, slew) and edge-level (net delay, cell delay), without time-consuming routing. However, it often suffers from signal decay and error accumulation due to the long timing paths in large-scale industrial circuits. To address these challenges, we propose a two-stage approach. First, we propose global circuit training to pre-train a graph auto-encoder that learns the global graph embedding from circuit netlist. Second, we use a novel node updating scheme for message passing on GCN, following the topological sorting sequence of the learned graph embedding and circuit graph. This scheme residually models the local time delay between two adjacent pins in the updating sequence, and extracts the lookup table information inside each cell via a new attention mechanism. To handle large-scale circuits efficiently, we introduce an order preserving partition scheme that reduces memory consumption while maintaining the topological dependencies. Experiments on 21 real world circuits achieve a new SOTA R2 of 0.93 for slack prediction, which is significantly surpasses 0.59 by previous SOTA method. Code will be available at: https://github.com/Thinklab-SJTU/EDA-AI.

JMLR Journal 2024 Journal Article

Pygmtools: A Python Graph Matching Toolkit

  • Runzhong Wang
  • Ziao Guo
  • Wenzheng Pan
  • Jiale Ma
  • Yikai Zhang
  • Nan Yang
  • Qi Liu
  • Longxuan Wei

Graph matching aims to find node-to-node matching among multiple graphs, which is a fundamental yet challenging problem. To facilitate graph matching in scientific research and industrial applications, pygmtools is released, which is a Python graph matching toolkit that implements a comprehensive collection of two-graph matching and multi-graph matching solvers, covering both learning-free solvers as well as learning-based neural graph matching solvers. Our implementation supports numerical backends including Numpy, PyTorch, Jittor, Paddle, runs on Windows, MacOS and Linux, and is friendly to install and configure. Comprehensive documentations covering beginner's guide, API reference and examples are available online. pygmtools is open-sourced under Mulan PSL v2 license. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2024. ( edit, beta )

NeurIPS Conference 2024 Conference Paper

QVAE-Mole: The Quantum VAE with Spherical Latent Variable Learning for 3-D Molecule Generation

  • Huanjin Wu
  • Xinyu Ye
  • Junchi Yan

Molecule generation ideally in its 3-D form has enjoyed wide applications in material, chemistry, life science, etc. We propose the first quantum parametric circuit for 3-D molecule generation for its potential quantum advantage especially considering the arrival of Noisy Intermediate-Scale Quantum (NISQ) era. We choose the Variational AutoEncoder (VAE) scheme for its simplicity and one-shot generation ability, which we believe is more quantum-friendly compared with the auto-regressive generative models or diffusion models as used in classic approaches. Specifically, we present a quantum encoding scheme designed for 3-D molecules with qubits complexity $\mathcal{O}(C\log n)$ ($n$ is the number of atoms) and adopt a von Mises-Fisher (vMF) distributed latent space to meet the inherent coherence of the quantum system. We further design to encode conditions into quantum circuits for property-specified generation. Experimentally, our model could generate plausible 3-D molecules and achieve competitive quantitative performance with significantly reduced circuit parameters compared with their classic counterparts. The source code will be released upon publication.

NeurIPS Conference 2024 Conference Paper

ReLIZO: Sample Reusable Linear Interpolation-based Zeroth-order Optimization

  • Xiaoxing Wang
  • Xiaohan Qin
  • Xiaokang Yang
  • Junchi Yan

Gradient estimation is critical in zeroth-order optimization methods, which aims to obtain the descent direction by sampling update directions and querying function evaluations. Extensive research has been conducted including smoothing and linear interpolation. The former methods smooth the objective function, causing a biased gradient estimation, while the latter often enjoys more accurate estimates, at the cost of large amounts of samples and queries at each iteration to update variables. This paper resorts to the linear interpolation strategy and proposes to reduce the complexity of gradient estimation by reusing queries in the prior iterations while maintaining the sample size unchanged. Specifically, we model the gradient estimation as a quadratically constrained linear program problem and manage to derive the analytical solution. It innovatively decouples the required sample size from the variable dimension without extra conditions required, making it able to leverage the queries in the prior iterations. Moreover, part of the intermediate variables that contribute to the gradient estimation can be directly indexed, significantly reducing the computation complexity. Experiments on both simulation functions and real scenarios (black-box adversarial attacks neural architecture search, and parameter-efficient fine-tuning for large language models), show its efficacy and efficiency. Our code is available at https: //github. com/Thinklab-SJTU/ReLIZO. git.

ICLR Conference 2024 Conference Paper

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation

  • Bo Zhang 0069
  • Xinyu Cai
  • Jiakang Yuan
  • Donglin Yang
  • Jianfei Guo
  • Xiangchao Yan
  • Renqiu Xia
  • Botian Shi

Domain shifts such as sensor type changes and geographical situation variations are prevalent in Autonomous Driving (AD), which poses a challenge since AD model relying on the previous domain knowledge can be hardly directly deployed to a new domain without additional costs. In this paper, we provide a new perspective and approach of alleviating the domain shifts, by proposing a Reconstruction-Simulation-Perception (ReSimAD) scheme. Specifically, the implicit reconstruction process is based on the knowledge from the previous old domain, aiming to convert the domain-related knowledge into domain-invariant representations, e.g., 3D scene-level meshes. Besides, the point clouds simulation process of multiple new domains is conditioned on the above reconstructed 3D meshes, where the target-domain-like simulation samples can be obtained, thus reducing the cost of collecting and annotating new-domain data for the subsequent perception process. For experiments, we consider different cross-domain situations such as Waymo-to-KITTI, Waymo-to-nuScenes, etc, to verify the zero-shot target-domain perception using ReSimAD. Results demonstrate that our method is beneficial to boost the domain generalization ability, even promising for 3D pre-training. Code and simulated points are available at: https://github.com/PJLab-ADG/3DTrans

NeurIPS Conference 2024 Conference Paper

Rethinking Parity Check Enhanced Symmetry-Preserving Ansatz

  • Ge Yan
  • Mengfei Ran
  • Ruocheng Wang
  • Kaiseng Pan
  • Junchi Yan

With the arrival of the Noisy Intermediate-Scale Quantum (NISQ) era, Variational Quantum Algorithms (VQAs) have emerged to obtain possible quantum advantage. In particular, how to effectively incorporate hard constraints in VQAs remains a critical and open question. In this paper, we manage to combine the Hamming Weight Preserving ansatz with a topological-aware parity check on physical qubits to enforce error mitigation and further hard constraints. We demonstrate the combination significantly outperforms peer VQA methods on both quantum chemistry problems and constrained combinatorial optimization problems e. g. Quadratic Assignment Problem. Intensive experimental results on both simulators and superconducting quantum processors are provided to verify that the combination of HWP ansatz with parity check is among the most promising candidates to demonstrate quantum advantages in the NISQ era to solve more realistic problems.

ICLR Conference 2024 Conference Paper

Rethinking the symmetry-preserving circuits for constrained variational quantum algorithms

  • Ge Yan 0001
  • Hongxu Chen
  • Kaisen Pan
  • Junchi Yan

With the arrival of the Noisy Intermediate-Scale Quantum (NISQ) era, Variational Quantum Algorithms (VQAs) have emerged as popular approaches to obtain possible quantum advantage in the relatively near future. In particular, how to effectively incorporate the common symmetries in physical systems as hard constraints in VQAs remains a critical and open question. In this paper, we revisit the Hamming Weight (HW) preserving ansatz and establish the links from ansatz to various symmetries and constraints, which both enlarges the usage of HW preserving ansatz and provides a coherent solution for constrained VQAs. Meanwhile, we utilize the quantum optimal control theory and quantum overparameterization theory to analyze the capability and expressivity of HW preserving ansatz and verify these theoretical results on unitary approximation problem. We conduct detailed numerical experiments on two well-studied symmetry-preserving problems, namely ground state energy estimation and feature selection in machine learning. The superior performance demonstrates the efficiency and supremacy of the proposed HW preserving ansatz on constrained VQAs.

ICML Conference 2024 Conference Paper

SSL4Q: Semi-Supervised Learning of Quantum Data with Application to Quantum State Classification

  • Yehui Tang 0002
  • Nianzu Yang
  • Mabiao Long
  • Junchi Yan

The accurate classification of quantum states is crucial for advancing quantum computing, as it allows for the effective analysis and correct functioning of quantum devices by analyzing the statistics of the data from quantum measurements. Traditional supervised methods, which rely on extensive labeled measurement outcomes, are used to categorize unknown quantum states with different properties. However, the labeling process demands computational and memory resources that increase exponentially with the number of qubits. We propose SSL4Q, manage to achieve (for the first time) semi-supervised learning specifically designed for quantum state classification. SSL4Q’s architecture is tailored to ensure permutation invariance for unordered quantum measurements and maintain robustness in the face of measurement uncertainties. Our empirical studies encompass simulations on two types of quantum systems: the Heisenberg Model and the Variational Quantum Circuit (VQC) Model, with system size reaching up to 50 qubits. The numerical results demonstrate SSL4Q’s superiority over traditional supervised models in scenarios with limited labels, highlighting its potential in efficiently classifying quantum states with reduced computational and resource overhead.

NeurIPS Conference 2024 Conference Paper

Towards General Loop Invariant Generation: A Benchmark of Programs with Memory Manipulation

  • Chang Liu
  • Xiwei Wu
  • Yuan Feng
  • Qinxiang Cao
  • Junchi Yan

Program verification is vital for ensuring software reliability, especially in the context of increasingly complex systems. Loop invariants, remaining true before and after each iteration of loops, are crucial for this verification process. Traditional provers and machine learning based methods for generating loop invariants often require expert intervention or extensive labeled data, and typically only handle numerical property verification. These methods struggle with programs involving complex data structures and memory manipulations, limiting their applicability and automation capabilities. This paper introduces a new benchmark named LIG-MM, specifically for programs with complex data structures and memory manipulations. We collect 312 programs from various sources, including daily programs from college homework, the international competition (SV-COMP), benchmarks from previous papers (SLING), and programs from real-world software systems (Linux Kernel, GlibC, LiteOS, and Zephyr). Based on LIG-MM, our findings indicate that previous methods, including GPT-4, fail to automate verification for these programs. Consequently, we propose a novel LLM-SE framework that coordinates LLM with symbolic execution, fine-tuned using self-supervised learning, to generate loop invariants. Experimental results on LIG-MM demonstrate that our LLM-SE outperforms state-of-the-art methods, offering a new direction toward automated program verification in real-world scenarios.

ICLR Conference 2024 Conference Paper

Towards Imitation Learning to Branch for MIP: A Hybrid Reinforcement Learning based Sample Augmentation Approach

  • Changwen Zhang
  • Wenli Ouyang
  • Hao Yuan 0002
  • Liming Gong
  • Yong Sun
  • Ziao Guo
  • Zhichen Dong
  • Junchi Yan

Branch-and-bound (B\&B) has long been favored for tackling complex Mixed Integer Programming (MIP) problems, where the choice of branching strategy plays a pivotal role. Recently, Imitation Learning (IL)-based policies have emerged as potent alternatives to traditional rule-based approaches. However, it is nontrivial to acquire high-quality training samples, and IL often converges to suboptimal variable choices for branching, restricting the overall performance. In response to these challenges, we propose a novel hybrid online and offline reinforcement learning (RL) approach to enhance the branching policy by cost-effective training sample augmentation. In the online phase, we train an online RL agent to dynamically decide the sample generation processes, drawing from either the learning-based policy or the expert policy. The objective is to strike a balance between exploration and exploitation of the sample generation process. In the offline phase, a value function is trained to fit each decision's cumulative reward and filter the samples with high cumulative returns. This dual-purpose function not only reduces training complexity but also enhances the quality of the samples. To assess the efficacy of our data augmentation mechanism, we conduct comprehensive evaluations across a range of MIP problems. The results consistently show that it excels in making superior branching decisions compared to state-of-the-art learning-based models and the open-source solver SCIP. Notably, it even often outperforms Gurobi.

ICLR Conference 2024 Conference Paper

Towards LLM4QPE: Unsupervised Pretraining of Quantum Property Estimation and A Benchmark

  • Yehui Tang 0002
  • Hao Xiong 0003
  • Nianzu Yang
  • Tailong Xiao
  • Junchi Yan

Estimating the properties of quantum systems such as quantum phase has been critical in addressing the essential quantum many-body problems in physics and chemistry. Deep learning models have been recently introduced to property estimation, surpassing conventional statistical approaches. However, these methods are tailored to the specific task and quantum data at hand. It remains an open and attractive question for devising a more universal task-agnostic pretraining model for quantum property estimation. In this paper, we propose LLM4QPE, a large language model style quantum task-agnostic pretraining and finetuning paradigm that 1) performs unsupervised pretraining on diverse quantum systems with different physical conditions; 2) uses the pretrained model for supervised finetuning and delivers high performance with limited training data, on downstream tasks. It mitigates the cost for quantum data collection and speeds up convergence. Extensive experiments show the promising efficacy of LLM4QPE in various tasks including classifying quantum phases of matter on Rydberg atom model and predicting two-body correlation function on anisotropic Heisenberg model.

NeurIPS Conference 2024 Conference Paper

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

  • Hancheng Ye
  • Jiakang Yuan
  • Renqiu Xia
  • Xiangchao Yan
  • Tao Chen
  • Junchi Yan
  • Botian Shi
  • Bo Zhang

Diffusion models have recently achieved great success in the synthesis of high-quality images and videos. However, the existing denoising techniques in diffusion models are commonly based on step-by-step noise predictions, which suffers from high computation cost, resulting in a prohibitive latency for interactive applications. In this paper, we propose AdaptiveDiffusion to relieve this bottleneck by adaptively reducing the noise prediction steps during the denoising process. Our method considers the potential of skipping as many noise prediction steps as possible while keeping the final denoised results identical to the original full-step ones. Specifically, the skipping strategy is guided by the third-order latent difference that indicates the stability between timesteps during the denoising process, which benefits the reusing of previous noise prediction results. Extensive experiments on image and video diffusion models demonstrate that our method can significantly speed up the denoising process while generating identical results to the original process, achieving up to an average 2-5x speedup without quality degradation. The code is available at https: //github. com/UniModal4Reasoning/AdaptiveDiffusion

NeurIPS Conference 2024 Conference Paper

Unveiling The Matthew Effect Across Channels: Assessing Layer Width Sufficiency via Weight Norm Variance

  • Yiting Chen
  • Jiazi Bu
  • Junchi Yan

The trade-off between cost and performance has been a longstanding and critical issue for deep neural networks. One key factor affecting the computational cost is the width of each layer. However, in practice, the width of layers in a neural network is mostly empirically determined. In this paper, we show that a pattern regarding the variance of weight norm corresponding to different channels can indicate whether the layer is sufficiently wide and may help us better allocate computational resources across the layers. Starting from a simple intuition that channels with larger weights would have larger gradients and the difference in weight norm enlarges between channels with similar weight, we empirically validate that wide and narrow layers show two different patterns with experiments across different data modalities and network architectures. Based on the two different patterns, we identify three stages during training and explain each stage with corresponding evidence. We further propose to adjust the width based on the identified pattern and show that conventional layer width settings for CNNs could be adjusted to reduce the number of parameters while boosting the performance.

ICML Conference 2024 Conference Paper

UP2ME: Univariate Pre-training to Multivariate Fine-tuning as a General-purpose Framework for Multivariate Time Series Analysis

  • Yunhao Zhang
  • Minghao Liu
  • Shengyang Zhou
  • Junchi Yan

Despite the success of self-supervised pre-training in texts and images, applying it to multivariate time series (MTS) falls behind tailored methods for tasks like forecasting, imputation and anomaly detection. We propose a general-purpose framework, named UP2ME ( U nivariate P re-training to M ultivariate Fin e -tuning). It conducts task-agnostic pre-training when downstream tasks are unspecified. Once the task and setting (e. g. forecasting length) are determined, it gives sensible solutions with frozen pre-trained parameters, which has not been achieved before. UP2ME is further refined by fine-tuning. A univariate-to-multivariate paradigm is devised to address the heterogeneity of temporal and cross-channel dependencies. In univariate pre-training, univariate instances with diverse lengths are generated for Masked AutoEncoder (MAE) pre-training, discarding cross-channel dependency. The pre-trained model handles downstream tasks by formulating them into specific mask-reconstruction problems. In multivariate fine-tuning, it constructs a dependency graph among channels using the pre-trained encoder to enhance cross-channel dependency capture. Experiments on eight real-world datasets show its SOTA performance in forecasting and imputation, approaching task-specific performance in anomaly detection. Our code is available at https: //github. com/Thinklab-SJTU/UP2ME.

AAAI Conference 2024 Conference Paper

ViTree: Single-Path Neural Tree for Step-Wise Interpretable Fine-Grained Visual Categorization

  • Danning Lao
  • Qi Liu
  • Jiazi Bu
  • Junchi Yan
  • Wei Shen

As computer vision continues to advance and finds widespread applications across various domains, the need for interpretability in deep learning models becomes paramount. Existing methods often resort to post-hoc techniques or prototypes to explain the decision-making process, which can be indirect and lack intrinsic illustration. In this research, we introduce ViTree, a novel approach for fine-grained visual categorization that combines the popular vision transformer as a feature extraction backbone with neural decision trees. By traversing the tree paths, ViTree effectively selects patches from transformer-processed features to highlight informative local regions, thereby refining representations in a step-wise manner. Unlike previous tree-based models that rely on soft distributions or ensembles of paths, ViTree selects a single tree path, offering a clearer and simpler decision-making process. This patch and path selectivity enhances model interpretability of ViTree, enabling better insights into the model's inner workings. Remarkably, extensive experimentation validates that this streamlined approach surpasses various strong competitors and achieves state-of-the-art performance while maintaining exceptional interpretability which is proved by multi-perspective methods. Code can be found at https://github.com/SJTU-DeepVisionLab/ViTree.

NeurIPS Conference 2024 Conference Paper

What Rotary Position Embedding Can Tell Us: Identifying Query and Key Weights Corresponding to Basic Syntactic or High-level Semantic Information

  • Yiting Chen
  • Junchi Yan

Transformer-based large language models (LLMs) have successfully handled various tasks. As one fundamental module in Transformers, position encoding encodes the positional information of tokens in a sequence. Specifically, rotary position embedding (RoPE), one of the most widely used techniques, encodes the positional information by dividing the query or key value with $d$ elements into $d/2$ pairs and rotating the 2d vectors corresponding to each pair of elements. Therefore, the direction of each pair and the position-related rotation jointly determine the attention score. In this paper, we show that the direction of the 2d pair is largely affected by the angle between the corresponding weight vector pair. We theoretically show that non-orthogonal weight vector pairs lead to great attention on tokens at a certain relative position and are less sensitive to the input which may correspond to basic syntactic information. Meanwhile, the orthogonal weight vector pairs are more flexible regarding the relative position, which may correspond to high-level syntactic information. Empirical evidence supports the hypothesis that shallow layers of LLMs focus more on local syntax and deep layers focus more on high-level semantics. Furthermore, we show that LLMs fine-tuning mainly changes the pairs of weight vectors that are nearly orthogonal, i. e. , the weight corresponding to high-level semantics, which enables the reduction of the number of trainable parameters during fine-tuning without sacrificing performance. We propose a method namely Angle-based Weight Selection (AWS) to reduce the fine-tuning overhead and verify the effectiveness of the proposed method on widely used Alpaca fine-tuned Llama-2.

AAAI Conference 2023 Conference Paper

C-NTPP: Learning Cluster-Aware Neural Temporal Point Process

  • Fangyu Ding
  • Junchi Yan
  • Haiyang Wang

Event sequences in continuous time space are ubiquitous across applications and have been intensively studied with both classic temporal point process (TPP) and its recent deep network variants. This work is motivated by an observation that many of event data exhibit inherent clustering patterns in terms of the sparse correlation among events, while such characteristics are seldom explicitly considered in existing neural TPP models whereby the history encoders are often embodied by RNNs or Transformers. In this work, we propose a c-NTPP (Cluster-Aware Neural Temporal Point Process) model, which leverages a sequential variational autoencoder framework to infer the latent cluster each event belongs to in the sequence. Specially, a novel event-clustered attention mechanism is devised to learn each cluster and then aggregate them together to obtain the final representation for each event. Extensive experiments show that c-NTPP achieves superior performance on both real-world and synthetic datasets, and it can also uncover the underlying clustering correlations.

ICLR Conference 2023 Conference Paper

Contextual Image Masking Modeling via Synergized Contrasting without View Augmentation for Faster and Better Visual Pretraining

  • Shaofeng Zhang
  • Feng Zhu 0006
  • Rui Zhao 0001
  • Junchi Yan

We propose a new contextual masking image modeling (MIM) approach called contrasting-aided contextual MIM (ccMIM), under the MIM paradigm for visual pretraining. Specifically, we adopt importance sampling to select the masked patches with richer semantic information for reconstruction, instead of random sampling as done in previous MIM works. As such, the resulting patch reconstruction task from the remaining less semantic patches could be more difficult and helps to learn. To speed up the possibly slowed convergence due to our more difficult reconstruction task, we further propose a new contrastive loss that aligns the tokens of the vision transformer extracted from the selected masked patches and the remaining ones, respectively. The hope is that it serves as a regularizer for patch feature learning such that the image-level global information could be captured in both masked and unmasked patches, and notably such a single-view contrasting avoids the tedious image augmentation step required in recent efforts of introducing contrastive learning to MIM (to speedup convergence and discriminative ability). Meanwhile, the attention score from the contrastive global feature can also carry effective semantic clues to in turn guide our above masking patch selection scheme. In consequence, our contextual MIM and contrastive learning are synergetically performed in a loop (semantic patch selection-token alignment contrasting) to boost the best of the two worlds: fast convergence and strong performance on downstream tasks without ad-hoc augmentations, which are verified by empirical results on ImageNet-1K for both classification and dense vision tasks.

ICLR Conference 2023 Conference Paper

Crossformer: Transformer Utilizing Cross-Dimension Dependency for Multivariate Time Series Forecasting

  • Yunhao Zhang
  • Junchi Yan

Recently many deep models have been proposed for multivariate time series (MTS) forecasting. In particular, Transformer-based models have shown great potential because they can capture long-term dependency. However, existing Transformer-based models mainly focus on modeling the temporal dependency (cross-time dependency) yet often omit the dependency among different variables (cross-dimension dependency), which is critical for MTS forecasting. To fill the gap, we propose Crossformer, a Transformer-based model utilizing cross-dimension dependency for MTS forecasting. In Crossformer, the input MTS is embedded into a 2D vector array through the Dimension-Segment-Wise (DSW) embedding to preserve time and dimension information. Then the Two-Stage Attention (TSA) layer is proposed to efficiently capture the cross-time and cross-dimension dependency. Utilizing DSW embedding and TSA layer, Crossformer establishes a Hierarchical Encoder-Decoder (HED) to use the information at different scales for the final forecasting. Extensive experimental results on six real-world datasets show the effectiveness of Crossformer against previous state-of-the-arts.

ICLR Conference 2023 Conference Paper

DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion

  • Qitian Wu
  • Chenxiao Yang
  • Wentao Zhao
  • Yixuan He 0001
  • David P. Wipf
  • Junchi Yan

Real-world data generation often involves complex inter-dependencies among instances, violating the IID-data hypothesis of standard learning paradigms and posing a challenge for uncovering the geometric structures for learning desired instance representations. To this end, we introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states that progressively incorporate other instances' information by their interactions. The diffusion process is constrained by descent criteria w.r.t. a principled energy function that characterizes the global consistency of instance representations over latent structures. We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs, which gives rise to a new class of neural encoders, dubbed as DIFFormer (diffusion-based Transformers), with two instantiations: a simple version with linear complexity for prohibitive instance numbers, and an advanced version for learning complex structures. Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks, such as node classification on large graphs, semi-supervised image/text classification, and spatial-temporal dynamics prediction. The codes are available at https://github.com/qitianwu/DIFFormer.

ICLR Conference 2023 Conference Paper

Energy-based Out-of-Distribution Detection for Graph Neural Networks

  • Qitian Wu
  • Yiting Chen 0003
  • Chenxiao Yang
  • Junchi Yan

Representation learning on semi-structured data, e.g., graphs, has become a central problem in deep learning community as relational structures are pervasive in real situations and induce data inter-dependence that hinders trivial adaptation of existing approaches in other domains where the inputs are assumed to be i.i.d. sampled. However, current models in this regime mostly focus on improving testing performance of in-distribution data and largely ignores the potential risk w.r.t. out-of-distribution (OOD) testing samples that may cause negative outcome if the model is overconfident in prediction on them. In this paper, we identify a provably effective OOD discriminator based on an energy function directly extracted from a graph neural network trained with standard supervised classification loss. This paves a way for a simple and efficient OOD detection model for GNN-based semi-supervised learning on graphs, which we call GNN-Safe. It also has nice theoretical properties that guarantee an overall distinguishable margin between the detection scores for in-distribution and OOD samples, which, more critically, can be further strengthened by a non-learning-based structured propagation scheme. Extensive experiments over five real-world datasets validate the practical efficacy of the proposed model for detecting various OOD instances that are inter-connected in a graph with up to 17.0% improvement on average AUROC over competitive peer models and without sacrificing in-distribution testing accuracy.

JMLR Journal 2023 Journal Article

F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning

  • Wenhao Li
  • Bo Jin
  • Xiangfeng Wang
  • Junchi Yan
  • Hongyuan Zha

Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes unpractical in complicated applications due to non-interactivity between agents, the curse of dimensionality, and computation complexity. Hence, several decentralized MARL algorithms are motivated. However, existing decentralized methods only handle the fully cooperative setting where massive information needs to be transmitted in training. The block coordinate gradient descent scheme they used for successive independent actor and critic steps can simplify the calculation, but it causes serious bias. This paper proposes a flexible fully decentralized actor-critic MARL framework, which can combine most of the actor-critic methods and handle large-scale general cooperative multi-agent settings. A primal-dual hybrid gradient descent type algorithm framework is designed to learn individual agents separately for decentralization. From the perspective of each agent, policy improvement and value evaluation are jointly optimized, which can stabilize multi-agent policy learning. Furthermore, the proposed framework can achieve scalability and stability for the large-scale environment. This framework also reduces information transmission by the parameter sharing mechanism and novel modeling-other-agents methods based on theory-of-mind and online supervised learning. Sufficient experiments in cooperative Multi-agent Particle Environment and StarCraft II show that the proposed decentralized MARL instantiation algorithms perform competitively against conventional centralized and decentralized methods. [abs] [ pdf ][ bib ] &copy JMLR 2023. ( edit, beta )

NeurIPS Conference 2023 Conference Paper

Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity

  • Zhanpeng Zhou
  • Yongyi Yang
  • Xiaojiang Yang
  • Junchi Yan
  • Wei Hu

Recent work has revealed many intriguing empirical phenomena in neural network training, despite the poorly understood and highly complex loss landscapes and training dynamics. One of these phenomena, Linear Mode Connectivity (LMC), has gained considerable attention due to the intriguing observation that different solutions can be connected by a linear path in the parameter space while maintaining near-constant training and test losses. In this work, we introduce a stronger notion of linear connectivity, Layerwise Linear Feature Connectivity (LLFC), which says that the feature maps of every layer in different trained networks are also linearly connected. We provide comprehensive empirical evidence for LLFC across a wide range of settings, demonstrating that whenever two trained networks satisfy LMC (via either spawning or permutation methods), they also satisfy LLFC in nearly all the layers. Furthermore, we delve deeper into the underlying factors contributing to LLFC, which reveal new insights into the permutation approaches. The study of LLFC transcends and advances our understanding of LMC by adopting a feature-learning perspective.

ICLR Conference 2023 Conference Paper

Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs

  • Chenxiao Yang
  • Qitian Wu
  • Jiahua Wang
  • Junchi Yan

Graph neural networks (GNNs), as the de-facto model class for representation learning on graphs, are built upon the multi-layer perceptrons (MLP) architecture with additional message passing layers to allow features to flow across nodes. While conventional wisdom commonly attributes the success of GNNs to their advanced expressivity, we conjecture that this is not the main cause of GNNs' superiority in node-level prediction tasks. This paper pinpoints the major source of GNNs' performance gain to their intrinsic generalization capability, by introducing an intermediate model class dubbed as P(ropagational)MLP, which is identical to standard MLP in training, but then adopts GNN's architecture in testing. Intriguingly, we observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient in training. This finding provides a new perspective for understanding the learning behavior of GNNs, and can be used as an analytic tool for dissecting various GNN-related research problems including expressivity, generalization, over-smoothing and heterophily. As an initial step to analyze PMLP, we show its essential difference to MLP at infinite-width limit lies in the NTK feature map in the post-training stage. Moreover, through extrapolation analysis (i.e., generalization under distribution shifts), we find that though most GNNs and their PMLP counterparts cannot extrapolate non-linear functions for extreme out-of-distribution data, they have greater potential to generalize to testing data near the training data support as natural advantages of the GNN architecture used for inference.

ICLR Conference 2023 Conference Paper

Graph Signal Sampling for Inductive One-Bit Matrix Completion: a Closed-form Solution

  • Chao Chen 0016
  • Haoyu Geng
  • Gang Zeng
  • Zhaobing Han
  • Hua Chai
  • Xiaokang Yang 0001
  • Junchi Yan

Inductive one-bit matrix completion is motivated by modern applications such as recommender systems, where new users would appear at test stage with the ratings consisting of only ones and no zeros. We propose a unified graph signal sampling framework which enjoys the benefits of graph signal analysis and processing. The key idea is to transform each user's ratings on the items to a function (signal) on the vertices of an item-item graph, then learn structural graph properties to recover the function from its values on certain vertices --- the problem of graph signal sampling. We propose a class of regularization functionals that takes into account discrete random label noise in the graph vertex domain, then develop the GS-IMC approach which biases the reconstruction towards functions that vary little between adjacent vertices for noise reduction. Theoretical result shows that accurate reconstructions can be achieved under mild conditions. For the online setting, we develop a Bayesian extension, i.e., BGS-IMC which considers continuous random Gaussian noise in the graph Fourier domain and builds upon a prediction-correction update algorithm to obtain the unbiased and minimum-variance reconstruction. Both GS-IMC and BGS-IMC have closed-form solutions and thus are highly scalable in large data. Experiments show that our methods achieve state-of-the-art performance on public benchmarks.

NeurIPS Conference 2023 Conference Paper

H2RBox-v2: Incorporating Symmetry for Boosting Horizontal Box Supervised Oriented Object Detection

  • Yi Yu
  • Xue Yang
  • Qingyun Li
  • Yue Zhou
  • Feipeng Da
  • Junchi Yan

With the rapidly increasing demand for oriented object detection, e. g. in autonomous driving and remote sensing, the recently proposed paradigm involving weakly-supervised detector H2RBox for learning rotated box (RBox) from the more readily-available horizontal box (HBox) has shown promise. This paper presents H2RBox-v2, to further bridge the gap between HBox-supervised and RBox-supervised oriented object detection. Specifically, we propose to leverage the reflection symmetry via flip and rotate consistencies, using a weakly-supervised network branch similar to H2RBox, together with a novel self-supervised branch that learns orientations from the symmetry inherent in visual objects. The detector is further stabilized and enhanced by practical techniques to cope with peripheral issues e. g. angular periodicity. To our best knowledge, H2RBox-v2 is the first symmetry-aware self-supervised paradigm for oriented object detection. In particular, our method shows less susceptibility to low-quality annotation and insufficient training data compared to H2RBox. Specifically, H2RBox-v2 achieves very close performance to a rotation annotation trained counterpart -- Rotated FCOS: 1) DOTA-v1. 0/1. 5/2. 0: 72. 31%/64. 76%/50. 33% vs. 72. 44%/64. 53%/51. 77%; 2) HRSC: 89. 66% vs. 88. 99%; 3) FAIR1M: 42. 27% vs. 41. 25%.

ICLR Conference 2023 Conference Paper

H2RBox: Horizontal Box Annotation is All You Need for Oriented Object Detection

  • Xue Yang 0005
  • Gefan Zhang
  • Wentong Li
  • Yue Zhou 0005
  • Xuehui Wang
  • Junchi Yan

Oriented object detection emerges in many applications from aerial images to autonomous driving, while many existing detection benchmarks are annotated with horizontal bounding box only which is also less costive than fine-grained rotated box, leading to a gap between the readily available training corpus and the rising demand for oriented object detection. This paper proposes a simple yet effective oriented object detection approach called H2RBox merely using horizontal box annotation for weakly-supervised training, which closes the above gap and shows competitive performance even against those trained with rotated boxes. The cores of our method are weakly- and self-supervised learning, which predicts the angle of the object by learning the consistency of two different views. To our best knowledge, H2RBox is the first horizontal box annotation-based oriented object detector. Compared to an alternative i.e. horizontal box-supervised instance segmentation with our post adaption to oriented object detection, our approach is not susceptible to the prediction quality of mask and can perform more robustly in complex scenes containing a large number of dense objects and outliers. Experimental results show that H2RBox has significant performance and speed advantages over horizontal box-supervised instance segmentation methods, as well as lower memory requirements. While compared to rotated box-supervised oriented object detectors, our method shows very close performance and speed. The source code is available at PyTorch-based \href{https://github.com/yangxue0827/h2rbox-mmrotate}{MMRotate} and Jittor-based \href{https://github.com/yangxue0827/h2rbox-jittor}{JDet}.

NeurIPS Conference 2023 Conference Paper

HubRouter: Learning Global Routing via Hub Generation and Pin-hub Connection

  • Xingbo Du
  • Chonghua Wang
  • Ruizhe Zhong
  • Junchi Yan

Global Routing (GR) is a core yet time-consuming task in VLSI systems. It recently attracted efforts from the machine learning community, especially generative models, but they suffer from the non-connectivity of generated routes. We argue that the inherent non-connectivity can harm the advantage of its one-shot generation and has to be post-processed by traditional approaches. Thus, we propose a novel definition, called hub, which represents the key point in the route. Equipped with hubs, global routing is transferred from a pin-pin connection problem to a hub-pin connection problem. Specifically, to generate definitely-connected routes, this paper proposes a two-phase learning scheme named HubRouter, which includes 1) hub-generation phase: A condition-guided hub generator using deep generative models; 2) pin-hub-connection phase: An RSMT construction module that connects the hubs and pins using an actor-critic model. In the first phase, we incorporate typical generative models into a multi-task learning framework to perform hub generation and address the impact of sensitive noise points with stripe mask learning. During the second phase, HubRouter employs an actor-critic model to finish the routing, which is efficient and has very slight errors. Experiments on simulated and real-world global routing benchmarks are performed to show our approach's efficiency, particularly HubRouter outperforms the state-of-the-art generative global routing methods in wirelength, overflow, and running time. Moreover, HubRouter also shows strength in other applications, such as RSMT construction and interactive path replanning.

IJCAI Conference 2023 Conference Paper

IID-GAN: an IID Sampling Perspective for Regularizing Mode Collapse

  • Yang Li
  • Liangliang Shi
  • Junchi Yan

Despite its success, generative adversarial networks (GANs) still suffer from mode collapse, i. e. , the generator can only map latent variables to a partial set of modes in the target distribution. In this paper, we analyze and seek to regularize this issue with an independent and identically distributed (IID) sampling perspective and emphasize that holding the IID property referring to the target distribution for generation can naturally avoid mode collapse. This is based on the basic IID assumption for real data in machine learning. However, though the source samples {z} obey IID, the generations {G(z)} may not necessarily be IID sampling from the target distribution. Based on this observation, considering a necessary condition of IID generation, we propose a new loss to encourage the closeness between the inverse samples of real data and the Gaussian source in the latent space to regularize the generation to be IID from the target distribution. The logic is that the inverse samples from target data should also be IID in the source distribution. Experiments on both synthetic and real-world data show the effectiveness of our model.

IJCAI Conference 2023 Conference Paper

Learning Calibrated Uncertainties for Domain Shift: A Distributionally Robust Learning Approach

  • Haoxuan Wang
  • Zhiding Yu
  • Yisong Yue
  • Animashree Anandkumar
  • Anqi Liu
  • Junchi Yan

We propose a framework for learning calibrated uncertainties under domain shifts, considering the case where the source (training) distribution differs from the target (test) distribution. We detect such domain shifts through the use of a differentiable density ratio estimator and train it together with the task network, composing an adjusted softmax predictive form that concerns the domain shift. In particular, the density ratio estimator yields a density ratio that reflects the closeness of a target (test) sample to the source (training) distribution. We employ it to adjust the uncertainty of prediction in the task network. This idea of using the density ratio is based on the distributionally robust learning (DRL) framework, which accounts for the domain shift through adversarial risk minimization. We demonstrate that our proposed method generates calibrated uncertainties that benefit many downstream tasks, such as unsupervised domain adaptation (UDA) and semi-supervised learning (SSL). On these tasks, methods like self-training and FixMatch use uncertainties to select confident pseudo-labels for re-training. Our experiments show that the introduction of DRL leads to significant improvements in cross-domain performance. We also demonstrate that the estimated density ratios show an agreement with the human selection frequencies, suggesting a positive correlation with a proxy of human perceived uncertainties.

AAMAS Conference 2023 Conference Paper

Learning Structured Communication for Multi-Agent Reinforcement Learning

  • Junjie Sheng
  • Xiangfeng Wang
  • Bo Jin
  • Wenhao Li
  • Jun Wang
  • Junchi Yan
  • Tsung-Hui Chang
  • Hongyuan Zha

This paper investigates multi-agent reinforcement learning (MARL) communication mechanisms in large-scale scenarios. We propose a novel framework, Learning Structured Communication (LSC), that leverages a flexible and efficient communication topology. LSC enables adaptive agent grouping to create diverse hierarchical formations over episodes generated through an auxiliary task and a hierarchical routing protocol. We learn a hierarchical graph neural network with the formed topology that facilitates effective message generation and propagation between inter- and intra-group communications. Unlike state-of-the-art communication mechanisms, LSC possesses a detailed and learnable design for hierarchical communication. Numerical experiments on challenging tasks demonstrate that the proposed LSC exhibits high communication efficiency and global cooperation capability.

ICML Conference 2023 Conference Paper

LinSATNet: The Positive Linear Satisfiability Neural Networks

  • Runzhong Wang
  • Yunhao Zhang
  • Ziao Guo
  • Tianyi Chen
  • Xiaokang Yang 0001
  • Junchi Yan

Encoding constraints into neural networks is attractive. This paper studies how to introduce the popular positive linear satisfiability to neural networks. We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions. We further theoretically characterize the convergence property of the Sinkhorn algorithm for multiple marginals, and the underlying formulation is also derived. In contrast to the sequential decision e. g. reinforcement learning-based solvers, we showcase our technique in solving constrained (specifically satisfiability) problems by one-shot neural networks, including i) a neural routing solver learned without supervision of optimal solutions; ii) a partial graph matching network handling graphs with unmatchable outliers on both sides; iii) a predictive network for financial portfolios with continuous constraints. To our knowledge, there exists no one-shot neural solver for these scenarios when they are formulated as satisfiability problems. Source code is available at https: //github. com/Thinklab-SJTU/LinSATNet.

NeurIPS Conference 2023 Conference Paper

OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

  • Huijie Wang
  • Tianyu Li
  • Yang Li
  • Li Chen
  • Chonghao Sima
  • Zhenbo Liu
  • Bangjun Wang
  • Peijin Jia

Accurately depicting the complex traffic scene is a vital component for autonomous vehicles to execute correct judgments. However, existing benchmarks tend to oversimplify the scene by solely focusing on lane perception tasks. Observing that human drivers rely on both lanes and traffic signals to operate their vehicles safely, we present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure. The objective of the presented dataset is to advance research in understanding the structure of road scenes by examining the relationship between perceived entities, such as traffic elements and lanes. Leveraging existing datasets, OpenLane-V2 consists of 2, 000 annotated road scenes that describe traffic elements and their correlation to the lanes. It comprises three primary sub-tasks, including the 3D lane detection inherited from OpenLane, accompanied by corresponding metrics to evaluate the model’s performance. We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.

ICLR Conference 2023 Conference Paper

Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning

  • Shaofeng Zhang
  • Feng Zhu 0006
  • Rui Zhao 0001
  • Junchi Yan

We propose ADCLR: \underline{A}ccurate and \underline{D}ense \underline{C}ontrastive \underline{R}epresentation \underline{L}earning, a novel self-supervised learning framework for learning accurate and dense vision representation. To extract spatial-sensitive information, ADCLR introduces query patches for contrasting in addition with global contrasting. Compared with previous dense contrasting methods, ADCLR mainly enjoys three merits: i) achieving both global-discriminative and spatial-sensitive representation, ii) model-efficient (no extra parameters in addition to the global contrasting baseline), and iii) correspondence-free and thus simpler to implement. Our approach achieves new state-of-the-art performance for contrastive methods. On classification tasks, for ViT-S, ADCLR achieves 78.1\% top-1 accuracy on ImageNet with linear probing, outperforming our baseline (DINO) without our devised techniques as plug-in, by 1.1\%. For ViT-B, ADCLR achieves 79.8\%, 84.0\% accuracy on ImageNet by linear probing and finetune, outperforming DINO by 0.6\%, 0.4\% accuracy. For dense tasks, on MS-COCO, ADCLR achieves significant improvements of 44.3\% AP on object detection, 39.7\% AP on instance segmentation, outperforming previous SOTA method SelfPatch by 2.2\% and 1.2\%, respectively. On ADE20K, ADCLR outperforms SelfPatch by 1.0\% mIoU, 1.2\% mAcc on the segmentation task.

ICML Conference 2023 Conference Paper

Patch-level Contrastive Learning via Positional Query for Visual Pre-training

  • Shaofeng Zhang
  • Qiang Zhou 0001
  • Zhibin Wang 0004
  • Fan Wang 0019
  • Junchi Yan

Dense contrastive learning (DCL) has been recently explored for learning localized information for dense prediction tasks (e. g. , detection and segmentation). It still suffers the difficulty of mining pixels/patches correspondence between two views. A simple way is inputting the same view twice and aligning the pixel/patch representation. However, it would reduce the variance of inputs, and hurts the performance. We propose a plug-in method PQCL (Positional Query for patch-level Contrastive Learning), which allows performing patch-level contrasts between two views with exact patch correspondence. Besides, by using positional queries, PQCL increases the variance of inputs, to enhance training. We apply PQCL to popular transformer-based CL frameworks (DINO and iBOT, and evaluate them on classification, detection and segmentation tasks, where our method obtains stable improvements, especially for dense tasks. It achieves new state-of-the-art in most settings. Code is available at https: //github. com/Sherrylone/Query_Contrastive.

ICLR Conference 2023 Conference Paper

Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling

  • Penghao Wu
  • Li Chen 0008
  • Hongyang Li 0001
  • Xiaosong Jia
  • Junchi Yan
  • Yu Qiao 0001

Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks the view and translation invariance, and the visual input contains massive irrelevant information for decision making, resulting in predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pre-training in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. As a side product, the pre-trained geometric modeling networks could bring further improvement to the depth and odometry estimation tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data.

ICML Conference 2023 Conference Paper

QAS-Bench: Rethinking Quantum Architecture Search and A Benchmark

  • Xudong Lu
  • Kaisen Pan
  • Ge Yan 0001
  • Jiaming Shan
  • Wenjie Wu
  • Junchi Yan

Automatic quantum architecture search (QAS) has been widely studied across disciplines with different implications. In this paper, beyond a particular domain, we formulate the QAS problem into two basic (and relatively even ideal) tasks: i) arbitrary quantum circuit (QC) regeneration given a target QC; ii) approximating an arbitrary unitary (oracle). The latter can be connected to the setting of various quantum machine learning tasks and other QAS applications. Based on these two tasks, we generate a public QAS benchmark including 900 random QCs and 400 random unitary matrices which is still missing in the literature. We evaluate six baseline algorithms including brute force search, simulated annealing, genetic algorithm, reinforcement learning, hybrid algorithm, and differentiable algorithm as part of our benchmark. One characteristic of our proposed evaluation protocol on the basic tasks is that it deprives the domain-specific designs and techniques as used in existing QAS literature, making a unified evaluation possible and focusing on the vanilla search methods themselves without coupling with domain prior. In fact, the unitary approximation task could be algorithmically more difficult than the specific problems as it needs to explore the whole matrix space to fit the unitary. While specific tasks often only need to fit a partial observation of the unitary as the objective for search. Data and code are available at https: //github. com/Lucky-Lance/QAS-Bench.

ICML Conference 2023 Conference Paper

Quantum 3D Graph Learning with Applications to Molecule Embedding

  • Ge Yan 0001
  • Huaijin Wu
  • Junchi Yan

Learning 3D graph with spatial position as well as node attributes has been recently actively studied, for its utility in different applications e. g. 3D molecules. Quantum computing is known a promising direction for its potential theoretical supremacy for large-scale graph and combinatorial problem as well as the increasing evidence for the availability to physical quantum devices in the near term. In this paper, for the first time to our best knowledge, we propose a quantum 3D embedding ansatz that learns the latent representation of 3D structures from the Hilbert space composed of the Bloch sphere of each qubit. Specifically, the 3D Cartesian coordinates of nodes are converted into rotation and torsion angles and then encode them into the form of qubits. Moreover, Parameterized Quantum Circuit (PQC) is applied to serve as the trainable layers and the output of the PQC is adopted as the final node embedding. Experimental results on two downstream tasks, molecular property prediction and 3D molecular geometries generation, demonstrate the effectiveness of our model. We show the capacity and capability of our model with the evaluation on the QM9 dataset (134k molecules) with very few parameters, and its potential to be executed on a real quantum device.

ICML Conference 2023 Conference Paper

QuantumDARTS: Differentiable Quantum Architecture Search for Variational Quantum Algorithms

  • Wenjie Wu
  • Ge Yan 0001
  • Xudong Lu
  • Kaisen Pan
  • Junchi Yan

With the arrival of the Noisy Intermediate-Scale Quantum (NISQ) era and the fast development of machine learning, variational quantum algorithms (VQA) including Variational Quantum Eigensolver (VQE) and quantum neural network (QNN) have received increasing attention with wide potential applications in foreseeable near future. We study the problem of quantum architecture search (QAS) for VQA to automatically design parameterized quantum circuits (PQC). We devise a differentiable searching algorithm based on Gumbel-Softmax in contrast to peer methods that often require numerous circuit sampling and evaluation. Two versions of our algorithm are provided, namely macro search and micro search, where macro search directly searches for the whole circuit like other literature while the innovative micro search is able to infer the sub-circuit structure from a small-scale and then transfer that to a large-scale problem. We conduct intensive experiments on unweighted Max-Cut, ground state energy estimation, and image classification. The superior performance shows the efficiency and capability of macro search, which requires little prior knowledge. Moreover, the experiments on micro search show the potential of our algorithm for large-scale QAS problems.

NeurIPS Conference 2023 Conference Paper

Relative Entropic Optimal Transport: a (Prior-aware) Matching Perspective to (Unbalanced) Classification

  • Liangliang Shi
  • Haoyu Zhen
  • Gu Zhang
  • Junchi Yan

Classification is a fundamental problem in machine learning, and considerable efforts have been recently devoted to the demanding long-tailed setting due to its prevalence in nature. Departure from the Bayesian framework, this paper rethinks classification from a matching perspective by studying the matching probability between samples and labels with optimal transport (OT) formulation. Specifically, we first propose a new variant of optimal transport, called Relative Entropic Optimal Transport (RE-OT), which guides the coupling solution to a known prior information matrix. We gives some theoretical results and their proof for RE-OT and surprisingly find RE-OT can help to deblur for barycenter images. Then we adopt inverse RE-OT for training long-tailed data and find that the loss derived from RE-OT has a similar form to Softmax-based cross-entropy loss, indicating a close connection between optimal transport and classification and the potential for transferring concepts between these two academic fields, such as barycentric projection in OT, which can map the labels back to the feature space. We further derive an epoch-varying RE-OT loss, and do the experiments on unbalanced image classification, molecule classification, instance segmentation and representation learning. Experimental results show its effectiveness.

ICLR Conference 2023 Conference Paper

Revocable Deep Reinforcement Learning with Affinity Regularization for Outlier-Robust Graph Matching

  • Chang Liu 0021
  • Zetian Jiang
  • Runzhong Wang
  • Lingxiao Huang
  • Pinyan Lu
  • Junchi Yan

Graph matching (GM) has been a building block in various areas including computer vision and pattern recognition. Despite recent impressive progress, existing deep GM methods often have obvious difficulty in handling outliers, which are ubiquitous in practice. We propose a deep reinforcement learning based approach RGM, whose sequential node matching scheme naturally fits the strategy for selective inlier matching against outliers. A revocable action framework is devised to improve the agent's flexibility against the complex constrained GM. Moreover, we propose a quadratic approximation technique to regularize the affinity score, in the presence of outliers. As such, the agent can finish inlier matching timely when the affinity score stops growing, for which otherwise an additional parameter i.e. the number of inliers is needed to avoid matching outliers. In this paper, we focus on learning the back-end solver under the most general form of GM: the Lawler's QAP, whose input is the affinity matrix. Especially, our approach can also boost existing GM methods that use such input. Experiments on multiple real-world datasets demonstrate its performance regarding both accuracy and robustness.

ICLR Conference 2023 Conference Paper

ROCO: A General Framework for Evaluating Robustness of Combinatorial Optimization Solvers on Graphs

  • Han Lu
  • Zenan Li
  • Runzhong Wang
  • Qibing Ren
  • Xijun Li
  • Mingxuan Yuan
  • Jia Zeng
  • Xiaokang Yang 0001

Solving combinatorial optimization (CO) on graphs has been attracting increasing interests from the machine learning community whereby data-driven approaches were recently devised to go beyond traditional manually-designated algorithms. In this paper, we study the robustness of a combinatorial solver as a blackbox regardless it is classic or learning-based though the latter can often be more interesting to the ML community. Specifically, we develop a practically feasible robustness metric for general CO solvers. A no-worse optimal cost guarantee is developed as such the optimal solutions are not required to achieve for solvers, and we tackle the non-differentiable challenge in input instance disturbance by resorting to black-box adversarial attack methods. Extensive experiments are conducted on 14 unique combinations of solvers and CO problems, and we demonstrate that the performance of state-of-the-art solvers like Gurobi can degenerate by over 20% under the given time limit bound on the hard instances discovered by our robustness metric, raising concerns about the robustness of combinatorial optimization solvers.

NeurIPS Conference 2023 Conference Paper

SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations

  • Qitian Wu
  • Wentao Zhao
  • Chenxiao Yang
  • Hengrui Zhang
  • Fan Nie
  • Haitian Jiang
  • Yatao Bian
  • Junchi Yan

Learning representations on large-sized graphs is a long-standing challenge due to the inter-dependence nature involved in massive data points. Transformers, as an emerging class of foundation encoders for graph-structured data, have shown promising performance on small graphs due to its global attention capable of capturing all-pair influence beyond neighboring nodes. Even so, existing approaches tend to inherit the spirit of Transformers in language and vision tasks, and embrace complicated models by stacking deep multi-head attentions. In this paper, we critically demonstrate that even using a one-layer attention can bring up surprisingly competitive performance across node property prediction benchmarks where node numbers range from thousand-level to billion-level. This encourages us to rethink the design philosophy for Transformers on large graphs, where the global attention is a computation overhead hindering the scalability. We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model that can efficiently propagate information among arbitrary nodes in one layer. SGFormer requires none of positional encodings, feature/graph pre-processing or augmented loss. Empirically, SGFormer successfully scales to the web-scale graph ogbn-papers100M and yields up to 141x inference acceleration over SOTA Transformers on medium-sized graphs. Beyond current results, we believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.

NeurIPS Conference 2023 Conference Paper

T2T: From Distribution Learning in Training to Gradient Search in Testing for Combinatorial Optimization

  • Yang Li
  • Jinpei Guo
  • Runzhong Wang
  • Junchi Yan

Extensive experiments have gradually revealed the potential performance bottleneck of modeling Combinatorial Optimization (CO) solving as neural solution prediction tasks. The neural networks, in their pursuit of minimizing the average objective score across the distribution of historical problem instances, diverge from the core target of CO of seeking optimal solutions for every test instance. This calls for an effective search on each problem instance, while the model should serve to provide supporting knowledge that benefits the search. To this end, we propose T2T (Training to Testing) framework that first leverages the generative modeling to estimate the high-quality solution distribution for each instance during training, and then conducts a gradient-based search within the solution space during testing. The proposed neural search paradigm consistently leverages generative modeling, specifically diffusion, for graduated solution improvement. It disrupts the local structure of the given solution by introducing noise and reconstructs a lower-cost solution guided by the optimization objective. Experimental results on Traveling Salesman Problem (TSP) and Maximal Independent Set (MIS) show the significant superiority of T2T, demonstrating an average performance gain of 49. 15% for TSP solving and 17. 27% for MIS solving compared to the previous state-of-the-art.

ICLR Conference 2023 Conference Paper

The KFIoU Loss for Rotated Object Detection

  • Xue Yang 0005
  • Yue Zhou 0005
  • Gefan Zhang
  • Jirui Yang
  • Wentao Wang 0009
  • Junchi Yan
  • Xiaopeng Zhang 0008
  • Qi Tian 0001

Differing from the well-developed horizontal object detection area whereby the computing-friendly IoU based loss is readily adopted and well fits with the detection metrics, rotation detectors often involve a more complicated loss based on SkewIoU which is unfriendly to gradient-based training. In this paper, we propose an effective approximate SkewIoU loss based on Gaussian modeling and Gaussian product, which mainly consists of two items. The first term is a scale-insensitive center point loss, which is used to quickly narrow the distance between the center points of the two bounding boxes. In the distance-independent second term, the product of the Gaussian distributions is adopted to inherently mimic the mechanism of SkewIoU by its definition, and show its alignment with the SkewIoU loss at trend-level within a certain distance (i.e. within 9 pixels). This is in contrast to recent Gaussian modeling based rotation detectors e.g. GWD loss and KLD loss that involve a human-specified distribution distance metric which require additional hyperparameter tuning that vary across datasets and detectors. The resulting new loss called KFIoU loss is easier to implement and works better compared with exact SkewIoU loss, thanks to its full differentiability and ability to handle the non-overlapping cases. We further extend our technique to the 3-D case which also suffers from the same issues as 2-D. Extensive results on various datasets with different base detectors show the effectiveness of our approach.

ICLR Conference 2023 Conference Paper

Towards One-shot Neural Combinatorial Solvers: Theoretical and Empirical Notes on the Cardinality-Constrained Case

  • Runzhong Wang
  • Li Shen 0008
  • Yiting Chen 0003
  • Xiaokang Yang 0001
  • Dacheng Tao
  • Junchi Yan

One-shot non-autoregressive neural networks, different from RL-based ones, have been actively adopted for solving combinatorial optimization (CO) problems, which can be trained by the objective score in a self-supervised manner. Such methods have shown their superiority in efficiency (e.g. by parallelization) and potential for tackling predictive CO problems for decision-making under uncertainty. While the discrete constraints often become a bottleneck for gradient-based neural solvers, as currently handled in three typical ways: 1) adding a soft penalty in the objective, where a bounded violation of the constraints cannot be guaranteed, being critical to many constraint-sensitive scenarios; 2) perturbing the input to generate an approximate gradient in a black-box manner, though the constraints are exactly obeyed while the approximate gradients can hurt the performance on the objective score; 3) a compromise by developing soft algorithms whereby the output of neural networks obeys a relaxed constraint, and there can still occur an arbitrary degree of constraint-violation. Towards the ultimate goal of establishing a general framework for neural CO solver with the ability to control an arbitrary-small degree of constraint violation, in this paper, we focus on a more achievable and common setting: the cardinality constraints, which in fact can be readily encoded by a differentiable optimal transport (OT) layer. Based on this observation, we propose OT-based cardinality constraint encoding for end-to-end CO problem learning with two variants: Sinkhorn and Gumbel-Sinkhorn, whereby their violation of the constraints can be exactly characterized and bounded by our theoretical results. On synthetic and real-world CO problem instances, our methods surpass the state-of-the-art CO network and are comparable to (if not superior to) the commercial solver Gurobi. In particular, we further showcase a case study of applying our approach to the predictive portfolio optimization task on real-world asset price data, improving the Sharpe ratio from 1.1 to 2.0 of a strong LSTM+Gurobi baseline under the classic predict-then-optimize paradigm.

ICML Conference 2023 Conference Paper

Towards Quantum Machine Learning for Constrained Combinatorial Optimization: a Quantum QAP Solver

  • Xinyu Ye
  • Ge Yan 0001
  • Junchi Yan

Combinatorial optimization (CO) on the graph is a crucial but challenging research topic. Recent quantum algorithms provide a new perspective for solving CO problems and have the potential to demonstrate quantum advantage. Quantum Approximate Optimization Algorithm (QAOA) is a well-known quantum heuristic for CO constructed by a parametric quantum circuit. However, QAOA is originally designed for unconstrained problems and the circuit parameters and solutions are jointly solved with time-consuming iterations. In this paper, we propose a novel quantum neural network (QNN) for learning CO problems in a supervised manner to achieve better and faster results. We focus on the Quadratic Assignment Problem (QAP) with matching constraints and the node permutation invariance property. To this end, a quantum neural network called QAP-QNN is devised to translate the QAP into a constrained vertex classification task. Moreover, we study two QAP tasks: Graph Matching and Traveling Salesman Problem on TorchQauntum simulators, and empirically show the effectiveness of our approach.

IJCAI Conference 2023 Conference Paper

Transformers in Time Series: A Survey

  • Qingsong Wen
  • Tian Zhou
  • Chaoli Zhang
  • Weiqi Chen
  • Ziqing Ma
  • Junchi Yan
  • Liang Sun

Transformers have achieved superior performances in many tasks in natural language processing and computer vision, which also triggered great interest in the time series community. Among multiple advantages of Transformers, the ability to capture long-range dependencies and interactions is especially attractive for time series modeling, leading to exciting progress in various time series applications. In this paper, we systematically review Transformer schemes for time series modeling by highlighting their strengths as well as limitations. In particular, we examine the development of time series Transformers in two perspectives. From the perspective of network structure, we summarize the adaptations and modifications that have been made to Transformers in order to accommodate the challenges in time series analysis. From the perspective of applications, we categorize time series Transformers based on common tasks including forecasting, anomaly detection, and classification. Empirically, we perform robust analysis, model size analysis, and seasonal-trend decomposition analysis to study how Transformers perform in time series. Finally, we discuss and suggest future directions to provide useful research guidance.

ICML Conference 2023 Conference Paper

Understanding and Generalizing Contrastive Learning from the Inverse Optimal Transport Perspective

  • Liangliang Shi
  • Gu Zhang
  • Haoyu Zhen
  • Jintao Fan
  • Junchi Yan

Previous research on contrastive learning (CL) has primarily focused on pairwise views to learn representations by attracting positive samples and repelling negative ones. In this work, we aim to understand and generalize CL from a point set matching perspective, instead of the comparison between two points. Specifically, we formulate CL as a form of inverse optimal transport (IOT), which involves a bilevel optimization procedure for learning where the outter minimization aims to learn the representations and the inner is to learn the coupling (i. e. the probability of matching matrix) between the point sets. Specifically, by adjusting the relaxation degree of constraints in the inner minimization, we obtain three contrastive losses and show that the dominant contrastive loss in literature InfoNCE falls into one of these losses. This reveals a new and more general algorithmic framework for CL. Additionally, the soft matching scheme in IOT induces a uniformity penalty to enhance representation learning which is akin to the CL’s uniformity. Results on vision benchmarks show the effectiveness of our derived loss family and the new uniformity term.

ICML Conference 2022 Conference Paper

Deep Neural Network Fusion via Graph Matching with Applications to Model Ensemble and Federated Learning

  • Chang Liu 0021
  • Chenfei Lou
  • Runzhong Wang
  • Alan Yuhan Xi
  • Li Shen 0008
  • Junchi Yan

Model fusion without accessing training data in machine learning has attracted increasing interest due to the practical resource-saving and data privacy issues. During the training process, the neural weights of each model can be randomly permuted, and we have to align the channels of each layer before fusing them. Regrading the channels as nodes and weights as edges, aligning the channels to maximize weight similarity is a challenging NP-hard assignment problem. Due to its quadratic assignment nature, we formulate the model fusion problem as a graph matching task, considering the second-order similarity of model weights instead of previous work merely formulating model fusion as a linear assignment problem. For the rising problem scale and multi-model consistency issues, we propose an efficient graduated assignment-based model fusion method, dubbed GAMF, which iteratively updates the matchings in a consistency-maintaining manner. We apply GAMF to tackle the compact model ensemble task and federated learning task on MNIST, CIFAR-10, CIFAR-100, and Tiny-Imagenet. The performance shows the efficacy of our GAMF compared to state-of-the-art baselines.

ICLR Conference 2022 Conference Paper

Explaining Point Processes by Learning Interpretable Temporal Logic Rules

  • Shuang Li 0002
  • Mingquan Feng
  • Lu Wang 0029
  • Abdelmajid Essofi
  • Yufeng Cao
  • Junchi Yan
  • Le Song

We propose a principled method to learn a set of human-readable logic rules to explain temporal point processes. We assume that the generative mechanisms underlying the temporal point processes are governed by a set of first-order temporal logic rules, as a compact representation of domain knowledge. Our method formulates the rule discovery process from noisy event data as a maximum likelihood problem, and designs an efficient and tractable branch-and-price algorithm to progressively search for new rules and expand existing rules. The proposed algorithm alternates between the rule generation stage and the rule evaluation stage, and uncovers the most important collection of logic rules within a fixed time limit for both synthetic and real event data. In a real healthcare application, we also had human experts (i.e., doctors) verify the learned temporal logic rules and provide further improvements. These expert-revised interpretable rules lead to a point process model which outperforms previous state-of-the-arts for symptom prediction, both in their occurrence times and types.

NeurIPS Conference 2022 Conference Paper

Geometric Knowledge Distillation: Topology Compression for Graph Neural Networks

  • Chenxiao Yang
  • Qitian Wu
  • Junchi Yan

We study a new paradigm of knowledge transfer that aims at encoding graph topological information into graph neural networks (GNNs) by distilling knowledge from a teacher GNN model trained on a complete graph to a student GNN model operating on a smaller or sparser graph. To this end, we revisit the connection between thermodynamics and the behavior of GNN, based on which we propose Neural Heat Kernel (NHK) to encapsulate the geometric property of the underlying manifold concerning the architecture of GNNs. A fundamental and principled solution is derived by aligning NHKs on teacher and student models, dubbed as Geometric Knowledge Distillation. We develop non- and parametric instantiations and demonstrate their efficacy in various experimental settings for knowledge distillation regarding different types of privileged topological information and teacher-student schemes.

ICML Conference 2022 Conference Paper

GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks

  • Yixuan He 0001
  • Quan Gan
  • David P. Wipf
  • Gesine Reinert
  • Junchi Yan
  • Mihai Cucuringu

Recovering global rankings from pairwise comparisons has wide applications from time synchronization to sports team ranking. Pairwise comparisons corresponding to matches in a competition can be construed as edges in a directed graph (digraph), whose nodes represent e. g. competitors with an unknown rank. In this paper, we introduce neural networks into the ranking recovery problem by proposing the so-called GNNRank, a trainable GNN-based framework with digraph embedding. Moreover, new objectives are devised to encode ranking upsets/violations. The framework involves a ranking score estimation approach, and adds an inductive bias by unfolding the Fiedler vector computation of the graph constructed from a learnable similarity matrix. Experimental results on extensive data sets show that our methods attain competitive and often superior performance against baselines, as well as showing promising transfer ability. Codes and preprocessed data are at: \url{https: //github. com/SherylHYX/GNNRank}.

NeurIPS Conference 2022 Conference Paper

GraphDE: A Generative Framework for Debiased Learning and Out-of-Distribution Detection on Graphs

  • Zenan Li
  • Qitian Wu
  • Fan Nie
  • Junchi Yan

Despite the remarkable success of graph neural networks (GNNs) for graph representation learning, they are generally built on the (unreliable) i. i. d. assumption across training and testing data. However, real-world graph data are universally comprised of outliers in training set and out-of-distribution (OOD) testing samples from unseen domains, which solicits effective models for i) debiased learning and ii) OOD detection, towards general trustworthy purpose. In this paper, we first mathematically formulate the two challenging problems for graph data and take an initiative on tackling them under a unified probabilistic model. Specifically, we model the graph generative process to characterize the distribution shifts of graph data together with an additionally introduced latent environment variable as an indicator. We then define a variational distribution, i. e. , a recognition model, to infer the environment during training of GNN. By instantiating the generative models as two-component mixtures, we derive a tractable learning objective and theoretically justify that the model can i) automatically identify and down-weight outliers in the training procedure, and ii) induce an effective OOD detector simultaneously. Experiments on diverse datasets with different types of OOD data prove that our model consistently outperforms strong baselines for both debiasing and OOD detection tasks. The source code has been made publicly available at https: //github. com/Emiyalzn/GraphDE.

NeurIPS Conference 2022 Conference Paper

GraphQNTK: Quantum Neural Tangent Kernel for Graph Data

  • Yehui Tang
  • Junchi Yan

Graph Neural Networks (GNNs) and Graph Kernels (GKs) are two fundamental tools used to analyze graph-structured data. Efforts have been recently made in developing a composite graph learning architecture combining the expressive power of GNNs and the transparent trainability of GKs. However, learning efficiency on these models should be carefully considered as the huge computation overhead. Besides, their convolutional methods are often straightforward and introduce severe loss of graph structure information. In this paper, we design a novel quantum graph learning model to characterize the structural information while using quantum parallelism to improve computing efficiency. Specifically, a quantum algorithm is proposed to approximately estimate the neural tangent kernel of the underlying graph neural network where a multi-head quantum attention mechanism is introduced to properly incorporate semantic similarity information of nodes into the model. We empirically show that our method achieves competitive performance on several graph classification benchmarks, and theoretical analysis is provided to demonstrate the superiority of our quantum algorithm. Source code is available at \url{https: //github. com/abel1231/graphQNTK}.

ICLR Conference 2022 Conference Paper

Handling Distribution Shifts on Graphs: An Invariance Perspective

  • Qitian Wu
  • Hengrui Zhang
  • Junchi Yan
  • David P. Wipf

There is increasing evidence suggesting neural networks' sensitivity to distribution shifts, so that research on out-of-distribution (OOD) generalization comes into the spotlight. Nonetheless, current endeavors mostly focus on Euclidean data, and its formulation for graph-structured data is not clear and remains under-explored, given two-fold fundamental challenges: 1) the inter-connection among nodes in one graph, which induces non-IID generation of data points even under the same environment, and 2) the structural information in the input graph, which is also informative for prediction. In this paper, we formulate the OOD problem on graphs and develop a new invariant learning approach, Explore-to-Extrapolate Risk Minimization (EERM), that facilitates graph neural networks to leverage invariance principles for prediction. EERM resorts to multiple context explorers (specified as graph structure editers in our case) that are adversarially trained to maximize the variance of risks from multiple virtual environments. Such a design enables the model to extrapolate from a single observed environment which is the common case for node-level prediction. We prove the validity of our method by theoretically showing its guarantee of a valid OOD solution and further demonstrate its power on various real-world datasets for handling distribution shifts from artificial spurious features, cross-domain transfers and dynamic graph evolution.

NeurIPS Conference 2022 Conference Paper

Improving Generative Adversarial Networks via Adversarial Learning in Latent Space

  • Yang Li
  • Yichuan Mo
  • Liangliang Shi
  • Junchi Yan

For Generative Adversarial Networks which map a latent distribution to the target distribution, in this paper, we study how the sampling in latent space can affect the generation performance, especially for images. We observe that, as the neural generator is a continuous function, two close samples in latent space would be mapped into two nearby images, while their quality can differ much as the quality generally does not exhibit a continuous nature in pixel space. From such a continuous mapping function perspective, it is also possible that two distant latent samples can be mapped into two close images (if not exactly the same). In particular, if the latent samples are mapped in aggregation into a single mode, mode collapse occurs. Accordingly, we propose adding an implicit latent transform before the mapping function to improve latent $z$ from its initial distribution, e. g. , Gaussian. This is achieved using well-developed adversarial sample mining techniques, e. g. iterative fast gradient sign method (I-FGSM). We further propose new GAN training pipelines to obtain better generative mappings w. r. t quality and diversity by introducing targeted latent transforms into the bi-level optimization of GAN. Experimental results on visual data show that our method can effectively achieve improvement in both quality and diversity.

AAAI Conference 2022 Conference Paper

Input-Specific Robustness Certification for Randomized Smoothing

  • Ruoxin Chen
  • Jie Li
  • Junchi Yan
  • Ping Li
  • Bin Sheng

Although randomized smoothing has demonstrated high certified robustness and superior scalability to other certified defenses, the high computational overhead of the robustness certification bottlenecks the practical applicability, as it depends heavily on the large sample approximation for estimating the confidence interval. In existing works, the sample size for the confidence interval is universally set and agnostic to the input for prediction. This Input-Agnostic Sampling (IAS) scheme may yield a poor Average Certified Radius (ACR)-runtime trade-off which calls for improvement. In this paper, we propose Input-Specific Sampling (ISS) acceleration to achieve the cost-effectiveness for robustness certification, in an adaptive way of reducing the sampling size based on the input characteristic. Furthermore, our method universally controls the certified radius decline from the ISS sample size reduction. The empirical results on CIFAR-10 and ImageNet show that ISS can speed up the certification by more than three times at a limited cost of 0. 05 certified radius. Meanwhile, ISS surpasses IAS on the average certified radius across the extensive hyperparameter settings. Specifically, ISS achieves ACR=0. 958 on ImageNet in 250 minutes, compared to ACR=0. 917 by IAS under the same condition. We release our code in https: //github. com/roy-ch/Input-Specific-Certification.

IJCAI Conference 2022 Conference Paper

Learning Mixture of Neural Temporal Point Processes for Multi-dimensional Event Sequence Clustering

  • Yunhao Zhang
  • Junchi Yan
  • Xiaolu Zhang
  • Jun Zhou
  • Xiaokang Yang

Multi-dimensional event sequence clustering applies to many scenarios e. g. e-Commerce and electronic health. Traditional clustering models fail to characterize complex real-world processes due to the strong parametric assumption. While Neural Temporal Point Processes (NTPPs) mainly focus on modeling similar sequences instead of clustering. To fill the gap, we propose Mixture of Neural Temporal Point Processes (NTPP-MIX), a general framework that can utilize many existing NTPPs for multi-dimensional event sequence clustering. In NTPP-MIX, the prior distribution of coefficients for cluster assignment is modeled by a Dirichlet distribution. When the assignment is given, the conditional probability of a sequence is modeled by the mixture of a series of NTPPs. We combine variational EM algorithm and Stochastic Gradient Descent (SGD) to efficiently train the framework. In E-step, we fix parameters for NTPPs and approximate the true posterior with variational distributions. In M-step, we fix variational distributions and use SGD to update parameters of NTPPs. Extensive experimental results on four synthetic datasets and three real-world datasets show the effectiveness of NTPP-MIX against state-of-the-arts.

NeurIPS Conference 2022 Conference Paper

Learning Substructure Invariance for Out-of-Distribution Molecular Representations

  • Nianzu Yang
  • Kaipeng Zeng
  • Qitian Wu
  • Xiaosong Jia
  • Junchi Yan

Molecule representation learning (MRL) has been extensively studied and current methods have shown promising power for various tasks, e. g. , molecular property prediction and target identification. However, a common hypothesis of existing methods is that either the model development or experimental evaluation is mostly based on i. i. d. data across training and testing. Such a hypothesis can be violated in real-world applications where testing molecules could come from new environments, bringing about serious performance degradation or unexpected prediction. We propose a new representation learning framework entitled MoleOOD to enhance the robustness of MRL models against such distribution shifts, motivated by an observation that the (bio)chemical properties of molecules are usually invariantly associated with certain privileged molecular substructures across different environments (e. g. , scaffolds, sizes, etc. ). Specifically, We introduce an environment inference model to identify the latent factors that impact data generation from different distributions in a fully data-driven manner. We also propose a new learning objective to guide the molecule encoder to leverage environment-invariant substructures that more stably relate with the labels across environments. Extensive experiments on ten real-world datasets demonstrate that our model has a stronger generalization ability than existing methods under various out-of-distribution (OOD) settings, despite the absence of manual specifications of environments. Particularly, our method achieves up to 5. 9\% and 3. 9\% improvement over the strongest baselines on OGB and DrugOOD benchmarks in terms of ROC-AUC, respectively. Our source code is publicly available at \url{https: //github. com/yangnianzu0515/MoleOOD}.

NeurIPS Conference 2022 Conference Paper

NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification

  • Qitian Wu
  • Wentao Zhao
  • Zenan Li
  • David P Wipf
  • Junchi Yan

Graph neural networks have been extensively studied for learning with inter-connected data. Despite this, recent evidence has revealed GNNs' deficiencies related to over-squashing, heterophily, handling long-range dependencies, edge incompleteness and particularly, the absence of graphs altogether. While a plausible solution is to learn new adaptive topology for message passing, issues concerning quadratic complexity hinder simultaneous guarantees for scalability and precision in large networks. In this paper, we introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes, as an important building block for a new class of Transformer networks for node classification on large graphs, dubbed as NodeFormer. Specifically, the efficient computation is enabled by a kernerlized Gumbel-Softmax operator that reduces the algorithmic complexity to linearity w. r. t. node numbers for learning latent graph structures from large, potentially fully-connected graphs in a differentiable manner. We also provide accompanying theory as justification for our design. Extensive experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs (with up to 2M nodes) and graph-enhanced applications (e. g. , image classification) where input graphs are missing. The codes are available at https: //github. com/qitianwu/NodeFormer.

ICLR Conference 2022 Conference Paper

Nonlinear ICA Using Volume-Preserving Transformations

  • Xiaojiang Yang
  • Yi Wang
  • Jiacheng Sun
  • Xing Zhang
  • Shifeng Zhang
  • Zhenguo Li
  • Junchi Yan

Nonlinear ICA is a fundamental problem in machine learning, aiming to identify the underlying independent components (sources) from data which is assumed to be a nonlinear function (mixing function) of these sources. Recent works prove that if the sources have some particular structures (e.g. temporal structure), they are theoretically identifiable even if the mixing function is arbitrary. However, in many cases such restrictions on the sources are difficult to satisfy or even verify, hence it inhibits the applicability of the proposed methods. Different from these works, we propose a general framework for nonlinear ICA, in which the mixing function is assumed to be a volume-preserving transformation, and meanwhile the conditions on the sources can be much looser. We provide an insightful proof of the identifiability of the proposed framework. We implement the framework by volume-preserving Flow-based models, and verify our theory by experiments on artificial data and synthesized images. Moreover, results on real-world images indicate that our framework can disentangle interpretable features.

ICML Conference 2022 Conference Paper

On Collective Robustness of Bagging Against Data Poisoning

  • Ruoxin Chen
  • Zenan Li
  • Jie Li 0002
  • Junchi Yan
  • Chentao Wu

Bootstrap aggregating (bagging) is an effective ensemble protocol, which is believed can enhance robustness by its majority voting mechanism. Recent works further prove the sample-wise robustness certificates for certain forms of bagging (e. g. partition aggregation). Beyond these particular forms, in this paper, we propose the first collective certification for general bagging to compute the tight robustness against the global poisoning attack. Specifically, we compute the maximum number of simultaneously changed predictions via solving a binary integer linear programming (BILP) problem. Then we analyze the robustness of vanilla bagging and give the upper bound of the tolerable poison budget. Based on this analysis, we propose hash bagging to improve the robustness of vanilla bagging almost for free. This is achieved by modifying the random subsampling in vanilla bagging to a hash-based deterministic subsampling, as a way of controlling the influence scope for each poisoning sample universally. Our extensive experiments show the notable advantage in terms of applicability and robustness. Our code is available at https: //github. com/Emiyalzn/ICML22-CRB.

NeurIPS Conference 2022 Conference Paper

Rethinking and Improving Robustness of Convolutional Neural Networks: a Shapley Value-based Approach in Frequency Domain

  • Yiting Chen
  • Qibing Ren
  • Junchi Yan

The existence of adversarial examples poses concerns for the robustness of convolutional neural networks (CNN), for which a popular hypothesis is about the frequency bias phenomenon: CNNs rely more on high-frequency components (HFC) for classification than humans, which causes the brittleness of CNNs. However, most previous works manually select and roughly divide the image frequency spectrum and conduct qualitative analysis. In this work, we introduce Shapley value, a metric of cooperative game theory, into the frequency domain and propose to quantify the positive (negative) impact of every frequency component of data on CNNs. Based on the Shapley value, we quantify the impact in a fine-grained way and show intriguing instance disparity. Statistically, we investigate adversarial training(AT) and the adversarial attack in the frequency domain. The observations motivate us to perform an in-depth analysis and lead to multiple novel hypotheses about i) the cause of adversarial robustness of the AT model; ii) the fairness problem of AT between different classes in the same dataset; iii) the attack bias on different frequency components. Finally, we propose a Shapley-value guided data augmentation technique for improving the robustness. Experimental results on image classification benchmarks show its effectiveness.

NeurIPS Conference 2022 Conference Paper

The Policy-gradient Placement and Generative Routing Neural Networks for Chip Design

  • Ruoyu Cheng
  • Xianglong Lyu
  • Yang Li
  • Junjie Ye
  • Jianye Hao
  • Junchi Yan

Placement and routing are two critical yet time-consuming steps of chip design in modern VLSI systems. Distinct from traditional heuristic solvers, this paper on one hand proposes an RL-based model for mixed-size macro placement, which differs from existing learning-based placers that often consider the macro by coarse grid-based mask. While the standard cells are placed via gradient-based GPU acceleration. On the other hand, a one-shot conditional generative routing model, which is composed of a special-designed input-size-adapting generator and a bi-discriminator, is devised to perform one-shot routing to the pins within each net, and the order of nets to route is adaptively learned. Combining these techniques, we develop a flexible and efficient neural pipeline, which to our best knowledge, is the first joint placement and routing network without involving any traditional heuristic solver. Experimental results on chip design benchmarks showcase the effectiveness of our approach, with code that will be made publicly available.

NeurIPS Conference 2022 Conference Paper

Towards Out-of-Distribution Sequential Event Prediction: A Causal Treatment

  • Chenxiao Yang
  • Qitian Wu
  • Qingsong Wen
  • Zhiqiang Zhou
  • Liang Sun
  • Junchi Yan

The goal of sequential event prediction is to estimate the next event based on a sequence of historical events, with applications to sequential recommendation, user behavior analysis and clinical treatment. In practice, the next-event prediction models are trained with sequential data collected at one time and need to generalize to newly arrived sequences in remote future, which requires models to handle temporal distribution shift from training to testing. In this paper, we first take a data-generating perspective to reveal a negative result that existing approaches with maximum likelihood estimation would fail for distribution shift due to the latent context confounder, i. e. , the common cause for the historical events and the next event. Then we devise a new learning objective based on backdoor adjustment and further harness variational inference to make it tractable for sequence learning problems. On top of that, we propose a framework with hierarchical branching structures for learning context-specific representations. Comprehensive experiments on diverse tasks (e. g. , sequential recommendation) demonstrate the effectiveness, applicability and scalability of our method with various off-the-shelf models as backbones.

NeurIPS Conference 2022 Conference Paper

Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline

  • Penghao Wu
  • Xiaosong Jia
  • Li Chen
  • Junchi Yan
  • Hongyang Li
  • Yu Qiao

Current end-to-end autonomous driving methods either run a controller based on a planned trajectory or perform control prediction directly, which have spanned two separately studied lines of research. Seeing their potential mutual benefits to each other, this paper takes the initiative to explore the combination of these two well-developed worlds. Specifically, our integrated approach has two branches for trajectory planning and direct control, respectively. The trajectory branch predicts the future trajectory, while the control branch involves a novel multi-step prediction scheme such that the relationship between current actions and future states can be reasoned. The two branches are connected so that the control branch receives corresponding guidance from the trajectory branch at each time step. The outputs from two branches are then fused to achieve complementary advantages. Our results are evaluated in the closed-loop urban driving setting with challenging scenarios using the CARLA simulator. Even with a monocular camera input, the proposed approach ranks first on the official CARLA Leaderboard, outperforming other complex candidates with multiple sensors or fusion mechanisms by a large margin. The sourcecode is publicly available at https: //github. com/OpenPerceptionX/TCP

NeurIPS Conference 2022 Conference Paper

ZARTS: On Zero-order Optimization for Neural Architecture Search

  • Xiaoxing Wang
  • Wenxuan Guo
  • Jianlin Su
  • Xiaokang Yang
  • Junchi Yan

Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency. It introduces trainable architecture parameters to represent the importance of candidate operations and proposes first/second-order approximation to estimate their gradients, making it possible to solve NAS by gradient descent algorithm. However, our in-depth empirical results show that the approximation often distorts the loss landscape, leading to the biased objective to optimize and, in turn, inaccurate gradient estimation for architecture parameters. This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the above approximation. Specifically, three representative zero-order optimization methods are introduced: RS, MGS, and GLD, among which MGS performs best by balancing the accuracy and speed. Moreover, we explore the connections between RS/MGS and gradient descent algorithm and show that our ZARTS can be seen as a robust gradient-free counterpart to DARTS. Extensive experiments on multiple datasets and search spaces show the remarkable performance of our method. In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue. Also, we search on the search space of DARTS to compare with peer methods, and our discovered architecture achieves 97. 54\% accuracy on CIFAR-10 and 75. 7\% top-1 accuracy on ImageNet. Finally, we combine our ZARTS with three orthogonal variants of DARTS for faster search speed and better performance. Source code will be made publicly available at: \url{https: //github. com/vicFigure/ZARTS}.

ICLR Conference 2022 Conference Paper

Zero-CL: Instance and Feature decorrelation for negative-free symmetric contrastive learning

  • Shaofeng Zhang
  • Feng Zhu 0006
  • Junchi Yan
  • Rui Zhao 0001
  • Xiaokang Yang 0001

For self-supervised contrastive learning, models can easily collapse and generate trivial constant solutions. The issue has been mitigated by recent improvement on objective design, which however often requires square complexity either for the size of instances ($\mathcal{O}(N^{2})$) or feature dimensions ($\mathcal{O}(d)^2$). To prevent such collapse, we develop two novel methods by decorrelating on different dimensions on the instance embedding stacking matrix, i.e., \textbf{I}nstance-wise (ICL) and \textbf{F}eature-wise (FCL) \textbf{C}ontrastive \textbf{L}earning. The proposed two methods (FCL, ICL) can be combined synthetically, called Zero-CL, where ``Zero'' means negative samples are \textbf{zero} relevant, which allows Zero-CL to completely discard negative pairs i.e., with \textbf{zero} negative samples. Compared with previous methods, Zero-CL mainly enjoys three advantages: 1) Negative free in symmetric architecture. 2) By whitening transformation, the correlation of the different features is equal to zero, alleviating information redundancy. 3) Zero-CL remains original information to a great extent after transformation, which improves the accuracy against other whitening transformation techniques. Extensive experimental results on CIFAR-10/100 and ImageNet show that Zero-CL outperforms or is on par with state-of-the-art symmetric contrastive learning methods.

NeurIPS Conference 2021 Conference Paper

A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs

  • Runzhong Wang
  • Zhigang Hua
  • Gan Liu
  • Jiayi Zhang
  • Junchi Yan
  • Feng Qi
  • Shuang Yang
  • Jun Zhou

Combinatorial Optimization (CO) has been a long-standing challenging research topic featured by its NP-hard nature. Traditionally such problems are approximately solved with heuristic algorithms which are usually fast but may sacrifice the solution quality. Currently, machine learning for combinatorial optimization (MLCO) has become a trending research topic, but most existing MLCO methods treat CO as a single-level optimization by directly learning the end-to-end solutions, which are hard to scale up and mostly limited by the capacity of ML models given the high complexity of CO. In this paper, we propose a hybrid approach to combine the best of the two worlds, in which a bi-level framework is developed with an upper-level learning method to optimize the graph (e. g. add, delete or modify edges in a graph), fused with a lower-level heuristic algorithm solving on the optimized graph. Such a bi-level approach simplifies the learning on the original hard CO and can effectively mitigate the demand for model capacity. The experiments and results on several popular CO problems like Directed Acyclic Graph scheduling, Graph Edit Distance and Hamiltonian Cycle Problem show its effectiveness over manually designed heuristics and single-level learning methods.

ICLR Conference 2021 Conference Paper

DARTS-: Robustly Stepping out of Performance Collapse Without Indicators

  • Xiangxiang Chu
  • Xiaoxing Wang
  • Bo Zhang 0046
  • Shun Lu 0001
  • Xiaolin Wei
  • Junchi Yan

Despite the fast development of differentiable architecture search (DARTS), it suffers from a standing instability issue regarding searching performance, which extremely limits its application. Existing robustifying methods draw clues from the outcome instead of finding out the causing factor. Various indicators such as Hessian eigenvalues are proposed as a signal of performance collapse, and the searching should be stopped once an indicator reaches a preset threshold. However, these methods tend to easily reject good architectures if thresholds are inappropriately set, let alone the searching is intrinsically noisy. In this paper, we undertake a more subtle and direct approach to resolve the collapse. We first demonstrate that skip connections with a learnable architectural coefficient can easily recover from a disadvantageous state and become dominant. We conjecture that skip connections profit too much from this privilege, hence causing the collapse for the derived model. Therefore, we propose to factor out this benefit with an auxiliary skip connection, ensuring a fairer competition for all operations. Extensive experiments on various datasets verify that our approach can substantially improve the robustness of DARTS. Our code is available at https://github.com/Meituan-AutoML/DARTS-

ICML Conference 2021 Conference Paper

Deep Latent Graph Matching

  • Tianshu Yu 0001
  • Runzhong Wang
  • Junchi Yan
  • Baoxin Li

Deep learning for graph matching (GM) has emerged as an important research topic due to its superior performance over traditional methods and insights it provides for solving other combinatorial problems on graph. While recent deep methods for GM extensively investigated effective node/edge feature learning or downstream GM solvers given such learned features, there is little existing work questioning if the fixed connectivity/topology typically constructed using heuristics (e. g. , Delaunay or k-nearest) is indeed suitable for GM. From a learning perspective, we argue that the fixed topology may restrict the model capacity and thus potentially hinder the performance. To address this, we propose to learn the (distribution of) latent topology, which can better support the downstream GM task. We devise two latent graph generation procedures, one deterministic and one generative. Particularly, the generative procedure emphasizes the across-graph consistency and thus can be viewed as a matching-guided co-generative model. Our methods deliver superior performance over previous state-of-the-arts on public benchmarks, hence supporting our hypothesis.

IROS Conference 2021 Conference Paper

Dynamic Modeling of Hand-Object Interactions via Tactile Sensing

  • Qiang Zhang
  • Yunzhu Li
  • Yiyue Luo
  • Wan Shou
  • Michael Foshey
  • Junchi Yan
  • Joshua B. Tenenbaum
  • Wojciech Matusik

Tactile sensing is critical for humans to perform everyday tasks. While significant progress has been made in analyzing object grasping from vision, it remains unclear how we can utilize tactile sensing to reason about and model the dynamics of hand-object interactions. In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects. We propose a framework aiming at predicting the 3d locations of both the hand and the object purely from the touch data by combining a predictive model and a contrastive learning module. This framework can reason about the interaction patterns from the tactile data, hallucinate the changes in the environment, esti-mate the uncertainty of the prediction, and generalize to unseen objects. We also provide detailed ablation studies regarding different system designs as well as visualizations of the predicted trajectories. This work takes a step on dynamics modeling in hand-object interactions from dense tactile sensing, which opens the door for future applications in activity learning, human-computer interactions, and imitation learning for robotics.

NeurIPS Conference 2021 Conference Paper

From Canonical Correlation Analysis to Self-supervised Graph Neural Networks

  • Hengrui Zhang
  • Qitian Wu
  • Junchi Yan
  • David Wipf
  • Philip S Yu

We introduce a conceptually simple yet effective model for self-supervised representation learning with graph data. It follows the previous methods that generate two views of an input graph through data augmentation. However, unlike contrastive methods that focus on instance-level discrimination, we optimize an innovative feature-level objective inspired by classical Canonical Correlation Analysis. Compared with other works, our approach requires none of the parameterized mutual information estimator, additional projector, asymmetric structures, and most importantly, negative samples which can be costly. We show that the new objective essentially 1) aims at discarding augmentation-variant information by learning invariant representations, and 2) can prevent degenerated solutions by decorrelating features in different dimensions. Our theoretical analysis further provides an understanding for the new objective which can be equivalently seen as an instantiation of the Information Bottleneck Principle under the self-supervised setting. Despite its simplicity, our method performs competitively on seven public graph datasets.

NeurIPS Conference 2021 Conference Paper

GRIN: Generative Relation and Intention Network for Multi-agent Trajectory Prediction

  • Longyuan Li
  • Jian Yao
  • Li Wenliang
  • Tong He
  • Tianjun Xiao
  • Junchi Yan
  • David Wipf
  • Zheng Zhang

Learning the distribution of future trajectories conditioned on the past is a crucial problem for understanding multi-agent systems. This is challenging because humans make decisions based on complex social relations and personal intents, resulting in highly complex uncertainties over trajectories. To address this problem, we propose a conditional deep generative model that combines advances in graph neural networks. The prior and recognition model encodes two types of latent codes for each agent: an inter-agent latent code to represent social relations and an intra-agent latent code to represent agent intentions. The decoder is carefully devised to leverage the codes in a disentangled way to predict multi-modal future trajectory distribution. Specifically, a graph attention network built upon inter-agent latent code is used to learn continuous pair-wise relations, and an agent's motion is controlled by its latent intents and its observations of all other agents. Through experiments on both synthetic and real-world datasets, we show that our model outperforms previous work in multiple performance metrics. We also show that our model generates realistic multi-modal trajectories.

AAAI Conference 2021 Conference Paper

Learning Comprehensive Motion Representation for Action Recognition

  • Mingyu Wu
  • Boyuan Jiang
  • Donghao Luo
  • Junchi Yan
  • Yabiao Wang
  • Ying Tai
  • Chengjie Wang
  • Jilin Li

For action recognition learning, 2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame. Recent efforts attempt to capture motion information by establishing interframe connections while still suffering the limited temporal receptive field or high latency. Moreover, the feature enhancement is often only performed by channel or space dimension in action recognition. To address these issues, we first devise a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector. The channel gates generated by CME incorporate the information from all the other frames in the video. We further propose a Spatial-wise Motion Enhancement (SME) module to focus on the regions with the critical target in motion, according to the point-to-point similarity between adjacent feature maps. The intuition is that the change of background is typically slower than the motion area. Both CME and SME have clear physical meaning in capturing action clues. By integrating the two modules into the off-the-shelf 2D network, we finally obtain a Comprehensive Motion Representation (CMR) learning method for action recognition, which achieves competitive performance on Something-Something V1 & V2 and Kinetics-400. On the temporal reasoning datasets Something-Something V1 and V2, our method outperforms the current state-of-the-art by 2. 3% and 1. 9% when using 16 frames as input, respectively.

NeurIPS Conference 2021 Conference Paper

Learning High-Precision Bounding Box for Rotated Object Detection via Kullback-Leibler Divergence

  • Xue Yang
  • Xiaojiang Yang
  • Jirui Yang
  • Qi Ming
  • Wentao Wang
  • Qi Tian
  • Junchi Yan

Existing rotated object detectors are mostly inherited from the horizontal detection paradigm, as the latter has evolved into a well-developed area. However, these detectors are difficult to perform prominently in high-precision detection due to the limitation of current regression loss design, especially for objects with large aspect ratios. Taking the perspective that horizontal detection is a special case for rotated object detection, in this paper, we are motivated to change the design of rotation regression loss from induction paradigm to deduction methodology, in terms of the relation between rotation and horizontal detection. We show that one essential challenge is how to modulate the coupled parameters in the rotation regression loss, as such the estimated parameters can influence to each other during the dynamic joint optimization, in an adaptive and synergetic way. Specifically, we first convert the rotated bounding box into a 2-D Gaussian distribution, and then calculate the Kullback-Leibler Divergence (KLD) between the Gaussian distributions as the regression loss. By analyzing the gradient of each parameter, we show that KLD (and its derivatives) can dynamically adjust the parameter gradients according to the characteristics of the object. For instance, it will adjust the importance (gradient weight) of the angle parameter according to the aspect ratio. This mechanism can be vital for high-precision detection as a slight angle error would cause a serious accuracy drop for large aspect ratios objects. More importantly, we have proved that KLD is scale invariant. We further show that the KLD loss can be degenerated into the popular Ln-norm loss for horizontal detection. Experimental results on seven datasets using different detectors show its consistent superiority, and codes are available at https: //github. com/yangxue0827/RotationDetection.

AAAI Conference 2021 Conference Paper

Learning Local Neighboring Structure for Robust 3D Shape Representation

  • Zhongpai Gao
  • Junchi Yan
  • Guangtao Zhai
  • Juyong Zhang
  • Yiyan Yang
  • Xiaokang Yang

Mesh is a powerful data structure for 3D shapes. Representation learning for 3D meshes is important in many computer vision and graphics applications. The recent success of convolutional neural networks (CNNs) for structured data (e. g. , images) suggests the value of adapting insight from CNN for 3D shapes. However, 3D shape data are irregular since each node’s neighbors are unordered. Various graph neural networks for 3D shapes have been developed with isotropic filters or predefined local coordinate systems to overcome the node inconsistency on graphs. However, isotropic filters or predefined local coordinate systems limit the representation power. In this paper, we propose a local structure-aware anisotropic convolutional operation (LSA-Conv) that learns adaptive weighting matrices for each node according to the local neighboring structure and performs shared anisotropic filters. In fact, the learnable weighting matrix is similar to the attention matrix in the random synthesizer – a new Transformer model for natural language processing (NLP). Comprehensive experiments demonstrate that our model produces significant improvement in 3D shape reconstruction compared to state-of-the-art methods.

AAAI Conference 2021 Conference Paper

Learning Modulated Loss for Rotated Object Detection

  • Wen Qian
  • Xue Yang
  • Silong Peng
  • Junchi Yan
  • Yue Guo

Popular rotated detection methods usually use five parameters (coordinates of the central point, width, height, and rotation angle) or eight parameters (coordinates of four vertices) to describe the rotated bounding box and `1 loss as the loss function. In this paper, we argue that the aforementioned integration can cause training instability and performance degeneration. The main reason is the discontinuity of loss which is caused by the contradiction between the definition of the rotated bounding box and the loss function. We refer to the above issues as rotation sensitivity error (RSE) and propose a modulated rotation loss to dismiss the discontinuity of loss. The modulated rotation loss can achieve consistent improvement on the five parameter methods and the eight parameter methods. Experimental results using one stage and two stages detectors demonstrate the effectiveness of our loss. The integrated network achieves competitive performances on several benchmarks including DOTA and UCAS AOD. The code is available at https: //github. com/ yangxue0827/RotationDetection.

ICML Conference 2021 Conference Paper

Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation

  • Chao Chen 0016
  • Haoyu Geng
  • Nianzu Yang
  • Junchi Yan
  • Daiyue Xue
  • Jianping Yu
  • Xiaokang Yang 0001

User interests are usually dynamic in the real world, which poses both theoretical and practical challenges for learning accurate preferences from rich behavior data. Among existing user behavior modeling solutions, attention networks are widely adopted for its effectiveness and relative simplicity. Despite being extensively studied, existing attentions still suffer from two limitations: i) conventional attentions mainly take into account the spatial correlation between user behaviors, regardless the distance between those behaviors in the continuous time space; and ii) these attentions mostly provide a dense and undistinguished distribution over all past behaviors then attentively encode them into the output latent representations. This is however not suitable in practical scenarios where a user’s future actions are relevant to a small subset of her/his historical behaviors. In this paper, we propose a novel attention network, named \textit{self-modulating attention}, that models the complex and non-linearly evolving dynamic user preferences. We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model can achieve state-of-the-art performance.

IJCAI Conference 2021 Conference Paper

Learning Spectral Dictionary for Local Representation of Mesh

  • Zhongpai Gao
  • Junchi Yan
  • Guangtao Zhai
  • Xiaokang Yang

For meshes, sharing the topology of a template is a common and practical setting in face-, hand-, and body-related applications. Meshes are irregular since each vertex's neighbors are unordered and their orientations are inconsistent with other vertices. Previous methods use isotropic filters or predefined local coordinate systems or learning weighting matrices for each vertex of the template to overcome the irregularity. Learning weighting matrices for each vertex to soft-permute the vertex's neighbors into an implicit canonical order is an effective way to capture the local structure of each vertex. However, learning weighting matrices for each vertex increases the parameter size linearly with the number of vertices and large amounts of parameters are required for high-resolution 3D shapes. In this paper, we learn spectral dictionary (i. e. , bases) for the weighting matrices such that the parameter size is independent of the resolution of 3D shapes. The coefficients of the weighting matrix bases for each vertex are learned from the spectral features of the template's vertex and its neighbors in a weight-sharing manner. Comprehensive experiments demonstrate that our model produces state-of-the-art results with a much smaller model size.

IJCAI Conference 2021 Conference Paper

Neural Relation Inference for Multi-dimensional Temporal Point Processes via Message Passing Graph

  • Yunhao Zhang
  • Junchi Yan

Relation discovery for multi-dimensional temporal point processes (MTPP) has received increasing interest for its importance in prediction and interpretability of the underlying dynamics. Traditional statistical MTPP models like Hawkes Process have difficulty in capturing complex relation due to their limited parametric form of the intensity function. While recent neural-network-based models suffer poor interpretability. In this paper, we propose a neural relation inference model namely TPP-NRI. Given MTPP data, it adopts a variational inference framework to model the posterior relation of MTPP data for probabilistic estimation. Specifically, assuming the prior of the relation is known, the conditional probability of the MTPP conditional on a sampled relation is captured by a message passing graph neural network (GNN) based MTPP model. A variational distribution is introduced to approximate the true posterior. Experiments on synthetic and real-world data show that our model outperforms baseline methods on both inference capability and scalability for high-dimensional data.

NeurIPS Conference 2021 Conference Paper

On Joint Learning for Solving Placement and Routing in Chip Design

  • Ruoyu Cheng
  • Junchi Yan

For its advantage in GPU acceleration and less dependency on human experts, machine learning has been an emerging tool for solving the placement and routing problems, as two critical steps in modern chip design flow. Being still in its early stage, there are several fundamental issues unresolved: scalability, reward design, and end-to-end learning paradigm etc. To achieve end-to-end placement learning, we first propose a joint learning method for the placement of macros and standard cells, by the integration of reinforcement learning with a gradient based optimization scheme. To further bridge the placement with the subsequent routing task, we also develop a joint learning approach via reinforcement learning. One key design in our (reinforcement) learning paradigm involves a multi-view embedding model to encode both global graph level and local node level information of the input macros. Moreover, the random network distillation is devised to encourage exploration. Experiments on public chip design benchmarks show that our method can effectively learn from experience and also provide high-quality intermediate placement for the post standard cell placement, within few hours for training.

AAAI Conference 2021 Conference Paper

R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object

  • Xue Yang
  • Junchi Yan
  • Ziming Feng
  • Tao He

Rotation detection is a challenging task due to the difficulties of locating the multi-angle objects and separating them effectively from the background. Though considerable progress has been made, for practical settings, there still exist challenges for rotating objects with large aspect ratio, dense distribution and category extremely imbalance. In this paper, we propose an end-to-end refined single-stage rotation detector for fast and accurate object detection by using a progressive regression approach from coarse to fine granularity. Considering the shortcoming of feature misalignment in existing refined single-stage detector, we design a feature refinement module to improve detection performance by getting more accurate features. The key idea of feature refinement module is to re-encode the position information of the current refined bounding box to the corresponding feature points through pixel-wise feature interpolation to realize feature reconstruction and alignment. For more accurate rotation estimation, an approximate SkewIoU loss is proposed to solve the problem that the calculation of SkewIoU is not derivable. Experiments on three popular remote sensing public datasets DOTA, HRSC2016, UCAS-AOD as well as one scene text dataset ICDAR2015 show the effectiveness of our approach. The source code is available at https: //github. com/Thinklab-SJTU/R3Det Tensorflow and is also integrated in our open source rotation detection benchmark: https: //github. com/yangxue0827/RotationDetection.

AAAI Conference 2021 Conference Paper

Rethinking Bi-Level Optimization in Neural Architecture Search: A Gibbs Sampling Perspective

  • Chao Xue
  • Xiaoxing Wang
  • Junchi Yan
  • Yonggang Hu
  • Xiaokang Yang
  • Kewei Sun

One-Shot architecture search, aiming to explore all possible operations jointly based on a single model, has been an active direction of Neural Architecture Search (NAS). As a wellknown one-shot solution, Differentiable Architecture Search (DARTS) performs continuous relaxation on the architecture’s importance and results in a bi-level optimization problem. As many recent studies have shown, DARTS cannot always work robustly for new tasks, which is mainly due to the approximate solution of the bi-level optimization. In this paper, one-shot neural architecture search is addressed by adopting a directed probabilistic graphical model to represent the joint probability distribution over data and model. Then, neural architectures are searched for and optimized by Gibbs sampling. We rethink the bi-level optimization problem as the task of Gibbs sampling from the posterior distribution, which expresses the preferences for different models given the observed dataset. We evaluate our proposed NAS method – GibbsNAS on the search space used in DARTS/ENAS as well as the search space of NAS-Bench-201. Experimental results on multiple search space show the efficacy and stability of our approach.

ICML Conference 2021 Conference Paper

Rethinking Rotated Object Detection with Gaussian Wasserstein Distance Loss

  • Xue Yang 0005
  • Junchi Yan
  • Qi Ming
  • Wentao Wang 0009
  • Xiaopeng Zhang 0008
  • Qi Tian 0001

Boundary discontinuity and its inconsistency to the final detection metric have been the bottleneck for rotating detection regression loss design. In this paper, we propose a novel regression loss based on Gaussian Wasserstein distance as a fundamental approach to solve the problem. Specifically, the rotated bounding box is converted to a 2-D Gaussian distribution, which enables to approximate the indifferentiable rotational IoU induced loss by the Gaussian Wasserstein distance (GWD) which can be learned efficiently by gradient back-propagation. GWD can still be informative for learning even there is no overlapping between two rotating bounding boxes which is often the case for small object detection. Thanks to its three unique properties, GWD can also elegantly solve the boundary discontinuity and square-like problem regardless how the bounding box is defined. Experiments on five datasets using different detectors show the effectiveness of our approach, and codes are available at https: //github. com/yangxue0827/RotationDetection.

AAAI Conference 2021 Conference Paper

Scalable and Explainable 1-Bit Matrix Completion via Graph Signal Learning

  • Chao Chen
  • Dongsheng Li
  • Junchi Yan
  • Hanchi Huang
  • Xiaokang Yang

One-bit matrix completion is an important class of positiveunlabeled (PU) learning problems where the observations consist of only positive examples, e. g. , in top-N recommender systems. For the first time, we show that 1-bit matrix completion can be formulated as the problem of recovering clean graph signals from noise-corrupted signals in hypergraphs. This makes it possible to enjoy recent advances in graph signal learning. Then, we propose the spectral graph matrix completion (SGMC) method, which can recover the underlying matrix in distributed systems by filtering the noisy data in the graph frequency domain. Meanwhile, it can provide microand macro-level explanations by following vertex-frequency analysis. To tackle the computational and memory issue of performing graph signal operations on large graphs, we construct a scalable Nyström algorithm which can efficiently compute orthonormal eigenvectors. Furthermore, we also develop polynomial and sparse frequency filters to remedy the accuracy loss caused by the approximations. We demonstrate the effectiveness of our algorithms on top-N recommendation tasks, and the results on three large-scale real-world datasets show that SGMC can outperform state-of-the-art top-N recommendation algorithms in accuracy while only requiring a small fraction of training time compared to the baselines.

AAAI Conference 2021 Conference Paper

Synergetic Learning of Heterogeneous Temporal Sequences for Multi-Horizon Probabilistic Forecasting

  • Longyuan Li
  • Jihai Zhang
  • Junchi Yan
  • Yaohui Jin
  • Yunhao Zhang
  • Yanjie Duan
  • Guangjian Tian

Time-series is ubiquitous across applications, such as transportation, finance and healthcare. Time-series is often influenced by external factors, especially in the form of asynchronous events, making forecasting difficult. However, existing models are mainly designated for either synchronous time-series or asynchronous event sequence, and can hardly provide a synthetic way to capture the relation between them. We propose Variational Synergetic Multi-Horizon Network (VSMHN), a novel deep conditional generative model. To learn complex correlations across heterogeneous sequences, a tailored encoder is devised to combine the advances in deep point processes models and variational recurrent neural networks. In addition, an aligned time coding and an auxiliary transition scheme are carefully devised for batched training on unaligned sequences. Our model can be trained effectively using stochastic variational inference and generates probabilistic predictions with Monte-Carlo simulation. Furthermore, our model produces accurate, sharp and more realistic probabilistic forecasts. We also show that modeling asynchronous event sequences is crucial for multi-horizon time-series forecasting.

NeurIPS Conference 2021 Conference Paper

Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach

  • Qitian Wu
  • Chenxiao Yang
  • Junchi Yan

We target open-world feature extrapolation problem where the feature space of input data goes through expansion and a model trained on partially observed features needs to handle new features in test data without further retraining. The problem is of much significance for dealing with features incrementally collected from different fields. To this end, we propose a new learning paradigm with graph representation and learning. Our framework contains two modules: 1) a backbone network (e. g. , feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data. Based on our framework, we design two training strategies, a self-supervised approach and an inductive learning approach, to endow the model with extrapolation ability and alleviate feature-level over-fitting. We also provide theoretical analysis on the generalization error on test data with new features, which dissects the impact of training features and algorithms on generalization performance. Our experiments over several classification datasets and large-scale advertisement click prediction datasets demonstrate that our model can produce effective embeddings for unseen features and significantly outperforms baseline methods that adopt KNN and local aggregation.

ICML Conference 2021 Conference Paper

Towards Open-World Recommendation: An Inductive Model-based Collaborative Filtering Approach

  • Qitian Wu
  • Hengrui Zhang
  • Xiaofeng Gao 0001
  • Junchi Yan
  • Hongyuan Zha

Recommendation models can effectively estimate underlying user interests and predict one’s future behaviors by factorizing an observed user-item rating matrix into products of two sets of latent factors. However, the user-specific embedding factors can only be learned in a transductive way, making it difficult to handle new users on-the-fly. In this paper, we propose an inductive collaborative filtering framework that contains two representation models. The first model follows conventional matrix factorization which factorizes a group of key users’ rating matrix to obtain meta latents. The second model resorts to attention-based structure learning that estimates hidden relations from query to key users and learns to leverage meta latents to inductively compute embeddings for query users via neural message passing. Our model enables inductive representation learning for users and meanwhile guarantees equivalent representation capacity as matrix factorization. Experiments demonstrate that our model achieves promising results for recommendation on few-shot users with limited training ratings and new unseen users which are commonly encountered in open-world recommender systems.

NeurIPS Conference 2020 Conference Paper

Adversarial Learning for Robust Deep Clustering

  • Xu Yang
  • Cheng Deng
  • Kun Wei
  • Junchi Yan
  • Wei Liu

Deep clustering integrates embedding and clustering together to obtain the optimal nonlinear embedding space, which is more effective in real-world scenarios compared with conventional clustering methods. However, the robustness of the clustering network is prone to being attenuated especially when it encounters an adversarial attack. A small perturbation in the embedding space will lead to diverse clustering results since the labels are absent. In this paper, we propose a robust deep clustering method based on adversarial learning. Specifically, we first attempt to define adversarial samples in the embedding space for the clustering network. Meanwhile, we devise an adversarial attack strategy to explore samples that easily fool the clustering layers but do not impact the performance of the deep embedding. We then provide a simple yet efficient defense algorithm to improve the robustness of the clustering network. Experimental results on two popular datasets show that the proposed adversarial learning method can significantly enhance the robustness and further improve the overall clustering performance. Particularly, the proposed method is generally applicable to multiple existing clustering frameworks to boost their robustness. The source code is available at https: //github. com/xdxuyang/ALRDC.

NeurIPS Conference 2020 Conference Paper

Graduated Assignment for Joint Multi-Graph Matching and Clustering with Application to Unsupervised Graph Matching Network Learning

  • Runzhong Wang
  • Junchi Yan
  • Xiaokang Yang

This paper considers the setting of jointly matching and clustering multiple graphs belonging to different groups, which naturally rises in many realistic problems. Both graph matching and clustering are challenging (NP-hard) and a joint solution is appealing due to the natural connection of the two tasks. In this paper, we resort to a graduated assignment procedure for soft matching and clustering over iterations, whereby the two-way constraint and clustering confidence are modulated by two separate annealing parameters, respectively. Our technique can be further utilized for end-to-end learning whose loss refers to the cross-entropy between two lines of matching pipelines, as such the keypoint feature extraction CNNs can be learned without ground-truth supervision. Experimental results on real-world benchmarks show our method outperforms learning-free algorithms and performs comparatively against two-graph based supervised graph matching approaches.

ICLR Conference 2020 Conference Paper

Learning deep graph matching with channel-independent embedding and Hungarian attention

  • Tianshu Yu 0001
  • Runzhong Wang
  • Junchi Yan
  • Baoxin Li

Graph matching aims to establishing node-wise correspondence between two graphs, which is a classic combinatorial problem and in general NP-complete. Until very recently, deep graph matching methods start to resort to deep networks to achieve unprecedented matching accuracy. Along this direction, this paper makes two complementary contributions which can also be reused as plugin in existing works: i) a novel node and edge embedding strategy which stimulates the multi-head strategy in attention models and allows the information in each channel to be merged independently. In contrast, only node embedding is accounted in previous works; ii) a general masking mechanism over the loss function is devised to improve the smoothness of objective learning for graph matching. Using Hungarian algorithm, it dynamically constructs a structured and sparsely connected layer, taking into account the most contributing matching pairs as hard attention. Our approach performs competitively, and can also improve state-of-the-art methods as plugin, regarding with matching accuracy on three public benchmarks.

IJCAI Conference 2020 Conference Paper

Learning for Graph Matching and Related Combinatorial Optimization Problems

  • Junchi Yan
  • Shuang Yang
  • Edwin Hancock

This survey gives a selective review of recent development of machine learning (ML) for combinatorial optimization (CO), especially for graph matching. The synergy of these two well-developed areas (ML and CO) can potentially give transformative change to artificial intelligence, whose foundation relates to these two building blocks. For its representativeness and wide-applicability, this paper is more focused on the problem of weighted graph matching, especially from the learning perspective. For graph matching, we show that many learning techniques e. g. convolutional neural networks, graph neural networks, reinforcement learning can be effectively incorporated in the paradigm for extracting the node features, graph structure features, and even the matching engine. We further present outlook for the new settings for learning graph matching, and direction towards more integrated combinatorial optimization solvers with prediction models, and also the mutual embrace of traditional solver and machine learning components.

IJCAI Conference 2020 Conference Paper

MergeNAS: Merge Operations into One for Differentiable Architecture Search

  • Xiaoxing Wang
  • Chao Xue
  • Junchi Yan
  • Xiaokang Yang
  • Yonggang Hu
  • Kewei Sun

Differentiable architecture search (DARTS) has been a promising one-shot architecture search approach for its mathematical formulation and competitive results. However, besides its caused high memory utilization and a large computation requirement, many research works have shown that DARTS also often suffers notable over-fitting and thus does not work robustly for some new tasks. In this paper, we propose a one-shot neural architecture search method referred to as MergeNAS by merging different types of operations e. g. convolutions into one operation. This merge-based approach not only reduces the search cost (about half a GPU day), but also alleviates over-fitting by reducing the redundant parameters. Extensive experiments on different search space and various datasets have been conducted to verify our approach, showing that MergeNAS can converge to a stable architecture and achieve better performance with fewer parameters and search cost. For test accuracy and its stability, MergeNAS outperforms all NAS baseline methods implemented on NAS-Bench-201, including DARTS, ENAS, RS, BOHB, GDAS and hand-crafted ResNet.

AAAI Conference 2020 Conference Paper

Multiple Graph Matching and Clustering via Decayed Pairwise Matching Composition

  • Tianzhe Wang
  • Zetian Jiang
  • Junchi Yan

Jointly matching of multiple graphs is challenging and recently has been an active topic in machine learning and computer vision. State-of-the-art methods have been devised, however, to our best knowledge there is no effective mechanism that can explicitly deal with the matching of a mixture of graphs belonging to multiple clusters, e. g. , a collection of bikes and bottles. Seeing its practical importance, we propose a novel approach for multiple graph matching and clustering. Firstly, for the traditional multi-graph matching setting, we devise a composition scheme based on a tree structure, which can be seen as in the between of two strong multigraph matching solvers, i. e. , MatchOpt (Yan et al. 2015a) and CAO (Yan et al. 2016a). In particular, it can be more robust than MatchOpt against a set of diverse graphs and more ef- ficient than CAO. Then we further extend the algorithm to the multiple graph matching and clustering setting, by adopting a decaying technique along the composition path, to discount the meaningless matching between graphs in different clusters. Experimental results show the proposed methods achieve excellent trade-off on the traditional multi-graph matching case, and outperform in both matching and clustering accuracy, as well as time efficiency.

NeurIPS Conference 2020 Conference Paper

The Diversified Ensemble Neural Network

  • Shaofeng Zhang
  • Meng Liu
  • Junchi Yan

Ensemble is a general way of improving the accuracy and stability of learning models, especially for the generalization ability on small datasets. Compared with tree-based methods, relatively less works have been devoted to an in-depth study on effective ensemble design for neural networks. In this paper, we propose a principled ensemble technique by constructing the so-called diversified ensemble layer to combine multiple networks as individual modules. We theoretically show that each individual model in our ensemble layer corresponds to weights in the ensemble layer optimized in different directions. Meanwhile, the devised ensemble layer can be readily integrated into popular neural architectures, including CNNs, RNNs, and GCNs. Extensive experiments are conducted on public tabular datasets, images, and texts. By adopting weight sharing approach, the results show our method can notably improve the accuracy and stability of the original neural networks with ignorable extra time and space overhead.

IJCAI Conference 2019 Conference Paper

Graph Convolutional Network Hashing for Cross-Modal Retrieval

  • Ruiqing Xu
  • Chao Li
  • Junchi Yan
  • Cheng Deng
  • Xianglong Liu

Deep network based cross-modal retrieval has recently made significant progress. However, bridging modality gap to further enhance the retrieval accuracy still remains a crucial bottleneck. In this paper, we propose a Graph Convolutional Hashing (GCH) approach, which learns modality-unified binary codes via an affinity graph. An end-to-end deep architecture is constructed with three main components: a semantic encoder module, two feature encoding networks, and a graph convolutional network (GCN). We design a semantic encoder as a teacher module to guide the feature encoding process, a. k. a. student module, for semantic information exploiting. Furthermore, GCN is utilized to explore the inherent similarity structure among data points, which will help to generate discriminative hash codes. Extensive experiments on three benchmark datasets demonstrate that the proposed GCH outperforms the state-of-the-art methods.

IJCAI Conference 2019 Conference Paper

Joint Link Prediction and Network Alignment via Cross-graph Embedding

  • Xingbo Du
  • Junchi Yan
  • Hongyuan Zha

Link prediction and network alignment are two important problems in social network analysis and other network related applications. Considerable efforts have been devoted to these two problems while often in an independent way to each other. In this paper we argue that these two tasks are relevant and present a joint link prediction and network alignment framework, whereby a novel cross-graph node embedding technique is devised to allow for information propagation. Our approach can either work with a few initial vertex correspondence as seeds, or from scratch. By extensive experiments on public benchmark, we show that link prediction and network alignment can benefit to each other especially for improving the recall for both tasks.

IJCAI Conference 2019 Conference Paper

Learning Interpretable Deep State Space Model for Probabilistic Time Series Forecasting

  • Longyuan Li
  • Junchi Yan
  • Xiaokang Yang
  • Yaohui Jin

Probabilistic time series forecasting involves estimating the distribution of future based on its history, which is essential for risk management in downstream decision-making. We propose a deep state space model for probabilistic time series forecasting whereby the non-linear emission model and transition model are parameterized by networks and the dependency is modeled by recurrent neural nets. We take the automatic relevance determination (ARD) view and devise a network to exploit the exogenous variables in addition to time series. In particular, our ARD network can incorporate the uncertainty of the exogenous variables and eventually helps identify useful exogenous variables and suppress those irrelevant for forecasting. The distribution of multi-step ahead forecasts are approximated by Monte Carlo simulation. We show in experiments that our model produces accurate and sharp probabilistic forecasts. The estimated uncertainty of our forecasting also realistically increases over time, in a spontaneous manner.

NeurIPS Conference 2019 Conference Paper

Learning Latent Process from High-Dimensional Event Sequences via Efficient Sampling

  • Qitian Wu
  • Zixuan Zhang
  • Xiaofeng Gao
  • Junchi Yan
  • Guihai Chen

We target modeling latent dynamics in high-dimension marked event sequences without any prior knowledge about marker relations. Such problem has been rarely studied by previous works which would have fundamental difficulty to handle the arisen challenges: 1) the high-dimensional markers and unknown relation network among them pose intractable obstacles for modeling the latent dynamic process; 2) one observed event sequence may concurrently contain several different chains of interdependent events; 3) it is hard to well define the distance between two high-dimension event sequences. To these ends, in this paper, we propose a seminal adversarial imitation learning framework for high-dimension event sequence generation which could be decomposed into: 1) a latent structural intensity model that estimates the adjacent nodes without explicit networks and learns to capture the temporal dynamics in the latent space of markers over observed sequence; 2) an efficient random walk based generation model that aims at imitating the generation process of high-dimension event sequences from a bottom-up view; 3) a discriminator specified as a seq2seq network optimizing the rewards to help the generator output event sequences as real as possible. Experimental results on both synthetic and real-world datasets demonstrate that the proposed method could effectively detect the hidden network among markers and make decent prediction for future marked events, even when the number of markers scales to million level.

IJCAI Conference 2019 Conference Paper

TransMS: Knowledge Graph Embedding for Complex Relations by Multidirectional Semantics

  • Shihui Yang
  • Jidong Tian
  • Honglun Zhang
  • Junchi Yan
  • Hao He
  • Yaohui Jin

Knowledge graph embedding, which projects the symbolic relations and entities onto low-dimension continuous spaces, is essential to knowledge graph completion. Recently, translation-based embedding models (e. g. TransE) have aroused increasing attention for their simplicity and effectiveness. These models attempt to translate semantics from head entities to tail entities with the relations and infer richer facts outside the knowledge graph. In this paper, we propose a novel knowledge graph embedding method named TransMS, which translates and transmits multidirectional semantics: i) the semantics of head/tail entities and relations to tail/head entities with nonlinear functions and ii) the semantics from entities to relations with linear bias vectors. Our model has merely one additional parameter α than TransE for each triplet, which results in its better scalability in large-scale knowledge graph. Experiments show that TransMS achieves substantial improvements against state-of-the-art baselines, especially the Hit@10s of head entity prediction for N-1 relations and tail entity prediction for 1-N relations improved by about 27. 1% and 24. 8% on FB15K database respectively.

IJCAI Conference 2018 Conference Paper

Collaborative and Attentive Learning for Personalized Image Aesthetic Assessment

  • Guolong Wang
  • Junchi Yan
  • Zheng Qin

The ever-increasing volume of visual images has stimulated the demand for organizing such data by aesthetic quality. Automatic and especially learning based aesthetic assessment methods have shown potential by recent works. Existing image aesthetic prediction is often user-agnostic which may ignore the fact that the rating to an image can be inherently individual. We fill this gap by formulating the personalized image aesthetic assessment problem with a novel learning method. Specifically, we collect user-image textual reviews in addition with visual images from the public dataset to organize a review-augmented benchmark. Using this enriched dataset, we devise a deep neural network with a user/image relation encoding input for collaborative filtering. Meanwhile an attentive mechanism is designed to capture the user-specific taste for image semantic tags and regions of interest by fusing the image and user's review. Extensive and promising experimental results on the review-augmented benchmark corroborate the efficacy of our approach.

NeurIPS Conference 2018 Conference Paper

Generalizing Graph Matching beyond Quadratic Assignment Model

  • Tianshu Yu
  • Junchi Yan
  • Yilin Wang
  • Wei Liu
  • Baoxin Li

Graph matching has received persistent attention over decades, which can be formulated as a quadratic assignment problem (QAP). We show that a large family of functions, which we define as Separable Functions, can approximate discrete graph matching in the continuous domain asymptotically by varying the approximation controlling parameters. We also study the properties of global optimality and devise convex/concave-preserving extensions to the widely used Lawler's QAP form. Our theoretical findings show the potential for deriving new algorithms and techniques for graph matching. We deliver solvers based on two specific instances of Separable Functions, and the state-of-the-art performance of our method is verified on popular benchmarks.

IJCAI Conference 2018 Conference Paper

Improving Maximum Likelihood Estimation of Temporal Point Process via Discriminative and Adversarial Learning

  • Junchi Yan
  • Xin Liu
  • Liangliang Shi
  • Changsheng Li
  • Hongyuan Zha

Point process is an expressive tool in learning temporal event sequence which is ubiquitous in real-world applications. Traditional predictive models are based on maximum likelihood estimation (MLE). This paper aims to improve MLE by discriminative and adversarial learning. The initial model is learned by MLE explaining the joint distribution of the occurred event history. Then it is refined by devising a gradient based learning procedure with two complementary recipes: i) mean square error (MSE) that directly reflects the prediction accuracy of the model; ii) adversarial classification loss which induces the Wasserstein distance loss. The hope is that the adversarial loss can add sharpness to the smooth effect inherently caused by the MSE loss. The method is generic and compatible with different differentiable parametric forms of the intensity function. Empirical results via a variant of the Hawkes processes demonstrate its effectiveness of our method.

AAAI Conference 2018 Conference Paper

Learning Conditional Generative Models for Temporal Point Processes

  • Shuai Xiao
  • Hongteng Xu
  • Junchi Yan
  • Mehrdad Farajtabar
  • Xiaokang Yang
  • Le Song
  • Hongyuan Zha

Estimating the future event sequence conditioned on current observations is a long-standing and challenging task in temporal analysis. On one hand for many real-world problems the underlying dynamics can be very complex and often unknown. This renders the traditional parametric point process models often fail to fit the data for their limited capacity. On the other hand, long-term prediction suffers from the problem of bias exposure where the error accumulates and propagates to future prediction. Our new model builds upon the sequence to sequence (seq2seq) prediction network. Compared with parametric point process models, its modeling capacity is higher and has better flexibility for fitting real-world data. The main novelty of the paper is to mitigate the second challenge by introducing the likelihood-free loss based on Wasserstein distance between point processes, besides negative maximum likelihood loss used in the traditional seq2seq model. Wasserstein distance, unlike KL divergence i. e. MLE loss, is sensitive to the underlying geometry between samples and can robustly enforce close geometry structure between them. This technique is proven able to improve the vanilla seq2seq model by a notable margin on various tasks.

IJCAI Conference 2018 Conference Paper

Learning Sequential Correlation for User Generated Textual Content Popularity Prediction

  • Wen Wang
  • Wei Zhang
  • Jun Wang
  • Junchi Yan
  • Hongyuan Zha

Popularity prediction of user generated textual content is critical for prioritizing information in the web, which alleviates heavy information overload for ordinary readers. Most previous studies model each content instance separately for prediction and thus overlook the sequential correlations between instances of a specific user. In this paper, we go deeper into this problem based on the two observations for each user, i. e. , sequential content correlation and sequential popularity correlation. We propose a novel deep sequential model called User Memory-augmented recurrent Attention Network (UMAN). This model encodes the two correlations by updating external user memories which is further leveraged for target text representation learning and popularity prediction. The experimental results on several real-world datasets validate the benefits of considering these correlations and demonstrate UMAN achieves best performance among several strong competitors.

AAAI Conference 2018 Conference Paper

Tau-FPL: Tolerance-Constrained Learning in Linear Time

  • Ao Zhang
  • Nan Li
  • Jian Pu
  • Jun Wang
  • Junchi Yan
  • Hongyuan Zha

In many real-world applications, learning a classifier with false-positive rate under a specified tolerance is appealing. Existing approaches either introduce prior knowledge dependent label cost or tune parameters based on traditional classi- fiers, which are of limitation in methodology since they do not directly incorporate the false-positive rate tolerance. In this paper, we propose a novel scoring-thresholding approach, τ- False Positive Learning (τ-FPL) to address this problem. We show that the scoring problem which takes the false-positive rate tolerance into accounts can be efficiently solved in linear time, also an out-of-bootstrap thresholding method can transform the learned ranking function into a low false-positive classifier. Both theoretical analysis and experimental results show superior performance of the proposed τ-FPL over the existing approaches.

IJCAI Conference 2018 Conference Paper

Weakly Supervised Audio Source Separation via Spectrum Energy Preserved Wasserstein Learning

  • Ning Zhang
  • Junchi Yan
  • Yuchen Zhou

Separating audio mixtures into individual instrument tracks has been a standing challenge. We introduce a novel weakly supervised audio source separation approach based on deep adversarial learning. Specifically, our loss function adopts the Wasserstein distance which directly measures the distribution distance between the separated sources and the real sources for each individual source. Moreover, a global regularization term is added to fulfill the spectrum energy preservation property regardless separation. Unlike state-of-the-art weakly supervised models which often involve deliberately devised constraints or careful model selection, our approach need little prior model specification on the data, and can be straightforwardly learned in an end-to-end fashion. We show that the proposed method performs competitively on public benchmark against state-of-the-art weakly supervised methods.

AAAI Conference 2017 Conference Paper

GLOMA: Embedding Global Information in Local Matrix Approximation Models for Collaborative Filtering

  • Chao Chen
  • Dongsheng Li
  • Qin Lv
  • Junchi Yan
  • Li Shang
  • Stephen Chu

Recommender systems have achieved great success in recent years, and matrix approximation (MA) is one of the most popular techniques for collaborative filtering (CF) based recommendation. However, a major issue is that MA methods perform poorly at detecting strong localized associations among closely related users and items. Recently, some MA-based CF methods adopt clustering methods to discover meaningful user-item subgroups and perform ensemble on different clusterings to improve the recommendation accuracy. However, ensemble learning suffers from lower efficiency due to the increased overall computation overhead. In this paper, we propose GLOMA, a new clustering-based matrix approximation method, which can embed global information in local matrix approximation models to improve recommendation accuracy. In GLOMA, a MA model is first trained on the entire data to capture global information. The global MA model is then utilized to guide the training of cluster-based local MA models, such that the local models can detect strong localized associations shared within clusters and at the same time preserve global associations shared among all users/items. Evaluation results using MovieLens and Netflix datasets demonstrate that, by integrating global information in local models, GLOMA can outperform five state-of-the-art MA-based CF methods in recommendation accuracy while achieving descent efficiency.

AAAI Conference 2017 Conference Paper

Modeling the Intensity Function of Point Process Via Recurrent Neural Networks

  • Shuai Xiao
  • Junchi Yan
  • Xiaokang Yang
  • Hongyuan Zha
  • Stephen Chu

Event sequence, asynchronously generated with random timestamp, is ubiquitous among applications. The precise and arbitrary timestamp can carry important clues about the underlying dynamics, and has lent the event data fundamentally different from the time-series whereby series is indexed with fixed and equal time interval. One expressive mathematical tool for modeling event is point process. The intensity functions of many point processes involve two components: the background and the effect by the history. Due to its inherent spontaneousness, the background can be treated as a time series while the other need to handle the history events. In this paper, we model the background by a Recurrent Neural Network (RNN) with its units aligned with time series indexes while the history effect is modeled by another RNN whose units are aligned with asynchronous events to capture the long-range dynamics. The whole model with event type and timestamp prediction output layers can be trained end-to-end. Our approach takes an RNN perspective to point process, and models its background and history effect. For utility, our method allows a black-box treatment for modeling the intensity which is often a pre-defined parametric form in point processes. Meanwhile end-to-end training opens the venue for reusing existing rich techniques in deep network for point process modeling. We apply our model to the predictive maintenance problem using a log dataset by more than 1000 ATMs from a global bank headquartered in North America.

AAAI Conference 2017 Conference Paper

On Predictive Patent Valuation: Forecasting Patent Citations and Their Types

  • Xin Liu
  • Junchi Yan
  • Shuai Xiao
  • Xiangfeng Wang
  • Hongyuan Zha
  • Stephen Chu

Patents are widely regarded as a proxy for inventive output which is valuable and can be commercialized by various means. Individual patent information such as technology field, classification, claims, application jurisdictions are increasingly available as released by different venues. This work has relied on a long-standing hypothesis that the citation received by a patent is a proxy for knowledge flows or impacts of the patent thus is directly related to patent value. This paper does not fall into the line of intensive existing work that test or apply this hypothesis, rather we aim to address the limitation of using so-far received citations for patent valuation. By devising a point process based patent citation type aware (self-citation and non-self-citation) prediction model which incorporates the various information of a patent, we open up the possibility for performing predictive patent valuation which can be especially useful for newly granted patents with emerging technology. Study on real-world data corroborates the efficacy of our approach. Our initiative may also have policy implications for technology markets, patent systems and all other stakeholders. The code and curated data will be available to the research community.

AAAI Conference 2017 Conference Paper

Self-Paced Multi-Task Learning

  • Changsheng Li
  • Junchi Yan
  • Fan Wei
  • Weishan Dong
  • Qingshan Liu
  • Hongyuan Zha

Multi-task learning is a paradigm, where multiple tasks are jointly learnt. Previous multi-task learning models usually treat all tasks and instances per task equally during learning. Inspired by the fact that humans often learn from easy concepts to hard ones in the cognitive process, in this paper, we propose a novel multi-task learning framework that attempts to learn the tasks by simultaneously taking into consideration the complexities of both tasks and instances per task. We propose a novel formulation by presenting a new task-oriented regularizer that can jointly prioritize tasks and instances. Thus it can be interpreted as a self-paced learner for multi-task learning. An efficient block coordinate descent algorithm is developed to solve the proposed objective function, and the convergence of the algorithm can be guaranteed. Experimental results on the toy and real-world datasets demonstrate the effectiveness of the proposed approach, compared to the state-of-the-arts.

AAAI Conference 2017 Conference Paper

Unsupervised Deep Learning for Optical Flow Estimation

  • Zhe Ren
  • Junchi Yan
  • Bingbing Ni
  • Bin Liu
  • Xiaokang Yang
  • Hongyuan Zha

Recent work has shown that optical flow estimation can be formulated as a supervised learning problem. Moreover, convolutional networks have been successfully applied to this task. However, supervised flow learning is obfuscated by the shortage of labeled training data. As a consequence, existing methods have to turn to large synthetic datasets for easily computer generated ground truth. In this work, we explore if a deep network for flow estimation can be trained without supervision. Using image warping by the estimated flow, we devise a simple yet effective unsupervised method for learning optical flow, by directly minimizing photometric consistency. We demonstrate that a flow network can be trained from endto-end using our unsupervised scheme. In some cases, our results come tantalizingly close to the performance of methods trained with full supervision.

NeurIPS Conference 2017 Conference Paper

Wasserstein Learning of Deep Generative Point Process Models

  • Shuai Xiao
  • Mehrdad Farajtabar
  • Xiaojing Ye
  • Junchi Yan
  • Le Song
  • Hongyuan Zha

Point processes are becoming very popular in modeling asynchronous sequential data due to their sound mathematical foundation and strength in modeling a variety of real-world phenomena. Currently, they are often characterized via intensity function which limits model's expressiveness due to unrealistic assumptions on its parametric form used in practice. Furthermore, they are learned via maximum likelihood approach which is prone to failure in multi-modal distributions of sequences. In this paper, we propose an intensity-free approach for point processes modeling that transforms nuisance processes to a target one. Furthermore, we train the model using a likelihood-free leveraging Wasserstein distance between point processes. Experiments on various synthetic and real-world data substantiate the superiority of the proposed point process model over conventional ones.

ICML Conference 2016 Conference Paper

Low-Rank Matrix Approximation with Stability

  • Dongsheng Li 0002
  • Chao Chen 0016
  • Qin Lv
  • Junchi Yan
  • Li Shang 0001
  • Stephen M. Chu

Low-rank matrix approximation has been widely adopted in machine learning applications with sparse data, such as recommender systems. However, the sparsity of the data, incomplete and noisy, introduces challenges to the algorithm stability – small changes in the training data may significantly change the models. As a result, existing low-rank matrix approximation solutions yield low generalization performance, exhibiting high error variance on the training dataset, and minimizing the training error may not guarantee error reduction on the testing dataset. In this paper, we investigate the algorithm stability problem of low-rank matrix approximations. We present a new algorithm design framework, which (1) introduces new optimization objectives to guide stable matrix approximation algorithm design, and (2) solves the optimization problem to obtain stable low-rank approximation solutions with good generalization performance. Experimental results on real-world datasets demonstrate that the proposed work can achieve better prediction accuracy compared with both state-of-the-art low-rank matrix approximation methods and ensemble methods in recommendation task.

IJCAI Conference 2016 Conference Paper

Modeling Contagious Merger and Acquisition via Point Processes with a Profile Regression Prior

  • Junchi Yan
  • Shuai Xiao
  • Changsheng Li
  • Bo Jin
  • Xiangfeng Wang
  • Bin Ke
  • Xiaokang Yang
  • Hongyuan Zha

Merger and Acquisition (M&A) has been a critical practice about corporate restructuring. Previous studies are mostly devoted to evaluating the suitability of M&A between a pair of investor and target company, or a target company for its propensity of being acquired. This paper focuses on the dual problem of predicting an investor's prospective M&A based on its activities and firmographics. We propose to use a mutually-exciting point process with a regression prior to quantify the investor's M&A behavior. Our model is motivated by the so-called contagious 'wave-like' M&A phenomenon, which has been well-recognized by the economics and management communities. A tailored model learning algorithm is devised that incorporates both static profile covariates and past M&A activities. Results on CrunchBase suggest the superiority of our model. The collected dataset and code will be released together with the paper.

IJCAI Conference 2016 Conference Paper

On Modeling and Predicting Individual Paper Citation Count over Time

  • Shuai Xiao
  • Junchi Yan
  • Changsheng Li
  • Bo Jin
  • Xiangfeng Wang
  • Xiaokang Yang
  • Stephen M. Chu
  • Hongyuan Zha

Evaluating a scientist's past and future potential impact is key in decision making concerning with recruitment and funding, and is increasingly linked to publication citation count. Meanwhile, timely identifying those valuable work with great potential before they receive wide recognition and become highly cited Abstracts is both useful for readers and authors in many regards. We propose a method for predicting the citation counts of individual publications, over an arbitrary time period. Our approach explores paper-specific covariates, and a point process model to account for the aging effect and triggering role of recent citations, through which Abstracts lose and gain their popularity, respectively. Empirical results on the Microsoft Academic Graph data suggests that our model can be useful for both prediction and interpretability.

AAAI Conference 2015 Conference Paper

Multi-View Point Registration via Alternating Optimization

  • Junchi Yan
  • Jun Wang
  • Hongyuan Zha
  • Xiaokang Yang
  • Stephen Chu

Multi-view point registration is a relatively less studied problem compared with two-view point registration. Directly applying pairwise registration often leads to matching discrepancy as the mapping between two point sets can be determined either by direct correspondences or by any intermediate point set. Also, the local two-view registration tends to be sensitive to noises. We propose a novel multi-view registration method, where the optimal registration is achieved via an efficient and effective alternating concave minimization process. We further extend our solution to a general case in practice of registration among point sets with different cardinalities. Extensive empirical evaluations of peer methods on both synthetic data and real images suggest our method is robust to large disturbance. In particular, it is shown that our method outperforms peer point matching methods and performs competitively against graph matching approaches. The latter approaches utilize the additional second-order information at the cost of exponentially increased run-time, thus usually being less efficient.

AAAI Conference 2015 Conference Paper

On Machine Learning towards Predictive Sales Pipeline Analytics

  • Junchi Yan
  • Chao Zhang
  • Hongyuan Zha
  • Min Gong
  • Changhua Sun
  • Jin Huang
  • Stephen Chu
  • Xiaokang Yang

Sales pipeline win-propensity prediction is fundamental to effective sales management. In contrast to using subjective human rating, we propose a modern machine learning paradigm to estimate the winpropensity of sales leads over time. A profile-specific two-dimensional Hawkes processes model is developed to capture the influence from seller’s activities on their leads to the win outcome, coupled with lead’s personalized profiles. It is motivated by two observations: i) sellers tend to frequently focus their selling activities and efforts on a few leads during a relatively short time. This is evidenced and reflected by their concentrated interactions with the pipeline, including login, browsing and updating the sales leads which are logged by the system; ii) the pending opportunity is prone to reach its win outcome shortly after such temporally concentrated interactions. Our model is deployed and in continual use to a large, global, B2B multinational technology enterprize (Fortune 500) with a case study. Due to the generality and flexibility of the model, it also enjoys the potential applicability to other real-world problems.

IJCAI Conference 2013 Conference Paper

Towards Effective Prioritizing Water Pipe Replacement and Rehabilitation

  • Junchi Yan
  • Yu Wang
  • Ke Zhou
  • Jin Huang
  • Chunhua Tian
  • Hongyuan Zha
  • Weishan Dong

Water pipe failures can not only have a great impact on people’s daily life but also cause significant waste of water which is an essential and precious resource to human beings. As a result, preventative maintenance for water pipes, particularly in urbanscale networks, is of great importance for a sustainable society. To achieve effective replacement and rehabilitation, failure prediction aims to proactively find those ‘most-likely-to-fail’ pipes becomes vital and has been attracting more attention from both academia and industry, especially from the civil engineering field. This paper presents an alreadydeployed industrial computational system for pipe failure prediction. As an alternative to risk matrix methods often depending on ad-hoc domain heuristics, learning based methods are adopted using the attributes with respect to physical, environmental, operational conditions and etc. Further challenge arises in practice when lacking of profile attributes. A dive into the failure records shows that the failure event sequences typically exhibit temporal clustering patterns, which motivates us to use the stochastic process to tackle the failure prediction task. Specifically, the failure sequence is formulated as a self-exciting stochastic process which is, to our best knowledge, a novel formulation for pipe failure prediction. And we show that it outperforms a baseline assuming the failure risk grows linearly with aging. Broad new problems and research points for the machine learning community are also introduced for future work.