Author name cluster

Haoran Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers

2 author rows

EAAI Journal 2026 Journal Article

Diffusion-based lossy geometry compression for three dimensional point clouds

Haibo Zhang
Haoran Sun
Qicheng Wang
Xiao Cai
Xiao Wu
Mingquan Zhou
Guohua Geng

Details DOI

JBHI Journal 2026 Journal Article

KidMesh: Computational Mesh Reconstruction for Pediatric Congenital Hydronephrosis Using Deep Neural Networks

Haoran Sun
Zhanpeng Zhu
Anguo Zhang
Bo Liu
Zhaohua Lin
Liqin Huang
Mingjing Yang
Lei Liu

Pediatric congenital hydronephrosis (CH) is a common urinary tract disorder, primarily caused by obstruction at the renal pelvis-ureter junction. Magnetic resonance urography (MRU) can visualize hydronephrosis, including renal pelvis and calyces, by utilizing the natural contrast provided by water. Existing voxel-based segmentation approaches can extract CH regions from MRU, facilitating disease diagnosis and prognosis. However, these segmentation methods predominantly focus on morphological features, such as size, shape, and structure. To enable functional assessments, such as urodynamic simulations, external complex post-processing steps are required to convert these results into mesh-level representations. To address this limitation, we propose an end-to-end method based on deep neural networks, namely KidMesh, which could automatically reconstruct CH meshes directly from MRU. Generally, KidMesh extracts feature maps from MRU images and converts them into feature vertices through grid sampling. It then deforms a template mesh according to these feature vertices to generate the specific CH meshes of MRU images. Meanwhile, we develop a novel schema to train KidMesh without relying on accurate mesh-level annotations, which are difficult to obtain due to the sparsely sampled MRU slices. Experimental results show that KidMesh reconstructs CH meshes in an average of 0. 4 seconds, and achieve comparable performance to conventional methods without requiring post-processing. The reconstructed meshes exhibited no self-intersections, with only 3. 7% and 0. 2% of the vertices having error distances exceeding 3. 2mm and 6. 4mm, respectively. After rasterization, these meshes achieved a Dice score of 0. 86 against manually delineated CH masks. Furthermore, these meshes could be used in renal urine flow simulations, providing valuable urodynamic information for clinical practice.

Details DOI

AAAI Conference 2026 Conference Paper

Physics-Informed Deformable Gaussian Splatting: Towards Unified Constitutive Laws for Time-Evolving Material Field

Haoqin Hong
Ding Fan
Fubin Dou
Zhi-Li Zhou
Haoran Sun
Congcong Zhu
Jingrun Chen

Recently, 3D Gaussian Splatting (3DGS), an explicit scene representation technique, has shown significant promise for dynamic novel-view synthesis from monocular video input. However, purely data-driven 3DGS often struggles to capture the diverse physics-driven motion patterns in dynamic scenes. To fill this gap, we propose Physics‑Informed Deformable Gaussian Splatting (PIDG), which treats each Gaussian particle as a Lagrangian material point with time-varying constitutive parameters and is supervised by 2D optical flow via motion projection. Specifically, we adopt static-dynamic decoupled 4D decomposed hash encoding to reconstruct geometry and motion efficiently. Subsequently, we impose the Cauchy momentum residual as a physics constraint, enabling independent prediction of each particle’s velocity and constitutive stress via a time-evolving material field. Finally, we further supervise data fitting by matching Lagrangian particle flow to camera-compensated optical flow, which accelerates convergence and improves generalization. Experiments on a custom physics-driven dataset as well as on standard synthetic and real-world datasets demonstrate significant gains in physical consistency and monocular dynamic reconstruction quality.

PDF Details DOI

AAAI Conference 2026 Conference Paper

SD-PSFNet: Sequential and Dynamic Point Spread Function Network for Image Deraining

Jiayu Wang
Haoyu Bian
Haoran Sun
Shaoning Zeng

Image deraining is crucial for vision applications but is challenged by the complex multi-scale physics of rain and its coupling with scenes. To address this challenge, a novel approach inspired by multi-stage image restoration is proposed, incorporating Point Spread Function (PSF) mechanisms to reveal the image degradation process while combining dynamic physical modeling with sequential feature fusion transfer, named SD-PSFNet. Specifically, SD-PSFNet employs a sequential restoration architecture with three cascaded stages, allowing multiple dynamic evaluations and refinements of the degradation process estimation. The network utilizes components with learned PSF mechanisms to dynamically simulate rain streak optics, enabling effective rain-background separation while progressively enhancing outputs through novel PSF components at each stage. Additionally, SD-PSFNet incorporates adaptive gated fusion for optimal cross-stage feature integration, enabling sequential refinement from coarse rain removal to fine detail restoration. Our model achieves state-of-the-art PSNR/SSIM metrics on Rain100H (33.12dB/0.9371), RealRain-1k-L (42.28dB/0.9872), and RealRain-1k-H (41.08dB/0.9838). In summary, SD-PSFNet demonstrates excellent capability in complex scenes and dense rainfall conditions, providing a new physics-aware approach to image deraining.

PDF Details DOI

ICLR Conference 2025 Conference Paper

Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

Zhaowei Zhang 0001
Fengshuo Bai
Qizhi Chen
Chengdong Ma
Mingzhi Wang
Haoran Sun
Zilong Zheng
Yaodong Yang 0001

How to align large language models (LLMs) with user preferences from a static general dataset has been frequently studied. However, user preferences are usually personalized, changing, and diverse. This leads to the problem that the actual user preferences often do not coincide with those trained by the model developers in the practical use of LLMs. Since we cannot collect enough data and retrain for every demand, researching efficient real-time preference adaptation methods based on the backbone LLMs during test time is important. To this end, we introduce **Amulet**, a novel, training-free framework that formulates the decoding process of every token as a separate online learning problem with the guidance of simple user-provided prompts, thus enabling real-time optimization to satisfy users' personalized preferences. To reduce the computational cost brought by this optimization process for each token, we additionally provide a closed-form solution for each iteration step of the optimization process, thereby reducing the computational time cost to a negligible level. The detailed experimental results demonstrate that Amulet can achieve significant performance improvements in rich settings with combinations of different LLMs, datasets, and user preferences, while maintaining acceptable computational efficiency.

Details

NeurIPS Conference 2025 Conference Paper

Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative Search

Haoran Sun
Yankai Jiang
Wenjie Lou
Yujie Zhang
Wenjie Li
Lilong Wang
Mianxin Liu
Lei Liu

Multimodal large language models (MLLMs) have begun to demonstrate robust reasoning capabilities on general tasks, yet their application in the medical domain remains in its early stages. Constructing chain-of-thought (CoT) training data is essential for bolstering the reasoning abilities of medical MLLMs. However, existing approaches exhibit a deficiency in offering a comprehensive framework for searching and evaluating effective reasoning paths towards critical diagnosis. To address this challenge, we propose Mentor-Intern Collaborative Search (MICS), a novel reasoning-path searching scheme to generate rigorous and effective medical CoT data. MICS first leverages mentor models to initialize the reasoning, one step at a time, then prompts each intern model to continue the thinking along those initiated paths, and finally selects the optimal reasoning path according to the overall reasoning performance of multiple intern models. The reasoning performance is determined by an MICS-Score, which assesses the quality of generated reasoning paths. Eventually, we construct MMRP, a multi-task medical reasoning dataset with ranked difficulty, and Chiron-o1, a new medical MLLM devised via a curriculum learning strategy, with robust visual question-answering and generalizable reasoning capabilities. Extensive experiments demonstrate that Chiron-o1, trained on our CoT dataset constructed using MICS, achieves state-of-the-art performance across a list of medical visual question answering and reasoning benchmarks. Codes are available at https: //github. com/Yankai96/Chiron-o1

PDF Details

ICML Conference 2025 Conference Paper

Core Knowledge Deficits in Multi-Modal Language Models

Yijiang Li
Qingying Gao
Tianwei Zhao
Bingyang Wang
Haoran Sun
Haiyun Lyu
Robert D. Hawkins
Nuno Vasconcelos

While Multi-modal Large Language Models (MLLMs) demonstrate impressive abilities over high-level perception and reasoning, their robustness in the wild remains limited, often falling short on tasks that are intuitive and effortless for humans. We examine the hypothesis that these deficiencies stem from the absence of core knowledge—rudimentary cognitive abilities innate to humans from early childhood. To explore the core knowledge representation in MLLMs, we introduce CoreCognition, a large-scale benchmark encompassing 12 core knowledge concepts grounded in developmental cognitive science. We evaluate 230 models with 11 different prompts, leading to a total of 2, 530 data points for analysis. Our experiments uncover four key findings, collectively demonstrating core knowledge deficits in MLLMs: they consistently underperform and show reduced, or even absent, scalability on low-level abilities relative to high-level ones. Finally, we propose Concept Hacking, a novel controlled evaluation method, that reveals MLLMs fail to progress toward genuine core knowledge understanding, but instead rely on shortcut learning as they scale. Project page at https: //williamium3000. github. io/core-knowledge/.

Details

IJCAI Conference 2025 Conference Paper

Game Theory Meets Large Language Models: A Systematic Survey

Haoran Sun
Yusen Wu
Yukun Cheng
Xu Chu

Game theory establishes a fundamental framework for analyzing strategic interactions among rational decision-makers. The rapid advancement of large language models (LLMs) has sparked extensive research exploring the intersection of these two fields. Specifically, game-theoretic methods are being applied to evaluate and enhance LLM capabilities, while LLMs themselves are reshaping classic game models. This paper presents a comprehensive survey of the intersection of these fields, exploring a bidirectional relationship from three perspectives: (1) Establishing standardized game-based benchmarks for evaluating LLM behavior; (2) Leveraging game-theoretic methods to improve LLM performance through algorithmic innovations; (3) Characterizing the societal impacts of LLMs through game modeling. Among these three aspects, we also highlight how the equilibrium analysis for traditional game models is impacted by LLMs' advanced language understanding, which in turn extends the study of game theory. Finally, we identify key challenges and future research directions, assessing their feasibility based on the current state of the field. By bridging theoretical rigor with emerging AI capabilities, this survey aims to foster interdisciplinary collaboration and drive progress in this evolving research area.

PDF Details DOI

ICLR Conference 2025 Conference Paper

MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions

Yekun Chai
Haoran Sun
Huang Fang
Shuohuan Wang
Yu Sun
Hua Wu 0003

Reinforcement learning from human feedback (RLHF) has demonstrated effectiveness in aligning large language models (LLMs) with human preferences. However, token-level RLHF suffers from the credit assignment problem over long sequences, where delayed rewards make it challenging for the model to discern which actions contributed to preferred outcomes. This hinders learning efficiency and slows convergence.In this paper, we propose MA-RLHF, a simple yet effective RLHF framework that incorporates macro actions --- sequences of tokens or higher-level language constructs --- into the learning process. By operating at higher level of abstraction, our approach reduces the temporal distance between actions and rewards, facilitating faster and more accurate credit assignment. This results in more stable policy gradient estimates and enhances learning efficiency within each episode, all without increasing computational complexity during training or inference. We validate our approach through extensive experiments across various model sizes and tasks, including text summarization, dialogue generation, question answering, and program synthesis. Our method achieves substantial performance improvements over standard RLHF, with performance gains of up to 30\% in text summarization and code generation, 18\% in dialogue, and 8\% in question answering tasks. Notably, our approach reaches parity with vanilla RLHF $1.7 \sim 2$ times faster in terms of training time and continues to outperform it with further training. We make our code and data publicly available at \url{https://github.com/ernie-research/MA-RLHF}.

Details

NeurIPS Conference 2025 Conference Paper

Mechanism Design for LLM Fine-tuning with Multiple Reward Models

Haoran Sun
Yurong Chen
Siwei Wang
Chu Xu
Wei Chen
Xiaotie Deng

Fine-tuning large language models (LLMs) to aggregate multiple preferences has attracted considerable research attention. With aggregation algorithms advancing, a potential economic scenario arises where fine-tuning services are provided to agents with different preferences. In this context, agents may benefit from strategically misreporting their preferences, but this could harm the aggregation performance. This paper addresses such incentive issues by framing it as a mechanism design problem: an LLM provider determines the fine-tuning objective (training rule) and the pricing scheme (payment rule) for agents. We primarily focus on training rules that maximize social welfare subject to certain regularizations, referred to as SW-Max rules. First, we show that under most circumstances, truthful reporting is sub-optimal with simply a SW-Max rule, thereby highlighting the necessity of payments. Second, we extend the VCG payment to implement SW-Max rules in dominant-strategy incentive compatibility (DSIC). We characterize sufficient conditions for payment equivalence and derive the necessary conditions for a payment rule to implement a SW-Max rule in DSIC and other principles. Third, we demonstrate that our mechanism is approximately DSIC with perturbed input, showcasing its robustness against the inevitable errors in real-world applications. Experiments on real LLM training results further confirm the practical implications of our results.

PDF Details

AIIM Journal 2025 Journal Article

Medical multimodal foundation models in clinical diagnosis and treatment: Applications, challenges, and future directions

Kai Sun
Siyan Xue
Fuchun Sun
Haoran Sun
Yu Luo
Ling Wang
Siyuan Wang
Na Guo

Details DOI

NeurIPS Conference 2025 Conference Paper

Path-Enhanced Contrastive Learning for Recommendation

Haoran Sun
Fei Xiong
Yuanzhe Hu
Liang Wang

Collaborative filtering (CF) methods are now facing the challenge of data sparsity in recommender systems. In order to reduce the effect of data sparsity, researchers proposed contrastive learning methods to extract self-supervised signals from raw data. Contrastive learning methods address this problem by graph augmentation and maximizing the consistency of node representations between different augmented graphs. However, these methods tends to unintentionally distance the target node from its path nodes on the interaction path, thus limiting its effectiveness. In this regard, we propose a solution that uses paths as samples in the contrastive loss function. In order to obtain the path samples, we design a path sampling method. In addition to the contrast of the relationship between the target node and the nodes within the path (intra-path contrast), we also designed a method of contrasting the relationship between the paths (inter-path contrast) to better pull the target node and its path nodes closer to each other. We use Simplifying and Powering Graph Convolution Network (LightGCN) as the basis and combine with a new path-enhanced graph approach proposed for graph augmentation. It effectively improves the performance of recommendation models. Our proposed Path Enhanced Contrastive Loss (PECL) model replaces the common contrastive loss function with our novel loss function, showing significant performance improvement. Experiments on three real-world datasets demonstrate the effectiveness of our model.

PDF Details

EAAI Journal 2025 Journal Article

Three-way dynamic clustering algorithms based on generalized neighborhood relations in incomplete hybrid information systems with applications in medical decision-making

Haoran Sun
Bingzhen Sun
Xixuan Zhao
Qiang Bao
Xiaoli Chu

Details DOI

NeurIPS Conference 2024 Conference Paper

ALPINE: Unveiling The Planning Capability of Autoregressive Learning in Language Models

Siwei Wang
Yifei Shen
Shi Feng
Haoran Sun
Shang-Hua Teng
Wei Chen

Planning is a crucial element of both human intelligence and contemporary large language models (LLMs). In this paper, we initiate a theoretical investigation into the emergence of planning capabilities in Transformer-based LLMs via their next-word prediction mechanisms. We model planning as a network path-finding task, where the objective is to generate a valid path from a specified source node to a designated target node. Our mathematical characterization shows that Transformer architectures can execute path-finding by embedding the adjacency and reachability matrices within their weights. Furthermore, our theoretical analysis of gradient-based learning dynamics reveals that LLMs can learn both the adjacency and a limited form of the reachability matrices. These theoretical insights are then validated through experiments, which demonstrate that Transformer architectures indeed learn the adjacency and an incomplete reachability matrices, consistent with our theoretical predictions. When applying our methodology to the real-world planning benchmark Blocksworld, our observations remain consistent. Additionally, our analyses uncover a fundamental limitation of current Transformer architectures in path-finding: these architectures cannot identify reachability relationships through transitivity, which leads to failures in generating paths when concatenation is required. These findings provide new insights into how the internal mechanisms of autoregressive learning facilitate intelligent planning and deepen our understanding of how future LLMs might achieve more advanced and general planning-and-reasoning capabilities across diverse applications.

PDF Details DOI

IJCAI Conference 2024 Conference Paper

Graph Attention Network with High-Order Neighbor Information Propagation for Social Recommendation

Fei Xiong
Haoran Sun
Guixun Luo
Shirui Pan
Meikang Qiu
Liang Wang

In recommender systems, graph neural networks (GNN) can integrate interactions between users and items with their attributes, which makes GNN-based methods more powerful. However, directly stacking multiple layers in a graph neural network can easily lead to over-smoothing, hence recommendation systems based on graph neural networks typically underutilize higher-order neighborhoods in their learning. Although some heterogeneous graph random walk methods based on meta-paths can achieve higher-order aggregation, the focus is predominantly on the nodes at the ends of the paths. Moreover, these methods require manually defined meta-paths, which limits the model’s expressiveness and flexibility. Furthermore, path encoding in graph neural networks usually focuses only on the sequence leading to the target node. However, real-world interactions often do not follow this strict sequence, limiting the predictive performance of sequence-based network models. These problems prevent GNN-based methods from being fully effective. We propose a Graph Attention network with Information Propagation path aggregation for Social Recommendation (GAIPSRec). Firstly, we propose a universal heterogeneous graph sampling framework that does not require manually defining meta-paths for path sampling, thereby offering greater flexibility. Moreover, our method takes into account all nodes on the aggregation path and is capable of learning information from higher-order neighbors without succumbing to over-smoothing. Finally, our method utilizes a gate mechanism to fuse sequential and non-sequential dependence in encoding path instances, allowing a more holistic view of the data. Extensive experiments on real-world datasets show that our proposed GAIPSRec improves the performance significantly and outperforms state-of-the-art methods.

PDF Details DOI

JBHI Journal 2024 Journal Article

Wearable Surface Deformation Myography (sDMG) System for Recognition of Locomotion Modes

Haoran Sun
Xiangyu Peng
Junlang Wang
Jie Liu
Tingting Fu
Chaoming He

This study designs a wearable sensing system for locomotion mode recognition using lower-limb skin surface curvature deformation caused by the morphological changes of musculotendinous complexes and soft tissues. Flexible bending sensors are embedded into stretch pants, enabling curvature deformations of specific skin segments above lower-limb muscle groups to be captured in a noncontact manner. To evaluate the performance of this system, we conducted experiments on eight able-bodied subjects completing seven common locomotive activities, including walking, running, ramp ascending/descending, stair ascending/descending, and standing. The system measured seven channels of deformation signals from two cross-sections on the shank and the thigh. The collected signals were distinguishable across different locomotion modes and exhibited consistency when monitoring steps. Using selected time-domain features and a linear discriminant analysis (LDA) classifier enabled the proposed system to continuously recognize locomotion modes with an average accuracy of 96. 5%. Furthermore, the system maintains recognition performance with 95. 7% accuracy even after removing and reapplying the sensors. Finally, we conducted comparison experiments to analyze how window length, feature selection, and the number of channels affect recognition performance, providing insights for optimization. We believe that this novel signal platform holds great potential as a valuable supplementary tool in wearable human motion detection, enriching the information diversity for motion analysis, and enabling new possibilities for further advancements and applications in fields including biomedical engineering, textiles, and computer graphics.

Details DOI

NeurIPS Conference 2023 Conference Paper

A Scalable Neural Network for DSIC Affine Maximizer Auction Design

Zhijian Duan
Haoran Sun
Yurong Chen
Xiaotie Deng

Automated auction design aims to find empirically high-revenue mechanisms through machine learning. Existing works on multi item auction scenarios can be roughly divided into RegretNet-like and affine maximizer auctions (AMAs) approaches. However, the former cannot strictly ensure dominant strategy incentive compatibility (DSIC), while the latter faces scalability issue due to the large number of allocation candidates. To address these limitations, we propose AMenuNet, a scalable neural network that constructs the AMA parameters (even including the allocation menu) from bidder and item representations. AMenuNet is always DSIC and individually rational (IR) due to the properties of AMAs, and it enhances scalability by generating candidate allocations through a neural network. Additionally, AMenuNet is permutation equivariant, and its number of parameters is independent of auction scale. We conduct extensive experiments to demonstrate that AMenuNet outperforms strong baselines in both contextual and non-contextual multi-item auctions, scales well to larger auctions, generalizes well to different settings, and identifies useful deterministic allocations. Overall, our proposed approach offers an effective solution to automated DSIC auction design, with improved scalability and strong revenue performance in various settings.

PDF Details

ICLR Conference 2023 Conference Paper

Any-scale Balanced Samplers for Discrete Space

Haoran Sun
Bo Dai 0001
Charles Sutton
Dale Schuurmans
Hanjun Dai

The locally balanced informed proposal has proved to be highly effective for sampling from discrete spaces. However, its success relies on the "local'' factor, which ensures that whenever the proposal distribution is restricted to be near the current state, the locally balanced weight functions are asymptotically optimal and the gradient approximations are accurate. In seeking a more efficient sampling algorithm, many recent works have considered increasing the scale of the proposal distributions, but this causes the "local'' factor to no longer hold. Instead, we propose any-scale balanced samplers to repair the gap in non-local proposals. In particular, we substitute the locally balanced function with an any-scale balanced function that can self-adjust to achieve better efficiency for proposal distributions at any scale. We also use quadratic approximations to capture curvature of the target distribution and reduce the error in the gradient approximation, while employing a Gaussian integral trick with a special estimated diagonal to efficiently sample from the quadratic proposal distribution. On various synthetic and real distributions, the proposed sampler substantially outperforms existing approaches.

Details

ICML Conference 2023 Conference Paper

Coordinated Dynamic Bidding in Repeated Second-Price Auctions with Budgets

Yurong Chen 0002
Qian Wang 0025
Zhijian Duan 0001
Haoran Sun
Zhaohua Chen 0001
Xiang Yan
Xiaotie Deng

In online ad markets, a rising number of advertisers are employing bidding agencies to participate in ad auctions. These agencies are specialized in designing online algorithms and bidding on behalf of their clients. Typically, an agency usually has information on multiple advertisers, so she can potentially coordinate bids to help her clients achieve higher utilities than those under independent bidding. In this paper, we study coordinated online bidding algorithms in repeated second-price auctions with budgets. We propose algorithms that guarantee every client a higher utility than the best she can get under independent bidding. We show that these algorithms achieve maximal social welfare and discuss bidders’ incentives to misreport their budgets, in symmetric cases. Our proofs combine the techniques of online learning and equilibrium analysis, overcoming the difficulty of competing with a multi-dimensional benchmark. The performance of our algorithms is further evaluated by experiments on both synthetic and real data. To the best of our knowledge, we are the first to consider bidder coordination in online repeated auctions with constraints.

Details

NeurIPS Conference 2023 Conference Paper

DISCS: A Benchmark for Discrete Sampling

Katayoon Goshvadi
Haoran Sun
Xingchao Liu
Azade Nova
Ruqi Zhang
Will Grathwohl
Dale Schuurmans
Hanjun Dai

Sampling in discrete spaces, with critical applications in simulation and optimization, has recently been boosted by significant advances in gradient-based approaches that exploit modern accelerators like GPUs. However, two key challenges are hindering further advancement in research on discrete sampling. First, since there is no consensus on experimental settings and evaluation setups, the empirical results in different research papers are often not comparable. Second, implementing samplers and target distributions often requires a nontrivial amount of effort in terms of calibration and parallelism. To tackle these challenges, we propose DISCS (DISCrete Sampling), a tailored package and benchmark that supports unified and efficient experiment implementation and evaluations for discrete sampling in three types of tasks: sampling from classical graphical models and energy based generative models, and sampling for solving combinatorial optimization. Throughout the comprehensive evaluations in DISCS, we gained new insights into scalability, design principles for proposal distributions, and lessons for adaptive sampling design. DISCS efficiently implements representative discrete samplers in existing research works as baselines and offers a simple interface that researchers can conveniently add new discrete samplers and directly compare their performance with the benchmark result in a calibrated setup.

PDF Details

ICML Conference 2023 Conference Paper

Revisiting Sampling for Combinatorial Optimization

Haoran Sun
Katayoon Goshvadi
Azade Nova
Dale Schuurmans
Hanjun Dai

Sampling approaches like Markov chain Monte Carlo were once popular for combinatorial optimization, but the inefficiency of classical methods and the need for problem-specific designs curtailed ongoing development. Recent work has favored data-driven approaches that mitigate the need for hand-craft heuristics, but these are often not usable as out-of-the-box solvers due to dependence on in-distribution training and limited scalability to large instances. In this paper, we revisit the idea of using sampling for combinatorial optimization, motivated by the significant recent advances of gradient-based discrete MCMC and new techniques for parallel neighborhood exploration on accelerators. Remarkably, we find that modern sampling strategies can leverage landscape information to provide general-purpose solvers that require no training and yet are competitive with state of the art combinatorial solvers. In particular, experiments on cover vertex selection, graph partition and routing demonstrate better speed-quality trade-offs over current learning based approaches, and sometimes even superior performance to commercial solvers and specialized algorithms.

Details

ICLR Conference 2023 Conference Paper

Score-based Continuous-time Discrete Diffusion Models

Haoran Sun
Lijun Yu
Bo Dai 0001
Dale Schuurmans
Hanjun Dai

Score-based modeling through stochastic differential equations (SDEs) has provided a new perspective on diffusion models, and demonstrated superior performance on continuous data. However, the gradient of the log-likelihood function, \ie, the score function, is not properly defined for discrete spaces. This makes it non-trivial to adapt SDE with score functions to categorical data. In this paper, we extend diffusion models to discrete variables by introducing a stochastic jump process where the reverse process denoises via a continuous-time Markov chain. This formulation admits an analytical simulation during backward sampling. To learn the reverse process, we extend score matching to general categorical data, and show that an unbiased estimator can be obtained via simple matching of the conditional marginal distributions. We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.

Details

NeurIPS Conference 2022 Conference Paper

Optimal Scaling for Locally Balanced Proposals in Discrete Spaces

Haoran Sun
Hanjun Dai
Dale Schuurmans

Optimal scaling has been well studied for Metropolis-Hastings (M-H) algorithms in continuous spaces, but a similar understanding has been lacking in discrete spaces. Recently, a family of locally balanced proposals (LBP) for discrete spaces has been proved to be asymptotically optimal, but the question of optimal scaling has remained open. In this paper, we establish, for the first time, that the efficiency of M-H in discrete spaces can also be characterized by an asymptotic acceptance rate that is independent of the target distribution. Moreover, we verify, both theoretically and empirically, that the optimal acceptance rates for LBP and random walk Metropolis (RWM) are $0. 574$ and $0. 234$ respectively. These results also help establish that LBP is asymptotically $O(N^\frac{2}{3})$ more efficient than RWM with respect to model dimension $N$. Knowledge of the optimal acceptance rate allows one to automatically tune the neighborhood size of a proposal distribution in a discrete space, directly analogous to step-size control in continuous spaces. We demonstrate empirically that such adaptive M-H sampling can robustly improve sampling in a variety of target distributions in discrete spaces, including training deep energy based models.

PDF Details

ICLR Conference 2022 Conference Paper

Path Auxiliary Proposal for MCMC in Discrete Space

Haoran Sun
Hanjun Dai
Wei Xia
Arun Ramamurthy

Energy-based Model (EBM) offers a powerful approach for modeling discrete structure, but both inference and learning of EBM are hard as it involves sampling from discrete distributions. Recent work shows Markov Chain Monte Carlo (MCMC) with the informed proposal is a powerful tool for such sampling. However, an informed proposal only allows local updates as it requires evaluating all energy changes in the neighborhood. In this work, we present a path auxiliary algorithm that uses a composition of local moves to efficiently explore large neighborhoods. We also give a fast version of our algorithm that only queries the evaluation of energy function twice for each proposal via linearization of the energy function. Empirically, we show that our path auxiliary algorithms considerably outperform other generic samplers on various discrete models for sampling, inference, and learning. Our method can also be used to train deep EBMs for high-dimensional discrete data.

Details

ICLR Conference 2022 Conference Paper

Provable Learning-based Algorithm For Sparse Recovery

Xinshi Chen
Haoran Sun
Le Song

Recovering sparse parameters from observational data is a fundamental problem in machine learning with wide applications. Many classic algorithms can solve this problem with theoretical guarantees, but their performances rely on choosing the correct hyperparameters. Besides, hand-designed algorithms do not fully exploit the particular problem distribution of interest. In this work, we propose a deep learning method for algorithm learning called PLISA (Provable Learning-based Iterative Sparse recovery Algorithm). PLISA is designed by unrolling a classic path-following algorithm for sparse recovery, with some components being more flexible and learnable. We theoretically show the improved recovery accuracy achievable by PLISA. Furthermore, we analyze the empirical Rademacher complexity of PLISA to characterize its generalization ability to solve new problems outside the training set. This paper contains novel theoretical contributions to the area of learning-based algorithms in the sense that (i) PLISA is generically applicable to a broad class of sparse estimation problems, (ii) generalization analysis has received less attention so far, and (iii) our analysis makes novel connections between the generalization ability and algorithmic properties such as stability and convergence of the unrolled algorithm, which leads to a tighter bound that can explain the empirical observations. The techniques could potentially be applied to analyze other learning-based algorithms in the literature.

Details

NeurIPS Conference 2021 Conference Paper

Multi-task Learning of Order-Consistent Causal Graphs

Xinshi Chen
Haoran Sun
Caleb Ellington
Eric Xing
Le Song

We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose a $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimations. Moreover, the joint estimator is able to recover non-identifiable DAGs, by estimating them together with some identifiable DAGs. Lastly, our analysis also shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer is the same as the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.

PDF Details

NeurIPS Conference 2020 Conference Paper

Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms

Xiangyi Chen
Tiancong Chen
Haoran Sun
Steven Z. Wu
Mingyi Hong

Recently, there is a growing interest in the study of median-based algorithms for distributed non-convex optimization. Two prominent examples include signSGD with majority vote, an effective approach for communication reduction via 1-bit compression on the local gradients, and medianSGD, an algorithm recently proposed to ensure robustness against Byzantine workers. The convergence analyses for these algorithms critically rely on the assumption that all the distributed data are drawn iid from the same distribution. However, in applications such as Federated Learning, the data across different nodes or machines can be inherently heterogeneous, which violates such an iid assumption. This work analyzes signSGD and medianSGD in distributed settings with heterogeneous data. We show that these algorithms are non-convergent whenever there is some disparity between the expected median and mean over the local gradients. To overcome this gap, we provide a novel gradient correction mechanism that perturbs the local gradients with noise, which we show can provably close the gap between mean and median of the gradients. The proposed methods largely preserve nice properties of these median-based algorithms, such as the low per-iteration communication complexity of signSGD, and further enjoy global convergence to stationary solutions. Our perturbation technique can be of independent interest when one wishes to estimate mean through a median estimator.

PDF Details

ICML Conference 2020 Conference Paper

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: Joint Gradient Estimation and Tracking

Haoran Sun
Songtao Lu
Mingyi Hong 0001

Many modern large-scale machine learning problems benefit from decentralized and stochastic optimization. Recent works have shown that utilizing both decentralized computing and local stochastic gradient estimates can outperform state-of-the-art centralized algorithms, in applications involving highly non-convex problems, such as training deep neural networks. In this work, we propose a decentralized stochastic algorithm to deal with certain smooth non-convex problems where there are $m$ nodes in the system, and each node has a large number of samples (denoted as $n$). Differently from the majority of the existing decentralized learning algorithms for either stochastic or finite-sum problems, our focus is given to \emph{both} reducing the total communication rounds among the nodes, while accessing the minimum number of local data samples. In particular, we propose an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (which estimates the local gradient using a subset of local samples) \emph{and} gradient tracking (which tracks the global full gradient using local estimates). We show that to achieve certain $\epsilon$ stationary solution of the deterministic finite sum problem, the proposed algorithm achieves an $\mathcal{O}(mn^{1/2}\epsilon^{-1})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity. These bounds significantly improve upon the best existing bounds of $\mathcal{O}(mn\epsilon^{-1})$ and $\mathcal{O}(\epsilon^{-1})$, respectively. Similarly, for online problems, the proposed method achieves an $\mathcal{O}(m \epsilon^{-3/2})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity.

Details