Arrow Research search

Author name cluster

Xin Lin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers

19

YNIMG Journal 2026 Journal Article

Disentangling computational and neural mechanisms of evaluation direction in value-based decision-making

  • Chunchun Chen
  • Xin Lin
  • Yilin Yang
  • Jianping Huang

Most value-based decision-making research rests on the implicit assumption that individuals typically evaluate options based on subjective preferences, favoring high-value alternatives. However, in certain contexts, the focus of information evaluation during decision-making may shift toward low-value evidence rather than high-value evidence. These two evaluation directions may dynamically alternate within a single decision episode to support the final decision. Yet prior studies have rarely disentangled these evaluation modes, which has limited our understanding of the cognitive and neural dynamics underlying changes in evaluation direction. This study recruited 36 participants and employed a value-based binary choice paradigm manipulating evaluation direction via task framing. While decision outcomes did not significantly differ across conditions, distinct computational and neural signatures emerged. Specifically, behavioral indicators and a hierarchical drift diffusion model (HDDM) revealed that low-value-directed evaluations were associated with longer reaction times, slower evidence accumulation, and higher decision thresholds, indicating increased deliberation. Electroencephalography (EEG) results further showed enhanced N200 and centro-parietal positivity (CPP) amplitudes in the low-value condition, reflecting greater value conflict and diminished value integration efficiency; simultaneously, increased alpha and beta desynchronization suggested a heightened demand for attentional resources and stronger decision commitment. Together, these results demonstrate that value evaluation direction modulates the decision-making process across distinct temporal periods, from early value conflict to late-stage evidence accumulation and action preparation, revealing the underlying mechanisms of humans' flexible value encoding and providing a new methodological framework for analyzing multi-level value construction.
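The drift-diffusion account in this abstract can be illustrated with a toy simulation (parameter values are illustrative only, not the authors' fitted HDDM): slowing the drift rate and raising the decision threshold lengthens simulated decision times, mirroring the reported low-value-directed condition.

```python
import numpy as np

def simulate_ddm(drift, threshold, n_trials=200, dt=0.005, noise=1.0, seed=0):
    """Mean first-passage time of a simple drift-diffusion process.

    Evidence x accumulates from 0 at rate `drift` plus Gaussian noise until
    it crosses +threshold or -threshold; elapsed time is the decision time.
    """
    rng = np.random.default_rng(seed)
    rts = []
    for _ in range(n_trials):
        x, t = 0.0, 0.0
        while abs(x) < threshold:
            x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        rts.append(t)
    return float(np.mean(rts))

# High-value-directed: faster accumulation, lower threshold (illustrative values)
rt_high = simulate_ddm(drift=2.0, threshold=1.0)
# Low-value-directed: slower accumulation, higher threshold
rt_low = simulate_ddm(drift=1.0, threshold=1.5)
print(rt_low > rt_high)  # slower drift + higher threshold -> longer mean RT
```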

AAAI Conference 2026 Conference Paper

Enhancing Generalization of Depth Estimation Foundation Model via Weakly-Supervised Adaptation with Regularization

  • Yan Huang
  • Yongyi Su
  • Xin Lin
  • Le Zhang
  • Xun Xu

The emergence of foundation models has substantially advanced zero-shot generalization in monocular depth estimation (MDE), as exemplified by the Depth Anything series. However, given access to some data from downstream tasks, a natural question arises: can the performance of these models be further improved? To this end, we propose WeSTAR, a parameter-efficient framework that performs Weakly supervised Self-Training Adaptation with Regularization, designed to enhance the robustness of MDE foundation models in unseen and diverse domains. We first adopt a dense self-training objective as the primary source of structural self-supervision. To further improve robustness, we introduce semantically-aware hierarchical normalization, which exploits instance-level segmentation maps to perform more stable and multi-scale structural normalization. Beyond dense supervision, we introduce cost-efficient weak supervision in the form of pairwise ordinal depth annotations to further guide the adaptation process, which enforces informative ordinal constraints to mitigate local topological errors. Finally, a weight regularization loss is employed to anchor the LoRA updates, ensuring training stability and preserving the model's generalizable knowledge. Extensive experiments on both realistic and corrupted out-of-distribution datasets under diverse and challenging scenarios demonstrate that WeSTAR consistently improves generalization and achieves state-of-the-art performance across a wide range of benchmarks.
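The pairwise ordinal depth annotations described above can be sketched as a ranking loss (an illustrative logistic form; WeSTAR's exact formulation may differ): given predicted depths at two points and a label saying which should be farther, the loss penalizes predictions that violate the ordering.

```python
import numpy as np

def ordinal_depth_loss(d_i, d_j, y):
    """Pairwise ordinal ranking loss (illustrative, not WeSTAR's exact form).

    y = +1 means point i is annotated as farther than point j, y = -1 the
    reverse. A logistic loss on the signed depth difference is small when
    the predicted ordering agrees with the annotation and large otherwise.
    """
    margin = y * (d_i - d_j)
    return float(np.log1p(np.exp(-margin)))

# A prediction that respects the annotated ordering incurs a small loss;
# a violated ordering incurs a large one.
print(ordinal_depth_loss(5.0, 2.0, +1) < ordinal_depth_loss(2.0, 5.0, +1))
```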

JAIR Journal 2025 Journal Article

A New Literature Review of 3D Object Detection on Autonomous Driving

  • Peng Zhang
  • Xin Li
  • Xin Lin
  • Liang He

In recent years, the realm of computer vision has experienced a significant surge in the importance of 3D object detection, especially in the context of autonomous driving. The capability to precisely identify the locations, dimensions, and types of key 3D objects surrounding an autonomous vehicle is crucial, rendering 3D object detection a vital component of any advanced perception system. This review delivers an extensive overview of the emerging technologies in 3D object detection tailored for autonomous vehicles. It encompasses a thorough examination, evaluation, and integration of the current research landscape in this domain, staying up-to-date with the latest advancements in 3D object detection and suggesting prospective avenues for future research. Our survey begins by clarifying the principles of 3D object detection and addressing its present challenges in the 3D domain. We then introduce three distinct taxonomies: camera-based, point cloud-based, and multi-modality-based approaches, providing a comprehensive classification of contemporary 3D object detection methodologies from various angles. Diverging from previous reviews, this paper also highlights and scrutinizes common issues and solutions for specific scenarios (such as pedestrian detection, lane lines, roadside cameras, and weather conditions) in object detection. Furthermore, we conduct an in-depth analysis and comparison of different classifications and methods, utilizing various datasets and experimental outcomes. In conclusion, we suggest several potential research directions, offering valuable insights for the ongoing evolution of 3D object detection technology. This review aims to serve as a comprehensive resource for researchers and practitioners in the field, guiding future innovations in 3D object detection for autonomous driving.

AAAI Conference 2025 Conference Paper

Explore What LLM Does Not Know in Complex Question Answering

  • Xin Lin
  • Zhenya Huang
  • Zhiqiang Zhang
  • Jun Zhou
  • Enhong Chen

Complex question answering (QA) is a challenging task in artificial intelligence research which requires reasoning based on related knowledge. Retrieval-augmented generation (RAG) based on large language models (LLMs) has become a promising solution in QA. To facilitate RAG more effectively, the LLM needs to precisely evaluate the knowledge required in QA. That is, first, the LLM needs to examine its knowledge boundary (what the LLM does not know) to retrieve external knowledge as a supplement. Second, the LLM needs to evaluate the utility of the retrieved knowledge (whether it helps in reasoning) for robust RAG. To this end, in this paper, we propose a novel Question Answering with Knowledge Evaluation (KEQA) framework to promote the effectiveness and efficiency of RAG in QA. First, inspired by classroom quizzes, we propose a quiz-based method to precisely examine the knowledge state of the uninterpretable LLM for QA. We pose indicative quizzes on each piece of required knowledge, and inspect whether the LLM can consistently answer the quiz to examine its knowledge boundary. Second, we retrieve the unknown knowledge from an external source, and evaluate its utility to pick the helpful pieces for reasoning. We design a reasoning-based metric to evaluate utility, and construct a demonstration set from the training data to guide knowledge picking at inference time. We conduct extensive experiments on four widely-used QA datasets, and the results demonstrate the effectiveness of the proposed method.
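The consistency check for the knowledge boundary could be sketched as follows (a simplified illustration under assumed names; KEQA's actual criterion and thresholds may differ): sample the LLM's answer to the same indicative quiz several times and treat dominated, consistent answers as knowledge inside the boundary.

```python
from collections import Counter

def knows(quiz_answers, consistency_threshold=0.8):
    """Decide whether a model 'knows' a fact from repeated quiz answers.

    `quiz_answers` holds the answers an LLM gave to the same indicative
    quiz across several samples. If one answer dominates with frequency
    >= consistency_threshold, the knowledge is treated as inside the
    model's boundary; otherwise it should be retrieved externally.
    """
    answer, count = Counter(quiz_answers).most_common(1)[0]
    return count / len(quiz_answers) >= consistency_threshold

print(knows(["Paris", "Paris", "Paris", "Paris", "Paris"]))    # consistent
print(knows(["Paris", "Lyon", "Paris", "Marseille", "Lyon"]))  # inconsistent
```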

ICLR Conference 2025 Conference Paper

HQGS: High-Quality Novel View Synthesis with Gaussian Splatting in Degraded Scenes

  • Xin Lin
  • Shi Luo
  • Xiaojun Shan
  • Xiaoyu Zhou
  • Chao Ren 0002
  • Lu Qi 0001
  • Ming-Hsuan Yang 0001
  • Nuno Vasconcelos

3D Gaussian Splatting (3DGS) has shown promising results for Novel View Synthesis. However, while it is quite effective when based on high-quality images, its performance declines as image quality degrades, due to insufficient resolution, motion blur, noise, compression artifacts, or other factors common in real-world data collection. While some solutions have been proposed for specific types of degradation, general techniques are still missing. To address the problem, we propose a robust HQGS that significantly enhances 3DGS under various degradation scenarios. We first show that 3DGS pays insufficient attention to some detailed regions in low-quality scenes, leading to the absence of Gaussian primitives in those areas and resulting in loss of detail in the rendered images. To address this issue, we focus on leveraging edge structural information to provide additional guidance for 3DGS, enhancing its robustness. First, we introduce an edge-semantic fusion guidance module that combines rich texture information from high-frequency edge-aware maps with semantic information from images. The fused features serve as prior guidance to capture detailed distribution across different regions, bringing more attention to areas with detailed edge information and allowing a higher concentration of Gaussian primitives to be assigned to such areas. Additionally, we present a structural cosine similarity loss to complement pixel-level constraints, further improving the quality of the rendered images. Extensive experiments demonstrate that our method offers better robustness and achieves the best results across various degraded scenes. Source code and trained models are publicly available at https://github.com/linxin0/HQGS.
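A structural cosine similarity loss of the kind mentioned above could look like this (an illustrative patch-wise form; HQGS's exact definition may differ): rendered and reference images are compared patch by patch via cosine similarity, so structural mismatch is penalized even when absolute intensities agree.

```python
import numpy as np

def structural_cosine_loss(rendered, target, patch=4, eps=1e-8):
    """Illustrative structural cosine similarity loss (not HQGS's exact form).

    Both grayscale images are split into non-overlapping patches; each pair
    contributes 1 - cosine_similarity of its flattened pixels. Unlike a
    pixel-level L2 loss, the measure is invariant to uniform rescaling.
    """
    h, w = rendered.shape
    losses = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            a = rendered[i:i+patch, j:j+patch].ravel()
            b = target[i:i+patch, j:j+patch].ravel()
            cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
            losses.append(1.0 - cos)
    return float(np.mean(losses))

rng = np.random.default_rng(0)
img = rng.uniform(0.1, 1.0, size=(8, 8))
print(structural_cosine_loss(img, img) < 1e-6)        # identical structure
print(structural_cosine_loss(img, 2.0 * img) < 1e-6)  # scale-invariant
```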

NeurIPS Conference 2025 Conference Paper

HYPERION: Fine-Grained Hypersphere Alignment for Robust Federated Graph Learning

  • Frank Wan
  • Xiaoran Shang
  • Yuxin Wu
  • Guibin Zhang
  • Jinhe Bi
  • Liangtao Zheng
  • Xin Lin
  • Yue Liu

Robust Federated Graph Learning (FGL) provides an effective decentralized framework for training Graph Neural Networks (GNNs) in noisy-label environments. However, the subtlety of noise during training presents formidable obstacles for developing robust FGL systems. Previous robust FL approaches neither adequately constrain edge-mediated error propagation nor account for intra-class topological differences. At the client level, we innovatively demonstrate that hyperspherical embedding can effectively capture graph structures in a fine-grained manner. Correspondingly, our method effectively addresses the aforementioned issues through fine-grained hypersphere alignment. Moreover, we uncover undetected noise arising from localized perspective constraints and propose a geometric-aware hyperspherical purification module at the server level. Combining the strategies at both levels, we present our robust FGL framework, HYPERION, which operates all components within a unified hyperspherical space. HYPERION demonstrates remarkable robustness across multiple datasets, for instance, achieving a 29.7% improvement in F1-macro score with 50%-pair noise on Cora. The code is available for anonymous access at https://anonymous.4open.science/r/Hyperion-NeurIPS/.
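The hyperspherical embedding idea underlying this abstract can be sketched generically (this is not HYPERION's alignment module; names and the alignment score are illustrative): project embeddings onto the unit hypersphere and measure how tightly each class clusters around its normalized prototype, a quantity that noisy labels would degrade.

```python
import numpy as np

def to_hypersphere(z, eps=1e-12):
    """L2-normalize embeddings onto the unit hypersphere."""
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + eps)

def prototype_alignment(z, labels):
    """Mean cosine alignment of each embedding with its class prototype.

    Prototypes are the normalized mean of each class's hyperspherical
    embeddings; higher alignment indicates tighter, noise-resistant clusters.
    """
    z = to_hypersphere(z)
    scores = []
    for c in np.unique(labels):
        members = z[labels == c]
        proto = to_hypersphere(members.mean(axis=0))
        scores.append(float((members @ proto).mean()))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
tight = rng.normal([5.0, 0.0], 0.1, size=(50, 2))  # angularly tight cluster
loose = rng.normal([5.0, 0.0], 3.0, size=(50, 2))  # angularly spread cluster
labels = np.zeros(50, dtype=int)
print(prototype_alignment(tight, labels) > prototype_alignment(loose, labels))
```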

ICLR Conference 2025 Conference Paper

RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection

  • Jingtong Yue
  • Zhiwei Lin
  • Xin Lin
  • Xiaoyu Zhou
  • Xiangtai Li
  • Lu Qi 0001
  • Yongtao Wang
  • Ming-Hsuan Yang 0001

While recent low-cost radar-camera approaches have shown promising results in multi-modal 3D object detection, both sensors face challenges from environmental and intrinsic disturbances. Poor lighting or adverse weather conditions degrade camera performance, while radar suffers from noise and positional ambiguity. Achieving robust radar-camera 3D object detection requires consistent performance across varying conditions, a topic that has not yet been fully explored. In this work, we first conduct a systematic analysis of robustness in radar-camera detection on five kinds of noises and propose RobuRCDet, a robust object detection model in bird's eye view (BEV). Specifically, we design a 3D Gaussian Expansion (3DGE) module to mitigate inaccuracies in radar points, including position, Radar Cross-Section (RCS), and velocity. The 3DGE uses RCS and velocity priors to generate a deformable kernel map and variance for kernel size adjustment and value distribution. Additionally, we introduce a weather-adaptive fusion module, which adaptively fuses radar and camera features based on camera signal confidence. Extensive experiments on the popular benchmark, nuScenes, show that our RobuRCDet achieves competitive results in regular and noisy conditions. The source codes and trained models will be made available.

TIST Journal 2025 Journal Article

The Social Cognition Ability Evaluation of LLMs: A Dynamic Gamified Assessment and Hierarchical Social Learning Measurement Approach

  • Qin Ni
  • Yangze Yu
  • Yiming Ma
  • Xin Lin
  • Ciping Deng
  • Tingjiang Wei
  • Mo Xuan

Large Language Models (LLMs) have shown amazing abilities in reasoning tasks, and theory of mind (ToM) has been tested in many studies as part of such tasks, but social learning, which is closely related to ToM, still lacks investigation. Moreover, existing test methods and materials make the test results unconvincing. We propose a dynamic gamified assessment (DGA) and a hierarchical social learning measurement to test ToM and social learning capacities in LLMs. The test for ToM consists of five parts. First, we extract ToM tasks from ToM experiments and then design game rules to satisfy the ToM task requirements. After that, we design ToM questions to match the game's rules and use these to generate test materials. Finally, we go through the above steps to test the model. To assess social learning ability, we introduce a novel set of three social rules. Experimental results demonstrate that, except for GPT-4, LLMs performed poorly on the ToM test but showed a certain level of social learning ability in the social learning measurement.

AAAI Conference 2024 Conference Paper

BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind

  • Yuanyuan Mao
  • Xin Lin
  • Qin Ni
  • Liang He

As a foundational component of cognitive intelligence, theory of mind (ToM) can make AI more closely resemble human thought processes, thereby enhancing its interaction and collaboration with humans. In particular, it can significantly improve a model's comprehension of videos in complex scenes. However, current video question answering (VideoQA) datasets focus on studying causal reasoning within events, and few of them genuinely incorporate human ToM. Consequently, there is a lack of development in ToM reasoning tasks within the area of VideoQA. This paper presents BDIQA, the first benchmark to explore the cognitive reasoning capabilities of VideoQA models in the context of ToM. BDIQA is inspired by the cognitive development of children's ToM and addresses the current deficiencies in machine ToM within datasets and tasks. Specifically, it offers tasks at two difficulty levels, assessing Belief, Desire and Intention (BDI) reasoning in both simple and complex scenarios. We conduct evaluations on several mainstream VideoQA methods and diagnose their capabilities with zero-shot, few-shot and supervised learning. We find that the performance of pre-trained models on cognitive reasoning tasks remains unsatisfactory. To counter this challenge, we undertake thorough analysis and experimentation, ultimately presenting two guidelines, derived from ablation analysis, to enhance cognitive reasoning.

NeurIPS Conference 2024 Conference Paper

Decompose, Analyze and Rethink: Solving Intricate Problems with Human-like Reasoning Cycle

  • Shangzi Xue
  • Zhenya Huang
  • Jiayu Liu
  • Xin Lin
  • Yuting Ning
  • Binbin Jin
  • Xin Li
  • Qi Liu

In this paper, we introduce DeAR (Decompose-Analyze-Rethink), a framework that iteratively builds a reasoning tree to tackle intricate problems within a single large language model (LLM). Unlike approaches that extend or search for rationales, DeAR is characterized by 1) adopting a tree-based question decomposition manner to plan the organization of rationales, which mimics the logical planning inherent in human cognition; 2) globally updating the rationales at each reasoning step through natural language feedback. Specifically, the Decompose stage decomposes the question into simpler sub-questions, storing them as new nodes; the Analyze stage generates and self-checks rationales for sub-questions at each node level; and the Rethink stage updates parent-node rationales based on feedback from their child nodes. By generating and updating the reasoning process from a more global perspective, DeAR constructs more adaptive and accurate logical structures for complex problems, facilitating timely error correction compared to rationale-extension and search-based approaches such as Tree-of-Thoughts (ToT) and Graph-of-Thoughts (GoT). We conduct extensive experiments on three reasoning benchmarks, including ScienceQA, StrategyQA, and GSM8K, which cover a variety of reasoning tasks, demonstrating that our approach significantly reduces logical errors and enhances performance across various LLMs. Furthermore, we validate that DeAR is an efficient method that achieves a superior trade-off between accuracy and reasoning time compared to ToT and GoT.

ECAI Conference 2024 Conference Paper

EVIDENT: Enhanced Visualization and Design Integration from Textual Edits and Prompts

  • Miguel Escarda-Fernández
  • Xin Lin
  • Iñigo López-Riobóo Botana
  • Sonia González-Vázquez

In this work, we present EVIDENT (Enhanced VIsualization and DEsign iNTegration from Textual Edits and Prompts), a chatbot framework aimed at enhancing the creative process of generating prompts for background images in virtual reality sets. The system is designed to interpret initial user instructions for generating input prompts for text-to-image models and to modify descriptions according to user guidance through chat. Moreover, the system integrates a text-to-image diffusion model for testing the quality of the generated prompts, considering prompt-image alignment via post hoc visual interpretation. This method empowers users to make informed adjustments to their prompts. We adapt a Large Language Model (LLM) with human preference data, using both synthetic and human-labelled custom datasets. We provide an intuitive platform for users to refine their prompts, continuously improving our LLM's generation capabilities through a feedback mechanism that allows us to collect new human preference data for our fine-tuning and alignment pipelines.

AAAI Conference 2024 Conference Paper

TD²-Net: Toward Denoising and Debiasing for Video Scene Graph Generation

  • Xin Lin
  • Chong Shi
  • Yibing Zhan
  • Zuopeng Yang
  • Yaqi Wu
  • Dacheng Tao

Dynamic scene graph generation (SGG) focuses on detecting objects in a video and determining their pairwise relationships. Existing dynamic SGG methods usually suffer from several issues, including 1) contextual noise, as some frames might contain occluded and blurred objects, and 2) label bias, primarily due to the high imbalance between a few positive relationship samples and numerous negative ones. Additionally, the distribution of relationships exhibits a long-tailed pattern. To address the above problems, in this paper, we introduce a network named TD2-Net that aims at denoising and debiasing for dynamic SGG. Specifically, we first propose a denoising spatio-temporal transformer module that enhances object representation with robust contextual information. This is achieved by designing a differentiable Top-K object selector that utilizes the Gumbel-Softmax sampling strategy to select the relevant neighborhood for each object. Second, we introduce an asymmetrical reweighting loss to relieve the issue of label bias. This loss function integrates asymmetry focusing factors and the volume of samples to adjust the weights assigned to individual samples. Systematic experimental results demonstrate the superiority of our proposed TD2-Net over existing state-of-the-art approaches on the Action Genome database. In more detail, TD2-Net outperforms the second-best competitor by 12.7% on mean-Recall@10 for predicate classification.
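The Gumbel-perturbed Top-K selection at the heart of the selector can be sketched as follows (a simplified illustration; TD2-Net's differentiable straight-through details may differ): adding Gumbel(0, 1) noise to log-scores and taking the top k indices samples k neighbors without replacement in proportion to their relevance scores.

```python
import numpy as np

def gumbel_topk(scores, k, tau=1.0, seed=0):
    """Relaxed Top-K selection via Gumbel perturbation (illustrative).

    Gumbel noise added to log-scores turns argsort-top-k into a stochastic
    draw of k items weighted by `scores`; the softmax over the perturbed
    scores gives soft weights that a full implementation could
    back-propagate through.
    """
    rng = np.random.default_rng(seed)
    gumbel = -np.log(-np.log(rng.uniform(size=scores.shape)))
    perturbed = np.log(scores) + gumbel
    topk = np.argsort(perturbed)[-k:]
    soft = np.exp(perturbed / tau) / np.exp(perturbed / tau).sum()
    return topk, soft

scores = np.array([0.05, 0.6, 0.05, 0.25, 0.05])  # relevance of 5 neighbors
idx, weights = gumbel_topk(scores, k=2)
print(len(idx))  # 2 neighbors selected; `weights` sums to 1
```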

AAAI Conference 2023 Conference Paper

A Disentangled-Attention Based Framework with Persona-Aware Prompt Learning for Dialogue Generation

  • Pingsheng Liu
  • Zhengjie Huang
  • Xiechi Zhang
  • Linlin Wang
  • Gerard de Melo
  • Xin Lin
  • Liang Pang
  • Liang He

Endowing dialogue agents with personas is the key to delivering more human-like conversations. However, existing persona-grounded dialogue systems still lack informative details of human conversations and tend to reply with inconsistent and generic responses. One of the main underlying causes is that pre-defined persona sentences are generally short and merely superficial descriptions of personal attributes, making appropriate persona selection and understanding non-trivial. Another challenge is that it is crucial to consider the context and the conversation flow to dynamically determine when to invoke different types of persona signals. To address these problems, we propose a disentangled-attention based pre-training architecture, which incorporates persona-aware prompt learning to bridge the connection between the selected persona and response generation. Our model first exploits the conversation flow to select context-relevant personas, and subsequently enriches the superficial persona descriptions with extra personality traits through persona-aware prompting. Finally, the decoder leverages a disentangled-attention mechanism to flexibly control the reliance on personas and dialogue contexts, and incorporates A*-like keyword-based heuristic estimates for controllable generation. Extensive experiments show that our approach can outperform strong baselines and deliver more consistent and engaging responses on the PERSONA-CHAT dataset.

AAAI Conference 2023 Conference Paper

Towards a Holistic Understanding of Mathematical Questions with Contrastive Pre-training

  • Yuting Ning
  • Zhenya Huang
  • Xin Lin
  • Enhong Chen
  • Shiwei Tong
  • Zheng Gong
  • Shijin Wang

Understanding mathematical questions effectively is a crucial task, which can benefit many applications, such as difficulty estimation. Researchers have drawn much attention to designing pre-training models for question representations due to the scarcity of human annotations (e.g., labeling difficulty). However, unlike general free-format texts (e.g., user comments), mathematical questions are generally designed with explicit purposes and mathematical logic, and usually consist of more complex content, such as formulas, and related mathematical knowledge (e.g., Function). Therefore, the problem of holistically representing mathematical questions remains underexplored. To this end, in this paper, we propose a novel contrastive pre-training approach for mathematical question representations, namely QuesCo, which attempts to bring questions with more similar purposes closer. Specifically, we first design two-level question augmentations, including content-level and structure-level, which generate literally diverse question pairs with similar purposes. Then, to fully exploit hierarchical information of knowledge concepts, we propose a knowledge hierarchy-aware rank strategy (KHAR), which ranks the similarities between questions in a fine-grained manner. Next, we adopt a ranking contrastive learning task to optimize our model based on the augmented and ranked questions. We conduct extensive experiments on two real-world mathematical datasets. The experimental results demonstrate the effectiveness of our model.

JBHI Journal 2021 Journal Article

Curriculum Feature Alignment Domain Adaptation for Epithelium-Stroma Classification in Histopathological Images

  • Qi Qi
  • Xin Lin
  • Chaoqi Chen
  • Weiping Xie
  • Yue Huang
  • Xinghao Ding
  • Xiaoqing Liu
  • Yizhou Yu

In recent years, deep learning methods have received more attention in epithelial-stroma (ES) classification tasks. Traditional deep learning methods assume that the training and test data have the same distribution, an assumption that is seldom satisfied in complex imaging procedures. Unsupervised domain adaptation (UDA) transfers knowledge from a labelled source domain to a completely unlabeled target domain, and is more suitable for ES classification tasks to avoid tedious annotation. However, existing UDA methods for this task ignore the semantic alignment across domains. In this paper, we propose a Curriculum Feature Alignment Network (CFAN) to gradually align discriminative features across domains through selecting effective samples from the target domain and minimizing intra-class differences. Specifically, we developed the Curriculum Transfer Strategy (CTS) and Adaptive Centroid Alignment (ACA) steps to train our model iteratively. We validated the method using three independent public ES datasets, and experimental results demonstrate that our method achieves better performance in ES classification compared with commonly used deep learning methods and existing deep domain adaptation methods.
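The centroid-alignment idea in this abstract can be sketched generically (an illustrative loss under assumed names, not CFAN's exact ACA step): compute per-class centroids from labelled source features and pull each target feature toward the centroid of its pseudo-label class, shrinking intra-class differences across domains.

```python
import numpy as np

def centroid_alignment_loss(src_feats, src_labels, tgt_feats, tgt_pseudo):
    """Illustrative centroid-alignment loss (not CFAN's exact formulation).

    Per-class centroids come from labelled source features; each target
    feature is penalized by its squared distance to the centroid of its
    pseudo-label class, encouraging semantic alignment across domains.
    """
    loss = 0.0
    for c in np.unique(src_labels):
        centroid = src_feats[src_labels == c].mean(axis=0)
        members = tgt_feats[tgt_pseudo == c]
        if len(members):
            loss += float(((members - centroid) ** 2).sum(axis=1).mean())
    return loss

# Target features close to their class centroids yield a lower loss.
src = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]])
lab = np.array([0, 0, 1, 1])
near = np.array([[0.1, 0.1], [5.1, 5.0]])   # well-aligned target features
far = np.array([[5.0, 0.0], [0.0, 5.0]])    # misaligned target features
pseudo = np.array([0, 1])
print(centroid_alignment_loss(src, lab, near, pseudo) <
      centroid_alignment_loss(src, lab, far, pseudo))  # True
```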

AAAI Conference 2021 Conference Paper

HMS: A Hierarchical Solver with Dependency-Enhanced Understanding for Math Word Problem

  • Xin Lin
  • Zhenya Huang
  • Hongke Zhao
  • Enhong Chen
  • Qi Liu
  • Hao Wang
  • Shijin Wang

Automatically solving math word problems is a crucial task for exploring the intelligence levels of machines in the general AI domain. It is highly challenging since it requires not only natural language understanding but also mathematical expression inference. Existing solutions usually explore sequence-to-sequence models to generate expressions, where the problems are simply encoded sequentially. However, such models are generally far from enough for understanding problems as humans do and lead to incorrect answers. To this end, in this paper, we propose a novel Hierarchical Math Solver (HMS) to make deep understanding and exploitation of problems. In problem understanding, imitating human reading habits, we propose a hierarchical word-clause-problem encoder. Specifically, we first split each problem into several clauses and learn problem semantics from the local clause level to the global problem level. Then, in clause understanding, we propose a dependency-based module to enhance clause semantics with the dependency structure of the problem. Next, in expression inference, we propose a novel tree-based decoder to generate the mathematical expression for the answer. In the decoder, we apply a hierarchical attention mechanism to enhance the problem semantics with context from different levels, and a pointer-generator network to guide the model to copy existing information and infer extra knowledge. Extensive experimental results on two widely used datasets demonstrate that HMS achieves not only better answers but also more reasonable inference.

AAAI Conference 2021 Conference Paper

Looking Wider for Better Adaptive Representation in Few-Shot Learning

  • Jiabao Zhao
  • Yifan Yang
  • Xin Lin
  • Jing Yang
  • Liang He

Building a good feature space is essential for the metric-based few-shot algorithms to recognize a novel class with only a few samples. The feature space is often built by Convolutional Neural Networks (CNNs). However, CNNs primarily focus on local information with the limited receptive field, and the global information generated by distant pixels is not well used. Meanwhile, having a global understanding of the current task and focusing on distinct regions of the same sample for different queries are important for few-shot classification. To tackle these problems, we propose the Cross Non-Local Neural Network (CNL) for capturing the long-range dependency of the samples and the current task. CNL extracts the task-specific and context-aware features dynamically by strengthening the features of the sample at a position via aggregating information from all positions of itself and the current task. To reduce the loss of important information, we maximize the mutual information between the original and refined features as a constraint. Moreover, we add a task-specific scaling to deal with multi-scale and task-specific features extracted by CNL. We conduct extensive experiments for validating our proposed algorithm, which achieves new state-of-the-art performances on two public benchmarks.

ICML Conference 2016 Conference Paper

A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery

  • Ian En-Hsu Yen
  • Xin Lin
  • Jiong Zhang 0001
  • Pradeep Ravikumar
  • Inderjit S. Dhillon

Multiple Sequence Alignment and Motif Discovery, both NP-hard problems, are two fundamental tasks in Bioinformatics. Existing approaches to these two problems are based on either local search methods such as Expectation Maximization (EM) and Gibbs Sampling or greedy heuristic methods. In this work, we develop a convex relaxation approach to both problems based on the recent concept of atomic norm and develop a new algorithm, termed Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces solutions of higher quality than the standard tools widely used in the Bioinformatics community on the Multiple Sequence Alignment and Motif Discovery problems.

ICML Conference 2015 Conference Paper

A Convex Exemplar-based Approach to MAD-Bayes Dirichlet Process Mixture Models

  • Ian En-Hsu Yen
  • Xin Lin
  • Kai Zhong
  • Pradeep Ravikumar
  • Inderjit S. Dhillon

MAD-Bayes (MAP-based Asymptotic Derivations) has recently been proposed as a general technique to derive scalable algorithms for Bayesian Nonparametric models. However, the combinatorial nature of objective functions derived from MAD-Bayes results in hard optimization problems, for which current practice employs heuristic algorithms analogous to k-means to find local minima. In this paper, we consider the exemplar-based version of the MAD-Bayes formulation for DP and Hierarchical DP (HDP) mixture models. We show that an exemplar-based MAD-Bayes formulation can be relaxed to a convex structural-regularized program that, under cluster-separation conditions, shares the same optimal solution as its combinatorial counterpart. An algorithm based on the Alternating Direction Method of Multipliers (ADMM) is then proposed to solve this program. In our experiments on several benchmark data sets, the proposed method finds the optimal solution of the combinatorial problem and significantly improves on existing methods in terms of the exemplar-based objective.
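The ADMM machinery this abstract relies on can be illustrated on a much simpler convex problem, the lasso (this is a generic textbook instance, not the paper's structural-regularized program): it shows the same alternation between a smooth-term update, a proximal update, and a dual update.

```python
import numpy as np

def admm_lasso(A, b, lam=0.1, rho=1.0, n_iter=200):
    """Generic ADMM for lasso: min 0.5*||Ax - b||^2 + lam*||z||_1, s.t. x = z.

    Alternates an x-update (a ridge-like linear solve), a z-update
    (soft-thresholding, the proximal operator of the L1 norm), and a
    scaled dual update u, the pattern ADMM-style solvers build on.
    """
    n = A.shape[1]
    x = z = u = np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    inv = np.linalg.inv(AtA + rho * np.eye(n))
    for _ in range(n_iter):
        x = inv @ (Atb + rho * (z - u))
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # soft-threshold
        u = u + x - z
    return z

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))
x_true = np.zeros(10)
x_true[[1, 4]] = [2.0, -3.0]
b = A @ x_true                       # noiseless observations
x_hat = admm_lasso(A, b, lam=0.05)
print(np.allclose(x_hat, x_true, atol=0.1))  # recovers the sparse signal
```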