Author name cluster

Xiaojun Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

33 papers

2 author rows

AAAI Conference 2026 Conference Paper

ARGH-Mark: Anchor-Synchronized Watermarking with Hamming Correction for Robust and Quality-Preserving LLM Attribution

He Li
Xiaojun Chen
Jingcheng He
Zhendong Zhao
Shuguang Yuan
Xin Zhao
Yunfei Yang

The proliferation of large language models has intensified demands for reliable content attribution, yet existing watermarking techniques face a fundamental trilemma: they cannot simultaneously optimize for robustness against attacks, minimal text quality degradation, and detection efficiency. To resolve this challenge, we propose ARGH-Mark, a novel watermarking framework that integrates three synergistic innovations: (1) Anchor-synchronized phase recovery for maintaining detection integrity under insertion/deletion attacks, (2) RG-balanced vocabulary modulation that dynamically partitions lexicons via contextual hashing to preserve generation quality, and (3) Hamming-based error correction enabling single-bit error rectification through algebraic coding. Comprehensive evaluations across question answering (ELI5), summarization (CNN/DailyMail), and text generation (C4) demonstrate state-of-the-art performance: the proposed ARGH-Mark framework achieves near-perfect match rate and bit accuracy across diverse configurations, while preserving the quality of the generated text. It significantly reduces detection latency, enabling real-time extraction, and maintains high robustness against token tampering attacks through integrated Hamming error correction, ensuring reliable attribution in adversarial settings. ARGH-Mark achieves a new Pareto frontier in the watermarking design space and advances trustworthy deployment of generative AI in alignment-critical applications.
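
The single-bit error rectification named in item (3) of the abstract can be illustrated with the textbook Hamming(7,4) code. This is a minimal sketch, not ARGH-Mark's implementation: the abstract does not state the code length or bit layout, so the (7,4) parameters and the function names `hamming74_encode` and `hamming74_decode` are assumptions chosen for illustration.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based position of the error.
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


# Hypothetical payload bits: flip one bit in transit and recover the message.
message = [1, 0, 1, 1]
received = hamming74_encode(message)
received[4] ^= 1  # simulate a single tampered bit
recovered = hamming74_decode(received)
```

A code like this corrects any single flipped bit per 7-bit block; two or more flips in the same block are decoded incorrectly.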

AAAI Conference 2026 Conference Paper

DeepTracer: Tracing Stolen Model via Deep Coupled Watermarks

Yunfei Yang
Xiaojun Chen
Yuexin Xuan
Zhendong Zhao
Xin Zhao
He Li

Model watermarking techniques can embed watermark information into the protected model for ownership declaration by constructing specific input-output pairs. However, existing watermarks are easily removed under model stealing attacks, making it difficult for model owners to effectively verify the copyright of stolen models. In this paper, we analyze the root cause of the failure of current watermarking methods under model stealing scenarios and then explore potential solutions. Specifically, we introduce a robust watermarking framework, DeepTracer, which leverages a novel watermark sample construction method and a same-class coupling loss constraint. DeepTracer induces a tight coupling between the watermark task and the primary task, so that adversaries inevitably learn the hidden watermark task when stealing the primary task functionality. Furthermore, we propose an effective watermark sample filtering mechanism that carefully selects the watermark key samples used in model ownership verification to enhance the reliability of watermarks. Extensive experiments across multiple datasets and models demonstrate that our method surpasses existing approaches in defending against various model stealing attacks, as well as watermark attacks, and achieves new state-of-the-art effectiveness and robustness.

AAAI Conference 2026 Conference Paper

LiNeXt: Revisiting LiDAR Completion with Efficient Non-Diffusion Architectures

Wenzhe He
Xiaojun Chen
Ruiqi Wang
Ruihui Li
Huilong Pi
Jiapeng Zhang
Zhuo Tang
Kenli Li

3D LiDAR scene completion from point clouds is a fundamental component of perception systems in autonomous vehicles. Previous methods have predominantly employed diffusion models for high-fidelity reconstruction. However, their multi-step iterative sampling incurs significant computational overhead, limiting their real-time applicability. To address this, we propose LiNeXt: a lightweight, non-diffusion network optimized for rapid and accurate point cloud completion. Specifically, LiNeXt first applies the Noise-to-Coarse (N2C) Module to denoise the input noisy point cloud in a single pass, thereby obviating the multi-step iterative sampling of diffusion-based methods. The Refine Module then takes the coarse point cloud and its intermediate features from the N2C Module to perform more precise refinement, further enhancing structural completeness. Furthermore, we observe that LiDAR point clouds exhibit a distance-dependent spatial distribution, being densely sampled at proximal ranges and sparsely sampled at distal ranges. Accordingly, we propose the Distance-aware Selected Repeat strategy to generate a more uniformly distributed noisy point cloud. On the SemanticKITTI dataset, LiNeXt achieves a 199.8 times speedup in inference, reduces Chamfer Distance by 50.7 percent, and uses only 6.1 percent of the parameters compared with LiDiff. These results demonstrate the superior efficiency and effectiveness of LiNeXt for real-time scene completion.
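
For readers unfamiliar with the Chamfer Distance metric reported above, the following is a minimal sketch of one common convention: the mean squared nearest-neighbour distance, summed over both directions. The abstract does not say which variant the paper uses, so this definition and the function name `chamfer_distance` are assumptions for illustration. The brute-force search is quadratic in the number of points and is only practical for small clouds.

```python
def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer Distance between two non-empty point clouds.

    Each cloud is a sequence of equal-length coordinate tuples.
    """
    def squared_distance(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    def one_way(source, target):
        # Average, over source points, of the squared distance to the nearest target point.
        return sum(min(squared_distance(p, q) for q in target) for p in source) / len(source)

    return one_way(cloud_a, cloud_b) + one_way(cloud_b, cloud_a)


# Hypothetical toy clouds: identical clouds score 0, a shifted point scores higher.
identical = chamfer_distance([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                             [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
shifted = chamfer_distance([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)])
```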

AAAI Conference 2026 Conference Paper

PointSLAM++: Robust Dense Neural Gaussian Point Cloud-based SLAM

Xu Wang
Boyao Han
Xiaojun Chen
Ying Liu
Ruihui Li

Real-time 3D reconstruction is crucial for robotics and augmented reality, yet current simultaneous localization and mapping (SLAM) approaches often struggle to maintain structural consistency and robust pose estimation in the presence of depth noise. This work introduces PointSLAM++, a novel RGB-D SLAM system that leverages a hierarchically constrained neural Gaussian representation to preserve structural relationships while generating Gaussian primitives for scene mapping. It also employs progressive pose optimization to mitigate depth sensor noise, significantly enhancing localization accuracy. Furthermore, it utilizes a dynamic neural representation graph that adjusts the distribution of Gaussian nodes based on local geometric complexity, enabling the map to adapt to intricate scene details in real time. This combination yields high-precision 3D mapping and photorealistic scene rendering. Experimental results show PointSLAM++ outperforms existing 3DGS-based SLAM methods in reconstruction accuracy and rendering quality, demonstrating its advantages for large-scale AR and robotics.

AAAI Conference 2026 Conference Paper

Value-Aligned Prompt Moderation via Zero-Shot Agentic Rewriting for Safe Image Generation

Xin Zhao
Xiaojun Chen
Bingshan Liu
Zeyao Liu
Zhendong Zhao
Xiaoyan Gu

Generative vision-language models like Stable Diffusion demonstrate remarkable capabilities in creative media synthesis, but they also pose substantial risks of producing unsafe, offensive, or culturally inappropriate content when prompted adversarially. Current defenses struggle to align outputs with human values without sacrificing generation quality or incurring high costs. To address these challenges, we introduce VALOR (Value-Aligned LLM-Overseen Rewriter), a modular, zero-shot agentic framework for safer and more helpful text-to-image generation. VALOR integrates layered prompt analysis with human-aligned value reasoning: a multi-level NSFW detector filters lexical and semantic risks; a cultural value alignment module identifies violations of social norms, legality, and representational ethics; and an intention disambiguator detects subtle or indirect unsafe implications. When unsafe content is detected, prompts are selectively rewritten by a large language model under dynamic, role-specific instructions designed to preserve user intent while enforcing alignment. If the generated image still fails a safety check, VALOR optionally performs a stylistic regeneration to steer the output toward a safer visual domain without altering core semantics. Experiments across adversarial, ambiguous, and value-sensitive prompts show that VALOR significantly reduces unsafe outputs by up to 100.00% while preserving prompt usefulness and creativity. These results highlight VALOR as a scalable and effective approach for deploying safe, aligned, and helpful image generation systems in open-world settings.

JBHI Journal 2025 Journal Article

Automatic Segmentation of Bone Graft in Maxillary Sinus via Distance Constrained Network Guided by Prior Anatomical Knowledge

Jiangchang Xu
Jie Gao
Shuanglin Jiang
Chunliang Wang
Örjan Smedby
Yiqun Wu
Xiaoyi Jiang
Xiaojun Chen

Maxillary sinus lifting is a crucial surgical procedure for addressing insufficient alveolar bone mass and severe resorption in dental implant therapy. To accurately analyze the geometry changes of the bone graft (BG) in the maxillary sinus (MS), it is essential to perform quantitative analysis. However, automated BG segmentation remains a major challenge due to the complex local appearance, including blurred boundaries, lesion interference, implant and artifact interference, and BG exceeding the MS. Currently, there are few tools available that can efficiently and accurately segment BG from cone beam computed tomography (CBCT) images. In this paper, we propose a distance-constrained attention network guided by prior anatomical knowledge for the automatic segmentation of BG. First, a guidance strategy of preoperative prior anatomical knowledge is added to a deep neural network (DNN), which improves its ability to identify the dividing line between the MS and BG. Next, a coordinate attention gate is proposed, which utilizes the synergy of channel and position attention to highlight salient features from the skip connections. Additionally, the geodesic distance constraint is introduced into the DNN to form multi-task predictions, which reduces the deviation of the segmentation result. In the test experiment, the proposed DNN achieved a Dice similarity coefficient of 85.48 ± 6.38%, an average surface distance error of 0.57 ± 0.34 mm, and a 95% Hausdorff distance of 2.64 ± 2.09 mm, which is superior to the comparison networks. It markedly improves the segmentation accuracy and efficiency of BG and has potential applications in analyzing its volume change and absorption rate in the future.
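
The Dice similarity coefficient reported above is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted mask A and a reference mask B. The sketch below computes it on flattened binary masks. The function name `dice_coefficient` and the choice to return 1.0 when both masks are empty are assumptions for illustration, not details taken from the paper.

```python
def dice_coefficient(predicted, reference):
    """Dice similarity coefficient between two flattened binary masks of equal length."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / total


# Hypothetical 4-voxel masks: one shared foreground voxel out of 2 + 1 labelled voxels.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```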

IROS Conference 2025 Conference Paper

DRTT: A Diffusion-based Framework for 4DCT Generation, Robust Thoracic Registration and Tumor Deformation Tracking

Dongyuan Li
Yixin Shan
Yuxuan Mao
Haochen Shi
Shenghao Huang
Weiyan Sun
Chang Chen
Xiaojun Chen

In minimally invasive robotic thoracic surgery, the unavoidable respiratory motion of the patient causes lung lesions to move and deform, making precise tumor localization a significant challenge for surgeons. To address this, we introduce an RDDM (Recursive Deformable Diffusion Model)-based framework designed for real-time intraoperative tumor tracking, which can be used for registration and navigation in robot-assisted thoracic surgery. The RDDM reduces training complexity and enhances dataset utilization by employing a simplified DDM (Diffusion Deformable Model) iteratively, significantly lowering computational demands while maximizing the extraction of valuable information from limited 4D-CT (four-dimensional computed tomography) datasets. Considering the robustness required for intraoperative registration and navigation, we incorporate an ICP (Iterative Closest Point)-based point cloud registration method into the framework and validate our approach using publicly available datasets and volunteer trials. This innovation has the potential to reduce radiation exposure, trauma, and the risk of complications for patients undergoing minimally invasive thoracic surgery, and enables downstream tasks such as RAPNB (robot-assisted percutaneous needle biopsy) and radiation therapy.

JBHI Journal 2025 Journal Article

Multimodal Distillation Pre-Training Model for Ultrasound Dynamic Images Annotation

Xiaojun Chen
Jia Ke
Yaning Zhang
Jianping Gou
Anna Shen
Shaohua Wan

With the development of medical technology, ultrasonography has become an important diagnostic method in doctors' clinical work. However, unlike static medical images such as CT and MRI, for which image processing has a stronger research base, ultrasonography is a dynamic, video-like medical image captured and generated by a real-time moving probe. How to handle video data in the medical field and extract textual semantics across modalities from medical video is therefore a difficult open problem. For this reason, this paper proposes a pre-training model of multimodal distillation and fusion coding for processing the semantic relationship between ultrasound dynamic images and text. First, a fusion encoder fuses the visual geometric features of tissues and organs in ultrasound dynamic images, the overall visual appearance descriptive features, and the named entity linguistic features into a unified visual-linguistic feature, so that the model gains a richer ability to aggregate and align visual and linguistic cues. Then, the pre-training model is augmented by multimodal knowledge distillation to improve the learning ability of the model. The final experimental results on multiple datasets show that the multimodal distillation pre-training model generally improves the fusion ability of various types of features in ultrasound dynamic images, and realizes the automated and accurate annotation of ultrasound dynamic images.

AAAI Conference 2025 Conference Paper

TGLsta: Low-resource Textual Graph Learning with Semantic and Topological Awareness via LLMs

Qin Zhang
Xiaowei Li
Ziqi Liu
Xiaochen Fan
Xiaojun Chen
Shirui Pan

Textual Graphs (TGs) present a graph-based representation of textual data and find wide applications in real-world scenarios, such as citation networks, knowledge graphs, and social networks. While the traditional "pre-train, fine-tune" framework effectively addresses tasks requiring abundant labeled data, it falls short in scenarios with limited resources or zero-shot learning requirements, particularly in low-resource textual graph node classification. Additionally, prevalent approaches that convert text nodes into shallow or manually engineered features fail to capture the rich semantic nuances within the text. Conventional methods also often neglect the fusion of semantic and topological information, resulting in suboptimal model learning. To overcome these challenges, we propose a novel method of low-resource textual graph node classification based on large language models, i.e., Textual graph learning with semantic and topological awareness (TGLsta), which comprehensively explores the semantic information, near neighborhood information, and topology information in textual graphs, the most important information sources that textual graphs contain. Graph prompt tuning for both zero- and few-shot textual graph node classification is further introduced.

NeurIPS Conference 2025 Conference Paper

Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers

Xin Zhao
Xiaojun Chen
Bingshan Liu
Haoyu Gao
Zhendong Zhao
Yilong Chen

Large language models (LLMs) with Mixture-of-Experts (MoE) architectures achieve impressive performance and efficiency by dynamically routing inputs to specialized subnetworks, known as experts. However, this sparse routing mechanism inherently exhibits task preferences due to expert specialization, introducing a new and underexplored vulnerability to backdoor attacks. In this work, we investigate the feasibility and effectiveness of injecting backdoors into MoE-based LLMs by exploiting their inherent expert routing preferences. We thus propose BadSwitch, a novel backdoor framework that integrates task-coupled dynamic trigger optimization with a sensitivity-guided Top-S expert tracing mechanism. Our approach jointly optimizes trigger embeddings during pretraining while identifying the S most sensitive experts, subsequently constraining the Top-K gating mechanism to these targeted experts. Unlike traditional backdoor attacks that rely on superficial data poisoning or model editing, BadSwitch primarily embeds malicious triggers into expert routing paths with strong task affinity, enabling precise and stealthy model manipulation. Through comprehensive evaluations across three prominent MoE architectures (Switch Transformer, QwenMoE, and DeepSeekMoE), we demonstrate that BadSwitch can efficiently hijack pre-trained models with up to 100% attack success rate (ASR) while maintaining the highest clean accuracy (ACC) among all baselines. Furthermore, BadSwitch exhibits strong resilience against both text-level and model-level defense mechanisms, achieving 94.07% ASR and 87.18% ACC on the AGNews dataset. Our analysis of expert activation patterns reveals fundamental insights into MoE vulnerabilities. We anticipate this work will expose security risks in MoE systems and contribute to advancing AI safety.

JBHI Journal 2024 Journal Article

A Multi-Task Transformer With Local-Global Feature Interaction and Multiple Tumoral Region Guidance for Breast Cancer Diagnosis

Yi Zhang
Bolun Zeng
Jia Li
Yuanyi Zheng
Xiaojun Chen

Breast cancer, as a malignant tumor disease, has maintained high incidence and mortality rates over the years. Ultrasonography is one of the primary methods for diagnosing early-stage breast cancer. However, correctly interpreting breast ultrasound images requires substantial time from physicians with specialized knowledge and extensive experience. Recently, deep learning-based methods have made significant advancements in breast tumor segmentation and classification due to their powerful fitting capabilities. However, most existing methods focus on performing one of these tasks separately, and often fail to effectively leverage information from specific tumor-related areas that hold considerable diagnostic value. In this study, we propose a multi-task network with local-global feature interaction and multiple tumoral region guidance for breast ultrasound-based tumor segmentation and classification. Specifically, we construct a dual-stream encoder, paralleling CNN and Transformer, to facilitate hierarchical interaction and fusion of local and global features. This architecture enables each stream to capitalize on the strengths of the other while preserving its unique characteristics. Moreover, we design a multi-tumoral region guidance module to explicitly learn long-range non-local dependencies within intra-tumoral and peri-tumoral regions from the spatial domain, thus providing interpretable cues beneficial for classification. Experimental results on two breast ultrasound datasets show that our network outperforms state-of-the-art methods in tumor segmentation and classification tasks. Compared with the second-best competitive method, our network improves the diagnosis accuracy from 73.64% to 80.21% on a large external validation dataset, which demonstrates its superior generalization capability.

JBHI Journal 2024 Journal Article

Adaptive Multi-Dimensional Weighted Network With Category-Aware Contrastive Learning for Fine-Grained Hand Bone Segmentation

Bolun Zeng
Li Chen
Yuanyi Zheng
Xiaojun Chen

Accurately delineating and categorizing individual hand bones in 3D ultrasound (US) is a promising technology for precise digital diagnostic analysis. However, this is a challenging task due to the inherent imaging limitations of the US and the insignificant feature differences among numerous bones. In this study, we have proposed a novel deep learning-based solution for pediatric hand bone segmentation in the US. Our method is unique in that it allows for effective detailed feature mining through an adaptive multi-dimensional weighting attention mechanism. It innovatively implements a category-aware contrastive learning method to highlight inter-class semantic feature differences, thereby enhancing the category discrimination performance of the model. Extensive experiments on the challenging pediatric clinical hand 3D US datasets show the outstanding performance of the proposed method in segmenting thirty-eight bone structures, with an average Dice coefficient of 90.0%. The results outperform other state-of-the-art methods, demonstrating its effectiveness in fine-grained hand bone segmentation. Our method will be globally released as a plugin in the 3D Slicer, providing an innovative and reliable tool for relevant clinical applications.

NeurIPS Conference 2024 Conference Paper

EGonc : Energy-based Open-Set Node Classification with substitute Unknowns

Qin Zhang
Zelin Shi
Shirui Pan
Junyang Chen
Huisi Wu
Xiaojun Chen

Open-set Classification (OSC) is a critical requirement for safely deploying machine learning models in the open world, which aims to classify samples from known classes and reject out-of-distribution (OOD) samples. Existing methods exploit the feature space of a trained network and attempt to estimate the uncertainty in the predictions. However, softmax-based neural networks are found to be overly confident in their predictions even on data they have never seen before, and the immense diversity of the OOD examples also makes such methods fragile. To this end, we follow the idea of estimating the underlying density of the training data to decide whether a given input is close to the in-distribution (IND) data and adopt Energy-based models (EBMs) as density estimators. A novel energy-based generative open-set node classification method, EGonc, is proposed to achieve open-set graph learning. Specifically, we first generate substitute unknowns to mimic the distribution of real open-set samples, based on the information of graph structures. Then, an additional energy logit representing the virtual OOD class is learned from the residual of the feature against the principal space, and matched with the original logits by a constant scaling. This virtual logit serves as the indicator of OOD-ness. EGonc has nice theoretical properties that guarantee an overall distinguishable margin between the detection scores for IND and OOD samples. Comprehensive experimental evaluations of EGonc also demonstrate its superiority.
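
As background for the energy-based view taken in this abstract, the sketch below computes the generic energy score commonly used with classifier logits, E(x) = -T log Σ_i exp(f_i(x) / T). It is not EGonc's method: the substitute unknowns and the learned virtual logit described above are not reproduced, and the function name `energy_score` and the temperature parameter are assumptions for illustration.

```python
import math


def energy_score(logits, temperature=1.0):
    """Generic energy of a logit vector: -T * logsumexp(logits / T).

    Lower energy means the classifier assigns more total evidence to its known classes.
    """
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return -temperature * log_sum_exp


# Hypothetical logits: a confident prediction has lower energy than a flat one.
confident = energy_score([10.0, 0.0, 0.0])
uncertain = energy_score([0.0, 0.0, 0.0])
```

In this generic setup, inputs whose energy exceeds a threshold chosen on validation data would be flagged as likely out-of-distribution.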

NeurIPS Conference 2024 Conference Paper

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Ziqiang Liu
Feiteng Fang
Xi Feng
Xinrun Du
Chenhao Zhang
Zekun Wang
Yuelin Bai
Qixuan Zhao

The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap, we propose the Image Implication understanding Benchmark, II-Bench, which aims to evaluate the model's higher-order perception of images. Through extensive experiments on II-Bench across multiple MLLMs, we have made significant findings. First, a substantial gap is observed between the performance of MLLMs and humans on II-Bench. The best MLLM accuracy reaches 74.8%, whereas human accuracy averages 90%, peaking at an impressive 98%. Second, MLLMs perform worse on abstract and complex images, suggesting limitations in their ability to understand high-level semantics and capture image details. Finally, it is observed that most models exhibit enhanced accuracy when image sentiment polarity hints are incorporated into the prompts. This observation underscores a notable deficiency in their inherent understanding of image sentiment. We believe that II-Bench will inspire the community to develop the next generation of MLLMs, advancing the journey towards expert artificial general intelligence (AGI). II-Bench is publicly available at https://huggingface.co/datasets/m-a-p/II-Bench.

AAAI Conference 2024 Conference Paper

Multi-Level Cross-Modal Alignment for Image Clustering

Liping Qiu
Qin Zhang
Xiaojun Chen
Shaotian Cai

Recently, cross-modal pretraining models have been employed to produce meaningful pseudo-labels to supervise the training of an image clustering model. However, numerous erroneous alignments in a cross-modal pretraining model could produce poor-quality pseudo-labels and degrade clustering performance. To solve this issue, we propose a novel Multi-level Cross-modal Alignment method to improve the alignments in a cross-modal pretraining model for downstream tasks, by building a smaller but better semantic space and aligning the images and texts at three levels, i.e., instance-level, prototype-level, and semantic-level. Theoretical results show that our proposed method converges, and suggest effective means to reduce the expected clustering risk of our method. Experimental results on five benchmark datasets clearly show the superiority of our new method.

AAAI Conference 2024 Conference Paper

ROG_PL: Robust Open-Set Graph Learning via Region-Based Prototype Learning

Qin Zhang
Xiaowei Li
Jiexin Lu
Liping Qiu
Shirui Pan
Xiaojun Chen
Junyang Chen

Open-set graph learning is a practical task that aims to classify the known class nodes and to identify unknown class samples as unknowns. Conventional node classification methods usually perform unsatisfactorily in open-set scenarios due to the complex data they encounter, such as out-of-distribution (OOD) data and in-distribution (IND) noise. OOD data are samples that do not belong to any known classes. They are outliers if they occur in training (OOD noise), and open-set samples if they occur in testing. IND noise consists of training samples that are assigned incorrect labels. IND noise and OOD noise are prevalent and usually cause the ambiguity problem, including the intra-class variety problem and the inter-class confusion problem. Thus, exploring robust open-set learning methods is necessary and difficult, and it becomes even more difficult for non-IID graph data. To this end, we propose a unified framework named ROG_PL to achieve robust open-set learning on complex noisy graph data, by introducing prototype learning. Specifically, ROG_PL consists of two modules, i.e., denoising via label propagation and open-set prototype learning via regions. The first module corrects noisy labels through similarity-based label propagation and removes low-confidence samples, to solve the intra-class variety problem caused by noise. The second module learns open-set prototypes for each known class via non-overlapped regions and retains both interior and border prototypes to remedy the inter-class confusion problem. The two modules are iteratively updated under the constraints of classification loss and prototype diversity loss. To the best of our knowledge, the proposed ROG_PL is the first robust open-set node classification method for graph data with complex noise. Experimental evaluations of ROG_PL on several benchmark graph datasets demonstrate that it has good performance.

JMLR Journal 2023 Journal Article

An Inexact Augmented Lagrangian Algorithm for Training Leaky ReLU Neural Network with Group Sparsity

Wei Liu
Xin Liu
Xiaojun Chen

The leaky ReLU network with a group sparse regularization term has been widely used in recent years. However, training such a network yields a nonsmooth nonconvex optimization problem, and there is a lack of approaches to compute a stationary point deterministically. In this paper, we first resolve the multi-layer composite term in the original optimization problem by introducing auxiliary variables and additional constraints. We show the new model has a nonempty and bounded solution set and its feasible set satisfies the Mangasarian-Fromovitz constraint qualification. Moreover, we show the relationship between the new model and the original problem. Remarkably, we propose an inexact augmented Lagrangian algorithm for solving the new model, and show the convergence of the algorithm to a KKT point. Numerical experiments demonstrate that our algorithm is more efficient for training sparse leaky ReLU neural networks than some well-known algorithms.

IJCAI Conference 2023 Conference Paper

G2Pxy: Generative Open-Set Node Classification on Graphs with Proxy Unknowns

Qin Zhang
Zelin Shi
Xiaolin Zhang
Xiaojun Chen
Philippe Fournier-Viger
Shirui Pan

Node classification is the task of predicting the labels of unlabeled nodes in a graph. State-of-the-art methods based on graph neural networks achieve excellent performance when all labels are available during training. But in real life, models are often applied to data with new classes, which can lead to massive misclassification and thus significantly degrade performance. Hence, developing open-set classification methods is crucial to determine if a given sample belongs to a known class. Existing methods for open-set node classification generally use transductive learning with part or all of the features of real unseen class nodes to help with open-set classification. In this paper, we propose a novel generative open-set node classification method, i.e., G2Pxy, which follows a stricter inductive learning setting where no information about unknown classes is available during training and validation. Two kinds of proxy unknown nodes, inter-class unknown proxies and external unknown proxies, are generated via mixup to efficiently anticipate the distribution of novel classes. Using the generated proxies, a closed-set classifier can be transformed into an open-set one, by augmenting it with an extra proxy classifier. Under the constraints of both cross entropy loss and complement entropy loss, G2Pxy achieves superior effectiveness for unknown class detection and known class classification, which is validated by experiments on benchmark graph datasets. Moreover, G2Pxy does not have specific requirements on the GNN architecture and shows good generalization.
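
The mixup operation mentioned in this abstract is, in its basic form, a convex combination of two samples: x = λ·x_i + (1 − λ)·x_j with λ in [0, 1]. The sketch below shows only that basic operation on plain feature vectors. How G2Pxy selects node pairs, chooses λ, or handles graph structure is not stated in the abstract, so the function name `mixup` and the example values are assumptions for illustration.

```python
def mixup(features_i, features_j, lam):
    """Convex combination lam * features_i + (1 - lam) * features_j of two feature vectors."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(features_i) != len(features_j):
        raise ValueError("feature vectors must have the same length")
    return [lam * a + (1.0 - lam) * b for a, b in zip(features_i, features_j)]


# Hypothetical node features from two different known classes.
class_a_node = [0.0, 0.0]
class_b_node = [2.0, 4.0]
proxy = mixup(class_a_node, class_b_node, 0.5)
```

Mixing nodes from two different known classes yields a point between the classes, which is one intuitive way to stand in for an unseen class during training.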

AAAI Conference 2023 Conference Paper

Semantic-Enhanced Image Clustering

Shaotian Cai
Liping Qiu
Xiaojun Chen
Qin Zhang
Longteng Chen

Image clustering is an important and open challenging task in computer vision. Although many methods have been proposed to solve the image clustering task, they only explore images and uncover clusters according to the image features, thus being unable to distinguish visually similar but semantically different images. In this paper, we propose to investigate the task of image clustering with the help of visual-language pre-training model. Different from the zero-shot setting, in which the class names are known, we only know the number of clusters in this setting. Therefore, how to map images to a proper semantic space and how to cluster images from both image and semantic spaces are two key problems. To solve the above problems, we propose a novel image clustering method guided by the visual-language pre-training model CLIP, named Semantic-Enhanced Image Clustering (SIC). In this new method, we propose a method to map the given images to a proper semantic space first and efficient methods to generate pseudo-labels according to the relationships between images and semantics. Finally, we propose to perform clustering with consistency learning in both image space and semantic space, in a self-supervised learning fashion. The theoretical result of convergence analysis shows that our proposed method can converge at a sublinear speed. Theoretical analysis of expectation risk also shows that we can reduce the expectation risk by improving neighborhood consistency, increasing prediction confidence, or reducing neighborhood imbalance. Experimental results on five benchmark datasets clearly show the superiority of our new method.

AAAI Conference 2022 Conference Paper

Deep Unsupervised Hashing with Latent Semantic Components

Qinghong Lin
Xiaojun Chen
Qin Zhang
Shaotian Cai
Wenzhe Zhao
Hongfa Wang

Deep unsupervised hashing has been appreciated in the regime of image retrieval. However, most prior work fails to detect the semantic components and their relationships behind the images, and therefore lacks discriminative power. To remedy this defect, we propose a novel Deep Semantic Components Hashing (DSCH), which builds on the common-sense observation that an image normally contains a number of semantic components with homology and co-occurrence relationships. Based on this prior, DSCH regards the semantic components as latent variables under the Expectation-Maximization framework and designs a two-step iterative algorithm with the objective of maximum likelihood of training data. Firstly, DSCH constructs a semantic component structure by uncovering the fine-grained semantic components of images with a Gaussian Mixture Model (GMM), where an image is represented as a mixture of multiple components and the semantic co-occurrence is exploited. In addition, coarse-grained semantic components are discovered by considering the homology relationships between fine-grained components, and the hierarchical organization is then constructed. Secondly, DSCH pulls the images close to their semantic component centers at both fine-grained and coarse-grained levels, and also pulls images that share similar semantic components close to each other. Extensive experiments on three benchmark datasets demonstrate that the proposed hierarchical semantic components indeed help the hashing model achieve superior performance.

IJCAI Conference 2021 Conference Paper

Enhancing Label Representations with Relational Inductive Bias Constraint for Fine-Grained Entity Typing

Jinqing Li
Xiaojun Chen
Dakui Wang
Yuwei Li

Fine-Grained Entity Typing (FGET) is a task that aims at classifying an entity mention into a wide range of entity label types. Recent research improves task performance by imposing a label-relational inductive bias based on the hierarchy of labels or a label co-occurrence graph. However, these approaches usually overlook explicit interactions between instances and labels, which may limit the capability of label representations. Therefore, we propose a novel method based on a two-phase graph network for the FGET task to enhance the label representations by imposing the relational inductive biases of instance-to-label and label-to-label. In phase 1, instance features are introduced into label representations to make the label representations more representative. In phase 2, interactions of labels capture dependency relationships among them and thus make label representations smoother. During prediction, we introduce a pseudo-label generator for the construction of the two-phase graph. The input instances differ from batch to batch, so the label representations are dynamic. Experiments on three public datasets verify the effectiveness and stability of our proposed method, which achieves state-of-the-art results on their testing sets.

AAAI Conference 2019 Conference Paper

Exploring Human-Like Reading Strategy for Abstractive Text Summarization

Min Yang
Qiang Qu
Wenting Tu
Ying Shen
Zhou Zhao
Xiaojun Chen

Recent artificial intelligence studies have shown great interest in abstractive text summarization. Although remarkable progress has been made by deep neural network based methods, generating plausible and high-quality abstractive summaries remains a challenging task. The human-like reading strategy is rarely explored in abstractive text summarization, even though it can improve the effectiveness of summarization by considering the process of reading comprehension and logical thinking. Motivated by the human-like reading strategy that follows a hierarchical routine, we propose a novel Hybrid learning model for Abstractive Text Summarization (HATS). The model consists of three major components, a knowledge-based attention network, a multitask encoder-decoder network, and a generative adversarial network, which are consistent with the different stages of the human-like reading strategy. To verify the effectiveness of HATS, we conduct extensive experiments on two real-life datasets, CNN/Daily Mail and Gigaword. The experimental results demonstrate that HATS achieves impressive results on both datasets.

IJCAI Conference 2019 Conference Paper

Knowledge-enhanced Hierarchical Attention for Community Question Answering with Multi-task and Adaptive Learning

Min Yang
Lei Chen
Xiaojun Chen
Qingyao Wu
Wei Zhou
Ying Shen

In this paper, we propose a Knowledge-enhanced Hierarchical Attention for community question answering with Multi-task learning and Adaptive learning (KHAMA). First, we propose a hierarchical attention network to fully fuse knowledge from input documents and a knowledge base (KB) by exploiting the semantic compositionality of the input sequences. The external factual knowledge helps recognize background knowledge (entity mentions and their relationships) and eliminate noise information from long documents that have sophisticated syntactic and semantic structures. In addition, we build multiple CQA models with adaptive boosting and then combine these models to learn a more effective and robust CQA system. Furthermore, KHAMA is a multi-task learning model. It regards CQA as the primary task and question categorization as the auxiliary task, aiming at learning a category-aware document encoder and enhancing the quality of identifying essential information from long questions. Extensive experiments on two benchmarks demonstrate that KHAMA achieves substantial improvements over the compared methods.

AAAI Conference 2019 Short Paper

Semi-Supervised Feature Selection with Adaptive Discriminant Analysis

Weichan Zhong
Xiaojun Chen
Guowen Yuan
Yiqin Li
Feiping Nie

In this paper, we propose a novel Adaptive Discriminant Analysis for semi-supervised feature selection, namely SADA. Instead of computing fixed similarities before performing feature selection, SADA simultaneously learns an adaptive similarity matrix S and a projection matrix W with an iterative method. In each iteration, S is computed from the projected distance with the learned W, and W is computed with the learned S. Therefore, SADA can learn a better projection matrix W by weakening the effect of noise features with the adaptive similarity matrix. Experimental results on 4 data sets show the superiority of SADA compared to 5 semi-supervised feature selection methods.

AAAI Conference 2018 Short Paper

A Semi-Supervised Network Embedding Model for Protein Complexes Detection

Wei Zhao
Jia Zhu
Min Yang
Danyang Xiao
Gabriel Pui Cheong Fung
Xiaojun Chen

A protein complex is a group of associated polypeptide chains that plays essential roles in biological processes. Given a graph representing a protein-protein interaction (PPI) network, it is critical but non-trivial to detect protein complexes. In this paper, we propose a semi-supervised network embedding model that adopts graph convolutional networks to effectively detect densely connected subgraphs. We conduct extensive experiments on two popular PPI networks with various data sizes and densities. The experimental results show that our approach achieves state-of-the-art performance.

AAAI Conference 2018 Short Paper

A Stratified Feature Ranking Method for Supervised Feature Selection

Renjie Chen
Xiaojun Chen
Guowen Yuan
Wenya Sun
Qingyao Wu

Most feature selection methods select the highest-ranked features, which may be highly correlated with each other. In this paper, we propose a Stratified Feature Ranking (SFR) method for supervised feature selection. In the new method, a Subspace Feature Clustering (SFC) is proposed to identify feature clusters, and a stratified feature ranking method is proposed to rank the features such that the high-ranked features are weakly correlated. Experimental results show the superiority of SFR.
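
The idea of ranking across feature clusters so that top-ranked features are not redundant can be sketched as below. This is only an illustration and not the SFR algorithm from the paper: it assumes the clusters and per-feature relevance scores are already given, and it uses a simple round-robin over clusters, which is a choice made for this example. The name `stratified_rank` is hypothetical.

```python
def stratified_rank(scores, clusters):
    """Rank features by taking the best remaining feature of each cluster in turn.

    scores:   dict mapping feature name -> relevance score (higher is better)
    clusters: list of lists of feature names (assumed precomputed)
    Round-robin interleaving is an illustrative assumption, not the paper's rule.
    """
    # Sort features inside each non-empty cluster from most to least relevant.
    ordered = [
        sorted(cluster, key=lambda f: scores[f], reverse=True)
        for cluster in clusters
        if cluster
    ]
    # Visit clusters in order of their strongest feature.
    ordered.sort(key=lambda cluster: scores[cluster[0]], reverse=True)

    ranking = []
    depth = 0
    while any(depth < len(cluster) for cluster in ordered):
        for cluster in ordered:
            if depth < len(cluster):
                ranking.append(cluster[depth])
        depth += 1
    return ranking


if __name__ == "__main__":
    scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.2}
    clusters = [["a", "b"], ["c"], ["d"]]
    print(stratified_rank(scores, clusters))  # ['a', 'c', 'd', 'b']
```

A plain score-based ranking would place "a" and "b" first even though they sit in the same cluster; the interleaved ranking defers "b" until every cluster has contributed one feature.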

AAAI Conference 2018 Short Paper

Discriminative Semi-Supervised Feature Selection via Rescaled Least Squares Regression-Supplement

Guowen Yuan
Xiaojun Chen
Chen Wang
Feiping Nie
Liping Jing

In this paper, we propose a Discriminative Semi-Supervised Feature Selection (DSSFS) method. In this method, a dragging technique is introduced to the Rescaled Linear Square Regression in order to enlarge the distances between different classes. An iterative method is proposed to simultaneously learn the regression coefficients, -draggings matrix and predict the unknown class labels. Experimental results show the superiority of DSSFS.

IJCAI Conference 2018 Conference Paper

PLASTIC: Prioritize Long and Short-term Information in Top-n Recommendation using Adversarial Training

Wei Zhao
Benyou Wang
Jianbo Ye
Yongqiang Gao
Min Yang
Xiaojun Chen

Recommender systems provide users with ranked lists of items based on individuals' preferences and constraints. Two types of models are commonly used to generate ranking results: long-term models and session-based models. While long-term models represent the interactions between users and items that are supposed to change slowly over time, session-based models encode the information of users' interests and the changing dynamics of items' attributes over the short term. In this paper, we propose a PLASTIC model, Prioritizing Long And Short-Term Information in top-n reCommendation using adversarial training. In the adversarial process, we train a generator as a reinforcement learning agent that sequentially recommends the next item to a user. We also train a discriminator that attempts to distinguish the generated list of items from the real recorded list. Extensive experiments show that our model exhibits significantly better performance on two widely used real-world datasets.

AAAI Conference 2017 Short Paper

Attention Based LSTM for Target Dependent Sentiment Classification

Min Yang
Wenting Tu
Jingxuan Wang
Fei Xu
Xiaojun Chen

We present an attention-based bidirectional LSTM approach to improve target-dependent sentiment classification. Our method learns the alignment between the target entities and the most distinguishing features. We conduct extensive experiments on a real-life dataset. The experimental results show that our model achieves state-of-the-art results.
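
The alignment between a target entity and the most distinguishing features can be pictured as attention weights over per-token hidden states. The sketch below uses plain dot-product scoring followed by a softmax; the scoring function, the function name `attention_pool`, and the toy vectors are assumptions for illustration, and the paper's actual attention formulation may differ.

```python
import math


def attention_pool(hidden_states, target):
    """Weight hidden states by their dot-product similarity to a target vector.

    hidden_states: list of equal-length vectors (e.g. one per token)
    target:        vector of the same length representing the target entity
    Returns (weights, context), where weights sum to 1 and context is the
    weighted sum of the hidden states.
    """
    scores = [sum(h_k * t_k for h_k, t_k in zip(h, target)) for h in hidden_states]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [
        sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, context = attention_pool(states, target=[1.0, 1.0])
    print(weights)
    print(context)
```

The third state matches the target most closely, so it receives the largest weight and dominates the pooled context vector.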

AAAI Conference 2017 Short Paper

Detecting Review Spammer Groups

Min Yang
Ziyu Lu
Xiaojun Chen
Fei Xu

With an increasing number of paid writers posting fake reviews to promote or demote target entities through the Internet, review spammer detection has become a crucial and challenging task. In this paper, we propose a three-phase method to address the problem of identifying review spammer groups and individual spammers who get paid for posting fake comments. We evaluate the effectiveness and performance of the approach on a real-life online shopping review dataset from amazon.com. The experimental results show that our model achieves comparable or better performance than previous work on spammer detection.

IS Journal 2017 Journal Article

Local PurTree Spectral Clustering for Massive Customer Transaction Data

Xiaojun Chen
Si Peng
Joshua Zhexue Huang
Feiping Nie
Yong Ming

The clustering of customer transaction data is very important to retail and e-commerce companies. The authors propose a local PurTree spectral clustering algorithm for massive customer transaction data that uses a purchase tree to represent customer transaction data and a PurTree distance to compute the distance between two trees. The new method learns a data similarity matrix from the local distances and the level weights in the PurTree distance simultaneously. An iterative optimization algorithm is proposed to optimize the proposed model. The authors conducted experiments to compare their method with four commonly used clustering methods for transaction data on six real-life datasets. The experimental results show that the new method outperformed the other clustering algorithms.

IJCAI Conference 2017 Conference Paper

Scalable Normalized Cut with Improved Spectral Rotation

Xiaojun Chen
Feiping Nie
Joshua Zhexue Huang
Min Yang

Many spectral clustering algorithms have been proposed and successfully applied to many high-dimensional applications. However, there are still two problems that need to be solved: 1) existing methods for obtaining the final clustering assignments may deviate from the true discrete solution, and 2) most of these methods have very high computational complexity. In this paper, we propose a Scalable Normalized Cut method for clustering large-scale data. In the new method, an efficient method is used to construct a small representation matrix, and clustering is then performed on the representation matrix. In the clustering process, an improved spectral rotation method is proposed to obtain the solution of the final clustering assignments. A series of experiments was conducted on 14 benchmark data sets, and the experimental results show the superior performance of the new method.

IJCAI Conference 2017 Conference Paper

Semi-supervised Feature Selection via Rescaled Linear Regression

Xiaojun Chen
Guowen Yuan
Feiping Nie
Joshua Zhexue Huang

With the rapid increase of complex and high-dimensional sparse data, demands for new methods to select features by exploiting both labeled and unlabeled data have increased. Least squares regression based feature selection methods usually learn a projection matrix and evaluate the importance of features using the projection matrix, an approach that lacks theoretical explanation. Moreover, these methods cannot find a solution of the projection matrix that is both global and sparse. In this paper, we propose a novel semi-supervised feature selection method which can learn a solution of the projection matrix that is both global and sparse. The new method extends the least square regression model by rescaling the regression coefficients in the least square regression with a set of scale factors, which are used for ranking the features. It is shown that the new model can learn a global and sparse solution. Moreover, the introduction of scale factors provides a theoretical explanation for why we can use the projection matrix to rank the features. A simple yet effective algorithm with proved convergence is proposed to optimize the new model. Experimental results on eight real-life data sets show the superiority of the method.