Arrow Research search

Author name cluster

Jiajun Wen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
1 author row

Possible papers (5)

AAAI 2026 Conference Paper

BCE3S: Binary Cross-Entropy Based Tripartite Synergistic Learning for Long-Tailed Recognition

  • Weijia Fan
  • Qiufu Li
  • Jiajun Wen
  • Xiaoyang Peng

For long-tailed recognition (LTR) tasks, high intra-class compactness and inter-class separability in both head and tail classes, as well as balanced separability among all the classifier vectors, are preferred. Existing LTR methods based on cross-entropy (CE) loss not only struggle to learn features with these desirable properties but also couple the imbalanced classifier vectors in the denominator of the Softmax, amplifying imbalance effects in LTR. In this paper, we propose for LTR a binary cross-entropy (BCE)-based tripartite synergistic learning, termed BCE3S, which consists of three components: (1) BCE-based joint learning optimizes both the classifier and the sample features; by decoupling the metrics between features and the imbalanced classifier vectors across multiple Sigmoids, it achieves better compactness and separability among features than CE-based joint learning; (2) BCE-based contrastive learning further improves the intra-class compactness of features; (3) BCE-based uniform learning balances the separability among classifier vectors and, combined with the joint learning, interactively enhances the feature properties. Extensive experiments show that an LTR model trained with BCE3S not only achieves higher compactness and separability among sample features but also balances the classifier vectors' separability, achieving SOTA performance on long-tailed datasets such as CIFAR10-LT, CIFAR100-LT, ImageNet-LT, and iNaturalist2018.
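
As a rough illustration of the decoupling this abstract describes (a minimal sketch in plain Python, not the authors' implementation; logit values and function names are illustrative): Softmax places all classifier scores in one shared denominator, so the probability for a tail class depends on every other class's logit, whereas BCE applies an independent Sigmoid per class.

```python
import math

def softmax(logits):
    # Softmax couples all classes: every probability shares one
    # denominator, so imbalanced classifier vectors influence each other.
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ce_loss(logits, target):
    # Cross-entropy over the coupled Softmax distribution.
    return -math.log(softmax(logits)[target])

def bce_loss(logits, target):
    # BCE with one independent Sigmoid per class: each class's term
    # depends only on its own logit, decoupling the metric between a
    # feature and each classifier vector.
    loss = 0.0
    for k, z in enumerate(logits):
        p = sigmoid(z)
        y = 1.0 if k == target else 0.0
        loss += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return loss

loss_ce = ce_loss([2.0, 0.5, -1.0], target=0)
loss_bce = bce_loss([2.0, 0.5, -1.0], target=0)
```

Raising a non-target logit increases the CE loss for the target (the shared denominator grows), while the target's own BCE term is untouched, which is the decoupling effect the abstract attributes to the multiple Sigmoids.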

AAAI 2026 Conference Paper

From Parameter to Representation: A Closed-Form Approach for Controllable Model Merging

  • Jialin Wu
  • Jian Yang
  • Handing Wang
  • Jiajun Wen
  • Zhiyong Yu

Model merging combines expert models for multitask performance but faces challenges from parameter interference. This has sparked recent interest in controllable model merging, giving users the ability to explicitly balance performance trade-offs. Existing approaches employ a compile-then-query paradigm, performing a costly offline multi-objective optimization to enable fast, preference-aware model generation. This offline stage typically involves iterative search or dedicated training, with complexity that grows exponentially with the number of tasks. To overcome these limitations, we shift the perspective from parameter-space optimization to a direct correction of the model's final representation. Our approach models this correction as an optimal linear transformation, yielding a closed-form solution that replaces the entire offline optimization process with a single-step, architecture-agnostic computation. This solution directly incorporates user preferences, allowing a Pareto-optimal model to be generated on-the-fly with complexity that scales linearly with the number of tasks. Experimental results show our method generates a superior Pareto front with more precise preference alignment and drastically reduced computational cost.
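
To make the "closed-form, single-step" idea concrete, here is a deliberately simplified sketch (plain Python, hypothetical 1-layer representation vectors; this is not the paper's actual formulation, only the general pattern of a preference-weighted least-squares correction with a closed-form optimum):

```python
def closed_form_correction(merged, targets, prefs):
    # Find the additive correction c minimizing
    #   sum_t prefs[t] * ||(merged + c) - targets[t]||^2.
    # Setting the gradient to zero yields the preference-weighted mean
    # residual: a single-step, closed-form solution whose cost scales
    # linearly with the number of tasks (no iterative search).
    total = sum(prefs)
    dim = len(merged)
    c = [0.0] * dim
    for w, tgt in zip(prefs, targets):
        for i in range(dim):
            c[i] += w * (tgt[i] - merged[i])
    return [ci / total for ci in c]

# Hypothetical merged representation and two expert targets.
merged = [1.0, 0.0]
targets = [[2.0, 0.0], [0.0, 2.0]]
equal = closed_form_correction(merged, targets, [1.0, 1.0])
skewed = closed_form_correction(merged, targets, [3.0, 1.0])
```

Changing the preference weights moves the corrected representation along the trade-off between the two task targets on the fly, which mirrors how user preferences enter the closed-form solution directly.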

IJCAI 2025 Conference Paper

MC3D-AD: A Unified Geometry-aware Reconstruction Model for Multi-category 3D Anomaly Detection

  • Jiayi Cheng
  • Can Gao
  • Jie Zhou
  • Jiajun Wen
  • Tao Dai
  • Jinbao Wang

3D Anomaly Detection (AD) is a promising means of controlling the quality of manufactured products. However, existing methods typically require carefully training a task-specific model for each category independently, leading to high cost, low efficiency, and weak generalization. This study presents a novel unified model for Multi-Category 3D Anomaly Detection (MC3D-AD) that utilizes both local and global geometry-aware information to reconstruct normal representations of all categories. First, to learn robust and generalized features across categories, we propose an adaptive geometry-aware masked attention module that extracts geometry variation information to guide the mask attention. Then, we introduce a local geometry-aware encoder, reinforced by the improved mask attention, to encode group-level feature tokens. Finally, we design a global query decoder that uses point cloud position embeddings to improve the decoding process and reconstruction ability. This yields local and global geometry-aware reconstructed feature tokens for the 3D AD task. MC3D-AD is evaluated on two publicly available datasets, Real3D-AD and Anomaly-ShapeNet, and exhibits significant superiority over current state-of-the-art single-category methods, achieving 3.1% and 9.3% improvements in object-level AUROC on Real3D-AD and Anomaly-ShapeNet, respectively. The code is available at https://github.com/iCAN-SZU/MC3D-AD.
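
The scoring step behind reconstruction-based AD can be sketched in a few lines (a minimal illustration in plain Python, not MC3D-AD's actual scoring; token vectors and function names are invented for the example): tokens the model cannot map back to a learned "normal" pattern receive high per-token errors, and the worst token drives the object-level score.

```python
def anomaly_scores(tokens, reconstructed):
    # Per-token squared reconstruction error: a token that deviates
    # from the reconstructed "normal" representation scores high.
    return [sum((a - b) ** 2 for a, b in zip(t, r))
            for t, r in zip(tokens, reconstructed)]

def object_score(tokens, reconstructed):
    # Object-level anomaly score as the worst (maximum) token error.
    return max(anomaly_scores(tokens, reconstructed))

# Hypothetical 2-D feature tokens: the second token is poorly reconstructed.
toks = [[1.0, 0.0], [0.0, 0.0]]
recon = [[1.0, 0.0], [1.0, 1.0]]
scores = anomaly_scores(toks, recon)
```

Per-token scores like these also give a point-level anomaly map for free, which is why reconstruction-based models localize defects as well as detect them.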

NeurIPS 2025 Conference Paper

PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly

  • Liang Ma
  • Jiajun Wen
  • Min Lin
  • Rongtao Xu
  • Xiwen Liang
  • Bingqian Lin
  • Jun Ma
  • Yongxin Wang

While vision-language models (VLMs) have demonstrated promising capabilities in reasoning and planning for embodied agents, their ability to comprehend physical phenomena, particularly within structured 3D environments, remains severely limited. To close this gap, we introduce PhyBlock, a progressive benchmark designed to assess VLMs on physical understanding and planning through robotic 3D block assembly tasks. PhyBlock integrates a novel four-level cognitive hierarchy of assembly tasks alongside targeted Visual Question Answering (VQA) samples, collectively aimed at evaluating progressive spatial reasoning and fundamental physical comprehension, including object properties, spatial relationships, and holistic scene understanding. PhyBlock includes 2600 block tasks (400 assembly tasks, 2200 VQA tasks) and evaluates models across three key dimensions: partial completion, failure diagnosis, and planning robustness. We benchmark 23 state-of-the-art VLMs, highlighting their strengths and limitations in physically grounded, multi-step planning. Our empirical findings indicate that VLMs exhibit pronounced limitations in high-level planning and reasoning, with performance declining notably as task complexity grows. Error analysis reveals persistent difficulties in spatial orientation and dependency reasoning. We position PhyBlock as a unified testbed to advance embodied reasoning, bridging vision-language understanding and real-world physical problem-solving.

IJCAI 2022 Conference Paper

Uncertainty-Guided Pixel Contrastive Learning for Semi-Supervised Medical Image Segmentation

  • Tao Wang
  • Jianglin Lu
  • Zhihui Lai
  • Jiajun Wen
  • Heng Kong

Recently, contrastive learning has shown great potential in medical image segmentation. Due to the lack of expert annotations, however, it is challenging to apply contrastive learning in semi-supervised settings. To solve this problem, we propose a novel uncertainty-guided pixel contrastive learning method for semi-supervised medical image segmentation. Specifically, we construct an uncertainty map for each unlabeled image and then remove the highly uncertain regions to reduce the chance of sampling noisy pixels. The uncertainty map is determined by a well-designed consistency learning mechanism, which generates comprehensive predictions for unlabeled data by encouraging consistent network outputs from two different decoders. In addition, we suggest that the effective global representations learned by an image encoder should be equivariant to different geometric transformations. To this end, we construct an equivariant contrastive loss to strengthen the encoder's global representation learning ability. Extensive experiments conducted on popular medical image benchmarks demonstrate that the proposed method achieves better segmentation performance than the state-of-the-art methods.
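
The uncertainty-masking step can be sketched simply (a minimal plain-Python illustration of the general idea, not the authors' implementation; the disagreement measure, threshold, and names are assumptions): treat the disagreement between the two decoders' pixel-wise predictions as uncertainty, and exclude high-disagreement pixels before sampling contrastive pairs.

```python
def uncertainty_mask(probs_a, probs_b, threshold=0.1):
    # Pixel-wise uncertainty as disagreement between the two decoders'
    # foreground probabilities; pixels whose disagreement exceeds the
    # threshold are marked unreliable.
    return [abs(a - b) <= threshold for a, b in zip(probs_a, probs_b)]

def certain_pixels(pixels, probs_a, probs_b, threshold=0.1):
    # Keep only pixels the two decoders agree on, reducing the chance
    # that contrastive sampling picks up noisy pseudo-labels.
    keep = uncertainty_mask(probs_a, probs_b, threshold)
    return [p for p, k in zip(pixels, keep) if k]

# Hypothetical foreground probabilities from two decoders for 3 pixels.
mask = uncertainty_mask([0.9, 0.5, 0.2], [0.85, 0.1, 0.25])
kept = certain_pixels(["p0", "p1", "p2"],
                      [0.9, 0.5, 0.2], [0.85, 0.1, 0.25])
```

In the paper's actual mechanism the consistency between the two decoders is also a training signal, so the mask tightens as the decoders converge; the sketch above only shows the filtering side.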