Arrow Research search

Author name cluster

Jian Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

160 papers
2 author rows

Possible papers (160)

AAAI Conference 2026 Conference Paper

Diffusion-Based Contextual Reconstruction for Point Cloud Segmentation with Limited Annotations

  • Jiawei Lian
  • Zhengxue Wang
  • Wentao Qu
  • Haobo Jiang
  • Le Hui
  • Jian Yang

Point cloud semantic segmentation is fundamental to 3D scene understanding, but dense annotation requirements limit scalability. Although recent label propagation and contrastive learning methods enhance local consistency, the incomplete object coverage caused by sparse annotations hinders global context modeling, ultimately limiting overall performance. To this end, we propose a diffusion-based contextual reconstruction framework for point cloud semantic segmentation with limited annotations. At its core, our framework guides denoising with semantic predictions, so that improved context reconstruction in turn strengthens the conditional model and the resulting segmentation. Specifically, our contributions include: (1) a diffusion-based segmentation framework that reconstructs contextual semantics from noise under conditional guidance, sharing the decoder with the segmentation module for robust contextual semantic learning; and (2) a conditioning mechanism that dynamically aggregates local context from segmentation features and guides denoising with global spatial structure, significantly enhancing denoising quality and contextual awareness. Notably, we pioneer diffusion models for 3D semantic segmentation with limited annotations, enabling efficient single-step inference. Experiments show robustness across varying annotation ratios and state-of-the-art performance on benchmarks.

AAAI Conference 2026 Conference Paper

From Parameter to Representation: A Closed-Form Approach for Controllable Model Merging

  • Jialin Wu
  • Jian Yang
  • Handing Wang
  • Jiajun Wen
  • Zhiyong Yu

Model merging combines expert models for multitask performance but faces challenges from parameter interference. This has sparked recent interest in controllable model merging, giving users the ability to explicitly balance performance trade-offs. Existing approaches employ a compile-then-query paradigm, performing a costly offline multi-objective optimization to enable fast, preference-aware model generation. This offline stage typically involves iterative search or dedicated training, with complexity that grows exponentially with the number of tasks. To overcome these limitations, we shift the perspective from parameter-space optimization to a direct correction of the model's final representation. Our approach models this correction as an optimal linear transformation, yielding a closed-form solution that replaces the entire offline optimization process with a single-step, architecture-agnostic computation. This solution directly incorporates user preferences, allowing a Pareto-optimal model to be generated on-the-fly with complexity that scales linearly with the number of tasks. Experimental results show our method generates a superior Pareto front with more precise preference alignment and drastically reduced computational cost.
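
For intuition, here is a minimal numpy sketch of one plausible reading of such a closed-form, preference-weighted representation correction; the function name, the least-squares formulation, and the ridge term are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def closed_form_correction(h_merged, h_experts, prefs, ridge=1e-6):
    """Hypothetical sketch: fit a linear map W that pulls the merged model's
    representations toward a preference-weighted mix of expert representations.

    h_merged : (n, d) representations from the merged model
    h_experts: list of (n, d) arrays, one per expert/task
    prefs    : (T,) user preference weights (assumed to sum to 1)
    """
    d = h_merged.shape[1]
    # Preference-weighted least squares:
    #   min_W  sum_t prefs[t] * || h_experts[t] - h_merged @ W ||_F^2
    A = h_merged.T @ h_merged + ridge * np.eye(d)                        # shared (d, d) Gram matrix
    B = sum(p * (h_merged.T @ h_t) for p, h_t in zip(prefs, h_experts))  # preference-weighted (d, d)
    W = np.linalg.solve(A, B)                                            # closed-form solution of A W = B
    return W

# toy usage on synthetic representations
rng = np.random.default_rng(0)
h = rng.normal(size=(128, 16))
experts = [h + rng.normal(scale=0.1, size=h.shape) for _ in range(3)]
W = closed_form_correction(h, experts, prefs=np.array([0.5, 0.3, 0.2]))
corrected = h @ W
```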

JBHI Journal 2026 Journal Article

Point-Supervised Coronary Semantic Segmentation in X-Ray Angiographic Images

  • Ying Chen
  • Danni Ai
  • Jianyu Du
  • Yuanyuan Wang
  • Tianyu Fu
  • Deqiang Xiao
  • Yucong Lin
  • Long Shao

Coronary semantic segmentation in X-ray angiography is essential for computer-aided diagnosis and treatment planning of coronary artery disease (CAD). Despite its importance, this task remains highly challenging due to the complex and interconnected vascular topology, as well as the similar visual characteristics among different branches, making dense pixel-level manual annotation difficult and labor-intensive. To alleviate this burden, we propose a point-supervised coronary semantic segmentation framework that significantly reduces annotation effort without compromising segmentation accuracy. The primary challenge of point-label-based supervision lies in the model's tendency to overfit sparse point labels, leading to limited generalization to pixel-level predictions. To enrich the supervision signals and stabilize training with sparse point labels, we propose an adaptive foreground mask generation module and a region regularization strategy to ensure accurate semantic guidance while maximizing meaningful coverage of the vascular structures. To enhance coronary topology perception and branch differentiation, we propose a multi-task learning framework that jointly performs keypoint detection and coronary semantic segmentation through a shared feature extraction encoder and two task-specific decoders. The experimental results demonstrate that our point-supervised model achieves performance comparable to a fully supervised model and outperforms existing state-of-the-art point-supervised semantic segmentation methods.

AAAI Conference 2026 Conference Paper

RMLer: Synthesizing Novel Objects Across Diverse Categories via Reinforcement Mixing Learning

  • Jun Li
  • Zikun Chen
  • Haibo Chen
  • Shuo Chen
  • Jian Yang

Novel object synthesis by integrating distinct textual concepts from diverse categories remains a significant challenge in text-to-image generation. Existing methods often suffer from insufficient concept mixing, lack of rigorous evaluation, and suboptimal outputs, resulting in conceptual imbalance, superficial combinations, or mere juxtapositions. To address these limitations, we propose Reinforcement Mixing Learning (RMLer), a framework that formulates cross-category concept fusion as a reinforcement learning problem: mixed features serve as states, mixing strategies as actions, and visual outcomes as rewards. Specifically, we design an MLP policy network to predict dynamic coefficients for blending cross-category text embeddings. We further introduce visual rewards based on (1) semantic similarity and (2) compositional balance between the fused object and its constituent concepts, and optimize the policy via proximal policy optimization. At inference time, a selection strategy leverages these rewards to curate the highest-quality fused objects. Extensive experiments demonstrate that RMLer synthesizes coherent, high-fidelity objects from diverse categories and consistently outperforms existing methods. Our work provides a robust framework for generating novel visual concepts, with promising applications in film, gaming, and design.
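
As a rough illustration of the mixing action described above, the following hypothetical PyTorch sketch shows an MLP policy that outputs per-dimension blend coefficients for two concept text embeddings; the architecture and names are assumptions, and the PPO training loop and visual rewards are omitted.

```python
import torch
import torch.nn as nn

class MixingPolicy(nn.Module):
    """Hypothetical sketch: an MLP policy that maps a pair of concept text
    embeddings to per-dimension blend coefficients in [0, 1]; the blended
    embedding would then condition a text-to-image model."""

    def __init__(self, dim=768, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self, e_a, e_b):
        # action: dynamic mixing coefficients predicted from the pair of embeddings
        coeff = torch.sigmoid(self.net(torch.cat([e_a, e_b], dim=-1)))
        # state: the resulting mixed embedding
        mixed = coeff * e_a + (1.0 - coeff) * e_b
        return mixed, coeff

# toy usage with random embeddings standing in for two concept prompts
policy = MixingPolicy()
e_a, e_b = torch.randn(4, 768), torch.randn(4, 768)
mixed, coeff = policy(e_a, e_b)
```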

AAAI Conference 2026 Conference Paper

Shaping Without Tearing: Controllable Diffeomorphic Deformations for Topology-Preserving 3D Point Cloud Augmentation

  • Jian Bi
  • Qianliang Wu
  • Jianjun Qian
  • Lei Luo
  • Jian Yang

Point cloud data augmentation is critical to improving the generalization of 3D deep learning models. However, existing methods often fail to preserve the underlying manifold structure, leading to semantic distortion or topology violation. This causes models to learn untrustworthy features, thereby limiting their representational ability. To overcome these limitations, we propose ManiPoint, a novel point cloud augmentation framework based on diffeomorphisms that explicitly preserves manifold structure during deformation. ManiPoint constructs diffeomorphic transformations via continuous differentiable mappings, ensuring topological consistency and geometric continuity between original and augmented data. To prevent excessive distortion and ensure semantic consistency, we introduce a controllable deformation mechanism that quantitatively constrains the augmentation magnitude and enables fine-grained control over the deformation space. We further provide theoretical analysis indicating that, compared with topologically inconsistent methods, ManiPoint reduces empirical and vicinal risks by generating diverse and structurally reliable samples. Extensive experiments and visualizations on object-level datasets demonstrate that ManiPoint produces high-quality augmentations and consistently improves model robustness over existing baselines. Meanwhile, the scalability of our method is further verified on scene-level datasets.

AAAI Conference 2026 Conference Paper

SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection

  • Yuxuan Li
  • Xiang Li
  • Yunheng Li
  • Yicheng Zhang
  • Yimian Dai
  • Qibin Hou
  • Ming-Ming Cheng
  • Jian Yang

With the rapid advancement of remote sensing technology, high-resolution multi-modal imagery is now more widely accessible. Conventional object detection models are trained on a single dataset, often restricted to a specific imaging modality and annotation format. However, such an approach overlooks the valuable shared knowledge across multi-modalities and limits the model’s applicability in more versatile scenarios. This paper introduces a new task called Multi-Modal Datasets and Multi-Task Object Detection (M2Det) for remote sensing, designed to accurately detect horizontal or oriented objects from any sensor modality. This task poses challenges due to 1) the trade-offs involved in managing multi-modal modelling and 2) the complexities of multi-task optimization. To address these, we establish a benchmark dataset and propose a unified model, SM3Det (Single Model for Multi-Modal datasets and Multi-Task object Detection). SM3Det leverages a grid-level sparse MoE backbone to enable joint knowledge learning while preserving distinct feature representations for different modalities. Furthermore, we propose a novel consistency and synchronization optimization mechanism, allowing it to effectively handle varying levels of learning difficulty across modalities and tasks. Extensive experiments demonstrate SM3Det's effectiveness and generalizability, consistently outperforming the combination of specialized models on individual datasets.

AAAI Conference 2026 Conference Paper

Small but Mighty: Dynamic Wavelet Expert-Guided Fine-Tuning of Large-Scale Models for Optical Remote Sensing Object Segmentation

  • Yanguang Sun
  • Chao Wang
  • Jian Yang
  • Lei Luo

Accurately localizing and segmenting relevant objects from optical remote sensing images (ORSIs) is critical for advancing remote sensing applications. Existing methods are typically built upon moderate-scale pre-trained models and employ diverse optimization strategies to achieve promising performance under full-parameter fine-tuning. In fact, deeper and larger-scale foundation models can provide stronger support for performance improvement. However, due to their massive number of parameters, directly adopting full-parameter fine-tuning leads to pronounced training difficulties, such as excessive GPU memory consumption and high computational costs, which result in extremely limited exploration of large-scale models in existing works. In this paper, we propose a novel dynamic wavelet expert-guided fine-tuning paradigm with fewer trainable parameters, dubbed WEFT, which efficiently adapts large-scale foundation models to ORSIs segmentation tasks by leveraging the guidance of wavelet experts. Specifically, we introduce a task-specific wavelet expert extractor to model wavelet experts from different perspectives and dynamically regulate their outputs, thereby generating trainable features enriched with task-specific information for subsequent fine-tuning. Furthermore, we construct an expert-guided conditional adapter that first enhances the fine-grained perception of frozen features for specific tasks by injecting trainable features, and then iteratively updates the information of both types of features, allowing for efficient fine-tuning. Extensive experiments show that our WEFT not only outperforms 21 state-of-the-art (SOTA) methods on three ORSIs datasets, but also achieves optimal results in camouflage, natural, and medical scenarios.

AAAI Conference 2026 Conference Paper

SpatioTemporal Difference Network for Video Depth Super-Resolution

  • Zhengxue Wang
  • Yuan Wu
  • Xiang Li
  • Zhiqiang Yan
  • Jian Yang

Depth super-resolution has achieved impressive performance, and the incorporation of multi-frame information further enhances reconstruction quality. Nevertheless, statistical analyses reveal that video depth super-resolution remains affected by pronounced long-tailed distributions, with the long-tailed effects primarily manifesting in spatial non-smooth regions and temporal variation zones. To address these challenges, we propose a novel SpatioTemporal Difference Network (STDNet) comprising two core branches: a spatial difference branch and a temporal difference branch. In the spatial difference branch, we introduce a spatial difference mechanism to mitigate the long-tailed issues in spatial non-smooth regions. This mechanism dynamically aligns RGB features with learned spatial difference representations, enabling intra-frame RGB-D aggregation for depth calibration. In the temporal difference branch, we further design a temporal difference strategy that preferentially propagates temporal variation information from adjacent RGB and depth frames to the current depth frame, leveraging temporal difference representations to achieve precise motion compensation in temporal long-tailed areas. Extensive experimental results across multiple datasets demonstrate the effectiveness of our STDNet, outperforming existing approaches.

TMLR Journal 2026 Journal Article

SpikingBrain: Spiking Brain-inspired Large Models

  • Yuqi Pan
  • Yupeng Feng
  • JingHao Zhuang
  • Siyu Ding
  • Han Xu
  • Zehao Liu
  • Bohan Sun
  • Yuhong Chou

Mainstream Transformer-based large language models (LLMs) face significant efficiency bottlenecks: training computation scales quadratically with sequence length, and inference memory grows linearly. These constraints limit their ability to process long sequences effectively. In addition, building large models on non-NVIDIA computing platforms poses major challenges in achieving stable and efficient training and deployment. To address these issues, we introduce SpikingBrain, a new family of brain-inspired models designed for efficient long-context training and inference. SpikingBrain leverages the MetaX GPU cluster and focuses on three core aspects: (1) Model Architecture: linear and hybrid-linear attention architectures with adaptive spiking neurons; (2) Algorithmic Optimizations: an efficient, conversion-based training pipeline compatible with existing LLMs, along with a dedicated spike coding framework; (3) System Engineering: customized training frameworks, operator libraries, and parallelism strategies tailored to the MetaX hardware. Using these techniques, we develop two models: SpikingBrain-7B, a linear LLM, and SpikingBrain-76B, a hybrid-linear MoE LLM. These models demonstrate the feasibility of large-scale LLM development on non-NVIDIA platforms, and our training framework supports weeks of stable training on hundreds of MetaX GPUs with Model FLOPs Utilization (MFU) at expected levels. SpikingBrain achieves performance comparable to open-source Transformer baselines while using exceptionally low data resources (continual pre-training of approximately 150B tokens). Our models also significantly improve long-context efficiency and deliver inference with (partially) constant memory and event-driven spiking behavior. For example, SpikingBrain-7B achieves more than 100× speedup in Time to First Token (TTFT) for 4M-token sequences. Furthermore, the proposed spiking scheme achieves 69.15% sparsity, enabling low-power operation. Overall, this work demonstrates the potential of brain-inspired mechanisms to drive the next generation of efficient and scalable large model design.

AAAI Conference 2025 Conference Paper

Depth-Centric Dehazing and Depth-Estimation from Real-World Hazy Driving Video

  • Junkai Fan
  • Kun Wang
  • Zhiqiang Yan
  • Xiang Chen
  • Shangbing Gao
  • Jun Li
  • Jian Yang

In this paper, we study the challenging problem of simultaneously removing haze and estimating depth from real monocular hazy videos. These tasks are inherently complementary: enhanced depth estimation improves dehazing via the atmospheric scattering model (ASM), while superior dehazing contributes to more accurate depth estimation through the brightness consistency constraint (BCC). To tackle these intertwined tasks, we propose a novel depth-centric learning framework that integrates the ASM model with the BCC constraint. Our key idea is that both ASM and BCC rely on a shared depth estimation network. This network simultaneously exploits adjacent dehazed frames to enhance depth estimation via BCC and uses the refined depth cues to more effectively remove haze through ASM. Additionally, we leverage a non-aligned clear video and its estimated depth to independently regularize the dehazing and depth estimation networks. This is achieved by designing two discriminator networks: D_MFIR enhances high-frequency details in dehazed videos, and D_MDR reduces the occurrence of black holes in low-texture regions. Extensive experiments demonstrate that the proposed method outperforms current state-of-the-art techniques in both video dehazing and depth estimation tasks, especially in real-world hazy scenes.
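
The atmospheric scattering model (ASM) referenced above is the standard haze formation model; in its common form (notation follows the standard literature, not necessarily this paper's):

```latex
% Standard atmospheric scattering model (ASM).
% I(x): observed hazy image, J(x): haze-free scene radiance, A: global atmospheric light,
% t(x): transmission, beta: scattering coefficient, d(x): scene depth.
I(x) = J(x)\, t(x) + A\,\bigl(1 - t(x)\bigr), \qquad t(x) = e^{-\beta\, d(x)}
```

Because transmission t(x) is an exponential function of depth d(x), a better depth estimate directly tightens the transmission used for dehazing, which is the complementarity the abstract exploits.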

AAAI Conference 2025 Conference Paper

Dual Manifold Regularization Steered Robust Representation Learning for Point Cloud Analysis

  • Jian Bi
  • Qianliang Wu
  • Jianjun Qian
  • Lei Luo
  • Jian Yang

With the rapid advancement of 3D scanning technology, point clouds have become a crucial data type in computer vision and machine learning. However, learning robust representations for point clouds remains a significant challenge due to their irregularity and sparsity. In this paper, we propose a novel Dual Manifold Regularization (DMR) framework that makes full use of the properties of positive and negative curvature in manifolds to improve the representation of point clouds. Specifically, we leverage DMR based on hyperbolic and hyperspherical manifolds to address the limitations of traditional single-manifold regularization techniques, including inadequate generalization ability and adaptability to data diversity, as well as the difficulty of capturing complex relationships between data. To begin, we utilize the tree-like structure of the hyperbolic manifold to model the part-whole hierarchical relationships within point clouds. This allows for a more comprehensive representation of the data, improving the model's capability to understand complex shapes. Additionally, we construct positive samples through topological consistency augmentation and employ contrastive learning techniques in the hyperspherical manifold to capture more discriminative features within the data. Our experimental results show that our method outperforms traditional supervised learning and single-manifold regularization techniques in point cloud analysis. Specifically, for shape classification, DMR achieves a new State-Of-The-Art (SOTA) performance with 94.8% Overall Accuracy (OA) on ModelNet40 and 90.7% OA on ScanObjectNN, surpassing the recent SOTA model without increasing the baseline parameters.

IJCAI Conference 2025 Conference Paper

Dual-Perspective United Transformer for Object Segmentation in Optical Remote Sensing Images

  • Yanguang Sun
  • Jiexi Yan
  • Jianjun Qian
  • Chunyan Xu
  • Jian Yang
  • Lei Luo

Automatically segmenting objects from optical remote sensing images (ORSIs) is an important task. Most existing models are primarily based on either convolutional or Transformer features, each offering distinct advantages. Exploiting both advantages is valuable research, but it presents several challenges, including the heterogeneity between the two types of features, high model complexity, and a large number of parameters. However, these issues are often overlooked in existing ORSI methods, causing sub-optimal segmentation. To that end, we propose a novel Dual-Perspective United Transformer (DPU-Former) with a unique structure designed to simultaneously integrate long-range dependencies and spatial details. In particular, we design a global-local mixed attention that captures diverse information from two perspectives and introduces a Fourier-space merging strategy to obviate deviations for efficient fusion. Furthermore, we present a gated linear feed-forward network to increase expressive ability. Additionally, we construct a DPU-Former decoder to aggregate and strengthen features at different layers. Consequently, the DPU-Former model outperforms state-of-the-art methods on multiple datasets. Code: https://github.com/CSYSI/DPU-Former.

NeurIPS Conference 2025 Conference Paper

Enhancing Contrastive Learning with Variable Similarity

  • Haowen Cui
  • Shuo Chen
  • Jun Li
  • Jian Yang

Contrastive learning has achieved remarkable success in self-supervised learning by pretraining a generalizable feature representation based on augmentation invariance. Most existing approaches assume that different augmented views of the same instance (i.e., the positive pairs) remain semantically invariant. However, augmentations of varying extent may introduce semantic discrepancies or even content distortion, and thus the conventional (pseudo) supervision from augmentation invariance may lead to misguided learning objectives. In this paper, we propose a novel method called Contrastive Learning with Variable Similarity (CLVS) to accurately characterize the intrinsic similarity relationships between different augmented views. Our method dynamically adjusts the similarity based on the augmentation extent, and it ensures that strongly augmented views are always assigned lower similarity scores than weakly augmented ones. We provide a theoretical analysis to guarantee the effectiveness of the variable similarity in improving model generalizability. Extensive experiments demonstrate the superiority of our approach, achieving gains of 2.1% on ImageNet-100 and 1.4% on ImageNet-1k compared with state-of-the-art methods.
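
A hypothetical PyTorch sketch of the "variable similarity" idea, assuming a per-sample augmentation-strength scalar and a simple monotone target; this is one plausible instantiation, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def variable_similarity_loss(z1, z2, aug_strength, temperature=0.2, alpha=1.0):
    """Hypothetical sketch of variable similarity: instead of pulling every
    positive pair toward similarity 1, lower the target for strongly augmented
    views (larger aug_strength -> lower target similarity).

    z1, z2       : (B, d) embeddings of two augmented views
    aug_strength : (B,) augmentation extent in [0, 1] for the second view
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    target = 1.0 - alpha * aug_strength            # monotone: stronger aug -> lower target
    sim = (z1 * z2).sum(dim=1)                     # cosine similarity of positive pairs
    pos_loss = (sim - target).pow(2).mean()        # regress toward the variable target
    # standard instance-discrimination term over the batch as the negative part
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    neg_loss = F.cross_entropy(logits, labels)
    return pos_loss + neg_loss
```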

AAAI Conference 2025 Conference Paper

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

  • Xiantao Hu
  • Ying Tai
  • Xu Zhao
  • Chen Zhao
  • Zhenyu Zhang
  • Jun Li
  • Bineng Zhong
  • Jian Yang

Multimodal tracking has garnered widespread attention as a result of its ability to effectively address the inherent limitations of traditional RGB tracking. However, existing multimodal trackers mainly focus on the fusion and enhancement of spatial features or merely leverage the sparse temporal relationships between video frames. These approaches do not fully exploit the temporal correlations in multimodal videos, making it difficult to capture the dynamic changes and motion information of targets in complex scenarios. To alleviate this problem, we propose a unified multimodal spatial-temporal tracking approach named STTrack. In contrast to previous paradigms that rely solely on updating reference information, we introduce a temporal state generator (TSG) that continuously generates a sequence of tokens containing multimodal temporal information. These tokens are used to guide the localization of the target in the next time state, establish long-range contextual relationships between video frames, and capture the temporal trajectory of the target. Furthermore, at the spatial level, we introduce the Mamba fusion and background suppression interactive (BSI) modules, which establish a dual-stage mechanism for coordinating information interaction and fusion between modalities. Extensive comparisons on five benchmark datasets illustrate that STTrack achieves state-of-the-art performance across various multimodal tracking scenarios.

IJCAI Conference 2025 Conference Paper

Federated Learning at the Forefront of Fairness: A Multifaceted Perspective

  • Noorain Mukhtiar
  • Adnan Mahmood
  • Yipeng Zhou
  • Jian Yang
  • Jing Teng
  • Quan Z. Sheng

Fairness in Federated Learning (FL) is emerging as a critical factor, driven by heterogeneous clients' constraints and the need for balanced model performance across various scenarios. In this survey, we delineate a comprehensive classification of state-of-the-art fairness-aware approaches from a multifaceted perspective, i.e., model performance-oriented and capability-oriented. Moreover, we provide a framework to categorize and address various fairness concerns and associated technical aspects, examining their effectiveness in balancing equity and performance within FL frameworks. We further examine several significant evaluation metrics leveraged to measure fairness quantitatively. Finally, we explore exciting open research directions and propose prospective solutions that could drive future advancements in this important area, laying a solid foundation for researchers working toward fairness in FL.

JBHI Journal 2025 Journal Article

FeverMamba: Highlight Fever in Crowd Via Mamba for Remote Fever Screening

  • Mengkai Yan
  • Jianjun Qian
  • Jindi Bao
  • Jian Yang

Remote fever screening based on thermal infrared images can screen feverish faces in real time and plays an important role in vital sign monitoring. Screening methods that rely on short-term crowd facial temperature differences can overcome environmental effects without the need for additional sensors, but effectively encoding these differences to highlight fever faces remains a challenge. To this end, we develop a fever screening framework based on Mamba, which exploits the context-capturing capability of Mamba to model the temperature differences of thermal infrared face images. Furthermore, considering that local contextual associations in the crowd limit the construction of global differences, we propose a shuffle scanning method to break the local contextual associations and construct multiple shuffle scans to achieve a global difference representation. In addition, we design a temperature-aware self-supervised loss function to cope with the situation where data of fever faces is difficult to collect. Finally, we achieve state-of-the-art performance in both supervised and self-supervised cases on a thermal infrared face dataset.

AAAI Conference 2025 Conference Paper

Fine-Tuning Language Models with Collaborative and Semantic Experts

  • Jiaxi Yang
  • Binyuan Hui
  • Min Yang
  • Jian Yang
  • Lei Zhang
  • Qiang Qu
  • Junyang Lin

Recent advancements in large language models (LLMs) have broadened their application scope but revealed challenges in balancing capabilities across general knowledge, coding, and mathematics. To address this, we introduce a Collaborative and Semantic Experts (CoE) approach for supervised fine-tuning (SFT), which employs a two-phase training strategy. Initially, expert training fine-tunes the feed-forward network on specialized datasets, developing distinct experts in targeted domains. Subsequently, expert leveraging synthesizes these trained experts into a structured model with semantic guidance to activate specific experts, enhancing performance and interpretability. Evaluations on comprehensive benchmarks across MMLU, HumanEval, GSM8K, MT-Bench, and AlpacaEval confirm CoE's efficacy, demonstrating improved performance and expert collaboration in diverse tasks, significantly outperforming traditional SFT methods.

IROS Conference 2025 Conference Paper

Finite-time Guiding Vector Fields for Accelerated Path Following of Nonholonomic Robots

  • Jian Yang
  • Junlong Wu
  • Yuan Ouyang
  • Weijia Yao

Guiding vector fields (GVFs) have been widely applied in robotic path-following control. However, most, if not all, of the existing studies derive control algorithms that only render the path-following error asymptotically converging to zero, while more stringent time constraints on the error convergence have not been fully studied. In this paper, by introducing a signum-based function, we propose a finite-time GVF that enables a nonholonomic robot to follow an arbitrary smooth nD desired path within a finite time. Notably, this finite time depends on the initial condition and can be computed in advance. For practical applications, we design a controller based on the proposed GVF for the unicycle model; this controller drives a nonholonomic robot's velocity to align with that of the GVF within a finite time. In addition, we extend the proposed GVF to distributed motion coordination among an arbitrary number of robots. Finally, we conduct two experiments using unmanned ground vehicles to validate the effectiveness of the proposed algorithms.
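
For a concrete picture, a minimal 2D sketch of a guiding vector field with a signum-based correction term, here for the unit circle; the gains, the chosen path, and the exact form of the field are illustrative assumptions rather than the paper's construction.

```python
import numpy as np

def finite_time_gvf(p, k=1.0, alpha=0.5):
    """Hypothetical 2D sketch: guiding vector field for following the unit
    circle phi(x, y) = x^2 + y^2 - 1 = 0, using a signum-based term
    sig(phi)^alpha = sign(phi) * |phi|^alpha with 0 < alpha < 1, the usual
    ingredient for finite-time (rather than asymptotic) error convergence."""
    x, y = p
    phi = x**2 + y**2 - 1.0                          # path-following error (level-set value)
    grad = np.array([2.0 * x, 2.0 * y])              # gradient of phi
    tangent = np.array([-grad[1], grad[0]])          # 90-degree rotation: propagation along the path
    sig = np.sign(phi) * np.abs(phi) ** alpha        # signum-based correction function
    return tangent - k * sig * grad                  # move along the path + converge to it

# toy usage: evaluate the field at a point off the path
print(finite_time_gvf(np.array([1.5, 0.0])))
```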

NeurIPS Conference 2025 Conference Paper

FreeControl: Efficient, Training-Free Structural Control via One-Step Attention Extraction

  • Jiang Lin
  • Xinyu Chen
  • Song Wu
  • Zhiqiu Zhang
  • Jizhi Zhang
  • Ye Wang
  • Qiang Tang
  • Qian Wang

Controlling the spatial and semantic structure of diffusion-generated images remains a challenge. Existing methods like ControlNet rely on handcrafted condition maps and retraining, limiting flexibility and generalization. Inversion-based approaches offer stronger alignment but incur high inference cost due to dual-path denoising. We present FreeControl, a training-free framework for semantic structural control in diffusion models. Unlike prior methods that extract attention across multiple timesteps, FreeControl performs one-step attention extraction from a single, optimally chosen timestep and reuses it throughout denoising. This enables efficient structural guidance without inversion or retraining. To further improve quality and stability, we introduce Latent-Condition Decoupling (LCD): a principled separation of the timestep condition and the noised latent used in attention extraction. LCD provides finer control over attention quality and eliminates structural artifacts. FreeControl also supports compositional control via reference images assembled from multiple sources, enabling intuitive scene layout design and stronger prompt alignment. FreeControl introduces a new paradigm for test-time control, enabling structurally and semantically aligned, visually coherent generation directly from raw images, with the flexibility for intuitive compositional design and compatibility with modern diffusion models at ~5% additional cost.

AAAI Conference 2025 Conference Paper

From Words to Worth: Newborn Article Impact Prediction with LLM

  • Penghai Zhao
  • Qinghua Xing
  • Kairan Dou
  • Jinyu Tian
  • Ying Tai
  • Jian Yang
  • Ming-Ming Cheng
  • Xiang Li

Predicting the future impact of newly published articles is pivotal for advancing scientific discovery in an era of unprecedented scholarly expansion. This paper introduces a promising approach, leveraging the capabilities of LLMs to predict the future impact of newborn articles solely based on titles and abstracts. Breaking away from traditional methods heavily reliant on external data, we propose fine-tuning the LLM to uncover the intrinsic semantic patterns shared by highly impactful articles from a vast collection of text-score pairs. These semantic features are further utilized to predict the proposed indicator, TNCSIsp, which incorporates favorable normalization properties across value, field, and time. To facilitate parameter-efficient fine-tuning of the LLM, we have also meticulously curated a dataset containing over 12,000 entries, each annotated with titles, abstracts, and their corresponding TNCSIsp values. Experimental results reveal an MAE of 0.216 and an NDCG@20 of 0.901, setting new benchmarks in predicting the impact of newborn articles. Finally, we present a real-world application example for predicting the impact of newborn journal articles to demonstrate its noteworthy practical value. Overall, our findings challenge existing paradigms and propose a shift towards a more content-focused prediction of academic impact, offering new insights for article impact prediction.

AAAI Conference 2025 Conference Paper

Harmonious Music-driven Group Choreography with Trajectory-Controllable Diffusion

  • Yuqin Dai
  • Wanlu Zhu
  • Ronghui Li
  • Zeping Ren
  • Xiangzheng Zhou
  • Jixuan Ying
  • Jun Li
  • Jian Yang

Creating group choreography from music is crucial in cultural entertainment and virtual reality, with a focus on generating harmonious movements. Despite growing interest, recent approaches often struggle with two major challenges: multi-dancer collisions and single-dancer foot sliding. To address these challenges, we propose a Trajectory-Controllable Diffusion (TCDiff) framework, which leverages non-overlapping trajectories to ensure coherent and aesthetically pleasing dance movements. To mitigate collisions, we introduce a Dance-Trajectory Navigator that generates collision-free trajectories for multiple dancers, utilizing a distance-consistency loss to maintain optimal spacing. Furthermore, to reduce foot sliding, we present a footwork adaptor that adjusts trajectory displacement between frames, supported by a relative forward-kinematic loss to further reinforce the correlation between movements and trajectories. Experiments demonstrate our method's superiority.

JBHI Journal 2025 Journal Article

Hepatic Vessel Roadmap Prediction Using Adaptive Tracking and Bending Energy Modeling in X-Ray Fluoroscopy

  • Shuo Yang
  • Deqiang Xiao
  • Haixiao Geng
  • Danni Ai
  • Jingfan Fan
  • Tianyu Fu
  • Hong Song
  • Feng Duan

Dynamic visualization of the hepatic vessel is crucial in X-ray image-guided transjugular intrahepatic portosystemic shunt (TIPS) procedures. However, intraoperative breathing and the presence of guidewires complicate the prediction of the vessel position and posture without contrast agents. Respiration compensation techniques use intraoperative respiration modeling to deform the initial vessel roadmap, thereby achieving dynamic vessel prediction in the X-ray image sequence for interventional guidance. Therefore, we propose a novel respiration compensation framework utilizing adaptive tracking and bending energy modeling to achieve stable vessel roadmap prediction under free breathing. First, we introduce an inter-frame rigid displacement compensation module based on domain adaptation and adaptive centroid tracking. This module fits the respiratory curve from the X-ray images, providing temporal motion priors for aligning roadmaps across frames. Second, we propose a novel deformation compensation module based on bending energy modeling to correct the respiratory motion, wherein we utilize the energy features of the guidewires to drive the non-rigid registration. The control points sampled by the bending energy guide the local image to form the deformation field, facilitating the dynamic overlap of the vessel roadmaps in X-ray images. Experimental results on simulated and clinical datasets show an average tracking error of 0.95 ± 0.26 mm and 1.49 ± 0.40 mm, respectively. The effective and fast (mean 57 ms per frame) compensation achieved by our framework has the potential to improve the outcome of liver interventions and reduce the reliance on contrast agents.

NeurIPS Conference 2025 Conference Paper

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization

  • Zhijian Zhuo
  • Yutao Zeng
  • Ya Wang
  • Sijun Zhang
  • Xiaoqing Li
  • Jian Yang
  • Xun Zhou
  • Jinwen Ma

Transformers have become the de facto architecture for a wide range of machine learning tasks, particularly in large language models (LLMs). Despite their remarkable performance, many challenges remain in training deep transformer networks, especially regarding the position of the layer normalization. While Pre-Norm structures facilitate more stable training owing to their stronger identity path, they often lead to suboptimal performance compared to Post-Norm. In this paper, we propose HybridNorm, a simple yet effective hybrid normalization strategy that integrates the advantages of both Pre-Norm and Post-Norm. Specifically, HybridNorm employs QKV normalization within the attention mechanism and Post-Norm in the feed-forward network (FFN) of each transformer block. We provide both theoretical insights and empirical evidence to demonstrate that HybridNorm improves the gradient flow and the model robustness. Extensive experiments on large-scale transformer models, including both dense and sparse variants, show that HybridNorm consistently outperforms both Pre-Norm and Post-Norm approaches across multiple benchmarks. These findings highlight the potential of HybridNorm as a more stable and effective technique for improving the training and performance of deep transformer models. Code is available at https://github.com/BryceZhuo/HybridNorm.
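
A minimal PyTorch sketch of the stated recipe (QKV normalization inside attention, Post-Norm around the FFN); for brevity the normalization is applied to the attention inputs rather than to the projected Q/K/V, and all sizes are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn

class HybridNormBlock(nn.Module):
    """Sketch of a transformer block combining QKV normalization in the
    attention sublayer with Post-Norm around the feed-forward sublayer."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.q_norm = nn.LayerNorm(d_model)
        self.k_norm = nn.LayerNorm(d_model)
        self.v_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.post_norm = nn.LayerNorm(d_model)

    def forward(self, x):
        # QKV normalization inside the attention sublayer (simplified: applied to inputs)
        q, k, v = self.q_norm(x), self.k_norm(x), self.v_norm(x)
        attn_out, _ = self.attn(q, k, v, need_weights=False)
        x = x + attn_out
        # Post-Norm around the feed-forward sublayer
        x = self.post_norm(x + self.ffn(x))
        return x

# toy usage
block = HybridNormBlock()
y = block(torch.randn(2, 16, 512))
```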

NeurIPS Conference 2025 Conference Paper

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

  • Jiajun Shi
  • Jian Yang
  • Jiaheng Liu
  • Xingyuan Bu
  • Jiangjie Chen
  • Junting Zhou
  • Kaijing Ma
  • Zhoufutu Wen

Recent advancements in large language models (LLMs) underscore the need for more comprehensive evaluation methods to accurately assess their reasoning capabilities. Existing benchmarks are often domain-specific and thus cannot fully capture an LLM’s general reasoning potential. To address this limitation, we introduce the Knowledge Orthogonal Reasoning Gymnasium (KORGym), a dynamic evaluation platform inspired by KOR-Bench and Gymnasium. KORGym offers over fifty games in either textual or visual formats and supports interactive, multi-turn assessments with reinforcement learning scenarios. Using KORGym, we conduct extensive experiments on 19 LLMs and 8 VLMs, revealing consistent reasoning patterns within model families and demonstrating the superior performance of closed-source models. Further analysis examines the effects of modality, reasoning strategies, reinforcement learning techniques, and response length on model performance. We expect KORGym to become a valuable resource for advancing LLM reasoning research and developing evaluation methodologies suited to complex, interactive environments.

IJCAI Conference 2025 Conference Paper

Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning

  • Xudong Yan
  • Songhe Feng
  • Yang Zhang
  • Jian Yang
  • Yueguan Lin
  • Haojun Fei

Compositional zero-shot learning (CZSL) aims to recognize novel compositions of attributes and objects learned from seen compositions. Previous works disentangle attributes and objects by extracting shared and exclusive parts between the image pair sharing the same attribute (object), as well as aligning them with pretrained word embeddings to improve unseen attribute-object recognition. Despite the significant achievements of existing efforts, they are hampered by three limitations: (1) The efficacy of disentanglement is compromised due to the influence of the background and the intricate entanglement of attributes with objects in the same parts. (2) Existing word embeddings fail to capture complex multimodal semantic information. (3) Overconfidence exhibited by existing models in seen compositions hinders their generalization to novel compositions. Being aware of these, we propose a novel framework named multimodal large language model (MLLM) embeddings and attribute smoothing guided disentanglement for CZSL. First, we leverage feature adaptive aggregation modules to mitigate the impact of background, and utilize learnable condition masks to capture multi-granularity features for disentanglement. Moreover, the last hidden states of MLLM are employed as word embeddings for their superior representation capabilities. Furthermore, we propose attribute smoothing with auxiliary attributes generated by the large language model (LLM) for seen compositions to address the overconfidence challenge. Extensive experiments demonstrate that our method achieves state-of-the-art performance on three challenging datasets. The supplementary material and source code will be available at https://github.com/xud-yan/Trident.

AAAI Conference 2025 Conference Paper

Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow

  • Jiaqi Bai
  • Hongcheng Guo
  • Zhongyuan Peng
  • Jian Yang
  • Zhoujun Li
  • Mohan Li
  • Zhihong Tian

Large vision-language models show tremendous potential in understanding visual information through human languages. However, they are prone to suffer from object hallucination, i.e., the generated image descriptions contain objects that do not exist in the image. In this paper, we reveal that object hallucination can be attributed to overconfidence in irrelevant visual features when soft visual tokens map to the LLM's word embedding space. Specifically, by figuring out the semantic similarity between visual tokens and LLM's word embedding, we observe that the smoothness of similarity distribution strongly correlates with the emergence of object hallucinations. To mitigate hallucinations, we propose using the Variational Information Bottleneck (VIB) to alleviate overconfidence by introducing stochastic noise, facilitating the constraining of irrelevant information. Furthermore, we propose an entropy-based noise-controlling strategy to enable the injected noise to be adaptively constrained regarding the smoothness of the similarity distribution. We adapt the proposed AdaVIB across distinct model architectures. Experimental results demonstrate that the proposed AdaVIB mitigates object hallucinations by effectively alleviating the overconfidence in irrelevant visual features, with consistent improvements on two object hallucination benchmarks.
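
A hypothetical sketch of entropy-controlled noise injection on soft visual tokens, under the assumption that peakier (more overconfident) similarity distributions receive more noise; the actual AdaVIB formulation, sign conventions, and VIB objective may differ.

```python
import math
import torch
import torch.nn.functional as F

def adaptive_noise(visual_tokens, word_emb, base_sigma=0.1):
    """Hypothetical sketch: scale injected Gaussian noise on soft visual tokens
    by the entropy of their similarity to the LLM word-embedding table.
    A smooth (high-entropy) similarity distribution gets little noise; a peaky
    (overconfident) one gets more, to damp irrelevant-feature overconfidence.

    visual_tokens : (T, d) soft visual tokens mapped into the word-embedding space
    word_emb      : (V, d) LLM word-embedding table
    """
    sim = visual_tokens @ word_emb.t()                       # (T, V) similarities
    p = F.softmax(sim, dim=-1)
    ent = -(p * p.clamp_min(1e-9).log()).sum(dim=-1)         # per-token entropy
    ent = ent / math.log(word_emb.size(0))                   # normalize to [0, 1]
    sigma = base_sigma * (1.0 - ent).unsqueeze(-1)           # peakier -> larger noise scale
    return visual_tokens + sigma * torch.randn_like(visual_tokens)
```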

NeurIPS Conference 2025 Conference Paper

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

  • Tianhao Peng
  • Haochen Wang
  • Yuanxing Zhang
  • Noah Wang
  • Zili Wang
  • Ge Zhang
  • Jian Yang
  • Shihao Li

The advent of Multimodal Large Language Models (MLLMs) has expanded AI capabilities to visual modalities, yet existing evaluation benchmarks remain limited to single-video understanding, overlooking the critical need for multi-video understanding in real-world scenarios (e.g., sports analytics and autonomous driving). To address this significant gap, we introduce MVU-Eval, the first comprehensive benchmark for evaluating Multi-Video Understanding for MLLMs. Specifically, our MVU-Eval mainly assesses eight core competencies through 1,824 meticulously curated question-answer pairs spanning 4,959 videos from diverse domains, addressing both fundamental perception tasks and high-order reasoning tasks. These capabilities are rigorously aligned with real-world applications such as multi-sensor synthesis in autonomous systems and cross-angle sports analytics. Through extensive evaluation of state-of-the-art open-source and closed-source models, we reveal significant performance discrepancies and limitations in current MLLMs' ability to perform understanding across multiple videos. The benchmark will be made publicly available to foster future research.

NeurIPS Conference 2025 Conference Paper

OmniBench: Towards The Future of Universal Omni-Language Models

  • Yizhi Li
  • Ge Zhang
  • Yinghao Ma
  • Ruibin Yuan
  • Hangyu Guo
  • Yiming Liang
  • Jiaheng Liu
  • Noah Wang

Recent advancements in multimodal large language models (MLLMs) have focused on integrating multiple modalities, yet their ability to simultaneously process and reason across different inputs remains underexplored. We introduce OmniBench, a novel benchmark designed to evaluate models' ability to recognize, interpret, and reason across visual, acoustic, and textual inputs simultaneously. We define language models capable of such tri-modal processing as omni-language models (OLMs). OmniBench features high-quality human annotations that require integrated understanding across all modalities. Our evaluation reveals that: i) open-source OLMs show significant limitations in instruction-following and reasoning in tri-modal contexts; and ii) most baseline models perform poorly (below 50% accuracy) even with textual alternatives to image/audio inputs. To address these limitations, we develop OmniInstruct, a 96K-sample instruction tuning dataset for training OLMs. We advocate for developing more robust tri-modal integration techniques and training strategies to enhance OLM performance. Code and data can be found at https://m-a-p.ai/OmniBench/.

IJCAI Conference 2025 Conference Paper

PDDFormer: Pairwise Distance Distribution Graph Transformer for Crystal Material Property Prediction

  • Xiangxiang Shen
  • Zheng Wan
  • Lingfeng Wen
  • Licheng Sun
  • Jian Yang
  • Xuan Tang
  • Shing-Ho J. Lin
  • Xiao He

Crystal structures can be simplified as a periodic point set that repeats across three-dimensional space along an underlying lattice. Traditionally, crystal representation methods rely on descriptors such as lattice parameters, symmetry, and space groups to characterize the structure. However, in reality, atoms in materials always vibrate above absolute zero, causing their positions to fluctuate continuously. This dynamic behavior disrupts the fundamental periodicity of the lattice, making crystal graphs based on static lattice parameters and conventional descriptors discontinuous under slight perturbations. Chemists proposed the pairwise distance distribution (PDD) method to address this. However, the completeness of PDD requires defining a large number of neighboring atoms, leading to high computational costs. Additionally, PDD does not account for atomic information, making it challenging to apply it directly to crystal material property prediction tasks. To tackle these challenges, we introduce the atom-weighted Pairwise Distance Distribution (WPDD) and Unit cell Pairwise Distance Distribution (UPDD) for the first time, applying them to the construction of multi-edge crystal graphs. We demonstrate the continuity and general completeness of crystal graphs under slight atomic position perturbations. Moreover, by modeling PDD as global information and integrating it into matrix-based message passing, we significantly reduce computational costs. Comprehensive evaluation results show that WPDDFormer achieves state-of-the-art predictive accuracy across tasks on benchmark datasets such as the Materials Project and JARVIS-DFT.
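
To make the representation concrete, a toy sketch of an atom-weighted pairwise distance distribution over a finite point set; periodic images, the completeness conditions, and the exact weighting scheme of WPDD are simplified away and should be treated as assumptions.

```python
import numpy as np

def weighted_pdd(coords, weights, k=8):
    """Hypothetical sketch of an atom-weighted pairwise distance distribution:
    for each atom, record the sorted distances to its k nearest neighbours and
    attach a per-atom weight (e.g. species-dependent). A real crystal PDD must
    also include periodic images of the unit cell, which this toy version omits.

    coords  : (n, 3) Cartesian atom positions
    weights : (n,) per-atom weights
    """
    n = coords.shape[0]
    rows = []
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        d = np.sort(d)[1:k + 1]                   # drop the self-distance, keep k nearest
        rows.append(d)
    return np.asarray(rows), np.asarray(weights)  # (n, k) distance rows + per-atom weights

# toy usage on a random 10-atom configuration
rng = np.random.default_rng(0)
rows, w = weighted_pdd(rng.uniform(size=(10, 3)), weights=np.ones(10) / 10, k=4)
```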

AAAI Conference 2025 Conference Paper

Pre-training a Density-Aware Pose Transformer for Robust LiDAR-based 3D Human Pose Estimation

  • Xiaoqi An
  • Lin Zhao
  • Chen Gong
  • Jun Li
  • Jian Yang

With the rapid development of autonomous driving, LiDAR-based 3D Human Pose Estimation (3D HPE) is becoming a research focus. However, due to the noise and sparsity of LiDAR-captured point clouds, robust human pose estimation remains challenging. Most existing methods use temporal information, multi-modal fusion, or SMPL optimization to correct biased results. In this work, we try to obtain sufficient information for 3D HPE only by modeling the intrinsic properties of low-quality point clouds. Hence, a simple yet powerful method is proposed, which provides insights into both the modeling and augmentation of point clouds. Specifically, we first propose a concise and effective density-aware pose transformer (DAPT) to get stable keypoint representations. By using a set of joint anchors and a carefully designed exchange module, valid information is extracted from point clouds with different densities. Then 1D heatmaps are utilized to represent the precise locations of the keypoints. Secondly, a comprehensive LiDAR human synthesis and augmentation method is proposed to pre-train the model, enabling it to acquire a better human body prior. We increase the diversity of point clouds by randomly sampling human positions and orientations and by simulating occlusions through the addition of laser-level masks. Extensive experiments have been conducted on multiple datasets, including the IMU-annotated LidarHuman26M and SLOPER4D, as well as the manually annotated Waymo Open Dataset v2.0 (Waymo) and HumanM3. Our method demonstrates SOTA performance in all scenarios. In particular, compared with LPFormer on Waymo, we reduce the average MPJPE by 10.0 mm; compared with PRN on SLOPER4D, we notably reduce the average MPJPE by 20.7 mm.

JBHI Journal 2025 Journal Article

RDguru: A Conversational Intelligent Agent for Rare Diseases

  • Jian Yang
  • Liqi Shu
  • Huilong Duan
  • Haomin Li

Large language models (LLMs) hold significant promise in clinical practice, yet their real-world adoption is constrained by their propensity to produce erroneous and occasionally harmful outputs, particularly in the intricate domain of rare diseases (RDs). This study introduces RDguru, a conversational intelligent agent leveraging the LangChain framework and powered by GPT-3.5-turbo. RDguru offers a comprehensive suite of functionalities, encompassing evidence-traceable knowledge Q&A and professional medical consultations for differential diagnosis (DDX), integrating authoritative knowledge sources and reliable tools. A novel multi-source fusion diagnostic model, rooted in a deep Q-network, amalgamates three diagnostic recommendation strategies (GPT-4, PheLR, and phenotype matching) to enhance diagnostic recall during medical consultations. Through tailored tools and advanced algorithms for retrieval-augmented generation, RDguru excels in knowledge Q&A, automated phenotype annotation, and RD DDX. A multi-aspect Q&A analysis demonstrates that RDguru outperforms ChatGPT in generating descriptions aligned with authoritative knowledge, quantified by ROUGE scores, GPT-4-based automatic rating, and RAGAs evaluation metrics. Testing on 238 published RD cases reveals that RDguru's top 5 multi-source fusion diagnoses recapture 63.87% of actual diagnoses, marking a 5.47% improvement over the state-of-the-art diagnostic method PheLR. Furthermore, RDguru's consultation strategy proves effective in eliciting diagnostically beneficial phenotypes and refining the prioritization of genuine diagnoses through multi-round phenotype-oriented questioning. Evaluations against established benchmarks and real-world patient data demonstrate RDguru's efficacy and reliability, highlighting its potential to enhance clinical decision-making in the realm of RDs.

AAAI Conference 2025 Conference Paper

Relaxed Rotational Equivariance via G-Biases in Vision

  • Zhiqiang Wu
  • Yingjie Liu
  • Licheng Sun
  • Jian Yang
  • Hanlin Dong
  • Shing-Ho J. Lin
  • Xuan Tang
  • Jinpeng Mi

Group Equivariant Convolution (GConv) can capture rotational equivariance from original data. It assumes uniform and strict rotational equivariance across all features under the transformations of a specific group. However, the presentation or distribution of real-world data rarely conforms to strict rotational equivariance, a phenomenon commonly referred to as Rotational Symmetry-Breaking (RSB) in the system or dataset, which GConv cannot adapt to effectively. Motivated by this, we propose a simple but highly effective method to address this problem, which utilizes a set of learnable biases called G-Biases under the group order to break strict group constraints and thereby achieve a Relaxed Rotational Equivariant Convolution (RREConv). To validate the efficiency of RREConv, we conduct extensive ablation experiments on the discrete rotational group Cn. Experiments demonstrate that the proposed RREConv-based methods achieve excellent performance compared to existing GConv-based methods in both classification and 2D object detection tasks on natural image datasets.

NeurIPS Conference 2025 Conference Paper

Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

  • Ge Wu
  • Shen Zhang
  • Ruijing Shi
  • Shanghua Gao
  • Zhenyuan Chen
  • Lei Wang
  • Zhaowei Chen
  • Hongcheng Gao

REPA and its variants effectively mitigate training challenges in diffusion models by incorporating external visual representations from pretrained models, through alignment between the noisy hidden projections of denoising networks and foundational clean image representations. We argue that this external alignment, which is absent during the entire denoising inference process, falls short of fully harnessing the potential of discriminative representations. In this work, we propose a straightforward method called Representation Entanglement for Generation (REG), which entangles low-level image latents with a single high-level class token from pretrained foundation models for denoising. REG acquires the capability to produce coherent image-class pairs directly from pure noise, substantially improving both generation quality and training efficiency. This is accomplished with negligible additional inference overhead, requiring only one additional token for denoising (<0.5% increase in FLOPs and latency). The inference process concurrently reconstructs both image latents and their corresponding global semantics, where the acquired semantic knowledge actively guides and enhances the image generation process. On ImageNet 256×256, SiT-XL/2 + REG demonstrates remarkable convergence acceleration, achieving 63× and 23× faster training than SiT-XL/2 and SiT-XL/2 + REPA, respectively. More impressively, SiT-L/2 + REG trained for merely 400K iterations outperforms SiT-XL/2 + REPA trained for 4M iterations (10× longer). Code is available at: https://github.com/Martinser/REG.
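
The entanglement step, as described, amounts to denoising one extra token alongside the image latents; a minimal sketch under that reading (the function name and shapes are assumptions, and the denoiser and training objective are omitted):

```python
import torch

def entangle_with_class_token(latent_tokens, class_token):
    """Hypothetical sketch: append one pretrained class/global token to the noisy
    latent token sequence, so the denoiser jointly reconstructs image latents and
    the global semantic token at the cost of a single extra token.

    latent_tokens : (B, N, d) noisy image latent tokens
    class_token   : (B, d) class token from a pretrained foundation encoder
    """
    return torch.cat([latent_tokens, class_token.unsqueeze(1)], dim=1)  # (B, N + 1, d)

# toy usage
tokens = entangle_with_class_token(torch.randn(2, 256, 768), torch.randn(2, 768))
```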

NeurIPS Conference 2025 Conference Paper

See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction

  • Yuan Wu
  • Zhiqiang Yan
  • Yigong Zhang
  • Xiang Li
  • Jian Yang

Occupancy prediction aims to estimate the 3D spatial distribution of occupied regions along with their corresponding semantic labels. Existing vision-based methods perform well on daytime benchmarks but struggle in nighttime scenarios due to limited visibility and challenging lighting conditions. To address these challenges, we propose LIAR, a novel framework that learns illumination-affined representations. LIAR first introduces Selective Low-light Image Enhancement (SLLIE), which leverages the illumination priors from daytime scenes to adaptively determine whether a nighttime image is genuinely dark or sufficiently well-lit, enabling more targeted global enhancement. Building on the illumination maps generated by SLLIE, LIAR further incorporates two illumination-aware components: 2D Illumination-guided Sampling (2D-IGS) and 3D Illumination-driven Projection (3D-IDP), to respectively tackle local underexposure and overexposure. Specifically, 2D-IGS modulates feature sampling positions according to illumination maps, assigning larger offsets to darker regions and smaller ones to brighter regions, thereby alleviating feature degradation in underexposed areas. Subsequently, 3D-IDP enhances semantic understanding in overexposed regions by constructing illumination intensity fields and supplying refined residual queries to the BEV context refinement process. Extensive experiments on both real and synthetic datasets demonstrate the superior performance of LIAR under challenging nighttime scenarios. The source code and pretrained models are available here.

IJCAI Conference 2025 Conference Paper

Self-calibration Enhanced Whole Slide Pathology Image Analysis

  • Haoming Luo
  • XiaoTian Yu
  • Shengxuming Zhang
  • Jiabin Xia
  • Jian Yang
  • Yuning Sun
  • Xiuming Zhang
  • Jing Zhang

Pathology images are considered the "gold standard" for cancer diagnosis and treatment, with gigapixel images providing extensive tissue and cellular information. Existing methods fail to efficiently extract both global structural and local detail features for comprehensive pathology image analysis. To address these limitations, we propose a self-calibration enhanced framework for whole slide pathology image analysis, comprising three components: a global branch, a focus predictor, and a detailed branch. The global branch initially classifies using the pathological thumbnail, while the focus predictor identifies relevant regions for classification based on the last-layer features of the global branch. The detailed extraction branch then assesses whether the magnified regions correspond to the lesion area. Finally, a feature consistency constraint between the global and detail branches ensures that the global branch focuses on the appropriate region and extracts sufficient discriminative features for final identification. These focused discriminative features can facilitate the discovery of novel prognostic tumor markers, from the perspective of feature uniqueness and tissue spatial distribution. Extensive experimental results demonstrate that the proposed framework can rapidly deliver accurate and explainable results for pathological grading and prognosis tasks.

NeurIPS Conference 2025 Conference Paper

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

  • Xeron Du
  • Yifan Yao
  • Kaijing Ma
  • Bingli Wang
  • Tianyu Zheng
  • Minghao Liu
  • Yiming Liang
  • Xiaolong Jin

Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields, particularly in light industry, agriculture, and service-oriented disciplines, remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model Gemini-2.5-Pro achieved the highest accuracy of 63.56% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.

AAAI Conference 2025 Conference Paper

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

  • Xianjie Wu
  • Jian Yang
  • Linzheng Chai
  • Ge Zhang
  • Jiaheng Liu
  • Xeron Du
  • Di Liang
  • Daixin Shu

Recent advancements in Large Language Models (LLMs) have markedly enhanced the interpretation and processing of tabular data, introducing previously unimaginable capabilities. Despite these achievements, LLMs still encounter significant challenges when applied in industrial scenarios, particularly due to the increased complexity of reasoning required with real-world tabular data, underscoring a notable disparity between academic benchmarks and practical applications. To address this discrepancy, we conduct a detailed investigation into the application of tabular data in industrial scenarios and propose a comprehensive and complex benchmark, TableBench, including 18 fields within four major categories of table question answering (TableQA) capabilities. Furthermore, we introduce TableLLM, trained on our meticulously constructed training set TableInstruct, achieving comparable performance with GPT-3.5. Extensive experiments conducted on TableBench indicate that both open-source and proprietary LLMs still have significant room for improvement to meet real-world demands, where the most advanced model, GPT-4, achieves only a modest score compared to humans.

AAAI Conference 2025 Conference Paper

Towards Better Spherical Sliced-Wasserstein Distance Learning with Data-Adaptive Discriminative Projection Direction

  • Hongliang Zhang
  • Shuo Chen
  • Lei Luo
  • Jian Yang

Spherical Sliced-Wasserstein (SSW) has recently been proposed to measure the discrepancy between spherical data distributions in various fields, such as geology, medical domains, computer vision, and deep representation learning. However, in the original SSW, all projection directions are treated equally, which is too idealistic and cannot accurately reflect the importance of different projection directions for various data distributions. To address this issue, we propose a novel data-adaptive Discriminative Spherical Sliced-Wasserstein (DSSW) distance, which utilizes a projected energy function to determine the discriminative projection direction for SSW. In our new DSSW, we introduce two types of projected energy functions to generate the weights for projection directions with complete theoretical guarantees. The first type employs a non-parametric deterministic function that transforms the projected Wasserstein distance into its corresponding weight in each projection direction. This improves the performance of the original SSW distance with negligible additional computational overhead. The second type utilizes a neural network-induced function that learns the projection direction weight through a parameterized neural network based on data projections. This further enhances the performance of the original SSW distance with less extra computational overhead. Finally, we evaluate the performance of our proposed DSSW by comparing it with several state-of-the-art methods across a variety of machine learning tasks, including gradient flows, density estimation on real earth data, and self-supervised learning.
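
A minimal sketch of the data-adaptive weighting idea, under the simplifying assumption of Euclidean data and linear projections rather than the geodesic projections used for spherical data; the softmax-of-distances weighting stands in for the paper's deterministic projected energy function and is only illustrative.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def weighted_sliced_wasserstein(x, y, n_proj=64, beta=1.0, rng=None):
    """Toy sketch of a data-adaptive sliced-Wasserstein distance.

    x, y : (n, d) samples from the two distributions.
    Each random direction gets a weight from a softmax over its own 1D
    Wasserstein distance (a simple deterministic 'projected energy'),
    so more discriminative directions contribute more.
    """
    rng = np.random.default_rng(rng)
    d = x.shape[1]
    dirs = rng.normal(size=(n_proj, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # unit projection directions

    per_dir = np.array([
        wasserstein_distance(x @ theta, y @ theta) for theta in dirs
    ])
    w = np.exp(beta * per_dir)
    w /= w.sum()                                          # softmax weights over directions
    return float(np.sum(w * per_dir))

x = np.random.randn(500, 3)
y = np.random.randn(500, 3) + 0.5
print(weighted_sliced_wasserstein(x, y))
```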

NeurIPS Conference 2025 Conference Paper

UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset

  • Chen Zhao
  • En Ci
  • Yunzhe Xu
  • Tiehan Fan
  • Shanyan Guan
  • Yanhao Ge
  • Jian Yang
  • Ying Tai

Ultra-high-resolution (UHR) text-to-image (T2I) generation has seen notable progress. However, two key challenges remain: (1) the absence of a large-scale high-quality UHR T2I dataset, and (2) the neglect of tailored training strategies for fine-grained detail synthesis in UHR scenarios. To tackle the first challenge, we introduce UltraHR-100K, a high-quality dataset of 100K UHR images with rich captions, offering diverse content and strong visual fidelity. Each image exceeds 3K resolution and is rigorously curated based on detail richness, content complexity, and aesthetic quality. To tackle the second challenge, we propose a frequency-aware post-training method that enhances fine-detail generation in T2I diffusion models. Specifically, we design (i) Detail-Oriented Timestep Sampling (DOTS) to focus learning on detail-critical denoising steps, and (ii) Soft-Weighting Frequency Regularization (SWFR), which leverages the Discrete Fourier Transform (DFT) to softly constrain frequency components, encouraging high-frequency detail preservation. Extensive experiments on our proposed UltraHR-eval4K benchmarks demonstrate that our approach significantly improves the fine-grained detail quality and overall fidelity of UHR image generation. The code is available at https://github.com/NJU-PCALab/UltraHR-100k.
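
The following is a minimal sketch of a soft frequency-weighted loss in the spirit of SWFR, not the paper's exact formulation; the radial weighting and the `alpha` parameter are illustrative assumptions.

```python
import torch

def soft_frequency_loss(pred, target, alpha=2.0):
    """Toy sketch of soft-weighted frequency regularization.

    pred, target : (B, C, H, W) images (or predicted clean latents).
    The loss compares DFT magnitudes and softly up-weights higher
    frequencies so fine detail is not washed out during post-training.
    """
    fp = torch.fft.fftshift(torch.fft.fft2(pred, norm="ortho"), dim=(-2, -1))
    ft = torch.fft.fftshift(torch.fft.fft2(target, norm="ortho"), dim=(-2, -1))

    h, w = pred.shape[-2:]
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    radius = torch.sqrt(xx ** 2 + yy ** 2).to(pred.device)   # 0 at DC, larger toward corners
    weight = 1.0 + alpha * radius                             # soft weighting, not a hard mask

    return (weight * (fp - ft).abs()).mean()

loss = soft_frequency_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
print(loss.item())
```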

AAAI Conference 2025 Conference Paper

XCOT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning

  • Linzheng Chai
  • Jian Yang
  • Tao Sun
  • Hongcheng Guo
  • Jiaheng Liu
  • Bing Wang
  • Xinnian Liang
  • Jiaqi Bai

Chain-of-thought (CoT) has emerged as a powerful technique to elicit reasoning in large language models and improve a variety of downstream tasks. CoT mainly demonstrates excellent performance in English, but its usage in low-resource languages is constrained due to poor language generalization. To bridge the gap among different languages, we propose a cross-lingual instruction fine-tuning framework (xCoT) to transfer knowledge from high-resource languages to low-resource languages. Specifically, the multilingual instruction training data (xCoT-Instruct) is created to encourage the semantic alignment of multiple languages. We introduce cross-lingual in-context few-shot learning (xICL) to accelerate multilingual agreement in instruction tuning, where some fragments of source languages in examples are randomly substituted by their counterpart translations of target languages. During multilingual instruction tuning, we adopt a random online CoT strategy to enhance the multilingual reasoning ability of the large language model by first translating the query to another language and then answering in English. To further facilitate the language transfer, we leverage the high-resource CoT to supervise the training of low-resource languages with cross-lingual distillation. Experimental results demonstrate the superior performance of xCoT in reducing the gap among different languages, highlighting its potential to reduce the cross-lingual gap.

JBHI Journal 2024 Journal Article

A New Multi-Atlas Based Deep Learning Segmentation Framework With Differentiable Atlas Feature Warping

  • Huabing Liu
  • Dong Nie
  • Jian Yang
  • Jinda Wang
  • Zhenyu Tang

Deep learning based multi-atlas segmentation (DL-MA) has achieved state-of-the-art performance in many medical image segmentation tasks, e.g., brain parcellation. In DL-MA methods, atlas-target correspondence is the key to accurate segmentation. In most existing DL-MA methods, such correspondence is usually established using traditional or deep learning based registration methods at the image level, with no further feature-level adaptation. This could cause possible atlas-target feature inconsistency. As a result, the information from atlases often has limited positive, and even counteractive, impact on the final segmentation results. To tackle this issue, in this paper, we propose a new DL-MA framework, where a novel differentiable atlas feature warping module with a new smooth regularization term is presented to establish feature-level atlas-target correspondence. Compared with existing DL-MA methods, in our framework, atlas features containing anatomical prior knowledge are more relevant to the target image feature, leading to highly accurate final segmentation results. We evaluate our framework in the context of brain parcellation using two public MR brain image datasets: LPBA40 and NIREP-NA0. The experimental results demonstrate that our framework outperforms both traditional multi-atlas segmentation (MAS) and state-of-the-art DL-MA methods with statistical significance. Further ablation studies confirm the effectiveness of the proposed differentiable atlas feature warping module.
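
To make the differentiable feature warping idea concrete, here is a generic sketch of warping atlas feature maps with a dense displacement field via bilinear sampling; it is a standard spatial-transformer-style operation, not the paper's specific module, and the names are hypothetical.

```python
import torch
import torch.nn.functional as F

def warp_atlas_features(atlas_feat, flow):
    """Toy sketch of differentiable atlas feature warping with a dense flow field.

    atlas_feat : (B, C, H, W) atlas feature maps.
    flow       : (B, 2, H, W) displacement field (in pixels) from atlas to target.
    Builds a sampling grid and uses bilinear grid_sample, so gradients flow back
    into both the features and the predicted deformation.
    """
    b, _, h, w = atlas_feat.shape
    yy, xx = torch.meshgrid(
        torch.arange(h, dtype=atlas_feat.dtype),
        torch.arange(w, dtype=atlas_feat.dtype),
        indexing="ij",
    )
    base = torch.stack([xx, yy], dim=0).unsqueeze(0)      # (1, 2, H, W) identity grid
    coords = base + flow                                   # absolute sampling positions
    # Normalize to [-1, 1] as required by grid_sample (x first, then y).
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1)                   # (B, H, W, 2)
    return F.grid_sample(atlas_feat, grid, align_corners=True)

feat = torch.randn(2, 16, 32, 32)
flow = torch.zeros(2, 2, 32, 32)                           # zero flow -> identity warp
print(torch.allclose(warp_atlas_features(feat, flow), feat, atol=1e-5))
```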

AAAI Conference 2024 Conference Paper

AltNeRF: Learning Robust Neural Radiance Field via Alternating Depth-Pose Optimization

  • Kun Wang
  • Zhiqiang Yan
  • Huang Tian
  • Zhenyu Zhang
  • Xiang Li
  • Jun Li
  • Jian Yang

Neural Radiance Fields (NeRF) have shown promise in generating realistic novel views from sparse scene images. However, existing NeRF approaches often encounter challenges due to the lack of explicit 3D supervision and imprecise camera poses, resulting in suboptimal outcomes. To tackle these issues, we propose AltNeRF, a novel framework designed to create resilient NeRF representations using self-supervised monocular depth estimation (SMDE) from monocular videos, without relying on known camera poses. SMDE in AltNeRF masterfully learns depth and pose priors to regulate NeRF training. The depth prior enriches NeRF's capacity for precise scene geometry depiction, while the pose prior provides a robust starting point for subsequent pose refinement. Moreover, we introduce an alternating algorithm that harmoniously melds NeRF outputs into SMDE through a consistency-driven mechanism, thus enhancing the integrity of depth priors. This alternation empowers AltNeRF to progressively refine NeRF representations, yielding the synthesis of realistic novel views. Extensive experiments showcase the compelling capabilities of AltNeRF in generating high-fidelity and robust novel views that closely resemble reality.

JBHI Journal 2024 Journal Article

Cross-Anatomy Transfer Learning via Shape-Aware Adaptive Fine-Tuning for 3D Vessel Segmentation

  • Tao Han
  • Danni Ai
  • Jingfan Fan
  • Hong Song
  • Deqiang Xiao
  • Yining Wang
  • Jian Yang

Deep learning methods have recently achieved remarkable performance in vessel segmentation applications, yet require numerous labor-intensive labeled data. To alleviate the requirement of manual annotation, transfer learning methods can potentially be used to acquire the related knowledge of tubular structures from public large-scale labeled vessel datasets for target vessel segmentation in other anatomic sites of the human body. However, the cross-anatomy domain shift is a challenging task due to the formidable discrepancy among various vessel structures in different anatomies, resulting in the limited performance of transfer learning. Therefore, we propose a cross-anatomy transfer learning framework for 3D vessel segmentation, which first generates a pre-trained model on a public hepatic vessel dataset and then adaptively fine-tunes our target segmentation network initialized from the model for segmentation of other anatomic vessels. In the framework, the adaptive fine-tuning strategy is presented to dynamically decide on the frozen or fine-tuned filters of the target network for each input sample with a proxy network. Moreover, we develop a Gaussian-based signed distance map that explicitly encodes vessel-specific shape context. The prediction of the map is added as an auxiliary task in the segmentation network to capture geometry-aware knowledge in the fine-tuning. We demonstrate the effectiveness of our method through extensive experiments on two small-scale datasets of coronary artery and brain vessel. The results indicate the proposed method effectively overcomes the discrepancy of cross-anatomy domain shift to achieve accurate vessel segmentation for these two datasets.
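
The exact form of the Gaussian-based signed distance map is not specified in this abstract, so the sketch below shows one plausible construction from a binary vessel mask that could serve as such an auxiliary regression target; the `sigma` value and the squashing function are assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def gaussian_signed_distance_map(mask, sigma=5.0):
    """Toy sketch of a Gaussian-weighted signed distance map for a binary vessel mask.

    mask : boolean array (2D or 3D), True inside the vessel.
    Positive values inside the structure, negative outside, squashed by a
    Gaussian so the map stays bounded and emphasizes the near-boundary region.
    """
    inside = distance_transform_edt(mask)     # distance to background, measured inside
    outside = distance_transform_edt(~mask)   # distance to foreground, measured outside
    sdm = inside - outside                    # signed Euclidean distance
    return np.sign(sdm) * (1.0 - np.exp(-(sdm ** 2) / (2.0 * sigma ** 2)))

mask = np.zeros((64, 64), dtype=bool)
mask[20:44, 28:36] = True                     # crude tubular region
gsdm = gaussian_signed_distance_map(mask)
print(gsdm.min(), gsdm.max())
```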

NeurIPS Conference 2024 Conference Paper

DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain

  • Kun Wang
  • Zhiqiang Yan
  • Junkai Fan
  • Wanlu Zhu
  • Xiang Li
  • Jun Li
  • Jian Yang

In this paper, we introduce DCDepth, a novel framework for the long-standing monocular depth estimation task. Moving beyond conventional pixel-wise depth estimation in the spatial domain, our approach estimates the frequency coefficients of depth patches after transforming them into the discrete cosine domain. This unique formulation allows for the modeling of local depth correlations within each patch. Crucially, the frequency transformation segregates the depth information into various frequency components, with low-frequency components encapsulating the core scene structure and high-frequency components detailing the finer aspects. This decomposition forms the basis of our progressive strategy, which begins with the prediction of low-frequency components to establish a global scene context, followed by successive refinement of local details through the prediction of higher-frequency components. We conduct comprehensive experiments on NYU-Depth-V2, TOFDC, and KITTI datasets, and demonstrate the state-of-the-art performance of DCDepth. Code is available at https://github.com/w2kun/DCDepth.
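
A toy sketch of the coarse-to-fine intuition, assuming a standard 2D DCT on a single depth patch; the real model predicts coefficients with a network rather than truncating ground-truth ones, so this only illustrates how low-frequency blocks capture the coarse structure.

```python
import numpy as np
from scipy.fft import dctn, idctn

def progressive_dct_reconstruction(depth_patch, stages=(2, 4, 8)):
    """Toy sketch of coarse-to-fine depth reconstruction in the DCT domain.

    depth_patch : (P, P) depth values for one patch.
    Returns one reconstruction per stage, each keeping only the top-left
    k x k block of DCT coefficients (low frequencies first, then finer detail).
    """
    coeffs = dctn(depth_patch, norm="ortho")
    outs = []
    for k in stages:
        kept = np.zeros_like(coeffs)
        kept[:k, :k] = coeffs[:k, :k]          # low-frequency components only
        outs.append(idctn(kept, norm="ortho"))
    return outs

patch = np.add.outer(np.linspace(1.0, 2.0, 8), np.linspace(0.0, 0.5, 8))  # fake depth ramp
for k, rec in zip((2, 4, 8), progressive_dct_reconstruction(patch)):
    print(k, float(np.abs(rec - patch).mean()))   # error shrinks as more frequencies are kept
```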

AAAI Conference 2024 Conference Paper

Divide and Conquer: Hybrid Pre-training for Person Search

  • Yanling Tian
  • Di Chen
  • Yunan Liu
  • Jian Yang
  • Shanshan Zhang

Large-scale pre-training has proven to be an effective method for improving performance across different tasks. Current person search methods use ImageNet pre-trained models for feature extraction, yet this is not an optimal solution due to the gap between the pre-training task and the person search task (as a downstream task). Therefore, in this paper, we focus on pre-training for person search, which involves detecting and re-identifying individuals simultaneously. Although labeled data for person search is scarce, datasets for the two sub-tasks, person detection and re-identification, are relatively abundant. To this end, we propose a hybrid pre-training framework specifically designed for person search using sub-task data only. It consists of a hybrid learning paradigm that handles data with different kinds of supervision, and an intra-task alignment module that alleviates domain discrepancy under limited resources. To the best of our knowledge, this is the first work that investigates how to support full-task pre-training using sub-task data. Extensive experiments demonstrate that our pre-trained model can achieve significant improvements across diverse protocols, such as person search method, fine-tuning data, pre-training data and model backbone. For example, our model improves ResNet50-based NAE by a 10.3% relative gain in mAP. Our code and pre-trained models are released for plug-and-play usage to the person search community (https://github.com/personsearch/PretrainPS).

IJCAI Conference 2024 Conference Paper

Efficiency Calibration of Implicit Regularization in Deep Networks via Self-paced Curriculum-Driven Singular Value Selection

  • Zhe Li
  • Shuo Chen
  • Jian Yang
  • Lei Luo

The generalization of neural networks has been a major focus of research in deep learning. It is often interpreted as an implicit bias towards solutions with specific properties. Especially, in practical applications, it has been observed that linear neural networks (LNN) tend to favor low-rank solutions for matrix completion tasks. However, most existing methods rely on increasing the depth of the neural network to enhance the low rank of solutions, resulting in higher complexity. In this paper, we propose a new explicit regularization method that calibrates the implicit bias towards low-rank trends in matrix completion tasks. Our approach automatically incorporates smaller singular values into the training process using a self-paced learning strategy, gradually restoring matrix information. By jointly using both implicit and explicit regularization, we effectively capture the low-rank structure of LNN and accelerate its convergence. We also analyze how our proposed penalty term interacts with implicit regularization and provide theoretical guarantees for our new model. To evaluate the effectiveness of our method, we conduct a series of experiments on both simulated and real-world data. Our experimental results clearly demonstrate that our method has better robustness and generalization ability compared with other methods.

JBHI Journal 2024 Journal Article

Embedding-Alignment Fusion-Based Graph Convolution Network With Mixed Learning Strategy for 4D Medical Image Reconstruction

  • Jingshu Li
  • Tianyu Fu
  • Hong Song
  • Jingfan Fan
  • Deqiang Xiao
  • Yucong Lin
  • Ying Gu
  • Jian Yang

In recent years, 4D medical imaging, which captures both the structural and motion information of tissue, has attracted increasing attention. The key to 4D image reconstruction is to stack the 2D slices by matching their aligned motion states. In this study, the distribution of the 2D slices with different motion states is modeled as a manifold graph, and the reconstruction is cast as a graph alignment problem. An embedding-alignment fusion-based graph convolution network (GCN) with a mixed-learning strategy is proposed to align the graphs. Herein, the embedding and alignment processes of the graphs interact with each other to realize precise alignment while retaining the manifold distribution. The mixed strategy of self- and semi-supervised learning makes the alignment sparse to avoid mismatches caused by outliers in the graph. In the experiments, the proposed 4D reconstruction approach is validated on different modalities, including Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and Ultrasound (US). We evaluate the reconstruction accuracy and compare it with that of state-of-the-art methods. The experimental results demonstrate that our approach reconstructs more accurate 4D images.

NeurIPS Conference 2024 Conference Paper

Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference

  • Senmao Li
  • Taihang Hu
  • Joost van de Weijer
  • Fahad S. Khan
  • Tao Liu
  • Linxuan Li
  • Shiqi Yang
  • Yaxing Wang

One of the main drawbacks of diffusion models is the slow inference time for image generation. Among the most successful approaches to addressing this problem are distillation methods. However, these methods require considerable computational resources. In this paper, we take another approach to diffusion model acceleration. We conduct a comprehensive study of the UNet encoder and empirically analyze the encoder features. This provides insights regarding how they change during the inference process. In particular, we find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps. This insight motivates us to omit encoder computation at certain adjacent time-steps and reuse encoder features of previous time-steps as input to the decoder in multiple time-steps. Importantly, this allows us to perform decoder computation in parallel, further accelerating the denoising process. Additionally, we introduce a prior noise injection method to improve the texture details in the generated image. Besides the standard text-to-image task, we also validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation. Without utilizing any knowledge distillation technique, our approach accelerates both Stable Diffusion (SD) and DeepFloyd-IF model sampling by 41% and 24% respectively, and DiT model sampling by 34%, while maintaining high-quality generation performance. Our code will be publicly released.
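
A minimal sketch of the encoder-feature caching pattern described above, with hypothetical `encoder`/`decoder` callables standing in for the two halves of a diffusion UNet and a placeholder update rule instead of a real sampler; the paper's parallel decoder execution and prior noise injection are not shown.

```python
import torch

@torch.no_grad()
def denoise_with_encoder_reuse(x, timesteps, encoder, decoder, reuse_every=2):
    """Toy sketch of reusing UNet encoder features at adjacent timesteps.

    encoder(x, t) -> features and decoder(features, x, t) -> noise prediction are
    hypothetical callables. The encoder runs only on every `reuse_every`-th step;
    in between, the cached features from the previous step are fed to the decoder.
    """
    cached = None
    for i, t in enumerate(timesteps):
        if cached is None or i % reuse_every == 0:
            cached = encoder(x, t)          # full step: recompute encoder features
        eps = decoder(cached, x, t)         # decoder always runs
        x = x - 0.1 * eps                   # placeholder update rule, not a real sampler
    return x

# Tiny stand-ins just to show the call pattern.
enc = lambda x, t: x.mean(dim=1, keepdim=True)
dec = lambda f, x, t: x - f
out = denoise_with_encoder_reuse(torch.randn(1, 4, 8, 8), range(10), enc, dec)
print(out.shape)
```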

IJCAI Conference 2024 Conference Paper

Graph Neural Networks for Brain Graph Learning: A Survey

  • Xuexiong Luo
  • Jia Wu
  • Jian Yang
  • Shan Xue
  • Amin Beheshti
  • Quan Z. Sheng
  • David McAlpine
  • Paul Sowman

Exploring the complex structure of the human brain is crucial for understanding its functionality and diagnosing brain disorders. Thanks to advancements in neuroimaging technology, a novel approach has emerged that involves modeling the human brain as a graph-structured pattern, with different brain regions represented as nodes and the functional relationships among these regions as edges. Moreover, graph neural networks (GNNs) have demonstrated a significant advantage in mining graph-structured data. Developing GNNs to learn brain graph representations for brain disorder analysis has recently gained increasing attention. However, there is a lack of systematic survey work summarizing current research methods in this domain. In this paper, we aim to bridge this gap by reviewing brain graph learning works that utilize GNNs. We first introduce the process of brain graph modeling based on common neuroimaging data. Subsequently, we systematically categorize current works based on the type of brain graph generated and the targeted research problems. To make this research accessible to a broader range of interested researchers, we provide an overview of representative methods and commonly used datasets, along with their implementation sources. Finally, we present our insights on future research directions. The repository of this survey is available at https://github.com/XuexiongLuoMQ/Awesome-Brain-Graph-Learning-with-GNNs.

NeurIPS Conference 2024 Conference Paper

Grid4D: 4D Decomposed Hash Encoding for High-Fidelity Dynamic Gaussian Splatting

  • Jiawei Xu
  • Zexin Fan
  • Jian Yang
  • Jin Xie

Recently, Gaussian splatting has received increasing attention in the field of static scene rendering. Due to the low computational overhead and inherent flexibility of explicit representations, plane-based explicit methods are popular ways to predict deformations for Gaussian-based dynamic scene rendering models. However, plane-based methods rely on an inappropriate low-rank assumption and excessively decompose the space-time 4D encoding, resulting in excessive feature overlap and unsatisfactory rendering quality. To tackle these problems, we propose Grid4D, a dynamic scene rendering model based on Gaussian splatting that employs a novel explicit encoding method for the 4D input through hash encoding. Different from plane-based explicit representations, we decompose the 4D encoding into one spatial and three temporal 3D hash encodings without the low-rank assumption. Additionally, we design a novel attention module that generates attention scores in a directional range to aggregate the spatial and temporal features. The directional attention enables Grid4D to more accurately fit the diverse deformations across distinct scene components based on the spatially encoded features. Moreover, to mitigate the inherent lack of smoothness in explicit representation methods, we introduce a smooth regularization term that keeps the deformation predictions from becoming chaotic. Our experiments demonstrate that Grid4D significantly outperforms the state-of-the-art models in visual quality and rendering speed.

AAAI Conference 2024 Conference Paper

Hyperbolic Graph Diffusion Model

  • Lingfeng Wen
  • Xuan Tang
  • Mingjie Ouyang
  • Xiangxiang Shen
  • Jian Yang
  • Daxin Zhu
  • Mingsong Chen
  • Xian Wei

Diffusion generative models (DMs) have achieved promising results in image and graph generation. However, real-world graphs, such as social networks, molecular graphs, and traffic graphs, generally share non-Euclidean topologies and hidden hierarchies. For example, the degree distributions of graphs are mostly power-law distributions. The current latent diffusion model embeds the hierarchical data in a Euclidean space, which leads to distortions and interferes with modeling the distribution. Instead, hyperbolic space has been found to be more suitable for capturing complex hierarchical structures due to its exponential growth property. In order to simultaneously utilize the data generation capabilities of diffusion models and the ability of hyperbolic embeddings to extract latent hierarchical distributions, we propose a novel graph generation method called Hyperbolic Graph Diffusion Model (HGDM), which consists of an auto-encoder to encode nodes into successive hyperbolic embeddings, and a DM that operates in the hyperbolic latent space. HGDM captures the crucial graph structure distributions by constructing a hyperbolic potential node space that incorporates edge information. Extensive experiments show that HGDM achieves better performance in generic graph and molecule generation benchmarks, with a 48% improvement in the quality of graph generation with highly hierarchical structures.

TIST Journal 2024 Journal Article

Improving Faithfulness and Factuality with Contrastive Learning in Explainable Recommendation

  • Haojie Zhuang
  • Wei Zhang
  • Weitong Chen
  • Jian Yang
  • Quan Z. Sheng

Recommender systems have become increasingly important in navigating the vast amount of information and options available in various domains. By tailoring and personalizing recommendations to user preferences and interests, these systems improve the user experience, efficiency, and satisfaction. With a growing demand for transparency and understanding of recommendation outputs, explainable recommender systems have gained growing attention in recent years. Additionally, as user reviews could be considered the rationales behind why the user likes (or dislikes) the products, generating informative and reliable reviews alongside recommendations has thus emerged as a research focus in explainable recommendation. However, the model-generated reviews might contain factually inconsistent contents (i.e., the hallucination issue), which would thus compromise the recommendation rationales. To address this issue, we propose a contrastive learning framework to improve the faithfulness and factuality in explainable recommendation in this article. We further develop different strategies of generating positive and negative examples for contrastive learning, such as back-translation or synonym substitution for positive examples, and editing positive examples or utilizing model-generated texts for negative examples. Our proposed method optimizes the model to distinguish faithful explanations (i.e., positive examples) and unfaithful ones with factual errors (i.e., negative examples), which thus drives the model to generate faithful reviews as explanations while avoiding inconsistent contents. Extensive experiments and analysis on three benchmark datasets show that our proposed model outperforms other review generation baselines in faithfulness and factuality. In addition, the proposed contrastive learning component could be easily incorporated into other explainable recommender systems in a plug-and-play manner.
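
A minimal sketch of the contrastive objective, assuming the explanations have already been encoded into embeddings and that the positives and negatives were produced by the augmentation strategies mentioned above; the InfoNCE-style form is an assumption, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def faithfulness_contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """Toy InfoNCE-style loss over explanation embeddings.

    anchor    : (B, D) embedding of the generated explanation.
    positives : (B, D) embedding of a faithful variant (e.g. back-translated review).
    negatives : (B, K, D) embeddings of perturbed, factually inconsistent variants.
    The model is pushed to score faithful explanations above corrupted ones.
    """
    pos = F.cosine_similarity(anchor, positives, dim=-1) / temperature                 # (B,)
    neg = F.cosine_similarity(anchor.unsqueeze(1), negatives, dim=-1) / temperature    # (B, K)
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1)                                 # (B, 1+K)
    labels = torch.zeros(anchor.size(0), dtype=torch.long)                             # positive is index 0
    return F.cross_entropy(logits, labels)

loss = faithfulness_contrastive_loss(
    torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 5, 128)
)
print(loss.item())
```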

JBHI Journal 2024 Journal Article

Local Contractive Registration With Biomechanical Model: Assessing Microwave Ablation After Compensation for Tissue Shrinkage

  • Dingkun Liu
  • Danni Ai
  • Tianyu Fu
  • Yuanjin Gao
  • Jingfan Fan
  • Hong Song
  • Deqiang Xiao
  • Ping Liang

Microwave ablation (MWA) is a minimally invasive procedure for the treatment of liver tumor. Accumulating clinical evidence has considered the minimal ablative margin (MAM) as a significant predictor of local tumor progression (LTP). In clinical practice, MAM assessment is typically carried out through image registration of pre- and post-MWA images. However, this process faces two main challenges: non-homologous match between tumor and coagulation with inconsistent image appearance, and tissue shrinkage caused by thermal dehydration. These challenges result in low precision when using traditional registration methods for MAM assessment. In this paper, we present a local contractive nonrigid registration method using a biomechanical model (LC-BM) to address these challenges and precisely assess the MAM. The LC-BM contains two consecutive parts: 1) local contractive decomposition (LC-part), which reduces the incorrect match between the tumor and coagulation and quantifies the shrinkage in the external coagulation region, and 2) biomechanical model constraint (BM-part), which compensates for the shrinkage in the internal coagulation region. After quantifying and compensating for tissue shrinkage, the warped tumor is overlaid on the coagulation, and then the MAM is assessed. We evaluated the method using prospectively collected data from 36 patients with 47 liver tumors, comparing LC-BM with 11 state-of-the-art methods. LTP was diagnosed through contrast-enhanced MR follow-up images, serving as the ground truth for tumor recurrence. LC-BM achieved the highest accuracy (97.9%) in predicting LTP, outperforming other methods. Therefore, our proposed method holds significant potential to improve MAM assessment in MWA surgeries.

AAAI Conference 2024 Conference Paper

LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection

  • Hongcheng Guo
  • Jian Yang
  • Jiaheng Liu
  • Jiaqi Bai
  • Boyang Wang
  • Zhoujun Li
  • Tieqiao Zheng
  • Bo Zhang

Log anomaly detection is a key component in the field of artificial intelligence for IT operations (AIOps). Considering log data from various domains, retraining the whole network for unknown domains is inefficient in real industrial scenarios. However, previous deep models merely focused on extracting the semantics of log sequences in the same domain, leading to poor generalization on multi-domain logs. To alleviate this issue, we propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains, where we establish a two-stage process including pre-training and adapter-based tuning stages. Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data. Then, we transfer such knowledge to the target domain via shared parameters. Besides, the Log-Attention module is proposed to supplement the information ignored by log parsing. The proposed method is evaluated on three public datasets and one real-world dataset. Experimental results on multiple benchmarks demonstrate the effectiveness of our LogFormer with fewer trainable parameters and lower training costs.
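
As a generic illustration of the pre-train-then-adapter-tune pipeline (not LogFormer's specific architecture), the sketch below freezes a pretrained encoder and leaves only small bottleneck adapters trainable; dimensions and module names are hypothetical.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Small bottleneck adapter inserted alongside a frozen Transformer block."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, h):
        return h + self.up(self.act(self.down(h)))     # residual adapter

def prepare_for_adapter_tuning(model: nn.Module, adapters: nn.ModuleList):
    """Freeze the pre-trained encoder; only the adapters stay trainable."""
    for p in model.parameters():
        p.requires_grad = False
    for p in adapters.parameters():
        p.requires_grad = True
    return list(adapters.parameters())                  # parameters for the optimizer

backbone = nn.TransformerEncoderLayer(d_model=128, nhead=4)   # stand-in pretrained block
adapters = nn.ModuleList([Adapter(128)])
trainable = prepare_for_adapter_tuning(backbone, adapters)
print(sum(p.numel() for p in trainable))                       # only the adapter parameters remain trainable
```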

NeurIPS Conference 2024 Conference Paper

MambaLLIE: Implicit Retinex-Aware Low Light Enhancement with Global-then-Local State Space

  • Jiangwei Weng
  • Zhiqiang Yan
  • Ying Tai
  • Jianjun Qian
  • Jian Yang
  • Jun Li

Recent advances in low light image enhancement have been dominated by Retinex-based learning framework, leveraging convolutional neural networks (CNNs) and Transformers. However, the vanilla Retinex theory primarily addresses global illumination degradation and neglects local issues such as noise and blur in dark conditions. Moreover, CNNs and Transformers struggle to capture global degradation due to their limited receptive fields. While state space models (SSMs) have shown promise in the long-sequence modeling, they face challenges in combining local invariants and global context in visual data. In this paper, we introduce MambaLLIE, an implicit Retinex-aware low light enhancer featuring a global-then-local state space design. We first propose a Local-Enhanced State Space Module (LESSM) that incorporates an augmented local bias within a 2D selective scan mechanism, enhancing the original SSMs by preserving local 2D dependency. Additionally, an Implicit Retinex-aware Selective Kernel module (IRSK) dynamically selects features using spatially-varying operations, adapting to varying inputs through an adaptive kernel selection process. Our Global-then-Local State Space Block (GLSSB) integrates LESSM and IRSK with layer normalization (LN) as its core. This design enables MambaLLIE to achieve comprehensive global long-range modeling and flexible local feature aggregation. Extensive experiments demonstrate that MambaLLIE significantly outperforms state-of-the-art CNN and Transformer-based methods. Our code is available at https://github.com/wengjiangwei/MambaLLIE.

AAAI Conference 2024 Conference Paper

MCL-NER: Cross-Lingual Named Entity Recognition via Multi-View Contrastive Learning

  • Ying Mo
  • Jian Yang
  • Jiahao Liu
  • Qifan Wang
  • Ruoyu Chen
  • Jingang Wang
  • Zhoujun Li

Cross-lingual named entity recognition (CrossNER) faces challenges stemming from uneven performance due to the scarcity of multilingual corpora, especially for non-English data. While prior efforts mainly focus on data-driven transfer methods, a significant aspect that has not been fully explored is aligning both semantic and token-level representations across diverse languages. In this paper, we propose Multi-view Contrastive Learning for Cross-lingual Named Entity Recognition (MCL-NER). Specifically, we reframe the CrossNER task into a problem of recognizing relationships between pairs of tokens. This approach taps into the inherent contextual nuances of token-to-token connections within entities, allowing us to align representations across different languages. A multi-view contrastive learning framework is introduced to encompass semantic contrasts between source, codeswitched, and target sentences, as well as contrasts among token-to-token relations. By enforcing agreement within both semantic and relational spaces, we minimize the gap between source sentences and their counterparts of both codeswitched and target sentences. This alignment extends to the relationships between diverse tokens, enhancing the projection of entities across languages. We further augment CrossNER by combining self-training with labeled source data and unlabeled target data. Our experiments on the XTREME benchmark, spanning 40 languages, demonstrate the superiority of MCL-NER over prior data-driven and model-based approaches. It achieves a substantial increase of nearly +2.0 F1 scores across a broad spectrum and establishes itself as the new state-of-the-art performer.

NeurIPS Conference 2024 Conference Paper

Novel Object Synthesis via Adaptive Text-Image Harmony

  • Zeren Xiong
  • Zedong Zhang
  • Zikun Chen
  • Shuo Chen
  • Xiang Li
  • Gan Sun
  • Jian Yang
  • Jun Li

In this paper, we study an object synthesis task that combines an object text with an object image to create a new object image. However, most diffusion models struggle with this task, i.e., often generating an object that predominantly reflects either the text or the image due to an imbalance between their inputs. To address this issue, we propose a simple yet effective method called Adaptive Text-Image Harmony (ATIH) to generate novel and surprising objects. First, we introduce a scale factor and an injection step to balance text and image features in cross-attention and to preserve image information in self-attention during the text-image inversion diffusion process, respectively. Second, to better integrate object text and image, we design a balanced loss function with a noise parameter, ensuring both optimal editability and fidelity of the object image. Third, to adaptively adjust these parameters, we present a novel similarity score function that not only maximizes the similarities between the generated object image and the input text/image but also balances these similarities to harmonize text and image integration. Extensive experiments demonstrate the effectiveness of our approach, showcasing remarkable object creations such as a colobus-glass jar. https://xzr52.github.io/ATIH/

NeurIPS Conference 2024 Conference Paper

RoleAgent: Building, Interacting, and Benchmarking High-quality Role-Playing Agents from Scripts

  • Jiaheng Liu
  • Zehao Ni
  • Haoran Que
  • Tao Sun
  • Zekun Wang
  • Jian Yang
  • Jiakai Wang
  • Hongcheng Guo

Believable agents can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication. Recently, generative agents have been proposed to simulate believable human behavior by using Large Language Models. However, the existing method heavily relies on human-annotated agent profiles (e.g., name, age, personality, relationships with others, and so on) for the initialization of each agent, which cannot be scaled up easily. In this paper, we propose a scalable RoleAgent framework to generate high-quality role-playing agents from raw scripts, which includes building and interacting stages. Specifically, in the building stage, we use a hierarchical memory system to extract and summarize the structure and high-level information of each agent from the raw script. In the interacting stage, we propose a novel mechanism with four steps to achieve high-quality interaction between agents. Finally, we introduce a systematic and comprehensive evaluation benchmark called RoleAgentBench to evaluate the effectiveness of our RoleAgent, which includes 100 and 28 roles for 20 English and 5 Chinese scripts, respectively. Extensive experimental results on RoleAgentBench demonstrate the effectiveness of RoleAgent.

NeurIPS Conference 2024 Conference Paper

SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection

  • Yuxuan Li
  • Xiang Li
  • Weijie Li
  • Qibin Hou
  • Li Liu
  • Ming-Ming Cheng
  • Jian Yang

Synthetic Aperture Radar (SAR) object detection has gained significant attention recently due to its irreplaceable all-weather imaging capabilities. However, this research field suffers from both limited public datasets (mostly comprising <2K images with only mono-category objects) and inaccessible source code. To tackle these challenges, we establish a new benchmark dataset and an open-source method for large-scale SAR object detection. Our dataset, SARDet-100K, is the result of intensively surveying, collecting, and standardizing 10 existing SAR detection datasets, providing a large-scale and diverse dataset for research purposes. To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created. With this high-quality dataset, we conducted comprehensive experiments and uncovered a crucial challenge in SAR object detection: the substantial disparities between pretraining on RGB datasets and finetuning on SAR datasets in terms of both data domain and model structure. To bridge these gaps, we propose a novel Multi-Stage with Filter Augmentation (MSFA) pretraining framework that tackles the problems from the perspective of data input, domain transition, and model migration. The proposed MSFA method significantly enhances the performance of SAR object detection models while demonstrating exceptional generalizability and flexibility across diverse models. This work aims to pave the way for further advancements in SAR object detection. The dataset and code are available at https://github.com/zcablii/SARDet_100K.

AAAI Conference 2024 Conference Paper

SGNet: Structure Guided Network via Gradient-Frequency Awareness for Depth Map Super-resolution

  • Zhengxue Wang
  • Zhiqiang Yan
  • Jian Yang

Depth super-resolution (DSR) aims to restore high-resolution (HR) depth from low-resolution (LR) one, where RGB image is often used to promote this task. Recent image guided DSR approaches mainly focus on spatial domain to rebuild depth structure. However, since the structure of LR depth is usually blurry, only considering spatial domain is not very sufficient to acquire satisfactory results. In this paper, we propose structure guided network (SGNet), a method that pays more attention to gradient and frequency domains, both of which have the inherent ability to capture high-frequency structure. Specifically, we first introduce the gradient calibration module (GCM), which employs the accurate gradient prior of RGB to sharpen the LR depth structure. Then we present the Frequency Awareness Module (FAM) that recursively conducts multiple spectrum differencing blocks (SDB), each of which propagates the precise high-frequency components of RGB into the LR depth. Extensive experimental results on both real and synthetic datasets demonstrate the superiority of our SGNet, reaching the state-of-the-art (see Fig. 1). Codes and pre-trained models are available at https://github.com/yanzq95/SGNet.

AAAI Conference 2024 Conference Paper

SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation

  • Xiaoqi An
  • Lin Zhao
  • Chen Gong
  • Nannan Wang
  • Di Wang
  • Jian Yang

High-resolution representation is essential for achieving good performance in human pose estimation models. To obtain such features, existing works utilize high-resolution input images or fine-grained image tokens. However, this dense high-resolution representation brings a significant computational burden. In this paper, we address the following question: "Only sparse human keypoint locations are detected for human pose estimation, is it really necessary to describe the whole image in a dense, high-resolution manner?" Based on dynamic transformer models, we propose a framework that only uses Sparse High-resolution Representations for human Pose estimation (SHaRPose). In detail, SHaRPose consists of two stages. At the coarse stage, the relations between image regions and keypoints are dynamically mined while a coarse estimation is generated. Then, a quality predictor is applied to decide whether the coarse estimation results should be refined. At the fine stage, SHaRPose builds sparse high-resolution representations only on the regions related to the keypoints and provides refined high-precision human pose estimations. Extensive experiments demonstrate the outstanding performance of the proposed method. Specifically, compared to the state-of-the-art method ViTPose, our model SHaRPose-Base achieves 77.4 AP (+0.5 AP) on the COCO validation set and 76.7 AP (+0.5 AP) on the COCO test-dev set, and infers at a speed of 1.4x faster than ViTPose-Base. Code is available at https://github.com/AnxQ/sharpose.

NeurIPS Conference 2024 Conference Paper

Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis

  • Taihang Hu
  • Linxuan Li
  • Joost van de Weijer
  • Hongcheng Gao
  • Fahad S. Khan
  • Jian Yang
  • Ming-Ming Cheng
  • Kai Wang

Although text-to-image (T2I) models exhibit remarkable generation capabilities, they frequently fail to accurately bind semantically related objects or attributes in the input prompts, a challenge termed semantic binding. Previous approaches either involve intensive fine-tuning of the entire T2I model or require users or large language models to specify generation layouts, adding complexity. In this paper, we define semantic binding as the task of associating a given object with its attribute, termed attribute binding, or linking it to other related sub-objects, referred to as object binding. We introduce a novel method called Token Merging (ToMe), which enhances semantic binding by aggregating relevant tokens into a single composite token. This ensures that the object, its attributes and sub-objects all share the same cross-attention map. Additionally, to address potential confusion among main objects with complex textual prompts, we propose end token substitution as a complementary strategy. To further refine our approach in the initial stages of T2I generation, where layouts are determined, we incorporate two auxiliary losses, an entropy loss and a semantic binding loss, to iteratively update the composite token to improve the generation integrity. We conducted extensive experiments to validate the effectiveness of ToMe, comparing it against various existing methods on the T2I-CompBench and our proposed GPT-4o object binding benchmark. Our method is particularly effective in complex scenarios that involve multiple objects and attributes, which previous methods often fail to address. The code will be publicly available at https://github.com/hutaihang/ToMe.
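
A toy sketch of the token-merging step, assuming the group of tokens describing one object and its attributes is already known; simple averaging stands in for the paper's aggregation, and the end-token substitution and auxiliary losses are omitted.

```python
import torch

def merge_tokens(text_embeddings, group_indices):
    """Toy sketch of token merging for semantic binding.

    text_embeddings : (L, D) prompt token embeddings from the text encoder.
    group_indices   : indices of the tokens describing one object and its
                      attributes/sub-objects (e.g. "a", "red", "hat").
    The group is replaced by a single composite token (here: the mean), so all
    merged concepts share one cross-attention map downstream.
    """
    group = text_embeddings[group_indices]                 # (k, D)
    composite = group.mean(dim=0, keepdim=True)            # (1, D) composite token
    keep = [i for i in range(text_embeddings.size(0)) if i not in set(group_indices)]
    first = min(group_indices)
    kept = text_embeddings[keep]
    # Re-insert the composite token at the position of the first merged token.
    return torch.cat([kept[:first], composite, kept[first:]], dim=0)

emb = torch.randn(10, 768)
print(merge_tokens(emb, [3, 4, 5]).shape)    # torch.Size([8, 768])
```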

JBHI Journal 2023 Journal Article

Compressibility Analysis of Functional Near-Infrared Spectroscopy Signals in Children With Attention-Deficit/Hyperactivity Disorder

  • Yue Gu
  • Shuo Miao
  • Yao Zhang
  • Jian Yang
  • Xiaoli Li

Functional near-infrared spectroscopy (fNIRS) as an emerging optical neuroimaging technique has attracted the interest and attention of many investigators. With the growth of fNIRS data volume, effective data compression methods are urgent. Compressive sensing (CS) has been demonstrated a promising tool to deal with biomedical data. However, whether the compressibility of fNIRS data can discriminate different brain states is unclear. In this study, the fNIRS signals from fifteen attention-deficit/hyperactivity disorder (ADHD) children and fifteen typically developing (TD) children were recorded during an N-back task and a Go/NoGo task respectively. A block sparse Bayesian learning-based CS method was used to reconstruct the compressed fNIRS data. To assess the performance of the CS method, we adopted two metrics, structural similarity index (SSIM) and mean squared error (MSE), both of them effective in evaluating the compressibility of fNIRS data. Then, the two metrics were analyzed to discriminate the brain states of ADHD children and TD children during the two tasks using the multivariate pattern analysis (MVPA) method. As indicated by the results, the CS method could reconstruct the compressed fNIRS data with high reconstruction quality at different compression ratios ($\text{SSIM} > 0.988$ and $\text{MSE} < 1.2 \times 10^{-4}$). Furthermore, the MVPA method could distinguish different brain states with high accuracy, and identify that the prefrontal cortex is a key brain region for distinguishing ADHD vs. TD or N-back vs. Go/NoGo. These findings indicated that CS is very promising for the storage and transmission of massive fNIRS data, and the compressibility of fNIRS data is a potential biomarker for the diagnosis of ADHD.

AAAI Conference 2023 Conference Paper

Curriculum Temperature for Knowledge Distillation

  • Zheng Li
  • Xiang Li
  • Lingfeng Yang
  • Borui Zhao
  • Renjie Song
  • Lei Luo
  • Jun Li
  • Jian Yang

Most existing distillation methods ignore the flexible role of the temperature in the loss function and fix it as a hyper-parameter that can be decided by an inefficient grid search. In general, the temperature controls the discrepancy between two distributions and can faithfully determine the difficulty level of the distillation task. Keeping a constant temperature, i.e., a fixed level of task difficulty, is usually sub-optimal for a growing student during its progressive learning stages. In this paper, we propose a simple curriculum-based technique, termed Curriculum Temperature for Knowledge Distillation (CTKD), which controls the task difficulty level during the student's learning career through a dynamic and learnable temperature. Specifically, following an easy-to-hard curriculum, we gradually increase the distillation loss w.r.t. the temperature, leading to increased distillation difficulty in an adversarial manner. As an easy-to-use plug-in technique, CTKD can be seamlessly integrated into existing knowledge distillation frameworks and brings general improvements at a negligible additional computation cost. Extensive experiments on CIFAR-100, ImageNet-2012, and MS-COCO demonstrate the effectiveness of our method.
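
A minimal sketch of a learnable, adversarial distillation temperature: the gradient-reversal trick makes the temperature ascend the KD loss while the student descends it, and `lam` plays the role of the easy-to-hard curriculum weight; the exact loss scaling is an assumption.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Flips the gradient so the temperature is trained to *increase* the KD loss."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

def ctkd_loss(student_logits, teacher_logits, log_temp, lam=1.0):
    """KD loss with a learnable, adversarial temperature (curriculum via `lam`).

    `log_temp` is a learnable scalar; `lam` is gradually increased following an
    easy-to-hard schedule so the distillation task becomes progressively harder.
    """
    temp = torch.exp(GradReverse.apply(log_temp, lam))        # keep temperature positive
    p_teacher = F.softmax(teacher_logits / temp, dim=-1)
    log_p_student = F.log_softmax(student_logits / temp, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temp.detach() ** 2

log_temp = torch.nn.Parameter(torch.zeros(()))                # optimized jointly with the student
loss = ctkd_loss(torch.randn(8, 10), torch.randn(8, 10), log_temp, lam=0.5)
loss.backward()
print(float(log_temp.grad))                                   # reversed gradient reaches the temperature
```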

AAAI Conference 2023 Conference Paper

DesNet: Decomposed Scale-Consistent Network for Unsupervised Depth Completion

  • Zhiqiang Yan
  • Kun Wang
  • Xiang Li
  • Zhenyu Zhang
  • Jun Li
  • Jian Yang

Unsupervised depth completion aims to recover dense depth from the sparse one without using the ground-truth annotation. Although depth measurement obtained from LiDAR is usually sparse, it contains valid and real distance information, i.e., scale-consistent absolute depth values. Meanwhile, scale-agnostic counterparts seek to estimate relative depth and have achieved impressive performance. To leverage both the inherent characteristics, we thus suggest to model scale-consistent depth upon unsupervised scale-agnostic frameworks. Specifically, we propose the decomposed scale-consistent learning (DSCL) strategy, which disintegrates the absolute depth into relative depth prediction and global scale estimation, contributing to individual learning benefits. But unfortunately, most existing unsupervised scale-agnostic frameworks heavily suffer from depth holes due to the extremely sparse depth input and weak supervisory signal. To tackle this issue, we introduce the global depth guidance (GDG) module, which attentively propagates dense depth reference into the sparse target via novel dense-to-sparse attention. Extensive experiments show the superiority of our method on outdoor KITTI, ranking 1st and outperforming the best method, KBNet, by more than 12% in RMSE. Additionally, our approach achieves state-of-the-art performance on the indoor NYUv2 benchmark as well.

AAAI Conference 2023 Conference Paper

Exploratory Inference Learning for Scribble Supervised Semantic Segmentation

  • Chuanwei Zhou
  • Zhen Cui
  • Chunyan Xu
  • Cao Han
  • Jian Yang

Scribble supervised semantic segmentation has achieved great advances in pseudo label exploitation, yet suffers from insufficient label exploration over the mass of unannotated regions. In this work, we propose a novel exploratory inference learning (EIL) framework, which facilitates efficient probing of unlabeled pixels and promotes selecting confident candidates for boosting the evolved segmentation. The exploration of unannotated regions is formulated as an iterative decision-making process, where a policy searcher learns to infer in the unknown space and the reward to the exploratory policy is based on a contrastive measurement of candidates. In particular, we devise the contrastive reward with intra-class attraction and inter-class repulsion in the feature space w.r.t. the pseudo labels. The unlabeled exploration and the labeled exploitation are jointly balanced to improve the segmentation, and framed in a closed-loop end-to-end network. Comprehensive evaluations on the benchmark datasets (PASCAL VOC 2012 and PASCAL Context) demonstrate the superiority of our proposed EIL when compared with other state-of-the-art methods for the scribble-supervised semantic segmentation problem.

NeurIPS Conference 2023 Conference Paper

Fine-Grained Visual Prompting

  • Lingfeng Yang
  • Yueze Wang
  • Xiang Li
  • Xinlong Wang
  • Jian Yang

Vision-Language Models (VLMs), such as CLIP, have demonstrated impressive zero-shot transfer capabilities in image-level visual perception. However, these models have shown limited performance in instance-level tasks that demand precise localization and recognition. Previous works have suggested that incorporating visual prompts, such as colorful boxes or circles, can improve the ability of models to recognize objects of interest. Nonetheless, compared to language prompting, visual prompting designs are rarely explored. Existing approaches, which employ coarse visual cues such as colorful boxes or circles, often result in sub-optimal performance due to the inclusion of irrelevant and noisy pixels. In this paper, we carefully study the visual prompting designs by exploring more fine-grained markings, such as segmentation masks and their variations. In addition, we introduce a new zero-shot framework that leverages pixel-level annotations acquired from a generalist segmentation model for fine-grained visual prompting. Consequently, our investigation reveals that a straightforward application of blur outside the target mask, referred to as the Blur Reverse Mask, exhibits exceptional effectiveness. This proposed prompting strategy leverages the precise mask annotations to reduce focus on weakly related regions while retaining spatial coherence between the target and the surrounding background. Our Fine-Grained Visual Prompting (FGVP) demonstrates superior performance in zero-shot comprehension of referring expressions on the RefCOCO, RefCOCO+, and RefCOCOg benchmarks. It outperforms prior methods by an average margin of 3.0% to 4.6%, with a maximum improvement of 12.5% on the RefCOCO+ testA subset. The part detection experiments conducted on the PACO dataset further validate the preponderance of FGVP over existing visual prompting techniques. Code is available at https://github.com/ylingfeng/FGVP.
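
A toy sketch of the Blur Reverse Mask prompt, assuming a segmentation mask is already available (e.g. from a generalist segmentation model); the blur strength `sigma` is an illustrative choice.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_reverse_mask(image, mask, sigma=8.0):
    """Toy sketch of the 'Blur Reverse Mask' visual prompt.

    image : (H, W, 3) float array in [0, 1].
    mask  : (H, W) boolean foreground mask for the referred object.
    Everything *outside* the mask is Gaussian-blurred, keeping the target sharp
    while suppressing weakly related background before feeding the image to a VLM.
    """
    blurred = np.stack(
        [gaussian_filter(image[..., c], sigma=sigma) for c in range(image.shape[-1])],
        axis=-1,
    )
    m = mask[..., None].astype(image.dtype)
    return m * image + (1.0 - m) * blurred

img = np.random.rand(64, 64, 3)
msk = np.zeros((64, 64), dtype=bool)
msk[16:48, 16:48] = True
print(blur_reverse_mask(img, msk).shape)
```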

JBHI Journal 2023 Journal Article

M-CSAFN: Multi-Color Space Adaptive Fusion Network for Automated Port-Wine Stains Segmentation

  • Jinrong Mu
  • Yucong Lin
  • Xianqi Meng
  • Jingfan Fan
  • Danni Ai
  • Defu Chen
  • Haixia Qiu
  • Jian Yang

Automatic segmentation of port-wine stains (PWS) from clinical images is critical for accurate diagnosis and objective assessment of PWS. However, this is a challenging task due to the color heterogeneity, low contrast, and indistinguishable appearance of PWS lesions. To address such challenges, we propose a novel multi-color space adaptive fusion network (M-CSAFN) for PWS segmentation. First, a multi-branch detection model is constructed based on six typical color spaces, which utilizes rich color texture information to highlight the difference between lesions and surrounding tissues. Second, an adaptive fusion strategy is used to fuse complementary predictions, which addresses the significant differences within the lesions caused by color heterogeneity. Third, a structural similarity loss with color information is proposed to measure the detail error between predicted lesions and ground-truth lesions. Additionally, a PWS clinical dataset consisting of 1413 image pairs was established for the development and evaluation of PWS segmentation algorithms. To verify the effectiveness and superiority of the proposed method, we compared it with other state-of-the-art methods on our collected dataset and four publicly available skin lesion datasets (ISIC 2016, ISIC 2017, ISIC 2018, and PH2). The experimental results show that our method achieves remarkable performance in comparison with other state-of-the-art methods on our collected dataset, achieving 92.29% and 86.14% on the Dice and Jaccard metrics, respectively. Comparative experiments on other datasets also confirmed the reliability and potential capability of M-CSAFN in skin lesion segmentation.

AAMAS Conference 2023 Conference Paper

Multi-Agent Path Finding via Reinforcement Learning with Hybrid Reward

  • Cheng Zhao
  • Liansheng Zhuang
  • Haonan Liu
  • Yihong Huang
  • Jian Yang

Multi-agent path finding (MAPF) aims to find a set of conflict-free paths for multiple agents so that each agent can reach its destination while optimizing a global cost. Recently, learning-based methods have gained much attention due to their better real-time performance and scalability. However, most existing learning-based methods suffer from poor cooperation among agents since only local observations are used to make decisions. Meanwhile, methods that focus solely on team benefits perform poorly due to a lack of individual exploration. To address this problem, this paper proposes a novel Hybrid Reward Path Finding (HRPF) method, which employs global information to learn a cooperation mechanism for agents during training, and embeds it in distributed networks to generate strategies during execution. HRPF encourages agents to learn strategies from a new type of reward function that decomposes a complex MAPF task into a team task and individual tasks. Experiments on random obstacle grid worlds show that HRPF performs significantly better in success rate and collision rate than state-of-the-art learning-based methods.

AAAI Conference 2023 Conference Paper

Recurrent Structure Attention Guidance for Depth Super-resolution

  • Jiayi Yuan
  • Haobo Jiang
  • Xiang Li
  • Jianjun Qian
  • Jun Li
  • Jian Yang

Image guidance is an effective strategy for depth super-resolution. Generally, most existing methods employ hand-crafted operators to decompose the high-frequency (HF) and low-frequency (LF) ingredients from low-resolution depth maps and guide the HF ingredients by directly concatenating them with image features. However, the hand-designed operators usually cause inferior HF maps (e.g., distorted or structurally missing) due to the diverse appearance of complex depth maps. Moreover, the direct concatenation often results in weak guidance because not all image features have a positive effect on the HF maps. In this paper, we develop a recurrent structure attention guided (RSAG) framework, consisting of two important parts. First, we introduce a deep contrastive network with multi-scale filters for adaptive frequency-domain separation, which adopts contrastive networks from large filters to small ones to calculate the pixel contrasts for adaptive high-quality HF predictions. Second, instead of the coarse concatenation guidance, we propose a recurrent structure attention block, which iteratively utilizes the latest depth estimation and the image features to jointly select clear patterns and boundaries, aiming at providing refined guidance for accurate depth recovery. In addition, we fuse the features of HF maps to enhance the edge structures in the decomposed LF maps. Extensive experiments show that our approach obtains superior performance compared with state-of-the-art depth super-resolution methods. Our code is available at: https://github.com/Yuanjiayii/DSR-RSAG.

NeurIPS Conference 2023 Conference Paper

SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation

  • Haobo Jiang
  • Mathieu Salzmann
  • Zheng Dang
  • Jin Xie
  • Jian Yang

In this paper, we introduce an SE(3) diffusion model-based point cloud registration framework for 6D object pose estimation in real-world scenarios. Our approach formulates the 3D registration task as a denoising diffusion process, which progressively refines the pose of the source point cloud to obtain a precise alignment with the model point cloud. Training our framework involves two operations: An SE(3) diffusion process and an SE(3) reverse process. The SE(3) diffusion process gradually perturbs the optimal rigid transformation of a pair of point clouds by continuously injecting noise (perturbation transformation). By contrast, the SE(3) reverse process focuses on learning a denoising network that refines the noisy transformation step-by-step, bringing it closer to the optimal transformation for accurate pose estimation. Unlike standard diffusion models used in linear Euclidean spaces, our diffusion model operates on the SE(3) manifold. This requires exploiting the linear Lie algebra $\mathfrak{se}(3)$ associated with SE(3) to constrain the transformation transitions during the diffusion and reverse processes. Additionally, to effectively train our denoising network, we derive a registration-specific variational lower bound as the optimization objective for model learning. Furthermore, we show that our denoising network can be constructed with a surrogate registration model, making our approach applicable to different deep registration networks. Extensive experiments demonstrate that our diffusion registration framework presents outstanding pose estimation performance on the real-world TUD-L, LINEMOD, and Occluded-LINEMOD datasets.
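As a rough illustration of the forward (perturbation) step on the SE(3) manifold described above, noise can be injected through the Lie algebra: sample a random rotation vector, exponentiate it to a rotation, and compose it with the current pose, while adding Gaussian noise to the translation. The sketch below makes that concrete under assumed noise magnitudes; it is not the paper's noise schedule or implementation.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def perturb_se3(R, t, sigma_rot=0.05, sigma_trans=0.01):
    """One forward-diffusion-style perturbation of a rigid transform (R, t).

    Noise is injected via the Lie algebra: a random rotation vector (so(3))
    is exponentiated and composed with R, and Gaussian noise is added to t.
    The noise magnitudes here are illustrative assumptions, not the paper's.
    """
    xi_rot = np.random.randn(3) * sigma_rot            # axis-angle noise
    R_noise = Rotation.from_rotvec(xi_rot).as_matrix()  # exp map to SO(3)
    return R_noise @ R, t + np.random.randn(3) * sigma_trans
```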

AAAI Conference 2023 Conference Paper

Structure Flow-Guided Network for Real Depth Super-resolution

  • Jiayi Yuan
  • Haobo Jiang
  • Xiang Li
  • Jianjun Qian
  • Jun Li
  • Jian Yang

Real depth super-resolution (DSR), unlike synthetic settings, is a challenging task due to the structural distortion and the edge noise caused by the natural degradation in real-world low-resolution (LR) depth maps. These defects result in significant structure inconsistency between the depth map and the RGB guidance, which potentially confuses the RGB-structure guidance and thereby degrades the DSR quality. In this paper, we propose a novel structure flow-guided DSR framework, where a cross-modality flow map is learned to guide the RGB-structure information transferring for precise depth upsampling. Specifically, our framework consists of a cross-modality flow-guided upsampling network (CFUNet) and a flow-enhanced pyramid edge attention network (PEANet). CFUNet contains a trilateral self-attention module combining both the geometric and semantic correlations for reliable cross-modality flow learning. Then, the learned flow maps are combined with the grid-sampling mechanism for coarse high-resolution (HR) depth prediction. PEANet aims to integrate the learned flow map as edge attention into a pyramid network to hierarchically learn the edge-focused guidance feature for depth edge refinement. Extensive experiments on real and synthetic DSR datasets verify that our approach achieves excellent performance compared to state-of-the-art methods. Our code is available at: https://github.com/Yuanjiayii/DSR-SFG.

IJCAI Conference 2022 Conference Paper

Active Contrastive Set Mining for Robust Audio-Visual Instance Discrimination

  • Hanyu Xuan
  • Yihong Xu
  • Shuo Chen
  • Zhiliang Wu
  • Jian Yang
  • Yan Yan
  • Xavier Alameda-Pineda

The recent success of audio-visual representation learning can be largely attributed to the pervasive property of audio-visual synchronization, which can be used as self-annotated supervision. As a state-of-the-art solution, Audio-Visual Instance Discrimination (AVID) extends instance discrimination to the audio-visual realm. Existing AVID methods construct the contrastive set by random sampling based on the assumption that the audio and visual clips from all other videos are not semantically related. We argue that this assumption is too coarse, since the resulting contrastive sets contain a large number of faulty negatives. In this paper, we overcome this limitation by proposing a novel Active Contrastive Set Mining (ACSM) method that aims to mine contrastive sets with informative and diverse negatives for robust AVID. Moreover, we also integrate a semantically-aware hard-sample mining strategy into our ACSM. The proposed ACSM is implemented in two of the most recent state-of-the-art AVID methods and significantly improves their performance. Extensive experiments conducted on both action and sound recognition across multiple datasets show the remarkably improved performance of our method.

NeurIPS Conference 2022 Conference Paper

Dual-discriminative Graph Neural Network for Imbalanced Graph-level Anomaly Detection

  • Ge Zhang
  • Zhenyu Yang
  • Jia Wu
  • Jian Yang
  • Shan Xue
  • Hao Peng
  • Jianlin Su
  • Chuan Zhou

Graph-level anomaly detection aims to distinguish anomalous graphs in a graph dataset from normal graphs. Anomalous graphs represent very few but essential patterns in the real world. The anomalous property of a graph may be attributable to anomalous attributes of particular nodes and to anomalous substructures, i.e., subsets of nodes and edges in the graph. In addition, due to the imbalanced nature of the anomaly detection problem, anomalous information is diluted by normal graphs of overwhelming quantity. The various anomaly notions in attributes and/or substructures, together with this imbalance, make detecting anomalous graphs a non-trivial task. In this paper, we propose a graph neural network for graph-level anomaly detection, namely iGAD. Specifically, an anomalous graph attribute-aware graph convolution and an anomalous graph substructure-aware deep Random Walk Kernel (deep RWK) are welded into a graph neural network to achieve the dual-discriminative ability on anomalous attributes and substructures. Deep RWK in iGAD makes up for the deficiency of graph convolution in distinguishing structural information caused by the simple neighborhood aggregation mechanism. Further, we propose a Point Mutual Information (PMI)-based loss function to address the problems caused by imbalanced distributions. The PMI-based loss function enables iGAD to capture the essential correlation between input graphs and their anomalous/normal properties. We evaluate iGAD on four real-world graph datasets. Extensive experiments demonstrate the superiority of iGAD on the graph-level anomaly detection task.

JBHI Journal 2022 Journal Article

Few-Shot Learning for Deformable Medical Image Registration With Perception-Correspondence Decoupling and Reverse Teaching

  • Yuting He
  • Tiantian Li
  • Rongjun Ge
  • Jian Yang
  • Youyong Kong
  • Jian Zhu
  • Huazhong Shu
  • Guanyu Yang

Deformable medical image registration estimates the deformation needed to align the regions of interest (ROIs) of two images to the same spatial coordinate system. However, recent unsupervised registration models only have correspondence ability without perception, causing misalignment on blurred anatomies and distortion in task-irrelevant backgrounds. Label-constrained (LC) registration models embed the perception ability via labels, but the lack of texture constraints in labels and the expensive labeling costs cause distortion inside ROIs and overfitted perception. We propose the first few-shot deformable medical image registration framework, Perception-Correspondence Registration (PC-Reg), which embeds perception ability into registration models with only a few labels, thus greatly improving registration accuracy and reducing distortion. 1) We propose Perception-Correspondence Decoupling, which decouples the perception and correspondence actions of registration into two CNNs. Therefore, independent optimizations and feature representations are available, avoiding interference with the correspondence caused by the lack of texture constraints. 2) For few-shot learning, we propose Reverse Teaching, which aligns labeled and unlabeled images to each other to provide supervision information for the structure and style knowledge in unlabeled images, thus generating additional training data. These data reversely teach our perception CNN more style and structure knowledge, improving its generalization ability. Our experiments on three datasets with only five labels demonstrate that our PC-Reg has competitive registration accuracy and effective distortion-reducing ability. Compared with LC-VoxelMorph ($\lambda=1$), we achieve 12.5%, 6.3%, and 1.0% Reg-DSC improvements on the three datasets, revealing the great potential of our framework in clinical application.

IJCAI Conference 2022 Conference Paper

High-resource Language-specific Training for Multilingual Neural Machine Translation

  • Jian Yang
  • Yuwei Yin
  • Shuming Ma
  • Dongdong Zhang
  • Zhoujun Li
  • Furu Wei

Multilingual neural machine translation (MNMT) trained on multiple language pairs has attracted considerable attention due to fewer model parameters and lower training costs achieved by sharing knowledge among multiple languages. Nonetheless, multilingual training is plagued by language interference in shared parameters because of the negative interactions among different translation directions, especially for high-resource languages. In this paper, we propose a multilingual translation model with high-resource language-specific training (HLT-MT) to alleviate the negative interference, which adopts two-stage training with a language-specific selection mechanism. Specifically, we first train the multilingual model only on the high-resource pairs and select the language-specific modules at the top of the decoder to enhance the translation quality of high-resource directions. Next, the model is further trained on all available corpora to transfer knowledge from high-resource languages (HRLs) to low-resource languages (LRLs). Experimental results show that HLT-MT outperforms various strong baselines on the WMT-10 and OPUS-100 benchmarks. Furthermore, analytic experiments validate the effectiveness of our method in mitigating the negative interference in multilingual training.

AAAI Conference 2022 Conference Paper

Keypoint Message Passing for Video-Based Person Re-identification

  • Di Chen
  • Andreas Doering
  • Shanshan Zhang
  • Jian Yang
  • Juergen Gall
  • Bernt Schiele

Video-based person re-identification (re-ID) is an important technique in visual surveillance systems which aims to match video snippets of people captured by different cameras. Existing methods are mostly based on convolutional neural networks (CNNs), whose building blocks either process local neighbor pixels at a time, or, when 3D convolutions are used to model temporal information, suffer from the misalignment problem caused by person movement. In this paper, we propose to overcome the limitations of normal convolutions with a human-oriented graph method. Specifically, features located at person joint keypoints are extracted and connected as a spatial-temporal graph. These keypoint features are then updated by message passing from their connected nodes with a graph convolutional network (GCN). During training, the GCN can be attached to any CNN-based person re-ID model to assist representation learning on feature maps, whilst it can be dropped after training for better inference speed. Our method brings significant improvements over the CNN-based baseline model on the MARS dataset with generated person keypoints and a newly annotated dataset, PoseTrackReID. It also sets a new state of the art in terms of top-1 accuracy and mean average precision in comparison to prior works.

NeurIPS Conference 2022 Conference Paper

Learning Contrastive Embedding in Low-Dimensional Space

  • Shuo Chen
  • Chen Gong
  • Jun Li
  • Jian Yang
  • Gang Niu
  • Masashi Sugiyama

Contrastive learning (CL) pretrains feature embeddings to scatter instances in the feature space so that the training data can be well discriminated. Most existing CL techniques usually encourage learning such feature embeddings in a high-dimensional space to maximize instance discrimination. However, this practice may lead to undesired results where the scattered instances are sparsely distributed in the high-dimensional feature space, making it difficult to capture the underlying similarity between pairwise instances. To this end, we propose a novel framework called contrastive learning with low-dimensional reconstruction (CLLR), which adopts a regularized projection layer to reduce the dimensionality of the feature embedding. In CLLR, we build a sparse/low-rank regularizer to adaptively reconstruct a low-dimensional projection space while preserving the basic objective for instance discrimination, thus successfully learning contrastive embeddings that alleviate the above issue. Theoretically, we prove a tighter error bound for CLLR; empirically, the superiority of CLLR is demonstrated across multiple domains. Both theoretical and experimental results emphasize the significance of learning low-dimensional contrastive embeddings.

NeurIPS Conference 2022 Conference Paper

Learning Superpoint Graph Cut for 3D Instance Segmentation

  • Le Hui
  • Linghua Tang
  • Yaqi Shen
  • Jin Xie
  • Jian Yang

3D instance segmentation is a challenging task due to the complex local geometric structures of objects in point clouds. In this paper, we propose a learning-based superpoint graph cut method that explicitly learns the local geometric structures of the point cloud for 3D instance segmentation. Specifically, we first oversegment the raw point clouds into superpoints and construct the superpoint graph. Then, we propose an edge score prediction network to predict the edge scores of the superpoint graph, where the similarity vectors of two adjacent nodes learned through cross-graph attention in the coordinate and feature spaces are used for regressing edge scores. By forcing two adjacent nodes of the same instance to be close to the instance center in the coordinate and feature spaces, we formulate a geometry-aware edge loss to train the edge score prediction network. Finally, we develop a superpoint graph cut network that employs the learned edge scores and the predicted semantic classes of nodes to generate instances, where bilateral graph attention is proposed to extract discriminative features on both the coordinate and feature spaces for predicting semantic labels and scores of instances. Extensive experiments on two challenging datasets, ScanNet v2 and S3DIS, show that our method achieves new state-of-the-art performance on 3D instance segmentation.

IJCAI Conference 2022 Conference Paper

Modeling Spatio-temporal Neighbourhood for Personalized Point-of-interest Recommendation

  • Xiaolin Wang
  • Guohao Sun
  • Xiu Fang
  • Jian Yang
  • Shoujin Wang

Point-of-interest (POI) recommendation helps users explore attractive locations and plays an important role in location-based social networks (LBSNs). In POI recommendation, the results are largely impacted by users' preferences. However, existing POI methods model users and locations largely separately, and thus cannot capture users' personal and dynamic preferences for locations. In addition, they ignore users' acceptance of a location's distance/time. To overcome the limitations of existing methods, we first introduce a Knowledge Graph with temporal information (TKG) into POI recommendation, covering both users and locations with timestamps. Then, based on the TKG, we propose a Spatial-Temporal Graph Convolutional Attention Network (STGCAN), a novel network that learns users' preferences on the TKG by dynamically capturing spatial-temporal neighbourhoods. Specifically, in STGCAN, we construct receptive fields on the TKG to aggregate the neighbourhoods of users and locations at each timestamp. We also measure the spatial-temporal interval, reflecting users' acceptance of distance/time, with self-attention. Experiments on three real-world datasets demonstrate that the proposed model outperforms state-of-the-art POI recommendation approaches.

JBHI Journal 2022 Journal Article

MVSGAN: Spatial-Aware Multi-View CMR Fusion for Accurate 3D Left Ventricular Myocardium Segmentation

  • Xiaoming Qi
  • Yuting He
  • Guanyu Yang
  • Yang Chen
  • Jian Yang
  • Wangyag Liu
  • Yinsu Zhu
  • Yi Xu

The accurate 3D left ventricular (LV) myocardium segmentation in the short-axis (SAX) view of cardiac magnetic resonance (CMR) is challenged by the sparse spatial structure of CMR. The strategy of multi-view CMR fusion can provide fine-grained spatial structure for accurate segmentation. However, the large information misalignment and the lack of dense 3D CMR as a fusion target in multi-view CMR fusion, along with the different spatial resolution between the fusion result and the ground truth in segmentation, limit this strategy. In this study, we propose a multi-view spatial-aware adversarial network (MVSGAN). It studies the perception of fine-grained cardiac structure for accurate segmentation through spatial-aware multi-view CMR fusion. It consists of three modules: (1) A residual adversarial fusion (RAF) module takes inter-slice deep correlation and anatomical priors to refine the spatial structures by residual supplement and adversarial optimization. (2) A structural perception-aggregation (SPA) module establishes the spatial correlation between the dense cardiac model and the sparse label for accurate CMR LV myocardium segmentation. (3) A joint training strategy utilizes the dense SAX volume as explicit and implicit goals to jointly optimize the framework. The experiments are conducted on a public dataset and a clinical dataset to evaluate the performance of MVSGAN. The average Dice and Jaccard scores of LV myocardium segmentation obtained by MVSGAN are the highest among seven existing state-of-the-art methods, reaching 0.92 and 0.75. It is concluded that spatial-aware multi-view CMR fusion can provide meaningful spatial correlation for accurate LV myocardium segmentation.

NeurIPS Conference 2022 Conference Paper

RecursiveMix: Mixed Learning with History

  • Lingfeng Yang
  • Xiang Li
  • Borui Zhao
  • Renjie Song
  • Jian Yang

Mix-based augmentation has been proven fundamental to the generalization of deep vision models. However, current augmentations only mix samples from the current data batch during training, which ignores the possible knowledge accumulated in the learning history. In this paper, we propose a recursive mixed-sample learning paradigm, termed "RecursiveMix" (RM), by exploring a novel training strategy that leverages the historical input-prediction-label triplets. More specifically, we iteratively resize the input image batch from the previous iteration and paste it into the current batch while their labels are fused proportionally to the area of the operated patches. Furthermore, a consistency loss is introduced to align the identical image semantics across the iterations, which helps the learning of scale-invariant feature representations. Based on ResNet-50, RM largely improves classification accuracy by $\sim$3.2% on CIFAR-100 and $\sim$2.8% on ImageNet with negligible extra computation/storage costs. In the downstream object detection task, the RM-pretrained model outperforms the baseline by 2.1 AP points and surpasses CutMix by 1.4 AP points under the ATSS detector on COCO. In semantic segmentation, RM also surpasses the baseline and CutMix by 1.9 and 1.1 mIoU points under UperNet on ADE20K, respectively. Codes and pretrained models are available at https://github.com/implus/RecursiveMix.
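A minimal sketch of the recursive mixing step described above: the previous iteration's batch is resized and pasted into the current batch, and the labels are fused in proportion to the pasted area. The patch placement, scale, and function name below are illustrative assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def recursive_mix(x, y, prev_x, prev_y, scale=0.5, num_classes=1000):
    """Paste a resized copy of the previous batch into the current batch.

    x: (B,C,H,W) current images, y: (B,) current integer labels,
    prev_x/prev_y: images and labels kept from the previous iteration.
    Pasting at the top-left corner is a simplifying assumption.
    """
    B, C, H, W = x.shape
    h, w = int(H * scale), int(W * scale)
    patch = F.interpolate(prev_x, size=(h, w), mode="bilinear", align_corners=False)
    mixed = x.clone()
    mixed[:, :, :h, :w] = patch                      # paste the historical batch
    lam = (h * w) / (H * W)                          # area ratio of the patch
    y_onehot = F.one_hot(y, num_classes).float()
    prev_onehot = F.one_hot(prev_y, num_classes).float()
    mixed_y = (1 - lam) * y_onehot + lam * prev_onehot
    return mixed, mixed_y
```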

AAAI Conference 2022 Conference Paper

Reliable Inlier Evaluation for Unsupervised Point Cloud Registration

  • Yaqi Shen
  • Le Hui
  • Haobo Jiang
  • Jin Xie
  • Jian Yang

Unsupervised point cloud registration algorithms usually suffer from unsatisfactory registration precision in the partially overlapping setting due to the lack of effective inlier evaluation. In this paper, we propose a neighborhood-consensus-based reliable inlier evaluation method for robust unsupervised point cloud registration. It is expected to capture the discriminative geometric difference between the source neighborhood and the corresponding pseudo target neighborhood for effective inlier distinction. Specifically, our model consists of a matching map refinement module and an inlier evaluation module. In the matching map refinement module, we improve the point-wise matching map estimation by integrating the matching scores of neighbors into it. The aggregated neighborhood information facilitates discriminative map construction so that high-quality correspondences can be provided for generating the pseudo target point cloud. Based on the observation that an outlier exhibits a significant structure-wise difference between its source neighborhood and the corresponding pseudo target neighborhood, whereas this difference is small for an inlier, the inlier evaluation module exploits this difference to score the inlier confidence of each estimated correspondence. In particular, we construct an effective graph representation for capturing this geometric difference between the neighborhoods. Finally, with the learned correspondences and the corresponding inlier confidences, we use the weighted SVD algorithm for transformation estimation. Under the unsupervised setting, we exploit the Huber-function-based global alignment loss, the local neighborhood consensus loss, and the spatial consistency loss for model optimization. The experimental results on extensive datasets demonstrate that our unsupervised point cloud registration method can yield comparable performance.
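The final transformation-estimation step mentioned above, weighted SVD over correspondences with inlier confidences, is the standard weighted Kabsch/Procrustes solution and can be sketched as follows; this is a generic reference implementation, not the authors' code.

```python
import numpy as np

def weighted_rigid_transform(src, tgt, w):
    """Estimate R, t minimizing sum_i w_i ||R @ src_i + t - tgt_i||^2.

    src, tgt: (N,3) corresponding points; w: (N,) non-negative inlier confidences.
    Standard weighted Kabsch solution via SVD; not taken from the paper's code.
    """
    w = w / (w.sum() + 1e-12)
    src_c = (w[:, None] * src).sum(0)                     # weighted centroids
    tgt_c = (w[:, None] * tgt).sum(0)
    S = (src - src_c).T @ (w[:, None] * (tgt - tgt_c))    # 3x3 weighted covariance
    U, _, Vt = np.linalg.svd(S)
    d = np.sign(np.linalg.det(Vt.T @ U.T))                # reflection correction
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t
```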

IJCAI Conference 2022 Conference Paper

UM4: Unified Multilingual Multiple Teacher-Student Model for Zero-Resource Neural Machine Translation

  • Jian Yang
  • Yuwei Yin
  • Shuming Ma
  • Dongdong Zhang
  • ShuangZhi Wu
  • Hongcheng Guo
  • Zhoujun Li
  • Furu Wei

Most translation tasks among languages belong to the zero-resource translation problem where parallel corpora are unavailable. Multilingual neural machine translation (MNMT) enables one-pass translation using a shared semantic space for all languages, compared to two-pass pivot translation, but often underperforms the pivot-based method. In this paper, we propose a novel method, named Unified Multilingual Multiple teacher-student Model for NMT (UM4). Our method unifies source-teacher, target-teacher, and pivot-teacher models to guide the student model for zero-resource translation. The source teacher and target teacher force the student to learn the direct source-target translation through the distilled knowledge on both the source and target sides. The monolingual corpus is further leveraged by the pivot-teacher model to enhance the student model. Experimental results on 72 translation directions demonstrate that our model significantly outperforms previous methods on the WMT benchmark.
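One plausible way to realize the multiple-teacher guidance described above is to average per-teacher KL-divergence terms between the student's output distribution and each teacher's distribution. The sketch below illustrates such a combined distillation loss; the weighting, temperature, and exact combination scheme are assumptions and may differ from the paper.

```python
import torch
import torch.nn.functional as F

def multi_teacher_distill_loss(student_logits, teacher_logits_list, weights=None, T=1.0):
    """Distill a student from several teachers by averaging per-teacher KL terms.

    student_logits: (B, L, V) logits of the student translation model.
    teacher_logits_list: list of (B, L, V) logits from, e.g., source-, target-,
    and pivot-teacher models. The weights and temperature T are assumptions.
    """
    if weights is None:
        weights = [1.0 / len(teacher_logits_list)] * len(teacher_logits_list)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    loss = 0.0
    for w, t_logits in zip(weights, teacher_logits_list):
        p_teacher = F.softmax(t_logits / T, dim=-1)
        loss = loss + w * F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    return loss
```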

NeurIPS Conference 2022 Conference Paper

Uncertainty-Aware Hierarchical Refinement for Incremental Implicitly-Refined Classification

  • Jian Yang
  • Kai Zhu
  • Kecheng Zheng
  • Yang Cao

Incremental implicitly-refined classification task aims at assigning hierarchical labels to each sample encountered at different phases. Existing methods tend to fail in generating hierarchy-invariant descriptors when the novel classes are inherited from the old ones. To address the issue, this paper, which explores the inheritance relations in the process of multi-level semantic increment, proposes an Uncertainty-Aware Hierarchical Refinement (UAHR) scheme. Specifically, our proposed scheme consists of a global representation extension strategy that enhances the discrimination of incremental representation by widening the corresponding margin distance, and a hierarchical distribution alignment strategy that refines the distillation process by explicitly determining the inheritance relationship of the incremental class. Particularly, the shifting subclasses are corrected under the guidance of hierarchical uncertainty, ensuring the consistency of the homogeneous features. Extensive experiments on widely used benchmarks (i.e., IIRC-CIFAR, IIRC-ImageNet-lite, IIRC-ImageNet-Subset, and IIRC-ImageNet-full) demonstrate the superiority of our proposed method over the state-of-the-art approaches.

IJCAI Conference 2022 Conference Paper

Webly-Supervised Fine-Grained Recognition with Partial Label Learning

  • Yu-Yan Xu
  • Yang Shen
  • Xiu-Shen Wei
  • Jian Yang

The task of webly-supervised fine-grained recognition is to boost the recognition accuracy of classifying subordinate categories (e.g., different bird species) by utilizing freely available but noisy web data. As label noises significantly hurt network training, it is desirable to distinguish and eliminate noisy images. In this paper, we propose two strategies, i.e., open-set noise removal and closed-set noise correction, to remove these two kinds of web noise for fine-grained recognition. Specifically, for open-set noise removal, we utilize a pre-trained deep model to perform deep descriptor transformation to estimate the positive correlation between these web images, and detect the open-set noises based on the correlation values. Regarding closed-set noise correction, we develop a top-k recall optimization loss that first assigns a label set to each web image to reduce the impact of hard label assignment for closed-set noises. Then, from a partial label learning perspective, we further correct each sample by recovering its true single label from the assigned label set. Experiments on several webly-supervised fine-grained benchmark datasets show that our method clearly outperforms other existing state-of-the-art methods.

NeurIPS Conference 2021 Conference Paper

3D Siamese Voxel-to-BEV Tracker for Sparse Point Clouds

  • Le Hui
  • Lingpeng Wang
  • Mingmei Cheng
  • Jin Xie
  • Jian Yang

3D object tracking in point clouds is still a challenging problem due to the sparsity of LiDAR points in dynamic environments. In this work, we propose a Siamese voxel-to-BEV tracker, which can significantly improve the tracking performance in sparse 3D point clouds. Specifically, it consists of a Siamese shape-aware feature learning network and a voxel-to-BEV target localization network. The Siamese shape-aware feature learning network can capture 3D shape information of the object to learn the discriminative features of the object so that the potential target from the background in sparse point clouds can be identified. To this end, we first perform template feature embedding to embed the template's feature into the potential target and then generate a dense 3D shape to characterize the shape information of the potential target. For localizing the tracked target, the voxel-to-BEV target localization network regresses the target's 2D center and the z-axis center from the dense bird's eye view (BEV) feature map in an anchor-free manner. Concretely, we compress the voxelized point cloud along the z-axis through max pooling to obtain a dense BEV feature map, where the regression of the 2D center and the z-axis center can be performed more effectively. Extensive evaluation on the KITTI tracking dataset shows that our method significantly outperforms the current state-of-the-art methods by a large margin. Code is available at https://github.com/fpthink/V2B.

TIST Journal 2021 Journal Article

A Comprehensive Survey of the Key Technologies and Challenges Surrounding Vehicular Ad Hoc Networks

  • Zhenchang Xia
  • Jia Wu
  • Libing Wu
  • Yanjiao Chen
  • Jian Yang
  • Philip S. Yu

Vehicular ad hoc networks (VANETs) and the services they support are an essential part of intelligent transportation. Through physical technologies, applications, protocols, and standards, they help to ensure traffic moves efficiently and vehicles operate safely. This article surveys the current state of play in VANET development. The topics summarized and classified include the key technologies critical to the field, the resource-management and safety applications needed for smooth operations, the communication and data-transmission protocols that support networking, and the theoretical and environmental constructs underpinning research and development, such as graph neural networks and the Internet of Things. Additionally, we identify and discuss several challenges facing VANETs, including poor safety, poor reliability, non-uniform standards, and low intelligence levels. Finally, we touch on hot technologies and techniques, such as reinforcement learning and 5G communications, to provide an outlook for the future of intelligent transportation systems.

NeurIPS Conference 2021 Conference Paper

A$^2$-Net: Learning Attribute-Aware Hash Codes for Large-Scale Fine-Grained Image Retrieval

  • Xiu-Shen Wei
  • Yang Shen
  • Xuhao Sun
  • Han-Jia Ye
  • Jian Yang

Our work focuses on tackling large-scale fine-grained image retrieval by ranking the images depicting the concept of interest (i.e., the same sub-category label) highest based on the fine-grained details in the query. For such a practical task, it is desirable to alleviate the challenges of both the fine-grained nature of small inter-class variations with large intra-class variations and the explosive growth of fine-grained data. In this paper, we propose an Attribute-Aware hashing Network (A$^2$-Net) for generating attribute-aware hash codes that not only make the retrieval process efficient, but also establish explicit correspondences between hash codes and visual attributes. Specifically, based on the visual representations captured by attention, we develop an encoder-decoder structure network for a reconstruction task to distill, without attribute annotations, high-level attribute-specific vectors from the appearance-specific visual representations. A$^2$-Net is also equipped with a feature decorrelation constraint upon these attribute vectors to enhance their representation abilities. Finally, the required hash codes are generated from the attribute vectors while preserving the original similarities. Qualitative experiments on five benchmark fine-grained datasets show our superiority over competing methods. More importantly, quantitative results demonstrate that the obtained hash codes can strongly correspond to certain kinds of crucial properties of fine-grained objects.

AAAI Conference 2021 Conference Paper

Action Candidate Based Clipped Double Q-learning for Discrete and Continuous Action Tasks

  • Haobo Jiang
  • Jin Xie
  • Jian Yang

Double Q-learning is a popular reinforcement learning algorithm for Markov decision process (MDP) problems. Clipped Double Q-learning, an effective variant of Double Q-learning, employs the clipped double estimator to approximate the maximum expected action value. Due to the underestimation bias of the clipped double estimator, the performance of Clipped Double Q-learning may be degraded in some stochastic environments. In this paper, in order to reduce the underestimation bias, we propose an action-candidate-based clipped double estimator for Double Q-learning. Specifically, we first select a set of elite action candidates with high action values from one set of estimators. Then, among these candidates, we choose the highest-valued action according to the other set of estimators. Finally, we use the maximum value in the second set of estimators to clip the action value of the chosen action in the first set of estimators, and the clipped value is used for approximating the maximum expected action value. Theoretically, the underestimation bias in our clipped Double Q-learning decays monotonically as the number of action candidates decreases. Moreover, the number of action candidates controls the trade-off between the overestimation and underestimation biases. In addition, we extend our clipped Double Q-learning to continuous action tasks via approximating the elite continuous action candidates. We empirically verify that our algorithm can more accurately estimate the maximum expected action value on some toy environments and yields good performance on several benchmark problems.
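The estimator described above is simple enough to state directly in code: take the top-k actions under one estimator, pick the best of those according to the second estimator, and clip the first estimator's value for that action by the second estimator's maximum. The snippet below is a tabular illustration under that reading of the abstract, not the authors' implementation.

```python
import numpy as np

def ac_clipped_double_estimate(q_a, q_b, k):
    """Action-candidate clipped double estimate of max_a E[Q(s,a)].

    q_a, q_b: (num_actions,) value estimates from two independent estimators.
    k: number of elite candidates taken from estimator A.
    """
    candidates = np.argsort(q_a)[-k:]                     # top-k actions under A
    a_star = candidates[np.argmax(q_b[candidates])]       # best candidate under B
    return min(q_a[a_star], q_b.max())                    # clip A's value by B's maximum

# Hypothetical tabular usage for the update target:
# target = reward + gamma * ac_clipped_double_estimate(Q1[next_state], Q2[next_state], k=3)
```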

AAAI Conference 2021 Conference Paper

Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning

  • Sheng Wan
  • Shirui Pan
  • Jian Yang
  • Chen Gong

Graph-based Semi-Supervised Learning (SSL) aims to transfer the labels of a handful of labeled data to the remaining massive unlabeled data via a graph. As one of the most popular graph-based SSL approaches, the recently proposed Graph Convolutional Networks (GCNs) have gained remarkable progress by combining the sound expressiveness of neural networks with graph structure. Nevertheless, existing graph-based methods do not directly address the core problem of SSL, i.e., the shortage of supervision, and thus their performance is still very limited. To address this issue, a novel GCN-based SSL algorithm is presented in this paper to enrich the supervision signals by utilizing both data similarities and graph structure. Firstly, by designing a semi-supervised contrastive loss, improved node representations can be generated via maximizing the agreement between different views of the same data or data from the same class. Therefore, the rich unlabeled data and the scarce yet valuable labeled data can jointly provide abundant supervision information for learning discriminative node representations, which helps improve the subsequent classification result. Secondly, the underlying determinative relationship between the data features and the input graph topology is extracted as supplementary supervision signals for SSL via a graph generative loss related to the input features. Intensive experimental results on a variety of real-world datasets firmly verify the effectiveness of our algorithm compared with other state-of-the-art methods.

IS Journal 2021 Journal Article

CoTrRank: Trust Ranking on Twitter

  • Peiyao Li
  • Weiliang Zhao
  • Jian Yang
  • Jia Wu

Trust evaluation of people and information on social media is critical for maintaining a healthy online social environment. Evaluating the trustworthiness of users and tweets is challenging due to the complex relationships between and among users and their posts. As existing approaches use a single network to represent users, posts, and their relationships, they are limited in reflecting the different statistical features of users and tweets, which reduces their ability to determine the trustworthiness of users and tweets. To address this issue, we develop a trust evaluation method that models users and tweets separately in two networks that are coupled with each other via interactions. We provide mapping functions to map the statistics of user/tweet actions to trust values that indicate their relevant trust degrees. The proposed method provides a configurable solution that can consider the effects of users and tweets differently in different trust ranking situations. A set of experiments is conducted on real data collected from Twitter. The experimental results show that the proposed approach is more effective in trust evaluation compared with several baseline methods.

AAAI Conference 2021 Conference Paper

Deep Wasserstein Graph Discriminant Learning for Graph Classification

  • Tong Zhang
  • Yun Wang
  • Zhen Cui
  • Chuanwei Zhou
  • Baoliang Cui
  • Haikuan Huang
  • Jian Yang

Graph topological structures are crucial for distinguishing graphs of different classes. In this work, we propose a deep Wasserstein graph discriminant learning (WGDL) framework to learn discriminative embeddings of graphs in a Wasserstein-metric (W-metric) matching space. In order to bypass the calculation of W-metric class centers in discriminant analysis, as well as to better support batch learning, we introduce a reference set of graphs (aka graph dictionary) to express representative graph samples (aka dictionary keys). On the bridge of the graph dictionary, every input graph can be projected into the latent dictionary space through our proposed Wasserstein graph transformation (WGT). In WGT, we formulate the inter-graph distance in W-metric space by virtue of the optimal transport (OT) principle, which effectively expresses the correlations of cross-graph structures. To give WGDL better representation ability, we dynamically update the graph dictionary during training by maximizing the Wasserstein discriminant loss, i.e., the ratio of inter-class to intra-class Wasserstein distance. To evaluate our WGDL method, comprehensive experiments are conducted on six graph classification datasets. Experimental results demonstrate the effectiveness and state-of-the-art performance of our WGDL.

JBHI Journal 2021 Journal Article

Divergence-Free Fitting-Based Incompressible Deformation Quantification of Liver

  • Tianyu Fu
  • Jingfan Fan
  • Dingkun Liu
  • Hong Song
  • Chaoyi Zhang
  • Danni Ai
  • Zhigang Cheng
  • Ping Liang

The liver is an incompressible organ that maintains its volume during respiration-induced deformation. Quantifying this deformation under the incompressibility constraint is significant for liver tracking. The constraint can be enforced by retaining the divergence-free field obtained from the deformation decomposition. However, the decomposition process is time-consuming, and the removal of the non-divergence-free field weakens the deformation. In this study, a divergence-free fitting-based registration method is proposed to quantify the incompressible deformation rapidly and accurately. First, the deformation to be estimated is mapped to a velocity in a diffeomorphic space. Then, this velocity is decomposed by a fast Fourier-based Hodge-Helmholtz decomposition to obtain the divergence-free, curl-free, and harmonic fields. The curl-free field is replaced and fitted by the obtained harmonic field with a translation field to generate a new divergence-free velocity. By optimizing this velocity, the final incompressible deformation is obtained. Moreover, a deep learning framework (DLF) is constructed to accelerate the incompressible deformation quantification. An incompressible respiratory motion model is built for the DLF by using the proposed registration method and is then used to augment the training data. An encoder-decoder network is introduced to learn the appearance-velocity correlation at the patch scale. In the experiments, we compare the proposed registration method with three state-of-the-art methods. The results show that the proposed method can accurately achieve incompressible registration of the liver with a mean liver overlap ratio of 95.33%. Moreover, the DLF is nearly 15 times faster than the other methods.

IJCAI Conference 2021 Conference Paper

Graph Deformer Network

  • Wenting Zhao
  • Yuan Fang
  • Zhen Cui
  • Tong Zhang
  • Jian Yang

Convolution learning on graphs has drawn increasing attention recently due to its potential applications to large amounts of irregular data. Most graph convolution methods leverage plain summation/average aggregation to avoid the discrepancy of responses from isomorphic graphs. However, such an extreme collapsing approach results in structural loss and signal entanglement of nodes, which further degrades the learning ability. In this paper, we propose a simple yet effective Graph Deformer Network (GDN) to fulfill anisotropic convolution filtering on graphs, analogous to the standard convolution operation on images. Local neighborhood subgraphs (acting like receptive fields) with different structures are deformed into a unified virtual space, coordinated by several anchor nodes. In the deformation process, we transfer components of the nodes therein into affinitive anchors by learning their correlations, and build a multi-granularity feature space calibrated with anchors. Anisotropic convolutional kernels can then be applied over the anchor-coordinated space to encode local variations of receptive fields. By parameterizing anchors and stacking coarsening layers, we build a graph deformer network in an end-to-end fashion. Theoretical analysis indicates its connection to previous work and shows the promising property of graph isomorphism testing. Extensive experiments on widely-used datasets validate the effectiveness of GDN in graph and node classification.

AAAI Conference 2021 Conference Paper

Graph Game Embedding

  • Xiaobin Hong
  • Tong Zhang
  • Zhen Cui
  • Yuge Huang
  • Pengcheng Shen
  • Shaoxin Li
  • Jian Yang

Graph embedding aims to encode nodes/edges into low-dimensional continuous features, and has become a crucial tool for graph analysis including graph/node classification, link prediction, etc. In this paper we propose a novel graph learning framework, named graph game embedding, to learn discriminative node representations as well as encode graph structures. Inspired by the spirit of game learning, node embedding is converted into a selection/searching process of player strategies, where each node corresponds to one player and each edge corresponds to an interaction of two players. Then, a utility function, which theoretically satisfies the Nash Equilibrium, is defined to measure the benefit/loss of players during graph evolution. Furthermore, a collaboration and competition mechanism is introduced to increase the discriminant learning ability. Under this graph game embedding framework, considering different interaction manners of nodes, we propose two specific models, named paired game embedding for paired nodes and group game embedding for group interaction. Compared with existing graph embedding methods, our algorithm possesses two advantages: (1) the designed utility function ensures stable graph evolution with theoretical convergence and Nash Equilibrium satisfaction; (2) the introduced collaboration and competition mechanism endows the graph game embedding framework with discriminative feature learning ability by guiding each node to learn an optimal strategy distinguished from others. We test the proposed method on three public citation network datasets, and the experimental results verify the effectiveness of our method.

AAAI Conference 2021 Conference Paper

Hierarchical Information Passing Based Noise-Tolerant Hybrid Learning for Semi-Supervised Human Parsing

  • Yunan Liu
  • Shanshan Zhang
  • Jian Yang
  • PongChi Yuen

Deep learning based human parsing methods usually require a large amount of training data to reach high performance. However, it is costly and time-consuming to obtain manually annotated high quality labels for a large scale dataset. To alleviate annotation efforts, we propose a new semi-supervised human parsing method for which we only need a small number of labels for training. First, we generate high quality pseudo labels on unlabeled images using a hierarchical information passing network (HIPN), which reasons human part segmentation in a coarse to fine manner. Furthermore, we develop a noise-tolerant hybrid learning method, which takes advantage of positive and negative learning to better handle noisy pseudo labels. When evaluated on standard human parsing benchmarks, our HIPN achieves a new state-of-the-art performance. Moreover, our noise-tolerant hybrid learning method further improves the performance and outperforms the state-of-the-art semi-supervised method (i.e., GRN) by 4.47 points w.r.t. mIoU on the LIP dataset.

NeurIPS Conference 2021 Conference Paper

Learning to Adapt via Latent Domains for Adaptive Semantic Segmentation

  • Yunan Liu
  • Shanshan Zhang
  • Yang Li
  • Jian Yang

Domain adaptive semantic segmentation aims to transfer knowledge learned from labeled source domain to unlabeled target domain. To narrow down the domain gap and ease adaptation difficulty, some recent methods translate source images to target-like images (latent domains), which are used as supplement or substitute to the original source data. Nevertheless, these methods neglect to explicitly model the relationship of knowledge transferring across different domains. Alternatively, in this work we break through the standard “source-target” one pair adaptation framework and construct multiple adaptation pairs (e.g., “source-latent” and “latent-target”). The purpose is to use the meta-knowledge (how to adapt) learned from one pair as guidance to assist the adaptation of another pair under a meta-learning framework. Furthermore, we extend our method to a more practical setting of open compound domain adaptation (a.k.a. multiple-target domain adaptation), where the target is a compound of multiple domains without domain labels. In this setting, we embed an additional pair of “latent-latent” to reduce the domain gap between the source and different latent domains, allowing the model to adapt well on multiple target domains simultaneously. When evaluated on standard benchmarks, our method is superior to the state-of-the-art methods in both the single target and multiple-target domain adaptation settings.

IJCAI Conference 2021 Conference Paper

Planning with Learned Dynamic Model for Unsupervised Point Cloud Registration

  • Haobo Jiang
  • Jianjun Qian
  • Jin Xie
  • Jian Yang

Point cloud registration is a fundamental problem in 3D computer vision. In this paper, we cast point cloud registration into a planning problem in reinforcement learning, which can seek the transformation between the source and target point clouds through trial and error. By modeling the point cloud registration process as a Markov decision process (MDP), we develop a latent dynamic model of point clouds, consisting of a transformation network and evaluation network. The transformation network aims to predict the new transformed feature of the point cloud after performing a rigid transformation (i.e., action) on it while the evaluation network aims to predict the alignment precision between the transformed source point cloud and target point cloud as the reward signal. Once the dynamic model of the point cloud is trained, we employ the cross-entropy method (CEM) to iteratively update the planning policy by maximizing the rewards in the point cloud registration process. Thus, the optimal policy, i.e., the transformation between the source and target point clouds, can be obtained via gradually narrowing the search space of the transformation. Experimental results on ModelNet40 and 7Scene benchmark datasets demonstrate that our method can yield good registration performance in an unsupervised manner.

IJCAI Conference 2021 Conference Paper

Rethinking Label-Wise Cross-Modal Retrieval from A Semantic Sharing Perspective

  • Yang Yang
  • Chubing Zhang
  • Yi-Chu Xu
  • Dianhai Yu
  • De-Chuan Zhan
  • Jian Yang

The main challenge of cross-modal retrieval is to learn consistent embeddings for heterogeneous modalities. To solve this problem, traditional label-wise cross-modal approaches usually constrain the inter-modal and intra-modal embedding consistency relying on the label ground-truths. However, experiments reveal that different modal networks actually have various generalization capacities, so end-to-end joint training with a consistency loss usually leads to a sub-optimal uni-modal model, which in turn affects the learning of consistent embeddings. Therefore, in this paper, we argue that what is really needed for supervised cross-modal retrieval is a good shared classification model. In other words, we learn the consistent embedding by ensuring the classification performance of each modality on the shared model, without the consistency loss. Specifically, we consider a technique called Semantic Sharing, which directly trains the two modalities interactively by adopting a shared self-attention based classification model. We evaluate the proposed approach on three representative datasets. The results validate that the proposed semantic sharing can consistently boost performance under the NDCG metric.

JMLR Journal 2021 Journal Article

Sparse Tensor Additive Regression

  • Botao Hao
  • Boxiang Wang
  • Pengyuan Wang
  • Jingfei Zhang
  • Jian Yang
  • Will Wei Sun

Tensors are becoming prevalent in modern applications such as medical imaging and digital marketing. In this paper, we propose a sparse tensor additive regression (STAR) that models a scalar response as a flexible nonparametric function of tensor covariates. The proposed model effectively exploits the sparse and low-rank structures in the tensor additive regression. We formulate the parameter estimation as a non-convex optimization problem, and propose an efficient penalized alternating minimization algorithm. We establish a non-asymptotic error bound for the estimator obtained from each iteration of the proposed algorithm, which reveals an interplay between the optimization error and the statistical rate of convergence. We demonstrate the efficacy of STAR through extensive comparative simulation studies, and an application to the click-through-rate prediction in online advertising.

AAAI Conference 2021 Conference Paper

SSPC-Net: Semi-supervised Semantic 3D Point Cloud Segmentation Network

  • Mingmei Cheng
  • Le Hui
  • Jin Xie
  • Jian Yang

Point cloud semantic segmentation is a crucial task in 3D scene understanding. Existing methods mainly focus on employing a large number of annotated labels for supervised semantic segmentation. Nonetheless, manually labeling such large point clouds for the supervised segmentation task is time-consuming. In order to reduce the number of annotated labels, we propose a semi-supervised semantic point cloud segmentation network, named SSPC-Net, where we train the semantic segmentation network by inferring the labels of unlabeled points from the few annotated 3D points. In our method, we first partition the whole point cloud into superpoints and build superpoint graphs to mine the long-range dependencies in point clouds. Based on the constructed superpoint graph, we then develop a dynamic label propagation method to generate the pseudo labels for the unsupervised superpoints. Particularly, we adopt a superpoint dropout strategy to dynamically select the generated pseudo labels. In order to fully exploit the generated pseudo labels of the unsupervised superpoints, we furthermore propose a coupled attention mechanism for superpoint feature embedding. Finally, we employ the cross-entropy loss to train the semantic segmentation network with the labels of the supervised superpoints and the pseudo labels of the unsupervised superpoints. Experiments on various datasets demonstrate that our semi-supervised segmentation method can achieve better performance than the current semi-supervised segmentation method with fewer annotated 3D points.

AAAI Conference 2021 Conference Paper

Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model

  • Qizhou Wang
  • Bo Han
  • Tongliang Liu
  • Gang Niu
  • Jian Yang
  • Chen Gong

The drastic increase in data quantity often brings a severe decrease in data quality, such as incorrect label annotations, which poses a great challenge for robustly training Deep Neural Networks (DNNs). Existing learning methods with label noise either employ ad-hoc heuristics or are restricted to specific noise assumptions. However, more general situations, such as instance-dependent label noise, have not been fully explored, as few studies focus on their label corruption process. By categorizing instances into confusing and unconfusing instances, this paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances. The resultant model can be realized by DNNs, where the training procedure is accomplished by employing an alternating optimization algorithm. Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements in robustness over state-of-the-art counterparts.

NeurIPS Conference 2021 Conference Paper

Universal Semi-Supervised Learning

  • Zhuo Huang
  • Chao Xue
  • Bo Han
  • Jian Yang
  • Chen Gong

Universal Semi-Supervised Learning (UniSSL) aims to solve the open-set problem where both the class distribution (i.e., class set) and feature distribution (i.e., feature domain) differ between the labeled dataset and the unlabeled dataset. Such a problem seriously hinders the practical deployment of classical SSL. Different from existing SSL methods targeting the open-set problem, which only study one certain scenario of class distribution mismatch and ignore the feature distribution mismatch, we consider a more general case where a mismatch exists in both class and feature distribution. In this case, we propose a ''Class-shAring data detection and Feature Adaptation'' (CAFA) framework which requires no prior knowledge of the class relationship between the labeled dataset and unlabeled dataset. Particularly, CAFA utilizes a novel scoring strategy to detect the data in the shared class set. Then, it conducts domain adaptation to fully exploit the value of the detected class-sharing data for better semi-supervised consistency training. Exhaustive experiments on several benchmark datasets show the effectiveness of our method in tackling open-set problems.

AAAI Conference 2020 Conference Paper

Alternating Language Modeling for Cross-Lingual Pre-Training

  • Jian Yang
  • Shuming Ma
  • Dongdong Zhang
  • ShuangZhi Wu
  • Zhoujun Li
  • Ming Zhou

Language model pre-training has achieved success in many natural language processing tasks. Existing methods for cross-lingual pre-training adopt the Translation Language Model to predict masked words given the concatenation of a source sentence and its target equivalent. In this work, we introduce a novel cross-lingual pre-training method, called Alternating Language Modeling (ALM). It code-switches sentences of different languages rather than simply concatenating them, hoping to capture the rich cross-lingual context of words and phrases. More specifically, we randomly substitute source phrases with target translations to create code-switched sentences. Then, we use these code-switched data to train the ALM model to learn to predict words of different languages. We evaluate our pre-trained ALM on the downstream tasks of machine translation and cross-lingual classification. Experiments show that ALM can outperform previous pre-training methods on three benchmarks.
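The code-switching construction described above can be illustrated with a toy phrase-substitution routine: aligned source phrases are randomly replaced by their target-language translations before the masked-prediction training step. The phrase table, substitution probability, and span lengths below are illustrative assumptions, not the paper's data pipeline.

```python
import random

def code_switch(tokens, phrase_table, p=0.3):
    """Replace some source phrases with their target-language translations.

    tokens: list of source-language tokens.
    phrase_table: dict mapping a source phrase (tuple of tokens) to a list of
    target-language tokens; both it and the probability p are assumptions.
    """
    out, i = [], 0
    while i < len(tokens):
        replaced = False
        for span in (3, 2, 1):                       # try longer phrases first
            phrase = tuple(tokens[i:i + span])
            if phrase in phrase_table and random.random() < p:
                out.extend(phrase_table[phrase])     # switch to target language
                i += span
                replaced = True
                break
        if not replaced:
            out.append(tokens[i])
            i += 1
    return out

# e.g. code_switch("we like the music".split(), {("the", "music"): ["die", "Musik"]})
```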

AAAI Conference 2020 Conference Paper

Cross-Modal Attention Network for Temporal Inconsistent Audio-Visual Event Localization

  • Hanyu Xuan
  • Zhenyu Zhang
  • Shuo Chen
  • Jian Yang
  • Yan Yan

In human multi-modality perception systems, the benefits of integrating auditory and visual information are extensive, as they provide plenty of supplementary cues for understanding events. Despite some recent methods proposed for this application, they cannot deal with practical conditions involving temporal inconsistency. Inspired by the human system, which puts different focuses on specific locations, time segments and media while performing multi-modality perception, we provide an attention-based method to simulate this process. Similar to the human mechanism, our network can adaptively select “where” to attend, “when” to attend and “which” to attend for audio-visual event localization. In this way, even with large temporal inconsistency between vision and audio, our network is able to adaptively trade information between different modalities and successfully achieve event localization. Our method achieves state-of-the-art performance on the AVE (Audio-Visual Event) dataset collected in real life. In addition, we also systematically investigate audio-visual event localization tasks. The visualization results also help us better understand how our model works.

AAAI Conference 2020 Conference Paper

Deep Discriminative CNN with Temporal Ensembling for Ambiguously-Labeled Image Classification

  • Yao Yao
  • Jiehui Deng
  • Xiuhua Chen
  • Chen Gong
  • Jianxin Wu
  • Jian Yang

In this paper, we study the problem of image classification where training images are ambiguously annotated with multiple candidate labels, among which only one is correct but is not accessible during the training phase. Due to the adopted non-deep frameworks and improper disambiguation strategies, traditional approaches usually fall short in representation ability and discrimination ability, so their performance remains to be improved. To remedy these two shortcomings, this paper proposes a novel approach termed “Deep Discriminative CNN” (D2CNN) with temporal ensembling. Specifically, to improve the representation ability, we innovatively employ deep convolutional neural networks for ambiguously-labeled image classification, in which the well-known ResNet is adopted as our backbone. To enhance the discrimination ability, we design an entropy-based regularizer to maximize the margin between the potentially correct label and the unlikely ones of each image. In addition, we utilize the temporally ensembled predictions of different epochs to guide the training process so that the latent ground-truth label can be confidently highlighted. This is much superior to the traditional disambiguation operations which treat all candidate labels equally and identify the hidden ground-truth label via some heuristic ways. Thorough experimental results on multiple datasets firmly demonstrate the effectiveness of our proposed D2CNN when compared with other existing state-of-the-art approaches.

IJCAI Conference 2020 Conference Paper

Deep Learning for Community Detection: Progress, Challenges and Opportunities

  • Fanzhen Liu
  • Shan Xue
  • Jia Wu
  • Chuan Zhou
  • Wenbin Hu
  • Cecile Paris
  • Surya Nepal
  • Jian Yang

As communities represent similar opinions, similar functions, similar purposes, etc., community detection is an important and extremely useful tool in both scientific inquiry and data analytics. However, the classic methods of community detection, such as spectral clustering and statistical inference, are falling by the wayside as deep learning techniques demonstrate an increasing capacity to handle high-dimensional graph data with impressive performance. Thus, a survey of current progress in community detection through deep learning is timely. Structured into three broad research streams in this domain – deep neural networks, deep graph embedding, and graph neural networks – this article summarizes the contributions of the various frameworks, models, and algorithms in each stream, along with the current challenges that remain unsolved and the future research opportunities yet to be explored.

TIST Journal 2020 Journal Article

From Appearance to Essence

  • Xiu Susie Fang
  • Quan Z. Sheng
  • Xianzhi Wang
  • Wei Emma Zhang
  • Anne H. H. Ngu
  • Jian Yang

Truth discovery has been widely studied in recent years as a fundamental means for resolving the conflicts in multi-source data. Although many truth discovery methods have been proposed based on different considerations and intuitions, investigations show that no single method consistently outperforms the others. To select the right truth discovery method for a specific application scenario, it becomes essential to evaluate and compare the performance of different methods. A drawback of current research efforts is that they commonly assume the availability of certain ground truth for the evaluation of methods. However, the ground truth may be very limited or even impossible to obtain, rendering the evaluation biased. In this article, we present CompTruthHyp, a generic approach for comparing the performance of truth discovery methods without using ground truth. In particular, our approach calculates the probability of observations in a dataset based on the output of different methods. The probability is then ranked to reflect the performance of these methods. We review and compare 12 representative truth discovery methods and consider both single-valued and multi-valued objects. The empirical studies on both real-world and synthetic datasets demonstrate the effectiveness of our approach for comparing truth discovery methods.

NeurIPS Conference 2020 Conference Paper

Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection

  • Xiang Li
  • Wenhai Wang
  • Lijun Wu
  • Shuo Chen
  • Xiaolin Hu
  • Jun Li
  • Jinhui Tang
  • Jian Yang

One-stage detectors basically formulate object detection as dense classification and localization (i.e., bounding box regression). The classification is usually optimized by Focal Loss and the box location is commonly learned under a Dirac delta distribution. A recent trend for one-stage detectors is to introduce an individual prediction branch to estimate the quality of localization, where the predicted quality facilitates the classification to improve detection performance. This paper delves into the representations of the above three fundamental elements: quality estimation, classification and localization. Two problems are discovered in existing practices, including (1) the inconsistent usage of the quality estimation and classification between training and inference, and (2) the inflexible Dirac delta distribution for localization. To address these problems, we design new representations for these elements. Specifically, we merge the quality estimation into the class prediction vector to form a joint representation, and use a vector to represent the arbitrary distribution of box locations. The improved representations eliminate the inconsistency risk and accurately depict the flexible distribution in real data, but contain continuous labels, which is beyond the scope of Focal Loss. We then propose Generalized Focal Loss (GFL) that generalizes Focal Loss from its discrete form to the continuous version for successful optimization. On COCO test-dev, GFL achieves 45.0% AP using a ResNet-101 backbone, surpassing state-of-the-art SAPD (43.5%) and ATSS (43.6%) with higher or comparable inference speed.
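
As a rough illustration of generalizing a focal-style loss from discrete to continuous labels, the sketch below modulates a binary cross-entropy against soft quality targets by the distance between the prediction and the target. The tensor shapes and the focusing parameter beta are illustrative; the paper's exact formulation may differ.

import torch
import torch.nn.functional as F

def quality_focal_loss(logits, targets, beta=2.0):
    """Focal-style loss generalized to continuous quality targets in [0, 1].

    logits  : raw class scores (any shape)
    targets : continuous labels of the same shape, e.g. IoU-based quality
    beta    : focusing parameter; beta=2 mirrors the usual focal shape
    """
    probs = torch.sigmoid(logits)
    # standard binary cross-entropy against the soft target ...
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # ... modulated by the distance between prediction and soft target
    modulator = (probs - targets).abs().pow(beta)
    return (modulator * ce).mean()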

AAAI Conference 2020 Conference Paper

Hierarchical Online Instance Matching for Person Search

  • Di Chen
  • Shanshan Zhang
  • Wanli Ouyang
  • Jian Yang
  • Bernt Schiele

Person Search is a challenging task which requires retrieving a person’s image and the corresponding position from an image dataset. It consists of two sub-tasks: pedestrian detection and person re-identification (re-ID). One of the key challenges is to properly combine the two sub-tasks into a unified framework. Existing works usually adopt a straightforward strategy by concatenating a detector and a re-ID model directly, either into an integrated model or into separated models. We argue that simply concatenating detection and re-ID is a sub-optimal solution, and we propose a Hierarchical Online Instance Matching (HOIM) loss which exploits the hierarchical relationship between detection and re-ID to guide the learning of our network. Our novel HOIM loss function harmonizes the objectives of the two sub-tasks and encourages better feature learning. In addition, we improve the loss update policy by introducing Selective Memory Refreshment (SMR) for unlabeled persons, which takes advantage of the potential discrimination power of unlabeled data. From the experiments on two standard person search benchmarks, i.e., CUHK-SYSU and PRW, we achieve state-of-the-art performance, which justifies the effectiveness of our proposed HOIM loss on learning robust features.

AAAI Conference 2020 Conference Paper

Image Formation Model Guided Deep Image Super-Resolution

  • Jinshan Pan
  • Yang Liu
  • Deqing Sun
  • Jimmy Ren
  • Ming-Ming Cheng
  • Jian Yang
  • Jinhui Tang

We present a simple and effective image super-resolution algorithm that imposes an image formation constraint on the deep neural networks via pixel substitution. The proposed algorithm first uses a deep neural network to estimate intermediate high-resolution images, blurs the intermediate images using known blur kernels, and then substitutes values of the pixels at the un-decimated positions with those of the corresponding pixels from the low-resolution images. The output of the pixel substitution process strictly satisfies the image formation model and is further refined by the same deep neural network in a cascaded manner. The proposed framework is trained in an end-to-end fashion and can work with existing feed-forward deep neural networks for super-resolution and converges fast in practice. Extensive experimental results show that the proposed algorithm performs favorably against state-of-the-art methods.
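
A minimal sketch of the pixel-substitution step described above, assuming a single known blur kernel, an odd kernel size, and an integer scale factor that divides the image size; the function and argument names are illustrative.

import torch
import torch.nn.functional as F

def pixel_substitution(hr_est, lr, blur_kernel, scale):
    """Enforce the image formation model by pixel substitution (sketch).

    hr_est      : network estimate of the high-resolution image, (B, C, H, W)
    lr          : observed low-resolution image, (B, C, H//scale, W//scale)
    blur_kernel : known blur kernel, (1, 1, k, k) with odd k, shared by channels
    scale       : integer downsampling factor (assumed to divide H and W)
    """
    c = hr_est.shape[1]
    k = blur_kernel.shape[-1]
    # blur the intermediate estimate with the known kernel (depthwise)
    weight = blur_kernel.repeat(c, 1, 1, 1)
    blurred = F.conv2d(hr_est, weight, padding=k // 2, groups=c)
    # overwrite the un-decimated positions with the observed LR pixels
    blurred[:, :, ::scale, ::scale] = lr
    return blurred  # fed back to the same network for further refinement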

IS Journal 2020 Journal Article

MGNN: Mutualistic Graph Neural Network for Joint Friend and Item Recommendation

  • Yang Xiao
  • Lina Yao
  • Qingqi Pei
  • Xianzhi Wang
  • Jian Yang
  • Quan Z. Sheng

Many social studies and practical cases suggest that people's consumption behaviors and social behaviors are not isolated but interrelated in social network services. However, most existing research either predicts users’ consumption preferences or recommends friends to users without dealing with them simultaneously. We propose a holistic approach to predict users’ preferences on friends and items jointly and thereby make better recommendations. To this end, we design a graph neural network that incorporates a mutualistic mechanism to model the mutual reinforcement relationship between users’ consumption behaviors and social behaviors. Our experiments on two real-world datasets demonstrate the effectiveness of our approach in both social recommendation and link prediction.

IJCAI Conference 2020 Conference Paper

Online Positive and Unlabeled Learning

  • Chuang Zhang
  • Chen Gong
  • Tengfei Liu
  • Xun Lu
  • Weiqiang Wang
  • Jian Yang

Positive and Unlabeled learning (PU learning) aims to build a binary classifier where only positive and unlabeled data are available for classifier training. However, existing PU learning methods all work in a batch learning mode, which cannot deal with online learning scenarios with sequential data. Therefore, this paper proposes a novel positive and unlabeled learning algorithm in an online training mode, which trains a classifier solely on the positive and unlabeled data arriving in sequential order. Specifically, we adopt an unbiased estimate of the loss induced by the arriving positive or unlabeled examples at each time step. Then we show that for any incoming new single datum, the model can be updated independently and incrementally by a gradient-based online learning method. Furthermore, we extend our method to tackle the cases where more than one example is received at each time step. Theoretically, we show that the proposed online PU learning method achieves low regret even though it receives sequential positive and unlabeled data. Empirically, we conduct intensive experiments on both benchmark and real-world datasets, and the results clearly demonstrate the effectiveness of the proposed method.
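
The sketch below illustrates one online update using the standard unbiased PU risk estimator for a linear scorer with logistic loss; the estimator and learning-rate schedule used in the paper may differ, and the class prior is assumed to be known.

import numpy as np

def online_pu_step(w, x, is_positive, prior, lr=0.1):
    """One online gradient step on an unbiased PU risk estimate (sketch).

    w           : current weight vector of a linear scorer f(x) = w.x
    x           : incoming feature vector
    is_positive : True if the example arrives with a positive label
    prior       : assumed class prior pi_p = P(y = +1)
    Uses the logistic loss l(z, y) = log(1 + exp(-y z)).
    """
    z = w @ x

    def grad_logistic(y):            # gradient of log(1 + exp(-y z)) w.r.t. w
        return (-y / (1.0 + np.exp(y * z))) * x

    if is_positive:
        # a labeled positive contributes pi_p * [l(z, +1) - l(z, -1)]
        g = prior * (grad_logistic(+1) - grad_logistic(-1))
    else:
        # an unlabeled example contributes l(z, -1)
        g = grad_logistic(-1)
    return w - lr * g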

JMLR Journal 2020 Journal Article

Provable Convex Co-clustering of Tensors

  • Eric C. Chi
  • Brian J. Gaines
  • Will Wei Sun
  • Hua Zhou
  • Jian Yang

Cluster analysis is a fundamental tool for pattern discovery in complex heterogeneous data. Prevalent clustering methods mainly focus on vector or matrix-variate data and are not applicable to general-order tensors, which arise frequently in modern scientific and business applications. Moreover, there is a gap between statistical guarantees and computational efficiency for existing tensor clustering solutions due to the nature of their non-convex formulations. In this work, we bridge this gap by developing a provable convex formulation of tensor co-clustering. Our convex co-clustering (CoCo) estimator enjoys stability guarantees and its computational and storage costs are polynomial in the size of the data. We further establish a non-asymptotic error bound for the CoCo estimator, which reveals a surprising “blessing of dimensionality” phenomenon that does not exist in vector or matrix-variate cluster analysis. Our theoretical findings are supported by extensive simulation studies. Finally, we apply the CoCo estimator to the cluster analysis of advertisement click tensor data from a major online company. Our clustering results provide meaningful business insights to improve advertising effectiveness.

AAAI Conference 2020 Conference Paper

Understanding the Disharmony between Weight Normalization Family and Weight Decay

  • Xiang Li
  • Shuo Chen
  • Jian Yang

The merits of fast convergence and potentially better performance of the weight normalization family have drawn increasing attention in recent years. These methods use standardization or normalization that changes the weight $W$ to $\hat{W} = W / \|W\|$, which makes $\hat{W}$ independent of the magnitude of $W$. Surprisingly, $W$ must still be decayed during gradient descent, otherwise we observe a severe under-fitting problem, which is very counter-intuitive since weight decay is widely known to prevent deep networks from over-fitting. Moreover, if we substitute (e.g., for weight normalization) $\hat{W} = W / \|W\|$ into the original loss function $\sum_i L(f(x_i; \hat{W}), y_i) + \frac{1}{2}\lambda\|\hat{W}\|^2$, we observe that the regularization term $\frac{1}{2}\lambda\|\hat{W}\|^2$ reduces to the constant $\frac{1}{2}\lambda$ in the optimization objective. Therefore, to decay $W$, we need to explicitly append the term $\frac{1}{2}\lambda\|W\|^2$. In this paper, we theoretically prove that $\frac{1}{2}\lambda\|W\|^2$ improves optimization only by modulating the effective learning rate and has virtually no influence on generalization when the weight normalization family is employed. Furthermore, we also expose several serious problems that arise when introducing the weight decay term to the weight normalization family, including the missing global minimum, training instability and sensitivity to initialization. To address these problems, we propose an Adaptive Weight Shrink (AWS) scheme, which gradually shrinks the weights during optimization by a dynamic coefficient proportional to the magnitude of the parameter. This simple yet effective method appropriately controls the effective learning rate, which significantly improves training stability and makes optimization more robust to initialization.
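
Purely as an illustration of the shrinking idea described above, the sketch below scales each weight tensor down by a coefficient proportional to its own norm; the exact schedule, coefficient and capping used by AWS in the paper may differ.

import torch

def adaptive_weight_shrink(params, base_coeff=1e-4):
    """Illustrative shrink step: scale each weight tensor down by a coefficient
    proportional to its norm (a stand-in for AWS; the paper's exact rule may
    differ). Intended to be called once per optimization step after the
    gradient update."""
    with torch.no_grad():
        for p in params:
            shrink = base_coeff * p.norm().item()
            p.mul_(1.0 - min(shrink, 0.5))  # cap the shrink factor for stability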

AAAI Conference 2020 Conference Paper

Variational Pathway Reasoning for EEG Emotion Recognition

  • Tong Zhang
  • Zhen Cui
  • Chunyan Xu
  • Wenming Zheng
  • Jian Yang

Research on human emotion cognition revealed that connections and pathways exist between spatially-adjacent and functional-related areas during emotion expression (Adolphs 2002a; Bullmore and Sporns 2009). Deeply inspired by this mechanism, we propose a heuristic Variational Pathway Reasoning (VPR) method to deal with EEG-based emotion recognition. We introduce random walk to generate a large number of candidate pathways along electrodes. To encode each pathway, the dynamic sequence model is further used to learn between-electrode dependencies. The encoded pathways around each electrode are aggregated to produce a pseudo maximum-energy pathway, which consists of the most important pair-wise connections. To find those most salient connections, we propose a sparse variational scaling (SVS) module to learn scaling factors of pseudo pathways by using the Bayesian probabilistic process and sparsity constraint, where the former endows good generalization ability while the latter favors adaptive pathway selection. Finally, the salient pathways from those candidates are jointly decided by the pseudo pathways and scaling factors. Extensive experiments on EEG emotion recognition demonstrate that the proposed VPR is superior to those state-of-the-art methods, and could find some interesting pathways w.r.t. different emotions.

IJCAI Conference 2019 Conference Paper

CoTrRank: Trust Evaluation of Users and Tweets

  • Peiyao Li
  • Weiliang Zhao
  • Jian Yang
  • Jia Wu

Trust evaluation of people and information on Twitter is critical for maintaining a healthy online social environment. How to evaluate the trustworthiness of users and tweets becomes a challenging question. In this demo, we show how our proposed CoTrRank approach deals with this problem. This approach models users and tweets in two coupled networks and calculates their trust values in different trust spaces. In particular, our solution provides a configurable way of mapping the calculated raw evidence to trust values. The CoTrRank demo system has an interactive interface to show how our proposed approach produces more effective and adaptive trust evaluation results compared with baseline methods.

NeurIPS Conference 2019 Conference Paper

Curvilinear Distance Metric Learning

  • Shuo Chen
  • Lei Luo
  • Jian Yang
  • Chen Gong
  • Jun Li
  • Heng Huang

Distance Metric Learning aims to learn an appropriate metric that faithfully measures the distance between two data points. Traditional metric learning methods usually calculate the pairwise distance with fixed distance functions (e.g., Euclidean distance) in the projected feature spaces. However, they fail to learn the underlying geometries of the sample space, and thus cannot exactly predict the intrinsic distances between data points. To address this issue, we first reveal that the traditional linear distance metric is equivalent to the cumulative arc length between the data pair's nearest points on the learned straight measurer lines. After that, by extending such straight lines to general curved forms, we propose a Curvilinear Distance Metric Learning (CDML) method, which adaptively learns the nonlinear geometries of the training data. By virtue of the Weierstrass theorem, the proposed CDML is equivalently parameterized with a third-order tensor, and the optimization algorithm is designed to learn the tensor parameter. Theoretical analysis is derived to guarantee the effectiveness and soundness of CDML. Extensive experiments on synthetic and real-world datasets validate the superiority of our method over state-of-the-art metric learning models.

AAAI Conference 2019 Conference Paper

Data-Adaptive Metric Learning with Scale Alignment

  • Shuo Chen
  • Chen Gong
  • Jian Yang
  • Ying Tai
  • Le Hui
  • Jun Li

The central problem for most existing metric learning methods is to find a suitable projection matrix on the differences of all pairs of data points. However, a single unified projection matrix can hardly characterize all data similarities accurately, as practical data are usually very complicated, and simply adopting one global projection matrix might ignore important local patterns hidden in the dataset. To address this issue, this paper proposes a novel method dubbed “Data-Adaptive Metric Learning” (DAML), which constructs a data-adaptive projection matrix for each data pair by selectively combining a set of learned candidate matrices. As a result, every data pair can obtain a specific projection matrix, enabling the proposed DAML to flexibly fit the training data and produce discriminative projection results. The model of DAML is formulated as an optimization problem which jointly learns candidate projection matrices and their sparse combination for every data pair. Nevertheless, the over-fitting problem may occur due to the large number of parameters to be learned. To tackle this issue, we adopt the Total Variation (TV) regularizer to align the scales of the data embeddings produced by all candidate projection matrices, so that the generated metrics of these learned candidates are generally comparable. Furthermore, we extend the basic linear DAML model to the kernelized version (denoted “KDAML”) to handle non-linear cases, and the Iterative Shrinkage-Thresholding Algorithm (ISTA) is employed to solve the optimization model. Intensive experimental results on various applications including retrieval, classification, and verification clearly demonstrate the superiority of our algorithm to other state-of-the-art metric learning methodologies.

AAAI Conference 2019 Conference Paper

Gaussian-Induced Convolution for Graphs

  • Jiatao Jiang
  • Zhen Cui
  • Chunyan Xu
  • Jian Yang

Learning representations on graphs plays a crucial role in numerous tasks of pattern recognition. However, different from grid-shaped images/videos, on which local convolution kernels can be defined as lattices, graphs are fully coordinate-free on vertices and edges. In this work, we propose a Gaussian-induced convolution (GIC) framework to conduct local convolution filtering on irregular graphs. Specifically, an edge-induced Gaussian mixture model is designed to encode variations of a subgraph region by integrating edge information into weighted Gaussian models, each of which implicitly characterizes one component of subgraph variations. In order to coarsen a graph, we derive a vertex-induced Gaussian mixture model to cluster vertices dynamically according to the connection of edges, which is approximately equivalent to the weighted graph cut. We conduct our multi-layer graph convolution network on several public datasets of graph classification. The extensive experiments demonstrate that our GIC is effective and can achieve state-of-the-art results.

AAAI Conference 2019 Conference Paper

Inter-Class Angular Loss for Convolutional Neural Networks

  • Le Hui
  • Xiang Li
  • Chen Gong
  • Meng Fang
  • Joey Tianyi Zhou
  • Jian Yang

Convolutional Neural Networks (CNNs) have shown great power in various classification tasks and have achieved remarkable results in practical applications. However, the distinct learning difficulties in discriminating different pairs of classes are largely ignored by the existing networks. For instance, in the CIFAR-10 dataset, distinguishing cats from dogs is usually harder than distinguishing horses from ships. By carefully studying the behavior of CNN models in the training process, we observe that the confusion level of two classes is strongly correlated with their angular separability in the feature space. That is, the larger the inter-class angle is, the lower the confusion will be. Based on this observation, we propose a novel loss function dubbed “Inter-Class Angular Loss” (ICAL), which explicitly models the class correlation and can be directly applied to many existing deep networks. By minimizing the proposed ICAL, the networks can effectively discriminate the examples in similar classes by enlarging the angle between their corresponding class vectors. Thorough experimental results on a series of vision and non-vision datasets confirm that ICAL critically improves the discriminative ability of various representative deep neural networks and generates superior performance to the original networks with conventional softmax loss.
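
As an illustration of encouraging large inter-class angles, the sketch below penalizes the pairwise cosine similarity between the class vectors of a network's final linear layer; this simple penalty is only in the spirit of ICAL and is not the paper's exact loss.

import torch
import torch.nn.functional as F

def inter_class_angular_penalty(class_weights):
    """Mean pairwise cosine similarity between distinct class vectors
    (e.g. rows of the final linear layer), to be minimized together with the
    usual softmax loss so that inter-class angles grow. Illustrative only."""
    w = F.normalize(class_weights, dim=1)           # unit-length class vectors
    cos = w @ w.t()                                 # pairwise cosine similarities
    n = w.shape[0]
    off_diag = cos - torch.eye(n, device=w.device)  # drop the diagonal (self-similarity)
    return off_diag.clamp(min=0).sum() / (n * (n - 1))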

TIST Journal 2019 Journal Article

Multi-Modal Curriculum Learning over Graphs

  • Chen Gong
  • Jian Yang
  • Dacheng Tao

Curriculum Learning (CL) is a recently proposed learning paradigm that aims to achieve satisfactory performance by properly organizing the learning sequence from simple curriculum examples to more difficult ones. Up to now, few works have been done to explore CL for the data with graph structure. Therefore, this article proposes a novel CL algorithm that can be utilized to guide the Label Propagation (LP) over graphs, of which the target is to “learn” the labels of unlabeled examples on the graphs. Specifically, we assume that different unlabeled examples have different levels of difficulty for propagation, and their label learning should follow a simple-to-difficult sequence with the updated curricula. Furthermore, considering that the practical data are often characterized by multiple modalities, every modality in our method is associated with a “teacher” that not only evaluates the difficulties of examples from its own viewpoint, but also cooperates with other teachers to generate the overall simplest curriculum examples for propagation. By taking the curriculums suggested by the teachers as a whole, the common preference (i.e., commonality) of teachers on selecting the simplest examples can be discovered by a row-sparse matrix, and their distinct opinions (i.e., individuality) are captured by a sparse noise matrix. As a result, an accurate curriculum sequence can be established and the propagation quality can thus be improved. Theoretically, we prove that the propagation risk bound is closely related to the examples’ difficulty information, and empirically, we show that our method can generate higher accuracy than the state-of-the-art CL approach and LP algorithms on various multi-modal tasks.

TIST Journal 2019 Journal Article

Multi-View Fusion with Extreme Learning Machine for Clustering

  • Yongshan Zhang
  • Jia Wu
  • Chuan Zhou
  • Zhihua Cai
  • Jian Yang
  • Philip S. Yu

Unlabeled, multi-view data presents a considerable challenge in many real-world data analysis tasks. These data are worth exploring because they often contain complementary information that improves the quality of the analysis results. Clustering with multi-view data is a particularly challenging problem as revealing the complex data structures between many feature spaces demands discriminative features that are specific to the task and, when too few of these features are present, performance suffers. Extreme learning machines (ELMs) are an emerging form of learning model that have shown an outstanding representation ability and superior performance in a range of different learning tasks. Motivated by the promise of this advancement, we have developed a novel multi-view fusion clustering framework based on an ELM, called MVEC. MVEC learns the embeddings from each view of the data via the ELM network, then constructs a single unified embedding according to the correlations and dependencies between the embeddings, automatically weighting the contribution of each. This process exposes the underlying clustering structures embedded within multi-view data with a high degree of accuracy. A simple yet efficient solution is also provided to solve the optimization problem within MVEC. Experiments and comparisons on eight different benchmarks from different domains confirm MVEC’s clustering accuracy.

JBHI Journal 2019 Journal Article

Patch-Based Adaptive Background Subtraction for Vascular Enhancement in X-Ray Cineangiograms

  • Shuang Song
  • Alejandro F. Frangi
  • Jian Yang
  • Danni Ai
  • Chenbing Du
  • Yong Huang
  • Hong Song
  • Luosha Zhang

Objective: Automatic vascular enhancement in X-ray cineangiography is of crucial interest, for instance, for better visualizing and quantifying coronary arteries in diagnostic and interventional procedures. Methods: A novel patch-based adaptive background subtraction method (PABSM) is proposed to automatically enhance vessels in coronary X-ray cineangiography. First, pixels in the cineangiogram are described by vesselness and Gabor features. Second, a classifier is utilized to separate the cineangiogram into rough vascular and non-vascular regions. Dilation is applied to the classified binary image to include more of the vascular region. Third, a patch-based background synthesis is utilized to fill the removed vascular region. Results: A database containing 320 cineangiograms of 175 patients was collected, and an interventional cardiologist annotated all vascular structures. The performance of PABSM is compared with six state-of-the-art vascular enhancement methods regarding the precision–recall curve and C-value. The area under the precision–recall curve is 0.7133, and the C-value is 0.9659. Conclusion: PABSM can automatically enhance the coronary artery in cineangiograms. It preserves the integrity of vascular topological structures, particularly in complex vascular regions, and removes noise caused by the non-uniform gray-level distribution in the cineangiogram. Significance: PABSM can avoid motion artifacts and it eases the subsequent vascular segmentation, which is crucial for the diagnosis and interventional procedures of coronary artery diseases.

IJCAI Conference 2019 Conference Paper

Positive and Unlabeled Learning with Label Disambiguation

  • Chuang Zhang
  • Dexin Ren
  • Tongliang Liu
  • Jian Yang
  • Chen Gong

Positive and Unlabeled (PU) learning aims to learn a binary classifier from only positive and unlabeled training data. The state-of-the-art methods usually formulate PU learning as a cost-sensitive learning problem, in which every unlabeled example is simultaneously treated as positive and negative with different class weights. However, the ground-truth label of an unlabeled example should be unique, so the existing models inadvertently introduce label noise which may lead to a biased classifier and deteriorated performance. To solve this problem, this paper proposes a novel algorithm dubbed "Positive and Unlabeled learning with Label Disambiguation" (PULD). We first regard all the unlabeled examples in PU learning as ambiguously labeled as positive and negative, and then employ the margin-based label disambiguation strategy, which enlarges the margin of classifier response between the most likely label and the less likely one, to find the unique ground-truth label of each unlabeled example. Theoretically, we derive the generalization error bound of the proposed method by analyzing its Rademacher complexity. Experimentally, we conduct intensive experiments on both benchmark and real-world datasets, and the results clearly demonstrate the superiority of the proposed PULD to the existing PU learning approaches.

IJCAI Conference 2018 Conference Paper

Adversarial Metric Learning

  • Shuo Chen
  • Chen Gong
  • Jian Yang
  • Xiang Li
  • Yang Wei
  • Jun Li

In the past decades, intensive efforts have been put into designing various loss functions and metric forms for the metric learning problem. These improvements have shown promising results when the test data are similar to the training data. However, the trained models often fail to produce reliable distances on ambiguous test pairs due to the different samplings of the training set and test set. To address this problem, Adversarial Metric Learning (AML) is proposed in this paper, which automatically generates adversarial pairs to remedy the sampling bias and facilitate robust metric learning. Specifically, AML consists of two adversarial stages, i.e., confusion and distinguishment. In the confusion stage, ambiguous but critical adversarial data pairs are adaptively generated to mislead the learned metric. In the distinguishment stage, a metric is exhaustively learned to try its best to distinguish both the adversarial pairs and the original training pairs. Thanks to the challenges posed by the confusion stage in this competing process, the AML model is able to grasp plentiful difficult knowledge that is not contained in the original training pairs, so the discriminability of AML can be significantly improved. The entire model is formulated into an optimization framework, of which the global convergence is theoretically proved. The experimental results on toy data and practical datasets clearly demonstrate the superiority of AML over representative state-of-the-art metric learning models.

IJCAI Conference 2018 Conference Paper

Mixed Link Networks

  • Wenhai Wang
  • Xiang Li
  • Tong Lu
  • Jian Yang

On the basis of an analysis revealing the equivalence of modern networks, we find that both ResNet and DenseNet are essentially derived from the same "dense topology", yet they only differ in the form of connection: addition (dubbed "inner link") vs. concatenation (dubbed "outer link"). However, both forms of connection have their own strengths and shortcomings. To combine their advantages and avoid certain limitations on representation learning, we present a highly efficient and modularized Mixed Link Network (MixNet) which is equipped with flexible inner link and outer link modules. Consequently, ResNet, DenseNet and Dual Path Network (DPN) can each be regarded as a special case of MixNet. Furthermore, we demonstrate that MixNets can achieve superior parameter efficiency over the state-of-the-art architectures on many competitive datasets like CIFAR-10/100, SVHN and ImageNet.
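
A toy sketch of mixing an additive inner link with a concatenative outer link in one block is given below; the channel sizes and the single convolution per link are illustrative simplifications, not the paper's architecture.

import torch
import torch.nn as nn

class MixedLinkBlock(nn.Module):
    """Toy block combining an additive "inner link" (ResNet-style) with a
    concatenative "outer link" (DenseNet-style). Requires inner_ch <= in_ch."""
    def __init__(self, in_ch, inner_ch, outer_ch):
        super().__init__()
        self.inner = nn.Conv2d(in_ch, inner_ch, 3, padding=1)  # features added back
        self.outer = nn.Conv2d(in_ch, outer_ch, 3, padding=1)  # features concatenated

    def forward(self, x):
        inner = self.inner(x)
        # inner link: add onto the matching leading channels of the input
        mixed = torch.cat([x[:, :inner.shape[1]] + inner, x[:, inner.shape[1]:]], dim=1)
        # outer link: append new feature maps, giving in_ch + outer_ch channels
        return torch.cat([mixed, self.outer(x)], dim=1)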

IJCAI Conference 2018 Conference Paper

Nonrigid Points Alignment with Soft-weighted Selection

  • Xuelong Li
  • Jian Yang
  • Qi Wang

Point set registration (PSR) is a crucial problem in computer vision and pattern recognition. Existing PSR methods cannot align point sets robustly under degradations such as deformation, noise, occlusion, outliers, and multi-view changes. In this paper, we present a self-selected regularized Gaussian fields criterion for nonrigid point matching. Unlike most existing methods, we formulate the registration problem as a sparse approximation task with a low-rank constraint in a reproducing kernel Hilbert space (RKHS). A self-selected mechanism is used to dynamically assign a real-valued label to each point in an accuracy-aware weighting manner, which makes the model focus more on the reliably positioned points. Based on the labels, an equivalent matching number optimization is embedded into the non-rigid criterion to enhance the reliability of the approximation. Experimental results show that the proposed method achieves better results in both registration accuracy and correct matches compared to state-of-the-art approaches.

IJCAI Conference 2018 Conference Paper

Positive and Unlabeled Learning via Loss Decomposition and Centroid Estimation

  • Hong Shi
  • Shaojun Pan
  • Jian Yang
  • Chen Gong

Positive and Unlabeled learning (PU learning) aims to train a binary classifier based on only positive and unlabeled examples, where the unlabeled examples could be either positive or negative. The state-of-the-art algorithms usually cast PU learning as a cost-sensitive learning problem and impose distinct weights to different training examples via a manual or automatic way. However, such weight adjustment or estimation can be inaccurate and thus often lead to unsatisfactory performance. Therefore, this paper regards all unlabeled examples as negative, which means that some of the original positive data are mistakenly labeled as negative. By doing so, we convert PU learning into the risk minimization problem in the presence of false negative label noise, and propose a novel PU learning algorithm termed "Loss Decomposition and Centroid Estimation" (LDCE). By decomposing the hinge loss function into two parts, we show that only the second part is influenced by label noise, of which the adverse effect can be reduced by estimating the centroid of negative examples. We intensively validate our approach on a synthetic dataset, UCI benchmark datasets and real-world datasets, and the experimental results firmly demonstrate the effectiveness of our approach when compared with other state-of-the-art PU learning methodologies.

AAAI Conference 2018 Conference Paper

Spatio-Temporal Graph Convolution for Skeleton Based Action Recognition

  • Chaolong Li
  • Zhen Cui
  • Wenming Zheng
  • Chunyan Xu
  • Jian Yang

Variations of human body skeletons may be considered as dynamic graphs, which are a generic data representation for numerous real-world applications. In this paper, we propose a spatio-temporal graph convolution (STGC) approach that assembles the successes of local convolutional filtering and the sequence learning ability of autoregressive moving average models. To encode dynamic graphs, the constructed multi-scale local graph convolution filters, consisting of matrices of local receptive fields and signal mappings, are recursively performed on structured graph data of the temporal and spatial domains. The proposed model is generic and principled, as it can be generalized into other dynamic models. We theoretically prove the stability of STGC and provide an upper bound on the signal transformation to be learnt. Further, the proposed recursive model can be stacked into a multi-layer architecture. To evaluate our model, we conduct extensive experiments on four benchmark skeleton-based action datasets, including the large-scale challenging NTU RGB+D. The experimental results demonstrate the effectiveness of our proposed model and the improvement over the state-of-the-art.

IJCAI Conference 2018 Conference Paper

Teaching Semi-Supervised Classifier via Generalized Distillation

  • Chen Gong
  • Xiaojun Chang
  • Meng Fang
  • Jian Yang

Semi-Supervised Learning (SSL) is able to build a reliable classifier with very scarce labeled examples by properly utilizing the abundant unlabeled examples. However, existing SSL algorithms often yield unsatisfactory performance due to the lack of supervision information. To address this issue, this paper formulates SSL as a Generalized Distillation (GD) problem, which treats an existing SSL algorithm as a learner and introduces a teacher to guide the learner's training process. Specifically, the intelligent teacher holds privileged knowledge that "explains" the training data but remains unknown to the learner, and the teacher should convey its rich knowledge to the imperfect learner through a specific teaching function. After that, the learner gains knowledge by "imitating" the output of the teaching function under an optimization framework. Therefore, the learner in our algorithm learns from both the teacher and the training data, so its output can be substantially distilled and enhanced. By deriving the Rademacher complexity and error bounds of the proposed algorithm, the usefulness of the introduced teacher is theoretically demonstrated. The superiority of our algorithm to the related state-of-the-art methods has also been empirically demonstrated by experiments on different datasets with various sources of privileged knowledge.

IJCAI Conference 2017 Conference Paper

Importance-Aware Semantic Segmentation for Autonomous Driving System

  • Bi-ke Chen
  • Chen Gong
  • Jian Yang

Semantic Segmentation (SS) partitions an image into several coherent, semantically meaningful parts, and classifies each part into one of the pre-determined classes. In this paper, we argue that existing SS methods cannot be reliably applied to autonomous driving systems as they ignore the different importance levels of distinct classes for safe driving. For example, pedestrians in the scene are much more important than the sky when driving a car, so their segmentation should be as accurate as possible. To incorporate the importance information possessed by various object classes, this paper designs an "Importance-Aware Loss" (IAL) that specifically emphasizes the critical objects for autonomous driving. IAL operates under a hierarchical structure, and the classes with different importance are located in different levels so that they are assigned distinct weights. Furthermore, we derive the forward and backward propagation rules for IAL and apply them to deep neural networks for realizing SS in intelligent driving systems. The experiments on the CamVid and Cityscapes datasets reveal that by employing the proposed loss function, existing deep learning models including FCN, SegNet and ENet are able to consistently obtain improved segmentation results on the pre-defined important classes for safe driving.
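
As a simplified illustration, the sketch below applies a flat per-class importance weighting inside a cross-entropy loss for segmentation; the hierarchical weighting structure of IAL described above is not reproduced here.

import torch
import torch.nn.functional as F

def importance_weighted_ce(logits, labels, class_importance):
    """Illustrative importance-weighted cross-entropy for semantic segmentation.

    logits           : (B, C, H, W) raw scores
    labels           : (B, H, W) integer class labels
    class_importance : length-C float tensor of importance weights, e.g. larger
                       for pedestrians than for sky (values are assumptions)
    """
    return F.cross_entropy(logits, labels, weight=class_importance)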

NeurIPS Conference 2016 Conference Paper

Large Margin Discriminant Dimensionality Reduction in Prediction Space

  • Mohammad Saberian
  • Jose Costa Pereira
  • Can Xu
  • Jian Yang
  • Nuno Nvasconcelos

In this paper we establish a duality between boosting and SVM, and use it to derive a novel discriminant dimensionality reduction algorithm. In particular, using the multiclass formulations of boosting and SVM, we note that both use a combination of mapping and linear classification to maximize the multiclass margin. In SVM this is implemented using a pre-defined mapping (induced by the kernel) and optimizing the linear classifiers. In boosting the linear classifiers are pre-defined and the mapping (predictor) is learned through a combination of weak learners. We argue that the intermediate mapping, e.g., the boosting predictor, preserves the discriminant aspects of the data, and that by controlling the dimension of this mapping it is possible to achieve discriminant low-dimensional representations of the data. We use the aforementioned duality and propose a new method, Large Margin Discriminant Dimensionality Reduction (LADDER), that jointly learns the mapping and the linear classifiers in an efficient manner. This leads to a data-driven mapping which can embed data into any number of dimensions. Experimental results show that this embedding can significantly improve performance on tasks such as hashing and image/scene classification.

NeurIPS Conference 2016 Conference Paper

LightRNN: Memory and Computation-Efficient Recurrent Neural Networks

  • Xiang Li
  • Tao Qin
  • Jian Yang
  • Tie-Yan Liu

Recurrent neural networks (RNNs) have achieved state-of-the-art performance in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model becomes very big (e.g., possibly beyond the memory capacity of a GPU device) and its training becomes very inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use a 2-Component (2C) shared embedding for word representations. We allocate every word in the vocabulary into a table, each row of which is associated with a vector, and each column associated with another vector. Depending on its position in the table, a word is jointly represented by two components: a row vector and a column vector. Since the words in the same row share the row vector and the words in the same column share the column vector, we only need $2\sqrt{|V|}$ vectors to represent a vocabulary of $|V|$ unique words, which is far fewer than the $|V|$ vectors required by existing approaches. Based on the 2-Component shared embedding, we design a new RNN algorithm and evaluate it using the language modeling task on several benchmark datasets. The results show that our algorithm significantly reduces the model size and speeds up the training process, without sacrificing accuracy (it achieves similar, if not better, perplexity compared to state-of-the-art language models). Remarkably, on the One-Billion-Word benchmark dataset, our algorithm achieves comparable perplexity to previous language models, whilst reducing the model size by a factor of 40-100, and speeding up the training process by a factor of 2. We name our proposed algorithm LightRNN to reflect its very small model size and very high training speed.
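
A toy sketch of the 2-Component shared embedding follows: each word id is mapped to a (row, column) cell of a roughly sqrt(|V|) by sqrt(|V|) table and represented by the concatenation of its row and column vectors. The trivial row/column assignment used here is an illustrative stand-in for the allocation that LightRNN refines during training.

import math
import torch
import torch.nn as nn

class TwoComponentEmbedding(nn.Module):
    """Toy 2-component shared embedding: a word is addressed by a (row, column)
    cell of a sqrt(|V|) x sqrt(|V|) table and represented by its row vector and
    column vector, so only ~2*sqrt(|V|) vectors are stored."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.side = math.ceil(math.sqrt(vocab_size))
        self.row_emb = nn.Embedding(self.side, dim)   # ~sqrt(|V|) row vectors
        self.col_emb = nn.Embedding(self.side, dim)   # ~sqrt(|V|) column vectors

    def forward(self, word_ids):
        rows = word_ids // self.side
        cols = word_ids % self.side
        # a word is jointly represented by its row and column components
        return torch.cat([self.row_emb(rows), self.col_emb(cols)], dim=-1)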

AAAI Conference 2015 Conference Paper

Causal Inference via Sparse Additive Models with Application to Online Advertising

  • Wei Sun
  • Pengyuan Wang
  • Dawei Yin
  • Jian Yang
  • Yi Chang

Advertising effectiveness measurement is a fundamental problem in online advertising. Various causal inference methods have been employed to measure the causal effects of ad treatments. However, existing methods mainly focus on linear logistic regression for univariate and binary treatments and are not well suited for complex ad treatments of multiple dimensions, where each dimension could be discrete or continuous. In this paper we propose a novel two-stage causal inference framework for assessing the impact of complex ad treatments. In the first stage, we estimate the propensity parameter via a sparse additive model; in the second stage, a propensity-adjusted regression model is applied for measuring the treatment effect. Our approach is shown to provide an unbiased estimation of ad effectiveness under regularity conditions. To demonstrate the efficacy of our approach, we apply it to a real online advertising campaign to evaluate the impact of three ad treatments: ad frequency, ad channel, and ad size. We show that ad frequency usually has a treatment effect cap when ads are shown on mobile devices. In addition, the strategies for choosing the best ad size are completely different for mobile ads and online ads.

AAAI Conference 2015 Conference Paper

Sparse Deep Stacking Network for Image Classification

  • Jun Li
  • Heyou Chang
  • Jian Yang

Sparse coding can learn representations that are robust to noise and can model higher-order representations for image classification. However, the inference algorithm is computationally expensive even when supervised signals are used to learn compact and discriminative dictionaries in sparse coding techniques. Fortunately, a simplified neural network module (SNNM) has been proposed to directly learn the discriminative dictionaries while avoiding the expensive inference. But the SNNM module ignores sparse representations. Therefore, we propose a sparse SNNM module by adding a mixed-norm regularization (the l1/l2 norm). The sparse SNNM modules are further stacked to build a sparse deep stacking network (S-DSN). In the experiments, we evaluate S-DSN on four databases, including Extended YaleB, AR, 15-Scene and Caltech101. Experimental results show that our model outperforms related classification methods with only a linear classifier. It is worth noting that we reach 98.8% recognition accuracy on 15-Scene.
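
For illustration, one common form of an l1/l2 mixed-norm (group-sparse) regularizer is sketched below; whether this exact grouping matches the paper's regularizer is an assumption.

import torch

def mixed_norm_l1_l2(weight):
    """l1/l2 mixed-norm (group-sparse) regularizer: an l2 norm is taken within
    each row (group) and an l1 norm (a plain sum) across rows, which encourages
    entire rows to be switched off. The row-wise grouping is illustrative."""
    return weight.norm(p=2, dim=1).sum()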

AAAI Conference 2014 Conference Paper

Delivering Guaranteed Display Ads under Reach and Frequency Requirements

  • Ali Hojjat
  • John Turner
  • Suleyman Cetintas
  • Jian Yang

We propose a novel idea in the allocation and serving of online advertising. We show that by using predetermined fixed-length streams of ads (which we call patterns) to serve advertising, we can incorporate a variety of interesting features into the ad allocation optimization problem. In particular, our formulation optimizes for representativeness as well as user-level diversity and pacing of ads, under reach and frequency requirements. We show how the problem can be solved efficiently using a column generation scheme in which only a small set of best patterns are kept in the optimization problem. Our numerical tests suggest that with parallelization of the pattern generation process, the algorithm has a promising run time and memory usage.

IJCAI Conference 2013 Conference Paper

Instance Selection and Instance Weighting for Cross-Domain Sentiment Classification via PU Learning

  • Rui Xia
  • Xuelei Hu
  • Jianfeng Lu
  • Jian Yang
  • Chengqing Zong

Due to the explosive growth of online reviews on the Internet, we can easily collect a large amount of labeled reviews from different domains. But only some of them are beneficial for training a desired target-domain sentiment classifier. Therefore, it is important to identify the samples that are the most relevant to the target domain and use them as training data. To address this problem, a novel approach based on instance selection and instance weighting via PU learning is proposed. PU learning is first used to learn an in-target-domain selector, which assigns an in-target-domain probability to each sample in the training set. For instance selection, the samples with higher in-target-domain probability are used as training data; for instance weighting, the calibrated in-target-domain probabilities are used as sampling weights for training an instance-weighted naive Bayes model, based on the principle of maximum weighted likelihood estimation. The experimental results prove the necessity and effectiveness of the approach, especially when the size of the training data is large. It is also shown that the larger the Kullback-Leibler divergence between the training and test data, the more effective the proposed approach.
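
A minimal sketch of the instance-weighting step, assuming scikit-learn's MultinomialNB and using the calibrated in-target-domain probabilities as per-sample weights; the feature representation and the calibration itself are outside this sketch.

from sklearn.naive_bayes import MultinomialNB

def train_instance_weighted_nb(X_train, y_train, in_target_probs):
    """Instance-weighted naive Bayes (sketch): the calibrated in-target-domain
    probabilities produced by a PU-learned selector serve as per-sample
    weights, i.e. maximum weighted likelihood estimation."""
    clf = MultinomialNB()
    clf.fit(X_train, y_train, sample_weight=in_target_probs)
    return clf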

IS Journal 2011 Journal Article

Using Brain Imaging to Interpret Student Problem Solving

  • John Anderson
  • Shawn Betts
  • Jennifer Ferris
  • Jon Fincham
  • Jian Yang

We have been exploring whether multi-voxel pattern analysis (MVPA) of functional magnetic resonance imaging (fMRI) data can be used to infer the mental states of students learning mathematics. This approach has shown considerable success in tracking static mental states, such as whether a person is thinking about a location or an animal. Applying it to our case involves significant challenges not faced in many MVPA applications, because it is necessary to track changing student states over time. The paths of states that students take in solving problems can be quite variable. Nevertheless, we have achieved relatively high accuracy in determining what step a student is on when solving a sequence of problems and whether that step is being performed correctly. Hidden Markov models can then be used to combine behavioral and brain-imaging data from an intelligent tutoring system to track mental states during students' problem-solving episodes.