Arrow Research search

Author name cluster

Rui Ma

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers

12

TCS Journal 2026 Journal Article

Characterization of minimum restricted arc-cuts of unidirectional hypercubes

  • Xiaohui Hua
  • Rui Ma

For a strongly connected digraph D, the restricted arc-connectivity λ′(D) is defined as the minimum cardinality of an arc-cut over all arc-cuts F satisfying that D − F has a non-trivial strong component D′ such that D − V(D′) contains an arc. A restricted arc-cut F of D is called a minimum restricted arc-cut if |F| = λ′(D). A strongly connected digraph D is hyper-λ′ if, for every minimum restricted arc-cut F of D, D − F has exactly one non-trivial strong component D′ and D − V(D′) contains only an arc. In this paper, we prove that the unidirectional hypercube Q_n^→ is hyper-λ′ for n ≥ 3.
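Restated compactly in the abstract's own notation, the parameter under study is:

```latex
\lambda'(D) \;=\; \min\bigl\{\, |F| \;:\; F \text{ is an arc-cut of } D,\;
  D-F \text{ has a non-trivial strong component } D',\;
  D - V(D') \text{ contains an arc} \,\bigr\}
```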

AAAI Conference 2026 Conference Paper

Revisiting the Data Sampling in Multimodal Post-training from a Difficulty-Distinguish View

  • Jianyu Qi
  • Ding Zou
  • Wenrui Yan
  • Rui Ma
  • Jiaxu Li
  • Zhijie Zheng
  • Zhiguo Yang
  • Rongchang Zhao

Recent advances in Multimodal Large Language Models (MLLMs) have spurred significant progress in Chain-of-Thought (CoT) reasoning. Building on the success of DeepSeek-R1, researchers extended multimodal reasoning to post-training paradigms based on reinforcement learning (RL), focusing predominantly on mathematical datasets. However, existing post-training paradigms tend to neglect two critical aspects: (1) the lack of quantifiable difficulty metrics capable of strategically screening samples for post-training optimization, and (2) suboptimal post-training paradigms that fail to jointly optimize perception and reasoning capabilities. To address these gaps, we propose two novel difficulty-aware sampling strategies: Progressive Image Semantic Masking (PISM) quantifies sample hardness through systematic image degradation, while Cross-Modality Attention Balance (CMAB) assesses cross-modal interaction complexity via attention distribution analysis. Leveraging these metrics, we design a hierarchical training framework that incorporates both GRPO-only and SFT+GRPO hybrid training paradigms, and evaluate them across six benchmark datasets. Experiments demonstrate consistent superiority of GRPO applied to difficulty-stratified samples compared to conventional SFT+GRPO pipelines, indicating that strategic data sampling can obviate the need for supervised fine-tuning while improving model accuracy.
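The PISM idea (difficulty measured by how quickly a model's confidence collapses as image content is masked) can be pictured with a toy sketch. The patch-grid representation, masking schedule, threshold, and `toy_model` scorer below are all illustrative assumptions, not the paper's implementation:

```python
import random

def progressive_mask(image, mask_ratio, rng):
    """Return a copy of `image` (a 2D grid of patch values) with a
    fraction `mask_ratio` of its patches zeroed out."""
    h, w = len(image), len(image[0])
    coords = [(r, c) for r in range(h) for c in range(w)]
    n_masked = int(round(mask_ratio * len(coords)))
    masked = set(rng.sample(coords, n_masked))
    return [[0 if (r, c) in masked else image[r][c] for c in range(w)]
            for r in range(h)]

def pism_difficulty(image, model_score, levels=(0.0, 0.25, 0.5, 0.75), seed=0):
    """Toy difficulty metric: the lowest masking level at which the model's
    score on the degraded image drops below half its clean score.
    `model_score` stands in for the MLLM's answer confidence."""
    rng = random.Random(seed)
    clean = model_score(image)
    for ratio in levels:
        degraded = progressive_mask(image, ratio, rng)
        if model_score(degraded) < 0.5 * clean:
            return ratio  # confidence collapsed at this degradation level
    return 1.0

def toy_model(img):
    """Stand-in 'model': confidence = fraction of visible (unmasked) signal."""
    flat = [v for row in img for v in row]
    return sum(flat) / max(1, len(flat))

img = [[1] * 4 for _ in range(4)]
print(pism_difficulty(img, toy_model))  # → 0.75
```

A sample whose score collapses at a low masking ratio is fragile (hard); one that survives heavy masking is easy, which is the ordering the sampling strategy exploits.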

AAAI Conference 2025 Conference Paper

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation

  • Xie Tianyidan
  • Rui Ma
  • Qian Wang
  • Xiaoqian Ye
  • Feixuan Liu
  • Ying Tai
  • Zhenyu Zhang
  • Lanjun Wang

Recent advancements in image-conditioned image generation have demonstrated substantial progress. However, foreground-conditioned image generation remains underexplored, encountering challenges such as compromised object integrity, foreground-background inconsistencies, limited diversity, and reduced control flexibility. These challenges arise from current end-to-end inpainting models, which suffer from inaccurate training masks, limited foreground semantic understanding, data distribution biases, and inherent interference between visual and textual prompts. To overcome these limitations, we present Anywhere, a multi-agent framework that departs from the traditional end-to-end approach. In this framework, each agent is specialized in a distinct aspect, such as foreground understanding, diversity enhancement, object integrity protection, and textual prompt consistency. Our framework is further enhanced with the ability to incorporate optional user textual inputs, perform automated quality assessments, and initiate re-generation as needed. Comprehensive experiments demonstrate that this modular design effectively overcomes the limitations of existing end-to-end models, resulting in higher fidelity, quality, diversity and controllability in foreground-conditioned image generation. Additionally, the Anywhere framework is extensible, allowing it to benefit from future advancements in each individual agent.

ICML Conference 2025 Conference Paper

P(all-atom) Is Unlocking New Path For Protein Design

  • Wei Qu
  • Jiawei Guan
  • Rui Ma
  • Ke Zhai 0008
  • Weikun Wu
  • Haobo Wang

We introduce Pallatom, an innovative protein generation model capable of producing protein structures with all-atom coordinates. Pallatom directly learns and models the joint distribution $P(\textit{structure}, \textit{seq})$ by focusing on $P(\textit{all-atom})$, effectively addressing the interdependence between sequence and structure in protein generation. To achieve this, we propose a novel network architecture specifically designed for all-atom protein generation. Our model employs a dual-track framework that tokenizes proteins into token-level and atomic-level representations, integrating them through a multi-layer decoding process with "traversing" representations and a recycling mechanism. We also introduce the $\texttt{atom14}$ representation method, which unifies the description of unknown side-chain coordinates, ensuring high fidelity between the generated all-atom conformation and its physical structure. Experimental results demonstrate that Pallatom excels in key metrics of protein design, including designability, diversity, and novelty, showing significant improvements across the board. Our model not only enhances the accuracy of protein generation but also exhibits excellent sampling efficiency, paving the way for future applications in larger and more complex systems.
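The atom14 idea as described (a unified fixed-width per-residue atom layout that accommodates unknown side-chain coordinates) can be pictured with a minimal packing sketch; the sentinel value and slot ordering here are assumptions, not the paper's specification:

```python
PAD = (0.0, 0.0, 0.0)  # assumed sentinel for absent side-chain atoms

def to_atom14(atom_coords):
    """Pack one residue's atom coordinates into a fixed 14-slot table,
    padding unused side-chain slots so every residue has the same shape."""
    slots = list(atom_coords)[:14]
    return slots + [PAD] * (14 - len(slots))

# Glycine has only backbone atoms (N, CA, C, O); its remaining slots pad out.
gly = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.1, 1.3, 0.0), (3.3, 1.4, 0.5)]
packed = to_atom14(gly)
print(len(packed), packed[4] == PAD)  # every residue → exactly 14 slots
```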

NeurIPS Conference 2025 Conference Paper

PointMAC: Meta-Learned Adaptation for Robust Test-Time Point Cloud Completion

  • Linlian Jiang
  • Rui Ma
  • Li Gu
  • Ziqiang Wang
  • Xinxin Zuo
  • Yang Wang

Point cloud completion is essential for robust 3D perception in safety-critical applications such as robotics and augmented reality. However, existing models perform static inference and rely heavily on inductive biases learned during training, limiting their ability to adapt to novel structural patterns and sensor-induced distortions at test time. To address this limitation, we propose PointMAC, a meta-learned framework for robust test-time adaptation in point cloud completion. It enables sample-specific refinement without requiring additional supervision. Our method optimizes the completion model under two self-supervised auxiliary objectives that simulate structural and sensor-level incompleteness. A meta-auxiliary learning strategy based on Model-Agnostic Meta-Learning (MAML) ensures that adaptation driven by auxiliary objectives is consistently aligned with the primary completion task. During inference, we adapt the shared encoder on-the-fly by optimizing auxiliary losses, with the decoder kept fixed. To further stabilize adaptation, we introduce Adaptive $\lambda$-Calibration, a meta-learned mechanism for balancing gradients between primary and auxiliary objectives. Extensive experiments on synthetic, simulated, and real-world datasets demonstrate that PointMAC achieves state-of-the-art results by refining each sample individually to produce high-quality completions. To the best of our knowledge, this is the first work to apply meta-auxiliary test-time adaptation to point cloud completion.

NeurIPS Conference 2025 Conference Paper

Results of the Big ANN: NeurIPS’23 competition

  • Harsha Vardhan Simhadri
  • Martin Aumüller
  • Matthijs Douze
  • Dmitry Baranchuk
  • Amir Ingber
  • Edo Liberty
  • George Williams
  • Ben Landrum

The 2023 Big ANN Challenge, held at NeurIPS 2023, focused on advancing the state of the art in indexing data structures and search algorithms for practical variants of Approximate Nearest Neighbor (ANN) search that reflect the growing complexity and diversity of workloads. Unlike prior challenges that emphasized scaling up classical ANN search (Simhadri et al., NeurIPS 2021), this competition addressed sparse, filtered, out-of-distribution, and streaming variants of ANN search. Participants developed and submitted innovative solutions that were evaluated on new standard datasets with constrained computational resources. The results showcased significant improvements in search accuracy and efficiency, with notable contributions from both academic and industrial teams. This paper summarizes the competition tracks, datasets, evaluation metrics, and the innovative approaches of the top-performing submissions, providing insights into current advancements and future directions in the field of approximate nearest neighbor search.
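For the filtered track mentioned above (queries constrained by metadata), a brute-force exact baseline makes the problem statement concrete; the tag scheme and data are invented, and real submissions replace this linear scan with an index:

```python
from math import dist  # Euclidean distance, Python 3.8+

def filtered_nearest(query, points, tags, allowed):
    """Exact baseline for filtered nearest-neighbor search: return the
    index of the closest point whose tag is in `allowed` (None if no
    point qualifies). O(n) per query, the cost an ANN index amortizes."""
    best, best_d = None, float("inf")
    for i, (p, tag) in enumerate(zip(points, tags)):
        if tag not in allowed:
            continue
        d = dist(query, p)
        if d < best_d:
            best, best_d = i, d
    return best

points = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
tags   = ["red", "blue", "red"]
print(filtered_nearest((0.9, 0.9), points, tags, {"red"}))   # → 0
print(filtered_nearest((0.9, 0.9), points, tags, {"blue"}))  # → 1
```

Note that the unfiltered nearest neighbor of (0.9, 0.9) is index 1; the filter changes the answer, which is why filtered search needs its own index designs.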

AAAI Conference 2025 Conference Paper

SigStyle: Signature Style Transfer via Personalized Text-to-Image Models

  • Ye Wang
  • Tongyuan Bai
  • Xuping Xie
  • Zili Yi
  • Yilin Wang
  • Rui Ma

Style transfer enables the seamless integration of artistic styles from a style image into a content image, resulting in visually striking and aesthetically enriched outputs. Despite numerous advances in this field, existing methods have not explicitly focused on the signature style, which represents the distinct and recognizable visual traits of an image, such as geometric and structural patterns, color palettes, and brush strokes. In this paper, we introduce SigStyle, a framework that leverages the semantic priors embedded in a personalized text-to-image diffusion model to capture the signature style representation. This style capture process is powered by a hypernetwork that efficiently fine-tunes the diffusion model for any given single style image. Style transfer is then conceptualized as reconstructing the content image through the learned style tokens of the personalized diffusion model. Additionally, to ensure content consistency throughout the style transfer process, we introduce a time-aware attention swapping technique that incorporates content information from the original image into the early denoising steps of target image generation. Beyond enabling high-quality signature style transfer across a wide range of styles, SigStyle supports multiple interesting applications, such as local style transfer, texture transfer, style fusion, and style-guided text-to-image generation. Quantitative and qualitative evaluations demonstrate that our approach outperforms existing style transfer methods in recognizing and transferring signature styles.

AAAI Conference 2024 Conference Paper

B-spine: Learning B-spline Curve Representation for Robust and Interpretable Spinal Curvature Estimation

  • Hao Wang
  • Qiang Song
  • Ruofeng Yin
  • Rui Ma

Spinal curvature estimation is important to the diagnosis and treatment of scoliosis. Existing methods face several issues, such as the need for expensive annotations of the vertebral landmarks and sensitivity to image quality. It is challenging to achieve robust estimation and obtain interpretable results, especially for low-quality images that are blurry and hazy. In this paper, we propose B-Spine, a novel deep learning pipeline that learns a B-spline curve representation of the spine and estimates the Cobb angles for spinal curvature estimation from low-quality X-ray images. Given a low-quality input, a novel SegRefine network that employs unpaired image-to-image translation is proposed to generate a high-quality spine mask from the initial segmentation result. Next, a novel mask-based B-spline prediction model is proposed to predict the B-spline curve of the spine centerline. Finally, the Cobb angles are estimated by a hybrid approach that combines curve slope analysis and a curve-based regression model. We conduct quantitative and qualitative comparisons with representative and SOTA learning-based methods on the public AASCE2019 dataset and our newly proposed JLU-CJUH dataset, which contains more challenging low-quality images. The superior performance on both datasets shows that our method achieves both robustness and interpretability for spinal curvature estimation.
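Since the pipeline predicts a B-spline representation of the centerline, a minimal Cox-de Boor evaluation sketch shows what such a curve representation computes; this is textbook B-spline math, not the paper's prediction model, and the control points and knot vector are illustrative:

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: value of the i-th B-spline basis function
    of degree k at parameter t, over the given knot vector."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k] != knots[i]:
        left = ((t - knots[i]) / (knots[i + k] - knots[i])
                * bspline_basis(i, k - 1, t, knots))
    if knots[i + k + 1] != knots[i + 1]:
        right = ((knots[i + k + 1] - t) / (knots[i + k + 1] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, t, knots))
    return left + right

def bspline_point(t, ctrl, degree, knots):
    """Point on the curve: basis-weighted sum of the 2D control points."""
    x = sum(bspline_basis(i, degree, t, knots) * px for i, (px, _) in enumerate(ctrl))
    y = sum(bspline_basis(i, degree, t, knots) * py for i, (_, py) in enumerate(ctrl))
    return (x, y)

# Quadratic curve over 3 control points on a clamped knot vector.
ctrl = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
knots = [0, 0, 0, 1, 1, 1]
print(bspline_point(0.5, ctrl, 2, knots))  # → (1.0, 1.0)
```

A few control points plus a knot vector give a smooth, compact centerline parameterization, which is what makes the downstream slope analysis for Cobb angles tractable.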

EAAI Journal 2024 Journal Article

Differentiable sampling based efficient architecture search for automatic fault diagnosis

  • Xingwu Zhang
  • Rui Ma
  • Yu Zhao
  • Chenxi Wang
  • Zhibin Zhao
  • Xuefeng Chen

Intelligent diagnosis of rotating machinery has developed rapidly, but different methods exhibit fluctuating performance and require intricate manual design, limiting their effectiveness in practical applications. It would therefore be desirable to automatically generate the optimal method for a given diagnosis task, as differentiable neural architecture search (DNAS) does. However, three challenges severely restrict DNAS methods in industrial scenarios: 1) vibration signals are multi-scale and non-stationary; 2) the huge memory cost of supernet-based search is unsuitable for practical diagnosis; 3) manual architecture derivation causes a performance collapse between architecture search and practical diagnosis. Thus, we propose Differentiable Sampling based Efficient Architecture Search (DS-EAS), which generates architectures by differentiable sampling. First, the involution operator is introduced to adaptively extract critical features from noisy signals. Second, Gumbel Max-Softmax is adopted to forward-sample and backward-propagate the gradient on a single sub-architecture at each iteration, alleviating the huge memory cost. Third, progressive pruning is proposed to eliminate manual discretization error, leading to the final architecture with zero operators. Based on the searched architecture, a deeper one is built to test its real performance. A traction motor experiment is performed to evaluate DS-EAS on three different sample cases. Compared with other state-of-the-art methods, the superior performance of DS-EAS is verified.
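The sampling step described above (drawing a single sub-architecture per iteration instead of evaluating the whole supernet) rests on the Gumbel-max trick, sketched here in a toy form; the logits and candidate-operator count are invented, and the straight-through softmax relaxation used for backpropagation is omitted:

```python
import math, random

def gumbel_max_sample(logits, rng):
    """Gumbel-max trick: argmax_i (logit_i + G_i) with G_i ~ Gumbel(0, 1)
    is distributed as softmax(logits). Sampling one candidate operator
    this way means only that sub-architecture is run per iteration,
    which is the memory-saving idea the abstract describes."""
    gumbels = [-math.log(-math.log(rng.random())) for _ in logits]
    perturbed = [l + g for l, g in zip(logits, gumbels)]
    return max(range(len(logits)), key=perturbed.__getitem__)

rng = random.Random(0)
logits = [2.0, 0.5, 0.0]  # architecture parameters for 3 candidate operators
counts = [0, 0, 0]
for _ in range(10_000):
    counts[gumbel_max_sample(logits, rng)] += 1
print(counts)  # the largest-logit operator dominates, matching softmax(logits)
```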

AAAI Conference 2024 Conference Paper

EPSD: Early Pruning with Self-Distillation for Efficient Model Compression

  • Dong Chen
  • Ning Liu
  • Yichen Zhu
  • Zhengping Che
  • Rui Ma
  • Fachao Zhang
  • Xiaofeng Mou
  • Yi Chang

Neural network compression techniques, such as knowledge distillation (KD) and network pruning, have received increasing attention. Recent work `Prune, then Distill' reveals that a pruned student-friendly teacher network can benefit the performance of KD. However, the conventional teacher-student pipeline, which entails cumbersome pre-training of the teacher and complicated compression steps, makes pruning with KD less efficient. In addition to compressing models, recent compression techniques also emphasize efficiency. Early pruning demands significantly less computational cost than conventional pruning methods, as it does not require a large pre-trained model. Likewise, a special case of KD, known as self-distillation (SD), is more efficient since it requires no pre-training or student-teacher pair selection. This inspires us to combine early pruning with SD for efficient model compression. In this work, we propose Early Pruning with Self-Distillation (EPSD), a framework that identifies and preserves distillable weights during early pruning for a given SD task. EPSD efficiently combines early pruning and self-distillation in a two-step process, maintaining the pruned network's trainability for compression. Instead of simply combining pruning and SD, EPSD enables the pruned network to favor SD by keeping more distillable weights before training, ensuring better distillation of the pruned network. We demonstrate that EPSD improves the training of pruned networks, supported by visual and quantitative analyses. Our evaluation covers diverse benchmarks (CIFAR-10/100, Tiny-ImageNet, full ImageNet, CUB-200-2011, and Pascal VOC), with EPSD outperforming advanced pruning and SD techniques.
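Early pruning selects weights before any training. The sketch below uses plain weight magnitude as a stand-in criterion; EPSD's actual criterion scores "distillable" weights with respect to the SD objective, which is not reproduced here:

```python
def early_prune_mask(weights, sparsity):
    """Magnitude-based early pruning: zero out the `sparsity` fraction of
    weights with the smallest absolute value, before any training step.
    Returns a 0/1 mask applied multiplicatively to the weights."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    keep = set(order[n_prune:])
    return [1 if i in keep else 0 for i in range(len(weights))]

w = [0.9, -0.05, 0.4, 0.01, -0.7]
mask = early_prune_mask(w, 0.4)          # prune the 2 smallest-magnitude weights
print(mask)                              # → [1, 0, 1, 0, 1]
print([wi * m for wi, m in zip(w, mask)])
```

Because the mask is fixed before training, the remaining question, and EPSD's focus, is whether the surviving weights keep the network trainable under the self-distillation loss.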

JBHI Journal 2022 Journal Article

Continuous Bimanual Trajectory Decoding of Coordinated Movement From EEG Signals

  • Yi-Feng Chen
  • Ruiqi Fu
  • Junde Wu
  • Jongbin Song
  • Rui Ma
  • Yi-Chuan Jiang
  • Mingming Zhang

While many voluntary movements involve bimanual coordination, few attempts have been made to simultaneously decode the trajectory of bimanual movements from electroencephalogram (EEG) signals. In this study, we proposed a novel bimanual brain-computer interface (BCI) paradigm to reconstruct the continuous trajectory of both hands during coordinated movements from EEG. The protocol required human subjects to complete a bimanual reaching task to the left, middle, or right target while EEG data were collected. A multi-task deep learning model combining the EEGNet and long short-term memory network (LSTM) was proposed to decode bimanual trajectories, including position and velocity. Decoding performance was evaluated in terms of the correlation coefficient (CC) and normalized root mean square error (NRMSE) between decoded and real trajectories. Experimental results from 13 human subjects showed that the grand-averaged combined CC values achieved 0.54 and 0.42 for position and velocity decoding, respectively. The corresponding combined NRMSE values were 0.22 and 0.23. Both CC and NRMSE were significantly superior to the chance level (p < 0.05). Comparative experiments also indicated that the proposed model significantly outperformed some other commonly-used methods in terms of CC and NRMSE for continuous trajectory decoding. These findings demonstrated the feasibility of simultaneously decoding bimanual trajectory from EEG, indicating the potential of bimanual control for coordinated tasks.
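The two reported metrics are standard and easy to state in code. The toy trajectories below are invented, and NRMSE is normalized here by the range of the real signal, which is one common convention (the paper may use another):

```python
from math import sqrt

def corr_coeff(a, b):
    """Pearson correlation coefficient between decoded and real trajectories."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def nrmse(pred, true):
    """Root mean square error normalized by the range of the real signal."""
    n = len(true)
    rmse = sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / n)
    return rmse / (max(true) - min(true))

true = [0.0, 1.0, 2.0, 3.0, 4.0]   # invented 1D hand position over time
pred = [0.1, 0.9, 2.2, 2.8, 4.1]   # invented decoded trajectory
print(round(corr_coeff(pred, true), 3))
print(round(nrmse(pred, true), 3))
```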

ICRA Conference 2021 Conference Paper

Design and Experimental Validation of a Robotic System for Reactor Core Detector Removal

  • Zhe Han
  • Huanyu Tian
  • Fansheng Meng
  • Hao Wen 0003
  • Rui Ma
  • Xingguang Duan
  • Yilin Zhang
  • Chenghua Liu

The reactor power and the coolant level in a nuclear plant are monitored via the reactor core detectors. Every 4 to 5 years, the detectors, which carry high-level radiation, need to be removed, a task that is time-consuming and hazardous for workers. To address this issue, this paper introduces a novel robotic system and its strategy for the removal of the detectors. Modular mechanisms are designed to achieve diverse actions such as positioning, extracting, transporting, cutting, and coiling. Detectors with different radiation doses are physically classified and minimized in volume. Experiments simulating the removal process are conducted. The results demonstrate that the time for the robotic removal of one detector is reduced from more than 1 hour to 31.2 ± 5.3 min compared with the manual mode. The radiation exposure time for workers is reduced to zero under normal working conditions, which significantly reduces the radiation dose compared with traditional methods.