Arrow Research search

Author name cluster

Dong Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

13

AAAI Conference 2026 Conference Paper

DA-DFGAS: Differentiable Federated Graph Neural Architecture Search with Distribution-Aware Attentive Aggregation

  • Zhaowei Liu
  • Yihao Jiang
  • Rufei Gao
  • Jinglei Liu
  • Dong Yang

Graph Neural Networks (GNNs) have demonstrated superior performance in processing centralized graph-structured data. However, real-world privacy and security concerns hinder data centralization and sharing, leading to severe data isolation (data silos). While Federated Learning (FL) offers a distributed solution to mitigate these obstacles, existing Federated Graph Neural Network (FedGNN) frameworks struggle to effectively address data heterogeneity. This paper therefore proposes DA-DFGAS, a federated graph neural architecture search algorithm. Specifically, DA-DFGAS facilitates model personalization via a directed tree topology and path constraint mechanisms, while simultaneously employing a joint self-attention mechanism based on predicted probability distributions to capture distributional variations across multiple clients. Furthermore, it integrates a bi-level global-local objective optimization strategy to ensure global model consistency while preserving local adaptability. Experimental results on multiple datasets demonstrate that DA-DFGAS outperforms state-of-the-art methods, achieving 0.5–3.0% accuracy improvements over centralized baselines and 0.5–5.0% over federated counterparts.
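The distribution-aware aggregation the abstract describes can be sketched in a minimal, hypothetical form (not the DA-DFGAS implementation): each client reports parameters plus its predicted class distribution, and clients whose distributions sit closer to the population mean receive larger aggregation weights via a softmax over negative L1 distances.

```python
import math

def aggregate(client_weights, client_dists, temperature=1.0):
    """client_weights: list of dicts {param_name: value};
    client_dists: list of predicted class-probability vectors (one per client)."""
    k = len(client_dists[0])
    n = len(client_dists)
    mean_dist = [sum(d[i] for d in client_dists) / n for i in range(k)]
    # Score each client by negative L1 distance to the mean distribution.
    scores = [-sum(abs(d[i] - mean_dist[i]) for i in range(k)) / temperature
              for d in client_dists]
    # Numerically stable softmax over client scores.
    m = max(scores)
    exp = [math.exp(s - m) for s in scores]
    z = sum(exp)
    alphas = [e / z for e in exp]
    # Attention-weighted average of each parameter across clients.
    agg = {name: sum(a * w[name] for a, w in zip(alphas, client_weights))
           for name in client_weights[0]}
    return agg, alphas
```

The attention weights reduce the influence of clients whose label distributions diverge sharply from the rest, one simple way to soften data heterogeneity in federated averaging.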

AAAI Conference 2026 Conference Paper

MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-specific Contrastive Loss

  • Can Zhao
  • Pengfei Guo
  • Dong Yang
  • Yufan He
  • Yucheng Tang
  • Benjamin Simon
  • Mason Belue
  • Stephanie Harmon

Medical image synthesis is an important topic for both clinical and research applications. Recently, diffusion models have become a leading approach in this area. Despite their strengths, many existing methods struggle with (1) limited generalizability, only working for specific body regions or voxel spacings, (2) slow inference, which is a common issue for diffusion models, and (3) weak alignment with input conditions, which is a critical issue for medical imaging. MAISI, a previously proposed framework, addresses generalizability issues but still suffers from slow inference and limited condition consistency. In this work, we present MAISI-v2, the first accelerated 3D medical image synthesis framework that integrates rectified flow to enable fast and high-quality generation. To further enhance condition fidelity, we introduce a novel region-specific contrastive loss to improve sensitivity to the region of interest. Our experiments show that MAISI-v2 can achieve state-of-the-art image quality with 33× acceleration for latent diffusion models. We also conducted a downstream segmentation experiment to show that the synthetic images can be used for data augmentation. We release our code, training details, model weights, and a GUI demo to facilitate reproducibility and promote further development within the community.
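The rectified-flow mechanism the abstract leans on can be illustrated with a toy 1-D sketch (illustrative only, not MAISI-v2): the model is trained on straight-line interpolants between noise and data with a constant target velocity, and sampling integrates that velocity field with only a few Euler steps.

```python
def rf_training_pair(x0, x1, t):
    """One rectified-flow training example: interpolant x_t = (1-t)*x0 + t*x1
    and its target velocity x1 - x0 (constant along the straight path)."""
    xt = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return xt, v_target

def euler_sample(velocity_fn, x0, n_steps=4):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with n_steps Euler steps.
    Straight paths are why very few steps suffice, which is the source of
    the inference speed-up over standard diffusion sampling."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x
```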

NeurIPS Conference 2025 Conference Paper

Better Tokens for Better 3D: Advancing Vision-Language Modeling in 3D Medical Imaging

  • Ibrahim Ethem Hamamci
  • Sezgin Er
  • Suprosanna Shit
  • Hadrien Reynaud
  • Dong Yang
  • Pengfei Guo
  • Marc Edgar
  • Daguang Xu

Recent progress in vision-language modeling for 3D medical imaging has been fueled by large-scale computed tomography (CT) corpora with paired free-text reports, stronger architectures, and powerful pretrained models. This has enabled applications such as automated report generation and text-conditioned 3D image synthesis. Yet, current approaches struggle with high-resolution, long-sequence volumes: contrastive pretraining often yields vision encoders that are misaligned with clinical language, and slice-wise tokenization blurs fine anatomy, reducing diagnostic performance on downstream tasks. We introduce BTB3D (Better Tokens for Better 3D), a causal convolutional encoder-decoder that unifies 2D and 3D training and inference while producing compact, frequency-aware volumetric tokens. A three-stage training curriculum enables (i) local reconstruction, (ii) overlapping-window tiling, and (iii) long-context decoder refinement, during which the model learns from short slice excerpts yet generalizes to scans exceeding 300 slices without additional memory overhead. BTB3D sets a new state-of-the-art on two key tasks: it improves BLEU scores and increases clinical F1 by 40% over CT2Rep, CT-CHAT, and Merlin for report generation; and it reduces FID by 75% and halves FVD compared to GenerateCT and MedSyn for text-to-CT synthesis, producing anatomically consistent 512×512×241 volumes. These results confirm that precise three-dimensional tokenization, rather than larger language backbones alone, is essential for scalable vision-language modeling in 3D medical imaging. The codebase is available at: https://github.com/ibrahimethemhamamci/BTB3D

NeurIPS Conference 2025 Conference Paper

Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis

  • Dong Yang
  • Yiyi Cai
  • Yuki Saito
  • Lixu Wang
  • Hiroshi Saruwatari

We propose Shallow Flow Matching (SFM), a novel mechanism that enhances flow matching (FM)-based text-to-speech (TTS) models within a coarse-to-fine generation paradigm. Unlike conventional FM modules, which use the coarse representations from the weak generator as conditions, SFM constructs intermediate states along the FM paths from these representations. During training, we introduce an orthogonal projection method to adaptively determine the temporal position of these states, and apply a principled construction strategy based on a single-segment piecewise flow. The SFM inference starts from the intermediate state rather than pure noise, thereby focusing computation on the latter stages of the FM paths. We integrate SFM into multiple TTS models with a lightweight SFM head. Experiments demonstrate that SFM yields consistent gains in speech naturalness across both objective and subjective evaluations, and significantly accelerates inference when using adaptive-step ODE solvers. Demo and code are available at https://ydqmkkx.github.io/SFMDemo/.
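The intermediate-state construction the abstract describes can be sketched in a hedged 1-D/vector form (not the authors' code): a coarse estimate of the target is orthogonally projected onto the straight FM path between noise and data to pick a path time t*, and ODE integration then starts from that intermediate state instead of pure noise.

```python
def project_onto_path(x0, x1, c):
    """Orthogonal projection of coarse estimate c onto the segment x0 -> x1
    (all given as lists of floats). Returns the path time t* clamped to [0, 1]."""
    d = [b - a for a, b in zip(x0, x1)]
    num = sum((ci - a) * di for ci, a, di in zip(c, x0, d))
    den = sum(di * di for di in d)
    t = num / den if den > 0 else 0.0
    return min(1.0, max(0.0, t))

def shallow_start(x0, x1, c):
    """Intermediate state on the FM path at the projected time t*.
    Inference would integrate the flow from (xt, t*) to t=1, skipping
    the early portion of the path."""
    t = project_onto_path(x0, x1, c)
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    return xt, t
```

At inference time x1 is of course unknown; the sketch only illustrates the training-time geometry, where the projection decides how "deep" along the path the coarse representation already is.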

ICRA Conference 2024 Conference Paper

HPF-SLAM: An Efficient Visual SLAM System Leveraging Hybrid Point Features

  • Xin Su
  • Sebastian Eger
  • Adam Misik
  • Dong Yang
  • Rastin Pries
  • Eckehard G. Steinbach

Visual SLAM is an essential tool in diverse applications such as robot perception and extended reality, where feature-based methods are prevalent due to their accuracy and robustness. However, existing methods employ either hand-crafted or solely learnable point features and are thus limited by the feature attributes. In this paper, we propose incorporating hybrid point features efficiently into a single system. By integrating hand-crafted and learnable features, we seek to capitalize on their complementary attributes in both key-point identification and descriptor expressiveness. To this end, we design a pre-processing module, which includes extraction, inter-class processing, and post-processing of hybrid point features. We present an efficient matching approach that performs data association exclusively within the same class of features. Moreover, we design a Hybrid Bag-of-Words (H-BoW) model to handle hybrid point features in matching and loop closure detection. By integrating the proposed framework into a modern feature-based system, we introduce HPF-SLAM. We evaluate the system on EuRoC-MAV and TUM-RGBD benchmarks. The experimental results show that our method consistently surpasses the baseline at comparable speed.

IJCAI Conference 2023 Conference Paper

A Canonicalization-Enhanced Known Fact-Aware Framework For Open Knowledge Graph Link Prediction

  • Yilin Wang
  • Minghao Hu
  • Zhen Huang
  • Dongsheng Li
  • Wei Luo
  • Dong Yang
  • Xicheng Lu

Open knowledge graph (OpenKG) link prediction aims to predict missing factual triples in the form of (head noun phrase, relation phrase, tail noun phrase). Since triples are not canonicalized, previous methods either focus on canonicalizing noun phrases (NPs) to reduce graph sparsity, or utilize textual forms to improve type compatibility. However, they neglect to canonicalize relation phrases (RPs) and triples, leaving the OpenKG highly sparse and impeding performance. To address these issues, we propose a Canonicalization-Enhanced Known Fact-Aware (CEKFA) framework that boosts link prediction performance through sparsity reduction of RPs and triples. First, we propose a similarity-driven RP canonicalization method to reduce RPs' sparsity by sharing knowledge among semantically similar ones. Second, to reduce the sparsity of triples, a known fact-aware triple canonicalization method is designed to retrieve relevant known facts from training data. Finally, these two types of canonical information are integrated into a general two-stage re-ranking framework that can be applied to most existing knowledge graph embedding methods. Experiment results on two OpenKG datasets, ReVerb20K and ReVerb45K, show that our approach achieves state-of-the-art results. Extensive experimental analyses illustrate the effectiveness and generalization ability of the proposed framework.
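The two-stage re-ranking the abstract mentions can be illustrated with a toy sketch (not CEKFA itself; the blending rule and `alpha` are assumptions for illustration): a base KG-embedding model scores candidate entities, then the top-k candidates are re-scored by blending in an auxiliary score derived from canonicalized known facts.

```python
def rerank(base_scores, aux_scores, k, alpha=0.5):
    """base_scores / aux_scores: dicts entity -> score.
    Stage 1: keep the top-k candidates under the base KGE score.
    Stage 2: re-rank them by alpha*base + (1-alpha)*aux, where aux comes
    from auxiliary (e.g. canonicalization-derived) evidence."""
    top = sorted(base_scores, key=lambda e: -base_scores[e])[:k]
    blended = {e: alpha * base_scores[e] + (1 - alpha) * aux_scores.get(e, 0.0)
               for e in top}
    return sorted(top, key=lambda e: -blended[e])
```

Because only the top-k list is re-scored, the second stage can use richer (slower) evidence without touching the full candidate set.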

IROS Conference 2023 Conference Paper

Haptic Dataset Augmentation with Subjective QoE Labels using Conditional Generative Adversarial Network

  • Zican Wang
  • Xiao Xu 0001
  • Dong Yang
  • Zhenyu Wang 0010
  • Sarah Shtaierman
  • Eckehard G. Steinbach

This paper proposes a novel Generative Adversarial Network (GAN)-based strategy to augment subjective haptic Quality of Experience (QoE) datasets for bilateral teleoperation with haptic feedback without conducting time-consuming subjective experiments. In our previous work, we proposed a multi-assessment fusion approach to predict subjective haptic quality using a collection of objective metrics. This method requires a sufficiently large haptic dataset with QoE labels. The proposed generative approach automatically expands the existing haptic quality dataset by combining a modified conditional GAN (CGAN) and Style GAN (StyleGAN) architecture. The most important feature of our method is that it learns from the labeled training data and focuses on synthesizing signals with artifacts according to new input labels containing the QoE score, time delay, control method, and data reduction information. Extensive experiments are conducted to validate the suitability of the expanded dataset. The results show that our approach is able to generate new data, which match the label and signal distribution of the original data with categorical rank and linear correlation of over 0.85.

ICRA Conference 2023 Conference Paper

SRI-Graph: A Novel Scene-Robot Interaction Graph for Robust Scene Understanding

  • Dong Yang
  • Xiao Xu 0001
  • Mengchen Xiong
  • Edwin Babaians
  • Eckehard G. Steinbach

We propose a novel scene-robot interaction graph (SRI-Graph) that exploits the known position of a mobile manipulator for robust and accurate scene understanding. Compared to the state-of-the-art scene graph approaches, the proposed SRI-Graph captures not only the relationships between the objects, but also the relationships between the robot manipulator and objects with which it interacts. To improve the detection accuracy of spatial relationships, we leverage the 3D position of the mobile manipulator in addition to RGB images. The manipulator's ego information is crucial for a successful scene understanding when the relationships are visually uncertain. The proposed model is validated for a real-world 3D robot-assisted feeding task. We release a new dataset named 3DRF-Pos for training and validation. We also develop a tool, named LabelImg-Rel, as an extension of the open-sourced image annotation tool LabelImg for convenient annotation in robot-environment interaction scenarios. Our experimental results using the Movo platform show that SRI-Graph outperforms the state-of-the-art approach and improves detection accuracy by up to 9.83%.

IROS Conference 2022 Conference Paper

Skill-CPD: Real-time Skill Refinement for Shared Autonomy in Manipulator Teleoperation

  • Edwin Babaians
  • Dong Yang
  • Mojtaba Karimi
  • Xiao Xu 0001
  • Serkut Ayvasik
  • Eckehard G. Steinbach

Advanced wireless communication networks provide lower latency and a higher transmission rate. Although this is an enabler for many new teleoperation applications, the risk of network instability or packet drop is still unavoidable. Real-time manipulator teleoperation requires data transmission with no discontinuity. Shared autonomy (SA) is a standard method to mitigate this issue: if the data from the remote side is unavailable, the controller can continue based on the previously observed models. However, due to the spatial gap between human and robot trajectories, unwanted fluctuations occur, which cause issues in teleoperation applications. This motivates us to propose a new skill refinement strategy to modify the previously trained skill and mitigate the sudden unwanted motions within the control takeover phase. To this end, our approach applies the Hidden Semi-Markov Model (HSMM) and Linear Quadratic Tracker (LQT) in combination to learn and predict the user's intentions, and then exploits Coherent Point Drift (CPD) to refine the executable trajectory. We test our method both in simulation and in the real world for 2D English letter drawing and 3D robot-assisted feeding scenarios. Our experimental results using the Kinova® Movo platform show that the proposed refinement approach generates a stable trajectory and mitigates the control switching inconsistency. All experiments and source code are available at: http://cxdcxd.github.io/SkillCPD.

IJCAI Conference 2019 Conference Paper

Ensemble-based Ultrahigh-dimensional Variable Screening

  • Wei Tu
  • Dong Yang
  • Linglong Kong
  • Menglu Che
  • Qian Shi
  • Guodong Li
  • Guangjian Tian

Since the sure independence screening (SIS) method by Fan and Lv, many different variable screening methods have been proposed based on different measures under different models. However, most of these methods are designed for specific models. In practice, we often have very little information about the data generating process, and different methods can result in very different sets of features. This heterogeneity motivates us to combine various screening methods simultaneously. In this paper, we introduce a general ensemble-based framework to efficiently combine results from multiple variable screening methods. The consistency and sure screening properties of the proposed framework have been established. Extensive simulation studies confirm our intuition that the proposed ensemble-based method is more robust against model misspecification than any single variable screening method. The proposed ensemble-based method is used to predict attention deficit hyperactivity disorder (ADHD) status using brain functional connectivity (FC).
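One simple instance of ensemble screening (illustrative, not the paper's exact estimator) is rank aggregation: each screening method returns a utility score per feature, scores are converted to ranks, ranks are averaged across methods, and the d features with the best mean rank are kept.

```python
def to_ranks(scores):
    """Rank features by descending score (rank 1 = most important)."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for r, j in enumerate(order, start=1):
        ranks[j] = r
    return ranks

def ensemble_screen(score_lists, d):
    """score_lists: one score vector per screening method (e.g. SIS marginal
    correlations, distance correlations, ...). Returns the indices of the d
    features with the smallest average rank across methods."""
    rank_lists = [to_ranks(s) for s in score_lists]
    p = len(score_lists[0])
    n = len(rank_lists)
    mean_rank = [sum(r[j] for r in rank_lists) / n for j in range(p)]
    return sorted(range(p), key=lambda j: mean_rank[j])[:d]
```

Averaging ranks rather than raw scores keeps methods with incomparable score scales from dominating the ensemble, which is one reason rank-based aggregation is robust to model misspecification.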