Arrow Research search

Author name cluster

Dong Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

13

AAAI Conference 2026 Conference Paper

DA-DFGAS: Differentiable Federated Graph Neural Architecture Search with Distribution-Aware Attentive Aggregation

  • Zhaowei Liu
  • Yihao Jiang
  • Rufei Gao
  • Jinglei Liu
  • Dong Yang

Graph Neural Networks (GNNs) have demonstrated superior performance in processing centralized graph-structured data. However, real-world privacy and security concerns hinder data centralization and sharing, leading to severe data isolation (data silos). While Federated Learning (FL) offers a distributed solution to mitigate these obstacles, existing Federated Graph Neural Network (FedGNN) frameworks struggle to effectively address data heterogeneity. This paper therefore proposes DA-DFGAS, a federated graph neural architecture search algorithm. Specifically, DA-DFGAS facilitates model personalization via a directed tree topology and path constraint mechanisms, while simultaneously employing a joint self-attention mechanism based on predicted probability distributions to capture distributional variations across multiple clients. Furthermore, it integrates a bi-level global-local objective optimization strategy to ensure global model consistency while preserving local adaptability. Experimental results on multiple datasets demonstrate that DA-DFGAS outperforms state-of-the-art methods, achieving 0.5–3.0% accuracy improvements over centralized baselines and 0.5–5.0% over federated counterparts.
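The distribution-aware aggregation the abstract describes can be sketched in a minimal, hypothetical form (not the DA-DFGAS implementation): each client reports parameters plus its predicted class distribution, and clients whose distributions sit closer to the population mean receive larger aggregation weights via a softmax over negative L1 distances.

```python
import math

def aggregate(client_weights, client_dists, temperature=1.0):
    """client_weights: list of dicts {param_name: value};
    client_dists: list of predicted class-probability vectors (one per client)."""
    k = len(client_dists[0])
    n = len(client_dists)
    mean_dist = [sum(d[i] for d in client_dists) / n for i in range(k)]
    # Score each client by negative L1 distance to the mean distribution.
    scores = [-sum(abs(d[i] - mean_dist[i]) for i in range(k)) / temperature
              for d in client_dists]
    # Numerically stable softmax over client scores.
    m = max(scores)
    exp = [math.exp(s - m) for s in scores]
    z = sum(exp)
    alphas = [e / z for e in exp]
    # Attention-weighted average of each parameter across clients.
    agg = {name: sum(a * w[name] for a, w in zip(alphas, client_weights))
           for name in client_weights[0]}
    return agg, alphas
```

The attention weights reduce the influence of clients whose label distributions diverge sharply from the rest, one simple way to soften data heterogeneity in federated averaging.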

AAAI Conference 2026 Conference Paper

MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-specific Contrastive Loss

  • Can Zhao
  • Pengfei Guo
  • Dong Yang
  • Yufan He
  • Yucheng Tang
  • Benjamin Simon
  • Mason Belue
  • Stephanie Harmon

Medical image synthesis is an important topic for both clinical and research applications. Recently, diffusion models have become a leading approach in this area. Despite their strengths, many existing methods struggle with (1) limited generalizability, only working for specific body regions or voxel spacings, (2) slow inference, which is a common issue for diffusion models, and (3) weak alignment with input conditions, which is a critical issue for medical imaging. MAISI, a previously proposed framework, addresses generalizability issues but still suffers from slow inference and limited condition consistency. In this work, we present MAISI-v2, the first accelerated 3D medical image synthesis framework that integrates rectified flow to enable fast and high-quality generation. To further enhance condition fidelity, we introduce a novel region-specific contrastive loss to improve sensitivity to the region of interest. Our experiments show that MAISI-v2 can achieve state-of-the-art image quality with 33× acceleration for latent diffusion models. We also conducted a downstream segmentation experiment to show that the synthetic images can be used for data augmentation. We release our code, training details, model weights, and a GUI demo to facilitate reproducibility and promote further development within the community.
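The rectified-flow mechanism the abstract leans on can be illustrated with a toy 1-D sketch (illustrative only, not MAISI-v2): the model is trained on straight-line interpolants between noise and data with a constant target velocity, and sampling integrates that velocity field with only a few Euler steps.

```python
def rf_training_pair(x0, x1, t):
    """One rectified-flow training example: interpolant x_t = (1-t)*x0 + t*x1
    and its target velocity x1 - x0 (constant along the straight path)."""
    xt = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return xt, v_target

def euler_sample(velocity_fn, x0, n_steps=4):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with n_steps Euler steps.
    Straight paths are why very few steps suffice, which is the source of
    the inference speed-up over standard diffusion sampling."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x
```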

NeurIPS Conference 2025 Conference Paper

Better Tokens for Better 3D: Advancing Vision-Language Modeling in 3D Medical Imaging

  • Ibrahim Ethem Hamamci
  • Sezgin Er
  • Suprosanna Shit
  • Hadrien Reynaud
  • Dong Yang
  • Pengfei Guo
  • Marc Edgar
  • Daguang Xu

Recent progress in vision-language modeling for 3D medical imaging has been fueled by large-scale computed tomography (CT) corpora with paired free-text reports, stronger architectures, and powerful pretrained models. This has enabled applications such as automated report generation and text-conditioned 3D image synthesis. Yet, current approaches struggle with high-resolution, long-sequence volumes: contrastive pretraining often yields vision encoders that are misaligned with clinical language, and slice-wise tokenization blurs fine anatomy, reducing diagnostic performance on downstream tasks. We introduce BTB3D (Better Tokens for Better 3D), a causal convolutional encoder-decoder that unifies 2D and 3D training and inference while producing compact, frequency-aware volumetric tokens. A three-stage training curriculum enables (i) local reconstruction, (ii) overlapping-window tiling, and (iii) long-context decoder refinement, during which the model learns from short slice excerpts yet generalizes to scans exceeding 300 slices without additional memory overhead. BTB3D sets a new state-of-the-art on two key tasks: it improves BLEU scores and increases clinical F1 by 40% over CT2Rep, CT-CHAT, and Merlin for report generation; and it reduces FID by 75% and halves FVD compared to GenerateCT and MedSyn for text-to-CT synthesis, producing anatomically consistent 512×512×241 volumes. These results confirm that precise three-dimensional tokenization, rather than larger language backbones alone, is essential for scalable vision-language modeling in 3D medical imaging. The codebase is available at: https://github.com/ibrahimethemhamamci/BTB3D

NeurIPS Conference 2025 Conference Paper

Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis

  • Dong Yang
  • Yiyi Cai
  • Yuki Saito
  • Lixu Wang
  • Hiroshi Saruwatari

We propose Shallow Flow Matching (SFM), a novel mechanism that enhances flow matching (FM)-based text-to-speech (TTS) models within a coarse-to-fine generation paradigm. Unlike conventional FM modules, which use the coarse representations from the weak generator as conditions, SFM constructs intermediate states along the FM paths from these representations. During training, we introduce an orthogonal projection method to adaptively determine the temporal position of these states, and apply a principled construction strategy based on a single-segment piecewise flow. The SFM inference starts from the intermediate state rather than pure noise, thereby focusing computation on the latter stages of the FM paths. We integrate SFM into multiple TTS models with a lightweight SFM head. Experiments demonstrate that SFM yields consistent gains in speech naturalness across both objective and subjective evaluations, and significantly accelerates inference when using adaptive-step ODE solvers. Demo and code are available at https://ydqmkkx.github.io/SFMDemo/.
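The intermediate-state construction the abstract describes can be sketched in a hedged 1-D/vector form (not the authors' code): a coarse estimate of the target is orthogonally projected onto the straight FM path between noise and data to pick a path time t*, and ODE integration then starts from that intermediate state instead of pure noise.

```python
def project_onto_path(x0, x1, c):
    """Orthogonal projection of coarse estimate c onto the segment x0 -> x1
    (all given as lists of floats). Returns the path time t* clamped to [0, 1]."""
    d = [b - a for a, b in zip(x0, x1)]
    num = sum((ci - a) * di for ci, a, di in zip(c, x0, d))
    den = sum(di * di for di in d)
    t = num / den if den > 0 else 0.0
    return min(1.0, max(0.0, t))

def shallow_start(x0, x1, c):
    """Intermediate state on the FM path at the projected time t*.
    Inference would integrate the flow from (xt, t*) to t=1, skipping
    the early portion of the path."""
    t = project_onto_path(x0, x1, c)
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    return xt, t
```

At inference time x1 is of course unknown; the sketch only illustrates the training-time geometry, where the projection decides how "deep" along the path the coarse representation already is.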

ICRA Conference 2024 Conference Paper

HPF-SLAM: An Efficient Visual SLAM System Leveraging Hybrid Point Features

  • Xin Su
  • Sebastian Eger
  • Adam Misik
  • Dong Yang
  • Rastin Pries
  • Eckehard G. Steinbach

Visual SLAM is an essential tool in diverse applications such as robot perception and extended reality, where feature-based methods are prevalent due to their accuracy and robustness. However, existing methods employ either hand-crafted or solely learnable point features and are thus limited by the feature attributes. In this paper, we propose incorporating hybrid point features efficiently into a single system. By integrating hand-crafted and learnable features, we seek to capitalize on their complementary attributes in both key-point identification and descriptor expressiveness. To this end, we design a pre-processing module, which includes extraction, inter-class processing, and post-processing of hybrid point features. We present an efficient matching approach that performs data association exclusively within the same class of features. Moreover, we design a Hybrid Bag-of-Words (H-BoW) model to handle hybrid point features in matching and loop closure detection. By integrating the proposed framework into a modern feature-based system, we introduce HPF-SLAM. We evaluate the system on EuRoC-MAV and TUM-RGBD benchmarks. The experimental results show that our method consistently surpasses the baseline at comparable speed.

IJCAI Conference 2023 Conference Paper

A Canonicalization-Enhanced Known Fact-Aware Framework For Open Knowledge Graph Link Prediction

  • Yilin Wang
  • Minghao Hu
  • Zhen Huang
  • Dongsheng Li
  • Wei Luo
  • Dong Yang
  • Xicheng Lu

Open knowledge graph (OpenKG) link prediction aims to predict missing factual triples in the form of (head noun phrase, relation phrase, tail noun phrase). Since triples are not canonicalized, previous methods either focus on canonicalizing noun phrases (NPs) to reduce graph sparsity, or utilize textual forms to improve type compatibility. However, they neglect to canonicalize relation phrases (RPs) and triples, leaving the OpenKG highly sparse and impeding performance. To address these issues, we propose a Canonicalization-Enhanced Known Fact-Aware (CEKFA) framework that boosts link prediction performance through sparsity reduction of RPs and triples. First, we propose a similarity-driven RP canonicalization method to reduce RPs' sparsity by sharing knowledge among semantically similar ones. Second, to reduce the sparsity of triples, a known fact-aware triple canonicalization method is designed to retrieve relevant known facts from training data. Finally, these two types of canonical information are integrated into a general two-stage re-ranking framework that can be applied to most existing knowledge graph embedding methods. Experiment results on two OpenKG datasets, ReVerb20K and ReVerb45K, show that our approach achieves state-of-the-art results. Extensive experimental analyses illustrate the effectiveness and generalization ability of the proposed framework.
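The two-stage re-ranking the abstract mentions can be illustrated with a toy sketch (not CEKFA itself; the blending rule and `alpha` are assumptions for illustration): a base KG-embedding model scores candidate entities, then the top-k candidates are re-scored by blending in an auxiliary score derived from canonicalized known facts.

```python
def rerank(base_scores, aux_scores, k, alpha=0.5):
    """base_scores / aux_scores: dicts entity -> score.
    Stage 1: keep the top-k candidates under the base KGE score.
    Stage 2: re-rank them by alpha*base + (1-alpha)*aux, where aux comes
    from auxiliary (e.g. canonicalization-derived) evidence."""
    top = sorted(base_scores, key=lambda e: -base_scores[e])[:k]
    blended = {e: alpha * base_scores[e] + (1 - alpha) * aux_scores.get(e, 0.0)
               for e in top}
    return sorted(top, key=lambda e: -blended[e])
```

Because only the top-k list is re-scored, the second stage can use richer (slower) evidence without touching the full candidate set.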

IROS Conference 2023 Conference Paper

Haptic Dataset Augmentation with Subjective QoE Labels using Conditional Generative Adversarial Network

  • Zican Wang
  • Xiao Xu 0001
  • Dong Yang
  • Zhenyu Wang 0010
  • Sarah Shtaierman
  • Eckehard G. Steinbach

This paper proposes a novel Generative Adversarial Network (GAN)-based strategy to augment subjective haptic Quality of Experience (QoE) datasets for bilateral teleoperation with haptic feedback without conducting time-consuming subjective experiments. In our previous work, we proposed a multi-assessment fusion approach to predict subjective haptic quality using a collection of objective metrics. This method requires a sufficiently large haptic dataset with QoE labels. The proposed generative approach automatically expands the existing haptic quality dataset by combining a modified conditional GAN (CGAN) and Style GAN (StyleGAN) architecture. The most important feature of our method is that it learns from the labeled training data and focuses on synthesizing signals with artifacts according to new input labels containing the QoE score, time delay, control method, and data reduction information. Extensive experiments are conducted to validate the suitability of the expanded dataset. The results show that our approach is able to generate new data, which match the label and signal distribution of the original data with categorical rank and linear correlation of over 0.85.

ICRA Conference 2023 Conference Paper

SRI-Graph: A Novel Scene-Robot Interaction Graph for Robust Scene Understanding

  • Dong Yang
  • Xiao Xu 0001
  • Mengchen Xiong
  • Edwin Babaians
  • Eckehard G. Steinbach

We propose a novel scene-robot interaction graph (SRI-Graph) that exploits the known position of a mobile manipulator for robust and accurate scene understanding. Compared to the state-of-the-art scene graph approaches, the proposed SRI-Graph captures not only the relationships between the objects, but also the relationships between the robot manipulator and objects with which it interacts. To improve the detection accuracy of spatial relationships, we leverage the 3D position of the mobile manipulator in addition to RGB images. The manipulator's ego information is crucial for a successful scene understanding when the relationships are visually uncertain. The proposed model is validated for a real-world 3D robot-assisted feeding task. We release a new dataset named 3DRF-Pos for training and validation. We also develop a tool, named LabelImg-Rel, as an extension of the open-sourced image annotation tool LabelImg for convenient annotation in robot-environment interaction scenarios. Our experimental results using the Movo platform show that SRI-Graph outperforms the state-of-the-art approach and improves detection accuracy by up to 9.83%.

IROS Conference 2022 Conference Paper

Skill-CPD: Real-time Skill Refinement for Shared Autonomy in Manipulator Teleoperation

  • Edwin Babaians
  • Dong Yang
  • Mojtaba Karimi
  • Xiao Xu 0001
  • Serkut Ayvasik
  • Eckehard G. Steinbach

Advanced wireless communication networks provide lower latency and a higher transmission rate. Although this is an enabler for many new teleoperation applications, the risk of network instability or packet drop is still unavoidable. Real-time manipulator teleoperation requires data transmission with no discontinuity. Shared autonomy (SA) is a standard method to mitigate this issue: if the data from the remote side is unavailable, the controller can continue based on the previously observed models. However, due to the spatial gap between human and robot trajectories, unwanted fluctuations occur, which cause issues in teleoperation applications. This motivates us to propose a new skill refinement strategy to modify the previously trained skill and mitigate the sudden unwanted motions within the control takeover phase. To this end, our approach applies the Hidden Semi-Markov Model (HSMM) and Linear Quadratic Tracker (LQT) in combination to learn and predict the user's intentions, and then exploits Coherent Point Drift (CPD) to refine the executable trajectory. We test our method both in simulation and in the real world for 2D English letter drawing and 3D robot-assisted feeding scenarios. Our experimental results using the Kinova® Movo platform show that the proposed refinement approach generates a stable trajectory and mitigates the control switching inconsistency. All experiments and source code are available at: http://cxdcxd.github.io/SkillCPD.

IJCAI Conference 2019 Conference Paper

Ensemble-based Ultrahigh-dimensional Variable Screening

  • Wei Tu
  • Dong Yang
  • Linglong Kong
  • Menglu Che
  • Qian Shi
  • Guodong Li
  • Guangjian Tian

Since the sure independence screening (SIS) method by Fan and Lv, many different variable screening methods have been proposed based on different measures under different models. However, most of these methods are designed for specific models. In practice, we often have very little information about the data generating process, and different methods can result in very different sets of features. This heterogeneity motivates us to combine various screening methods simultaneously. In this paper, we introduce a general ensemble-based framework to efficiently combine results from multiple variable screening methods. The consistency and sure screening properties of the proposed framework have been established. Extensive simulation studies confirm our intuition that the proposed ensemble-based method is more robust against model misspecification than any single variable screening method. The proposed ensemble-based method is used to predict attention deficit hyperactivity disorder (ADHD) status using brain functional connectivity (FC).
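One simple instance of ensemble screening (illustrative, not the paper's exact estimator) is rank aggregation: each screening method returns a utility score per feature, scores are converted to ranks, ranks are averaged across methods, and the d features with the best mean rank are kept.

```python
def to_ranks(scores):
    """Rank features by descending score (rank 1 = most important)."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for r, j in enumerate(order, start=1):
        ranks[j] = r
    return ranks

def ensemble_screen(score_lists, d):
    """score_lists: one score vector per screening method (e.g. SIS marginal
    correlations, distance correlations, ...). Returns the indices of the d
    features with the smallest average rank across methods."""
    rank_lists = [to_ranks(s) for s in score_lists]
    p = len(score_lists[0])
    n = len(rank_lists)
    mean_rank = [sum(r[j] for r in rank_lists) / n for j in range(p)]
    return sorted(range(p), key=lambda j: mean_rank[j])[:d]
```

Averaging ranks rather than raw scores keeps methods with incomparable score scales from dominating the ensemble, which is one reason rank-based aggregation is robust to model misspecification.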