Arrow Research search

Author name cluster

Xi Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

40 papers
2 author rows

Possible papers (40)

AAAI Conference 2026 Conference Paper

Learning Compact Latent Space for Representing Neural Signed Distance Functions with High-fidelity Geometry Details

  • Qiang Bai
  • Bojian Wu
  • Xi Yang
  • Zhizhong Han

Neural signed distance functions (SDFs) have become a vital representation for modeling 3D shapes or scenes with neural networks. An SDF is an implicit function that can be queried for signed distances at specific coordinates to recover a 3D surface. Although implicit functions work well on a single shape or scene, they pose obstacles when analyzing multiple SDFs with high-fidelity geometry details, due to the limited information encoded in the latent space for SDFs and the resulting loss of geometry details. To overcome these obstacles, we introduce a method to represent multiple SDFs in a common space, aiming to recover more high-fidelity geometry details with more compact latent representations. Our key idea is to take full advantage of both generalization-based and overfitting-based learning strategies, which together preserve high-fidelity geometry details with compact latent codes. Based on this framework, we also introduce a novel strategy for sampling training queries, which improves training efficiency and eliminates artifacts caused by the influence of other SDFs. We report numerical and visual evaluations on widely used benchmarks to validate our designs and show advantages over the latest methods in terms of representational ability and compactness.
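
The latent-conditioned setup this line of work builds on is easy to sketch. Below is a minimal, hypothetical latent-conditioned SDF in PyTorch; the per-shape latent table, layer sizes, and MLP depth are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class LatentSDF(nn.Module):
    """Latent-conditioned SDF: f(z, xyz) -> signed distance.

    Hypothetical sketch of the shared-latent-space idea: every shape owns a
    compact code, and one network decodes (code, coordinate) pairs.
    """
    def __init__(self, num_shapes: int, latent_dim: int = 128):
        super().__init__()
        self.latents = nn.Embedding(num_shapes, latent_dim)  # one compact code per SDF
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 3, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),  # signed distance at the queried coordinate
        )

    def forward(self, shape_ids: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
        z = self.latents(shape_ids)                # (B, latent_dim)
        return self.mlp(torch.cat([z, xyz], -1))  # (B, 1)

# Query signed distances for 4 points on shape 0.
model = LatentSDF(num_shapes=10)
sdf = model(torch.zeros(4, dtype=torch.long), torch.rand(4, 3))
```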

JBHI Journal 2026 Journal Article

MedMAP: Promoting Incomplete Multi-Modal Brain Tumor Segmentation With Alignment

  • Tianyi Liu
  • Zhaorui Tan
  • Muyin Chen
  • Xi Yang
  • Haochuan Jiang
  • Kaizhu Huang

Brain tumor segmentation is often based on multiple magnetic resonance imaging (MRI) modalities. However, in clinical practice, certain MRI modalities may be missing, which presents a more difficult scenario. To cope with this challenge, Knowledge Distillation, Domain Adaptation, and Shared Latent Space have emerged as commonly promising strategies. However, recent efforts to address the missing-modality problem in brain tumor segmentation typically overlook the modality gaps and thus fail to learn important invariant feature representations across different modalities. This drawback consequently leads to limited performance for missing-modality models. To ameliorate these problems, pre-trained models are used in natural image segmentation tasks to minimize the gaps. However, promising pre-trained models are difficult to obtain for brain tumor segmentation due to the lack of sufficient data. Along this line, in this paper, we propose a novel paradigm that aligns latent features of the involved modalities to a well-defined distribution anchor as a substitute for the pre-trained model. As a major contribution, we prove that our training paradigm ensures a tight evidence lower bound, thus theoretically certifying its effectiveness. Extensive experiments on different backbones validate that the proposed paradigm can enable invariant feature representations and produce models with narrowed modality gaps. Models with our alignment paradigm show superior performance on the BraTS2018, BraTS2020, and Brain Metastasis datasets.

AAAI Conference 2026 Conference Paper

Out-of-Context Misinformation Detection via Variational Domain-Invariant Learning with Test-Time Training

  • Xi Yang
  • Han Zhang
  • Zhijian Lin
  • Yibiao Hu
  • Hong Han

Out-of-context misinformation (OOC) is a low-cost form of misinformation in news reports, in which authentic images are placed into out-of-context or fabricated image-text pairings. This problem has attracted significant attention from researchers in recent years. Current methods focus on assessing image-text consistency or generating explanations. However, these approaches assume that the training and test data are drawn from the same distribution. When encountering novel news domains, models tend to perform poorly due to the lack of prior knowledge. To address this challenge, we propose a Variational Domain-Invariant Learning with Test-Time Training (VDT) framework to enhance the domain adaptation capability for OOC misinformation detection. A Domain-Invariant Variational Alignment module jointly encodes source- and target-domain data to learn a separable distributional space and domain-invariant features. To preserve semantic integrity, a domain consistency constraint module reconstructs the source and target domain latent distributions. During the testing phase, we adopt a test-time training strategy with a confidence-variance filtering module to dynamically update the VAE encoder and classifier, facilitating the model's adaptation to the target domain distribution. Extensive experiments conducted on the benchmark dataset NewsCLIPpings demonstrate that our method outperforms state-of-the-art baselines under most domain adaptation settings.

AAAI Conference 2026 Conference Paper

StyleProto: Style-Augmented Prototype Learning for Cross-Domain Few-Shot Object Detection

  • Xi Yang
  • Quantao Xie

Cross-Domain Few-Shot Object Detection (CD-FSOD) faces significant challenges due to the dual issues of domain shift and limited labeled samples. One major challenge is style bias, caused by limited support samples that fail to represent the target domain’s style diversity. Another is feature confusion, which stems from distribution shifts and limited supervision, manifesting as both object-background ambiguity and object-object confusion. To address these challenges, we propose Style-Augmented Prototype Learning (StyleProto), which constructs style-aware prototypes from support samples with diverse visual styles, and refines them via spatial weighting and discriminative fusion. Specifically, our StyleProto consists of three components: (1) Style Generation Augmentation (SGA); (2) Semantic-Focused Prototype Construction (SPC); (3) Hierarchical Prototype Fusion Aggregator (HPFA). SGA synthesizes style-diverse yet semantically consistent training samples by recombining style statistics from the support set, thus improving robustness to unseen styles. SPC aggregates support features using spatial attention to highlight object semantics and suppress background noise, yielding cleaner and more distinctive class prototypes. HPFA leverages query-guided attention to integrate discriminative support features, enhancing prototype representations with richer class-specific details. Extensive experiments on multiple benchmarks demonstrate that StyleProto consistently outperforms existing state-of-the-art methods.
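
The style-recombination step at the heart of SGA can be illustrated with channel-wise feature statistics. The sketch below is a generic AdaIN-style operation assumed for illustration; the paper's actual SGA module may differ in detail.

```python
import torch

def recombine_style(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Swap channel-wise statistics between two feature maps (N, C, H, W).

    Hedged sketch of "recombining style statistics from the support set":
    normalize the content features, then re-scale with another sample's
    mean and standard deviation, keeping semantics while changing style.
    """
    c_mu = content.mean((2, 3), keepdim=True)
    c_std = content.std((2, 3), keepdim=True) + eps
    s_mu = style.mean((2, 3), keepdim=True)
    s_std = style.std((2, 3), keepdim=True)
    return (content - c_mu) / c_std * s_std + s_mu

feats_a = torch.randn(2, 64, 32, 32)  # support feature maps
feats_b = torch.randn(2, 64, 32, 32)
augmented = recombine_style(feats_a, feats_b)  # semantics of a, style of b
```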

ICML Conference 2025 Conference Paper

Clustering Properties of Self-Supervised Learning

  • Xi Weng
  • Jianing An
  • Xudong Ma
  • Binhang Qi
  • Jie Luo 0004
  • Xi Yang
  • Jin Song Dong 0001
  • Lei Huang 0015

Self-supervised learning (SSL) methods via joint embedding architectures have proven remarkably effective at capturing semantically rich representations with strong clustering properties, even in the absence of label supervision. Despite this, few of them have explored leveraging these untapped properties to improve themselves. In this paper, we provide evidence through various metrics that the encoder’s output encoding exhibits superior and more stable clustering properties compared to other components. Building on this insight, we propose a novel positive-feedback SSL method, termed Representation Self-Assignment (ReSA), which leverages the model’s clustering properties to promote learning in a self-guided manner. Extensive experiments on standard SSL benchmarks reveal that models pretrained with ReSA outperform other state-of-the-art SSL methods by a significant margin. Finally, we analyze how ReSA facilitates better clustering properties, demonstrating that it effectively enhances clustering performance at both fine-grained and coarse-grained levels, shaping representations that are inherently more structured and semantically meaningful.

AAAI Conference 2025 Conference Paper

Disentangling Tabular Data Towards Better One-Class Anomaly Detection

  • Jianan Ye
  • Zhaorui Tan
  • Yijie Hu
  • Xi Yang
  • Guangliang Cheng
  • Kaizhu Huang

Tabular anomaly detection under the one-class classification setting poses a significant challenge, as it involves accurately conceptualizing "normal" derived exclusively from a single category to discern anomalies from normal data variations. Capturing the intrinsic correlation among attributes within normal samples presents one promising method for learning the concept. To do so, the most recent effort relies on a learnable mask strategy with a reconstruction task. However, this wisdom may suffer from the risk of producing uniform masks, i.e., essentially nothing is masked, leading to less effective correlation learning. To address this issue, we presume that attributes related to others in normal samples can be divided into two non-overlapping and correlated subsets, defined as CorrSets, to capture the intrinsic correlation effectively. Accordingly, we introduce an innovative method that disentangles CorrSets from normal tabular data. To our knowledge, this is a pioneering effort to apply the concept of disentanglement for one-class anomaly detection on tabular data. Extensive experiments on 20 tabular datasets show that our method substantially outperforms the state-of-the-art methods and leads to an average performance improvement of 6.1% on AUC-PR and 2.1% on AUC-ROC.
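
One way to make the CorrSets idea concrete is a soft attribute mask plus cross-reconstruction. The sketch below is a hypothetical reading with assumed layer sizes; the paper's disentanglement objective is richer than this.

```python
import torch
import torch.nn as nn

class CorrSetAD(nn.Module):
    """Hedged sketch of the two-subset idea: a learned soft mask splits
    attributes into CorrSets A and B, and each subset is reconstructed
    from the other; high cross-reconstruction error flags anomalies."""
    def __init__(self, n_attrs: int, hidden: int = 64):
        super().__init__()
        self.mask_logits = nn.Parameter(torch.zeros(n_attrs))
        self.a_to_b = nn.Sequential(nn.Linear(n_attrs, hidden), nn.ReLU(),
                                    nn.Linear(hidden, n_attrs))
        self.b_to_a = nn.Sequential(nn.Linear(n_attrs, hidden), nn.ReLU(),
                                    nn.Linear(hidden, n_attrs))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        m = torch.sigmoid(self.mask_logits)   # soft membership in subset A
        xa, xb = x * m, x * (1 - m)
        err = ((self.a_to_b(xa) - xb) ** 2 + (self.b_to_a(xb) - xa) ** 2).mean(1)
        return err  # anomaly score: large cross-reconstruction error

model = CorrSetAD(n_attrs=16)
scores = model(torch.randn(8, 16))  # one score per row of tabular data
```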

AAAI Conference 2025 Conference Paper

Dual Information Purification for Lightweight SAR Object Detection

  • Xi Yang
  • Jiachen Sun
  • Songsong Duan
  • De Cheng

Synthetic aperture radar (SAR) object detection requires accurate identification and localization of targets at various scales within SAR images. However, background clutter and speckle noise can obscure key features and mislead the knowledge distillation process. To address these challenges, we introduce the Dual Information Purification Knowledge Distillation (DIPKD) method, which improves the performance of the student model through three key strategies: denoising, enrichment, and decoupling. First, our Selective Noise Suppression (SNS) technique reduces speckle noise in global features by minimizing misleading information from the teacher model. Second, the Knowledge Level Decoupling (KLD) module separates features into target and non-target knowledge, balancing feature mapping and reducing background noise to enhance the extraction of critical information for the student model. Finally, the Reverse Information Transfer (RIT) module refines intermediate features in the student model, compensating for the loss of detailed local information. Experimental results demonstrate that DIPKD significantly outperforms existing distillation techniques in SAR object detection, achieving 60.2% and 51.4% mAP scores on the SSDD and HRSID datasets, respectively. Additionally, the student model shows performance improvements of 1.3% and 2.9% over the teacher model, highlighting the effectiveness of the information purification approach.

AAAI Conference 2025 Conference Paper

Optimizing Label Assignment for Weakly Supervised Person Search

  • Haiyang Zhu
  • Xi Yang
  • Nannan Wang

Weakly supervised person search aims to jointly detect and match individuals using only bounding box annotations. Existing methods mainly alternate between a clustering stage and a training stage, where the former is responsible for instance-level label allocation and the latter undertakes proposal-level label allocation. In the clustering phase, the conventional use of the DBSCAN algorithm for clustering pedestrian instance features often neglects key contextual information such as scene context and the relative positioning of individuals. During the training phase, the Region Proposal Network assigns labels based on MaxIoU, which tends to produce locally ambiguous labels. Finally, proposals updated into the memory bank with extensive background information tend to interfere with pseudo-label generation. To address these issues, this paper proposes Optimizing Label Assignment (OLA) for weakly supervised person search. First, in the clustering phase, Context Aware Clustering is introduced to integrate contextual information and constraints, enhancing clustering accuracy. Second, in the training phase, we adopt Prototype Matching based on Optimal Transport theory to optimize label distribution from a global perspective. Furthermore, we propose Dual Memory Bank Enhancement, which effectively improves the accuracy of label assignment. Extensive experiments conducted on the CUHK-SYSU and PRW datasets demonstrate that our method achieves state-of-the-art performance in weakly supervised person search.
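
The Optimal Transport step in such prototype matching can be sketched with plain Sinkhorn iterations; the cost construction and uniform marginals below are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def sinkhorn(cost: torch.Tensor, n_iters: int = 50, eps: float = 0.05) -> torch.Tensor:
    """Entropy-regularized optimal transport between proposals and prototypes.

    Rows are proposals, columns are identity prototypes; the returned plan
    gives soft, globally consistent assignments instead of per-proposal
    MaxIoU decisions.
    """
    K = torch.exp(-cost / eps)                  # Gibbs kernel
    r = torch.ones(K.size(0)) / K.size(0)       # uniform row marginal
    c = torch.ones(K.size(1)) / K.size(1)       # uniform column marginal
    u, v = r.clone(), c.clone()
    for _ in range(n_iters):                    # alternate scaling updates
        u = r / (K @ v)
        v = c / (K.t() @ u)
    return u.unsqueeze(1) * K * v.unsqueeze(0)  # transport plan

cost = 1 - torch.rand(100, 20)   # e.g., 1 - cosine similarity to prototypes
plan = sinkhorn(cost)
labels = plan.argmax(dim=1)      # hard assignment per proposal
```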

AAAI Conference 2025 Conference Paper

RoPaSS: Robust Watermarking for Partial Screen-Shooting Scenarios

  • Zehua Ma
  • Han Fang
  • Xi Yang
  • Kejiang Chen
  • Weiming Zhang

Screen-shooting robust watermarking is an effective means of preventing screen content leakage from unauthorized camera shooting, as it can trace the leaked source through watermark extraction, thereby providing an effective deterrent. However, current screen-shooting resilient watermarking schemes rely on the image's contours to synchronize and then extract the watermark, while in practical applications it is common for only a portion of the image to be captured, limiting the performance of previous schemes. To address this problem, we propose RoPaSS, a robust watermarking scheme for partial screen-shooting scenarios, which constructs symmetric characteristics in the embedded watermark to handle the thorny re-synchronization issue. Specifically, RoPaSS consists of a watermark encoder, a decoder, and three estimators, which are trained in two stages. In the first training stage, RoPaSS integrates a flipping operation into encoder and decoder training to increase the redundancy of watermark messages and guide the generation of symmetric watermarks. In the second stage, the estimators utilize watermark symmetry as an additional reference to estimate the restoration parameters and resynchronize the partially captured watermarked image. Experiments demonstrate the excellent performance of RoPaSS in partial screen-shooting traceability, with extraction accuracy above 93% in frontal shooting and above 86% in 30° shooting, even if only 50% of the image content is captured.

NeurIPS Conference 2025 Conference Paper

SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors

  • Chen Yang
  • Hui Wang
  • Shiyao Wang
  • Junyang Chen
  • Jiabei He
  • Jiaming Zhou
  • Xi Yang
  • Yequan Wang

While voice technologies increasingly serve aging populations, current systems exhibit significant performance gaps due to inadequate training data capturing elderly-specific vocal characteristics like presbyphonia and dialectal variations. The limited data available on super-aged individuals in existing elderly speech datasets, coupled with overly simple recording styles and annotation dimensions, exacerbates this issue. To address the critical scarcity of speech data from individuals aged 75 and above, we introduce SeniorTalk, a carefully annotated Chinese spoken dialogue dataset. This dataset contains 55.53 hours of speech from 101 natural conversations involving 202 participants, ensuring a strategic balance across gender, region, and age. Through detailed annotation across multiple dimensions, it can support a wide range of speech tasks. We perform extensive experiments on speaker verification, speaker diarization, speech recognition, and speech editing tasks, offering crucial insights for the development of speech technologies targeting this age group. Code is available at https://github.com/flageval-baai/SeniorTalk and data at https://huggingface.co/datasets/evan0617/seniortalk.
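
Assuming the dataset is hosted under the Hugging Face id in the URL above, loading it should look roughly like this; the split names and field layout are guesses to verify interactively.

```python
from datasets import load_dataset

ds = load_dataset("evan0617/seniortalk")      # dataset id taken from the abstract's URL
print(ds)                                     # inspect the available splits and columns
first_split = list(ds.keys())[0]
sample = next(iter(ds[first_split]))          # peek at one annotated utterance
```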

JBHI Journal 2025 Journal Article

TDSFE-Net: A Temporal Dual-Stream Feature Extraction Network for Depression Detection From EEG

  • Mingyang Li
  • Zhiwei Wang
  • Xi Yang
  • Tao Zhang

Early detection and diagnosis are critical for effective depression management. Although electroencephalography (EEG) can provide an objective basis for the auxiliary diagnosis of depression, decoding depression-related brain activity from EEG is a highly challenging task due to the signal's inherent complexity, dynamism, and non-linearity. Therefore, this study introduces a novel temporal dual-stream feature extraction network (TDSFE-Net) that incorporates multiple attention mechanisms. Specifically, we first develop a dynamic-fusion-weight-based local-global attention mechanism within the hierarchical temporal-separable convolutional network (TSCN) to automatically capture the temporal dynamic characteristics of the EEG signal. Subsequently, a channel-wise module is designed to reveal the key temporal information in spatial dimensions. Finally, a softmax with a fully connected layer is used as the classifier. TDSFE-Net achieved impressive classification accuracies of 98.72%, 96.91%, and 99.53% on the MODMA, HUSM, and Hospital datasets, respectively. In addition, this study also reveals patterns of correlation between the activity of specific brain regions and depression, providing a new perspective and scientific basis for discovering biomarkers and studying the neural mechanisms of depression.

NeurIPS Conference 2025 Conference Paper

Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs

  • Xuannan Liu
  • Zekun Li
  • Zheqi He
  • Peipei Li
  • Shuhan Xia
  • Xing Cui
  • Huaibo Huang
  • Xi Yang

The increasing deployment of Large Vision-Language Models (LVLMs) raises safety concerns under potential malicious inputs. However, existing multimodal safety evaluations primarily focus on model vulnerabilities exposed by static image inputs, ignoring the temporal dynamics of video that may induce distinct safety risks. To bridge this gap, we introduce Video-SafetyBench, the first comprehensive benchmark designed to evaluate the safety of LVLMs under video-text attacks. It comprises 2,264 video-text pairs spanning 48 fine-grained unsafe categories, each pairing a synthesized video with either a harmful query, which contains explicit malice, or a benign query, which appears harmless but triggers harmful behavior when interpreted alongside the video. To generate semantically accurate videos for safety evaluation, we design a controllable pipeline that decomposes video semantics into subject images (what is shown) and motion text (how it moves), which jointly guide the synthesis of query-relevant videos. To effectively evaluate uncertain or borderline harmful outputs, we propose RJScore, a novel LLM-based metric that incorporates the confidence of judge models and human-aligned decision threshold calibration. Extensive experiments show that benign-query video composition achieves an average attack success rate of 67.2%, revealing consistent vulnerabilities to video-induced attacks. We believe Video-SafetyBench will catalyze future research into video-based safety evaluation and defense strategies.

IJCAI Conference 2024 Conference Paper

CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

  • Zheqi He
  • Xinya Wu
  • Pengfei Zhou
  • Richeng Xuan
  • Guang Liu
  • Xi Yang
  • Qiannan Zhu
  • Hua Huang

Multi-modal large language models (MLLMs) have achieved remarkable progress and demonstrated powerful knowledge comprehension and reasoning abilities. However, the mastery of domain-specific knowledge, which is essential for evaluating the intelligence of MLLMs, continues to be a challenge. Current multi-modal benchmarks for domain-specific knowledge concentrate on multiple-choice questions and are predominantly available in English, which imposes limitations on the comprehensiveness of the evaluation. To this end, we introduce CMMU, a novel benchmark for multi-modal and multi-type question understanding and reasoning in Chinese. CMMU consists of 3,603 questions in 7 subjects, covering knowledge from primary to high school. The questions can be categorized into 3 types: multiple-choice, multiple-response, and fill-in-the-blank, bringing greater challenges to MLLMs. In addition, we propose an evaluation strategy called Positional Error Variance for assessing multiple-choice questions. The strategy aims to perform a quantitative analysis of position bias. We evaluate seven open-source MLLMs along with GPT4-V, Gemini-Pro, and Qwen-VL-Plus. The results demonstrate that CMMU poses a significant challenge to recent MLLMs. The data and code are available at https://github.com/FlagOpen/CMMU.
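
One plausible reading of Positional Error Variance: rotate the gold answer through every option slot, measure the error rate per slot, and report the variance across slots. The sketch below follows that reading; the paper's exact formula may differ.

```python
import statistics

def positional_error_variance(error_rate_by_position: dict) -> float:
    """Hypothetical reading of the metric: re-evaluate each multiple-choice
    question with the gold answer placed in every option slot, record the
    error rate per slot, and report the variance across slots.
    A position-unbiased model yields variance near zero."""
    return statistics.pvariance(error_rate_by_position.values())

# Error rates when the gold answer sits at A/B/C/D (illustrative numbers).
rates = {"A": 0.21, "B": 0.25, "C": 0.34, "D": 0.40}
print(positional_error_variance(rates))  # larger value -> stronger position bias
```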

NeurIPS Conference 2024 Conference Paper

DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection

  • Xiao Yu
  • Yuang Qi
  • Kejiang Chen
  • Guoqiang Chen
  • Xi Yang
  • Pengyuan Zhu
  • Xiuwei Shang
  • Weiming Zhang

Large language models (LLMs) have the potential to generate texts that pose risks of misuse, such as plagiarism, planting fake reviews on e-commerce platforms, or creating inflammatory false tweets. Consequently, detecting whether a text is generated by LLMs has become increasingly important. Existing high-quality detection methods usually require access to the interior of the model to extract intrinsic characteristics. However, since we do not have access to the interior of a black-box model, we must resort to surrogate models, which impacts detection quality. To achieve high-quality detection of black-box models, we would like to extract deep intrinsic characteristics of texts generated by the black-box model. We view the generation process as a coupled process of the prompt and the intrinsic characteristics of the generative model. Based on this insight, we propose DPIC, a method that decouples prompt and intrinsic characteristics for LLM-generated text detection. Specifically, given a candidate text, DPIC employs an auxiliary LLM to reconstruct the prompt corresponding to the candidate text, then uses that prompt to regenerate text with the auxiliary LLM, which aligns the candidate text and the regenerated text with their respective prompts. The similarity between the candidate text and the regenerated text is then used as a detection feature, eliminating the prompt from the detection process and allowing the detector to focus on the intrinsic characteristics of the generative model. Compared to the baselines, DPIC achieves average improvements of 6.76% and 2.91% in detecting texts from different domains generated by GPT4 and Claude3, respectively.
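
The pipeline described above reduces to three calls: reconstruct the prompt, regenerate, and compare. A hedged sketch, where `ask_llm` is a placeholder for any auxiliary LLM call and the embedding model is an arbitrary choice, not the paper's configuration:

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def dpic_score(candidate: str, ask_llm) -> float:
    # 1) Reconstruct a plausible prompt for the candidate text.
    prompt = ask_llm(f"Write the prompt that likely produced this text:\n{candidate}")
    # 2) Regenerate text from the reconstructed prompt.
    regenerated = ask_llm(prompt)
    # 3) Similarity between candidate and regeneration is the detection feature.
    a, b = embedder.encode([candidate, regenerated], convert_to_tensor=True)
    return util.cos_sim(a, b).item()  # high similarity -> likely LLM-generated
```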

NeurIPS Conference 2024 Conference Paper

Feature-Level Adversarial Attacks and Ranking Disruption for Visible-Infrared Person Re-identification

  • Xi Yang
  • Huanling Liu
  • De Cheng
  • Nannan Wang
  • Xinbo Gao

Visible-infrared person re-identification (VIReID) is widely used in fields such as video surveillance and intelligent transportation, imposing higher demands on model security. In practice, adversarial attacks on VIReID aim to disrupt the output ranking and quantify the security risks of models. Although numerous studies have emerged on adversarial attacks and defenses in fields such as face recognition, person re-identification, and pedestrian detection, there is currently a lack of research on the security of VIReID systems. To this end, we propose to explore the vulnerabilities of VIReID systems and prevent potential serious losses due to insecurity. Compared to research on single-modality ReID, adversarial feature alignment and modality differences need to be particularly emphasized. Thus, we advocate feature-level adversarial attacks to disrupt the output rankings of VIReID systems. To obtain adversarial features, we introduce Universal Adversarial Perturbations (UAP) to simulate common disturbances in real-world environments. Additionally, we employ a Frequency-Spatial Attention Module (FSAM), integrating frequency information extraction and spatial focusing mechanisms, to further emphasize important regional features from different domains within the shared features. This ensures that adversarial features maintain consistency within the feature space. Finally, we employ an Auxiliary Quadruple Adversarial Loss to amplify the differences between modalities, thereby improving the distinction of features between visible and infrared images, which causes the system to output incorrect rankings. Extensive experiments on two VIReID benchmarks (i.e., SYSU-MM01 and RegDB) and different systems validate the effectiveness of our method.

AAAI Conference 2024 Conference Paper

Get a Head Start: On-Demand Pedagogical Policy Selection in Intelligent Tutoring

  • Ge Gao
  • Xi Yang
  • Min Chi

Reinforcement learning (RL) is broadly employed in human-involved systems to enhance human outcomes. Off-policy evaluation (OPE) has been pivotal for RL in those realms, since online policy learning and evaluation can be high-stakes. Intelligent tutoring is an especially challenging setting for OPE in human-involved systems, because subgroups of students can favor different pedagogical policies and because policies must be induced fully offline and then deployed directly in the upcoming semester, a costly procedure. In this work, we formulate on-demand pedagogical policy selection (ODPS) to tackle these challenges for OPE in intelligent tutoring. We propose a pipeline, EduPlanner, as a concrete solution for ODPS. Our pipeline yields a theoretically unbiased estimator and enables efficient, customized policy selection by identifying subgroups over both historical data and on-arrival initial logs. We evaluate our approach on the Probability ITS, which has been used in real classrooms for over eight years. Our study shows significant improvement in the learning outcomes of students with EduPlanner, especially for those associated with low-performing subgroups.

NeurIPS Conference 2024 Conference Paper

Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification

  • Zhaorui Tan
  • Xi Yang
  • Qiufeng Wang
  • Anh Nguyen
  • Kaizhu Huang

Vision models excel in image classification but struggle to generalize to unseen data, such as classifying images from unseen domains or discovering novel categories. In this paper, we explore the relationship between logical reasoning and deep learning generalization in visual classification. A logical regularization termed L-Reg is derived which bridges a logical analysis framework to image classification. Our work reveals that L-Reg reduces the complexity of the model in terms of the feature distribution and classifier weights. Specifically, we unveil the interpretability brought by L-Reg, as it enables the model to extract the salient features, such as faces to persons, for classification. Theoretical analysis and experiments demonstrate that L-Reg enhances generalization across various scenarios, including multi-domain generalization and generalized category discovery. In complex real-world scenarios where images span unknown classes and unseen domains, L-Reg consistently improves generalization, highlighting its practical efficacy.

AAAI Conference 2024 Conference Paper

MuST: Robust Image Watermarking for Multi-Source Tracing

  • Guanjie Wang
  • Zehua Ma
  • Chang Liu
  • Xi Yang
  • Han Fang
  • Weiming Zhang
  • Nenghai Yu

In recent years, with the popularity of social media applications, massive digital images are available online, which brings great convenience to image recreation. However, the use of unauthorized image materials in multi-source composite images is still inadequately regulated, which may cause significant loss and discouragement to the copyright owners of the source image materials. Ideally, deep watermarking techniques could provide a solution for protecting these copyrights based on their encoder-noise-decoder training strategy. Yet existing image watermarking schemes, which are mostly designed for single images, cannot well address the copyright protection requirements in this scenario, since the multi-source image composing process commonly includes distortions that are not well investigated in previous methods, e.g., extreme downsizing. To meet such demands, we propose MuST, a multi-source tracing robust watermarking scheme, whose architecture includes a multi-source image detector and a minimum external rectangle operation for multiple watermark resynchronization and extraction. Furthermore, we construct an image material dataset covering common image categories and design a simulation model of the multi-source image composing process as the noise layer. Experiments demonstrate the excellent performance of MuST in tracing the sources of image materials from composite images compared with SOTA watermarking methods: MuST maintains extraction accuracy above 98% when tracing the sources of at least 3 different image materials, while keeping the average PSNR of watermarked image materials above 42.51 dB. Our code is released at https://github.com/MrCrims/MuST
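
The minimum-external-rectangle step maps naturally onto OpenCV primitives. A sketch under the assumption that the detector yields a binary mask per material region; the paper's detector and warping details are not reproduced here.

```python
import cv2
import numpy as np

# Stand-in for the multi-source image detector's output: a binary mask
# marking one detected material region inside the composite image.
mask = np.zeros((256, 256), np.uint8)
cv2.rectangle(mask, (40, 60), (180, 150), 255, -1)

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    rect = cv2.minAreaRect(cnt)             # ((cx, cy), (w, h), angle)
    box = cv2.boxPoints(rect).astype(int)   # 4 corners of the tilted rectangle
    # Crop/warp this region to a canonical frame, then run the watermark
    # decoder on it to recover the per-material message.
    print(rect, box.tolist())
```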

NeurIPS Conference 2024 Conference Paper

Off-Policy Selection for Initiating Human-Centric Experimental Design

  • Ge Gao
  • Xi Yang
  • Qitong Gao
  • Song Ju
  • Miroslav Pajic
  • Min Chi

In human-centric applications like healthcare and education, the heterogeneity among patients and students necessitates personalized treatments and instructional interventions. While reinforcement learning (RL) has been utilized in those tasks, off-policy selection (OPS) is pivotal to close the loop by offline evaluating and selecting policies without online interactions, yet current OPS methods often overlook the heterogeneity among participants. Our work is centered on resolving a pivotal challenge in human-centric systems (HCSs): how to select a policy to deploy when a new participant joins the cohort, without access to any prior offline data collected from that participant? We introduce First-Glance Off-Policy Selection (FPS), a novel approach that systematically addresses participant heterogeneity through sub-group segmentation and OPS criteria tailored to each sub-group. By grouping individuals with similar traits, FPS facilitates personalized policy selection aligned with the unique characteristics of each participant or group of participants. FPS is evaluated via two important but challenging applications: intelligent tutoring systems and a healthcare application for sepsis treatment and intervention. FPS delivers significant advances in enhancing the learning outcomes of students and in-hospital care outcomes.

AAAI Conference 2024 Conference Paper

Point Deformable Network with Enhanced Normal Embedding for Point Cloud Analysis

  • Xingyilang Yin
  • Xi Yang
  • Liangchen Liu
  • Nannan Wang
  • Xinbo Gao

Recently, MLP-based methods have shown strong performance in point cloud analysis. Simple MLP architectures are able to learn geometric features in local point groups yet fail to model long-range dependencies directly. In this paper, we propose Point Deformable Network (PDNet), a concise MLP-based network that can capture long-range relations with strong representation ability. Specifically, we put forward the Point Deformable Aggregation Module (PDAM) to improve representation capability in both long-range dependency and adaptive aggregation among points. For each query point, PDAM aggregates information from deformable reference points rather than points in limited local areas. The deformable reference points are generated in a data-dependent manner, and we initialize them according to the input point positions. Additional offsets and modulation scalars are learned on the whole point features, which shift the deformable reference points toward the regions of interest. We also suggest estimating the normal vector for point clouds and applying Enhanced Normal Embedding (ENE) to the geometric extractors to improve the representation ability of single points. Extensive experiments and ablation studies on various benchmarks demonstrate the effectiveness and superiority of our PDNet.

NeurIPS Conference 2024 Conference Paper

SA3DIP: Segment Any 3D Instance with Potential 3D Priors

  • Xi Yang
  • Xu Gu
  • Xingyilang Yin
  • Xinbo Gao

The proliferation of 2D foundation models has sparked research into adapting them for open-world 3D instance segmentation. Recent methods introduce a paradigm that leverages superpoints as geometric primitives and incorporates 2D multi-view masks from the Segment Anything Model (SAM) as merging guidance, achieving outstanding zero-shot instance segmentation results. However, the limited use of 3D priors restricts segmentation performance. Previous methods calculate 3D superpoints based solely on normals estimated from spatial coordinates, resulting in under-segmentation for instances with similar geometry. Besides, the heavy reliance on SAM and hand-crafted algorithms in 2D space leads to over-segmentation due to SAM's inherent part-level segmentation tendency. To address these issues, we propose SA3DIP, a novel method for Segmenting Any 3D Instances via exploiting potential 3D Priors. Specifically, on one hand, we generate complementary 3D primitives based on both geometric and textural priors, which reduces the initial errors that accumulate in subsequent procedures. On the other hand, we introduce supplemental constraints from 3D space by using a 3D detector to guide a further merging process. Furthermore, we notice a considerable portion of low-quality ground truth annotations in the ScanNetV2 benchmark, which affects fair evaluation. Thus, we present ScanNetV2-INS with complete ground truth labels and supplement additional instances for 3D class-agnostic instance segmentation. Experimental evaluations on various 2D-3D datasets demonstrate the effectiveness and robustness of our approach. Our code and the proposed ScanNetV2-INS dataset are available online.

AAAI Conference 2024 Conference Paper

Semantic-Aware Data Augmentation for Text-to-Image Synthesis

  • Zhaorui Tan
  • Xi Yang
  • Kaizhu Huang

Data augmentation has recently been leveraged as an effective regularizer in various vision-language deep neural networks. However, in text-to-image synthesis (T2Isyn), current augmentation wisdom still suffers from semantic mismatch between augmented paired data. Even worse, semantic collapse may occur when generated images are less semantically constrained. In this paper, we develop a novel Semantic-aware Data Augmentation (SADA) framework dedicated to T2Isyn. In particular, we propose to augment texts in the semantic space via an Implicit Textual Semantic Preserving Augmentation, in conjunction with a specifically designed Image Semantic Regularization Loss as Generated Image Semantic Conservation, to cope well with semantic mismatch and collapse. As one major contribution, we theoretically show that the Implicit Textual Semantic Preserving Augmentation can certify better text-image consistency, while the Image Semantic Regularization Loss, by regularizing the semantics of generated images, avoids semantic collapse and enhances image quality. Extensive experiments validate that SADA enhances text-image consistency and significantly improves image quality in T2Isyn models across various backbones. In particular, incorporating SADA during the tuning process of Stable Diffusion models also yields performance improvements.

TIST Journal 2024 Journal Article

Toward Ubiquitous Interaction-Attentive and Extreme-Aware Crowd Activity Level Prediction

  • Huiqun Huang
  • Xi Yang
  • Suining He
  • Mahan Tabatabaie

Accurate prediction of citywide crowd activity levels (CALs), i.e., the numbers of participants in citywide crowd activities under different venue categories at certain times and locations, is essential for city management, personal service applications, and entrepreneurs in commercial strategic planning. Existing studies have not thoroughly taken into account the complex spatial and temporal interactions among different categories of CALs and their extreme occurrences, lowering the adaptivity and accuracy of their models. To address the above concerns, we propose IE-CALP, a novel spatio-temporal Interactive attention-based and Extreme-aware model for Crowd Activity Level Prediction. The tasks of IE-CALP consist of (a) forecasting the spatial distributions of various CALs at different city regions (spatial CALs), and (b) predicting the number of participants per category of the CALs (categorical CALs). To this end, we design a novel spatial CAL-POI interaction-attentive learning component in IE-CALP to model the spatial interactions across different CAL categories, as well as those among the spatial urban regions and CALs. In addition, IE-CALP incorporates the multi-level trends (e.g., at daily and weekly levels of temporal granularity) of CALs through a multi-level temporal feature learning component. Furthermore, to enhance the model's adaptivity to extreme CALs (e.g., during extreme urban events or weather conditions), we take into account extreme value theory and model the impacts of historical CALs upon the occurrences of extreme CALs. Extensive experiments on a total of 738,715 CAL records and 246,660 POIs in New York City (NYC), Los Angeles (LA), and Tokyo have validated the accuracy, adaptivity, and effectiveness of IE-CALP's interaction-attentive and extreme-aware CAL predictions.

AAAI Conference 2024 Conference Paper

Unraveling Batch Normalization for Realistic Test-Time Adaptation

  • Zixian Su
  • Jingwei Guo
  • Kai Yao
  • Xi Yang
  • Qiufeng Wang
  • Kaizhu Huang

While recent test-time adaptations exhibit efficacy by adjusting batch normalization to narrow domain disparities, their effectiveness diminishes with realistic mini-batches due to inaccurate target estimation. As previous attempts merely introduce source statistics to mitigate this issue, the fundamental problem of inaccurate target estimation still persists, leaving the intrinsic test-time domain shifts unresolved. This paper delves into the problem of mini-batch degradation. By unraveling batch normalization, we discover that the inexact target statistics largely stem from the substantially reduced class diversity in batch. Drawing upon this insight, we introduce a straightforward tool, Test-time Exponential Moving Average (TEMA), to bridge the class diversity gap between training and testing batches. Importantly, our TEMA adaptively extends the scope of typical methods beyond the current batch to incorporate a diverse set of class information, which in turn boosts an accurate target estimation. Built upon this foundation, we further design a novel layer-wise rectification strategy to consistently promote test-time performance. Our proposed method enjoys a unique advantage as it requires neither training nor tuning parameters, offering a truly hassle-free solution. It significantly enhances model robustness against shifted domains and maintains resilience in diverse real-world scenarios with various batch sizes, achieving state-of-the-art performance on several major benchmarks. Code is available at https://github.com/kiwi12138/RealisticTTA.
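
The core of TEMA, as described, is replacing per-batch normalization statistics with an exponential moving average accumulated at test time. A minimal sketch, with the momentum value assumed and the layer-wise rectification strategy omitted:

```python
import torch

class TEMA:
    """Test-time exponential moving average over normalization statistics.

    Instead of trusting the statistics of a small, class-imbalanced test
    batch, accumulate them across batches so the estimate reflects a more
    diverse set of classes.
    """
    def __init__(self, momentum: float = 0.1):
        self.momentum = momentum
        self.mean = None
        self.var = None

    def update(self, feats: torch.Tensor):
        mu, var = feats.mean(0), feats.var(0, unbiased=False)
        if self.mean is None:
            self.mean, self.var = mu, var
        else:
            self.mean = (1 - self.momentum) * self.mean + self.momentum * mu
            self.var = (1 - self.momentum) * self.var + self.momentum * var
        return self.mean, self.var  # use these to normalize the current batch

tema = TEMA()
for batch in [torch.randn(4, 64) for _ in range(3)]:  # tiny test-time batches
    mean, var = tema.update(batch)
    normed = (batch - mean) / (var + 1e-5).sqrt()
```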

AAAI Conference 2023 Conference Paper

AutoStegaFont: Synthesizing Vector Fonts for Hiding Information in Documents

  • Xi Yang
  • Jie Zhang
  • Han Fang
  • Chang Liu
  • Zehua Ma
  • Weiming Zhang
  • Nenghai Yu

Hiding information in text documents has been a hot topic recently, with the most typical schemes utilizing fonts. By constructing several fonts with similar appearances, information can be effectively represented and embedded in documents. However, due to their unstructured characteristics, font vectors are more difficult to synthesize than font images. Existing methods mainly use handcrafted features to design the fonts manually, which is time-consuming and labor-intensive. Moreover, due to the diversity of fonts, handcrafted features do not generalize to different fonts. Besides, in practice, since documents might be distorted through transmission, ensuring extractability under distortions is also an important requirement. Therefore, three requirements are imposed on vector font generation in this domain: automaticity, generalizability, and robustness. However, none of the existing methods satisfies all these requirements well and simultaneously. To satisfy them, we propose AutoStegaFont, an automatic vector font synthesis scheme for hiding information in documents. Specifically, we design a two-stage, dual-modality learning framework. In the first stage, we jointly train an encoder and a decoder to invisibly encode font images with different information. To ensure robustness, we design a noise layer that works with the encoder and decoder during training. In the second stage, we employ a differentiable rasterizer to establish a connection between the image and the vector modality. Then, we design an optimization algorithm to convey the information from the encoded image to the corresponding vector. Thus, the encoded font vectors can be automatically generated. Extensive experiments demonstrate the superior performance of our scheme in automatically synthesizing vector fonts for hiding information in documents, with robustness to distortions caused by low-resolution screenshots, printing, and photography. Besides, the proposed framework generalizes better to fonts with diverse styles and languages.

IJCAI Conference 2023 Conference Paper

Hierarchical Apprenticeship Learning for Disease Progression Modeling

  • Xi Yang
  • Ge Gao
  • Min Chi

Disease progression modeling (DPM) plays an essential role in characterizing patients' historical pathways and predicting their future risks. Apprenticeship learning (AL) aims to induce decision-making policies by observing and imitating expert behaviors. In this paper, we investigate the incorporation of AL-derived patterns into DPM, utilizing a Time-aware Hierarchical EM Energy-based Subsequence (THEMES) AL approach. To the best of our knowledge, this is the first study incorporating AL-derived progressive and interventional patterns for DPM. We evaluate the efficacy of this approach in a challenging task of septic shock early prediction, and our results demonstrate that integrating the AL-derived patterns significantly enhances the performance of DPM.

NeurIPS Conference 2023 Conference Paper

IPMix: Label-Preserving Data Augmentation Method for Training Robust Classifiers

  • Zhenglin Huang
  • Xiaoan Bao
  • Na Zhang
  • Qingqi Zhang
  • Xiao Tu
  • Biao Wu
  • Xi Yang

Data augmentation has been proven effective for training high-accuracy convolutional neural network classifiers by preventing overfitting. However, building deep neural networks in real-world scenarios requires not only high accuracy on clean data but also robustness when data distributions shift. While prior work has suggested that there is a trade-off between accuracy and robustness, we propose IPMix, a simple data augmentation approach that improves robustness without hurting clean accuracy. IPMix integrates three levels of data augmentation (image-level, patch-level, and pixel-level) into a coherent and label-preserving technique to increase the diversity of training data with limited computational overhead. To further improve robustness, IPMix introduces structural complexity at different levels to generate more diverse images and adopts a random mixing method for multi-scale information fusion. Experiments demonstrate that IPMix achieves state-of-the-art corruption robustness on CIFAR-C and ImageNet-C. In addition, we show that IPMix also significantly improves other safety measures, including robustness to adversarial perturbations, calibration, prediction consistency, and anomaly detection, achieving state-of-the-art or comparable results on several benchmarks, including ImageNet-R, ImageNet-A, and ImageNet-O.
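
The three augmentation levels can be sketched in a few lines of NumPy. This is an assumed composition for illustration; IPMix's actual operations, auxiliary images, and mixing schedule differ in detail.

```python
import numpy as np

def three_level_mix(image: np.ndarray, aux: np.ndarray,
                    rng: np.random.Generator = np.random.default_rng()) -> np.ndarray:
    """Label-preserving mix at pixel, patch, and image level.
    `aux` is a structurally complex auxiliary image, e.g., a fractal."""
    out = image.astype(np.float32).copy()
    # Pixel level: convex blend keeps the label dominated by `image`.
    out = 0.8 * out + 0.2 * aux.astype(np.float32)
    # Patch level: paste one small random patch from the auxiliary image.
    h, w = image.shape[:2]
    ph, pw = h // 4, w // 4
    y, x = rng.integers(0, h - ph), rng.integers(0, w - pw)
    out[y:y + ph, x:x + pw] = aux[y:y + ph, x:x + pw]
    # Image level: global convex combination back toward the clean image.
    lam = rng.uniform(0.6, 1.0)
    return (lam * image.astype(np.float32) + (1 - lam) * out).astype(image.dtype)

img = np.random.randint(0, 255, (32, 32, 3), dtype=np.uint8)
frac = np.random.randint(0, 255, (32, 32, 3), dtype=np.uint8)
mixed = three_level_mix(img, frac)  # same label as `img`
```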

AAAI Conference 2023 Conference Paper

MaskBooster: End-to-End Self-Training for Sparsely Supervised Instance Segmentation

  • Shida Zheng
  • Chenshu Chen
  • Xi Yang
  • Wenming Tan

This paper introduces sparsely supervised instance segmentation, in which datasets contain fully annotated bounding boxes but only sparsely annotated masks. A direct solution to this task is self-training, which has not yet been fully explored for instance segmentation. In this paper, we propose MaskBooster for sparsely supervised instance segmentation (SpSIS) with comprehensive usage of pseudo masks. MaskBooster features (1) dynamic and progressive pseudo masks from an online updating teacher model, (2) refinement of binary pseudo masks with the help of a bounding box prior, and (3) learning inter-class prediction distributions via knowledge distillation for soft pseudo masks. As an end-to-end and universal self-training framework, MaskBooster can empower fully supervised algorithms and boost their segmentation performance on SpSIS. Abundant experiments are conducted on the COCO and BDD100K datasets and validate the effectiveness of MaskBooster. Specifically, on different COCO protocols and BDD100K, we surpass the sparsely supervised baseline by a large margin for both Mask RCNN and ShapeProp. MaskBooster on SpSIS also outperforms weakly and semi-supervised instance segmentation state-of-the-art methods on datasets with similar annotation budgets.

JBHI Journal 2023 Journal Article

Mind the Gap: Alleviating Local Imbalance for Unsupervised Cross-Modality Medical Image Segmentation

  • Zixian Su
  • Kai Yao
  • Xi Yang
  • Qiufeng Wang
  • Yuyao Yan
  • Jie Sun
  • Kaizhu Huang

Unsupervised cross-modality medical image adaptation aims to alleviate the severe domain gap between different imaging modalities without using target domain labels. A key step in this campaign is aligning the distributions of the source and target domains. One common attempt is to enforce global alignment between the two domains, which, however, ignores the fatal local-imbalance domain gap problem, i.e., some local features with a larger domain gap are harder to transfer. Recently, some methods conduct alignment focusing on local regions to improve the efficiency of model learning. However, this operation may discard critical contextual information. To tackle this limitation, we propose a novel strategy to alleviate the domain gap imbalance considering the characteristics of medical images, namely Global-Local Union Alignment. Specifically, a feature-disentanglement style-transfer module first synthesizes target-like source images to reduce the global domain gap. Then, a local feature mask is integrated to reduce the 'inter-gap' for local features by prioritizing those discriminative features with a larger domain gap. This combination of global and local alignment can precisely localize the crucial regions in the segmentation target while preserving overall semantic consistency. We conduct a series of experiments on two cross-modality adaptation tasks, i.e., cardiac substructure and abdominal multi-organ segmentation. Experimental results indicate that our method achieves state-of-the-art performance in both tasks.

AAAI Conference 2023 Conference Paper

Rethinking Data Augmentation for Single-Source Domain Generalization in Medical Image Segmentation

  • Zixian Su
  • Kai Yao
  • Xi Yang
  • Kaizhu Huang
  • Qiufeng Wang
  • Jie Sun

Single-source domain generalization (SDG) in medical image segmentation is a challenging yet essential task, as domain shifts are quite common among clinical image datasets. Most previous attempts conduct global-only or random augmentation. Their augmented samples are usually insufficient in diversity and informativeness, thus failing to cover the possible target domain distribution. In this paper, we rethink the data augmentation strategy for SDG in medical image segmentation. Motivated by the class-level representation invariance and style mutability of medical images, we hypothesize that unseen target data can be sampled from a linear combination of C (the class number) random variables, where each variable follows a location-scale distribution at the class level. Accordingly, augmented data can readily be made by sampling the random variables through a general form. On the empirical front, we implement this strategy with constrained Bezier transformations on both global and local (i.e., class-level) regions, which can largely increase the augmentation diversity. A Saliency-balancing Fusion mechanism is further proposed to enrich the informativeness by engaging gradient information, guiding augmentation with proper orientation and magnitude. As an important contribution, we prove theoretically that our proposed augmentation leads to an upper bound on the generalization risk for the unseen target domain, confirming our hypothesis. Combining the two strategies, our Saliency-balancing Location-scale Augmentation (SLAug) exceeds the state-of-the-art works by a large margin in two challenging SDG tasks. Code is available at https://github.com/Kaiseem/SLAug.
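
The class-level location-scale idea is straightforward to sketch: each class region in the segmentation mask receives its own random affine intensity transform. The ranges below are assumptions; SLAug additionally uses constrained Bezier transformations and saliency-balanced fusion.

```python
import numpy as np

def location_scale_augment(image: np.ndarray, mask: np.ndarray,
                           rng: np.random.Generator = np.random.default_rng()) -> np.ndarray:
    """Class-level location-scale augmentation (hedged sketch).
    Each class region gets its own random intensity shift and scale:
    x' = scale * x + shift, sampled independently per class."""
    out = image.astype(np.float32).copy()
    for cls in np.unique(mask):
        region = mask == cls
        scale = rng.uniform(0.8, 1.2)
        shift = rng.uniform(-0.1, 0.1)
        out[region] = scale * out[region] + shift
    return np.clip(out, 0.0, 1.0)

img = np.random.rand(64, 64).astype(np.float32)   # normalized medical slice
seg = np.random.randint(0, 3, (64, 64))           # class-level mask
aug = location_scale_augment(img, seg)
```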

JBHI Journal 2022 Journal Article

A Novel 3D Unsupervised Domain Adaptation Framework for Cross-Modality Medical Image Segmentation

  • Kai Yao
  • Zixian Su
  • Kaizhu Huang
  • Xi Yang
  • Jie Sun
  • Amir Hussain
  • Frans Coenen

We consider the problem of volumetric (3D) unsupervised domain adaptation (UDA) in cross-modality medical image segmentation, aiming to perform segmentation on the unannotated target domain (e.g., MRI) with the help of a labeled source domain (e.g., CT). Previous UDA methods in medical image analysis usually suffer from two challenges: 1) they process and analyze data at the 2D level only, thus missing semantic information at the depth level; 2) one-to-one mapping is adopted during the style-transfer process, leading to insufficient alignment in the target domain. Different from existing methods, in our work we conduct a first-of-its-kind investigation of multi-style image translation for complete image alignment to alleviate the domain shift problem, and also introduce 3D segmentation into domain adaptation tasks to maintain semantic consistency at the depth level. In particular, we develop an unsupervised domain adaptation framework incorporating a novel quartet self-attention module to efficiently enhance relationships between widely separated features in spatial regions at a higher dimension, leading to a substantial improvement in segmentation accuracy in the unlabeled target domain. In two challenging cross-modality tasks, specifically brain structure and multi-organ abdominal segmentation, our model is shown to outperform current state-of-the-art methods by a significant margin, demonstrating its potential as a benchmark resource for the biomedical and health informatics research community.

IJCAI Conference 2022 Conference Paper

A Reinforcement Learning-Informed Pattern Mining Framework for Multivariate Time Series Classification

  • Ge Gao
  • Qitong Gao
  • Xi Yang
  • Miroslav Pajic
  • Min Chi

Multivariate time series (MTS) classification is a challenging and important task in various domains and real-world applications. Much of prior work on MTS can be roughly divided into neural network (NN)- and pattern-based methods. The former can lead to robust classification performance, but many of the generated patterns are challenging to interpret; while the latter often produce interpretable patterns that may not be helpful for the classification task. In this work, we propose a reinforcement learning (RL) informed PAttern Mining framework (RLPAM) to identify interpretable yet important patterns for MTS classification. Our framework has been validated by 30 benchmark datasets as well as real-world large-scale electronic health records (EHRs) for an extremely challenging task: sepsis shock early prediction. We show that RLPAM outperforms the state-of-the-art NN-based methods on 14 out of 30 datasets as well as on the EHRs. Finally, we show how RL informed patterns can be interpretable and can improve our understanding of septic shock progression.

IJCAI Conference 2022 Conference Paper

Fine-tuning Deep Neural Networks by Interactively Refining the 2D Latent Space of Ambiguous Images

  • Jiafu Wei
  • Haoran Xie
  • Chia-Ming Chang
  • Xi Yang

Deep neural networks (DNNs) currently achieve excellent classification results, yet they may still struggle with ambiguous images that are similar across classes. By contrast, humans are relatively good at distinguishing these categories of images. Therefore, we propose a human-in-the-loop solution that leverages human knowledge to help the network better classify such images. To achieve this, we project the high-dimensional latent space trained by the network onto a two-dimensional workspace. Users can interactively modify the projected coordinates of inputs on the workspace using our designed tools; the modified information is then fed back to fine-tune the network, which in turn affects its classification results and thereby improves classification accuracy.

AAAI Conference 2022 Conference Paper

Tracing Text Provenance via Context-Aware Lexical Substitution

  • Xi Yang
  • Jie Zhang
  • Kejiang Chen
  • Weiming Zhang
  • Zehua Ma
  • Feng Wang
  • Nenghai Yu

Text content created by humans or language models is often stolen or misused by adversaries. Tracing text provenance can help claim ownership of text content or identify the malicious users who distribute misleading content like machine-generated fake news. There have been some attempts to achieve this, mainly based on watermarking techniques. Specifically, traditional text watermarking methods embed watermarks by slightly altering text format, such as line spacing and font, but these are fragile to cross-media transmissions like OCR. Considering this, natural language watermarking methods represent watermarks by replacing words in original sentences with synonyms from handcrafted lexical resources (e.g., WordNet), but they do not consider the substitution's impact on the overall meaning of the sentence. Recently, a transformer-based network was proposed to embed watermarks by modifying unobtrusive words (e.g., function words), which also impairs the sentence's logical and semantic coherence. Besides, a network well-trained on one type of text content fails on other types. To address the limitations mentioned above, we propose a natural language watermarking scheme based on context-aware lexical substitution (LS). Specifically, we employ BERT to suggest LS candidates by inferring the semantic relatedness between the candidates and the original sentence. Based on this, a selection strategy in terms of synchronicity and substitutability is further designed to test whether a word is exactly suitable for carrying the watermark signal. Extensive experiments demonstrate that, under both objective and subjective metrics, our watermarking scheme can well preserve the semantic integrity of original sentences and has better transferability than existing methods. Besides, the proposed LS approach outperforms the state-of-the-art approach on the Stanford Word Substitution Benchmark.
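
The candidate-generation step can be reproduced with an off-the-shelf masked language model. A minimal sketch; the paper's scoring, synchronicity, and substitutability tests go beyond this.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

sentence = "The committee reached a unanimous decision."
target = "decision"
masked = sentence.replace(target, fill.tokenizer.mask_token, 1)

# BERT proposes context-aware substitutes for the masked word.
for cand in fill(masked, top_k=5):
    print(cand["token_str"], round(cand["score"], 3))
```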

IJCAI Conference 2021 Conference Paper

Multi-series Time-aware Sequence Partitioning for Disease Progression Modeling

  • Xi Yang
  • Yuan Zhang
  • Min Chi

Electronic healthcare records (EHRs) are comprehensive longitudinal collections of patient data that play a critical role in modeling disease progression to facilitate clinical decision-making. Based on EHRs, in this work we focus on sepsis, a broad syndrome that can develop from nearly all types of infections (e.g., influenza, pneumonia). The symptoms of sepsis, such as elevated heart rate, fever, and shortness of breath, are vague and common to other illnesses, making the modeling of its progression extremely challenging. Motivated by the recent success of a novel subsequence clustering approach, Toeplitz Inverse Covariance-based Clustering (TICC), we model sepsis progression as a subsequence partitioning problem and propose Multi-series Time-aware TICC (MT-TICC), which incorporates the multi-series nature and irregular time intervals of EHRs. The effectiveness of MT-TICC is first validated via a case study using a real-world hand gesture dataset with ground-truth labels. We then apply it to sepsis progression modeling using EHRs. The results suggest that MT-TICC can significantly outperform competitive baseline models, including TICC. More importantly, it unveils interpretable patterns, which sheds some light on better understanding of sepsis progression.

JBHI Journal 2021 Journal Article

Ophthalmic Disease Detection via Deep Learning With a Novel Mixture Loss Function

  • Xiong Luo
  • Jianyuan Li
  • Maojian Chen
  • Xi Yang
  • Xiangjun Li

With the popularization of computer-aided diagnosis (CAD) technologies, more and more deep learning methods are being developed to facilitate the detection of ophthalmic diseases. In this article, deep learning-based detection of some common eye diseases, including cataract, glaucoma, and age-related macular degeneration (AMD), is analyzed. Generally speaking, morphological changes in the retina reveal the presence of eye disease. However, existing deep learning methods may not achieve satisfactory performance on this analysis task, since fundus images usually suffer from data imbalance and outliers. It is therefore expected that detection performance could be further improved through effective and robust deep learning algorithms. Here, we propose a deep learning model combined with a novel mixture loss function to automatically detect eye diseases through the analysis of retinal fundus color images. Specifically, given the good generalization and robustness of the focal loss and correntropy-induced loss functions in addressing complex datasets with class imbalance and outliers, we present a mixture of these two losses in a deep neural network model to improve the recognition performance of the classifier on biomedical data. The proposed model is evaluated on a real-life ophthalmic dataset, and its performance is compared with baseline models using accuracy, sensitivity, specificity, Kappa, and area under the receiver operating characteristic curve (AUC) as the evaluation metrics. The experimental results verify the effectiveness and robustness of the proposed algorithm.
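
A mixture of focal and correntropy-induced losses can be sketched as a convex combination. The exact weighting and formulation in the paper may differ; gamma, sigma, and lam below are assumed values.

```python
import torch
import torch.nn.functional as F

def mixture_loss(logits, targets, gamma=2.0, sigma=1.0, lam=0.5):
    """Hedged sketch of a focal + correntropy-induced mixture loss.
    Focal loss down-weights easy examples (handles class imbalance);
    the correntropy term saturates for large errors (robust to outliers)."""
    probs = F.softmax(logits, dim=1)
    onehot = F.one_hot(targets, logits.size(1)).float()
    pt = (probs * onehot).sum(1).clamp_min(1e-7)          # prob of true class
    focal = -((1 - pt) ** gamma) * pt.log()
    err = (probs - onehot).pow(2).sum(1)                  # squared error per sample
    correntropy = 1 - torch.exp(-err / (2 * sigma ** 2))  # bounded loss
    return (lam * focal + (1 - lam) * correntropy).mean()

loss = mixture_loss(torch.randn(8, 4), torch.randint(0, 4, (8,)))
```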

AAAI Conference 2021 Conference Paper

Training Binary Neural Network without Batch Normalization for Image Super-Resolution

  • Xinrui Jiang
  • Nannan Wang
  • Jingwei Xin
  • Keyu Li
  • Xi Yang
  • Xinbo Gao

Recently, binary neural network (BNN)-based super-resolution (SR) methods have enjoyed initial success in the SR field. However, there is a noticeable performance gap between binarized models and their full-precision counterparts. Furthermore, batch normalization (BN) in binary SR networks introduces floating-point calculations, which is unfriendly to low-precision hardware. Therefore, there is still room for improvement in terms of model performance and efficiency. Focusing on this issue, in this paper we first explore a novel binary training mechanism based on the feature distribution, allowing us to replace all BN layers with a simple training method. Then, we construct a strong baseline by combining the highlights of recent binarization methods, which already surpasses the state of the art. Next, to train a highly accurate binarized SR model, we also develop a lightweight network architecture and a multi-stage knowledge distillation strategy to enhance the model's representation ability. Extensive experiments demonstrate that the proposed method not only offers lower computation than conventional floating-point networks but also outperforms state-of-the-art binary methods on standard SR networks.
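
A standard building block for such binarized SR networks is sign binarization with a straight-through estimator. The sketch below shows the generic mechanism, not the paper's feature-distribution-based training scheme or distillation strategy.

```python
import torch
import torch.nn as nn

class Binarize(torch.autograd.Function):
    """Sign binarization with a straight-through estimator (STE):
    forward uses sign(x); backward passes gradients through for |x| <= 1."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()  # clipped identity gradient

class BinaryConv2d(nn.Conv2d):
    def forward(self, x):
        wb = Binarize.apply(self.weight)  # binarized weights
        xb = Binarize.apply(x)            # binarized activations
        return nn.functional.conv2d(xb, wb, self.bias, self.stride,
                                    self.padding, self.dilation, self.groups)

layer = BinaryConv2d(3, 16, 3, padding=1, bias=False)
out = layer(torch.randn(1, 3, 32, 32))
```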

IJCAI Conference 2019 Conference Paper

ATTAIN: Attention-based Time-Aware LSTM Networks for Disease Progression Modeling

  • Yuan Zhang
  • Xi Yang
  • Julie Ivy
  • Min Chi

Modeling patient disease progression using Electronic Health Records (EHRs) is critical to assist clinical decision making. Long Short-Term Memory (LSTM) is an effective model for handling sequential data such as EHRs, but it encounters two major limitations when applied to EHRs: it is unable to interpret the prediction results, and it ignores the irregular time intervals between consecutive events. To tackle these limitations, we propose ATTAIN, an attention-based time-aware LSTM network, to improve the interpretability of LSTM and to identify the critical previous events for the current diagnosis by modeling the inherent time irregularity. We validate ATTAIN on modeling the progression of an extremely challenging disease, septic shock, using real-world EHRs. Our results demonstrate that the proposed framework outperforms state-of-the-art models such as RETAIN and T-LSTM. Also, the generated interpretable time-aware attention weights shed some light on the progression behaviors of septic shock.
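
The time-irregularity idea in ATTAIN-style models can be sketched as a monotone decay applied to each past event's hidden state before attention. The decay function below is an assumption chosen so that a zero gap leaves the state unchanged.

```python
import torch

def time_decayed_hidden(h: torch.Tensor, delta_t: torch.Tensor) -> torch.Tensor:
    """Discount memory carried across irregular gaps so events far in the
    past contribute less; g is monotonically decreasing with g(0) = 1."""
    g = 1.0 / torch.log(torch.e + delta_t)
    return h * g.unsqueeze(-1)

h = torch.randn(4, 32)                       # hidden states of 4 past events
gaps = torch.tensor([0.0, 2.0, 12.0, 48.0])  # hours since each event
decayed = time_decayed_hidden(h, gaps)       # feed into attention over events
```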

IS Journal 2014 Journal Article

Computational Cognitive Models for Brain-Machine Collaborations

  • Zhongzhi Shi
  • Jianhua Zhang
  • Xi Yang
  • Gang Ma
  • Baoyuan Qi
  • Jinpeng Yue

Cyborg intelligence will integrate the best of both machine and biological intelligence via brain-machine integration. To make this integration effective and co-adaptive, multiple agents should work collaboratively. Here, three levels of computational cognitive models for brain-machine collaboration are presented: awareness-based, motivation-based, and joint-intention-based collaboration. Each collaboration level has its own principle and method.