Arrow Research search

Author name cluster

Chen Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

54 papers
2 author rows

Possible papers

54

AAAI Conference 2026 Conference Paper

CIA: Cluster-Instance Alignment for Unsupervised Day-Night Vehicle Re-Identification

  • Yongguo Ling
  • Chen Zhang
  • Yiming Liu
  • Wenhao Shao

Cross-time vehicle re-identification (Re-ID), especially across day and night conditions, remains a challenging problem due to drastic illumination variations that lead to significant domain shifts. While existing methods perform well under daytime scenarios, their effectiveness degrades severely in cross-domain settings, and fully supervised solutions demand costly annotations in both domains. In this paper, we introduce a new setting, Unsupervised Day-Night Vehicle Re-Identification (USL-DN-ReID), and propose a novel Cluster-Instance Alignment (CIA) framework to address it. CIA performs dual-level alignment: 1) at the cluster level, a Dictionary-Guided Graph Matching (DGM) module builds a cross-domain topological graph using soft similarities among cluster centers and solves global matching via the Hungarian algorithm; 2) at the instance level, a Multi-Factor Adaptive Alignment (MAA) module introduces a multi-factor adaptive weighting strategy that emphasizes high-confidence pairwise relations while suppressing noise. Together, these components enable robust and scalable cross-domain adaptation without requiring target-domain labels. Extensive experiments conducted on the DN-348 and DN-Wild benchmarks demonstrate the effectiveness and superiority of the proposed CIA framework, setting new state-of-the-art results on both datasets.
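The cluster-level matching step described above can be illustrated with a small sketch (not the authors' code: cosine similarity stands in for the paper's soft similarities, and the function and variable names are placeholders):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_cluster_centers(day_centers, night_centers):
    """Globally match day-domain to night-domain cluster centers,
    as in the DGM module's Hungarian-matching step (illustrative only)."""
    # Cosine similarity between L2-normalized cluster centers.
    d = day_centers / np.linalg.norm(day_centers, axis=1, keepdims=True)
    n = night_centers / np.linalg.norm(night_centers, axis=1, keepdims=True)
    sim = d @ n.T
    # linear_sum_assignment minimizes cost, so negate to maximize similarity.
    day_idx, night_idx = linear_sum_assignment(-sim)
    return dict(zip(day_idx.tolist(), night_idx.tolist()))
```

The Hungarian solver guarantees a globally optimal one-to-one assignment, which is why it suits cross-domain cluster alignment better than greedy nearest-center matching.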

JBHI Journal 2026 Journal Article

Edge Extension for Missing Anatomical Features: A Mask-Guided Spatial Diffusion Framework for Ultrasound Scoliosis Image Outpainting

  • Chen Zhang
  • Wei Guo
  • De Yang
  • Weidong Cai
  • Yongping Zheng
  • Sai Ho Ling

Accurate scoliosis diagnosis relies on precise spinal curvature measurement, traditionally using radiographic Cobb's angle. Ultrasound imaging offers a radiation-free alternative via ultrasound curve angle (UCA) estimation, but its clinical utility is limited by incomplete anatomical information due to the restricted field of view (FOV) during scanning. This hinders key tasks like segmentation and landmark detection, restricting ultrasound's broader adoption in scoliosis assessment. To address this challenge, we propose an edge-aware outpainting diffusion framework that restores missing spinal anatomy by integrating mask-guided spatial diffusion. Specifically, the model is trained to predict noise between randomly selected target windows and anchor regions using spatially encoded masks. During inference, a dedicated edge-preservation mechanism guides the generation of anatomical structures. In addition to mitigating hallucinations in diffusion-based generation and ensuring perceptual consistency between generated and retained regions, we incorporate a total variation loss to enforce structural smoothness and coherence across the entire output. This approach effectively constrains reconstruction within masked regions, improving the recovery of anatomical features—particularly in cases of complex S-shaped spinal deformities commonly seen in scoliosis ultrasound imaging, where the field of view is inherently limited. Extensive experiments demonstrate our approach achieves the lowest Fréchet Inception Distance (180.97) and highest Inception Score (1.87 $\pm$ 0.12), while improving thoracic and lumbar UCA estimation accuracy by 47.8% and 24.6%, respectively. Thoracic structure detection also increases by 8.1% compared to Swin-Unet. Low KL divergence and Wasserstein distance confirm strong distributional alignment between generated and real anatomy. Overall, our framework enables anatomically consistent outpainting under limited FOV, enhancing ultrasound's reliability for clinical scoliosis assessment.
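The total variation term mentioned in the abstract has a standard form; here is a minimal NumPy sketch of the anisotropic variant (an assumption — the abstract does not specify which variant the paper uses):

```python
import numpy as np

def tv_loss(img):
    # Anisotropic total variation: sum of absolute differences between
    # vertically and horizontally adjacent pixels. Penalizing this value
    # encourages piecewise-smooth, coherent reconstructions.
    dv = np.abs(np.diff(img, axis=0)).sum()
    dh = np.abs(np.diff(img, axis=1)).sum()
    return float(dv + dh)
```

A constant image scores zero, so the loss only penalizes local intensity jumps, which is why it helps suppress high-frequency hallucinations in generated regions.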

AAAI Conference 2026 Conference Paper

Empowering DINO Representations for Underwater Instance Segmentation via Aligner and Prompter

  • Zhiyang Chen
  • Chen Zhang
  • Hao Fang
  • Runmin Cong

Underwater Instance Segmentation (UIS), integrating pixel-level understanding and instance-level discrimination, is a pivotal technology in marine resource exploration and ecological protection. In recent years, large-scale pretrained visual foundation models, exemplified by DINO, have advanced rapidly and demonstrated remarkable performance on complex downstream tasks. In this paper, we demonstrate that DINO can serve as an effective feature learner for UIS, and we introduce DiveSeg, a novel framework built upon two insightful components: (1) The AquaStyle Aligner, designed to embed underwater color style features into the DINO fine-tuning process, facilitating better adaptation to the underwater domain. (2) The ObjectPrior Prompter, which incorporates binary segmentation-based prompts to deliver object-level priors, providing essential guidance for the instance segmentation task, which requires both object- and instance-level reasoning. We conduct thorough experiments on the popular UIIS and USIS10K datasets, and the results show that DiveSeg achieves state-of-the-art performance.

AAAI Conference 2026 Conference Paper

Function-on-Function Bayesian Optimization

  • Jingru Huang
  • Haijie Xu
  • Manrui Jiang
  • Chen Zhang

Bayesian optimization (BO) has been widely used to optimize expensive and gradient-free objective functions across various domains. However, existing BO methods have not addressed objectives where both inputs and outputs are functions, which increasingly arise in complex systems equipped with advanced sensing technologies. To fill this gap, we propose a novel function-on-function Bayesian optimization (FFBO) framework. Specifically, we first introduce a function-on-function Gaussian process (FFGP) model with a separable operator-valued kernel to capture the correlations between function-valued inputs and outputs. Compared to existing Gaussian process models, FFGP is modeled directly in the function space. Based on FFGP, we define a scalar upper confidence bound (UCB) acquisition function using a weighted operator-based scalarization strategy. Then, a scalable functional gradient ascent algorithm (FGA) is developed to efficiently identify the optimal function-valued input. We further analyze the theoretical properties of the proposed method. Extensive experiments on synthetic and real-world data demonstrate the superior performance of FFBO over existing approaches.
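The scalar UCB acquisition mentioned above builds on the standard GP-UCB rule. A toy one-dimensional sketch under simplifying assumptions (scalar inputs, an RBF kernel, and grid search in place of the paper's functional gradient ascent; all names here are illustrative):

```python
import numpy as np

def rbf(a, b, ls=0.3):
    # Squared-exponential kernel between two 1-D point sets.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    # Standard GP regression posterior mean and standard deviation.
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    Kss = rbf(x_test, x_test)
    mu = Ks.T @ np.linalg.solve(K, y_train)
    var = np.clip(np.diag(Kss - Ks.T @ np.linalg.solve(K, Ks)), 0, None)
    return mu, np.sqrt(var)

def next_query(x_train, y_train, grid, beta=2.0):
    # UCB acquisition: posterior mean plus an exploration bonus,
    # maximized over a candidate grid.
    mu, sd = gp_posterior(x_train, y_train, grid)
    return grid[np.argmax(mu + beta * sd)]
```

The `beta` weight trades off exploitation (high mean) against exploration (high uncertainty); FFBO's contribution is defining an analogous scalar bound when inputs and outputs are whole functions rather than vectors.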

AAAI Conference 2026 Conference Paper

NeuroBridge: Bio-Inspired Self-Supervised EEG-to-Image Decoding via Cognitive Priors and Bidirectional Semantic Alignment

  • Wenjiang Zhang
  • Sifeng Wang
  • Yuwei Su
  • Xinyu Li
  • Chen Zhang
  • Suyu Zhong

Visual neural decoding seeks to reconstruct or infer perceived visual stimuli from brain activity patterns, providing critical insights into human cognition and enabling transformative applications in brain-computer interfaces and artificial intelligence. Current approaches, however, remain constrained by the scarcity of high-quality stimulus-brain response pairs and the inherent semantic mismatch between neural representations and visual content. Inspired by perceptual variability and co-adaptive strategy of the biological systems, we propose a novel self-supervised architecture, named NeuroBridge, which integrates Cognitive Prior Augmentation (CPA) with Shared Semantic Projector (SSP) to promote effective cross-modality alignment. Specifically, CPA simulates perceptual variability by applying asymmetric, modality-specific transformations to both EEG signals and images, enhancing semantic diversity. Unlike previous approaches, SSP establishes a bidirectional alignment process through a co-adaptive strategy, which mutually aligns features from two modalities into a shared semantic space for effective cross-modal learning. NeuroBridge surpasses previous state-of-the-art methods under both intra-subject and inter-subject settings. In the intra-subject scenario, it achieves the improvements of 12.3% in top-1 accuracy and 10.2% in top-5 accuracy, reaching 63.2% and 89.9% respectively on a 200-way zero-shot retrieval task. Extensive experiments demonstrate the effectiveness, robustness, and scalability of the proposed framework for neural visual decoding.

AAAI Conference 2026 Conference Paper

Scaling Law for Large Wireless Models

  • Ziheng Liu
  • Jiayi Zhang
  • Haoyu Wang
  • Bokai Xu
  • Chen Zhang
  • Yiyang Zhu
  • Enyu Shi

Emerging from recent advances in foundation models, Large Wireless Models (LWMs) represent a new paradigm of general-purpose intelligence for wireless communications that transcends task-specific engineering. The success of foundation models is critically underpinned by scaling laws, which provide a predictable roadmap for how performance scales with resources. However, established scaling laws from language and vision, charting performance as a power-law of model and dataset sizes, are ill-suited for the wireless domain, as their core formulations cannot model the structured nature of the physical channel. To address this, we propose a novel wireless scaling law that extends the classical formulation by modeling two wireless-native factors: channel heterogeneity and discretization granularity. These two factors reshape scaling behavior via nested linear and power-law relationships, recasting the scaling law's parameters (notably the scaling exponent and irreducible loss) from universal constants into dynamic variables dictated by the physical environment. Our physics-aware formulation reveals two key insights: first, that compute-optimal scaling is not dictated by a fixed model-data ratio but is instead a dynamic function of heterogeneity and granularity, and second, that this dependency is particularly sensitive to granularity, allowing significant performance to be unlocked from existing data simply by refining its resolution. Crucially, this establishes a reliable roadmap for designing powerful yet resource-efficient LWMs, translating theoretical insights into actionable engineering principles. Extensive experiments validate our wireless scaling law, showing a 32.31% prediction accuracy improvement over classical laws in diverse wireless scenarios where they fail.
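For context, the "classical formulation" that this paper extends is the Chinchilla-style parametric loss. A sketch using the constants from Hoffmann et al.'s published fit (the wireless-native extension with heterogeneity and granularity is not reproduced here):

```python
def classical_scaling_loss(N, D, E=1.69, A=406.4, B=410.7,
                           alpha=0.34, beta=0.28):
    # L(N, D) = E + A / N^alpha + B / D^beta: an irreducible loss E plus
    # power-law penalties for finite parameter count N and token count D.
    # Default constants are the Chinchilla fit for language models; the
    # surveyed paper argues these become dynamic in the wireless domain.
    return E + A / N**alpha + B / D**beta
```

In the classical law the exponents and irreducible loss are universal constants; the paper's claim is that for wireless data they vary with channel heterogeneity and discretization granularity.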

TMLR Journal 2026 Journal Article

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

  • Guibin Zhang
  • Hejia Geng
  • Xiaohang Yu
  • Zhenfei Yin
  • Zaibin Zhang
  • Zelin Tan
  • Heng Zhou
  • Zhong-Zhi Li

The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL), reframing LLMs from passive sequence generators into autonomous, decision-making agents embedded in complex, dynamic worlds. This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Processes (MDPs) of LLM RL with the temporally extended Partially Observable Markov Decision Processes (POMDPs) that define Agentic RL. Building on this foundation, we propose a comprehensive twofold taxonomy: one organized around core agentic capabilities, including planning, tool use, memory, reasoning, self-improvement, and perception, and the other around their applications across diverse task domains. Central to our thesis is that reinforcement learning serves as the critical mechanism for transforming these capabilities from static, heuristic modules into adaptive, robust agentic behavior. To support and accelerate future research, we consolidate the landscape of open-source environments, benchmarks, and frameworks into a practical compendium. By synthesizing over five hundred recent works, this survey charts the contours of this rapidly evolving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose AI agents.

AAAI Conference 2025 Conference Paper

Achieving Lightweight Super-Resolution for Real-Time Computer Graphics

  • Yu Wen
  • Chen Zhang
  • Chenhao Xie
  • Xin Fu

Image super-resolution (SR) is essential for bridging the gap between modern hardware and real-time computer graphics (CG) applications. It reduces CG workload by allowing low-resolution rendering, with original quality restored later via mathematical operations or machine learning. However, recent learning-based SR methods often rely on complex models, demanding high computational resources and undermining the benefits of reduced rendering workload. Our qualitative and quantitative analysis of the SR process and rendering reveals that readily accessible rendering information can significantly enhance neural network design by serving as additional features. To capitalize on this, we propose CGSR, an optimization framework designed for lightweight real-time super-resolution. CGSR utilizes rendering information to boost both network extensibility and efficiency. It utilizes progressively available rendering information from the pipeline, which arrives earlier than the rendered frame, enabling pre-processing and masking of latency. These features are then integrated into a selected SR network backbone to form a CG-enhanced network. This network is further optimized and refined into a CG-optimized version using neural architecture search (NAS). To improve runtime performance, CGSR also employs rendering-aware hybrid pruning, which dynamically prunes the network based on temporal rendering data. Evaluation results show that CGSR significantly reduces parameter size, multi-add operations, and inference time while maintaining high SR quality across various backbone SR networks.

AAAI Conference 2025 Conference Paper

Aligning Language Models Using Follow-up Likelihood as Reward Signal

  • Chen Zhang
  • Dading Chong
  • Feng Jiang
  • Chengguang Tang
  • Anningzhe Gao
  • Guohua Tang
  • Haizhou Li

In natural human-to-human conversations, participants often receive feedback signals from one another based on their follow-up reactions. These reactions can include verbal responses, facial expressions, changes in emotional state, and other non-verbal cues. Similarly, in human-machine interactions, the machine can leverage the user's follow-up utterances as feedback signals to assess whether it has appropriately addressed the user's request. Therefore, we propose using the likelihood of follow-up utterances as rewards to differentiate preferred responses from less favored ones, without relying on human or commercial LLM-based preference annotations. Our proposed reward mechanism, "Follow-up Likelihood as Reward" (FLR), matches the performance of strong reward models trained on large-scale human or GPT-4 annotated data on 8 pairwise-preference and 4 rating-based benchmarks. Building upon the FLR mechanism, we propose to automatically mine preference data from the online generations of a base policy model. The preference data are subsequently used to boost the helpfulness of the base model through direct alignment from preference (DAP) methods, such as direct preference optimization (DPO). Lastly, we demonstrate that fine-tuning the language model that provides follow-up likelihood with natural language feedback significantly enhances FLR's performance on reward modeling benchmarks and effectiveness in aligning the base policy model's helpfulness.
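The core reward computation can be sketched generically. Here `logprob_fn` is a stand-in for a real language model's next-token log-probability, and the token lists are placeholders, not the paper's implementation:

```python
def flr_reward(context_tokens, followup_tokens, logprob_fn):
    # Score a candidate response by the mean log-likelihood a language
    # model assigns to a positive follow-up utterance, token by token,
    # conditioned on the dialogue context plus that response.
    ctx = list(context_tokens)
    total = 0.0
    for tok in followup_tokens:
        total += logprob_fn(ctx, tok)
        ctx.append(tok)
    return total / len(followup_tokens)
```

With a fixed positive follow-up (e.g. an expression of user satisfaction), a higher score for response A than for response B yields a preference pair without any human annotation.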

NeurIPS Conference 2025 Conference Paper

ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive

  • Xinhao Luo
  • Zihan Liu
  • Yangjie Zhou
  • Shihan Fang
  • Ziyu Huang
  • Yu Feng
  • Chen Zhang
  • Shixuan Sun

Large language model (LLM) decoding suffers from high latency due to fragmented execution across operators and heavy reliance on off-chip memory for data exchange and reduction. This execution model limits opportunities for fusion and incurs significant memory traffic and kernel launch overhead. While modern architectures such as NVIDIA Hopper provide distributed shared memory and low-latency intra-cluster interconnects, they expose only low-level data movement instructions, lacking structured abstractions for collective on-chip communication. To bridge this software-hardware gap, we introduce two cluster-level communication primitives, ClusterReduce and ClusterGather, which abstract common communication patterns and enable structured, high-speed data exchange and reduction between thread blocks within a cluster, allowing intermediate results to remain on-chip without involving off-chip memory. Building on these abstractions, we design ClusterFusion, an execution framework that schedules communication and computation jointly to expand operator fusion scope by composing decoding stages such as QKV Projection, Attention, and Output Projection into a single fused kernel. Evaluations on H100 GPUs show that ClusterFusion outperforms state-of-the-art inference frameworks by $1.61\times$ on average in end-to-end latency across different models and configurations.

NeurIPS Conference 2025 Conference Paper

Cypher-RI: Reinforcement Learning for Integrating Schema Selection into Cypher Generation

  • Hanchen Su
  • Xuyuan Li
  • Yan Zhou
  • zhuoyi lu
  • Ziwei Chai
  • Haozheng Wang
  • Chen Zhang
  • Yang Yang

The increasing utilization of graph databases across various fields stems from their capacity to represent intricate interconnections. Nonetheless, exploiting the full capabilities of graph databases continues to be a significant hurdle, largely because of the inherent difficulty in translating natural language into Cypher. Recognizing the critical role of schema selection in database query generation and drawing inspiration from recent progress in reasoning-augmented approaches trained through reinforcement learning to enhance inference capabilities and generalization, we introduce Cypher-RI, a specialized framework for the Text-to-Cypher task. Distinct from conventional approaches, our methodology seamlessly integrates schema selection within the Cypher generation pipeline, conceptualizing it as a critical element in the reasoning process. The schema selection mechanism is guided by textual context, with its outcomes recursively shaping subsequent inference processes. Impressively, our 7B-parameter model, trained through this RL paradigm, demonstrates superior performance compared to baselines, exhibiting a 9.41\% accuracy improvement over GPT-4o on CypherBench. These results underscore the effectiveness of our proposed reinforcement learning framework, which integrates schema selection to enhance both the accuracy and reasoning capabilities in Text-to-Cypher tasks.

AAAI Conference 2025 Conference Paper

Debate on Graph: A Flexible and Reliable Reasoning Framework for Large Language Models

  • Jie Ma
  • Zhitao Gao
  • Qi Chai
  • Wangchun Sun
  • Pinghui Wang
  • Hongbin Pei
  • Jing Tao
  • Lingyun Song

Large Language Models (LLMs) may suffer from hallucinations in real-world applications due to the lack of relevant knowledge. In contrast, knowledge graphs encompass extensive, multi-relational structures that store a vast array of symbolic facts. Consequently, integrating LLMs with knowledge graphs has been extensively explored, with Knowledge Graph Question Answering (KGQA) serving as a critical touchstone for the integration. This task requires LLMs to answer natural language questions by retrieving relevant triples from knowledge graphs. However, existing methods face two significant challenges: *excessively long reasoning paths distracting from the answer generation*, and *false-positive relations hindering the path refinement*. In this paper, we propose an iterative interactive KGQA framework that leverages the interactive learning capabilities of LLMs to perform reasoning and Debating over Graphs (DoG). Specifically, DoG employs a subgraph-focusing mechanism, allowing LLMs to perform answer trying after each reasoning step, thereby mitigating the impact of lengthy reasoning paths. On the other hand, DoG utilizes a multi-role debate team to gradually simplify complex questions, reducing the influence of false-positive relations. This debate mechanism ensures the reliability of the reasoning process. Experimental results on five public datasets demonstrate the effectiveness and superiority of our architecture. Notably, DoG outperforms the state-of-the-art method ToG by 23.7% and 9.1% in accuracy on WebQuestions and GrailQA, respectively. Furthermore, the integration experiments with various LLMs on the mentioned datasets highlight the flexibility of DoG.

AAAI Conference 2025 Conference Paper

Exploring Intrinsic Alignments Within Text Corpus

  • Zi Liang
  • Pinghui Wang
  • Ruofei Zhang
  • Haibo Hu
  • Shuo Zhang
  • Qingqing Ye
  • Nuo Xu
  • Yaxin Xiao

Recent years have witnessed rapid advancements in the safety alignments of large language models (LLMs). Methods such as supervised instruction fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) have thus emerged as vital components in constructing LLMs. While these methods achieve robust and fine-grained alignment to human values, their practical application is still hindered by high annotation costs and incomplete human alignments. Besides, the intrinsic human values within training corpora have not been fully exploited. To address these issues, we propose ISAAC (Intrinsically Supervised Alignments by Assessing Corpus), a primary and coarse-grained safety alignment strategy for LLMs. ISAAC only relies on a prior assumption about the text corpus, and does not require the preference annotations of RLHF or the human response selection of SFT. Specifically, it assumes a long-tail distribution of text corpus and employs a specialized sampling strategy to automatically sample high-quality responses. Theoretically, we prove that this strategy can improve the safety of LLMs under our assumptions. Empirically, our evaluations on mainstream LLMs show that ISAAC achieves a safety score comparable to current SFT solutions. Moreover, we conduct experiments on ISAAC for some RLHF-based LLMs, where we find that ISAAC can even improve the safety of these models under specific safety domains. These findings demonstrate that ISAAC can provide preliminary alignment to LLMs, thereby reducing the construction costs of existing human-feedback-based methods.

NeurIPS Conference 2025 Conference Paper

FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression

  • Yifei Gao
  • Yong Chen
  • Chen Zhang

Functional data play a pivotal role across science and engineering, yet their infinite-dimensional nature makes representation learning challenging. Conventional statistical models depend on pre-chosen basis expansions or kernels, limiting the flexibility of data-driven discovery, while many deep-learning pipelines treat functions as fixed-grid vectors, ignoring inherent continuity. In this paper, we introduce Functional Attention with a Mixture-of-Experts (FAME), an end-to-end, fully data-driven framework for function-on-function regression. FAME forms continuous attention by coupling a bidirectional neural controlled differential equation with MoE-driven vector fields to capture intra-functional continuity, and further captures inter-functional dependencies via multi-head cross-attention. Extensive experiments on synthetic and real-world functional regression benchmarks show that FAME achieves state-of-the-art accuracy and strong robustness to arbitrarily sampled discrete observations of functions.

IJCAI Conference 2025 Conference Paper

MutationGuard: A Graph and Temporal-Spatial Neural Method for Detecting Mutation Telecommunication Fraud

  • Haitao Bai
  • Pinghui Wang
  • Ruofei Zhang
  • Ziyang Zhou
  • Juxiang Zeng
  • Yulou Su
  • Li Xing
  • Zhou Su

Telecommunication fraud refers to deceptive activities in the field of communication services. This research focuses on a category of fraud identified as "mutation telecommunication fraud". There is currently a lack of research on mutation telecommunication fraud detection, allowing this type of fraud to persist uncaught. We identify that detecting mutation fraud requires capturing multi-source patterns, including user communication graphs and temporal-spatial Voice of Call (VOC) features. Specifically, we introduce MutationGuard, which leverages Graph Neural Networks (GNN) to capture changes in user communication graphs. For VOC records, we map call start times onto a 3D cylindrical surface, thereby representing each VOC record in spatial coordinates and utilizing proposed LFFE and TCFE modules to capture local fraud behaviors and temporal behavior changes. The proposed neural modeling approach that facilitates multi-source information fusion constitutes a significant advancement in detecting mutation fraud. Experimental results reveal a significant improvement in the AUC score by 1.52% and the F1 score by 1.36% on the proposed telecommunication fraud dataset. Particularly, our method shows a significant improvement of 13.93% in accuracy on mutation fraud data. We also validate the effectiveness of our method on the publicly available Sichuan Telecommunication Fraud dataset.

ICML Conference 2025 Conference Paper

Nonparametric Teaching for Graph Property Learners

  • Chen Zhang
  • Weixin Bu
  • Zeyi Ren
  • Zhengwu Liu
  • Yik-Chung Wu
  • Ngai Wong 0001

Inferring properties of graph-structured data, e.g., the solubility of molecules, essentially involves learning the implicit mapping from graphs to their properties. This learning process is often costly for graph property learners like Graph Convolutional Networks (GCNs). To address this, we propose a paradigm called Graph Nonparametric Teaching (GraNT) that reinterprets the learning process through a novel nonparametric teaching perspective. Specifically, the latter offers a theoretical framework for teaching implicitly defined (i.e., nonparametric) mappings via example selection. Such an implicit mapping is realized by a dense set of graph-property pairs, with the GraNT teacher selecting a subset of them to promote faster convergence in GCN training. By analytically examining the impact of graph structure on parameter-based gradient descent during training, and recasting the evolution of GCNs—shaped by parameter updates—through functional gradient descent in nonparametric teaching, we show for the first time that teaching graph property learners (i.e., GCNs) is consistent with teaching structure-aware nonparametric learners. These new findings readily commit GraNT to enhancing learning efficiency of the graph property learner, showing significant reductions in training time for graph-level regression (-36.62%), graph-level classification (-38.19%), node-level regression (-30.97%) and node-level classification (-47.30%), all while maintaining its generalization performance.

NeurIPS Conference 2025 Conference Paper

TraffiDent: A Dataset for Understanding the Interplay Between Traffic Dynamics and Incidents

  • Xiaochuan Gou
  • Ziyue Li
  • Tian Lan
  • Junpeng Lin
  • Zhishuai Li
  • Bingyu Zhao
  • Chen Zhang
  • Di Wang

Long-separated research has been conducted on two highly correlated tracks: traffic and incidents. The traffic track has seen increasingly complicated deep learning models, e.g., to push the prediction a few percent more accurate, while the incident track studies the incidents alone, e.g., to infer the incident risk. We, for the first time, spatiotemporally aligned the two tracks in a large-scale region (16,972 traffic nodes) from 2022 to 2024: our TraffiDent dataset includes traffic, i.e., time-series indexes on traffic flow, lane occupancy, and average vehicle speed, and incidents, whose records are spatiotemporally aligned with the traffic data, with seven different incident classes. Additionally, each node includes detailed physical and policy-level meta-attributes of lanes. Previous datasets typically contain only traffic or incident data in isolation, limiting research to general forecasting tasks. TraffiDent integrates both, enabling detailed analysis of traffic-incident interactions and causal relationships. To demonstrate its broad applicability, we design: (1) post-incident traffic forecasting to quantify the impact of different incidents on traffic indexes; (2) incident classification using traffic indexes to determine incident types for precautionary measures; (3) global causal analysis among the traffic indexes, meta-attributes, and incidents to give high-level guidance on the interrelations of various factors; (4) local causal analysis within road nodes to examine how different incidents affect the road segments' relations. The dataset is available at https://xaitraffic.github.io.

AAAI Conference 2024 Conference Paper

A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators

  • Chen Zhang
  • Luis Fernando D'Haro
  • Yiming Chen
  • Malu Zhang
  • Haizhou Li

Automatic evaluation is an integral aspect of dialogue system research. The traditional reference-based NLG metrics are generally found to be unsuitable for dialogue assessment. Consequently, recent studies have suggested various unique, reference-free neural metrics that better align with human evaluations. Notably among them, large language models (LLMs), particularly the instruction-tuned variants like ChatGPT, are shown to be promising substitutes for human judges. Yet, existing works on utilizing LLMs for automatic dialogue evaluation are limited in their scope in terms of the number of meta-evaluation datasets, mode of evaluation, coverage of LLMs, etc. Hence, it remains inconclusive how effective these LLMs are. To this end, we conduct a comprehensive study on the application of LLMs for automatic dialogue evaluation. Specifically, we analyze the multi-dimensional evaluation capability of 30 recently emerged LLMs at both turn and dialogue levels, using a comprehensive set of 12 meta-evaluation datasets. Additionally, we probe the robustness of the LLMs in handling various adversarial perturbations at both turn and dialogue levels. Finally, we explore how model-level and dimension-level ensembles impact the evaluation performance. All resources are available at https://github.com/e0397123/comp-analysis.

ICML Conference 2024 Conference Paper

Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment

  • Chen Zhang
  • Qiang He
  • Yuan Zhou
  • Elvis S. Liu
  • Hong Wang
  • Jian Zhao 0010
  • Yang Wang

Deep Reinforcement Learning (DRL) agents have demonstrated impressive success in a wide range of game genres. However, existing research primarily focuses on optimizing DRL competence rather than addressing the challenge of prolonged player interaction. In this paper, we propose a practical DRL agent system for fighting games named Shūkai, which has been successfully deployed to Naruto Mobile, a popular fighting game with over 100 million registered users. Shūkai quantifies the state to enhance generalizability, introducing Heterogeneous League Training (HELT) to achieve balanced competence, generalizability, and training efficiency. Furthermore, Shūkai implements specific rewards to align the agent's behavior with human expectations. Shūkai's ability to generalize is demonstrated by its consistent competence across all characters, even though it was trained on only 13% of them. Additionally, HELT exhibits a remarkable 22% improvement in sample efficiency. Shūkai serves as a valuable training partner for players in Naruto Mobile, enabling them to enhance their abilities and skills.

ICLR Conference 2024 Conference Paper

Deep Reinforcement Learning for Modelling Protein Complexes

  • Ziqi Gao
  • Tao Feng
  • Jiaxuan You
  • Chenyi Zi
  • Yan Zhou
  • Chen Zhang
  • Jia Li 0009

Structure prediction of large protein complexes (a.k.a., protein multimer modelling, PMM) can be achieved through the one-by-one assembly using provided dimer structures and predicted docking paths. However, existing PMM methods struggle with vast search spaces and generalization challenges: (1) The assembly of an N-chain multimer can be depicted using graph-structured data, with each chain represented as a node and assembly actions as edges. Thus the assembly graph can be an arbitrary acyclic undirected connected graph, leading to a combinatorial optimization space of N^(N-2) for the PMM problem. (2) Knowledge transfer in the PMM task is non-trivial. The gradually limited data availability as the chain number increases necessitates PMM models that can generalize across multimers of various chains. To address these challenges, we propose GAPN, a Generative Adversarial Policy Network powered by domain-specific rewards and adversarial loss through policy gradient for automatic PMM prediction. Specifically, GAPN learns to efficiently search through the immense assembly space and optimize the direct docking reward through policy gradient. Importantly, we design an adversarial reward function to enhance the receptive field of our model. In this way, GAPN will simultaneously focus on a specific batch of multimers and the global assembly rules learned from multimers with varying chain numbers. Empirically, we have achieved both significant accuracy (measured by RMSD and TM-Score) and efficiency improvements compared to leading complex modeling software. GAPN outperforms the state-of-the-art method (MoLPC) with up to 27% improvement in TM-Score, with a speed-up of 600×.
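
The N^(N-2) figure above is Cayley's formula for the number of labeled trees on N nodes (each acyclic undirected connected assembly graph is such a tree). A quick Python illustration of how this search space grows (not code from the paper):

```python
def assembly_space_size(n_chains: int) -> int:
    """Number of labeled spanning trees on n nodes (Cayley's formula),
    i.e. the count of possible acyclic assembly graphs for an n-chain multimer."""
    if n_chains < 2:
        return 1
    return n_chains ** (n_chains - 2)

# The space grows super-exponentially with chain count:
for n in (3, 5, 10):
    print(n, assembly_space_size(n))  # 3 -> 3, 5 -> 125, 10 -> 100000000
```

Even a 10-chain multimer already admits 10^8 candidate assembly trees, which is why exhaustive search is ruled out and a learned policy is used instead.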

NeurIPS Conference 2024 Conference Paper

Fast Graph Sharpness-Aware Minimization for Enhancing and Accelerating Few-Shot Node Classification

  • Yihong Luo
  • Yuhan Chen
  • Siya Qiu
  • Yiwei Wang
  • Chen Zhang
  • Yan Zhou
  • Xiaochun Cao
  • Jing Tang

Graph Neural Networks (GNNs) have shown superior performance in node classification. However, GNNs perform poorly in the Few-Shot Node Classification (FSNC) task that requires robust generalization to make accurate predictions for unseen classes with limited labels. To tackle the challenge, we propose the integration of Sharpness-Aware Minimization (SAM)--a technique designed to enhance model generalization by finding a flat minimum of the loss landscape--into GNN training. The standard SAM approach, however, consists of two forward-backward steps in each training iteration, doubling the computational cost compared to the base optimizer (e.g., Adam). To mitigate this drawback, we introduce a novel algorithm, Fast Graph Sharpness-Aware Minimization (FGSAM), that integrates the rapid training of Multi-Layer Perceptrons (MLPs) with the superior performance of GNNs. Specifically, we utilize GNNs for parameter perturbation while employing MLPs to minimize the perturbed loss so that we can find a flat minimum with good generalization more efficiently. Moreover, our method reutilizes the gradient from the perturbation phase to incorporate graph topology into the minimization process at almost zero additional cost. To further enhance training efficiency, we develop FGSAM+ that executes exact perturbations periodically. Extensive experiments demonstrate that our proposed algorithm outperforms the standard SAM with lower computational costs in FSNC tasks. In particular, our FGSAM+ as a SAM variant offers faster optimization than the base optimizer in most cases. In addition to FSNC, our proposed methods also demonstrate competitive performance in the standard node classification task for heterophilic graphs, highlighting their broad applicability.
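
As context for the "two forward-backward steps" that FGSAM amortizes, here is a minimal NumPy sketch of one generic SAM update. This is the textbook SAM recipe, not the paper's FGSAM code; `grad_fn`, `lr`, and `rho` are illustrative names:

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One generic SAM update, requiring two gradient evaluations:
    1) ascend to the worst-case neighbor within radius rho,
    2) descend from the ORIGINAL weights using the gradient taken there."""
    g = grad_fn(w)                                # 1st forward-backward pass
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent perturbation
    g_perturbed = grad_fn(w + eps)                # 2nd forward-backward pass
    return w - lr * g_perturbed                   # update the original weights

# Toy quadratic loss L(w) = ||w||^2 / 2, whose gradient is simply w.
w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w, lambda v: v)
```

The doubled cost is visible in the two `grad_fn` calls per step; FGSAM's idea, per the abstract, is to make the perturbation pass cheap (MLP-style, topology-free) while reusing its gradient.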

NeurIPS Conference 2024 Conference Paper

FreqMark: Invisible Image Watermarking via Frequency Based Optimization in Latent Space

  • Yiyang Guo
  • Ruizhe Li
  • Mude Hui
  • Hanzhong Guo
  • Chen Zhang
  • Chuangjian Cai
  • Le Wan
  • Shangfei Wang

Invisible watermarking is essential for safeguarding digital content, enabling copyright protection and content authentication. However, existing watermarking methods fall short in robustness against regeneration attacks. In this paper, we propose a novel method called FreqMark that involves unconstrained optimization of the image latent frequency space obtained after VAE encoding. Specifically, FreqMark embeds the watermark by optimizing the latent frequency space of the images and then extracts the watermark through a pre-trained image encoder. This optimization allows a flexible trade-off between image quality and watermark robustness and effectively resists regeneration attacks. Experimental results demonstrate that FreqMark offers significant advantages in image quality and robustness, permits flexible selection of the encoding bit number, and achieves a bit accuracy exceeding 90% when encoding a 48-bit hidden message under various attack scenarios.
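
The paper's actual optimization is not reproduced here, but the general flavor of hiding bits in a latent's frequency representation can be sketched with NumPy FFTs. Everything below (the coefficient choice, multiplicative scaling, and magnitude-comparison extraction) is a toy assumption, not FreqMark itself:

```python
import numpy as np

def embed_in_frequency(latent, message_bits, strength=0.5):
    """Toy embedding: scale a few fixed low-frequency coefficients of the
    latent's 2D spectrum up (bit 1) or down (bit 0), then invert the FFT."""
    spec = np.fft.fft2(latent)
    for i, bit in enumerate(message_bits):
        scale = (1.0 + strength) if bit else 1.0 / (1.0 + strength)
        spec[1, i + 1] *= scale
    return np.fft.ifft2(spec).real

def extract_from_frequency(latent, watermarked, n_bits):
    """Recover bits by comparing coefficient magnitudes against the original."""
    ref = np.fft.fft2(latent)
    spec = np.fft.fft2(watermarked)
    return [int(np.abs(spec[1, i + 1]) > np.abs(ref[1, i + 1]))
            for i in range(n_bits)]

rng = np.random.default_rng(0)
z = rng.normal(size=(16, 16))          # stand-in for a VAE latent
bits = [1, 0, 1, 1, 0, 0, 1, 0]
z_marked = embed_in_frequency(z, bits)
```

Note this toy version needs the clean latent at extraction time; FreqMark instead optimizes the embedding so that a pre-trained image encoder can recover the bits blindly.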

AAAI Conference 2024 Conference Paper

IINet: Implicit Intra-inter Information Fusion for Real-Time Stereo Matching

  • Ximeng Li
  • Chen Zhang
  • Wanjuan Su
  • Wenbing Tao

Recently, there has been a growing interest in 3D CNN-based stereo matching methods due to their remarkable accuracy. However, the high complexity of 3D convolution makes it challenging to strike a balance between accuracy and speed. Notably, explicit 3D volumes contain considerable redundancy. In this study, we delve into a more compact 2D implicit network to eliminate redundancy and boost real-time performance. However, simply replacing explicit 3D networks with 2D implicit networks causes issues that can lead to performance degradation, including the loss of structural information, the quality decline of inter-image information, as well as the inaccurate regression caused by low-level features. To address these issues, we first integrate intra-image information to fuse with inter-image information, facilitating propagation guided by structural cues. Subsequently, we introduce the Fast Multi-scale Score Volume (FMSV) and Confidence Based Filtering (CBF) to efficiently acquire accurate multi-scale, noise-free inter-image information. Furthermore, combined with the Residual Context-aware Upsampler (RCU), our Intra-Inter Fusing network is meticulously designed to enhance information transmission at both the feature level and the disparity level, thereby enabling accurate and robust regression. Experimental results affirm the superiority of our network in terms of both speed and accuracy compared to all other fast methods.

NeurIPS Conference 2024 Conference Paper

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes

  • Zhenhui Ye
  • Tianyun Zhong
  • Yi Ren
  • Ziyue Jiang
  • Jiawei Huang
  • Rongjie Huang
  • Jinglin Liu
  • JinZheng He

Talking face generation (TFG) aims to animate a target identity's face to create realistic talking videos. Personalized TFG is a variant that emphasizes the perceptual identity similarity of the synthesized result (from the perspective of appearance and talking style). While previous works typically solve this problem by learning an individual neural radiance field (NeRF) for each identity to implicitly store its static and dynamic information, we find it inefficient and non-generalized due to the per-identity-per-training framework and the limited training data. To this end, we propose MimicTalk, the first attempt that exploits the rich knowledge from a NeRF-based person-agnostic generic model for improving the efficiency and robustness of personalized TFG. To be specific, (1) we first come up with a person-agnostic 3D TFG model as the base model and propose to adapt it into a specific identity; (2) we propose a static-dynamic-hybrid adaptation pipeline to help the model learn the personalized static appearance and facial dynamic features; (3) To generate the facial motion of the personalized talking style, we propose an in-context stylized audio-to-motion model that mimics the implicit talking style provided in the reference video without information loss by an explicit style representation. The adaptation process to an unseen identity can be performed in 15 minutes, which is 47 times faster than previous person-dependent methods. Experiments show that our MimicTalk surpasses previous baselines regarding video quality, efficiency, and expressiveness. Video samples are available at https://mimictalk.github.io.

AAAI Conference 2024 Conference Paper

NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching

  • Hongbo Zhang
  • Guang Wang
  • Xu Wang
  • Zhengyang Zhou
  • Chen Zhang
  • Zheng Dong
  • Yang Wang

One of the most important tasks in ride-hailing is order dispatching, i.e., assigning unserved orders to available drivers. Recent order dispatching has achieved a significant improvement due to the advance of reinforcement learning, which has been shown to effectively address sequential decision-making problems like order dispatching. However, most existing reinforcement learning methods require agents to learn the optimal policy by interacting with environments online, which is challenging or impractical for real-world deployment due to high costs or safety concerns. For example, due to the spatiotemporally unbalanced supply and demand, online reinforcement learning-based order dispatching may significantly impact the revenue of the ride-hailing platform and passenger experience during the policy learning period. Hence, in this work, we develop an offline deep reinforcement learning framework called NondBREM for large-scale order dispatching, which learns policy from only the accumulated logged data to avoid costly and unsafe interactions with the environment. In NondBREM, a Nondeterministic Batch-Constrained Q-learning (NondBCQ) module is developed to reduce the algorithm extrapolation error and a Random Ensemble Mixture (REM) module that integrates multiple value networks with multi-head networks is utilized to improve the model generalization and robustness. Extensive experiments on large-scale real-world ride-hailing datasets show the superiority of our design.

ICML Conference 2024 Conference Paper

Nonparametric Teaching of Implicit Neural Representations

  • Chen Zhang
  • Steven Tin Sui Luo
  • Jason Chun Lok Li
  • Yik-Chung Wu
  • Ngai Wong 0001

We investigate the learning of implicit neural representation (INR) using an overparameterized multilayer perceptron (MLP) via a novel nonparametric teaching perspective. The latter offers an efficient example selection framework for teaching nonparametrically defined (viz. non-closed-form) target functions, such as image functions defined by 2D grids of pixels. To address the costly training of INRs, we propose a paradigm called Implicit Neural Teaching (INT) that treats INR learning as a nonparametric teaching problem, where the given signal being fitted serves as the target function. The teacher then selects signal fragments for iterative training of the MLP to achieve fast convergence. By establishing a connection between MLP evolution through parameter-based gradient descent and that of function evolution through functional gradient descent in nonparametric teaching, we show for the first time that teaching an overparameterized MLP is consistent with teaching a nonparametric learner. This new discovery readily permits a convenient drop-in of nonparametric teaching algorithms to broadly enhance INR training efficiency, demonstrating 30%+ training time savings across various input modalities.
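
The teacher's role of "selecting signal fragments" can be illustrated with a toy greedy rule: feed the learner the locations where its current fit is worst. This is only a sketch of the selection idea, not the INT algorithm; `select_fragments` is a hypothetical helper:

```python
import numpy as np

def select_fragments(target, prediction, k):
    """Greedy teaching step: pick the k signal locations where the current
    model deviates most from the target (largest pointwise error)."""
    err = np.abs(target - prediction)
    return np.argsort(err)[-k:][::-1]   # indices of the k worst-fit points

# Toy 1D "signal": the learner currently predicts all zeros, so the teacher
# should hand it the samples it fits worst.
target = np.array([0.1, -2.0, 0.3, 1.5, 0.0])
pred = np.zeros(5)
worst = select_fragments(target, pred, 2)
print(worst)  # indices 1 and 3 carry the largest residuals
```

In INT proper the "signal" is e.g. a 2D pixel grid and the learner is an overparameterized MLP, but the selection principle, training on informative fragments first, is the same.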

JBHI Journal 2024 Journal Article

RandStainNA++: Enhance Random Stain Augmentation and Normalization Through Foreground and Background Differentiation

  • Chong Wang
  • Shuxin Li
  • Jing Ke
  • Chen Zhang
  • Yiqing Shen

The wide prevalence of staining variations in digital pathology presents a significant obstacle, often undermining the effectiveness of diagnosis and analysis. The current strategies to counteract this issue primarily revolve around Stain Normalization (SN) and Stain Augmentation (SA). Nonetheless, these methodologies come with inherent limitations. They struggle to adapt to the vast array of staining styles, tend to presuppose linear associations between color spaces, and often lead to unrealistic color transformations. In response to these challenges, we introduce RandStainNA++, a novel method seamlessly integrating SN and SA. This method exploits the versatility of random SN and SA within randomly selected color spaces, effectively managing variations for the foreground and background independently. By refining the transformations of staining styles for the foreground and background within a realistic scope, this strategy promotes the generation of more practical staining transformations during the training phase. Further enhancing our approach, we propose a unique self-distillation method. This technique incorporates prior knowledge of stain variation, substantially augmenting the generalization capability of the network. Strikingly, compared to conventional classification models, our method boosts performance by a significant margin of 16-25%. Furthermore, when juxtaposed with baseline segmentation models, the Dice score registers an increase of 0.06.

ICML Conference 2023 Conference Paper

Nonparametric Iterative Machine Teaching

  • Chen Zhang
  • Xiaofeng Cao 0002
  • Weiyang Liu
  • Ivor W. Tsang
  • James T. Kwok

In this paper, we consider the problem of Iterative Machine Teaching (IMT), where the teacher provides examples to the learner iteratively such that the learner can achieve fast convergence to a target model. However, existing IMT algorithms are solely based on parameterized families of target models. They mainly focus on convergence in the parameter space, resulting in difficulty when the target models are defined to be functions without dependency on parameters. To address such a limitation, we study a more general task – Nonparametric Iterative Machine Teaching (NIMT), which aims to teach nonparametric target models to learners in an iterative fashion. Unlike parametric IMT that merely operates in the parameter space, we cast NIMT as a functional optimization problem in the function space. To solve it, we propose both random and greedy functional teaching algorithms. We obtain the iterative teaching dimension (ITD) of the random teaching algorithm under proper assumptions, which serves as a uniform upper bound of ITD in NIMT. Further, the greedy teaching algorithm has a significantly lower ITD, which reaches a tighter upper bound of ITD in NIMT. Finally, we verify the correctness of our theoretical findings with extensive experiments in nonparametric scenarios.
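
For intuition, the functional-gradient update that NIMT contrasts with parameter-space IMT typically takes the standard RKHS form (a generic sketch, not the paper's exact notation):

```latex
f_{t+1} \;=\; f_t \;-\; \eta_t \,
  \frac{\partial \ell\big(f_t(x_t),\, f^*(x_t)\big)}{\partial f_t(x_t)}\,
  K(x_t, \cdot)
```

where \(f^*\) is the target model, \(\ell\) a pointwise loss, \(K\) the reproducing kernel, \(\eta_t\) a step size, and \(x_t\) the example the teacher selects at round \(t\). The teaching dimension then counts how many such rounds are needed to reach the target.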

NeurIPS Conference 2023 Conference Paper

Nonparametric Teaching for Multiple Learners

  • Chen Zhang
  • Xiaofeng Cao
  • Weiyang Liu
  • Ivor Tsang
  • James Kwok

We study the problem of teaching multiple learners simultaneously in the nonparametric iterative teaching setting, where the teacher iteratively provides examples to the learner for accelerating the acquisition of a target concept. This problem is motivated by the gap between current single-learner teaching setting and the real-world scenario of human instruction where a teacher typically imparts knowledge to multiple students. Under the new problem formulation, we introduce a novel framework -- Multi-learner Nonparametric Teaching (MINT). In MINT, the teacher aims to instruct multiple learners, with each learner focusing on learning a scalar-valued target model. To achieve this, we frame the problem as teaching a vector-valued target model and extend the target model space from a scalar-valued reproducing kernel Hilbert space used in single-learner scenarios to a vector-valued space. Furthermore, we demonstrate that MINT offers significant teaching speed-up over repeated single-learner teaching, particularly when the multiple learners can communicate with each other. Lastly, we conduct extensive experiments to validate the practicality and efficiency of MINT.

IROS Conference 2023 Conference Paper

Underwater and Surface Aquatic Locomotion of Soft Biomimetic Robot Based on Bending Rolled Dielectric Elastomer Actuators

  • Chenyu Zhang
  • Chen Zhang
  • Juntian Qu
  • Xiang Qian

All-around, real-time navigation and sensing across the water environments by miniature soft robotics are promising, for their merits of small size, high agility and good compliance to the unstructured surroundings. In this paper, we propose and demonstrate a mantas-like soft aquatic robot which propels itself by flapping fins using rolled dielectric elastomer actuators (DEAs) with bending motions. This robot exhibits fast-moving capabilities of swimming at 57 mm/s, or 1.25 body lengths per second (BL/s), skating on the water surface at 64 mm/s (1.36 BL/s) and vertically ascending at 38 mm/s (0.82 BL/s) with a 1300 V, 17 Hz power supply. These results show the feasibility of adopting rolled DEAs for mesoscale aquatic robots with high motion performance in various water-related scenarios.

AAAI Conference 2023 Conference Paper

VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

  • Yihan Wu
  • Junliang Guo
  • Xu Tan
  • Chen Zhang
  • Bohan Li
  • Ruihua Song
  • Lei He
  • Sheng Zhao

Video dubbing aims to translate the original speech in a film or television program into the speech in a target language, which can be achieved with a cascaded system consisting of speech recognition, machine translation and speech synthesis. To ensure that the translated speech is well aligned with the corresponding video, the length/duration of the translated speech should be as close as possible to that of the original speech, which requires strict length control. Previous works usually control the number of words or characters generated by the machine translation model to be similar to the source sentence, without considering the isochronicity of speech as the speech duration of words/characters in different languages varies. In this paper, we propose VideoDubber, a machine translation system tailored for the task of video dubbing, which directly considers the speech duration of each token in translation, to match the length of source and target speech. Specifically, we control the speech length of generated sentence by guiding the prediction of each word with the duration information, including the speech duration of itself as well as how much duration is left for the remaining words. We design experiments on four language directions (German -> English, Spanish -> English, Chinese <-> English), and the results show that VideoDubber achieves better length control ability on the generated speech than baseline methods. To make up for the lack of real-world datasets, we also construct a real-world test set collected from films to provide comprehensive evaluations on the video dubbing task.
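
The "how much duration is left" signal described above can be made concrete with a tiny helper (a hypothetical sketch, not VideoDubber's actual decoder interface):

```python
def remaining_durations(token_durations, total_budget):
    """For each decoding position, the speech-duration budget still available
    before that token is generated -- the kind of per-step signal a
    duration-guided translation decoder could condition on."""
    left, out = total_budget, []
    for d in token_durations:
        out.append(left)   # budget available when predicting this token
        left -= d          # consume this token's speech duration
    return out

# e.g. a 2.0 s budget and per-token speech durations in seconds:
print(remaining_durations([0.5, 0.75, 0.25], 2.0))  # -> [2.0, 1.5, 0.75]
```

A decoder fed this signal can learn to prefer shorter-duration tokens as the remaining budget shrinks, which is the isochronicity idea the abstract describes.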

YNIMG Journal 2022 Journal Article

Goal-specific brain MRI harmonization

  • Lijun An
  • Jianzhong Chen
  • Pansheng Chen
  • Chen Zhang
  • Tong He
  • Christopher Chen
  • Juan Helen Zhou
  • B.T. Thomas Yeo

There is significant interest in pooling magnetic resonance image (MRI) data from multiple datasets to enable mega-analysis. Harmonization is typically performed to reduce heterogeneity when pooling MRI data across datasets. Most MRI harmonization algorithms do not explicitly consider downstream application performance during harmonization. However, the choice of downstream application might influence what might be considered as study-specific confounds. Therefore, ignoring downstream applications during harmonization might potentially limit downstream performance. Here we propose a goal-specific harmonization framework that utilizes downstream application performance to regularize the harmonization procedure. Our framework can be integrated with a wide variety of harmonization models based on deep neural networks, such as the recently proposed conditional variational autoencoder (cVAE) harmonization model. Three datasets from three different continents with a total of 2787 participants and 10,085 anatomical T1 scans were used for evaluation. We found that cVAE removed more dataset differences than the widely used ComBat model, but at the expense of removing desirable biological information as measured by downstream prediction of mini mental state examination (MMSE) scores and clinical diagnoses. On the other hand, our goal-specific cVAE (gcVAE) was able to remove as much dataset differences as cVAE, while improving downstream cross-sectional prediction of MMSE scores and clinical diagnoses.

IJCAI Conference 2022 Conference Paper

GRELEN: Multivariate Time Series Anomaly Detection from the Perspective of Graph Relational Learning

  • Weiqi Zhang
  • Chen Zhang
  • Fugee Tsung

System monitoring and anomaly detection are crucial tasks in daily operation. With the rapid development of cyber-physical systems and IT systems, multiple sensors get involved to represent the system state from different perspectives, which inspires us to detect anomalies by considering the feature-dependence relationships among sensors instead of focusing on individual sensors' behavior. In this paper, we propose a novel Graph Relational Learning Network (GReLeN) to detect multivariate time series anomalies from the perspective of between-sensor dependence relationship learning. A Variational AutoEncoder (VAE) serves as the overall framework for feature extraction and system representation. A Graph Neural Network (GNN) and a stochastic graph relational learning strategy are also imposed to capture the between-sensor dependence. Then a composite anomaly metric is established explicitly with the learned dependence structure. The experiments on four real-world datasets show our superiority in detection accuracy, anomaly diagnosis, and model interpretation.
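
A composite anomaly metric of the kind described, combining reconstruction error with deviation of the learned dependence graph, might look like the following toy sketch (illustrative only; `alpha` and the specific terms are assumptions, not GReLeN's definition):

```python
import numpy as np

def composite_anomaly_score(x, x_recon, adj_learned, adj_reference, alpha=0.5):
    """Toy composite metric: blend the VAE reconstruction error of the sensor
    readings with how far the learned sensor-dependence graph drifts from a
    reference graph estimated under normal behavior."""
    recon_term = np.mean((x - x_recon) ** 2)                 # reconstruction error
    graph_term = np.mean(np.abs(adj_learned - adj_reference))  # dependence drift
    return alpha * recon_term + (1 - alpha) * graph_term

# Two sensors: imperfect reconstruction plus a fully changed dependence graph.
x = np.array([1.0, 2.0])
score = composite_anomaly_score(x, np.array([1.0, 1.0]),
                                np.ones((2, 2)), np.zeros((2, 2)))
print(score)  # higher = more anomalous
```

The point of such a blend is that a structural change between sensors can raise the score even when each individual sensor's reconstruction still looks plausible.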

JBHI Journal 2022 Journal Article

Landmark Localization for Cephalometric Analysis Using Multiscale Image Patch-Based Graph Convolutional Networks

  • Gang Lu
  • Yuanxiu Zhang
  • Youyong Kong
  • Chen Zhang
  • Jean-Louis Coatrieux
  • Huazhong Shu

Accurate and robust cephalometric image analysis plays an essential role in orthodontic diagnosis, treatment assessment and surgical planning. This paper proposes a novel landmark localization method for cephalometric analysis using multiscale image patch-based graph convolutional networks. In detail, image patches with the same size are hierarchically sampled from the Gaussian pyramid to well preserve multiscale context information. We combine local appearance and shape information into spatialized features with an attention module to enrich node representations in graph. The spatial relationships of landmarks are built with the incorporation of three-layer graph convolutional networks, and multiple landmarks are simultaneously updated and moved toward the targets in a cascaded coarse-to-fine process. Quantitative results obtained on publicly available cephalometric X-ray images have exhibited superior performance compared with other state-of-the-art methods in terms of mean radial error and successful detection rate within various precision ranges. Our approach performs significantly better especially in the clinically accepted range of 2 mm and this makes it suitable in cephalometric analysis and orthognathic surgery.

AAAI Conference 2022 Conference Paper

MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation

  • Chen Zhang
  • Luis Fernando D'Haro
  • Thomas Friedrichs
  • Haizhou Li

Chatbots are designed to carry out human-like conversations across different domains, such as general chit-chat, knowledge exchange, and persona-grounded conversations. To measure the quality of such conversational agents, a dialogue evaluator is expected to conduct assessment across domains as well. However, most of the state-of-the-art automatic dialogue evaluation metrics (ADMs) are not designed for multi-domain evaluation. We are motivated to design a general and robust framework, MDD-Eval, to address the problem. Specifically, we first train a teacher evaluator with human-annotated data to acquire a rating skill to tell good dialogue responses from bad ones in a particular domain and then, adopt a self-training strategy to train a new evaluator with teacher-annotated multi-domain data, that helps the new evaluator to generalize across multiple domains. MDD-Eval is extensively assessed on six dialogue evaluation benchmarks. Empirical results show that the MDD-Eval framework achieves a strong performance with an absolute improvement of 7% over the state-of-the-art ADMs in terms of mean Spearman correlation scores across all the evaluation benchmarks.
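
The teacher-to-student self-training loop described above can be sketched as confidence-filtered pseudo-labeling (a generic sketch; `pseudo_label` and the toy teacher below are hypothetical, not MDD-Eval's actual pipeline):

```python
def pseudo_label(teacher, unlabeled_pairs, confidence=0.7):
    """Self-training step: the teacher evaluator scores unlabeled
    (context, response) pairs; only confident scores become training labels
    for the student evaluator. `teacher` maps a pair to a [0, 1] score."""
    labeled = []
    for pair in unlabeled_pairs:
        s = teacher(pair)
        if s >= confidence or s <= 1 - confidence:   # drop ambiguous pairs
            labeled.append((pair, int(s >= confidence)))
    return labeled

# Toy teacher that naively rates responses by length (illustration only).
teacher = lambda p: min(len(p[1]) / 10, 1.0)
data = [("hi", "hello there friend"), ("hi", "ok"), ("hi", "maybe")]
print(pseudo_label(teacher, data))  # the ambiguous middle-score pair is dropped
```

Filtering out low-confidence teacher scores is what keeps the multi-domain pseudo-labels clean enough for the student to generalize from.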

NeurIPS Conference 2022 Conference Paper

Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization

  • Yang Zhao
  • Chen Zhang
  • Haifeng Huang
  • Haoyuan Li
  • Zhou Zhao

Aiming to locate the object that emits a specified sound in complex scenes, the task of sounding object localization bridges two perception-oriented modalities of vision and acoustics, and brings enormous research value to the comprehensive perceptual understanding of machine intelligence. Although there are massive training data collected in this field, few of them contain accurate bounding box annotations, hindering the learning process and further application of proposed models. In order to address this problem, we try to explore an effective multi-modal knowledge transfer strategy to obtain precise knowledge from other similar tasks and transfer it through well-aligned multi-modal data to deal with this task in a zero-resource manner. Concretely, we design and propose a novel Two-stream Universal Referring localization Network (TURN), which is composed of a localization stream and an alignment stream to carry out different functions. The former is utilized to extract the knowledge related to referring object localization from the image grounding task, while the latter is devised to learn a universal semantic space shared between texts and audios. Moreover, we further develop an adaptive sampling strategy to automatically identify the overlap between different data domains, thus boosting the performance and stability of our model. The extensive experiments on various publicly-available benchmarks demonstrate that TURN can achieve competitive performance compared with the state-of-the-art approaches without using any data in this field, which verifies the feasibility of our proposed mechanisms and strategies.

AAAI Conference 2021 Conference Paper

CARE: Commonsense-Aware Emotional Response Generation with Latent Concepts

  • Peixiang Zhong
  • Di Wang
  • Pengfei Li
  • Chen Zhang
  • Hao Wang
  • Chunyan Miao

Rationality and emotion are two fundamental elements of humans. Endowing agents with rationality and emotion has been one of the major milestones in AI. However, in the field of conversational AI, most existing models only specialize in one aspect and neglect the other, which often leads to dull or unrelated responses. In this paper, we hypothesize that combining rationality and emotion into conversational agents can improve response quality. To test the hypothesis, we focus on one fundamental aspect of rationality, i.e., commonsense, and propose CARE, a novel model for commonsense-aware emotional response generation. Specifically, we first propose a framework to learn and construct commonsense-aware emotional latent concepts of the response given an input message and a desired emotion. We then propose three methods to collaboratively incorporate the latent concepts into response generation. Experimental results on two large-scale datasets support our hypothesis and show that our model can produce more accurate and commonsense-aware emotional responses and achieve better human ratings than state-of-the-art models that only specialize in one aspect.

IROS Conference 2021 Conference Paper

Improving Object Permanence using Agent Actions and Reasoning

  • Ying Siu Liang
  • Chen Zhang
  • Dongkyu Choi
  • Kenneth Kwok

Object permanence in psychology means knowing that objects still exist even if they are no longer visible. It is a crucial concept for robots to operate autonomously in uncontrolled environments. Existing approaches learn object permanence from low-level perception, but perform poorly on more complex scenarios, like when objects are contained and carried by others. Knowledge about manipulation actions performed on an object prior to its disappearance allows us to reason about its location, e.g., that the object has been placed in a carrier. In this paper we argue that object permanence can be improved when the robot uses knowledge about executed actions and describe an approach to infer hidden object states from agent actions. We show that considering agent actions not only improves rule-based reasoning models but also purely neural approaches, showing its general applicability. Then, we conduct quantitative experiments on a snitch localization task using a dataset of 1,371 synthesized videos, where we compare the performance of different object permanence models with and without action annotations. We demonstrate that models with action annotations can significantly increase performance of both neural and rule-based approaches. Finally, we evaluate the usability of our approach in real-world applications by conducting qualitative experiments with two Universal Robots (UR5 and UR16e) in both lab and industrial settings. The robots complete benchmark tasks for a gearbox assembly and demonstrate the object permanence capabilities with real sensor data in an industrial environment.

IJCAI Conference 2021 Conference Paper

Interactive Video Acquisition and Learning System for Motor Assessment of Parkinson's Disease

  • Yunyue Wei
  • Bingquan Zhu
  • Chen Hou
  • Chen Zhang
  • Yanan Sui

Diagnosis and treatment for Parkinson's disease rely on the evaluation of motor functions, which is expensive and time consuming when performing at clinics. It is also difficult for patients to record correct movements at home without the guidance from experienced physicians. To help patients with Parkinson’s disease get better evaluation from in-home recorded movement videos, we developed an interactive video acquisition and learning system for clinical motor assessments. The system provides real-time guidance with multi-level body keypoint tracking and analysis to patients, which guarantees correct understanding and performing of clinical tasks. We tested its effectiveness on healthy subjects, and the efficiency and usability on patient groups. Experiments showed that our system enabled high quality video recordings following clinical standards, benefiting both patients and physicians. Our system provides a novel learning-based telemedicine approach for the care of patients with Parkinson’s disease.

NeurIPS Conference 2021 Conference Paper

OSOA: One-Shot Online Adaptation of Deep Generative Models for Lossless Compression

  • Chen Zhang
  • Shifeng Zhang
  • Fabio Maria Carlucci
  • Zhenguo Li

Explicit deep generative models (DGMs), e.g., VAEs and Normalizing Flows, have been shown to offer an effective data modelling alternative for lossless compression. However, DGMs themselves normally require large storage space and thus contaminate the advantage brought by accurate data density estimation. To eliminate the requirement of saving separate models for different target datasets, we propose a novel setting that starts from a pretrained deep generative model and compresses the data batches while adapting the model with a dynamical system for only one epoch. We formalise this setting as that of One-Shot Online Adaptation (OSOA) of DGMs for lossless compression and propose a vanilla algorithm under this setting. Experimental results show that vanilla OSOA can save significant time versus training bespoke models and space versus using one model for all targets. With the same adaptation step number or adaptation time, it is shown vanilla OSOA can exhibit better space efficiency, e.g., 47% less space, than fine-tuning the pretrained model and saving the fine-tuned model. Moreover, we showcase the potential of OSOA and motivate more sophisticated OSOA algorithms by showing further space or time efficiency with multiple updates per batch and early stopping.

AAAI Conference 2021 Conference Paper

UWSpeech: Speech to Speech Translation for Unwritten Languages

  • Chen Zhang
  • Xu Tan
  • Yi Ren
  • Tao Qin
  • Kejun Zhang
  • Tie-Yan Liu

Existing speech to speech translation systems heavily rely on the text of the target language: they usually translate the source language either to target text and then synthesize target speech from text, or directly to target speech with target text for auxiliary training. However, those methods cannot be applied to unwritten target languages, for which no written text or phoneme set is available. In this paper, we develop a translation system for unwritten languages, named UWSpeech, which converts target unwritten speech into discrete tokens with a converter, then translates source-language speech into target discrete tokens with a translator, and finally synthesizes target speech from the target discrete tokens with an inverter. We propose a method called XL-VAE, which enhances the vector quantized variational autoencoder (VQ-VAE) with cross-lingual (XL) speech recognition, to train the converter and inverter of UWSpeech jointly. Experiments on the Fisher Spanish-English conversation translation dataset show that UWSpeech outperforms direct translation and a VQ-VAE baseline by about 16 and 10 BLEU points respectively, demonstrating the advantages and potential of UWSpeech.

IS Journal 2020 Journal Article

A Deep Coupled LSTM Approach for USD/CNY Exchange Rate Forecasting

  • Wei Cao
  • Weidong Zhu
  • Wenjun Wang
  • Yves Demazeau
  • Chen Zhang

Forecasting the CNY exchange rate accurately is a challenging task due to its complex coupling nature, which includes market-level coupling from interactions with multiple financial markets, macro-level coupling from interactions with economic fundamentals, and deep coupling from interactions between the two aforementioned kinds of couplings. This study develops a new deep coupled long short-term memory (LSTM) approach, namely DC-LSTM, to capture these complex couplings for USD/CNY exchange rate forecasting. In this approach, a deep structure consisting of stacked LSTMs is built to model the complex couplings. Experimental results with 10 years of data indicate that the proposed approach significantly outperforms seven other benchmarks. Through a profitability discussion, DC-LSTM is verified to be a useful tool for making wise investment decisions. The purpose of this article is to clarify the importance of coupling learning for exchange rate forecasting and the usefulness of a deep coupled model in capturing the couplings.

IJCAI Conference 2020 Conference Paper

Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation

  • Jinglin Liu
  • Yi Ren
  • Xu Tan
  • Chen Zhang
  • Tao Qin
  • Zhou Zhao
  • Tie-Yan Liu

Non-autoregressive translation (NAT) achieves faster inference speed but at the cost of worse accuracy compared with autoregressive translation (AT). Since AT and NAT can share model structure and AT is an easier task than NAT due to the explicit dependency on previous target-side tokens, a natural idea is to gradually shift the model training from the easier AT task to the harder NAT task. To smooth the shift from AT training to NAT training, in this paper, we introduce semi-autoregressive translation (SAT) as an intermediate task. SAT contains a hyperparameter k, and each k value defines a SAT task with a different degree of parallelism. Specifically, SAT covers AT and NAT as special cases: it reduces to AT when k=1 and to NAT when k=N (N is the length of the target sentence). We design curriculum schedules to gradually shift k from 1 to N, with different pacing functions and numbers of tasks trained at the same time. We call our method Task-level Curriculum Learning for NAT (TCL-NAT). Experiments on the IWSLT14 De-En, IWSLT16 En-De, WMT14 En-De and De-En datasets show that TCL-NAT achieves significant accuracy improvements over previous NAT baselines and reduces the performance gap between NAT and AT models to 1-2 BLEU points, demonstrating the effectiveness of our proposed method.
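The k-schedule described above can be sketched as a small pacing function mapping training progress onto the task grid {1, ..., N}; the `linear` and `exp` pacing choices here are illustrative stand-ins, not the paper's exact schedules:

```python
import math

def curriculum_k(step, total_steps, N, pacing="linear"):
    """Map training progress to the SAT parallelism degree k.

    k=1 is pure AT, k=N is pure NAT; intermediate values define SAT
    tasks of increasing parallelism.
    """
    t = min(max(step / total_steps, 0.0), 1.0)  # progress in [0, 1]
    if pacing == "exp":
        t = (math.exp(t) - 1.0) / (math.e - 1.0)  # slower start, faster finish
    # round progress onto the integer grid {1, ..., N}
    return max(1, min(N, 1 + round(t * (N - 1))))
```

Training would query this schedule each step and build the SAT training objective for the returned k, starting fully autoregressive and ending fully non-autoregressive.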

AAAI Conference 2020 Conference Paper

Tensor Completion for Weakly-Dependent Data on Graph for Metro Passenger Flow Prediction

  • Ziyue Li
  • Nurettin Dorukhan Sergin
  • Hao Yan
  • Chen Zhang
  • Fugee Tsung

Low-rank tensor decomposition and completion have attracted significant interest from academia given the ubiquity of tensor data. However, the low-rank structure is a global property, which will not be fulfilled when the data presents complex and weak dependencies under specific graph structures. One particular application that motivates this study is spatiotemporal data analysis. As shown in our preliminary study, weak dependencies can worsen low-rank tensor completion performance. In this paper, we propose a novel low-rank CANDECOMP/PARAFAC (CP) tensor decomposition and completion framework that introduces an L1-norm penalty and a Graph Laplacian penalty to model the weak dependencies on the graph. We further propose an efficient estimation algorithm based on Block Coordinate Descent. A case study based on metro passenger flow data in Hong Kong demonstrates an improved performance over regular tensor completion methods.
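A minimal sketch of the kind of objective described above: CP reconstruction error on observed entries plus an L1 penalty and a Graph Laplacian penalty, assuming the penalties act on the first (e.g. spatial) factor. The exact regularization form and weighting in the paper may differ:

```python
import numpy as np

def cp_completion_loss(X, mask, U, V, W, L, lam1=0.1, lam2=0.1):
    """Objective of masked low-rank CP completion with sparsity and
    graph-smoothness regularization (illustrative sketch).

    X: (I, J, K) data tensor, mask: 1 on observed entries, 0 elsewhere,
    U, V, W: CP factors of shapes (I, R), (J, R), (K, R),
    L: (I, I) graph Laplacian over the first mode.
    """
    # reconstruct the tensor from the CP factors
    Xhat = np.einsum("ir,jr,kr->ijk", U, V, W)
    fit = 0.5 * np.sum((mask * (X - Xhat)) ** 2)   # fit on observed entries only
    sparsity = lam1 * np.abs(U).sum()              # L1 penalty on the factor
    smooth = lam2 * np.trace(U.T @ L @ U)          # Graph Laplacian smoothness
    return fit + sparsity + smooth
```

A Block Coordinate Descent scheme would then minimize this loss by updating U, V, W one at a time while holding the other two fixed.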

AAAI Conference 2019 Conference Paper

Balanced Sparsity for Efficient DNN Inference on GPU

  • Zhuliang Yao
  • Shijie Cao
  • Wencong Xiao
  • Chen Zhang
  • Lanshun Nie

In trained deep neural networks, unstructured pruning can remove redundant weights to lower storage cost. However, it requires customized hardware to speed up practical inference. Another trend accelerates sparse model inference on general-purpose hardware by adopting coarse-grained sparsity to prune or regularize consecutive weights for efficient computation, but this often sacrifices model accuracy. In this paper, we propose a novel fine-grained sparsity approach, Balanced Sparsity, that achieves high model accuracy efficiently on commodity hardware. Our approach adapts to the high-parallelism architecture of GPUs, showing considerable potential for sparsity in the wide deployment of deep learning services. Experimental results show that Balanced Sparsity achieves up to 3.1x practical speedup for model inference on GPU, while retaining the same high model accuracy as fine-grained sparsity.
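The balanced-sparsity idea, equal numbers of surviving weights in every fixed-size block so that GPU threads see a balanced load, can be sketched as follows (the block size and keep count are illustrative parameters, not the paper's settings):

```python
import numpy as np

def balanced_prune(W, block_size, keep):
    """Balanced fine-grained pruning: split each row of a weight
    matrix into equal-size blocks and keep only the `keep`
    largest-magnitude weights in every block, zeroing the rest.
    Every block then carries the same number of nonzeros, which
    keeps per-thread work uniform on a GPU.
    """
    out = np.zeros_like(W)
    for i, row in enumerate(W):
        for start in range(0, row.size, block_size):
            block = row[start:start + block_size]
            # indices of the `keep` largest |w| within this block
            top = np.argsort(np.abs(block))[-keep:]
            out[i, start + top] = block[top]
    return out
```

Unlike global magnitude pruning, the nonzero budget here is enforced per block rather than per matrix, trading a small amount of pruning freedom for a perfectly balanced sparsity pattern.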

IJCAI Conference 2019 Conference Paper

Discriminative and Correlative Partial Multi-Label Learning

  • Haobo Wang
  • Weiwei Liu
  • Yang Zhao
  • Chen Zhang
  • Tianlei Hu
  • Gang Chen

In partial multi-label learning (PML), each instance is associated with a candidate label set that contains multiple relevant labels along with other false positive labels. The most challenging issue in PML is that the training procedure is prone to being affected by labeling noise. We observe that state-of-the-art PML methods are either powerless to disambiguate the correct labels from the candidate labels or incapable of sufficiently extracting label correlations. To fill this gap, a two-stage DiscRiminative and correlAtive partial Multi-label leArning (DRAMA) algorithm is presented in this work. In the first stage, a confidence value is learned for each label by utilizing the feature manifold, indicating how likely the label is to be correct. In the second stage, a gradient boosting model is induced to fit the label confidences. Specifically, to explore label correlations, we augment the feature space with the previously elicited labels on each boosting round. Extensive experiments on various real-world datasets clearly validate the superiority of our proposed method.

AAAI Conference 2019 Conference Paper

Partially Observable Multi-Sensor Sequential Change Detection: A Combinatorial Multi-Armed Bandit Approach

  • Chen Zhang
  • Steven C.H. Hoi

This paper explores machine learning to address the problem of Partially Observable Multi-sensor Sequential Change Detection (POMSCD), where only a subset of sensors can be observed to monitor a target system for change-point detection at each online learning round. In contrast to traditional multi-sensor sequential change detection tasks where all the sensors are observable, POMSCD is much more challenging because the learner not only needs to detect on-the-fly whether a change occurs based on partially observed multi-sensor data streams, but also needs to cleverly choose a subset of informative sensors to observe in the next learning round, in order to maximize the overall sequential change detection performance. In this paper, we present the first online learning study to tackle POMSCD in a systematic and rigorous way. Our approach has twofold novelty: (i) we detect change-points from partial observations effectively by exploiting potential correlations between sensors, and (ii) we formulate the sensor subset selection task as a Multi-Armed Bandit (MAB) problem and develop an effective adaptive sampling strategy using MAB algorithms. We offer a theoretical analysis of the proposed online learning solution, and further validate its empirical performance via an extensive set of numerical studies together with a case study on real-world datasets.
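The sensor-subset selection step can be sketched with a generic UCB-style combinatorial bandit index, empirical mean "informativeness" plus an exploration bonus, though the paper's actual adaptive sampling strategy may differ:

```python
import math

def select_sensors(counts, rewards, t, m):
    """Pick m of n sensors by a UCB index (a generic CMAB-style
    sketch, not the paper's exact rule).

    counts[i]:  times sensor i has been observed so far,
    rewards[i]: cumulative 'informativeness' reward of sensor i,
    t:          current round (t >= 1),
    m:          number of sensors that can be observed next round.
    """
    ucb = []
    for c, r in zip(counts, rewards):
        if c == 0:
            ucb.append(float("inf"))  # force at least one observation
        else:
            ucb.append(r / c + math.sqrt(2 * math.log(t) / c))
    # observe the m sensors with the largest indices
    return sorted(range(len(ucb)), key=lambda i: ucb[i], reverse=True)[:m]
```

At each round the detector would observe the returned subset, update each chosen sensor's count and reward, and feed the partial observations to the change-detection statistic.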

TIST Journal 2018 Journal Article

ResumeVis

  • Chen Zhang
  • Hao Wang

Massive public resume data emerging on the internet indicates individual-related characteristics in terms of profiles and career experiences. Resume Analysis (RA) provides opportunities for many applications, such as recruitment trend prediction, talent seeking, and talent evaluation. Existing RA studies either rely largely on the knowledge of domain experts, or leverage classic statistical or data mining models to identify and filter explicit attributes based on pre-defined rules. However, they fail to discover the latent semantic information in semi-structured resume text, i.e., individual career progress trajectories and social relations, which are otherwise vital to a comprehensive understanding of people's career evolution patterns. Besides, when dealing with large numbers of resumes, properly visualizing such semantic information to reduce the information load and support better human cognition is also challenging. To tackle these issues, we propose a visual analytics system called ResumeVis to mine and visualize resume data. First, a text mining-based approach is presented to extract the semantic information. Then, a set of visualizations is devised to represent the semantic information from multiple perspectives. Through interactive exploration of ResumeVis performed by domain experts, the following tasks can be accomplished: tracing individual career evolution trajectories; mining latent social relations among individuals; and grasping the full picture of collective mobility across massive resumes. Case studies with over 2,500 government officer resumes demonstrate the effectiveness of our system.

AAAI Conference 2017 Conference Paper

Fredholm Multiple Kernel Learning for Semi-Supervised Domain Adaptation

  • Wei Wang
  • Hao Wang
  • Chen Zhang
  • Yang Gao

As a fundamental constituent of machine learning, domain adaptation generalizes a learning model from a source domain to a different (but related) target domain. In this paper, we focus on semi-supervised domain adaptation and explicitly extend the applied range of unlabeled target samples to the combination of distribution alignment and adaptive classifier learning. Specifically, our extension formulates the following aspects in a single optimization: 1) learning a cross-domain predictive model by developing a Fredholm integral based kernel prediction framework; 2) reducing the distribution difference between the two domains; 3) exploring multiple kernels to induce an optimal learning space. Correspondingly, such an extension is distinguished by allowing for noise resiliency, facilitating knowledge transfer, and analyzing diverse data characteristics. We prove the differentiability of our formulation and present an effective optimization procedure based on the reduced gradient, guaranteeing rapid convergence. Comprehensive empirical studies verify the effectiveness of the proposed method.

JBHI Journal 2016 Journal Article

Design and Implementation of an On-Chip Patient-Specific Closed-Loop Seizure Onset and Termination Detection System

  • Chen Zhang
  • Muhammad Awais Bin Altaf
  • Jerald Yoo

This paper presents the design of an area- and energy-efficient closed-loop machine learning-based patient-specific seizure onset and termination detection algorithm, and its on-chip hardware implementation. Application- and scenario-based tradeoffs are compared and reviewed for the seizure detection and suppression algorithm and system, which comprises electroencephalography (EEG) data acquisition, feature extraction, classification, and stimulation. The support vector machine achieves a good tradeoff among power, area, patient specificity, latency, and classification accuracy for long-term monitoring of patients with limited training seizure patterns. Design challenges of EEG data acquisition in a multichannel wearable environment for a patch-type sensor are also discussed in detail. A dual-detector architecture incorporates two area-efficient linear support vector machine classifiers along with a weight-and-average algorithm to target high sensitivity and good specificity at once. On-chip implementation issues for patient-specific transcranial electrical stimulation are also discussed. The system design is verified using the CHB-MIT EEG database [1] with comprehensive measurement criteria, achieving high sensitivity and specificity of 95.1% and 96.2%, respectively, with a small latency of 1 s. It also achieves seizure onset and termination detection delays of 2.98 and 3.82 s, respectively, with a seizure length estimation error of 4.07 s.

AAAI Conference 2015 Conference Paper

Transfer Feature Representation via Multiple Kernel Learning

  • Wei Wang
  • Hao Wang
  • Chen Zhang
  • Fanjiang Xu

Learning an appropriate feature representation across source and target domains is one of the most effective solutions to domain adaptation problems. Conventional cross-domain feature learning methods rely on the Reproducing Kernel Hilbert Space (RKHS) induced by a single kernel. Recently, Multiple Kernel Learning (MKL), which bases classifiers on combinations of kernels, has shown improved performance on tasks without distribution difference between domains. In this paper, we generalize the MKL framework for cross-domain feature learning and propose a novel Transfer Feature Representation (TFR) algorithm. TFR learns a convex combination of multiple kernels and a linear transformation in a single optimization that integrates the minimization of distribution difference with the preservation of discriminating power across domains. As a result, standard machine learning models trained in the source domain can be reused on target domain data. After being rewritten into a differentiable formulation, TFR can be optimized by a reduced gradient method and is guaranteed to converge. Experiments in two real-world applications verify the effectiveness of our proposed method.
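The basic MKL ingredient referenced above, a convex combination of base kernels with weights on the probability simplex, can be sketched as follows (the learned linear transformation and the cross-domain objective are omitted):

```python
import numpy as np

def combined_kernel(Ks, d):
    """Convex combination of base kernel matrices, K = sum_m d_m K_m,
    with d on the probability simplex (d_m >= 0, sum d_m = 1).
    Since each K_m is positive semi-definite and the weights are
    nonnegative, the combination is itself a valid kernel matrix.
    """
    d = np.asarray(d, dtype=float)
    assert np.all(d >= 0) and abs(d.sum() - 1.0) < 1e-9
    return sum(w * K for w, K in zip(d, Ks))
```

In an MKL method such as TFR, the weights d would be optimized jointly with the rest of the objective rather than fixed by hand as here.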

AAAI Conference 2014 Conference Paper

Cross-Domain Metric Learning Based on Information Theory

  • Hao Wang
  • Wei Wang
  • Chen Zhang
  • Fanjiang Xu

Supervised metric learning plays a substantial role in statistical classification. Conventional metric learning algorithms have limited utility when the training data and testing data are drawn from related but different domains (i.e., a source domain and a target domain). Although this issue has seen some progress in feature-based transfer learning, most work in this area suffers from non-trivial optimization and pays little attention to preserving discriminating information. In this paper, we propose a novel metric learning algorithm to transfer knowledge from the source domain to the target domain in an information-theoretic setting, where a shared Mahalanobis distance across the two domains is learnt by combining three goals: 1) reducing the distribution difference between the domains; 2) preserving the geometry of the target domain data; 3) aligning the geometry of the source domain data with its label information. Based on this combination, the learnt Mahalanobis distance effectively transfers the discriminating power and propagates standard classifiers across the two domains. More importantly, our proposed method has a closed-form solution and can be efficiently optimized. Experiments in two real-world applications demonstrate the effectiveness of our proposed method.
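The object being learnt here is a shared Mahalanobis distance; a minimal sketch of evaluating it, with positive semi-definiteness guaranteed by the standard M = A^T A parameterization (the information-theoretic optimization itself is omitted):

```python
import numpy as np

def mahalanobis(x, y, M):
    """Mahalanobis distance d_M(x, y) = sqrt((x - y)^T M (x - y)).

    M must be positive semi-definite for this to be a valid distance;
    writing M = A^T A for any matrix A guarantees that. With M = I
    this reduces to the ordinary Euclidean distance.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ M @ d))
```

Once M has been learnt from both domains, any distance-based classifier (e.g. nearest neighbour) trained on source data can score target data through this single shared metric.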

AAAI Conference 2010 Conference Paper

News Recommendation in Forum-Based Social Media

  • Jia Wang
  • Qing Li
  • Yuanzhu Chen
  • Jiafen Liu
  • Chen Zhang
  • Zhangxi Lin

Self-publication of news on Web sites is becoming a common platform for enabling more engaging interaction among users. Discussion in the form of comments following news postings can be effectively facilitated if the service provider can recommend articles based not only on the original news itself but also on the thread of evolving comments. This turns traditional news recommendation into a "discussion moderator" that can intelligently assist online forums. In this work, we present a framework to implement such adaptive news recommendation. In addition, to alleviate the problem of recommending essentially identical articles, the relationship (duplication, generalization, or specialization) between suggested news articles and the original posting is investigated. Experiments indicate that our proposed solutions provide an enhanced news recommendation service in forum-based social media.

ICAPS Conference 2005 Conference Paper

Planning as Mixed-Initiative Goal Manipulation

  • Michael T. Cox
  • Chen Zhang

Mixed-initiative planning systems attempt to integrate human and AI planners so that the synthesis results in high-quality plans. In the AI community, the dominant model of planning is search. In state-space planning, search consists of backward and forward chaining through the effects and preconditions of operator representations. Although search is an acceptable mechanism for performing automated planning, we present an alternative model to offer the user at the interface of a mixed-initiative planning system. That is, we propose to model planning as a goal manipulation task. Here, planning involves moving goals through a hyperspace in order to reach equilibrium between available resources and the constraints of a dynamic environment. Users can establish and "steer" goals through a visual representation of the planning domain. They can associate resources with particular goals and shift goals along various dimensions in response to changing conditions, as well as change the structure of previous plans. Users need not know the details of the underlying technology, even when search is used within. We empirically examine user performance under both models and find that many users do better with the alternative.