Arrow Research search

Author name cluster

Jing Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

60 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale

  • Yijia Fan
  • jusheng zhang
  • Kaitong Cai
  • Jing Yang
  • Jian Wang
  • Keze Wang

Despite recent advancements in 3D-text cross-modal alignment, existing state-of-the-art methods still struggle to align fine-grained textual semantics with detailed geometric structures, and their alignment performance degrades significantly when scaling to large-scale 3D databases. To overcome this limitation, we introduce 3DAlign-DAER, a unified framework designed to align text and 3D geometry via the proposed dynamic attention policy and the efficient retrieval strategy, capturing subtle correspondences for diverse cross-modal retrieval and classification tasks. Specifically, during training, our proposed dynamic attention policy (DAP) employs the Hierarchical Attention Fusion (HAF) module to represent the alignment as learnable fine-grained token-to-point attentions. To optimize these attentions across different tasks and geometric hierarchies, our DAP further exploits Monte Carlo tree search to dynamically calibrate HAF attention weights via a hybrid reward signal, further enhancing the alignment between textual descriptions and local 3D geometry. During inference, 3DAlign-DAER introduces an Efficient Retrieval Strategy (ERS) to leverage efficient hierarchical searching in large-scale embedding spaces, outperforming traditional methods (e.g., KNN) in accuracy and efficiency. Furthermore, to facilitate text-3D alignment research and train our 3DAlign-DAER, we construct Align3D-2M, a large-scale dataset featuring 2M text-3D pairs, to provide sufficient fine-grained cross-modal annotations. Extensive and comprehensive experiments demonstrate the superior performance of our 3DAlign-DAER on diverse benchmarks.

TAAS Journal 2026 Journal Article

Activity Recognition Empowered Autonomic Systems with Generative Artificial Intelligence

  • Xu Xu
  • Jing Yang
  • Muhammad Attique Khan
  • Vijay Govindarajan
  • Muhammad Asghar Khan
  • Jamel Baili
  • Por Lip Yee

WiFi-based human activity recognition (HAR) is a promising solution for enabling intelligent sensing in autonomic systems such as smart homes and industrial environments. Recent research has explored deep learning models for this task, yet two core challenges persist, i.e., signal degradation caused by environmental noise, and the difficulty of extracting discriminative features from heterogeneous time-frequency domains. To address these limitations, we propose a generative artificial intelligence (GenAI) framework that integrates a conditional diffusion model with a pixel-aware attention mechanism. The diffusion model enhances data quality by reconstructing clean channel state information (CSI) signals through a forward noise injection and a reverse denoising process. The pixel-aware attention module adaptively fuses multi-resolution features from short-time Fourier transform (STFT) and discrete wavelet transform (DWT) spectrograms at channel, spatial, and pixel levels, improving the representation of fine-grained activity patterns. To the best of our knowledge, this is the first work to apply a generative denoising diffusion with fine-grained pixel-level fusion for this task. We evaluate our model on four public datasets, i.e., SignFi, Widar3.0, UT-HAR, and NTU-HAR. Experimental results show that our model consistently outperforms existing methods, demonstrating strong robustness and generalization in both gesture and action recognition tasks.

TIST Journal 2026 Journal Article

Cascade Transformer for Hierarchical Semantic Reasoning in Text-Based Visual Question Answering

  • Yuan Gao
  • Dezhen Feng
  • Laurence T. Yang
  • Jing Yang
  • Xiaowen Jiang
  • Jieming Yang

Text-based visual question answering (TextVQA) aims to answer questions by understanding scene text in images. However, many current methods depend heavily on the accuracy of Optical Character Recognition (OCR) systems while overlooking the significance of visual objects. They tend to perform poorly when the question involves the relationships between visual objects and scene text. To address these issues, we give visual objects a more prominent role and propose a hierarchical semantic reasoning network (CT-HSR) based on the cascade transformer architecture, achieving fine-grained cross-modal reasoning and visual semantic enhancement. Specifically, visual representations containing rich semantic information from the question modality are first obtained through the cross-modal transformer-based vision-language pre-training model. Then, the uni-modal transformer for the unified modality encoding module is used to capture visual objects that are more semantically related to OCR texts. In addition, we further alleviate cross-modal noise interference through the feature filtering strategy. Finally, we better align the three modalities by introducing TextVQA pre-training tasks and generate prediction answers through multi-step iterative prediction during fine-tuning. Extensive experiments on the TextVQA, ST-VQA, and OCR-VQA datasets demonstrate the effectiveness of our proposed model compared to state-of-the-art methods. The code will be released at https://github.com/FTFWO/CT-HSR.

AAAI Conference 2026 Conference Paper

Cost-Effective Communication: An Auction-based Method for Language Agent Interaction

  • Yijia Fan
  • jusheng zhang
  • Kaitong Cai
  • Jing Yang
  • Chengpei Tang
  • Jian Wang
  • Keze Wang

Multi-agent systems (MAS) built on large language models (LLMs) often suffer from inefficient "free-for-all" communication, leading to exponential token costs and low signal-to-noise ratios that hinder their practical deployment. We challenge the notion that more communication is always beneficial, hypothesizing instead that the core issue is the absence of resource rationality. We argue that "free" communication, by ignoring the principle of scarcity, inherently breeds inefficiency and unnecessary expenses. To address this, we introduce the Dynamic Auction-based Language Agent (DALA), a novel framework that treats communication bandwidth as a scarce and tradable resource. Specifically, our DALA regards inter-agent communication as a centralized auction, where agents learn to bid for the opportunity to speak based on the predicted value density of their messages. Thus, our DALA intrinsically encourages agents to produce concise, informative messages while filtering out low-value communication. Extensive and comprehensive experiments demonstrate that our economically-driven DALA achieves new state-of-the-art performance across seven challenging reasoning benchmarks, including 84.32% on MMLU and a 91.21% pass@1 rate on HumanEval. Note that this is accomplished with remarkable efficiency, i.e., our DALA uses only 6.25 million tokens, a fraction of the resources consumed by current state-of-the-art methods on GSM8K. Further analysis reveals that our DALA cultivates the emergent skill of strategic silence, effectively adapting its communication strategies from verbosity to silence in a dynamic manner via resource constraints.
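
The auction idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Bid` class, the whitespace token count, the use of value per token as the bid itself, and the `reserve` threshold that models staying silent. In DALA the bids are learned; here they are simply computed.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    agent: str
    message: str
    predicted_value: float  # hypothetical estimate of how useful the message is

    @property
    def value_density(self) -> float:
        # Value per token; a whitespace split is a crude stand-in for a real tokenizer.
        tokens = max(len(self.message.split()), 1)
        return self.predicted_value / tokens


def run_auction(bids, reserve=0.0):
    """Return the bid with the highest value density, or None when no bid
    exceeds the reserve (every agent stays silent in that round)."""
    eligible = [b for b in bids if b.value_density > reserve]
    if not eligible:
        return None
    return max(eligible, key=lambda b: b.value_density)


bids = [
    Bid("a", "short useful hint", 0.9),
    Bid("b", "a very long rambling message with little new information at all", 1.0),
]
winner = run_auction(bids)
```

Under this toy rule the shorter message wins even though its raw predicted value is lower, which is the kind of pressure toward concise messages the abstract describes.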

JBHI Journal 2026 Journal Article

DGAN-MPCC: A Novel Dual-GAN Enhanced Multi-Positive Contrastive Clustering Method for Omics Data

  • Jingxuan Wang
  • Jing Yang
  • Muhammad Attique Khan
  • Por Lip Yee
  • Jamel Baili
  • Dayu Hu

AI-driven clustering methods have significantly enhanced the capacity of researchers to explore the heterogeneity inherent in single-cell omics data, which is a crucial aspect of understanding complex biological systems in healthcare. Despite these advancements, most existing methods still face two challenges. (1) Inherent sparsity and noise in cell data frequently lead to overfitting in networks. To address this, some researchers have proposed using Generative Adversarial Networks (GANs); however, the conventional single-GAN architecture primarily focuses on simple data enhancement and lacks the capacity to infer complex biological data, leading to suboptimal clustering performance. (2) Contrastive learning has been proposed to obtain high-quality clustering structures; however, existing methods predominantly rely on a single positive pair, which prevents them from modeling and learning continuous transitions in cell states and thus hinders the establishment of feature representations sensitive to cell types. To address these issues, we propose a novel Dual-GAN Enhanced Multi-Positive Contrastive Clustering Method, DGAN-MPCC, tailored for low-quality single-cell data. Specifically, we propose using two independent GANs to simultaneously enhance the quality of both the input and bottleneck layers, thereby refining the generated cell embedding. Additionally, we have developed a multi-positive contrastive clustering framework that adaptively defines a multi-positive set from clustering structures, enabling each sample to establish positive relationships with all samples within the same cluster, thereby diversifying supervisory signals within the same class. Extensive experiments on several real-world single-cell datasets demonstrate that DGAN-MPCC surpasses current methods across multiple scenarios, providing a more robust and efficient tool for AI-driven decision-making in healthcare.

AAAI Conference 2026 Conference Paper

DomainCQA: Crafting Knowledge-Intensive QA from Domain-Specific Charts

  • Yujing Lu
  • Ling Zhong
  • Jing Yang
  • Weiming Li
  • Peng Wei
  • Yongheng Wang
  • Manni Duan
  • Qing Zhang

Chart Question Answering (CQA) evaluates Multimodal Large Language Models (MLLMs) on visual understanding and reasoning over chart data. However, existing benchmarks mostly test surface-level parsing, such as reading labels and legends, while overlooking deeper scientific reasoning. We propose DomainCQA, a framework for constructing domain-specific CQA benchmarks that emphasize both visual comprehension and knowledge-intensive reasoning. It integrates complexity-aware chart selection, multitier QA generation, and expert validation. Applied to astronomy, DomainCQA yields AstroChart, a benchmark of 1,690 QA pairs over 482 charts, exposing persistent weaknesses in fine-grained perception, numerical reasoning, and domain knowledge integration across 21 MLLMs. Fine-tuning on AstroChart improves performance across fundamental and advanced tasks. Pilot QA sets in biochemistry, economics, medicine, and social science further demonstrate DomainCQA’s generality. Together, our results establish DomainCQA as a unified pipeline for constructing and augmenting domain-specific chart reasoning benchmarks.

AAAI Conference 2026 Conference Paper

HiVA: Self-organized Hierarchical Variable Agent via Goal-driven Semantic-Topological Evolution

  • Jinzhou Tang
  • jusheng zhang
  • Qinhan Lv
  • Sidi Liu
  • Jing Yang
  • Chengpei Tang
  • Keze Wang

Autonomous agents play a crucial role in advancing Artificial General Intelligence, enabling problem decomposition and tool orchestration through Large Language Models (LLMs). However, existing paradigms face a critical trade-off. On one hand, reusable fixed workflows require manual reconfiguration upon environmental changes; on the other hand, flexible reactive loops fail to distill reasoning progress into transferable structures. We introduce Hierarchical Variable Agent (HiVA), a novel framework modeling agentic workflows as self-organized graphs with the Semantic-Topological Evolution (STEV) algorithm, which optimizes hybrid semantic-topological spaces using textual gradients as discrete-domain surrogates for backpropagation. The iterative process comprises Multi-Armed Bandit-infused forward routing, diagnostic gradient generation from environmental feedback, and coordinated updates that co-evolve individual semantics and topology for collective optimization in unknown environments. Experiments on dialogue, coding, Long-context Q&A, mathematical, and agentic benchmarks demonstrate improvements of 5-10% in task accuracy and enhanced resource efficiency over existing baselines, establishing HiVA's effectiveness in autonomous task execution.

AAAI Conference 2026 Conference Paper

Multi-Metric Preference Alignment for Generative Speech Restoration

  • Junan Zhang
  • Xueyao Zhang
  • Jing Yang
  • Yuancheng Wang
  • Fan Fan
  • Zhizheng Wu

Recent generative models have significantly advanced speech restoration tasks, yet their training objectives often misalign with human perceptual preferences, resulting in suboptimal quality. While post-training alignment has proven effective in other generative domains like text and image generation, its application to generative speech restoration remains largely under-explored. This work investigates the challenges of applying preference-based post-training to this task, focusing on how to define a robust preference signal and curate high-quality data to avoid reward hacking. To address these challenges, we propose a multi-metric preference alignment strategy. We construct a new dataset, GenSR-Pref, comprising 80K preference pairs, where each chosen sample is unanimously favored by a complementary suite of metrics covering perceptual quality, signal fidelity, content consistency, and timbre preservation. This principled approach ensures a holistic preference signal. Applying Direct Preference Optimization (DPO) with our dataset, we observe consistent and significant performance gains across three diverse generative paradigms: autoregressive models (AR), masked generative models (MGM), and flow-matching models (FM) on various restoration benchmarks, in both objective and subjective evaluations. Ablation studies confirm the superiority of our multi-metric strategy over single-metric approaches in mitigating reward hacking. Furthermore, we demonstrate that our aligned models can serve as powerful "data annotators", generating high-quality pseudo-labels to serve as a supervision signal for traditional discriminative models in data-scarce scenarios like singing voice restoration.
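
A rough sketch of the two ingredients named in this abstract, unanimous multi-metric pair selection and the DPO objective, might look like the following. The function names, the metric names, and the convention that a higher score is better are assumptions for illustration. The loss is the standard single-pair DPO form; how sequence log-likelihoods are obtained for AR, MGM, or FM models is outside this sketch.

```python
import math
from itertools import permutations


def unanimous_pairs(candidates):
    """candidates: list of (sample_id, {metric: score}) sharing the same metric
    keys, with higher scores assumed to be better.
    Returns (chosen_id, rejected_id) pairs where the chosen sample is strictly
    better on every metric."""
    pairs = []
    for (a_id, a), (b_id, b) in permutations(candidates, 2):
        if all(a[m] > b[m] for m in a):
            pairs.append((a_id, b_id))
    return pairs


def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log sigmoid(beta * margin), where the
    margin compares policy and reference log-likelihood ratios."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    # -log(sigmoid(z)) == log(1 + exp(-z))
    return math.log1p(math.exp(-beta * margin))


candidates = [
    ("x", {"quality": 4.1, "fidelity": 0.9}),
    ("y", {"quality": 3.0, "fidelity": 0.8}),
    ("z", {"quality": 4.5, "fidelity": 0.7}),
]
pairs = unanimous_pairs(candidates)
```

Sample `z` has the best quality score but the worst fidelity, so it forms no pair at all; requiring agreement across metrics is what keeps a single gameable metric from defining the preference.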

AAAI Conference 2026 Conference Paper

RaCoT: Plug-and-Play Contrastive Example Generation Mechanism for Enhanced LLM Reasoning Reliability

  • Kaitong Cai
  • jusheng zhang
  • Yijia Fan
  • Jing Yang
  • Keze Wang

Retrieval-Augmented Generation (RAG) faces a core bottleneck with knowledge-sparse and semantically ambiguous long-tail queries, where retrieval noise distorts reasoning and necessitates costly post-processing. To tackle this, we propose RaCoT (Retrieval-aware Contrastive-of-Thought), a novel framework that shifts contrastive thinking to the pre-retrieval stage. By automatically generating a semantically adjacent yet differently answered contrastive question and extracting a Δ-Prompt to capture their key differences, RaCoT guides the model to proactively focus on the "critical details that determine answer divergence." This approach allows it to suppress semantic interference within a single retrieval pass, overcoming the theoretical bottleneck of single-vector queries that struggle to simultaneously encode signals for what to attend to and what to ignore. On six authoritative benchmarks, including PopQA and TriviaQA-unfiltered, RaCoT outperforms strong baselines like RankRAG and Self-RAG by 0.9-2.4 percentage points. It exhibits superior robustness, with a performance drop of only 8.6% in adversarial tests, far surpassing the over 15% degradation in other methods. Furthermore, its low latency (3.12s) and token overhead (11.54) place it on the accuracy-efficiency Pareto frontier, while ablation studies validate the necessity of each component. Ultimately, RaCoT reframes the RAG paradigm from "post-hoc context cleaning" to "a priori shaping of discriminative reasoning," offering an efficient and robust path toward reliable AI systems for real-time, resource-constrained deployments.

AAAI Conference 2026 Conference Paper

Top-Down Semantic Refinement for Image Captioning

  • jusheng zhang
  • Kaitong Cai
  • Jing Yang
  • Jian Wang
  • Chengpei Tang
  • Keze Wang

Large Vision-Language Models (VLMs) face an inherent contradiction in image captioning: their powerful single-step generation capabilities often lead to a myopic decision-making process. This makes it difficult to maintain global narrative coherence while capturing rich details, a limitation that is particularly pronounced in tasks that require multi-step and complex scene description. To overcome this fundamental challenge, we redefine image captioning as a goal-oriented hierarchical refinement planning problem, and further propose a novel framework, named Top-Down Semantic Refinement (TDSR), which models the generation process as a Markov Decision Process (MDP). However, planning within the vast state space of a VLM presents a significant computational hurdle. Our core contribution, therefore, is the design of a highly efficient Monte Carlo Tree Search (MCTS) algorithm tailored for VLMs. By incorporating a visual-guided parallel expansion and a lightweight value network, our TDSR reduces the call frequency to the expensive VLM by an order of magnitude without sacrificing planning quality. Furthermore, an adaptive early stopping mechanism dynamically matches computational overhead to the image's complexity. Extensive experiments on multiple benchmarks, including DetailCaps, COMPOSITIONCAP, and POPE, demonstrate that our TDSR, as a plug-and-play module, can significantly enhance the performance of existing VLMs (e.g., LLaVA-1.5, Qwen2.5-VL) by achieving state-of-the-art or highly competitive results in fine-grained description, compositional generalization, and hallucination suppression.

AAAI Conference 2026 Conference Paper

Towards Multimodal Continual Knowledge Embedding with Modality Forgetting Modulation

  • Xiaowen Jiang
  • Jing Yang
  • ShunDong Yang
  • Yuan Gao
  • Xinfa Jiang
  • Laurence Tianruo Yang
  • Jieming Yang

The continuous emergence of new entities, relations, triples, and multimodal information drives the dynamic evolution of multimodal knowledge graph (MMKG). However, existing MMKG embedding models follow a static setting, where training from scratch for growing MMKG wastes learned knowledge, while fine-tuning on new knowledge easily leads to catastrophic forgetting, severely limiting their applicability in real-world scenarios. To address this, we propose a multimodal continual representation learning framework (MoFot) for growing MMKG. Unlike existing static multimodal embedding methods, MoFot focuses on alleviating catastrophic forgetting rather than retraining to adapt to new knowledge. Specifically, MoFot effectively mitigates catastrophic forgetting caused by parameter updates and differing forgetting rates across modalities through a multimodal collaborative modulation mechanism. The mechanism ensures consistent retention of previously learned multimodal knowledge across snapshots through multimodal weight modulation and multimodal feature modulation. MoFot outperforms existing MMKG embedding, KG continual learning, and MMKG inductive models. Experimental results demonstrate that MoFot not only avoids forgetting but also enhances old knowledge by learning new knowledge, achieving adaptation to new knowledge while mitigating forgetting of old knowledge.

JBHI Journal 2025 Journal Article

Diffusion Model with Relation-Aware Attention and Edge-Aware Constraint for Multi-Modal Brain Tumor Segmentation

  • Xu Xu
  • Jing Yang
  • Dayu Hu
  • Muhammad Attique Khan
  • Lip Yee Por
  • Congsheng Li

Multi-modal brain tumor segmentation is a critical step for diagnosing and monitoring brain-related diseases. Many studies have developed models for this task, but two challenges remain, i.e., weak feature aggregation and poorly segmented edges. To address these issues, we develop an improved diffusion model with relation-aware attention and an edge-aware constraint, namely Diff-RE, for multi-modal brain tumor segmentation. Specifically, the volume data and the noisy segmentation label map are fed in parallel into the encoder module to extract high-level features. During training, weights are shared to ensure consistency. Then, the extracted features are concatenated channel-wise and passed through the relation-aware attention module, which enhances appearance features using global structural relationships. Finally, the decoder module processes the attention-enhanced features to generate segmentation results. To improve boundary accuracy, an edge-aware constraint module is introduced during training. Our framework is trained and evaluated using three benchmark datasets, i.e., BraTS 2018, 2019, and 2020. Experimental results demonstrate that Diff-RE is effective and highlight its superiority over peer methods.

JBHI Journal 2025 Journal Article

EMPOWER: Evolutionary Medical Prompt Optimization With Reinforcement Learning

  • Yinda Chen
  • Yangfan He
  • Jing Yang
  • Dapeng Zhang
  • Zhenlong Yuan
  • Muhammad Attique Khan
  • Jamel Baili
  • Por Lip Yee

Prompt engineering significantly influences the reliability and clinical utility of Large Language Models (LLMs) in medical applications. Current optimization approaches inadequately address domain-specific medical knowledge and safety requirements. This paper introduces EMPOWER, a novel evolutionary framework that enhances medical prompt quality through specialized representation learning, multi-dimensional evaluation, and structure-preserving algorithms. Our methodology incorporates: (1) a medical terminology attention mechanism, (2) a comprehensive assessment architecture evaluating clarity, specificity, clinical relevance, and factual accuracy, (3) a component-level evolutionary algorithm preserving clinical reasoning integrity, and (4) a semantic verification module ensuring adherence to medical knowledge. Evaluation across diagnostic, therapeutic, and educational tasks demonstrates significant improvements: 24.7% reduction in factually incorrect content, 19.6% enhancement in domain specificity, and 15.3% higher clinician preference in blinded evaluations. The framework addresses critical challenges in developing clinically appropriate prompts, facilitating more responsible integration of LLMs into healthcare settings.

ICML Conference 2025 Conference Paper

Fixing the Double Penalty in Data-Driven Weather Forecasting Through a Modified Spherical Harmonic Loss Function

  • Christopher Subich
  • Syed Zahid Husain
  • Leo Separovic
  • Jing Yang

Recent advancements in data-driven weather forecasting models have delivered deterministic models that outperform the leading operational forecast systems based on traditional, physics-based models. However, these data-driven models are typically trained with a mean squared error loss function, which causes smoothing of fine scales through a “double penalty” effect. We develop a simple, parameter-free modification to this loss function that avoids this problem by separating the loss attributable to decorrelation from the loss attributable to spectral amplitude errors. Fine-tuning the GraphCast model with this new loss function results in sharp deterministic weather forecasts, an increase of the model’s effective resolution from 1,250 km to 160 km, improvements to ensemble spread, and improvements to predictions of tropical cyclone strength and surface wind extremes.
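
The separation of amplitude error from decorrelation error mentioned in this abstract can be motivated by a generic identity for a single complex spectral coefficient. This is a textbook decomposition offered as a sketch, not the paper's actual loss, which is defined over spherical harmonic spectra; the symbols $\hat a$, $a$, $\Delta\phi$, and $c$ are notation introduced here.

```latex
% Prediction \hat a = |\hat a| e^{i\hat\phi}, target a = |a| e^{i\phi},
% phase difference \Delta\phi = \hat\phi - \phi.
\begin{aligned}
|\hat a - a|^2
  &= |\hat a|^2 + |a|^2 - 2\,|\hat a|\,|a|\cos\Delta\phi \\
  &= \underbrace{\bigl(|\hat a| - |a|\bigr)^2}_{\text{amplitude error}}
   + \underbrace{2\,|\hat a|\,|a|\,\bigl(1 - \cos\Delta\phi\bigr)}_{\text{decorrelation error}} .
\end{aligned}

% If the phase is only partly predictable, with c = \mathbb{E}[\cos\Delta\phi],
% the expected squared error |\hat a|^2 + |a|^2 - 2 c\,|\hat a|\,|a|
% is minimized at |\hat a| = c\,|a|.
```

Because the minimizing amplitude shrinks with $c$, a plain squared-error objective damps exactly those scales whose phase is hard to predict, which is one way to read the smoothing of fine scales described above.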

ICLR Conference 2025 Conference Paper

Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping

  • Tianhao (Walter) Wu
  • Jing Yang
  • Zhilin Guo 0001
  • Jingyi Wan
  • Fangcheng Zhong
  • Cengiz Öztireli

The ability to reconstruct realistic and controllable upper body avatars from casual monocular videos is critical for various applications in communication and entertainment. By equipping the most recent 3D Gaussian Splatting representation with head 3D morphable models (3DMM), existing methods manage to create head avatars with high fidelity. However, most existing methods only reconstruct a head without the body, substantially limiting their application scenarios. We found that naively applying Gaussians to model the clothed chest and shoulders tends to result in blurry reconstruction and noisy floaters under novel poses. This is because of the fundamental limitation of Gaussians and point clouds -- each Gaussian or point can only have a single directional radiance without spatial variance, therefore an unnecessarily large number of them is required to represent complicated spatially varying texture, even for simple geometry. In contrast, we propose to model the body part with a neural texture that consists of coarse and pose-dependent fine colors. To properly render the body texture for each view and pose without accurate geometry or UV mapping, we optimize another sparse set of Gaussians as anchors that constrain the neural warping field that maps image plane coordinates to the texture space. We demonstrate that Gaussian Head & Shoulders can fit the high-frequency details on the clothed upper body with high fidelity and potentially improve the accuracy and fidelity of the head region. We evaluate our method with casual phone-captured and internet videos and show that our method achieves superior reconstruction quality and robustness in both self- and cross-reenactment tasks. To fully utilize the efficient rendering speed of Gaussian splatting, we additionally propose an accelerated inference method for our trained model without Multi-Layer Perceptron (MLP) queries and reach a stable rendering speed of around 130 FPS for any subject.

NeurIPS Conference 2025 Conference Paper

Greedy Sampling Is Provably Efficient For RLHF

  • Di Wu
  • Chengshuai Shi
  • Jing Yang
  • Cong Shen

Reinforcement Learning from Human Feedback (RLHF) has emerged as a key technique for post-training large language models. Despite its empirical success, the theoretical understanding of RLHF is still limited, as learning the KL-regularized target with only preference feedback poses additional challenges compared with canonical RL. Existing works mostly study the reward-based Bradley-Terry (BT) preference model, and extend classical designs utilizing optimism or pessimism. This work, instead, considers the general preference model (whose practical relevance has been observed recently) and obtains performance guarantees with major, order-wise improvements over existing ones. Surprisingly, these results are derived from algorithms that directly use empirical estimates (i.e., greedy sampling), as opposed to constructing optimistic or pessimistic estimates in previous works. This insight has a deep root in the unique structural property of the optimal policy class under the KL-regularized target, and we further specialize it to the BT model, highlighting the surprising sufficiency of greedy sampling in RLHF.
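
For the reward-based (BT) special case, the structural property of the KL-regularized target is the well-known closed form in which the optimal policy is the reference policy reweighted by exponentiated reward. The sketch below computes it for a finite action set; the function name and the finite-action setting are assumptions for illustration, and the general preference model studied in the paper is not covered here. Greedy sampling, in this picture, amounts to plugging an empirical reward estimate into the same formula without an optimism or pessimism bonus.

```python
import math


def kl_regularized_policy(ref_probs, rewards, beta):
    """Closed-form maximizer of E_pi[r] - beta * KL(pi || pi_ref) over a finite
    action set: pi(a) is proportional to pi_ref(a) * exp(r(a) / beta).
    Assumes every reference probability is strictly positive."""
    logits = [math.log(p) + r / beta for p, r in zip(ref_probs, rewards)]
    top = max(logits)  # subtract the max before exponentiating for stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical empirical reward estimates for two responses to one prompt.
policy = kl_regularized_policy([0.5, 0.5], [1.0, 0.0], beta=1.0)
```

As `beta` grows the result stays close to the reference policy, and as it shrinks the policy concentrates on the response with the highest estimated reward.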

JBHI Journal 2025 Journal Article

IoT-Driven Skin Cancer Detection: Active Learning and Hyperparameter Optimization for Enhanced Accuracy

  • Jing Yang
  • Haoshen Qin
  • Jinli Wang
  • Por Lip Yee
  • Sunil Prajapat
  • Gyanendra Kumar
  • Balamurugan Balusamy
  • Ali Kashif Bashir

Skin cancer, one of the most prevalent and lethal cancer types, poses significant challenges for early diagnosis due to the diversity in lesion size, shape, color, and surface reflections. The Internet of Things (IoT) has revolutionized healthcare by enabling real-time data exchange and supporting advancements in automated diagnosis through deep learning (DL) techniques such as convolutional neural networks (CNNs). However, CNNs often require large, labeled datasets, which are costly and time-consuming to compile. To address these challenges, we propose an innovative active learning (AL) framework driven by deep reinforcement learning (DRL) and a novel scope loss function. This framework optimizes classification while reducing reliance on extensive labeled data. Unlike traditional active learning techniques that rely on static selection methods, our model dynamically incorporates DRL for strategic sample selection during training. The scope loss function balances the exploitation of labeled data with the exploration of new, unlabeled data, enabling efficient training. Additionally, an enhanced artificial bee colony (ABC) algorithm with a mutual learning strategy optimizes hyperparameter tuning, boosting model performance. Evaluated on the International Skin Imaging Collaboration (ISIC) and Human Against Machine with 10000 training images (HAM10000) datasets, the proposed framework achieved high accuracy, with F-measures of 92.791% and 91.984%, respectively. This novel approach demonstrates significant potential to advance early skin cancer detection, offering a reliable and efficient tool for healthcare professionals.

JBHI Journal 2025 Journal Article

Large Model Driven Multi-Granularity Medical Image Analysis: A Fuzzy Logic-Guided Framework

  • Guan Wang
  • Mingyu Xu
  • Chao Li
  • Xingsi Xue
  • Bo Yi
  • Jing Yang

The analysis of medical images requires sophisticated computational approaches that can handle the inherent complexity and uncertainty present in pathological structures. This paper presents a large model driven framework that integrates fuzzy logic principles with transformer-based architectures to enable multi-granularity medical image analysis. The proposed approach, termed ULVM-MG, employs a sophisticated feature extraction strategy that simultaneously processes pathological images at coarse, medium, and fine granularity levels, mirroring the systematic examination methodology employed by experienced pathologists. In particular, a fuzzy-guided cross-attention mechanism directs the transformer's attention toward diagnostically significant regions while preserving essential contextual information. Comprehensive evaluation on histopathological datasets demonstrates superior performance compared to state-of-the-art transformer-based approaches. ULVM-MG achieves 98.76% and 97.34% accuracy on the LC25000 and NCT datasets, respectively, outperforming the best baseline by 1.61% and 2.17%. The framework excels particularly in distinguishing morphologically similar tissue types and benign versus malignant classification tasks. Ablation studies confirm the critical contributions of multi-granularity processing and fuzzy uncertainty modeling, with statistical analysis revealing significant performance improvements across all evaluation metrics.

ICLR Conference 2025 Conference Paper

MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction

  • Jing Yang
  • Minyue Jiang
  • Sen Yang
  • Xiao Tan 0001
  • Yingying Li
  • Errui Ding
  • Jingdong Wang 0001
  • Hanli Wang

The construction of vectorized high-definition maps typically requires capturing both category and geometry information of map elements. Current state-of-the-art methods often adopt either point-level or instance-level representation alone, overlooking the strong intrinsic relationship between points and instances. In this work, we propose a simple yet efficient framework named MGMapNet (multi-granularity map network) to model map elements with multi-granularity representation, integrating both coarse-grained instance-level and fine-grained point-level queries. Specifically, these two granularities of queries are generated from the multi-scale bird's eye view features using a proposed multi-granularity aggregator. In this module, the instance-level query aggregates features over the entire scope covered by an instance, and the point-level query aggregates features locally. Furthermore, a point-instance interaction module is designed to encourage information exchange between instance-level and point-level queries. Experimental results demonstrate that the proposed MGMapNet achieves state-of-the-art performance, surpassing MapTRv2 by 5.3 mAP on the nuScenes dataset and 4.4 mAP on the Argoverse2 dataset.

JBHI Journal 2025 Journal Article

Multi-Task Collaborative Assisted Training Method for Grouping Fuzzy Categories Classification of Cervical Cancer Cells

  • Yizhou Chen
  • Huiyan Jiang
  • Wenbo Pang
  • Zhaoshuo Diao
  • Jing Yang

Cervical cancer is a malignant tumor that endangers women's life and health. While deep learning has enhanced the accuracy of cervical cell classification, there remain obstacles impeding further performance enhancement, including the similarities between different categories, variability between single cells and cell clusters, as well as the accuracy of annotations. To address these issues, a novel multi-task collaborative framework for cervical cell classification is proposed. Specifically, to solve the similarity between different categories, we propose a grouping cell contrast auxiliary branch, which divides cervical cells into different groups and utilizes supervised contrastive learning to learn representative feature between different categories. And we introduce a multi-level cell classification auxiliary branch that simultaneously performs 5-class, 3-class, and 2-class classification tasks, and explicitly constrains the inter-class relationship learning of cervical cells. Furthermore, to solve the variations within the same category of single cells and cell clusters, we propose an image reconstruction auxiliary branch, which encourages the model to learn more contextual features. Finally, to solve subjectivity and accuracy of annotations, we introduce a soft label distillation auxiliary branch, which constrains the consistency of probability distributions between the encoder and the momentum encoder. It is worth noting that these auxiliary branches only work during training and will not add additional computational consumption during inference. We validate on the HSJCC, DSCC and SIPaKMeD datasets. Compared to existing methods, our approach has achieved outstanding performance and effectively mitigates the issues raised, demonstrating its effectiveness in automated cervical cell classification.

JBHI Journal 2025 Journal Article

Quantum-Resistant Privacy Preservation for Mobile Healthcare Services in Connected Transportation Systems via Deep Neural Architectures

  • Xinyue Li
  • Bo Yi
  • Xingsi Xue
  • Zhi Wang
  • Jing Yang

The rapid convergence of connected transportation networks and real-time healthcare services has given rise to new security and privacy challenges. Conventional cryptographic mechanisms, primarily designed for classical adversaries, may soon be rendered obsolete by quantum computers, posing dire risks to the confidentiality of sensitive medical data. This work proposes a quantum-resistant privacy preservation framework for mobile healthcare systems operating in vehicular networks. Leveraging lattice-based cryptography, specifically Ring Learning-with-Errors (Ring-LWE), our approach ensures robust encryption and key management, rendering patient data impervious to quantum-based attacks. Complementing this cryptographic layer is a deep neural network architecture that integrates convolutional and attention-based modules to detect network anomalies with high accuracy and minimal latency. We demonstrate the feasibility of our method through comprehensive experiments that measure (1) cryptographic overhead, (2) intrusion detection effectiveness, and (3) end-to-end system performance under realistic conditions and varied load scenarios. Experimental results show that the proposed scheme can maintain sub-100 ms end-to-end latencies for healthcare data transfer in high-traffic urban networks, detecting a wide range of attacks at accuracy levels exceeding 95%. These findings underscore the potential of combining post-quantum cryptographic primitives with advanced deep learning to secure time-sensitive medical applications within next-generation intelligent transportation systems.

NeurIPS Conference 2025 Conference Paper

Unlabeled Data Can Provably Enhance In-Context Learning of Transformers

  • Renpu Liu
  • Jing Yang

Large language models (LLMs) exhibit impressive in‑context learning (ICL) capabilities, yet the quality of their predictions is fundamentally limited by the few costly labeled demonstrations that can fit into a prompt. Meanwhile, there exist vast and continuously growing amounts of unlabeled data that may be closely related to the ICL task. How to utilize such unlabeled data to provably enhance the performance of ICL thus becomes an emerging fundamental question. In this work, we propose a novel augmented ICL framework, in which the prompt includes a small set of labeled examples alongside a block of unlabeled inputs. We focus on the multi-class linear classification setting and demonstrate that, with chain-of-thought (CoT) prompting, a multi-layer transformer can effectively emulate an expectation–maximization (EM) algorithm. This enables the transformer to implicitly extract useful information from both labeled and unlabeled data, leading to provable improvements in ICL accuracy. Moreover, we show that such a transformer can be trained via teacher forcing, with its parameters converging to the desired solution at a linear rate. Experiments demonstrate that the augmented ICL framework consistently outperforms conventional few-shot ICL, providing empirical support for our theoretical findings. To the best of our knowledge, this is the first theoretical study on the impact of unlabeled data on the ICL performance of transformers.

NeurIPS Conference 2024 Conference Paper

Efficient Prompt Optimization Through the Lens of Best Arm Identification

  • Chengshuai Shi
  • Kun Yang
  • Zihan Chen
  • Jundong Li
  • Jing Yang
  • Cong Shen

The remarkable instruction-following capability of large language models (LLMs) has sparked a growing interest in automatically finding good prompts, i.e., prompt optimization. Most existing works follow the scheme of selecting from a pre-generated pool of candidate prompts. However, these designs mainly focus on the generation strategy, while limited attention has been paid to the selection method. Especially, the cost incurred during the selection (e.g., accessing LLM and evaluating the responses) is rarely explicitly considered. To overcome this limitation, this work provides a principled framework, TRIPLE, to efficiently perform prompt selection under an explicit budget constraint. TRIPLE is built on a novel connection established between prompt optimization and fixed-budget best arm identification (BAI-FB) in multi-armed bandits (MAB); thus, it is capable of leveraging the rich toolbox from BAI-FB systematically and also incorporating unique characteristics of prompt optimization. Extensive experiments on multiple well-adopted tasks using various LLMs demonstrate the remarkable performance improvement of TRIPLE over baselines while satisfying the limited budget constraints. As an extension, variants of TRIPLE are proposed to efficiently select examples for few-shot prompts, also achieving superior empirical performance.

NeurIPS Conference 2024 Conference Paper

Federated Online Prediction from Experts with Differential Privacy: Separations and Regret Speed-ups

  • Fengyu Gao
  • Ruiquan Huang
  • Jing Yang

We study the problems of differentially private federated online prediction from experts against both *stochastic adversaries* and *oblivious adversaries*. We aim to minimize the average regret on $m$ clients working in parallel over time horizon $T$ with explicit differential privacy (DP) guarantees. With stochastic adversaries, we propose a **Fed-DP-OPE-Stoch** algorithm that achieves $\sqrt{m}$-fold speed-up of the per-client regret compared to the single-player counterparts under both pure DP and approximate DP constraints, while maintaining logarithmic communication costs. With oblivious adversaries, we establish non-trivial lower bounds indicating that *collaboration among clients does not lead to regret speed-up with general oblivious adversaries*. We then consider a special case of the oblivious adversaries setting, where there exists a low-loss expert. We design a new algorithm **Fed-SVT** and show that it achieves an $m$-fold regret speed-up under both pure DP and approximate DP constraints over the single-player counterparts. Our lower bound indicates that Fed-SVT is nearly optimal up to logarithmic factors. Experiments demonstrate the effectiveness of our proposed algorithms. To the best of our knowledge, this is the first work examining the differentially private online prediction from experts in the federated setting.

JBHI Journal 2024 Journal Article

Improving Needle Tip Tracking and Detection in Ultrasound-Based Navigation System Using Deep Learning-Enabled Approach

  • Hui Che
  • Jiaxin Qin
  • Yao Chen
  • Zihan Ji
  • Yibo Yan
  • Jing Yang
  • Qi Wang
  • Chaofeng Liang

Ultrasound-guided percutaneous interventions have numerous advantages over traditional techniques. Accurate needle placement in the target anatomy is crucial for successful intervention, and reliable visual information is essential to achieve this. However, previous studies have revealed several challenges, such as the variability in needle echogenicity and the common misalignment of the ultrasound beam and the needle. Advanced techniques have been developed to optimize needle visualization, including hardware-based and image-processing-based methods. This paper proposes a novel strategy of integrating ultrasound-based deep learning approaches into an optical navigation system to enhance needle visualization and improve tip positioning accuracy. Both the tracking and detection algorithms are optimized utilizing optical tracking information. The information is introduced into the tracking network to define the search patch update strategy and form a trajectory reference to correct tracking results. In the detection network, the original image is processed according to the needle insertion position and current position given by the optical localization system to locate a coarse region, and the depth-score criterion is adopted to optimize detection results. Extensive experiments demonstrate that our approach achieves promising tip tracking and detection performance with tip localization errors of 1.11 $\pm$ 0.59 mm and 1.17 $\pm$ 0.70 mm, respectively. Moreover, we establish a paired dataset consisting of ultrasound images and their corresponding spatial tip coordinates acquired from the optical tracking system and conduct real puncture experiments to verify the effectiveness of the proposed methods. Our approach significantly improves needle visualization and provides physicians with visual guidance for posture adjustment.

NeurIPS Conference 2024 Conference Paper

Non-asymptotic Convergence of Training Transformers for Next-token Prediction

  • Ruiquan Huang
  • Yingbin Liang
  • Jing Yang

Transformers have achieved extraordinary success in modern machine learning due to their excellent ability to handle sequential data, especially in next-token prediction (NTP) tasks. However, the theoretical understanding of their performance in NTP is limited, with existing studies focusing mainly on asymptotic performance. This paper provides a fine-grained non-asymptotic analysis of the training dynamics of a one-layer transformer consisting of a self-attention module followed by a feed-forward layer. We first characterize the essential structural properties of training datasets for NTP using a mathematical framework based on partial orders. Then, we design a two-stage training algorithm, where the pre-processing stage for training the feed-forward layer and the main stage for training the attention layer exhibit fast convergence performance. Specifically, both layers converge sub-linearly to the direction of their corresponding max-margin solutions. We also show that the cross-entropy loss enjoys a linear convergence rate. Furthermore, we show that the trained transformer presents non-trivial prediction ability with dataset shift, which sheds light on the remarkable generalization performance of transformers. Our analysis technique involves the development of novel properties on the attention gradient and further in-depth analysis of how these properties contribute to the convergence of the training process. Our experiments further validate our theoretical findings.

AAAI Conference 2024 Conference Paper

PMET: Precise Model Editing in a Transformer

  • Xiaopeng Li
  • Shasha Li
  • Shezheng Song
  • Jing Yang
  • Jun Ma
  • Jie Yu

Model editing techniques modify a minor proportion of knowledge in Large Language Models (LLMs) at a relatively low cost and have demonstrated notable success. Existing methods assume Transformer Layer (TL) hidden states are values of key-value memories of the Feed-Forward Network (FFN). They usually optimize the TL hidden states to memorize target knowledge and use them to update the weights of the FFN in LLMs. However, the information flow of TL hidden states comes from three parts: Multi-Head Self-Attention (MHSA), FFN, and residual connections. Existing methods neglect the fact that the TL hidden states contain information not specifically required for FFN. Consequently, the performance of model editing decreases. To achieve more precise model editing, we analyze hidden states of MHSA and FFN, finding that MHSA encodes certain general knowledge extraction patterns. This implies that MHSA weights do not require updating when new knowledge is introduced. Based on the above findings, we introduce PMET, which simultaneously optimizes Transformer Component (TC, namely MHSA and FFN) hidden states, while only using the optimized TC hidden states of FFN to precisely update FFN weights. Our experiments demonstrate that PMET exhibits state-of-the-art performance on both the COUNTERFACT and zsRE datasets. Our ablation experiments substantiate the effectiveness of our enhancements, further reinforcing the finding that the MHSA encodes certain general knowledge extraction patterns and indicating its storage of a small amount of factual knowledge. Our code is available at https://github.com/xpq-tech/PMET.

NeurIPS Conference 2024 Conference Paper

Transformers as Game Players: Provable In-context Game-playing Capabilities of Pre-trained Models

  • Chengshuai Shi
  • Kun Yang
  • Jing Yang
  • Cong Shen

The in-context learning (ICL) capability of pre-trained models based on the transformer architecture has received growing interest in recent years. While theoretical understanding has been obtained for ICL in reinforcement learning (RL), the previous results are largely confined to the single-agent setting. This work proposes to further explore the in-context learning capabilities of pre-trained transformer models in competitive multi-agent games, i.e., in-context game-playing (ICGP). Focusing on the classical two-player zero-sum games, theoretical guarantees are provided to demonstrate that pre-trained transformers can provably learn to approximate Nash equilibrium in an in-context manner for both decentralized and centralized learning settings. As a key part of the proof, constructional results are established to demonstrate that the transformer architecture is sufficiently rich to realize celebrated multi-agent game-playing algorithms, in particular, decentralized V-learning and centralized VI-ULCB.

JBHI Journal 2023 Journal Article

An Uncertainty-Aware and Sex-Prior Guided Biological Age Estimation From Orthopantomogram Images

  • Dong Zhang
  • Jing Yang
  • Shaoyi Du
  • Wenqing Bu
  • Yu-cheng Guo

Bone age, as a measure of biological age (BA), plays an important role in a variety of fields, including forensics, orthodontics, sports, and immigration. Despite its significance, accurate estimation of BA remains a challenge due to the uncertainty error between BA and chronological age (CA) caused by individual diversity and the difficult integration of multiple factors, such as sex and identified or measured anatomical structures, into the estimation process. To address these problems, we propose an uncertainty-aware and sex-prior guided biological age estimation from orthopantomogram images (OPGs), named UASP-BAE, which models uncertainty errors while setting sex dimorphism as tractive features to enhance age-related specific features, aiming to improve the accuracy of BA estimation. Furthermore, considering the global relevance of anatomic structures, such as the mandible, teeth, and maxillary sinus, a cross-attention module based on CNN and self-attention is proposed to mine the local texture and global semantic features of OPGs. Moreover, we design a novel age composition loss from cross-entropy, probability bias, and regression functions, aiming at evaluating BA's uncertainty errors and results to obtain an accurate and robust model. On 10703 OPGs from subjects 5.00 to 25.00 years of age, our model achieved a best MAE of 0.8005 years, outperforming the popular algorithms used for comparison, which also demonstrates the method's potential for improved accuracy in BA estimation.

ICLR Conference 2023 Conference Paper

Light Sampling Field and BRDF Representation for Physically-based Neural Rendering

  • Jing Yang
  • Hanyuan Xiao
  • Wenbin Teng
  • Yunxuan Cai
  • Yajie Zhao

Physically-based rendering (PBR) is key for immersive rendering effects used widely in the industry to showcase detailed realistic scenes from computer graphics assets. A well-known caveat is that producing the same is computationally heavy and relies on complex capture devices. Inspired by the success in quality and efficiency of recent volumetric neural rendering, we want to develop a physically-based neural shader to eliminate device dependency and significantly boost performance. However, no existing lighting and material models in the current neural rendering approaches can accurately represent the comprehensive lighting models and BRDFs properties required by the PBR process. Thus, this paper proposes a novel lighting representation that models direct and indirect light locally through a light sampling strategy in a learned light sampling field. We also propose BRDF models to separately represent surface/subsurface scattering details to enable complex objects such as translucent material (i.e., skin, jade). We then implement our proposed representations with an end-to-end physically-based neural face skin shader, which takes a standard face asset (i.e., geometry, albedo map, and normal map) and an HDRI for illumination as inputs and generates a photo-realistic rendering as output. Extensive experiments showcase the quality and efficiency of our PBR face skin shader, indicating the effectiveness of our proposed lighting and material representations.

IS Journal 2023 Journal Article

Parallel Intelligence in CPSSs: Being, Becoming, and Believing

  • Jing Yang
  • Yonglin Tian
  • Xiao Wang
  • Fei-Yue Wang

The recent debut and success of ChatGPT have brought up renewed debates and desires for artificial general intelligence (AGI) amid fears and anxieties of potential disruptions to our humanity and social values, as witnessed by the call from tech celebrities for a pause in the development of ChatGPT-style AGI tools. At the IEEE IS’ AI and CPSS Department, we would like to initiate cautious, balanced, hopefully deep investigations to address various related issues on the impact and significance of intelligent science and technology to our economy and society. Let’s start with the “three Bs” and “ACP” for parallel intelligence in CPSSs: Being by artificial systems (A), Becoming through computational experiments (C), and Believing with parallel execution (P).

NeurIPS Conference 2023 Conference Paper

Provably Efficient Algorithm for Nonstationary Low-Rank MDPs

  • Yuan Cheng
  • Jing Yang
  • Yingbin Liang

Reinforcement learning (RL) under changing environment models many real-world applications via nonstationary Markov Decision Processes (MDPs), and hence gains considerable interest. However, theoretical studies on nonstationary MDPs in the literature have mainly focused on tabular and linear (mixture) MDPs, which do not capture the nature of unknown representation in deep RL. In this paper, we make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time, and the low-rank model contains unknown representation in addition to the linear state embedding function. We first propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve PORTAL to its parameter-free version of Ada-PORTAL, which is able to tune its hyper-parameters adaptively without any prior knowledge of nonstationarity. For both algorithms, we provide upper bounds on the average dynamic suboptimality gap, which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.

AAAI Conference 2022 Conference Paper

Feature Generation and Hypothesis Verification for Reliable Face Anti-spoofing

  • Shice Liu
  • Shitao Lu
  • Hongyi Xu
  • Jing Yang
  • Shouhong Ding
  • Lizhuang Ma

Although existing face anti-spoofing (FAS) methods achieve high accuracy in intra-domain experiments, their effects drop severely in cross-domain scenarios because of poor generalization. Recently, multifarious techniques have been explored, such as domain generalization and representation disentanglement. However, the improvement is still limited by two issues: 1) It is difficult to perfectly map all faces to a shared feature space. If faces from unknown domains are not mapped to the known region in the shared feature space, accidentally inaccurate predictions will be obtained. 2) It is hard to completely consider various spoof traces for disentanglement. In this paper, we propose a Feature Generation and Hypothesis Verification framework to alleviate the two issues. Above all, feature generation networks which generate hypotheses of real faces and known attacks are introduced for the first time in the FAS task. Subsequently, two hypothesis verification modules are applied to judge whether the input face comes from the real-face space and the real-face distribution respectively. Furthermore, some analyses of the relationship between our framework and Bayesian uncertainty estimation are given, which provides theoretical support for reliable defense in unknown domains. Experimental results show our framework achieves promising results and outperforms the state-of-the-art approaches on extensive public datasets.

IS Journal 2022 Journal Article

Metaverses and DeMetaverses: From Digital Twins in CPS to Parallel Intelligence in CPSS

  • Xiao Wang
  • Jing Yang
  • Jinpeng Han
  • Wei Wang
  • Fei-Yue Wang

A total of 12 years have passed since this Department was created in 2010 as the first academic forum dedicated to cyber-physical-social systems (CPSS), with the first CPSS research article on the field: “The Emergence of Intelligent Enterprises: From CPS to CPSS.” What has happened and changed during the past decade? A brief reflection and review are presented here with a focus on digital twins in CPS versus parallel intelligence in CPSS, and their relationship to blockchain intelligence, smart contracts, metaverses, DAO, Web3, and decentralized science. The concept of DeMetaverses is thus introduced and interpreted as a DAO-based decentralized autonomous metaverse. The characteristics, mechanism, and impact of DeMetaverses are discussed with a vision for achieving an integrated human, artificial, natural, and organizational intelligence that would transform our world into “6S” societies.

NeurIPS Conference 2022 Conference Paper

Provable Benefit of Multitask Representation Learning in Reinforcement Learning

  • Yuan Cheng
  • Songtao Feng
  • Jing Yang
  • Hong Zhang
  • Yingbin Liang

As representation learning becomes a powerful technique to reduce sample complexity in reinforcement learning (RL) in practice, theoretical understanding of its advantage is still limited. In this paper, we theoretically characterize the benefit of representation learning under the low-rank Markov decision process (MDP) model. We first study multitask low-rank RL (as upstream training), where all tasks share a common representation, and propose a new multitask reward-free algorithm called REFUEL. REFUEL learns both the transition kernel and the near-optimal policy for each task, and outputs a well-learned representation for downstream tasks. Our result demonstrates that multitask representation learning is provably more sample-efficient than learning each task individually, as long as the total number of tasks is above a certain threshold. We then study the downstream RL in both online and offline settings, where the agent is assigned with a new task sharing the same representation as the upstream tasks. For both online and offline settings, we develop a sample-efficient algorithm, and show that it finds a near-optimal policy with the suboptimality gap bounded by the sum of the estimation error of the learned representation in upstream and a vanishing term as the number of downstream samples becomes large. Our downstream results of online and offline RL further capture the benefit of employing the learned representation from upstream as opposed to learning the representation of the low-rank model directly. To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask RL for both upstream and downstream tasks.

NeurIPS Conference 2021 Conference Paper

Federated Linear Contextual Bandits

  • Ruiquan Huang
  • Weiqiang Wu
  • Jing Yang
  • Cong Shen

This paper presents a novel federated linear contextual bandits model, where individual clients face different $K$-armed stochastic bandits coupled through common global parameters. By leveraging the geometric structure of the linear rewards, a collaborative algorithm called Fed-PE is proposed to cope with the heterogeneity across clients without exchanging local feature vectors or raw data. Fed-PE relies on a novel multi-client G-optimal design, and achieves near-optimal regrets for both disjoint and shared parameter cases with logarithmic communication costs. In addition, a new concept called collinearly-dependent policies is introduced, based on which a tight minimax regret lower bound for the disjoint parameter case is derived. Experiments demonstrate the effectiveness of the proposed algorithms on both synthetic and real-world datasets.

NeurIPS Conference 2021 Conference Paper

Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization

  • Chengshuai Shi
  • Wei Xiong
  • Cong Shen
  • Jing Yang

Despite the significant interest and much progress in decentralized multi-player multi-armed bandits (MP-MAB) problems in recent years, the regret gap to the natural centralized lower bound in the heterogeneous MP-MAB setting remains open. In this paper, we propose BEACON -- Batched Exploration with Adaptive COmmunicatioN -- that closes this gap. BEACON accomplishes this goal with novel contributions in implicit communication and efficient exploration. For the former, we propose a novel adaptive differential communication (ADC) design that significantly improves the implicit communication efficiency. For the latter, a carefully crafted batched exploration scheme is developed to enable incorporation of the combinatorial upper confidence bound (CUCB) principle. We then generalize the existing linear-reward MP-MAB problems, where the system reward is always the sum of individually collected rewards, to a new MP-MAB problem where the system reward is a general (nonlinear) function of individual rewards. We extend BEACON to solve this problem and prove a logarithmic regret. BEACON bridges the algorithm design and regret analysis of combinatorial MAB (CMAB) and MP-MAB, two largely disjoint areas in MAB, and the results in this paper suggest that this previously ignored connection is worth further investigation.

AAAI Conference 2021 Conference Paper

Looking Wider for Better Adaptive Representation in Few-Shot Learning

  • Jiabao Zhao
  • Yifan Yang
  • Xin Lin
  • Jing Yang
  • Liang He

Building a good feature space is essential for the metric-based few-shot algorithms to recognize a novel class with only a few samples. The feature space is often built by Convolutional Neural Networks (CNNs). However, CNNs primarily focus on local information with the limited receptive field, and the global information generated by distant pixels is not well used. Meanwhile, having a global understanding of the current task and focusing on distinct regions of the same sample for different queries are important for the few-shot classification. To tackle these problems, we propose the Cross Non-Local Neural Network (CNL) for capturing the long-range dependency of the samples and the current task. CNL extracts the task-specific and context-aware features dynamically by strengthening the features of the sample at a position via aggregating information from all positions of itself and the current task. To reduce losing important information, we maximize the mutual information between the original and refined features as a constraint. Moreover, we add a task-specific scaling to deal with multi-scale and task-specific features extracted by CNL. We conduct extensive experiments for validating our proposed algorithm, which achieves new state-of-the-art performances on two public benchmarks.

AAAI Conference 2020 Conference Paper

CF-LSTM: Cascaded Feature-Based Long Short-Term Networks for Predicting Pedestrian Trajectory

  • Yi Xu
  • Jing Yang
  • Shaoyi Du

Pedestrian trajectory prediction is an important but difficult task in the self-driving and autonomous mobile robot fields because there are complex, unpredictable human-human interactions in crowded scenarios. There have been a large number of studies that attempt to understand humans’ social behavior. However, most of these studies extract location features from only the previous time step while neglecting the vital velocity features. In order to address this issue, we propose a novel feature-cascaded framework for long short-term network (CF-LSTM) without extra artificial settings or social rules. In this framework, feature information from the previous two time steps is first extracted and then integrated as a cascaded feature to LSTM, which is able to capture the previous location information and dynamic velocity information simultaneously. In addition, this scene-agnostic cascaded feature is the external manifestation of complex human-human interactions, which can also effectively capture dynamic interaction information in different scenes without any other pedestrians’ information. Experiments on public benchmark datasets indicate that our model achieves better performance than the state-of-the-art methods and this feature-cascaded framework has the ability to implicitly learn human-human interactions.

AAAI Conference 2020 Conference Paper

FAN-Face: a Simple Orthogonal Improvement to Deep Face Recognition

  • Jing Yang
  • Adrian Bulat
  • Georgios Tzimiropoulos

It is known that facial landmarks provide pose, expression and shape information. In addition, when matching, for example, a profile and/or expressive face to a frontal one, knowledge of these landmarks is useful for establishing correspondence which can help improve recognition. However, in prior work on face recognition, facial landmarks are only used for face cropping in order to remove scale, rotation and translation variations. This paper proposes a simple approach to face recognition which gradually integrates features from different layers of a facial landmark localization network into different layers of the recognition network. To this end, we propose an appropriate feature integration layer which makes the features compatible before integration. We show that such a simple approach systematically improves recognition on the most difficult face recognition datasets, setting a new state-of-the-art on IJB-B, IJB-C and MegaFace datasets.

IJCAI Conference 2018 Conference Paper

Cost-aware Cascading Bandits

  • Ruida Zhou
  • Chao Gan
  • Jing Yang
  • Cong Shen

In this paper, we propose a cost-aware cascading bandits model, a new variant of multi-armed bandits with cascading feedback, by considering the random cost of pulling arms. In each step, the learning agent chooses an ordered list of items and examines them sequentially, until a certain stopping condition is satisfied. Our objective is then to maximize the expected net reward in each step, i.e., the reward obtained in each step minus the total cost incurred in examining the items, by deciding the ordered list of items, as well as when to stop examination. We study both the offline and online settings, depending on whether the state and cost statistics of the items are known beforehand. For the offline setting, we show that the Unit Cost Ranking with Threshold 1 (UCR-T1) policy is optimal. For the online setting, we propose a Cost-aware Cascading Upper Confidence Bound (CC-UCB) algorithm, and show that the cumulative regret scales as $O(\log T)$. We also provide a lower bound for all $\alpha$-consistent policies, which scales as $\Omega(\log T)$ and matches our upper bound. The performance of the CC-UCB algorithm is evaluated with both synthetic and real-world data.
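
To make the offline objective concrete, here is a minimal sketch under a simplified model that the abstract does not spell out: each item has an independent Bernoulli state with mean theta and a positive expected cost c, examination stops at the first item whose state is 1 (reward 1) or when the list runs out, and the ranking rule orders items by theta / c and keeps only those with a ratio above 1. That rule is one plausible reading of the policy name, not the paper's stated definition. The item statistics and all function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

# Hypothetical item statistics: name -> (theta, expected cost c), with c > 0.
ITEMS: Dict[str, Tuple[float, float]] = {
    "A": (0.6, 0.2),
    "B": (0.3, 0.2),
    "C": (0.2, 0.4),
}


def expected_net_reward(order: Sequence[str],
                        items: Dict[str, Tuple[float, float]]) -> float:
    """Expected reward minus expected examination cost for an ordered list."""
    total, reach = 0.0, 1.0  # reach = probability the current item is examined
    for name in order:
        theta, cost = items[name]
        total += reach * (theta - cost)
        reach *= 1.0 - theta  # continue only if this item's state was 0
    return total


def ratio_ranking_with_threshold(items: Dict[str, Tuple[float, float]],
                                 threshold: float = 1.0) -> List[str]:
    """Keep items with theta / c above the threshold, sorted by that ratio."""
    kept = [n for n, (theta, cost) in items.items() if theta / cost > threshold]
    return sorted(kept, key=lambda n: items[n][0] / items[n][1], reverse=True)


def brute_force_best(items: Dict[str, Tuple[float, float]]) -> List[str]:
    """Search every ordered sub-list; feasible only for a handful of items."""
    names = list(items)
    candidates = [p for r in range(len(names) + 1)
                  for p in permutations(names, r)]
    return list(max(candidates, key=lambda p: expected_net_reward(p, items)))


policy_list = ratio_ranking_with_threshold(ITEMS)
best_list = brute_force_best(ITEMS)
```

For these three items the ratio rule and the exhaustive search select the same list, which is consistent with an adjacent-swap argument under the assumed model: placing item i before item j is better exactly when theta_i / c_i exceeds theta_j / c_j.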

IROS Conference 2014 Conference Paper

Integrating multiple soft constraints for planning practical paths

  • Jing Yang
  • Patrick W. Dymond
  • Michael Jenkin

Sampling-based algorithms are a common approach to high-dimensional real-world path planning problems. Unfortunately, the solutions found using such planners are often not practical in that they do not take into account soft application-specific constraints. This paper formulates the practicality of paths based on the notion of soft constraints found in the Planning Domain Definition Language 3 (PDDL3) [21], and develops a range of optimization strategies targeted towards user-preferred qualities by integrating soft constraints in the pre-processing, planning and post-processing phases of sampling-based path planners. An auction-based resource allocation approach coordinates competing optimization strategies. This approach uses an adaptive bidding strategy for each optimizer, and in each round the optimizer with the best predicted performance is selected. This general coordination system allows for flexibility in both the number and types of the optimizers used. Experimental validation demonstrates the effectiveness of the approach.