Author name cluster

Jing Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

96 papers

2 author rows

EAAI Journal 2026 Journal Article

A novel framework for segmenting open-pit mining road

Shuo Fan
Yachun Mao
Shuai Zhen
Jing Liu
Liming He
Xinqi Mao

Accurate segmentation of open-pit mine road networks presents a critical challenge for mine digitization and autonomous driving applications. These roads are prone to mechanical compaction, geological erosion, and coverage by gravel dust, resulting in segmentation outcomes characterized by blurred boundaries, holes, fractures, and geometric deformations, which severely compromise measurement accuracy. To address these challenges, this paper proposes the Mining Road Segmentation Network (MRS-Net), which integrates local features with global semantics. First, a Residual Network Version 2 (ResNetV2)-Transformer cascaded encoder is constructed, employing residual connections to preserve sub-pixel-level edge details and multi-head self-attention to establish long-range dependencies, thereby enhancing the representation of weak texture features. Second, the Road Multi-scale Features Fusion Module (RMFF) is designed to extract local geometric features and global continuity features through progressive hollow convolution, enabling the model to extract multi-scale features and effectively suppress interference from gravel dust. Finally, a progressive decoding architecture incorporating bilinear interpolation is adopted to improve edge smoothness. MRS-Net is evaluated on an Unmanned Aerial Vehicle (UAV)-acquired road dataset from the Anshan open-pit iron mine in Liaoning Province, China. Results demonstrate that MRS-Net achieves superior segmentation performance compared to models such as DeepLabV3+ and TransUNet across three distinct scenarios: main roads, temporary roads, and abandoned roads. Specifically, it achieves Intersection over Union (IoU), Dice coefficient (Dice), and Kappa coefficient (Kappa) values of 89.4% / 94.1% / 87.2%, 75.7% / 83.3% / 75.1%, and 83.8% / 90.0% / 84.85%, respectively, for these scenarios.
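
The three metrics reported above have standard definitions for binary masks. The sketch below is not taken from the paper: it assumes flattened 0/1 masks, and the function name `segmentation_metrics` is hypothetical. It only shows how IoU, Dice, and Cohen's Kappa are commonly computed from the confusion counts.

```python
def segmentation_metrics(pred, truth):
    """Return (IoU, Dice, Kappa) for two equal-length flattened binary masks.

    Illustrative helper only; the name and the list-of-ints input format are
    assumptions, not the paper's evaluation code.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("masks must be non-empty and of equal length")

    # Confusion counts for the foreground (road) class.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    n = tp + fp + fn + tn

    # IoU = |A ∩ B| / |A ∪ B|; Dice = 2|A ∩ B| / (|A| + |B|).
    union = tp + fp + fn
    iou = tp / union if union else 1.0
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 1.0
    return iou, dice, kappa


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0, 0, 0]
    reference = [1, 0, 0, 0, 1, 1, 0, 0]
    print(segmentation_metrics(predicted, reference))
```

Dice is never lower than IoU on the same masks, which is consistent with the ordering of the values quoted in the abstract.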

JBHI Journal 2026 Journal Article

A Novel Grasping Robot Control Method Using Motion Execution BCI Combining Knowledge Reasoning

Rui Li
Jing Liu
Jinli Liu
Shiqiang Yang
Weiping Liu
Ke Deng
Wen Wang

Recently, with the growing number of disabled people, brain-controlled technology offers a novel way to help patients restore their daily abilities. However, conventional brain-controlled systems based on motion-related tasks lack intelligence in real-world environments. To address this problem, this study proposes a shared-control system combining a precise hand movement (PHM)-based brain-computer interface (BCI) system and a knowledge-driven reasoning method. Six types of precise hand movements were selected to design a novel motion execution paradigm for the BCI system. A feature intermediate fusion convolutional neural network was employed to accurately decode electroencephalogram (EEG) signals. Furthermore, a shared-control grasping technology that combines knowledge-based reasoning with the PHM-based BCI system was designed for a grasping robot, enhancing the system's intelligence and versatility in selecting objects. To verify the improvement offered by the proposed method, experiments were conducted with 15 healthy subjects and 2 patients. The proposed method achieved an average accuracy of 82.80 ± 6.08%, with the highest accuracy reaching 94.27%. All the experimental results demonstrate the effectiveness of the proposed shared-control method.

AAAI Conference 2026 Conference Paper

BEE-RAG: Balanced Entropy Engineering for Retrieval-Augmented Generation

Yuhao Wang
Ruiyang Ren
Yucheng Wang
Jing Liu
Xin Zhao
Hua Wu
Haifeng Wang

With the rapid advancement of large language models (LLMs), retrieval-augmented generation (RAG) has emerged as a critical approach to supplement the inherent knowledge limitations of LLMs. However, due to the typically large volume of retrieved information, RAG tends to operate with long context lengths. From the perspective of entropy engineering, we identify unconstrained entropy growth and attention dilution due to long retrieval context as significant factors affecting RAG performance. In this paper, we propose the balanced entropy-engineered RAG (BEE-RAG) framework, which improves the adaptability of RAG systems to varying context lengths through the principle of entropy invariance. By leveraging balanced context entropy to reformulate attention dynamics, BEE-RAG separates attention sensitivity from context length, ensuring a stable entropy level. Building upon this, we introduce a zero-shot inference strategy for multi-importance estimation and a parameter-efficient adaptive fine-tuning mechanism to obtain the optimal balancing factor for different settings. Extensive experiments across multiple RAG tasks demonstrate the effectiveness of BEE-RAG.
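
The "entropy growth" and "attention dilution" named above can be illustrated with a toy softmax. The sketch below is not BEE-RAG's formulation: it assumes one relevant key with a fixed score among distractors that all score zero, and every function name is hypothetical. It only shows that attention entropy rises and the relevant key's weight falls as the context grows.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def toy_attention(context_len, relevant_score=2.0):
    """Attention over one relevant key plus (context_len - 1) zero-score distractors.

    Returns (entropy of the attention distribution, weight on the relevant key).
    The score pattern is an assumption chosen purely for illustration.
    """
    scores = [relevant_score] + [0.0] * (context_len - 1)
    weights = softmax(scores)
    return entropy(weights), weights[0]


if __name__ == "__main__":
    for n in (10, 100, 1000):
        h, w = toy_attention(n)
        print(f"context={n:5d}  entropy={h:.3f}  relevant_weight={w:.4f}")
```

With identical scores the entropy equals the logarithm of the context length exactly, which is the unconstrained growth an entropy-invariant attention scheme aims to counteract.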

YNIMG Journal 2026 Journal Article

Cerebro-cerebellar structure-function coupling’s role in motor recovery after infarction

Jing Liu
Yi Shan
Bi-Xiao Cui
Zhen-Ming Wang
Shao-Zhen Yan
Jie Xu
Lin-Lin Ye
Lei Cao

OBJECTIVE: To investigate the pathway-specific structure-function coupling induced by focal subcortical infarction and its influence on clinical symptoms. METHODS: In this prospective study, 50 patients with unilateral subcortical infarction and motor impairment and 50 matched controls underwent resting-state fMRI, DTI, and Fugl-Meyer Assessment lower-extremity (FMA-LE) evaluation at 7-14 days and 30 days post-infarction. To analyze the pathway-specific structure-function coupling, we evaluated the association between structural integrity of the corticospinal tract (CST), dentate thalamocortical tract (DTCT), cortico-pontocerebellar tract (CPCT), and dorsal spinocerebellar tract (DSCT) and functional connectivity (FC) of corresponding subregions. Moderation analysis assessed whether the structure-function coupling pathway moderates FMA-LE. RESULTS: At baseline, patients exhibited significantly lower structural integrity of DTCT, DSCT, and CST than controls. We found structure-function couplings in the three motor pathways of the cerebro-cerebellar circuit: (1) contralesional thalamus to ipsilesional cerebellum-crus_2 with DTCT, (2) contralesional thalamus to cerebellum vermis_10 with DSCT, (3) ipsilesional precentral gyrus to frontal medial gyrus with CST. Baseline DSCT structural integrity specifically moderated the relationship between FC and FMA-LE over 30 days. CONCLUSIONS: We observed that cerebro-cerebellar circuit structure-function coupling after infarction, based on its anatomy and mapped to motor function (with DSCT as the key pathway mediating/moderating prognosis), serves as a potent biomarker for lower limb prognosis and a basis for precise rehabilitation.

AAAI Conference 2026 Conference Paper

GeoX-Bench: Benchmarking Cross-View Geo-Localization and Pose Estimation Capabilities of Large Multimodal Models

Yushuo Zheng
Jiangyong Ying
Huiyu Duan
Chunyi Li
Zicheng Zhang
Jing Liu
Xiaohong Liu
Guangtao Zhai

Large multimodal models (LMMs) have demonstrated remarkable capabilities across a wide range of tasks; however, their knowledge and abilities in the cross-view geo-localization and pose estimation domains remain unexplored, despite potential benefits for navigation, autonomous driving, outdoor robotics, etc. To bridge this gap, we introduce GeoX-Bench, a comprehensive Benchmark designed to explore and evaluate the capabilities of LMMs in cross-view Geo-localization and pose estimation. Specifically, GeoX-Bench contains 10,859 panoramic-satellite image pairs spanning 128 cities in 49 countries, along with 755,976 corresponding question-answering (QA) pairs. Among these, 42,900 QA pairs are designated for benchmarking, while the remainder are intended to enhance the capabilities of LMMs. Based on GeoX-Bench, we evaluate the capabilities of 25 state-of-the-art LMMs on cross-view geo-localization and pose estimation tasks, and further explore the capabilities gained through instruction-tuning. Our benchmark results demonstrate that while current LMMs achieve impressive performance in geo-localization tasks, their effectiveness declines significantly on the more complex pose estimation tasks, highlighting a critical area for future improvement, and that instruction-tuning LMMs on the training data of GeoX-Bench can significantly improve their cross-view geo-sense abilities.

AAAI Conference 2026 Conference Paper

LatentLLM: Activation-Aware Transform to Multi-Head Latent Attention

Toshiaki Koike-Akino
Xiangyu Chen
Jing Liu
Ye Wang
Pu (Perry) Wang
Matthew Brand

Modern foundation models such as large language models (LLMs) require a massive amount of computational and memory resources. We propose a new framework to convert such LLMs into a reduced-dimension latent structure. Our method extends a local activation-aware tensor decomposition to a global attention-aware joint tensor decomposition. Our framework can significantly improve model accuracy over existing model compression methods when reducing the latent dimension to realize computationally/memory-efficient LLMs. We show the benefit on several benchmarks, including multi-modal reasoning tasks.

EAAI Journal 2026 Journal Article

Machine learning-based corrosion prediction in supercritical carbon dioxide transport pipelines: Model evaluation and experimental validation

Emily Seto
Xingyu Li
Yimin Zeng
Jing Liu

The repurposing of existing oil and gas pipelines for carbon dioxide (CO2) transportation in carbon capture, utilization, and storage (CCUS) systems is gaining momentum. In these systems, CO2 is transported in a supercritical state (s-CO2), often containing aggressive impurities such as water (H2O), oxygen (O2), and acidic gases, which significantly increases corrosion risk, necessitating robust corrosion management strategies. Traditional corrosion evaluation and prediction methods are often time-consuming and costly, making machine learning (ML) a promising alternative for predicting complex corrosion scenarios. This study evaluates the potential of four ML classifiers, namely random forest (RF), gradient boosting classifier (GBC), support vector machine (SVM), and K-nearest neighbors (KNN), for corrosion severity prediction in pipeline steels exposed to s-CO2 environments with varying impurity compositions. The models were trained on data comprising temperature, pressure, exposure duration, and impurity type/concentration. Experimental validation was conducted to assess model reliability. Data preprocessing and domain-specific knowledge were identified as critical to improving model performance. Among the classifiers, RF exhibited the highest prediction accuracy (81.0%) and an F1 score of 0.798. Utilizing the most important 80% of features further improved the RF model's accuracy (85.7%) and F1 score (0.837). Feature importance analysis identified the interaction between H2O and sulfur dioxide (SO2), as well as SO2 content alone, as the most critical parameters, underscoring the need for further corrosion assessments in s-CO2 systems containing SO2. This study highlights the promise of ML, particularly the RF classifier, for efficient and reliable corrosion prediction in CCUS pipeline applications, supported by experimental validation.

TIST Journal 2026 Journal Article

Mining High Average Utility Nonoverlapping Patterns from Sequential Database

Meng Geng
Youxi Wu
Yan Li
Jing Liu
Lei Guo
Xingquan Zhu
Xindong Wu

As a crucial aspect of data mining, high average utility sequential pattern mining (SPM) aims to discover low frequency and high average utility patterns (subsequences) in sequence data. Most existing high average utility SPM methods overlook the repetitive occurrences of patterns in each sequence, resulting in some important patterns being ignored. To address this issue, we focus on the problem of mining high average utility nonoverlapping patterns (HUPs) from a sequential database, and propose an HUP-Miner algorithm. To reduce the need for repeated scanning of the original database, we use a position dictionary to record the occurrence information of each item. To reduce the number of candidate patterns generated, we adopt a pattern join strategy and explore four pruning strategies. To efficiently calculate the average utility of a pattern, we propose an SPC algorithm that utilizes the occurrence positions of sub-patterns. When compared with 12 competitive algorithms, the experimental results on 14 databases show that HUP-Miner gives superior results. Furthermore, we use information gain as the utility for each item, and find that the HUPs discovered in this way yield better performance in a clustering analysis. All of the algorithms and databases used here are available from https://github.com/wuc567/Pattern-Mining/tree/master/HUP-Miner.

AAAI Conference 2026 Conference Paper

OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs

Feng Chen
Yefei He
Shaoxuan He
Yuanyu He
Jing Liu
Lequan Lin
Akide Liu
Zhaoyang Li

Existing sparse attention methods primarily target inference-time acceleration by selecting critical tokens under predefined sparsity patterns. However, they often fail to bridge the training–inference gap and lack the capacity for fine-grained token selection across multiple dimensions, such as queries, key-values (KV), and heads, leading to suboptimal performance and acceleration gains. In this paper, we introduce OmniSparse, a training-aware fine-grained sparse attention framework for long-video MLLMs, which is applied in both training and inference with dynamic token budget allocation. Specifically, OmniSparse contains three adaptive and complementary mechanisms: (1) query selection as lazy-active classification, aiming to retain active queries that capture broader semantic similarity, while discarding most of the lazy ones that focus on limited local context and exhibit high functional redundancy with their neighbors, (2) KV selection with head-level dynamic budget allocation, where a shared budget is determined based on the flattest head and applied uniformly across all heads to ensure attention recall after selection, and (3) KV cache slimming to alleviate head-level redundancy, which selectively fetches visual KV cache according to the head-level decoding query pattern. Experimental results demonstrate that OmniSparse can achieve performance comparable to full attention, with a 2.7x speedup during prefill and a 2.4x memory reduction for decoding.

AAAI Conference 2026 Conference Paper

SFGA: Similarity-Constrained Fusion Learning for Unsupervised Anomaly Detection in Multiplex Graphs

Huiliang Zhai
Xiangyi Teng
Jing Liu

Multiplex graphs are widely used to model multi-relational complex systems and play an important role in various real-world scenarios, such as financial systems and social networks. Hence, detecting anomalous samples in multiplex graphs becomes crucial to ensure cybersecurity and stability. Although existing homogeneous graph anomaly detection (GAD) methods can be applied to deal with multiplex graphs, they still face two major challenges: 1) Due to the multiplicity and complexity of relations in multiplex graphs, homogeneous GAD models fail to effectively capture anomalous behaviors that correlate with diverse relational patterns. 2) In real-world applications, malicious entities usually disguise themselves through various camouflage strategies, making it difficult to capture subtle anomalous features via single-relation analysis. To address these challenges, we propose a novel unsupervised anomaly detection method for multiplex graphs based on a Similarity-constrained Fusion Graph Autoencoder (SFGA). In SFGA, we design a multiplex graph autoencoder and introduce a cross-plex attention module at the model bottleneck to achieve comprehensive modeling of cross-relation anomaly patterns. Then, a similarity balancing strategy is proposed to constrain node representations at the bottleneck from both local and global perspectives, enhancing the autoencoder's discriminative power against camouflaged anomalies and enabling more effective identification of anomalous nodes with overlapping or deceptive patterns. Extensive experiments are conducted on both synthetic and real-world datasets at varying scales, and the results demonstrate that our proposed method outperforms state-of-the-art approaches by a large margin.

AAAI Conference 2026 Conference Paper

SimpleDiffusion: A Lightweight and Efficient Conditional Diffusion Model for Multi-Modal Salient Object Detection

Shuo Zhang
Jiaming Huang
Wenbing Tang
Jing Liu
Li Han
Jiandun Li
Hongchun Yuan
Zizhu Fan

Multi-modal salient object detection (MSOD), which integrates complementary modalities such as depth or thermal data, primarily faces two challenges: accurately preserving salient object details and effectively aligning cross-modal features. Recent advances in using Stable Diffusion to generate images with fine edge details have inspired researchers to reformulate MSOD as a conditional mask generation process guided by salient features, which has achieved excellent visual results. However, these approaches often overlook the high computational cost and large-scale architecture of Stable Diffusion, both of which render it unsuitable for real-world MSOD applications. Therefore, we propose SimpleDiffusion, the first lightweight and efficient conditional diffusion model for MSOD that does not rely on Stable Diffusion. Specifically, we propose an Adaptive Cross-Modal Fusion Conditional Network and a Latent Denoising Network to reduce the complexity of diffusion models. Furthermore, we design a Multi-modal Feature Rectification and Fusion Module to enhance the representational capacity of cross-modal salient features. Customized training and sampling strategies are also developed to improve inference efficiency and reduce erroneous object segmentations. Experiments on multiple MSOD datasets demonstrate that SimpleDiffusion reduces model size by over tenfold and improves inference speed by more than fivefold compared to other diffusion-based methods, while maintaining comparable or superior performance.

AAAI Conference 2026 Conference Paper

Textual Self-Attention Network: Test-Time Preference Optimization Through Textual Gradient-Based Attention

Shibing Mo
Haoyang Ruan
Kai Wu
Jing Liu

Large Language Models (LLMs) have demonstrated remarkable generalization capabilities, but aligning their outputs with human preferences typically requires expensive supervised fine-tuning. Recent test-time methods leverage textual feedback to overcome this, but they often critique and revise a single candidate response, lacking a principled mechanism to systematically analyze, weigh, and synthesize the strengths of multiple promising candidates. Such a mechanism is crucial because different responses may excel in distinct aspects (e.g., clarity, factual accuracy, or tone), and combining their best elements may produce a far superior outcome. This paper proposes the Textual Self-Attention Network (TSAN), a new paradigm for test-time preference optimization that requires no parameter updates. TSAN emulates self-attention entirely in natural language to overcome this gap: it analyzes multiple candidates by formatting them into textual keys and values, weighs their relevance using an LLM-based attention module, and synthesizes their strengths into a new, preference-aligned response under the guidance of the learned textual attention. This entire process operates in a textual gradient space, enabling iterative and interpretable optimization. Empirical evaluations demonstrate that with just three test-time iterations on a base SFT model, TSAN outperforms supervised models like Llama-3.1-70B-Instruct and surpasses the current state-of-the-art test-time alignment method by effectively leveraging multiple candidate solutions.

AAAI Conference 2026 Conference Paper

UrbanNav: Learning Language-Guided Embodied Urban Navigation from Web-Scale Human Trajectories

Yanghong Mei
Yirong Yang
Longteng Guo
Qunbo Wang
Ming-Ming Yu
Xingjian He
Wenjun Wu
Jing Liu

Navigating complex urban environments using natural language instructions poses significant challenges for embodied agents, including noisy language instructions, ambiguous spatial references, diverse landmarks, and dynamic street scenes. Current visual navigation methods are typically limited to simulated or off-street environments, and often rely on precise goal formats, such as specific coordinates or images. This limits their effectiveness for autonomous agents like last-mile delivery robots navigating unfamiliar cities. To address these limitations, we introduce UrbanNav, a scalable framework that trains embodied agents to follow free-form language instructions in diverse urban settings. Leveraging web-scale city walking videos, we develop a scalable annotation pipeline that aligns human navigation trajectories with language instructions grounded in real-world landmarks. UrbanNav encompasses over 1,500 hours of navigation data and 3 million instruction-trajectory-landmark triplets, capturing a wide range of urban scenarios. Our model learns robust navigation policies to tackle complex urban scenarios, demonstrating superior spatial reasoning, robustness to noisy instructions, and generalization to unseen urban settings. Experimental results show that UrbanNav significantly outperforms existing methods, highlighting the potential of large-scale web video data to enable language-guided, real-world urban navigation for embodied agents.

IROS Conference 2025 Conference Paper

An Easy Method for Extrinsic Calibration of Camera and Time-of-Flight Sensor

Tianyou Zhang
Jing Liu
Dragos Axinte
Xin Dong

A multi-zone (typically 8×8) time-of-flight (ToF) sensor offers a low-cost, low-power, and compact solution for range measurement, making it ideal for specialized robotic applications. However, its low resolution limits its usability. Pairing a ToF sensor with a camera enhances depth perception and can solve the unscaled metric problem in monocular depth estimation. Advances in deep learning further enable high-quality depth map reconstruction from ToF-camera data, providing a cost-effective alternative. However, accurate ToF-camera calibration remains a challenge due to the ToF sensor's coarse depth output. This work presents a simple yet effective method for the extrinsic calibration of a ToF sensor with an RGB camera using only a chessboard and two whiteboards. A tailored two-plane fitting algorithm is proposed specifically for the ToF sensor. Moreover, our approach leverages parallel lines with vanishing points and geometric constraints from plane intersections. This eliminates the need for robotic arm movements or SLAM-based sensor pose reconstruction, significantly reducing complexity while maintaining high accuracy. Experimental results demonstrate that our method lowers the root mean square (RMS) depth difference from 96.59 mm to 67.89 mm, underscoring its effectiveness in practical applications. Code is publicly available at https://github.com/Tianyou-Nottingham/ToF-Camera-Calibration.

AAAI Conference 2025 Conference Paper

AutoSGNN: Automatic Propagation Mechanism Discovery for Spectral Graph Neural Networks

Shibing Mo
Kai Wu
Qixuan Gao
Xiangyi Teng
Jing Liu

In real-world applications, spectral Graph Neural Networks (GNNs) are powerful tools for processing diverse types of graphs. However, a single GNN often struggles to handle different graph types—such as homogeneous and heterogeneous graphs—simultaneously. This challenge has led to the manual design of GNNs tailored to specific graph types, but these approaches are limited by the high cost of labor and the constraints of expert knowledge, which cannot keep up with the rapid growth of graph data. To overcome these challenges, we introduce AutoSGNN, an automated framework for discovering propagation mechanisms in spectral GNNs. AutoSGNN unifies the search space for spectral GNNs by integrating large language models with evolutionary strategies to automatically generate architectures that adapt to various graph types. Extensive experiments on nine widely-used datasets, encompassing both homophilic and heterophilic graphs, demonstrate that AutoSGNN outperforms state-of-the-art spectral GNNs and graph neural architecture search methods in both performance and efficiency.

NeurIPS Conference 2025 Conference Paper

C-NAV: Towards Self-Evolving Continual Object Navigation in Open World

MingMing Yu
Fei Zhu
Wenzhuo Liu
Yirong Yang
Qunbo Wang
Wenjun Wu
Jing Liu

Embodied agents are expected to perform object navigation in dynamic, open-world environments. However, existing approaches typically rely on static trajectories and a fixed set of object categories during training, overlooking the real-world requirement for continual adaptation to evolving scenarios. To facilitate related studies, we introduce the continual object navigation benchmark, which requires agents to acquire navigation skills for new object categories while avoiding catastrophic forgetting of previously learned knowledge. To tackle this challenge, we propose C-Nav, a continual visual navigation framework that integrates two key innovations: (1) A dual-path anti-forgetting mechanism, which comprises feature distillation that aligns multi-modal inputs into a consistent representation space to ensure representation consistency, and feature replay that retains temporal features within the action decoder to ensure policy consistency. (2) An adaptive sampling strategy that selects diverse and informative experiences, thereby reducing redundancy and minimizing memory overhead. Extensive experiments across multiple model architectures demonstrate that C-Nav consistently outperforms existing approaches, achieving superior performance even compared to baselines with full trajectory retention, while significantly lowering memory requirements. The code will be publicly available at https://bigtree765.github.io/C-Nav-project.

AAAI Conference 2025 Conference Paper

Channel Merging: Preserving Specialization for Merged Experts

Mingyang Zhang
Jing Liu
Ganggui Ding
Linlin Ou
Xinyi Yu
Bohan Zhuang

Lately, task-specific fine-tuning has been used to improve the performance of large language models (LLMs) on downstream tasks. Through the integration of diverse LLMs, the overall competency of LLMs is significantly boosted. Nevertheless, traditional ensemble methods are notably memory-intensive, necessitating the simultaneous loading of all specialized models into GPU memory. To address this inefficiency, model merging strategies have emerged, merging all LLMs into one model to reduce the memory footprint during inference. Despite these advances, model merging often leads to parameter conflicts and performance decline as the number of experts increases. Previous methods to mitigate these conflicts include post-pruning and partial merging. However, both approaches have limitations, particularly in terms of performance and storage efficiency as the number of merged experts increases. To address these challenges, we introduce Channel Merging, a novel strategy designed to minimize parameter conflicts while enhancing storage efficiency. This method first clusters and merges channel parameters offline based on their similarity to form several groups. By ensuring that only highly similar parameters are merged within each group, it significantly reduces parameter conflicts. During inference, we can instantly look up the expert parameters from the merged groups, preserving specialized knowledge. Our experiments demonstrate that Channel Merging consistently delivers high performance, matching unmerged models in tasks like English and Chinese reasoning, mathematical reasoning, and code generation. Moreover, it obtains results comparable to a model ensemble with just 53% of the parameters when used with a task-specific router.

ICLR Conference 2025 Conference Paper

Context-aware Dynamic Pruning for Speech Foundation Models

Masao Someki
Yifan Peng 0003
Siddhant Arora
Markus Müller
Athanasios Mouchtaris
Grant P. Strimel
Jing Liu
Shinji Watanabe 0001

Foundation models, such as large language models, have achieved remarkable success in natural language processing and are evolving into models capable of handling multiple modalities. Listening ability, in particular, is crucial for many applications, leading to research on building speech foundation models. However, the high computational cost of these large models presents a significant challenge for real-world applications. Although substantial efforts have been made to reduce computational costs, such as through pruning techniques, the majority of these approaches are applied primarily during the training phase for specific downstream tasks. In this study, we hypothesize that optimal pruned networks may vary based on contextual factors such as speaker characteristics, languages, and tasks. To address this, we propose a dynamic pruning technique that adapts to these contexts during inference without altering the underlying model. We demonstrated that we could successfully reduce inference time by approximately 30% while maintaining accuracy in multilingual/multi-task scenarios. We also found that the obtained pruned structure offers meaningful interpretations based on the context, e.g., task-related information emerging as the dominant factor for efficient pruning.

EAAI Journal 2025 Journal Article

Continuous spatio temporal prompts for visual tracking

Meng Sun
Xiaotao Liu
Yifan Li
Hongyu Wang
Dian Yuan
Jing Liu

Currently, visual single-object tracking methods utilize online template updates to incorporate temporal information. However, these methods rely on confidence scores to evaluate the reliability of the current template, which may result in a template not being updated for an extended period. Moreover, advanced trackers select bounding boxes based solely on the similarity between the template and the search area, which can lead to tracking drift when encountering deformable or similar targets. To alleviate these limitations, we propose a Spatio Temporal Prompt Tracker (STPTrack), which utilizes prior information about the small changes in object state between successive frames. Unlike previous tracking methods that mainly rely on templates and similarity scores, STPTrack is the first to transfer the object position and shape information of the previous frame to the current frame as a continuous spatio-temporal prompt, and it fuses spatio-temporal information efficiently through the prompt encoder and the fusion decoder module. Specifically, it encodes the bounding box coordinates or mask information of the previous frame and the response points of the current frame as prompt features, and then combines prompt tokens with search tokens through the fusion decoder to provide the potential location of the object for the search feature map. Our STPTrack sets a new state-of-the-art performance on six tracking benchmark datasets.

AAAI Conference 2025 Conference Paper

DiMSOD: A Diffusion-Based Framework for Multi-Modal Salient Object Detection

Shuo Zhang
Jiaming Huang
Wenbing Tang
Yan Wu
Terrence Hu
Xiaogang Xu
Jing Liu

Multi-modal salient object detection (SOD) through the integration of additional data such as depth or thermal information has become a significant task in computer vision during recent years. Traditionally, the challenges of identifying salient objects in RGB, RGB-D (Depth), and RGB-T (Thermal) images are tackled separately. However, without intricate cross-modal fusion strategies, such approaches struggle to effectively integrate multi-modal information, often resulting in poorly defined object edges or overconfident inaccurate predictions. Recent studies have shown that designing a unified end-to-end framework to handle all three types of SOD tasks simultaneously is both necessary and difficult. To address this need, we propose a novel approach that treats multi-modal SOD as a conditional mask generation task utilizing diffusion models. We introduce DiMSOD, which enables the concurrent use of local (depth maps, thermal maps) and global controls (original images) within a unified model for progressive denoising and refined prediction. DiMSOD is efficient, only requiring fine-tuning of our newly introduced modules on the existing stable diffusion, which not only reduces the fine-tuning cost, making it more viable for practical use, but also enhances the integration of multi-modal conditional controls. Specifically, we have developed modules including SOD-ControlNet, Feature Adaptive Network (FAN), and Feature Injection Attention Network (FIAN) to enhance the model's performance. Extensive experiments demonstrate that DiMSOD efficiently detects salient objects across RGB, RGB-D, and RGB-T datasets, achieving superior performance compared to previous well-established methods.

NeurIPS Conference 2025 Conference Paper

End-to-End Vision Tokenizer Tuning

Wenxuan Wang
Fan Zhang
Yufeng Cui
Haiwen Diao
Zhuoyan Luo
Huchuan Lu
Jing Liu
Xinlong Wang

Existing vision tokenization isolates the optimization of vision tokenizers from downstream training, implicitly assuming the visual tokens can generalize well across various tasks, e.g., image generation and visual question answering. The vision tokenizer optimized for low-level reconstruction is agnostic to downstream tasks requiring varied representations and semantics. This decoupled paradigm introduces a critical misalignment: The loss of the vision tokenization can be the representation bottleneck for target tasks. For example, errors in tokenizing text in a given image lead to poor results when recognizing or generating them. To address this, we propose ETT, an end-to-end vision tokenizer tuning approach that enables joint optimization between vision tokenization and target autoregressive tasks. Unlike prior autoregressive models that use only discrete indices from a frozen vision tokenizer, ETT leverages the visual embeddings of the tokenizer codebook, and optimizes the vision tokenizers end-to-end with both reconstruction and caption objectives. ETT can be seamlessly integrated into existing training pipelines with minimal architecture modifications. Our ETT is simple to implement and integrate, without the need to adjust the original codebooks or architectures of the employed large language models. Extensive experiments demonstrate that our proposed end-to-end vision tokenizer tuning unlocks significant performance gains, i.e., 2-6% for multimodal understanding and visual generation tasks compared to frozen tokenizer baselines, while preserving the original reconstruction capability. We hope this very simple and strong method can empower multimodal foundation models besides image generation and understanding.

JBHI Journal 2025 Journal Article

Exploring an Innovative Deep Learning Solution for Acupuncture Point Localization on the Weak Feature Body Surface of the Human Back

Shilong Yang
Yalan Li
Hao Zou
Lingfeng Huang
Jing Liu
Yongsheng Teng
Yaoqin Xie

In current clinical practice, the localization of human acupuncture points relies extensively on the subjective experience of physicians. Therefore, despite being a crucial basic content of traditional Chinese medicine (TCM), acupuncture point localization has not been well expanded and promoted through intelligent means. Our goal is to explore an efficient and reliable solution for acupuncture point localization and recognition that addresses the shortcomings of subjectivity and standardization in this task. We focus on the weak feature body surface of the human back and propose an innovative approach that utilizes a deep learning network with a self-attention module for global extraction of image features. This methodology differs from common Convolutional Neural Networks (CNNs), which often lead to classification ambiguity in weak feature image tasks due to excessive cropping and scaling operations during feature extraction. Moreover, our self-constructed dataset of human back acupuncture points provides data support for model training. The localization task for the back acupuncture points of the subjects in the dataset strictly follows the national standard definition and is labelled by professional doctors of TCM to ensure data robustness and quality. Our preliminary experiments validate that our proposed network learns higher-quality global image features, achieving an average accuracy within 1 cm in the localization and recognition task of 84 acupuncture points on the back of the human body.

IJCAI Conference 2025 Conference Paper

Exploring the Frontiers of Animation Video Generation in the Sora Era: Method, Dataset and Benchmark

Yudong Jiang
Baohan Xu
Siqian Yang
Mingyu Ying
Jing Liu
Chao Xu
Siqi Wang
Yidi Wu

Animation has gained significant interest in the recent film and TV industry. Despite the success of advanced video generation models like Sora, Kling, and CogVideoX in generating natural videos, they lack the same effectiveness in handling animation videos. Evaluating animation video generation is also a great challenge due to its unique artistic styles, violations of the laws of physics, and exaggerated motions. In this paper, we present a comprehensive system, AniSora, designed for animation video generation, which includes a data processing pipeline, a controllable generation model, and an evaluation benchmark. Supported by the data processing pipeline with over 10M high-quality data, the generation model incorporates a spatiotemporal mask module to facilitate key animation production functions such as image-to-video generation, frame interpolation, and localized image-guided animation. We also collect an evaluation benchmark of 948 various animation videos, with specifically developed metrics for animation video generation. Our entire project is publicly available on https://github.com/bilibili/Index-anisora/tree/main

AAAI Conference 2025 Conference Paper

FedCross: Intertemporal Federated Learning Under Evolutionary Games

Jianfeng Lu
Ying Zhang
Riheng Jia
Shuqin Cao
Jing Liu
Hao Fu

Federated Learning (FL) mitigates privacy leakage in decentralized machine learning by allowing multiple clients to train collaboratively locally. However, dynamic mobile networks with high mobility, intermittent connectivity, and bandwidth limitation severely hinder model updates to the cloud server. Although previous studies have typically addressed the user mobility issue through task reassignment or predictive modeling, frequent migrations may result in high communication overhead. Addressing this challenge involves not only dealing with resource constraints, but also finding ways to mitigate the challenges posed by user migrations. We therefore propose an intertemporal incentive framework, FedCross, which ensures the continuity of FL tasks by migrating interrupted training tasks to feasible mobile devices. FedCross comprises two distinct stages. In Stage 1, we address the task allocation problem across regions under resource constraints by employing a multi-objective migration algorithm to quantify the optimal task receivers. Moreover, we adopt evolutionary game theory to capture the dynamic decision-making of users, forecasting the evolution of user proportions across different regions to mitigate frequent migrations. In Stage 2, we utilize a procurement auction mechanism to allocate rewards among base stations, ensuring that those providing high-quality models receive optimal compensation. This approach incentivizes sustained user participation, thereby ensuring the overall feasibility of FedCross. Finally, experimental results validate the theoretical soundness of FedCross and demonstrate its significant reduction in communication overhead.

ICML Conference 2025 Conference Paper

Few-Shot Learner Generalizes Across AI-Generated Image Detection

Shiyu Wu
Jing Liu
Jing Li
Yequan Wang

Current fake image detectors trained on large synthetic image datasets perform satisfactorily on limited studied generative models. However, these detectors suffer a notable performance decline over unseen models. Besides, collecting adequate training data from online generative models is often expensive or infeasible. To overcome these issues, we propose Few-Shot Detector (FSD), a novel AI-generated image detector which learns a specialized metric space for effectively distinguishing unseen fake images using very few samples. Experiments show that FSD achieves state-of-the-art performance by $+11.6\%$ average accuracy on the GenImage dataset with only $10$ additional samples. More importantly, our method is better capable of capturing the intra-category commonality in unseen images without further training. Our code is available at https://github.com/teheperinko541/Few-Shot-AIGI-Detector.

AAAI Conference 2025 Conference Paper

Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage

Md Rafi Ur Rashid
Jing Liu
Toshiaki Koike-Akino
Ye Wang
Shagufta Mehnaz

Fine-tuning large language models on private data for downstream applications poses significant privacy risks in potentially exposing sensitive information. Several popular community platforms now offer convenient distribution of a large variety of pre-trained models, allowing anyone to publish without rigorous verification. This scenario creates a privacy threat, as pre-trained models can be intentionally crafted to compromise the privacy of fine-tuning datasets. In this study, we introduce a novel poisoning technique that uses model-unlearning as an attack tool. This approach manipulates a pre-trained language model to increase the leakage of private data during the fine-tuning process. Our method enhances both membership inference and data extraction attacks while preserving model utility. Experimental results across different models, datasets, and fine-tuning setups demonstrate that our attacks significantly surpass baseline performance. This work serves as a cautionary note for users who download pretrained models from unverified sources, highlighting the potential risks involved.

AAAI Conference 2025 Conference Paper

Graph Contrastive Learning with Joint Spectral Augmentation of Attribute and Topology

Liang Yang
Zhenna Li
Jiaming Zhuo
Jing Liu
Ziyi Ma
Chuan Wang
Zhen Wang
Xiaochun Cao

As an essential technique for Graph Contrastive Learning (GCL), Graph Augmentation (GA) improves the generalization capability of the GCLs by introducing different forms of the same graph. To ensure information integrity, existing GA strategies have been designed to simultaneously process the two types of information available in graphs: node attributes and graph topology. Nonetheless, these strategies tend to augment the two types of graph information separately, ignoring their correlation, resulting in limited representation ability. To overcome this drawback, this paper proposes a novel GCL framework with a Joint spectrAl augMentation, named GCL-JAM. Motivated by the equivalence between the graph learning objective on an attribute graph and the spectral clustering objective on the attribute-interpolated graph, the node attributes are first abstracted as another type of node to harmonize the node attributes and graph topology. The newly constructed graph is then utilized to perform spectral augmentation to capture the correlation during augmentation. Theoretically, the proposed joint spectral augmentation is proved to perturb more inter-class edges and noise attributes compared to separate augmentation methods. Extensive experiments on homophily and heterophily graphs validate the effectiveness and universality of GCL-JAM.

AAAI Conference 2025 Conference Paper

M2OST: Many-to-one Regression for Predicting Spatial Transcriptomics from Digital Pathology Images

Hongyi Wang
Xiuju Du
Jing Liu
Shuyi Ouyang
Yen-Wei Chen
Lanfen Lin

The advancement of Spatial Transcriptomics (ST) has facilitated the spatially-aware profiling of gene expressions based on histopathology images. Although ST data offers valuable insights into the micro-environment of tumors, its acquisition cost remains high. Therefore, directly predicting the ST expressions from digital pathology images is desired. Current methods usually adopt existing regression backbones along with patch-sampling for this task, which ignores the inherent multi-scale information embedded in the pyramidal data structure of digital pathology images, and wastes the inter-spot visual information crucial for accurate gene expression prediction. To address these limitations, we propose M2OST, a many-to-one regression Transformer that can accommodate the hierarchical structure of the pathology images via a decoupled multi-scale feature extractor. Unlike traditional models that are trained with one-to-one image-label pairs, M2OST uses multiple images from different levels of the digital pathology image to jointly predict the gene expressions in their common corresponding spot. Built upon our many-to-one scheme, M2OST can be easily scaled to fit different numbers of inputs, and its network structure inherently incorporates nearby inter-spot features, enhancing regression performance. We have tested M2OST on three public ST datasets and the experimental results show that M2OST can achieve state-of-the-art performance with fewer parameters and floating-point operations (FLOPs).

NeurIPS Conference 2025 Conference Paper

Model Merging in Pre-training of Large Language Models

Yunshui Li
Yiyuan Ma
Shen Yan
Chaoyi Zhang
Jing Liu
Jianqiao Lu
Ziwen Xu
Mengzhao Chen

Model merging has emerged as a promising technique for enhancing large language models, though its application in large-scale pre-training remains relatively unexplored. In this paper, we present a comprehensive investigation of model merging techniques during the pre-training process. Through extensive experiments with both dense and Mixture-of-Experts (MoE) architectures ranging from millions to over 100 billion parameters, we demonstrate that merging checkpoints trained with constant learning rates not only achieves significant performance improvements but also enables accurate prediction of annealing behavior. These improvements lead to both more efficient model development and significantly lower training costs. Our detailed ablation studies on merging strategies and hyperparameters provide new insights into the underlying mechanisms while uncovering novel applications. Through comprehensive experimental analysis, we offer the open-source community practical pre-training guidelines for effective model merging.
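
As a rough illustration of what merging checkpoints can look like, the sketch below takes a weighted average of parameter dictionaries. The abstract does not state which merging rule the paper uses, so the averaging scheme, the function name `merge_checkpoints`, and the toy parameter name `w` are all assumptions made for illustration.

```python
import numpy as np

def merge_checkpoints(checkpoints, weights=None):
    """Weighted average of parameter dicts (assumed merging rule, not the paper's)."""
    if weights is None:
        # Default to a uniform average over all checkpoints.
        weights = [1.0 / len(checkpoints)] * len(checkpoints)
    if len(weights) != len(checkpoints):
        raise ValueError("need one weight per checkpoint")
    merged = {}
    for name in checkpoints[0]:
        merged[name] = sum(w * ckpt[name] for w, ckpt in zip(weights, checkpoints))
    return merged

# Two toy checkpoints that share a single hypothetical parameter "w".
ckpt_a = {"w": np.array([1.0, 3.0])}
ckpt_b = {"w": np.array([3.0, 5.0])}
merged = merge_checkpoints([ckpt_a, ckpt_b])
```

With uniform weights the merged parameter is the element-wise mean of the inputs; non-uniform weights could favor later checkpoints, which is one of the design choices an ablation over merging strategies would vary.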

AAAI Conference 2025 Conference Paper

Numerical Pruning for Efficient Autoregressive Models

Xuan Shen
Zhao Song
Yufa Zhou
Bo Chen
Jing Liu
Ruiyi Zhang
Ryan A. Rossi
Hao Tan

Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing. However, their impressive performance often incurs high computational costs due to their substantial model size. This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning to improve the model efficiency while preserving performance for both language and image generation tasks. Specifically, we propose a training-free pruning method that calculates a numerical score with Newton's method for the Attention and MLP modules, respectively. Besides, we further propose another compensation algorithm to recover the pruned model for better performance. To verify the effectiveness of our method, we provide both theoretical support and extensive experiments. Our experiments show that our method achieves state-of-the-art performance with reduced memory usage and faster generation speeds on GPUs.

NeurIPS Conference 2025 Conference Paper

OmniFC: Rethinking Federated Clustering via Lossless and Secure Distance Reconstruction

Jie Yan
Jing Liu
Zhong-Yuan Zhang

Federated clustering (FC) aims to discover global cluster structures across decentralized clients without sharing raw data, making privacy preservation a fundamental requirement. There are two critical challenges: (1) privacy leakage during collaboration, and (2) robustness degradation due to aggregation of proxy information from non-independent and identically distributed (Non-IID) local data, leading to inaccurate or inconsistent global clustering. Existing solutions typically rely on model-specific local proxies, which are sensitive to data heterogeneity and inherit inductive biases from their centralized counterparts, thus limiting robustness and generality. We propose Omni Federated Clustering (OmniFC), a unified and model-agnostic framework. Leveraging Lagrange coded computing, our method enables clients to share only encoded data, allowing exact reconstruction of the global distance matrix—a fundamental representation of sample relationships—without leaking private information, even under client collusion. This construction is naturally resilient to Non-IID data distributions. This approach decouples FC from model-specific proxies, providing a unified extension mechanism applicable to diverse centralized clustering methods. Theoretical analysis confirms both reconstruction fidelity and privacy guarantees, while comprehensive experiments demonstrate OmniFC's superior robustness, effectiveness, and generality across various benchmarks compared to state-of-the-art methods. Code will be released.

ICRA Conference 2025 Conference Paper

Rapid Autonomous Exploration of Large-Scale Environments for Ground Robots Based on Region Partitioning

Zhi Wen
Xiaotao Liu
Gaojie Lu
Jing Liu

Autonomous exploration in large environments often leads to inefficient long backtracking, as distant targets are prioritized over closer ones. In this work, a hierarchical planning method is proposed, which employs region partitioning to systematically address the aforementioned issue. The space is dynamically partitioned at a coarse resolution, and as exploration progresses, regions with sufficient known areas are further subdivided to locate unknown areas more precisely. A utility function considering unknown area size, travel distance and sequence similarity is used, and the simulated annealing algorithm generates a subregion sequence for global guidance. Within each subregion, a linear acceleration model helps select target points. This method reduces computational load and minimizes long-distance backtracking, enabling more efficient high-frequency planning. Extensive simulations and real-world tests show that our method significantly improves exploration efficiency compared to existing vision-based techniques.

JBHI Journal 2025 Journal Article

SAMA: A Self-and-Mutual Attention Network for Accurate Recurrence Prediction of Non-Small Cell Lung Cancer Using Genetic and CT Data

Yang Ai
Jing Liu
Yinhao Li
Fang Wang
Xiuju Du
Rahul Kumar Jain
Lanfen Lin
Yen-Wei Chen

Accurate preoperative recurrence prediction for non-small cell lung cancer (NSCLC) is a challenging issue in the medical field. Existing studies primarily conduct image and molecular analyses independently or directly fuse multimodal information through radiomics and genomics, which fail to fully exploit and effectively utilize the highly heterogeneous cross-modal information at different levels and model the complex relationships between modalities, resulting in poor fusion performance and becoming the bottleneck of precise recurrence prediction. To address these limitations, we propose a novel unified framework, the Self-and-Mutual Attention (SAMA) Network, designed to efficiently fuse and utilize macroscopic CT images and microscopic gene data for precise NSCLC recurrence prediction, integrating handcrafted features, deep features, and gene features. Specifically, we design a Self-and-Mutual Attention Module that performs three-stage fusion: the self-enhancement stage enhances modality-specific features; the gene-guided and CT-guided cross-modality fusion stages perform bidirectional cross-guidance on the self-enhanced features, complementing and refining each modality, enhancing heterogeneous feature expression; and the optimized feature aggregation stage ensures the refined interactive features for precise prediction. Extensive experiments on both publicly available datasets from The Cancer Imaging Archive (TCIA) and The Cancer Genome Atlas (TCGA) demonstrate that our method achieves state-of-the-art performance and exhibits broad applicability to various cancers.

EAAI Journal 2025 Journal Article

ShapG: New feature importance method based on the Shapley value

Chi Zhao
Jing Liu
Elena Parilina

With the wide application of Artificial Intelligence (AI), it has become particularly important to make the decisions of AI systems explainable and transparent. In this paper, we propose a new Explainable Artificial Intelligence (XAI) method called ShapG (Explanations based on Shapley value for Graphs) for measuring feature importance. ShapG is a model-agnostic global explanation method. At the first stage, it defines an undirected graph based on the dataset, where nodes represent features and edges are added based on calculation of correlation coefficients between features. At the second stage, it calculates an approximated Shapley value by sampling the data taking into account this graph structure. The sampling approach of ShapG allows the importance of features to be calculated efficiently, i.e., with reduced computational complexity. Comparison of ShapG with other existing XAI methods shows that it provides more accurate explanations for two examined datasets. We also compared other XAI methods developed based on cooperative game theory with ShapG in running time, and the results show that ShapG exhibits obvious advantages in its running time, which further proves the efficiency of ShapG. In addition, extensive experiments demonstrate a wide range of applicability of the ShapG method for explaining complex models. We find ShapG an important tool in improving explainability and transparency of AI systems and believe it can be widely used in various fields.
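
The two stages described above can be sketched as follows. This is a minimal illustration and not the ShapG algorithm itself: the correlation threshold, the R^2 value function, the restriction of coalitions to a feature's graph neighbors, and all function names are assumptions introduced here.

```python
import numpy as np

def correlation_graph(X, threshold=0.3):
    """Stage 1 (assumed rule): link features i, j when |corr(i, j)| >= threshold."""
    corr = np.corrcoef(X, rowvar=False)
    adj = np.abs(corr) >= threshold
    np.fill_diagonal(adj, False)
    return adj

def r2_value(X, y, subset):
    """Hypothetical coalition value: R^2 of a least-squares fit on the subset."""
    if len(subset) == 0:
        return 0.0
    A = np.column_stack([X[:, sorted(subset)], np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    total = y - y.mean()
    return 1.0 - float(resid @ resid) / float(total @ total)

def sampled_importance(X, y, adj, n_samples=100, seed=0):
    """Stage 2 (sketch): permutation-sampled Shapley estimate within each neighborhood."""
    rng = np.random.default_rng(seed)
    scores = np.zeros(X.shape[1])
    for i in range(X.shape[1]):
        players = np.append(np.flatnonzero(adj[i]), i)
        for _ in range(n_samples):
            perm = rng.permutation(players)
            pos = int(np.flatnonzero(perm == i)[0])
            coalition = set(perm[:pos].tolist())  # players that precede i
            scores[i] += r2_value(X, y, coalition | {i}) - r2_value(X, y, coalition)
        scores[i] /= n_samples
    return scores

# Toy data: feature 0 drives y, feature 1 is correlated with 0, feature 2 is noise.
rng = np.random.default_rng(1)
x0 = rng.normal(size=500)
X = np.column_stack([x0, x0 + 0.5 * rng.normal(size=500), rng.normal(size=500)])
y = 2.0 * x0 + 0.1 * rng.normal(size=500)
adj = correlation_graph(X)
scores = sampled_importance(X, y, adj)
```

Restricting coalitions to graph neighbors is what keeps the number of value-function evaluations small in this sketch; a full Shapley computation would range over all feature subsets.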

NeurIPS Conference 2025 Conference Paper

SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes

Yifan Yang
Zhen Zhang
Rupak Vignesh Swaminathan
Jing Liu
Nathan Susanj
Zheng Zhang

Fine-tuning vision language models (VLMs) has achieved remarkable performance across various downstream tasks; yet, it requires access to model gradients through backpropagation (BP), making it unsuitable for memory-constrained, inference-only edge devices. To address this limitation, previous work has explored various BP-free fine-tuning methods. However, these approaches often rely on high-variance evolutionary strategies (ES) or zeroth-order (ZO) optimization, and often fail to achieve satisfactory performance. In this paper, we propose a hybrid Sharpness-aware Zeroth-order optimization (SharpZO) approach, specifically designed to enhance the performance of ZO VLM fine-tuning via a sharpness-aware warm-up training. SharpZO features a two-stage optimization process: a sharpness-aware ES stage that globally explores and smooths the loss landscape to construct a strong initialization, followed by a fine-grained local search via sparse ZO optimization. The entire optimization relies solely on forward passes. Detailed theoretical analysis and extensive experiments on CLIP models demonstrate that SharpZO significantly improves accuracy and convergence speed, achieving up to 7% average gain over state-of-the-art forward-only methods.

AAAI Conference 2025 Conference Paper

TRAIL: Trust-Aware Client Scheduling for Semi-Decentralized Federated Learning

Gangqiang Hu
Jianfeng Lu
Jianmin Han
Shuqin Cao
Jing Liu
Hao Fu

Due to the sensitivity of data, Federated Learning (FL) is employed to enable distributed machine learning while safeguarding data privacy and accommodating the requirements of various devices. However, in the context of semi-decentralized FL, clients’ communication and training states are dynamic. This variability arises from local training fluctuations, heterogeneous data distributions, and intermittent client participation. Most existing studies primarily focus on stable client states, neglecting the dynamic challenges inherent in real-world scenarios. To tackle this issue, we propose a TRust-Aware clIent scheduLing mechanism called TRAIL, which assesses client states and contributions, enhancing model training efficiency through selective client participation. We focus on a semi-decentralized FL framework where edge servers and clients train a shared global model using unreliable intra-cluster model aggregation and inter-cluster model consensus. First, we propose an adaptive hidden semi-Markov model to estimate clients’ communication states and contributions. Next, we address a client-server association optimization problem to minimize global training loss. Using convergence analysis, we propose a greedy client scheduling algorithm. Finally, our experiments conducted on real-world datasets demonstrate that TRAIL outperforms state-of-the-art baselines, achieving an improvement of 8.7% in test accuracy and a reduction of 15.3% in training loss.

NeurIPS Conference 2024 Conference Paper

CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing

Yen-Ju Lu
Jing Liu
Thomas Thebaud
Laureano Moro-Velazquez
Ariya Rastrow
Najim Dehak
Jesus Villalba

We introduce Condition-Aware Self-Supervised Learning Representation (CA-SSLR), a generalist conditioning model broadly applicable to various speech-processing tasks. Compared to standard fine-tuning methods that optimize for downstream models, CA-SSLR integrates language and speaker embeddings from earlier layers, making the SSL model aware of the current language and speaker context. This approach reduces the reliance on the input audio features while preserving the integrity of the base SSLR. CA-SSLR improves the model’s capabilities and demonstrates its generality on unseen tasks with minimal task-specific tuning. Our method employs linear modulation to dynamically adjust internal representations, enabling fine-grained adaptability without significantly altering the original model behavior. Experiments show that CA-SSLR reduces the number of trainable parameters, mitigates overfitting, and excels in under-resourced and unseen tasks. Specifically, CA-SSLR achieves a 10% relative reduction in LID errors, a 37% improvement in ASR CER on the ML-SUPERB benchmark, and a 27% decrease in SV EER on VoxCeleb-1, demonstrating its effectiveness.
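
The linear modulation mentioned above can be pictured as a condition-dependent scale and shift applied to hidden features. The sketch below is a generic feature-wise modulation and not the CA-SSLR architecture: the `1 + scale` parameterization, the zero initialization, the shapes, and the name `linear_modulation` are assumptions for illustration.

```python
import numpy as np

def linear_modulation(hidden, condition, w_scale, w_shift):
    """Compute h' = (1 + c @ W_scale) * h + c @ W_shift, broadcast over frames."""
    scale = 1.0 + condition @ w_scale  # one multiplicative factor per hidden unit
    shift = condition @ w_shift        # one additive offset per hidden unit
    return hidden * scale + shift

rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 8))   # 5 frames, 8 hidden units
condition = rng.normal(size=4)     # stands in for a language or speaker embedding

# With zero-initialized projections the layer is an identity map,
# so the base model's behavior is untouched before any tuning.
unchanged = linear_modulation(hidden, condition, np.zeros((4, 8)), np.zeros((4, 8)))

# Non-zero projections adjust every frame with the same scale and shift.
modulated = linear_modulation(hidden, condition, 0.1 * np.ones((4, 8)), np.ones((4, 8)))
```

Starting from an identity map is one common way to add conditioning without disturbing a pretrained model, which matches the abstract's stated goal of not significantly altering the original behavior.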

EAAI Journal 2024 Journal Article

Digital-analog driven multi-scale transfer for smart bearing fault diagnosis

Wenbin Huang
Zixian Li
Xiaoxi Ding
Dong He
Qihang Wu
Jing Liu

Self-diagnosis and self-decision are crucial to smart bearings, where intelligent and robust models should be built and deployed on the smart bearing chip for an on-line edge effect. However, this process requires a large amount of labeled prior data to train the fault identification model. Although the existing digital-analog driven transfer learning methods can realize fault identification under small samples, these algorithms mainly focus on how to reduce the difference between the two domains. These algorithms do not form a complete and applicable method for smart bearing fault diagnosis. Focusing on these issues, a digital-analog driven multi-scale transfer (DaD-MsT) method is proposed for smart bearing fault diagnosis. Different from the conventional methods, it can be achieved through end-side and edge-side cooperation, and the effect of transfer diagnosis is further improved by the proposed deep branch transfer network (DBTN) model. First, the smart bearing dynamic model is established, and the dynamic model response is obtained for use as source domain data on the end side. Then, a DBTN model is proposed to realize more effective digital-analog driven transfer learning. Finally, the trained model is deployed on the edge chip of the smart bearing for real-time fault identification and parameter fine-tuning. Experiments and comparisons verify the effectiveness of the proposed method in the case of small-sample data. Specifically, an online edge intelligent diagnosis system is also built to illustrate the ability in actual application of smart bearing intelligent diagnosis.

AAAI Conference 2024 Conference Paper

Graph Disentangled Contrastive Learning with Personalized Transfer for Cross-Domain Recommendation

Jing Liu
Lele Sun
Weizhi Nie
Peiguang Jing
Yuting Su

Cross-Domain Recommendation (CDR) has been proven to effectively alleviate the data sparsity problem in Recommender System (RS). Recent CDR methods often disentangle user features into domain-invariant and domain-specific features for efficient cross-domain knowledge transfer. Despite showcasing robust performance, three crucial aspects remain unexplored for existing disentangled CDR approaches: i) The significance nuances of the interaction behaviors are ignored in generating disentangled features; ii) The user features are disentangled without regard to the individual items to be recommended; iii) The general knowledge transfer overlooks the user's personality when interacting with diverse items. To this end, we propose a Graph Disentangled Contrastive framework for CDR (GDCCDR) with personalized transfer by meta-networks. An adaptive parameter-free filter is proposed to gauge the significance of diverse interactions, thereby facilitating more refined disentangled representations. In light of the success of Contrastive Learning (CL) in RS, we propose two CL-based constraints for item-aware disentanglement. Proximate CL ensures the coherence of domain-invariant features between domains, while eliminatory CL strives to disentangle features within each domain using mutual information between users and items. Finally, for domain-invariant features, we adopt meta-networks to achieve personalized transfer. Experimental results on four real-world datasets demonstrate the superiority of GDCCDR over state-of-the-art methods.

JBHI Journal 2024 Journal Article

iProps: A Comprehensive Software Tool for Protein Classification and Analysis With Automatic Machine Learning Capabilities and Model Interpretation Capabilities

Changli Feng
Haiyan Wei
Chugui Xu
Bin Feng
Xiaorong Zhu
Jing Liu
Quan Zou

Protein classification is a crucial field in bioinformatics. The development of a comprehensive tool that can perform feature evaluation, visualization, automated machine learning, and model interpretation would significantly advance research in protein classification. However, there is a significant gap in the literature regarding tools that integrate all these essential functionalities. This paper presents iProps, a novel Python-based software package, meticulously crafted to fulfill these multifaceted requirements. iProps is distinguished by its proficiency in feature extraction, evaluation, automated machine learning, and interpretation of classification models. Firstly, iProps fully leverages evolutionary information and amino acid reduction information to propose or extend several numerical protein features that are independent of sequence length, including SC-PSSM, ORDip, TRC, CTDC-E, CKSAAGP-E, and so forth; at the same time, it also implements the calculation of 17 other numerical features within the software. iProps also provides feature combination operations for the aforementioned features to generate more hybrid features, and has added data balancing sampling processing as well as built-in classifier settings, among other functionalities. Thus, it can discern the most effective protein class recognition feature from a multitude of candidates, utilizing three automated machine learning algorithms to identify the optimal classifiers and parameter settings. Furthermore, iProps generates a detailed explanatory report that includes 23 informative graphs derived from three interpretable models. To assess the performance of iProps, a series of numerical experiments were conducted using two well-established datasets. The results demonstrated that our software achieved superior recognition performance in every case.
Beyond its contributions to bioinformatics, iProps broadens its applicability by offering robust data analysis tools that are beneficial across various disciplines, capitalizing on its automated machine learning and model interpretation capabilities. As an open-source platform, iProps is readily accessible and features an intuitive user interface, ensuring ease of use for individuals, even those without a background in programming.

NeurIPS Conference 2024 Conference Paper

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models

Akide Liu
Jing Liu
Zizheng Pan
Yefei He
Gholamreza Haffari
Bohan Zhuang

A critical approach for efficiently deploying computationally demanding large language models (LLMs) is Key-Value (KV) caching. The KV cache stores key-value states of previously generated tokens, significantly reducing the need for repetitive computations and thereby lowering latency in autoregressive generation. However, the size of the KV cache grows linearly with sequence length, posing challenges for applications requiring long context input and extensive sequence generation. In this paper, we present a simple yet effective approach, called MiniCache, to compress the KV cache across layers from a novel depth perspective, significantly reducing the memory footprint for LLM inference. Our approach is based on the observation that KV cache states exhibit high similarity between the adjacent layers in the middle-to-deep portion of LLMs. To facilitate merging, we propose disentangling the states into the magnitude and direction components, interpolating the directions of the state vectors while preserving their lengths unchanged. Furthermore, we introduce a token retention strategy to keep highly distinct state pairs unmerged, thus preserving the information with minimal additional storage overhead. Our MiniCache is training-free and general, complementing existing KV cache compression strategies, such as quantization and sparsity. We conduct a comprehensive evaluation of MiniCache utilizing various models including LLaMA-2, LLaMA-3, Phi-3, Mistral, and Mixtral across multiple benchmarks, demonstrating its exceptional performance in achieving superior compression ratios and high throughput. On the ShareGPT dataset, LLaMA-2-7B with cross-layer merging achieves a compression ratio of $1.53\times$. Additionally, since MiniCache is orthogonal to existing quantization techniques, it can achieve a compression ratio of up to $5.02\times$ when combined with the 4-bit quantization technique, enhancing inference throughput by approximately $5\times$ and reducing the memory footprint by $41\%$ compared to the FP16 full cache baseline, all while maintaining near-lossless performance. Project is available at https://minicache.vmv.re.
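
The magnitude/direction merging described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that directions are interpolated, so the use of spherical interpolation, the function names, the interpolation weight `t`, and the angle-based retention threshold `max_angle` are all assumptions made for illustration. Input vectors are assumed to be non-zero.

```python
import math


def norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def merge_pair(a, b, t=0.5, max_angle=2.0):
    """Merge the state vectors of one token from two adjacent layers.

    Returns (shared_direction, |a|, |b|), or None when the angle between
    the vectors exceeds max_angle (a hypothetical retention threshold,
    in radians), meaning the pair should be kept unmerged.
    """
    na, nb = norm(a), norm(b)
    ua = [x / na for x in a]  # unit direction of a
    ub = [x / nb for x in b]  # unit direction of b
    cos_omega = max(-1.0, min(1.0, sum(x * y for x, y in zip(ua, ub))))
    omega = math.acos(cos_omega)
    if omega > max_angle:
        return None  # too distinct: retain both states
    if omega < 1e-8:
        return ua, na, nb  # directions already coincide
    s = math.sin(omega)
    wa = math.sin((1.0 - t) * omega) / s
    wb = math.sin(t * omega) / s
    shared = [wa * x + wb * y for x, y in zip(ua, ub)]  # assumed: spherical interpolation
    return shared, na, nb


def restore(direction, magnitude):
    """Rebuild one layer's state from the shared direction and its own length."""
    return [magnitude * d for d in direction]
```

In this sketch only one direction vector is stored per merged pair, plus one scalar length per layer, which is where the memory saving would come from.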

JBHI Journal 2024 Journal Article

Noninvasive Left Ventricle Pressure-Volume Loop Determination Method With Cardiac Magnetic Resonance Imaging and Carotid Tonometry Using a Physics-Informed Approach

Jing Liu
Coskun Bilgi
Alda Bregasi
Gary F Mitchell
Niema M Pahlevan

Left ventricular (LV) pressure-volume loop (PV-loop) is an important tool to quantify intrinsic left ventricular properties and ventricular-arterial coupling. A significant drawback of conventional PV-loop assessment is the need for invasive measurements, which limits its widespread application. To tackle this issue, we developed a PV-loop determination method by using non-invasive measurements from arterial tonometry and cardiac magnetic resonance imaging. A physics-based optimization strategy was designed that adaptively identifies the optimal parameters to construct the PV-loop. We conducted comparative analysis in a convenience sample (N = 77) with heart failure (HF) (N = 23) patients and a control (N = 54) group to evaluate the sensitivity of our PV-loop estimation algorithm. Significant and coherent differences between cohorts for the parameters derived using the PV-loop were observed. Our method captures the significant elevation of LV end diastolic pressure (p<0.001), and the decrease of the ventricular efficiency (p<0.0001) of the HF patients compared to the Control group. This method further captures the mechanistic changes of the LV by highlighting the significant differences of the smaller stroke work (p<0.0001), mean external power (p<0.05), and contractility (p<0.001) between these groups. The LV performance metrics align well with the previous clinical PV-loop observations of HF patients and our results demonstrate that the proposed PV-loop reconstruction method can be used to assess the ventricular functional changes associated with HF. Using this noninvasive method may significantly impact and facilitate the diagnosis and therapeutic management of HF.

NeurIPS Conference 2024 Conference Paper

Pretrained Optimization Model for Zero-Shot Black Box Optimization

Xiaobin Li
Kai Wu
Yujian B. Li
Xiaoyu Zhang
Handing Wang
Jing Liu

Zero-shot optimization involves optimizing a target task that was not seen during training, aiming to provide the optimal solution without or with minimal adjustments to the optimizer. It is crucial to ensure reliable and robust performance in various applications. Current optimizers often struggle with zero-shot optimization and require intricate hyperparameter tuning to adapt to new tasks. To address this, we propose a Pretrained Optimization Model (POM) that leverages knowledge gained from optimizing diverse tasks, offering efficient solutions to zero-shot optimization through direct application or fine-tuning with few-shot samples. Evaluation on the BBOB benchmark and two robot control tasks demonstrates that POM outperforms state-of-the-art black-box optimization methods, especially for high-dimensional tasks. Fine-tuning POM with a small number of samples and budget yields significant performance improvements. Moreover, POM demonstrates robust generalization across diverse task distributions, dimensions, population sizes, and optimization horizons. For code implementation, see https://github.com/ninja-wm/POM/.

NeurIPS Conference 2024 Conference Paper

Rapid Plug-in Defenders

Kai Wu
Yujian B. Li
Jian Lou
Xiaoyu Zhang
Handing Wang
Jing Liu

In the realm of daily services, the deployment of deep neural networks underscores the paramount importance of their reliability. However, the vulnerability of these networks to adversarial attacks, primarily evasion-based, poses a concerning threat to their functionality. Common methods for enhancing robustness involve heavy adversarial training or leveraging learned knowledge from clean data, both necessitating substantial computational resources. This inherent time-intensive nature severely limits the agility of large foundational models to swiftly counter adversarial perturbations. To address this challenge, this paper focuses on the Rapid Plug-in Defender (RaPiD) problem, aiming to rapidly counter adversarial perturbations without altering the deployed model. Drawing inspiration from the generalization and the universal computation ability of pre-trained transformer models, we propose a novel method termed CeTaD (Considering Pre-trained Transformers as Defenders) for RaPiD, optimized for efficient computation. CeTaD strategically fine-tunes the normalization layer parameters within the defender using a limited set of clean and adversarial examples. Our evaluation centers on assessing CeTaD's effectiveness, transferability, and the impact of different components in scenarios involving one-shot adversarial examples. The proposed method is capable of rapidly adapting to various attacks and different application scenarios without altering the target model and clean training data. We also explore the influence of varying training data conditions on CeTaD's performance. Notably, CeTaD exhibits adaptability across differentiable service models and proves the potential of continuous learning.

JBHI Journal 2024 Journal Article

Segmentation Guided Crossing Dual Decoding Generative Adversarial Network for Synthesizing Contrast-Enhanced Computed Tomography Images

Yulin Yang
Qingqing Chen
Yinhao Li
Fang Wang
Xian-Hua Han
Yutaro Iwamoto
Jing Liu
Lanfen Lin

Although contrast-enhanced computed tomography (CE-CT) images significantly improve the accuracy of diagnosing focal liver lesions (FLLs), the administration of contrast agents imposes a considerable physical burden on patients. The utilization of generative models to synthesize CE-CT images from non-contrasted CT images offers a promising solution. However, existing image synthesis models tend to overlook the importance of critical regions, inevitably reducing their effectiveness in downstream tasks. To overcome this challenge, we propose an innovative CE-CT image synthesis model called Segmentation Guided Crossing Dual Decoding Generative Adversarial Network (SGCDD-GAN). Specifically, the SGCDD-GAN involves a crossing dual decoding generator including an attention decoder and an improved transformation decoder. The attention decoder is designed to highlight some critical regions within the abdominal cavity, while the improved transformation decoder is responsible for synthesizing CE-CT images. These two decoders are interconnected using a crossing technique to enhance each other's capabilities. Furthermore, we employ a multi-task learning strategy to guide the generator to focus more on the lesion area. To evaluate the performance of the proposed SGCDD-GAN, we test it on an in-house CE-CT dataset. In both CE-CT image synthesis tasks, namely synthesizing ART images and synthesizing PV images, the proposed SGCDD-GAN demonstrates superior performance metrics across the entire image and liver region, including SSIM, PSNR, MSE, and PCC scores. Furthermore, CE-CT images synthesized from our SGCDD-GAN achieve remarkable accuracy rates of 82.68%, 94.11%, and 94.11% in a deep learning-based FLLs classification task, along with a pilot assessment conducted by two radiologists.

TIST Journal 2024 Journal Article

Self-supervised Bipartite Graph Representation Learning: A Dirichlet Max-margin Matrix Factorization Approach

Shenghai Zhong
Shu Guo
Jing Liu
Hongren Huang
Lihong Wang
Jianxin Li
Chen Li
Yiming Hei

Bipartite graph representation learning aims to obtain node embeddings by compressing sparse vectorized representations of interactions between two types of nodes, e.g., users and items. Incorporating structural attributes among homogeneous nodes, such as user communities, improves the identification of similar interaction preferences, namely, user/item embeddings, for downstream tasks. However, existing methods often fail to proactively discover and fully utilize these latent structural attributes. Moreover, the manual collection and labeling of structural attributes is always costly. In this article, we propose a novel approach called Dirichlet Max-margin Matrix Factorization (DM3F), which adopts a self-supervised strategy to discover latent structural attributes and model discriminative node representations. Specifically, in self-supervised learning, our approach generates pseudo group labels (i.e., structural attributes) as a supervised signal using the Dirichlet process without relying on manual collection and labeling, and employs them in a max-margin classification. Additionally, we introduce a Variational Markov Chain Monte Carlo algorithm (Variational MCMC) to effectively update the parameters. The experimental results on six real datasets demonstrate that, in the majority of cases, the proposed method outperforms existing approaches based on matrix factorization and neural networks. Furthermore, the modularity analysis confirms the effectiveness of our model in capturing structural attributes to produce high-quality user embeddings.

AAAI Conference 2024 Conference Paper

Signed Graph Neural Ordinary Differential Equation for Modeling Continuous-Time Dynamics

Lanlan Chen
Kai Wu
Jian Lou
Jing Liu

Modeling continuous-time dynamics constitutes a foundational challenge, and uncovering inter-component correlations within complex systems holds promise for enhancing the efficacy of dynamic modeling. The prevailing approach of integrating graph neural networks with ordinary differential equations has demonstrated promising performance. However, they disregard the crucial signed information potential on graphs, impeding their capacity to accurately capture real-world phenomena and leading to subpar outcomes. In response, we introduce a novel approach: a signed graph neural ordinary differential equation, adeptly addressing the limitations of miscapturing signed information. Our proposed solution boasts both flexibility and efficiency. To substantiate its effectiveness, we seamlessly integrate our devised strategies into three preeminent graph-based dynamic modeling frameworks: graph neural ordinary differential equations, graph neural controlled differential equations, and graph recurrent neural networks. Rigorous assessments encompass three intricate dynamic scenarios from physics and biology, as well as scrutiny across four authentic real-world traffic datasets. Remarkably outperforming the trio of baselines, empirical results underscore the substantial performance enhancements facilitated by our proposed approach. Our code can be found at https://github.com/beautyonce/SGODE.

AAAI Conference 2024 Conference Paper

Temporal Adaptive RGBT Tracking with Modality Prompt

Hongyu Wang
Xiaotao Liu
Yifan Li
Meng Sun
Dian Yuan
Jing Liu

RGBT tracking has been widely used in various fields such as robotics, surveillance processing, and autonomous driving. Existing RGBT trackers fully explore the spatial information between the template and the search region and locate the target based on the appearance matching results. However, these RGBT trackers have very limited exploitation of temporal information, either ignoring temporal information or exploiting it through online sampling and training. The former struggles to cope with the object state changes, while the latter neglects the correlation between spatial and temporal information. To alleviate these limitations, we propose a novel Temporal Adaptive RGBT Tracking framework, named TATrack. TATrack has a spatio-temporal two-stream structure and captures temporal information by an online updated template, where the two-stream structure refers to the multi-modal feature extraction and cross-modal interaction for the initial template and the online update template respectively. TATrack comprehensively exploits spatio-temporal information and multi-modal information for target localization. In addition, we design a spatio-temporal interaction (STI) mechanism that bridges two branches and enables cross-modal interaction to span longer time scales. Extensive experiments on three popular RGBT tracking benchmarks show that our method achieves state-of-the-art performance, while running at real-time speed.

NeurIPS Conference 2024 Conference Paper

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification

Yefei He
Luoming Zhang
Weijia Wu
Jing Liu
Hong Zhou
Bohan Zhuang

KV cache stores key and value states from previous tokens to avoid re-computation, yet it demands substantial storage space, especially for long sequences. Adaptive KV cache compression seeks to discern the saliency of tokens, preserving vital information while aggressively compressing those of less importance. However, previous methods of this approach exhibit significant performance degradation at high compression ratios due to inaccuracies in identifying salient tokens. Additionally, the compression process introduces excessive overhead, substantially increasing memory burdens and the generation latency. In this paper, we present ZipCache, an accurate and efficient KV cache quantization method for large language models (LLMs). First, we construct a strong baseline for quantizing KV cache. Through the proposed channel-separable tokenwise quantization scheme, the memory overhead of quantization parameters is substantially reduced compared to fine-grained groupwise quantization. To enhance the compression ratio, we propose normalized attention score as an effective metric for identifying salient tokens by considering the lower triangle characteristics of the attention matrix. The quantization bit-width for each token is then adaptively assigned based on its saliency. Moreover, we develop an efficient approximation method that decouples the saliency metric from full attention scores, enabling compatibility with fast attention implementations like FlashAttention. Extensive experiments demonstrate that ZipCache achieves superior compression ratios, fast generation speed and minimal performance losses compared with previous KV cache compression methods. For instance, when evaluating Mistral-7B model on GSM8k dataset, ZipCache is capable of compressing the KV cache by $4.98\times$, with only a 0.38% drop in accuracy. In terms of efficiency, ZipCache also showcases a 37.3% reduction in prefill-phase latency, a 56.9% reduction in decoding-phase latency, and a 19.8% reduction in GPU memory usage when evaluating LLaMA3-8B model with an input length of 4096. Code is available at https://github.com/ThisisBillhe/ZipCache/.
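
One way to read the normalized attention score mentioned in this abstract is sketched below. With causal (lower-triangular) attention, early tokens are attended to by more rows, so a plain column sum favors them; dividing each column sum by the number of rows that can see that token removes this positional bias. The exact formula, the function names, the `salient_ratio` parameter, and the 4-bit/2-bit split are assumptions for illustration and are not taken from the ZipCache code.

```python
def normalized_attention_scores(attn):
    """Per-token saliency from a causal attention matrix.

    attn[row][col] is the attention that query `row` pays to token `col`;
    entries with col > row are assumed to be zero (lower-triangular).
    """
    n = len(attn)
    scores = []
    for col in range(n):
        total = sum(attn[row][col] for row in range(col, n))
        scores.append(total / (n - col))  # n - col rows can attend to this token
    return scores


def assign_bits(scores, salient_ratio=0.5, high_bits=4, low_bits=2):
    """Give the most salient tokens a higher (hypothetical) bit-width."""
    k = max(1, int(len(scores) * salient_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    salient = set(ranked[:k])
    return [high_bits if i in salient else low_bits for i in range(len(scores))]
```

The bit-widths returned by `assign_bits` would then drive a per-token quantizer, which is outside the scope of this sketch.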

IJCAI Conference 2023 Conference Paper

A Survey on Efficient Training of Transformers

Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen

Recent advances in Transformers have come with a huge requirement on computing resources, highlighting the importance of developing efficient training techniques to make Transformer training faster, at lower cost, and to higher accuracy by the efficient use of computation and memory resources. This survey provides the first systematic overview of the efficient training of Transformers, covering the recent progress in acceleration arithmetic and hardware, with a focus on the former. We analyze and compare methods that save computation and memory costs for intermediate tensors during training, together with techniques on hardware/algorithm co-design. We finally discuss challenges and promising areas for future research.

YNICL Journal 2023 Journal Article

Brain development mediates the relationship between self-reported poor parental monitoring and adolescent anxiety

Yiman Li
Zheyi Zhou
Yuqi Zhang
Hui Ai
Mingfang Liu
Jing Liu
Li Wang
Jiang Qiu

Adolescence is the peak period for the onset of generalized anxiety disorder (GAD). Brain networks of cognitive and affective control in adolescents are not well developed when their exposure to external stimuli suddenly increases. Reasonable parental monitoring is especially important during this period. To examine the role of parental monitoring in the development of functional brain networks of GAD, we conducted a cross-validation-based predictive study based on the functional brain networks of 192 participants. We found that a set of functional brain networks, especially the default mode network and its connectivity with the frontoparietal network, could predict the ages of adolescents, which was replicated in three independent samples. Importantly, the difference between predicted age and chronological age significantly mediated the relationship between parental monitoring and anxiety levels. These findings suggest that inadequate parental monitoring plays a crucial role in the delayed development of specific brain networks associated with GAD in adolescents. Our work highlights the important role of parental monitoring in adolescent development.

EAAI Journal 2023 Journal Article

Condition monitoring of wind turbine using novel deep learning method and dynamic kernel principal components Mahalanobis distance

Wenhe Chen
Hanting Zhou
Longsheng Cheng
Jing Liu
Min Xia

Condition monitoring (CM) of wind turbine (WT) has been increasingly adopted for its fault diagnosis and maintenance decision-making. However, the data collected in CM is typically noisy, multidimensional, and highly nonlinear, which causes significant challenges in achieving the effective CM of WT. This paper proposes a novel CM method using a deep learning model with temporal pattern attention (TPA) and a dynamic kernel principal components Mahalanobis distance (DKPMD). The method can evaluate the WT performance accurately for detecting faults. First, outliers are recognized and removed using isolation forest improved by sparse autoencoder and fuzzy c-means clustering (FSIF) from raw wind turbine data of health state for enhancing the quality and reliability of data in modeling. Then, a gated recurrent unit (GRU) is developed for data reconstruction of the objective variables using LassoNet and TPA, which can capture the short- and long-term temporal relationships under different time steps based on selected variables. Meanwhile, kernel RMSE (KRMSE) is applied as a loss function, which avoids the negative effects of large reconstructed errors in parameter optimization. A condition index (CI) is constructed using DKPMD based on the reconstructed errors to consider the dynamic correlation between the variables. Finally, a delay perception-based IF (DPIF) is utilized to determine the threshold. Experiments with data from real WT demonstrate the effectiveness of the developed approach in detecting early abnormal conditions, which outperforms other state-of-the-art methods.

NeurIPS Conference 2023 Conference Paper

GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER

Mingzhen Sun
Weining Wang
Zihan Qin
Jiahui Sun
Sihan Chen
Jing Liu

Video generation necessitates both global coherence and local realism. This work presents a novel non-autoregressive method GLOBER, which first generates global features to obtain comprehensive global guidance and then synthesizes video frames based on the global features to generate coherent videos. Specifically, we propose a video auto-encoder, where a video encoder encodes videos into global features, and a video decoder, built on a diffusion model, decodes the global features and synthesizes video frames in a non-autoregressive manner. To achieve maximum flexibility, our video decoder perceives temporal information through normalized frame indexes, which enables it to synthesize arbitrary sub video clips with predetermined starting and ending frame indexes. Moreover, a novel adversarial loss is introduced to improve the global coherence and local realism between the synthesized video frames. Finally, we employ a diffusion-based video generator to fit the global features outputted by the video encoder for video generation. Extensive experimental results demonstrate the effectiveness and efficiency of our proposed method, and new state-of-the-art results have been achieved on multiple benchmarks.

NeurIPS Conference 2023 Conference Paper

How2comm: Communication-Efficient and Collaboration-Pragmatic Multi-Agent Perception

Dingkang Yang
Kun Yang
Yuzheng Wang
Jing Liu
Zhi Xu
Rongbin Yin
Peng Zhai
Lihua Zhang

Multi-agent collaborative perception has recently received widespread attention as an emerging application in driving scenarios. Despite the advancements in previous efforts, challenges remain due to various noises in the perception procedure, including communication redundancy, transmission delay, and collaboration heterogeneity. To tackle these issues, we propose How2comm, a collaborative perception framework that seeks a trade-off between perception performance and communication bandwidth. Our novelties lie in three aspects. First, we devise a mutual information-aware communication mechanism to maximally sustain the informative features shared by collaborators. The spatial-channel filtering is adopted to perform effective feature sparsification for efficient communication. Second, we present a flow-guided delay compensation strategy to predict future characteristics from collaborators and eliminate feature misalignment due to temporal asynchrony. Ultimately, a pragmatic collaboration transformer is introduced to integrate holistic spatial semantics and temporal context clues among agents. Our framework is thoroughly evaluated on several LiDAR-based collaborative detection datasets in real-world and simulated scenarios. Comprehensive experiments demonstrate the superiority of How2comm and the effectiveness of all its vital components. The code will be released at https://github.com/ydk122024/How2comm.

IJCAI Conference 2023 Conference Paper

Less Learn Shortcut: Analyzing and Mitigating Learning of Spurious Feature-Label Correlation

Yanrui Du
Jing Yan
Yan Chen
Jing Liu
Sendong Zhao
Qiaoqiao She
Hua Wu
Haifeng Wang

Recent research has revealed that deep neural networks often take dataset biases as a shortcut to make decisions rather than understand tasks, leading to failures in real-world applications. In this study, we focus on the spurious correlation between word features and labels that models learn from the biased data distribution of training data. In particular, we define a word that highly co-occurs with a specific label as a biased word, and an example containing a biased word as a biased example. Our analysis shows that biased examples are easier for models to learn, while at the time of prediction, biased words make a significantly higher contribution to the models' predictions, and models tend to assign predicted labels by over-relying on the spurious correlation between words and labels. To mitigate models' over-reliance on the shortcut (i.e., spurious correlation), we propose a training strategy Less-Learn-Shortcut (LLS): our strategy quantifies the biased degree of the biased examples and down-weights them accordingly. Experimental results on Question Matching, Natural Language Inference and Sentiment Analysis tasks show that LLS is a task-agnostic strategy and can improve the model performance on adversarial data while maintaining good performance on in-domain data.
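
The idea of quantifying a biased degree and down-weighting biased examples can be sketched as follows. The abstract does not give the formulas, so everything concrete here is an assumption: the biased degree of a word is taken to be the share of its occurrences that fall under its most frequent label, an example inherits the largest degree among its words, and the weight is `1 - alpha * degree` once the degree passes a threshold. The names `min_count`, `threshold`, and `alpha` are hypothetical.

```python
from collections import Counter, defaultdict


def word_label_bias(examples, min_count=2):
    """Map each sufficiently frequent word to its biased degree in [0, 1].

    examples is a list of (tokens, label) pairs.
    """
    counts = defaultdict(Counter)
    for tokens, label in examples:
        for word in set(tokens):  # count each word once per example
            counts[word][label] += 1
    bias = {}
    for word, label_counts in counts.items():
        total = sum(label_counts.values())
        if total >= min_count:
            bias[word] = max(label_counts.values()) / total
    return bias


def example_weights(examples, bias, threshold=0.9, alpha=0.5):
    """Return one loss weight per example; biased examples get weight < 1."""
    weights = []
    for tokens, _ in examples:
        degree = max((bias.get(word, 0.0) for word in tokens), default=0.0)
        weights.append(1.0 - alpha * degree if degree >= threshold else 1.0)
    return weights
```

The resulting weights would multiply the per-example training loss, so that examples dominated by a label-correlated word contribute less to the gradient.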

NeurIPS Conference 2023 Conference Paper

PTQD: Accurate Post-Training Quantization for Diffusion Models

Yefei He
Luping Liu
Jing Liu
Weijia Wu
Hong Zhou
Bohan Zhuang

Diffusion models have recently dominated image synthesis and other related generative tasks. However, the iterative denoising process is expensive in computations at inference time, making diffusion models less practical for low-latency and scalable real-world applications. Post-training quantization of diffusion models can significantly reduce the model size and accelerate the sampling process without requiring any re-training. Nonetheless, applying existing post-training quantization methods directly to low-bit diffusion models can significantly impair the quality of generated samples. Specifically, for each denoising step, quantization noise leads to deviations in the estimated mean and mismatches with the predetermined variance schedule. Moreover, as the sampling process proceeds, the quantization noise may accumulate, resulting in a low signal-to-noise ratio (SNR) during the later denoising steps. To address these challenges, we propose a unified formulation for the quantization noise and diffusion perturbed noise in the quantized denoising process. Specifically, we first disentangle the quantization noise into its correlated and residual uncorrelated parts regarding its full-precision counterpart. The correlated part can be easily corrected by estimating the correlation coefficient. For the uncorrelated part, we subtract the bias from the quantized results to correct the mean deviation and calibrate the denoising variance schedule to absorb the excess variance resulting from quantization. Moreover, we introduce a mixed-precision scheme for selecting the optimal bitwidth for each denoising step, which prioritizes lower bitwidths to expedite early denoising steps, while ensuring that higher bitwidths maintain a high signal-to-noise ratio (SNR) in the later steps. Extensive experiments demonstrate that our method outperforms previous post-training quantized diffusion models in generating high-quality samples, with only a $0.06$ increase in FID score compared to full-precision LDM-4 on ImageNet $256\times256$, while saving $19.9\times$ bit operations. Code is available at https://github.com/ziplab/PTQD.
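
The split of quantization noise into a correlated part and a residual part can be illustrated with a small sketch. It assumes a model of the form `q = (1 + k) * fp + residual`, where `fp` is the full-precision output and `q` the quantized one, estimates `k` by least squares on calibration pairs, and subtracts the mean of the residual. This modeling form, the estimator, and the function names are assumptions for illustration; the variance-schedule calibration and the mixed-precision scheme from the abstract are not covered.

```python
def estimate_correlation(fp_out, q_out):
    """Least-squares estimate of k in: (q - fp) ~ k * fp."""
    noise = [q - f for f, q in zip(fp_out, q_out)]
    return sum(n * f for n, f in zip(noise, fp_out)) / sum(f * f for f in fp_out)


def residual_bias(fp_out, q_out, k):
    """Mean of the residual left after removing the correlated part."""
    residual = [q / (1.0 + k) - f for f, q in zip(fp_out, q_out)]
    return sum(residual) / len(residual)


def correct(q_out, k, bias):
    """Rescale by 1 / (1 + k), then subtract the residual bias."""
    return [q / (1.0 + k) - bias for q in q_out]
```

In use, `k` and the bias would be estimated once on calibration data and then applied to the quantized outputs at inference time.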

AAAI Conference 2023 Conference Paper

SwiftAvatar: Efficient Auto-Creation of Parameterized Stylized Character on Arbitrary Avatar Engines

Shizun Wang
Weihong Zeng
Xu Wang
Hao Yang
Li Chen
Chuang Zhang
Ming Wu
Yi Yuan

The creation of a parameterized stylized character involves careful selection of numerous parameters, also known as the "avatar vectors" that can be interpreted by the avatar engine. Existing unsupervised avatar vector estimation methods that auto-create avatars for users, however, often fail to work because of the domain gap between realistic faces and stylized avatar images. To this end, we propose SwiftAvatar, a novel avatar auto-creation framework that is evidently superior to previous works. SwiftAvatar introduces dual-domain generators to create pairs of realistic faces and avatar images using shared latent codes. The latent codes can then be bridged with the avatar vectors as pairs, by performing GAN inversion on the avatar images rendered from the engine using avatar vectors. Through this way, we are able to synthesize paired data in high-quality as many as possible, consisting of avatar vectors and their corresponding realistic faces. We also propose semantic augmentation to improve the diversity of synthesis. Finally, a light-weight avatar vector estimator is trained on the synthetic pairs to implement efficient auto-creation. Our experiments demonstrate the effectiveness and efficiency of SwiftAvatar on two different avatar engines. The superiority and advantageous flexibility of SwiftAvatar are also verified in both subjective and objective evaluations.

NeurIPS Conference 2023 Conference Paper

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Mingzhen Sun
Xinxin Zhu
Jing Liu

Vision and text have been fully explored in contemporary video-text foundational models, while other modalities such as audio and subtitles in videos have not received sufficient attention. In this paper, we resort to establish connections between multi-modality video tracks, including Vision, Audio, and Subtitle, and Text by exploring an automatically generated large-scale omni-modality video caption dataset called VAST-27M. Specifically, we first collect 27 million open-domain video clips and separately train a vision and an audio captioner to generate vision and audio captions. Then, we employ an off-the-shelf Large Language Model (LLM) to integrate the generated captions, together with subtitles and instructional prompts into omni-modality captions. Based on the proposed VAST-27M dataset, we train an omni-modality video-text foundational model named VAST, which can perceive and process vision, audio, and subtitle modalities from video, and better support various tasks including vision-text, audio-text, and multi-modal video-text tasks (retrieval, captioning and QA). Extensive experiments have been conducted to demonstrate the effectiveness of our proposed VAST-27M corpus and VAST foundation model. VAST achieves 22 new state-of-the-art results on various cross-modality benchmarks.

NeurIPS Conference 2022 Conference Paper

CoPur: Certifiably Robust Collaborative Inference via Feature Purification

Jing Liu
Chulin Xie
Sanmi Koyejo
Bo Li

Collaborative inference leverages diverse features provided by different agents (e.g., sensors) for more accurate inference. A common setup is where each agent sends its embedded features instead of the raw data to the Fusion Center (FC) for joint prediction. In this setting, we consider the inference-time attacks when a small fraction of agents are compromised. The compromised agent either does not send embedded features to the FC, or sends arbitrarily embedded features. To address this, we propose a certifiably robust COllaborative inference framework via feature PURification (CoPur), by leveraging the block-sparse nature of adversarial perturbations on the feature vector, as well as exploring the underlying redundancy across the embedded features (by assuming the overall features lie on an underlying lower dimensional manifold). We theoretically show that the proposed feature purification method can robustly recover the true feature vector, despite adversarial corruptions and/or incomplete observations. We also propose and test an untargeted distributed feature-flipping attack, which is agnostic to the model, training data, label, as well as the features held by other agents, and is shown to be effective in attacking state-of-the-art defenses. Experiments on ExtraSensory and NUS-WIDE datasets show that CoPur significantly outperforms existing defenses in terms of robustness against targeted and untargeted adversarial attacks.

NeurIPS Conference 2022 Conference Paper

EcoFormer: Energy-Saving Attention with Linear Complexity

Jing Liu
Zizheng Pan
Haoyu He
Jianfei Cai
Bohan Zhuang

Transformer is a transformative framework for deep learning which models sequential data and has achieved remarkable performance on a wide range of tasks, but with high computational and energy cost. To improve its efficiency, a popular choice is to compress the models via binarization which constrains the floating-point values into binary ones to save resource consumption owing to cheap bitwise operations significantly. However, existing binarization methods only aim at minimizing the information loss for the input distribution statistically, while ignoring the pairwise similarity modeling at the core of the attention mechanism. To this end, we propose a new binarization paradigm customized to high-dimensional softmax attention via kernelized hashing, called EcoFormer, to map the original queries and keys into low-dimensional binary codes in Hamming space. The kernelized hash functions are learned to match the ground-truth similarity relations extracted from the attention map in a self-supervised way. Based on the equivalence between the inner product of binary codes and the Hamming distance as well as the associative property of matrix multiplication, we can approximate the attention in linear complexity by expressing it as a dot-product of binary codes. Moreover, the compact binary representations of queries and keys in EcoFormer enable us to replace most of the expensive multiply-accumulate operations in attention with simple accumulations to save considerable on-chip energy footprint on edge devices. Extensive experiments on both vision and language tasks show that EcoFormer consistently achieves comparable performance with standard attentions while consuming much fewer resources. For example, based on PVTv2-B0 and ImageNet-1K, EcoFormer achieves a 73% reduction in on-chip energy footprint with only a slight performance drop of 0.33% compared to the standard attention. Code is available at https://github.com/ziplab/EcoFormer.
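
The two facts the abstract relies on can be checked with a small sketch. This is an illustration only, not the EcoFormer implementation: the codes are random rather than produced by learned kernelized hash functions, attention normalization is omitted, and all names (`hamming`, `matmul`, `Q`, `K`, `V`) are hypothetical. For codes in {-1, +1}^b the inner product equals b minus twice the Hamming distance, and regrouping the product as Q(KᵀV) avoids forming the n × n matrix QKᵀ.

```python
import random

def hamming(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Naive matrix product of nested lists (integers stay exact)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

random.seed(0)
n, bits, d = 6, 8, 3  # sequence length, code length, value dimension (arbitrary)

# Random +/-1 codes stand in for hashed queries and keys (assumption).
Q = [[random.choice((-1, 1)) for _ in range(bits)] for _ in range(n)]
K = [[random.choice((-1, 1)) for _ in range(bits)] for _ in range(n)]
V = [[random.randint(-3, 3) for _ in range(d)] for _ in range(n)]

# Fact 1: <q, k> = bits - 2 * Hamming(q, k) for +/-1 codes.
identity_holds = all(dot(q, k) == bits - 2 * hamming(q, k)
                     for q in Q for k in K)

# Fact 2: associativity. (Q K^T) V builds an n x n matrix first,
# while Q (K^T V) only builds a bits x d matrix.
quadratic = matmul(matmul(Q, transpose(K)), V)
linear = matmul(Q, matmul(transpose(K), V))
```

Both orderings give the same unnormalized result, but the second scales linearly in the sequence length n for fixed `bits` and `d`.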

EAAI Journal 2022 Journal Article

Learning large-scale fuzzy cognitive maps under limited resources

Kai Wu
Jing Liu

Research on the problem of learning large-scale fuzzy cognitive maps (FCMs) with a limited computational budget is outstanding. To learn large-scale FCMs from time series, in most work, this problem is decomposed into learning local connections of each concept, respectively, and then one optimizer is employed to optimize each such sub-problem. Each sub-problem may have different requirements for the computational resource, but the existing methods ignore this issue and allocate the same amounts of computational resources for each sub-problem. In this paper, we propose two strategies to address this problem. We first develop a dynamic resource allocation strategy to maximize the performance of the decomposition-based optimizer under a limited computational budget. Second, we propose a half-thresholding memetic algorithm to improve the performance of the traditional evolutionary algorithm. We term our proposal as a half-thresholding memetic algorithm with a dynamic resource allocation strategy (HTMA-DRA). Finally, the experiments on large-scale synthetic data and DREAM datasets compared with the existing state-of-the-art methods demonstrate the effectiveness of the proposed HTMA-DRA.

AAAI Conference 2022 Conference Paper

Less Is More: Pay Less Attention in Vision Transformers

Zizheng Pan
Bohan Zhuang
Haoyu He
Jing Liu
Jianfei Cai

Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision. However, Transformer training and inference in previous works can be prohibitively expensive due to the quadratic complexity of self-attention over a long sequence of representations, especially for high-resolution dense prediction tasks. To this end, we present a novel Less attention vIsion Transformer (LIT), building upon the fact that the early self-attention layers in Transformers still focus on local patterns and bring minor benefits in recent hierarchical vision Transformers. Specifically, we propose a hierarchical Transformer where we use pure multi-layer perceptrons (MLPs) to encode rich local patterns in the early stages while applying self-attention modules to capture longer dependencies in deeper layers. Moreover, we further propose a learned deformable token merging module to adaptively fuse informative patches in a nonuniform manner. The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation, serving as a strong backbone for many vision tasks. Code is available at https://github.com/zip-group/LIT.

JBHI Journal 2022 Journal Article

PCXRNet: Pneumonia Diagnosis From Chest X-Ray Images Using Condense Attention Block and Multiconvolution Attention Block

Yibo Feng
Xu Yang
Dawei Qiu
Huan Zhang
Dejian Wei
Jing Liu

Coronavirus disease 2019 (COVID-19) has become a global pandemic. Many recognition approaches based on convolutional neural networks have been proposed for COVID-19 chest X-ray images. However, only a few of them make good use of the potential inter- and intra-relationships of feature maps. Considering the limitation mentioned above, this paper proposes an attention-based convolutional neural network, called PCXRNet, for diagnosis of pneumonia using chest X-ray images. To utilize the information from the channels of the feature maps, we added a novel condense attention module (CDSE) that comprises two steps: a condensation step and a squeeze-excitation step. Unlike traditional channel attention modules, CDSE first downsamples the feature map channel by channel to condense the information, followed by the squeeze-excitation step, in which the channel weights are calculated. To make the model pay more attention to informative spatial parts in every feature map, we proposed a multi-convolution spatial attention module (MCSA). It reduces the number of parameters and introduces more nonlinearity. The CDSE and MCSA complement each other in series to tackle the problem of redundancy in feature maps and provide useful information from and between feature maps. We used the ChestXRay2017 dataset to explore the internal structure of PCXRNet, and the proposed network was applied to COVID-19 diagnosis. As a result, the network achieves an accuracy of 94.619%, recall of 94.753%, precision of 95.286%, and F1-score of 94.996% on the COVID-19 dataset.

AAAI Conference 2021 Conference Paper

Consistent-Separable Feature Representation for Semantic Segmentation

Xingjian He
Jing Liu
Jun Fu
Xinxin Zhu
Jinqiao Wang
Hanqing Lu

Cross-entropy loss combined with softmax is one of the most commonly used supervision components in most existing segmentation methods. The softmax loss is typically good at optimizing the inter-class difference, but not good at reducing the intra-class variation, which can be suboptimal for semantic segmentation task. In this paper, we propose a Consistent-Separable Feature Representation Network to model the Consistent-Separable (C-S) features, which are intra-class consistent and inter-class separable, improving the discriminative power of the deep features. Specifically, we develop a Consistent-Separable Feature Learning Module to obtain C-S features through a new loss, called Class-Aware Consistency loss. This loss function is proposed to force the deep features to be consistent among the same class and apart between different classes. Moreover, we design an Adaptive feature Aggregation Module to fuse the C-S features and original features from backbone for the better semantic prediction. We show that compared with various baselines, the proposed method brings consistent performance improvement. Our proposed approach achieves state-of-the-art performance on Cityscapes (82.6% mIoU in test set), ADE20K (46.65% mIoU in validation set), COCO Stuff (41.3% mIoU in validation set) and PASCAL Context (55.9% mIoU in test set).

JBHI Journal 2021 Journal Article

Non-Invasive Capillary Blood Pressure Measurement Enabling Early Detection and Classification of Venous Congestion

Jing Liu
Bryan Yan
Shih-Chi Chen
Yuan-Ting Zhang
Charles Sodini
Ni Zhao

Capillary blood pressure (CBP) is the primary driving force for fluid exchange across microvessels. Subclinical systemic venous congestion prior to overt peripheral edema can directly result in elevated peripheral CBP. Therefore, CBP measurements can enable timely edema control in a variety of clinical cases including venous insufficiency, heart failure and so on. However, currently CBP measurements can be only done invasively and with a complicated experimental setup. In this work, we proposed an opto-mechanical system to achieve non-invasive and automatic CBP measurements through modifying the widely implemented oscillometric technique in home-use arterial blood pressure monitors. The proposed CBP system is featured with a blue light photoplethysmography sensor embedded in finger/toe cuffs to probe skin capillary pulsations. The experimental results demonstrated the proposed CBP system can track local CBP changes induced by different levels of venous congestion. Leveraging the decision tree technique, we demonstrate the use of a multi-site CBP measurement at fingertips and toes to classify four categories of subjects (total N = 40) including patients with peripheral arterial disease, varicose veins and heart failure. Our work demonstrates the promising non-invasive CBP measurement as well as its great potential in realizing point-of-care systems for the management of cardiovascular diseases.

JBHI Journal 2021 Journal Article

PCA-Based Multi-Wavelength Photoplethysmography Algorithm for Cuffless Blood Pressure Measurement on Elderly Subjects

Jing Liu
Shirong Qiu
Ningqi Luo
Sze-Kei Lau
Hui Yu
Timothy Kwok
Yuan-Ting Zhang
Ni Zhao

The prevalence of hypertension has made blood pressure (BP) measurement one of the most wanted functions in wearable devices for convenient and frequent self-assessment of health conditions. The widely adopted principle for cuffless BP monitoring is based on arterial pulse transit time (PTT), which is measured with electrocardiography and photoplethysmography (PPG). To achieve cuffless BP monitoring with more compact wearable electronics, we have previously conceived a multi-wavelength PPG (MWPPG) strategy to perform BP estimation from arteriolar PTT, requiring only a single sensing node. However, challenges remain in decoding the compounded MWPPG signals consisting of both heterogeneous physiological information and motion artifact (MA). In this work, we proposed an improved MWPPG algorithm based on principal component analysis (PCA) which matches the statistical decomposition results with the arterial pulse and capillary pulse. The arteriolar PTT is calculated accordingly as the phase shift based on the entire waveforms, instead of local peak lag time, to enhance the feature robustness. Meanwhile, the PCA-derived MA component is employed to identify and exclude the MA-contaminated segments. To evaluate the new algorithm, we performed a comparative experiment (N = 22) with a cuffless MWPPG measurement device and used double-tube auscultatory BP measurement as a reference. The results demonstrate the accuracy improvement enabled by the PCA-based operations on MWPPG signals, yielding errors of 1.44 ± 6.89 mmHg for systolic blood pressure and -1.00 ± 6.71 mmHg for diastolic blood pressure. In conclusion, the proposed PCA-based method can improve the performance of MWPPG in wearable medical devices for cuffless BP measurement.

YNIMG Journal 2021 Journal Article

The coupling of BOLD signal variability and degree centrality underlies cognitive functions and psychiatric diseases

Jintao Sheng
Liang Zhang
Junjiao Feng
Jing Liu
Anqi Li
Wei Chen
Yuedi Shen
Jinhui Wang

Brain signal variability has been consistently linked to functional integration; however, whether this coupling is associated with cognitive functions and/or psychiatric diseases has not been clarified. Using multiple multimodality datasets, including resting-state functional magnetic resonance imaging (rsfMRI) data from the Human Connectome Project (HCP: N = 927) and a Beijing sample (N = 416) and cerebral blood flow (CBF) and rsfMRI data from a Hangzhou sample (N = 29), we found that, compared with the existing variability measure (i.e., SDBOLD), the mean-scaled (standardized) fractional standard deviation of the BOLD signal (mfSDBOLD) maintained very high test-retest reliability, showed greater cross-site reliability and was less affected by head motion. We also found strong reproducible couplings between the mfSDBOLD and functional integration measured by the degree centrality (DC), both cross-voxel and cross-subject, which were robust to scanning and preprocessing parameters. Moreover, both mfSDBOLD and DC were correlated with CBF, suggesting a common physiological basis for both measures. Critically, the degree of coupling between mfSDBOLD and long-range DC was positively correlated with individuals’ cognitive total composite scores. Brain regions with greater mismatches between mfSDBOLD and long-range DC were more vulnerable to brain diseases. Our results suggest that BOLD signal variability could serve as a meaningful index of local function that underlies functional integration in the human brain and that a strong coupling between BOLD signal variability and functional integration may serve as a hallmark of balanced brain networks that are associated with optimal brain functions.

AAAI Conference 2020 Conference Paper

A Cluster-Weighted Kernel K-Means Method for Multi-View Clustering

Jing Liu
Fuyuan Cao
Xiao-Zhi Gao
Liqin Yu
Jiye Liang

Clustering by jointly exploiting information from multiple views can yield better performance than clustering on one single view. Some existing multi-view clustering methods aim at learning a weight for each view to determine its contribution to the final solution. However, the view-weighted scheme can only indicate the overall importance of a view, which fails to recognize the importance of each inner cluster of a view. A view with higher weight cannot guarantee all clusters in this view have higher importance than them in other views. In this paper, we propose a cluster-weighted kernel k-means method for multi-view clustering. Each inner cluster of each view is assigned a weight, which is learned based on the intra-cluster similarity of the cluster compared with all its corresponding clusters in different views, to make the cluster with higher intra-cluster similarity have a higher weight among the corresponding clusters. The cluster labels are learned simultaneously with the cluster weights in an alternative updating way, by minimizing the weighted sum-of-squared errors of the kernel k-means. Compared with the view-weighted scheme, the cluster-weighted scheme enhances the interpretability for the clustering results. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the proposed method.

AAAI Conference 2020 Conference Paper

A Robust Adversarial Training Approach to Machine Reading Comprehension

Kai Liu
Xin Liu
An Yang
Jing Liu
Jinsong Su
Sujian Li
Qiaoqiao She

Lacking robustness is a serious problem for Machine Reading Comprehension (MRC) models. To alleviate this problem, one of the most promising ways is to augment the training dataset with sophisticated designed adversarial examples. Generally, those examples are created by rules according to the observed patterns of successful adversarial attacks. Since the types of adversarial examples are innumerable, it is not adequate to manually design and enrich training data to defend against all types of adversarial attacks. In this paper, we propose a novel robust adversarial training approach to improve the robustness of MRC models in a more generic way. Given an MRC model well-trained on the original dataset, our approach dynamically generates adversarial examples based on the parameters of current model and further trains the model by using the generated examples in an iterative schedule. When applied to the state-of-the-art MRC models, including QANET, BERT and ERNIE2.0, our approach obtains significant and comprehensive improvements on 5 adversarial datasets constructed in different ways, without sacrificing the performance on the original SQuAD development set. Moreover, when coupled with other data augmentation strategy, our approach further boosts the overall performance on adversarial datasets and outperforms the state-of-the-art methods.

JBHI Journal 2020 Journal Article

Feasibility of Fingertip Oscillometric Blood Pressure Measurement: Model-Based Analysis and Experimental Validation

Jing Liu
Charles G. Sodini
Yanghui Ou
Bryan Yan
Yuan-Ting Zhang
Ni Zhao

The most commonly used oscillometric upper-arm (UA) blood pressure (BP) monitors are not convenient enough for ambulatory BP monitoring, given the large size of the arm cuff and the compression of UA during the measurement. Finger-worn oscillometric BP devices featuring miniaturized finger cuff have been developed and researched as an alternative solution to the UA-based measurement, yet the reliability of the finger-based measurement is still questioned. To investigate the feasibility of oscillometric BP measurements at the finger position, we performed model-based analysis and experimental validation to explore the underlying issues associated with extending the cuff-based oscillometric approach from UA to other alternative sites. The simulation results revealed that a larger bone-to-tissue volume ratio produced a lower pressure transmission efficiency, which can account for the inter-site measurement discrepancies of mean blood pressure (MBP). We also experimentally compared the oscillometric MBP measurements at UA, middle forearm, wrist, finger proximal phalanx, and finger distal phalanx (FD) of 20 young adults, and each position was matched with a cuff of appropriate size and kept at the same height with the heart. The experimental results demonstrated that FD could be a superior alternative position for oscillometric BP measurement, as it requires the smallest cuff size while providing the most consistent MBP with the UA. Our analysis also suggested that further study is demanded to identify the appropriate oscillometric algorithm for reliable systolic blood pressure and diastolic blood pressure measurements at FD.

YNIMG Journal 2020 Journal Article

Individual-specific and shared representations during episodic memory encoding and retrieval

Xiaoqian Xiao
Yu Zhou
Jing Liu
Zhifang Ye
Li Yao
Jiacai Zhang
Chuansheng Chen
Gui Xue

Although human memories seem unique to each individual, they are shared to a great extent across individuals. Previous studies have examined, separately, subject-specific and cross-subject shared representations during memory encoding and retrieval, but how shared memories are formed from individually encoded representations is not clearly understood. Using a unique fMRI design involving memory encoding and retrieval, and representational similarity analysis to link representations from different individuals, brain regions, and processing stages, the current study revealed that distributed brain regions showed both subject-specific and shared neural representations during both memory encoding and retrieval. Furthermore, different brain regions showed stage-specific representational strength, with the visual cortex showing greater unique and shared representations during encoding, whereas the left angular gyrus showing greater unique and shared representations during retrieval. The neural representations during encoding were transformed during retrieval, as shown by smaller cross-subject encoding-retrieval similarity (ERS) than cross-subject similarity either during encoding or during retrieval. This cross-subject and cross-stage similarity was found both within and across regions, with strong pattern similarity between the encoded representation in VVC and the retrieved representation in the angular gyrus. Simulation analysis further suggested that these patterns could be achieved by incorporating stage-specific representational strength, and cross-region reinstatement from encoding to retrieval, but not by a common transformation from encoding to retrieval across subjects. Together, our results shed light on how memory representations are encoded and transformed to maintain individual characteristics and at the same time to create shared representations to facilitate interpersonal communication.

IJCAI Conference 2020 Conference Paper

Latent Regularized Generative Dual Adversarial Network For Abnormal Detection

Chengwei Chen
Jing Liu
Yuan Xie
Yin Xiao Ban
Chunyun Wu
Yiqing Tao
Haichuan Song

With the development of adversarial attack in deep learning, it is critical for abnormal detector to not only discover the out-of-distribution samples but also provide defence against the adversarial attacker. Since few previous universal detector is known to work well on both tasks, we consider against both scenarios by constructing a robust and effective technique, where one sample could be regarded as the abnormal sample if it exhibits a higher image reconstruction error. Due to the training instability issues existed in previous generative adversarial networks (GANs) based methods, in this paper we propose a dual auxiliary autoencoder to make a tradeoff between the capability of generator and discriminator, leading to a more stable training process and high-quality image reconstruction. Moreover, to generate discriminative and robust latent representations, the mutual information estimator regarded as latent regularizer is adopted to extract the most unique information of target class. Overall, our generative dual adversarial network simultaneously optimizes the image reconstruction space and latent space to improve the performance. Experiments show that our model has the clear superiority over cutting edge semi-supervised abnormal detectors and achieves the state-of-the-art results on the datasets.

IJCAI Conference 2020 Conference Paper

Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning

Longteng Guo
Jing Liu
Xinxin Zhu
Xingjian He
Jie Jiang
Hanqing Lu

Most image captioning models are autoregressive, i.e., they generate each word by conditioning on previously generated words, which leads to heavy latency during inference. Recently, non-autoregressive decoding has been proposed in machine translation to speed up the inference time by generating all words in parallel. Typically, these models use the word-level cross-entropy loss to optimize each word independently. However, such a learning process fails to consider the sentence-level consistency, thus resulting in inferior generation quality of these non-autoregressive models. In this paper, we propose a Non-Autoregressive Image Captioning (NAIC) model with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL). CMAL formulates NAIC as a multi-agent reinforcement learning system where positions in the target sequence are viewed as agents that learn to cooperatively maximize a sentence-level reward. Besides, we propose to utilize massive unlabeled images to boost captioning performance. Extensive experiments on MSCOCO image captioning benchmark show that our NAIC model achieves a performance comparable to state-of-the-art autoregressive models, while brings 13.9x decoding speedup.

TIST Journal 2019 Journal Article

Comparison and Modelling of Country-level Microblog User and Activity in Cyber-physical-social Systems Using Weibo and Twitter Data

Po Yang
Jing Liu
Jun Qi
Yun Yang
Xulong Wang
Zhihan Lv

As the rapid growth of social media technologies continues, Cyber-Physical-Social System (CPSS) has been a hot topic in many industrial applications. The use of “microblogging” services, such as Twitter, has rapidly become an influential way to share information. While recent studies have revealed that understanding and modelling microblog user behaviour with massive users’ data in social media are keen to success of many practical applications in CPSS, a key challenge in literatures is that diversity of geography and cultures in social media technologies strongly affect user behaviour and activity. The motivation of this article is to understand differences and similarities between microblogging users from different countries using social media technologies, and to attempt to design a Country-Level Micro-Blog User (CLMB) behaviour and activity model for supporting CPSS applications. We proposed a CLMB model for analysing microblogging user behaviour and their activity across different countries in the CPSS applications. The model has considered three important characteristics of user behaviour in microblogging data, including content of microblogging messages, user emotion index, and user relationship network. We evaluated the CLMB model under the collected microblog dataset from 16 countries with the largest number of representative and active users in the world. Experimental results show that (1) for some countries with small population and strong cohesiveness, users pay more attention to social functionalities of microblogging service; (2) for some countries containing mostly large loose social groups, users use microblogging services as a news dissemination platform; (3) users in countries whose social network structure exhibits reciprocity rather than hierarchy will use more linguistic elements to express happiness in microblogging services.

IJCAI Conference 2019 Conference Paper

Densely Connected Attention Flow for Visual Question Answering

Fei Liu
Jing Liu
Zhiwei Fang
Richang Hong
Hanqing Lu

Learning effective interactions between multi-modal features is at the heart of visual question answering (VQA). A common defect of the existing VQA approaches is that they only consider a very limited amount of interactions, which may be not enough to model latent complex image-question relations that are necessary for accurately answering questions. Therefore, in this paper, we propose a novel DCAF (Densely Connected Attention Flow) framework for modeling dense interactions. It densely connects all pairwise layers of the network via Attention Connectors, capturing fine-grained interplay between image and question across all hierarchical levels. The proposed Attention Connector efficiently connects the multi-modal features at any two layers with symmetric co-attention, and produces interaction-aware attention features. Experimental results on three publicly available datasets show that the proposed method achieves state-of-the-art performance.

IJCAI Conference 2019 Conference Paper

FakeTables: Using GANs to Generate Functional Dependency Preserving Tables with Bounded Real Data

Haipeng Chen
Sushil Jajodia
Jing Liu
Noseong Park
Vadim Sokolov
V. S. Subrahmanian

In many cases, an organization wishes to release some data, but is restricted in the amount of data to be released due to legal, privacy and other concerns. For instance, the US Census Bureau releases only 1% of its table of records every year, along with statistics about the entire table. However, the machine learning (ML) models trained on the released sub-table are usually sub-optimal. In this paper, our goal is to find a way to augment the sub-table by generating a synthetic table from the released sub-table, under the constraints that the generated synthetic table (i) has similar statistics as the entire table, and (ii) preserves the functional dependencies of the released sub-table. We propose a novel generative adversarial network framework called ITS-GAN, where both the generator and the discriminator are specifically designed to satisfy these two constraints. By evaluating the augmentation performance of ITS-GAN on two representative datasets, the US Census Bureau data and US Bureau of Transportation Statistics (BTS) data, we show that ITS-GAN yields high quality classification results, and significantly outperforms various state-of-the-art data augmentation approaches.

IS Journal 2019 Journal Article

Identifying Adverse Drug Events From Social Media Using An Improved Semisupervised Method

Jing Liu
Gang Wang
Gang Chen

Adverse drug event (ADE) is a serious health concern. Social media has provided patients a broad platform to share their ADE experiences, impelling the development of social media-based pharmacovigilance. However, social media analysis of ADEs presents several important challenges that need to be addressed for high-performing ADE identification. To address these challenges, a feature weighted-based improved disagreement-based semisupervised learning method, named WIDSSL, is proposed for effectively identifying ADEs from non-ADEs. Empirical results demonstrate the effectiveness of WIDSSL. Our proposed WIDSSL method can reduce the reliance on a large number of labeled instances for high-performing ADE identification, and hence enhance the feasibility of conducting social media-based pharmacovigilance.

IJCAI Conference 2019 Conference Paper

VEST: A System for Vulnerability Exploit Scoring & Timing

Haipeng Chen
Jing Liu
Rui Liu
Noseong Park
V. S. Subrahmanian

Knowing if/when a cyber-vulnerability will be exploited and how severe the vulnerability is can help enterprise security officers (ESOs) come up with appropriate patching schedules. Today, this ability is severely compromised: our study of data from Mitre and NIST shows that on average there is a 132 day gap between the announcement of a vulnerability by Mitre and the time NIST provides an analysis with severity score estimates and 8 important severity attributes. Many attacks happen during this very 132-day window. We present Vulnerability Exploit Scoring & Timing (VEST), a system for (early) prediction and visualization of if/when a vulnerability will be exploited, and its estimated severity attributes and score.

NeurIPS Conference 2018 Conference Paper

Discrimination-aware Channel Pruning for Deep Neural Networks

Zhuangwei Zhuang
Mingkui Tan
Bohan Zhuang
Jing Liu
Yong Guo
Qingyao Wu
Junzhou Huang
Jinhui Zhu

Channel pruning is one of the predominant approaches for deep model compression. Existing pruning methods either train from scratch with sparsity constraints on channels, or minimize the reconstruction error between the pre-trained feature maps and the compressed ones. Both strategies suffer from some limitations: the former kind is computationally expensive and difficult to converge, whilst the latter kind optimizes the reconstruction error but ignores the discriminative power of channels. To overcome these drawbacks, we investigate a simple-yet-effective method, called discrimination-aware channel pruning, to choose those channels that really contribute to discriminative power. To this end, we introduce additional losses into the network to increase the discriminative power of intermediate layers and then select the most discriminative channels for each layer by considering the additional loss and the reconstruction error. Last, we propose a greedy algorithm to conduct channel selection and parameter optimization in an iterative way. Extensive experiments demonstrate the effectiveness of our method. For example, on ILSVRC-12, our pruned ResNet-50 with 30% reduction of channels even outperforms the original model by 0.39% in top-1 accuracy.

AIIM Journal 2018 Journal Article

SSEL-ADE: A semi-supervised ensemble learning framework for extracting adverse drug events from social media

Jing Liu
Songzheng Zhao
Gang Wang

With the development of Web 2.0 technology, social media websites have become lucrative but under-explored data sources for extracting adverse drug events (ADEs), which are a serious health problem. Besides ADE, other semantic relation types (e.g., drug indication and beneficial effect) could hold between the drug and adverse event mentions, making ADE relation extraction (distinguishing the ADE relationship from other relation types) necessary. However, conducting ADE relation extraction in a social media environment is not a trivial task because of the expertise-dependent, time-consuming and costly annotation process, and the feature space's high dimensionality attributed to intrinsic characteristics of social media data. This study aims to develop a framework for ADE relation extraction using patient-generated content in social media with better performance than that delivered by previous efforts. To achieve the objective, a general semi-supervised ensemble learning framework, SSEL-ADE, was developed. The framework exploited various lexical, semantic, and syntactic features, and integrated ensemble learning and semi-supervised learning. A series of experiments were conducted to verify the effectiveness of the proposed framework. Empirical results demonstrate the effectiveness of each component of SSEL-ADE and reveal that our proposed framework outperforms most existing ADE relation extraction methods. SSEL-ADE can facilitate enhanced ADE relation extraction performance, thereby providing more reliable support for pharmacovigilance. Moreover, the proposed semi-supervised ensemble methods have the potential of being applied to effectively deal with other social media-based problems.

YNIMG Journal 2017 Journal Article

Dissociated roles of the parietal and frontal cortices in the scope and control of attention during visual working memory

Siyao Li
Ying Cai
Jing Liu
Dawei Li
Zifang Feng
Chuansheng Chen
Gui Xue

Mounting evidence suggests that multiple mechanisms underlie working memory capacity. Using transcranial direct current stimulation (tDCS), the current study aimed to provide causal evidence for the neural dissociation of two mechanisms underlying visual working memory (WM) capacity, namely, the scope and control of attention. A change detection task with distractors was used, where a number of colored bars (i.e., two red bars, four red bars, or two red plus two blue bars) were presented on both sides (Experiment 1) or the center (Experiment 2) of the screen for 100 ms, and participants were instructed to remember the red bars and to ignore the blue bars (in both Experiments), as well as to ignore the stimuli on the un-cued side (Experiment 1 only). In both experiments, participants finished three sessions of the task after 15 min of 1.5 mA anodal tDCS administered on the right prefrontal cortex (PFC), the right posterior parietal cortex (PPC), and the primary visual cortex (VC), respectively. The VC stimulation served as an active control condition. We found that compared to stimulation on the VC, stimulation on the right PPC specifically increased the visual WM capacity under the no-distractor condition (i.e., 4 red bars), whereas stimulation on the right PFC specifically increased the visual WM capacity under the distractor condition (i.e., 2 red bars plus 2 blue bars). These results suggest that the PPC and PFC are involved in the scope and control of attention, respectively. We further showed that compared to central presentation of the stimuli (Experiment 2), bilateral presentation of the stimuli (on both sides of the fixation in Experiment 1) led to an additional demand for attention control. Our results emphasize the dissociated roles of the frontal and parietal lobes in visual WM capacity, and provide a deeper understanding of the neural mechanisms of WM.

TCS Journal 2017 Journal Article

Group Rekeying in the Exclusive Subset-Cover Framework

Jing Liu
Minmin Liu
Changji Wang
Shaowen Yao

Group Rekeying deals with the problem of how to efficiently and securely distribute a new group key GK to remaining legitimate users when there are changes in group membership (join/leave). Given a universe U of n users, an exclusive key K_S for an arbitrary subset S ⊂ U is a long-term key shared by all users in U ∖ S. Hence we can distribute a new group key GK encrypted under K_S such that all users in U except those in S can decrypt it during group rekeying. This method allows us to exclude S from the group with a rekey message whose length is just one single encrypted key. In this paper, we use this idea to extend the famous Subset-Cover Framework to obtain its exclusive version, the Exclusive Subset-Cover Framework. We provide sufficient conditions that guarantee the security of any stateless group rekeying protocol in this framework. We propose a concrete exclusive subset-cover protocol called the exclusive complete subtree protocol. Compared with existing 1-resilient stateless group rekeying protocols, this protocol achieves not only constant communication overhead but also better computational efficiency as well as better collusion resistance. From this protocol, it is easy to obtain a 1-resilient stateful group rekeying protocol which also outperforms the existing 1-resilient stateful protocols. Recent research has proved some lower bounds on the communication complexity of group rekeying protocols. These bounds suggest that it is impossible to achieve a lower communication overhead without trading off some degree of collusion resistance. However, there are application scenarios which require communication overhead below these bounds. We show that any 1-resilient stateless group rekeying protocol with constant communication overhead can be used in tandem with a Subset-Cover based protocol to construct a hybrid protocol with tunable collusion-bandwidth tradeoffs.
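
The exclusive-key idea described in this abstract can be illustrated with a toy sketch. It is not the paper's protocol: the user names, the `toy_encrypt` helper, and the SHA-256-based XOR keystream are hypothetical stand-ins chosen only so that the example runs with the Python standard library, and a real system would use an authenticated cipher. The sketch shows one thing only: a single ciphertext under the exclusive key for S is readable by every user in U ∖ S and by nobody in S.

```python
import hashlib
import secrets
from typing import Optional


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` bytes from `key` by hashing a counter (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, message: bytes) -> bytes:
    """XOR the message with the key-derived stream; applying it twice decrypts."""
    return bytes(m ^ s for m, s in zip(message, keystream(key, len(message))))


universe = {"alice", "bob", "carol", "dave"}  # U (hypothetical users)
excluded = frozenset({"carol"})                # S, the subset to exclude

# Exclusive key for S: a long-term key held by every user in U \ S.
exclusive_key = secrets.token_bytes(32)
key_rings = {
    user: ({} if user in excluded else {excluded: exclusive_key})
    for user in universe
}

# Rekeying: the whole message is one encryption of the new group key.
new_group_key = secrets.token_bytes(16)
rekey_message = toy_encrypt(exclusive_key, new_group_key)


def try_recover(user: str) -> Optional[bytes]:
    """Return the group key if `user` holds the exclusive key, else None."""
    key = key_rings[user].get(excluded)
    return toy_encrypt(key, rekey_message) if key is not None else None


recovered = {user: try_recover(user) for user in universe}
```

In this sketch the rekey message has the same length as the group key regardless of how many users remain, which mirrors the abstract's point that the message is a single encrypted key.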

AIIM Journal 2016 Journal Article

An ensemble method for extracting adverse drug events from social media

Jing Liu
Songzheng Zhao
Xiaodi Zhang

Objective: Because adverse drug events (ADEs) are a serious health problem and a leading cause of death, it is of vital importance to identify them correctly and in a timely manner. With the development of Web 2.0, social media has become a large data source for information on ADEs. The objective of this study is to develop a relation extraction system that uses natural language processing techniques to effectively distinguish between ADEs and non-ADEs in informal text on social media. Methods and materials: We develop a feature-based approach that utilizes various lexical, syntactic, and semantic features. Information-gain-based feature selection is performed to address high-dimensional features. Then, we evaluate the effectiveness of four well-known kernel-based approaches (i.e., subset tree kernel, tree kernel, shortest dependency path kernel, and all-paths graph kernel) and several ensembles that are generated by adopting different combination methods (i.e., majority voting, weighted averaging, and stacked generalization). All of the approaches are tested using three data sets: two health-related discussion forums and one general social networking site (i.e., Twitter). Results: When investigating the contribution of each feature subset, the feature-based approach attains the best area under the receiver operating characteristics curve (AUC) values, which are 78.6%, 72.2%, and 79.2% on the three data sets. When individual methods are used, we attain the best AUC values of 82.1%, 73.2%, and 77.0% using the subset tree kernel, shortest dependency path kernel, and feature-based approach on the three data sets, respectively. When using classifier ensembles, we achieve the best AUC values of 84.5%, 77.3%, and 84.5% on the three data sets, outperforming the baselines. Conclusions: Our experimental results indicate that ADE extraction from social media can benefit from feature selection. With respect to the effectiveness of different feature subsets, lexical features and semantic features can enhance the ADE extraction capability. Kernel-based approaches, which can stay away from the feature sparsity issue, are qualified to address the ADE extraction problem. Combining different individual classifiers using suitable combination methods can further enhance the ADE extraction effectiveness.
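
Two of the combination methods named in this abstract, majority voting and weighted averaging, can be sketched in a few lines. This is a generic illustration, not the authors' implementation: the three classifier outputs, the weights, and the 0.5 decision threshold are made-up values, and stacked generalization (which trains a second-level model on the base predictions) is left out.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by the most classifiers.

    With an odd number of binary classifiers, as below, no tie can occur.
    """
    return Counter(labels).most_common(1)[0][0]


def weighted_average(scores, weights):
    """Combine per-classifier scores using weights normalized by their sum."""
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total


# Hypothetical outputs of three base classifiers for one drug/event pair:
# 1 means "ADE relation", 0 means "other relation".
votes = [1, 0, 1]
scores = [0.9, 0.4, 0.7]    # each classifier's score for the ADE class
weights = [0.5, 0.2, 0.3]   # assumed weights, e.g. from validation performance

vote_label = majority_vote(votes)
avg_score = weighted_average(scores, weights)
avg_label = int(avg_score >= 0.5)  # assumed decision threshold
```

Both rules label this example as an ADE relation; they can disagree when a highly weighted classifier is outvoted by two weaker ones.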

JBHI Journal 2016 Journal Article

Continuous Blood Pressure Measurement From Invasive to Unobtrusive: Celebration of 200th Birth Anniversary of Carl Ludwig

Xiao-Rong Ding
Ni Zhao
Guang-Zhong Yang
Roderic I. Pettigrew
Benny Lo
Fen Miao
Ye Li
Jing Liu

The year 2016 marks the 200th birth anniversary of Carl Friedrich Wilhelm Ludwig (1816-1895). As one of the most remarkable scientists, Ludwig invented the kymograph, which for the first time enabled the recording of continuous blood pressure (BP), opening the door to the modern study of physiology. Almost a century later, intraarterial BP monitoring through an arterial line has been used clinically. Subsequently, arterial tonometry and volume clamp method were developed and applied in continuous BP measurement in a noninvasive way. In the last two decades, additional efforts have been made to transform the method of unobtrusive continuous BP monitoring without the use of a cuff. This review summarizes the key milestones in continuous BP measurement; that is, kymograph, intraarterial BP monitoring, arterial tonometry, volume clamp method, and cuffless BP technologies. Our emphasis is on recent studies of unobtrusive BP measurements as well as on challenges and future directions.

YNIMG Journal 2016 Journal Article

Dissociated neural substrates underlying impulsive choice and impulsive action

Qiang Wang
Chunhui Chen
Ying Cai
Siyao Li
Xiao Zhao
Li Zheng
Hanqi Zhang
Jing Liu

There is a growing consensus that impulsivity is a multifaceted construct that comprises several components such as impulsive choice and impulsive action. Although impulsive choice and impulsive action have been shown to be the common characteristics of some impulsivity-related psychiatric disorders, surprisingly few studies have directly compared their neural correlates and addressed the question whether they involve common or distinct neural correlates. We addressed this important empirical gap using an individual differences approach that could characterize the functional relevance of neural networks in behaviors. A large sample (n=227) of college students was tested with the delay discounting and stop-signal tasks, and their performances were correlated with the neuroanatomical (gray matter volume, GMV) and functional (resting-state functional connectivity, RSFC) measures, using multivariate pattern analysis (MVPA) and 10-fold cross-validation. Behavioral results showed no significant correlation between impulsive choice measured by discounting rate (k) and impulsive action measured by stop signal reaction time (SSRT). The GMVs in the right frontal pole (FP) and left middle frontal gyrus (MFG) were predictive of k, but not SSRT. In contrast, the GMVs in the right inferior frontal gyrus (IFG), supplementary motor area (SMA), and anterior cingulate cortex (ACC) could predict individuals' SSRT, but not k. RSFC analysis using the FP and right IFG as seed regions revealed two distinct networks that correspond well to the “waiting” and “stopping” systems, respectively. Furthermore, the RSFC between the FP and ventromedial prefrontal cortex (VMPFC) was predictive of k, whereas the RSFC between the IFG and pre-SMA was predictive of SSRT. These results demonstrate clearly neural dissociations between impulsive choice and impulsive action, provide new insights into the nature of impulsivity, and have implications for impulsivity-related disorders.

TIST Journal 2016 Journal Article

Multimedia News Summarization in Search

Zechao Li
Jinhui Tang
Xueming Wang
Jing Liu
Hanqing Lu

It is a necessary but challenging task to relieve users from the proliferative news information and allow them to quickly and comprehensively master the information of the whats and hows that are happening in the world every day. In this article, we develop a novel approach of multimedia news summarization for searching results on the Internet, which uncovers the underlying topics among query-related news information and threads the news events within each topic to generate a query-related brief overview. First, the hierarchical latent Dirichlet allocation (hLDA) model is introduced to discover the hierarchical topic structure from query-related news documents, and a new approach based on the weighted aggregation and max pooling is proposed to identify one representative news article for each topic. One representative image is also selected to visualize each topic as a complement to the text information. Given the representative documents selected for each topic, a time-bias maximum spanning tree (MST) algorithm is proposed to thread them into a coherent and compact summary of their parent topic. Finally, we design a friendly interface to present users with the hierarchical summarization of their required news information. Extensive experiments conducted on a large-scale news dataset collected from multiple news Web sites demonstrate the encouraging performance of the proposed solution for news summarization in news retrieval.

IJCAI Conference 2015 Conference Paper

Weakly Supervised RBM for Semantic Segmentation

Yong Li
Jing Liu
Yuhang Wang
Hanqing Lu
Songde Ma

In this paper, we propose a weakly supervised Restricted Boltzmann Machines (WRBM) approach to deal with the task of semantic segmentation with only image-level labels available. In WRBM, its hidden nodes are divided into multiple blocks, and each block corresponds to a specific label. Accordingly, semantic segmentation can be directly modeled by learning the mapping from the visible layer to the hidden layer of WRBM. Specifically, based on the standard RBM, we import another two terms to make full use of image-level labels and alleviate the effect of noisy labels. First, we expect the hidden response of each superpixel to be suppressed on the labels outside its parent image-level label set, and a non-image-level label suppression term is formulated to implicitly import the image-level labels as weak supervision. Second, semantic graph propagation is employed to exploit the co-occurrence between visually similar regions and labels. Besides, we deal with the problems of label imbalance and diverse backgrounds by adapting the block size to the label frequency and appending hidden response blocks corresponding to backgrounds, respectively. Extensive experiments on two real-world datasets demonstrate the good performance of our approach compared with some state-of-the-art methods.

AAAI Conference 2014 Conference Paper

Labeling Complicated Objects: Multi-View Multi-Instance Multi-Label Learning

Cam-Tu Nguyen
Xiaoliang Wang
Jing Liu
Zhi-Hua Zhou

Multi-Instance Multi-Label (MIML) is a learning framework where an example is associated with multiple labels and represented by a set of feature vectors (multiple instances). In the formalization of MIML learning, instances come from a single source (single view). To leverage multiple information sources (multi-view), we develop a multi-view MIML framework based on hierarchical Bayesian Network, and derive an effective learning algorithm based on variational inference. The model can naturally deal with examples in which some views could be absent (partial examples). On multi-view datasets, it is shown that our method is better than other multi-view and single-view approaches particularly in the presence of partial examples. On single-view benchmarks, extensive evaluation shows that our method is highly competitive or better than other MIML approaches on labeling examples and instances. Moreover, our method can effectively handle datasets with a large number of labels.

AAAI Conference 2014 Conference Paper

Learning Low-Rank Representations with Classwise Block-Diagonal Structure for Robust Face Recognition

Yong Li
Jing Liu
Zechao Li
Yangmuzi Zhang
Hanqing Lu
Songde Ma

Face recognition has been widely studied due to its importance in various applications. However, the case that both training images and testing images are corrupted is not well addressed. Motivated by the success of low-rank matrix recovery, we propose a novel semisupervised low-rank matrix recovery algorithm for robust face recognition. The proposed method can learn robust discriminative representations for both training images and testing images simultaneously by exploiting the classwise block-diagonal structure. Specifically, low-rank matrix approximation can handle the possible contamination of data. Moreover, the classwise block-diagonal structure is exploited to promote discrimination of representations for robust recognition. The above issues are formulated into a unified objective function and we design an efficient optimization procedure based on augmented Lagrange multiplier method to solve it. Extensive experiments on three public databases are performed to validate the effectiveness of our approach. The strong identification capability of representations with block-diagonal structure is verified.

YNIMG Journal 2012 Journal Article

Morphology enabled dipole inversion for quantitative susceptibility mapping using structural consistency between the magnitude image and the susceptibility map

Jing Liu
Tian Liu
Ludovic de Rochefort
James Ledoux
Ildar Khalidov
Weiwei Chen
A. John Tsiouris
Cynthia Wisnieff

The magnetic susceptibility of tissue can be determined in gradient echo MRI by deconvolving the local magnetic field with the magnetic field generated by a unit dipole. This Quantitative Susceptibility Mapping (QSM) problem is unfortunately ill-posed. By transforming the problem to the Fourier domain, the susceptibility appears to be undersampled only at points where the dipole kernel is zero, suggesting that a modest amount of additional information may be sufficient for uniquely resolving susceptibility. A Morphology Enabled Dipole Inversion (MEDI) approach is developed that exploits the structural consistency between the susceptibility map and the magnitude image reconstructed from the same gradient echo MRI. Specifically, voxels that are part of edges in the susceptibility map but not in the edges of the magnitude image are considered to be sparse. In this approach an L1 norm minimization is used to express this sparsity property. Numerical simulations and phantom experiments are performed to demonstrate the superiority of this L1 minimization approach over the previous L2 minimization method. Preliminary brain imaging results in healthy subjects and in patients with intracerebral hemorrhages illustrate that QSM is feasible in practice.

AAAI Conference 2012 Conference Paper

Unsupervised Feature Selection Using Nonnegative Spectral Analysis

Zechao Li
Yi Yang
Jing Liu
Xiaofang Zhou
Hanqing Lu

In this paper, a new unsupervised learning algorithm, namely Nonnegative Discriminative Feature Selection (NDFS), is proposed. To exploit the discriminative information in unsupervised scenarios, we perform spectral clustering to learn the cluster labels of the input samples, during which the feature selection is performed simultaneously. The joint learning of the cluster labels and feature selection matrix enables NDFS to select the most discriminative features. To learn more accurate cluster labels, a nonnegative constraint is explicitly imposed on the class indicators. To reduce the redundant or even noisy features, an ℓ2,1-norm minimization constraint is added into the objective function, which guarantees that the feature selection matrix is sparse in rows. Our algorithm exploits the discriminative information and feature correlation simultaneously to select a better feature subset. A simple yet efficient iterative algorithm is designed to optimize the proposed objective function. Experimental results on different real world datasets demonstrate the encouraging performance of our algorithm over the state-of-the-arts.
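
The ℓ2,1-norm mentioned in this abstract is the sum of the Euclidean norms of a matrix's rows, which is why penalizing it pushes entire rows toward zero. The following is a minimal sketch of that quantity only, not of the NDFS algorithm; the matrix `W` and the convention that each row corresponds to one feature are illustrative assumptions.

```python
import math


def l21_norm(matrix):
    """Sum over rows of each row's Euclidean (l2) norm."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


# Hypothetical feature selection matrix: one row per feature.
W = [
    [3.0, 4.0],  # row norm 5
    [0.0, 0.0],  # all-zero row: this feature would be discarded
    [0.0, 1.0],  # row norm 1
]

value = l21_norm(W)

# Features whose rows are not entirely zero are the ones kept.
selected = [i for i, row in enumerate(W) if any(x != 0.0 for x in row)]
```

Here `value` is 5 + 0 + 1 = 6 and `selected` is `[0, 2]`: the second feature drops out because its whole row is zero.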

TCS Journal 2010 Journal Article

Algorithmic properties of ciliate sequence alignment

J. Mark Keil
Jing Liu
Ian McQuillan

We study the problem of optimally partitioning scrambled genes of stichotrichous ciliates into their relevant functional segments, and of aligning scrambled genes with non-scrambled genes. This problem is significantly more difficult than traditional sequence alignment due to the patterns that occur in the scrambled genes. Here, a formal model is created to capture this problem. Then, the inherent complexity of this problem is discussed using the model. We determine that the problem of determining if there is a solution (an alignment) which achieves some minimum score is NP-complete.

EAAI Journal 2010 Journal Article

Hybrid model based on SVM with Gaussian loss function and adaptive Gaussian PSO

Qi Wu
Shuyan Wu
Jing Liu

In view of the poor capability of the standard support vector machine (SVM) to handle white noise in the input series, a new v-SVM with a Gaussian loss function, called g-SVM, is put forward to handle white noise. To seek the unknown parameters of g-SVM, an adaptive normal Gaussian particle swarm optimization (ANPSO) is also proposed. The results of applications show that the hybrid forecasting model based on the g-SVM and ANPSO is feasible and effective. A comparison between the method proposed in this paper and other ones is also given, which shows that this method is better than v-SVM and other traditional methods.

IROS Conference 2006 Conference Paper

A Remote Aerial Robot for Topographic Survey

Xiaoguang Zhao
Jing Liu
Min Tan 0001

In this paper a seminal system for the topographic survey with an unmanned aerial robot is presented. The proposed system has demonstrated the feasibility of acquiring initial 3D ground models using active laser range sensors on a low-flying helicopter platform. The robot system consists of a remote control helicopter with a laser sensor and a GPS (global positioning system) for collecting ground point data and a post-processing data sub-system for drawing a topographic map. This robot system has many potential applications, such as terrain modeling, structure inspection, or climate and weather measurement. The experimental results verify the proposed robot system for topographic survey.

YNIMG Journal 2005 Journal Article

The integrated response of the human cerebro-cerebellar and limbic systems to acupuncture stimulation at ST 36 as evidenced by fMRI

Kathleen K.S. Hui
Jing Liu
Ovidiu Marina
Vitaly Napadow
Christian Haselgrove
Kenneth K. Kwong
David N. Kennedy
Nikos Makris

ICRA Conference 2002 Conference Paper

Ramadge-Wonham Supervisory Control of Mobile Robots: Lessons from Practice

Jing Liu
Houshang Darabi

We used Ramadge-Wonham (RW) theory (1987) of supervisory control to control a system of mobile robots. We discuss our experience in modeling and implementation of the developed control system. We specifically address the control program structure that manages the interaction of the RW controller with its plant. We also present our approach in dealing with practical issues such as forcing events and simultaneous events. The advantages and disadvantages of the RW controller are discussed.