Arrow Research search

Author name cluster

Wei Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

185 papers
2 author rows

Possible papers

185

AAAI Conference 2026 Conference Paper

AdaDepth: Exploiting Inherent Scene Information for Self-Supervised Depth Estimation in Dynamic Scenes

  • Xuanang Gao
  • Xiongbin Wu
  • Zhiwei Ning
  • Runze Yang
  • Zhonglong Zheng
  • Jie Yang
  • Wei Liu

Self-supervised monocular depth estimation methods severely compromise accuracy on dynamic objects due to their static-scene assumption. Existing approaches for dynamic scenes suffer from two critical shortcomings: 1) reliance on supervised segmentation models (requiring costly annotations) or computationally intensive multi-branch models to isolate moving objects, and 2) simple integration of 2D/3D motion flow without reliable supervision for dynamic objects. We propose AdaDepth, a two-stage framework that jointly performs unsupervised scene decomposition and dynamic-aware depth learning. In the initial structural stage, our geometry-motion joint scene decomposition (GMoDecomp) module ensures the robust generation of a depth prior and simultaneously partitions the scene into multiple regions through the fusion of geometric and motion cues. In the region-adaptive refinement stage, we exploit the depth prior and decomposed regions to introduce motion-aware and geometry-consistent constraints, effectively improving depth estimation in dynamic scenes. AdaDepth achieves accurate depth prediction in highly dynamic scenes without relying on external labels or specialized segmentation models. Extensive experiments on KITTI, Cityscapes, and Waymo Open demonstrate its superiority over state-of-the-art approaches.

AAAI Conference 2026 Conference Paper

Agent-SAMA: State-Aware Mobile Assistant

  • Linqiang Guo
  • Wei Liu
  • Yi Wen Heng
  • Tse-Hsun (Peter) Chen
  • Yang Wang

Mobile Graphical User Interface (GUI) agents aim to autonomously complete tasks within or across apps based on user instructions. While recent Multimodal Large Language Models (MLLMs) enable these agents to interpret UI screens and perform actions, existing agents remain fundamentally reactive. They reason over the current UI screen but lack a structured representation of the app navigation flow, limiting GUI agents’ ability to understand execution context, detect unexpected execution results, and recover from errors. We introduce Agent-SAMA, a state-aware multi-agent framework that models app execution as a Finite State Machine (FSM), treating UI screens as states and user actions as transitions. Agent-SAMA implements four specialized agents that collaboratively construct and use FSMs in real time to guide task planning, execution verification, and recovery. We evaluate Agent-SAMA on two types of benchmarks: cross-app (Mobile-Eval-E, SPA-Bench) and mostly single-app (AndroidWorld). On Mobile-Eval-E, Agent-SAMA achieves an 84.0% success rate and a 71.9% recovery rate. On SPA-Bench, it reaches an 80.0% success rate with a 66.7% recovery rate. Compared to prior methods, Agent-SAMA improves task success by up to 12% and recovery success by 13.8%. On AndroidWorld, Agent-SAMA achieves a 63.7% success rate, outperforming the baselines. Our results demonstrate that structured state modeling enhances robustness and can serve as a lightweight, model-agnostic memory layer for future GUI agents.
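
As a rough illustration of the FSM abstraction this abstract describes (screens as states, actions as transitions), a minimal sketch follows. It is not Agent-SAMA's implementation; all names (ScreenFSM, the screen and action labels) are hypothetical.

```python
# Minimal finite state machine over UI screens: screens are states,
# user actions are transitions. Purely illustrative; hypothetical names.
class ScreenFSM:
    def __init__(self, start):
        self.state = start
        self.transitions = {}  # (state, action) -> next state

    def add_transition(self, state, action, next_state):
        self.transitions[(state, action)] = next_state

    def step(self, action):
        key = (self.state, action)
        if key not in self.transitions:
            # An unexpected result: an agent could flag this and attempt recovery.
            raise ValueError(f"unexpected transition from {self.state!r} via {action!r}")
        self.state = self.transitions[key]
        return self.state

fsm = ScreenFSM("home")
fsm.add_transition("home", "tap_settings", "settings")
fsm.add_transition("settings", "tap_back", "home")
print(fsm.step("tap_settings"))  # settings
```

The `step` failure path is the point of the abstraction: a transition absent from the constructed FSM signals an unexpected execution result, which is where verification and recovery can hook in.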

JBHI Journal 2026 Journal Article

CQH-MPN: A Classical–Quantum Hybrid Prototype Network With Fuzzy Proximity-Based Classification for Early Glaucoma Diagnosis

  • Wei Liu
  • Haijian Shao
  • Xing Deng
  • Yingtao Jiang

Glaucoma is the second leading cause of blindness worldwide and a leading cause of irreversible vision loss, making early and accurate diagnosis essential. Although deep learning has revolutionized medical image analysis, its dependence on large-scale annotated datasets poses a significant barrier, especially in clinical scenarios with limited labeled data. To address this challenge, we propose a Classical–Quantum Hybrid Mean Prototype Network (CQH-MPN) tailored for few-shot glaucoma diagnosis. CQH-MPN integrates a quantum feature encoder, which exploits quantum superposition and entanglement for enhanced global representation learning, with a classical convolutional encoder to capture local structural features. These dual encodings are fused and projected into a shared embedding space, where mean prototype representations are computed for each class. We introduce a fuzzy proximity-based metric that extends traditional prototype distance measures by incorporating intra-class variability and inter-class ambiguity, thereby improving classification sensitivity under uncertainty. Our model is evaluated on two public retinal fundus image datasets—ACRIMA and ORIGA—under 1-shot, 3-shot, and 5-shot settings. Results show that CQH-MPN consistently outperforms other models, achieving an accuracy of 94.50% ± 1.04% on the ACRIMA dataset under the 1-shot setting. Moreover, the proposed method demonstrates significant performance improvements across different shot configurations on both datasets. By effectively bridging the representational power of quantum computing with classical deep learning, CQH-MPN demonstrates robust generalization in data-scarce environments. This work lays the foundation for quantum-augmented few-shot learning in medical imaging and offers a viable solution for real-world, low-resource diagnostic applications.
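
For context, the mean-prototype step the abstract builds on is the classical prototypical-network classification rule: average the support embeddings per class, then assign a query to the nearest prototype. The sketch below uses plain Euclidean distance as a stand-in; CQH-MPN's fused quantum-classical encoder and fuzzy proximity metric are not reproduced here, and all names and numbers are illustrative.

```python
# Illustrative mean-prototype classification in an embedding space.
import math

def mean_prototype(embeddings):
    """Per-class prototype: coordinate-wise mean of support embeddings."""
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]

def classify(query, prototypes):
    """Assign the query embedding to the class with the nearest prototype."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(prototypes, key=lambda c: dist(query, prototypes[c]))

# Toy 2-D embeddings standing in for encoded fundus images.
protos = {
    "glaucoma": mean_prototype([[1.0, 0.0], [0.8, 0.2]]),
    "healthy":  mean_prototype([[0.0, 1.0], [0.2, 0.8]]),
}
print(classify([0.9, 0.1], protos))  # glaucoma
```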

JBHI Journal 2026 Journal Article

Detecting Driver Sleepiness From Physiological Indicators Using a CNN-LSTM Self-Attention Model

  • Yingying Jiao
  • Yifan Zhang
  • Wei Liu
  • Zhuqing Jiao

Sleepiness at the wheel is an important factor contributing to road traffic accidents. Based on the characteristic changes in Electroencephalography (EEG) and Electrooculography (EOG) signals, a dozing state is refined into three sub-states: the onset, duration, and end state. Each state is characterized by different physiological indicators such as the EEG alpha waves, the rising edge, and falling edge waveforms in EOG signals. To enable real-time detection of these physiological indicators, we propose a framework integrating three Convolutional Neural Network–Long Short-Term Memory–Self-Attention (CLSA) models, which combine CNN-based local feature extraction with self-attention mechanism for global context capture. The framework is evaluated for performance on continuous test data from 12 subjects. Our results demonstrate that by detecting alpha waves and the rising edge waveform, the alpha wave epoch (AWE) at the onset of the dozing state can be identified with high accuracy and precision. Thus, the onset sub-state is calculated as the period from the start time of the rising edge waveform to the time when the AWE is valid. Subsequently, the duration sub-state corresponds to the sustained presence of alpha waves. Furthermore, the falling edge waveform is detected with high accuracy, enabling the classification of the end state into two distinct phenomena: alpha blocking phenomenon or alpha wave attenuation-disappearance phenomenon, representing the sleepiness level—relaxed wakefulness or sleep onset, respectively. Utilizing three-channel signal processing, this framework provides a promising approach for real-time sleepiness detection in real-world driving scenarios.

JBHI Journal 2026 Journal Article

Digital Twins Framework for Clinical Decision-Centric Co-Management of Patient Monitoring and Environment Management

  • Wei Liu
  • Yuanyuan Sun
  • Jing Wang
  • Nanchang Yin

The convergence of continuous physiological monitoring and intelligent building systems in smart clinics offers a transformative opportunity for patient-centered care, yet it introduces the challenge of harmonizing clinical fidelity, patient comfort, and operational sustainability. We present DT-ECO, a privacy-preserving digital twins framework that enables decision-centric co-management of multi-modal patient monitoring and clinical environmental systems. DT-ECO constructs a hybrid digital twin that integrates a physics-informed building model with graph-temporal physiological inference and battery electrochemistry, enabling real-time synchronization between patient state, IoT device operation, and environmental dynamics within a differentiable programming environment. On this foundation, a hierarchical control strategy is developed, in which a constrained deep reinforcement learning agent adaptively schedules wearable IoT sensor sampling to extend device lifetime, while a model predictive controller orchestrates HVAC operation and on-site energy resources to maintain a therapeutic environment. Extensive evaluations on DOE reference hospitals and public ECG datasets demonstrate that DT-ECO achieves a 31.8% reduction in annual energy consumption and extends median wearable battery life by 28%, while rigorously maintaining clinical standards, evidenced by less than 0.6% thermal comfort violation and no degradation in arrhythmia detection capability (F1-score 0.956). By bridging the gap between patient physiology and the clinical environment, DT-ECO establishes a pathway toward precision healthcare facilities that are simultaneously patient-centric, diagnostically robust, and operationally sustainable.

AAAI Conference 2026 Conference Paper

Domain-Aware Suppression and Aggregation for Federated DG ReID

  • Zhixi Yu
  • Wei Liu
  • Wenke Huang
  • Bin Yang
  • Qian Bie
  • Guancheng Wan
  • Xin Xu

Federated domain generalization in person re-identification (FedDG-ReID) aims to learn a privacy-preserving server model from decentralized client source domains that generalizes to unseen domains. Existing approaches enhance the generalizability of the server model by increasing the diversity of client person data. However, these methods overlook that ReID model parameters are easily biased by client-specific data distributions, leading to the capture of excessive domain-specific identity information. Such identity information (e.g., clothing style) conflicts with the identity information in unseen domains, thereby hindering the generalization ability of the server model. To address this, we propose FedSupWA, a novel FedDG-ReID framework that mainly consists of Domain-aware Parameter Suppression (DPS) and Domain-invariant Weighted Aggregation (DWA). Specifically, DPS adaptively attenuates the update magnitude of the parameters based on the fit of the parameters to the client's domain, encouraging the model to focus on more generalized domain-independent identity information, such as pedestrian contours, and other consistent information across domains. DWA enhances the server model’s generalization by evaluating the effectiveness of the client model in maintaining the consistency of pedestrian identities to measure the importance of the learned domain-independent identity information and assigning greater aggregation weights to clients that contribute more generalized information. Extensive experiments demonstrate the effectiveness of FedSupWA, showing that it achieves state-of-the-art performance.

AAAI Conference 2026 Conference Paper

Exploiting All Mamba Fusion for Efficient RGB-D Tracking

  • Ge Ying
  • Dawei Zhang
  • Chengzhuan Yang
  • Wei Liu
  • Sang-Woon Jeon
  • Hua Wang
  • Changqin Huang
  • Zhonglong Zheng

Despite the progress made through deep learning, existing Visual Object Tracking (VOT) frameworks struggle with real-world challenges. Recent approaches incorporate additional modalities like Depth, Thermal Infrared, and Language to enhance the robustness of VOT, particularly with the improvement of depth sensor precision, facilitating RGB-D tracking. However, current RGB-D trackers often copy RGB tracking paradigms, leading to inefficiency due to two-stream architectures that fail to exploit heterogeneous features, and reliance on simplistic or large-parameter fusion methods. To address these challenges, we propose AMTrack, a one-stream RGB-D tracker leveraging Mamba's linear complexity for simultaneous feature extraction and two-stage cross-modal feature fusion. Our innovation also includes a low-parameter Multimodal Mix Mamba (3M) module, which optimizes deep feature fusion and reduces computational overhead. The advantage of the 3M module stems from our Multimodal State Space Model (MSSM), a multimodal feature interaction component reconstructed based on SSM. Experiments across multiple RGB-D tracking datasets indicate that AMTrack achieves superior performance with fewer parameters and lower memory demands compared to state-of-the-art trackers.

AAAI Conference 2026 Conference Paper

FedARKS: Federated Aggregation via Robust and Discriminative Knowledge Selection and Integration for Person Re-identification

  • Xin Xu
  • Binchang Ma
  • Zhixi Yu
  • Wei Liu

The application of federated domain generalization in person re-identification (FedDG-ReID) aims to enhance the model's generalization ability in unseen domains while protecting client data privacy. However, existing mainstream methods typically rely on global feature representations and simple averaging operations for model aggregation, leading to two limitations in domain generalization: (1) Using only global features makes it difficult to capture subtle, domain-invariant local details (such as accessories or textures); (2) Uniform parameter averaging treats all clients as equivalent, ignoring their differences in robust feature extraction capabilities, thereby diluting the contributions of high-quality clients. To address these issues, we propose a novel federated learning framework—Federated Aggregation via Robust and Discriminative Knowledge Selection and Integration (FedARKS)—comprising two mechanisms: RK (Robust Knowledge) and KS (Knowledge Selection). In our design, each client employs a dual-branch network of RK: the Global Feature Processing Branch serves as the primary component, extracting overall representations for model aggregation and server-side updates; while the Body Part Processing Branch acts as an auxiliary component, focusing on extracting domain-invariant local details to supplement and guide the local training process during global feature learning. Additionally, our KS mechanism adaptively assigns corresponding aggregation weights to clients based on their ability to extract domain-invariant knowledge, enabling the server to better integrate cross-domain invariant knowledge extracted by clients. Extensive experiments validate that FedARKS achieves state-of-the-art generalization results on the FedDG-ReID benchmark, demonstrating that learning subtle body part features can effectively assist and reinforce global representations, thereby enabling robust cross-domain person ReID capabilities.

AAAI Conference 2026 Conference Paper

ICLR: Inter-Chrominance and Luminance Interaction for Natural Color Restoration in Low-Light Image Enhancement

  • Xin Xu
  • Hao Liu
  • Wei Liu
  • Wei Wang
  • Jiayi Wu
  • Kui Jiang

The Low-Light Image Enhancement (LLIE) task aims at improving contrast while restoring details and textures for images captured in low-light conditions. The HVI color space has enabled significant progress in this task by allowing precise decoupling of chrominance and luminance. However, for the interaction of chrominance and luminance branches, substantial distributional differences between the two branches prevalent in natural images limit complementary feature extraction, and luminance errors are propagated to chrominance channels through the nonlinear parameter. Furthermore, for interaction between different chrominance branches, images with large homogeneous-color regions usually exhibit weak correlation between chrominance branches due to concentrated distributions. Traditional pixel-wise losses exploit strong inter-branch correlations for co-optimization, causing gradient conflicts in weakly correlated regions. Therefore, we propose an Inter-Chrominance and Luminance Interaction (ICLR) framework including a Dual-stream Interaction Enhancement Module (DIEM) and a Covariance Correction Loss (CCL). The DIEM improves the extraction of complementary information from two dimensions, fusion and enhancement, respectively. The CCL utilizes luminance residual statistics to penalize chrominance errors and balances gradient conflicts by constraining the covariance of the chrominance branches. Experimental results on multiple datasets show that the proposed ICLR framework outperforms state-of-the-art methods.

TMLR Journal 2026 Journal Article

LoDAdaC: a unified local training-based decentralized framework with adaptive gradients and compressed communication

  • Wei Liu
  • Anweshit Panda
  • Ujwal Pandey
  • Haven Cook
  • George Slota
  • Naigang Wang
  • Jie Chen
  • Yangyang Xu

In decentralized distributed learning, achieving fast convergence and low communication cost is essential for scalability and high efficiency. Adaptive gradient methods, such as Adam, have demonstrated strong practical performance in deep learning and centralized distributed settings. However, their convergence properties remain largely unexplored in decentralized settings involving multiple local training steps, such as federated learning. To address this limitation, we propose LoDAdaC, a unified multiple Local Training (MLT) Decentralized framework with Adam-type updates and Compressed communication (CC). LoDAdaC accommodates a broad class of optimizers for its local adaptive updates, including AMSGrad, Adam, and AdaGrad; it is compatible with standard (possibly biased) compressors such as low-bit quantization and sparsification. MLT and CC enable LoDAdaC to achieve multiplied reduction of communication cost, while the technique of adaptive updates enables fast convergence. We rigorously prove the combined advantage through complexity analysis. In addition, experiments on image classification and GPT-style language model training validate our theoretical findings and show that LoDAdaC significantly outperforms existing decentralized algorithms in terms of convergence speed and communication efficiency.
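
As context for the "standard (possibly biased) compressors" the abstract mentions, a minimal top-k sparsification sketch is shown below. It is generic textbook compression, not the paper's code; only the kept (index, value) pairs would be communicated between workers.

```python
# Illustrative top-k sparsification: keep the k largest-magnitude gradient
# entries, drop the rest. A standard biased compressor; not LoDAdaC's code.
def topk_compress(grad, k):
    """Return a sparse dict of the k largest-magnitude (index, value) pairs."""
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return {i: grad[i] for i in idx}

def decompress(sparse, n):
    """Reconstruct a dense length-n vector, zeros where entries were dropped."""
    out = [0.0] * n
    for i, v in sparse.items():
        out[i] = v
    return out

g = [0.1, -2.0, 0.05, 1.5]
print(decompress(topk_compress(g, 2), len(g)))  # [0.0, -2.0, 0.0, 1.5]
```

Sending 2 of 4 entries here halves the payload; combined with multiple local steps between communication rounds, this is the source of the multiplied communication savings the abstract claims.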

AAAI Conference 2026 Conference Paper

Long-form RewardBench: Evaluating Reward Models for Long-form Generation

  • Hui Huang
  • Yancheng He
  • Wei Liu
  • Muyun Yang
  • Jiaheng Liu
  • Kehai Chen
  • Bing Xu
  • Conghui Zhu

The widespread adoption of reinforcement learning-based alignment highlights the growing importance of reward models. Various benchmarks have been built to evaluate reward models across diverse domains and scenarios. However, a significant gap remains in assessing reward models for long-form generation, despite its critical role in real-world applications. To bridge this, we introduce Long-form RewardBench, the first reward modeling testbed specifically designed for long-form generation. Our benchmark encompasses five key subtasks: QA, RAG, Chat, Writing, and Reasoning. We collected instruction and preference data through a meticulously designed multi-stage data collection process, and conducted extensive experiments on 20+ mainstream reward models, including both classifiers and generative models. Our findings reveal that current models still lack long-form reward modeling capabilities. Furthermore, we designed a novel Long-form Needle-in-a-Haystack Test, which revealed a correlation between reward modeling performance and the error's position within a response, as well as the overall response length, with distinct characteristics observed between classification and generative models. Finally, we demonstrate that classifiers exhibit better generalizability compared to generative models trained on the same data. As the first benchmark for long-form reward modeling, this work aims to offer a robust platform for visualizing progress in this crucial area.

AAAI Conference 2026 Conference Paper

MambaOVSR: Multiscale Fusion with Global Motion Modeling for Chinese Opera Video Super-Resolution

  • Hua Chang
  • Xin Xu
  • Wei Liu
  • Wei Wang
  • Xin Yuan
  • Kui Jiang

Chinese opera is celebrated for preserving classical art. However, early filming equipment limitations have degraded videos of last-century performances by renowned artists (e.g., low frame rates and resolution), hindering archival efforts. Although space-time video super-resolution (STVSR) has advanced significantly, applying it directly to opera videos remains challenging. The scarcity of datasets impedes the recovery of high-frequency details, and existing STVSR methods lack global modeling capabilities—compromising visual quality when handling opera’s characteristic large motions. To address these challenges, we pioneer a large-scale Chinese Opera Video Clip (COVC) dataset and propose the Mamba-based multiscale fusion network for space-time Opera Video Super-Resolution (MambaOVSR). Specifically, MambaOVSR involves three novel components: the Global Fusion Module (GFM) for motion modeling through a multiscale alternating scanning mechanism, and the Multiscale Synergistic Mamba Module (MSMM) for alignment across different sequence lengths. Additionally, our MambaVR block resolves feature artifacts and positional information loss during alignment. Experimental results on the COVC dataset show that MambaOVSR significantly outperforms the SOTA STVSR method by an average of 1.86 dB in terms of PSNR.
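
For reference, the PSNR gain quoted above is measured by the standard peak signal-to-noise ratio. The generic definition, not tied to MambaOVSR, is sketched below with toy pixel values.

```python
# PSNR = 10 * log10(MAX^2 / MSE), the metric used in the result above.
# Generic definition with illustrative toy inputs.
import math

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-length pixel sequences."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical signals
    return 10.0 * math.log10(max_val ** 2 / mse)

print(round(psnr([100, 120, 140], [101, 119, 141]), 2))  # 48.13
```

Because the scale is logarithmic, an average improvement of 1.86 dB corresponds to a roughly 35% reduction in mean squared error.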

AAAI Conference 2026 Conference Paper

Not All Inconsistency Is Equal: Decomposing LVLM Uncertainty into Belief Divergence and Belief Conflict

  • Jie Shi
  • Xiaodong Yue
  • Wei Liu
  • Yufei Chen
  • Feifan Dong

Uncertainty Quantification (UQ) is critical for detecting hallucinations in black-box Large Vision-Language Models (LVLMs). However, prevailing methods like Discrete Semantic Entropy (DSE) are unreliable, as their scores are primarily dominated by the number of semantic clusters. This renders them incapable of distinguishing between benign semantic ambiguity (varied but coherent responses) and severe belief conflict (contradictory responses). We address this limitation by proposing a novel framework rooted in Dempster-Shafer theory of evidence, built on the premise that not all inconsistency is equal. Our method decomposes uncertainty into two complementary metrics: Belief Divergence, which quantifies ambiguity by measuring the separation between viewpoints, and Belief Conflict, which captures direct logical contradictions. Extensive experiments demonstrate that our framework provides a more reliable measure of uncertainty.
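
For context, the Dempster-Shafer quantity that the abstract's "Belief Conflict" idea is rooted in is the conflict mass K from Dempster's rule of combination: the total mass assigned by two belief functions to incompatible focal sets. The sketch below is generic DS theory, not the paper's exact metric.

```python
# Conflict mass K between two mass functions (Dempster-Shafer theory):
# K = sum of m1(A) * m2(B) over all focal-set pairs with empty intersection.
# Focal sets are frozensets over a toy frame {"yes", "no"}; illustrative only.
def conflict_mass(m1, m2):
    return sum(v1 * v2
               for a, v1 in m1.items()
               for b, v2 in m2.items()
               if not (a & b))

yes, no = frozenset({"yes"}), frozenset({"no"})
m1 = {yes: 0.9, no: 0.1}  # one response strongly answers "yes"
m2 = {yes: 0.1, no: 0.9}  # another strongly answers "no"
print(round(conflict_mass(m1, m2), 2))  # 0.82
```

Contradictory responses like these yield high K, while varied-but-coherent responses (mass spread over overlapping sets) yield low K, which is the distinction between belief conflict and benign ambiguity that cluster counting alone cannot make.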

AAAI Conference 2026 Conference Paper

Orthogonal Spatial-temporal Distributional Transfer for 4D Generation

  • Wei Liu
  • Shengqiong Wu
  • Bobo Li
  • Haoyu Zhao
  • Hao Fei
  • Mong-Li Lee
  • Wynne Hsu

In the AIGC era, generating high-quality 4D content has garnered increasing research attention. Unfortunately, current 4D synthesis research is severely constrained by the lack of large-scale 4D datasets, preventing models from adequately learning the critical spatial-temporal features necessary for high-quality 4D generation, thus hindering progress in this domain. To combat this, we propose a novel framework that transfers rich spatial priors from existing 3D diffusion models and temporal priors from video diffusion models to enhance 4D synthesis. We develop a spatial-temporal-disentangled 4D (STD-4D) Diffusion model, which synthesizes 4D-aware videos through disentangled spatial and temporal latents. To facilitate the best feature transfer, we design a novel Orthogonal Spatial-temporal Distributional Transfer (Orster) mechanism, where the spatiotemporal feature distributions are carefully modeled and injected into the STD-4D Diffusion. Further, during the 4D construction, we devise a spatial-temporal-aware HexPlane (ST-HexPlane) to integrate the transferred spatiotemporal features for better 4D deformation and 4D Gaussian feature modeling. Experiments demonstrate that our method significantly outperforms existing approaches, achieving superior spatial-temporal consistency and higher-quality 4D synthesis.

AAAI Conference 2026 Conference Paper

PSEO: Optimizing Post-hoc Stacking Ensemble Through Hyperparameter Tuning

  • Beicheng Xu
  • Wei Liu
  • Keyao Ding
  • Yupeng Lu
  • Bin Cui

The Combined Algorithm Selection and Hyperparameter Optimization (CASH) problem is fundamental in Automated Machine Learning (AutoML). Inspired by the success of ensemble learning, recent AutoML systems construct post-hoc ensembles for final predictions rather than relying on the best single model. However, while most CASH methods conduct extensive searches for the optimal single model, they typically employ fixed strategies during the ensemble phase that fail to adapt to specific task characteristics. To tackle this issue, we propose PSEO, a framework for post-hoc stacking ensemble optimization. First, we conduct base model selection through binary quadratic programming, with a trade-off between diversity and performance. Furthermore, we introduce two mechanisms to fully realize the potential of multi-layer stacking. Finally, PSEO builds a hyperparameter space and searches for the optimal post-hoc ensemble strategy within it. Empirical results on 80 public datasets show that PSEO achieves the best average test rank (2.96) among 16 methods, including post-hoc designs in recent AutoML systems and state-of-the-art ensemble learning methods.

AAAI Conference 2026 Short Paper

ResNet-GA: Evolutionary Deep Learning Models for Adversarial Defense (Student Abstract)

  • Li-Chiao Wang
  • Chung-Shou Liao
  • Wei Liu

Adversarial attacks remain a major challenge for deep learning models, as they can undermine both performance and reliability in practical applications such as image recognition. Although evolutionary algorithms (EAs) have proven effective in optimizing complex systems, their use for directly enhancing model robustness for adversarial defense has been limited. In this study, we introduce ResNet-GA, a method that applies evolutionary deep learning (EDL) to develop ResNet-like networks specifically designed to resist different forms of adversarial perturbations. The approach evolves network architectures with a genetic algorithm (GA), adapting the Residual Blocks at every stage in ResNet according to the needs of each dataset and attack type. Experimental results show that ResNet-GA strengthens model robustness beyond standard baselines, highlighting the value of iterative evolutionary design for building more dependable deep learning systems under various adversarial conditions.

AAAI Conference 2026 Conference Paper

Think-J: Learning to Think for Generative LLM-as-a-Judge

  • Hui Huang
  • Yancheng He
  • Hongli Zhou
  • Rui Zhang
  • Wei Liu
  • Weixun Wang
  • Jiaheng Liu
  • Wenbo Su

LLM-as-a-Judge refers to the automatic modeling of preferences for responses generated by Large Language Models (LLMs), which is of significant importance for both LLM evaluation and reward modeling. Although generative LLMs have made substantial progress in various tasks, their performance as LLM-Judge still falls short of expectations. In this work, we propose Think-J, which improves generative LLM-as-a-Judge by learning how to think. We first utilize a small amount of curated data to develop the model with initial judgment thinking capabilities. Subsequently, we optimize the judgment thinking traces based on reinforcement learning (RL). We propose two methods for judgment thinking optimization, based on offline and online RL, respectively. The offline method requires training a critic model to construct positive and negative examples for learning. The online method defines rule-based reward as feedback for optimization. Experimental results show that our approach can significantly enhance the evaluation capability of generative LLM-Judge, surpassing both generative and classifier-based LLM-Judge without requiring extra human annotations.

JBHI Journal 2026 Journal Article

XAI Driven Intelligent IoMT Secure Data Management Framework

  • Wei Liu
  • Feng Zhao
  • Lewis Nkenyereye
  • Shalli Rani
  • Keqin Li
  • Jianhui Lv

The Internet of Medical Things (IoMT) has transformed traditional healthcare systems by enabling real-time monitoring, remote diagnostics, and data-driven treatment. However, security and privacy remain significant concerns for IoMT adoption due to the sensitive nature of medical data. Therefore, we propose an integrated framework leveraging blockchain and explainable artificial intelligence (XAI) to enable secure, intelligent, and transparent management of IoMT data. First, the traceability and tamper resistance of blockchain are used to secure IoMT data transactions, which are modeled as a two-stage Stackelberg game. The dual-chain architecture is used to ensure the security and privacy protection of the transaction. The main-chain manages regular IoMT data transactions, while the side-chain deals with data trading activities aimed at resale. Simultaneously, the perceptual hash technology is used to realize data rights confirmation, which maximally protects the rights and interests of each participant in the transaction. Subsequently, medical time-series data is modeled using bidirectional simple recurrent units to detect anomalies and cyberthreats accurately while overcoming vanishing gradients. Lastly, an adversarial sample generation method based on local interpretable model-agnostic explanations is provided to evaluate, secure, and improve the anomaly detection model, as well as to make it more explainable and resilient to possible adversarial attacks. Simulation results are provided to illustrate the high performance of the integrated secure data management framework leveraging blockchain and XAI, compared with the benchmarks.

AAAI Conference 2025 Short Paper

Assessing Vulnerabilities in State-of-the-Art Large Language Models Through Hex Injection (Student Abstract)

  • Da Cheng Gu
  • Wei Liu

State-of-the-art large language models (LLMs) are designed with robust safeguards to prevent the disclosure of harmful information and dangerous procedures. However, "jailbreaking" techniques can circumvent these protections by exploiting vulnerabilities in the models. This paper introduces a novel method, Hex Injection, which leverages a specific weakness in LLMs' ability to decode encoded text to uncover concealed dangerous instructions. Hex Injection distinguishes itself from traditional methods by combining encoded instructions with plaintext prompts to reveal unsafe content more effectively. Our approach involves encoding potentially malicious prompts in hexadecimal and integrating them with plaintext prompts. We observe a 94% average success rate (ASR) with a combination of plaintext, encoded, and role-play for Llama 3 and 3.1 models, and an 86% ASR for the Gemma 2 model. This research not only advances the understanding of LLM security but also offers valuable insights for improving safety mechanisms in artificial intelligence systems.

AAAI Conference 2025 Conference Paper

Auto-Regressive Diffusion for Generating 3D Human-Object Interactions

  • Zichen Geng
  • Zeeshan Hayder
  • Wei Liu
  • Ajmal Saeed Mian

Text-driven Human-Object Interaction (Text-to-HOI) generation is an emerging field with applications in animation, video games, virtual reality, and robotics. A key challenge in HOI generation is maintaining interaction consistency in long sequences. Existing Text-to-Motion-based approaches, such as discrete motion tokenization, cannot be directly applied to HOI generation due to limited data in this domain and the complexity of the modality. To address the problem of interaction consistency in long sequences, we propose an autoregressive diffusion model (ARDHOI) that predicts the next continuous token. Specifically, we introduce a Contrastive Variational Autoencoder (cVAE) to learn a physically plausible space of continuous HOI tokens, thereby ensuring that generated human-object motions are realistic and natural. For generating sequences autoregressively, we develop a Mamba-based context encoder to capture and maintain consistent sequential actions. Additionally, we implement an MLP-based denoiser to generate the subsequent token conditioned on the encoded context. Our model has been evaluated on the OMOMO and BEHAVE datasets, where it outperforms existing state-of-the-art methods in terms of both performance and inference speed. This makes ARDHOI a robust and efficient solution for text-driven HOI tasks.

NeurIPS Conference 2025 Conference Paper

CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching

  • Chen Chen
  • Pengsheng Guo
  • Liangchen Song
  • Jiasen Lu
  • Rui Qian
  • Tsu-Jui Fu
  • Xinze Wang
  • Wei Liu

Conditional generative modeling aims to learn a conditional data distribution from samples containing data-condition pairs. For this, diffusion and flow-based methods have attained compelling results. These methods use a learned (flow) model to transport an initial standard Gaussian noise that ignores the condition to the conditional data distribution. The model is hence required to learn both mass transport and conditional injection. To ease the demand on the model, we propose Condition-Aware Reparameterization for Flow Matching (CAR-Flow), a lightweight, learned shift that conditions the source, the target, or both distributions. By relocating these distributions, CAR-Flow shortens the probability path the model must learn, leading to faster training in practice. On low-dimensional synthetic data, we visualize and quantify the effects of CAR-Flow. On higher-dimensional natural image data (ImageNet-256), equipping SiT-XL/2 with CAR-Flow reduces FID from 2.07 to 1.68, while introducing less than 0.6% additional parameters.

AAAI Conference 2025 Conference Paper

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

  • Junxian Li
  • Di Zhang
  • Xunzhi Wang
  • Zeying Hao
  • Jingdi Lei
  • Qian Tan
  • Cai Zhou
  • Wei Liu

Large Language Models (LLMs) have achieved remarkable success and have been applied across various scientific fields, including chemistry. However, many chemical tasks require the processing of visual information, which cannot be successfully handled by existing chemical LLMs. This brings a growing need for models capable of integrating multimodal information in the chemical domain. In this paper, we introduce ChemVLM, an open-source chemical multimodal large language model specifically designed for chemical applications. ChemVLM is trained on a carefully curated bilingual multimodal dataset that enhances its ability to understand both textual and visual chemical information, including molecular structures, reactions, and chemistry examination questions. We develop three datasets for comprehensive evaluation, tailored to Chemical Optical Character Recognition (OCR), Multimodal Chemical Reasoning (MMCR), and Multimodal Molecule Understanding tasks. We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks. Experimental results demonstrate that ChemVLM achieves competitive performance across all evaluated tasks.

TMLR Journal 2025 Journal Article

Compressed Decentralized Momentum Stochastic Gradient Methods for Nonconvex Optimization

  • Wei Liu
  • Anweshit Panda
  • Ujwal Pandey
  • Christopher Brissette
  • Yikang Shen
  • George Slota
  • Naigang Wang
  • Jie Chen

In this paper, we design two compressed decentralized algorithms for solving nonconvex stochastic optimization under two different scenarios. Both algorithms adopt a momentum technique to achieve fast convergence and a message-compression technique to save communication costs. Though momentum acceleration and compressed communication have been used in the literature, it is highly nontrivial to theoretically prove the effectiveness of their composition in a decentralized algorithm that can maintain the benefits of both sides, because of the need to simultaneously control the consensus error, the compression error, and the bias from the momentum gradient. For the scenario where gradients are bounded, our proposal is a compressed decentralized adaptive method. To the best of our knowledge, this is the first decentralized adaptive stochastic gradient method with compressed communication. For the scenario of data heterogeneity without bounded gradients, our proposal is a compressed decentralized heavy-ball method, which applies a gradient tracking technique to address the challenge of data heterogeneity. Notably, both methods achieve an optimal convergence rate, and they can achieve linear speedup and adopt topology-independent algorithmic parameters within a certain regime of the user-specified error tolerance. Superior empirical performance is observed over state-of-the-art methods on training deep neural networks (DNNs) and Transformers.
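As a generic illustration of combining momentum with message compression (not the paper's actual algorithms, which also handle consensus and gradient tracking; the names and constants here are made up), a heavy-ball step that transmits only a top-k sparsified momentum vector might look like:

```python
import numpy as np

def top_k_compress(vec: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of vec; zero out the rest."""
    keep = np.argpartition(np.abs(vec), -k)[-k:]
    out = np.zeros_like(vec)
    out[keep] = vec[keep]
    return out

def compressed_momentum_step(w, grad, m, lr=0.1, beta=0.9, k=2):
    """One heavy-ball update where only a compressed message is 'sent'."""
    m = beta * m + grad          # momentum (heavy-ball) buffer
    msg = top_k_compress(m, k)   # compressed message a node would transmit
    w = w - lr * msg             # update using the compressed direction
    return w, m

w = np.ones(4)
m = np.zeros(4)
grad = np.array([0.5, -0.01, 0.02, -2.0])
w, m = compressed_momentum_step(w, grad, m)
# Only the two largest-magnitude momentum entries move the parameters.
```

The bias the abstract mentions is visible even here: the small momentum entries are silently dropped from the transmitted message, and controlling that accumulated error is what makes the convergence analysis nontrivial.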

IJCAI Conference 2025 Conference Paper

Decision-Aware Preference Modeling for Multi-Behavior Recommendation

  • Qingfeng Li
  • Wei Liu
  • Zaiqiao Meng
  • Jian Yin

In recommender systems, multi-behavior methods have demonstrated significant effectiveness in addressing issues such as data sparsity—challenges commonly encountered by traditional single-behavior recommendation methods. These methods typically infer user preferences from various auxiliary behaviors and apply them to recommendations for the target behavior. However, existing methods face challenges in uncovering the interaction patterns for different behaviors from multi-behavior implicit feedback, as users exhibit varying preference strengths for different items across behaviors. To address this issue, this paper introduces a novel approach, Decision-Aware Preference Modeling (DAPM), for multi-behavior recommendation. We first construct a behavior-agnostic graph to learn comprehensive representations that are not affected by behavior factors, complementing the behavior-specific representations. Subsequently, we introduce an innovative contrastive learning paradigm that emphasizes inter-behavior consistency and intra-behavior uniformity to alleviate the “false repulsion” problem in traditional contrastive learning. Furthermore, we propose a multi-behavior hinge loss with boundary constraints to explicitly model users' decision boundaries across different behaviors, thereby enhancing the model’s ability to accurately capture users' inconsistent preference intensities. Extensive experiments on three real-world datasets demonstrate the consistent improvements achieved by DAPM over thirteen state-of-the-art baselines. We release our code at https://github.com/Breeze-del/DAPM.

ICLR Conference 2025 Conference Paper

DeepTAGE: Deep Temporal-Aligned Gradient Enhancement for Optimizing Spiking Neural Networks

  • Wei Liu
  • Li Yang
  • Mingxuan Zhao
  • Shuxun Wang
  • Jin Gao
  • Wenjuan Li
  • Bing Li
  • Weiming Hu

Spiking Neural Networks (SNNs), with their biologically inspired spatio-temporal dynamics and spike-driven processing, are emerging as a promising low-power alternative to traditional Artificial Neural Networks (ANNs). However, the complex neuronal dynamics and non-differentiable spike communication mechanisms in SNNs present substantial challenges for efficient training. By analyzing the membrane potentials in spiking neurons, we found that their distributions can increasingly deviate from the firing threshold as time progresses, which tends to cause diminished backpropagation gradients and unbalanced optimization. To address these challenges, we propose Deep Temporal-Aligned Gradient Enhancement (DeepTAGE), a novel approach that improves optimization gradients in SNNs from both internal surrogate gradient functions and external supervision methods. Our DeepTAGE dynamically adjusts surrogate gradients in accordance with the membrane potential distribution across different time steps, enhancing their respective gradients in a temporal-aligned manner that promotes balanced training. Moreover, to mitigate issues of gradient vanishing or deviating during backpropagation, DeepTAGE incorporates deep supervision at both spatial (network stages) and temporal (time steps) levels to ensure more effective and robust network optimization. Importantly, our method can be seamlessly integrated into existing SNN architectures without imposing additional inference costs or requiring extra control modules. We validate the efficacy of DeepTAGE through extensive experiments on static benchmarks (CIFAR10, CIFAR100, and ImageNet-1k) and a neuromorphic dataset (DVS-CIFAR10), demonstrating significant performance improvements.

AAAI Conference 2025 Conference Paper

Enhancing Multi-View Classification Reliability with Adaptive Rejection

  • Wei Liu
  • Yufei Chen
  • Xiaodong Yue

Multi-view classification based on evidence theory aims to enhance result reliability by effectively quantifying prediction uncertainty at the evidence level, particularly when dealing with low-quality views. However, these methods face limitations in real-world applications due to the sensitivity of estimated uncertainty to view distribution, leading to two main issues: 1) difficulty in making clear judgments about whether to trust predictions based on vague uncertainty scores, and 2) the potential negative impact of integrating information from low-quality views on multi-view classification performance. Both limitations compromise the reliability of multi-view decisions. To address these challenges, we introduce an adaptive rejection mechanism based on estimated uncertainty, which is free of data distribution constraints. By integrating this adaptive rejection mechanism into the fusion of multiple views, our method not only indicates whether predictions should be adopted or rejected at the view level but also enhances classification performance by minimizing the impact of unreliable information. The effectiveness of our method is demonstrated through comprehensive theoretical analysis and empirical experiments on various multi-view datasets, establishing its superiority in enhancing the reliability of multi-view classification.

JBHI Journal 2025 Journal Article

Explainable AI for Medical Image Analysis in Medical Cyber-Physical Systems: Enhancing Transparency and Trustworthiness of IoMT

  • Wei Liu
  • Feng Zhao
  • Achyut Shankar
  • Carsten Maple
  • James Dinesh Peter
  • Byung-Gyu Kim
  • Adam Slowik
  • Bidare Divakarachari Parameshachari

This study explores the application of explainable artificial intelligence (XAI) in the context of medical image analysis within medical cyber-physical systems (MCPS) to enhance transparency and trustworthiness. Meanwhile, this study proposes an explainable framework that integrates machine learning and knowledge reasoning. The explainability of the model is realized when the framework evolution target feature results and reasoning results are the same and are relatively reliable. However, using these technologies also presents new challenges, including the need to ensure the security and privacy of patient data from the Internet of Medical Things (IoMT). Therefore, attack detection is an essential aspect of MCPS security. For the MCPS model with only sensor attacks, the necessary and sufficient conditions for detecting attacks are given based on the definition of sparse observability. The corresponding attack detector and state estimator are designed by assuming that some IoMT sensors are under protection. It is expounded that the IoMT sensors under protection play an important role in improving the efficiency of attack detection and state estimation. The experimental results show that the XAI in the context of medical image analysis within MCPS improves the accuracy of lesion classification, effectively removes low-quality medical images, and realizes the explainability of recognition results. This helps doctors understand the logic of the system's decision-making and choose whether to trust the results based on the explanation given by the framework.

JBHI Journal 2025 Journal Article

Exploring Microbe-Drug Association Prediction via Multi-Attribute Dual-Decoder Graph Autoencoder

  • Wei Liu
  • Xiangcheng Deng
  • Xingen Sun
  • Xu Lu
  • Xing Chen

Predicting potential microbe-drug associations (MDA) can help study pathogenesis, expedite pharmaceutical innovation, and enhance targeted therapeutics. Given the time and labor intensity of traditional biological experiments, an increasing number of computational approaches are being employed to predict MDA. The method based on graph embedding is one of the most widely used. However, most of these methods only consider node embedding or graph structure information in isolation, which leads to restricted predictive accuracy. In this work, we propose a method called exploring microbe-drug association prediction via multi-attribute dual-decoder graph autoencoder (MDGAEMDA). Specifically, we first construct a heterogeneous network containing microbe similarity, drug similarity, and known associations. Second, to enrich the node information, multi-attribute features are obtained by importing the topological information of microbes and drugs. Then, two heterogeneous networks constructed by the graph masking strategy are input into a dual-decoder graph autoencoder that contains one encoder and two decoders (a node decoder and a structure decoder) to learn both node embedding and graph structure information. Finally, the two low-dimensional features are concatenated into features of MDA pairs and predicted by random forest. The model was compared with multiple advanced methods using public datasets. The experimental outcomes showed that our model significantly outperformed other methods. A case study of widely used drugs demonstrated the reliability of the proposed method to predict MDA.

AAAI Conference 2025 Conference Paper

Follow-Your-Click: Open-domain Regional Image Animation via Motion Prompts

  • Yue Ma
  • Yingqing He
  • Hongfa Wang
  • Andong Wang
  • Leqi Shen
  • Chenyang Qi
  • Jixuan Ying
  • Chengfei Cai

Despite recent advances in image-to-video generation, better controllability and local animation are less explored. Most existing image-to-video methods are not locally aware and tend to move the entire scene. However, human artists may need to control the movement of different objects or regions. Additionally, current I2V methods require users not only to describe the target motion but also to provide redundant detailed descriptions of frame contents. These two issues hinder the practical utilization of current I2V tools. In this paper, we propose a practical framework, named Follow-Your-Click, to achieve image animation with a simple user click (for specifying what to move) and a motion prompt (for specifying how to move). Technically, we propose the first-frame masking strategy, which significantly improves the video generation quality, and a motion-augmented module equipped with a motion prompt dataset to improve the motion prompt following abilities of our model. To further control the motion speed, we propose flow-based motion magnitude control to regulate the speed of target movement more precisely. Extensive experiments compared with 7 baselines, including both commercial tools and research methods on 8 metrics, suggest the superiority of our approach.

AAAI Conference 2025 Conference Paper

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

  • Yirui Chen
  • Xudong Huang
  • Quan Zhang
  • Wei Li
  • Mingjian Zhu
  • Qiangyu Yan
  • Simiao Li
  • Hanting Chen

The extraordinary ability of generative models emerges as a new trend in image editing and generating realistic images, posing a serious threat to the trustworthiness of multimedia data and driving the research of image manipulation detection and localization (IMDL). However, the lack of a large-scale data foundation makes the IMDL task unattainable. In this paper, we build a local manipulation data generation pipeline that integrates the powerful capabilities of SAM, LLM, and generative models. Upon this basis, we propose the GIM dataset, which has the following advantages: 1) Large scale: GIM includes over one million pairs of AI-manipulated images and real images. 2) Rich image content: GIM encompasses a broad range of image classes. 3) Diverse generative manipulation: the images are manipulated with state-of-the-art generators and various manipulation tasks. The aforementioned advantages allow for a more comprehensive evaluation of IMDL methods, extending their applicability to diverse images. We introduce the GIM benchmark with two settings to evaluate existing IMDL methods. In addition, we propose a novel IMDL framework, termed GIMFormer, which consists of a ShadowTracer, a Frequency-Spatial block (FSB), and a Multi-Window Anomalous Modeling (MWAM) module. Extensive experiments on GIM demonstrate that GIMFormer surpasses the previous state-of-the-art approach on two different benchmarks.

IJCAI Conference 2025 Conference Paper

HyperTrans: Efficient Hypergraph-Driven Cross-Domain Pattern Transfer in Image Anomaly Detection

  • Tengyu Zhang
  • Deyu Zeng
  • Baoqiang Li
  • Wei Wang
  • Wei Liu
  • Zongze Wu

Anomaly detection plays a pivotal role in industrial quality assurance processes, with cross-domain problems, exemplified by the model upgrade from RGB to 3D, being prevalent in real-world scenarios yet remaining systematically underexplored. To address the severe challenges posed by the extreme lack of datasets in the target domain, we retain the knowledge from source models and explore a novel solution for anomaly detection through cross-domain learning, introducing HyperTrans. Targeting few-shot scenarios, HyperTrans centers around hypergraphs to model the relationships among the limited patch features and employs a perturbation-rectification-scoring architecture. The domain perturbation module injects and adapts channel-level statistical perturbations, mitigating style shifts during domain transfer. Subsequently, a residual hypergraph restoration module utilizes a cross-domain hypergraph to capture higher-order correlations in patches and align them across domains. Ultimately, with feature patterns exhibiting reduced domain shifts, an inter-domain scoring module aggregates similarity information between patches and normal patterns within the multi-domain subhypergraphs to make an integrated decision, generating multi-level anomaly predictions. Extensive experiments demonstrate that HyperTrans offers significant advantages in anomaly classification and anomaly segmentation tasks, outperforming state-of-the-art non-cross-domain methods in image-wise ROCAUC by 13%, 12%, and 15% in 1-shot, 2-shot, and 5-shot settings on MVTec3D AD.

AAAI Conference 2025 Conference Paper

Infinite-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation

  • Qihua Chen
  • Yue Ma
  • Hongfa Wang
  • Junkun Yuan
  • Wenzhe Zhao
  • Qi Tian
  • Hongmei Wang
  • Shaobo Min

This paper explores higher-resolution video outpainting with extensive content generation. We point out common issues faced by existing methods when attempting to largely outpaint videos: the generation of low-quality content and limitations imposed by GPU memory. To address these challenges, we propose a diffusion-based method called Infinite-Canvas. It builds upon two core designs. First, instead of employing the common practice of "single-shot" outpainting, we distribute the task across spatial windows and seamlessly merge them. It allows us to outpaint videos of any size and resolution without being constrained by GPU memory. Second, the source video and its relative positional relation are injected into the generation process of each window. It makes the generated spatial layout within each window harmonize with the source video. Coupling these two designs enables us to generate higher-resolution outpainting videos with rich content while keeping spatial and temporal consistency. Infinite-Canvas excels in large-scale video outpainting, e.g., from 512 × 512 to 1152 × 2048 (9×), while producing high-quality and aesthetically pleasing results. It achieves the best quantitative results across various resolution and scale setups. The code is available at https://github.com/mayuelala/FollowYourCanvas.

AAAI Conference 2025 Conference Paper

Just a Few Glances: Open-Set Visual Perception with Image Prompt Paradigm

  • Jinrong Zhang
  • Penghui Wang
  • Chunxiao Liu
  • Wei Liu
  • Dian Jin
  • Qiong Zhang
  • Erli Meng
  • Zhengnan Hu

To break through the limitations of pre-training models on fixed categories, Open-Set Object Detection (OSOD) and Open-Set Segmentation (OSS) have attracted a surge of interest from researchers. Inspired by large language models, mainstream OSOD and OSS methods generally utilize text as a prompt, achieving remarkable performance. Following the SAM paradigm, some researchers use visual prompts, such as points, boxes, and masks that cover detection or segmentation targets. Although these two prompt paradigms exhibit excellent performance, they also reveal inherent limitations. On the one hand, it is difficult to accurately describe the characteristics of a specialized category using textual descriptions. On the other hand, existing visual prompt paradigms heavily rely on multi-round human interaction, which hinders their application to fully automated pipelines. To address the above issues, we propose a novel prompt paradigm for OSOD and OSS: the Image Prompt Paradigm. This brand-new prompt paradigm enables detecting or segmenting specialized categories without multi-round human intervention. To achieve this goal, the proposed image prompt paradigm uses just a few image instances as prompts, and we propose a novel framework named MI Grounding for this new paradigm. In this framework, high-quality image prompts are automatically encoded, selected, and fused, achieving single-stage, non-interactive inference. We conduct extensive experiments on public datasets, showing that MI Grounding achieves competitive performance on OSOD and OSS benchmarks compared to text prompt paradigm methods and visual prompt paradigm methods. Moreover, MI Grounding greatly outperforms existing methods on our constructed specialized ADR50K dataset.

AAAI Conference 2025 Conference Paper

Local Conditional Controlling for Text-to-Image Diffusion Models

  • Yibo Zhao
  • Liang Peng
  • Yang Yang
  • Zekai Luo
  • Hengjia Li
  • Yao Chen
  • Zheng Yang
  • Xiaofei He

Diffusion models have exhibited impressive prowess in the text-to-image task. Recent methods add image-level structure controls, e.g., edge and depth maps, to manipulate the generation process together with text prompts to obtain desired images. This controlling process is globally operated on the entire image, which limits the flexibility of control regions. In this paper, we explore a novel and practical task setting: local control. It focuses on controlling a specific local region according to user-defined image conditions, while the remaining regions are only conditioned by the original text prompt. However, it is non-trivial to achieve this. The naive manner of directly adding local conditions may lead to the local control dominance problem, which forces the model to focus on the controlled region and neglect object generation in other regions. To mitigate this problem, we propose a Regional Discriminate Loss to update the noised latents, aiming at enhanced object generation in non-control regions. Furthermore, the proposed Focused Token Response suppresses weaker attention scores which lack the strongest response to enhance object distinction and reduce duplication. Lastly, we adopt a Feature Mask Constraint to reduce quality degradation in images caused by information differences across the local control region. All proposed strategies are operated at the inference stage. Extensive experiments demonstrate that our method can synthesize high-quality images aligned with the text prompt under local control conditions.

NeurIPS Conference 2025 Conference Paper

MI-TRQR: Mutual Information-Based Temporal Redundancy Quantification and Reduction for Energy-Efficient Spiking Neural Networks

  • Dengfeng Xue
  • Wenjuan Li
  • Yifan Lu
  • Chunfeng Yuan
  • Yufan Liu
  • Wei Liu
  • Man Yao
  • Li Yang

Brain-inspired spiking neural networks (SNNs) provide energy-efficient computation through event-driven processing. However, the shared weights across multiple timesteps lead to serious temporal feature redundancy, limiting both efficiency and performance. This issue is further aggravated when processing static images due to the duplicated input. To mitigate this problem, we propose a parameter-free and plug-and-play module named Mutual Information-based Temporal Redundancy Quantification and Reduction (MI-TRQR), constructing energy-efficient SNNs. Specifically, Mutual Information (MI) is properly introduced to quantify redundancy between discrete spike features at different timesteps on two spatial scales: pixel (local) and the entire spatial features (global). Based on the multi-scale redundancy quantification, we apply a probabilistic masking strategy to remove redundant spikes. The final representation is subsequently recalibrated to account for the spike removal. Extensive experimental results demonstrate that our MI-TRQR achieves sparser spiking firing, higher energy efficiency, and better performance concurrently with different SNN architectures in tasks of neuromorphic data classification, static data classification, and time-series forecasting. Notably, MI-TRQR increases accuracy by 1.7% on CIFAR10-DVS with 4 timesteps while reducing energy cost by 37.5%. Our codes are available at https://github.com/dfxue/MI-TRQR.
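The quantification step relies on mutual information between binary spike patterns at different timesteps; a minimal empirical estimator for two binary arrays can be sketched as follows (a generic plug-in MI estimate in bits, not the paper's multi-scale module; variable names are illustrative):

```python
import numpy as np

def binary_mutual_information(x: np.ndarray, y: np.ndarray) -> float:
    """Plug-in estimate of MI (in bits) between two binary {0,1} arrays."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))  # empirical joint probability
            p_a = np.mean(x == a)                # empirical marginals
            p_b = np.mean(y == b)
            if p_ab > 0:                         # convention: 0 * log 0 = 0
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

spikes_t0 = np.array([0, 0, 1, 1])
identical = binary_mutual_information(spikes_t0, spikes_t0)
independent = binary_mutual_information(spikes_t0, np.array([0, 1, 0, 1]))
```

Two identical spike patterns yield the maximal 1 bit here, while statistically independent patterns yield 0, matching the intuition that high MI across timesteps flags redundant spikes as candidates for masking.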

AAAI Conference 2025 Conference Paper

Modeling All Response Surfaces in One for Conditional Search Spaces

  • Jiaxing Li
  • Wei Liu
  • Chao Xue
  • Yibing Zhan
  • Xiaoxing Wang
  • Weifeng Liu
  • Dacheng Tao

Bayesian Optimization (BO) is a sample-efficient black-box optimizer commonly used in search spaces where hyperparameters are independent. However, in many practical AutoML scenarios, there will be dependencies among hyperparameters, forming a conditional search space, which can be partitioned into structurally distinct subspaces. The structure and dimensionality of hyperparameter configurations vary across these subspaces, challenging the application of BO. Some previous BO works have proposed solutions to develop multiple Gaussian Process models in these subspaces. However, these approaches tend to be inefficient as they require a substantial number of observations to guarantee each GP's performance and cannot capture relationships between hyperparameters across different subspaces. To address these issues, this paper proposes a novel approach to model the response surfaces of all subspaces in one, which can model the relationships between hyperparameters elegantly via a self-attention mechanism. Concretely, we design a structure-aware hyperparameter embedding to preserve the structural information. Then, we introduce an attention-based deep feature extractor, capable of projecting configurations with different structures from various subspaces into a unified feature space, where the response surfaces can be formulated using a single standard Gaussian Process. The empirical results on a simulation function, various real-world tasks, and HPO-B benchmark demonstrate that our proposed approach improves the efficacy and efficiency of BO within conditional search spaces.

AAAI Conference 2025 Conference Paper

MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls

  • Yuxuan Bian
  • Ailing Zeng
  • Xuan Ju
  • Xian Liu
  • Zhaoyang Zhang
  • Wei Liu
  • Qiang Xu

Whole-body multimodal motion generation, controlled by text, speech, or music, has numerous applications including video generation and character animation. However, employing a unified model to process different condition modalities presents two main challenges: motion distribution drifts across different tasks (e.g., co-speech gestures and text-driven daily actions) and the complex optimization of mixed conditions with varying granularities (e.g., text and audio). In this paper, we propose MotionCraft, a unified diffusion transformer that crafts whole-body motion with plug-and-play multimodal control. Our framework employs a coarse-to-fine training strategy, starting with the text-to-motion semantic pre-training, followed by the multimodal low-level control adaptation. To effectively learn and transfer motion knowledge across different distributions, we design MC-Attn for parallel modeling of static and dynamic human topology graphs. To overcome the motion format inconsistency of existing benchmarks, we introduce MC-Bench, the first available multimodal whole-body motion generation benchmark based on the unified SMPL-X format. Extensive experiments show that MotionCraft achieves state-of-the-art performance on various standard motion generation tasks.

NeurIPS Conference 2025 Conference Paper

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

  • Daoguang Zan
  • Zhirong Huang
  • Wei Liu
  • Hanwu Chen
  • Shulin Xin
  • Linhao Zhang
  • Qi Liu
  • Li Aoyan

The task of issue resolving aims to modify a codebase to generate a patch that addresses a given issue. However, most existing benchmarks focus almost exclusively on Python, making them insufficient for evaluating Large Language Models (LLMs) across different programming languages. To bridge this gap, we introduce a multilingual issue-resolving benchmark, called Multi-SWE-bench, covering 8 languages: Python, Java, TypeScript, JavaScript, Go, Rust, C, and C++. In particular, this benchmark includes a total of 2,132 high-quality instances, carefully curated by 68 expert annotators, ensuring a reliable and accurate evaluation of LLMs on the issue-resolving task. Based on human-annotated results, the issues are further classified into three difficulty levels. We evaluate a series of state-of-the-art models on Multi-SWE-bench, utilizing both procedural and agent-based frameworks for issue resolving. Our experiments reveal three key findings: (1) Limited generalization across languages: While existing LLMs perform well on Python issues, their ability to generalize across other languages remains limited; (2) Performance aligned with human-annotated difficulty: LLM-based agents' performance closely aligns with human-assigned difficulty, with resolution rates decreasing as issue complexity rises; and (3) Performance drop on cross-file issues: The performance of current methods significantly deteriorates when handling cross-file issues. These findings highlight the limitations of current LLMs and underscore the need for more robust models capable of handling a broader range of programming languages and complex issue scenarios.

ECAI Conference 2025 Conference Paper

PMR: Physical Model-Driven Multi-Stage Restoration of Turbulent Dynamic Videos

  • Tao Wu
  • Jingyuan Ye
  • Cheng Zhou
  • Wenlong Chen
  • Zheng Liu
  • Huiming Zheng
  • Wei Liu
  • Ying Fu

Geometric distortions and blurring caused by atmospheric turbulence degrade the quality of long-range dynamic scene videos. Existing methods struggle with restoring edge details and eliminating mixed distortions, especially under conditions of strong turbulence and complex dynamics. To address these challenges, we introduce a Dynamic Efficiency Index (DEI), which combines turbulence intensity, optical flow, and proportions of dynamic regions to accurately quantify video dynamic intensity under varying turbulence conditions and provide a high-dynamic turbulence training dataset. Additionally, we propose a Physical Model-Driven Multi-Stage Video Restoration (PMR) framework that consists of three stages: de-tilting for geometric stabilization, motion segmentation enhancement for dynamic region refinement, and de-blurring for quality restoration. PMR employs lightweight backbones and stage-wise joint training to ensure both efficiency and high restoration quality. Experimental results demonstrate that the proposed method effectively suppresses motion trailing artifacts, restores edge details and exhibits strong generalization capability, especially in real-world scenarios characterized by high-turbulence and complex dynamics. We will make the code and datasets openly available.

NeurIPS Conference 2025 Conference Paper

Quantifying Distributional Invariance in Causal Subgraph for IRM-Free Graph Generalization

  • Yang Qiu
  • Yixiong Zou
  • Jun Wang
  • Wei Liu
  • Xiangyu Fu
  • Ruixuan Li

Out-of-distribution generalization under distributional shifts remains a critical challenge for graph neural networks. Existing methods generally adopt the Invariant Risk Minimization (IRM) framework, requiring costly environment annotations or heuristically generated synthetic splits. To circumvent these limitations, in this work, we aim to develop an IRM-free method for capturing causal subgraphs. We first identify that causal subgraphs exhibit substantially smaller distributional variations than non-causal components across diverse environments, which we formalize as the Invariant Distribution Criterion and theoretically prove in this paper. Building on this criterion, we systematically uncover the quantitative relationship between distributional shift and representation norm for identifying the causal subgraph, and investigate its underlying mechanisms in depth. Finally, we propose an IRM-free method by introducing a norm-guided invariant distribution objective for causal subgraph discovery and prediction. Extensive experiments on two widely used benchmarks demonstrate that our method consistently outperforms state-of-the-art methods in graph generalization. Code is available at https://github.com/anders1123/IDG.

NeurIPS Conference 2025 Conference Paper

Retro-R1: LLM-based Agentic Retrosynthesis

  • Wei Liu
  • Jiangtao Feng
  • Hongli Yu
  • Yuxuan Song
  • Yuqiang Li
  • Shufei Zhang
  • Lei Bai
  • Wei-Ying Ma

Retrosynthetic planning is a fundamental task in chemical discovery. Due to the vast combinatorial search space, identifying viable synthetic routes remains a significant challenge--even for expert chemists. Recent advances in Large Language Models (LLMs), particularly equipped with reinforcement learning, have demonstrated strong human-like reasoning and planning abilities, especially in mathematics and code problem solving. This raises a natural question: Can the reasoning capabilities of LLMs be harnessed to develop an AI chemist capable of learning effective policies for multi-step retrosynthesis? In this study, we introduce Retro-R1, a novel LLM-based retrosynthesis agent trained via reinforcement learning to design molecular synthesis pathways. Unlike prior approaches, which typically rely on single-turn, question-answering formats, Retro-R1 interacts dynamically with plug-in single-step retrosynthesis tools and learns from environmental feedback. Experimental results show that Retro-R1 achieves a 55.79% pass@1 success rate, surpassing the previous state of the art by 8.95%. Notably, Retro-R1 demonstrates strong generalization to out-of-domain test cases, where existing methods tend to fail despite their high in-domain performance. Our work marks a significant step toward equipping LLMs with advanced, chemist-like reasoning abilities, highlighting the promise of reinforcement learning for enabling data-efficient, generalizable, and sophisticated scientific problem-solving in LLM-based agents.

AAAI Conference 2025 Conference Paper

Stability and Generalization of Zeroth-Order Decentralized Stochastic Gradient Descent with Changing Topology

  • Xiaolin Hu
  • Zixuan Gong
  • Gengze Xu
  • Wei Liu
  • Jian Luan
  • Bin Wang
  • Yong Liu

Zeroth-order (ZO) optimization as the gradient-free method has become a powerful tool when the first-order gradient is unavailable or expensive to obtain, especially in decentralized learning scenarios where data and computational resources are distributed across multiple clients. There have been many efforts to analyze the optimization convergence rate of zeroth-order decentralized stochastic gradient descent (ZO-DSGD) algorithms. However, the generalization of these methods has not been well studied. In this paper, we provide a generalization analysis of ZO-DSGD with changing topology, where the clients run zeroth-order SGD with local data and communicate with each other according to time-varying topology. We systematically analyze the generalization error in convex, strongly convex, and non-convex cases. The obtained results in the convex and strongly convex cases with zeroth-order oracles recover the results of SGD. Moreover, the generalization bounds derived in non-convex cases align with that of DSGD. To capture the influence of communication topology on the generalization performance, we analyze local generalization bounds concerning local models held at different clients. The obtained results reflect the influence of the number of clients, local sample size, and topology on the generalization error. To the best of our knowledge, this is the first work that provides a generalization analysis of zeroth-order decentralized stochastic gradient descent methods and recovers the results of SGD.
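The zeroth-order oracle behind ZO-DSGD-style methods can be pictured with the standard two-point random-direction estimator: the gradient is approximated purely from function evaluations, no derivatives required. The sketch below is a generic illustration of that estimator driving a toy gradient-free SGD loop; the smoothing parameter, step size, and number of directions are arbitrary choices, not the paper's settings.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, num_dirs=10, rng=None):
    """Two-point zeroth-order gradient estimate of f at x:
    averages (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u over random directions u."""
    rng = np.random.default_rng(rng)
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / num_dirs

# Gradient-free SGD on a toy quadratic: each step only queries f at points.
f = lambda x: float(np.sum(x ** 2))
x = np.ones(5)
rng = np.random.default_rng(0)
for _ in range(500):
    x = x - 0.05 * zo_gradient(f, x, rng=rng)
```

In the decentralized setting analyzed by the paper, each client would run such a step on its local data and then average models with its neighbors under the current communication topology.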

IJCAI Conference 2025 Conference Paper

Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization

  • Xinhao Yao
  • Hongjin Qian
  • Xiaolin Hu
  • Gengze Xu
  • Wei Liu
  • Jian Luan
  • Bin Wang
  • Yong Liu

Large Language Models (LLMs), built on Transformer architectures, exhibit remarkable generalization across a wide range of tasks. However, fine-tuning these models for specific tasks remains resource-intensive due to their extensive parameterization. In this paper, we explore two remarkable phenomena related to the attention mechanism during the fine-tuning of LLMs (where Wq, Wk, and Wv denote the weights of the query, key, and value layers, respectively). The first phenomenon, termed “Unequal Importance of Attention Matrices”, highlights the impact of fine-tuning different weight matrices. It shows that optimizing the Wv matrix yields significantly better performance than optimizing the Wk matrix. Fine-tuning only the Wq and Wv matrices is computationally efficient while delivering results comparable to, or even better than, fine-tuning all three matrices (Wq, Wk, and Wv). The second phenomenon, “Attention Matrices with Customized Learning Rate Lead to Better Convergence”, emphasizes the importance of assigning distinct learning rates to these matrices. Specifically, a higher learning rate for the Wv matrix compared to Wq and Wk accelerates convergence and improves performance. Building on these insights, we propose a new strategy that improves fine-tuning efficiency in terms of both storage and time. Experimental results on benchmark datasets validate the effectiveness of this approach, supporting our theoretical findings. Our analysis lays the theoretical groundwork for configuring and improving algorithms in LLMs fine-tuning.
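As a rough illustration of the second phenomenon, per-matrix learning rates are typically expressed as optimizer parameter groups. The fragment below is purely hypothetical: the parameter names and the 4x ratio are assumptions for illustration, not the paper's prescribed values.

```python
# Illustrative optimizer configuration (not the paper's exact recipe):
# give W_v a larger learning rate than W_q and W_k, per the second phenomenon.
base_lr = 1e-4
param_groups = [
    {"params": "attn.W_q", "lr": base_lr},
    {"params": "attn.W_k", "lr": base_lr},      # may even be frozen, per the first phenomenon
    {"params": "attn.W_v", "lr": 4 * base_lr},  # faster updates for the value matrix
]
```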

AAAI Conference 2025 Conference Paper

Towards More Discriminative Feature Learning in SNNs with Temporal-Self-Erasing Supervision

  • Wei Liu
  • Li Yang
  • Mingxuan Zhao
  • Dengfeng Xue
  • Shuxun Wang
  • Boyu Cai
  • Jin Gao
  • Wenjuan Li

Spiking Neural Networks (SNNs) are biologically inspired models that process visual inputs over multiple time steps. However, they often struggle with limited feature discrimination along the temporal dimension due to inherent spatiotemporal invariance. This limitation arises from the redundant activation of certain regions and shared supervision for multiple time steps, constraining the network’s ability to adapt and learn diverse features. To address this challenge, we propose a novel Temporal-Self-Erasing (TSE) supervision method that dynamically adapts the learning regions of interest for different time steps. The TSE method operates by identifying highly activated regions from predictions across multiple time steps and adaptively suppressing them during model training, thereby encouraging the network to focus on less activated yet potentially informative regions. This approach not only enhances the feature discrimination capability of SNNs but also facilitates more effective multi-time-step inference by exploiting more semantic information. Experimental results on benchmark datasets demonstrate that our TSE method significantly improves the classification accuracy and robustness of SNNs.

ICLR Conference 2025 Conference Paper

UniMatch: Universal Matching from Atom to Task for Few-Shot Drug Discovery

  • Ruifeng Li
  • Mingqian Li
  • Wei Liu
  • Yuhua Zhou
  • Xiangxin Zhou
  • Yuan Yao
  • Qiang Zhang 0026
  • Hongyang Chen 0001

Drug discovery is crucial for identifying candidate drugs for various diseases. However, its low success rate often results in a scarcity of annotations, posing a few-shot learning problem. Existing methods primarily focus on single-scale features, overlooking the hierarchical molecular structures that determine different molecular properties. To address these issues, we introduce Universal Matching Networks (UniMatch), a dual matching framework that integrates explicit hierarchical molecular matching with implicit task-level matching via meta-learning, bridging multi-level molecular representations and task-level generalization. Specifically, our approach explicitly captures structural features across multiple levels—atoms, substructures, and molecules—via hierarchical pooling and matching, facilitating precise molecular representation and comparison. Additionally, we employ a meta-learning strategy for implicit task-level matching, allowing the model to capture shared patterns across tasks and quickly adapt to new ones. This unified matching framework ensures effective molecular alignment while leveraging shared meta-knowledge for fast adaptation. Our experimental results demonstrate that UniMatch outperforms state-of-the-art methods on the MoleculeNet and FS-Mol benchmarks, achieving improvements of 2.87% in AUROC and 6.52% in ∆AUPRC. UniMatch also shows excellent generalization ability on the Meta-MolNet benchmark.

ICRA Conference 2025 Conference Paper

X-MOBILITY: End-to-End Generalizable Navigation via World Modeling

  • Wei Liu
  • Huihua Zhao
  • Chenran Li
  • Joydeep Biswas
  • Billy Okal
  • Pulkit Goyal
  • Yan Chang
  • Soha Pouya

General-purpose navigation in challenging environments remains a significant problem in robotics, with current state-of-the-art approaches facing myriad limitations. Classical approaches struggle with cluttered settings and require extensive tuning, while learning-based methods face difficulties generalizing to out-of-distribution environments. This paper introduces X-Mobility, an end-to-end generalizable navigation model that overcomes existing challenges by leveraging three key ideas. First, X-Mobility employs an auto-regressive world modeling architecture with a latent state space to capture world dynamics. Second, a diverse set of multi-head decoders enables the model to learn a rich state representation that correlates strongly with effective navigation skills. Third, by decoupling world modeling from action policy, our architecture can train effectively on a variety of data sources, both with and without expert policies: off-policy data allows the model to learn world dynamics, while on-policy data with supervisory control enables optimal action policy learning. Through extensive experiments, we demonstrate that X-Mobility not only generalizes effectively but also surpasses current state-of-the-art navigation approaches. Additionally, X-Mobility achieves zero-shot Sim2Real transferability and shows strong potential for cross-embodiment generalization. Project page: https://nvlabs.github.io/X-MOBILITY.

NeurIPS Conference 2024 Conference Paper

$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation

  • Weiquan Wang
  • Jun Xiao
  • Chunping Wang
  • Wei Liu
  • Zhao Wang
  • Long Chen

Diffusion models have demonstrated their effectiveness in addressing the inherent uncertainty and indeterminacy in monocular 3D human pose estimation (HPE). Despite their strengths, the need for large search spaces and the corresponding demand for substantial training data make these models prone to generating biomechanically unrealistic poses. This challenge is particularly noticeable in occlusion scenarios, where the complexity of inferring 3D structures from 2D images intensifies. In response to these limitations, we introduce the **Di**screte **Di**ffusion **Pose** (**$\text{Di}^2\text{Pose}$**), a novel framework designed for occluded 3D HPE that capitalizes on the benefits of a discrete diffusion model. Specifically, **$\text{Di}^2\text{Pose}$** employs a two-stage process: it first converts 3D poses into a discrete representation through a pose quantization step, which is subsequently modeled in latent space through a discrete diffusion process. This methodological innovation restrictively confines the search space towards physically viable configurations and enhances the model’s capability to comprehend how occlusions affect human pose within the latent space. Extensive evaluations conducted on various benchmarks (e.g., Human3.6M, 3DPW, and 3DPW-Occ) have demonstrated its effectiveness.
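The pose-quantization step can be pictured as VQ-style nearest-neighbor assignment against a learned codebook: each continuous pose vector is replaced by the index of its closest code, and the discrete diffusion process then operates over those indices. The toy sketch below (shapes and values illustrative, not the paper's actual quantizer) shows the assignment step.

```python
import numpy as np

def quantize(poses, codebook):
    """Map each continuous pose vector to its nearest codebook entry.

    poses:    (N, D) array of pose vectors
    codebook: (K, D) array of learned code vectors
    Returns integer code indices of shape (N,).
    """
    # Squared Euclidean distance from every pose to every code.
    d2 = ((poses[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
poses = np.array([[0.1, -0.1], [0.9, 1.2]])
idx = quantize(poses, codebook)  # → array([0, 1])
recon = codebook[idx]            # discrete stand-ins for the original poses
```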

NeurIPS Conference 2024 Conference Paper

AFBench: A Large-scale Benchmark for Airfoil Design

  • Jian Liu
  • Jianyu Wu
  • Hairun Xie
  • Guoqing Zhang
  • Jing Wang
  • Wei Liu
  • Wanli Ouyang
  • Junjun Jiang

Data-driven generative models have emerged as promising approaches towards achieving efficient mechanical inverse design. However, due to prohibitively high cost in time and money, there is still a lack of open-source and large-scale benchmarks in this field. This is especially the case for airfoil inverse design, which requires generating and editing diverse geometric-qualified and aerodynamic-qualified airfoils following multimodal instructions, i.e., dragging points and physical parameters. This paper presents the open-source endeavors in airfoil inverse design, AFBench, including a large-scale dataset with 200 thousand airfoils and high-quality aerodynamic and geometric labels, two novel and practical airfoil inverse design tasks, i.e., conditional generation on multimodal physical parameters and controllable editing, and comprehensive metrics to evaluate various existing airfoil inverse design methods. Our aim is to establish AFBench as an ecosystem for training and evaluating airfoil inverse design methods, with a specific focus on data-driven controllable inverse design models driven by multimodal instructions, capable of bridging the gap between ideas and execution, and between academic research and industrial applications. We have provided baseline models, comprehensive experimental observations, and analysis to accelerate future research. Our baseline model is trained on an RTX 3090 GPU within 16 hours. The codebase, datasets and benchmarks will be available at https://hitcslj.github.io/afbench/.

NeurIPS Conference 2024 Conference Paper

Autonomous Agents for Collaborative Task under Information Asymmetry

  • Wei Liu
  • Chenxi Wang
  • Yifei Wang
  • Zihao Xie
  • Rennai Qiu
  • Yufan Dang
  • Zhuoyun Du
  • Weize Chen

Large Language Model Multi-Agent Systems (LLM-MAS) have greatly progressed in solving complex tasks. Agents within such a system communicate with one another to collaboratively solve tasks, under the premise of shared information. However, when agents' collaborations are leveraged to perform multi-person tasks, a new challenge arises due to information asymmetry, since each agent can only access the information of its own human user. Previous MAS struggle to complete tasks under this condition. To address this, we propose a new MAS paradigm termed iAgents, which denotes Informative Multi-Agent Systems. In iAgents, the human social network is mirrored in the agent network, where agents proactively exchange human information necessary for task resolution, thereby overcoming information asymmetry. iAgents employs a novel agent reasoning mechanism, InfoNav, to navigate agents' communication towards effective information exchange. Together with InfoNav, iAgents organizes human information in a mixed memory to provide agents with accurate and comprehensive information for exchange. Additionally, we introduce InformativeBench, the first benchmark tailored for evaluating LLM agents' task-solving ability under information asymmetry. Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate over 30 turns, and retrieve information from nearly 70,000 messages to complete tasks within 3 minutes.

AAAI Conference 2024 Conference Paper

Decoupling Representation and Knowledge for Few-Shot Intent Classification and Slot Filling

  • Jie Han
  • Yixiong Zou
  • Haozhao Wang
  • Jun Wang
  • Wei Liu
  • Yao Wu
  • Tao Zhang
  • Ruixuan Li

Few-shot intent classification and slot filling are important but challenging tasks due to the scarcity of finely labeled data. Therefore, current works first train a model on source domains with sufficiently labeled data, and then transfer the model to target domains where only rarely labeled data is available. However, experience transferring as a whole usually suffers from gaps that exist among source domains and target domains. For instance, transferring domain-specific-knowledge-related experience is difficult. To tackle this problem, we propose a new method that explicitly decouples the transferring of general-semantic-representation-related experience and the domain-specific-knowledge-related experience. Specifically, for domain-specific-knowledge-related experience, we design two modules to capture intent-slot relation and slot-slot relation respectively. Extensive experiments on Snips and FewJoint datasets show that our method achieves state-of-the-art performance. The method improves the joint accuracy metric from 27.72% to 42.20% in the 1-shot setting, and from 46.54% to 60.79% in the 5-shot setting.

AAAI Conference 2024 Conference Paper

DreamIdentity: Enhanced Editability for Efficient Face-Identity Preserved Image Generation

  • Zhuowei Chen
  • Shancheng Fang
  • Wei Liu
  • Qian He
  • Mengqi Huang
  • Zhendong Mao

While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centric images, an intractable problem is how to preserve the face identity and follow the text prompts simultaneously for conditioned input face images and texts. Despite existing encoder-based methods achieving high efficiency and decent face similarity, the generated image often fails to follow the textual prompts. To ease this editability issue, we present DreamIdentity to learn edit-friendly and accurate face-identity representations in the word embedding space. Specifically, we propose self-augmented editability learning to enhance the editability for projected embedding, which is achieved by constructing paired generated celebrity faces and edited celebrity images for training, aiming at transferring the mature editability of off-the-shelf text-to-image models on celebrities to unseen identities. Furthermore, we design a novel dedicated face-identity encoder to learn an accurate representation of human faces, which applies multi-scale ID-aware features followed by a multi-embedding projector to generate the pseudo words in the text embedding space directly. Extensive experiments show that our method can generate more text-coherent and ID-preserved images with negligible time overhead compared to the standard text-to-image generation process.

NeurIPS Conference 2024 Conference Paper

Empowering and Assessing the Utility of Large Language Models in Crop Science

  • Hang Zhang
  • Jiawei Sun
  • Renqi Chen
  • Wei Liu
  • Zhonghang Yuan
  • Xinzhe Zheng
  • Zhefan Wang
  • Zhiyuan Yang

Large language models (LLMs) have demonstrated remarkable efficacy across knowledge-intensive tasks. Nevertheless, their untapped potential in crop science presents an opportunity for advancement. To narrow this gap, we introduce CROP, which includes a novel instruction tuning dataset specifically designed to enhance LLMs’ professional capabilities in the crop science sector, along with a benchmark that serves as a comprehensive evaluation of LLMs’ understanding of the domain knowledge. The CROP dataset is curated through a task-oriented and LLM-human integrated pipeline, comprising 210,038 single-turn and 1,871 multi-turn dialogues related to crop science scenarios. The CROP benchmark includes 5,045 multiple-choice questions covering three difficulty levels. Our experiments based on the CROP benchmark demonstrate notable enhancements in crop science-related tasks when LLMs are fine-tuned with the CROP dataset. To the best of our knowledge, the CROP dataset is the first-ever instruction tuning dataset in the crop science domain. We anticipate that CROP will accelerate the adoption of LLMs in the domain of crop science, ultimately contributing to global food production.

NeurIPS Conference 2024 Conference Paper

Is the MMI Criterion Necessary for Interpretability? Degenerating Non-causal Features to Plain Noise for Self-Rationalization

  • Wei Liu
  • Zhiying Deng
  • Zhongyu Niu
  • Jun Wang
  • Haozhao Wang
  • YuanKai Zhang
  • Ruixuan Li

An important line of research in the field of explainability is to extract a small subset of crucial rationales from the full input. The most widely used criterion for rationale extraction is the maximum mutual information (MMI) criterion. However, in certain datasets, there are spurious features that are non-causally correlated with the label yet attain high mutual information, complicating the loss landscape of MMI. Although some penalty-based methods have been developed to penalize the spurious features (e.g., invariance penalty, intervention penalty, etc.) to help MMI work better, these are merely remedial measures. In the optimization objectives of these methods, spurious features are still distinguished from plain noise, which hinders the discovery of causal rationales. This paper aims to develop a new criterion that treats spurious features as plain noise, allowing the model to work on datasets rich in spurious features as if it were working on clean datasets, thereby making rationale extraction easier. We theoretically observe that removing either plain noise or spurious features from the input does not alter the conditional distribution of the remaining components relative to the task label. However, significant changes in the conditional distribution occur only when causal features are eliminated. Based on this discovery, the paper proposes a criterion for Maximizing the Remaining Discrepancy (MRD). Experiments on six widely used datasets show that our MRD criterion improves rationale quality (measured by the overlap with human-annotated rationales) by up to 10.4% as compared to several recent competitive MMI variants. Code: https://github.com/jugechengzi/Rationalization-MRD.

AAAI Conference 2024 Conference Paper

MathAttack: Attacking Large Language Models towards Math Solving Ability

  • Zihao Zhou
  • Qiufeng Wang
  • Mingyu Jin
  • Jie Yao
  • Jianan Ye
  • Wei Liu
  • Wei Wang
  • Xiaowei Huang

With the boom of Large Language Models (LLMs), the research of solving Math Word Problems (MWPs) has recently made great progress. However, there are few studies examining the robustness of LLMs' math solving ability. Instead of attacking prompts in the use of LLMs, we propose a MathAttack model to attack MWP samples, which is closer to the essence of robustness in solving math problems. Compared to traditional text adversarial attacks, it is essential to preserve the mathematical logic of the original MWPs during the attack. To this end, we propose logical entity recognition to identify logical entities, which are then frozen. Subsequently, the remaining text is attacked by adopting a word-level attacker. Furthermore, we propose a new dataset RobustMath to evaluate the robustness of LLMs in math solving ability. Extensive experiments on our RobustMath and two other math benchmark datasets, GSM8K and MultiArith, show that MathAttack can effectively attack the math solving ability of LLMs. In the experiments, we observe that (1) Our adversarial samples from higher-accuracy LLMs are also effective for attacking LLMs with lower accuracy (e.g., transfer from larger to smaller-size LLMs, or from few-shot to zero-shot prompts); (2) Complex MWPs (such as more solving steps, longer text, more numbers) are more vulnerable to attack; (3) We can improve the robustness of LLMs by using our adversarial samples in few-shot prompts. Finally, we hope our practice and observations can serve as an important attempt towards enhancing the robustness of LLMs in math solving ability. The code and dataset are available at: https://github.com/zhouzihao501/MathAttack.
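The freeze-then-attack idea can be sketched minimally: mask the logical entities so a word-level attacker cannot touch them, perturb the remaining text, then restore the frozen values. Here only numeric tokens are frozen; the paper's logical entity recognition is presumably broader, and the placeholder format is an assumption.

```python
import re

def freeze_numbers(text):
    """Replace each number with a placeholder so a word-level attacker
    cannot perturb it; returns the masked text and the frozen values."""
    frozen = []
    def stash(m):
        frozen.append(m.group(0))
        return f"<NUM{len(frozen) - 1}>"
    return re.sub(r"\d+(?:\.\d+)?", stash, text), frozen

def thaw(text, frozen):
    """Put the frozen values back after the attack."""
    for i, v in enumerate(frozen):
        text = text.replace(f"<NUM{i}>", v)
    return text

mwp = "Tom has 3 apples and buys 12 more."
masked, vals = freeze_numbers(mwp)
# ... a word-level attacker would perturb only the unfrozen words here ...
restored = thaw(masked, vals)
```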

NeurIPS Conference 2024 Conference Paper

OPUS: Occupancy Prediction Using a Sparse Set

  • Jiabao Wang
  • Zhaojiang Liu
  • Qiang Meng
  • Liujiang Yan
  • Ke Wang
  • Jie Yang
  • Wei Liu
  • Qibin Hou

Occupancy prediction, aiming at predicting the occupancy status within a voxelized 3D environment, is quickly gaining momentum within the autonomous driving community. Mainstream occupancy prediction works first discretize the 3D environment into voxels, then perform classification on such dense grids. However, inspection of sample data reveals that the vast majority of voxels are unoccupied. Performing classification on these empty voxels leads to suboptimal allocation of computation resources, and reducing such empty voxels necessitates complex algorithm designs. To this end, we present a novel perspective on the occupancy prediction task: formulating it as a streamlined set prediction paradigm without the need for explicit space modeling or complex sparsification procedures. Our proposed framework, called OPUS, utilizes a transformer encoder-decoder architecture to simultaneously predict occupied locations and classes using a set of learnable queries. Firstly, we employ the Chamfer distance loss to scale the set-to-set comparison problem to unprecedented magnitudes, making it possible to train such a model end-to-end. Subsequently, semantic classes are adaptively assigned using nearest neighbor search based on the learned locations. In addition, OPUS incorporates a suite of non-trivial strategies to enhance model performance, including coarse-to-fine learning, consistent point sampling, and adaptive re-weighting. Finally, compared with current state-of-the-art methods, our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at nearly 2× the FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
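The Chamfer distance that makes the set-to-set comparison tractable matches each point to its nearest neighbor in the other set, so no explicit one-to-one assignment ever has to be solved. A minimal sketch of the symmetric squared-L2 form (a generic illustration, not the paper's implementation):

```python
import numpy as np

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between two point sets (squared-L2 form)."""
    d2 = ((pred[:, None, :] - gt[None, :, :]) ** 2).sum(-1)  # (N, M) pairwise
    # Nearest ground-truth point for every prediction, and vice versa.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
gt   = np.array([[0.0, 0.0, 0.0], [1.0, 0.1, 0.0]])
loss = chamfer_distance(pred, gt)
```

Because the loss is built from nearest-neighbor minima rather than a bipartite matching, it scales to very large query sets, which is what enables the end-to-end training the abstract describes.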

JBHI Journal 2024 Journal Article

scDMAE: A Generative Denoising Model Adopted Mask Strategy for scRNA-Seq Data Recovery

  • Wei Liu
  • Youze Pan
  • Zhijie Teng
  • Junlin Xu

The advent of single-cell RNA sequencing (scRNA-seq) technology has revolutionized gene expression studies at the single-cell level. However, the presence of technical noise and data sparsity in scRNA-seq often undermines the accuracy of subsequent analyses. Existing methods for denoising and imputing scRNA-seq data often rely on stringent assumptions about data distribution, limiting the effectiveness of data recovery. In this study, we propose the scDMAE model for denoising and recovery of scRNA-seq data. First, the model fuses gene expression features and topological features to discern the primary expression patterns of genes in cells. Then, an autoencoder with a masking strategy is used to model dropout events and separate potential noise in the data. Finally, the model incorporates the original raw data to recover the true biological expression value. By conducting experiments on various types of scRNA-Seq datasets, scDMAE demonstrates superior performance compared to other comparative methods based on six distinct evaluation metrics in downstream analysis. The scDMAE method can accurately cluster similar cell populations, identify differential genes and infer cell trajectories.

AAAI Conference 2024 Conference Paper

SeqGPT: An Out-of-the-Box Large Language Model for Open Domain Sequence Understanding

  • Tianyu Yu
  • Chengyue Jiang
  • Chao Lou
  • Shen Huang
  • Xiaobin Wang
  • Wei Liu
  • Jiong Cai
  • Yangning Li

Large language models (LLMs) have shown impressive abilities for open-domain NLP tasks. However, LLMs are sometimes too footloose for natural language understanding (NLU) tasks which always have restricted output and input format. Their performances on NLU tasks are highly related to prompts or demonstrations and are shown to be poor at performing several representative NLU tasks, such as event extraction and entity typing. To this end, we present SeqGPT, a bilingual (i.e., English and Chinese) open-source autoregressive model specially enhanced for open-domain natural language understanding. We express all NLU tasks with two atomic tasks, which define fixed instructions to restrict the input and output format but still ``open'' for arbitrarily varied label sets. The model is first instruction-tuned with extremely fine-grained labeled data synthesized by ChatGPT and then further fine-tuned by 233 different atomic tasks from 152 datasets across various domains. The experimental results show that SeqGPT has decent classification and extraction ability, and is capable of performing language understanding tasks on unseen domains. We also conduct empirical studies on the scaling of data and model size as well as on the transfer across tasks. Our models are accessible at https://github.com/Alibaba-NLP/SeqGPT.

AAAI Conference 2024 Conference Paper

SoftCLIP: Softer Cross-Modal Alignment Makes CLIP Stronger

  • Yuting Gao
  • Jinfeng Liu
  • Zihan Xu
  • Tong Wu
  • Enwei Zhang
  • Ke Li
  • Jie Yang
  • Wei Liu

During the preceding biennium, vision-language pre-training has achieved noteworthy success on several downstream tasks. Nevertheless, acquiring high-quality image-text pairs, where the pairs are entirely exclusive of each other, remains a challenging task, and noise exists in the commonly used datasets. To address this issue, we propose SoftCLIP, a novel approach that relaxes the strict one-to-one constraint and achieves a soft cross-modal alignment by introducing a softened target, which is generated from the fine-grained intra-modal self-similarity. The intra-modal guidance is indicative, enabling two pairs to share some local similarities and modeling many-to-many relationships between the two modalities. Besides, since the positive still dominates in the softened target distribution, we disentangle the negatives in the distribution to further boost the relation alignment with the negatives in cross-modal learning. Extensive experiments demonstrate the effectiveness of SoftCLIP. In particular, on the ImageNet zero-shot classification task, using CC3M/CC12M as the pre-training dataset, SoftCLIP brings a top-1 accuracy improvement of 6.8%/7.2% over the CLIP baseline.

NeurIPS Conference 2024 Conference Paper

United We Stand, Divided We Fall: Fingerprinting Deep Neural Networks via Adversarial Trajectories

  • Tianlong Xu
  • Chen Wang
  • Gaoyang Liu
  • Yang Yang
  • Kai Peng
  • Wei Liu

In recent years, deep neural networks (DNNs) have witnessed extensive applications, and protecting their intellectual property (IP) is thus crucial. As a non-invasive way for model IP protection, model fingerprinting has become popular. However, existing single-point based fingerprinting methods are highly sensitive to changes in the decision boundary, and may suffer from misjudging the resemblance of sparse fingerprints, yielding high false positives on innocent models. In this paper, we propose ADV-TRA, a more robust fingerprinting scheme that utilizes adversarial trajectories to verify the ownership of DNN models. Benefiting from its intrinsically progressive adversarial levels, the trajectory is capable of tolerating a greater degree of alteration in decision boundaries. We further design novel schemes to generate a surface trajectory that involves a series of fixed-length trajectories with dynamically adjusted step sizes. Such a design enables more unique and reliable fingerprinting with relatively low querying costs. Experiments on three datasets against four types of removal attacks show that ADV-TRA exhibits superior performance in distinguishing between infringing and innocent models, outperforming the state-of-the-art comparisons.

NeurIPS Conference 2024 Conference Paper

Unlearnable 3D Point Clouds: Class-wise Transformation Is All You Need

  • Xianlong Wang
  • Minghui Li
  • Wei Liu
  • Hangtao Zhang
  • Shengshan Hu
  • Yechao Zhang
  • Ziqi Zhou
  • Hai Jin

Traditional unlearnable strategies have been proposed to prevent unauthorized users from training on 2D image data. With more 3D point cloud data containing sensitive information, unauthorized usage of this new type of data has also become a serious concern. To address this, we propose the first integral unlearnable framework for 3D point clouds, comprising two processes: (i) an unlearnable data protection scheme, involving a class-wise setting established by a category-adaptive allocation strategy and multi-transformations assigned to samples; (ii) a data restoration scheme that utilizes class-wise inverse matrix transformation, thus enabling authorized-only training on unlearnable data. This restoration process addresses a practical issue overlooked in most existing unlearnable literature, i.e., that even authorized users struggle to gain knowledge from 3D unlearnable data. Both theoretical and empirical results (including 6 datasets, 16 models, and 2 tasks) demonstrate the effectiveness of our proposed unlearnable framework. Our code is available at https://github.com/CGCL-codes/UnlearnablePC.

AAAI Conference 2023 Conference Paper

Adjective Scale Probe: Can Language Models Encode Formal Semantics Information?

  • Wei Liu
  • Ming Xiang
  • Nai Ding

It is an open question what semantic representations transformer-based language models can encode and whether they have access to more abstract aspects of semantic meaning. Here, we propose a diagnostic dataset to investigate how well language models understand the degree semantics of adjectives. In the dataset, referred to as the Adjective Scale Probe (ASP), we semi-automatically generate 8 tests of Natural Language Inference (NLI) questions to test 8 key capabilities of adjective interpretation. We apply the ASP dataset to evaluate the performance of 3 language models, i.e., BERT, DeBERTa, and T0. It is found that language models perform below the majority baseline for most tests of the ASP, even when the models have been fine-tuned to achieve high performance on the large-scale MNLI dataset. But after we fine-tune the pre-trained models on a subset of the ASP, DeBERTa can achieve high performance on untrained adjectives and untrained tests, suggesting that DeBERTa may have captured degree semantic information of adjectives through pre-training but needs specific training data to learn how to apply such information to the current tasks. In sum, the ASP provides an easy-to-use method to test fine-grained formal semantic properties of adjectives, and reveals language models' abilities to access formal semantic information.

JBHI Journal 2023 Journal Article

An Enhanced EEG Microstate Recognition Framework Based on Deep Neural Networks: An Application to Parkinson's Disease

  • Chunguang Chu
  • Zhen Zhang
  • Zhenxi Song
  • Zifan Xu
  • Jiang Wang
  • Fei Wang
  • Wei Liu
  • Liying Lu

Variations in brain activity patterns reveal impairments of motor and cognitive functions in the human brain. Electroencephalogram (EEG) microstates embody brain activity patterns at a microscopic time scale. However, current microstate analysis methods can recognize less than 90% of EEG signals per subject, which severely limits the characterization of dynamic brain activity. As an application to early Parkinson's disease (PD), we propose an enhanced EEG microstate recognition framework based on deep neural networks, which yields recognition rates from 90% to 99%, accompanied by a strong anti-artifact property. Additionally, gradient-weighted class activation mapping, as a visualization technique, is employed to locate the activated functional brain regions of each microstate class. We find that each microstate class corresponds to a particular activated brain region. Finally, based on the improved identification of microstate sequences, we explore the EEG microstate characteristics and their clinical associations. We show that the decreased occurrence of a particular microstate class reflects the degree of cognitive decline in early PD, and that reduced transitions between certain microstates suggest injury in motor-related brain regions. The novel EEG microstate recognition framework paves the way to revealing more effective biomarkers for early PD.

JMLR Journal 2023 Journal Article

An Inexact Augmented Lagrangian Algorithm for Training Leaky ReLU Neural Network with Group Sparsity

  • Wei Liu
  • Xin Liu
  • Xiaojun Chen

The leaky ReLU network with a group sparse regularization term has been widely used in recent years. However, training such a network yields a nonsmooth nonconvex optimization problem, and approaches that deterministically compute a stationary point have been lacking. In this paper, we first resolve the multi-layer composite term in the original optimization problem by introducing auxiliary variables and additional constraints. We show that the new model has a nonempty and bounded solution set and that its feasible set satisfies the Mangasarian-Fromovitz constraint qualification. Moreover, we show the relationship between the new model and the original problem. Remarkably, we propose an inexact augmented Lagrangian algorithm for solving the new model, and show the convergence of the algorithm to a KKT point. Numerical experiments demonstrate that our algorithm is more efficient for training sparse leaky ReLU neural networks than some well-known algorithms.

AAAI Conference 2023 Conference Paper

CFFT-GAN: Cross-Domain Feature Fusion Transformer for Exemplar-Based Image Translation

  • Tianxiang Ma
  • Bingchuan Li
  • Wei Liu
  • Miao Hua
  • Jing Dong
  • Tieniu Tan

Exemplar-based image translation refers to the task of generating images with a desired style while conditioning on a certain input image. Most current methods learn the correspondence between two input domains but neglect the mining of information within each domain. In this paper, we propose a more general learning approach that considers the two domains' features as a whole and learns both inter-domain correspondence and intra-domain potential information interactions. Specifically, we propose a Cross-domain Feature Fusion Transformer (CFFT) to learn inter- and intra-domain feature fusion. Based on CFFT, the proposed CFFT-GAN works well on exemplar-based image translation. Moreover, CFFT-GAN is able to decouple and fuse features from multiple domains by cascading CFFT modules. We conduct rich quantitative and qualitative experiments on several image translation tasks, and the results demonstrate the superiority of our approach compared to state-of-the-art methods. Ablation studies show the importance of our proposed CFFT. Application experimental results reflect the potential of our method.

NeurIPS Conference 2023 Conference Paper

D-Separation for Causal Self-Explanation

  • Wei Liu
  • Jun Wang
  • Haozhao Wang
  • Ruixuan Li
  • Zhiying Deng
  • YuanKai Zhang
  • Yang Qiu

Rationalization aims to strengthen the interpretability of NLP models by extracting a subset of human-intelligible pieces of their input texts. Conventional works generally employ the maximum mutual information (MMI) criterion to find the rationale that is most indicative of the target label. However, this criterion can be influenced by spurious features that correlate with the causal rationale or the target label. Instead of attempting to rectify the issues of the MMI criterion, we propose a novel criterion to uncover the causal rationale, termed the Minimum Conditional Dependence (MCD) criterion, which is grounded on our finding that the non-causal features and the target label are \emph{d-separated} by the causal rationale. By minimizing the dependence between the non-selected parts of the input and the target label conditioned on the selected rationale candidate, all the causes of the label are compelled to be selected. In this study, we employ a simple and practical measure for dependence, specifically the KL-divergence, to validate our proposed MCD criterion. Empirically, we demonstrate that MCD improves the F1 score by up to 13.7% compared to previous state-of-the-art MMI-based methods. Our code is in an anonymous repository: https://anonymous.4open.science/r/MCD-CE88.

AAAI Conference 2023 Conference Paper

DrugOOD: Out-of-Distribution Dataset Curator and Benchmark for AI-Aided Drug Discovery – a Focus on Affinity Prediction Problems with Noise Annotations

  • Yuanfeng Ji
  • Lu Zhang
  • Jiaxiang Wu
  • Bingzhe Wu
  • Lanqing Li
  • Long-Kai Huang
  • Tingyang Xu
  • Yu Rong

AI-aided drug discovery (AIDD) is gaining popularity due to its potential to make the search for new pharmaceuticals faster, less expensive, and more effective. Despite its extensive use in numerous fields (e.g., ADMET prediction, virtual screening), little research has been conducted on the out-of-distribution (OOD) learning problem with noise. We present DrugOOD, a systematic OOD dataset curator and benchmark for AIDD. Particularly, we focus on the drug-target binding affinity prediction problem, which involves both macromolecule (protein target) and small-molecule (drug compound). DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise level annotations, and rigorous benchmarking of SOTA OOD algorithms, as opposed to only providing fixed datasets. Since the molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for graph OOD learning problems. Extensive empirical studies have revealed a significant performance gap between in-distribution and out-of-distribution experiments, emphasizing the need for the development of more effective schemes that permit OOD generalization under noise for AIDD.

NeurIPS Conference 2023 Conference Paper

Evaluating Post-hoc Explanations for Graph Neural Networks via Robustness Analysis

  • Junfeng Fang
  • Wei Liu
  • Yuan Gao
  • Zemin Liu
  • An Zhang
  • Xiang Wang
  • Xiangnan He

This work studies the evaluation of explaining graph neural networks (GNNs), which is crucial to the credibility of post-hoc explainability in practical usage. Conventional evaluation metrics, and even explanation methods -- which mainly follow the paradigm of feeding the explanatory subgraph and measuring output difference -- always suffer from the notorious out-of-distribution (OOD) issue. In this work, we endeavor to confront the issue by introducing a novel evaluation metric, termed OOD-resistant Adversarial Robustness (OAR). Specifically, we draw inspiration from the notion of adversarial robustness and evaluate post-hoc explanation subgraphs by calculating their robustness under attack. On top of that, an elaborate OOD reweighting block is inserted into the pipeline to confine the evaluation process to the original data distribution. For applications involving large datasets, we further devise a Simplified version of OAR (SimOAR), which achieves a significant improvement in computational efficiency at the cost of a small amount of performance. Extensive empirical studies validate the effectiveness of our OAR and SimOAR.

NeurIPS Conference 2023 Conference Paper

Exploiting Contextual Objects and Relations for 3D Visual Grounding

  • Li Yang
  • Chunfeng Yuan
  • Ziqi Zhang
  • Zhongang Qi
  • Yan Xu
  • Wei Liu
  • Ying Shan
  • Bing Li

3D visual grounding, the task of identifying visual objects in 3D scenes based on natural language inputs, plays a critical role in enabling machines to understand and engage with the real-world environment. However, this task is challenging due to the necessity to capture 3D contextual information to distinguish target objects from complex 3D scenes. The absence of annotations for contextual objects and relations further exacerbates the difficulties. In this paper, we propose a novel model, CORE-3DVG, to address these challenges by explicitly learning about contextual objects and relations. Our method accomplishes 3D visual grounding via three sequential modular networks, including a text-guided object detection network, a relation matching network, and a target identification network. During training, we introduce a pseudo-label self-generation strategy and a weakly-supervised method to facilitate the learning of contextual objects and relations, respectively. The proposed techniques allow the networks to focus more effectively on referred objects within 3D scenes by understanding their context better. We validate our model on the challenging Nr3D, Sr3D, and ScanRefer datasets and demonstrate state-of-the-art performance. Our code will be public at https://github.com/yangli18/CORE-3DVG.

AAAI Conference 2023 Short Paper

Fraud’s Bargain Attacks to Textual Classifiers via Metropolis-Hasting Sampling (Student Abstract)

  • Mingze Ni
  • Zhensu Sun
  • Wei Liu

Recent studies on adversarial examples expose vulnerabilities of natural language processing (NLP) models. Existing techniques for generating adversarial examples are typically driven by deterministic heuristic rules that are agnostic to the optimal adversarial examples, a strategy that often results in attack failures. To this end, this research proposes the Fraud's Bargain Attack (FBA), which utilizes a novel randomization mechanism to enlarge the search space and enables high-quality adversarial examples to be generated with high probability. FBA applies the Metropolis-Hastings algorithm to enhance the selection of adversarial examples from all candidates proposed by a customized Word Manipulation Process (WMP). WMP perturbs one word at a time via insertion, removal, or substitution in a context-aware manner. Extensive experiments demonstrate that FBA outperforms the baselines in terms of attack success rate and imperceptibility.

IJCAI Conference 2023 Conference Paper

HDFormer: High-order Directed Transformer for 3D Human Pose Estimation

  • Hanyuan Chen
  • Jun-Yan He
  • Wangmeng Xiang
  • Zhi-Qi Cheng
  • Wei Liu
  • Hanbing Liu
  • Bin Luo
  • Yifeng Geng

Human pose estimation is a challenging task due to its structured data sequence nature. Existing methods primarily focus on pair-wise interaction of body joints, which is insufficient for scenarios involving overlapping joints and rapidly changing poses. To overcome these issues, we introduce a novel approach, the High-order Directed Transformer (HDFormer), which leverages high-order bone and joint relationships for improved pose estimation. Specifically, HDFormer incorporates both self-attention and high-order attention to formulate a multi-order attention module. This module facilitates first-order "joint-joint", second-order "bone-joint", and high-order "hyperbone-joint" interactions, effectively addressing issues in complex and occlusion-heavy situations. In addition, modern CNN techniques are integrated into the transformer-based architecture, balancing the trade-off between performance and efficiency. HDFormer significantly outperforms state-of-the-art (SOTA) models on the Human3.6M and MPI-INF-3DHP datasets, requiring only 1/10 of the parameters and significantly lower computational costs. Moreover, HDFormer demonstrates broad real-world applicability, enabling real-time, accurate 3D pose estimation. The source code is in https://github.com/hyer/HDFormer.

AAAI Conference 2023 Conference Paper

PointCA: Evaluating the Robustness of 3D Point Cloud Completion Models against Adversarial Examples

  • Shengshan Hu
  • Junwei Zhang
  • Wei Liu
  • Junhui Hou
  • Minghui Li
  • Leo Yu Zhang
  • Hai Jin
  • Lichao Sun

Point cloud completion, as the upstream procedure of 3D recognition and segmentation, has become an essential part of many tasks such as navigation and scene understanding. While various point cloud completion models have demonstrated their powerful capabilities, their robustness against adversarial attacks, which have been proven to be fatally malicious towards deep neural networks, remains unknown. In addition, existing attack approaches towards point cloud classifiers cannot be applied to the completion models due to different output forms and attack purposes. In order to evaluate the robustness of the completion models, we propose PointCA, the first adversarial attack against 3D point cloud completion models. PointCA can generate adversarial point clouds that maintain high similarity with the original ones, while being completed as another object with totally different semantic information. Specifically, we minimize the representation discrepancy between the adversarial example and the target point set to jointly explore the adversarial point clouds in the geometry space and the feature space. Furthermore, to launch a stealthier attack, we innovatively employ the neighbourhood density information to tailor the perturbation constraint, leading to geometry-aware and distribution-adaptive modifications for each point. Extensive experiments against different premier point cloud completion networks show that PointCA can cause the performance degradation from 77.9% to 16.7%, with the structure chamfer distance kept below 0.01. We conclude that existing completion models are severely vulnerable to adversarial examples, and state-of-the-art defenses for point cloud classification will be partially invalid when applied to incomplete and uneven point cloud data.

JBHI Journal 2023 Journal Article

Predicting CircRNA-Disease Associations via Feature Convolution Learning With Heterogeneous Graph Attention Network

  • Li Peng
  • Cheng Yang
  • Yifan Chen
  • Wei Liu

Exploring the relationship between circular RNA (circRNA) and disease is beneficial for revealing the mechanisms of disease pathogenesis. However, a blind search for all possible associations between circRNAs and diseases through biological experiments is time-consuming. Although some prediction methods have been proposed, they still have limitations. In this study, a novel computational framework, called GATCL2CD, is proposed to forecast unknown circRNA-disease associations (CDAs). First, we calculate Gaussian interactive profile kernel (GIP) similarity and semantic similarity for diseases, circRNA sequence similarity and function similarity, and GIPs for circRNAs. Then, we combine them to construct a heterogeneous graph. Thereafter, GATCL2CD proposes a feature convolution learning framework that uses a multi-head dynamic attention mechanism to obtain different aggregated representations of features corresponding to the nodes in the heterogeneous graph. It then extracts rich higher-order features from the stacked feature representations of each node using a single-layer convolutional neural network with filter kernels of different sizes. Finally, a pairwise element-wise product operation is implemented to capture the interactions of higher-order feature representations, and a multilayer perceptron neural network is introduced as an efficient classifier for inferring potential CDAs. Major experimental results under 5-fold cross-validation (5-fold CV) on three different datasets show that GATCL2CD is superior to five other state-of-the-art methods. Furthermore, case studies demonstrate the suitability of GATCL2CD as a useful tool for identifying potential disease-related circRNAs.

NeurIPS Conference 2023 Conference Paper

Punctuation-level Attack: Single-shot and Single Punctuation Can Fool Text Models

  • Wenqiang Wang
  • Chongyang Du
  • Tao Wang
  • Kaihao Zhang
  • Wenhan Luo
  • Lin Ma
  • Wei Liu
  • Xiaochun Cao

Adversarial attacks have attracted increasing attention in various fields, including natural language processing. Current textual attacking models primarily focus on fooling models by adding character-/word-/sentence-level perturbations, ignoring their influence on human perception. In this paper, for the first time in the community, we propose a novel mode of textual attack, the punctuation-level attack. With various types of perturbations, including insertion, displacement, deletion, and replacement, the punctuation-level attack achieves promising fooling rates against SOTA models on typical textual tasks while maintaining minimal influence on human perception and understanding of the text, by perturbing merely a single punctuation mark in a single shot. Furthermore, we propose a search method named Text Position Punctuation Embedding and Paraphrase (TPPEP) to accelerate the pursuit of the optimal position to deploy the attack, without exhaustive search, and we present a mathematical interpretation of TPPEP. Thanks to the integrated Text Position Punctuation Embedding (TPPE), the punctuation attack can be applied at a constant cost of time. Experimental results on public datasets and SOTA models demonstrate the effectiveness of the punctuation attack and the proposed TPPE. We additionally apply the single punctuation attack to summarization, semantic-similarity-scoring, and text-to-image tasks, and achieve encouraging results.

AAAI Conference 2023 Conference Paper

ReGANIE: Rectifying GAN Inversion Errors for Accurate Real Image Editing

  • Bingchuan Li
  • Tianxiang Ma
  • Peng Zhang
  • Miao Hua
  • Wei Liu
  • Qian He
  • Zili Yi

The StyleGAN family succeeds in high-fidelity image generation and allows for flexible and plausible editing of generated images by manipulating the semantic-rich latent style space. However, projecting a real image into its latent space encounters an inherent trade-off between inversion quality and editability. Existing encoder-based or optimization-based StyleGAN inversion methods attempt to mitigate the trade-off but see limited performance. To fundamentally resolve this problem, we propose a novel two-phase framework that designates two separate networks to tackle editing and reconstruction respectively, instead of balancing the two. Specifically, in Phase I, a W-space-oriented StyleGAN inversion network is trained and used to perform image inversion and editing, which assures editability but sacrifices reconstruction quality. In Phase II, a carefully designed rectifying network is utilized to rectify the inversion errors and perform ideal reconstruction. Experimental results show that our approach yields near-perfect reconstructions without sacrificing editability, thus allowing accurate manipulation of real images. Further, we evaluate the performance of our rectifying network and observe great generalizability towards unseen manipulation types and out-of-domain images.

AAAI Conference 2023 Conference Paper

Safe Multi-View Deep Classification

  • Wei Liu
  • Yufei Chen
  • Xiaodong Yue
  • Changqing Zhang
  • Shaorong Xie

Multi-view deep classification is expected to obtain better classification performance than using a single view. However, due to the uncertainty and inconsistency of data sources, adding data views does not necessarily lead to performance improvements in multi-view classification. How to avoid worsening classification performance when adding views is crucial for multi-view deep learning but rarely studied. To tackle this limitation, in this paper, we reformulate the multi-view classification problem from the perspective of safe learning and thereby propose a Safe Multi-view Deep Classification (SMDC) method, which can guarantee that the classification performance does not deteriorate when fusing multiple views. In the SMDC method, we dynamically integrate multiple views and estimate the inherent uncertainties among them, with different root causes, based on evidence theory. By minimizing these uncertainties, SMDC promotes the evidence from data views for correct classification and meanwhile excludes incorrect evidence to produce safe multi-view classification results. Furthermore, we theoretically prove that in safe multi-view classification, adding data views will certainly not increase the empirical risk of classification. Experiments on various kinds of multi-view datasets validate that the proposed SMDC method can achieve precise and safe classification results.

AAAI Conference 2023 Short Paper

Summarization Attack via Paraphrasing (Student Abstract)

  • Jiyao Li
  • Wei Liu

Many natural language processing models are perceived to be fragile under adversarial attacks. Recent work on adversarial attacks has demonstrated a high success rate against sentiment analysis and classification models. However, attacks on summarization models have not been well studied. Summarization tasks are rarely influenced by word substitution, since advanced abstractive summarization models utilize sentence-level information. In this paper, we propose a paraphrasing-based attack method against summarization models. We first rank the sentences in the document according to their impact on summarization. Then, we apply a paraphrasing procedure to generate adversarial samples. Finally, we test our algorithm on benchmark datasets against other methods. Our approach achieved the highest success rate and the lowest sentence substitution rate. In addition, the adversarial samples have high semantic similarity with the original sentences.

AAAI Conference 2023 Conference Paper

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

  • Stan Weixian Lei
  • Difei Gao
  • Jay Zhangjie Wu
  • Yuxuan Wang
  • Wei Liu
  • Mengmi Zhang
  • Mike Zheng Shou

VQA is an ambitious task aiming to answer any image-related question. However, in reality, it is hard to build such a system once and for all, since the needs of users are continuously updated and the system has to implement new functions. Thus, Continual Learning (CL) ability is a must in developing advanced VQA systems. Recently, a pioneering work split a VQA dataset into disjoint answer sets to study this topic. However, CL on VQA involves not only the expansion of label sets (new Answer sets). It is crucial to study how to answer questions when deploying VQA systems to new environments (new Visual scenes) and how to answer questions requiring new functions (new Question types). Thus, we propose CLOVE, a benchmark for Continual Learning On Visual quEstion answering, which contains scene- and function-incremental settings for the two aforementioned CL scenarios. In terms of methodology, the main difference between CL on VQA and classification is that the former additionally involves expanding and preventing forgetting of reasoning mechanisms, while the latter focuses on class representation. Thus, we propose a real-data-free replay-based method tailored for CL on VQA, named Scene Graph as Prompt for Symbolic Replay. Using a piece of scene graph as a prompt, it replays pseudo scene graphs to represent past images, along with correlated QA pairs. A unified VQA model is also proposed to utilize the current and replayed data to enhance its QA ability. Finally, experimental results reveal challenges in CLOVE and demonstrate the effectiveness of our method. Code and data are available at https://github.com/showlab/CLVQA.

AAAI Conference 2023 Conference Paper

Towards In-Distribution Compatible Out-of-Distribution Detection

  • Boxi Wu
  • Jie Jiang
  • Haidong Ren
  • Zifan Du
  • Wenxiao Wang
  • Zhifeng Li
  • Deng Cai
  • Xiaofei He

Deep neural networks, despite their remarkable capability of discriminating targeted in-distribution samples, show poor performance on detecting anomalous out-of-distribution data. To address this defect, state-of-the-art solutions choose to train deep networks on an auxiliary dataset of outliers. Various training criteria for these auxiliary outliers have been proposed based on heuristic intuitions. However, we find that these intuitively designed outlier training criteria can hurt in-distribution learning and eventually lead to inferior performance. To this end, we identify three causes of the in-distribution incompatibility: contradictory gradient, false likelihood, and distribution shift. Based on our new understandings, we propose a new out-of-distribution detection method by adapting both the top design of deep models and the loss function. Our method achieves in-distribution compatibility by pursuing less interference with the probabilistic characteristics of in-distribution features. On several benchmarks, our method not only achieves state-of-the-art out-of-distribution detection performance but also improves in-distribution accuracy.

AAAI Conference 2023 Conference Paper

Trusted Fine-Grained Image Classification through Hierarchical Evidence Fusion

  • Zhikang Xu
  • Xiaodong Yue
  • Ying Lv
  • Wei Liu
  • Zihao Li

Fine-Grained Image Classification (FGIC) aims to classify images into specific subordinate classes of a superclass. Due to insufficient training data and confusing data samples, FGIC may produce uncertain classification results that cannot be trusted for data applications. In fact, FGIC can be viewed as a hierarchical classification process, and the multilayer information helps to reduce uncertainty and improve the reliability of FGIC. In this paper, we adopt evidence theory to measure uncertainty and confidence in the hierarchical classification process and propose a trusted FGIC method through fusing multilayer classification evidence. Compared with traditional approaches, the trusted FGIC method not only generates accurate classification results but also reduces the uncertainty of fine-grained classification. Specifically, we construct an evidence extractor at each classification layer to extract multilayer (multi-grained) evidence for image classification. To fuse the extracted multi-grained evidence from coarse to fine, we formulate evidence fusion with the Dirichlet hyper probability distribution and thereby hierarchically decompose the evidence of coarse-grained classes into fine-grained classes to enhance classification performance. Ablation experiments validate that the hierarchical evidence fusion can improve the precision and reduce the uncertainty of fine-grained classification. Comparison with state-of-the-art FGIC methods shows that our proposed method achieves competitive performance.

ICML Conference 2022 Conference Paper

Constrained Variational Policy Optimization for Safe Reinforcement Learning

  • Zuxin Liu
  • Zhepeng Cen
  • Vladislav Isenbaev
  • Wei Liu
  • Zhiwei Steven Wu
  • Bo Li 0026
  • Ding Zhao

Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before deploying them to safety-critical applications. Previous primal-dual style approaches suffer from instability issues and lack optimality guarantees. This paper overcomes the issues from the perspective of probabilistic inference. We introduce a novel Expectation-Maximization approach to naturally incorporate constraints during the policy learning: 1) a provable optimal non-parametric variational distribution could be computed in closed form after a convex optimization (E-step); 2) the policy parameter is improved within the trust region based on the optimal variational distribution (M-step). The proposed algorithm decomposes the safe RL problem into a convex optimization phase and a supervised learning phase, which yields a more stable training performance. A wide range of experiments on continuous robotic tasks shows that the proposed method achieves significantly better constraint satisfaction performance and better sample efficiency than baselines. The code is available at https://github.com/liuzuxin/cvpo-safe-rl.

NeurIPS Conference 2022 Conference Paper

Egocentric Video-Language Pretraining

  • Kevin Qinghong Lin
  • Jinpeng Wang
  • Mattia Soldan
  • Michael Wray
  • Rui Yan
  • Eric Z. XU
  • Difei Gao
  • Rong-Cheng Tu

Video-Language Pretraining (VLP), which aims to learn transferable representation to advance a wide range of video-text downstream tasks, has recently received increasing attention. Best performing works rely on large-scale, 3rd-person video-text datasets, such as HowTo100M. In this work, we exploit the recently released Ego4D dataset to pioneer Egocentric VLP along three directions. (i) We create EgoClip, a 1st-person video-text pretraining dataset comprising 3.8M clip-text pairs well-chosen from Ego4D, covering a large variety of human daily activities. (ii) We propose a novel pretraining objective, dubbed EgoNCE, which adapts video-text contrastive learning to the egocentric domain by mining egocentric-aware positive and negative samples. (iii) We introduce EgoMCQ, a development benchmark that is close to EgoClip and hence can support effective validation and fast exploration of our design decisions in EgoClip and EgoNCE. Furthermore, we demonstrate strong performance on five egocentric downstream tasks across three datasets: video-text retrieval on EPIC-KITCHENS-100; action recognition on Charades-Ego; natural language query, moment query, and object state change classification on Ego4D challenge benchmarks. The dataset and code are available at https://github.com/showlab/EgoVLP.

AAAI Conference 2022 Conference Paper

Fast and Constrained Absent Keyphrase Generation by Prompt-Based Learning

  • Huanqin Wu
  • Baijiaxin Ma
  • Wei Liu
  • Tao Chen
  • Dan Nie

Generating absent keyphrases, which do not appear in the input document, is challenging in the keyphrase prediction task. Most previous works treat the problem as an autoregressive sequence-to-sequence generation task, which demonstrates promising results for generating grammatically correct and fluent absent keyphrases. However, such an end-to-end, fully data-driven process is unconstrained and prone to generating keyphrases inconsistent with the input document. In addition, the existing autoregressive decoding method forces keyphrases to be generated from left to right, leading to slow inference. In this paper, we propose a constrained absent keyphrase generation method in a prompt-based learning fashion. Specifically, a prompt is first created based on the keywords, which are defined as the words overlapping between an absent keyphrase and the document. Then, a mask-predict decoder is used to complete the absent keyphrase under the constraint of the prompt. Experiments on keyphrase generation benchmarks have demonstrated the effectiveness of our approach. In addition, we evaluate the performance of constrained absent keyphrase generation from an information retrieval perspective. The results show that our approach can generate more consistent keyphrases, which improve document retrieval performance. What's more, with a non-autoregressive decoding manner, our model speeds up absent keyphrase generation by 8.67× compared with the autoregressive method.

NeurIPS Conference 2022 Conference Paper

FR: Folded Rationalization with a Unified Encoder

  • Wei Liu
  • Haozhao Wang
  • Jun Wang
  • Ruixuan Li
  • Chao Yue
  • YuanKai Zhang

Rationalization aims to strengthen the interpretability of NLP models by extracting a subset of human-intelligible pieces from their input texts. Conventional works generally employ a two-phase model in which a generator selects the most important pieces, followed by a predictor that makes predictions based on the selected pieces. However, such a two-phase model may incur the degeneration problem, where the predictor overfits to the noise generated by a not-yet-well-trained generator and, in turn, leads the generator to converge to a suboptimal model that tends to select senseless pieces. To tackle this challenge, we propose Folded Rationalization (FR), which folds the two phases of the rationale model into one from the perspective of text semantic extraction. The key idea of FR is to employ a unified encoder between the generator and predictor, based on which FR can facilitate a better predictor through access to valuable information blocked by the generator in the traditional two-phase model, and thus bring about a better generator. Empirically, we show that FR improves the F1 score by up to 10.3% compared to state-of-the-art methods.

TIST Journal 2022 Journal Article

Redundant Label Learning via Subspace Representation and Global Disambiguation

  • Gengyu Lyu
  • Songhe Feng
  • Wei Liu
  • Shuoyan Liu
  • Congyan Lang

Redundant Label Learning (RLL) aims at inducing a robust model from training data where each example is associated with a set of candidate labels, among which some are incorrect. Most existing approaches deal with this problem by disambiguating the candidate labels first and then inducing the predictive model from the disambiguated data. However, these approaches only focus on disambiguation within each instance's candidate label set, while the global label context tends to be ignored. Meanwhile, these approaches usually induce the objective model by directly utilizing the original feature information, which may lead to model overfitting due to high-dimensional redundant features. To tackle the above issues, we propose a novel feature SubspacE Representation and label Global disambiguatIOn (SERGIO) approach, which improves the generalization ability of the learning system from the perspective of both the feature space and the label space. Specifically, we project the original high-dimensional feature space into a low-dimensional subspace, where the projection matrix is regularized with an orthogonality constraint to make the subspace more compact. Meanwhile, we introduce a label confidence matrix and constrain it with ℓ1-norm and trace-norm regularization simultaneously, which explore global label correlations and accord with the nature of the single-label and multi-label classification problems, respectively. Extensive experiments on both single-label and multi-label RLL datasets demonstrate that our proposed method achieves competitive performance against state-of-the-art approaches.

JMLR Journal 2022 Journal Article

Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration

  • Congliang Chen
  • Li Shen
  • Fangyu Zou
  • Wei Liu

Adam is one of the most influential adaptive stochastic algorithms for training deep neural networks, yet it has been shown to be divergent even in the simple convex setting via a few simple counterexamples. Many attempts, such as decreasing the adaptive learning rate, adopting a large batch size, incorporating a temporal decorrelation technique, and seeking an analogous surrogate, have been made to promote the convergence of Adam-type algorithms. In contrast with existing approaches, we introduce an alternative easy-to-check sufficient condition, which merely depends on the base learning rate and combinations of historical second-order moments, to guarantee the global convergence of generic Adam for solving large-scale non-convex stochastic optimization. This observation, coupled with the sufficient condition, gives a much deeper interpretation of the divergence of Adam. On the other hand, in practice, mini-batch Adam and distributed Adam are widely used without any theoretical guarantee. We further analyze how the batch size or the number of nodes in a distributed system affects the convergence of Adam, showing theoretically that mini-batch and distributed Adam can be linearly accelerated by using a larger mini-batch size or a larger number of nodes. Finally, we apply generic Adam and mini-batch Adam with the sufficient condition to the counterexample and to training several neural networks on various real-world datasets. The experimental results are exactly in accord with our theoretical analysis.
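For context, the generic Adam iteration that such analyses study can be written in standard textbook notation (a sketch, not the paper's exact parameterization):

```latex
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \qquad
\theta_{t+1} = \theta_t - \frac{\alpha_t}{\sqrt{v_t} + \epsilon}\, m_t .
```

The sufficient condition discussed in the abstract constrains the base learning rate \(\alpha_t\) together with combinations of the historical second-order moments \(v_t\).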

AAAI Conference 2022 Conference Paper

Trusted Multi-View Deep Learning with Opinion Aggregation

  • Wei Liu
  • Xiaodong Yue
  • Yufei Chen
  • Thierry Denoeux

Multi-view deep learning is performed based on the deep fusion of data from multiple sources, i.e., data with multiple views. However, due to property differences and inconsistency among data sources, deep learning results based on the fusion of multi-view data may be uncertain and unreliable. It is necessary to reduce the uncertainty in data fusion and implement trusted multi-view deep learning. To address this problem, we revisit multi-view learning from the perspective of opinion aggregation and thereby devise a trusted multi-view deep learning method. Within this method, we adopt evidence theory to formulate the uncertainty of opinions as learning results from different data sources and measure the uncertainty of opinion aggregation as the multi-view learning result through evidence accumulation. We prove that accumulating the evidence from multiple data views decreases the uncertainty in multi-view deep learning and helps achieve trusted learning results. Experiments on various kinds of multi-view datasets verify the reliability and robustness of the proposed multi-view deep learning method.

IJCAI Conference 2021 Conference Paper

BESA: BERT-based Simulated Annealing for Adversarial Text Attacks

  • Xinghao Yang
  • Weifeng Liu
  • Dacheng Tao
  • Wei Liu

Modern Natural Language Processing (NLP) models are known to be immensely brittle towards text adversarial examples. Recent attack algorithms usually adopt word-level substitution strategies following a pre-computed word replacement mechanism. However, their resultant adversarial examples are still imperfect in achieving grammatical correctness and semantic similarity, largely because of their unsuitable candidate word selections and static optimization methods. In this research, we propose BESA, a BERT-based Simulated Annealing algorithm, to address these two problems. Firstly, we leverage the BERT Masked Language Model (MLM) to generate context-aware candidate words to produce fluent adversarial text and avoid grammar errors. Secondly, we employ Simulated Annealing (SA) to adaptively determine the word substitution order. SA provides sufficient word replacement options via internal simulations, with the objective of obtaining both a high attack success rate and a low word substitution rate. Besides, our algorithm is able to jump out of local optima with a controlled probability, bringing it closer to the best possible attack (i.e., the global optimum). Experiments on five popular datasets manifest the superiority of BESA compared with existing methods, including TextFooler, BAE, BERT-Attack, PWWS, and PSO.

AAAI Conference 2021 Conference Paper

Bigram and Unigram Based Text Attack via Adaptive Monotonic Heuristic Search

  • Xinghao Yang
  • Weifeng Liu
  • James Bailey
  • Dacheng Tao
  • Wei Liu

Deep neural networks (DNNs) are known to be vulnerable to adversarial images, while their robustness in text classification is rarely studied. Several lines of text attack methods have been proposed in the literature, such as character-level, word-level, and sentence-level attacks. However, it is still a challenge to minimize the number of word distortions necessary to induce misclassification while simultaneously ensuring lexical correctness, syntactic correctness, and semantic similarity. In this paper, we propose the Bigram and Unigram based Monotonic Heuristic Search (BU-MHS) method to examine the vulnerability of deep models. Our method has three major merits. Firstly, we propose to attack text documents not only at the unigram word level but also at the bigram level to avoid producing meaningless outputs. Secondly, we propose a hybrid method that replaces input words with both their synonyms and sememe candidates, which greatly enriches the potential substitutions compared to using synonyms alone. Lastly, we design a search algorithm, i.e., Monotonic Heuristic Search (MHS), to determine the priority of word replacements, aiming to reduce the modification cost of an adversarial attack. We evaluate the effectiveness of BU-MHS on the IMDB, AG's News, and Yahoo! Answers text datasets by attacking four popular DNN models. Results show that BU-MHS achieves the highest attack success rate while changing the smallest number of words compared with baselines.

NeurIPS Conference 2021 Conference Paper

Generalized and Discriminative Few-Shot Object Detection via SVD-Dictionary Enhancement

  • Aming WU
  • Suqi Zhao
  • Cheng Deng
  • Wei Liu

Few-shot object detection (FSOD) aims to detect new objects based on few annotated samples. To alleviate the impact of having few samples, enhancing the generalization and discrimination abilities of detectors on new objects plays an important role. In this paper, we explore employing Singular Value Decomposition (SVD) to boost both the generalization and discrimination abilities. Specifically, we propose a novel method, namely SVD-Dictionary enhancement, to build two separated spaces based on the sorted singular values. Concretely, the eigenvectors corresponding to larger singular values are used to build the generalization space in which localization is performed, as these eigenvectors generally suppress certain variations (e.g., the variation of styles) and contain intrinsic characteristics of objects. Meanwhile, since the eigenvectors corresponding to relatively smaller singular values may contain richer category-related information, we utilize them to build the discrimination space in which classification is performed. Dictionary learning is further leveraged to capture high-level discriminative information from the discrimination space, which is beneficial for improving detection accuracy. In the experiments, we separately verify the effectiveness of our method on the PASCAL VOC and COCO benchmarks. In particular, for the 2-shot case in VOC split1, our method significantly outperforms the baseline by 6.2%. Moreover, visualization analysis shows that our method is instrumental in FSOD.

ICRA Conference 2021 Conference Paper

Kinematic analysis of a flexible surgical instrument for robot-assisted minimally invasive surgery

  • Mei Feng
  • Zhixue Ni
  • Yili Fu
  • Xingze Jin
  • Wei Liu
  • Xiuquan Lu

Flexible surgical instruments can flexibly adjust their posture with a high degree of freedom, which makes them highly suitable for performing surgical tasks in narrow workspaces. However, redundant degrees of freedom increase their kinematic difficulty, which may cause redundant solutions, complex calculations, and low speeds. In this paper, a flexible surgical instrument is presented. The structural characteristics of this flexible instrument were explored in terms of force balance, and it was concluded that the instrument has a constant curvature during bending. Based on this, the forward and inverse kinematics were solved via the geometric and Newton iteration methods, respectively. Our experiments showed that the proposed method for solving the flexible instrument kinematics has high precision, a unique solution, and high speed; the instrument can be well controlled to perform refined operations. The proposed geometric method for solving the flexible instrument kinematics avoids calculating the Jacobian matrix, making it fast and capable of meeting the master-slave control requirement for real-time surgery. Furthermore, the proposed kinematics solution method is not limited by the mechanical structure, so it can be used for other flexible instruments owing to their constant-curvature bending.

NeurIPS Conference 2021 Conference Paper

Neural Routing by Memory

  • Kaipeng Zhang
  • Zhenqiang Li
  • Zhifeng Li
  • Wei Liu
  • Yoichi Sato

Recent Convolutional Neural Networks (CNNs) have achieved significant success by stacking multiple convolutional blocks, named procedures in this paper, to extract semantic features. However, they use the same procedure sequence for all inputs, regardless of the intermediate features. This paper proffers a simple yet effective idea of constructing parallel procedures and assigning similar intermediate features to the same specialized procedures in a divide-and-conquer fashion. It relieves each procedure's learning difficulty and thus leads to superior performance. Specifically, we propose a routing-by-memory mechanism for existing CNN architectures. In each stage of the network, we introduce parallel Procedural Units (PUs). A PU consists of a memory head and a procedure. The memory head maintains a summary of a type of features. For an intermediate feature, we search its closest memory and forward it to the corresponding procedure in both training and testing. In this way, different procedures are tailored to different features and therefore tackle them better. Networks with the proposed mechanism can be trained efficiently using a four-step training strategy. Experimental results show that our method improves VGGNet, ResNet, and EfficientNet's accuracies on Tiny ImageNet, ImageNet, and CIFAR-100 benchmarks with a negligible extra computational cost.

TIST Journal 2021 Journal Article

Quantized Adam with Error Feedback

  • Congliang Chen
  • Li Shen
  • Haozhi Huang
  • Wei Liu

In this article, we present a distributed variant of an adaptive stochastic gradient method for training deep neural networks in the parameter-server model. To reduce the communication cost among the workers and server, we incorporate two types of quantization schemes, i.e., gradient quantization and weight quantization, into the proposed distributed Adam. In addition, to reduce the bias introduced by quantization operations, we propose an error-feedback technique to compensate for the quantized gradient. Theoretically, in the stochastic nonconvex setting, we show that the distributed adaptive gradient method with gradient quantization and error feedback converges to the first-order stationary point, and that the distributed adaptive gradient method with weight quantization and error feedback converges to the point related to the quantized level under both the single-worker and multi-worker modes. Last, we apply the proposed distributed adaptive gradient methods to train deep neural networks. Experimental results demonstrate the efficacy of our methods.
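The error-feedback idea described above can be sketched in a few lines: each worker quantizes the sum of the current gradient and the carried-over quantization error, sends the quantized message, and keeps the residual for the next step, so the quantization bias is compensated over iterations. This is a generic illustration with a simple uniform quantizer, assuming nothing about the paper's actual quantization scheme; all function and variable names here are made up for the example.

```python
import numpy as np

def quantize(v, num_levels=4):
    """Uniform quantizer onto num_levels evenly spaced levels over [min(v), max(v)].
    A stand-in for whatever gradient/weight quantizer is actually used."""
    lo, hi = v.min(), v.max()
    if hi == lo:                      # constant vector: nothing to quantize
        return v.copy()
    step = (hi - lo) / (num_levels - 1)
    return lo + np.round((v - lo) / step) * step

def error_feedback_step(grad, error):
    """One communication round with error feedback.
    Returns (quantized message actually sent, residual carried to next step)."""
    corrected = grad + error          # compensate the previous quantization error
    message = quantize(corrected)     # low-precision message sent to the server
    new_error = corrected - message   # residual remembered locally
    return message, new_error
```

By construction, the message plus the new residual equals the error-corrected gradient, which is what makes the bias vanish in the long run rather than accumulate.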

AAAI Conference 2020 Conference Paper

A Generalized Framework for Edge-Preserving and Structure-Preserving Image Smoothing

  • Wei Liu
  • Pingping Zhang
  • Yinjie Lei
  • Xiaolin Huang
  • Jie Yang
  • Ian Reid

Image smoothing is a fundamental procedure in applications of both computer vision and graphics. The required smoothing properties can be different or even contradictory among different tasks. Nevertheless, the inherent smoothing nature of one smoothing operator is usually fixed and thus cannot meet the various requirements of different applications. In this paper, a non-convex non-smooth optimization framework is proposed to achieve diverse smoothing natures, where even contradictory smoothing behaviors can be achieved. To this end, we first introduce the truncated Huber penalty function, which has seldom been used in image smoothing. A robust framework is then proposed. When combined with the strong flexibility of the truncated Huber penalty function, our framework is capable of a range of applications and can outperform the state-of-the-art approaches in several tasks. In addition, an efficient numerical solution is provided, and its convergence is theoretically guaranteed even though the optimization framework is non-convex and non-smooth. The effectiveness and superior performance of our approach are validated through comprehensive experimental results in a range of applications.

IJCAI Conference 2020 Conference Paper

A Spatial Missing Value Imputation Method for Multi-view Urban Statistical Data

  • Yongshun Gong
  • Zhibin Li
  • Jian Zhang
  • Wei Liu
  • Bei Chen
  • Xiangjun Dong

Large volumes of urban statistical data with multiple views imply rich knowledge about the development degree of cities. These data present crucial statistics which play an irreplaceable role in regional analysis and urban computing. In reality, however, statistical data divided into fine-grained regions usually suffer from missing data problems. Those missing values hide useful information and may result in a distorted data analysis. Thus, in this paper, we propose a spatial missing data imputation method for multi-view urban statistical data. To address this problem, we exploit an improved spatial multi-kernel clustering method to guide the imputation process, cooperating with an adaptive-weight non-negative matrix factorization strategy. Intensive experiments are conducted with other state-of-the-art approaches on six real-world urban statistical datasets. The results not only show the superiority of our method over other comparative methods on different datasets, but also demonstrate the strong generalizability of our model.

NeurIPS Conference 2020 Conference Paper

Adversarial Learning for Robust Deep Clustering

  • Xu Yang
  • Cheng Deng
  • Kun Wei
  • Junchi Yan
  • Wei Liu

Deep clustering integrates embedding and clustering together to obtain the optimal nonlinear embedding space, which is more effective in real-world scenarios compared with conventional clustering methods. However, the robustness of the clustering network is prone to being attenuated especially when it encounters an adversarial attack. A small perturbation in the embedding space will lead to diverse clustering results since the labels are absent. In this paper, we propose a robust deep clustering method based on adversarial learning. Specifically, we first attempt to define adversarial samples in the embedding space for the clustering network. Meanwhile, we devise an adversarial attack strategy to explore samples that easily fool the clustering layers but do not impact the performance of the deep embedding. We then provide a simple yet efficient defense algorithm to improve the robustness of the clustering network. Experimental results on two popular datasets show that the proposed adversarial learning method can significantly enhance the robustness and further improve the overall clustering performance. Particularly, the proposed method is generally applicable to multiple existing clustering frameworks to boost their robustness. The source code is available at https://github.com/xdxuyang/ALRDC.

AAAI Conference 2020 Conference Paper

Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression

  • Zhaohui Zheng
  • Ping Wang
  • Wei Liu
  • Jinze Li
  • Rongguang Ye
  • Dongwei Ren

Bounding box regression is the crucial step in object detection. In existing methods, while ℓn-norm loss is widely adopted for bounding box regression, it is not tailored to the evaluation metric, i.e., Intersection over Union (IoU). Recently, IoU loss and generalized IoU (GIoU) loss have been proposed to benefit the IoU metric, but still suffer from the problems of slow convergence and inaccurate regression. In this paper, we propose a Distance-IoU (DIoU) loss by incorporating the normalized distance between the predicted box and the target box, which converges much faster in training than IoU and GIoU losses. Furthermore, this paper summarizes three geometric factors in bounding box regression, i.e., overlap area, central point distance and aspect ratio, based on which a Complete IoU (CIoU) loss is proposed, thereby leading to faster convergence and better performance. By incorporating DIoU and CIoU losses into state-of-the-art object detection algorithms, e.g., YOLO v3, SSD and Faster R-CNN, we achieve notable performance gains in terms of not only IoU metric but also GIoU metric. Moreover, DIoU can be easily adopted into non-maximum suppression (NMS) to act as the criterion, further boosting performance improvement. The source code and trained models are available at https://github.com/Zzh-tju/DIoU.
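The DIoU loss described above admits a compact sketch: it is 1 − IoU plus the squared distance between the box centers, normalized by the squared diagonal of the smallest enclosing box. The following is an illustrative implementation from that formula, not the authors' released code; boxes are assumed valid (x1 < x2, y1 < y2) in (x1, y1, x2, y2) format, and the names are made up for the example.

```python
def diou_loss(box_a, box_b):
    """Distance-IoU loss between two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection and union areas (assumes well-formed boxes, so union > 0)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)

    # Squared distance between the two box centers
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + \
           ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2

    return 1.0 - iou + rho2 / c2
```

Unlike plain 1 − IoU, the center-distance term stays informative even when the boxes do not overlap, which is the source of the faster convergence the abstract reports.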

IJCAI Conference 2020 Conference Paper

Few-shot Visual Learning with Contextual Memory and Fine-grained Calibration

  • Yuqing Ma
  • Wei Liu
  • Shihao Bai
  • Qingyu Zhang
  • Aishan Liu
  • Weimin Chen
  • Xianglong Liu

Few-shot learning aims to learn a model that can be readily adapted to new unseen classes (concepts) by accessing one or a few examples. Despite the successful progress, most few-shot learning approaches, concentrating on either global or local characteristics of examples, still suffer from weak generalization abilities. Inspired by the inverted pyramid theory, to address this problem, we propose an inverted pyramid network (IPN) that imitates the human coarse-to-fine cognition paradigm. The proposed IPN consists of two consecutive stages, namely a global stage and a local stage. At the global stage, a class-sensitive contextual memory network (CCMNet) is introduced to learn discriminative support-query relation embeddings and predict the query-to-class similarity based on the contextual memory. Then, at the local stage, a fine-grained calibration is further appended to complement the coarse relation embeddings, targeting more precise query-to-class similarity evaluation. To the best of our knowledge, IPN is the first work that simultaneously integrates both global and local characteristics in few-shot learning, approximately imitating the human cognition mechanism. Our extensive experiments on multiple benchmark datasets demonstrate the superiority of IPN compared to a number of state-of-the-art approaches.

NeurIPS Conference 2020 Conference Paper

Fewer is More: A Deep Graph Metric Learning Perspective Using Fewer Proxies

  • Yuehua Zhu
  • Muli Yang
  • Cheng Deng
  • Wei Liu

Deep metric learning plays a key role in various machine learning tasks. Most of the previous works have been confined to sampling from a mini-batch, which cannot precisely characterize the global geometry of the embedding space. Although researchers have developed proxy- and classification-based methods to tackle the sampling issue, those methods inevitably incur a redundant computational cost. In this paper, we propose a novel Proxy-based deep Graph Metric Learning (ProxyGML) approach from the perspective of graph classification, which uses fewer proxies yet achieves better comprehensive performance. Specifically, multiple global proxies are leveraged to collectively approximate the original data points for each class. To efficiently capture local neighbor relationships, a small number of such proxies are adaptively selected to construct similarity subgraphs between these proxies and each data point. Further, we design a novel reverse label propagation algorithm, by which the neighbor relationships are adjusted according to ground-truth labels, so that a discriminative metric space can be learned during the process of subgraph classification. Extensive experiments carried out on widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate the superiority of the proposed ProxyGML over the state-of-the-art methods in terms of both effectiveness and efficiency. The source code is publicly available at https://github.com/YuehuaZhu/ProxyGML.

AAAI Conference 2020 Conference Paper

Multi-Task Driven Feature Models for Thermal Infrared Tracking

  • Qiao Liu
  • Xin Li
  • Zhenyu He
  • Nana Fan
  • Di Yuan
  • Wei Liu
  • Yongsheng Liang

Existing deep Thermal InfraRed (TIR) trackers usually use the feature models of RGB trackers for representation. However, these feature models learned on RGB images are neither effective in representing TIR objects nor taking fine-grained TIR information into consideration. To this end, we develop a multi-task framework to learn the TIR-specific discriminative features and fine-grained correlation features for TIR tracking. Specifically, we first use an auxiliary classification network to guide the generation of TIR-specific discriminative features for distinguishing the TIR objects belonging to different classes. Second, we design a fine-grained aware module to capture more subtle information for distinguishing the TIR objects belonging to the same class. These two kinds of features complement each other and recognize TIR objects at the inter-class and intra-class levels respectively. These two feature models are learned using a multi-task matching framework and are jointly optimized on the TIR tracking task. In addition, we develop a large-scale TIR training dataset to train the network for adapting the model to the TIR domain. Extensive experimental results on three benchmarks show that the proposed algorithm achieves a relative gain of 10% over the baseline and performs favorably against the state-of-the-art methods. Codes and the proposed TIR dataset are available at https://github.com/QiaoLiuHit/MMNet.

NeurIPS Conference 2020 Conference Paper

Optimal Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization

  • Yan Yan
  • Yi Xu
  • Qihang Lin
  • Wei Liu
  • Tianbao Yang

The epoch gradient descent method (a.k.a. Epoch-GD) proposed by Hazan and Kale (2011) was deemed a breakthrough for stochastic strongly convex minimization, achieving the optimal convergence rate of O(1/T) for the objective gap with T iterative updates. However, its extension to solving stochastic min-max problems with strong convexity and strong concavity still remains open, and it is still unclear whether a fast rate of O(1/T) for the duality gap is achievable for stochastic min-max optimization under strong convexity and strong concavity. Although some recent studies have proposed stochastic algorithms with fast convergence rates for min-max problems, they require additional assumptions about the problem, e.g., smoothness, bi-linear structure, etc. In this paper, we bridge this gap by providing a sharp analysis of the epoch-wise stochastic gradient descent ascent method (referred to as Epoch-GDA) for solving strongly convex strongly concave (SCSC) min-max problems, without imposing any additional assumption about smoothness or the function's structure. To the best of our knowledge, our result is the first to show that Epoch-GDA can achieve the optimal rate of O(1/T) for the duality gap of general SCSC min-max problems. We emphasize that such generalization of Epoch-GD for strongly convex minimization problems to Epoch-GDA for SCSC min-max problems is non-trivial and requires novel technical analysis. Moreover, we notice that the key lemma can also be used for proving the convergence of Epoch-GDA for weakly-convex strongly-concave min-max problems, leading to a nearly optimal complexity without resorting to smoothness or other structural conditions.
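As a point of reference, one stochastic gradient descent ascent step for \(\min_x \max_y f(x, y)\) within epoch \(k\) takes the standard form (a textbook sketch; the paper's contribution is the epoch-wise duality-gap analysis, not the update itself):

```latex
x_{t+1} = \Pi_{\mathcal{X}}\big(x_t - \eta_k \nabla_x f(x_t, y_t; \xi_t)\big), \qquad
y_{t+1} = \Pi_{\mathcal{Y}}\big(y_t + \eta_k \nabla_y f(x_t, y_t; \xi_t)\big),
```

where \(\Pi\) denotes projection onto the feasible set and \(\xi_t\) a stochastic sample; an epoch-wise scheme typically restarts the next epoch from an averaged iterate with a decreased step size \(\eta_{k+1}\).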

IJCAI Conference 2020 Conference Paper

Population Location and Movement Estimation through Cross-domain Data Analysis

  • Xinghao Yang
  • Wei Liu

Estimations of people's movement behaviour within a country can provide valuable information for government strategic resource planning. In this paper, we propose to utilize multi-domain statistical data to estimate people's movements under the assumption that most of the population tends to move to areas with similar or better living conditions. We design a Multi-domain Matrix Factorization (MdMF) model to discover the underlying consistency patterns from these cross-domain data and estimate the movement trends using the proposed model. This research can provide important theoretical support to governments and agencies in strategic resource planning and investment.

AAAI Conference 2020 Conference Paper

Potential Passenger Flow Prediction: A Novel Study for Urban Transportation Development

  • Yongshun Gong
  • Zhibin Li
  • Jian Zhang
  • Wei Liu
  • Jinfeng Yi

Recently, practical applications for passenger flow prediction have brought many benefits to urban transportation development. With the development of urbanization, a real-world demand from transportation managers is to construct a new metro station in a city area that was never planned before. Authorities are interested in the picture of the future volume of commuters before constructing a new station, and in estimating how it would affect other areas. In this paper, this specific problem is termed potential passenger flow (PPF) prediction, which is a novel and important study connected with urban computing and intelligent transportation systems. For example, an accurate PPF predictor can provide invaluable knowledge to designers, such as advice on station scales and influences on other areas. To address this problem, we propose a multi-view localized correlation learning method. The core idea of our strategy is to learn the passenger flow correlations between the target areas and their localized areas with adaptive weights. To improve the prediction accuracy, other domain knowledge is involved via a multi-view learning process. We conduct intensive experiments to evaluate the effectiveness of our method with real-world official transportation datasets. The results demonstrate that our method can achieve excellent performance compared with other available baselines. Besides, our method can provide an effective solution to the cold-start problem in recommender systems as well, as shown by its superior experimental results.

KR Conference 2020 Conference Paper

Seq2KG: An End-to-End Neural Model for Domain Agnostic Knowledge Graph (not Text Graph) Construction from Text

  • Michael Stewart
  • Wei Liu

Knowledge Graph Construction (KGC) from text unlocks information held within unstructured text and is critical to a wide range of downstream applications. General approaches to KGC from text are heavily reliant on the existence of knowledge bases, yet most domains do not even have an external knowledge base readily available. In many situations this results in information loss, as a wealth of key information is held within "non-entities". Domain-specific approaches to KGC typically adopt unsupervised pipelines, using carefully crafted linguistic and statistical patterns to extract co-occurring noun phrases as triples, essentially constructing text graphs rather than true knowledge graphs. In this research, for the first time, in the same flavour as Collobert et al.'s seminal 2011 work "Natural language processing (almost) from scratch", we propose a Seq2KG model attempting to achieve "knowledge graph construction (almost) from scratch". An end-to-end Sequence to Knowledge Graph (Seq2KG) neural model jointly learns to generate triples and resolve entity types as a multi-label classification task through deep neural networks. In addition, a novel evaluation metric that takes both semantic and structural closeness into account is developed for measuring the performance of triple extraction. We show that our end-to-end Seq2KG model performs on par with a state-of-the-art rule-based system which outperformed other neural models and won first prize in the first Knowledge Graph Contest in 2019. A new annotation scheme and three high-quality manually annotated datasets are available to help promote this direction of research.

IS Journal 2020 Journal Article

The Study for Public Management Policy Utility Evaluation and Optimization System under the Framework of Social Computing Perspective

  • Le Chen
  • Xianzhi Yuan
  • Gaoyu Zhang
  • Qinghua Guo
  • Wei Liu
  • Shuyi Zhang

In recent years, in order to rationalize the allocation of social resources and optimize the implementation of public management policies, scholars have conducted in-depth research on policy effectiveness. However, at present, most of this work remains at the level of macro-level qualitative analysis and lacks a quantitative analysis and evaluation system for the effectiveness of policy implementation. The goal of this article is to discuss the utility evaluation system of public management policy from the perspective of social computing. First, based on the data obtained through a questionnaire survey, we derive indicators from the survey data using factor analysis, create a new BDI (belief–desire–intention) model based on the observation indicators, and then construct the simulation platform; next, a brand-new quantitative analysis method for policy optimization is proposed using modified logistic functions as a tool. As an application, we conducted a case study of the “Targeted poverty alleviation policy in Yulin region” (Guangxi, China), in which the key indicators for poverty were established, and then policy optimization suggestions were given based on the results of simulation experiments. This case study has Chinese characteristics, but its approach might be applied to poverty alleviation work globally.

NeurIPS Conference 2020 Conference Paper

Towards Playing Full MOBA Games with Deep Reinforcement Learning

  • Deheng Ye
  • Guibin Chen
  • Wen Zhang
  • Sheng Chen
  • Bo Yuan
  • Bo Liu
  • Jia Chen
  • Zhao Liu

MOBA games, e.g., Honor of Kings, League of Legends, and Dota 2, pose grand challenges to AI systems, such as multi-agent coordination, an enormous state-action space, complex action control, etc. Developing AI for playing MOBA games has accordingly raised much attention. However, existing work falls short in handling the raw game complexity caused by the explosion of agent combinations, i.e., lineups, when expanding the hero pool; for instance, OpenAI's Dota AI limits the play to a pool of only 17 heroes. As a result, full MOBA games without restrictions are far from being mastered by any existing AI system. In this paper, we propose a MOBA AI learning paradigm that methodologically enables playing full MOBA games with deep reinforcement learning. Specifically, we develop a combination of novel and existing learning techniques, including off-policy adaption, multi-head value estimation, curriculum self-play learning, policy distillation, and Monte-Carlo tree search, in training and playing a large pool of heroes, meanwhile addressing the scalability issue skillfully. Tested on Honor of Kings, a popular MOBA game, we show how to build superhuman AI agents that can defeat top esports players. The superiority of our AI is demonstrated by the first large-scale performance test of a MOBA AI agent in the literature.

IJCAI Conference 2020 Conference Paper

Transductive Relation-Propagation Network for Few-shot Learning

  • Yuqing Ma
  • Shihao Bai
  • Shan An
  • Wei Liu
  • Aishan Liu
  • Xiantong Zhen
  • Xianglong Liu

Few-shot learning, aiming to learn novel concepts from few labeled examples, is an interesting and very challenging problem with many practical advantages. To accomplish this task, one should concentrate on revealing the accurate relations of the support-query pairs. We propose a transductive relation-propagation graph neural network (TRPN) to explicitly model and propagate such relations across support-query pairs. Our TRPN treats the relation of each support-query pair as a graph node, named relational node, and resorts to the known relations between support samples, including both intra-class commonality and inter-class uniqueness, to guide the relation propagation in the graph, generating the discriminative relation embeddings for support-query pairs. A pseudo relational node is further introduced to propagate the query characteristics, and a fast, yet effective transductive learning strategy is devised to fully exploit the relation information among different queries. To the best of our knowledge, this is the first work that explicitly takes the relations of support-query pairs into consideration in few-shot learning, which might offer a new way to solve the few-shot learning problem. Extensive experiments conducted on several benchmark datasets demonstrate that our method can significantly outperform a variety of state-of-the-art few-shot learning methods.

AAAI Conference 2020 Short Paper

Travel Time Prediction on Un-Monitored Roads: A Spatial Factorization Machine Based Approach (Student Abstract)

  • Lile Li
  • Wei Liu

Real-time traffic monitoring is one of the most important factors for route planning and estimated time of arrival (ETA). Many major roads in large cities are installed with live traffic monitoring systems, inferring the current traffic congestion status and ETAs to other locations. However, there are also many other roads, especially small roads and paths, that are not monitored. Yet, live traffic status on such un-monitored small roads can play a non-negligible role in personalized route planning and re-routing when a road incident happens. How to estimate the traffic status on such un-monitored roads is thus a valuable problem to be addressed. In this paper, we propose a model called Spatial Factorization Machines (SFM) to address this problem. A major advantage of the SFM model is that it incorporates physical distances and structures of road networks into the estimation of traffic status on un-monitored roads. Our experiments on real-world traffic data demonstrate that the SFM model significantly outperforms other existing models on ETA of un-monitored roads.

IJCAI Conference 2019 Conference Paper

A Compliance Checking Framework for DNN Models

  • Sunny Verma
  • Chen Wang
  • Liming Zhu
  • Wei Liu

Growing awareness towards the ethical use of machine learning (ML) models has created a surge in the development of fair models. Existing work in this regard assumes the presence of sensitive attributes in the data and hence can build classifiers whose decisions remain agnostic to such attributes. However, in real-world settings, the end-user of the ML model is unaware of the training data; besides, building custom models is not always feasible. Moreover, a pre-trained model with high accuracy on a certain dataset cannot be assumed to be fair. Unknown biases in the training data are the true culprit for unfair models (i.e., disparate performance for groups in the dataset). In this preliminary research, we propose a different lens for building fair models by enabling the user with tools to discover blind spots and biases in a pre-trained model and augment them with corrective measures.

NeurIPS Conference 2019 Conference Paper

Category Anchor-Guided Unsupervised Domain Adaptation for Semantic Segmentation

  • Qiming Zhang
  • Jing Zhang
  • Wei Liu
  • Dacheng Tao

Unsupervised domain adaptation (UDA) aims to enhance the generalization capability of a certain model from a source domain to a target domain. UDA is of particular significance since no extra effort is devoted to annotating target domain samples. However, the different data distributions in the two domains, or domain shift/discrepancy, inevitably compromise the UDA performance. Although there has been progress in matching the marginal distributions between two domains, the classifier favors the source domain features and makes incorrect predictions on the target domain due to category-agnostic feature alignment. In this paper, we propose a novel category anchor-guided (CAG) UDA model for semantic segmentation, which explicitly enforces category-aware feature alignment to learn shared discriminative features and classifiers simultaneously. First, the category-wise centroids of the source domain features are used as guided anchors to identify the active features in the target domain and also assign them pseudo-labels. Then, we leverage an anchor-based pixel-level distance loss and a discriminative loss to drive the intra-category features closer and the inter-category features further apart, respectively. Finally, we devise a stagewise training mechanism to reduce the error accumulation and adapt the proposed model progressively. Experiments on both the GTA5→Cityscapes and SYNTHIA→Cityscapes scenarios demonstrate the superiority of our CAG-UDA model over the state-of-the-art methods. The code is available at https://github.com/RogerZhangzz/CAG_UDA.
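The anchor step described above, using source category centroids to pick out confident ("active") target samples and give them pseudo-labels, can be sketched on plain feature vectors (a simplified illustration with a hypothetical distance threshold, not the paper's full training losses):

```python
import numpy as np

def anchor_pseudo_labels(src_feats, src_labels, tgt_feats, threshold=1.0):
    """Assign pseudo-labels to target features by nearest source
    category centroid; only targets within `threshold` of a centroid
    are marked active, i.e., confident enough to get a pseudo-label."""
    classes = np.unique(src_labels)
    centroids = np.stack([src_feats[src_labels == c].mean(axis=0)
                          for c in classes])
    dists = np.linalg.norm(tgt_feats[:, None, :] - centroids[None, :, :],
                           axis=-1)
    labels = classes[dists.argmin(axis=1)]      # nearest-centroid label
    active = dists.min(axis=1) < threshold      # confident targets only
    return labels, active

# Two well-separated 2-D classes; nearby target points are labelled and
# active, while a far-away outlier is left inactive.
src = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
lab = np.array([0, 0, 1, 1])
tgt = np.array([[0.3, 0.1], [5.1, 4.9], [20.0, 20.0]])
pl, act = anchor_pseudo_labels(src, lab, tgt)
```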

TIST Journal 2019 Journal Article

Correlated Multi-label Classification with Incomplete Label Space and Class Imbalance

  • Ali Braytee
  • Wei Liu
  • Ali Anaissi
  • Paul J. Kennedy

Multi-label classification is defined as the problem of identifying the multiple labels or categories of new observations based on labeled training data. Multi-labeled data poses several challenges, including class imbalance, label correlation, incomplete multi-label matrices, and noisy and irrelevant features. In this article, we propose an integrated multi-label classification approach with incomplete label space and class imbalance (ML-CIB) for simultaneously training the multi-label classification model and addressing the aforementioned challenges. The model learns a new label matrix and captures new label correlations, because it is difficult to find a complete label vector for each instance in real-world data. We also propose a label regularization to handle the imbalanced multi-label issue in the new labels, and an l1 regularization norm is incorporated in the objective function to select the relevant sparse features. A multi-label feature selection (ML-CIB-FS) method is presented as a variant of the proposed ML-CIB to show the efficacy of the proposed method in selecting the relevant features. ML-CIB is formulated as a constrained objective function. We use the accelerated proximal gradient method to solve the proposed optimisation problem. Lastly, extensive experiments are conducted on 19 regular-scale and large-scale imbalanced multi-label datasets. The promising results show that our method significantly outperforms the state-of-the-art.

AAAI Conference 2019 Short Paper

Cross-Domain Recommendation via Coupled Factorization Machines

  • Lile Li
  • Quan Do
  • Wei Liu

Data across many business domains can be represented by two or more coupled data sets. Correlations among these coupled datasets have been studied in the literature for making more accurate cross-domain recommender systems. However, existing cross-domain recommendation methods mostly assume that the coupled modes of the data sets share identical latent factors, which limits the discovery of potentially useful domain-specific properties of the original data. In this paper, we propose a novel cross-domain recommendation method called Coupled Factorization Machine (CoFM) that addresses this limitation. Compared to existing models, ours is the first that uses factorization machines to capture the common characteristics of coupled domains while simultaneously preserving the differences among them. Our experiments with real-world datasets confirm the advantages of our method in making cross-domain recommendations.
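A degree-2 factorization machine, the building block that CoFM couples across domains, scores a feature vector as w0 + Σᵢ wᵢxᵢ + Σ_{i<j} ⟨vᵢ, vⱼ⟩ xᵢxⱼ; the pairwise term admits the well-known O(nk) rewrite sketched below (a generic FM, not CoFM's coupled objective):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine score (Rendle-style form).

    x : (n,) features, w0 : bias, w : (n,) linear weights,
    V : (n, k) latent factors. The pairwise term uses the identity
        sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ],
    which costs O(n*k) instead of O(n^2).
    """
    linear = w0 + w @ x
    s = V.T @ x                                     # (k,) per-factor sums
    pairwise = 0.5 * (s @ s - ((V ** 2).T @ (x ** 2)).sum())
    return linear + pairwise

# Tiny check against the naive O(n^2) double loop.
rng = np.random.default_rng(0)
n, k = 5, 3
x = rng.normal(size=n)
w0, w, V = 0.1, rng.normal(size=n), rng.normal(size=(n, k))
naive = w0 + w @ x + sum(V[i] @ V[j] * x[i] * x[j]
                         for i in range(n) for j in range(i + 1, n))
fast = fm_predict(x, w0, w, V)
```

Coupling two such models would then amount to sharing part of V across domains while keeping domain-specific factors separate, which is the limitation CoFM targets.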

NeurIPS Conference 2019 Conference Paper

Cross-Modal Learning with Adversarial Samples

  • Chao Li
  • Shangqian Gao
  • Cheng Deng
  • De Xie
  • Wei Liu

With the rapid developments of deep neural networks, numerous deep cross-modal analysis methods have been presented and are being applied in widespread real-world applications, including healthcare and safety-critical environments. However, the recent studies on robustness and stability of deep neural networks show that a microscopic modification, known as adversarial sample, which is even imperceptible to humans, can easily fool a well-performed deep neural network and brings a new obstacle to deep cross-modal correlation exploring. In this paper, we propose a novel Cross-Modal correlation Learning with Adversarial samples, namely CMLA, which for the first time presents the existence of adversarial samples in cross-modal data. Moreover, we provide a simple yet effective adversarial sample learning method, where inter- and intra- modality similarity regularizations across different modalities are simultaneously integrated into the learning of adversarial samples. Finally, our proposed CMLA is demonstrated to be highly effective in cross-modal hashing based retrieval. Extensive experiments on two cross-modal benchmark datasets show that the adversarial examples produced by our CMLA are efficient in fooling a target deep cross-modal hashing network. On the other hand, such adversarial examples can significantly strengthen the robustness of the target network by conducting an adversarial training.

IJCAI Conference 2019 Conference Paper

DeepCU: Integrating both Common and Unique Latent Information for Multimodal Sentiment Analysis

  • Sunny Verma
  • Chen Wang
  • Liming Zhu
  • Wei Liu

Multimodal sentiment analysis combines information available from visual, textual, and acoustic representations for sentiment prediction. Recent multimodal fusion schemes combine multiple modalities as a tensor and obtain either the common information, by utilizing neural networks, or the unique information, by modeling a low-rank representation of the tensor. However, both kinds of information are essential, as they render inter-modal and intra-modal relationships of the data. In this research, we first propose a novel deep architecture to extract the common information from the multi-mode representations. Furthermore, we propose unique networks to obtain the modality-specific information that enhances the generalization performance of our multimodal system. Finally, we integrate these two aspects of information via a fusion layer and propose a novel multimodal data fusion architecture, which we call DeepCU (Deep network with both Common and Unique latent information). The proposed DeepCU consolidates the two networks for joint utilization and discovery of all-important latent information. Comprehensive experiments are conducted to demonstrate the effectiveness of utilizing both common and unique information discovered by DeepCU on multiple real-world datasets. The source code of DeepCU is available at https://github.com/sverma88/DeepCU-IJCAI19.

AAAI Conference 2019 Conference Paper

Enhanced Random Forest Algorithms for Partially Monotone Ordinal Classification

  • Christopher Bartley
  • Wei Liu
  • Mark Reynolds

One of the factors hindering the use of classification models in decision making is that their predictions may contradict expectations. In domains such as finance and medicine, the ability to include knowledge of monotone (nondecreasing) relationships is sought after to increase accuracy and user satisfaction. Since Random Forest is one of the most successful classifiers, attempts have been made to incorporate such knowledge into it. Ideally a solution would (a) maximise accuracy; (b) have low complexity and scale well; (c) guarantee global monotonicity; and (d) cater for multi-class problems. This paper first reviews the state-of-the-art from both the literature and statistical libraries, and identifies opportunities for improvement. A new rule-based method is then proposed, with a maximal-accuracy variant and a faster approximate variant. Simulated and real datasets are then used to perform the most comprehensive ordinal classification benchmarking in the monotone forest literature. The proposed approaches are shown to reduce the bias induced by monotonisation and thereby improve accuracy.

IJCAI Conference 2019 Conference Paper

Geo-ALM: POI Recommendation by Fusing Geographical Information and Adversarial Learning Mechanism

  • Wei Liu
  • Zhi-Jie Wang
  • Bin Yao
  • Jian Yin

Learning a user’s preference from check-in data is important for POI recommendation. Yet a user has usually visited only some POIs, while most POIs are unvisited (i.e., negative samples). To leverage these “no-behavior” POIs, a typical approach is pairwise ranking, which constructs ranking pairs for the user and POIs. Although this approach is generally effective, the negative samples in ranking pairs are obtained randomly, which may fail to leverage “critical” negative samples in the model training. On the other hand, previous studies have also utilized geographical features to improve the recommendation quality. Nevertheless, most previous works did not exploit geographical information comprehensively, which may also affect the performance. To alleviate these issues, we propose a geographical information based adversarial learning model (Geo-ALM), which can be viewed as a fusion of geographic features and generative adversarial networks. Its core idea is to learn the discriminator and generator interactively, by exploiting two granularities of geographic features (i.e., region and POI features). Experimental results show that Geo-ALM can achieve competitive performance compared to several state-of-the-art methods.

JMLR Journal 2019 Journal Article

Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction

  • Bin Hong
  • Weizhong Zhang
  • Wei Liu
  • Jieping Ye
  • Deng Cai
  • Xiaofei He
  • Jie Wang

Sparse support vector machine (SVM) is a popular classification technique that can simultaneously learn a small set of the most interpretable features and identify the support vectors. It has achieved great successes in many real-world applications. However, for large-scale problems involving a huge number of samples and ultra-high dimensional features, solving sparse SVMs remains challenging. By noting that sparse SVMs induce sparsities in both feature and sample spaces, we propose a novel approach, which is based on accurate estimations of the primal and dual optima of sparse SVMs, to simultaneously identify the inactive features and samples that are guaranteed to be irrelevant to the outputs. Thus, we can remove the identified inactive samples and features from the training phase, leading to substantial savings in the computational cost without sacrificing the accuracy. Moreover, we show that our method can be extended to multi-class sparse support vector machines. To the best of our knowledge, the proposed method is the first static feature and sample reduction method for sparse SVMs and multi-class sparse SVMs. Experiments on both synthetic and real data sets demonstrate that our approach significantly outperforms state-of-the-art methods and the speedup gained by our approach can be orders of magnitude.

NeurIPS Conference 2019 Conference Paper

Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos

  • Yitian Yuan
  • Lin Ma
  • Jingwen Wang
  • Wei Liu
  • Wenwu Zhu

Temporal sentence grounding in videos aims to detect and localize one target video segment, which semantically corresponds to a given sentence. Existing methods mainly tackle this task via matching and aligning semantics between a sentence and candidate video segments, while neglect the fact that the sentence information plays an important role in temporally correlating and composing the described contents in videos. In this paper, we propose a novel semantic conditioned dynamic modulation (SCDM) mechanism, which relies on the sentence semantics to modulate the temporal convolution operations for better correlating and composing the sentence related video contents over time. More importantly, the proposed SCDM performs dynamically with respect to the diverse video contents so as to establish a more precise matching relationship between sentence and video, thereby improving the temporal grounding accuracy. Extensive experiments on three public datasets demonstrate that our proposed model outperforms the state-of-the-arts with clear margins, illustrating the ability of SCDM to better associate and localize relevant video contents for temporal sentence grounding. Our code for this paper is available at https://github.com/yytzsy/SCDM.

IJCAI Conference 2018 Conference Paper

A Reinforced Topic-Aware Convolutional Sequence-to-Sequence Model for Abstractive Text Summarization

  • Li Wang
  • Junlin Yao
  • Yunzhe Tao
  • Li Zhong
  • Wei Liu
  • Qiang Du

In this paper, we propose a deep learning approach to tackle the automatic summarization tasks by incorporating topic information into the convolutional sequence-to-sequence (ConvS2S) model and using self-critical sequence training (SCST) for optimization. Through jointly attending to topics and word-level alignment, our approach can improve coherence, diversity, and informativeness of generated summaries via a biased probability generation mechanism. On the other hand, reinforcement training, like SCST, directly optimizes the proposed model with respect to the non-differentiable metric ROUGE, which also avoids the exposure bias during inference. We carry out the experimental evaluation with state-of-the-art methods over the Gigaword, DUC-2004, and LCSTS datasets. The empirical results demonstrate the superiority of our proposed method in the abstractive summarization.

AAAI Conference 2018 Conference Paper

Attention-Based Transactional Context Embedding for Next-Item Recommendation

  • Shoujin Wang
  • Liang Hu
  • Longbing Cao
  • Xiaoshui Huang
  • Defu Lian
  • Wei Liu

To recommend the next item to a user in a transactional context is practical yet challenging in applications such as marketing campaigns. Transactional context refers to the items that are observable in a transaction. Most existing transaction-based recommender systems (TBRSs) make recommendations by mainly considering recently occurring items instead of all the ones observed in the current context. Moreover, they often assume a rigid order between items within a transaction, which is not always practical. More importantly, a long transaction often contains many items irrelevant to the next choice, which tend to overwhelm the influence of a few truly relevant ones. Therefore, we posit that a good TBRS should not only consider all the observed items in the current transaction but also weight them with different relevance to build an attentive context that outputs the proper next item with a high probability. To this end, we design an effective attention-based transaction embedding model (ATEM) for context embedding to weight each observed item in a transaction without assuming order. The empirical study on real-world transaction datasets proves that ATEM significantly outperforms the state-of-the-art methods in terms of both accuracy and novelty.

AAAI Conference 2018 Conference Paper

Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition

  • Wei Liu
  • Chaofeng Chen
  • Kwan-Yee Wong

In this paper, we present a Character-Aware Neural Network (Char-Net) for recognizing distorted scene text. Our Char-Net is composed of a word-level encoder, a character-level encoder, and a LSTM-based decoder. Unlike previous work which employed a global spatial transformer network to rectify the entire distorted text image, we take an approach of detecting and rectifying individual characters. To this end, we introduce a novel hierarchical attention mechanism (HAM) which consists of a recurrent RoIWarp layer and a character-level attention layer. The recurrent RoIWarp layer sequentially extracts a feature region corresponding to a character from the feature map produced by the word-level encoder, and feeds it to the character-level encoder which removes the distortion of the character through a simple spatial transformer and further encodes the character region. The character-level attention layer then attends to the most relevant features of the feature map produced by the character-level encoder and composes a context vector, which is finally fed to the LSTM-based decoder for decoding. This approach of adopting a simple local transformation to model the distortion of individual characters not only results in an improved efficiency, but can also handle different types of distortion that are hard, if not impossible, to be modelled by a single global transformation. Experiments have been conducted on six public benchmark datasets. Our results show that Char-Net can achieve state-of-the-art performance on all the benchmarks, especially on the IC-IST which contains scene text with large distortion. Code will be made available.

NeurIPS Conference 2018 Conference Paper

Deep Non-Blind Deconvolution via Generalized Low-Rank Approximation

  • Wenqi Ren
  • Jiawei Zhang
  • Lin Ma
  • Jinshan Pan
  • Xiaochun Cao
  • Wangmeng Zuo
  • Wei Liu
  • Ming-Hsuan Yang

In this paper, we present a deep convolutional neural network to capture the inherent properties of image degradation, which can handle different kernels and saturated pixels in a unified framework. The proposed neural network is motivated by the low-rank property of pseudo-inverse kernels. We first compute a generalized low-rank approximation for a large number of blur kernels, and then use separable filters to initialize the convolutional parameters in the network. Our analysis shows that the estimated decomposed matrices contain the most essential information of the input kernel, which ensures the proposed network to handle various blurs in a unified framework and generate high-quality deblurring results. Experimental results on benchmark datasets with noise and saturated pixels demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.

NeurIPS Conference 2018 Conference Paper

Distilled Wasserstein Learning for Word Embedding and Topic Modeling

  • Hongteng Xu
  • Wenlin Wang
  • Wei Liu
  • Lawrence Carin

We propose a novel Wasserstein method with a distillation mechanism, yielding joint learning of word embeddings and topics. The proposed method is based on the fact that the Euclidean distance between word embeddings may be employed as the underlying distance in the Wasserstein topic model. The word distributions of topics, their optimal transport to the word distributions of documents, and the embeddings of words are learned in a unified framework. When learning the topic model, we leverage a distilled ground-distance matrix to update the topic distributions and smoothly calculate the corresponding optimal transports. Such a strategy provides the updating of word embeddings with robust guidance, improving algorithm convergence. As an application, we focus on patient admission records, in which the proposed method embeds the codes of diseases and procedures and learns the topics of admissions, obtaining superior performance on clinically-meaningful disease network construction, mortality prediction as a function of admission codes, and procedure recommendation.
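The optimal transport between a topic's word distribution and a document's can be computed, for instance, with entropy-regularized Sinkhorn iterations, a standard tool for Wasserstein distances over embedding-based ground costs (a generic sketch with assumed toy inputs; the paper's distillation mechanism is not shown):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, iters=1000):
    """Entropy-regularized optimal transport (Sinkhorn iterations).

    a, b : source/target probability vectors; C : ground-cost matrix,
    e.g., Euclidean distances between word embeddings as in Wasserstein
    topic models. Returns the transport plan P with marginals ~ (a, b).
    """
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        v = b / (K.T @ u)                # match column marginals
        u = a / (K @ v)                  # match row marginals
    return u[:, None] * K * v[None, :]

# Transport between two 3-word distributions with a toy cost matrix.
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.4, 0.2])
C = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
P = sinkhorn(a, b, C)
```

Smaller `eps` approximates the unregularized Wasserstein plan more closely but needs more iterations and more careful numerics.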

NeurIPS Conference 2018 Conference Paper

Generalizing Graph Matching beyond Quadratic Assignment Model

  • Tianshu Yu
  • Junchi Yan
  • Yilin Wang
  • Wei Liu
  • Baoxin Li

Graph matching has received persistent attention over decades, which can be formulated as a quadratic assignment problem (QAP). We show that a large family of functions, which we define as Separable Functions, can approximate discrete graph matching in the continuous domain asymptotically by varying the approximation controlling parameters. We also study the properties of global optimality and devise convex/concave-preserving extensions to the widely used Lawler's QAP form. Our theoretical findings show the potential for deriving new algorithms and techniques for graph matching. We deliver solvers based on two specific instances of Separable Functions, and the state-of-the-art performance of our method is verified on popular benchmarks.

AAAI Conference 2018 Conference Paper

Learning to Guide Decoding for Image Captioning

  • Wenhao Jiang
  • Lin Ma
  • Xinpeng Chen
  • Hanwang Zhang
  • Wei Liu

Recently, much progress has been made in image captioning, and an encoder-decoder framework has achieved outstanding performance for this task. In this paper, we propose an extension of the encoder-decoder framework by adding a component called a guiding network. The guiding network models the attribute properties of input images, and its output is leveraged to compose the input of the decoder at each time step. The guiding network can be plugged into the current encoder-decoder framework and trained in an end-to-end manner. Hence, the guiding vector can be adaptively learned according to the signal from the decoder, enabling it to embed information from both image and language. Additionally, discriminative supervision can be employed to further improve the quality of guidance. The advantages of our proposed approach are verified by experiments carried out on the MS COCO dataset.

IJCAI Conference 2018 Conference Paper

Long-Term Human Motion Prediction by Modeling Motion Context and Enhancing Motion Dynamics

  • Yongyi Tang
  • Lin Ma
  • Wei Liu
  • Wei-Shi Zheng

Human motion prediction aims at generating future frames of human motion based on an observed sequence of skeletons. Recent methods employ the latest hidden states of a recurrent neural network (RNN) to encode the historical skeletons, which can only address short-term prediction. In this work, we propose motion context modeling that summarizes the historical human motion with respect to the current prediction. A modified highway unit (MHU) is proposed for efficiently eliminating motionless joints and estimating the next pose given the motion context. Furthermore, we enhance the motion dynamics by minimizing a Gram matrix loss for long-term motion prediction. Experimental results show that the proposed model can promisingly forecast future human movements, yielding superior performance over related state-of-the-art approaches. Moreover, specifying the motion context with activity labels enables our model to perform human motion transfer.

NeurIPS Conference 2018 Conference Paper

Nonlocal Neural Networks, Nonlocal Diffusion and Nonlocal Modeling

  • Yunzhe Tao
  • Qi Sun
  • Qiang Du
  • Wei Liu

Nonlocal neural networks have been proposed and shown to be effective in several computer vision tasks, where the nonlocal operations can directly capture long-range dependencies in the feature space. In this paper, we study the nature of diffusion and damping effect of nonlocal networks by doing spectrum analysis on the weight matrices of the well-trained networks, and then propose a new formulation of the nonlocal block. The new block not only learns the nonlocal interactions but also has stable dynamics, thus allowing deeper nonlocal structures. Moreover, we interpret our formulation from the general nonlocal modeling perspective, where we make connections between the proposed nonlocal network and other nonlocal models, such as nonlocal diffusion process and Markov jump process.

NeurIPS Conference 2018 Conference Paper

Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning

  • Xing Yan
  • Weizhong Zhang
  • Lin Ma
  • Wei Liu
  • Qi Wu

We propose a parsimonious quantile regression framework to learn the dynamic tail behaviors of financial asset returns. Our model captures well both the time-varying characteristic and the asymmetrical heavy-tail property of financial time series. It combines the merits of a popular sequential neural network model, i.e., the LSTM, with a novel parametric quantile function that we construct to represent the conditional distribution of asset returns. Our model also captures individually the serial dependences of higher moments, rather than just the volatility. Across a wide range of asset classes, the out-of-sample forecasts of conditional quantiles or VaR from our model outperform the GARCH family. Further, the proposed approach does not suffer from the issue of quantile crossing, nor is it exposed to the ill-posedness of the parametric probability density function approach.

IJCAI Conference 2018 Conference Paper

Salient Object Detection by Lossless Feature Reflection

  • Pingping Zhang
  • Wei Liu
  • Huchuan Lu
  • Chunhua Shen

Salient object detection, which aims to identify and locate the most salient pixels or regions in images, has been attracting more and more interest due to its various real-world applications. However, this vision task is quite challenging, especially under complex image scenes. Inspired by the intrinsic reflection of natural images, in this paper we propose a novel feature learning framework for large-scale salient object detection. Specifically, we design a symmetrical fully convolutional network (SFCN) to learn complementary saliency features under the guidance of lossless feature reflection. The location, contextual, and semantic information of salient objects is jointly utilized to supervise the proposed network for more accurate saliency predictions. In addition, to overcome the blurry boundary problem, we propose a new structural loss function to learn clear object boundaries and spatially consistent saliency. The coarse prediction results are effectively refined by this structural information for performance improvements. Extensive experiments on seven saliency detection datasets demonstrate that our approach achieves consistently superior performance and outperforms the very recent state-of-the-art methods.

IJCAI Conference 2018 Conference Paper

Semantic Structure-based Unsupervised Deep Hashing

  • Erkun Yang
  • Cheng Deng
  • Tongliang Liu
  • Wei Liu
  • Dacheng Tao

Hashing is becoming increasingly popular for approximate nearest neighbor searching in massive databases due to its storage and search efficiency. Recent supervised hashing methods, which usually construct semantic similarity matrices to guide hash code learning using label information, have shown promising results. However, it is relatively difficult to capture and utilize the semantic relationships between points in unsupervised settings. To address this problem, we propose a novel unsupervised deep framework called Semantic Structure-based unsupervised Deep Hashing (SSDH). We first empirically study the deep feature statistics, and find that the distribution of the cosine distance for point pairs can be estimated by two half Gaussian distributions. Based on this observation, we construct the semantic structure by considering points with distances obviously smaller than the others as semantically similar and points with distances obviously larger than the others as semantically dissimilar. We then design a deep architecture and a pair-wise loss function to preserve this semantic structure in Hamming space. Extensive experiments show that SSDH significantly outperforms current state-of-the-art methods.

AAAI Conference 2018 Conference Paper

Stochastic Non-Convex Ordinal Embedding With Stabilized Barzilai-Borwein Step Size

  • Ke Ma
  • Jinshan Zeng
  • Jiechao Xiong
  • Qianqian Xu
  • Xiaochun Cao
  • Wei Liu
  • Yuan Yao

Learning representations from relative similarity comparisons, often called ordinal embedding, has gained rising attention in recent years. Most existing methods are batch methods designed mainly on the basis of convex optimization, e.g., the projected gradient descent method. However, they are generally time-consuming because singular value decomposition (SVD) is commonly adopted during the update, especially when the data size is very large. To overcome this challenge, we propose a stochastic algorithm called SVRG-SBB, which has the following features: (a) it is SVD-free via dropping convexity, with good scalability through the use of a stochastic algorithm, i.e., stochastic variance reduced gradient (SVRG), and (b) it chooses the step size adaptively via a new stabilized Barzilai-Borwein (SBB) method, since the original version for convex problems might fail for the considered stochastic non-convex optimization problem. Moreover, we show that the proposed algorithm converges to a stationary point at a rate O(1/T) in our setting, where T is the number of total iterations. Numerous simulations and real-world data experiments are conducted to show the effectiveness of the proposed algorithm in comparison with state-of-the-art methods; in particular, it achieves much lower computational cost with good prediction performance.
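As a rough illustration of the step-size rule named in this abstract, the sketch below computes a Barzilai-Borwein step with a simple stabilizing term added to the denominator. The `eps` stabilization and the quadratic test problem are illustrative assumptions, not the paper's exact SBB formulation:

```python
import numpy as np

def bb_step(x_prev, x_curr, g_prev, g_curr, eps=1e-2):
    """Barzilai-Borwein step size with a simple stabilization term.

    The plain BB1 step is ||s||^2 / (s^T y); adding eps*||s||^2 to the
    denominator (an illustrative stabilization, not necessarily the
    paper's exact SBB rule) keeps the step bounded when s^T y is near 0.
    """
    s = x_curr - x_prev          # iterate difference
    y = g_curr - g_prev          # gradient difference
    return (s @ s) / (abs(s @ y) + eps * (s @ s))

# Gradient descent on f(x) = 0.5 * x^T A x using BB steps.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x_prev = np.array([1.0, 1.0])
g_prev = grad(x_prev)
x = x_prev - 0.1 * g_prev        # one fixed-step iteration to bootstrap BB
for _ in range(30):
    g = grad(x)
    alpha = bb_step(x_prev, x, g_prev, g)
    x_prev, g_prev = x, g
    x = x - alpha * g            # BB-sized gradient step
```

On this well-conditioned toy quadratic the iterates shrink toward the minimizer at the origin; the point of the stabilization is only that the step stays finite even when the curvature estimate `s @ y` degenerates.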

AAAI Conference 2017 Conference Paper

Adaptive Proximal Average Approximation for Composite Convex Minimization

  • Li Shen
  • Wei Liu
  • Junzhou Huang
  • Yu-Gang Jiang
  • Shiqian Ma

We propose a fast first-order method to solve multi-term nonsmooth composite convex minimization problems by employing a recent proximal average approximation technique and a novel adaptive parameter tuning technique. Thanks to this powerful parameter tuning technique, the proximal gradient step can be performed with a much larger stepsize in the algorithm implementation compared with the prior PA-APG method (Yu 2013), which is the core to enabling significant improvements in practical performance. Moreover, by choosing the approximation parameter adaptively, the proposed method is shown to enjoy O(1/k) iteration complexity theoretically without any extra computational cost, while the PA-APG method incurs many more iterations for convergence. The preliminary experimental results on overlapping group Lasso and graph-guided fused Lasso problems confirm our theoretical claim, and indicate that the proposed method is almost five times faster than the state-of-the-art PA-APG method and is therefore suitable for higher-precision optimization.

AAAI Conference 2017 Short Paper

Extracting Highly Effective Features for Supervised Learning via Simultaneous Tensor Factorization

  • Sunny Verma
  • Wei Liu
  • Chen Wang
  • Liming Zhu

Real-world data is usually generated over multiple time periods associated with multiple labels, which can be represented as multiple labeled tensor sequences. These sequences are linked together, sharing some common features while exhibiting their own unique features. Conventional tensor factorization techniques are limited to extracting either common or unique features, but not both simultaneously. However, both types of features are important in many machine learning systems as they inherently affect the systems’ performance. In this paper, we propose a novel supervised tensor factorization technique which simultaneously extracts ordered common and unique features. Classification results using features extracted by our method on the CIFAR-10 dataset achieve significantly better performance than other factorization methods, illustrating the effectiveness of the proposed technique.

NeurIPS Conference 2017 Conference Paper

Geometric Descent Method for Convex Composite Minimization

  • Shixiang Chen
  • Shiqian Ma
  • Wei Liu

In this paper, we extend the geometric descent method recently proposed by Bubeck, Lee and Singh to tackle nonsmooth and strongly convex composite problems. We prove that our proposed algorithm, dubbed geometric proximal gradient method (GeoPG), converges with a linear rate $(1-1/\sqrt{\kappa})$ and thus achieves the optimal rate among first-order methods, where $\kappa$ is the condition number of the problem. Numerical results on linear regression and logistic regression with elastic net regularization show that GeoPG compares favorably with Nesterov's accelerated proximal gradient method, especially when the problem is ill-conditioned.

NeurIPS Conference 2017 Conference Paper

Mixture-Rank Matrix Approximation for Collaborative Filtering

  • Dongsheng Li
  • Chao Chen
  • Wei Liu
  • Tun Lu
  • Ning Gu
  • Stephen Chu

Low-rank matrix approximation (LRMA) methods have achieved excellent accuracy among today's collaborative filtering (CF) methods. In existing LRMA methods, the rank of user/item feature matrices is typically fixed, i.e., the same rank is adopted to describe all users/items. However, our studies show that submatrices with different ranks could coexist in the same user-item rating matrix, so that approximations with fixed ranks cannot perfectly describe the internal structures of the rating matrix, thus leading to inferior recommendation accuracy. In this paper, a mixture-rank matrix approximation (MRMA) method is proposed, in which user-item ratings can be characterized by a mixture of LRMA models with different ranks. Meanwhile, a learning algorithm capitalizing on iterated conditional modes is proposed to tackle the non-convex optimization problem pertaining to MRMA. Experimental studies on MovieLens and Netflix datasets demonstrate that MRMA can outperform six state-of-the-art LRMA-based CF methods in terms of recommendation accuracy.

AAAI Conference 2017 Conference Paper

Pairwise Relationship Guided Deep Hashing for Cross-Modal Retrieval

  • Erkun Yang
  • Cheng Deng
  • Wei Liu
  • Xianglong Liu
  • Dacheng Tao
  • Xinbo Gao

With the benefits of low storage cost and fast query speed, cross-modal hashing has received considerable attention recently. However, almost all existing methods for cross-modal hashing cannot obtain powerful hash codes, because they directly utilize hand-crafted features or ignore heterogeneous correlations across different modalities, which greatly degrades retrieval performance. In this paper, we propose a novel deep cross-modal hashing method to generate compact hash codes through an end-to-end deep learning architecture, which can effectively capture the intrinsic relationships between various modalities. Our architecture integrates different types of pairwise constraints to encourage the similarities of the hash codes from an intra-modal view and an inter-modal view, respectively. Moreover, additional decorrelation constraints are introduced to this architecture, thus enhancing the discriminative ability of each hash bit. Extensive experiments show that our proposed method yields state-of-the-art results on two cross-modal retrieval datasets.

IJCAI Conference 2017 Conference Paper

Positive unlabeled learning via wrapper-based adaptive sampling

  • Pengyi Yang
  • Wei Liu
  • Jean Yang

Learning from positive and unlabeled data frequently occurs in applications where only a subset of positive instances is available while the rest of the data are unlabeled. In such scenarios, often the goal is to create a discriminant model that can accurately classify both positive and negative data by modelling from labeled and unlabeled instances. In this study, we propose an adaptive sampling (AdaSampling) approach that utilises prediction probabilities from a model to iteratively update the training data. Starting with equal prior probabilities for all unlabeled data, our method "wraps" around a predictive model to iteratively update these probabilities to distinguish positive and negative instances in unlabeled data. Subsequently, one or more robust negative set(s) can be drawn from unlabeled data, according to the likelihood of each instance being negative, to train a single classification model or ensemble of models.
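The wrapper loop this abstract describes can be sketched generically: start from equal priors on the unlabeled data, draw a pseudo-negative set weighted by the current probability of being negative, refit, and update. The nearest-centroid "model", the data, and all names below are illustrative stand-ins for whatever probabilistic classifier is wrapped, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def centroid_proba(X_pos, X_neg, X):
    """Toy probabilistic classifier: softmax over distances to the two
    class centroids (a stand-in for any model outputting P(positive))."""
    d_pos = np.linalg.norm(X - X_pos.mean(axis=0), axis=1)
    d_neg = np.linalg.norm(X - X_neg.mean(axis=0), axis=1)
    return np.exp(-d_pos) / (np.exp(-d_pos) + np.exp(-d_neg))

def ada_sampling(X_pos, X_unl, n_iter=10):
    """Wrapper-style adaptive sampling: iteratively (1) draw a
    pseudo-negative set from the unlabeled pool weighted by the current
    P(negative), (2) refit, (3) update the probabilities."""
    p_pos = np.full(len(X_unl), 0.5)              # equal prior for unlabeled
    for _ in range(n_iter):
        p_neg = 1.0 - p_pos
        idx = rng.choice(len(X_unl), size=len(X_pos), replace=False,
                         p=p_neg / p_neg.sum())   # favor likely negatives
        p_pos = centroid_proba(X_pos, X_unl[idx], X_unl)
    return p_pos

# Two well-separated Gaussian blobs: 20 labeled positives; the
# unlabeled pool mixes 30 hidden positives with 30 hidden negatives.
X_pos = rng.normal(loc=3.0, size=(20, 2))
X_unl = np.vstack([rng.normal(loc=3.0, size=(30, 2)),
                   rng.normal(loc=-3.0, size=(30, 2))])
p = ada_sampling(X_pos, X_unl)   # P(positive) for each unlabeled point
```

After a few iterations the hidden positives in the unlabeled pool receive high `p` and the hidden negatives low `p`, from which one or more pseudo-negative training sets can then be drawn, as the abstract outlines.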

IJCAI Conference 2017 Conference Paper

Theoretic Analysis and Extremely Easy Algorithms for Domain Adaptive Feature Learning

  • Wenhao Jiang
  • Cheng Deng
  • Wei Liu
  • Feiping Nie
  • Fu-lai Chung
  • Heng Huang

Domain adaptation problems arise in a variety of applications, where a training dataset from the source domain and a test dataset from the target domain typically follow different distributions. The primary difficulty in designing effective learning models to solve such problems lies in how to bridge the gap between the source and target distributions. In this paper, we provide a comprehensive analysis of feature learning algorithms used in conjunction with linear classifiers for domain adaptation. Our analysis shows that in order to achieve good adaptation performance, the second moments of the source domain distribution and target domain distribution should be similar. Based on our new analysis, a novel, extremely easy feature learning algorithm for domain adaptation is proposed. Furthermore, our algorithm is extended by leveraging multiple layers, leading to another feature learning algorithm. We evaluate the effectiveness of the proposed algorithms on domain adaptation tasks using the Amazon review and spam datasets from the ECML/PKDD 2006 discovery challenge.

IJCAI Conference 2016 Conference Paper

Coordinate Discrete Optimization for Efficient Cross-View Image Retrieval

  • Yadong Mu
  • Wei Liu
  • Cheng Deng
  • Zongting Lv
  • Xinbo Gao

Learning compact hash codes has been a vibrant research topic for large-scale similarity search owing to the low storage cost and expedited search operation. A recent research thrust aims to learn compact codes jointly from multiple sources, referred to as cross-view (or cross-modal) hashing in the literature. The main theme of this paper is to develop a novel formulation and optimization scheme for cross-view hashing. As a key differentiator, our proposed method directly conducts optimization on discrete binary hash codes, rather than on relaxed continuous variables as in existing cross-view hashing methods. In this way, relaxation-induced search accuracy loss can be avoided. We attack the cross-view hashing problem by simultaneously capturing semantic neighboring relations and maximizing the generative probability of the learned hash codes in each view. Specifically, to enable effective optimization on discrete hash codes, the optimization proceeds in a block coordinate descent fashion. Each iteration sequentially updates a single bit with the others clamped. We transform the resultant sub-problem into an equivalent, more tractable quadratic form and devise an active set based solver on the discrete codes. Rigorous theoretical analysis is provided for the convergence and local optimality condition. Comprehensive evaluations are conducted on three image benchmarks. The clearly superior experimental results demonstrate the merits of the proposed method.
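The single-bit update scheme this abstract mentions can be illustrated with a generic sketch: set each bit to whichever sign lowers some objective, with all other bits clamped. The toy objective ||rS − BBᵀ||² and every name below are illustrative assumptions, not the paper's actual cross-view formulation or its active-set per-bit solver:

```python
import numpy as np

def coordinate_descent_bits(B, objective, n_sweeps=5):
    """Block coordinate descent on binary codes: each step sets one bit
    to whichever sign yields the lower objective, others clamped, so
    the objective is non-increasing by construction."""
    n, r = B.shape
    for _ in range(n_sweeps):
        for i in range(n):
            for k in range(r):
                B[i, k] = 1.0
                f_pos = objective(B)
                B[i, k] = -1.0
                f_neg = objective(B)
                B[i, k] = 1.0 if f_pos <= f_neg else -1.0
    return B

rng = np.random.default_rng(1)
n, r = 8, 4
Z = rng.choice([-1.0, 1.0], size=(n, r))
S = (Z @ Z.T) / r                         # similarities realizable by some code
objective = lambda B: np.linalg.norm(r * S - B @ B.T) ** 2

B0 = rng.choice([-1.0, 1.0], size=(n, r))  # random initial codes
f0 = objective(B0)
B = coordinate_descent_bits(B0.copy(), objective)
```

Because each bit update evaluates both candidate signs, including the current one, every sweep can only keep or lower the objective, which is the basic monotonicity property such discrete coordinate schemes rely on.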

IJCAI Conference 2016 Conference Paper

Fast Structural Binary Coding

  • Dongjin Song
  • Wei Liu
  • David A. Meyer

Binary coding techniques, which compress originally high-dimensional data samples into short binary codes, are becoming increasingly popular due to their efficiency for information retrieval. Leveraging supervised information can dramatically enhance the coding quality, and hence improve search performance. There are few methods, however, that efficiently learn coding functions that optimize the precision at the top of the Hamming distance ranking list while approximately preserving the geometric relationships between database examples. In this paper, we propose a novel supervised binary coding approach, namely Fast Structural Binary Coding (FSBC), to optimize the precision at the top of a Hamming distance ranking list and ensure that similar images can be returned as a whole. The key idea is to train disciplined coding functions by optimizing a lower bound of the area under the ROC (Receiver Operating Characteristic) curve (AUC) and penalize this objective so that the geometric relationships between database examples in the original Euclidean space are approximately preserved in the Hamming space. To find such a coding function, we relax the original discrete optimization objective with a continuous surrogate, and then derive a stochastic gradient descent method to optimize the surrogate objective efficiently. Empirical studies based upon two image datasets demonstrate that the proposed binary coding approach achieves superior image search performance over the state-of-the-art.

AAAI Conference 2016 Conference Paper

Scalable Sequential Spectral Clustering

  • Yeqing Li
  • Junzhou Huang
  • Wei Liu

In the past decades, Spectral Clustering (SC) has become one of the most effective clustering approaches. Although it has been widely used, one significant drawback of SC is its expensive computation cost. Many efforts have been devoted to accelerating SC algorithms and promising results have been achieved. However, most of the existing algorithms rely on the assumption that data can be stored in the computer memory. When data cannot fit in the memory, these algorithms suffer severe performance degradation. In order to overcome this issue, we propose a novel sequential SC algorithm for tackling large-scale clustering with limited computational resources, e.g., memory. We begin by investigating an effective way of approximating the graph affinity matrix by leveraging a bipartite graph. Then we choose a smart graph construction and optimization strategy to avoid random access to data. These efforts lead to an efficient SC algorithm whose memory usage is independent of the number of input data points. Extensive experiments carried out on large datasets demonstrate that the proposed sequential SC algorithm is up to a thousand times faster than the state-of-the-art.
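The bipartite-graph approximation of the affinity matrix mentioned above can be sketched generically: connect each point only to a small set of anchors, so the full n×n affinity never has to be materialized. The Gaussian weights, row normalization, and the use of the small ZᵀZ matrix below are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def anchor_affinities(X, anchors, sigma=1.0):
    """Point-to-anchor weight matrix Z (n x m): Gaussian affinities to a
    few anchors, row-normalized. The implicit n x n graph affinity can
    then be taken as Z @ Z.T without ever forming it explicitly."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
    Z = np.exp(-d2 / (2.0 * sigma ** 2))
    return Z / Z.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),     # cluster 1
               rng.normal(4.0, 0.3, size=(20, 2))])    # cluster 2
anchors = X[rng.choice(len(X), size=6, replace=False)]
Z = anchor_affinities(X, anchors, sigma=0.5)

# Spectral embedding from the small 6 x 6 matrix Z^T Z, instead of
# eigendecomposing the full 40 x 40 affinity matrix.
evals, evecs = np.linalg.eigh(Z.T @ Z)
embedding = Z @ evecs[:, -2:]            # top-2 spectral coordinates
```

The memory point the abstract makes shows up here too: only the n×m matrix `Z` (and the tiny m×m `Z.T @ Z`) is ever stored, which is what makes a sequential, memory-bounded variant possible.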

AAAI Conference 2016 Conference Paper

Teaching-to-Learn and Learning-to-Teach for Multi-label Propagation

  • Chen Gong
  • Dacheng Tao
  • Jie Yang
  • Wei Liu

Multi-label propagation aims to transmit the multi-label information from labeled examples to unlabeled examples based on a weighted graph. Existing methods ignore the specific propagation difficulty of different unlabeled examples and conduct the propagation in an imperfect sequence, leading to the error-prone classification of some difficult examples with uncertain labels. To address this problem, this paper associates each possible label with a “teacher”, and proposes a “Multi-Label Teaching-to-Learn and Learning-to-Teach” (ML-TLLT) algorithm, so that the entire propagation process is guided by the teachers and manipulated from simple examples to more difficult ones. In the teaching-to-learn step, the teachers select the simplest examples for the current propagation by investigating both the definitiveness of each possible label of the unlabeled examples, and the dependencies between labels revealed by the labeled examples. In the learning-to-teach step, the teachers reversely learn from the learner’s feedback to properly select the simplest examples for the next propagation. Thorough empirical studies show that due to the optimized propagation sequence designed by the teachers, ML-TLLT yields generally better performance than seven state-of-the-art methods on the typical multi-label benchmark datasets.

AAAI Conference 2016 Conference Paper

Towards Optimal Binary Code Learning via Ordinal Embedding

  • Hong Liu
  • Rongrong Ji
  • Yongjian Wu
  • Wei Liu

Binary code learning, a.k.a. hashing, has recently become popular due to its high efficiency in large-scale similarity search and recognition. It typically maps high-dimensional data points to binary codes, where data similarity can be efficiently computed via rapid Hamming distance. Most existing unsupervised hashing schemes pursue binary codes by reducing the quantization error from an original real-valued data space to a resulting Hamming space. On the other hand, most existing supervised hashing schemes constrain binary code learning to correlate with pairwise similarity labels. However, few methods consider ordinal relations in the binary code learning process, which serve as a very significant cue to learn the optimal binary codes for similarity search. In this paper, we propose a novel hashing scheme, dubbed Ordinal Embedding Hashing (OEH), which embeds given ordinal relations among data points to learn ranking-preserving binary codes. The core idea is to construct a directed unweighted graph to capture the ordinal relations, and then train the hash functions using this ordinal graph to preserve the permutation relations in the Hamming space. To learn such hash functions effectively, we further relax the discrete constraints and design a stochastic gradient descent algorithm to obtain the optimal solution. Experimental results on two large-scale benchmark datasets demonstrate that the proposed OEH method can achieve superior performance over state-of-the-art approaches. Finally, an evaluation on a query-by-humming dataset demonstrates that OEH also performs well for music retrieval using a user’s humming or singing.

IJCAI Conference 2016 Conference Paper

Visual Tracking with Reliable Memories

  • Shu Wang
  • Shaoting Zhang
  • Wei Liu
  • Dimitris N. Metaxas

In this paper, we propose a novel visual tracking framework that intelligently discovers reliable patterns from a wide range of video to resist drift error in long-term tracking tasks. First, we design a Discrete Fourier Transform (DFT) based tracker which is able to exploit a large number of tracked samples while still ensuring real-time performance. Second, we propose a clustering method with temporal constraints to explore and memorize consistent patterns from previous frames, named reliable memories. By virtue of this method, our tracker can utilize uncontaminated information to alleviate drifting issues. Experimental results show that our tracker performs favorably against other state-of-the-art methods on benchmark datasets. Furthermore, it is notably effective in handling drift and able to robustly track challenging long videos of over 4,000 frames, while most of the others lose track in early frames.

AAAI Conference 2015 Conference Paper

Actionable Combined High Utility Itemset Mining

  • Jingyu Shao
  • Junfu Yin
  • Wei Liu
  • Longbing Cao

The itemsets discovered by traditional High Utility Itemsets Mining (HUIM) methods are more useful than frequent itemset mining outcomes; however, they are usually disordered, not actionable, and sometimes accidental, because utility is the only criterion and no relations among itemsets are considered. In this paper, we introduce the concept of combined mining to select combined itemsets that are not only of high utility and high frequency, but also involve relations between itemsets. An effective method for mining such actionable combined high utility itemsets is proposed. The experimental results are promising compared to those from a traditional HUIM algorithm (UP-Growth).

AAAI Conference 2015 Conference Paper

Coupled Collaborative Filtering for Context-aware Recommendation

  • Xinxin Jiang
  • Wei Liu
  • Longbing Cao
  • Guodong Long

Context-aware features have been widely recognized as important factors in recommender systems. However, as a major technique in recommender systems, traditional Collaborative Filtering (CF) does not provide a straightforward way of integrating context-aware information into personal recommendation. We propose a Coupled Collaborative Filtering (CCF) model to measure the contextual information and use it to improve recommendations. In the proposed approach, a coupled similarity is computed from the inter-item, intra-context, and inter-context interactions among item, user, and context-aware factors. Experiments based on different types of CF models demonstrate the effectiveness of our design.

AAAI Conference 2015 Conference Paper

Low-Rank Similarity Metric Learning in High Dimensions

  • Wei Liu
  • Cun Mu
  • Rongrong Ji
  • Shiqian Ma
  • John Smith
  • Shih-Fu Chang

Metric learning has become a widely used tool in machine learning. To reduce the expensive costs brought by increasing dimensionality, low-rank metric learning has arisen, as it can be more economical in storage and computation. However, existing low-rank metric learning algorithms usually adopt nonconvex objectives, and are hence sensitive to the choice of a heuristic low-rank basis. In this paper, we propose a novel low-rank metric learning algorithm to yield bilinear similarity functions. This algorithm scales linearly with input dimensionality in both space and time, and is therefore applicable to high-dimensional data domains. A convex objective free of heuristics is formulated by leveraging trace norm regularization to promote low-rankness. Crucially, we prove that all globally optimal metric solutions must retain a certain low-rank structure, which enables our algorithm to decompose the high-dimensional learning task into two steps: an SVD-based projection and a metric learning problem with reduced dimensionality. The latter step can be tackled efficiently by employing a linearized Alternating Direction Method of Multipliers. The efficacy of the proposed algorithm is demonstrated through experiments performed on four benchmark datasets with tens of thousands of dimensions.

IJCAI Conference 2015 Conference Paper

Modeling Inter- and Intra-Part Deformations for Object Structure Parsing

  • Ling Cai
  • Rongrong Ji
  • Wei Liu
  • Gang Hua

Part deformation has been a longstanding challenge for object parsing, of which the primary difficulty lies in modeling the highly diverse object structures. To this end, we propose a novel structure parsing model to capture deformable object structures. The proposed model consists of two deformable layers: the top layer is an undirected graph that incorporates inter-part deformations to infer object structures; the base layer is consisted of various independent nodes to characterize local intra-part deformations. To learn this two-layer model, we design a layer-wise learning algorithm, which employs matching pursuit and belief propagation for a low computational complexity inference. Specifically, active basis sparse coding is leveraged to build the nodes at the base layer, while the edge weights are estimated by a structural support vector machine. Experimental results on two benchmark datasets (i. e. , faces and horses) demonstrate that the proposed model yields superior parsing performance over state-of-the-art models.

IJCAI Conference 2015 Conference Paper

Multi-View Matrix Decomposition: A New Scheme for Exploring Discriminative Information

  • Cheng Deng
  • Zongting Lv
  • Wei Liu
  • Junzhou Huang
  • Dacheng Tao
  • Xinbo Gao

Recent studies have demonstrated the advantages of fusing information from multiple views for various machine learning applications. However, most existing approaches assume a shared component common to all views and ignore the private components of individual views, which restricts the learning performance. In this paper, we propose a new multi-view, low-rank, and sparse matrix decomposition scheme to seamlessly integrate diverse yet complementary information stemming from multiple views. Unlike previous approaches, our approach decomposes an input data matrix concatenated from multiple views into the sum of low-rank, sparse, and noisy parts. A unified optimization framework is then established, where low-rankness and group-structured sparsity constraints are imposed to simultaneously capture the shared and private components at both the instance and view levels. A provably convergent optimization algorithm is developed to solve this problem, yielding a learned augmented representation that is used as features for classification tasks. Extensive experiments conducted on six benchmark image datasets show that our approach enjoys superior performance over the state-of-the-art approaches.

AAAI Conference 2015 Conference Paper

Optimizing Bag Features for Multiple-Instance Retrieval

  • Zhouyu Fu
  • Feifei Pan
  • Cheng Deng
  • Wei Liu

Multiple-Instance (MI) learning is an important supervised learning technique which deals with collections of instances called bags. While existing research in MI learning has mainly focused on classification, in this paper we propose a new approach for MI retrieval to enable effective similarity retrieval of bags of instances, where training data are presented in the form of similar and dissimilar bag pairs. An embedding scheme is devised that encodes each bag into a single bag-level feature vector by exploiting a similarity-based transformation. In this way, the original MI problem is converted into a single-instance version. Furthermore, we develop a principled approach for optimizing bag features specifically for similarity retrieval by leveraging pairwise label information at the bag level. The experimental results demonstrate the effectiveness of the proposed approach in comparison with the alternatives for MI retrieval.

AAAI Conference 2015 Conference Paper

Refer-to-as Relations as Semantic Knowledge

  • Song Feng
  • Sujith Ravi
  • Ravi Kumar
  • Polina Kuznetsova
  • Wei Liu
  • Alexander Berg
  • Tamara Berg
  • Yejin Choi

We study Refer-to-as relations as a new type of semantic knowledge. Compared to the much studied Is-a relation, which concerns factual taxonomic knowledge, Refer-to-as relations aim to address pragmatic semantic knowledge. For example, a “penguin” is a “bird” from a taxonomic point of view, but people rarely refer to a “penguin” as a “bird” in vernacular use. This observation closely relates to the entry-level categorization studied in Psychology. We posit that Refer-to-as relations can be learned from data, and that both textual and visual information would be helpful in inferring the relations. By integrating existing lexical structure knowledge with language statistics and visual similarities, we formulate a collective inference approach to map all object names in an encyclopedia to commonly used names for each object. Our contributions include a new labeled data set, the collective inference and optimization approach, and the computed mappings and similarities.

TIST Journal 2015 Journal Article

When Location Meets Social Multimedia

  • Rongrong Ji
  • Yue Gao
  • Wei Liu
  • Xing Xie
  • Qi Tian
  • Xuelong Li

With the growing popularity of multimedia-sharing platforms such as Facebook and Flickr, recent years have witnessed an explosive growth of geographical tags on social multimedia content. This trend enables a wide variety of emerging applications, for example mobile location search, landmark recognition, scene reconstruction, and tourism recommendation, which range from pure research prototypes to commercial systems. In this article, we give a comprehensive survey of these applications, covering recent advances in the recognition and mining of geography-aware social multimedia. We review related work from the past decade on location recognition, scene summarization, tourism suggestion, 3D building modeling, mobile visual search, and city navigation. Finally, we discuss potential challenges, future topics, and open issues related to geo-social multimedia computing, recognition, mining, and analytics.

NeurIPS Conference 2014 Conference Paper

Discrete Graph Hashing

  • Wei Liu
  • Cun Mu
  • Sanjiv Kumar
  • Shih-Fu Chang

Hashing has emerged as a popular technique for fast nearest neighbor search in gigantic databases. In particular, learning-based hashing has received considerable attention due to its appealing storage and search efficiency. However, the performance of most unsupervised learning-based hashing methods deteriorates rapidly as the hash code length increases. We argue that the degraded performance is due to inferior optimization procedures used to achieve discrete binary codes. This paper presents a graph-based unsupervised hashing model to preserve the neighborhood structure of massive data in a discrete code space. We cast the graph hashing problem into a discrete optimization framework that directly learns the binary codes. A tractable alternating maximization algorithm is then proposed to explicitly deal with the discrete constraints, yielding high-quality codes that capture the local neighborhoods well. Extensive experiments performed on four large datasets with up to one million samples show that our discrete optimization based graph hashing method obtains superior search accuracy over state-of-the-art unsupervised hashing methods, especially for longer codes.
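The idea of alternating maximization over discrete codes can be illustrated with a toy sign-update loop. This is a simplified sketch of the general technique, not the paper's algorithm (which operates on an anchor graph with balance and decorrelation constraints); `discrete_graph_hash` and the plain RBF similarity graph are assumptions for the example.

```python
import numpy as np

def discrete_graph_hash(S, n_bits, n_iter=30, seed=0):
    """Toy alternating maximization of tr(B^T S B) over binary codes
    B in {-1, +1}^(n x n_bits): repeatedly replace the codes by the
    sign of their graph-smoothed version until a fixed point."""
    rng = np.random.default_rng(seed)
    B = np.sign(rng.normal(size=(S.shape[0], n_bits)))
    for _ in range(n_iter):
        B_new = np.sign(S @ B)             # discrete update, no relaxation
        B_new[B_new == 0] = 1              # break ties deterministically
        if np.array_equal(B_new, B):
            break                          # converged to a fixed point
        B = B_new
    return B

pts = np.random.default_rng(1).normal(size=(8, 2))
d2 = ((pts[:, None] - pts[None]) ** 2).sum(-1)
S = np.exp(-d2)                            # RBF similarity graph
B = discrete_graph_hash(S, n_bits=4)
print(B.shape)                             # (8, 4), entries in {-1, +1}
```

The point of such updates is exactly the abstract's argument: the binary constraint is handled directly in the loop rather than by relaxing to continuous codes and thresholding afterwards.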

ICRA Conference 2014 Conference Paper

Salient region detection based on local and global saliency

  • Peng Wang 0024
  • Zhi Zhou
  • Wei Liu
  • Hong Qiao

A new and effective salient-region detection method based on local and global saliency information is proposed. To preserve the completeness of salient regions, the input image is first segmented into several regions. For each region, local and global saliency are then computed: the local saliency by multi-scale neighborhood contrast, and the global saliency from the global spatial distribution and inter-region isolation of features. The final saliency is obtained as a weighted combination of the two. Comparison experiments demonstrate the effective performance of the proposed algorithm on salient region detection.
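The final combination step admits a very small sketch. The per-region local and global scores are taken as given (the paper computes them from contrast and spatial statistics); the function name `combine_saliency`, the weight `w`, and the min-max normalization are illustrative assumptions.

```python
import numpy as np

def combine_saliency(local_s, global_s, w=0.5):
    """Weighted combination of per-region local and global saliency,
    min-max normalized to [0, 1]."""
    s = w * local_s + (1.0 - w) * global_s
    lo, hi = s.min(), s.max()
    return (s - lo) / (hi - lo) if hi > lo else np.zeros_like(s)

local_s = np.array([0.2, 0.9, 0.4])    # e.g. multi-scale contrast per region
global_s = np.array([0.1, 0.8, 0.7])   # e.g. spatial spread / isolation
print(combine_saliency(local_s, global_s, w=0.6))
```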

AAAI Conference 2014 Conference Paper

Sub-Selective Quantization for Large-Scale Image Search

  • Yeqing Li
  • Chen Chen
  • Wei Liu
  • Junzhou Huang

Recently, with the explosive growth of visual content on the Internet, large-scale image search has attracted intensive attention. It has been shown that mapping high-dimensional image descriptors to compact binary codes can lead to considerable efficiency gains in both the storage and similarity computation of images. However, most existing methods still suffer from the expensive training devoted to large-scale binary code learning. To address this issue, we propose a sub-selection-based matrix manipulation algorithm that can significantly reduce the computational cost of code learning. As case studies, we apply the sub-selection algorithm to two popular quantization techniques: PCA Quantization (PCAQ) and Iterative Quantization (ITQ). Crucially, we can justify the resulting sub-selective quantization by proving its theoretical properties. Extensive experiments are carried out on three image benchmarks with up to one million samples, corroborating the efficacy of the sub-selective quantization method in image retrieval.
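The sub-selection idea applied to PCAQ can be sketched in a few lines: learn the projection from a random row subset, then binarize all points with it. This is a hedged illustration of the general recipe, not the paper's provably justified algorithm; `pcaq_codes` and the uniform row sampling are assumptions for the example.

```python
import numpy as np

def pcaq_codes(X, n_bits, sample_frac=0.1, seed=0):
    """PCA Quantization with sub-selection: fit the PCA projection on a
    random subset of rows, then binarize the full data by sign thresholding."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=max(n_bits, int(sample_frac * n)), replace=False)
    sub = X[idx] - X[idx].mean(0)            # train projection on subset only
    _, _, Vt = np.linalg.svd(sub, full_matrices=False)
    proj = (X - X.mean(0)) @ Vt[:n_bits].T   # project every point
    return (proj > 0).astype(np.uint8)       # compact binary codes

X = np.random.default_rng(1).normal(size=(1000, 32))
codes = pcaq_codes(X, n_bits=8)
print(codes.shape)   # (1000, 8)
```

The saving is that the SVD runs on roughly `sample_frac * n` rows instead of all `n`, which is where the reduced training cost the abstract claims comes from.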

NeurIPS Conference 2014 Conference Paper

Zeta Hull Pursuits: Learning Nonconvex Data Hulls

  • Yuanjun Xiong
  • Wei Liu
  • Deli Zhao
  • Xiaoou Tang

Selecting a small informative subset from a given dataset, also called column sampling, has drawn much attention in machine learning. For incorporating structured data information into column sampling, research efforts have been devoted to the cases where data points are fitted with clusters, simplices, or general convex hulls. This paper aims to study nonconvex hull learning, which has rarely been investigated in the literature. In order to learn data-adaptive nonconvex hulls, a novel approach is proposed based on a graph-theoretic measure that leverages graph cycles to characterize the structural complexity of input data points. Employing this measure, we present a greedy algorithmic framework, dubbed Zeta Hulls, to perform structured column sampling. The process of pursuing a Zeta hull involves the computation of a matrix inverse. To accelerate this matrix inversion and reduce its space complexity, we exploit a low-rank approximation to the graph adjacency matrix using an efficient anchor graph technique. Extensive experimental results show that data representations learned by Zeta Hulls achieve state-of-the-art accuracy in text and image classification tasks.

IJCAI Conference 2013 Conference Paper

Semi-Supervised Learning with Manifold Fitted Graphs

  • Tongtao Zhang
  • Rongrong Ji
  • Wei Liu
  • Dacheng Tao
  • Gang Hua

In this paper, we propose a locality-constrained and sparsity-encouraged manifold fitting approach, aiming to capture the locally sparse manifold structure in neighborhood graph construction by exploiting a principled optimization model. The proposed model formulates neighborhood graph construction as a sparse coding problem with a locality constraint, thereby achieving simultaneous neighbor selection and edge weight optimization. The core idea underlying our model is to perform a sparse manifold fitting task for each data point, so that close-by points lying on the same local manifold are automatically chosen to connect, while the connection weights are acquired by simple geometric reconstruction. We term the novel neighborhood graph generated by our optimization model the M-Fitted Graph, since such a graph stems from sparse manifold fitting. To evaluate the robustness and effectiveness of M-fitted graphs, we use graph-based semi-supervised learning as the testbed. Extensive experiments on six benchmark datasets validate that the proposed M-fitted graph is superior to state-of-the-art neighborhood graphs in terms of classification accuracy with popular graph-based semi-supervised learning methods.

AAAI Conference 2010 Conference Paper

Constrained Metric Learning Via Distance Gap Maximization

  • Wei Liu
  • Xinmei Tian
  • Dacheng Tao
  • Jianzhuang Liu

Vectored data frequently occur in a variety of fields and are easy to handle, since they can be mathematically abstracted as points residing in a Euclidean space. An appropriate distance metric in the data space is in great demand for a large number of applications. In this paper, we pose robust and tractable metric learning under pairwise constraints that are expressed as similarity judgements between data pairs. The major features of our approach are: 1) it maximizes the gap between the average squared distance among dissimilar pairs and the average squared distance among similar pairs; 2) it is capable of propagating similarity constraints to all data pairs; and 3) it is easy to implement, in contrast to existing approaches that rely on expensive optimization such as semidefinite programming. Our constrained metric learning approach is widely applicable without being limited to particular problem settings. Quantitative experiments on classification and retrieval tasks demonstrate the effectiveness of the proposed approach.
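The gap criterion in feature 1) has a simple closed-form sketch: take the difference of the average dissimilar-pair and similar-pair scatter matrices, then project onto the positive-semidefinite cone so the result is a valid Mahalanobis metric. This is an illustrative construction under those assumptions, not the paper's full algorithm; `gap_metric` is a hypothetical name.

```python
import numpy as np

def gap_metric(X, similar, dissimilar):
    """Metric M favoring a large gap: average dissimilar-pair scatter
    minus average similar-pair scatter, projected onto the PSD cone."""
    def scatter(pairs):
        D = np.array([X[i] - X[j] for i, j in pairs])
        return D.T @ D / len(pairs)               # mean outer product
    C = scatter(dissimilar) - scatter(similar)    # the distance "gap" direction
    w, V = np.linalg.eigh(C)
    return (V * np.clip(w, 0, None)) @ V.T        # drop negative eigenvalues

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 4))
M = gap_metric(X, similar=[(0, 1), (2, 3)], dissimilar=[(0, 5), (1, 7)])
print(np.linalg.eigvalsh(M).min() >= -1e-9)       # M is positive semidefinite
```

With such an `M`, the squared distance `(x - y).T @ M @ (x - y)` tends to be larger for dissimilar pairs than for similar ones, which is the gap the abstract maximizes.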

IJCAI Conference 2009 Conference Paper

  • Wei Liu
  • Buyue Qian
  • Jingyu Cui
  • Jianzhuang Liu

Typical graph-theoretic approaches for semi-supervised classification infer the labels of unlabeled instances with the help of graph Laplacians. Founded on the spectral decomposition of the graph Laplacian, this paper learns a kernel matrix by minimizing the leave-one-out classification error on the labeled instances. To this end, an efficient algorithm based on linear programming is presented, resulting in a transductive spectral kernel. The idea of our algorithm stems from regularization methodology and also has a nice interpretation in terms of spectral clustering. A simple classifier can be readily built upon the learned kernel, which suffices to give predictions for any data point beyond those in the available dataset. Besides this usage, the spectral kernel can be used effectively in tandem with conventional kernel machines such as SVMs. We demonstrate the efficacy of the proposed algorithm through experiments on challenging classification tasks.
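The "kernel from the spectral decomposition of the graph Laplacian" construction can be sketched directly: keep the smoothest Laplacian eigenvectors and reweight them. Here the weights follow a fixed decay for illustration, whereas the paper instead optimizes them against the leave-one-out error via linear programming; `spectral_kernel` and the decay form are assumptions.

```python
import numpy as np

def spectral_kernel(W, r=5, decay=2.0):
    """Kernel built from the r smoothest eigenvectors of the graph
    Laplacian, with larger weight on smoother (low-eigenvalue) ones."""
    L = np.diag(W.sum(1)) - W              # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    mu = 1.0 / (1.0 + decay * vals[:r])    # fixed decaying spectral weights
    U = vecs[:, :r]
    return (U * mu) @ U.T                  # K = U diag(mu) U^T

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], float)        # toy adjacency matrix
K = spectral_kernel(W, r=3)
print(np.allclose(K, K.T))                 # symmetric PSD kernel matrix
```

Such a `K` can be handed to any kernel machine (e.g. an SVM), which matches the "in tandem with conventional kernel machines" usage the abstract mentions.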

IJCAI Conference 2007 Conference Paper

  • Wei Liu
  • Xiaoou Tang
  • Jianzhuang Liu

This paper develops a statistical inference approach, Bayesian Tensor Inference, for style transformation between photo images and sketch images of human faces. Motivated by the rationale that image appearance is determined by two cooperating factors, image content and image style, we first model the interaction between these factors by learning a patch-based tensor model. Second, by introducing a common variation space, we capture the inherent connection between the photo-patch space and the sketch-patch space, thus building a bidirectional mapping between the two spaces. Subsequently, we formulate a Bayesian approach accounting for the statistical inference from sketches to their corresponding photos in terms of the learned tensor model. Comparative experiments contrast the proposed method with state-of-the-art facial sketch synthesis algorithms in a novel face hallucination scenario: sketch-based facial photo hallucination. The encouraging results convincingly validate the effectiveness of our method.