Arrow Research search

Author name cluster

Bin Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

104 papers
2 author rows

Possible papers

104

AAAI Conference 2026 Conference Paper

Adaptive Morph-Patch Transformer for Aortic Vessel Segmentation

  • Zhenxi Zhang
  • Fuchen Zheng
  • Adnan Iltaf
  • Yifei Han
  • Zhenyu Cheng
  • Yue Du
  • Bin Li
  • Tianyong Liu

Accurate segmentation of aortic vascular structures is critical for diagnosing and treating cardiovascular diseases. Traditional Transformer-based models have shown promise in this domain by capturing long-range dependencies between vascular features. However, their reliance on fixed-size rectangular patches often compromises the integrity of complex vascular structures, leading to suboptimal segmentation accuracy. To address this challenge, we propose the adaptive Morph-Patch Transformer (MPT), a novel architecture specifically designed for aortic vascular segmentation. Specifically, MPT introduces an adaptive patch partitioning strategy that dynamically generates morphology-aware patches aligned with complex vascular structures, preserving the semantic integrity of those structures within individual patches. Moreover, a Semantic Clustering Attention (SCA) method is proposed to dynamically aggregate features from patches with similar semantic characteristics, enhancing the model's capability to segment vessels of varying sizes while preserving the integrity of vascular structures. Extensive experiments on three open-source datasets (AVT, AortaSeg24, and TBAD) demonstrate that MPT achieves state-of-the-art performance, with notable improvements in segmenting intricate vascular structures.

AAAI Conference 2026 Conference Paper

E-MaT: Event-oriented Mamba for Egocentric Point Tracking

  • Han Han
  • Wei Zhai
  • Baocai Yin
  • Yang Cao
  • Bin Li
  • Zheng-Jun Zha

Egocentric point tracking aims to localize points on object surfaces from a first-person perspective and serves as a critical step toward embodied intelligence. Recent methods rely on video input, tracking query points through feature matching across consecutive frames. However, these methods struggle in highly dynamic settings—a common challenge in first-person perspectives, where the head-mounted camera undergoes frequent and abrupt rotations, resulting in high angular velocities, motion blur, and large inter-frame displacements. In contrast, event cameras capture motion at microsecond temporal resolution, naturally avoiding blur and delivering low-latency, high-fidelity cues crucial for egocentric point tracking. Moreover, rapid egocentric motion disrupts local smoothness, breaking the assumption that spatially adjacent regions share similar motion. Event dynamics expose global motion trends, guiding coherent modeling and consistent feature flow. Therefore, this paper proposes a Mamba-based tracking framework that constructs feature modeling paths aligned with the dominant motion trend extracted from events, and modulates feature propagation along these paths based on local motion intensity, enhancing stability by suppressing unreliable signals and emphasizing consistent cues. Additionally, a motion-adaptive suppression module enhances temporal robustness by adaptively suppressing correlation features based on motion intensity variations, mitigating the effects of intensity fluctuations and partial observability. To facilitate research in this domain, a multimodal dataset named DVS-EgoPoints with both events and videos for egocentric point tracking is collected. Experiments on the DVS-EgoPoints dataset and a simulation benchmark demonstrate superior performance over state-of-the-art methods, especially under challenging motion and occlusion conditions.

AAAI Conference 2026 Conference Paper

From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation

  • Ke Niu
  • Haiyang Yu
  • Zhuofan Chen
  • Mengyang Zhao
  • Teng Fu
  • Bin Li
  • Xiangyang Xue

Computer-Aided Design (CAD) plays a vital role in engineering and manufacturing, yet current CAD workflows require extensive domain expertise and manual modeling effort. Recent advances in large language models (LLMs) have made it possible to generate code from natural language, opening new opportunities for automating parametric 3D modeling. However, directly translating human design intent into executable CAD code remains highly challenging, due to the need for logical reasoning, syntactic correctness, and numerical precision. In this work, we propose CAD-RL, a multimodal Chain-of-Thought (CoT) guided reinforcement learning post-training framework for CAD modeling code generation. Our method combines a CoT-based cold start with goal-driven reinforcement learning post-training using three task-specific rewards: an executability reward, a geometric accuracy reward, and an external evaluation reward. To ensure stable policy learning under sparse and high-variance reward conditions, we introduce three targeted optimization strategies: Trust Region Stretch for improved exploration, Precision Token Loss for enhanced dimensional parameter accuracy, and Overlong Filtering to reduce noisy supervision. To support training and benchmarking, we release ExeCAD, a novel dataset comprising 16,540 real-world CAD examples with paired natural language and structured design language descriptions, executable CADQuery scripts, and rendered 3D models. Experiments demonstrate that CAD-RL achieves significant improvements in reasoning quality, output precision, and code executability over existing VLMs.

AAAI Conference 2026 Conference Paper

OmniPT: Unleashing the Potential of Large Vision Language Models for Pedestrian Tracking and Understanding

  • Teng Fu
  • Mengyang Zhao
  • Ke Niu
  • Kaixin Peng
  • Bin Li

LVLMs have been shown to perform excellently in image-level tasks such as VQA and captioning. However, in many instance-level tasks, such as visual grounding and object detection, LVLMs still show performance gaps compared to previous expert models. Meanwhile, although pedestrian tracking is a classical task, a number of new topics combining object tracking and natural language have emerged, such as Referring MOT, Cross-view Referring MOT, and Semantic MOT. These tasks require models to understand the tracked object at an advanced semantic level, which is exactly where LVLMs excel. In this paper, we propose a new unified pedestrian tracking framework, namely OmniPT, which can track, track based on a reference, and interactively generate semantic understanding of tracked objects. We address two issues: how to model tracking as a task that foundation models can perform, and how to make the model output formatted answers. To this end, we implement a training pipeline consisting of RL, mid-training, SFT, and a final RL phase. Based on the pre-trained weights of the LVLM, we first perform a simple RL phase to enable the model to output a fixed, supervisable bounding-box format. Subsequently, we conduct a mid-training phase using a large number of pedestrian-related datasets. Finally, we perform supervised fine-tuning on several pedestrian tracking datasets, and then carry out another RL phase to improve the model's tracking performance and its ability to follow instructions. Experiments on tracking benchmarks demonstrate that the proposed method outperforms previous methods.

AAAI Conference 2026 Conference Paper

SCoUT: A Framework for Structured Stereotype Analysis in Language Models

  • Jinxuan Wu
  • Bin Li
  • Xiangyang Xue

Existing stereotype auditing methods for Large Language Models (LLMs) typically rely on isolated rating schemes or task-specific probes, lacking theoretical grounding and failing to reveal internal organization beyond surface-level output patterns. In this paper, we introduce SCoUT (Stereotype Content-oriented Utility structure via Thurstonian modeling), a closed-loop framework that structurally models, explicitly probes, and functionally steers stereotype dimensions (warmth and competence) in LLMs. SCoUT first reconstructs a global stereotype utility structure aligned with Stereotype Content Model theory via Thurstonian comparative judgments. Across multiple open-source LLMs, this modeling achieves high pairwise-preference prediction accuracy (≥ 0.90 on larger-scale models) and exhibits strong cross-model consistency. Probing internal attention mechanisms localizes this structure to specific heads (Spearman’s ρ up to 0.83 for warmth and 0.90 for competence) and surfaces a salient asymmetry between warmth and competence. Further, targeted inference-time activation modifications on these dimension-sensitive heads consistently steer model outputs along the intended axes. By bridging behavioral measurement with internal representation and controllable steering, SCoUT offers an end-to-end framework that uncovers and interprets the latent structure of stereotypes, advancing stereotype auditing from surface detection to structural analysis.
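
The Thurstonian comparative-judgment step described above can be illustrated with a minimal sketch. Under Thurstone's Case V model, each item carries a latent utility and the probability that one item is preferred over another is the normal CDF of the utility difference; the utilities and group labels below are purely illustrative, not the paper's fitted values or actual procedure.

```python
import math

def thurstone_pref(u_i: float, u_j: float) -> float:
    """Thurstone Case V: P(item i preferred over item j) = Phi(u_i - u_j)."""
    return 0.5 * (1.0 + math.erf((u_i - u_j) / math.sqrt(2.0)))

# Hypothetical warmth utilities for three items (illustrative only).
utilities = {"A": 1.2, "B": 0.3, "C": -0.5}

# Predict pairwise preferences from the latent utilities.
p_ab = thurstone_pref(utilities["A"], utilities["B"])  # > 0.5: A rated warmer than B
p_cb = thurstone_pref(utilities["C"], utilities["B"])  # < 0.5: C rated colder than B
```

Fitting such utilities to observed pairwise judgments (and checking pairwise-preference prediction accuracy, as the abstract reports) would invert this mapping, e.g. by maximum likelihood over all observed comparisons.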

EAAI Journal 2025 Journal Article

A unified region and concept-level explainable artificial intelligence method for explainability and active learning of defect segmentation model

  • Huangyuan Wu
  • Bin Li
  • Lianfang Tian
  • Chao Dong
  • Wenzhi Liao

Objective: Although Artificial Intelligence (AI) methods have achieved great progress in defect segmentation tasks, their explainability remains a challenge due to their black-box nature. To guarantee that prediction results can be understood and trusted by users, recent works attempt to explain the model's decision process through Explainable Artificial Intelligence (XAI) methods. Challenges: However, existing XAI methods still have some limitations: (1) they explain model decisions from only a single perspective, which usually introduces biased explanations; (2) few works consider how to leverage the explanation mechanism of XAI methods to guide the model's active learning process, which limits the application of XAI methods. Methods: To address these issues, a unified region-level and concept-level explainable AI (RC-XAI) framework is proposed for the explainability and active learning of defect segmentation models. Novelty: First, RC-XAI incorporates region-level and concept-level explainers in a collaborative manner to provide comprehensive explanations of model decisions, enhancing the reliability and robustness of the explanations. Second, RC-XAI introduces an explainability-driven representative sample selection (ED-RSS) module to guide the model's active learning process and improve its final performance. Findings: Experimental results on three challenging datasets demonstrate the effectiveness and generalization of the proposed RC-XAI method. It provides better and more comprehensive explainability than other XAI methods, and further experiments demonstrate the potential of applying its explanation mechanism to the active learning of defect segmentation models.

EAAI Journal 2025 Journal Article

An evolutionary multitasking algorithm for multi-objective feature selection using dual-perspective reduction

  • Mengyue Wang
  • Hongwei Ge
  • Xia Wang
  • Liang Sun
  • Yaqing Hou
  • Bin Li

Feature selection inherently involves two conflicting objectives: minimizing the number of selected features and maximizing the classification accuracy. The exponential growth of the search space and complex interactions between features make high-dimensional feature selection challenging. Existing multi-objective methods suffer from slow convergence and limited search capabilities. Moreover, there is a lack of efficient methods for identifying feature subsets with equivalent objective values, which could offer diverse options. To address these issues, this paper proposes an evolutionary multitasking algorithm for multi-objective feature selection using dual-perspective reduction, called DREA-FS. First, a dual-perspective dimensionality reduction strategy is designed to generate simplified and complementary tasks through improved filter-based and group-based methods, facilitating the rapid identification of promising regions. To enable effective information sharing, a dual-archive multitasking optimization mechanism is proposed, which incorporates a diversity archive to preserve feature subsets with equivalent performance and maintain diversity. Coupled with an elite archive that offers convergence guidance, this mechanism achieves a balance between convergence and diversity across tasks, thereby enhancing the ability to search for equivalent feature subsets. Experimental results on 21 datasets demonstrate that the proposed method outperforms state-of-the-art multi-objective algorithms in classification performance. Besides, DREA-FS can identify different feature subsets with equivalent objective values, supporting decision-makers with diverse options and better interpretability.

NeurIPS Conference 2025 Conference Paper

AttentionPredictor: Temporal Patterns Matter for KV Cache Compression

  • Qingyue Yang
  • Jie Wang
  • Xing Li
  • Zhihai Wang
  • Chen Chen
  • Lei Chen
  • Xianzhi Yu
  • Wulong Liu

With the development of large language models (LLMs), efficient inference through Key-Value (KV) cache compression has attracted considerable attention, especially for long-context generation. To compress the KV cache, recent methods identify critical KV tokens through static modeling of attention scores. However, these methods often struggle to accurately determine critical tokens as they neglect the *temporal patterns* in attention scores, resulting in a noticeable degradation in LLM performance. To address this challenge, we propose **AttentionPredictor**, which is the **first learning-based method to directly predict attention patterns for KV cache compression and critical token identification**. Specifically, AttentionPredictor learns a lightweight, unified convolution model to dynamically capture spatiotemporal patterns and predict the next-token attention scores. An appealing feature of AttentionPredictor is that it accurately predicts the attention score and shares the unified prediction model, which consumes negligible memory, among all transformer layers. Moreover, we propose a cross-token critical cache prefetching framework that hides the token estimation time overhead to accelerate the decoding stage. By retaining most of the attention information, AttentionPredictor achieves **13$\times$** KV cache compression and **5.6$\times$** speedup in a cache offloading scenario with comparable LLM performance, significantly outperforming state-of-the-art methods. The code is available at https://github.com/MIRALab-USTC/LLM-AttentionPredictor.
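
The core idea of predicting next-token attention from its recent temporal history can be sketched in a few lines. This is an illustrative toy, not the paper's model: the window length, kernel weights, and per-token weighted sum stand in for the learned convolution the abstract describes.

```python
import numpy as np

def predict_next_attention(history: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Predict next-step attention scores as a temporal weighted sum
    over the last len(kernel) steps of the score history.

    history: (T, n_tokens) attention scores from past decoding steps.
    kernel:  (k,) temporal weights (hypothetical values, not learned here).
    """
    k = len(kernel)
    window = history[-k:]            # most recent k steps, shape (k, n_tokens)
    pred = kernel @ window           # weighted sum over time -> (n_tokens,)
    pred = np.clip(pred, 0.0, None)  # attention scores are non-negative
    return pred / pred.sum()         # renormalize to a distribution

# Toy history: attention drifting away from the first token over 4 steps.
history = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.2, 0.4, 0.4],
])
kernel = np.array([0.1, 0.3, 0.6])   # weight recent steps more heavily
pred = predict_next_attention(history, kernel)
# Tokens scoring highest under `pred` would be the ones kept in the KV cache.
```

A learned model would replace the fixed `kernel` with trained convolution weights shared across layers, as the abstract notes.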

NeurIPS Conference 2025 Conference Paper

Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms

  • Zhihai Wang
  • Zijie Geng
  • Zhaojie Tu
  • Jie Wang
  • Yuxi Qian
  • Zhexuan Xu
  • Ziyan Liu
  • Siyuan Xu

Chip placement is a critical step in the Electronic Design Automation (EDA) workflow, which aims to arrange chip modules on the canvas to optimize the performance, power, and area (PPA) metrics of final designs. Recent advances show great potential of AI-based algorithms in chip placement. However, due to the lengthy EDA workflow, evaluations of these algorithms often focus on intermediate surrogate metrics, which are computationally efficient but often misalign with the final end-to-end performance (i.e., the final design PPA). To address this challenge, we propose to build ChiPBench, a comprehensive benchmark specifically designed to evaluate the effectiveness of AI-based algorithms in final design PPA metrics. Specifically, we generate a diverse evaluation dataset from $20$ circuits across various domains, such as CPUs, GPUs, and NPUs. We then evaluate six state-of-the-art AI-based chip placement algorithms on the dataset and conduct a thorough analysis of their placement behavior. Extensive experiments show that AI-based chip placement algorithms produce unsatisfactory final PPA results, highlighting the significant influence of often-overlooked factors like regularity and dataflow. We believe ChiPBench will effectively bridge the gap between academia and industry.

NeurIPS Conference 2025 Conference Paper

Breaking Latent Prior Bias in Detectors for Generalizable AIGC Image Detection

  • Yue Zhou
  • Xinan He
  • Kaiqing Lin
  • Bing Fan
  • Feng Ding
  • Bin Li

Current AIGC detectors often achieve near-perfect accuracy on images produced by the same generator used for training but struggle to generalize to outputs from unseen generators. We trace this failure in part to latent prior bias: detectors learn shortcuts tied to patterns stemming from the initial noise vector rather than learning robust generative artifacts. To address this, we propose \textbf{On-Manifold Adversarial Training (OMAT)}: by optimizing the initial latent noise of diffusion models under fixed conditioning, we generate \emph{on-manifold} adversarial examples that remain on the generator’s output manifold—unlike pixel-space attacks, which introduce off-manifold perturbations that the generator itself cannot reproduce and that can obscure the true discriminative artifacts. To test against state-of-the-art generative models, we introduce GenImage++, a test-only benchmark of outputs from advanced generators (Flux.1, SD3) with extended prompts and diverse styles. We apply our adversarial-training paradigm to ResNet50 and CLIP baselines and evaluate across existing AIGC forensic benchmarks and recent challenge datasets. Extensive experiments show that adversarially trained detectors significantly improve cross-generator performance without any network redesign. Our findings on latent-prior bias offer valuable insights for future dataset construction and detector evaluation, guiding the development of more robust and generalizable AIGC forensic methodologies.

IJCAI Conference 2025 Conference Paper

Code-BT: A Code-Driven Approach to Behavior Tree Generation for Robot Tasks Planning with Large Language Models

  • Siyang Zhang
  • Bin Li
  • Jingtao Qi
  • Xueying Wang
  • Fu Li
  • Jianan Wang
  • En Zhu
  • Jinjing Sun

Behavior trees (BTs) provide a systematic and structured control architecture extensively employed in game AI and robotic behavior control, owing to their modularity, reactivity, and reusability. Nonetheless, manual BT design requires significant expertise and becomes inefficient as task complexity increases. Recent automation techniques have reduced manual work, but often have high application barriers and face challenges in adapting to new tasks, making them difficult to configure for specific requirements. Code-BT introduces a novel approach that utilizes large language models (LLMs) to automatically generate BTs, representing task planning as the process of writing and organizing code. By retrieving control-flow information from the generated code, BTs can be efficiently constructed to address the complexity and diversity of task planning challenges. Rather than relying on manual design, Code-BT uses task instructions to guide the selection of relevant APIs, and then systematically assembles these APIs into modular code aligned with the BT structure. Finally, action sequences and control logic are extracted from the generated code to construct the BTs. Our approach not only automates BT generation but also guarantees scalability and adaptability for long-term tasks. Experimental results demonstrate that Code-BT substantially improves LLM performance in BT generation, achieving improvements ranging from 16.67% to 29.17%.
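
The mapping from code control flow to BT control nodes that the abstract describes can be sketched with the two standard composite node types. The classes and the example task below are a generic BT convention for illustration, not Code-BT's actual implementation.

```python
class Action:
    """Leaf node: wraps a callable returning True (success) / False (failure)."""
    def __init__(self, fn):
        self.fn = fn
    def tick(self) -> bool:
        return self.fn()

class Sequence:
    """Succeeds only if every child succeeds in order (like `a(); b()` in code)."""
    def __init__(self, *children):
        self.children = children
    def tick(self) -> bool:
        return all(c.tick() for c in self.children)

class Fallback:
    """Succeeds if any child succeeds (like `a() or b()` / try-then-recover code)."""
    def __init__(self, *children):
        self.children = children
    def tick(self) -> bool:
        return any(c.tick() for c in self.children)

# Hypothetical control flow `if not grasp(): reposition(); grasp()` maps to:
tree = Fallback(
    Action(lambda: False),                  # first grasp attempt fails
    Sequence(Action(lambda: True),          # reposition succeeds
             Action(lambda: True)),         # second grasp succeeds
)
result = tree.tick()  # the fallback recovers through the sequence branch
```

Extracting `if`/sequence structure from generated code and instantiating nodes like these is the kind of construction step the abstract refers to.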

EAAI Journal 2025 Journal Article

Complete information extraction for monocular depth estimation using a dual framework

  • Bin Li
  • Dazheng Zhou
  • Xianjie Gao
  • Mingliang Zhang

This paper addresses the problem of efficiently extracting complete multi-scale information for supervised monocular depth estimation. Most existing depth estimation methods are based on Convolutional Neural Networks (CNNs). By gradually exploring contextual and semantic features, they have achieved good results in scene depth estimation. However, as the receptive field expands, global information is progressively suppressed by the local inductive bias, so that performance cannot be further improved. Recently, Transformer-based methods have been widely used to model global correlations between features. Nevertheless, since Transformer networks are not sufficiently spatially aware, they usually lose local details and have no clear mechanism for reusing features when processing images: they apply self-attention at each location but cannot directly obtain feature information from other locations. Therefore, we propose a novel dual framework called Transformer-CNN, which includes a Transformer branch and a CNN branch for monocular depth estimation. Specifically, the Transformer branch models global contextual information while the CNN branch captures local spatial relationships in images. However, simply fusing these two independent branches may result in insufficient feature aggregation. To this end, we design a Parallel Feature Interaction Module (PFIM), which contains a Self-Attention Module (SAM) and a Cross-Attention Module (CAM), to highlight features from the Transformer branch and the CNN branch respectively and to extract complementary information between the two branches. Meanwhile, to make full use of low-quality low-level features in the scene, we propose a Low-level Information Acquisition Module (LIAM) to capture texture-related information and preserve texture details in the CNN branch.
Finally, to address the lack of multi-scale contextual information in Vision Transformer (ViT), we introduce a Wide Area Multi-scale Decoder (WAMD), which incorporates the multi-scale feature representations into the decoder part via a Wide Area Attention (WAA). Extensive experiments on benchmark datasets collected in the outdoor and indoor environments demonstrate the competitive results of the proposed method, compared with the state-of-the-art monocular depth estimation methods.

NeurIPS Conference 2025 Conference Paper

CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning

  • Ke Niu
  • Zhuofan Chen
  • Haiyang Yu
  • Yuwen Chen
  • Teng Fu
  • Mengyang Zhao
  • Bin Li
  • Xiangyang Xue

Computer-Aided Design (CAD) is pivotal in industrial manufacturing, with orthographic projection reasoning foundational to its entire workflow—encompassing design, manufacturing, and simulation. However, prevailing deep-learning approaches employ standard 3D reconstruction pipelines as an alternative, which often introduce imprecise dimensions and limit the parametric editability required for CAD workflows. Recently, some researchers adopt vision–language models (VLMs), particularly supervised fine-tuning (SFT), to tackle CAD-related challenges. SFT shows promise but often devolves into pattern memorization, resulting in poor out-of-distribution (OOD) performance on complex reasoning tasks. To tackle these limitations, we introduce CReFT-CAD, a two-stage fine-tuning paradigm: first, a curriculum-driven reinforcement learning stage with difficulty-aware rewards to steadily build reasoning abilities; second, supervised post-tuning to refine instruction following and semantic extraction. Complementing this, we release TriView2CAD, the first large-scale, open-source benchmark for orthographic projection reasoning, comprising 200,000 synthetic and 3,000 real-world orthographic projections with precise dimensional annotations and six interoperable data modalities. Benchmarking leading VLMs on orthographic projection reasoning, we show that CReFT-CAD significantly improves reasoning accuracy and OOD generalizability in real-world scenarios, providing valuable insights to advance CAD reasoning research. The code and adopted datasets are available at \url{https://github.com/KeNiu042/CReFT-CAD}.

NeurIPS Conference 2025 Conference Paper

Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding

  • Xiaoyi Zhang
  • Zhaoyang Jia
  • Zongyu Guo
  • Jiahao Li
  • Bin Li
  • Houqiang Li
  • Yan Lu

Long-form video understanding presents significant challenges due to extensive temporal-spatial complexity and the difficulty of question answering under such extended contexts. While Large Language Models (LLMs) have demonstrated considerable advancements in video analysis capabilities and long context handling, they continue to exhibit limitations when processing information-dense hour-long videos. To overcome such limitations, we propose the $\textbf{D}eep \ \textbf{V}ideo \ \textbf{D}iscovery \ (\textbf{DVD})$ agent to leverage an $\textit{agentic search}$ strategy over segmented video clips. Different from previous video agents manually designing a rigid workflow, our approach emphasizes the autonomous nature of agents. By providing a set of search-centric tools on a multi-granular video database, our DVD agent leverages the advanced reasoning capability of LLMs to plan on its current observation state and strategically selects tools to orchestrate an adaptive workflow for different queries in light of the gathered information. We perform comprehensive evaluation on multiple long video understanding benchmarks that demonstrates our advantage. Our DVD agent achieves state-of-the-art performance on the challenging LVBench dataset, reaching an accuracy of $\textbf{74.2\%}$, which substantially surpasses all prior works, and further improves to $\textbf{76.0\%}$ with transcripts.

AAAI Conference 2025 Conference Paper

Foundation Model Driven Appearance Extraction for Robust Multiple Object Tracking

  • Teng Fu
  • Haiyang Yu
  • Ke Niu
  • Bin Li
  • Xiangyang Xue

Multiple Object Tracking (MOT) is a fundamental task in computer vision. Existing methods utilize motion information or appearance information to perform object tracking. However, these algorithms still struggle with special circumstances, such as occlusion and blurring in complex scenes. Inspired by the fact that people can pinpoint objects through verbal descriptions, we explore performing long-term robust tracking using semantic features of objects. Motivated by the success of multimodal foundation models in text-image alignment, we reconsider the appearance feature extraction module in MOT and propose a Foundation model Driven multi-object tracker (FDTracker). Specifically, we propose a two-stage trained appearance feature extractor. In the first stage, using a single image of the object as input, the model learns to capture object attributes with the assistance of natural language instructions. In the second stage, using a sequence of object images as input, the model learns to use these attributes to distinguish between different objects and to associate the same object across time. Finally, to coordinate appearance and motion information, we propose a combination strategy that better facilitates trajectory assignment and reconnection. Extensive experiments on benchmarks demonstrate the robustness of FDTracker.

NeurIPS Conference 2025 Conference Paper

Guard Me If You Know Me: Protecting Specific Face-Identity from Deepfakes

  • Kaiqing Lin
  • Zhiyuan Yan
  • Ke-Yue Zhang
  • Li Hao
  • Yue Zhou
  • Yuzhen Lin
  • Weixiang Li
  • Taiping Yao

Securing personal identity against deepfake attacks is increasingly critical in the digital age, especially for celebrities and political figures whose faces are easily accessible and frequently targeted. Most existing deepfake detection methods focus on general-purpose scenarios and often ignore the valuable prior knowledge of known facial identities, e.g., "VIP individuals" whose authentic facial data are already available. In this paper, we propose VIPGuard, a unified multimodal framework designed to capture fine-grained and comprehensive facial representations of a given identity, compare them against potentially fake or similar-looking faces, and reason over these comparisons to make accurate and explainable predictions. Specifically, our framework consists of three main stages. First, we fine-tune a multimodal large language model (MLLM) to learn detailed and structural facial attributes. Second, we perform identity-level discriminative learning to enable the model to distinguish subtle differences between highly similar faces, including real and fake variations. Finally, we introduce user-specific customization, where we model the unique characteristics of the target face identity and perform semantic reasoning via MLLM to enable personalized and explainable deepfake detection. Our framework shows clear advantages over previous detection works, where traditional detectors mainly rely on low-level visual cues and provide no human-understandable explanations, while other MLLM-based models often lack a detailed understanding of specific face identities. To facilitate the evaluation of our method, we build a comprehensive identity-aware benchmark called VIPBench for personalized deepfake detection, involving the latest 7 face-swapping and 7 entire face synthesis techniques for generation. Extensive experiments show that our model outperforms existing methods in both detection and explanation. The code is available at https://github.com/KQL11/VIPGuard.

NeurIPS Conference 2025 Conference Paper

LogicTree: Improving Complex Reasoning of LLMs via Instantiated Multi-step Synthetic Logical Data

  • Zehao Wang
  • Lin Yang
  • Jie Wang
  • Kehan Wang
  • Hanzhu Chen
  • Bin Wang
  • Jianye Hao
  • Defu Lian

Despite their remarkable performance on various tasks, Large Language Models (LLMs) still struggle with logical reasoning, particularly in complex and multi-step reasoning processes. Among various efforts to enhance LLMs' reasoning capabilities, synthesizing large-scale, high-quality logical reasoning datasets has emerged as a promising direction. However, existing methods often rely on predefined templates for logical reasoning data generation, limiting their adaptability to real-world scenarios. To address this limitation, we propose LogicTree, a novel framework for efficiently synthesizing multi-step logical reasoning datasets that excel in both complexity and instantiation. By iteratively searching for applicable logic rules based on structural pattern matching to perform backward deduction, LogicTree constructs multi-step logic trees that capture complex reasoning patterns. Furthermore, we employ a two-stage LLM-based approach to instantiate various real-world scenarios for each logic tree, generating consistent real-world reasoning processes that carry contextual significance. This helps LLMs develop generalizable logical reasoning abilities across diverse scenarios rather than merely memorizing templates. Experiments on multiple benchmarks demonstrate that our approach achieves an average improvement of 9.4\% in accuracy on complex logical reasoning tasks.

EAAI Journal 2025 Journal Article

Low-cost and sparsity for continual semantic segmentation

  • Qing Ji
  • Bin Li
  • Shaobo Li
  • Hongchao An
  • Jing Yang

Deep neural networks have contributed to significant progress in semantic segmentation tasks. However, they exhibit a critical drop in performance due to catastrophic forgetting when required to learn new tasks incrementally. The more plastic the network is, the easier it can learn new tasks; for continual semantic segmentation, however, it is more important to preserve the knowledge learned from previous tasks. Here, a gated 0-1 Bernoulli variable is used as a regularization method to optimize performance by enhancing network sparsity. A special case of the gated 0-1 Bernoulli variable is then applied in the replay-based method of continual semantic segmentation. Specifically, when the sub-network sampling rate reaches 0.5, the network attains the strongest stability. Finally, the gated 0-1 Bernoulli variable improves the network's performance in complex scenarios and reduces cost at similar performance. Experimental results indicate that when using 100% of samples for incremental training, the Mean Intersection over Union (mIoU) of the old classes improves by up to 4.6% and 5.5% compared to the baseline at the end of overall training in continual semantic segmentation scenarios 10-1 and 10-2. Furthermore, when using 60% of samples for incremental training, performance on the old tasks drops by less than one percentage point, while the time cost to complete the full setup decreases by 22%.
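
The gated 0-1 Bernoulli sparsification described above can be sketched in a few lines: independent 0-1 gates are sampled per unit and multiplied into a layer's activations, so a sampling rate of 0.5 leaves roughly half the sub-network active per pass. The shapes and elementwise application are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def bernoulli_gate(activations: np.ndarray, keep_rate: float = 0.5) -> np.ndarray:
    """Multiply activations by 0-1 gates sampled i.i.d. from Bernoulli(keep_rate).

    keep_rate=0.5 matches the sub-network sampling rate the abstract
    reports as giving the strongest stability.
    """
    gates = rng.binomial(1, keep_rate, size=activations.shape)
    return activations * gates

x = np.ones((4, 8))          # toy feature map
y = bernoulli_gate(x)        # roughly half the units are zeroed out
sparsity = 1.0 - y.mean()    # fraction of gated-off units on this sample
```

During continual training, such gating restricts each update to a sampled sub-network, which is how the sparsity acts as a regularizer against forgetting.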

JBHI Journal 2025 Journal Article

MambaVesselNet: A Novel Approach to Blood Vessel Segmentation Based on State-Space Models

  • Tianyong Liu
  • Zhiqing Zhang
  • Guojia Fan
  • Bin Li
  • Shoujun Zhou
  • Chengwu Xu
  • Gang Zhao
  • Fuxia Yang

Three-dimensional blood vessel segmentation is an important and challenging task that faces two main difficulties: (1) blood vessel structures are small, making them hard to capture by the network, and vessel edges are difficult to segment accurately; (2) false positives are prone to occur due to the presence of artifacts and noise. This paper proposes a novel blood vessel segmentation method called MambaVesselNet. This method is based on a state-space model and employs a selective state-space time series modeling strategy to achieve a larger receptive field. To better capture fine vascular structures and accurately segment edges, this paper introduces an edge enhancement module and a feature selection module. In terms of data preprocessing, nnUNet's preprocessing strategy is adopted to ensure spatial consistency of the input data. Evaluation on three standard vascular segmentation benchmarks shows that MambaVesselNet achieves state-of-the-art performance. Specifically, on cardiovascular and liver vessel datasets, the Dice coefficient is improved by 1.38% and 2.69%, respectively. The contributions of this paper include the proposal of a new module for enhancing blood vessel edge features, the development of a feature selection module with long sequence modeling capability, and the adoption of nnUNet's data preprocessing strategy, setting a new benchmark for blood vessel segmentation technology.

NeurIPS Conference 2025 Conference Paper

One-Step Diffusion-Based Image Compression with Semantic Distillation

  • Naifu Xue
  • Zhaoyang Jia
  • Jiahao Li
  • Bin Li
  • Yuan Zhang
  • Yan Lu

While recent diffusion-based generative image codecs have shown impressive performance, their iterative sampling process introduces unpleasant latency. In this work, we revisit the design of a diffusion-based codec and argue that multi-step sampling is not necessary for generative compression. Based on this insight, we propose OneDC, a One-step Diffusion-based generative image Codec that integrates a latent compression module with a one-step diffusion generator. Recognizing the critical role of semantic guidance in one-step diffusion, we propose using the hyperprior as a semantic signal, overcoming the limitations of text prompts in representing complex visual content. To further enhance the semantic capability of the hyperprior, we introduce a semantic distillation mechanism that transfers knowledge from a pretrained generative tokenizer to the hyperprior codec. Additionally, we adopt a hybrid pixel- and latent-domain optimization to jointly enhance both reconstruction fidelity and perceptual realism. Extensive experiments demonstrate that OneDC achieves SOTA perceptual quality even with one-step generation, offering over 39% bitrate reduction and 20× faster decoding compared to prior multi-step diffusion-based codecs. Project: https://onedc-codec.github.io/

AAAI Conference 2025 Conference Paper

Query-efficient Attack for Black-box Image Inpainting Forensics via Reinforcement Learning

  • Xianbo Mo
  • Shunquan Tan
  • Bin Li
  • Jiwu Huang

Recently, image inpainting has become a common tool for manipulating natural images in a malicious manner, which has led to the rapid advancement of inpainting forensics. Although current forensics methods have shown precise localization of inpainting regions and reliable robustness against image post-processing operations, it remains unclear whether they can effectively resist the possible attacks in real-world scenarios. To identify potential flaws, we propose a novel black-box anti-forensics framework to attack inpainting forensics methods, which employs reinforcement learning to generate a query-efficient countermeasure, named RLGC. To this end, we define a reinforcement learning paradigm to model the Markov Decision Process of the query-based black-box anti-forensics scenario. Specifically, pixel-wise agents are used to modulate anti-forensics images based on action selection and query forensics methods to obtain corresponding outputs. Later, a reward function evaluates the attack effect and image distortion with these outputs. To maximize the cumulative reward, policy and value networks are integrated and trained by the Asynchronous Advantage Actor-Critic algorithm. Experimental results demonstrate that, without visually detectable distortion on anti-forensics images, RLGC achieves remarkable attack effects in a highly query-efficient way against various black-box inpainting forensics methods, even outperforming the most representative white-box attack method.

AAAI Conference 2025 Conference Paper

SPAC: Sparse Partitioning and Adaptive Core Tensor Pruning Model for Knowledge Graph Completion

  • Chuhong Yang
  • Bin Li
  • Nan Wu

Tensor decomposition (TD) models are promising solutions for knowledge graph completion due to their simple structures but powerful representation capacities. The TD models typically adopt Tucker decomposition with a structured core tensor. Some models with a sparse core tensor, such as DistMult and ComplEx, are too simple and thus limit the interaction between embedding components, while other models with a dense core tensor are too complex and may lead to significant overfitting. To address these issues, we propose a new TD model called SPAC (Sparse Partitioning and Adaptive Core tensor pruning) model for knowledge graph completion. Specifically, SPAC captures coarse and fine-grained semantic information using a hybrid core tensor, where auxiliary cores are used to model sparse interactions and main cores for dense interactions. Moreover, SPAC introduces a gating mechanism to control the output of intermediate variables, enhancing the interaction between different partition groups. Furthermore, SPAC employs an adaptive pruning approach to dynamically adjust the shape of the core tensor. Due to the elaborate model design, the proposed TD model enhances expressive capacity and reduces the number of parameters in the core tensor. Experiments are conducted on datasets FB15k-237, WN18RR, and YAGO3-10. The results demonstrate that SPAC outperforms state-of-the-art tensor decomposition models, including MEIM and Tucker models. A series of ablation studies show that the gating mechanism and adaptive pruning strategy in SPAC are crucial for the performance improvement.
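The sparse-versus-dense core tensor trade-off described above can be made concrete with a small numpy sketch (not SPAC itself): a Tucker score contracts a core tensor with the head, relation, and tail embeddings, and a purely superdiagonal core collapses to DistMult's elementwise product, the sparse special case the abstract mentions. Names and dimensions here are illustrative.

```python
import numpy as np

def tucker_score(core, e_h, w_r, e_t):
    # Contract the core tensor with head-entity, relation, and
    # tail-entity embeddings along its three modes.
    return np.einsum('ijk,i,j,k->', core, e_h, w_r, e_t)

d = 4
rng = np.random.default_rng(0)
e_h, w_r, e_t = rng.normal(size=(3, d))

# Sparse special case: a superdiagonal core recovers DistMult, whose
# score is the sum of elementwise products of the three embeddings.
diag_core = np.zeros((d, d, d))
diag_core[np.arange(d), np.arange(d), np.arange(d)] = 1.0
assert np.isclose(tucker_score(diag_core, e_h, w_r, e_t),
                  np.sum(e_h * w_r * e_t))

# Dense core: every embedding component interacts with every other,
# at the price of d**3 parameters and a higher overfitting risk.
dense_core = rng.normal(size=(d, d, d))
score = tucker_score(dense_core, e_h, w_r, e_t)
```

A hybrid core tensor, as in the abstract, sits between these two extremes by mixing sparse auxiliary cores with dense main cores.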

AAAI Conference 2025 Conference Paper

Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection

  • Kaiqing Lin
  • Yuzhen Lin
  • Weixiang Li
  • Taiping Yao
  • Bin Li

The proliferation of deepfake faces poses huge potential negative impacts on our daily lives. Despite substantial advancements in deepfake detection in recent years, the generalizability of existing methods against forgeries from unseen datasets or created by emerging generative models remains constrained. In this paper, inspired by the zero-shot advantages of Vision-Language Models (VLMs), we propose a novel approach that repurposes a well-trained VLM for general deepfake detection. Motivated by the model reprogramming paradigm that manipulates the model prediction via input perturbations, our method can reprogram a pre-trained VLM model (e.g., CLIP) solely by manipulating its input without tuning the inner parameters. First, learnable visual perturbations are used to refine feature extraction for deepfake detection. Then, we exploit information from face embeddings to create sample-level adaptive text prompts, improving the performance. Extensive experiments on several popular benchmark datasets demonstrate that (1) the cross-dataset and cross-manipulation performance of deepfake detection can be significantly and consistently improved (e.g., over 88% AUC in the cross-dataset setting from FF++ to Wild-Deepfake); and (2) the superior performance is achieved with fewer trainable parameters, making it a promising approach for real-world applications.

IJCAI Conference 2025 Conference Paper

Strategyproofness and Monotone Allocation of Auction in Social Networks

  • Yuhang Guo
  • Dong Hao
  • Bin Li
  • Mingyu Xiao
  • Bakh Khoussainov

Strategyproofness in network auctions requires that bidders not only report their valuations truthfully, but also do their best to invite neighbours from the social network. In contrast to canonical auctions, where the value-monotone allocation in Myerson's Lemma is a cornerstone, a general principle of allocation rules for strategyproof network auctions is still missing. We show that, due to the absence of such a principle, even extensions to multi-unit network auctions with single-unit demand present unexpected difficulties, and all pioneering studies fail to be strategyproof. For the first time in this field, we identify two categories of monotone allocation rules on networks: Invitation-Depressed Monotonicity (ID-MON) and Invitation-Promoted Monotonicity (IP-MON). They encompass all existing allocation rules of network auctions as specific instances. For any given ID-MON or IP-MON allocation rule, we characterize the existence and sufficient conditions for the strategyproof payment rules, and show that among all such payment rules, the revenue-maximizing one exists and is computationally feasible. With these results, the obstacle of combinatorial network auctions with single-minded bidders is now resolved.

JBHI Journal 2025 Journal Article

TGAP-Net: Twin Graph Attention Pseudo-Label Generation for Weakly Supervised Semantic Segmentation

  • Haohua Chen
  • Yishu Deng
  • Zhensheng Hu
  • Bin Li
  • Bingzhong Jing
  • Chaofeng Li

Multilabel pathological tissue segmentation is a vital task in computational pathology that aims to semantically segment different tissues within pathological images. Fully and weakly supervised models have demonstrated impressive performances in this regard. However, weakly supervised models still face challenges, such as the poor performance of nondominant samples and limited effectiveness of aggregation functions in conveying supervisory signals. To address these issues, we propose two key contributions: the introduction of a graph attention network (GAT) module to establish contextual relationships between pixels within patches and generate high-quality pseudo-labels, and the development of a novel global classified max pooling (GCMP) aggregation function that effectively transmits the supervision signal from weakly annotated labels and improves the model's classification accuracy. The experimental results show that our method improved the MIoU scores by 3.3 and 3 for the nondominant samples, necrosis (NEC) and lymphocytes (LYM), respectively, in the LUAD-HistoSeg test set. This led to an overall MIoU of 0.774, a 1.8 increase over the state-of-the-art (SOTA) performance. Similarly, our approach improved MIoU scores by 5.7 and 2 on the NEC and LYM samples, respectively, in the Breast Cancer Semantic Segmentation (BCSS) test set, resulting in an overall MIoU of 0.721, a 1.6 increase over SOTA performance. In summary, our work addresses the issues of poor performance on nondominant samples and the suboptimal performance of aggregation functions. We propose a novel approach to achieve a significant performance improvement, which is extremely significant in reducing the workload of manual annotation and promoting the development of computational pathology.

NeurIPS Conference 2025 Conference Paper

VLForgery Face Triad: Detection, Localization and Attribution via Multimodal Large Language Models

  • Xinan He
  • Yue Zhou
  • Bing Fan
  • Bin Li
  • Guopu Zhu
  • Feng Ding

Faces synthesized by diffusion models (DMs) with high-quality and controllable attributes pose a significant challenge for Deepfake detection. Most state-of-the-art detectors only yield a binary decision, incapable of forgery localization, attribution of forgery methods, and providing analysis on the cause of forgeries. In this work, we integrate Multimodal Large Language Models (MLLMs) within DM-based face forensics, and propose a fine-grained analysis triad framework called VLForgery, that can 1) predict falsified facial images; 2) locate the falsified face regions subjected to partial synthesis; and 3) attribute the synthesis with specific generators. To achieve the above goals, we introduce VLF (Visual Language Forensics), a novel and diverse synthesis face dataset designed to facilitate rich interactions between the 'Visual' and 'Language' modalities in MLLMs. Additionally, we propose an extrinsic knowledge-guided description method, termed EkCot, which leverages knowledge from the image generation pipeline to enable MLLMs to quickly capture image content. Furthermore, we introduce a low-level vision comparison pipeline designed to identify differential features between real and fake that MLLMs can inherently understand. These features are then incorporated into EkCot, enhancing its ability to analyze forgeries in a structured manner, following the sequence of detection, localization, and attribution. Extensive experiments demonstrate that VLForgery outperforms other state-of-the-art forensic approaches in detection accuracy, with additional potential for falsified region localization and attribution analysis.

AAAI Conference 2025 Conference Paper

VQLTI: Long-Term Tropical Cyclone Intensity Forecasting with Physical Constraints

  • Xinyu Wang
  • Lei Liu
  • Kang Chen
  • Tao Han
  • Bin Li
  • Lei Bai

Tropical cyclone (TC) intensity forecasting is crucial for early disaster warning and emergency decision-making. Numerous researchers have explored deep-learning methods to address computational and post-processing issues in operational forecasting. Regrettably, they exhibit subpar long-term forecasting capabilities. We use two strategies to enhance long-term forecasting. (1) By enhancing the matching between TC intensity and spatial information, we can improve long-term forecasting performance. (2) Incorporating physical knowledge and physical constraints can help mitigate the accumulation of forecasting errors. To achieve the above strategies, we propose the VQLTI framework. VQLTI transfers the TC intensity information to a discrete latent space while retaining the spatial information differences, using large-scale spatial meteorological data as conditions. Furthermore, we leverage the forecast from the weather prediction model FengWu to provide additional physical knowledge for VQLTI. Additionally, we calculate the potential intensity (PI) to impose physical constraints on the latent variables. In global long-term TC intensity forecasting, VQLTI achieves state-of-the-art results for 24h to 120h forecasts, with the MSW (Maximum Sustained Wind) forecast error reduced by 35.65%-42.51% compared to ECMWF-IFS.

AAAI Conference 2024 Conference Paper

Discriminative Forests Improve Generative Diversity for Generative Adversarial Networks

  • Junjie Chen
  • Jiahao Li
  • Chen Song
  • Bin Li
  • Qingcai Chen
  • Hongchang Gao
  • Wendy Hui Wang
  • Zenglin Xu

Improving the diversity of Artificial Intelligence Generated Content (AIGC) is one of the fundamental problems in the theory of generative models such as generative adversarial networks (GANs). Previous studies have demonstrated that the discriminator in GANs should have high capacity and robustness to achieve the diversity of generated data. However, a discriminator with high capacity tends to overfit and guide the generator toward collapsed equilibrium. In this study, we propose a novel discriminative forest GAN, named Forest-GAN, that replaces the discriminator with a discriminative forest to improve the capacity and robustness for modeling statistics of real-world data distributions. A discriminative forest is composed of multiple independent discriminators built on bootstrapped data. We prove that a discriminative forest has a generalization error bound, which is determined by the strength of individual discriminators and the correlations among them. Hence, a discriminative forest can provide very large capacity without any risk of overfitting, which subsequently improves the generative diversity. With the discriminative forest framework, we significantly improved the performance of AutoGAN with a new record FID of 19.27 from 30.71 on STL10 and improved the performance of StyleGAN2-ADA with a new record FID of 6.87 from 9.22 on LSUN-cat.

AIIM Journal 2024 Journal Article

EHR coding with hybrid attention and features propagation on disease knowledge graph

  • Tianhan Xu
  • Bin Li
  • Ling Chen
  • Chao Yang
  • Yixun Gu
  • Xiang Gu

Electronic health record (EHR) coding assigns International Classification of Diseases (ICD) codes to each EHR document. These standard medical codes represent diagnoses or procedures and play a critical role in medical applications. However, EHR is a long medical text that is difficult to represent, the ICD code label space is large, and the labels have an extremely unbalanced distribution. These factors pose challenges to automatic EHR coding. Previous studies have not explored the disease attributes (e.g., symptoms, tests, medications) of ICD codes and the disease relationships (e.g., causes, risk factors, comorbidities) between them. In addition, the important roles of medical sentences associated with these attributes and relationships have been neglected. In this paper, we propose an end-to-end model called Knowledge Graph Enhanced neural network (KGENet) to address the above shortcomings. Specifically, we first construct a disease knowledge graph that focuses on the multi-view disease attributes of ICD codes and the disease relationships between these codes. We also use a long sequence encoder to get the EHR document representation. Most importantly, KGENet leverages multi-view disease attributes and structured disease relationships for knowledge enhancement through hybrid attention and graph propagation, respectively. Furthermore, the above processes can provide attribute-aware and relationship-augmented explainability for the model prediction results based on our disease knowledge graph. Experiments conducted on the MIMIC-III benchmark dataset show that KGENet outperforms state-of-the-art models in both model effectiveness and explainability.

AAAI Conference 2024 Conference Paper

Enhanced Fine-Grained Motion Diffusion for Text-Driven Human Motion Synthesis

  • Dong Wei
  • Xiaoning Sun
  • Huaijiang Sun
  • Shengxiang Hu
  • Bin Li
  • Weiqing Li
  • Jianfeng Lu

The emergence of the text-driven motion synthesis technique provides animators with great potential to create efficiently. However, in most cases, textual expressions only contain general and qualitative motion descriptions, while lacking fine depiction and sufficient intensity, leading to synthesized motions that either (a) are semantically compliant but uncontrollable over specific pose details, or (b) even deviate from the provided descriptions, presenting animators with undesired cases. In this paper, we propose DiffKFC, a conditional diffusion model for text-driven motion synthesis with KeyFrames Collaborated, enabling realistic generation with collaborative and efficient dual-level control: coarse guidance at the semantic level, with only a few keyframes for direct and fine-grained depiction down to the body posture level. Unlike existing inference-editing diffusion models that incorporate conditions without training, our conditional diffusion model is explicitly trained and can fully exploit correlations among texts, keyframes and the diffused target frames. To preserve the control capability of discrete and sparse keyframes, we customize dilated mask attention modules where only partial valid tokens participate in local-to-global attention, indicated by the dilated keyframe mask. Additionally, we develop a simple yet effective smoothness prior, which steers the generated frames towards seamless keyframe transitions at inference. Extensive experiments show that our model not only achieves state-of-the-art performance in terms of semantic fidelity, but, more importantly, is able to satisfy animator requirements through fine-grained guidance without tedious labor.

AAAI Conference 2024 Conference Paper

Exploring One-Shot Semi-supervised Federated Learning with Pre-trained Diffusion Models

  • Mingzhao Yang
  • Shangchao Su
  • Bin Li
  • Xiangyang Xue

Recently, semi-supervised federated learning (semi-FL) has been proposed to handle the commonly seen real-world scenarios with labeled data on the server and unlabeled data on the clients. However, existing methods face several challenges such as communication costs, data heterogeneity, and training pressure on client devices. To address these challenges, we introduce the powerful diffusion models (DM) into semi-FL and propose FedDISC, a Federated Diffusion-Inspired Semi-supervised Co-training method. Specifically, we first extract prototypes of the labeled server data and use these prototypes to predict pseudo-labels of the client data. For each category, we compute the cluster centroids and domain-specific representations to signify the semantic and stylistic information of their distributions. After adding noise, these representations are sent back to the server, which uses the pre-trained DM to generate synthetic datasets complying with the client distributions and train a global model on them. With the assistance of vast knowledge within DM, the synthetic datasets have comparable quality and diversity to the client images, subsequently enabling the training of global models that achieve performance equivalent to or even surpassing the ceiling of supervised centralized training. FedDISC works within one communication round, does not require any local training, and involves very minimal information uploading, greatly enhancing its practicality. Extensive experiments on three large-scale datasets demonstrate that FedDISC effectively addresses the semi-FL problem on non-IID clients and outperforms the compared SOTA methods. Sufficient visualization experiments also illustrate that the synthetic dataset generated by FedDISC exhibits comparable diversity and quality to the original client dataset, with a negligible possibility of leaking privacy-sensitive information of the clients.

AAMAS Conference 2024 Conference Paper

Extended Abstract of Diffusion Auction Design with Transaction Costs

  • Bin Li
  • Dong Hao
  • Dengji Zhao

We study multi-unit diffusion auctions powered by intermediated markets, where all transactions are processed by intermediaries and incur certain costs. The classic Vickrey-Clarke-Groves (VCG) mechanism within the scenario can obtain the maximum social welfare, but it can lead to a deficit for the seller. To address the revenue issue, we develop two deficit reduction strategies and further propose a family of diffusion auctions, called Critical Neighborhood Auctions (CNA). The CNA not only maximizes the social welfare, but also achieves a (non-negative) revenue that is no less than the revenue given by the VCG mechanism with/without intermediaries. This is the first set of diffusion auctions with welfare and revenue advantages that can handle multiple items and transaction costs.

AAAI Conference 2024 Conference Paper

Federated Adaptive Prompt Tuning for Multi-Domain Collaborative Learning

  • Shangchao Su
  • Mingzhao Yang
  • Bin Li
  • Xiangyang Xue

Federated learning (FL) enables multiple clients to collaboratively train a global model without disclosing their data. Previous research often requires training the complete model parameters. However, the emergence of powerful pre-trained models makes it possible to achieve higher performance with fewer learnable parameters in FL. In this paper, we propose a federated adaptive prompt tuning algorithm, FedAPT, for multi-domain collaborative image classification with powerful foundation models, like CLIP. Compared with direct federated prompt tuning, our core idea is to adaptively unlock specific domain knowledge for each test sample in order to provide them with personalized prompts. To implement this idea, we design an adaptive prompt tuning module, which consists of a meta prompt, an adaptive network, and some keys. The server randomly generates a set of keys and assigns a unique key to each client. Then all clients cooperatively train the global adaptive network and meta prompt with the local datasets and the frozen keys. Ultimately, the global aggregation model can assign a personalized prompt to CLIP based on the domain features of each test sample. We perform extensive experiments on two multi-domain image classification datasets across two different settings -- supervised and unsupervised. The results show that FedAPT can achieve better performance with less than 10% of the number of parameters of the fully trained model, and the global model can perform well in diverse client domains simultaneously.

NeurIPS Conference 2024 Conference Paper

FreqBlender: Enhancing DeepFake Detection by Blending Frequency Knowledge

  • Hanzhe Li
  • Jiaran Zhou
  • Yuezun Li
  • Baoyuan Wu
  • Bin Li
  • Junyu Dong

Generating synthetic fake faces, known as pseudo-fake faces, is an effective way to improve the generalization of DeepFake detection. Existing methods typically generate these faces by blending real or fake faces in the spatial domain. While these methods have shown promise, they overlook the simulation of frequency distribution in pseudo-fake faces, limiting the learning of generic forgery traces in-depth. To address this, this paper introduces FreqBlender, a new method that can generate pseudo-fake faces by blending frequency knowledge. Concretely, we investigate the major frequency components and propose a Frequency Parsing Network to adaptively partition frequency components related to forgery traces. Then we blend this frequency knowledge from fake faces into real faces to generate pseudo-fake faces. Since there is no ground truth for frequency components, we describe a dedicated training strategy by leveraging the inner correlations among different frequency knowledge to instruct the learning process. Experimental results demonstrate the effectiveness of our method in enhancing DeepFake detection, making it a potential plug-and-play strategy for other methods.

NeurIPS Conference 2024 Conference Paper

Improving Viewpoint-Independent Object-Centric Representations through Active Viewpoint Selection

  • Yinxuan Huang
  • Chengmin Gao
  • Bin Li
  • Xiangyang Xue

Given the complexities inherent in visual scenes, such as object occlusion, a comprehensive understanding often requires observation from multiple viewpoints. Existing multi-viewpoint object-centric learning methods typically employ random or sequential viewpoint selection strategies. While applicable across various scenes, these strategies may not always be ideal, as certain scenes could benefit more from specific viewpoints. To address this limitation, we propose a novel active viewpoint selection strategy. This strategy predicts images from unknown viewpoints based on information from observation images for each scene. It then compares the object-centric representations extracted from both viewpoints and selects the unknown viewpoint with the largest disparity, indicating the greatest gain in information, as the next observation viewpoint. Through experiments on various datasets, we demonstrate the effectiveness of our active viewpoint selection strategy, significantly enhancing segmentation and reconstruction performance compared to random viewpoint selection. Moreover, our method can accurately predict images from unknown viewpoints.

NeurIPS Conference 2024 Conference Paper

MILP-StuDio: MILP Instance Generation via Block Structure Decomposition

  • Haoyang Liu
  • Jie Wang
  • Wanbo Zhang
  • Zijie Geng
  • Yufei Kuang
  • Xijun Li
  • Yongdong Zhang
  • Bin Li

Mixed-integer linear programming (MILP) is one of the most popular mathematical formulations with numerous applications. In practice, improving the performance of MILP solvers often requires a large amount of high-quality data, which can be challenging to collect. Researchers thus turn to generation techniques to generate additional MILP instances. However, existing approaches do not take into account specific block structures—which are closely related to the problem formulations—in the constraint coefficient matrices (CCMs) of MILPs. Consequently, they are prone to generate computationally trivial or infeasible instances due to the disruptions of block structures and thus problem formulations. To address this challenge, we propose a novel MILP generation framework, called Block Structure Decomposition (MILP-StuDio), to generate high-quality instances by preserving the block structures. Specifically, MILP-StuDio begins by identifying the blocks in CCMs and decomposing the instances into block units, which serve as the building blocks of MILP instances. We then design three operators to construct new instances by removing, substituting, and appending block units in the original instances, enabling us to generate instances with flexible sizes. An appealing feature of MILP-StuDio is its strong ability to preserve the feasibility and computational hardness of the generated instances. Experiments on the commonly-used benchmarks demonstrate that using instances generated by MILP-StuDio is able to reduce the solving time of learning-based solvers by over 10%.

NeurIPS Conference 2024 Conference Paper

Towards Next-Generation Logic Synthesis: A Scalable Neural Circuit Generation Framework

  • Zhihai Wang
  • Jie Wang
  • Qingyue Yang
  • Yinqi Bai
  • Xing Li
  • Lei Chen
  • Jianye Hao
  • Mingxuan Yuan

Logic Synthesis (LS) aims to generate an optimized logic circuit satisfying a given functionality, which generally consists of circuit translation and optimization. It is a challenging and fundamental combinatorial optimization problem in integrated circuit design. Traditional LS approaches rely on manually designed heuristics to tackle the LS task, while machine learning recently offers a promising approach towards next-generation logic synthesis by neural circuit generation and optimization. In this paper, we first revisit the application of differentiable neural architecture search (DNAS) methods to circuit generation and find from extensive experiments that existing DNAS methods struggle to exactly generate circuits, scale poorly to large circuits, and exhibit high sensitivity to hyper-parameters. Then we provide three major insights for these challenges from extensive empirical analysis: 1) DNAS tends to overfit to too many skip-connections, consequently wasting a significant portion of the network's expressive capabilities; 2) DNAS suffers from the structure bias between the network architecture and the circuit inherent structure, leading to inefficient search; 3) the learning difficulty of different input-output examples varies significantly, leading to severely imbalanced learning. To address these challenges in a systematic way, we propose a novel regularized triangle-shaped circuit network generation framework, which leverages our key insights for completely accurate and scalable circuit generation. Furthermore, we propose an evolutionary algorithm assisted by a reinforcement learning agent restarting technique for efficient and effective neural circuit optimization. Extensive experiments on four different circuit benchmarks demonstrate that our method can precisely generate circuits with up to 1200 nodes. Moreover, our synthesized circuits significantly outperform the state-of-the-art results from several competitive winners in the IWLS 2022 and 2023 competitions.

NeurIPS Conference 2024 Conference Paper

Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions

  • Rui Yang
  • Jie Wang
  • Guoping Wu
  • Bin Li

Real-world offline datasets are often subject to data corruptions (such as noise or adversarial attacks) due to sensor failures or malicious attacks. Despite advances in robust offline reinforcement learning (RL), existing methods struggle to learn robust agents under high uncertainty caused by the diverse corrupted data (i.e., corrupted states, actions, rewards, and dynamics), leading to performance degradation in clean environments. To tackle this problem, we propose a novel robust variational Bayesian inference for offline RL (TRACER). It introduces Bayesian inference for the first time to capture the uncertainty via offline data for robustness against all types of data corruptions. Specifically, TRACER first models all corruptions as the uncertainty in the action-value function. Then, to capture such uncertainty, it uses all offline data as the observations to approximate the posterior distribution of the action-value function under a Bayesian inference framework. An appealing feature of TRACER is that it can distinguish corrupted data from clean data using an entropy-based uncertainty measure, since corrupted data often induces higher uncertainty and entropy. Based on the aforementioned measure, TRACER can regulate the loss associated with corrupted data to reduce its influence, thereby enhancing robustness and performance in clean environments. Experiments demonstrate that TRACER significantly outperforms several state-of-the-art approaches across both individual and simultaneous data corruptions.
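The entropy-style weighting idea in the abstract can be illustrated with a toy sketch. This is not TRACER's actual variational estimator: the `entropy_weights` helper, the `beta` parameter, and the use of sample standard deviation as an uncertainty proxy are all assumptions made for illustration only.

```python
import numpy as np

def entropy_weights(q_samples: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Down-weight data points whose posterior action-value samples are diffuse.

    q_samples: shape (n_data, n_draws) -- Monte Carlo draws of the action
    value for each data point; a wider spread signals higher uncertainty.
    Returns per-datapoint loss weights in (0, 1].
    """
    # Use the per-point sample standard deviation as a crude uncertainty proxy
    # (for a Gaussian, entropy grows monotonically with the log of the std).
    spread = q_samples.std(axis=1)
    return np.exp(-beta * spread)

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 0.1, size=(1, 256))    # low-uncertainty data point
corrupt = rng.normal(0.0, 2.0, size=(1, 256))  # high-uncertainty data point
w = entropy_weights(np.vstack([clean, corrupt]))
# The corrupted point receives a much smaller loss weight than the clean one.
```

Down-weighting rather than discarding keeps the training signal from mildly noisy data while limiting the influence of heavily corrupted samples.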

AAAI Conference 2024 Conference Paper

Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport

  • Bin Li
  • Ye Shi
  • Qian Yu
  • Jingya Wang

Unsupervised cross-domain image retrieval (UCIR) aims to retrieve images sharing the same category across diverse domains without relying on labeled data. Prior approaches have typically decomposed the UCIR problem into two distinct tasks: intra-domain representation learning and cross-domain feature alignment. However, these segregated strategies overlook the potential synergies between these tasks. This paper introduces ProtoOT, a novel Optimal Transport formulation explicitly tailored for UCIR, which integrates intra-domain feature representation learning and cross-domain alignment into a unified framework. ProtoOT leverages the strengths of the K-means clustering method to effectively manage distribution imbalances inherent in UCIR. By utilizing K-means for generating initial prototypes and approximating class marginal distributions, we modify the constraints in Optimal Transport accordingly, significantly enhancing its performance in UCIR scenarios. Furthermore, we incorporate contrastive learning into the ProtoOT framework to further improve representation learning. This encourages local semantic consistency among features with similar semantics, while also explicitly enforcing separation between features and unmatched prototypes, thereby enhancing global discriminativeness. ProtoOT surpasses existing state-of-the-art methods by a notable margin across benchmark datasets. Notably, on DomainNet, ProtoOT achieves an average P@200 enhancement of 24.44%, and on Office-Home, it demonstrates a P@15 improvement of 12.12%. Code is available at https://github.com/HCVLAB/ProtoOT.

NeurIPS Conference 2024 Conference Paper

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

  • Chenyu Yang
  • Xizhou Zhu
  • Jinguo Zhu
  • Weijie Su
  • Junjie Wang
  • Xuan Dong
  • Wenhai Wang
  • Lewei Lu

Recently, vision model pre-training has evolved from relying on manually annotated datasets to leveraging large-scale, web-crawled image-text data. Despite these advances, there is no pre-training method that effectively exploits the interleaved image-text data, which is very prevalent on the Internet. Inspired by the recent success of compression learning in natural language processing, we propose a novel vision model pre-training method called Latent Compression Learning (LCL) for interleaved image-text data. This method performs latent compression learning by maximizing the mutual information between the inputs and outputs of a causal attention model. The training objective can be decomposed into two basic tasks: 1) contrastive learning between visual representation and preceding context, and 2) generating subsequent text based on visual representation. Our experiments demonstrate that our method not only matches the performance of CLIP on paired pre-training datasets (e.g., LAION), but can also leverage interleaved pre-training data (e.g., MMC4) to learn robust visual representations from scratch, showcasing the potential of vision model pre-training with interleaved image-text data.

JAAMAS Journal 2023 Journal Article

Diffusion auction design with transaction costs

  • Bin Li
  • Dong Hao
  • Dengji Zhao

We study multi-unit auctions powered by intermediated markets, where all transactions are processed by intermediaries and incur certain costs. Each intermediary in the market owns a private set of buyers and all intermediaries are networked with each other. Our goal is to incentivize the intermediaries to share the auction information to individuals they can reach, including their private buyers and neighboring intermediaries, so that more potential buyers are able to participate in the auction. To this end, we build a diffusion-based auction framework to handle the transaction costs and the strategic interactions between intermediaries. The classic Vickrey-Clarke-Groves (VCG) mechanism within the scenario can obtain the maximum social welfare, but it can decrease the seller’s revenue or even lead to a deficit. To overcome the revenue issue, we develop two deficit reduction strategies, based on which a family of diffusion auctions called Critical Neighborhood Auctions (CNA) is identified. The CNA not only maximizes the social welfare, but also eliminates all the seller’s deficits. Moreover, the revenue given by the CNA is no less than the revenue given by the VCG mechanism with/without intermediaries. This is the first set of diffusion auctions with welfare and revenue advantages that can handle multiple items and transaction costs.

NeurIPS Conference 2023 Conference Paper

Domain Re-Modulation for Few-Shot Generative Domain Adaptation

  • Yi Wu
  • Ziqiang Li
  • Chaoyue Wang
  • Heliang Zheng
  • Shanshan Zhao
  • Bin Li
  • Dacheng Tao

In this study, we delve into the task of few-shot Generative Domain Adaptation (GDA), which involves transferring a pre-trained generator from one domain to a new domain using only a few reference images. Inspired by the way human brains acquire knowledge in new domains, we present an innovative generator structure called $\textbf{Domain Re-Modulation (DoRM)}$. DoRM not only meets the criteria of $\textit{high quality}$, $\textit{large synthesis diversity}$, and $\textit{cross-domain consistency}$, which were achieved by previous research in GDA, but also incorporates $\textit{memory}$ and $\textit{domain association}$, akin to how human brains operate. Specifically, DoRM freezes the source generator and introduces new mapping and affine modules (M\&A modules) to capture the attributes of the target domain during GDA. This process resembles the formation of new synapses in human brains. Consequently, a linearly combinable domain shift occurs in the style space. By incorporating multiple new M\&A modules, the generator gains the capability to perform high-fidelity multi-domain and hybrid-domain generation. Moreover, to maintain cross-domain consistency more effectively, we introduce a similarity-based structure loss. This loss aligns the auto-correlation map of the target image with its corresponding auto-correlation map of the source image during training. Through extensive experiments, we demonstrate the superior performance of our DoRM and similarity-based structure loss in few-shot GDA, both quantitatively and qualitatively. Code will be available at https://github.com/wuyi2020/DoRM.

AAAI Conference 2023 Conference Paper

Human Joint Kinematics Diffusion-Refinement for Stochastic Motion Prediction

  • Dong Wei
  • Huaijiang Sun
  • Bin Li
  • Jianfeng Lu
  • Weiqing Li
  • Xiaoning Sun
  • Shengxiang Hu

Stochastic human motion prediction aims to forecast multiple plausible future motions given a single pose sequence from the past. Most previous works focus on designing elaborate losses to improve the accuracy, while the diversity is typically characterized by randomly sampling a set of latent variables from the latent prior, which is then decoded into possible motions. This joint training of sampling and decoding, however, suffers from posterior collapse as the learned latent variables tend to be ignored by a strong decoder, leading to limited diversity. Alternatively, inspired by the diffusion process in nonequilibrium thermodynamics, we propose MotionDiff, a diffusion probabilistic model to treat the kinematics of human joints as heated particles, which will diffuse from original states to a noise distribution. This process not only offers a natural way to obtain the "whitened" latents without any trainable parameters, but also introduces a new noise in each diffusion step, both of which facilitate more diverse motions. Human motion prediction is then regarded as the reverse diffusion process that converts the noise distribution into realistic future motions conditioned on the observed sequence. Specifically, MotionDiff consists of two parts: a spatial-temporal transformer-based diffusion network to generate diverse yet plausible motions, and a flexible refinement network to further enable geometric losses and align with the ground truth. Experimental results on two datasets demonstrate that our model yields the competitive performance in terms of both diversity and accuracy.

ECAI Conference 2023 Conference Paper

Mechanism Design for Ad Auctions with Display Prices

  • Bin Li
  • Yahui Lei

In various applications, ads are displayed together with prices, so as to provide a direct comparison among similar products or services. The price-displaying feature not only influences the consumers’ decision, but also affects the bidding behavior of advertisers. In this paper, we study ad auctions with display prices from the perspective of mechanism design, in which advertisers are asked to submit both the product costs and the display prices of their commodities. We first provide a characterization for all individually rational and incentive-compatible mechanisms in the presence of display prices, then use it to design ad auctions in two scenarios. In the former scenario, the display prices are assumed to be exogenously determined. For this scenario, we derive the welfare-maximizing and revenue-maximizing auctions for any given display price profile. In the latter, advertisers are allowed to strategize their display prices freely. We investigate two families of allocation policies within the scenario and identify the equilibrium display prices accordingly. Our findings demonstrate the impact of display prices on the design of ad auctions, and highlight how platforms can utilize display price information to optimize the performance of ad delivery.

AAAI Conference 2023 Conference Paper

Meta-Auxiliary Learning for Adaptive Human Pose Prediction

  • Qiongjie Cui
  • Huaijiang Sun
  • Jianfeng Lu
  • Bin Li
  • Weiqing Li

Predicting high-fidelity future human poses, from a historically observed sequence, is crucial for intelligent robots to interact with humans. Deep end-to-end learning approaches, which typically train a generic pre-trained model on external datasets and then directly apply it to all test samples, emerge as the dominant solution to solve this issue. Despite encouraging progress, they remain non-optimal, as the unique properties (e.g., motion style, rhythm) of a specific sequence cannot be adapted. More generally, once encountering out-of-distributions, the predicted poses tend to be unreliable. Motivated by this observation, we propose a novel test-time adaptation framework that leverages two self-supervised auxiliary tasks to help the primary forecasting network adapt to the test sequence. In the testing phase, our model can adjust the model parameters by several gradient updates to improve the generation quality. However, due to catastrophic forgetting, both auxiliary tasks typically have a low ability to automatically present the desired positive incentives for the final prediction performance. For this reason, we also propose a meta-auxiliary learning scheme for better adaptation. Extensive experiments show that the proposed approach achieves higher accuracy and more realistic visualization.

JBHI Journal 2023 Journal Article

Multi-Level Constrained Intra and Inter Subject Feature Representation for Facial Video Based BVP Signal Measurement

  • Bin Li
  • Wei Zhang
  • Hong Fu
  • Hao Liu
  • Feng Xu

Facial video-based blood volume pulse (BVP) signal measurement holds great potential for remote health monitoring, while existing methods have issues with convolutional kernel perceptual field constraints. This article proposes an end-to-end multi-level constrained spatiotemporal representation structure for facial video-based BVP signal measurement. First, an intra- and inter-subject feature representation is proposed to strengthen the BVP-related features generation at high, semantic, and shallow levels, respectively. Second, the global-local association is presented to enhance BVP signal period pattern learning, and the global temporal features are introduced into the local spatial convolution of each frame by adaptive kernel weights. Finally, the multi-dimensional fused features are mapped to one-dimensional BVP signals by the task-oriented signal estimator. The experimental results on the publicly available MMSE-HR dataset demonstrate that the proposed structure outperforms state-of-the-art methods (e.g., AutoHR) in BVP signal measurement, with a 20% and 40% reduction in mean absolute error and root mean squared error, respectively. The proposed structure would be a powerful tool for telemedical and non-contact heart health monitoring.

IJCAI Conference 2023 Conference Paper

Orientation-Independent Chinese Text Recognition in Scene Images

  • Haiyang Yu
  • Xiaocong Wang
  • Bin Li
  • Xiangyang Xue

Scene text recognition (STR) has attracted much attention due to its broad applications. The previous works pay more attention to dealing with the recognition of Latin text images with complex backgrounds by introducing language models or other auxiliary networks. Different from Latin texts, many vertical Chinese texts exist in natural scenes, which brings difficulties to current state-of-the-art STR methods. In this paper, we make the first attempt to extract orientation-independent visual features by disentangling content and orientation information of text images, thus recognizing both horizontal and vertical texts robustly in natural scenes. Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information. We conduct experiments on a scene dataset for benchmarking Chinese text recognition, and the results demonstrate that the proposed method can indeed improve performance through disentangling content and orientation information. To further validate the effectiveness of our method, we additionally collect a Vertical Chinese Text Recognition (VCTR) dataset. The experimental results show that the proposed method achieves 45.63% improvement on VCTR when introducing CIRN to the baseline model.

IJCAI Conference 2023 Conference Paper

Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning

  • Xinyan Zu
  • Haiyang Yu
  • Bin Li
  • Xiangyang Xue

Video text spotting (VTS) aims at extracting texts from videos, where text detection, tracking and recognition are conducted simultaneously. There have been some works that can tackle VTS; however, they may ignore the underlying semantic relationships among texts within a frame. We observe that the texts within a frame usually share similar semantics, which suggests that, if one text is predicted incorrectly by a text recognizer, it still has a chance to be corrected via semantic reasoning. In this paper, we propose an accurate video text spotter, VLSpotter, that reads texts visually, linguistically, and semantically. For ‘visually’, we propose a plug-and-play text-focused super-resolution module to alleviate motion blur and enhance video quality. For ‘linguistically’, a language model is employed to capture intra-text context to mitigate wrongly spelled text predictions. For ‘semantically’, we propose a text-wise semantic reasoning module to model inter-text semantic relationships and reason for better results. The experimental results on multiple VTS benchmarks demonstrate that the proposed VLSpotter outperforms the existing state-of-the-art methods in end-to-end video text spotting.

NeurIPS Conference 2023 Conference Paper

Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis

  • Zhiyu Jin
  • Xuli Shen
  • Bin Li
  • Xiangyang Xue

Diffusion models (DMs) have recently gained attention with state-of-the-art performance in text-to-image synthesis. Abiding by the tradition in deep learning, DMs are trained and evaluated on images with fixed sizes. However, users demand images of various sizes and aspect ratios. This paper focuses on adapting text-to-image diffusion models to handle such variety while maintaining visual fidelity. First, we observe that, during synthesis, lower resolution images suffer from incomplete object portrayal, while higher resolution images exhibit repetitively disordered presentation. Next, we establish a statistical relationship indicating that attention entropy changes with token quantity, suggesting that models aggregate spatial information in proportion to image resolution. The subsequent interpretation of our observations is that objects are incompletely depicted due to limited spatial information at low resolutions, while repetitively disorganized presentation arises from redundant spatial information at high resolutions. From this perspective, we propose a scaling factor to alleviate the change of attention entropy and mitigate the defective pattern observed. Extensive experimental results validate the efficacy of the proposed scaling factor, enabling models to achieve better visual effects, image quality, and text alignment. Notably, these improvements are achieved without additional training or fine-tuning techniques.
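The claimed relationship between attention entropy and token count can be checked with a short sketch. The uniform-attention setup and the `attention_entropy` helper are illustrative assumptions, not the paper's code or its proposed scaling factor.

```python
import numpy as np

def attention_entropy(logits: np.ndarray) -> float:
    """Shannon entropy of a softmax attention distribution over tokens."""
    p = np.exp(logits - logits.max())  # numerically stable softmax
    p /= p.sum()
    return float(-(p * np.log(p)).sum())

# With uniform logits the entropy is exactly log(n_tokens), so raising the
# resolution (more tokens) raises the attention entropy unless it is rescaled.
for n_tokens in (64, 256, 1024):
    assert abs(attention_entropy(np.zeros(n_tokens)) - np.log(n_tokens)) < 1e-9
```

This is why a resolution-dependent rescaling of the attention logits, as the abstract describes, can keep the aggregation behavior comparable across image sizes.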

YNICL Journal 2022 Journal Article

Changes in brain connectivity linked to multisensory processing of pain modulation in migraine with acupuncture treatment

  • Lu Liu
  • Tian-Li Lyu
  • Ming-Yang Fu
  • Lin-Peng Wang
  • Ying Chen
  • Jia-Hui Hong
  • Qiu-Yi Chen
  • Yu-Pu Zhu

Migraine without aura (MWoA) is a major neurological disorder with unsatisfactory adherence to current medications. Acupuncture has emerged as a promising method for treating MWoA. However, the brain mechanism underlying acupuncture is yet unclear. The present study aimed to examine the effects of acupuncture in regulating brain connectivity of the key regions in pain modulation. In this study, MWoA patients were recruited and randomly assigned to 4 weeks of real or sham acupuncture. Resting-state functional magnetic resonance imaging (fMRI) data were collected before and after the treatment. A modern neuroimaging literature meta-analysis of 515 fMRI studies was conducted to identify pain modulation-related key regions as regions of interest (ROIs). Seed-to-voxel resting state-functional connectivity (rsFC) method and repeated-measures two-way analysis of variance were conducted to determine the interaction effects between the two groups and time (baseline and post-treatment). The changes in rsFC were evaluated between baseline and post-treatment in real and sham acupuncture groups, respectively. Clinical data at baseline and post-treatment were also recorded in order to determine between-group differences in clinical outcomes as well as correlations between rsFC changes and clinical effects. 40 subjects were involved in the final analysis. The current study demonstrated significant improvement in real acupuncture vs sham acupuncture on headache severity (monthly migraine days), headache impact (6-item Headache Impact Test), and health-related quality of life (Migraine-Specific Quality of Life Questionnaire). Five pain modulation-related key regions, including the right amygdala (AMYG), left insula (INS), left medial orbital superior frontal gyrus (PFCventmed), left middle occipital gyrus (MOG), and right middle cingulate cortex (MCC), were selected based on the meta-analysis on brain imaging studies. This study found that 1) after acupuncture treatment, migraine patients of the real acupuncture group showed significantly enhanced connectivity in the right AMYG/MCC-left MTG and the right MCC-right superior temporal gyrus (STG) compared to that of the sham acupuncture group; 2) negative correlations were established between clinical effects and increased rsFC in the right AMYG/MCC-left MTG; 3) baseline right AMYG-left MTG rsFC predicts monthly migraine days reduction after treatment. The current results suggested that acupuncture may concurrently regulate the rsFC of two pain modulation regions in the AMYG and MCC. MTG and STG may be the key nodes linked to multisensory processing of pain modulation in migraine with acupuncture treatment. These findings highlighted the potential of acupuncture for migraine management and the mechanisms underlying the modulation effects.

IJCAI Conference 2022 Conference Paper

Data-Efficient Backdoor Attacks

  • Pengfei Xia
  • Ziqiang Li
  • Wei Zhang
  • Bin Li

Recent studies have proven that deep neural networks are vulnerable to backdoor attacks. Specifically, by mixing a small number of poisoned samples into the training set, the behavior of the trained model can be maliciously controlled. Existing attack methods construct such adversaries by randomly selecting some clean data from the benign set and then embedding a trigger into them. However, this selection strategy ignores the fact that each poisoned sample contributes unequally to the backdoor injection, which reduces the efficiency of poisoning. In this paper, we formulate improving the poisoned data efficiency by the selection as an optimization problem and propose a Filtering-and-Updating Strategy (FUS) to solve it. The experimental results on CIFAR-10 and ImageNet-10 indicate that the proposed method is effective: the same attack success rate can be achieved with only 47% to 75% of the poisoned sample volume compared to the random selection strategy. More importantly, the adversaries selected according to one setting can generalize well to other settings, exhibiting strong transferability. The prototype code of our method is now available at https://github.com/xpf/Data-Efficient-Backdoor-Attacks.

AIJ Journal 2022 Journal Article

Diffusion auction design

  • Bin Li
  • Dong Hao
  • Hui Gao
  • Dengji Zhao

This paper studies an auction design problem for a seller to sell a single commodity in a social network, where each individual (the seller or a buyer) can only communicate with her neighbors. The challenge is to design a mechanism to incentivize the buyers, who are aware of the auction, to further propagate the information to their neighbors, so that more buyers can participate in the auction and hence, the seller will be able to make a higher revenue and a higher welfare. We build a general framework for this new scenario and propose several novel diffusion auctions, which not only incentivize the buyers to report their valuations on the commodity truthfully, but also to propagate the auction information to all their neighbors. Particularly, the direct extension of the well-known Vickrey-Clarke-Groves (VCG) mechanism on social networks can have the incentives, but it will decrease the seller's revenue or even lead to a deficit. We also show that in the social network setting all efficient mechanisms that are individually rational and incentive compatible can lead to a deficit. The goal in this article is to increase the seller's revenue by attracting more buyers, so we give up welfare maximization and propose a class of mechanisms called critical diffusion mechanisms. It is proved that both the seller's revenue and the social welfare achieved in critical diffusion mechanisms are not less than that given in the VCG mechanism before attracting new buyers. The intuition behind the proposed mechanisms is that buyers who join the mechanism earlier have higher priorities to buy the commodity. If a buyer does not win the commodity because of her propagation, then she will be compensated. The formalization of the problem has not been well-studied in the literature of mechanism design, and there are many open problems worth further investigation. The study of this problem will provide insights for the emerging market based on the participants' recommendations via their social networks.

EAAI Journal 2022 Journal Article

Hybrid Dynamic Bayesian network method for performance analysis of safety barriers considering multi-maintenance strategies

  • Shengnan Wu
  • Bin Li
  • Yangfan Zhou
  • Maoyu Chen
  • Yiliu Liu
  • Laibin Zhang

Safety barriers play a critical role in preventing unintentional hydrocarbon flow leaking from reservoir to external environment or another formation during different offshore operation stages, and such a leakage has the potential to trigger cascading events and may lead to catastrophic consequences. The present study aims at the development of a hybrid DBN-based approach for dynamic performance analysis of safety barriers in the prevention of subsea downhole leakage incidents. Events in operation, such as different types of maintenance and process demand, are taken into account to enhance the safety barrier performance. These factors could be analyzed by reflecting inspection and repair activities of safety barriers with a multistate-based multiphase Markov process. In order to obtain a dynamic and synthetic risk analysis of subsea downhole leakage, a dynamic Bayesian network-based model is proposed, incorporating the failure analysis of safety barriers and downhole multiple leakage pathways. Such analysis allows determining the dynamic risk characteristic of leakage events, and key safety barriers under different maintenance scenarios. Dynamic performance of such safety barriers is evaluated with respect to four aspects: preventive maintenance and imperfect repair, degradation effects, process demand and maintenance cost. The approach is tested through the application to a case study with an offshore oil and gas well. The results highlight the importance of safety barrier performance in controlling the expected leakage scenarios.

IJCAI Conference 2022 Conference Paper

Learning General Gaussian Mixture Model with Integral Cosine Similarity

  • Guanglin Li
  • Bin Li
  • Changsheng Chen
  • Shunquan Tan
  • Guoping Qiu

Gaussian mixture model (GMM) is a powerful statistical tool in data modeling, especially for unsupervised learning tasks. Traditional learning methods for GMM such as expectation maximization (EM) require the covariance of the Gaussian components to be non-singular, a condition that is often not satisfied in real-world applications. This paper presents a new learning method called G$^2$M$^2$ (General Gaussian Mixture Model) by fitting an unnormalized Gaussian mixture function (UGMF) to a data distribution. At the core of G$^2$M$^2$ is the introduction of an integral cosine similarity (ICS) function for comparing the UGMF and the unknown data density distribution without having to explicitly estimate it. By maximizing the ICS through Monte Carlo sampling, the UGMF can be made to overlap with the unknown data density distribution such that the two only differ by a constant scalar, and the UGMF can be normalized to obtain the data density distribution. A Siamese convolutional neural network is also designed for optimizing the ICS function. Experimental results show that our method is more competitive in modeling data having correlations that may lead to singular covariance matrices in GMM, and it outperforms state-of-the-art methods in unsupervised anomaly detection.

AAAI Conference 2022 Conference Paper

Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

  • Yufei Kuang
  • Miao Lu
  • Jie Wang
  • Qi Zhou
  • Bin Li
  • Houqiang Li

Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments. This discrepancy is commonly viewed as the disturbance in transition dynamics. Many existing algorithms learn robust policies by modeling the disturbance and applying it to source environments during training, which usually requires prior knowledge about the disturbance and control of simulators. However, these algorithms can fail in scenarios where the disturbance from target environments is unknown or is intractable to model in simulators. To tackle this problem, we propose a novel model-free actor-critic algorithm—namely, State-Conservative Policy Optimization (SCPO)—to learn robust policies without modeling the disturbance in advance. Specifically, SCPO reduces the disturbance in transition dynamics to that in state space and then approximates it by a simple gradient-based regularizer. The appealing features of SCPO include that it is simple to implement and does not require additional knowledge about the disturbance or specially designed simulators. Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.

ICRA Conference 2022 Conference Paper

Multi-Dimensional Proprioception and Stiffness Tuning for Soft Robotic Joints

  • Zhonggui Fang
  • Chaoyi Huang
  • Yaxi Wang
  • Jiahao Xu
  • Jiyong Tan
  • Bin Li
  • Zichen Wang
  • Yige Wu

Proprioception and variable stiffness are two trending topics in soft robotics research. The former could endow soft robots with the ability to perceive the environment as well as their internal states without the need of dedicated sensors, while the latter could strengthen the otherwise excessive compliance, enabling soft robots for tasks which require a higher force. Although both directions have been extensively reported in the existing literature, achieving them concurrently is even more challenging. The major limiting factor is the limited stiffness due to the hyperelasticity of conventional soft robots, which increases the difficulty of capturing their continuous deformation. In this work, we propose an alternative, novel “tune-down” approach to tackle these two challenges, combining proprioception with stiffness regulation, and implement over-constrained soft robotic joint designs to further strengthen this spirit. As a result, the soft robotic joint could achieve multi-directional proprioception, as well as variable stiffness tuning, concurrently, using merely an on-board sensor for basic pneumatic control. The concept, design, modeling, actuation/control, and experimental validation were presented in detail, demonstrating the efficacy and potential of the proposed approach.

NeurIPS Conference 2022 Conference Paper

Roadblocks for Temporarily Disabling Shortcuts and Learning New Knowledge

  • Hongjing Niu
  • Hanting Li
  • Feng Zhao
  • Bin Li

Deep learning models have been found to rely on shortcuts, i.e., decision rules that perform well on standard benchmarks but fail when transferred to more challenging testing conditions. Such reliance may hinder deep learning models from learning other task-related features and seriously affect their performance and robustness. Although recent studies have shown some characteristics of shortcuts, there are few investigations on how to help deep learning models solve shortcut problems. This paper proposes a framework to address this issue by setting up roadblocks on shortcuts. Specifically, roadblocks are placed by urging the model to learn to complete a gently modified task, ensuring that the learned knowledge, including shortcuts, is insufficient to complete the task. Therefore, the model trained on the modified task will no longer over-rely on shortcuts. Extensive experiments demonstrate that the proposed framework significantly improves the training of networks on both synthetic and real-world datasets in terms of both classification accuracy and feature diversity. Moreover, the visualization results show that the mechanism behind the proposed method is consistent with our expectations. In summary, our approach can effectively disable the shortcuts and thus learn more robust features.

AAAI Conference 2022 Conference Paper

Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic

  • Zhihai Wang
  • Jie Wang
  • Qi Zhou
  • Bin Li
  • Houqiang Li

Model-based reinforcement learning algorithms, which aim to learn a model of the environment to make decisions, are more sample efficient than their model-free counterparts. The sample efficiency of model-based approaches relies on whether the model can well approximate the environment. However, learning an accurate model is challenging, especially in complex and noisy environments. To tackle this problem, we propose the conservative model-based actor-critic (CMBAC), a novel approach that achieves high sample efficiency without a strong reliance on accurate learned models. Specifically, CMBAC learns multiple estimates of the Q-value function from a set of inaccurate models and uses the average of the bottom-k estimates—a conservative estimate—to optimize the policy. An appealing feature of CMBAC is that the conservative estimates effectively encourage the agent to avoid unreliable “promising actions”—whose values are high in only a small fraction of the models. Experiments demonstrate that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks, and the proposed method is more robust than previous methods in noisy environments.
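
The bottom-k conservative estimate at the core of the abstract is easy to sketch; the numbers below are illustrative, not from the paper:

```python
import numpy as np

def conservative_q(q_estimates, k):
    """Average of the bottom-k Q-value estimates across an ensemble.

    q_estimates holds one Q(s, a) estimate per learned model. Returning the
    mean of the k smallest estimates down-weights actions that look good in
    only a few models.
    """
    bottom_k = np.sort(np.asarray(q_estimates))[:k]
    return float(bottom_k.mean())

# An "unreliable promising action": high value in one model only.
risky = conservative_q([9.0, 1.0, 1.2, 0.8], k=2)  # mean of {0.8, 1.0} = 0.9
safe = conservative_q([2.1, 2.0, 1.9, 2.2], k=2)   # mean of {1.9, 2.0} = 1.95
```

Under the conservative estimate the consistently decent action scores higher than the one inflated by a single optimistic model, which is the behavior the abstract describes.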

AAAI Conference 2022 Conference Paper

Text Gestalt: Stroke-Aware Scene Text Image Super-resolution

  • Jingye Chen
  • Haiyang Yu
  • Jianqi Ma
  • Bin Li
  • Xiangyang Xue

In the last decade, the blossom of deep learning has witnessed the rapid development of scene text recognition. However, the recognition of low-resolution scene text images remains a challenge. Even though some super-resolution methods have been proposed to tackle this problem, they usually treat text images as general images while ignoring the fact that the visual quality of strokes (the atomic unit of text) plays an essential role for text recognition. According to Gestalt Psychology, humans are capable of composing parts of details into the most similar objects guided by prior knowledge. Likewise, when humans observe a low-resolution text image, they will inherently use partial stroke-level details to recover the appearance of holistic characters. Inspired by Gestalt Psychology, we put forward a Stroke-Aware Scene Text Image Super-Resolution method containing a Stroke-Focused Module (SFM) to concentrate on stroke-level internal structures of characters in text images. Specifically, we attempt to design rules for decomposing English characters and digits at the stroke level, then pre-train a text recognizer to provide stroke-level attention maps as positional clues with the purpose of controlling the consistency between the generated super-resolution image and high-resolution ground truth. The extensive experimental results validate that the proposed method can indeed generate more distinguishable images on TextZoom and manually constructed Chinese character dataset Degraded-IC13. Furthermore, since the proposed SFM is only used to provide stroke-level guidance when training, it will not bring any time overhead during the test phase. Code is available at https://github.com/FudanVI/FudanOCR.

AAAI Conference 2022 Conference Paper

Unsupervised Learning of Compositional Scene Representations from Multiple Unspecified Viewpoints

  • Jinyang Yuan
  • Bin Li
  • Xiangyang Xue

Visual scenes are extremely rich in diversity, not only because there are infinite combinations of objects and background, but also because the observations of the same scene may vary greatly with the change of viewpoints. When observing a visual scene that contains multiple objects from multiple viewpoints, humans are able to perceive the scene in a compositional way from each viewpoint, while achieving the so-called “object constancy” across different viewpoints, even though the exact viewpoints are untold. This ability is essential for humans to identify the same object while moving and to learn from vision efficiently. It is intriguing to design models that have a similar ability. In this paper, we consider a novel problem of learning compositional scene representations from multiple unspecified viewpoints without using any supervision, and propose a deep generative model which separates latent representations into a viewpoint-independent part and a viewpoint-dependent part to solve this problem. To infer latent representations, the information contained in different viewpoints is iteratively integrated by neural networks. Experiments on several specifically designed synthetic datasets have shown that the proposed method is able to effectively learn from multiple unspecified viewpoints.

EAAI Journal 2021 Journal Article

A novel time-power based grey model for nonlinear time series forecasting

  • Keyong Wan
  • Bin Li
  • Weijie Zhou
  • Haicheng Zhu
  • Song Ding

To deal with various nonlinear issues in real applications, a novel time-power based grey model is put forward. In the original form of this model, however, the time-power parameter α normally equals an integer, for which the analytical expression of the time response function can be obtained. If the parameter α equals a non-integer, one cannot obtain the concrete time response function for future estimations. This situation may significantly restrict the applications of this grey model. To address such drawbacks, an optimized version is designed in this work. In the proposed model, a simplified solution to the differential equation is derived by using the definite integral technique. Furthermore, for improving accuracy, the time-power parameter α is optimized by utilizing the Particle Swarm Optimization algorithm based on the model parameter packages. Subsequently, the efficacy and practicality of this simplified function have been verified by numerical simulations and experimental studies. Moreover, the method of probability density prediction is employed for the first time to verify the reliability and stability of the proposed model when predicting the settlement of the soft-clay subgrade on an expressway. The demonstration cases illustrate that the quantitative improvements of the proposed model's forecasts are even more pronounced, with MAPE values of 2.29% and 1.19% in the fitted and predicted periods, respectively, increasing the predicting accuracy by more than 10% with respect to the other benchmarks. Therefore, the new proposed model not only has greater application fields and prospects but also achieves higher and more reliable predicting accuracy with the optimal α under the support of the Particle Swarm Optimization algorithm, compared with the competing models.

NeurIPS Conference 2021 Conference Paper

An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints

  • Xin Liu
  • Bin Li
  • Pengyi Shi
  • Lei Ying

This paper considers stochastic linear bandits with general nonlinear constraints. The objective is to maximize the expected cumulative reward over horizon $T$ subject to a set of constraints in each round $\tau\leq T$. We propose a pessimistic-optimistic algorithm for this problem, which is efficient in two aspects. First, the algorithm yields $\tilde{\cal O}\left(\left(\frac{K^{0.75}}{\delta}+d\right)\sqrt{\tau}\right)$ (pseudo) regret in round $\tau\leq T$, where $K$ is the number of constraints, $d$ is the dimension of the reward feature space, and $\delta$ is Slater's constant; and {\em zero} constraint violation in any round $\tau>\tau'$, where $\tau'$ is {\em independent} of horizon $T$. Second, the algorithm is computationally efficient. Our algorithm is based on the primal-dual approach in optimization and includes two components. The primal component is similar to unconstrained stochastic linear bandits (our algorithm uses the linear upper confidence bound algorithm (LinUCB)). The computational complexity of the dual component depends on the number of constraints, but is independent of the sizes of the contextual space, the action space, and the feature space. Thus, the computational complexity of our algorithm is similar to LinUCB for unconstrained stochastic linear bandits.
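
A minimal sketch of the primal-dual skeleton the abstract describes: a LinUCB-style optimistic index for rewards plus a projected dual update for the constraint. The toy arms, step sizes, and the simplification that per-arm costs are known exactly are our assumptions for illustration, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: 3 arms, d = 2 features, one constraint "mean cost <= 0.5".
features = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
theta_r = np.array([0.3, 0.9])     # true reward parameter (unknown to learner)
costs = np.array([0.1, 0.9, 0.4])  # per-arm mean cost (known here for brevity)
budget = 0.5

d = 2
A = np.eye(d)                      # ridge regression statistics for rewards
b = np.zeros(d)
lam = 0.0                          # dual variable for the constraint
eta, alpha = 0.1, 1.0              # dual step size, exploration bonus scale

for t in range(2000):
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    # Optimistic (UCB) reward index per arm, penalized by dual-weighted cost.
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', features, A_inv, features))
    idx = int(np.argmax(features @ theta_hat + alpha * bonus - lam * costs))
    x = features[idx]
    reward = x @ theta_r + 0.1 * rng.standard_normal()
    A += np.outer(x, x)
    b += reward * x
    # Projected dual update: grow lam whenever the budget is exceeded.
    lam = max(0.0, lam + eta * (costs[idx] - budget))
```

The dual variable prices the constraint: arms that overspend the budget inflate `lam` and become less attractive, while the primal component remains an ordinary LinUCB index.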

IJCAI Conference 2021 Conference Paper

Bayesian Nonparametric Space Partitions: A Survey

  • Xuhui Fan
  • Bin Li
  • Ling Luo
  • Scott A. Sisson

Bayesian nonparametric space partition (BNSP) models provide a variety of strategies for partitioning a D-dimensional space into a set of blocks, such that the data within the same block share certain kinds of homogeneity. BNSP models are applicable to many areas, including regression/classification trees, random feature construction, and relational modelling. This survey provides the first comprehensive review of this subject. We explore the current progress of BNSP research through three perspectives: (1) Partition strategies, where we review the various techniques for generating partitions and discuss their theoretical foundation, ‘self-consistency’; (2) Applications, where we detail the current mainstream usages of BNSP models and identify some potential future applications; and (3) Challenges, where we discuss current unsolved problems and possible avenues for future research.

NeurIPS Conference 2021 Conference Paper

Continuous-time edge modelling using non-parametric point processes

  • Xuhui Fan
  • Bin Li
  • Feng Zhou
  • Scott Sisson

The mutually-exciting Hawkes process (ME-HP) is a natural choice to model reciprocity, which is an important attribute of continuous-time edge (dyadic) data. However, existing ways of implementing the ME-HP for such data are either inflexible, as the exogenous (background) rate functions are typically constant and the endogenous (excitation) rate functions are specified parametrically, or inefficient, as inference usually relies on Markov chain Monte Carlo methods with high computational costs. To address these limitations, we discuss various approaches to model design, and develop three variants of non-parametric point processes for continuous-time edge modelling (CTEM). The resulting models are highly adaptable as they generate intensity functions through sigmoidal Gaussian processes, and so provide greater modelling flexibility than parametric forms. The models are implemented via a fast variational inference method enabled by a novel edge modelling construction. The superior performance of the proposed CTEM models is demonstrated through extensive experimental evaluations on four real-world continuous-time edge data sets.

NeurIPS Conference 2021 Conference Paper

Deep Contextual Video Compression

  • Jiahao Li
  • Bin Li
  • Yan Lu

Most of the existing neural video compression methods adopt the predictive coding framework, which first generates the predicted frame and then encodes its residue with the current frame. However, as for compression ratio, predictive coding is only a sub-optimal solution as it uses simple subtraction operation to remove the redundancy across frames. In this paper, we propose a deep contextual video compression framework to enable a paradigm shift from predictive coding to conditional coding. In particular, we try to answer the following questions: how to define, use, and learn condition under a deep video compression framework. To tap the potential of conditional coding, we propose using feature domain context as condition. This enables us to leverage the high dimension context to carry rich information to both the encoder and the decoder, which helps reconstruct the high-frequency contents for higher video quality. Our framework is also extensible, in which the condition can be flexibly designed. Experiments show that our method can significantly outperform the previous state-of-the-art (SOTA) deep video compression methods. When compared with x265 using veryslow preset, we can achieve 26.0% bitrate saving for 1080P standard test videos.

AAAI Conference 2021 Conference Paper

Knowledge-Guided Object Discovery with Acquired Deep Impressions

  • Jinyang Yuan
  • Bin Li
  • Xiangyang Xue

We present a framework called Acquired Deep Impressions (ADI) which continuously learns knowledge of objects as “impressions” for compositional scene understanding. In this framework, the model first acquires knowledge from scene images containing a single object in a supervised manner, and then continues to learn from novel multi-object scene images which may contain objects that have not been seen before without any further supervision, under the guidance of the learned knowledge as humans do. By memorizing impressions of objects into parameters of neural networks and applying the generative replay strategy, the learned knowledge can be reused to generate images with pseudo-annotations and in turn assist the learning of novel scenes. The proposed ADI framework focuses on the acquisition and utilization of knowledge, and is complementary to existing deep generative models proposed for compositional scene representation. We adapt a base model to make it fall within the ADI framework and conduct experiments on two types of datasets. Empirical results suggest that the proposed framework is able to effectively utilize the acquired impressions and improve the scene decomposition performance.

AAAI Conference 2021 Conference Paper

Raven’s Progressive Matrices Completion with Latent Gaussian Process Priors

  • Fan Shi
  • Bin Li
  • Xiangyang Xue

Abstract reasoning ability is fundamental to human intelligence. It enables humans to uncover relations among abstract concepts and further deduce implicit rules from the relations. As a well-known abstract visual reasoning task, Raven’s Progressive Matrices (RPM) are widely used in human IQ tests. Although extensive research has been conducted on RPM solvers with machine intelligence, few studies have considered further advancing the standard answer-selection (classification) problem to a more challenging answer-painting (generating) problem, which can verify whether the model has indeed understood the implicit rules. In this paper we aim to solve the latter one by proposing a deep latent variable model, in which multiple Gaussian processes are employed as priors of latent variables to separately learn underlying abstract concepts from RPMs; thus the proposed model is interpretable in terms of concept-specific latent variables. The latent Gaussian process also provides an effective way of extrapolation for answer painting based on the learned concept-changing rules. We evaluate the proposed model on RPM-like datasets with multiple continuously-changing visual concepts. Experimental results demonstrate that our model requires only a few training samples to paint high-quality answers, generate novel RPM panels, and achieve interpretability through concept-specific latent variables.

IJCAI Conference 2021 Conference Paper

Zero-Shot Chinese Character Recognition with Stroke-Level Decomposition

  • Jingye Chen
  • Bin Li
  • Xiangyang Xue

Chinese character recognition has attracted much research interest due to its wide applications. Although it has been studied for many years, some issues in this field have not been completely resolved yet, e.g., the zero-shot problem. Previous character-based and radical-based methods have not fundamentally addressed the zero-shot problem since some characters or radicals in test sets may not appear in training sets under a data-hungry condition. Inspired by the fact that humans can generalize to know how to write characters unseen before if they have learned stroke orders of some characters, we propose a stroke-based method by decomposing each character into a sequence of strokes, which are the most basic units of Chinese characters. However, we observe that there is a one-to-many relationship between stroke sequences and Chinese characters. To tackle this challenge, we employ a matching-based strategy to transform the predicted stroke sequence to a specific character. We evaluate the proposed method on handwritten characters, printed artistic characters, and scene characters. The experimental results validate that the proposed method outperforms existing methods on both character zero-shot and radical zero-shot tasks. Moreover, the proposed method can be easily generalized to other languages whose characters can be decomposed into strokes.
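
The matching-based strategy above can be sketched as a nearest-neighbor lookup over canonical stroke sequences; the tiny stroke inventory and dictionary below are hypothetical, and edit distance stands in for whatever matching criterion the paper actually uses:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance over stroke tokens."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + cost)
    return dp[m][n]

# Hypothetical stroke dictionary: character -> canonical stroke sequence.
STROKES = {
    "十": ["heng", "shu"],
    "土": ["heng", "shu", "heng"],
    "王": ["heng", "heng", "shu", "heng"],
}

def match_character(predicted):
    """Map a predicted stroke sequence to the closest dictionary entry."""
    return min(STROKES, key=lambda c: edit_distance(predicted, STROKES[c]))

pred = ["heng", "shu", "heng"]  # a (possibly noisy) predicted sequence
char = match_character(pred)    # exact match here: "土"
```

Because unseen characters only need an entry in the stroke dictionary, this lookup extends to zero-shot classes without retraining the stroke predictor.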

IJCAI Conference 2020 Conference Paper

Incentive-Compatible Diffusion Auctions

  • Bin Li
  • Dong Hao
  • Dengji Zhao

Diffusion auction is a new model in auction design. It can incentivize the buyers who have already joined in the auction to further diffuse the sale information to others via social relations, whereby both the seller's revenue and the social welfare can be improved. Diffusion auctions are essentially non-typical multidimensional mechanism design problems and agents' social relations are complicatedly involved with their bids. In such auctions, incentive-compatibility (IC) means it is best for every agent to honestly report her valuation and fully diffuse the sale information to all her neighbors. Existing work identified some specific mechanisms for diffusion auctions, while a general theory characterizing all incentive-compatible diffusion auctions is still missing. In this work, we identify a sufficient and necessary condition for all dominant-strategy incentive-compatible (DSIC) diffusion auctions. We formulate the monotonic allocation policies in such multidimensional problems and show that any monotonic allocation policy can be implemented in a DSIC diffusion auction mechanism. Moreover, given any monotonic allocation policy, we obtain the optimal payment policy to maximize the seller's revenue.

YNICL Journal 2020 Journal Article

Neural primacy of the dorsolateral prefrontal cortex in patients with obsessive-compulsive disorder

  • Hailong Li
  • Xinyu Hu
  • Yingxue Gao
  • Lingxiao Cao
  • Lianqing Zhang
  • Xuan Bu
  • Lu Lu
  • Yanlin Wang

The dorsolateral prefrontal cortex (DLPFC), a key structure in the executive system, has consistently emerged as a crucial element in the pathophysiology of obsessive-compulsive disorder (OCD). However, the neural primacy of the DLPFC remains elusive in this disorder. We investigated the causal interaction (measured by effective connectivity) between the DLPFC and the remaining brain areas using bivariate Granger causality analysis of resting-state fMRI collected from 88 medication-free OCD patients and 88 matched healthy controls. Additionally, we conducted seed-based functional connectivity (FC) analyses to identify network-level neural functional alterations using the bilateral DLPFC as seeds. OCD patients demonstrated reduced FC between the right DLPFC and right orbitofrontal cortex (OFC), and activity in the right OFC had an inhibitory effect on the right DLPFC. Additionally, we observed alterations in both feedforward and reciprocal influences between the inferior temporal gyrus (ITG) and the DLPFC in patients. Furthermore, activity in the cerebellum had an excitatory influence on the right DLPFC in OCD patients. These findings may help to elucidate the psychopathology of OCD by detailing the directional connectivity between the DLPFC and the rest of the brain, ultimately helping to identify regions that could serve as treatment targets in OCD.

AAAI Conference 2020 Conference Paper

Outlier Detection Ensemble with Embedded Feature Selection

  • Li Cheng
  • Yijie Wang
  • Xinwang Liu
  • Bin Li

Feature selection plays an important role in improving the performance of outlier detection, especially for noisy data. Existing methods usually perform feature selection and outlier scoring separately, which would select feature subsets that may not optimally serve outlier detection, leading to unsatisfying performance. In this paper, we propose an outlier detection ensemble framework with embedded feature selection (ODEFS) to address this issue. Specifically, for each random sub-sampling based learning component, ODEFS unifies feature selection and outlier detection into a pairwise ranking formulation to learn feature subsets that are tailored for the outlier detection method. Moreover, we adopt thresholded self-paced learning to simultaneously optimize feature selection and example selection, which is helpful to improve the reliability of the training set. After that, we design an alternating algorithm with proven convergence to solve the resultant optimization problem. In addition, we analyze the generalization error bound of the proposed framework, which provides a theoretical guarantee on the method and insightful practical guidance. Comprehensive experimental results on 12 real-world datasets from diverse domains validate the superiority of the proposed ODEFS.

IJCAI Conference 2020 Conference Paper

Recurrent Dirichlet Belief Networks for interpretable Dynamic Relational Data Modelling

  • Yaqiong Li
  • Xuhui Fan
  • Ling Chen
  • Bin Li
  • Zheng Yu
  • Scott A. Sisson

The Dirichlet Belief Network (DirBN) has been recently proposed as a promising approach to learning interpretable deep latent representations for objects. In this work, we leverage its interpretable modelling architecture and propose a deep dynamic probabilistic framework—the Recurrent Dirichlet Belief Network (Recurrent-DBN)—to study interpretable hidden structures in dynamic relational data. The proposed Recurrent-DBN has the following merits: (1) it infers interpretable and organised hierarchical latent structures for objects within and across time steps; (2) it enables recurrent long-term temporal dependence modelling, which outperforms the first-order Markov descriptions used in most dynamic probabilistic frameworks; (3) its computational cost scales with only the number of positive links. In addition, we develop a new inference strategy, which first upward-and-backward propagates latent counts and then downward-and-forward samples variables, to enable efficient Gibbs sampling for the Recurrent-DBN. We apply the Recurrent-DBN to dynamic relational data problems. The extensive experimental results on real-world data validate the advantages of the Recurrent-DBN over the state-of-the-art models in interpretable latent structure discovery and improved link prediction performance.

YNICL Journal 2019 Journal Article

Characteristic alteration of subcortical nuclei shape in medication-free patients with obsessive-compulsive disorder

  • Lianqing Zhang
  • Xinyu Hu
  • Hailong Li
  • Lu Lu
  • Bin Li
  • Xiaoxiao Hu
  • Xuan Bu
  • Shi Tang

BACKGROUND: Subcortical nuclei are important components in the pathology model of obsessive-compulsive disorder (OCD), and subregions of these structures subserve different functions that may distinctively contribute to OCD symptoms. Exploration of the subregional-level profile of structural abnormalities of these nuclei is needed to develop a better understanding of the neural mechanism of OCD. METHODS: A total of 83 medication-free, non-comorbid OCD patients and 93 age- and sex-matched healthy controls were recruited, and high-resolution T1-weighted MR images were obtained for all participants. The volume and shape of the subcortical nuclei (including the nucleus accumbens, amygdala, caudate, pallidum, putamen and thalamus) were quantified and compared with an automated parcellation approach and vertex-wise shape analysis using FSL-FIRST software. Sex differences in these measurements were also explored with an exploratory subgroup analysis. RESULTS: Volumetric analysis showed no significant differences between patients and healthy control subjects. Relative to healthy control subjects, the OCD patients showed an expansion of the lateral amygdala (right hemisphere) and right pallidum. These deformities were associated with illness duration and symptom severity of OCD. Exploratory subgroup analysis by sex revealed amygdala deformity in male patients and caudate deformity in female patients. CONCLUSIONS: The lateral amygdala and the dorsal pallidum were associated with OCD. Neuroanatomic evidence of sexual dimorphism was also found in OCD. Our study not only provides deeper insight into how these structures contribute to OCD symptoms by revealing these subregional-level deformities but also suggests that gender effects may be important in OCD studies.

JMLR Journal 2019 Journal Article

Convergence of Gaussian Belief Propagation Under General Pairwise Factorization: Connecting Gaussian MRF with Pairwise Linear Gaussian Model

  • Bin Li
  • Yik-Chung Wu

Gaussian belief propagation (BP) is a low-complexity and distributed method for computing the marginal distributions of a high-dimensional joint Gaussian distribution. However, Gaussian BP is only guaranteed to converge in singly connected graphs and may fail to converge in loopy graphs. Therefore, convergence analysis is a core topic in Gaussian BP. Existing conditions for verifying the convergence of Gaussian BP are all tailored for one particular pairwise factorization of the distribution in Gaussian Markov random field (MRF) and may not be valid for another pairwise factorization. On the other hand, convergence conditions of Gaussian BP in pairwise linear Gaussian model are developed independently from those in Gaussian MRF, making the convergence results highly scattered with diverse settings. In this paper, the convergence condition of Gaussian BP is investigated under a general pairwise factorization, which includes Gaussian MRF and pairwise linear Gaussian model as special cases. Upon this, existing convergence conditions in Gaussian MRF are extended to any pairwise factorization. Moreover, the newly established link between Gaussian MRF and pairwise linear Gaussian model reveals an easily verifiable sufficient convergence condition in pairwise linear Gaussian model, which provides a unified criterion for assessing the convergence of Gaussian BP in multiple applications. Numerical examples are presented to corroborate the theoretical results of this paper.
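
A minimal sketch of Gaussian BP in the information (precision) form on a 3-node chain, one particular pairwise factorization of a Gaussian MRF. The graph is singly connected, so, as the abstract notes, BP is guaranteed to converge and here recovers the exact marginals:

```python
import numpy as np

# Model: p(x) ∝ exp(-x^T J x / 2 + h^T x) on a 3-node chain.
J = np.array([[2.0, 0.5, 0.0],
              [0.5, 2.0, 0.3],
              [0.0, 0.3, 2.0]])  # precision matrix (diagonally dominant)
h = np.array([1.0, 0.0, -1.0])  # potential vector; mean = J^{-1} h

edges = [(0, 1), (1, 2)]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
# msgs[(i, j)] = (precision msg, potential msg) from node i to node j
msgs = {(i, j): (0.0, 0.0) for i, j in edges}
msgs.update({(j, i): (0.0, 0.0) for i, j in edges})

for _ in range(50):  # synchronous updates; exact after a few sweeps on a tree
    new = {}
    for (i, j) in msgs:
        Jij = J[i, i] + sum(msgs[(k, i)][0] for k in neighbors[i] if k != j)
        hij = h[i] + sum(msgs[(k, i)][1] for k in neighbors[i] if k != j)
        new[(i, j)] = (-J[i, j] ** 2 / Jij, -J[i, j] * hij / Jij)
    msgs = new

# Marginal of node i: local term plus all incoming messages.
marg_prec = np.array([J[i, i] + sum(msgs[(k, i)][0] for k in neighbors[i])
                      for i in range(3)])
marg_mean = np.array([(h[i] + sum(msgs[(k, i)][1] for k in neighbors[i]))
                      / marg_prec[i] for i in range(3)])

# Ground truth for comparison, via direct linear algebra.
exact_mean = np.linalg.solve(J, h)
exact_prec = 1.0 / np.diag(np.linalg.inv(J))
```

On this tree the BP marginal means and precisions coincide with the exact ones; on loopy graphs the same updates may diverge, which is what the convergence conditions studied in the paper address.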

IJCAI Conference 2019 Conference Paper

Diffusion and Auction on Graphs

  • Bin Li
  • Dong Hao
  • Dengji Zhao
  • Makoto Yokoo

Auction is the common paradigm for resource allocation, which is a fundamental problem in human society. Existing research indicates that the two primary objectives, the seller's revenue and the allocation efficiency, are generally conflicting in auction design. For the first time, we expand the domain of the classic auction to a social graph and formally identify a new class of auction mechanisms on graphs. All mechanisms in this class are incentive-compatible and also promote all buyers to diffuse the auction information to others, whereby both the seller's revenue and the allocation efficiency are significantly improved compared with the Vickrey auction. It is found that the recently proposed information diffusion mechanism is an extreme case with the lowest revenue in this new class. Our work could potentially inspire a new perspective for efficient and optimal auction design and could be applied to the prevalent online social and economic networks.

TAAS Journal 2019 Journal Article

Finding the Largest Successful Coalition under the Strict Goal Preferences of Agents

  • Zhaopin Su
  • Guofu Zhang
  • Feng Yue
  • Jindong He
  • Miqing Li
  • Bin Li
  • Xin Yao

Coalition formation has been a fundamental form of resource cooperation for achieving joint goals in multiagent systems. Most existing studies still focus on the traditional assumption that an agent has to contribute its resources to all the goals, even if the agent is not interested in the goal at all. In this article, a natural extension of the traditional coalitional resource games (CRGs) is studied from both theoretical and empirical perspectives, in which each agent has uncompromising, personalized preferences over goals. Specifically, a new CRGs model with agents’ strict preferences for goals is presented, in which an agent is willing to contribute its resources only to the goals that are in its own interest set. The computational complexity of the basic decision problems surrounding the successful coalition is reinvestigated. The results suggest that these problems in such a strict preference way are complex and intractable. To find the largest successful coalition for possible computation reduction or potential parallel processing, a flow-network–based exhaust algorithm, called FNetEA, is proposed to achieve the optimal solution. Then, to solve the problem more efficiently, a hybrid algorithm, named 2D-HA, is developed to find the approximately optimal solution on the basis of genetic algorithm, two-dimensional (2D) solution representation, and a heuristic for solution repairs. Through extensive experiments, the 2D-HA algorithm exhibits the prominent ability to provide reassurances that the optimal solution could be found within a reasonable period of time, even in a super-large-scale space.

JBHI Journal 2019 Journal Article

Microsimulation Model Using Christiana Care Early Warning System (CEWS) to Evaluate Physiological Deterioration

  • Bin Li
  • Shengfan Zhang
  • Stephen Hoover
  • Ryan Arnold
  • Muge Capan

While physiological warning signs prior to deterioration events during hospitalization have been widely studied, evaluating clinical interventions, such as rapid response team (RRT) activations, based on scoring systems remains an understudied area. Simulation of physiological deterioration patterns represented by scoring systems can facilitate testing different RRT policies without disturbing care processes. The Christiana Care Early Warning System (CEWS) is a scoring system developed at the study hospital to detect physiological warning signs and inform RRT activations. The objective of this study is to evaluate CEWS-triggered RRT policies based on patient demographics and policy structures. Using retrospective data derived from a subset of electronic health records between December 2015 and December 2016 (6000 patients), we developed a microsimulation model with integrated regression analysis to compare RRT policies on subpopulations defined by age, gender, and comorbidities to find score thresholds that result in the lowest percent of time spent above critical CEWS values. Policies that rely on average scores were more sensitive to threshold changes compared to policies that rely on the current value and change in the CEWS. The policy using score threshold 10 provided the lowest percentage of time under the critical condition for the majority of subpopulations. The proposed model is a novel framework to simulate individual deterioration patterns and systematically evaluate RRT policies based on their impact on health conditions. Our work highlights the importance of integrating data-driven models into personalized care and represents a significant opportunity to inform biomedical and health informatics research on designing and evaluating EWS-based clinical interventions.

NeurIPS Conference 2019 Conference Paper

Scalable Deep Generative Relational Model with High-Order Node Dependence

  • Xuhui Fan
  • Bin Li
  • Caoyuan Li
  • Scott Sisson
  • Ling Chen

In this work, we propose a probabilistic framework for relational data modelling and latent structure exploration. Given the possible feature information for the nodes in a network, our model builds up a deep architecture that can approximate the possible nonlinear mappings between the nodes' feature information and latent representations. For each node, we incorporate all its neighborhoods' high-order structure information to generate its latent representation, such that these latent representations are ``smooth'' in terms of the network. Since the latent representations are generated from Dirichlet distributions, we further develop a data augmentation trick to enable efficient Gibbs sampling for the Ber-Poisson likelihood with Dirichlet random variables. Our model is readily applicable to large sparse networks, as its computational cost scales with the number of positive links in the network. The superior performance of our model is demonstrated through improved link prediction performance on a range of real-world datasets.

AAAI Conference 2019 Conference Paper

Spatial Mixture Models with Learnable Deep Priors for Perceptual Grouping

  • Jinyang Yuan
  • Bin Li
  • Xiangyang Xue

Humans perceive the seemingly chaotic world in a structured and compositional way, with the prerequisite of being able to segregate conceptual entities from complex visual scenes. The mechanism of grouping basic visual elements of scenes into conceptual entities is termed perceptual grouping. In this work, we propose a new type of spatial mixture model with learnable priors for perceptual grouping. Different from existing methods, the proposed method disentangles the representation of an object into “shape” and “appearance”, which are modeled separately by the mixture weights and the conditional probability distributions. More specifically, each object in the visual scene is modeled by one mixture component, whose mixture weights and the parameters of the conditional probability distribution are generated by two neural networks, respectively. The mixture weights focus on modeling spatial dependencies (i.e., shape) and the conditional probability distributions deal with intra-object variations (i.e., appearance). In addition, the background is separately modeled as a special component complementary to the foreground objects. Our extensive empirical tests on two perceptual grouping datasets demonstrate that the proposed method outperforms the state-of-the-art methods under most experimental configurations. The learned conceptual entities are generalizable to novel visual scenes and insensitive to the diversity of objects.

AAAI Conference 2019 Conference Paper

Spatio-Temporal Graph Routing for Skeleton-Based Action Recognition

  • Bin Li
  • Xi Li
  • Zhongfei Zhang
  • Fei Wu

With its representation effectiveness, skeleton-based human action recognition has received considerable research attention and has a wide range of real applications. In this area, many existing methods typically rely on a fixed physical-connectivity skeleton structure for recognition, which is incapable of well capturing the intrinsic high-order correlations among skeleton joints. In this paper, we propose a novel spatio-temporal graph routing (STGR) scheme for skeleton-based action recognition, which adaptively learns the intrinsic high-order connectivity relationships for physically apart skeleton joints. Specifically, the scheme is composed of two components: the spatial graph router (SGR) and the temporal graph router (TGR). The SGR aims to discover the connectivity relationships among the joints based on sub-group clustering along the spatial dimension, while the TGR explores the structural information by measuring the correlation degrees between temporal joint node trajectories. The proposed scheme is naturally and seamlessly incorporated into the framework of graph convolutional networks (GCNs) to produce a set of skeleton-joint-connectivity graphs, which are further fed into the classification networks. Moreover, an insightful analysis of the receptive field of a graph node is provided to explain the necessity of our method. Experimental results on two benchmark datasets (NTU-RGB+D and Kinetics) demonstrate the effectiveness against the state-of-the-art.

TIST Journal 2018 Journal Article

Combination Forecasting Reversion Strategy for Online Portfolio Selection

  • Dingjiang Huang
  • Shunchang Yu
  • Bin Li
  • Steven C. H. Hoi
  • Shuigeng Zhou

Machine learning and artificial intelligence techniques have been applied to construct online portfolio selection strategies recently. A popular and state-of-the-art family of strategies is to explore the reversion phenomenon through online learning algorithms and statistical prediction models. Despite gaining promising results on some benchmark datasets, these strategies often adopt a single model based on a selection criterion (e.g., breakdown point) for predicting future price. However, such model selection is often unstable and may cause unnecessarily high variability in the final estimation, leading to poor prediction performance in real datasets and thus non-optimal portfolios. To overcome the drawbacks, in this article, we propose to exploit the reversion phenomenon by using combination forecasting estimators and design a novel online portfolio selection strategy, named Combination Forecasting Reversion (CFR), which outputs optimal portfolios based on the improved reversion estimator. We further present two efficient CFR implementations based on online Newton step (ONS) and online gradient descent (OGD) algorithms, respectively, and theoretically analyze their regret bounds, which guarantee that the online CFR model performs as well as the best CFR model in hindsight. We evaluate the proposed algorithms on various real markets with extensive experiments. Empirical results show that CFR can effectively overcome the drawbacks of existing reversion strategies and achieve the state-of-the-art performance.

IJCAI Conference 2018 Conference Paper

Customer Sharing in Economic Networks with Costs

  • Bin Li
  • Dong Hao
  • Dengji Zhao
  • Tao Zhou

In an economic market, sellers, infomediaries and customers constitute an economic network. Each seller has her own customer group and the seller's private customers are unobservable to other sellers. Therefore, a seller can only sell commodities among her own customers unless other sellers or infomediaries share her sale information to their customer groups. However, a seller is not incentivized to share others' sale information by default, which leads to inefficient resource allocation and limited revenue for the sale. To tackle this problem, we develop a novel mechanism called customer sharing mechanism (CSM) which incentivizes all sellers to share each other's sale information to their private customer groups. Furthermore, CSM also incentivizes all customers to truthfully participate in the sale. In the end, CSM not only allocates the commodities efficiently but also optimizes the seller's revenue.

IJCAI Conference 2018 Conference Paper

Efficient Attributed Network Embedding via Recursive Randomized Hashing

  • Wei Wu
  • Bin Li
  • Ling Chen
  • Chengqi Zhang

Attributed network embedding aims to learn a low-dimensional representation for each node of a network, considering both attributes and structure information of the node. However, the learning based methods usually involve substantial cost in time, which makes them impractical without the help of a powerful workhorse. In this paper, we propose a simple yet effective algorithm, named NetHash, to solve this problem only with moderate computing capacity. NetHash employs the randomized hashing technique to encode shallow trees, each of which is rooted at a node of the network. The main idea is to efficiently encode both attributes and structure information of each node by recursively sketching the corresponding rooted tree from bottom (i.e., the predefined highest-order neighboring nodes) to top (i.e., the root node), and particularly, to preserve as much information closer to the root node as possible. Our extensive experimental results show that the proposed algorithm, which does not need learning, runs significantly faster than the state-of-the-art learning-based network embedding methods while achieving competitive or even better performance in accuracy.

NeurIPS Conference 2018 Conference Paper

Rectangular Bounding Process

  • Xuhui Fan
  • Bin Li
  • Scott Sisson

Stochastic partition models divide a multi-dimensional space into a number of rectangular regions, such that the data within each region exhibit certain types of homogeneity. Due to the nature of their partition strategy, existing partition models may create many unnecessary divisions in sparse regions when trying to describe data in dense regions. To avoid this problem we introduce a new parsimonious partition model -- the Rectangular Bounding Process (RBP) -- to efficiently partition multi-dimensional spaces, by employing a bounding strategy to enclose data points within rectangular bounding boxes. Unlike existing approaches, the RBP possesses several attractive theoretical properties that make it a powerful nonparametric partition prior on a hypercube. In particular, the RBP is self-consistent and as such can be directly extended from a finite hypercube to infinite (unbounded) space. We apply the RBP to regression trees and relational models as a flexible partition prior. The experimental results validate the merit of the RBP in rich yet parsimonious expressiveness compared to the state-of-the-art methods.

AAMAS Conference 2018 Conference Paper

Selling Multiple Items via Social Networks

  • Dengji Zhao
  • Bin Li
  • Junping Xu
  • Dong Hao
  • Nicholas R. Jennings

We consider a market where a seller sells multiple units of a commodity in a social network. Each node/buyer in the social network can only directly communicate with her neighbours, i.e., the seller can only sell the commodity to her neighbours if she could not find a way to inform other buyers. In this paper, we design a novel promotion mechanism that incentivizes all buyers, who are aware of the sale, to invite all their neighbours to join the sale, even though there is no guarantee that their efforts will be paid. While traditional sale promotions such as sponsored search auctions cannot guarantee a positive return for the advertiser (the seller), our mechanism guarantees that the seller’s revenue is better than not using the advertising. More importantly, the seller does not need to pay if the advertising is not beneficial to her.

AAAI Conference 2017 Conference Paper

Improving Efficiency of SVM k-Fold Cross-Validation by Alpha Seeding

  • Zeyi Wen
  • Bin Li
  • Ramamohanarao Kotagiri
  • Jian Chen
  • Yawen Chen
  • Rui Zhang

The k-fold cross-validation is commonly used to evaluate the effectiveness of SVMs with the selected hyper-parameters. It is known that the SVM k-fold cross-validation is expensive, since it requires training k SVMs. However, little work has explored reusing the hth SVM for training the (h + 1)th SVM for improving the efficiency of k-fold cross-validation. In this paper, we propose three algorithms that reuse the hth SVM for improving the efficiency of training the (h + 1)th SVM. Our key idea is to efficiently identify the support vectors and to accurately estimate their associated weights (also called alpha values) of the next SVM by using the previous SVM. Our experimental results show that our algorithms are several times faster than the k-fold cross-validation which does not make use of the previously trained SVM. Moreover, our algorithms produce the same results (hence same accuracy) as the k-fold cross-validation which does not make use of the previously trained SVM.

AAAI Conference 2017 Conference Paper

Mechanism Design in Social Networks

  • Bin Li
  • Dong Hao
  • Dengji Zhao
  • Tao Zhou

This paper studies an auction design problem for a seller to sell a commodity in a social network, where each individual (the seller or a buyer) can only communicate with her neighbors. The challenge to the seller is to design a mechanism to incentivize the buyers, who are aware of the auction, to further propagate the information to their neighbors so that more buyers will participate in the auction and hence, the seller will be able to make a higher revenue. We propose a novel auction mechanism, called information diffusion mechanism (IDM), which incentivizes the buyers to not only truthfully report their valuations on the commodity to the seller, but also further propagate the auction information to all their neighbors. In comparison, the direct extension of the well-known Vickrey-Clarke-Groves (VCG) mechanism in social networks can also incentivize the information diffusion, but it will decrease the seller’s revenue or even lead to a deficit sometimes. The formalization of the problem has not yet been addressed in the literature of mechanism design and our solution is very significant in the presence of large-scale online social networks.

IJCAI Conference 2017 Conference Paper

Tracking the Evolution of Customer Purchase Behavior Segmentation via a Fragmentation-Coagulation Process

  • Ling Luo
  • Bin Li
  • Irena Koprinska
  • Shlomo Berkovsky
  • Fang Chen

Customer behavior modeling is important for businesses in order to understand, attract and retain customers. It is critical that the models are able to track the dynamics of customer behavior over time. We propose FC-CSM, a Customer Segmentation Model based on a Fragmentation-Coagulation process, which can track the evolution of customer segmentation, including the splitting and merging of customer groups. We conduct a case study using transaction data from a major Australian supermarket chain, where we: 1) show that our model achieves high fitness of purchase rate, outperforming models using mixture of Poisson processes; 2) compare the impact of promotions on customers for different products; and 3) track how customer groups evolve over time and how individual customers shift across groups. Our model provides valuable information to stakeholders about the different types of customers, how they change purchase behavior, and which customers are more receptive to promotion campaigns.

IJCAI Conference 2016 Conference Paper

Bayesian Optimization of Partition Layouts for Mondrian Processes

  • Yi Wang
  • Bin Li
  • Xuhui Fan
  • Yang Wang
  • Fang Chen

The Mondrian process (MP) produces hierarchical partitions on a product space as a kd-tree, which can serve as a flexible yet parsimonious partition prior for relational modeling. Due to the recursive generation of partitions and the varying dimensionality of the partition state space, the inference procedure for MP relational modeling is extremely difficult. The prevalent inference method for this problem, reversible-jump MCMC, requires a number of unnecessary retrospective steps to transit from one partition state to a very similar one, and it is prone to falling into a local optimum. In this paper, we attempt to circumvent these drawbacks by proposing an alternative method for inferring the MP partition structure. Based on the observation that similar cutting rate measures on the partition space lead to similar partition layouts, we propose to impose a nonhomogeneous cutting rate measure on the partition space to control the layouts of the generated partitions: the original MCMC sampling problem is thus transformed into a Bayesian global optimization problem. The empirical tests demonstrate that Bayesian optimization is able to find better partition structures than MCMC sampling with the same number of partition structure proposals.

JMLR Journal 2016 Journal Article

OLPS: A Toolbox for On-Line Portfolio Selection

  • Bin Li
  • Doyen Sahoo
  • Steven C.H. Hoi

On-line portfolio selection is a practical financial engineering problem, which aims to sequentially allocate capital among a set of assets in order to maximize long-term return. In recent years, a variety of machine learning algorithms have been proposed to address this challenging problem, but no comprehensive open-source toolbox has been released for various reasons. This article presents the first open-source toolbox for "On-Line Portfolio Selection" (OLPS), which implements a collection of classical and state-of-the-art strategies powered by machine learning algorithms. We hope that OLPS can facilitate the development of new learning methods and enable the performance benchmarking and comparisons of different strategies. OLPS is an open-source project released under Apache License (version 2.0), which is available at github.com/OLPS/OLPS or OLPS.stevenhoi.org.

IJCAI Conference 2016 Conference Paper

POISketch: Semantic Place Labeling over User Activity Streams

  • Dingqi Yang
  • Bin Li
  • Philippe Cudré-Mauroux

Capturing place semantics is critical for enabling location-based applications. Techniques for assigning semantic labels (e.g., "bar" or "office") to unlabeled places mainly resort to mining user activity logs by exploiting visiting patterns. However, existing approaches focus on inferring place labels with a static user activity dataset, and ignore the visiting pattern dynamics in user activity streams, leading to the rapid decrease of labeling accuracy over time. In this paper, we tackle the problem of semantic place labeling over user activity streams. We formulate this problem as a classification problem by characterizing each place through its fine-grained visiting patterns, which encode the visiting frequency of each user in each typical time slot. However, with the incoming activities of new users in data streams, such fine-grained visiting patterns constantly grow, leading to a continuously expanding feature space. To solve this issue, we propose an updatable sketching technique that creates and incrementally updates a set of compact and fixed-size sketches to approximate the similarity between fine-grained visiting patterns of ever-growing size. We further consider the discriminative weights of user activities in place labeling, and seamlessly incorporate them into our sketching method. Our empirical evaluation on real-world datasets demonstrates the validity of our approach and shows that sketches can be efficiently and effectively used to infer place labels over user activity streams.

AAAI Conference 2016 Conference Paper

The Ostomachion Process

  • Xuhui Fan
  • Bin Li
  • Yi Wang
  • Yang Wang
  • Fang Chen

Stochastic partition processes for exchangeable graphs produce axis-aligned blocks on a product space. In relational modeling, the resulting blocks uncover the underlying interactions between two sets of entities of the relational data. Although some flexible axis-aligned partition processes, such as the Mondrian process, have been able to capture complex interacting patterns in a hierarchical fashion, they still fall short of capturing dependence between dimensions. To overcome this limitation, we propose the Ostomachion process (OP), which relaxes the cutting direction by allowing for oblique cuts. The partitions generated by an OP are convex polygons that can capture inter-dimensional dependence. The OP also exhibits interesting properties: 1) along the timeline, the cutting times can be characterized by a homogeneous Poisson process, and 2) on the partition space, the areas of the resulting components comply with a Dirichlet distribution. We can thus control the expected number of cuts and the expected areas of components through hyper-parameters. We adapt the reversible-jump MCMC algorithm for inferring OP partition structures. The experimental results on relational modeling and decision tree classification have validated the merit of the OP.

AIJ Journal 2015 Journal Article

Moving average reversion strategy for on-line portfolio selection

  • Bin Li
  • Steven C.H. Hoi
  • Doyen Sahoo
  • Zhi-Yong Liu

On-line portfolio selection, a fundamental problem in computational finance, has attracted increasing interest from artificial intelligence and machine learning communities in recent years. Empirical evidence shows that a stock's high and low prices are temporary and stock prices are likely to follow the mean reversion phenomenon. While existing mean reversion strategies are shown to achieve good empirical performance on many real datasets, they often make the single-period mean reversion assumption, which is not always satisfied, leading to poor performance in certain real datasets. To overcome this limitation, this article proposes a multiple-period mean reversion, or so-called “Moving Average Reversion” (MAR), and a new on-line portfolio selection strategy named “On-Line Moving Average Reversion” (OLMAR), which exploits MAR via efficient and scalable online machine learning techniques. From our empirical results on real markets, we found that OLMAR can overcome the drawbacks of existing mean reversion algorithms and achieve significantly better results, especially on the datasets where existing mean reversion algorithms failed. In addition to its superior empirical performance, OLMAR also runs extremely fast, further supporting its practical applicability to a wide range of applications. Finally, we have made all the datasets and source codes of this work publicly available at our project website: http://OLPS.stevenhoi.org/.

IJCAI Conference 2015 Conference Paper

Semi-Universal Portfolios with Transaction Costs

  • Dingjiang Huang
  • Yan Zhu
  • Bin Li
  • Shuigeng Zhou
  • Steven C. H. Hoi

Online portfolio selection (PS) has been extensively studied in artificial intelligence and machine learning communities in recent years. An important practical issue of online PS is transaction cost, which is unavoidable and nontrivial in real financial trading markets. Most existing strategies, such as universal portfolio (UP) based strategies, often rebalance their target portfolio vectors at every investment period, and thus the total transaction cost increases rapidly and the final cumulative wealth degrades severely. To overcome the limitation, in this paper we investigate new investment strategies that rebalance their portfolios only at some selected instants. Specifically, we design a novel on-line PS strategy named semi-universal portfolio (SUP) strategy under transaction cost, which attempts to avoid rebalancing when the transaction cost outweighs the benefit of trading. We show that the proposed SUP strategy is universal and has an upper bound on the regret. We present an efficient implementation of the strategy based on nonuniform random walks and online factor graph algorithms. Empirical simulations on real historical markets show that SUP can overcome the drawback of existing UP based transaction cost aware algorithms and achieve significantly better performance. Furthermore, SUP has polynomial complexity in the number of stocks and thus is efficient and scalable in practice.

IROS Conference 2014 Conference Paper

Modeling visuo-motor control and guidance functions in remote-control operation

  • Jonathan Andersh
  • Bin Li
  • Bérénice Mettler

A large class of human movements rely on the so-called hand-eye coordination for precise and versatile performance. Teleoperation of agile robotic systems in three dimensional environments would benefit from a detailed understanding of the perceptual control mechanisms used by the operator both for the design of operator interfaces and potentially for the use of gaze information as part of the control mechanism. The objective of this work is to model the role and contribution of the operator's gaze motion in remote control operation of an agile vehicle. The experiments were conducted using a miniature remote controlled helicopter. The overall human-machine system is described using a multi-loop manual control model. Experiments were designed and conducted to exercise different aspects of this control hierarchy, encompassing stabilization and regulation as well as trajectory tracking and goal directed guidance. The sensing requirements for each loop are established by investigating the relationship between the operator's visual gaze trajectories, the vehicle trajectories, and the control actions. Visual gaze data is classified according to the typical smooth pursuit, saccades and fixations and then incorporated into an estimation strategy.

AIJ Journal 2014 Journal Article

Online Transfer Learning

  • Peilin Zhao
  • Steven C.H. Hoi
  • Jialei Wang
  • Bin Li

In this paper, we propose a novel machine learning framework called “Online Transfer Learning” (OTL), which aims to attack an online learning task on a target domain by transferring knowledge from some source domain. We do not assume data in the target domain follows the same distribution as that in the source domain, and the motivation of our work is to enhance a supervised online learning task on a target domain by exploiting the existing knowledge that had been learnt from training data in source domains. OTL is in general very challenging since data in both source and target domains not only can be different in their class distributions, but also can be diverse in their feature representations. As a first attempt to this new research problem, we investigate two different settings of OTL: (i) OTL on homogeneous domains of common feature space, and (ii) OTL across heterogeneous domains of different feature spaces. For each setting, we propose effective OTL algorithms to solve online classification tasks, and show some theoretical bounds of the algorithms. In addition, we also apply the OTL technique to attack the challenging online learning tasks with concept-drifting data streams. Finally, we conduct extensive empirical studies on a comprehensive testbed, in which encouraging results validate the efficacy of our techniques.

IJCAI Conference 2013 Conference Paper

Robust Median Reversion Strategy for On-Line Portfolio Selection

  • Dingjiang Huang
  • Junlong Zhou
  • Bin Li
  • Steven C. H. Hoi
  • Shuigeng Zhou

On-line portfolio selection has been attracting increasing interests from artificial intelligence community in recent decades. Mean reversion, as one most frequent pattern in financial markets, plays an important role in some state-of-the-art strategies. Though successful in certain datasets, existing mean reversion strategies do not fully consider noises and outliers in the data, leading to estimation error and thus non-optimal portfolios, which results in poor performance in practice. To overcome the limitation, we propose to exploit the reversion phenomenon by robust L1-median estimator, and design a novel on-line portfolio selection strategy named “Robust Median Reversion” (RMR), which makes optimal portfolios based on the improved reversion estimation. Empirical results on various real markets show that RMR can overcome the drawbacks of existing mean reversion algorithms and achieve significantly better results. Finally, RMR runs in linear time, and thus is suitable for large-scale trading applications.

TIST Journal 2011 Journal Article

CORN: Correlation-Driven Nonparametric Learning Approach for Portfolio Selection

  • Bin Li
  • Steven C.H. Hoi
  • Vivekanand Gopalkrishnan

Machine learning techniques have been adopted to select portfolios from financial markets in some emerging intelligent business applications. In this article, we propose a novel learning-to-trade algorithm termed CORrelation-driven Nonparametric learning strategy (CORN) for actively trading stocks. CORN effectively exploits statistical relations between stock market windows via a nonparametric learning approach. We evaluate the empirical performance of our algorithm extensively on several large historical and latest real stock markets, and show that it can easily beat both the market index and the best stock in the market substantially (without or with small transaction costs), and also surpass a variety of state-of-the-art techniques significantly.

IJCAI Conference 2011 Conference Paper

Cross-Domain Collaborative Filtering over Time

  • Bin Li
  • Xingquan Zhu
  • Ruijiang Li
  • Chengqi Zhang
  • Xiangyang Xue
  • Xindong Wu

Collaborative filtering (CF) techniques recommend items to users based on their historical ratings. In real-world scenarios, user interests may drift over time since they are affected by moods, contexts, and pop culture trends. This leads to the fact that a user's historical ratings comprise many aspects of user interests spanning a long time period. However, at a certain time slice, one user's interest may only focus on one or a couple of aspects. Thus, CF techniques based on the entire historical ratings may recommend inappropriate items. In this paper, we consider modeling user-interest drift over time based on the assumption that each user has multiple counterparts over temporal domains and successive counterparts are closely related. We adopt the cross-domain CF framework to share the static group-level rating matrix across temporal domains, and let user-interest distribution over item groups drift slightly between successive temporal domains. The derived method is based on a Bayesian latent factor model which can be inferred using Gibbs sampling. Our experimental results show that our method can achieve state-of-the-art recommendation performance as well as explicitly track and visualize user-interest drift over time.

AAAI Conference 2011 Conference Paper

Tracking User-Preference Varying Speed in Collaborative Filtering

  • Ruijiang Li
  • Bin Li
  • Cheng Jin
  • Xiangyang Xue
  • Xingquan Zhu

In real-world recommender systems, some users are easily influenced by new products, whereas others are unwilling to change their minds. So the preference varying speeds for users are different. Based on this observation, we propose a dynamic nonlinear matrix factorization model for collaborative filtering, aimed to improve the rating prediction performance as well as track the preference varying speeds for different users. We assume that user-preference changes smoothly over time, and that the preference varying speeds for users are different. These two assumptions are incorporated into the proposed model as prior knowledge on user feature vectors, which can be learned efficiently by MAP estimation. The experimental results show that our method not only achieves state-of-the-art performance in the rating prediction task, but also provides an effective way to track user-preference varying speed.

IJCAI Conference 2009 Conference Paper

Can Movies and Books Collaborate? Cross-Domain Collaborative Filtering for Sparsity Reduction

  • Bin Li
  • Qiang Yang
  • Xiangyang Xue

The sparsity problem in collaborative filtering (CF) is a major bottleneck for most CF methods. In this paper, we consider a novel approach for alleviating the sparsity problem in CF by transferring user-item rating patterns from a dense auxiliary rating matrix in other domains (e.g., a popular movie rating website) to a sparse rating matrix in a target domain (e.g., a new book rating website). We do not require that the users and items in the two domains be identical or even overlap. Based on the limited ratings in the target matrix, we establish a bridge between the two rating matrices at a cluster-level of user-item rating patterns in order to transfer more useful knowledge from the auxiliary task domain. We first compress the ratings in the auxiliary rating matrix into an informative and yet compact cluster-level rating pattern representation referred to as a codebook. Then, we propose an efficient algorithm for reconstructing the target rating matrix by expanding the codebook. We perform extensive empirical tests to show that our method is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary tasks, as compared to many state-of-the-art CF methods.
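The codebook idea in this abstract can be sketched with a toy example: compress a dense auxiliary matrix into block means over user/item clusters, then reconstruct ratings via binary membership matrices. This sketch assumes the co-clusters are already known (a real implementation would have to discover them, and would expand the codebook to a different, sparse target matrix); all sizes and the rating scale are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Dense auxiliary rating matrix with planted co-cluster structure.
ku, ki = 3, 4                          # user / item cluster counts
B_true = rng.uniform(1, 5, (ku, ki))   # hidden block means
u_cl = rng.integers(0, ku, 60)         # auxiliary users' clusters
i_cl = rng.integers(0, ki, 50)         # auxiliary items' clusters
X_aux = B_true[np.ix_(u_cl, i_cl)] + 0.05 * rng.normal(size=(60, 50))

# Compress: average ratings within each (user-cluster, item-cluster)
# block to form the codebook.
codebook = np.array(
    [[X_aux[np.ix_(u_cl == a, i_cl == b)].mean() for b in range(ki)]
     for a in range(ku)]
)

# Expand: with binary membership matrices U (users x ku) and
# V (items x ki), the matrix is reconstructed as U @ codebook @ V.T.
U = np.eye(ku)[u_cl]
V = np.eye(ki)[i_cl]
X_hat = U @ codebook @ V.T

print(round(np.abs(X_hat - X_aux).mean(), 3))
```

In the cross-domain setting, the same codebook would be kept fixed while new membership matrices U and V are fitted to the observed entries of the sparse target matrix.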

TCS Journal 2008 Journal Article

Approximate GCDs of polynomials and sparse SOS relaxations

  • Bin Li
  • Jiawang Nie
  • Lihong Zhi

The problem of computing approximate GCDs of several polynomials with real or complex coefficients can be formulated as computing the minimal perturbation such that the perturbed polynomials have an exact GCD of given degree. We present algorithms based on SOS (Sums Of Squares) relaxations for solving the involved polynomial or rational function optimization problems with or without constraints.
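The minimal-perturbation problem the abstract refers to can be stated as an optimization; the following is one standard formulation (the coefficient norm and parametrization are a common choice, not necessarily the paper's exact setup):

```latex
\min_{g,\,u_1,\dots,u_s} \; \sum_{i=1}^{s} \left\| f_i - g\,u_i \right\|_2^2
\quad \text{subject to} \quad \deg g = k,
```

where \(f_1,\dots,f_s\) are the given polynomials, \(g\) is the candidate GCD of prescribed degree \(k\), \(u_i\) are the cofactors, and the squared coefficient norm measures the size of the perturbation. Problems of this polynomial/rational form are what the SOS relaxations are applied to.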

TCS Journal 2003 Journal Article

A representation theorem for recovering contraction relations satisfying wci

  • Zhaohui Zhu
  • Bin Li
  • Xi'an Xiao
  • Shifu Chen
  • Wujia Zhu

A notion of an image structure associated with the canonical epistemic state is introduced. Based on it, we obtain a representation result for recovering contraction inference relations satisfying the condition of weak conjunctive inclusion (wci) in terms of F-standard epistemic AGM states. In effect, this result establishes a representation theorem for belief contraction functions satisfying the AGM postulates (K-1)–(K-7), together with Rott's (wci) and (K-8c), and hence generalizes Rott's corresponding result in the finite framework.