Arrow Research search

Author name cluster

Hui Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

68 papers
2 author rows

Possible papers

68

AAAI Conference 2026 Conference Paper

HCC-3D: Hierarchical Compensatory Compression for 98% 3D Token Reduction in Vision-Language Models

  • Liheng Zhang
  • Jin Wang
  • Hui Li
  • Bingfeng Zhang
  • Weifeng Liu

3D understanding has drawn significant attention recently, leveraging Vision-Language Models (VLMs) to enable multi-modal reasoning between point cloud and text data. Current 3D-VLMs directly embed the 3D point clouds into 3D tokens, following large 2D-VLMs with powerful reasoning capabilities. However, this framework incurs a high computational cost that limits its application; we identify that the bottleneck lies in processing all 3D tokens in the Large Language Model (LLM) part. This raises the question: how can we reduce the computational overhead introduced by 3D tokens while preserving the integrity of their essential information? To address this question, we introduce Hierarchical Compensatory Compression (HCC-3D) to efficiently compress 3D tokens while retaining critical details. Specifically, we first propose a global structure compression (GSC) module, in which we design global queries to compress all 3D tokens into a few key tokens while keeping overall structural information. Then, to compensate for the information loss in GSC, we further propose an adaptive detail mining (ADM) module that selectively recompresses salient but under-attended features through complementary scoring. Extensive experiments demonstrate that HCC-3D not only achieves extreme compression ratios (approximately 98%) compared to previous 3D-VLMs, but also sets new state-of-the-art results, showing substantial improvements in both efficiency and performance.

AAAI Conference 2026 Conference Paper

Lethe: Layer- and Time-Adaptive KV Cache Pruning for Reasoning-Intensive LLM Serving

  • Hui Zeng
  • Daming Zhao
  • Pengfei Yang
  • WenXuan Hou
  • Tianyang Zheng
  • Hui Li
  • Weiye Ji
  • Jidong Zhai

Generative reasoning with large language models (LLMs) often involves long decoding sequences, leading to substantial memory and latency overheads from accumulating key-value (KV) caches. While existing KV compression methods primarily focus on reducing prefill memory from long input sequences, they fall short in addressing the dynamic and layer-sensitive nature of long-form generation, which is central to reasoning tasks. We propose Lethe, a dynamic KV cache management framework that introduces adaptivity along both the spatial and temporal dimensions of decoding. Along the spatial dimension, Lethe performs layerwise sparsity-aware allocation, assigning token pruning budgets to each transformer layer based on estimated attention redundancy. Along the temporal dimension, Lethe conducts multi-round token pruning during generation, driven by a Recency-Aware Selective Retention (RASR) mechanism. RASR extends traditional recency-based heuristics by also considering token relevance derived from evolving attention patterns, enabling informed decisions about which tokens to retain or evict. Empirical results demonstrate that Lethe achieves a favorable balance between efficiency and generation quality across diverse models and tasks, increasing throughput by up to 2.56×.
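As a rough illustration of the retention idea the abstract describes (the function name `rasr_keep`, the recency window, and the linear priority blend are all assumptions made for this sketch, not details from the paper):

```python
def rasr_keep(scores, ages, budget, recency_window=4, alpha=0.5):
    """Toy recency-plus-relevance KV retention (illustrative only).

    scores: per-token attention relevance (higher = more useful)
    ages:   decoding steps elapsed since each token was generated
    budget: number of KV entries this layer may keep
    """
    # Tokens inside the recency window are always retained.
    recent = [i for i, a in enumerate(ages) if a < recency_window]
    older = [i for i in range(len(scores)) if ages[i] >= recency_window]

    # Blend attention relevance with a recency bonus for the remaining tokens.
    def priority(i):
        return alpha * scores[i] + (1 - alpha) / (1 + ages[i])

    older.sort(key=priority, reverse=True)
    keep = recent + older[: max(0, budget - len(recent))]
    return sorted(keep)
```

In the paper's layer-adaptive setting, `budget` would differ per transformer layer according to its estimated attention redundancy.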

AAAI Conference 2026 Conference Paper

MetaGameBO: Hierarchical Game-Theoretic Driven Robust Meta-Learning for Bayesian Optimization

  • Hui Li
  • Huafeng Liu
  • Yiran Fu
  • Shuyang Lin
  • Baoxin Zhang
  • Deqiang Ouyang
  • Liping Jing
  • Jian Yu

Meta-learning for Bayesian optimization accelerates optimization by leveraging knowledge from previous tasks, but existing methods optimize for average performance and fail on challenging outlier tasks critical in practice. These limitations become particularly severe when target tasks exhibit distribution shifts or when optimization budgets are limited in real-world applications. We introduce MetaGameBO, a hierarchical game-theoretic framework that formulates meta-learning as robust optimization through CVaR-based task selection and diversity-aware sample learning. Our approach incorporates uncertainty-aware adaptation via probabilistic embeddings and Thompson sampling for robust generalization to out-of-distribution targets. We establish theoretical guarantees including convergence to game-theoretic equilibria and improved sample complexity, and demonstrate substantial improvements with 95.7% reduction in average loss and 88.6% lower tail risk compared to state-of-the-art methods on challenging tasks and distribution shifts.
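The CVaR criterion mentioned above has a simple empirical form: the mean loss over the worst (1 − α) fraction of tasks, which is what makes the objective sensitive to outlier tasks rather than the average. A minimal sketch (the function name and tail-selection rule are illustrative, not the paper's game-theoretic scheme):

```python
def cvar(losses, alpha=0.9):
    """Empirical Conditional Value-at-Risk: mean of the worst (1 - alpha) tail.

    Optimizing this instead of the plain mean focuses learning on the
    hardest tasks (illustrative of CVaR-based task selection).
    """
    k = max(1, int(round((1 - alpha) * len(losses))))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / len(worst)
```

For example, with ten task losses and α = 0.8, `cvar` averages only the two largest losses, so an outlier task dominates the objective instead of being averaged away.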

AAAI Conference 2026 Conference Paper

RcAE: Recursive Reconstruction Framework for Unsupervised Industrial Anomaly Detection

  • Rongcheng Wu
  • Hao Zhu
  • Shiying Zhang
  • Mingzhe Wang
  • Zhidong Li
  • Hui Li
  • Jianlong Zhou
  • Jiangtao Cui

Unsupervised industrial anomaly detection requires accurately identifying defects without labeled data. Traditional autoencoder-based methods often struggle with incomplete anomaly suppression and loss of fine details, as their single-pass decoding fails to effectively handle anomalies with varying severity and scale. We propose a recursive autoencoder architecture (RcAE), which performs reconstruction iteratively to progressively suppress anomalies while refining normal structures. Unlike traditional single-pass models, this recursive design naturally produces a sequence of reconstructions, progressively exposing suppressed abnormal patterns. To leverage these reconstruction dynamics, we introduce a Cross Recursion Detection (CRD) module that tracks inconsistencies across recursion steps, enhancing detection of both subtle and large-scale anomalies. Additionally, we incorporate a Detail Preservation Network (DPN) to recover high-frequency textures typically lost during reconstruction. Extensive experiments demonstrate that our method significantly outperforms existing non-diffusion methods, and achieves performance on par with recent diffusion models while using only 10% of their parameters and offering substantially faster inference. These results highlight the practicality and efficiency of our approach for real-world applications.

AAAI Conference 2026 Conference Paper

RSA-CR: Resisting Shilling Attacks in Citation Recommendation via Dumbbell Inductive Learning

  • Xiyue Gao
  • Yukai Liu
  • Zhuoqi Ma
  • Xiaotian Qiao
  • Hui Li
  • Cai Xu
  • Kunhua Zhang
  • Jiangtao Cui

Citation recommendation aims to provide researchers with the most relevant references for their manuscripts, helping them swiftly discover pertinent studies and bolster the reliability of their arguments. However, some individuals manipulate these recommendation systems by injecting false information, such as deliberately inflating the citation count of their own papers, to obtain favorable recommendations and ratings. This form of attack, commonly termed a “shilling attack”, is not only highly concealed but also profoundly damaging to scientific research as a whole. To address this problem, we theoretically reveal the impact of shilling attacks on citation recommendation and propose three feasible resistance strategies: historical collaborations, significant citations, and content constraints. Based on these insights, we introduce RSA-CR, a robust and hybrid citation recommendation algorithm resistant to shilling attacks. The algorithm constructs a two-layer academic graph and uses random and content generation strategies to initialize author and paper embeddings. Confidence-guided inductive aggregations based on collaboration and citation relationships are then performed on the author and paper sides, where author aggregation results directly influence the paper aggregation strength. Finally, recommendations are made by measuring the distances between the fused paper embeddings. The entire learning process resembles a dumbbell, hence the term “dumbbell inductive learning”. Experiments on four academic datasets demonstrate that our method outperforms baselines in both effectiveness and robustness.

AAAI Conference 2026 Conference Paper

Whole-Body Coordination for Dynamic Object Grasping with Legged Manipulators

  • Qiwei Liang
  • Boyang Cai
  • Rongyi He
  • Hui Li
  • Tao Teng
  • Haihan Duan
  • Changxin Huang
  • Runhao Zeng

Quadrupedal robots with manipulators offer strong mobility and adaptability for grasping in unstructured, dynamic environments through coordinated whole-body control. However, existing research has predominantly focused on static-object grasping, neglecting the challenges posed by dynamic targets and thus limiting applicability in dynamic scenarios such as logistics sorting and human–robot collaboration. To address this, we introduce DQ-Bench, a new benchmark that systematically evaluates dynamic grasping across varying object motions, velocities, heights, object types, and terrain complexities, along with comprehensive evaluation metrics. Building upon this benchmark, we propose DQ-Net, a compact teacher–student framework designed to infer grasp configurations from limited perceptual cues. During training, the teacher network leverages privileged information to holistically model both the static geometric properties and dynamic motion characteristics of the target, and integrates a grasp fusion module to deliver robust guidance for motion planning. Concurrently, we design a lightweight student network that performs dual-viewpoint temporal modeling using only the target mask, depth map, and proprioceptive state, enabling closed-loop action outputs without reliance on privileged data. Extensive experiments on DQ-Bench demonstrate that DQ-Net achieves robust dynamic object grasping across multiple task settings, substantially outperforming baseline methods in both success rate and responsiveness. We will release our codebase and benchmark publicly.

AAAI Conference 2025 Conference Paper

DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors

  • Tianyu Huang
  • Haoze Zhang
  • Yihan Zeng
  • Zhilu Zhang
  • Hui Li
  • Wangmeng Zuo
  • Rynson W. H. Lau

Dynamic 3D interaction has been attracting considerable attention recently. However, creating such 4D content remains challenging. One solution is to animate 3D scenes with physics-based simulation, which requires manually assigning precise physical properties to the object, or the simulated results become unnatural. Another solution is to learn the deformation of 3D objects with the distillation of video generative models, which, however, tends to produce 3D videos with small and discontinuous motions due to the inappropriate extraction and application of physics priors. In this work, to combine the strengths and complement the shortcomings of the above two solutions, we propose to learn the physical properties of a material field with video diffusion priors, and then utilize a physics-based Material-Point-Method (MPM) simulator to generate 4D content with realistic motions. In particular, we propose motion distillation sampling to emphasize video motion information during distillation. In addition, to facilitate the optimization, we further propose a KAN-based material field with frame boosting. Experimental results demonstrate that our method produces more realistic motions than state-of-the-art methods.

AAAI Conference 2025 Conference Paper

Expected Hypervolume Improvement Is a Particular Hypervolume Improvement

  • Jingda Deng
  • Jianyong Sun
  • Qingfu Zhang
  • Hui Li

Multi-objective Bayesian optimization (MOBO) aims to optimize multiple competing objective functions in the expensive-to-evaluate scenario. The Expected Hypervolume Improvement (EHVI) is a commonly used acquisition function for MOBO and shows a good performance. However, the computation of EHVI becomes challenging as the number of objective functions grows. In this paper, we revisit the formulation of EHVI, as well as its multi-point counterpart qEHVI, and derive much simpler analytic expressions for them. The main contributions of this paper include: (1) first formulating EHVI as a particular hypervolume improvement, and thus immediately obtaining a formal proof of its NP-hardness, faster algorithms in both theory and practice, and more results on its derivatives; (2) first obtaining the analytic expressions of qEHVI for any q > 1 and m ≥ 2 where m is the number of objectives; and (3) demonstrating the advantages of our formulation over existing exact and approximation methods for computing EHVI and qEHVI through a large number of numerical experiments.
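For two objectives (minimization), hypervolume improvement — the quantity the paper shows EHVI reduces to — can be computed directly with a sweep over the sorted Pareto front. A minimal 2D sketch (function names are illustrative; the paper's contribution concerns the general m-objective and q-point cases):

```python
def hv2d(points, ref):
    """Hypervolume dominated by 2D minimization points w.r.t. reference `ref`."""
    # Keep only nondominated points: sorted by f1, front f2 values strictly decrease.
    front = []
    for x, y in sorted(points):
        if not front or y < front[-1][1]:
            front.append((x, y))
    # Sum the disjoint rectangles swept out between consecutive front points.
    hv = 0.0
    for i, (x, y) in enumerate(front):
        x_next = front[i + 1][0] if i + 1 < len(front) else ref[0]
        hv += (x_next - x) * (ref[1] - y)
    return hv

def hvi2d(candidate, points, ref):
    """Hypervolume improvement contributed by adding `candidate` to `points`."""
    return hv2d(points + [candidate], ref) - hv2d(points, ref)
```

For instance, adding the point (0.5, 0.5) to the front {(1, 3), (2, 1)} with reference (4, 4) dominates both existing points, and `hvi2d` returns the extra dominated area.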

ICML Conference 2025 Conference Paper

Flow-based Domain Randomization for Learning and Sequencing Robotic Skills

  • Aidan Curtis
  • Eric Li
  • Michael Noseworthy
  • Nishad Gothoskar
  • Sachin Chitta
  • Hui Li
  • Leslie Pack Kaelbling
  • Nicole E. Carey

Domain randomization in reinforcement learning is an established technique for increasing the robustness of control policies learned in simulation. By randomizing properties of the environment during training, the learned policy can be robust to uncertainty along the randomized dimensions. While the environment distribution is typically specified by hand, in this paper we investigate the problem of automatically discovering this sampling distribution via entropy-regularized reward maximization of a neural sampling distribution in the form of a normalizing flow. We show that this architecture is more flexible and results in better robustness than existing approaches to learning simple parameterized sampling distributions. We demonstrate that these policies can be used to learn robust policies for contact-rich assembly tasks. Additionally, we explore how these sampling distributions, in combination with a privileged value function, can be used for out-of-distribution detection in the context of an uncertainty-aware multi-step manipulation planner.

ICLR Conference 2025 Conference Paper

Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation

  • Jiahao Cui 0003
  • Hui Li
  • Yao Yao 0008
  • Hao Zhu 0004
  • Hanlin Shang
  • Kaihui Cheng
  • Hang Zhou 0009
  • Siyu Zhu 0001

Recent advances in latent diffusion-based generative models for portrait image animation, such as Hallo, have achieved impressive results in short-duration video synthesis. In this paper, we present updates to Hallo, introducing several design enhancements to extend its capabilities. First, we extend the method to produce long-duration videos. To address substantial challenges such as appearance drift and temporal artifacts, we investigate augmentation strategies within the image space of conditional motion frames. Specifically, we introduce a patch-drop technique augmented with Gaussian noise to enhance visual consistency and temporal coherence over long durations. Second, we achieve 4K resolution portrait video generation. To accomplish this, we implement vector quantization of latent codes and apply temporal alignment techniques to maintain coherence across the temporal dimension. By integrating a high-quality decoder, we realize visual synthesis at 4K resolution. Third, we incorporate adjustable semantic textual labels for portrait expressions as conditional inputs. This extends beyond traditional audio cues to improve controllability and increase the diversity of the generated content. To the best of our knowledge, Hallo2, proposed in this paper, is the first method to achieve 4K resolution and generate hour-long, audio-driven portrait image animations enhanced with textual prompts. We have conducted extensive experiments to evaluate our method on publicly available datasets, including HDTF, CelebV, and our introduced “Wild” dataset. The experimental results demonstrate that our approach achieves state-of-the-art performance in long-duration portrait video animation, successfully generating rich and controllable content at 4K resolution for durations extending up to tens of minutes.

ICRA Conference 2025 Conference Paper

ICRT: In-Context Imitation Learning via Next-Token Prediction

  • Max Fu
  • Huang Huang
  • Gaurav Datta
  • Lawrence Yunliang Chen
  • William Chung-Ho Panitch
  • Fangchen Liu
  • Hui Li
  • Ken Goldberg

In-context imitation learning is the capability to perform novel tasks when prompted with task demonstration examples. In-Context Robot Transformer (ICRT) is a causal transformer that performs autoregressive prediction on sensorimotor trajectories, which include images, proprioceptive states, and actions. This approach supports flexible and training-free execution of new tasks at test time. Experiments with a Franka Emika robot demonstrate that ICRT can adapt to new environment configurations that differ from both the prompt and the training data. In a multi-task environment setup, ICRT significantly outperforms current state-of-the-art robot foundation models on generalization to unseen tasks. Code, data, and appendix are available on https://icrt.dev.

ICLR Conference 2025 Conference Paper

Kronecker Mask and Interpretive Prompts are Language-Action Video Learners

  • Jingyi Yang
  • Zitong Yu
  • Xiuming Ni
  • Jia He
  • Hui Li

Contrastive language-image pretraining (CLIP) has significantly advanced image-based vision learning. A pressing topic subsequently arises: how can we effectively adapt CLIP to the video domain? Recent studies have focused on adjusting either the textual or visual branch of CLIP for action recognition. However, we argue that adaptations of both branches are crucial. In this paper, we propose a Contrastive Language-Action Video Learner (CLAVER), designed to shift CLIP's focus from the alignment of static visual objects and concrete nouns to the alignment of dynamic action behaviors and abstract verbs. Specifically, we introduce a novel Kronecker mask attention for temporal modeling. Our tailored Kronecker mask offers three benefits: 1) it expands the temporal receptive field for each token, 2) it serves as an effective spatiotemporal heterogeneity inductive bias, mitigating the issue of spatiotemporal homogenization, and 3) it can be seamlessly plugged into transformer-based models. Regarding the textual branch, we leverage large language models to generate diverse, sentence-level and semantically rich interpretive prompts of actions, which shift the model's focus towards verb comprehension. Extensive experiments on various benchmarks and learning scenarios demonstrate the superiority and generality of our approach. The code will be available soon.

NeurIPS Conference 2025 Conference Paper

LBMKGC: Large Model-Driven Balanced Multimodal Knowledge Graph Completion

  • Yuan Guo
  • Qian Ma
  • Hui Li
  • Qiao Ning
  • Furui Zhan
  • Yu Gu
  • Ge Yu
  • Shikai Guo

Multi-modal Knowledge Graph Completion (MMKGC) aims to predict missing entities, relations, or attributes in knowledge graphs by collaboratively modeling the triple structure and multimodal information (e.g., text, images, videos) associated with entities. This approach facilitates the automatic discovery of previously unobserved factual knowledge. However, existing MMKGC methods encounter several critical challenges: (i) the imbalance of inter-entity information across different modalities; (ii) the heterogeneity of intra-entity multimodal information; and (iii) for a given entity, the informational contributions of different modalities are inconsistent across contexts. In this paper, we propose a novel Large model-driven Balanced Multimodal Knowledge Graph Completion framework, termed LBMKGC. To bridge the semantic gap between heterogeneous modalities, LBMKGC aligns the multimodal embeddings of entities semantically by using the CLIP (Contrastive Language-Image Pre-Training) model. Furthermore, LBMKGC adaptively fuses multimodal embeddings with relational guidance by distinguishing between the perceptual and conceptual attributes of triples. Finally, extensive experiments conducted against 21 state-of-the-art baselines demonstrate that LBMKGC achieves superior performance across diverse datasets and scenarios while maintaining efficiency and generalizability. Our code and data are publicly available at: https://github.com/guoynow/LBMKGC.

ECAI Conference 2025 Conference Paper

Learning an Efficient Optimizer via Hybrid-Policy Sub-Trajectory Balance

  • Yunchuan Guan
  • Yu Liu 0040
  • Ke Zhou 0001
  • Hui Li
  • Sen Jia 0003
  • Zhiqi Shen 0001
  • Ziyang Wang
  • Xinglin Zhang

Recent advances in generative modeling enable neural networks to generate weights without relying on gradient-based optimization. However, current methods are limited by issues of over-coupling and long-horizon. The former tightly binds weight generation with task-specific objectives, thereby limiting the flexibility of the learned optimizer. The latter leads to inefficiency and low accuracy during inference, caused by the lack of local constraints. In this paper, we propose Lo-Hp, a decoupled two-stage weight generation framework that enhances flexibility through learning various optimization policies. It adopts a hybrid-policy sub-trajectory balance objective, which integrates on-policy and off-policy learning to capture local optimization policies. Theoretically, we demonstrate that learning solely local optimization policies can address the long-horizon issue while enhancing the generation of global optimal weights. In addition, we validate Lo-Hp’s superior accuracy and inference efficiency in tasks that require frequent weight updates, such as transfer learning, few-shot learning, domain generalization, and large language model adaptation.

ICML Conference 2025 Conference Paper

Learning Robust Neural Processes with Risk-Averse Stochastic Optimization

  • Huafeng Liu 0001
  • Yiran Fu
  • Liping Jing
  • Hui Li
  • Shuyang Lin
  • Jingyue Shi
  • Deqiang Ouyang
  • Jian Yu 0001

Neural processes (NPs) are a promising paradigm to enable skill transfer learning across tasks with the aid of the distribution of functions. Previous NPs employ the empirical risk minimization principle in optimization. However, the fast adaptation ability to different tasks can vary widely, and the worst fast adaptation can be catastrophic in risk-sensitive tasks. To achieve robust neural process modeling, we consider the problem of training models in a risk-averse manner, which can control the worst fast adaptation cases at a certain probabilistic level. By transferring the risk minimization problem to a two-level finite sum minimax optimization problem, we can easily solve it via a double-looped stochastic mirror prox algorithm with a task-aware variance reduction mechanism that samples across all tasks. The mirror prox technique ensures better handling of complex constraint sets and non-Euclidean geometries, making the optimization adaptable to various tasks. The final solution, by aggregating prox points with adaptive learning rates, enables a stable and high-quality output. The proposed learning strategy can work with various NPs flexibly and achieves a less biased approximation with a theoretical guarantee. To illustrate the superiority of the proposed model, we perform experiments on both synthetic and real-world data, and the results demonstrate that our approach not only helps to achieve more accurate performance but also improves model robustness.

NeurIPS Conference 2025 Conference Paper

Learning to Generalize: An Information Perspective on Neural Processes

  • Hui Li
  • Huafeng Liu
  • Shuyang Lin
  • Jingyue Shi
  • Yiran Fu
  • Liping Jing

Neural Processes (NPs) combine the adaptability of neural networks with the efficiency of meta-learning, offering a powerful framework for modeling stochastic processes. However, existing methods focus on empirical performance while lacking a rigorous theoretical understanding of generalization. To address this, we propose an information-theoretic framework to analyze the generalization bounds of NPs, introducing dynamical stability regularization to minimize sharpness and improve optimization dynamics. Additionally, we show how noise-injected parameter updates complement this regularization. The proposed approach, applicable to a wide range of NP models, is validated through experiments on classic benchmarks, including 1D regression, image completion, Bayesian optimization, and contextual bandits. The results demonstrate tighter generalization bounds and superior predictive performance, establishing a principled foundation for advancing generalizable NP models.

NeurIPS Conference 2025 Conference Paper

Revisiting Generative Infrared and Visible Image Fusion Based on Human Cognitive Laws

  • Lin Guo
  • Xiaoqing Luo
  • Wei Xie
  • Zhancheng Zhang
  • Hui Li
  • Rui Wang
  • Zhenhua Feng
  • Xiaoning Song

Existing infrared and visible image fusion methods often face the dilemma of balancing modal information. Generative fusion methods reconstruct fused images by learning from data distributions, but their generative capabilities remain limited. Moreover, the lack of interpretability in modal information selection further affects the reliability and consistency of fusion results in complex scenarios. This manuscript revisits the essence of generative image fusion under the inspiration of human cognitive laws and proposes a novel infrared and visible image fusion method, termed HCLFuse. First, HCLFuse investigates the quantification theory of information mapping in unsupervised fusion networks, which leads to the design of a multi-scale mask-regulated variational bottleneck encoder. This encoder applies posterior probability modeling and information decomposition to extract accurate and concise low-level modal information, thereby supporting the generation of high-fidelity structural details. Furthermore, the probabilistic generative capability of the diffusion model is integrated with physical laws, forming a time-varying physical guidance mechanism that adaptively regulates the generation process at different stages, thereby enhancing the ability of the model to perceive the intrinsic structure of data and reducing dependence on data quality. Experimental results show that the proposed method achieves state-of-the-art fusion performance in qualitative and quantitative evaluations across multiple datasets and significantly improves semantic segmentation metrics. This fully demonstrates the advantages of this generative image fusion method, drawing inspiration from human cognition, in enhancing structural consistency and detail quality.

ICRA Conference 2025 Conference Paper

Towards Neurorobotic Interface for Finger Joint Angle Estimation: A Multi-Stage CNN-LSTM Network with Transfer Learning

  • Yun Chen
  • Xinyu Zhang
  • Hui Li
  • Hongsheng He
  • Wan Shou
  • Qiang Zhang 0028

To maximize the autonomy of individuals with upper limb amputations in daily activities, leveraging forearm muscle information to infer movement intent is a promising research direction. While current prosthetic hand technologies can utilize forearm muscle data to achieve basic movements such as grasping, accurately estimating finger joint angles remains a significant challenge. Therefore, we propose a Multi-Stage Cascade Convolutional Neural Network with Long Short-Term Memory Network, where an upsampling module is introduced before the downsampling module to enhance model generalization. Additionally, we designed a transfer learning (TL) framework based on parameter freezing, where the pre-trained downsampling module is fixed, and only the upsampling module is updated with a small amount of out-of-distribution data to achieve TL. Furthermore, we compared the performance of unimodal and multimodal models, collecting surface electromyography (sEMG) signals, brightness mode ultrasound images (B-mode US images), and motion capture data simultaneously. The results show that on the validation set, the US image had the lowest error, while on the prediction set, the four-channel sEMG achieved the lowest error. The performance of the multimodal model in both datasets was intermediate between the unimodal models. On the prediction set, the average normalized root mean square error values for the four-channel sEMG, US images, and sensor fusion models across three subjects were 0.170, 0.203, and 0.186, respectively. By utilizing advanced sensor fusion techniques and TL, our approach can reduce the need for extensive data collection and training for new users, making prosthetic control more accessible and adaptable to individual needs.

JBHI Journal 2024 Journal Article

3D Vessel Segmentation With Limited Guidance of 2D Structure-Agnostic Vessel Annotations

  • Huai Chen
  • Xiuying Wang
  • Hui Li
  • Lisheng Wang

Delineating 3D blood vessels of various anatomical structures is essential for clinical diagnosis and treatment, however, is challenging due to complex structure variations and varied imaging conditions. Although recent supervised deep learning models have demonstrated their superior capacity in automatic 3D vessel segmentation, the reliance on expensive 3D manual annotations and limited capacity for annotation reuse among different vascular structures hinder their clinical applications. To avoid the repetitive and costly annotating process for each vascular structure and make full use of existing annotations, this paper proposes a novel 3D shape-guided local discrimination (3D-SLD) model for 3D vascular segmentation under limited guidance from public 2D vessel annotations. The primary hypothesis is that 3D vessels are composed of semantically similar voxels and often exhibit tree-shaped morphology. Accordingly, the 3D region discrimination loss is firstly proposed to learn the discriminative representation measuring voxel-wise similarities and cluster semantically consistent voxels to form the candidate 3D vascular segmentation in unlabeled images. Secondly, the shape distribution from existing 2D structure-agnostic vessel annotations is introduced to guide the 3D vessels with the tree-shaped morphology by the adversarial shape constraint loss. Thirdly, to enhance training stability and prediction credibility, the highlighting-reviewing-summarizing (HRS) mechanism is proposed. This mechanism involves summarizing historical models to maintain temporal consistency and identifying credible pseudo labels as reliable supervision signals. Only guided by public 2D coronary artery annotations, our method achieves results comparable to SOTA barely-supervised methods in 3D cerebrovascular segmentation, and the best DSC in 3D hepatic vessel segmentation, demonstrating the effectiveness of our method.

ICRA Conference 2024 Conference Paper

ASAP: Automated Sequence Planning for Complex Robotic Assembly with Physical Feasibility

  • Yunsheng Tian
  • Karl D. D. Willis
  • Bassel Al Omari
  • Jieliang Luo
  • Pingchuan Ma 0004
  • Yichen Li 0004
  • Farhad Javid
  • Edward Gu

The automated assembly of complex products requires a system that can automatically plan a physically feasible sequence of actions for assembling many parts together. In this paper, we present ASAP, a physics-based planning approach for automatically generating such a sequence for general-shaped assemblies. ASAP accounts for gravity to design a sequence where each sub-assembly is physically stable with a limited number of parts being held and a support surface. We apply efficient tree search algorithms to reduce the combinatorial complexity of determining such an assembly sequence. The search can be guided by either geometric heuristics or graph neural networks trained on data with simulation labels. Finally, we show the superior performance of ASAP at generating physically realistic assembly sequence plans on a large dataset of hundreds of complex product assemblies. We further demonstrate the applicability of ASAP on both simulation and real-world robotic setups. Project website: asap.csail.mit.edu
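If ASAP's physics checks are collapsed into simple precedence constraints, the sequence-planning problem reduces to finding a feasible order; the toy sketch below illustrates only that reduced problem (the names and the precedence encoding are assumptions for illustration, not ASAP's actual physics-based search):

```python
def assembly_sequence(parts, precedence):
    """Greedy feasible-order search over precedence constraints.

    precedence[p] lists the parts that must already be installed before p
    (a stand-in for physics-based feasibility checks). Returns an order,
    or None if the constraints admit no feasible sequence.
    """
    placed, order = set(), []
    remaining = set(parts)
    while remaining:
        # A part is "ready" once all of its prerequisites are installed.
        ready = [p for p in remaining if set(precedence.get(p, [])) <= placed]
        if not ready:
            return None  # cyclic constraints: no feasible sequence
        p = min(ready)  # deterministic tie-break for the sketch
        order.append(p)
        placed.add(p)
        remaining.remove(p)
    return order
```

ASAP additionally searches over which parts are held and checks stability under gravity at each step, which this sketch does not model.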

ECAI Conference 2024 Conference Paper

Generalized Face Anti-Spoofing via Finer Domain Partition and Disentangling Liveness-Irrelevant Factors

  • Jingyi Yang
  • Zitong Yu
  • Xiuming Ni
  • Jia He
  • Hui Li

Face anti-spoofing techniques based on domain generalization have recently been studied widely. Adversarial learning and meta-learning techniques have been adopted to learn domain-invariant representations. However, prior approaches often consider the dataset gap as the primary factor behind domain shifts. This perspective is not fine-grained enough to reflect the intrinsic gap among the data accurately. In our work, we redefine domains based on identities rather than datasets, aiming to disentangle liveness and identity attributes. We emphasize ignoring the adverse effect of identity shift, focusing on learning identity-invariant liveness representations through orthogonalizing liveness and identity features. To cope with style shifts, we propose a Style Cross module to expand stylistic diversity and a Channel-wise Style Attention module to weaken sensitivity to style shifts, aiming to learn robust liveness representations. Furthermore, acknowledging the asymmetry between live and spoof samples, we introduce a novel contrastive loss, Asymmetric Augmented Instance Contrast. Extensive experiments on four public datasets demonstrate that our method achieves state-of-the-art performance under cross-dataset and limited source dataset scenarios. Additionally, our method has good scalability when expanding the diversity of identities. Code is available at https://github.com/yjyddq/DLIF.

AAAI Conference 2024 Conference Paper

Multi-Dimensional Fair Federated Learning

  • Cong Su
  • Guoxian Yu
  • Jun Wang
  • Hui Li
  • Qingzhong Li
  • Han Yu

Federated learning (FL) has emerged as a promising collaborative and secure paradigm for training a model from decentralized data without compromising privacy. Group fairness and client fairness are two dimensions of fairness that are important for FL. Standard FL can result in disproportionate disadvantages for certain clients, and it still faces the challenge of treating different groups equitably in a population. The problem of privately training fair FL models without compromising the generalization capability of disadvantaged clients remains open. In this paper, we propose a method, called mFairFL, to address this problem and achieve group fairness and client fairness simultaneously. mFairFL leverages differential multipliers to construct an optimization objective for empirical risk minimization with fairness constraints. Before aggregating locally trained models, it first detects conflicts among their gradients, and then iteratively curates the direction and magnitude of gradients to mitigate these conflicts. Theoretical analysis proves that mFairFL facilitates fairness in model development. Experimental evaluations on three benchmark datasets show significant advantages of mFairFL over seven state-of-the-art baselines.

IJCAI Conference 2024 Conference Paper

Nukplex: An Efficient Local Search Algorithm for Maximum K-Plex Problem

  • Rui Sun
  • Yiyuan Wang
  • Shimao Wang
  • Hui Li
  • Ximing Li
  • Minghao Yin

The maximum k-plex problem (MKPP) is an important relaxation of the maximum clique problem with extensive applications. Recently, many heuristic algorithms based on various methods have been proposed to solve the MKPP. In this work, to further improve the performance of solving the MKPP, we propose an efficient local search algorithm based on three main ideas. First, we propose a relaxed bounded configuration checking strategy that considers two kinds of historical search information to relax the restriction strength of configuration checking and the forbidden condition of candidate vertices for the Add operation, respectively. Second, we present a novel solution-information-based vertex selection strategy that uses two kinds of solution information to select high-quality candidate vertices. Third, we define the solution core and introduce a core-based perturbation strategy to help the algorithm escape local optima. The experimental results show that the proposed algorithm significantly outperforms the state-of-the-art MKPP algorithms on almost all instances.

ICLR Conference 2024 Conference Paper

Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching

  • Ziyao Guo
  • Kai Wang 0036
  • George Cazenavette
  • Hui Li
  • Kaipeng Zhang
  • Yang You 0001

The ultimate goal of Dataset Distillation is to synthesize a small synthetic dataset such that a model trained on this synthetic set will perform equally well as a model trained on the full, real dataset. Until now, no method of Dataset Distillation has reached this completely lossless goal, in part because previous methods only remain effective when the total number of synthetic samples is extremely small. Since only so much information can be contained in such a small number of samples, it seems that to achieve truly lossless dataset distillation, we must develop a distillation method that remains effective as the size of the synthetic dataset grows. In this work, we present such an algorithm and elucidate why existing methods fail to generate larger, high-quality synthetic sets. Current state-of-the-art methods rely on trajectory matching, or optimizing the synthetic data to induce similar long-term training dynamics as the real data. We empirically find that the training stage of the trajectories we choose to match (i.e., early or late) greatly affects the effectiveness of the distilled dataset. Specifically, early trajectories (where the teacher network learns easy patterns) work well for a low-cardinality synthetic set, since there are fewer examples in which to distribute the necessary information. Conversely, late trajectories (where the teacher network learns hard patterns) provide better signals for larger synthetic sets, since there are now enough samples to represent the necessary complex patterns. Based on our findings, we propose to align the difficulty of the generated patterns with the size of the synthetic dataset. In doing so, we successfully scale trajectory-matching-based methods to larger synthetic datasets, achieving lossless dataset distillation for the very first time. Code and distilled datasets will be released.

IJCAI Conference 2024 Conference Paper

VF-Detector: Making Multi-Granularity Code Changes on Vulnerability Fix Detector Robust to Mislabeled Changes

  • Zhenkan Fu
  • Shikai Guo
  • Hui Li
  • Rong Chen
  • Xiaochen Li
  • He Jiang

As software development projects increasingly rely on open-source software, users face the risk of security vulnerabilities from third-party libraries. To address label and character noise in code changes, we present VF-Detector, which automatically identifies bug-fix commits in realistic, noisy development environments. VF-Detector consists of three components: Data Pre-processing (DP), Vulnerability Confidence Computation (VCC) and Confidence Learning Denoising (CLD). The DP component is responsible for preprocessing code change data. The VCC component calculates a confidence value for each bug-fix code change by extracting features at various granularity levels. The CLD component removes noise and enhances model robustness by pruning noisy data with confidence values and performing effort-aware adjustments. Experimental results demonstrate VF-Detector's superiority over state-of-the-art methods in EffortCost@L and Popt@L metrics on Java and Python datasets. The improvements were 6.5% and 5% for Java, and 23.4% and 17.8% for Python.

ECAI Conference 2024 Conference Paper

Video2Reward: Generating Reward Function from Videos for Legged Robot Behavior Learning

  • Runhao Zeng
  • Dingjie Zhou
  • Qiwei Liang
  • Junlin Liu
  • Hui Li
  • Changxin Huang
  • Jianqiang Li 0001
  • Xiping Hu

Learning behavior in legged robots presents a significant challenge due to its inherent instability and complex constraints. Recent research has proposed the use of a large language model (LLM) to generate reward functions in reinforcement learning, thereby replacing the need for manually designed rewards by experts. However, this approach, which relies on textual descriptions to define learning objectives, fails to achieve controllable and precise behavior learning with clear directionality. In this paper, we introduce a new video2reward method, which directly generates reward functions from videos depicting the behaviors to be mimicked and learned. Specifically, we first process videos containing the target behaviors, converting the motion information of individuals in the videos into keypoint trajectories represented as coordinates through a video2text transforming module. These trajectories are then fed into an LLM to generate the reward function, which in turn is used to train the policy. To enhance the quality of the reward function, we develop a video-assisted iterative reward refinement scheme that visually assesses the learned behaviors and provides textual feedback to the LLM. This feedback guides the LLM to continually refine the reward function, ultimately facilitating more efficient behavior learning. Experimental results on tasks involving bipedal and quadrupedal robot motion control demonstrate that our method surpasses the performance of state-of-the-art LLM-based reward generation methods by over 37.6% in terms of human normalized score. More importantly, by switching video inputs, we find our method can rapidly learn diverse motion behaviors such as walking and running.

ICRA Conference 2023 Conference Paper

Dimensional Optimization and Anti-Disturbance Analysis of an Upgraded Feed Mechanism in FAST

  • Xiaoyan Wang
  • Bin Zhang 0035
  • Zhaoyang Li
  • Xinyu Gao
  • Fei Zhang 0006
  • Yifan Ma
  • Rui Yao
  • Jianing Yin

The Five-hundred-meter Aperture Spherical radio Telescope (FAST) is a world-renowned large-scale scientific facility for astronomical observation, but it currently cannot observe the center of the Milky Way Galaxy due to the limited observation angle caused by the heavy feed cabin. To address this problem, an upgraded feed mechanism (UFM) with a lighter cable structure is designed and employed to replace the existing heavy rigid A-B rotator and Stewart platform in the feed cabin of FAST. The structural dimensions of the UFM are analyzed and optimized under cable tension constraints to meet the requirements of the observation angle. Then, a novel disturbance increment method is proposed to analyze the anti-disturbance ability of the UFM, where a gradually increasing disturbance wrench is applied to the UFM with the stiffness matrix iteratively updated. Through the dimensional optimization and further anti-disturbance analysis, the newly-designed UFM can indeed meet the higher demand for astronomical observation with the larger observation angle, benefiting from the lightweight cable structure. Besides, the UFM also has appreciable anti-disturbance ability for long-term stable operation of FAST.

NeurIPS Conference 2023 Conference Paper

Injecting Multimodal Information into Rigid Protein Docking via Bi-level Optimization

  • Ruijia Wang
  • YiWu Sun
  • Yujie Luo
  • Shaochuan Li
  • Cheng Yang
  • Xingyi Cheng
  • Hui Li
  • Chuan Shi

The structure of protein-protein complexes is critical for understanding binding dynamics, biological mechanisms, and intervention strategies. Rigid protein docking, a fundamental problem in this field, aims to predict the 3D structure of complexes from their unbound states without conformational changes. In this scenario, we have access to two types of valuable information: sequence-modal information, such as coevolutionary data obtained from multiple sequence alignments, and structure-modal information, including the 3D conformations of rigid structures. However, existing docking methods typically utilize single-modal information, resulting in suboptimal predictions. In this paper, we propose xTrimoBiDock (or BiDock for short), a novel rigid docking model that effectively integrates sequence- and structure-modal information through bi-level optimization. Specifically, a cross-modal transformer combines multimodal information to predict an inter-protein distance map. To achieve rigid docking, the roto-translation transformation is optimized to align the docked pose with the predicted distance map. In order to tackle this bi-level optimization problem, we unroll the gradient descent of the inner loop and further derive a better initialization for the roto-translation transformation based on spectral estimation. Compared to baselines, BiDock achieves a promising result of up to 234% relative improvement on the challenging antibody-antigen docking problem.

NeurIPS Conference 2023 Conference Paper

PDF: Point Diffusion Implicit Function for Large-scale Scene Neural Representation

  • Yuhan Ding
  • Fukun Yin
  • Jiayuan Fan
  • Hui Li
  • Xin Chen
  • Wen Liu
  • Chongshan Lu
  • Gang Yu

Recent advances in implicit neural representations have achieved impressive results by sampling and fusing individual points along sampling rays in the sampling space. However, due to the explosively growing sampling space, finely representing and synthesizing detailed textures remains a challenge for unbounded large-scale outdoor scenes. To alleviate the dilemma of using individual points to perceive the entire colossal space, we explore learning the surface distribution of the scene to provide structural priors and reduce the samplable space, and propose a Point Diffusion implicit Function, PDF, for large-scale scene neural representation. The core of our method is a large-scale point cloud super-resolution diffusion module that enhances the sparse point cloud reconstructed from several training images into a dense point cloud as an explicit prior. Then, in the rendering stage, only sampling points with prior points within the sampling radius are retained. That is, the sampling space is reduced from the unbounded space to the scene surface. Meanwhile, to fill in the background of the scene that cannot be provided by point clouds, the region sampling based on Mip-NeRF 360 is employed to model the background representation. Extensive experiments have demonstrated the effectiveness of our method for large-scale scene novel view synthesis, which outperforms relevant state-of-the-art baselines.

AAAI Conference 2023 Conference Paper

Practical Cross-System Shilling Attacks with Limited Access to Data

  • Meifang Zeng
  • Ke Li
  • Bingchuan Jiang
  • Liujuan Cao
  • Hui Li

In shilling attacks, an adversarial party injects a few fake user profiles into a Recommender System (RS) so that the target item can be promoted or demoted. Although much effort has been devoted to developing shilling attack methods, we find that existing approaches are still far from practical. In this paper, we analyze the properties a practical shilling attack method should have and propose a new concept of Cross-system Attack. With the idea of Cross-system Attack, we design a Practical Cross-system Shilling Attack (PC-Attack) framework that requires little information about the victim RS model and the target RS data for conducting attacks. PC-Attack is trained to capture graph topology knowledge from public RS data in a self-supervised manner. Then, it is fine-tuned on a small portion of target data that is easy to access to construct fake profiles. Extensive experiments have demonstrated the superiority of PC-Attack over state-of-the-art baselines. Our implementation of PC-Attack is available at https://github.com/KDEGroup/PC-Attack.

NeurIPS Conference 2023 Conference Paper

RECESS Vaccine for Federated Learning: Proactive Defense Against Model Poisoning Attacks

  • Haonan Yan
  • Wenjing Zhang
  • Qian Chen
  • Xiaoguang Li
  • Wenhai Sun
  • Hui Li
  • Xiaodong Lin

Model poisoning attacks greatly jeopardize the application of federated learning (FL). The effectiveness of existing defenses is susceptible to the latest model poisoning attacks, leading to a decrease in prediction accuracy. Besides, these defenses struggle to distinguish benign outliers from malicious gradients, which further compromises model generalization. In this work, we propose a novel defense including detection and aggregation, named RECESS, to serve as a “vaccine” for FL against model poisoning attacks. Unlike the passive analysis of previous defenses, RECESS proactively queries each participating client with a delicately constructed aggregation gradient and detects malicious clients from their responses with higher accuracy. Further, RECESS adopts a newly proposed trust-scoring-based mechanism to robustly aggregate gradients. Rather than scoring each iteration independently as previous methods do, RECESS takes into account the correlation of clients’ performance over multiple iterations to estimate the trust score, bringing a significant increase in detection fault tolerance. Finally, we extensively evaluate RECESS on typical model architectures and four datasets under various settings including white/black-box, cross-silo/device FL, etc. Experimental results show the superiority of RECESS in terms of reducing the accuracy loss caused by the latest model poisoning attacks over five classic and two state-of-the-art defenses.

ICRA Conference 2023 Conference Paper

Safe Self-Supervised Learning in Real of Visuo-Tactile Feedback Policies for Industrial Insertion

  • Letian Fu
  • Huang Huang
  • Lars Berscheid
  • Hui Li
  • Ken Goldberg
  • Sachin Chitta

Industrial insertion tasks are often performed repetitively with parts that are subject to tight tolerances and prone to breakage. Learning an industrial insertion policy in real is challenging, as collisions between the parts and the environment can cause slippage or breakage of the part. In this paper, we present a safe self-supervised method to learn a visuo-tactile insertion policy that is robust to grasp pose variations. The method reduces human input and collisions between the part and the receptacle. The method divides the insertion task into two phases. In the first align phase, a tactile-based grasp pose estimation model is learned to align the insertion part with the receptacle. In the second insert phase, a vision-based policy is learned to guide the part into the receptacle. The robot uses force-torque sensing to achieve a safe self-supervised data collection pipeline. Physical experiments on the USB insertion task from the NIST Assembly Taskboard suggest that the resulting policies can achieve 45/45 insertion successes on 45 different initial grasp poses, improving on two baselines: (1) a behavior cloning agent trained on 50 human insertion demonstrations (1/45) and (2) an online RL policy (TD3) trained in real (0/45).

TIST Journal 2023 Journal Article

Toward Balancing the Efficiency and Effectiveness in k-Facility Relocation Problem

  • Hu Wang
  • Hui Li
  • Meng Wang
  • Jiangtao Cui

Facility Relocation (FR), which aims to reallocate the placement of facilities to adapt to changes in urban planning, has a remarkable impact on many areas. Existing solutions fail to guarantee result quality when relocating k > 1 facilities. As the k-FR problem is NP-complete and is neither submodular nor non-decreasing, the traditional greedy algorithm cannot be directly applied. We propose to transform k-FR into another facility placement problem, which is submodular and non-decreasing. We prove that the optimal solutions of both problems are equivalent. Accordingly, we present the first approximate solution to k-FR, FR2FP. Our extensive comparison of FR2FP and the state-of-the-art solution shows that FR2FP, although it provides an approximation guarantee, does not necessarily give superior results. This comparison motivates us to present an advanced approximate solution, FR2FP-ex. Moreover, based on Lagrangian relaxation, we develop an algorithm that can adjust the approximation ratio. Extensive experiments verify that FR2FP-ex demonstrates the best result quality, very close to that of the optimal solution. In addition, we also unveil the scenarios in which the state-of-the-art would fail. We further generalize the k-FR problem, considering the budget for relocation and the cost of each facility. We also present corresponding approximate solutions to the new problem and prove their approximation ratios.

JBHI Journal 2022 Journal Article

A Drug Recommendation Model Based on Message Propagation and DDI Gating Mechanism

  • Yongjian Ren
  • Yuliang Shi
  • Kun Zhang
  • Xinjun Wang
  • Zhiyong Chen
  • Hui Li

Drug recommendation based on deep learning models has been widely studied and applied in the health care field in recent years. However, the accuracy of drug recommendation models still needs to be improved. In addition, existing recommendation models either give only one recommendation (although there may be a variety of drug combination options in practice) or cannot provide the confidence level of the recommended result. To fill these gaps, a Drug Recommendation model based on a Message Propagation neural network (denoted as DRMP) is proposed in this paper. Then, Drug-Drug Interaction (DDI) knowledge is introduced into the proposed model to reduce the DDI rate in recommended drugs. Finally, the proposed model is extended to a Bayesian Neural Network (BNN) to produce multiple recommendations and give the confidence of each recommendation result, so as to provide richer information to help doctors make decisions. Experimental results on public datasets show that the proposed model is superior to the best existing models.

IJCAI Conference 2022 Conference Paper

ARCANE: An Efficient Architecture for Exact Machine Unlearning

  • Haonan Yan
  • Xiaoguang Li
  • Ziyao Guo
  • Hui Li
  • Fenghua Li
  • Xiaodong Lin

Recently users’ right-to-be-forgotten is stipulated by many laws and regulations. However, only removing the data from the dataset is not enough, as machine learning models would memorize the training data once the data is involved in model training, increasing the risk of exposing users’ privacy. To solve this problem, currently, the straightforward method, naive retraining, is to discard these data and retrain the model from scratch, which is reliable but brings much computational and time overhead. In this paper, we propose an exact unlearning architecture called ARCANE. Based on ensemble learning, we transform the naive retraining into multiple one-class classification tasks to reduce retraining cost while ensuring model performance, especially in the case of a large number of unlearning requests not considered by previous works. Then we further introduce data preprocessing methods to reduce the retraining overhead and speed up the unlearning, which includes representative data selection for redundancy removal, training state saving to reuse previous calculation results, and sorting to cope with unlearning requests of different distributions. We extensively evaluate ARCANE on three typical datasets with three common model architectures. Experiment results show the effectiveness and superiority of ARCANE over both the naive retraining and the state-of-the-art method in terms of model performance and unlearning speed.

IROS Conference 2021 Conference Paper

A Learning Approach to Robot-Agnostic Force-Guided High Precision Assembly

  • Jieliang Luo
  • Hui Li

In this work we propose a learning approach to high-precision robotic assembly problems. We focus on the contact-rich phase, where the assembly pieces are in close contact with each other. Unlike many learning-based approaches that heavily rely on vision or spatial tracking, our approach takes force/torque in task space as the only observation. Our training environment is robotless, as the end-effector is not attached to any specific robot. Trained policies can then be applied to different robotic arms without re-training. This approach can greatly reduce the complexity of performing contact-rich robotic assembly in the real world, especially in unstructured settings such as architectural construction. To achieve it, we have developed a new distributed RL agent, named Recurrent Distributed DDPG (RD2), which extends Ape-X DDPG [1] with recurrence and makes two structural improvements on prioritized experience replay [2]. Our results show that RD2 is able to solve two fundamental high-precision assembly tasks, lap-joint and peg-in-hole, and outperforms two state-of-the-art algorithms, Ape-X DDPG and PPO with LSTM. We have successfully evaluated our robot-agnostic policies on three robotic arms, Kuka KR60, Franka Panda, and UR10, in simulation. The video presenting our experiments is available at https://sites.google.com/view/rd2-rl

AAAI Conference 2021 Conference Paper

Code Completion by Modeling Flattened Abstract Syntax Trees as Graphs

  • Yanlin Wang
  • Hui Li

Code completion has become an essential component of integrated development environments. Contemporary code completion methods rely on the abstract syntax tree (AST) to generate syntactically correct code. However, they cannot fully capture the sequential and repetitive patterns of writing code and the structural information of the AST. To alleviate these problems, we propose a new code completion approach named CCAG, which models the flattened sequence of a partial AST as an AST graph. CCAG uses our proposed AST Graph Attention Block to capture different dependencies in the AST graph for representation learning in code completion. The sub-tasks of code completion are optimized via multi-task learning in CCAG, and the task balance is automatically achieved using uncertainty without the need to tune task weights. The experimental results show that CCAG outperforms state-of-the-art approaches and is able to provide intelligent code completion.

ICRA Conference 2021 Conference Paper

Learning Task-Oriented Dexterous Grasping from Human Knowledge

  • Hui Li
  • Yinlong Zhang
  • Yanan Li 0001
  • Hongsheng He

Industrial automation requires robot dexterity to automate many processes such as product assembly, packaging, and material handling. Existing robotic systems lack the capability to determine proper grasp strategies in the context of object affordances and task designations. In this paper, a framework for task-oriented dexterous grasping is proposed to learn grasp knowledge from human experience and to deploy grasp strategies while adapting to grasp context. Grasp topology is defined and grasp strategies are learned from an established dataset for task-oriented dexterous manipulation. To adapt to various grasp contexts, a reinforcement-learning-based grasping policy is implemented to deploy different task-oriented strategies. The performance of the system was evaluated in a simulated grasping environment using an AR10 anthropomorphic hand installed on a Sawyer robotic arm. The proposed framework achieved a hit rate of 100% for grasp strategies and an overall top-3 match rate of 95.6%. The success rate of grasping was 85.6% over 2700 grasping experiments for manipulation tasks given in natural-language instructions.

IJCAI Conference 2021 Conference Paper

MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction

  • Guozhi Tang
  • Lele Xie
  • Lianwen Jin
  • Jiapeng Wang
  • Jingdong Chen
  • Zhen Xu
  • Qianying Wang
  • Yaqiang Wu

The Visual Information Extraction (VIE) task aims to extract key information from multifarious document images (e.g., invoices and purchase receipts). Most previous methods treat the VIE task simply as a sequence labeling or classification problem, which requires models to carefully identify each kind of semantics by introducing multimodal features, such as font, color, and layout. But simply introducing multimodal features cannot work well when faced with numeric semantic categories or ambiguous texts. To address this issue, in this paper we propose a novel key-value matching model based on a graph neural network for VIE (MatchVIE). Through key-value matching based on relevancy evaluation, the proposed MatchVIE can bypass the recognition of various semantics and simply focus on the strong relevancy between entities. Besides, we introduce a simple but effective operation, Num2Vec, to tackle the instability of encoded values, which helps the model converge more smoothly. Comprehensive experiments demonstrate that the proposed MatchVIE can significantly outperform previous methods. Notably, to the best of our knowledge, MatchVIE may be the first attempt to tackle the VIE task by modeling the relevancy between keys and values, and it is a good complement to the existing methods.

TIST Journal 2020 Journal Article

FROST

  • Meng Wang
  • Hui Li
  • Jiangtao Cui
  • Sourav S. Bhowmick
  • Ping Liu

The facility relocation (FR) problem, which aims to optimize the placement of facilities to accommodate the changes of users’ locations, has a broad spectrum of applications. Despite the significant progress made by existing solutions to the FR problem, they all assume each user is stationary and represented as a single point. Unfortunately, in reality, objects (e.g., people, animals) are mobile. For example, a car-sharing user picks up a vehicle from a station close to where he or she is currently located. Consequently, these efforts may fail to identify a superior solution to the FR problem. In this article, for the first time, we take into account the movement history of users and introduce a novel FR problem, called motion-fr, to address the preceding limitation. Specifically, we present a framework called frost to address it. frost comprises two exact algorithms: index based and index free. The former is designed to address the scenario when facilities and objects are known a priori, whereas the latter solves the motion-fr problem by jettisoning this assumption. Further, we extend the index-based algorithm to solve the general k-motion-fr problem, which aims to relocate k inferior facilities. We devise an approximate solution due to the NP-hardness of the problem. Experimental study over both real-world and synthetic datasets demonstrates the superiority of our framework in comparison to state-of-the-art FR techniques in efficiency and effectiveness.

ICRA Conference 2020 Conference Paper

MagicHand: Context-Aware Dexterous Grasping Using an Anthropomorphic Robotic Hand

  • Hui Li
  • Jindong Tan
  • Hongsheng He

Understanding the characteristics of objects such as fragility, rigidity, texture and dimensions facilitates and innovates robotic grasping. In this paper, we propose a context-aware anthropomorphic robotic hand (MagicHand) grasping system which is able to gather various information about its target object and generate grasping strategies based on the perceived information. In this work, NIR spectra of target objects are perceived to recognize materials on a molecular level, and RGB-D images are collected to estimate the dimensions of the objects. We selected the six most-used grasping poses, and our system is able to decide the most suitable grasp strategy based on the characteristics of an object. Through multiple experiments, the performance of the MagicHand system is demonstrated.

AAAI Conference 2019 Conference Paper

Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition

  • Hui Li
  • Peng Wang
  • Chunhua Shen
  • Guyu Zhang

Recognizing irregular text in natural scene images is challenging due to the large variance in text appearance, such as curvature, orientation and distortion. Most existing approaches rely heavily on sophisticated model designs and/or extra fine-grained annotations, which, to some extent, increase the difficulty in algorithm implementation and data collection. In this work, we propose an easy-to-implement strong baseline for irregular scene text recognition, using off-the-shelf neural network components and only word-level annotations. It is composed of a 31-layer ResNet, an LSTM-based encoder-decoder framework and a 2-dimensional attention module. Despite its simplicity, the proposed method is robust. It achieves state-of-the-art performance on irregular text recognition benchmarks and comparable results on regular text datasets. The code will be released.

ICML Conference 2018 Conference Paper

Adversarial Attack on Graph Structured Data

  • Hanjun Dai
  • Hui Li
  • Tian Tian 0001
  • Xin Huang
  • Lin Wang
  • Jun Zhu 0001
  • Le Song

Deep learning on graph structures has shown exciting results in various applications. However, little attention has been paid to the robustness of such models, in contrast to the numerous research works on adversarial attack and defense for image or text data. In this paper, we focus on adversarial attacks that fool deep learning models by modifying the combinatorial structure of data. We first propose a reinforcement learning based attack method that learns a generalizable attack policy, while only requiring prediction labels from the target classifier. We further propose attack methods based on genetic algorithms and gradient descent in the scenario where additional prediction confidence or gradients are available. We use both synthetic and real-world data to show that a family of Graph Neural Network models is vulnerable to these attacks, in both graph-level and node-level classification tasks. We also show such attacks can be used to diagnose the learned classifiers.

AAAI Conference 2018 Conference Paper

On Multi-Relational Link Prediction With Bilinear Models

  • Yanjie Wang
  • Rainer Gemulla
  • Hui Li

We study bilinear embedding models for the task of multi-relational link prediction and knowledge graph completion. Bilinear models are among the most basic models for this task; they are comparably efficient to train and use, and they can provide good prediction performance. The main goal of this paper is to explore the expressiveness of and the connections between various bilinear models proposed in the literature. In particular, a substantial number of models can be represented as bilinear models with certain additional constraints enforced on the embeddings. We explore whether or not these constraints lead to universal models, which can in principle represent every set of relations, and whether or not there are subsumption relationships between various models. We report results of an independent experimental study that evaluates recent bilinear models in a common experimental setup. Finally, we provide evidence that relation-level ensembles of multiple bilinear models can achieve state-of-the-art prediction performance.
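The bilinear family this abstract studies scores a triple (subject, relation, object) as e_s^T W_r e_o, and constrained variants correspond to restricting the relation matrix W_r. A minimal sketch, assuming random illustrative embeddings (the names and dimensions below are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_ent, n_rel = 8, 5, 3

# Entity embeddings and one d x d matrix per relation (RESCAL-style bilinear model)
E = rng.normal(size=(n_ent, d))
W = rng.normal(size=(n_rel, d, d))

def score(s, r, o):
    """Bilinear triple score  e_s^T W_r e_o  for (subject, relation, object)."""
    return E[s] @ W[r] @ E[o]

# Constraining each W_r to be diagonal recovers a DistMult-style model:
w_diag = rng.normal(size=(n_rel, d))

def score_distmult(s, r, o):
    return np.sum(E[s] * w_diag[r] * E[o])
```

The diagonal-constrained score equals the full bilinear score with W_r = diag(w_r), which is the sense in which such models are "bilinear models with additional constraints on the embeddings."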

JBHI Journal 2017 Journal Article

Efficient and Privacy-Preserving Online Medical Prediagnosis Framework Using Nonlinear SVM

  • Hui Zhu
  • Xiaoxia Liu
  • Rongxing Lu
  • Hui Li

With the advances of machine learning algorithms and the pervasiveness of network terminals, the online medical prediagnosis system, which can provide a healthcare provider's diagnosis anytime, anywhere, has attracted considerable interest recently. However, the flourishing of online medical prediagnosis systems still faces many challenges, including information security and privacy preservation. In this paper, we propose an efficient and privacy-preserving online medical prediagnosis framework, called eDiag, by using nonlinear kernel support vector machine (SVM). With eDiag, sensitive personal health information can be processed without privacy disclosure during the online prediagnosis service. Specifically, based on an improved expression for the nonlinear SVM, an efficient and privacy-preserving classification scheme is introduced with lightweight multiparty random masking and polynomial aggregation techniques. The encrypted user query is directly operated on at the service provider without decryption, and the diagnosis result can only be decrypted by the user. Through extensive analysis, we show that eDiag can ensure that users' health information and the healthcare provider's prediction model are kept confidential, and has significantly less computation and communication overhead than existing schemes. In addition, performance evaluations via implementing eDiag on a smartphone and a computer demonstrate eDiag's effectiveness in a real online environment.

AAAI Conference 2011 Conference Paper

Hybrid Planning with Temporally Extended Goals for Sustainable Ocean Observing

  • Hui Li
  • Brian Williams

A challenge to modeling and monitoring the health of the ocean environment is that it is largely under-sensed and difficult to sense remotely. Autonomous underwater vehicles (AUVs) can improve observability, for example of algal bloom regions, ocean acidification, and ocean circulation. This AUV paradigm, however, requires robust operation that is cost effective and responsive to the environment. To achieve low cost we generate operational sequences automatically from science goals, and achieve robustness by reasoning about the discrete and continuous effects of actions. We introduce Kongming2, a generative planner for hybrid systems with temporally extended goals (TEGs) and temporally flexible actions. It takes as input high-level goals and outputs trajectories and actions of the hybrid system, for example an AUV. Kongming2 makes two major extensions to Kongming1: planning for TEGs, and planning with temporally flexible actions. We demonstrated a proof of concept of the planner in the Atlantic Ocean on Odyssey IV, an AUV designed and built by the MIT AUV Lab at Sea Grant.

JMLR Journal 2009 Journal Article

Multi-task Reinforcement Learning in Partially Observable Stochastic Environments

  • Hui Li
  • Xuejun Liao
  • Lawrence Carin

We consider the problem of multi-task reinforcement learning (MTRL) in multiple partially observable stochastic environments. We introduce the regionalized policy representation (RPR) to characterize the agent's behavior in each environment. The RPR is a parametric model of the conditional distribution over current actions given the history of past actions and observations; the agent's choice of actions is directly based on this conditional distribution, without an intervening model to characterize the environment itself. We propose off-policy batch algorithms to learn the parameters of the RPRs, using episodic data collected when following a behavior policy, and show their linkage to policy iteration. We employ the Dirichlet process as a nonparametric prior over the RPRs across multiple environments. The intrinsic clustering property of the Dirichlet process imposes sharing of episodes among similar environments, which effectively reduces the number of episodes required for learning a good policy in each environment, when data sharing is appropriate. The number of distinct RPRs and the associated clusters (the sharing patterns) are automatically discovered by exploiting the episodic data as well as the nonparametric nature of the Dirichlet process. We demonstrate the effectiveness of the proposed RPR as well as the RPR-based MTRL framework on various problems, including grid-world navigation and multi-aspect target classification. The experimental results show that the RPR is a competitive reinforcement learning algorithm in partially observable domains, and the MTRL consistently achieves better performance than single task reinforcement learning.

AAAI Conference 2007 Conference Paper

Point-Based Policy Iteration

  • Shihao Ji
  • Hui Li

We describe a point-based policy iteration (PBPI) algorithm for infinite-horizon POMDPs. PBPI replaces the exact policy improvement step of Hansen's policy iteration with point-based value iteration (PBVI). Despite being an approximate algorithm, PBPI is monotonic: At each iteration before convergence, PBPI produces a policy for which the values increase for at least one of a finite set of initial belief states, and decrease for none of these states. In contrast, PBVI cannot guarantee monotonic improvement of the value function or the policy. In practice PBPI generally needs a lower density of point coverage in the simplex and tends to produce superior policies with less computation. Experiments on several benchmark problems (up to 12,545 states) demonstrate the scalability and robustness of the PBPI algorithm.

AAAI Conference 2006 Conference Paper

Incremental Least Squares Policy Iteration for POMDPs

  • Hui Li

We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision processes (POMDPs). The ILSPI algorithm computes a basis representation of the infinite-horizon value function by minimizing the squared Bellman residual and performs policy improvement in reachable belief states. A number of optimal basis functions are determined by the algorithm to minimize the Bellman residual incrementally, via efficient computations. We show that, by using optimally determined basis functions, the policy can be improved successively on a set of the most probable belief points sampled from the reachable belief set. As ILSPI operates on belief sample points, it represents a point-based policy iteration method. The results on four benchmark problems show that ILSPI compares competitively to its value-iteration counterparts in terms of both performance and computational efficiency.
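The squared-Bellman-residual objective this abstract refers to can be illustrated on a toy fully observable MDP rather than a POMDP (a simplification for clarity); the transition matrix, rewards, and discount below are made up for the sketch.

```python
import numpy as np

# Toy 2-state MDP under a fixed policy: P is the state-transition matrix,
# r the per-state reward, gamma the discount factor.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma = 0.95

def bellman_residual(v):
    """Squared Bellman residual  ||v - (r + gamma * P v)||^2  of a value estimate."""
    return np.sum((v - (r + gamma * P @ v)) ** 2)

# The exact value function v* = (I - gamma P)^{-1} r drives the residual to zero;
# minimizing the residual over a basis representation approximates v*.
v_star = np.linalg.solve(np.eye(2) - gamma * P, r)
```

ILSPI's contribution is to minimize this kind of residual incrementally over belief states, adding basis functions one at a time; the toy above only shows what the objective measures.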