
Author name cluster

Li Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

48 papers
2 author rows

Possible papers


TIST Journal 2026 Journal Article

Matryoshka Representation Learning for Recommendation with Layer- and Hardness-Adaptive Negative Sampling

  • Riwei Lai
  • Li Chen
  • Weixin Chen
  • Rui Chen

Representation learning is essential for deep-neural-network-based recommender systems to capture user preferences and item features within fixed-dimensional user and item vectors. Unlike existing representation learning methods that either treat each user preference and item feature uniformly or categorize them into discrete clusters, we argue that in the real world, user preferences and item features are naturally expressed and organized in a hierarchical manner, leading to a new direction for representation learning. In this article, we introduce a novel matryoshka representation learning method for recommendation (MRL4Rec), by which we restructure user and item vectors into matryoshka representations with nested vector spaces to explicitly represent user preferences and item features at different hierarchical layers. We theoretically establish that training with the same triplets for each sliced vector cannot guarantee representation learning with hierarchical structures. Subsequently, we propose the layer- and hardness-adaptive negative sampling (LHANS) mechanism to construct training triplets, which further ensures the soundness of learned matryoshka representations in capturing hierarchical user preferences and item features. The experiments demonstrate that MRL4Rec can consistently and substantially outperform a number of state-of-the-art competitors on several real-life datasets. Our code is publicly available at https://github.com/Riwei-HEU/MRL.
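To make "nested vector spaces" concrete, here is a minimal sketch. It assumes plain dot-product scoring and hypothetical slice sizes: each coarser layer scores a user-item pair using only a prefix of the full vectors. It does not implement MRL4Rec's training procedure or the LHANS sampler, and the function name is hypothetical.

```python
from typing import Dict, Sequence


def nested_scores(user: Sequence[float], item: Sequence[float],
                  slice_dims: Sequence[int]) -> Dict[int, float]:
    """Dot-product score on each nested prefix (hypothetical helper).

    A slice of size k keeps only the first k dimensions, so every smaller
    slice is contained in every larger one, like nested matryoshka dolls.
    """
    if len(user) != len(item):
        raise ValueError("user and item vectors must have the same dimension")
    scores: Dict[int, float] = {}
    for k in slice_dims:
        if not 0 < k <= len(user):
            raise ValueError(f"invalid slice size {k}")
        scores[k] = sum(u * v for u, v in zip(user[:k], item[:k]))
    return scores


# Coarse (k=1), middle (k=2) and full (k=4) layers of the same vectors.
print(nested_scores([1, 2, 3, 4], [1, 1, 1, 1], [1, 2, 4]))
```

In such a scheme, the short prefixes would carry coarse preferences and the longer ones would add finer detail; the abstract's point is that training every slice on the same triplets does not by itself guarantee that hierarchy.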

AAAI Conference 2026 Conference Paper

Model Whisper: Steering Vectors Unlock Large Language Models’ Potential in Test-Time

  • Xinyue Kang
  • Diwei Shi
  • Li Chen

It is a critical challenge to efficiently unlock the powerful reasoning potential of Large Language Models (LLMs) for specific tasks or new distributions. Existing test-time adaptation methods often require tuning model parameters, which is not only computationally expensive but also risks degrading the model's pre-existing abilities. To address this, we introduce a lightweight component, Test-Time Steering Vectors (TTSV), which is prepended to the input while keeping the LLM's parameters entirely frozen. By optimizing the TTSV on test data to minimize the model's output entropy, we steer the model towards an internal state of higher confidence, activating its inherent abilities most relevant to the current task. TTSV is both lightweight and highly efficient to optimize, making it a true plug-and-play enhancement. Extensive experiments validate our approach's effectiveness on both base models and reasoning-enhanced models. For instance, on the MATH500 task, TTSV achieves a 45.88% relative performance gain on the Qwen2.5-Math-7B model and a 16.22% relative gain on the Qwen3-4B model. Furthermore, our approach exhibits robust generalization, with its steering vectors proving highly transferable across diverse tasks.
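The objective described above, the entropy of the model's output distribution, can be illustrated in isolation. The sketch below computes the Shannon entropy of softmax probabilities over a vector of logits. It shows only the quantity being minimized, not how the steering vector is parameterized or optimized in TTSV; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the softmax distribution over the logits."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)


# A flat distribution has maximal entropy; a peaked one has low entropy.
print(output_entropy([0.0, 0.0, 0.0, 0.0]))  # equals log(4)
print(output_entropy([5.0, 0.0, 0.0, 0.0]))
```

Lower entropy corresponds to a more confident (more peaked) next-token distribution, which is the "higher confidence" state the abstract refers to.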

AAAI Conference 2025 Conference Paper

From Pairwise to Ranking: Climbing the Ladder to Ideal Collaborative Filtering with Pseudo-Ranking

  • Yuhan Zhao
  • Rui Chen
  • Li Chen
  • Shuang Zhang
  • Qilong Han
  • Hongtao Song

Intuitively, an ideal collaborative filtering (CF) model should learn from users' full rankings over all items to make optimal top-K recommendations. Due to the absence of such full rankings in practice, most CF models rely on pairwise loss functions to approximate full rankings, resulting in an immense performance gap. In this paper, we provide a novel analysis using the multiple ordinal classification concept to reveal the inevitable gap between a pairwise approximation and the ideal case. However, bridging the gap in practice encounters two formidable challenges: (1) none of the real-world datasets contains full ranking information; (2) there does not exist a loss function that is capable of consuming ranking information. To overcome these challenges, we propose a pseudo-ranking paradigm (PRP) that addresses the lack of ranking information by introducing pseudo-rankings supervised by an original noise injection mechanism. Additionally, we put forward a new ranking loss function designed to handle ranking information effectively. To ensure our method's robustness against potential inaccuracies in pseudo-rankings, we equip the ranking loss function with a gradient-based confidence mechanism to detect and mitigate abnormal gradients. Extensive experiments on four real-world datasets demonstrate that PRP significantly outperforms state-of-the-art methods.
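For context on the "pairwise approximation" the abstract criticizes, a common example of a pairwise loss in CF is Bayesian Personalized Ranking (BPR), which only compares one observed item against one sampled negative. The sketch below is a generic BPR loss, not the paper's ranking loss or PRP. Scores are assumed to be plain floats, and the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def bpr_loss(pos_score: float, neg_score: float) -> float:
    """-log(sigmoid(pos - neg)), computed in a numerically stable way."""
    x = pos_score - neg_score
    if x >= 0:
        return math.log1p(math.exp(-x))
    # For negative x, rewrite log(1 + e^{-x}) as -x + log(1 + e^{x}) to avoid overflow.
    return -x + math.log1p(math.exp(x))


def mean_bpr(pairs: Sequence[Tuple[float, float]]) -> float:
    """Average BPR loss over (positive score, negative score) pairs."""
    return sum(bpr_loss(p, n) for p, n in pairs) / len(pairs)


print(mean_bpr([(2.0, 0.5), (0.1, 0.3)]))
```

Each term only says "this item should outrank that one". The abstract's argument is that such pairwise signals cannot fully recover a user's ranking over all items.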

ICML Conference 2025 Conference Paper

On the Power of Learning-Augmented Search Trees

  • Jingbang Chen
  • Xinyuan Cao
  • Alicia Stepin
  • Li Chen

We study learning-augmented binary search trees (BSTs) via Treaps with carefully designed priorities. The result is a simple search tree in which the depth of each item $x$ is determined by its predicted weight $w_x$. Specifically, each item $x$ is assigned a composite priority of $-\lfloor\log\log(1/w_x)\rfloor + U(0, 1)$ where $U(0, 1)$ is the uniform random variable. By choosing $w_x$ as the relative frequency of $x$, the resulting search trees achieve static optimality. This approach generalizes the recent learning-augmented BSTs [Lin-Luo-Woodruff ICML‘22], which only work for Zipfian distributions, by extending them to arbitrary input distributions. Furthermore, we demonstrate that our method can be generalized to a B-Tree data structure using the B-Treap approach [Golovin ICALP’09]. Our search trees are also capable of leveraging localities in the access sequence through online self-reorganization, thereby achieving the working-set property. Additionally, they are robust to prediction errors and support dynamic operations, such as insertions, deletions, and prediction updates. We complement our analysis with an empirical study, demonstrating that our method outperforms prior work and classic data structures.
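The priority rule quoted above is concrete enough to sketch. The following is a minimal illustration, not the authors' code. It assumes base-2 logarithms (the abstract does not state a base), predicted weights strictly between 0 and 1, and a max-heap treap (higher priority sits closer to the root) built recursively from sorted keys. The function names are hypothetical.

```python
import math
import random
from typing import Dict, List


def composite_priority(w: float, rng: random.Random) -> float:
    """-floor(log2 log2(1/w)) + U(0, 1); base-2 logs are an assumption here."""
    if not 0 < w < 1:
        raise ValueError("weight must lie strictly between 0 and 1")
    return -math.floor(math.log2(math.log2(1 / w))) + rng.random()


def treap_depths(weights: Dict[int, float], seed: int = 0) -> Dict[int, int]:
    """Depth of every key in the treap determined by the composite priorities."""
    rng = random.Random(seed)
    prio = {k: composite_priority(w, rng) for k, w in weights.items()}
    depths: Dict[int, int] = {}

    def build(keys: List[int], depth: int) -> None:
        # In a treap the max-priority key is the root; smaller keys go left,
        # larger keys go right (BST order), recursively.
        if not keys:
            return
        i = max(range(len(keys)), key=lambda j: prio[keys[j]])
        depths[keys[i]] = depth
        build(keys[:i], depth + 1)
        build(keys[i + 1:], depth + 1)

    build(sorted(weights), 0)
    return depths


# A heavily accessed key (weight 0.9) lands at the root.
print(treap_depths({1: 0.9, 2: 0.05, 3: 0.03, 4: 0.02}))
```

The integer part groups items into doubly exponential weight classes, and the uniform term breaks ties randomly within a class. This is why an item's depth tracks its predicted weight.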

AAAI Conference 2025 Conference Paper

QCS: Feature Refining from Quadruplet Cross Similarity for Facial Expression Recognition

  • Chengpeng Wang
  • Li Chen
  • Lili Wang
  • Zhaofan Li
  • Xuebin Lv

Facial expression recognition faces challenges where labeled significant features in datasets are mixed with unlabeled redundant ones. In this paper, we introduce Cross Similarity Attention (CSA) to mine richer intrinsic information from image pairs, overcoming a limitation when the Scaled Dot-Product Attention of ViT is directly applied to calculate the similarity between two different images. Based on CSA, we simultaneously minimize intra-class differences and maximize inter-class differences at the fine-grained feature level through interactions among multiple branches. Contrastive residual distillation is utilized to transfer the information learned in the cross module back to the base network. We ingeniously design a four-branch centrally symmetric network, named Quadruplet Cross Similarity (QCS), which alleviates gradient conflicts arising from the cross module and achieves balanced and stable training. It can adaptively extract discriminative features while isolating redundant ones. The cross-attention modules exist during training, and only one base branch is retained during inference, resulting in no increase in inference time. Extensive experiments show that our proposed method achieves state-of-the-art performance on several FER datasets.

NeurIPS Conference 2025 Conference Paper

ReSim: Reliable World Simulation for Autonomous Driving

  • Jiazhi Yang
  • Kashyap Chitta
  • Shenyuan Gao
  • Long Chen
  • Yuqian Shao
  • Xiaosong Jia
  • Hongyang Li
  • Andreas Geiger

How can we reliably simulate future driving scenarios under a wide range of ego driving behaviors? Recent driving world models, developed exclusively on real-world driving data composed mainly of safe expert trajectories, struggle to follow hazardous or non-expert behaviors, which are rare in such data. This limitation restricts their applicability to tasks such as policy evaluation. In this work, we address this challenge by enriching real-world human demonstrations with diverse non-expert data collected from a driving simulator (e.g., CARLA), and building a controllable world model trained on this heterogeneous corpus. Starting with a video generator featuring a diffusion transformer architecture, we devise several strategies to effectively integrate conditioning signals and improve prediction controllability and fidelity. The resulting model, ReSim, enables Reliable Simulation of diverse open-world driving scenarios under various actions, including hazardous non-expert ones. To close the gap between high-fidelity simulation and applications that require reward signals to judge different actions, we introduce a Video2Reward module that estimates reward from ReSim’s simulated future. Our ReSim paradigm achieves up to 44% higher visual fidelity, improves controllability for both expert and non-expert actions by over 50%, and boosts planning and policy selection performance on NAVSIM by 2% and 25%, respectively.

AAAI Conference 2025 Conference Paper

Simplifying Control Mechanism in Text-to-Image Diffusion Models

  • Zhida Feng
  • Li Chen
  • Yuenan Sun
  • Jiaxiang Liu
  • Shikun Feng

ControlNet has significantly advanced controllable image generation by integrating dense conditions (such as depth and canny edges) with text-to-image diffusion models. However, ControlNet's integration adds nearly half as many parameters as the base diffusion model itself, making it inefficient. To address this, we introduce Simple-ControlNet, an efficient and streamlined network for controllable text-to-image generation. It employs a single-scale projection layer to incorporate condition information into the denoising U-Net, supplemented by Low-Rank Adapter (LoRA) parameters to facilitate condition learning. Impressively, Simple-ControlNet requires fewer than 3 million parameters for the control mechanism, substantially fewer than the 300 million needed by ControlNet. Our extensive experiments confirm that Simple-ControlNet matches or surpasses ControlNet's performance across a broad range of tasks and base diffusion models, showcasing its utility and efficiency.
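Part of the parameter saving comes from LoRA, which replaces a full weight update with a low-rank product. The sketch below shows the generic LoRA forward pass y = Wx + scale·B(Ax) and its parameter count r·(d_in + d_out). It is not Simple-ControlNet's architecture: the projection layer is omitted, and the layer sizes and rank in the example are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Sequence[float],
                 scale: float = 1.0) -> List[float]:
    """y = W x + scale * B (A x); W stays frozen, only A (r x d_in) and B (d_out x r) train."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in a rank-r adapter for one d_out x d_in layer."""
    return rank * (d_in + d_out)


# Hypothetical 320x320 layer: a rank-4 adapter versus a full update.
print(lora_param_count(320, 320, 4), 320 * 320)
```

Because the adapter grows linearly rather than quadratically with layer width, a full condition branch of this kind stays small relative to a copied encoder.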

NeurIPS Conference 2025 Conference Paper

Transcending Cost-Quality Tradeoff in Agent Serving via Session-Awareness

  • Yanyu Ren
  • Li Chen
  • Dan Li
  • Xizheng Wang
  • Zhiyuan Wu
  • Yukai Miao
  • Yu Bai

Large Language Model (LLM) agents are capable of task execution across various domains by autonomously interacting with environments and refining LLM responses based on feedback. However, existing model serving systems are not optimized for the unique demands of serving agents. Compared to classic model serving, agent serving has different characteristics: predictable request patterns, increasing quality requirements, and unique prompt formatting. We identify a key problem for agent serving: LLM serving systems lack session-awareness. They neither perform effective KV cache management nor precisely select the cheapest yet competent model in each round. This leads to a cost-quality tradeoff, and we identify an opportunity to surpass it in an agent serving system. To this end, we introduce AgServe for AGile AGent SERVing. AgServe features a session-aware server that boosts KV cache reuse via Estimated-Time-of-Arrival-based eviction and in-place positional embedding calibration, a quality-aware client that performs session-aware model cascading through real-time quality assessment, and a dynamic resource scheduler that maximizes GPU utilization. With AgServe, we allow agents to select and upgrade models during the session lifetime, and to achieve similar quality at much lower costs, effectively transcending the tradeoff. Extensive experiments on real testbeds demonstrate that AgServe (1) achieves response quality comparable to GPT-4o at 16.5% of the cost and (2) delivers a 1.8× improvement in quality relative to the tradeoff curve.
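The abstract names Estimated-Time-of-Arrival-based eviction without giving details. One plausible reading is sketched below. Under stated assumptions, it evicts the cached session whose next request is expected furthest in the future, similar in spirit to Belady's policy. The class, its token-based capacity, and the ETA values are all hypothetical and are not AgServe's implementation.

```python
from typing import Dict, List, Tuple


class EtaKVCache:
    """Toy per-session KV cache that evicts the session expected back last.

    Hypothetical sketch: capacity is counted in tokens, and each entry stores
    (cached_tokens, estimated_next_arrival_time).
    """

    def __init__(self, capacity_tokens: int) -> None:
        self.capacity = capacity_tokens
        self.entries: Dict[str, Tuple[int, float]] = {}

    def used(self) -> int:
        return sum(tokens for tokens, _ in self.entries.values())

    def put(self, session: str, tokens: int, eta: float) -> List[str]:
        """Insert or refresh a session's cache; return the evicted session ids."""
        if tokens > self.capacity:
            raise ValueError("entry larger than the whole cache")
        self.entries.pop(session, None)  # refreshing replaces the old entry
        evicted: List[str] = []
        # Evict the resident session whose next request is expected furthest
        # in the future, since its cache is least likely to be reused soon.
        while self.used() + tokens > self.capacity:
            victim = max(self.entries, key=lambda s: self.entries[s][1])
            del self.entries[victim]
            evicted.append(victim)
        self.entries[session] = (tokens, eta)
        return evicted


cache = EtaKVCache(capacity_tokens=100)
cache.put("a", 40, eta=5.0)
cache.put("b", 40, eta=1.0)
print(cache.put("c", 40, eta=3.0))  # evicts the session expected back last
```

For simplicity, this sketch never evicts the incoming entry itself, even if its ETA is the latest. A real scheduler would also have to account for ETA estimation error.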

NeurIPS Conference 2025 Conference Paper

XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation

  • Bowen Chen
  • Brynn zhao
  • Haomiao Sun
  • Li Chen
  • Xu Wang
  • Daniel Du
  • Xinglong Wu

Achieving fine-grained control over subject identity and semantic attributes (pose, style, lighting) in text-to-image generation, particularly for multiple subjects, often undermines the editability and coherence of Diffusion Transformers (DiTs). Many approaches introduce artifacts or suffer from attribute entanglement. To overcome these challenges, we propose XVerse, a novel multi-subject controlled generation model. By transforming reference images into offsets for token-specific text-stream modulation, XVerse allows precise and independent control over specific subjects without disrupting image latents or features. Consequently, XVerse offers high-fidelity, editable multi-subject image synthesis with robust control over individual subject characteristics and semantic attributes. This advancement significantly improves personalized and complex scene generation capabilities.

NeurIPS Conference 2024 Conference Paper

Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild

  • Xinyu Zhao
  • Guoheng Sun
  • Ruisi Cai
  • Yukun Zhou
  • Pingzhi Li
  • Peihao Wang
  • Bowen Tan
  • Yexiao He

As Large Language Models (LLMs) excel across tasks and specialized domains, scaling LLMs based on existing models has gained significant attention, though this approach is challenged by potential performance drops when combining disparate models. Various techniques have been proposed to aggregate pre-trained LLMs, including model merging, Mixture-of-Experts, and stacking. Despite their merits, a comprehensive comparison and synergistic application of them to a diverse model zoo is yet to be adequately addressed. In light of this research gap, this paper introduces Model-GLUE, a holistic LLM scaling guideline. First, our work starts with a benchmarking of existing LLM scaling techniques, especially selective merging, and variants of mixture. Utilizing the insights from the benchmark results, we formulate a strategy for the selection and aggregation of a heterogeneous model zoo featuring different architectures and initializations. Our methodology involves clustering mergeable models, selecting a merging strategy, and integrating model clusters through model-level mixture. Finally, evidenced by our experiments on a diverse Llama-2-based model zoo, Model-GLUE shows an average performance enhancement of 5.61%, achieved without additional training. Code is available at https://github.com/Model-GLUE/Model-GLUE.
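As background for "model merging", the simplest merging baseline is a (weighted) average of parameters across models that share an architecture. The sketch below is that generic baseline, not Model-GLUE's selective merging or its clustering procedure. It assumes each model is a dict mapping parameter names to flat lists of floats, and the function name is hypothetical. The key check mirrors the idea that only compatible models are "mergeable".

```python
from typing import Dict, List, Optional, Sequence

Weights = Dict[str, List[float]]


def average_merge(models: Sequence[Weights],
                  coeffs: Optional[Sequence[float]] = None) -> Weights:
    """Weighted parameter average of same-architecture models (uniform by default)."""
    if not models:
        raise ValueError("need at least one model")
    if coeffs is None:
        coeffs = [1.0 / len(models)] * len(models)
    if len(coeffs) != len(models):
        raise ValueError("one coefficient per model is required")
    names = models[0].keys()
    for m in models:
        # Models with different parameter names or shapes are not mergeable.
        if m.keys() != names or any(len(m[n]) != len(models[0][n]) for n in names):
            raise ValueError("models do not share an architecture")
    return {
        n: [sum(c * m[n][i] for c, m in zip(coeffs, models))
            for i in range(len(models[0][n]))]
        for n in names
    }


print(average_merge([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]))
```

Averaging is cheap and needs no training, but it can degrade when models diverge. This is one reason a benchmark of merging variants is useful.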

AAAI Conference 2024 Conference Paper

Adaptive Hardness Negative Sampling for Collaborative Filtering

  • Riwei Lai
  • Rui Chen
  • Qilong Han
  • Chi Zhang
  • Li Chen

Negative sampling is essential for implicit collaborative filtering to provide proper negative training signals so as to achieve desirable performance. We experimentally unveil a common limitation of all existing negative sampling methods: they can only select negative samples of a fixed hardness level, leading to the false positive problem (FPP) and false negative problem (FNP). We then propose a new paradigm called adaptive hardness negative sampling (AHNS) and discuss its three key criteria. By adaptively selecting negative samples of appropriate hardness during the training process, AHNS can effectively mitigate the impacts of FPP and FNP. Next, we present a concrete instantiation of AHNS called AHNS_{p<0}, and theoretically demonstrate that AHNS_{p<0} can fit the three criteria of AHNS well and achieve a larger lower bound of normalized discounted cumulative gain. Besides, we note that existing negative sampling methods can be regarded as more relaxed cases of AHNS. Finally, we conduct comprehensive experiments, and the results show that AHNS_{p<0} can consistently and substantially outperform several state-of-the-art competitors on multiple datasets.

JBHI Journal 2024 Journal Article

Adaptive Multi-Dimensional Weighted Network With Category-Aware Contrastive Learning for Fine-Grained Hand Bone Segmentation

  • Bolun Zeng
  • Li Chen
  • Yuanyi Zheng
  • Xiaojun Chen

Accurately delineating and categorizing individual hand bones in 3D ultrasound (US) is a promising technology for precise digital diagnostic analysis. However, this is a challenging task due to the inherent imaging limitations of US and the insignificant feature differences among numerous bones. In this study, we propose a novel deep learning-based solution for pediatric hand bone segmentation in US. Our method is unique in that it allows for effective detailed feature mining through an adaptive multi-dimensional weighting attention mechanism. It innovatively implements a category-aware contrastive learning method to highlight inter-class semantic feature differences, thereby enhancing the category discrimination performance of the model. Extensive experiments on the challenging pediatric clinical hand 3D US datasets show the outstanding performance of the proposed method in segmenting thirty-eight bone structures, with an average Dice coefficient of 90.0%. The results outperform other state-of-the-art methods, demonstrating its effectiveness in fine-grained hand bone segmentation. Our method will be released as a plugin for 3D Slicer, providing an innovative and reliable tool for relevant clinical applications.
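The reported metric, the Dice coefficient, is 2|P ∩ T| / (|P| + |T|) for a predicted region P and a ground-truth region T. Averaging it over structures gives a figure like the one quoted. The sketch below computes per-label and mean Dice on flattened label maps. It assumes integer labels per voxel and scores a label that is absent from both maps as 1.0, a common but not universal convention. It is not the paper's evaluation code.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int], label: int) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for one label over flattened voxels."""
    p = [x == label for x in pred]
    t = [y == label for y in target]
    inter = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    # Convention (an assumption): a label absent from both maps scores 1.0.
    return 1.0 if denom == 0 else 2 * inter / denom


def mean_dice(pred: Sequence[int], target: Sequence[int],
              labels: Sequence[int]) -> float:
    """Unweighted average of per-label Dice scores."""
    return sum(dice(pred, target, lab) for lab in labels) / len(labels)


# Four voxels, two bone labels (0 is background and is not scored).
print(mean_dice([0, 1, 1, 2], [0, 1, 2, 2], labels=[1, 2]))
```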

AAAI Conference 2024 Conference Paper

AvatarVerse: High-Quality & Stable 3D Avatar Creation from Text and Pose

  • Huichao Zhang
  • Bowen Chen
  • Hao Yang
  • Liao Qu
  • Xu Wang
  • Li Chen
  • Chao Long
  • Feida Zhu

Creating expressive, diverse and high-quality 3D avatars from highly customized text descriptions and pose guidance is a challenging task, due to the intricacy of the 3D modeling and texturing needed to ensure details and various styles (realistic, fictional, etc.). We present AvatarVerse, a stable pipeline for generating expressive high-quality 3D avatars from nothing but text descriptions and pose guidance. Specifically, we introduce a 2D diffusion model conditioned on DensePose signal to establish 3D pose control of avatars through 2D images, which enhances view consistency from partially observed scenarios. It addresses the infamous Janus Problem and significantly stabilizes the generation process. Moreover, we propose a progressive high-resolution 3D synthesis strategy, which substantially improves the quality of the created 3D avatars. As a result, the proposed AvatarVerse pipeline achieves zero-shot 3D modeling of 3D avatars that are not only more expressive, but also of higher quality and fidelity than previous works. Rigorous qualitative evaluations and user studies showcase AvatarVerse's superiority in synthesizing high-fidelity 3D avatars, setting a new standard in high-quality and stable 3D avatar creation. Our project page is: https://avatarverse3d.github.io/.

NeurIPS Conference 2024 Conference Paper

Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation

  • Qingwen Bu
  • Jia Zeng
  • Li Chen
  • Yanchao Yang
  • Guyue Zhou
  • Junchi Yan
  • Ping Luo
  • Heming Cui

Despite significant progress in robotics and embodied AI in recent years, deploying robots for long-horizon tasks remains a great challenge. The majority of prior work adheres to an open-loop philosophy and lacks real-time feedback, leading to error accumulation and poor robustness. A handful of approaches have endeavored to establish feedback mechanisms leveraging pixel-level differences or pre-trained visual representations, yet their efficacy and adaptability have been found to be constrained. Inspired by classic closed-loop control systems, we propose CLOVER, a closed-loop visuomotor control framework that incorporates feedback mechanisms to improve adaptive robotic control. CLOVER consists of a text-conditioned video diffusion model for generating visual plans as reference inputs, a measurable embedding space for accurate error quantification, and a feedback-driven controller that refines actions from feedback and initiates replans as needed. Our framework exhibits notable advancement in real-world robotic tasks and achieves state-of-the-art performance on the CALVIN benchmark, improving by 8% over previous open-loop counterparts. Code and checkpoints are maintained at https://github.com/OpenDriveLab/CLOVER.

NeurIPS Conference 2024 Conference Paper

Fairness-Aware Meta-Learning via Nash Bargaining

  • Yi Zeng
  • Xuelin Yang
  • Li Chen
  • Cristian C. Ferrer
  • Ming Jin
  • Michael I. Jordan
  • Ruoxi Jia

To address issues of group-level fairness in machine learning, it is natural to adjust model parameters based on specific fairness objectives over a sensitive-attributed validation set. Such an adjustment procedure can be cast within a meta-learning framework. However, naive integration of fairness goals via meta-learning can cause hypergradient conflicts for subgroups, resulting in unstable convergence and compromising model performance and fairness. To navigate this issue, we frame the resolution of hypergradient conflicts as a multi-player cooperative bargaining game. We introduce a two-stage meta-learning framework in which the first stage involves the use of a Nash Bargaining Solution (NBS) to resolve hypergradient conflicts and steer the model toward the Pareto front, and the second stage optimizes with respect to specific fairness goals. Our method is supported by theoretical results, notably a proof of the NBS for gradient aggregation free from linear independence assumptions, a proof of Pareto improvement, and a proof of monotonic improvement in validation loss. We also show empirical effects across various fairness objectives in six key fairness datasets and two image classification tasks.

JBHI Journal 2024 Journal Article

OVAR-BPnet: A General Pulse Wave Deep Learning Approach for Cuffless Blood Pressure Measurement

  • Yuhui Cen
  • Jingchun Luo
  • Hongbo Wang
  • Li Chen
  • Xing Zhu
  • Shijie Guo
  • Jingjing Luo

Pulse wave analysis, a non-invasive and cuff-less approach, holds promise for blood pressure (BP) measurement in precision medicine. In recent years, pulse wave learning for BP estimation has undergone extensive scrutiny. However, prevailing methods still encounter challenges in grasping comprehensive features from pulse waves and generalizing these insights for precise BP estimation. In this study, we propose a general pulse wave deep learning (PWDL) approach for BP estimation, introducing the OVAR-BPnet model to capture intricate pulse wave features and showcasing its effectiveness on multiple types of pulse waves. The approach involves constructing population pulse waves and employing a model comprising an omni-scale convolution subnet, a Vision Transformer subnet, and a multilayer perceptron subnet. This design enables the learning of both single-period and multi-period waveform features from multiple subjects. Additionally, the approach employs a data augmentation strategy to enhance the morphological features of pulse waves and a label sequence regularization strategy to strengthen the intrinsic relationship among the subnets' outputs. Notably, this is the first study to validate the performance of the deep learning approach of BP estimation on three types of pulse waves: photoplethysmography, forehead imaging photoplethysmography, and radial artery pulse pressure waveform. Experiments show that the OVAR-BPnet model achieves advanced performance on both evaluation indicators and international evaluation criteria, demonstrating its strong competitiveness and generalizability. The PWDL approach has the potential for widespread application in convenient and continuous BP monitoring systems.

NeurIPS Conference 2024 Conference Paper

Reasoning Multi-Agent Behavioral Topology for Interactive Autonomous Driving

  • Haochen Liu
  • Li Chen
  • Yu Qiao
  • Chen Lv
  • Hongyang Li

Autonomous driving systems aim for safe and socially consistent driving through the behavioral integration among interactive agents. However, challenges remain due to multi-agent scene uncertainty and heterogeneous interaction. Current dense and sparse behavioral representations struggle with inefficiency and inconsistency in multi-agent modeling, leading to instability of collective behavioral patterns when integrating prediction and planning (IPP). To address this, we initiate a topological formation that serves as a compliant behavioral foreground to guide downstream trajectory generation. Specifically, we introduce Behavioral Topology (BeTop), a pivotal topological formulation that explicitly represents the consensual behavioral pattern among multi-agent futures. BeTop is derived from braid theory to distill compliant interactive topology from multi-agent future trajectories. A synergistic learning framework (BeTopNet) supervised by BeTop facilitates the consistency of behavior prediction and planning within the predicted topology priors. Through imitative contingency learning, BeTop also effectively manages behavioral uncertainty for prediction and planning. Extensive verification on large-scale real-world datasets, including nuPlan and WOMD, demonstrates that BeTop achieves state-of-the-art performance in both prediction and planning tasks. Further validations on the proposed interactive scenario benchmark showcase planning compliance in interactive cases. Code and model are available at https://github.com/OpenDriveLab/BeTop.

NeurIPS Conference 2024 Conference Paper

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

  • Shenyuan Gao
  • Jiazhi Yang
  • Li Chen
  • Kashyap Chitta
  • Yihang Qiu
  • Andreas Geiger
  • Jun Zhang
  • Hongyang Li

World models can foresee the outcomes of different actions, which is of paramount importance for autonomous driving. Nevertheless, existing driving world models still have limitations in generalization to unseen environments, prediction fidelity of critical details, and action controllability for flexible application. In this paper, we present Vista, a generalizable driving world model with high fidelity and versatile controllability. Based on a systematic diagnosis of existing methods, we introduce several key ingredients to address these limitations. To accurately predict real-world dynamics at high resolution, we propose two novel losses to promote the learning of moving instances and structural information. We also devise an effective latent replacement approach to inject historical frames as priors for coherent long-horizon rollouts. For action controllability, we incorporate a versatile set of controls from high-level intentions (command, goal point) to low-level maneuvers (trajectory, angle, and speed) through an efficient learning strategy. After large-scale training, the capabilities of Vista can seamlessly generalize to different scenarios. Extensive experiments on multiple datasets show that Vista outperforms the most advanced general-purpose video generator in over 70% of comparisons and surpasses the best-performing driving world model by 55% in FID and 27% in FVD. Moreover, for the first time, we utilize the capacity of Vista itself to establish a generalizable reward for real-world action evaluation without accessing the ground truth actions.

IROS Conference 2023 Conference Paper

FABRIKv: A Fast, Iterative Inverse Kinematics Solver for Surgical Continuum Robot with Variable Curvature Model

  • Fuhao Wang
  • Wang Ye
  • Xiaoyang Kang 0001
  • Hongbo Wang
  • Jingjing Luo
  • Li Chen
  • Xiuhong Tang

Owing to their high flexibility, large workspace, and good compatibility with the human body, flexible tendon-driven surgical continuum robots have attracted much attention in robot-assisted minimally invasive surgery. However, because the position and angle of a continuum robot are coupled and the robot deforms easily under external forces, solving its inverse kinematics has always been a challenge. This paper proposes a fast inverse kinematics solver for surgical continuum robots with a variable curvature model. First, the deformation of the continuum robot is analyzed, and a representation method for the variable curvature model is proposed. Next, to solve the inverse kinematics problem when the continuum robot deforms under load, FABRIKv is proposed by improving Forward And Backward Reaching Inverse Kinematics (FABRIK). During the inverse kinematics solution, the algorithm preserves the real-time nature of FABRIK and corrects for deformation effects caused by the load. Finally, experiments verify the rationality and effectiveness of the variable curvature model representation method, as well as the speed and accuracy of the FABRIKv solver.
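For readers unfamiliar with the base algorithm, here is a minimal sketch of standard 2D FABRIK for a rigid-link chain: a forward pass pins the end effector to the target, and a backward pass re-pins the base, with each joint pulled along the line to its neighbor to restore link lengths. It does not include FABRIKv's variable curvature model or load-deformation correction. The function names, tolerance, and iteration cap are illustrative choices, and the code assumes no two joints coincide.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _lerp(a: Point, b: Point, t: float) -> Point:
    """Point at fraction t along the segment from a to b."""
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)


def fabrik(joints: Sequence[Point], target: Point,
           tol: float = 1e-6, max_iter: int = 100) -> List[Point]:
    """Standard FABRIK on a 2D chain; link lengths come from the initial pose."""
    pts = list(joints)
    lengths = [math.dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]
    base = pts[0]
    if math.dist(base, target) > sum(lengths):
        # Unreachable target: stretch the chain straight toward it.
        for i, d in enumerate(lengths):
            pts[i + 1] = _lerp(pts[i], target, d / math.dist(pts[i], target))
        return pts
    for _ in range(max_iter):
        if math.dist(pts[-1], target) <= tol:
            break
        # Forward reaching: pin the end effector to the target, walk to the base.
        pts[-1] = target
        for i in range(len(pts) - 2, -1, -1):
            pts[i] = _lerp(pts[i + 1], pts[i], lengths[i] / math.dist(pts[i + 1], pts[i]))
        # Backward reaching: pin the base again, walk back to the end effector.
        pts[0] = base
        for i in range(len(pts) - 1):
            pts[i + 1] = _lerp(pts[i], pts[i + 1], lengths[i] / math.dist(pts[i], pts[i + 1]))
    return pts


print(fabrik([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], (1.0, 1.0)))
```

Each pass only uses distances and linear interpolation, with no Jacobians or matrix inversions. This is what gives FABRIK the real-time character that FABRIKv is described as preserving.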

TIST Journal 2023 Journal Article

On the Relationship between Explanation and Recommendation: Learning to Rank Explanations for Improved Performance

  • Lei Li
  • Yongfeng Zhang
  • Li Chen

Explaining to users why some items are recommended is critical, as it can help users to make better decisions, increase their satisfaction, and gain their trust in recommender systems (RS). However, existing explainable RS usually consider explanation as a side output of the recommendation model, which has two problems: (1) it is difficult to evaluate the produced explanations, because they are usually model-dependent, and (2) as a result, how the explanations impact the recommendation performance is less investigated. In this article, explaining recommendations is formulated as a ranking task and learned from data, similarly to item ranking for recommendation. This enables standard evaluation of explanations via ranking metrics (e.g., Normalized Discounted Cumulative Gain). Furthermore, this article extends traditional item ranking to an item–explanation joint-ranking formalization to study whether purposely selecting explanations could reach certain learning goals, e.g., improving recommendation performance. A great challenge, however, is that the sparsity issue in the user–item–explanation data would inevitably be more severe than that in traditional user–item interaction data, since not every user–item pair can be associated with all explanations. To mitigate this issue, this article proposes to perform two sets of matrix factorization by considering the ternary relationship as two groups of binary relationships. Experiments on three large datasets verify the solution’s effectiveness on both explanation ranking and item recommendation.
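Because explanations are evaluated here with ranking metrics, a small NDCG@k sketch may help. It uses the common linear-gain form DCG@k = Σ rel_i / log2(i + 1) over positions i = 1..k, normalized by the DCG of the ideal ordering. Some evaluations use the exponential gain 2^rel − 1 instead. The article's exact variant is not stated here, so treat this as an illustrative assumption.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Linear-gain DCG: position i (1-based) is discounted by log2(i + 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the best possible ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return 0.0 if ideal == 0 else dcg_at_k(relevances, k) / ideal


# Relevance of a model's top-3 ranked explanations (1 = the user's true one).
print(ndcg_at_k([0, 1, 0], k=3))
```

Placing the single relevant explanation at rank 2 instead of rank 1 scores 1 / log2(3) ≈ 0.63, so the metric rewards putting the right explanation near the top.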

NeurIPS Conference 2023 Conference Paper

OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

  • Huijie Wang
  • Tianyu Li
  • Yang Li
  • Li Chen
  • Chonghao Sima
  • Zhenbo Liu
  • Bangjun Wang
  • Peijin Jia

Accurately depicting the complex traffic scene is vital for autonomous vehicles to make correct judgments. However, existing benchmarks tend to oversimplify the scene by solely focusing on lane perception tasks. Observing that human drivers rely on both lanes and traffic signals to operate their vehicles safely, we present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure. The objective of the presented dataset is to advance research in understanding the structure of road scenes by examining the relationship between perceived entities, such as traffic elements and lanes. Leveraging existing datasets, OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes. It comprises three primary sub-tasks, including the 3D lane detection inherited from OpenLane, accompanied by corresponding metrics to evaluate the model’s performance. We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.

NeurIPS Conference 2023 Conference Paper

REASONER: An Explainable Recommendation Dataset with Comprehensive Labeling Ground Truths

  • Xu Chen
  • Jingsen Zhang
  • Lei Wang
  • Quanyu Dai
  • Zhenhua Dong
  • Ruiming Tang
  • Rui Zhang
  • Li Chen

Explainable recommendation has attracted much attention from industry and academia. It has shown great potential to improve recommendation persuasiveness, informativeness, and user satisfaction. In the past few years, while many promising explainable recommender models have been proposed, the datasets used to evaluate them still suffer from several limitations; for example, the explanation ground truths are not labeled by real users, and the explanations are mostly single-modal and cover only one aspect. To bridge these gaps, in this paper we build a new explainable recommendation dataset, which, to our knowledge, is the first contribution that provides a large amount of real-user-labeled multi-modal and multi-aspect explanation ground truths. Specifically, we first develop a video recommendation platform, where a series of questions around recommendation explainability are carefully designed. Then, we recruit about 3,000 high-quality labelers with different backgrounds to use the system, and collect their behaviors and feedback on our questions. In this paper, we detail the construction process of our dataset and also provide extensive analysis of its characteristics. In addition, we develop a library in which ten well-known explainable recommender models are implemented in a unified framework. Based on this library, we build several benchmarks for different explainable recommendation tasks. Finally, we present many new opportunities brought by our dataset, which are expected to promote the field of explainable recommendation. Our dataset, library, and related documents have been released at https://reasoner2023.github.io/.

AAAI Conference 2023 Short Paper

Self-Paced Learning Based Graph Convolutional Neural Network for Mixed Integer Programming (Student Abstract)

  • Li Chen
  • Hua Xu
  • Ziteng Wang
  • Chengming Wang
  • Yu Jiang

Graph convolutional neural network (GCN) based methods have achieved noticeable performance in solving mixed integer programming problems (MIPs). However, the generalization of existing work is limited by the problem structure. This paper proposes a self-paced learning (SPL) based GCN (SPGCN) with curriculum learning (CL) to make the most of training samples. SPGCN employs a GCN model to imitate branching variable selection during the branch-and-bound process, while training is conducted in a self-paced fashion. Specifically, SPGCN contains a loss-based automatic difficulty measurer, where the training loss of a sample represents its difficulty level. In each iteration, a dynamic training dataset is constructed according to the difficulty level for GCN model training. Experiments on four NP-hard datasets verify that CL can improve generalization and speed up convergence in solving MIPs, and that SPL performs better than predefined CL methods.
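The loss-based difficulty measurer and dynamic training set can be pictured with the classic "hard" self-paced weighting: a sample is admitted when its current loss is below a pace threshold, and the threshold grows each round so harder samples enter later. A minimal sketch; the geometric pace schedule and its parameters are assumptions, not values from the paper:

```python
from typing import List, Sequence


def self_paced_select(losses: Sequence[float], pace: float) -> List[int]:
    """Hard self-paced weighting: keep indices of samples whose loss is below the pace threshold."""
    return [idx for idx, loss in enumerate(losses) if loss < pace]


def pace_schedule(initial: float, growth: float, rounds: int) -> List[float]:
    """Geometrically growing thresholds, so progressively harder samples are admitted."""
    return [initial * growth ** r for r in range(rounds)]


if __name__ == "__main__":
    # Hypothetical per-sample losses; in real training they would be recomputed
    # by the current GCN before every round.
    losses = [0.2, 1.5, 0.7, 3.0]
    for pace in pace_schedule(0.5, 2.0, 4):
        print(pace, self_paced_select(losses, pace))
```

The selected indices form the round's training subset; the GCN is trained on it, losses are re-measured, and the next (larger) threshold is applied.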

AAAI Conference 2023 Conference Paper

SwiftAvatar: Efficient Auto-Creation of Parameterized Stylized Character on Arbitrary Avatar Engines

  • Shizun Wang
  • Weihong Zeng
  • Xu Wang
  • Hao Yang
  • Li Chen
  • Chuang Zhang
  • Ming Wu
  • Yi Yuan

The creation of a parameterized stylized character involves careful selection of numerous parameters, also known as "avatar vectors", that can be interpreted by the avatar engine. However, existing unsupervised avatar vector estimation methods that auto-create avatars for users often fail because of the domain gap between realistic faces and stylized avatar images. To this end, we propose SwiftAvatar, a novel avatar auto-creation framework that is evidently superior to previous works. SwiftAvatar introduces dual-domain generators to create pairs of realistic faces and avatar images using shared latent codes. The latent codes can then be paired with avatar vectors by performing GAN inversion on the avatar images rendered by the engine from those avatar vectors. In this way, we can synthesize as much high-quality paired data as possible, consisting of avatar vectors and their corresponding realistic faces. We also propose semantic augmentation to improve the diversity of synthesis. Finally, a lightweight avatar vector estimator is trained on the synthetic pairs to implement efficient auto-creation. Our experiments demonstrate the effectiveness and efficiency of SwiftAvatar on two different avatar engines. The superiority and flexibility of SwiftAvatar are also verified in both subjective and objective evaluations.

NeurIPS Conference 2022 Conference Paper

Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline

  • Penghao Wu
  • Xiaosong Jia
  • Li Chen
  • Junchi Yan
  • Hongyang Li
  • Yu Qiao

Current end-to-end autonomous driving methods either run a controller based on a planned trajectory or perform control prediction directly, which constitute two separately studied lines of research. Seeing their potential mutual benefits, this paper takes the initiative to explore the combination of these two well-developed worlds. Specifically, our integrated approach has two branches, for trajectory planning and direct control respectively. The trajectory branch predicts the future trajectory, while the control branch involves a novel multi-step prediction scheme so that the relationship between current actions and future states can be reasoned about. The two branches are connected so that the control branch receives corresponding guidance from the trajectory branch at each time step. The outputs of the two branches are then fused to achieve complementary advantages. Our results are evaluated in the closed-loop urban driving setting with challenging scenarios using the CARLA simulator. Even with monocular camera input, the proposed approach ranks first on the official CARLA Leaderboard, outperforming other complex candidates with multiple sensors or fusion mechanisms by a large margin. The source code is publicly available at https://github.com/OpenPerceptionX/TCP

NeurIPS Conference 2021 Conference Paper

Finding Optimal Tangent Points for Reducing Distortions of Hard-label Attacks

  • Chen Ma
  • Xiangyu Guo
  • Li Chen
  • Jun-Hai Yong
  • Yisen Wang

One major problem in black-box adversarial attacks is the high query complexity in the hard-label attack setting, where only the top-1 predicted label is available. In this paper, we propose a novel geometry-based approach called Tangent Attack (TA), which identifies an optimal tangent point of a virtual hemisphere located on the decision boundary to reduce the distortion of the attack. Assuming the decision boundary is locally flat, we theoretically prove that the minimum $\ell_2$ distortion can be obtained by reaching the decision boundary along the tangent line passing through such a tangent point in each iteration. To improve the robustness of our method, we further propose a generalized method that replaces the hemisphere with a semi-ellipsoid to adapt to curved decision boundaries. Our approach requires no pre-training. Extensive experiments on the ImageNet and CIFAR-10 datasets demonstrate that our approach needs only a small number of queries to achieve low-magnitude distortion. The implementation source code is released online.
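The core geometric primitive here is finding where a line through an external point touches a sphere. A 2D sketch of just that step, assuming the line is drawn from the original input `p` to a circle of radius `r` centred at a boundary point; TA itself works in the full input space with a hemisphere (or semi-ellipsoid) and picks the tangent point on the adversarial side, which this simplified sketch does not model:

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def tangent_points(p: Point, center: Point, r: float) -> Tuple[Point, Point]:
    """Points where the two tangent lines through external point p touch the circle (center, r)."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    d = math.hypot(dx, dy)
    if d <= r:
        raise ValueError("p must lie strictly outside the circle")
    ux, uy = dx / d, dy / d                 # unit vector from the center toward p
    a = r * r / d                           # center-to-chord distance along that direction
    h = r * math.sqrt(d * d - r * r) / d    # half-length of the chord joining both tangent points
    bx, by = center[0] + a * ux, center[1] + a * uy  # foot of the chord on the center-p axis
    # Step +/- h along the perpendicular (-uy, ux) to reach the two tangent points
    return (bx - h * uy, by + h * ux), (bx + h * uy, by - h * ux)


if __name__ == "__main__":
    print(tangent_points((2.0, 0.0), (0.0, 0.0), 1.0))
```

At each returned point the radius is perpendicular to the line from `p`, which is the defining property of a tangent point.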

AAAI Conference 2020 Short Paper

CORAL-DMOEA: Correlation Alignment-Based Information Transfer for Dynamic Multi-Objective Optimization (Student Abstract)

  • Li Chen
  • Hua Xu

One essential characteristic of dynamic multi-objective optimization problems is that the Pareto-optimal front/set (POF/POS) varies over time. Tracking the time-dependent POF/POS is a challenging problem. Since consecutive environments are usually highly correlated, past information is critical for the next optimization process. In this paper, we integrate the CORAL (correlation alignment) methodology into a dynamic multi-objective evolutionary algorithm, named CORAL-DMOEA. This approach employs CORAL to construct a transfer model that transfers past well-performing solutions to form an initial population for the next optimization process. Experimental results demonstrate that CORAL-DMOEA can effectively improve the quality of solutions and accelerate the evolution process.
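Correlation alignment itself is compact: whiten the source samples with their own covariance, then re-colour them with the target covariance, adding an identity term to both covariances for regularization as in the original CORAL recipe (`eps=1.0` here). A sketch using NumPy; which solutions serve as source and target in CORAL-DMOEA, and how the aligned points become an initial population, are not specified in the abstract, so the roles in the comments are assumptions:

```python
import numpy as np


def _sym_matrix_power(c: np.ndarray, power: float) -> np.ndarray:
    """Power of a symmetric positive-definite matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(c)
    return (eigvecs * eigvals ** power) @ eigvecs.T


def coral(source: np.ndarray, target: np.ndarray, eps: float = 1.0) -> np.ndarray:
    """Transform source rows so their covariance matches the target covariance.

    source: e.g. past well-performing solutions (rows = solutions), an assumed role.
    target: e.g. samples from the new environment, an assumed role.
    """
    dim = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(dim)
    # Whiten with the source covariance, then re-colour with the target covariance
    return source @ _sym_matrix_power(cov_s, -0.5) @ _sym_matrix_power(cov_t, 0.5)
```

With `eps=0` and full-rank data, the transformed source has exactly the target's sample covariance; a positive `eps` trades exactness for numerical stability.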

NeurIPS Conference 2018 Conference Paper

Post: Device Placement with Cross-Entropy Minimization and Proximal Policy Optimization

  • Yuanxiang Gao
  • Li Chen
  • Baochun Li

Training deep neural networks requires an exorbitant amount of computation resources, including a heterogeneous mix of GPU and CPU devices. It is critical to place the operations of a neural network on these devices optimally, so that training can complete in the shortest amount of time. The state of the art uses reinforcement learning to learn placement skills by repeatedly performing Monte Carlo experiments. However, because it treats all placement samples equally, we argue that there remains ample room for significant improvement. In this paper, we propose a new joint learning algorithm, called Post, that integrates cross-entropy minimization and proximal policy optimization to achieve theoretically guaranteed optimal efficiency. To incorporate the cross-entropy method as a sampling technique, we propose to represent placements using discrete probability distributions, which allows us to estimate an optimal probability mass by maximum likelihood estimation, a powerful tool with the best possible efficiency. We have implemented Post on the Google Cloud platform, and our extensive experiments with several popular neural network training benchmarks demonstrate clear evidence of superior performance: with the same amount of learning time, it leads to placements with training times up to 63.7% shorter than the state of the art.
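The cross-entropy half of this idea can be sketched as the classic cross-entropy method over one categorical distribution per operation: sample placements, keep the cheapest ("elite") fraction, and refit each distribution by maximum likelihood, which for a categorical is simply the device frequency among the elites. In this sketch the toy `cost_fn` (mismatches against a hypothetical target placement) stands in for measured training time, the smoothing factor is a common CEM choice rather than a value from the paper, and the PPO component is not shown:

```python
import numpy as np


def cross_entropy_placement(cost_fn, n_ops, n_devices, n_samples=100,
                            elite_frac=0.1, iters=30, smoothing=0.7, seed=0):
    """Cross-entropy method over independent per-operation device distributions."""
    rng = np.random.default_rng(seed)
    # probs[op, d] = probability of placing operation op on device d (start uniform)
    probs = np.full((n_ops, n_devices), 1.0 / n_devices)
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(iters):
        samples = np.array([[rng.choice(n_devices, p=probs[op]) for op in range(n_ops)]
                            for _ in range(n_samples)])
        costs = np.array([cost_fn(s) for s in samples])
        elites = samples[np.argsort(costs)[:n_elite]]  # lowest-cost placements
        # Maximum likelihood estimate of each categorical = device frequencies among elites
        mle = np.stack([np.bincount(elites[:, op], minlength=n_devices) / n_elite
                        for op in range(n_ops)])
        # Smoothed update keeps some probability on devices the elites did not use
        probs = smoothing * mle + (1.0 - smoothing) * probs
    return probs.argmax(axis=1), probs
```

For example, `cross_entropy_placement(lambda s: int(np.sum(s != target)), n_ops=5, n_devices=3)` with `target = np.array([0, 1, 1, 0, 2])` concentrates each operation's distribution on its target device.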

IS Journal 2014 Journal Article

An Adaptive Fusion Algorithm for Spam Detection

  • Congfu Xu
  • Baojun Su
  • Yunbiao Cheng
  • Weike Pan
  • Li Chen

Spam detection has become a critical component of various online systems such as email services, advertising engines, and social media sites. Here, the authors use email services as an example and present an adaptive fusion algorithm for spam detection (AFSD), which is a general, content-based approach that can be applied to non-email spam detection tasks with little additional effort. The proposed algorithm uses n-grams of nontokenized text strings to represent an email, introduces a link function to make the prediction scores of online learners more comparable, trains the online learners in a mistake-driven manner via thick thresholding to obtain highly competitive online learners, and designs update rules to adaptively integrate the online learners to capture different aspects of spam. The prediction performance of AFSD is studied on five public competition datasets and one industry dataset, with the algorithm achieving significantly better results than several state-of-the-art approaches, including the champion solutions of the corresponding competitions.
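The first three ingredients can be sketched for a single online learner: overlapping character n-grams of the raw string as features, a length-normalized linear score, a link function mapping scores into (0, 1), and a mistake-driven update that also fires on correct predictions whose margin is below a "thick" threshold. The n-gram length, threshold, learning rate, and logistic link are assumptions for illustration; AFSD's actual link function and its adaptive fusion of several learners are not reproduced here:

```python
import math
from typing import Dict, Set, Tuple


def char_ngrams(text: str, n: int = 4) -> Set[str]:
    """Overlapping character n-grams of the raw, untokenized string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class ThickThresholdPerceptron:
    """Online linear learner updated whenever the margin falls below a thick threshold."""

    def __init__(self, n: int = 4, threshold: float = 0.5, lr: float = 0.1) -> None:
        self.n, self.threshold, self.lr = n, threshold, lr
        self.weights: Dict[str, float] = {}

    def _features(self, text: str) -> Tuple[Set[str], float]:
        feats = char_ngrams(text, self.n)
        return feats, 1.0 / math.sqrt(max(len(feats), 1))  # length-normalization factor

    def score(self, text: str) -> float:
        feats, norm = self._features(text)
        return norm * sum(self.weights.get(f, 0.0) for f in feats)

    def spam_probability(self, text: str) -> float:
        # Hypothetical link function: logistic squashing of the raw score
        return 1.0 / (1.0 + math.exp(-self.score(text)))

    def update(self, text: str, label: int) -> None:
        """label is +1 (spam) or -1 (ham); update on errors and on low-margin correct predictions."""
        feats, norm = self._features(text)
        if label * self.score(text) < self.threshold:
            for f in feats:
                self.weights[f] = self.weights.get(f, 0.0) + self.lr * label * norm
```

Updating below a positive margin, rather than only on outright mistakes, is what distinguishes thick thresholding from a plain perceptron and tends to yield more robust weights.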

ICRA Conference 2014 Conference Paper

Dynamic modeling and control of a free-flying space robot with flexible-link and flexible-joints

  • Xiaoyan Yu
  • Li Chen

The nonlinearity and strong coupling of a free-flying space robot make the dynamics and control of such a system more complicated than those of a terrestrial robot system. Space robots are always built to be very light to save launch energy, so the flexibility of the joints and links arising from their elasticity is considerable. Controlling such a manipulator is more complex than controlling one with rigid joints because of the interaction between rigid and flexible motion: only a single actuation signal can be applied at each joint, and it has to control the flexure of both the joint itself and the link attached to it. To study an under-actuated flexible-link flexible-joint space manipulator, this paper presents a free-flying space manipulator with one flexible link and two flexible revolute joints. The dynamic Lagrange equations are established, and a singularly perturbed model is formulated and used to design a reduced-order controller. This controller consists of a rigid control component and two fast control components. Numerical simulations show that the link and joint vibrations are stabilized effectively with good tracking performance.

IJCAI Conference 2013 Conference Paper

GBPR: Group Preference Based Bayesian Personalized Ranking for One-Class Collaborative Filtering

  • Weike Pan
  • Li Chen

One-class collaborative filtering, or collaborative ranking with implicit feedback, has been steadily receiving more attention, mostly due to the “one-class” characteristics of data in various services, e.g., “like” in Facebook and “bought” in Amazon. Previous works on this problem include pointwise regression methods based on absolute rating assumptions and pairwise ranking methods with relative score assumptions, where the latter were empirically found to perform much better because they model users’ ranking-related preferences more directly. However, the two fundamental assumptions made in pairwise ranking methods, (1) individual pairwise preference over two items and (2) independence between two users, may not always hold. In response, we propose a new and improved assumption, group Bayesian personalized ranking (GBPR), by introducing richer interactions among users. In particular, we introduce group preference to relax the aforementioned individual and independence assumptions. We then design a corresponding novel algorithm, which recommends items more accurately as shown by various ranking-oriented evaluation metrics on four real-world datasets in our experiments.
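To make group preference concrete, here is a minimal matrix-factorization sketch of the loss for one training tuple, assuming the formulation commonly associated with GBPR: the group preference for item `i` is the average predicted score of the users in group `group` (assumed to have also interacted with `i`), it is blended with user `u`'s own score through a weight `rho`, and the blend should exceed `u`'s score for an unobserved item `j`. Bias terms, regularization, group sampling, and gradient updates are omitted, and all names are hypothetical:

```python
from typing import Sequence

import numpy as np


def gbpr_loss(U: np.ndarray, V: np.ndarray, u: int, i: int, j: int,
              group: Sequence[int], rho: float = 0.5) -> float:
    """Negative log-likelihood -ln sigmoid(r_Gui - r_uj) for one (u, i, j, G) tuple.

    U: user factor matrix (n_users x k); V: item factor matrix (n_items x k).
    """
    r_ui = U[u] @ V[i]                                   # individual preference for observed item i
    r_uj = U[u] @ V[j]                                   # preference for unobserved item j
    r_Gi = float(np.mean([U[w] @ V[i] for w in group]))  # group preference for item i
    r_Gui = rho * r_Gi + (1.0 - rho) * r_ui              # blended group/individual preference
    # -ln sigmoid(x) == ln(1 + exp(-x)), computed stably with logaddexp
    return float(np.logaddexp(0.0, -(r_Gui - r_uj)))
```

Setting `group=[u]` (or `rho=0`) reduces this to the standard pairwise BPR loss for `(u, i, j)`, which is exactly the individual-preference assumption GBPR relaxes.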

TIST Journal 2013 Journal Article

Generating virtual ratings from Chinese reviews to augment online recommendations

  • Weishi Zhang
  • Guiguang Ding
  • Li Chen
  • Chunping Li
  • Chengbo Zhang

Collaborative filtering (CF) recommenders based on a user–item rating matrix explicitly obtained from end users have recently appeared promising in recommender systems. However, the user–item rating matrix is not always available, or is very sparse, in some web applications, which critically affects the application of CF recommenders. In this article, we aim to enhance online recommender systems by fusing virtual ratings derived from user reviews. Specifically, taking into account the characteristics of Chinese reviews, we propose to fuse the results of self-supervised, emotion-integrated sentiment classification into CF recommenders, by which the user–item rating matrix can be inferred by decomposing the reviews that users gave to items. The main advantage of this approach is that it can extend CF recommenders to web applications without user rating information. In the experiments, we first demonstrated the higher precision and recall of the self-supervised sentiment classification by comparing it with traditional classification methods. Furthermore, the classification results, acting as virtual ratings, were incorporated into both user-based and item-based CF algorithms. We also conducted an experiment to evaluate the proximity between the virtual and real ratings, which clarified the effectiveness of the virtual ratings. The experimental results demonstrated the significant impact of virtual ratings on increasing the system's recommendation accuracy under different data conditions (i.e., with and without real ratings).

JMLR Journal 2013 Journal Article

The CAM Software for Nonnegative Blind Source Separation in R-Java

  • Niya Wang
  • Fan Meng
  • Li Chen
  • Subha Madhavan
  • Robert Clarke
  • Eric P. Hoffman
  • Jianhua Xuan
  • Yue Wang

We describe an R-Java CAM (convex analysis of mixtures) package that provides comprehensive analytic functions and a graphical user interface (GUI) for blindly separating mixed nonnegative sources. This open-source, multiplatform software implements recent and classic algorithms from the literature, including Chan et al. (2008), Wang et al. (2010), Chen et al. (2011a), and Chen et al. (2011b). The CAM package offers several attractive features: (1) instead of using proprietary MATLAB, its analytic functions are written in R, which makes the code more portable and easier to modify; (2) besides producing and plotting results in R, it also provides a Java GUI for automatic progress updates and convenient visual monitoring; (3) multi-thread interactions between the R and Java modules are driven and integrated by a Java GUI, ensuring that the whole CAM software runs responsively; (4) the package offers a simple mechanism that allows others to plug in additional R functions.

IROS Conference 2009 Conference Paper

Robust adaptive composite control of space-based robot system with uncertain parameters and external disturbances

  • Zhiyong Chen
  • Li Chen

In this paper, the control problem of a space robot system with uncertain parameters and external disturbances is discussed. Using the momentum conservation of the system, the kinematics and dynamics of the system are analyzed, and it is found that the generalized Jacobian matrix and the dynamic equations of the system depend nonlinearly on the inertial parameters. To overcome these problems, the augmentation approach is introduced. It is shown that the augmented generalized Jacobian matrix and the dynamic equations of the system can depend linearly on a group of inertial parameters with augmented inputs and outputs. Based on these results, a robust adaptive composite control scheme is developed for the space-based robot to track desired trajectories in inertial space. The stability of the overall system is analyzed through Lyapunov's direct method, and the global uniform asymptotic stability of the system is established for the proposed approach. In addition, because it effectively exploits a particular property of the system dynamics, the presented controller needs no measurement of the position, linear velocity, or acceleration of the base with respect to the orbit. To show the feasibility of the control scheme, a planar space robot system is simulated.

IROS Conference 2006 Conference Paper

Adaptive Control of Dual-Arm Space Robot System in Joint Space

  • Li Chen
  • Yishen Guo

In this paper, the coordinated control of the base attitude and the arm joints of a dual-arm space robot system is studied. First, to overcome the difficulty that the dynamic equations of the dual-arm space robot system cannot be linearly parameterized, the system is modeled as an under-actuated robot system. Then, using the augmentation approach, we demonstrate that the dynamic equations of the system can depend linearly on the inertial parameters. Based on these results, an adaptive control scheme for the dual-arm space robot system is developed to coordinate motion between the base attitude and the arm joints. The asymptotic stability of the control scheme is proved with the Lyapunov method. A planar dual-arm space robot system with two objects is simulated to verify the proposed control scheme.

AAAI Conference 2006 Conference Paper

Evaluating Critiquing-based Recommender Agents

  • Li Chen

We describe a user study evaluating two critiquing-based recommender agents on three criteria: decision accuracy, decision effort, and user confidence. Results show that user-motivated critiques were applied more frequently, and that the example critiquing system employing only this type of critique achieved the best results. In particular, the example critiquing agent significantly improves users’ decision accuracy while requiring less cognitive effort than the dynamic critiquing recommender with system-proposed critiques. Additionally, the former is more likely to inspire users’ confidence in their choices and to promote their intention to purchase and to return to the agent for future use.

IROS Conference 2006 Conference Paper

Robust Control of Dual-Arm Space Robot System with Two Objects in Joint Space

  • Yishen Guo
  • Li Chen

In this paper, the robust control problem for a free-floating dual-arm space robot system with two objects is discussed. First, the kinematics and dynamics of the free-floating dual-arm space robot system with two objects are analysed. To overcome the difficulty that the dynamic equations of the system depend nonlinearly on the inertial parameters, the system is modeled as an under-actuated robot system, and the augmentation approach is adopted. It is demonstrated that the dynamic equations of the system can depend linearly on a group of inertial parameters with augmented inputs and outputs. Then, based on these results, a robust control scheme is proposed for the dual-arm space robot system with uncertain inertial parameters to track desired trajectories in joint space. The proposed control scheme is computationally simple, because the controller is guaranteed to be robust to the uncertain inertial parameters rather than estimating them explicitly online. In addition, it does not require controlling the position and attitude of the floating base. Finally, a planar free-floating dual-arm space robot system with two objects is simulated to verify the proposed control scheme.

ICRA Conference 2004 Conference Paper

Studies on Lateral Rolling Locomotion of a Snake Robot

  • Li Chen
  • Yuechao Wang
  • Shugen Ma
  • Bin Li 0001

A reconfigurable modular snake robot has been developed that can not only move on a plane but also achieve some 3-dimensional motions when reconfigured. Control equations for 3-dimensional locomotion were established by composing two bending motions in mutually orthogonal planes. Three types of lateral rolling locomotion (flapping, linear rolling, and curved rolling) were achieved by controlling the amplitudes and the numbers of waves of the two bending motions. Using these three types of locomotion, the snake robot can realize net lateral translation, alternation of its contact base, and rolling over some obstacles. The lateral rolling locomotion obtains its driving force through interaction with the environment. The rolling shape and its direction depend on the transferring direction and the phase difference of the two waves, respectively.
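The composition of two bending waves in orthogonal planes can be sketched as a joint-angle generator. This is a hypothetical parameterization, not the paper's control equations: each module is assumed to have a pitch and a yaw joint, each plane carries a traveling sine wave with its own amplitude and number of waves, and the yaw wave is offset by a phase. With zero waves and a 90° phase difference, both planes bend uniformly and the combined bending plane rotates about the body axis, a common way to produce rolling:

```python
import math
from typing import List, Tuple


def rolling_joint_angles(n_modules: int, t: float, amp_pitch: float, amp_yaw: float,
                         waves_pitch: float, waves_yaw: float, omega: float,
                         phase: float) -> List[Tuple[float, float]]:
    """Hypothetical (pitch, yaw) joint angles in radians for each module at time t."""
    angles = []
    for m in range(n_modules):
        s = m / n_modules  # normalized position of module m along the body
        pitch = amp_pitch * math.sin(2 * math.pi * waves_pitch * s - omega * t)
        yaw = amp_yaw * math.sin(2 * math.pi * waves_yaw * s - omega * t + phase)
        angles.append((pitch, yaw))
    return angles


if __name__ == "__main__":
    # Uniform bending in both planes, 90 degrees apart: the bending plane rotates over time
    for t in (0.0, 0.5, 1.0):
        print(t, rolling_joint_angles(4, t, 0.3, 0.3, 0.0, 0.0, 1.0, math.pi / 2)[0])
```

Changing the amplitudes or the numbers of waves reshapes the body, which is how the abstract describes switching among flapping, linear rolling, and curved rolling; the exact mapping from parameters to each gait is not given in the abstract.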