Arrow Research search

Author name cluster

Lei Han

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

29 papers
2 author rows

Possible papers

29

NeurIPS Conference 2025 Conference Paper

Learning 3D Persistent Embodied World Models

  • Siyuan Zhou
  • Yilun Du
  • Yuncong Yang
  • Lei Han
  • Peihao Chen
  • Dit-Yan Yeung
  • Chuang Gan

The ability to simulate the effects of future actions on the world is crucial for intelligent embodied agents, enabling them to anticipate the consequences of their actions and plan accordingly. While a large body of existing work has explored how to construct such world models using video models, these models are often myopic, with no memory of parts of a scene not captured by the currently observed images, preventing agents from making consistent long-horizon plans in complex environments where much of the scene is only partially observed. We introduce a new persistent embodied world model with an explicit memory of previously generated content, enabling much more consistent long-horizon simulation. At generation time, our video diffusion model predicts RGB-D video of the agent's future observations. This generation is then aggregated into a persistent 3D map of the environment. By conditioning the video model on this 3D spatial map, we show how video world models can faithfully simulate both seen and unseen parts of the world. Finally, we illustrate the efficacy of such a world model in downstream embodied applications, enabling effective planning and policy learning.

NeurIPS Conference 2025 Conference Paper

Multi-Agent Collaboration via Evolving Orchestration

  • Yufan Dang
  • Chen Qian
  • Xueheng Luo
  • Jingru Fan
  • Zihao Xie
  • Ruijie Shi
  • Weize Chen
  • Cheng Yang

Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. While recent research explores multi-agent collaboration among LLMs, most approaches rely on static organizational structures that struggle to adapt as task complexity and agent numbers grow, resulting in coordination overhead and inefficiencies. To this end, we propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a centralized orchestrator ("puppeteer") dynamically directs agents ("puppets") in response to evolving task states. This orchestrator is trained via reinforcement learning to adaptively sequence and prioritize agents, enabling flexible and evolvable collective reasoning. Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs. Analyses further reveal that the key improvements consistently stem from the emergence of more compact, cyclic reasoning structures under the orchestrator's evolution. Our code is available at https://github.com/OpenBMB/ChatDev/tree/puppeteer.

NeurIPS Conference 2025 Conference Paper

STAR: Efficient Preference-based Reinforcement Learning via Dual Regularization

  • Fengshuo Bai
  • Rui Zhao
  • Hongming Zhang
  • Sijia Cui
  • Shao Zhang
  • Bo Xu
  • Lei Han
  • Ying Wen

Preference-based reinforcement learning (PbRL) bypasses complex reward engineering by learning from human feedback. However, due to the high cost of obtaining feedback, PbRL typically relies on a limited set of preference-labeled samples. This data scarcity introduces two key inefficiencies: (1) the reward model overfits to the limited feedback, leading to poor generalization to unseen samples, and (2) the agent exploits the learned reward model, exacerbating overestimation of action values in temporal difference (TD) learning. To address these issues, we propose STAR, an efficient PbRL method that integrates preference margin regularization and policy regularization. Preference margin regularization mitigates overfitting by introducing a bounded margin in reward optimization, preventing excessive bias toward specific feedback. Policy regularization bootstraps a conservative estimate $\widehat{Q}$ from well-supported state-action pairs in the replay memory, reducing overestimation during policy learning. Experimental results show that STAR improves feedback efficiency, achieving 34.8% higher performance in online settings and 29.7% in offline settings compared to state-of-the-art methods. Ablation studies confirm that STAR facilitates more robust reward and value function learning. The videos of this project are released at https://sites.google.com/view/pbrl-star.
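As a rough illustration of the margin idea in preference-based reward learning (not STAR's exact regularizer, whose form is in the paper), a Bradley-Terry preference loss whose reward gap is clipped to a bounded margin cannot be driven to zero by inflating the gap, which limits how confident the reward model can grow on scarce feedback. All names and the clipping form below are illustrative assumptions:

```python
import numpy as np

def preference_loss(r_pref, r_other, margin=None):
    """Bradley-Terry preference loss, -log sigmoid(r_pref - r_other).

    With `margin` set, the reward gap is clipped to [-margin, margin], so
    the loss is bounded away from zero. This is a bounded-margin sketch,
    not STAR's exact preference margin regularization.
    """
    gap = np.asarray(r_pref, dtype=float) - np.asarray(r_other, dtype=float)
    if margin is not None:
        gap = np.clip(gap, -margin, margin)
    # log1p(exp(-gap)) is a numerically stable -log sigmoid(gap)
    return np.log1p(np.exp(-gap))

# A huge reward gap gives near-zero loss when unbounded,
# but the margin keeps a floor on the loss:
loose = preference_loss(10.0, 0.0)                 # ~4.5e-5
bounded = preference_loss(10.0, 0.0, margin=2.0)   # -log sigmoid(2) ~ 0.127
```

The clipped variant stops gradient pressure toward ever-larger reward gaps once the gap exceeds the margin, which is one simple way to keep a reward model from over-committing to a handful of labeled comparisons.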

AAMAS Conference 2024 Conference Paper

Automatic Curriculum for Unsupervised Reinforcement Learning

  • Yucheng Yang
  • Tianyi Zhou
  • Lei Han
  • Meng Fang
  • Mykola Pechenizkiy

Unsupervised reinforcement learning (URL) relies on carefully designed training objectives rather than task rewards to learn general skills. However, quantitative evaluation metrics for URL are lacking, and comparisons mainly rely on visualizing trajectories. Moreover, most URL methods optimize a single training objective, which may hinder later-stage learning and the development of new skills. To bridge these gaps, we first introduce a combination of metrics that evaluate diverse properties of URL. We show, with empirical evidence and theoretical insights, that balancing these metrics in URL leads to better performance and trajectories. Next, we develop an automatic curriculum that uses a nonstationary multi-armed bandit algorithm to select URL objectives for different learning episodes, resulting in balanced improvement on all the metrics. Extensive experiments in different environments demonstrate the advantages of our method in achieving promising and balanced performance on multiple metrics compared to recent URL methods.

NeurIPS Conference 2024 Conference Paper

HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid

  • Xinyu Xu
  • Yizheng Zhang
  • Yong-Lu Li
  • Lei Han
  • Cewu Lu

Physical Human-Scene Interaction (HSI) plays a crucial role in numerous applications. However, existing HSI techniques are limited to specific object dynamics and privileged information, which prevents the development of more comprehensive applications. To address this limitation, we introduce HumanVLA for general object rearrangement directed by practical vision and language. A teacher-student framework is used to develop HumanVLA: a state-based teacher policy is first trained using goal-conditioned reinforcement learning and an adversarial motion prior, and is then distilled into a vision-language-action model via behavior cloning. We propose several key insights to facilitate the large-scale learning process. To support general object rearrangement by a physical humanoid, we introduce a novel Human-in-the-Room dataset encompassing various rearrangement tasks. Through extensive experiments and analysis, we demonstrate the effectiveness of our approach.

AAAI Conference 2024 Conference Paper

Relative Policy-Transition Optimization for Fast Policy Transfer

  • Jiawei Xu
  • Cheng Zhou
  • Yizheng Zhang
  • Baoxiang Wang
  • Lei Han

We consider the problem of policy transfer between two Markov Decision Processes (MDPs). We introduce a lemma based on existing theoretical results in reinforcement learning to measure the relativity gap between two arbitrary MDPs, that is, the difference between any two cumulative expected returns defined on different policies and environment dynamics. Based on this lemma, we propose two new algorithms, referred to as Relative Policy Optimization (RPO) and Relative Transition Optimization (RTO), which offer fast policy transfer and dynamics modelling, respectively. RPO transfers the policy evaluated in one environment to maximize the return in another, while RTO updates the parameterized dynamics model to reduce the gap between the dynamics of the two environments. Integrating the two algorithms yields the complete Relative Policy-Transition Optimization (RPTO) algorithm, in which the policy interacts with the two environments simultaneously, so that data collection from both environments and the policy and transition updates are completed in one closed loop, forming a principled learning framework for policy transfer. We demonstrate the effectiveness of RPTO on a set of MuJoCo continuous control tasks by creating policy transfer problems via variant dynamics.

NeurIPS Conference 2024 Conference Paper

Self-playing Adversarial Language Game Enhances LLM Reasoning

  • Pengyu Cheng
  • Yong Dai
  • Tianhao Hu
  • Han Xu
  • Zhisong Zhang
  • Lei Han
  • Nan Du
  • Xiaolong Li

We explore the potential of self-play training for large language models (LLMs) in a two-player adversarial language game called Adversarial Taboo. In this game, an attacker and a defender communicate around a target word only visible to the attacker. The attacker aims to induce the defender to speak the target word unconsciously, while the defender tries to infer the target word from the attacker's utterances. To win the game, both players must have sufficient knowledge about the target word and high-level reasoning ability to infer and express in this information-reserved conversation. Hence, we are curious about whether LLMs' reasoning ability can be further enhanced by Self-Playing this Adversarial language Game (SPAG). With this goal, we select several open-source LLMs and let each act as the attacker and play with a copy of itself as the defender on an extensive range of target words. Through reinforcement learning on the game outcomes, we observe that the LLMs' performances uniformly improve on a broad range of reasoning benchmarks. Furthermore, iteratively adopting this self-play process can continuously promote LLMs' reasoning abilities. The code is available at https://github.com/Linear95/SPAG.

NeurIPS Conference 2024 Conference Paper

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

  • Ye Tian
  • Baolin Peng
  • Linfeng Song
  • Lifeng Jin
  • Dian Yu
  • Lei Han
  • Haitao Mi
  • Dong Yu

Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involve complex reasoning and planning. Self-correction and self-learning emerge as viable solutions, employing strategies that allow LLMs to refine their outputs and learn from self-assessed rewards. Yet, the efficacy of LLMs in self-refining their responses, particularly on complex reasoning and planning tasks, remains dubious. In this paper, we introduce AlphaLLM for the self-improvement of LLMs, which integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop, thereby enhancing the capabilities of LLMs without additional annotations. Drawing inspiration from the success of AlphaGo, AlphaLLM addresses the unique challenges of combining MCTS with LLMs for self-improvement, including data scarcity, the vast search spaces of language tasks, and the subjective nature of feedback in language tasks. AlphaLLM comprises a prompt synthesis component, an efficient MCTS approach tailored for language tasks, and a trio of critic models for precise feedback. Our experimental results on mathematical reasoning tasks demonstrate that AlphaLLM significantly enhances the performance of LLMs without additional annotations, showing the potential for self-improvement in LLMs.

JBHI Journal 2023 Journal Article

CCS-Net: Cascade Detection Network With the Convolution Kernel Switch Block and Statistics Optimal Anchors Block in Hypopharyngeal Cancer MRI

  • Shuo Zhang
  • Yang Miao
  • Jun Chen
  • Xiwei Zhang
  • Lei Han
  • Zehao Huang
  • Ning Pei
  • Haibin Liu

Magnetic resonance imaging (MRI) is a common diagnostic method for hypopharyngeal cancer (HPC). Automatically detecting HPC tumors and swollen lymph nodes (HPC risk areas) from MRI slices is challenging because of the small size and irregular shape of HPC risk areas. Herein, we propose a cascade detection network with a Convolution Kernel Switch (CKS) Block and a Statistics Optimal Anchors (SOA) Block for HPC MRI (CCS-Net). The CKS Block adaptively switches standard convolution to deformable convolution in appropriate layers to detect irregular objects more efficiently without consuming excessive computing resources. The SOA Block automatically generates optimal anchors based on the size distribution of objects. Our method outperforms other methods on the HPC dataset (more than 1800 T2 MRI slices), achieving the highest AP50 of 78.90%. Experiments show that the proposed network can serve as the basis of a computer-aided diagnosis utility that helps achieve faster and more accurate diagnostic decisions for HPC.

AAMAS Conference 2023 Conference Paper

CraftEnv: A Flexible Collective Robotic Construction Environment for Multi-Agent Reinforcement Learning

  • Rui Zhao
  • Xu Liu
  • Yizheng Zhang
  • Minghao Li
  • Cheng Zhou
  • Shuai Li
  • Lei Han

CraftEnv is a flexible Collective Robotic Construction (CRC) environment for Multi-Agent Reinforcement Learning (MARL) research. CraftEnv can be used to study how artificially intelligent agents may learn to cooperate and solve complex real-world tasks, such as collective construction and intelligent warehousing. The environment contains a set of collective construction tasks, which require a group of robotic vehicles to cooperate and learn to build different constructions efficiently. CraftEnv features elements such as smartcars, blocks, and slopes; the smartcars can use the blocks and slopes to build different structures. The environment is highly flexible and simple to use, enabling creative and quick task design. It is written in Python and can be rendered using PyBullet. The simulation is modeled on real-world robotic systems and designed with real-world constraints in mind, so learned policies can be transferred to real robotic systems. CraftEnv is tailored for effective use by the research community, pushing forward collective intelligence and swarm technology.

JAIR Journal 2023 Journal Article

Efficient Multi-Goal Reinforcement Learning via Value Consistency Prioritization

  • Jiawei Xu
  • Shuxing Li
  • Rui Yang
  • Chun Yuan
  • Lei Han

Goal-conditioned reinforcement learning (RL) with sparse rewards remains a challenging problem in deep RL. Hindsight Experience Replay (HER) has been demonstrated to be an effective solution, where HER replaces desired goals in failed experiences with practically achieved states. Existing approaches mainly focus on either exploration or exploitation to improve the performance of HER. From a joint perspective, exploiting specific past experiences can also implicitly drive exploration. Therefore, we concentrate on prioritizing both original and relabeled samples for efficient goal-conditioned RL. To achieve this, we propose a novel value consistency prioritization (VCP) method, where the priority of samples is determined by the consistency of ensemble Q-values. This distinguishes VCP from most existing prioritization approaches, which prioritize samples based on the uncertainty of ensemble Q-values. Through extensive experiments, we demonstrate that VCP achieves significantly higher sample efficiency than existing algorithms on a range of challenging goal-conditioned manipulation tasks. We also visualize how VCP prioritizes good experiences to enhance policy learning.
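The prioritization idea can be sketched in a few lines: samples where the ensemble's Q estimates agree (low spread) receive high priority, which is the opposite of uncertainty-based prioritization. Using the ensemble standard deviation as the consistency measure and a softmax with a temperature are illustrative assumptions here, not necessarily VCP's exact scoring:

```python
import numpy as np

def vcp_priorities(q_ensemble, temperature=1.0):
    """Consistency-based sampling priorities from an ensemble of Q estimates.

    q_ensemble: array of shape (n_critics, batch_size). Samples whose
    critics agree (low standard deviation across the ensemble) get high
    priority; a softmax over the negated spread yields probabilities.
    """
    std = np.std(q_ensemble, axis=0)          # per-sample ensemble spread
    logits = -std / temperature               # low spread -> high logit
    p = np.exp(logits - logits.max())         # stable softmax
    return p / p.sum()

q = np.array([[1.0,  0.0, 2.0],
              [1.1,  3.0, 2.0],
              [0.9, -2.0, 2.0]])              # 3 critics, 3 samples
p = vcp_priorities(q)                         # sample 2 (perfect agreement) ranks highest
```

An uncertainty-based scheme would simply drop the minus sign on the spread; the contrast between the two is exactly the distinction the abstract draws.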

NeurIPS Conference 2023 Conference Paper

MeGraph: Capturing Long-Range Interactions by Alternating Local and Hierarchical Aggregation on Multi-Scaled Graph Hierarchy

  • Honghua Dong
  • Jiawei Xu
  • Yu Yang
  • Rui Zhao
  • Shiwen Wu
  • Chun Yuan
  • Xiu Li
  • Chris J. Maddison

Graph neural networks, which typically exchange information between local neighbors, often struggle to capture long-range interactions (LRIs) within the graph. Building a graph hierarchy via graph pooling methods is a promising approach to address this challenge; however, hierarchical information propagation cannot entirely take over the role of local information aggregation. To balance locality and hierarchy, we integrate the local and hierarchical structures, represented by intra- and inter-graphs respectively, of a multi-scale graph hierarchy into a single mega graph. Our proposed MeGraph model consists of multiple layers alternating between local and hierarchical information aggregation on the mega graph. Each layer first performs local-aware message-passing on graphs of varied scales via the intra-graph edges, then fuses information across the entire hierarchy along the bidirectional pathways formed by inter-graph edges. By repeating this fusion process, local and hierarchical information could intertwine and complement each other. To evaluate our model, we establish a new Graph Theory Benchmark designed to assess LRI capture ability, in which MeGraph demonstrates dominant performance. Furthermore, MeGraph exhibits superior or equivalent performance to state-of-the-art models on the Long Range Graph Benchmark. The experimental results on commonly adopted real-world datasets further demonstrate the broad applicability of MeGraph.

NeurIPS Conference 2022 Conference Paper

Exploit Reward Shifting in Value-Based Deep-RL: Optimistic Curiosity-Based Exploration and Conservative Exploitation via Linear Reward Shaping

  • Hao Sun
  • Lei Han
  • Rui Yang
  • Xiaoteng Ma
  • Jian Guo
  • Bolei Zhou

In this work, we study the simple yet universally applicable case of reward shaping in value-based Deep Reinforcement Learning (DRL). We show that reward shifting in the form of a linear transformation is equivalent to changing the initialization of the $Q$-function in function approximation. Based on such an equivalence, we bring the key insight that a positive reward shifting leads to conservative exploitation, while a negative reward shifting leads to curiosity-driven exploration. Accordingly, conservative exploitation improves offline RL value estimation, and optimistic value estimation improves exploration for online RL. We validate our insight on a range of RL tasks and show its improvement over baselines: (1) In offline RL, the conservative exploitation leads to improved performance based on off-the-shelf algorithms; (2) In online continuous control, multiple value functions with different shifting constants can be used to tackle the exploration-exploitation dilemma for better sample efficiency; (3) In discrete control tasks, a negative reward shifting yields an improvement over the curiosity-based exploration method.
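The stated equivalence is easy to verify in the tabular setting: Q-learning on rewards shifted by a constant c, initialized at zero, tracks Q-learning on the original rewards initialized at -c/(1-gamma), with the two Q-tables differing everywhere by exactly c/(1-gamma). A minimal sanity check on a toy random transition stream (all names and hyperparameters are illustrative, not from the paper):

```python
import numpy as np

def q_learning(transitions, n_states, n_actions, gamma=0.9, alpha=0.5,
               shift=0.0, q_init=0.0):
    """Tabular Q-learning on a fixed stream of (s, a, r, s') transitions.

    `shift` adds a constant c to every reward; `q_init` sets the uniform
    initialization of the Q-table.
    """
    q = np.full((n_states, n_actions), q_init)
    for s, a, r, s2 in transitions:
        target = (r + shift) + gamma * q[s2].max()
        q[s, a] += alpha * (target - q[s, a])
    return q

rng = np.random.default_rng(0)
transitions = [(rng.integers(3), rng.integers(2), rng.normal(), rng.integers(3))
               for _ in range(200)]

gamma, c = 0.9, 2.0
# Rewards shifted by +c, Q initialized at zero ...
q_shift = q_learning(transitions, 3, 2, gamma=gamma, shift=c, q_init=0.0)
# ... versus original rewards with Q initialized at -c/(1-gamma):
q_base = q_learning(transitions, 3, 2, gamma=gamma, shift=0.0,
                    q_init=-c / (1 - gamma))
# The two tables differ everywhere by the constant c/(1-gamma).
assert np.allclose(q_shift, q_base + c / (1 - gamma))
```

This is the mechanism behind the abstract's claim: a positive shift corresponds to a pessimistic (conservative) initialization of the original problem, a negative shift to an optimistic one that drives exploration.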

NeurIPS Conference 2022 Conference Paper

RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

  • Rui Yang
  • Chenjia Bai
  • Xiaoteng Ma
  • Zhaoran Wang
  • Chongjie Zhang
  • Lei Han

Offline reinforcement learning (RL) provides a promising direction to exploit massive amounts of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when encountering observation deviation under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these states. Theoretically, we show RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL can achieve state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.

NeurIPS Conference 2021 Conference Paper

Dynamic Bottleneck for Robust Self-Supervised Exploration

  • Chenjia Bai
  • Lingxiao Wang
  • Lei Han
  • Animesh Garg
  • Jianye Hao
  • Peng Liu
  • Zhaoran Wang

Exploration methods based on pseudo-counts of transitions or curiosity about dynamics have achieved promising results in solving reinforcement learning with sparse rewards. However, such methods are usually sensitive to information irrelevant to the environment dynamics, e.g., white noise. To handle such dynamics-irrelevant information, we propose a Dynamic Bottleneck (DB) model, which attains a dynamics-relevant representation based on the information-bottleneck principle. Based on the DB model, we further propose DB-bonus, which encourages the agent to explore state-action pairs with high information gain. We establish theoretical connections between the proposed DB-bonus, the upper confidence bound (UCB) for the linear case, and the visiting count for the tabular case. We evaluate the proposed method on the Atari suite with dynamics-irrelevant noise. Our experiments show that exploration with the DB bonus outperforms several state-of-the-art exploration methods in noisy environments.

AAAI Conference 2020 Conference Paper

Gated Fully Fusion for Semantic Segmentation

  • Xiangtai Li
  • Houlong Zhao
  • Lei Han
  • Yunhai Tong
  • Shaohua Tan
  • Kuiyuan Yang

Semantic segmentation generates comprehensive understanding of scenes by densely predicting the category of each pixel. High-level features from deep convolutional neural networks have already demonstrated their effectiveness in semantic segmentation tasks; however, the coarse resolution of high-level features often leads to inferior results for small or thin objects, where detailed information is important. It is natural to import low-level features to compensate for the detail lost in high-level features. Unfortunately, simply combining multi-level features suffers from the semantic gap between them. In this paper, we propose a new architecture, named Gated Fully Fusion (GFF), to selectively fuse features from multiple levels using gates in a fully connected way. Specifically, features at each level are enhanced by higher-level features with stronger semantics and lower-level features with more details, and gates are used to control the propagation of useful information, which significantly reduces noise during fusion. We achieve state-of-the-art results on four challenging scene parsing datasets: Cityscapes, Pascal Context, COCO-stuff, and ADE20K.

NeurIPS Conference 2019 Conference Paper

Curriculum-guided Hindsight Experience Replay

  • Meng Fang
  • Tianyi Zhou
  • Yali Du
  • Lei Han
  • Zhengyou Zhang

In off-policy deep reinforcement learning, it is usually hard to collect sufficient successful experiences with sparse rewards to learn from. Hindsight experience replay (HER) enables an agent to learn from failures by treating the achieved state of a failed experience as a pseudo goal. However, not all the failed experiences are equally useful to different learning stages, so it is not efficient to replay all of them or uniform samples of them. In this paper, we propose to 1) adaptively select the failed experiences for replay according to the proximity to the true goals and the curiosity of exploration over diverse pseudo goals, and 2) gradually change the proportion of the goal-proximity and the diversity-based curiosity in the selection criteria: we adopt a human-like learning strategy that enforces more curiosity in earlier stages and changes to larger goal-proximity later. This "Goal-and-Curiosity-driven Curriculum Learning" leads to "Curriculum-guided HER (CHER)", which adaptively and dynamically controls the exploration-exploitation trade-off during the learning process via hindsight experience selection. We show that CHER improves the state of the art in challenging robotics environments.
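The selection criterion described above can be sketched as a greedy pick that scores each achieved goal by a weighted sum of proximity to the true goal and diversity relative to the goals already chosen, with the weight shifting from diversity toward proximity over training. Greedy selection, Euclidean distance, and the linear weighting are illustrative assumptions, not CHER's exact algorithm:

```python
import numpy as np

def select_hindsight_goals(achieved_goals, true_goal, k, w):
    """Pick k pseudo-goals trading off goal proximity against diversity.

    w in [0, 1] grows over training: small w favors diversity (curiosity,
    early stages), large w favors proximity to the true goal (later
    stages). Diversity of a candidate is its distance to the nearest
    already-selected goal.
    """
    achieved_goals = np.asarray(achieved_goals, dtype=float)
    proximity = -np.linalg.norm(achieved_goals - true_goal, axis=1)
    selected = []
    for _ in range(k):
        if selected:
            chosen = achieved_goals[selected]
            diversity = np.min(
                np.linalg.norm(achieved_goals[:, None] - chosen[None], axis=2),
                axis=1)
        else:
            diversity = np.zeros(len(achieved_goals))
        score = w * proximity + (1 - w) * diversity
        score[selected] = -np.inf                 # no repeats
        selected.append(int(np.argmax(score)))
    return selected

goals = [[0, 0], [5, 5], [1, 0], [4, 4]]
true_goal = np.array([0.0, 0.0])
early = select_hindsight_goals(goals, true_goal, k=2, w=0.1)  # then a far, diverse goal
late = select_hindsight_goals(goals, true_goal, k=2, w=0.9)   # the two nearest goals
```

Annealing w from small to large reproduces the "curiosity first, goal-proximity later" schedule the abstract describes.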

AAAI Conference 2019 Conference Paper

Learning (from) Deep Hierarchical Structure among Features

  • Yu Zhang
  • Lei Han

Data features usually can be organized in a hierarchical structure to reflect the relations among them. Most of previous studies that utilize the hierarchical structure to help improve the performance of supervised learning tasks can only handle the structure of a limited height such as 2. In this paper, we propose a Deep Hierarchical Structure (DHS) method to handle the hierarchical structure of an arbitrary height with a convex objective function. The DHS method relies on the exponents of the edge weights in the hierarchical structure but the exponents need to be given by users or set to be identical by default, which may be suboptimal. Based on the DHS method, we propose a variant to learn the exponents from data. Moreover, we consider a case where even the hierarchical structure is not available. Based on the DHS method, we propose a Learning Deep Hierarchical Structure (LDHS) method which can learn the hierarchical structure via a generalized fused-Lasso regularizer and a proposed sequential constraint. All the optimization problems are solved by proximal methods where each subproblem has an efficient solution. Experiments on synthetic and real-world datasets show the effectiveness of the proposed methods.

NeurIPS Conference 2019 Conference Paper

LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning

  • Yali Du
  • Lei Han
  • Meng Fang
  • Ji Liu
  • Tianhong Dai
  • Dacheng Tao

A great challenge in cooperative decentralized multi-agent reinforcement learning (MARL) is generating diversified behaviors for each individual agent when receiving only a team reward. Prior studies have devoted much effort to reward shaping or to designing a centralized critic that can discriminatively credit the agents. In this paper, we propose to merge the two directions and learn for each agent an intrinsic reward function that diversely stimulates the agents at each time step. Specifically, the intrinsic reward for a specific agent is used to compute a distinct proxy critic that directs the updating of its individual policy. Meanwhile, the parameterized intrinsic reward function is updated toward maximizing the expected accumulated team reward from the environment, so the objective stays consistent with the original MARL problem. The proposed method is referred to as learning individual intrinsic reward (LIIR) in MARL. We compare LIIR with a number of state-of-the-art MARL methods on battle games in StarCraft II. The results demonstrate the effectiveness of LIIR, and we show that LIIR can assign each individual agent an insightful intrinsic reward at each time step.

NeurIPS Conference 2018 Conference Paper

Exponentially Weighted Imitation Learning for Batched Historical Data

  • Qing Wang
  • Jiechao Xiong
  • Lei Han
  • Peng Sun
  • Han Liu
  • Tong Zhang

We consider deep policy learning with only batched historical trajectories. The main challenge of this problem is that the learner no longer has a simulator or "environment oracle" as in most reinforcement learning settings. To solve this problem, we propose a monotonic advantage reweighted imitation learning strategy that is applicable to problems with complex nonlinear function approximation and works well with hybrid (discrete and continuous) action spaces. The method does not rely on knowledge of the behavior policy, and thus can be used to learn from data generated by an unknown policy. Under mild conditions, our algorithm, though surprisingly simple, has a policy improvement bound and outperforms most competing methods empirically. Thorough numerical results are also provided to demonstrate the efficacy of the proposed methodology.
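The core mechanism, reweighting an imitation objective by an exponentiated advantage estimate so that logged actions that beat the baseline are imitated more strongly, can be sketched as follows. The return-minus-value advantage estimate, the temperature, and the weight clipping are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def advantage_weights(returns, values, beta=1.0, w_max=20.0):
    """Exponential advantage weights for imitation on logged data.

    advantage = observed return minus a baseline value estimate; actions
    that outperformed the baseline receive exponentially larger imitation
    weight. Clipping at w_max keeps the weights bounded.
    """
    adv = np.asarray(returns, dtype=float) - np.asarray(values, dtype=float)
    return np.minimum(np.exp(adv / beta), w_max)

# The imitation objective is then a weighted log-likelihood over the batch:
#   maximize  sum_i  w_i * log pi(a_i | s_i)
w = advantage_weights(returns=[1.0, 3.0, 0.5], values=[1.0, 1.0, 1.0])
# w[0] = 1 (neutral), w[1] = e^2 (upweighted), w[2] < 1 (downweighted)
```

Because the weights are always positive, the update never pushes probability mass away from logged actions, which is what lets the scheme work without knowing the behavior policy.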

IROS Conference 2017 Conference Paper

Beyond SIFT using binary features in Loop Closure Detection

  • Lei Han
  • Guyue Zhou
  • Lan Xu 0003
  • Lu Fang 0001

In this paper, a binary-feature-based Loop Closure Detection (LCD) method is proposed, which for the first time achieves higher precision-recall (PR) performance than state-of-the-art SIFT-feature-based approaches. The proposed system originates from our previous work, Multi-Index hashing for Loop closure Detection (MILD), which employs Multi-Index Hashing (MIH) [1] for Approximate Nearest Neighbor (ANN) search of binary features. As the accuracy of MILD is limited by repeating textures and inaccurate image similarity measurement, burstiness handling is introduced to solve this problem and achieves considerable accuracy improvement. Additionally, a comprehensive theoretical analysis of MIH as used in MILD is conducted to further explore the potential of hashing methods for ANN search of binary features from a probabilistic perspective. This analysis provides more freedom in choosing the best MIH parameters for different application scenarios. Experiments on popular public datasets show that the proposed approach achieves the highest accuracy among state-of-the-art methods while running at 30 Hz on databases containing thousands of images.
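The multi-index hashing idea that MILD builds on is simple to sketch: split each binary code into m substrings and bucket on each; by the pigeonhole principle, any code within Hamming distance r < m of a query matches it exactly in at least one substring, so the union of the m bucket lookups contains every true neighbor, and an exact Hamming check filters the candidates. The parameters and the exact-substring-match case below are illustrative:

```python
from collections import defaultdict

def split_code(code, m, bits):
    """Split an integer binary code into m substrings of bits // m bits each."""
    w = bits // m
    mask = (1 << w) - 1
    return [(code >> (i * w)) & mask for i in range(m)]

def build_index(codes, m=4, bits=32):
    """One hash table per substring position, mapping substring -> code ids."""
    tables = [defaultdict(list) for _ in range(m)]
    for idx, c in enumerate(codes):
        for i, sub in enumerate(split_code(c, m, bits)):
            tables[i][sub].append(idx)
    return tables

def query(q, codes, tables, r, m=4, bits=32):
    """Return ids of codes within Hamming distance r of q (complete for r < m).

    Candidate gathering relies on the pigeonhole guarantee: if
    hamming(q, c) <= r < m, then q and c agree exactly on some substring.
    """
    cands = set()
    for i, sub in enumerate(split_code(q, m, bits)):
        cands.update(tables[i][sub])
    return [i for i in cands if bin(q ^ codes[i]).count("1") <= r]

codes = [0x0F0F0F0F, 0x0F0F0F0E, 0xFFFFFFFF]
tables = build_index(codes)
result = sorted(query(0x0F0F0F0F, codes, tables, r=1))  # [0, 1]
```

Each lookup touches only m small buckets instead of the whole database, which is how MILD-style systems keep ANN search over binary descriptors fast enough for real-time loop closure.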

AAAI Conference 2016 Conference Paper

Multi-Stage Multi-Task Learning with Reduced Rank

  • Lei Han
  • Yu Zhang

Multi-task learning (MTL) seeks to improve generalization performance by sharing information among multiple tasks. Many existing MTL approaches aim to learn a low-rank structure on the weight matrix, which stores the model parameters of all tasks, to achieve task sharing; as a consequence, trace norm regularization is widely used in the MTL literature. A major limitation of approaches based on trace norm regularization is that all the singular values of the weight matrix are penalized simultaneously, leading to impaired estimation when recovering the larger singular values of the weight matrix. To address the issue, we propose a Reduced rAnk MUlti-Stage multi-tAsk learning (RAMUSA) method based on recently proposed capped norms. Different from existing trace-norm-based MTL approaches, which minimize the sum of all the singular values, the RAMUSA method uses a capped trace norm regularizer to minimize only the singular values smaller than some threshold. Due to the non-convexity of the capped trace norm, we develop a simple but well-guaranteed multi-stage algorithm to learn the weight matrix iteratively. We theoretically prove that the estimation error at each stage of the proposed algorithm shrinks and finally achieves a lower upper bound as the number of stages becomes large enough. Empirical studies on synthetic and real-world datasets demonstrate the effectiveness of the RAMUSA method in comparison with state-of-the-art methods.
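The capped trace norm the abstract describes is straightforward to compute: where the plain trace norm sums all singular values, the capped version sums min(sigma_i, theta), so singular values above the threshold contribute only the constant theta and are not shrunk by the regularizer. A small numpy illustration (the threshold value is arbitrary):

```python
import numpy as np

def capped_trace_norm(W, theta):
    """Capped trace norm: sum of min(sigma_i, theta) over singular values.

    Only singular values below the cap theta are penalized; large singular
    values (the shared task structure to preserve) incur a constant cost.
    """
    s = np.linalg.svd(W, compute_uv=False)
    return float(np.minimum(s, theta).sum())

W = np.diag([5.0, 0.3, 0.1])  # one strong direction, two weak ones
full = float(np.linalg.svd(W, compute_uv=False).sum())  # plain trace norm: 5.4
capped = capped_trace_norm(W, theta=1.0)                # 1.0 + 0.3 + 0.1 = 1.4
```

Under the plain trace norm the dominant singular value accounts for most of the penalty and gets shrunk along with the noise directions; under the capped norm it contributes only theta, which is the estimation advantage the abstract claims.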

AAAI Conference 2016 Conference Paper

Reduction Techniques for Graph-Based Convex Clustering

  • Lei Han
  • Yu Zhang

The Graph-based Convex Clustering (GCC) method has gained increasing attention recently. The GCC method adopts a fused regularizer to learn the cluster centers and obtains a geometric clusterpath by varying the regularization parameter. One major limitation is that solving the GCC model is computationally expensive. In this paper, we develop efficient graph reduction techniques for the GCC model to eliminate edges, each of which corresponds to two data points from the same cluster, without solving the optimization problem in the GCC method, leading to improved computational efficiency. Specifically, two reduction techniques are proposed according to tree-based and cyclic-graph-based convex clustering methods separately. The proposed reduction techniques are appealing since they only need to scan the data once with negligibly additional cost and they are independent of solvers for the GCC method, making them capable of improving the efficiency of any existing solver. Experiments on both synthetic and real-world datasets show that our methods can largely improve the efficiency of the GCC model.

IJCAI Conference 2015 Conference Paper

Action2Activity: Recognizing Complex Activities from Sensor Data

  • Ye Liu
  • Liqiang Nie
  • Lei Han
  • Luming Zhang
  • David S. Rosenblum

As compared to simple actions, activities are much more complex, but semantically consistent with a human’s real life. Techniques for action recognition from sensor-generated data are mature. However, there has been relatively little work on bridging the gap between actions and activities. To this end, this paper presents a novel approach for complex activity recognition comprising two components. The first component is temporal pattern mining, which provides a mid-level feature representation for activities, encodes temporal relatedness among actions, and captures the intrinsic properties of activities. The second component is adaptive multi-task learning, which captures relatedness among activities and selects discriminant features. Extensive experiments on a real-world dataset demonstrate the effectiveness of our work.

AAAI Conference 2015 Conference Paper

Discriminative Feature Grouping

  • Lei Han
  • Yu Zhang

Feature grouping has been demonstrated to be promising in learning with high-dimensional data. It helps reduce the variances in the estimation and improves the stability of feature selection. One major limitation of existing feature grouping approaches is that some similar but different feature groups are often mis-fused, leading to impaired performance. In this paper, we propose a Discriminative Feature Grouping (DFG) method to discover the feature groups with enhanced discrimination. Different from existing methods, DFG adopts a novel regularizer for the feature coefficients to trade off between fusing and discriminating feature groups. The proposed regularizer consists of an ℓ1 norm to enforce feature sparsity and a pairwise ℓ∞ norm to encourage the absolute differences among any three feature coefficients to be similar. To achieve a better asymptotic property, we generalize the proposed regularizer to an adaptive one where the feature coefficients are weighted based on the solution of some estimator with root-n consistency. For optimization, we employ the alternating direction method of multipliers to solve the proposed methods efficiently. Experimental results on synthetic and real-world datasets demonstrate that the proposed methods have good performance compared with the state-of-the-art feature grouping methods.
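The combination of an ℓ1 term with a pairwise ℓ∞ term is a known recipe for simultaneous sparsity and coefficient grouping (OSCAR-style); the DFG regularizer is a variant of this idea, so the sketch below only illustrates the two terms, not the paper's exact penalty. The names `lam1`/`lam2` are my own.

```python
import numpy as np
from itertools import combinations

def l1_pairwise_linf(w, lam1, lam2):
    # l1 term drives individual coefficients to zero (sparsity);
    # the pairwise l-infinity term, sum over pairs of max(|w_j|, |w_k|),
    # encourages coefficient magnitudes to tie together into groups.
    l1 = np.sum(np.abs(w))
    pair = sum(max(abs(w[j]), abs(w[k]))
               for j, k in combinations(range(len(w)), 2))
    return float(lam1 * l1 + lam2 * pair)

# For w = (1, -2): l1 = 3, pairwise term = max(1, 2) = 2,
# so with lam1 = lam2 = 1 the penalty is 5.
```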

AAAI Conference 2015 Conference Paper

Learning Multi-Level Task Groups in Multi-Task Learning

  • Lei Han
  • Yu Zhang

In multi-task learning (MTL), multiple related tasks are learned jointly by sharing information across them. Many MTL algorithms have been proposed to learn the underlying task groups. However, those methods are limited to learning the task groups at only a single level, which may not be sufficient to model the complex structure among tasks in many real-world applications. In this paper, we propose a Multi-Level Task Grouping (MeTaG) method to learn a multi-level grouping structure among tasks instead of only one level. Specifically, by assuming the number of levels to be H, we decompose the parameter matrix into a sum of H component matrices, each of which is regularized with an ℓ2 norm on the pairwise differences among the parameters of all the tasks to construct level-specific task groups. For optimization, we employ the smoothing proximal gradient method to efficiently solve the objective function of the MeTaG model. Moreover, we provide theoretical analysis to show that under certain conditions the MeTaG model can recover the true parameter matrix and the true task groups at each level with high probability. We evaluate our approach on both synthetic and real-world datasets, showing competitive performance over state-of-the-art MTL methods.
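The decomposition the abstract describes can be sketched directly: the parameter matrix is the sum of H per-level components, and each level pays an ℓ2 cost on every pairwise difference of task columns, so tasks whose columns coincide at a level form a group there. A minimal sketch under those assumptions (my own function and variable names):

```python
import numpy as np
from itertools import combinations

def metag_penalty(components):
    # components: list of H matrices W_h, each (d, m) with one column per task;
    # the full parameter matrix is sum(components). Columns that are equal
    # within a level contribute zero, i.e. those tasks share a group there.
    total = 0.0
    for Wh in components:
        m = Wh.shape[1]
        total += sum(np.linalg.norm(Wh[:, i] - Wh[:, j])
                     for i, j in combinations(range(m), 2))
    return float(total)
```

Different levels can group the tasks differently, which is what a single-level method cannot express.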

AAAI Conference 2014 Conference Paper

Encoding Tree Sparsity in Multi-Task Learning: A Probabilistic Framework

  • Lei Han
  • Yu Zhang
  • Guojie Song
  • Kunqing Xie

Multi-task learning seeks to improve the generalization performance by sharing common information among multiple related tasks. A key assumption in most MTL algorithms is that all tasks are related, which, however, may not hold in many real-world applications. Existing techniques, which attempt to address this issue, aim to identify groups of related tasks using group sparsity. In this paper, we propose a probabilistic tree sparsity (PTS) model to utilize the tree structure to obtain the sparse solution instead of the group structure. Specifically, each model coefficient in the learning model is decomposed into a product of multiple component coefficients each of which corresponds to a node in the tree. Based on the decomposition, Gaussian and Cauchy distributions are placed on the component coefficients as priors to restrict the model complexity. We devise an efficient expectation maximization algorithm to learn the model parameters. Experiments conducted on both synthetic and real-world problems show the effectiveness of our model compared with state-of-the-art baselines.
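The product decomposition described above has a useful consequence worth making concrete: since each coefficient is the product of component coefficients along its root-to-leaf path, zeroing one internal node zeroes every coefficient in that node's subtree, which is exactly tree-structured sparsity. A toy sketch of that mechanism (the data layout and names are my own, not the paper's):

```python
def leaf_coefficients(tree_paths, node_values):
    # tree_paths: leaf id -> list of node ids on its root-to-leaf path.
    # node_values: node id -> component coefficient at that tree node.
    # Each model coefficient is the product along its path, so a zero
    # at an internal node silences its whole subtree at once.
    coeffs = {}
    for leaf, path in tree_paths.items():
        c = 1.0
        for node in path:
            c *= node_values[node]
        coeffs[leaf] = c
    return coeffs

# With node 'x' set to 0, both leaves under 'x' get coefficient 0,
# while the sibling leaf 'c' keeps r * c = 2 * 1 = 2.
```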