Arrow Research search

Author name cluster

Hang Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

46 papers
2 author rows

Possible papers

46

TMLR Journal 2025 Journal Article

Graph-level Representation Learning with Joint-Embedding Predictive Architectures

  • Geri Skenderi
  • Hang Li
  • Jiliang Tang
  • Marco Cristani

Joint-Embedding Predictive Architectures (JEPAs) have recently emerged as a novel and powerful technique for self-supervised representation learning. They aim to learn an energy-based model by predicting the latent representation of a target signal y from the latent representation of a context signal x. JEPAs bypass the need for negative and positive samples, traditionally required by contrastive learning while avoiding the overfitting issues associated with generative pretraining. In this paper, we show that graph-level representations can be effectively modeled using this paradigm by proposing a Graph Joint-Embedding Predictive Architecture (Graph-JEPA). In particular, we employ masked modeling and focus on predicting the latent representations of masked subgraphs starting from the latent representation of a context subgraph. To endow the representations with the implicit hierarchy that is often present in graph-level concepts, we devise an alternative prediction objective that consists of predicting the coordinates of the encoded subgraphs on the unit hyperbola in the 2D plane. Through multiple experimental evaluations, we show that Graph-JEPA can learn highly semantic and expressive representations, as shown by the downstream performance in graph classification, regression, and distinguishing non-isomorphic graphs. The code is available at https://github.com/geriskenderi/graph-jepa.

ICRA Conference 2025 Conference Paper

Language-Guided Object-Centric Diffusion Policy for Generalizable and Collision-Aware Manipulation

  • Hang Li
  • Qian Feng
  • Zhi Zheng
  • Jianxiang Feng
  • Zhaopeng Chen
  • Alois C. Knoll

Learning from demonstrations faces challenges in generalizing beyond the training data and often lacks collision awareness. This paper introduces Lan-o3dp, a language-guided object-centric diffusion policy framework that can adapt to unseen situations such as cluttered scenes, shifting camera views, and ambiguous similar objects while offering trainingfree collision avoidance and achieving a high success rate with few demonstrations. We train a diffusion model conditioned on 3D point clouds of task-relevant objects to predict the robot's end-effector trajectories, enabling it to complete the tasks. During inference, we incorporate cost optimization into denoising steps to guide the generated trajectory to be collisionfree. We leverage open-set segmentation to obtain the 3D point clouds of related objects. We use a large language model to identify the target objects and possible obstacles by interpreting the user's natural language instructions. To effectively guide the conditional diffusion model using a time-independent cost function, we proposed a novel guided generation mechanism based on the estimated clean trajectories. In the simulation, we showed that diffusion policy based on the object-centric 3D representation achieves a much higher success rate (68. 7%) compared to baselines with simple 2D (39. 3%) and 3D scene (43. 6%) representations across 21 challenging RLBench tasks with only 40 demonstrations. In real-world experiments, we extensively evaluated the generalization in various unseen situations and validated the effectiveness of the proposed zeroshot cost-guided collision avoidance.

NeurIPS Conference 2024 Conference Paper

AGILE: A Novel Reinforcement Learning Framework of LLM Agents

  • Peiyuan Feng
  • Yichen He
  • Guanhua Huang
  • Yuan Lin
  • Hanchong Zhang
  • Yuchen Zhang
  • Hang Li

We introduce a novel reinforcement learning framework of LLM agents named AGILE (AGent that Interacts and Learns from Environments) designed to perform complex conversational tasks with users, leveraging LLMs, memory, tools, and interactions with experts. The agent possesses capabilities beyond conversation, including reflection, tool usage, and expert consultation. We formulate the construction of such an LLM agent as a reinforcement learning (RL) problem, in which the LLM serves as the policy model. We fine-tune the LLM using labeled data of actions and the PPO algorithm. We focus on question answering and release a dataset for agents called ProductQA, comprising challenging questions in online shopping. Our extensive experiments on ProductQA, MedMCQA and HotPotQA show that AGILE agents based on 7B and 13B LLMs trained with PPO can outperform GPT-4 agents. Our ablation study highlights the indispensability of memory, tools, consultation, reflection, and reinforcement learning in achieving the agent's strong performance. Datasets and code are available at https: //github. com/bytarnish/AGILE.

ICML Conference 2024 Conference Paper

Boximator: Generating Rich and Controllable Motions for Video Synthesis

  • Jiawei Wang
  • Yuchen Zhang
  • Jiaxin Zou
  • Yan Zeng
  • Guoqiang Wei
  • Liping Yuan
  • Hang Li

Generating rich and controllable motion is a pivotal challenge in video synthesis. We propose Boximator, a new approach for fine-grained motion control. Boximator introduces two constraint types: hard box and soft box. Users select objects in the conditional frame using hard boxes and then use either type of boxes to roughly or rigorously define the object’s position, shape, or motion path in future frames. Boximator functions as a plug-in for existing video diffusion models. Its training process preserves the base model’s knowledge by freezing the original weights and training only the control module. To address training challenges, we introduce a novel self-tracking technique that greatly simplifies the learning of box-object correlations. Empirically, Boximator achieves state-of-the-art video quality (FVD) scores, improving on two base models, and further enhanced after incorporating box constraints. Its robust motion controllability is validated by drastic increases in the bounding box alignment metric. Human evaluation also shows that users favor Boximator generation results over the base model.

IROS Conference 2024 Conference Paper

Driving Animatronic Robot Facial Expression From Speech

  • Boren Li
  • Hang Li
  • Hangxin Liu

Animatronic robots hold the promise of enabling natural human-robot interaction through lifelike facial expressions. However, generating realistic, speech-synchronized robot expressions poses significant challenges due to the complexities of facial biomechanics and the need for responsive motion synthesis. This paper introduces a novel, skinning-centric approach to drive animatronic robot facial expressions from speech input. At its core, the proposed approach employs linear blend skinning (LBS) as a unifying representation, guiding innovations in both embodiment design and motion synthesis. LBS informs the actuation topology, facilitates human expression retargeting, and enables efficient speech-driven facial motion generation. This approach demonstrates the capability to produce highly realistic facial expressions on an animatronic face in real-time at over 4000 fps on a single Nvidia RTX 4090, significantly advancing robots' ability to replicate nuanced human expressions for natural interaction. To foster further research and development in this field, the code has been made publicly available at: https://github.com/library87/OpenRoboExp.

IROS Conference 2024 Conference Paper

Flight Structure Optimization of Modular Reconfigurable UAVs

  • Yao Su 0001
  • Ziyuan Jiao
  • Zeyu Zhang 0001
  • Jingwen Zhang
  • Hang Li
  • Meng Wang 0051
  • Hangxin Liu

This paper presents a Genetic Algorithm (GA) designed to reconfigure a large group of modular Unmanned Aerial Vehicles (UAVs), each with different weights and inertia parameters, into an over-actuated flight structure with improved dynamic properties. Previous research efforts either utilized expert knowledge to design flight structures for a specific task or relied on enumeration-based algorithms that required extensive computation to find an optimal one. However, both approaches encounter challenges in accommodating the heterogeneity among modules. Our GA addresses these challenges by incorporating the complexities of over-actuation and dynamic properties into its formulation. Additionally, we employ a tree representation and a vector representation to describe flight structures, facilitating efficient crossover operations and fitness evaluations within the GA framework, respectively. Using cubic modular quadcopters capable of functioning as omnidirectional thrust generators, we validate that the proposed approach can (i) adeptly identify suboptimal configurations ensuring both over-actuation and trajectory tracking accuracy and (ii) significantly reduce computational costs compared to traditional enumeration-based methods.

ICRA Conference 2024 Conference Paper

Real-time Dynamic-consistent Motion Planning for Over-actuated UAVs

  • Yao Su 0001
  • Jingwen Zhang
  • Ziyuan Jiao
  • Hang Li
  • Meng Wang 0051
  • Hangxin Liu

Existing motion planning approaches for over-actuated unmanned aerial vehicle (UAV) platforms can achieve online planning without considering dynamics. However, in many envisioned application areas such as aerial manipulation, payload delivery, and moving target tracking, it is critical to ensure dynamic consistency in the generated trajectory. The dynamics of these platforms introduce a high nonlinearity, leading to a substantial increase in computational burden. This paper presents an efficient method to plan motions that are consistent with the dynamics of over-actuated UAVs. With a hierarchical control structure, the dimension of the optimization problem is greatly reduced with synthesized wrench commands. Additionally, by exploring the dynamics of over-actuated UAVs, the complex planning process is decoupled into two simpler sub-problems. As a result, the proposed planner can be solved as two small quadratic programmings (QPs) and deployed in real-time. The computational efficiency and dynamic consistency of the proposed method are verified through both simulations and experiments, including comparison with other approaches and dynamic target tracking.

ICLR Conference 2024 Conference Paper

Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation

  • Hongtao Wu
  • Ya Jing
  • Chilam Cheang
  • Guangzeng Chen
  • Jiafeng Xu
  • Xinghang Li
  • Minghuan Liu
  • Hang Li

Generative pre-trained models have demonstrated remarkable effectiveness in language and vision domains by learning useful representations. In this paper, we extend the scope of this effectiveness by showing that visual robot manipulation can significantly benefit from large-scale video generative pre-training. We introduce GR-1, a GPT-style model designed for multi-task language-conditioned visual robot manipulation. GR-1 takes as inputs a language instruction, a sequence of observation images, and a sequence of robot states. It predicts robot actions as well as future images in an end-to-end manner. Thanks to a flexible design, GR-1 can be seamlessly finetuned on robot data after pre-trained on a large-scale video dataset. We perform extensive experiments on the challenging CALVIN benchmark and a real robot. On CALVIN benchmark, our method outperforms state-of-the-art baseline methods and improves the success rate from 88.9% to 94.9%. In the setting of zero-shot unseen scene generalization, GR-1 improves the success rate from 53.3% to 85.4%. In real robot experiments, GR-1 also outperforms baseline methods and shows strong potentials in generalization to unseen scenes and objects. We provide inaugural evidence that a unified GPT-style transformer, augmented with large-scale video generative pre-training, exhibits remarkable generalization to multi-task visual robot manipulation. Project page: https://GR1-Manipulation.github.io

ICLR Conference 2024 Conference Paper

Vision-Language Foundation Models as Effective Robot Imitators

  • Xinghang Li
  • Minghuan Liu
  • Hanbo Zhang
  • Cunjun Yu
  • Jie Xu
  • Hongtao Wu
  • Chilam Cheang
  • Ya Jing

Recent progress in vision language foundation models has shown their ability to understand multimodal data and resolve complicated vision language tasks, including robotics manipulation. We seek a straightforward way of making use of existing vision-language models (VLMs) with simple fine-tuning on robotics data. To this end, we derive a simple and novel vision-language manipulation framework, dubbed RoboFlamingo, built upon the open-source VLMs, OpenFlamingo. Unlike prior works, RoboFlamingo utilizes pre-trained VLMs for single-step vision-language comprehension, models sequential history information with an explicit policy head, and is slightly fine-tuned by imitation learning only on language-conditioned manipulation datasets. Such a decomposition provides RoboFlamingo the flexibility for open-loop control and deployment on low-performance platforms. By exceeding the state-of-the-art performance with a large margin on the tested benchmark, we show RoboFlamingo can be an effective and competitive alternative to adapt VLMs to robot control. Our extensive experimental results also reveal several interesting conclusions regarding the behavior of different pre-trained VLMs on manipulation tasks. We believe RoboFlamingo has the potential to be a cost-effective and easy-to-use solution for robotics manipulation, empowering everyone with the ability to fine-tune their own robotics policy. Our code will be made public upon acceptance.

IROS Conference 2023 Conference Paper

Aggregating Single-Wheeled Mobile Robots for Omnidirectional Movements

  • Meng Wang 0051
  • Yao Su 0001
  • Hang Li
  • Jiarui Li
  • Jixiang Liang
  • Hangxin Liu

This paper presents a novel modular robot system that can self-reconfigure to achieve omnidirectional movements for collaborative object transportation. Each robotic module is equipped with a steerable omni-wheel for navigation and is shaped as a regular icositetragon with a permanent magnet installed on each corner for stable docking. After aggregating multiple modules and forming a structure that can cage a target object, we have developed an optimization-based method to compute the distribution of all wheels' heading directions, which enables efficient omnidirectional movements of the structure. By implementing a hierarchical controller on our prototyped system in both simulation and experiment, we validated the trajectory tracking performance of an individual module and a team of six modules in multiple navigation and collaborative object transportation settings. The results demonstrate that the proposed system can maintain a stable caging formation and achieve smooth transportation, indicating the effectiveness of our hardware and locomotion designs.

IJCAI Conference 2023 Conference Paper

Generative Diffusion Models on Graphs: Methods and Applications

  • Chengyi Liu
  • Wenqi Fan
  • Yunqing Liu
  • Jiatong Li
  • Hang Li
  • Hui Liu
  • Jiliang Tang
  • Qing Li

Diffusion models, as a novel generative paradigm, have achieved remarkable success in various image generation tasks such as image inpainting, image-to-text translation, and video generation. Graph generation is a crucial computational task on graphs with numerous real-world applications. It aims to learn the distribution of given graphs and then generate new graphs. Given the great success of diffusion models in image generation, increasing efforts have been made to leverage these techniques to advance graph generation in recent years. In this paper, we first provide a comprehensive overview of generative diffusion models on graphs, In particular, we review representative algorithms for three variants of graph diffusion models, i. e. , Score Matching with Langevin Dynamics (SMLD), Denoising Diffusion Probabilistic Model (DDPM), and Score-based Generative Model (SGM). Then, we summarize the major applications of generative diffusion models on graphs with a specific focus on molecule and protein modeling. Finally, we discuss promising directions in generative diffusion models on graph-structured data.

IROS Conference 2023 Conference Paper

Sequential Manipulation Planning for Over-Actuated Unmanned Aerial Manipulators

  • Yao Su 0001
  • Jiarui Li
  • Ziyuan Jiao
  • Meng Wang 0051
  • Chi Chu
  • Hang Li
  • Yixin Zhu 0001
  • Hangxin Liu

We investigate the sequential manipulation planning problem for unmanned aerial manipulators (UAMs). Unlike prior work that primarily focuses on one-step manipulation tasks, sequential manipulations require coordinated motions of a UAM's floating base, the manipulator, and the object being manipulated, entailing a unified kinematics and dynamics model for motion planning under designated constraints. By leveraging a virtual kinematic chain (VKC)-based motion planning framework that consolidates components' kinematics into one chain, the sequential manipulation task of a UAM can be planned as a whole, yielding more coordinated motions. Integrating the kinematics and dynamics models with a hierarchical control framework, we demonstrate, for the first time, an over-actuated UAM achieves a series of new sequential manipulation capabilities in both simulation and experiment.

NeurIPS Conference 2023 Conference Paper

Uncertainty-Aware Instance Reweighting for Off-Policy Learning

  • Xiaoying Zhang
  • Junpu Chen
  • Hongning Wang
  • Hong Xie
  • Yang Liu
  • John C. S. Lui
  • Hang Li

Off-policy learning, referring to the procedure of policy optimization with access only to logged feedback data, has shown importance in various important real-world applications, such as search engines and recommender systems. While the ground-truth logging policy is usually unknown, previous work simply takes its estimated value for the off-policy learning, ignoring the negative impact from both high bias and high variance resulted from such an estimator. And these impact is often magnified on samples with small and inaccurately estimated logging probabilities. The contribution of this work is to explicitly model the uncertainty in the estimated logging policy, and propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning, with a theoretical convergence guarantee. Experiment results on the synthetic and real-world recommendation datasets demonstrate that UIPS significantly improves the quality of the discovered policy, when compared against an extensive list of state-of-the-art baselines.

JBHI Journal 2022 Journal Article

An Efficient Ciphertext-Policy Weighted Attribute-Based Encryption for the Internet of Health Things

  • Hang Li
  • Keping Yu
  • Bin Liu
  • Chaosheng Feng
  • Zhiguang Qin
  • Gautam Srivastava

The Internet of Health Things (IoHT) is a medical concept that describes uniquely identifiable devices connected to the Internet that can communicate with each other. As one of the most important components of smart health monitoring and improvement systems, the IoHT presents numerous challenges, among which cybersecurity is a priority. As a well-received security solution to achieve fine-grained access control, ciphertext-policy weighted attribute-based encryption (CP-WABE) has the potential to ensure data security in the IoHT. However, many issues remain, such as inflexibility, poor computational capability, and insufficient storage efficiency in attributes comparison. To address these issues, we propose a novel access policy expression method using 0-1 coding technology. Based on this method, a flexible and efficient CP-WABE is constructed for the IoHT. Our scheme supports not only weighted attributes but also any form of comparison of weighted attributes. Furthermore, we use offline/online encryption and outsourced decryption technology to ensure that the scheme can run on an inefficient IoT terminal. Both theoretical and experimental analyses show that our scheme is more efficient and feasible than other schemes. Moreover, security analysis indicates that our scheme achieves security against a chosen-plaintext attack.

AAAI Conference 2022 Conference Paper

Self-Supervised Audio-and-Text Pre-training with Extremely Low-Resource Parallel Data

  • Yu Kang
  • Tianqiao Liu
  • Hang Li
  • Yang Hao
  • Wenbiao Ding

Multimodal pre-training for audio-and-text has recently been proved to be effective and has significantly improved the performance of many downstream speech understanding tasks. However, these state-of-the-art pre-training audio-text models work well only when provided with large amount of parallel audio-and-text data, which brings challenges on many languages that are rich in unimodal corpora but scarce of parallel cross-modal corpus. In this paper, we investigate whether it is possible to pre-train an audio-text multimodal model with extremely low-resource parallel data and extra non-parallel unimodal data. Our pre-training framework consists of the following components: (1) Intra-modal Denoising Auto-Encoding (IDAE), which is able to reconstruct input text (audio) representations from a noisy version of itself. (2) Cross-modal Denoising Auto-Encoding (CDAE), which is pre-trained to reconstruct the input text (audio), given both a noisy version of the input text (audio) and the corresponding translated noisy audio features (text embeddings). (3) Iterative Denoising Process (IDP), which iteratively translates raw audio (text) and the corresponding text embeddings (audio features) translated from previous iteration into the new less-noisy text embeddings (audio features). We adapt a dual cross-modal Transformer as our backbone model which consists of two unimodal encoders for IDAE and two cross-modal encoders for CDAE and IDP. Our method achieves comparable performance on multiple downstream speech understanding tasks compared with the model pre-trained on fully parallel data, demonstrating the great potential of the proposed method.

NeurIPS Conference 2021 Conference Paper

Disentangled Contrastive Learning on Graphs

  • Haoyang Li
  • Xin Wang
  • Ziwei Zhang
  • Zehuan Yuan
  • Hang Li
  • Wenwu Zhu

Recently, self-supervised learning for graph neural networks (GNNs) has attracted considerable attention because of their notable successes in learning the representation of graph-structure data. However, the formation of a real-world graph typically arises from the highly complex interaction of many latent factors. The existing self-supervised learning methods for GNNs are inherently holistic and neglect the entanglement of the latent factors, resulting in the learned representations suboptimal for downstream tasks and difficult to be interpreted. Learning disentangled graph representations with self-supervised learning poses great challenges and remains largely ignored by the existing literature. In this paper, we introduce the Disentangled Graph Contrastive Learning (DGCL) method, which is able to learn disentangled graph-level representations with self-supervision. In particular, we first identify the latent factors of the input graph and derive its factorized representations. Each of the factorized representations describes a latent and disentangled aspect pertinent to a specific latent factor of the graph. Then we propose a novel factor-wise discrimination objective in a contrastive learning manner, which can force the factorized representations to independently reflect the expressive information from different latent factors. Extensive experiments on both synthetic and real-world datasets demonstrate the superiority of our method against several state-of-the-art baselines.

IJCAI Conference 2020 Conference Paper

Feature Statistics Guided Efficient Filter Pruning

  • Hang Li
  • Chen Ma
  • Wei Xu
  • Xue Liu

Building compact convolutional neural networks (CNNs) with reliable performance is a critical but challenging task, especially when deploying them in real-world applications. As a common approach to reduce the size of CNNs, pruning methods delete part of the CNN filters according to some metrics such as l1-norm. However, previous methods hardly leverage the information variance in a single feature map and the similarity characteristics among feature maps. In this paper, we propose a novel filter pruning method, which incorporates two kinds of feature map selections: diversity-aware selection (DFS) and similarity-aware selection (SFS). DFS aims to discover features with low information diversity while SFS removes features that have high similarities with others. We conduct extensive empirical experiments with various CNN architectures on publicly available datasets. The experimental results demonstrate that our model obtains up to 91. 6% parameter decrease and 83. 7% FLOPs reduction with almost no accuracy loss.

JBHI Journal 2019 Journal Article

Dense Deconvolutional Network for Skin Lesion Segmentation

  • Hang Li
  • Xinzi He
  • Feng Zhou
  • Zhen Yu
  • Dong Ni
  • Siping Chen
  • Tianfu Wang
  • Baiying Lei

Automatic delineation of skin lesion contours from dermoscopy images is a basic step in the process of diagnosis and treatment of skin lesions. However, it is a challenging task due to the high variation of appearances and sizes of skin lesions. In order to deal with such challenges, we propose a new dense deconvolutional network (DDN) for skin lesion segmentation based on residual learning. Specifically, the proposed network consists of dense deconvolutional layers (DDLs), chained residual pooling (CRP), and hierarchical supervision (HS). First, unlike traditional deconvolutional layers, DDLs are adopted to maintain the dimensions of the input and output images unchanged. The DDNs are trained in an end-to-end manner without the need of prior knowledge or complicated postprocessing procedures. Second, the CRP aims to capture rich contextual background information and to fuse multilevel features. By combining the local and global contextual information via multilevel feature fusion, the high-resolution prediction output is obtained. Third, HS is added to serve as an auxiliary loss and to refine the prediction mask. Extensive experiments based on the public ISBI 2016 and 2017 skin lesion challenge datasets demonstrate the superior segmentation results of our proposed method over the state-of-the-art methods.

IJCAI Conference 2017 Conference Paper

Effective Representing of Information Network by Variational Autoencoder

  • Hang Li
  • Haozheng Wang
  • Zhenglu Yang
  • Haochen Liu

Network representation is the basis of many applications and of extensive interest in various fields, such as information retrieval, social network analysis, and recommendation systems. Most previous methods for network representation only consider the incomplete aspects of a problem, including link structure, node information, and partial integration. The present study proposes a deep network representation model that seamlessly integrates the text information and structure of a network. Our model captures highly non-linear relationships between nodes and complex features of a network by exploiting the variational autoencoder (VAE), which is a deep unsupervised generation algorithm. We also merge the representation learned with a paragraph vector model and that learned with the VAE to obtain the network representation that preserves both structure and text information. We conduct comprehensive empirical experiments on benchmark datasets and find our model performs better than state-of-the-art techniques by a large margin.

AAAI Conference 2017 Conference Paper

Neural Machine Translation Advised by Statistical Machine Translation

  • Xing Wang
  • Zhengdong Lu
  • Zhaopeng Tu
  • Hang Li
  • Deyi Xiong
  • Min Zhang

Neural Machine Translation (NMT) is a new approach to machine translation that has made great progress in recent years. However, recent studies show that NMT generally produces fluent but inadequate translations (Tu et al. 2016b; 2016a; He et al. 2016; Tu et al. 2017). This is in contrast to conventional Statistical Machine Translation (SMT), which usually yields adequate but non-fluent translations. It is natural, therefore, to leverage the advantages of both models for better translations, and in this work we propose to incorporate SMT model into NMT framework. More specifically, at each decoding step, SMT offers additional recommendations of generated words based on the decoding information from NMT (e. g. , the generated partial translation and attention history). Then we employ an auxiliary classifier to score the SMT recommendations and a gating function to combine the SMT recommendations with NMT generations, both of which are jointly trained within the NMT architecture in an end-to-end manner. Experimental results on Chinese-English translation show that the proposed approach achieves significant and consistent improvements over state-of-the-art NMT and SMT systems on multiple NIST test sets.

AAAI Conference 2017 Conference Paper

Neural Machine Translation with Reconstruction

  • Zhaopeng Tu
  • Yang Liu
  • Lifeng Shang
  • Xiaohua Liu
  • Hang Li

Although end-to-end Neural Machine Translation (NMT) has achieved remarkable progress in the past two years, it suffers from a major drawback: translations generated by NMT systems often lack of adequacy. It has been widely observed that NMT tends to repeatedly translate some source words while mistakenly ignoring other words. To alleviate this problem, we propose a novel encoder-decoder-reconstructor framework for NMT. The reconstructor, incorporated into the NMT model, manages to reconstruct the input source sentence from the hidden layer of the output target sentence, to ensure that the information in the source side is transformed to the target side as much as possible. Experiments show that the proposed framework significantly improves the adequacy of NMT output and achieves superior translation result over state-of-theart NMT and statistical MT systems.

AAAI Conference 2016 Conference Paper

Learning to Answer Questions from Image Using Convolutional Neural Network

  • Lin Ma
  • Zhengdong Lu
  • Hang Li

In this paper, we propose to employ the convolutional neural network (CNN) for the image question answering (QA) task. Our proposed CNN provides an end-to-end framework with convolutional architectures for learning not only the image and question representations, but also their inter-modal interactions to produce the answer. More specifically, our model consists of three CNNs: one image CNN to encode the image content, one sentence CNN to compose the words of the question, and one multimodal convolution layer to learn their joint representation for the classification in the space of candidate answer words. We demonstrate the efficacy of our proposed model on the DAQUAR and COCO-QA datasets, which are two benchmark datasets for image QA, with the performances significantly outperforming the state-of-the-art.

IJCAI Conference 2016 Conference Paper

Neural Enquirer: Learning to Query Tables in Natural Language

  • Pengcheng Yin
  • Zhengdong Lu
  • Hang Li
  • Ben Kao

We propose Neural Enquirer - a neural network architecture for answering natural language (NL) questions based on a knowledge base (KB) table. Unlike existing work on end-to-end training of semantic parsers, Neural Enquirer is fully "neuralized": it finds distributed representations of queries and KB tables, and executes queries through a series of neural network components called "executors". Executors model query operations and compute intermediate execution results in the form of table annotations at different levels. Neural Enquirer can be trained with gradient descent, with which the representations of queries and the KB table are jointly optimized with the query execution logic. The training can be done in an end-to-end fashion, and it can also be carried out with stronger guidance, e. g. , step-by-step supervision for complex queries. Neural Enquirer is one step towards building neural network systems that can understand natural language in real-world tasks. As a proof-of-concept, we conduct experiments on a synthetic QA task, and demonstrate that the model can learn to execute reasonably complex NL queries on small-scale KB tables.

IJCAI Conference 2016 Conference Paper

Neural Generative Question Answering

  • Jun Yin
  • Xin Jiang
  • Zhengdong Lu
  • Lifeng Shang
  • Hang Li
  • Xiaoming Li

This paper presents an end-to-end neural network model, named Neural Generative Question Answering (GENQA), that can generate answers to simple factoid questions, based on the facts in a knowledge-base. More specifically, the model is built on the encoder-decoder framework for sequence-to-sequence learning, while equipped with the ability to enquire the knowledge-base, and is trained on a corpus of question-answer pairs, with their associated triples in the knowledge-base. Empirical study shows the proposed model can effectively deal with the variations of questions and answers, and generate right and natural answers by referring to the facts in the knowledge-base. The experiment on question answering demonstrates that the proposed model can outperform an embedding-based QA model as well as a neural dialogue model trained on the same data.

IJCAI Conference 2015 Conference Paper

Reader-Aware Multi-Document Summarization via Sparse Coding

  • Piji Li
  • Lidong Bing
  • Wai Lam
  • Hang Li
  • Yi Liao

We propose a new MDS paradigm called readeraware multi-document summarization (RA-MDS). Specifically, a set of reader comments associated with the news reports are also collected. The generated summaries from the reports for the event should be salient according to not only the reports but also the reader comments. To tackle this RA- MDS problem, we propose a sparse-coding-based method that is able to calculate the salience of the text units by jointly considering news reports and reader comments. Another reader-aware characteristic of our framework is to improve linguistic quality via entity rewriting. The rewriting consideration is jointly assessed together with other summarization requirements under a unified optimization model. To support the generation of compressive summaries via optimization, we explore a finer syntactic unit, namely, noun/verb phrase. In this work, we also generate a data set for conducting RA-MDS. Extensive experiments on this data set and some classical data sets demonstrate the effectiveness of our proposed approach.

IJCAI Conference 2015 Conference Paper

Syntax-Based Deep Matching of Short Texts

  • Mingxuan Wang
  • Zhengdong Lu
  • Hang Li
  • Qun Liu

Many tasks in natural language processing, ranging from machine translation to question answering, can be reduced to the problem of matching two sentences or more generally two short texts. We propose a new approach to the problem, called Deep Match Tree (DEEPMATCHtree), under a general setting. The approach consists of two components, 1) a mining algorithm to discover patterns for matching two short-texts, defined in the product space of dependency trees, and 2) a deep neural network for matching short texts using the mined patterns, as well as a learning algorithm to build the network having a sparse structure. We test our algorithm on the problem of matching a tweet and a response in social media, a hard matching problem proposed in [Wang et al. , 2013], and show that DEEPMATCHtree can outperform a number of competitor models including one without using dependency trees and one based on word-embedding, all with large margins.

NeurIPS Conference 2014 Conference Paper

Convolutional Neural Network Architectures for Matching Natural Language Sentences

  • Baotian Hu
  • Zhengdong Lu
  • Hang Li
  • Qingcai Chen

Semantic matching is of central importance to many natural language tasks \cite{bordes2014semantic, RetrievalQA}. A successful matching algorithm needs to adequately model the internal structures of language objects and the interaction between them. As a step toward this goal, we propose convolutional neural network models for matching two sentences, by adapting the convolutional strategy in vision and speech. The proposed models not only nicely represent the hierarchical structures of sentences with their layer-by-layer composition and pooling, but also capture the rich matching patterns at different levels. Our models are rather generic, requiring no prior knowledge on language, and can hence be applied to matching tasks of different nature and in different languages. The empirical study on a variety of matching tasks demonstrates the efficacy of the proposed model on a variety of matching tasks and its superiority to competitor models.

NeurIPS Conference 2013 Conference Paper

A Deep Architecture for Matching Short Texts

  • Zhengdong Lu
  • Hang Li

Many machine learning problems can be interpreted as learning for matching two types of objects (e. g. , images and captions, users and products, queries and documents). The matching level of two objects is usually measured as the inner product in a certain feature space, while the modeling effort focuses on mapping of objects from the original space to the feature space. This schema, although proven successful on a range of matching tasks, is insufficient for capturing the rich structure in the matching process of more complicated objects. In this paper, we propose a new deep architecture to more effectively model the complicated matching relations between two objects from heterogeneous domains. More specifically, we apply this model to matching tasks in natural language, e. g. , finding sensible responses for a tweet, or relevant answers to a given question. This new architecture naturally combines the localness and hierarchy intrinsic to the natural language problems, and therefore greatly improves upon the state-of-the-art models.

IJCAI Conference 2013 Conference Paper

An Introduction to String Re-Writing Kernel

  • Fan Bu
  • Hang Li
  • Xiaoyan Zhu

Learning for sentence re-writing is a fundamental task in natural language processing and information retrieval. In this paper, we propose a new class of kernel functions, referred to as string rewriting kernel, to address the problem. A string re-writing kernel measures the similarity between two pairs of strings. It can capture the lexical and structural similarity between sentence pairs without the need of constructing syntactic trees. We further propose an instance of string re-writing kernel which can be computed efficiently. Experimental results on benchmark datasets show that our method can achieve comparable results with state-of-the-art methods on two sentence re-writing learning tasks: paraphrase identification and recognizing textual entailment.

JMLR Journal 2013 Journal Article

Learning Bilinear Model for Matching Queries and Documents

  • Wei Wu
  • Zhengdong Lu
  • Hang Li

The task of matching data from two heterogeneous domains naturally arises in various areas such as web search, collaborative filtering, and drug design. In web search, existing work has designed relevance models to match queries and documents by exploiting either user clicks or content of queries and documents. To the best of our knowledge, however, there has been little work on principled approaches to leveraging both clicks and content to learn a matching model for search. In this paper, we propose a framework for learning to match heterogeneous objects. The framework learns two linear mappings for two objects respectively, and matches them via the dot product of their images after mapping. Moreover, when different regularizations are enforced, the framework renders a rich family of matching models. With orthonormal constraints on mapping functions, the framework subsumes Partial Least Squares (PLS) as a special case. Alternatively, with a $\ell_1$+$\ell_2$ regularization, we obtain a new model called Regularized Mapping to Latent Structures (RMLS). RMLS enjoys many advantages over PLS, including lower time complexity and easy parallelization. To further understand the matching framework, we conduct generalization analysis and apply the result to both PLS and RMLS. We apply the framework to web search and implement both PLS and RMLS using a click-through bipartite with metadata representing features of queries and documents. We test the efficacy and scalability of RMLS and PLS on large scale web search problems. The results show that both PLS and RMLS can significantly outperform baseline methods, while RMLS substantially speeds up the learning process. [abs] [ pdf ][ bib ] &copy JMLR 2013. ( edit, beta )

TIST Journal 2013 Journal Article

Mining search and browse logs for web search

  • Daxin Jiang
  • Jian Pei
  • Hang Li

Huge amounts of search log data have been accumulated at Web search engines. Currently, a popular Web search engine may receive billions of queries and collect terabytes of records about user search behavior daily. Beside search log data, huge amounts of browse log data have also been collected through client-side browser plugins. Such massive amounts of search and browse log data provide great opportunities for mining the wisdom of crowds and improving Web search. At the same time, designing effective and efficient methods to clean, process, and model log data also presents great challenges. In this survey, we focus on mining search and browse log data for Web search. We start with an introduction to search and browse log data and an overview of frequently-used data summarizations in log mining. We then elaborate how log mining applications enhance the five major components of a search engine, namely, query understanding, document understanding, document ranking, user understanding, and monitoring and feedback. For each aspect, we survey the major tasks, fundamental principles, and state-of-the-art methods.

JMLR Journal 2011 Journal Article

Learning a Robust Relevance Model for Search Using Kernel Methods

  • Wei Wu
  • Jun Xu
  • Hang Li
  • Satoshi Oyama

This paper points out that many search relevance models in information retrieval, such as the Vector Space Model, BM25 and Language Models for Information Retrieval, can be viewed as a similarity function between pairs of objects of different types, referred to as an S-function. An S-function is specifically defined as the dot product between the images of two objects in a Hilbert space mapped from two different input spaces. One advantage of taking this view is that one can take a unified and principled approach to address the issues with regard to search relevance. The paper then proposes employing a kernel method to learn a robust relevance model as an S-function, which can effectively deal with the term mismatch problem, one of the biggest challenges in search. The kernel method exploits a positive semi-definite kernel referred to as an S-kernel. The paper shows that when using an S-kernel the model learned by the kernel method is guaranteed to be an S-function. The paper then gives more general principles for constructing S-kernels. A specific implementation of the kernel method is proposed using the Ranking SVM techniques and click-through data. The proposed approach is employed to learn a relevance model as an extension of BM25, referred to as Robust BM25. Experimental results on web search and enterprise search data show that Robust BM25 significantly outperforms baseline methods and can successfully tackle the term mismatch problem. [abs] [ pdf ][ bib ] &copy JMLR 2011. ( edit, beta )

TIST Journal 2011 Journal Article

Mining Concept Sequences from Large-Scale Search Logs for Context-Aware Query Suggestion

  • Zhen Liao
  • Daxin Jiang
  • Enhong Chen
  • Jian Pei
  • Huanhuan Cao
  • Hang Li

Query suggestion plays an important role in improving usability of search engines. Although some recently proposed methods provide query suggestions by mining query patterns from search logs, none of them models the immediately preceding queries as context systematically, and uses context information effectively in query suggestions. Context-aware query suggestion is challenging in both modeling context and scaling up query suggestion using context. In this article, we propose a novel context-aware query suggestion approach. To tackle the challenges, our approach consists of two stages. In the first, offline model-learning stage, to address data sparseness, queries are summarized into concepts by clustering a click-through bipartite. A concept sequence suffix tree is then constructed from session data as a context-aware query suggestion model. In the second, online query suggestion stage, a user’s search context is captured by mapping the query sequence submitted by the user to a sequence of concepts. By looking up the context in the concept sequence suffix tree, we suggest to the user context-aware queries. We test our approach on large-scale search logs of a commercial search engine containing 4.0 billion Web queries, 5.9 billion clicks, and 1.87 billion search sessions. The experimental results clearly show that our approach outperforms three baseline methods in both coverage and quality of suggestions.

AAAI Conference 2011 Conference Paper

Multi-Task Learning in Square Integrable Space

  • Wei Wu
  • Hang Li
  • Yunhua Hu
  • Rong Jin

Several kernel based methods for multi-task learning have been proposed, which leverage relations among tasks as regularization to enhance the overall learning accuracies. These methods assume that the tasks share the same kernel, which could limit their applications because in practice different tasks may need different kernels. The main challenge of introducing multiple kernels into multiple tasks is that models from different Reproducing Kernel Hilbert Spaces (RKHSs) are not comparable, making it difficult to exploit relations among tasks. This paper addresses the challenge by formalizing the problem in the Square Integrable Space (SIS). Specially, it proposes a kernel based method which makes use of a regularization term defined in the SIS to represent task relations. We prove a new representer theorem for the proposed approach in SIS. We further derive a practical method for solving the learning problem and conduct consistency analysis of the method. We discuss the relations between our method and an existing method. We also give an SVM based implementation of our method for multi-label classification. Experiments on two real-world data sets show that the proposed method performs better than the existing method.

NeurIPS Conference 2009 Conference Paper

Ranking Measures and Loss Functions in Learning to Rank

  • Wei Chen
  • Tie-Yan Liu
  • Yanyan Lan
  • Zhi-Ming Ma
  • Hang Li

Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking function by minimizing the loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking function. In this work, we reveal the relationship between ranking measures and loss functions in learning-to-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that these loss functions are upper bounds of the measure-based ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performance, demonstrating the correctness of our analysis.

NeurIPS Conference 2009 Conference Paper

Statistical Consistency of Top-k Ranking

  • Fen Xia
  • Tie-Yan Liu
  • Hang Li

This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutation-level ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions.

NeurIPS Conference 2008 Conference Paper

Global Ranking Using Continuous Conditional Random Fields

  • Tao Qin
  • Tie-Yan Liu
  • Xu-Dong Zhang
  • De-sheng Wang
  • Hang Li

This paper studies global ranking problem by learning to rank methods. Conventional learning to rank methods are usually designed for `local ranking', in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. For many applications, this is a very loose approximation. Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i. e. , the relations are also included). This paper refers to the problem as global ranking and proposes employing a Continuous Conditional Random Fields (CRF) for conducting the learning task. The Continuous CRF model is defined as a conditional probability distribution over ranking scores of objects conditioned on the objects. It can naturally represent the content information of objects as well as the relation information between objects, necessary for global ranking. Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines.

IS Journal 2003 Journal Article

Using bilingual Web data to mine and rank translations

  • Hang Li
  • Yunbo Cao
  • Cong Li

The English Reading Wizard uses bilingual Web and local-dictionary data to help readers understand foreign languages by translating words and phrases. Methods include the expectation maximization algorithm and bilingual bootstrapping.

IS Journal 2002 Journal Article

Mining open answers in questionnaire data

  • K. Yamanishi
  • Hang Li

Surveys are important tools for marketing and for managing customer relationships; the answers to open-ended questions, in particular, often contain valuable information and provide an important basis for business decisions. The summaries that human analysts make of these open answers, however, tend to rely too much on intuition and so aren't satisfactorily reliable. Moreover, because the Web makes it so easy to take surveys and solicit comments, companies are finding themselves inundated with data from questionnaires and other sources. Handling it all manually would be not only cumbersome but also costly. Thus, devising a computer system that can automatically mine useful information from open answers has become an important issue. We have developed a survey analysis system that works on these principles. The system mines open answers through two statistical learning techniques: rule learning (which we call rule analysis) and correspondence analysis.

NeurIPS Conference 1991 Conference Paper

A Comparison of Projection Pursuit and Neural Network Regression Modeling

  • Jenq-Neng Huang
  • Hang Li
  • Martin Maechler
  • R. Martin
  • Jim Schimert

Two projection based feedforward network learning methods for model(cid: 173) free regression problems are studied and compared in this paper: one is the popular back-propagation learning (BPL); the other is the projection pursuit learning (PPL). Unlike the totally parametric BPL method, the PPL non-parametrically estimates unknown nonlinear functions sequen(cid: 173) tially (neuron-by-neuron and layer-by-Iayer) at each iteration while jointly estimating the interconnection weights. In terms of learning efficiency, both methods have comparable training speed when based on a Gauss(cid: 173) Newton optimization algorithm while the PPL is more parsimonious. In terms of learning robustness toward noise outliers, the BPL is more sensi(cid: 173) tive to the outliers.