Arrow Research search

Author name cluster

Jun Gao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

26 papers
2 author rows

Possible papers

26

AAAI Conference 2026 Conference Paper

Rethinking Flow and Diffusion Bridge Models for Speech Enhancement

  • Dahan Wang
  • Jun Gao
  • Tong Lei
  • Yuxiang Hu
  • Changbao Zhu
  • Kai Chen
  • Jing Lu

Flow matching and diffusion bridge models have emerged as leading paradigms in generative speech enhancement, modeling stochastic processes between paired noisy and clean speech signals based on principles such as flow matching, score matching, and Schrödinger bridge. In this paper, we present a framework that unifies existing flow and diffusion bridge models by interpreting them as constructions of Gaussian probability paths with varying means and variances between paired data. Furthermore, we investigate the underlying consistency between the training/inference procedures of these generative models and conventional predictive models. Our analysis reveals that each sampling step of a well-trained flow or diffusion bridge model optimized with a data prediction loss is theoretically analogous to executing predictive speech enhancement. Motivated by this insight, we introduce an enhanced bridge model that integrates an effective probability path design with key elements from predictive paradigms, including improved network architecture, tailored loss functions, and optimized training strategies. Experiments on denoising and dereverberation tasks demonstrate that the proposed method outperforms existing flow and diffusion baselines with fewer parameters and reduced computational complexity. The results also highlight that the inherently predictive nature of this generative framework imposes limitations on its achievable upper-bound performance.
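The unifying view described above, treating flow and bridge models as Gaussian probability paths between the paired signals, can be written in a generic form; the notation and the schedules $\alpha_t$, $\beta_t$, $\sigma_t$ below are illustrative placeholders, not the paper's specific choices:

```latex
% Gaussian probability path between noisy speech x_0 and clean speech x_1,
% with interpolation schedules alpha_t, beta_t and a noise schedule sigma_t:
p_t(x_t \mid x_0, x_1) = \mathcal{N}\!\left(x_t;\ \alpha_t x_1 + \beta_t x_0,\ \sigma_t^2 I\right),
\qquad \alpha_0 = 0,\ \beta_0 = 1,\ \alpha_1 = 1,\ \beta_1 = 0
% so the path starts at the noisy signal and ends at the clean one;
% particular choices of (alpha_t, beta_t, sigma_t) recover flow-matching
% and diffusion/Schrodinger-bridge variants.
```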

AAAI Conference 2026 Conference Paper

The Semantic Architect: How FEAML Bridges Structured Data and LLMs for Multi-Label Tasks

  • Wanfu Gao
  • Zebin He
  • Jun Gao

Existing feature engineering methods based on large language models (LLMs) have not yet been applied to multi-label learning tasks. They lack the ability to model complex label dependencies and are not specifically adapted to the characteristics of multi-label tasks. To address the above issues, we propose Feature Engineering Automation for Multi-Label Learning (FEAML), an automated feature engineering method for multi-label classification which leverages the code generation capabilities of LLMs. By utilizing metadata and label co-occurrence matrices, LLMs are guided to understand the relationships between data features and task objectives, based on which high-quality features are generated. The newly generated features are evaluated in terms of model accuracy to assess their effectiveness, while Pearson correlation coefficients are used to detect redundancy. FEAML further incorporates the evaluation results as feedback to drive LLMs to continuously optimize code generation in subsequent iterations. By integrating LLMs with a feedback mechanism, FEAML realizes an efficient, interpretable and self-improving feature engineering paradigm. Empirical results on various multi-label datasets demonstrate that our FEAML outperforms other feature engineering methods.
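The redundancy check the abstract mentions, Pearson correlation between a newly generated feature and the features already kept, is a standard technique and can be sketched as follows; the threshold value and helper names are illustrative, not taken from the paper:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def is_redundant(candidate, kept_features, threshold=0.9):
    """Flag a generated feature whose |r| with any kept feature exceeds the threshold."""
    return any(abs(pearson(candidate, f)) > threshold for f in kept_features)
```

A feature that is an exact linear rescaling of an existing one gets |r| = 1 and is rejected, while weakly correlated candidates pass.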

AAAI Conference 2025 Conference Paper

AIM: Let Any Multimodal Large Language Models Embrace Efficient In-Context Learning

  • Jun Gao
  • Qian Qiao
  • Tianxiang Wu
  • Zili Wang
  • Ziqiang Cao
  • Wenjie Li

In-context learning (ICL) allows Large Language Models (LLMs) to exhibit emergent abilities on downstream tasks without updating billions of parameters. However, in the area of multimodal Large Language Models (MLLMs), two problems hinder the application of multimodal ICL: (1) Most primary MLLMs are trained only on single-image datasets, making them unable to read extra multimodal demonstrations. (2) As the number of demonstrations increases, thousands of visual tokens strain hardware and degrade ICL performance. During preliminary explorations, we discovered that the inner LLM focuses more on the linguistic modality within multimodal demonstrations during generation. Therefore, we propose AIM, a general and lightweight framework that tackles these problems by Aggregating Image information of Multimodal demonstrations into the latent space of the corresponding textual labels. After aggregation, AIM substitutes each demonstration with generated fused virtual tokens whose length is reduced to that of its text. Beyond shortening the input length, AIM also upgrades MLLMs pre-trained on image-text pairs to support multimodal ICL, since the raw images from demonstrations are disregarded. Furthermore, because different demonstrations are aggregated independently, AIM maintains a Demonstration Bank (DB) to avoid repeated aggregation, which significantly boosts model efficiency. We build AIM upon QWen-VL and LLaVA-Next and evaluate it comprehensively on image captioning, VQA, and hateful speech detection. Outstanding results reveal that AIM provides an efficient and effective solution for upgrading MLLMs for multimodal ICL.

TMLR Journal 2025 Journal Article

Distributed Hierarchical Decomposition Framework for Multimodal Timeseries Prediction

  • Wei Ye
  • Prashant Khanduri
  • Jiangweizhi Peng
  • Feng Tian
  • Jun Gao
  • Jie Ding
  • Zhi-Li Zhang
  • Mingyi Hong

We consider a distributed time series forecasting problem in which multiple distributed nodes, each observing a local time series (of a potentially different modality), collaborate to make both local and global forecasts. This problem is particularly challenging because each node only observes time series generated from a subset of sources, making it difficult to exploit correlations among different streams for accurate forecasting, and because the data streams observed at each node may represent different modalities, leading to heterogeneous computational requirements among nodes. To tackle these challenges, we propose a hierarchical learning framework, consisting of multiple local models and a global model, and provide a suite of efficient training algorithms to achieve high local and global forecasting accuracy. We theoretically establish the convergence of the proposed framework and demonstrate the effectiveness of the proposed approach on several time series forecasting tasks, with the (somewhat surprising) observation that the proposed distributed models can match, or even outperform, centralized ones.

ICML Conference 2025 Conference Paper

EGPlace: An Efficient Macro Placement Method via Evolutionary Search with Greedy Repositioning Guided Mutation

  • Ji Deng
  • Zhao Li
  • Ji Zhang
  • Jun Gao

Macro placement, which involves optimizing the positions of modules, is a critical phase in modern integrated circuit design and significantly influences chip performance. The growing complexity of integrated circuits demands increasingly sophisticated placement solutions. Existing approaches have evolved along two primary paths (i.e., constructive and adjustment methods), but they face significant practical limitations that affect real-world chip design. Recent hybrid frameworks such as WireMask-EA have attempted to combine these strategies, but significant technical barriers remain, including the computational overhead of separated layout adjustment and reconstruction, which often requires complete layout rebuilding; the inefficient exploration of design spaces due to random mutation operations; and the computational complexity of mask-based construction methods, which limits scalability. To overcome these limitations, we introduce EGPlace, a novel evolutionary optimization framework that combines guided mutation strategies with efficient layout reconstruction. EGPlace introduces two key innovations: a greedy repositioning-guided mutation operator that systematically identifies and optimizes critical layout regions, and an efficient mask computation algorithm that accelerates layout evaluation. Our extensive evaluation on the ISPD2005 and Ariane RISC-V CPU benchmarks demonstrates that EGPlace reduces wirelength by 10.8% and 9.3% compared to WireMask-EA and the state-of-the-art reinforcement learning-based constructive method EfficientPlace, respectively, while achieving speedups of 7.8$\times$ and 2.8$\times$ over these methods.

IJCAI Conference 2025 Conference Paper

Graph Random Walk with Feature-Label Space Alignment: A Multi-Label Feature Selection Method

  • Wanfu Gao
  • Jun Gao
  • Qingqi Han
  • Hanlin Pan
  • Kunpeng Liu

The rapid growth in feature dimension may introduce implicit associations between features and labels in multi-label datasets, making the relationships between features and labels increasingly complex. Moreover, existing methods often adopt low-dimensional linear decomposition to explore the associations between features and labels. However, linear decomposition struggles to capture complex nonlinear associations and may lead to misalignment between the feature space and the label space. To address these two critical challenges, we propose innovative solutions. First, we design a random walk graph that integrates feature-feature, label-label, and feature-label relationships to accurately capture nonlinear and implicit indirect associations, while optimizing the latent representations of associations between features and labels after low-rank decomposition. Second, we align the variable spaces by leveraging low-dimensional representation coefficients, while preserving the manifold structure between the original high-dimensional multi-label data and the low-dimensional representation space. Extensive experiments and ablation studies conducted on seven benchmark datasets and three representative datasets using various evaluation metrics demonstrate the superiority of the proposed method.

IJCAI Conference 2025 Conference Paper

Two-Stage Feature Generation with Transformer and Reinforcement Learning

  • Wanfu Gao
  • Zengyao Man
  • Zebin He
  • Yuhao Tang
  • Jun Gao
  • Kunpeng Liu

Feature generation is a critical step in machine learning, aiming to enhance model performance by capturing complex relationships within the data and generating meaningful new features. Traditional feature generation methods heavily rely on domain expertise and manual intervention, making the process labor-intensive and challenging to adapt to different scenarios. Although automated feature generation techniques address these issues to some extent, they often face challenges such as feature redundancy, inefficiency in feature space exploration, and limited adaptability to diverse datasets and tasks. To address these problems, we propose a Two-Stage Feature Generation (TSFG) framework, which integrates a Transformer-based encoder-decoder architecture with Proximal Policy Optimization (PPO). The encoder-decoder model in TSFG leverages the Transformer’s self-attention mechanism to efficiently represent and transform features, capturing complex dependencies within the data. PPO further enhances TSFG by dynamically adjusting the feature generation strategy based on task-specific feedback, optimizing the process for improved performance and adaptability. TSFG dynamically generates high-quality feature sets, significantly improving the predictive performance of machine learning models. Experimental results demonstrate that TSFG outperforms existing state-of-the-art methods in terms of feature quality and adaptability.

EAAI Journal 2024 Journal Article

IPNet: Polarization-based Camouflaged Object Detection via dual-flow network

  • Xin Wang
  • Jiajia Ding
  • Zhao Zhang
  • Junfeng Xu
  • Jun Gao

Camouflaged Object Detection (COD) is a critical task in a variety of domains, such as medicine and military applications. The main challenge in COD is accurately detecting and extracting the concealed object from the complex background. The similarity between camouflaged objects and their background significantly reduces the accuracy of object extraction. Polarization information can provide valuable insights into the characteristics of objects with different material properties and surface roughness. It reflects the difference in polarization between the object and the background, which increases the contrast between the two and improves object detection accuracy even in complex scenes. In this paper, we propose IPNet, an efficient cross-modal fusion network that utilizes both RGB intensity and linear polarization cues to generate scene representations with high contrast. Our novel network architecture dynamically fuses RGB intensity and polarization cues using an efficient cross-modal fusion module, leveraging cross-level contextual information to achieve robust detection. For training and evaluating the proposed network, we construct a polarization-based PCOD_1200 dataset that contains 89 subclasses and 1200 samples. A comprehensive set of experiments demonstrates the effectiveness of IPNet in fusing polarization and RGB intensity information and shows that our approach outperforms state-of-the-art methods.

AAAI Conference 2023 Conference Paper

A Generative Approach for Script Event Prediction via Contrastive Fine-Tuning

  • Fangqi Zhu
  • Jun Gao
  • Changlong Yu
  • Wei Wang
  • Chen Xu
  • Xin Mu
  • Min Yang
  • Ruifeng Xu

Script event prediction aims to predict the subsequent event given the context. This requires the capability to infer the correlations between events. Recent works have attempted to improve event correlation reasoning by using pretrained language models and incorporating external knowledge (e.g., discourse relations). Though promising results have been achieved, some challenges still remain. First, the pretrained language models adopted by current works ignore event-level knowledge, resulting in an inability to capture the correlations between events well. Second, modeling correlations between events with discourse relations is limited because it can only capture explicit correlations between events with discourse markers, and cannot capture many implicit correlations. To this end, we propose a novel generative approach for this task, in which a pretrained language model is fine-tuned with an event-centric pretraining objective and predicts the next event within a generative paradigm. Specifically, we first introduce a novel event-level blank infilling strategy as the learning objective to inject event-level knowledge into the pretrained language model, and then design a likelihood-based contrastive loss for fine-tuning the generative model. Instead of using an additional prediction layer, we perform prediction by using sequence likelihoods generated by the generative model. Our approach models correlations between events in a soft way without any external knowledge. The likelihood-based prediction eliminates the need to use additional networks to make predictions and is somewhat interpretable since it scores each word in the event. Experimental results on the multi-choice narrative cloze (MCNC) task demonstrate that our approach achieves better results than other state-of-the-art baselines. Our code will be available at https://github.com/zhufq00/mcnc.

JBHI Journal 2023 Journal Article

Anatomically Guided Cross-Domain Repair and Screening for Ultrasound Fetal Biometry

  • Jun Gao
  • Qicheng Lao
  • Paul Liu
  • Huahui Yi
  • Qingbo Kang
  • Zekun Jiang
  • Xiaohu Wu
  • Kang Li

Ultrasound based estimation of fetal biometry is extensively used to diagnose prenatal abnormalities and to monitor fetal growth, for which accurate segmentation of the fetal anatomy is a crucial prerequisite. Although deep neural network-based models have achieved encouraging results on this task, inevitable distribution shifts in ultrasound images can still result in severe performance drop in real world deployment scenarios. In this article, we propose a complete ultrasound fetal examination system to deal with this troublesome problem by repairing and screening the anatomically implausible results. Our system consists of three main components: A routine segmentation network, a fetal anatomical key points guided repair network, and a shape-coding based selective screener. Guided by the anatomical key points, our repair network has stronger cross-domain repair capabilities, which can substantially improve the outputs of the segmentation network. By quantifying the distance between an arbitrary segmentation mask to its corresponding anatomical shape class, the proposed shape-coding based selective screener can then effectively reject the entire implausible results that cannot be fully repaired. Extensive experiments demonstrate that our proposed framework has strong anatomical guarantee and outperforms other methods in three different cross-domain scenarios.

JBHI Journal 2022 Journal Article

Atrial Fibrillation Detection and Atrial Fibrillation Burden Estimation via Wearables

  • Li Zhu
  • Viswam Nathan
  • Jilong Kuang
  • Jacob Kim
  • Robert Avram
  • Jeffrey Olgin
  • Jun Gao

Atrial Fibrillation (AF) is an important cardiac rhythm disorder, which if left untreated can lead to serious complications such as a stroke. AF can remain asymptomatic, and it can progressively worsen over time; it is thus a disorder that would benefit from detection and continuous monitoring with a wearable sensor. We develop an AF detection algorithm, deploy it on a smartwatch, and prospectively and comprehensively validate its performance on a real-world population that included patients diagnosed with AF. The algorithm showed a sensitivity of 87.8% and a specificity of 97.4% over every 5-minute segment of PPG evaluated. Furthermore, we introduce novel algorithm blocks and system designs to increase the time of coverage and monitor for AF even during periods of motion noise and other artifacts that would be encountered in daily-living scenarios. An average of 67.8% of the entire duration the patients wore the smartwatch produced a valid decision. Finally, we present the ability of our algorithm to function throughout the day and estimate the AF burden, a first-of-its-kind measure using a wearable sensor, showing 98% correlation with the ground truth and an average error of 6.2%.
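The sensitivity and specificity figures reported above are standard confusion-matrix quantities; a minimal sketch over binary per-segment labels (1 = AF, 0 = non-AF; variable names are mine, not the paper's):

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP rate among true AF segments;
    specificity = TN rate among true non-AF segments."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```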

NeurIPS Conference 2022 Conference Paper

GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images

  • Jun Gao
  • Tianchang Shen
  • Zian Wang
  • Wenzheng Chen
  • Kangxue Yin
  • Daiqing Li
  • Or Litany
  • Zan Gojcic

As several industries move towards modeling massive 3D virtual worlds, the need for content creation tools that can scale in terms of the quantity, quality, and diversity of 3D content is becoming evident. In our work, we aim to train performant 3D generative models that synthesize textured meshes which can be directly consumed by 3D rendering engines, and are thus immediately usable in downstream applications. Prior works on 3D generative modeling either lack geometric details, are limited in the mesh topology they can produce, typically do not support textures, or utilize neural renderers in the synthesis process, which makes their use in common 3D software non-trivial. In this work, we introduce GET3D, a Generative model that directly generates Explicit Textured 3D meshes with complex topology, rich geometric details, and high-fidelity textures. We bridge recent successes in differentiable surface modeling, differentiable rendering, and 2D Generative Adversarial Networks to train our model from 2D image collections. GET3D is able to generate high-quality 3D textured meshes, ranging from cars, chairs, animals, motorbikes and human characters to buildings, achieving significant improvements over previous methods.

NeurIPS Conference 2021 Conference Paper

Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis

  • Tianchang Shen
  • Jun Gao
  • Kangxue Yin
  • Ming-Yu Liu
  • Sanja Fidler

We introduce DMTet, a deep 3D conditional generative model that can synthesize high-resolution 3D shapes using simple user guides such as coarse voxels. It marries the merits of implicit and explicit 3D representations by leveraging a novel hybrid 3D representation. Compared to the current implicit approaches, which are trained to regress the signed distance values, DMTet directly optimizes for the reconstructed surface, which enables us to synthesize finer geometric details with fewer artifacts. Unlike deep 3D generative models that directly generate explicit representations such as meshes, our model can synthesize shapes with arbitrary topology. The core of DMTet includes a deformable tetrahedral grid that encodes a discretized signed distance function and a differentiable marching tetrahedra layer that converts the implicit signed distance representation to the explicit surface mesh representation. This combination allows joint optimization of the surface geometry and topology as well as generation of the hierarchy of subdivisions using reconstruction and adversarial losses defined explicitly on the surface mesh. Our approach significantly outperforms existing work on conditional shape synthesis from coarse voxel inputs, trained on a dataset of complex 3D animal shapes. Project page: https://nv-tlabs.github.io/DMTet/.

NeurIPS Conference 2021 Conference Paper

DIB-R++: Learning to Predict Lighting and Material with a Hybrid Differentiable Renderer

  • Wenzheng Chen
  • Joey Litalien
  • Jun Gao
  • Zian Wang
  • Clement Fuji Tsang
  • Sameh Khamis
  • Or Litany
  • Sanja Fidler

We consider the challenging problem of predicting intrinsic object properties from a single image by exploiting differentiable renderers. Many previous learning-based approaches for inverse graphics adopt rasterization-based renderers and assume naive lighting and material models, which often fail to account for non-Lambertian, specular reflections commonly observed in the wild. In this work, we propose DIB-R++, a hybrid differentiable renderer which supports these photorealistic effects by combining rasterization and ray-tracing, taking advantage of their respective strengths---speed and realism. Our renderer incorporates environmental lighting and spatially-varying material models to efficiently approximate light transport, either through direct estimation or via spherical basis functions. Compared to more advanced physics-based differentiable renderers leveraging path tracing, DIB-R++ is highly performant due to its compact and expressive shading model, which enables easy integration with learning frameworks for geometry, reflectance and lighting prediction from a single image without requiring any ground-truth. We experimentally demonstrate that our approach achieves superior material and lighting disentanglement on synthetic and real data compared to existing rasterization-based approaches and showcase several artistic applications including material editing and relighting.

ICLR Conference 2021 Conference Paper

Information Laundering for Model Privacy

  • Xinran Wang
  • Yu Xiang 0004
  • Jun Gao
  • Jie Ding 0002

In this work, we propose information laundering, a novel framework for enhancing model privacy. Unlike data privacy that concerns the protection of raw data information, model privacy aims to protect an already-learned model that is to be deployed for public use. The private model can be obtained from general learning methods, and its deployment means that it will return a deterministic or random response for a given input query. An information-laundered model consists of probabilistic components that deliberately maneuver the intended input and output for queries of the model, so the model's adversarial acquisition is less likely. Under the proposed framework, we develop an information-theoretic principle to quantify the fundamental tradeoffs between model utility and privacy leakage and derive the optimal design.

NeurIPS Conference 2020 Conference Paper

Learning Deformable Tetrahedral Meshes for 3D Reconstruction

  • Jun Gao
  • Wenzheng Chen
  • Tommy Xiang
  • Alec Jacobson
  • Morgan McGuire
  • Sanja Fidler

3D shape representations that accommodate learning-based 3D reconstruction are an open problem in machine learning and computer graphics. Previous work on neural 3D reconstruction demonstrated benefits, but also limitations, of point cloud, voxel, surface mesh, and implicit function representations. We introduce \emph{Deformable Tetrahedral Meshes} (DefTet) as a particular parameterization that utilizes volumetric tetrahedral meshes for the reconstruction problem. Unlike existing volumetric approaches, DefTet optimizes for both vertex placement and occupancy, and is differentiable with respect to standard 3D reconstruction loss functions. It is thus simultaneously high-precision, volumetric, and amenable to learning-based neural architectures. We show that it can represent arbitrary, complex topology, is both memory and computationally efficient, and can produce high-fidelity reconstructions with a significantly smaller grid size than alternative volumetric approaches. The predicted surfaces are also inherently defined as tetrahedral meshes, thus do not require post-processing. We demonstrate that DefTet matches or exceeds both the quality of the previous best approaches and the performance of the fastest ones. Our approach obtains high-quality tetrahedral meshes computed directly from noisy point clouds, and is the first to showcase high-quality 3D results using only a single image as input.

IJCAI Conference 2019 Conference Paper

A Similarity Measurement Method Based on Graph Kernel for Disconnected Graphs

  • Jun Gao
  • Jianliang Gao

Disconnected graphs are very common in the real world. However, most existing methods for graph similarity focus on connected graphs. In this paper, we propose an effective approach for measuring the similarity of disconnected graphs. By embedding connected subgraphs with a graph kernel, we obtain feature vectors in a low-dimensional space. Then, we match the subgraphs and weigh the similarity of the matched subgraphs. Finally, an intuitive example shows the feasibility of the method.
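The decomposition step described above, splitting a disconnected graph into its connected components before pairwise matching, can be sketched with plain adjacency sets. The kernel itself is abstracted as a pluggable similarity function, and the greedy size-weighted matching below is an illustrative stand-in, not the paper's exact matching scheme:

```python
from collections import deque

def connected_components(adj):
    """Connected components of an undirected graph given as {node: set(neighbors)}."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            u = queue.popleft()
            if u in comp:
                continue
            comp.add(u)
            queue.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def graph_similarity(adj_a, adj_b, subgraph_sim):
    """Greedy matching: pair each component of A with its most similar
    component of B, averaging the matched scores weighted by component size."""
    comps_a = connected_components(adj_a)
    comps_b = connected_components(adj_b)
    total = sum(len(c) for c in comps_a)
    return sum(max(subgraph_sim(ca, cb) for cb in comps_b) * len(ca) / total
               for ca in comps_a)
```

With a kernel-derived `subgraph_sim` in place of the placeholder, a disconnected graph compared against itself scores 1.0.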

IJCAI Conference 2019 Conference Paper

AddGraph: Anomaly Detection in Dynamic Graph Using Attention-based Temporal GCN

  • Li Zheng
  • Zhenpeng Li
  • Jian Li
  • Zhao Li
  • Jun Gao

Anomaly detection in dynamic graphs is critical in many application scenarios, e.g., recommender systems, but it also raises huge challenges due to the highly flexible nature of anomalies and the lack of sufficient labelled data. It is better to learn anomaly patterns by considering all possible features, including structural, content and temporal features, rather than utilizing heuristic rules over partial features. In this paper, we propose AddGraph, a general end-to-end anomalous-edge detection framework using an extended temporal GCN (Graph Convolutional Network) with an attention model, which can capture both the long-term and short-term patterns in dynamic graphs. To cope with insufficient explicit labelled data, we employ negative sampling and a margin loss to train AddGraph in a semi-supervised fashion. We conduct extensive experiments on real-world datasets and illustrate that AddGraph can significantly outperform state-of-the-art competitors in anomaly detection.
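The margin loss with negative sampling mentioned above follows the usual hinge-ranking pattern: an observed edge should score as less anomalous than a sampled (negative) edge by at least some margin. A generic sketch, with an illustrative margin value rather than the paper's setting:

```python
def margin_loss(score_existing, score_sampled, margin=0.5):
    """Hinge-style ranking loss: penalize cases where an observed
    (presumed normal) edge is not scored at least `margin` less
    anomalous than a sampled negative edge."""
    return max(0.0, margin + score_existing - score_sampled)
```

The loss is zero once the sampled edge's anomaly score exceeds the observed edge's by the margin, so training only pushes on violating pairs.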

AAAI Conference 2019 Conference Paper

Generating Multiple Diverse Responses for Short-Text Conversation

  • Jun Gao
  • Wei Bi
  • Xiaojiang Liu
  • Junhui Li
  • Shuming Shi

Neural generative models have become popular and achieved promising performance on short-text conversation tasks. They are generally trained to build a 1-to-1 mapping from the input post to its output response. However, a given post is often associated with multiple replies simultaneously in real applications. Previous research on this task mainly focuses on improving the relevance and informativeness of the top one generated response for each post. Very few works study generating multiple accurate and diverse responses for the same post. In this paper, we propose a novel response generation model, which considers a set of responses jointly and generates multiple diverse responses simultaneously. A reinforcement learning algorithm is designed to solve our model. Experiments on two short-text conversation tasks validate that the multiple responses generated by our model obtain higher quality and larger diversity compared with various state-of-the-art generative models.

NeurIPS Conference 2019 Conference Paper

Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer

  • Wenzheng Chen
  • Huan Ling
  • Jun Gao
  • Edward Smith
  • Jaakko Lehtinen
  • Alec Jacobson
  • Sanja Fidler

Many machine learning models operate on images, but ignore the fact that images are 2D projections formed by 3D geometry interacting with light, in a process called rendering. Enabling ML models to understand image formation might be key for generalization. However, due to an essential rasterization step involving discrete assignment operations, rendering pipelines are non-differentiable and thus largely inaccessible to gradient-based ML techniques. In this paper, we present DIB-R, a novel rendering framework through which gradients can be analytically computed. Key to our approach is to view rasterization as a weighted interpolation, allowing image gradients to back-propagate through various standard vertex shaders within a single framework. Our approach supports optimizing over vertex positions, colors, normals, light directions and texture coordinates, and allows us to incorporate various well-known lighting models from graphics. We showcase our approach in two ML applications: single-image 3D object prediction, and 3D textured object generation, both trained using exclusively 2D supervision.

AAAI Conference 2018 Conference Paper

ATRank: An Attention-Based User Behavior Modeling Framework for Recommendation

  • Chang Zhou
  • Jinze Bai
  • Junshuai Song
  • Xiaofei Liu
  • Zhengchao Zhao
  • Xiusi Chen
  • Jun Gao

A user can be represented by what he/she has done along the history. A common way to deal with the user modeling problem is to manually extract all kinds of aggregated features over the heterogeneous behaviors, which may fail to fully represent the data itself due to limited human instinct. Recent works usually use RNN-based methods to produce an overall embedding of a behavior sequence, which can then be exploited by downstream applications. However, this can only preserve very limited information, or aggregated memories of a person. When a downstream application needs to use the modeled user features, it may lose the integrity of specific, highly correlated behaviors of the user and introduce noise derived from unrelated behaviors. This paper proposes an attention-based user behavior modeling framework called ATRank, which we mainly use for recommendation tasks. Our model considers heterogeneous user behaviors: we project all types of behaviors into multiple latent semantic spaces, where influence can be exchanged among behaviors via self-attention. Downstream applications can then use the user behavior vectors via vanilla attention. Experiments show that ATRank achieves better performance and a faster training process. We further extend ATRank to use one unified model to predict different types of user behaviors at the same time, showing performance comparable with the highly optimized individual models.

AAAI Conference 2017 Conference Paper

Scalable Graph Embedding for Asymmetric Proximity

  • Chang Zhou
  • Yuqiong Liu
  • Xiaofei Liu
  • Zhongyi Liu
  • Jun Gao

Graph embedding methods aim to map each vertex into a low-dimensional vector space that preserves certain structural relationships among the vertices of the original graph. Recently, several works have proposed learning embeddings from sampled paths in the graph, e.g., DeepWalk, LINE, and Node2Vec. However, these methods only preserve symmetric proximities, which can be insufficient in many applications even when the underlying graph is undirected. They also lack theoretical analysis of exactly what relationships they preserve in the embedding space. In this paper, we propose an asymmetric proximity preserving (APP) graph embedding method based on random walk with restart, which captures both asymmetric and high-order similarities between node pairs. We show theoretically that our method implicitly preserves the Rooted PageRank score between any two vertices. We conduct extensive experiments on link prediction and node recommendation tasks on open-source datasets, as well as on online recommendation services at Alibaba Group, where the training graph has over 290 million vertices and 18 billion edges, showing our method to be highly scalable and effective.

IJCAI Conference 2015 Conference Paper

Saliency Detection with a Deeper Investigation of Light Field

  • Jun Zhang
  • Meng Wang
  • Jun Gao
  • Yi Wang
  • Xudong Zhang
  • Xindong Wu

Although the light field has recently been recognized as helpful for saliency detection, it has not yet been comprehensively explored. In this work, we propose a new saliency detection model using light field data. The idea behind the proposed model originates from the following observations. (1) People can distinguish regions at different depth levels by adjusting the focus of their eyes. Similarly, a light field image can generate a set of focal slices focused at different depth levels, which suggests that the background can be weighted by selecting the corresponding slice. We show that background priors encoded by light field focusness help eliminate background distraction and enhance saliency by weighting the light field contrast. (2) Regions at closer depth ranges tend to be salient, while regions far in the distance mostly belong to the background. We show that foreground objects can easily be separated from similar or cluttered backgrounds by exploiting their light field depth. Extensive evaluations on the recently introduced Light Field Saliency Dataset (LFSD) [Li et al., 2014], including studies of different light field cues and comparisons with Li et al.'s method (to our knowledge, the only reported light field saliency detection approach) and with 2D/3D state-of-the-art approaches extended with light field depth/focusness information, show that the investigated light field properties are complementary to each other, lead to improvements over 2D/3D models, and enable our approach to produce superior results compared with the state of the art.

IJCAI Conference 2011 Conference Paper

Learning to Rank under Multiple Annotators

  • Ou Wu
  • Weiming Hu
  • Jun Gao

Learning to rank has received great attention in recent years, as it plays a crucial role in information retrieval. The existing formulation of learning to rank assumes that each training sample is associated with an instance and a reliable label. In practice, however, this assumption does not necessarily hold. This study focuses on learning to rank when each training instance is labeled by multiple, possibly unreliable annotators, so that no accurate labels can be obtained. We propose two learning approaches. The first estimates the ground truth and then learns a ranking model from it. The second is a maximum likelihood approach that estimates the ground truth and learns the ranking model iteratively. Both approaches have been tested on synthetic and real-world data. The results show that the maximum likelihood approach significantly outperforms the first approach and achieves results comparable to a model learned from reliable labels. Furthermore, both approaches have been applied to ranking Web visual clutter.

IROS Conference 2009 Conference Paper

Design of a robotic fish propelled by oscillating flexible pectoral foils

  • Yueri Cai
  • Shusheng Bi
  • Lige Zhang
  • Jun Gao

This paper proposes a new method for designing a flexible biomimetic fish propelled by oscillating flexible pectoral fins. The robotic fish adopts a molded soft body. Pneumatic artificial muscles serve as the driving sources, and two ribs with distributed flexibility form the main parts of the propulsive mechanism. The leading-edge locomotion profile of the flexible pectoral fin is studied experimentally in air, and the flapping locomotion is also observed in water. Finally, experiments illustrate the effectiveness of the proposed method, showing that the robotic fish is self-propelled and, after optimization, can swim at a speed of 0.18–0.20 m/s.

TIME Conference 2006 Conference Paper

Adaptive Interpolation Algorithms for Temporal-Oriented Datasets

  • Jun Gao

Spatiotemporal datasets can be classified into two categories, temporal-oriented and spatial-oriented, depending on whether missing spatiotemporal values are closer to the values of their temporal or spatial neighbors. We present an adaptive spatiotemporal interpolation model that can estimate the missing values in both categories of spatiotemporal datasets. The key parameters of the adaptive spatiotemporal interpolation model can be adjusted based on experience