Arrow Research search

Author name cluster

Yu Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

65 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Activations as Features: Probing LLMs for Generalizable Essay Scoring Representations

  • Jinwei Chi
  • Ke Wang
  • Yu Chen
  • Xuanye Lin
  • Qiang Xu

Automated essay scoring (AES) is a challenging task in cross-prompt settings due to the diversity of scoring criteria. While previous studies have focused on the output of large language models (LLMs) to improve scoring accuracy, we believe activations from intermediate layers may also provide valuable information. To explore this possibility, we evaluated the discriminative power of LLMs’ activations in the cross-prompt essay scoring task. Specifically, we used activations to fit probes and further analyzed how different models and different LLM input content affect this discriminative power. By computing the directions of essays across various trait dimensions under different prompts, we analyzed how the evaluation perspectives of large language models vary with essay types and traits. Results show that the activations possess strong discriminative power in evaluating essay quality and that LLMs can adapt their evaluation perspectives to different traits and essay types, effectively handling the diversity of scoring criteria in cross-prompt settings.

AAAI Conference 2026 Conference Paper

AV-SSAN: Audio-Visual Selective DOA Estimation Through Explicit Multi-Band Semantic-Spatial Alignment

  • Yu Chen
  • Hongxu Zhu
  • Jiadong Wang
  • Kainan Chen
  • Xinyuan Qian

Audio-visual sound source localization (AV-SSL) estimates the position of sound sources by fusing auditory and visual cues. Current AV-SSL methodologies typically require spatially-paired audio-visual data and cannot selectively localize specific target sources. To address these limitations, we introduce Cross-Instance Audio-Visual Localization (CI-AVL), a novel task that localizes target sound sources using visual prompts from different instances of the same semantic class. CI-AVL enables selective localization without spatially paired data. To solve this task, we propose AV-SSAN, a semantic-spatial alignment framework centered on a Multi-Band Semantic-Spatial Alignment Network (MB-SSA Net). MB-SSA Net decomposes the audio spectrogram into multiple frequency bands, aligns each band with semantic visual prompts, and refines spatial cues to estimate the direction-of-arrival (DoA). To facilitate this research, we construct VGGSound-SSL, a large-scale dataset comprising 13,981 spatial audio clips across 296 categories, each paired with visual prompts. AV-SSAN achieves a mean absolute error of 16.59° and an accuracy of 71.29%, significantly outperforming existing AV-SSL methods.

ICML Conference 2025 Conference Paper

A Chaotic Dynamics Framework Inspired by Dorsal Stream for Event Signal Processing

  • Yu Chen
  • Jing Lian 0001
  • Zhaofei Yu
  • Jizhao Liu
  • Jisheng Dang
  • Gang Wang 0031

Event cameras are bio-inspired vision sensors that encode visual information with high dynamic range, high temporal resolution, and low latency. Current state-of-the-art event stream processing methods rely on end-to-end deep learning techniques. However, these models are heavily dependent on data structures, limiting their stability and generalization capabilities across tasks, thereby hindering their deployment in real-world scenarios. To address this issue, we propose a chaotic dynamics event signal processing framework inspired by the dorsal visual pathway of the brain. Specifically, we utilize a Continuous-coupled Neural Network (CCNN) to encode the event stream. The CCNN encodes polarity-invariant event sequences as periodic signals and polarity-changing event sequences as chaotic signals. We then use continuous wavelet transforms to analyze the dynamical states of CCNN neurons and establish the high-order mappings of the event stream. The effectiveness of our method is validated through integration with conventional classification networks, achieving state-of-the-art classification accuracy on the N-Caltech101 and N-CARS datasets, with results of 84.3% and 99.9%, respectively. Our method improves the accuracy of event camera-based object classification while significantly enhancing the generalization and stability of event representation.

AAAI Conference 2025 Conference Paper

A Pioneering Neural Network Method for Efficient and Robust Fuel Sloshing Simulation in Aircraft

  • Yu Chen
  • Shuai Zheng
  • Nianyi Wang
  • Menglong Jin
  • Yan Chang

Simulating fuel sloshing within aircraft tanks during flight is crucial for aircraft safety research. Traditional methods based on Navier-Stokes equations are computationally expensive. In this paper, we treat fluid motion as point cloud transformation and propose the first neural network method specifically designed for simulating fuel sloshing in aircraft. This model is also the first deep learning model capable of stably modeling fluid particle dynamics in such complex scenarios. Our triangle feature fusion design achieves an optimal balance among fluid dynamics modeling, momentum conservation constraints, and global stability control. Additionally, we constructed the Fueltank dataset, the first dataset for aircraft fuel surface sloshing. It comprises 320,000 frames across four typical tank types and covers a wide range of flight maneuvers, including multi-directional rotations. We conducted comprehensive experiments on both our dataset and the take-off scenario of the aircraft. Compared to existing neural network-based fluid simulation algorithms, we significantly enhanced accuracy while maintaining high computational speed. Compared to traditional SPH methods, our speed improved approximately 10 times. Furthermore, compared to traditional fluid simulation software such as Flow3D, our computation speed increased by more than 300 times.

NeurIPS Conference 2025 Conference Paper

ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data

  • Xiaoyang Liu
  • Kangjie Bao
  • Jiashuo Zhang
  • Yunqi Liu
  • Yu Chen
  • Yuntian Liu
  • Yang Jiao
  • Tao Luo

Autoformalization, the automatic translation of mathematical content from natural language into machine-verifiable formal languages, has seen significant progress driven by advances in large language models (LLMs). Nonetheless, a primary barrier to further improvements is the limited availability of parallel corpora that map informal mathematical text to its formal counterpart. To address this limitation, we propose ATLAS (Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data), a novel data generation framework designed to produce large-scale, high-quality parallel corpora of theorem statements. Distinct from prior approaches, ATLAS begins with a concept repository, accelerates the improvement of the student model through expert iteration combined with knowledge distillation, and introduces two novel augmentation strategies that exploit the structural characteristics of formal languages. Running the proposed ATLAS framework for 10 iterations, we construct an undergraduate-level dataset of 117k theorem statements and develop the ATLAS Translator by fine-tuning Llama3.1-8B-Instruct with LoRA. This model establishes a new state of the art, demonstrating statistically significant improvements over both the Herald Translator and the Kimina-Autoformalizer across all benchmarks (p<0.05, two-sided t-test). Furthermore, we demonstrate that the full-parameter fine-tuning of a stronger base model on the ATLAS dataset leads to superior performance. The datasets, model, and code are available at https://github.com/XiaoyangLiu-sjtu/ATLAS.

NeurIPS Conference 2025 Conference Paper

Confidence-Aware With Prototype Alignment for Partial Multi-label Learning

  • Weijun Lv
  • Yu Chen
  • Xiaozhao Fang
  • Xuhuan Zhu
  • Jie Wen
  • Guoxu Zhou
  • Sixian Chan

Label prototype learning has emerged as an effective paradigm in Partial Multi-Label Learning (PML), providing a distinctive framework for modeling structured representations of label semantics while naturally filtering noise through prototype-based label confidence estimation. However, existing prototype-based methods face a critical limitation: class prototypes are biased estimates due to noisy candidate labels, particularly when positive samples are scarce. To this end, we first propose a mutual class prototype alignment strategy that bypasses noise interference by introducing two different transformation matrices, so that the class prototypes learned by fuzzy clustering and those derived from the candidate label set are aligned with each other and correct one another. This alignment is in turn passed on to the fuzzy membership labels. In addition, to eliminate noise interference in the candidate label set during classifier learning, we use the learned permutation matrix to transform the fuzzy membership labels and, together with the candidate label set, learn a label reliability indicator matrix. As a result, the label reliability indicator matrix contains no values at non-candidate label positions and excludes incorrect labels as far as possible. The resulting indicator matrix guides a robust multi-label classifier training process, jointly optimizing label confidence and classifier parameters. Extensive experiments demonstrate that our proposed model exhibits significant performance advantages over state-of-the-art PML approaches.

NeurIPS Conference 2025 Conference Paper

Deep Gaussian from Motion: Exploring 3D Geometric Foundation Models for Gaussian Splatting

  • Yu Chen
  • Rolandos Alexandros Potamias
  • Evangelos Ververas
  • Jifei Song
  • Jiankang Deng
  • Gim Hee Lee

Neural radiance fields (NeRF) and 3D Gaussian Splatting (3DGS) are popular techniques to reconstruct and render photorealistic images. However, the prerequisite of running Structure-from-Motion (SfM) to get camera poses limits their completeness. Although previous methods can reconstruct a few unposed images, they are not applicable when images are unordered or densely captured. In this work, we propose a method to train 3DGS from unposed images. Our method leverages a pre-trained 3D geometric foundation model as the neural scene representation. Since the accuracy of the predicted pointmaps does not suffice for accurate image registration and high-fidelity image rendering, we propose to mitigate the issue by initializing and fine-tuning the pre-trained model from a seed image. The images are then progressively registered and added to the training buffer, which is used to train the model further. We also propose to refine the camera poses and pointmaps by minimizing a point-to-camera ray consistency loss across multiple views. When evaluated on diverse challenging datasets, our method outperforms state-of-the-art pose-free NeRF/3DGS methods in terms of both camera pose accuracy and novel view synthesis, and even renders higher fidelity images than 3DGS trained with COLMAP poses.

NeurIPS Conference 2025 Conference Paper

FlowPrune: Accelerating Attention Flow Calculation by Pruning Flow Network

  • Shuo Xu
  • Yu Chen
  • Shuxia Lin
  • Xin Geng
  • Xu Yang

The Transformer architecture serves as the foundation of modern AI systems, powering recent advances in Large Language Models (LLMs) and Large Multimodal Models (LMMs). Central to these models, attention mechanisms capture contextual dependencies via token interactions. Beyond inference, attention has been widely adopted for interpretability, offering insights into model behavior. Among interpretability techniques, attention flow, which traces global information transfer across layers, provides a more comprehensive perspective than single-layer attention maps. However, computing attention flow is computationally intensive due to the high complexity of max-flow algorithms. To address this challenge, we introduce FlowPrune, a novel framework that accelerates attention flow analysis by pruning the attention graph before applying max-flow computations. FlowPrune uses the Max-Flow Min-Cut Theorem and two structural properties of the Transformer to identify and eliminate non-critical graph regions. It comprises two components: Edge Pruning, which removes insignificant attention edges, and Layer Compression, which discards layers with minimal contributions to the flow. We conduct extensive experiments on LLaMA and LLaVA to evaluate the robustness and effectiveness of FlowPrune. Our results show that FlowPrune achieves high agreement with the original attention flow in both absolute and relative error metrics, as well as in identifying influential input tokens. Finally, case studies in both NLP and vision domains demonstrate that FlowPrune produces interpretability outcomes consistent with the original Attention Flow, validating its practical utility. The code for this paper is publicly available.

ICRA Conference 2025 Conference Paper

JORD: A Benchmark Dataset for Off-Road LiDAR Place Recognition and SLAM

  • Wei Zhou 0051
  • Tongzhou Zhang 0001
  • Qian Xu 0016
  • Yu Chen
  • Minghui Hou
  • Gang Wang 0013

Simultaneous localization and mapping (SLAM) is a crucial component of unmanned systems, playing a key role in autonomous navigation. Currently, most LiDAR SLAM methods are focused on structured environments. However, highly irregular off-road terrain poses more challenges for LiDAR SLAM tasks, and these environments are not fully represented in existing datasets. To address this issue, we introduce the first dedicated LiDAR SLAM benchmark dataset for off-road environments, named Jlurobot Off-Road Dataset (JORD). This dataset is collected using a custom avenger data collection platform in large-scale forest off-road scenes, consisting of 8 LiDAR sequences with a total length of approximately 6.07 kilometers, containing 49,144 point cloud frames along with accurate 6DoF ground truth. The dataset includes multiple revisits within the sequences, making it suitable for LiDAR place recognition and SLAM tasks. Furthermore, we employ several state-of-the-art methods for benchmarking to validate the dataset's challenges. The release of JORD aims to provide researchers with valuable resources to develop new approaches and explore novel directions for unmanned systems in off-road environments. The complete dataset and code are available at https://github.com/jiurobots/JORD.

ICRA Conference 2025 Conference Paper

Propagative Distance Optimization for Motion Planning

  • Yu Chen
  • Jinyun Xu
  • Yilin Cai
  • Ting-Wei Wong
  • Zhongqiang Ren
  • Howie Choset
  • Guanya Shi

This paper focuses on the motion planning problem for serial articulated robots with revolute joints under kinematic constraints. Many motion planners leverage iterative local optimization methods but are often trapped in local minima due to non-convexity of the problem. A key reason for the non-convexity is the trigonometric term when parameterizing the kinematics using joint angles. Recent distance-based formulations can eliminate these trigonometric terms by formulating the kinematics based on distances, and have shown superior performance against classic joint angle based formulations in domains like inverse kinematics (IK). However, distance-based kinematics formulations have not yet been studied for motion planning, and naively applying them for motion planning may lead to poor computational efficiency. In particular, IK seeks one configuration while motion planning seeks a sequence of configurations, which greatly increases the scale of the underlying optimization problem. This paper proposes Propagative Distance Optimization for Motion Planning (PDOMP), which addresses the challenge by (i) introducing a new compact representation that reduces the number of variables in the distance-based formulation, and (ii) leveraging the chain structure to efficiently compute forward kinematics and Jacobians of the robot among waypoints along a path. Test results show that PDOMP runs up to 10 times faster than the sampling-based and angle-based-optimization baseline methods.

IJCAI Conference 2025 Conference Paper

Pseudo-Label Reconstruction for Partial Multi-Label Learning

  • Yu Chen
  • Fang Li
  • Na Han
  • Guanbin Li
  • Hongbo Gao
  • Sixian Chan
  • Xiaozhao Fang

In Partial Multi-Label Learning (PML), each instance is associated with a candidate label set containing multiple relevant labels along with other false positive labels. Currently, most PML methods directly extract instance correlation from instance features while ignoring the candidate labels, which may contain more discriminative instance-related information. This paper argues that, with a well-designed model, more accurate instance correlation can be mined from the candidate labels to facilitate label disambiguation. To this end, we propose a novel PML method based on pseudo-label reconstruction (PML-PLR). Specifically, we first propose a novel orthogonal candidate label reconstruction method, which is jointly optimized with instance features to extract more consistent instance correlation. Then, we use the instance correlation as reconstruction coefficients to reconstruct pseudo-labels. Subsequently, through local manifold learning, the reconstructed pseudo-labels are leveraged to propagate the consistency relationship between labels and instances, thereby improving the accuracy of pseudo-labels. Extensive experiments and analyses demonstrate that the proposed PML-PLR outperforms state-of-the-art methods.

NeurIPS Conference 2024 Conference Paper

DOGS: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus

  • Yu Chen
  • Gim Hee Lee

The recent advances in 3D Gaussian Splatting (3DGS) show promising results on the novel view synthesis (NVS) task. With its superior rendering performance and high-fidelity rendering quality, 3DGS outperforms its NeRF predecessors. Most recent 3DGS methods focus either on stabilizing rendering efficiency or on reducing the model size. On the other hand, the training efficiency of 3DGS on large-scale scenes has not gained much attention. In this work, we propose DoGaussian, a method that trains 3DGS in a distributed manner. Our method first decomposes a scene into $K$ blocks and then introduces the Alternating Direction Method of Multipliers (ADMM) into the training procedure of 3DGS. During training, our DoGaussian maintains one global 3DGS model on the master node and $K$ local 3DGS models on the slave nodes. The $K$ local 3DGS models are dropped after training and we only query the global 3DGS model during inference. The training time is reduced by scene decomposition, and the training convergence and stability are guaranteed through the consensus on the shared 3D Gaussians. Our method accelerates the training of 3DGS by $6+$ times when evaluated on large-scale scenes while concurrently achieving state-of-the-art rendering quality. Our code is publicly available at https://github.com/AIBluefisher/DOGS.
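
The consensus idea behind the ADMM step can be illustrated on a toy problem. The sketch below is not the DoGaussian training procedure: it assumes each of K workers holds a scalar quadratic objective (x_k - a_k)^2 in place of a local 3DGS model, and the names `consensus_admm`, `targets`, and `rho` are hypothetical. It only shows how local copies, a global variable, and scaled dual variables are updated in turn until the local copies agree.

```python
def consensus_admm(targets, rho=1.0, iterations=100):
    """Toy consensus ADMM: minimize sum_k (x_k - a_k)^2 subject to x_k = z."""
    k = len(targets)
    local = [0.0] * k   # local copies x_k, one per worker
    dual = [0.0] * k    # scaled dual variables u_k
    z = 0.0             # global consensus variable
    for _ in range(iterations):
        # Local step: closed-form minimizer of
        # (x - a_k)^2 + (rho / 2) * (x - z + u_k)^2 for each worker.
        local = [(2.0 * a + rho * (z - u)) / (2.0 + rho)
                 for a, u in zip(targets, dual)]
        # Global step: average the local copies plus their duals.
        z = sum(x + u for x, u in zip(local, dual)) / k
        # Dual step: accumulate the remaining disagreement with z.
        dual = [u + x - z for x, u in zip(local, dual)]
    return z, local


if __name__ == "__main__":
    z, local = consensus_admm([1.0, 2.0, 6.0])
    print(z, local)
```

For this toy objective the consensus value is the mean of the targets, and after training only the global variable `z` would be kept, mirroring how the abstract describes dropping the local models.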

NeurIPS Conference 2024 Conference Paper

GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation

  • Chubin Zhang
  • Hongliang Song
  • Yi Wei
  • Yu Chen
  • Jiwen Lu
  • Yansong Tang

In this work, we introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory. Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images. This limits these methods to a low-resolution representation and makes it difficult to scale up to the dense views for better quality. GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms to effectively integrate image features into 3D representations. We implement this solution through a two-stage pipeline: initially, a lightweight proposal network generates a sparse set of 3D anchor points from the posed image inputs; subsequently, a specialized reconstruction transformer refines the geometry and retrieves textural details. Extensive experimental results demonstrate that GeoLRM significantly outperforms existing models, especially for dense view inputs. We also demonstrate the practical applicability of our model with 3D generation tasks, showcasing its versatility and potential for broader adoption in real-world applications. The project page: https://linshan-bin.github.io/GeoLRM/.

NeurIPS Conference 2024 Conference Paper

MeLLoC: Lossless Compression with High-order Mechanism Learning

  • Xinyue Luo
  • Jin Cheng
  • Yu Chen

Lossless compression of large-scale scientific floating-point data is critical yet challenging due to the presence of high-order information and noise that arises from model truncation and discretization errors. Existing entropy coding techniques fail to effectively leverage the mechanisms underlying the data generation process. This paper introduces MeLLoC (Mechanism Learning for Lossless Compression), a novel approach that combines high-order mechanism learning with classical encoding to enhance lossless compression for scientific data. The key idea is to treat the data as discrete samples from an underlying physical field described by differential equations and solve an inverse problem to identify the governing equation coefficients exhibiting more compressible numeric representations. Periodic extension techniques are employed to accelerate the decompression. Through extensive experiments on various scientific datasets, MeLLoC consistently outperforms state-of-the-art lossless compressors while offering compelling trade-offs between compression ratios and computational costs. This work opens up new avenues for exploiting domain knowledge and high-order information to improve data compression in scientific computing.

AAAI Conference 2024 Conference Paper

Multi-Constellation-Inspired Single-Shot Global LiDAR Localization

  • Tongzhou Zhang
  • Gang Wang
  • Yu Chen
  • Hai Zhang
  • Jue Hu

Global localization is a challenging task for intelligent robots, as its accuracy directly contributes to the performance of downstream navigation and planning tasks. However, existing literature focuses more on place retrieval and the success rate of localization, with limited attention given to the metrics of position estimation. In this paper, a single-shot global LiDAR localization method is proposed with the ultimate goal of achieving high position accuracy, inspired by the positioning approach of multi-constellation localization systems. Initially, we perform coarse localization using global descriptors and select observation points along with their corresponding coordinates based on the obtained coarse localization results. Coordinates can be acquired from a pre-built map, GNSS, or other devices. Then, a lightweight LiDAR odometry method is designed to estimate the distance between the retrieved data and the observation points. Ultimately, the localization problem is transformed into an optimization problem of solving a system of multiple sphere equations. The experimental results on the KITTI dataset and the self-collected dataset demonstrate that our method achieves an average localization error (including errors in the z-axis) of 0.89 meters. In addition, it achieves retrieval efficiency of 0.357 s per frame on the former dataset and 0.214 s per frame on the latter one. Code and data are available at https://github.com/jlurobot/multi-constellation-localization.
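
A system of sphere equations of the kind mentioned above can be solved in closed form in the simplest case. The sketch below is an illustration under stated assumptions, not the authors' optimizer: it assumes exactly four non-coplanar observation points with noise-free distances, and the function names `solve_linear_3x3` and `locate_from_spheres` are hypothetical. Subtracting the first equation |p - c_0|^2 = d_0^2 from the others removes the quadratic term and leaves a 3x3 linear system in the unknown position p.

```python
import math


def solve_linear_3x3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("observation points are degenerate (e.g. coplanar)")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(rows[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (rows[r][n] - tail) / rows[r][r]
    return solution


def locate_from_spheres(centers, distances):
    """Estimate a 3D position p from four sphere equations |p - c_i| = d_i."""
    c0, d0 = centers[0], distances[0]
    norm0 = sum(v * v for v in c0)
    matrix, rhs = [], []
    for ci, di in zip(centers[1:4], distances[1:4]):
        # 2 (c_i - c_0) . p = |c_i|^2 - |c_0|^2 - d_i^2 + d_0^2
        matrix.append([2.0 * (a - b) for a, b in zip(ci, c0)])
        rhs.append(sum(v * v for v in ci) - norm0 - di * di + d0 * d0)
    return solve_linear_3x3(matrix, rhs)


if __name__ == "__main__":
    centers = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
    truth = (1.0, 2.0, 3.0)
    distances = [math.dist(c, truth) for c in centers]
    print(locate_from_spheres(centers, distances))
```

With noisy distances or more than four observation points, the same linearized equations would typically be stacked and solved in a least-squares sense rather than exactly.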

TMLR Journal 2024 Journal Article

On the Equivalence of Graph Convolution and Mixup

  • Xiaotian Han
  • Hanqing Zeng
  • Yu Chen
  • Shaoliang Nie
  • Jingzhou Liu
  • Kanika Narang
  • Zahra Shakeri
  • Karthik Abinav Sankararaman

This paper investigates the relationship between graph convolution and Mixup techniques. Graph convolution in a graph neural network involves aggregating features from neighboring samples to learn representative features for a specific node or sample. On the other hand, Mixup is a data augmentation technique that generates new examples by averaging features and one-hot labels from multiple samples. One commonality between these techniques is their utilization of information from multiple samples to derive feature representation. This study aims to explore whether a connection exists between the two. Our investigation reveals that, under two mild modifications, graph convolution can be viewed as a specialized form of Mixup that is applied during both the training and testing phases. The two modifications are 1) Homophily Relabel: assigning the target node's label to all its neighbors, and 2) Test-Time Mixup: applying Mixup to the features at test time. We establish this equivalence mathematically by demonstrating that graph convolution networks and simplified graph convolution can be expressed as a form of Mixup. We also empirically verify the equivalence by training an MLP using the two modifications to achieve comparable performance.
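
The feature-side correspondence described above can be seen on a two-node example. The sketch below is a minimal illustration, not the paper's derivation: it assumes an unweighted mean aggregation with a self-loop in place of a normalized graph convolution, and the names `mixup` and `mean_aggregate` are hypothetical.

```python
def mixup(x_a, y_a, x_b, y_b, lam):
    """Two-sample Mixup: convex combination of features and one-hot labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def mean_aggregate(features, neighbors, node):
    """One mean-aggregation step for `node`, with the node itself included."""
    group = [node] + list(neighbors[node])
    dim = len(features[node])
    return [sum(features[j][d] for j in group) / len(group) for d in range(dim)]


if __name__ == "__main__":
    features = {0: [1.0, 0.0], 1: [0.0, 1.0]}
    neighbors = {0: [1], 1: [0]}
    aggregated = mean_aggregate(features, neighbors, 0)
    mixed_x, mixed_y = mixup(features[0], [1.0, 0.0], features[1], [0.0, 1.0], 0.5)
    print(aggregated, mixed_x, mixed_y)
```

Here the aggregated feature of node 0 coincides with the Mixup feature at lam = 0.5. The mixed label differs from node 0's own label unless the neighbor is given the target node's label, which is the role the abstract assigns to Homophily Relabel.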

UAI Conference 2024 Conference Paper

Reflected Schrödinger Bridge for Constrained Generative Modeling

  • Wei Deng 0002
  • Yu Chen
  • Nicole Tianjiao Yang
  • Hengrong Du
  • Qi Feng 0005
  • Ricky T. Q. Chen

Diffusion models have become the go-to method for large-scale generative models in real-world applications. These applications often involve data distributions confined within bounded domains, typically requiring ad-hoc thresholding techniques for boundary enforcement. Reflected diffusion models aim to enhance generalizability by generating the data distribution through a backward process governed by reflected Brownian motion. However, reflected diffusion models may not easily adapt to diverse domains without the derivation of proper diffeomorphic mappings and do not guarantee optimal transport properties. To overcome these limitations, we introduce the Reflected Schrödinger Bridge algorithm, an entropy-regularized optimal transport approach tailored for generating data within diverse bounded domains. We derive elegant reflected forward-backward stochastic differential equations with Neumann and Robin boundary conditions, extend divergence-based likelihood training to bounded domains, and explore natural connections to entropic optimal transport for the study of approximate linear convergence, a valuable insight for practical training. Our algorithm yields robust generative modeling in diverse domains, and its scalability is demonstrated in real-world constrained generative modeling through standard image benchmarks.

ICRA Conference 2024 Conference Paper

Safe and Individualized Motion Planning for Upper-limb Exoskeleton Robots Using Human Demonstration and Interactive Learning

  • Yu Chen
  • Gong Chen 0001
  • Jing Ye 0005
  • Xiangjun Qiu
  • Xiang Li 0009

A typical application of upper-limb exoskeleton robots is deployment in rehabilitation training, helping patients to regain manipulative abilities. However, as the patient is not always capable of following the robot, safety issues may arise during the training. Because patients differ from one another, an individualized scheme is also important to ensure that the robot suits the specific conditions (e.g., movement habits) of a patient, hence guaranteeing effectiveness. To fulfill this requirement, this paper proposes a new motion planning scheme for upper-limb exoskeleton robots, which drives the robot to provide customized, safe, and individualized assistance using both human demonstration and interactive learning. Specifically, the robot first learns from a group of healthy subjects to generate a reference motion trajectory via probabilistic movement primitives (ProMP). It then learns from the patient during the training process to further shape the trajectory inside a moving safe region. The interactive data is fed back into the ProMP iteratively to enhance the individualized features for as long as the training process continues. The robot tracks the individualized trajectory under a variable impedance model to realize the assistance. Finally, the experimental results are presented in this paper to validate the proposed control scheme.

ICML Conference 2024 Conference Paper

Variational Schrödinger Diffusion Models

  • Wei Deng 0002
  • Weijian Luo
  • Yixin Tan
  • Marin Bilos
  • Yu Chen
  • Yuriy Nevmyvaka
  • Ricky T. Q. Chen

Schrödinger bridge (SB) has emerged as the go-to method for optimizing transportation plans in diffusion models. However, SB requires estimating the intractable forward score functions, inevitably resulting in the (costly) implicit training loss based on simulated trajectories. To improve the scalability while preserving efficient transportation plans, we leverage variational inference to linearize the forward score functions (variational scores) of SB and restore simulation-free properties in training backward scores. We propose the variational Schrödinger diffusion model (VSDM), where the forward process is a multivariate diffusion and the variational scores are adaptively optimized for efficient transport. Theoretically, we use stochastic approximation to prove the convergence of the variational scores and show the convergence of the adaptively generated samples based on the optimal variational scores. Empirically, we test the algorithm in simulated examples and observe that VSDM is efficient in generations of anisotropic shapes and yields straighter sample trajectories compared to the single-variate diffusion. We also verify the scalability of the algorithm in real-world data and achieve competitive unconditional generation performance in CIFAR10 and conditional generation in time series modeling. Notably, VSDM no longer depends on warm-up initializations required by SB.

ICRA Conference 2023 Conference Paper

AdaSfM: From Coarse Global to Fine Incremental Adaptive Structure from Motion

  • Yu Chen
  • Zihao Yu
  • Shu Song
  • Tianning Yu
  • Jianming Li
  • Gim Hee Lee

Despite the impressive results achieved by many existing Structure from Motion (SfM) approaches, there is still a need to improve the robustness, accuracy, and efficiency on large-scale scenes with many outlier matches and sparse view graphs. In this paper, we propose AdaSfM: a coarse-to-fine adaptive SfM approach that is scalable to large-scale and challenging datasets. Our approach first does a coarse global SfM which improves the reliability of the view graph by leveraging measurements from low-cost sensors such as Inertial Measurement Units (IMUs) and wheel encoders. Subsequently, the view graph is divided into sub-scenes that are refined in parallel by a fine local incremental SfM regularised by the result from the coarse global SfM to improve the camera registration accuracy and alleviate scene drifts. Finally, our approach uses a threshold-adaptive strategy to align all local reconstructions to the coordinate frame of global SfM. Extensive experiments on large-scale benchmark datasets show that our approach achieves state-of-the-art accuracy and efficiency.

UAI Conference 2023 Conference Paper

Detection of Short-Term Temporal Dependencies in Hawkes Processes with Heterogeneous Background Dynamics

  • Yu Chen
  • Fengpei Li
  • Anderson Schneider
  • Yuriy Nevmyvaka
  • Asohan Amarasingham
  • Henry Lam

Many kinds of simultaneously-observed event sequences exhibit mutually exciting or inhibiting patterns. Reliable detection of such temporal dependencies is crucial for scientific investigation. A common model is the Multivariate Hawkes Process (MHP), whose impact function naturally encodes a causal structure in Granger causality. However, the vast majority of existing methods use a transformed standard MHP intensity with a constant baseline, which may be inconsistent with real-world data. On the other hand, modeling irregular and unknown background dynamics directly is a challenge, as one struggles to distinguish the effect of mutual interaction from that of fluctuations in background dynamics. In this paper, we address the short-term temporal dependency detection issue. We show that maximum likelihood estimation (MLE) for cross-impact from MHP has an error that cannot be eliminated, but may be reduced by an order of magnitude using a heterogeneous intensity not for the target HP but for the interacting HP. Then we propose a robust and computationally-efficient modification of MLE that does not rely on the prior estimation of the heterogeneous intensity and is thus applicable in a data-limited regime (e.g., few-shot, unrepeated observations). Extensive experiments on various datasets show that our method outperforms existing ones by notable margins, with highlighted novel applications in neuroscience.

UAI Conference 2023 Conference Paper

Inference and sampling of point processes from diffusion excursions

  • Ali Hasan
  • Yu Chen
  • Yuting Ng
  • Mohamed Abdelghani
  • Anderson Schneider
  • Vahid Tarokh

Point processes often have a natural interpretation with respect to a continuous process. We propose a point process construction that describes arrival time observations in terms of the state of a latent diffusion process. In this framework, we relate the return times of a diffusion in a continuous path space to new arrivals of the point process. This leads to a continuous sample path that is used to describe the underlying mechanism generating the arrival distribution. These models arise in many disciplines, such as financial settings where actions in a market are determined by a hidden continuous price or in neuroscience where a latent stimulus generates spike trains. Based on the developments in Itô’s excursion theory, we propose methods for inferring and sampling from the point process derived from the latent diffusion process. We illustrate the approach with numerical examples using both simulated and real data. The proposed methods and framework provide a basis for interpreting point processes through the lens of diffusions.

UAI Conference 2023 Conference Paper

Information theoretic clustering via divergence maximization among clusters

  • Sahil Garg
  • Mina Dalirrooyfard
  • Anderson Schneider
  • Yeshaya Adler
  • Yuriy Nevmyvaka
  • Yu Chen
  • Fengpei Li
  • Guillermo A. Cecchi

Information-theoretic clustering is one of the most promising and principled approaches to finding clusters with minimal a priori assumptions. The key criterion therein is to maximize the mutual information between the data points and their cluster labels. Such an approach, however, does not explicitly promote any type of inter-cluster behavior. We instead propose to maximize the Kullback-Leibler divergence between the underlying data distributions associated with clusters (referred to as cluster distributions). We show it to entail the mutual information criterion along with maximizing cross entropy between the cluster distributions. For practical efficiency, we propose to empirically estimate the objective of KL-D between clusters in its dual form leveraging deep neural nets as a dual function approximator. Remarkably, our theoretical analysis establishes that estimating the divergence measure in its dual form simplifies the problem of clustering to one of optimally finding k-1 cut points for k clusters in the 1-D dual functional space. Overall, our approach enables linear-time clustering algorithms with theoretical guarantees of near-optimality, owing to the submodularity of the objective. We show the empirical superiority of our approach w.r.t. current state-of-the-art methods on the challenging task of clustering noisy time series as observed in domains such as neuroscience, healthcare, financial markets, spatio-temporal environmental dynamics, etc.

ICRA Conference 2023 Conference Paper

Multi-Modal Learning and Relaxation of Physical Conflict for an Exoskeleton Robot with Proprioceptive Perception

  • Xuan Zhang
  • Yana Shu
  • Yu Chen
  • Gong Chen 0001
  • Jing Ye 0005
  • Xiu Li 0001
  • Xiang Li 0009

Exoskeleton robots provide assistive forces to suit the human subject via physical human-robot interaction. During the closely-coupled interaction, a mismatch between the wearer and the robot may result in physical conflict, which could affect assistance efficiency or even compromise safety. Therefore, such conflicts should be accurately detected and then properly relaxed by adjusting the robot's action. This paper proposes a new learning scheme to detect physical conflicts between humans and robots. The constructed learning network receives multi-modal information from proprioceptive sensors and then outputs an anomaly score to specify the physical conflict; this score is further used to continuously adjust the robot impedance to ensure a safe and efficient interaction. Such a formulation allows the robot to explore the semantic information during the interaction (e.g., gait phases, imbalance, human fatigue) and hence react properly to the physical conflict. Experimental results and comparative studies on a lower-limb exoskeleton robot are presented to illustrate that the proposed learning scheme can deal with physical conflicts in a faster and more accurate manner.

ICML Conference 2023 Conference Paper

Provably Convergent Schrödinger Bridge with Applications to Probabilistic Time Series Imputation

  • Yu Chen
  • Wei Deng 0002
  • Shikai Fang
  • Fengpei Li
  • Nicole Tianjiao Yang
  • Yikai Zhang 0003
  • Kashif Rasul
  • Shandian Zhe

The Schrödinger bridge problem (SBP) is gaining increasing attention in generative modeling and showing promising potential even in comparison with the score-based generative models (SGMs). SBP can be interpreted as an entropy-regularized optimal transport problem, which is solved by projecting onto each marginal in alternation. However, in practice, only approximated projections are accessible and their convergence is not well understood. To fill this gap, we present a first convergence analysis of the Schrödinger bridge algorithm based on approximated projections. As for its practical applications, we apply SBP to probabilistic time series imputation by generating missing values conditioned on observed data. We show that optimizing the transport cost improves the performance and the proposed algorithm achieves the state-of-the-art result in healthcare and environmental data while exhibiting the advantage of exploring both temporal and feature patterns in probabilistic time series imputation.

IROS Conference 2023 Conference Paper

Two-Stage Trajectory-Tracking Control of Cable-Driven Upper-Limb Exoskeleton Robots with Series Elastic Actuators: A Simple, Accurate, and Force-Sensorless Method

  • Yana Shu
  • Yu Chen
  • Xuan Zhang
  • Shisheng Zhang
  • Gong Chen 0001
  • Jing Ye 0005
  • Xiang Li 0009

The advantages of cable-driven exoskeleton robots with series elastic actuators are twofold: 1) the inertia of the robot joint is relatively low, which is friendlier for human-robot interaction; 2) the elastic element is tolerant to impacts and hence provides structural safety. As a trade-off, the overall dynamic model of such a system is of high order and subject to both unmodelled disturbances (due to the cable-driven mechanism) and external torques (due to the human-robot interaction), opening up challenges for the controller development. This paper proposes a new trajectory-tracking control scheme for cable-driven upper-limb exoskeleton robots with series elastic actuators. The control objectives are achieved in two stages: Stage I is to approximate and then compensate for unmodelled disturbances with iterative learning techniques; Stage II is to employ a suboptimal model predictive controller to drive the robot to track the desired trajectory. While controlling such a robot is not trivial, the proposed control scheme exhibits the advantages of force-sensorlessness, high accuracy, and low complexity compared with other methods in the real-world experiments.

UAI Conference 2022 Conference Paper

Estimating transfer entropy under long ranged dependencies

  • Sahil Garg
  • Umang Gupta
  • Yu Chen
  • Syamantak Datta Gupta
  • Yeshaya Adler
  • Anderson Schneider
  • Yuriy Nevmyvaka

Estimating Transfer Entropy (TE) between time series is a highly impactful problem in fields such as finance and neuroscience. The well-known nearest neighbor estimator of TE potentially fails if temporal dependencies are noisy and long ranged, primarily because it estimates TE indirectly relying on the estimation of joint entropy terms in high dimensions, which is a hard problem in itself. Other estimators, such as those based on Copula entropy or conditional mutual information, have similar limitations. Leveraging the successes of modern discriminative models that operate in high dimensional (noisy) feature spaces, we express TE as a difference of two conditional entropy terms, which we directly estimate from conditional likelihoods computed in-sample from any discriminator (time series forecaster) trained per the maximum likelihood principle. To ensure that the in-sample log likelihood estimates are not overfit to the data, we propose a novel perturbation model based on locality sensitive hash (LSH) functions, which regularizes a discriminative model to have smooth functional outputs within local neighborhoods of the input space. Our estimator is consistent, and its variance reduces linearly in sample size. We also demonstrate its superiority w.r.t. state-of-the-art estimators through empirical evaluations on synthetic as well as real-world datasets from the neuroscience and finance domains.
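
The decomposition mentioned in the abstract can be written out using the textbook definition of transfer entropy. The history lengths $k$ and $l$, the forecaster density $p_\theta$, and the plug-in form of the entropy estimate are notational assumptions made here for illustration; the LSH-based perturbation regularizer from the paper is not shown.

```latex
\begin{align}
\mathrm{TE}_{X \to Y}
  &= H\!\left(Y_t \mid Y_{t-k:t-1}\right)
   - H\!\left(Y_t \mid Y_{t-k:t-1},\, X_{t-l:t-1}\right), \\
\widehat{H}\!\left(Y_t \mid \cdot\right)
  &= -\frac{1}{N} \sum_{t=1}^{N} \log p_{\theta}\!\left(y_t \mid \cdot\right).
\end{align}
```

Under this reading, each conditional entropy is approximated by the average negative log-likelihood of a forecaster conditioned on the corresponding history, and TE is the gap between the two.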

IROS Conference 2022 Conference Paper

Hierarchical Learning and Control for In-Hand Micromanipulation Using Multiple Laser-Driven Micro-Tools

  • Yongyi Jia
  • Yu Chen
  • Hao Liu
  • Xiu Li 0001
  • Xiang Li 0009

Laser-driven micro-tools are formulated by treating highly-focused laser beams as actuators that control the tool's motion to contact and then manipulate a micro object, which allows the tool to manipulate opaque micro objects or large cells without causing photodamage. However, most existing laser-driven tools are limited to relatively simple tasks, such as moving and caging, and cannot carry out in-hand dexterous tasks. This is mainly because in-hand manipulation involves continuously coordinating multiple laser beams, micro-tools, and the object itself, which has high degrees of freedom (DoF) and poses challenges for planner and controller design. This paper presents a new hierarchical formulation for the grasping and manipulation of micro objects using multiple laser-driven micro-tools. In hardware, multiple laser-driven tools are assembled to act as a robotic hand to carry out in-hand tasks (e.g., rotating); in software, a hierarchical scheme is developed to shrink the action space and coordinate the motion of multiple tools, subject to both the parametric uncertainty in the tool and the unknown dynamic model of the object. Such a formulation provides potential for achieving robotic in-hand manipulation at a micro scale. The performance of the proposed system is validated in simulation studies under different scenarios.

ICLR Conference 2021 Conference Paper

Retrieval-Augmented Generation for Code Summarization via Hybrid GNN

  • Shangqing Liu
  • Yu Chen
  • Xiaofei Xie
  • Jing Kai Siow
  • Yang Liu 0003

Source code summarization aims to generate natural language summaries from structured code snippets for better understanding code functionalities. However, automatic code summarization is challenging due to the complexity of the source code and the language gap between the source code and natural language summaries. Most previous approaches either rely on retrieval-based (which can take advantage of similar examples seen from the retrieval database, but have low generalization performance) or generation-based methods (which have better generalization performance, but cannot take advantage of similar examples). This paper proposes a novel retrieval-augmented mechanism to combine the benefits of both worlds. Furthermore, to mitigate the limitation of Graph Neural Networks (GNNs) on capturing global graph structure information of source code, we propose a novel attention-based dynamic graph to complement the static graph representation of the source code, and design a hybrid message passing GNN for capturing both the local and global structural information. To evaluate the proposed approach, we release a new challenging benchmark, crawled from diversified large-scale open-source C projects (total 95k+ unique functions in the dataset). Our method achieves the state-of-the-art performance, improving existing methods by 1.42, 2.44 and 1.29 in terms of BLEU-4, ROUGE-L and METEOR.

IROS Conference 2021 Conference Paper

Semi-supervised Vein Segmentation of Ultrasound Images for Autonomous Venipuncture

  • Yu Chen
  • Yuxuan Wang
  • Bolin Lai
  • Zijie Chen
  • Xu Cao
  • Nanyang Ye 0001
  • Zhongyuan Ren
  • Junbo Zhao

Venipuncture is an indispensable procedure for both diagnosis and treatment. In this paper, unlike existing solutions that fully or partially rely on professional assistance, a compact robotic system integrating both novel hardware and software developments is introduced. The hardware consists of a set of units to facilitate the supporting, positioning, puncturing, and imaging functionalities. To achieve full automation, a novel deep learning framework, semi-ResNeXt-Unet, is proposed for semi-supervised vein segmentation from ultrasound images. The depth information of the vein is calculated and enables automated navigation for the puncturing unit. The algorithm is validated on 40 volunteers, and the proposed semi-ResNeXt-Unet improves the dice similarity coefficient (DSC) by 5.36%, decreases the centroid error by 1.38 pixels and decreases the failure rate by 5.60%, compared to fully-supervised ResNeXt-Unet.

AAAI Conference 2021 Conference Paper

The LOB Recreation Model: Predicting the Limit Order Book from TAQ History Using an Ordinary Differential Equation Recurrent Neural Network

  • Zijian Shi
  • Yu Chen
  • John Cartlidge

In an order-driven financial market, the price of a financial asset is discovered through the interaction of orders - requests to buy or sell at a particular price - that are posted to the public limit order book (LOB). Therefore, LOB data is extremely valuable for modelling market dynamics. However, LOB data is not freely accessible, which poses a challenge to market participants and researchers wishing to exploit this information. Fortunately, trades and quotes (TAQ) data - orders arriving at the top of the LOB, and trades executing in the market - are more readily available. In this paper, we present the LOB recreation model, a first attempt from a deep learning perspective to recreate the top five price levels of the LOB for small-tick stocks using only TAQ data. Volumes of orders sitting deep in the LOB are predicted by combining outputs from: (1) a history compiler that uses a Gated Recurrent Unit (GRU) module to selectively compile prediction relevant quote history; (2) a market events simulator, which uses an Ordinary Differential Equation Recurrent Neural Network (ODE-RNN) to simulate the accumulation of net order arrivals; and (3) a weighting scheme to adaptively combine the predictions generated by (1) and (2). By the paradigm of transfer learning, the source model trained on one stock can be fine-tuned to enable application to other financial assets of the same class with much lower demand on additional data. Comprehensive experiments conducted on two real world intraday LOB datasets demonstrate that the proposed model can efficiently recreate the LOB with high accuracy using only TAQ data as input.

AAAI Conference 2021 Conference Paper

Towards Faster Deep Collaborative Filtering via Hierarchical Decision Networks

  • Yu Chen
  • Sinno Jialin Pan

For personalized recommendations, collaborative filtering (CF) methods aim to recommend items to users based on data of historical user-item interactions. Recent works have shown that deep learning can improve the performance of CF methods. However, to generate an item recommendation list for each user, many deep learning-based CF methods require every pair of users and items to be passed through multiple neural layers. This requires intensive computation and makes real-time end-to-end neural recommendations very costly. To address this issue, in this paper, we propose a new deep learning-based hierarchical decision network to filter out irrelevant items to save computation cost while maintaining good recommendation accuracy of deep CF methods. We also develop a distillation-based training algorithm, which uses a well-trained CF model as a teacher network to guide the training of the decision network. We conducted extensive experiments on real-world benchmark datasets to verify the effectiveness and the efficiency of our decision network for making recommendations. The experimental results indicate that the proposed decision network is able to maintain or even improve the recommendation quality in terms of various metrics while enjoying lower computational cost.

IJCAI Conference 2020 Conference Paper

GraphFlow: Exploiting Conversation Flow with Graph Neural Networks for Conversational Machine Comprehension

  • Yu Chen
  • Lingfei Wu
  • Mohammed J. Zaki

Conversational machine comprehension (MC) has proven significantly more challenging compared to traditional MC since it requires better utilization of conversation history. However, most existing approaches do not effectively capture conversation history and thus have trouble handling questions involving coreference or ellipsis. Moreover, when reasoning over passage text, most of them simply treat it as a word sequence without exploring rich semantic relationships among words. In this paper, we first propose a simple yet effective graph structure learning technique to dynamically construct a question and conversation history aware context graph at each conversation turn. Then we propose a novel Recurrent Graph Neural Network, and based on that, we introduce a flow mechanism to model the temporal dependencies in a sequence of context graphs. The proposed GraphFlow model can effectively capture conversational flow in a dialog, and shows competitive performance compared to existing state-of-the-art methods on CoQA, QuAC and DoQA benchmarks. In addition, visualization experiments show that our proposed model can offer good interpretability for the reasoning process.

NeurIPS Conference 2020 Conference Paper

Iterative Deep Graph Learning for Graph Neural Networks: Better and Robust Node Embeddings

  • Yu Chen
  • Lingfei Wu
  • Mohammed Zaki

In this paper, we propose an end-to-end graph learning framework, namely Iterative Deep Graph Learning (IDGL), for jointly and iteratively learning graph structure and graph embedding. The key rationale of IDGL is to learn a better graph structure based on better node embeddings, and vice versa (i.e., better node embeddings based on a better graph structure). Our iterative method dynamically stops when the learned graph structure approaches close enough to the graph optimized for the downstream prediction task. In addition, we cast the graph learning problem as a similarity metric learning problem and leverage adaptive graph regularization for controlling the quality of the learned graph. Finally, combining the anchor-based approximation technique, we further propose a scalable version of IDGL, namely IDGL-Anch, which significantly reduces the time and space complexity of IDGL without compromising the performance. Our extensive experiments on nine benchmarks show that our proposed IDGL models can consistently outperform or match the state-of-the-art baselines. Furthermore, IDGL can be more robust to adversarial graphs and cope with both transductive and inductive learning.

IROS Conference 2019 Conference Paper

Fast and Incremental Loop Closure Detection Using Proximity Graphs

  • Shan An
  • Guangfu Che
  • Fangru Zhou
  • Xianglong Liu 0001
  • Xin Ma
  • Yu Chen

Visual loop closure detection, which can be considered as an image retrieval task, is an important problem in SLAM (Simultaneous Localization and Mapping) systems. The frequently used bag-of-words (BoW) models can achieve high precision and moderate recall. However, the requirement for lower time and memory costs in mobile robot applications is not well satisfied. In this paper, we propose a novel loop closure detection framework titled FILD (Fast and Incremental Loop closure Detection), which focuses on an on-line and incremental graph vocabulary construction for fast loop closure detection. The global and local features of frames are extracted using Convolutional Neural Networks (CNN) and SURF on the GPU, which guarantees extremely fast extraction speeds. The graph vocabulary construction is based on one type of proximity graph, named Hierarchical Navigable Small World (HNSW) graphs, which is modified to adapt to this specific application. In addition, this process is coupled with a novel strategy for real-time geometrical verification, which only keeps binary hash codes and significantly saves on memory usage. Extensive experiments on several publicly available datasets show that the proposed approach can achieve fairly good recall at 100% precision compared to other state-of-the-art methods. The source code can be downloaded at https://github.com/AnshanTJU/FILD for further studies.

NeurIPS Conference 2019 Conference Paper

Fisher Efficient Inference of Intractable Models

  • Song Liu
  • Takafumi Kanamori
  • Wittawat Jitkrittum
  • Yu Chen

Maximum Likelihood Estimators (MLE) have many good properties. For example, the asymptotic variance of the MLE solution attains the asymptotic Cramér-Rao lower bound (efficiency bound), which is the minimum possible variance for an unbiased estimator. However, obtaining such an MLE solution requires calculating the likelihood function, which may not be tractable due to the normalization term of the density model. In this paper, we derive a Discriminative Likelihood Estimator (DLE) from the Kullback-Leibler divergence minimization criterion implemented via density ratio estimation and a Stein operator. We study the problem of model inference using DLE. We prove its consistency and show that the asymptotic variance of its solution can attain the efficiency bound under mild regularity conditions. We also propose a dual formulation of DLE which can be easily optimized. Numerical studies validate our asymptotic theorems and we give an example where DLE successfully estimates an intractable model constructed using a pre-trained deep neural network.

IJCAI Conference 2019 Conference Paper

Network Formation under Random Attack and Probabilistic Spread

  • Yu Chen
  • Shahin Jabbari
  • Michael Kearns
  • Sanjeev Khanna
  • Jamie Morgenstern

We study a network formation game where agents receive benefits by forming connections to other agents but also incur both direct and indirect costs from the formed connections. Specifically, once the agents have purchased their connections, an attack starts at a randomly chosen vertex in the network and spreads according to the independent cascade model with a fixed probability, destroying any infected agents. The utility or welfare of an agent in our game is defined to be the expected size of the agent's connected component post-attack minus her expenditure in forming connections. Our goal is to understand the properties of the equilibrium networks formed in this game. Our first result concerns the edge density of equilibrium networks. A network connection increases both the likelihood of remaining connected to other agents after an attack as well as the likelihood of getting infected by a cascading spread of infection. We show that the latter concern primarily prevails and any equilibrium network in our game contains only $O(n\log n)$ edges where $n$ denotes the number of agents. On the other hand, there are equilibrium networks that contain $\Omega(n)$ edges showing that our edge density bound is tight up to a logarithmic factor. Our second result shows that the presence of attack and its spread through a cascade does not significantly lower social welfare as long as the network is not too dense. We show that any non-trivial equilibrium network with $O(n)$ edges has $\Theta(n^2)$ social welfare, asymptotically similar to the social welfare guarantee in the game without any attacks.
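
The utility described in the abstract can be approximated by simulation. The sketch below is a Monte Carlo illustration under stated assumptions, not the paper's analysis: the adjacency-dict graph representation, the uniform per-edge cost `edge_cost`, and all function names are hypothetical choices made for this example.

```python
import random
from collections import deque

def cascade(adj, start, p, rng):
    """Independent cascade: each newly infected vertex gets one chance to
    infect each still-healthy neighbour, succeeding with probability p."""
    infected = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in infected and rng.random() < p:
                infected.add(v)
                queue.append(v)
    return infected

def component_size(adj, v, destroyed):
    """Size of v's connected component once destroyed vertices are removed
    (0 if v itself was destroyed)."""
    if v in destroyed:
        return 0
    seen = {v}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in destroyed and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen)

def expected_utility(adj, agent, p, edges_bought, edge_cost, trials=2000, seed=0):
    """Monte Carlo estimate of: E[post-attack component size] - expenditure.
    The attack starts at a uniformly random vertex (as in the abstract);
    the linear cost edge_cost * edges_bought is an assumption."""
    rng = random.Random(seed)
    nodes = list(adj)
    total = 0
    for _ in range(trials):
        destroyed = cascade(adj, rng.choice(nodes), p, rng)
        total += component_size(adj, agent, destroyed)
    return total / trials - edge_cost * edges_bought

# Path graph 0 - 1 - 2; agent 0 is assumed to have bought one edge.
adj = {0: [1], 1: [0, 2], 2: [1]}
u = expected_utility(adj, agent=0, p=0.3, edges_bought=1, edge_cost=0.5)
```

The two effects named in the abstract are both visible here: an extra edge can enlarge the surviving component counted by `component_size`, but it also gives `cascade` another route to reach the agent.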

AAAI Conference 2019 Conference Paper

Towards Highly Accurate and Stable Face Alignment for High-Resolution Videos

  • Ying Tai
  • Yicong Liang
  • Xiaoming Liu
  • Lei Duan
  • Jilin Li
  • Chengjie Wang
  • Feiyue Huang
  • Yu Chen

In recent years, heatmap regression based models have shown their effectiveness in face alignment and pose estimation. However, Conventional Heatmap Regression (CHR) is neither accurate nor stable when dealing with high-resolution facial videos, since it finds the maximum activated location in heatmaps which are generated from rounding coordinates, and thus leads to quantization errors when scaling back to the original high-resolution space. In this paper, we propose a Fractional Heatmap Regression (FHR) for high-resolution video-based face alignment. The proposed FHR can accurately estimate the fractional part according to the 2D Gaussian function by sampling three points in heatmaps. To further stabilize the landmarks among continuous video frames while maintaining precision at the same time, we propose a novel stabilization loss that contains two terms to address time delay and non-smooth issues, respectively. Experiments on 300W, 300-VW and Talking Face datasets clearly demonstrate that the proposed method is more accurate and stable than the state-of-the-art models.
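
To illustrate why three heatmap samples can suffice to recover a sub-pixel location, the sketch below inverts an ideal, noise-free 2D Gaussian with known width. It is one possible derivation, not necessarily the formula used in the paper: the choice of the three sample points, the known `sigma`, and the function names are assumptions.

```python
import math

def gaussian_heatmap_value(x, y, cx, cy, sigma):
    """Value at pixel (x, y) of an isotropic Gaussian centred at (cx, cy)."""
    return math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))

def fractional_peak(h00, h10, h01, x, y, sigma):
    """Recover the real-valued centre from three samples:
    h00 = h(x, y), h10 = h(x + 1, y), h01 = h(x, y + 1).

    Taking logs, ln h10 - ln h00 = (2 * (cx - x) - 1) / (2 * sigma**2),
    which is linear in cx; the same holds for cy with h01.
    """
    cx = x + 0.5 + sigma ** 2 * (math.log(h10) - math.log(h00))
    cy = y + 0.5 + sigma ** 2 * (math.log(h01) - math.log(h00))
    return cx, cy

# Hypothetical landmark at a fractional position, sampled near integer pixel (12, 8).
true_cx, true_cy, sigma = 12.3, 7.8, 1.5
x, y = 12, 8
h00 = gaussian_heatmap_value(x, y, true_cx, true_cy, sigma)
h10 = gaussian_heatmap_value(x + 1, y, true_cx, true_cy, sigma)
h01 = gaussian_heatmap_value(x, y + 1, true_cx, true_cy, sigma)
est_cx, est_cy = fractional_peak(h00, h10, h01, x, y, sigma)
```

Taking the integer argmax alone would report (12, 8) here, which is the quantization error the abstract attributes to conventional heatmap regression.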

AAAI Conference 2018 Conference Paper

Cross-Domain Human Parsing via Adversarial Feature and Label Adaptation

  • Si Liu
  • Yao Sun
  • Defa Zhu
  • Guanghui Ren
  • Yu Chen
  • Jiashi Feng
  • Jizhong Han

Human parsing has been extensively studied recently (Yamaguchi et al. 2012; Xia et al. 2017) due to its wide applications in many important scenarios. Mainstream fashion parsing models (i.e., parsers) focus on parsing high-resolution and clean images. However, directly applying the parsers trained on benchmarks of high-quality samples to a particular application scenario in the wild, e.g., a canteen, airport or workplace, often gives unsatisfactory performance due to domain shift. In this paper, we explore a new and challenging cross-domain human parsing problem: taking the benchmark dataset with extensive pixel-wise labeling as the source domain, how to obtain a satisfactory parser on a new target domain without requiring any additional manual labeling? To this end, we propose a novel and efficient cross-domain human parsing model to bridge the cross-domain differences in terms of visual appearance and environment conditions and fully exploit commonalities across domains. Our proposed model explicitly learns a feature compensation network, which is specialized for mitigating the cross-domain differences. A discriminative feature adversarial network is introduced to supervise the feature compensation to effectively reduce the discrepancy between feature distributions of two domains. Besides, our proposed model also introduces a structured label adversarial network to guide the parsing results of the target domain to follow the high-order relationships of the structured labels shared across domains. The proposed framework is end-to-end trainable, practical and scalable in real applications. Extensive experiments are conducted where the LIP dataset is the source domain and 4 different datasets, including surveillance videos, movies and runway shows without any annotations, are evaluated as target domains. The results consistently confirm data efficiency and performance advantages of the proposed method for the challenging cross-domain human parsing problem.

TIST Journal 2017 Journal Article

Exploring Indoor White Spaces in Metropolises

  • Xuhang Ying
  • Jincheng Zhang
  • Lichao Yan
  • Yu Chen
  • Guanglin Zhang
  • Minghua Chen
  • Ranveer Chandra

It is a promising vision to exploit white spaces, that is, vacant VHF and UHF TV channels, to meet the rapidly growing demand for wireless data services in both outdoor and indoor scenarios. While most prior works have focused on outdoor white space, the indoor story is largely open for investigation. Motivated by this observation and discovering that 70% of the spectrum demand comes from indoor environments, we carry out a comprehensive study to explore indoor white spaces. We first conduct a large-scale measurement study and compare outdoor and indoor TV spectrum occupancy at 30+ diverse locations in a typical metropolis, Hong Kong. Our results show that abundant white spaces are available in different areas in Hong Kong, which account for more than 50% and 70% of the entire TV spectrum in outdoor and indoor scenarios, respectively. Although there are substantially more white spaces indoors than outdoors, there have been very few solutions for identifying indoor white space. To fill in this gap, we develop the first data-driven, low-cost indoor white space identification system for White-space Indoor Spectrum EnhanceR (WISER), to allow secondary users to identify white spaces for communication without sensing the spectrum themselves. We design the architecture and algorithms to address the inherent challenges. We build a WISER prototype and carry out real-world experiments to evaluate its performance. Our results show that WISER can identify 30%-40% more indoor white spaces with negligible false alarms, as compared to alternative baseline approaches.

IJCAI Conference 2017 Conference Paper

MRLR: Multi-level Representation Learning for Personalized Ranking in Recommendation

  • Zhu Sun
  • Jie Yang
  • Jie Zhang
  • Alessandro Bozzon
  • Yu Chen
  • Chi Xu

Representation learning (RL) has recently proven to be effective in capturing local item relationships by modeling item co-occurrence in individual users' interaction records. However, the value of RL for recommendation has not reached its full potential due to two major drawbacks: 1) recommendation is modeled as a rating prediction problem but should essentially be a personalized ranking one; 2) multi-level organizations of items are neglected for fine-grained item relationships. We design a unified Bayesian framework MRLR to learn user and item embeddings from a multi-level item organization, thus benefiting from RL as well as achieving the goal of personalized ranking. Extensive validation on real-world datasets shows that MRLR consistently outperforms state-of-the-art algorithms.

IJCAI Conference 2016 Conference Paper

ADL™: A Topic Model for Discovery of Activities of Daily Living in a Smart Home

  • Yu Chen
  • Tom Diethe
  • Peter Flach

We present an unsupervised approach for discovery of Activities of Daily Living (ADL) in a smart home. Activity discovery is an important enabling technology, for example to tackle the healthcare requirements of elderly people in their homes. The technique applied most often is supervised learning, which relies on expensive labelled data and lacks the flexibility to discover unseen activities. Building on ideas from text mining, we present a powerful topic model and a segmentation algorithm that can learn from unlabelled sensor data. The model has been evaluated extensively on datasets collected from real smart homes. The results demonstrate that this approach can successfully discover the activities of residents, and can be effectively used in a range of applications such as detection of abnormal activities and monitoring of sleep quality, among many others.

JBHI Journal 2014 Journal Article

Enabling Smart Personalized Healthcare: A Hybrid Mobile-Cloud Approach for ECG Telemonitoring

  • Xiaoliang Wang
  • Qiong Gui
  • Bingwei Liu
  • Zhanpeng Jin
  • Yu Chen

The severe challenges of the skyrocketing healthcare expenditure and the fast aging population highlight the needs for innovative solutions supporting more accurate, affordable, flexible, and personalized medical diagnosis and treatment. Recent advances of mobile technologies have made mobile devices a promising tool to manage patients' own health status through services like telemedicine. However, the inherent limitations of mobile devices make them less effective in computation- or data-intensive tasks such as medical monitoring. In this study, we propose a new hybrid mobile-cloud computational solution to enable more effective personalized medical monitoring. To demonstrate the efficacy and efficiency of the proposed approach, we present a case study of mobile-cloud based electrocardiograph monitoring and analysis and develop a mobile-cloud prototype. The experimental results show that the proposed approach can significantly enhance the conventional mobile-based medical monitoring in terms of diagnostic accuracy, execution efficiency, and energy efficiency, and holds the potential in addressing future large-scale data analysis in personalized healthcare.

IROS Conference 1998 Conference Paper

Smart sensor snow

  • Thomas C. Henderson
  • Mohamed Dekhil
  • Scott Morris 0002
  • Yu Chen
  • William B. Thompson

We propose to deploy and exploit a large number of inexpensive sensors to obtain information or trigger actions over a wide geographic area. Sensors may be of diverse physical natures: acoustic, IR, seismic, chemical, magnetic, thermal, etc. We describe three major issues: (1) sensor distribution patterns, (2) local sensor frames, and (3) autonomous robot sensor snow exploitation techniques.