Author name cluster

Tong Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers

2 author rows

EAAI Journal 2026 Journal Article

A fuzzy cross domain matrix machine for fault diagnosis under multi-objective domain

Haiyang Pan
Chunan Chen
Tong Chen
Jian Cheng
Jinde Zheng
Shuchao Deng

Details DOI

AAAI Conference 2026 Conference Paper

EndoIR: Degradation-Agnostic All-in-One Endoscopic Image Restoration via Noise-Aware Routing Diffusion

Tong Chen
Xinyu Ma
Long Bai
Wenyang Wang
Yue Sun
Luping Zhou

Endoscopic images often suffer from diverse and co-occurring degradations such as low lighting, smoke, and bleeding, which obscure critical clinical details. Existing restoration methods are typically task-specific and often require prior knowledge of the degradation type, limiting their robustness in real-world clinical use. We propose EndoIR, an all-in-one, degradation-agnostic diffusion-based framework that restores multiple degradation types using a single model. EndoIR introduces a Dual-Domain Prompter that extracts joint spatial–frequency features, coupled with an adaptive embedding that encodes both shared and task-specific cues as conditioning for denoising. To mitigate feature confusion in conventional concatenation-based conditioning, we design a Dual-Stream Diffusion architecture that processes clean and degraded inputs separately, with a Rectified Fusion Block integrating them in a structured, degradation-aware manner. Furthermore, Noise-Aware Routing Block improves efficiency by dynamically selecting only noise-relevant features during denoising. Experiments on SegSTRONG-C and CEC datasets demonstrate that EndoIR achieves state-of-the-art performance across multiple degradation scenarios while using fewer parameters than strong baselines, and downstream segmentation experiments confirm its clinical utility.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Error Correction in Radiology Reports: A Knowledge Distillation-Based Multi-Stage Framework

Jinge Wu
Zhaolong Wu
Ruizhe Li
Tong Chen
Abul Hasan
Yunsoo Kim
Jason Pui-Yin Cheung
Teng Zhang

The increasing complexity and workload of clinical radiology leads to inevitable oversights and mistakes in their use as diagnostic tools, causing delayed treatments and sometimes life-threatening harm to patients. While large language models (LLMs) have shown remarkable progress in many tasks, their utilities in detecting and correcting errors in radiology reporting are limited. This paper proposes a novel dual-knowledge infusion framework that enhances LLMs' capability for radiology report proofreading through systematic integration of medical expertise. Specifically, the knowledge infusion combines medical knowledge graph distillation (MKGD) with external knowledge retrieval (EXKR), enabling an effective automated approach in tackling mistakes in radiology reporting. By decomposing the complex proofreading task into three specialized stages of detection, localization, and correction, our method mirrors the systematic review process employed by expert radiologists, ensuring both precision and clinical interpretability. To perform a robust, clinically relevant evaluation, a comprehensive benchmark is also proposed using real-world radiology reports with real-world error patterns, including speech recognition confusions, terminology ambiguities, and template-related inconsistencies. Extensive evaluations across multiple LLM architectures demonstrate substantial improvements of our approach: up to 31.56% increase in error detection accuracy and 37.4% reduction in processing time. Human evaluation by radiologists confirms superior clinical relevance and factual consistency compared to existing approaches.

PDF Details DOI

JBHI Journal 2026 Journal Article

Img2Gene: Debiased Spatially Resolved Transcriptomics with Biological Context from Pathology Images

Wei Zhang
Tong Chen
Wenxin Xu
Collin Sakal
Xinyue Li

Spatial transcriptomics integrates morphological information from pathology images with gene expression, providing high-resolution spatial gene expression profiles while preserving tissue architectures in a cost-effective manner. However, the inherent heterogeneity between images and gene expression data, coupled with sparse gene expression distribution, poses significant challenges for accurate and unbiased prediction models. To address these issues, we propose Img2Gene, a debiased framework designed to predict gene expression levels from whole slide images by incorporating biological context. Specifically, we integrate causal analysis into the gene expression prediction task to mitigate data sparsity and achieve unbiased predictions. Furthermore, we employ gene set enrichment analysis to identify highly associated pathway information as biological context and introduce a cross-modal coherence loss to align data from different modalities, fostering enhanced interplay among diverse features and achieving improved accuracy of gene expression prediction. Extensive experiments conducted on four public datasets demonstrate that our method achieves state-of-the-art performance. The pathway data and source code are available at https://github.com/coffeeNtv/Img2Gene.

Details DOI

TIST Journal 2026 Journal Article

Towards On-device Personalization: Cloud-device Collaborative Data Augmentation for Efficient On-device Language Model

Zhaofeng Zhong
Wei Yuan
Liang Qu
Tong Chen
Hao Wang
Xiangyu Zhao
Hongzhi Yin

With the advancement of large language models (LLMs), significant progress has been achieved in various natural language processing (NLP) tasks. However, existing LLMs still face two major challenges that hinder their broader adoption: (1) their responses tend to be generic and lack personalization tailored to individual users, and (2) they rely heavily on cloud infrastructure due to intensive computational requirements, leading to stable network dependency and response delay. Recent research has predominantly focused on either developing cloud-based personalized LLMs or exploring the on-device deployment of general-purpose LLMs. However, few studies have addressed both limitations simultaneously by investigating personalized on-device language models (LMs). To bridge this gap, we propose CDCDA-PLM, a framework for deploying personalized on-device LMs on user devices with support from a powerful cloud-based LLM. Specifically, CDCDA-PLM leverages the server-side LLM’s strong generalization capabilities to augment users’ limited personal data, mitigating the issue of data scarcity. Using both real and synthetic data, a personalized on-device LM is fine-tuned via parameter-efficient fine-tuning (PEFT) modules and deployed on users’ local devices, enabling them to process queries without depending on cloud-based LLMs. This approach eliminates reliance on network stability and ensures high response speeds. Experimental results across six NLP personalization tasks demonstrate the effectiveness of CDCDA-PLM.

Details DOI

TMLR Journal 2025 Journal Article

Denoising Pretrained Black-box Models via Amplitude-Guided Phase Realignment

Hongliang Ni
Tong Chen
Shazia Sadiq
Gianluca Demartini

Pre-trained models tend to inherit noisy label information from their training datasets, internalising it as biased knowledge. While learning with label noise has been explored, existing approaches rarely address the mitigation of biased knowledge embedded in pre-trained representations introduced by noisy labels. Moreover, existing denoising methods invariably rely on modifying training datasets or models to improve downstream task performance. However, we observe a growing trend in which both pre-trained models and their training datasets are scaling up significantly and becoming increasingly inaccessible, making modifications ever more infeasible. In this paper, we propose a black-box biased knowledge mitigation method called ``Lorem'', which leverages feature frequency amplitudes to guide phase correction on pre-trained representations, without access to training data or model parameters. We first present empirical evidence that, across different noise levels, the phase components of pre-trained representations are more sensitive to noisy labels than the amplitude components, while discriminative information for classification is primarily encoded in the amplitude. Moreover, we find that the impact of noisy labels on amplitude is global, leading to a gradual loss of discriminative information. Therefore, corrective strategies must be adaptive across the entire frequency spectrum rather than limited to the high-frequency components. Inspired by this observation, we design a method that leverages the amplitude residual to realign phase, thereby removing biased knowledge from pre-trained representations. Experiments on a variety of popular pre-trained vision and language models suggest that, even with a simple linear classifier, our method can enhance downstream performance across a range of in-domain and out-of-domain tasks.

PDF Details

IROS Conference 2025 Conference Paper

Extreme-Hydrostatic-Pressure Resilient Dielectric Elastomer Actuator for Propeller Propulsion

Boyuan Du
Liang Zhou
Xuguang Dong
Xinge Li
Tong Chen
Tiefeng Li
Xin-Jun Liu
Huichan Zhao

Exploring high hydrostatic pressure environments such as deep sea presents significant challenges to robotic devices, for they often rely on strong yet heavy and costly protective structures to shield components from being crushed by the extreme pressure. To dismiss the need for bulky protection shells for actuation devices, we reported an extreme-hydrostatic-pressure resilient rotary dielectric elastomer actuator (DEA) for propulsion application in deep-sea pressure condition. DEAs are inherently resistant to damage caused by external pressure, due to their uniform and cavity-free structure. In this study, we analyzed the material properties of the DEA’s elastomer, evaluated the rotary actuator’s lifespan at up to 110 MPa high-pressure liquid conditions, and output performance under both ambient and 30 MPa (equivalent to 3, 000 m underwater). Our results show that the rotary actuator maintained functionality at such hydrostatic pressure, with a lifespan exceeding 300, 000 cycles and a high rotational output speed of 820 rpm. The rotary actuator was subsequently used to drive the robot with a propeller in a simulated deep-sea pressure fluidic environment, demonstrating our DEA’s performance as well as design simplicity for deep-sea applications without protection structures. While high hydrostatic pressure negatively impacted the actuator’s lifespan and slightly reduced its dynamic performance, our results confirmed that the DEA was a viable solution for deep-sea exploration, laying a solid foundation for the further development of DEA-powered devices for underwater missions.

Details

ICLR Conference 2025 Conference Paper

Hotspot-Driven Peptide Design via Multi-Fragment Autoregressive Extension

Jiahan Li
Tong Chen
Shitong Luo
Chaoran Cheng
Jiaqi Guan
Ruihan Guo
Sheng Wang
Ge Liu

Peptides, short chains of amino acids, interact with target proteins, making them a unique class of protein-based therapeutics for treating human diseases. Recently, deep generative models have shown great promise in peptide generation. However, several challenges remain in designing effective peptide binders. First, not all residues contribute equally to peptide-target interactions. Second, the generated peptides must adopt valid geometries due to the constraints of peptide bonds. Third, realistic tasks for peptide drug development are still lacking. To address these challenges, we introduce PepHAR, a hot-spot-driven autoregressive generative model for designing peptides targeting specific proteins. Building on the observation that certain hot spot residues have higher interaction potentials, we first use an energy-based density model to fit and sample these key residues. Next, to ensure proper peptide geometry, we autoregressively extend peptide fragments by estimating dihedral angles between residue frames. Finally, we apply an optimization process to iteratively refine fragment assembly, ensuring correct peptide structures. By combining hot spot sampling with fragment-based extension, our approach enables \textit{de novo} peptide design tailored to a target protein and allows the incorporation of key hot spot residues into peptide scaffolds. Extensive experiments, including peptide design and peptide scaffold generation, demonstrate the strong potential of PepHAR in computational peptide binder design. The source code will be available at https://github.com/Ced3-han/PepHAR.

Details

TIST Journal 2024 Journal Article

Explicit Knowledge Graph Reasoning for Conversational Recommendation

Xuhui Ren
Tong Chen
Quoc Viet Hung Nguyen
Lizhen Cui
Zi Huang
Hongzhi Yin

Traditional recommender systems estimate user preference on items purely based on historical interaction records, thus failing to capture fine-grained yet dynamic user interests and letting users receive recommendation only passively. Recent conversational recommender systems (CRSs) tackle those limitations by enabling recommender systems to interact with the user to obtain her/his current preference through a sequence of clarifying questions. Recently, there has been a rise of using knowledge graphs (KGs) for CRSs, where the core motivation is to incorporate the abundant side information carried by a KG into both the recommendation and conversation processes. However, existing KG-based CRSs are subject to two defects: (1) there is a semantic gap between the learned representations of utterances and KG entities, hindering the retrieval of relevant KG information; (2) the reasoning over KG is mostly performed with the implicitly learned user interests, overlooking the explicit signals from the entities actually mentioned in the conversation. To address these drawbacks, we propose a new CRS framework, namely, the Knowledge Enhanced Conversational Reasoning (KECR) model. As a user can reflect her/his preferences via both attribute- and item-level expressions, KECR jointly embeds the structured knowledge from two levels in the KG. A mutual information maximization constraint is further proposed for semantic alignment between the embedding spaces of utterances and KG entities. Meanwhile, KECR utilizes the connectivity within the KG to conduct explicit reasoning of the user demand, making the model less dependent on the user’s feedback to clarifying questions. As such, the semantic alignment and explicit KG reasoning can jointly facilitate accurate recommendation and quality dialogue generation. By comparing with strong baselines on two real-world datasets, we demonstrate that KECR obtains state-of-the-art recommendation effectiveness, as well as competitive dialogue generation performance.

Details DOI

ICML Conference 2024 Conference Paper

MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data

Paul S. Scotti
Mihir Tripathy
Cesar Torrico 0002
Reese Kneeland
Tong Chen
Ashutosh Narang
Charan Santhirasegaran
Jonathan Xu

Reconstructions of visual perception from brain activity have improved tremendously, but the practical utility of such methods has been limited. This is because such models are trained independently per subject where each subject requires dozens of hours of expensive fMRI training data to attain high-quality results. The present work showcases high-quality reconstructions using only 1 hour of fMRI training data. We pretrain our model across 7 subjects and then fine-tune on minimal data from a new subject. Our novel functional alignment procedure linearly maps all brain data to a shared-subject latent space, followed by a shared non-linear mapping to CLIP image space. We then map from CLIP space to pixel space by fine-tuning Stable Diffusion XL to accept CLIP latents as inputs instead of text. This approach improves out-of-subject generalization with limited training data and also attains state-of-the-art image retrieval and reconstruction metrics compared to single-subject approaches. MindEye2 demonstrates how accurate reconstructions of perception are possible from a single visit to the MRI facility. All code is available on Github: https: //github. com/MedARC-AI/MindEyeV2

Details

YNIMG Journal 2023 Journal Article

Decoding the temporal representation of facial expression in face-selective regions

Zhihao Zhang
Tong Chen
Ye Liu
Chongyang Wang
Ke Zhao
Chang Hong Liu
Xiaolan Fu

Details DOI

JBHI Journal 2023 Journal Article

Heterogeneous Collaborative Learning for Personalized Healthcare Analytics via Messenger Distillation

Guanhua Ye
Tong Chen
Yawen Li
Lizhen Cui
Quoc Viet Hung Nguyen
Hongzhi Yin

The Healthcare Internet-of-Things (IoT) framework aims to provide personalized medical services with edge devices. Due to the inevitable data sparsity on an individual device, cross-device collaboration is introduced to enhance the power of distributed artificial intelligence. Conventional collaborative learning protocols (e. g. , sharing model parameters or gradients) strictly require the homogeneity of all participant models. However, real-life end devices have various hardware configurations (e. g. , compute resources), leading to heterogeneous on-device models with different architectures. Moreover, clients (i. e. , end devices) may participate in the collaborative learning process at different times. In this paper, we propose a S imilarity- Q uality-based M essenger D istillation (SQMD) framework for heterogeneous asynchronous on-device healthcare analytics. By introducing a preloaded reference dataset, SQMD enables all participant devices to distill knowledge from peers via messengers (i. e. , the soft labels of the reference dataset generated by clients) without assuming the same model architecture. Furthermore, the messengers also carry important auxiliary information to calculate the similarity between clients and evaluate the quality of each client model, based on which the central server creates and maintains a dynamic collaboration graph (communication graph) to improve the personalization and reliability of SQMD under asynchronous conditions. Extensive experiments on three real-life datasets show that SQMD achieves superior performance.

Details DOI

JBHI Journal 2022 Journal Article

Personalized On-Device E-Health Analytics With Decentralized Block Coordinate Descent

Guanhua Ye
Hongzhi Yin
Tong Chen
Miao Xu
Quoc Viet Hung Nguyen
Jiangning Song

Actuated by the growing attention to personal healthcare and the pandemic, the popularity of E-health is proliferating. Nowadays, enhancement on medical diagnosis via machine learning models has been highly effective in many aspects of e-health analytics. Nevertheless, in the classic cloud-based/centralized e-health paradigms, all the data will be centrally stored on the server to facilitate model training, which inevitably incurs privacy concerns and high time delay. Distributed solutions like Decentralized Stochastic Gradient Descent (D-SGD) are proposed to provide safe and timely diagnostic results based on personal devices. However, methods like D-SGD are subject to the gradient vanishing issue and usually proceed slowly at the early training stage, thereby impeding the effectiveness and efficiency of training. In addition, existing methods are prone to learning models that are biased towards users with dense data, compromising the fairness when providing E-health analytics for minority groups. In this paper, we propose a Decentralized Block Coordinate Descent (D-BCD) learning framework that can better optimize deep neural network-based models distributed on decentralized devices for E-health analytics. As a gradient-free optimization method, Block Coordinate Descent (BCD) mitigates the gradient vanishing issue and converges faster at the early stage compared with the conventional gradient-based optimization. To overcome the potential data scarcity issues for users’ local data, we propose similarity-based model aggregation that allows each on-device model to leverage knowledge from similar neighbor models, so as to achieve both personalization and high accuracy for the learned models. Benchmarking experiments on three real-world datasets illustrate the effectiveness and practicality of our proposed D-BCD, where additional simulation study showcases the strong applicability of D-BCD in real-life E-health scenarios.

Details DOI

IJCAI Conference 2021 Conference Paper

DA-GCN: A Domain-aware Attentive Graph Convolution Network for Shared-account Cross-domain Sequential Recommendation

Lei Guo
Li Tang
Tong Chen
Lei Zhu
Quoc Viet Hung Nguyen
Hongzhi Yin

Shared-account Cross-domain Sequential Recommendation (SCSR) is the task of recommending the next item based on a sequence of recorded user behaviors, where multiple users share a single account, and their behaviours are available in multiple domains. Existing work on solving SCSR mainly relies on mining sequential patterns via RNN-based models, which are not expressive enough to capture the relationships among multiple entities. Moreover, all existing algorithms try to bridge two domains via knowledge transfer in the latent space, and the explicit cross-domain graph structure is unexploited. In this work, we propose a novel graph-based solution, namely DA-GCN, to address the above challenges. Specifically, we first link users and items in each domain as a graph. Then, we devise a domain-aware graph convolution network to learn user-specific node representations. To fully account for users' domain-specific preferences on items, two novel attention mechanisms are further developed to selectively guide the message passing process. Extensive experiments on two real-world datasets are conducted to demonstrate the superiority of our DA-GCN method.

PDF Details DOI

IJCAI Conference 2021 Conference Paper

Discovering Collaborative Signals for Next POI Recommendation with Iterative Seq2Graph Augmentation

Yang Li
Tong Chen
Yadan Luo
Hongzhi Yin
Zi Huang

Being an indispensable component in location-based social networks, next point-of-interest (POI) recommendation recommends users unexplored POIs based on their recent visiting histories. However, existing work mainly models check-in data as isolated POI sequences, neglecting the crucial collaborative signals from cross-sequence check-in information. Furthermore, the sparse POI-POI transitions restrict the ability of a model to learn effective sequential patterns for recommendation. In this paper, we propose Sequence-to-Graph (Seq2Graph) augmentation for each POI sequence, allowing collaborative signals to be propagated from correlated POIs belonging to other sequences. We then devise a novel Sequence-to-Graph POI Recommender (SGRec), which jointly learns POI embeddings and infers a user's temporal preferences from the graph-augmented POI sequence. To overcome the sparsity of POI-level interactions, we further infuse category-awareness into SGRec with a multi-task learning scheme that captures the denser category-wise transitions. As such, SGRec makes full use of the collaborative signals for learning expressive POI representations, and also comprehensively uncovers multi-level sequential patterns for user preference modelling. Extensive experiments on two real-world datasets demonstrate the superiority of SGRec against state-of-the-art methods in next POI recommendation.

PDF Details DOI

JBHI Journal 2021 Journal Article

Disease Prediction via Graph Neural Networks

Zhenchao Sun
Hongzhi Yin
Hongxu Chen
Tong Chen
Lizhen Cui
Fan Yang

With the increasingly available electronic medical records (EMRs), disease prediction has recently gained immense research attention, where an accurate classifier needs to be trained to map the input prediction signals (e. g. , symptoms, patient demographics, etc.) to the estimated diseases for each patient. However, existing machine learning-based solutions heavily rely on abundant manually labeled EMR training data to ensure satisfactory prediction results, impeding their performance in the existence of rare diseases that are subject to severe data scarcity. For each rare disease, the limited EMR data can hardly offer sufficient information for a model to correctly distinguish its identity from other diseases with similar clinical symptoms. Furthermore, most existing disease prediction approaches are based on the sequential EMRs collected for every patient and are unable to handle new patients without historical EMRs, reducing their real-life practicality. In this paper, we introduce an innovative model based on Graph Neural Networks (GNNs) for disease prediction, which utilizes external knowledge bases to augment the insufficient EMR data, and learns highly representative node embeddings for patients, diseases and symptoms from the medical concept graph and patient record graph respectively constructed from the medical knowledge base and EMRs. By aggregating information from directly connected neighbor nodes, the proposed neural graph encoder can effectively generate embeddings that capture knowledge from both data sources, and is able to inductively infer the embeddings for a new patient based on the symptoms reported in her/his EMRs to allow for accurate prediction on both general diseases and rare diseases. Extensive experiments on a real-world EMR dataset have demonstrated the state-of-the-art performance of our proposed model.

Details DOI

JBHI Journal 2021 Journal Article

FENet: A Frequency Extraction Network for Obstructive Sleep Apnea Detection

Guanhua Ye
Hongzhi Yin
Tong Chen
Hongxu Chen
Lizhen Cui
Xiangliang Zhang

Obstructive Sleep Apnea (OSA) is a highly prevalent but inconspicuous disease that seriously jeopardizes the health of human beings. Polysomnography (PSG), the gold standard of detecting OSA, requires multiple specialized sensors for signal collection, hence patients have to physically visit hospitals and bear the costly treatment for a single detection. Recently, many single-sensor alternatives have been proposed to improve the cost efficiency and convenience. Among these methods, solutions based on RR-interval (i. e. , the interval between two consecutive pulses) signals reach a satisfactory balance among comfort, portability and detection accuracy. In this paper, we advance RR-interval based OSA detection by considering its real-world practicality from energy perspectives. As photoplethysmogram (PPG) pulse sensors are commonly equipped on smart wrist-worn wearable devices (e. g. , smart watches and wristbands), the energy efficiency of the detection model is crucial to fully support an overnight observation on patients. This creates challenges as the PPG sensors are unable to keep collecting continuous signals due to the limited battery capacity on smart wrist-worn devices. Therefore, we propose a novel Frequency Extraction Network (FENet), which can extract features from different frequency bands of the input RR-interval signals and generate continuous detection results with downsampled, discontinuous RR-interval signals. With the help of the one-to-multiple structure, FENet requires only one-third of the operation time of the PPG sensor, thus sharply cutting down the energy consumption and enabling overnight diagnosis. Experimental results on real OSA datasets reveal the state-of-the-art performance of FENet.

Details DOI

NeurIPS Conference 2021 Conference Paper

Localization with Sampling-Argmax

Jiefeng Li
Tong Chen
Ruiqi Shi
Yujing Lou
Yong-Lu Li
Cewu Lu

Soft-argmax operation is commonly adopted in detection-based methods to localize the target position in a differentiable manner. However, training the neural network with soft-argmax makes the shape of the probability map unconstrained. Consequently, the model lacks pixel-wise supervision through the map during training, leading to performance degradation. In this work, we propose sampling-argmax, a differentiable training method that imposes implicit constraints to the shape of the probability map by minimizing the expectation of the localization error. To approximate the expectation, we introduce a continuous formulation of the output distribution and develop a differentiable sampling process. The expectation can be approximated by calculating the average error of all samples drawn from the output distribution. We show that sampling-argmax can seamlessly replace the conventional soft-argmax operation on various localization tasks. Comprehensive experiments demonstrate the effectiveness and flexibility of the proposed method. Code is available at https: //github. com/Jeff-sjtu/sampling-argmax

PDF Details

ICML Conference 2021 Conference Paper

MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Jin Zhang 0016
Jianhao Wang
Hao Hu 0006
Tong Chen
Yingfeng Chen
Changjie Fan
Chongjie Zhang

Meta reinforcement learning (meta-RL) extracts knowledge from previous tasks and achieves fast adaptation to new tasks. Despite recent progress, efficient exploration in meta-RL remains a key challenge in sparse-reward tasks, as it requires quickly finding informative task-relevant experiences in both meta-training and adaptation. To address this challenge, we explicitly model an exploration policy learning problem for meta-RL, which is separated from exploitation policy learning, and introduce a novel empowerment-driven exploration objective, which aims to maximize information gain for task identification. We derive a corresponding intrinsic reward and develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies by sharing the knowledge of task inference. Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on various sparse-reward MuJoCo locomotion tasks and more complex sparse-reward Meta-World tasks.

Details

TIST Journal 2021 Journal Article

Passenger Mobility Prediction via Representation Learning for Dynamic Directed and Weighted Graphs

Yuandong Wang
Hongzhi Yin
Tong Chen
Chunyang Liu
Ben Wang
Tianyu Wo
Jie Xu

In recent years, ride-hailing services have been increasingly prevalent, as they provide huge convenience for passengers. As a fundamental problem, the timely prediction of passenger demands in different regions is vital for effective traffic flow control and route planning. As both spatial and temporal patterns are indispensable passenger demand prediction, relevant research has evolved from pure time series to graph-structured data for modeling historical passenger demand data, where a snapshot graph is constructed for each time slot by connecting region nodes via different relational edges (origin-destination relationship, geographical distance, etc.). Consequently, the spatiotemporal passenger demand records naturally carry dynamic patterns in the constructed graphs, where the edges also encode important information about the directions and volume (i.e., weights) of passenger demands between two connected regions. aspects in the graph-structure data. representation for DDW is the key to solve the prediction problem. However, existing graph-based solutions fail to simultaneously consider those three crucial aspects of dynamic, directed, and weighted graphs, leading to limited expressiveness when learning graph representations for passenger demand prediction. Therefore, we propose a novel spatiotemporal graph attention network, namely Gallat ( G raph prediction with all at tention) as a solution. In Gallat, by comprehensively incorporating those three intrinsic properties of dynamic directed and weighted graphs, we build three attention layers to fully capture the spatiotemporal dependencies among different regions across all historical time slots. Moreover, the model employs a subtask to conduct pretraining so that it can obtain accurate results more quickly. We evaluate the proposed model on real-world datasets, and our experimental results demonstrate that Gallat outperforms the state-of-the-art approaches.

Details DOI

NeurIPS Conference 2021 Conference Paper

Semialgebraic Representation of Monotone Deep Equilibrium Models and Applications to Certification

Tong Chen
Jean B. Lasserre
Victor Magron
Edouard Pauwels

Deep equilibrium models are based on implicitly defined functional relations and have shown competitive performance compared with the traditional deep networks. Monotone operator equilibrium networks (monDEQ) retain interesting performance with additional theoretical guaranties. Existing certification tools for classical deep networks cannot directly be applied to monDEQs for which much fewer tools exist. We introduce a semialgebraic representation for ReLU based monDEQs which allow to approximate the corresponding input output relation by semidefinite programs (SDP). We present several applications to network certification and obtain SDP models for the following problems: robustness certification, Lipschitz constant estimation, ellipsoidal uncertainty propagation. We use these models to certify robustness of monDEQs with respect to a general $L_p$ norm. Experimental results show that the proposed models outperform existing approaches for monDEQ certification. Furthermore, our investigations suggest that monDEQs are much more robust to $L_2$ perturbations than $L_{\infty}$ perturbations.

PDF Details

AAAI Conference 2020 Conference Paper

Learned Video Compression via Joint Spatial-Temporal Correlation Exploration

Haojie Liu
Han Shen
Lichao Huang
Ming Lu
Tong Chen
Zhan Ma

Traditional video compression technologies have been developed over decades in pursuit of higher coding efﬁciency. Ef- ﬁcient temporal information representation plays a key role in video coding. Thus, in this paper, we propose to exploit the temporal correlation using both ﬁrst-order optical ﬂow and second-order ﬂow prediction. We suggest an one-stage learning approach to encapsulate ﬂow as quantized features from consecutive frames which is then entropy coded with adaptive contexts conditioned on joint spatial-temporal priors to exploit second-order correlations. Joint priors are embedded in autoregressive spatial neighbors, co-located hyper elements and temporal neighbors using ConvLSTM recurrently. We evaluate our approach for the low-delay scenario with High-Efﬁciency Video Coding (H. 265/HEVC), H. 264/AVC and another learned video compression method, following the common test settings. Our work offers the state-of-theart performance, with consistent gains across all popular test sequences.

PDF Details

NeurIPS Conference 2020 Conference Paper

Semialgebraic Optimization for Lipschitz Constants of ReLU Networks

Tong Chen
Jean B. Lasserre
Victor Magron
Edouard Pauwels

The Lipschitz constant of a network plays an important role in many applications of deep learning, such as robustness certification and Wasserstein Generative Adversarial Network. We introduce a semidefinite programming hierarchy to estimate the global and local Lipschitz constant of a multiple layer deep neural network. The novelty is to combine a polynomial lifting for ReLU functions derivatives with a weak generalization of Putinar's positivity certificate. This idea could also apply to other, nearly sparse, polynomial optimization problems in machine learning. We empirically demonstrate that our method provides a trade-off with respect to state of the art linear programming approach, and in some cases we obtain better bounds in less time.

PDF Details

AAAI Conference 2020 Conference Paper

Where to Go Next: Modeling Long- and Short-Term User Preferences for Point-of-Interest Recommendation

Ke Sun
Tieyun Qian
Tong Chen
Yile Liang
Quoc Viet Hung Nguyen
Hongzhi Yin

Point-of-Interest (POI) recommendation has been a trending research topic as it generates personalized suggestions on facilities for users from a large number of candidate venues. Since users’ check-in records can be viewed as a long sequence, methods based on recurrent neural networks (RNNs) have recently shown promising applicability for this task. However, existing RNN-based methods either neglect users’ long-term preferences or overlook the geographical relations among recently visited POIs when modeling users’ shortterm preferences, thus making the recommendation results unreliable. To address the above limitations, we propose a novel method named Long- and Short-Term Preference Modeling (LSTPM) for next-POI recommendation. In particular, the proposed model consists of a nonlocal network for longterm preference modeling and a geo-dilated RNN for shortterm preference learning. Extensive experiments on two realworld datasets demonstrate that our model yields signiﬁcant improvements over the state-of-the-art methods.

PDF Details

IJCAI Conference 2019 Conference Paper

Inferring Substitutable Products with Deep Network Embedding

Shijie Zhang
Hongzhi Yin
Qinyong Wang
Tong Chen
Hongxu Chen
Quoc Viet Hung Nguyen

On E-commerce platforms, understanding the relationships (e. g. , substitute and complement) among products from user's explicit feedback, such as users' online transactions, is of great importance to boost extra sales. However, the significance of such relationships is usually neglected by existing recommender systems. In this paper, we propose a semisupervised deep embedding model, namely, Substitute Products Embedding Model (SPEM), which models the substitutable relationships between products by preserving the second-order proximity, negative first-order proximity and semantic similarity in a product co-purchasing graph based on user's purchasing behaviours. With SPEM, the learned representations of two substitutable products align closely in the latent embedding space. Extensive experiments on real-world datasets are conducted, and the results verify that our model outperforms state-of-the-art baselines.

PDF Details