Arrow Research search

Author name cluster

Zhuo Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

41 papers
2 author rows

Possible papers (41)

AAAI Conference 2026 Conference Paper

Force-Aware 3D Contact Modeling for Stable Grasp Generation

  • Zhuo Chen
  • Zhongqun Zhang
  • Yihua Cheng
  • Aleš Leonardis
  • Hyung Jin Chang

Contact-based grasp generation plays a crucial role in various applications. Recent methods typically focus on the geometric structure of objects, producing grasps with diverse hand poses and plausible contact points. However, these approaches often overlook the physical attributes of the grasp, specifically the contact force, leading to reduced stability of the grasp. In this paper, we focus on stable grasp generation using explicit contact force predictions. First, we define a force-aware contact representation by transforming the normal force value into discrete levels and encoding it using a one-hot vector. Next, we introduce force-aware stability constraints. We define the stability problem as an acceleration minimization task and explicitly relate stability with contact geometry by formulating the underlying physical constraints. Finally, we present a pose optimizer that systematically integrates our contact representation and stability constraints to enable stable grasp generation. We show that these constraints can help identify key contact points for stability which provide effective initialization and guidance for optimization towards a stable grasp. Experiments are carried out on two public benchmarks, showing that our method yields roughly a 20% improvement in stability metrics and adapts well to novel objects.
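The force-aware contact representation described above, which discretizes a continuous normal-force value into levels and encodes the level as a one-hot vector, can be sketched as follows. This is an illustrative sketch, not the authors' code; the force range and number of levels are assumed for demonstration:

```python
import numpy as np

def force_to_onehot(force, f_min=0.0, f_max=10.0, n_levels=8):
    """Discretize a normal-force magnitude into one of n_levels uniform
    bins over an assumed range [f_min, f_max], and return the one-hot
    encoding of that bin (illustrative values, not from the paper)."""
    # Clip to the assumed force range, then map to a bin index.
    force = float(np.clip(force, f_min, f_max))
    level = min(int((force - f_min) / (f_max - f_min) * n_levels), n_levels - 1)
    onehot = np.zeros(n_levels, dtype=np.float32)
    onehot[level] = 1.0
    return onehot
```

A downstream grasp model could then concatenate this vector with each contact point's geometric features.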

AAAI Conference 2026 Conference Paper

Multi-Modal Fact Knowledge Generation for Imbalanced Cross-Source Entity Alignment

  • Qian Li
  • Cheng Ji
  • Zhaoji Liang
  • Yuzheng Zhang
  • Zhuo Chen
  • Siyuan Liang

Multi-modal imbalanced cross-source entity alignment aims to identify equivalent entity pairs across multi-modal knowledge graphs (MMKGs) that encompass diverse data sources with imbalanced modalities, a setting that poses significant challenges due to the non-uniform distribution of information across modalities. Existing methods encounter major limitations in aligning entities across MMKGs, where missing data and modality-specific inconsistencies create information gaps. These gaps, stemming from disparities in neighborhood structure and attribute availability, reduce alignment performance. To address these challenges, we propose a novel multi-modal fact knowledge generation framework to advance imbalanced cross-source entity alignment. Utilizing large language models (LLMs) for comprehensive knowledge completion, our framework enriches MMKGs by synthesizing missing neighboring entities and relational attributes, enabling precise one-to-one similarity comparisons across all relations and attributes. Specifically, neighbor entity completion generates probable neighboring entities to fill structural gaps, while attribute completion synthesizes missing relational attributes to improve alignment. A facts evaluation module assesses the generated triples, ensuring that only high-quality information supports the alignment. Extensive experiments on benchmark datasets demonstrate that our framework significantly outperforms strong competitors, achieving superior entity alignment performance.

AAAI Conference 2026 Conference Paper

rMMEA: Robust Multi-Modal Entity Alignment with Missing and Noise Visual Modality

  • Lingbing Guo
  • Zhuo Chen
  • Yichi Zhang
  • Wenbin Guo
  • Haonan Yang
  • Zhao Li
  • Zirui Chen
  • Xin Wang

Recently, multi-modal embedding methods have flourished in entity alignment. As state-of-the-art approaches evolve rapidly, missing visual modality (i.e., images) has emerged as a critical challenge. While the visual modality typically offers the most informative signals in multi-modal entity alignment (MMEA), it is frequently unavailable for many entities. Existing methods commonly use dummy vectors to represent visual-missing embeddings, which negatively impacts both model training and inference. In this paper, we propose robust multi-modal entity alignment (rMMEA), which leverages ranking-based knowledge distillation and mutual information (MI) estimation to address missing modalities while enhancing noise robustness. Unlike conventional teacher-student distillation that requires the student to replicate teacher outputs, rMMEA learns soft rankings from the pure and complete modality sides while capturing implicit key semantics of teacher embeddings through mutual information maximization, allowing it to avoid strict point-to-point alignment. Experimental results across multiple benchmarks and settings demonstrate that rMMEA significantly outperforms state-of-the-art anti-modality-missing methods in terms of effectiveness and efficiency.

AAAI Conference 2026 Conference Paper

UniHR: Hierarchical Representation Learning for Unified Knowledge Graph Link Prediction

  • Zhiqiang Liu
  • Yin Hua
  • Mingyang Chen
  • Yichi Zhang
  • Zhuo Chen
  • Lei Liang
  • Wen Zhang

Real-world knowledge graphs (KGs) contain not only standard triple-based facts, but also more complex, heterogeneous types of facts, such as hyper-relational facts with auxiliary key-value pairs, temporal facts with additional timestamps, and nested facts that imply relationships between facts. These richer forms of representation have attracted significant attention due to their enhanced expressiveness and capacity to model complex semantics in real-world scenarios. However, most existing studies suffer from two main limitations: (1) they typically focus on modeling only specific types of facts, thus making it difficult to generalize to real-world scenarios with multiple fact types; and (2) they struggle to achieve generalizable hierarchical (inter-fact and intra-fact) modeling due to the complexity of these representations. To overcome these limitations, we propose UniHR, a Unified Hierarchical Representation learning framework, which consists of a learning-optimized Hierarchical Data Representation (HiDR) module and a unified Hierarchical Structure Learning (HiSL) module. The HiDR module unifies hyper-relational KGs, temporal KGs, and nested factual KGs into triple-based representations. Then HiSL incorporates intra-fact and inter-fact message passing, focusing on enhancing both semantic information within individual facts and enriching the structural information between facts. To go beyond the unified method itself, we further explore the potential of unified representation in complex real-world scenarios. Extensive experiments on 9 datasets across 5 types of KGs demonstrate the effectiveness of UniHR and highlight the strong potential of unified representations.

NeurIPS Conference 2025 Conference Paper

Auto-Connect: Connectivity-Preserving RigFormer with Direct Preference Optimization

  • Jingfeng Guo
  • Jian Liu
  • Jinnan Chen
  • Shiwei Mao
  • Changrong Hu
  • Puhua Jiang
  • Junlin Yu
  • Jing Xu

We introduce Auto-Connect, a novel approach for automatic rigging that explicitly preserves skeletal connectivity through a connectivity-preserving tokenization scheme. Unlike previous methods that predict bone positions represented as two joints or first predict points before determining connectivity, our method employs special tokens to define endpoints for each joint's children and for each hierarchical layer, effectively automating connectivity relationships. This approach significantly enhances topological accuracy by integrating connectivity information directly into the prediction framework. To further guarantee high-quality topology, we implement a topology-aware reward function that quantifies topological correctness, which is then utilized in a post-training phase through reward-guided Direct Preference Optimization. Additionally, we incorporate implicit geodesic features for latent top-$k$ bone selection, which substantially improves skinning quality. By leveraging geodesic distance information within the model's latent space, our approach intelligently determines the most influential bones for each vertex, effectively mitigating common skinning artifacts. This combination of connectivity-preserving tokenization, reward-guided fine-tuning, and geodesic-aware bone selection enables our model to consistently generate more anatomically plausible skeletal structures with superior deformation properties.

NeurIPS Conference 2025 Conference Paper

Consistency of Physics-Informed Neural Networks for Second-Order Elliptic Equations

  • Yuqian Cheng
  • Zhuo Chen
  • Qian Lin

Physics-informed neural networks (PINNs) are widely applied to solving differential equations. However, few studies have discussed their consistency. In this paper, we consider the consistency of PINNs when applied to second-order elliptic equations with Dirichlet boundary conditions. We first provide the necessary and sufficient condition for the consistency of the physics-informed kernel gradient flow algorithm; then, as a direct corollary, when the neural network is sufficiently wide, we obtain a necessary and sufficient condition for the consistency of PINNs based on neural tangent kernel theory. We also estimate non-asymptotic loss bounds for physics-informed kernel gradient flow and PINNs under suitably stronger assumptions. Finally, these results inspire us to construct a notable pathological example in which the PINN method is inconsistent.
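The PINN objective studied here combines an interior PDE-residual term with a boundary penalty. A minimal one-dimensional sketch of that objective, using central finite differences in place of automatic differentiation (the equation, collocation points, and penalty weighting are illustrative, not taken from the paper):

```python
import numpy as np

def pinn_loss(u, f, x_interior, h=1e-3):
    """Empirical PINN-style loss for -u'' = f on (0, 1) with Dirichlet
    conditions u(0) = u(1) = 0. Central finite differences stand in for
    automatic differentiation; x_interior holds collocation points."""
    u_xx = (u(x_interior + h) - 2 * u(x_interior) + u(x_interior - h)) / h**2
    residual = np.mean((-u_xx - f(x_interior)) ** 2)  # interior PDE residual
    boundary = u(0.0) ** 2 + u(1.0) ** 2              # Dirichlet boundary penalty
    return residual + boundary

# For -u'' = pi^2 sin(pi x), the exact solution u(x) = sin(pi x)
# should drive the loss to zero up to discretization error.
x = np.linspace(0.1, 0.9, 9)
loss = pinn_loss(lambda t: np.sin(np.pi * t),
                 lambda t: np.pi**2 * np.sin(np.pi * t), x)
```

In practice a PINN minimizes this loss over the parameters of a neural network substituted for `u`; the consistency question is whether such minimizers converge to the true solution.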

IROS Conference 2025 Conference Paper

Contactless and Economical Chemical Reaction Platform Based on Ultrasonic Field

  • Yunsheng Li
  • Qiao Wang
  • Yuyan Liu
  • Bo Yuan
  • Zhuo Chen
  • Qiang Huang 0002
  • Tatsuo Arai
  • Xiaoming Liu 0007

Chemical reactions constitute a cornerstone of fundamental scientific inquiry, yet traditional methodologies and platforms are encumbered by excessive reagent and consumable demands. Emerging alternatives, such as microfluidic systems, while innovative, suffer from intricate fabrication processes and elevated costs associated with operator training. Other contemporary approaches face limitations including reagent compatibility constraints and prohibitively expensive instrumentation. To address these challenges, this study introduces a contactless chemical reaction platform leveraging an ultrasonic vortex field to achieve stable capture, microscale droplet transport, and sequential multi-droplet mixing without direct contact. This platform substantially reduces contamination risks, minimizes reagent and consumable usage, accommodates a broad spectrum of reagent types, and imposes minimal demands on operator expertise. Demonstrating robust performance in microdose reaction control, the system offers significant potential for advancing chemical research and its applications.

IJCAI Conference 2025 Conference Paper

DO-CoLM: Dynamic 3D Conformation Relationships Capture with Self-Adaptive Ordering Molecular Relational Modeling in Language Models

  • Zhuo Chen
  • Jiahui Zhang
  • Sihan Wang
  • Hongxin Xiang
  • Jianmin Wang
  • Wenjie Du
  • Yang Wang

Molecular Relational Learning (MRL) aims to understand interactions between molecular pairs, playing a critical role in advancing biochemical research. Recently, Large Language Models (LLMs), with their extensive knowledge bases and advanced reasoning capabilities, have emerged as powerful tools for MRL. However, existing LLMs, which primarily rely on SMILES strings and molecular graphs, face two major challenges. They struggle to capture molecular stereochemistry and dynamics, as molecules possess multiple 3D conformations with varying reactivity and dynamic transformation relationships that are essential for accurately predicting molecular interactions but cannot be effectively represented by 1D SMILES or 2D molecular graphs. Additionally, these models do not consider the autoregressive nature of LLMs, overlooking the impact of input order on model performance. To address these issues, we propose DO-CoLM: a Dynamic relationship capture and self-adaptive Ordering 3D molecular Conformation LM for MRL. By introducing modules to dynamically model intra-molecular and inter-molecular conformational relationships and adaptively adjust the molecular modality input order, DO-CoLM achieves superior performance, as demonstrated by experimental results on 12 cross-domain datasets.

AAAI Conference 2025 Conference Paper

ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering

  • Yakun Song
  • Zhuo Chen
  • Xiaofei Wang
  • Ziyang Ma
  • Xie Chen

The language model (LM) approach based on acoustic and linguistic prompts, such as VALL-E, has achieved remarkable progress in the field of zero-shot audio generation. However, existing methods still have some limitations: 1) repetitions, transpositions, and omissions in the output synthesized speech due to limited alignment constraints between audio and phoneme tokens; 2) challenges of fine-grained control over the synthesized speech with autoregressive (AR) language model; 3) infinite silence generation due to the nature of AR-based decoding, especially under the greedy strategy. To alleviate these issues, we propose ELLA-V, a simple but efficient LM-based zero-shot text-to-speech (TTS) framework, which enables fine-grained control over synthesized audio at the phoneme level. The key to ELLA-V is interleaving sequences of acoustic and phoneme tokens, where phoneme tokens appear ahead of the corresponding acoustic tokens. The experimental findings reveal that our model outperforms baselines in terms of accuracy and delivers more stable results using both greedy and sampling-based decoding strategies.

IJCAI Conference 2025 Conference Paper

ExpTalk: Diverse Emotional Expression via Adaptive Disentanglement and Refined Alignment for Speech-Driven 3D Facial Animation

  • Zhan Qu
  • Shengyu Zhang
  • Mengze Li
  • Zhuo Chen
  • Chengfei Lv
  • Zhou Zhao
  • Fei Wu

Speech-driven 3D facial animation aims to create lifelike facial expressions that synchronize accurately with speech. Despite significant progress, many existing methods focus on generating facial animation with a fixed emotional state, neglecting the diverse transformations of facial emotions under a given speech input. To solve this issue, we focus on exploring the refined alignment between speech representations and multiple domains of facial expression information. We aim to disentangle spoken-language and emotional facial priors from speech expressions, to guide the refinement of the facial vertices based on speech. To accomplish this objective, we propose ExpTalk, which first applies an Adaptive Disentanglement Variational Autoencoder (AD-VAE) to decouple facial expression aligned with the spoken language and emotions of speech through contrastive learning. Then a Refined Alignment Diffusion (RAD) module is employed to iteratively refine the decoupled facial expression priors through diffusion-based perturbations, producing facial animations that align with the emotional variations of the given speech. Extensive experiments prove the effectiveness of ExpTalk, which surpasses state-of-the-art methods by a large margin.

ICML Conference 2025 Conference Paper

FreeMesh: Boosting Mesh Generation with Coordinates Merging

  • Jian Liu 0036
  • Haohan Weng
  • Biwen Lei
  • Xianghui Yang
  • Zibo Zhao 0001
  • Zhuo Chen
  • Song Guo 0001
  • Tao Han 0002

The next-coordinate prediction paradigm has emerged as the de facto standard in current auto-regressive mesh generation methods. Despite their effectiveness, there is no efficient measurement for the various tokenizers that serialize meshes into sequences. In this paper, we introduce a new metric Per-Token-Mesh-Entropy (PTME) to evaluate the existing mesh tokenizers theoretically without any training. Building upon PTME, we propose a plug-and-play tokenization technique called coordinate merging. It further improves the compression ratios of existing tokenizers by rearranging and merging the most frequent patterns of coordinates. Through experiments on various tokenization methods like MeshXL, MeshAnything V2, and Edgerunner, we further validate the performance of our method. We hope that the proposed PTME and coordinate merging can enhance the existing mesh tokenizers and guide the further development of native mesh generation.
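Coordinate merging, as described, compresses a serialized mesh by merging the most frequent coordinate patterns, which is in spirit a BPE-style merge over the token sequence. A toy sketch of a single merge step under that reading (illustrative only, not the paper's tokenizer):

```python
from collections import Counter

def merge_most_frequent_pair(tokens):
    """One BPE-style merge step over a coordinate-token sequence:
    find the most frequent adjacent pair and replace each occurrence
    with a single merged token. Illustrative sketch of the idea, not
    the paper's actual coordinate-merging procedure."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens, None
    (a, b), _ = pairs.most_common(1)[0]
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            merged.append((a, b))  # merged token shortens the sequence
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged, (a, b)
```

Repeating such merges raises the compression ratio, which is what a metric like Per-Token-Mesh-Entropy would reward.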

AAAI Conference 2025 Conference Paper

Infer the Whole from a Glimpse of a Part: Keypoint-Based Knowledge Graph for Vehicle Re-Identification

  • Kai Lv
  • Yunlong Li
  • Zhuo Chen
  • Shuo Wang
  • Sheng Han
  • Youfang Lin

Vehicle re-identification aims to match vehicles across non-overlapping camera views. Many existing methods extract features from a single image and therefore lack view-invariance when comparing vehicles of different orientations. As a result, discriminative parts obscured by viewpoint changes cannot contribute effectively to matching. This work presents a novel keypoint-based framework for vehicle Re-ID. We propose to explicitly model the intrinsic structural relationships between vehicle components via a knowledge graph. By establishing connections between keypoints, our approach leverages such priors to match vehicles even when some parts are not directly comparable due to orientation inconsistencies. Specifically, given query and gallery images, we first detect visible keypoints. Then, a transformer-based model infers features for non-overlapping keypoints by conditioning on visible correspondences defined in the knowledge graph. The final representation integrates visible and inferred features. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches on standard benchmarks under cross-view matching scenarios. To our knowledge, this is the first work introducing structural priors via keypoint knowledge graphs for view-invariant vehicle re-identification.

AAAI Conference 2025 Conference Paper

K-ON: Stacking Knowledge on the Head Layer of Large Language Model

  • Lingbing Guo
  • Yichi Zhang
  • Zhongpu Bo
  • Zhuo Chen
  • Mengshu Sun
  • Zhiqiang Zhang
  • Wen Zhang
  • Huajun Chen

Recent advancements in large language models (LLMs) have significantly improved various natural language processing (NLP) tasks. Typically, LLMs are trained to predict the next token, which aligns well with many NLP tasks. However, in knowledge graph (KG) scenarios, entities are the fundamental units, and identifying an entity requires at least several tokens. This leads to a granularity mismatch between KGs and natural languages. To address this issue, we propose K-ON, which integrates KG knowledge into the LLM by employing multiple head layers for next k-step prediction. K-ON not only generates entity-level results in one step, but also enables a contrastive loss against entities, which is the most powerful tool in KG representation learning. Experimental results show that K-ON outperforms state-of-the-art methods that incorporate text and even other modalities.

NeurIPS Conference 2025 Conference Paper

L$^2$M: Mutual Information Scaling Law for Long-Context Language Modeling

  • Zhuo Chen
  • Oriol Comas
  • Zhuotao Jin
  • Di Luo
  • Marin Soljacic

We present a universal theoretical framework for understanding *long-context language modeling* based on a *bipartite* mutual information scaling law that we rigorously verify in natural language. We demonstrate that bipartite mutual information captures multi-token interactions distinct from and scaling independently of conventional two-point mutual information, and show that this provides a more complete characterization of the dependencies needed for accurately modeling long sequences. Leveraging this scaling law, we formulate the **L**ong-context **L**anguage **M**odeling (**L**$^2$**M**) condition, which lower bounds the necessary scaling of a model's history state—the latent variables responsible for storing past information—for effective long-context modeling. We validate the framework and its predictions on transformer and state-space models. Our work provides a principled foundation to understand long-context modeling and to design more efficient architectures with stronger long-context capabilities, with potential applications beyond natural language.

AAAI Conference 2025 Conference Paper

Language Model Can Listen While Speaking

  • Ziyang Ma
  • Yakun Song
  • Chenpeng Du
  • Jian Cong
  • Zhuo Chen
  • Yuping Wang
  • Yuxuan Wang
  • Xie Chen

Dialogue serves as the most natural manner of human-computer interaction (HCI). Recent advancements in speech language models (SLM) have significantly enhanced speech-based conversational AI. However, these models are limited to turn-based conversation, lacking the ability to interact with humans in real-time spoken scenarios, for example, being interrupted when the generated content is not satisfactory. To address these limitations, we explore full duplex modeling (FDM) in interactive speech language models (iSLM), focusing on enhancing real-time interaction and, more explicitly, exploring the quintessential ability of interruption. We introduce a novel model design, namely listening-while-speaking language model (LSLM), an end-to-end system equipped with both listening and speaking channels. Our LSLM employs a token-based decoder-only TTS for speech generation and a streaming self-supervised learning (SSL) encoder for real-time audio input. LSLM fuses both channels for autoregressive generation and detects turn-taking in real time. Three fusion strategies—early fusion, middle fusion, and late fusion—are explored, with middle fusion achieving an optimal balance between speech generation and real-time interaction. Two experimental settings, command-based FDM and voice-based FDM, demonstrate LSLM’s robustness to noise and sensitivity to diverse instructions. Our results highlight LSLM’s capability to achieve duplex communication with minimal impact on existing systems. This study aims to advance the development of interactive speech dialogue systems, enhancing their applicability in real-world contexts.

NeurIPS Conference 2025 Conference Paper

Mesh-RFT: Enhancing Mesh Generation via Fine-grained Reinforcement Fine-Tuning

  • Jian Liu
  • Jing Xu
  • Song Guo
  • Jing Li
  • Jingfeng Guo
  • Jiaao Yu
  • Haohan Weng
  • Biwen Lei

Existing pretrained models for 3D mesh generation often suffer from data biases and produce low-quality results, while global reinforcement learning (RL) methods rely on object-level rewards that struggle to capture local structure details. To address these challenges, we present $\textbf{Mesh-RFT}$, a novel fine-grained reinforcement fine-tuning framework that employs Masked Direct Preference Optimization (M-DPO) to enable localized refinement via quality-aware face masking. To facilitate efficient quality evaluation, we introduce an objective topology-aware scoring system to evaluate geometric integrity and topological regularity at both object and face levels through two metrics: Boundary Edge Ratio (BER) and Topology Score (TS). By integrating these metrics into a fine-grained RL strategy, Mesh-RFT becomes the first method to optimize mesh quality at the granularity of individual faces, resolving localized errors while preserving global coherence. Experimental results show that our M-DPO approach reduces Hausdorff Distance (HD) by 24.6% and improves Topology Score (TS) by 3.8% over pre-trained models, while outperforming global DPO methods with a 17.4% HD reduction and 4.9% TS gain. These results demonstrate Mesh-RFT’s ability to improve geometric integrity and topological regularity, achieving new state-of-the-art performance in production-ready mesh generation.
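The Boundary Edge Ratio used above can be sketched under one plausible reading: an edge lying on exactly one face is an open boundary edge, and the metric is the fraction of such edges. This is an illustrative sketch, not the paper's exact definition:

```python
from collections import Counter

def boundary_edge_ratio(faces):
    """Fraction of mesh edges incident to exactly one face (open
    boundary edges). One plausible reading of a 'Boundary Edge Ratio'
    style metric; illustrative, not the paper's exact formula.
    faces: list of vertex-index tuples, e.g. triangles."""
    edge_count = Counter()
    for face in faces:
        n = len(face)
        for i in range(n):
            # Undirected edge, normalized by sorting its endpoints.
            edge = tuple(sorted((face[i], face[(i + 1) % n])))
            edge_count[edge] += 1
    boundary = sum(1 for c in edge_count.values() if c == 1)
    return boundary / len(edge_count)
```

On a watertight mesh every edge is shared by two faces, so the ratio is 0; a lone triangle scores 1.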

NeurIPS Conference 2025 Conference Paper

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

  • Ziyang Ma
  • Yinghao Ma
  • Yanqiao Zhu
  • Chen Yang
  • Yi-Wen Chao
  • Ruiyang Xu
  • Wenxi Chen
  • Yuanzhe Chen

We introduce MMAR, a new benchmark designed to evaluate the deep reasoning capabilities of Audio-Language Models (ALMs) across massive multi-disciplinary tasks. MMAR comprises 1,000 meticulously curated audio-question-answer triplets, collected from real-world internet videos and refined through iterative error corrections and quality checks to ensure high quality. Unlike existing benchmarks that are limited to specific domains of sound, music, or speech, MMAR extends them to a broad spectrum of real-world audio scenarios, including mixed-modality combinations of sound, music, and speech. Each question in MMAR is hierarchically categorized across four reasoning layers: Signal, Perception, Semantic, and Cultural, with additional sub-categories within each layer to reflect task diversity and complexity. To further foster research in this area, we annotate every question with a Chain-of-Thought (CoT) rationale to promote future advancements in audio reasoning. Each item in the benchmark demands multi-step deep reasoning beyond surface-level understanding. Moreover, a part of the questions requires graduate-level perceptual and domain-specific knowledge, elevating the benchmark's difficulty and depth. We evaluate MMAR using a broad set of models, including Large Audio-Language Models (LALMs), Large Audio Reasoning Models (LARMs), Omni Language Models (OLMs), Large Language Models (LLMs), and Large Reasoning Models (LRMs), with audio caption inputs. The performance of these models on MMAR highlights the benchmark's challenging nature, and our analysis further reveals critical limitations of understanding and reasoning capabilities among current models. These findings underscore the urgent need for greater research attention in audio-language reasoning, including both data and algorithm innovation. We hope MMAR will serve as a catalyst for future advances in this important but little-explored area.

NeurIPS Conference 2025 Conference Paper

ModuLM: Enabling Modular and Multimodal Molecular Relational Learning with Large Language Models

  • Zhuo Chen
  • Yizhen Zheng
  • Huan Yee Koh
  • Hongxin Xiang
  • Linjiang Chen
  • Wenjie Du
  • Yang Wang

Molecular Relational Learning (MRL) aims to understand interactions between molecular pairs, playing a critical role in advancing biochemical research. With the recent development of large language models (LLMs), a growing number of studies have explored the integration of MRL with LLMs and achieved promising results. However, the increasing availability of diverse LLMs and molecular structure encoders has significantly expanded the model space, presenting major challenges for benchmarking. Currently, there is no LLM framework that supports both flexible molecular input formats and dynamic architectural switching. To address these challenges, reduce redundant coding, and ensure fair model comparison, we propose ModuLM, a framework designed to support flexible LLM-based model construction and diverse molecular representations. ModuLM provides a rich suite of modular components, including 8 types of 2D molecular graph encoders, 11 types of 3D molecular conformation encoders, 7 types of interaction layers, and 7 mainstream LLM backbones. Owing to its highly flexible model assembly mechanism, ModuLM enables the dynamic construction of over 50,000 distinct model configurations. In addition, we provide comprehensive benchmark results to demonstrate the effectiveness of ModuLM in supporting LLM-based MRL tasks.

EAAI Journal 2025 Journal Article

Multi-source contrastive cluster center method for cross-domain bearing fault identification

  • Pengfei Chen
  • Lizhen Wu
  • Rongzhen Zhao
  • Kongyuan Wei
  • Yuqiao Zheng
  • Linfeng Deng
  • Yongfei Zhang
  • Mingkuan Shi

Because of the complexity and time-varying attributes of the exterior surroundings, rolling bearings commonly operate under variable working conditions. This gives rise to a multi-source domain adaptation problem involving multiple source domains and a target domain. In this circumstance, identifying faults across multiple domains with a model built on a single source domain limits generalization: because a single source domain provides an incomplete training sample and only limited access to fault knowledge, the established model is prone to over-fitting, and its generalization ability under complex and variable working conditions is reduced. Hence, this paper proposes a Multi-source Contrastive Cluster Center (MS3C) method to address these issues. Experimental findings on two datasets suggest that MS3C not only considers the domain shifts of the same classes between different source domains and the target domain, but also adaptively aligns the feature distributions of the same classes in different source domains; consequently, MS3C achieves a higher identification rate, better clustering and classification performance, and superior convergence.

AIIM Journal 2025 Journal Article

Multiplex aggregation combining sample reweight composite network for pathology image segmentation

  • Dawei Fan
  • Zhuo Chen
  • Yifan Gao
  • Jiaming Yu
  • Kaibin Li
  • Yi Wei
  • Yanping Chen
  • Riqing Chen

In digital pathology, nuclei segmentation is a critical task for pathological image analysis, holding significant importance for diagnosis and research. However, challenges such as blurred boundaries between nuclei and background regions, domain shifts between pathological images, and uneven distribution of nuclei pose significant obstacles to segmentation tasks. To address these issues, we propose an innovative Causal inference inspired Diversified aggregation convolution Network named CDNet, which integrates a Diversified Aggregation Convolution (DAC), a Causal Inference Module (CIM) based on causal discovery principles, and a comprehensive loss function. DAC improves the issue of unclear boundaries between nuclei and background regions, and CIM enhances the model’s cross-domain generalization ability. A novel Stable-Weighted Combined loss function was designed that combined the chunk-computed Dice Loss with the Focal Loss and the Causal Inference Loss to address the issue of uneven nuclei distribution. Experimental evaluations on the MoNuSeg, GLySAC, and MoNuSAC datasets demonstrate that CDNet significantly outperforms other models and exhibits strong generalization capabilities. Specifically, CDNet outperforms the second-best model by 0.79% (mIoU) and 1.32% (DSC) on the MoNuSeg dataset, by 2.65% (mIoU) and 2.13% (DSC) on the GLySAC dataset, and by 1.54% (mIoU) and 1.10% (DSC) on the MoNuSAC dataset. Code is publicly available at https://github.com/7FFDW/CDNet.

AAAI Conference 2025 Conference Paper

Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation

  • Yichi Zhang
  • Zhuo Chen
  • Lingbing Guo
  • Yajing Xu
  • Binbin Hu
  • Ziqi Liu
  • Wen Zhang
  • Huajun Chen

Multi-modal knowledge graph completion (MMKGC) aims to discover unobserved knowledge from given multi-modal knowledge graphs (MMKG), collaboratively leveraging structural information from the triples and multi-modal information of the entities to overcome the inherent incompleteness. Existing MMKGC methods usually extract multi-modal features with pre-trained models and employ fusion modules to integrate multi-modal features for the entities. This often results in coarse handling of multi-modal entity information, overlooking the nuanced, fine-grained semantic details and their complex interactions. To tackle this shortfall, we introduce a novel framework MyGO to tokenize, fuse, and augment the fine-grained multi-modal representations of entities and enhance the MMKGC performance. Motivated by the tokenization technology, MyGO tokenizes multi-modal entity information as fine-grained discrete tokens and learns entity representations with a cross-modal entity encoder. To further augment the multi-modal representations, MyGO incorporates fine-grained contrastive learning to highlight the specificity of the entity representations. Experiments on standard MMKGC benchmarks reveal that our method surpasses 19 of the latest models, underlining its superior performance.

AAAI Conference 2024 Short Paper

Dual Mapping of 2D StyleGAN for 3D-Aware Image Generation and Manipulation (Student Abstract)

  • Zhuo Chen
  • Haimei Zhao
  • Chaoyue Wang
  • Bo Yuan
  • Xiu Li

3D-aware GANs successfully solve the problem of 3D-consistency generation and furthermore provide a 3D shape of the generated object. However, the application of the volume renderer disturbs the disentanglement of the latent space, which makes it difficult to manipulate 3D-aware GANs and lowers the image quality of style-based generators. In this work, we devise a dual-mapping framework to make the generated images of pretrained 2D StyleGAN consistent in 3D space. We utilize a tri-plane representation to estimate the 3D shape of the generated object and two mapping networks to bridge the latent space of StyleGAN and the 3D tri-plane space. Our method does not alter the parameters of the pretrained generator, which means the interpretability of latent space is preserved for various image manipulations. Experiments show that our method lifts the 3D awareness of pretrained 2D StyleGAN to 3D-aware GANs and outperforms the 3D-aware GANs in controllability and image quality.

NeurIPS Conference 2024 Conference Paper

Dual-Diffusion for Binocular 3D Human Pose Estimation

  • Xiaoyue Wan
  • Zhuo Chen
  • Bingzhi Duan
  • Xu Zhao

Binocular 3D human pose estimation (HPE), reconstructing a 3D pose from 2D poses of two views, offers practical advantages by combining multiview geometry with the convenience of a monocular setup. However, compared to a multiview setup, the reduction in the number of cameras increases uncertainty in 3D reconstruction. To address this issue, we leverage the diffusion model, which has shown success in monocular 3D HPE by recovering 3D poses from noisy data with high uncertainty. Yet, the uncertainty distribution of initial 3D poses remains unknown. Considering that 3D errors stem from 2D errors within geometric constraints, we recognize that the uncertainties of 3D and 2D are integrated in a binocular configuration, with the initial 2D uncertainty being well-defined. Based on this insight, we propose Dual-Diffusion specifically for Binocular 3D HPE, simultaneously denoising the uncertainties in 2D and 3D, and recovering plausible and accurate results. Additionally, we introduce Z-embedding as an additional condition for denoising and implement baseline-width-related pose normalization to enhance the model flexibility for various baseline settings. This is crucial as 3D error influence factors encompass depth and baseline width. Extensive experiments validate the effectiveness of our Dual-Diffusion in 2D refinement and 3D estimation. The code and models are available at https://github.com/sherrywan/Dual-Diffusion.

IJCAI Conference 2024 Conference Paper

LLM-based Multi-Level Knowledge Generation for Few-shot Knowledge Graph Completion

  • Qian Li
  • Zhuo Chen
  • Cheng Ji
  • Shiqi Jiang
  • Jianxin Li

Knowledge Graphs (KGs) are pivotal in various NLP applications but often grapple with incompleteness, especially due to the long-tail problem where infrequent, unpopular relationships drastically reduce the KG completion performance. In this paper, we focus on Few-shot Knowledge Graph Completion (FKGC), a task addressing these gaps in long-tail scenarios. Amidst the rapid evolution of Large Language Models, we propose a generation-based FKGC paradigm facilitated by LLM distillation. Our MuKDC framework employs multi-level knowledge distillation for few-shot KG completion, generating supplementary knowledge to mitigate data scarcity in few-shot environments. MuKDC comprises two primary components: Multi-level Knowledge Generation, which enriches the KG at various levels, and Consistency Assessment, to ensure the coherence and reliability of the generated knowledge. Most notably, our method achieves SOTA results in both FKGC and multi-modal FKGC benchmarks, significantly advancing KG completion and enhancing the understanding and application of LLMs in structured knowledge generation and assessment.

NeurIPS Conference 2024 Conference Paper

MKGL: Mastery of a Three-Word Language

  • Lingbing Guo
  • Zhongpu Bo
  • Zhuo Chen
  • Yichi Zhang
  • Jiaoyan Chen
  • Yarong Lan
  • Mengshu Sun
  • Zhiqiang Zhang

Large language models (LLMs) have significantly advanced performance across a spectrum of natural language processing (NLP) tasks. Yet, their application to knowledge graphs (KGs), which describe facts in the form of triplets and allow minimal hallucinations, remains an underexplored frontier. In this paper, we investigate the integration of LLMs with KGs by introducing a specialized KG Language (KGL), where a sentence precisely consists of an entity noun, a relation verb, and ends with another entity noun. Despite KGL's unfamiliar vocabulary to the LLM, we facilitate its learning through a tailored dictionary and illustrative sentences, and enhance context understanding via real-time KG context retrieval and KGL token embedding augmentation. Our results reveal that LLMs can achieve fluency in KGL, drastically reducing errors compared to conventional KG embedding methods on KG completion. Furthermore, our enhanced LLM shows exceptional competence in generating accurate three-word sentences from an initial entity and interpreting new unseen terms out of KGs.

NeurIPS Conference 2024 Conference Paper

Multi-times Monte Carlo Rendering for Inter-reflection Reconstruction

  • Tengjie Zhu
  • Zhuo Chen
  • Jingnan Gao
  • Yichao Yan
  • Xiaokang Yang

Inverse rendering methods have achieved remarkable performance in reconstructing high-fidelity 3D objects with disentangled geometries, materials, and environmental light. However, they still face huge challenges in reflective surface reconstruction. Although recent methods model the light trace to learn specularity, the ignorance of indirect illumination makes it hard to handle inter-reflections among multiple smooth objects. In this work, we propose Ref-MC2 that introduces the multi-time Monte Carlo sampling which comprehensively computes the environmental illumination and meanwhile considers the reflective light from object surfaces. To address the computation challenge as the times of Monte Carlo sampling grow, we propose a specularity-adaptive sampling strategy, significantly reducing the computational complexity. Besides the computational resource, higher geometry accuracy is also required because geometric errors accumulate multiple times. Therefore, we further introduce a reflection-aware surface model to initialize the geometry and refine it during inverse rendering. We construct a challenging dataset containing scenes with multiple objects and inter-reflections. Experiments show that our method outperforms other inverse rendering methods on various object groups. We also show downstream applications, e.g., relighting and material editing, to illustrate the disentanglement ability of our method.

NeurIPS Conference 2024 Conference Paper

OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step

  • Owen Dugan
  • Donato M. Jiménez-Benetó
  • Charlotte Loh
  • Zhuo Chen
  • Rumen Dangovski
  • Marin Soljačić

Despite significant advancements in text generation and reasoning, Large Language Models (LLMs) still face challenges in accurately performing complex arithmetic operations. Language model systems often enable LLMs to generate code for arithmetic operations to achieve accurate calculations. However, this approach compromises speed and security, and fine-tuning risks the language model losing prior capabilities. We propose a framework that enables exact arithmetic in *a single autoregressive step*, providing faster, more secure, and more interpretable LLM systems with arithmetic capabilities. We use the hidden states of an LLM to control a symbolic architecture that performs arithmetic. Our implementation using Llama 3 with OccamNet as a symbolic model (OccamLlama) achieves 100% accuracy on single arithmetic operations ($+, -, \times, \div, \sin{}, \cos{}, \log{}, \exp{}, \sqrt{}$), outperforming GPT 4o with and without a code interpreter. Furthermore, OccamLlama outperforms GPT 4o with and without a code interpreter on average across a range of mathematical problem solving benchmarks, demonstrating that OccamLLMs can excel in arithmetic tasks, even surpassing much larger models. Code is available at https://github.com/druidowm/OccamLLM.

NeurIPS Conference 2024 Conference Paper

QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation

  • Zhuo Chen
  • Rumen Dangovski
  • Charlotte Loh
  • Owen Dugan
  • Di Luo
  • Marin Soljačić

We propose Quantum-informed Tensor Adaptation (QuanTA), a novel, easy-to-implement, fine-tuning method with no inference overhead for large-scale pre-trained language models. By leveraging quantum-inspired methods derived from quantum circuit structures, QuanTA enables efficient high-rank fine-tuning, surpassing the limitations of Low-Rank Adaptation (LoRA)---low-rank approximation may fail for complicated downstream tasks. Our approach is theoretically supported by the universality theorem and the rank representation theorem to achieve efficient high-rank adaptations. Experiments demonstrate that QuanTA significantly enhances commonsense reasoning, arithmetic reasoning, and scalability compared to traditional methods. Furthermore, QuanTA shows superior performance with fewer trainable parameters compared to other approaches and can be designed to integrate with existing fine-tuning algorithms for further improvement, providing a scalable and efficient solution for fine-tuning large language models and advancing state-of-the-art in natural language processing.

IJCAI Conference 2024 Conference Paper

Rethinking the Soft Conflict Pseudo Boolean Constraint on MaxSAT Local Search Solvers

  • Jiongzhi Zheng
  • Zhuo Chen
  • Chu-Min Li
  • Kun He

MaxSAT is an optimization version of the famous NP-complete Satisfiability problem (SAT). Algorithms for MaxSAT mainly include complete solvers and local search incomplete solvers. In many complete solvers, once a better solution is found, a Soft conflict Pseudo Boolean (SPB) constraint will be generated to enforce the algorithm to find better solutions. In many local search algorithms, clause weighting is a key technique for effectively guiding the search directions. In this paper, we propose to transfer the SPB constraint into the clause weighting system of the local search method, leading the algorithm to better solutions. We further propose an adaptive clause weighting strategy that breaks the tradition of using constant values to adjust clause weights. Based on the above methods, we propose a new local search algorithm called SPB-MaxSAT that provides new perspectives for clause weighting on MaxSAT local search solvers. Extensive experiments demonstrate the excellent performance of the proposed methods.

EAAI Journal 2024 Journal Article

Satisfied and fair two-sided matching method considering dual-reference with linguistic preference

  • Di Zhang
  • Zaiwu Gong
  • Shuli Yan
  • Zhuo Chen

The psychological behavior characteristics of the individual reference and social reference of the agents are significant factors in two-sided matching that cannot be ignored. How to describe the double reference effect of the individual reference and social reference of the agents and introduce it into two-sided matching has rarely been paid attention to in past research. In order to resolve the two-sided matching problem with linguistic preference, a satisfied and fair two-sided matching approach taking into account the individual reference and social reference of the agents is proposed in this study. First, reference points for upward, parallel, and downward comparisons are determined based on social comparison theory. Then, according to the prospect theory, the linguistic preference of the agents is transformed into gain or loss relative to individual and social reference points, and the comprehensive prospect value of the agents is calculated. On this basis, considering the satisfaction and fairness of matching, a two-sided matching multi-objective optimization model is established. Finally, a case is given to illustrate the feasibility and validity of the proposed method. The results show that the two-sided matching method that considers both individual reference and social reference has stronger applicability and operability and can effectively improve the efficiency of matching, reduce ineffective matching decisions, and lower the cost of matching.

AAAI Conference 2024 Conference Paper

Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations

  • Yufeng Huang
  • Jiji Tang
  • Zhuo Chen
  • Rongsheng Zhang
  • Xinfeng Zhang
  • Weijie Chen
  • Zeng Zhao
  • Zhou Zhao

Large-scale vision-language pre-training has achieved significant performance in multi-modal understanding and generation tasks. However, existing methods often perform poorly on image-text matching tasks that require structured representations, i.e., representations of objects, attributes, and relations. The models cannot make a distinction between "An astronaut rides a horse" and "A horse rides an astronaut". This is because they fail to fully leverage structured knowledge when learning multi-modal representations. In this paper, we present an end-to-end framework Structure-CLIP, which integrates Scene Graph Knowledge (SGK) to enhance multi-modal structured representations. Firstly, we use scene graphs to guide the construction of semantic negative examples, which results in an increased emphasis on learning structured representations. Moreover, a Knowledge-Enhance Encoder (KEE) is proposed to leverage SGK as input to further enhance structured representations. To verify the effectiveness of the proposed framework, we pre-train our model with the aforementioned approaches and conduct experiments on downstream tasks. Experimental results demonstrate that Structure-CLIP achieves state-of-the-art (SOTA) performance on VG-Attribution and VG-Relation datasets, with 12.5% and 4.1% ahead of the multi-modal SOTA model respectively. Meanwhile, the results on MSCOCO indicate that Structure-CLIP significantly enhances the structured representations while maintaining the ability of general representations. Our code is available at https://github.com/zjukg/Structure-CLIP.

AAAI Conference 2024 Short Paper

STViT: Improving Self-Supervised Multi-Camera Depth Estimation with Spatial-Temporal Context and Adversarial Geometry Regularization (Student Abstract)

  • Zhuo Chen
  • Haimei Zhao
  • Bo Yuan
  • Xiu Li

Multi-camera depth estimation has recently garnered significant attention due to its substantial practical implications in the realm of autonomous driving. In this paper, we delve into the task of self-supervised multi-camera depth estimation and propose an innovative framework, STViT, featuring several noteworthy enhancements: 1) we propose a Spatial-Temporal Transformer to comprehensively exploit both local connectivity and the global context of image features, meanwhile learning enriched spatial-temporal cross-view correlations to recover 3D geometry. 2) to alleviate the severe effect of adverse conditions, e.g., rainy weather and nighttime driving, we introduce a GAN-based Adversarial Geometry Regularization Module (AGR) to further constrain the depth estimation with unpaired normal-condition depth maps and prevent the model from being incorrectly trained. Experiments on challenging autonomous driving datasets Nuscenes and DDAD show that our method achieves state-of-the-art performance.

ICML Conference 2024 Conference Paper

TENG: Time-Evolving Natural Gradient for Solving PDEs With Deep Neural Nets Toward Machine Precision

  • Zhuo Chen
  • Jacob McCarran
  • Esteban Vizcaino
  • Marin Soljacic
  • Di Luo

Partial differential equations (PDEs) are instrumental for modeling dynamical systems in science and engineering. The advent of neural networks has initiated a significant shift in tackling these complexities though challenges in accuracy persist, especially for initial value problems. In this paper, we introduce the Time-Evolving Natural Gradient (TENG), generalizing time-dependent variational principles and optimization-based time integration, leveraging natural gradient optimization to obtain high accuracy in neural-network-based PDE solutions. Our comprehensive development includes algorithms like TENG-Euler and its high-order variants, such as TENG-Heun, tailored for enhanced precision and efficiency. TENG’s effectiveness is further validated through its performance, surpassing current leading methods and achieving machine precision in step-by-step optimizations across a spectrum of PDEs, including the heat equation, Allen-Cahn equation, and Burgers’ equation.

EAAI Journal 2023 Journal Article

An unsupervised neural network for graphical health index construction and residual life prediction

  • Zhen Li
  • Tao Tao
  • Meng Yang
  • Jibin Wang
  • Zhuo Chen
  • Jianguo Wu

To better characterize the health status and perform remaining useful life prediction, a composite health index is developed through the fusion of multi-channel signals. However, most of the existing literature limits the data fusion to be linear, which implies that the underlying degradation pattern must follow a linear form. This strong prerequisite of these approaches undermines the effectiveness of existing techniques for capturing the potential nonlinear nature of the degradation process. In order to overcome this limitation as well as to improve the predictability, this paper proposes a nonlinear health index construction method achieved by an unsupervised neural network. Specifically, a neural network structure is introduced to approximate the highly nonlinear relationship between signals and health status. Furthermore, we consider the remaining useful life prediction as a binary classification problem, and then propose a maximal classification margin constraint, which is integrated with the monotonicity and minimal variability at the failure time to formulate the novel loss function. To estimate the model parameters, we developed a customized adaptive moment estimation algorithm (Adam). A comprehensive case study is performed based on the benchmark C-MAPSS dataset. As reported in the experiment, the constructed health index can better characterize the underlying degradation process.

NeurIPS Conference 2023 Conference Paper

ANTN: Bridging Autoregressive Neural Networks and Tensor Networks for Quantum Many-Body Simulation

  • Zhuo Chen
  • Laker Newhouse
  • Eddie Chen
  • Di Luo
  • Marin Soljacic

Quantum many-body physics simulation has important impacts on understanding fundamental science and has applications to quantum materials design and quantum technology. However, due to the exponentially growing size of the Hilbert space with respect to the particle number, a direct simulation is intractable. While representing quantum states with tensor networks and neural networks are the two state-of-the-art methods for approximate simulations, each has its own limitations in terms of expressivity and inductive bias. To address these challenges, we develop a novel architecture, Autoregressive Neural TensorNet (ANTN), which bridges tensor networks and autoregressive neural networks. We show that Autoregressive Neural TensorNet parameterizes normalized wavefunctions, allows for exact sampling, generalizes the expressivity of tensor networks and autoregressive neural networks, and inherits a variety of symmetries from autoregressive neural networks. We demonstrate our approach on quantum state learning as well as finding the ground state of the challenging 2D $J_1$-$J_2$ Heisenberg model with different system sizes and coupling parameters, outperforming both tensor networks and autoregressive neural networks. Our work opens up new opportunities for quantum many-body physics simulation, quantum technology design, and generative modeling in artificial intelligence.

AAAI Conference 2023 Conference Paper

DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning

  • Zhuo Chen
  • Yufeng Huang
  • Jiaoyan Chen
  • Yuxia Geng
  • Wen Zhang
  • Yin Fang
  • Jeff Z. Pan
  • Huajun Chen

Zero-shot learning (ZSL) aims to predict unseen classes whose samples have never appeared during training. One of the most effective and widely used semantic information for zero-shot image classification are attributes which are annotations for class-level visual characteristics. However, the current methods often fail to discriminate those subtle visual distinctions between images due to not only the shortage of fine-grained annotations, but also the attribute imbalance and co-occurrence. In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from the pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) developed a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images; (2) applied an attribute-level contrastive learning strategy to further enhance the model's discrimination on fine-grained visual characteristics against the attribute co-occurrence and imbalance; (3) proposed a multi-task learning policy for considering multi-model objectives. We find that our DUET can achieve state-of-the-art performance on three standard ZSL benchmarks and a knowledge graph equipped ZSL benchmark. Its components are effective and its predictions are interpretable.

NeurIPS Conference 2023 Conference Paper

Newton–Cotes Graph Neural Networks: On the Time Evolution of Dynamic Systems

  • Lingbing Guo
  • Weiqing Wang
  • Zhuo Chen
  • Ningyu Zhang
  • Zequn Sun
  • Yixuan Lai
  • Qiang Zhang
  • Huajun Chen

Reasoning system dynamics is one of the most important analytical approaches for many scientific studies. With the initial state of a system as input, the recent graph neural networks (GNNs)-based methods are capable of predicting the future state distant in time with high accuracy. Although these methods have diverse designs in modeling the coordinates and interacting forces of the system, we show that they actually share a common paradigm that learns the integration of the velocity over the interval between the initial and terminal coordinates. However, their integrand is constant w.r.t. time. Inspired by this observation, we propose a new approach to predict the integration based on several velocity estimations with Newton–Cotes formulas and prove its effectiveness theoretically. Extensive experiments on several benchmarks empirically demonstrate consistent and significant improvement compared with the state-of-the-art methods.

ICML Conference 2022 Conference Paper

Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs

  • Yikang Zhang
  • Zhuo Chen
  • Zhao Zhong

In this paper, we propose a Collaboration of Experts (CoE) framework to assemble the expertise of multiple networks towards a common goal. Each expert is an individual network with expertise on a unique portion of the dataset, contributing to the collective capacity. Given a sample, the delegator selects an expert and simultaneously outputs a rough prediction to trigger potential early termination. For each model in CoE, we propose a novel training algorithm with two major components: weight generation module (WGM) and label generation module (LGM). It fulfills the co-adaptation of experts and delegator. WGM partitions the training data into portions based on the delegator via solving a balanced transportation problem, then impels each expert to focus on one portion by reweighting the losses. LGM generates the label to constitute the loss of the delegator for expert selection. CoE achieves state-of-the-art performance on ImageNet, 80.7% top-1 accuracy with 194M FLOPs. Combined with PWLU and CondConv, CoE further achieves 80.0% accuracy with only 100M FLOPs for the first time. Furthermore, experiment results on the translation task also demonstrate the strong generalizability of CoE. CoE is hardware-friendly, yielding a 3-6x acceleration compared with existing conditional computation approaches.

AAAI Conference 2022 Conference Paper

Molecular Contrastive Learning with Chemical Element Knowledge Graph

  • Yin Fang
  • Qiang Zhang
  • Haihong Yang
  • Xiang Zhuang
  • Shumin Deng
  • Wen Zhang
  • Ming Qin
  • Zhuo Chen

Molecular representation learning contributes to multiple downstream tasks such as molecular property prediction and drug design. To properly represent molecules, graph contrastive learning is a promising paradigm as it utilizes self-supervision signals and has no requirements for human annotations. However, prior works fail to incorporate fundamental domain knowledge into graph semantics and thus ignore the correlations between atoms that have common attributes but are not directly connected by bonds. To address these issues, we construct a Chemical Element Knowledge Graph (KG) to summarize microscopic associations between elements and propose a novel Knowledge-enhanced Contrastive Learning (KCL) framework for molecular representation learning. KCL framework consists of three modules. The first module, knowledge-guided graph augmentation, augments the original molecular graph based on the Chemical Element KG. The second module, knowledge-aware graph representation, extracts molecular representations with a common graph encoder for the original molecular graph and a Knowledge-aware Message Passing Neural Network (KMPNN) to encode complex information in the augmented molecular graph. The final module is a contrastive objective, where we maximize agreement between these two views of molecular graphs. Extensive experiments demonstrated that KCL obtained superior performances against state-of-the-art baselines on eight molecular datasets. Visualization experiments properly interpret what KCL has learned from atoms and attributes in the augmented molecular graphs.

IJCAI Conference 2021 Conference Paper

Knowledge-aware Zero-Shot Learning: Survey and Perspective

  • Jiaoyan Chen
  • Yuxia Geng
  • Zhuo Chen
  • Ian Horrocks
  • Jeff Z. Pan
  • Huajun Chen

Zero-shot learning (ZSL), which aims at predicting classes that have never appeared during the training using external knowledge (a.k.a. side information), has been widely investigated. In this paper we present a literature review towards ZSL in the perspective of external knowledge, where we categorize the external knowledge, review their methods and compare different external knowledge. With the literature review, we further discuss and outlook the role of symbolic knowledge in addressing ZSL and other machine learning sample shortage issues.

ICRA Conference 2014 Conference Paper

Predicting initialization effectiveness for trajectory optimization

  • Jia Pan 0001
  • Zhuo Chen
  • Pieter Abbeel

Trajectory optimization is a method for solving motion planning problems by formulating them as non-convex constrained optimization problems. The optimization process, however, can get stuck in local optima that are in collision. As a consequence, these methods typically require multiple initializations. This poses the problem of deciding which initializations to use when given a limited computational budget. In this paper we propose a machine learning approach to predict whether a collision-free solution will be found from a given initialization. We present a set of trajectory features that encode the obstacle distribution locally around a robot. These features are designed for generalization across different tasks. Our experiments on various planning benchmarks demonstrate the performance of our approach.