Author name cluster

Jie Fu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers

2 author rows

TMLR Journal 2026 Journal Article

Re:Form --- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny

Chuanhao Yan
Fengdi Che
Xuhan Huang
Xu Xu
Xin Li
Yizhi Li
Xingwei Qu
Jingzhe Shi

Existing informal language-based (e.g., human language) Large Language Models (LLMs) trained with Reinforcement Learning (RL) face a significant challenge: their verification processes, which provide crucial training signals, are neither reliable nor scalable. In fact, the prevalent large proprietary models could hardly generate verifiable programs. A promising yet largely uncharted alternative is formal language-based reasoning. Grounding LLMs in rigorous formal systems where generative models operate in formal language spaces (e.g., Dafny) enables the automatic and mathematically provable verification of their reasoning processes and outcomes. This capability is pivotal for achieving large-scale, reliable formal software verification. It is a common practice to employ human-annotated chain-of-thought and other human priors to induce the reasoning and coding capabilities of LLMs. Unfortunately, it becomes unacceptably all-consuming to provide such priors for supervising complex programming tasks. In this work, we systematically explore ways to reduce human priors with the formal language, Dafny, as the main environment for our pilot study. Our pipeline mainly relies on introducing an automatic and scalable data curation pipeline, and careful RL designs integrated with feedback from the formal language verifier. We introduce DafnyComp, a benchmark of compositional formal programs with auto-formalized specifications for specification reasoning. Our supervised fine-tuning (SFT) stage enables even small models (e.g., 0.5B) to generate syntactically valid and verifiable Dafny code, surpassing proprietary models. RL with regularization further improves performance, achieving stronger generalization to out-of-domain tasks and outperforming all strong baselines on the challenging DafnyComp benchmark. Anonymized code and models are available at https://github.com/ReFormDafny/ReForm and https://huggingface.co/ReFormDafny.

TMLR Journal 2025 Journal Article

A Survey of Recent Backdoor Attacks and Defenses in Large Language Models

Shuai Zhao
Meihuizi Jia
Zhongliang Guo
Leilei Gan
Xiaoyu Xu
Xiaobao Wu
Jie Fu
Feng Yichao

Large Language Models (LLMs), which bridge the gap between human language understanding and complex problem-solving, achieve state-of-the-art performance on several NLP tasks, particularly in few-shot and zero-shot settings. Despite the demonstrable efficacy of LLMs, due to constraints on computational resources, users have to engage with open-source language models or outsource the entire training process to third-party platforms. However, research has demonstrated that language models are susceptible to potential security vulnerabilities, particularly in backdoor attacks. Backdoor attacks are designed to introduce targeted vulnerabilities into language models by poisoning training samples or model weights, allowing attackers to manipulate model responses through malicious triggers. While existing surveys on backdoor attacks provide a comprehensive overview, they lack an in-depth examination of backdoor attacks specifically targeting LLMs. To bridge this gap and grasp the latest trends in the field, this paper presents a novel perspective on backdoor attacks for LLMs by focusing on fine-tuning methods. Specifically, we systematically classify backdoor attacks into three categories: full-parameter fine-tuning, parameter-efficient fine-tuning, and no fine-tuning. Based on insights from a substantial review, we also discuss crucial issues for future research on backdoor attacks, such as further exploring attack algorithms that do not require fine-tuning, or developing more covert attack algorithms.

TMLR Journal 2025 Journal Article

Adaptive Incentive Design for Markov Decision Processes with Unknown Rewards

Haoxiang Ma
Shuo Han
Ahmed Hemida
Charles A. Kamhoua
Jie Fu

Incentive design, also known as model design or environment design for Markov decision processes (MDPs), refers to a class of problems in which a leader can incentivize his follower by modifying the follower's reward function, in anticipation that the follower's optimal policy in the resulting MDP can be desirable for the leader's objective. In this work, we propose gradient-ascent algorithms to compute the leader's optimal incentive design, despite the lack of knowledge about the follower's reward function. First, we formulate the incentive design problem as a bi-level optimization problem and demonstrate that, by the softmax temporal consistency between the follower's policy and value function, the bi-level optimization problem can be reduced to single-level optimization, for which a gradient-based algorithm can be developed to optimize the leader's objective. We establish several key properties of incentive design in MDPs and prove the convergence of the proposed gradient-based method. Next, we show that the gradient terms can be estimated from observations of the follower's best response policy, enabling the use of a stochastic gradient-ascent algorithm to compute a locally optimal incentive design without knowing or learning the follower's reward function. Finally, we analyze the conditions under which an incentive design remains optimal for two different rewards which are policy invariant. The effectiveness of the proposed algorithm is demonstrated using a small probabilistic transition system and a stochastic gridworld.

TMLR Journal 2025 Journal Article

Generating Symbolic World Models via Test-time Scaling of Large Language Models

Zhouliang Yu
Yuhuan Yuan
Tim Z. Xiao
Fuxiang Frank Xia
Jie Fu
Ge Zhang
Ge Lin
Weiyang Liu

Solving complex planning problems requires Large Language Models (LLMs) to explicitly model the state transition to avoid rule violations, comply with constraints, and ensure optimality—a task hindered by the inherent ambiguity of natural language. To overcome such ambiguity, Planning Domain Definition Language (PDDL) is leveraged as a planning abstraction that enables precise and formal state descriptions. With PDDL, we can generate a symbolic world model where classic searching algorithms, such as A*, can be seamlessly applied to find optimal plans. However, directly generating PDDL domains with current LLMs remains an open challenge due to the lack of PDDL training data. To address this challenge, we propose to scale up the test-time computation of LLMs to enhance their PDDL reasoning capabilities, thereby enabling the generation of high-quality PDDL domains. Specifically, we introduce a simple yet effective algorithm, which first employs a Best-of-N sampling approach to improve the quality of the initial solution and then refines the solution in a fine-grained manner with verbalized machine learning. Our method outperforms o1-mini by a considerable margin in the generation of PDDL domains, achieving over 50% success rate on two tasks (i.e., generating PDDL domains from natural language description or PDDL problems). This is done without requiring additional training. By taking advantage of PDDL as state abstraction, our method is able to outperform the current state-of-the-art methods on almost all competition-level planning tasks.
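
The Best-of-N step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_generate` and `toy_score` are hypothetical stand-ins for an LLM sampler and a PDDL validator, and the refinement stage with verbalized machine learning is not modeled here.

```python
import random
from typing import Callable, Tuple


def best_of_n(generate: Callable[[random.Random], str],
              score: Callable[[str], float],
              n: int,
              seed: int = 0) -> Tuple[str, float]:
    """Draw n candidates and return the highest-scoring one with its score."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    scored = [(score(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def toy_generate(rng: random.Random) -> str:
    # Hypothetical stand-in for an LLM call: a random string of parentheses.
    return "".join(rng.choice("()") for _ in range(6))


def toy_score(candidate: str) -> float:
    # Hypothetical stand-in for a validator: minus the number of unmatched
    # parentheses, so a fully balanced string scores 0.0 (the maximum).
    depth = 0
    unmatched = 0
    for ch in candidate:
        if ch == "(":
            depth += 1
        elif depth > 0:
            depth -= 1
        else:
            unmatched += 1
    return -float(unmatched + depth)


if __name__ == "__main__":
    best, best_score = best_of_n(toy_generate, toy_score, n=16, seed=1)
    print(best, best_score)
```

Increasing `n` can only keep or raise the best score for a fixed seed, which is the basic reason that spending more test-time samples helps when a reliable scorer is available.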

AAAI Conference 2025 Conference Paper

Learning Nash Equilibrium of Markov Potential Games with a Shared Constraint via Primal-Dual Optimization

Songtao Feng
Michael Dorothy
Jie Fu

The problem of constrained Markov games has recently attracted interest in the study of multi-agent reinforcement learning (MARL). The existing literature has focused on safe MARL problems where safety constraints are imposed for each agent individually. In this work, we consider a Markov potential game (MPG) with a shared constraint, where the cost function with respect to the constraint depends on states and joint actions of all agents. We adopt a primal-dual framework to tackle the problem and establish the Slater condition to ensure strong duality. Moreover, we propose a primal-dual learning algorithm for learning an approximate Nash equilibrium in an MPG with a shared constraint. Thanks to the novel design of the dual update, we provide asymptotic convergence on the weighted output policy. Specifically, we prove that both the value function gap and the constraint violation of the output policy converge at the rate O(epsilon + 1/sqrt(T)), where epsilon is the accuracy level of the primal update, and T is the number of iterations. We further show that the weighted output policy outperforms the existing uniformly chosen policy.

ICML Conference 2025 Conference Paper

PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative APIs

Jianqing Zhang
Yang Liu 0165
Jie Fu
Yang Hua 0001
Tianyuan Zou
Jian Cao 0001
Qiang Yang 0001

The rise of generative APIs has fueled interest in privacy-preserving synthetic data generation. While the Private Evolution (PE) algorithm generates Differential Privacy (DP) synthetic images using diffusion model APIs, it struggles with few-shot private data due to the limitations of its DP-protected similarity voting approach. In practice, the few-shot private data challenge is particularly prevalent in specialized domains like healthcare and industry. To address this challenge, we propose a novel API-assisted algorithm, Private Contrastive Evolution (PCEvolve), which iteratively mines inherent inter-class contrastive relationships in few-shot private data beyond individual data points and seamlessly integrates them into an adapted Exponential Mechanism (EM) to optimize DP’s utility in an evolution loop. We conduct extensive experiments on four specialized datasets, demonstrating that PCEvolve outperforms PE and other API-assisted baselines. These results highlight the potential of leveraging API access with private data for quality evaluation, enabling the generation of high-quality DP synthetic images and paving the way for more accessible and effective privacy-preserving generative API applications. Our code is available at https://github.com/TsingZ0/PCEvolve.
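
For readers unfamiliar with the Exponential Mechanism named in the abstract above, here is a minimal sketch of the textbook version, which selects candidate i with probability proportional to exp(epsilon * u_i / (2 * sensitivity)). It is not PCEvolve's adapted mechanism; the utilities, `epsilon`, and `sensitivity` values are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def exponential_mechanism_probs(utilities: Sequence[float],
                                epsilon: float,
                                sensitivity: float) -> List[float]:
    """Selection probabilities of the standard exponential mechanism."""
    scale = epsilon / (2.0 * sensitivity)
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(scale * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism_sample(utilities: Sequence[float],
                                 epsilon: float,
                                 sensitivity: float,
                                 rng: random.Random) -> int:
    """Sample the index of one candidate according to those probabilities."""
    probs = exponential_mechanism_probs(utilities, epsilon, sensitivity)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical utility scores for three synthetic candidates.
    utilities = [0.0, 1.0, 3.0]
    print(exponential_mechanism_probs(utilities, epsilon=1.0, sensitivity=1.0))
    print(exponential_mechanism_sample(utilities, 1.0, 1.0, random.Random(0)))
```

A smaller `epsilon` flattens the distribution toward uniform (more privacy, less utility), while a larger one concentrates mass on the highest-utility candidate.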

JAIR Journal 2025 Journal Article

Robust Reward Design for Markov Decision Processes

Shuo Wu
Haoxiang Ma
Jie Fu
Shuo Han

The problem of reward design examines the interaction between a leader and a follower, where the leader aims to shape the follower’s behavior to maximize the leader’s payoff by modifying the follower’s reward function. Current approaches to reward design rely on an accurate model of how the follower responds to reward modifications, which can be sensitive to modeling inaccuracies. To address this issue of sensitivity, we present a solution that offers robustness against uncertainties in modeling the follower, including 1) how the follower breaks ties in the presence of nonunique best responses, 2) inexact knowledge of how the follower perceives reward modifications, and 3) bounded rationality of the follower. Our robust solution is guaranteed to exist under mild conditions and can be obtained numerically by solving a mixed-integer linear program. Numerical experiments on multiple test cases demonstrate that our solution improves robustness compared to the standard approach without incurring significant additional computing costs.

AAAI Conference 2025 Conference Paper

Sequential Decision Making in Stochastic Games with Incomplete Preferences over Temporal Objectives

Abhishek Ninad Kulkarni
Jie Fu
Ufuk Topcu

Ensuring that AI systems make strategic decisions aligned with the specified preferences in adversarial sequential interactions is a critical challenge for developing trustworthy AI systems, especially when the environment is stochastic and players' incomplete preferences leave some outcomes unranked. We study the problem of synthesizing preference-satisfying strategies in two-player stochastic games on graphs where players have opposite (possibly incomplete) preferences over a set of temporal goals. We represent these goals using linear temporal logic over finite traces (LTLf), which enables modeling the nuances of human preferences where temporal goals need not be mutually exclusive and comparison between some goals may be unspecified. We introduce a solution concept of non-dominated almost-sure winning, which guarantees to achieve a most preferred outcome aligned with specified preferences while maintaining robustness against the adversarial behaviors of the opponent. Our results show that strategy profiles based on this concept are Nash equilibria in the game where players are risk-averse, thus providing a practical framework for evaluating and ensuring stable, preference-aligned outcomes in the game. Using a drone delivery example, we demonstrate that our contributions offer valuable insights not only for synthesizing rational behavior under incomplete preferences but also for designing games that motivate the desired behavior from the players in adversarial conditions.

NeurIPS Conference 2025 Conference Paper

Thinker: Learning to Think Fast and Slow

Stephen Chung
Wenyu Du
Jie Fu

Recent studies show that the reasoning capabilities of Large Language Models (LLMs) can be improved by applying Reinforcement Learning (RL) to question-answering (QA) tasks in areas such as math and coding. With a long context length, LLMs may learn to perform search, as indicated by the self-correction behavior observed in DeepSeek R1. However, this search behavior is often imprecise and lacks confidence, resulting in long, redundant responses and highlighting deficiencies in intuition and verification. Inspired by the Dual Process Theory in psychology, we introduce a simple modification to the QA task that includes four stages: Fast Thinking, where the LLM must answer within a strict token budget; Verification, where the model evaluates its initial response; Slow Thinking, where it refines the initial response with more deliberation; and Summarization, where it distills the refinement from the previous stage into precise steps. Our proposed task improves average accuracy from 25.6% to 27.3% for Qwen2.5-1.5B, and from 45.9% to 51.0% for DeepSeek-R1-Qwen-1.5B. Notably, for Qwen2.5-1.5B, the Fast Thinking mode alone achieves 25.2% accuracy using fewer than 1000 tokens, demonstrating substantial inference efficiency gains. These findings suggest that intuition and deliberative reasoning are distinct, complementary systems benefiting from targeted training. Additionally, we have open-sourced both the trained models and the source code.

IJCAI Conference 2024 Conference Paper

AutoAgents: A Framework for Automatic Agent Generation

Guangyao Chen
Siwei Dong
Yu Shu
Ge Zhang
Jaward Sesay
Börje Karlsson
Jie Fu
Yemin Shi

Large language models (LLMs) have enabled remarkable advances in automated task-solving with multi-agent systems. However, most existing LLM-based multi-agent approaches rely on predefined agents to handle simple tasks, limiting the adaptability of multi-agent collaboration to different scenarios. Therefore, we introduce AutoAgents, an innovative framework that adaptively generates and coordinates multiple specialized agents to build an AI team according to different tasks. Specifically, AutoAgents couples the relationship between tasks and roles by dynamically generating multiple required agents based on task content and planning solutions for the current task based on the generated expert agents. Multiple specialized agents collaborate with each other to efficiently accomplish tasks. Concurrently, an observer role is incorporated into the framework to reflect on the designated plans and agents' responses and improve upon them. Our experiments on various benchmarks demonstrate that AutoAgents generates more coherent and accurate solutions than the existing multi-agent methods. This underscores the significance of assigning different roles to different tasks and of team cooperation, offering new perspectives for tackling complex tasks. The repository of this project is available at https://github.com/Link-AGI/AutoAgents.

AAMAS Conference 2024 Conference Paper

Covert Planning against Imperfect Observers

Haoxiang Ma
Chongyang Shi
Shuo Han
Michael R. Dorothy
Jie Fu

Covert planning refers to a class of constrained planning problems where an agent aims to accomplish a task with minimal information leaked to a passive observer to avoid detection. However, existing methods of covert planning often consider deterministic environments or do not exploit the observer’s imperfect information. This paper studies how covert planning can leverage the coupling of stochastic dynamics and the observer’s imperfect observation to achieve optimal task performance without being detected. Specifically, we employ a Markov decision process to model the interaction between the agent and its stochastic environment, and a partial observation function to capture the leaked information to a passive observer. Assuming the observer employs hypothesis testing to detect if the observation deviates from a nominal policy, the covert planning agent aims to maximize the total discounted reward while keeping the probability of being detected as an adversary below a given threshold. We prove that finite-memory policies are more powerful than Markovian policies in covert planning. Then, we develop a primal-dual proximal policy gradient method with a two-time-scale update to compute a (locally) optimal covert policy. We demonstrate the effectiveness of our methods using a stochastic gridworld example. Our experimental results illustrate that the proposed method computes a policy that maximizes the adversary’s expected reward without violating the detection constraint, and empirically demonstrate how environmental noise can influence the performance of the covert policies.

NeurIPS Conference 2024 Conference Paper

D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

Haoran Que
Jiaheng Liu
Ge Zhang
Chenchen Zhang
Xingwei Qu
Yinghao Ma
Feiyu Duan
Zhiqi Bai

Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to expand the model’s fundamental understanding of specific downstream domains (e.g., math and code). For the CPT on domain-specific LLMs, one important question is how to choose the optimal mixture ratio between the general-corpus (e.g., Dolma, Slim-pajama) and the downstream domain-corpus. Existing methods usually adopt laborious human efforts by grid-searching on a set of mixture ratios, which require high GPU training consumption costs. Besides, we cannot guarantee the selected ratio is optimal for the specific domain. To address the limitations of existing methods, inspired by the Scaling Law for performance prediction, we propose to investigate the Scaling Law of the Domain-specific Continual Pre-Training (D-CPT Law) to decide the optimal mixture ratio with acceptable training costs for LLMs of different sizes. Specifically, by fitting the D-CPT Law, we can easily predict the general and downstream performance of arbitrary mixture ratios, model sizes, and dataset sizes using small-scale training costs on limited experiments. Moreover, we also extend our standard D-CPT Law on cross-domain settings and propose the Cross-Domain D-CPT Law to predict the D-CPT law of target domains, where very small training costs (about 1% of the normal training costs) are needed for the target domains. Comprehensive experimental results on six downstream domains demonstrate the effectiveness and generalizability of our proposed D-CPT Law and Cross-Domain D-CPT Law.

IJCAI Conference 2024 Conference Paper

Information-Theoretic Opacity-Enforcement in Markov Decision Processes

Chongyang Shi
Yuheng Bu
Jie Fu

The paper studies information-theoretic opacity, an information-flow privacy property, in a setting involving two agents: a planning agent who controls a stochastic system and an observer who partially observes the system states. The goal of the observer is to infer some secret, represented by a random variable, from its partial observations, while the goal of the planning agent is to make the secret maximally opaque to the observer while achieving a satisfactory total return. Modeling the stochastic system using a Markov decision process, two classes of opacity properties are considered: last-state opacity ensures that the observer is uncertain whether the last state is in a specific set, and initial-state opacity ensures that the observer is unsure of the realization of the initial state. As the measure of opacity, we employ the Shannon conditional entropy capturing the information about the secret revealed by the observable. Then, we develop primal-dual policy gradient methods for opacity-enforcement planning subject to constraints on total returns. We propose novel algorithms to compute the policy gradient of entropy for each observation, leveraging message passing within the hidden Markov models. This gradient computation enables us to have stable and fast convergence. We demonstrate our solution of opacity-enforcement control through a grid world example.
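
The opacity measure named in the abstract above, the Shannon conditional entropy H(Z | Y) of a secret Z given an observation Y, can be computed directly from a joint distribution. The sketch below is only the standard definition applied to a small hand-written table; the joint distributions are illustrative assumptions, and the paper's policy-gradient machinery is not represented.

```python
import math
from typing import List


def conditional_entropy(joint: List[List[float]]) -> float:
    """H(Z | Y) in bits, where joint[z][y] = P(Z = z, Y = y)."""
    num_obs = len(joint[0])
    entropy = 0.0
    for y in range(num_obs):
        p_y = sum(row[y] for row in joint)  # marginal P(Y = y)
        for row in joint:
            p_zy = row[y]
            if p_zy > 0.0:
                # -P(z, y) * log2 P(z | y)
                entropy -= p_zy * math.log2(p_zy / p_y)
    return entropy


if __name__ == "__main__":
    # Observation independent of the secret: the secret stays fully opaque.
    opaque = [[0.25, 0.25], [0.25, 0.25]]
    # Observation reveals the secret exactly: no opacity remains.
    revealed = [[0.5, 0.0], [0.0, 0.5]]
    print(conditional_entropy(opaque))    # 1.0
    print(conditional_entropy(revealed))  # 0.0
```

Higher conditional entropy means the observer learns less about the secret, which is the quantity a planning agent would want to keep large.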

EAAI Journal 2024 Journal Article

Instance segmentation algorithm for sorting dismantling components of end-of-life vehicles

Binbin Fan
Xunpeng Qin
Qiang Wu
Jie Fu
Zhongliang Hu
Zhe Wang

AAAI Conference 2024 Conference Paper

Scalable Geometric Fracture Assembly via Co-creation Space among Assemblers

Ruiyuan Zhang
Jiaxiang Liu
Zexi Li
Hao Dong
Jie Fu
Chao Wu

Geometric fracture assembly presents a challenging practical task in archaeology and 3D computer vision. Previous methods have focused solely on assembling fragments based on semantic information, which has limited the quantity of objects that can be effectively assembled. Therefore, there is a need to develop a scalable framework for geometric fracture assembly without relying on semantic information. To improve the effectiveness of assembling geometric fractures without semantic information, we propose a co-creation space comprising several assemblers capable of gradually and unambiguously assembling fractures. Additionally, we introduce a novel loss function, i.e., the geometric-based collision loss, to address collision issues during the fracture assembly process and enhance the results. Our framework exhibits better performance on both PartNet and Breaking Bad datasets compared to existing state-of-the-art frameworks. Extensive experiments and quantitative comparisons demonstrate the effectiveness of our proposed framework, which features linear computational complexity, enhanced abstraction, and improved generalization. Our code is publicly available at https://github.com/Ruiyuan-Zhang/CCS.

NeurIPS Conference 2024 Conference Paper

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

Wenyu Du
Tongxu Luo
Zihan Qiu
Zeyu Huang
Yikang Shen
Reynold Cheng
Yike Guo
Jie Fu

LLMs are computationally expensive to pre-train due to their large scale. Model growth emerges as a promising approach by leveraging smaller models to accelerate the training of larger ones. However, the viability of these model growth methods in efficient LLM pre-training remains underexplored. This work identifies three critical Obstacles: (O1) lack of comprehensive evaluation, (O2) untested viability for scaling, and (O3) lack of empirical guidelines. To tackle O1, we summarize existing approaches into four atomic growth operators and systematically evaluate them in a standardized LLM pre-training setting. Our findings reveal that a depthwise stacking operator, called $G_{\text{stack}}$, exhibits remarkable acceleration in training, leading to decreased loss and improved overall performance on eight standard NLP benchmarks compared to strong baselines. Motivated by these promising results, we conduct extensive experiments to delve deeper into $G_{\text{stack}}$ to address O2 and O3. For O2 (untested scalability), our study shows that $G_{\text{stack}}$ is scalable and consistently performs well, with experiments up to 7B LLMs after growth and pre-training LLMs with 750B tokens. For example, compared to a conventionally trained 7B model using 300B tokens, our $G_{\text{stack}}$ model converges to the same loss with 194B tokens, resulting in a 54.6% speedup. We further address O3 (lack of empirical guidelines) by formalizing guidelines to determine growth timing and growth factor for $G_{\text{stack}}$, making it practical in general LLM pre-training. We also provide in-depth discussions and comprehensive ablation studies of $G_{\text{stack}}$. Our code and pre-trained model are available at https://llm-stacking.github.io/.
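
The speedup figure in the abstract above can be reproduced from the two token counts it reports. The sketch assumes that speedup is defined as the baseline token count divided by the method's token count, minus one; the abstract does not state the formula, but this definition matches the reported number.

```python
def token_speedup(baseline_tokens: float, method_tokens: float) -> float:
    """Relative speedup from reaching the same loss with fewer tokens.

    Assumed definition: baseline_tokens / method_tokens - 1.
    """
    return baseline_tokens / method_tokens - 1.0


if __name__ == "__main__":
    # Token counts as reported in the abstract: 300B baseline, 194B stacked.
    speedup = token_speedup(300e9, 194e9)
    print(f"{speedup:.1%}")  # 54.6%
```

Under the same token counts, the fraction of tokens saved is 1 - 194/300, or about 35.3%, which is a different quantity from the speedup.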

NeurIPS Conference 2023 Conference Paper

GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning

Haiteng Zhao
Shengchao Liu
Ma Chang
Hannan Xu
Jie Fu
Zhihong Deng
Lingpeng Kong
Qi Liu

Molecule property prediction has gained significant attention in recent years. The main bottleneck is the label insufficiency caused by expensive lab experiments. In order to alleviate this issue and to better leverage textual knowledge for tasks, this study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting. We discover that existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs. To overcome these issues, we propose GIMLET, which unifies language models for both graph and text data. By adopting generalized position embedding, our model is extended to encode both graph structures and instruction text without additional graph encoding modules. GIMLET also decouples encoding of the graph from task instructions in the attention mechanism, enhancing the generalization of graph features across novel tasks. We construct a dataset consisting of more than two thousand molecule tasks with corresponding instructions derived from task descriptions. We pretrain GIMLET on the molecule tasks along with instructions, enabling the model to transfer effectively to a broad range of tasks. Experimental results demonstrate that GIMLET significantly outperforms molecule-text baselines in instruction-based zero-shot learning, even achieving results close to supervised GNN models on tasks such as toxcast and muv.

NeurIPS Conference 2023 Conference Paper

MARBLE: Music Audio Representation Benchmark for Universal Evaluation

Ruibin Yuan
Yinghao Ma
Yizhi Li
Ge Zhang
Xingran Chen
Hanzhi Yin
Zhuo Le
Yiqi Liu

In the era of extensive intersection between art and Artificial Intelligence (AI), such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark. To address this issue, we introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE. It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description. We then establish a unified protocol based on 18 tasks on 12 publicly available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines. Besides, MARBLE offers an easy-to-use, extendable, and reproducible suite for the community, with a clear statement on copyright issues on datasets. Results suggest recently proposed large-scale pre-trained musical language models perform the best in most tasks, with room for further improvement. The leaderboard and toolkit repository are published to promote future music AI research.

NeurIPS Conference 2023 Conference Paper

Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by Diminishing Bias

Zhongwei Wan
Che Liu
Mi Zhang
Jie Fu
Benyou Wang
Sibo Cheng
Lei Ma
César Quilodrán-Casas

The scarcity of data presents a critical obstacle to the efficacy of medical vision-language pre-training (VLP). A potential solution lies in the combination of datasets from various language communities. Nevertheless, the main challenge stems from the complexity of integrating diverse syntax and semantics, language-specific medical terminology, and culture-specific implicit knowledge. Therefore, one crucial aspect to consider is the presence of community bias caused by different languages. This paper presents a novel framework named Unifying Cross-Lingual Medical Vision-Language Pre-Training (Med-UniC), designed to integrate multi-modal medical data from the two most prevalent languages, English and Spanish. Specifically, we propose Cross-lingual Text Alignment Regularization (CTR) to explicitly unify cross-lingual semantic representations of medical reports originating from diverse language communities. CTR is optimized through latent language disentanglement, rendering our optimization objective to not depend on negative samples, thereby significantly mitigating the bias from determining positive-negative sample pairs within analogous medical reports. Furthermore, it ensures that the cross-lingual representation is not biased toward any specific language community. Med-UniC reaches superior performance across 5 medical image tasks and 10 datasets encompassing over 30 diseases, offering a versatile framework for unifying multi-modal medical data within diverse linguistic communities. The experimental outcomes highlight the presence of community bias in cross-lingual VLP. Reducing this bias enhances the performance not only in vision-language tasks but also in uni-modal visual tasks.

AAMAS Conference 2023 Conference Paper

Optimal Decoy Resource Allocation for Proactive Defense in Probabilistic Attack Graphs

Haoxiang Ma
Shuo Han
Nandi Leslie
Charles Kamhoua
Jie Fu

This paper investigates the problem of synthesizing proactive defense systems in which the defender can allocate deceptive targets and modify the cost of actions for the attacker who aims to compromise security assets in this system. We model the interaction of the attacker and the system using a formal security model, a probabilistic attack graph. By allocating fake targets/decoys, the defender aims to distract the attacker from compromising true targets. By increasing the cost of some attack actions, the defender aims to discourage the attacker from committing to certain policies and thereby improve the defense. To optimize the defense given limited decoy resources and operational constraints, we formulate the synthesis problem as a bi-level optimization problem in which the defender designs the system in anticipation of the attacker’s best response, given that the attacker has disinformation about the system due to the use of deception. Though the general formulation with bi-level optimization is NP-hard, we show that under certain assumptions, the problem can be transformed into a constrained optimization problem. We propose an algorithm to approximately solve this constrained optimization problem using a novel incentive-design method for projected gradient ascent. We demonstrate the effectiveness of the proposed method using numerical experiments.

IJCAI Conference 2023 Conference Paper

Probabilistic Planning with Prioritized Preferences over Temporal Logic Objectives

Lening Li
Hazhar Rahmani
Jie Fu

This paper studies temporal planning in probabilistic environments, modeled as labeled Markov decision processes (MDPs), with user preferences over multiple temporal goals. Existing works reflect such preferences as a prioritized list of goals. This paper introduces a new specification language, termed prioritized qualitative choice linear temporal logic on finite traces, which augments linear temporal logic on finite traces with prioritized conjunction and ordered disjunction from prioritized qualitative choice logic. This language allows for succinctly specifying temporal objectives with corresponding preferences accomplishing each temporal task. The finite traces that describe the system's behaviors are ranked based on their dissatisfaction scores with respect to the formula. We propose a systematic translation from the new language to a weighted deterministic finite automaton. Utilizing this computational model, we formulate and solve a problem of computing an optimal policy that minimizes the expected score of dissatisfaction given user preferences. We demonstrate the efficacy and applicability of the logic and the algorithm on several case studies with detailed analyses for each.

AAMAS Conference 2023 Conference Paper

Quantitative Planning with Action Deception in Concurrent Stochastic Games

Chongyang Shi
Shuo Han
Jie Fu

We study a class of two-player competitive concurrent stochastic games on graphs with reachability objectives. Specifically, player 1 aims to reach a subset $F_1$ of game states, and player 2 aims to reach a subset $F_2$ of game states where $F_2 \cap F_1 = \emptyset$. Both players aim to satisfy their reachability objectives before their opponent does. Yet, the information players have about the game dynamics is asymmetric: P1 has a (set of) hidden actions unknown to P2 at the beginning of their interaction. In this setup, we investigate P1’s strategic planning of action deception that decides when to deviate from the Nash equilibrium in P2’s game model and employ a hidden action, so that P1 can maximize the value of action deception, which is the additional payoff compared to P1’s payoff in the game where P2 has complete information. Anticipating that P2 may detect his misperception about the game and adapt his strategy during interaction in unpredictable ways, we construct a planning problem for P1 to augment the game model with an incomplete model about the theory of mind of the opponent P2. While planning in the augmented game, P1 can effectively influence P2’s perception so as to entice P2 to take actions that benefit P1. We prove that the proposed deceptive planning algorithm maximizes a lower bound on the value of action deception and demonstrate the effectiveness of our deceptive planning algorithm using a robot motion planning problem inspired by soccer games.

NeurIPS Conference 2023 Conference Paper

When Do Graph Neural Networks Help with Node Classification? Investigating the Homophily Principle on Node Distinguishability

Sitao Luan
Chenqing Hua
Minkai Xu
Qincheng Lu
Jiaqi Zhu
Xiao-Wen Chang
Jie Fu
Jure Leskovec

The homophily principle, i.e., nodes with the same labels are more likely to be connected, has been believed to be the main reason for the performance superiority of Graph Neural Networks (GNNs) over Neural Networks on node classification tasks. Recent research suggests that, even in the absence of homophily, the advantage of GNNs still exists as long as nodes from the same class share similar neighborhood patterns. However, this argument only considers intra-class Node Distinguishability (ND) but neglects inter-class ND, which provides an incomplete understanding of homophily on GNNs. In this paper, we first demonstrate such deficiency with examples and argue that an ideal situation for ND is to have smaller intra-class ND than inter-class ND. To formulate this idea and study ND deeply, we propose Contextual Stochastic Block Model for Homophily (CSBM-H) and define two metrics, Probabilistic Bayes Error (PBE) and negative generalized Jeffreys divergence, to quantify ND. With the metrics, we visualize and analyze how graph filters, node degree distributions and class variances influence ND, and investigate the combined effect of intra- and inter-class ND. Besides, we discovered the mid-homophily pitfall, which occurs widely in graph datasets. Furthermore, we verified that, in real-world tasks, the superiority of GNNs is indeed closely related to both intra- and inter-class ND regardless of homophily levels. Grounded in this observation, we propose a new hypothesis-testing based performance metric beyond homophily, which is non-linear, feature-based and can provide a statistical threshold value for the superiority of GNNs. Experiments indicate that it is significantly more effective than the existing homophily metrics on revealing the advantage and disadvantage of graph-aware modes on both synthetic and benchmark real-world datasets.

NeurIPS Conference 2022 Conference Paper

Bidirectional Learning for Offline Infinite-width Model-based Optimization

Can Chen
Yingxue Zhang
Jie Fu
Xue (Steve) Liu
Mark Coates

In offline model-based optimization, we strive to maximize a black-box objective function by only leveraging a static dataset of designs and their scores. This problem setting arises in numerous fields including the design of materials, robots, DNAs, proteins, etc. Recent approaches train a deep neural network (DNN) model on the static dataset to act as a proxy function, and then perform gradient ascent on the existing designs to obtain potentially high-scoring designs. This methodology frequently suffers from the out-of-distribution problem where the proxy function often returns adversarial designs. To mitigate this problem, we propose BiDirectional learning for offline Infinite-width model-based optimization (BDI). BDI consists of two mappings: the forward mapping leverages the static dataset to predict the scores of the high-scoring designs, and the backward mapping leverages the high-scoring designs to predict the scores of the static dataset. The backward mapping, neglected in previous work, can distill more information of the static dataset into the high-scoring designs, which effectively mitigates the out-of-distribution problem. Yet, for a finite-width DNN model, the loss function of the backward mapping is intractable and only has an approximate form, which leads to a significant deterioration of the design quality. We thus adopt an infinite-width DNN model and propose to employ the corresponding neural tangent kernel to yield a closed-form loss for more accurate design updates. Experiments on various tasks verify the effectiveness of BDI. The code is available at https://github.com/GGchen1997/BDI.

TMLR Journal 2022 Journal Article

Evolving Decomposed Plasticity Rules for Information-Bottlenecked Meta-Learning

Fan Wang
Hao Tian
Haoyi Xiong
Hua Wu
Jie Fu
Yang Cao
Yu Kang
Haifeng Wang

Artificial neural networks (ANNs) are typically confined to accomplishing pre-defined tasks by learning a set of static parameters. In contrast, biological neural networks (BNNs) can adapt to various new tasks by continually updating the neural connections based on the inputs, which is aligned with the paradigm of learning effective learning rules in addition to static parameters, e.g., meta-learning. Among various biologically inspired learning rules, Hebbian plasticity updates the neural network weights using local signals without the guide of an explicit target function, thus enabling an agent to learn automatically without human efforts. However, typical plastic ANNs using a large amount of meta-parameters violate the nature of the genomics bottleneck and potentially deteriorate the generalization capacity. This work proposes a new learning paradigm decomposing those connection-dependent plasticity rules into neuron-dependent rules thus accommodating $\Theta(n^2)$ learnable parameters with only $\Theta(n)$ meta-parameters. We also thoroughly study the effect of different neural modulation on plasticity. Our algorithms are tested in challenging random 2D maze environments, where the agents have to use their past experiences to shape the neural connections and improve their performances for the future. The results of our experiment validate the following: 1. Plasticity can be adopted to continually update a randomly initialized RNN to surpass pre-trained, more sophisticated recurrent models, especially when coming to long-term memorization. 2. Following the genomics bottleneck, the proposed decomposed plasticity can be comparable to or even more effective than canonical plasticity rules in some instances.

ICLR Conference 2021 Conference Paper

CoCon: A Self-Supervised Approach for Controlled Text Generation

Alvin Chan
Yew-Soon Ong
Bill Pung
Aston Zhang
Jie Fu

Pretrained Transformer-based language models (LMs) display remarkable natural language generation capabilities. With their immense potential, controlling text generation of such LMs is getting attention. While there are studies that seek to control high-level attributes (such as sentiment and topic) of generated text, there is still a lack of more precise control over its content at the word- and phrase-level. Here, we propose Content-Conditioner (CoCon) to control an LM's output text with a content input, at a fine-grained level. In our self-supervised approach, the CoCon block learns to help the LM complete a partially-observed text sequence by conditioning with content inputs that are withheld from the LM. Through experiments, we show that CoCon can naturally incorporate target content into generated texts and control high-level text attributes in a zero-shot manner.

ICLR Conference 2020 Conference Paper

Jacobian Adversarially Regularized Networks for Robustness

Alvin Chan
Yi Tay
Yew-Soon Ong
Jie Fu

Adversarial examples are crafted with imperceptible perturbations with the intent to fool neural networks. Against such attacks, adversarial training and its variants stand as the strongest defense to date. Previous studies have pointed out that robust models that have undergone adversarial training tend to produce more salient and interpretable Jacobian matrices than their non-robust counterparts. A natural question is whether a model trained with an objective to produce salient Jacobian can result in better robustness. This paper answers this question with affirmative empirical results. We propose Jacobian Adversarially Regularized Networks (JARN) as a method to optimize the saliency of a classifier's Jacobian by adversarially regularizing the model's Jacobian to resemble natural training images. Image classifiers trained with JARN show improved robust accuracy compared to standard models on the MNIST, SVHN and CIFAR-10 datasets, uncovering a new angle to boost robustness without using adversarial training.

AAAI Conference 2020 Conference Paper

Revision in Continuous Space: Unsupervised Text Style Transfer without Adversarial Learning

Dayiheng Liu
Jie Fu
Yidan Zhang
Chris Pal
Jiancheng Lv

Typical methods for unsupervised text style transfer often rely on two key ingredients: 1) seeking the explicit disentanglement of the content and the attributes, and 2) troublesome adversarial learning. In this paper, we show that neither of these components is indispensable. We propose a new framework that utilizes the gradients to revise the sentence in a continuous space during inference to achieve text style transfer. Our method consists of three key components: a variational auto-encoder (VAE), some attribute predictors (one for each attribute), and a content predictor. The VAE and the two types of predictors enable us to perform gradient-based optimization in the continuous space, which is mapped from sentences in a discrete space, to find the representation of a target sentence with the desired attributes and preserved content. Moreover, the proposed method naturally has the ability to simultaneously manipulate multiple fine-grained attributes, such as sentence length and the presence of specific words, when performing text style transfer tasks. Compared with previous adversarial learning based methods, the proposed method is more interpretable, controllable and easier to train. Extensive experimental studies on three popular text style transfer tasks show that the proposed method significantly outperforms five state-of-the-art methods.

IJCAI Conference 2020 Conference Paper

Semi-Dynamic Hypergraph Neural Network for 3D Pose Estimation

Shengyuan Liu
Pei Lv
Yuzhen Zhang
Jie Fu
Junjin Cheng
Wanqing Li
Bing Zhou
Mingliang Xu

This paper proposes a novel Semi-Dynamic Hypergraph Neural Network (SD-HNN) to estimate 3D human pose from a single image. SD-HNN adopts a hypergraph to represent the human body to effectively exploit the kinematic constraints among adjacent and non-adjacent joints. Specifically, a pose hypergraph in SD-HNN has two components. One is a static hypergraph constructed according to the conventional tree body structure. The other is the semi-dynamic hypergraph representing the dynamic kinematic constraints among different joints. These two hypergraphs are combined and trained in an end-to-end fashion. Unlike traditional Graph Convolutional Networks (GCNs) that are based on a fixed tree structure, the SD-HNN can deal with ambiguity in human pose estimation. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on both the Human3.6M and MPI-INF-3DHP datasets.

IJCAI Conference 2020 Conference Paper

Synthesis of Deceptive Strategies in Reachability Games with Action Misperception

Abhishek N. Kulkarni
Jie Fu

We consider a class of two-player turn-based zero-sum games on graphs with reachability objectives, known as reachability games, where the objective of Player 1 (P1) is to reach a set of goal states, and that of Player 2 (P2) is to prevent this. In particular, we consider the case where the players have asymmetric information about each other's action capabilities: P2 starts with incomplete information (misperception) about P1's action set, and updates the misperception when P1 uses an action previously unknown to P2. When P1 is made aware of P2's misperception, the key question is whether P1 can control P2's perception so as to deceive P2 into selecting actions to P1's advantage. To answer this question, we introduce a dynamic hypergame model to capture the reachability game with evolving misperception of P2. Then, we present a fixed-point algorithm to compute the deceptive winning region and strategy for P1 under the almost-sure winning condition. Finally, we show that the synthesized deceptive winning strategy is at least as powerful as the (non-deceptive) winning strategy in the game in which P1 does not account for P2's misperception. We illustrate our algorithm using a robot motion planning problem in an adversarial environment.

AAAI Conference 2019 Conference Paper

Learning Multi-Task Communication with Message Passing for Sequence Learning

Pengfei Liu
Jie Fu
Yue Dong
Xipeng Qiu
Jackie Chi Kit Cheung

We present two architectures for multi-task learning with neural sequence models. Our approach allows the relationships between different tasks to be learned dynamically, rather than using an ad-hoc pre-defined structure as in previous work. We adopt the idea from message-passing graph neural networks, and propose a general graph multi-task learning framework in which different tasks can communicate with each other in an effective and interpretable way. We conduct extensive experiments in text classification and sequence labelling to evaluate our approach on multi-task learning and transfer learning. The empirical results show that our models not only outperform competitive baselines, but also learn interpretable and transferable patterns across tasks.

IJCAI Conference 2016 Conference Paper

DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks

Jie Fu
Hongyin Luo
Jiashi Feng
Kian Hsiang Low
Tat-Seng Chua

The performance of deep neural networks is well-known to be sensitive to the setting of their hyperparameters. Recent advances in reverse-mode automatic differentiation allow for optimizing hyperparameters with gradients. The standard way of computing these gradients involves a forward and backward pass of computations. However, the backward pass usually needs to consume unaffordable memory to store all the intermediate variables to exactly reverse the forward training procedure. In this work we propose a simple but effective method, DrMAD, to distill the knowledge of the forward pass into a shortcut path, through which we approximately reverse the training trajectory. Experiments on several image benchmark datasets show that DrMAD is at least 45 times faster and consumes 100 times less memory compared to state-of-the-art methods for optimizing hyperparameters with minimal compromise to its effectiveness. To the best of our knowledge, DrMAD is the first research attempt to make it practical to automatically tune thousands of hyperparameters of deep neural networks. The code can be downloaded from https://github.com/bigaidream-projects/drmad

AAAI Conference 2015 Conference Paper

AffectiveSpace 2: Enabling Affective Intuition for Concept-Level Sentiment Analysis

Erik Cambria
Jie Fu
Federica Bisio
Soujanya Poria

Predicting the affective valence of unknown multi-word expressions is key for concept-level sentiment analysis. AffectiveSpace 2 is a vector space model, built by means of random projection, that allows for reasoning by analogy on natural language concepts. By reducing the dimensionality of affective common-sense knowledge, the model allows semantic features associated with concepts to be generalized and, hence, allows concepts to be intuitively clustered according to their semantic and affective relatedness. Such an affective intuition (so called because it does not rely on explicit features, but rather on implicit analogies) enables the inference of emotions and polarity conveyed by multiword expressions, thus achieving efficient concept-level sentiment analysis.

EAAI Journal 2015 Journal Article

Symbolic planning and control using game theory and grammatical inference

Jie Fu
Herbert G. Tanner
Jeffrey N. Heinz
Konstantinos Karydis
Jane Chandlee
Cesar Koirala

IROS Conference 2013 Conference Paper

Recognizing context-aware activities of daily living using RGBD sensor

Jie Fu
Chengyin Liu
Yen-Pin Hsu
Li-Chen Fu

In this paper, we propose a Bayesian conditional probability with latent-structure model for context-aware activities of daily living (ADL) recognition. The proposed ADL recognition system takes an RGBD sensor (Microsoft Kinect) as the input device. In ADL recognition, the object that the human interacts with is an important piece of context, as is the human action itself. To better understand the activity, we model the interacted object and the human action together. As far as we know, many related works fail to take into account the relation between the context information and human action features; most of them consider only the human action features, which causes ambiguity in classifying activities with similar human actions. In this paper, the context information and human action features are considered concurrently, so that recognition performance is greatly improved over previous works, as demonstrated in our experimental results.