Arrow Research search

Author name cluster

Hao He

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

29 papers
2 author rows

Possible papers

29

YNIMG Journal 2026 Journal Article

Compensatory and impaired trust updating in mild cognitive impairment: Evidence from computational modeling and fMRI

  • Yiqi Chen
  • Hao He
  • Yiyang Ding
  • Wuhai Tao
  • Qing Guan
  • Frank Krueger

Trust dynamics (how trust is formed, maintained, and adjusted) are essential to interpersonal functioning. Older adults with mild cognitive impairment (MCI) are known to exhibit social vulnerabilities, but the dynamic updating of trust during social interactions and its neural basis in this population remain unclear. Here, we combined computational modeling with task-based functional magnetic resonance imaging (fMRI) to investigate trust updating in 39 older adults with MCI (mean age = 67.8 years) compared to 45 normal healthy controls (NHC, mean age = 67.2 years). At the behavioral level, MCI participants showed slower trust reduction, larger prediction errors (PE), lower learning rates, and greater interference than NHC when interacting with non-cooperative partners, while responding similarly to cooperative ones. At the neural level, fMRI analyses revealed that, relative to the NHC group, MCI participants exhibited compensatory recruitment of central executive and default mode network regions during cooperative interactions. Conversely, during non-cooperative interactions, the MCI group showed reduced activation in social and executive cognition-related regions compared to controls. Critically, PE-modulated psychophysiological interaction analyses revealed diminished functional connectivity between the superior frontal gyrus and temporoparietal junction under non-cooperative conditions in the MCI group. Our findings suggest that while older adults with MCI can recruit compensatory neural resources during supportive interactions, they struggle to adaptively update trust when facing adverse social contexts. This impaired updating may underlie their heightened susceptibility to social exploitation and declining interpersonal functioning.
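
The computational modeling described above centers on prediction-error-driven trust updating; a minimal Rescorla-Wagner-style sketch (variable names and values are illustrative, not taken from the paper):

```python
def update_trust(trust, outcome, learning_rate):
    """One prediction-error update: move trust toward the observed
    outcome in proportion to the prediction error (PE)."""
    prediction_error = outcome - trust
    return trust + learning_rate * prediction_error, prediction_error

# A lower learning rate, as reported for the MCI group, yields slower
# trust reduction against a non-cooperative partner (outcome = 0).
trust_nhc = trust_mci = 0.8
for _ in range(5):
    trust_nhc, _ = update_trust(trust_nhc, 0.0, learning_rate=0.5)
    trust_mci, _ = update_trust(trust_mci, 0.0, learning_rate=0.1)
assert trust_mci > trust_nhc  # slower decline = the behavioral signature above
```

Repeated updates with a small learning rate leave trust closer to its starting value, matching the slower trust reduction reported for the MCI group.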

AAAI Conference 2026 Conference Paper

Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities

  • Weixiang Zhao
  • Xingyu Sui
  • Jiahe Guo
  • Yulin Hu
  • Yang Deng
  • Yanyan Zhao
  • Xuda Zhi
  • Yongbo Huang

Recent advancements in Large Reasoning Models (LRMs), such as OpenAI's o1/o3 and DeepSeek-R1, have demonstrated remarkable performance in specialized reasoning tasks through human-like deliberative thinking and long chain-of-thought reasoning. However, our systematic evaluation across various model families (DeepSeek, Qwen, and LLaMA) and scales (7B to 32B) reveals that acquiring these deliberative reasoning capabilities significantly reduces the foundational capabilities of LRMs, including notable declines in helpfulness and harmlessness, alongside substantially increased inference costs. Importantly, we demonstrate that adaptive reasoning---employing modes like Zero-Thinking, Less-Thinking, and Summary-Thinking---can effectively alleviate these drawbacks. Our empirical insights underline the critical need for developing more versatile LRMs capable of dynamically allocating inference-time compute according to specific task characteristics.

NeurIPS Conference 2025 Conference Paper

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

  • Shanchuan Lin
  • Ceyuan Yang
  • Hao He
  • Jianwen Jiang
  • Yuxi Ren
  • Xin Xia
  • Yang Zhao
  • Xuefeng Xiao

Existing large-scale video generation models are computationally intensive, preventing adoption in real-time and interactive applications. In this work, we propose autoregressive adversarial post-training (AAPT) to turn a pre-trained latent video diffusion model into a real-time, interactive, streaming video generator. Our model autoregressively generates a latent frame at a time using a single neural function evaluation (1NFE). The model can stream the result to the user in real time and receive interactive responses as control to generate the next latent frame. Unlike existing approaches, our method explores adversarial training as an effective paradigm for autoregressive generation. This allows us to design a more efficient architecture for one-step generation and to train the model in a student-forcing way to mitigate error accumulation. The adversarial approach also enables us to train the model for long-duration generation fully utilizing the KV cache. As a result, our 8B model achieves real-time, 24fps, nonstop, streaming video generation at 736x416 resolution on a single H100, or 1280x720 on 8xH100 up to a minute long (1440 frames).

ICLR Conference 2025 Conference Paper

CameraCtrl: Enabling Camera Control for Video Diffusion Models

  • Hao He
  • Yinghao Xu 0001
  • Yuwei Guo 0002
  • Gordon Wetzstein
  • Bo Dai 0002
  • Hongsheng Li 0001
  • Ceyuan Yang

Controllability plays a crucial role in video generation, as it allows users to create and edit content more precisely. Existing models, however, lack control over camera pose, which serves as a cinematic language to express deeper narrative nuances. To alleviate this issue, we introduce CameraCtrl, enabling accurate camera pose control for video diffusion models. Our approach explores effective camera trajectory parameterization along with a plug-and-play camera pose control module that is trained on top of a video diffusion model, leaving other modules of the base model untouched. Moreover, a comprehensive study on the effect of various training datasets is conducted, suggesting that videos with diverse camera distributions and similar appearance to the base model indeed enhance controllability and generalization. Experimental results demonstrate the effectiveness of CameraCtrl in achieving precise camera control with different video generation models, marking a step forward in the pursuit of dynamic and customized video storytelling from textual and camera pose inputs.

EAAI Journal 2025 Journal Article

Category knowledge-guided few-shot bearing fault diagnosis

  • Feng Zhan
  • Lingkai Hu
  • Wenkai Huang
  • Yikai Dong
  • Hao He
  • Guanjun Wu

Real-time bearing fault diagnosis plays a vital role in maintaining the safety and reliability of sophisticated industrial systems. However, the scarcity of labeled data in fault diagnosis, due to the difficulty of collecting fault samples and the high cost of labeling, poses a significant challenge in learning discriminative fault features from limited and complex monitoring signals. Few-shot learning (FSL) emerges as a potent method for extracting and accurately classifying features from severe fault signals. Nonetheless, challenges such as data scarcity and environmental noise significantly impede the efficacy of existing FSL methods in diagnosing incipient faults effectively. These limitations are primarily due to the inadequate consideration of inter-class correlations within noisy contexts by current FSL strategies, which restricts their ability to extrapolate familiar features to new classes. Consequently, there is a pressing demand for an FSL approach that can exploit inter-class correlations to address the hurdles of data insufficiency and environmental complexities, thereby facilitating the diagnosis of incipient faults in few-shot settings. This paper proposes a novel category-knowledge-guided model tailored for few-shot multi-task scenarios. By leveraging attribute data from base categories and the similarities across new class samples, our model efficiently establishes mapping relations for unencountered tasks, significantly enhancing its generalization capabilities for early-stage fault diagnosis and multi-task applications. This model ensures swift and precise FSL fault diagnosis under uncharted operational conditions. Comparative analyses utilizing the Case Western Reserve University bearing dataset and the Early Mild Fault Traction Motor bearing dataset demonstrate our model’s superior performance against leading FSL and transfer learning approaches.

NeurIPS Conference 2025 Conference Paper

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

  • Han Xiao
  • Guozhi Wang
  • Yuxiang Chai
  • Zimu Lu
  • Weifeng Lin
  • Hao He
  • Lue Fan
  • Liuyang Bian

In this paper, we introduce UI-Genie, a self-improving framework addressing two key challenges for GUI agents: trajectory outcomes are difficult to verify, and high-quality training data do not scale. These challenges are addressed by a reward model and a self-improving pipeline, respectively. The reward model, UI-Genie-RM, features an image-text interleaved architecture that efficiently processes historical context and unifies action-level and task-level rewards. To support the training of UI-Genie-RM, we develop deliberately designed data generation strategies including rule-based verification, controlled trajectory corruption, and hard negative mining. To address the second challenge, a self-improvement pipeline progressively expands the set of solvable complex GUI tasks by enhancing both the agent and reward models through reward-guided exploration and outcome verification in dynamic environments. For training, we generate UI-Genie-RM-517k and UI-Genie-Agent-16k, establishing the first reward-specific dataset for GUI agents while demonstrating high-quality synthetic trajectory generation without manual annotation. Experimental results show that UI-Genie achieves state-of-the-art performance across multiple GUI agent benchmarks after three generations of data-model self-improvement. We open-source our complete framework implementation and generated datasets to facilitate further research at https://github.com/Euphoria16/UI-Genie.

NeurIPS Conference 2025 Conference Paper

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

  • Jiaming Han
  • Hao Chen
  • Yang Zhao
  • Hanyu Wang
  • Qi Zhao
  • Ziyan Yang
  • Hao He
  • Xiangyu Yue

This paper presents a multimodal framework that attempts to unify visual understanding and generation within a shared discrete semantic representation. At its core is the Text-Aligned Tokenizer (TA-Tok), which converts images into discrete tokens using a text-aligned codebook projected from a large language model's (LLM) vocabulary. By integrating vision and text into a unified space with an expanded vocabulary, our multimodal LLM, Tar, enables cross-modal input and output through a shared interface, without the need for modality-specific designs. Additionally, we propose scale-adaptive encoding and decoding to balance efficiency and visual detail, along with a generative de-tokenizer to produce high-fidelity visual outputs. To address diverse decoding needs, we utilize two complementary de-tokenizers: a fast autoregressive model and a diffusion-based model. To enhance modality fusion, we investigate advanced pre-training tasks, demonstrating improvements in both visual understanding and generation. Experiments across benchmarks show that Tar matches or surpasses existing multimodal LLM methods, achieving faster convergence and greater training efficiency. All code, models, and data will be made publicly available.

AAAI Conference 2024 Conference Paper

Can Large Language Models Serve as Rational Players in Game Theory? A Systematic Analysis

  • Caoyun Fan
  • Jindou Chen
  • Yaohui Jin
  • Hao He

Game theory, as an analytical tool, is frequently utilized to analyze human behavior in social science research. With the high alignment between the behavior of Large Language Models (LLMs) and humans, a promising research direction is to employ LLMs as substitutes for humans in game experiments, enabling social science research. However, despite numerous empirical studies combining LLMs and game theory, the capability boundaries of LLMs in game theory remain unclear. In this research, we endeavor to systematically analyze LLMs in the context of game theory. Specifically, rationality, as the fundamental principle of game theory, serves as the metric for evaluating players' behavior: building a clear desire, refining belief about uncertainty, and taking optimal actions. Accordingly, we select three classical games (dictator game, Rock-Paper-Scissors, and ring-network game) to analyze to what extent LLMs can achieve rationality in these three aspects. The experimental results indicate that even the current state-of-the-art LLM (GPT-4) exhibits substantial disparities compared to humans in game theory. For instance, LLMs struggle to build desires based on uncommon preferences, fail to refine belief from many simple patterns, and may overlook or modify refined belief when taking actions. Therefore, we consider that introducing LLMs into game experiments in the field of social science should be approached with greater caution.

NeurIPS Conference 2024 Conference Paper

Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

  • Zhengfei Kuang
  • Shengqu Cai
  • Hao He
  • Yinghao Xu
  • Hongsheng Li
  • Leonidas J. Guibas
  • Gordon Wetzstein

Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation process is an important goal moving forward and recent approaches that condition video generation models on camera trajectories take an important step towards this goal. Yet, it remains challenging to generate a video of the same scene from multiple different camera trajectories. Solutions to this multi-video generation problem could enable large-scale 3D scene generation with editable camera trajectories, among other applications. We introduce collaborative video diffusion (CVD) as an important step towards this vision. The CVD framework includes a novel cross-video synchronization module that promotes consistency between corresponding frames of the same video rendered from different camera poses using an epipolar attention mechanism. Trained on top of a state-of-the-art camera-control module for video generation, CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines, as shown in extensive experiments.

YNIMG Journal 2024 Journal Article

Connectome-based prediction of decreased trust propensity in older adults with mild cognitive impairment: A resting-state functional magnetic resonance imaging study

  • Yiqi Chen
  • Hao He
  • Yiyang Ding
  • Wuhai Tao
  • Qing Guan
  • Frank Krueger

Trust propensity (TP) relies more on social than economic rationality to transform the perceived probability of betrayal into positive reciprocity expectations in older adults with normal cognition. While deficits in social rationality have been observed in older adults with mild cognitive impairment (MCI), there is limited research on TP and its associated resting-state functional connectivity (RSFC) mechanisms in this population. To measure TP and related psychological functions (affect, motivation, executive cognition, and social cognition), MCI (n = 42) and normal healthy control (NHC, n = 115) groups completed a one-shot trust game and additional assessments of related psychological functions. RSFC associated with TP was analyzed using connectome-based predictive modeling (CPM) and lesion simulations. Our behavioral results showed that the MCI group trusted less (i.e., had lower TP) than the NHC group, with lower TP associated with higher sensitivity to the probability of betrayal in the MCI group. In the MCI group, only negative CPM models (RSFC negatively correlated with TP) significantly predicted TP, with a high salience network (SN) contribution. In contrast, in the NHC group, positive CPM models (RSFC positively correlated with TP) significantly predicted TP, with a high contribution from the default mode network (DMN). In addition, the total network strength of the NHC-specific positive network was lower in the MCI group than in the NHC group. Our findings demonstrated a decrease in TP in the MCI group compared to the NHC group, which is associated with deficits in social rationality (social cognition, associated with DMN) and increased sensitivity to betrayal (affect, associated with SN) in a trust dilemma. In conclusion, our study contributes to understanding MCI-related alterations in trust and their underlying neural mechanisms.

ICLR Conference 2023 Conference Paper

FedDAR: Federated Domain-Aware Representation Learning

  • Aoxiao Zhong
  • Hao He
  • Zhaolin Ren
  • Na Li 0002
  • Quanzheng Li

Cross-silo Federated learning (FL) has become a promising tool in machine learning applications for healthcare. It allows hospitals/institutions to train models with sufficient data while the data is kept private. To make sure the FL model is robust when facing heterogeneous data among FL clients, most efforts focus on personalizing models for clients. However, the latent relationships between clients' data are ignored. In this work, we focus on a special non-iid FL problem, called Domain-mixed FL, where each client's data distribution is assumed to be a mixture of several predefined domains. Recognizing the diversity of domains and the similarity within domains, we propose a novel method, FedDAR, which learns a domain shared representation and domain-wise personalized prediction heads in a decoupled manner. For simplified linear regression settings, we have theoretically proved that FedDAR enjoys a linear convergence rate. For general settings, we have performed intensive empirical studies on both synthetic and real-world medical datasets which demonstrate its superiority over prior FL methods. Our code is available at https://github.com/zlz0414/FedDAR.
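
The decoupling described above (one representation shared across clients, plus a personalized prediction head per predefined domain) can be sketched as follows; the class and parameter names are ours, not from the paper, and real FedDAR training aggregates these parts across federated clients:

```python
import numpy as np

class SharedRepDomainHeads:
    """Illustrative decoupled model: a shared linear representation
    plus one linear prediction head per predefined domain."""

    def __init__(self, dim_in, dim_rep, n_domains, rng=None):
        rng = rng or np.random.default_rng(0)
        self.shared = rng.normal(size=(dim_in, dim_rep))    # shared across all clients
        self.heads = rng.normal(size=(n_domains, dim_rep))  # one head per domain

    def predict(self, x, domain):
        rep = x @ self.shared          # domain-agnostic representation
        return rep @ self.heads[domain]  # domain-wise personalized output

model = SharedRepDomainHeads(dim_in=4, dim_rep=3, n_domains=2)
x = np.ones(4)
y0, y1 = model.predict(x, 0), model.predict(x, 1)  # same input, per-domain outputs
```

In federated training, only `shared` would be averaged across clients, while each domain head is updated from that domain's data.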

AAAI Conference 2023 Conference Paper

Latent Constraints on Unsupervised Text-Graph Alignment with Information Asymmetry

  • Jidong Tian
  • Wenqing Chen
  • Yitian Li
  • Caoyun Fan
  • Hao He
  • Yaohui Jin

Unsupervised text-graph alignment (UTGA) is a fundamental task that bidirectionally generates texts and graphs without parallel data. Most available models of UTGA suffer from information asymmetry, a common phenomenon that texts and graphs include additional information invisible to each other. On the one hand, these models fail to supplement asymmetric information effectively due to the lack of ground truths. On the other hand, it is challenging to indicate asymmetric information with explicit indicators because it cannot be decoupled from the data directly. To address the challenge posed by information asymmetry, we propose the assumption that asymmetric information is encoded in unobservable latent variables and only affects the one-way generation processes. These latent variables corresponding to asymmetric information should obey prior distributions recovered approximately from original data. Therefore, we first propose a taxonomy of the latent variable that classifies the latent variable into transferable (TV) and non-transferable (NTV) variables and further distinguish NTV as the dependent variable (DV) and the independent variable (IV). Next, we propose three latent VAE-based regularizations on TV, DV, and IV to constrain their distributions to well-designed prior distributions to introduce asymmetric information into models and enhance the preservation of shared contents. Finally, we impose the three proposed constraints on a cycle-consistent learning framework, back-translation (BT), named ConstrainedBT. Experimental results on three UTGA tasks demonstrate the effectiveness of ConstrainedBT on the information-asymmetric challenge.

AAAI Conference 2023 Conference Paper

Preference-Controlled Multi-Objective Reinforcement Learning for Conditional Text Generation

  • Wenqing Chen
  • Jidong Tian
  • Caoyun Fan
  • Yitian Li
  • Hao He
  • Yaohui Jin

Conditional text generation is to generate text sequences conditioning on linguistic or non-linguistic data. The main line of existing work proposed deterministic models to improve the fidelity of the generated text but often ignored the diversity. Another line relied on conditional variational auto-encoders (CVAEs), which increased the diversity over their deterministic backbones. However, CVAEs regard diversity as an implicit objective and may not be optimal. In this paper, we raise two questions: i) Can diversity be further improved with an explicit objective? ii) Since fidelity and diversity are two conflicting objectives, how can we obtain different multi-objective optimal solutions according to user preferences? To answer question i), we propose a multi-objective reinforcement learning (MORL) method which explicitly takes CIDEr and Self-CIDEr scores as the fidelity-oriented and diversity-oriented rewards respectively. To answer question ii), we propose a preference-controlled MORL method, which can obtain infinite multi-objective optimal solutions by tuning the preference variable. We conduct extensive experiments on paraphrasing and image captioning tasks, which show that in the fidelity-diversity trade-off space, our model outperforms both deterministic and CVAE-based baselines.

NeurIPS Conference 2023 Conference Paper

ReTR: Modeling Rendering Via Transformer for Generalizable Neural Surface Reconstruction

  • Yixun Liang
  • Hao He
  • Yingcong Chen

Generalizable neural surface reconstruction techniques have attracted great attention in recent years. However, they encounter limitations of low-confidence depth distributions and inaccurate surface reasoning due to the oversimplified volume rendering process employed. In this paper, we present Reconstruction TRansformer (ReTR), a novel framework that leverages the transformer architecture to redesign the rendering process, enabling complex render interaction modeling. It introduces a learnable meta-ray token and utilizes the cross-attention mechanism to simulate the interaction of the rendering process with sampled points and render the observed color. Meanwhile, by operating within a high-dimensional feature space rather than the color space, ReTR mitigates sensitivity to projected colors in source views. Such improvements result in accurate surface assessment with high confidence. We demonstrate the effectiveness of our approach on various datasets, showcasing how our method outperforms the current state-of-the-art approaches in terms of reconstruction quality and generalization ability. Our code is available at https://github.com/YixunLiang/ReTR.

NeurIPS Conference 2023 Conference Paper

Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation

  • Fei Zhang
  • Tianfei Zhou
  • Boyang Li
  • Hao He
  • Chaofan Ma
  • Tianjiao Zhang
  • Jiangchao Yao
  • Ya Zhang

This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS), which learns to segment objects of arbitrary classes using mere image-text pairs. Existing works turn to enhance the vanilla vision transformer by introducing explicit grouping recognition, i.e., employing several group tokens/centroids to cluster the image tokens and perform the group-text alignment. Nevertheless, these methods suffer from a granularity inconsistency regarding the usage of group tokens, which are aligned in the all-to-one vs. one-to-one manners during the training and inference phases, respectively. We argue that this discrepancy arises from the lack of elaborate supervision for each group token. To bridge this granularity gap, this paper explores explicit supervision for the group tokens from the prototypical knowledge. To this end, this paper proposes the non-learnable prototypical regularization (NPR) where non-learnable prototypes are estimated from source features to serve as supervision and enable contrastive matching of the group tokens. This regularization encourages the group tokens to segment objects with less redundancy and capture more comprehensive semantic regions, leading to increased compactness and richness. Based on NPR, we propose the prototypical guidance segmentation network (PGSeg) that incorporates multi-modal regularization by leveraging prototypical sources from both images and texts at different levels, progressively enhancing the segmentation capability with diverse prototypical patterns. Experimental results show that our proposed method achieves state-of-the-art performance on several benchmark datasets.

AAAI Conference 2022 Conference Paper

Training-Free Uncertainty Estimation for Dense Regression: Sensitivity as a Surrogate

  • Lu Mi
  • Hao Wang
  • Yonglong Tian
  • Hao He
  • Nir N. Shavit

Uncertainty estimation is an essential step in the evaluation of the robustness of deep learning models in computer vision, especially when applied in risk-sensitive areas. However, most state-of-the-art deep learning models either fail to obtain uncertainty estimation or need significant modification (e.g., formulating a proper Bayesian treatment) to obtain it. Most previous methods are not able to take an arbitrary model off the shelf and generate uncertainty estimation without retraining or redesigning it. To address this gap, we perform a systematic exploration into training-free uncertainty estimation for dense regression, an unrecognized yet important problem, and provide a theoretical construction justifying such estimations. We propose three simple and scalable methods to analyze the variance of outputs from a trained network under tolerable perturbations: infer-transformation, infer-noise, and infer-dropout. They operate solely during inference, without the need to re-train, re-design, or fine-tune the models, as typically required by state-of-the-art uncertainty estimation methods. Surprisingly, even without involving such perturbations in training, our methods produce comparable or even better uncertainty estimation when compared to training-required state-of-the-art methods. Code is available at https://github.com/lumi9587/train-free-uncertainty.
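
As a rough illustration of the perturb-at-inference idea (in the spirit of infer-noise; the paper's exact procedure and hyperparameters may differ), the variance of a trained model's outputs under small input noise can serve as an uncertainty surrogate:

```python
import numpy as np

def infer_noise_uncertainty(model, x, n_samples=10, sigma=0.01, rng=None):
    """Training-free uncertainty sketch: add small Gaussian noise to the
    input at inference time and use the variance of the outputs as an
    uncertainty surrogate. No retraining or model changes required."""
    rng = rng or np.random.default_rng(0)
    outputs = np.stack([
        model(x + rng.normal(0.0, sigma, size=np.shape(x)))
        for _ in range(n_samples)
    ])
    return outputs.mean(axis=0), outputs.var(axis=0)

# Toy stand-in for a trained dense-regression network.
model = lambda x: np.sin(3.0 * x)
mean, var = infer_noise_uncertainty(model, np.linspace(0.0, 1.0, 5))
```

Infer-transformation and infer-dropout follow the same pattern, swapping input noise for geometric transformations or inference-time dropout.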

AAAI Conference 2022 Conference Paper

Weakly Supervised Neural Symbolic Learning for Cognitive Tasks

  • Jidong Tian
  • Yitian Li
  • Wenqing Chen
  • Liqiang Xiao
  • Hao He
  • Yaohui Jin

Despite the recent success of end-to-end deep neural networks, there are growing concerns about their lack of logical reasoning abilities, especially on cognitive tasks with perception and reasoning processes. A solution is the neural symbolic learning (NeSyL) method that can effectively utilize pre-defined logic rules to constrain the neural architecture making it perform better on cognitive tasks. However, it is challenging to apply NeSyL to these cognitive tasks because of the lack of supervision, the non-differentiable manner of the symbolic system, and the difficulty to probabilistically constrain the neural network. In this paper, we propose WS-NeSyL, a Weakly Supervised Neural Symbolic Learning model for cognitive tasks with logical reasoning. First, WS-NeSyL employs a novel back search algorithm to sample the possible reasoning process through logic rules. This sampled process can supervise the neural network as the pseudo label. Based on this algorithm, we can backpropagate gradients to the neural network of WS-NeSyL in a weakly supervised manner. Second, we introduce a probabilistic logic regularization into WS-NeSyL to help the neural network learn probabilistic logic. To evaluate WS-NeSyL, we have conducted experiments on three cognitive datasets, including temporal reasoning, handwritten formula recognition, and relational reasoning datasets. Experimental results show that WS-NeSyL not only outperforms the end-to-end neural model but also beats the state-of-the-art neural symbolic learning models.

IJCAI Conference 2021 Conference Paper

Dependent Multi-Task Learning with Causal Intervention for Image Captioning

  • Wenqing Chen
  • Jidong Tian
  • Caoyun Fan
  • Hao He
  • Yaohui Jin

Recent work for image captioning mainly followed an extract-then-generate paradigm, pre-extracting a sequence of object-based features and then formulating image captioning as a single sequence-to-sequence task. Although promising, we observed two problems in generated captions: 1) content inconsistency where models would generate contradicting facts; 2) not informative enough where models would miss parts of important information. From a causal perspective, the reason is that models have captured spurious statistical correlations between visual features and certain expressions (e.g., visual features of "long hair" and "woman"). In this paper, we propose a dependent multi-task learning framework with the causal intervention (DMTCI). Firstly, we involve an intermediate task, bag-of-categories generation, before the final task, image captioning. The intermediate task would help the model better understand the visual features and thus alleviate the content inconsistency problem. Secondly, we apply Pearl's do-calculus on the model, cutting off the link between the visual features and possible confounders and thus letting models focus on the causal visual features. Specifically, the high-frequency concept set is considered as the proxy confounders where the real confounders are inferred in the continuous space. Finally, we use a multi-agent reinforcement learning (MARL) strategy to enable end-to-end training and reduce the inter-task error accumulations. The extensive experiments show that our model outperforms the baseline models and achieves competitive performance with state-of-the-art models.

AAAI Conference 2021 Conference Paper

Synchronous Interactive Decoding for Multilingual Neural Machine Translation

  • Hao He
  • Qian Wang
  • Zhipeng Yu
  • Yang Zhao
  • Jiajun Zhang
  • Chengqing Zong

To simultaneously translate a source language into multiple different target languages is one of the most common scenarios of multilingual translation. However, existing methods cannot make full use of translation model information during decoding, such as intra-lingual and inter-lingual future information, and therefore may suffer from issues like unbalanced outputs. In this paper, we present a new approach for synchronous interactive multilingual neural machine translation (SimNMT), which predicts each target language output simultaneously and interactively using historical and future information of all target languages. Specifically, we first propose a synchronous cross-interactive decoder in which the generation of each target output depends not only on its own generated sequence but also on its future information, as well as the history and future contexts of the other target languages. Then, we present a new interactive multilingual beam search algorithm that enables synchronous interactive decoding of all target languages in a single model. We take two target languages as an example to illustrate and evaluate the proposed SimNMT model on IWSLT datasets. The experimental results demonstrate that our method achieves significant improvements over several advanced NMT and M-NMT models.

AAAI Conference 2020 Conference Paper

Copy or Rewrite: Hybrid Summarization with Hierarchical Reinforcement Learning

  • Liqiang Xiao
  • Lu Wang
  • Hao He
  • Yaohui Jin

Jointly using the extractive and abstractive summarization methods can combine their complementary advantages, generating summaries that are both informative and concise. Existing methods that adopt an extract-then-abstract strategy have achieved impressive results, yet they suffer from information loss in the abstraction step because they compress all the selected sentences without distinction. Especially when a whole sentence is summary-worthy, salient content would be lost by compression. To address this problem, we propose HYSUM, a hybrid framework for summarization that can flexibly switch between copying and rewriting sentences according to the degree of redundancy. In this way, our approach can effectively combine the advantages of the two branches of summarization, juggling informativity and conciseness. Moreover, based on hierarchical reinforcement learning, we propose an end-to-end reinforcement method to bridge the extraction module and the rewriting module, which enhances the cooperation between them. Automatic evaluation shows that our approach significantly outperforms the state-of-the-art on the CNN/DailyMail corpus. Human evaluation also demonstrates that our generated summaries are more informative and concise than those of popular models.

AAAI Conference 2019 Conference Paper

Bidirectional Inference Networks: A Class of Deep Bayesian Networks for Health Profiling

  • Hao Wang
  • Chengzhi Mao
  • Hao He
  • Mingmin Zhao
  • Tommi S. Jaakkola
  • Dina Katabi

We consider the problem of inferring the values of an arbitrary set of variables (e.g., risk of diseases) given other observed variables (e.g., symptoms and diagnosed diseases) and high-dimensional signals (e.g., MRI images or EEG). This is a common problem in healthcare since variables of interest often differ for different patients. Existing methods, including Bayesian networks and structured prediction, either do not incorporate high-dimensional signals or fail to model conditional dependencies among variables. To address these issues, we propose bidirectional inference networks (BIN), which stitch together multiple probabilistic neural networks, each modeling a conditional dependency. Predictions are then made by iteratively updating variables using backpropagation (BP) to maximize the corresponding posterior probability. Furthermore, we extend BIN to composite BIN (CBIN), which incorporates the iterative prediction process into the training stage and improves both accuracy and computational efficiency by adaptively smoothing the optimization landscape. Experiments on synthetic and real-world datasets (a sleep study and a dermatology dataset) show that CBIN is a single model that can achieve state-of-the-art performance and obtain better accuracy on most inference tasks than multiple models each specifically trained for a different task.

IJCAI Conference 2019 Conference Paper

TransMS: Knowledge Graph Embedding for Complex Relations by Multidirectional Semantics

  • Shihui Yang
  • Jidong Tian
  • Honglun Zhang
  • Junchi Yan
  • Hao He
  • Yaohui Jin

Knowledge graph embedding, which projects symbolic relations and entities onto low-dimensional continuous spaces, is essential to knowledge graph completion. Recently, translation-based embedding models (e.g., TransE) have attracted increasing attention for their simplicity and effectiveness. These models attempt to translate semantics from head entities to tail entities through the relations and to infer richer facts outside the knowledge graph. In this paper, we propose a novel knowledge graph embedding method named TransMS, which translates and transmits multidirectional semantics: i) the semantics of head/tail entities and relations to tail/head entities with nonlinear functions, and ii) the semantics from entities to relations with linear bias vectors. Our model has only one parameter α per triplet more than TransE, which results in better scalability on large-scale knowledge graphs. Experiments show that TransMS achieves substantial improvements over state-of-the-art baselines; in particular, the Hit@10 of head entity prediction for N-1 relations and of tail entity prediction for 1-N relations improves by about 27.1% and 24.8% respectively on the FB15K dataset.
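A rough sketch of the scoring contrast: TransE's additive score is standard, while the TransMS-style score below follows the abstract's description (nonlinear translation of head/tail semantics plus a per-triplet bias α). Its exact functional form here is reconstructed for illustration and should be treated as an assumption, not the paper's definitive equation:

```python
import numpy as np

# Translation-based triplet scoring (lower = more plausible triplet).
# transe_score is the standard TransE objective; transms_score is an
# assumed instantiation of "multidirectional semantics" with nonlinear
# head/tail translation and a scalar bias alpha.
def transe_score(h, r, t):
    return np.linalg.norm(h + r - t, ord=1)

def transms_score(h, r, t, alpha=0.1):
    return np.linalg.norm(
        -np.tanh(t * r) * h + r + alpha * (h * t) - np.tanh(h * r) * t, ord=1
    )

rng = np.random.default_rng(1)
h, r = rng.standard_normal(8), rng.standard_normal(8)
t_true = h + r                      # a triplet that TransE models exactly
print(transe_score(h, r, t_true))   # prints 0.0 for an exact translation
```

The single extra scalar α per triplet is what keeps the parameter count close to TransE's, matching the scalability claim.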

UAI Conference 2019 Conference Paper

Truly Proximal Policy Optimization

  • Yuhui Wang 0004
  • Hao He
  • Xiaoyang Tan

Proximal policy optimization (PPO) is one of the most successful deep reinforcement learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from fully understood. In this paper, we show that PPO neither strictly restricts the probability ratio as it attempts to do nor enforces a well-defined trust region constraint, which means that it may still suffer from the risk of performance instability. To address this issue, we present an enhanced PPO method, named Trust Region-based PPO with Rollback (TR-PPO-RB). Two critical improvements are made in our method: 1) it adopts a new clipping function that supports a rollback behavior to restrict the ratio between the new policy and the old one; 2) the triggering condition for clipping is replaced with a trust region-based one, which is theoretically justified by the trust region theorem. By adhering more truly to the “proximal” property, restricting the policy within the trust region, the new algorithm improves the original PPO in both stability and sample efficiency.
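The rollback idea can be sketched as a piecewise per-sample objective: inside the clipping range it matches PPO, but beyond the bound its slope reverses so the gradient actively pushes the probability ratio back. The coefficient `alpha` and the exact piecewise form here are illustrative, reconstructed from the description rather than taken from the paper's code:

```python
# Schematic per-sample objectives, maximized in r = pi_new / pi_old.
# PPO's clipped objective goes FLAT outside the range (zero gradient);
# the rollback variant slopes DOWNWARD there, pulling r back inside.
def ppo_clip_objective(r, adv, eps=0.2):
    clipped = max(min(r, 1 + eps), 1 - eps)      # clip(r, 1-eps, 1+eps)
    return min(r * adv, clipped * adv)

def rollback_objective(r, adv, eps=0.2, alpha=0.3):
    if adv >= 0 and r > 1 + eps:
        # Slope reverses beyond the upper bound (continuous at r = 1+eps).
        return -alpha * r * adv + (1 + alpha) * (1 + eps) * adv
    if adv < 0 and r < 1 - eps:
        # Mirror case below the lower bound for negative advantage.
        return -alpha * r * adv + (1 + alpha) * (1 - eps) * adv
    return r * adv

# For a positive advantage, PPO's objective is flat past 1+eps, while the
# rollback objective decreases, penalizing ratios that stray further out.
print(ppo_clip_objective(1.5, 1.0), rollback_objective(1.5, 1.0))
```

At the clip boundary the two objectives agree; the difference is only in the gradient signal outside the range, which is exactly where PPO's restriction fails to bind.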

NeurIPS Conference 2019 Conference Paper

Trust Region-Guided Proximal Policy Optimization

  • Yuhui Wang
  • Hao He
  • Xiaoyang Tan
  • Yaozhong Gan

Proximal policy optimization (PPO) is one of the most popular deep reinforcement learning (RL) methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, as a model-free RL method, the success of PPO relies heavily on the effectiveness of its exploratory policy search. In this paper, we give an in-depth analysis of the exploration behavior of PPO and show that PPO is prone to insufficient exploration, especially under bad initialization, which may lead to training failure or entrapment in bad local optima. To address these issues, we propose a novel policy optimization method, named Trust Region-Guided PPO (TRGPPO), which adaptively adjusts the clipping range within the trust region. We formally show that this method not only improves the exploration ability within the trust region but also enjoys a better performance bound than the original PPO. Extensive experiments verify the advantage of the proposed method.

NeurIPS Conference 2017 Conference Paper

From Bayesian Sparsity to Gated Recurrent Nets

  • Hao He
  • Bo Xin
  • Satoshi Ikehata
  • David Wipf

The iterations of many first-order algorithms, when applied to minimizing common regularized regression functions, often resemble neural network layers with pre-specified weights. This observation has prompted the development of learning-based approaches that purport to replace these iterations with enhanced surrogates forged as DNN models from available training data. For example, important NP-hard sparse estimation problems have recently benefitted from this genre of upgrade, with simple feedforward or recurrent networks ousting proximal gradient-based iterations. Analogously, this paper demonstrates that more powerful Bayesian algorithms for promoting sparsity, which rely on complex multi-loop majorization-minimization techniques, mirror the structure of more sophisticated long short-term memory (LSTM) networks, or alternative gated feedback networks previously designed for sequence prediction. As part of this development, we examine the parallels between latent variable trajectories operating across multiple time-scales during optimization, and the activations within deep network structures designed to adaptively model such characteristic sequences. The resulting insights lead to a novel sparse estimation system that, when granted training data, can estimate optimal solutions efficiently in regimes where other algorithms fail, including practical direction-of-arrival (DOA) and 3D geometry recovery problems. The underlying principles we expose are also suggestive of a learning process for a richer class of multi-loop algorithms in other domains.

YNIMG Journal 2015 Journal Article

A group ICA based framework for evaluating resting fMRI markers when disease categories are unclear: application to schizophrenia, bipolar, and schizoaffective disorders

  • Yuhui Du
  • Godfrey D. Pearlson
  • Jingyu Liu
  • Jing Sui
  • Qingbao Yu
  • Hao He
  • Eduardo Castro
  • Vince D. Calhoun

Schizophrenia (SZ), bipolar disorder (BP) and schizoaffective disorder (SAD) share some common symptoms, and there is still a debate about whether SAD is an independent category. To the best of our knowledge, no study has been done to differentiate these three disorders or to investigate the distinction of SAD as an independent category using fMRI data. This study aims to explore biomarkers from resting-state fMRI networks for differentiating these disorders and to investigate the relationship among them based on fMRI networks, with an emphasis on SAD. Firstly, a novel group ICA method, group information guided independent component analysis (GIG-ICA), was applied to extract subject-specific brain networks from fMRI data of 20 healthy controls (HC), 20 SZ patients, 20 BP patients, 20 patients suffering from SAD with manic episodes (SADM), and 13 patients suffering from SAD with depressive episodes exclusively (SADD). Then, five-level one-way analysis of covariance and multiclass support vector machine recursive feature elimination were employed to identify discriminative regions from the networks. Subsequently, t-distributed stochastic neighbor embedding (t-SNE) projection and hierarchical clustering were implemented to investigate the relationship among those groups. Finally, to evaluate the generalization ability, 16 new subjects were classified based on the identified regions and the model trained on the original 93 subjects. Results show that the discriminative regions mainly included frontal, parietal, precuneus, cingulate, supplementary motor, cerebellar, insula and supramarginal cortices, which performed well in distinguishing different groups. SADM and SADD were the most similar to each other, although SADD had greater similarity to SZ compared to other groups, which indicates that SAD may be an independent category. BP was closer to HC compared with other psychotic disorders.
In summary, resting-state fMRI brain networks extracted via GIG-ICA provide a promising potential to differentiate SZ, BP, and SAD.

YNIMG Journal 2015 Journal Article

Assessing dynamic brain graphs of time-varying connectivity in fMRI data: Application to healthy controls and patients with schizophrenia

  • Qingbao Yu
  • Erik B. Erhardt
  • Jing Sui
  • Yuhui Du
  • Hao He
  • Devon Hjelm
  • Mustafa S. Cetin
  • Srinivas Rachakonda

Graph theory-based analysis has been widely employed in brain imaging studies, and altered topological properties of brain connectivity have emerged as important features of mental diseases such as schizophrenia. However, most previous studies have focused on graph metrics of stationary brain graphs, ignoring that brain connectivity exhibits fluctuations over time. Here we develop a new framework for assessing dynamic graph properties of time-varying functional brain connectivity in resting-state fMRI data and apply it to healthy controls (HCs) and patients with schizophrenia (SZs). Specifically, nodes of brain graphs are defined by intrinsic connectivity networks (ICNs) identified by group independent component analysis (ICA). Dynamic graph metrics of the time-varying brain connectivity, estimated by the correlation of sliding time-windowed ICA time courses of ICNs, are calculated. First- and second-level connectivity states are detected based on the correlation of nodal connectivity strength between time-varying brain graphs. Our results indicate that SZs show decreased variance in the dynamic graph metrics. Consistent with prior stationary functional brain connectivity work, graph measures of the identified first-level connectivity states show lower values in SZs. In addition, more first-level connectivity states are dissociated from the second-level connectivity state, which resembles the stationary connectivity pattern computed over the entire scan. Collectively, the findings provide new evidence about altered dynamic brain graphs in schizophrenia, which may underscore the abnormal brain performance in this mental illness.
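The sliding-window step of the framework, estimating one connectivity graph per time window from ICN time courses, reduces to windowed correlation. A minimal sketch with assumed window length, stride, and synthetic data:

```python
import numpy as np

# Minimal sketch of sliding-window connectivity: correlate network time
# courses within each window to get one graph per window. Window length,
# stride, and the synthetic data are illustrative assumptions.
def dynamic_connectivity(timecourses, win=30, stride=5):
    """timecourses: (T, N) array of N network time courses over T volumes.
    Returns a (n_windows, N, N) stack of correlation matrices."""
    T, N = timecourses.shape
    mats = []
    for start in range(0, T - win + 1, stride):
        window = timecourses[start:start + win]      # (win, N) slice
        mats.append(np.corrcoef(window.T))           # (N, N) connectivity graph
    return np.stack(mats)

rng = np.random.default_rng(2)
tc = rng.standard_normal((150, 10))   # 150 volumes, 10 ICN time courses
graphs = dynamic_connectivity(tc)
print(graphs.shape)                   # prints (25, 10, 10): 25 windowed graphs
```

Graph metrics can then be computed per window, and the variance of each metric across windows is the kind of dynamic quantity the abstract reports as reduced in SZs.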

YNIMG Journal 2013 Journal Article

Three-way (N-way) fusion of brain imaging data based on mCCA+jICA and its application to discriminating schizophrenia

  • Jing Sui
  • Hao He
  • Godfrey D. Pearlson
  • Tülay Adali
  • Kent A. Kiehl
  • Qingbao Yu
  • Vince P. Clark
  • Eduardo Castro

Multimodal fusion is an effective approach to better understand brain diseases. However, most such instances have been limited to pair-wise fusion; because there are often more than two imaging modalities available per subject, there is a need for approaches that can combine multiple datasets optimally. In this paper, we extended our previous two-way fusion model, “multimodal CCA+joint ICA”, to three-way or N-way fusion, which enables robust identification of correspondence among N data types and allows one to investigate the important question of whether certain disease risk factors are shared or distinct across multiple modalities. We compared “mCCA+jICA” with its alternatives in a 3-way fusion simulation and verified its advantages in both decomposition accuracy and modal linkage detection. We also applied it to real functional Magnetic Resonance Imaging (fMRI)–Diffusion Tensor Imaging (DTI) and structural MRI fusion to elucidate the abnormal architecture underlying schizophrenia (n=97) relative to healthy controls (n=116). Both modality-common and modality-unique abnormal regions were identified in schizophrenia. Specifically, the visual cortex in fMRI, the anterior thalamic radiation (ATR) and forceps minor in DTI, and the parietal lobule, cuneus and thalamus in sMRI were linked and discriminated between patients and controls. One fMRI component with regions of activity in motor cortex and superior temporal gyrus individually discriminated schizophrenia from controls. Finally, three components showed significant correlation with duration of illness (DOI), suggesting that lower gray matter volumes in parietal, frontal, and temporal lobes and cerebellum are associated with increased DOI, along with white matter disruption in ATR and cortico-spinal tracts.
Findings suggest that the identified fractional anisotropy changes may relate to the corresponding functional/structural changes in the brain that are thought to play a role in the clinical expression of schizophrenia. The proposed “mCCA+jICA” method showed promise for elucidating the joint or coupled neuronal abnormalities underlying mental illnesses and improves our understanding of the disease process.