Arrow Research search

Author name cluster

Sheng Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

26 papers
2 author rows

Possible papers

26

AAAI Conference 2026 Conference Paper

Let’s Think with Images Efficiently! An Interleaved-Modal Chain-of-Thought Reasoning Framework with Dynamic and Precise Visual Thoughts

  • Xu Liu
  • Yongheng Zhang
  • Qiguang Chen
  • Yao Li
  • Sheng Wang
  • Libo Qin

Recently, Interleaved-modal Chain-of-Thought (ICoT) reasoning has achieved remarkable success by leveraging both multimodal inputs and outputs, attracting increasing attention. While achieving promising performance, current ICoT methods still suffer from two major limitations: (1) Static Visual Thought Positioning, which statically inserts visual information at fixed steps, resulting in inefficient and inflexible reasoning; and (2) Broken Visual Thought Representation, which involves discontinuous and semantically incoherent visual tokens. To address these limitations, we introduce Interleaved-modal Chain-of-Thought reasoning with Dynamic and Precise Visual Thoughts (DaP-ICoT), which incorporates two key components: (1) Dynamic Visual Thought Integration adaptively introduces visual inputs based on reasoning needs, reducing redundancy and improving efficiency. (2) Precise Visual Thought Guidance ensures semantically coherent and contextually aligned visual representations. Experiments across multiple benchmarks and models demonstrate that DaP-ICoT achieves state-of-the-art performance. In addition, DaP-ICoT significantly reduces the number of inserted images, leading to a 72.6% decrease in token consumption, enabling more efficient ICoT reasoning.
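
A minimal sketch of the dynamic-insertion idea described above, assuming a hypothetical per-step confidence signal decides when to interleave an image crop into the reasoning chain; all helper names are illustrative and not taken from the paper.

    # Illustrative sketch only: dynamic insertion of visual thoughts into a
    # chain of thought, gated by a (hypothetical) per-step confidence signal.
    from dataclasses import dataclass
    from typing import List, Tuple, Union

    @dataclass
    class ImageThought:
        region: Tuple[int, int, int, int]  # (x0, y0, x1, y1) crop assumed relevant to the step

    def build_interleaved_cot(steps: List[str],
                              step_confidences: List[float],
                              relevant_regions: List[Tuple[int, int, int, int]],
                              threshold: float = 0.6) -> List[Union[str, ImageThought]]:
        """Insert a visual thought only when the model is uncertain about a step."""
        chain: List[Union[str, ImageThought]] = []
        for step, conf, region in zip(steps, step_confidences, relevant_regions):
            if conf < threshold:          # dynamic positioning: insert only when needed
                chain.append(ImageThought(region))
            chain.append(step)
        return chain

    chain = build_interleaved_cot(
        ["Count the red blocks.", "Compare with the blue blocks."],
        [0.4, 0.9],
        [(10, 10, 120, 120), (0, 0, 1, 1)],
    )
    print(chain)  # an image thought precedes only the low-confidence step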

ICLR Conference 2025 Conference Paper

Forewarned is Forearmed: Harnessing LLMs for Data Synthesis via Failure-induced Exploration

  • Qintong Li
  • Jiahui Gao
  • Sheng Wang
  • Renjie Pi
  • Xueliang Zhao
  • Chuan Wu
  • Xin Jiang
  • Zhenguo Li

Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data, leading to impressive performance across a range of downstream applications. Current methods often rely on human-annotated data or predefined task templates to direct powerful LLMs in synthesizing task-relevant data for effective model training. However, this dependence on manually designed components may constrain the scope of generated data, potentially overlooking critical edge cases or novel scenarios that could challenge the model. In this paper, we present a novel approach, ReverseGen, designed to automatically generate effective training samples that expose the weaknesses of LLMs. Specifically, we introduce a dedicated proposer trained to produce queries that lead target models to generate unsatisfactory responses. These failure-inducing queries are then used to construct training data, helping to address the models' shortcomings and improve overall performance. Our approach is flexible and can be applied to models of various scales (3B, 7B, and 8B). We evaluate ReverseGen on three key applications—safety, honesty, and math—demonstrating that our generated data is both highly effective and diverse. Models fine-tuned with ReverseGen-generated data consistently outperform those trained on human-annotated or general model-generated data, offering a new perspective on data synthesis for task-specific LLM enhancement.
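
A toy sketch of the failure-induced exploration loop described in the abstract, with stub functions standing in for the proposer, target model, and evaluator; the function names and scoring rule are illustrative assumptions, not the authors' implementation.

    # Toy sketch: collect queries that make a target model fail, then reuse
    # them as training data. Stubs stand in for real LLM calls.
    import random

    def proposer_generate(n: int) -> list:
        # placeholder proposer: in practice a trained LLM proposes queries
        return [f"tricky query #{random.randint(0, 999)}" for _ in range(n)]

    def target_model_answer(query: str) -> str:
        return f"answer to {query}"          # placeholder target model

    def evaluator_score(query: str, answer: str) -> float:
        return random.random()               # placeholder quality score in [0, 1]

    def mine_failure_cases(n_queries: int = 100, fail_threshold: float = 0.3):
        """Keep only queries whose answers score below the failure threshold."""
        failures = []
        for query in proposer_generate(n_queries):
            answer = target_model_answer(query)
            if evaluator_score(query, answer) < fail_threshold:
                failures.append({"query": query, "bad_answer": answer})
        return failures   # later paired with reference answers for fine-tuning

    print(len(mine_failure_cases()))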

ICLR Conference 2025 Conference Paper

Hotspot-Driven Peptide Design via Multi-Fragment Autoregressive Extension

  • Jiahan Li
  • Tong Chen
  • Shitong Luo
  • Chaoran Cheng
  • Jiaqi Guan
  • Ruihan Guo
  • Sheng Wang
  • Ge Liu

Peptides, short chains of amino acids, interact with target proteins, making them a unique class of protein-based therapeutics for treating human diseases. Recently, deep generative models have shown great promise in peptide generation. However, several challenges remain in designing effective peptide binders. First, not all residues contribute equally to peptide-target interactions. Second, the generated peptides must adopt valid geometries due to the constraints of peptide bonds. Third, realistic tasks for peptide drug development are still lacking. To address these challenges, we introduce PepHAR, a hot-spot-driven autoregressive generative model for designing peptides targeting specific proteins. Building on the observation that certain hot spot residues have higher interaction potentials, we first use an energy-based density model to fit and sample these key residues. Next, to ensure proper peptide geometry, we autoregressively extend peptide fragments by estimating dihedral angles between residue frames. Finally, we apply an optimization process to iteratively refine fragment assembly, ensuring correct peptide structures. By combining hot spot sampling with fragment-based extension, our approach enables de novo peptide design tailored to a target protein and allows the incorporation of key hot spot residues into peptide scaffolds. Extensive experiments, including peptide design and peptide scaffold generation, demonstrate the strong potential of PepHAR in computational peptide binder design. The source code will be available at https://github.com/Ced3-han/PepHAR.

ICLR Conference 2025 Conference Paper

MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

  • Peng Xia 0005
  • Kangyu Zhu
  • Haoran Li 0011
  • Tianze Wang
  • Weijia Shi
  • Sheng Wang
  • Linjun Zhang
  • James Y. Zou

Artificial Intelligence (AI) has demonstrated significant potential in healthcare, particularly in disease diagnosis and treatment planning. Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. However, these models often suffer from factual hallucination, which can lead to incorrect diagnoses. Fine-tuning and retrieval-augmented generation (RAG) have emerged as methods to address these issues. However, the amount of high-quality data and distribution shifts between training data and deployment data limit the application of fine-tuning methods. Although RAG is lightweight and effective, existing RAG-based approaches are not sufficiently general to different medical domains and can potentially cause misalignment issues, both between modalities and between the model and the ground truth. In this paper, we propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs. Our approach introduces a domain-aware retrieval mechanism, an adaptive retrieved-context selection method, and a provable RAG-based preference fine-tuning strategy. These innovations make the RAG process sufficiently general and reliable, significantly improving alignment when introducing retrieved contexts. Experimental results across five medical datasets (involving radiology, ophthalmology, and pathology) on medical VQA and report generation demonstrate that MMed-RAG achieves an average improvement of 43.8% in the factual accuracy of Med-LVLMs.
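
A minimal sketch of what a domain-aware retrieval step could look like: classify the input image into a medical domain, then retrieve from that domain's corpus and keep only contexts above an adaptive similarity cutoff. The domain classifier, corpora, and cutoff rule are illustrative placeholders, not the MMed-RAG code.

    # Illustrative domain-aware retrieval sketch with cosine similarity.
    import numpy as np

    CORPORA = {                      # placeholder per-domain document embeddings
        "radiology":     np.random.rand(50, 128),
        "ophthalmology": np.random.rand(50, 128),
        "pathology":     np.random.rand(50, 128),
    }

    def classify_domain(image_embedding: np.ndarray) -> str:
        return "radiology"           # placeholder domain classifier

    def retrieve(image_embedding: np.ndarray, k: int = 8, keep_ratio: float = 0.5):
        """Retrieve top-k contexts from the predicted domain, then adaptively
        drop the weakest matches relative to the best score."""
        docs = CORPORA[classify_domain(image_embedding)]
        sims = docs @ image_embedding / (
            np.linalg.norm(docs, axis=1) * np.linalg.norm(image_embedding) + 1e-8)
        top = np.argsort(-sims)[:k]
        cutoff = keep_ratio * sims[top[0]]           # adaptive context selection
        return [int(i) for i in top if sims[i] >= cutoff]

    print(retrieve(np.random.rand(128)))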

ICML Conference 2025 Conference Paper

MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization

  • Kangyu Zhu
  • Peng Xia 0005
  • Yun Li 0010
  • Hongtu Zhu
  • Sheng Wang
  • Huaxiu Yao

The advancement of Large Vision-Language Models (LVLMs) has propelled their application in the medical field. However, Medical LVLMs (Med-LVLMs) encounter factuality challenges due to modality misalignment, where the models prioritize textual knowledge over visual input, leading to hallucinations that contradict information in medical images. Previous attempts to enhance modality alignment in Med-LVLMs through preference optimization have inadequately addressed clinical relevance in preference data, making these samples easily distinguishable and reducing alignment effectiveness. In response, we propose MMedPO, a novel multimodal medical preference optimization approach that considers the clinical relevance of preference samples to enhance Med-LVLM alignment. MMedPO curates multimodal preference data by introducing two types of dispreference: (1) plausible hallucinations injected through target Med-LVLMs or GPT-4o to produce medically inaccurate responses, and (2) lesion region neglect achieved through local lesion-noising, disrupting visual understanding of critical areas. We then calculate clinical relevance for each sample based on scores from multiple Med-LLMs and visual tools, enabling effective alignment. Our experiments demonstrate that MMedPO significantly enhances factual accuracy in Med-LVLMs, achieving substantial improvements over existing preference optimization methods by 14.2% and 51.7% on the Med-VQA and report generation tasks, respectively. Our code is available at https://github.com/aiming-lab/MMedPO.
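
A small sketch of the "lesion region neglect" idea: corrupting only the lesion region of an image to create a dispreferred input for preference optimization. The noise model and region format are assumptions for illustration, not the paper's exact procedure.

    # Illustrative lesion-noising sketch: corrupt only a given lesion box so the
    # resulting response can serve as a dispreferred sample.
    import numpy as np

    def noise_lesion_region(image: np.ndarray, box: tuple, sigma: float = 0.5,
                            seed: int = 0) -> np.ndarray:
        """Return a copy of `image` with Gaussian noise added inside `box`
        (x0, y0, x1, y1); the rest of the image is left untouched."""
        rng = np.random.default_rng(seed)
        noisy = image.copy()
        x0, y0, x1, y1 = box
        region = noisy[y0:y1, x0:x1]
        noisy[y0:y1, x0:x1] = np.clip(region + rng.normal(0, sigma, region.shape), 0, 1)
        return noisy

    img = np.random.rand(224, 224)
    corrupted = noise_lesion_region(img, (60, 60, 120, 120))
    print(np.abs(corrupted - img).sum() > 0)   # only the lesion box changed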

ICLR Conference 2025 Conference Paper

MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards

  • Sheng Wang
  • Liheng Chen
  • Pengan Chen
  • Jingwei Dong
  • Boyang Xue
  • Jiyue Jiang
  • Lingpeng Kong
  • Chuan Wu

The rapid scaling of large language models necessitates more lightweight finetuning methods to reduce the explosive GPU memory overhead when numerous customized models are served simultaneously. Targeting more parameter-efficient low-rank adaptation (LoRA), parameter sharing presents a promising solution. Empirically, our research into high-level sharing principles highlights the indispensable role of differentiation in reversing the detrimental effects of pure sharing. Guided by this finding, we propose Mixture of Shards (MoS), incorporating both inter-layer and intra-layer sharing schemes, and integrating four nearly cost-free differentiation strategies, namely subset selection, pair dissociation, vector sharding, and shard privatization. Briefly, it selects a designated number of shards from global pools with a Mixture-of-Experts (MoE)-like routing mechanism before sequentially concatenating them to low-rank matrices. Hence, it retains all the advantages of LoRA while offering enhanced parameter efficiency, and effectively circumvents the drawbacks of peer parameter-sharing methods. Our empirical experiments demonstrate approximately 8× parameter savings in a standard LoRA setting. The ablation study confirms the significance of each component. Our insights into parameter sharing and the MoS method may illuminate future developments of more parameter-efficient finetuning methods. The code is officially available at https://github.com/Forence1999/MoS.
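
A rough sketch of the shard idea in plain PyTorch, assuming a single global pool of rank-1 shards from which each adapter draws a subset and concatenates it into its low-rank factors. Shapes, routing, and the four differentiation strategies are simplified; this is not the released MoS code (see the linked repository for that).

    # Simplified sketch: build LoRA factors by concatenating shards drawn from
    # shared global pools. Not the official implementation.
    import torch
    import torch.nn as nn

    class ShardedLoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, pool_a: nn.Parameter,
                     pool_b: nn.Parameter, shard_ids, alpha: float = 16.0):
            super().__init__()
            self.base = base                     # frozen pretrained layer
            self.pool_a = pool_a                 # (num_shards, in_features), shared globally
            self.pool_b = pool_b                 # (num_shards, out_features), shared globally
            self.register_buffer("ids", torch.tensor(shard_ids))  # this layer's shard subset
            self.scale = alpha / len(shard_ids)

        def forward(self, x):
            a = self.pool_a[self.ids]            # (r, in)  -> concatenated shards
            b = self.pool_b[self.ids]            # (r, out)
            return self.base(x) + self.scale * (x @ a.t()) @ b

    in_f, out_f, num_shards = 64, 64, 32
    pool_a = nn.Parameter(torch.randn(num_shards, in_f) * 0.01)
    pool_b = nn.Parameter(torch.zeros(num_shards, out_f))
    layer = ShardedLoRALinear(nn.Linear(in_f, out_f), pool_a, pool_b, [0, 5, 9, 17])
    print(layer(torch.randn(2, in_f)).shape)     # torch.Size([2, 64])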

AAAI Conference 2025 Conference Paper

MUC: Mixture of Uncalibrated Cameras for Robust 3D Human Body Reconstruction

  • Yitao Zhu
  • Sheng Wang
  • Mengjie Xu
  • Zixu Zhuang
  • Zhixin Wang
  • Kaidong Wang
  • Han Zhang
  • Qian Wang

Multiple cameras can provide comprehensive multi-view video coverage of a person. Fusing this multi-view data is crucial for tasks like behavioral analysis, although it traditionally requires camera calibration—a process that is often complex. Moreover, previous studies have overlooked the challenges posed by self-occlusion under multiple views and the continuity of human body shape estimation. In this study, we introduce a method to reconstruct the 3D human body from multiple uncalibrated camera views. Initially, we utilize a pre-trained human body encoder to process each camera view individually, enabling the reconstruction of human body models and parameters for each view along with predicted camera positions. Rather than merely averaging the models across views, we develop a neural network trained to assign weights to individual views for all human body joints, based on the estimated distribution of joint distances from each camera. Additionally, we focus on the mesh surface of the human body for dynamic fusion, allowing for the seamless integration of facial expressions and body shape into a unified human body model. Our method has shown excellent performance in reconstructing the human body on two public datasets, advancing beyond previous work from the SMPL model to the SMPL-X model. This extension incorporates more complex hand poses and facial expressions, enhancing the detail and accuracy of the reconstructions. Crucially, it supports the flexible ad-hoc deployment of any number of cameras, offering significant potential for various applications.

NeurIPS Conference 2025 Conference Paper

Seg2Any: Open-set Segmentation-Mask-to-Image Generation with Precise Shape and Semantic Control

  • Danfeng Li
  • Hui Zhang
  • Sheng Wang
  • Jiacheng Li
  • Zuxuan Wu

Despite recent advances in diffusion models, top-tier text-to-image (T2I) models still struggle to achieve precise spatial layout control, i.e., accurately generating entities with specified attributes and locations. Segmentation-mask-to-image (S2I) generation has emerged as a promising solution by incorporating pixel-level spatial guidance and regional text prompts. However, existing S2I methods fail to simultaneously ensure semantic consistency and shape consistency. To address these challenges, we propose Seg2Any, a novel S2I framework built upon advanced multimodal diffusion transformers (e.g., FLUX). First, to achieve both semantic and shape consistency, we decouple segmentation mask conditions into regional semantic and high-frequency shape components. The regional semantic condition is introduced by a Semantic Alignment Attention Mask, ensuring that generated entities adhere to their assigned text prompts. The high-frequency shape condition, representing entity boundaries, is encoded as an Entity Contour Map and then introduced as an additional modality via multi-modal attention to guide image spatial structure. Second, to prevent attribute leakage across entities in multi-entity scenarios, we introduce an Attribute Isolation Attention Mask mechanism, which constrains each entity’s image tokens to attend exclusively to themselves during image self-attention. To support open-set S2I generation, we construct SACap-1M, a large-scale dataset containing 1 million images with 5.9 million segmented entities and detailed regional captions, along with a SACap-Eval benchmark for comprehensive S2I evaluation. Extensive experiments demonstrate that Seg2Any achieves state-of-the-art performance on both open-set and closed-set S2I benchmarks, particularly in fine-grained spatial and attribute control of entities.
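
A small sketch of how an entity contour map could be derived from a segmentation mask (boundary pixels of each entity id), which is the kind of high-frequency shape signal the abstract describes; the extraction rule here is a generic morphological one and not necessarily the paper's.

    # Generic sketch: mark pixels whose 4-neighbourhood contains a different
    # entity id, giving a binary contour map from an integer mask.
    import numpy as np

    def entity_contour_map(mask: np.ndarray) -> np.ndarray:
        """mask: (H, W) integer entity ids; returns (H, W) float32 contour map."""
        padded = np.pad(mask, 1, mode="edge")
        center = padded[1:-1, 1:-1]
        boundary = np.zeros_like(mask, dtype=bool)
        for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            neighbour = padded[1 + dy:padded.shape[0] - 1 + dy,
                               1 + dx:padded.shape[1] - 1 + dx]
            boundary |= neighbour != center
        return boundary.astype(np.float32)

    demo = np.zeros((8, 8), dtype=int)
    demo[2:6, 2:6] = 1                       # one square entity
    print(entity_contour_map(demo).sum())    # boundary pixels on both sides of the edge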

NeurIPS Conference 2025 Conference Paper

TreeSynth: Synthesizing Diverse Data from Scratch via Tree-Guided Subspace Partitioning

  • Sheng Wang
  • Pengan Chen
  • Jingqi Zhou
  • Qintong Li
  • Jingwei Dong
  • Jiahui Gao
  • Boyang Xue
  • Jiyue Jiang

Model customization necessitates high-quality and diverse datasets, but acquiring such data remains time-consuming and labor-intensive. Despite the great potential of large language models (LLMs) for data synthesis, current approaches are constrained by limited seed data, model biases and low-variation prompts, resulting in limited diversity and biased distribution with the increase of data scales. To tackle this challenge, we introduce TreeSynth, a tree-guided subspace-based data synthesis approach inspired by decision trees. It constructs a spatial partitioning tree to recursively divide a task-specific full data space (i.e., root node) into numerous atomic subspaces (i.e., leaf nodes) with mutually exclusive and exhaustive attributes to ensure both distinctiveness and comprehensiveness, before synthesizing samples within each atomic subspace. This globally divide-and-synthesize method finally collects subspace samples into a comprehensive dataset, effectively circumventing repetition and space collapse to ensure the diversity of large-scale data synthesis. Furthermore, the spatial partitioning tree enables sample allocation into atomic subspaces, allowing the re-balancing of existing datasets for more balanced and comprehensive distributions. Empirically, extensive experiments across diverse benchmarks consistently validate the superior data diversity, model performance, and robust scalability of TreeSynth compared to both human-crafted datasets and peer data synthesis methods, with the average performance gain reaching 10%. In addition, the consistent improvements from TreeSynth-balanced datasets highlight its effectiveness in redistributing existing datasets for more comprehensive coverage and the resulting performance gains. The code is available at https://github.com/cpa2001/TreeSynth.
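
A toy sketch of the subspace-partitioning idea: enumerate leaf subspaces as mutually exclusive attribute assignments, then synthesize samples per leaf with a stub generator. The attribute dimensions and the generator are placeholders, not the released TreeSynth pipeline.

    # Toy sketch: enumerate leaf subspaces as attribute assignments, then ask a
    # (stub) generator for samples inside each leaf.
    from itertools import product

    ATTRIBUTES = {                 # placeholder attribute dimensions for a task space
        "topic": ["arithmetic", "geometry", "algebra"],
        "difficulty": ["easy", "hard"],
        "format": ["word problem", "equation"],
    }

    def leaf_subspaces(attributes):
        names = list(attributes)
        for values in product(*(attributes[n] for n in names)):
            yield dict(zip(names, values))

    def synthesize(leaf, n_per_leaf=2):
        # placeholder for an LLM call constrained to this atomic subspace
        return [f"sample {i} | " + ", ".join(f"{k}={v}" for k, v in leaf.items())
                for i in range(n_per_leaf)]

    dataset = [s for leaf in leaf_subspaces(ATTRIBUTES) for s in synthesize(leaf)]
    print(len(dataset))            # 3 * 2 * 2 leaves, 2 samples each = 24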

AAAI Conference 2024 Conference Paper

Cross-Domain Contrastive Learning for Time Series Clustering

  • Furong Peng
  • Jiachen Luo
  • Xuan Lu
  • Sheng Wang
  • Feijiang Li

Most deep learning-based time series clustering models concentrate on data representation in a separate process from clustering. As a result, the clustering loss cannot guide feature extraction. Moreover, most methods solely analyze data from the temporal domain, disregarding the potential within the frequency domain. To address these challenges, we introduce a novel end-to-end Cross-Domain Contrastive learning model for time series Clustering (CDCC). Firstly, it integrates the clustering process and feature extraction using contrastive constraints at both cluster-level and instance-level. Secondly, the data is encoded simultaneously in both temporal and frequency domains, leveraging contrastive learning to enhance within-domain representation. Thirdly, cross-domain constraints are proposed to align the latent representations and category distribution across domains. With the above strategies, CDCC not only achieves end-to-end clustering but also effectively integrates temporal- and frequency-domain information. Extensive experiments and visualization analysis are conducted on 40 time series datasets from UCR, demonstrating the superior performance of the proposed model.
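
A small sketch of the dual-domain view construction: one view is the raw temporal series, the other is its frequency-domain magnitude spectrum; within-domain contrastive losses would then be applied to encoders of each view. Only the view construction is shown here, using numpy's FFT; the encoders and losses are omitted and the normalization choices are illustrative.

    # Sketch of building temporal and frequency views of a time series batch.
    import numpy as np

    def dual_domain_views(batch: np.ndarray):
        """batch: (N, T) real-valued series -> (temporal view, frequency view)."""
        temporal = (batch - batch.mean(axis=1, keepdims=True)) / (
            batch.std(axis=1, keepdims=True) + 1e-8)
        spectrum = np.abs(np.fft.rfft(temporal, axis=1))   # magnitude spectrum
        frequency = spectrum / (spectrum.max(axis=1, keepdims=True) + 1e-8)
        return temporal, frequency

    series = np.sin(np.linspace(0, 8 * np.pi, 128))[None, :] + 0.1
    t_view, f_view = dual_domain_views(series)
    print(t_view.shape, f_view.shape)   # (1, 128) (1, 65)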

AAAI Conference 2024 Conference Paper

Mining Gaze for Contrastive Learning toward Computer-Assisted Diagnosis

  • Zihao Zhao
  • Sheng Wang
  • Qian Wang
  • Dinggang Shen

Obtaining large-scale radiology reports for medical images can be difficult due to ethical concerns, limiting the effectiveness of contrastive pre-training in the medical image domain and underscoring the need for alternative methods. In this paper, we propose eye-tracking as an alternative to text reports, as it allows for the passive collection of gaze signals without ethical issues. By tracking the gaze of radiologists as they read and diagnose medical images, we can understand their visual attention and clinical reasoning. When a radiologist has similar gazes for two medical images, it may indicate semantic similarity for diagnosis, and these images should be treated as positive pairs when pre-training a computer-assisted diagnosis (CAD) network through contrastive learning. Accordingly, we introduce the Medical contrastive Gaze Image Pre-training (McGIP) as a plug-and-play module for contrastive learning frameworks. McGIP uses radiologist gaze to guide contrastive pre-training. We evaluate our method using two representative types of medical images and two common types of gaze data. The experimental results demonstrate the practicality of McGIP, indicating its high potential for various clinical scenarios and applications.
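
A minimal sketch of the pairing rule the abstract describes: treat two images as a positive pair for contrastive pre-training when their radiologist gaze heatmaps are sufficiently similar. The similarity measure and threshold are illustrative choices, not the paper's.

    # Sketch: gaze-similarity-based positive pairs for contrastive pre-training.
    import numpy as np

    def gaze_similarity(h1: np.ndarray, h2: np.ndarray) -> float:
        """Cosine similarity between two flattened gaze heatmaps."""
        a, b = h1.ravel(), h2.ravel()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def positive_pairs(heatmaps: list, threshold: float = 0.8):
        pairs = []
        for i in range(len(heatmaps)):
            for j in range(i + 1, len(heatmaps)):
                if gaze_similarity(heatmaps[i], heatmaps[j]) >= threshold:
                    pairs.append((i, j))      # images i and j share diagnostic focus
        return pairs

    maps = [np.random.rand(16, 16) for _ in range(4)]
    maps.append(maps[0] + 0.01)               # nearly identical gaze pattern
    print(positive_pairs(maps))               # contains (0, 4)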

JBHI Journal 2024 Journal Article

RCPS: Rectified Contrastive Pseudo Supervision for Semi-Supervised Medical Image Segmentation

  • Xiangyu Zhao
  • Zengxin Qi
  • Sheng Wang
  • Qian Wang
  • Xuehai Wu
  • Ying Mao
  • Lichi Zhang

Medical image segmentation methods are generally designed as fully-supervised to guarantee model performance, which requires a significant amount of expert annotated samples that are high-cost and laborious. Semi-supervised image segmentation can alleviate the problem by utilizing a large number of unlabeled images along with limited labeled images. However, learning a robust representation from numerous unlabeled images remains challenging due to potential noise in pseudo labels and insufficient class separability in feature space, which undermines the performance of current semi-supervised segmentation approaches. To address the issues above, we propose a novel semi-supervised segmentation method named Rectified Contrastive Pseudo Supervision (RCPS), which combines rectified pseudo supervision and voxel-level contrastive learning to improve the effectiveness of semi-supervised segmentation. Particularly, we design a novel rectification strategy for the pseudo supervision method based on uncertainty estimation and consistency regularization to reduce the noise influence in pseudo labels. Furthermore, we introduce a bidirectional voxel contrastive loss in the network to ensure intra-class consistency and inter-class contrast in feature space, which increases class separability in the segmentation. The proposed RCPS segmentation method has been validated on two public datasets and an in-house clinical dataset. Experimental results reveal that the proposed method yields better segmentation performance compared with the state-of-the-art methods in semi-supervised medical image segmentation.
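
A small sketch of one ingredient, rectifying pseudo labels by masking out high-uncertainty voxels, using predictive entropy as the uncertainty estimate; the actual RCPS rectification also uses consistency regularization, which is not shown, and the quantile cutoff is an assumption.

    # Sketch: keep pseudo-labelled voxels only where predictive entropy is low.
    import numpy as np

    def rectified_pseudo_labels(probs: np.ndarray, entropy_quantile: float = 0.75):
        """probs: (C, D, H, W) softmax output -> (pseudo labels, confidence mask)."""
        entropy = -(probs * np.log(probs + 1e-8)).sum(axis=0)
        mask = entropy <= np.quantile(entropy, entropy_quantile)
        return probs.argmax(axis=0), mask       # train only where mask is True

    logits = np.random.rand(3, 8, 16, 16)
    probs = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
    labels, mask = rectified_pseudo_labels(probs)
    print(labels.shape, mask.mean())            # roughly 75% of voxels kept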

YNIMG Journal 2024 Journal Article

Structural and functional alterations in MRI-negative drug-resistant epilepsy and associated gene expression features

  • Ting Liu
  • Sheng Wang
  • Yingjie Tang
  • Sisi Jiang
  • Huixia Lin
  • Fei Li
  • Dezhong Yao
  • Xian Zhu

Neuroimaging techniques have been widely used in the study of epilepsy. However, structural and functional changes in MRI-negative drug-resistant epilepsy (DRE) and the genetic mechanisms behind the structural alterations remain poorly understood. Using structural and functional MRI, we analyzed gray matter volume (GMV) and regional homogeneity (ReHo) in DRE, drug-sensitive epilepsy (DSE) and healthy controls. Gene expression data from the Allen Human Brain Atlas and GMV/ReHo were evaluated to obtain drug resistance-related and epilepsy-associated gene expression and compared with real transcriptional data in blood. We found structural and functional alterations in the cerebellum of DRE patients, which may be related to the mechanisms of drug resistance in DRE. Our study confirms that changes in brain morphology and regional activity in DRE patients may be associated with abnormal gene expression related to nervous system development. SP1, an important transcription factor, plays a key role in the mechanism of drug resistance.

AAAI Conference 2023 Conference Paper

GraphPrompt: Graph-Based Prompt Templates for Biomedical Synonym Prediction

  • Hanwen Xu
  • Jiayou Zhang
  • Zhirui Wang
  • Shizhuo Zhang
  • Megh Bhalerao
  • Yucong Liu
  • Dawei Zhu
  • Sheng Wang

As biomedical datasets expand, the same category may be labeled with different terms, making it tedious and onerous to curate these terms. Therefore, automatically mapping synonymous terms onto the ontologies is desirable, a task we name biomedical synonym prediction. Unlike biomedical concept normalization (BCN), no clues from context can be used to enhance synonym prediction, making it essential to extract graph features from the ontology. We introduce an expert-curated dataset OBO-syn encompassing 70 different types of concepts and 2 million curated concept-term pairs for evaluating synonym prediction methods. We find that BCN methods perform weakly on this task because they do not make full use of graph information. Therefore, we propose GraphPrompt, a prompt-based learning approach that creates prompt templates according to the graphs. GraphPrompt obtained 37.2% and 28.5% improvement in zero-shot and few-shot settings respectively, indicating the effectiveness of these graph-based prompt templates. We envision that our GraphPrompt method and the OBO-syn dataset can be broadly applied to graph-based NLP tasks, and serve as the basis for analyzing diverse and accumulating biomedical data. All data and code are available at: https://github.com/HanwenXuTHU/GraphPrompt
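
A toy sketch of building a graph-based prompt template for synonym prediction: verbalize a concept's neighbourhood in the ontology graph and ask whether a candidate term is a synonym. The template wording is an illustrative assumption, not the paper's exact template.

    # Toy sketch: turn an ontology neighbourhood into a prompt for synonym prediction.
    def graph_prompt(term: str, concept: str, parents: list, children: list) -> str:
        context = (f"Concept: {concept}. "
                   f"Parents: {', '.join(parents) or 'none'}. "
                   f"Children: {', '.join(children) or 'none'}.")
        return f"{context} Is '{term}' a synonym of '{concept}'? Answer yes or no."

    print(graph_prompt("cardiac muscle cell", "cardiomyocyte",
                       parents=["muscle cell"], children=["ventricular cardiomyocyte"]))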

EAAI Journal 2023 Journal Article

MIANet: Multi-level temporal information aggregation in mixed-periodicity time series forecasting tasks

  • Sheng Wang
  • Xi Chen
  • Dongliang Ma
  • Chen Wang
  • Yong Wang
  • Honggang Qi
  • Gongjian Zhou
  • Qingli Li

Regular human activities generate a large number of time series with mixed periodicity that can reflect human behavior patterns and the societal working mechanism. When forecasting these time series, nonlinear neural networks often encounter limitations such as difficulty in utilizing mixed-periodic patterns, balancing multi-level information, and incorporating future vision, as well as forecasting delays and scale insensitivity, all of which affect forecasting accuracy. To address these problems, we propose the Multi-level Information Aggregation Network (MIANet), a novel neural network with four key characteristics: (i) a novel folded recurrent structure that dynamically updates the local and mini-local information at a global range in a compact manner; (ii) a new recurrent unit called Folded Convolution Aggregation Temporal Memory (FCATM) that extracts and aggregates neighbor-trends in local and mini-local data; (iii) a fusing decoder structure that promotes the sharing of forward–backward future information and adaptively adjusts relationships among adjacent points; and (iv) a new Skip-Autoregressive (SAR) linear strategy that addresses scale sensitivity issues. The SAR can be embedded as a plug-and-play component into other deep learning (DL) models. Compared with other baseline methods, MIANet obtains statistically significant improvements on six real-world datasets, as demonstrated by conducting two-sample t-tests, indicating that MIANet can be applied to various predictive scenarios, such as road occupancy, electricity consumption, pedestrian flow and urban noise.

AAAI Conference 2023 Conference Paper

Robust One-Shot Segmentation of Brain Tissues via Image-Aligned Style Transformation

  • Jinxin Lv
  • Xiaoyu Zeng
  • Sheng Wang
  • Ran Duan
  • Zhiwei Wang
  • Qiang Li

One-shot segmentation of brain tissues is typically a dual-model iterative learning: a registration model (reg-model) warps a carefully-labeled atlas onto unlabeled images to initialize their pseudo masks for training a segmentation model (seg-model); the seg-model revises the pseudo masks to enhance the reg-model for a better warping in the next iteration. However, such dual-model iteration has a key weakness: the spatial misalignment inevitably introduced by the reg-model can misguide the seg-model, eventually making it converge to inferior segmentation performance. In this paper, we propose a novel image-aligned style transformation to reinforce the dual-model iterative learning for robust one-shot segmentation of brain tissues. Specifically, we first utilize the reg-model to warp the atlas onto an unlabeled image, and then employ the Fourier-based amplitude exchange with perturbation to transplant the style of the unlabeled image into the aligned atlas. This allows the subsequent seg-model to learn on the aligned and style-transferred copies of the atlas instead of unlabeled images, which naturally guarantees the correct spatial correspondence of an image-mask training pair, without sacrificing the diversity of intensity patterns carried by the unlabeled images. Furthermore, we introduce a feature-aware content consistency in addition to the image-level similarity to constrain the reg-model for a promising initialization, which avoids the collapse of image-aligned style transformation in the first iteration. Experimental results on two public datasets demonstrate 1) competitive segmentation performance of our method compared to the fully-supervised method, and 2) superior performance over other state-of-the-art methods, with an increase in average Dice of up to 4.67%. The source code is available at: https://github.com/JinxLv/One-shot-segmentation-via-IST.
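
A compact sketch of Fourier-based amplitude exchange with perturbation: replace the low-frequency amplitude of a content image (the warped atlas) with a perturbed copy of a style image's amplitude (the unlabeled image) while keeping the content phase. Shown in 2-D with numpy for brevity; the band ratio and perturbation scale are illustrative, not the paper's settings.

    # Sketch: transplant low-frequency amplitude from a style source into a
    # content image while keeping the content image's phase.
    import numpy as np

    def amplitude_exchange(content: np.ndarray, style: np.ndarray,
                           band: float = 0.1, jitter: float = 0.05, seed: int = 0):
        rng = np.random.default_rng(seed)
        fc, fs = np.fft.fft2(content), np.fft.fft2(style)
        amp_c, phase_c, amp_s = np.abs(fc), np.angle(fc), np.abs(fs)

        h, w = content.shape
        bh, bw = int(h * band), int(w * band)
        low = np.fft.fftshift(amp_s) * (1 + jitter * rng.standard_normal(amp_s.shape))
        shifted = np.fft.fftshift(amp_c.copy())
        cy, cx = h // 2, w // 2
        shifted[cy - bh:cy + bh, cx - bw:cx + bw] = low[cy - bh:cy + bh, cx - bw:cx + bw]
        amp_new = np.fft.ifftshift(shifted)

        return np.real(np.fft.ifft2(amp_new * np.exp(1j * phase_c)))

    a, b = np.random.rand(64, 64), np.random.rand(64, 64)
    print(amplitude_exchange(a, b).shape)   # (64, 64): geometry of `a`, low-freq style of `b`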

NeurIPS Conference 2022 Conference Paper

Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models for Protein Structures

  • Shitong Luo
  • Yufeng Su
  • Xingang Peng
  • Sheng Wang
  • Jian Peng
  • Jianzhu Ma

Antibodies are immune system proteins that protect the host by binding to specific antigens such as viruses and bacteria. The binding between antibodies and antigens is mainly determined by the complementarity-determining regions (CDR) of the antibodies. In this work, we develop a deep generative model that jointly models sequences and structures of CDRs based on diffusion probabilistic models and equivariant neural networks. Our method is the first deep learning-based method that generates antibodies explicitly targeting specific antigen structures and is one of the earliest diffusion probabilistic models for protein structures. The model is a "Swiss Army Knife" capable of sequence-structure co-design, sequence design for given backbone structures, and antibody optimization. We conduct extensive experiments to evaluate the quality of both sequences and structures of designed antibodies. We find that our model could yield competitive results in binding affinity measured by biophysical energy functions and other protein design metrics.

AAAI Conference 2022 Conference Paper

Contact-Distil: Boosting Low Homologous Protein Contact Map Prediction by Self-Supervised Distillation

  • Qin Wang
  • Jiayang Chen
  • Yuzhe Zhou
  • Yu Li
  • Liangzhen Zheng
  • Sheng Wang
  • Zhen Li
  • Shuguang Cui

Accurate protein contact map prediction (PCMP) is essential for precise protein structure estimation and further biological studies. Recent works achieve significant performance on this task with high quality multiple sequence alignment (MSA). However, the PCMP accuracy drops dramatically when only a poor MSA (e.g., absolute MSA count less than 10) is available. Therefore, in this paper, we propose Contact-Distil to improve low homologous PCMP accuracy through knowledge distillation on a self-supervised model. Particularly, two pre-trained transformers are exploited to learn the high quality and low quality MSA representations in parallel for the teacher and student models, respectively. Besides, co-evolution information is further extracted from the pure sequence through a pretrained ESM-1b model, which provides auxiliary knowledge to improve student performance. Extensive experiments show Contact-Distil outperforms previous state-of-the-art methods by large margins on the CAMEO-L dataset for low homologous PCMP, i.e., around 13.3% and 9.5% improvements over AlphaFold2 and MSA Transformer respectively when the MSA count is less than 10.
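
A minimal sketch of the distillation objective implied by the abstract: a student trained on low-quality MSA features matches the contact probabilities of a teacher trained on high-quality MSA, alongside the usual supervised loss. The loss weights and exact targets are assumptions, not the paper's formulation.

    # Sketch: binary-cross-entropy distillation of contact maps plus supervision.
    import torch
    import torch.nn.functional as F

    def distil_contact_loss(student_logits, teacher_probs, true_contacts, alpha=0.5):
        """student_logits, teacher_probs, true_contacts: (L, L) tensors."""
        supervised = F.binary_cross_entropy_with_logits(student_logits,
                                                        true_contacts.float())
        distill = F.binary_cross_entropy_with_logits(student_logits, teacher_probs)
        return alpha * supervised + (1 - alpha) * distill

    L = 64
    student = torch.randn(L, L)
    teacher = torch.rand(L, L)               # teacher contact probabilities
    labels = (torch.rand(L, L) > 0.9).float()
    print(distil_contact_loss(student, teacher, labels).item())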

AAAI Conference 2022 Conference Paper

DisenCite: Graph-Based Disentangled Representation Learning for Context-Specific Citation Generation

  • Yifan Wang
  • Yiping Song
  • Shuai Li
  • Chaoran Cheng
  • Wei Ju
  • Ming Zhang
  • Sheng Wang

Citing and describing related literature are crucial to scientific writing. Many existing approaches show encouraging performance in citation recommendation, but are unable to accomplish the more challenging and onerous task of citation text generation. In this paper, we propose a novel disentangled representation based model, DisenCite, to automatically generate citation text by integrating paper text and the citation graph. A key novelty of our method compared with existing approaches is to generate context-specific citation text, empowering the generation of different types of citations for the same paper. In particular, we first build and make available a graph enhanced contextual citation dataset (GCite) with 25K edges of different types, characterized by the sections containing the citations, over 4.8K research papers. Based on this dataset, we encode each paper according to both textual contexts and structure information in the heterogeneous citation graph. The resulting paper representations are then disentangled by mutual information regularization between each paper and its neighbors in the graph. Extensive experiments demonstrate the superior performance of our method compared to state-of-the-art approaches. We further conduct ablation and case studies to confirm that the improvement of our method comes from generating context-specific citations through incorporating the citation graph.

NeurIPS Conference 2022 Conference Paper

Distribution-Informed Neural Networks for Domain Adaptation Regression

  • Jun Wu
  • Jingrui He
  • Sheng Wang
  • Kaiyu Guan
  • Elizabeth Ainsworth

In this paper, we study the problem of domain adaptation regression, which learns a regressor for a target domain by leveraging the knowledge from a relevant source domain. We start by proposing a distribution-informed neural network, which aims to build distribution-aware relationship of inputs and outputs from different domains. This allows us to develop a simple domain adaptation regression framework, which subsumes popular domain adaptation approaches based on domain invariant representation learning, reweighting, and adaptive Gaussian process. The resulting findings not only explain the connections of existing domain adaptation approaches, but also motivate the efficient training of domain adaptation approaches with overparameterized neural networks. We also analyze the convergence and generalization error bound of our framework based on the distribution-informed neural network. Specifically, our generalization bound focuses explicitly on the maximum mean discrepancy in the RKHS induced by the neural tangent kernel of distribution-informed neural network. This is in sharp contrast to the existing work which relies on domain discrepancy in the latent feature space heuristically formed by one or several hidden neural layers. The efficacy of our framework is also empirically verified on a variety of domain adaptation regression benchmarks.
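
Since the generalization bound is stated in terms of maximum mean discrepancy (MMD), here is a standard (biased, V-statistic) empirical MMD estimate with an RBF kernel for intuition; note that the paper's analysis uses the neural tangent kernel of the distribution-informed network rather than an RBF kernel, so this is only a reference point.

    # Standard empirical MMD^2 with an RBF kernel (for intuition only).
    import numpy as np

    def rbf_kernel(x, y, gamma=1.0):
        d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def mmd2(source: np.ndarray, target: np.ndarray, gamma: float = 1.0) -> float:
        k_ss = rbf_kernel(source, source, gamma).mean()
        k_tt = rbf_kernel(target, target, gamma).mean()
        k_st = rbf_kernel(source, target, gamma).mean()
        return float(k_ss + k_tt - 2 * k_st)

    src = np.random.randn(100, 8)
    tgt = np.random.randn(100, 8) + 0.5        # shifted target domain
    print(mmd2(src, tgt) > mmd2(src, np.random.randn(100, 8)))   # True in general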

IJCAI Conference 2021 Conference Paper

Adaptive Residue-wise Profile Fusion for Low Homologous Protein Secondary Structure Prediction Using External Knowledge

  • Qin Wang
  • Jun Wei
  • Boyuan Wang
  • Zhen Li
  • Sheng Wang
  • Shuguang Cui

Protein secondary structure prediction (PSSP) is essential for protein function analysis. However, for low homologous proteins, PSSP suffers from insufficient input features. In this paper, we explicitly import external self-supervised knowledge for low homologous PSSP under the guidance of residue-wise (amino acid wise) profile fusion. In practice, we first demonstrate the superiority of profile over Position-Specific Scoring Matrix (PSSM) for low homologous PSSP. Based on this observation, we introduce novel self-supervised BERT features as the pseudo profile, which implicitly involves the residue distribution in all discovered native sequences as complementary features. Furthermore, a novel residue-wise attention is specially designed to adaptively fuse different features (i.e., the original low-quality profile and the BERT-based pseudo profile), which not only takes full advantage of each feature but also avoids noise disturbance. Besides, a feature consistency loss is proposed to accelerate model learning from multiple semantic levels. Extensive experiments confirm that our method outperforms the state of the art (i.e., by 4.7% for extremely low homologous cases on the BC40 dataset).
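
A small sketch of residue-wise fusion of two feature streams (a low-quality profile and a BERT-derived pseudo profile) via a learned per-residue gate; this gating form is a common pattern and an assumption here, not necessarily the paper's exact attention design.

    # Sketch: per-residue gated fusion of two feature streams.
    import torch
    import torch.nn as nn

    class ResidueWiseFusion(nn.Module):
        def __init__(self, dim: int):
            super().__init__()
            self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

        def forward(self, profile: torch.Tensor, pseudo: torch.Tensor):
            """profile, pseudo: (L, dim) per-residue features."""
            g = self.gate(torch.cat([profile, pseudo], dim=-1))   # (L, dim) gate in (0, 1)
            return g * profile + (1 - g) * pseudo                 # adaptive mix per residue

    fusion = ResidueWiseFusion(dim=32)
    print(fusion(torch.randn(120, 32), torch.randn(120, 32)).shape)  # torch.Size([120, 32])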

NeurIPS Conference 2021 Conference Paper

Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation

  • Fenglin Liu
  • Chenyu You
  • Xian Wu
  • Shen Ge
  • Sheng Wang
  • Xu Sun

Medical report generation, which aims to automatically generate a long and coherent report of a given medical image, has been receiving growing research interest. Existing approaches mainly adopt a supervised manner and heavily rely on coupled image-report pairs. However, in the medical domain, building a large-scale image-report paired dataset is both time-consuming and expensive. To relax the dependency on paired data, we propose an unsupervised model, Knowledge Graph Auto-Encoder (KGAE), which accepts independent sets of images and reports in training. KGAE consists of a pre-constructed knowledge graph, a knowledge-driven encoder and a knowledge-driven decoder. The knowledge graph works as the shared latent space to bridge the visual and textual domains; the knowledge-driven encoder projects medical images and reports to the corresponding coordinates in this latent space, and the knowledge-driven decoder generates a medical report given a coordinate in this space. Since the knowledge-driven encoder and decoder can be trained with independent sets of images and reports, KGAE is unsupervised. The experiments show that the unsupervised KGAE generates desirable medical reports without using any image-report training pairs. Moreover, KGAE can also work in both semi-supervised and supervised settings, and accept paired images and reports in training. By further fine-tuning with image-report pairs, KGAE consistently outperforms the current state-of-the-art models on two datasets.

AAAI Conference 2021 Conference Paper

PSSM-Distil: Protein Secondary Structure Prediction (PSSP) on Low-Quality PSSM by Knowledge Distillation with Contrastive Learning

  • Qin Wang
  • Boyuan Wang
  • Zhenlei Xu
  • Jiaxiang Wu
  • Peilin Zhao
  • Zhen Li
  • Sheng Wang
  • Junzhou Huang

Protein secondary structure prediction (PSSP) is an essential task in computational biology. To achieve accurate PSSP, the general and vital feature engineering is to use multiple sequence alignment (MSA) for Position-Specific Scoring Matrix (PSSM) extraction. However, when only low-quality PSSM can be obtained due to poor sequence homology, previous PSSP accuracy (merely around 65%) is far from practical usage for subsequent tasks. In this paper, we propose a novel PSSM-Distil framework for PSSP on low-quality PSSM, which not only enhances the PSSM feature at a lower level but also aligns the feature distribution at a higher level. In practice, PSSM-Distil first exploits proteins with high-quality PSSM to train a teacher network for PSSP in a fully supervised way. Under the guidance of the teacher network, the low-quality PSSM and the corresponding student network with low discriminating capacity are effectively improved through feature enhancement with EnhanceNet and distribution alignment via knowledge distillation with contrastive learning. Furthermore, our PSSM-Distil supports input from a pre-trained protein sequence language model (BERT) to provide auxiliary information, which is designed to address extremely low-quality PSSM cases, i.e., no homologous sequences. Extensive experiments demonstrate that the proposed PSSM-Distil outperforms state-of-the-art models on PSSP by 6% on average and nearly 8% in extremely low-quality cases on the public benchmarks BC40 and CB513.

AAAI Conference 2018 Conference Paper

Adaptive Graph Convolutional Neural Networks

  • Ruoyu Li
  • Sheng Wang
  • Feiyun Zhu
  • Junzhou Huang

Graph Convolutional Neural Networks (Graph CNNs) are generalizations of classical CNNs to handle graph data such as molecular data, point clouds and social networks. Current filters in graph CNNs are built for fixed and shared graph structures. However, for most real data, the graph structures vary in both size and connectivity. This paper proposes a generalized and flexible graph CNN that takes data of arbitrary graph structure as input. In this way, a task-driven adaptive graph is learned for each graph sample during training. To learn the graph efficiently, a distance metric learning approach is proposed. Extensive experiments on nine graph-structured datasets have demonstrated superior improvements in both convergence speed and predictive accuracy.
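
A compact sketch of the distance-metric idea described above: learn a Mahalanobis-style metric M = W Wᵀ and turn pairwise distances into an adaptive adjacency with a Gaussian kernel. This is a generic rendering of the idea, not the paper's exact parameterization.

    # Sketch: adaptive adjacency from a learnable Mahalanobis-style metric.
    import torch
    import torch.nn as nn

    class AdaptiveAdjacency(nn.Module):
        def __init__(self, feat_dim: int, metric_dim: int = 16):
            super().__init__()
            self.W = nn.Parameter(torch.randn(feat_dim, metric_dim) * 0.1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            """x: (N, feat_dim) node features -> (N, N) row-normalized adjacency."""
            z = x @ self.W                               # distances under M = W W^T
            d2 = torch.cdist(z, z) ** 2
            adj = torch.exp(-d2)                         # Gaussian kernel similarity
            return adj / adj.sum(dim=1, keepdim=True)    # row-normalize for propagation

    nodes = torch.randn(10, 8)
    print(AdaptiveAdjacency(8)(nodes).shape)             # torch.Size([10, 10])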

IJCAI Conference 2016 Conference Paper

Feature Learning Based Deep Supervised Hashing with Pairwise Labels

  • Wu-Jun Li
  • Sheng Wang
  • Wang-Cheng Kang

Recent years have witnessed wide application of hashing for large-scale image retrieval. However, most existing hashing methods are based on hand-crafted features which might not be optimally compatible with the hashing procedure. Recently, deep hashing methods have been proposed to perform simultaneous feature learning and hash-code learning with deep neural networks, which have shown better performance than traditional hashing methods with hand-crafted features. Most of these deep hashing methods are supervised, with the supervised information given in the form of triplet labels. For the other common application scenario with pairwise labels, there are no existing methods for simultaneous feature learning and hash-code learning. In this paper, we propose a novel deep hashing method, called deep pairwise-supervised hashing (DPSH), to perform simultaneous feature learning and hash-code learning for applications with pairwise labels. Experiments on real datasets show that our DPSH method outperforms other methods and achieves state-of-the-art performance in image retrieval applications.
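
A brief sketch of a pairwise-supervised hashing objective of the kind DPSH uses: a likelihood term on code inner products for the pairwise labels plus a quantization penalty pulling relaxed codes toward ±1. Coefficients and the exact formulation are simplified relative to the paper.

    # Sketch: pairwise likelihood loss on relaxed hash codes plus quantization penalty.
    import torch
    import torch.nn.functional as F

    def pairwise_hashing_loss(codes: torch.Tensor, sim: torch.Tensor, eta: float = 0.1):
        """codes: (N, bits) real-valued network outputs; sim: (N, N) 0/1 pairwise labels."""
        theta = 0.5 * codes @ codes.t()
        likelihood = -(sim * theta - F.softplus(theta)).mean()   # negative log-likelihood
        quantization = ((codes - codes.sign()) ** 2).mean()      # push codes toward ±1
        return likelihood + eta * quantization

    u = torch.randn(16, 12, requires_grad=True)
    s = (torch.rand(16, 16) > 0.5).float()
    print(pairwise_hashing_loss(u, s).item())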

AAAI Conference 2014 Conference Paper

SUIT: A Supervised User-Item Based Topic Model for Sentiment Analysis

  • Fangtao Li
  • Sheng Wang
  • Shenghua Liu
  • Ming Zhang

Probabilistic topic models have been widely used for sentiment analysis. However, most existing topic models only model the sentiment text and do not consider the user who expresses the sentiment or the item on which the sentiment is expressed. Since different users may use different sentiment expressions for different items, we argue that it is better to incorporate the user and item information into the topic model for sentiment analysis. In this paper, we propose a new Supervised User-Item based Topic model, called SUIT, for sentiment analysis. It can simultaneously utilize the textual topic and latent user-item factors. Our proposed method uses the tensor outer product of the text topic proportion vector, the user latent factor and the item latent factor to model the sentiment label generation. Extensive experiments are conducted on two datasets: a review dataset and a microblog dataset. The results demonstrate the advantages of our model. It shows significant improvement compared with supervised topic models and collaborative filtering methods.
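
A tiny sketch of the tensor outer product the abstract mentions: combine a topic-proportion vector, a user factor, and an item factor into a three-way interaction score for the sentiment label; the weight tensor here is a random placeholder, and the link function is omitted.

    # Sketch: three-way interaction of topic, user, and item factors via an
    # outer product, scored by a weight tensor.
    import numpy as np

    rng = np.random.default_rng(0)
    topic = rng.dirichlet(np.ones(5))       # document topic proportions
    user = rng.standard_normal(4)           # latent user factor
    item = rng.standard_normal(4)           # latent item factor
    W = rng.standard_normal((5, 4, 4))      # placeholder sentiment weight tensor

    interaction = np.einsum("t,u,i->tui", topic, user, item)   # outer product
    score = float((W * interaction).sum())                      # sentiment score (pre-link)
    print(round(score, 4))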