Arrow Research search

Author name cluster

Wei Zhou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

56 papers
2 author rows

Possible papers

56

AAAI Conference 2026 Conference Paper

Complex Instruction Following with Diverse Style Policies in Football Games

  • Chenglu Sun
  • Shuo Shen
  • Haonan Hu
  • Wei Zhou
  • Chen Chen

Despite advancements in language-controlled reinforcement learning (LC-RL) for basic domains and straightforward commands (e.g., object manipulation and navigation), effectively extending LC-RL to comprehend and execute high-level or abstract instructions in complex, multi-agent environments, such as football games, remains a significant challenge. To address this gap, we introduce Language-Controlled Diverse Style Policies (LCDSP), a novel LC-RL paradigm specifically designed for complex scenarios. LCDSP comprises two key components: a Diverse Style Training (DST) method and a Style Interpreter (SI). The DST method efficiently trains a single policy capable of exhibiting a wide range of diverse behaviors by modulating agent actions through style parameters (SP). The SI is designed to accurately and rapidly translate high-level language instructions into these corresponding SP. Through extensive experiments in a complex 5v5 football environment, we demonstrate that LCDSP effectively comprehends abstract tactical instructions and accurately executes the desired diverse behavioral styles, showcasing its potential for complex, real-world applications.

AAAI Conference 2026 Conference Paper

Drifting Away from Truth: GenAI-Driven News Diversity Challenges LVLM-Based Misinformation Detection

  • Fanxiao Li
  • Jiaying Wu
  • Tingchao Fu
  • Yunyun Dong
  • Bingbing Song
  • Wei Zhou

The proliferation of multimodal misinformation poses growing threats to public discourse and societal trust. While Large Vision-Language Models (LVLMs) have enabled recent progress in multimodal misinformation detection (MMD), the rise of generative AI (GenAI) tools introduces a new challenge: GenAI-driven news diversity, characterized by highly varied and complex content. We show that this diversity induces multi-level drift, comprising (1) model-level misperception drift, where stylistic variations disrupt a model’s internal reasoning, and (2) evidence-level drift, where expression diversity degrades the quality or relevance of retrieved external evidence. These drifts significantly degrade the robustness of current LVLM-based MMD systems. To systematically study this problem, we introduce DriftBench, a large-scale benchmark comprising 16,000 news instances across six categories of diversification. We design three evaluation tasks: (1) robustness of truth verification under multi-level drift; (2) susceptibility to adversarial evidence contamination generated by GenAI; and (3) analysis of reasoning consistency across diverse inputs. Experiments with six state-of-the-art LVLM-based detectors show substantial performance drops (average F1 ↓ 14.8%) and increasingly unstable reasoning traces, with even more severe failures under adversarial evidence injection. Our findings uncover fundamental vulnerabilities in existing MMD systems and suggest an urgent need for more resilient approaches in the GenAI era.

AAAI Conference 2026 Conference Paper

GLOBA: Rethinking Parameter Conflicts in Model Merging

  • Zehao Liu
  • Kun Li
  • Wei Zhou

Model merging serves as a training-free technique that combines multiple task-specific models into a unified multi-task model, but parameter conflicts often lead to performance drops. Previous methods flatten weight matrices into one-dimensional vectors, losing the inherent structural information of their row and column spaces. We mathematically prove and experimentally validate that parameter conflicts arise from non-orthogonal components of task vectors, while orthogonal components are conflict-free. Furthermore, we find that non-orthogonal components can contain both harmful conflicts and beneficial synergies. To precisely locate parameter conflicts and extract orthogonal components, we propose GLOBA (GLObal Basis Analysis Framework), which projects task vectors onto a global basis to align them within a unified coordinate system and construct a task interaction matrix. Following energy-based pruning, we divide parameters into five types based on the orthogonal relationships between the row spaces and column spaces of task vectors. Experiments on three fine-tuned models (mathematics, coding, and instruction-following) using LLaMA-2-7B and LLaMA-2-13B demonstrate significant performance gains through selective retention of beneficial parameters and removal of conflicting ones.
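The abstract's claim that conflicts live in the non-orthogonal components of task vectors can be illustrated with a minimal linear-algebra sketch (an illustration only, not the paper's GLOBA implementation; the function name is hypothetical):

```python
import numpy as np

# Hedged sketch: decompose task vector tau_a into the component parallel
# to tau_b (where, per the abstract, conflicts and synergies can arise)
# and the orthogonal remainder (argued to be conflict-free).
def split_components(tau_a, tau_b):
    """Return (parallel, orthogonal) components of tau_a relative to tau_b."""
    tau_a = np.asarray(tau_a, dtype=float)
    tau_b = np.asarray(tau_b, dtype=float)
    parallel = (tau_a @ tau_b) / (tau_b @ tau_b) * tau_b
    orthogonal = tau_a - parallel
    return parallel, orthogonal
```

By construction the two parts sum back to the original vector, and the orthogonal part has zero inner product with the other task vector.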

AAAI Conference 2026 Conference Paper

Hierarchical Dual-Domain Fusion with Frequency-Guided Spatial Modeling for Pan-Sharpening

  • Huangqimei Zheng
  • Chengyi Pan
  • Qian Jiang
  • Wei Zhou
  • Xin Jin

Pan-sharpening aims to generate high-resolution multispectral images by integrating the spectral richness of low-resolution multispectral images with the spatial details of high-resolution panchromatic images. Although frequency-domain modeling shows great potential in this field, most existing methods are still limited to spatial-domain processing or fail to effectively capture the contextual interactions between frequency and spatial features. To address these issues, we propose a novel multi-scale frequency-spatial collaborative fusion approach. A Frequency-Spatial U-Net is introduced as the backbone network, in which frequency-spatial modeling blocks are embedded to progressively enhance the frequency-guided spatial contextual modeling capability across layers. To this end, we design a Dual Branch Frequency Attention module that adaptively enhances high- and low-frequency information. In addition, we introduce fine-mid-coarse resolution branches and devise a main-auxiliary multi-scale reconstruction loss to facilitate collaborative optimization. The effectiveness of the proposed model is validated through extensive experiments, demonstrating superior performance in both qualitative and quantitative evaluations. Moreover, our model achieves the fastest inference time among all compared methods, striking an excellent balance between accuracy and efficiency.

AAAI Conference 2026 Conference Paper

MFmamba: A Multi-function Network for Panchromatic Image Resolution Restoration Based on State-Space Model

  • Qian Jiang
  • Qianqian Wang
  • Xin Jin
  • Michał Woźniak
  • Shaowen Yao
  • Wei Zhou

Remote sensing images are becoming increasingly widespread in military and earth-resource exploration applications. Because of the limitations of a single sensor, we can obtain high-spatial-resolution grayscale panchromatic (PAN) images and low-spatial-resolution color multispectral (MS) images. Therefore, an important issue is to obtain a color image with high spatial resolution when there is only a PAN image at the input. Existing methods improve spatial resolution using super-resolution (SR) technology and spectral recovery using colorization technology. However, the SR technique cannot improve the spectral resolution, and the colorization technique cannot improve the spatial resolution. Moreover, the pansharpening method needs two registered inputs and cannot achieve SR. As a result, an integrated approach is expected. We designed a novel multi-function model (MFmamba) to realize the tasks of SR, spectral recovery, and joint SR and spectral recovery through three different inputs. Firstly, MFmamba utilizes UNet++ as the backbone, and a Mamba Upsample Block (MUB) is combined with UNet++. Secondly, a Dual Pool Attention (DPA) module is designed to replace the skip connections in UNet++. Finally, a Multi-scale Hybrid Cross Block (MHCB) is proposed for initial feature extraction. Extensive experiments show that MFmamba is competitive in evaluation metrics and visual results and performs well on the three tasks when only the input PAN image is used.

AAAI Conference 2026 Conference Paper

Score-Based Model for Low-Rank Tensor Recovery

  • Zhengyun Cheng
  • Changhao Wang
  • Guanwen Zhang
  • Yi Xu
  • Wei Zhou
  • Xiangyang Ji

Low-rank tensor decompositions (TDs) provide an effective framework for multiway data analysis. Traditional TD methods rely on predefined structural assumptions, such as CP or Tucker decompositions. From a probabilistic perspective, these methods effectively model the relationships between latent factors and the low-rank tensor using Dirac delta distributions. However, tensor low-rank decomposition is inherently non-unique, leading to a multimodal distribution over possible solutions. Critically, such prior knowledge is rarely available in practical scenarios, particularly regarding the optimal rank structure and contraction rules. To address this issue, we propose a score-based model that eliminates the need for predefined structural or distributional assumptions, enabling the learning of compatibility between tensors and latent factors. Specifically, a neural network is designed to learn the energy function, which is optimized via score matching to capture the gradient of the joint log-probability of tensor entries and latent factors. Our method allows for modeling structures and distributions beyond the Dirac delta assumption. Moreover, integrating the block coordinate descent (BCD) algorithm with the proposed smooth regularization enables the model to perform both tensor completion and denoising. Experimental results demonstrate significant performance improvements across various tensor types, including sparse and continuous-time tensors, as well as visual data.

AAAI Conference 2026 Conference Paper

Temporal Inconsistency Guidance for Super-resolution Video Quality Assessment

  • Yixiao Li
  • Xiaoyuan Yang
  • Weide Liu
  • Xin Jin
  • Xu Jia
  • Yu-Kun Lai
  • Paul L. Rosin
  • Hantao Liu

As super-resolution (SR) techniques introduce unique distortions that fundamentally differ from those caused by traditional degradation processes (e.g., compression), there is an increasing demand for specialized video quality assessment (VQA) methods tailored to SR-generated content. One critical factor affecting perceived quality is temporal inconsistency, which refers to irregularities between consecutive frames. However, existing VQA approaches rarely quantify this phenomenon or explicitly investigate its relationship with human perception. Moreover, SR videos exhibit amplified inconsistency levels as a result of enhancement processes. In this paper, we propose Temporal Inconsistency Guidance for Super-resolution Video Quality Assessment (TIG-SVQA) that underscores the critical role of temporal inconsistency in guiding the quality assessment of SR videos. We first design a perception-oriented approach to quantify frame-wise temporal inconsistency. Based on this, we introduce the Inconsistency Highlighted Spatial Module, which localizes inconsistent regions at both coarse and fine scales. Inspired by the human visual system, we further develop an Inconsistency Guided Temporal Module that performs progressive temporal feature aggregation: (1) a consistency-aware fusion stage in which a visual memory capacity block adaptively determines the information load of each temporal segment based on inconsistency levels, and (2) an informative filtering stage for emphasizing quality-related features. Extensive experiments on both single-frame and multi-frame SR video scenarios demonstrate that our method significantly outperforms state-of-the-art VQA approaches.
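The frame-wise temporal inconsistency the abstract quantifies can be approximated by a naive proxy (this is not the paper's perception-oriented measure, only a simple stand-in for the idea of irregularity between consecutive frames):

```python
import numpy as np

# Hedged sketch: score each frame transition by the mean absolute
# pixel difference from the previous frame. Higher values indicate
# stronger temporal inconsistency under this crude proxy.
def frame_inconsistency(frames):
    """Return one inconsistency score per consecutive frame pair."""
    frames = np.asarray(frames, dtype=float)
    return [float(np.mean(np.abs(frames[i + 1] - frames[i])))
            for i in range(len(frames) - 1)]
```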

JBHI Journal 2025 Journal Article

AMLPF-CLIP: Adaptive Prompting and Distilled Learning for Imbalanced Histopathological Image Classification

  • Xizhang Yao
  • Guanghui Yue
  • Jeremiah D. Deng
  • Hanhe Lin
  • Wei Zhou

Histopathological image classification (HIC) plays a pivotal role in computer-aided diagnosis, enabling lesion characterization (e.g., tumor grading) and survival outcome prediction. Despite recent advances in HIC, existing methods still face challenges in integrating domain-specific knowledge, addressing class imbalance, and ensuring computational efficiency. To address these challenges, we propose AMLPF-CLIP, an enhanced CLIP-based framework for HIC featuring three key innovations. First, we introduce an Adaptive Multi-Level Prompt Fusion (AMLPF) strategy that leverages three levels of textual prompts: class labels, basic descriptions, and GPT-4o-generated detailed pathological features for enhanced semantic representation and cross-modal alignment. Second, we design a class-balanced resampling method that dynamically adjusts sampling weights based on both data imbalance and classification performance, targeting underrepresented, low-confidence classes. Third, we develop a Knowledge Distillation (KD) technique that leverages output-level alignment via L2 loss, transferring knowledge from a large Vision Transformer (ViT-L/16) to a lightweight ResNet-50-based CLIP model. Extensive experiments on three public datasets demonstrate that AMLPF-CLIP consistently outperforms eleven state-of-the-art methods, achieving accuracy improvements of 1.19% on Chaoyang, 2.64% on BreaKHis, and 0.90% on LungHist700. AMLPF-CLIP also demonstrates improved robustness and efficiency, highlighting its practical applicability.
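The class-balanced resampling idea above can be sketched with inverse-frequency weights (a simplified stand-in: the paper's method also folds in per-class classification performance, which is omitted here; the function name is hypothetical):

```python
from collections import Counter

# Hedged sketch: rarer classes receive proportionally larger sampling
# weights, so minority classes are drawn more often during training.
def inverse_frequency_weights(labels):
    """Map each class to total / (num_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}
```

With three "a" samples and one "b" sample, "b" ends up with three times the weight of "a".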

AAAI Conference 2025 Conference Paper

An Information-theoretic Multi-task Representation Learning Framework for Natural Language Understanding

  • Dou Hu
  • Lingwei Wei
  • Wei Zhou
  • Songlin Hu

This paper proposes a new principled multi-task representation learning framework (InfoMTL) to extract noise-invariant sufficient representations for all tasks. It ensures sufficiency of shared representations for all tasks and mitigates the negative effect of redundant features, which can enhance language understanding of pre-trained language models (PLMs) under the multi-task paradigm. Firstly, a shared information maximization principle is proposed to learn more sufficient shared representations for all target tasks. It can avoid the insufficiency issue arising from representation compression in the multi-task paradigm. Secondly, a task-specific information minimization principle is designed to mitigate the negative effect of potential redundant features in the input for each task. It can compress task-irrelevant redundant information and preserve necessary information relevant to the target for multi-task prediction. Experiments on six classification benchmarks show that our method outperforms 12 comparative multi-task methods under the same multi-task settings, especially in data-constrained and noisy scenarios. Extensive experiments demonstrate that the learned representations are more sufficient, data-efficient, and robust.

AAAI Conference 2025 Conference Paper

BotSim: LLM-Powered Malicious Social Botnet Simulation

  • Boyu Qiao
  • Kun Li
  • Wei Zhou
  • Shilong Li
  • Qianqian Lu
  • Songlin Hu

Social media platforms like X (Twitter) and Reddit are vital to global communication. However, advancements in Large Language Model (LLM) technology give rise to social media bots with unprecedented intelligence. These bots adeptly simulate human profiles, conversations, and interactions, disseminating large amounts of false information and posing significant challenges to platform regulation. To better understand and counter these threats, we innovatively design BotSim, a malicious social botnet simulation powered by LLM. BotSim mimics the information dissemination patterns of real-world social networks, creating a virtual environment composed of intelligent agent bots and real human users. In the temporal simulation constructed by BotSim, these advanced agent bots autonomously engage in social interactions such as posting and commenting, effectively modeling scenarios of information flow and user interaction. Building on the BotSim framework, we construct a highly human-like, LLM-driven bot dataset called BotSim-24 and benchmark multiple bot detection strategies against it. The experimental results indicate that detection methods effective on traditional bot datasets perform worse on BotSim-24, highlighting the urgent need for new detection strategies to address the cybersecurity threats posed by these advanced bots.

AAAI Conference 2025 Conference Paper

Editing Memories Through Few Targeted Neurons

  • Wei Zhou
  • Wei Wei
  • Guibang Cao
  • Fei Wang

Model editing is a novel research topic in large language models (LLMs), aimed at efficiently handling various knowledge editing tasks. Since irrelevant knowledge is difficult to measure, existing editing methods often lack explicit ways to preserve it, especially editing methods based on the fine-tuning paradigm. They generally control the locality performance of model editing by constraining the range of changes in model parameters. However, their performance improvements are not always ideal and may even decrease editing reliability. In this paper, we explore effective editing locality control methods based on the relationship between stored knowledge and the strongly associated model components. Building on the discovery of "knowledge neurons" and extensive experimental results, we further explore the characteristics linking knowledge and model components, and confirm that: (1) only about 1% of neurons contribute significantly to the storage of specific knowledge, and (2) these targeted neurons often overlap heavily for knowledge with similar relational descriptions, which means that knowledge with similar relations may be severely affected when these targeted neurons are modified. Based on these findings, we propose Targeted Neurons Fine-tuning with Data Augmentation (TNF-DA), which performs data augmentation based on the relational representation of edited knowledge to improve editing locality. By freezing most of the model parameters and fine-tuning only the highly contributing neurons corresponding to the edited knowledge, we obtain desirable results in terms of generalization and specificity compared with previous fine-tuning-based methods. Extensive experiments demonstrate the superior editing performance of our proposed method.

AAAI Conference 2025 Conference Paper

Enhancing Multi-Hop Fact Verification with Structured Knowledge-Augmented Large Language Models

  • Han Cao
  • Lingwei Wei
  • Wei Zhou
  • Songlin Hu

The rapid development of social platforms exacerbates the dissemination of misinformation, which stimulates research in fact verification. Recent studies tend to leverage semantic features to solve this problem as a single-hop task. However, in real-world situations, verifying a claim requires several pieces of evidence with complicated inner logic and relations. Recent studies attempt to improve both understanding and reasoning abilities to enhance performance, but they overlook the crucial relations between entities that help models understand better and facilitate prediction. To emphasize the significance of relations, we resort to Large Language Models (LLMs), considering their excellent understanding ability. Unlike other methods that use LLMs as predictors, we employ them as relation extractors, since our experimental results indicate they are better at understanding than at reasoning. Thus, to address the challenges above, we propose a novel Structured Knowledge-Augmented LLM-based Network (LLM-SKAN) for multi-hop fact verification. Specifically, we utilize an LLM-driven Knowledge Extractor to capture fine-grained information, including entities and their complicated relations. In addition, we leverage a Knowledge-Augmented Relation Graph Fusion module to enable interaction among nodes and learn better claim-evidence representations. Experimental results on four commonly used datasets demonstrate the effectiveness and superiority of our model.

JBHI Journal 2025 Journal Article

MASG-SAM: Enhancing Few-Shot Medical Image Segmentation with Multi-Scale Attention and Semantic Guidance

  • Wei Zhou
  • Guilin Guan
  • Yuan Gao
  • Pengju Si
  • Mengjia Xu
  • Qifeng Yan

Foundation models, such as the Segment Anything Model (SAM), have demonstrated impressive generalization across various image segmentation tasks. However, they encounter challenges when applied to medical imaging, primarily due to the lack of domain-specific expertise and the limited availability of annotated data. Existing methods for adapting SAM typically rely on expert-driven prompt design and extensive fine-tuning, which hinder their effectiveness in medical imaging, particularly for rare and complex anatomical structures. To overcome these challenges, we propose MASG-SAM, an innovative framework designed for efficient few-shot medical image segmentation. MASG-SAM integrates three key innovations: the Hierarchical Attention Enhancement (HAE), Boundary Feature Enhancement (BFE), and Dynamic Semantic Fusion (DSF) modules. The HAE module optimizes attention distribution across hierarchical feature maps, enhancing feature diversity and reducing feature drift, thereby improving segmentation of both global and local features in complex medical images. The BFE module introduces a boundary-sensitive mechanism that enhances edge detection, enabling precise segmentation of overlapping or difficult-to-delineate anatomical structures. Finally, the DSF module leverages Contrastive Language-Image Pretraining (CLIP) to inject domain-specific medical semantic knowledge. By adaptively refining feature fusion during training, DSF combines semantic guidance with spatial adjustments, progressively improving segmentation accuracy, particularly in data-scarce scenarios. Experiments conducted on four publicly available medical datasets show that MASG-SAM outperforms state-of-the-art methods, achieving high segmentation accuracy with minimal labeled data. Our framework significantly enhances the adaptability and accuracy of SAM in complex medical imaging tasks. The code for MASG-SAM will be made publicly available at https://github.com/ggllllll/MASG-SAM.git.

JBHI Journal 2025 Journal Article

MDTL-ACP: Anticancer Peptides Prediction Based on Multi-Domain Transfer Learning

  • Junhang Cao
  • Wei Zhou
  • Qiyuan Yu
  • Junkai Ji
  • Jun Zhang
  • Shan He
  • Zexuan Zhu

Anticancer peptides (ACPs) have emerged as one of the most promising therapeutic agents for cancer treatment. They are bioactive peptides featuring broad-spectrum activity and low drug-resistance. The discovery of ACPs via traditional biochemical methods is laborious and costly. Accordingly, various computational methods have been developed to facilitate the discovery of ACPs. However, the data resources and knowledge of ACPs are still very scarce, and only a few of them are clinically verified, which limits the competence of computational methods. To address this issue, in this article, we propose an ACP prediction model based on multi-domain transfer learning, namely MDTL-ACP, to discriminate novel ACPs from plentiful inactive peptides. In particular, we collect abundant antimicrobial peptides (AMPs) from four well-studied peptide domains and extract their inherent features as the input of MDTL-ACP. The features learned from multiple source domains of AMPs are then transferred into the target prediction task of ACPs via artificial neural network-based shared-extractor and task-specific classifiers in MDTL-ACP. The knowledge captured in the transferred features enhances the prediction of ACPs in the target domain. Experimental results demonstrate that MDTL-ACP can outperform the traditional and state-of-the-art ACP prediction methods.

ICML Conference 2025 Conference Paper

Novelty Detection in Reinforcement Learning with World Models

  • Geigh Zollicoffer
  • Kenneth Eaton 0002
  • Jonathan C. Balloch
  • Julia M. Kim
  • Wei Zhou
  • Robert Wright
  • Mark O. Riedl

Reinforcement learning (RL) using world models has seen significant recent success. However, when a sudden change to world mechanics or properties occurs, agent performance and reliability can decline dramatically. We refer to such sudden changes in visual properties or state transitions as novelties. Implementing novelty detection within generated world model frameworks is a crucial task for protecting the agent when deployed. In this paper, we propose straightforward bounding approaches that incorporate novelty detection into world model RL agents by using the misalignment between the world model's hallucinated states and the true observed states as a novelty score. We provide effective approaches to detecting novelties in a distribution of transitions learned by an agent in a world model. Finally, we show the advantage of our work in a novel environment compared to traditional machine learning novelty detection methods as well as currently accepted RL-focused novelty detection algorithms.
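The core idea of a novelty score as prediction-observation misalignment can be sketched directly (a minimal illustration of the concept, not the paper's bounding approach; function names and the threshold are hypothetical):

```python
import numpy as np

# Hedged sketch: score a transition by how far the world model's
# hallucinated next state lands from the true observed state.
def novelty_score(predicted_state, observed_state):
    """Mean squared misalignment between hallucinated and observed states."""
    predicted = np.asarray(predicted_state, dtype=float)
    observed = np.asarray(observed_state, dtype=float)
    return float(np.mean((predicted - observed) ** 2))

def is_novel(predicted_state, observed_state, threshold):
    # Flag a transition as novel when the score exceeds a calibrated bound.
    return novelty_score(predicted_state, observed_state) > threshold
```

In practice the threshold would be calibrated on the distribution of scores observed in the non-novel environment.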

NeurIPS Conference 2025 Conference Paper

PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation

  • Wei Zhou
  • Guoliang Li
  • Haoyu Wang
  • Yuxing Han
  • Xufei Wu
  • Fan Wu
  • Xuanhe Zhou

Large language models (LLMs) have shown increasing effectiveness in Text-to-SQL tasks. However, another closely related problem, Cross-System SQL Translation (a.k.a. SQL-to-SQL), which adapts a query written for one database system (e.g., MySQL) into its equivalent for another system (e.g., ClickHouse), is of great practical importance but remains underexplored. Existing SQL benchmarks are not well-suited for SQL-to-SQL evaluation, as they (1) focus on a limited set of database systems (often just SQLite) and (2) cannot capture many system-specific SQL dialects (e.g., customized functions, data types, and syntax rules). Thus, in this paper, we introduce PARROT, a Practical And Realistic BenchmaRk for CrOss-System SQL Translation. PARROT comprises 598 translation pairs drawn from 38 open-source benchmarks and real-world business services, specifically prepared to challenge system-specific SQL understanding (e.g., LLMs achieve lower than 38.53% accuracy on average). We also provide multiple benchmark variants, including PARROT-Diverse with 28,003 translation pairs (for extensive syntax testing) and PARROT-Simple with 5,306 representative samples (for focused stress testing), covering 22 production-grade database systems. To promote future research, we release a public leaderboard and source code at: https://code4db.github.io/parrot-bench/.
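Why cross-system translation is non-trivial can be seen in a toy dialect-rewrite sketch (a deliberately naive illustration, not PARROT's methodology; the mapping table covers just two function spellings that differ between MySQL and ClickHouse):

```python
# Hedged sketch: the same intent requires different function names
# across systems, so plain string substitution is already fragile
# for anything beyond trivial queries.
DIALECT_FUNCS = {
    ("mysql", "clickhouse"): {"IFNULL": "ifNull", "NOW()": "now()"},
}

def naive_translate(sql, source, target):
    """Rewrite known dialect-specific function names; everything else passes through."""
    for src_fn, dst_fn in DIALECT_FUNCS.get((source, target), {}).items():
        sql = sql.replace(src_fn, dst_fn)
    return sql
```

Real translation must also handle data types, syntax rules, and semantics that have no one-to-one mapping, which is exactly the gap the benchmark probes.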

AAAI Conference 2024 Conference Paper

An Efficient Subgraph-Inferring Framework for Large-Scale Heterogeneous Graphs

  • Wei Zhou
  • Hong Huang
  • Ruize Shi
  • Kehan Yin
  • Hai Jin

Heterogeneous Graph Neural Networks (HGNNs) play a vital role in advancing graph representation learning by addressing the complexities arising from diverse data types and interconnected relationships in real-world scenarios. However, traditional HGNNs face challenges when applied to large-scale graphs due to the necessity of training or inferring on the entire graph. As the size of heterogeneous graphs increases, the time and memory overhead required by these models escalates rapidly, even reaching unacceptable levels. To address this issue, we present a novel framework named SubInfer, which trains and infers on subgraphs instead of entire graphs, thereby efficiently handling large-scale heterogeneous graphs. The framework comprises three main steps: 1) partitioning the heterogeneous graph from multiple perspectives to preserve various semantic information, 2) completing the subgraphs to improve the convergence speed of subgraph training and the performance of subgraph inference, and 3) training and inferring the HGNN model on distributed clusters to further reduce the time overhead. The framework is applicable to the vast majority of HGNN models. Experiments on five benchmark datasets demonstrate that SubInfer effectively optimizes the training and inference phases, delivering performance comparable to traditional HGNN models while significantly reducing time and memory overhead.

AILAW Journal 2024 Journal Article

Causality-inspired legal provision selection with large language model-based explanation

  • Zheng Wang
  • Yuanzhi Ding
  • Caiyuan Wu
  • Yuzhen Guo
  • Wei Zhou

Accurate identification of legal provisions is crucial for adjudicating criminal cases, but the complexity and volume of legal texts pose significant challenges for legal professionals. This paper addresses these challenges by introducing a novel legal provision selection framework that transforms the task from a simple classification problem into a sophisticated system combining semantic matching with causal relationship learning. Leveraging large language models, our approach enhances the understanding and interpretation of legal language by extracting nuanced features from legal texts for deeper contextual comprehension. Additionally, integrating causal learning aligns with the inherent causality in legal reasoning, improving model interpretability and mitigating data bias. Our method demonstrates superior accuracy and robustness through extensive experiments on the CAIL2018 dataset and its subsets. This research significantly advances legal AI applications, promoting efficiency and fairness in the criminal justice system by providing precise and reliable legal provision selection.

JBHI Journal 2024 Journal Article

Distributed Medical Data Storage Mechanism Based on Proof of Retrievability and Vector Commitment for Metaverse Services

  • Guowei Fang
  • Yi Sun
  • Mutiq Almutiq
  • Wei Zhou
  • Yekang Zhao
  • Yongjun Ren

The metaverse is a unified, persistent, and shared multi-user virtual environment with a fully immersive, hyper-temporal, and diverse interconnected network. When combined with healthcare, it can effectively improve medical services and has great potential in realizing medical training, enhanced teaching, and remote surgical treatment. The metaverse provides immersive services for users through massive and multimodal data, and its data scale and growth rate are bound to increase exponentially. Blockchain-based distributed storage is a fundamental way to keep the metaverse running continuously; however, many blockchains, such as Ethereum and Filecoin, suffer from low transaction throughput and high latency, which seriously affect the efficiency of distributed storage services and make them difficult to apply in the metaverse environment. To this end, this paper first proposes a network architecture for distributed storage systems based on proof of retrievability, addressing the problems of centralized decision making and single points of access in centralized storage and ensuring secure data storage for the metaverse health system. Secondly, we design two data transmission protocols using vector commitment and encoding functions to shift the time cost from the critical path to storage nodes, improving the efficiency of data verification between nodes as well as the scalability of the metaverse health system. Finally, we conduct security and performance analyses of the proposed scheme, and the results show that it is secure and efficient.

AAAI Conference 2024 Conference Paper

Improving Open-Domain Dialogue Response Generation with Multi-Source Multilingual Commonsense Knowledge

  • Sixing Wu
  • Jiong Yu
  • Jiahao Chen
  • Xiaofan Deng
  • Wei Zhou

Knowledge-grounded Dialogue Response Generation (KRG) can facilitate informative and faithful dialogues using external knowledge. Prior monolingual works can only use the knowledge of the corresponding native language. Thus, due to the prohibitive costs of collecting and constructing external knowledge bases, the limited scale of accessible external knowledge always constrains the ability of KRG, especially in low-resource language scenarios. To this end, we propose a new task, Multi-Source Multilingual Knowledge-Grounded Response Generation (MMKRG), which simultaneously uses multiple knowledge sources of different languages. We notice that simply combining knowledge of different languages is inefficient due to the Cross-Conflict and Cross-Repetition issues. Thus, we propose a novel approach, MMK-BART, which uses a simple but elegant Estimate-Cluster-Penalize mechanism to overcome these issues and adopts the multilingual language model mBART as the backbone. Meanwhile, based on the recent multilingual corpus XDailyDialog, we propose an MMKRG dataset, MMK-DailyDialog, which has been aligned to the large-scale multilingual commonsense knowledge base ConceptNet and supports four languages (English, Chinese, German, and Italian). Extensive experiments have verified the effectiveness of our dataset and approach in monolingual, cross-lingual, and multilingual scenarios.

JBHI Journal 2024 Journal Article

PSEENet: A Pseudo-Siamese Neural Network Incorporating Electroencephalography and Electrooculography Characteristics for Heterogeneous Sleep Staging

  • Wei Zhou
  • Ning Shen
  • Ligang Zhou
  • Minghui Liu
  • Yiyuan Zhang
  • Cong Fu
  • Huan Yu
  • Feng Shu

Sleep staging plays a critical role in evaluating the quality of sleep. Currently, most studies either suffer dramatic performance drops when coping with varying input modalities or are unable to handle heterogeneous signals. To handle heterogeneous signals and guarantee favorable sleep staging performance when a single modality is available, a pseudo-siamese neural network (PSN) incorporating electroencephalography (EEG) and electrooculography (EOG) characteristics is proposed (PSEENet). PSEENet consists of two parts, spatial mapping modules (SMMs) and a weight-shared classifier. SMMs are used to extract high-dimensional features. Meanwhile, joint linkages among multi-modalities are provided by quantifying the similarity of features. Finally, with the cooperation of heterogeneous characteristics, associations within various sleep stages can be established by the classifier. The model is evaluated on two public datasets, the Montreal Archive of Sleep Studies (MASS) and Sleep-EDFX, and one clinical dataset from Huashan Hospital of Fudan University (HSFU). Experimental results show that the model can handle heterogeneous signals, provides superior results with multimodal signals, and shows good performance with a single modality. PSEENet obtains accuracies of 79.1% and 82.1% with EEG alone and with EEG and EOG on Sleep-EDFX, and significantly improves the accuracy with EOG from 73.7% to 76% by introducing similarity information.

NeurIPS Conference 2024 Conference Paper

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

  • Haoran You
  • Yipin Guo
  • Yichao Fu
  • Wei Zhou
  • Huihong Shi
  • Xiaofan Zhang
  • Souvik Kundu
  • Amir Yazdanbakhsh

Large language models (LLMs) have shown impressive performance on language tasks but face challenges when deployed on resource-constrained devices due to their extensive parameters and reliance on dense multiplications, resulting in high memory demands and latency bottlenecks. Shift-and-add reparameterization offers a promising solution by replacing costly multiplications with hardware-friendly primitives in both the attention and multi-layer perceptron (MLP) layers of an LLM. However, current reparameterization techniques require training from scratch or full parameter fine-tuning to restore accuracy, which is resource-intensive for LLMs. To address this, we propose accelerating pretrained LLMs through post-training shift-and-add reparameterization, creating efficient multiplication-free models, dubbed ShiftAddLLM. Specifically, we quantize each weight matrix into binary matrices paired with group-wise scaling factors. The associated multiplications are reparameterized into (1) shifts between activations and scaling factors and (2) queries and adds according to the binary matrices. To reduce accuracy loss, we present a multi-objective optimization method to minimize both weight and output activation reparameterization errors. Additionally, based on varying sensitivity across layers to reparameterization, we develop an automated bit allocation strategy to further reduce memory usage and latency. Experiments on five LLM families and eight tasks consistently validate the effectiveness of ShiftAddLLM, achieving average perplexity reductions of 5.6 and 22.7 points at comparable or lower latency compared to the most competitive quantized LLMs at 3- and 2-bit precision, respectively, and more than 80% memory and energy reductions over the original LLMs. Codes and models are available at https://github.com/GATECH-EIC/ShiftAddLLM.

AAAI Conference 2024 Conference Paper

Structured Probabilistic Coding

  • Dou Hu
  • Lingwei Wei
  • Yaxin Liu
  • Wei Zhou
  • Songlin Hu

This paper presents a new supervised representation learning framework, namely structured probabilistic coding (SPC), to learn compact and informative representations from input related to the target task. SPC is an encoder-only probabilistic coding technology with a structured regularization from the target space. It can enhance the generalization ability of pre-trained language models for better language understanding. Specifically, our probabilistic coding simultaneously performs information encoding and task prediction in one module to more fully utilize the effective information from input data. It uses variational inference in the output space to reduce randomness and uncertainty. Besides, to better control the learning process of probabilistic representations, a structured regularization is proposed to promote uniformity across classes in the latent space. With the regularization term, SPC can preserve the Gaussian structure of the latent code and achieve better coverage of the hidden space with classes uniformly distributed. Experimental results on 12 natural language understanding tasks demonstrate that our SPC effectively improves the performance of pre-trained language models for classification and regression. Extensive experiments show that SPC can enhance the generalization capability, robustness to label noise, and clustering quality of output representations.

JBHI Journal 2024 Journal Article

Unsupervised Domain Adaptation Fundus Image Segmentation via Multi-Scale Adaptive Adversarial Learning

  • Wei Zhou
  • Jianhang Ji
  • Wei Cui
  • Yingyuan Wang
  • Yugen Yi

Segmentation of the Optic Disc (OD) and Optic Cup (OC) is crucial for the early detection and treatment of glaucoma. Despite the strides made in deep neural networks, incorporating trained segmentation models for clinical application remains challenging due to domain shifts arising from disparities in fundus images across different healthcare institutions. To tackle this challenge, this study introduces an innovative unsupervised domain adaptation technique called Multi-scale Adaptive Adversarial Learning (MAAL), which consists of three key components. The Multi-scale Wasserstein Patch Discriminator (MWPD) module is designed to extract domain-specific features at multiple scales, enhancing domain classification performance and offering valuable guidance for the segmentation network. To further enhance model generalizability and explore domain-invariant features, we introduce the Adaptive Weighted Domain Constraint (AWDC) module. During training, this module dynamically assigns varying weights to different scales, allowing the model to adaptively focus on informative features. Furthermore, the Pixel-level Feature Enhancement (PFE) module enhances low-level features extracted at shallow network layers by incorporating refined high-level features. This integration ensures the preservation of domain-invariant information, effectively addressing domain variation and mitigating the loss of global features. Two publicly accessible fundus image databases are employed to demonstrate the effectiveness of our MAAL method in mitigating model degradation and improving segmentation performance. The achieved results outperform current state-of-the-art (SOTA) methods in both OD and OC segmentation.

IJCAI Conference 2023 Conference Paper

A Refined Upper Bound and Inprocessing for the Maximum K-plex Problem

  • Hua Jiang
  • Fusheng Xu
  • Zhifei Zheng
  • Bowen Wang
  • Wei Zhou

A k-plex of a graph G is an induced subgraph in which every vertex has at most k-1 nonadjacent vertices. The Maximum k-plex Problem (MKP) consists in finding a k-plex of the largest size, which is NP-hard and finds many applications. Existing exact algorithms mainly implement a branch-and-bound approach and improve performance by integrating effective upper bounds and graph reduction rules. In this paper, we propose a refined upper bound, which can derive a tighter upper bound than existing methods, and an inprocessing strategy, which performs graph reduction incrementally. We implement a new BnB algorithm for MKP that employs the two components to reduce the search space. Extensive experiments show that both the refined upper bound and the inprocessing strategy are very efficient in the reduction of search space. The new algorithm outperforms the state-of-the-art algorithms on the tested benchmarks significantly.

JBHI Journal 2023 Journal Article

Affinity Feature Strengthening for Accurate, Complete and Robust Vessel Segmentation

  • Tianyi Shi
  • Xiaohuan Ding
  • Wei Zhou
  • Feng Pan
  • Zengqiang Yan
  • Xiang Bai
  • Xin Yang

Vessel segmentation is crucial in many medical image applications, such as detecting coronary stenoses, retinal vessel diseases and brain aneurysms. However, achieving high pixel-wise accuracy, complete topology structure and robustness to various contrast variations are critical and challenging, and most existing methods focus only on achieving one or two of these aspects. In this paper, we present a novel approach, the affinity feature strengthening network (AFN), which jointly models geometry and refines pixel-wise segmentation features using a contrast-insensitive, multiscale affinity approach. Specifically, we compute a multiscale affinity field for each pixel, capturing its semantic relationships with neighboring pixels in the predicted mask image. This field represents the local geometry of vessel segments of different sizes, allowing us to learn spatial- and scale-aware adaptive weights to strengthen vessel features. We evaluate our AFN on four different types of vascular datasets: X-ray angiography coronary vessel dataset (XCAD), portal vein dataset (PV), digital subtraction angiography cerebrovascular vessel dataset (DSA) and retinal vessel dataset (DRIVE). Extensive experimental results demonstrate that our AFN outperforms the state-of-the-art methods in terms of both higher accuracy and topological metrics, while also being more robust to various contrast changes.

TIST Journal 2023 Journal Article

Asymmetrical Attention Networks Fused Autoencoder for Debiased Recommendation

  • Yihao Zhang
  • Chu Zhao
  • Weiwen Liao
  • Wei Zhou
  • Meng Yuan

Popularity bias is a massive challenge for autoencoder-based models, which decreases the level of personalization and hurts the fairness of recommendations. User reviews reflect their preferences and help mitigate bias or unfairness in the recommendation. However, most existing works typically incorporate user (item) reviews into a long document and then use the same module to process the document in parallel. Actually, the set of user reviews is completely different from the set of item reviews. User reviews are heterogeneous in that they reflect a variety of items purchased by users, while item reviews are only related to the item itself and are thus typically homogeneous. In this article, a novel asymmetric attention network fused with autoencoders is proposed, which jointly learns representations from the user and item reviews and implicit feedback to perform recommendations. Specifically, we design an asymmetric attentive module to capture rich representations from user and item reviews, respectively, which solves data sparsity and explainable problems. Furthermore, to further address popularity bias, we apply a noise-contrastive estimation objective to learn high-quality “de-popularity” embedding via the decoder structure. A series of extensive experiments are conducted on four benchmark datasets to show that leveraging user review information can eliminate popularity bias and improve performance compared to various state-of-the-art recommendation techniques.

AAAI Conference 2023 Conference Paper

Learning Semantic Alignment with Global Modality Reconstruction for Video-Language Pre-training towards Retrieval

  • Mingchao Li
  • Xiaoming Shi
  • Haitao Leng
  • Wei Zhou
  • Hai-Tao Zheng
  • Kuncai Zhang

Video-language pre-training for text-based video retrieval tasks is vitally important. Previous pre-training methods suffer from semantic misalignment because they ignore sequence-level alignment and focus only on critical-token alignment. To alleviate the problem, we propose a video-language pre-training framework, termed video-language pre-training For lEarning sEmantic aLignments (FEEL), to learn semantic alignments at the sequence level. Specifically, global modality reconstruction and a cross-modal self-contrasting method are utilized to better learn alignments at the sequence level. Extensive experimental results demonstrate the effectiveness of FEEL on text-based video retrieval and text-based video corpus moment retrieval.

JBHI Journal 2023 Journal Article

MaskSleepNet: A Cross-Modality Adaptation Neural Network for Heterogeneous Signals Processing in Sleep Staging

  • Hangyu Zhu
  • Wei Zhou
  • Cong Fu
  • Yonglin Wu
  • Ning Shen
  • Feng Shu
  • Huan Yu
  • Wei Chen

Deep learning methods have become an important tool for automatic sleep staging in recent years. However, most of the existing deep learning-based approaches are sharply constrained by the input modalities, where any insertion, substitution, or deletion of input modalities would directly render the model unusable or degrade its performance. To solve the modality heterogeneity problem, a novel network architecture named MaskSleepNet is proposed. It consists of a masking module, a multi-scale convolutional neural network (MSCNN), a squeezing and excitation (SE) block, and a multi-headed attention (MHA) module. The masking module consists of a modality adaptation paradigm that can cope with modality discrepancy. The MSCNN extracts features from multiple scales and specially sizes the feature concatenation layer to prevent invalid or redundant features from zero-setting channels. The SE block further optimizes the weights of the features to improve the network learning efficiency. The MHA module outputs the prediction results by learning the temporal information between the sleeping features. The performance of the proposed model was validated on two publicly available datasets, Sleep-EDF Expanded (Sleep-EDFX) and the Montreal Archive of Sleep Studies (MASS), and a clinical dataset, Huashan Hospital Fudan University (HSFU). The proposed MaskSleepNet achieves favorable performance under input modality discrepancy: for single-channel EEG signals it reaches 83.8%, 83.4%, and 80.5%; for two-channel EEG+EOG signals it reaches 85.0%, 84.9%, and 81.9%; and for three-channel EEG+EOG+EMG signals it reaches 85.7%, 87.5%, and 81.1% on Sleep-EDFX, MASS, and HSFU, respectively. In contrast, the accuracy of state-of-the-art approaches fluctuates widely between 69.0% and 89.4%. The experimental results show that the proposed model maintains superior performance and robustness in handling input modality discrepancy.

NeurIPS Conference 2022 Conference Paper

An Embarrassingly Simple Approach to Semi-Supervised Few-Shot Learning

  • Xiu-Shen Wei
  • H.-Y. Xu
  • Faen Zhang
  • Yuxin Peng
  • Wei Zhou

Semi-supervised few-shot learning consists in training a classifier to adapt to new tasks with limited labeled data and a fixed quantity of unlabeled data. Many sophisticated methods have been developed to address the challenges this problem poses. In this paper, we propose a simple but quite effective approach to predict accurate negative pseudo-labels of unlabeled data from an indirect learning perspective, and then augment the extremely label-constrained support set in few-shot classification tasks. Our approach can be implemented in just a few lines of code using only off-the-shelf operations, yet it is able to outperform state-of-the-art methods on four benchmark datasets.

IJCAI Conference 2022 Conference Paper

Speaker-Guided Encoder-Decoder Framework for Emotion Recognition in Conversation

  • Yinan Bao
  • Qianwen Ma
  • Lingwei Wei
  • Wei Zhou
  • Songlin Hu

The emotion recognition in conversation (ERC) task aims to predict the emotion label of an utterance in a conversation. Since the dependencies between speakers are complex and dynamic, consisting of intra- and inter-speaker dependencies, the modeling of speaker-specific information plays a vital role in ERC. Although existing researchers have proposed various methods of speaker interaction modeling, they cannot explore dynamic intra- and inter-speaker dependencies jointly, leading to insufficient comprehension of context and further hindering emotion prediction. To this end, we design a novel speaker modeling scheme that explores intra- and inter-speaker dependencies jointly in a dynamic manner. Besides, we propose a Speaker-Guided Encoder-Decoder (SGED) framework for ERC, which fully exploits speaker information for the decoding of emotion. We use different existing methods as the conversational context encoder of our framework, showing the high scalability and flexibility of the proposed framework. Experimental results demonstrate the superiority and effectiveness of SGED.

IJCAI Conference 2021 Conference Paper

Temporal Heterogeneous Information Network Embedding

  • Hong Huang
  • Ruize Shi
  • Wei Zhou
  • Xiao Wang
  • Hai Jin
  • Xiaoming Fu

Heterogeneous information network (HIN) embedding, learning the low-dimensional representation of multi-type nodes, has been applied widely and achieved excellent performance. However, most of the previous works focus more on static heterogeneous networks or learning node embedding within specific snapshots, and seldom attention has been paid to the whole evolution process and capturing all temporal dynamics. In order to fill the gap of obtaining multi-type node embeddings by considering all temporal dynamics during the evolution, we propose a novel temporal HIN embedding method (THINE). THINE not only uses attention mechanism and meta-path to preserve structures and semantics in HIN but also combines the Hawkes process to simulate the evolution of the temporal network. Our extensive evaluations with various real-world temporal HINs demonstrate that THINE achieves state-of-the-art performance in both static and dynamic tasks, including node classification, link prediction, and temporal link recommendation.

AAAI Conference 2020 Conference Paper

ALOHA: Artificial Learning of Human Attributes for Dialogue Agents

  • Aaron W. Li
  • Veronica Jiang
  • Steven Y. Feng
  • Julia Sprague
  • Wei Zhou
  • Jesse Hoey

For conversational AI and virtual assistants to communicate with humans in a realistic way, they must exhibit human characteristics such as expression of emotion and personality. Current attempts toward constructing human-like dialogue agents have presented significant difficulties. We propose Human Level Attributes (HLAs) based on tropes as the basis of a method for learning dialogue agents that can imitate the personalities of fictional characters. Tropes are characteristics of fictional personalities that are observed recurrently and determined by viewers’ impressions. By combining detailed HLA data with dialogue data for specific characters, we present a dataset, HLA-Chat, that models character profiles and gives dialogue agents the ability to learn characters’ language styles through their HLAs. We then introduce a three-component system, ALOHA (which stands for Artificial Learning of Human Attributes), that combines character space mapping, character community detection, and language style retrieval to build a character (or personality) specific language model. Our preliminary experiments demonstrate that two variations of ALOHA, combined with our proposed dataset, can outperform baseline models at identifying the correct dialogue responses of chosen target characters, and are stable regardless of the character’s identity, the genre of the show, and the context of the dialogue.

NeurIPS Conference 2020 Conference Paper

Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

  • Wei Zhou
  • Yiying Li
  • Yongxin Yang
  • Huaimin Wang
  • Timothy Hospedales

Off-Policy Actor-Critic (OffP-AC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference, and the critic in turn provides a loss for the actor that trains it to take actions with higher expected return. In this paper, we introduce a flexible and augmented meta-critic that observes the learning process and meta-learns an additional loss for the actor that accelerates and improves actor-critic learning. Compared to existing meta-learning algorithms, the meta-critic is rapidly learned online for a single task, rather than slowly over a family of tasks. Crucially, our meta-critic is designed for off-policy based learners, which currently provide state-of-the-art reinforcement learning sample efficiency. We demonstrate that online meta-critic learning benefits a variety of continuous control tasks when combined with the contemporary OffP-AC methods DDPG, TD3, and SAC.

IROS Conference 2020 Conference Paper

Soft Microrobotic Transmissions Enable Rapid Ground-Based Locomotion

  • Wei Zhou
  • Nick Gravish

In this paper we present the design, fabrication, testing, and control of a 0.4 g milliscale robot employing a soft polymer flexure transmission for rapid ground movement. The robot was constructed through a combination of two methods: a smart-composite-manufacturing (SCM) process to fabricate the actuators and robot chassis, and silicone elastomer molding and casting to fabricate a soft flexure transmission. We actuate the flexure transmission using two customized piezoelectric (PZT) actuators that attach to the transmission inputs. Through high-frequency oscillations, the actuators are capable of exciting vibrational resonance modes of the transmission which result in motion amplification on the transmission output. Directional spines on the transmission output generate traction force with the ground and drive the robot forward. By varying the excitation frequency of the soft transmission we can control locomotion speed, and when the transmission is oscillated at its resonance frequency we achieve high speeds with a peak speed of 439 mm/s (22 body lengths/s). By exciting traveling waves through the soft transmission, we were able to control the steering direction. Overall this paper demonstrates the feasibility of generating resonance behavior in millimeter scale soft robotic structures to achieve high-speed controllable locomotion.

AAAI Conference 2020 Short Paper

Who Are Controlled by The Same User? Multiple Identities Deception Detection via Social Interaction Activity (Student Abstract)

  • Jiacheng Li
  • Chunyuan Yuan
  • Wei Zhou
  • Jingli Wang
  • Songlin Hu

Social media has become a preferential place for sharing information. However, some users may create multiple accounts and manipulate them to deceive legitimate users. Most previous studies utilize verbal- or behavior-feature-based methods to solve this problem, but they are designed only for particular platforms, leading to low generality. In this paper, to support multiple platforms, we construct an interaction tree for each account based on its social interactions, which are a common characteristic of social platforms. Then we propose a new method to calculate the social interaction entropy of each account and detect accounts that are controlled by the same user. Experimental results on two real-world datasets show that the method has robust superiority over state-of-the-art methods.

ICML Conference 2019 Conference Paper

Feature-Critic Networks for Heterogeneous Domain Generalization

  • Yiying Li
  • Yongxin Yang
  • Wei Zhou
  • Timothy M. Hospedales

The well known domain shift issue causes model performance to degrade when deployed to a new target domain with different statistics to training. Domain adaptation techniques alleviate this, but need some instances from the target domain to drive adaptation. Domain generalisation is the recently topical problem of learning a model that generalises to unseen domains out of the box, and various approaches aim to train a domain-invariant feature extractor, typically by adding some manually designed losses. In this work, we propose a learning to learn approach, where the auxiliary loss that helps generalisation is itself learned. Beyond conventional domain generalisation, we consider a more challenging setting of heterogeneous domain generalisation, where the unseen domains do not share label space with the seen ones, and the goal is to train a feature representation that is useful off-the-shelf for novel data and novel categories. Experimental evaluation demonstrates that our method outperforms state-of-the-art solutions in both settings.

IJCAI Conference 2019 Conference Paper

Knowledge-enhanced Hierarchical Attention for Community Question Answering with Multi-task and Adaptive Learning

  • Min Yang
  • Lei Chen
  • Xiaojun Chen
  • Qingyao Wu
  • Wei Zhou
  • Ying Shen

In this paper, we propose a Knowledge-enhanced Hierarchical Attention for community question answering with Multi-task learning and Adaptive learning (KHAMA). First, we propose a hierarchical attention network to fully fuse knowledge from input documents and a knowledge base (KB) by exploiting the semantic compositionality of the input sequences. The external factual knowledge helps recognize background knowledge (entity mentions and their relationships) and eliminate noise information from long documents that have sophisticated syntactic and semantic structures. In addition, we build multiple CQA models with adaptive boosting and then combine these models to learn a more effective and robust CQA system. Furthermore, KHAMA is a multi-task learning model. It regards CQA as the primary task and question categorization as the auxiliary task, aiming at learning a category-aware document encoder and enhancing the quality of identifying essential information from long questions. Extensive experiments on two benchmarks demonstrate that KHAMA achieves substantial improvements over the compared methods.

IJCAI Conference 2019 Conference Paper

Spectral Perturbation Meets Incomplete Multi-view Data

  • Hao Wang
  • Linlin Zong
  • Bing Liu
  • Yan Yang
  • Wei Zhou

Beyond existing multi-view clustering, this paper studies a more realistic clustering scenario, referred to as incomplete multi-view clustering, where a number of data instances are missing in certain views. To tackle this problem, we explore spectral perturbation theory. In this work, we show a strong link between perturbation risk bounds and incomplete multi-view clustering. That is, as the similarity matrix fed into spectral clustering is a quantity bounded in magnitude O(1), we transfer the missing problem from data to similarity and tailor a matrix completion method for the incomplete similarity matrix. Moreover, we show that the minimization of perturbation risk bounds among different views maximizes the final fusion result across all views. This provides a solid fusion criterion for multi-view data. We motivate and propose a Perturbation-oriented Incomplete multi-view Clustering (PIC) method. Experimental results demonstrate the effectiveness of the proposed method.

AAAI Conference 2015 Conference Paper

Coupled Interdependent Attribute Analysis on Mixed Data

  • Can Wang
  • Chi-Hung Chi
  • Wei Zhou
  • Raymond Wong

In the real-world applications, heterogeneous interdependent attributes that consist of both discrete and numerical variables can be observed ubiquitously. The usual representation of these data sets is an information table, assuming the independence of attributes. However, very often, they are actually interdependent on one another, either explicitly or implicitly. Limited research has been conducted in analyzing such attribute interactions, which causes the analysis results to be more local than global. This paper proposes the coupled heterogeneous attribute analysis to capture the interdependence among mixed data by addressing coupling context and coupling weights in unsupervised learning. Such global couplings integrate the interactions within discrete attributes, within numerical attributes and across them to form the coupled representation for mixed-type objects based on dimension conversion and feature selection. This work makes one step forward towards explicitly modeling the interdependence of heterogeneous attributes among mixed data, verified by the applications in data structure analysis, data clustering evaluation, and density comparison. Substantial experiments on 12 UCI data sets show that our approach can effectively capture the global couplings of heterogeneous attributes and outperforms the state-of-the-art methods, supported by statistical analysis.