EAAI Journal 2026 Journal Article
A general framework for interactive semantic segmentation refinement of point clouds
- Peng Zhang
- Ting Wu
- Jinsheng Sun
- Weiqing Li
- Zhiyong Su
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
EAAI Journal 2026 Journal Article
EAAI Journal 2026 Journal Article
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
With the widespread deployment of large language models (LLMs) in human-computer interaction, dark patterns have extended from traditional visual interfaces to conversational AI systems. While existing research has confirmed the prevalence of dark patterns in LLMs, current evaluation benchmarks face critical challenges, including limited classification coverage, overlooked risks specific to reasoning models, and inadequate consideration of cross-linguistic differences. To address these limitations, we propose DarkBench+, an extended benchmark for evaluating dark patterns in LLMs. We construct an expanded taxonomy containing 10 major categories and 24 subcategories, introduce an annotation workflow combining manual and automated methods, and design 2,088 bilingual test samples in Chinese and English. This benchmark is the first to develop specialized evaluation dimensions for reasoning models, and it systematically evaluates dark pattern behaviors across nearly 40 mainstream LLMs. Experimental results reveal significant manipulation risks in the transparency displays of reasoning models, while the cross-linguistic evaluation analyzes how AI manipulation behavior differs across linguistic environments. These findings aim to promote more ethical and responsible LLM development.
AAAI Conference 2026 Conference Paper
Safety alignment instills in Large Language Models (LLMs) a critical capacity to refuse malicious requests. Prior work has modeled this refusal mechanism as a single linear direction in the activation space. We posit that this is an oversimplification that conflates two functionally distinct neural processes: the detection of harm and the execution of a refusal. In this work, we deconstruct this single representation into a Harm Detection Direction and a Refusal Execution Direction. Leveraging this fine-grained model, we introduce Differentiated Bi-Directional Intervention (DBDI), a new white-box framework that precisely neutralizes the safety alignment at a critical layer. DBDI applies adaptive projection nullification to the refusal execution direction while suppressing the harm detection direction via direct steering. Extensive experiments demonstrate that DBDI outperforms prominent jailbreaking methods, achieving up to a 97.88% attack success rate on models such as Llama-2. By providing a more granular and mechanistic framework, our work offers a new direction for the in-depth understanding of LLM safety alignment.
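The two vector operations named above (projection nullification and direct steering) can be illustrated on a single activation vector. This is a minimal sketch under stated assumptions, not DBDI itself: it uses toy 3-D vectors, fixed hypothetical directions `refusal_dir` and `harm_dir`, removes the full projection onto the refusal direction (the paper's adaptive scaling is not reproduced), and applies additive steering with a hypothetical strength `alpha`.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def nullify_projection(h, direction):
    """Remove the component of activation h along the given direction
    (orthogonal projection onto its complement)."""
    r = normalize(direction)
    coef = dot(h, r)
    return [x - coef * y for x, y in zip(h, r)]

def steer(h, direction, alpha):
    """Additive steering: shift h by -alpha along the normalized direction."""
    d = normalize(direction)
    return [x - alpha * y for x, y in zip(h, d)]

# Toy activation and hypothetical directions (not taken from any model).
h = [3.0, 4.0, 1.0]
refusal_dir = [1.0, 0.0, 0.0]
harm_dir = [0.0, 1.0, 0.0]
h1 = nullify_projection(h, refusal_dir)  # component along refusal_dir removed
h2 = steer(h1, harm_dir, alpha=2.0)      # pushed away from harm_dir
```

Full nullification zeroes the chosen component regardless of its magnitude, whereas steering subtracts a fixed amount, which is why the two are suited to different roles.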
AAAI Conference 2026 Conference Paper
Machine learning methods have been increasingly applied to solve Vehicle Routing Problems (VRPs). An efficient approach is to learn solution construction using deep neural networks. However, such constructive methods tend to converge prematurely, a critical barrier that severely hinders generalization across diverse distributions and scales. To overcome this, we introduce Elite-Pattern Reinforcement (EPR), a novel strategy designed to create a synergy between the diverse, exploratory nature of reinforcement learning and the high-quality, structured knowledge from classical heuristics. The strategy guides the learning process by reinforcing structural patterns from elite solutions, employing an elite-guided score modulation to integrate this external knowledge. The inherent symmetry of path patterns is also exploited to augment the structural information. This steers the policy away from premature convergence by enabling it to distinguish and favour elite path patterns over inferior ones. Integrating our strategy with four construction methods yields substantial performance improvements on the CVRPLIB and TSPLIB benchmarks. Furthermore, our approach outperforms state-of-the-art learning-based methods, demonstrating superior generalization across diverse distributions and scales.
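One plausible reading of elite-guided score modulation is sketched below, assuming a constructive decoder that scores candidate next nodes with logits. The helpers `elite_edges` and `modulated_probs` and the bonus `beta` are hypothetical, not the paper's formulation. Storing edges as unordered pairs is one way to exploit path symmetry, since a route and its reverse share the same edges.

```python
import math

def elite_edges(elite_routes):
    """Collect edges from elite routes as unordered pairs, so (a, b) and
    (b, a) count as the same structural pattern."""
    edges = set()
    for route in elite_routes:
        for a, b in zip(route, route[1:]):
            edges.add(frozenset((a, b)))
    return edges

def modulated_probs(current, candidates, logits, edges, beta=1.0):
    """Add a bonus beta to each candidate whose edge from `current` appears
    in an elite solution, then renormalize with a softmax."""
    scores = [s + (beta if frozenset((current, c)) in edges else 0.0)
              for c, s in zip(candidates, logits)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Toy elite tour starting and ending at depot 0.
edges = elite_edges([[0, 2, 1, 3, 0]])
probs = modulated_probs(current=1, candidates=[2, 3, 4],
                        logits=[0.0, 0.0, 0.0], edges=edges, beta=1.0)
```

Here nodes 2 and 3 both extend elite edges from node 1, so they receive equal, higher probability than node 4.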
AAAI Conference 2026 Conference Paper
Medical vision-language pre-training (VLP) offers significant potential for advancing medical image understanding by leveraging paired image-report data. However, existing methods are limited by False Negatives (FaNe) induced by semantically similar texts and insufficient fine-grained cross-modal alignment. To address these limitations, we propose FaNe, a semantic-enhanced VLP framework. To mitigate false negatives, we introduce a semantic-aware positive pair mining strategy based on text-text similarity with adaptive normalization. Furthermore, we design a text-conditioned sparse attention pooling module to enable fine-grained image-text alignment through localized visual representations guided by textual cues. To strengthen intra-modal discrimination, we develop a hard-negative aware contrastive loss that adaptively reweights semantically similar negatives. Extensive experiments on five downstream medical imaging benchmarks demonstrate that FaNe achieves state-of-the-art performance across image classification, object detection, and semantic segmentation, validating the effectiveness of our framework.
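A minimal sketch of semantic-aware positive mining feeding a multi-positive contrastive loss follows. It assumes precomputed image-text and text-text similarity matrices; the per-row min-max scaling is a stand-in for the paper's adaptive normalization, the threshold `tau` and temperature are hypothetical, and the hard-negative reweighting is not reproduced.

```python
import math

def mine_positives(text_sim, tau):
    """For each anchor i, return the indices treated as positives: the
    diagonal plus any text whose row-normalized similarity is >= tau."""
    positives = []
    for i, row in enumerate(text_sim):
        lo, hi = min(row), max(row)
        span = (hi - lo) or 1.0  # guard against a constant row
        norm = [(v - lo) / span for v in row]  # per-row min-max (assumed)
        positives.append({j for j, v in enumerate(norm) if v >= tau or j == i})
    return positives

def multi_positive_infonce(img_txt_sim, positives, temperature=0.1):
    """Image-to-text InfoNCE in which all mined positives share the
    numerator, so near-duplicate reports are not pushed apart as negatives."""
    total = 0.0
    for i, row in enumerate(img_txt_sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total += -math.log(sum(exps[j] for j in positives[i]) / sum(exps))
    return total / len(img_txt_sim)

text_sim = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]     # toy values
img_txt_sim = [[0.8, 0.7, 0.1], [0.7, 0.8, 0.2], [0.1, 0.2, 0.9]]  # toy values
pos = mine_positives(text_sim, tau=0.8)
loss = multi_positive_infonce(img_txt_sim, pos)
```

Reports 0 and 1 are nearly identical, so each is treated as a positive for the other instead of a false negative, which lowers the loss relative to diagonal-only positives.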
AAAI Conference 2026 Conference Paper
Recent unified models have demonstrated that the reasoning capacity of Multimodal Large Language Models (MLLMs) can be leveraged to facilitate diffusion-based image generation with impressive flexibility and performance. However, approaches that rely heavily on MLLMs for high-level semantic encoding often struggle with fine-grained visual tasks like image editing and virtual try-on. To address this gap, we propose FUSE, a unified framework excelling at both high-level vision–language understanding and fine-grained generation. First, we introduce a Semantic-to-Detail Connector that pre-aligns fine-grained visual features with the MLLM's semantic space. This design counteracts the low-level information loss inherent in MLLM encodings, creating a unified representation that steers the diffusion process with both global semantics and rich local details. Second, to further enhance semantic awareness and detail preservation, we introduce Adaptive-GRPO, a post-training objective that dynamically balances semantic coherence against pixel-level fidelity. The integration of these two innovations allows FUSE to generate images that are both semantically faithful and visually fine-grained. Comprehensive experiments on text-to-image and instruction-guided editing benchmarks show that FUSE significantly outperforms existing unified baselines, achieving 0.89 on GenEval, 0.65 on WISE, and 3.88 on ImageEdit.
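GRPO-style training scores a group of samples per prompt and normalizes each reward within its group. The sketch below assumes that Adaptive-GRPO blends a semantic reward and a pixel-fidelity reward with a weight `w`; the linear blend and the names are assumptions, and how the weight is adapted during training is not reproduced.

```python
import statistics

def combined_rewards(semantic, fidelity, w):
    """Blend two per-sample rewards; w in [0, 1] weights semantic coherence
    (the adaptive schedule for w is not reproduced here)."""
    return [w * s + (1.0 - w) * f for s, f in zip(semantic, fidelity)]

def group_advantages(rewards, eps=1e-8):
    """Group-relative advantage: subtract the group mean and divide by the
    group standard deviation (eps avoids division by zero)."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Toy rewards for four images sampled from the same prompt.
r = combined_rewards([0.9, 0.5, 0.7, 0.3], [0.2, 0.6, 0.5, 0.1], w=0.5)
adv = group_advantages(r)
```

Because advantages are relative to the group, shifting `w` toward fidelity changes which samples are reinforced even when absolute reward scales differ.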
AAAI Conference 2026 Conference Paper
Trained on various human-authored corpora, Large Language Models (LLMs) have demonstrated some capability to reflect specific human-like traits (e.g., personality or values) when prompted, benefiting applications such as personalized LLMs and social simulations. However, existing methods suffer from the superficial elicitation problem: LLMs can only be steered to mimic shallow and unstable stylistic patterns, failing to embody the desired traits as precisely and consistently across diverse tasks as humans do. To address this challenge, we propose IROTE, a novel in-context method for stable and transferable trait elicitation. Drawing on psychological theories suggesting that traits are formed through identity-related reflection, our method automatically generates and optimizes a textual self-reflection within prompts, comprising self-perceived experience, to stimulate LLMs' trait-driven behavior. The optimization iteratively maximizes an information-theoretic objective that strengthens the connection between LLMs' behavior and the target trait while reducing noisy redundancy in the reflection, without any fine-tuning, yielding an evocative and compact trait reflection. Extensive experiments across three human trait systems show that a single IROTE-generated self-reflection can induce LLMs' stable impersonation of the target trait across diverse downstream tasks beyond simple questionnaire answering, consistently outperforming existing strong baselines.
AAAI Conference 2026 Conference Paper
Recently, Multimodal Large Language Models (MLLMs) have achieved considerable advancements in vision-language tasks, yet they can still produce potentially harmful or untrustworthy content. Despite substantial work investigating the trustworthiness of language models, MLLMs' capability to act honestly, especially when faced with visually unanswerable questions, remains largely underexplored. This work presents the first systematic assessment of honesty behaviors across various MLLMs. We ground honesty in models' response behaviors to unanswerable visual questions, define four representative types of such questions, and construct MoHoBench, a large-scale MLLM honesty benchmark consisting of 12k+ visual question samples, whose quality is ensured by multi-stage filtering and human verification. Using MoHoBench, we benchmarked the honesty of 28 popular MLLMs and conducted a comprehensive analysis. Our findings show that: (1) most models fail to appropriately refuse to answer when necessary, and (2) MLLMs' honesty is not solely a language modeling issue but is deeply influenced by visual information, necessitating the development of dedicated methods for multimodal honesty alignment. Therefore, we implemented initial alignment methods using supervised and preference learning to improve honesty behavior, providing a foundation for future work on trustworthy MLLMs.
AAAI Conference 2026 Conference Paper
Video diffusion generation suffers from critical sampling efficiency bottlenecks, particularly for large-scale models and long contexts. Existing video acceleration methods, adapted from image-based techniques, lack single-step distillation capability for large-scale video models and generalization to conditional downstream tasks. To bridge this gap, we propose the Video Phased Adversarial Equilibrium (V-PAE), a distillation framework that enables high-quality, single-step video generation from large-scale video models. Our approach employs a two-phase process. (i) Stability priming is a warm-up process that aligns the distributions of real and generated videos, improving the stability of the subsequent single-step adversarial distillation. (ii) Unified adversarial equilibrium is a flexible self-adversarial process that reuses generator parameters for the discriminator backbone, achieving a co-evolutionary adversarial equilibrium in the Gaussian noise space. For conditional tasks, we focus on preserving video-image subject consistency, which otherwise deteriorates due to semantic degradation and conditional frame collapse during distillation training for image-to-video (I2V) generation. Comprehensive experiments on VBench-I2V demonstrate that V-PAE outperforms existing acceleration methods by an average of 5.8% in the overall quality score, including semantic alignment, temporal coherence, and frame quality. In addition, our approach reduces the diffusion latency of a large-scale video model (e.g., Wan2.1-I2V-14B) by a factor of 100 while preserving competitive performance.
YNIMG Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Domain generalization remains a critical challenge for deploying neural networks, particularly in out-of-distribution object detection. The distributional discrepancy between training conditions (e.g., daytime-sunny) and realistic conditions (e.g., night-rainy) inevitably produces imprecise localization and misclassification. To address these issues, we propose a unified interaction consistency learning (UICL) framework, a novel single-source domain-generalized method designed to learn intra-class domain-invariant representations. Specifically, we put forth a cross-domain interaction mechanism that exchanges region proposals between the original and augmented pipelines, enriching the diversity of instance-level representations. Building upon this, we propose prediction-guided consistency learning to unify the interaction mechanism and harmonize the cross-domain representations, contributing to a discriminative prediction distribution under domain shift. In addition, we devise a cyclic interaction resilient detection strategy, which mitigates inaccurate predictions caused by partial occlusion and ambiguous boundaries across different domains. Extensive experiments show that UICL significantly improves the robustness of detectors over several target domains, achieving state-of-the-art generalization performance on the diverse weather benchmark.
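Because the cross-domain interaction exchanges region proposals, both pipelines classify the same boxes and their predictions can be compared one-to-one. A minimal sketch of a prediction consistency term, assuming a symmetric KL divergence between per-proposal class distributions (a common choice, not necessarily the loss used in UICL):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def consistency_loss(logits_orig, logits_aug):
    """Average symmetric KL between class distributions predicted for the
    same proposals by the original and augmented pipelines."""
    total = 0.0
    for lo, la in zip(logits_orig, logits_aug):
        p, q = softmax(lo), softmax(la)
        total += 0.5 * (kl(p, q) + kl(q, p))
    return total / len(logits_orig)
```

The loss is zero when both pipelines agree on every shared proposal and grows as the augmented view shifts the predicted class.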
AAAI Conference 2025 Conference Paper
An essential component in Large Language Models (LLMs) is Rotary Position Encoding (RoPE), which efficiently manages positional dependencies in long-context modeling. However, when the number of input tokens surpasses the pretrained capacity of LLMs, their ability to process and generate text is markedly weakened. Although position interpolation techniques for RoPE can mitigate this issue, an increase in interpolations leads to a decrease in positional resolution. To tackle this challenge, drawing inspiration from the Bloch Sphere representation, we propose a novel rotary position encoding on a three-dimensional sphere, named 3D Rotary Position Encoding (3D-RPE). 3D-RPE is an advanced version of the widely used 2D RoPE, with two major advantages for modeling long contexts: controllable long-term decay and improved position resolution. For controllable long-term decay, 3D-RPE allows for the regulation of long-term decay within the chunk size, ensuring the modeling of relative positional information between tokens at a distant relative position. For improved position resolution, 3D-RPE can mitigate the degradation of position resolution caused by position interpolation on RoPE. We have conducted experiments on long-context Natural Language Understanding (NLU) and long sequence Language Modeling (LM) tasks. From the experimental results, 3D-RPE achieved performance improvements over RoPE, especially in long-context NLU tasks.
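For reference, standard 2D RoPE and linear position interpolation can be sketched as follows. This is not 3D-RPE; it only illustrates the resolution problem described above: scaling positions by `train_len / target_len` shrinks the rotation angle between adjacent positions by the same factor. The lengths 4096 and 16384 are illustrative.

```python
import math

def rope_angles(position, dim, base=10000.0, scale=1.0):
    """Rotation angle for each of the dim/2 two-dimensional pairs.
    scale < 1 implements linear position interpolation (m -> m * scale)."""
    return [position * scale * base ** (-2 * i / dim) for i in range(dim // 2)]

def apply_rope(x, position, base=10000.0, scale=1.0):
    """Rotate consecutive pairs (x[2i], x[2i+1]) by their RoPE angles."""
    out = []
    for i, theta in enumerate(rope_angles(position, len(x), base, scale)):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([a * c - b * s, a * s + b * c])
    return out

train_len, target_len = 4096, 16384
scale = train_len / target_len  # 0.25
# Angle gap between positions 0 and 1 in the highest-frequency pair.
gap_plain = rope_angles(1, 8)[0] - rope_angles(0, 8)[0]
gap_interp = rope_angles(1, 8, scale=scale)[0] - rope_angles(0, 8, scale=scale)[0]
```

Because rotations are applied to both queries and keys, their dot product depends only on the position offset; interpolation preserves that property but packs neighboring positions closer together in angle.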
EAAI Journal 2025 Journal Article
JAIR Journal 2025 Journal Article
In recent years, the realm of computer vision has experienced a significant surge in the importance of 3D object detection, especially in the context of autonomous driving. The capability to precisely identify the locations, dimensions, and types of key 3D objects surrounding an autonomous vehicle is crucial, rendering 3D object detection a vital component of any advanced perception system. This review delivers an extensive overview of the emerging technologies in 3D object detection tailored for autonomous vehicles. It encompasses a thorough examination, evaluation, and integration of the current research landscape in this domain, staying up-to-date with the latest advancements in 3D object detection and suggesting prospective avenues for future research. Our survey begins by clarifying the principles of 3D object detection and addressing its present challenges in the 3D domain. We then introduce three distinct taxonomies: camera-based, point-cloud-based, and multi-modality-based approaches, providing a comprehensive classification of contemporary 3D object detection methodologies from various angles. Diverging from previous reviews, this paper also highlights and scrutinizes common issues and solutions for specific scenarios (such as pedestrian detection, lane lines, roadside cameras, and weather conditions) in object detection. Furthermore, we conduct an in-depth analysis and comparison of different classifications and methods, utilizing various datasets and experimental outcomes. Finally, we suggest several potential research directions, offering valuable insights for the ongoing evolution of 3D object detection technology. This review aims to serve as a comprehensive resource for researchers and practitioners in the field, guiding future innovations in 3D object detection for autonomous driving.
AAAI Conference 2025 Conference Paper
Aerodynamic coefficient prediction is pivotal in the design, performance evaluation, and motion control of aircraft and vehicles. Integrating artificial neural networks into aerodynamic coefficient prediction offers a promising alternative to traditional numerical methods, which are burdened by extensive computation and high costs. Nevertheless, this data-driven approach faces several critical challenges that limit further performance gains: i) current research lacks a deep understanding of the complex interplay between an object's shape and its aerodynamic characteristics; ii) high-quality aerodynamic data are scarce, so models trained on limited datasets generalize poorly and struggle to accurately predict aerodynamic performance under new shapes or conditions. To overcome these challenges, we introduce a framework that employs cross-attention to capture the close interplay between shape and flow conditions and allows models pre-trained on general shape datasets to be used directly, mitigating the scarcity of aerodynamic data. Furthermore, to strengthen the inference capabilities of this data-driven approach, we integrate physical information constraints into the model, using them as guiding principles to enhance its predictive power under unknown conditions. Experimental validation demonstrates that the proposed method performs strongly across multiple aerodynamic prediction tasks, advancing the field of aerodynamic prediction and supporting the design optimization of complex systems such as aircraft and vehicles.
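Cross-attention between shape and flow conditions can be sketched as scaled dot-product attention in which flow-condition tokens act as queries and shape tokens supply keys and values. This toy version omits learned projections, multiple heads, and the physics constraints, and the assignment of token roles is an assumption about how the framework combines the two inputs.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: each query attends over keys and
    returns the attention-weighted average of the corresponding values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

shape_tokens = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]  # toy shape features
flow_tokens = [[1.0, 0.0]]                          # toy flow-condition token
fused = cross_attention(flow_tokens, shape_tokens, shape_tokens)
```

Each fused vector is a convex combination of shape features, weighted by how relevant each shape token is to the given flow condition.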
YNICL Journal 2025 Journal Article
NeurIPS Conference 2025 Conference Paper
Equilibrium learning in adversarial games is an important topic widely examined in the fields of game theory and reinforcement learning (RL). The pursuit-evasion game (PEG), an important class of real-world games from robotics and security, requires exponential time to solve exactly. When the underlying graph structure varies, even state-of-the-art RL methods require recomputation or at least fine-tuning, which can be time-consuming and impair real-time applicability. This paper proposes an Equilibrium Policy Generalization (EPG) framework to effectively learn a generalized policy with robust cross-graph zero-shot performance. In the context of PEGs, our framework is generally applicable to both pursuer and evader sides in both no-exit and multi-exit scenarios. To our knowledge, these two generalizability properties are the first to appear in this domain. The core idea of the EPG framework is to train an RL policy across different graph structures against the equilibrium policy for each single graph. To construct an equilibrium oracle for single-graph policies, we present a dynamic programming (DP) algorithm that provably generates pure-strategy Nash equilibria with near-optimal time complexity. To guarantee scalability with respect to the number of pursuers, we further extend DP and RL by designing a grouping mechanism and a sequence model for joint policy decomposition, respectively. Experimental results show that, using equilibrium guidance and a distance feature proposed for cross-graph PEG training, the EPG framework guarantees desirable zero-shot performance in various unseen real-world graphs. In addition, when trained under an equilibrium heuristic proposed for graphs with exits, our generalized pursuer policy can even match the performance of the fine-tuned policies from state-of-the-art PEG methods.
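As an illustration of a dynamic-programming equilibrium oracle on a single graph, the sketch below implements the classic retrograde (backward-induction) analysis for one pursuer and one evader with alternating moves, where each player may stay or move to a neighbor. It is a textbook technique, not the paper's algorithm, and it ignores exits and the grouping mechanism for multiple pursuers. Choosing the minimizing (pursuer) or maximizing (evader) successor at each state yields pure strategies.

```python
INF = float("inf")

def capture_times(adj):
    """vp[p][e]: pursuer moves needed to catch the evader when the pursuer
    moves next, under optimal play (INF = evader escapes forever);
    ve[p][e]: the same quantity when the evader moves next."""
    n = len(adj)
    closed = [set(adj[v]) | {v} for v in range(n)]  # stay or move
    vp = [[0 if p == e else INF for e in range(n)] for p in range(n)]
    ve = [[0 if p == e else INF for e in range(n)] for p in range(n)]
    changed = True
    while changed:  # values only decrease, so the loop terminates
        changed = False
        for p in range(n):
            for e in range(n):
                if p == e:
                    continue
                new_e = max(vp[p][e2] for e2 in closed[e])      # evader maximizes
                new_p = 1 + min(ve[p2][e] for p2 in closed[p])  # pursuer minimizes
                if new_e < ve[p][e]:
                    ve[p][e] = new_e
                    changed = True
                if new_p < vp[p][e]:
                    vp[p][e] = new_p
                    changed = True
    return vp, ve

def pursuer_move(adj, ve, p, e):
    """Pure strategy: move to the closed neighbor minimizing capture time."""
    return min(sorted(set(adj[p]) | {p}), key=lambda p2: ve[p2][e])

path = [[1], [0, 2], [1, 3], [2]]            # 0-1-2-3
cycle4 = [[1, 3], [0, 2], [1, 3], [2, 0]]    # 4-cycle
vp_path, ve_path = capture_times(path)
vp_cyc, _ = capture_times(cycle4)
```

On the path the evader is cornered after three pursuer moves, while on a 4-cycle a single pursuer can never force capture from the opposite vertex.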
ICML Conference 2025 Conference Paper
Scattering characteristics of synthetic aperture radar (SAR) targets typically depend on the observed azimuth and depression angles. However, in practice, it is difficult to obtain adequate training samples at all observation angles, which can lead to poor robustness of deep networks. In this paper, we first propose a Gamma-Distribution Principal Component Analysis ($\Gamma$PCA) model that fully accounts for the statistical characteristics of SAR data. $\Gamma$PCA derives consistent convolution kernels that effectively capture the angle-invariant features of the same target at various attitude angles, thus alleviating deep models' sensitivity to angle changes in the SAR target recognition task. We validate the $\Gamma$PCA model with two commonly used backbones, ResNet and ViT, and conduct multiple robustness experiments on the MSTAR benchmark dataset. The experimental results demonstrate that $\Gamma$PCA effectively enables the model to withstand substantial distributional discrepancy caused by angle changes. Additionally, the $\Gamma$PCA convolution kernels require no parameter updates, introducing no extra computational burden to the network. The source code is available at https://github.com/ChGrey/GammaPCA.
AAAI Conference 2025 Conference Paper
Personalized federated learning (PFL) on graphs is an emerging field focusing on the collaborative development of architectures across multiple clients, each with a distinct graph data distribution, while adhering to strict privacy standards. This area often requires extensive expert intervention in model design, which is a significant limitation. Recent advancements have aimed to automate the search for graph neural network architectures, incorporating large language models (LLMs) for their advanced reasoning and self-reflection capabilities. However, two technical challenges persist. First, although LLMs are effective in natural language processing, their ability to meet the complex demands of graph neural architecture search (GNAS) is still being explored. Second, while LLMs can guide the architecture search process, they do not directly solve the issue of client drift due to heterogeneous data distributions. To address these challenges, we introduce a novel method, Personalized Federated Graph Neural Architecture Search (PFGNAS). This approach employs a task-specific prompt to continuously identify and integrate optimal GNN architectures. To counteract client drift, PFGNAS uses a supernet weight-sharing strategy, which optimizes the local architectures while ensuring client-specific personalization. Extensive evaluations show that PFGNAS significantly outperforms traditional PFL methods, highlighting the advantages of integrating LLMs into personalized federated learning environments.
EAAI Journal 2025 Journal Article
NeurIPS Conference 2025 Conference Paper
Underwater exploration offers critical insights into our planet and attracts increasing attention for its broader applications in resource exploration, national security, etc. We study underwater scene understanding methods, which aim to achieve automated underwater exploration. The underwater scene understanding task demands multi-task perceptions from multiple granularities. However, the absence of large-scale underwater multi-task instruction-tuning datasets hinders the progress of this research. To bridge this gap, we construct NautData, a dataset containing 1.45M image-text pairs supporting eight underwater scene understanding tasks. It enables the development and thorough evaluation of underwater scene understanding models. Underwater image degradation is a widely recognized challenge that interferes with underwater tasks. To improve the robustness of underwater scene understanding, we introduce physical priors derived from underwater imaging models and propose a plug-and-play vision feature enhancement (VFE) module, which explicitly restores clear underwater information. We integrate this module into the renowned baselines LLaVA-1.5 and Qwen2.5-VL and build our underwater LMM, NAUTILUS. Experiments conducted on NautData and public underwater datasets demonstrate the effectiveness of the VFE module, consistently improving the performance of both baselines on the majority of supported tasks, thus ensuring the superiority of NAUTILUS in the underwater scene understanding area. Data and models are available at https://github.com/H-EmbodVis/NAUTILUS.
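A common physical prior in underwater imaging is the simplified formation model $I = J t + B(1 - t)$ with transmission $t = e^{-\beta d}$, where $J$ is the clear radiance, $B$ the background light, $d$ the distance, and $\beta$ a per-channel attenuation coefficient. Whether the VFE module uses exactly this model is an assumption; the sketch only shows how such a prior can be inverted for a single channel value, with an illustrative `beta`.

```python
import math

def transmission(distance, beta):
    """Transmission t = exp(-beta * d) for one color channel."""
    return math.exp(-beta * distance)

def degrade(clear, background, t):
    """Simplified underwater formation model: I = J * t + B * (1 - t)."""
    return clear * t + background * (1.0 - t)

def restore(observed, background, t, t_min=0.1):
    """Invert the model for J, clamping t so that very small transmissions
    do not amplify noise without bound."""
    return (observed - background * (1.0 - t)) / max(t, t_min)

t_red = transmission(distance=2.0, beta=0.3)  # illustrative red-channel attenuation
observed = degrade(clear=0.6, background=0.2, t=t_red)
recovered = restore(observed, background=0.2, t=t_red)
```

In practice $B$ and $t$ must themselves be estimated, which is where a learned module can use the prior rather than apply it in closed form.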
IJCAI Conference 2025 Conference Paper
Due to the emergence of AI systems that interact with the physical environment, there is increased interest in incorporating physical reasoning capabilities into those AI systems. But are physical reasoning capabilities alone enough to operate in a real physical environment? In the real world, we constantly face novel situations we have not encountered before. As humans, we are competent at successfully adapting to those situations. Similarly, an agent needs the ability to function under the impact of novelties in order to properly operate in an open-world physical environment. To facilitate the development of such AI systems, we propose a new benchmark, NovPhy, that requires an agent to reason about physical scenarios in the presence of novelties and take actions accordingly. The benchmark consists of tasks that require agents to detect and adapt to novelties in physical scenarios. To create tasks in the benchmark, we develop eight novelties representing a diverse novelty space and apply them to five commonly encountered scenarios in a physical environment, related to applying forces and motions such as rolling, falling, and sliding of objects. Following our benchmark design, we evaluate two capabilities of an agent: its performance on a novelty when it is applied to different physical scenarios, and its performance on a physical scenario when different novelties are applied to it. We conduct a thorough evaluation with human players, learning agents, and heuristic agents. Our evaluation shows that human performance far exceeds that of the agents. Some agents, even with good performance on normal tasks, perform significantly worse when there is a novelty, and the agents that can adapt to novelties typically adapt more slowly than humans. We aim to promote the development of intelligent agents capable of performing at the human level or above when operating in open-world physical environments. Benchmark website: https://github.com/phy-q/novphy
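The two evaluation views correspond to aggregating a novelty-by-scenario score matrix along different axes. A minimal sketch, assuming one score per (novelty, scenario) pair and a simple mean as the aggregate; the matrix below is toy data, not benchmark results.

```python
def novelty_scores(results):
    """results[n][s] is an agent's score on novelty n applied to scenario s.
    Returns per-novelty means (across scenarios) and per-scenario means
    (across novelties)."""
    per_novelty = [sum(row) / len(row) for row in results]
    per_scenario = [sum(col) / len(col) for col in zip(*results)]
    return per_novelty, per_scenario

toy = [[1.0, 0.5],
       [0.0, 0.5]]  # 2 novelties x 2 scenarios, made-up pass rates
per_novelty, per_scenario = novelty_scores(toy)
```

For the full benchmark the matrix would be 8 novelties by 5 scenarios; the row view asks how robust the agent is to one novelty, the column view how robust it is within one scenario.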
NeurIPS Conference 2025 Conference Paper
Although significant progress has been made in audio-driven talking head generation, text-driven methods remain underexplored. In this work, we present OmniTalker, a unified framework that jointly generates synchronized talking audio-video content from input text while emulating the target identity's speaking and facial movement styles, including speech characteristics, head motion, and facial dynamics. Our framework adopts a dual-branch diffusion transformer (DiT) architecture, with one branch dedicated to audio generation and the other to video synthesis. At the shallow layers, cross-modal fusion modules are introduced to integrate information between the two modalities. In deeper layers, each modality is processed independently, with the generated audio decoded by a vocoder and the video rendered using a GAN-based high-quality visual renderer. Leveraging DiT’s in-context learning capability through a masked-infilling strategy, our model can simultaneously capture both audio and visual styles without requiring explicit style extraction modules. Thanks to the efficiency of the DiT backbone and the optimized visual renderer, OmniTalker achieves real-time inference at 25 FPS. To the best of our knowledge, OmniTalker is the first one-shot framework capable of jointly modeling speech and facial styles in real time. Extensive experiments demonstrate its superiority over existing methods in terms of generation quality, particularly in preserving style consistency and ensuring precise audio-video synchronization, all while maintaining efficient inference.
JBHI Journal 2025 Journal Article
The blood pressure (BP) waveform is a vital source of physiological and pathological information about the cardiovascular system. This study proposes a novel attention-guided conditional generative adversarial network (cGAN), named PPG2BP-cGAN, to estimate BP waveforms from photoplethysmography (PPG) signals. The proposed model comprises a generator and a discriminator. Specifically, the UNet3+-based generator integrates a full-scale skip connection structure with a modified polarized self-attention module based on a spatial-temporal attention mechanism. The discriminator is a PatchGAN, which increases its discriminative power on the generated BP waveforms by enlarging the receptive field through fully convolutional layers. We demonstrate the superior BP waveform prediction performance of the proposed method compared to state-of-the-art (SOTA) techniques on two independent datasets. Our model was first pre-trained on a dataset containing 683 subjects and then tested on a public dataset. Experimental results on the Multi-parameter Intelligent Monitoring in Intensive Care dataset show that the proposed method achieves a root mean square error of 3.54, a mean absolute error of 2.86, and a Pearson coefficient of 0.99 for BP waveform estimation. Furthermore, the estimation errors (mean error ± standard deviation) for systolic BP and diastolic BP are 0.72 ± 4.34 mmHg and 0.41 ± 2.48 mmHg, respectively, meeting the Association for the Advancement of Medical Instrumentation (AAMI) standard. Our approach exhibits significant superiority over SOTA techniques on independent datasets, highlighting its potential for future applications in continuous cuffless BP waveform measurement.
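The reported metrics can be computed as below. This sketch assumes aligned sample-wise waveforms and per-beat systolic or diastolic values in mmHg; the AAMI check uses the commonly cited criterion of mean error within ±5 mmHg and error standard deviation of at most 8 mmHg, and omits the standard's subject-count requirement. The sample values are toy data.

```python
import math
import statistics

def waveform_metrics(pred, ref):
    """RMSE, MAE and Pearson correlation between estimated and reference
    BP samples."""
    n = len(ref)
    err = [p - r for p, r in zip(pred, ref)]
    rmse = math.sqrt(sum(e * e for e in err) / n)
    mae = sum(abs(e) for e in err) / n
    mp, mr = sum(pred) / n, sum(ref) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_r = sum((r - mr) ** 2 for r in ref)
    pearson = cov / math.sqrt(var_p * var_r)
    return rmse, mae, pearson

def meets_aami(pred_bp, ref_bp):
    """Mean error within +/-5 mmHg and error SD of at most 8 mmHg."""
    err = [p - r for p, r in zip(pred_bp, ref_bp)]
    return abs(statistics.mean(err)) <= 5.0 and statistics.stdev(err) <= 8.0

ref = [80.0, 100.0, 120.0, 100.0]   # toy reference values (mmHg)
pred = [82.0, 98.0, 121.0, 101.0]   # toy estimates (mmHg)
rmse, mae, r = waveform_metrics(pred, ref)
```

The AAMI check is applied separately to systolic and diastolic errors, which is how results like 0.72 ± 4.34 mmHg are judged against the ±5 / 8 mmHg limits.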
JBHI Journal 2025 Journal Article
The R-peak in electrocardiogram (ECG) signals is a critical physiological marker for the diagnosis of cardiovascular diseases. Although various R-peak detection methods have been proposed, their performance is often hindered by noise, especially in dynamic ECG monitoring. Furthermore, the potential of harnessing complementary information from 12-lead ECG signals has not been fully exploited. To address these challenges, this study treated 12-lead ECG data as two-dimensional images and employed YOLOv5 as the model's backbone for R-peak detection, effectively transforming a signal segmentation task into an object detection task in images. Specifically, considering that R-peak positions are consistent across different leads, we proposed a strip attention mechanism that treats horizontal or vertical strips as tokens for computing inter- and intra-strip attention, enhancing the model's ability to capture R-peak positional information and likelihood. Additionally, a one-dimensional Manhattan distance-based NMS algorithm was used to minimize redundant detection boxes, thereby enhancing model performance. The proposed model was rigorously evaluated on two publicly available datasets, INCART and LUDB, under varying noise conditions. On the INCART dataset, the model achieved F1 scores of 99.97%, 99.86%, 99.63%, and 98.00% on the original signals and at SNR = 10 dB, SNR = 5 dB, and SNR = 0 dB, respectively. Similarly, on the LUDB dataset, the F1 scores were 99.89%, 100%, 100%, and 99.86% for the corresponding noise levels. Extensive testing across multiple datasets and noise scenarios demonstrated that the proposed model outperformed existing state-of-the-art methods in terms of accuracy, noise robustness, and generalization capability.
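A minimal sketch of greedy one-dimensional NMS, assuming each detection is reduced to a (sample position, score) pair and that the Manhattan distance between two detections is the absolute difference of their positions. The distance threshold is a hypothetical parameter, and the paper's exact suppression rule may differ.

```python
def nms_1d(detections, min_distance):
    """Greedy 1-D non-maximum suppression over (position, score) pairs.
    Detections are visited in descending score order; one is dropped if it
    lies closer than min_distance to any already-kept detection."""
    kept = []
    for pos, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(abs(pos - k) >= min_distance for k, _ in kept):
            kept.append((pos, score))
    return sorted(kept)  # return in temporal (position) order

# Toy candidate R-peaks: two near-duplicate pairs and one isolated peak.
dets = [(100, 0.9), (104, 0.7), (350, 0.8), (352, 0.95), (600, 0.6)]
peaks = nms_1d(dets, min_distance=50)
```

Using a distance on peak positions rather than box IoU fits R-peak detection, where each beat should produce exactly one detection per refractory window.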
AAAI Conference 2025 Conference Paper
An efficient and precise diagnosis of retinal diseases is a fundamental goal for auxiliary diagnostic systems in ophthalmology. Inspired by the importance of scattered subtle lesions in manual retinal disease diagnosis, recent research has achieved state-of-the-art performance by mining information related to subtle lesions, including their texture and shape. However, the spatial distribution patterns of subtle lesion areas, which are also crucial in manual diagnosis, have been overlooked in existing research. Neglecting these spatial distribution patterns (e.g., the ring distribution of microaneurysms in diabetic macular edema) may negatively impact the diagnostic process. In this paper, we introduce the Saliency-Image-Graph (SIGraph) network to capture the spatial distribution patterns of lesion areas. We first employ saliency-based perception to identify latent lesion pixels. Subsequently, we propose a novel image-graph block to efficiently capture the global distribution of abundant lesion pixels with minimal information loss. By leveraging additional distribution patterns, SIGraph achieves state-of-the-art performance with at least a 1.5% performance gain across three datasets. Furthermore, ablation studies demonstrate that our image-graph block can be integrated into other visual backbones and effectively boost performance.
JBHI Journal 2025 Journal Article
Automatic ECG analysis methods usually require large-scale annotated training data, but annotation is extremely time-consuming. While semi-supervised learning can leverage unlabeled data, its performance depends heavily on the quality of the initial labeled subset. Active learning has been used to identify the most informative samples for annotation, but conventional approaches face three critical limitations: (1) dependence on manual intervention for iterative query design, (2) prohibitive computational cost during sample selection, and (3) limited compatibility with semi-supervised learning frameworks. To address these limitations, we proposed an Unsupervised Active Feature-selective Semi-Supervised Learning (UAFSSL) framework for ECG analysis, comprising an unsupervised feature selection-based active learning module and a semi-supervised learning module. UAFSSL captures latent data distributions via unsupervised feature extraction, selects diverse and representative samples using pseudo-label clustering, and integrates seamlessly with semi-supervised learning, eliminating human intervention. We validated our algorithm on an ECG waveform segmentation task and an atrial fibrillation detection task. In the waveform segmentation task, our method improved the F1-score for P-wave delineation by 2.4% compared to random sampling, using only 5% of labeled samples. For the atrial fibrillation detection task, we evaluated our method on both the AFDB and a 24-hour dataset collected from 500 atrial fibrillation patients. Using only 200 labeled samples for model training, our method achieved AUC improvements of 2.5% and 2.2% over random sampling in five-fold cross-validation. This is the first study to integrate unsupervised active learning with semi-supervised learning for automatic ECG analysis, offering a robust, automated solution that reduces annotation costs while enhancing clinical applicability.
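The phrase "selecting diverse and representative samples using pseudo-label clustering" can be illustrated roughly as follows. Within each pseudo-label cluster, the sketch picks the sample closest to the cluster centroid, which spreads the annotation budget across clusters. It assumes that features and pseudo-labels have already been produced by an unsupervised feature extractor and a clustering step. The function name and the one-sample-per-cluster budget are hypothetical simplifications, not the UAFSSL procedure itself.

```python
import math
from collections import defaultdict


def select_representatives(features, pseudo_labels):
    """Return, for each pseudo-label cluster, the index of the sample nearest its centroid."""
    clusters = defaultdict(list)
    for idx, label in enumerate(pseudo_labels):
        clusters[label].append(idx)

    selected = []
    for label in sorted(clusters):
        members = clusters[label]
        dim = len(features[members[0]])
        # Centroid = per-dimension mean of the cluster's feature vectors.
        centroid = [sum(features[i][d] for i in members) / len(members) for d in range(dim)]
        # The member closest to the centroid (Euclidean) represents the cluster.
        selected.append(min(members, key=lambda i: math.dist(features[i], centroid)))
    return selected


# Toy 2-D features forming two well-separated clusters.
feats = [[0.0, 0.0], [1.0, 0.0], [0.4, 0.0], [10.0, 10.0], [11.0, 11.0], [10.6, 10.4]]
print(select_representatives(feats, [0, 0, 0, 1, 1, 1]))  # prints [2, 5]
```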
JBHI Journal 2024 Journal Article
Radiomics refers to the high-throughput extraction of quantitative features from medical images and is widely used to construct machine learning models for predicting clinical outcomes; feature engineering is the most important step in this process. However, current feature engineering methods fail to fully and effectively exploit feature heterogeneity when dealing with different kinds of radiomics features. In this work, latent representation learning is presented as a novel feature engineering approach that reconstructs a set of latent space features from the original shape, intensity, and texture features. The method projects features into a subspace, the latent space, in which the latent features are obtained by minimizing a hybrid loss function consisting of a clustering-like loss and a reconstruction loss. The former ensures separability between classes, while the latter narrows the gap between the original features and the latent space features. Experiments were performed on a multi-center non-small cell lung cancer (NSCLC) subtype classification dataset drawn from 8 international open databases. Compared with four traditional feature engineering methods (baseline, PCA, Lasso, and L2,1-norm minimization), latent representation learning significantly improved the classification performance of various machine learning classifiers on the independent test set (all p < 0.001). On two additional test sets, latent representation learning also showed significantly better generalization. Our research shows that latent representation learning is a more effective feature engineering method, with the potential to serve as a general technique across a wide range of radiomics research.
JBHI Journal 2024 Journal Article
Parkinson's disease (PD) is a common degenerative disease of the nervous system in the elderly. Early diagnosis of PD is very important so that potential patients can receive prompt treatment and avoid aggravation of the disease. Recent studies have found that PD patients often suffer from emotional expression disorder, which gives rise to the characteristic “masked faces”. Based on this, we propose an automatic PD diagnosis method based on mixed emotional facial expressions. The proposed method consists of four steps. First, we synthesize virtual face images containing six basic expressions (i.e., anger, disgust, fear, happiness, sadness, and surprise) via generative adversarial learning to approximate the premorbid expressions of PD patients. Second, we design an effective screening scheme to assess the quality of these synthesized facial expression images and shortlist the high-quality ones. Third, we train a deep feature extractor together with a facial expression classifier on a mixture of the original facial expression images of PD patients, the high-quality synthesized facial expression images of PD patients, and normal facial expression images from other public face datasets. Finally, we use the well-trained deep feature extractor to extract latent expression features from six facial expression images of a potential PD patient and make a PD/non-PD prediction. To show real-world impact, we also collected a new facial expression dataset of PD patients in collaboration with a hospital. Extensive experiments validate the effectiveness of the proposed method for PD diagnosis and facial expression recognition.
AAAI Conference 2024 Conference Paper
Engineering design methods aim to generate new designs that meet desired performance requirements. Past work has directly introduced conditional Generative Adversarial Networks (cGANs) into this field and achieved promising results on single-point design problems (one performance requirement under one working condition). However, these methods assume that performance requirements are distributed in a categorical space, which is not reasonable in these scenarios. Although Continuous conditional GANs (CcGANs) introduce Vicinal Risk Minimization (VRM) to reduce the performance loss caused by this assumption, they still face two challenges: 1) CcGANs cannot handle multi-point design problems (multiple performance requirements under multiple working conditions), and 2) their training is time-consuming due to the high computational complexity of the vicinal loss. To address these issues, a Continuous conditional Diffusion Probabilistic Model (CcDPM) is proposed, which for the first time introduces diffusion models into engineering design and VRM into diffusion models. CcDPM adopts a novel sampling method, multi-point design sampling, to deal with multi-point design problems. Moreover, a k-d tree is used during CcDPM training to shorten the computation time of the vicinal loss, speeding up training by 2 to 300 times in our experiments. Experiments on a synthetic problem and three real-world design problems demonstrate that CcDPM outperforms state-of-the-art GAN models.
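The abstract credits a k-d tree with the 2 to 300 times training speed-up but gives no details. One plausible reading, offered only as an assumption, is that the tree replaces a linear scan when collecting training samples whose condition vector lies within a hard vicinity of some radius around a target condition. The minimal pure-Python k-d tree below implements only that radius query. `build_kdtree`, `radius_query`, and the radius values are illustrative and are not CcDPM's implementation.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    point: tuple
    index: int          # position of the point in the original list
    axis: int           # coordinate used to split at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_kdtree(points, indices=None, depth=0):
    """Recursively split on the median along axes that cycle with depth."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[indices[0]])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return Node(points[indices[mid]], indices[mid], axis,
                build_kdtree(points, indices[:mid], depth + 1),
                build_kdtree(points, indices[mid + 1:], depth + 1))


def radius_query(node, target, radius, out=None):
    """Indices of all points within Euclidean distance `radius` of `target`."""
    if out is None:
        out = []
    if node is None:
        return out
    if math.dist(node.point, target) <= radius:
        out.append(node.index)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff <= 0 else (node.right, node.left)
    radius_query(near, target, radius, out)
    # The far side can only hold matches if the splitting plane is within `radius`.
    if abs(diff) <= radius:
        radius_query(far, target, radius, out)
    return out


# Hypothetical 2-D performance-requirement vectors of the training designs.
conditions = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.1), (5.0, 5.0), (0.9, 1.1)]
tree = build_kdtree(conditions)
print(sorted(radius_query(tree, (1.0, 1.0), radius=0.3)))  # prints [1, 4]
```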
EAAI Journal 2024 Journal Article
EAAI Journal 2024 Journal Article
JBHI Journal 2024 Journal Article
The heart sound reflects the movement of the cardiovascular system and contains early pathological information about cardiovascular diseases. Automatic heart sound diagnosis therefore plays an essential role in the early detection of cardiovascular diseases. In this study, we aim to develop a novel end-to-end heart sound abnormality detection and classification method that can be adapted to different heart sound diagnosis tasks. Specifically, we developed a Multi-feature Decision Fusion Network (MDFNet) composed of a Multi-dimensional Feature Extraction (MFE) module and a Multi-dimensional Decision Fusion (MDF) module. The MFE module extracted spatial features, multi-level temporal features, and spatial-temporal fusion features to learn heart sound characteristics from multiple perspectives. Through deep supervision and decision fusion, the MDF module made the multi-dimensional features extracted by the MFE module more discriminative and fused their decision results to integrate complementary information. Furthermore, attention modules were embedded in MDFNet to emphasize the fundamental heart sounds, which contain the most useful feature information. Finally, we proposed an efficient data augmentation method to avoid the performance degradation caused by the lack of cardiac cycle segmentation in other end-to-end methods. The developed method achieved an overall accuracy of 94.44% and an F1-score of 86.90% on the binary classification task, and an F1-score of 99.30% on the five-class classification task. Our method outperformed other state-of-the-art methods and shows good prospects for clinical application.
AIJ Journal 2024 Journal Article
IJCAI Conference 2024 Conference Paper
Big models have achieved revolutionary breakthroughs in AI, but they also pose potential ethical and societal risks to humans. To address such problems, alignment technologies were introduced to make these models conform to human preferences and values. Despite considerable advances in the past year, establishing the optimal alignment strategy still faces various challenges, such as data cost and scalable oversight, and how to align remains an open question. In this survey paper, we comprehensively investigate value alignment approaches. We first unpack the historical context of alignment, tracing it back to the 1920s (where it comes from), then delve into the mathematical essence of alignment (what it is), shedding light on its inherent challenges. Building on this foundation, we provide a detailed examination of existing alignment methods, which fall into three categories: RL-based Alignment, SFT-based Alignment, and Inference-Time Alignment, and we demonstrate their intrinsic connections, strengths, and limitations to help readers better understand this research area. In addition, two emerging topics, alignment goal and multimodal alignment, are discussed as novel frontiers in the field. Looking forward, we discuss potential alignment paradigms and how they could handle the remaining challenges, anticipating where future alignment will go.
NeurIPS Conference 2024 Conference Paper
We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation. By incorporating a Lightning T2I branch alongside a standard diffusion branch, PuLID introduces both a contrastive alignment loss and an accurate ID loss, minimizing disruption to the original model and ensuring high ID fidelity. Experiments show that PuLID achieves superior performance in both ID fidelity and editability. Another attractive property of PuLID is that image elements (e.g., background, lighting, composition, and style) are kept as consistent as possible before and after ID insertion. Code and models are available at https://github.com/ToTheBeginning/PuLID
ICML Conference 2024 Conference Paper
Implicit neural representations have emerged as a powerful paradigm for representing signals such as images and sounds. This approach uses neural networks to parameterize the implicit function of the signal. However, when representing implicit functions, traditional neural networks such as ReLU-based multilayer perceptrons struggle to accurately model the high-frequency components of signals. Recent research has begun to explore Fourier Neural Networks (FNNs) to overcome this limitation. In this paper, we propose the Quantum Implicit Representation Network (QIREN), a novel quantum generalization of FNNs. Furthermore, through theoretical analysis, we demonstrate that QIREN possesses a quantum advantage over classical FNNs. Lastly, we conduct experiments on signal representation, image super-resolution, and image generation tasks to show the superior performance of QIREN compared to state-of-the-art (SOTA) models. Our work not only incorporates quantum advantages into implicit neural representations but also uncovers a promising application direction for Quantum Neural Networks.
AAAI Conference 2024 Conference Paper
In recent years, researchers have developed novel Quantum-Inspired Neural Network (QINN) frameworks for Natural Language Processing (NLP) tasks, inspired by theoretical investigations of quantum cognition. However, we have found that the training efficiency of QINNs is significantly lower than that of classical networks. We analyze the unitary transformation modules of existing QINNs based on the time displacement symmetry of quantum mechanics and discover that their mathematical form resembles the first-order Euler method. The high truncation error of the Euler method degrades the training efficiency of QINNs. To enhance the training efficiency of QINNs, we generalize their unitary transformation modules to Quantum-like high-order Runge-Kutta methods (QRKs). We present experimental results on conversation emotion recognition and text classification tasks that validate the effectiveness of the proposed approach.
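A scalar toy problem shows why a first-order (Euler-like) update has high truncation error. This sketch is not the QRK module itself. It only compares a classical Euler step with a classical fourth-order Runge-Kutta step on the norm-preserving equation dψ/dt = -iωψ. The value of ω, the step count, and all function names are chosen purely for illustration.

```python
import cmath


# Toy "unitary" dynamics d(psi)/dt = -i * omega * psi, whose exact solution
# psi(t) = exp(-i * omega * t) * psi(0) keeps |psi| = 1 for all t.
def f(psi, omega):
    return -1j * omega * psi


def euler_step(psi, omega, h):
    # First-order Euler: one slope evaluation, local error O(h^2).
    return psi + h * f(psi, omega)


def rk4_step(psi, omega, h):
    # Classical fourth-order Runge-Kutta: four slope evaluations, local error O(h^5).
    k1 = f(psi, omega)
    k2 = f(psi + h / 2 * k1, omega)
    k3 = f(psi + h / 2 * k2, omega)
    k4 = f(psi + h * k3, omega)
    return psi + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(step, omega=1.0, t_end=1.0, n_steps=10):
    psi, h = 1.0 + 0j, t_end / n_steps
    for _ in range(n_steps):
        psi = step(psi, omega, h)
    return psi


exact = cmath.exp(-1j * 1.0 * 1.0)
for name, step in [("Euler", euler_step), ("RK4", rk4_step)]:
    psi = integrate(step)
    print(f"{name}: |error| = {abs(psi - exact):.2e}, |psi| = {abs(psi):.6f}")
```

Each Euler step multiplies ψ by 1 - iωh, whose modulus sqrt(1 + (ωh)^2) exceeds 1, so the norm drifts upward at every step. With the same step size, RK4 keeps both the norm and the phase far closer to the exact solution.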
JAAMAS Journal 2023 Journal Article
Deep reinforcement learning has contributed to dramatic advances in many tasks, such as playing games, controlling robots, and navigating complex environments. However, it requires many interactions with the environment. This differs from human learning, because humans can draw on prior knowledge, which significantly speeds up learning by avoiding unnecessary exploration. Previous works integrating knowledge into RL did not model the uncertainty in human cognition, which reduces the reliability of that knowledge. In this paper, we propose a knowledge-guided policy network, a novel framework that combines suboptimal human knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller representing human knowledge and a refined module that fine-tunes the suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing reinforcement learning algorithms such as PPO, AC, and SAC. We conduct experiments on both discrete and continuous control tasks. The empirical results show that our approach, which combines suboptimal human knowledge and RL, significantly improves the learning efficiency of basic RL algorithms, even with very low-performance human prior knowledge. Additional experiments on the number of fuzzy rules and on the interpretability of the policy further support the completeness and soundness of the proposed framework. The code for this research is available on the project page at https://github.com/yuyuanq/reinforcement-learning-using-knowledge-controller.
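A fuzzy rule controller expresses knowledge such as "if the pole leans right, push right" through membership functions instead of hard thresholds. This is one way to represent the uncertainty the abstract refers to. The sketch below covers only such a prior policy, using triangular memberships. The rules, ranges, action names, and normalization are all hypothetical, and the paper's refined module that fine-tunes this prior is not modeled.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at x == b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical rules for a cart-pole-like task: (membership over pole angle in radians, action).
RULES = [
    (lambda angle: triangular(angle, -0.5, -0.2, 0.0), "push_left"),   # pole leans left
    (lambda angle: triangular(angle, 0.0, 0.2, 0.5), "push_right"),    # pole leans right
]


def fuzzy_policy(angle):
    """Turn rule firing strengths into action probabilities (uniform if no rule fires)."""
    strengths = {}
    for membership, action in RULES:
        strengths[action] = strengths.get(action, 0.0) + membership(angle)
    total = sum(strengths.values())
    if total == 0.0:
        return {action: 1.0 / len(strengths) for action in strengths}
    return {action: s / total for action, s in strengths.items()}


print(fuzzy_policy(0.1))   # only the "leans right" rule fires
print(fuzzy_policy(0.0))   # no rule fires, so the policy is uniform
```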
EAAI Journal 2023 Journal Article
EAAI Journal 2023 Journal Article
YNIMG Journal 2023 Journal Article
EAAI Journal 2023 Journal Article
JBHI Journal 2023 Journal Article
Ambulatory blood pressure (BP) monitoring plays a critical role in the early prevention and diagnosis of cardiovascular diseases. However, cuff-based inflatable devices cannot be used for continuous BP monitoring, while pulse transit time or multi-parameter-based methods require additional bioelectrodes to acquire electrocardiogram signals. Estimating BP waveforms from photoplethysmography (PPG) signals alone therefore has essential clinical value for continuous BP monitoring. Nevertheless, extracting useful features from raw PPG signals for fine-grained BP waveform estimation is challenging due to physiological variation and noise interference. For single-PPG analysis with deep learning, previous works rely mainly on stacked convolution operations, which ignore the underlying complementary time-dependent information. This work therefore presents a novel Transformer-based method with knowledge distillation (KD-Informer) for BP waveform estimation. We also integrate prior information on PPG patterns, selected by a novel backward elimination algorithm, into the knowledge transfer branch of the KD-Informer. With these strategies, the model can effectively capture discriminative features through a lightweight architecture during learning. We further adopt an effective transfer learning technique to demonstrate the generalization capability of the proposed model on two independent multicenter datasets. Specifically, we first fine-tuned the KD-Informer on a large, high-quality dataset (Mindray dataset) and then transferred the pre-trained model to the target domain (MIMIC dataset). On the MIMIC test set, the KD-Informer exhibited an estimation error of 0.02 ± 5.93 mmHg for systolic BP (SBP) and 0.01 ± 3.87 mmHg for diastolic BP (DBP), which complies with the Association for the Advancement of Medical Instrumentation (AAMI) standard. These results demonstrate that the KD-Informer is highly reliable and robust for measuring continuous BP waveforms.
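The abstract reports errors as mean ± standard deviation and states that they comply with the AAMI standard. The commonly cited AAMI accuracy criterion for BP measurement requires the mean error to be within ±5 mmHg and its standard deviation to be at most 8 mmHg. The helper below checks only these two numbers. The standard also places requirements on the study population, such as a minimum number of subjects, and those are not checked here. The function names are hypothetical.

```python
import statistics

AAMI_MAX_ABS_MEAN = 5.0  # mmHg: limit on |mean error|
AAMI_MAX_SD = 8.0        # mmHg: limit on the standard deviation of the error


def meets_aami(mean_error, sd_error):
    """True if a (mean, SD) error summary satisfies the 5 +/- 8 mmHg criterion."""
    return abs(mean_error) <= AAMI_MAX_ABS_MEAN and sd_error <= AAMI_MAX_SD


def meets_aami_from_errors(errors):
    """Same check computed from raw per-measurement errors (estimate minus reference)."""
    return meets_aami(statistics.fmean(errors), statistics.stdev(errors))


# Summary values reported in the abstract for the MIMIC test set.
print(meets_aami(0.02, 5.93))  # SBP
print(meets_aami(0.01, 3.87))  # DBP
```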
YNICL Journal 2023 Journal Article
JBHI Journal 2023 Journal Article
Pancreatic cancer is one of the most malignant cancers, with high mortality. The rapid on-site evaluation (ROSE) technique can significantly accelerate the diagnostic workflow for pancreatic cancer by having on-site pathologists immediately analyze fast-stained cytopathological images. However, the broader adoption of ROSE diagnosis has been hindered by a shortage of experienced pathologists. Deep learning has great potential for the automatic classification of ROSE images, but modeling the complicated local and global image features is challenging. The traditional convolutional neural network (CNN) structure can effectively extract spatial features, but it tends to ignore global features when prominent local features are misleading. In contrast, the Transformer structure excels at capturing global features and long-range relations but has limited ability to utilize local features. We propose a multi-stage hybrid Transformer (MSHT) that combines the strengths of both: a CNN backbone robustly extracts multi-stage local features at different scales as attention guidance, and a Transformer encodes them for sophisticated global modeling. Going beyond the strength of each single method, MSHT enhances the Transformer's global modeling ability with local guidance from CNN features. To evaluate the method in this unexplored field, a dataset of 4240 ROSE images was collected, on which MSHT achieves 95.68% classification accuracy with more accurate attention regions. The distinctly superior results compared to state-of-the-art models make MSHT extremely promising for cytopathological image analysis.
TCS Journal 2023 Journal Article
TCS Journal 2023 Journal Article
AAAI Conference 2023 Conference Paper
The StyleGAN family succeeds in high-fidelity image generation and allows flexible and plausible editing of generated images by manipulating the semantically rich latent style space. However, projecting a real image into its latent space encounters an inherent trade-off between inversion quality and editability. Existing encoder-based and optimization-based StyleGAN inversion methods attempt to mitigate this trade-off but achieve limited performance. To fundamentally resolve this problem, we propose a novel two-phase framework that designates two separate networks to tackle editing and reconstruction respectively, instead of balancing the two. Specifically, in Phase I, a W-space-oriented StyleGAN inversion network is trained and used to perform image inversion and editing, which assures editability but sacrifices reconstruction quality. In Phase II, a carefully designed rectifying network is used to rectify the inversion errors and perform ideal reconstruction. Experimental results show that our approach yields near-perfect reconstructions without sacrificing editability, thus allowing accurate manipulation of real images. Furthermore, we evaluate the performance of our rectifying network and observe strong generalizability to unseen manipulation types and out-of-domain images.
ICML Conference 2023 Conference Paper
Vision Transformers (ViTs) have continuously achieved new milestones in object detection. However, their considerable computation and memory burden compromises their efficiency and hinders their broad deployment on resource-constrained devices. Moreover, efficient transformer-based detectors designed in existing works can hardly achieve a realistic speedup, especially on multi-core processors (e.g., GPUs). The main issue is that the current literature concentrates solely on building algorithms with minimal computation, overlooking that practical latency is also affected by memory access cost and the degree of parallelism. Therefore, we propose SpeedDETR, a novel speed-aware transformer for end-to-end object detectors that achieves high-speed inference on multiple devices. Specifically, we design a latency prediction model that can directly and accurately estimate network latency by analyzing network properties, hardware memory access patterns, and the degree of parallelism. Following an effective local-to-global visual modeling process and the guidance of the latency prediction model, we build a hardware-oriented architecture and develop a new family of SpeedDETR models. Experiments on the MS COCO dataset show that SpeedDETR outperforms current DETR-based methods on a Tesla V100. Acceptable inference speed can even be achieved on edge GPUs.
YNIMG Journal 2023 Journal Article
YNICL Journal 2022 Journal Article
AAAI Conference 2022 Conference Paper
Expert finding, a popular service provided by many online websites such as Expertise Finder, LinkedIn, and AMiner, is beneficial for seeking candidate qualifications, consultants, and collaborators. However, its quality suffers from a lack of ample sources of expert information. This paper uses AMiner as the basis, with the aim of linking any external experts to their counterparts on AMiner. As it is infeasible to acquire sufficient linkages from arbitrary external sources, we explore the problem of zero-shot expert linking. In this paper, we propose CODE, which first pre-trains an expert linking model by contrastive learning on AMiner so that it can capture the representation and matching patterns of experts without supervised signals, and then fine-tunes it between AMiner and external sources in an adversarial manner to enhance the model's transferability. For evaluation, we first design two intrinsic tasks, author identification and paper clustering, to validate the representation and matching capability endowed by contrastive learning. The final external expert linking performance on two genres of external sources further demonstrates the superiority of the adversarial fine-tuning method. Additionally, we present the online deployment of CODE and continuously improve its online performance via active learning.
YNIMG Journal 2022 Journal Article
NeurIPS Conference 2022 Conference Paper
The hand, the bearer of human productivity and intelligence, is receiving much attention due to the recent surge of interest in digital twins. Among hand morphable models, MANO has been widely used in the vision and graphics communities. However, MANO disregards textures and accessories, which greatly limits its power to synthesize photorealistic hand data. In this paper, we extend MANO with Diverse Accessories and Rich Textures, namely DART. DART comprises 50 daily 3D accessories that vary in appearance and shape, and 325 hand-crafted 2D texture maps covering different kinds of blemishes and make-up. A Unity GUI is also provided to generate synthetic hand data with user-defined settings, e.g., pose, camera, background, lighting, textures, and accessories. Finally, we release DARTset, which contains large-scale (800K), high-fidelity synthetic hand images paired with perfectly aligned 3D labels. Experiments demonstrate its superior diversity. As a complement to existing hand datasets, DARTset boosts generalization in both hand pose estimation and mesh recovery tasks. Raw ingredients (textures, accessories), the Unity GUI, source code, and DARTset are publicly available at dart2022.github.io.
NeurIPS Conference 2022 Conference Paper
In the era of deep learning, word embeddings are essential for text tasks. However, storing and accessing these embeddings requires a large amount of space, which hinders the deployment of these models on resource-limited devices. Leveraging the powerful compression capability of tensor products, we propose a word embedding compression method with morphological augmentation, Morphologically-enhanced Tensorized Embeddings (MorphTE). A word consists of one or more morphemes, the smallest units that bear meaning or have a grammatical function. MorphTE represents a word embedding as an entangled form of its morpheme vectors via the tensor product, which injects prior semantic and grammatical knowledge into the learning of embeddings. Furthermore, the dimensionality of the morpheme vectors and the number of morphemes are much smaller than those of words, which greatly reduces the number of word embedding parameters. We conduct experiments on tasks such as machine translation and question answering. Experimental results on four translation datasets in different languages show that MorphTE can compress word embedding parameters by about 20 times without performance loss and significantly outperforms related embedding compression methods.
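The tensor-product idea can be shown on plain vectors. When a word vector is formed as the Kronecker product of its morpheme vectors, the word dimension is the product of the morpheme dimensions, while only the small morpheme vectors need to be stored. This rank-1 sketch is a simplification under stated assumptions: the morpheme segmentation, the 2-dimensional vectors, and every word having the same number of morphemes are hypothetical, and MorphTE's actual entangled form may combine several such products.

```python
from functools import reduce


def kron(u, v):
    """Kronecker (tensor) product of two vectors; the result has len(u) * len(v) entries."""
    return [a * b for a in u for b in v]


def word_embedding(morpheme_vectors):
    """Compose a word vector as the tensor product of its morpheme vectors (rank-1 sketch)."""
    return reduce(kron, morpheme_vectors)


# Hypothetical 2-dimensional morpheme vectors for "un" + "break" + "able".
morphemes = {"un": [1.0, 2.0], "break": [0.5, -1.0], "able": [3.0, 0.0]}
vec = word_embedding([morphemes[m] for m in ("un", "break", "able")])
print(len(vec), vec)  # 2 * 2 * 2 = 8 entries built from 6 stored numbers
```

Because morphemes are shared across many words, the stored table grows with the number of distinct morphemes rather than with the vocabulary, which is the source of the compression described above.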
NeurIPS Conference 2022 Conference Paper
Dynamic interaction graphs have been widely adopted to model the evolution of user-item interactions over time. Two factors are crucial when modelling user preferences for link prediction in dynamic interaction graphs: 1) collaborative relationships among users and 2) users' personalized interaction patterns. Existing methods often consider these two factors together implicitly, which may lead to noisy user modelling when the two factors diverge. In addition, they usually require time-consuming parameter learning with back-propagation, which is prohibitive for real-time user preference modelling. To this end, this paper proposes FreeGEM, a parameter-free dynamic graph embedding method for link prediction. First, to take advantage of collaborative relationships, we propose an incremental graph embedding engine to obtain user/item embeddings. The engine has an Online-Monitor-Offline architecture consisting of an Online module that approximately embeds users/items over time, a Monitor module that estimates the approximation error in real time, and an Offline module that calibrates the user/item embeddings when the online approximation error exceeds a threshold. We also integrate attribute information into the model, which enables FreeGEM to better model users belonging to underrepresented groups. Second, we design a personalized dynamic interaction pattern modeller, which combines dynamic time decay with an attention mechanism to model users' short-term interests. Experimental results on two link prediction tasks show that FreeGEM can outperform state-of-the-art methods in accuracy while achieving an over 36× improvement in efficiency. All code and datasets can be found at https://github.com/FudanCISL/FreeGEM.
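The abstract says the short-term interest modeller combines "dynamic time decay with attention" but gives no formula. One common construction, used here as an assumption, subtracts a penalty λ·Δt from each attention logit. This is equivalent to multiplying each softmax weight by exp(-λ·Δt) before renormalizing. The fixed decay rate λ, the dot-product scores, and all names are hypothetical, and FreeGEM's "dynamic" decay may differ.

```python
import math


def short_term_interest(item_embs, timestamps, query, now, decay_rate):
    """Time-decayed attention over a user's recent items.

    logit_i = <query, item_i> - decay_rate * (now - t_i); weights = softmax(logits).
    Returns (weights, interest vector = weighted average of item embeddings).
    """
    logits = [sum(q * e for q, e in zip(query, emb)) - decay_rate * (now - t)
              for emb, t in zip(item_embs, timestamps)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    weights = [x / total for x in exps]
    dim = len(item_embs[0])
    interest = [sum(w * emb[d] for w, emb in zip(weights, item_embs)) for d in range(dim)]
    return weights, interest


# Two equally relevant items; the one at t=10 is more recent than the one at t=0.
weights, interest = short_term_interest([[1.0, 0.0], [0.0, 1.0]], [0.0, 10.0],
                                        query=[0.0, 0.0], now=10.0, decay_rate=0.1)
print(weights, interest)
```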
JBHI Journal 2022 Journal Article
Paroxysmal atrial fibrillation (AF) is generally diagnosed by long-term dynamic electrocardiogram (ECG) monitoring. Identifying AF episodes in long-term ECG data can place a heavy burden on clinicians. Many machine-learning-based automatic AF detection methods have been proposed to address this issue. However, these methods require large amounts of annotated data to train the model, and annotating AF in long-term ECG is extremely time-consuming. Reducing the demand for labeled data can effectively improve the clinical practicability of automatic AF detection methods. In this study, we developed a novel semi-supervised learning method that generates modified low-entropy labels for unlabeled samples to train a deep learning model that automatically detects paroxysmal AF in 24 h Holter monitoring data. Our method employed a 1D CNN-LSTM neural network with RR intervals as input and trained it on a small amount of labeled data together with a large amount of unlabeled data. The method was evaluated on a 24 h Holter monitoring dataset collected from 1000 paroxysmal AF patients. Using labeled samples from only 10 patients for model training, our method achieved a sensitivity of 97.8%, a specificity of 97.9%, and an accuracy of 97.9% in five-fold cross-validation. Compared with a supervised learning method trained on the complete set of labeled samples, the detection accuracy of our method was only 0.5% lower, while the data annotation workload was reduced by more than 98%. Overall, this is the first study to apply semi-supervised learning techniques to automatic AF detection from ECG. Our method can effectively reduce the demand for AF data annotation and improve the clinical practicability of automatic AF detection.
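The abstract does not specify how the "modified low-entropy labels" are produced. One widely used way to obtain low-entropy pseudo-labels is borrowed here from MixMatch-style semi-supervised learning, and it is not necessarily the authors' modification. The method averages the model's predictions over augmented views of an unlabeled sample and then sharpens the result with a temperature T < 1. The two-class [non-AF, AF] layout and T = 0.5 are illustrative assumptions.

```python
import math


def average_predictions(predictions):
    """Average class-probability vectors, e.g. from several augmented views of one sample."""
    k = len(predictions)
    return [sum(p[c] for p in predictions) / k for c in range(len(predictions[0]))]


def sharpen(probs, temperature=0.5):
    """Temperature sharpening: p_c^(1/T) / sum_j p_j^(1/T). T < 1 lowers the entropy."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical [non-AF, AF] predictions for one unlabeled RR-interval segment.
guess = average_predictions([[0.7, 0.3], [0.5, 0.5]])
label = sharpen(guess)
print(guess, label, entropy(guess), entropy(label))
```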
IJCAI Conference 2022 Conference Paper
Inductive link prediction for knowledge graphs aims to predict missing links between unseen entities, i.e., entities not present during training. Most previous works learn entity-specific embeddings, which cannot handle unseen entities. Several recent methods use enclosing subgraphs to obtain inductive ability. However, these works consider only the enclosing part of the subgraph without its complete neighboring relations, so some neighboring relations are neglected and sparse subgraphs are hard to handle. To address this, we propose Subgraph Neighboring Relations Infomax (SNRI), which fully exploits complete neighboring relations from two aspects: neighboring relational features for node features and neighboring relational paths for sparse subgraphs. To further model neighboring relations globally, we introduce mutual information (MI) maximization to knowledge graphs. Experiments show that SNRI outperforms existing state-of-the-art methods by a large margin on the inductive link prediction task, and they verify the effectiveness of exploring complete neighboring relations globally to characterize node features and reason on sparse subgraphs.
IJCAI Conference 2022 Conference Paper
Due to the increasing demand in films and games, synthesizing 3D avatar animation has attracted much attention recently. In this work, we present a production-ready text/speech-driven full-body animation synthesis system. Given the text and corresponding speech, our system synthesizes face and body animations simultaneously, which are then skinned and rendered to obtain a video stream output. We adopt a learning-based approach for synthesizing facial animation and a graph-based approach to animate the body, which generates high-quality avatar animation efficiently and robustly. Our results demonstrate the generated avatar animations are realistic, diverse and highly text/speech-correlated.
AAAI Conference 2022 Conference Paper
In order to be trusted by humans, Artificial Intelligence agents should be able to describe the rationales behind their decisions. One such application is human action recognition in critical or sensitive scenarios, where trustworthy and explainable action recognizers are expected. For example, reliable pedestrian action recognition is essential for self-driving cars, and explanations of real-time decisions are critical for investigations if an accident happens. In this regard, learning-based approaches, despite their popularity and accuracy, are disadvantaged by their limited interpretability. This paper presents a novel neuro-symbolic approach that recognizes actions from videos with human-understandable explanations. Specifically, we first propose to represent videos symbolically by qualitative spatial relations between objects, called qualitative spatial object relation chains. We further develop a neural saliency estimator to capture the correlation between such object relation chains and the occurrence of actions. Given an unseen video, this neural saliency estimator can tell which object relation chains are more important for the recognized action. We evaluate our approach on two real-life video datasets with respect to recognition accuracy and the quality of the generated action explanations. Experiments show that our approach outperforms previous symbolic approaches on both aspects, thus facilitating trustworthy intelligent decision making. Our approach can be used to augment state-of-the-art learning approaches with explainability.
TCS Journal 2021 Journal Article
AAAI Conference 2021 Conference Paper
Stacked self-attention models receive widespread attention due to their ability to capture global dependencies among words. However, stacking many layers and components produces a huge number of parameters, leading to low parameter efficiency. In response to this issue, we propose a lightweight architecture named Continuous Self-Attention models with neural ODE networks (CSAODE). In CSAODE, continuous dynamical models (i.e., neural ODEs) are coupled with our proposed self-attention block to form a self-attention ODE solver. This solver continuously calculates and optimizes the hidden states via only one layer of parameters to improve parameter efficiency. In addition, we design a novel accelerated continuous dynamical model to reduce computing costs, and integrate it into CSAODE. Moreover, since the original self-attention ignores local information, CSAODE makes use of N-gram convolution to encode local representations, and a fusion layer with only two trainable scalars is designed for generating sentence vectors. We perform a series of experiments on text classification, natural language inference (NLI) and text matching tasks. With fewer parameters, CSAODE outperforms state-of-the-art models on text classification tasks (e.g., a 1.3% accuracy improvement on the SUBJ task), and achieves competitive performance on NLI and text matching tasks as well.
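The parameter-efficiency argument rests on an ODE solver evaluating one parameterized function repeatedly instead of stacking distinct layers. The sketch below illustrates only that idea with a plain fixed-step Euler integrator; the function `euler_integrate` and the toy dynamics `dh/dt = h` are hypothetical stand-ins, and CSAODE's self-attention block and accelerated solver are not reproduced.

```python
from typing import Callable, List

# Dynamics f(h, t) -> dh/dt; in a neural ODE this would be one shared network.
Dynamics = Callable[[List[float], float], List[float]]


def euler_integrate(f: Dynamics, h0: List[float], t0: float, t1: float,
                    steps: int) -> List[float]:
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step Euler.

    Every step reuses the same f, so the parameter count stays constant
    no matter how many steps (effective "layers") are taken.
    """
    dt = (t1 - t0) / steps
    h, t = list(h0), t0
    for _ in range(steps):
        dh = f(h, t)
        h = [hi + dt * di for hi, di in zip(h, dh)]
        t += dt
    return h


if __name__ == "__main__":
    # Toy dynamics dh/dt = h, whose exact solution at t = 1 is e * h0.
    result = euler_integrate(lambda h, t: list(h), [1.0], 0.0, 1.0, 1000)
    print(result[0])  # about 2.7169; Euler slightly underestimates e
```

Adding more Euler steps refines the trajectory without adding parameters, which is the trade-off a continuous-depth model exploits.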
IJCAI Conference 2021 Conference Paper
In recent years, temporal knowledge graph (TKG) reasoning has received significant attention. Most existing methods assume that all timestamps and corresponding graphs are available during training, which makes it difficult to predict future events. To address this issue, recent works learn to infer future events based on historical information. However, these methods do not comprehensively consider the latent patterns behind temporal changes, which are needed to pass historical information selectively, update representations appropriately and predict events accurately. In this paper, we propose the Historical Information Passing (HIP) network to predict future events. The HIP network passes information from temporal, structural and repetitive perspectives, which are used to model the temporal evolution of events, the interactions of events at the same time step, and the known events, respectively. In particular, our method considers the updating of relation representations and adopts three scoring functions corresponding to the above dimensions. Experimental results on five benchmark datasets show the superiority of the HIP network, and the significant improvements on Hits@1 show that our method can more accurately predict what is going to happen.
YNIMG Journal 2021 Journal Article
IJCAI Conference 2021 Conference Paper
Interactions between multiple drugs can lead to serious events, causing injuries and huge medical costs. Accurate prediction of drug-drug interaction (DDI) events can help clinicians make effective decisions and establish appropriate therapy programs. Recently, many AI-based techniques have been proposed for predicting DDI-associated events. However, most existing methods pay little attention to the potential correlations between DDI events and other multimodal data such as targets and enzymes. To address this problem, we propose a Multimodal Deep Neural Network (MDNN) for DDI event prediction. In MDNN, we design a two-pathway framework, including a drug knowledge graph (DKG) based pathway and a heterogeneous feature (HF) based pathway, to obtain drug multimodal representations. Finally, a multimodal fusion neural layer is designed to explore the complementarity among the drug multimodal representations. We conduct extensive experiments on a real-world dataset. The results show that MDNN can accurately predict DDI events and outperforms state-of-the-art models.
TIST Journal 2021 Journal Article
In the emerging business of food delivery, rider traffic accidents increase financial costs and the social traffic burden. Although there has been much effort on traffic accident forecasting using temporal-spatial prediction models, none of the existing work studies the problem of detecting takeaway rider accidents based on food delivery trajectory data. In this article, we aim to detect whether a takeaway rider has an accident in a certain time period based on food delivery trajectories and riders’ contextual information. The food delivery data have a heterogeneous information structure and carry contextual information such as weather and delivery history, while the trajectory data are collected as a spatial-temporal sequence. We propose a TakeAway Rider Accident detection fusion network, TARA-Net, to jointly model these heterogeneous and spatial-temporal sequence data. We utilize a residual network to extract basic contextual information features and a transformer encoder to capture trajectory features. These embedding features are concatenated and fed into a pyramidal feed-forward neural network. We jointly train the above three components to combine the benefits of spatial-temporal trajectory data and sparse basic contextual data for the early detection of traffic accidents. Furthermore, because traffic accidents rarely happen in food delivery, we propose a sampling mechanism to alleviate sample imbalance when training the model. We evaluate the model on the transportation mode classification dataset Geolife and a real-world Ele.me dataset with over 3 million riders. The experimental results show that the proposed model outperforms state-of-the-art methods.
YNIMG Journal 2021 Journal Article
YNIMG Journal 2021 Journal Article
EAAI Journal 2020 Journal Article
IJCAI Conference 2020 Conference Paper
Discrete network embedding has recently emerged as a new direction of network representation learning. Compared with traditional network embedding models, discrete network embedding aims to compress model size and accelerate model inference by learning a set of short binary codes for network vertices. However, existing discrete network embedding methods usually assume that the network structures (e.g., edge weights) are readily available. In real-world scenarios such as social networks, it is sometimes impossible to collect explicit network structure information, which usually needs to be inferred from implicit data such as information cascades in the networks. To address this issue, we present DELN, an end-to-end discrete network embedding model for latent networks that can learn binary representations from underlying information cascades. The essential idea is to infer a latent Weisfeiler-Lehman proximity matrix that captures node dependence based on information cascades and then to factorize the latent Weisfeiler-Lehman matrix under the binary node representation constraint. Since the learning problem is a mixed integer optimization problem, an efficient maximum likelihood estimation based cyclic coordinate descent (MLE-CCD) algorithm is used as the solution. Experiments on real-world datasets show that the proposed model outperforms state-of-the-art network embedding methods.
IJCAI Conference 2020 Conference Paper
Graph neural networks (GNNs) have recently emerged as a powerful tool for analyzing non-Euclidean data such as social network data. Despite their success, designing graph neural networks requires heavy manual work and domain knowledge. In this paper, we present a graph neural architecture search method (GraphNAS) that enables automatic design of the best graph neural architecture based on reinforcement learning. Specifically, GraphNAS uses a recurrent network to generate variable-length strings that describe the architectures of graph neural networks, and trains the recurrent network with policy gradient to maximize the expected accuracy of the generated architectures on a validation data set. Furthermore, to improve the search efficiency of GraphNAS on big networks, GraphNAS restricts the search space from an entire architecture space to a sequential concatenation of the best search results built on each single architecture layer. Experiments on real-world datasets demonstrate that GraphNAS can design a novel network architecture that rivals the best human-invented architecture in terms of validation set accuracy. Moreover, in a transfer learning task, we observe that graph neural architectures designed by GraphNAS, when transferred to new datasets, still improve prediction accuracy.
IJCAI Conference 2020 Conference Paper
Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the learning process of humans. When faced with a new task, humans naturally use common sense and prior knowledge to derive an initial policy and to guide the learning process afterwards. Although the prior knowledge may not be fully applicable to the new task, the learning process is significantly sped up, since the initial policy ensures a quick start of learning and intermediate guidance helps avoid unnecessary exploration. Taking this inspiration, we propose knowledge guided policy network (KoGuN), a novel framework that combines suboptimal human prior knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to fine-tune suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithms. We conduct experiments on several control tasks. The empirical results show that our approach, which combines suboptimal human knowledge and RL, significantly improves the learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge.
YNIMG Journal 2020 Journal Article
IJCAI Conference 2020 Conference Paper
Most Visual Question Answering (VQA) models suffer from the language prior problem, which is caused by inherent data biases. Specifically, VQA models tend to answer questions (e.g., "What color is the banana?") based on high-frequency answers (e.g., "yellow"), ignoring the image content. Existing approaches tackle this problem by creating elaborate models or introducing additional visual annotations to reduce question dependency and strengthen image dependency. However, they are still subject to the language prior problem since the data biases have not been fundamentally addressed. In this paper, we introduce a self-supervised learning framework to solve this problem. Concretely, we first automatically generate labeled data to balance the biased data, and then propose a self-supervised auxiliary task to utilize the balanced data to assist the VQA model to overcome language priors. Our method can compensate for the data biases by generating balanced data without introducing external annotations. Experimental results show that our method achieves state-of-the-art performance, improving the overall accuracy from 49.50% to 57.59% on the most commonly used benchmark, VQA-CP v2. In other words, we can increase the performance of annotation-based methods by 16% without using external annotations. Our code is available on GitHub.
AAAI Conference 2019 Conference Paper
Sentence semantic matching requires an agent to determine the semantic relation between two sentences, and it is widely used in various natural language tasks such as Natural Language Inference (NLI) and Paraphrase Identification (PI). Among all matching methods, the attention mechanism plays an important role in capturing semantic relations and properly aligning the elements of two sentences. Previous methods used the attention mechanism to select the important parts of sentences all at once. However, the important parts of a sentence change dynamically during semantic matching as the understanding of the sentence deepens, so selecting them all at once may be insufficient for semantic understanding. To this end, we propose a Dynamic Re-read Network (DRr-Net) approach for sentence semantic matching, which is able to pay close attention to a small region of the sentences at each step and re-read the important words for better sentence semantic understanding. Specifically, we first employ an Attention Stack-GRU (ASG) unit to model the original sentence repeatedly and preserve all the information from the bottom-most word embedding input to the top-most recurrent output. Second, we utilize a Dynamic Re-read (DRr) unit to pay close attention to one important word at a time, taking the learned information into consideration, and re-read the important words for better sentence semantic understanding. Extensive experiments on three sentence matching benchmark datasets demonstrate that DRr-Net can model sentence semantics more precisely and significantly improves the performance of sentence semantic matching. Interestingly, some of the findings in our experiments are consistent with findings from psychological research.
NeurIPS Conference 2019 Conference Paper
Recent neural models connect the encoder and decoder through a self-attention mechanism. In particular, the Transformer, which is solely based on self-attention, has led to breakthroughs in Natural Language Processing (NLP) tasks. However, the multi-head attention mechanism, a key component of the Transformer, limits the effective deployment of the model in resource-limited settings. In this paper, based on the ideas of tensor decomposition and parameter sharing, we propose a novel self-attention model (namely, Multi-linear attention) with Block-Term Tensor Decomposition (BTD). We test and verify the proposed attention method on three language modeling tasks (i.e., PTB, WikiText-103 and One-billion) and a neural machine translation task (i.e., WMT-2016 English-German). Multi-linear attention not only largely compresses the model parameters but also obtains performance improvements compared with a number of language modeling approaches, such as Transformer, Transformer-XL, and Transformer with tensor train decomposition.
IJCAI Conference 2019 Conference Paper
Attributed network embedding plays an important role in transforming network data into compact vectors for effective network analysis. Existing attributed network embedding models are designed either in continuous Euclidean spaces, which introduce data redundancy, or in binary coding spaces, which incur significant loss of representation accuracy. To this end, we present a new Low-Bit Quantization for Attributed Network Representation Learning model (LQANR for short) that can learn compact node representations with low-bitwidth values while preserving high representation accuracy. Specifically, we formulate a new representation learning function based on matrix factorization that can jointly learn the low-bit node representations and the layer aggregation weights under the low-bit quantization constraint. Because the new learning function falls into the category of mixed integer optimization, we propose an efficient mixed-integer based alternating direction method of multipliers (ADMM) algorithm as the solution. Experiments on real-world node classification and link prediction tasks validate the promising performance of the proposed LQANR model.
IJCAI Conference 2019 Conference Paper
Conversational sentiment analysis is an emerging yet challenging Artificial Intelligence (AI) subtask. It aims to discover the affective state of each participant in a conversation. There exists a wealth of interaction information that affects the sentiment of speakers. However, existing sentiment analysis approaches are insufficient for this task because they ignore the interactions and dependency relationships between utterances. In this paper, we address this issue by modeling intra-utterance and inter-utterance interaction dynamics. We propose an approach called quantum-inspired interactive networks (QIN), which leverages the mathematical formalism of quantum theory (QT) and the long short-term memory (LSTM) network to learn such interaction dynamics. Specifically, a density matrix based convolutional neural network (DM-CNN) is proposed to capture the interactions within each utterance (i.e., the correlations between words), and a strong-weak influence model inspired by quantum measurement theory is developed to learn the interactions between adjacent utterances (i.e., how one speaker influences another). Extensive experiments are conducted on the MELD and IEMOCAP datasets. The experimental results demonstrate the effectiveness of the QIN model.
IJCAI Conference 2019 Conference Paper
Knowledge Graph (KG) embedding has become crucial for the task of link prediction. Recent work applies encoder-decoder models to tackle this problem, where the encoder is formulated as a graph neural network (GNN) and the decoder is represented by an embedding method. These approaches augment embedding techniques with structural information. Unfortunately, existing GNN-based frameworks still face three severe problems: low representational power, stacking in a flat way, and poor robustness to noise. In this work, we propose a novel multi-level graph neural network (M-GNN) to address these challenges. We first identify an injective aggregation scheme and design a powerful GNN layer using multi-layer perceptrons (MLPs). Then, we define graph coarsening schemes for various kinds of relations, and stack GNN layers on a series of coarsened graphs, so as to model hierarchical structures. Furthermore, attention mechanisms are adopted so that our approach can make accurate predictions even on noisy knowledge graphs. Results on the WN18 and FB15k datasets show that our approach is effective in the standard link prediction task, significantly and consistently outperforming competitive baselines. Furthermore, robustness analysis on the FB15k-237 dataset demonstrates that our proposed M-GNN is highly robust to sparsity and noise.
TCS Journal 2018 Journal Article
TCS Journal 2018 Journal Article
TCS Journal 2018 Journal Article
AAAI Conference 2018 Conference Paper
Language Modeling (LM) is a fundamental research topic in a range of areas. Recently, inspired by quantum theory, a novel Quantum Language Model (QLM) has been proposed for Information Retrieval (IR). In this paper, we aim to broaden the theoretical and practical basis of QLM. We develop a Neural Network based Quantum-like Language Model (NNQLM) and apply it to Question Answering. Specifically, based on word embeddings, we design a new density matrix, which represents a sentence (e.g., a question or an answer) and encodes a mixture of semantic subspaces. Such a density matrix, together with a joint representation of the question and the answer, can be integrated into neural network architectures (e.g., 2-dimensional convolutional neural networks). Experiments on the TREC-QA and WIKIQA datasets have verified the effectiveness of our proposed models.
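As a rough illustration of a sentence-level density matrix built from word embeddings, the sketch below forms the mixture ρ = Σ_i p_i |w_i⟩⟨w_i| of outer products of L2-normalized word vectors, with uniform weights by default. The helper names (`density_matrix`, `trace`) and the uniform weighting are assumptions for illustration; the abstract does not specify NNQLM's exact construction.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def density_matrix(embeddings: Sequence[Sequence[float]],
                   weights: Optional[Sequence[float]] = None) -> Matrix:
    """Return rho = sum_i p_i * |w_i><w_i| for unit-normalized word vectors w_i.

    Weights default to uniform (an assumption) and are renormalized to sum
    to 1, so the result is symmetric, positive semidefinite, and has trace 1.
    Zero vectors are not handled.
    """
    n = len(embeddings)
    raw = list(weights) if weights is not None else [1.0] * n
    total = sum(raw)
    probs = [w / total for w in raw]
    dim = len(embeddings[0])
    rho = [[0.0] * dim for _ in range(dim)]
    for p, vec in zip(probs, embeddings):
        norm = math.sqrt(sum(x * x for x in vec))
        unit = [x / norm for x in vec]  # |w_i>, a unit vector
        for r in range(dim):
            for c in range(dim):
                rho[r][c] += p * unit[r] * unit[c]  # weighted outer-product term
    return rho


def trace(m: Matrix) -> float:
    return sum(m[i][i] for i in range(len(m)))


if __name__ == "__main__":
    rho = density_matrix([[1.0, 0.0], [0.0, 2.0]])
    print(rho)         # [[0.5, 0.0], [0.0, 0.5]]
    print(trace(rho))  # 1.0
```

The unit trace is what lets such a matrix be read as a probability mixture over the semantic directions spanned by the words.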
TIST Journal 2017 Journal Article
In many research and application areas, such as information retrieval and machine learning, we often have to deal with a probability distribution that is a mixture of one distribution relevant to the task at hand and another that is irrelevant and that we want to get rid of. Thus, separating the irrelevant distribution from the mixture distribution is an essential problem. This article focuses on the application in Information Retrieval, where relevance feedback is a widely used technique to build a refined query model based on a set of feedback documents. However, in practice, the relevance feedback set, even when provided by users explicitly or implicitly, is often a mixture of relevant and irrelevant documents. Consequently, the resultant query model (typically a term distribution) is often a mixture rather than a true relevance term distribution, which has a negative impact on retrieval performance. To tackle this problem, we recently proposed a Distribution Separation Method (DSM), which aims to approximate the true relevance distribution by separating a seed irrelevance distribution from the mixture one. While it achieved promising performance in an empirical evaluation with simulated explicit irrelevance feedback data, it has not been deployed in scenarios where the irrelevance feedback data must be obtained automatically. In this article, we propose a substantial extension of the basic DSM from two perspectives: developing a further regularization framework and deploying DSM in the automatic irrelevance feedback scenario. Specifically, to prevent the output distribution of DSM from drifting away from the true relevance distribution when the quality of the seed irrelevance distribution (the input to DSM) is not guaranteed, we propose a DSM regularization framework to constrain the estimation of the relevance distribution.
This regularization framework includes three algorithms, each corresponding to a regularization strategy incorporated in the objective function of DSM. In addition, we exploit DSM in automatic (i.e., pseudo) irrelevance feedback, by automatically detecting the seed irrelevant documents via three different document reranking methods. We have carried out extensive experiments based on various TREC datasets, in order to systematically evaluate the proposed methods. The experimental results demonstrate the effectiveness of our proposed approaches in comparison with various strong baselines.
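One simple way to read "separating a seed irrelevance distribution from the mixture" is to assume the feedback term distribution M is a linear mixture M = λR + (1 − λ)I of a relevance distribution R and an irrelevance distribution I, and to solve for R. The sketch below picks the smallest λ that keeps every recovered probability non-negative. Both the linear-mixture assumption and the hypothetical function `separate_relevance` are illustrative only; the regularized variants and the automatic seed detection are not reproduced.

```python
from typing import Dict

Distribution = Dict[str, float]


def separate_relevance(mixture: Distribution, irrelevant: Distribution) -> Distribution:
    """Recover R from M = lam * R + (1 - lam) * I (a linear-mixture assumption).

    lam is set to the smallest value for which every R(t) >= 0, i.e.
    1 - lam = min over terms t with I(t) > 0 of M(t) / I(t).
    """
    ratio = min(mixture.get(t, 0.0) / p for t, p in irrelevant.items() if p > 0)
    lam = 1.0 - ratio
    if lam <= 0.0:
        raise ValueError("mixture cannot be separated from the irrelevance distribution")
    recovered = {}
    for term in set(mixture) | set(irrelevant):
        value = (mixture.get(term, 0.0) - (1.0 - lam) * irrelevant.get(term, 0.0)) / lam
        recovered[term] = max(value, 0.0)  # clip tiny negatives caused by rounding
    total = sum(recovered.values())
    return {term: v / total for term, v in recovered.items()}


if __name__ == "__main__":
    # Built from R = {a: 0.8, b: 0.2} and I = {b: 0.5, c: 0.5} with lam = 0.6.
    mixture = {"a": 0.48, "b": 0.32, "c": 0.2}
    irrelevant = {"b": 0.5, "c": 0.5}
    print(separate_relevance(mixture, irrelevant))  # a ~ 0.8, b ~ 0.2, c ~ 0.0 (key order may vary)
```

If the seed irrelevance distribution is noisy, this unregularized estimate can drift, which is the failure mode the regularization framework targets.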
TCS Journal 2017 Journal Article
IS Journal 2017 Journal Article
Although existing antispam strategies detect traditional spam activities effectively, evolving spam schemes can evade conventional detection by buying comments that are written by genuine users and sold on specialized web markets. Such spam activities have become a kind of advertising campaign among business owners to maintain their top rankings. This article proposes a new collaborative marketing hyping detection solution that aims to identify spam comments generated by the Spam Reviewer Cloud and detect products that adopt an evolving spam strategy for promotion. The authors propose an unsupervised learning model that combines heterogeneous product review networks in an attempt to discover collective hyping activities. Their experiments validate the existence of collaborative marketing hyping activities on a real-life e-commerce platform and demonstrate that their model can effectively and accurately identify these advanced spam activities.
TCS Journal 2016 Journal Article
AAAI Conference 2016 Conference Paper
The Angry Birds AI Competition has been held annually since 2012 in conjunction with some of the major AI conferences, most recently with IJCAI 2015. The goal of the competition is to build AI agents that can play new Angry Birds levels as well as or better than the best human players. Successful agents should be able to quickly analyze new levels and predict the physical consequences of possible actions in order to select actions that solve a given level with a high score. Agents have no access to the game's internal physics, but only receive screenshots of the live game. In this paper we describe why this problem is a challenge for AI, and why it is an important step towards building AI that can successfully interact with the real world. We also summarise some highlights of past competitions, including a new competition track we introduced recently.
AAAI Conference 2016 Conference Paper
Multi-instance learning (MIL) is useful for tackling labeling ambiguity in learning tasks by allowing a bag of instances to share one label. Recently, bag mapping methods, which transform a bag into a single instance in a new space via instance selection, have drawn significant attention. To date, most existing works are developed based on the original space, i.e., they utilize all instances for bag mapping, and instance selection is only indirectly tied to the MIL objective. As a result, it is hard to guarantee the discriminative capacity of the selected instances in the new bag mapping space for MIL. In this paper, we propose a direct discriminative mapping approach for multi-instance learning (MILDM), which identifies instances that directly distinguish bags in the new mapping space. Experiments and comparisons on real-world learning tasks demonstrate the algorithm's performance.
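To make "bag mapping" concrete, the sketch below maps a bag to one vector by taking, for each selected instance (prototype), the largest Gaussian similarity between that prototype and any instance in the bag, a MILES-style embedding. The prototypes are simply given here; MILDM's discriminative instance selection is not reproduced, and `bag_to_vector` and `sigma` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def bag_to_vector(bag: Sequence[Vector], prototypes: Sequence[Vector],
                  sigma: float = 1.0) -> List[float]:
    """Map a bag of instances to one feature vector, one entry per prototype.

    Entry k is max over instances x in the bag of exp(-||x - c_k||^2 / sigma^2),
    so a bag scores high on prototype c_k if any of its instances lies close to c_k.
    """
    features = []
    for proto in prototypes:
        best = max(
            math.exp(-sum((a - b) ** 2 for a, b in zip(inst, proto)) / sigma ** 2)
            for inst in bag
        )
        features.append(best)
    return features


if __name__ == "__main__":
    bag = [[0.0, 0.0], [3.0, 4.0]]
    prototypes = [[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]]
    print(bag_to_vector(bag, prototypes))  # [1.0, 1.0, <tiny value>]
```

Once every bag is a fixed-length vector, any standard single-instance classifier can be trained on the mapped bags, which is why the choice of prototypes matters so much.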
AAAI Conference 2016 Conference Paper
The restricted Boltzmann machine (RBM) has been used as a building block for many successful deep learning models, e.g., deep belief networks (DBNs) and deep Boltzmann machines (DBMs). The training of RBMs can be extremely slow in pathological regions. Second-order optimization methods, such as quasi-Newton methods, were proposed to deal with this problem. However, non-convexity creates many obstacles to training RBMs, including the infeasibility of applying second-order optimization methods. To overcome this obstacle, we introduce an EM-like iterative project quasi-Newton (IPQN) algorithm. Specifically, we iteratively perform the sampling procedure, in which parameters need not be updated, and the sub-training procedure, which is convex. In the sub-training procedures, we apply quasi-Newton methods to deal with the pathological problem. We further show that Newton’s method turns out to be a good approximation of the natural gradient (NG) method in RBM training. We evaluate IPQN in a series of density estimation experiments on an artificial dataset and the MNIST digit dataset. Experimental results indicate that IPQN achieves improved convergence performance over the traditional contrastive divergence (CD) method.
AAAI Conference 2016 Conference Paper
In this paper we theoretically study the minimum Differentially Resolving Set (DRS) problem, derived from the classical sensor placement optimization problem in network source locating. A DRS of a graph G = (V, E) is defined as a subset S ⊆ V such that any two elements in V can be distinguished by their different differential characteristic sets defined on S. The minimum DRS problem aims to find a DRS S in the graph G with minimum total weight ∑_{v∈S} w(v). We establish a group of Integer Linear Programming (ILP) models as the solution. Using weighted set cover theory, we propose an approximation algorithm with Θ(ln n) approximability for the minimum DRS problem on general graphs, where n is the graph size.
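Weighted set cover is classically approximated within a factor H(n) ≤ ln n + 1 by a greedy rule: repeatedly pick the set with the lowest weight per newly covered element. The sketch below shows that generic greedy rule only. How a DRS instance maps onto set cover (for example, treating vertex pairs as the elements and letting each candidate vertex cover the pairs it distinguishes) is an assumption here, as is the function name `greedy_weighted_set_cover`.

```python
from typing import Dict, Hashable, List, Optional, Set


def greedy_weighted_set_cover(universe: Set[Hashable],
                              sets: Dict[str, Set[Hashable]],
                              weights: Dict[str, float]) -> List[str]:
    """Greedy weighted set cover: at each step choose the set with the smallest
    weight per newly covered element (an H(n)-approximation, H(n) <= ln n + 1)."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best_name: Optional[str] = None
        best_ratio = float("inf")
        for name, members in sets.items():
            gain = len(members & uncovered)
            if gain == 0:
                continue  # this set covers nothing new
            ratio = weights[name] / gain
            if ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("the given sets cannot cover the universe")
        chosen.append(best_name)
        uncovered -= sets[best_name]
    return chosen


if __name__ == "__main__":
    universe = {1, 2, 3, 4, 5}
    sets = {"A": {1, 2, 3}, "B": {4, 5}, "C": {1, 2, 3, 4, 5}}
    weights = {"A": 3.0, "B": 1.0, "C": 10.0}
    print(greedy_weighted_set_cover(universe, sets, weights))  # ['B', 'A']
```

Under the pair-based reduction, the universe has O(n²) elements, and ln(n²) = 2 ln n, which is still Θ(ln n).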
TCS Journal 2016 Journal Article
IJCAI Conference 2016 Conference Paper
The capability to predict changes of spatial regions is important for an intelligent system that interacts with the physical world. For example, in a disaster management scenario, predicting potentially endangered areas and inferring safe zones is essential for planning evacuations and countermeasures. Existing approaches usually predict such spatial changes by simulating the physical world based on specific models. Thus, these simulation-based methods cannot provide reliable predictions when the scenario is not similar to any of the models in use or when the input parameters are incomplete. In this paper, we present a prediction approach that overcomes this problem by using a more general model and by analysing the trend of the spatial changes. The method is also flexible enough to incorporate new observations and to adapt its predictions to new situations.
JBHI Journal 2015 Journal Article
This paper presents a wirelessly powered implantable electrochemical sensor tag for continuous blood glucose monitoring. The system is remotely powered by a 13.56-MHz inductive link and uses the ISO 15693 radio frequency identification (RFID) standard for communication. The system provides reliable and accurate measurement of changing glucose levels. The sensor tag employs a long-term glucose sensor, a winding ferrite antenna, an RFID front-end, a potentiostat, a 10-bit sigma-delta analog-to-digital converter, an on-chip temperature sensor, and a digital baseband for protocol processing and control. A high-frequency external reader is used to power, command, and configure the sensor tag. The only off-chip support circuitry required is a tuned antenna and a glucose microsensor. The integrated chip, fabricated in the SMIC 0.13-μm CMOS process, occupies an area of 1.2 mm × 2 mm and consumes 50 μW. The power sensitivity of the whole system is -4 dBm. The sensor tag achieves a measured glucose range of 0-30 mM with a sensitivity of 0.75 nA/mM.
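As a quick consistency check on the reported figures, and assuming the sensor current varies linearly with glucose concentration over the measured range (an assumption the abstract does not state), the 0.75 nA/mM sensitivity implies a full-scale current change of roughly:

```latex
\Delta I = S \cdot \Delta C
         = 0.75\,\mathrm{nA/mM} \times 30\,\mathrm{mM}
         = 22.5\,\mathrm{nA}
```

Currents of this size are well within the nanoampere range that a potentiostat front-end feeding a 10-bit converter is typically designed to resolve.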
IJCAI Conference 2015 Conference Paper
Qualitative spatial reasoning deals with relational spatial knowledge and with how this knowledge can be processed efficiently. Identifying suitable representations for spatial knowledge and checking whether the given knowledge is consistent has been the main research focus in the past two decades. However, where the spatial information comes from, what kind of information can be obtained and how it can be obtained has been largely ignored. This paper is an attempt to start filling this gap. We present a method for extracting detailed spatial information from sensor measurements of regions. We analyse how different sparse sensor measurements can be integrated and what spatial information can be extracted from sensor measurements. Different from previous approaches to qualitative spatial reasoning, our method allows us to obtain detailed information about the internal structure of regions. The result has practical implications, for example, in disaster management scenarios, which include identifying the safe zones in bushfire and flood regions.
TCS Journal 2015 Journal Article
IJCAI Conference 2015 Conference Paper
Influence maximization plays a key role in social network viral marketing. Although the problem has been widely studied, it is still challenging to estimate influence spread in big networks with hundreds of millions of nodes. Existing heuristic algorithms and greedy algorithms incur heavy computation costs in big networks and are incapable of processing dynamic network structures. In this paper, we propose an incremental algorithm for influence spread estimation in big networks. The incremental algorithm breaks down big networks into small subgraphs and continuously estimates influence spread on these subgraphs as data streams. The challenge for the incremental algorithm is that subgraphs derived from a big network are not independent, and Monte Carlo (MC) simulations on each subgraph (defined as snapshots) may conflict with each other. In this paper, we assume that different combinations of MC simulations on subgraphs generate independent samples. In so doing, the incremental algorithm on streaming subgraphs can estimate influence spread with fewer simulations. Experimental results demonstrate the performance of the proposed algorithm.
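The Monte Carlo estimation of influence spread that the incremental algorithm builds on can be sketched as follows, assuming the widely used independent cascade (IC) diffusion model, which the abstract does not name explicitly. In each run, every newly activated node gets one chance to activate each inactive out-neighbor with that edge's probability, and the spread estimate is the mean number of active nodes over many runs. The streaming-subgraph decomposition is not reproduced, and `simulate_cascade` and `estimate_spread` are hypothetical names.

```python
import random
from typing import Dict, Iterable, List, Tuple

# Adjacency list: node -> list of (neighbor, activation probability).
Graph = Dict[int, List[Tuple[int, float]]]


def simulate_cascade(graph: Graph, seeds: Iterable[int], rng: random.Random) -> int:
    """Run one independent-cascade simulation and return the number of active nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active node gets one chance per inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def estimate_spread(graph: Graph, seeds: Iterable[int], runs: int = 1000,
                    seed: int = 0) -> float:
    """Average cascade size over `runs` Monte Carlo simulations."""
    rng = random.Random(seed)
    seed_list = list(seeds)
    return sum(simulate_cascade(graph, seed_list, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    g: Graph = {0: [(1, 0.5), (2, 0.5)], 1: [(3, 0.5)]}
    # Exact expected spread here is 1 + 0.5 + 0.5 + 0.25 = 2.25.
    print(estimate_spread(g, [0], runs=5000))
```

Each estimate needs many full simulations, which is the cost the abstract's incremental, subgraph-based scheme aims to reduce on big networks.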
YNIMG Journal 2015 Journal Article
IJCAI Conference 2015 Conference Paper
Recently, a Quantum Language Model (QLM) was proposed to model term dependencies within the Quantum Theory (QT) framework and successfully applied in Information Retrieval (IR). Nevertheless, QLM’s dependency modeling is based on co-occurrences of terms and has not yet taken into account Quantum Entanglement (QE), which is a key quantum concept with significant cognitive implications. In QT, an entangled state can provide a more complete description of the nature of reality and determine intrinsic correlations of the considered objects globally, rather than surface-level co-occurrences. It is, however, a real challenge to decide and measure QE using the classical statistics of texts in a post-measurement configuration. To circumvent this problem, we theoretically prove the connection between QE and statistically Unconditional Pure Dependence (UPD). Since UPD has an implementable deciding algorithm, we can in turn characterize QE by extracting UPD patterns from texts. This leads to a measurable QE, based on which we further advance the existing QLM framework. We empirically compare our model with related models, and the results demonstrate its effectiveness.
AAAI Conference 2014 Conference Paper
With the rapid growth of event-based social networks (EBSNs) such as Meetup, the demand for event recommendation has become increasingly urgent. In EBSNs, event recommendation plays a central role: it recommends the most relevant events to users who are likely to participate in them. Unlike traditional recommendation problems, event recommendation involves three new types of information, i.e., heterogeneous online+offline social relationships, geographical features of events, and implicit rating data from users. Yet combining these three types of data for offline event recommendation has not been considered. We therefore present a Bayesian latent factor model that unifies these data for event recommendation. Experimental results on real-world data sets show the performance of our method.
AAAI Conference 2014 Conference Paper
With the rapid growth of event-based social networks, the demand for event recommendation has become increasingly important. Unlike classic recommendation problems, event recommendation generally faces heterogeneous online and offline social relationships among users, as well as implicit feedback data. In this paper, we present a Bayesian probabilistic model that fully exploits heterogeneous social relations and efficiently handles the implicit-feedback nature of the data for event recommendation. Experimental results on several real-world datasets demonstrate the utility of our method.
KR Conference 2014 Conference Paper
TCS Journal 2009 Journal Article
TCS Journal 2007 Journal Article
TCS Journal 2007 Journal Article
TCS Journal 2006 Journal Article
YNIMG Journal 2006 Journal Article
YNIMG Journal 2005 Journal Article
AAAI Conference 2005 Conference Paper
Parzen Windows, a nonparametric method, has been applied to a variety of density estimation and classification problems. Like nearest neighbor methods, Parzen Windows involves no learning. Although it converges to the true but unknown probability density in the asymptotic limit, its performance with finite samples lacks theoretical analysis. In this paper we establish a finite sample error bound for Parzen Windows. We first show that Parzen Windows is an approximation to regularized least squares (RLS) methods, which have been well studied in statistical learning theory. We then derive the finite sample error bound for Parzen Windows and discuss the properties of the bound and its relationship to the error bound for RLS. This analysis provides insight into Parzen Windows, as well as the nearest neighbor method, from the point of view of learning theory. Finally, we provide empirical results on the performance of Parzen Windows and other methods, such as nearest neighbors, RLS, and SVMs, on a number of real data sets. These results corroborate our theoretical analysis well.
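For readers unfamiliar with the method, here is a minimal sketch of Parzen window density estimation and the corresponding kernel classifier. It assumes an isotropic Gaussian window with bandwidth `h` and class labels in {-1, +1}. The kernel choice and the names `gaussian_kernel`, `parzen_density`, and `parzen_classify` are illustrative assumptions, not taken from the paper. As the abstract notes, there is no training step: prediction simply averages kernel evaluations over the stored samples.

```python
import math


def gaussian_kernel(x, z, h):
    """Unnormalised isotropic Gaussian window exp(-||x - z||^2 / (2 h^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * h * h))


def parzen_density(x, samples, h):
    """Parzen window estimate of the density at point x from d-dimensional samples."""
    d = len(x)
    norm = (2.0 * math.pi * h * h) ** (d / 2.0)  # makes each window integrate to 1
    total = sum(gaussian_kernel(x, s, h) for s in samples)
    return total / (len(samples) * norm)


def parzen_classify(x, samples, labels, h):
    """Return +1 or -1 from the sign of the kernel-weighted average label (ties go to +1)."""
    score = sum(y * gaussian_kernel(x, s, h) for s, y in zip(samples, labels)) / len(samples)
    return 1 if score >= 0 else -1
```

The bandwidth `h` plays the role that the number of neighbours plays in nearest neighbor methods. A small `h` gives a spiky, high-variance estimate, while a large `h` smooths it out.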