Arrow Research search

Author name cluster

Qiang Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

21 papers
2 author rows

Possible papers


EAAI Journal 2026 Journal Article

Causal prototype variational information bottleneck framework for cross-domain fault diagnosis

  • Yu Wang
  • Chenyu Jiang
  • Qiang Chen
  • Shujie Liu
  • Weiwei Liu

In practical applications of modern industry, the safe and reliable operation of rotating machinery is crucial. However, due to the domain shift problem caused by complex and variable working conditions, the generalization ability of Artificial Intelligence (AI)-based cross-domain fault diagnosis models is challenged. Traditional data-driven models rely on statistical associations and are prone to capturing non-causal spurious correlations, leading to performance degradation under various working conditions. To overcome this limitation, this paper proposes a novel Causal Prototypical Variational Information Bottleneck (CP-VIB) framework. The generation mechanism of vibration signals is modeled as a Structural Causal Model (SCM) to serve as a prior for feature decoupling, cutting off the non-causal confounding paths caused by working condition characteristics. By combining the information bottleneck principle with approximate causal intervention, working condition information is compressed while fault-related causal mutual information is retained. To implement this framework, the classification task is formulated as a Euclidean distance minimization problem between Monte Carlo sampled representations and causal prototypes. Experimental results on multiple datasets containing severe compound working condition shifts demonstrate that this AI diagnostic method can achieve robust fault diagnosis under varying working condition scenarios, possessing practical application value.
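The classification step described in this abstract, assigning a sample to the causal prototype nearest to its Monte Carlo sampled representations, can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the Gaussian latent parameterization (`mu`, `sigma`), the function name `classify_by_prototype`, the class labels, the prototype values, and the sample count are all assumptions made for the example.

```python
import math
import random


def classify_by_prototype(mu, sigma, prototypes, n_samples=32, seed=0):
    """Return the label of the prototype closest, on average, to sampled latents.

    mu, sigma  : per-dimension mean and std of an assumed Gaussian latent
    prototypes : dict mapping class label -> prototype vector (hypothetical values)
    """
    rng = random.Random(seed)
    totals = {label: 0.0 for label in prototypes}
    for _ in range(n_samples):
        # Monte Carlo sample z = mu + sigma * eps, with eps ~ N(0, 1)
        z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        for label, proto in prototypes.items():
            totals[label] += math.dist(z, proto)  # Euclidean distance
    # The class with the smallest summed (hence mean) distance wins
    return min(totals, key=totals.get)


# Hypothetical two-class setup with hand-picked prototypes
prototypes = {"normal": [0.0, 0.0], "inner_race_fault": [5.0, 5.0]}
predicted = classify_by_prototype(
    mu=[4.6, 5.2], sigma=[0.1, 0.1], prototypes=prototypes
)
```

Averaging the distance over several samples, rather than using the mean representation alone, lets the latent variance influence the decision; whether the paper averages distances in exactly this way is not stated in the abstract.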

AAAI Conference 2026 Conference Paper

ROVER: Robust Generative Continual Identity Unlearning Against Relearning Attacks

  • Tairan Huang
  • Qiang Chen
  • Beibei Hu
  • Yunlong Zhao
  • Hongyan Xu
  • Zhiyuan Chen
  • Yi Chen
  • Xiu Su

Recent generative unlearning models synthesize high-quality samples while protecting private information by unlearning the identity. However, existing generative identity unlearning methods face two challenges in multi-identity unlearning: 1) identity conflicts, which cause conflicts of model parameters in the continuous erasure of multiple identities; 2) fragile unlearning, where the model's unlearning ability deteriorates or fails under malicious attacks. In this paper, we introduce a critical yet under-explored task called robust multi-identity unlearning, with the goals of resolving identity conflicts to achieve interference-free unlearning and protecting against malicious attacks to achieve robust unlearning. To satisfy these goals, we propose a novel framework, RObust generatiVE continual identity unlearning against Relearning attacks (ROVER). By filtering unlearning requests with latent similarity, our method effectively isolates benign unlearning from malicious attacks to preserve identity removal integrity. Meanwhile, a residual orthogonal resonator resolves identity conflicts in the continuous erasure of multiple identities, preserving stability in benign continual unlearning. Moreover, we introduce a phantom guard network to block malicious attacks by absorbing adversarial gradients, ensuring irreversible identity unlearning. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance on the task of robust multi-identity unlearning against relearning attacks.

AAAI Conference 2025 Conference Paper

Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning

  • Shiping Ge
  • Qiang Chen
  • Zhiwei Jiang
  • Yafeng Yin
  • Liu Qin
  • Ziyao Chen
  • Qing Gu

Weakly-Supervised Dense Video Captioning (WSDVC) aims to localize and describe all events of interest in a video without requiring annotations of event boundaries. This setting poses a great challenge in accurately determining the temporal location of events, as the relevant supervision is unavailable. Existing methods rely on explicit alignment constraints between event locations and captions, which involve complex event proposal procedures during both training and inference. To tackle this problem, we propose a novel implicit location-caption alignment paradigm by complementary masking, which simplifies the complex event proposal and localization process while maintaining effectiveness. Specifically, our model comprises two components: a dual-mode video captioning module and a mask generation module. The dual-mode video captioning module captures global event information and generates descriptive captions, while the mask generation module generates differentiable positive and negative masks for localizing the events. These masks enable the implicit alignment of event locations and captions by ensuring that captions generated from positively and negatively masked videos are complementary, thereby forming a complete video description. In this way, even under weak supervision, the event location and event caption can be aligned implicitly. Extensive experiments on public datasets demonstrate that our method outperforms existing weakly-supervised methods and achieves competitive results compared to fully-supervised methods.

EAAI Journal 2025 Journal Article

Multi-modal feature integration network for Visible-Depth-Thermal salient object detection

  • Fengyv Cui
  • Xiaofei Zhou
  • Liuxin Bao
  • Bin Wan
  • Ran Shi
  • Qiang Chen
  • Jiyong Zhang

In recent years, the task of salient object detection in multi-modal scenarios has attracted more and more attention, where the increase of modalities is beneficial for improving the detection performance of models. However, though the existing saliency models have achieved encouraging performance, they overlook the unbalanced information content between visible modality and other auxiliary modalities (i.e., depth and thermal modalities), and lack the full utilization of multi-level features. This will lead to insufficient multi-modal fusion and multi-level integration. Therefore, in this paper, we propose a multi-modal feature integration network (MFINet) for Visible-Depth-Thermal (VDT) salient object detection (SOD), which contains three key modules. Firstly, we utilize the three-modal feature fusion (TMFF) module to enhance and fuse the multi-modal features by emphasizing effective feature channels and enlarging the receptive fields of features, where we further emphasize the visible cues. Secondly, we present a neighborhood layer feature enhancement (NLFE) module, which can utilize the complementary information from adjacent TMFF modules to enhance the decoder features by using different spatial attention strategies. Thirdly, a multi-level cascade feature integration (MCFI) module is proposed to aggregate the multi-level decoder features in a cascade way, acquiring the final high-quality saliency maps. Comprehensive experiments conducted on the VDT-2048 dataset demonstrate that our model outperforms the state-of-the-art models in terms of all evaluation metrics. The code is available at https://github.com/banjamn/MFINet.

EAAI Journal 2025 Journal Article

Physics-informed deep homogenization approach for random nanoporous composites with energetic interfaces

  • Qiang Chen
  • George Chatzigeorgiou
  • Fodil Meraghni
  • Xuefeng Chen
  • Zhibo Yang

This contribution presents a new physics-informed deep homogenization neural network model for identifying local displacement and stress fields, as well as homogenized moduli, of nanocomposites with periodic arrays of porosities under general loading conditions. Notably, it accounts for the surface elasticity effect, utilizing the Gurtin-Murdoch interface theory. First of all, a fully connected neural network model is established that maps the spatial coordinates, passing first through several sinusoidal functions, to the microscopic displacements. The loss function is formulated as the weighted sum of residuals of Navier-Cauchy equations in the bulk domains and the Young-Laplace equations on the energetic surfaces, evaluated on separate sets of collocation points. To more effectively predict stress concentrations inside the microstructures, we introduce fully trainable weights to each collocation point. The capacity and effectiveness of the new homogenization technique for capturing the size-dependent local and global response of nanocomposites with distinct pore sizes and shapes are verified upon extensive comparisons with the finite-element benchmark results, under various loading conditions. New results showcase the proposed theory's ability to model random distributions of nano-porosities with a high degree of accuracy, a task not easily achievable with alternative techniques except for the specialized finite-element method.

TMLR Journal 2023 Journal Article

CAE v2: Context Autoencoder with CLIP Latent Alignment

  • Xinyu Zhang
  • Jiahui Chen
  • Junkun Yuan
  • Qiang Chen
  • Jian Wang
  • Xiaodi Wang
  • Shumin Han
  • Xiaokang Chen

Masked image modeling (MIM) learns visual representations by predicting the masked patches on a pre-defined target. Inspired by MVP (Wei et al., 2022b), which shows impressive gains with CLIP, in this work we also employ the semantically rich CLIP latent as the target and further tap its potential by introducing a new MIM pipeline, CAE v2, to learn a high-quality encoder and facilitate model convergence on the pre-training task. CAE v2 is an improved variant of CAE (Chen et al., 2023), applying the CLIP latent on two pretraining tasks, i.e., visible latent alignment and masked latent alignment. Visible latent alignment directly mimics the visible latent representations from the encoder to the corresponding CLIP latent, which is beneficial for facilitating model convergence and improving the representative ability of the encoder. Masked latent alignment predicts the representations of masked patches within the feature space of the CLIP latent as the standard MIM task does, effectively aligning the representations computed from the encoder and the regressor into the same domain. We pretrain CAE v2 on ImageNet-1K images and evaluate on various downstream vision tasks, including image classification, semantic segmentation, object detection and instance segmentation. Experiments show that our CAE v2 achieves competitive performance and even outperforms the CLIP vision encoder, demonstrating the effectiveness of our method. Code is available at https://github.com/Atten4Vis/CAE.

JBHI Journal 2023 Journal Article

LAGAN: Lesion-Aware Generative Adversarial Networks for Edema Area Segmentation in SD-OCT Images

  • Yuhui Tao
  • Xiao Ma
  • Yizhe Zhang
  • Kun Huang
  • Zexuan Ji
  • Wen Fan
  • Songtao Yuan
  • Qiang Chen

A large volume of labeled data is a cornerstone for deep learning (DL) based segmentation methods. Medical images require domain experts to annotate, and full segmentation annotations of large volumes of medical data are difficult, if not impossible, to acquire in practice. Compared with full annotations, image-level labels are multiple orders of magnitude faster and easier to obtain. Image-level labels contain rich information that correlates with the underlying segmentation tasks and should be utilized in modeling segmentation problems. In this article, we aim to build a robust DL-based lesion segmentation model using only image-level labels (normal vs. abnormal). Our method consists of three main steps: (1) training an image classifier with image-level labels; (2) utilizing a model visualization tool to generate an object heat map for each training sample according to the trained classifier; (3) based on the generated heat maps (as pseudo-annotations) and an adversarial learning framework, we construct and train an image generator for Edema Area Segmentation (EAS). We name the proposed method Lesion-Aware Generative Adversarial Networks (LAGAN) as it combines the merits of supervised learning (being lesion-aware) and adversarial training (for image generation). Additional technical treatments, such as the design of a multi-scale patch-based discriminator, further enhance the effectiveness of our proposed method. We validate the superior performance of LAGAN via comprehensive experiments on two publicly available datasets (i.e., AI Challenger and RETOUCH).

ICLR Conference 2022 Conference Paper

A Biologically Interpretable Graph Convolutional Network to Link Genetic Risk Pathways and Imaging Phenotypes of Disease

  • Sayan Ghosal
  • Qiang Chen
  • Giulio Pergola
  • Aaron L. Goldman
  • William Ulrich
  • Daniel R. Weinberger
  • Archana Venkataraman

We propose a novel end-to-end framework for whole-brain and whole-genome imaging-genetics. Our genetics network uses hierarchical graph convolution and pooling operations to embed subject-level data onto a low-dimensional latent space. The hierarchical network implicitly tracks the convergence of genetic risk across well-established biological pathways, while an attention mechanism automatically identifies the salient edges of this network at the subject level. In parallel, our imaging network projects multimodal data onto a set of latent embeddings. For interpretability, we implement a Bayesian feature selection strategy to extract the discriminative imaging biomarkers; these feature weights are optimized alongside the other model parameters. We couple the imaging and genetic embeddings with a predictor network, to ensure that the learned representations are linked to phenotype. We evaluate our framework on a schizophrenia dataset that includes two functional MRI paradigms and gene scores derived from Single Nucleotide Polymorphism data. Using repeated 10-fold cross-validation, we show that our imaging-genetics fusion achieves better classification performance than state-of-the-art baselines. In an exploratory analysis, we further show that the biomarkers identified by our model are reproducible and closely associated with deficits in schizophrenia.

AAAI Conference 2022 Conference Paper

DPNAS: Neural Architecture Search for Deep Learning with Differential Privacy

  • Anda Cheng
  • Jiaxing Wang
  • Xi Sheryl Zhang
  • Qiang Chen
  • Peisong Wang
  • Jian Cheng

Training deep neural networks (DNNs) for meaningful differential privacy (DP) guarantees severely degrades model utility. In this paper, we demonstrate that the architecture of DNNs has a significant impact on model utility in the context of private deep learning, whereas its effect is largely unexplored in previous studies. To fill this gap, we propose the very first framework that employs neural architecture search for automatic model design in private deep learning, dubbed DPNAS. To integrate private learning with architecture search, we carefully design a novel search space and propose a DP-aware method for training candidate models. We empirically verify the effectiveness of the proposed framework. The searched model DPNASNet achieves state-of-the-art privacy/utility trade-offs, e.g., for the privacy budget of (ε, δ) = (3, 1 × 10⁻⁵), our model obtains test accuracy of 98.57% on MNIST, 88.09% on FashionMNIST, and 68.33% on CIFAR-10. Furthermore, by studying the generated architectures, we provide several intriguing findings of designing private-learning-friendly DNNs, which can shed new light on model design for deep learning with differential privacy.

JBHI Journal 2022 Journal Article

Joint Optimization of CycleGAN and CNN Classifier for Detection and Localization of Retinal Pathologies on Color Fundus Photographs

  • Ziyue Zhang
  • Zexuan Ji
  • Qiang Chen
  • Songtao Yuan
  • Wen Fan

Retinal diseases are the leading cause of vision loss, and severe retinal lesions cause irreversible damage to vision. Therefore, automatic methods for retinal disease detection based on medical images are essential for timely treatment. Considering that manual diagnosis and analysis of medical images require a large number of qualified experts, deep learning can effectively diagnose diseases and locate critical biomarkers. In this paper, we present a novel model that jointly optimizes the cycle generative adversarial network (CycleGAN) and the convolutional neural network (CNN) to detect retinal diseases and localize lesion areas with limited training data. The CycleGAN with cycle consistency can generate more realistic and reliable images. The discriminator and the generator achieve a local optimal solution in an adversarial manner, and the generator and the classifier work in a cooperative manner to distinguish the domain of input images. A novel res-guided sampling block is proposed by combining learnable residual features and pixel-adaptive convolutions. A res-guided U-Net is constructed as the generator by substituting the traditional convolution with the res-guided sampling blocks. Our model achieves superior classification and localization performance on the LAG, Ichallenge-PM and Ichallenge-AMD datasets. With clear localization for lesion areas, the competitive results reveal the great potential of the joint optimization network. The source code is available at https://github.com/jizexuan/JointOptmization.

JBHI Journal 2022 Journal Article

LamNet: A Lesion Attention Maps-Guided Network for the Prediction of Choroidal Neovascularization Volume in SD-OCT Images

  • Yuhan Zhang
  • Xiao Ma
  • Mingchao Li
  • Zexuan Ji
  • Songtao Yuan
  • Qiang Chen

Choroidal neovascularization (CNV) volume prediction has important clinical significance for predicting the therapeutic effect and scheduling the follow-up. In this paper, we propose a Lesion Attention Maps-Guided Network (LamNet) to automatically predict the CNV volume at the next follow-up visit after therapy based on 3-dimensional spectral-domain optical coherence tomography (SD-OCT) images. In particular, the backbone of LamNet is a 3D convolutional neural network (3D-CNN). In order to guide the network to focus on the local CNV lesion regions, we use CNV attention maps generated by an attention map generator to produce the multi-scale local context features. Then, the multi-scale local and global feature maps are fused to achieve high-precision CNV volume prediction. In addition, we also design a synergistic multi-task predictor, in which a trend-consistent loss ensures that the change trend of the predicted CNV volume is consistent with the real change trend of the CNV volume. The experiments include a total of 541 SD-OCT cubes from 68 patients with two types of CNV captured by two different SD-OCT devices. The results demonstrate that LamNet can provide reliable and accurate CNV volume prediction, which would further assist clinical diagnosis and the design of treatment options.

JBHI Journal 2022 Journal Article

Self-Supervised Sequence Recovery for Semi-Supervised Retinal Layer Segmentation

  • Jiadong Yang
  • Yuhui Tao
  • Qiuzhuo Xu
  • Yuhan Zhang
  • Xiao Ma
  • Songtao Yuan
  • Qiang Chen

Automated layer segmentation plays an important role in retinal disease diagnosis in optical coherence tomography (OCT) images. However, severe retinal diseases degrade the performance of automated layer segmentation approaches. In this paper, we present a robust semi-supervised layer segmentation network to reduce model failures on abnormal retinas. We obtain the lesion features from the labeled images with disease-balanced distribution, and utilize the unlabeled images to supplement the layer structure information. Specifically, in our method, cross-consistency training is applied over the predictions of different decoders, and we enforce consistency between different decoder predictions to improve the encoder's representation. Then, we propose a self-supervised sequence prediction branch, which is designed to predict the position of each jigsaw puzzle piece to obtain sensory perception of the retinal layer structure. For this task, a layer spatial pyramid pooling (LSPP) module is designed to extract multi-scale layer spatial features. Furthermore, we use optical coherence tomography angiography (OCTA) to supplement the information damaged by diseases. The experimental results illustrate that our method achieves more robust results compared with current supervised segmentation methods. Meanwhile, advanced segmentation performance can be obtained compared with state-of-the-art semi-supervised segmentation methods.

NeurIPS Conference 2022 Conference Paper

Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning

  • Yanpeng Sun
  • Qiang Chen
  • Xiangyu He
  • Jian Wang
  • Haocheng Feng
  • Junyu Han
  • Errui Ding
  • Jian Cheng

Freezing the pre-trained backbone has become a standard paradigm to avoid overfitting in few-shot segmentation. In this paper, we rethink the paradigm and explore a new regime: fine-tuning a small part of parameters in the backbone. We present a solution to overcome the overfitting problem, leading to better model generalization on learning novel classes. Our method decomposes backbone parameters into three successive matrices via the Singular Value Decomposition (SVD), then only fine-tunes the singular values and keeps others frozen. The above design allows the model to adjust feature representations on novel classes while maintaining semantic clues within the pre-trained backbone. We evaluate our Singular Value Fine-tuning (SVF) approach on various few-shot segmentation methods with different backbones. We achieve state-of-the-art results on both Pascal-5ⁱ and COCO-20ⁱ across 1-shot and 5-shot settings. Hopefully, this simple baseline will encourage researchers to rethink the role of backbone fine-tuning in few-shot settings.

YNIMG Journal 2021 Journal Article

A generative-discriminative framework that integrates imaging, genetic, and diagnosis into coupled low dimensional space

  • Sayan Ghosal
  • Qiang Chen
  • Giulio Pergola
  • Aaron L. Goldman
  • William Ulrich
  • Karen F. Berman
  • Giuseppe Blasi
  • Leonardo Fazio

We propose a novel optimization framework that integrates imaging and genetics data for simultaneous biomarker identification and disease classification. The generative component of our model uses a dictionary learning framework to project the imaging and genetic data into a shared low dimensional space. We have coupled both the data modalities by tying the linear projection coefficients to the same latent space. The discriminative component of our model uses logistic regression on the projection vectors for disease diagnosis. This prediction task implicitly guides our framework to find interpretable biomarkers that are substantially different between a healthy and disease population. We exploit the interconnectedness of different brain regions by incorporating a graph regularization penalty into the joint objective function. We also use a group sparsity penalty to find a representative set of genetic basis vectors that span a low dimensional space where subjects are easily separable between patients and controls. We have evaluated our model on a population study of schizophrenia that includes two task fMRI paradigms and single nucleotide polymorphism (SNP) data. Using ten-fold cross validation, we compare our generative-discriminative framework with canonical correlation analysis (CCA) of imaging and genetics data, parallel independent component analysis (pICA) of imaging and genetics data, random forest (RF) classification, and a linear support vector machine (SVM). We also quantify the reproducibility of the imaging and genetics biomarkers via subsampling. Our framework achieves higher class prediction accuracy and identifies robust biomarkers. Moreover, the implicated brain regions and genetic variants underlie the well documented deficits in schizophrenia.

JBHI Journal 2021 Journal Article

Noise Reduction for SD-OCT Using a Structure-Preserving Domain Transfer Approach

  • Menglin Wu
  • Wei Chen
  • Qiang Chen
  • Hyunjin Park

Spectral-domain optical coherence tomography (SD-OCT) images inevitably suffer from multiplicative speckle noise caused by random interference. This study proposes an unsupervised domain adaptation approach for noise reduction by translating the SD-OCT to the corresponding high-quality enhanced depth imaging (EDI)-OCT. We propose a structure-preserved cycle-consistent generative adversarial network for unpaired image-to-image translation, which can be applied to imbalanced unpaired data, and can effectively preserve retinal details based on a structure-specific cross-domain description. It also imposes smoothness by penalizing the intensity variation of the low reflective region between consecutive slices. Our approach was tested on a local data set that consisted of 268 SD-OCT volumes and two public independent validation datasets including 20 SD-OCT volumes and 17 B-scans, respectively. Experimental results show that our method can effectively suppress noise and maintain the retinal structure, compared with other traditional approaches and deep learning methods in terms of qualitative and quantitative assessments. Our proposed method shows good performance for speckle noise reduction and can assist downstream tasks of OCT analysis.

JBHI Journal 2020 Journal Article

Automated Quantification of Hyperreflective Foci in SD-OCT With Diabetic Retinopathy

  • Idowu Paul Okuwobi
  • Zexuan Ji
  • Wen Fan
  • Songtao Yuan
  • Loza Bekalo
  • Qiang Chen

The presence of hyperreflective foci (HFs) is related to retinal disease progression, and the quantity has proven to be a prognostic factor of visual and anatomical outcome in various retinal diseases. However, the lack of efficient quantitative tools for evaluating HFs has prevented ophthalmologists from assessing the volume of HFs. For this reason, we propose an automated quantification algorithm to segment and quantify HFs in spectral domain optical coherence tomography (SD-OCT). The proposed algorithm consists of two parallel processes, namely region of interest (ROI) generation and HFs estimation. To generate the ROI, we use morphological reconstruction to obtain the reconstructed image, and a histogram is constructed for data distributions and clustering. In parallel, we estimate the HFs by extracting the extremal regions from the connected regions obtained from a component tree. Finally, both the ROI and the HFs estimation process are merged to obtain the segmented HFs. The proposed algorithm was tested on 40 3D SD-OCT volumes from 40 patients diagnosed with non-proliferative diabetic retinopathy (NPDR), proliferative diabetic retinopathy (PDR), and diabetic macular edema (DME). The average dice similarity coefficient (DSC) and correlation coefficient (r) are 69.70% and 0.99 for NPDR, 70.31% and 0.99 for PDR, and 71.30% and 0.99 for DME, respectively. The proposed algorithm can provide ophthalmologists with good quantitative information on HFs, such as their volume, size, and location.
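For readers unfamiliar with the Dice similarity coefficient (DSC) reported in this abstract, the sketch below computes it for two binary masks as 2|A ∩ B| / (|A| + |B|). It is an illustrative helper, not the authors' evaluation code: the function name `dice_coefficient`, the flat-list mask representation, the toy masks, and the convention for two empty masks are assumptions.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Count voxels marked as foreground in both masks
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks (hypothetical): 2 overlapping voxels, 3 foreground voxels in each mask
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(pred_mask, true_mask)  # 2 * 2 / (3 + 3)
```

A 3D segmentation volume can be passed to this helper after flattening it into a single sequence of 0/1 values.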

IJCAI Conference 2020 Conference Paper

Global Structure and Local Semantics-Preserved Embeddings for Entity Alignment

  • Hao Nie
  • Xianpei Han
  • Le Sun
  • Chi Man Wong
  • Qiang Chen
  • Suhui Wu
  • Wei Zhang

Entity alignment (EA) aims to identify entities located in different knowledge graphs (KGs) that refer to the same real-world object. To learn the entity representations, most EA approaches rely on either translation-based methods which capture the local relation semantics of entities or graph convolutional networks (GCNs), which exploit the global KG structure. Afterward, the aligned entities are identified based on their distances. In this paper, we propose to jointly leverage the global KG structure and entity-specific relational triples for better entity alignment. Specifically, a global structure and local semantics preserving network is proposed to learn entity representations in a coarse-to-fine manner. Experiments on several real-world datasets show that our method significantly outperforms other entity alignment approaches and achieves the new state-of-the-art performance.

JBHI Journal 2020 Journal Article

MS-CAM: Multi-Scale Class Activation Maps for Weakly-Supervised Segmentation of Geographic Atrophy Lesions in SD-OCT Images

  • Xiao Ma
  • Zexuan Ji
  • Sijie Niu
  • Theodore Leng
  • Daniel L. Rubin
  • Qiang Chen

As one of the most critical characteristics in the advanced stage of non-exudative Age-related Macular Degeneration (AMD), Geographic Atrophy (GA) is one of the significant causes of sustained visual acuity loss. Automatic localization of retinal regions affected by GA is a fundamental step for clinical diagnosis. In this paper, we present a novel weakly supervised model for GA segmentation in Spectral-Domain Optical Coherence Tomography (SD-OCT) images. A novel Multi-Scale Class Activation Map (MS-CAM) is proposed to highlight the discriminatory significance regions in localization and detail descriptions. To extract available multi-scale features, we design a Scaling and UpSampling (SUS) module to balance the information content between features of different scales. To capture more discriminative features, an Attentional Fully Connected (AFC) module is proposed by introducing the attention mechanism into the fully connected operations to enhance the significant informative features and suppress less useful ones. Based on the location cues, the final GA region prediction is obtained by the projection segmentation of MS-CAM. The experimental results on two independent datasets demonstrate that the proposed weakly supervised model outperforms the conventional GA segmentation methods and can produce similar or superior accuracy compared with fully supervised approaches. The source code has been released and is available on GitHub: https://github.com/jizexuan/Multi-Scale-Class-Activation-Map-Tensorflow.

EAAI Journal 2018 Journal Article

Social personalized ranking with both the explicit and implicit influence of user trust and of item ratings

  • Gai Li
  • Youfen Chen
  • Zhiqiang Zhang
  • Jianghong Zhong
  • Qiang Chen

Due to the inherent deficiency of social collaborative filtering algorithms based on rating prediction, social personalized ranking algorithms based on ranking prediction have recently received much more attention in recommendation communities due to their close relationship with real industry problem settings. However, most existing social personalized ranking algorithms focus on either explicit feedback data or implicit feedback data rather than making full use of the information in the dataset. Until now, no studies have been done on social personalized ranking algorithms by exploiting both the explicit and implicit influence of user trust and of item ratings. In order to overcome the defects of prior research and to further solve the problems of data sparsity and cold start of collaborative filtering, a new social personalized ranking model (SPR_SVD++) based on the newest xCLiMF model and TrustSVD model was proposed, which exploited both the explicit and implicit influence of user trust and of item ratings simultaneously and optimized the well-known evaluation metric Expected Reciprocal Rank (ERR). Experimental results on practical datasets showed that our proposed model outperformed existing state-of-the-art collaborative filtering approaches over two different evaluation metrics, NDCG and ERR, and that the running time of SPR_SVD++ showed a linear correlation with the number of users in the data collection and the number of observations in the rating and trust matrices. Due to its high precision and good scalability, SPR_SVD++ is suitable for processing big data and has wide application prospects in the field of internet information recommendation.
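The Expected Reciprocal Rank (ERR) metric named in this abstract can be illustrated with the short sketch below. It follows the commonly used cascade-model definition, in which an item with grade g satisfies the user with probability (2^g - 1) / 2^g_max and the user stops at the first satisfying item. Whether the paper uses exactly this grade-to-probability mapping is an assumption, and the function name and toy grades are hypothetical.

```python
def expected_reciprocal_rank(grades, max_grade):
    """ERR for a ranked list of graded relevance labels (best-ranked item first)."""
    err = 0.0
    p_continue = 1.0  # probability that the user reaches the current rank
    for rank, grade in enumerate(grades, start=1):
        # Probability that the item at this rank satisfies the user
        p_satisfied = (2 ** grade - 1) / (2 ** max_grade)
        err += p_continue * p_satisfied / rank
        # The user moves on only if this item did not satisfy them
        p_continue *= 1.0 - p_satisfied
    return err


# Toy rankings on a 0..3 grade scale: the same two items in opposite order
good_order = expected_reciprocal_rank([3, 0], max_grade=3)
bad_order = expected_reciprocal_rank([0, 3], max_grade=3)
```

Placing the highly relevant item first yields a larger ERR than placing it second, which is the behavior a ranking model optimized for ERR is rewarded for.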

IJCAI Conference 2017 Conference Paper

Interactive Image Segmentation via Pairwise Likelihood Learning

  • Tao Wang
  • Quansen Sun
  • Qi Ge
  • Zexuan Ji
  • Qiang Chen
  • Guiyu Xia

This paper presents an interactive image segmentation approach where the segmentation problem is formulated as a probabilistic estimation problem. Instead of measuring the distances between unseeded pixels and seeded pixels, we measure the similarities between pixel pairs and seed pairs to improve the robustness to the seeds. The unary prior probability of each pixel belonging to the foreground F and background B can be effectively estimated based on the similarities with label pairs (F, F), (F, B), (B, F) and (B, B). Then a likelihood learning framework is proposed to fuse the region and boundary information of the image by imposing the smoothing constraint on the unary potentials. Experiments on challenging data sets demonstrate that the proposed method can obtain better performance than state-of-the-art methods.

TIST Journal 2014 Journal Article

Snap & Play

  • Si Liu
  • Qiang Chen
  • Shuicheng Yan
  • Changsheng Xu
  • Hanqing Lu

In this article, by taking a popular game, the Find-the-Difference (FiDi) game, as a concrete example, we explore how state-of-the-art image processing techniques can assist in developing a personalized, automatic, and dynamic game. Unlike the traditional FiDi game, where image pairs (source image and target image) with five different patches are manually produced by professional game developers, the proposed Personalized FiDi (P-FiDi) electronic game can be played in a fully automatic Snap & Play mode. Snap means that players first take photos with their digital cameras. The newly captured photos are used as source images and fed into the P-FiDi system to autogenerate the counterpart target images for users to play. Four steps are adopted to autogenerate target images: enhancing the visual quality of source images, extracting some changeable patches from the source image, selecting the most suitable combination of changeable patches and difference styles for the image, and generating the differences on the target image with state-of-the-art image processing techniques. In addition, the P-FiDi game can be easily redesigned for in-game advertising. Extensive experiments show that the P-FiDi electronic game is satisfying in terms of player experience, seamless advertisement, and technical feasibility.