Arrow Research search

Author name cluster

Chang Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

91 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

Controllable Financial Market Generation with Diffusion Guided Meta Agent

  • Yu-Hao Huang
  • Chang Xu
  • Yang Liu
  • Weiqing Liu
  • Wu-Jun Li
  • Jiang Bian

Generative modeling has transformed many fields, such as language and visual modeling, while its application in financial markets remains under-explored. As the minimal unit within a financial market is an order, order-flow modeling represents a fundamental generative financial task. However, current approaches often yield unsatisfactory fidelity in generating order flow, and their generation lacks controllability, thereby limiting their practical applications. In this paper, we formulate the challenge of controllable financial market generation, and propose a Diffusion Guided Meta Agent (DigMA) model to address it. Specifically, we employ a conditional diffusion model to capture the dynamics of the market state represented by time-evolving distribution parameters of the mid-price return rate and the order arrival rate, and we define a meta agent with financial economic priors to generate orders from the corresponding distributions. Extensive experimental results show that DigMA achieves superior controllability and generation fidelity. Moreover, we validate its effectiveness as a generative environment for downstream high-frequency trading tasks and its computational efficiency.

AAAI Conference 2026 Conference Paper

Eliciting Chain-of-Thought in Base LLMs via Gradient-Based Representation Optimization

  • Zijian Wang
  • Yanxiang Ma
  • Chang Xu

Chain-of-Thought (CoT) reasoning is a critical capability for large language models (LLMs), enabling them to tackle complex multi-step tasks. While base LLMs, pre-trained on general text corpora, often struggle with reasoning due to a lack of specialized training, recent studies reveal their latent reasoning potential tied to hidden states. However, existing hidden state manipulation methods, such as linear activation steering, suffer from limitations due to their rigid and unconstrained nature, often leading to distribution shifts and degraded text quality. In this work, we propose a novel approach for eliciting CoT reasoning from base LLMs through hidden state manipulation grounded in probabilistic conditional generation. By reformulating the challenge as an optimization problem with a balanced likelihood and prior regularization framework, our method guides hidden states toward reasoning-oriented trajectories while preserving linguistic coherence. Extensive evaluations across mathematical, commonsense, and logical reasoning benchmarks demonstrate that our approach consistently outperforms existing steering methods, offering a theoretically principled and effective solution for enhancing reasoning capabilities in base LLMs.

AAAI Conference 2026 Conference Paper

Explainable Oracle Bone Script Recognition via Multimodal Pictographic Reasoning

  • Yin Wu
  • Zhengxuan Zhang
  • Jiayu Chen
  • Chang Xu
  • Yuyu Luo
  • Nan Tang
  • Hui Xiong

Oracle Bone Script, East Asia's earliest mature writing system from over 3,500 years ago, encodes ancient cognition through visual metaphors, yet remains largely undeciphered and inaccessible, severing modern society from its cultural roots. Traditional AI methods, while accurate in classification, treat glyphs as opaque data, neglecting their pictographic essence and failing to foster public understanding—exacerbating a heritage crisis amid linguistic evolution. We pioneer a paradigm shift toward AI-driven cultural democratization, introducing OracleVis, the first human-validated multimodal dataset of glyph-image-explanation triplets, curated through expert collaborations to overcome data scarcity, bias, and incompleteness in archaeological sources. Building on this, OBS-VM, an explainability-centric multimodal large language model fine-tuned on Qwen2-VL-7B, models pictographic reasoning by balancing semantic fidelity with interpretive transparency, transforming black-box predictions into cognition-aligned narratives. Rigorous evaluations, including benchmarks and a user study with 24 non-experts, reveal our system's superiority: it outperforms GPT-4o in pictographic rationality (3.79 vs. 3.58 in human evaluation) and achieves a 35.3% relative improvement in recognition accuracy, while interactive learning boosts knowledge gains (+5.5 vs. +1.7), interest (+1.9 vs. +0.4), and confidence (+2.0 vs. +0.3) over static methods. This work illuminates AI's potential to bridge ancient wisdom and contemporary audiences, redefining heritage preservation as an inclusive, socially impactful endeavor that turns cultural alienation into enlightened engagement.

AAAI Conference 2025 Conference Paper

Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness

  • Hefei Mei
  • Minjing Dong
  • Chang Xu

Diffusion models (DMs) have demonstrated great potential in the field of adversarial robustness, where DM-based defense methods can achieve superior defense capability without adversarial training. However, they all require huge computational costs due to the usage of large-scale pre-trained DMs, making it difficult to conduct full evaluation under strong attacks and compare with traditional CNN-based methods. Simply reducing the network size and timesteps in DMs could significantly harm the image generation quality, which invalidates previous frameworks. To alleviate this issue, we redesign the diffusion framework from generating high-quality images to predicting distinguishable image labels. Specifically, we employ an image translation framework to learn many-to-one mapping from input samples to designed orthogonal image labels. Based on this framework, we introduce an efficient Image-to-Image diffusion classifier with a pruned U-Net structure and reduced diffusion timesteps. Besides the framework, we redesign the optimization objective of DMs to fit the target of image classification, where a new classification loss is incorporated in the DM-based image translation framework to distinguish the generated label from those of other classes. We conduct sufficient evaluations of the proposed classifier under various attacks on popular benchmarks. Extensive experiments show that our method achieves better adversarial robustness with fewer computational costs than DM-based and CNN-based methods.

AAAI Conference 2025 Conference Paper

EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba

  • Xiaohuan Pei
  • Tao Huang
  • Chang Xu

Prior efforts in light-weight model development mainly centered on CNN and Transformer-based designs yet faced persistent challenges. CNNs, adept at local feature extraction, compromise resolution, while Transformers offer global reach but escalate computational demands to O(N^2). This ongoing trade-off between accuracy and efficiency remains a significant hurdle. Recently, state space models (SSMs), such as Mamba, have shown outstanding performance and competitiveness in various tasks such as language modeling and computer vision, while reducing the time complexity of global information extraction to O(N). Inspired by this, this work explores the potential of visual state space models in light-weight model design and introduces a novel efficient model variant dubbed EfficientVMamba. Concretely, EfficientVMamba integrates an atrous-based selective scan approach via efficient skip sampling, constituting building blocks designed to harness both global and local representational features. Additionally, we investigate the integration between SSM blocks and convolutions, and introduce an efficient visual state space block combined with an additional convolution branch, which further elevates model performance. Experimental results show that EfficientVMamba scales down computational complexity while yielding competitive results across a variety of vision tasks. For example, our EfficientVMamba-S with 1.3G FLOPs improves over Vim-Ti with 1.5G FLOPs by a large margin of 5.6% accuracy on ImageNet.

AAAI Conference 2025 Conference Paper

Feature Clipping for Uncertainty Calibration

  • Linwei Tao
  • Minjing Dong
  • Chang Xu

Deep neural networks (DNNs) have achieved significant success across various tasks, but ensuring reliable uncertainty estimates, known as model calibration, is crucial for their safe and effective deployment. Modern DNNs often suffer from overconfidence, leading to miscalibration. We propose a novel post-hoc calibration method called feature clipping (FC) to address this issue. FC involves clipping feature values to a specified threshold, effectively increasing entropy in high calibration error samples while maintaining the information in low calibration error samples. This process reduces the overconfidence in predictions, improving the overall calibration of the model. Our extensive experiments on datasets such as CIFAR-10, CIFAR-100, and ImageNet, and models including CNNs and transformers, demonstrate that FC consistently enhances calibration performance. Additionally, we provide a theoretical analysis that validates the effectiveness of our method. As the first calibration technique based on feature modification, feature clipping offers a novel approach to improving model calibration, showing significant improvements over both post-hoc and train-time calibration methods and pioneering a new avenue for feature-based model calibration.
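The core operation described in the abstract can be illustrated with a minimal sketch: feature values feeding a linear classifier head are clamped to a threshold, which shrinks extreme activations, flattens the logits, and raises predictive entropy. The threshold value, feature dimension, and toy classifier head below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

def feature_clip(features, c):
    """Post-hoc calibration step: clamp each feature value into [-c, c]
    before it reaches the final linear layer."""
    return np.clip(features, -c, c)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))             # toy linear classifier head
feats = rng.normal(scale=3.0, size=8)   # large-magnitude (overconfident) features

p_raw = softmax(feats @ W)
p_clip = softmax(feature_clip(feats, c=1.0) @ W)

# Clipping bounds the activations, so logits are less peaked and the
# predictive distribution typically has higher entropy.
print(entropy(p_raw), entropy(p_clip))
```

Since clipping happens after training, it can be applied to any frozen model; in the paper the threshold would presumably be tuned on a validation set rather than fixed as here.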

NeurIPS Conference 2025 Conference Paper

MIRA: Medical Time Series Foundation Model for Real-World Health Data

  • Hao Li
  • Bowen Deng
  • Chang Xu
  • ZhiYuan Feng
  • Viktor Schlegel
  • Yu-Hao Huang
  • Yizheng Sun
  • Jingyuan Sun

A unified foundation model for medical time series—pretrained on open access and ethically reviewed medical corpora—offers the potential to reduce annotation burdens, minimize model customization, and enable robust transfer across clinical institutions, modalities, and tasks, particularly in data-scarce or privacy-constrained environments. However, existing time series foundation models struggle to handle medical time series data due to its inherent challenges, including irregular intervals, heterogeneous sampling rates, and frequent missingness. To address these challenges, we introduce MIRA, a unified foundation model specifically designed for medical time series forecasting. MIRA incorporates a Continuous-Time Rotary Positional Encoding that enables fine-grained modeling of variable time intervals, a frequency-specific mixture-of-experts layer that routes computation across latent frequency regimes to further promote temporal specialization, and a Continuous Dynamics Extrapolation Block based on Neural ODEs that models the continuous trajectory of latent states, enabling accurate forecasting at arbitrary target timestamps. Pretrained on a large-scale and diverse medical corpus comprising over 454 billion time points collected from publicly available datasets, MIRA achieves reductions in forecasting errors by an average of 8% and 6% in out-of-distribution and in-distribution scenarios, respectively. We also introduce a comprehensive benchmark spanning multiple downstream clinical tasks, establishing a foundation for future research in medical time series modeling.
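The continuous-time positional encoding mentioned in the abstract can be sketched as standard rotary position embedding evaluated at a real-valued timestamp instead of an integer token index: each pair of feature channels is rotated by an angle proportional to the measurement time. The frequency schedule and dimensions below are illustrative assumptions, not MIRA's actual configuration.

```python
import numpy as np

def continuous_rope(x, t, base=10000.0):
    """Rotate consecutive feature pairs of x by angles t * freq_k, where
    t is a continuous timestamp (e.g. minutes since admission) rather
    than an integer position index."""
    d = x.shape[-1]
    assert d % 2 == 0, "feature dimension must be even"
    freqs = base ** (-np.arange(0, d, 2) / d)  # one frequency per pair
    ang = t * freqs
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x, dtype=float)
    out[..., 0::2] = x1 * cos - x2 * sin       # 2D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Irregularly sampled measurements get consistent relative phases because
# the rotation angle depends only on the (continuous) timestamp.
x = np.ones(4)
print(continuous_rope(x, t=0.0))   # t = 0 leaves the vector unchanged
```

Because each step is a pure rotation, the encoding preserves vector norms, and the inner product between two encoded vectors depends only on the time difference between their timestamps, which is the usual motivation for rotary encodings.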

IROS Conference 2025 Conference Paper

Origami-Inspired Soft Gripper with Tunable Constant Force Output

  • Zhenwei Ni
  • Chang Xu
  • Zhihang Qin
  • Ceng Zhang
  • Zhiqiang Tang
  • Peiyi Wang
  • Cecilia Laschi

Soft robotic grippers gently and safely manipulate delicate objects due to their inherent adaptability and softness. Limited by insufficient stiffness and imprecise force control, conventional soft grippers are not suitable for applications that require stable grasping force. In this work, we propose a soft gripper that utilizes an origami-inspired structure to achieve tunable constant force output over a wide strain range. The geometry of each taper panel is established to provide necessary parameters such as protrusion distance, taper angle, and crease thickness required for 3D modeling and FEA analysis. Simulations and experiments show that by optimizing these parameters, our design can achieve a tunable constant force output. Moreover, the origami-inspired soft gripper dynamically adapts to different shapes while preventing excessive forces, with potential applications in logistics, manufacturing, and other industrial settings that require stable and adaptive operations.

NeurIPS Conference 2025 Conference Paper

SRSR: Enhancing Semantic Accuracy in Real-World Image Super-Resolution with Spatially Re-Focused Text-Conditioning

  • Chen Chen
  • Majid Abdolshah
  • Violetta Shevchenko
  • Hongdong Li
  • Chang Xu
  • Pulak Purkait

Existing diffusion-based super-resolution approaches often exhibit semantic ambiguities due to inaccuracies and incompleteness in their text conditioning, coupled with the inherent tendency for cross-attention to divert towards irrelevant pixels. These limitations can lead to semantic misalignment and hallucinated details in the generated high-resolution outputs. To address these, we propose a novel, plug-and-play spatially re-focused super-resolution (SRSR) framework that consists of two core components: first, we introduce Spatially Re-focused Cross-Attention (SRCA), which refines text conditioning at inference time by applying visually-grounded segmentation masks to guide cross-attention. Second, we introduce a Spatially Targeted Classifier-Free Guidance (STCFG) mechanism that selectively bypasses text influences on ungrounded pixels to prevent hallucinations. Extensive experiments on both synthetic and real-world datasets demonstrate that SRSR consistently outperforms seven state-of-the-art baselines in standard fidelity metrics (PSNR and SSIM) across all datasets, and in perceptual quality measures (LPIPS and DISTS) on two real-world benchmarks, underscoring its effectiveness in achieving both high semantic fidelity and perceptual quality in super-resolution.

AAAI Conference 2025 Conference Paper

TimeDP: Learning to Generate Multi-Domain Time Series with Domain Prompts

  • Yu-Hao Huang
  • Chang Xu
  • Yueying Wu
  • Wu-Jun Li
  • Jiang Bian

Time series generation models are crucial for applications like data augmentation and privacy preservation. Most existing time series generation models are designed to generate data from one specified domain. While leveraging data from other domains for better generalization has proven effective in other application areas, this approach remains challenging for time series modeling due to the large divergence in patterns among different real-world time series categories. In this paper, we propose a multi-domain time series diffusion model with domain prompts, named TimeDP. In TimeDP, we utilize a time series semantic prototype module which defines time series prototypes to represent a time series basis, with each prototype vector serving as a "word" representing some elementary time series feature. A prototype assignment module is applied to extract domain-specific prototype weights, which learn domain prompts as the generation condition. During sampling, we extract a "domain prompt" with few-shot samples from the target domain and use it as the condition to generate time series samples. Experiments demonstrate that our method outperforms baselines, providing state-of-the-art in-domain generation quality and strong unseen-domain generation capability.
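The prototype-assignment idea in the abstract can be sketched as a soft assignment of few-shot samples over a bank of prototype vectors, averaged into a single domain-prompt weight vector. The similarity measure, softmax assignment, and all shapes below are simplifying assumptions for illustration, not TimeDP's exact module.

```python
import numpy as np

def domain_prompt(samples, prototypes):
    """Soft-assign each few-shot sample to a bank of prototype vectors
    (the "words" of the time-series basis) and average the assignment
    weights into one domain-prompt vector used as generation condition."""
    sims = samples @ prototypes.T                    # (n_samples, n_prototypes)
    w = np.exp(sims - sims.max(axis=-1, keepdims=True))
    weights = w / w.sum(axis=-1, keepdims=True)      # softmax over prototypes
    return weights.mean(axis=0)                      # few-shot domain prompt

rng = np.random.default_rng(0)
prototypes = rng.normal(size=(16, 32))   # hypothetical prototype bank
few_shot = rng.normal(size=(5, 32))      # embedded samples from target domain
prompt = domain_prompt(few_shot, prototypes)
print(prompt.shape)                      # one weight per prototype
```

At sampling time this prompt vector would condition the diffusion model, so unseen domains only require a handful of example series rather than retraining.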

NeurIPS Conference 2025 Conference Paper

VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching

  • Siyu Xu
  • Yunke Wang
  • Chenghao Xia
  • Dihao Zhu
  • Tao Huang
  • Chang Xu

Vision-Language-Action (VLA) models have demonstrated strong multi-modal reasoning capabilities, enabling direct action generation from visual perception and language instructions in an end-to-end manner. However, their substantial computational cost poses a challenge for real-time robotic control, where rapid decision-making is essential. This paper introduces VLA-Cache, a training-free inference acceleration method that reduces computational overhead by adaptively caching and reusing static visual tokens across frames. Exploiting the temporal continuity in robotic manipulation, VLA-Cache identifies minimally changed tokens between adjacent frames and reuses their cached key-value representations, thereby circumventing redundant computations. Additionally, to maintain action precision, VLA-Cache selectively re-computes task-relevant tokens that are environmentally sensitive, ensuring the fidelity of critical visual information. To further optimize efficiency, we introduce a layer-adaptive token reuse strategy that dynamically adjusts the reuse ratio based on attention concentration across decoder layers, prioritizing critical tokens for recomputation. Extensive experiments on two simulation platforms (LIBERO and SIMPLER) and a real-world robotic system demonstrate that VLA-Cache achieves up to 1.7× speedup in CUDA latency and a 15% increase in control frequency, with negligible loss in task success rate. The code and videos can be found at our project page: https://vla-cache.github.io.
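The token-selection step described above can be sketched simply: rank patch tokens by how much they changed between consecutive frames, recompute only the most-changed fraction, and reuse cached key/value states for the rest. The change metric, reuse ratio, and shapes below are illustrative assumptions; the paper additionally weighs task relevance and per-layer attention concentration.

```python
import numpy as np

def select_tokens_to_recompute(prev_tokens, curr_tokens, keep_ratio=0.25):
    """Mark the fraction of patch tokens that changed most since the
    previous frame for recomputation; unmarked tokens would reuse their
    cached key/value representations."""
    change = np.linalg.norm(curr_tokens - prev_tokens, axis=-1)
    k = max(1, int(keep_ratio * len(change)))
    recompute = np.argsort(change)[-k:]        # indices of most-changed tokens
    mask = np.zeros(len(change), dtype=bool)
    mask[recompute] = True
    return mask

rng = np.random.default_rng(1)
prev = rng.normal(size=(16, 8))   # 16 patch tokens from the previous frame
curr = prev.copy()
curr[:4] += 1.0                   # only the first four patches changed
mask = select_tokens_to_recompute(prev, curr, keep_ratio=0.25)
print(mask)                       # changed tokens are flagged for recomputation
```

In a real VLA decoder, the boolean mask would gate which rows of the key/value cache are refreshed at each control step, which is where the latency savings come from.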

AAAI Conference 2025 Conference Paper

WaterDiffusion: Learning a Prior-involved Unrolling Diffusion for Joint Underwater Saliency Detection and Visual Restoration

  • Laibin Chang
  • Yunke Wang
  • Longxiang Deng
  • Bo Du
  • Chang Xu

Underwater salient object detection (USOD) plays a pivotal role in various vision-based marine exploration tasks. However, existing USOD techniques face the dilemma of object mislocalization and imprecise boundaries due to the complex underwater environment. The quality degradation of raw underwater images (caused by selective absorption and medium scattering) makes it challenging to perform instance detection directly. One conceivable approach involves initially removing visual disturbances through underwater image enhancement (UIE), followed by saliency detection. However, this two-stage approach neglects the potential positive impact of the restoration procedure on saliency detection due to it executes in a cascade. Based on this insight, we propose a generalized prior-involved diffusion model, called WaterDiffusion for collaborative underwater saliency detection and visual restoration. Specifically, we first propose a revised self-attention joint diffusion, which embeds dynamic saliency masks into the diffusive network as latent features. By extending the underwater degradation prior into the multi-scale decoder, we innovatively exploit optical transmission maps to aid in localizing underwater salient objects. Then, we further design a gate-guided binary indicator to select either normalized or raw channels for improving feature generalization. Finally, the Half-quadratic Splitting is introduced into the unfolding sampling to refine saliency masks iteratively. Comprehensive experiments demonstrate the superior performance of WaterDiffusion over state-of-the-art methods in both quantitative and qualitative evaluations.

NeurIPS Conference 2024 Conference Paper

A Large-Scale Human-Centric Benchmark for Referring Expression Comprehension in the LMM Era

  • Fangyun Wei
  • Jinjing Zhao
  • Kun Yan
  • Hongyang Zhang
  • Chang Xu

Prior research in human-centric AI has primarily addressed single-modality tasks like pedestrian detection, action recognition, and pose estimation. However, the emergence of large multimodal models (LMMs) such as GPT-4V has redirected attention towards integrating language with visual content. Referring expression comprehension (REC) represents a prime example of this multimodal approach. Current human-centric REC benchmarks, typically sourced from general datasets, fall short in the LMM era due to their limitations, such as insufficient testing samples, overly concise referring expressions, and limited vocabulary, making them inadequate for evaluating the full capabilities of modern REC models. In response, we present HC-RefLoCo (Human-Centric Referring Expression Comprehension with Long Context), a benchmark that includes 13,452 images, 24,129 instances, and 44,738 detailed annotations, encompassing a vocabulary of 18,681 words. Each annotation, meticulously reviewed for accuracy, averages 93.2 words and includes topics such as appearance, human-object interaction, location, action, celebrity, and OCR. HC-RefLoCo provides a wider range of instance scales and diverse evaluation protocols, encompassing accuracy with various IoU criteria, scale-aware evaluation, and subject-specific assessments. Our experiments, which assess 24 models, highlight HC-RefLoCo's potential to advance human-centric AI by challenging contemporary REC models with comprehensive and varied data. Our benchmark, along with the evaluation code, is available at https://github.com/ZhaoJingjing713/HC-RefLoCo.

AAAI Conference 2024 Conference Paper

AMD: Autoregressive Motion Diffusion

  • Bo Han
  • Hao Peng
  • Minjing Dong
  • Yi Ren
  • Yixuan Shen
  • Chang Xu

Human motion generation aims to produce plausible human motion sequences according to various conditional inputs, such as text or audio. Despite the feasibility of existing methods in generating motion based on short prompts and simple motion patterns, they encounter difficulties when dealing with long prompts or complex motions. The challenges are two-fold: 1) the scarcity of human motion-captured data for long prompts and complex motions; 2) the high diversity of human motions in the temporal domain and the substantial divergence of distributions from conditional modalities, leading to a many-to-many mapping problem when generating motion from complex and long texts. In this work, we address these gaps by 1) constructing the first dataset pairing long textual descriptions with complex 3D motions (HumanLong3D), and 2) proposing an autoregressive motion diffusion model (AMD). Specifically, AMD integrates the text prompt at the current timestep with the text prompt and action sequences at the previous timestep as conditional information to predict the current action sequences in an iterative manner. Furthermore, we present its generalization to X-to-Motion with "No Modality Left Behind", enabling for the first time the generation of high-definition and high-fidelity human motions based on user-defined modality input.

IJCAI Conference 2024 Conference Paper

Boosting Diffusion Models with an Adaptive Momentum Sampler

  • Xiyu Wang
  • Anh-Dung Dinh
  • Daochang Liu
  • Chang Xu

Diffusion probabilistic models (DPMs) have been shown to generate high-quality images without the need for delicate adversarial training. The sampling process of DPMs is mathematically similar to Stochastic Gradient Descent (SGD), with both being iteratively updated with a function increment. Building on this, we present a novel reverse sampler for DPMs in this paper, drawing inspiration from the widely-used Adam optimizer. Our proposed sampler can be readily applied to a pre-trained diffusion model, utilizing momentum mechanisms and adaptive updating to enhance the generated image's quality. By effectively reusing update directions from early steps, our proposed sampler achieves a better balance between high-level semantics and low-level details. Additionally, this sampler is flexible and can be easily integrated into pre-trained DPMs regardless of the sampler used during training. Our experimental results on multiple benchmarks demonstrate that our proposed reverse sampler yields remarkable improvements over different baselines.
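The Adam analogy in the abstract can be made concrete with a toy sketch: the per-step update direction of an iterative sampler is smoothed with first- and second-moment estimates before being applied, exactly as Adam smooths gradients. The stand-in "score" below (pulling samples toward the origin), the step size, and the momentum coefficients are all illustrative assumptions; the paper's actual reverse-diffusion update differs.

```python
import numpy as np

def momentum_sampler_step(x, direction, state, beta1=0.9, beta2=0.999,
                          lr=0.1, eps=1e-8):
    """Apply one Adam-style smoothed update to a sampler's per-step
    direction: keep exponential moving averages of the direction and its
    square, bias-correct them, and take a normalized step."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * direction
    v = beta2 * v + (1 - beta2) * direction ** 2
    m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, (m, v, t)

# Toy stand-in for the model's predicted increment at each reverse step:
# pull samples toward the mode at the origin.
x = np.array([5.0, -3.0])
state = (np.zeros_like(x), np.zeros_like(x), 0)
for _ in range(200):
    x, state = momentum_sampler_step(x, direction=x, state=state)
print(x)  # samples are driven close to the origin
```

The point of the sketch is the mechanics: reusing a running average of earlier update directions damps step-to-step noise, which is the intuition the paper gives for better balancing high-level semantics and low-level detail.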

NeurIPS Conference 2024 Conference Paper

Enhancing Large Language Models through Adaptive Tokenizers

  • Mengyu Zheng
  • Hanting Chen
  • Tianyu Guo
  • Chong Zhu
  • Binfan Zheng
  • Chang Xu
  • Yunhe Wang

Tokenizers serve as crucial interfaces between models and linguistic data, substantially influencing the efficacy and precision of large language models (LLMs). Traditional tokenization methods often rely on static frequency-based statistics and are not inherently synchronized with LLM architectures, which may limit model performance. In this study, we propose a simple but effective method to learn tokenizers specifically engineered for seamless integration with LLMs. Starting from a broad initial vocabulary, we refine our tokenizer by monitoring changes in the model's perplexity during training, allowing for the selection of a tokenizer that is closely aligned with the model's evolving dynamics. Through iterative refinement, we develop an optimized tokenizer. Our empirical evaluations demonstrate that this adaptive approach significantly enhances accuracy compared to conventional methods, maintaining comparable vocabulary sizes and affirming its potential to improve LLM functionality.

AAAI Conference 2024 Conference Paper

Harnessing Edge Information for Improved Robustness in Vision Transformers

  • Yanxi Li
  • Chengbin Du
  • Chang Xu

Deep Neural Networks (DNNs) have demonstrated remarkable accuracy in vision classification tasks. However, they are vulnerable to additive perturbations known as adversarial attacks. Previous studies hypothesize that this vulnerability might stem from the fact that high-accuracy DNNs heavily rely on irrelevant and non-robust features, such as textures and the background. In this work, we reveal that edge information extracted from images can provide relevant and robust features related to shapes and the foreground. These features assist pretrained DNNs in achieving improved adversarial robustness without compromising their accuracy on clean images. A lightweight and plug-and-play EdgeNet is proposed, which can be seamlessly integrated into existing pretrained DNNs, including Vision Transformers, a recent family of state-of-the-art models for vision classification. Our EdgeNet can process edges derived from either clean natural images or noisy adversarial images, yielding robust features which can be injected into the intermediate layers of the frozen backbone DNNs. The cost of obtaining such edges using conventional edge detection algorithms (e.g., the Canny edge detector) is marginal, and the cost of training the EdgeNet is equivalent to that of fine-tuning the backbone network with techniques such as Adapter.

AAAI Conference 2024 Conference Paper

Learning Visual Abstract Reasoning through Dual-Stream Networks

  • Kai Zhao
  • Chang Xu
  • Bailu Si

Visual abstract reasoning tasks present challenges for deep neural networks, exposing limitations in their capabilities. In this work, we present a neural network model that addresses the challenges posed by Raven's Progressive Matrices (RPM). Inspired by the two-stream hypothesis of visual processing, we introduce the Dual-stream Reasoning Network (DRNet), which utilizes two parallel branches to capture image features. On top of the two streams, a reasoning module first learns to merge the high-level features of the same image. Then, it employs a rule extractor to handle combinations involving the eight context images and each candidate image, extracting discrete abstract rules and utilizing a multilayer perceptron (MLP) to make predictions. Empirical results demonstrate that the proposed DRNet achieves state-of-the-art average performance across multiple RPM benchmarks. Furthermore, DRNet demonstrates robust generalization capabilities, even extending to various out-of-distribution scenarios. The dual streams within DRNet serve distinct functions by addressing local or spatial information. They are then integrated into the reasoning module, leveraging abstract rules to facilitate the execution of visual reasoning tasks. These findings indicate that the dual-stream architecture could play a crucial role in visual abstract reasoning.

NeurIPS Conference 2024 Conference Paper

Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model

  • Yuheng Shi
  • Minjing Dong
  • Chang Xu

Despite the significant achievements of Vision Transformers (ViTs) in various vision tasks, they are constrained by quadratic complexity. Recently, State Space Models (SSMs) have garnered widespread attention due to their global receptive field and linear complexity with respect to the input length, demonstrating substantial potential across fields including natural language processing and computer vision. To improve the performance of SSMs in vision tasks, a multi-scan strategy is widely adopted, which leads to significant redundancy of SSMs. For a better trade-off between efficiency and performance, we analyze the underlying reasons behind the success of the multi-scan strategy, where long-range dependency plays an important role. Based on the analysis, we introduce Multi-Scale Vision Mamba (MSVMamba) to preserve the superiority of SSMs in vision tasks with limited parameters. It employs a multi-scale 2D scanning technique on both original and downsampled feature maps, which not only benefits long-range dependency learning but also reduces computational costs. Additionally, we integrate a Convolutional Feed-Forward Network (ConvFFN) to address the lack of channel mixing. Our experiments demonstrate that MSVMamba is highly competitive, with the MSVMamba-Tiny model achieving 83.0% top-1 accuracy on ImageNet, 46.9% box mAP and 42.5% instance mAP with the Mask R-CNN framework and 1x training schedule on COCO, and 47.9% mIoU with single-scale testing on ADE20K. Code is available at https://github.com/YuHengsss/MSVMamba.

NeurIPS Conference 2023 Conference Paper

Adversarial Robustness through Random Weight Sampling

  • Yanxiang Ma
  • Minjing Dong
  • Chang Xu

Deep neural networks have been found to be vulnerable in a variety of tasks. Adversarial attacks can manipulate network outputs, resulting in incorrect predictions. Adversarial defense methods aim to improve the adversarial robustness of networks by countering potential attacks. In addition to traditional defense approaches, randomized defense mechanisms have recently received increasing attention from researchers. These methods introduce different types of perturbations during the inference phase to destabilize adversarial attacks. Although promising empirical results have been demonstrated by these approaches, the defense performance is quite sensitive to the randomness parameters, which are always manually tuned without further analysis. In contrast, we propose incorporating random weights into the optimization to fully exploit the potential of randomized defense. To better optimize the randomness parameters, we conduct a theoretical analysis of the connections between randomness parameters and gradient similarity as well as natural performance. From these two aspects, we suggest imposing theoretically-guided constraints on random weights during optimization, as these weights play a critical role in balancing natural performance and adversarial robustness. We derive both the upper and lower bounds of random weight parameters by considering prediction bias and gradient similarity. In this study, we introduce the Constrained Trainable Random Weight (CTRW), which adds random weight parameters to the optimization and includes a constraint guided by the upper and lower bounds to achieve better trade-offs between natural and robust accuracy. We evaluate the effectiveness of CTRW on several datasets and benchmark convolutional neural networks. Our results indicate that our model achieves a robust accuracy approximately 16% to 17% higher than the baseline model under PGD-20 and 22% to 25% higher under AutoAttack.
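The constraint mechanism described in the abstract can be sketched as projecting trainable randomness parameters into derived bounds before sampling the random weights. The bound values, Gaussian parameterization, and shapes below are illustrative placeholders; the paper derives its bounds from prediction bias and gradient similarity rather than fixing them by hand.

```python
import numpy as np

def sample_constrained_random_weights(mu, sigma, lower, upper, rng):
    """Draw random weights around trainable location mu and scale sigma,
    after projecting sigma into [lower, upper] -- the constraint step that
    keeps the randomness in a range balancing clean accuracy (small sigma)
    against attack destabilization (large sigma)."""
    sigma = np.clip(sigma, lower, upper)          # enforce derived bounds
    return mu + sigma * rng.standard_normal(mu.shape)

rng = np.random.default_rng(0)
mu = np.zeros((4, 4))
sigma = np.full((4, 4), 5.0)   # unconstrained randomness would be too large
w = sample_constrained_random_weights(mu, sigma, lower=0.1, upper=1.0, rng=rng)
print(w.std())                 # effective scale is capped at the upper bound
```

Because a fresh draw happens at every forward pass, an attacker's gradient is computed against a different realization of the weights than the one used at inference, which is the destabilizing effect randomized defenses rely on.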

NeurIPS Conference 2023 Conference Paper

Beyond Pretrained Features: Noisy Image Modeling Provides Adversarial Defense

  • Zunzhi You
  • Daochang Liu
  • Bohyung Han
  • Chang Xu

Recent advancements in masked image modeling (MIM) have made it a prevailing framework for self-supervised visual representation learning. The MIM pretrained models, like most deep neural network methods, remain vulnerable to adversarial attacks, limiting their practical application, and this issue has received little research attention. In this paper, we investigate how this powerful self-supervised learning paradigm can provide adversarial robustness to downstream classifiers. During the exploration, we find that noisy image modeling (NIM), a simple variant of MIM that adopts denoising as the pretext task, reconstructs noisy images surprisingly well despite severe corruption. Motivated by this observation, we propose an adversarial defense method, referred to as De^3, by exploiting the pretrained decoder for denoising. Through De^3, NIM is able to enhance adversarial robustness beyond providing pretrained features. Furthermore, we incorporate a simple modification, sampling the noise scale hyperparameter from random distributions, and enable the defense to achieve a better and tunable trade-off between accuracy and robustness. Experimental results demonstrate that, in terms of adversarial robustness, NIM is superior to MIM thanks to its effective denoising capability. Moreover, the defense provided by NIM achieves performance on par with adversarial training while offering the extra tunability advantage. Source code and models are available at https://github.com/youzunzhi/NIM-AdvDef.

AAAI Conference 2023 Conference Paper

Boosting Semi-Supervised Semantic Segmentation with Probabilistic Representations

  • Haoyu Xie
  • Changqi Wang
  • Mingkai Zheng
  • Minjing Dong
  • Shan You
  • Chong Fu
  • Chang Xu

Recent breakthroughs in semi-supervised semantic segmentation have been developed through contrastive learning. In prevalent pixel-wise contrastive learning solutions, the model maps pixels to deterministic representations and regularizes them in the latent space. However, there exist inaccurate pseudo-labels which map the ambiguous representations of pixels to the wrong classes due to the limited cognitive ability of the model. In this paper, we define pixel-wise representations from a new perspective of probability theory and propose a Probabilistic Representation Contrastive Learning (PRCL) framework that improves representation quality by taking its probability into consideration. By modelling the mapping from pixels to representations as a probability via multivariate Gaussian distributions, we can tune the contribution of ambiguous representations to tolerate the risk of inaccurate pseudo-labels. Furthermore, we define prototypes in the form of distributions, which indicate the confidence of a class, something a point prototype cannot express. Moreover, we propose to regularize the distribution variance to enhance the reliability of representations. Taking advantage of these benefits, high-quality feature representations can be derived in the latent space, thereby further improving the performance of semantic segmentation. We conduct extensive experiments to evaluate PRCL on Pascal VOC and CityScapes and demonstrate its superiority. The code is available at https://github.com/Haoyu-Xie/PRCL.
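The probabilistic-representation intuition can be sketched with a mutual-likelihood-style score (an illustrative toy, not the PRCL code): with Gaussian representations, a larger variance marks ambiguity and lowers the similarity, so unreliable pixels contribute less.

```python
import numpy as np

def gaussian_similarity(mu1, var1, mu2, var2):
    """Mutual-likelihood-style score between two Gaussian representations:
    larger variance (ambiguity) shrinks the score, so unreliable pixels
    contribute less to the contrastive objective."""
    var = var1 + var2
    return float(-0.5 * np.sum((mu1 - mu2) ** 2 / var + np.log(var)))

mu = np.zeros(4)
s_confident = gaussian_similarity(mu, np.full(4, 0.1), mu, np.full(4, 0.1))
s_ambiguous = gaussian_similarity(mu, np.full(4, 0.1), mu, np.full(4, 5.0))
# identical means, but the high-variance (ambiguous) pair scores lower
```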

IJCAI Conference 2023 Conference Paper

Calibrating a Deep Neural Network with Its Predecessors

  • Linwei Tao
  • Minjing Dong
  • Daochang Liu
  • Changming Sun
  • Chang Xu

Confidence calibration - the process of calibrating the output probability distribution of neural networks - is essential for safety-critical applications of such networks. Recent works verify the link between mis-calibration and overfitting. However, early stopping, as a well-known technique to mitigate overfitting, fails to calibrate networks. In this work, we study the limitations of early stopping and comprehensively analyze the overfitting problem of a network considering each individual block. We then propose a novel regularization method, predecessor combination search (PCS), to improve calibration by searching a combination of best-fitting block predecessors, where block predecessors are the corresponding network blocks with weight parameters from earlier training stages. PCS achieves state-of-the-art calibration performance on multiple datasets and architectures. In addition, PCS improves model robustness under dataset distribution shift. Supplementary material and code are available at https://github.com/Linwei94/PCS

AAAI Conference 2023 Conference Paper

ContraFeat: Contrasting Deep Features for Semantic Discovery

  • Xinqi Zhu
  • Chang Xu
  • Dacheng Tao

StyleGAN has shown strong potential for disentangled semantic control, thanks to its special design of multi-layer intermediate latent variables. However, existing semantic discovery methods on StyleGAN rely on manual selection of modified latent layers to obtain satisfactory manipulation results, which is tedious and demanding. In this paper, we propose a model that automates this process and achieves state-of-the-art semantic discovery performance. The model consists of an attention-equipped navigator module and losses contrasting deep-feature changes. We propose two model variants, with one contrasting samples in a binary manner, and another one contrasting samples with learned prototype variation patterns. The proposed losses are computed with pretrained deep features, based on our assumption that the features implicitly possess the desired semantic variation structure including consistency and orthogonality. Additionally, we design two metrics to quantitatively evaluate the performance of semantic discovery methods on FFHQ dataset, and also show that disentangled representations can be derived via a simple training process. Experimentally, we show that our models achieve state-of-the-art semantic discovery results without relying on layer-wise manual selection, and these discovered semantics can be used to manipulate real-world images.

NeurIPS Conference 2023 Conference Paper

Contrastive Sampling Chains in Diffusion Models

  • Junyu Zhang
  • Daochang Liu
  • Shichao Zhang
  • Chang Xu

The past few years have witnessed great success in the use of diffusion models (DMs) to generate high-fidelity images with the help of stochastic differential equations (SDEs). However, discretization error is an inevitable limitation when utilizing numerical solvers to solve SDEs. To address this limitation, we provide a theoretical analysis demonstrating that an appropriate combination of the contrastive loss and score matching serves as an upper bound of the KL divergence between the true data distribution and the model distribution. To obtain this bound, we utilize a contrastive loss to construct a contrastive sampling chain to fine-tune the pre-trained DM. In this manner, our method reduces the discretization error and thus yields a smaller gap between the true data distribution and our model distribution. Moreover, the presented method can be applied to fine-tuning various pre-trained DMs, both with or without fast sampling algorithms, contributing to better sample quality or slightly faster sampling speeds. To validate the efficacy of our method, we conduct comprehensive experiments. For example, on CIFAR10, when applied to a pre-trained EDM, our method improves the FID from 2.04 to 1.88 with 35 neural function evaluations (NFEs), and reduces NFEs from 35 to 25 to achieve the same 2.04 FID.

NeurIPS Conference 2023 Conference Paper

Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models

  • Yichao Cao
  • Qingfei Tang
  • Xiu Su
  • Song Chen
  • Shan You
  • Xiaobo Lu
  • Chang Xu

Human-object interaction (HOI) detection aims to comprehend the intricate relationships between humans and objects, predicting ⟨human, action, object⟩ triplets, and serving as the foundation for numerous computer vision tasks. The complexity and diversity of human-object interactions in the real world, however, pose significant challenges for both annotation and recognition, particularly in recognizing interactions within an open world context. This study explores universal interaction recognition in an open-world setting through the use of Vision-Language (VL) foundation models and large language models (LLMs). The proposed method is dubbed UniHOI. We conduct a deep analysis of the three hierarchical features inherent in visual HOI detectors and propose a method for high-level relation extraction aimed at VL foundation models, which we call HO prompt-based learning. Our design includes an HO Prompt-guided Decoder (HOPD), which facilitates the association of high-level relation representations in the foundation model with various HO pairs within the image. Furthermore, we utilize an LLM (i.e., GPT) for interaction interpretation, generating a richer linguistic understanding for complex HOIs. For open-category interaction recognition, our method supports either of two input types: interaction phrase or interpretive sentence. Our efficient architecture design and learning methods effectively unleash the potential of the VL foundation models and LLMs, allowing UniHOI to surpass all existing methods by a substantial margin, under both supervised and zero-shot settings. The code and pre-trained weights will be made publicly available.

NeurIPS Conference 2023 Conference Paper

Knowledge Diffusion for Distillation

  • Tao Huang
  • Yuan Zhang
  • Mingkai Zheng
  • Shan You
  • Fei Wang
  • Chen Qian
  • Chang Xu

The representation gap between teacher and student is an emerging topic in knowledge distillation (KD). To reduce the gap and improve the performance, current methods often resort to complicated training schemes, loss functions, and feature alignments, which are task-specific and feature-specific. In this paper, we state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature, and propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models. Our approach is based on the observation that student features typically contain more noise than teacher features due to the smaller capacity of the student model. To address this, we propose to denoise student features using a diffusion model trained by teacher features. This allows us to perform better distillation between the refined clean feature and the teacher feature. Additionally, we introduce a light-weight diffusion model with a linear autoencoder to reduce the computation cost and an adaptive noise matching module to improve the denoising performance. Extensive experiments demonstrate that DiffKD is effective across various types of features and achieves state-of-the-art performance consistently on image classification, object detection, and semantic segmentation tasks. Code is available at https://github.com/hunto/DiffKD.

AAAI Conference 2023 Conference Paper

Neural Architecture Search for Wide Spectrum Adversarial Robustness

  • Zhi Cheng
  • Yanxi Li
  • Minjing Dong
  • Xiu Su
  • Shan You
  • Chang Xu

One major limitation of CNNs is that they are vulnerable to adversarial attacks. Currently, adversarial robustness in neural networks is commonly optimized with respect to a small pre-selected adversarial noise strength, causing them to have potentially limited performance when under attack by larger adversarial noises in real-world scenarios. In this research, we aim to find Neural Architectures that have improved robustness on a wide range of adversarial noise strengths through Neural Architecture Search. In detail, we propose a lightweight Adversarial Noise Estimator to reduce the high cost of generating adversarial noise with respect to different strengths. Besides, we construct an Efficient Wide Spectrum Searcher to reduce the cost of adjusting network architecture with the large adversarial validation set during the search. With the two components proposed, the number of adversarial noise strengths searched can be increased significantly while having a limited increase in search time. Extensive experiments on benchmark datasets such as CIFAR and ImageNet demonstrate that with a significantly richer search signal in robustness, our method can find architectures with improved overall robustness while having a limited impact on natural accuracy and around 40% reduction in search time compared with the naive approach of searching. Codes available at: https://github.com/zhicheng2T0/Wsr-NAS.git

NeurIPS Conference 2023 Conference Paper

One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation

  • Zhiwei Hao
  • Jianyuan Guo
  • Kai Han
  • Yehui Tang
  • Han Hu
  • Yunhe Wang
  • Chang Xu

Knowledge distillation (KD) has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme. However, most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family, particularly the hint-based approaches. By using centered kernel alignment (CKA) to compare the learned features between heterogeneous teacher and student models, we observe significant feature divergence. This divergence illustrates the ineffectiveness of previous hint-based methods in cross-architecture distillation. To tackle the challenge in distilling heterogeneous models, we propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures. Specifically, we project intermediate features into an aligned latent space such as the logits space, where architecture-specific information is discarded. Additionally, we introduce an adaptive target enhancement scheme to prevent the student from being disturbed by irrelevant information. Extensive experiments with various architectures, including CNN, Transformer, and MLP, demonstrate the superiority of our OFA-KD framework in enabling distillation between heterogeneous architectures. Specifically, when equipped with our OFA-KD, the student models achieve notable performance improvements, with a maximum gain of 8.0% on the CIFAR-100 dataset and 0.7% on the ImageNet-1K dataset. PyTorch code and checkpoints can be found at https://github.com/Hao840/OFAKD.
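A hedged sketch of the logits-space alignment idea (the projection head, shapes, and names here are stand-ins for illustration, not the released OFA-KD code):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logits_space_kd(student_feat, proj_w, teacher_logits):
    """Project a heterogeneous student's intermediate feature into the
    shared logits space, where architecture-specific structure is
    discarded, and match the teacher's distribution there."""
    student_logits = student_feat @ proj_w          # exemplar projection head
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s + 1e-12))))  # KL(t||s)

rng = np.random.default_rng(1)
feat = rng.standard_normal((1, 16))     # e.g. a ViT token feature
proj = rng.standard_normal((16, 10))    # learnable projection to 10 classes
loss = logits_space_kd(feat, proj, rng.standard_normal((1, 10)))
```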

NeurIPS Conference 2023 Conference Paper

Rethinking Conditional Diffusion Sampling with Progressive Guidance

  • Anh-Dung Dinh
  • Daochang Liu
  • Chang Xu

This paper tackles two critical challenges encountered in classifier guidance for diffusion generative models, i.e., the lack of diversity and the presence of adversarial effects. These issues often result in a scarcity of diverse samples or the generation of non-robust features. The underlying cause lies in the mechanism of classifier guidance, where discriminative gradients push samples to be recognized as conditions aggressively. This inadvertently suppresses information with common features among relevant classes, resulting in a limited pool of features with less diversity or the absence of robust features for image construction. We propose a generalized classifier guidance method called Progressive Guidance, which mitigates the problems by allowing relevant classes' gradients to contribute to shared information construction when the image is noisy in early sampling steps. In the later sampling stage, we progressively enhance gradients to refine the details in the image toward the primary condition. This helps to attain a high level of diversity and robustness compared to the vanilla classifier guidance. Experimental results demonstrate that our proposed method further improves the image quality while offering a significant level of diversity as well as robust features.
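The progressive idea can be sketched as a time-dependent guidance scale (the polynomial schedule and the `eps - s(t) * grad` form below are assumptions for illustration, not the paper's exact formulation):

```python
import numpy as np

def guidance_scale(t, t_max, s_max=5.0, power=2.0):
    """Progressive schedule: weak guidance while the sample is still
    noisy (large t), stronger gradients near the end to refine details."""
    progress = 1.0 - t / t_max        # 0 at the first step, 1 at the last
    return s_max * progress ** power

def guided_noise(eps_pred, class_grad, t, t_max):
    """Classifier-guided noise prediction with a time-dependent scale."""
    return eps_pred - guidance_scale(t, t_max) * class_grad

t_max = 1000
early = guidance_scale(t=990, t_max=t_max)   # almost no push early on
late = guidance_scale(t=10, t_max=t_max)     # strong refinement late
```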

NeurIPS Conference 2023 Conference Paper

Revisit the Power of Vanilla Knowledge Distillation: from Small Scale to Large Scale

  • Zhiwei Hao
  • Jianyuan Guo
  • Kai Han
  • Han Hu
  • Chang Xu
  • Yunhe Wang

The tremendous success of large models trained on extensive datasets demonstrates that scale is a key ingredient in achieving superior results. Therefore, the reflection on the rationality of designing knowledge distillation (KD) approaches for limited-capacity architectures solely based on small-scale datasets is now deemed imperative. In this paper, we identify the small data pitfall present in previous KD methods, which results in the underestimation of the power of the vanilla KD framework on large-scale datasets such as ImageNet-1K. Specifically, we show that employing stronger data augmentation techniques and using larger datasets can directly decrease the gap between vanilla KD and other meticulously designed KD variants. This highlights the necessity of designing and evaluating KD approaches in the context of practical scenarios, casting off the limitations of small-scale datasets. Our investigation of the vanilla KD and its variants in more complex schemes, including stronger training strategies and different model capacities, demonstrates that vanilla KD is elegantly simple but astonishingly effective in large-scale scenarios. Without bells and whistles, we obtain state-of-the-art ResNet-50, ViT-S, and ConvNeXtV2-T models for ImageNet, which achieve 83.1%, 84.3%, and 85.0% top-1 accuracy, respectively. PyTorch code and checkpoints can be found at https://github.com/Hao840/vanillaKD.
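The vanilla KD objective being revisited here is Hinton's classic temperature-softened KL term; a minimal NumPy rendering for reference:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def vanilla_kd_loss(student_logits, teacher_logits, T=4.0):
    """Classic distillation loss: KL between temperature-softened
    teacher and student distributions, scaled by T^2 to keep gradient
    magnitudes comparable across temperatures."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)
    return float(T * T * kl.mean())

teacher = np.array([[2.0, 0.5, -1.0]])
matched = vanilla_kd_loss(teacher, teacher)               # identical logits
mismatched = vanilla_kd_loss(np.array([[0.1, 0.2, 0.3]]), teacher)
```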

NeurIPS Conference 2023 Conference Paper

Stable Diffusion is Unstable

  • Chengbin Du
  • Yanxi Li
  • Zhongwei Qiu
  • Chang Xu

Recently, text-to-image models have been thriving. Despite their powerful generative capacity, our research has uncovered a lack of robustness in this generation process. Specifically, the introduction of small perturbations to the text prompts can result in the blending of primary subjects with other categories or their complete disappearance in the generated images. In this paper, we propose Auto-attack on Text-to-image Models (ATM), a gradient-based approach, to effectively and efficiently generate such perturbations. By learning a Gumbel Softmax distribution, we can make the discrete process of word replacement or extension continuous, thus ensuring the differentiability of the perturbation generation. Once the distribution is learned, ATM can sample multiple attack samples simultaneously. These attack samples can prevent the generative model from generating the desired subjects without tampering with the category keywords in the prompt. ATM has achieved a 91.1% success rate in short-text attacks and an 81.2% success rate in long-text attacks. Further empirical analysis revealed three attack patterns based on: 1) variability in generation speed, 2) similarity of coarse-grained characteristics, and 3) polysemy of words. The code is available at https://github.com/duchengbin8/Stable_Diffusion_is_Unstable
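The Gumbel-Softmax relaxation at the core of ATM's continuous word replacement can be sketched as follows (an illustrative toy, not the authors' code; the token logits are hypothetical):

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Differentiable relaxation of discrete sampling: perturb logits
    with Gumbel noise, then apply a temperature-controlled softmax."""
    rng = rng or np.random.default_rng()
    u = rng.uniform(1e-9, 1.0 - 1e-9, size=logits.shape)
    y = (logits - np.log(-np.log(u))) / tau
    y = y - y.max()
    e = np.exp(y)
    return e / e.sum()

rng = np.random.default_rng(0)
token_logits = np.array([2.0, 0.1, -1.0, 0.5])   # learned replacement distribution
soft = gumbel_softmax(token_logits, tau=0.5, rng=rng)
hard = int(np.argmax(soft))   # sampled word index for the perturbed prompt
```

Because the soft sample is a differentiable function of the logits, gradients from the attack objective can flow back into the replacement distribution.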

AAAI Conference 2023 Conference Paper

Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning

  • Yunke Wang
  • Bo Du
  • Chang Xu

Adversarial imitation learning has become a widely used imitation learning framework. The discriminator is often trained by taking expert demonstrations and policy trajectories as examples respectively from two categories (positive vs. negative) and the policy is then expected to produce trajectories that are indistinguishable from the expert demonstrations. But in the real world, the collected expert demonstrations are more likely to be imperfect, where only an unknown fraction of the demonstrations are optimal. Instead of treating imperfect expert demonstrations as absolutely positive or negative, we investigate unlabeled imperfect expert demonstrations as they are. A positive-unlabeled adversarial imitation learning algorithm is developed to dynamically sample expert demonstrations that can well match the trajectories from the constantly optimized agent policy. The trajectories of an initial agent policy could be closer to those non-optimal expert demonstrations, but within the framework of adversarial imitation learning, the agent policy will be optimized to cheat the discriminator and produce trajectories that are similar to those optimal expert demonstrations. Theoretical analysis shows that our method learns from the imperfect demonstrations in a self-paced way. Experimental results on the MuJoCo and RoboSuite platforms demonstrate the effectiveness of our method from different aspects.

NeurIPS Conference 2022 Conference Paper

GhostNetV2: Enhance Cheap Operation with Long-Range Attention

  • Yehui Tang
  • Kai Han
  • Jianyuan Guo
  • Chang Xu
  • Chao Xu
  • Yunhe Wang

Light-weight convolutional neural networks (CNNs) are specially designed for applications on mobile devices with faster inference speed. The convolutional operation can only capture local information in a window region, which prevents performance from being further improved. Introducing self-attention into convolution can capture global information well, but it will largely encumber the actual speed. In this paper, we propose a hardware-friendly attention mechanism (dubbed DFC attention) and then present a new GhostNetV2 architecture for mobile applications. The proposed DFC attention is constructed based on fully-connected layers, which can not only execute fast on common hardware but also capture the dependence between long-range pixels. We further revisit the expressiveness bottleneck in the previous GhostNet and propose to enhance expanded features produced by cheap operations with DFC attention, so that a GhostNetV2 block can aggregate local and long-range information simultaneously. Extensive experiments demonstrate the superiority of GhostNetV2 over existing architectures. For example, it achieves 75.3% top-1 accuracy on ImageNet with 167M FLOPs, significantly surpassing GhostNetV1 (74.5%) with a similar computational cost. The source code will be available at https://github.com/huawei-noah/Efficient-AI-Backbones/tree/master/ghostnetv2_pytorch and https://gitee.com/mindspore/models/tree/master/research/cv/ghostnetv2_pytorch.

NeurIPS Conference 2022 Conference Paper

Knowledge Distillation from A Stronger Teacher

  • Tao Huang
  • Shan You
  • Fei Wang
  • Chen Qian
  • Chang Xu

Unlike existing knowledge distillation methods, which focus on baseline settings where the teacher models and training strategies are not as strong and competitive as state-of-the-art approaches, this paper presents a method dubbed DIST to distill better from a stronger teacher. We empirically find that the discrepancy of predictions between the student and a stronger teacher tends to be fairly severe. As a result, the exact match of predictions in KL divergence would disturb the training and make existing methods perform poorly. In this paper, we show that simply preserving the relations between the predictions of teacher and student would suffice, and propose a correlation-based loss to capture the intrinsic inter-class relations from the teacher explicitly. Besides, considering that different instances have different semantic similarities to each class, we also extend this relational match to the intra-class level. Our method is simple yet practical, and extensive experiments demonstrate that it adapts well to various architectures, model sizes and training strategies, and can achieve state-of-the-art performance consistently on image classification, object detection, and semantic segmentation tasks. Code is available at: https://github.com/hunto/DIST_KD.
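The correlation-based relational match can be sketched as follows (a simplified reading: Pearson correlation over rows for inter-class relations and over columns for intra-class relations; details such as temperature scaling are omitted):

```python
import numpy as np

def pearson(a, b):
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def relation_kd_loss(student_probs, teacher_probs):
    """Relational match: preserve inter-class relations (per instance,
    rows) and intra-class relations (per class, columns) rather than
    exactly matching the teacher's probabilities."""
    inter = np.mean([1 - pearson(s, t) for s, t in zip(student_probs, teacher_probs)])
    intra = np.mean([1 - pearson(s, t) for s, t in zip(student_probs.T, teacher_probs.T)])
    return inter + intra

teacher = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.6, 0.3]])
shifted = 0.5 * teacher + 0.1   # different values, same relations
loss = relation_kd_loss(shifted, teacher)
# a correlation match tolerates affine differences that an exact KL match would punish
```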

NeurIPS Conference 2022 Conference Paper

Random Normalization Aggregation for Adversarial Defense

  • Minjing Dong
  • Xinghao Chen
  • Yunhe Wang
  • Chang Xu

The vulnerability of deep neural networks has been widely found in various models as well as tasks where slight perturbations on the inputs could lead to incorrect predictions. These perturbed inputs are known as adversarial examples and one of their intriguing properties is Adversarial Transferability, i.e., the capability of adversarial examples to fool other models. Traditionally, this transferability is always regarded as a critical threat to the defense against adversarial attacks; however, we argue that the network robustness can be significantly boosted by utilizing adversarial transferability from a new perspective. In this work, we first discuss the influence of different popular normalization layers on the adversarial transferability, and then provide both empirical evidence and theoretical analysis to shed light on the relationship between normalization types and transferability. Based on our theoretical analysis, we propose a simple yet effective module named Random Normalization Aggregation (RNA) which replaces the batch normalization layers in the networks and aggregates different selected normalization types to form a huge random space. Specifically, a random path is sampled during each inference procedure so that the network itself can be treated as an ensemble of a wide range of different models. Since the entire random space is designed with low adversarial transferability, it is difficult to perform effective attacks even when the network parameters are accessible. We conduct extensive experiments on various models and datasets, and demonstrate the strong superiority of the proposed algorithm. The PyTorch code is available at https://github.com/UniSerj/Random-Norm-Aggregation and the MindSpore code is available at https://gitee.com/mindspore/models/tree/master/research/cv/RNA.
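A stripped-down sketch of the random-path idea (toy 2-D normalizations over a feature matrix, not the RNA module itself):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # normalize each feature across the batch dimension
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def instance_norm(x, eps=1e-5):
    # normalize each sample across its own features
    return (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)

def rna_forward(x, rng, depth=3):
    """Core trick of Random Normalization Aggregation, reduced to a toy:
    each inference samples one normalization type per layer, so the
    network acts like a randomly drawn member of a large ensemble."""
    for _ in range(depth):
        norm = batch_norm if rng.random() < 0.5 else instance_norm
        x = norm(x)
    return x

path_rng = np.random.default_rng(0)
x = np.random.default_rng(1).standard_normal((8, 5))
y = rna_forward(x, path_rng)   # a fresh random path is drawn per call
```

An attacker's gradients are computed against one sampled path, while the next inference follows another, which is the low-transferability effect the abstract exploits.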

NeurIPS Conference 2022 Conference Paper

Searching for Better Spatio-temporal Alignment in Few-Shot Action Recognition

  • Yichao Cao
  • Xiu Su
  • Qingfei Tang
  • Shan You
  • Xiaobo Lu
  • Chang Xu

Spatio-temporal feature matching and alignment are essential for few-shot action recognition as they determine the coherence and effectiveness of the temporal patterns. Nevertheless, this process may not be reliable, especially when dealing with complex video scenarios. In this paper, we propose to improve the performance of matching and alignment through the end-to-end design of models. Our solution is two-fold. First, we enhance the extracted spatio-temporal representations from few-shot videos from the perspective of architectures. To this end, we propose a specialized transformer search method for videos, so that the spatial and temporal attention can be well-organized and optimized for stronger feature representations. Second, we also design an efficient non-parametric spatio-temporal prototype alignment strategy to better handle the high variability of motion. In particular, a query-specific class prototype is generated for each query sample and category, which can better match query sequences against all support sequences. By doing so, our method SST achieves significant superiority on the benchmark UCF101 and HMDB51 datasets. For example, with no pretraining, our method achieves a 17.1% Top-1 accuracy improvement over the baseline TRX on the UCF101 5-way 1-shot setting while using 3x fewer FLOPs.

AAAI Conference 2021 Conference Paper

Adversarial Robustness through Disentangled Representations

  • Shuo Yang
  • Tianyu Guo
  • Yunhe Wang
  • Chang Xu

Despite the remarkable empirical performance of deep learning models, their vulnerability to adversarial examples has been revealed in many studies. They are prone to making incorrect predictions on inputs with imperceptible adversarial perturbations. Although recent works have remarkably improved model robustness under the adversarial training strategy, an evident gap between natural accuracy and adversarial robustness inevitably exists. To mitigate this problem, in this paper, we assume that the robust and non-robust representations are two basic ingredients entangled in the integral representation. For achieving adversarial robustness, the robust representations of natural and adversarial examples should be disentangled from the non-robust part, and the alignment of the robust representations can bridge the gap between accuracy and robustness. Inspired by this motivation, we propose a novel defence method called Deep Robust Representation Disentanglement Network (DRRDN). Specifically, DRRDN employs a disentangler to extract and align the robust representations from both adversarial and natural examples. Theoretical analysis guarantees the mitigation of the trade-off between robustness and accuracy with good disentanglement and alignment performance. Experimental results on benchmark datasets finally demonstrate the empirical superiority of our method.

NeurIPS Conference 2021 Conference Paper

An Empirical Study of Adder Neural Networks for Object Detection

  • Xinghao Chen
  • Chang Xu
  • Minjing Dong
  • Chunjing Xu
  • Yunhe Wang

Adder neural networks (AdderNets) have shown impressive performance on image classification with only addition operations, which are more energy efficient than traditional convolutional neural networks built with multiplications. Compared with classification, there is a strong demand for reducing the energy consumption of modern object detectors via AdderNets for real-world applications such as autonomous driving and face detection. In this paper, we present an empirical study of AdderNets for object detection. We first reveal that the batch normalization statistics in the pre-trained adder backbone should not be frozen, due to the relatively large feature variance of AdderNets. Moreover, we insert more shortcut connections in the neck part and design a new feature fusion architecture to avoid the sparse features of adder layers. We present extensive ablation studies to explore several design choices of adder detectors. Comparisons with state-of-the-art methods are conducted on the COCO and PASCAL VOC benchmarks. Specifically, the proposed Adder FCOS achieves 37.8% AP on the COCO val set, demonstrating performance comparable to that of the convolutional counterpart with an approximately 1.4x energy reduction.

NeurIPS Conference 2021 Conference Paper

Augmented Shortcuts for Vision Transformers

  • Yehui Tang
  • Kai Han
  • Chang Xu
  • An Xiao
  • Yiping Deng
  • Chao Xu
  • Yunhe Wang

Transformer models have achieved great progress on computer vision tasks recently. The rapid development of vision transformers is mainly attributed to their high representation ability for extracting informative features from input images. However, the mainstream transformer models are designed with deep architectures, and the feature diversity is continuously reduced as the depth increases, i.e., feature collapse. In this paper, we theoretically analyze the feature collapse phenomenon and study the relationship between shortcuts and feature diversity in these transformer models. Then, we present an augmented shortcut scheme, which inserts additional paths with learnable parameters in parallel to the original shortcuts. To save computational costs, we further explore an efficient approach that uses block-circulant projection to implement the augmented shortcuts. Extensive experiments conducted on benchmark datasets demonstrate the effectiveness of the proposed method, which brings an accuracy increase of about 1% for state-of-the-art vision transformers without obviously increasing their parameters and FLOPs.
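The augmented-shortcut scheme can be sketched as identity + main branch + parallel learnable paths (a hedged toy: dense matrices below stand in for the block-circulant projections, and the attention branch is stubbed out):

```python
import numpy as np

def augmented_shortcut_block(x, attn, thetas):
    """An augmented-shortcut block: alongside the identity shortcut and
    the attention branch, add parallel learnable projections of x,
    which helps preserve feature diversity as depth grows."""
    out = x + attn(x)                 # original shortcut + main branch
    for theta in thetas:              # augmented paths with parameters
        out = out + x @ theta
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
identity_attn = lambda z: np.zeros_like(z)   # stub for the attention branch
thetas = [0.01 * rng.standard_normal((8, 8)) for _ in range(2)]
y = augmented_shortcut_block(x, identity_attn, thetas)
```

With no augmented paths and a zero attention branch, the block reduces to the plain identity shortcut.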

NeurIPS Conference 2021 Conference Paper

Handling Long-tailed Feature Distribution in AdderNets

  • Minjing Dong
  • Yunhe Wang
  • Xinghao Chen
  • Chang Xu

Adder neural networks (ANNs) are designed for low energy cost, replacing expensive multiplications in convolutional neural networks (CNNs) with cheaper additions to yield energy-efficient neural networks and hardware accelerations. Although ANNs achieve satisfactory efficiency, there exist gaps between ANNs and CNNs, where the accuracy of ANNs can hardly match that of CNNs without the assistance of other training tricks, such as knowledge distillation. The inherent discrepancy lies in the similarity measurement between filters and features; however, how to alleviate this difference remains unexplored. To locate the potential problem of ANNs, we focus on the property difference due to similarity measurement. We demonstrate that unordered heavy tails in ANNs could be the key component preventing ANNs from achieving superior classification performance, since fatter tails tend to overlap in feature space. Through pre-defining Multivariate Skew Laplace distributions and embedding feature distributions into the loss function, ANN features can be fully controlled and designed for various properties. We further present a novel method for tackling the existing heavy tails in ANNs with only a modification of the classifier, where ANN features are clustered with their tails well-formulated through a proposed angle-based constraint on the distribution parameters to encourage high diversity of tails. Experiments conducted on several benchmarks and comparisons with other distributions demonstrate the effectiveness of the proposed approach for boosting the performance of ANNs.
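The similarity measurement the abstract refers to is the adder operation itself: a negative L1 distance computed with additions only, in place of the multiply-heavy cross-correlation of a CNN. A minimal sketch on flattened patches:

```python
import numpy as np

def adder_similarity(x, w):
    """Core AdderNet operation: the response of each filter is the
    negative L1 distance between the input patch and the filter,
    so no multiplications between inputs and weights are needed."""
    return -np.abs(x[:, None, :] - w[None, :, :]).sum(axis=-1)

rng = np.random.default_rng(0)
patches = rng.standard_normal((5, 9))    # 5 flattened input patches
filters = rng.standard_normal((3, 9))    # 3 flattened filters
resp = adder_similarity(patches, filters)
# the best-matching filter for a patch is the one with the smallest L1 distance
```

This L1-based response is what gives ANN features their Laplace-like, heavy-tailed distributions discussed in the abstract.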

NeurIPS Conference 2021 Conference Paper

Learning Frequency Domain Approximation for Binary Neural Networks

  • Yixing Xu
  • Kai Han
  • Chang Xu
  • Yehui Tang
  • Chunjing Xu
  • Yunhe Wang

Binary neural networks (BNNs) represent the original full-precision weights and activations as 1-bit values with the sign function. Since the gradient of the conventional sign function is almost zero everywhere and cannot be used for back-propagation, several attempts have been made to alleviate the optimization difficulty with approximate gradients. However, those approximations corrupt the main direction of the factual gradient. To this end, we propose to estimate the gradient of the sign function in the Fourier frequency domain using a combination of sine functions for training BNNs, namely frequency domain approximation (FDA). The proposed approach does not affect the low-frequency information of the original sign function, which occupies most of the overall energy, while high-frequency coefficients are ignored to avoid huge computational overhead. In addition, we embed a noise adaptation module into the training phase to compensate for the approximation error. Experiments on several benchmark datasets and neural architectures illustrate that the binary network learned using our method achieves state-of-the-art accuracy. Code will be available at https://gitee.com/mindspore/models/tree/master/research/cv/FDA-BNN.
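The "combination of sine functions" has a standard form worth making concrete: the sign function is a square wave on [-pi, pi], whose Fourier series is sign(x) ≈ (4/pi) · Σ_k sin((2k+1)x)/(2k+1). A minimal sketch (our notation, not the paper's implementation) of the truncated series and its everywhere-defined derivative, usable as a surrogate gradient:

```python
import math

def fda_sign(x, n_terms=10):
    """Truncated Fourier series of sign(x): keeps the dominant low frequencies."""
    return (4.0 / math.pi) * sum(
        math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(n_terms)
    )

def fda_sign_grad(x, n_terms=10):
    """Derivative of the truncated series, a smooth surrogate for the zero
    almost-everywhere gradient of the exact sign function."""
    return (4.0 / math.pi) * sum(math.cos((2 * k + 1) * x) for k in range(n_terms))
```

Keeping only the first few terms is exactly the low-frequency truncation the abstract describes: the partial sum already tracks sign(x) away from zero, yet remains differentiable everywhere.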

NeurIPS Conference 2021 Conference Paper

Neural Architecture Dilation for Adversarial Robustness

  • Yanxi Li
  • Zhaohui Yang
  • Yunhe Wang
  • Chang Xu

With the tremendous advances in the architecture and scale of convolutional neural networks (CNNs) over the past few decades, they can easily reach or even exceed human performance in certain tasks. However, a recently discovered shortcoming of CNNs is that they are vulnerable to adversarial attacks. Although the adversarial robustness of CNNs can be improved by adversarial training, there is a trade-off between standard accuracy and adversarial robustness. From the neural architecture perspective, this paper aims to improve the adversarial robustness of backbone CNNs that already have satisfactory accuracy. Under a minimal computational overhead, the introduced dilation architecture is expected to preserve the standard performance of the backbone CNN while pursuing adversarial robustness. Theoretical analyses of the standard and adversarial error bounds naturally motivate the proposed neural architecture dilation algorithm. Experimental results on real-world datasets and benchmark neural networks demonstrate the effectiveness of the proposed algorithm in balancing accuracy and adversarial robustness.

AAAI Conference 2021 Conference Paper

One-shot Graph Neural Architecture Search with Dynamic Search Space

  • Yanxi Li
  • Zean Wen
  • Yunhe Wang
  • Chang Xu

Relying on the diverse graph convolution operations that have emerged in recent years, graph neural networks (GNNs) have been shown to be powerful in dealing with high-dimensional non-Euclidean domains, such as social networks or citation networks. Despite the tremendous human effort devoted to exploring new graph convolution operations, there have been few attempts to automatically search for operations in GNNs. The search space of GNNs is significantly larger than that of CNNs because of the diverse components in the message-passing of GNNs, which prevents the straightforward application of classical NAS methods to GNNs. In this work, we propose a novel dynamic one-shot search space for multi-branch neural architectures of GNNs. The dynamic search space maintains a subset of the large search space along with a set of importance weights for the operation candidates in the subset as the architecture parameters. After each iteration, the subset is pruned by removing candidates with low importance weights and is expanded with new operations. The dynamic subsets of operation candidates are not uniform but individual for each edge in the computation graph of the neural architecture, which ensures that the diversity of operations in the final architecture is as competitive as direct search in the large search space. Our experiments on semi-supervised and supervised node classification on citation networks, including Cora, Citeseer, and Pubmed, demonstrate that our method outperforms current state-of-the-art manually designed architectures and reaches competitive performance with existing GNN NAS approaches with up to a 10x speedup.

AAAI Conference 2021 Conference Paper

PTN: A Poisson Transfer Network for Semi-supervised Few-shot Learning

  • Huaxi Huang
  • Junjie Zhang
  • Jian Zhang
  • Qiang Wu
  • Chang Xu

The predicament in semi-supervised few-shot learning (SSFSL) is to maximize the value of the extra unlabeled data to boost the few-shot learner. In this paper, we propose a Poisson Transfer Network (PTN) to mine the unlabeled information for SSFSL from two aspects. First, the Poisson Merriman–Bence–Osher (MBO) model builds a bridge for the communication between labeled and unlabeled examples. This model serves as a more stable and informative classifier than traditional graph-based SSFSL methods in the message-passing process of the labels. Second, the extra unlabeled samples are employed to transfer knowledge from base classes to novel classes through contrastive learning. Specifically, we pull augmented positive pairs together while pushing negative ones apart. Our contrastive transfer scheme implicitly learns the novel-class embeddings to alleviate the over-fitting problem on the few labeled data. Thus, we can mitigate the degeneration of embedding generality in novel classes. Extensive experiments indicate that PTN outperforms state-of-the-art few-shot and SSFSL models on the miniImageNet and tieredImageNet benchmark datasets.

NeurIPS Conference 2021 Conference Paper

ReSSL: Relational Self-Supervised Learning with Weak Augmentation

  • Mingkai Zheng
  • Shan You
  • Fei Wang
  • Chen Qian
  • Changshui Zhang
  • Xiaogang Wang
  • Chang Xu

Self-supervised learning (SSL), including mainstream contrastive learning, has achieved great success in learning visual representations without data annotations. However, most methods mainly focus on instance-level information (i.e., the different augmented images of the same instance should have the same feature or cluster into the same class), and there is a lack of attention to the relationships between different instances. In this paper, we introduce a novel SSL paradigm, termed the relational self-supervised learning (ReSSL) framework, that learns representations by modeling the relationships between different instances. Specifically, our proposed method employs a sharpened distribution of pairwise similarities among different instances as the relation metric, which is then used to match the feature embeddings of different augmentations. Moreover, to boost performance, we argue that weak augmentations matter for representing a more reliable relation, and we leverage a momentum strategy for practical efficiency. Experimental results show that our proposed ReSSL significantly outperforms previous state-of-the-art algorithms in terms of both performance and training efficiency.

IJCAI Conference 2021 Conference Paper

Robust Adversarial Imitation Learning via Adaptively-Selected Demonstrations

  • Yunke Wang
  • Chang Xu
  • Bo Du

The agent in imitation learning (IL) is expected to mimic the behavior of the expert, so its performance relies highly on the quality of the given expert demonstrations. However, the assumption that the collected demonstrations are optimal does not always hold in real-world tasks, which can seriously hurt the performance of the learned agent. In this paper, we propose a robust method within the framework of Generative Adversarial Imitation Learning (GAIL) to address the imperfect-demonstration issue, in which good demonstrations can be adaptively selected for training while bad demonstrations are abandoned. Specifically, a binary weight is assigned to each expert demonstration to indicate whether to select it for training. The reward function in GAIL is employed to determine this weight (i.e., a higher reward results in a higher weight). Compared to some existing solutions that require auxiliary information about this weight, we set up the connection between the weight and the model so that we can jointly optimize GAIL and learn the latent weight. Besides hard binary weighting, we also propose a soft weighting scheme. Experiments in MuJoCo demonstrate that the proposed method outperforms other GAIL-based methods when dealing with imperfect demonstrations.
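The hard and soft weighting schemes can be illustrated with a toy sketch. Everything below is our own simplification (names, threshold, and sigmoid form are assumptions, not the paper's learned joint optimization): each demonstration's GAIL reward is mapped either to a binary keep/abandon decision or to a smooth weight in (0, 1).

```python
import math

def select_demonstrations(rewards, threshold=0.0, soft=False, temperature=1.0):
    """Map per-demonstration rewards to selection weights.

    Hard scheme: binary keep (1.0) / abandon (0.0) by reward threshold.
    Soft scheme: sigmoid of the (shifted, scaled) reward, so better
    demonstrations get smoothly larger weights.
    """
    if soft:
        return [1.0 / (1.0 + math.exp(-(r - threshold) / temperature)) for r in rewards]
    return [1.0 if r > threshold else 0.0 for r in rewards]
```

In the paper the weight is optimized jointly with GAIL rather than set by a fixed threshold; the sketch only shows the shape of the two weighting schemes.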

NeurIPS Conference 2021 Conference Paper

Towards Stable and Robust AdderNets

  • Minjing Dong
  • Yunhe Wang
  • Xinghao Chen
  • Chang Xu

Adder neural networks (AdderNets) replace the massive expensive multiplications in the original convolutions with cheap additions while achieving comparable performance, thus yielding a series of energy-efficient neural networks. Compared with convolutional neural networks (CNNs), the training of AdderNets is much more sophisticated, including several techniques for adjusting gradients and batch normalization. In addition, the variances of both weights and activations in the resulting adder networks are enormous, which limits their performance and their potential for application to other tasks. To enhance the stability and robustness of AdderNets, we first thoroughly analyze the variance estimation of the weight parameters and output features of an arbitrary adder layer. Then, we develop a weight normalization scheme for adaptively optimizing the weight distribution of AdderNets during the training procedure, which can reduce the perturbation of the running mean and variance in batch normalization layers. The proposed weight normalization can also be utilized to enhance the adversarial robustness of the resulting networks. Experiments conducted on several benchmarks demonstrate the superiority of the proposed approach for generating AdderNets with higher performance.
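The generic form of per-filter weight normalization can be sketched directly; note this shows only the standard standardize-and-scale pattern, under our own naming, and the paper's adaptive scheme may differ in how the scale is chosen:

```python
import math

def normalize_filter(weights, scale=1.0, eps=1e-5):
    """Standardize one filter's weights to zero mean and (near) unit variance,
    then apply a learnable scale; this tames the large weight variance that
    would otherwise perturb batch-norm running statistics."""
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / n
    return [scale * (w - mean) / math.sqrt(var + eps) for w in weights]
```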

NeurIPS Conference 2020 Conference Paper

Adapting Neural Architectures Between Domains

  • Yanxi Li
  • Zhaohui Yang
  • Yunhe Wang
  • Chang Xu

Neural architecture search (NAS) has demonstrated impressive performance in automatically designing high-performance neural networks. The power of deep neural networks is to be unleashed for analyzing a large volume of data (e.g., ImageNet), but the architecture search is often executed on another, smaller dataset (e.g., CIFAR-10) to finish in a feasible time. However, it is hard to guarantee that the optimal architecture derived on the proxy task maintains its advantages on another, more challenging dataset. This paper aims to improve the generalization of neural architectures via domain adaptation. We analyze the generalization bounds of the derived architecture and suggest its close relations with the validation error and the data distribution distance on both domains. These theoretical analyses lead to AdaptNAS, a novel and principled approach to adapting neural architectures between domains in NAS. Our experimental evaluation shows that only a small part of ImageNet is sufficient for AdaptNAS to extend its architectural success to the entire ImageNet and outperform state-of-the-art comparison algorithms.

AAAI Conference 2020 Conference Paper

Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks

  • Yehui Tang
  • Yunhe Wang
  • Yixing Xu
  • Boxin Shi
  • Chao Xu
  • Chunjing Xu
  • Chang Xu

Deep neural networks often consist of a great number of trainable parameters for extracting powerful features from given datasets. On the one hand, massive trainable parameters significantly enhance the performance of these deep networks. On the other hand, they bring the problem of over-fitting. To this end, dropout-based methods disable some elements in the output feature maps during the training phase to reduce the co-adaptation of neurons. Although the generalization ability of the resulting models can be enhanced by these approaches, conventional binary dropout is not the optimal solution. Therefore, we investigate the empirical Rademacher complexity related to the intermediate layers of deep neural networks and propose a feature distortion method to address the aforementioned problem. During training, randomly selected elements in the feature maps are replaced with specific values derived by exploiting the generalization error bound. The superiority of the proposed feature map distortion for producing deep neural networks with higher test performance is analyzed and demonstrated on several benchmark image datasets.

AAAI Conference 2020 Conference Paper

Distilling Portable Generative Adversarial Networks for Image Translation

  • Hanting Chen
  • Yunhe Wang
  • Han Shu
  • Changyuan Wen
  • Chunjing Xu
  • Boxin Shi
  • Chao Xu
  • Chang Xu

Although Generative Adversarial Networks (GANs) have been widely used in various image-to-image translation tasks, they can hardly be applied on mobile devices due to their heavy computation and storage cost. Traditional network compression methods focus on visual recognition tasks and rarely deal with generation tasks. Inspired by knowledge distillation, a student generator with fewer parameters is trained by inheriting the low-level and high-level information from the original heavy teacher generator. To promote the capability of the student generator, we include a student discriminator to measure the distances between real images and the images generated by the student and teacher generators. An adversarial learning process is thereby established to optimize the student generator and the student discriminator. Qualitative and quantitative analyses from experiments on benchmark datasets demonstrate that the proposed method can learn portable generative models with strong performance.

AAAI Conference 2020 Conference Paper

Efficient Residual Dense Block Search for Image Super-Resolution

  • Dehua Song
  • Chang Xu
  • Xu Jia
  • Yiyi Chen
  • Chunjing Xu
  • Yunhe Wang

Although remarkable progress has been made on single-image super-resolution due to the revival of deep convolutional neural networks, deep learning methods are confronted with the challenges of computation and memory consumption in practice, especially on mobile devices. Focusing on this issue, we propose an efficient residual dense block search algorithm with multiple objectives to hunt for fast, lightweight, and accurate networks for image super-resolution. First, to accelerate the super-resolution network, we adequately exploit the variation of feature scale with the proposed efficient residual dense blocks. In the proposed evolutionary algorithm, the locations of the pooling and upsampling operators are searched automatically. Second, the network architecture is evolved with the guidance of block credits to acquire an accurate super-resolution network. A block credit reflects the effect of the current block and is earned during the model evaluation process. It guides the evolution by weighting the sampling probability of mutation to favor admirable blocks. Extensive experimental results demonstrate the effectiveness of the proposed search method, and the discovered efficient super-resolution models achieve better performance than state-of-the-art methods with a limited number of parameters and FLOPs.

NeurIPS Conference 2020 Conference Paper

Kernel Based Progressive Distillation for Adder Neural Networks

  • Yixing Xu
  • Chang Xu
  • Xinghao Chen
  • Wei Zhang
  • Chunjing Xu
  • Yunhe Wang

Adder Neural Networks (ANNs), which contain only additions, bring us a new way of developing deep neural networks with low energy consumption. Unfortunately, there is an accuracy drop when replacing all convolution filters with adder filters. The main reason is the optimization difficulty of ANNs using the ℓ1-norm, in which the estimation of the gradient in back-propagation is inaccurate. In this paper, we present a novel method for further improving the performance of ANNs without increasing the number of trainable parameters, via a progressive kernel-based knowledge distillation (PKKD) method. A convolutional neural network (CNN) with the same architecture is simultaneously initialized and trained as a teacher network; the features and weights of the ANN and the CNN are then transformed into a new space to eliminate the accuracy drop. The similarity is computed in a higher-dimensional space to disentangle the difference in their distributions using a kernel-based method. Finally, the desired ANN is learned progressively from the information of both the ground truth and the teacher. The effectiveness of the proposed method for learning ANNs with higher performance is well verified on several benchmarks. For instance, the ANN-50 trained using the proposed PKKD method obtains a 76.8% top-1 accuracy on the ImageNet dataset, which is 0.6% higher than that of ResNet-50.

AAAI Conference 2020 Conference Paper

Learning Student Networks with Few Data

  • Shumin Kong
  • Tianyu Guo
  • Shan You
  • Chang Xu

Recently, the teacher-student learning paradigm has drawn much attention for compressing neural networks on low-end edge devices, such as mobile phones and wearable watches. Current algorithms mainly assume that the complete dataset used for the teacher network is also available for training the student network. However, in real-world scenarios, users may only have access to part of the training examples due to commercial profits or data privacy, and severe over-fitting can result. In this paper, we tackle the challenge of learning student networks with few data by investigating the ground-truth data-generating distribution underlying these few data. Taking the Wasserstein distance as the measurement, we assume this ideal data distribution lies in a neighborhood of the discrete empirical distribution induced by the training examples. We thus propose to safely optimize the worst-case cost within this neighborhood to boost generalization. Furthermore, with theoretical analysis, we derive a novel and easy-to-implement loss for training the student network in an end-to-end fashion. Experimental results on benchmark datasets validate the effectiveness of our proposed method.

AAAI Conference 2020 Conference Paper

Reborn Filters: Pruning Convolutional Neural Networks with Limited Data

  • Yehui Tang
  • Shan You
  • Chang Xu
  • Jin Han
  • Chen Qian
  • Boxin Shi
  • Chao Xu
  • Changshui Zhang

Channel pruning is effective in compressing pretrained CNNs for deployment on low-end edge devices. Most existing methods independently prune some of the original channels and need the complete original dataset to fix the performance drop after pruning. However, due to commercial protection or data privacy, users may only have access to a tiny portion of the training examples, which could be insufficient for performance recovery. In this paper, for pruning with limited data, we propose to use all the original filters to directly develop new compact filters, named reborn filters, so that all useful structural priors in the original filters are well preserved in the pruned networks, alleviating the performance drop accordingly. During training, reborn filters can be easily implemented via 1 × 1 convolutional layers and then fused in the inference stage for acceleration. Based on reborn filters, the proposed channel pruning algorithm shows its effectiveness and superiority in extensive experiments.

NeurIPS Conference 2020 Conference Paper

SCOP: Scientific Control for Reliable Neural Network Pruning

  • Yehui Tang
  • Yunhe Wang
  • Yixing Xu
  • Dacheng Tao
  • Chunjing Xu
  • Chao Xu
  • Chang Xu

This paper proposes a reliable neural network pruning algorithm by setting up a scientific control. Existing pruning methods have developed various hypotheses to approximate the importance of filters to the network and then execute filter pruning accordingly. To increase the reliability of the results, we prefer a more rigorous research design that includes a scientific control group as an essential part, to minimize the effect of all factors except the association between the filter and the expected network output. Acting as the control group, knockoff features are generated to mimic the feature maps produced by network filters, but they are conditionally independent of the example label given the real feature maps. We theoretically suggest that the knockoff condition can be approximately preserved given the information propagation of network layers. Besides the real feature map at an intermediate layer, the corresponding knockoff feature is brought in as another auxiliary input signal for the subsequent layers. Redundant filters can be discovered in the adversarial process of the different features. Through experiments, we demonstrate the superiority of the proposed algorithm over state-of-the-art methods. For example, our method can reduce 57.8% of the parameters and 60.2% of the FLOPs of ResNet-101 with only a 0.01% top-1 accuracy loss on ImageNet.

NeurIPS Conference 2020 Conference Paper

Searching for Low-Bit Weights in Quantized Neural Networks

  • Zhaohui Yang
  • Yunhe Wang
  • Kai Han
  • Chunjing Xu
  • Chao Xu
  • Dacheng Tao
  • Chang Xu

Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators. However, the quantization functions used in most conventional quantization methods are non-differentiable, which increases the optimization difficulty of quantized networks. Compared with full-precision parameters (i.e., 32-bit floating-point numbers), low-bit values are selected from a much smaller set; for example, there are only 16 possibilities in a 4-bit space. Thus, we propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables and utilize a differentiable method to search for them accurately. In particular, each weight is represented as a probability distribution over the discrete value set. The probabilities are optimized during training, and the values with the highest probability are selected to establish the desired quantized network. Experimental results on benchmarks demonstrate that the proposed method is able to produce quantized neural networks with higher performance than the state of the art on both image classification and super-resolution tasks.
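The probability-over-values idea above has a natural minimal sketch (our own simplification: per-weight logits with a softmax; the paper's parameterization may differ): training uses the differentiable expected value of the distribution, and the final network takes the most probable discrete value.

```python
import math

def softmax(logits):
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def expected_weight(logits, values):
    """Differentiable surrogate weight used during training."""
    return sum(p * v for p, v in zip(softmax(logits), values))

def discretize(logits, values):
    """Final low-bit weight: the value with the highest probability."""
    return values[max(range(len(logits)), key=lambda i: logits[i])]
```

As training pushes one logit up, the expected weight converges to the selected discrete value, so the train-time surrogate and the deployed quantized weight agree.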

NeurIPS Conference 2020 Conference Paper

UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging

  • Chu Zhou
  • Hang Zhao
  • Jin Han
  • Chang Xu
  • Chao Xu
  • Tiejun Huang
  • Boxin Shi

A conventional camera often suffers from over- or under-exposure when recording a real-world scene with a very high dynamic range (HDR). In contrast, a modulo camera with a Markov random field (MRF) based unwrapping algorithm can theoretically achieve an unbounded dynamic range, but it shows degenerate performance in the presence of modulus-intensity ambiguity, strong local contrast, and color misalignment. In this paper, we reformulate the modulo image unwrapping problem as a series of binary labeling problems and propose a modulo edge-aware model, named UnModNet, to iteratively estimate the binary rollover masks of the modulo image for unwrapping. Experimental results show that our approach can reliably generate 12-bit HDR images from 8-bit modulo images and runs much faster than the previous MRF-based algorithm thanks to GPU acceleration.

IJCAI Conference 2019 Conference Paper

Attribute Aware Pooling for Pedestrian Attribute Recognition

  • Kai Han
  • Yunhe Wang
  • Han Shu
  • Chuanjian Liu
  • Chunjing Xu
  • Chang Xu

This paper extends the strength of deep convolutional neural networks (CNNs) to the pedestrian attribute recognition problem by devising a novel attribute-aware pooling algorithm. Existing vanilla CNNs cannot be straightforwardly applied to handle multi-attribute data because of the larger label space as well as the attribute entanglement and correlations. We tackle the challenges that hamper the development of CNNs for multi-attribute classification by fully exploiting the correlations between different attributes. A multi-branch architecture is adopted, focusing on attributes in different regions. Besides the prediction based on each branch itself, the context information of each branch is employed for decision-making as well. The attribute-aware pooling is developed to integrate both kinds of information. Therefore, attributes that are indistinct or tangled with others can be accurately recognized by exploiting the context information. Experiments on benchmark datasets demonstrate that the proposed pooling method appropriately explores and exploits the correlations between attributes for pedestrian attribute recognition.

IJCAI Conference 2019 Conference Paper

Crafting Efficient Neural Graph of Large Entropy

  • Minjing Dong
  • Hanting Chen
  • Yunhe Wang
  • Chang Xu

Network pruning is widely applied to deep CNN models due to their heavy computation costs; it achieves high performance by keeping important weights while removing redundancy. Pruning redundant weights directly may hurt the global information flow, which suggests that an efficient sparse network should take graph properties into account. Thus, instead of paying attention to preserving important weights, we focus on the pruned architecture itself. We propose to use graph entropy as the measurement, which shows useful properties for crafting high-quality neural graphs and enables us to propose an efficient algorithm to construct them as the initial network architecture. Our algorithm can be easily implemented and deployed on different popular CNN models and achieves better trade-offs.

AAAI Conference 2019 Conference Paper

Deep Hierarchical Graph Convolution for Election Prediction from Geospatial Census Data

  • Mike Li
  • Elija Perrier
  • Chang Xu

Geographic information systems (GIS) research is widely used within the social and physical sciences and plays a crucial role in the development and implementation of economic, education, environment, and transportation policy by governments. While machine learning methods have been applied to GIS datasets, the uptake of powerful deep learning CNN methodologies has been limited, in part due to the challenges posed by the complex and often poorly structured nature of the data. In this paper, we demonstrate the utility of GCNNs for GIS analysis via a multi-graph hierarchical spatial-filter GCNN network model that predicts election outcomes using socio-economic features drawn from the 2016 Australian Census. We report a marked improvement in the performance accuracy of hierarchical GCNNs over benchmark generalised linear models and standard GCNNs, especially on semi-supervised tasks. These results indicate the widespread potential for GIS-GCNN research methods to enrich socio-economic GIS analysis, aiding the social sciences and policy development.

NeurIPS Conference 2019 Conference Paper

Learning from Bad Data via Generation

  • Tianyu Guo
  • Chang Xu
  • Boxin Shi
  • Chao Xu
  • Dacheng Tao

Bad training data challenge the learning model's ability to understand the underlying data-generating scheme, which then increases the difficulty of achieving satisfactory performance on unseen test data. We suppose the real data distribution lies in a distribution set supported by the empirical distribution of the bad data. A worst-case formulation can be developed over this distribution set and then interpreted as a generation task in an adversarial manner. The connections and differences between GANs and our framework are thoroughly discussed. We further theoretically show the influence of this generation task on learning from bad data and reveal its connection with a data-dependent regularization. Given different distance measures (e.g., Wasserstein distance or JS divergence) between distributions, we can derive different objective functions for the problem. Experimental results on different kinds of bad training data demonstrate the necessity and effectiveness of the proposed method.

IJCAI Conference 2019 Conference Paper

Learning Instance-wise Sparsity for Accelerating Deep Models

  • Chuanjian Liu
  • Yunhe Wang
  • Kai Han
  • Chunjing Xu
  • Chang Xu

Exploring deep convolutional neural networks of high efficiency and low memory usage is essential for a wide variety of machine learning tasks. Most existing approaches accelerate deep models by manipulating parameters or filters without regard to the data, e.g., pruning and decomposition. In contrast, we study this problem from a different perspective by respecting the differences between data. An instance-wise feature pruning is developed by identifying informative features for different instances. Specifically, by investigating a feature decay regularization, we expect the intermediate feature maps of each instance in a deep neural network to be sparse while preserving the overall network performance. During online inference, subtle features of input images extracted by the intermediate layers of a well-trained neural network can be eliminated to accelerate the subsequent calculations. We further take the coefficient of variation as a measure to select the layers that are appropriate for acceleration. Extensive experiments conducted on benchmark datasets and networks demonstrate the effectiveness of the proposed method.
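The two measurable ingredients above, per-instance pruning of weak features and the coefficient of variation as a layer-selection signal, can be sketched in a few lines. This is an illustrative simplification (threshold rule and names are ours, not the paper's regularized training scheme):

```python
import math

def coefficient_of_variation(mags):
    """std/mean of feature magnitudes; higher values indicate features that
    vary strongly across channels, a plausible layer-selection signal."""
    n = len(mags)
    mean = sum(mags) / n
    std = math.sqrt(sum((m - mean) ** 2 for m in mags) / n)
    return std / mean if mean else 0.0

def prune_features(features, keep_ratio=0.5):
    """Zero out the subtlest features of one instance, keeping the largest
    keep_ratio fraction by magnitude."""
    mags = sorted((abs(f) for f in features), reverse=True)
    cutoff = mags[max(0, int(len(mags) * keep_ratio) - 1)]
    return [f if abs(f) >= cutoff else 0.0 for f in features]
```

Because the cutoff is computed per instance, easy inputs with a few dominant features get pruned more aggressively than inputs whose activations are spread evenly, which is the instance-wise aspect the abstract emphasizes.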

AAAI Conference 2019 Conference Paper

Modeling Local Dependence in Natural Language with Multi-Channel Recurrent Neural Networks

  • Chang Xu
  • Weiran Huang
  • Hongwei Wang
  • Gang Wang
  • Tie-Yan Liu

Recurrent Neural Networks (RNNs) have been widely used in natural language processing tasks and have achieved huge success. Traditional RNNs usually treat each token in a sentence uniformly and equally. However, this may miss the rich semantic structure information of a sentence, which is useful for understanding natural languages. Since semantic structures such as word dependence patterns are not parameterized, capturing and leveraging structure information is a challenge. In this paper, we propose an improved variant of RNNs, the Multi-Channel RNN (MC-RNN), to dynamically capture and leverage local semantic structure information. Concretely, MC-RNN contains multiple channels, each of which represents a local dependence pattern at a time. An attention mechanism is introduced to combine these patterns at each step according to the semantic information. We then parameterize structure information by adaptively selecting the most appropriate connection structures among channels. In this way, diverse local structures and dependence patterns in sentences can be well captured by MC-RNN. To verify the effectiveness of MC-RNN, we conduct extensive experiments on typical natural language processing tasks, including neural machine translation, abstractive summarization, and language modeling. Experimental results on all of these tasks show significant improvements of MC-RNN over current top systems.

IJCAI Conference 2019 Conference Paper

On Retrospecting Human Dynamics with Attention

  • Minjing Dong
  • Chang Xu

Deep recurrent neural networks have achieved impressive success in forecasting human motion with sequence-to-sequence architectures. However, forecasting over longer time horizons often leads to implausible human poses or convergence to mean poses, because of error accumulation and difficulties in keeping track of longer-term information. To address these challenges, we propose to retrospect human dynamics with attention. A retrospection module is designed on top of the RNN to regularly retrospect past frames and correct mistakes in time. This significantly improves the memory of the RNN and provides sufficient information for the decoder networks to generate longer-term predictions. Moreover, we present a spatial attention module to explore and exploit the cooperation among joints in performing a particular motion. Residual connections are also included to guarantee the performance of short-term prediction. We evaluate the proposed algorithm on the largest and most challenging dataset in the field, Human3.6M. Experimental results demonstrate the necessity of investigating motion prediction in a self-audit manner and the effectiveness of the proposed algorithm in both short-term and long-term prediction.

IJCAI Conference 2019 Conference Paper

Polygon-Net: A General Framework for Jointly Boosting Multiple Unsupervised Neural Machine Translation Models

  • Chang Xu
  • Tao Qin
  • Gang Wang
  • Tie-Yan Liu

Neural machine translation (NMT) has achieved great success. However, collecting large-scale parallel data for training is costly and laborious. Recently, unsupervised neural machine translation has attracted more and more attention, due to its demand for monolingual corpora only, which are common and easy to obtain, and its great potential for low-resource or even zero-resource machine translation. In this work, we propose a general framework called Polygon-Net, which leverages multiple auxiliary languages for jointly boosting unsupervised neural machine translation models. Specifically, we design a novel loss function for multi-language unsupervised neural machine translation. In addition, different from the literature, which updates just one or two models individually, Polygon-Net for the first time enables multiple unsupervised models in the framework to update in turn and enhance each other. In this way, multiple unsupervised translation models are associated with each other during training to achieve better performance. Experiments on benchmark datasets including the UN Corpus and WMT show that our approach significantly improves over two-language-based methods, and achieves better performance as more languages are introduced to the framework.

NeurIPS Conference 2019 Conference Paper

Positive-Unlabeled Compression on the Cloud

  • Yixing Xu
  • Yunhe Wang
  • Hanting Chen
  • Kai Han
  • Chunjing Xu
  • Dacheng Tao
  • Chang Xu

Many attempts have been made to extend the great success of convolutional neural networks (CNNs) achieved on high-end GPU servers to portable devices such as smartphones. Providing compression and acceleration services for deep learning models on the cloud is therefore significant and attractive for end users. However, existing network compression and acceleration approaches usually fine-tune the svelte model by requesting the entire original training data (e.g., ImageNet), which could be more cumbersome than the network itself and cannot be easily uploaded to the cloud. In this paper, we present a novel positive-unlabeled (PU) setting for addressing this problem. In practice, only a small portion of the original training set is required as positive examples, and more useful training examples can be obtained from the massive unlabeled data on the cloud through a PU classifier with an attention-based multi-scale feature extractor. We further introduce a robust knowledge distillation (RKD) scheme to deal with the class imbalance problem of these newly augmented training examples. The superiority of the proposed method is verified through experiments conducted on benchmark models and datasets. Using only 8% of uniformly selected data from ImageNet, we can obtain an efficient model with performance comparable to the baseline ResNet-34.

AAAI Conference 2019 Conference Paper

Smooth Deep Image Generator from Noises

  • Tianyu Guo
  • Chang Xu
  • Boxin Shi
  • Chao Xu
  • Dacheng Tao

Generative Adversarial Networks (GANs) have demonstrated a strong ability to fit complex distributions since their introduction, especially in the field of generating natural images. Linear interpolation in the noise space produces continuous changes in the image space, which is an impressive property of GANs. However, this property receives no special consideration in the objective function of GANs or their derived models. This paper analyzes perturbations on the input of the generator and their influence on the generated images. A smooth generator is then developed by investigating the tolerable input perturbation. We further integrate this smooth generator with a gradient-penalized discriminator, and design a smooth GAN that generates stable and high-quality images. Experiments on real-world image datasets demonstrate the necessity of studying smooth generators and the effectiveness of the proposed algorithm.

AAAI Conference 2018 Conference Paper

Adversarial Learning of Portable Student Networks

  • Yunhe Wang
  • Chang Xu
  • Chao Xu
  • Dacheng Tao

Effective methods for learning deep neural networks with fewer parameters are urgently required, since the storage and computation demands of heavy neural networks have largely prevented their widespread use on mobile devices. Compared with algorithms that directly remove weights or filters to obtain considerable compression and speed-up ratios, training thin deep networks with the student-teacher learning paradigm is more flexible. However, it is very hard to determine which formulation is optimal for measuring the information inherited from teacher networks. To overcome this challenge, we utilize a generative adversarial network (GAN) to learn the student network. In practice, the generator is exactly the student network, with far fewer parameters, and the discriminator is used as a teaching assistant for distinguishing between features extracted from the student and teacher networks. By simultaneously optimizing the generator and the discriminator, the resulting student network can produce features of input data with a distribution similar to that of the teacher network's features. Extensive experimental results on benchmark datasets demonstrate that the proposed method is capable of learning well-performing portable networks, and is superior to state-of-the-art methods.

NeurIPS Conference 2018 Conference Paper

Learning Versatile Filters for Efficient Convolutional Neural Networks

  • Yunhe Wang
  • Chang Xu
  • Chunjing Xu
  • Chao Xu
  • Dacheng Tao

This paper introduces versatile filters for constructing efficient convolutional neural networks. Considering the demands of efficient deep learning techniques running on cost-effective hardware, a number of methods have been developed to learn compact neural networks. Most of these works aim to slim down filters in different ways, e.g., investigating small, sparse, or binarized filters. In contrast, we treat filters from an additive perspective. A series of secondary filters can be derived from a primary filter. These secondary filters are all inherited from the primary filter without occupying more storage, but once unfolded in computation they can significantly enhance the capability of the filter by integrating information extracted from different receptive fields. Besides spatial versatile filters, we additionally investigate versatile filters from the channel perspective. The new techniques are general and can upgrade filters in existing CNNs. Experimental results on benchmark datasets and neural networks demonstrate that CNNs constructed with our versatile filters achieve accuracy comparable to that of the original filters, but require less memory and fewer FLOPs.

AAAI Conference 2018 Conference Paper

Learning With Single-Teacher Multi-Student

  • Shan You
  • Chang Xu
  • Chao Xu
  • Dacheng Tao

In this paper we study a new learning problem, termed the “Single-Teacher Multi-Student” (STMS) problem, which investigates how to learn a series of student (simple and specific) models from a single teacher (complex and universal) model. Taking multiclass and binary classification as an example, we focus on learning multiple binary classifiers from a single multiclass classifier, where each binary classifier is responsible for a certain class. This derives from realistic problems, such as identifying a suspect based on a comprehensive face recognition system. By treating the already-trained multiclass classifier as the teacher, and multiple binary classifiers as the students, we propose a gated support vector machine (gSVM) as a solution. A series of gSVMs are learned with the help of the single teacher multiclass classifier. The teacher’s help is two-fold: first, the teacher’s score provides the gated values for the students’ decisions; second, the teacher can guide the students to accommodate training examples with different degrees of difficulty. Extensive experiments on real datasets validate its effectiveness.

IJCAI Conference 2018 Conference Paper

R-SVM+: Robust Learning with Privileged Information

  • Xue Li
  • Bo Du
  • Chang Xu
  • Yipeng Zhang
  • Lefei Zhang
  • Dacheng Tao

In practice, the assumption that training and test data are clean is not always satisfied. The performance of existing methods in the learning using privileged information (LUPI) paradigm may be seriously challenged, due to the lack of clear strategies for addressing potential noise in the data. This paper proposes a novel Robust SVM+ (R-SVM+) algorithm based on a rigorous theoretical analysis. Under the SVM+ framework in the LUPI paradigm, we study the lower bound of perturbations of both example feature data and privileged feature data that will mislead the model into making wrong decisions. By maximizing this lower bound, the tolerance of the learned model to perturbations is increased. Accordingly, a novel regularization function is introduced to upgrade a variant form of SVM+. The objective function of R-SVM+ is transformed into a quadratic programming problem, which can be efficiently optimized using off-the-shelf solvers. Experiments on real-world datasets demonstrate the necessity of studying robust SVM+ and the effectiveness of the proposed algorithm.

AAAI Conference 2018 Conference Paper

Reinforced Multi-Label Image Classification by Exploring Curriculum

  • Shiyi He
  • Chang Xu
  • Tianyu Guo
  • Chao Xu
  • Dacheng Tao

Humans and animals learn much better when examples are not presented randomly but organized in a meaningful order that illustrates gradually more concepts, and gradually more complex ones. Inspired by this curriculum learning mechanism, we propose a reinforced multi-label image classification approach that imitates human behavior by labeling images from easy to complex. This approach allows a reinforcement learning agent to sequentially predict labels by fully exploiting the image features and previously predicted labels. The agent discovers the optimal policies by maximizing the long-term reward, which reflects prediction accuracies. Experimental results on PASCAL VOC2007 and 2012 demonstrate the necessity of reinforced multi-label learning and the algorithm’s effectiveness in real-world multi-label image classification tasks.

AAAI Conference 2017 Conference Paper

Beyond RPCA: Flattening Complex Noise in the Frequency Domain

  • Yunhe Wang
  • Chang Xu
  • Chao Xu
  • Dacheng Tao

Discovering robust low-rank data representations is important in many real-world problems. Traditional robust principal component analysis (RPCA) assumes that the observed data are corrupted by some sparse noise (e.g., Laplacian noise) and utilizes the ℓ1-norm to separate out the noisy component. Nevertheless, beyond simple Gaussian or Laplacian noise, noise in real-world data is often more complex, and thus the ℓ1- and ℓ2-norms are insufficient for noise characterization. This paper presents a more flexible approach to modeling complex noise by investigating its properties in the frequency domain. Although the elements of a noise matrix are chaotic in the spatial domain, the absolute values of its alternative coefficients in the frequency domain are constant w.r.t. their variance. Based on this observation, a new robust PCA algorithm (FRPCA) is formulated by simultaneously discovering the low-rank and noisy components. Extensive experiments on synthetic data and video background subtraction demonstrate that FRPCA is effective at handling complex noise.

IJCAI Conference 2017 Conference Paper

Collaborative Rating Allocation

  • Yali Du
  • Chang Xu
  • Dacheng Tao

This paper studies the collaborative rating allocation problem, in which each user has limited ratings to allocate over all items. These users are termed "energy limited". Different from existing methods, which treat each rating independently, we investigate the geometric properties of a user's rating vector, and design a matrix completion method on the simplex. In this method, a user's rating vector is estimated by a combination of user profiles as basis points on the simplex. Instead of the Euclidean metric, a non-linear pull-back distance measurement from the sphere is adopted, since it can depict the geometric constraints on each user's rating vector. The resulting objective function is then efficiently optimized by a Riemannian conjugate gradient method on the simplex. Experiments on real-world datasets demonstrate our model's competitiveness versus other collaborative rating prediction methods.
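The "pull-back distance from the sphere" can be illustrated with the standard square-root embedding: a point p on the simplex maps to √p on the unit sphere (since Σ(√pᵢ)² = 1), and the great-circle distance between the images serves as the distance between rating vectors. The abstract does not give the exact metric, so this Fisher–Rao-style sketch is an assumption consistent with its description, not the paper's definition.

```python
import numpy as np

def pullback_dist(p, q):
    """Distance between two points on the probability simplex, pulled
    back from the sphere: embed p as sqrt(p) on the unit sphere, then
    take the great-circle (arc-length) distance between the images."""
    inner = np.clip(np.sum(np.sqrt(p) * np.sqrt(q)), -1.0, 1.0)
    return 2.0 * np.arccos(inner)

# two normalized "rating allocation" vectors (illustrative data)
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

print(pullback_dist(p, p))   # 0: identical allocations
print(pullback_dist(p, q))   # positive, and symmetric in p and q
```

Unlike the Euclidean metric, this distance respects the constraint that each user's allocation is non-negative and sums to a fixed budget.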

AAAI Conference 2017 Conference Paper

Cost-Sensitive Feature Selection via F-Measure Optimization Reduction

  • Meng Liu
  • Chang Xu
  • Yong Luo
  • Chao Xu
  • Yonggang Wen
  • Dacheng Tao

Feature selection aims to select a small subset of the high-dimensional features that can lead to better learning performance, lower computational complexity, and better model readability. The class imbalance problem has been neglected by traditional feature selection methods; as a result, the selected features are biased towards the majority classes. Because of the superiority of the F-measure over accuracy for imbalanced data, we propose to use the F-measure as the performance measure for feature selection algorithms. As a pseudo-linear function, the F-measure can be optimized by minimizing total costs. In this paper, we present a novel cost-sensitive feature selection (CSFS) method that optimizes the F-measure instead of accuracy to take the class imbalance issue into account. Features are selected according to the optimal F-measure classifier obtained after solving a series of cost-sensitive feature selection sub-problems. The features selected by our method fully represent the characteristics of not only the majority classes, but also the minority classes. Extensive experimental results on synthetic, multi-class, and multi-label datasets validate the efficiency and significance of our feature selection method.
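The pseudo-linearity argument can be made concrete with a small sketch. For binary F1 with P positives, F1 ≥ θ rearranges to the linear inequality (2−θ)·TP − θ·FP ≥ θ·P, so maximizing F1 reduces to sweeping θ and solving, for each θ, a cost-sensitive problem whose costs depend only on θ. The toy data below and the use of score thresholds as the "classifiers" are illustrative assumptions; the paper's actual sub-problems select features, not thresholds.

```python
import numpy as np

def f1(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# toy scores and binary labels (illustrative data)
y = np.array([1, 1, 1, 0, 0, 1, 0, 0, 1, 0])
s = np.array([0.9, 0.8, 0.7, 0.65, 0.6, 0.55, 0.4, 0.35, 0.3, 0.1])
P = int(y.sum())

def counts(thr):
    pred = s >= thr
    tp = int(np.sum(pred & (y == 1)))
    fp = int(np.sum(pred & (y == 0)))
    return tp, fp, P - tp   # TP, FP, FN

thresholds = np.unique(s)

# direct F1 maximization over the candidate classifiers
best_direct = max(f1(*counts(t)) for t in thresholds)

# pseudo-linear reduction: F1 >= theta  <=>  (2-theta)*TP - theta*FP >= theta*P,
# so each theta yields a cost-sensitive objective; keep the best solution found.
best_reduced = 0.0
for theta in np.linspace(0.01, 0.99, 99):
    t_star = max(thresholds,
                 key=lambda t: (2 - theta) * counts(t)[0] - theta * counts(t)[1])
    best_reduced = max(best_reduced, f1(*counts(t_star)))

print(best_direct, best_reduced)  # the reduction recovers the same optimum
```

Each inner problem is linear in (TP, FP), i.e., an ordinary cost-sensitive classification objective, which is what makes the reduction tractable.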

IJCAI Conference 2017 Conference Paper

Fast SVM Trained by Divide-and-Conquer Anchors

  • Meng Liu
  • Chang Xu
  • Chao Xu
  • Dacheng Tao

The support vector machine (SVM) is one of the most frequently used classifiers for machine learning tasks. However, its training time can become prohibitive when the training set is very large. Thus, many kinds of representative subsets are chosen from the original dataset to reduce the training complexity. In this paper, we propose to choose representative points, called anchors, obtained from non-negative matrix factorization (NMF) in a divide-and-conquer framework, and then use the anchors to train an approximate SVM. Our theoretical analysis shows that solving DCA-SVM yields an approximate solution close to that of the primal SVM. Experimental results on multiple datasets demonstrate that our DCA-SVM is faster than state-of-the-art algorithms without notably decreasing the accuracy of classification results.

IJCAI Conference 2017 Conference Paper

Multi-Positive and Unlabeled Learning

  • Yixing Xu
  • Chang Xu
  • Chao Xu
  • Dacheng Tao

The positive and unlabeled (PU) learning problem focuses on learning a classifier from positive and unlabeled data. Some methods have been developed to solve the PU learning problem, but they are often limited in practical applications, since only binary classes are involved and they cannot easily be adapted to multi-class data. Here we propose a one-step method that directly trains a multi-class model from the given multi-class input data and predicts the label based on the model's decision. Specifically, we construct different convex loss functions for labeled and unlabeled data to learn a discriminant function F. The theoretical analysis of the generalization error bound shows that it is no worse than k√k times that of fully supervised multi-class classification methods when the sizes of the data in the k classes are of the same order. Finally, our experimental results demonstrate the significance and effectiveness of the proposed algorithm on synthetic and real-world datasets.
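For intuition, the standard binary PU risk rewriting (a sketch of the general idea such methods build on, not this paper's multi-class formulation) expresses the unobservable negative-class term using only positives and the unlabeled pool, given a known class prior π: since the unlabeled data is a π/(1−π) mixture, (1−π)·E_N[ℓ(f(x),−1)] = E_U[ℓ(f(x),−1)] − π·E_P[ℓ(f(x),−1)]. The synthetic data, scoring function, and loss below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
pi = 0.4        # class prior P(y = +1), assumed known
n = 200_000

# synthetic 1-D data: positives ~ N(+1, 1), negatives ~ N(-1, 1)
y = rng.random(n) < pi
x = np.where(y, rng.normal(1.0, 1.0, n), rng.normal(-1.0, 1.0, n))

f = np.tanh                                    # a fixed scoring function
loss = lambda z, t: np.log1p(np.exp(-t * z))   # logistic loss

# fully supervised (PN) risk -- needs every label
r_pn = pi * loss(f(x[y]), 1).mean() + (1 - pi) * loss(f(x[~y]), -1).mean()

# PU rewriting: the negative-class term is replaced by an expression
# over the unlabeled pool (all of x) and the observed positives only
r_pu = (pi * loss(f(x[y]), 1).mean()
        + loss(f(x), -1).mean()
        - pi * loss(f(x[y]), -1).mean())

print(abs(r_pn - r_pu))   # small sampling gap only
```

The two risk estimates agree up to sampling error, which is what lets a classifier be trained without any labeled negatives.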

IJCAI Conference 2017 Conference Paper

Online Reputation Fraud Campaign Detection in User Ratings

  • Chang Xu
  • Jie Zhang
  • Zhu Sun

Reputation fraud campaigns (RFCs) distort the reputations of rated items by generating fake ratings through multiple spammers. One effective way of detecting RFCs is to characterize their collective behaviors based on rating histories. However, these campaigns are constantly evolving and changing tactics to evade detection. For example, they can launch early attacks on items to quickly dominate their reputations. They can also whitewash themselves by creating new accounts for subsequent attacks. It is thus challenging for existing approaches working on historical data to react promptly to such emerging fraud activities. In this paper, we conduct RFC detection in an online fashion, so as to spot campaign activities as early as possible. This leads to a unified and scalable optimization framework, FraudScan, that can adapt to emerging fraud patterns over time. Empirical analysis on two real-world datasets validates the effectiveness and efficiency of the proposed framework.

IJCAI Conference 2017 Conference Paper

Privileged Matrix Factorization for Collaborative Filtering

  • Yali Du
  • Chang Xu
  • Dacheng Tao

Collaborative filtering plays a crucial role in reducing information overload in online shopping by suggesting products to customers that match their potential interests. Observing that a user's review comments on purchases often accompany ratings, recent works exploit the review texts in representing user or item factors and have achieved prominent performance. Although the effectiveness of reviews has been verified, one major defect of existing works is that reviews are used to inform the learning of either user or item factors, without noticing that each review associates a pair of user and item concurrently. To better explore the value of review comments, this paper presents the privileged matrix factorization method, which utilizes reviews in the learning of both user and item factors. By mapping review texts into the privileged feature space, a learned privileged function compensates for the discrepancies between predicted ratings and ground-truth values on a per-rating basis. Thus, by minimizing discrepancies and prediction errors, our method harnesses the information present in the review comments for the learning of both user and item factors. Experiments on five real datasets testify to the effectiveness of the proposed method.

IJCAI Conference 2017 Conference Paper

Privileged Multi-label Learning

  • Shan You
  • Chang Xu
  • Yunhe Wang
  • Chao Xu
  • Dacheng Tao

This paper presents privileged multi-label learning (PrML) to explore and exploit the relationships between labels in multi-label learning problems. We suggest that each individual label can not only be implicitly connected with other labels via the low-rank constraint over label predictors, but can also receive explicit comments on its performance on examples from the other labels, which together act as an oracle teacher. We generate a privileged label feature for each example and its individual label, and then integrate it into the framework of low-rank-based multi-label learning. The proposed algorithm can therefore comprehensively explore and exploit label relationships by inheriting all the merits of privileged information and low-rank constraints. We show that PrML can be efficiently solved by a dual coordinate descent algorithm using an iterative optimization strategy with cheap updates. Experiments on benchmark datasets show that privileged label features can significantly improve performance, and that PrML is superior to several competing methods in most cases.

IJCAI Conference 2017 Conference Paper

Tag Disentangled Generative Adversarial Network for Object Image Re-rendering

  • Chaoyue Wang
  • Chaohui Wang
  • Chang Xu
  • Dacheng Tao

In this paper, we propose principled Tag Disentangled Generative Adversarial Networks (TD-GAN) for re-rendering new images of an object of interest from a single image of it, by specifying multiple scene properties (such as viewpoint, illumination, and expression). The whole framework consists of a disentangling network, a generative network, a tag mapping net, and a discriminative network, which are trained jointly on a given set of images that are completely/partially tagged (i.e., the supervised/semi-supervised setting). Given an input image, the disentangling network extracts disentangled and interpretable representations, which are then used to generate images via the generative network. In order to boost the quality of the disentangled representations, the tag mapping net is integrated to explore the consistency between an image and its tags. Furthermore, the discriminative network is introduced to implement the adversarial training strategy for generating more realistic images. Experiments on two challenging datasets demonstrate the state-of-the-art performance of the proposed framework on the problem of interest.

NeurIPS Conference 2016 Conference Paper

CNNpack: Packing Convolutional Neural Networks in the Frequency Domain

  • Yunhe Wang
  • Chang Xu
  • Shan You
  • Dacheng Tao
  • Chao Xu

Deep convolutional neural networks (CNNs) are successfully used in a number of applications. However, their storage and computational requirements have largely prevented their widespread use on mobile devices. Here we present an effective CNN compression approach in the frequency domain, which focuses not only on smaller weights but on all the weights and their underlying connections. By treating convolutional filters as images, we decompose their representations in the frequency domain into common parts (i.e., cluster centers) shared by other similar filters and their individual private parts (i.e., individual residuals). A large number of low-energy frequency coefficients in both parts can be discarded to produce high compression without significantly compromising accuracy. We relax the computational burden of convolution operations in CNNs by linearly combining the convolution responses of discrete cosine transform (DCT) bases. The compression and speed-up ratios of the proposed algorithm are thoroughly analyzed and evaluated on benchmark image datasets to demonstrate its superiority over state-of-the-art methods.

IROS Conference 2015 Conference Paper

A haptic shared control algorithm for flexible human assistance to semi-autonomous robots

  • Ningbo Yu
  • Kui Wang
  • Yuan Li
  • Chang Xu
  • Jingtai Liu

Autonomous as well as teleoperated robots find wide application in various environments. Their capability to accomplish complex and dynamic operations can be significantly improved by fusing human intelligence with autonomous algorithms. In this paper, we propose a haptic shared control algorithm to provide flexible human assistance to semi-autonomous mobile robots. Through admittance and impedance models, the haptic shared controller smoothly combines human operator inputs with robot autonomy. Further, the level of autonomy is fully determined by the operator through the grasp motion. A decomposed design is adopted for the autonomous controller of the mobile robot. The algorithm was implemented on the omega.7 haptic interface together with a QBot mobile robot, and its feasibility and efficacy have been validated by experiments.

AAAI Conference 2015 Conference Paper

Large-Margin Multi-Label Causal Feature Learning

  • Chang Xu
  • Dacheng Tao
  • Chao Xu

In multi-label learning, an example is represented by descriptive features associated with several labels. Simply considering labels as independent or correlated is crude; it would be beneficial to define and exploit the causality between multiple labels. For example, the image label ‘lake’ implies the label ‘water’, but not vice versa. Since the original features are a disorderly mixture of the properties originating from different labels, it is intuitive to factorize these raw features to clearly represent each individual label and its causality relationships. Following the large-margin principle, we propose an effective approach to discover the causal features of multiple labels, thus revealing the causality between labels from the feature perspective. We show theoretically that the proposed approach is a tight approximation of the empirical multi-label classification error, and that the causality revealed strengthens the consistency of the algorithm. Extensive experiments using synthetic and real-world data demonstrate that the proposed algorithm effectively discovers label causality, generates causal features, and improves multi-label learning.

IJCAI Conference 2015 Conference Paper

Multi-view Self-Paced Learning for Clustering

  • Chang Xu
  • Dacheng Tao
  • Chao Xu

Exploiting information from multiple views can improve clustering accuracy. However, most existing multi-view clustering algorithms are nonconvex and thus prone to getting stuck in bad local minima, especially when there are outliers and missing data. To overcome this problem, we present a new multi-view self-paced learning (MSPL) algorithm for clustering that learns the multi-view model by progressing not only from ‘easy’ to ‘complex’ examples, but also from ‘easy’ to ‘complex’ views. Instead of a binary separation of examples or views into ‘easy’ and ‘complex’, we design a novel probabilistic smoothed weighting scheme. Employing multiple views for clustering and defining complexity across both examples and views are shown theoretically to be beneficial to optimal clustering. Experimental results on toy and real-world data demonstrate the efficacy of the proposed algorithm.

AAAI Conference 2014 Conference Paper

Sequential Click Prediction for Sponsored Search with Recurrent Neural Networks

  • Yuyu Zhang
  • Hanjun Dai
  • Chang Xu
  • Jun Feng
  • Taifeng Wang
  • Jiang Bian
  • Bin Wang
  • Tie-Yan Liu

Click prediction is one of the fundamental problems in sponsored search. Most existing studies take advantage of machine learning approaches to predict the ad click for each ad view event independently. However, as observed in real-world sponsored search systems, a user’s behavior on ads depends heavily on how she behaved in the past, especially in terms of what queries she submitted, what ads she clicked or ignored, and how long she spent on the landing pages of clicked ads. Inspired by these observations, we introduce a novel framework based on Recurrent Neural Networks (RNNs). Compared to traditional methods, this framework directly models the dependency on a user’s sequential behaviors in the click prediction process through the recurrent structure of the RNN. Large-scale evaluations on click-through logs from a commercial search engine demonstrate that our approach can significantly improve click prediction accuracy compared to sequence-independent approaches.

AAAI Conference 2013 Conference Paper

Vector-Valued Multi-View Semi-Supervised Learning for Multi-Label Image Classification

  • Yong Luo
  • Dacheng Tao
  • Chang Xu
  • Dongchen Li
  • Chao Xu

Images are usually associated with multiple labels and comprised of multiple views, since an image may contain several objects (e.g., a pedestrian, a bicycle, and a tree) and multiple visual features (e.g., color, texture, and shape). Currently available tools tend to use either labels or features for classification, but both are necessary to describe the image properly. There have been recent successes in using vector-valued functions, which construct matrix-valued kernels, to explore the multi-label structure in the output space. This has motivated us to develop multi-view vector-valued manifold regularization (MV3MR) in order to integrate multiple features. MV3MR exploits the complementary properties of different features, and discovers the intrinsic local geometry of the compact support shared by different features, under the theme of manifold regularization. We validate the effectiveness of the proposed MV3MR methodology for image classification by conducting extensive experiments on two challenging datasets, PASCAL VOC'07 and MIR Flickr.