Arrow Research search

Author name cluster

Ali Borji

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers

11

ICLR Conference 2021 Conference Paper

Contemplating Real-World Object Classification

  • Ali Borji

Deep object recognition models have been very successful over benchmark datasets such as ImageNet. How accurate and robust are they to distribution shifts arising from natural and synthetic variations in datasets? Prior research on this problem has primarily focused on ImageNet variations (e.g., ImageNetV2, ImageNet-A). To avoid potential inherited biases in these studies, we take a different approach. Specifically, we reanalyze the ObjectNet dataset recently proposed by Barbu et al. containing objects in daily life situations. They showed a dramatic performance drop of the state of the art object recognition models on this dataset. Due to the importance and implications of their results regarding the generalization ability of deep models, we take a second look at their analysis. We find that applying deep models to the isolated objects, rather than the entire scene as is done in the original paper, results in around 20-30% performance improvement. Relative to the numbers reported in Barbu et al., around 10-15% of the performance loss is recovered, without any test time data augmentation. Despite this gain, however, we conclude that deep models still suffer drastically on the ObjectNet dataset. We also investigate the robustness of models against synthetic image perturbations such as geometric transformations (e.g., scale, rotation, translation), natural image distortions (e.g., impulse noise, blur) as well as adversarial attacks (e.g., FGSM and PGD-5). Our results indicate that limiting the object area as much as possible (i.e., from the entire image to the bounding box to the segmentation mask) leads to consistent improvement in accuracy and robustness. Finally, through a qualitative analysis of ObjectNet data, we find that i) a large number of images in this dataset are hard to recognize even for humans, and ii) easy (hard) samples for models match with easy (hard) samples for humans. 
Overall, our analysis shows that ObjectNet is still a challenging test platform that can be used to measure the generalization ability of models. The code and data are available in [masked due to blind review].

AAAI Conference 2020 Conference Paper

A New Ensemble Adversarial Attack Powered by Long-Term Gradient Memories

  • Zhaohui Che
  • Ali Borji
  • Guangtao Zhai
  • Suiyi Ling
  • Jing Li
  • Patrick Le Callet

Deep neural networks are vulnerable to adversarial attacks. More importantly, some adversarial examples crafted against an ensemble of pre-trained source models can transfer to other new target models, thus posing a security threat to black-box applications (when the attackers have no access to the target models). Despite adopting diverse architectures and parameters, source and target models often share similar decision boundaries. Therefore, if an adversary is capable of fooling several source models concurrently, it can potentially capture intrinsic transferable adversarial information that may allow it to fool a broad class of other black-box target models. Current ensemble attacks, however, only consider a limited number of source models to craft an adversary, and obtain poor transferability. In this paper, we propose a novel black-box attack, dubbed Serial-Mini-Batch-Ensemble-Attack (SMBEA). SMBEA divides a large number of pre-trained source models into several mini-batches. For each single batch, we design 3 new ensemble strategies to improve the intra-batch transferability. Besides, we propose a new algorithm that recursively accumulates the "long-term" gradient memories of the previous batch into the following batch. This way, the learned adversarial information can be preserved and the inter-batch transferability can be improved. Experiments indicate that our method outperforms state-of-the-art ensemble attacks over multiple pixel-to-pixel vision tasks including image translation and salient region prediction. Our method successfully fools two online black-box saliency prediction systems including DeepGaze-II (Kummerer 2017) and SALICON (Huang et al. 2017). Finally, we also contribute a new repository to promote the research on adversarial attack and defense over pixel-to-pixel tasks: https://github.com/CZHQuality/AAA-Pix2pix.

ICLR Conference 2020 Conference Paper

White Noise Analysis of Neural Networks

  • Ali Borji
  • Sikun Lin

A white noise analysis of modern deep neural networks is presented to unveil their biases at the whole-network level or the single-neuron level. Our analysis is based on two popular and related methods in psychophysics and neurophysiology, namely classification images and spike-triggered analysis. These methods have been widely used to understand the underlying mechanisms of sensory systems in humans and monkeys. We leverage them to investigate the inherent biases of deep neural networks and to obtain a first-order approximation of their functionality. We focus on CNNs since they are currently the state-of-the-art methods in computer vision and a decent model of human visual processing. In addition, we study multi-layer perceptrons, logistic regression, and recurrent neural networks. Experiments over four classic datasets, MNIST, Fashion-MNIST, CIFAR-10, and ImageNet, show that the computed bias maps resemble the target classes and, when used for classification, achieve more than twice the chance-level performance. Further, we show that classification images can be used to attack a black-box classifier and to detect adversarial patch attacks. Finally, we utilize spike-triggered averaging to derive the filters of CNNs and explore how the behavior of a network changes when neurons in different layers are modulated. Our effort illustrates a successful example of borrowing from neuroscience to study ANNs and highlights the importance of cross-fertilization and synergy across machine learning, deep learning, and computational neuroscience.
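
The classification-image idea described in this abstract can be illustrated with a toy stand-in: classify many pure-noise inputs and average them per predicted class; the averages recover the classifier's internal "template". Everything below (the synthetic templates, the 64-dimensional inputs, the least-squares linear classifier) is a hypothetical substitute for the paper's deep networks, shown only as a minimal sketch of the technique.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # a flattened 8x8 "image"

# Two synthetic class templates (stand-ins for real dataset classes).
templates = rng.normal(size=(2, d))

# Training data: noisy copies of each template; fit a least-squares
# linear classifier whose sign predicts the class.
X = np.vstack([t + 0.5 * rng.normal(size=(500, d)) for t in templates])
y = np.repeat([0, 1], 500)
w = np.linalg.lstsq(X, 2.0 * y - 1.0, rcond=None)[0]  # score > 0 -> class 1

# White noise analysis: classify pure-noise inputs, then average the
# noise samples assigned to each class to obtain per-class bias maps.
noise = rng.normal(size=(20000, d))
pred = (noise @ w > 0).astype(int)
bias_maps = np.stack([noise[pred == c].mean(axis=0) for c in (0, 1)])

# For a linear classifier, the class-1 bias map should align with the
# learned discriminant direction w.
corr = np.corrcoef(bias_maps[1], w)[0, 1]
print(corr)
```

For a deep network the same recipe applies unchanged: replace `noise @ w > 0` with the network's predicted labels and average noise images per predicted class.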

TIST Journal 2018 Journal Article

A Review of Co-Saliency Detection Algorithms

  • Dingwen Zhang
  • Huazhu Fu
  • Junwei Han
  • Ali Borji
  • Xuelong Li

Co-saliency detection is a newly emerging and rapidly growing research area in the computer vision community. As a novel branch of visual saliency, co-saliency detection refers to the discovery of common and salient foregrounds from two or more relevant images, and it can be widely used in many computer vision tasks. The existing co-saliency detection algorithms mainly consist of three components: extracting effective features to represent the image regions, exploring the informative cues or factors to characterize co-saliency, and designing effective computational frameworks to formulate co-saliency. Although numerous methods have been developed, the literature is still lacking a deep review and evaluation of co-saliency detection techniques. In this article, we aim at providing a comprehensive review of the fundamentals, challenges, and applications of co-saliency detection. Specifically, we provide an overview of some related computer vision works, review the history of co-saliency detection, summarize and categorize the major algorithms in this research area, discuss some open issues in this area, present the potential applications of co-saliency detection, and finally point out some unsolved challenges and promising future works. We expect this review to be beneficial to both fresh and senior researchers in this field and to give insights to researchers in other related areas regarding the utility of co-saliency detection algorithms.

IJCAI Conference 2018 Conference Paper

Enhanced-alignment Measure for Binary Foreground Map Evaluation

  • Deng-Ping Fan
  • Cheng Gong
  • Yang Cao
  • Bo Ren
  • Ming-Ming Cheng
  • Ali Borji

The existing binary foreground map (FM) measures address various types of errors in either pixel-wise or structural ways. These measures consider pixel-level match or image-level information independently, while cognitive vision studies have shown that human vision is highly sensitive to both global information and local details in scenes. In this paper, we take a detailed look at current binary FM evaluation measures and propose a novel and effective E-measure (Enhanced-alignment measure). Our measure combines local pixel values with the image-level mean value in one term, jointly capturing image-level statistics and local pixel matching information. We demonstrate the superiority of our measure over the available measures on 4 popular datasets via 5 meta-measures, including ranking models for applications, demoting generic, random Gaussian noise maps, ground-truth switch, as well as human judgments. We find large improvements in almost all the meta-measures. For instance, in terms of application ranking, we observe improvement ranging from 9.08% to 19.65% compared with other popular measures.

NeurIPS Conference 2013 Conference Paper

Bayesian optimization explains human active search

  • Ali Borji
  • Laurent Itti

Many real-world problems have complicated objective functions. To optimize such functions, humans utilize sophisticated sequential decision-making strategies. Many optimization algorithms have also been developed for this same purpose, but how do they compare to humans in terms of both performance and behavior? We try to unravel the general underlying algorithm people may be using while searching for the maximum of an invisible 1D function. Subjects click on a blank screen and are shown the ordinate of the function at each clicked abscissa location. Their task is to find the function's maximum in as few clicks as possible. Subjects win if they get close enough to the maximum location. Analysis over 23 non-maths undergraduates, optimizing 25 functions from different families, shows that humans outperform 24 well-known optimization algorithms. Bayesian Optimization based on Gaussian Processes, which exploits all the x values tried and all the f(x) values obtained so far to pick the next x, predicts human performance and searched locations best. In 6 follow-up controlled experiments over 76 subjects, covering interpolation, extrapolation, and optimization tasks, we further confirm that Gaussian Processes provide a general and unified theoretical account to explain passive and active function learning and search in humans.
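
The GP-based search loop the abstract describes can be sketched in a few lines: condition a Gaussian Process on all (x, f(x)) pairs seen so far, then "click" wherever the posterior looks most promising. The RBF kernel, length scale, test function, and UCB acquisition rule below are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(a, b, ls=0.15):
    """Squared-exponential kernel between two 1D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def gp_posterior(xs, ys, grid, noise=1e-4):
    """GP posterior mean/std on a grid, given observations (xs, ys)."""
    K = rbf(xs, xs) + noise * np.eye(len(xs))
    Ks = rbf(grid, xs)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ ys
    var = 1.0 - np.einsum('ij,jk,ik->i', Ks, Kinv, Ks)
    return mu, np.sqrt(np.clip(var, 0.0, None))

f = lambda x: np.sin(3 * np.pi * x) * np.exp(-x)  # hidden 1D objective
grid = np.linspace(0.0, 1.0, 200)
xs = np.array([0.1, 0.9])  # two initial "clicks"
ys = f(xs)

for _ in range(12):
    mu, sd = gp_posterior(xs, ys, grid)
    # UCB acquisition: exploit high posterior mean, explore high uncertainty.
    nxt = grid[np.argmax(mu + 2.0 * sd)]
    xs = np.append(xs, nxt)
    ys = np.append(ys, f(nxt))

print(xs[np.argmax(ys)])  # best location found so far
```

The loop mirrors the human task: every evaluation is a click, the posterior summarizes all clicks so far, and the acquisition rule trades off probing uncertain regions against refining the current best guess.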

AAAI Conference 2012 Conference Paper

An Object-Based Bayesian Framework for Top-Down Visual Attention

  • Ali Borji
  • Dicky Sihite
  • Laurent Itti

We introduce a new task-independent framework to model top-down overt visual attention based on graphical models for probabilistic inference and reasoning. We describe a Dynamic Bayesian Network (DBN) that infers probability distributions over attended objects and spatial locations directly from observed data. Probabilistic inference in our model is performed over object-related functions which are fed from manual annotations of objects in video scenes or by state-of-the-art object detection models. Evaluating over ~3 hours (approx. 315,000 eye fixations and 12,600 saccades) of observers playing 3 video games (time-scheduling, driving, and flight combat), we show that our approach is significantly more predictive of eye fixations compared to: 1) simpler classifier-based models, also developed here, that map a signature of a scene (multi-modal information from gist, bottom-up saliency, physical actions, and events) to eye positions, 2) 14 state-of-the-art bottom-up saliency models, and 3) brute-force algorithms such as mean eye position. Our results show that the proposed model is more effective in employing and reasoning over spatio-temporal visual data.

ICRA Conference 2012 Conference Paper

Modeling the influence of action on spatial attention in visual interactive environments

  • Ali Borji
  • Dicky N. Sihite
  • Laurent Itti

A large number of studies have been reported on top-down influences of visual attention. However, less progress has been made in understanding and modeling its mechanisms in real-world tasks. In this paper, we propose an approach for learning spatial attention that takes into account the influence of physical actions on top-down attention. For this purpose, we focus on interactive visual environments (video games), which are modest real-world simulations where a player has to attend to certain aspects of visual stimuli and perform actions to achieve a goal. The basic idea is to learn a mapping from the current mental state of the game player, represented by past actions and observations, to the player's gaze fixation. A data-driven approach is followed where we train a model on the data of some players and test it on a new subject. In particular, the two contributions of this paper are: 1) employing multi-modal information including mean eye position, gist of a scene, physical actions, bottom-up saliency, and tagged events for state representation, and 2) analysis of different methods of combining bottom-up and top-down influences. Compared with other top-down task-driven and bottom-up spatio-temporal models, our approach shows higher NSS scores in predicting eye positions.

ICRA Conference 2011 Conference Paper

Scene classification with a sparse set of salient regions

  • Ali Borji
  • Laurent Itti

This work proposes an approach for scene classification by extracting and matching visual features only at the focuses of visual attention instead of the entire scene. Analysis over a database of natural scenes demonstrates that regions proposed by the saliency-based model of visual attention are robust to image transformations. Using a nearest neighbor classifier and a distance measure defined over the salient regions, we obtained 97.35% and 78.28% classification rates with SIFT and C2 features from the HMAX model at 5 salient regions covering at most 31% of the image. Classification with features extracted from the entire image results in 99.3% and 82.32% using SIFT and C2 features, respectively. Comparing the attentional and ad hoc approaches shows that the classification rate of the former is 0.95 of the latter. Overall, our results show that efficient scene classification, in terms of reduced feature-extraction complexity, is possible without a significant drop in performance.

IROS Conference 2010 Conference Paper

Simultaneous learning of spatial visual attention and physical actions

  • Ali Borji
  • Majid Nili Ahmadabadi
  • Babak Nadjar Araabi

This paper introduces a new method for learning top-down and task-driven visual attention control along with physical actions in interactive environments. Our method is based on the Reinforcement Learning of Visual Classes (RLVC) algorithm and adapts it for learning spatial visual selection in order to reduce computational complexity. The proposed algorithm also addresses aliasing caused by not knowing previous actions and perceptions. Continued learning shows that our method is robust to perturbations in perceptual information. Our method also allows object recognition when class labels are used instead of physical actions. We have tried to gain maximum generalization while performing local processing. Experiments over visual navigation and object recognition tasks show that our method is more efficient in terms of computational complexity and is biologically more plausible.

ICRA Conference 2009 Conference Paper

Learning sequential visual attention control through dynamic state space discretization

  • Ali Borji
  • Majid Nili Ahmadabadi
  • Babak Nadjar Araabi

Similar to humans and primates, artificial creatures like robots are limited in how they can allocate their resources to huge amounts of sensory and perceptual information. The serial processing mechanisms used in the design of such creatures demand engineered attentional control mechanisms. In this paper, we present a new algorithm for learning top-down sequential visual attention control for agents acting in interactive environments. Our method is based on the key idea that attention can be learned best in concert with visual representations, through automatic construction and discretization of the visual state space. The tree representing the top-down attention is incrementally refined whenever aliasing occurs, by selecting the most appropriate saccadic direction. The proposed approach is evaluated on action-based object recognition and urban navigation tasks, where the obtained results support the applicability and usefulness of the developed saccade movement method for robotics.