Arrow Research search

Author name cluster

Nick Barnes

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers (19)

TIST Journal 2025 Journal Article

A Comprehensive Overview of Large Language Models

  • Humza Naveed
  • Asad Ullah Khan
  • Shi Qiu
  • Muhammad Saqib
  • Saeed Anwar
  • Muhammad Usman
  • Naveed Akhtar
  • Nick Barnes

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks and beyond. This success of LLMs has led to a large influx of research contributions in this direction. These works encompass diverse topics such as architectural innovations, better training strategies, context length improvements, fine-tuning, multimodal LLMs, robotics, datasets, benchmarking, efficiency, and more. With the rapid development of techniques and regular breakthroughs in LLM research, it has become considerably challenging to perceive the bigger picture of the advances in this direction. Considering the rapidly emerging plethora of literature on LLMs, it is imperative that the research community is able to benefit from a concise yet comprehensive overview of the recent developments in this field. This article provides an overview of the literature on a broad range of LLM-related concepts. Our self-contained comprehensive overview of LLMs discusses relevant background concepts along with covering the advanced topics at the frontier of research in LLMs. This review article is intended to provide not only a systematic survey but also a quick, comprehensive reference for the researchers and practitioners to draw insights from extensive, informative summaries of the existing works to advance the LLM research.

NeurIPS Conference 2024 Conference Paper

LAM3D: Large Image-Point Clouds Alignment Model for 3D Reconstruction from Single Image

  • Ruikai Cui
  • Xibin Song
  • Weixuan Sun
  • Senbo Wang
  • Weizhe Liu
  • Shenzhou Chen
  • Taizhang Shang
  • Yang Li
  • Nick Barnes

Large Reconstruction Models have made significant strides in the realm of automated 3D content generation from single or multiple input images. Despite their success, these models often produce 3D meshes with geometric inaccuracies, stemming from the inherent challenges of deducing 3D shapes solely from image data. In this work, we introduce a novel framework, the Large Image and Point Cloud Alignment Model (LAM3D), which utilizes 3D point cloud data to enhance the fidelity of generated 3D meshes. Our methodology begins with the development of a point-cloud-based network that effectively generates precise and meaningful latent tri-planes, laying the groundwork for accurate 3D mesh reconstruction. Building upon this, our Image-Point-Cloud Feature Alignment technique processes a single input image, aligning its features to the latent tri-planes to imbue them with robust 3D information. This process not only enriches the image features but also facilitates the production of high-fidelity 3D meshes without the need for multi-view input, significantly reducing geometric distortions. Our approach achieves state-of-the-art high-fidelity 3D mesh reconstruction from a single image in just 6 seconds, and experiments on various datasets demonstrate its effectiveness.
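
The alignment step lends itself to a worked illustration. Below is a minimal sketch of the idea, assuming hypothetical image_encoder and pc_encoder modules that each emit latent tri-planes; the plain MSE objective is an assumption for illustration, not the paper's exact loss.

```python
# Hedged sketch: pull the image encoder's tri-plane latent toward the
# point-cloud encoder's tri-plane latent, so single-image inference
# inherits 3D-informed features. Encoder names and the MSE objective
# are assumptions, not the authors' implementation.
import torch
import torch.nn.functional as F

def alignment_loss(image_encoder, pc_encoder, image, point_cloud):
    triplane_img = image_encoder(image)        # predicted from one image
    with torch.no_grad():
        triplane_pc = pc_encoder(point_cloud)  # 'teacher' latent tri-planes
    return F.mse_loss(triplane_img, triplane_pc)
```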

AAAI Conference 2022 Conference Paper

Energy-Based Generative Cooperative Saliency Prediction

  • Jing Zhang
  • Jianwen Xie
  • Zilong Zheng
  • Nick Barnes

Conventional saliency prediction models typically learn a deterministic mapping from an image to its saliency map, and thus fail to explain the subjective nature of human attention. In this paper, to model the uncertainty of visual saliency, we study the saliency prediction problem from the perspective of generative models by learning a conditional probability distribution over the saliency map given an input image, and treating the saliency prediction as a sampling process from the learned distribution. Specifically, we propose a generative cooperative saliency prediction framework, where a conditional latent variable model (LVM) and a conditional energy-based model (EBM) are jointly trained to predict salient objects in a cooperative manner. The LVM serves as a fast but coarse predictor to efficiently produce an initial saliency map, which is then refined by the iterative Langevin revision of the EBM that serves as a slow but fine predictor. Such a coarse-to-fine cooperative saliency prediction strategy offers the best of both worlds. Moreover, we propose a “cooperative learning while recovering” strategy and apply it to weakly supervised saliency prediction, where saliency annotations of training images are partially observed. Lastly, we find that the learned energy function in the EBM can serve as a refinement module that can refine the results of other pre-trained saliency prediction models. Experimental results show that our model can produce a set of diverse and plausible saliency maps of an image, and obtain state-of-the-art performance in both fully supervised and weakly supervised saliency prediction tasks.
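
The coarse-to-fine sampling step can be sketched compactly. The following is a hedged illustration assuming hypothetical lvm and ebm PyTorch modules; the number of Langevin steps and the step size are placeholders, not the authors' settings.

```python
# Sketch of the cooperative prediction step: an LVM proposes an initial
# saliency map, then K steps of Langevin dynamics refine it under a
# conditional EBM (low energy = plausible saliency).
import torch

def cooperative_predict(lvm, ebm, image, k=10, step=0.01):
    s = lvm(image)                      # fast, coarse initial saliency map
    for _ in range(k):
        s = s.detach().requires_grad_(True)
        energy = ebm(image, s).sum()
        grad, = torch.autograd.grad(energy, s)
        noise = torch.randn_like(s)
        # Langevin update: gradient descent on energy plus Gaussian noise
        s = s - 0.5 * step ** 2 * grad + step * noise
    return s.detach()
```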

AAAI Conference 2022 Conference Paper

Transmission-Guided Bayesian Generative Model for Smoke Segmentation

  • Siyuan Yan
  • Jing Zhang
  • Nick Barnes

Smoke segmentation is essential to precisely localize wildfire so that it can be extinguished in an early phase. Although deep neural networks have achieved promising results on image segmentation tasks, they are prone to being overconfident for smoke segmentation due to its non-rigid shape and transparent appearance. This is caused by both knowledge-level uncertainty due to limited training data for accurate smoke segmentation and labeling-level uncertainty representing the difficulty in labeling ground-truth. To effectively model the two types of uncertainty, we introduce a Bayesian generative model to simultaneously estimate the posterior distribution of model parameters and its predictions. Further, smoke images suffer from low contrast and ambiguity; inspired by physics-based image dehazing methods, we design a transmission-guided local coherence loss to guide the network to learn pair-wise relationships based on pixel distance and the transmission feature. To promote the development of this field, we also contribute a high-quality smoke segmentation dataset, SMOKE5K, consisting of 1,400 real and 4,000 synthetic images with pixel-wise annotation. Experimental results on benchmark testing datasets illustrate that our model achieves both accurate predictions and reliable uncertainty maps representing model ignorance about its prediction. Our code and dataset are publicly available at: https://github.com/redlessme/Transmission-BVM.
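
The transmission-guided coherence idea can be illustrated with a toy loss. Everything below, including the neighbour set and the Gaussian affinity kernel, is an assumption made for illustration; the paper's exact formulation may differ.

```python
# Toy local coherence term: nearby pixels with similar transmission
# values are encouraged to receive similar smoke predictions.
import torch

def local_coherence_loss(pred, transmission, sigma_t=0.1):
    # Compare each pixel with its right and bottom neighbours only, a
    # cheap stand-in for a full pair-wise neighbourhood.
    loss = 0.0
    for dy, dx in [(0, 1), (1, 0)]:
        p_shift = torch.roll(pred, shifts=(dy, dx), dims=(-2, -1))
        t_shift = torch.roll(transmission, shifts=(dy, dx), dims=(-2, -1))
        # affinity is high when transmission values agree
        w = torch.exp(-(transmission - t_shift) ** 2 / (2 * sigma_t ** 2))
        loss = loss + (w * (pred - p_shift) ** 2).mean()
    return loss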

ICLR Conference 2021 Conference Paper

Conditional Generative Modeling via Learning the Latent Space

  • Sameera Ramasinghe
  • Kanchana Ranasinghe
  • Salman H. Khan 0001
  • Nick Barnes
  • Stephen Gould

Although deep learning has achieved appealing results on several machine learning tasks, most of the models are deterministic at inference, limiting their application to single-modal settings. We propose a novel general-purpose framework for conditional generation in multimodal spaces, that uses latent variables to model generalizable learning patterns while minimizing a family of regression cost functions. At inference, the latent variables are optimized to find solutions corresponding to multiple output modes. Compared to existing generative solutions, our approach demonstrates faster and more stable convergence, and can learn better representations for downstream tasks. Importantly, it provides a simple generic model that can perform better than highly engineered pipelines tailored using domain expertise on a variety of tasks, while generating diverse outputs. Code available at https://github.com/samgregoost/cGML.
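
The inference-time latent search can be sketched as follows, assuming a trained network g(x, z) and some regression cost cost_fn from the family the paper mentions; names, the latent width, and optimizer settings are all illustrative.

```python
# Sketch: at inference, optimize the latent z from several random
# starts; distinct optima correspond to distinct output modes.
import torch

def infer_modes(g, cost_fn, x, n_modes=5, steps=100, lr=0.1):
    outputs = []
    for _ in range(n_modes):
        z = torch.randn(1, 64, requires_grad=True)  # latent width is illustrative
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = cost_fn(x, g(x, z))   # one of the regression costs
            loss.backward()
            opt.step()
        outputs.append(g(x, z).detach())
    return outputs   # each entry approximates a different output mode
```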

NeurIPS Conference 2021 Conference Paper

Learning Generative Vision Transformer with Energy-Based Latent Space for Saliency Prediction

  • Jing Zhang
  • Jianwen Xie
  • Nick Barnes
  • Ping Li

Vision transformer networks have shown superiority in many computer vision tasks. In this paper, we take a step further by proposing a novel generative vision transformer with latent variables following an informative energy-based prior for salient object detection. Both the vision transformer network and the energy-based prior model are jointly trained via Markov chain Monte Carlo-based maximum likelihood estimation, in which sampling from the intractable posterior and prior distributions of the latent variables is performed by Langevin dynamics. Further, with the generative vision transformer, we can easily obtain a pixel-wise uncertainty map from an image, which indicates the model's confidence in predicting saliency from the image. Different from existing generative models, which define the prior distribution of the latent variables as a simple isotropic Gaussian distribution, our model uses an energy-based informative prior that can be more expressive in capturing the latent space of the data. We apply the proposed framework to both RGB and RGB-D salient object detection tasks. Extensive experimental results show that our framework can achieve not only accurate saliency predictions but also meaningful uncertainty maps that are consistent with human perception.
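
The uncertainty map described above admits a very small sketch: sample several latents, decode a prediction for each, and read per-pixel variance as (un)certainty. transformer and sample_latent are placeholders, not the authors' API.

```python
# Sketch of a pixel-wise uncertainty map from a latent-variable model.
import torch

def uncertainty_map(transformer, sample_latent, image, n=20):
    preds = torch.stack(
        [transformer(image, sample_latent(image)) for _ in range(n)]
    )
    mean = preds.mean(dim=0)   # consensus saliency prediction
    var = preds.var(dim=0)     # high variance = low model confidence
    return mean, var
```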

NeurIPS Conference 2021 Conference Paper

Rethinking conditional GAN training: An approach using geometrically structured latent manifolds

  • Sameera Ramasinghe
  • Moshiur Farazi
  • Salman H Khan
  • Nick Barnes
  • Stephen Gould

Conditional GANs (cGAN), in their rudimentary form, suffer from critical drawbacks such as the lack of diversity in generated outputs and distortion between the latent and output manifolds. Although efforts have been made to improve results, they can suffer from unpleasant side-effects such as the topology mismatch between latent and output spaces. In contrast, we tackle this problem from a geometrical perspective and propose a novel training mechanism that increases both the diversity and the visual quality of a vanilla cGAN, by systematically encouraging a bi-Lipschitz mapping between the latent and the output manifolds. We validate the efficacy of our solution on a baseline cGAN (i.e., Pix2Pix) which lacks diversity, and show that by only modifying its training mechanism (i.e., with our proposed Pix2Pix-Geo), one can achieve more diverse and realistic outputs on a broad set of image-to-image translation tasks.
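
The bi-Lipschitz idea can be made concrete with a toy regulariser that keeps the ratio of output distance to latent distance within a band; this is a hedged illustration of the geometric goal, not the paper's training mechanism.

```python
# Toy bi-Lipschitz penalty: for pairs of latent codes, keep the ratio
# of output distance to latent distance inside the band [1/k, k].
import torch

def bilipschitz_penalty(gen, x, z1, z2, k=5.0):
    d_latent = (z1 - z2).flatten(1).norm(dim=1)
    d_output = (gen(x, z1) - gen(x, z2)).flatten(1).norm(dim=1)
    ratio = d_output / (d_latent + 1e-8)
    # penalise ratios that escape the band [1/k, k]
    return (torch.relu(ratio - k) + torch.relu(1.0 / k - ratio)).mean()
```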

AAAI Conference 2020 Conference Paper

Improved Visual-Semantic Alignment for Zero-Shot Object Detection

  • Shafin Rahman
  • Salman Khan
  • Nick Barnes

Zero-shot object detection is an emerging research topic that aims to recognize and localize previously ‘unseen’ objects. This setting gives rise to several unique challenges, e.g., a highly imbalanced positive vs. negative instance ratio, proper alignment between visual and semantic concepts, and the ambiguity between background and unseen classes. Here, we propose an end-to-end deep learning framework underpinned by a novel loss function that handles class-imbalance and seeks to properly align the visual and semantic cues for improved zero-shot learning. We call our objective the ‘Polarity loss’ because it explicitly maximizes the gap between positive and negative predictions. Such a margin-maximizing formulation is not only important for visual-semantic alignment but also resolves the ambiguity between background and unseen objects. Further, the semantic representations of objects are noisy, thus complicating the alignment between visual and semantic domains. To this end, we perform metric learning using a ‘Semantic vocabulary’ of related concepts that refines the noisy semantic embeddings and establishes a better synergy between visual and semantic domains. Our approach is inspired by the embodiment theories in cognitive science, which claim that human semantic understanding is grounded in past experiences (seen objects), related linguistic concepts (word vocabulary) and visual perception (seen/unseen object images). Our extensive results on MS-COCO and Pascal VOC datasets show significant improvements over the state of the art.
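
A margin-maximising objective in this spirit can be sketched as follows; the exact 'Polarity loss' differs in detail, so read this only as an illustration of the gap-maximising idea.

```python
# Sketch: standard classification loss plus a term that pushes
# positive-class scores above negative-class scores by a margin.
import torch
import torch.nn.functional as F

def polarity_style_loss(scores, labels, margin=0.5):
    # scores: (N, C) predicted logits; labels: (N, C) 0/1 targets
    bce = F.binary_cross_entropy_with_logits(scores, labels.float())
    pos = scores[labels.bool()]    # scores for true classes
    neg = scores[~labels.bool()]   # scores for all other classes
    gap = torch.relu(margin - (pos.mean() - neg.mean()))
    return bce + gap
```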

IROS Conference 2020 Conference Paper

Spectral-GANs for High-Resolution 3D Point-cloud Generation

  • Sameera Ramasinghe
  • Salman H. Khan 0001
  • Nick Barnes
  • Stephen Gould

Point-clouds are a popular choice for robotics and computer vision tasks due to their accurate shape description and direct acquisition from range-scanners. This demands the ability to synthesize and reconstruct high-quality point-clouds. Current deep generative models for 3D data generally work on simplified representations (e.g., voxelized objects) and cannot deal with the inherent redundancy and irregularity in point-clouds. A few recent efforts on 3D point-cloud generation offer limited resolution and their complexity grows with the increase in output resolution. In this paper, we develop a principled approach to synthesize 3D point-clouds using a spectral-domain Generative Adversarial Network (GAN). Our spectral representation is highly structured and allows us to disentangle various frequency bands such that the learning task is simplified for a GAN model. As compared to spatial-domain generative approaches, our formulation allows us to generate high-resolution point-clouds with minimal computational overhead. Furthermore, we propose a fully differentiable block to transform from the spectral to the spatial domain and back, thereby allowing us to integrate knowledge from well-established spatial models. We demonstrate that Spectral-GAN performs well on the point-cloud generation task. Additionally, it can learn a highly discriminative representation in an unsupervised fashion and can be used to accurately reconstruct 3D objects. Our code is available at https://github.com/samgregoost/Spectral-GAN/.
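
As a rough illustration of generating in the spectral domain, the sketch below maps per-coordinate frequency bands to spatial point positions with a differentiable inverse FFT; the paper's structured spherical representation is more sophisticated than this 1D simplification.

```python
# Sketch: a generator emits low- and high-frequency coefficient bands
# per coordinate; a differentiable inverse FFT recovers point positions.
import torch

def spectral_to_points(low_band, high_band, n_points=2048):
    # low_band: (B, 3, L), high_band: (B, 3, H) complex coefficients
    spectrum = torch.cat([low_band, high_band], dim=-1)
    points = torch.fft.irfft(spectrum, n=n_points, dim=-1)  # (B, 3, n_points)
    return points.transpose(1, 2)  # (B, n_points, 3)
```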

ICRA Conference 2007 Conference Paper

Real Time Biologically-Inspired Depth Maps from Spherical Flow

  • Chris McCarthy
  • Nick Barnes
  • Mandyam V. Srinivasan

We present a strategy for generating real-time relative depth maps of an environment from optical flow, under general motion. We achieve this using an insect-inspired hemispherical fish-eye sensor with a 190-degree FOV, and a de-rotated optical flow field. The de-rotation algorithm applied is based on the theoretical work of Nelson and Aloimonos (1988). From this we obtain the translational component of motion, and construct full relative depth maps on the sphere. We examine the robustness of this strategy in both simulation and real-world experiments, for a variety of environmental scenarios. To our knowledge, this is the first demonstrated implementation of the Nelson and Aloimonos algorithm working in real time over real image sequences. In addition, we apply this algorithm to the real-time recovery of full relative depth maps. These preliminary results demonstrate the feasibility of this approach for closed-loop control of a robot.
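
The final depth-recovery step has a compact form: after de-rotation, flow magnitude at each viewing direction is inversely proportional to depth, scaled by the sine of the angle to the translation direction. A hedged sketch, with all quantities assumed given and depths in relative (not metric) units:

```python
# Sketch: relative depth on the sphere from de-rotated translational flow.
import numpy as np

def relative_depth(flow_mag, view_dirs, trans_dir, eps=1e-6):
    # view_dirs: (N, 3) unit viewing directions on the sphere
    # trans_dir: (3,) unit translation direction from de-rotation
    cos_theta = view_dirs @ trans_dir
    sin_theta = np.sqrt(np.clip(1.0 - cos_theta ** 2, 0.0, 1.0))
    return sin_theta / (flow_mag + eps)   # larger value = farther surface
```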

IROS Conference 2006 Conference Paper

A Robust Docking Strategy for a Mobile Robot using Flow Field Divergence

  • Chris McCarthy
  • Nick Barnes

We present a robust strategy for docking a mobile robot in close proximity with an upright surface using optical flow field divergence. Unlike previous approaches, we achieve this without the need for explicit segmentation of the surface in the image, and using complete optical flow estimation (i.e., no affine models) in the control loop. A simple proportional control law is used to regulate the vehicle's velocity, using only the raw, unfiltered flow divergence as input. Central to the robustness of our approach is the derivation of a time-to-contact estimator that accounts for small rotations of the robot during ego-motion. We present both analytical and experimental results showing that through tracking of the focus of expansion to a looming surface, we may compensate for such rotations, thereby significantly improving the robustness of the time-to-contact estimate. This is demonstrated using an off-board natural image sequence, and in closed-loop control of a mobile robot.
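
The core divergence-to-control chain can be sketched numerically. Under pure translation toward a surface, time-to-contact can be estimated from flow-field divergence (roughly tau ≈ 2/div near the focus of expansion); gains, filtering choices, and the paper's rotation compensation are omitted here, so treat this as an assumption-laden sketch.

```python
# Sketch: time-to-contact from flow divergence, feeding a simple
# proportional velocity command that slows the robot as contact nears.
import numpy as np

def flow_divergence(u, v, spacing=1.0):
    # u, v: 2D arrays of horizontal/vertical flow components
    du_dx = np.gradient(u, spacing, axis=1)
    dv_dy = np.gradient(v, spacing, axis=0)
    return du_dx + dv_dy

def docking_velocity(u, v, k=0.5):
    div = flow_divergence(u, v).mean()   # raw, unfiltered divergence
    tau = 2.0 / max(div, 1e-6)           # time-to-contact estimate
    return k * tau                       # slow down as tau shrinks
```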

ICRA Conference 2005 Conference Paper

A Sign Reading Driver Assistance System Using Eye Gaze

  • Luke Fletcher
  • Lars Petersson
  • Nick Barnes
  • David J. Austin
  • Alexander Zelinsky

Cars are becoming, in effect, a robotic system with an embedded human. It is not possible to know what the driver is thinking. We can, however, monitor their gaze and compare it with information in their view-field to make an inference. In this paper we present a complete system that reads speed signs in real-time, compares the driver gaze, and provides immediate feedback if the sign has been missed by the driver. This paper focuses on correlating measures of driver gaze direction with the position of signs in the road scene and improving recognition of signs through image enhancement.
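
A toy version of the gaze-sign correlation check might look like the following; the angular threshold and vector representation are assumptions for illustration, not the system's actual decision rule.

```python
# Sketch: flag a sign as missed if the angle between the driver's gaze
# ray and the direction to the sign never drops below a threshold over
# the sign's visibility window.
import numpy as np

def sign_missed(gaze_dirs, sign_dirs, thresh_deg=5.0):
    # gaze_dirs, sign_dirs: (T, 3) unit vectors per video frame
    cos = np.sum(gaze_dirs * sign_dirs, axis=1).clip(-1.0, 1.0)
    angles = np.degrees(np.arccos(cos))
    return bool(np.all(angles > thresh_deg))   # never fixated -> missed
```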

ICRA Conference 2005 Conference Paper

Improved Signal To Noise Ratio And Computational Speed For Gradient-Based Detection Algorithms

  • Nick Barnes

Image gradient-based feature detectors offer great advantages over their standard edge-only equivalents. In driver support systems research, the radial symmetry detection algorithm has given real-time results for speed sign recognition. The regular polygon detector is a scan-line algorithm for these features, facilitating recognition of other road signs such as stop and give-way signs. Radial symmetry has also been applied to real-time face detection, and the polygon detector is showing promising results as a feature detector for SLAM. However, gradient-based feature detection is more sensitive to noise than standard edge-based algorithms. As the total gradient magnitude at a pixel decreases, the component of the gradient at that point that arises from image noise increases. When a pixel votes in its gradient direction out to an extended radius, its position is more likely to be inaccurate if the gradient magnitude is low. In this paper, we analyse the performance of the radial symmetry and regular polygon detector algorithms under changes to the threshold on gradient magnitude. We show that the number of pixels correctly voting on a circle is not greatly reduced by thresholds that decrease the total number of pixels that vote in the image to 20%. This greatly reduces the noise component in the image, with only slight impact on the signal. This improves the performance, particularly for the regular polygon detector, where the voting mechanism is complex and constitutes a large amount of the processing per pixel. This facilitates a real-time implementation, which is presented here.
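
The thresholding analysed here is easy to sketch: only pixels whose gradient magnitude clears the threshold cast a vote along their gradient direction. The radius handling of the full radial symmetry detector is simplified away, so this is an illustration of the voting-plus-threshold idea only.

```python
# Sketch: gradient-direction voting with a magnitude threshold; peaks
# in the vote map suggest circle centres at the given radius.
import numpy as np

def vote_map(gx, gy, radius, thresh):
    h, w = gx.shape
    votes = np.zeros((h, w))
    mag = np.hypot(gx, gy)
    ys, xs = np.nonzero(mag > thresh)   # threshold on gradient magnitude
    for y, x in zip(ys, xs):
        # cast a vote 'radius' pixels along the gradient direction
        vy = int(round(y + radius * gy[y, x] / mag[y, x]))
        vx = int(round(x + radius * gx[y, x] / mag[y, x]))
        if 0 <= vy < h and 0 <= vx < w:
            votes[vy, vx] += 1
    return votes
```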

ICRA Conference 2004 Conference Paper

An Interactive Driver Assistance System Monitoring the Scene in and out of the Vehicle

  • Lars Petersson
  • Luke Fletcher
  • Nick Barnes
  • Alexander Zelinsky

This paper presents a framework for interactive driver assistance systems including techniques for fast speed sign detection and classification, car detection and tracking, and lane departure warning. In addition, the driver's actions are monitored. The integrated system uses information extracted from the road scene (speed signs, position within the lane, relative position to other cars, etc.) together with information about the driver's state such as eye gaze and head pose, to issue adequate warnings. A touch screen monitor presents relevant information and allows the driver to interact with the system. The research is focused around robust on-line algorithms. Initial results of online speed sign detection and car tracking are presented in the context of a driver assistance system.

IROS Conference 2004 Conference Paper

Fast shape-based road sign detection for a driver assistance system

  • Gareth Blake Loy
  • Nick Barnes

A new method is presented for detecting triangular, square and octagonal road signs efficiently and robustly. The method uses the symmetric nature of these shapes, together with the pattern of edge orientations exhibited by equiangular polygons with a known number of sides, to establish possible shape centroid locations in the image. This approach is invariant to in-plane rotation and returns the location and size of the shape detected. Results on still images show a detection rate of over 95%. The method is efficient enough for real-time applications, such as on-board-vehicle sign detection.
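
One way to see the shape cue: the edge orientations of an equiangular N-sided polygon cluster at multiples of 2π/N regardless of in-plane rotation. The histogram test below is a simplified stand-in for the full detector, not the paper's centroid-voting algorithm.

```python
# Sketch: score how consistent a set of edge orientations is with an
# equiangular N-gon, invariant to in-plane rotation.
import numpy as np

def polygon_orientation_score(angles, n_sides):
    # angles: edge-gradient orientations in radians
    spacing = 2 * np.pi / n_sides
    phase = np.mod(angles, spacing)          # rotation-invariant residual
    hist, _ = np.histogram(phase, bins=18, range=(0, spacing))
    return hist.max() / max(len(angles), 1)  # peaked = consistent N-gon
```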

ICRA Conference 2004 Conference Paper

Fast Sum of Absolute Differences Visual Landmark Detector

  • Craig Watman
  • David J. Austin
  • Nick Barnes
  • Gary Overett
  • Simon Thompson 0002

This paper presents various optimisations that can be applied to the sum of absolute differences (SAD) correlation algorithm for automated landmark detection. This has applications in mobile robotic navigation and mapping. We show how some assumptions about the environment and the generic form of strong landmarks selected by the SAD correlation algorithm have led to the development of an algorithm enabling near real-time selection of strong landmarks from visual information. The landmarks that have been selected from a series of frames using our optimisations are shown to be stable through the image sequence, demonstrating the scale invariance of the landmarks that are selected by the SAD correlation algorithm.
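
For reference, the core operation these optimisations accelerate is plain SAD template matching, where lower sums of absolute differences mean better matches; this baseline sketch omits the paper's speed-ups entirely.

```python
# Baseline SAD template match over a grayscale image.
import numpy as np

def sad_match(image, template):
    image = image.astype(np.int64)       # avoid uint8 wrap-around
    template = template.astype(np.int64)
    ih, iw = image.shape
    th, tw = template.shape
    best, best_pos = np.inf, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            sad = np.abs(image[y:y + th, x:x + tw] - template).sum()
            if sad < best:
                best, best_pos = sad, (y, x)
    return best_pos, best
```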

ICRA Conference 2004 Conference Paper

Performance of Optical Flow Techniques for Indoor Navigation with a Mobile Robot

  • Chris McCarthy
  • Nick Barnes

We present a comparison of four optical flow methods and three spatio-temporal filters for mobile robot navigation in corridor-like environments. Previous comparisons of optical flow methods have evaluated performance only in terms of accuracy and/or efficiency, and typically in isolation. These comparisons are inadequate for addressing applicability to continuous, real-time operation as part of a robot control loop. We emphasise the need for comparisons that consider the context of a system, and that are confirmed by in-system results. To this end, we give results for on- and off-board trials of two biologically inspired behaviours: corridor centring and visual odometry. Our results show the best in-system performances are achieved using Lucas and Kanade's gradient-based method in combination with a recursive temporal filter. Results for traditionally used Gaussian filters indicate that long latencies significantly impede performance for real-time tasks in the control loop.
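
The winning recursive temporal filter can be sketched as an exponentially weighted running average: it needs only the previous smoothed frame, so latency stays near one frame, unlike a wide Gaussian temporal window. The smoothing factor below is illustrative, not the paper's tuned value.

```python
# Sketch: recursive (exponential) temporal filter over image frames,
# suitable for feeding a gradient-based optical flow method.
import numpy as np

class RecursiveTemporalFilter:
    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.state = None

    def __call__(self, frame):
        frame = frame.astype(np.float64)
        if self.state is None:
            self.state = frame
        else:
            self.state = self.alpha * frame + (1 - self.alpha) * self.state
        return self.state   # smoothed frame with one-frame latency
```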

IROS Conference 2003 Conference Paper

Particle attraction localisation

  • Damien George
  • Nick Barnes

In this paper, we present an original method for Bayesian localisation based on particle approximation. Our method overcomes a majority of the problems inherent in previous Kalman filter and Bayesian approaches, including the recent Monte Carlo localisation methods. The algorithm converges quickly to any desired precision. It does not over-converge in the case of highly accurate sensor data and thus does not require a mixture-based approach. Also, the algorithm recovers well from random repositioning. These benefits do not come at a computational cost: the algorithm can run in real time on low-powered processors. Further, the algorithm is intuitive and easy to implement. This algorithm is evaluated in simulation and has been applied to our entrant in the Sony Four-Legged League of RoboCup, where it has been tested over many hours of international competition.
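
For background, a standard particle-based localisation update looks like the sketch below; note the paper's 'attraction' scheme moves particles toward high-likelihood regions rather than importance-resampling, so this is context, not the proposed algorithm.

```python
# Standard particle-filter localisation step: motion update with noise,
# sensor-likelihood weighting, then importance resampling.
import numpy as np

def particle_update(particles, motion, sense_likelihood, noise=0.05):
    # particles: (N, 3) poses (x, y, heading); motion: (3,) odometry delta
    particles = particles + motion + noise * np.random.randn(*particles.shape)
    w = np.array([sense_likelihood(p) for p in particles])
    w = w / w.sum()
    idx = np.random.choice(len(particles), size=len(particles), p=w)
    return particles[idx]   # resampled posterior approximation
```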

IROS Conference 2003 Conference Paper

Towards an efficient optimal trajectory planner for multiple mobile robots

  • Jason Thomas
  • Alan D. Blair
  • Nick Barnes

In this paper, we present a real-time algorithm that plans mostly optimal trajectories for multiple mobile robots in a dynamic environment. This approach combines the use of a Delaunay triangulation to discretise the environment, a novel efficient use of the A* search method, and a novel cubic spline representation for a robot trajectory that meets the kinematic and dynamic constraints of the robot. We show that for complex environments the shortest-distance path is not always the shortest-time path due to these constraints. The algorithm has been implemented on real robots, and we present experimental results in cluttered environments.
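
The discretise-and-search stage can be sketched with SciPy: build a Delaunay triangulation, treat adjacent triangles as graph neighbours, and run A* between triangle centroids. The cubic-spline smoothing and kinodynamic constraints from the paper are omitted, and start/goal are given as triangle indices for simplicity.

```python
# Sketch: A* over the adjacency graph of a Delaunay triangulation,
# using centroid-to-centroid distance as the cost and heuristic.
import heapq
import numpy as np
from scipy.spatial import Delaunay

def triangle_path(points, start_idx, goal_idx):
    tri = Delaunay(points)
    centers = points[tri.simplices].mean(axis=1)   # triangle centroids
    h = lambda i: np.linalg.norm(centers[i] - centers[goal_idx])
    frontier = [(h(start_idx), start_idx, [start_idx])]
    seen = set()
    while frontier:
        f, cur, path = heapq.heappop(frontier)
        if cur == goal_idx:
            return path                     # triangle indices along route
        if cur in seen:
            continue
        seen.add(cur)
        for nb in tri.neighbors[cur]:       # -1 marks a hull edge
            if nb != -1 and nb not in seen:
                g = f - h(cur) + np.linalg.norm(centers[nb] - centers[cur])
                heapq.heappush(frontier, (g + h(nb), nb, path + [nb]))
    return None
```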