Arrow Research search

Author name cluster

Naveed Akhtar

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers


TIST Journal 2025 Journal Article

A Comprehensive Overview of Large Language Models

  • Humza Naveed
  • Asad Ullah Khan
  • Shi Qiu
  • Muhammad Saqib
  • Saeed Anwar
  • Muhammad Usman
  • Naveed Akhtar
  • Nick Barnes

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks and beyond. This success of LLMs has led to a large influx of research contributions in this direction. These works encompass diverse topics such as architectural innovations, better training strategies, context length improvements, fine-tuning, multimodal LLMs, robotics, datasets, benchmarking, efficiency, and more. With the rapid development of techniques and regular breakthroughs in LLM research, it has become considerably challenging to perceive the bigger picture of the advances in this direction. Considering the rapidly emerging plethora of literature on LLMs, it is imperative that the research community is able to benefit from a concise yet comprehensive overview of the recent developments in this field. This article provides an overview of the literature on a broad range of LLM-related concepts. Our self-contained comprehensive overview of LLMs discusses relevant background concepts along with covering the advanced topics at the frontier of research in LLMs. This review article is intended to provide not only a systematic survey but also a quick, comprehensive reference for the researchers and practitioners to draw insights from extensive, informative summaries of the existing works to advance the LLM research.

NeurIPS Conference 2025 Conference Paper

CymbaDiff: Structured Spatial Diffusion for Sketch-based 3D Semantic Urban Scene Generation

  • Li Liang
  • Bo Miao
  • Xinyu Wang
  • Naveed Akhtar
  • Jordan Vice
  • Ajmal Mian

Outdoor 3D semantic scene generation produces realistic and semantically rich environments for applications such as urban simulation and autonomous driving. However, advances in this direction are constrained by the absence of publicly available, well-annotated datasets. We introduce SketchSem3D, the first large-scale benchmark for generating 3D outdoor semantic scenes from abstract freehand sketches and pseudo-labeled annotations of satellite images. SketchSem3D includes two subsets, Sketch-based SemanticKITTI and Sketch-based KITTI-360 (containing LiDAR voxels along with their corresponding sketches and annotated satellite images), to enable standardized, rigorous, and diverse evaluations. We also propose Cylinder Mamba Diffusion (CymbaDiff) that significantly enhances spatial coherence in outdoor 3D scene generation. CymbaDiff imposes structured spatial ordering, explicitly captures cylindrical continuity and vertical hierarchy, and preserves both physical neighborhood relationships and global context within the generated scenes. Extensive experiments on SketchSem3D demonstrate that CymbaDiff achieves superior semantic consistency, spatial realism, and cross-dataset generalization. The code and dataset will be made publicly available.

IROS Conference 2025 Conference Paper

Multistream Network for LiDAR and Camera-based 3D Object Detection in Outdoor Scenes

  • Muhammad Ibrahim 0001
  • Naveed Akhtar
  • Haitian Wang 0002
  • Saeed Anwar
  • Ajmal Mian

Fusion of LiDAR and RGB data has the potential to enhance outdoor 3D object detection accuracy. To address real-world challenges in outdoor 3D object detection, fusion of LiDAR and RGB input has started gaining traction. However, effective integration of these modalities for precise object detection tasks still remains a largely open problem. To address that, we propose a MultiStream Detection (MuStD) network, which meticulously extracts task-relevant information from both data modalities. The network follows a three-stream structure. Its LiDAR-PillarNet stream extracts sparse 2D pillar features from the LiDAR input while the LiDAR-Height Compression stream computes Bird’s-Eye View features. An additional 3D Multimodal stream combines RGB and LiDAR features using UV mapping and polar coordinate indexing. Eventually, the features containing comprehensive spatial, textural, and geometric information are carefully fused and fed to a detection head for 3D object detection. We evaluate our method on the challenging KITTI Object Detection Benchmark, with results available on the official evaluation server. Our approach achieves strong performance, with an average precision (AP) of 85.39% in 3D detection, 91.34% in Bird’s Eye View (BEV) detection, and 96.39% in 2D detection. These results match or surpass existing state-of-the-art methods. In the difficult "Hard" category, our method attains 80.78% AP in 3D detection and 94.04% AP in 2D detection, highlighting its robustness in challenging scenarios. Furthermore, our method runs at 67 ms, demonstrating efficiency and real-time capability. Our code will be released through the MuStD GitHub repository at https://github.com/IbrahimUWA/MuStD.

AAAI Conference 2025 Conference Paper

Skip Mamba Diffusion for Monocular 3D Semantic Scene Completion

  • Li Liang
  • Naveed Akhtar
  • Jordan Vice
  • Xiangrui Kong
  • Ajmal Saeed Mian

3D semantic scene completion is critical for multiple downstream tasks in autonomous systems. It estimates missing geometric and semantic information in the acquired scene data. Due to the challenging real-world conditions, this task usually demands complex models that process multi-modal data to achieve acceptable performance. We propose a unique neural model, leveraging advances from the state space and diffusion generative modeling to achieve remarkable 3D semantic scene completion performance with monocular image input. Our technique processes the data in the conditioned latent space of a variational autoencoder where diffusion modeling is carried out with an innovative state space technique. A key component of our neural network is the proposed Skimba (Skip Mamba) denoiser, which is adept at efficiently processing long-sequence data. The Skimba diffusion model is integral to our 3D scene completion network, incorporating a triple Mamba structure, dimensional decomposition residuals and varying dilations along three directions. We also adopt a variant of this network for the subsequent semantic segmentation stage of our method. Extensive evaluation on the standard SemanticKITTI and SSCBench-KITTI360 datasets show that our approach not only outperforms other monocular techniques by a large margin, it also achieves competitive performance against stereo methods.

AAAI Conference 2024 Conference Paper

Improved MLP Point Cloud Processing with High-Dimensional Positional Encoding

  • Yanmei Zou
  • Hongshan Yu
  • Zhengeng Yang
  • Zechuan Li
  • Naveed Akhtar

Multi-Layer Perceptron (MLP) models are the bedrock of contemporary point cloud processing. However, their complex network architectures obscure the source of their strength. We first develop an “abstraction and refinement” (ABS-REF) view for the neural modeling of point clouds. This view elucidates that whereas the early models focused on the ABS stage, the more recent techniques devise sophisticated REF stages to attain performance advantage in point cloud processing. We then borrow the concept of “positional encoding” from transformer literature, and propose a High-dimensional Positional Encoding (HPE) module, which can be readily deployed to MLP based architectures. We leverage our module to develop a suite of HPENet, which are MLP networks that follow ABS-REF paradigm, albeit with a sophisticated HPE based REF stage. The developed technique is extensively evaluated for 3D object classification, object part segmentation, semantic segmentation and object detection. We establish new state-of-the-art results of 87.6 mAcc on ScanObjectNN for object classification, and 85.5 class mIoU on ShapeNetPart for object part segmentation, and 72.7 and 78.7 mIoU on Area-5 and 6-fold experiments with S3DIS for semantic segmentation. The source code for this work is available at https://github.com/zouyanmei/HPENet.
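The "positional encoding" idea borrowed here from the transformer literature lifts raw coordinates into a higher-dimensional space before MLP processing. A minimal Fourier-feature sketch of that general idea (not the paper's actual HPE module, whose design is specific to HPENet; the function name and frequency schedule below are illustrative assumptions):

```python
import numpy as np

def positional_encoding(xyz, num_freqs=4):
    """Map (N, 3) point coordinates to (N, 3 * 2 * num_freqs) features
    using sines and cosines at geometrically spaced frequencies."""
    freqs = 2.0 ** np.arange(num_freqs)      # 1, 2, 4, 8, ...
    angles = xyz[:, :, None] * freqs         # (N, 3, num_freqs)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(xyz.shape[0], -1)   # (N, 6 * num_freqs)

points = np.random.rand(128, 3)              # a toy point cloud
encoded = positional_encoding(points)
print(encoded.shape)  # (128, 24)
```

The encoded features, rather than the raw coordinates, would then be fed to the MLP stages, giving the network a richer notion of position.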

AAAI Conference 2023 Conference Paper

Contrastive Self-Supervised Learning Leads to Higher Adversarial Susceptibility

  • Rohit Gupta
  • Naveed Akhtar
  • Ajmal Mian
  • Mubarak Shah

Contrastive self-supervised learning (CSL) has managed to match or surpass the performance of supervised learning in image and video classification. However, it is still largely unknown if the nature of the representations induced by the two learning paradigms is similar. We investigate this under the lens of adversarial robustness. Our analysis of the problem reveals that CSL has intrinsically higher sensitivity to perturbations over supervised learning. We identify the uniform distribution of data representation over a unit hypersphere in the CSL representation space as the key contributor to this phenomenon. We establish that this is a result of the presence of false negative pairs in the training process, which increases model sensitivity to input perturbations. Our finding is supported by extensive experiments for image and video classification using adversarial perturbations and other input corruptions. We devise a strategy to detect and remove false negative pairs that is simple, yet effective in improving model robustness with CSL training. We close up to 68% of the robustness gap between CSL and its supervised counterpart. Finally, we contribute to adversarial learning by incorporating our method in CSL. We demonstrate an average gain of about 5% over two different state-of-the-art methods in this domain.
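The false-negative issue the abstract identifies can be seen in the standard InfoNCE objective: a same-class sample that lands in the negative set is pushed away from the anchor, inflating the loss and sharpening sensitivity to perturbations. A toy numerical sketch (not the paper's detection-and-removal strategy; all vectors and the temperature are illustrative assumptions):

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE: pull the positive close, push negatives away.
    All vectors are assumed L2-normalised (unit hypersphere), as in CSL."""
    pos_sim = anchor @ positive / temperature
    neg_sims = negatives @ anchor / temperature
    logits = np.concatenate([[pos_sim], neg_sims])
    return -pos_sim + np.log(np.exp(logits).sum())

rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v)

a = unit(rng.normal(size=8))                    # anchor embedding
p = unit(a + 0.1 * rng.normal(size=8))          # true positive view
false_neg = unit(a + 0.1 * rng.normal(size=8))  # same-class sample mislabelled as negative
true_negs = np.stack([unit(rng.normal(size=8)) for _ in range(5)])

loss_with_fn = info_nce_loss(a, p, np.vstack([true_negs, false_neg]))
loss_without = info_nce_loss(a, p, true_negs)
print(loss_with_fn > loss_without)  # True: the false negative inflates the loss
```

Because the false negative is highly similar to the anchor, it dominates the repulsion term; removing such pairs, as the paper proposes, relaxes this pressure on the representation.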

AAAI Conference 2023 Conference Paper

Local Path Integration for Attribution

  • Peiyu Yang
  • Naveed Akhtar
  • Zeyi Wen
  • Ajmal Mian

Path attribution methods are a popular tool to interpret a visual model's prediction on an input. They integrate model gradients for the input features over a path defined between the input and a reference, thereby satisfying certain desirable theoretical properties. However, their reliability hinges on the choice of the reference. Moreover, they do not exhibit weak dependence on the input, which leads to counter-intuitive feature attribution mapping. We show that path-based attribution can account for the weak dependence property by choosing the reference from the local distribution of the input. We devise a method to identify the local input distribution and propose a technique to stochastically integrate the model gradients over the paths defined by the references sampled from that distribution. Our local path integration (LPI) method is found to consistently outperform existing path attribution techniques when evaluated on deep visual models. Contributing to the ongoing search of reliable evaluation metrics for the interpretation methods, we also introduce DiffID metric that uses the relative difference between insertion and deletion games to alleviate the distribution shift problem faced by existing metrics. Our code is available at https://github.com/ypeiyu/LPI.
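The path attribution scheme this abstract builds on integrates model gradients along a path from a reference to the input (LPI's contribution is sampling the reference from the input's local distribution). A minimal straight-line sketch on a toy analytic model, in the style of integrated gradients (the model and helper names here are illustrative assumptions, not the paper's code):

```python
import numpy as np

def integrated_gradients(f, grad_f, x, reference, steps=64):
    """Riemann (midpoint) approximation of path attribution along the
    straight line from `reference` to `x`:
        (x - ref) * mean_k grad f(ref + a_k * (x - ref))."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([grad_f(reference + a * (x - reference)) for a in alphas])
    return (x - reference) * grads.mean(axis=0)

# Toy model: f(x) = sum(x**2), with analytic gradient 2x.
f = lambda x: np.sum(x ** 2)
grad_f = lambda x: 2.0 * x

x = np.array([1.0, -2.0, 0.5])
ref = np.zeros_like(x)       # LPI would instead sample references locally
attr = integrated_gradients(f, grad_f, x, ref)

# Completeness axiom: attributions sum to f(x) - f(reference).
print(np.isclose(attr.sum(), f(x) - f(ref)))  # True
```

The reliability concern raised in the abstract lives in the `ref` argument: a zero baseline is conventional but arbitrary, which is what motivates sampling references from the local input distribution instead.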

ICLR Conference 2023 Conference Paper

Re-calibrating Feature Attributions for Model Interpretation

  • Peiyu Yang
  • Naveed Akhtar
  • Zeyi Wen
  • Mubarak Shah
  • Ajmal Mian

The ability to interpret machine learning models is critical for high-stakes applications. Due to its desirable theoretical properties, path integration is a widely used scheme for feature attribution to interpret model predictions. However, the methods implementing this scheme currently rely on absolute attribution scores to eventually provide sensible interpretations. This not only contradicts the premise that the features with larger attribution scores are more relevant to the model prediction, but also conflicts with the theoretical settings for which the desirable properties of the attributions are proven. We address this by devising a method to first compute an appropriate reference for the path integration scheme. This reference further helps in identifying valid interpolation points on a desired integration path. The reference is computed in a gradient ascending direction on the model's loss surface, while the interpolations are performed by analyzing the model gradients and variations between the reference and the input. The eventual integration is effectively performed along a non-linear path. Our scheme can be incorporated into the existing integral-based attribution methods. We also devise an effective sampling and integration procedure that enables employing our scheme with multi-reference path integration efficiently. We achieve a marked performance boost for a range of integral-based attribution methods on both local and global evaluation metrics by enhancing them with our scheme. Our extensive results also show improved sensitivity, sanity preservation and model robustness with the proposed re-calibration of the attribution techniques with our method.

AAAI Conference 2023 Conference Paper

Rethinking Interpretation: Input-Agnostic Saliency Mapping of Deep Visual Classifiers

  • Naveed Akhtar
  • Mohammad Amir Asim Khan Jalwana

Saliency methods provide post-hoc model interpretation by attributing input features to the model outputs. Current methods mainly achieve this using a single input sample, thereby failing to answer input-independent inquiries about the model. We also show that input-specific saliency mapping is intrinsically susceptible to misleading feature attribution. Current attempts to use `general' input features for model interpretation assume access to a dataset containing those features, which biases the interpretation. Addressing the gap, we introduce a new perspective of input-agnostic saliency mapping that computationally estimates the high-level features attributed by the model to its outputs. These features are geometrically correlated, and are computed by accumulating the model's gradient information with respect to an unrestricted data distribution. To compute these features, we nudge independent data points over the model loss surface towards the local minima associated with a human-understandable concept, e.g., class label for classifiers. With a systematic projection, scaling and refinement process, this information is transformed into an interpretable visualization without compromising its model-fidelity. The visualization serves as a stand-alone qualitative interpretation. With an extensive evaluation, we not only demonstrate successful visualizations for a variety of concepts for large-scale models, but also showcase an interesting utility of this new form of saliency mapping by identifying backdoor signatures in compromised classifiers.

ICRA Conference 2023 Conference Paper

Slice Transformer and Self-supervised Learning for 6DoF Localization in 3D Point Cloud Maps

  • Muhammad Ibrahim 0001
  • Naveed Akhtar
  • Saeed Anwar
  • Michael J. Wise
  • Ajmal Mian

Precise localization is critical for autonomous vehicles. We present a self-supervised learning method that employs transformers for the first time for the task of outdoor localization using LiDAR data. We propose a pre-text task that reorganizes the slices of a 360° LiDAR scan to leverage its axial properties. Our model, called Slice Transformer, employs multi-head attention while systematically processing the slices. To the best of our knowledge, this is the first instance of leveraging multi-head attention for outdoor point clouds. We additionally introduce the Perth-WA dataset, which provides a large-scale LiDAR map of Perth city in Western Australia, covering a ~4 km² area. Localization annotations are provided for Perth-WA. The proposed localization method is thoroughly evaluated on the Perth-WA and ApolloSouthBay datasets. We also establish the efficacy of our self-supervised learning approach for the common downstream task of object classification using the ModelNet40 and ScanNN datasets. The code and Perth-WA data will be publicly released.

ICML Conference 2023 Conference Paper

Towards credible visual model interpretation with path attribution

  • Naveed Akhtar
  • Mohammad A. A. K. Jalwana

With its inspirational roots in game-theory, path attribution framework stands out among the post-hoc model interpretation techniques due to its axiomatic nature. However, recent developments show that despite being axiomatic, path attribution methods can compute counter-intuitive feature attributions. Not only that, for deep visual models, the methods may also not conform to the original game-theoretic intuitions that are the basis of their axiomatic nature. To address these issues, we perform a systematic investigation of the path attribution framework. We first pinpoint the conditions in which the counter-intuitive attributions of deep visual models can be avoided under this framework. Then, we identify a mechanism of integrating the attributions over the paths such that they computationally conform to the original insights of game-theory. These insights are eventually combined into a method, which provides intuitive and reliable feature attributions. We also establish the findings empirically by evaluating the method on multiple datasets, models and evaluation metrics. Extensive experiments show a consistent quantitative and qualitative gain in the results over the baselines.

IROS Conference 2023 Conference Paper

UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input

  • Muhammad Ibrahim 0001
  • Naveed Akhtar
  • Saeed Anwar
  • Ajmal Mian

Localization is a fundamental task in robotics for autonomous navigation. Existing localization methods rely on a single input data modality or train several computational models to process different modalities. This leads to stringent computational requirements and sub-optimal results that fail to capitalize on the complementary information in other data streams. This paper proposes UnLoc, a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions. Our multi-stream network can handle LiDAR, Camera and RADAR inputs for localization on demand, i.e., it can work with one or more input sensors, making it robust to sensor failure. UnLoc uses 3D sparse convolutions and cylindrical partitioning of the space to process LiDAR frames and implements ResNet blocks with a slot attention-based feature filtering module for the Radar and image modalities. We introduce a unique learnable modality encoding scheme to distinguish between the input sensor data. Our method is extensively evaluated on the Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets. The results ascertain the efficacy of our technique. The dataset, results, and codes are available at https://github.com/IbrahimUWA/UnLoc

ICAPS Conference 2014 Conference Paper

Towards Robust Task Execution for Domestic Service Robots

  • Anastassia Küstenmacher
  • Naveed Akhtar
  • Paul G. Plöger
  • Gerhard Lakemeyer

In the field of domestic service robots, recovery from faults is crucial to promote user acceptance. In this context we focus in particular on some specific faults, which arise from the interaction of a robot with its real world environment. Even a well-modelled robot may fail to perform its tasks successfully due to unexpected situations, which occur while interacting. These situations occur as deviations of properties of the objects (manipulated by the robot) from their expected values. Hence, they are experienced by the robot as external faults. In this paper we present two approaches to handle external faults which result from inadequate descriptions of a planner operator. In both approaches we assume that the robot is able to detect the occurrence of the fault at the planning level by monitoring the effects of an executed action. In our work we limit the scope of the sources of external faults to natural physical phenomena. Hence, we do not consider cases in which an external agent (e.g., another robot or a human being) is the cause of a detected fault. We apply the proposed approaches to scenarios in which the robot performs a manipulation task (pick and place).