Arrow Research search

Author name cluster

Qian Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

44 papers
2 author rows

Possible papers

EAAI Journal 2026 Journal Article

Depth-aware and continuous edge curves for large-view underwater image reconstruction

  • Jingchun Zhou
  • Qian Liu
  • Dehuan Zhang
  • Zifan Lin
  • Deepak Kumar Jain
  • Dragan Pamucar
  • Vladimir Simic

Consistency of geometric features poses a major challenge in perspective reconstruction, especially in complex underwater environments where existing methods struggle to utilize linear edge features. To address this, we propose a novel Depth Restoration Feature Stitching (DRFS) approach, which integrates four core procedures: depth estimation (D-procedure), restoration (R-procedure), feature extraction (F-procedure), and image stitching (S-procedure), to reconstruct natural-looking, large-view underwater images. Our method leverages unsupervised deep learning techniques, including Monocular Depth Estimation v2 (Monodepth2), combined with domain priors to estimate accurate depth under varying illumination conditions. The restoration procedure enhances image contrast and corrects color distortion using a complex underwater imaging model. The feature extraction procedure constructs large-scale geometric structures based on depth-guided edge curves, addressing the challenge of inconsistent or missing straight lines in underwater scenes. The stitching procedure preserves structural consistency through grid alignment, global similarity, and pyramid-based image fusion. Experimental results demonstrate that our method produces visually appealing reconstructions with improved geometric fidelity. These capabilities make it well-suited for artificial intelligence applications in underwater robotics, intelligent marine perception, autonomous exploration, environmental monitoring, and large-scale ocean mapping.

TCS Journal 2026 Journal Article

k-Submodular and approximately non-k-submodular maximization under p-system and ℓ knapsack constraints

  • Hanlu Ye
  • Heqing Li
  • Min Li
  • Yang Zhou
  • Qian Liu

This paper addresses the problem of k-submodular and approximately non-k-submodular maximization under p-system and ℓ knapsack constraints. For monotone k-submodular functions, we first propose a greedy algorithm that achieves a $\frac{1}{(1+\epsilon')(1+p+2\ell)}$-approximation, and a $\frac{1}{(1+\epsilon')(2+p+2\ell)}$-approximation for the non-monotone case, with time complexity $O\left(\frac{n^2(1+k)\log(2n)}{\log(1+\epsilon')}\right)$, where $\epsilon'$ is a very small positive number. We further introduce an improved algorithm that enhances the approximation ratios to $\frac{1}{(1+\epsilon'+\epsilon'^2)(1+p+\frac{7}{4}\ell)}$ and $\frac{1}{(1+\epsilon'+\epsilon'^2)(2+p+\frac{7}{4}\ell)}$, respectively, while reducing the time complexity to $O\left(\frac{nk\log n}{\epsilon'}\log(2n)\right)$. For monotone k-submodular functions with curvature c, we obtain an approximation ratio of $\frac{1}{(1+\epsilon')(p+c+\epsilon'+\frac{7}{4}\ell)}$. Additionally, we provide an approximation guarantee of $\frac{\min\left\{1, \frac{1}{\alpha(1+\epsilon')}\right\}}{1+\frac{1+\epsilon}{\alpha^2(1-\epsilon)}\left[(1+\epsilon')(p+\alpha\epsilon')+\frac{7}{4}\ell\right]}$ for ϵ-approximately α-weakly diminishing returns functions.

AAAI Conference 2026 Conference Paper

SalDiff-DTM: A Novel Dual-Temporal Modulated Diffusion Model for Omnidirectional Images Scanpath Prediction

  • Xiaohui Kong
  • Qian Liu
  • Dandan Zhu
  • Kaiwei Zhang
  • Xiongkuo Min

Scanpath prediction in omnidirectional images (ODIs) serves as a critical component for optimizing foveated rendering efficiency and enhancing interactive quality in virtual reality systems. However, existing scanpath prediction methods for ODIs still suffer from fundamental limitations: (1) inadequate modeling and capturing of long-range temporal dependencies in fixation regions, and (2) suboptimal integration of spatial and temporal visual features, ultimately compromising prediction performance. To address these limitations, we propose a novel Dual-Temporal Modulated Diffusion model for Omnidirectional Images Scanpath Prediction, named SalDiff-DTM, to effectively generate realistic human eye viewing trajectories. Specifically, to effectively model spatial relationships, we propose a novel Dual-Graph Convolutional Network (Dual-GCN) module that simultaneously captures semantic-level and image-level correlations. By integrating both local spatial details and global contextual information across the internal temporal dimension, this module achieves comprehensive and robust modeling of spatial relationships. To further enhance the modeling of temporal dependencies inherent in diverse fixation patterns, we introduce TABiMamba (Temporal-Aware BiLSTM-Mamba), a dedicated module that synergistically combines the contextual sensitivity of BiLSTM with the long-range sequence modeling capabilities of Mamba. This design facilitates deep information flow and context-aware sequential reasoning, thereby enabling high-fidelity capture of intricate temporal correlations. Inspired by the progressive refinement mechanism of diffusion models in various generative tasks, we propose a saliency-guided diffusion module that formulates the prediction problem as a conditional generative process, iteratively yielding accurate and perceptually plausible scanpaths. Extensive experiments demonstrate that SalDiff-DTM significantly outperforms state-of-the-art models, paving the way for future advancements in eye-tracking technologies and cognitive modeling.

JBHI Journal 2025 Journal Article

A Novel Approach for the Early Identification of Genetic Risk Factors for Alzheimer's Disease Using EEG and Psychometric Data

  • Shyamal Y. Dharia
  • Qian Liu
  • Stephen D. Smith
  • Camilo E. Valderrama

Alzheimer's disease (AD) is a progressive neurodegenerative disorder associated with impairments in memory and executive functions. Despite significant advancements in identifying genetic risk factors, the high cost and limited accessibility of genetic testing remain major barriers. In this work, we propose a cost-effective screening approach that leverages EEG recordings and psychometric test scores to predict an individual's genetic risk for AD. Our Convolutional Neural Network (CNN) model shows promising performance: it achieved an F1 score of 72.21% in distinguishing APOE-ϵ4/PICALM GG non-carriers (N) from APOE-ϵ4 carriers with the risky PICALM GG alleles (A+P+). It reached an F1 score of 60.78% for differentiating non-carriers (N) from APOE-ϵ4 carriers without the risky alleles (A+P-), and 65.12% when separating A+P- from A+P+. To enhance interpretability, we employ Grad-CAM, which reveals that EEG features contribute more significantly to gene prediction than psychometric measures. Notably, our model also identifies three key psychometric tests, MINI COPE (which assesses emotional coping skills), the California Verbal Learning Test (CVLT), and NEO Neuroticism, as associated with higher AD risk, consistent with prior research. Moreover, our results align with earlier findings reporting increased theta-band power among high-risk individuals. Finally, Higuchi Fractal Dimension (HFD) features drove most of the EEG-based prediction capability, as shown through our ablation study. This study highlights the potential of integrating neurophysiological and cognitive assessments to develop accessible and reliable screening tools for AD genetic risk, enabling earlier diagnoses. The code has been released at https://github.com/Shyamal-Dharia/EEG-Psycho-Genes-AD.

NeurIPS Conference 2025 Conference Paper

Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization

  • Mingzhe Du
  • Anh Tuan Luu
  • Yue Liu
  • Yuhao Qing
  • Dong Huang
  • Xinyi He
  • Qian Liu
  • Zejun Ma

Large Language Models (LLMs) generate functionally correct solutions but often fall short in code efficiency, a critical bottleneck for real-world deployment. In this paper, we introduce a novel test-time iterative optimization framework to address this, employing a closed-loop system where LLMs iteratively refine code based on empirical performance feedback from an execution sandbox. We explore three training strategies: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). Experiments on our Venus dataset and the APPS benchmark show that SFT and DPO rapidly saturate in efficiency gains. In contrast, GRPO, using reinforcement learning (RL) with execution feedback, continuously optimizes code performance, significantly boosting both pass@1 (from 47% to 62%) and the likelihood of outperforming human submissions in efficiency (from 31% to 45%). Our work demonstrates effective test-time code efficiency improvement and critically reveals the power of RL in teaching LLMs to truly self-improve code efficiency. We released our code and data at https://github.com/Elfsong/Afterburner.

IJCAI Conference 2025 Conference Paper

Consensus-Guided Incomplete Multi-view Clustering via Cross-view Affinities Learning

  • Qian Liu
  • Huibing Wang
  • Jinjia Peng
  • Yawei Chen
  • Mingze Yao
  • Xianping Fu
  • Yang Wang

Incomplete multi-view clustering (IMC) has garnered substantial attention due to its capacity to handle unlabeled data. Existing methods predominantly explore pairwise consistency between every two views. However, such consistency is highly susceptible to missing samples and outliers within a certain view and thus deviates from the true clustering distribution. Moreover, dual-view interaction neglects the collaboration effects of multiple views, making it challenging to capture the holistic characteristics across views. In response to these issues, we propose a novel Consensus-Guided Incomplete Multi-view Clustering via Cross-view Affinities Learning (CAL). Specifically, CAL reconstructs views with available instances to mine sample-wise affinities and harness comprehensive content information within views. Subsequently, to extract clean structural information, CAL imposes a structured sparse constraint on the representation tensor to eliminate biased errors. Furthermore, by integrating the consensus representation into a representation tensor, CAL can employ high-order interaction of multiple views to depict the semantic correlation between views while acquiring a unified structural graph across multiple views. Extensive experiments on seven benchmark datasets demonstrate that CAL outperforms some state-of-the-art methods in clustering performance. The code is available at https://github.com/whbdmu/CAL.

JBHI Journal 2025 Journal Article

Fractal Dimension of Resting-State EEG as a Biomarker for Autonomous Sensory Meridian Response (ASMR)

  • Shyamal Y. Dharia
  • Camilo E. Valderrama
  • Qian Liu
  • Beverley K. Fredborg
  • Amy S. Desroches
  • Stephen D. Smith

Autonomous Sensory Meridian Response (ASMR) is an audio-visual phenomenon characterized by multisensory experiences in response to specific auditory stimuli, typically triggering a tingling sensation beginning in the scalp and neck and accompanied by decreased heart rate and deep relaxation. While prior electroencephalogram (EEG) studies have identified ASMR-related neural signatures in stimulus-based paradigms, resting-state differences between ASMR-sensitive (ASMR+) and non-sensitive (ASMR-) individuals remain unexplored. In this study, we apply Higuchi’s fractal dimension (HFD) to eyes-open and eyes-closed resting-state EEG and demonstrate that ASMR+ participants exhibit significantly lower complexity in the delta (1–4 Hz) and theta (4–8 Hz) bands and higher complexity in the alpha (8–12 Hz) band. Moreover, we train Transformer, Mamba, Random Forest and SVM classifiers on these HFD features to distinguish ASMR+ individuals from ASMR-, achieving F1 scores of 82.56%, 77.33%, 73.93%, and 70.85%, respectively. Finally, using an explainable-AI approach, we show that ASMR+ participants had significantly lower hubness proportions (network connectivity) than ASMR-. These findings reveal novel resting-state biomarkers of ASMR sensitivity and lay the groundwork for rapid, noninvasive EEG-based screening in ASMR-augmented therapeutic applications. The code has been released at https://github.com/Shyamal-Dharia/Fractal-Dimension-of-Resting-State-EEG-as-a-Biomarker-for-Autonomous-Sensory-Meridian-Response-ASMR-GitHub.
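The abstract above relies on Higuchi's fractal dimension (HFD) as its EEG feature. As a rough illustration only, the following is a minimal sketch of the textbook Higuchi estimator in NumPy. It is not the authors' released code; the function name `higuchi_fd` and the default `k_max=8` are assumptions made for this example.

```python
import numpy as np

def higuchi_fd(x, k_max=8):
    """Estimate the Higuchi fractal dimension of a 1-D signal.

    Illustrative sketch: k_max is a hypothetical default, not a value
    taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    ks = np.arange(1, k_max + 1)
    mean_lengths = []
    for k in ks:
        lengths = []
        for m in range(k):
            idx = np.arange(m, n, k)            # x[m], x[m+k], x[m+2k], ...
            n_steps = idx.size - 1
            if n_steps < 1:
                continue
            dist = np.abs(np.diff(x[idx])).sum()
            norm = (n - 1) / (n_steps * k)      # Higuchi's normalisation factor
            lengths.append(dist * norm / k)
        mean_lengths.append(np.mean(lengths))
    # The slope of log L(k) against log(1/k) is the dimension estimate.
    slope, _ = np.polyfit(np.log(1.0 / ks), np.log(mean_lengths), 1)
    return float(slope)
```

A straight line should give an estimate close to 1 and white noise an estimate close to 2; a constant signal has zero curve length and is not handled by this sketch.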

NeurIPS Conference 2025 Conference Paper

General-Reasoner: Advancing LLM Reasoning Across All Domains

  • Xueguang Ma
  • Qian Liu
  • Dongfu Jiang
  • Ge Zhang
  • Zejun Ma
  • Wenhu Chen

Reinforcement learning (RL) has recently demonstrated strong potential in enhancing the reasoning capabilities of large language models (LLMs). In particular, the "Zero" reinforcement learning introduced by Deepseek-R1-Zero enables direct RL training of base LLMs without relying on an intermediate supervised fine-tuning stage. Despite these advancements, current works for LLM reasoning mainly focus on mathematical and coding domains, largely due to data abundance and the ease of answer verification. This limits the applicability and generalization of such models to broader domains, where questions often have diverse answer representations and data is scarcer. In this paper, we propose General-Reasoner, a novel training framework designed to enhance LLM reasoning capabilities across diverse domains. Our key contributions include: (1) constructing a large-scale, high-quality dataset of questions with verifiable answers curated by web crawling, covering a wide range of disciplines; and (2) developing a generative model-based answer verifier, which replaces traditional rule-based verification with the capability of chain-of-thought and context-awareness. We train a series of models and evaluate them on a wide range of datasets covering domains such as physics, chemistry, finance, and electronics. Our comprehensive evaluation across these 12 benchmarks (e.g., MMLU-Pro, GPQA, SuperGPQA, TheoremQA, BBEH and MATH AMC) demonstrates that General-Reasoner outperforms existing baseline methods, achieving robust and generalizable reasoning performance while maintaining superior effectiveness in mathematical reasoning tasks.

ICRA Conference 2025 Conference Paper

High-Resolution Reconstruction of Non-Planar Tactile Patterns From Low-Resolution Taxel-Based Tactile Sensors

  • Chen Zhou
  • He Zhao
  • Qian Liu

Over the past decades, the development of tactile sensors has gained increasing attention, and tactile sensors have gradually become a fundamental device for robots. Especially in today's context, where human-robot interaction demands are growing and the requirements for tactile perception are becoming stricter, how to enable robots to better perceive their environment has become a topic worth discussing. Tactile sensors, after years of development, have emerged in two main types: taxel-based and vision-based sensors, where the former provide relatively low-resolution (LR) tactile patterns compared with the latter. Both of them have seen significant enhancements in their tactile perception capabilities on flat and regular surfaces. However, as application scenarios expand, current flat tactile perception can no longer meet the robots' needs for multi-dimensional and complex perception capabilities. Therefore, we investigate the high-resolution (HR) reconstruction of non-planar tactile patterns captured by LR taxel-based sensors in this paper. We first develop a new dataset, where the ground truth of non-planar tactile patterns is obtained with a vision-based GelSight Mini tactile sensor, and the LR data are collected via a commercial taxel-based Xela sensor. In addition, we propose to adapt the state-of-the-art CNN- and GAN-based tactile super-resolution models for flat/planar surfaces to the non-planar scenario, and also develop a diffusion-based model for the non-planar HR reconstruction. Experimental results confirm the efficiency of the proposed models.

NeurIPS Conference 2025 Conference Paper

MuSLR: Multimodal Symbolic Logical Reasoning

  • Jundong Xu
  • Hao Fei
  • Yuhui Zhang
  • Liangming Pan
  • Qijun Huang
  • Qian Liu
  • Preslav Nakov
  • Min-Yen Kan

Multimodal symbolic logical reasoning, which aims to deduce new facts from multimodal input via formal logic, is critical in high-stakes applications such as autonomous driving and medical diagnosis, as its rigorous, deterministic reasoning helps prevent serious consequences. To evaluate such capabilities of current state-of-the-art vision language models (VLMs), we introduce the first benchmark MuSLR for multimodal symbolic logical reasoning grounded in formal logical rules. MuSLR comprises 1,093 instances across 7 domains, including 35 atomic symbolic logic and 976 logical combinations, with reasoning depths ranging from 2 to 9. We evaluate 7 state-of-the-art VLMs on MuSLR and find that they all struggle with multimodal symbolic reasoning, with the best model, GPT-4.1, achieving only 46.8%. Thus, we propose LogiCAM, a modular framework that applies formal logical rules to multimodal inputs, boosting GPT-4.1’s Chain-of-Thought performance by 14.13%, and delivering even larger gains on complex logics such as first-order logic. We also conduct a comprehensive error analysis, showing that around 70% of failures stem from logical misalignment between modalities, offering key insights to guide future improvements.

IROS Conference 2025 Conference Paper

Refer and Grasp: Vision-Language Guided Continuous Dexterous Grasping

  • Yayu Huang
  • Dongxuan Fan
  • Wen Qi 0002
  • Daheng Li
  • Yifan Yang
  • Yongkang Luo 0001
  • Jia Sun 0008
  • Qian Liu

Robotic grasping guided by natural language instructions faces challenges due to ambiguities in object descriptions and the need to interpret complex spatial context. Existing visual grounding methods often rely on datasets that fail to capture these complexities, particularly when object categories are vague or undefined. To address these challenges, we make three key contributions. First, we present an automated dataset generation engine for visual grounding in tabletop grasping, combining procedural scene synthesis with template-based referring expression generation, requiring no manual labeling. Second, we introduce the RefGrasp dataset, featuring diverse indoor environments and linguistically challenging expressions for robotic grasping tasks. Third, we propose a visually grounded dexterous grasping framework with continuous grasp generation, validated through extensive real-world robotic experiments. Our work offers a novel approach for language-guided robotic manipulation, providing both a challenging dataset and an effective grasping framework for real-world applications. Project website: https://refer-and-grasp.github.io.

NeurIPS Conference 2025 Conference Paper

SkyLadder: Better and Faster Pretraining via Context Window Scheduling

  • Tongyao Zhu
  • Qian Liu
  • Haonan Wang
  • Shiqi Chen
  • Xiangming Gu
  • Tianyu Pang
  • Min-Yen Kan

Recent advancements in LLM pretraining have featured ever-expanding context windows to process longer sequences. However, our controlled study reveals that models pretrained with shorter context windows consistently outperform their long-context counterparts under a fixed token budget. This finding motivates us to explore an optimal context window scheduling strategy to better balance long-context capability with pretraining efficiency. To this end, we propose SkyLadder, a simple yet effective approach that implements a short-to-long context window transition. SkyLadder preserves strong standard benchmark performance, while matching or exceeding baseline results on long-context tasks. Through extensive experiments, we pretrain 1B-parameter models (up to 32K context) and 3B-parameter models (8K context) on 100B tokens, demonstrating that SkyLadder yields consistent gains of up to 3.7% on common benchmarks, while achieving up to 22% faster training speeds compared to baselines.
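The abstract describes a short-to-long context window transition but does not spell out the schedule. One possible shape, a linear ramp, is sketched below purely as an assumption. The function name `context_window`, the ramp shape, and every default value (`start_len`, `final_len`, `ramp_fraction`, `multiple`) are hypothetical and not taken from the paper.

```python
def context_window(step, total_steps, start_len=32, final_len=8192,
                   ramp_fraction=0.6, multiple=32):
    """Return the context window length to use at a given training step.

    Hypothetical linear short-to-long ramp: the window grows from
    start_len to final_len over the first ramp_fraction of training,
    then stays at final_len. Lengths are rounded down to a multiple
    of `multiple`.
    """
    ramp_steps = max(1, int(total_steps * ramp_fraction))
    if step >= ramp_steps:
        return final_len
    raw = start_len + (final_len - start_len) * step / ramp_steps
    rounded = int(raw) // multiple * multiple
    return min(final_len, max(start_len, rounded))
```

For example, `context_window(0, 1000)` yields the starting length and any step at or beyond 60% of training yields the final length; a training loop would re-chunk each batch to the returned length.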

JBHI Journal 2025 Journal Article

Spatial Prior-Guided Dual-Path Network for Thyroid Nodule Segmentation

  • Chen Pang
  • Hui Miao
  • Renfeng Zhang
  • Qian Liu
  • Lei Lyu

Accurate segmentation of thyroid nodules in ultrasound images is critical for clinical diagnosis but remains challenging due to low contrast and complex anatomical structures. Existing deep learning methods often rely solely on local nodule features, lacking anatomical prior knowledge of the thyroid region, which can result in misclassification of non-thyroid tissues, especially in low-quality scans. To address these issues, we propose a Spatial Prior-Guided Dual-Path Network that integrates a prior-aware encoder to model thyroid anatomical structures and a low-cost heterogeneous encoder to preserve fine-grained multi-scale features, enhancing both spatial detail and contextual awareness. To capture the diverse and irregular appearances of nodules, we design a CrossBlock module, which combines an efficient cross-attention mechanism with mixed-scale convolutional operations to enable global context modeling and local feature extraction. The network further employs a dual-decoder architecture, where one decoder learns thyroid region priors and the other focuses on accurate nodule segmentation. Gland-specific features are hierarchically refined and injected into the nodule decoder to enhance boundary delineation through anatomical guidance. Extensive experiments on the TN3K and MTNS datasets demonstrate that our method consistently outperforms state-of-the-art approaches, particularly in boundary precision and localization accuracy, offering practical value for preoperative planning and clinical decision-making.

NeurIPS Conference 2025 Conference Paper

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

  • Xeron Du
  • Yifan Yao
  • Kaijing Ma
  • Bingli Wang
  • Tianyu Zheng
  • Minghao Liu
  • Yiming Liang
  • Xiaolong Jin

Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields, particularly in light industry, agriculture, and service-oriented disciplines, remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model Gemini-2.5-Pro achieved the highest accuracy of 63.56% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.

TMLR Journal 2025 Journal Article

When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training

  • Haonan Wang
  • Qian Liu
  • Chao Du
  • Tongyao Zhu
  • Cunxiao Du
  • Kenji Kawaguchi
  • Tianyu Pang

Extending context window sizes allows large language models (LLMs) to process longer sequences and handle more complex tasks. Rotary Positional Embedding (RoPE) has become the de facto standard due to its relative positional encoding properties that benefit long-context training. However, we observe that using RoPE with BFloat16 format results in numerical issues, causing it to deviate from its intended relative positional encoding, especially in long-context scenarios. This issue arises from BFloat16's limited precision and accumulates as context length increases, with the first token contributing significantly to this problem. Despite its limitations, BFloat16 remains desirable for its computational efficiency, particularly given the substantial memory overhead required to extend the context window. To improve long-context training under BFloat16, we develop AnchorAttention, a plug-and-play attention method that enhances long-context capabilities and speeds up training. AnchorAttention reduces unnecessary attention computations, maintains semantic coherence, and boosts computational efficiency by treating the first token as a shared anchor with a consistent position ID, making it visible to all documents within the training context. Experiments on three types of LLMs demonstrate that AnchorAttention significantly improves long-context performance and reduces training time by over 50% compared to standard full attention mechanisms, while preserving the original LLM's capabilities on general tasks.
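To make the precision argument above concrete, here is a toy sketch rather than the paper's experiment. It simulates BFloat16 by truncating a float32 to its top 16 bits and applies that cast to the rotation angles of a single RoPE frequency. Where the rounding happens, the use of truncation instead of round-to-nearest, and the names `to_bf16` and `rope_score` are all assumptions made for illustration.

```python
import numpy as np

def to_bf16(x):
    """Simulate bfloat16 by keeping only the top 16 bits of each float32."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

def rope_score(m, n, theta=1.0, cast=None):
    """Dot product of two unit 2-D vectors rotated to positions m and n.

    With exact angles this equals cos((m - n) * theta), so it depends only
    on the relative offset. `cast` optionally degrades the angles.
    """
    angles = np.array([m, n], dtype=np.float32) * np.float32(theta)
    if cast is not None:
        angles = cast(angles)
    a_m, a_n = angles.astype(np.float64)
    return float(np.cos(a_m) * np.cos(a_n) + np.sin(a_m) * np.sin(a_n))

near = rope_score(1, 0)                           # offset 1 near the start
far_fp32 = rope_score(1001, 1000)                 # same offset, float32 angles
far_bf16 = rope_score(1001, 1000, cast=to_bf16)   # same offset, simulated bf16
```

In this toy setting `near` and `far_fp32` agree, because the score depends only on the offset, while `far_bf16` differs: with only 8 significant bits, 1001 and 1000 collapse to the same value, so the offset information is lost at long range.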

NeurIPS Conference 2025 Conference Paper

ZeCO: Zero-Communication Overhead Sequence Parallelism for Linear Attention

  • Yuhong Chou
  • Zehao Liu
  • Rui-jie Zhu
  • Xinyi Wan
  • Tianjian Li
  • Congying Chu
  • Qian Liu
  • Jibin Wu

Linear attention mechanisms deliver significant advantages for Large Language Models (LLMs) by providing linear computational complexity, enabling efficient processing of ultra-long sequences (e.g., 1M context). However, existing Sequence Parallelism (SP) methods, essential for distributing these workloads across devices, become the primary performance bottleneck due to substantial communication overhead. In this paper, we introduce ZeCO (Zero Communication Overhead) sequence parallelism for linear attention models, a new SP method designed to overcome these limitations and achieve practically end-to-end near-linear scalability for long sequence training. For example, training a model with a 1M sequence length across 64 devices using ZeCO takes roughly the same time as training with a 16k sequence on a single device. At the heart of ZeCO lies All-Scan, a novel collective communication primitive. All-Scan provides each SP rank with precisely the initial operator state it requires while maintaining a minimal communication footprint, effectively eliminating communication overhead. Theoretically, we prove the optimality of ZeCO, showing that it introduces only negligible time and space overhead. Empirically, we compare the communication costs of different sequence parallelism strategies and demonstrate that All-Scan achieves the fastest communication in SP scenarios. Specifically, on 256 GPUs with an 8M sequence length, ZeCO achieves a 60% speedup compared to the current state-of-the-art (SOTA) SP method. We believe ZeCO establishes a clear path toward efficiently training next-generation LLMs on previously intractable sequence lengths.

EAAI Journal 2024 Journal Article

Attention based lightweight asymmetric network for real-time semantic segmentation

  • Qian Liu
  • Cunbao Wang
  • Zhensheng Li
  • Youwei Qi
  • Jiongtao Fang

Real-time semantic segmentation is one of the important tasks in the field of computer vision, and it is widely used in autonomous driving and medical imaging. Existing lightweight networks usually improve inference speed at the expense of segmentation accuracy. How to achieve a balance between accuracy and speed is still a challenging problem for real-time semantic segmentation. In this paper, we propose an attention based lightweight asymmetric network (ALANet) to address this problem. Specifically, in the encoder, a channel-wise attention based depth-wise asymmetric block (CADAB), which has a small number of parameters, is designed to extract sufficient features. In the decoder, a spatial attention based pyramid pooling (SAPP) module is presented to aggregate multi-scale context information by using a few convolutions and poolings; and a pixel-wise attention based multi-scale feature fusion (PAMFF) module is developed to fuse features from different scales and generate pixel-wise attention for improving image restoration. Our ALANet has only 1.32M parameters. Experimental results on the Cityscapes and CamVid datasets show that ALANet obtains a segmentation accuracy (mIoU) of 74.4% and 69.5% and an inference speed of 115.6 FPS and 113.2 FPS, respectively. These results demonstrate that ALANet achieves a good balance between accuracy and speed.

NeurIPS Conference 2024 Conference Paper

Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

  • Xuan Zhang
  • Chao Du
  • Tianyu Pang
  • Qian Liu
  • Wei Gao
  • Min Lin

The recent development of chain-of-thought (CoT) decoding has enabled large language models (LLMs) to generate explicit logical reasoning paths for complex problem-solving. However, research indicates that these paths are not always deliberate and optimal. The tree-of-thought (ToT) method employs tree-searching to extensively explore the reasoning space and find better reasoning paths that CoT decoding might overlook. This deliberation, however, comes at the cost of significantly increased inference complexity. In this work, we demonstrate that fine-tuning LLMs leveraging the search tree constructed by ToT allows CoT to achieve similar or better performance, thereby avoiding the substantial inference burden. This is achieved through Chain of Preference Optimization (CPO), where LLMs are fine-tuned to align each step of the CoT reasoning paths with those of ToT using the inherent preference information in the tree-search process. Extensive experimental results show that CPO significantly improves LLM performance in solving a variety of complex problems, including question answering, fact verification, and arithmetic reasoning, demonstrating its effectiveness. Our code is available at https://github.com/sail-sg/CPO.

JBHI Journal 2024 Journal Article

Clustering-Guided Twin Contrastive Learning for Endomicroscopy Image Classification

  • Jingjun Zhou
  • Xiangjiang Dong
  • Qian Liu

Learning better representations is essential in medical image analysis for computer-aided diagnosis. However, learning discriminative semantic features is a major challenge due to the lack of large-scale well-annotated datasets. Thus, how can we learn a well-structured categorizable embedding space in limited-scale and unlabeled datasets? In this paper, we propose a novel clustering-guided twin-contrastive learning framework (CTCL) that learns the discriminative representations of probe-based confocal laser endomicroscopy (pCLE) images for gastrointestinal (GI) tumor classification. Compared with traditional contrastive learning, in which only two randomly augmented views of the same instance are considered, the proposed CTCL aligns more semantically related and class-consistent samples by clustering, which improves intra-class tightness and inter-class variability to produce more informative representations. Furthermore, based on the inherent properties of CLE (geometric invariance and intrinsic noise), we propose to regard CLE images with any angle rotation and CLE images with different noises as the same instance, respectively, for increased variability and diversity of samples. By optimizing CTCL in an end-to-end expectation-maximization framework, comprehensive experimental results demonstrate that CTCL-based visual representations achieve competitive performance on each downstream task as well as more robustness and transferability compared with existing state-of-the-art SSL and supervised methods. Notably, CTCL achieved 75.60%/78.45% and 64.12%/77.37% top-1 accuracy on the linear evaluation protocol and few-shot classification downstream tasks, respectively, which outperformed the previous best results by 1.27%/1.63% and 0.5%/3%, respectively. The proposed method holds great potential to assist pathologists in achieving an automated, fast, and high-precision diagnosis of GI tumors and accurately determining different stages of tumor development based on CLE images.

YNIMG Journal 2024 Journal Article

Developing an AI-empowered head-only ultra-high-performance gradient MRI system for high spatiotemporal neuroimaging

  • Dan Wu
  • Liyi Kang
  • Haotian Li
  • Ruicheng Ba
  • Zuozhen Cao
  • Qian Liu
  • Yingchao Tan
  • Qinwei Zhang

Recent advances in neuroscience require high-resolution MRI to decipher the structural and functional details of the brain. Developing a high-performance gradient system is an ongoing effort in the field to facilitate high spatial and temporal encoding. Here, we propose a head-only gradient system, NeuroFrontier, dedicated to neuroimaging, with an ultra-high gradient strength of 650 mT/m and a slew rate of 600 T/m/s. The proposed system features 1) ultra-high power of 7 MW achieved by running two gradient power amplifiers using a novel paralleling method; 2) a force/torque balanced gradient coil design with a two-step mechanical structure that allows high-efficiency and flexible optimization of the peripheral nerve stimulation; 3) a high-density integrated RF system that is miniaturized and customized for the head-only system; 4) an AI-empowered compressed sensing technique that enables ultra-fast acquisition of high-resolution images and AI-based acceleration in q-t space for diffusion MRI (dMRI); and 5) a prospective head motion correction technique that effectively corrects motion artifacts in real-time with 3D optical tracking. We demonstrated the potential advantages of the proposed system in imaging resolution, speed, and signal-to-noise ratio for 3D structural MRI (sMRI), functional MRI (fMRI) and dMRI in neuroscience applications of submillimeter layer-specific fMRI and dMRI. We also illustrated the unique strength of this system for dMRI-based microstructural mapping, e.g., enhanced lesion contrast at short diffusion-times or high b-values, and improved estimation accuracy for cellular microstructures using diffusion-time-dependent dMRI or for neurite microstructures using q-space approaches.

ICRA Conference 2024 Conference Paper

GrainGrasp: Dexterous Grasp Generation with Fine-grained Contact Guidance

  • Fuqiang Zhao
  • Dzmitry Tsetserukou
  • Qian Liu

One goal of dexterous robotic grasping is to allow robots to handle objects with the same level of flexibility and adaptability as humans. However, it remains a challenging task to generate an optimal grasping strategy for dexterous hands, especially when it comes to delicate manipulation and accurate adjustment of the desired grasping poses for objects of varying shapes and sizes. In this paper, we propose a novel dexterous grasp generation scheme called GrainGrasp that provides fine-grained contact guidance for each fingertip. In particular, we employ a generative model to predict separate contact maps for each fingertip on the object point cloud, effectively capturing the specifics of finger-object interactions. In addition, we develop a new dexterous grasping optimization algorithm that solely relies on the point cloud as input, eliminating the necessity for complete mesh information of the object. By leveraging the contact maps of different fingertips, the proposed optimization algorithm can generate precise and determinable strategies for human-like object grasping. Experimental results confirm the efficiency of the proposed scheme. Our code is available at https://github.com/wmtlab/GrainGrasp.

NeurIPS Conference 2024 Conference Paper

Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses

  • Xiaosen Zheng
  • Tianyu Pang
  • Chao Du
  • Qian Liu
  • Jing Jiang
  • Min Lin

Recently, Anil et al. (2024) show that many-shot (up to hundreds of) demonstrations can jailbreak state-of-the-art LLMs by exploiting their long-context capability. Nevertheless, is it possible to use few-shot demonstrations to efficiently jailbreak LLMs within limited context sizes? While the vanilla few-shot jailbreaking may be inefficient, we propose improved techniques such as injecting special system tokens like [/INST] and employing demo-level random search from a collected demo pool. These simple techniques result in surprisingly effective jailbreaking against aligned LLMs (even with advanced defenses). For example, our method achieves >80% (mostly >95%) ASRs on Llama-2-7B and Llama-3-8B without multiple restarts, even if the models are enhanced by strong defenses such as perplexity detection and/or SmoothLLM, which is challenging for suffix-based jailbreaking. In addition, we conduct comprehensive and elaborate (e.g., making sure to use correct system prompts) evaluations against other aligned LLMs and advanced defenses, where our method consistently achieves nearly 100% ASRs. Our code is available at https://github.com/sail-sg/I-FSJ.

TMLR Journal 2024 Journal Article

Mantis: Interleaved Multi-Image Instruction Tuning

  • Dongfu Jiang
  • Xuan He
  • Huaye Zeng
  • Cong Wei
  • Max Ku
  • Qian Liu
  • Wenhu Chen

Large multimodal models (LMMs) have shown great results in single-image vision language tasks. However, their ability to solve multi-image visual language tasks is yet to be improved. Existing LMMs like OpenFlamingo, Emu2, and Idefics gain their multi-image ability through pre-training on hundreds of millions of noisy interleaved image-text data from the web, which is neither efficient nor effective. In this paper, we aim to build strong multi-image LMMs via instruction tuning with academic-level resources. Therefore, we meticulously construct Mantis-Instruct containing 721K multi-image instruction data to train a family of Mantis models. The instruction tuning empowers Mantis with different multi-image skills like co-reference, comparison, reasoning, and temporal understanding. We evaluate Mantis on 8 multi-image benchmarks and 6 single-image benchmarks. Mantis-Idefics2 can achieve SoTA results on all the multi-image benchmarks and beat the strongest multi-image baseline, Idefics2-8B, by an average of 13 absolute points. Notably, Idefics2-8B was pre-trained on 140M interleaved multi-image data, which is 200x larger than Mantis-Instruct. We observe that Mantis performs equivalently well on the held-in and held-out benchmarks, which shows its generalization ability. We further evaluate Mantis on single-image benchmarks and demonstrate that Mantis also maintains a strong single-image performance on par with CogVLM and Emu2. Our results show that multi-image abilities are not necessarily gained through massive pre-training; instead, they can be gained by low-cost instruction tuning. The training and evaluation of Mantis have paved the road for future work to improve LMMs' multi-image abilities.

NeurIPS Conference 2024 Conference Paper

Mercury: A Code Efficiency Benchmark for Code Large Language Models

  • Mingzhe Du
  • Luu A. Tuan
  • Bin Ji
  • Qian Liu
  • See-Kiong Ng

Amidst the recent strides in evaluating Large Language Models for Code (Code LLMs), existing benchmarks have mainly focused on the functional correctness of generated code, neglecting the importance of their computational efficiency. To fill the gap, we present Mercury, the first code efficiency benchmark for Code LLMs. It comprises 1,889 Python tasks, each accompanied by adequate solutions that serve as real-world efficiency baselines, enabling a comprehensive analysis of the runtime distribution. Based on the distribution, we introduce a new metric Beyond, which computes a runtime-percentile-weighted Pass score to reflect functional correctness and code efficiency simultaneously. On Mercury, leading Code LLMs can achieve 65% on Pass, while less than 50% on Beyond. Given that an ideal Beyond score would be aligned with the Pass score, it indicates that while Code LLMs exhibit impressive capabilities in generating functionally correct code, there remains a notable gap in their efficiency. Finally, our empirical experiments reveal that Direct Preference Optimization (DPO) serves as a robust baseline for enhancing code efficiency compared with Supervised Fine Tuning (SFT), which paves a promising avenue for future exploration of efficient code generation. Our code and data are available on GitHub: https://github.com/Elfsong/Mercury.

NeurIPS Conference 2024 Conference Paper

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

  • Chaofan Tao
  • Qian Liu
  • Longxu Dou
  • Niklas Muennighoff
  • Zhongwei Wan
  • Ping Luo
  • Min Lin
  • Ngai Wong

Research on scaling large language models (LLMs) has primarily focused on model parameters and training data size, overlooking the role of vocabulary size. We investigate how vocabulary size impacts LLM scaling laws by training models ranging from 33M to 3B parameters on up to 500B characters with various vocabulary configurations. We propose three complementary approaches for predicting the compute-optimal vocabulary size: IsoFLOPs analysis, derivative estimation, and parametric fit of the loss function. Our approaches converge on the conclusion that the optimal vocabulary size depends on the compute budget, with larger models requiring larger vocabularies. Most LLMs, however, use insufficient vocabulary sizes. For example, we predict that the optimal vocabulary size of Llama2-70B should have been at least 216K, 7 times larger than its vocabulary of 32K. We validate our predictions empirically by training models with 3B parameters across different FLOPs budgets. Adopting our predicted optimal vocabulary size consistently improves downstream performance over commonly used vocabulary sizes. By increasing the vocabulary size from the conventional 32K to 43K, we improve performance on ARC-Challenge from 29.1 to 32.0 with the same 2.3e21 FLOPs. Our work highlights the importance of jointly considering tokenization and model scaling for efficient pre-training. The code and demo are available at https://github.com/sail-sg/scaling-with-vocab and https://hf.co/spaces/sail/scaling-with-vocab-demo.

NeurIPS Conference 2024 Conference Paper

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

  • Ruisheng Cao
  • Fangyu Lei
  • Haoyuan Wu
  • Jixuan Chen
  • Yeqiao Fu
  • Hongcheng Gao
  • Xinzhuang Xiong
  • Hanchong Zhang

Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI operations. This automation can improve the productivity of experts while democratizing access to large-scale data analysis. In this paper, we introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering workflows, featuring 494 real-world tasks in authentic computer environments and incorporating 20 enterprise-level professional applications. These tasks, derived from real-world use cases, evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems. To balance realistic simulation with evaluation simplicity, we devote significant effort to developing automatic configurations for task setup and carefully crafting evaluation metrics for each task. Furthermore, we supplement multimodal agents with comprehensive documents of these enterprise data software systems. Our empirical evaluation reveals that existing state-of-the-art LLM/VLM-based agents do not reliably automate full data workflows (14.0% success). Even with step-by-step guidance, these agents still underperform in tasks that require fine-grained, knowledge-intensive GUI actions (16.2%) and involve remote cloud-hosted workspaces (10.6%). We hope that Spider2-V paves the way for autonomous multimodal agents to transform the automation of data science and engineering workflows. Our code and data are available at https://spider2-v.github.io.

EAAI Journal 2023 Journal Article

A prospect theory-based MABAC algorithm with novel similarity measures and interactional operations for picture fuzzy sets and its applications

  • Tao Wang
  • Xinxing Wu
  • Harish Garg
  • Qian Liu
  • Guanrong Chen

The picture fuzzy set (PFS) is a more reliable tool for handling uncertainty in data than the intuitionistic fuzzy set (IFS) or the fuzzy set. A PFS simultaneously handles four degrees, namely membership, neutrality, non-membership, and refusal, and is thus widely applicable to solving real-life decision-making problems more accurately. Building on these advantages, in this paper we present some interactive operational laws for picture fuzzy numbers (PFNs) to aggregate picture fuzzy information. We also introduce new information measures, namely picture fuzzy similarity measures (PFSimMs) based on fuzzy strict negations, which can overcome various drawbacks of the existing PFSimMs. Their properties and features are studied in detail to show their advantages. Finally, we develop a prospect theory-based multi-attributive border approximation area comparison (MABAC) method under the picture fuzzy environment by using the proposed operational laws and PFSimMs to solve decision-making problems. The applicability of the developed algorithm is illustrated through a numerical example, which also shows its advantages.

AAAI Conference 2023 Conference Paper

On Grounded Planning for Embodied Tasks with Language Models

  • Bill Yuchen Lin
  • Chengsong Huang
  • Qian Liu
  • Wenda Gu
  • Sam Sommerer
  • Xiang Ren

Language models (LMs) have demonstrated their capability in possessing commonsense knowledge of the physical world, a crucial aspect of performing tasks in everyday life. However, it remains unclear whether they have the capacity to generate grounded, executable plans for embodied tasks. This is a challenging task as LMs lack the ability to perceive the environment through vision and feedback from the physical environment. In this paper, we address this important research question and present the first investigation into the topic. Our novel problem formulation, named G-PlanET, inputs a high-level goal and a data table about objects in a specific environment, and then outputs a step-by-step actionable plan for a robotic agent to follow. To facilitate the study, we establish an evaluation protocol and design a dedicated metric, KAS, to assess the quality of the plans. Our experiments demonstrate that the use of tables for encoding the environment and an iterative decoding strategy can significantly enhance the LMs' ability in grounded planning. Our analysis also reveals interesting and non-trivial findings.

TMLR Journal 2023 Journal Article

StarCoder: may the source be with you!

  • Raymond Li
  • Loubna Ben allal
  • Yangtian Zi
  • Niklas Muennighoff
  • Denis Kocetkov
  • Chenghao Mou
  • Marc Marone
  • Christopher Akiki

The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.

IJCAI Conference 2022 Conference Paper

Reasoning over Hybrid Chain for Table-and-Text Open Domain Question Answering

  • Wanjun Zhong
  • Junjie Huang
  • Qian Liu
  • Ming Zhou
  • Jiahai Wang
  • Jian Yin
  • Nan Duan

Tabular and textual question answering requires systems to perform reasoning over heterogeneous information, considering table structure and the connections between table and text. In this paper, we propose a ChAin-centric Reasoning and Pre-training framework (CARP). CARP utilizes a hybrid chain to model the explicit intermediate reasoning process across table and text for question answering. We also propose a novel chain-centric pre-training method to enhance the pre-trained model in identifying the cross-modality reasoning process and alleviating the data sparsity problem. This method constructs a large-scale reasoning corpus by synthesizing pseudo heterogeneous reasoning paths from Wikipedia and generating corresponding questions. We evaluate our system on OTT-QA, a large-scale table-and-text open-domain question answering benchmark, and our system achieves state-of-the-art performance. Further analyses illustrate that the explicit hybrid chain offers substantial performance improvement and interpretability of the intermediate reasoning process, and the chain-centric pre-training boosts the performance on chain extraction.

TCS Journal 2022 Journal Article

The submodularity of two-stage stochastic maximum-weight independent set problems

  • Min Li
  • Hao Xiao
  • Qian Liu
  • Yang Zhou

In this paper, we extend the maximal independent set problem to the two-stage stochastic case: given an independence system associated with one deterministic weight function and a random weight function, the goal is to find two nonoverlapping independent subsets from these two stages with the maximum total weight. We study the submodularity of three kinds of two-stage independent set problems with max-weight. When the independent set problem is a matroid constraint, we can show its submodularity. However, by designing a counterexample, we show that the knapsack independent set problem yields neither a submodular nor a supermodular maximization problem. Finally, we show that the robust two-stage stochastic maximum-weight uniform matroid problem can be formulated as a γ-submodular problem with cardinality constraint and also give a lower bound for γ.

TCS Journal 2021 Journal Article

Approximation algorithms for fuzzy C-means problem based on seeding method

  • Qian Liu
  • Jianxin Liu
  • Min Li
  • Yang Zhou

As an important soft clustering model, the fuzzy C-means method is widely applied in many fields. In this method, instead of the strict distributive ability in the classical k-means method, all the sample points are endowed with degrees of membership to each center to depict the fuzzy clustering. In this paper, we show that the fuzzy C-means++ algorithm, which introduces the k-means++ algorithm as a seeding strategy, gives a solution for which the approximation guarantee is O(k^2 ln k). A novel seeding algorithm is then designed based on the contribution of the fuzzy potential function, which improves the approximation ratio to O(k ln k). Preliminary numerical experiments are presented to support the theoretical results of this paper.
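For readers unfamiliar with the seeding strategy the abstract refers to, the following is a minimal sketch of standard k-means++ seeding (D² sampling), written only from the general description of that technique. It is not the paper's fuzzy C-means++ algorithm or its novel seeding algorithm; the function names `squared_distance` and `kmeanspp_seeding` are illustrative assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeding(points, k, rng=None):
    """Pick k initial centers with k-means++ (D^2) sampling.

    Hypothetical helper for illustration: the first center is chosen
    uniformly at random; each later center is chosen with probability
    proportional to its squared distance to the nearest chosen center.
    """
    rng = rng or random.Random()
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight of each point = squared distance to its closest center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a center: fall back to uniform choice.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers
```

Points that are already centers have weight zero, so as long as enough distinct points exist the returned centers are distinct. A fuzzy variant would then assign each point a degree of membership to every center rather than a single hard label.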

IS Journal 2021 Journal Article

Concept Representation by Learning Explicit and Implicit Concept Couplings

  • Wenpeng Lu
  • Yuteng Zhang
  • Shoujin Wang
  • Heyan Huang
  • Qian Liu
  • Sheng Luo

Generating the precise semantic representation of a word or concept is a fundamental task in natural language processing. Recent studies which incorporate semantic knowledge into word embedding have shown their potential in improving the semantic representation of a concept. However, existing approaches only achieved limited performance improvement as they usually 1) model a word’s semantics from some explicit aspects while ignoring the intrinsic aspects of the word, 2) treat semantic knowledge as a supplement of word embeddings, and 3) consider partial relations between concepts while ignoring rich coupling relations between them, such as explicit concept co-occurrences in descriptive texts in a corpus as well as concept hyperlink relations in a knowledge network, and implicit couplings between concept co-occurrences and hyperlinks. In human consciousness, a concept is always associated with various couplings that exist within/between descriptive texts and knowledge networks, which inspires us to capture as many concept couplings as possible for building a more informative concept representation. We thus propose a neural coupled concept representation (CoupledCR) framework and its instantiation: a coupled concept embedding (CCE) model. CCE first learns two types of explicit couplings that are based on concept co-occurrences and hyperlink relations, respectively, and then learns a type of high-level implicit couplings between these two types of explicit couplings for better concept representation. Extensive experimental results on six real-world datasets show that CCE significantly outperforms eight state-of-the-art word embeddings and semantic representation methods.

IJCAI Conference 2021 Conference Paper

Keep the Structure: A Latent Shift-Reduce Parser for Semantic Parsing

  • Yuntao Li
  • Bei Chen
  • Qian Liu
  • Yan Gao
  • Jian-Guang Lou
  • Yan Zhang
  • Dongmei Zhang

Traditional end-to-end semantic parsing models treat a natural language utterance as a holonomic structure. However, hierarchical structures exist in natural languages, which also align with the hierarchical structures of logical forms. In this paper, we propose a latent shift-reduce parser, called LASP, which decomposes both natural language queries and logical form expressions according to their hierarchical structures and finds local alignment between them to enhance semantic parsing. LASP consists of a base parser and a shift-reduce splitter. The splitter dynamically separates an NL query into several spans. The base parser converts the relevant simple spans into logical forms, which are further combined to obtain the final logical form. We conducted empirical studies on two datasets across different domains and different types of logical forms. The results demonstrate that the proposed method significantly improves the performance of semantic parsing, especially on unseen scenarios.

TCS Journal 2020 Journal Article

A general method for decomposing self-intersecting polygon to normal based on self-intersection points

  • Yong Cui
  • Qian Liu
  • Guo Chen
  • Hujun Zhang

Checking whether polygons are self-intersecting or not is an important step for GIS projects before they are published to the web. Automatically converting self-intersecting polygons into normal ones is practically useful, especially when numerous polygons need to be processed. Based on the relationships of self-intersection points, this paper presents an algorithm to convert a complex self-intersecting polygon to a normal one which has no self-intersecting part. Furthermore, using the relationships of the repeat points (original self-intersection points) of the decomposed polygon, the resulting simple polygon can be split into independent sub-polygons bounded by those points. The algorithm is easy to understand and highly efficient because we consider only the self-intersection point relationships of the polygon, and we do not pay attention to the edges and their directions. A point structure in which the relationships of the self-intersection points are defined is used in the algorithm.
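The first step the abstract mentions, checking whether a polygon self-intersects, can be illustrated with a brute-force sketch. This is a generic orientation-based test over all pairs of non-adjacent edges, assumed here purely for illustration; it is not the paper's decomposition algorithm, it only detects proper crossings (collinear overlaps and vertex touches are ignored), and the function names are hypothetical.

```python
def orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 cross at a single interior point."""
    return (orientation(p1, p2, q1) * orientation(p1, p2, q2) < 0
            and orientation(q1, q2, p1) * orientation(q1, q2, p2) < 0)


def is_self_intersecting(vertices):
    """Brute-force O(n^2) check over all pairs of non-adjacent edges.

    `vertices` lists the polygon corners in order; the closing edge from
    the last vertex back to the first is added implicitly.
    """
    n = len(vertices)
    edges = [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Adjacent edges share a vertex and cannot properly cross.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if segments_cross(*edges[i], *edges[j]):
                return True
    return False
```

For example, a unit square given in order is reported as simple, while the "bowtie" ordering of the same four corners is reported as self-intersecting.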

NeurIPS Conference 2020 Conference Paper

Compositional Generalization by Learning Analytical Expressions

  • Qian Liu
  • Shengnan An
  • Jian-Guang Lou
  • Bei Chen
  • Zeqi Lin
  • Yan Gao
  • Bin Zhou
  • Nanning Zheng

Compositional generalization is a basic and essential intellective capability of human beings, which allows us to recombine known parts readily. However, existing neural network based models have been proven to be extremely deficient in such a capability. Inspired by work in cognition which argues compositionality can be captured by variable slots with symbolic functions, we present a refreshing view that connects a memory-augmented neural model with analytical expressions, to achieve compositional generalization. Our model consists of two cooperative neural modules, Composer and Solver, fitting well with the cognitive argument while being able to be trained in an end-to-end manner via a hierarchical reinforcement learning algorithm. Experiments on the well-known benchmark SCAN demonstrate that our model seizes a great ability of compositional generalization, solving all challenges addressed by previous works with 100% accuracies.

IJCAI Conference 2020 Conference Paper

How Far are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context

  • Qian Liu
  • Bei Chen
  • Jiaqi Guo
  • Jian-Guang Lou
  • Bin Zhou
  • Dongmei Zhang

Recently, semantic parsing in context has received considerable attention, which is challenging since there are complex contextual phenomena. Previous works verified their proposed methods in limited scenarios, which motivates us to conduct an exploratory study on context modeling methods under real-world semantic parsing in context. We present a grammar-based decoding semantic parser and adapt typical context modeling methods on top of it. We evaluate 13 context modeling methods on two large complex cross-domain datasets, and our best model achieves state-of-the-art performances on both datasets with significant improvements. Furthermore, we summarize the most frequent contextual phenomena, with a fine-grained analysis on representative models, which may shed light on potential research directions. Our code is available at https://github.com/microsoft/ContextualSP.

IJCAI Conference 2020 Conference Paper

RECPARSER: A Recursive Semantic Parsing Framework for Text-to-SQL Task

  • Yu Zeng
  • Yan Gao
  • Jiaqi Guo
  • Bei Chen
  • Qian Liu
  • Jian-Guang Lou
  • Fei Teng
  • Dongmei Zhang

Neural semantic parsers usually fail to parse long and complicated utterances into nested SQL queries, due to the large search space. In this paper, we propose a novel recursive semantic parsing framework called RECPARSER to generate the nested SQL query layer-by-layer. It decomposes the complicated nested SQL query generation problem into several progressive non-nested SQL query generation problems. Furthermore, we propose a novel Question Decomposer module to explicitly encourage RECPARSER to focus on different components of an utterance when predicting SQL queries of different layers. Experiments on the Spider dataset show that our approach is more effective compared to the previous works at predicting the nested SQL queries. In addition, we achieve an overall accuracy that is comparable with state-of-the-art approaches.

AAAI Conference 2019 Conference Paper

FANDA: A Novel Approach to Perform Follow-Up Query Analysis

  • Qian Liu
  • Bei Chen
  • Jian-Guang Lou
  • Ge Jin
  • Dongmei Zhang

Recent work on Natural Language Interfaces to Databases (NLIDB) has attracted considerable attention. NLIDB allow users to search databases using natural language instead of SQL-like query languages. While saving the users from having to learn query languages, multi-turn interaction with NLIDB usually involves multiple queries where contextual information is vital to understand the users’ query intents. In this paper, we address a typical contextual understanding problem, termed as follow-up query analysis. In spite of its ubiquity, follow-up query analysis has not been well studied due to two primary obstacles: the multifarious nature of follow-up query scenarios and the lack of high-quality datasets. Our work summarizes typical follow-up query scenarios and provides a new FollowUp dataset with 1000 query triples on 120 tables. Moreover, we propose a novel approach FANDA, which takes into account the structures of queries and employs a ranking model with weakly supervised max-margin learning. The experimental results on FollowUp demonstrate the superiority of FANDA over multiple baselines across multiple metrics.

AAAI Conference 2018 Conference Paper

Semantic Structure-Based Word Embedding by Incorporating Concept Convergence and Word Divergence

  • Qian Liu
  • Heyan Huang
  • Guangquan Zhang
  • Yang Gao
  • Junyu Xuan
  • Jie Lu

Representing the semantics of words is a fundamental task in text processing. Several research studies have shown that text and knowledge bases (KBs) are complementary sources for word embedding learning. Most existing methods only consider relationships within word-pairs in the usage of KBs. We argue that the structural information of well-organized words within the KBs is able to convey more effective and stable knowledge in capturing semantics of words. In this paper, we propose a semantic structure-based word embedding method, and introduce concept convergence and word divergence to reveal semantic structures in the word embedding learning process. To assess the effectiveness of our method, we use WordNet for training and conduct extensive experiments on word similarity, word analogy, text classification and query expansion. The experimental results show that our method outperforms state-of-the-art methods, including the methods trained solely on the corpus, and others trained on the corpus and the KBs.

AAAI Conference 2016 Conference Paper

Improving Opinion Aspect Extraction Using Semantic Similarity and Aspect Associations

  • Qian Liu
  • Bing Liu
  • Yuanlin Zhang
  • Doo Soon Kim
  • Zhiqiang Gao

Aspect extraction is a key task of fine-grained opinion mining. Although it has been studied by many researchers, it remains highly challenging. This paper proposes a novel unsupervised approach to make a major improvement. The approach is based on the framework of lifelong learning and is implemented with two forms of recommendations that are based on semantic similarity and aspect associations, respectively. Experimental results using eight review datasets show the effectiveness of the proposed approach.

IJCAI Conference 2015 Conference Paper

Automated Rule Selection for Aspect Extraction in Opinion Mining

  • Qian Liu
  • Zhiqiang Gao
  • Bing Liu
  • Yuanlin Zhang

Aspect extraction aims to extract fine-grained opinion targets from opinion texts. Recent work has shown that the syntactical approach, which employs rules about grammar dependency relations between opinion words and aspects, performs quite well. This approach is highly desirable in practice because it is unsupervised and domain independent. However, the rules need to be carefully selected and tuned manually so as not to produce too many errors. Although it is easy to evaluate the accuracy of each rule automatically, it is not easy to select a set of rules that produces the best overall result due to the overlapping coverage of the rules. In this paper, we propose a novel method to select an effective set of rules. To our knowledge, this is the first work that selects rules automatically. Our experiment results show that the proposed method can select a subset of a given rule set to achieve significantly better results than the full rule set and the existing state-of-the-art CRF-based supervised method.

IJCAI Conference 2015 Conference Paper

Web Page Classification Based on Uncorrelated Semi-Supervised Intra-View and Inter-View Manifold Discriminant Feature Extraction

  • Xiao-Yuan Jing
  • Qian Liu
  • Fei Wu
  • Baowen Xu
  • Yangping Zhu
  • Songcan Chen

Web page classification has attracted increasing research interest. It is intrinsically a multi-view and semi-supervised application, since web pages usually contain two or more types of data, such as text, hyperlinks and images, and unlabeled pages generally far outnumber labeled ones. Web page data is commonly high-dimensional. Thus, how to extract useful features from this kind of data in the multi-view semi-supervised scenario is important for web page classification. To our knowledge, only one method has been presented specifically for this topic, and the few semi-supervised multi-view feature extraction methods developed for other applications still leave much room for improvement. In this paper, we first design a feature extraction schema called semi-supervised intra-view and inter-view manifold discriminant (SI²MD) learning, which fully utilizes the intra-view and inter-view discriminant information of labeled samples and the local neighborhood structures of unlabeled samples. We then design a semi-supervised uncorrelation constraint for the SI²MD schema to remove the multi-view correlation in the semi-supervised scenario. By combining the SI²MD schema with the constraint, we propose an uncorrelated semi-supervised intra-view and inter-view manifold discriminant (USI²MD) learning approach for web page classification. Experiments on public web page databases validate the proposed approach.

AAAI Conference 2014 Conference Paper

Uncorrelated Multi-View Discrimination Dictionary Learning for Recognition

  • Xiao-Yuan Jing
  • Rui-Min Hu
  • Fei Wu
  • Xi-Lin Chen
  • Qian Liu
  • Yong-Fang Yao

Dictionary learning (DL) has become an important feature learning technique that achieves state-of-the-art recognition performance. Owing to the sparse nature of data in real-world applications, DL uses a set of learned dictionary bases to represent the linear decomposition of a data point. Fisher discrimination DL (FDDL) is a representative supervised DL method, which constructs a structured dictionary whose atoms correspond to the class labels. Recent years have witnessed a growing interest in multi-view (more than two views) feature learning techniques. Although some multi-view (or multi-modal) DL methods have been presented, there still exists much room for improvement. How to enhance the total discriminability of dictionaries and reduce their redundancy is a crucial research topic. To boost the performance of multi-view DL techniques, we propose an uncorrelated multi-view discrimination DL (UMD²L) approach for recognition. By making dictionary atoms correspond to the class labels such that the obtained reconstruction error is discriminative, UMD²L aims to jointly learn multiple dictionaries with favorable overall discriminative power. Furthermore, we design an uncorrelated constraint for multi-view DL, so as to reduce the redundancy among dictionaries learned from different views. Experiments on several public datasets demonstrate the effectiveness of the proposed approach.
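
The idea that class-specific atoms make the reconstruction error discriminative can be sketched as follows. This is not UMD²L or FDDL: it covers a single view, uses fixed rather than learned dictionaries, and assumes the atoms within each class are orthonormal so that the least-squares code reduces to dot products. All names and values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def reconstruction_error(x, atoms):
    """Squared error after reconstructing x from one class's orthonormal atoms."""
    coeffs = [dot(x, atom) for atom in atoms]  # least-squares code for orthonormal atoms
    recon = [
        sum(c * atom[i] for c, atom in zip(coeffs, atoms))
        for i in range(len(x))
    ]
    return sum((xi - ri) ** 2 for xi, ri in zip(x, recon))

def classify(x, class_dicts):
    """Assign x to the class whose sub-dictionary reconstructs it best."""
    errors = {
        label: reconstruction_error(x, atoms)
        for label, atoms in class_dicts.items()
    }
    return min(errors, key=errors.get), errors

# Toy structured dictionary: each class owns its own atoms (assumed orthonormal).
class_dicts = {
    "A": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "B": [[0.0, 0.0, 1.0]],
}
x = [0.9, 0.3, 0.1]

label, errors = classify(x, class_dicts)
print(label, errors)
```

The sample lies mostly in the span of class A's atoms, so class A gives the smaller reconstruction error and is chosen as the label.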