Arrow Research search

Author name cluster

Jingyu Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

31 papers
2 author rows

Possible papers

31

JBHI Journal 2026 Journal Article

Dual-Cross Tri-Level Routing Transformer Based Metric Learning Network for Epileptic Seizure Prediction Using a Single-Channel iEEG

  • Yifan Wang
  • Weidong Yan
  • Yulan Ma
  • Liang Qiao
  • Tao Yu
  • Jingyu Liu

With the development of deep brain stimulation techniques, seizure prediction based on single-channel intracranial electroencephalography (iEEG) is a necessary and urgently needed tool for closed-loop neuromodulation in epilepsy. However, previous prediction methods based on multi-channel scalp signals relied heavily on spatial information, failing to fully exploit the interdependencies between temporal scales and spectral rhythms of single-channel iEEG. Additionally, current contrastive learning strategies can lead to model overfitting by excessively learning feature distances on small samples, limiting the precision of seizure prediction. To tackle the above issues, we propose a novel dual-cross tri-level routing transformer based metric learning network (DC-TRT-MLNet) for epileptic seizure prediction from single-channel iEEG. First, a scale-rhythm dual-cross (DC) graph attention network is introduced to construct the dependency relationships across multi-scale temporal and multi-rhythm spectral features. Second, we design a tri-level routing transformer (TRT) network to comprehensively refine the routing features with the most seizure potential while eliminating redundant information. Finally, a hard-triplet-optimization based metric learning (ML) strategy is developed to iteratively optimize the intra-class and inter-class distances of inter-ictal and pre-ictal routing features. Competitive experimental results on a private Xuanwu single-channel iEEG dataset validate the effectiveness of the proposed method, demonstrating the superior prediction performance of DC-TRT-MLNet compared with state-of-the-art methods. Our study may offer a new solution for intracranial single-channel seizure prediction.
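
As a concrete illustration of the hard-triplet metric learning step, the sketch below implements batch-hard triplet mining in PyTorch: for each anchor it picks the farthest same-class sample and the nearest other-class sample. This is a minimal sketch assuming standard batch-hard mining; the function name, margin, and toy labels are illustrative, not the paper's implementation.

```python
import torch

def batch_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Batch-hard triplet loss: hardest positive and hardest negative
    per anchor (a generic stand-in for the paper's hard triplet
    optimization based metric learning strategy)."""
    dist = torch.cdist(embeddings, embeddings)           # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)    # same-class pairs
    eye = torch.eye(len(labels), dtype=torch.bool)
    pos_mask = same & ~eye                               # positives, excluding self
    hardest_pos = (dist * pos_mask).max(dim=1).values    # farthest positive
    inf = torch.full_like(dist, float("inf"))
    hardest_neg = torch.where(~same, dist, inf).min(dim=1).values  # nearest negative
    return torch.relu(hardest_pos - hardest_neg + margin).mean()

# toy usage: 8 routing-feature embeddings, 0 = inter-ictal, 1 = pre-ictal
emb = torch.randn(8, 16)
lab = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
print(batch_hard_triplet_loss(emb, lab))
```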

TMLR Journal 2026 Journal Article

SSFL: Discovering Sparse Unified Subnetworks at Initialization for Efficient Federated Learning

  • Riyasat Ohib
  • Bishal Thapaliya
  • Gintare Karolina Dziugaite
  • Jingyu Liu
  • Vince D. Calhoun
  • Sergey Plis

In this work, we propose Salient Sparse Federated Learning (SSFL), a streamlined approach for sparse federated learning with efficient communication. SSFL identifies a sparse subnetwork prior to training, leveraging parameter saliency scores computed separately on local client data in non-IID scenarios, and then aggregated, to determine a global mask. Only the sparse model weights are trained and communicated each round between the clients and the server. On standard benchmarks including CIFAR-10, CIFAR-100, and Tiny-ImageNet, SSFL consistently improves the accuracy–sparsity trade-off, achieving more than 20% relative error reduction on CIFAR-10 compared to the strongest sparse baseline, while reducing communication costs by 2× relative to dense FL. Finally, in a real-world federated learning deployment, SSFL delivers over 2.3× faster communication time, underscoring its practical efficiency.
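
The core mask-at-initialization idea lends itself to a short sketch: score every parameter on each client, aggregate the scores, and keep only the top fraction as a shared global mask. The mean aggregation and top-k thresholding below are assumptions for illustration, not SSFL's exact rule.

```python
import numpy as np

def global_saliency_mask(client_saliencies, sparsity=0.9):
    """Aggregate per-client parameter saliency scores and keep the
    top (1 - sparsity) fraction of weights as a global boolean mask."""
    agg = np.mean(np.stack(client_saliencies), axis=0)  # simple mean aggregation
    k = max(1, int(agg.size * (1.0 - sparsity)))        # number of weights to keep
    threshold = np.partition(agg.ravel(), -k)[-k]       # k-th largest score
    return agg >= threshold

# toy usage: 3 clients scoring a 4x5 weight matrix, 90% sparsity
clients = [np.abs(np.random.randn(4, 5)) for _ in range(3)]
mask = global_saliency_mask(clients)
print(int(mask.sum()), "of", mask.size, "weights kept")
```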

AAAI Conference 2025 Conference Paper

D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching

  • Jingyu Liu
  • Minquan Wang
  • Ye Ma
  • Bo Wang
  • Aozhu Chen
  • Quan Chen
  • Peng Jiang
  • Xirong Li

Videos showcasing specific products are increasingly important for E-commerce. Key moments arise naturally, such as the first appearance of a specific product, the presentation of its distinctive features, or the display of a buying link. Adding proper sound effects (SFX) to such moments, i.e., video decoration with SFX (VDSFX), is crucial for an engaging user experience. Previous work adds SFX to videos by video-to-SFX matching at a holistic level, lacking the ability to add SFX at a specific moment. Meanwhile, previous studies on video highlight detection or video moment retrieval consider only moment localization, leaving moment-to-SFX matching untouched. By contrast, we propose D&M, a unified method that accomplishes key moment detection and moment-to-SFX matching simultaneously. Moreover, for the new VDSFX task we build SFX-Moment, a large-scale dataset sourced from an E-commerce video creation platform. For a fair comparison, we build competitive baselines by extending a number of current video moment detection methods to the new task. Extensive experiments on SFX-Moment show the superior performance of the proposed method over the baselines.
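
The moment-to-SFX matching half of the task reduces, at its simplest, to retrieval in a shared embedding space. The cosine-similarity ranking below is a generic sketch of that idea; D&M's actual joint detection-and-matching architecture is not reproduced here.

```python
import torch
import torch.nn.functional as F

def match_sfx(moment_emb, sfx_emb, top_k=3):
    """Rank SFX candidates for each detected key moment by cosine
    similarity between moment and SFX embeddings."""
    sim = F.normalize(moment_emb, dim=-1) @ F.normalize(sfx_emb, dim=-1).T
    return sim.topk(top_k, dim=-1)   # (scores, indices) per moment

# toy usage: 2 detected moments, 10 SFX candidates, 32-d embeddings
scores, idx = match_sfx(torch.randn(2, 32), torch.randn(10, 32))
print(idx)   # indices of the top-3 SFX for each moment
```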

ICLR Conference 2025 Conference Paper

PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms

  • Yilong Li
  • Jingyu Liu
  • Hao Zhang
  • M. Badri Narayanan
  • Utkarsh Sharma
  • Shuai Zhang
  • Yijing Zeng
  • Jayaram Raghuram

Deploying large language models (LLMs) locally on mobile devices is advantageous in scenarios where transmitting data to remote cloud servers is either undesirable due to privacy concerns or impractical due to poor network connectivity. Recent advancements have facilitated the local deployment of LLMs. However, local deployment also presents challenges, particularly in balancing quality (generative performance), latency, and throughput within the hardware constraints of mobile devices. In this paper, we introduce our lightweight, all-in-one automated benchmarking framework that allows users to evaluate LLMs on mobile devices. We provide a comprehensive benchmark of various popular LLMs with different quantization configurations (both weights and activations) across multiple mobile platforms with varying hardware capabilities. Unlike traditional benchmarks that assess full-scale models on high-end GPU clusters, we focus on evaluating resource efficiency (memory and power consumption) and harmful output for compressed models on mobile devices. Our key observations include: i) differences in energy efficiency and throughput across mobile platforms; ii) the impact of quantization on memory usage, GPU execution time, and power consumption; iii) accuracy and performance degradation of quantized models compared to their non-quantized counterparts; and iv) the frequency of hallucinations and toxic content generated by compressed LLMs on mobile devices.
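
The two headline metrics, TTFT and throughput, can be measured with a small timing harness like the one below; `generate` is a hypothetical stand-in for any on-device runtime that streams tokens, and the harness itself is illustrative rather than the framework's code.

```python
import time
import statistics

def benchmark(generate, prompt, runs=5):
    """Measure mean time-to-first-token (s) and decode throughput
    (tokens/s) over several runs of a token-streaming callable."""
    ttfts, rates = [], []
    for _ in range(runs):
        start = time.perf_counter()
        first, count = None, 0
        for _tok in generate(prompt):
            count += 1
            if first is None:
                first = time.perf_counter() - start  # TTFT
        total = time.perf_counter() - start
        ttfts.append(first)
        rates.append(count / total)
    return statistics.mean(ttfts), statistics.mean(rates)

# toy usage with a fake streaming model standing in for a quantized LLM
def fake_model(prompt):
    for tok in prompt.split():
        time.sleep(0.001)
        yield tok

ttft, tps = benchmark(fake_model, "a short test prompt")
print(f"TTFT {ttft * 1e3:.1f} ms, {tps:.0f} tok/s")
```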

JBHI Journal 2025 Journal Article

Schizophrenia Detection Based on Morphometry of Hippocampus and Amygdala

  • Qunxi Dong
  • Yuhang Sheng
  • Junru Zhu
  • Zhigang Li
  • Weijia Liu
  • Jingyu Liu
  • Yalin Wang
  • Bin Hu

Schizophrenia (SZ) is a severe mental disorder characterized by hallucinations, delusions, cognitive impairments, and social withdrawal. It leads to a series of brain abnormalities, particularly the deformation of the hippocampus and amygdala, which are highly associated with emotion, memory, and motivation. Most previous studies have used hippocampal and amygdaloid volume, whereas surface-based morphometry captures nuclear deformation more finely; however, it remains unclear how hippocampal and amygdaloid morphometry relates to schizophrenic pathology and what its potential is as a biomarker. In this study, we extracted individual multivariate morphometry statistics (MMS) of the hippocampus and amygdala from MRI images and analyzed the morphometric differences between groups. After dictionary learning and max pooling, we obtained reduced-dimensional features and used machine learning algorithms for individual diagnosis. The results showed that the hippocampus of the schizophrenia group was significantly atrophied bilaterally and the atrophied areas were symmetrical. Subregions of the amygdala were both atrophied and expanded; in particular, the right amygdala showed a greater degree and extent of deformation. Using a random forest classifier, the classification accuracies with hippocampal and amygdaloid morphometric features were 94.52% and 94.57%, respectively, and the accuracy when combining the two sets of features reached 96.57%. Our study demonstrates the efficacy of MMS in identifying morphometric differences of the hippocampus and amygdala between healthy controls and schizophrenic patients, and these findings emphasize the potential of MMS as a reliable biomarker for the diagnosis of schizophrenia.
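
The final diagnostic stage is a conventional classification pipeline: once dictionary learning and max pooling reduce each subject's MMS to a fixed-length vector, a random forest separates patients from controls. The sketch below uses synthetic data and an assumed 200-dimensional feature size purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 200))    # 120 subjects, pooled morphometric features
y = rng.integers(0, 2, size=120)   # 0 = healthy control, 1 = schizophrenia

clf = RandomForestClassifier(n_estimators=300, random_state=0)
print(f"5-fold CV accuracy: {cross_val_score(clf, X, y, cv=5).mean():.3f}")
```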

ICML Conference 2025 Conference Paper

Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation

  • Jingyu Liu
  • Beidi Chen
  • Ce Zhang 0001

Improving time-to-first-token (TTFT) is an essential objective in modern large language model (LLM) inference engines. Optimizing TTFT directly results in higher maximal QPS and meets the requirements of many critical applications. However, improving TTFT is notoriously challenging since it is compute-bound and the performance bottleneck shifts from the self-attention that many prior works focus on to the MLP part. In this work, we present SpecPrefill, a training-free framework that accelerates inference TTFT for both long and medium context queries based on the following insight: LLMs are generalized enough to preserve quality given only a carefully chosen subset of prompt tokens. At its core, SpecPrefill leverages a lightweight model to speculate locally important tokens based on the context. These tokens, along with the necessary positional information, are then sent to the main model for processing. We evaluate SpecPrefill with a diverse set of tasks, followed by comprehensive benchmarking of the performance improvement both in a real end-to-end setting and in ablation studies. SpecPrefill manages to serve Llama-3.1-405B-Instruct-FP8 with up to 7× maximal end-to-end QPS on real downstream tasks and a 7.66× TTFT improvement.
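
The token-selection step at the heart of SpecPrefill can be sketched in a few lines: keep the highest-scoring prompt tokens together with their original position ids, so the main model sees the pruned prompt in the right places. The scores here are arbitrary inputs standing in for the lightweight speculator's output.

```python
import torch

def select_prefill_tokens(scores, position_ids, keep_ratio=0.3):
    """Keep the top-scoring prompt tokens, preserving original order
    and positional information for the main model's prefill."""
    k = max(1, int(scores.numel() * keep_ratio))
    idx = scores.topk(k).indices.sort().values   # restore prompt order
    return idx, position_ids[idx]

# toy usage: a 12-token prompt with random importance scores
scores, pos = torch.rand(12), torch.arange(12)
idx, kept_pos = select_prefill_tokens(scores, pos)
print(idx.tolist(), kept_pos.tolist())
```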

AAAI Conference 2025 Conference Paper

TC-LLaVA: Rethinking the Transfer of LLaVA from Image to Video Understanding with Temporal Considerations

  • Mingze Gao
  • Jingyu Liu
  • Mingda Li
  • Jiangtao Xie
  • Qingbin Liu
  • Kevin Zhao
  • Xi Chen
  • Hui Xiong

Multimodal Large Language Models (MLLMs) have significantly improved performance across various image-language applications. Recently, there has been a growing interest in adapting image pre-trained MLLMs for video-related tasks. However, most efforts concentrate on enhancing the vision encoder and projector components, while the core part, Large Language Models (LLMs), remains comparatively under-explored. In this paper, we propose two strategies to enhance the model's capability in video understanding tasks by improving inter-layer attention computation in LLMs. Specifically, the first approach focuses on the enhancement of Rotary Position Embedding (RoPE) with Temporal-Aware Dual RoPE, which introduces temporal position information to strengthen the MLLM's temporal modeling capabilities while preserving the relative position relationships of both visual and text tokens. The second approach involves enhancing the Attention Mask with the Frame-wise Block Causal Attention Mask, a simple yet effective method that broadens visual token interactions within and across video frames while maintaining the causal inference mechanism. Based on these proposed methods, we adapt LLaVA for video understanding tasks, naming it Temporal-Considered LLaVA (TC-LLaVA). Our TC-LLaVA achieves new state-of-the-art performance across various video understanding benchmarks with only supervised fine-tuning (SFT) on video-related datasets.
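
The second strategy, the Frame-wise Block Causal Attention Mask, is easy to visualize in code: tokens may attend within their own frame and to all earlier frames, but never to later ones. The sketch below builds such a mask for visual tokens only, with illustrative sizes; how TC-LLaVA interleaves text tokens is not modeled here.

```python
import torch

def block_causal_mask(num_frames, tokens_per_frame):
    """Boolean mask where entry [q, k] is True iff query token q may
    attend key token k, i.e. k's frame is not later than q's frame."""
    n = num_frames * tokens_per_frame
    frame_id = torch.arange(n) // tokens_per_frame
    return frame_id.unsqueeze(1) >= frame_id.unsqueeze(0)

# toy usage: 3 frames x 2 visual tokens -> 6x6 block-causal mask
print(block_causal_mask(3, 2).int())
```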

ICLR Conference 2025 Conference Paper

TRACE: Temporal Grounding Video LLM via Causal Event Modeling

  • Yongxin Guo 0001
  • Jingyu Liu
  • Mingda Li
  • Qingbin Liu
  • Xi Chen 0003
  • Xiaoying Tang 0002

Video Temporal Grounding (VTG) is a crucial capability for video understanding models and plays a vital role in downstream tasks such as video browsing and editing. To effectively handle various tasks simultaneously and enable zero-shot prediction, there is a growing trend in employing video LLMs for VTG tasks. However, current video LLM-based methods rely exclusively on natural language generation, lacking the ability to model the clear structure inherent in videos, which restricts their effectiveness in tackling VTG tasks. To address this issue, this paper first formally introduces the causal event modeling framework, which represents video LLM outputs as sequences of events and predicts the current event using previous events, video inputs, and textual instructions. Each event consists of three components: timestamps, salient scores, and textual captions. We then propose TRACE, a novel task-interleaved video LLM that effectively implements the causal event modeling framework in practice. TRACE processes visual frames, timestamps, salient scores, and text as distinct tasks, employing dedicated encoders and decoding heads for each. Task tokens are arranged in an interleaved sequence according to the causal event modeling framework's formulation. Extensive experiments on various VTG tasks and datasets demonstrate the superior performance of TRACE compared to state-of-the-art video LLMs. Our model and code are available at https://github.com/gyxxyg/TRACE.
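
The event structure the framework describes maps naturally onto a small data type: each event carries a timestamp span, a salience score, and a caption, flattened into an interleaved task sequence. The field names and flattening order below are assumptions for illustration, not TRACE's actual token layout.

```python
from dataclasses import dataclass

@dataclass
class Event:
    start: float      # seconds
    end: float
    salience: float   # salient score in [0, 1]
    caption: str

def to_interleaved_sequence(events):
    """Flatten events into (task, payload) pairs in the interleaved
    order sketched by the abstract: time, then score, then text."""
    seq = []
    for e in events:
        seq += [("time", (e.start, e.end)),
                ("score", e.salience),
                ("text", e.caption)]
    return seq

demo = [Event(0.0, 4.2, 0.9, "a chef plates the dish"),
        Event(4.2, 9.8, 0.4, "diners start eating")]
print(to_interleaved_sequence(demo))
```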

AAAI Conference 2025 Conference Paper

VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding

  • Yongxin Guo
  • Jingyu Liu
  • Mingda Li
  • Dingxin Cheng
  • Xiaoying Tang
  • Dianbo Sui
  • Qingbin Liu
  • Xi Chen

Video Temporal Grounding (VTG) strives to accurately pinpoint event timestamps in a specific video using linguistic queries, significantly impacting downstream tasks like video browsing and editing. Unlike traditional task-specific models, Video Large Language Models (video LLMs) can handle multiple tasks concurrently in a zero-shot manner. Consequently, exploring the application of video LLMs for VTG tasks has become a burgeoning research area. However, despite considerable advancements in video content understanding, video LLMs often struggle to accurately pinpoint timestamps within videos, limiting their effectiveness in VTG tasks. To address this, we introduce VTG-LLM, a model designed to enhance video LLMs' timestamp localization abilities. Our approach includes: (1) effectively integrating timestamp knowledge into visual tokens; (2) incorporating absolute-time tokens to manage timestamp knowledge without concept shifts; and (3) introducing a lightweight, high-performance, slot-based token compression technique designed to accommodate the large number of frames that must be sampled for VTG tasks. Additionally, we present VTG-IT-120K, a collection of publicly available VTG datasets that we have re-annotated to improve upon low-quality annotations. Our comprehensive experiments demonstrate the superior performance of VTG-LLM in comparison to other video LLM methods across a variety of VTG tasks.
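
Slot-based token compression can be read as cross-attention pooling: a fixed set of learned slot queries attends over however many frame tokens were sampled and returns a constant number of tokens. The module below is a generic sketch of that pattern, with assumed dimensions, not VTG-LLM's actual compressor.

```python
import torch
import torch.nn as nn

class SlotCompressor(nn.Module):
    """Compress a variable number of frame tokens into a fixed set of
    slot tokens via cross-attention pooling."""
    def __init__(self, dim=256, num_slots=16, heads=4):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(num_slots, dim))  # learned queries
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, frame_tokens):                  # (B, N, dim), any N
        q = self.slots.expand(frame_tokens.size(0), -1, -1)
        out, _ = self.attn(q, frame_tokens, frame_tokens)
        return out                                    # (B, num_slots, dim)

# toy usage: 96 sampled frame tokens squeezed into 16 slots
print(SlotCompressor()(torch.randn(2, 96, 256)).shape)
```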

JBHI Journal 2024 Journal Article

A Multiview Brain Network Transformer Fusing Individualized Information for Autism Spectrum Disorder Diagnosis

  • Qunxi Dong
  • Hongxin Cai
  • Zhigang Li
  • Jingyu Liu
  • Bin Hu

Functional connectivity (FC) networks, built from analyses of resting-state functional magnetic resonance imaging (rs-fMRI), serve as efficacious biomarkers for identifying patients with Autism Spectrum Disorder (ASD). Given the neurobiological heterogeneity across individuals and the unique presentation of ASD symptoms, fusing individualized information into the diagnosis becomes essential. However, this aspect is overlooked in most methods. Furthermore, existing methods typically focus on studying direct pairwise connections between brain ROIs while disregarding interactions between indirectly connected neighbors. To overcome the above challenges, we build common FC and individualized FC by tangent Pearson embedding (TP) and common orthogonal basis extraction (COBE), respectively, and present a novel multiview brain transformer (MBT) aimed at effectively fusing the common and individualized information of subjects. MBT is mainly constructed from transformer layers with a diffusion kernel (DK), a fusion quality-inspired weighting module (FQW), a similarity loss, and an orthonormal clustering fusion readout module (OCFRead). The DK transformer incorporates higher-order random walk methods to capture wider interactions among indirectly connected brain regions. FQW promotes adaptive fusion of features between views, and the similarity loss and OCFRead are placed on the last layer to accomplish the ultimate integration of information. In our method, the TP, DK, and FQW modules all help to model wider connectivity in the brain, making up for the shortcomings of traditional methods. We conducted experiments on the public ABIDE dataset using the AAL and CC200 atlases. Our framework shows promising results, outperforming state-of-the-art methods on both templates. This suggests its potential as a valuable approach for clinical ASD diagnosis.
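
The tangent Pearson embedding (TP) step rests on a standard construction: project each subject's Pearson correlation matrix onto a tangent space via the matrix logarithm. The sketch below logs the regularized correlation matrix at the identity reference point, an assumption that simplifies the paper's exact TP formulation.

```python
import numpy as np

def tangent_embedding(ts, eps=1e-3):
    """Matrix-log projection of an ROI x ROI Pearson correlation
    matrix onto the tangent space at the identity."""
    corr = np.corrcoef(ts)                              # ROI x ROI correlations
    corr = (1 - eps) * corr + eps * np.eye(len(corr))   # regularize to keep SPD
    lam, U = np.linalg.eigh(corr)
    return (U * np.log(lam)) @ U.T                      # log of an SPD matrix

# toy usage: 90 ROIs, 200 rs-fMRI-like time points
print(tangent_embedding(np.random.randn(90, 200)).shape)
```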

TMLR Journal 2024 Journal Article

How Far Are We From AGI: Are LLMs All We Need?

  • Tao Feng
  • Chuanyang Jin
  • Jingyu Liu
  • Kunlun Zhu
  • Haoqin Tu
  • Zirui Cheng
  • Guanyu Lin
  • Jiaxuan You

The evolution of artificial intelligence (AI) has profoundly impacted human society, driving significant advancements in multiple sectors. Yet, the escalating demands on AI have highlighted the limitations of AI’s current offerings, catalyzing a movement towards Artificial General Intelligence (AGI). AGI, distinguished by its ability to execute diverse real-world tasks with efficiency and effectiveness comparable to human intelligence, reflects a paramount milestone in AI evolution. While existing studies have reviewed specific advancements in AI and proposed potential paths to AGI, such as large language models (LLMs), they fall short of providing a thorough exploration of AGI’s definitions, objectives, and developmental trajectories. Unlike previous survey papers, this work goes beyond summarizing LLMs by addressing key questions about our progress toward AGI and outlining the strategies essential for its realization through comprehensive analysis, in-depth discussions, and novel insights. We start by articulating the requisite capability frameworks for AGI, integrating the internal, interface, and system dimensions. As the realization of AGI requires more advanced capabilities and adherence to stringent constraints, we further discuss necessary AGI alignment technologies to harmonize these factors. Notably, we emphasize the importance of approaching AGI responsibly by first defining the key levels of AGI progression, followed by the evaluation framework that situates the status quo, and finally giving our roadmap of how to reach the pinnacle of AGI. Moreover, to give tangible insights into the ubiquitous impact of the integration of AI, we outline existing challenges and potential pathways toward AGI in multiple domains. In sum, serving as a pioneering exploration into the current state and future trajectory of AGI, this paper aims to foster a collective comprehension and catalyze broader public discussions among researchers and practitioners on AGI.

ICML Conference 2024 Conference Paper

Perfect Alignment May be Poisonous to Graph Contrastive Learning

  • Jingyu Liu
  • Huayi Tang
  • Yong Liu 0018

Graph Contrastive Learning (GCL) aims to learn node representations by aligning positive pairs and separating negative ones. However, few researchers have focused on the inner law behind the specific augmentations used in graph-based learning. What kind of augmentation helps downstream performance, how does contrastive learning actually influence downstream tasks, and why does the magnitude of augmentation matter so much? This paper seeks to address these questions by establishing a connection between augmentation and downstream performance. Our findings reveal that GCL contributes to downstream tasks mainly by separating different classes rather than gathering nodes of the same class, so perfect alignment and augmentation overlap, which draw all intra-class samples to the same point, cannot fully explain the success of contrastive learning. Therefore, to understand how augmentation aids the contrastive learning process, we conduct further investigations into generalization, finding that perfect alignment, which draws positive pairs to the same point, can help the contrastive loss but is poisonous to generalization. As a result, perfect alignment may not lead to the best downstream performance, so specifically designed augmentation is needed to achieve appropriate alignment and improve downstream accuracy. We further analyze the results through information theory and graph spectral theory and propose two simple but effective methods to verify the theories. The two methods can easily be applied to various GCL algorithms, and extensive experiments are conducted to prove their effectiveness. The code is available at https://github.com/somebodyhh1/GRACEIS
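
The alignment-versus-generalization trade-off the paper analyzes is usually made concrete with the standard alignment and uniformity diagnostics, sketched below; these metrics illustrate what "perfect alignment" means rather than reproduce the paper's two proposed methods.

```python
import torch
import torch.nn.functional as F

def alignment(z1, z2):
    """Mean squared distance between positive pairs; zero means the
    two augmented views collapse to identical embeddings."""
    return (z1 - z2).pow(2).sum(dim=1).mean()

def uniformity(z, t=2.0):
    """Log mean Gaussian potential over all pairs; lower values mean
    embeddings spread more uniformly on the hypersphere."""
    return torch.pdist(z).pow(2).mul(-t).exp().mean().log()

# toy usage: two noisy views of 128 node embeddings on the unit sphere
z1 = F.normalize(torch.randn(128, 64), dim=1)
z2 = F.normalize(z1 + 0.1 * torch.randn(128, 64), dim=1)
print(alignment(z1, z2).item(), uniformity(z1).item())
```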

AAAI Conference 2024 Conference Paper

WaveNet: Tackling Non-stationary Graph Signals via Graph Spectral Wavelets

  • Zhirui Yang
  • Yulan Hu
  • Sheng Ouyang
  • Jingyu Liu
  • Shuqiang Wang
  • Xibo Ma
  • Wenhan Wang
  • Hanjing Su

In existing spectral GNNs, polynomial-based methods dominate the design of filters through the Laplacian matrix. However, polynomial combinations factored by the Laplacian matrix are inherently limited in message passing (e.g., over-smoothing). Furthermore, most existing spectral GNNs rely on polynomial bases, which struggle to capture the high-frequency parts of the graph spectral signal. We also find that even increasing the polynomial order does not change this situation, which means polynomial-based models have a natural deficiency when facing high-frequency signals. To tackle these problems, we propose WaveNet, which aims to effectively capture the high-frequency part of the graph spectral signal from the perspective of wavelet bases by reconstructing the message propagation matrix. We utilize Multi-Resolution Analysis (MRA) to formulate this problem, and our proposed method can theoretically reconstruct arbitrary filters. We also conduct node classification experiments on real-world graph benchmarks and achieve superior performance on most datasets. Our code is available at https://github.com/Bufordyang/WaveNet
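
A wavelet-style spectral filter differs from a low-order polynomial in that its frequency response can peak away from zero. The sketch below eigendecomposes the normalized Laplacian and applies a band-pass kernel g(sλ) = sλ·exp(−sλ); it is a generic MRA-flavored illustration, not WaveNet's learned reconstruction.

```python
import numpy as np

def wavelet_filter(adj, scale=1.0):
    """Build a filtered propagation matrix U diag(g(s*lam)) U^T from
    the normalized Laplacian, with a band-pass wavelet kernel g."""
    deg = adj.sum(axis=1)
    d_inv = 1.0 / np.sqrt(np.maximum(deg, 1e-12))       # D^{-1/2}, guard zeros
    lap = np.eye(len(adj)) - d_inv[:, None] * adj * d_inv[None, :]
    lam, U = np.linalg.eigh(lap)                        # graph spectrum
    g = scale * lam * np.exp(-scale * lam)              # peaks at lam = 1/scale
    return (U * g) @ U.T

# toy usage: a 4-node path graph
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
print(wavelet_filter(A).round(3))
```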