Author name cluster

Nai Ding

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
1 author row

Possible papers

NeurIPS Conference 2025 Conference Paper

Hierarchical Frequency Tagging Probe (HFTP): A Unified Approach to Investigate Syntactic Structure Representations in Large Language Models and the Human Brain

  • Jingmin An
  • Yilong Song
  • Ruolin Yang
  • Nai Ding
  • Lingxi Lu
  • Yuxuan Wang
  • Wei Wang
  • Chu Zhuang

Large Language Models (LLMs) demonstrate human-level or even superior language abilities, effectively modeling syntactic structures, yet the specific computational units responsible remain unclear. A key question is whether LLM behavioral capabilities stem from mechanisms akin to those in the human brain. To address these questions, we introduce the Hierarchical Frequency Tagging Probe (HFTP), a tool that utilizes frequency-domain analysis to identify neuron-wise components of LLMs (e.g., individual Multilayer Perceptron (MLP) neurons) and cortical regions (via intracranial recordings) encoding syntactic structures. Our results show that models such as GPT-2, Gemma, Gemma 2, Llama 2, Llama 3.1, and GLM-4 process syntax in analogous layers, while the human brain relies on distinct cortical regions for different syntactic levels. Representational similarity analysis reveals a stronger alignment between LLM representations and the left hemisphere of the brain (dominant in language processing). Notably, upgraded models exhibit divergent trends: Gemma 2 shows greater brain similarity than Gemma, while Llama 3.1 shows less alignment with the brain compared to Llama 2. These findings offer new insights into the interpretability of LLM behavioral improvements, raising questions about whether these advancements are driven by human-like or non-human-like mechanisms, and establish HFTP as a valuable tool bridging computational linguistics and cognitive neuroscience. This project is available at https://github.com/LilTiger/HFTP.
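The abstract above names representational similarity analysis without describing the procedure. The sketch below shows one generic form of that technique, not the authors' pipeline: the data are synthetic, and the array shapes, the 1 - Pearson r dissimilarity, and the helper names (rdm, upper_triangle, spearman) are all assumptions made for illustration.

```python
import numpy as np

def rdm(patterns):
    """Dissimilarity matrix (1 - Pearson r) for a conditions x features array."""
    return 1.0 - np.corrcoef(patterns)

def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries of a square matrix as a vector."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def spearman(a, b):
    """Spearman rank correlation; assumes continuous data without ties."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]

# Synthetic data (hypothetical): 12 conditions driven by a shared 5-d latent code.
rng = np.random.default_rng(0)
latent = rng.normal(size=(12, 5))
model_acts = latent @ rng.normal(size=(5, 256)) + 0.1 * rng.normal(size=(12, 256))
brain_acts = latent @ rng.normal(size=(5, 128)) + 0.1 * rng.normal(size=(12, 128))
unrelated = rng.normal(size=(12, 128))  # no shared structure with the model

rsa_shared = spearman(upper_triangle(rdm(model_acts)), upper_triangle(rdm(brain_acts)))
rsa_unrelated = spearman(upper_triangle(rdm(model_acts)), upper_triangle(rdm(unrelated)))
print(f"shared: {rsa_shared:.2f}, unrelated: {rsa_unrelated:.2f}")
```

Comparing dissimilarity matrices rather than raw activations is what lets two systems with different numbers of units (here 256 and 128 features) be compared at all.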

YNIMG Journal 2024 Journal Article

Cortical encoding of hierarchical linguistic information when syllabic rhythms are obscured by echoes

  • Cheng Luo
  • Nai Ding

In speech perception, low-frequency cortical activity tracks hierarchical linguistic units (e.g., syllables, phrases, and sentences) on top of acoustic features (e.g., speech envelope). Since the fluctuation of speech envelope typically corresponds to the syllabic boundaries, one common interpretation is that the acoustic envelope underlies the extraction of discrete syllables from continuous speech for subsequent linguistic processing. However, it remains unclear whether and how cortical activity encodes linguistic information when the speech envelope does not provide acoustic correlates of syllables. To address the issue, we introduced a frequency-tagging speech stream where the syllabic rhythm was obscured by echoic envelopes and investigated neural encoding of hierarchical linguistic information using electroencephalography (EEG). When listeners attended to the echoic speech, cortical activity showed reliable tracking of syllable, phrase, and sentence levels, among which the higher-level linguistic units elicited more robust neural responses. When attention was diverted from the echoic speech, reliable neural tracking of the syllable level was also observed in contrast to deteriorated neural tracking of the phrase and sentence levels. Further analyses revealed that the envelope aligned with the syllabic rhythm could be recovered from the echoic speech through a neural adaptation model, and the reconstructed envelope yielded higher predictive power for the neural tracking responses than either the original echoic envelope or anechoic envelope. Taken together, these results suggest that neural adaptation and attentional modulation jointly contribute to neural encoding of linguistic information in distorted speech where the syllabic rhythm is obscured by echoes.

YNIMG Journal 2024 Journal Article

Exploring the clinical diagnostic value of linguistic learning ability in patients with disorders of consciousness using electrooculography

  • Xiangyue Xiao
  • Junhua Ding
  • Mingyan Yu
  • Zhicai Dong
  • Sara Cruz
  • Nai Ding
  • Charlène Aubinet
  • Steven Laureys

For patients with disorders of consciousness (DoC), accurate assessment of residual consciousness levels and cognitive abilities is critical for developing appropriate rehabilitation interventions. In this study, we investigated the potential of electrooculography (EOG) in assessing language processing abilities and consciousness levels. Patients' EOG data and related electrophysiological data were analysed before and after explicit language learning. The results showed distinct differences in vocabulary learning patterns among patients with varying levels of consciousness. Minimally conscious patients showed significant neural tracking of artificial words and notable learning effects similar to those observed in healthy controls, whereas patients with unresponsive wakefulness syndrome did not show such effects. Correlation analysis further indicated that EOG detected vocabulary learning effects with comparable validity to electroencephalography, reinforcing the credibility of the EOG indicator as a diagnostic tool. Critically, EOG also revealed significant correlations between individual patients' linguistic learning performance and their oromotor/verbal function as assessed through behavioural scales. In conclusion, this study explored the differences in language processing abilities among patients with varying consciousness levels. By demonstrating the utility of EOG in evaluating consciousness and detecting vocabulary learning effects, as well as its potential to guide personalised rehabilitation, our findings indicate that EOG indicators show promise as a rapid, accurate and effective additional tool for diagnosing and managing patients with DoC.

AAAI Conference 2023 Conference Paper

Adjective Scale Probe: Can Language Models Encode Formal Semantics Information?

  • Wei Liu
  • Ming Xiang
  • Nai Ding

It is an open question what semantic representations transformer-based language models can encode and whether they have access to more abstract aspects of semantic meaning. Here, we propose a diagnostic dataset to investigate how well language models understand the degree semantics of adjectives. In the dataset, referred to as the Adjective Scale Probe (ASP), we semi-automatically generate 8 tests of Natural Language Inference (NLI) questions to test 8 key capabilities of adjective interpretation. We apply the ASP dataset to evaluate the performance of 3 language models, i.e., BERT, DeBERTa, and T0. It is found that language models perform below the majority baseline for most tests of the ASP, even when the models have been fine-tuned to achieve high performance on the large-scale MNLI dataset. But after we fine-tune the pre-trained models on a subset of the ASP, DeBERTa can achieve high performance on the untrained adjectives and untrained tests, suggesting that DeBERTa may have captured degree semantic information of adjectives through pre-training but it needs specific training data to learn how to apply such information to the current tasks. In sum, the ASP provides an easy-to-use method to test fine-grained formal semantic properties of adjectives, and reveals language models' abilities to access formal semantic information.

YNIMG Journal 2023 Journal Article

Dual interaction between heartbeat-evoked responses and stimuli

  • Yihui Zhang
  • Jianfeng Zhang
  • Musi Xie
  • Nai Ding
  • Yang Zhang
  • Pengmin Qin

Heartbeat-evoked responses (HERs) can interact with external stimuli and play a crucial role in shaping perception, self-related processes, and emotional processes. On the one hand, the external stimulus could modulate HERs. On the other hand, the HERs could affect cognitive processing of the external stimulus. Whether the same neural mechanism underlies these two processes, however, remains unclear. Here, we investigated this interactive mechanism by measuring HERs using magnetoencephalography (MEG) and two name perception tasks. Specifically, we tested (1) how hearing a subject's own name (SON) modulates HERs and (2) how the judgment of an SON is biased by prestimulus HERs. The results showed a dual interaction between HERs and SON. In particular, SON can modulate HERs for heartbeats occurring from 200 to 1200 ms after SON presentation. In addition, prestimulus HERs can bias the SON judgment when a stimulus is presented. Importantly, MEG activities from these two types of interactions differed in spatial and temporal patterns, suggesting that they may be associated with distinct neural pathways. These findings extend our understanding of brain-heart interactions.

YNIMG Journal 2022 Journal Article

Asymmetrical cross-modal influence on neural encoding of auditory and visual features in natural scenes

  • Wenyuan Yu
  • Wenhui Sun
  • Nai Ding

Natural scenes contain multi-modal information, which is integrated to form a coherent perception. Previous studies have demonstrated that cross-modal information can modulate neural encoding of low-level sensory features. These studies, however, mostly focus on the processing of single sensory events or rhythmic sensory sequences. Here, we investigate how the neural encoding of basic auditory and visual features is modulated by cross-modal information when the participants watch movie clips primarily composed of non-rhythmic events. We presented audiovisual congruent and audiovisual incongruent movie clips, and since attention can modulate cross-modal interactions, we separately analyzed high- and low-arousal movie clips. We recorded neural responses using electroencephalography (EEG), and employed the temporal response function (TRF) to quantify the neural encoding of auditory and visual features. The neural encoding of sound envelope is stronger in the audiovisual congruent condition than in the incongruent condition, but this effect is only significant for high-arousal movie clips. In contrast, audiovisual congruency does not significantly modulate the neural encoding of visual features, e.g., luminance or visual motion. In summary, our findings demonstrate asymmetrical cross-modal interactions during the processing of natural scenes that lack rhythmicity: Congruent visual information enhances low-level auditory processing, while congruent auditory information does not significantly modulate low-level visual processing.
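The temporal response function mentioned in the abstract above is commonly estimated as a regularized linear filter that maps a stimulus feature onto the recorded response. The sketch below illustrates that general idea with ridge regression on synthetic data. The lag count, ridge value, noise level, and function names (lagged_design, fit_trf) are assumptions for illustration and do not reproduce the study's analysis.

```python
import numpy as np

def lagged_design(stimulus, n_lags):
    """Design matrix whose column k holds the stimulus delayed by k samples."""
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = stimulus[:n - lag]
    return X

def fit_trf(stimulus, response, n_lags, ridge=1.0):
    """Ridge-regression estimate of the filter mapping stimulus to response."""
    X = lagged_design(stimulus, n_lags)
    gram = X.T @ X + ridge * np.eye(n_lags)
    return np.linalg.solve(gram, X.T @ response)

# Synthetic example (hypothetical): a known filter plus additive noise.
rng = np.random.default_rng(0)
stimulus = rng.normal(size=5000)  # stands in for a feature such as the sound envelope
true_trf = np.array([0.0, 0.5, 1.0, 0.5, 0.0, -0.3, 0.0, 0.0])
response = lagged_design(stimulus, len(true_trf)) @ true_trf
response = response + 0.5 * rng.normal(size=stimulus.size)

estimated = fit_trf(stimulus, response, n_lags=len(true_trf))
print(np.round(estimated, 2))
```

In practice the ridge parameter is usually chosen by cross-validation, and encoding strength is often summarized as the correlation between predicted and held-out responses.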

YNIMG Journal 2022 Journal Article

Delta-band neural activity primarily tracks sentences instead of semantic properties of words

  • Yuhan Lu
  • Peiqing Jin
  • Xunyi Pan
  • Nai Ding

Human language is generally combinatorial: Words are combined into sentences to flexibly convey meaning. How the brain represents sentences, however, remains debated. Recently, it has been shown that delta-band cortical activity correlates with the sentential structure of speech. It remains debated, however, whether delta-band cortical tracking of sentences truly reflects mental representations of sentences or is caused by neural encoding of semantic properties of individual words. The current study investigates whether delta-band neural tracking of speech can be explained by semantic properties of individual words. Cortical activity is recorded using electroencephalography (EEG) when participants listen to sentences repeating at 1 Hz and word lists. The semantic properties of individual words, simulated using a word2vec model, predict a stronger 1 Hz response to word lists than to sentences. When listeners perform a word-monitoring task that does not require sentential processing, the 1 Hz response to word lists, however, is much weaker than the 1 Hz response to sentences, contradicting the prediction of the lexical semantics model. When listeners are explicitly asked to parse word lists into multi-word chunks, however, cortical activity can reliably track the multi-word chunks. Taken together, these results suggest that delta-band neural responses to speech cannot be fully explained by the semantic properties of single words and are potentially related to the neural representation of multi-word chunks.
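The 1 Hz response described above is a frequency-tagging measure: when sentences repeat at a fixed rate, neural tracking of them appears as a spectral peak at that rate. The sketch below shows one way such a peak might be quantified on a simulated signal. The sampling rate, duration, amplitudes, neighbor-bin normalization, and the function name tagged_response_snr are assumptions for illustration, not the study's parameters.

```python
import numpy as np

def tagged_response_snr(signal, fs, target_hz, n_neighbors=4):
    """Power at target_hz divided by the mean power of neighboring frequency bins."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - target_hz)))  # bin closest to the target
    lo = max(k - n_neighbors, 1)                   # skip the DC bin
    hi = min(k + n_neighbors, len(power) - 1)
    neighbors = np.r_[power[lo:k], power[k + 1:hi + 1]]
    return power[k] / neighbors.mean()

# Simulated recording (hypothetical): a weak 1 Hz component buried in noise.
rng = np.random.default_rng(0)
fs, duration = 100, 40                             # Hz, seconds
t = np.arange(fs * duration) / fs
eeg = 0.5 * np.sin(2 * np.pi * 1.0 * t) + rng.normal(0.0, 1.0, t.size)

snr_tagged = tagged_response_snr(eeg, fs, target_hz=1.0)
snr_control = tagged_response_snr(eeg, fs, target_hz=2.5)  # no tagged rhythm here
print(f"1 Hz: {snr_tagged:.1f}, 2.5 Hz control: {snr_control:.1f}")
```

Choosing a duration that contains a whole number of cycles places the tagged frequency exactly on an FFT bin, which avoids spectral leakage into the neighboring bins used for normalization.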

IJCAI Conference 2022 Conference Paper

On Tracking Dialogue State by Inheriting Slot Values in Mentioned Slot Pools

  • Zhoujian Sun
  • Zhengxing Huang
  • Nai Ding

Dialogue state tracking (DST) is a component of the task-oriented dialogue system. It is responsible for extracting and managing slots, where each slot represents a part of the information needed to accomplish a task, and the slot value is updated recurrently in each dialogue turn. However, many DST models cannot update slot values appropriately. These models may repeatedly inherit wrong slot values extracted in previous turns, resulting in the failure of the entire DST task. They cannot update indirectly mentioned slots well, either. This study designed a model with a mentioned slot pool (MSP) to tackle the update problem. The MSP is a slot-specific memory that records all mentioned slot values that may be inherited, and our model updates slot values according to the MSP and the dialogue context. Our model rejects inheriting the previous slot value when it predicts the value is wrong. Then, it extracts the slot value from the current dialogue context. As the contextual information accumulates, the new value is more likely to be correct. It can also track the indirectly mentioned slot by picking a value from the MSP. Experimental results showed our model reached state-of-the-art DST performance on MultiWOZ datasets.

YNIMG Journal 2022 Journal Article

Working memory asymmetrically modulates auditory and linguistic processing of speech

  • Yiguang Liu
  • Cheng Luo
  • Jing Zheng
  • Junying Liang
  • Nai Ding

Working memory load can modulate speech perception. However, since speech perception and working memory are both complex functions, it remains elusive how each component of the working memory system interacts with each speech processing stage. To investigate this issue, we concurrently measure how the working memory load modulates neural activity tracking three levels of linguistic units, i.e., syllables, phrases, and sentences, using a multiscale frequency-tagging approach. Participants engage in a sentence comprehension task and the working memory load is manipulated by asking them to memorize either auditory verbal sequences or visual patterns. It is found that verbal and visual working memory load modulate speech processing in similar manners: Higher working memory load attenuates neural activity tracking of phrases and sentences but enhances neural activity tracking of syllables. Since verbal and visual WM load similarly influence the neural responses to speech, such influences may derive from the domain-general component of WM system. More importantly, working memory load asymmetrically modulates lower-level auditory encoding and higher-level linguistic processing of speech, possibly reflecting reallocation of attention induced by mnemonic load.

YNIMG Journal 2020 Journal Article

Visual target detection in a distracting background relies on neural encoding of both visual targets and background

  • Cheng Luo
  • Nai Ding

The ability to detect visual targets in a complex background varies across individuals and is affected by factors such as stimulus saliency and top-down attention. Here, we investigated how the saliency of visual background (naturalistic cartoon video vs. blank screen) and top-down attention (single vs. dual tasks) separately affect individual ability to detect visual targets. Behaviorally, we found that target detection accuracy decreased and reaction time increased when the background was salient or during dual tasking. The EEG response to visual background was recorded using a novel stimulus tagging technique. This response was strongest in occipital electrodes and was sensitive to background saliency but not dual tasking. In contrast, the event-related potential (ERP) evoked by the visual target was strongest in central electrodes, and was affected by both background saliency and dual tasking. With a cartoon background, the EEG responses to visual targets, presented in the central visual field, and the EEG responses to peripheral visual background could both predict individual target detection performance. When these two responses were combined, better prediction was achieved. These results suggest that neural processing of visual targets and background jointly contributes to individual visual target detection performance.

YNIMG Journal 2019 Journal Article

Auditory and language contributions to neural encoding of speech features in noisy environments

  • Jiajie Zou
  • Jun Feng
  • Tianyong Xu
  • Peiqing Jin
  • Cheng Luo
  • Jianfeng Zhang
  • Xunyi Pan
  • Feiyan Chen

Recognizing speech in noisy environments is a challenging task that involves both auditory and language mechanisms. Previous studies have demonstrated human auditory cortex can reliably track the temporal envelope of speech in noisy environments, which provides a plausible neural basis for noise-robust speech recognition. The current study aimed at teasing apart auditory and language contributions to noise-robust envelope tracking by comparing the neural responses of 2 groups of listeners, i.e., native listeners and foreign listeners who did not understand the testing language. In the experiment, speech signals were mixed with spectrally matched stationary noise at 4 intensity levels and listeners’ neural responses were recorded using electroencephalography (EEG). When the noise intensity increased, the neural response gain increased in both groups of listeners, demonstrating auditory gain control. Language comprehension generally reduced the response gain and envelope-tracking precision, and modulated the spatial and temporal profile of envelope-tracking activity. Based on the spatio-temporal dynamics of envelope-tracking activity, a linear classifier can jointly decode the 2 listener groups and 4 levels of noise intensity. Altogether, the results showed that without feedback from language processing, auditory mechanisms such as gain control can lead to a noise-robust speech representation. High-level language processing modulated the spatio-temporal profile of the neural representation of speech envelope, instead of generally enhancing the envelope representation.

YNIMG Journal 2017 Journal Article

Time-domain analysis of neural tracking of hierarchical linguistic structures

  • Wen Zhang
  • Nai Ding

When listening to continuous speech, cortical activity measured by MEG concurrently follows the rhythms of multiple linguistic structures, e.g., syllables, phrases, and sentences. This phenomenon was previously characterized in the frequency domain. Here, we investigate the waveform of neural activity tracking linguistic structures in the time domain and quantify the coherence of neural response phases over subjects listening to the same stimulus. These analyses are achieved by decomposing the multi-channel MEG recordings into components that maximize the correlation between neural response waveforms across listeners. Each MEG component can be viewed as the recording from a virtual sensor that is spatially tuned to a cortical network showing coherent neural activity over subjects. This analysis reveals information not available from previous frequency-domain analysis of MEG global field power: First, concurrent neural tracking of hierarchical linguistic structures emerges at the beginning of the stimulus, rather than slowly building up after repetitions of the same sentential structure. Second, neural tracking of the sentential structure is reflected by slow neural fluctuations, rather than, e.g., a series of short-lasting transient responses at sentential boundaries. Lastly and most importantly, it shows that the MEG responses tracking the syllabic rhythm are spatially separable from the MEG responses tracking the sentential and phrasal rhythms.

YNIMG Journal 2014 Journal Article

Robust cortical entrainment to the speech envelope relies on the spectro-temporal fine structure

  • Nai Ding
  • Monita Chatterjee
  • Jonathan Z. Simon

Speech recognition is robust to background noise. One underlying neural mechanism is that the auditory system segregates speech from the listening background and encodes it reliably. Such robust internal representation has been demonstrated in auditory cortex by neural activity entrained to the temporal envelope of speech. A paradox, however, then arises, as the spectro-temporal fine structure rather than the temporal envelope is known to be the major cue to segregate target speech from background noise. Does the reliable cortical entrainment in fact reflect a robust internal “synthesis” of the attended speech stream rather than direct tracking of the acoustic envelope? Here, we test this hypothesis by degrading the spectro-temporal fine structure while preserving the temporal envelope using vocoders. Magnetoencephalography (MEG) recordings reveal that cortical entrainment to vocoded speech is severely degraded by background noise, in contrast to the robust entrainment to natural speech. Furthermore, cortical entrainment in the delta-band (1–4 Hz) predicts the speech recognition score at the level of individual listeners. These results demonstrate that reliable cortical entrainment to speech relies on the spectro-temporal fine structure, and suggest that cortical entrainment to the speech envelope is not merely a representation of the speech envelope but a coherent representation of multiscale spectro-temporal features that are synchronized to the syllabic and phrasal rhythms of speech.