Arrow Research search

Author name cluster

Wen Shen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
1 author row

Possible papers


JBHI Journal 2025 Journal Article

Infusing Multi-Hop Medical Knowledge Into Smaller Language Models for Biomedical Question Answering

  • Jing Chen
  • Zhihua Wei
  • Wen Shen
  • Rui Shang

MedQA-USMLE is a challenging biomedical question answering (BQA) task, as its questions typically involve multi-hop reasoning. To solve this task, BQA systems should possess not only extensive medical professional knowledge but also strong medical reasoning capabilities. While state-of-the-art larger language models, such as Med-PaLM 2, have overcome this challenge, smaller language models (SLMs) still struggle with it. To bridge this gap, we introduce a multi-hop medical knowledge infusion (MHMKI) procedure to endow SLMs with medical reasoning capabilities. Specifically, we categorize MedQA-USMLE questions into distinct reasoning types, then tailor pre-training instances for each question type using the semi-structured information and hyperlinks of Wikipedia articles. To enable SLMs to efficiently capture the multi-hop knowledge contained in these instances, we design a reasoning chain masked language model to further pre-train BERT models. Moreover, we convert the pre-training instances into a composite question answering dataset for intermediate fine-tuning of GPT models. We evaluate MHMKI on six SLMs across five datasets spanning three BQA tasks. The results demonstrate that MHMKI consistently improves SLMs' performance, particularly on tasks requiring substantial medical reasoning. For instance, accuracy on MedQA-USMLE increases by 5.3% on average.

NeurIPS Conference 2025 Conference Paper

Interpreting Arithmetic Reasoning in Large Language Models using Game-Theoretic Interactions

  • 蕾蕾 温
  • Liwei Zheng
  • Hongda Li
  • Lijun Sun
  • Zhihua Wei
  • Wen Shen

In recent years, large language models (LLMs) have made significant advancements in arithmetic reasoning. However, the internal mechanism of how LLMs solve arithmetic problems remains unclear. In this paper, we propose explaining arithmetic reasoning in LLMs using game-theoretic interactions. Specifically, we disentangle the output score of the LLM into numerous interactions between the input words. We quantify different types of interactions encoded by LLMs during forward propagation to explore the internal mechanism of LLMs for solving arithmetic problems. We find that (1) the internal mechanism of LLMs for solving simple one-operator arithmetic problems is their capability to encode operand-operator interactions and high-order interactions from input samples. Additionally, we find that LLMs with weak one-operator arithmetic capabilities focus more on background interactions. (2) The internal mechanism of LLMs for solving relatively complex two-operator arithmetic problems is their capability to encode operator interactions and operand interactions from input samples. (3) We explain the task-specific nature of the LoRA method from the perspective of interactions.
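In related work from this research line, the "interactions between the input words" are commonly defined as Harsanyi (AND) interactions, i.e. the Möbius transform of the model's output over subsets of input words. The sketch below assumes that definition; it is not the authors' code. `v` is a hypothetical callable standing in for the LLM's output score when only the words with indices in `T` are kept and the others are masked, and `harsanyi_interactions` is an illustrative name.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet

# Hypothetical value function: v(T) = model output score when only the
# words whose indices are in T are kept (all other words masked).
ValueFn = Callable[[FrozenSet[int]], float]


def harsanyi_interactions(n: int, v: ValueFn) -> Dict[FrozenSet[int], float]:
    """Compute I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T), for every S.

    Enumerates all subsets of {0, ..., n-1} and their subsets, i.e. O(3^n)
    terms, so this is only feasible for a handful of input words.
    """
    cache: Dict[FrozenSet[int], float] = {}

    def value(T: FrozenSet[int]) -> float:
        # Memoize v, since each subset T appears inside many supersets S.
        if T not in cache:
            cache[T] = v(T)
        return cache[T]

    interactions: Dict[FrozenSet[int], float] = {}
    for size in range(n + 1):
        for s in combinations(range(n), size):
            total = 0.0
            for k in range(size + 1):
                for t in combinations(s, k):
                    total += (-1) ** (size - k) * value(frozenset(t))
            interactions[frozenset(s)] = total
    return interactions


if __name__ == "__main__":
    # Toy score: words 0 and 1 add 2.0 only when both are present;
    # word 2 adds 1.0 on its own.
    toy_v = lambda T: 2.0 * (0 in T and 1 in T) + 1.0 * (2 in T)
    I = harsanyi_interactions(3, toy_v)
    for S, val in I.items():
        if abs(val) > 1e-12:
            print(sorted(S), val)
    # The interactions sum back to the full-input score v({0, 1, 2}).
    print(sum(I.values()), toy_v(frozenset({0, 1, 2})))
```

On the toy score, the only non-zero interactions are a pairwise one between words 0 and 1 and a single-word one for word 2, and all interactions sum to the full-input score. Because enumeration is exponential in the number of words, any practical use would have to restrict attention to a small number of input words.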

AAAI Conference 2024 Conference Paper

Batch Normalization Is Blind to the First and Second Derivatives of the Loss

  • Zhanpeng Zhou
  • Wen Shen
  • Huixin Chen
  • Ling Tang
  • Yuefeng Chen
  • Quanshi Zhang

We prove that, in the Taylor series expansion of the loss function, the batch normalization (BN) operation blocks the influence of the first-order term and most of the influence of the second-order term of the loss. We also find that this problem is caused by the standardization phase of the BN operation. We believe that proving the blocking of certain loss terms provides an analytic perspective on potential defects of a deep model with BN operations, although the blocking problem does not necessarily cause significant damage in every task on benchmark datasets. Experiments show that the BN operation significantly affects feature representations in specific tasks.
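A simplified illustration (not the paper's proof) of how the standardization phase can cancel part of the first-order term: assuming a single feature standardized over a batch with the biased standard deviation and no epsilon, the gradient back-propagated through standardization is orthogonal to the all-ones (shift) direction and to the input itself (scale direction). Hence the first-order term of the loss vanishes for perturbations of the pre-BN feature along those two directions. The function names are hypothetical.

```python
import numpy as np


def standardize(x: np.ndarray) -> np.ndarray:
    """Standardization phase of BN for one feature over a batch (biased std, no eps)."""
    return (x - x.mean()) / x.std()


def standardize_backward(x: np.ndarray, grad_z: np.ndarray) -> np.ndarray:
    """Return dL/dx given dL/dz, where z = standardize(x)."""
    sigma = x.std()
    z = (x - x.mean()) / sigma
    # Mean removal subtracts the gradient's component along the all-ones
    # direction; variance normalization subtracts its component along z.
    return (grad_z - grad_z.mean() - z * (grad_z * z).mean()) / sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = 3.0 * rng.normal(size=64) + 5.0   # pre-BN activations of one feature
    g = rng.normal(size=64)                # arbitrary upstream gradient dL/dz
    dx = standardize_backward(x, g)
    # First-order loss change for a shift x -> x + c*1 and a scaling x -> (1+c)*x:
    print("shift term:", dx.sum())        # ~0 up to rounding
    print("scale term:", dx @ x)          # ~0 up to rounding
```

The backward formula is the standard BN gradient without the affine parameters; the learnable scale and shift that follow standardization in a full BN layer are deliberately omitted here.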

AAAI Conference 2024 Conference Paper

Clarifying the Behavior and the Difficulty of Adversarial Training

  • Xu Cheng
  • Hao Zhang
  • Yue Xin
  • Wen Shen
  • Quanshi Zhang

Adversarial training is usually difficult to optimize. This paper provides conceptual and analytic insights into the difficulty of adversarial training via a simple theoretical study, in which we derive approximate dynamics of a recursive multi-step attack in a simple setting. Despite the simplicity of our theory, it still yields verifiable predictions about various phenomena in adversarial training under real-world settings. First, compared to vanilla training, adversarial training is more likely to exponentially boost the influence of input samples with large gradient norms. Second, adversarial training strengthens the influence of the Hessian matrix of the loss w.r.t. network parameters, which makes network parameters more likely to oscillate and increases the difficulty of adversarial training.

AAAI Conference 2024 Conference Paper

Explaining Generalization Power of a DNN Using Interactive Concepts

  • Huilin Zhou
  • Hao Zhang
  • Huiqi Deng
  • Dongrui Liu
  • Wen Shen
  • Shih-Han Chan
  • Quanshi Zhang

This paper explains the generalization power of a deep neural network (DNN) from the perspective of interactions. Although there is no universally accepted definition of the concepts encoded by a DNN, the sparsity of interactions in a DNN has been proved, i.e., the output score of a DNN can be well explained by a small number of interactions between input variables. Thus, to some extent, such interactions can be regarded as interactive concepts encoded by the DNN. Building on this, we derive an analytic explanation of the inconsistency of concepts of different complexities. This may shed new light on using the generalization power of concepts to explain the generalization power of the entire DNN. Besides, we discover that a DNN with stronger generalization power usually learns simple concepts more quickly and encodes fewer complex concepts. We also discover the detouring dynamics of learning complex concepts, which explain both the high learning difficulty and the low generalization power of complex concepts. The code will be released when the paper is accepted.

YNIMG Journal 2023 Journal Article

The iron burden of cerebral microbleeds contributes to brain atrophy through the mediating effect of white matter hyperintensity

  • Ke Lv
  • Yanzhen Liu
  • Yongsheng Chen
  • Sagar Buch
  • Ying Wang
  • Zhuo Yu
  • Huiying Wang
  • Chenxi Zhao

The goal of this work was to explore the total iron burden of cerebral microbleeds (CMBs) using semi-automatic quantitative susceptibility mapping (QSM) and to establish its effect on brain atrophy through the mediating effect of white matter hyperintensities (WMH). A total of 95 community-dwelling people were enrolled. QSM combined with a dynamic programming algorithm (DPA) was used to measure the characteristics of 1309 CMBs. WMH were evaluated according to the Fazekas scale, and brain atrophy was assessed using a 2D linear measurement method. Histogram analysis was used to explore the distributions of CMB susceptibility, volume, and total iron burden, while correlation analysis was used to explore the relationship between volume and susceptibility. Stepwise regression analysis was used to analyze the risk factors for CMBs and their contribution to brain atrophy. Mediation analysis was used to explore the interrelationship between CMBs and brain atrophy. We found that the frequency distribution of CMB susceptibility was Gaussian, with a mean of 201 ppb and a standard deviation of 84 ppb, whereas the volume and total iron burden of CMBs were closer to Rician distributions. A weak but significant correlation between the susceptibility and volume of CMBs was found (r = -0.113, P < 0.001). Periventricular WMH (PVWMH) was a risk factor for the presence of CMBs (number: β = 0.251, P = 0.014; volume: β = 0.237, P = 0.042; total iron burden: β = 0.238, P = 0.020) and for brain atrophy (third ventricle width: β = 0.325, P = 0.001; Evans's index: β = 0.323, P = 0.001). PVWMH had a significant mediating effect on the correlation between CMBs and brain atrophy. In conclusion, QSM along with the DPA can measure the total iron burden of CMBs. PVWMH might be a risk factor for CMBs and may mediate the effect of CMBs on brain atrophy.

AIIM Journal 2022 Journal Article

Brain gray matter nuclei segmentation on quantitative susceptibility mapping using dual-branch convolutional neural network

  • Chao Chai
  • Pengchong Qiao
  • Bin Zhao
  • Huiying Wang
  • Guohua Liu
  • Hong Wu
  • Wen Shen
  • Chen Cao

Abnormal iron accumulation in the brain subcortical nuclei has been reported to be correlated with various neurodegenerative diseases, and it can be measured through the magnetic susceptibility obtained from quantitative susceptibility mapping (QSM). To quantitatively measure the magnetic susceptibility, the nuclei must be accurately segmented, which is a tedious task for clinicians. In this paper, we proposed a dual-branch residual-structured U-Net (DB-ResUNet) based on a 3D convolutional neural network (CNN) to automatically segment such brain gray matter nuclei. Because of memory limits, 3D-CNN-based methods typically use image patches instead of the whole volumetric image, which ignores the spatial contextual information of neighboring patches and therefore reduces accuracy. To better trade off segmentation accuracy and memory efficiency, the proposed DB-ResUNet incorporated patches with different resolutions. By jointly using QSM and 3D T1-weighted imaging (T1WI) as inputs, the proposed method achieved better segmentation accuracy than its single-branch counterpart, the conventional atlas-based method, and classical 3D CNN structures. The susceptibility values and volumes were also measured, and the measurements from the proposed DB-ResUNet were highly correlated with the values from the manually annotated regions of interest.

IJCAI Conference 2021 Conference Paper

Interpretable Compositional Convolutional Neural Networks

  • Wen Shen
  • Zhihua Wei
  • Shikun Huang
  • Binbin Zhang
  • Jiaqi Fan
  • Ping Zhao
  • Quanshi Zhang

This paper proposes a method to modify a traditional convolutional neural network (CNN) into an interpretable compositional CNN, in order to learn filters that encode meaningful visual patterns in intermediate convolutional layers. In a compositional CNN, each filter is supposed to consistently represent a specific compositional object part or image region with a clear meaning. The compositional CNN learns from image labels for classification without any annotations of parts or regions for supervision. Our method can be broadly applied to different types of CNNs. Experiments have demonstrated the effectiveness of our method. The code will be released when the paper is accepted.

NeurIPS Conference 2021 Conference Paper

Interpreting Representation Quality of DNNs for 3D Point Cloud Processing

  • Wen Shen
  • Qihan Ren
  • Dongrui Liu
  • Quanshi Zhang

In this paper, we evaluate the quality of knowledge representations encoded in deep neural networks (DNNs) for 3D point cloud processing. We propose a method to disentangle the overall model vulnerability into the sensitivity to rotation, translation, scale, and local 3D structures. We also propose metrics to evaluate the spatial smoothness of encoding 3D structures and the representation complexity of the DNN. Based on such analysis, experiments expose representation problems with classic DNNs and explain the utility of adversarial training. The code will be released when this paper is accepted.

AAAI Conference 2019 Conference Paper

Multi-Winner Contests for Strategic Diffusion in Social Networks

  • Wen Shen
  • Yang Feng
  • Cristina V. Lopes

Strategic diffusion encourages participants to take active roles in promoting stakeholders’ agendas by rewarding successful referrals. As social media continues to transform the way people communicate, strategic diffusion has become a powerful tool for stakeholders to influence people’s decisions or behaviors toward desired objectives. Existing reward mechanisms for strategic diffusion are usually either vulnerable to false-name attacks or not individually rational for participants who have made successful referrals. Here, we introduce a novel multi-winner contests (MWC) mechanism for strategic diffusion in social networks. The MWC mechanism satisfies several desirable properties, including false-name-proofness, individual rationality, budget constraint, monotonicity, and subgraph constraint. Numerical experiments on four real-world social network datasets demonstrate that stakeholders can significantly boost participants’ aggregated efforts with proper design of competitions. Our work sheds light on how to design manipulation-resistant mechanisms with appropriate contests.

AAMAS Conference 2018 Conference Paper

Information Design in Crowdfunding under Thresholding Policies

  • Wen Shen
  • Jacob W. Crandall
  • Ke Yan
  • Cristina V. Lopes

Crowdfunding has emerged as a prominent way for entrepreneurs to secure funding without sophisticated intermediation. In crowdfunding, an entrepreneur often has to decide how to disclose the campaign status in order to collect as many contributions as possible. Such decisions are difficult to make, primarily because of incomplete information. We propose information design as a tool to help the entrepreneur improve revenue by influencing backers’ beliefs. We introduce a heuristic algorithm to dynamically compute information-disclosure policies for the entrepreneur, followed by an empirical evaluation demonstrating its competitiveness against the widely adopted immediate-disclosure policy. Our results show that, despite its ease of implementation, the immediate-disclosure policy is not optimal when backers follow thresholding policies. With appropriate heuristics, an entrepreneur can benefit from dynamic information disclosure. Our work sheds light on information design in a dynamic setting where agents make decisions using thresholding policies.

YNICL Journal 2017 Journal Article

Decreased susceptibility of major veins in mild traumatic brain injury is correlated with post-concussive symptoms: A quantitative susceptibility mapping study

  • Chao Chai
  • Rui Guo
  • Chao Zuo
  • Linlin Fan
  • Saifeng Liu
  • Tianyi Qian
  • E. Mark Haacke
  • Shuang Xia

Cerebral venous oxygen saturation (SvO2) is an important biomarker of brain function. In this study, we aimed to explore the relative changes of regional cerebral SvO2 among axonal injury (AI) patients, non-AI patients, and healthy controls (HCs) using quantitative susceptibility mapping (QSM). Forty-eight patients and 32 HCs were enrolled. The patients were divided into two groups depending on imaging-based evidence of AI. QSM was used to measure the susceptibility of major cerebral veins. Nonparametric tests were performed for susceptibility differences among the non-AI patient group, the AI patient group, and the healthy control group. Correlations were computed between the susceptibility of major cerebral veins, elapsed time post trauma (ETPT), and post-concussive symptom scores. ROC analysis was performed to assess the diagnostic efficiency of susceptibility in discriminating mild traumatic brain injury (mTBI) patients from HCs. The susceptibility of the straight sinus in non-AI and AI patients was significantly lower than that in HCs (P < 0.001 and P = 0.004, respectively, Bonferroni corrected), which may indicate an increased regional cerebral SvO2 in patients. The susceptibility of the straight sinus in non-AI patients positively correlated with ETPT (r = 0.573, P = 0.003, FDR corrected), whereas that in AI patients negatively correlated with the Rivermead Post Concussion Symptoms Questionnaire scores (r = −0.582, P = 0.018, FDR corrected). The sensitivity, specificity, and AUC of susceptibility for discriminating between mTBI patients and HCs were 88%, 69%, and 0.84, respectively. In conclusion, the susceptibility of the straight sinus can be used as a biomarker to monitor the progress of mTBI and to differentiate mTBI patients from healthy controls.

IJCAI Conference 2016 Conference Paper

An Online Mechanism for Ridesharing in Autonomous Mobility-on-Demand Systems

  • Wen Shen
  • Cristina V. Lopes
  • Jacob W. Crandall

With proper management, Autonomous Mobility-on-Demand (AMoD) systems have great potential to satisfy the transport demand of urban populations by providing safe, convenient, and affordable ridesharing services. Meanwhile, such systems can substantially decrease private car ownership and use, and thus significantly reduce traffic congestion, energy consumption, and carbon emissions. To achieve this objective, an AMoD system requires private information about passenger demand. However, because passengers are self-interested, they are unlikely to cooperate with service providers in this regard. Therefore, an online mechanism is desirable if it incentivizes passengers to truthfully report their actual demand. To promote ridesharing, we introduce a posted-price, integrated online ridesharing mechanism (IORS) that satisfies desirable properties such as ex-post incentive compatibility, individual rationality, and budget balance. Numerical results indicate the competitiveness of IORS compared with two benchmarks, namely the optimal assignment and an offline, auction-based mechanism.