Arrow Research search

Author name cluster

Si Wei

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

CARE-Bench: A Benchmark of Diverse Client Simulations Guided by Expert Principles for Evaluating LLMs in Psychological Counseling

  • Bichen Wang
  • Yixin Sun
  • Junzhe Wang
  • Hao Yang
  • Xing Fu
  • Yanyan Zhao
  • Si Wei
  • Shijin Wang

The mismatch between the growing demand for psychological counseling and the limited availability of services has motivated research into the application of Large Language Models (LLMs) in this domain. Consequently, there is a need for a robust and unified benchmark to assess the counseling competence of various LLMs. Existing works, however, are limited by unprofessional client simulation, static question-and-answer evaluation formats, and unidimensional metrics. These limitations hinder their effectiveness in assessing a model's comprehensive ability to handle diverse and complex clients. To address this gap, we introduce CARE-Bench, a dynamic and interactive automated benchmark. It is built upon diverse client profiles derived from real-world counseling cases and simulated according to expert guidelines. CARE-Bench provides a multidimensional performance evaluation grounded in established psychological scales. Using CARE-Bench, we evaluate several general-purpose LLMs and specialized counseling models, revealing their current limitations. In collaboration with psychologists, we conduct a detailed analysis of the reasons for LLMs' failures when interacting with clients of different types, which provides directions for developing more comprehensive, universal, and effective counseling models.

NeurIPS Conference 2025 Conference Paper

How Does Sequence Modeling Architecture Influence Base Capabilities of Pre-trained Language Models? Exploring Key Architecture Design Principles to Avoid Base Capabilities Degradation

  • Xin Lu
  • Yanyan Zhao
  • Si Wei
  • Shijin Wang
  • Bing Qin
  • Ting Liu

Pre-trained language models represented by the Transformer have been proven to possess strong base capabilities, and the Transformer's self-attention mechanism has become a classic sequence modeling architecture. Unlike work that proposes new sequence modeling architectures to improve the efficiency of the attention mechanism, this work focuses on the impact of sequence modeling architectures on base capabilities. Specifically, our concern is: How exactly do sequence modeling architectures affect the base capabilities of pre-trained language models? In this work, we first point out that the mixed domain pre-training setting commonly adopted in existing architecture design works fails to adequately reveal the differences in base capabilities among various architectures. To address this, we propose a limited domain pre-training setting with out-of-distribution testing, which successfully uncovers significant differences in base capabilities among architectures at an early stage. Next, we analyze the base capabilities of stateful sequence modeling architectures, and find that they exhibit significant degradation in base capabilities compared to the Transformer. Then, through a series of architecture component analyses, we summarize a key architecture design principle: a sequence modeling architecture needs to possess full-sequence arbitrary selection capability to avoid degradation in base capabilities. Finally, we empirically validate this principle using an extremely simple Top-1 element selection architecture and further generalize it to a more practical Top-1 chunk selection architecture. Experimental results support our proposed sequence modeling architecture design principle and suggest that our work can serve as a valuable reference for future architecture improvements and novel designs.

ICML Conference 2020 Conference Paper

A Tree-Structured Decoder for Image-to-Markup Generation

  • Jianshu Zhang 0001
  • Jun Du 0002
  • Yongxin Yang
  • Yi-Zhe Song
  • Si Wei
  • Lirong Dai 0001

Recent encoder-decoder approaches typically employ string decoders to convert images into serialized strings for image-to-markup. However, for tree-structured representational markup, string representations can hardly cope with the structural complexity. In this work, we first show via a set of toy problems that string decoders struggle to decode tree structures, especially as structural complexity increases. We then propose a tree-structured decoder that specifically aims at generating a tree-structured markup. Our decoder works sequentially, where at each step a child node and its parent node are simultaneously generated to form a sub-tree. This sub-tree is subsequently used to construct the final tree structure in a recurrent manner. The key to the success of our tree decoder is twofold: (i) it strictly respects the parent-child relationship of trees, and (ii) it explicitly outputs trees as opposed to a linear string. Evaluated on both math formula recognition and chemical formula recognition, the proposed tree decoder is shown to greatly outperform strong string decoder baselines.
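The step-wise construction described in this abstract can be illustrated with a minimal Python sketch. It assumes the decoder has already emitted a sequence of (child symbol, parent index) pairs; the `Node` class, `build_tree`, `to_string`, and the toy formula are hypothetical names and data chosen for illustration, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One markup symbol in the output tree (hypothetical structure)."""
    symbol: str
    children: List["Node"] = field(default_factory=list)


def build_tree(steps: List[Tuple[str, Optional[int]]]) -> Node:
    """Attach each emitted child to its already-generated parent.

    Each step is (child_symbol, parent_index), where parent_index refers to
    an earlier step, or None for the root. This mimics emitting a child and
    its parent together at every decoding step.
    """
    nodes: List[Node] = []
    root: Optional[Node] = None
    for symbol, parent_index in steps:
        node = Node(symbol)
        if parent_index is None:
            if root is not None:
                raise ValueError("only one root is allowed")
            root = node
        else:
            if not 0 <= parent_index < len(nodes):
                raise ValueError("parent must be generated before its child")
            nodes[parent_index].children.append(node)
        nodes.append(node)
    if root is None:
        raise ValueError("no root emitted")
    return root


def to_string(node: Node) -> str:
    """Serialize the tree in a bracketed prefix form for inspection."""
    if not node.children:
        return node.symbol
    inner = ",".join(to_string(child) for child in node.children)
    return f"{node.symbol}({inner})"


# Toy example: a fraction whose numerator is x with child 2, denominator y.
steps = [("frac", None), ("x", 0), ("2", 1), ("y", 0)]
tree = build_tree(steps)
print(to_string(tree))
```

Because every child is attached to an explicit parent, an invalid structure (a child whose parent has not been generated) is rejected at construction time, which a flat string decoder cannot guarantee.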

IJCAI Conference 2020 Conference Paper

End-to-End Transition-Based Online Dialogue Disentanglement

  • Hui Liu
  • Zhan Shi
  • Jia-Chen Gu
  • Quan Liu
  • Si Wei
  • Xiaodan Zhu

Dialogue disentanglement aims to separate intermingled messages into detached sessions. The existing research focuses on two-step architectures, in which a model first retrieves the relationships between two messages and then divides the message stream into separate clusters. Almost all existing work puts significant effort into selecting features for message-pair classification and clustering, while ignoring the semantic coherence within each session. In this paper, we introduce the first end-to-end transition-based model for online dialogue disentanglement. Our model captures the sequential information of each session as the online algorithm proceeds through a dialogue. The coherence in a session is hence modeled when messages are sequentially added into their best-matching sessions. Meanwhile, the research field still lacks data for studying end-to-end dialogue disentanglement, so we construct a large-scale dataset by extracting coherent dialogues from online movie scripts. We evaluate our model on both the dataset we developed and the publicly available Ubuntu IRC dataset [Kummerfeld et al., 2019]. The results show that our model significantly outperforms the existing algorithms. Further experiments demonstrate that our model better captures the sequential semantics and obtains more coherent disentangled sessions.
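The online procedure of adding each message to its best-matching session can be sketched as below. This is only a toy illustration: the paper uses a learned neural model, whereas this sketch swaps in a word-overlap (Jaccard) score, and the `disentangle` function, the `threshold` parameter, and the sample messages are all assumptions made for the example.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap score standing in for a learned matching model."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def disentangle(messages: List[str], threshold: float = 0.2) -> List[List[str]]:
    """Process messages in arrival order (online).

    Each message either joins the existing session it matches best, or
    starts a new session when no score reaches the threshold.
    """
    sessions: List[List[str]] = []
    vocab: List[Set[str]] = []  # running word set per session
    for message in messages:
        tokens = set(message.lower().split())
        scores = [jaccard(tokens, seen) for seen in vocab]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            sessions[best].append(message)
            vocab[best] |= tokens  # session state grows as messages are added
        else:
            sessions.append([message])
            vocab.append(set(tokens))
    return sessions


messages = [
    "my ubuntu install fails at boot",
    "anyone seen the new movie yet",
    "does the boot fail on every ubuntu install",
    "the movie was great",
]
for session in disentangle(messages):
    print(session)
```

The per-session state (`vocab` here) is what lets later decisions depend on everything already placed in a session, rather than on isolated message pairs.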

AAAI Conference 2018 Conference Paper

Exercise-Enhanced Sequential Modeling for Student Performance Prediction

  • Yu Su
  • Qingwen Liu
  • Qi Liu
  • Zhenya Huang
  • Yu Yin
  • Enhong Chen
  • Chris Ding
  • Si Wei

In online education systems, for offering proactive services to students (e.g., personalized exercise recommendation), a crucial demand is to predict student performance (e.g., scores) on future exercising activities. Existing prediction methods mainly exploit the historical exercising records of students, where each exercise is usually represented as the manually labeled knowledge concepts, and the richer information contained in the text descriptions of exercises is still underexplored. In this paper, we propose a novel Exercise-Enhanced Recurrent Neural Network (EERNN) framework for student performance prediction by taking full advantage of both student exercising records and the text of each exercise. Specifically, for modeling the student exercising process, we first design a bidirectional LSTM to learn each exercise representation from its text description without any expertise and information loss. Then, we propose a new LSTM architecture to trace student states (i.e., knowledge states) in their sequential exercising process with the combination of exercise representations. For making final predictions, we design two strategies under EERNN, i.e., EERNNM with Markov property and EERNNA with Attention mechanism. Extensive experiments on large-scale real-world data clearly demonstrate the effectiveness of the EERNN framework. Moreover, by incorporating the exercise correlations, EERNN can deal well with the cold start problems from both student and exercise perspectives.

IJCAI Conference 2017 Conference Paper

Cause-Effect Knowledge Acquisition and Neural Association Model for Solving A Set of Winograd Schema Problems

  • Quan Liu
  • Hui Jiang
  • Andrew Evdokimov
  • Zhen-Hua Ling
  • Xiaodan Zhu
  • Si Wei
  • Yu Hu

This paper investigates the Winograd Schema (WS), a challenging problem which has been proposed for measuring progress in commonsense reasoning. Due to the lack of commonsense knowledge and training data, very little work has been found on the WS problems in recent years. Actually, there is no shortcut to solve this problem except to collect more commonsense knowledge and design suitable models. Therefore, this paper addresses a set of WS problems by proposing a knowledge acquisition method and a general neural association model. To avoid the sparseness issue, the knowledge we aim to collect is the cause-effect relationships between thousands of commonly used words. The knowledge acquisition method allows us to extract hundreds of thousands of cause-effect pairs from a large text corpus automatically. Meanwhile, a neural association model (NAM) is proposed to encode the association relationships between any two discrete events. Based on the extracted knowledge and the NAM models, we successfully build a system for solving WS problems from scratch and achieve 70.0% accuracy. Most importantly, this paper provides a flexible framework to solve WS problems based on event association and neural network methods.

AAAI Conference 2017 Conference Paper

Question Difficulty Prediction for READING Problems in Standard Tests

  • Zhenya Huang
  • Qi Liu
  • Enhong Chen
  • Hongke Zhao
  • Mingyong Gao
  • Si Wei
  • Yu Su
  • Guoping Hu

Standard tests aim to evaluate the performance of examinees using different tests with consistent difficulties. Thus, a critical demand is to predict the difficulty of each test question before the test is conducted. Existing studies are usually based on the judgments of education experts (e.g., teachers), which may be subjective and labor intensive. In this paper, we propose a novel Test-aware Attention-based Convolutional Neural Network (TACNN) framework to automatically solve this Question Difficulty Prediction (QDP) task for READING problems (a typical problem style in English tests) in standard tests. Specifically, given the abundant historical test logs and text materials of questions, we first design a CNN-based architecture to extract sentence representations for the questions. Then, we utilize an attention strategy to qualify the difficulty contribution of each sentence to questions. Considering the incomparability of question difficulties in different tests, we propose a test-dependent pairwise strategy for training TACNN and generating the difficulty prediction value. Extensive experiments on a real-world dataset not only show the effectiveness of TACNN, but also give interpretable insights to track the attention information for questions.
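One way to read the test-dependent pairwise strategy: since difficulties are only comparable within a single test, the training signal can be built from pairs of questions drawn from the same test, comparing the predicted difficulty gap with the observed one. The sketch below is an illustration under that reading; the squared-difference form, the `pairwise_loss` function, and the sample data are assumptions, not the paper's confirmed formulation.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# (test_id, observed_difficulty, predicted_difficulty); hypothetical layout
Question = Tuple[str, float, float]


def pairwise_loss(questions: List[Question]) -> float:
    """Sum squared errors between observed and predicted difficulty gaps.

    Only pairs of questions from the same test are compared, so difficulty
    values from different tests are never measured against each other.
    """
    by_test: Dict[str, List[Tuple[float, float]]] = {}
    for test_id, observed, predicted in questions:
        by_test.setdefault(test_id, []).append((observed, predicted))

    total = 0.0
    for items in by_test.values():
        for (obs_i, pred_i), (obs_j, pred_j) in combinations(items, 2):
            total += ((obs_i - obs_j) - (pred_i - pred_j)) ** 2
    return total


# Predictions shifted by a constant within test "A" still preserve the gap.
consistent = [("A", 0.75, 0.5), ("A", 0.5, 0.25), ("B", 0.25, 1.0)]
# Predictions that reverse the ordering within test "A" are penalized.
reversed_order = [("A", 0.75, 0.25), ("A", 0.5, 0.5)]

print(pairwise_loss(consistent))
print(pairwise_loss(reversed_order))
```

Under this formulation a constant offset in the predictions for a whole test costs nothing, which matches the idea that only relative difficulty within a test is meaningful.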

IJCAI Conference 2016 Conference Paper

Distraction-Based Neural Networks for Modeling Document

  • Qian Chen
  • Xiaodan Zhu
  • ZhenHua Ling
  • Si Wei
  • Hui Jiang

Distributed representation learned with neural networks has recently been shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences. Whether and how such an approach can be extended to help model larger spans of text, e.g., documents, is intriguing, and further investigation would still be desirable. This paper aims to enhance neural network models for such a purpose. A typical problem of document-level modeling is automatic summarization, which aims to model documents in order to generate summaries. In this paper, we propose neural models to train computers not just to pay attention to specific regions and content of input documents with attention models, but also to distract them to traverse between different content of a document so as to better grasp the overall meaning for summarization. Without engineering any features, we train the models on two large datasets. The models achieve state-of-the-art performance, and they significantly benefit from the distraction modeling, particularly when input documents are long.