Author name cluster

Si Wei

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers

2 author rows

AAAI Conference 2026 Conference Paper

CARE-Bench: A Benchmark of Diverse Client Simulations Guided by Expert Principles for Evaluating LLMs in Psychological Counseling

Bichen Wang
Yixin Sun
Junzhe Wang
Hao Yang
Xing Fu
Yanyan Zhao
Si Wei
Shijin Wang

The mismatch between the growing demand for psychological counseling and the limited availability of services has motivated research into the application of Large Language Models (LLMs) in this domain. Consequently, there is a need for a robust and unified benchmark to assess the counseling competence of various LLMs. Existing works, however, are limited by unprofessional client simulation, static question-and-answer evaluation formats, and unidimensional metrics. These limitations hinder their effectiveness in assessing a model's comprehensive ability to handle diverse and complex clients. To address this gap, we introduce CARE-Bench, a dynamic and interactive automated benchmark. It is built upon diverse client profiles derived from real-world counseling cases and simulated according to expert guidelines. CARE-Bench provides a multidimensional performance evaluation grounded in established psychological scales. Using CARE-Bench, we evaluate several general-purpose LLMs and specialized counseling models, revealing their current limitations. In collaboration with psychologists, we conduct a detailed analysis of the reasons for LLMs' failures when interacting with clients of different types, which provides directions for developing more comprehensive, universal, and effective counseling models.

NeurIPS Conference 2025 Conference Paper

How Does Sequence Modeling Architecture Influence Base Capabilities of Pre-trained Language Models? Exploring Key Architecture Design Principles to Avoid Base Capabilities Degradation

Xin Lu
Yanyan Zhao
Si Wei
Shijin Wang
Bing Qin
Ting Liu

Pre-trained language models represented by the Transformer have been proven to possess strong base capabilities, and the representative self-attention mechanism in the Transformer has become a classic in sequence modeling architectures. Unlike work that proposes sequence modeling architectures to improve the efficiency of the attention mechanism, this work focuses on the impact of sequence modeling architectures on base capabilities. Specifically, our concern is: How exactly do sequence modeling architectures affect the base capabilities of pre-trained language models? In this work, we first point out that the mixed domain pre-training setting commonly adopted in existing architecture design works fails to adequately reveal the differences in base capabilities among various architectures. To address this, we propose a limited domain pre-training setting with out-of-distribution testing, which successfully uncovers significant differences in base capabilities among architectures at an early stage. Next, we analyze the base capabilities of stateful sequence modeling architectures, and find that they exhibit significant degradation in base capabilities compared to the Transformer. Then, through a series of architecture component analyses, we summarize a key architecture design principle: A sequence modeling architecture needs to possess full-sequence arbitrary selection capability to avoid degradation in base capabilities. Finally, we empirically validate this principle using an extremely simple Top-1 element selection architecture and further generalize it to a more practical Top-1 chunk selection architecture. Experimental results demonstrate our proposed sequence modeling architecture design principle and suggest that our work can serve as a valuable reference for future architecture improvements and novel designs.

ICML Conference 2020 Conference Paper

A Tree-Structured Decoder for Image-to-Markup Generation

Jianshu Zhang 0001
Jun Du 0002
Yongxin Yang
Yi-Zhe Song
Si Wei
Lirong Dai 0001

Recent encoder-decoder approaches typically employ string decoders to convert images into serialized strings for image-to-markup. However, for tree-structured representational markup, string representations can hardly cope with the structural complexity. In this work, we first show via a set of toy problems that string decoders struggle to decode tree structures, especially as structural complexity increases. We then propose a tree-structured decoder that specifically aims at generating a tree-structured markup. Our decoder works sequentially, where at each step a child node and its parent node are simultaneously generated to form a sub-tree. This sub-tree is consequently used to construct the final tree structure in a recurrent manner. The key to the success of our tree decoder is twofold: (i) it strictly respects the parent-child relationship of trees, and (ii) it explicitly outputs trees as opposed to a linear string. Evaluated on both math formula recognition and chemical formula recognition, the proposed tree decoder is shown to greatly outperform strong string decoder baselines.

IJCAI Conference 2020 Conference Paper

End-to-End Transition-Based Online Dialogue Disentanglement

Hui Liu
Zhan Shi
Jia-Chen Gu
Quan Liu
Si Wei
Xiaodan Zhu

Dialogue disentanglement aims to separate intermingled messages into detached sessions. The existing research focuses on two-step architectures, in which a model first retrieves the relationships between two messages and then divides the message stream into separate clusters. Almost all existing work puts significant effort into selecting features for message-pair classification and clustering, while ignoring the semantic coherence within each session. In this paper, we introduce the first end-to-end transition-based model for online dialogue disentanglement. Our model captures the sequential information of each session as the online algorithm proceeds on processing a dialogue. The coherence in a session is hence modeled when messages are sequentially added into their best-matching sessions. Meanwhile, the research field still lacks data for studying end-to-end dialogue disentanglement, so we construct a large-scale dataset by extracting coherent dialogues from online movie scripts. We evaluate our model on both the dataset we developed and the publicly available Ubuntu IRC dataset [Kummerfeld et al., 2019]. The results show that our model significantly outperforms the existing algorithms. Further experiments demonstrate that our model better captures the sequential semantics and obtains more coherent disentangled sessions.

AAAI Conference 2018 Conference Paper

Exercise-Enhanced Sequential Modeling for Student Performance Prediction

Yu Su
Qingwen Liu
Qi Liu
Zhenya Huang
Yu Yin
Enhong Chen
Chris Ding
Si Wei

In online education systems, for offering proactive services to students (e.g., personalized exercise recommendation), a crucial demand is to predict student performance (e.g., scores) on future exercising activities. Existing prediction methods mainly exploit the historical exercising records of students, where each exercise is usually represented as the manually labeled knowledge concepts, and the richer information contained in the text descriptions of exercises is still underexplored. In this paper, we propose a novel Exercise-Enhanced Recurrent Neural Network (EERNN) framework for student performance prediction by taking full advantage of both student exercising records and the text of each exercise. Specifically, for modeling the student exercising process, we first design a bidirectional LSTM to learn each exercise representation from its text description without any expertise and information loss. Then, we propose a new LSTM architecture to trace student states (i.e., knowledge states) in their sequential exercising process with the combination of exercise representations. For making final predictions, we design two strategies under EERNN, i.e., EERNNM with Markov property and EERNNA with Attention mechanism. Extensive experiments on large-scale real-world data clearly demonstrate the effectiveness of the EERNN framework. Moreover, by incorporating the exercise correlations, EERNN can deal well with the cold start problems from both student and exercise perspectives.

IJCAI Conference 2017 Conference Paper

Cause-Effect Knowledge Acquisition and Neural Association Model for Solving A Set of Winograd Schema Problems

Quan Liu
Hui Jiang
Andrew Evdokimov
Zhen-Hua Ling
Xiaodan Zhu
Si Wei
Yu Hu

This paper investigates the Winograd Schema (WS), a challenging problem which has been proposed for measuring progress in commonsense reasoning. Due to the lack of commonsense knowledge and training data, very little work has been found on the WS problems in recent years. Actually, there is no shortcut to solve this problem except to collect more commonsense knowledge and design suitable models. Therefore, this paper addresses a set of WS problems by proposing a knowledge acquisition method and a general neural association model. To avoid the sparseness issue, the knowledge we aim to collect is the cause-effect relationships between thousands of commonly used words. The knowledge acquisition method enables us to extract hundreds of thousands of cause-effect pairs from a large text corpus automatically. Meanwhile, a neural association model (NAM) is proposed to encode the association relationships between any two discrete events. Based on the extracted knowledge and the NAM models, in this paper, we successfully build a system for solving WS problems from scratch and achieve 70.0% accuracy. Most importantly, this paper provides a flexible framework to solve WS problems based on event association and neural network methods.

AAAI Conference 2017 Conference Paper

Question Difficulty Prediction for READING Problems in Standard Tests

Zhenya Huang
Qi Liu
Enhong Chen
Hongke Zhao
Mingyong Gao
Si Wei
Yu Su
Guoping Hu

Standard tests aim to evaluate the performance of examinees using different tests with consistent difficulties. Thus, a critical demand is to predict the difficulty of each test question before the test is conducted. Existing studies are usually based on the judgments of education experts (e.g., teachers), which may be subjective and labor intensive. In this paper, we propose a novel Test-aware Attention-based Convolutional Neural Network (TACNN) framework to automatically solve this Question Difficulty Prediction (QDP) task for READING problems (a typical problem style in English tests) in standard tests. Specifically, given the abundant historical test logs and text materials of questions, we first design a CNN-based architecture to extract sentence representations for the questions. Then, we utilize an attention strategy to qualify the difficulty contribution of each sentence to questions. Considering the incomparability of question difficulties in different tests, we propose a test-dependent pairwise strategy for training TACNN and generating the difficulty prediction value. Extensive experiments on a real-world dataset not only show the effectiveness of TACNN, but also give interpretable insights to track the attention information for questions.

IJCAI Conference 2016 Conference Paper

Distraction-Based Neural Networks for Modeling Document

Qian Chen
Xiaodan Zhu
ZhenHua Ling
Si Wei
Hui Jiang

Distributed representation learned with neural networks has recently been shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences. Whether and how such an approach can be extended to help model larger spans of text, e.g., documents, is intriguing, and further investigation would still be desirable. This paper aims to enhance neural network models for such a purpose. A typical problem of document-level modeling is automatic summarization, which aims to model documents in order to generate summaries. In this paper, we propose neural models to train computers not just to pay attention to specific regions and content of input documents with attention models, but also to distract them to traverse between different content of a document so as to better grasp the overall meaning for summarization. Without engineering any features, we train the models on two large datasets. The models achieve state-of-the-art performance, and they significantly benefit from the distraction modeling, particularly when input documents are long.
