Arrow Research search

Author name cluster

Shijin Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers
1 author row

Possible papers


AAAI 2026 · Conference Paper

BLADE: A Behavior-Level Data Augmentation Framework with Dual Fusion Modeling for Multi-Behavior Sequential Recommendation

  • Yupeng Li
  • Mingyue Cheng
  • Yucong Luo
  • Yitong Zhou
  • Qingyang Mao
  • Shijin Wang

Multi-behavior sequential recommendation aims to capture users' dynamic interests by modeling diverse types of user interactions over time. Although several studies have explored this setting, the recommendation performance remains suboptimal, mainly due to two fundamental challenges: the heterogeneity of user behaviors and data sparsity. To address these challenges, we propose BLADE, a framework that enhances multi-behavior modeling while mitigating data sparsity. Specifically, to handle behavior heterogeneity, we introduce a dual item-behavior fusion architecture that incorporates behavior information at both the input and intermediate levels, enabling preference modeling from multiple perspectives. To mitigate data sparsity, we design three behavior-level data augmentation methods that operate directly on behavior sequences rather than core item sequences. These methods generate diverse augmented views while preserving the semantic consistency of item sequences. These augmented views further enhance representation learning and generalization via contrastive learning. Experiments on three real-world datasets demonstrate the effectiveness of our approach.

AAAI 2026 · Conference Paper

CARE-Bench: A Benchmark of Diverse Client Simulations Guided by Expert Principles for Evaluating LLMs in Psychological Counseling

  • Bichen Wang
  • Yixin Sun
  • Junzhe Wang
  • Hao Yang
  • Xing Fu
  • Yanyan Zhao
  • Si Wei
  • Shijin Wang

The mismatch between the growing demand for psychological counseling and the limited availability of services has motivated research into the application of Large Language Models (LLMs) in this domain. Consequently, there is a need for a robust and unified benchmark to assess the counseling competence of various LLMs. Existing works, however, are limited by unprofessional client simulation, static question-and-answer evaluation formats, and unidimensional metrics. These limitations hinder their effectiveness in assessing a model's comprehensive ability to handle diverse and complex clients. To address this gap, we introduce CARE-Bench, a dynamic and interactive automated benchmark. It is built upon diverse client profiles derived from real-world counseling cases and simulated according to expert guidelines. CARE-Bench provides a multidimensional performance evaluation grounded in established psychological scales. Using CARE-Bench, we evaluate several general-purpose LLMs and specialized counseling models, revealing their current limitations. In collaboration with psychologists, we conduct a detailed analysis of the reasons for LLMs' failures when interacting with clients of different types, which provides directions for developing more comprehensive, universal, and effective counseling models.

JBHI 2026 · Journal Article

Confidence-Aware Adaptive Fusion Learning of Imbalanced Multi-Modal Data for Cancer Diagnosis and Prognosis

  • Ziye Zhang
  • Shijin Wang
  • Yuying Huang
  • Xiaorou Zheng
  • Shoubin Dong

The effective fusion of pathological images and molecular omics holds significant potential for precision medicine. However, pathological and molecular data are highly heterogeneous, and large-scale multi-modal cancer data often suffer from incomplete information. Predicting clinical tasks from such imbalanced multi-modal data presents a major challenge. Therefore, we propose CAFusion, a confidence-aware adaptive fusion framework. The framework adopts a modular design, providing independent and flexible modal feature learning modules to capture high-quality features. To address the modal imbalance caused by heterogeneous and incomplete modalities, we design a confidence-aware method that evaluates the features of each modality and automatically adjusts their weights. To effectively fuse the pathological and molecular modalities, we propose an adaptive deep network with a flexible, non-fixed layer structure that effectively extracts hidden joint information from multi-modal features, ensuring high generalizability. Experimental results demonstrate that the CAFusion framework outperforms other state-of-the-art methods on both complete and incomplete datasets. Moreover, the CAFusion framework offers reasonable medical interpretability.

AAAI 2026 · Conference Paper

From Diagnosis to Generalization: A Cognitive Approach to Data Selection for Educational LLMs

  • Yuxiang Guo
  • Yan Zhuang
  • Qi Liu
  • Zhenya Huang
  • Xianquan Wang
  • Liyang He
  • Jiatong Li
  • Rui Li

Specializing Large Language Models for educational domains is a key frontier in creating personalized learning tools. The central challenge is not data scarcity but its abundance: efficiently selecting a curated data subset from vast corpora to enhance specialized skills and foster generalization, without degrading existing abilities. Existing data selection paradigms, relying on superficial semantic similarity or model training dynamics, often lack a principled framework to identify data that promotes true cognitive growth. Our work proposes a paradigm shift from leveraging indirect proxies of learning value, such as semantic similarity and training dynamics, towards a framework that performs a direct, cognitive-level modeling of the learner's state. We introduce CASS, a novel framework that implements this cognitive approach through a clear pipeline, moving from an initial diagnosis to the ultimate goal of expanding the model's cognitive frontier. First, CASS diagnoses the LLM's cognitive frontier using Multidimensional Item Response Theory. Leveraging this diagnosis, it then employs Fisher Information to select a data subset situated at the LLM's cognitive frontier that offers maximum informational gain. Finally, the model is fine-tuned on this curated data using a structured, easy-to-hard curriculum to ensure effective learning. Experiments on our new multi-subject dataset show that models trained with CASS not only achieve superior accuracy in the target domain but also exhibit enhanced generalization. CASS provides a more efficient, effective, and theoretically grounded paradigm for building expert educational LLMs.

NeurIPS 2025 · Conference Paper

A Closed-Form Solution for Fast and Reliable Adaptive Testing

  • Yan Zhuang
  • Chenye Ke
  • Zirui Liu
  • Qi Liu
  • Yuting Ning
  • Zhenya Huang
  • Weizhe Huang
  • Qingyang Mao

Human ability estimation is essential for educational assessment, career advancement, and professional certification. Adaptive Testing systems can improve estimation efficiency by selecting fewer, targeted questions, and are widely used in exams, e.g., the GRE, GMAT, and Duolingo English Test. However, selecting an optimal subset of questions remains a challenging nested optimization problem. Existing methods rely on costly approximations or data-intensive training, making them unsuitable for today's large-scale and complex testing environments. Thus, we propose a Closed-Form solution for question subset selection in Adaptive Testing. It directly minimizes ability estimation error by reducing the ability parameter's gradient bias while maintaining Hessian stability, which enables a simple greedy algorithm for question selection. Moreover, it can quantify the impact of human behavioral perturbations on ability estimation. Extensive experiments on large-scale educational datasets demonstrate that it reduces the number of required questions by 10% compared to SOTA methods while maintaining the same estimation accuracy.

NeurIPS 2025 · Conference Paper

FACT: Mitigating Inconsistent Hallucinations in LLMs via Fact-Driven Alternating Code-Text Training

  • Xinxin You
  • Qixin Sun
  • Chenwei Yan
  • Xiao Zhang
  • Chen Ning
  • Xiangling Fu
  • Si Liu
  • Guoping Hu

Inconsistent hallucinations remain a major challenge for large language models (LLMs), undermining the accuracy and reliability of fact-based reasoning in real-world applications. Existing approaches often rely on task-specific training or adaptation, such as hand-crafted synthetic datasets for domain tasks or solutions mainly focused on numerical reasoning, thereby limiting generalizability to broader, unseen NLP tasks. Inspired by the structural rigor and logical consistency of programming languages, we observe that fact-based texts can be mapped to programming structures due to their inherent patterns. We further propose FACT, a novel Fact-driven Alternating Code-text Training framework that alternates between text-to-code and code-to-text prediction. FACT is the first task-agnostic paradigm that embeds code and natural language in a shared semantic space, thereby transferring the logical consistency of code to LLM outputs in NLP tasks. Experiments show that with only a small subset of Wiki-40B-en for training, FACT reduces inconsistent hallucinations by 2.7%–8.0% and improves overall performance by 2.5%–6.1% across three leading LLMs and four diverse datasets covering QA and summarization tasks. This framework offers a new perspective on addressing challenging hallucinations in LLMs, contributing to more reliable AI.

NeurIPS 2025 · Conference Paper

How Does Sequence Modeling Architecture Influence Base Capabilities of Pre-trained Language Models? Exploring Key Architecture Design Principles to Avoid Base Capabilities Degradation

  • Xin Lu
  • Yanyan Zhao
  • Si Wei
  • Shijin Wang
  • Bing Qin
  • Ting Liu

Pre-trained language models represented by the Transformer have been proven to possess strong base capabilities, and the Transformer's self-attention mechanism has become a classic in sequence modeling architectures. Unlike work that proposes new sequence modeling architectures to improve the efficiency of the attention mechanism, this work focuses on the impact of sequence modeling architectures on base capabilities. Specifically, our concern is: how exactly do sequence modeling architectures affect the base capabilities of pre-trained language models? In this work, we first point out that the mixed-domain pre-training setting commonly adopted in existing architecture design works fails to adequately reveal the differences in base capabilities among architectures. To address this, we propose a limited-domain pre-training setting with out-of-distribution testing, which successfully uncovers significant differences in base capabilities among architectures at an early stage. Next, we analyze the base capabilities of stateful sequence modeling architectures and find that they exhibit significant degradation compared to the Transformer. Then, through a series of architecture component analyses, we summarize a key architecture design principle: a sequence modeling architecture must possess full-sequence arbitrary selection capability to avoid degradation in base capabilities. Finally, we empirically validate this principle using an extremely simple Top-1 element selection architecture and further generalize it to a more practical Top-1 chunk selection architecture. Experimental results support our proposed design principle and suggest that our work can serve as a valuable reference for future architecture improvements and novel designs.

NeurIPS 2025 · Conference Paper

Investigating and Mitigating Catastrophic Forgetting in Medical Knowledge Injection through Internal Knowledge Augmentation Learning

  • Yuxuan Zhou
  • Xien Liu
  • Xiao Zhang
  • Chen Ning
  • Shijin Wang
  • Guoping Hu
  • Ji Wu

Large Language Models (LLMs) are expected to possess comprehensive medical knowledge to support real-world clinical applications. While domain-specific fine-tuning effectively injects medical knowledge into LLMs, it often causes catastrophic forgetting of previously acquired knowledge and instruction-following capabilities. In this paper, we investigate this issue and reveal a pattern of proximity-dependent forgetting: knowledge that is semantically or topically close to the injected content is more likely to be forgotten, while unrelated knowledge shows minimal degradation. Moreover, we observe that existing mitigation techniques fail to address this type of forgetting effectively. Motivated by this observation and inspired by human learning mechanisms, we propose InternAL (Internal Knowledge Augmentation Learning), a novel approach that leverages LLMs' own internal knowledge to mitigate forgetting. InternAL first probes internal knowledge closely related to the injection by prompting the model with questions derived from the injected knowledge. This knowledge is then used to augment the original injection dataset, guiding the model to retain related prior knowledge during training. Experimental results on multiple LLMs (LLaMA, Qwen) demonstrate that InternAL significantly mitigates proximity-related forgetting while maintaining strong knowledge injection performance. Our findings provide new insights into the nature of catastrophic forgetting in medical knowledge injection and highlight a promising direction for robust domain adaptation in LLMs. Code and datasets are available at https://github.com/THUMLP/InternAL.

AAAI 2025 · Conference Paper

Multi-Perspective Consolidation Enhanced Cognitive Diagnosis via Conditional Diffusion Model

  • Guanhao Zhao
  • Zhenya Huang
  • Cheng Cheng
  • Yan Zhuang
  • Qingyang Mao
  • Xin Li
  • Shijin Wang
  • Enhong Chen

Cognitive diagnosis, which assesses learners' competence from their interaction logs, plays a vital role in education. It provides a crucial reference for gauging learners' proficiency levels and tailoring future learning activities accordingly. Researchers have proposed numerous cognitive diagnosis models to address this task. Despite their success, these models continue to face an ill-posed problem because of the information loss caused by under-expressive interaction functions and incomplete observations. In this paper, we address these challenges by proposing a novel cognitive diagnosis model, DMC-CDM, based on the theoretical premise that cognitive states can be captured with minimal information loss by maximizing the mutual information between observed and potential observations. Specifically, DMC-CDM incorporates a semantic extractor to provide a comprehensive semantic understanding of learners' interaction logs, thereby enhancing current collaboration-based cognitive state representations. It then consolidates multi-perspective observations to capture precise cognitive states by maximizing mutual information between these observations. We conducted extensive experiments on three datasets, and the results demonstrate that our proposed model is both effective and beneficial for downstream applications in education.

NeurIPS 2024 · Conference Paper

Computerized Adaptive Testing via Collaborative Ranking

  • Zirui Liu
  • Yan Zhuang
  • Qi Liu
  • Jiatong Li
  • Yuren Zhang
  • Zhenya Huang
  • Jinze Wu
  • Shijin Wang

With the deep integration of machine learning and intelligent education, Computerized Adaptive Testing (CAT) has received increasing research attention. Compared to traditional paper-and-pencil tests, CAT delivers personalized, interactive assessments by automatically adjusting testing questions according to students' performance during the test. CAT has therefore been recognized as an efficient testing methodology capable of accurately estimating a student's ability with a minimal number of questions, leading to its widespread adoption in mainstream selective exams such as the GMAT and GRE. However, merely improving the accuracy of ability estimation is far from satisfactory in real-world scenarios, since an accurate ranking of students is usually more important (e.g., in high-stakes exams). Considering the shortcomings of existing CAT solutions in student ranking, this paper emphasizes the importance of aligning test outcomes (student ranks) with the true underlying abilities of students. Along this line, departing from the conventional paradigm of testing students independently, we propose a novel collaborative framework, Collaborative Computerized Adaptive Testing (CCAT), that leverages inter-student information to enhance student ranking. By using collaborative students as anchors to assist in ranking test-takers, CCAT provides both theoretical guarantees and experimental validation of ranking consistency.

AAAI 2024 · Conference Paper

CONSIDER: Commonalities and Specialties Driven Multilingual Code Retrieval Framework

  • Rui Li
  • Liyang He
  • Qi Liu
  • Yuze Zhao
  • Zheng Zhang
  • Zhenya Huang
  • Yu Su
  • Shijin Wang

Multilingual code retrieval aims to find code snippets relevant to a user's query from a multilingual codebase, which plays a crucial role in software development and expands its application scenarios compared to classical monolingual code retrieval. Despite the performance improvements achieved by previous studies, two crucial problems are overlooked in the multilingual scenario. First, certain programming languages face data scarcity in specific domains, resulting in limited representation capabilities within those domains. Second, different programming languages can be used interchangeably within the same domain, making it challenging for multilingual models to accurately identify the intended programming language of a user's query. To address these issues, we propose the CommONalities and SpecIalties Driven Multilingual CodE Retrieval Framework (CONSIDER), which includes two modules. The first module enhances the representation of various programming languages by modeling pairwise and global commonalities among them. The second module introduces a novel contrastive learning negative sampling algorithm that leverages language confusion to automatically extract specific language features. Through our experiments, we confirm the significant benefits of our model in real-world multilingual code retrieval scenarios in various aspects. Furthermore, an evaluation demonstrates the effectiveness of our proposed CONSIDER framework in monolingual scenarios as well. Our source code is available at https://github.com/smsquirrel/consider.

NeurIPS 2024 · Conference Paper

JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

  • Kun Zhou
  • Beichen Zhang
  • Jiapeng Wang
  • Zhipeng Chen
  • Wayne X. Zhao
  • Jing Sha
  • Zhichao Sheng
  • Shijin Wang

Mathematical reasoning is an important capability of large language models (LLMs) for real-world applications. To enhance this capability, existing work either collects large-scale math-related texts for pre-training or relies on stronger LLMs (e.g., GPT-4) to synthesize massive numbers of math problems. Both approaches generally incur large costs in training or synthesis. To reduce the cost, based on openly available texts, we propose an efficient way to train a small LLM for math problem synthesis, so as to efficiently generate sufficient high-quality pre-training data. To achieve this, we create a dataset using GPT-4 to distill its data synthesis capability into the small LLM. Concretely, we craft a set of prompts based on human education stages to guide GPT-4 to synthesize problems covering diverse math knowledge and difficulty levels. Besides, we adopt a gradient-based influence estimation method to select the most valuable math-related texts. Both are fed into GPT-4 to create the knowledge distillation dataset used to train the small LLM. We leverage it to synthesize 6 million math problems for pre-training our JiuZhang3.0 model. The whole process only needs to invoke the GPT-4 API 9.3k times and use 4.6B data for training. Experimental results show that JiuZhang3.0 achieves state-of-the-art performance on several mathematical reasoning datasets, under both natural language reasoning and tool manipulation settings. Our code and data will be publicly released at https://github.com/RUCAIBox/JiuZhang3.0.

IJCAI 2024 · Conference Paper

Learning to Solve Geometry Problems via Simulating Human Dual-Reasoning Process

  • Tong Xiao
  • Jiayu Liu
  • Zhenya Huang
  • Jinze Wu
  • Jing Sha
  • Shijin Wang
  • Enhong Chen

Geometry Problem Solving (GPS), a classic and challenging math problem, has attracted much attention in recent years. It requires a solver to comprehensively understand both text and diagram, master essential geometry knowledge, and appropriately apply it in reasoning. However, existing works follow a paradigm of neural machine translation and focus only on enhancing the capability of encoders, which neglects the essential characteristics of human geometry reasoning. In this paper, inspired by dual-process theory, we propose a Dual-Reasoning Geometry Solver (DualGeoSolver) to simulate the dual-reasoning process of humans for GPS. Specifically, we construct two systems in DualGeoSolver, namely the Knowledge System and the Inference System. The Knowledge System controls an implicit reasoning process, which is responsible for providing diagram information and geometry knowledge according to a step-wise reasoning goal generated by the Inference System. The Inference System conducts an explicit reasoning process, which specifies the goal in each reasoning step and applies the knowledge to generate program tokens for resolving it. The two systems carry out the above process iteratively, which aligns more closely with human cognition. We conduct extensive experiments on two benchmark datasets, GeoQA and GeoQA+. The results demonstrate the superiority of DualGeoSolver in both solving accuracy and robustness, which derives from explicitly modeling the human reasoning process and knowledge application.

TIST 2024 · Journal Article

Model-Agnostic Adaptive Testing for Intelligent Education Systems via Meta-learned Gradient Embeddings

  • Haoyang Bi
  • Qi Liu
  • Han Wu
  • Weidong He
  • Zhenya Huang
  • Yu Yin
  • Haiping Ma
  • Yu Su

The field of education has undergone a significant revolution with the advent of intelligent systems and technology, which aim to personalize the learning experience, catering to the unique needs and abilities of individual learners. In this pursuit, a fundamental challenge is designing proper tests for assessing students' cognitive status on knowledge and skills accurately and efficiently. One promising approach, referred to as Computerized Adaptive Testing (CAT), is to administer computer-automated tests that alternately select the next item for each examinee and estimate their cognitive states given their responses to the selected items. Nevertheless, existing CAT systems suffer from inflexibility in item selection and ineffectiveness in cognitive state estimation, respectively. In this article, we propose a Model-Agnostic adaptive testing framework via Meta-learned Gradient Embeddings, MAMGE for short, improving both item selection and cognitive state estimation simultaneously. For item selection, we design a Gradient Embedding-based Item Selector (GEIS), which incorporates the concept of gradient embeddings to represent items and selects those that are both informative and representative. For cognitive state estimation, we propose a Meta-learned Cognitive State Estimator (MCSE) to automatically control the estimation process by learning to learn a proper initialization and dynamically inferred updates. Both MCSE and GEIS are inherently model-agnostic, and the two modules have an ingenious connection via meta-learned gradient embeddings. Finally, extensive experiments evaluate the effectiveness and flexibility of MAMGE.

NeurIPS 2024 · Conference Paper

SocraticLM: Exploring Socratic Personalized Teaching with Large Language Models

  • Jiayu Liu
  • Zhenya Huang
  • Tong Xiao
  • Jing Sha
  • Jinze Wu
  • Qi Liu
  • Shijin Wang
  • Enhong Chen

Large language models (LLMs) are considered a crucial technology for advancing intelligent education, since they exhibit the potential for an in-depth understanding of teaching scenarios and for providing students with personalized guidance. Nonetheless, current LLM-based applications in personalized teaching predominantly follow a "Question-Answering" paradigm, where students are passively provided with answers and explanations. In this paper, we propose SocraticLM, which achieves a Socratic "Thought-Provoking" teaching paradigm that fulfills the role of a real classroom teacher in actively engaging students in the thought process required for genuine problem-solving mastery. To build SocraticLM, we first propose a novel "Dean-Teacher-Student" multi-agent pipeline to construct a new dataset, SocraTeach, which contains 35K meticulously crafted Socratic-style multi-round (equivalent to 208K single-round) teaching dialogues grounded in fundamental mathematical problems. Our dataset simulates authentic teaching scenarios, interacting with six representative types of simulated students with different cognitive states, and strengthens four crucial teaching abilities. SocraticLM is then fine-tuned on SocraTeach with three strategies balancing its teaching and reasoning abilities. Moreover, we contribute a comprehensive evaluation system encompassing five pedagogical dimensions for assessing the teaching quality of LLMs. Extensive experiments verify that SocraticLM achieves significant improvements in teaching performance, outperforming GPT-4 by more than 12%. Our dataset and code are available at https://github.com/Ljyustc/SocraticLM.

NeurIPS 2024 · Conference Paper

Towards Accurate and Fair Cognitive Diagnosis via Monotonic Data Augmentation

  • Zheng Zhang
  • Wei Song
  • Qi Liu
  • Qingyang Mao
  • Yiyan Wang
  • Weibo Gao
  • Zhenya Huang
  • Shijin Wang

Intelligent education stands as a prominent application of machine learning. Within this domain, cognitive diagnosis (CD) is a key research focus that aims to diagnose students' proficiency levels in specific knowledge concepts. As a crucial task within the field of education, cognitive diagnosis encompasses two fundamental requirements: accuracy and fairness. Existing studies have achieved significant success by primarily utilizing observed historical logs of student-exercise interactions. However, real-world scenarios often present a challenge, where a substantial number of students engage with a limited number of exercises. This data sparsity issue can lead to both inaccurate and unfair diagnoses. To this end, we introduce a monotonic data augmentation framework, CMCD, to tackle the data sparsity issue and thereby achieve accurate and fair CD results. Specifically, CMCD integrates the monotonicity assumption, a fundamental educational principle in CD, to establish two constraints for data augmentation. These constraints are general and can be applied to the majority of CD backbones. Furthermore, we provide theoretical analysis to guarantee the accuracy and convergence speed of CMCD. Finally, extensive experiments on real-world datasets showcase the efficacy of our framework in addressing the data sparsity issue with accurate and fair CD results.

AAAI 2023 · Conference Paper

BETA-CD: A Bayesian Meta-Learned Cognitive Diagnosis Framework for Personalized Learning

  • Haoyang Bi
  • Enhong Chen
  • Weidong He
  • Han Wu
  • Weihao Zhao
  • Shijin Wang
  • Jinze Wu

Personalized learning is a promising educational approach that aims to provide high-quality personalized services for each student with minimal demands for practice data. The key to achieving this lies in the cognitive diagnosis task, which estimates the cognitive state of the student through his/her logged data of doing practice quizzes. Nevertheless, in the personalized learning scenario, existing cognitive diagnosis models suffer from the inability to (1) quickly adapt to new students using a small amount of data, and (2) measure the reliability of the diagnosis result to avoid improper services that mismatch the student's actual state. In this paper, we propose a general Bayesian mETA-learned Cognitive Diagnosis framework (BETA-CD), which addresses the two challenges through prior knowledge exploitation and model uncertainty quantification, respectively. Specifically, we first introduce Bayesian hierarchical modeling to associate each student's cognitive state with a shared prior distribution encoding prior knowledge and a personal posterior distribution indicating model uncertainty. Furthermore, we formulate a meta-learning objective to automatically exploit prior knowledge from historical students, and efficiently solve it with a gradient-based variational inference method. The code will be publicly available at https://github.com/AyiStar/pyat.

NeurIPS 2023 · Conference Paper

Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning

  • Beichen Zhang
  • Kun Zhou
  • Xilin Wei
  • Xin Zhao
  • Jing Sha
  • Shijin Wang
  • Ji-Rong Wen

Chain-of-thought (CoT) prompting and tool augmentation have been validated in recent work as effective practices for improving the ability of large language models (LLMs) to perform step-by-step reasoning on complex math-related tasks. However, most existing math reasoning datasets may not be able to fully evaluate and analyze the ability of LLMs to manipulate tools and perform reasoning, as they often require only very few invocations of tools or lack annotations for evaluating intermediate reasoning steps, thus supporting only outcome evaluation. To address this issue, we construct CARP, a new Chinese dataset consisting of 4,886 computation-intensive algebra problems with formulated annotations on intermediate steps, facilitating the evaluation of the intermediate reasoning process. On CARP, we test four LLMs with CoT prompting and find that they are all prone to making mistakes at the early steps of the solution, leading to incorrect answers. Based on this finding, we propose DELI, a new approach that facilitates deliberation on reasoning steps with tool interfaces. In DELI, we first initialize a step-by-step solution based on retrieved exemplars, then iterate two deliberation procedures that check and refine the intermediate steps of the generated solution, from both tool manipulation and natural language reasoning perspectives, until the solutions converge or the maximum number of iterations is reached. Experimental results on CARP and six other datasets show that DELI mostly outperforms competitive baselines and can further boost the performance of existing CoT methods. Our data and code are available at https://github.com/RUCAIBox/CARP.

IJCAI 2023 · Conference Paper

Exploiting Non-Interactive Exercises in Cognitive Diagnosis

  • Fangzhou Yao
  • Qi Liu
  • Min Hou
  • Shiwei Tong
  • Zhenya Huang
  • Enhong Chen
  • Jing Sha
  • Shijin Wang

Cognitive diagnosis aims to quantify the proficiency level of students on specific knowledge concepts. Existing studies merely leverage observed historical student-exercise interaction logs to assess proficiency levels. Despite their effectiveness, observed interactions usually exhibit a power-law distribution, where the long tail, consisting of students with few records, lacks supervision signals. This phenomenon leads to inferior diagnoses for students with few records. In this paper, we propose the Exercise-aware Informative Response Sampling (EIRS) framework to address the long-tail problem. EIRS is a general framework that explores the partial order between observed and unobserved responses as an auxiliary ranking-based training signal to supplement cognitive diagnosis. Considering the abundance and complexity of unobserved responses, we first design an Exercise-aware Candidates Selection module, which helps our framework produce reliable potential responses for effective supplementary training. Then, we develop an Expected Ability Change-weighted Informative Sampling strategy to adaptively sample informative potential responses that contribute greatly to model training. Experiments on real-world datasets demonstrate the superiority of our framework on long-tailed data.

AAAI Conference 2023 Conference Paper

Towards a Holistic Understanding of Mathematical Questions with Contrastive Pre-training

  • Yuting Ning
  • Zhenya Huang
  • Xin Lin
  • Enhong Chen
  • Shiwei Tong
  • Zheng Gong
  • Shijin Wang

Understanding mathematical questions effectively is a crucial task, which can benefit many applications, such as difficulty estimation. Researchers have drawn much attention to designing pre-training models for question representations due to the scarcity of human annotations (e.g., labeling difficulty). However, unlike general free-format texts (e.g., user comments), mathematical questions are generally designed with explicit purposes and mathematical logic, and usually consist of more complex content, such as formulas, and related mathematical knowledge (e.g., Function). Therefore, the problem of holistically representing mathematical questions remains underexplored. To this end, in this paper, we propose a novel contrastive pre-training approach for mathematical question representations, namely QuesCo, which attempts to bring questions with more similar purposes closer. Specifically, we first design two-level question augmentations, including content-level and structure-level, which generate literally diverse question pairs with similar purposes. Then, to fully exploit hierarchical information of knowledge concepts, we propose a knowledge hierarchy-aware rank strategy (KHAR), which ranks the similarities between questions in a fine-grained manner. Next, we adopt a ranking contrastive learning task to optimize our model based on the augmented and ranked questions. We conduct extensive experiments on two real-world mathematical datasets. The experimental results demonstrate the effectiveness of our model.

AAAI Conference 2021 Conference Paper

HMS: A Hierarchical Solver with Dependency-Enhanced Understanding for Math Word Problem

  • Xin Lin
  • Zhenya Huang
  • Hongke Zhao
  • Enhong Chen
  • Qi Liu
  • Hao Wang
  • Shijin Wang

Automatically solving math word problems is a crucial task for exploring the intelligence levels of machines in the general AI domain. It is highly challenging since it requires not only natural language understanding but also mathematical expression inference. Existing solutions usually explore sequence-to-sequence models to generate expressions, where the problems are simply encoded sequentially. However, such models generally fall far short of human-like problem understanding and lead to incorrect answers. To this end, in this paper, we propose a novel Hierarchical Math Solver (HMS) for deep understanding and exploitation of problems. In problem understanding, imitating human reading habits, we propose a hierarchical word-clause-problem encoder. Specifically, we first split each problem into several clauses and learn problem semantics from the local clause level to the global problem level. Then, in clause understanding, we propose a dependency-based module to enhance clause semantics with the dependency structure of the problem. Next, in expression inference, we propose a novel tree-based decoder to generate the mathematical expression for the answer. In the decoder, we apply a hierarchical attention mechanism to enhance the problem semantics with context from different levels, and a pointer-generator network to guide the model to copy existing information and infer extra knowledge. Extensive experimental results on two widely used datasets demonstrate that HMS achieves not only better answers but also more reasonable inference.

AAAI Conference 2020 Conference Paper

Discriminative Sentence Modeling for Story Ending Prediction

  • Yiming Cui
  • Wanxiang Che
  • Wei-Nan Zhang
  • Ting Liu
  • Shijin Wang
  • Guoping Hu

Story Ending Prediction is a task that requires selecting an appropriate ending for a given story, which demands that the machine understand the story and sometimes draw on commonsense knowledge. To tackle this task, we propose a new neural network called Diff-Net for better modeling the differences between the candidate endings. The proposed model discriminates two endings at three semantic levels: contextual representation, story-aware representation, and discriminative representation. Experimental results on the Story Cloze Test dataset show that the proposed model significantly outperforms various systems by a large margin, and detailed ablation studies are given for a better understanding of our model. We also carefully examine traditional and BERT-based models on both SCT v1.0 and v1.5, with interesting findings that may potentially help future studies.

AAAI Conference 2020 Conference Paper

Neural Cognitive Diagnosis for Intelligent Education Systems

  • Fei Wang
  • Qi Liu
  • Enhong Chen
  • Zhenya Huang
  • Yuying Chen
  • Yu Yin
  • Zai Huang
  • Shijin Wang

Cognitive diagnosis is a fundamental issue in intelligent education, which aims to discover the proficiency level of students on specific knowledge concepts. Existing approaches usually model linear interactions in the student exercising process with manually designed functions (e.g., the logistic function), which is not sufficient for capturing the complex relations between students and exercises. In this paper, we propose a general Neural Cognitive Diagnosis (NeuralCD) framework, which incorporates neural networks to learn the complex exercising interactions, yielding both accurate and interpretable diagnosis results. Specifically, we project students and exercises to factor vectors and leverage multiple neural layers to model their interactions, where the monotonicity assumption is applied to ensure the interpretability of both factors. Furthermore, we propose two implementations of NeuralCD by specializing the required concepts of each exercise, i.e., NeuralCDM with a traditional Q-matrix and the improved NeuralCDM+ exploiting rich text content. Extensive experimental results on real-world datasets show the effectiveness of the NeuralCD framework in terms of both accuracy and interpretability.

AAAI Conference 2019 Conference Paper

Convolutional Spatial Attention Model for Reading Comprehension with Multiple-Choice Questions

  • Zhipeng Chen
  • Yiming Cui
  • Wentao Ma
  • Shijin Wang
  • Guoping Hu

Machine Reading Comprehension (MRC) with multiple-choice questions requires the machine to read a given passage and select the correct answer among several candidates. In this paper, we propose a novel approach called the Convolutional Spatial Attention (CSA) model, which can better handle MRC with multiple-choice questions. The proposed model fully extracts the mutual information among the passage, the question, and the candidates to form enriched representations. Furthermore, to merge various attention results, we propose using convolutional operations to dynamically summarize the attention values within regions of different sizes. Experimental results show that the proposed model gives substantial improvements over various state-of-the-art systems on both the RACE and SemEval-2018 Task 11 datasets.