Arrow Research: Search

Author name cluster

Huacan Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
1 author row

Possible papers (5)

AAAI Conference 2026 · Conference Paper

Easy for Children, Hard for AI: The Limits of Multimodal LLMs in Early Childhood Learning

  • Jingping Liu
  • Xueyan Wu
  • Hanxuan Chen
  • Ziyan Liu
  • Zhangquan Chen
  • Ronghao Chen
  • Huacan Wang

Early childhood is a critical stage for cognitive development, involving core skills such as visual perception and reasoning. While multimodal large language models (MLLMs) have made rapid progress in various general-purpose tasks, their ability to support early education remains largely underexplored. Existing research on child-related AI largely centers on modeling language, emotion, or behavior, with limited focus on evaluating cognitive tasks relevant to early learning. To address this gap, we propose ChildBench, a multimodal benchmark designed to assess models on tasks inspired by early childhood cognitive development. It covers five key domains through ten tasks, including spatial reasoning, visual reasoning, visual discrimination, counting skills, and visual tracking. The benchmark includes 4,890 carefully constructed images and 5,346 manually annotated samples, ensuring both diversity and age-appropriate content. We evaluate a range of state-of-the-art (SoTA) open-source and closed-source MLLMs—including GPT-4o, Gemini, and Qwen2.5-VL—on ChildBench. Despite strong performance on other benchmarks, the best 7B-parameter model with LoRA tuning achieves only 52.01% accuracy, far below the 96% achieved by 5-year-old children. These results reveal critical limitations in fine-grained perception and reasoning. We further analyze failure cases and discuss directions for future model development.
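
For a concrete sense of how a benchmark like this is scored, the sketch below computes exact-match accuracy overall and per domain. The sample schema, field names, and the `model_fn` callback are illustrative assumptions; ChildBench's actual data format and evaluation harness are not shown on this page.

```python
from collections import defaultdict

# Hypothetical sample schema for illustration; ChildBench's real format
# is not shown on this page.
samples = [
    {"domain": "counting", "image": "img_0001.png",
     "question": "How many apples are in the picture?", "answer": "3"},
    # ... the real benchmark has 5,346 annotated samples across five domains
]

def evaluate(model_fn, samples):
    """Exact-match accuracy, overall and per domain."""
    hits, totals = defaultdict(int), defaultdict(int)
    for s in samples:
        pred = model_fn(s["image"], s["question"])  # the MLLM under test
        totals[s["domain"]] += 1
        hits[s["domain"]] += pred.strip().lower() == s["answer"].strip().lower()
    per_domain = {d: hits[d] / totals[d] for d in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return overall, per_domain

# Usage with a trivial stand-in model:
print(evaluate(lambda img, q: "3", samples))
```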

AAAI Conference 2026 · Conference Paper

GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging

  • Ziyi Ni
  • Huacan Wang
  • Shuo Zhang
  • Shuo Lu
  • Ziyang He
  • Wang You
  • Zhenheng Tang
  • Sen Hu

Beyond scratch coding, exploiting large-scale code repositories (e.g., GitHub) for practical tasks is vital in real-world software development, yet current benchmarks rarely evaluate code agents in such authentic, workflow-driven scenarios. To bridge this gap, we introduce GitTaskBench, a benchmark designed to systematically assess this capability via 54 realistic tasks across 7 modalities and 7 domains. Each task pairs a relevant repository with an automated, human-curated evaluation harness specifying practical success criteria. Beyond measuring execution and task success, we also propose the alpha-value metric to quantify the economic benefit of agent performance, which integrates task success rates, token cost, and average developer salaries. Experiments across three state-of-the-art agent frameworks with multiple advanced LLMs show that leveraging code repositories for complex task solving remains challenging: even the best-performing system, OpenHands+Claude 3.7, solves only 48.15% of tasks. Error analysis attributes over half of failures to seemingly mundane yet critical steps like environment setup and dependency resolution, highlighting the need for more robust workflow management and increased timeout preparedness. By releasing GitTaskBench, we aim to drive progress and attention toward repository-aware code reasoning, execution, and deployment, moving agents closer to solving complex, end-to-end real-world tasks.
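
The abstract describes the alpha-value metric only at a high level (task success rate, token cost, average developer salary). The function below is a hedged sketch of one plausible way those three ingredients could combine: expected human labor cost saved by a successful agent, minus the agent's own token spend. The parameter names and the formula itself are assumptions, not the paper's definition.

```python
def alpha_value(success_rate: float,
                tokens_used: float,
                usd_per_million_tokens: float,
                human_hours_per_task: float,
                hourly_salary_usd: float) -> float:
    """Illustrative economic-benefit score: expected developer labor cost
    saved, minus the agent's token spend. GitTaskBench's actual formula
    may differ; this only shows how the three ingredients named in the
    abstract can be combined into a single dollar figure."""
    human_cost = human_hours_per_task * hourly_salary_usd
    agent_cost = tokens_used / 1e6 * usd_per_million_tokens
    return success_rate * human_cost - agent_cost

# Example: 48.15% success, 2M tokens at $15/M, vs. 4 developer hours at $60/h
print(alpha_value(0.4815, 2_000_000, 15.0, 4.0, 60.0))
```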

AAAI Conference 2026 · Conference Paper

PsyPARSE: Retrieval-Augmented Slow Thinking for Personalized Empathetic Counseling

  • Longxiang Wang
  • Pukun Zhao
  • Chen Chen
  • Jinhe Bi
  • Huacan Wang
  • Tong Zhang
  • Ronghao Chen

The escalating global demand for mental health services highlights the potential of Large Language Models (LLMs) in psychological counseling. However, current LLM-based approaches, particularly fine-tuned models, are constrained by data distribution biases, leading to limited therapeutic diversity and personalization. Crucially, they often lack anticipatory empathetic reasoning, struggle to foresee patient emotional responses beyond the immediate dialogue history, and incur substantial computational costs. To address these limitations, we propose PsyPARSE, a novel training-free framework for psychological counseling that emulates the deliberate and empathetic reasoning of human counselors. PsyPARSE integrates Multi-Therapy Retrieval-Augmented Generation (RAG) to overcome data biases and provide highly personalized therapeutic approaches tailored to individual patient attributes. Pioneering the first multi-stage slow-thinking engine in mental health LLMs, PsyPARSE employs Multi-Turn Rollouts to identify optimal therapeutic paths and, by anticipating patient reactions, optimizes its replies, ensuring genuinely empathetic and impactful responses in complex, long-dialogue interactions. Operating as a plug-and-play solution, PsyPARSE avoids the computational burden of fine-tuning. We establish a comprehensive LLM-based patient-therapist agent simulation framework for evaluation. Extensive experiments demonstrate that PsyPARSE significantly enhances the capabilities of various LLM baselines, achieving superior personalization and deeper empathy compared to both fine-tuned and other training-free methods. This work offers an efficient, adaptable, and scalable solution to advance mental health support.
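
A minimal, self-contained sketch of the multi-turn rollout idea the abstract describes: sample candidate counselor replies, simulate the patient's anticipated reaction a few turns ahead, and keep the reply whose rollout scores best. The helpers here are toy stubs standing in for LLM calls; none of them is PsyPARSE's actual API.

```python
import random

# Toy stand-ins: in the real system each of these would be an LLM call
# (counselor generation, patient simulation, empathy judging).
def generate_candidates(dialogue, n):
    return [f"candidate reply {i} after {len(dialogue)} turns" for i in range(n)]

def simulate_patient(dialogue):
    return "simulated patient reaction"

def empathy_score(dialogue):
    return random.random()  # stand-in for a judge model's rating

def select_response(dialogue, k=4, depth=2):
    """Pick the counselor reply whose anticipated multi-turn rollout
    scores highest for empathy."""
    best, best_score = None, float("-inf")
    for reply in generate_candidates(dialogue, n=k):
        rollout = dialogue + [("counselor", reply)]
        for _ in range(depth):  # anticipate patient reactions a few turns ahead
            rollout = rollout + [("patient", simulate_patient(rollout))]
            rollout = rollout + [("counselor", generate_candidates(rollout, n=1)[0])]
        score = empathy_score(rollout)
        if score > best_score:
            best, best_score = reply, score
    return best

print(select_response([("patient", "I feel overwhelmed lately.")]))
```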

NeurIPS Conference 2025 · Conference Paper

RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving

  • Huacan Wang
  • Ziyi Ni
  • Shuo Zhang
  • Shuo Lu
  • Sen Hu
  • Ziyang He
  • Chen Hu
  • Jiaye Lin

The ultimate goal of code agents is to solve complex tasks autonomously. Although large language models (LLMs) have made substantial progress in code generation, real-world tasks typically demand full-fledged code repositories rather than simple scripts. Building such repositories from scratch remains a major challenge. Fortunately, GitHub hosts a vast, evolving collection of open-source repositories, which developers frequently reuse as modular components for complex tasks. Yet, existing frameworks like OpenHands and SWE-Agent still struggle to effectively leverage these valuable resources. Relying solely on README files provides insufficient guidance, and deeper exploration reveals two core obstacles: overwhelming information and tangled dependencies of repositories, both constrained by the limited context windows of current LLMs. To tackle these issues, we propose RepoMaster, an autonomous agent framework designed to explore and reuse GitHub repositories for solving complex tasks. For efficient understanding, RepoMaster constructs function-call graphs, module-dependency graphs, and hierarchical code trees to identify essential components, providing only identified core elements to the LLMs rather than the entire repository. During autonomous execution, it progressively explores related components using our exploration tools and prunes information to optimize context usage. Evaluated on the adjusted MLE-bench, RepoMaster achieves a 110% relative boost in valid submissions over the strongest baseline OpenHands. On our newly released GitTaskBench, RepoMaster lifts the task-pass rate from 40.7% to 62.9% while reducing token usage by 95%. Our code and demonstration materials are publicly available at https://github.com/QuantaAlpha/RepoMaster.
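
One mechanism the abstract names is a function-call graph used to surface a repository's core components. The sketch below is not RepoMaster's code; it illustrates the general idea with Python's standard ast module, ranking functions by in-degree as a cheap proxy for importance.

```python
import ast
import pathlib
from collections import defaultdict

def call_graph(repo_dir):
    """Map each function name to the names it calls, across all Python
    files in the repository. Names are not module-qualified here; a
    real tool would disambiguate same-name functions across files."""
    graph = defaultdict(set)
    for path in pathlib.Path(repo_dir).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that fail to parse or decode
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                for sub in ast.walk(node):
                    if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                        graph[node.name].add(sub.func.id)
    return graph

def core_functions(graph, top=10):
    """Rank repository-defined functions by how many functions call them."""
    indegree = defaultdict(int)
    for callees in graph.values():
        for callee in callees:
            if callee in graph:  # count only functions defined in the repo
                indegree[callee] += 1
    return sorted(indegree, key=indegree.get, reverse=True)[:top]

print(core_functions(call_graph(".")))
```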

NeurIPS Conference 2025 · Conference Paper

SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents

  • Yifu Guo
  • Jiaye Lin
  • Huacan Wang
  • Yuzhen Han
  • Sen Hu
  • Ziyi Ni
  • Licheng Wang
  • Mingguang Chen

Large Language Model (LLM)-based agents have recently shown impressive capabilities in complex reasoning and tool use via multi-step interactions with their environments. While these agents have the potential to tackle complicated tasks, their problem-solving process—agents' interaction trajectory leading to task completion—remains underexploited. These trajectories contain rich feedback that can guide agents in the right direction for solving problems correctly. Although prevailing approaches, such as Monte Carlo Tree Search (MCTS), can effectively balance exploration and exploitation, they ignore the interdependence among various trajectories and lack diversity in their search spaces, which leads to redundant reasoning and suboptimal outcomes. To address these challenges, we propose SE-Agent, a Self-Evolution framework that enables Agents to optimize their reasoning processes iteratively. Our approach revisits and enhances former pilot trajectories through three key operations: revision, recombination, and refinement. This evolutionary mechanism enables two critical advantages: (1) it expands the search space beyond local optima by intelligently exploring diverse solution paths guided by previous trajectories, and (2) it leverages cross-trajectory inspiration to efficiently enhance performance while mitigating the impact of suboptimal reasoning paths. Through these mechanisms, SE-Agent achieves continuous self-evolution that incrementally improves reasoning quality. We evaluate SE-Agent on SWE-bench Verified to resolve real-world GitHub issues. Experimental results across five strong LLMs show that integrating SE-Agent delivers up to 55% relative improvement, achieving state-of-the-art performance among all open-source agents on SWE-bench Verified.
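
The three operations the abstract names (revision, recombination, refinement) can be read as operators in an evolutionary loop over trajectories. The toy sketch below illustrates that loop; in SE-Agent each operator would be driven by an LLM reflecting on prior trajectories and the score by actual task outcomes, not the string edits and heuristic used here.

```python
import random

# A trajectory is modeled as a list of step labels; "'" marks a revised step.
def revise(traj):
    """Rewrite one step of a trajectory in place."""
    i = random.randrange(len(traj))
    return traj[:i] + [traj[i] + "'"] + traj[i + 1:]

def recombine(a, b):
    """Splice complementary segments of two trajectories."""
    cut = random.randrange(1, min(len(a), len(b))) if min(len(a), len(b)) > 1 else 1
    return a[:cut] + b[cut:]

def refine(traj):
    """Prune a redundant trailing step."""
    return traj[:-1] if len(traj) > 1 else traj

def score(traj):
    """Toy fitness: reward revised steps, penalize length."""
    return sum(s.endswith("'") for s in traj) - 0.1 * len(traj)

def evolve(pop, generations=5):
    for _ in range(generations):
        a, b = random.sample(pop, 2)
        pop = pop + [revise(a), recombine(a, b), refine(b)]
        pop = sorted(pop, key=score, reverse=True)[:len(pop) - 3]  # keep size fixed
    return max(pop, key=score)

print(evolve([["plan", "edit", "test"], ["search", "edit", "run", "test"]]))
```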