Arrow Research search

Author name cluster

Junjie Yao

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact name matches; it is not a full identity-disambiguation profile.

6 papers
2 author rows

Possible papers

6

ICML Conference 2025 Conference Paper

An Analysis for Reasoning Bias of Language Models with Small Initialization

  • Junjie Yao
  • Zhongwang Zhang
  • Zhi-Qin John Xu

Transformer-based Large Language Models (LLMs) have revolutionized Natural Language Processing by demonstrating exceptional performance across diverse tasks. This study investigates the impact of the parameter initialization scale on the training behavior and task preferences of LLMs. We discover that smaller initialization scales encourage models to favor reasoning tasks, whereas larger initialization scales lead to a preference for memorization tasks. We validate this reasoning bias via real datasets and meticulously designed anchor functions. Further analysis of initial training dynamics suggests that specific model components, particularly the embedding space and self-attention mechanisms, play pivotal roles in shaping these learning biases. We provide a theoretical framework from the perspective of model training dynamics to explain these phenomena. Additionally, experiments on real-world language tasks corroborate our theoretical insights. This work enhances our understanding of how initialization strategies influence LLM performance on reasoning tasks and offers valuable guidelines for training models.
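The abstract's central knob, the initialization scale, can be sketched minimally (an illustration under assumptions, not the paper's exact setup): draw weights with standard deviation `scale / sqrt(fan_in)`, where `scale < 1` gives the small-initialization regime the paper links to a reasoning bias and `scale > 1` the regime it links to memorization.

```python
import numpy as np

def scaled_init(shape, scale, rng=None):
    # Hypothetical knob for illustration: weights ~ N(0, (scale/sqrt(fan_in))^2).
    # scale < 1 shrinks the usual 1/sqrt(fan_in) std ("small initialization",
    # the regime the paper associates with a reasoning bias); scale > 1
    # enlarges it (the memorization-leaning regime).
    if rng is None:
        rng = np.random.default_rng(0)
    fan_in = shape[-1]
    return rng.normal(0.0, scale / np.sqrt(fan_in), size=shape)

W_small = scaled_init((64, 64), scale=0.1)  # small-initialization regime
W_large = scaled_init((64, 64), scale=4.0)  # large-initialization regime
```

The paper studies how training dynamics differ between these two regimes; the sketch only shows that the scale is a single multiplicative factor on the initialization distribution.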

IJCAI Conference 2019 Conference Paper

Recurrent Neural Network for Text Classification with Hierarchical Multiscale Dense Connections

  • Yi Zhao
  • Yanyan Shen
  • Junjie Yao

Text classification is a fundamental task in many Natural Language Processing applications. While recurrent neural networks have achieved great success in text classification, they fail to capture the hierarchical structure and long-term semantic dependencies that are common features of text data. Inspired by the dense connection pattern in advanced convolutional neural networks, we propose a simple yet effective recurrent architecture, named Hierarchical Multiscale Densely Connected RNNs (HM-DenseRNNs), which: 1) enables direct access to the hidden states of all preceding recurrent units via dense connections, and 2) organizes multiple densely connected recurrent units into a hierarchical multiscale structure, where the layers are updated at different scales. HM-DenseRNNs can effectively capture long-term dependencies among words in long text data, and a dense recurrent block is further introduced to reduce the number of parameters and enhance training efficiency. We evaluate the proposed architecture on three text datasets, and the results verify the advantages of HM-DenseRNNs over the baseline methods in terms of classification accuracy.
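The dense-connection idea can be illustrated with a toy recurrence (a simplified assumption, not the paper's exact HM-DenseRNN: here every preceding hidden state reaches the current step through a sum rather than the paper's concatenation, and the hierarchical multiscale structure is omitted):

```python
import numpy as np

def dense_rnn(xs, hidden=8, seed=0):
    # Toy densely connected recurrence: at step t the unit sees an
    # aggregate of ALL preceding hidden states, giving every earlier
    # state a direct path to the current one instead of only h_{t-1}.
    rng = np.random.default_rng(seed)
    W_x = rng.normal(0, 0.1, (hidden, xs.shape[1]))  # input-to-hidden weights
    W_h = rng.normal(0, 0.1, (hidden, hidden))       # dense-context weights
    states = [np.zeros(hidden)]
    for x_t in xs:
        dense_context = np.sum(states, axis=0)  # dense connection to all predecessors
        states.append(np.tanh(W_x @ x_t + W_h @ dense_context))
    return np.stack(states[1:])

H = dense_rnn(np.ones((5, 3)))  # 5 time steps, 3 input features
```

The direct paths shorten the gradient route between distant time steps, which is the mechanism the abstract credits for capturing long-term dependencies in long text.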

AAAI Conference 2019 Conference Paper

Towards Reliable Learning for High Stakes Applications

  • Jinyang Gao
  • Junjie Yao
  • Yingxia Shao

In this paper, we focus on delivering reliable learning results for high-stakes applications such as self-driving, financial investment, and clinical diagnosis, where the accuracy of predictions is a more crucial requirement than giving predictions for all query samples. We adopt the learning-with-reject-option framework, in which the model predicts only those samples for which it is confident of giving the correct answer. However, for most prevailing deep learning predictors, the confidence estimated by the model itself is far from reflecting the real generalization performance. To model the reliability of predictions concisely, we propose an exploratory solution called GALVE (Generative Adversarial Learning with Variance Expansion), which adopts generative adversarial learning to implicitly measure the region where the model achieves good generalization performance. By applying GALVE to measure the reliability of predictions, we achieve an error rate less than half of that obtained by directly thresholding model confidence on the CIFAR10 and SVHN computer vision tasks.
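The reject-option framework the abstract builds on can be sketched with the plain confidence-thresholding baseline it compares against (a standard baseline for illustration, not GALVE itself):

```python
import numpy as np

def predict_with_reject(probs, threshold=0.9):
    # Reject-option baseline: emit a label only when the model's top
    # softmax probability clears `threshold`; otherwise abstain (-1).
    # The paper argues this raw confidence is poorly calibrated and
    # replaces it with an adversarially learned reliability measure.
    probs = np.asarray(probs)
    labels = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    return np.where(confident, labels, -1)

preds = predict_with_reject([[0.95, 0.05], [0.6, 0.4]], threshold=0.9)
# first sample is predicted (label 0), second is rejected (-1)
```

GALVE's contribution, per the abstract, is a better-calibrated score to plug into the `confident` test, which halves the error rate on the accepted samples relative to this baseline.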

AAAI Conference 2014 Conference Paper

User Group Oriented Temporal Dynamics Exploration

  • Zhiting Hu
  • Junjie Yao
  • Bin Cui

Temporal online content has become a zeitgeist reflecting our interests and their changes. Active users are essential participants and promoters behind it, and temporal dynamics offer a viable way to investigate users. However, most current work uses only the global temporal trend and fails to distinguish fine-grained patterns across groups. Different users have diverse interests and exhibit distinct behaviors, and their temporal dynamics tend to differ. This paper proposes GrosToT (Group-Specific Topics-over-Time), a unified probabilistic model that infers latent user groups and temporal topics at the same time. It models group-specific temporal topic variation from social content. By leveraging comprehensive group-specific temporal patterns, GrosToT significantly outperforms state-of-the-art dynamics modeling methods. The proposed approach shows advantages not only in temporal dynamics but also in group content modeling. The dynamics of different groups vary, reflecting each group's intention. GrosToT uncovers the interplay between group interest and temporal dynamics: a group's attention to its medium-interest topics is event-driven, showing rich bursts, while its engagement with the group's dominant topics is interest-driven, remaining stable over time.

AAAI Conference 2010 Conference Paper

Temporal and Social Context Based Burst Detection from Folksonomies

  • Junjie Yao
  • Bin Cui
  • Yuxin Huang
  • Xin Jin

Burst detection is an important topic in temporal stream analysis. Usually, only textual features are used in burst detection. When extracting themes from currently prevailing social media content, it is necessary to consider not only textual features but also the pervasive collaborative context, e.g., resource lifetime and user activity. This paper explores novel approaches to combining multiple sources of such indications for better burst extraction. We systematically investigate the characteristics of collaborative context, i.e., metadata frequency, topic coverage, and user attractiveness. First, a robust state-based model is utilized to detect bursts from individual streams. We then propose a learning method to combine these burst pulses. Experiments on a large real dataset demonstrate remarkable improvements over traditional methods.
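The state-based detection step can be illustrated with a minimal two-state detector (a simplified sketch, not the paper's robust model): relative to a stream's baseline rate, switch into the burst state when the count exceeds an entry multiple of the baseline, and back to normal once it drops below an exit multiple.

```python
import numpy as np

def detect_bursts(counts, enter=2.0, exit_=1.0):
    # Minimal two-state burst detector with hysteresis: enter the burst
    # state above `enter * baseline`, leave it below `exit_ * baseline`.
    # Hyperparameter names and baseline choice are illustrative assumptions.
    counts = np.asarray(counts, dtype=float)
    baseline = counts.mean()
    in_burst, flags = False, []
    for c in counts:
        if not in_burst and c > enter * baseline:
            in_burst = True
        elif in_burst and c < exit_ * baseline:
            in_burst = False
        flags.append(in_burst)
    return flags

flags = detect_bursts([1, 1, 2, 9, 8, 1, 1])
```

Running one such detector per context stream (tag counts, resource lifetime, user activity) yields the per-stream "burst pulses" that the paper's learning method then combines.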