Arrow Research search

Author name cluster

Yukino Baba

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers (11)

NeurIPS 2025 · Conference Paper

Self Iterative Label Refinement via Robust Unlabeled Learning

  • Hikaru Asano
  • Tadashi Kozuno
  • Yukino Baba

Recent advances in large language models (LLMs) have yielded impressive performance on various tasks, yet they often depend on high-quality feedback that can be costly. Self-refinement methods attempt to leverage LLMs' internal evaluation mechanisms with minimal human supervision; however, these approaches frequently suffer from inherent biases and overconfidence, especially in domains where the models lack sufficient internal knowledge, resulting in performance degradation. As an initial step toward enhancing self-refinement for broader applications, we introduce an iterative refinement pipeline that employs the Unlabeled-Unlabeled learning framework to improve LLM-generated pseudo-labels for classification tasks. By exploiting two unlabeled datasets with differing positive class ratios, our approach iteratively denoises and refines the initial pseudo-labels, thereby mitigating the adverse effects of internal biases with minimal human supervision. Evaluations on diverse datasets, including low-resource language corpora, patent classifications, and protein structure categorizations, demonstrate that our method consistently outperforms both the initial LLM's classification performance and self-refinement approaches based on cutting-edge models (e.g., GPT-4o and DeepSeek-R1). Moreover, we experimentally confirm that our refined classifier facilitates effective post-training alignment for safety in LLMs and demonstrate successful self-refinement in generative tasks as well. Our code is available at https://github.com/HikaruAsano/self-iterative-label-refinement.
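The key ingredient the abstract names is the Unlabeled-Unlabeled learning framework: with two unlabeled sets whose positive-class priors differ, a classification risk can be estimated without any labels. The sketch below is illustrative only, not the paper's pipeline: it assumes a 1-D linear scorer, logistic loss, Gaussian class-conditionals at ±2, and known priors; none of these specifics come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_unlabeled(n, prior):
    """Mixture draw (assumed synthetic setup): positives ~ N(+2,1), negatives ~ N(-2,1)."""
    pos = rng.random(n) < prior
    return np.where(pos, rng.normal(2.0, 1.0, n), rng.normal(-2.0, 1.0, n)).reshape(-1, 1)

def uu_grad(w, b, Xu, Xup, theta, theta_p, pi):
    """Gradient of an unbiased UU risk estimator (logistic loss) for a linear
    scorer g(x) = w.x + b, built from two unlabeled sets whose positive-class
    priors theta > theta_p are assumed known."""
    def part(X, sign):
        z = sign * (X @ w + b)
        s = -sign / (1.0 + np.exp(z))          # d loss / d (w.x + b)
        return (X * s[:, None]).mean(0), s.mean()
    gup, bup = part(Xu, +1);  gum, bum = part(Xu, -1)
    gpp, bpp = part(Xup, +1); gpm, bpm = part(Xup, -1)
    d = theta - theta_p
    gw = (pi / d) * ((1 - theta_p) * gup - (1 - theta) * gpp) \
       + ((1 - pi) / d) * (theta * gpm - theta_p * gum)
    gb = (pi / d) * ((1 - theta_p) * bup - (1 - theta) * bpp) \
       + ((1 - pi) / d) * (theta * bpm - theta_p * bum)
    return gw, gb

theta, theta_p, pi = 0.8, 0.2, 0.5             # priors of the two unlabeled sets
Xu, Xup = sample_unlabeled(2000, theta), sample_unlabeled(2000, theta_p)
w, b = np.zeros(1), 0.0
for _ in range(500):                            # plain gradient descent
    gw, gb = uu_grad(w, b, Xu, Xup, theta, theta_p, pi)
    w, b = w - 0.1 * gw, b - 0.1 * gb

# Accuracy on a held-out labeled sample (never used for training).
yt = rng.random(2000) < pi
Xt = np.where(yt, rng.normal(2.0, 1.0, 2000), rng.normal(-2.0, 1.0, 2000)).reshape(-1, 1)
acc = ((Xt @ w + b > 0) == yt).mean()
```

The coefficients in `uu_grad` come from solving the two mixture equations for the positive and negative class-conditional expectations; the classifier is trained without ever seeing a label.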

IJCAI 2020 · Conference Paper

Performance as a Constraint: An Improved Wisdom of Crowds Using Performance Regularization

  • Jiyi Li
  • Yasushi Kawase
  • Yukino Baba
  • Hisashi Kashima

Quality assurance is one of the most important problems in crowdsourcing and human computation, and it has been extensively studied from various aspects. Typical approaches for quality assurance include unsupervised approaches such as introducing task redundancy (i.e., asking the same question to multiple workers and aggregating their answers) and supervised approaches such as using worker performance on past tasks or injecting qualification questions into tasks in order to estimate the worker performance. In this paper, we propose to utilize the worker performance as a global constraint for inferring the true answers. The existing semi-supervised approaches do not consider such use of qualification questions. We also propose to utilize the constraint as a regularizer combined with existing statistical aggregation methods. The experiments using heterogeneous multiple-choice questions demonstrate that the performance constraint not only has the power to estimate the ground truths when used by itself, but also boosts the existing aggregation methods when used as a regularizer.
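A toy version of the performance constraint used on its own: choose each item's label so that every worker's empirical agreement rate with the inferred truths stays close to their known accuracy. This coordinate-descent sketch is not the paper's formulation (which combines the constraint with statistical aggregation models as a regularizer); it only illustrates the idea.

```python
import numpy as np

def aggregate_with_performance(labels, perf, n_classes, n_iter=20):
    """labels: (n_items, n_workers) matrix of worker answers.
    perf: each worker's known accuracy (e.g. from qualification questions).
    Coordinate descent on the global constraint: pick each item's label so
    per-worker agreement rates match the known accuracies."""
    truth = np.array([np.bincount(r, minlength=n_classes).argmax() for r in labels])
    for _ in range(n_iter):
        changed = False
        for i in range(len(labels)):
            orig, best, best_cost = truth[i], truth[i], np.inf
            for c in range(n_classes):
                truth[i] = c
                agree = (labels == truth[:, None]).mean(0)   # per-worker agreement
                cost = ((agree - perf) ** 2).sum()           # constraint violation
                if cost < best_cost:
                    best, best_cost = c, cost
            truth[i] = best
            changed = changed or (best != orig)
        if not changed:
            break
    return truth

# One perfectly reliable worker and one always-wrong worker: majority voting
# ties, but the performance constraint recovers the truth (and anti-follows
# the unreliable worker).
votes = np.array([[0, 1], [1, 0], [0, 1], [1, 0]])
est = aggregate_with_performance(votes, np.array([1.0, 0.0]), n_classes=2)
```

Note how a worker with known accuracy 0.0 is as informative as a perfect one; simple vote weighting cannot exploit that, while the global constraint can.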

AAAI 2018 · Conference Paper

AdaFlock: Adaptive Feature Discovery for Human-in-the-loop Predictive Modeling

  • Ryusuke Takahama
  • Yukino Baba
  • Nobuyuki Shimizu
  • Sumio Fujita
  • Hisashi Kashima

Feature engineering is the key to successful application of machine learning algorithms to real-world data. The discovery of informative features often requires domain knowledge or human inspiration, and data scientists expend considerable effort exploring feature spaces. Crowdsourcing is considered a promising approach for allowing many people to be involved in feature engineering; however, there is a demand for a sophisticated strategy that enables us to acquire good features at a reasonable crowdsourcing cost. In this paper, we present a novel algorithm called AdaFlock to efficiently obtain informative features through crowdsourcing. AdaFlock is inspired by AdaBoost, which iteratively trains classifiers by increasing the weights of samples misclassified by previous classifiers. AdaFlock iteratively generates informative features; at each iteration of AdaFlock, crowdsourcing workers are shown samples selected according to the classification errors of the current classifiers and are asked to generate new features that are helpful for correctly classifying the given examples. The results of our experiments conducted using real datasets indicate that AdaFlock successfully discovers informative features with fewer iterations and achieves high classification accuracy.
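The iteration described above can be sketched as one AdaBoost-style round. Everything here is illustrative: `stump` is a fixed threshold classifier standing in for real training, and `crowd` is a stand-in for actual workers (here it just hands back a perfectly informative column); neither comes from the paper.

```python
import numpy as np

def adaflock_round(X, y, weights, train, ask_crowd_for_feature, k=2):
    """One AdaFlock-style round (sketch): train on current features, up-weight
    misclassified samples as in AdaBoost, show the k hardest samples to the
    crowd, and append the feature they propose."""
    predict = train(X, y, weights)
    miss = predict(X) != y
    err = np.clip(np.average(miss, weights=weights), 1e-9, 1 - 1e-9)
    alpha = 0.5 * np.log((1 - err) / err)          # AdaBoost weight update
    weights = weights * np.exp(alpha * miss)       # boost the mistakes
    weights = weights / weights.sum()
    hardest = np.argsort(-weights)[:k]             # samples shown to workers
    new_feature = ask_crowd_for_feature(hardest)   # crowd-generated feature
    return np.column_stack([X, new_feature]), weights

# Toy demo: a fixed stump on feature 0 stands in for `train`, and the
# simulated "crowd" returns the label column as a new feature.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 1, 1, 0])
stump = lambda X_, y_, w_: (lambda Z: (Z[:, 0] > 0.5).astype(int))
crowd = lambda idx: y
X2, w2 = adaflock_round(X, y, np.full(4, 0.25), stump, crowd)
```

After the round, the one misclassified sample carries the largest weight, so the crowd's attention concentrates exactly where the current classifier fails.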

AAAI 2018 · Conference Paper

Data Analysis Competition Platform for Educational Purposes: Lessons Learned and Future Challenges

  • Yukino Baba
  • Tomoumi Takase
  • Kyohei Atarashi
  • Satoshi Oyama
  • Hisashi Kashima

Data analysis education plays an important role in accelerating the efficient use of data analysis technologies in various domains. Not only the knowledge of statistics and machine learning, but also practical skills in deploying machine learning and data analysis techniques, are required for conducting data analysis projects in the real world. Data analysis competitions, such as Kaggle, have been considered an efficient system for learning such skills by addressing real data analysis problems. However, current data analysis competitions are not designed for educational purposes, and it is not well studied how data analysis competition platforms should be designed to enhance educational effectiveness. To answer this research question, we built and operated an educational data analysis competition platform called University of Big Data for several years. In this paper, we present our approaches for supporting and motivating learners and the results of our case studies. We found that providing a tutorial article is beneficial for encouraging active participation of learners, and a leaderboard system allowing an unlimited number of submissions can motivate the efforts of learners. We further discuss future directions of educational data analysis competitions.

AAAI 2018 · Conference Paper

Predictive Modeling of Learning Continuation in Preschool Education Using Temporal Patterns of Development Tests

  • Junpei Naito
  • Yukino Baba
  • Hisashi Kashima
  • Takenori Takaki
  • Takuya Funo

Learning analytics applies data analysis techniques to learning data in order to support students’ learning processes and to improve the quality of education. Despite the increasing attention to learning analytics for higher education, it has not been fully addressed in primary and preschool education. In this research, we apply learning analytics to preschool education to predict the continuation of learning of preschool children. Based on our hypothesis that temporal patterns in the assessment scores of development tests are effective features for prediction, we extract the temporal patterns using time-series clustering, and use them as the features of prediction models. The experimental results using a real preschool education dataset show that the use of the temporal patterns improves the predictive accuracy of future continuation of study.
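The feature-extraction step can be sketched with plain k-means over fixed-length score trajectories; the cluster assignment then becomes an input feature for a prediction model. This is a simple stand-in for the paper's time-series clustering, with a fully synthetic dataset of rising versus falling score patterns.

```python
import numpy as np

def kmeans_series(series, k, n_iter=20):
    """Plain k-means over fixed-length score trajectories (a simple stand-in
    for the paper's time-series clustering). Deterministic spread-out
    initialization keeps the sketch reproducible."""
    idx = np.linspace(0, len(series) - 1, k).astype(int)
    centers = series[idx].astype(float).copy()
    for _ in range(n_iter):
        d = ((series[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)                      # nearest-center assignment
        for j in range(k):
            if (assign == j).any():
                centers[j] = series[assign == j].mean(0)
    return assign

# Two obvious temporal patterns: steadily rising vs. steadily falling scores.
t = np.linspace(0, 1, 6)
rising = np.stack([t + 0.01 * i for i in range(10)])
falling = np.stack([1 - t + 0.01 * i for i in range(10)])
assign = kmeans_series(np.vstack([rising, falling]), k=2)
```

The recovered cluster id (here, "rising" vs. "falling") is the kind of temporal-pattern feature the abstract feeds to the continuation-prediction model.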

IJCAI 2018 · Conference Paper

Simultaneous Clustering and Ranking from Pairwise Comparisons

  • Jiyi Li
  • Yukino Baba
  • Hisashi Kashima

When people make decisions with a number of ideas, designs, or other kinds of objects, a common first step is to organize them into several groups of objects and to prioritize them according to some preference. The grouping task is referred to as clustering, and the prioritizing task is called ranking. These tasks are often outsourced with the help of human judgments in the form of pairwise comparisons. Two objects are compared on whether they are similar in the clustering problem, while the object of higher priority is determined in the ranking problem. Our research question in this paper is whether the pairwise comparisons for clustering also help ranking (and vice versa). Instead of solving the two tasks separately, we propose a unified formulation to bridge the two types of pairwise comparisons. Our formulation simultaneously estimates the object embeddings and the preference criterion vector. The experiments using real datasets support our hypothesis; our approach can generate better neighbor and preference estimation results than the approaches that only focus on a single type of pairwise comparisons.
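The unified formulation can be sketched as joint gradient descent on object embeddings and a preference vector: similarity comparisons pull or push embeddings, while ranking comparisons are scored by the preference vector's projection. The losses below (squared attraction, hinge repulsion, logistic ranking) are illustrative choices, not the paper's objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit(n, sim, dis, wins, d=2, lr=0.05, n_iter=2000, margin=1.0):
    """Jointly fit embeddings X and preference vector w from similarity
    comparisons (sim/dis pairs) and ranking comparisons (wins: i beats j)."""
    X = 0.1 * rng.normal(size=(n, d))
    w = 0.1 * rng.normal(size=d)
    for _ in range(n_iter):
        gX, gw = np.zeros_like(X), np.zeros_like(w)
        for i, j in sim:                       # pull similar objects together
            gX[i] += 2 * (X[i] - X[j]); gX[j] -= 2 * (X[i] - X[j])
        for i, j in dis:                       # push dissimilar ones apart
            diff = X[i] - X[j]; dist = np.linalg.norm(diff) + 1e-9
            if dist < margin:
                g = -2 * (margin - dist) * diff / dist
                gX[i] += g; gX[j] -= g
        for i, j in wins:                      # i preferred: raise w.(x_i - x_j)
            s = 1 / (1 + np.exp(w @ (X[i] - X[j])))
            gX[i] -= s * w; gX[j] += s * w
            gw -= s * (X[i] - X[j])
        X -= lr * gX; w -= lr * gw
    return X, w

# Four objects: {0,1} similar, {2,3} similar, and 0/1 preferred over 2/3.
X, w = fit(4, sim=[(0, 1), (2, 3)], dis=[(0, 2), (1, 3)],
           wins=[(0, 2), (1, 3), (0, 3)])
```

Both signals shape the same embedding space, which is the sense in which clustering comparisons help ranking and vice versa.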

IJCAI 2018 · Conference Paper

Statistical Quality Control for Human Computation and Crowdsourcing

  • Yukino Baba

Human computation is a method for solving difficult problems by combining humans and computers. Quality control is a critical issue in human computation because it relies on a large number of participants (i.e., crowds) and there is uncertainty about their reliability. A solution for this issue is to leverage the power of the "wisdom of crowds"; for example, we can aggregate the outputs of multiple participants or ask a participant to check the output of another participant to improve its quality. In this paper, we review several statistical approaches for controlling the quality of outputs from crowds.
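The simplest of the reviewed approaches, aggregation over redundant workers, fits in a few lines; the review covers far more sophisticated statistical models (worker-reliability estimation, peer checking), which this sketch deliberately omits.

```python
from collections import Counter

def aggregate(answers):
    """Simplest wisdom-of-crowds aggregation: take the modal answer among
    the redundant workers assigned to a task."""
    return Counter(answers).most_common(1)[0][0]

label = aggregate(["cat", "dog", "cat"])
```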

AAAI 2017 · Conference Paper

Pairwise HITS: Quality Estimation from Pairwise Comparisons in Creator-Evaluator Crowdsourcing Process

  • Takeru Sunahase
  • Yukino Baba
  • Hisashi Kashima

A common technique for improving the quality of crowdsourcing results is to assign the same task to multiple workers redundantly, and then to aggregate the results to obtain a higher-quality result; however, this technique is not applicable to complex tasks such as article writing since there is no obvious way to aggregate the results. Instead, we can use a two-stage procedure consisting of a creation stage and an evaluation stage, where we first ask workers to create artifacts, and then ask other workers to evaluate the artifacts to estimate their quality. In this study, we propose a novel quality estimation method for the two-stage procedure where pairwise comparison results for pairs of artifacts are collected at the evaluation stage. Our method is based on an extension of Kleinberg’s HITS algorithm to pairwise comparison, which takes into account the ability of evaluators as well as the ability of creators. Experiments using actual crowdsourcing tasks show that our methods outperform baseline methods especially when the number of evaluators per artifact is small.
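The HITS-style mutual reinforcement can be sketched as two alternating updates: an artifact is good if able evaluators prefer it, and an evaluator is able if their judgments agree with the current quality estimates. The update rules below are an illustrative simplification, not the paper's exact algorithm.

```python
import numpy as np

def pairwise_hits(comparisons, n_items, n_evals, n_iter=30):
    """HITS-style mutual reinforcement (sketch).
    comparisons: list of (winner, loser, evaluator) triples."""
    q, e = np.ones(n_items), np.ones(n_evals)
    for _ in range(n_iter):
        q_new = np.zeros(n_items)
        e_new = np.zeros(n_evals)
        for w, l, v in comparisons:
            q_new[w] += e[v]              # a win counts more from an able evaluator
            if q[w] >= q[l]:              # judgment consistent with current quality
                e_new[v] += 1.0
        q = q_new / (np.linalg.norm(q_new) + 1e-12)
        e = e_new / (np.linalg.norm(e_new) + 1e-12)
    return q, e

# Three artifacts (true order 0 > 1 > 2), two reliable evaluators and one
# evaluator who always inverts the true order.
comparisons = [(0, 1, 0), (1, 2, 0), (0, 2, 0),
               (0, 1, 1), (1, 2, 1), (0, 2, 1),
               (1, 0, 2), (2, 1, 2), (2, 0, 2)]
q, e = pairwise_hits(comparisons, n_items=3, n_evals=3)
```

The inverted evaluator's influence decays to zero across iterations, so the quality ranking follows the reliable majority even with few evaluators per artifact.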

IJCAI 2016 · Conference Paper

Assessing Translation Ability through Vocabulary Ability Assessment

  • Yo Ehara
  • Yukino Baba
  • Masao Utiyama
  • Eiichiro Sumita

Translation ability is known as one of the most difficult language abilities to measure. A typical method of measuring translation ability involves asking translators to translate sentences and requesting professional evaluators to grade the translations. It imposes a heavy burden on both translators and evaluators. In this paper, we propose a practical method for assessing translation ability. Our key idea is to incorporate translators' vocabulary knowledge for translation ability assessment. Our method involves simply asking translators whether they know given words. Using this vocabulary information, we build a probabilistic model to estimate the translators' vocabulary and translation abilities simultaneously. We evaluated our method in a realistic crowdsourcing translation setting in which there is a great need to measure translators' translation ability to select good translators. The results of our experiments show that the proposed method accurately estimates translation ability and selects translators who have sufficient skills in translating a given sentence. We also found that our method significantly reduces the cost of crowdsourcing translation.
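The vocabulary part of such a model can be sketched Rasch-style: the probability that a translator knows a word is a logistic function of ability minus word difficulty, fit to a yes/no response matrix. This is an illustrative stand-in, not the paper's exact probabilistic model (which additionally links vocabulary to translation ability).

```python
import numpy as np

def fit_rasch(R, lr=0.1, n_iter=500):
    """Rasch-style sketch: P(translator u knows word w) = sigmoid(a[u] - d[w]);
    fit abilities a and difficulties d by gradient ascent on the Bernoulli
    log-likelihood of the 0/1 response matrix R (rows: translators)."""
    n_u, n_w = R.shape
    a = np.zeros(n_u)                   # translator abilities
    d = np.zeros(n_w)                   # word difficulties
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-(a[:, None] - d[None, :])))
        g = R - p                       # gradient of log-likelihood wrt logits
        a += lr * g.sum(1) / n_w
        d -= lr * g.sum(0) / n_u
    return a, d

# Translator 0 knows nearly every word; translator 1 knows only the easy ones.
R = np.array([[1, 1, 1, 1, 0],
              [1, 1, 0, 0, 0]], dtype=float)
a, d = fit_rasch(R)
```

Asking "do you know this word?" is far cheaper than grading translations, which is the cost saving the abstract reports.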

IJCAI 2013 · Conference Paper

Accurate Integration of Crowdsourced Labels Using Workers' Self-Reported Confidence Scores

  • Satoshi Oyama
  • Yukino Baba
  • Yuko Sakurai
  • Hisashi Kashima

We have developed a method for using confidence scores to integrate labels provided by crowdsourcing workers. Although confidence scores can be useful information for estimating the quality of the provided labels, a way to effectively incorporate them into the integration process has not been established. Moreover, some workers are overconfident about the quality of their labels while others are underconfident, and some workers are quite accurate in judging the quality of their labels. This differing reliability of the confidence scores among workers means that the probability distributions for the reported confidence scores differ among workers. To address this problem, we extended the Dawid-Skene model and created two probabilistic models in which the values of unobserved true labels are inferred from the observed provided labels and reported confidence scores by using the expectation-maximization algorithm. Results of experiments using actual crowdsourced data for image labeling and binary question answering tasks showed that incorporating workers’ confidence scores can improve the accuracy of integrated crowdsourced labels.
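The Dawid-Skene starting point can be sketched with a one-coin EM: the E-step infers a posterior over each item's true label, and the M-step re-estimates each worker's accuracy. The paper's extensions additionally model each worker's distribution over self-reported confidence scores, which this simplified binary, uniform-prior sketch omits.

```python
import numpy as np

def em_one_coin(L, n_iter=50):
    """One-coin Dawid-Skene EM (sketch): binary labels, uniform class prior.
    L: (n_items, n_workers) matrix of 0/1 worker labels."""
    acc = np.full(L.shape[1], 0.7)          # initial worker accuracies
    for _ in range(n_iter):
        # E-step: posterior P(true label = 1) per item
        log1 = (L * np.log(acc) + (1 - L) * np.log(1 - acc)).sum(1)
        log0 = ((1 - L) * np.log(acc) + L * np.log(1 - acc)).sum(1)
        mu = 1 / (1 + np.exp(log0 - log1))
        # M-step: accuracy = expected agreement with the inferred labels
        acc = (mu[:, None] * L + (1 - mu)[:, None] * (1 - L)).mean(0)
        acc = acc.clip(1e-3, 1 - 1e-3)
    return mu, acc

# Five items (true labels 1,0,1,1,0); columns are two mostly-reliable
# workers and one adversarial worker.
L = np.array([[1, 1, 0],
              [0, 0, 1],
              [1, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])
mu, acc = em_one_coin(L)
```

EM learns to invert the adversarial worker's votes, so the integrated labels beat plain majority voting; folding in per-worker confidence-score distributions, as the paper does, refines the same machinery.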

ECAI 2010 · Conference Paper

Extraction of Places Related to Flickr Tags

  • Yukino Baba
  • Fuyuki Ishikawa
  • Shinichi Honiden

Geographic information systems use databases to map keywords to places. These databases are currently most often created by using a top-down approach based on geographic definitions. However, there is a problem with this approach in that these databases only contain location definitions such as addresses and place names, which does not allow for searches using keywords other than these words. Additionally, they do not give any information on popularity, e.g., which is more popular among the places indexed by the same keyword. A bottom-up approach, based on the actual usage of words, can address these problems. We propose a method to aggregate tagging data and extract places related to a tag using pairs of tags and geo-tagged photos. We target the co-occurrence of a tag and the geolocation and represent the places related to a tag as a probability distribution over the longitudes and latitudes. We applied our method to data on the photo sharing service Flickr and experimentally confirmed that our method can extract places related to tags with high accuracy. Our direct bottom-up approach enables the extraction of place information that is not obtained by using traditional top-down approaches.
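Representing a tag's places as a probability distribution over coordinates can be sketched with a Gaussian kernel density estimate over the (lat, lon) points of photos carrying the tag; the density's modes are the extracted places, and their relative heights reflect popularity. The coordinates, grid, and bandwidth below are illustrative, not from the paper.

```python
import numpy as np

def tag_place_density(coords, grid, bandwidth=0.5):
    """Gaussian kernel density estimate over (lat, lon) photo coordinates,
    a simple stand-in for the paper's tag-to-place distribution."""
    diff = grid[:, None, :] - coords[None, :, :]
    k = np.exp(-(diff ** 2).sum(-1) / (2 * bandwidth ** 2))
    return k.mean(1)                      # average kernel mass per grid point

# Hypothetical photos for one tag: a dense cluster near (35.66, 139.75)
# plus a stray outlier far away.
coords = np.array([[35.66, 139.75], [35.65, 139.74], [35.67, 139.76],
                   [40.00, 140.00]])
lats, lons = np.meshgrid(np.linspace(34, 41, 71), np.linspace(138, 141, 31))
grid = np.column_stack([lats.ravel(), lons.ravel()])
dens = tag_place_density(coords, grid)
peak = grid[dens.argmax()]                # most popular place for the tag
```

The density peak lands on the photo cluster rather than the outlier, which is the bottom-up, usage-driven behavior the abstract contrasts with top-down gazetteers.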