Arrow Research

Author name cluster

I-Chen Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
2 author rows

Possible papers


AAMAS Conference 2025 Conference Paper

Dynamic Sight Range Selection in Multi-Agent Reinforcement Learning

  • Wei-Chen Liao
  • Ti-Rong Wu
  • I-Chen Wu

Multi-agent reinforcement learning (MARL) is often challenged by the sight range dilemma, where agents receive either insufficient or excessive information from their environment. In this paper, we propose a novel method, called Dynamic Sight Range Selection (DSR), to address this issue. DSR utilizes an Upper Confidence Bound (UCB) algorithm and dynamically adjusts the sight range during training. Experimental results show several advantages of using DSR. First, we demonstrate that DSR achieves better performance in three common MARL environments, including Level-Based Foraging (LBF), Multi-Robot Warehouse (RWARE), and the StarCraft Multi-Agent Challenge (SMAC). Second, our results show that DSR consistently improves performance across multiple MARL algorithms, including QMIX and MAPPO. Third, DSR offers suitable sight ranges at different training steps, thereby accelerating the training process. Finally, DSR provides additional interpretability by indicating the optimal sight range used during training. Unlike existing methods that rely on global information or communication mechanisms, our approach operates solely on the individual sight ranges of agents. This offers a practical and efficient solution to the sight range dilemma, making it broadly applicable to complex real-world environments.
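
As a rough illustration of the bandit view behind DSR, the sketch below treats each candidate sight range as a UCB arm scored by mean episode return. The candidate ranges, the reward signal, and all names (SIGHT_RANGES, select_range, update) are hypothetical stand-ins, not the authors' implementation.

```python
import math

# Minimal UCB sketch over candidate sight ranges (illustrative assumption).
SIGHT_RANGES = [1, 2, 3, 5, 9]          # hypothetical candidate sight ranges
counts = [0] * len(SIGHT_RANGES)        # times each range was tried
values = [0.0] * len(SIGHT_RANGES)      # running mean episode return per range

def select_range(t, c=1.4):
    """Pick the index of the sight range with the highest UCB score at step t."""
    for i, n in enumerate(counts):
        if n == 0:
            return i                    # try every arm once before applying UCB
    return max(range(len(SIGHT_RANGES)),
               key=lambda i: values[i] + c * math.sqrt(math.log(t) / counts[i]))

def update(i, episode_return):
    """Incrementally update the mean return of arm i."""
    counts[i] += 1
    values[i] += (episode_return - values[i]) / counts[i]
```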

NeurIPS Conference 2025 Conference Paper

Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization

  • Jian-Ting Guo
  • Yu-Cheng Chen
  • Ping-Chun Hsieh
  • Kuo-Hao Ho
  • Po-Wei Huang
  • Ti-Rong Wu
  • I-Chen Wu

Human-like agents have long been one of the goals in the pursuit of artificial intelligence. Although reinforcement learning (RL) has achieved superhuman performance in many domains, relatively little attention has been paid to designing human-like RL agents. As a result, many reward-driven RL agents often exhibit unnatural behaviors compared to humans, raising concerns for both interpretability and trustworthiness. To achieve human-like behavior in RL, this paper first formulates human-likeness as trajectory optimization, where the objective is to find an action sequence that closely aligns with human behavior while also maximizing rewards, and adapts classic receding-horizon control to human-like learning as a tractable and efficient implementation. To achieve this, we introduce Macro Action Quantization (MAQ), a human-like RL framework that distills human demonstrations into macro actions via a Vector-Quantized VAE. Experiments on the D4RL Adroit benchmarks show that MAQ significantly improves human-likeness, increasing trajectory similarity scores and achieving the highest human-likeness rankings among all RL agents in the human evaluation study. Our results also demonstrate that MAQ can be easily integrated into various off-the-shelf RL algorithms, opening a promising direction for learning human-like RL agents. Our code is available at https://rlg.iis.sinica.edu.tw/papers/MAQ.
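
A minimal sketch of the quantization step a VQ-VAE-style codebook performs on action chunks, to make the macro-action idea concrete. The codebook here is random rather than trained, and the shapes (K, H, A) are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Nearest-codebook lookup standing in for a trained VQ-VAE (assumption).
rng = np.random.default_rng(0)
K, H, A = 64, 4, 6                       # codebook size, macro horizon, action dim
codebook = rng.normal(size=(K, H * A))   # stands in for learned macro-action codes

def quantize(action_chunk):
    """Return the index and value of the nearest macro-action code."""
    flat = action_chunk.reshape(-1)                   # (H*A,)
    dists = np.linalg.norm(codebook - flat, axis=1)   # distance to each code
    k = int(dists.argmin())
    return k, codebook[k].reshape(H, A)

demo_chunk = rng.normal(size=(H, A))     # stands in for a human demo segment
code_id, macro_action = quantize(demo_chunk)
```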

NeurIPS Conference 2025 Conference Paper

TaiwanVQA: Benchmarking and Enhancing Cultural Understanding in Vision-Language Models

  • Hsin Yi Hsieh
  • Shang-Wei Liu
  • Chang-Chih Meng
  • Chien-Hua Chen
  • Shuo-Yueh Lin
  • Hung-Ju Lin
  • Hen-Hsen Huang
  • I-Chen Wu

Vision-language models (VLMs) often struggle with culturally specific content, a challenge largely overlooked by existing benchmarks that focus on dominant languages and globalized datasets. We introduce TaiwanVQA, a VQA benchmark designed for Taiwanese culture to evaluate recognition and reasoning in regional contexts. TaiwanVQA contains 2,736 images and 5,472 manually curated questions covering topics such as traditional foods, public signs, festivals, and landmarks. The official benchmark set includes 1,000 images and 2,000 questions for systematic assessment, with the remainder of the data used as training material. Evaluations on state-of-the-art VLMs reveal strong visual recognition but notable weaknesses in cultural reasoning. To address this, we propose a data augmentation strategy that combines human-annotated and synthesized dialogues to enhance cultural understanding. Fine-tuning yields significant gains on TaiwanVQA while maintaining stable performance on other multimodal tasks. To further explore the models' cultural understanding, we conducted an open-ended question answering experiment. The results indicate a notable decline in cultural knowledge generation (approximately 10-20%), suggesting challenges remain. TaiwanVQA offers a scalable framework for building culturally grounded AI models in low-resource cultures, promoting diversity and fairness in multimodal AI. Our dataset and code are publicly available on Hugging Face (https://huggingface.co/datasets/hhhuang/TaiwanVQA) and GitHub (https://github.com/hhhuang/TaiwanVQA).

IROS Conference 2024 Conference Paper

Gradient-based Regularization for Action Smoothness in Robotic Control with Reinforcement Learning

  • I Lee
  • Hoang-Giang Cao
  • Cong-Tinh Dao
  • Yu-Cheng Chen
  • I-Chen Wu

Deep Reinforcement Learning (DRL) has achieved remarkable success, ranging from complex computer games to real-world applications, showing the potential for intelligent agents capable of learning in dynamic environments. However, its application in real-world scenarios presents challenges, including the jerky-action problem, in which jerky trajectories not only compromise system safety but also increase power consumption and shorten the service life of robotic and autonomous systems. To address jerky actions, a method called conditioning for action policy smoothness (CAPS) was proposed, adding regularization terms to reduce action changes. This paper further proposes a novel method, named Gradient-based CAPS (Grad-CAPS), that modifies CAPS to reduce the difference in the gradient of actions and then applies displacement normalization so that the agent adapts across different action scales. Consequently, our method effectively reduces zigzagging action sequences while enhancing policy expressiveness and adaptability across diverse scenarios and environments. In the experiments, we integrated Grad-CAPS with different reinforcement learning algorithms and evaluated its performance on various robotics tasks in the DeepMind Control Suite and OpenAI Gym environments. The results demonstrate that Grad-CAPS effectively improves performance while maintaining a level of smoothness comparable to CAPS and vanilla agents.
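
To make the distinction concrete, the sketch below contrasts a CAPS-style penalty on the first difference of consecutive actions with a Grad-CAPS-style penalty on the change of that difference, plus a displacement normalization term. This is one plausible reading of the abstract, not the authors' exact formulation.

```python
import torch

def caps_penalty(actions):
    """CAPS-style smoothness: penalize the first difference of actions.

    actions: (T, action_dim) tensor of consecutive policy outputs.
    """
    return (actions[1:] - actions[:-1]).pow(2).mean()

def grad_caps_penalty(actions, eps=1e-8):
    """Grad-CAPS-style smoothness (assumed form): penalize the change in the
    action's rate of change, normalized by displacement magnitude."""
    d1 = actions[1:] - actions[:-1]      # first difference (velocity-like)
    d2 = d1[1:] - d1[:-1]                # second difference (gradient of change)
    norm = d1[:-1].abs() + eps           # displacement normalization (assumption)
    return (d2 / norm).pow(2).mean()

# total_loss = policy_loss + lam * grad_caps_penalty(actions)  # lam is a weight
```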

TMLR Journal 2024 Journal Article

Identifying and Clustering Counter Relationships of Team Compositions in PvP Games for Efficient Balance Analysis

  • Chiu-Chou Lin
  • Yu-Wei Shih
  • Kuei-Ting Kuo
  • Yu-Cheng Chen
  • Chien-Hua Chen
  • Wei-Chen Chiu
  • I-Chen Wu

How can balance be quantified in game settings? This question is crucial for game designers, especially in player-versus-player (PvP) games, where analyzing the strength relations among predefined team compositions—such as hero combinations in multiplayer online battle arena (MOBA) games or decks in card games—is essential for enhancing gameplay and achieving balance. We have developed two advanced measures that extend beyond the simplistic win rate to quantify balance in zero-sum competitive scenarios. These measures are derived from win value estimations, which employ strength rating approximations via the Bradley-Terry model and counter relationship approximations via vector quantization, significantly reducing the computational complexity associated with traditional win value estimations. Throughout the learning process of these models, we identify useful categories of compositions and pinpoint their counter relationships, aligning with the experiences of human players without requiring specific game knowledge. Our methodology hinges on a simple technique to enhance codebook utilization in discrete representation with a deterministic vector quantization process for an extremely small state space. Our framework has been validated in popular online games, including Age of Empires II, Hearthstone, Brawl Stars, and League of Legends. The accuracy of the observed strength relations in these games is comparable to traditional pairwise win value predictions, while also offering a more manageable complexity for analysis. Ultimately, our findings contribute to a deeper understanding of PvP game dynamics and present a methodology that significantly improves game balance evaluation and design.
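
As a small illustration of the strength-rating ingredient, the sketch below fits Bradley-Terry strengths to a toy pairwise win matrix with the standard MM update; the data and function names are invented for the example, not taken from the paper.

```python
import numpy as np

# Toy pairwise results: wins[i][j] = games composition i beat composition j.
wins = np.array([[0, 7, 4],
                 [3, 0, 6],
                 [6, 4, 0]], dtype=float)

def fit_bradley_terry(wins, iters=200):
    """Iterative minorize-maximize update for Bradley-Terry strengths."""
    n = wins.shape[0]
    p = np.ones(n)                       # strength parameters, one per composition
    games = wins + wins.T                # total games between each pair
    for _ in range(iters):
        for i in range(n):
            denom = sum(games[i, j] / (p[i] + p[j]) for j in range(n) if j != i)
            p[i] = wins[i].sum() / denom
        p /= p.sum()                     # fix the scale; only ratios matter
    return p                             # P(i beats j) = p[i] / (p[i] + p[j])

strengths = fit_bradley_terry(wins)
```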

TMLR Journal 2024 Journal Article

Perceptual Similarity for Measuring Decision-Making Style and Policy Diversity in Games

  • Chiu-Chou Lin
  • Wei-Chen Chiu
  • I-Chen Wu

Defining and measuring decision-making styles, also known as playstyles, is crucial in gaming, where these styles reflect a broad spectrum of individuality and diversity. However, finding a universally applicable measure for these styles poses a challenge. Building on Playstyle Distance, the first unsupervised metric to measure playstyle similarity based on game screens and raw actions (by identifying comparable states with discrete representations for computing policy distance), we introduce three enhancements to increase accuracy: multiscale analysis with varied state granularity, a perceptual kernel rooted in psychology, and the use of the intersection-over-union method for efficient evaluation. These innovations not only advance measurement precision but also offer insights into human cognition of similarity. Across two racing games and seven Atari games, our techniques significantly improve the precision of zero-shot playstyle classification, achieving an accuracy exceeding 90% with fewer than 512 observation-action pairs—less than half an episode of these games. Furthermore, our experiments with 2048 and Go demonstrate the potential of discrete playstyle measures in puzzle and board games. We also develop an algorithm for assessing decision-making diversity using these measures. Our findings improve the measurement of end-to-end game analysis and the evolution of artificial intelligence for diverse playstyles.
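
The intersection-over-union ingredient can be illustrated in a few lines: treat each player as the set of discrete states they visited and compute the Jaccard overlap. The state ids below are toy stand-ins for learned discrete encodings.

```python
def iou(states_a, states_b):
    """Jaccard similarity between two sets of visited discrete states."""
    a, b = set(states_a), set(states_b)
    return len(a & b) / len(a | b) if a | b else 0.0

player_a = [3, 5, 5, 8, 13, 21]   # discrete state ids from player A's episode
player_b = [2, 3, 5, 8, 8, 34]    # discrete state ids from player B's episode
similarity = iou(player_a, player_b)
```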

AAAI Conference 2024 Conference Paper

PPO-Clip Attains Global Optimality: Towards Deeper Understandings of Clipping

  • Nai-Chieh Huang
  • Ping-Chun Hsieh
  • Kuo-Hao Ho
  • I-Chen Wu

The Proximal Policy Optimization algorithm with a clipped surrogate objective (PPO-Clip) is a prominent exemplar of the policy optimization methods. However, despite its remarkable empirical success, PPO-Clip lacks theoretical substantiation to date. In this paper, we contribute to the field by establishing the first global convergence results for a PPO-Clip variant in both tabular and neural function approximation settings. Our findings highlight the O(1/√T) min-iterate convergence rate specifically in the context of neural function approximation. We tackle the inherent challenges in analyzing PPO-Clip through three central concepts: (i) we introduce a generalized version of the PPO-Clip objective, illuminated by its connection with the hinge loss; (ii) employing entropic mirror descent, we establish asymptotic convergence for tabular PPO-Clip with direct policy parameterization; (iii) inspired by the tabular analysis, we streamline the convergence analysis by introducing a two-step policy improvement approach. This decouples policy search from the complex neural policy parameterization using a regression-based update scheme. Furthermore, we gain deeper insights into the efficacy of PPO-Clip by interpreting these generalized objectives. Our theoretical findings also mark the first characterization of the influence of the clipping mechanism on PPO-Clip convergence; importantly, the clipping range affects only the pre-constant of the convergence rate.
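
For context, the sketch below shows the standard clipped surrogate that PPO-Clip optimizes; the paper's generalized hinge-loss objective is not reproduced here.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Negative clipped surrogate objective, averaged over a batch.

    logp_new / logp_old: log-probabilities of taken actions under the new
    and behavior policies; advantages: estimated advantages.
    """
    ratio = torch.exp(logp_new - logp_old)          # importance ratio
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)  # eps is the clipping range
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```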

NeurIPS Conference 2023 Conference Paper

Game Solving with Online Fine-Tuning

  • Ti-Rong Wu
  • Hung Guei
  • Ting Han Wei
  • Chung-Chin Shih
  • Jui-Te Chin
  • I-Chen Wu

Game solving is a similar, yet more difficult, task than mastering a game. Solving a game typically means finding the game-theoretic value (the outcome given optimal play), and optionally a full strategy to follow in order to achieve that outcome. The AlphaZero algorithm has demonstrated superhuman-level play, and its powerful policy and value predictions have also served as heuristics in game solving. However, to solve a game and obtain a full strategy, a winning response must be found for all possible moves by the losing player. This includes very poor lines of play from the losing side, which the AlphaZero self-play process will not encounter. AlphaZero-based heuristics can be highly inaccurate when evaluating these out-of-distribution positions, which occur throughout the entire search. To address this issue, this paper investigates applying online fine-tuning while searching and proposes two methods to learn tailor-designed heuristics for game solving. Our experiments show that using online fine-tuning can solve a series of challenging 7x7 Killall-Go problems, using only 23.54% of the computation time compared to the baseline without online fine-tuning. Results suggest that the savings scale with problem size. Our method can further be extended to any tree search algorithm for problem solving. Our code is available at https://rlg.iis.sinica.edu.tw/papers/neurips2023-online-fine-tuning-solver.

IROS Conference 2023 Conference Paper

Image-based Regularization for Action Smoothness in Autonomous Miniature Racing Car with Deep Reinforcement Learning

  • Hoang-Giang Cao
  • I Lee
  • Bo-Jiun Hsu
  • Zheng-Yi Lee
  • Yu-Wei Shih
  • Hsueh-Cheng Wang
  • I-Chen Wu

Deep reinforcement learning has achieved significant results in low-level control tasks. However, for applications like autonomous driving and drone flying, it is difficult to control behavior stably, since the agent may suddenly change its actions, which often lowers the control system's efficiency, induces excessive mechanical wear, and causes uncontrollable, dangerous behavior in the vehicle. Recently, a method called conditioning for action policy smoothness (CAPS) was proposed to solve the problem of jerkiness in low-dimensional features for applications such as quadrotor drones. To cope with high-dimensional features, this paper proposes image-based regularization for action smoothness (I-RAS) for solving jerky control in autonomous miniature car racing. We also introduce impact-ratio (IR) control, an adaptive regularization weight that governs the smoothness constraint. In the experiments, an agent with I-RAS and IR control significantly improves the success rate from 59% to 95%. In the real-world-track experiment, the agent also outperforms other methods, reducing the average lap finish time while improving the completion rate, even without real-world training. This is further evidenced by an agent based on I-RAS winning the 2022 AWS DeepRacer Final Championship Cup.

ICRA Conference 2023 Conference Paper

Learning Sim-to-Real Dense Object Descriptors for Robotic Manipulation

  • Hoang-Giang Cao
  • Weihao Zeng
  • I-Chen Wu

It is crucial to address the following issues for ubiquitous robotic manipulation applications: (a) vision-based manipulation tasks require the robot to visually learn and understand objects through rich information such as dense object descriptors; and (b) sim-to-real transfer in robotics aims to close the gap between simulated and real data. In this paper, we present Sim-to-Real Dense Object Nets (SRDONs), a dense object descriptor that not only understands the object via an appropriate representation but also maps simulated and real data to a unified feature space with pixel consistency. We propose an object-to-object matching method for image pairs from different scenes and different domains. This method helps reduce the effort of collecting real-world training data by taking advantage of public datasets, such as GraspNet. With sim-to-real object representation consistency, our SRDONs can serve as a building block for a variety of sim-to-real manipulation tasks. We demonstrate in experiments that pre-trained SRDONs significantly improve performance on unseen objects and unseen visual environments for various robotic tasks with zero real-world training.

AAAI Conference 2022 Conference Paper

A Novel Approach to Solving Goal-Achieving Problems for Board Games

  • Chung-Chin Shih
  • Ti-Rong Wu
  • Ting Han Wei
  • I-Chen Wu

Goal-achieving problems are puzzles that set up a specific situation with a clear objective. A well-studied example is the category of life-and-death (L&D) problems for Go, which helps players hone their skill of identifying region safety. Many previous methods, like lambda search, try null moves first, then derive so-called relevance zones (RZs), outside of which the opponent does not need to search. This paper first proposes a novel RZ-based approach, called RZ-Based Search (RZS), to solving L&D problems for Go. RZS tries moves before determining whether they are null moves post hoc. This means we do not need to rely on null-move heuristics, resulting in a more elegant algorithm that can also be seamlessly incorporated with AlphaZero's superhuman-level play in our solver. To repurpose AlphaZero for solving, we also propose a new training method called Faster to Life (FTL), which modifies AlphaZero to entice it to win more quickly. We use RZS and FTL to solve L&D problems for Go, solving 68 among 106 problems from a professional L&D book, whereas the previous state-of-the-art program TSUMEGO-EXPLORER solves only 11. Finally, we discuss how the approach is generic in the sense that RZS is applicable to solving many other goal-achieving problems for board games.

ICLR Conference 2022 Conference Paper

AlphaZero-based Proof Cost Network to Aid Game Solving

  • Ti-Rong Wu
  • Chung-Chin Shih
  • Ting-Han Wei
  • Meng-Yu Tsai 0001
  • Wei-Yuan Hsu
  • I-Chen Wu

The AlphaZero algorithm learns and plays games without hand-crafted expert knowledge. However, since its objective is to play well, we hypothesize that a better objective can be defined for the related but separate task of solving games. This paper proposes a novel approach to solving problems by modifying the training target of the AlphaZero algorithm, such that it prioritizes solving the game quickly, rather than winning. We train a Proof Cost Network (PCN), where proof cost is a heuristic that estimates the amount of work required to solve problems. This matches the general concept of the so-called proof number from proof number search, which has been shown to be well-suited for game solving. We propose two specific training targets. The first finds the shortest path to a solution, while the second estimates the proof cost. We conduct experiments on solving 15x15 Gomoku and 9x9 Killall-Go problems with both MCTS-based and FDFPN solvers. Comparisons between using AlphaZero networks and PCN as heuristics show that PCN can solve more problems.

NeurIPS Conference 2022 Conference Paper

Are AlphaZero-like Agents Robust to Adversarial Perturbations?

  • Li-Cheng Lan
  • Huan Zhang
  • Ti-Rong Wu
  • Meng-Yu Tsai
  • I-Chen Wu
  • Cho-Jui Hsieh

The success of AlphaZero (AZ) has demonstrated that neural-network-based Go AIs can surpass human performance by a large margin. Given that the state space of Go is extremely large and a human player can play the game from any legal state, we ask whether adversarial states exist for Go AIs that may lead them to play surprisingly wrong actions. In this paper, we first extend the concept of adversarial examples to the game of Go: we generate perturbed states that are "semantically" equivalent to the original state by adding meaningless moves to the game, and an adversarial state is a perturbed state leading to an undoubtedly inferior action that is obvious even to Go beginners. However, searching for adversarial states is challenging due to the large, discrete, and non-differentiable search space. To tackle this challenge, we develop the first adversarial attack on Go AIs that can efficiently search for adversarial states by strategically reducing the search space. This method can also be extended to other board games such as NoGo. Experimentally, we show that the actions taken by both the Policy-Value neural network (PV-NN) and Monte Carlo tree search (MCTS) can be misled by adding one or two meaningless stones; for example, on 58% of the AlphaGo Zero self-play games, our method can make the widely used KataGo agent with 50 simulations of MCTS play a losing action by adding two meaningless stones. We additionally evaluated the adversarial examples found by our algorithm with amateur human Go players, and 90% of the examples indeed lead the Go agent to play an obviously inferior action. Our code is available at https://PaperCode.cc/GoAttack.

ICRA Conference 2022 Conference Paper

Reinforcement Learning for Picking Cluttered General Objects with Dense Object Descriptors

  • Hoang-Giang Cao
  • Weihao Zeng
  • I-Chen Wu

Picking cluttered general objects is a challenging task due to complex geometries and various stacking configurations. Many prior works utilize pose estimation for picking, but pose estimation is difficult for cluttered objects. In this paper, we propose Cluttered Objects Descriptors (CODs), a dense descriptor for cluttered objects that can represent rich object structures, and use the pre-trained CODs network along with its intermediate outputs to train a picking policy. Additionally, we train the policy with reinforcement learning, which enables the policy to learn picking without supervision. We conduct experiments to demonstrate that our CODs is able to consistently represent seen and unseen cluttered objects, which allows the picking policy to robustly pick cluttered general objects. The resulting policy can pick 96.69% of unseen objects in our experimental environment, which is twice as cluttered as the training scenarios.

UAI Conference 2021 Conference Paper

An unsupervised video game playstyle metric via state discretization

  • Chiu-Chou Lin
  • Wei-Chen Chiu
  • I-Chen Wu

When playing video games, different players usually have their own playstyles. Recently, there have been great improvements in the playing strength of video game AIs. However, past research on analyzing player behavior still relied on heuristic rules or behavior features that require game-environment support, making it exhausting for developers to define the features that discriminate various playstyles. In this paper, we propose the first metric for video game playstyles derived directly from game observations and actions, without any prior specification of playstyle in the target game. Our proposed method is built upon a novel scheme of learning discrete representations that map game observations into latent discrete states, such that playstyles can be exhibited from these discrete states. Namely, we measure the playstyle distance based on game observations aligned to the same states. We demonstrate the high playstyle accuracy of our metric in experiments on several video game platforms, including TORCS, RGSK, and seven Atari games, and for different agents, including rule-based AI bots, learning-based AI bots, and human players.
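
One plausible way to realize "distance based on observations aligned to the same states" is to compare empirical action distributions at shared discrete states, as sketched below with total variation distance. The data layout and names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def playstyle_distance(actions_a, actions_b, n_actions):
    """actions_x: dict mapping discrete state id -> list of actions taken there."""
    shared = set(actions_a) & set(actions_b)     # states both players visited
    if not shared:
        return float("inf")                      # no aligned states to compare
    dists = []
    for s in shared:
        pa = np.bincount(actions_a[s], minlength=n_actions) / len(actions_a[s])
        pb = np.bincount(actions_b[s], minlength=n_actions) / len(actions_b[s])
        dists.append(0.5 * np.abs(pa - pb).sum())   # total variation distance
    return float(np.mean(dists))
```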

AAAI Conference 2021 Conference Paper

Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search

  • Li-Cheng Lan
  • Ti-Rong Wu
  • I-Chen Wu
  • Cho-Jui Hsieh

Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains, such as Go and Atari games, when combined with deep neural networks (DNNs). When more simulations are executed, MCTS can achieve higher performance but also requires enormous amounts of CPU and GPU resources. However, not all states require a long search time to identify the best action that the agent can find. For example, in 19x19 Go and NoGo, we found that for more than half of the states, the best action predicted by the DNN remains unchanged even after searching for 2 minutes. This implies that a significant amount of resources can be saved if we are able to stop searching earlier when we are confident in the current search result. In this paper, we propose to achieve this goal by predicting the uncertainty of the current search status and using the result to decide whether we should stop searching. With our algorithm, called Dynamic Simulation MCTS (DS-MCTS), we can speed up a NoGo agent trained by AlphaZero by 2.5 times while maintaining a similar winning rate, which is critical for training and conducting experiments. Also, under the same average simulation count, our method can achieve a 61% winning rate against the original program.
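
The sketch below illustrates the stop-early idea with a simple hand-written unbeatable-lead test at the root; the paper instead predicts uncertainty with a learned model, which this stand-in does not reproduce.

```python
def should_stop(visit_counts, sims_done, sims_max):
    """Stop when the runner-up can no longer catch up.

    visit_counts: per-action simulation counts at the root (assumes >= 2 actions).
    """
    top, second = sorted(visit_counts, reverse=True)[:2]
    remaining = sims_max - sims_done
    return top - second > remaining   # lead exceeds all remaining simulations

# Usage sketch (run_one_simulation and root are hypothetical):
# while sims_done < sims_max and not should_stop(root.visit_counts, sims_done, sims_max):
#     sims_done += 1
#     run_one_simulation(root)
```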

AAAI Conference 2020 Conference Paper

Accelerating and Improving AlphaZero Using Population Based Training

  • Ti-Rong Wu
  • Ting-Han Wei
  • I-Chen Wu

AlphaZero has been very successful in many games. Unfortunately, it still consumes a huge amount of computing resources, the majority of which is spent in self-play. Hyperparameter tuning exacerbates the training cost, since each hyperparameter configuration requires its own time to train one run, during which it generates its own self-play records. As a result, multiple runs are usually needed for different hyperparameter configurations. This paper proposes using population based training (PBT) to help tune hyperparameters dynamically and improve strength during training. Another significant advantage is that this method requires only a single run, while incurring a small additional time cost, since the time for generating self-play records remains unchanged, though the time for optimization is increased following the AlphaZero training algorithm. In our experiments for 9x9 Go, the PBT method is able to achieve a higher win rate than the baselines, each with its own hyperparameter configuration and trained individually. For 19x19 Go, with PBT, we are able to obtain improvements in playing strength. Specifically, the PBT agent can obtain up to a 74% win rate against ELF OpenGo, an open-source state-of-the-art AlphaZero program using a neural network of comparable capacity. This is compared to a saturated non-PBT agent, which achieves a win rate of 47% against ELF OpenGo under the same circumstances.
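
A compact sketch of a generic PBT exploit-and-explore step over hyperparameter configurations, of the kind described above; the population layout and the evaluate hook are hypothetical placeholders, not the paper's training loop.

```python
import copy
import random

def pbt_step(population, evaluate, perturb=(0.8, 1.25)):
    """One PBT round. population: list of dicts with 'params' and 'hypers';
    evaluate: callable returning a fitness score (e.g., win rate) per member."""
    scores = [evaluate(member) for member in population]
    ranked = sorted(range(len(population)), key=lambda i: scores[i])
    worst, best = ranked[0], ranked[-1]
    # Exploit: the weakest member copies the strongest member's weights and hypers.
    population[worst] = copy.deepcopy(population[best])
    # Explore: then perturb each copied hyperparameter multiplicatively.
    for key in population[worst]["hypers"]:
        population[worst]["hypers"][key] *= random.choice(perturb)
    return population
```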

TCS Journal 2020 Journal Article

On solving the 7,7,5-game and the 8,8,5-game

  • Wei-Yuan Hsu
  • Chu-Ling Ko
  • Jr-Chang Chen
  • Ting-Han Wei
  • Chu-Hsuan Hsueh
  • I-Chen Wu

An mnk-game is a kind of k-in-a-row game played on an m × n board, where two players alternately mark empty squares with their own colors, and the first player to get k consecutive marks (horizontally, vertically, or diagonally) wins. In this paper, we present an AND/OR search tree algorithm specifically for proving mnk-games. We first propose three novel methods to reduce the branching factor of AND/OR search trees. We also propose a new method to find pairing strategies, which further accelerates the proof of mnk-games. The combined methods drastically speed up the proof of the 7,7,5-game, which is solved in 2.5 seconds. Moreover, this paper is the first to solve the 8,8,5-game, which is proven a draw within 17.4 hours.

IJCAI Conference 2019 Conference Paper

Multiple Policy Value Monte Carlo Tree Search

  • Li-Cheng Lan
  • Wei Li
  • Ting-Han Wei
  • I-Chen Wu

Many of the strongest game-playing programs use a combination of Monte Carlo tree search (MCTS) and deep neural networks (DNNs), where the DNNs are used as policy or value evaluators. Given a limited budget, such as in online play or during the self-play phase of AlphaZero (AZ) training, a balance needs to be reached between accurate state estimation and more MCTS simulations, both of which are critical for a strong game-playing agent. Typically, larger DNNs are better at generalization and accurate evaluation, while smaller DNNs are less costly and can therefore lead to more MCTS simulations and bigger search trees within the same budget. This paper introduces a new method called multiple policy value MCTS (MPV-MCTS), which combines multiple policy value neural networks (PV-NNs) of various sizes to retain the advantages of each network; two PV-NNs, f_S and f_L, are used in this paper. We show through experiments on the game NoGo that a combined f_S and f_L MPV-MCTS outperforms a single PV-NN with policy value MCTS (PV-MCTS). Additionally, MPV-MCTS also outperforms PV-MCTS for AZ training.

AAAI Conference 2019 Conference Paper

On Strength Adjustment for MCTS-Based Programs

  • I-Chen Wu
  • Ti-Rong Wu
  • An-Jen Liu
  • Hung Guei
  • Tinghan Wei

This paper proposes an approach to strength adjustment for MCTS-based game-playing programs. In this approach, we use a softmax policy with a strength index z to choose moves. Most importantly, we filter out low-quality moves by excluding those with a simulation count lower than a predefined threshold ratio of the maximum simulation count. We perform a theoretical analysis, reaching the result that the adjusted policy is guaranteed to choose moves exceeding a lower bound in strength when using a threshold ratio. The approach is applied to the Go program ELF OpenGo. The experimental results show that z is highly correlated with empirical strength; namely, given a threshold ratio of 0.1, z is linearly related to the Elo rating with a regression error of 47.95 Elo for −2 ≤ z ≤ 2. Meanwhile, the covered strength range is about 800 Elo over the interval z ∈ [−2, 2]. Given the ease of strength adjustment using z, we present two methods to adjust strength and predict opponents' strengths dynamically. To our knowledge, this result is state-of-the-art in terms of the range of strengths in Elo rating while maintaining a controllable relationship between strength and a strength index.
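
The move-selection rule lends itself to a short sketch: filter moves below a threshold ratio of the maximum simulation count, then sample from a count-based softmax sharpened by z. The exact exponent mapping from z used below is my assumption, not the paper's released formula.

```python
import math
import random

def pick_move(sim_counts, z=0.0, ratio=0.1):
    """sim_counts: dict mapping move -> MCTS simulation count at the root."""
    n_max = max(sim_counts.values())
    # Filter low-quality moves below the threshold ratio of the max count.
    kept = {m: n for m, n in sim_counts.items() if n >= ratio * n_max}
    # Count-based softmax; larger z sharpens toward the most-visited move
    # (the exp(z) exponent is an assumed parameterization for illustration).
    weights = {m: (n / n_max) ** math.exp(z) for m, n in kept.items()}
    moves, w = zip(*weights.items())
    return random.choices(moves, weights=w, k=1)[0]

move = pick_move({"D4": 900, "Q16": 450, "K10": 30}, z=1.0)
```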

TCS Journal 2016 Journal Article

An analysis for strength improvement of an MCTS-based program playing Chinese dark chess

  • Chu-Hsuan Hsueh
  • I-Chen Wu
  • Wen-Jie Tseng
  • Shi-Jim Yen
  • Jr-Chang Chen

Monte Carlo tree search (MCTS) has recently been successfully applied to many games, and many techniques have since been used to improve the strength of MCTS-based programs. This paper investigates four recent techniques: early playout terminations, implicit minimax backups, quality-based rewards, and progressive bias. The strength improvements are analyzed by incorporating the techniques into an MCTS-based program, named DarkKnight, for Chinese dark chess. Experimental results show that the win rates against the original DarkKnight were 60.75%, 71.85%, 59.00%, and 82.10%, respectively, for the four techniques. The results indicate that the improvement from progressive bias is the most significant. By incorporating all four together, a better win rate of 84.75% is obtained.

TCS Journal 2011 Journal Article

Drawn k-in-a-row games

  • Sheng-Hao Chiang
  • I-Chen Wu
  • Ping-Hung Lin

Wu and Huang (2005) [12] and Wu et al. (2006) [13] presented a generalized family of k-in-a-row games, called Connect(m, n, k, p, q). Two players, Black and White, alternately place p stones on an m × n board in each turn. Black plays first, and places q stones initially. The player who first gets k consecutive stones of his/her own horizontally, vertically, or diagonally wins. The game is a tie when the board is filled up with neither player winning. A Connect(m, n, k, p, q) game is drawn if neither player has a winning strategy. Given p, this paper derives the value k_draw(p), such that Connect(m, n, k, p, q) games are drawn for all k ≥ k_draw(p), m ≥ 1, n ≥ 1, 0 ≤ q ≤ p, as follows. (1) k_draw(2) = 11. (2) For all p ≥ 3, k_draw(p) = 3p + 3d − 1, where d is a logarithmic function of p. So, the ratio k_draw(p)/p is approximately 3 for sufficiently large p. The first result was derived with the help of a program. To our knowledge, our k_draw(p) values are currently the smallest for all 2 ≤ p < 1000.

FOCS Conference 1991 Conference Paper

Communication Complexity for Parallel Divide-and-Conquer

  • I-Chen Wu
  • H. T. Kung 0001

The relationship between parallel computation cost and communication cost for performing divide-and-conquer (D&C) computations on a parallel system of p processors is studied. The parallel computation cost is the maximal number of D&C nodes that any processor in the parallel system may expand, whereas the communication cost is the total number of cross nodes (nodes generated by one processor but expanded by another). A scheduling algorithm is proposed, and lower bounds on the communication cost are derived. The proposed scheduling algorithm is optimal with respect to the communication cost, since its parallel computation cost is near optimal.