Arrow Research search

Author name cluster

Tatsuya Harada

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

74 papers
2 author rows

Possible papers (74)

TMLR Journal 2026 Journal Article

Contrastive VQ Priors for Multi-Class Plaque Segmentation via SAM Adaptation

  • Ruan Yizhe
  • Yusuke Kurose
  • Junichi Iho
  • Yoji Tokunaga
  • Makoto Horie
  • Yusaku Hayashi
  • Keisuke Nishizawa
  • Yasushi Koyama

Accurate plaque subtype segmentation in coronary CT angiography (CCTA) is clinically relevant yet remains difficult in practice, where annotations are scarce, and the visual evidence for non-calcified lesions is subtle and highly variable. Meanwhile, segmentation foundation models such as SAM provide strong robustness from large-scale pretraining, but their benefits do not reliably transfer to private CCTA tasks under naïve fine-tuning, especially for multi-class plaque taxonomy. We present a targeted strategy to transfer SAM's segmentation robustness to a private CCTA setting by injecting a task-specific, texture-aware prior into the SAM feature stream. Our framework is two-stage: (i) we learn a discrete latent prior from the private CCTA data using a vector-quantized autoencoder, and structure it with supervised contrastive learning to emphasize hard class boundaries; (ii) we fuse this prior into a SAM-based encoder through a query-based feature-aware cross-attention module, and decode with a multi-class head/decoder tailored for plaque taxonomy. On this private CCTA cohort, the proposed design improves overall performance over the compared baselines, with the largest gains on vessel wall and non-calcified plaque. Ablations suggest that the class-structured prior, query-based fusion, and multi-class decoding each contribute to the final result within this setting.

NeurIPS Conference 2025 Conference Paper

Dr. RAW: Towards General High-Level Vision from RAW with Efficient Task Conditioning

  • Wenjun Huang
  • Ziteng Cui
  • Yinqiang Zheng
  • Yirui He
  • Tatsuya Harada
  • Mohsen Imani

We introduce Dr. RAW, a unified and tuning-efficient framework for high-level computer vision tasks directly operating on camera RAW data. Unlike previous approaches that optimize image signal processing (ISP) pipelines and fully fine-tune networks for each task, Dr. RAW achieves state-of-the-art performance with minimal parameter updates. At the input stage, we apply lightweight pre-processing modules, sensor and illumination mapping, followed by re-mosaicing, to mitigate data inconsistencies stemming from sensor variation and lighting. At the network level, we introduce task-specific adaptation through two modules: Sensor Prior Prompts (SPP) and Low-Rank Adaptation (LoRA). SPP injects sensor-aware conditioning into the network via learnable prompts derived from imaging priors, while LoRA enables efficient task-specific tuning by updating only low-rank matrices in key backbone layers. Despite minimal tuning, our method delivers superior results across four RAW-based tasks (object detection, semantic segmentation, instance segmentation, and pose estimation) on nine datasets encompassing low-light and over-exposed conditions. By harnessing the intrinsic physical cues of RAW data alongside parameter-efficient techniques, our method advances RAW-based vision systems, achieving both high accuracy and computational economy. We will release our source code.

TMLR Journal 2025 Journal Article

EDM-TTS: Efficient Dual-Stage Masked Modeling for Alignment-Free Text-to-Speech Synthesis

  • Nabarun Goswami
  • Hanqin Wang
  • Tatsuya Harada

Tokenized speech modeling has significantly advanced zero-shot text-to-speech (TTS) capabilities. The de facto approach involves a dual-stage process: text-to-semantic (T2S) followed by semantic-to-acoustic (S2A) generation. Several auto-regressive (AR) and non-autoregressive (NAR) methods have been explored in the literature for both stages. While AR models achieve state-of-the-art performance, their token-by-token generation causes inference inefficiencies; NAR methods, while more efficient, require explicit alignment for upsampling intermediate representations, which constrains the model's capability for more natural prosody. To overcome these issues, we propose an **E**fficient **D**ual-stage **M**asked **TTS** (EDM-TTS) model that employs an alignment-free masked generative approach for the T2S stage, overcoming the constraints of an explicit aligner while retaining the efficiency of NAR methods. For the S2A stage, we introduce a novel NAR approach using an Injection Conformer architecture that effectively models the conditional dependence among different acoustic quantization levels, optimized by a masked language modeling objective, enabling zero-shot speech generation. Our evaluations demonstrate not only the superior inference efficiency of EDM-TTS, but also its state-of-the-art zero-shot speech quality, naturalness, and speaker similarity.

TMLR Journal 2025 Journal Article

Enhancing Plaque Segmentation in CCTA with Prompt-based Diffusion Data Augmentation

  • Ruan Yizhe
  • Xuangeng Chu
  • Ziteng Cui
  • Yusuke Kurose
  • Junichi Iho
  • Yoji Tokunaga
  • Makoto Horie
  • Yusaku Hayashi

Coronary computed tomography angiography (CCTA) is essential for non-invasive assessment of coronary artery disease (CAD). However, accurate segmentation of atherosclerotic plaques remains challenging due to data scarcity, severe class imbalance, and significant variability between calcified and non-calcified plaques. Inspired by DiffTumor’s tumor synthesis and PromptIR’s adaptive restoration framework, we introduce PromptLesion, a prompt-conditioned diffusion model for multi-class lesion synthesis. Unlike single-class methods, our approach integrates lesion-specific prompts within the diffusion generation process, enhancing diversity and anatomical realism in synthetic data. We validate PromptLesion on a private CCTA dataset and multi-organ tumor segmentation tasks (kidney, liver, pancreas) using public datasets, achieving superior performance compared to baseline methods. Models trained with our prompt-guided synthetic augmentation significantly improve Dice Similarity Coefficient (DSC) scores for both plaque and tumor segmentation. Extensive evaluations and ablation studies confirm the effectiveness of prompt conditioning.

ICML Conference 2025 Conference Paper

Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning

  • Motoki Omura
  • Kazuki Ota
  • Takayuki Osa
  • Yusuke Mukuta
  • Tatsuya Harada

For continuous action spaces, actor-critic methods are widely used in online reinforcement learning (RL). However, unlike RL algorithms for discrete actions, which generally model the optimal value function using the Bellman optimality operator, RL algorithms for continuous actions typically model Q-values for the current policy using the Bellman operator. These algorithms for continuous actions rely exclusively on policy updates for improvement, which often results in low sample efficiency. This study examines the effectiveness of incorporating the Bellman optimality operator into actor-critic frameworks. Experiments in a simple environment show that modeling optimal values accelerates learning but leads to overestimation bias. To address this, we propose an annealing approach that gradually transitions from the Bellman optimality operator to the Bellman operator, thereby accelerating learning while mitigating bias. Our method, combined with TD3 and SAC, significantly outperforms existing approaches across various locomotion and manipulation tasks, demonstrating improved performance and robustness to hyperparameters related to optimality. The code for this study is available at https://github.com/motokiomura/annealed-q-learning.
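The gradual transition described above can be sketched in a tabular toy setting. The mixing weight `beta`, the Q-table layout, and the function name are illustrative assumptions, not the paper's notation:

```python
def annealed_target(q, s_next, policy_action, reward, gamma, beta):
    """Interpolated backup target: beta=1 gives the Bellman optimality
    operator (max over actions); beta=0 gives the Bellman operator
    evaluated at the current policy's action. Annealing beta from 1 to 0
    accelerates early learning while limiting overestimation later."""
    optimality = max(q[s_next].values())   # Bellman optimality operator
    on_policy = q[s_next][policy_action]   # Bellman operator (policy evaluation)
    return reward + gamma * (beta * optimality + (1.0 - beta) * on_policy)

# Toy Q-table: two states, two actions.
q = {0: {"a": 1.0, "b": 3.0}, 1: {"a": 0.0, "b": 0.0}}

# Early in training (beta=1) the target uses the greedy max over actions...
early = annealed_target(q, s_next=0, policy_action="a", reward=1.0, gamma=0.9, beta=1.0)
# ...and late in training (beta=0) it evaluates the current policy's action.
late = annealed_target(q, s_next=0, policy_action="a", reward=1.0, gamma=0.9, beta=0.0)
```

In practice the same interpolation would be applied to the critic targets of TD3 or SAC, with `beta` decayed over training steps.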

TMLR Journal 2025 Journal Article

HyperVQ: MLR-based Vector Quantization in Hyperbolic Space

  • Nabarun Goswami
  • Yusuke Mukuta
  • Tatsuya Harada

The success of models operating on tokenized data has heightened the need for effective tokenization methods, particularly in vision and auditory tasks where inputs are naturally continuous. A common solution is to employ Vector Quantization (VQ) within VQ Variational Autoencoders (VQVAEs), transforming inputs into discrete tokens by clustering embeddings in Euclidean space. However, Euclidean embeddings not only suffer from inefficient packing and limited separation—due to their polynomial volume growth—but are also prone to codebook collapse, where only a small subset of codebook vectors are effectively utilized. To address these limitations, we introduce HyperVQ, a novel approach that formulates VQ as a hyperbolic Multinomial Logistic Regression (MLR) problem, leveraging the exponential volume growth in hyperbolic space to mitigate collapse and improve cluster separability. Additionally, HyperVQ represents codebook vectors as geometric representatives of hyperbolic decision hyperplanes, encouraging disentangled and robust latent representations. Our experiments demonstrate that HyperVQ matches traditional VQ in generative and reconstruction tasks, while surpassing it in discriminative performance and yielding a more efficient and disentangled codebook.
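The volume-growth contrast above can be made concrete with the Poincaré ball model of hyperbolic space. The abstract does not specify which model HyperVQ uses, so this is a hedged illustration of the general geometry, not the paper's implementation:

```python
import math

def poincare_distance(u, v):
    """Geodesic distance in the Poincare ball model of hyperbolic space.
    Distances blow up as points approach the boundary (norm -> 1), which
    is what gives hyperbolic embeddings their exponentially growing
    'room' compared to polynomial volume growth in Euclidean space."""
    sq = lambda x: sum(xi * xi for xi in x)
    diff = sq([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq(u)) * (1.0 - sq(v))
    return math.acosh(1.0 + 2.0 * diff / denom)

origin = [0.0, 0.0]
mid = [0.5, 0.0]
near_boundary = [0.95, 0.0]

# Comparable Euclidean steps cover very different hyperbolic distances:
d_inner = poincare_distance(origin, mid)          # modest
d_outer = poincare_distance(mid, near_boundary)   # much larger
```

Codebook vectors placed near the boundary are therefore far apart hyperbolically even when close in Euclidean coordinates, which is the separability property the abstract appeals to.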

NeurIPS Conference 2025 Conference Paper

I2-NeRF: Learning Neural Radiance Fields Under Physically-Grounded Media Interactions

  • Shuhong Liu
  • Lin Gu
  • Ziteng Cui
  • Xuangeng Chu
  • Tatsuya Harada

Participating in efforts to endow generative AI with 3D physical world perception, we propose I2-NeRF, a novel neural radiance field framework that enhances isometric and isotropic metric perception under media degradation. While existing NeRF models predominantly rely on object-centric sampling, I2-NeRF introduces a reverse-stratified upsampling strategy to achieve near-uniform sampling across 3D space, thereby preserving isometry. We further present a general radiative formulation for media degradation that unifies emission, absorption, and scattering into a particle model governed by the Beer–Lambert attenuation law. By matting direct and media-induced in-scatter radiance, this formulation extends naturally to complex media environments such as underwater, haze, and even low-light scenes. By treating light propagation uniformly in both vertical and horizontal directions, I2-NeRF enables isotropic metric perception and can even estimate medium properties such as water depth. Experiments on real-world datasets demonstrate that our method significantly improves both reconstruction fidelity and physical plausibility compared to existing approaches. The source code is available at https://github.com/ShuhongLL/I2-NeRF.
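The Beer–Lambert attenuation law that the radiative formulation builds on can be sketched minimally. The function and variable names here are assumptions for illustration, not the paper's code:

```python
import math

def transmittance(sigmas, deltas):
    """Beer-Lambert transmittance along a ray discretized into segments:
    T = exp(-sum(sigma_i * delta_i)), where sigma_i is the medium's
    attenuation coefficient on segment i and delta_i its length."""
    return math.exp(-sum(s * d for s, d in zip(sigmas, deltas)))

def observed_radiance(emitted, sigmas, deltas):
    """Radiance that survives the medium between emitter and camera."""
    return emitted * transmittance(sigmas, deltas)

# Clear air vs. a dense medium (e.g. turbid water) over the same ray.
deltas = [0.5, 0.5, 0.5, 0.5]
clear = observed_radiance(1.0, [0.01] * 4, deltas)  # nearly unattenuated
murky = observed_radiance(1.0, [1.5] * 4, deltas)   # strongly attenuated
```

A NeRF-style renderer composites many such segments per ray; the in-scatter term the abstract mentions would add a medium-emitted contribution on top of this attenuated direct radiance.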

NeurIPS Conference 2025 Conference Paper

Intend to Move: A Multimodal Dataset for Intention-Aware Human Motion Understanding

  • Ryo Umagami
  • Liu Yue
  • Xuangeng Chu
  • Ryuto Fukushima
  • Tetsuya Narita
  • Yusuke Mukuta
  • Tomoyuki Takahata
  • Jianfei Yang

Human motion is inherently intentional, yet most motion modeling paradigms focus on low-level kinematics, overlooking the semantic and causal factors that drive behavior. Existing datasets further limit progress: they capture short, decontextualized actions in static scenes, providing little grounding for embodied reasoning. To address these limitations, we introduce $\textit{Intend to Move (I2M)}$, a large-scale, multimodal dataset for intention-grounded motion modeling. I2M contains 10.1 hours of two-person 3D motion sequences recorded in dynamic realistic home environments, accompanied by multi-view RGB-D video, 3D scene geometry, and language annotations of each participant’s evolving intentions. Benchmark experiments reveal a fundamental gap in current motion models: they fail to translate high-level goals into physically and socially coherent motion. I2M thus serves not only as a dataset but as a benchmark for embodied intelligence, enabling research on models that can reason about, predict, and act upon the ``why'' behind human motion.

RLJ Journal 2025 Journal Article

Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps

  • Motoki Omura
  • Yusuke Mukuta
  • Kazuki Ota
  • Takayuki Osa
  • Tatsuya Harada

Offline reinforcement learning (RL) aims to learn an optimal policy from a static dataset, making it particularly valuable in scenarios where data collection is costly, such as robotics. A major challenge in offline RL is distributional shift, where the learned policy deviates from the dataset distribution, potentially leading to unreliable out-of-distribution actions. To mitigate this issue, regularization techniques have been employed. While many existing methods utilize density ratio-based measures, such as the $f$-divergence, for regularization, we propose an approach that utilizes the Wasserstein distance, which is robust to out-of-distribution data and captures the similarity between actions. Our method employs input-convex neural networks (ICNNs) to model optimal transport maps, enabling the computation of the Wasserstein distance in a discriminator-free manner, thereby avoiding adversarial training and ensuring stable learning. Our approach demonstrates comparable or superior performance to widely used existing methods on the D4RL benchmark dataset. The code is available at https://github.com/motokiomura/Q-DOT.
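As a hedged illustration of the Wasserstein regularizer: the paper uses ICNN-based transport maps for general action spaces, but in one dimension the optimal transport map has a closed form, simply pairing sorted samples, which makes the distance easy to compute directly:

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-sized samples.
    In 1D the optimal transport map pairs the i-th smallest point of one
    sample with the i-th smallest of the other."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

dataset_actions = [0.0, 0.1, 0.2, 0.3]
policy_actions = [0.05, 0.15, 0.25, 0.35]  # small, uniform shift: cheap
ood_actions = [2.0, 2.1, 2.2, 2.3]         # far out-of-distribution: costly

w_near = wasserstein_1d(dataset_actions, policy_actions)
w_far = wasserstein_1d(dataset_actions, ood_actions)
```

Unlike a density-ratio penalty, which is undefined or unbounded wherever the policy puts mass the dataset lacks, this distance grows smoothly with how far the policy's actions drift, which is the robustness property the abstract highlights.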

RLC Conference 2025 Conference Paper

Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps

  • Motoki Omura
  • Yusuke Mukuta
  • Kazuki Ota
  • Takayuki Osa
  • Tatsuya Harada

Offline reinforcement learning (RL) aims to learn an optimal policy from a static dataset, making it particularly valuable in scenarios where data collection is costly, such as robotics. A major challenge in offline RL is distributional shift, where the learned policy deviates from the dataset distribution, potentially leading to unreliable out-of-distribution actions. To mitigate this issue, regularization techniques have been employed. While many existing methods utilize density ratio-based measures, such as the $f$-divergence, for regularization, we propose an approach that utilizes the Wasserstein distance, which is robust to out-of-distribution data and captures the similarity between actions. Our method employs input-convex neural networks (ICNNs) to model optimal transport maps, enabling the computation of the Wasserstein distance in a discriminator-free manner, thereby avoiding adversarial training and ensuring stable learning. Our approach demonstrates comparable or superior performance to widely used existing methods on the D4RL benchmark dataset. The code is available at https://github.com/motokiomura/Q-DOT.

ICLR Conference 2025 Conference Paper

T2V2: A Unified Non-Autoregressive Model for Speech Recognition and Synthesis via Multitask Learning

  • Nabarun Goswami
  • Hanqin Wang
  • Tatsuya Harada

We introduce T2V2 (**T**ext to **V**oice and **V**oice to **T**ext), a unified non-autoregressive model capable of performing both automatic speech recognition (ASR) and text-to-speech (TTS) synthesis within the same framework. T2V2 uses a shared Conformer backbone with rotary positional embeddings to efficiently handle these core tasks, with ASR trained using Connectionist Temporal Classification (CTC) loss and TTS using masked language modeling (MLM) loss. The model operates on discrete tokens, where speech tokens are generated by clustering features from a self-supervised learning model. To further enhance performance, we introduce auxiliary tasks: CTC error correction to refine raw ASR outputs using contextual information from speech embeddings, and unconditional speech MLM, enabling classifier-free guidance to improve TTS. Our method is self-contained, leveraging intermediate CTC outputs to align text and speech using Monotonic Alignment Search, without relying on external aligners. We perform extensive experimental evaluation to verify the efficacy of the T2V2 framework, achieving state-of-the-art performance on the TTS task and competitive performance in discrete ASR.

AAAI Conference 2024 Conference Paper

Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption

  • Ziteng Cui
  • Lin Gu
  • Xiao Sun
  • Xianzheng Ma
  • Yu Qiao
  • Tatsuya Harada

The standard Neural Radiance Fields (NeRF) paradigm employs a viewer-centered methodology, entangling the aspects of illumination and material reflectance into emission solely from 3D points. This simplified rendering approach presents challenges in accurately modeling images captured under adverse lighting conditions, such as low light or over-exposure. Motivated by the ancient Greek emission theory that posits visual perception as a result of rays emanating from the eyes, we slightly refine the conventional NeRF framework to train NeRF under challenging light conditions and generate novel views under normal lighting in an unsupervised manner. We introduce the concept of a "Concealing Field," which assigns transmittance values to the surrounding air to account for illumination effects. In dark scenarios, we assume that object emissions maintain a standard lighting level but are attenuated as they traverse the air during the rendering process. The Concealing Field thus compels NeRF to learn reasonable density and colour estimations for objects even in dimly lit situations. Similarly, the Concealing Field can mitigate over-exposed emissions during the rendering stage. Furthermore, we present a comprehensive multi-view dataset captured under challenging illumination conditions for evaluation. Our code and proposed dataset are available at https://github.com/cuiziteng/Aleth-NeRF.

ICML Conference 2024 Conference Paper

Discovering Multiple Solutions from a Single Task in Offline Reinforcement Learning

  • Takayuki Osa
  • Tatsuya Harada

Recent studies on online reinforcement learning (RL) have demonstrated the advantages of learning multiple behaviors from a single task, as in the case of few-shot adaptation to a new environment. Although this approach is expected to yield similar benefits in offline RL, appropriate methods for learning multiple solutions have not been fully investigated in previous studies. In this study, we therefore addressed the problem of finding multiple solutions from a single task in offline RL. We propose algorithms that can learn multiple solutions in offline RL, and empirically investigate their performance. Our experimental results show that the proposed algorithm learns multiple qualitatively and quantitatively distinctive solutions in offline RL.

NeurIPS Conference 2024 Conference Paper

Generalizable and Animatable Gaussian Head Avatar

  • Xuangeng Chu
  • Tatsuya Harada

In this paper, we propose Generalizable and Animatable Gaussian head Avatar (GAGA) for one-shot animatable head avatar reconstruction. Existing methods rely on neural radiance fields, leading to heavy rendering consumption and low reenactment speeds. To address these limitations, we generate the parameters of 3D Gaussians from a single image in a single forward pass. The key innovation of our work is the proposed dual-lifting method, which produces high-fidelity 3D Gaussians that capture identity and facial details. Additionally, we leverage global image features and the 3D morphable model to construct 3D Gaussians for controlling expressions. After training, our model can reconstruct unseen identities without specific optimizations and perform reenactment rendering at real-time speeds. Experiments show that our method exhibits superior performance compared to previous methods in terms of reconstruction quality and expression accuracy. We believe our method can establish new benchmarks for future research and advance applications of digital avatars.

ICLR Conference 2024 Conference Paper

GPAvatar: Generalizable and Precise Head Avatar from Image(s)

  • Xuangeng Chu
  • Yu Li
  • Ailing Zeng
  • Tianyu Yang
  • Lijian Lin
  • Yun Fei Liu
  • Tatsuya Harada

Head avatar reconstruction, crucial for applications in virtual reality, online meetings, gaming, and film industries, has garnered substantial attention within the computer vision community. The fundamental objective of this field is to faithfully recreate the head avatar and precisely control expressions and postures. Existing methods, categorized into 2D-based warping, mesh-based, and neural rendering approaches, present challenges in maintaining multi-view consistency, incorporating non-facial information, and generalizing to new identities. In this paper, we propose a framework named GPAvatar that reconstructs 3D head avatars from one or several images in a single forward pass. The key idea of this work is to introduce a dynamic point-based expression field driven by a point cloud to precisely and effectively capture expressions. Furthermore, we use a Multi Tri-planes Attention (MTA) fusion module in tri-planes canonical field to leverage information from multiple input images. The proposed method achieves faithful identity reconstruction, precise expression control, and multi-view consistency, demonstrating promising results for free-viewpoint rendering and novel view synthesis.

TMLR Journal 2024 Journal Article

Offline Deep Reinforcement Learning for Visual Distractions via Domain Adversarial Training

  • Jen-Yen Chang
  • Thomas Westfechtel
  • Takayuki Osa
  • Tatsuya Harada

Recent advances in offline reinforcement learning (RL) have relied predominantly on learning from proprioceptive states. However, obtaining proprioceptive states for all objects may not always be feasible, particularly in offline settings. Therefore, RL agents must be capable of learning from raw sensor inputs such as images. However, recent studies have indicated that visual distractions can impair the performance of RL agents when observations in the evaluation environment differ significantly from those in the training environment. This issue is even more crucial in the visual offline RL paradigm, where the collected datasets can differ drastically from the testing environment. In this work, we investigated an adversarial-based algorithm to address the problem of visual distraction in offline RL settings. Our adversarial approach involves training agents to learn features that are more robust against visual distractions. Furthermore, we proposed a complementary dataset to add to the V-D4RL distraction dataset by extending it to more locomotion tasks. We empirically demonstrate that our method surpasses state-of-the-art baselines in tasks on both the V-D4RL and the proposed datasets when evaluated on random visual distractions.

ICRA Conference 2024 Conference Paper

Open X-Embodiment: Robotic Learning Datasets and RT-X Models: Open X-Embodiment Collaboration

  • Abby O'Neill
  • Abdul Rehman
  • Abhiram Maddukuri
  • Abhishek Gupta 0004
  • Abhishek Padalkar
  • Abraham Lee
  • Acorn Pooley
  • Agrim Gupta

Large, high-capacity models trained on diverse datasets have shown remarkable success in efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a "generalist" X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160,266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. The project website is robotics-transformer-x.github.io.

ICRA Conference 2024 Conference Paper

Robustifying a Policy in Multi-Agent RL with Diverse Cooperative Behaviors and Adversarial Style Sampling for Assistive Tasks

  • Takayuki Osa
  • Tatsuya Harada

Autonomous assistance of people with motor impairments is one of the most promising applications of autonomous robotic systems. Recent studies have reported encouraging results using deep reinforcement learning (RL) in the healthcare domain. Previous studies showed that assistive tasks can be formulated as multi-agent RL, wherein there are two agents: a caregiver and a care-receiver. However, policies trained in multi-agent RL are often sensitive to the policies of other agents. In such a case, a trained caregiver’s policy may not work for different care-receivers. To alleviate this issue, we propose a framework that learns a robust caregiver’s policy by training it against diverse care-receiver responses. In our framework, diverse care-receiver responses are autonomously learned through trial and error. In addition, to robustify the caregiver’s policy, we propose a strategy for sampling a care-receiver’s response in an adversarial manner during training. We evaluated the proposed method using tasks in Assistive Gym. We demonstrate that policies trained with a popular deep RL method are vulnerable to changes in the policies of other agents and that the proposed framework improves robustness against such changes.

RLC Conference 2024 Conference Paper

Stabilizing Extreme Q-learning by Maclaurin Expansion

  • Motoki Omura
  • Takayuki Osa
  • Yusuke Mukuta
  • Tatsuya Harada

In offline reinforcement learning, in-sample learning methods have been widely used to prevent performance degradation caused by evaluating out-of-distribution actions from the dataset. Extreme Q-learning (XQL) employs a loss function based on the assumption that Bellman error follows a Gumbel distribution, enabling it to model the soft optimal value function in an in-sample manner. It has demonstrated strong performance in both offline and online reinforcement learning settings. However, issues remain, such as the instability caused by the exponential term in the loss function and the risk of the error distribution deviating from the Gumbel distribution. Therefore, we propose Maclaurin Expanded Extreme Q-learning to enhance stability. In this method, applying Maclaurin expansion to the loss function in XQL enhances stability against large errors. This approach involves adjusting the modeled value function between the value function under the behavior policy and the soft optimal value function, thus achieving a trade-off between stability and optimality depending on the order of expansion. It also enables adjustment of the error distribution assumption from a normal distribution to a Gumbel distribution. Our method significantly stabilizes learning in online RL tasks from DM Control, where XQL was previously unstable. Additionally, it improves performance in several offline RL tasks from D4RL.
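The Maclaurin-expansion idea can be sketched directly. The function names are illustrative, and the XQL loss is shown in its linex form exp(z) − z − 1 (the Gumbel-regression loss the abstract refers to):

```python
import math

def xql_loss(z):
    """Gumbel-regression (linex-style) loss used by XQL: exp(z) - z - 1.
    The exponential term explodes for large positive errors z, which is
    the instability the abstract describes."""
    return math.exp(z) - z - 1.0

def maclaurin_xql_loss(z, order):
    """Maclaurin expansion of exp(z) - z - 1 truncated at `order`:
    sum_{k=2}^{order} z^k / k!.  order=2 recovers half the squared error
    (standard value regression); higher orders move back toward the
    original XQL loss, trading stability for optimality."""
    return sum(z ** k / math.factorial(k) for k in range(2, order + 1))

z = 5.0  # a large Bellman error
full = xql_loss(z)                    # explosive exponential growth
mse_like = maclaurin_xql_loss(z, 2)   # stable, quadratic growth
order4 = maclaurin_xql_loss(z, 4)     # intermediate behavior
```

The truncation order thus acts as the knob the abstract describes: it interpolates between modeling the value function under the behavior policy (low order) and the soft optimal value function (high order).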

RLJ Journal 2024 Journal Article

Stabilizing Extreme Q-learning by Maclaurin Expansion

  • Motoki Omura
  • Takayuki Osa
  • Yusuke Mukuta
  • Tatsuya Harada

In offline reinforcement learning, in-sample learning methods have been widely used to prevent performance degradation caused by evaluating out-of-distribution actions from the dataset. Extreme Q-learning (XQL) employs a loss function based on the assumption that Bellman error follows a Gumbel distribution, enabling it to model the soft optimal value function in an in-sample manner. It has demonstrated strong performance in both offline and online reinforcement learning settings. However, issues remain, such as the instability caused by the exponential term in the loss function and the risk of the error distribution deviating from the Gumbel distribution. Therefore, we propose Maclaurin Expanded Extreme Q-learning to enhance stability. In this method, applying Maclaurin expansion to the loss function in XQL enhances stability against large errors. This approach involves adjusting the modeled value function between the value function under the behavior policy and the soft optimal value function, thus achieving a trade-off between stability and optimality depending on the order of expansion. It also enables adjustment of the error distribution assumption from a normal distribution to a Gumbel distribution. Our method significantly stabilizes learning in online RL tasks from DM Control, where XQL was previously unstable. Additionally, it improves performance in several offline RL tasks from D4RL.

AAAI Conference 2024 Conference Paper

Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning

  • Motoki Omura
  • Takayuki Osa
  • Yusuke Mukuta
  • Tatsuya Harada

In deep reinforcement learning, estimating the value function to evaluate the quality of states and actions is essential. The value function is often trained using the least squares method, which implicitly assumes a Gaussian error distribution. However, a recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman operator, violating the implicit assumption of a normal error distribution in the least squares method. To address this, we propose a method called Symmetric Q-learning, in which synthetic noise generated from a zero-mean distribution is added to the target values to produce a Gaussian error distribution. We evaluated the proposed method on continuous control benchmark tasks in MuJoCo. It improved the sample efficiency of a state-of-the-art reinforcement learning method by reducing the skewness of the error distribution.
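A minimal sketch of the symmetrization idea follows. The paper learns the noise distribution; here, as a simplifying assumption, the zero-mean noise is an independent mirrored draw from the same error distribution, which makes the sum symmetric by construction:

```python
import random

def skewness(xs):
    """Sample skewness: third central moment over the cubed std-dev."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

random.seed(0)
# Heavily right-skewed stand-in for Bellman errors: exponential draws
# shifted to zero mean (true skewness of an exponential is 2).
errors = [random.expovariate(1.0) - 1.0 for _ in range(20000)]

# Zero-mean synthetic noise: an independent, negated draw from the same
# distribution, so error + noise is symmetric around zero.
noise = [-(random.expovariate(1.0) - 1.0) for _ in range(20000)]
symmetrized = [e + n for e, n in zip(errors, noise)]

skew_before = skewness(errors)       # strongly positive
skew_after = skewness(symmetrized)   # near zero
```

With the error distribution symmetrized, the Gaussian assumption behind least-squares value regression is much closer to being satisfied, which is the mechanism the abstract credits for the sample-efficiency gain.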

ICLR Conference 2023 Conference Paper

3D Segmenter: 3D Transformer based Semantic Segmentation via 2D Panoramic Distillation

  • Zhennan Wu
  • Yang Li 0193
  • Yifei Huang 0002
  • Lin Gu 0003
  • Tatsuya Harada
  • Hiroyuki Sato 0002

Recently, 2D semantic segmentation has witnessed significant advancement thanks to the huge amount of 2D image datasets available. Therefore, in this work, we propose the first 2D-to-3D knowledge distillation strategy to enhance a 3D semantic segmentation model with knowledge embedded in the latent space of powerful 2D models. Specifically, unlike standard knowledge distillation, where teacher and student models take the same data as input, we use 2D panoramas properly aligned with corresponding 3D rooms to train the teacher network and use the knowledge learned by the 2D teacher to guide the 3D student. To facilitate our research, we create a large-scale, finely annotated 3D semantic segmentation benchmark containing voxel-wise semantic labels and aligned panoramas of 5175 scenes. Based on this benchmark, we propose a 3D volumetric semantic segmentation network, which adapts Video Swin Transformer as the backbone and introduces a skip-connected linear decoder. Achieving state-of-the-art performance, our 3D Segmenter is computationally efficient and requires only $3.8\%$ of the parameters compared to the prior art. Our code and data will be released upon acceptance.

NeurIPS Conference 2023 Conference Paper

Detection Based Part-level Articulated Object Reconstruction from Single RGBD Image

  • Yuki Kawana
  • Tatsuya Harada

We propose an end-to-end trainable, cross-category method for reconstructing multiple man-made articulated objects from a single RGBD image, focusing on part-level shape reconstruction and pose and kinematics estimation. We depart from previous works that rely on learning instance-level latent space, focusing on man-made articulated objects with predefined part counts. Instead, we propose a novel alternative approach that employs part-level representation, representing instances as combinations of detected parts. While our detect-then-group approach effectively handles instances with diverse part structures and various part counts, it faces issues of false positives, varying part sizes and scales, and an increasing model size due to end-to-end training. To address these challenges, we propose 1) test-time kinematics-aware part fusion to improve detection performance while suppressing false positives, 2) anisotropic scale normalization for part shape learning to accommodate various part sizes and scales, and 3) a balancing strategy for cross-refinement between feature space and output space to improve part detection while maintaining model size. Evaluation on both synthetic and real data demonstrates that our method successfully reconstructs variously structured multiple instances that previous works cannot handle, and outperforms prior works in shape reconstruction and kinematics estimation.

TMLR Journal 2023 Journal Article

Invariant Feature Coding using Tensor Product Representation

  • Yusuke Mukuta
  • Tatsuya Harada

In this study, a novel feature coding method that exploits invariance under transformations represented by a finite group of orthogonal matrices is proposed. We prove that the group-invariant feature vector contains sufficient discriminative information when learning a linear classifier using convex loss minimization. Based on this result, a novel feature model that explicitly considers the group action is proposed for principal component analysis and k-means clustering, which are commonly used in most feature coding methods, as well as for global feature functions. Although global feature functions are in general complex nonlinear functions, the group action on this space can be easily calculated by constructing these functions as tensor-product representations of basic representations, resulting in an explicit form of the invariant feature functions. The effectiveness of our method is demonstrated on several image datasets.

AAAI Conference 2023 Conference Paper

People Taking Photos That Faces Never Share: Privacy Protection and Fairness Enhancement from Camera to User

  • Junjie Zhu
  • Lin Gu
  • Xiaoxiao Wu
  • Zheng Li
  • Tatsuya Harada
  • Yingying Zhu

The soaring number of personal mobile devices and public cameras poses a threat to fundamental human rights and ethical principles. For example, the theft of private information such as face images by malicious third parties can lead to catastrophic consequences. Most existing protection algorithms, which manipulate the appearance of faces in images, are effective but irreversible. Here, we propose a practical and systematic solution to invertibly protect face information in the full pipeline from camera to final users. Specifically, we design a novel lightweight Flow-based Face Encryption Method (FFEM) on the local embedded system privately connected to the camera, minimizing the risk of eavesdropping during data transmission. FFEM uses a flow-based face encoder to encode each face into a Gaussian distribution and encrypts the encoded face feature by randomly rotating the Gaussian distribution, with the rotation matrix serving as the password. While encrypted latent-variable face images are sent to users through public but less reliable channels, the password is protected through more secure channels using technologies such as asymmetric encryption, blockchain, or other sophisticated security schemes. Users can choose to decode an image with fake faces from the encrypted image on the public channel. Only trusted users are able to recover the original face using the rotation matrix transmitted over the secure channel. More interestingly, by tuning the Gaussian ball in latent space, we can control the fairness of the replaced face with respect to attributes such as gender and race. Extensive experiments demonstrate that our solution protects privacy and enhances fairness with minimal effect on high-level downstream tasks.
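The rotation-as-password step can be pictured with a minimal NumPy sketch. This is our reading of the abstract, not the authors' implementation; function names and the seed-based password derivation are hypothetical:

```python
import numpy as np

def make_password(dim, seed):
    """Hypothetical password: a random orthogonal matrix derived from a
    seed via QR decomposition (sign-fixed so the factorization is unique)."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))

def encrypt(z, password):
    """Rotate the latent code; an eavesdropper sees only a rotated Gaussian,
    which is distributed identically to the original."""
    return password @ z

def decrypt(z_enc, password):
    """Orthogonality means the inverse is simply the transpose."""
    return password.T @ z_enc

# Usage: a 512-d latent code round-trips exactly through encrypt/decrypt.
z = np.random.default_rng(0).standard_normal(512)
R = make_password(512, seed=42)
assert np.allclose(decrypt(encrypt(z, R), R), z)
```

Because a standard Gaussian is rotation-invariant, the encrypted latent is statistically indistinguishable from an unencrypted one, which is what makes the scheme both invertible and private on the public channel.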

TMLR Journal 2023 Journal Article

Unsupervised Domain Adaptation via Minimized Joint Error

  • Dexuan Zhang
  • Thomas Westfechtel
  • Tatsuya Harada

Unsupervised domain adaptation transfers knowledge from a fully labeled source domain to a different target domain, where no labeled data are available. Some researchers have proposed upper bounds for the target error when transferring knowledge. For example, Ben-David et al. (2010) established a theory based on minimizing the source error and distance between marginal distributions simultaneously. However, in most research, the joint error is ignored because of its intractability. In this research, we argue that joint errors are essential for domain adaptation problems, particularly when the domain gap is large. To address this problem, we propose a novel objective related to the upper bound of the joint error. Moreover, we adopt a source/pseudo-target label-induced hypothesis space that can reduce the search space to further tighten this bound. To measure the dissimilarity between hypotheses, we define a novel cross-margin discrepancy to alleviate instability during adversarial learning. In addition, we present extensive empirical evidence showing that the proposed method boosts the performance of image classification accuracy on standard domain adaptation benchmarks.

AAAI Conference 2022 Conference Paper

Fully Spiking Variational Autoencoder

  • Hiromichi Kamata
  • Yusuke Mukuta
  • Tatsuya Harada

Spiking neural networks (SNNs) can run on neuromorphic devices with ultra-high speed and ultra-low energy consumption because of their binary and event-driven nature. SNNs are therefore expected to have various applications, including as generative models running on edge devices to create high-quality images. In this study, we build a variational autoencoder (VAE) with SNNs to enable image generation. The VAE is known for its stability among generative models, and its output quality has recently advanced. In a vanilla VAE, the latent space is represented as a normal distribution, and floating-point calculations are required for sampling. However, this is not possible in SNNs because all features must be binary time-series data. Therefore, we construct the latent space with an autoregressive SNN model and randomly select samples from its output to obtain the latent variables. This allows the latent variables to follow a Bernoulli process and enables variational learning. Thus, we build the Fully Spiking Variational Autoencoder, in which all modules are constructed with SNNs. To the best of our knowledge, we are the first to build a VAE with only SNN layers. We experimented on several datasets and confirmed that our model can generate images of the same or better quality than conventional ANNs. The code is available at https://github.com/kamata1729/FullySpikingVAE.
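The sampling step described above can be sketched as follows: if the autoregressive SNN emits k binary candidate outputs per timestep and one is selected uniformly at random, each latent bit is Bernoulli with probability equal to the firing rate at that step. This is a simplified reading, not the authors' exact implementation:

```python
import numpy as np

def sample_spike_latent(snn_out, rng):
    """snn_out: (T, k) binary outputs of an autoregressive SNN.
    Pick one of the k candidates at each timestep; the sampled bit is then
    Bernoulli with p equal to the mean firing rate at that step.
    (Simplified sketch of the sampling described in the abstract.)"""
    T, k = snn_out.shape
    idx = rng.integers(0, k, size=T)
    return snn_out[np.arange(T), idx]
```

The point of the construction is that sampling reduces to an index selection over binary values, so no floating-point reparameterization is needed.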

NeurIPS Conference 2022 Conference Paper

Non-rigid Point Cloud Registration with Neural Deformation Pyramid

  • Yang Li
  • Tatsuya Harada

Non-rigid point cloud registration is a key component in many computer vision and computer graphics applications. The high complexity of the unknown non-rigid motion makes this task a challenging problem. In this paper, we break down this problem via hierarchical motion decomposition. Our method, called Neural Deformation Pyramid (NDP), represents non-rigid motion using a pyramid architecture. Each pyramid level, implemented by a Multi-Layer Perceptron (MLP), takes as input a sinusoidally encoded 3D point and outputs its motion increment over the previous level. The sinusoidal function starts with a low input frequency that gradually increases as the pyramid level goes down. This allows a multi-level, rigid-to-nonrigid motion decomposition and also speeds up solving by 50× compared to the existing MLP-based approach. Our method achieves advanced partial-to-partial non-rigid point cloud registration results on the 4DMatch/4DLoMatch benchmark under both learning-free and supervised settings.
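The level-dependent sinusoidal encoding can be sketched as follows. This is illustrative only; NDP's exact frequency schedule and per-level MLP heads are not reproduced here:

```python
import numpy as np

def sinusoidal_encode(points, level, n_freqs=4):
    """Encode 3D points with sinusoids whose frequencies grow with the
    pyramid level, so lower levels can express finer, more non-rigid
    motion while the top level stays near-rigid. (Illustrative sketch.)"""
    freqs = 2.0 ** (level + np.arange(n_freqs))      # level-dependent bands
    angles = points[:, :, None] * freqs              # (N, 3, n_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(points.shape[0], -1)          # (N, 3 * 2 * n_freqs)
```

Each level's MLP would then map this encoding to a motion increment that is accumulated on top of the coarser levels' estimates.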

ICLR Conference 2021 Conference Paper

Hyperbolic Neural Networks++

  • Ryohei Shimizu
  • Yusuke Mukuta
  • Tatsuya Harada

Hyperbolic spaces, which have the capacity to embed tree structures without distortion owing to their exponential volume growth, have recently been applied to machine learning to better capture the hierarchical nature of data. In this study, we generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely, the Poincaré ball model. This novel methodology constructs multinomial logistic regression, fully-connected layers, convolutional layers, and attention mechanisms under a unified mathematical interpretation, without increasing the number of parameters. Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, as well as stability and better performance than their Euclidean counterparts.
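For intuition, the basic algebraic operation underlying layers on the Poincaré ball is Möbius addition, the hyperbolic counterpart of vector addition. The sketch below shows the standard curvature −1 formula only, not the paper's full layer constructions:

```python
import numpy as np

def mobius_add(x, y):
    """Möbius addition on the Poincaré ball (curvature -1). The origin is
    the identity element, and -x is the Möbius inverse of x."""
    xy = float(np.dot(x, y))
    x2, y2 = float(np.dot(x, x)), float(np.dot(y, y))
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    return num / (1 + 2 * xy + x2 * y2)
```

Note that Möbius addition is neither commutative nor associative, which is part of why generalizing standard layers to the ball requires care.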

ICRA Conference 2021 Conference Paper

Real-Time Mesh Extraction from Implicit Functions via Direct Reconstruction of Decision Boundary

  • Wataru Kawai
  • Yusuke Mukuta
  • Tatsuya Harada

The ability to estimate 3D object shape from a single image is vital to robotics and manufacturing. For instance, it enables iterative trial-and-error in simulated environments. In single-view reconstruction, implicit functions have demonstrated superior results over traditional methods. However, implicit functions suffer from the heavy computation of mesh extraction. This is due to the indirect mesh extraction, where the number of evaluation points grows cubically with resolution. On the other hand, reducing the resolution results in the discretization error of marching cubes (MC). In this work, we aim to perform efficient and accurate mesh extraction from implicit functions. The idea is to directly reconstruct the decision boundary of implicit functions as a mesh by reverse tracing from the output. It eliminates the need for evaluating massive points and error-prone MC. Consequently, we propose implementing an implicit function via a composite function of a flow and Binary-coded Input Neural Network (BCINN). The boundary of BCINN is easily identifiable, and the flow is invertible. Owing to these properties, the decision boundary of the composite function can be directly and efficiently reconstructed. In our experiments, we demonstrate that the proposed method significantly improves runtime/memory efficiency, with results comparable to those of existing methods. Specifically, our method enables real-time high-quality mesh inference from a single image.

AAAI Conference 2021 Conference Paper

Spherical Image Generation from a Single Image by Considering Scene Symmetry

  • Takayuki Hara
  • Yusuke Mukuta
  • Tatsuya Harada

Spherical images taken in all directions (360°×180°) allow the full surroundings of a subject to be represented, providing an immersive experience to viewers. Generating a spherical image from a single normal-field-of-view (NFOV) image is convenient and considerably expands the usage scenarios without relying on a specific panoramic camera or images taken from multiple directions; however, it remains a challenging and unresolved problem. The primary challenge is controlling the high degree of freedom involved in generating a wide area that includes all directions of the desired spherical image. We focus on scene symmetry, a basic property of the global structure of spherical images, covering rotational symmetry, plane symmetry, and asymmetry. We propose a method for generating a spherical image from a single NFOV image and controlling the degree of freedom of the generated regions using the scene symmetry. To estimate and control the scene symmetry using both a circular shift and a flip of the latent image features, we incorporate the intensity of the symmetry as a latent variable into conditional variational autoencoders. Our experiments show that the proposed method can generate various plausible spherical images, controlled from symmetric to asymmetric, and can reduce the reconstruction errors of the generated images based on the estimated symmetry.
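On equirectangular feature maps, the circular shift and flip the abstract mentions have simple forms. This is a bare sketch of those two operations only; the paper applies them inside a conditional VAE with a learned symmetry-intensity variable:

```python
import numpy as np

def rotate_latent(feat, shift):
    """Circular shift along the width (longitude) axis of equirectangular
    latent features, corresponding to rotating the scene about the
    vertical axis."""
    return np.roll(feat, shift, axis=-1)

def flip_latent(feat):
    """Flip along the width axis, corresponding to a plane symmetry."""
    return feat[..., ::-1]
```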

AAAI Conference 2020 Conference Paper

Domain Generalization Using a Mixture of Multiple Latent Domains

  • Toshihiko Matsuura
  • Tatsuya Harada

When domains, which represent underlying data distributions, vary during training and testing processes, deep neural networks suffer a drop in their performance. Domain generalization allows improvements in the generalization performance for unseen target domains by using multiple source domains. Conventional methods assume that the domain to which each sample belongs is known in training. However, many datasets, such as those collected via web crawling, contain a mixture of multiple latent domains, in which the domain of each sample is unknown. This paper introduces domain generalization using a mixture of multiple latent domains as a novel and more realistic scenario, where we try to train a domain-generalized model without using domain labels. To address this scenario, we propose a method that iteratively divides samples into latent domains via clustering, and which trains the domain-invariant feature extractor shared among the divided latent domains via adversarial learning. We assume that the latent domain of images is reflected in their style, and thus, utilize style features for clustering. By using these features, our proposed method successfully discovers latent domains and achieves domain generalization even if the domain labels are not given. Experiments show that our proposed method can train a domain-generalized model without using domain labels. Moreover, it outperforms conventional domain generalization methods, including those that utilize domain labels.

IROS Conference 2020 Conference Paper

Learning Agile Locomotion via Adversarial Training

  • Yujin Tang
  • Jie Tan 0001
  • Tatsuya Harada

Developing controllers for agile locomotion is a long-standing challenge for legged robots. Reinforcement learning (RL) and Evolution Strategy (ES) hold the promise of automating the design process of such controllers. However, dedicated and careful human effort is required to design training environments to promote agility. In this paper, we present a multi-agent learning system, in which a quadruped robot (protagonist) learns to chase another robot (adversary) while the latter learns to escape. We find that this adversarial training process not only encourages agile behaviors but also effectively alleviates the laborious environment design effort. In contrast to prior works that used only one adversary, we find that training an ensemble of adversaries, each of which specializes in a different escaping strategy, is essential for the protagonist to master agility. Through extensive experiments, we show that the locomotion controller learned with adversarial training significantly outperforms carefully designed baselines.

NeurIPS Conference 2020 Conference Paper

Neural Star Domain as Primitive Representation

  • Yuki Kawana
  • Yusuke Mukuta
  • Tatsuya Harada

Reconstructing 3D objects from 2D images is a fundamental task in computer vision. Accurate structured reconstruction by parsimonious and semantic primitive representation further broadens its application. When reconstructing a target shape with multiple primitives, it is preferable that one can instantly access the union of basic properties of the shape such as collective volume and surface, treating the primitives as if they are one single shape. This becomes possible by primitive representation with unified implicit and explicit representations. However, primitive representations in current approaches do not satisfy all of the above requirements at the same time. To solve this problem, we propose a novel primitive representation named neural star domain (NSD) that learns primitive shapes in the star domain. We show that NSD is a universal approximator of the star domain and is not only parsimonious and semantic but also an implicit and explicit shape representation. We demonstrate that our approach outperforms existing methods in image reconstruction tasks, semantic capabilities, and speed and quality of sampling high-resolution meshes.

IROS Conference 2020 Conference Paper

Point Cloud Based Reinforcement Learning for Sim-to-Real and Partial Observability in Visual Navigation

  • Kenzo Lobos-Tsunekawa
  • Tatsuya Harada

Reinforcement Learning (RL), among other learning-based methods, represents a powerful tool for solving complex robotic tasks (e.g., actuation, manipulation, navigation), with the need for real-world data to train these systems being one of its most important limitations. The use of simulators is one way to address this issue, yet knowledge acquired in simulation does not transfer directly to the real world, which is known as the sim-to-real transfer problem. While previous works focus on the nature of the images used as observations (e.g., textures and lighting), which has proven useful for sim-to-sim transfer, they neglect other properties of said observations, such as their precise geometrical meaning, failing at robot-to-robot, and thus sim-to-real, transfer. We propose a method that learns on an observation space constructed from point clouds and environment randomization, generalizing across robots and simulators to achieve sim-to-real transfer while also addressing partial observability. We demonstrate the benefits of our methodology on the point-goal navigation task, in which our method proves largely unaffected by the unseen scenarios produced by robot-to-robot transfer, outperforms image-based baselines in robot-randomized experiments, and performs strongly in sim-to-sim conditions. Finally, we perform several experiments to validate the sim-to-real transfer to a physical domestic robot platform, confirming the out-of-the-box performance of our system.

ICLR Conference 2020 Conference Paper

RGBD-GAN: Unsupervised 3D Representation Learning From Natural Image Datasets via RGBD Image Synthesis

  • Atsuhiro Noguchi
  • Tatsuya Harada

Understanding three-dimensional (3D) geometry from two-dimensional (2D) images without any labeled information is promising for understanding the real world without incurring annotation cost. We herein propose a novel generative model, RGBD-GAN, which achieves unsupervised 3D representation learning from 2D images. The proposed method enables camera-parameter-conditional image generation and depth image generation without any 3D annotations, such as camera poses or depth. We use an explicit 3D consistency loss for two RGBD images generated from different camera parameters, in addition to the ordinary GAN objective. The loss is simple yet effective for conditioning any type of image generator, such as DCGAN and StyleGAN, on camera parameters. Through experiments, we demonstrate that the proposed method can learn 3D representations from 2D images with various generator architectures.

IROS Conference 2020 Conference Paper

SplitFusion: Simultaneous Tracking and Mapping for Non-Rigid Scenes

  • Yang Li 0143
  • Tianwei Zhang 0002
  • Yoshihiko Nakamura
  • Tatsuya Harada

We present SplitFusion, a novel dense RGB-D SLAM framework that simultaneously performs tracking and dense reconstruction for both rigid and non-rigid components of the scene. SplitFusion first adopts a deep-learning-based semantic instance segmentation technique to split the scene into rigid and non-rigid surfaces. The split surfaces are independently tracked via rigid or non-rigid ICP and reconstructed through incremental depth map fusion. Experimental results show that the proposed approach can provide not only accurate environment maps but also well-reconstructed non-rigid targets, e.g., moving humans.

AAAI Conference 2019 Conference Paper

Estimating the Causal Effect from Partially Observed Time Series

  • Akane Iseki
  • Yusuke Mukuta
  • Yoshitaka Ushiku
  • Tatsuya Harada

Many real-world systems involve interacting time series. The ability to detect causal dependencies between system components from observed time series of their outputs is essential for understanding system behavior. The quantification of causal influences between time series is based on the definition of some causality measure. Partial Canonical Correlation Analysis (Partial CCA) and its extensions are examples of methods used for robustly estimating the causal relationships between two multidimensional time series even when the time series are short. These methods assume that the input data are complete and have no missing values. However, real-world data often contain missing values. It is therefore crucial to estimate the causality measure robustly even when the input time series is incomplete. Treating this problem as a semi-supervised learning problem, we propose a novel semi-supervised extension of probabilistic Partial CCA called semi-Bayesian Partial CCA. Our method exploits the information in samples with missing values to prevent the overfitting of parameter estimation even when there are few complete samples. Experiments based on synthesized and real data demonstrate the ability of the proposed method to estimate causal relationships more correctly than existing methods when the data contain missing values, the dimensionality is large, and the number of samples is small.

ICRA Conference 2019 Conference Paper

Improved Optical Flow for Gesture-based Human-robot Interaction

  • Jen-Yen Chang
  • Antonio Tejero-de-Pablos
  • Tatsuya Harada

Gesture interaction is a natural way of communicating with a robot as an alternative to speech. Gesture recognition methods leverage optical flow in order to understand human motion. However, while accurate optical flow estimation (i.e., traditional) methods are costly in terms of runtime, fast estimation (i.e., deep learning) methods' accuracy can be improved. In this paper, we present a pipeline for gesture-based human-robot interaction that uses a novel optical flow estimation method in order to achieve an improved speed-accuracy trade-off. Our optical flow estimation method introduces four improvements to previous deep learning-based methods: strong feature extractors, attention to contours, midway features, and a combination of these three. This results in a better understanding of motion, and a finer representation of silhouettes. In order to evaluate our pipeline, we generated our own dataset, MIBURI, which contains gestures to command a house service robot. In our experiments, we show how our method improves not only optical flow estimation, but also gesture recognition, offering a speed-accuracy trade-off more realistic for practical robot applications.

ICRA Conference 2019 Conference Paper

Pose Graph optimization for Unsupervised Monocular Visual Odometry

  • Yang Li 0143
  • Yoshitaka Ushiku
  • Tatsuya Harada

Unsupervised-learning-based monocular visual odometry (VO) has lately drawn significant attention for its potential label-free learning ability and robustness to camera parameters and environmental variations. However, partially due to the lack of a drift correction technique, these methods are still far less accurate than geometric approaches for large-scale odometry estimation. In this paper, we propose to leverage graph optimization and loop closure detection to overcome the limitations of unsupervised-learning-based monocular visual odometry. To this end, we propose a hybrid VO system that combines an unsupervised monocular VO called NeuralBundler with a pose graph optimization back-end. NeuralBundler is a neural network architecture that uses temporal and spatial photometric loss as its main supervision and generates a windowed pose graph consisting of multi-view 6DoF constraints. We propose a novel pose cycle consistency loss to relieve the tensions in the windowed pose graph, leading to improved performance and robustness. In the back-end, a global pose graph is built from the local and loop 6DoF constraints estimated by NeuralBundler and is optimized over SE(3). Empirical evaluation on the KITTI odometry dataset demonstrates that 1) NeuralBundler achieves state-of-the-art performance in unsupervised monocular VO estimation, and 2) our whole approach achieves efficient loop closing and shows favorable overall translational accuracy compared to established monocular SLAM systems.
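The idea behind a pose cycle consistency term can be sketched as follows: composing relative SE(3) transforms around a closed cycle in the pose graph should yield the identity, and the deviation can be penalized. This illustrates the idea only; the paper's loss formulation may differ:

```python
import numpy as np

def cycle_consistency(rel_poses):
    """Compose 4x4 relative pose matrices around a closed cycle and
    measure the deviation from the identity. Zero means the cycle's
    constraints are mutually consistent."""
    T = np.eye(4)
    for P in rel_poses:
        T = T @ P
    return float(np.linalg.norm(T - np.eye(4)))
```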

IROS Conference 2019 Conference Paper

Simultaneous Transparent and Non-Transparent Object Segmentation With Multispectral Scenes

  • Atsuro Okazawa
  • Tomoyuki Takahata
  • Tatsuya Harada

For an autonomous mobile system such as an autonomous robot that moves throughout a city, semantic segmentation is important. Performing semantic segmentation under diverse conditions, in turn, requires 1) a robust ability to recognize objects in low-visibility environments, such as at night, and 2) the ability to recognize objects that transmit visible light, such as the glass and acrylic used in doors and windows. To satisfy these requirements, using RGB images and infrared images simultaneously is considered effective. Visibility and infrared transmission characteristics differ between objects; therefore, merely feeding both into a conventional semantic segmentation framework is not sufficient. For example, when a pedestrian is present behind glass, the visible image captures the pedestrian rather than the glass, while the infrared image captures the glass. In this research, we propose a new semantic segmentation method with a three-stream structure, focusing on the difference in transmission characteristics. This method extracts not only features valid for ordinary non-transparent objects but also features effective for recognizing transparent objects, by utilizing the differences in the imaged objects owing to their transmission characteristics. Furthermore, we constructed a new dataset called “coaxials” of visible and infrared coaxial images, and demonstrated that we can obtain better segmentation performance compared with the conventional method.

AAAI Conference 2018 Conference Paper

Alternating Circulant Random Features for Semigroup Kernels

  • Yusuke Mukuta
  • Yoshitaka Ushiku
  • Tatsuya Harada

The random features method is an efficient method to approximate the kernel function. In this paper, we propose novel random features called “alternating circulant random features,” which consist of a random mixture of independent random structured matrices. Existing fast random features exploit random sign flipping to reduce the correlation between features. Sign flipping works well on random Fourier features for real-valued shift-invariant kernels because the corresponding weight distribution is symmetric. However, this method cannot be applied to random Laplace features directly because the distribution is not symmetric. The method proposed herein yields alternating circulant random features, with the correlation between features being reduced through the random sampling of weights from multiple independent random structured matrices instead of via random sign flipping. The proposed method facilitates rapid calculation by employing structured matrices. In addition, the weight distribution is preserved because sign flipping is not implemented. The performance of the proposed alternating circulant random features method is theoretically and empirically evaluated.

AAAI Conference 2018 Conference Paper

Hierarchical Video Generation From Orthogonal Information: Optical Flow and Texture

  • Katsunori Ohnishi
  • Shohei Yamamoto
  • Yoshitaka Ushiku
  • Tatsuya Harada

Learning to represent and generate videos from unlabeled data is a very challenging problem. To generate realistic videos, it is important not only to ensure that the appearance of each frame is real, but also to ensure the plausibility of the video's motion and the consistency of its appearance in the time direction. The process of video generation should be divided according to these intrinsic difficulties. In this study, we focus on motion and appearance information as two important orthogonal components of a video, and propose Flow-and-Texture-Generative Adversarial Networks (FTGAN), consisting of FlowGAN and TextureGAN. In order to avoid a huge annotation cost, we have to explore a way to learn from unlabeled data. Thus, we employ optical flow as motion information to generate videos. FlowGAN generates optical flow, which contains only the edges and motion of the videos to be generated. TextureGAN, on the other hand, specializes in giving texture to the optical flow generated by FlowGAN. This hierarchical approach yields more realistic videos with plausible motion and appearance consistency. Our experiments show that our model generates more plausible motion videos and also achieves significantly improved performance for unsupervised action classification in comparison to previous GAN works. In addition, because our model generates videos from two independent sources of information, it can generate new combinations of motion and attributes not seen in the training data, such as a video in which a person is doing sit-ups on a baseball field.

ICML Conference 2017 Conference Paper

Asymmetric Tri-training for Unsupervised Domain Adaptation

  • Kuniaki Saito
  • Yoshitaka Ushiku
  • Tatsuya Harada

It is important to apply models trained on a large number of labeled samples to different domains because collecting many labeled samples in various domains is expensive. To learn discriminative representations for the target domain, we assume that artificially labeling the target samples can result in a good representation. Tri-training leverages three classifiers equally to provide pseudo-labels to unlabeled samples; however, the method does not assume labeling samples generated from a different domain. In this paper, we propose the use of an asymmetric tri-training method for unsupervised domain adaptation, where we assign pseudo-labels to unlabeled samples and train the neural networks as if they are true labels. In our work, we use three networks asymmetrically, and by asymmetric, we mean that two networks are used to label unlabeled target samples, and one network is trained by the pseudo-labeled samples to obtain target-discriminative representations. Our proposed method was shown to achieve a state-of-the-art performance on the benchmark digit recognition datasets for domain adaptation.
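The asymmetric labeling step can be sketched as follows: the two labeling networks pseudo-label a target sample only when their predictions agree and at least one is confident, and the third, target-specific network trains on those labels. The function name and the exact confidence rule below are illustrative, not the paper's precise criterion:

```python
import numpy as np

def pseudo_label(p1, p2, threshold=0.9):
    """p1, p2: (N, C) class probabilities from the two labeling networks.
    Keep a sample only if both predict the same class and the higher of
    the two confidences exceeds the threshold."""
    y1, y2 = p1.argmax(axis=1), p2.argmax(axis=1)
    conf = np.maximum(p1.max(axis=1), p2.max(axis=1))
    mask = (y1 == y2) & (conf > threshold)
    return y1[mask], mask
```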

IROS Conference 2017 Conference Paper

MFNet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes

  • Qishen Ha
  • Kohei Watanabe
  • Takumi Karasawa
  • Yoshitaka Ushiku
  • Tatsuya Harada

This work addresses the semantic segmentation of images of street scenes for autonomous vehicles, based on a new RGB-Thermal dataset that is also introduced in this paper. Increasing interest in self-driving vehicles has brought the adaptation of semantic segmentation to self-driving systems. However, recent research on semantic segmentation is mainly based on RGB images, which provide limited information during times of poor visibility at night and under adverse weather conditions. Furthermore, most of these methods focus only on improving performance while ignoring time consumption. The aforementioned problems prompted us to propose a new convolutional neural network architecture for multi-spectral image segmentation that retains segmentation accuracy during real-time operation. We benchmarked our method by creating an RGB-Thermal dataset in which thermal and RGB images are combined. We showed that the segmentation accuracy is significantly increased by adding thermal infrared information.

IROS Conference 2015 Conference Paper

3D Selective Search for obtaining object candidates

  • Asako Kanezaki
  • Tatsuya Harada

We propose a new method for obtaining object candidates in 3D space. Our method requires no learning, has no limitation of object properties such as compactness or symmetry, and therefore produces object candidates using a completely general approach. This method is a simple combination of Selective Search, which is a non-learning-based objectness detector working in 2D images, and a supervoxel segmentation method, which works with 3D point clouds. We made a small but non-trivial modification to supervoxel segmentation; it brings better “seeding” for supervoxels, which produces more proper object candidates as a result. Our experiments using a couple of publicly available RGB-D datasets demonstrated that our method outperformed state-of-the-art methods of generating object proposals in 2D images.

ICRA Conference 2014 Conference Paper

Hard negative classes for multiple object detection

  • Asako Kanezaki
  • Sho Inaba
  • Yoshitaka Ushiku
  • Yuya Yamashita
  • Hiroshi Muraoka
  • Yasuo Kuniyoshi
  • Tatsuya Harada

We propose an efficient method to train multiple object detectors simultaneously using a large-scale image dataset. The one-vs-all approach, which optimizes the boundary between positive samples from a target class and negative samples from the others, has been the most standard approach for object detection. However, because this approach trains each object detector independently, the scores are not balanced between object classes. The proposed method combines ideas derived from both detection and classification in order to balance the scores across all object classes. We optimized the boundary between target classes and their "hard negative" samples, just as in detection, while simultaneously balancing the detector scores across object classes, as done in multi-class classification. We evaluated the performance on multi-class object detection using a subset of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2011 dataset and showed that our method outperforms a de facto standard method.

ICML Conference 2014 Conference Paper

Probabilistic Partial Canonical Correlation Analysis

  • Yusuke Mukuta
  • Tatsuya Harada

Partial canonical correlation analysis (partial CCA) is a statistical method that estimates a pair of linear projections onto a low-dimensional space, where the correlation between two multidimensional variables is maximized after eliminating the influence of a third variable. Partial CCA is known to be closely related to a causality measure between two time series. However, partial CCA requires the inverses of covariance matrices, so the calculation is not stable; this is particularly the case for high-dimensional data or small sample sizes. Additionally, the optimal dimension of the subspace cannot be estimated within the model. In this paper, we address these problems by proposing a probabilistic interpretation of partial CCA and deriving a Bayesian estimation method based on the probabilistic model. Our numerical experiments demonstrate that our methods can stably estimate the model parameters, even in high dimensions or when there are only a small number of samples.
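For readers unfamiliar with the classical method this abstract builds on: partial CCA can be computed by regressing the third variable Z out of both X and Y and running ordinary CCA on the residuals. The sketch below is a minimal NumPy illustration of that classical formulation (the function name `partial_cca` is ours for illustration); it is not the probabilistic or Bayesian estimator proposed in the paper.

```python
import numpy as np

def partial_cca(X, Y, Z, n_components=1):
    """Classical partial CCA: regress Z out of X and Y by least squares,
    then compute canonical correlations of the residuals via SVD."""
    def residual(A):
        # Remove the least-squares projection of A onto the columns of Z.
        coef, *_ = np.linalg.lstsq(Z, A, rcond=None)
        return A - Z @ coef

    def orthonormal_basis(A):
        # Center, then take the left singular vectors (a whitened basis).
        A = A - A.mean(axis=0)
        U, _, _ = np.linalg.svd(A, full_matrices=False)
        return U

    Ux = orthonormal_basis(residual(X))
    Uy = orthonormal_basis(residual(Y))
    # Singular values of Ux^T Uy are the canonical correlations.
    corr = np.linalg.svd(Ux.T @ Uy, compute_uv=False)
    return corr[:n_components]
```

If X and Y are correlated only through Z, the leading partial canonical correlation drops toward zero once Z is regressed out, while ordinary CCA would still report a high correlation. The abstract's point is that the (implicit) covariance inverses in this classical computation become unstable for high dimensions or few samples, which the probabilistic formulation avoids.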

NeurIPS Conference 2012 Conference Paper

Graphical Gaussian Vector for Image Categorization

  • Tatsuya Harada
  • Yasuo Kuniyoshi

This paper proposes a novel image representation called the Graphical Gaussian Vector, which is a counterpart of the codebook and local-feature-matching approaches. In our method, we model the distribution of local features as a Gaussian Markov Random Field (GMRF), which can efficiently represent the spatial relationships among local features, and treat the parameters of the GMRF as the feature vector of the image. Using concepts from information geometry, proper parameters and a metric can be obtained from the GMRF. Finally, we define a new image feature by embedding the metric into the parameters, which can be applied directly to scalable linear classifiers. Our method achieves superior performance over state-of-the-art methods on standard object recognition datasets and comparable performance on a scene dataset. Because the proposed method simply calculates local auto-correlations of local features, it achieves both high classification accuracy and high efficiency.

IROS Conference 2012 Conference Paper

Visual anomaly detection from small samples for mobile robots

  • Hiroharu Kato
  • Tatsuya Harada
  • Yasuo Kuniyoshi

We propose a novel method of visual anomaly detection for mobile robots in daily real-life settings. Visual anomaly detection using mobile robots is important for security systems or simply for gathering information. However, this task is challenging for two reasons. First, because the number of observed images sampled at the same location is small, anomaly detection systems cannot use standard statistical methods. Second, anomalies must be detected in the presence of other continuous, ambient changes in the visual scene, such as changes in lighting from morning to night. Regarding the former problem, we develop and apply an analysis-by-synthesis-based anomaly detection method for mobile robots. For the latter, we propose a novel definition of anomaly that uses observed samples at other locations to filter out ambient changes that should be ignored by the system. Experimental results demonstrate that our method can detect anomalies from small samples in the presence of ambient changes, which could not be detected by conventional methods.

ICRA Conference 2011 Conference Paper

Fast object detection for robots in a cluttered indoor environment using integral 3D feature table

  • Asako Kanezaki
  • Takahiro Suzuki
  • Tatsuya Harada
  • Yasuo Kuniyoshi

Realizing automatic object search by robots in an indoor environment is one of the most important and challenging topics in mobile robot research. If the target object does not exist in a nearby area, the obvious strategy is to go to the area in which it was last observed. We have developed a robot system that collects 3D-scene data in an indoor environment during automatic routine crawling, and also detects objects quickly through a global search of the collected 3D-scene data. The 3D-scene data can be obtained automatically by transforming color images and range images into a set of color voxel data using self-location information. To detect an object, the system moves the bounding box of the target object by a certain step through the color voxel data, extracts 3D features in each box region, and computes the similarity between these features and the target object's features, using an appropriate feature projection learned beforehand. Taking advantage of the additive property of our 3D features, both feature extraction and similarity calculation are considerably accelerated. In the object learning process, the system obtains the feature-projection matrix by weighting the unique features of the target object rather than its common features, which reduces object detection errors.

IROS Conference 2011 Conference Paper

Visual anomaly detection under temporal and spatial non-uniformity for news finding robot

  • Takahiro Suzuki
  • Fumihiro Bessho
  • Tatsuya Harada
  • Yasuo Kuniyoshi

In this paper, we propose a news-gathering mobile robot system and a novel visual anomaly detection method as the core function of news detection in the real world. Visual anomaly detection is important and widely applicable, not only to news-gathering robots but also to security systems. However, visual anomaly detection from a mobile robot is highly challenging because the appearance of images captured by the moving robot changes dynamically. Consequently, the number of images observed at the same location is small, and the sampling interval of those images is not constant. To tackle this problem, we developed a new method that incorporates, as prior knowledge, many samples observed at different locations that are implicitly semantically similar to the intended location. We also developed a new statistical model that explicitly considers the sampling interval of the input images, whereas conventional methods ignore the correlation among samples. Experimental results demonstrate that our method outperforms conventional methods, and that our mobile robot system, incorporating the proposed method, finds, investigates, and publishes news about a local community in the real world.

ICRA Conference 2010 Conference Paper

High-speed 3D object recognition using additive features in a linear subspace

  • Asako Kanezaki
  • Hideki Nakayama
  • Tatsuya Harada
  • Yasuo Kuniyoshi

In this paper we propose a method for high-speed 3D object recognition using a linear subspace method and our 3D features. The method can be applied to partial models of any size in any posture. Although it is becoming easy to obtain textured 3D models with a 3D scanner, there are few methods for 3D object recognition that take into account both the shape and the texture of objects. Moreover, it is difficult to process large 3D data at high speed. Our 3D features consider the co-occurrence of shape and colors on an object's surface. The additive property of these features makes it possible to calculate the similarity between a query part and the subspace of each object in a database without division, so the recognition time is quite short. In experiments, we compare our method with conventional methods using Spin-Images and Textured Spin-Images, and show that our method is well suited to 3D object recognition.

ICRA Conference 2009 Conference Paper

Wearable motion capture suit with full-body tactile sensors

  • Yuki Fujimori
  • Yoshiyuki Ohmura
  • Tatsuya Harada
  • Yasuo Kuniyoshi

This paper presents a system for capturing human movement and tactile data, along with methods for analyzing this data. We cannot fully capture the essence of motion without tactile information, and sometimes the lack of such information causes critical problems. To achieve a better understanding of motion behavior, we developed a wearable motion capture suit with full-body tactile sensors. We also developed a motion sensor that can estimate its orientation with its internal CPU, and built a tactile sensor module that can fit many kinds of body shapes. With this system, we can measure a user's movement and tactile information simultaneously, and by integrating the tactile data with the motion data we can obtain many kinds of meaningful insights. We demonstrate the effectiveness of the system with experiments capturing two motions: stretching after sitting on a chair, and lying down on a bed. By recognizing the contact point from the tactile data and fitting it into the environment, we were able to estimate the motion trajectories.

IROS Conference 2008 Conference Paper

Smart extraction of desired object from color-distance image with user's tiny scribble

  • Naoki Shibuya
  • Yasuyuki Shimohata
  • Tatsuya Harada
  • Yasuo Kuniyoshi

Image segmentation is an important problem because it is required for many different applications. In particular, visual extraction of an object that is the target of attention or manipulation is an increasingly important issue in robot vision. In real-world applications, a robot needs to extract an object designated by a human in a complicated environment. There is a large literature on the problem of image segmentation, but most previous methods have a limited ability to extract desired objects from a cluttered scene. Moreover, from the perspective of human-robot interfaces, it is desirable to make it as easy as possible for the user to indicate an object. In this paper, we propose a segmentation method, CD-matting, which can correctly extract a target object in complicated real-world visual situations. This method exploits color and distance information in an integrated way, and the system requires only a simple input to designate the target object. We verify the proposed system through real-world experiments; the results show the effectiveness of our method in complicated situations.

IROS Conference 2007 Conference Paper

Development of Wireless Networked Tiny Orientation Device for Wearable Motion Capture and Measurement of Walking Around, Walking Up and Down, and Jumping Tasks

  • Tatsuya Harada
  • Tomoaki Gyota
  • Yasuo Kuniyoshi
  • Tomomasa Sato

In this paper, we developed a tiny orientation device equipped with a wireless network function for wearable motion capture. Wearable motion capture is defined as not only measuring the posture of the human body but also collecting environmental information and the human's internal state, simultaneously and easily. Because the realized device automatically configures wireless networks and is small enough to attach anywhere, it makes gathering any sensor information easy. The key feature of the orientation estimation method is that models are switched according to the environment to exclude the effect of motion disturbances. In experiments, by integrating sole sensors with the orientation sensors, walking around, walking up and down, and jumping tasks were successfully measured. Because it is difficult to measure these motions with inertial sensors alone, this demonstrates the importance of integrating various sensors for acquiring human motion.

IROS Conference 2007 Conference Paper

Journalist robot: robot system making news articles from real world

  • Rie Matsumoto
  • Hideki Nakayama
  • Tatsuya Harada
  • Yasuo Kuniyoshi

We describe the development of a journalist robot system, which generates articles by searching for news in the real world. Our system repeats three steps: (1) autonomous exploration, (2) recording of news, and (3) generation of articles. We characterize events with two values: "anomaly" and "relevance" to the user. During the exploration step, images are evaluated using these values; if an interesting event is detected, the robot approaches it to collect additional information. The system then labels the images and generates a description from the labels. Experiments show the ability of our system to find news-like phenomena and to describe images with words.

IROS Conference 2006 Conference Paper

Imitation Learning System to Assist Human Task Interactively

  • So Taoka
  • Tatsuya Harada
  • Tomomasa Sato
  • Taketoshi Mori

This paper proposes an imitation learning system that generates trajectories by which a robot supports a human with close physical assistance, adapting to human movements and daily-life environments. The proposed system is composed of 1) division algorithms, 2) learning algorithms, and 3) assistance algorithms. 1) In the division algorithms, the system measures time series of human task-execution data and divides them automatically into multiple motion segments. This division is based on the standard deviations of motion errors between the measured trajectories and an ideal trajectory, where the ideal trajectory is the mean of all measured human trajectories and is expected to accomplish the purpose of the human task. Because the human pays attention to important motion parameters, which therefore have small standard deviations of error, the series of measured data is divided into motion segments at the points where the importance of the parameters changes suddenly; the division is thus guaranteed to accord with human attention. 2) In the learning algorithms, the system learns trajectories with a dynamic neural network (DNN). Because the DNN is convergent, the generated trajectories converge to the ideal trajectory. The importance of each parameter, in other words how much attention the human pays to it, is evaluated by how small the standard deviation of its errors is, and the DNN learns trajectories that reflect this importance so as to accord with human feeling. 3) In the assistance algorithms, the system judges when to start assistance from the motion-parameter errors weighted by their respective importance, and also connects the generated trajectories of the motion segments smoothly. An experiment in supporting a human drinking task was performed successfully, in which the proposed system not only judged when to start assisting the task but also executed assistance when a cup was about to tilt so far that water would spill.

IROS Conference 2005 Conference Paper

Behavior prediction based on daily-life record database in distributed sensing space

  • Taketoshi Mori
  • Aritoki Takada
  • Hiroshi Noguchi
  • Tatsuya Harada
  • Tomomasa Sato

This paper proposes a behavior prediction system for supporting our daily lives. Behaviors in daily life are recorded in an environment with embedded sensors, and the prediction system learns the characteristic patterns that tend to be followed by the behaviors to be predicted. In this research, the authors applied a method of discovering time-series association rules, which finds frequent combinations of events called episodes. The prediction system observes behaviors with the sensors and outputs predictions of future behaviors based on the rules.

IROS Conference 2005 Conference Paper

Construction of wireless ad hoc network for Lifelog based physical and informational support system

  • Tatsuya Harada
  • Yusuke Kawano
  • Satoshi Otani
  • Taketoshi Mori
  • Tomomasa Sato

In this paper, we construct a wireless ad hoc network for realizing various physical and informational support systems based on the Lifelog, a record of experiences in daily life, and realize a prototype of one such useful system: an operational support system for electric appliances. Utilizing the Lifelog reduces the burden of controlling a large number of complicated electric appliances, by constructing a probabilistic model of the user's operational behavior from the Lifelog and predicting the user's successive operations with this model. The Lifelog accumulates the user's operational behavior toward electric appliances, together with environmental information, through the wireless network. As the basis for collecting the Lifelog, including operational behavior, we built a portable Bluetooth-equipped wireless-network device, which is necessary for communicating information in a ubiquitous computing environment; Bluetooth offers good features such as ad hoc networking, sufficient data throughput, high resistance to noise, and low power consumption. The realized device has abundant I/O connectors to which various sensors and actuators can be attached. By attaching these devices to electric appliances, the appliances can easily join the wireless network and communicate information. The system can therefore probabilistically model the user's behavior, especially operations on electric appliances, using the Lifelog as training data, and predict the user's next operations on surrounding appliances from this model and the user's present state. The system presents the prediction results to the user and executes the operations via the wireless network after the user's confirmation. Various experimental results show that the operational support system is useful in a ubiquitous computing environment and that the realized device performs sufficiently well in daily life.

IROS Conference 2005 Conference Paper

Human posture reconstruction based on posture probability density

  • Tatsuya Harada
  • Tomomasa Sato
  • Taketoshi Mori

In this paper, we propose a method for reconstructing human posture from insufficient input posture data, based on a human posture probability density constructed from long-term human motion capture data. Since long, continuous daily human motion data is high-dimensional and huge in size, the posture data should be compressed effectively. Long-term posture data has a nonlinear distribution in the posture space, since each specific posture, such as standing or sitting, has different properties. The posture data is therefore allocated into subspaces and compressed within each subspace with mixtures of probabilistic principal component analyzers (MPPCA). MPPCA is improved by replacing the conventional EM algorithm with the deterministic annealing EM algorithm (DAEM) to avoid sensitivity to initial parameters. The posture probability density is constructed over these subspaces, and an adequate human posture can be reconstructed from insufficient data by introducing the posture probability density into a sequential Monte Carlo framework. Experimental results show that robust human posture estimation is realized, since the method estimates not a unique posture but the proper posterior posture density, using prior knowledge of posture.

ICRA Conference 2005 Conference Paper

Marginalized Bags of Vectors Kernels on Switching Linear Dynamics for Online Action Recognition

  • Masamichi Shimosaka
  • Taketoshi Mori
  • Tatsuya Harada
  • Tomomasa Sato

In this paper, we propose a novel kernel computation algorithm between time-series human motion data for online action recognition. The proposed kernel is based on probabilistic models called switching linear dynamics (SLDs), which are powerful tools for tracking, analyzing, and classifying complex time-series human motion. The kernel incorporates information about the latent variables of SLDs via a simplified design approach called marginalized kernels. An empirical evaluation using real motion data shows that an SVM classifier with the proposed kernel performs much better than classifiers with conventional kernel techniques. Another experiment, using walking-around motion, shows that a classifier with the proposed kernel can properly segment the start and end of the target action.

IROS Conference 2005 Conference Paper

Online recognition and segmentation for time-series motion with HMM and conceptual relation of actions

  • Taketoshi Mori
  • Yu Nejigane
  • Masamichi Shimosaka
  • Yushi Segawa
  • Tatsuya Harada
  • Tomomasa Sato

In this paper, we propose a robust online action recognition algorithm with a segmentation scheme that detects the start and end points of action occurrences; in other words, the algorithm reliably estimates what kinds of actions are occurring at the present time. The algorithm has the following characteristics: 1) it incorporates human knowledge about the relations between action names in order to simplify and strengthen the algorithm, so it can robustly assign multiple action labels at the same time; 2) it uses a time-series action probability that represents the likelihood of each action occurring at every frame; 3) a classification technique with hidden Markov models (HMMs) enables the algorithm to detect the segmentation points robustly and immediately. Experimental results using real motion capture data show that our algorithm not only effectively decreases the latency of detecting segmentation points but also prevents the system from producing unnecessary segments due to errors in the time-series action probability.

IROS Conference 2004 Conference Paper

Informative motion extractor for action recognition with kernel feature alignment

  • Taketoshi Mori
  • Masamichi Shimosaka
  • Tatsuya Harada
  • Tomomasa Sato

This paper proposes a novel algorithm for extracting informative motion features for daily-life action recognition based on support vector machines (SVMs). The main advantage of the proposed method is that it not only extracts remarkable motion features that fit human intuition but also improves the performance of the recognition system. Concretely, the main properties of the proposed method are 1) optimizing the kernel parameters so as to minimize the generalization error, and 2) extracting remarkable motion features according to the sensitivity of the kernel function. Experimental results show that the proposed algorithm improves the accuracy of the recognition system and enables humans to identify informative motion features intuitively.

ICRA Conference 2004 Conference Paper

Portable Absolute Orientation Estimation Device with Wireless Network under Accelerated Situation

  • Tatsuya Harada
  • Hiroto Uchino
  • Taketoshi Mori
  • Tomomasa Sato

In this paper, we develop an absolute-orientation estimation device equipped with a wireless network. Accelerometers and magnetometers measure the gravity and geomagnetic fields, respectively, and gyroscope sensors measure the local angular velocity. Because the geomagnetic field varies with the environment, the device can obtain information about the magnetic field through the wireless network, and the orientation estimation task can also be delegated to other computers over the network. By integrating the measured gravity and geomagnetic fields with the local angular velocity using Sigma-Point Kalman Filters (SPKFs), the stability and robustness of the absolute-orientation estimate are improved over either sensor alone. We also propose an estimation method that excludes the effects of motion and magnetic disturbances for accurate estimation.

IROS Conference 2003 Conference Paper

Human behavior logging support system utilizing pose/position sensors and behavior target sensors

  • Tomomasa Sato
  • Satoru Itoh
  • Satoshi Otani
  • Tatsuya Harada
  • Taketoshi Mori

This paper proposes a behavior-log creation support system utilizing human pose/position sensors and behavior-target sensors. The system is equipped with a human pose sensor and a staying-room sensor as the pose/position sensors, as well as a voice sensor and a PC utilization-history sensor to detect the target of the behavior. The human pose sensor classifies human behaviors into "standing", "sitting", and "walking". The staying-room sensor records the name of the room where the user stays. The voice sensor detects conversation, which occurs when the user is communicating with someone else. The PC utilization-history sensor records not only whether a PC is in use but also the names of the application software, serving as a detector of the behavior target during computer work. The measured data from these sensors is displayed to the user to support creating the behavior log, i.e., to help the user recall and input the contents and targets of his or her behaviors. An experiment in using the sensors and creating the behavior log proved that the proportion of recorded events in the behavior log improves from 60% with no support to more than 90% with the support of the system. This result quantitatively shows the system's capability to support human behavior logging.

IROS Conference 2003 Conference Paper

Robot imitation of human motion based on qualitative description from multiple measurement of human and environmental data

  • Tomomasa Sato
  • Yuichiro Genda
  • Hideyuki Kubotera
  • Taketoshi Mori
  • Tatsuya Harada

This paper proposes an imitation algorithm by which a robot acquires typical tasks from multiple measured instances of human tasks in daily life. The algorithm consists of the following procedures: 1) First, the system measures multiple human object-transferring tasks on a table. It then calculates a qualitative description from the measured raw data, namely the positions of the human hand and the object as well as the force applied to the table; this description is then converted into a probabilistic description. 2) Second, the system finds the typical human task with the maximum likelihood from the probabilistic description. 3) Third, the trajectory that enables the robot to imitate the typical human task is extracted. 4) Finally, an imitation task that applies only limited force to the environment is generated from the trajectory by simulation and adaptation. Experimental execution of the generated trajectory proves the validity of the algorithm.

ICRA Conference 2002 Conference Paper

Estimation of Bed-Ridden Human's Gross and Slight Movement Based on Pressure Sensors Distribution Bed

  • Tatsuya Harada
  • Tomomasa Sato
  • Taketoshi Mori

In this paper, we developed a system that estimates a bed-ridden human's body movements without restraint, using a bed with distributed pressure sensors. We classified body movements into gross and slight movements. To estimate both, we realized methods for distinguishing between a human and an object, and between sitting and lying statuses. We also realized methods for estimating posture, joint movements, respiration, and pulse. By integrating these complementary methods, a bed-ridden human's body movements, from gross to slight, can be estimated comprehensively.

ICRA Conference 2001 Conference Paper

Pressure Distribution Image Based Human Motion Tracking System Using Skeleton and Surface Integration Model

  • Tatsuya Harada
  • Tomomasa Sato
  • Taketoshi Mori

We propose a motion tracking system for a lying person using a pressure distribution image and a full-body model. The full-body model consists of a skeleton model and a surface model, to cope with a variety of body shapes: BVH files are used as the skeleton model, describing a hierarchy of joints and links, and Wavefront object files are used as the surface model, describing the geometry of the surface. The bed has 210 pressure sensors under the mattress and can measure a pressure distribution image of a lying person. The person's motion is tracked by considering potential energy, momentum, and the difference between the measured pressure distribution image and the image calculated from the full-body model. Experimental results reveal that the realized system can track not only horizontal motions, such as opening and closing the legs, but also vertical motions, such as raising the upper body.

ICRA Conference 2000 Conference Paper

Infant Behavior Recognition System Based on Pressure Distribution Image

  • Tatsuya Harada
  • Akihiko Saito
  • Tomomasa Sato
  • Taketoshi Mori

The authors developed a novel infant behavior recognition system based on a pressure distribution image. The system can recognize an infant's status (quiet, moving, or crying), posture, body-part positions, and movements without restraint, and in doing so can cope with the infant's rapid growth and unique physique. The algorithm is summarized as follows. 1) First, the system measures the pressure distribution image with 384 pressure sensors distributed in the bed. 2) The authors propose an "activity score", calculated from the measured pressure distribution image, which indicates the kinetic energy of the infant's activity; based on this score, the system determines the infant's status. 3) If the infant is quiet, the system estimates the infant's physique. 4) Based on the estimated physique, the system recognizes the infant's posture and body-part movements. Experimental results reveal that the system successfully recognizes infants' status (quiet, moving, or crying), posture, and body-part positions and movements.

IROS Conference 2000 Conference Paper

Sensor pillow system: monitoring respiration and body movement in sleep

  • Tatsuya Harada
  • Akiko Sakata
  • Taketoshi Mori
  • Tomomasa Sato

This paper presents a "Sensor Pillow System" that measures physiological parameters during sleep without restraining the human. The system consists of an array of pressure sensors under the pillow, a one-chip microcomputer that digitizes and transmits the pressure data, and a desktop computer that counts respirations and turns during sleep. This paper also presents a simple motion model that explains the change in the head pressure distribution accompanying respiration; based on this model, a respiration-counting algorithm is proposed. The effectiveness of the system is shown experimentally by comparing the numbers of respirations and turns counted by the sensor pillow system with those obtained from a medical device and a video image.

ICRA Conference 1999 Conference Paper

Body Parts Positions and Posture Estimation System Based on Pressure Distribution Image

  • Tatsuya Harada
  • Taketoshi Mori
  • Yoshifumi Nishida
  • Tomohisa Yoshimi
  • Tomomasa Sato

We develop a body-part position and posture estimation system consisting of a bed with distributed pressure sensors and estimation software. The computer constructs many pressure distribution images from simple human models and accumulates these images in its memory as model-based pressure image templates. The measured pressure distribution image is then compared with the model-based templates to find the best-matching one. Because the templates contain body-part positions and joint-angle information, the positions of the body parts in contact with the bed can easily be estimated from the best-matching template. Finally, the estimated posture is displayed as a 3D computer graphics image. Experimental results reveal that the system can not only display the estimated lying posture intuitively but also accurately estimate the positions of the body parts where they contact the bed.

IROS Conference 1997 Conference Paper

Contact interaction robot-communication between robot and human through contact behavior

  • Tomomasa Sato
  • Tatsuya Harada
  • Taketoshi Mori

This paper proposes a contact interaction robot (CIR), which uses contact behavior as a means of interaction between a human and a robot. The CIR is a puppet robot designed so that the robot and the human can touch each other. Psychological experiments were performed using a CIR equipped with pressure sensors on both sides of its neck and six servo motors in its neck, two arms, and two legs. The experimental results reveal that the CIR can moderate pain perceived by the human as well as bring a sense of relief.