Arrow Research search

Author name cluster

Meng Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers

22

AAAI Conference 2025 System Paper

AutoMV: An Autonomous Agent Framework for Real Estate Marketing Video Generation

  • Kuizong Wu
  • Shaozu Yuan
  • Chang Shen
  • Long Xu
  • Meng Chen

In this paper, we introduce AutoMV, an autonomous agent framework designed for generating real estate marketing videos. The framework integrates a diverse set of existing models into a tool library, allowing the agent to intelligently select and execute the appropriate tools. Given property images and text, the agent decomposes the task into manageable subtasks, generating storyline directives and corresponding camera movement trajectories to guide the video production process. By automatically applying video synthesis techniques and incorporating multimedia elements such as subtitles and background music, the agent transforms static real estate images into dynamic, visually appealing videos, thereby optimizing their impact for digital marketing purposes.

TIST Journal 2025 Journal Article

MGRL4RE: A Multi-Graph Representation Learning Approach for Urban Region Embedding

  • Meng Chen
  • Zechen Li
  • Hongwei Jia
  • Xin Shao
  • Jun Zhao
  • Qiang Gao
  • Min Yang
  • Yilong Yin

Using multi-modal data to learn region representations has gained popularity for its ability to reveal diverse socioeconomic features in cities. However, many studies focus solely on semantic features from points-of-interest (POIs), neglecting the issue of spatial imbalance. This article introduces a Multi-Graph Representation Learning framework for Region Embedding (MGRL4RE), which leverages both inter-region and intra-region correlations through two main components: multi-graph construction based on various region correlations and multi-graph representation learning. The construction module creates a multi-graph reflecting various correlations among regions, utilizing geo-tagged POIs, region data, and human mobility data. Specifically, we assess a region’s importance relative to its spatial context (neighborhood) and develop spatially invariant semantic features to address spatial imbalance. Furthermore, the representation learning module generates comprehensive and effective region representations via multi-view embedding fusion. Our extensive experiments across various downstream tasks, including land use clustering, region popularity prediction, and crime prediction, confirm that our model significantly outperforms existing state-of-the-art region embedding methods.

NeurIPS Conference 2025 Conference Paper

Results of the Big ANN: NeurIPS’23 competition

  • Harsha Vardhan Simhadri
  • Martin Aumüller
  • Matthijs Douze
  • Dmitry Baranchuk
  • Amir Ingber
  • Edo Liberty
  • George Williams
  • Ben Landrum

The 2023 Big ANN Challenge, held at NeurIPS 2023, focused on advancing the state-of-the-art in indexing data structures and search algorithms for practical variants of Approximate Nearest Neighbor (ANN) search that reflect the growing complexity and diversity of workloads. Unlike prior challenges that emphasized scaling up classical ANN search (Simhadri et al., NeurIPS 2021), this competition addressed sparse, filtered, out-of-distribution, and streaming variants of ANNS. Participants developed and submitted innovative solutions that were evaluated on new standard datasets with constrained computational resources. The results showcased significant improvements in search accuracy and efficiency, with notable contributions from both academic and industrial teams. This paper summarizes the competition tracks, datasets, evaluation metrics, and the innovative approaches of the top-performing submissions, providing insights into the current advancements and future directions in the field of approximate nearest neighbor search.
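Challenges like this typically score submissions by recall against exact ground-truth neighbors under a compute budget. A minimal, illustrative sketch of the recall@k metric (not the competition's official harness):

```python
import numpy as np

def recall_at_k(approx_ids: np.ndarray, true_ids: np.ndarray) -> float:
    """Mean fraction of the true k nearest neighbors recovered per query."""
    hits = sum(len(set(a) & set(t)) for a, t in zip(approx_ids, true_ids))
    return hits / true_ids.size

# Toy check: 2 queries, k=3; first query recovers 2/3, second 3/3.
approx = np.array([[1, 2, 9], [4, 5, 6]])
truth = np.array([[1, 2, 3], [4, 5, 6]])
print(recall_at_k(approx, truth))  # 5/6 ≈ 0.833
```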

NeurIPS Conference 2025 Conference Paper

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

  • Di Liu
  • Meng Chen
  • Baotong Lu
  • Huiqiang Jiang
  • Zhenhua Han
  • Qianxi Zhang
  • Qi Chen
  • Chengruidong Zhang

Transformer-based Large Language Models (LLMs) have become increasingly important. However, scaling LLMs to longer contexts incurs slow inference speed and high GPU memory consumption for caching key-value (KV) vectors. This paper presents RetrievalAttention, a training-free approach to both accelerate the decoding phase and reduce GPU memory consumption by pre-building KV vector indexes for fixed contexts and maintaining them in CPU memory for efficient retrieval. Unlike conventional KV cache methods, RetrievalAttention integrates approximate nearest neighbor search (ANNS) indexes into the attention computation. We observe that off-the-shelf ANNS techniques often fail due to the out-of-distribution (OOD) nature of query and key vectors in attention mechanisms. RetrievalAttention overcomes this with an attention-aware vector index. Our evaluation shows RetrievalAttention achieves near full attention accuracy while accessing only 1-3% of the data, significantly reducing inference costs. Remarkably, RetrievalAttention enables LLMs with 8B parameters to handle 128K tokens on a single NVIDIA RTX4090 (24GB), achieving a decoding speed of 0.107 seconds per token.
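The core idea can be sketched as attending only to the keys most similar to the current query. In the paper the top-k step is served by an attention-aware ANNS index held in CPU memory; in this minimal illustration an exact top-k search stands in for the index, and all names are illustrative:

```python
import numpy as np

def topk_attention(q, K, V, k=4):
    """Approximate attention: attend only to the k keys most similar to q,
    rather than the full cached context (exact top-k stands in for an
    ANNS index here)."""
    scores = K @ q / np.sqrt(q.size)         # scaled dot-product scores
    idx = np.argpartition(scores, -k)[-k:]   # indices of the k largest scores
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()                             # softmax over the retrieved keys
    return w @ V[idx]                        # weighted sum of their values

rng = np.random.default_rng(0)
K = rng.normal(size=(128, 16))  # cached key vectors for a long context
V = rng.normal(size=(128, 16))  # cached value vectors
q = rng.normal(size=16)         # current decoding query
out = topk_attention(q, K, V, k=8)
```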

IJCAI Conference 2025 Conference Paper

Spatio-temporal Prototype-based Hierarchical Learning for OD Demand Prediction

  • Shilu Yuan
  • Xiaoyu Li
  • Wenqian Mu
  • Ji Zhong
  • Meng Chen
  • Haoliang Sun
  • Yongshun Gong

Origin-Destination (OD) demand prediction is a pivotal yet highly challenging task in intelligent transportation systems, aiming to accurately forecast cross-region ridership flows within urban networks. While previous studies have focused on modeling node-to-node relationships, most of them neglect the fact that nodes (regions/stations) exhibit similar spatio-temporal (ST) patterns, termed spatio-temporal prototypes. Capturing these prototypes is crucial for understanding the unified ST dependencies across the network. To bridge this gap, we propose STPro, an ST prototype-based hierarchical model with a dual-branch structure that extracts ST features from the micro and macro perspectives. At the micro level, our model learns unified ST features of individual nodes, while at the macro level, it employs dynamic clustering to identify city-wide ST prototypes, thereby uncovering latent patterns of urban mobility. In addition, we leverage the distinct roles of nodes as origins and destinations by constructing dual O and D branches and learning mutual information to model their intricate interactions and correlations. Extensive experiments on two public datasets demonstrate that our STPro outperforms recent state-of-the-art baselines, achieving remarkable predictive improvements in OD demand prediction.

IROS Conference 2025 Conference Paper

Towards Physically Realizable Adversarial Attacks in Embodied Vision Navigation

  • Meng Chen
  • Jiawei Tu
  • Chao Qi
  • Yonghao Dang
  • Feng Zhou
  • Wei Wei
  • Jianqin Yin

The significant advancements in embodied vision navigation have raised concerns about its susceptibility to adversarial attacks exploiting deep neural networks. Investigating the adversarial robustness of embodied vision navigation is crucial, especially given the threat of 3D physical attacks that could pose risks to human safety. However, existing attack methods for embodied vision navigation often lack physical feasibility due to challenges in transferring digital perturbations into the physical world. Moreover, current physical attacks for object detection struggle to achieve both multi-view effectiveness and visual naturalness in navigation scenarios. To address this, we propose a practical attack method for embodied navigation by attaching adversarial patches to objects, where both opacity and textures are learnable. Specifically, to ensure effectiveness across varying viewpoints, we employ a multi-view optimization strategy based on object-aware sampling, which optimizes the patch’s texture based on feedback from the vision-based perception model used in navigation. To make the patch inconspicuous to human observers, we introduce a two-stage opacity optimization mechanism, in which opacity is fine-tuned after texture optimization. Experimental results demonstrate that our adversarial patches decrease the navigation success rate by an average of 22.39%, outperforming previous methods in practicality, effectiveness, and naturalness. Code is available at: github.com/chen37058/Physical-Attacks-in-Embodied-Nav.

JBHI Journal 2024 Journal Article

Elimination of Random Mixed Noise in ECG Using Convolutional Denoising Autoencoder With Transformer Encoder

  • Meng Chen
  • Yongjian Li
  • Liting Zhang
  • Lei Liu
  • Baokun Han
  • Wenzhuo Shi
  • Shoushui Wei

Electrocardiogram (ECG) signals frequently encounter diverse types of noise, such as baseline wander (BW), electrode motion (EM) artifacts, muscle artifact (MA), and others. These noises often occur in combination during the actual data acquisition process, resulting in erroneous or perplexing interpretations for cardiologists. To suppress random mixed noise (RMN) in ECG with less distortion, we propose a Transformer-based Convolutional Denoising AutoEncoder model (TCDAE) in this study. The encoder of TCDAE is composed of three stacked gated convolutional layers and a Transformer encoder block with a point-wise multi-head self-attention module. To obtain minimal distortion in both the time and frequency domains, we also propose a frequency-weighted Huber loss function in the training phase to better approximate the original signals. The TCDAE model is trained and tested on the QT Database (QTDB) and MIT-BIH Noise Stress Test Database (NSTDB), with the training data and testing data coming from different records. Across both overall noise and separate noise intervals, TCDAE achieves the most robust metrics for RMN removal compared with the baseline methods. We also conduct generalization tests on the Icentia11k database, where TCDAE outperforms the state-of-the-art models, with a 55% reduction in false positives in R peak detection after denoising. The TCDAE model approximates the short-term and long-term characteristics of ECG signals and has higher stability even under extreme RMN corruption. The memory consumption and inference speed of TCDAE are also feasible for deployment in clinical applications.
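A frequency-weighted Huber loss can be sketched as a Huber penalty on FFT magnitudes with per-bin weights. This is a minimal illustration of the idea, not the paper's exact formulation (the weighting scheme here is an assumption):

```python
import numpy as np

def huber(err, delta=1.0):
    """Elementwise Huber penalty: quadratic near zero, linear in the tails."""
    a = np.abs(err)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

def freq_weighted_huber(clean, denoised, weights=None, delta=1.0):
    """Huber loss on FFT magnitudes, weighted per frequency bin."""
    err = np.abs(np.fft.rfft(denoised)) - np.abs(np.fft.rfft(clean))
    if weights is None:
        weights = np.ones_like(err)  # uniform weights by default
    return float(np.mean(weights * huber(err, delta)))

t = np.linspace(0, 1, 256, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t)  # toy stand-in for a clean ECG component
noisy = clean + 0.1 * np.random.default_rng(1).normal(size=t.size)
loss = freq_weighted_huber(clean, noisy)
```

In practice such a term would be combined with a time-domain reconstruction loss so that both waveform shape and spectral content are preserved.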

IJCAI Conference 2024 Conference Paper

Exploring Urban Semantics: A Multimodal Model for POI Semantic Annotation with Street View Images and Place Names

  • Dabin Zhang
  • Meng Chen
  • Weiming Huang
  • Yongshun Gong
  • Kai Zhao

Semantic annotation for points of interest (POIs) is the process of annotating a POI with a category label, which facilitates many services related to POIs, such as POI search and recommendation. Most of the existing solutions extract features related to POIs from abundant user-generated content data (e.g., check-ins and user comments). However, such data are often difficult to obtain, especially for newly created POIs. In this paper, we aim to explore semantic annotation for POIs with limited information such as POI (place) names and geographic locations. Additionally, we have found that street view images provide extensive visual clues about POI attributes and could be an essential supplement to the limited information of POIs that enables semantic annotation. To this end, we propose a novel multimodal model for POI semantic annotation, namely M3PA, which achieves enhanced semantic annotation through fusing a POI’s textual and visual representations. Specifically, M3PA extracts visual features from street view images using a pre-trained image encoder and integrates these features to generate the visual representation of a targeted POI based on a geographic attention mechanism. Furthermore, M3PA utilizes the contextual information of neighboring POIs to extract textual features and captures their spatial relationships through geographical encoding to generate the textual representation of a targeted POI. Finally, the visual and textual representations of a POI are fused for semantic annotation. Extensive experiments with POI data from Amap validate the effectiveness of M3PA for POI semantic annotation, compared with several competitive baselines.

AAAI Conference 2024 Conference Paper

G^2SAM: Graph-Based Global Semantic Awareness Method for Multimodal Sarcasm Detection

  • Yiwei Wei
  • Shaozu Yuan
  • Hengyang Zhou
  • Longbiao Wang
  • Zhiling Yan
  • Ruosong Yang
  • Meng Chen

Multimodal sarcasm detection, aiming to detect the ironic sentiment within multimodal social data, has gained substantial popularity in both the natural language processing and computer vision communities. Recently, graph-based studies that draw sentimental relations to detect multimodal sarcasm have made notable advances. However, they have neglected exploiting graph-based global semantic congruity from existing instances to facilitate the prediction, which ultimately hinders the model's performance. In this paper, we introduce a new inference paradigm that leverages global graph-based semantic awareness to handle this task. Firstly, we construct fine-grained multimodal graphs for each instance and integrate them into semantic space to draw graph-based relations. During inference, we leverage global semantic congruity to retrieve k-nearest neighbor instances in semantic space as references for voting on the final prediction. To enhance the semantic correlation of representations in semantic space, we also introduce label-aware graph contrastive learning to further improve the performance. Experimental results demonstrate that our model achieves state-of-the-art (SOTA) performance in multimodal sarcasm detection. The code will be available at https://github.com/upccpu/G2SAM.
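The retrieve-and-vote inference step described above reduces to k-nearest-neighbor voting in the learned semantic space. A minimal sketch, with toy embeddings and labels standing in for the model's learned graph representations:

```python
import numpy as np
from collections import Counter

def knn_vote(query_emb, ref_embs, ref_labels, k=3):
    """Predict by majority vote among the k nearest reference embeddings
    under cosine similarity."""
    sims = ref_embs @ query_emb / (
        np.linalg.norm(ref_embs, axis=1) * np.linalg.norm(query_emb))
    nearest = np.argsort(-sims)[:k]  # indices of the k most similar refs
    return Counter(ref_labels[i] for i in nearest).most_common(1)[0][0]

refs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = ["sarcastic", "sarcastic", "literal", "literal"]
pred = knn_vote(np.array([0.95, 0.05]), refs, labels, k=3)  # → "sarcastic"
```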

IJCAI Conference 2024 Conference Paper

Learning Hierarchy-Enhanced POI Category Representations Using Disentangled Mobility Sequences

  • Hongwei Jia
  • Meng Chen
  • Weiming Huang
  • Kai Zhao
  • Yongshun Gong

Points of interest (POIs) carry a wealth of semantic information of varying locations in cities and thus have been widely used to enable various location-based services. To understand POI semantics, existing methods usually model contextual correlations of POI categories in users' check-in sequences and embed categories into a latent space based on the word2vec framework. However, such an approach does not fully capture the underlying hierarchical relationship between POI categories and can hardly integrate the category hierarchy into various deep sequential models. To overcome this shortcoming, we propose a Semantically Disentangled POI Category Embedding Model (SD-CEM) to generate hierarchy-enhanced category representations using disentangled mobility sequences. Specifically, first, we construct disentangled mobility sequences using human mobility data based on the semantics of POIs. Then we utilize the POI category hierarchy to initialize a hierarchy-enhanced representation for each category in the disentangled sequences, employing an attention mechanism. Finally, we optimize these category representations by incorporating both the masked category prediction task and the next category prediction task. To evaluate the effectiveness of SD-CEM, we conduct comprehensive experiments using two check-in datasets covering three tasks. Experimental results demonstrate that SD-CEM outperforms several competitive baselines, highlighting its substantial improvement in performance as well as the understanding of learned category representations.

AAAI Conference 2024 Conference Paper

Urban Region Embedding via Multi-View Contrastive Prediction

  • Zechen Li
  • Weiming Huang
  • Kai Zhao
  • Min Yang
  • Yongshun Gong
  • Meng Chen

Recently, learning urban region representations utilizing multi-modal data (information views) has become increasingly popular, for deep understanding of the distributions of various socioeconomic features in cities. However, previous methods usually blend multi-view information in a posterior stage, falling short in learning coherent and consistent representations across different views. In this paper, we form a new pipeline to learn consistent representations across varying views, and propose the multi-view Contrastive Prediction model for urban Region embedding (ReCP), which leverages the multiple information views from point-of-interest (POI) and human mobility data. Specifically, ReCP comprises two major modules, namely an intra-view learning module utilizing contrastive learning and feature reconstruction to capture the unique information from each single view, and an inter-view learning module that perceives the consistency between the two views using a contrastive prediction learning scheme. We conduct thorough experiments on two downstream tasks to assess the proposed model, i.e., land use clustering and region popularity prediction. The experimental results demonstrate that our model outperforms state-of-the-art baseline methods significantly in urban region representation learning.

TIST Journal 2023 Journal Article

A Spatial and Adversarial Representation Learning Approach for Land Use Classification with POIs

  • Ronghui Xu
  • Weiming Huang
  • Jun Zhao
  • Meng Chen
  • Liqiang Nie

Points-of-interest (POIs) have been proven to be indicative for sensing urban land use in numerous studies. However, recent progress mainly relies on spatial co-occurrence patterns among POI categories, which falls short in utilizing the rich semantic information embodied in POI hierarchical categories and in sensing the spatial distribution patterns of POIs at an individual zonal scale. In this context, we present a spatial and adversarial representation learning approach (SARL) for predicting land use of urban zones with POIs. SARL deeply mines the information from POIs from both spatial and categorical perspectives. Specifically, we first utilize a convolutional neural network to sense the spatial distribution patterns of POIs in each urban zone. We then leverage an autoencoder and an adversarial learning strategy to mine the POI categorical information at all hierarchical levels, which emphasizes the prominent and definitive POIs while preserving the overall POI hierarchical structures in each zone. Finally, we fuse the information from the two perspectives via a Wide & Deep network and carry out land use prediction with the fused embeddings. We conduct comprehensive experiments to validate the effectiveness of SARL in four European cities with real-world data. The results demonstrate that SARL substantially outperforms several competitive baselines.

AAAI Conference 2023 Conference Paper

MNER-QG: An End-to-End MRC Framework for Multimodal Named Entity Recognition with Query Grounding

  • Meihuizi Jia
  • Lei Shen
  • Xin Shen
  • Lejian Liao
  • Meng Chen
  • Xiaodong He
  • Zhendong Chen
  • Jiaqi Li

Multimodal named entity recognition (MNER) is a critical step in information extraction, which aims to detect entity spans and classify them to corresponding entity types given a sentence-image pair. Existing methods either (1) obtain named entities with coarse-grained visual clues from attention mechanisms, or (2) first detect fine-grained visual regions with toolkits and then recognize named entities. However, they suffer from improper alignment between entity types and visual regions or error propagation in the two-stage manner, which ultimately introduces irrelevant visual information into the text. In this paper, we propose a novel end-to-end framework named MNER-QG that can simultaneously perform MRC-based multimodal named entity recognition and query grounding. Specifically, with the assistance of queries, MNER-QG can provide prior knowledge of entity types and visual regions, and further enhance representations of both text and image. To conduct the query grounding task, we provide manual annotations and weak supervisions that are obtained via training a highly flexible visual grounding model with transfer learning. We conduct extensive experiments on two public MNER datasets, Twitter2015 and Twitter2017. Experimental results show that MNER-QG outperforms the current state-of-the-art models on the MNER task, and also improves the query grounding performance.

IJCAI Conference 2023 Conference Paper

Towards an Integrated View of Semantic Annotation for POIs with Spatial and Textual Information

  • Dabin Zhang
  • Ronghui Xu
  • Weiming Huang
  • Kai Zhao
  • Meng Chen

Categories of Point of Interest (POI) facilitate location-based services from many aspects like location search and POI recommendation. However, POI categories are often incomplete and new POIs are constantly being created, which raises the demand for semantic annotation for POIs, i.e., labeling a POI with a semantic category. Previous methods usually model sequential check-in information of users to learn POI features for annotation. However, users' check-ins are hard to obtain in reality, especially for those newly created POIs. In this context, we present a Spatial-Textual POI Annotation (STPA) model for static POIs, which derives POI categories using only the geographic locations and names of POIs. Specifically, we design a GCN-based spatial encoder to model spatial correlations among POIs to generate POI spatial embeddings, and an attention-based text encoder to model the semantic contexts of POIs to generate POI textual embeddings. We finally fuse the two embeddings and preserve multi-view correlations for semantic annotation. We conduct comprehensive experiments to validate the effectiveness of STPA with POI data from AMap. Experimental results demonstrate that STPA substantially outperforms several competitive baselines, which proves that STPA is a promising approach for annotating static POIs in map services.

IJCAI Conference 2022 Conference Paper

Learning to Generate Poetic Chinese Landscape Painting with Calligraphy

  • Shaozu Yuan
  • Aijun Dai
  • Zhiling Yan
  • Ruixue Liu
  • Meng Chen
  • Baoyang Chen
  • Zhijie Qiu
  • Xiaodong He

In this paper, we present a novel system (denoted as Polaca) to generate poetic Chinese landscape painting with calligraphy. Unlike previous single image-to-image painting generation, Polaca takes the classic poetry as input and outputs the artistic landscape painting image with the corresponding calligraphy. It is equipped with three different modules to complete the whole piece of landscape painting artwork: the first one is a text-to-image module to generate landscape painting image, the second one is an image-to-image module to generate stylistic calligraphy image, and the third one is an image fusion module to fuse the two images into a whole piece of aesthetic artwork.

TIST Journal 2021 Journal Article

Origin-Aware Location Prediction Based on Historical Vehicle Trajectories

  • Meng Chen
  • Qingjie Liu
  • Weiming Huang
  • Teng Zhang
  • Yixuan Zuo
  • Xiaohui Yu

Next location prediction is of great importance for many location-based applications and provides essential intelligence to various businesses. In previous studies, a common approach to next location prediction is to learn sequential transitions from massive historical trajectories based on conditional probability. Nevertheless, due to time and space complexity, these methods (e.g., Markov models) only utilize the most recently visited locations to predict next locations, neglecting earlier locations in the trajectory. In this work, we seek to enhance the prediction performance by incorporating the travel time from all the passed locations in the query trajectory to each candidate next location. To this end, we propose a novel prediction method, namely the Travel Time Difference Model, which exploits the difference between the shortest travel time and the actual travel time to predict next locations. Moreover, we integrate the Travel Time Difference Model with a Sequential and Temporal Predictor to yield a joint model, which integrates local sequential transitions, temporal regularity, and global travel time information in the trajectory for the next location prediction problem. We have conducted extensive experiments on two real-world datasets: the vehicle passage record data and the taxi trajectory data. The experimental results demonstrate significant improvements in prediction accuracy over baseline methods.
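The travel-time-difference idea can be sketched as scoring each candidate next location by the gap between observed and shortest-path travel time, then ranking candidates by that gap. This is an illustrative reduction of the model, not the paper's full formulation:

```python
def rank_next_locations(actual_times, shortest_times):
    """Rank candidate next locations by the gap between the actual travel
    time from the passed locations and the shortest possible travel time;
    a smaller gap means the candidate is more consistent with the route."""
    diffs = {c: abs(actual_times[c] - shortest_times[c]) for c in actual_times}
    return sorted(diffs, key=diffs.get)

actual = {"A": 12.0, "B": 30.0, "C": 18.0}    # observed travel times (min)
shortest = {"A": 10.0, "B": 11.0, "C": 17.0}  # shortest-path travel times
ranking = rank_next_locations(actual, shortest)  # → ["C", "A", "B"]
```

In the joint model, a score derived from this gap would be combined with the probabilities from the sequential and temporal predictor.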

TIST Journal 2021 Journal Article

PARP: A Parallel Traffic Condition Driven Route Planning Model on Dynamic Road Networks

  • Tianlun Dai
  • Bohan Li
  • Ziqiang Yu
  • Xiangrong Tong
  • Meng Chen
  • Gang Chen

The problem of route planning on road networks is essential to many Location-Based Services (LBSs). Road networks are dynamic in the sense that the weights of the edges in the corresponding graph constantly change over time, representing evolving traffic conditions. Thus, a practical route planning strategy is required to support continuous route optimization that considers historic, current, and future traffic conditions. However, few existing works comprehensively take these various traffic conditions into account during route planning. Moreover, LBSs usually suffer from extensive concurrent route planning requests in rush hours, which imposes a pressing need to handle numerous queries in parallel to reduce the response time of each query; most existing solutions do not address this issue either. We therefore investigate a parallel traffic condition driven route planning model on a cluster of processors. To embed the future traffic condition into the route planning, we employ a GCN model to periodically predict the travel costs of roads within a specified time period, which improves the robustness of the route planning model against varying traffic conditions. To reduce the response time, a Dual-Level Path (DLP) index is proposed to support a parallel route planning algorithm with the filter-and-refine principle. The bottom level of DLP partitions the entire graph into different subgraphs, and the top level is a skeleton graph that consists of all border vertices in all subgraphs. The filter step identifies a global directional path for a given query based on the skeleton graph. In the refine step, the overall route planning for this query is decomposed into multiple sub-optimizations in the subgraphs passed through by the directional path. Since the subgraphs are independently maintained by different processors, the sub-optimizations of extensive queries can be operated in parallel. Finally, extensive evaluations are conducted to confirm the effectiveness and superiority of the proposal.

IROS Conference 2019 Conference Paper

An autonomous exploration algorithm using environment-robot interacted traversability analysis

  • Yujie Tang 0004
  • Jun Cai
  • Meng Chen
  • Xuejiao Yan
  • Yangmin Xie

Auto-exploration is a task for self-driving robots to explore unknown environments, which becomes much more complicated when they move on irregular outdoor terrain. To improve the situation, a new frontier-based exploration algorithm is presented in this paper. It starts from raw 3D point clouds of the environment to analyze the traversability of the scanned area, and further provides a reachability map that marks all map grid cells as reachable, dangerous, or unknown. Frontier candidates are obtained from the reachability map, then clustered and reduced using an improved K-means. Finally, the target of the next exploration step is selected from the remaining frontiers by evaluating their travel cost. The algorithm is validated on irregular outdoor terrain and shows the capability of a field robot to explore such terrain.

IJCAI Conference 2019 Conference Paper

Mappa Mundi: An Interactive Artistic Mind Map Generator with Artificial Imagination

  • Ruixue Liu
  • Baoyang Chen
  • Meng Chen
  • Youzheng Wu
  • Zhijie Qiu
  • Xiaodong He

We present a novel real-time, collaborative, and interactive AI painting system, Mappa Mundi, for artistic Mind Map creation. The system consists of a voice-based input interface, an automatic topic expansion module, and an image projection module. The key innovation is to inject Artificial Imagination into painting creation by considering lexical and phonological similarities of language, learning and inheriting the artist’s original painting style, and applying the principles of Dadaism and the impossibility of improvisation. Our system shows that AI and artists can collaborate seamlessly to create imaginative artistic paintings, and Mappa Mundi has been shown in an art exhibition at UCCA, Beijing.

JBHI Journal 2017 Journal Article

A Method of Signal Scrambling to Secure Data Storage for Healthcare Applications

  • Shu-Di Bao
  • Meng Chen
  • Guang-Zhong Yang

A body sensor network that consists of wearable and/or implantable biosensors has been an important front-end for collecting personal health records. It is expected that the full integration of outside-hospital personal health information and hospital electronic health records will further promote preventative health services as well as global health. However, the integration and sharing of health information is bound to bring security and privacy issues, which are becoming increasingly important with the extensive development of healthcare applications. This paper addresses the potential security risks of healthcare data in Internet-based applications and proposes a method of signal scrambling as an add-on security mechanism in the application layer for a variety of healthcare information, where a piece of tiny data is used to scramble healthcare records. The former is kept locally and the latter, along with security protection, is sent for cloud storage. The tiny data can be derived from a random number generator or even a piece of healthcare data, which makes the method more flexible. The computational complexity and security performance have been investigated through theoretical and experimental analysis to demonstrate the efficiency and effectiveness of the proposed method. The proposed method is applicable to all kinds of data that require extra security protection within complex networks.

IROS Conference 2015 Conference Paper

The calibration device and method of humanoid finger sensor based on multimodal perception

  • Meng Chen
  • Ping Tang
  • Dong Han

Using a humanoid finger sensor with multimodal perception capability, physical properties such as pressure, temperature, texture features, surface roughness, and micro vibration can be easily extracted when the sensor contacts different objects. In this paper, an innovative calibration device is developed, together with motor controllers, a central control system, and the accompanying hardware and software. The least-squares method is used for data fitting and parameter prediction, implementing precise calibration of pressure and its distribution. Frequency-spectrum analysis based on the fast Fourier transform is used to characterize different physical properties of objects within given frequency ranges and energy distributions. Analysis of the calibration data and results shows good linearity over the contact force and temperature ranges, and a method for distinguishing objects with similar texture features via frequency-spectrum analysis is put forward. This study provides a set of systematic calibration methods for humanoid robot fingers, which can serve as an efficient reference for multimodal object perception.
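The two techniques named in the abstract, least-squares fitting for pressure calibration and FFT-based spectrum analysis for texture discrimination, can be sketched with toy data (all numbers below are illustrative, not the paper's measurements):

```python
import numpy as np

# Least-squares calibration: fit applied pressure against raw sensor
# readout, then use the fit to predict pressure from a new readout.
pressure = np.array([0.0, 1.0, 2.0, 3.0, 4.0])      # applied load (N)
readout = np.array([10.0, 31.0, 50.0, 71.0, 90.0])  # raw sensor counts
slope, intercept = np.polyfit(readout, pressure, 1)  # 1st-order LS fit
predicted = slope * 50.0 + intercept                 # ≈ 2.0 N at 50 counts

# Frequency-spectrum analysis: the dominant FFT bin of a micro-vibration
# signal can separate textures that look similar in the time domain.
fs = 1000                                 # sample rate (Hz)
t = np.arange(0, 1, 1 / fs)
vibration = np.sin(2 * np.pi * 120 * t)   # toy texture vibration signature
spectrum = np.abs(np.fft.rfft(vibration))
dominant_hz = np.fft.rfftfreq(t.size, 1 / fs)[spectrum.argmax()]  # 120.0 Hz
```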