Arrow Research search

Author name cluster

Minghao Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers (7)

EAAI 2026 Journal Article

Visual–haptic attention fusion based flexible printed circuit position identification in industrial robotic mobile phone assembly

  • Zihan Tang
  • Yongjia Zhao
  • Deming Luo
  • Hang Pan
  • Jinlong Chen
  • Yongsong Zhan
  • Minghao Yang

Despite advances in robotic manipulation, industrial mobile phone flexible printed circuit (FPC) assembly remains a great challenge due to the extremely stringent position-identification demands for the very tiny FPC assembly targets under strict assembly tolerances. To this end, this work proposes an accurate FPC position identification strategy for industrial robotic mobile phone assembly. The contributions of this work are as follows: (1) we construct a Multi-Head Attention (MHA) architecture that encodes the visual–haptic information of the position into compact fused representations; (2) the distance and rotation errors between the ideal FPC position and the varying mismatched FPC positions around the correct one are connected to the MHA decoder output in a regression manner; (3) a dynamic weight averaging (DWA) strategy is adopted to adjust the weights in the loss calculation, achieving a better balance between the position and rotation errors during regression. Experiments were conducted on a practical FPC assembly platform. The results show that the proposed method significantly improves FPC localization accuracy within a limited number of assembly attempts. The proposed method could potentially be applied to real mobile phone assembly lines to reduce labor burden in the near future.
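Dynamic weight averaging is a known multi-task weighting scheme: each loss term's weight follows the ratio of its two most recent loss values, softmax-normalized with a temperature, so terms that are improving slowly get more weight. A minimal sketch of that rule, using illustrative loss values rather than anything from the paper:

```python
import math

def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Dynamic weight averaging: tasks whose loss is shrinking more
    slowly (descent ratio closer to or above 1) receive larger weights.

    prev_losses / prev_prev_losses: per-task losses at steps t-1 and t-2.
    Returns weights that sum to the number of tasks.
    """
    ratios = [l1 / l2 for l1, l2 in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    k = len(ratios)
    total = sum(exps)
    return [k * e / total for e in exps]

# Illustrative two-term case (position error, rotation error): the
# position loss is barely improving while the rotation loss improves
# fast, so DWA shifts weight toward the position term.
w_pos, w_rot = dwa_weights([0.9, 0.2], [1.0, 0.5])
```

The weighted total loss would then be `w_pos * pos_loss + w_rot * rot_loss` at each step; this is a generic DWA sketch, not the paper's implementation.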

EAAI 2025 Journal Article

Guiding reinforcement learning with shaping rewards provided by the vision–language model

  • Kaiyi Wang
  • Yongjia Zhao
  • Yichen He
  • Shuling Dai
  • Ning Zhang
  • Minghao Yang

Enabling robots to learn manipulation tasks is a practical engineering application, and reinforcement learning is one of the key artificial intelligence methods for achieving it. However, reinforcement learning always faces a trade-off between the ease of designing reward functions and the ease of learning from rewards. Reward shaping provides a solution, and recent works have designed rewards using images and language descriptions, a simple and convenient shaping method for non-expert users. However, some of these works adopt a pretrained model without fine-tuning, while others train the reward model with absolute score labels; both struggle to capture the spatial relationships within images, so the performance of the resulting reward shaping models is limited. In this work, we propose a novel reward shaping method that generates additional rewards from task descriptions and scene images. We use a pretrained vision–language model as the backbone for efficient cross-modal information fusion and design a downstream task trained with pairwise comparison for reward shaping. Extensive experiments demonstrate the effectiveness of each component of the method. The approach is validated in the Meta-World environment, and the results show that it outperforms standard reinforcement learning and existing work in terms of policy learning efficiency.
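Training a reward model on pairwise comparisons rather than absolute score labels typically uses a Bradley–Terry-style objective: minimize the negative log-probability that the preferred sample scores higher. A minimal sketch with scalar rewards (the function and values are illustrative, not the paper's implementation):

```python
import math

def pairwise_reward_loss(r_preferred, r_other):
    """Bradley-Terry comparison loss: -log sigmoid(r_pref - r_other).
    Small when the reward model already ranks the preferred sample higher,
    large when the ranking is inverted."""
    margin = r_preferred - r_other
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss drops as the model ranks the preferred (e.g. closer-to-goal)
# scene above the other one, and grows when the ranking is wrong.
loss_correct = pairwise_reward_loss(2.0, 0.5)  # preferred ranked higher
loss_wrong = pairwise_reward_loss(0.5, 2.0)    # preferred ranked lower
```

In practice the scalar rewards would come from a network head over fused image-text features, and the loss is averaged over many labeled pairs.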

NeurIPS 2025 Conference Paper

Multimodal 3D Genome Pre-training

  • Minghao Yang
  • Pengteng Li
  • Yan Liang
  • Qianyi Cai
  • Zhihang Zheng
  • Shichen Zhang
  • Pengfei Zhang
  • Zhi-An Huang

Deep learning techniques have driven significant progress in various analytical tasks within 3D genomics in computational biology. However, a holistic understanding of 3D genomics knowledge remains underexplored. Here, we propose MIX-HIC, the first multimodal foundation model of the 3D genome, which integrates both 3D genome structure and epigenomic tracks to obtain unified and comprehensive semantics. For accurate heterogeneous semantic fusion, we design cross-modal interaction and mapping blocks for robust unified representation, yielding accurate aggregation of 3D genome knowledge. In addition, we introduce the first large-scale dataset comprising over one million pairwise samples of Hi-C contact maps and epigenomic tracks for high-quality pre-training, enabling the exploration of functional implications in 3D genomics. Extensive experiments show that MIX-HIC significantly surpasses existing state-of-the-art methods on diverse downstream tasks. This work provides a valuable resource for advancing 3D genomics research.

NeurIPS 2025 Conference Paper

VaporTok: RL-Driven Adaptive Video Tokenizer with Prior & Task Awareness

  • Minghao Yang
  • Zechen Bai
  • Jing Lin
  • Haoqian Wang
  • Alex Jinpeng Wang

Recent advances in visual tokenizers have demonstrated their effectiveness for multimodal large language models and autoregressive generative models. However, most existing visual tokenizers rely on a fixed downsampling rate at a given visual resolution and consequently produce a constant number of visual tokens, ignoring the fact that visual information of varying complexity warrants different token budgets. Motivated by this observation, we propose an adaptive video tokenizer, VaporTok, with two core contributions. (1) Probabilistic Taildrop: we introduce a novel taildrop mechanism that learns a truncation-index sampling distribution conditioned on the visual complexity of the video. During both training and inference, the decoder reconstructs videos at adaptive token lengths, allocating more tokens to complex videos and fewer to simpler ones. (2) Parallel Sample GRPO with Vapor Reward: by leveraging the probability distribution produced by probabilistic taildrop, we reformulate the visual tokenization pipeline as a sequential decision process. To optimize this process, we propose a variant of GRPO and a composite reward encompassing token efficiency, reconstruction fidelity, and generative quality, enabling metrics-aware adaptive tokenization across diverse objectives. Extensive experiments on standard video generation benchmarks confirm our analysis, showing that our adaptive approach matches or outperforms fixed-rate baselines and naive taildrop while using fewer tokens.
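The core taildrop idea of sampling a truncation index (a token budget) from a complexity-conditioned distribution can be sketched in a few lines. The complexity-to-logits mapping below is a toy stand-in for VaporTok's learned network, purely to show the shape of the mechanism:

```python
import math
import random

def truncation_distribution(complexity, max_tokens=8):
    """Toy stand-in for a learned distribution over truncation indices:
    higher complexity pushes probability mass toward larger token budgets."""
    logits = [complexity * (i + 1) / max_tokens for i in range(max_tokens)]
    exps = [math.exp(l) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample_budget(complexity, max_tokens=8, rng=random):
    """Sample the index after which the remaining tokens are dropped."""
    probs = truncation_distribution(complexity, max_tokens)
    return rng.choices(range(1, max_tokens + 1), weights=probs)[0]

# A low-complexity clip gets a near-uniform distribution; a high-complexity
# clip concentrates probability on longer budgets.
probs_simple = truncation_distribution(0.5)
probs_complex = truncation_distribution(4.0)
budget = sample_budget(4.0, rng=random.Random(0))
```

During decoding, only the first `budget` tokens would be kept for reconstruction, which is what makes the token count adaptive per video.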

TCS 2024 Journal Article

Improved unbounded inner-product functional encryption

  • Minghao Yang
  • Junqing Gong
  • Haifeng Qian

In this paper, we propose an unbounded inner-product functional encryption (unbounded IPFE) scheme with semi-adaptive simulation-based security. Compared with the previous semi-adaptively secure scheme of Tomida and Takashima [Asiacrypt '18], our scheme enjoys ciphertexts that are about 28% shorter and secret keys that are about 43% shorter. Technically, we start with a bounded separable one-key IPFE scheme, in which the public key and ciphertext can be divided into vectors. We then develop a new transformation from a bounded separable one-key IPFE scheme to an unbounded IPFE scheme. Finally, we give a concrete instantiation combining the bounded separable one-key IPFE scheme with the transformation.

TCS 2023 Journal Article

Bounded-collusion decentralized ABE with sublinear parameters

  • Jun Zhao
  • Minghao Yang
  • Junqing Gong
  • Kai Zhang
  • Haifeng Qian

In this paper, we propose a decentralized ABE scheme secure against bounded collusion, meaning that the number of users in the system is a priori bounded. The scheme enjoys public keys and ciphertexts of size sublinear in the number of users, whereas all prior constructions require linear sizes. Moreover, our scheme achieves semi-adaptive security under the bilateral k-Lin assumption and the SXDH assumption in a pairing group. As in previous constructions, the scheme supports monotone span programs as policies and does not rely on the random oracle. Technically, we follow Wang et al.'s “linear secret sharing scheme (LSSS) + inner-product functional encryption (IPFE)” paradigm [PKC '19] and use (an extended variant of) functional encryption for quadratic functions (QFE) in place of IPFE. This allows us to encrypt with sublinear-size random coins and later expand them to linear-size entropy in the security proof. Roughly, the use of QFE requires the bilateral k-Lin assumption, while the entropy expansion relies on SXDH.

ICML 2019 Conference Paper

BayesNAS: A Bayesian Approach for Neural Architecture Search

  • Hongpeng Zhou
  • Minghao Yang
  • Jun Wang 0012
  • Wei Pan 0004

One-Shot Neural Architecture Search (NAS) is a promising method to significantly reduce search time without any separate training. It can be treated as a Network Compression problem on the architecture parameters from an over-parameterized network. However, there are two issues associated with most one-shot NAS methods. First, dependencies between a node and its predecessors and successors are often disregarded which result in improper treatment over zero operations. Second, architecture parameters pruning based on their magnitude is questionable. In this paper, we employ the classic Bayesian learning approach to alleviate these two issues by modeling architecture parameters using hierarchical automatic relevance determination (HARD) priors. Unlike other NAS methods, we train the over-parameterized network for only one epoch then update the architecture. Impressively, this enabled us to find the architecture in both proxy and proxyless tasks on CIFAR-10 within only 0. 2 GPU days using a single GPU. As a byproduct, our approach can be transferred directly to compress convolutional neural networks by enforcing structural sparsity which achieves extremely sparse networks without accuracy deterioration.