Author name cluster

Ji Ma

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

Massive Sound Embedding Benchmark (MSEB)

Georg Heigold
Ehsan Variani
Tom Bagby
Cyril Allauzen
Ji Ma
Shankar Kumar
Michael D Riley

Audio is a critical component of multimodal perception, and any truly intelligent system must demonstrate a wide range of auditory capabilities. These capabilities include transcription, classification, retrieval, reasoning, segmentation, clustering, reranking, and reconstruction. Fundamentally, each task involves transforming a raw audio signal into a meaningful 'embedding' (be it a single vector, a sequence of continuous or discrete representations, or another structured form), which then serves as the basis for generating the task's final response. To accelerate progress towards robust machine auditory intelligence, we present the Massive Sound Embedding Benchmark (MSEB): an extensible framework designed to evaluate the auditory components of any multimodal system. In its first release, MSEB offers a comprehensive suite of eight core tasks, with more planned for the future, supported by diverse datasets, including the new, large-scale Simple Voice Questions (SVQ) dataset. Our initial experiments establish clear performance headrooms, highlighting the significant opportunity to improve real-world multimodal experiences where audio is a core signal. We encourage the research community to use MSEB to assess their algorithms and contribute to its growth. The library is publicly hosted at https://github.com/google-research/mseb.

AAAI Conference 2025 Conference Paper

Memory-Reduced Meta-Learning with Guaranteed Convergence

Honglin Yang
Ji Ma
Xiao Yu

The optimization-based meta-learning approach is gaining increased traction because of its unique ability to quickly adapt to a new task using only small amounts of data. However, existing optimization-based meta-learning approaches, such as MAML, ANIL and their variants, generally employ backpropagation for upper-level gradient estimation, which requires using historical lower-level parameters/gradients and thus increases computational and memory overhead in each iteration. In this paper, we propose a meta-learning algorithm that can avoid using historical parameters/gradients and significantly reduce memory costs in each iteration compared to existing optimization-based meta-learning approaches. In addition to memory reduction, we prove that our proposed algorithm converges sublinearly with the iteration number of upper-level optimization, and the convergence error decays sublinearly with the batch size of sampled tasks. In the specific case of deterministic meta-learning, we also prove that our proposed algorithm converges to an exact solution. Moreover, we quantify the computational complexity of the algorithm, which matches existing convergence results on meta-learning even without using any historical parameters/gradients. Experimental results on meta-learning benchmarks confirm the efficacy of our proposed algorithm.

IJCAI Conference 2024 Conference Paper

C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning

Ji Ma
Wei Suo
Peng Wang
Yanning Zhang

Vision-Language Instruction Tuning (VLIT) is a critical training phase for Large Vision-Language Models (LVLMs). With the improving capabilities of open-source LVLMs, researchers have increasingly turned to generating VLIT data with open-source LVLMs and have achieved significant progress. However, such data generation approaches are bottlenecked by the following challenges: 1) Since multi-modal models tend to be influenced by prior language knowledge, directly using LVLMs to generate VLIT data would inevitably lead to low content relevance between generated data and images. 2) To improve the ability of the models to generate VLIT data, previous methods have incorporated an additional training phase to boost the generative capacity. This process hurts the generalization of the models to unseen inputs (i.e., the “exposure bias” problem). In this paper, we propose a new Content Correlated VLIT data generation via Contrastive Learning (C3L). Specifically, we design a new content relevance module which enhances the content relevance between VLIT data and images by computing Image Instruction Correspondence Scores S(I2C). Moreover, a contrastive learning module is introduced to further boost the VLIT data generation capability of the LVLMs. A large number of automatic measures on four benchmarks show the effectiveness of our method.

NeurIPS Conference 2023 Conference Paper

Learning List-Level Domain-Invariant Representations for Ranking

Ruicheng Xian
Honglei Zhuang
Zhen Qin
Hamed Zamani
Jing Lu
Ji Ma
Kai Hui
Han Zhao

Domain adaptation aims to transfer the knowledge learned on (data-rich) source domains to (low-resource) target domains, and a popular method is invariant representation learning, which matches and aligns the data distributions on the feature space. Although this method is studied extensively and applied on classification and regression problems, its adoption on ranking problems is sporadic, and the few existing implementations lack theoretical justifications. This paper revisits invariant representation learning for ranking. Upon reviewing prior work, we found that they implement what we call item-level alignment, which aligns the distributions of the items being ranked from all lists in aggregate but ignores their list structure. However, the list structure should be leveraged, because it is intrinsic to ranking problems where the data and the metrics are defined and computed on lists, not the items by themselves. To close this discrepancy, we propose list-level alignment—learning domain-invariant representations at the higher level of lists. The benefits are twofold: it leads to the first domain adaptation generalization bound for ranking, in turn providing theoretical support for the proposed method, and it achieves better empirical transfer performance for unsupervised domain adaptation on ranking tasks, including passage reranking.

IROS Conference 2007 Conference Paper

Task evaluations of a compact laparoscopic surgical robot system

Ji Ma
Peter J. Berkelman

Minimally invasive surgery (MIS) has become an important technique in practical surgical procedures. Compared with manually operated MIS procedures, surgical robot systems provide more accuracy, enhance dexterity, and make more difficult surgical procedures feasible. In this paper, a prototype teleoperated robotic surgical system which is modular, compact and easy to use is tested with human operators. Two evaluation tasks were performed by participants using manual MIS instruments and this teleoperated robotic surgical system. The task data were analyzed and compared between the manual and robot instrument operation. The results show that compared with typical manual instrument operation, the teleoperated robotic system in this paper has advantages in ease of use, decreased task time, and better accuracy with smooth motions and less tremor.

IROS Conference 2007 Conference Paper

The University of Hawaii teleoperated robotic surgery system

Peter J. Berkelman
Ji Ma

Teleoperated robotic surgical systems have been commercially developed and shown to be effective yet their adoption into standard practice has been limited up to the present for reasons which may include size, complexity, and cost. In the Department of Mechanical Engineering at the University of Hawaii, we have developed a new prototype teleoperated robotic system for minimally invasive surgery (MIS) which is simple, compact, and easy to set up and use.

IROS Conference 2006 Conference Paper

Control Software Design of A Compact Laparoscopic Surgical Robot System

Ji Ma
Peter J. Berkelman

We have developed a prototype teleoperated robotic surgical system which is modular, compact and easy to use. In this paper, the control software design of the prototype is introduced. The main function of the control software is to realize master-slave control. According to its functions, the control software consists of three layers: hardware drivers, master-slave control and human-machine interface. Each software layer includes several software modules which are reliable and easy to maintain and upgrade. The preliminary motion control and experimental results are given at the end.

IROS Conference 2006 Conference Paper

Effects of Friction Parameters on Completion Times for Sustained Planar Positioning Tasks with a Haptic Interface

Peter J. Berkelman
Ji Ma

Haptic interface devices and teleoperation masters are multiple degree of freedom devices manipulated by an operator to generate real-time motion commands to simulated environments or robot manipulators. In this work we examine the relationship between the simulated friction parameters of a particular spatial positioning master device and the completion times of planar positioning tasks by human operators. It is expected that increasing the Coulomb or viscous friction of the device would tend to increase the completion times of less difficult, quicker positioning tasks and decrease completion times for more difficult fine positioning tasks requiring higher precision from the operator. A common haptic interface device was used to perform continuous sequences of planar positioning tasks. Each trial required 10-12 minutes to complete and consisted of 15 positioning sequences which varied in the size of the target regions and the magnitude and type of simulated friction in the device. With a sample size of 10 test subjects, small effects were generally observed as expected, with the exception of the first 3 to 4 sequences of the trials, which are concluded to be an adaptation or learning period for the users during each trial.
