Author name cluster

Ming Ding

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers

1 author row

AAAI Conference 2026 Conference Paper

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

Jiazheng Xu
Yu Huang
Jiale Cheng
Yuanming Yang
Jiajun Xu
Yuan Wang
Wenbo Duan
Shen Yang

Visual generative models have achieved remarkable progress in synthesizing photorealistic images and videos, yet aligning their outputs with human preferences across critical dimensions remains a persistent challenge. Though reinforcement learning from human feedback offers promise for preference alignment, existing reward models for visual generation face limitations, including black-box scoring without interpretability and the unexpected biases this can introduce. We present VisionReward, a general framework for learning human visual preferences in both image and video generation. Specifically, we employ a hierarchical visual assessment framework to capture fine-grained human preferences and leverage linear weighting to enable interpretable preference learning. Furthermore, we propose a multi-dimensional consistent strategy for using VisionReward as a reward model during preference optimization for visual generation. Experiments show that VisionReward significantly outperforms existing image and video reward models on both machine metrics and human evaluation. Notably, VisionReward surpasses VideoScore by 17.2% in preference prediction accuracy, and text-to-video models optimized with VisionReward achieve a 31.6% higher pairwise win rate than the same models optimized with VideoScore.
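The abstract describes interpretable scoring via linear weighting over fine-grained assessment dimensions. The Python sketch below shows why such a score is inspectable: each dimension's contribution is simply weight times probability. The dimension names, the toy data, and the plain logistic-regression fit on pairwise differences are illustrative assumptions, not VisionReward's actual checklist or training recipe.

```python
import math

# Hypothetical assessment dimensions (illustrative only; not VisionReward's real checklist).
DIMENSIONS = ["composition", "color", "text_alignment", "no_artifacts"]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(probs: dict, weights: dict) -> float:
    """Interpretable score: weighted sum of per-dimension 'yes' probabilities."""
    return sum(weights[d] * probs[d] for d in DIMENSIONS)


def contributions(probs: dict, weights: dict) -> dict:
    """Per-dimension contribution to the score, which is what makes it inspectable."""
    return {d: weights[d] * probs[d] for d in DIMENSIONS}


def fit_weights(pairs, lr: float = 0.5, epochs: int = 200) -> dict:
    """Logistic regression on (preferred, rejected) pairs:
    P(preferred wins) = sigmoid(w . (p_preferred - p_rejected))."""
    w = {d: 0.0 for d in DIMENSIONS}
    for _ in range(epochs):
        for win, lose in pairs:
            diff = {d: win[d] - lose[d] for d in DIMENSIONS}
            p = sigmoid(sum(w[d] * diff[d] for d in DIMENSIONS))
            for d in DIMENSIONS:
                # Gradient ascent on the log-likelihood: d/dw log sigmoid(w.diff) = (1 - p) * diff
                w[d] += lr * (1.0 - p) * diff[d]
    return w


# Toy data: the preferred sample is judged more text-aligned in both pairs.
pairs = [
    ({"composition": 0.6, "color": 0.5, "text_alignment": 0.9, "no_artifacts": 0.7},
     {"composition": 0.6, "color": 0.5, "text_alignment": 0.2, "no_artifacts": 0.7}),
    ({"composition": 0.4, "color": 0.8, "text_alignment": 0.8, "no_artifacts": 0.5},
     {"composition": 0.5, "color": 0.7, "text_alignment": 0.3, "no_artifacts": 0.5}),
]
weights = fit_weights(pairs)
print(contributions(pairs[0][0], weights))
```

Because the score is linear, a surprising ranking can be traced to the specific dimension whose weighted contribution drove it.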

YNIMG Journal 2025 Journal Article

A comparative study on the detection and localization of interictal epileptiform discharges in magnetoencephalography using optically pumped magnetometers versus superconducting quantum interference devices

Jiechuan Ren
Ming Ding
Yuming Peng
Chang Sun
Chunqing Yang
Shuxian Zhou
Jiayin Tian
Qun Wang

BACKGROUND: Superconducting quantum interference device (SQUID)-based magnetoencephalography (MEG) holds substantial clinical value in epilepsy examination but is limited by its high cost. Optically pumped magnetometer (OPM)-based MEG appears promising in overcoming this limitation. This study aims to explore the consistency of interictal epileptiform discharge (IED) detection and source localization between OPM-MEG and SQUID-MEG in a large cohort of patients with epilepsy. METHODS: Patients with epilepsy underwent SQUID-MEG and 128-channel whole-scalp OPM-MEG examinations. IED detection, amplitude, signal-to-noise ratio (SNR), sensor-scalp distance, and source localization results were compared between OPM-MEG and SQUID-MEG through statistical analysis. RESULTS: ) of 0.892 suggested good consistency. Among 39 patients with IEDs detected by both systems, OPM-MEG demonstrated a smaller sensor-scalp distance (p < 0.001), higher IED amplitude (p < 0.001), and higher SNR (p = 0.003) compared with SQUID-MEG. At the sublobar level, OPM-MEG and SQUID-MEG exhibited nearly consistent source localization results. Among 24 patients with single dipole clusters, the average centroid distance between the dipole clusters of OPM-MEG and SQUID-MEG was 12.16 ± 5.90 mm. CONCLUSION: This real-world study demonstrated that OPM-MEG has applicability comparable to SQUID-MEG in IED detection and source localization. Additionally, OPM-MEG performed better in terms of IED amplitude and SNR. Lower costs and user-friendly features highlight the clinical potential of OPM-MEG in epilepsy assessments.
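The centroid-distance comparison in the results can be made concrete with a short Python sketch: average the dipole coordinates of each cluster, then take the Euclidean distance between the two centroids and summarise across patients as mean ± SD. The coordinates below are invented placeholders in millimetres; the study's actual head-coordinate frame and clustering procedure are not reproduced here.

```python
import math
import statistics


def centroid(points):
    """Mean (x, y, z) position of a dipole cluster, in mm."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def centroid_distance(cluster_a, cluster_b) -> float:
    """Euclidean distance (mm) between the centroids of two dipole clusters."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))


# Hypothetical per-patient clusters: (OPM-MEG dipoles, SQUID-MEG dipoles).
patients = [
    ([(10.0, 20.0, 30.0), (12.0, 22.0, 30.0)], [(20.0, 25.0, 32.0), (22.0, 27.0, 30.0)]),
    ([(-40.0, 5.0, 50.0), (-38.0, 7.0, 52.0)], [(-45.0, 0.0, 48.0)]),
]
distances = [centroid_distance(opm, squid) for opm, squid in patients]
print(f"{statistics.mean(distances):.2f} ± {statistics.stdev(distances):.2f} mm")
```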

YNIMG Journal 2025 Journal Article

From simulation to clinic: Assessing the required channel count for effective clinical use of OPM-MEG systems

Bing Yan
Yuming Peng
Yixiang Zhang
Yun Zhang
Haonan Zhang
Yifu Cao
Chang Sun
Ming Ding

The channel count in an Optically Pumped Magnetometer Magnetoencephalography (OPM-MEG) system plays a pivotal role in determining its overall performance. While existing research consistently shows that more channels enhance system capabilities, practical constraints such as sensor placement on the head, inter-channel interference, and cost-efficiency limit channel scalability. Additionally, the optimal channel count required for clinical applications of OPM-MEG remains unclear. In this study, we systematically investigate the impact of channel count on OPM-MEG performance by integrating simulations, phantom experiments, and human MEG experiments. Four configurations with varying channel counts (16, 32, 64, and 128) are evaluated. Systems with fewer channels (e.g., 16 channels) face significant challenges in meeting the demands of clinical MEG applications. In contrast, a 64-channel OPM-MEG system achieves performance metrics, such as signal-to-noise ratio (SNR) and localization accuracy, comparable to those of a 306-channel Superconducting Quantum Interference Device MEG (SQUID-MEG) system. Notably, a 128-channel OPM-MEG system outperforms the 306-channel SQUID-MEG system. This work provides a detailed exploration of the relationship between channel count and OPM-MEG system performance, analyzing how many channels an OPM-MEG system needs for clinical applications. By combining simulation-based evaluations with empirical measurements, we found that the number of channels must be carefully selected based on the specific requirements of each clinical application.

NeurIPS Conference 2025 Conference Paper

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

Daoguang Zan
Zhirong Huang
Wei Liu
Hanwu Chen
Shulin Xin
Linhao Zhang
Qi Liu
Li Aoyan

The task of issue resolving aims to modify a codebase to generate a patch that addresses a given issue. However, most existing benchmarks focus almost exclusively on Python, making them insufficient for evaluating Large Language Models (LLMs) across different programming languages. To bridge this gap, we introduce a multilingual issue-resolving benchmark, called Multi-SWE-bench, covering 8 languages: Python, Java, TypeScript, JavaScript, Go, Rust, C, and C++. In particular, this benchmark includes a total of 2,132 high-quality instances, carefully curated by 68 expert annotators to ensure reliable and accurate evaluation of LLMs on the issue-resolving task. Based on human-annotated results, the issues are further classified into three difficulty levels. We evaluate a series of state-of-the-art models on Multi-SWE-bench, using both procedural and agent-based frameworks for issue resolving. Our experiments reveal three key findings: (1) Limited generalization across languages: while existing LLMs perform well on Python issues, their ability to generalize to other languages remains limited; (2) Performance aligned with human-annotated difficulty: LLM-based agents' performance closely aligns with human-assigned difficulty, with resolution rates decreasing as issue complexity rises; and (3) Performance drop on cross-file issues: the performance of current methods deteriorates significantly on cross-file issues. These findings highlight the limitations of current LLMs and underscore the need for more robust models capable of handling a broader range of programming languages and complex issue scenarios.
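Resolution rates broken down by language and by annotated difficulty, as in the findings above, amount to a grouped ratio. A minimal Python sketch, assuming a hypothetical record format of (language, difficulty, resolved) per instance; this is not the benchmark's actual result schema, and the outcomes are made up.

```python
from collections import defaultdict


def resolution_rates(records, key_index):
    """Fraction of resolved instances, grouped by the field at key_index."""
    totals = defaultdict(int)
    resolved = defaultdict(int)
    for record in records:
        key = record[key_index]
        totals[key] += 1
        resolved[key] += int(record[2])  # record[2] is the resolved flag
    return {k: resolved[k] / totals[k] for k in totals}


# Hypothetical outcomes: (language, difficulty, resolved?)
records = [
    ("Python", "easy", True), ("Python", "hard", False),
    ("Rust", "easy", True), ("Rust", "medium", False),
    ("C++", "hard", False), ("Go", "medium", True),
]
by_language = resolution_rates(records, key_index=0)
by_difficulty = resolution_rates(records, key_index=1)
print(by_language)
print(by_difficulty)
```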

YNIMG Journal 2024 Journal Article

A first-in-human application of OPM-MEG for localizing motor activity area: Compared to functional MRI

Tai Sun
Xiaohan Chi
Yuming Peng
Qianhe Zhang
Kang Liu
Yiwen Ma
Ming Ding
Nan Ji

BACKGROUND: Accurately localizing brain motor areas is vital for protecting motor function during neurosurgical procedures. Magnetoencephalography (MEG) based on optically pumped magnetometers (OPM) improves the availability of MEG in clinical applications. The aim of this study is to evaluate the availability, accuracy and precision of OPM-MEG for localizing motor areas in brain tumor patients and healthy individuals. METHODS: Participants were enrolled and underwent primary motor area localization by both 3T fMRI and 128-channel OPM-MEG examinations. The localization accuracy (ability to map onto the anatomical location) and precision (activation signal centralization) were compared between the two methods, and accuracy was further validated by intraoperative direct cortical electrical stimulation (DCS) of the localized area with the assistance of a neuronavigation system. RESULT: A total of 12 participants (7 brain tumor patients and 5 healthy individuals) were enrolled, and all had successful localization of motor areas by both methods. The average OPM-MEG examination time for each limb function was approximately 9 min. The localizations by both methods mainly covered the anatomical location of the primary motor cortex and partially overlapped. The motor activation signal identified by OPM-MEG was more centralized than that identified by fMRI. The centroid of the motor area localized by OPM-MEG deviated from that localized by fMRI, with mean distances of 19.7 mm and 27.48 mm for hand and foot localization, respectively. Furthermore, DCS at the OPM-MEG centroid for hand movement successfully triggered the corresponding hand response. CONCLUSIONS: In this first-in-human study exploring the potential of OPM-MEG in functional localization of motor areas, we revealed its availability and reliability in mapping motor areas, demonstrating it as a promising tool in assisting neurosurgical practice and neuroscience research.

NeurIPS Conference 2024 Conference Paper

CogVLM: Visual Expert for Pretrained Language Models

Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
Yan Wang
Junhui Ji
Zhuoyi Yang

We introduce CogVLM, a powerful open-source visual language foundation model. Unlike the popular shallow alignment method, which maps image features into the input space of the language model, CogVLM bridges the gap between the frozen pretrained language model and the image encoder with a trainable visual expert module in the attention and FFN layers. As a result, CogVLM enables deep fusion of vision and language features without sacrificing any performance on NLP tasks. CogVLM-17B achieves state-of-the-art performance on 17 classic cross-modal benchmarks, including 1) image captioning datasets: NoCaps, Flickr30k; 2) VQA datasets: OKVQA, TextVQA, OCRVQA, ScienceQA; 3) LVLM benchmarks: MM-Vet, MMBench, SEED-Bench, LLaVABench, POPE, MMMU, MathVista; and 4) visual grounding datasets: RefCOCO, RefCOCO+, RefCOCOg, Visual7W. Code and checkpoints are available on GitHub.
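The visual expert idea (separate trainable projections for image tokens inside each attention layer, while text tokens keep the frozen language-model weights, and all tokens still attend to each other) can be sketched in NumPy. This is a simplified illustration under stated assumptions: a single head, fused QKV weights, a causal mask over the whole sequence, and no FFN expert. The function names and shapes are hypothetical, not taken from the CogVLM code.

```python
import numpy as np


def expert_qkv(hidden, is_image, w_text, w_image):
    """Route each token through its own QKV projection.

    hidden: (seq, d); is_image: (seq,) bool; w_text, w_image: (d, 3 * d).
    Text tokens use the (frozen) language-model weights, image tokens the visual expert.
    """
    qkv = np.where(is_image[:, None], hidden @ w_image, hidden @ w_text)
    return np.split(qkv, 3, axis=-1)


def causal_attention(q, k, v):
    """Single-head scaled dot-product attention shared by image and text tokens."""
    seq, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)  # mask positions after each token
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
d = 8
hidden = rng.normal(size=(5, d))
is_image = np.array([True, True, True, False, False])  # image tokens first, then text
w_text = rng.normal(size=(d, 3 * d))
w_image = rng.normal(size=(d, 3 * d))
q, k, v = expert_qkv(hidden, is_image, w_text, w_image)
out = causal_attention(q, k, v)
print(out.shape)
```

Because only the expert weights touch image tokens, the text pathway is unchanged, which is consistent with the abstract's claim of no loss on NLP tasks.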

IJCAI Conference 2024 Conference Paper

When Fairness Meets Privacy: Exploring Privacy Threats in Fair Binary Classifiers via Membership Inference Attacks

Huan Tian
Guangsheng Zhang
Bo Liu
Tianqing Zhu
Ming Ding
Wanlei Zhou

While in-processing fairness approaches show promise in mitigating biased predictions, their potential impact on privacy leakage remains under-explored. We aim to address this gap by assessing the privacy risks of fairness-enhanced binary classifiers with membership inference attacks (MIAs). Surprisingly, our results reveal that these fairness interventions exhibit increased resilience against existing attacks, indicating that enhancing fairness does not necessarily lead to privacy compromises. However, we find that current attack methods are ineffective because they typically degrade into simple threshold models with limited attack effectiveness. Following this observation, we discover a novel threat dubbed Fairness Discrepancy Membership Inference Attacks (FD-MIA) that exploits prediction discrepancies between fair and biased models. This attack reveals more potent vulnerabilities and poses significant risks to model privacy. Extensive experiments across multiple datasets, attack methods, and representative fairness approaches confirm our findings and demonstrate the efficacy of the proposed attack method. Our study exposes overlooked privacy threats in fairness studies, advocating thorough evaluations of potential security vulnerabilities before model deployment.
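The attack exploits prediction discrepancies between a fair model and a biased model. The Python sketch below illustrates only the general shape of such a signal: a per-sample discrepancy score, plus a threshold and direction chosen on labelled shadow data. The confidence values are made up, and a single threshold is used only to keep the sketch short; the paper's FD-MIA construction may use a different attack model.

```python
def discrepancy_scores(p_fair, p_biased):
    """Absolute gap between the two models' confidence on the true label, per sample."""
    return [abs(a - b) for a, b in zip(p_fair, p_biased)]


def fit_threshold(scores, is_member):
    """Pick the threshold and direction that maximise accuracy on shadow data."""
    best = (0.0, 0.0, True)  # (accuracy, threshold, predict_member_if_above)
    for t in sorted(set(scores)):
        for above in (True, False):
            preds = [(s >= t) if above else (s < t) for s in scores]
            acc = sum(p == m for p, m in zip(preds, is_member)) / len(scores)
            if acc > best[0]:
                best = (acc, t, above)
    return best


# Hypothetical shadow-model confidences on the true label.
p_fair = [0.70, 0.65, 0.80, 0.60, 0.75, 0.72]
p_biased = [0.95, 0.90, 0.98, 0.62, 0.78, 0.70]
is_member = [True, True, True, False, False, False]
scores = discrepancy_scores(p_fair, p_biased)
accuracy, threshold, above = fit_threshold(scores, is_member)
print(accuracy, threshold, above)
```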

NeurIPS Conference 2023 Conference Paper

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation

Jiazheng Xu
Xiao Liu
Yuchen Wu
Yuxuan Tong
Qinkai Li
Ming Ding
Jie Tang
Yuxiao Dong

We present a comprehensive solution to learn and improve text-to-image models from human preference feedback. To begin with, we build ImageReward, the first general-purpose text-to-image human preference reward model, to effectively encode human preferences. Its training is based on our systematic annotation pipeline, including rating and ranking, which has collected 137k expert comparisons to date. In human evaluation, ImageReward outperforms existing scoring models and metrics, making it a promising automatic metric for evaluating text-to-image synthesis. On top of it, we propose Reward Feedback Learning (ReFL), a direct tuning algorithm to optimize diffusion models against a scorer. Both automatic and human evaluation support ReFL's advantages over compared methods. All code and datasets are provided at https://github.com/THUDM/ImageReward.
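Reward models trained from rankings are commonly fitted with a pairwise loss: every pair drawn from a ranked list of images for the same prompt contributes -log sigmoid(r_better - r_worse). The Python sketch below computes that loss from reward scores sorted best to worst; treat it as an illustration of the general pairwise-ranking objective rather than ImageReward's exact training code.

```python
import itertools
import math


def pairwise_ranking_loss(scores_best_to_worst):
    """Mean of -log sigmoid(r_better - r_worse) over all ranked pairs.

    -log sigmoid(z) is computed as log1p(exp(-z)).
    """
    pairs = list(itertools.combinations(scores_best_to_worst, 2))
    return sum(math.log1p(math.exp(-(better - worse))) for better, worse in pairs) / len(pairs)


consistent = pairwise_ranking_loss([2.0, 1.0, -0.5])  # model agrees with annotators
inverted = pairwise_ranking_loss([-0.5, 1.0, 2.0])    # model disagrees
print(consistent < inverted)  # prints True
```

A ranked list of k images yields k(k-1)/2 pairs, so one ranking annotation supplies several training comparisons.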

NeurIPS Conference 2022 Conference Paper

CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers

Ming Ding
Wendi Zheng
Wenyi Hong
Jie Tang

Development of transformer-based text-to-image models is impeded by slow generation and high complexity for high-resolution images. In this work, we put forward a solution based on hierarchical transformers and local parallel autoregressive generation. We pretrain a 6B-parameter transformer with a simple and flexible self-supervised task, a cross-modal general language model (CogLM), and fine-tune it for fast super-resolution. The new text-to-image system, CogView2, shows very competitive generation compared with the concurrent state-of-the-art DALL-E-2 and naturally supports interactive text-guided editing on images.

YNIMG Journal 2022 Journal Article

Decoding transcriptional signatures of the association between free water and macroscale organizations in healthy adolescents

Lei Wei
Ming Ding
Yuwen Zhang
He Wang

We leveraged a novel diffusion MRI index to investigate the relationships among cortical free water, macro-organizations and gene expression in healthy adults. Little research has investigated the role of free water in healthy adults, because free water is also easily affected by aging and disease. High-quality data from 350 subjects of the Human Connectome Project were used in our study. Cortical free water was estimated using a bi-tensor model. Free water was high in the limbic, insular and somatosensory cortices and lower in the motor and association cortices. A negative correlation between free water and cortical thickness was consistently identified in almost all cortical regions. A negative correlation between cortical free water and structural covariance (rho = -0.38, p_spin = 0.005) revealed that free water was sensitive to cortical heterogeneity. Using a human gene expression dataset, we found that the gene expression pattern of the relationship between free water and cortical thickness was spatially coupled with the primary gradient of the structural covariance network (rho = 0.40, p_spin = 0.004). Our findings indicate that free water is sensitive to cortical cellular status. The relationship between free water and macroscale organization also reflected the hierarchical structure of the cerebral cortex.
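The bi-tensor free-water model writes the diffusion signal as a mixture of a tissue compartment and an isotropic free-water compartment with fixed diffusivity (about 3.0e-3 mm^2/s for water at body temperature). The NumPy sketch below simplifies the tissue tensor to a single apparent diffusivity and fits the free-water fraction f by grid search on a synthetic, noiseless multi-shell voxel. The b-values, grid ranges, and scalar simplification are assumptions, not the study's pipeline.

```python
import numpy as np

D_FREE = 3.0e-3  # mm^2/s, isotropic free-water diffusivity (held fixed)


def signal(b, f, d_tissue, s0=1.0):
    """Two-compartment signal: (1 - f) * tissue decay + f * free-water decay."""
    return s0 * ((1.0 - f) * np.exp(-b * d_tissue) + f * np.exp(-b * D_FREE))


def fit_free_water(b, measured):
    """Grid-search least-squares fit of (f, d_tissue), with S0 normalised to 1."""
    best = (np.inf, None, None)
    for f in np.linspace(0.0, 1.0, 101):
        for d_t in np.linspace(0.1e-3, 1.5e-3, 141):
            err = np.sum((signal(b, f, d_t) - measured) ** 2)
            if err < best[0]:
                best = (err, f, d_t)
    return best[1], best[2]


b_values = np.array([0.0, 1000.0, 2000.0, 3000.0])  # s/mm^2, hypothetical shells
measured = signal(b_values, f=0.3, d_tissue=0.7e-3)  # synthetic voxel
f_hat, d_hat = fit_free_water(b_values, measured)
print(round(f_hat, 2), d_hat)
```

Multiple non-zero b-values are what make f and the tissue diffusivity separable; with a single shell the two would trade off against each other.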

IJCAI Conference 2022 Conference Paper

Rethinking the Setting of Semi-supervised Learning on Graphs

Ziang Li
Ming Ding
Weikai Li
Zihan Wang
Ziyu Zeng
Yukuo Cen
Jie Tang

We argue that the present setting of semi-supervised learning on graphs may result in unfair comparisons, due to the potential risk of over-tuning models' hyper-parameters. In this paper, we highlight the significant influence of hyper-parameter tuning, which leverages the label information in the validation set to improve performance. To explore the limit of over-tuning hyper-parameters, we propose ValidUtil, an approach that fully utilizes the label information in the validation set through an extra group of hyper-parameters. With ValidUtil, even GCN can easily reach an accuracy of 85.8% on Cora. To avoid over-tuning, we merge the training set and the validation set and construct an i.i.d. graph benchmark (IGB) consisting of 4 datasets. Each dataset contains 100 i.i.d. graphs sampled from a large graph to reduce evaluation variance. Our experiments suggest that IGB is a more stable benchmark than previous datasets for semi-supervised learning on graphs. Our code and data are released at https://github.com/THUDM/IGB/.

NeurIPS Conference 2021 Conference Paper

Adaptive Diffusion in Graph Neural Networks

Jialin Zhao
Yuxiao Dong
Ming Ding
Evgeny Kharlamov
Jie Tang

The success of graph neural networks (GNNs) largely relies on the process of aggregating information from neighbors defined by the input graph structures. Notably, message-passing GNNs, e.g., graph convolutional networks, leverage the immediate neighbors of each node during aggregation, and graph diffusion convolution (GDC) was recently proposed to expand the propagation neighborhood by leveraging generalized graph diffusion. However, the neighborhood size in GDC is manually tuned for each graph by grid search over the validation set, which limits its practical generalization. To address this issue, we propose the adaptive diffusion convolution (ADC) strategy to automatically learn the optimal neighborhood size from the data. Furthermore, we break the conventional assumption that all GNN layers and feature channels (dimensions) should use the same neighborhood for propagation. We design strategies that enable ADC to learn a dedicated propagation neighborhood for each GNN layer and each feature channel, making the GNN architecture fully coupled with graph structures, the unique property that distinguishes GNNs from traditional neural networks. By directly plugging ADC into existing GNNs, we observe consistent and significant improvements over both GDC and their vanilla versions across various datasets, demonstrating the improved model capacity brought by automatically learning a unique neighborhood size per layer and per channel in GNNs.
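GDC's heat-kernel diffusion can be written as a truncated series, sum over k of exp(-t) * t^k / k! * T^k x, where T is a normalized transition matrix and the diffusion time t controls the neighborhood size. ADC's idea is to learn t instead of grid-searching it, optionally with one t per channel. The NumPy sketch below evaluates that series with a per-channel t vector; in a real model t would be a trainable parameter, whereas here it is fixed. The symmetric normalization with self-loops and the truncation depth K are assumptions.

```python
from math import factorial

import numpy as np


def normalized_transition(adj):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = adj + np.eye(adj.shape[0])
    inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * inv_sqrt[:, None] * inv_sqrt[None, :]


def heat_diffusion(T, x, t, K=10):
    """Truncated heat-kernel propagation with one diffusion time per feature channel.

    T: (n, n) transition matrix; x: (n, c) node features; t: (c,) diffusion times.
    Returns sum_{k=0..K} exp(-t) * t**k / k! * (T^k x).
    """
    out = np.zeros_like(x)
    tk_x = x.copy()  # holds T^k x, starting with k = 0
    for k in range(K + 1):
        out += np.exp(-t) * t**k / factorial(k) * tk_x  # (c,) broadcasts over nodes
        tk_x = T @ tk_x
    return out


# Path graph 0 - 1 - 2 with a one-hot signal on node 0 in both channels.
adj = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
x = np.array([[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
t = np.array([0.5, 3.0])  # small vs large neighborhood, one per channel
print(heat_diffusion(normalized_transition(adj), x, t))
```

The channel with the larger t pushes more of the signal to the two-hop node, which is the per-channel neighborhood control the abstract describes.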

NeurIPS Conference 2021 Conference Paper

CogView: Mastering Text-to-Image Generation via Transformers

Ming Ding
Zhuoyi Yang
Wenyi Hong
Wendi Zheng
Chang Zhou
Da Yin
Junyang Lin
Xu Zou

Text-to-image generation in the general domain has long been an open problem, requiring both a powerful generative model and cross-modal understanding. We propose CogView, a 4-billion-parameter Transformer with a VQ-VAE tokenizer, to advance this problem. We also demonstrate finetuning strategies for various downstream tasks, e.g., style learning, super-resolution, text-image ranking, and fashion design, and methods to stabilize pretraining, e.g., eliminating NaN losses. CogView achieves state-of-the-art FID on the blurred MS COCO dataset, outperforming previous GAN-based models and DALL-E, a recent similar work.

NeurIPS Conference 2021 Conference Paper

UFC-BERT: Unifying Multi-Modal Controls for Conditional Image Synthesis

Zhu Zhang
Jianxin Ma
Chang Zhou
Rui Men
Zhikang Li
Ming Ding
Jie Tang
Jingren Zhou

Conditional image synthesis aims to create an image according to some multi-modal guidance in the forms of textual descriptions, reference images, and image blocks to preserve, as well as their combinations. In this paper, instead of investigating these control signals separately, we propose a new two-stage architecture, UFC-BERT, to unify any number of multi-modal controls. In UFC-BERT, both the diverse control signals and the synthesized image are uniformly represented as a sequence of discrete tokens to be processed by Transformer. Different from existing two-stage autoregressive approaches such as DALL-E and VQGAN, UFC-BERT adopts non-autoregressive generation (NAR) at the second stage to enhance the holistic consistency of the synthesized image, to support preserving specified image blocks, and to improve the synthesis speed. Further, we design a progressive algorithm that iteratively improves the non-autoregressively generated image, with the help of two estimators developed for evaluating the compliance with the controls and evaluating the fidelity of the synthesized image, respectively. Extensive experiments on a newly collected large-scale clothing dataset M2C-Fashion and a facial dataset Multi-Modal CelebA-HQ verify that UFC-BERT can synthesize high-fidelity images that comply with flexible multi-modal controls.

NeurIPS Conference 2020 Conference Paper

CogLTX: Applying BERT to Long Texts

Ming Ding
Chang Zhou
Hongxia Yang
Jie Tang

BERTs are incapable of processing long texts due to their quadratically increasing memory and time consumption. Straightforward approaches to this problem, such as slicing the text with a sliding window or simplifying transformers, suffer from insufficient long-range attention or require customized CUDA kernels. The limited text length of BERT reminds us of the limited capacity (5∼9 chunks) of human working memory; so how do human beings Cognize Long TeXts? Founded on the cognitive theory stemming from Baddeley, our CogLTX framework identifies key sentences by training a judge model, concatenates them for reasoning, and enables multi-step reasoning via rehearsal and decay. Since relevance annotations are usually unavailable, we propose to use treatment experiments to create supervision. As a general algorithm, CogLTX outperforms or achieves results comparable to SOTA models on NewsQA, HotpotQA, and multi-class and multi-label long-text classification tasks, with memory overheads independent of the text length.
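CogLTX's key step is choosing which text blocks fit into BERT's limited window. The Python sketch below shows one simple version of that selection, assuming hypothetical judge-model relevance scores are already available: take blocks greedily by score until a token budget is filled, then restore the original order so the reasoner sees text in document order. The budget, block format, and greedy rule are illustrative assumptions; the multi-step rehearsal and decay procedure is not modelled.

```python
def select_key_blocks(blocks, scores, budget):
    """Greedy selection of the highest-scoring blocks within a token budget.

    blocks: list of token lists; scores: judge relevance per block (hypothetical).
    Returns the chosen blocks concatenated in their original document order.
    """
    order = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        if used + len(blocks[i]) <= budget:  # skip blocks that would overflow the window
            chosen.append(i)
            used += len(blocks[i])
    return [tok for i in sorted(chosen) for tok in blocks[i]]


blocks = [["intro", "text"], ["key", "fact", "one"], ["filler"], ["key", "fact", "two"]]
scores = [0.1, 0.9, 0.05, 0.8]
print(select_key_blocks(blocks, scores, budget=6))
# prints ['key', 'fact', 'one', 'key', 'fact', 'two']
```

Memory use is bounded by the budget rather than by the document length, mirroring the abstract's point that overhead is independent of text length.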

IJCAI Conference 2019 Conference Paper

ProNE: Fast and Scalable Network Representation Learning

Jie Zhang
Yuxiao Dong
Yan Wang
Jie Tang
Ming Ding

Recent advances in network embedding have revolutionized the field of graph and network mining. However, (pre-)training embeddings for very large-scale networks is computationally challenging for most existing methods. In this work, we present ProNE, a fast, scalable, and effective model whose single-thread version is 10-400x faster than efficient network embedding baselines run with 20 threads, including LINE, DeepWalk, node2vec, GraRep, and HOPE. As a concrete example, single-thread ProNE requires only 29 hours to embed a network of hundreds of millions of nodes, while LINE takes weeks and DeepWalk months using 20 threads. To achieve this, ProNE first initializes network embeddings efficiently by formulating the task as sparse matrix factorization. The second step of ProNE enhances the embeddings by propagating them in the spectrally modulated space. Extensive experiments on networks of various scales and types demonstrate that ProNE achieves both effectiveness and significant efficiency superiority compared with the aforementioned baselines. In addition, ProNE's embedding enhancement step can also be generalized to improve other models at speed, e.g., offering >10% relative gains for the baselines used.

JBHI Journal 2015 Journal Article

A 0.33 nJ/bit IEEE802.15.6/Proprietary MICS/ISM Wireless Transceiver With Scalable Data Rate for Medical Implantable Applications

Ao Ba
Maja Vidojkovic
Kouichi Kanda
Nauman F. Kiyani
Maarten Lont
Xiongchuan Huang
Xiaoyan Wang
Cui Zhou

This paper presents an ultra-low power wireless transceiver specialized for, but not limited to, medical implantable applications. It operates in the 402-405-MHz medical implant communication service band and also supports the 420-450-MHz industrial, scientific, and medical band. Compliant with the IEEE 802.15.6 standard and offering additional proprietary modes, this highly configurable transceiver achieves data rates from 11 kb/s to 4.5 Mb/s, which covers the requirements of conventional implantable applications. A phase-locked loop-based transmitter architecture is adopted to support various modulation schemes within a limited power budget. The zero-IF receiver has programmable gain and bandwidth to accommodate different operation modes. Fabricated in 40-nm CMOS technology with a 1-V supply, this transceiver consumes only 1.78 mW for transmission and 1.49 mW for reception. The ultra-low power consumption, together with IEEE 802.15.6-compliant performance in terms of modulation accuracy, sensitivity, and interference robustness, makes this transceiver suitable for various implantable applications.
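The headline energy efficiency follows from power divided by data rate. A quick Python check using the abstract's figures, assuming the 0.33 nJ/bit in the title refers to reception at the maximum 4.5 Mb/s rate (the abstract does not state which operating point the title figure uses):

```python
def energy_per_bit_nj(power_mw: float, rate_mbps: float) -> float:
    """Energy per bit in nJ: (mW * 1e-3 W) / (Mb/s * 1e6 b/s), scaled by 1e9 to nJ."""
    return (power_mw * 1e-3) / (rate_mbps * 1e6) * 1e9


rx = energy_per_bit_nj(1.49, 4.5)  # reception at the maximum data rate
tx = energy_per_bit_nj(1.78, 4.5)  # transmission at the maximum data rate
print(f"RX {rx:.2f} nJ/bit, TX {tx:.2f} nJ/bit")
# prints RX 0.33 nJ/bit, TX 0.40 nJ/bit
```

Under that assumption the receive-side figure matches the title; at lower data rates the energy per bit would be higher for the same power draw.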
