Arrow Research search

Author name cluster

Ge Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

13

NeurIPS Conference 2025 Conference Paper

Dual-Res Tandem Mamba-3D: Bilateral Breast Lesion Detection and Classification on Non-contrast Chest CT

  • Jiaheng Zhou
  • Wei Fang
  • Luyuan Xie
  • Yanfeng Zhou
  • Lianyan Xu
  • Minfeng Xu
  • Ge Yang
  • Yuxing Tang

Breast cancer remains a leading cause of death among women, with early detection significantly improving prognosis. Non-contrast computed tomography (NCCT) scans of the chest, routinely acquired for thoracic assessments, often capture the breast region incidentally, presenting an underexplored opportunity for opportunistic breast lesion detection without additional imaging cost or radiation. However, the subtle appearance of lesions in NCCT and the difficulty of jointly modeling lesion detection and malignancy classification pose unique challenges. In this work, we propose Dual-Res Tandem Mamba-3D (DRT-M3D), a novel multitask framework for opportunistic breast cancer analysis on NCCT scans. DRT-M3D introduces a dual-resolution architecture, which captures fine-grained spatial details for segmentation-based lesion detection and global contextual features for breast-level cancer classification. It further incorporates a tandem input mechanism that models bilateral breast regions jointly through Mamba-3D blocks, enabling cross-breast feature interaction by leveraging subtle asymmetries between the two sides. Our approach achieves state-of-the-art performance in both tasks across multi-institutional NCCT datasets spanning four medical centers. Extensive experiments and ablation studies validate the effectiveness of each key component.

IROS Conference 2025 Conference Paper

Learning Generalizable Feature Fields for Mobile Manipulation

  • Ri-Zhao Qiu
  • Yafei Hu
  • Yuchen Song
  • Ge Yang
  • Yang Fu
  • Jianglong Ye
  • Jiteng Mu
  • Ruihan Yang

An open problem in mobile manipulation is how to represent objects and scenes in a unified manner so that robots can use both for navigation and manipulation. The latter requires capturing intricate geometry while understanding fine-grained semantics, whereas the former involves capturing the complexity inherent at an expansive physical scale. In this work, we present GeFF (Generalizable Feature Fields), a scene-level generalizable neural feature field that acts as a unified representation for both navigation and manipulation and runs in real time. To do so, we treat generative novel view synthesis as a pre-training task, and then align the resulting rich scene priors with natural language via CLIP feature distillation. We demonstrate the effectiveness of this approach by deploying GeFF on a quadrupedal robot equipped with a manipulator. We quantitatively evaluate GeFF’s ability for open-vocabulary object-/part-level manipulation and show that GeFF outperforms point-based baselines in runtime and storage-accuracy trade-offs, with qualitative examples of semantics-aware navigation and articulated object manipulation.

ICRA Conference 2025 Conference Paper

Mobile-TeleVision: Predictive Motion Priors for Humanoid Whole-Body Control

  • Chenhao Lu
  • Xuxin Cheng
  • Jialong Li 0003
  • Shiqi Yang
  • Mazeyu Ji
  • Chengjing Yuan
  • Ge Yang
  • Sha Yi

Humanoid robots require both robust lower-body locomotion and precise upper-body manipulation. While recent Reinforcement Learning (RL) approaches provide whole-body loco-manipulation policies, they lack precise manipulation with high-DoF arms. In this paper, we propose decoupling upper-body control from locomotion, using inverse kinematics (IK) and motion retargeting for precise manipulation, while RL focuses on robust lower-body locomotion. We introduce PMP (Predictive Motion Priors), trained with a Conditional Variational Autoencoder (CVAE) to effectively represent upper-body motions. The locomotion policy is trained conditioned on this upper-body motion representation, ensuring that the system remains robust during both manipulation and locomotion. We show that CVAE features are crucial for stability and robustness, and significantly outperform RL-based whole-body control in precise manipulation. With precise upper-body motion and robust lower-body locomotion control, operators can remotely control the humanoid to walk around and explore different environments while performing diverse manipulation tasks.

ICRA Conference 2025 Conference Paper

WildLMa: Long Horizon Loco-Manipulation in the Wild

  • Ri-Zhao Qiu
  • Yuchen Song
  • Xuanbin Peng
  • Sai Aneesh Suryadevara
  • Ge Yang
  • Minghuan Liu
  • Mazeyu Ji
  • Chengzhe Jia

‘In-the-wild’ mobile manipulation aims to deploy robots in diverse real-world environments, which requires the robot to (1) have skills that generalize across object configurations; (2) be capable of long-horizon task execution in diverse environments; and (3) perform complex manipulation beyond pick-and-place. Quadruped robots with manipulators hold promise for extending the workspace and enabling robust locomotion, but existing results do not investigate such a capability. This paper proposes WildLMa, with three components to address these issues: (1) adaptation of a learned low-level controller for VR-enabled whole-body teleoperation and traversability; (2) WildLMa-Skill, a library of generalizable visuomotor skills acquired via imitation learning or heuristics; and (3) WildLMa-Planner, an interface of learned skills that allows LLM planners to coordinate skills for long-horizon tasks. We demonstrate the importance of high-quality training data by achieving a higher grasping success rate than existing RL baselines using only tens of demonstrations. WildLMa exploits CLIP for language-conditioned imitation learning that empirically generalizes to objects unseen in training demonstrations. Besides extensive quantitative evaluation, we qualitatively demonstrate practical robot applications, such as cleaning up trash in university hallways or on outdoor terrains, operating articulated objects, and rearranging items on a bookshelf.

ICML Conference 2024 Conference Paper

Compressing Large Language Models by Joint Sparsification and Quantization

  • Jinyang Guo
  • Jianyu Wu
  • Zining Wang
  • Jiaheng Liu
  • Ge Yang
  • Yifu Ding
  • Ruihao Gong
  • Haotong Qin

In this paper, we introduce a novel model compression technique named Joint Sparsification and Quantization (JSQ), explicitly tailored for large language models (LLMs). Traditional methods employ either sparsification or quantization individually to compress LLMs, leading to performance degradation at high compression ratios. In contrast, our JSQ approach integrates sparsification and quantization cohesively. As sparsification tends to preserve outliers that are harmful to quantization, we introduce a novel sparsity metric that serves as a bridge between sparsification and quantization. Moreover, outliers in LLMs are known to be important for performance yet harmful to compression, and current solutions are tightly coupled with the quantization process, which does not help sparsification. To this end, we also introduce a search-based activation editor to automatically eliminate relatively useless outliers. Comprehensive experiments across various datasets and architectures affirm the efficacy of our JSQ framework. Notably, JSQ achieves a 7.96$\times$ computation reduction without crashing for the representative model LLaMA. This accomplishment stands in stark contrast to the limitations of most state-of-the-art LLM compression methods, which typically fail under such extreme compression ratios. Our code is released at https://github.com/uanu2002/JSQ.
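The prune-then-quantize interplay the abstract describes can be illustrated with a generic sketch: magnitude sparsification followed by symmetric uniform quantization. This is not JSQ's actual sparsity metric or activation editor; the function names and numbers here are hypothetical, and the outlier comment only illustrates why the two stages interact.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))  # toy weight matrix

def sparsify(w, ratio):
    """Zero out the `ratio` fraction of weights with smallest magnitude."""
    k = int(w.size * ratio)
    thresh = np.sort(np.abs(w).ravel())[k]
    return np.where(np.abs(w) < thresh, 0.0, w)

def quantize(w, bits):
    """Symmetric uniform quantization. A single preserved outlier stretches
    `scale` and wastes quantization levels on the remaining weights, which is
    the pruning/quantization conflict joint methods try to resolve."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

W_c = quantize(sparsify(W, ratio=0.5), bits=4)
sparsity = float((W_c == 0).mean())       # at least the pruning ratio
levels = len(np.unique(W_c))              # at most 2**bits distinct values
```

Applying the two stages in sequence, as above, is what a decoupled pipeline does; JSQ's contribution is choosing what to prune with quantization in mind.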

NeurIPS Conference 2024 Conference Paper

LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment

  • Ge Yang
  • Changyi He
  • Jinyang Guo
  • Jianyu Wu
  • Yifu Ding
  • Aishan Liu
  • Haotong Qin
  • Pengliang Ji

Although large language models (LLMs) have demonstrated strong capabilities, their high demand for computation and storage hinders practical application. To this end, many model compression techniques have been proposed to increase the efficiency of LLMs. However, current research validates these methods only on limited models, datasets, and metrics, and still lacks a comprehensive evaluation under more general scenarios, so it remains unclear which model compression approach to use in a specific case. To mitigate this gap, we present the Large Language Model Compression Benchmark (LLMCBench), a rigorously designed benchmark with an in-depth analysis of LLM compression algorithms. We first analyze actual model production requirements and carefully design evaluation tracks and metrics. Then, we conduct extensive experiments and comparisons using multiple mainstream LLM compression approaches. Finally, we perform an in-depth analysis based on the evaluation and provide useful insights for LLM compression design. We hope LLMCBench can contribute insightful suggestions for LLM compression algorithm design and serve as a foundation for future research.

NeurIPS Conference 2023 Conference Paper

Compositional Sculpting of Iterative Generative Processes

  • Timur Garipov
  • Sebastiaan De Peuter
  • Ge Yang
  • Vikas Garg
  • Samuel Kaski
  • Tommi Jaakkola

High training costs of generative models and the need to fine-tune them for specific tasks have created a strong interest in model reuse and composition. A key challenge in composing iterative generative processes, such as GFlowNets and diffusion models, is that to realize the desired target distribution, all steps of the generative process need to be coordinated, and satisfy delicate balance conditions. In this work, we propose Compositional Sculpting: a general approach for defining compositions of iterative generative processes. We then introduce a method for sampling from these compositions built on classifier guidance. We showcase ways to accomplish compositional sculpting in both GFlowNets and diffusion models. We highlight two binary operations between pairs, the harmonic mean ($p_1 \otimes p_2$) and the contrast ($p_1 \,◑\, p_2$), and the generalization of these operations to multiple component distributions. We offer empirical results on image and molecular generation tasks. Project codebase: https://github.com/timgaripov/compositional-sculpting.
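The harmonic-mean operation can be made concrete on discrete distributions. Assuming it takes the form $p_1 \otimes p_2 \propto p_1 p_2 / (p_1 + p_2)$ (the pointwise harmonic mean up to normalization; a reading of the abstract, not code from the paper), a small sketch shows that it concentrates mass where both components agree:

```python
import numpy as np

p1 = np.array([0.7, 0.2, 0.1])  # favors the first outcome
p2 = np.array([0.1, 0.2, 0.7])  # favors the last outcome

def harmonic_mean(p, q):
    # Pointwise harmonic mean, renormalized: it is large only where BOTH
    # p and q place non-negligible mass, unlike the simple mixture (p+q)/2.
    m = p * q / (p + q)
    return m / m.sum()

hm = harmonic_mean(p1, p2)
```

Here the middle outcome, the only one both distributions support moderately, receives the largest share of mass, even though it is the mode of neither component.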

AAAI Conference 2022 Conference Paper

Deep Neural Networks Learn Meta-Structures from Noisy Labels in Semantic Segmentation

  • Yaoru Luo
  • Guole Liu
  • Yuanhao Guo
  • Ge Yang

How deep neural networks (DNNs) learn from noisy labels has been studied extensively in image classification but much less in image segmentation. So far, our understanding of the learning behavior of DNNs trained with noisy segmentation labels remains limited. In this study, we address this deficiency in both binary segmentation of biological microscopy images and multi-class segmentation of natural images. We generate extremely noisy labels by randomly sampling a small fraction (e.g., 10%) or flipping a large fraction (e.g., 90%) of the ground truth labels. When trained with these noisy labels, DNNs provide largely the same segmentation performance as when trained with the original ground truth. This indicates that in their supervised training for semantic segmentation, DNNs learn structures hidden in labels rather than pixel-level labels per se. We refer to these hidden structures in labels as meta-structures. When DNNs are trained with labels carrying different perturbations to the meta-structure, we find consistent degradation in their segmentation performance. In contrast, incorporation of meta-structure information substantially improves the performance of an unsupervised model developed for binary semantic segmentation. We define meta-structures mathematically as spatial density distributions and show both theoretically and experimentally how this formulation explains key observed learning behaviors of DNNs.
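One plausible reading of the sampling-based corruption (keeping only a random fraction of ground-truth foreground pixels) can be sketched as follows; the array sizes, sampling scheme, and function name are illustrative assumptions, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy binary ground-truth mask: ~30% of pixels are foreground.
gt = (rng.random((64, 64)) < 0.3).astype(int)

def sample_labels(mask, keep=0.1, rng=rng):
    """Keep a random `keep` fraction of foreground pixels; zero the rest.

    The result is an extremely sparse label map whose surviving pixels still
    trace the spatial density of the original structures.
    """
    fg = np.argwhere(mask == 1)
    kept = fg[rng.random(len(fg)) < keep]
    noisy = np.zeros_like(mask)
    noisy[kept[:, 0], kept[:, 1]] = 1
    return noisy

noisy = sample_labels(gt, keep=0.1)
```

The surviving 10% of labels are individually wrong almost everywhere yet preserve the spatial density distribution, which is the property the abstract identifies as the meta-structure.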

NeurIPS Conference 2021 Conference Paper

Few-Shot Learning Evaluation in Natural Language Understanding

  • Subhabrata Mukherjee
  • Xiaodong Liu
  • Guoqing Zheng
  • Saghar Hosseini
  • Hao Cheng
  • Ge Yang
  • Christopher Meek
  • Ahmed Awadallah

Most recent progress in natural language understanding (NLU) has been driven, in part, by benchmarks such as GLUE, SuperGLUE, SQuAD, etc. In fact, many NLU models have now matched or exceeded "human-level" performance on many tasks in these benchmarks. Most of these benchmarks, however, give models access to relatively large amounts of labeled data for training. As such, the models are provided far more data than required by humans to achieve strong performance. That has motivated a line of work that focuses on improving few-shot learning performance of NLU models. However, there is a lack of standardized evaluation benchmarks for few-shot NLU, resulting in different experimental settings in different papers. To help accelerate this line of work, we introduce CLUES, a benchmark for evaluating the few-shot learning capabilities of NLU models. We demonstrate that while recent models reach human performance when they have access to large amounts of labeled data, there is a huge gap in performance in the few-shot setting for most tasks. We also demonstrate differences between alternative model families and adaptation techniques in the few-shot setting. Finally, we discuss several principles and choices in designing the experimental settings for evaluating the true few-shot learning performance and suggest a unified standardized approach to few-shot learning evaluation. We aim to encourage research on NLU models that can generalize to new tasks with a small number of examples. Code and data for CLUES are available at https://github.com/microsoft/CLUES.

NeurIPS Conference 2021 Conference Paper

Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

  • Ge Yang
  • Edward Hu
  • Igor Babuschkin
  • Szymon Sidor
  • Xiaodong Liu
  • David Farhi
  • Nick Ryder
  • Jakub Pachocki

Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters. We show that, in the recently discovered Maximal Update Parametrization ($\mu$P), many optimal HPs remain stable even as model size changes. This leads to a new HP tuning paradigm we call *$\mu$Transfer*: parametrize the target model in $\mu$P, tune the HPs indirectly on a smaller model, and *zero-shot transfer* them to the full-sized model, i.e., without directly tuning the latter at all. We verify $\mu$Transfer on Transformer and ResNet. For example, 1) by transferring pretraining HPs from a model of 13M parameters, we outperform published numbers for BERT-large (350M parameters), with a total tuning cost equivalent to pretraining BERT-large once; 2) by transferring from 40M parameters, we outperform published numbers for the 6.7B GPT-3 model, with tuning cost only 7% of total pretraining cost. A PyTorch implementation of our technique can be found at github.com/microsoft/mup. See arxiv.org for the full, up-to-date version of this work.
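The transfer recipe can be sketched in miniature. Under $\mu$P with Adam, the optimal learning rate for hidden-layer weights scales roughly as the inverse of the width, so a value tuned on a narrow proxy model carries over once rescaled. This is a simplified illustration with made-up numbers, not the `mup` package, which applies such rules per parameter group:

```python
# Simplified sketch of the muP transfer idea for Adam on hidden-layer
# weights: learning rate scales like 1/width, so a value tuned on a small
# proxy model can be rescaled to the target width instead of re-tuned.
base_width = 256     # width of the small proxy model that was swept
target_width = 4096  # width of the full-sized target model
tuned_lr = 1e-3      # best learning rate found on the proxy (illustrative)

def transfer_lr(lr, base, target):
    # muP rule of thumb (Adam, hidden weights): lr proportional to 1/width.
    return lr * base / target

target_lr = transfer_lr(tuned_lr, base_width, target_width)
```

The point of the paradigm is that this rescaled value is near-optimal for the large model, so the expensive sweep happens only at the proxy scale.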

NeurIPS Conference 2018 Conference Paper

Learning Plannable Representations with Causal InfoGAN

  • Thanard Kurutach
  • Aviv Tamar
  • Ge Yang
  • Stuart Russell
  • Pieter Abbeel

In recent years, deep generative models have been shown to 'imagine' convincing high-dimensional observations such as images, audio, and even video, learning directly from raw data. In this work, we ask how to imagine goal-directed visual plans -- a plausible sequence of observations that transition a dynamical system from its current configuration to a desired goal state, which can later be used as a reference trajectory for control. We focus on systems with high-dimensional observations, such as images, and propose an approach that naturally combines representation learning and planning. Our framework learns a generative model of sequential observations, where the generative process is induced by a transition in a low-dimensional planning model, and an additional noise. By maximizing the mutual information between the generated observations and the transition in the planning model, we obtain a low-dimensional representation that best explains the causal nature of the data. We structure the planning model to be compatible with efficient planning algorithms, and we propose several such models based on either discrete or continuous states. Finally, to generate a visual plan, we project the current and goal observations onto their respective states in the planning model, plan a trajectory, and then use the generative model to transform the trajectory to a sequence of observations. We demonstrate our method on imagining plausible visual plans of rope manipulation.

NeurIPS Conference 2018 Conference Paper

The Importance of Sampling in Meta-Reinforcement Learning

  • Bradly Stadie
  • Ge Yang
  • Rein Houthooft
  • Peter Chen
  • Yan Duan
  • Yuhuai Wu
  • Pieter Abbeel
  • Ilya Sutskever

We interpret meta-reinforcement learning as the problem of learning how to quickly find a good sampling distribution in a new environment. This interpretation leads to the development of two new meta-reinforcement learning algorithms: E-MAML and E-$\text{RL}^2$. Results are presented on a new environment we call `Krazy World': a difficult high-dimensional gridworld which is designed to highlight the importance of correctly differentiating through sampling distributions in meta-reinforcement learning. Further results are presented on a set of maze environments. We show E-MAML and E-$\text{RL}^2$ deliver better performance than baseline algorithms on both tasks.

NeurIPS Conference 2017 Conference Paper

Mean Field Residual Networks: On the Edge of Chaos

  • Ge Yang
  • Samuel Schoenholz

We study randomly initialized residual networks using mean field theory and the theory of difference equations. Classical feedforward neural networks, such as those with tanh activations, exhibit exponential behavior on average when propagating inputs forward or gradients backward. The exponential forward dynamics causes rapid collapse of the input space geometry, while the exponential backward dynamics causes drastic vanishing or exploding gradients. We show, in contrast, that by adding skip connections, the network will, depending on the nonlinearity, adopt subexponential, and in many cases polynomial, forward and backward dynamics. The exponents of these polynomials are obtained through analytic methods, proved correct, and verified empirically. In terms of the "edge of chaos" hypothesis, these subexponential and polynomial laws allow residual networks to "hover over the boundary between stability and chaos," thus preserving the geometry of the input space and the gradient information flow. In our experiments, for each activation function studied here, we initialize residual networks with different hyperparameters and train them on MNIST. Remarkably, our initialization-time theory can accurately predict test-time performance of these networks by tracking either the expected amount of gradient explosion or the expected squared distance between the images of two input vectors. Importantly, we show, theoretically as well as empirically, that common initializations such as the Xavier or He schemes are not optimal for residual networks, because the optimal initialization variances depend on the depth. Finally, we make mathematical contributions by deriving several new identities for the kernels of powers of ReLU functions by relating them to the zeroth Bessel function of the second kind.
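The contrast in forward dynamics can be checked with a toy simulation: propagate a random input through random tanh layers at a matched initialization, with and without skip connections, and track the hidden-state norm. This is a sketch under illustrative choices of width, depth, and weight variance, not the paper's analysis:

```python
import numpy as np

rng = np.random.default_rng(1)
width, depth = 512, 50
x = rng.normal(size=width)

def propagate(x, residual):
    """Track ||h|| through `depth` random tanh layers (weight variance 1/width)."""
    h = x.copy()
    norms = []
    for _ in range(depth):
        W = rng.normal(size=(width, width)) / np.sqrt(width)
        f = np.tanh(W @ h)
        h = h + f if residual else f   # skip connection vs plain feedforward
        norms.append(float(np.linalg.norm(h)))
    return norms

ff_norms = propagate(x, residual=False)   # stays bounded by tanh saturation
res_norms = propagate(x, residual=True)   # grows steadily with depth
```

In the feedforward case each layer's output is squashed by tanh, so the norm stays bounded; with skip connections each layer adds a roughly uncorrelated increment, so the norm keeps growing with depth rather than exploding or collapsing exponentially, consistent with the subexponential regime described above.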