Arrow Research search

Author name cluster

Yuwei Zhou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers


IROS Conference 2025 Conference Paper

Depth Estimation Based on Fisheye Cameras

  • Yuwei Zhou
  • Guoyu Lu 0001

Fisheye cameras, with their ultra-wide field of view, offer significant benefits for depth estimation in applications such as autonomous navigation, robotics, and immersive imaging by capturing more scene content from a single viewpoint. However, their strong radial distortion and varying spatial resolution across the image pose substantial challenges for accurate depth prediction. We present a deep learning–based framework for fisheye depth estimation that addresses these challenges while leveraging the wide coverage advantage. During training, rectified and synchronized stereo image pairs are used, with the right image and an estimated initial depth map reconstructing the left image. A refined spatial consistency loss is formulated by combining Structural Similarity Index Measure (SSIM) and L1 loss, with gradient-based weighting to emphasize disparity edges. To overcome the limitations of photometric loss in disparity learning, we normalize pixel intensities to better correlate disparity with appearance features. A fisheye-specific depth refinement module incorporates an uncertainty map derived from an inconsistency mask and a distortion distribution map, mitigating the effects of occlusion and high-distortion regions. This uncertainty map is used to weight the temporal warping loss, enhancing robustness against distortion-prone areas. During inference, only a single fisheye image is required to produce an accurate depth map. Experimental results demonstrate that our method improves reconstruction fidelity and robustness, making it well-suited for real-world fisheye-based depth estimation tasks.
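The spatial consistency loss described above, which combines SSIM and L1 with per-pixel weighting, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The 3x3 SSIM window, the constants C1 and C2, and the weighting alpha = 0.85 are common conventions in self-supervised depth estimation and are assumptions here, as are the helper names. The optional per-pixel `weights` argument stands in for the gradient-based edge weighting or the uncertainty map; neither map is computed here.

```python
# Assumption: 3x3 window, C1/C2 and alpha=0.85 follow common self-supervised depth conventions.
C1, C2 = 0.01 ** 2, 0.03 ** 2  # SSIM stabilizers for intensities in [0, 1]


def _window(img, i, j):
    """Values of the 3x3 neighbourhood around (i, j), clamping indices at the image border."""
    h, w = len(img), len(img[0])
    return [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
            for di in (-1, 0, 1) for dj in (-1, 0, 1)]


def ssim_at(x, y, i, j):
    """Local SSIM between images x and y at pixel (i, j), computed over a 3x3 window."""
    a, b = _window(x, i, j), _window(y, i, j)
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((u - mu_a) * (v - mu_b) for u, v in zip(a, b)) / n
    return ((2 * mu_a * mu_b + C1) * (2 * cov + C2)) / (
        (mu_a ** 2 + mu_b ** 2 + C1) * (var_a + var_b + C2))


def photometric_loss(target, recon, weights=None, alpha=0.85):
    """Weighted mean of alpha * (1 - SSIM) / 2 + (1 - alpha) * |target - recon|.

    `weights` is a hypothetical per-pixel map (e.g. edge emphasis or uncertainty);
    with weights=None every pixel counts equally.
    """
    h, w = len(target), len(target[0])
    total, norm = 0.0, 0.0
    for i in range(h):
        for j in range(w):
            s = ssim_at(target, recon, i, j)
            l1 = abs(target[i][j] - recon[i][j])
            per_pixel = alpha * (1 - s) / 2 + (1 - alpha) * l1
            wt = 1.0 if weights is None else weights[i][j]
            total += wt * per_pixel
            norm += wt
    return total / norm
```

Here `target` plays the role of the left image and `recon` the left image reconstructed from the right image and the estimated depth. Normalizing by the sum of weights keeps the loss scale comparable when the weight map changes.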

ICML Conference 2024 Conference Paper

CurBench: Curriculum Learning Benchmark

  • Yuwei Zhou
  • Zirui Pan
  • Xin Wang 0019
  • Hong Chen 0011
  • Haoyang Li 0001
  • Yanwen Huang
  • Zhixiao Xiong
  • Fangzhou Xiong

Curriculum learning is a training paradigm in which machine learning models are trained in a meaningful order, inspired by the way humans learn curricula. Because it can improve model generalization and convergence, curriculum learning has gained considerable attention and has been widely applied across research domains. Nevertheless, as new curriculum learning methods continue to emerge, benchmarking them fairly remains an open issue. We therefore develop CurBench, the first benchmark that supports systematic evaluations for curriculum learning. Specifically, it consists of 15 datasets spanning 3 research domains (computer vision, natural language processing, and graph machine learning) under 3 settings (standard, noise, and imbalance). To facilitate a comprehensive comparison, we evaluate along 2 dimensions: performance and complexity. CurBench also provides a unified toolkit that plugs automatic curricula into general machine learning processes, enabling the implementation of 15 core curriculum learning methods. On the basis of this benchmark, we conduct comparative experiments and empirical analyses of existing methods. CurBench is open-source and publicly available at https://github.com/THUMNLab/CurBench.
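A curriculum in its simplest form sorts samples from easy to hard and exposes a growing prefix of them as training proceeds. The sketch below shows that idea with a linear pacing function. It does not reproduce CurBench's toolkit or API; `linear_pacing`, `curriculum_batches`, the `difficulty` scoring function, and the starting fraction of 0.2 are all hypothetical.

```python
# Hypothetical easy-to-hard curriculum with a linear pacing function (not CurBench's API).


def linear_pacing(epoch, total_epochs, start_frac=0.2):
    """Fraction of the sorted training set available at `epoch`, growing linearly to 1."""
    frac = start_frac + (1 - start_frac) * epoch / max(total_epochs - 1, 1)
    return min(1.0, frac)


def curriculum_batches(samples, difficulty, total_epochs, start_frac=0.2):
    """Yield (epoch, subset) pairs.

    Samples are sorted by the assumed `difficulty` score (lower = easier), and each
    epoch sees the easiest prefix whose size is set by the pacing function.
    """
    ordered = sorted(samples, key=difficulty)
    n = len(ordered)
    for epoch in range(total_epochs):
        # round() absorbs tiny floating-point drift in frac * n
        k = max(1, round(linear_pacing(epoch, total_epochs, start_frac) * n))
        yield epoch, ordered[:k]
```

The pacing function and the difficulty measure are the two knobs most curriculum methods vary; swapping either one changes the curriculum without touching the training loop.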

ICLR Conference 2024 Conference Paper

DisenBooth: Identity-Preserving Disentangled Tuning for Subject-Driven Text-to-Image Generation

  • Hong Chen 0011
  • Yipeng Zhang 0003
  • Simin Wu
  • Xin Wang 0019
  • Xuguang Duan
  • Yuwei Zhou
  • Wenwu Zhu 0001

Subject-driven text-to-image generation, which aims to generate customized images of a given subject based on text descriptions, has drawn increasing attention. Existing methods mainly resort to finetuning a pretrained generative model, where the identity-relevant information (e.g., the boy) and the identity-irrelevant information (e.g., the background or the pose of the boy) are entangled in the latent embedding space. However, this highly entangled latent embedding may cause subject-driven text-to-image generation to fail in two ways: (i) the identity-irrelevant information hidden in the entangled embedding may dominate the generation process, making the generated images heavily dependent on the irrelevant information while ignoring the given text descriptions; (ii) the identity-relevant information carried in the entangled embedding cannot be appropriately preserved, resulting in identity changes of the subject in the generated images. To tackle these problems, we propose DisenBooth, an identity-preserving disentangled tuning framework for subject-driven text-to-image generation. Specifically, DisenBooth finetunes the pretrained diffusion model in the denoising process. Unlike previous works that utilize an entangled embedding to denoise each image, DisenBooth utilizes disentangled embeddings to respectively preserve the subject identity and capture the identity-irrelevant information. We further design novel weak denoising and contrastive embedding auxiliary tuning objectives to achieve the disentanglement. Extensive experiments show that DisenBooth outperforms baseline models for subject-driven text-to-image generation with the identity-preserved embedding. Additionally, by combining the identity-preserved embedding and the identity-irrelevant embedding, DisenBooth demonstrates greater generation flexibility and controllability.
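One way to push the identity-preserved and identity-irrelevant embeddings toward carrying different information is to penalize their cosine similarity. The sketch below assumes that form for the contrastive embedding objective; the exact loss used by DisenBooth, the embedding dimensions, and the function names here are illustrative assumptions, and the weak denoising objective is not shown.

```python
import math

# Assumption: the disentanglement penalty is a mean cosine similarity; this is a sketch,
# not the paper's exact contrastive embedding objective.


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def disentangle_penalty(identity_emb, irrelevant_embs):
    """Mean cosine similarity between one shared identity embedding and the
    per-image identity-irrelevant embeddings; minimizing it pushes them apart."""
    return sum(cosine(identity_emb, e) for e in irrelevant_embs) / len(irrelevant_embs)
```

In this formulation a single identity embedding is shared across all images of the subject, while each image gets its own identity-irrelevant embedding, which matches the split described in the abstract.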

IROS Conference 2024 Conference Paper

Simultaneous Super-resolution and Depth Estimation for Satellite Images Based on Diffusion Model

  • Yuwei Zhou
  • Yangming Lee

Satellite images provide an effective way to observe the Earth's surface on a large scale. 3D landscape models can provide critical structural information, for example about forestry and crop growth. However, there has been very limited research on estimating depth and 3D models of the Earth from satellite images. LiDAR measurements from satellites are usually quite sparse. RGB images have higher resolution than LiDAR, but there has been little research on 3D surface measurement based on satellite RGB images. Compared with in-situ sensing, satellite RGB images are usually low resolution. In this research, we explore a method that enhances satellite image resolution to generate super-resolution images and then performs depth estimation and 3D reconstruction on the higher-resolution satellite images. Leveraging the strong generation capability of diffusion models, we develop a simultaneous diffusion model learning framework that trains diffusion models for both super-resolution and depth estimation. With the super-resolution images and the corresponding depth maps, 3D surface reconstruction models with detailed landscape information can be generated. We evaluate the proposed methodology on multiple satellite datasets for both super-resolution and depth estimation tasks, and the results demonstrate its effectiveness.
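Training one diffusion model for both super-resolution and depth can be pictured as noising a single target that stacks the high-resolution image and its depth map. The sketch below shows only the standard DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, applied to such a stacked target. The linear beta schedule (1e-4 to 0.02 over 1000 steps) and the channel stacking are assumptions, flat Python lists stand in for image tensors, and the conditioning on the low-resolution input and the denoising network are omitted.

```python
import math
import random

# Assumption: RGB and depth are stacked into one diffusion target; schedule values are
# the common DDPM defaults, not necessarily the paper's.


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample_joint(hr_rgb, depth, t, alpha_bars, rng=random):
    """Noise the concatenated [RGB..., depth...] target to step t.

    Returns (x_t, eps) so a denoiser could be trained to predict eps for the
    image part and the depth part at the same time.
    """
    x0 = list(hr_rgb) + list(depth)
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps
```

Stacking both targets lets a single noise-prediction loss supervise super-resolution and depth together; a separate-head design would be an equally plausible reading of "simultaneous" learning.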

ICML Conference 2023 Conference Paper

Curriculum Co-disentangled Representation Learning across Multiple Environments for Social Recommendation

  • Xin Wang 0019
  • Zirui Pan
  • Yuwei Zhou
  • Hong Chen 0011
  • Chendi Ge
  • Wenwu Zhu 0001

Complex patterns underlie the decision-making processes of different individuals across different environments. For instance, in a social recommender system, various user behaviors are driven by highly entangled latent factors from two environments, i.e., the consuming environment, where users consume items, and the social environment, where users connect with each other. Disentangling these latent factors for users can enhance the explainability and controllability of recommendation. However, no prior work on social recommendation has been capable of disentangling user representations across the consuming and social environments. To solve this problem, we study co-disentangled representation learning across different environments and propose the curriculum co-disentangled representation learning (CurCoDis) model, which disentangles the hidden factors for users across both consuming and social environments. To co-disentangle joint representations for user-item consumption and the user-user social graph simultaneously, we partition the social graph into equal-size subgraphs with the minimum number of cut edges, and design a curriculum weighting strategy for subgraph training that measures the complexity of subgraphs via Descartes' rule of signs. We further develop a prototype-routing optimization mechanism, which achieves co-disentanglement of user representations across consuming and social environments. Extensive experiments on several real-world datasets demonstrate that CurCoDis significantly outperforms state-of-the-art methods for social recommendation.
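Descartes' rule of signs states that the number of positive real roots of a real polynomial, counted with multiplicity, equals the number of sign changes in its coefficient sequence (ignoring zeros) or is less than it by an even number. The sketch below implements the rule itself. The abstract does not say which polynomial is derived from each subgraph, so applying it to, say, a subgraph's characteristic polynomial is only a hypothetical example, and the function names are illustrative.

```python
# Descartes' rule of signs; how CurCoDis builds a polynomial from a subgraph is not shown.


def sign_changes(coeffs):
    """Number of sign changes in a coefficient sequence, skipping zero coefficients."""
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def descartes_bounds(coeffs):
    """Possible counts of positive real roots of coeffs[0]*x^n + ... + coeffs[n].

    The count is V (the number of sign changes) or V - 2, V - 4, ..., down to 0 or 1.
    """
    v = sign_changes(coeffs)
    return list(range(v, -1, -2))
```

For example, x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3) has three sign changes, so it has either 3 or 1 positive roots; here all three are positive. The appeal for a curriculum is that the count is computed from coefficients alone, without solving for roots.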

NeurIPS Conference 2023 Conference Paper

Joint Data-Task Generation for Auxiliary Learning

  • Hong Chen
  • Xin Wang
  • Yuwei Zhou
  • Yijian Qin
  • Chaoyu Guan
  • Wenwu Zhu

Current auxiliary learning methods mainly reweight losses for manually collected auxiliary data and tasks. However, these methods rely heavily on domain knowledge during data collection, which may rarely be available in reality. As a result, current methods become less effective, and can even harm the primary task, when unhelpful auxiliary data and tasks are employed. To tackle this problem, we propose a joint data-task generation framework for auxiliary learning (DTG-AuxL), which benefits the primary task by generating new auxiliary data and tasks jointly. The DTG-AuxL framework contains a joint generator and a bi-level optimization strategy. Specifically, the joint generator contains a feature generator and a label generator, which are designed to be applicable and expressive for various auxiliary learning scenarios. The bi-level optimization strategy optimizes the joint generator and the task learning model: the joint generator is optimized at the upper level via the implicit gradient from the primary loss and the explicit gradient of our proposed instance regularization, while the task learning model is optimized at the lower level on the generated data and task. Extensive experiments show that DTG-AuxL consistently outperforms existing methods in various auxiliary learning scenarios, particularly when the manually collected auxiliary data and tasks are unhelpful.
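The implicit gradient used at the upper level can be illustrated on a scalar toy problem. Here the lower level minimizes (w - a)^2 + lam * (w - b)^2, a stand-in for a primary loss plus a weighted auxiliary loss, and the upper level evaluates a primary objective (w*(lam) - c)^2 at the lower-level optimum. The quadratic losses, the scalar `lam` standing in for the joint generator, and the function names are assumptions chosen so the hypergradient can be checked against finite differences.

```python
# Toy bi-level problem (assumption): scalar lam plays the role of the upper-level variable.


def lower_solution(lam, a, b):
    """w*(lam) = argmin_w (w - a)^2 + lam * (w - b)^2, in closed form."""
    return (a + lam * b) / (1.0 + lam)


def implicit_hypergradient(lam, a, b, c):
    """d/dlam of the upper loss (w*(lam) - c)^2 via the implicit function theorem.

    At the lower optimum dL/dw = 0, so dw*/dlam = -(d2L/dw dlam) / (d2L/dw2).
    """
    w = lower_solution(lam, a, b)
    d2_dw2 = 2.0 + 2.0 * lam       # second derivative of the lower loss in w
    d2_dw_dlam = 2.0 * (w - b)     # mixed derivative of the lower loss
    dw_dlam = -d2_dw_dlam / d2_dw2
    return 2.0 * (w - c) * dw_dlam  # chain rule through the upper loss
```

The point of the implicit form is that it needs only derivatives at the lower-level solution, not a differentiated record of the lower-level optimization trajectory; in a neural network the division becomes an (approximate) inverse-Hessian-vector product.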

NeurIPS Conference 2022 Conference Paper

Module-Aware Optimization for Auxiliary Learning

  • Hong Chen
  • Xin Wang
  • Yue Liu
  • Yuwei Zhou
  • Chaoyu Guan
  • Wenwu Zhu

Auxiliary learning is a widely adopted practice in deep learning that aims to improve model performance on the primary task by exploiting beneficial information in the auxiliary loss. Existing auxiliary learning methods focus only on balancing the auxiliary loss and the primary loss, ignoring the module-level auxiliary influence, i.e., an auxiliary loss may be beneficial for optimizing specific modules within the model but harmful to others; as a result, these methods fail to make full use of auxiliary information. To tackle this problem, we propose a Module-Aware Optimization approach for Auxiliary Learning (MAOAL). The proposed approach considers module-level influence through learnable module-level auxiliary importance, i.e., the importance of each auxiliary loss to each module. Specifically, it jointly optimizes the module-level auxiliary importance and the model parameters in a bi-level manner. In the lower-level optimization, the model parameters are optimized with the importance-parameterized gradient, while in the upper-level optimization, the module-level auxiliary importance is updated with the implicit gradient from a small development dataset. Extensive experiments show that MAOAL consistently outperforms state-of-the-art baselines for different auxiliary losses on various datasets, demonstrating that it can serve as a powerful generic tool for auxiliary learning.
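The importance-parameterized gradient used at the lower level can be sketched as follows: each module m receives the primary-loss gradient plus, for every auxiliary loss k, that loss's gradient on m scaled by a per-module importance. The dictionary layout, module names, and plain SGD update are assumptions for illustration, and the upper-level update of the importance values is not shown.

```python
# Sketch (assumption): gradients are plain lists keyed by module name; not the paper's code.


def module_aware_gradient(primary_grads, aux_grads, importance):
    """Combine per-module gradients with module-level auxiliary importance.

    primary_grads: {module: [g, ...]}          gradient of the primary loss
    aux_grads:     [{module: [g, ...]}, ...]   one dict per auxiliary loss
    importance:    [{module: float}, ...]      importance of aux loss k for each module
    Module m receives grad_m(L_primary) + sum_k importance[k][m] * grad_m(L_aux_k).
    """
    combined = {}
    for m, g in primary_grads.items():
        total = list(g)
        for k, aux in enumerate(aux_grads):
            weight = importance[k][m]
            total = [t + weight * a for t, a in zip(total, aux[m])]
        combined[m] = total
    return combined


def sgd_step(params, grads, lr=0.1):
    """Plain gradient-descent update applied module by module."""
    return {m: [p - lr * g for p, g in zip(params[m], grads[m])] for m in params}
```

Setting an importance to zero for one module and positive for another is how a single auxiliary loss can help an encoder while being kept away from a task head, which is the module-level effect the abstract describes.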