Arrow Research search

Author name cluster

Sijin Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers

ICLR Conference 2025 Conference Paper

Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies

  • Sijin Chen
  • Omar Hagrass
  • Jason M. Klusowski

Decoding strategies play a pivotal role in text generation for modern language models, yet a puzzling gap divides theory and practice. Surprisingly, strategies that should intuitively be optimal, such as Maximum a Posteriori (MAP), often perform poorly in practice. Meanwhile, popular heuristic approaches like Top-$k$ and Nucleus sampling, which employ truncation and normalization of the conditional next-token probabilities, have achieved great empirical success but lack theoretical justifications. In this paper, we propose Decoding Game, a comprehensive theoretical framework which reimagines text generation as a two-player zero-sum game between Strategist, who seeks to produce text credible in the true distribution, and Nature, who distorts the true distribution adversarially. After discussing the decomposability of multi-step generation, we derive the optimal strategy in closed form for one-step Decoding Game. It is shown that the adversarial Nature imposes an implicit regularization on likelihood maximization, and truncation-normalization methods are first-order approximations to the optimal strategy under this regularization. Additionally, by generalizing the objective and parameters of Decoding Game, near-optimal strategies encompass diverse methods such as greedy search, temperature scaling, and hybrids thereof. Numerical experiments are conducted to complement our theoretical analysis.

ICLR Conference 2025 Conference Paper

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

  • Yiwen Chen
  • Tong He 0001
  • Di Huang
  • Weicai Ye
  • Sijin Chen
  • Jiaxiang Tang
  • Zhongang Cai
  • Lei Yang 0045

Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created Meshes (AMs), i.e., meshes created by human artists. Specifically, current mesh extraction methods rely on dense faces and ignore geometric features, leading to inefficiencies, complicated post-processing, and lower representation quality. To address these issues, we introduce MeshAnything, a model that treats mesh extraction as a generation problem, producing AMs aligned with specified shapes. By converting 3D assets in any 3D representation into AMs, MeshAnything can be integrated with various 3D asset production methods, thereby enhancing their application across the 3D industry. The architecture of MeshAnything comprises a VQ-VAE and a shape-conditioned decoder-only transformer. We first learn a mesh vocabulary using the VQ-VAE, then train the shape-conditioned decoder-only transformer on this vocabulary for shape-conditioned autoregressive mesh generation. Our extensive experiments show that our method generates AMs with hundreds of times fewer faces, significantly improving storage, rendering, and simulation efficiencies, while achieving precision comparable to previous methods.

NeurIPS Conference 2025 Conference Paper

OmniSVG: A Unified Scalable Vector Graphics Generation Model

  • Yiying Yang
  • Wei Cheng
  • Sijin Chen
  • Xianfang Zeng
  • Fukun Yin
  • Jiaxu Zhang
  • Liao Wang
  • Gang Yu

Scalable Vector Graphics (SVG) is an important image format widely adopted in graphic design because of its resolution independence and editability. The study of generating high-quality SVG has continuously drawn attention from both designers and researchers in the AIGC community. However, existing methods either produce unstructured outputs with huge computational cost or are limited to generating monochrome icons with over-simplified structures. To produce high-quality and complex SVG, we propose OmniSVG, a unified framework that leverages pre-trained Vision-Language Models (VLMs) for end-to-end multimodal SVG generation. By parameterizing SVG commands and coordinates into discrete tokens, OmniSVG decouples structural logic from low-level geometry for efficient training while maintaining the expressiveness of complex SVG structure. To further advance the development of SVG synthesis, we introduce MMSVG-2M, a multimodal dataset with two million richly annotated SVG assets, along with a standardized evaluation protocol for conditional SVG generation tasks. Extensive experiments show that OmniSVG outperforms existing methods and demonstrates its potential for integration into professional SVG design workflows.

NeurIPS Conference 2024 Conference Paper

3DET-Mamba: Causal Sequence Modelling for End-to-End 3D Object Detection

  • Mingsheng Li
  • Jiakang Yuan
  • Sijin Chen
  • Lin Zhang
  • Anyu Zhu
  • Xin Chen
  • Tao Chen

Transformer-based architectures have been proven successful in detecting 3D objects from point clouds. However, the quadratic complexity of the attention mechanism struggles to encode rich information as point cloud resolution increases. Recently, state space models (SSM) such as Mamba have gained great attention due to their linear complexity and long sequence modeling ability for language understanding. To exploit the potential of Mamba on 3D scene-level perception, for the first time, we propose 3DET-Mamba, which is a novel SSM-based model designed for indoor 3D object detection. Specifically, we divide the point cloud into different patches and use a lightweight yet effective Inner Mamba to capture local geometric information. To observe the scene from a global perspective, we introduce a novel Dual Mamba module that models the point cloud in terms of spatial distribution and continuity. Additionally, we design a Query-aware Mamba module that decodes context features into object sets under the guidance of learnable queries. Extensive experiments demonstrate that 3DET-Mamba surpasses previous 3DETR on indoor 3D detection benchmarks such as ScanNet, improving AP25/AP50 from 65.0%/47.0% to 70.4%/54.4%, respectively.

NeurIPS Conference 2024 Conference Paper

MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

  • Sijin Chen
  • Xin Chen
  • Anqi Pang
  • Xianfang Zeng
  • Wei Cheng
  • Yijun Fu
  • Fukun Yin
  • Zhibin Wang

The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, making it widely preferred in various applications. However, given its unstructured graph representation, the direct generation of high-fidelity 3D meshes is challenging. Fortunately, with a pre-defined ordering strategy, 3D meshes can be represented as sequences, and the generation process can be seamlessly treated as an auto-regressive problem. In this paper, we validate that Neural Coordinate Field (NeurCF), an explicit coordinate representation with implicit neural embeddings, is a simple-yet-effective representation for large-scale sequential mesh modeling. After that, we present MeshXL, a family of generative pre-trained auto-regressive models that addresses 3D mesh generation with modern large language model approaches. Extensive experiments show that MeshXL is able to generate high-quality 3D meshes, and can also serve as foundation models for various down-stream applications.

AAAI Conference 2021 Conference Paper

CIA-SSD: Confident IoU-Aware Single-Stage Object Detector From Point Cloud

  • Wu Zheng
  • Weiliang Tang
  • Sijin Chen
  • Li Jiang
  • Chi-Wing Fu

Existing single-stage detectors for locating objects in point clouds often treat object localization and category classification as separate tasks, so the localization accuracy and classification confidence may not align well. To address this issue, we present a new single-stage detector named the Confident IoU-Aware Single-Stage object Detector (CIA-SSD). First, we design the lightweight Spatial-Semantic Feature Aggregation module to adaptively fuse high-level abstract semantic features and low-level spatial features for accurate predictions of bounding boxes and classification confidence. Also, the predicted confidence is further rectified with our designed IoU-aware confidence rectification module to make the confidence more consistent with the localization accuracy. Based on the rectified confidence, we further formulate the Distance-variant IoU-weighted NMS to obtain smoother regressions and avoid redundant predictions. We evaluate CIA-SSD on 3D car detection in the KITTI test set and show that it attains top performance in terms of the official ranking metric (moderate AP 80.28%) and above 32 FPS inference speed, outperforming all prior single-stage detectors. The code is available at https://github.com/Vegeta2020/CIA-SSD.

RLDM Conference 2019 Conference Abstract

Divergent Strategies for Learning in Males and Females

  • Sijin Chen
  • Becket Ebitz
  • Benjamin Hayden
  • Nicola Grissom

While gender and sex differences in most behavioral outcomes are small, there is evidence to suggest more substantial divergence in the cognitive strategies preferentially used by males and females. Unobserved computational differences due to sex or gender could cloud any attempt to understand interindividual variability. To address this omission, we examined strategy selection in a large sample of both male and female mice performing a classic decision-making task: the two-armed bandit. In this task, animals adopt a variety of strategies, which evolve as they learn. This means that identical final levels of performance can be achieved through widely divergent strategic paths. Here, we quantified these strategic paths. We found that one of the major axes of interindividual variability in strategy was the sex of the animals. While males and females ended at the same performance level, females learned more rapidly than their male counterparts because the sexes differed by the strategy applied during learning. Female mice as a group adopted a unified, systematic approach which reduced the dimensionality of the decision-space early in learning. Conversely, males engaged in ever-changing strategies not only between males but within an individual male over multiple iterations of the task. These results suggest that similar levels of performance can be achieved through widely divergent approaches, within and between subjects, and that sex is a significant factor governing strategy selection in decision making and learning. These results highlight the need to consider sex and gender influences on cognitive strategies in decision making and reinforcement learning.