Arrow Research

Author name cluster

Min Shi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

AAAI Conference 2026 · Conference Paper

Slender3D: Curve-Guided Multi-View Reconstruction of Slender Structures

  • Suqin Wang
  • Zeyi Wang
  • Min Shi
  • Zhaoxin Li
  • Qi Wang
  • Xiujuan Chai
  • Dengming Zhu

Although geometric reconstruction of general objects from images has made remarkable progress in recent years, slender structures remain largely underexplored, despite their critical importance in engineering, biomedical, and agricultural applications. To bridge this gap, we propose a dedicated 2DGS-based geometric reconstruction framework tailored for slender structures, achieving accurate and faithful geometry recovery. Our method first addresses the challenge that most slender objects are texture-less, which hinders reliable feature matching and pose estimation in traditional SfM pipelines. By leveraging the curve-like nature of slender structures, we perform a curve-guided SfM process that provides robust camera poses and accurate 3D curve initialization for Gaussian primitives. To ensure SfM reliability, we introduce a high-precision mask extraction strategy that integrates geometric priors with a segmentation network, effectively handling self-occlusion and thin geometry. Furthermore, to enhance fine geometric recovery, we incorporate a differentiable Poisson reconstruction module to extract an initial mesh during training, which is then refined via image-space iterative optimization using differentiable mesh rasterization. In contrast to conventional approaches that rely on differentiable Gaussian rasterization followed by TSDF-based mesh extraction, our method avoids the additional geometric errors and artifacts introduced during the intermediate TSDF conversion, thereby improving the overall reconstruction quality. Comprehensive experiments on both synthetic and real-world datasets validate that our method achieves superior reconstruction quality compared to state-of-the-art approaches.
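
The curve-guided initialization step lends itself to a short illustration. Below is a minimal sketch, assuming NumPy/SciPy: fit a smooth 3D curve through triangulated SfM points and seed Gaussian primitive centers along it. The function name and the spline-based fit are hypothetical stand-ins; the paper's actual curve-guided SfM pipeline is more involved.

```python
# Hypothetical sketch of "3D curve -> Gaussian primitive seeding" using a spline fit.
import numpy as np
from scipy.interpolate import splprep, splev

def seed_gaussians_along_curve(points_3d: np.ndarray, n_primitives: int = 200,
                               smoothing: float = 0.0) -> np.ndarray:
    """Fit a smooth 3D curve through triangulated points (N, 3) and return
    evenly spaced centers for Gaussian primitives (n_primitives, 3)."""
    # splprep takes per-coordinate arrays, so pass the transposed point cloud.
    tck, _ = splprep(points_3d.T, s=smoothing)
    u = np.linspace(0.0, 1.0, n_primitives)
    return np.stack(splev(u, tck), axis=1)  # (n_primitives, 3)

# Toy usage: a noisy helix standing in for triangulated curve points.
t = np.linspace(0, 4 * np.pi, 100)
pts = np.stack([np.cos(t), np.sin(t), 0.1 * t], axis=1) + 0.01 * np.random.randn(100, 3)
print(seed_gaussians_along_curve(pts).shape)  # (200, 3)
```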

AAAI Conference 2026 · Conference Paper

Spiking Heterogeneous Graph Attention Networks

  • Buqing Cao
  • Qian Peng
  • Xiang Xie
  • Liang Chen
  • Min Shi
  • Jianxun Liu

Real-world graphs or networks are usually heterogeneous, involving multiple types of nodes and relationships. Heterogeneous graph neural networks (HGNNs) can effectively handle these diverse nodes and edges, capturing the heterogeneous information within the graph and thus exhibiting outstanding performance. However, most HGNN methods involve complex structural designs, leading to high memory usage, long inference times, and extensive consumption of computing resources. These limitations pose challenges for the practical application of HGNNs, especially on resource-constrained devices. To mitigate this issue, we propose Spiking Heterogeneous Graph Attention Networks (SpikingHAN), which incorporate the brain-inspired, energy-saving properties of Spiking Neural Networks (SNNs) into heterogeneous graph learning to reduce computing cost without compromising performance. Specifically, SpikingHAN aggregates meta-path-based neighbor information using a single-layer graph convolution with shared parameters. It then employs a semantic-level attention mechanism to capture the importance of different meta-paths and performs semantic aggregation. Finally, it encodes the heterogeneous information into a spike sequence through SNNs, simulating biological information processing to derive a binarized 1-bit representation of the heterogeneous graph. Comprehensive experiments on three real-world heterogeneous graph datasets show that SpikingHAN delivers competitive node classification performance with fewer parameters, faster inference, reduced memory usage, and lower energy consumption.
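
A minimal PyTorch sketch of the pipeline the abstract outlines: a shared single-layer graph convolution per meta-path, semantic-level attention over meta-paths, and an integrate-and-fire stage producing a binary spike representation. The module name, shapes, and the soft-reset LIF neuron are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpikingHANSketch(nn.Module):
    def __init__(self, in_dim, hid_dim, n_metapaths, T=4, v_th=1.0):
        super().__init__()
        self.shared_gc = nn.Linear(in_dim, hid_dim, bias=False)  # shared across meta-paths
        self.att = nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.Tanh(),
                                 nn.Linear(hid_dim, 1, bias=False))
        self.T, self.v_th = T, v_th

    def forward(self, x, adjs):
        # adjs: list of normalized meta-path adjacency matrices, each (N, N)
        z = torch.stack([a @ self.shared_gc(x) for a in adjs], dim=1)  # (N, M, H)
        beta = F.softmax(self.att(z).mean(dim=0), dim=0)               # (M, 1) semantic weights
        h = (beta.unsqueeze(0) * z).sum(dim=1)                         # (N, H) fused embedding
        # Integrate-and-fire over T steps: the spike train is the 1-bit representation.
        v, spikes = torch.zeros_like(h), []
        for _ in range(self.T):
            v = v + h
            s = (v >= self.v_th).float()
            v = v - s * self.v_th                                      # soft reset
            spikes.append(s)
        return torch.stack(spikes)                                     # (T, N, H) in {0, 1}

x = torch.randn(5, 8)
adjs = [torch.eye(5) for _ in range(3)]                                # toy meta-path graphs
print(SpikingHANSketch(8, 16, 3)(x, adjs).shape)                       # torch.Size([4, 5, 16])
```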

ICLR Conference 2025 · Conference Paper

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

  • Min Shi
  • Fuxiao Liu
  • Shihao Wang
  • Shijia Liao
  • Subhashree Radhakrishnan
  • Yilin Zhao
  • De-An Huang
  • Hongxu Yin

The ability to accurately interpret complex visual information is crucial for multimodal large language models (MLLMs). Recent work indicates that enhanced visual perception significantly reduces hallucinations and improves performance on resolution-sensitive tasks, such as optical character recognition and document analysis. A number of recent MLLMs achieve this goal using a mixture of vision encoders. Despite their success, there is a lack of systematic comparisons and detailed ablation studies addressing critical aspects, such as expert selection and the integration of multiple vision experts. This study provides an extensive exploration of the design space for MLLMs using a mixture of vision encoders and resolutions. Our findings reveal several underlying principles common to various existing strategies, leading to a streamlined yet effective design approach. We discover that simply concatenating visual tokens from a set of complementary vision encoders is as effective as more complex mixing architectures or strategies. We additionally introduce Pre-Alignment to bridge the gap between vision-focused encoders and language tokens, enhancing model coherence. The resulting family of MLLMs, Eagle, surpasses other leading open-source models on major MLLM benchmarks.
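
A minimal sketch of the channel-concatenation fusion the abstract highlights, assuming PyTorch. The encoder outputs, dimensions, and the shared-grid resampling are illustrative assumptions, not Eagle's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_encoder_tokens(feature_maps, proj):
    """feature_maps: list of (B, C_i, H_i, W_i) outputs from different vision
    encoders. Resample to a shared grid, concatenate along channels, then
    project the flattened tokens into the LLM embedding space."""
    side = max(f.shape[-1] for f in feature_maps)            # shared token grid
    aligned = [F.interpolate(f, size=(side, side), mode="bilinear",
                             align_corners=False) for f in feature_maps]
    fused = torch.cat(aligned, dim=1)                        # (B, sum C_i, side, side)
    tokens = fused.flatten(2).transpose(1, 2)                # (B, side*side, sum C_i)
    return proj(tokens)                                      # (B, side*side, llm_dim)

# Toy usage: two encoders with different channel counts and resolutions.
feats = [torch.randn(1, 256, 24, 24), torch.randn(1, 128, 16, 16)]
proj = nn.Linear(256 + 128, 4096)                            # into the LLM space
print(fuse_encoder_tokens(feats, proj).shape)                # torch.Size([1, 576, 4096])
```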

JBHI Journal 2023 · Journal Article

Artifact-Tolerant Clustering-Guided Contrastive Embedding Learning for Ophthalmic Images in Glaucoma

  • Min Shi
  • Anagha Lokhande
  • Mojtaba S. Fazli
  • Vishal Sharma
  • Yu Tian
  • Yan Luo
  • Louis R. Pasquale
  • Tobias Elze

Ophthalmic images, along with their derivatives like retinal nerve fiber layer (RNFL) thickness maps, play a crucial role in detecting and monitoring eye diseases such as glaucoma. For computer-aided diagnosis of eye diseases, the key technique is to automatically extract meaningful features from ophthalmic images that can reveal the biomarkers (e.g., RNFL thinning patterns) associated with functional vision loss. However, representation learning from ophthalmic images that links structural retinal damage with human vision loss is non-trivial mostly due to large anatomical variations between patients. This challenge is further amplified by the presence of image artifacts, commonly resulting from image acquisition and automated segmentation issues. In this paper, we present an artifact-tolerant unsupervised learning framework called EyeLearn for learning ophthalmic image representations in glaucoma cases. EyeLearn includes an artifact correction module to learn representations that optimally predict artifact-free images. In addition, EyeLearn adopts a clustering-guided contrastive learning strategy to explicitly capture the affinities within and between images. During training, images are dynamically organized into clusters to form contrastive samples, which encourage learning similar or dissimilar representations for images in the same or different clusters, respectively. To evaluate EyeLearn, we use the learned representations for visual field prediction and glaucoma detection with a real-world dataset of glaucoma patient ophthalmic images. Extensive experiments and comparisons with state-of-the-art methods confirm the effectiveness of EyeLearn in learning optimal feature representations from ophthalmic images.
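
The clustering-guided contrastive idea can be sketched as an InfoNCE loss over k-means co-membership, assuming PyTorch and scikit-learn. The temperature, cluster count, and function name are illustrative assumptions, not EyeLearn's exact formulation.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def cluster_contrastive_loss(embeddings, n_clusters=10, tau=0.1):
    """Pull together embeddings in the same k-means cluster, push apart
    embeddings in different clusters (InfoNCE over cluster co-membership)."""
    z = F.normalize(embeddings, dim=1)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
        z.detach().cpu().numpy())
    labels = torch.as_tensor(labels, device=z.device)
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = (z @ z.T / tau).masked_fill(eye, -1e9)             # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = ((labels[:, None] == labels[None, :]) & ~eye).float()
    n_pos = pos.sum(dim=1).clamp(min=1)
    return -(pos * log_prob).sum(dim=1).div(n_pos).mean()

print(cluster_contrastive_loss(torch.randn(64, 32)))
```

In the paper the clusters are recomputed dynamically during training; the sketch shows a single step.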

AAAI Conference 2021 · Conference Paper

Consistent Right-Invariant Fixed-Lag Smoother with Application to Visual Inertial SLAM

  • Jianzhu Huai
  • Yukai Lin
  • Yuan Zhuang
  • Min Shi

State estimation problems without absolute position measurements routinely arise in the navigation of unmanned aerial vehicles, autonomous ground vehicles, and similar platforms, whose proper operation relies on accurate state estimates and reliable covariances. Without absolute position information, these problems have inherent unobservable directions. Traditional causal estimators, however, usually gain spurious information along the unobservable directions, leading to over-confident covariances that are inconsistent with the actual estimator errors. The consistency problem of fixed-lag smoothers (FLSs) has so far been attacked only with the first-estimates Jacobian (FEJ) technique, owing to the complexity of analyzing their observability properties, but FEJ has several drawbacks hampering its wide adoption. To ensure the consistency of an FLS, this paper introduces the right-invariant error formulation into the FLS framework. To our knowledge, we are the first to analyze the observability of an FLS with the right-invariant error. Our main contributions are twofold. First, to bypass the complexity of analysis with the classic observability matrix, we show that the observability analysis of an FLS can be done equivalently on the linearized system. Second, we prove that the inconsistency issue in the traditional FLS can be elegantly solved by the right-invariant error formulation without artificially correcting Jacobians. By applying the proposed FLS to the monocular visual-inertial simultaneous localization and mapping (SLAM) problem, we confirm in simulation that the method estimates covariance as consistently as a batch smoother, and that it achieves accuracy comparable to traditional FLSs on real data.
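
The key object here is the right-invariant error. A toy NumPy illustration of its defining property, not the paper's derivation: for SE(3) poses X, the error eta = Xhat X^{-1} is unchanged when both the true and estimated poses are transformed on the right by any group element, which is what keeps the linearized error dynamics from depending on the unobservable global state. All names are illustrative.

```python
import numpy as np

def se3(R, p):
    """Build a 4x4 homogeneous pose [[R, p], [0, 1]]."""
    X = np.eye(4); X[:3, :3] = R; X[:3, 3] = p
    return X

def right_invariant_error(X_hat, X):
    return X_hat @ np.linalg.inv(X)

Rz = lambda a: np.array([[np.cos(a), -np.sin(a), 0],
                         [np.sin(a),  np.cos(a), 0],
                         [0, 0, 1]])
X     = se3(Rz(0.30), np.array([1.0, 2.0, 0.0]))
X_hat = se3(Rz(0.31), np.array([1.1, 2.0, 0.1]))   # a slightly wrong estimate
G     = se3(Rz(1.00), np.array([5.0, -3.0, 2.0]))  # an arbitrary right transform

e1 = right_invariant_error(X_hat, X)
e2 = right_invariant_error(X_hat @ G, X @ G)       # transform both on the right
print(np.allclose(e1, e2))                         # True: the error is right-invariant
```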

IJCAI Conference 2021 · Conference Paper

GAEN: Graph Attention Evolving Networks

  • Min Shi
  • Yu Huang
  • Xingquan Zhu
  • Yufei Tang
  • Yuan Zhuang
  • Jianxun Liu

Real-world networked systems often show dynamic properties, with network nodes and topology continuously evolving over time. When learning from dynamic networks, it is beneficial to correlate all temporal networks to fully capture the similarity/relevance between nodes. Recent work on dynamic network representation learning typically trains each network independently and imposes relevance regularization on networks learned at different time steps. Such a snapshot scheme fails to leverage topology similarity between temporal networks for progressive training. In addition to the static node relationships within each network, nodes can show similar variation patterns (e.g., changes of local structure) within the temporal network sequence. Both static node structures and temporal variation patterns can be combined to better characterize node affinities for unified embedding learning. In this paper, we propose Graph Attention Evolving Networks (GAEN) for dynamic network embedding, preserving similarities between nodes derived from their temporal variation patterns. Instead of training graph attention weights for each network independently, we allow model weights to be shared and to evolve across all temporal networks based on their respective topology discrepancies. Experiments on four real-world dynamic graphs demonstrate that GAEN outperforms the state of the art in both link prediction and node classification tasks.
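
A hedged PyTorch sketch of the "weights evolve across temporal networks" idea: a GRU cell updates a shared weight matrix from one snapshot to the next, driven by a topology-discrepancy signal. The module name, the discrepancy measure, and the GRU-based update are illustrative assumptions, not GAEN's exact formulation.

```python
import torch
import torch.nn as nn

class EvolvingWeightsSketch(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W0 = nn.Parameter(torch.randn(in_dim, out_dim) * 0.1)  # initial weights
        self.evolve = nn.GRUCell(input_size=1, hidden_size=in_dim * out_dim)

    def forward(self, features, adjs):
        # features: (N, in_dim); adjs: list of T snapshot adjacencies (N, N)
        W, embeddings, prev_a = self.W0, [], adjs[0]
        for a in adjs:
            disc = (a - prev_a).abs().mean().view(1, 1)      # topology discrepancy
            flat = self.evolve(disc, W.reshape(1, -1))       # evolve shared weights
            W = flat.view_as(self.W0)
            embeddings.append(torch.relu(a @ features @ W))  # snapshot embedding
            prev_a = a
        return embeddings  # list of (N, out_dim), one per temporal network

adjs = [torch.eye(6) + torch.rand(6, 6).round() for _ in range(4)]  # toy snapshots
model = EvolvingWeightsSketch(8, 16)
print(len(model(torch.randn(6, 8), adjs)))  # 4
```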

IJCAI Conference 2020 · Conference Paper

Multi-Class Imbalanced Graph Convolutional Network Learning

  • Min Shi
  • Yufei Tang
  • Xingquan Zhu
  • David Wilson
  • Jianxun Liu

Networked data often demonstrate the Pareto principle (i.e., the 80/20 rule) with skewed class distributions, where most vertices belong to a few majority classes and minority classes contain only a handful of instances. When presented with imbalanced class distributions, existing graph embedding learning tends to be biased toward nodes from the majority classes, leaving nodes from minority classes under-trained. In this paper, we propose Dual-Regularized Graph Convolutional Networks (DR-GCN) to handle multi-class imbalanced graphs, where two types of regularization are imposed to tackle class-imbalanced representation learning. To ensure that all classes are equally represented, we propose a class-conditioned adversarial training process to facilitate the separation of labeled nodes. Meanwhile, to maintain training equilibrium (i.e., to retain quality of fit across all classes), we force unlabeled nodes to follow a latent distribution similar to that of the labeled nodes by minimizing their difference in the embedding space. Experiments on real-world imbalanced graphs demonstrate that DR-GCN outperforms state-of-the-art methods in node classification, graph clustering, and visualization.
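
The second regularizer, matching the latent distributions of unlabeled and labeled nodes, can be sketched with an RBF-kernel maximum mean discrepancy, assuming PyTorch. The adversarial class-conditioned term is omitted, and the kernel bandwidth and names are illustrative assumptions, not DR-GCN's exact design.

```python
import torch

def rbf_mmd(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Squared maximum mean discrepancy between embeddings x (n, d) and y (m, d)."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Toy usage inside a training step: z are GCN node embeddings, labeled_mask
# marks the (class-imbalanced) labeled nodes.
z = torch.randn(100, 32, requires_grad=True)
labeled_mask = torch.zeros(100, dtype=torch.bool); labeled_mask[:20] = True
loss_dist = rbf_mmd(z[labeled_mask], z[~labeled_mask])
# total = ce_loss + lam_adv * adversarial_term + lam_dist * loss_dist
print(loss_dist.item())
```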