Arrow Research search

Author name cluster

Junjie Zhu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers

10

AAAI Conference 2026 Conference Paper

HyperLoad: A Cross-Modality Enhanced Large Language Model-Based Framework for Green Data Center Cooling Load Prediction

  • Haoyu Jiang
  • Boan Qu
  • Junjie Zhu
  • Fanjie Zeng
  • Xiaojie Lin
  • Wei Zhong

The explosive growth of artificial intelligence is exponentially escalating computational demand, inflating data center energy use and carbon emissions, and spurring rapid deployment of green data centers to relieve resource and environmental stress. Achieving sub-minute orchestration of renewables, storage, and loads, while minimizing PUE and lifecycle carbon intensity, hinges on accurate load forecasting. However, existing methods struggle to address small-sample scenarios caused by cold start, load distortion, multi-source data fragmentation, and distribution shifts in green data centers. We introduce HyperLoad, a cross-modality framework that exploits pre-trained large language models (LLMs) to overcome data scarcity. In the Cross-Modality Knowledge Alignment phase, textual priors and time-series attributes are mapped to a common latent space, maximizing the utility of prior knowledge. In the Multi-Scale Feature Modeling phase, domain-aligned priors are injected through adaptive prefix-tuning, enabling rapid scenario adaptation, while an Enhanced Global Interaction Attention mechanism captures cross-device temporal dependencies. The public GreenData dataset is released for benchmarking. Under both data-sufficient and data-scarce regimes, HyperLoad consistently surpasses state-of-the-art (SOTA) baselines, demonstrating its practicality for sustainable green data center management.

JBHI Journal 2025 Journal Article

Benchmarking Large Language Models in Evidence-Based Medicine

  • Jin Li
  • Yiyan Deng
  • Qi Sun
  • Junjie Zhu
  • Yu Tian
  • Jingsong Li
  • Tingting Zhu

Evidence-based medicine (EBM) represents a paradigm of providing patient care grounded in the most current and rigorously evaluated research. Recent advances in large language models (LLMs) offer a potential solution to transform EBM by automating labor-intensive tasks and thereby improving the efficiency of clinical decision-making. This study explores integrating LLMs into the key stages of EBM, evaluating their ability across evidence retrieval (PICO extraction, biomedical question answering), synthesis (summarizing randomized controlled trials), and dissemination (medical text simplification). We conducted a comparative analysis of seven LLMs, including both proprietary and open-source models, as well as those fine-tuned on medical corpora. Specifically, we benchmarked the performance of various LLMs on each EBM task under zero-shot settings as baselines, and employed prompting techniques, including in-context learning, chain-of-thought reasoning, and knowledge-guided prompting, to enhance their capabilities. Our extensive experiments revealed the strengths of LLMs, such as remarkable understanding capabilities even in zero-shot settings, strong summarization skills, and effective knowledge transfer via prompting. Prompting strategies such as knowledge-guided prompting proved highly effective (e.g., improving the performance of GPT-4 by 13.10% over zero-shot in PICO extraction). However, the experiments also showed limitations, with LLM performance falling well below state-of-the-art baselines like PubMedBERT in handling named entity recognition tasks. Moreover, human evaluation revealed persisting challenges with factual inconsistencies and domain inaccuracies, underscoring the need for rigorous quality control before clinical application. This study provides insights into enhancing EBM using LLMs while highlighting critical areas for further research.

AAAI Conference 2025 Conference Paper

Graph-Based Cross-Domain Knowledge Distillation for Cross-Dataset Text-to-Image Person Retrieval

  • Bingjun Luo
  • Jinpeng Wang
  • Zewen Wang
  • Junjie Zhu
  • Xibin Zhao

Video surveillance systems are crucial components for ensuring public safety and management in smart cities. As a fundamental task in video surveillance, text-to-image person retrieval aims to retrieve the target person from an image gallery that best matches the given text description. Most existing text-to-image person retrieval methods are trained in a supervised manner that requires sufficient labeled data in the target domain. However, it is common in practice that only unlabeled data is available in the target domain due to the difficulty and cost of data annotation, which limits the generalization of existing methods in practical application scenarios. To address this issue, we propose a novel unsupervised domain adaptation method, termed Graph-Based Cross-Domain Knowledge Distillation (GCKD), to learn the cross-modal feature representation for text-to-image person retrieval in a cross-dataset scenario. The proposed GCKD method consists of two main components. Firstly, a graph-based multi-modal propagation module is designed to bridge the cross-domain correlation among the visual and textual samples. Secondly, a contrastive momentum knowledge distillation module is proposed to learn the cross-modal feature representation using the online knowledge distillation strategy. By jointly optimizing the two modules, the proposed method is able to achieve efficient performance for cross-dataset text-to-image person retrieval. Extensive experiments on three publicly available text-to-image person retrieval datasets demonstrate the effectiveness of the proposed GCKD method, which consistently outperforms the state-of-the-art baselines.

AAAI Conference 2024 Conference Paper

Hypergraph-Guided Disentangled Spectrum Transformer Networks for Near-Infrared Facial Expression Recognition

  • Bingjun Luo
  • Haowen Wang
  • Jinpeng Wang
  • Junjie Zhu
  • Xibin Zhao
  • Yue Gao

With its strong robustness to illumination variations, near-infrared (NIR) imaging can be an effective and essential complement to visible (VIS) facial expression recognition in low-light or completely dark conditions. However, facial expression recognition (FER) from NIR images presents a more challenging problem than traditional FER due to the limitations imposed by the data scale and the difficulty of extracting discriminative features from incomplete visible lighting contents. In this paper, we make the first attempt at deep NIR facial expression recognition and propose a novel method called the near-infrared facial expression transformer (NFER-Former). Specifically, to make full use of the abundant label information in the VIS domain, we introduce a Self-Attention Orthogonal Decomposition mechanism that disentangles the expression information and spectrum information from the input image, so that the expression features can be extracted without the interference of spectrum variation. We also propose a Hypergraph-Guided Feature Embedding method that models key facial behaviors and learns the structure of the complex correlations between them, thereby alleviating the interference of inter-class similarity. Additionally, we construct a large NIR-VIS Facial Expression dataset that includes 360 subjects to better validate the efficiency of NFER-Former. Extensive experiments and ablation studies show that NFER-Former significantly improves the performance of NIR FER and achieves state-of-the-art results on the only two available NIR FER datasets, Oulu-CASIA and Large-HFE.

ICML Conference 2024 Conference Paper

Language Models as Science Tutors

  • Alexis Chevalier
  • Jiayi Geng
  • Alexander Wettig
  • Howard Chen 0003
  • Sebastian Mizera
  • Toni Annala
  • Max Jameson Aragon
  • Arturo Rodríguez Fanlo

NLP has recently made exciting progress toward training language models (LMs) with strong scientific problem-solving skills. However, model development has not focused on real-life use-cases of LMs for science, including applications in education that require processing long scientific documents. To address this, we introduce TutorEval and TutorChat. TutorEval is a diverse question-answering benchmark consisting of questions about long chapters from STEM textbooks, written by experts. TutorEval helps measure real-life usability of LMs as scientific assistants, and it is the first benchmark combining long contexts, free-form generation, and multi-disciplinary scientific knowledge. Moreover, we show that fine-tuning base models with existing dialogue datasets leads to poor performance on TutorEval. Therefore, we create TutorChat, a dataset of 80,000 long synthetic dialogues about textbooks. We use TutorChat to fine-tune Llemma models with 7B and 34B parameters. These LM tutors specialized in math have a 32K-token context window, and they excel at TutorEval while performing strongly on GSM8K and MATH. Our datasets build on open-source materials, and we release our models, data, and evaluations publicly.

AAAI Conference 2024 Conference Paper

Multi-Energy Guided Image Translation with Stochastic Differential Equations for Near-Infrared Facial Expression Recognition

  • Bingjun Luo
  • Zewen Wang
  • Jinpeng Wang
  • Junjie Zhu
  • Xibin Zhao
  • Yue Gao

Illumination variation has been a long-term challenge in real-world facial expression recognition (FER). Under uncontrolled or non-visible light conditions, near-infrared (NIR) can provide a simple and alternative solution to obtain high-quality images and supplement the geometric and texture details that are missing in the visible (VIS) domain. Due to the lack of large-scale NIR facial expression datasets, directly extending VIS FER methods to the NIR spectrum may be ineffective. Additionally, previous heterogeneous image synthesis methods are restricted by low controllability without prior task knowledge. To tackle these issues, we present the first approach, called NIR-FER Stochastic Differential Equations (NFER-SDE), that transforms facial expression appearance between heterogeneous modalities to tackle the overfitting problem on small-scale NIR data. NFER-SDE can take the whole VIS source image as input and, together with domain-specific knowledge, guide the preservation of modality-invariant information in the high-frequency content of the image. Extensive experiments and ablation studies show that NFER-SDE significantly improves the performance of NIR FER and achieves state-of-the-art results on the only two available NIR FER datasets, Oulu-CASIA and Large-HFE.

AAAI Conference 2023 Conference Paper

Learning Deep Hierarchical Features with Spatial Regularization for One-Class Facial Expression Recognition

  • Bingjun Luo
  • Junjie Zhu
  • Tianyu Yang
  • Sicheng Zhao
  • Chao Hu
  • Xibin Zhao
  • Yue Gao

Existing methods for facial expression recognition (FER) are mainly trained in settings where multi-class data is available. However, such methods cannot detect alien expressions that are absent during training. To address this problem, we develop a Hierarchical Spatial One-Class Facial Expression Recognition Network (HS-OCFER) which can construct the decision boundary of a given expression class (called the normal class) by training on one-class data alone. Specifically, HS-OCFER consists of three novel components. First, hierarchical bottleneck modules are proposed to enrich the representation power of the model and extract a detailed feature hierarchy from different levels. Second, multi-scale spatial regularization with facial geometric information is employed to guide the feature extraction towards emotional facial representations and prevent the model from overfitting extraneous disturbing factors. Third, compact intra-class variation is adopted to separate the normal class from alien classes in the decision space. Extensive evaluations on 4 typical FER datasets from both laboratory and wild scenarios show that our method consistently outperforms state-of-the-art One-Class Classification (OCC) approaches.

AAAI Conference 2023 Conference Paper

People Taking Photos That Faces Never Share: Privacy Protection and Fairness Enhancement from Camera to User

  • Junjie Zhu
  • Lin Gu
  • Xiaoxiao Wu
  • Zheng Li
  • Tatsuya Harada
  • Yingying Zhu

The soaring number of personal mobile devices and public cameras poses a threat to fundamental human rights and ethical principles. For example, the theft of private information such as face images by malicious third parties can lead to catastrophic consequences. Most existing protection algorithms, which manipulate the appearance of the face in the image, are effective but irreversible. Here, we propose a practical and systematic solution to invertibly protect face information along the full pipeline from camera to end users. Specifically, we design a novel lightweight Flow-based Face Encryption Method (FFEM) on the local embedded system privately connected to the camera, minimizing the risk of eavesdropping during data transmission. FFEM uses a flow-based face encoder to encode each face to a Gaussian distribution and encrypts the encoded face feature by randomly rotating the Gaussian distribution, with the rotation matrix serving as the password. While encrypted latent-variable face images are sent to users through public but less reliable channels, the password is protected through more secure channels using technologies such as asymmetric encryption, blockchain, or other sophisticated security schemes. Users may choose to decode an image with fake faces from the encrypted image on the public channel. Only trusted users are able to recover the original face using the rotation matrix transmitted over the secure channel. More interestingly, by tuning the Gaussian ball in the latent space, we can control the fairness of the replaced face with respect to attributes such as gender and race. Extensive experiments demonstrate that our solution protects privacy and enhances fairness with minimal effect on high-level downstream tasks.
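The rotation-based encryption described in this abstract can be illustrated with a minimal sketch, assuming the encoded face feature is simply a Gaussian latent vector; the paper's actual flow-based encoder and transmission pipeline are not reproduced here, and `random_rotation`, `encrypt`, and `decrypt` are hypothetical names chosen for illustration:

```python
import numpy as np

def random_rotation(d, seed):
    """Orthogonal 'password' matrix from the QR decomposition of a random matrix."""
    g = np.random.default_rng(seed).standard_normal((d, d))
    q, r = np.linalg.qr(g)
    return q * np.sign(np.diag(r))   # sign fix keeps Q orthogonal and deterministic

def encrypt(z, R):
    return R @ z                     # rotate the Gaussian-distributed latent

def decrypt(z_enc, R):
    return R.T @ z_enc               # an orthogonal matrix inverts by its transpose

z = np.random.default_rng(1).standard_normal(8)   # stand-in latent face feature
R = random_rotation(8, seed=42)                   # the shared "password"
z_back = decrypt(encrypt(z, R), R)                # exact recovery for key holders
```

Because the password matrix is orthogonal, decryption is exact for trusted key holders, while the rotated latent keeps a Gaussian distribution, consistent with the abstract's claim that untrusted users can only decode a plausible fake face.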

JBHI Journal 2018 Journal Article

An Integrated Maximum Current Density Approach for Noninvasive Detection of Myocardial Infarction

  • Chen Zhao
  • Shiqin Jiang
  • Yanhua Wu
  • Junjie Zhu
  • Dafang Zhou
  • Birgit Hailer
  • Dietrich Gronemeyer
  • Peter Van Leeuwen

We present a new approach of integrated maximum current density (IMCD) for the noninvasive detection of myocardial infarction (MI) using magnetocardiography (MCG) data acquired from a superconducting quantum interference device (SQUID) system. In this paper, we investigated the relationship between the maximum current density (MCD) in the current density map and the underlying equivalent current dipole (ECD), based on a novel method of reconstructing the ECD in the extremum circle of the magnetic field map. The performance of the IMCD and integrated ECD (IECD) approaches was also evaluated using 61-channel MCG data from 39 healthy subjects and 102 patients with ST-elevation myocardial infarction (STEMI). Statistical analysis of the healthy and STEMI groups demonstrates that the IMCD approach achieves sensitivity and specificity of up to 91.2% and 84.6%, respectively, somewhat higher than those of IECD. The results indicate that IMCD provides spatiotemporal information regarding cardiac electrical activity during ventricular repolarization. This approach may be helpful for diagnosing MI in clinical applications. The physical concept of the approach is also explained in this paper.
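The reported sensitivity and specificity follow the standard confusion-matrix definitions. The counts below are back-calculated assumptions that happen to reproduce the abstract's figures on the stated cohort sizes (102 STEMI, 39 healthy); they are not numbers taken from the paper:

```python
def sensitivity(tp, fn):
    return tp / (tp + fn)   # true-positive rate among patients with the condition

def specificity(tn, fp):
    return tn / (tn + fp)   # true-negative rate among healthy subjects

# Hypothetical counts consistent with the cohort sizes: 93 + 9 = 102, 33 + 6 = 39
sens = round(100 * sensitivity(93, 9), 1)   # 91.2
spec = round(100 * specificity(33, 6), 1)   # 84.6
```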

NeurIPS Conference 2016 Conference Paper

Unsupervised Learning from Noisy Networks with Applications to Hi-C Data

  • Bo Wang
  • Junjie Zhu
  • Armin Pourshafeie
  • Oana Ursu
  • Serafim Batzoglou
  • Anshul Kundaje

Complex networks play an important role in a plethora of disciplines in the natural sciences. Cleaning up noisy observed networks poses an important challenge in network analysis. Existing methods utilize labeled data to alleviate the effect of noise in the network. However, labeled data is usually expensive to collect, while unlabeled data can be gathered cheaply. In this paper, we propose an optimization framework to mine useful structures from noisy networks in an unsupervised manner. The key feature of our optimization framework is its ability to utilize local structures as well as global patterns in the network. We extend our method to incorporate multi-resolution networks in order to add further resistance to high levels of noise. We also generalize our framework to utilize partial labels to enhance the performance. We specifically focus our method on multi-resolution Hi-C data by recovering clusters of genomic regions that co-localize in 3D space. Additionally, we use Capture-C-generated partial labels to further denoise the Hi-C network. We empirically demonstrate the effectiveness of our framework in denoising the network and improving community detection results.
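As a generic stand-in for the community-detection side of such a framework (not the authors' optimization method), a minimal spectral sketch shows how block structure in a weighted adjacency matrix survives weak cross-community noise:

```python
import numpy as np

def spectral_bipartition(W):
    """Split nodes into two groups via the Fiedler vector of the normalized Laplacian."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)        # eigh returns eigenvalues in ascending order
    fiedler = vecs[:, 1]                  # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Two 4-node communities: strong within-block edges, weak noisy cross-block edges
W = np.full((8, 8), 0.1)
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
labels = spectral_bipartition(W)          # recovers the two blocks (up to label swap)
```

Real Hi-C matrices are far noisier and multi-resolution; this sketch only illustrates why combining local edge weights with a global spectral view helps recover co-localizing clusters.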