Arrow Research search

Author name cluster

Jintao Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
1 author row

Possible papers


AAAI Conference 2026 Conference Paper

Priority-Based Graph-Enhanced Reinforcement Learning for Robust Analog Circuit Optimization

  • Jintao Li
  • Zhenxin Chen
  • Sicheng He
  • Ao-Jin Li
  • Shui Yu

A primary motivation for analog integrated circuit (IC) design automation is the inefficiency of manual design in meeting increasingly stringent specifications, which often involve more than 10 objectives. Reinforcement learning (RL) has recently emerged as a promising approach, yet gaps remain when full design specifications are considered, especially under process-voltage-temperature (PVT) variations. Excessive objectives lead to diminished reward signals, while varying PVT conditions produce conflicting gradients; both result in inefficient exploration. To address these issues, we propose a priority-based graph-enhanced RL framework. Specifically, fuzzy logic converts quantitative rewards into qualitative priority signals, mitigating reward deterioration and enhancing exploration via entropy regularization. Furthermore, a graph-based representation compresses high-dimensional objective spaces under PVT variations into low-dimensional manifolds, enabling dynamic resource allocation to variation-sensitive regions and resolving gradient conflicts. Empirical results on various real-world analog ICs demonstrate that our method significantly outperforms existing RL algorithms, achieving superior solution quality with reduced simulation overhead.

IJCAI Conference 2024 Conference Paper

MISA: MIning Saliency-Aware Semantic Prior for Box Supervised Instance Segmentation

  • Hao Zhu
  • Yan Zhu
  • Jiayu Xiao
  • Yike Ma
  • Yucheng Zhang
  • Jintao Li
  • Feng Dai

Box-supervised instance segmentation (BSIS) aims to achieve an effective trade-off between annotation cost and model performance by relying solely on bounding-box annotations during training. However, we observe that a BSIS model is bottlenecked by its intricate objective under limited guidance and tends to sacrifice segmentation capability in order to recognize multiple instances effectively. To boost the BSIS model's perception of object shape and contour, we introduce MISA, that is, MIning a Saliency-Aware semantic prior from a well-optimized box-supervised semantic segmentation (BSSS) network and incorporating cross-model guidance into the learning process of BSIS. Specifically, we first design a Frequency-Space Distillation (FSD) module to extract assorted salient prior knowledge from the BSSS model and perform cross-model alignment to transfer the prior to the BSIS model. Furthermore, we introduce Semantic-Enhanced Pairwise Affinity (SEPA), which borrows the object-perception ability of the BSSS model to emphasize the contribution of salient objects to pairwise affinity, providing more accurate guidance for the BSIS network. Extensive experiments show that the proposed MISA consistently surpasses existing state-of-the-art methods by a large margin in the BSIS scenario.

AAAI Conference 2023 Conference Paper

ERASER: AdvERsArial Sensitive Element Remover for Image Privacy Preservation

  • Guang Yang
  • Juan Cao
  • Danding Wang
  • Peng Qi
  • Jintao Li

The daily practice of online image sharing enriches our lives but also raises a severe issue of privacy leakage. To mitigate the privacy risks of image sharing, some researchers modify the sensitive elements in images with visual obfuscation methods, including traditional ones such as blurring and pixelating as well as generative ones based on deep learning. However, images processed by such methods may be recovered or recognized by models, so privacy cannot be guaranteed. Further, traditional methods make images look very unnatural, with low image quality. Although generative methods produce better images, most of them suffer from insufficiency in the frequency domain, which degrades image quality. We therefore propose the AdvERsArial Sensitive Element Remover (ERASER) to guarantee both image privacy and image quality. 1) To preserve image privacy, for regions containing sensitive elements, ERASER guarantees sufficient difference after they are modified in an adversarial way. Specifically, we take both the region and global content into consideration with a Prior Transformer and obtain the corresponding region prior and global prior. Based on these priors, ERASER is trained with an adversarial Difference Loss to make the content in the regions different. As a result, ERASER can preserve the main structure while changing the texture of the target regions for image privacy preservation. 2) To guarantee image quality, ERASER improves on the frequency insufficiency of current generative methods. Specifically, the region prior and global prior are processed with Fast Fourier Convolution to capture characteristics and achieve consistency in both the pixel and frequency domains. Quantitative analyses demonstrate that the proposed ERASER achieves a balance between image quality and image privacy preservation, while qualitative analyses demonstrate that ERASER indeed reduces the privacy risk from the visual-perception perspective.

AAAI Conference 2022 Conference Paper

DRAG: Dynamic Region-Aware GCN for Privacy-Leaking Image Detection

  • Guang Yang
  • Juan Cao
  • Qiang Sheng
  • Peng Qi
  • Xirong Li
  • Jintao Li

The daily practice of sharing images on social media raises a severe issue of privacy leakage. To address this issue, privacy-leaking image detection has recently been studied, with the goal of automatically identifying images that may leak privacy. Recent advances on this task benefit from focusing on crucial objects via pretrained object detectors and modeling their correlation. However, these methods have two limitations: 1) they neglect other important elements such as scenes, textures, and objects beyond the capacity of pretrained object detectors; 2) the correlation among objects is fixed, but a fixed correlation is not appropriate for all images. To overcome these limitations, we propose the Dynamic Region-Aware Graph Convolutional Network (DRAG), which dynamically finds crucial regions, including objects and other important elements, and models their correlation adaptively for each input image. To find crucial regions, we cluster spatially correlated feature channels into several region-aware feature maps. Further, we dynamically model the correlation with the self-attention mechanism and explore the interaction among regions with a graph convolutional network. DRAG achieved an accuracy of 87% on the largest dataset for privacy-leaking image detection, 10 percentage points higher than the state of the art. A further case study demonstrates that it finds crucial regions containing not only objects but also other important elements such as textures. The code and more details are at https://github.com/guangyanng/DRAG.

NeurIPS Conference 2022 Conference Paper

HSDF: Hybrid Sign and Distance Field for Modeling Surfaces with Arbitrary Topologies

  • Li Wang
  • Jie Yang
  • Weikai Chen
  • Xiaoxu Meng
  • Bo Yang
  • Jintao Li
  • Lin Gao

Neural implicit functions based on the signed distance field (SDF) have achieved impressive progress in reconstructing 3D models with high fidelity. However, such approaches can only represent closed shapes. Recent works based on the unsigned distance function (UDF) have been proposed to handle both watertight and open surfaces. Nonetheless, as the UDF is signless, its direct output is limited to a point cloud, which imposes the additional challenge of extracting high-quality meshes from discrete points. To address this issue, we present a new learnable implicit representation, dubbed HSDF, that connects the good ends of SDF and UDF. In particular, HSDF is able to represent arbitrary topologies containing both closed and open surfaces while remaining compatible with existing iso-surface extraction techniques for easy field-to-mesh conversion. In addition to predicting a UDF, we propose to learn an additional sign field via a simple classifier. Unlike the traditional SDF, HSDF is able to locate the surface of interest before level-surface extraction by generating surface points following NDF (Chibane et al., 2020). We are then able to obtain open surfaces via an adaptive meshing approach that only instantiates regions containing the surface into a polygon mesh. We also propose HSDF-Net, a dedicated learning framework that factorizes the learning of HSDF into two easier problems. Experiments on multiple datasets show that HSDF outperforms state-of-the-art techniques both qualitatively and quantitatively.

IJCAI Conference 2018 Conference Paper

High Resolution Feature Recovering for Accelerating Urban Scene Parsing

  • Rui Zhang
  • Sheng Tang
  • Luoqi Liu
  • Yongdong Zhang
  • Jintao Li
  • Shuicheng Yan

Both accuracy and speed are equally important in urban scene parsing. Most existing methods mainly focus on improving parsing accuracy, ignoring the problem of low inference speed caused by large-sized input and high-resolution feature maps. To tackle this issue, we propose a High Resolution Feature Recovering (HRFR) framework to accelerate a given parsing network. A Super-Resolution Recovering module is employed to recover features of the large original-sized images from features of the down-sampled input. Our framework can therefore combine the advantages of (1) the fast speed of networks with down-sampled input and (2) the high accuracy of networks with large original-sized input. Additionally, we employ auxiliary intermediate supervision and boundary-region re-weighting to facilitate the optimization of the network. Extensive experiments on the challenging Cityscapes and CamVid datasets demonstrate the effectiveness of the proposed HRFR framework, which accelerates scene-parsing inference by about a 3.0x speedup from 1/2 down-sampled input with negligible accuracy reduction.

IJCAI Conference 2017 Conference Paper

Global-residual and Local-boundary Refinement Networks for Rectifying Scene Parsing Predictions

  • Rui Zhang
  • Sheng Tang
  • Min Lin
  • Jintao Li
  • Shuicheng Yan

Most existing scene parsing methods suffer from the serious problems of inconsistent parsing results and object-boundary shift. To tackle these problems, we first propose an iterative Global-residual Refinement Network (GRN) that exploits global contextual information to predict parsing residuals and iteratively smooth inconsistent parsing labels. Furthermore, we propose a Local-boundary Refinement Network (LRN) to learn position-adaptive propagation coefficients so that local contextual information from neighbors can be optimally captured for refining object boundaries. Finally, we cascade the two proposed refinement networks after a fully residual convolutional neural network within a unified framework. Extensive experiments on the ADE20K and Cityscapes datasets demonstrate the effectiveness of the two refinement methods for rectifying scene parsing predictions.

IJCAI Conference 2017 Conference Paper

Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks

  • Bo Wu
  • Wen-Huang Cheng
  • Yongdong Zhang
  • Qiushi Huang
  • Jintao Li
  • Tao Mei

Prediction of popularity has a profound impact on social media, since it offers opportunities to reveal individual preference and public attention in evolving social systems. Previous research, although achieving promising results, neglects one distinctive characteristic of social data, i.e., sequentiality. For example, the popularity of online content is generated over time from sequential post streams on social media. To investigate sequential popularity prediction, we propose a novel prediction framework called Deep Temporal Context Networks (DTCN) that takes both temporal context and temporal attention into account. DTCN contains three main components, spanning embedding, learning, and prediction. With a joint embedding network, we obtain a unified deep representation of multi-modal user-post data in a common embedding space. Then, based on the embedded data sequence over time, temporal context learning recurrently learns two adaptive temporal contexts for sequential popularity. Finally, a novel temporal attention is designed to predict new popularity (the popularity of a new user-post pair) with temporal coherence across multiple time scales. Experiments on our released image dataset of about 600K Flickr photos demonstrate that DTCN outperforms state-of-the-art deep prediction algorithms, with an average relative improvement of 21.51% in popularity prediction (Spearman rank correlation).

TIST Journal 2017 Journal Article

Sparse Online Learning of Image Similarity

  • Xingyu Gao
  • Steven C. H. Hoi
  • Yongdong Zhang
  • Jianshe Zhou
  • Ji Wan
  • Zhenyu Chen
  • Jintao Li
  • Jianke Zhu

Learning image similarity plays a critical role in real-world multimedia information retrieval applications, especially Content-Based Image Retrieval (CBIR) tasks, in which accurate retrieval of visually similar objects relies largely on an effective image similarity function. Crafting a good similarity function is very challenging because the visual contents of images are often represented as feature vectors in high-dimensional spaces, for example via bag-of-words (BoW) representations, and traditional rigid similarity functions, such as cosine similarity, are often suboptimal for CBIR tasks. In this article, we address this fundamental problem, that is, learning to optimize image similarity with sparse and high-dimensional representations from large-scale training data, and propose a novel scheme of Sparse Online Learning of Image Similarity (SOLIS). In contrast to many existing image-similarity learning algorithms that are designed to work with low-dimensional data, SOLIS is able to learn image similarity from large-scale image data in sparse and high-dimensional spaces. Our encouraging results show that the proposed technique achieves highly competitive accuracy compared to state-of-the-art approaches while enjoying significant advantages in computational efficiency, model sparsity, and retrieval scalability, making it more practical for real-world multimedia retrieval applications.

IJCAI Conference 2015 Conference Paper

Online Learning to Rank for Content-Based Image Retrieval

  • Ji Wan
  • Pengcheng Wu
  • Steven C. H. Hoi
  • Peilin Zhao
  • Xingyu Gao
  • Dayong Wang
  • Yongdong Zhang
  • Jintao Li

A major challenge in Content-Based Image Retrieval (CBIR) is to bridge the semantic gap between low-level image contents and high-level semantic concepts. Although researchers have investigated a variety of retrieval techniques using different types of features and distance functions, no single best retrieval solution can fully tackle this challenge. In a real-world CBIR task, it is often highly desired to combine multiple types of different feature representations and diverse distance measures in order to close the semantic gap. In this paper, we investigate a new framework of learning to rank for CBIR, which aims to seek the optimal combination of different retrieval schemes by learning from large-scale training data in CBIR. We first formulate the problem formally as a learning to rank task, which can be solved in general by applying the existing batch learning to rank algorithms from text information retrieval (IR). To further address the scalability towards large-scale online CBIR applications, we present a family of online learning to rank algorithms, which are significantly more efficient and scalable than classical batch algorithms for large-scale online CBIR. Finally, we conduct an extensive set of experiments, in which encouraging results show that our technique is effective, scalable and promising for large-scale CBIR.

TIST Journal 2014 Journal Article

A Unified Geolocation Framework for Web Videos

  • Yicheng Song
  • Yongdong Zhang
  • Juan Cao
  • Jinhui Tang
  • Xingyu Gao
  • Jintao Li

In this article, we propose a unified geolocation framework to automatically determine where on earth a web video was shot. We analyze different social, visual, and textual relationships in a real-world dataset and find four relationships with apparent geographic clues that can be used for web video geolocation. The geolocation process is then formulated as an optimization problem that simultaneously takes the social, visual, and textual relationships into consideration. The optimization problem is solved by an iterative procedure, which can be interpreted as a propagation of geographic information through the web video social network. Extensive experiments on a real-world dataset clearly demonstrate the effectiveness of our proposed framework, with geolocation accuracy higher than that of state-of-the-art approaches.

AAAI Conference 2014 Conference Paper

SOML: Sparse Online Metric Learning with Application to Image Retrieval

  • Xingyu Gao
  • Steven C.H. Hoi
  • Yongdong Zhang
  • Ji Wan
  • Jintao Li

Image similarity search plays a key role in many multimedia applications, where multimedia data (such as images and videos) are usually represented in a high-dimensional feature space. In this paper, we propose a novel Sparse Online Metric Learning (SOML) scheme for learning sparse distance functions from large-scale high-dimensional data and explore its application to image retrieval. In contrast to many existing distance metric learning algorithms that are often designed for low-dimensional data, the proposed algorithms are able to learn sparse distance metrics from high-dimensional data in an efficient and scalable manner. Our experimental results show that the proposed method achieves accuracy better than, or at least comparable to, state-of-the-art non-sparse distance metric learning approaches, while enjoying a significant advantage in computational efficiency and sparsity, making it more practical for real-world applications.