Arrow Research search

Author name cluster

Nan Han

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
1 author row

Possible papers


EAAI Journal 2023 Journal Article

Imbalanced data classification: Using transfer learning and active sampling

  • Yang Liu
  • Guoping Yang
  • Shaojie Qiao
  • Meiqi Liu
  • Lulu Qu
  • Nan Han
  • Guan Yuan
  • Tao Wu

Recently, deep learning models have made great breakthroughs in the field of computer vision, relying on large-scale class-balanced datasets. However, most of them do not consider class-imbalanced data. In reality, a class-imbalanced distribution can degrade model performance and reduce the generalization of these models. In addition, in the era of big data, many applications need to use real-time visual data. These data come from different mobile devices, which continuously generate a huge amount of visual data. However, few studies use real-time data from information systems: real-time data is easy to capture but difficult to use. To solve the above problems, we propose a new model (Transfer Learning Classifier, TLC) based on transfer learning to deal with class-imbalanced data. The model includes an active sampling module, a real-time data augmentation module and a DenseNet module. Among them, (1) the newly proposed active sampling module can dynamically adjust the number of samples with skewed distribution; (2) the data augmentation module can expand the real-time data to avoid over-fitting and insufficient data; (3) the DenseNet module is a standard DenseNet network pre-trained on the ImageNet dataset and transferred to TLC for relearning, and then we adjust the memory usage of the standard DenseNet to make it more efficient. In addition, we have applied a new end-to-end real-time data storage and analysis system. A large number of experiments have been carried out on four different long-tailed datasets. Experimental results show that the proposed TLC model can effectively deal with static data as well as real-time data, and that its classification performance on imbalanced data is better than that of existing models.

TIST Journal 2022 Journal Article

Algorithms for Trajectory Points Clustering in Location-based Social Networks

  • Nan Han
  • Shaojie Qiao
  • Kun Yue
  • Jianbin Huang
  • Qiang He
  • Tingting Tang
  • Faliang Huang
  • Chunlin He

Recent advances in localization techniques have fundamentally enhanced social networking services, allowing users to share their locations and location-related contents. This has further increased the popularity of location-based social networks (LBSNs) and produced a huge amount of trajectories composed of continuous and complex spatio-temporal points from people’s daily lives. How to accurately aggregate large-scale trajectories is an important and challenging task. Conventional clustering algorithms (e.g., k-means or k-medoids) cannot be directly employed to process trajectory data due to their serialization, triviality and redundancy. Aiming to overcome the drawbacks of the traditional k-means and k-medoids algorithms, including their sensitivity to the selection of the initial k value and the cluster centers and their easy convergence to a locally optimal solution, we first propose an optimized k-means algorithm (namely OKM) to obtain k optimal initial clustering centers based on the density of trajectory points. Second, because k-means is sensitive to noisy points, we propose an improved k-medoids algorithm called IKMD based on an acceptable radius r by considering users’ geographic location in LBSNs. The value of k can be calculated based on r, and the optimal k points are selected as the initial clustering centers with high densities to reduce the cost of distance calculation. Third, we thoroughly analyze the advantages of IKMD by comparing it with the commonly used clustering approaches through illustrative examples. Last, we conduct extensive experiments to evaluate the performance of IKMD against seven clustering approaches including the proposed optimized k-means algorithm, the k-medoids algorithm, the traditional density-based k-medoids algorithm and the state-of-the-art trajectory clustering methods. The results demonstrate that IKMD significantly outperforms existing algorithms in the cost of distance calculation and the convergence speed. The proposed methods are shown to contribute to a larger effort targeted at advancing the study of intelligent trajectory data analytics.

TIST Journal 2021 Journal Article

A Dynamic Convolutional Neural Network Based Shared-Bike Demand Forecasting Model

  • Shaojie Qiao
  • Nan Han
  • Jianbin Huang
  • Kun Yue
  • Rui Mao
  • Hongping Shu
  • Qiang He
  • Xindong Wu

Bike-sharing systems are becoming popular and generate a large volume of trajectory data. In a bike-sharing system, users can borrow and return bikes at different stations. In particular, a bike-sharing system will be affected by weather, the time period, and other dynamic factors, which challenges the scheduling of shared bikes. In this article, a new shared-bike demand forecasting model based on dynamic convolutional neural networks, called SDF, is proposed to predict the demand of shared bikes. SDF chooses the most relevant weather features from real weather data by using the Pearson correlation coefficient and transforms them into a two-dimensional dynamic feature matrix, taking into account the states of stations from historical data. The feature information in the matrix is extracted, learned, and trained with a newly proposed dynamic convolutional neural network to predict the demand of shared bikes in a dynamical and intelligent fashion. The phase of parameter update is optimized from three aspects: the loss function, optimization algorithm, and learning rate. Then, an accurate shared-bike demand forecasting model is designed based on the basic idea of minimizing the loss value. By comparing with classical machine learning models, the weight sharing strategy employed by SDF reduces the complexity of the network. It allows a high prediction accuracy to be achieved within a relatively short period of time. Extensive experiments are conducted on real-world bike-sharing datasets to evaluate SDF. The results show that SDF significantly outperforms classical machine learning models in prediction accuracy and efficiency.
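The Pearson-based feature selection step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' SDF implementation: the function names, the `top_n` cutoff, and the toy weather values are assumptions made for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series carries no linear signal; treat it as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(features, demand, top_n):
    """Return the names of the top_n features ranked by |Pearson r| with demand.

    `features` maps a (hypothetical) feature name to its observed values.
    """
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], demand)),
        reverse=True,
    )
    return ranked[:top_n]


# Toy data, invented for illustration.
weather = {
    "temperature": [10, 15, 20, 25],
    "humidity": [80, 60, 70, 65],
    "wind": [5, 5, 5, 5],
}
demand = [100, 150, 200, 250]
selected = select_features(weather, demand, top_n=2)
```

Ranking by the absolute value keeps strongly negative correlations as well as strongly positive ones; whether SDF does the same is not stated in the abstract.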

EAAI Journal 2020 Journal Article

A point-of-interest suggestion algorithm in Multi-source geo-social networks

  • Xi Xiong
  • Shaojie Qiao
  • Yuanyuan Li
  • Nan Han
  • Guan Yuan
  • Yongqing Zhang

Newly emerging location-based social network (LBSN) services provide users with new platforms to share interests and individual experiences based on their activity history. The problems of data sparsity and user distrust in LBSNs create a severe challenge for traditional recommender systems. Moreover, users’ behaviors in LBSNs show an obvious spatio-temporal pattern. Valuable extra information from microblog-based social networks (MBSNs) can be utilized to improve the effectiveness of POI suggestion. In this study, we propose a latent probabilistic generative model called MTAS, which can accurately capture the underlying information in users’ words extracted from both LBSNs and MBSNs by taking into consideration the decision probability, a latent variable indicating a user’s tendency to publish a review in LBSNs or MBSNs. Then, the parameters of the MTAS model can be inferred by the Gibbs sampling method in an effective manner. Based on MTAS, we design an effective framework to fulfill the top-k suggestion. Extensive experiments on two real geo-social networks show that MTAS achieves better performance than existing state-of-the-art methods.

AIIM Journal 2019 Journal Article

A novel Chinese herbal medicine clustering algorithm via artificial bee colony optimization

  • Nan Han
  • Shaojie Qiao
  • Guan Yuan
  • Ping Huang
  • Dingxiang Liu
  • Kun Yue

Traditional Chinese medicine (TCM) has become popular and been viewed as an effective clinical treatment across the world. Accordingly, there is an ever-increasing interest in performing data analysis over TCM data. To cope with the problem that traditional clustering algorithms depend excessively on empirical values when selecting cluster centers, an improved artificial bee colony algorithm is proposed to automatically select cluster centers and is applied to aggregate Chinese herbal medicines. The proposed method integrates the following new techniques: (1) improving the artificial bee colony algorithm by applying a new searching strategy of neighbour nectar, (2) employing the improved artificial bee colony algorithm to optimize the parameters of the clustering algorithm by fast search and find of density peaks (called the DP algorithm), namely the cutoff distance d_c, the local density ρ_i and the minimum distance δ_i between element i and any other element with higher density, to find the optimal cluster centers, in order to cluster herbal medicines accurately while guaranteeing runtime performance. Extensive experiments were conducted on the UCI benchmark datasets and the TCM datasets, and the results verify the effectiveness of the proposed method by comparing it with classical clustering algorithms including K-means, K-medoids and DBSCAN on multiple evaluation metrics, that is, Silhouette Coefficient, Entropy, Purity, Precision, Recall and F1-Measure. The results show that the IABC-DP algorithm outperforms other approaches with good clustering quality and accuracy on both the UCI and the TCM datasets. In addition, the improved artificial bee colony algorithm can effectively reduce the number of iterations when compared to the traditional bee colony algorithm. In particular, the IABC-DP algorithm is applied to cluster multi-dimensional Chinese herbal medicines, and the result shows that it outperforms other clustering algorithms on this task, which can contribute to a larger effort targeted at advancing the study of discovering composition rules of traditional Chinese prescriptions.
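The two density-peaks quantities named in this abstract, the local density and the minimum distance to a denser element, can be computed as in the following minimal sketch. It follows the standard cutoff-kernel definitions of the DP algorithm, not the IABC-DP variant itself: the cutoff distance is passed in by hand here (the abstract says the bee colony search tunes it), and the function name and toy points are assumptions.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta) for each point under the cutoff-kernel DP definitions.

    rho[i]   = number of other points closer than the cutoff distance d_c
    delta[i] = distance to the nearest point with strictly higher rho
               (for a point of maximal density: its largest distance to any point)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Toy 2-D points, invented for illustration: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
rho, delta = density_peaks(points, d_c=2.0)
```

Points with both a high local density and a large distance to any denser point are the candidates for cluster centers in the DP algorithm.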

EAAI Journal 2019 Journal Article

Identification of DNA–protein binding sites by bootstrap multiple convolutional neural networks on sequence information

  • Yongqing Zhang
  • Shaojie Qiao
  • Shengjie Ji
  • Nan Han
  • Dingxiang Liu
  • Jiliu Zhou

Identification of DNA–protein binding sites in protein sequences plays an essential role in a wide variety of biological processes. In particular, there are huge volumes of protein sequences accumulated in the post-genomic era. In this study, we propose a new prediction approach appropriate for imbalanced DNA–protein binding site data. Specifically, motivated by the imbalanced distribution of DNA–protein binding and non-binding sites, we employ the Adaptive Synthetic Sampling (ADASYN) approach to over-sample the positive data and a bootstrap strategy to under-sample the negative data, in order to balance the number of binding and non-binding samples. Furthermore, we employ three types of features, namely the position-specific scoring matrix, one-hot encoding and predicted solvent accessibility, to encode the sequence-based features of each protein residue. In addition, we design an ensemble convolutional neural network classifier to handle the imbalance problem between binding and non-binding sites in protein sequences. Extensive experiments were conducted on the real DNA–protein binding site datasets PDNA-543, PDNA-224 and PDNA-316 in order to validate the effectiveness of our method in predicting binding sites using ten-fold cross-validation. The experimental results demonstrate that our method achieves a high prediction performance and outperforms the state-of-the-art sequence-based DNA–protein binding site predictors in terms of Sensitivity, Specificity, Accuracy, Precision and the Matthews Correlation Coefficient (MCC). Our method obtains MCC values of 0.63, 0.48 and 0.67 on the PDNA-543, PDNA-224 and PDNA-316 datasets, respectively. Compared with the state-of-the-art prediction models, the MCC values for our method are increased by at least 0.24, 0.13 and 0.23 on the PDNA-543, PDNA-224 and PDNA-316 datasets, respectively.
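As background for the MCC figures quoted in this abstract, the metric can be computed from a binary confusion matrix as in the minimal sketch below. This is the standard definition, not code from the paper: the function name and the example counts are invented for illustration.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when any row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced case: 10 binding sites, 90 non-binding sites.
# Accuracy is 0.95 here, yet half of the binding sites are missed.
score = mcc(tp=5, tn=90, fp=0, fn=5)
```

Because MCC uses all four cells of the confusion matrix, it penalizes a classifier that neglects the minority class even when accuracy stays high. That property makes it a common choice for imbalanced problems such as binding-site prediction.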

EAAI Journal 2018 Journal Article

SocialMix: A familiarity-based and preference-aware location suggestion approach

  • Shaojie Qiao
  • Nan Han
  • Jiliu Zhou
  • Rong-Hua Li
  • Cheqing Jin
  • Louis Alberto Gutierrez

Traditionally, location suggestion systems have employed collaborative filtering models to make recommendations for users based on data gathered from users with similar interests, demographics, and check-in records. However, these techniques fail to take into account one very important element present in online social networks: the online relationships that these users maintain. Arguably, this is the most important aspect of their online profiles, often more revealing than their self-reported personal interests and check-in records. Aiming to improve the accuracy and novelty of recommendations, this research proposes a hybrid location suggestion algorithm, called SocialMix, which takes into full consideration a user’s familiarity and preference (interest) similarity, along with relationships. In the first part of this study, we compute the degrees of familiarity between users using three feature variables: the number of mutual friends, the Jaccard index and cosine similarity. In order to determine the weights of these feature variables, maximum likelihood estimation is used, and then the features are fit to a Logistic Regression model in order to calculate the degrees of familiarity. In the second part of this research, we present a new method for calculating similarity between individuals by integrating users’ familiarity and preference similarity. This allows us to introduce a new location interest degree calculation method based on the hybrid similarity. Extensive experiments were conducted on several real datasets. The performance of SocialMix was analyzed for both accuracy and time complexity using the following metrics: MAE (mean absolute error), RMSE (root mean square error), Precision, Recall, F-measure, Coverage rate, Popularity and Response time. Results were compared against classical recommendation approaches as a baseline. The results show that the accuracy and time performance of SocialMix are demonstrably improved when compared with other algorithms which do not consider social relationships. In addition, a positive by-product worth noting is that SocialMix has a tendency to recommend more obscure but still interesting locations.
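The three familiarity features listed in this abstract can be sketched as follows. The abstract does not say which data the Jaccard index and cosine similarity are computed over, so this sketch assumes they are taken over the two users' friend sets. The function names, weights, and bias are hypothetical placeholders; in the paper the weights are estimated by maximum likelihood.

```python
import math


def familiarity_features(friends_a, friends_b):
    """Return (mutual friend count, Jaccard index, cosine similarity).

    Both arguments are sets of friend identifiers (an assumption of this sketch).
    """
    mutual = len(friends_a & friends_b)
    union = len(friends_a | friends_b)
    jaccard = mutual / union if union else 0.0
    # Cosine similarity of the two binary friend-membership vectors.
    norm = math.sqrt(len(friends_a) * len(friends_b))
    cosine = mutual / norm if norm else 0.0
    return mutual, jaccard, cosine


def familiarity(features, weights, bias):
    """Logistic (sigmoid) score in (0, 1) from a weighted sum of the features."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Toy friend lists, invented for illustration.
alice = {"u1", "u2", "u3"}
bob = {"u2", "u3", "u4", "u5"}
feats = familiarity_features(alice, bob)
# Placeholder weights and bias; real values would come from fitting on labeled pairs.
score = familiarity(feats, weights=(0.5, 1.0, 1.0), bias=-1.0)
```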