SINGLE STAGE DEEP TRANSFER LEARNING MODEL FOR APPAREL DETECTION AND CLASSIFICATION FOR E-COMMERCE

Authors

  • Ssvr Kumar Addagarla, Vellore Institute of Technology (VIT)
  • Anthoniraj Amalanathan, Vellore Institute of Technology (VIT)

DOI:

https://doi.org/10.7903/ijecs.1953

Keywords:

Custom Object Detection, YOLOv3, Spatial Pyramid Pooling, Color Space, Apparel Detection

Abstract

Although many computer-vision-based object detection techniques have evolved over the past decade, they still suffer from inconsistent detection accuracy, especially for multi-class classification problems. This paper proposes a Single Stage Deep Transfer Learning Model (SS-DTLM) for multi-class apparel detection. The model uses a customized YOLOv3 algorithm that incorporates 3-level Spatial Pyramid Pooling (SPP), a multi-scale image feature extractor, for faster and reasonably accurate apparel detection and classification. The approach achieves a reasonable Mean Average Precision (mAP) together with reliable object detection and classification. The model was trained and tested on the Open Images Dataset (OIDV4) with 6 object classes and on a custom-built Apparel Dataset with 5 apparel classes. The experimental results are compared with the baseline YOLOv3 and YOLOv3-Tiny algorithms. The paper also examines various color spaces of the images detected by SS-DTLM, applying the K-Means clustering algorithm for further analysis.
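To make the idea of a 3-level SPP block concrete, the following is a minimal NumPy sketch of one common way such a block is built in YOLOv3 variants: three stride-1 max-pooling branches whose outputs are concatenated with the input along the channel axis. It is not the authors' implementation. The kernel sizes (5, 9, 13) are an assumption borrowed from the widely used YOLOv3-SPP configuration, and the function names `max_pool_same` and `spp_block` are hypothetical.

```python
import numpy as np


def max_pool_same(x, k):
    """Stride-1 max pooling with 'same' padding on a (C, H, W) array; k must be odd."""
    x = np.asarray(x, dtype=float)
    pad = k // 2
    # Pad with -inf so padded cells can never win the maximum.
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), constant_values=-np.inf)
    _, h, w = x.shape
    out = np.full(x.shape, -np.inf)
    # Running maximum over every offset inside the k x k window.
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[:, dy:dy + h, dx:dx + w])
    return out


def spp_block(x, kernel_sizes=(5, 9, 13)):
    """Concatenate the input with its pooled versions along the channel axis.

    kernel_sizes is an assumed setting, not a value stated in the abstract.
    """
    x = np.asarray(x, dtype=float)
    pooled = [max_pool_same(x, k) for k in kernel_sizes]
    return np.concatenate([x] + pooled, axis=0)


# Toy feature map: 4 channels on a 13 x 13 grid.
features = np.random.default_rng(0).normal(size=(4, 13, 13))
out = spp_block(features)
print(out.shape)  # (16, 13, 13)
```

Because every branch keeps the spatial size, the block only multiplies the channel count (here by four), which is why it can be dropped into a detector head without changing the grid on which boxes are predicted.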

Author Biographies

Ssvr Kumar Addagarla, Vellore Institute of Technology (VIT)

Research Scholar, School of Computer Science and Engineering

Anthoniraj Amalanathan, Vellore Institute of Technology (VIT)

Associate Professor, School of Computer Science and Engineering; Deputy Director, Software Development Cell.

Published

2021-09-01

Issue

Section

Computer science relative issues