4 Dark Web Content Classification Using Quantum Encoding

Ashwini Dalvi1*, Soham Bhoir2, Faruk Kazi1 and S. G. Bhirud1

1Veermata Jijabai Technological Institute, Mumbai, India

2K. J. Somaiya College of Engineering, Mumbai, India

Abstract

The study of cyber terrorism is still in its infancy. Nevertheless, researchers and security professionals treat data collected from the dark web as one input to proactive cybersecurity measures against cyber threats and cyber terrorism. Consequently, the classification of dark web content has been researched extensively in the literature, with approaches ranging from machine learning to deep learning. Particular challenges remain in classifying dark web hidden services, for example, the limited datasets available for labelling hidden services and the substantial computing and storage resources required to manage raw, unlabelled dark web data.

The proposed work presents a quantum encoding–based approach to categorizing Tor hidden services. First, a dark web crawler crawls the Tor dark web to fetch hidden services. A classical model classifies the hidden services into 12 categories: law and government, forum, streaming services, social networking sites, food, travel, games, health and fitness, education, computer and technology, e-commerce, and business/corporate. The keyword dataset used to label hidden services is created by scraping and cleaning surface web pages for each of the 12 categories. Thus ...
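As a rough illustration of the keyword-based labelling step described in the abstract, the following Python sketch scores a page against per-category keyword lists and then normalizes the scores into a unit vector, as amplitude encoding would require. Everything here is an assumption made for illustration: the keyword lists, the function names, and the choice of amplitude encoding are hypothetical, since the abstract does not state which encoding scheme or scoring rule the authors use.

```python
import math
import re
from collections import Counter

# Hypothetical keyword lists for three of the 12 categories. The chapter
# builds its own lists from scraped and cleaned surface web pages.
CATEGORY_KEYWORDS = {
    "forum": {"thread", "post", "reply", "member"},
    "e-commerce": {"cart", "price", "shipping", "order"},
    "health and fitness": {"workout", "diet", "fitness", "health"},
}


def keyword_counts(text):
    """Count how many keyword occurrences each category has in the text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        category: sum(tokens[word] for word in words)
        for category, words in CATEGORY_KEYWORDS.items()
    }


def label(text):
    """Return the category with the most keyword hits, or None if none match."""
    counts = keyword_counts(text)
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def amplitude_encode(counts):
    """Pad the count vector to a power-of-two length and scale it to unit norm.

    Amplitude encoding is an assumed choice: a vector of length 2**n with
    unit Euclidean norm can serve as the amplitudes of an n-qubit state.
    """
    values = [float(v) for v in counts.values()]
    size = 1 << max(1, math.ceil(math.log2(len(values))))
    values += [0.0] * (size - len(values))
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        raise ValueError("cannot encode a page with no keyword hits")
    return [v / norm for v in values]
```

With all 12 categories, the count vector would be padded to 16 entries, which corresponds to a 4-qubit state under this assumed encoding.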
