Network Intrusion Detection System: Phishing

  • Tech Stack: Python, scikit-learn, Machine Learning Algorithms (Gradient Boosting Classifier, CatBoost Classifier, XGBoost Classifier, Multi-layer Perceptron, Random Forest, Support Vector Machine, Decision Tree, K-Nearest Neighbors, Logistic Regression, and Naive Bayes Classifier)

This project develops a machine learning model for detecting phishing attacks, a critical step towards safeguarding computer systems against cyber threats. The model applies several algorithms and feature selection techniques to a dataset of phishing URLs and email text. The approach uses two classifiers: one for URLs and one for text. The URL classifier examines URL features such as length, the presence of an '@' symbol, the use of an IP address in place of a domain name, and prefixes or suffixes in the domain name to detect signs of phishing. The text classifier examines word frequencies and the semantic content of emails to further improve detection accuracy.
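The URL and text features described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function names `extract_url_features` and `word_frequencies` are hypothetical, and treating a hyphen in the host as the "prefix/suffix" signal is an assumption about how that feature is defined.

```python
import ipaddress
import re
from collections import Counter
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Compute a few hand-crafted URL features (hypothetical helper)."""
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""

    # Check whether the host is a literal IP address rather than a domain name.
    try:
        ipaddress.ip_address(host)
        has_ip = True
    except ValueError:
        has_ip = False

    return {
        "url_length": len(url),
        "has_at_symbol": "@" in url,
        "has_ip_address": has_ip,
        # Assumption: a hyphen in the host stands in for the prefix/suffix feature.
        "has_hyphen_in_domain": "-" in host,
    }


def word_frequencies(text: str) -> Counter:
    """Count lowercase word occurrences in an email body (hypothetical helper)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


if __name__ == "__main__":
    print(extract_url_features("http://192.168.0.1/login"))
    print(word_frequencies("Verify your account. Verify now!").most_common(3))
```

Feature dictionaries like these could then be turned into numeric vectors and passed to any of the classifiers listed in the tech stack.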

By combining multiple ML algorithms (Gradient Boosting, XGBoost, CatBoost, Naive Bayes, Logistic Regression, Decision Tree, KNN, Random Forest, SVM, and Multi-layer Perceptron), the model provides robust detection capabilities. Feature selection and dimensionality reduction techniques, such as principal component analysis (PCA), reduce the inputs to the attributes and components most useful for identifying phishing emails, making the model more efficient. Ultimately, the project aims to protect computer systems and individuals against phishing attacks by identifying and blocking suspicious emails and URLs.
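One plausible way to combine PCA with several of the listed classifiers is to wrap each one in the same scikit-learn pipeline and compare cross-validated accuracy. The sketch below makes several assumptions: it uses synthetic data from `make_classification` in place of the phishing dataset, covers only three of the ten algorithms, and picks the number of components and folds arbitrarily. It is not the project's actual training code or results.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the extracted URL/text features (assumption, not real data).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, random_state=0
)

# A subset of the algorithms named above; the others would plug in the same way.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, clf in candidates.items():
    # Scale, project onto 10 principal components, then classify.
    model = make_pipeline(StandardScaler(), PCA(n_components=10), clf)
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

# Report mean accuracy per model, best first.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Keeping the scaler and PCA inside the pipeline means they are refit on each training fold, which avoids leaking information from the held-out fold into the comparison.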