Objective

The objective of this analysis is to develop a robust and accurate machine learning model for detecting credit card fraud. This involves predicting whether a given credit card transaction is fraudulent based on a set of features extracted from transaction data.

This is a supervised binary classification task: each transaction is labeled as either fraudulent or normal.

1. Get data

2. Explore data

2.1 Dataset information and types

The dataset is highly imbalanced in favor of normal transactions. Because a model sees far more normal transactions than fraudulent ones during training, models built on the raw data would be biased toward predicting the normal class.
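A minimal sketch of how the imbalance can be quantified, and why plain accuracy is misleading on such data. The toy labels and the column name `Class` (1 = fraud, 0 = normal) are assumptions made for illustration, not values taken from the actual dataset.

```python
import pandas as pd

# Hypothetical toy labels: 995 normal (0) and 5 fraudulent (1) transactions.
# The column name "Class" is an assumption.
df = pd.DataFrame({"Class": [0] * 995 + [1] * 5})

# Absolute number of rows per class.
counts = df["Class"].value_counts()

# Share of fraudulent transactions in the data.
fraud_share = counts.loc[1] / counts.sum()

# Accuracy of a trivial model that always predicts "normal".
baseline_accuracy = counts.loc[0] / counts.sum()

print(counts)
print(f"fraud share: {fraud_share:.3%}")
print(f"always-normal accuracy: {baseline_accuracy:.3%}")
```

On data like this, a model that never flags fraud still scores a very high accuracy, which is why the class balance is worth checking before any model is trained.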

2.2 Visualize data

3. Split data to train and test sets
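One way to split imbalanced data is a stratified split, which keeps the fraud ratio the same in both sets. The sketch below uses synthetic data; the 80/20 ratio, the `stratify` option, and the random seeds are assumptions, since the split settings used in this analysis are not stated here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the transaction features and labels (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.array([0] * 980 + [1] * 20)  # 2% fraud

# stratify=y preserves the class proportions in both the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print("train fraud share:", y_train.mean())
print("test fraud share:", y_test.mean())
```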

4. Undersampling technique
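Random undersampling can be sketched as follows: rows of the majority class are dropped at random until every class is as large as the rarest one. The helper `undersample`, the column names, and the toy data are hypothetical. Following the order of the sections, the sketch assumes undersampling is applied to the training set only, so the test set keeps its original class distribution.

```python
import pandas as pd

def undersample(df, label_col="Class", random_state=42):
    """Randomly sample each class down to the size of the rarest class."""
    n_min = df[label_col].value_counts().min()
    parts = [
        group.sample(n=n_min, random_state=random_state)
        for _, group in df.groupby(label_col)
    ]
    # Concatenate the per-class samples and shuffle the rows.
    balanced = pd.concat(parts).sample(frac=1, random_state=random_state)
    return balanced.reset_index(drop=True)

# Hypothetical training set: 95 normal rows and 5 fraud rows.
train = pd.DataFrame({"Amount": range(100), "Class": [0] * 95 + [1] * 5})
balanced = undersample(train)

print(balanced["Class"].value_counts())
```

The trade-off is that most of the normal transactions are discarded, so the balanced training set is much smaller than the original one.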

5. Explore train set in more detail

5.1 Data preprocessing

5.1.1 Drop duplicate samples

5.1.2 Feature scaling

5.2 Correlations

V4 has the strongest positive correlation with the target (whether a transaction is fraud or normal). V14 has the strongest negative correlation with the target.
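A ranking like this can be obtained by computing each feature's Pearson correlation with the label and sorting the result. The sketch below uses synthetic features; the names `V_pos`, `V_neg`, `V_noise`, and `Class` are hypothetical and do not come from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data: one feature rises with the label, one falls, one is noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
toy = pd.DataFrame({
    "V_pos": y + rng.normal(scale=0.5, size=500),
    "V_neg": -y + rng.normal(scale=0.5, size=500),
    "V_noise": rng.normal(size=500),
    "Class": y,
})

# Correlation of every feature with the target, sorted from most negative
# to most positive.
corr_with_target = toy.corr()["Class"].drop("Class").sort_values()

most_negative = corr_with_target.idxmin()
most_positive = corr_with_target.idxmax()

print(corr_with_target)
print("most negative:", most_negative, "| most positive:", most_positive)
```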

5.2.1 Reduce redundant dimensionality - Multicollinearity

6. Automating preprocessing using pipelines

Great! We have a preprocessing pipeline that takes the full dataset and applies the appropriate transformations to each column.
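As an illustration, a column-wise preprocessing pipeline of this kind could be built with scikit-learn's `ColumnTransformer`. Which columns are scaled (`Time` and `Amount` here) and which are passed through unchanged are assumptions, as is the toy data; the actual pipeline may apply different transformations.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Columns assumed to need scaling; all other columns are passed through.
scale_cols = ["Time", "Amount"]

preprocess = ColumnTransformer(
    transformers=[("scale", StandardScaler(), scale_cols)],
    remainder="passthrough",
)
pipeline = Pipeline([("preprocess", preprocess)])

# Hypothetical toy frame standing in for the transaction data.
toy = pd.DataFrame({
    "Time": [0.0, 10.0, 20.0, 30.0],
    "Amount": [1.0, 2.0, 3.0, 4.0],
    "V1": [0.1, -0.2, 0.3, -0.4],
})

# Scaled columns come first in the output, followed by the passthrough columns.
prepared = pipeline.fit_transform(toy)
print(prepared)
```

Fitting the pipeline on the training set and calling only `transform` on the test set keeps test-set statistics out of the scaling step.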

7. Select and train models

7.1 Logistic regression

7.2 KNN

7.3 Support vector machine

7.4 Decision tree classifier

8. Fine-tune models
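One common way to fine-tune a model is a cross-validated grid search over its hyperparameters. The sketch below tunes the regularization strength `C` of a logistic regression on synthetic data; the grid values, the F1 scoring metric, and the 5-fold setting are assumptions rather than the settings used in this analysis.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data (assumption).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Candidate values for the inverse regularization strength.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Evaluate every candidate with 5-fold cross-validation, scored by F1.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="f1",
    cv=5,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated F1:", search.best_score_)
```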

9. Evaluate the model on the test set
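Because the classes are imbalanced, metrics such as precision and recall are usually more informative than accuracy on the test set. The sketch below computes them from hypothetical true and predicted labels; it does not reproduce results from this analysis.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical test labels and predictions (1 = fraud, 0 = normal).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Precision: share of flagged transactions that are truly fraud.
precision = precision_score(y_true, y_pred)
# Recall: share of true fraud cases that were flagged.
recall = recall_score(y_true, y_pred)

print(f"tn={tn}, fp={fp}, fn={fn}, tp={tp}")
print(f"precision={precision:.2f}, recall={recall:.2f}")
```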