Credit Card Fraud Detection
The problem is to identify fraudulent credit card transactions so that card issuers' customers are not charged for products they did not buy. It has become a major issue because online purchases can be made with nothing more than your card details, which makes fraud detection critical for any bank or financial business. Even before two-step verification was introduced for online purchasing in the United States in the 2010s, many American users of retail websites were victims of online transaction fraud. When a data breach leads to monetary theft, and in turn to the loss of customer loyalty and damage to the company's reputation, it puts organizations, consumers, banks, and merchants at risk. This is also one of the more approachable data science project ideas for beginners to work on.
In 2017, unauthorized card operations affected 16.7 million victims. The goal is to develop a classifier that can determine whether a given transaction is fraudulent.
The following are the key obstacles in detecting credit card fraud:
● Massive amounts of data are processed every day, and the model must be fast enough to respond to a scam in time.
● The data is imbalanced: the vast majority of transactions (99.8%) are not fraudulent, which makes the fraudulent ones very hard to find.
● Data availability is limited, because transaction data is generally private.
● Misclassified data is another major concern, since not every fraudulent transaction is detected and reported.
● Scammers use adaptive techniques against the model.
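To see why the imbalance listed above is such an obstacle, consider a baseline that labels every transaction as valid. The sketch below uses a made-up batch of 1,000 transactions with the 99.8% / 0.2% split quoted above; the counts and the function name `always_valid_baseline` are illustrative assumptions, not part of the dataset.

```python
def always_valid_baseline(labels):
    """Score a 'model' that predicts 0 (valid) for every transaction."""
    predictions = [0] * len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    frauds = sum(labels)
    caught = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)
    recall = caught / frauds if frauds else 0.0
    return accuracy, recall


# Hypothetical batch: 998 valid (0) and 2 fraudulent (1) transactions
labels = [0] * 998 + [1] * 2
baseline_accuracy, baseline_recall = always_valid_baseline(labels)
print("accuracy:", baseline_accuracy)
print("recall:", baseline_recall)
```

The baseline is right on almost every transaction and still catches no fraud at all, which is why accuracy on its own is a weak measure for this problem.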
Overview:
Fraud can be committed in many ways and across a wide range of industries. In this data science project we use machine learning with NumPy, scikit-learn, and a few other Python modules to address the challenge of recognising fraudulent credit card transactions. To make a decision, most detection systems combine several fraud detection datasets to build a connected picture of both legitimate and fraudulent payment data. We approached the problem by building a binary classifier and experimenting with several approaches to find which one fits the problem best.
IP address, geolocation, device identity, "BIN" data, global latitude/longitude, historical transaction trends, and the actual transaction information must all be considered when making this decision. In practice, this means merchants and issuers detect fraud by applying a set of business rules or analytical algorithms to internal and external data.
There are 31 parameters in the dataset. Because of confidentiality concerns, 28 of the features are the result of a PCA transformation. The only features that were not transformed with PCA are "Time" and "Amount."
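The column count can be checked with a few lines. This is a sketch that assumes the layout of the widely used public credit card fraud dataset, where the PCA components are named V1 to V28 and the label column is Class (the name the code in this post also uses). The V-names are an assumption, since the text above does not list them.

```python
# Assumed column layout: Time, 28 PCA components, Amount, and the Class label
pca_features = ["V{}".format(i) for i in range(1, 29)]  # assumed names V1..V28
expected_columns = ["Time"] + pca_features + ["Amount", "Class"]

print(len(pca_features))      # 28 anonymised features
print(len(expected_columns))  # 31 columns in total

# With a loaded DataFrame called `data`, the layout could be verified with:
#   missing = set(expected_columns) - set(data.columns)
```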
Credit card fraud detection with data science is a process in which a data science team investigates data and develops a model that uncovers and prevents fraudulent transactions. This is accomplished by combining all relevant aspects of cardholder transactions, such as the date, user zone, product category, amount, provider, the client's behavioral patterns, and so on. The data is then fed into a model that is trained to look for patterns and rules in order to determine whether a transaction is fraudulent. Fraudsters are always inventing new fraud patterns, particularly to adapt to fraud detection systems. Models that are never updated are therefore insufficient, because they do not account for changes and trends in client spending patterns, such as during holiday seasons and across geographic regions. Fraud monitoring and detection systems are used by major banks, including Chase.
Importing all the necessary Libraries
# import the necessary packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import gridspec
Loading the Data
# copy the path for the csv file
data = pd.read_csv("credit.csv")
Understanding the Data
# Grab a peek at the data
data.head()
Describing the Data
# Print the shape of the data
print(data.shape)
print(data.describe())
Imbalance in the data
# Determine the number of fraud and valid cases in the dataset
fraud = data[data['Class'] == 1]
valid = data[data['Class'] == 0]
# Ratio of fraud cases to valid cases
outlierFraction = len(fraud) / float(len(valid))
print(outlierFraction)
print('Fraud Cases: {}'.format(len(fraud)))
print('Valid Transactions: {}'.format(len(valid)))
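Note that `outlierFraction` above is the ratio of fraud cases to valid cases, not the share of fraud among all transactions. For a heavily imbalanced dataset the two numbers are close but not identical. A small sketch with hypothetical counts (the helper name `imbalance_summary` and the numbers are made up for illustration):

```python
def imbalance_summary(n_fraud, n_valid):
    """Return (fraud-to-valid ratio, fraud share of all transactions)."""
    ratio = n_fraud / n_valid
    share = n_fraud / (n_fraud + n_valid)
    return ratio, share


# Hypothetical counts, not taken from the dataset
ratio, share = imbalance_summary(n_fraud=20, n_valid=9980)
print("fraud / valid: {:.6f}".format(ratio))
print("fraud / all:   {:.6f}".format(share))
```

Whichever of the two is reported, it helps to say which one it is, since the ratio is always slightly larger than the share.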
For fraudulent transactions, print the amount details.
print("Amount details of the fraudulent transactions")
fraud.Amount.describe()
For valid transactions, print the amount details.
print("Amount details of the valid transactions")
valid.Amount.describe()
Plotting the Correlation Matrix
# Correlation matrix
corrmat = data.corr()
fig = plt.figure(figsize = (12, 9))
sns.heatmap(corrmat, vmax = .8,
square = True)
plt.show()
Separating the X and the Y values
# dividing the data into input parameters and output value
X = data.drop(['Class'], axis = 1)
Y = data["Class"]
print(X.shape)
print(Y.shape)
xData = X.values
yData = Y.values
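Scikit-learn is used to split the data and create a Random Forest model.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
# split the data into training and test sets
# (the 80/20 ratio and the seed are assumed values)
xTrain, xTest, yTrain, yTest = train_test_split(
    xData, yData, test_size=0.2, random_state=42)
# random forest model creation
rfc = RandomForestClassifier()
rfc.fit(xTrain, yTrain)
# predictions
yPred = rfc.predict(xTest)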
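A plain random split can leave very few fraud cases in the test set, and a default Random Forest treats both classes equally. The sketch below shows two options scikit-learn offers for imbalanced data: `stratify` in `train_test_split` and `class_weight="balanced"` in `RandomForestClassifier`. It runs on synthetic data from `make_classification` rather than on the credit card dataset, and the sample size, class weights, and seeds are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: roughly 2% of the samples belong to class 1
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.98, 0.02], random_state=0
)

# stratify keeps the class proportions the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# class_weight="balanced" reweights classes inversely to their frequency
model = RandomForestClassifier(
    n_estimators=50, class_weight="balanced", random_state=42
)
model.fit(X_tr, y_tr)
demo_pred = model.predict(X_te)

print("share of class 1 overall:", y_demo.mean())
print("share of class 1 in test:", y_te.mean())
```

Whether these options actually improve recall on the real dataset would need to be checked against the evaluation scores. They are shown here only as a starting point.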
Creating a variety of evaluative parameters
# Evaluating the classifier
# printing every score of the classifier
from sklearn.metrics import classification_report, accuracy_score
from sklearn.metrics import precision_score, recall_score
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.metrics import confusion_matrix
n_outliers = len(fraud)
n_errors = (yPred != yTest).sum()
print("The model used is Random Forest classifier")
acc = accuracy_score(yTest, yPred)
print("The accuracy is {}".format(acc))
prec = precision_score(yTest, yPred)
print("The precision is {}".format(prec))
rec = recall_score(yTest, yPred)
print("The recall is {}".format(rec))
f1 = f1_score(yTest, yPred)
print("The F1-Score is {}".format(f1))
MCC = matthews_corrcoef(yTest, yPred)
print("The Matthews correlation coefficient is {}".format(MCC))
# printing the confusion matrix
LABELS = ['Normal', 'Fraud']
conf_matrix = confusion_matrix(yTest, yPred)
plt.figure(figsize=(12, 12))
sns.heatmap(conf_matrix, xticklabels=LABELS,
            yticklabels=LABELS, annot=True, fmt="d")
plt.title("Confusion matrix")
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()
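All of the scores printed above can be derived from the four cells of the confusion matrix. The sketch below computes them by hand from hypothetical counts (the function name and the numbers are illustrative, not results from this project), which makes it easier to see why accuracy stays high even when a fifth of the fraud cases are missed.

```python
import math


def scores_from_confusion(tn, fp, fn, tp):
    """Compute binary-classification scores from confusion-matrix cells.

    Assumes every denominator is non-zero (at least one predicted and
    one actual case of each class).
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return accuracy, precision, recall, f1, mcc


# Hypothetical counts: 100 fraud cases, 80 of them caught, 20 false alarms
demo_acc, demo_prec, demo_rec, demo_f1, demo_mcc = scores_from_confusion(
    tn=9880, fp=20, fn=20, tp=80
)
print("accuracy  {:.4f}".format(demo_acc))
print("precision {:.4f}".format(demo_prec))
print("recall    {:.4f}".format(demo_rec))
print("F1-score  {:.4f}".format(demo_f1))
print("MCC       {:.4f}".format(demo_mcc))
```

With these counts the accuracy is 0.996 while precision and recall are both 0.8, so the scores that focus on the fraud class tell a more useful story than accuracy does.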
Final lines
Fraud is a serious issue for the entire credit card business, and it is becoming more prevalent as electronic money transfers become more common. In this Python data science project, we built a binary classifier using the Random Forest technique to detect fraudulent credit card transactions. Credit card issuers should consider implementing advanced fraud prevention and detection methods to guard against the leakage of bank account information, skimming, and counterfeit credit cards, which lead to the theft of billions of dollars annually and to the loss of reputation and customer loyalty.
Through this project we learned about class imbalance issues and obtained an accuracy of about 99 percent. Because almost all transactions are valid, accuracy alone says little here, so the precision, recall, F1-score, and Matthews correlation coefficient should be read alongside it. Based on information about each cardholder's behavior, data science-based methods can continuously improve the accuracy of fraud protection. Because some fraudsters commit fraud once through online channels and then switch to other methods, fraud detection systems also need unsupervised learning to catch patterns that have not been seen before.
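The unsupervised learning mentioned above can be illustrated with a very small sketch that flags unusually large amounts without using any fraud labels. It uses a median-based outlier score, which is a simple stand-in chosen for brevity and not the method used in this project. The amounts, the threshold of 3.5, and the function name are assumptions for illustration.

```python
import statistics


def flag_unusual_amounts(amounts, threshold=3.5):
    """Flag values whose median/MAD-based (modified) z-score exceeds threshold."""
    median = statistics.median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely affect
    mad = statistics.median([abs(a - median) for a in amounts])
    if mad == 0:
        return [False] * len(amounts)
    return [0.6745 * abs(a - median) / mad > threshold for a in amounts]


# Hypothetical transaction amounts with one obvious outlier at the end
amounts = [12.5, 8.0, 15.0, 9.9, 11.2, 14.1, 10.4, 950.0]
flags = flag_unusual_amounts(amounts)
for amount, is_unusual in zip(amounts, flags):
    print(amount, "unusual" if is_unusual else "ok")
```

A real system would look at many features at once, but the idea is the same: score how far a transaction lies from typical behavior and review the ones that stand out.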