
Ist Author et al., International Journal of Advanced Research in Computer Science and Software Engineering 3 (4),
September - 2016, pp. 1-6

SEMI SUPERVISED METHOD FOR FRAUD DETECTION ON CREDIT CARD TRANSACTION DATA
Azhari*
Computer Science and Electronics Department
azharisn.softcomp@gmail.com

Jason Kurniawan
Computer Science and Electronics Department
jasoank@gmail.com

Ekki Rinaldi
Computer Science and Electronics Department
ekkirinaldi@gmail.com

Abstract: The Internet has had a big impact on the way we do business today. E-commerce has spread widely and now affects many activities, including online shopping, mobile money, and even charity. This allows businesses to grow, makes transactions easier and more efficient, and increases interaction between customers and corporations. The electronic banking system addresses several emerging trends: customers' demand for anytime, anywhere service, product time-to-market imperatives, and increasingly complex back-office integration challenges.
These trends create challenges for security and privacy, especially on the user side. There are many ways for a user to make a transaction, but this variety also opens a gap for fraud. For example, an anonymous attacker can steal a user's username and password and make as many transactions as they want. There are many other ways to commit cybercrime.
To increase banking security, we are developing an algorithm that learns a user's transaction habits and detects anomalous transactions made by a fake user. The algorithm is built on the unsupervised clustering algorithm DBSCAN. We explore this algorithm to find the most effective way to distinguish whether an anomalous transaction was made by the user or by a fake user [1], using the user's current transaction history to detect outliers. These outliers are flagged as fraud.
Keywords: Bank Security, Cyber Crime, Cyber Security, DBSCAN, Outlier
I. INTRODUCTION
At the basic level, Internet banking can mean a bank setting up a web page to give information about its products and services. At an advanced level, it involves providing facilities such as accessing accounts, transferring funds, and buying financial products or services online, as well as new banking services such as electronic bill presentment and payment, which allow customers to pay and receive bills on a bank's website. This is called transactional online banking [2]. Online banking is a series of processes in which a bank client logs on to the bank's website through the web browser installed on the client's personal computer and carries out various transactions such as account transfers, bill submissions, and account inquiries.
Billions of financial data transactions occur online every day, and bank cybercrimes take place every day when skilled criminal hackers compromise bank information by manipulating a financial institution's online information system. This causes huge financial losses to banks and customers. The history of these attacks began more than 7 years ago with what quickly became known as phishing [3]. Its sophistication has increased on par with the new security technologies the banking industry has adopted to mitigate the problem. This means there are flaws in the security of online banking that result in many account holders losing money and in their personal information leaking to unauthorized persons. Such an unauthorized person can make any transaction without being detected by the user or by the bank itself.
In this situation, we decided to build an algorithm that detects such anomalous transactions using the DBSCAN algorithm. This unsupervised algorithm can calculate the threshold distance (eps) more effectively [4] by using the user's current transaction habits, and the result is then used to screen new transactions.
II. METHODOLOGY
A. DBSCAN Algorithm
DBSCAN is a density-based clustering algorithm. It turns high-density regions of the database into clusters of arbitrary shape and treats the remaining points as noise. Noise lies in the lower-density regions that separate the clusters of database objects from one another. Noise points are also called outliers. Figure 1 describes how the DBSCAN algorithm works.

2016, IJARCSSE All Rights Reserved

DBSCAN(D, eps, MinPts) {
    C = 0
    for each point P in dataset D {
        if P is visited
            continue to next point
        mark P as visited
        NeighborPts = regionQuery(P, eps)
        if sizeof(NeighborPts) < MinPts
            mark P as NOISE
        else {
            C = next cluster
            expandCluster(P, NeighborPts, C, eps, MinPts)
        }
    }
}

expandCluster(P, NeighborPts, C, eps, MinPts) {
    add P to cluster C
    for each point P' in NeighborPts {
        if P' is not visited {
            mark P' as visited
            NeighborPts' = regionQuery(P', eps)
            if sizeof(NeighborPts') >= MinPts
                NeighborPts = NeighborPts joined with NeighborPts'
        }
        if P' is not yet member of any cluster
            add P' to cluster C
    }
}

regionQuery(P, eps)
    return all points within P's eps-neighborhood (including P)

Fig. 1 DBSCAN algorithm (D, eps, MinPts)
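As a concrete illustration of Fig. 1, the following is a minimal Python sketch rather than the implementation used in this work. The names dbscan, region_query, and NOISE, the Euclidean distance, and the sample points are all assumptions made for the example.

```python
import math

NOISE = -1  # label assumed here for noise points (outliers)

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)      # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:                   # expand the cluster from its core points
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)
    return labels

# Hypothetical points: two dense groups and one isolated point.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
          (9.0, 0.0)]
labels = dbscan(points, eps=0.5, min_pts=2)
```

With these sample values the isolated point (9.0, 0.0) has no neighbor within eps, so it keeps the NOISE label, while the two dense groups become clusters 0 and 1.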

B. K-Means and Outlier Detection Method


In the K-Means algorithm, each cluster is represented by the mean of the objects in the cluster. The steps of the K-Means algorithm are as follows.
Input: data with n objects, k (the number of clusters), and optionally imax (the maximum number of iterations)
Output: an optimized set of k clusters
Method:
1. Choose k objects as the initial cluster centers.
2. Evaluate the similarity of every object to every cluster using a similarity function (for example, Euclidean distance).
3. Determine the cluster membership of every object based on that similarity (the result of step 2).
4. Update the cluster centers by computing the mean of each cluster.
5. Repeat steps 2 to 4 until the cluster centers and their members no longer change, or until the maximum number of iterations is reached.
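Outliers are then identified using the following formulas [5], where c_{p_i} denotes the center of the cluster to which x_i is assigned.
First, find the maximum distance of any point from its cluster center:

$$d_{\max} = \max_{i} \lVert x_i - c_{p_i} \rVert$$

Then compute the degree of outlyingness of each vector x_i:

$$o_i = \frac{\lVert x_i - c_{p_i} \rVert}{d_{\max}}$$

Finally, a threshold value T is chosen, and if o_i > T the point is considered an outlier.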
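The two quantities above, the maximum distance and the per-point outlyingness, can be computed as in the following minimal sketch. The function name outlyingness, the fixed center, the sample points, and the threshold value 0.9 are hypothetical and chosen only for illustration.

```python
import math

def outlyingness(points, centers, labels):
    """o_i = ||x_i - c_{p_i}|| / d_max for every point x_i."""
    dists = [math.dist(p, centers[l]) for p, l in zip(points, labels)]
    d_max = max(dists)
    if d_max == 0:
        return [0.0] * len(points)     # every point sits exactly on its center
    return [d / d_max for d in dists]

# Hypothetical data: one cluster with an assumed center and one distant point.
points = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0), (1.0, 11.0)]
centers = [(1.0, 1.0)]
labels = [0, 0, 0, 0, 0]
T = 0.9                                # hypothetical threshold
scores = outlyingness(points, centers, labels)
outliers = [p for p, o in zip(points, scores) if o > T]
```

Because the scores are divided by the largest distance, the farthest point always scores exactly 1, so any threshold below 1 flags at least one point.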

C. Proposed Method
Begin
    C ← run the DBSCAN algorithm and use the number of clusters formed as k in K-Means
    For j ← 1, ..., I do                    // I is the number of iterations
        dmax ← max_i { ||xi - c_pi|| }
        For i ← 1, ..., N do
            oi ← ||xi - c_pi|| / dmax
            If oi > T then
                X ← X \ {xi}
            End if
        End for
        (C, P) ← K-Means(X, C)              // C holds the cluster centers and P the partition
    End for
End
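The loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the experiments: the number of clusters k is passed in directly (it is assumed to come from a prior DBSCAN run), the initial centers are simply the first k points, and the names kmeans and remove_outliers and all sample values are hypothetical.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Plain K-Means started from the given centers; returns (centers, labels)."""
    centers, labels = list(centers), None
    for _ in range(max_iter):
        # assign every point to its nearest center
        new = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
               for p in points]
        if new == labels:
            break                      # memberships (and so centers) are stable
        labels = new
        # move each center to the mean of its members
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:                # an empty cluster keeps its old center
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, labels

def remove_outliers(points, k, threshold, iterations):
    """Drop points with o_i > threshold, re-running K-Means after each pass."""
    kept = list(points)
    centers, labels = kmeans(kept, kept[:k])   # k assumed to come from DBSCAN
    removed = []
    for _ in range(iterations):
        dists = [math.dist(p, centers[l]) for p, l in zip(kept, labels)]
        d_max = max(dists, default=0.0)
        if d_max == 0:
            break                      # nothing left to remove
        flagged = [d / d_max > threshold for d in dists]
        removed += [p for p, f in zip(kept, flagged) if f]
        kept = [p for p, f in zip(kept, flagged) if not f]
        centers, labels = kmeans(kept, centers)
    return kept, removed, centers

# Hypothetical data: two compact groups plus one stray point.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0),
          (2.0, 6.0)]
kept, removed, centers = remove_outliers(points, k=2, threshold=0.9, iterations=1)
```

Since the farthest point always has an outlyingness of 1, every pass with a threshold below 1 removes at least one point. The number of iterations I therefore bounds how much data is discarded and should be chosen with care.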
III. EXPERIMENTAL
The dataset used in this research consists of 40,400 synthetic transactions in comma-separated values (.CSV) format, with the attributes shown in Table A.
TABLE A
Column name        No. of unique values   Description
Date               40304                  Date of the transaction
Transaction_code   4                      1 = Withdrawal (Tarik), 2 = Transfer, 3 = Purchase (Beli), 4 = Deposit (Setor)
Account_from       100                    Sender account code
Nominal            149                    Amount of money transacted
Account_to         171                    Destination account code
Terminal_id        185                    Place/location where the transaction was made
References         4001                   References code
In this experiment, two attributes, nominal and terminal_id, are used as the features for clustering. Clustering is performed for every sender account code (account_from) and, within each account, for every transaction type, namely Withdrawal (Tarik), Transfer, Purchase (Beli), Deposit (Setor), and all transaction types combined. With 100 sender accounts and 5 groupings per account, the total number of clustering runs is 500.
A. Preprocessing
Preprocessing standardizes the features by removing the mean and scaling to unit variance. Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. The mean and standard deviation are then stored so that they can be applied to later data using the transform method [scikit-learn]. Standardization is performed separately for each sender account code and transaction type mentioned above.
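The standardization step can be illustrated with the following minimal sketch, which uses only the Python standard library instead of scikit-learn. The helper names fit_scaler and transform, the use of the population standard deviation, and the sample rows are assumptions made for the example.

```python
from statistics import fmean, pstdev

def fit_scaler(rows):
    """Per-feature mean and population standard deviation of the training rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]   # constant feature: leave unscaled
    return means, stds

def transform(rows, means, stds):
    """Apply z = (x - mean) / std feature by feature."""
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]

# Hypothetical (nominal, terminal_id) rows for one account and transaction type.
train = [(100.0, 1.0), (200.0, 3.0), (300.0, 5.0)]
means, stds = fit_scaler(train)
scaled = transform(train, means, stds)

# A later transaction is scaled with the stored statistics, not refitted.
new_scaled = transform([(400.0, 3.0)], means, stds)
```

Keeping the fitted means and standard deviations separate from the transform step is what allows a new transaction to be placed in the same feature space as the account's history.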
B. DBSCAN Algorithm for Choosing the Number of Clusters k
To determine the values of eps and MinPts, we ran experiments and inspected the results directly to find the best combination of parameter values. Here the default value of MinPts is 1, because we can assume that at least one transaction takes place, so a point with no nearby points is treated as a cluster of its own. Based on our experiments, the clusters formed for the various eps values tested on one account holder are shown in Table 2 and Figure 2.
Table 2. Number of clusters formed for each epsilon value
Epsilon   Withdrawal (Tarik)   Transfer   Purchase (Beli)   Deposit (Setor)   All (Semua)
0.2       61                   61         58                56                24
0.4       2                    18         6                 6                 5
0.6       2                    3          2                 2                 5
0.8       2                    2          2                 2                 5
1         2                    2          2                 2                 4


Figure 2. Clustering results for epsilon = 0.2, 0.4, 0.6, 0.8, and 1 (one panel per value).
From Table 2 and Figure 2, we find that 0.8 is the optimal value for the epsilon parameter.
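An epsilon sweep of this kind can be sketched as follows. With MinPts = 1 every point is a core point and no point is labelled noise, so the DBSCAN clusters coincide with the connected components of the graph that links points no more than eps apart. The function name count_clusters and the sample points are hypothetical, and the counts they produce are not the results reported in Table 2.

```python
import math

def count_clusters(points, eps):
    """Number of DBSCAN clusters for MinPts = 1: components of the eps-graph."""
    unvisited = set(range(len(points)))
    clusters = 0
    while unvisited:
        clusters += 1                  # start a new cluster from any unvisited point
        stack = [unvisited.pop()]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= eps]
            unvisited.difference_update(near)
            stack.extend(near)
    return clusters

# Hypothetical standardized (nominal, terminal_id) points, not the paper's data.
points = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0), (1.3, 0.0), (3.0, 0.0)]
sweep = {eps: count_clusters(points, eps) for eps in (0.2, 0.4, 0.6, 0.8, 1.0)}
```

On this toy data the count drops from 5 clusters at eps = 0.2 to 2 clusters at eps = 0.8 and stays there at eps = 1.0, the same kind of pattern that Table 2 records for the real accounts.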
IV. CONCLUSIONS
REFERENCES
[1] Shashidhar HV and Subramanian Varadarajan, "Customer Segmentation of Bank based on Data Mining Security Value based Heuristic Approach as a Replacement to K-means Segmentation," Tamil Nadu, India, 2011.
[2] Butler W. Lampson, "Computer Security in the Real World," Annual Computer Security Applications Conference, 2000.
[3] Tony Uceda Velez, "Phishing for Banks: A Timely Analysis on Identity Theft & Fraud in the Financial Sector," Atlanta, GA, 2004.
[4] Priyamvada Paliwal and Meghna Sharma, "Enhanced DBSCAN Outlier Detection," Gurgaon, India, 2013.
[5] M. H. Marghny and Ahmed I. Taloba, "Outlier Detection Using Improved Genetic K-Means," International Journal of Computer Applications (0975-8887), 2011.
