
Cite the references in increasing numerical sequence as superscripts (1, 2, 3, etc.) and list them accordingly under "References".

Call out all figures and tables.
Please keep all figures on separate pages at the end of the article. Mark their position in the text.
Provide equations in MathType.

This is title of the article

Given-Name Surname*1, Given-Name Surname 2, Given-Name Surname3


1 Author's position, Include Department, Institutional address, Pin code/zip number, Country.
2 Co-author's affiliation, Include Department, Institutional address, Pin code/zip number, Country.
3 Co-author's affiliation, Include Department, Institutional address, Pin code/zip number, Country.
1 emailaddress@gmail.com, 2 emailaddress@yahoo.com, 3 emailaddress@refiffmail.com
* Corresponding author: Phone: +0-000-000-0000

Abstract
Background/Objectives: In <30 words. Please use an 11-point Calibri (Body) font or its closely related
font family throughout the article unless otherwise mentioned; the right margins should be justified
wherever possible.
Methods/Statistical analysis: It should be <70 words. Include the method adapted to study the
objectives/sampling details or simulation or statistical analysis of data; technique employed;
mention unique/important points of modification of methodology in the current study. Mention
the test samples, the control employed, or the approach used for comparing the test sample.
Findings: It should be <170 words. Mention your findings in the form of statements along with the
conclusive data of statistical importance; Mention how your findings are unique and novel; how
your findings are in consensus with the existing values/ reports or how different are they from the
already reported findings. Highlight how your results are helpful in adding more value to the existing
reports.
Improvements/Applications: In <30 words.

Keywords: 5-6 words, 11-point Calibri (Body) font, Normal, Drawn from title, Word representing the
work.

1. Introduction
Data mining helps to extract novel and valuable information from large datasets. Data
mining can be applied in different areas such as fraud detection, medicine, education, banking, marketing
and telecommunications.

Feature selection is the process of picking a subset of features that are suitable for analysis
and for future prediction by removing irrelevant or redundant features. The ultimate objective of
feature selection is to increase predictive accuracy and reduce the complexity of the learned results 1,2. In
universities and other academic institutions, it is very difficult to predict failing or dropout students at an early stage.
Data assimilation is the main process used to reduce the student dropout percentage and to increase the student
enrolment percentage in the university. Dropout in a residential university is caused by academic, family and
personal reasons, the campus environment and the infrastructure of the university, and it varies depending on the educational
structure adopted by the university. Thus, this work aims to help formulate education programs and
institutional infrastructure through which the student enrolment rate at the university will increase
significantly. The main aim of this paper is to develop an improved decision tree model and to derive
classification rules that predict whether a student will graduate or not using the historic dataset. In this paper,
an improved decision tree algorithm is used to generate the model. Information such as age, parents' qualification,
parents' occupation, academic record and attitude towards the university was collected from the students to identify the
group of students who need periodic monitoring.

2. Literature review

In the era of data mining, Educational Data Mining (EDM) is considered a potentially important study
topic. Data mining researchers have explored and discussed the applicability of data mining in higher
education in depth. Romero and Ventura 3 performed a comprehensive survey of educational data mining from 1995 to
2005. Shaeela Ayesha et al. 4 applied k-means clustering to analyze the learning behavior of students, which
helps the tutor to improve student performance and reduce the dropout ratio to a significant level.
D'Mello 5 studied bored and frustrated students. Romero studied the factors that predict failure and
non-retention in college courses. Many studies included a wide range of potential predictors, including personality
factors, intelligence and aptitude tests, academic achievement, previous college achievements, and demographic
data. Some of these factors seemed to be stronger than others; however, there is no consistent agreement
among different studies.

3. Proposed Work

J. Ross Quinlan proposed the Iterative Dichotomiser 3 (ID3) algorithm in 1979, which builds
a decision tree using information theory. The algorithm builds the model with a top-down approach and no
backtracking. Information gain is used to determine which attribute best decides the classification of the target
data. In this work, the traditional ID3 algorithm is improved by using Renyi entropy, information gain and an
association function. This combination is used as a new criterion to construct the decision tree and
to predict the dropout of university students. First, the Renyi entropy is determined and used to calculate the
information gain. This value is kept as the old gain for every attribute. Then, using the association
function, the normalized information gain is calculated. This is the new information gain, and it is the value
used to construct the decision tree. The proposed decision tree model is outlined as follows (Figure 1).

Figure 1. Design of Improved Decision tree algorithm for Educational Data mining

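Step 1: The Renyi entropy characterizes the amount of information in a probability distribution. It is a
generalization of the Shannon entropy. Calculate the Renyi entropy using the formula

\[ H_\alpha(X) = \frac{1}{1-\alpha} \log_2 \left( \sum_{i=1}^{n} p_i^{\alpha} \right), \quad \alpha \ge 0 \text{ and } \alpha \ne 1. \]

Here X is a discrete random variable with possible outcomes 1, 2, ..., n, and p_i is the probability of outcome i.
\alpha is the order; in the limit \alpha \to 1 the Renyi entropy reduces to the Shannon entropy. A completely
homogeneous sample has an entropy of 0. An equally divided two-class sample has an entropy of 1.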
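As a quick check of Step 1, the Renyi entropy can be evaluated by hand for a few two-class distributions. The order \alpha = 2 and the base-2 logarithm are assumptions made only for this illustration; the order actually used in the experiments is not stated here.

```latex
\begin{align*}
H_2(0.5,\,0.5) &= \frac{1}{1-2}\log_2\!\left(0.5^2 + 0.5^2\right) = -\log_2 0.5 = 1 \\
H_2(1,\,0)     &= \frac{1}{1-2}\log_2\!\left(1^2 + 0^2\right) = -\log_2 1 = 0 \\
H_2(0.9,\,0.1) &= \frac{1}{1-2}\log_2\!\left(0.9^2 + 0.1^2\right) = -\log_2 0.82 \approx 0.286
\end{align*}
```

The first two lines reproduce the limiting cases of an equally divided and a completely homogeneous sample.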
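Step 2: Calculate the information gain IG of each attribute using the formula:

\[ \mathrm{Gain}(S, A) = H_\alpha(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, H_\alpha(S_v) \]

Here S is the set of training records, A is the attribute under consideration, and S_v is the subset of S in
which A takes the value v.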
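Steps 1 and 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the order alpha = 2, the base-2 logarithm, the function names and the toy records are all hypothetical, and the association-function normalization mentioned in Section 3 is left out because its formula is not given here.

```python
import math
from collections import Counter


def renyi_entropy(labels, alpha=2.0):
    """Renyi entropy (base 2) of a list of class labels.

    alpha is the order (assumed value); alpha == 1 falls back to Shannon entropy.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    probs = [count / n for count in Counter(labels).values()]
    if math.isclose(alpha, 1.0):
        # Limit alpha -> 1 of the Renyi entropy is the Shannon entropy.
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def information_gain(rows, attribute, target, alpha=2.0):
    """Gain(S, A) = H(S) - sum over v of |S_v|/|S| * H(S_v), with H the Renyi entropy."""
    total = len(rows)
    base = renyi_entropy([row[target] for row in rows], alpha)
    # Group the target labels by the value the attribute takes in each record.
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attribute], []).append(row[target])
    remainder = sum(
        len(subset) / total * renyi_entropy(subset, alpha)
        for subset in subsets.values()
    )
    return base - remainder


def best_attribute(rows, attributes, target, alpha=2.0):
    """Pick the attribute with the highest gain, as done for the root node."""
    return max(attributes, key=lambda a: information_gain(rows, a, target, alpha))


# Hypothetical toy records for illustration only (not the questionnaire data).
rows = [
    {"residence": "hostel", "family_problem": "yes", "dropout": "Yes"},
    {"residence": "hostel", "family_problem": "no", "dropout": "No"},
    {"residence": "home", "family_problem": "yes", "dropout": "Yes"},
    {"residence": "home", "family_problem": "no", "dropout": "No"},
]
root = best_attribute(rows, ["residence", "family_problem"], "dropout")
print(root)
```

In this toy data the attribute family_problem separates the two classes perfectly, so it has the larger gain and is chosen as the root.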

Only 12 of the original 31 features collected through the questionnaire (Table 1) were found to be most
relevant to the task of student dropout prediction. The ID3 and improved decision
tree algorithms were then applied to the selected subset of features and records using 10-fold cross validation. The
attribute with the highest information gain is used as the root node. The dropout dataset is classified into two
groups, Yes and No. The confusion matrices constructed for the two classifiers show an accuracy of 92.50%
for ID3 and 97.50% for the improved decision tree. This indicates that the improved decision tree is the better
classifier for predicting whether or not a student will drop out of the university.
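The evaluation procedure described above (10-fold cross validation, with accuracy read off a confusion matrix) can be sketched as below. This is an illustrative sketch only: the function names, the majority-class stand-in classifier and all numbers are hypothetical, the folds are contiguous with no shuffling or stratification, and at least k records are assumed.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(records, labels, train_fn, predict_fn, k=10):
    """Mean accuracy over k folds; each fold is held out once for testing."""
    accuracies = []
    for test_idx in k_fold_indices(len(records), k):
        held_out = set(test_idx)
        train_x = [records[i] for i in range(len(records)) if i not in held_out]
        train_y = [labels[i] for i in range(len(labels)) if i not in held_out]
        model = train_fn(train_x, train_y)
        correct = sum(predict_fn(model, records[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def accuracy_from_confusion(matrix):
    """Accuracy = correctly classified (diagonal) / all classified records."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Stand-in classifier (assumption): always predicts the majority training class.
def train_majority(train_x, train_y):
    return Counter(train_y).most_common(1)[0][0]


def predict_majority(model, record):
    return model


# Hypothetical toy labels, not the paper's dataset.
labels = ["No"] * 14 + ["Yes"] * 6
records = list(range(len(labels)))
mean_accuracy = cross_validate(records, labels, train_majority, predict_majority)
toy_accuracy = accuracy_from_confusion([[50, 10], [5, 35]])
print(mean_accuracy, toy_accuracy)
```

Any classifier exposing a train function and a predict function, such as an ID3 or improved decision tree builder, could be passed in place of the majority-class stand-in.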

Table 1: Initial set of features used for the experimentation

1. Residence
2. Family Type
3. Family Annual Income
4. Father's Education
5. Mother's Education
6. Father's Occupation
7. Mother's Occupation
8. College Location of student
9. Student grade/percentage in High School (10th)
10. Student grade/percentage in Senior Secondary (10th)
11. Course Admitted
12. Admission type
13. Satisfaction with Course
14. Syllabus of Course
15. Parents meet the university expenses
16. Family experiences Stress
17. Like this University
18. Educational system of University
19. Infrastructure of university
20. Extra-curriculum activities in university
21. Entertainment in university
22. Time for self study
23. Placement Status
24. Participate in extra curriculum activity
25. Teacher Student relationship
26. Family Problem
27. Home Sickness
28. Campus Environment
29. Change of Goal
30. Adjustment Problem
31. Enrolled in other universities

4. Conclusion
This paper proposed an improved decision tree algorithm for the prediction of student dropout. The objective of
this work was to develop an improved decision tree algorithm that enhances the ability to form decision trees and
thereby to show that the classification accuracy of the improved algorithm on an educational dataset is
greater. The new decision tree model is constructed by using Renyi entropy for calculating the
information gain together with an association function, which determines the relative degree between the
given attribute and class C. The experimental results show that the improved decision tree algorithm
provides better prediction accuracy on student dropout data (97.50%) than the traditional ID3 algorithm (92.50%).

Please incorporate the following style for references

To refer a research article:

1. Kimio T, Natarajan G, Hideki A, Taichi K, Nanao K. Higher involvement of subtelomere regions
for chromosome rearrangements in leukemia and lymphoma and in irradiated leukemic cell
line. Indian Journal of Science and Technology. 2012 April, 5 (1), pp. 1801-11.

To refer a Book/Report:

3. Cunningham C H. A laboratory guide in virology. 6th edn. Burgess Publication Company:
Minnesota, 1973.

To refer a Chapter in a Book:

4. Sathishkumar E, Varatharajan M. Microbiology of Indian desert. In: Ecology and vegetation of
Indian desert. D.N.Sen (ed.), Agro Botanical Publ.: India. 1990, pp. 83-105.

To refer a publication of proceedings:

5. Varatharajan M, Rao B S, Anjaria K B, Unny V K P, Thyagarajan S. Radiotoxicity of sulfur-35.
Proceedings of 10th NSRP, India, 1993, pp. 257-58.

Internet source:

5. Article title. http://www.indjst.org/index.php/vision. Date accessed: 01/01/2015.

5. References

1. Jimenez L O, Landgrebe D A. Hyperspectral Data Analysis and Feature Reduction via
Projection Pursuit. IEEE Transactions on Geoscience and Remote Sensing. 1999, 37 (6), pp.
2653-667.

2. Nigam K, Ghani R. Analyzing the Effectiveness and Applicability of Co-Training. Ninth
International Conference on Information and Knowledge Management, 2000, pp. 86-93.

3. Castelli V, Cover T. The relative value of labeled and unlabeled samples in pattern
recognition with an unknown mixing parameter. IEEE Transactions on Information Theory.
1996, 42 (6), pp. 2101-117.

4. Lak M, Keshavarz A, Pourghassem H. Graph-Based Hyperspectral Image Classification
Using Outliers Detection Based on Spatial Information and Estimating of the Number of
GMM Mixtures. 2013 International Conference on Communication Systems and Network
Technologies, Gwalior, 2013, pp. 196-200.

5. Rosset S, Zhu J, Zou H, Hastie T. A Method for Inferring Label Sampling Mechanisms in
Semi-Supervised Learning. In: Advances in Neural Information Processing Systems. MIT
Press: Cambridge, MA, pp. 1-8.