By Ujwala Bhoga
INTRODUCTION
Data mining is defined as the extraction of interesting patterns or knowledge from large amounts of data.
Data - Data are any facts, numbers, or text that can be processed by a computer.
Information - The patterns, associations, or relationships among all this data can provide information.
Knowledge - Information can be converted into knowledge about historical patterns and future trends.
Data mining comes in two flavors: directed and undirected. Directed data mining attempts to explain or categorize some particular target field, such as income or response. Undirected data mining attempts to find patterns or relationships in the data without reference to a particular target field.
Data mining is largely concerned with building models. A model is simply an algorithm or set of rules that connects a collection of inputs to a particular target or outcome. Many problems of intellectual, economic, and business interest can be phrased in terms of the following tasks:
Classification
Estimation
Prediction
Affinity grouping
Clustering
Description and profiling
The first three tasks (classification, estimation, and prediction) are examples of directed data mining, where the goal is to find the value of a particular target variable. Affinity grouping and clustering are undirected tasks, where the goal is to uncover structure in the data without respect to a particular target variable. Profiling is a descriptive task that may be either directed or undirected.
Decision Trees
Rule Induction
K-means Clustering
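As an illustration of one of these techniques, the following is a minimal K-means sketch in C# (the coding language named in the software requirements). The sample points, the choice of k = 2, and the fixed iteration budget are assumptions made only for this example, not part of the actual system.

using System;
using System.Linq;

class KMeansDemo
{
    static void Main()
    {
        // Illustrative two-dimensional points; real inputs would come from the database.
        double[][] points =
        {
            new[] { 1.0, 1.0 }, new[] { 1.5, 2.0 }, new[] { 3.0, 4.0 },
            new[] { 5.0, 7.0 }, new[] { 3.5, 5.0 }, new[] { 4.5, 5.0 }, new[] { 3.5, 4.5 }
        };

        int k = 2;
        Random rnd = new Random(0);

        // Initialise the centroids with k randomly chosen points.
        double[][] centroids = points.OrderBy(p => rnd.Next())
                                     .Take(k)
                                     .Select(p => (double[])p.Clone())
                                     .ToArray();

        int[] assignment = new int[points.Length];
        for (int iter = 0; iter < 20; iter++)            // fixed iteration budget for the sketch
        {
            // Assignment step: attach every point to its nearest centroid.
            for (int i = 0; i < points.Length; i++)
            {
                double best = double.MaxValue;
                for (int c = 0; c < k; c++)
                {
                    double d = Distance(points[i], centroids[c]);
                    if (d < best) { best = d; assignment[i] = c; }
                }
            }

            // Update step: move every centroid to the mean of its assigned points.
            for (int c = 0; c < k; c++)
            {
                double[][] members = points.Where((p, i) => assignment[i] == c).ToArray();
                if (members.Length == 0) continue;        // keep an empty cluster's old centroid
                for (int d = 0; d < centroids[c].Length; d++)
                    centroids[c][d] = members.Average(p => p[d]);
            }
        }

        for (int i = 0; i < points.Length; i++)
            Console.WriteLine("({0}, {1}) -> cluster {2}", points[i][0], points[i][1], assignment[i]);
    }

    // Plain Euclidean distance between two points of equal dimensionality.
    static double Distance(double[] a, double[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.Sqrt(sum);
    }
}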
Architecture
Analysis Services includes several algorithm types; for example, classification algorithms predict one or more discrete variables based on the other attributes in the dataset.
Experienced analysts will sometimes use one algorithm to determine the most effective inputs (that is, variables), and then apply a different algorithm to predict a specific outcome based on that data.
Nearest-neighbor approach: use the entire training database as the model. To score a new record, find the nearest data point and do the same thing as you did for that record (see the sketch below).
[Figure: scatter plot of Doses (up to 1000) against Age (up to 100), illustrating a nearest-neighbor lookup in the training data.]
This approach is very easy to implement, but more difficult to use in production. Disadvantage: the models are huge, since the entire training database must be kept.
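A minimal sketch of this nearest-neighbor idea, assuming records with just two numeric fields (Age and Doses, as in the figure above) and a small in-memory training set; the sample values and the outcome labels are illustrative only.

using System;
using System.Collections.Generic;

class NearestNeighborDemo
{
    // Hypothetical training record: two numeric inputs and a known outcome.
    class Record
    {
        public double Age;
        public double Doses;
        public string Outcome;
    }

    static void Main()
    {
        // The "model" is simply the entire training database held in memory.
        List<Record> training = new List<Record>
        {
            new Record { Age = 25, Doses = 100, Outcome = "low" },
            new Record { Age = 40, Doses = 500, Outcome = "medium" },
            new Record { Age = 62, Doses = 1000, Outcome = "high" }
        };

        Record query = new Record { Age = 45, Doses = 600 };

        // Scan all records and keep the one with the smallest distance to the query.
        Record nearest = null;
        double best = double.MaxValue;
        foreach (Record r in training)
        {
            double d = Distance(r, query);
            if (d < best) { best = d; nearest = r; }
        }

        // "Do the same thing as you did for that record": reuse its outcome.
        Console.WriteLine("Predicted outcome: " + nearest.Outcome);
    }

    // Plain Euclidean distance; a real application would normalise the fields first.
    static double Distance(Record a, Record b)
    {
        double da = a.Age - b.Age;
        double dd = a.Doses - b.Doses;
        return Math.Sqrt(da * da + dd * dd);
    }
}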
Authentication
Authentication is the act of confirming the truth of an attribute of a datum or entity. This might involve confirming the identity of a person or software program, tracing the origins of an artifact, or ensuring that a product is what its packaging and labeling claim it to be. In private and public computer networks, authentication is commonly done through the use of logon passwords: knowledge of the password is assumed to guarantee that the user is authentic. Each user registers initially, using an assigned or self-declared password, and on each subsequent use must know and use the previously declared password. The weakness of this scheme for significant transactions is that passwords can often be stolen, accidentally revealed, or forgotten.
For this reason, Internet businesses and many other transactions require a more stringent authentication process. The use of digital certificates issued and verified by a Certificate Authority (CA) as part of a public key infrastructure is considered likely to become the standard way to perform authentication on the Internet.
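For illustration only, the following sketch checks a certificate against the trusted certificate authorities installed on the machine, using the .NET X509Chain class; the certificate file name is a placeholder and not part of this application.

using System;
using System.Security.Cryptography.X509Certificates;

class CertificateCheck
{
    static void Main()
    {
        // Load a certificate from disk (placeholder file name).
        X509Certificate2 cert = new X509Certificate2("server.cer");

        // Build the chain up to a trusted root CA installed on this machine.
        X509Chain chain = new X509Chain();
        bool valid = chain.Build(cert);

        Console.WriteLine(valid
            ? "Certificate chains to a trusted CA."
            : "Certificate could not be verified.");
    }
}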
False hit
Generally, the term hit means a successful search, i.e., the required information has been found for the given query. If the required information is not available in the database, the search is known as a false hit. False hits increase the cost of a data mining application, so they have to be reduced in order to improve its performance. In this application, false hits are reduced by storing the queries that caused them in a separate database: the first time no information is available for a client's query, that query is saved in the false-hit database. Whenever a client issues a query, it is first looked up in the false-hit database.
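A minimal sketch of this lookup order, assuming for simplicity that the false-hit database can be represented by an in-memory set keyed by the query text (the real application stores it in a separate database); the method names are hypothetical.

using System;
using System.Collections.Generic;

class FalseHitCache
{
    // Stand-in for the false-hit database: queries known to return nothing.
    static readonly HashSet<string> falseHits = new HashSet<string>();

    // Stand-in for the main database lookup; returns null when nothing matches.
    static string SearchMainDatabase(string query)
    {
        // ... real data access would go here ...
        return null;
    }

    static string Search(string query)
    {
        // 1. Check the false-hit store first and skip the expensive search.
        if (falseHits.Contains(query))
            return null;

        // 2. Otherwise search the main database.
        string result = SearchMainDatabase(query);

        // 3. Record a new false hit so the next identical query is cheap.
        if (result == null)
            falseHits.Add(query);

        return result;
    }

    static void Main()
    {
        Console.WriteLine(Search("missing record") == null ? "false hit" : "hit");
        Console.WriteLine(Search("missing record") == null ? "false hit (cached)" : "hit");
    }
}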
Analysis
Existing System:
Several applications, including image, medical, time-series, and document databases, involve high-dimensional data. Similarity retrieval in these applications based on low-dimensional indexes, such as the R*-tree, is very expensive due to the dimensionality curse. The existing system processes the query with a nearest-neighbor search, but the result is not authenticated, because it only provides a result set containing the nearest data.
Disadvantages:
The record set provided by this system is not fully authenticated.
It is unable to use a public key cryptosystem.
The nearest result still has to be searched for accurately.
Proposed System:
The proposed system provides authentication for query processing by maintaining a dataset DB at the server that is signed by a trusted authority (e.g., the data owner or a notarization service). The signature is usually based on a public key cryptosystem. The server receives and processes queries from clients; each query returns a result set, i.e., the subset of the database that satisfies certain predicates. Moreover, the client must be able to establish that the result set is correct, i.e., that it contains all records of the database satisfying the query condition and that these records have not been modified by the server or any other entity. Since the signature captures the entire database and the server returns verification objects, the client can verify the result set using the signature and the signer's public key. To make this problem easier, we present a novel technique that reduces the size of each false hit.
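The signing and verification step described above can be sketched as follows, assuming for illustration that the trusted authority signs a digest of the serialised records with an RSA key; the actual scheme signs the entire database and returns verification objects, which this sketch does not reproduce.

using System;
using System.Security.Cryptography;
using System.Text;

class ResultSetSigning
{
    static void Main()
    {
        // Trusted authority (data owner) key pair; the public half is given to clients.
        RSACryptoServiceProvider ownerKey = new RSACryptoServiceProvider(2048);
        string publicKeyXml = ownerKey.ToXmlString(false);

        // The owner signs a digest of the serialised records (placeholder content).
        byte[] data = Encoding.UTF8.GetBytes("record1|record2|record3");
        byte[] signature = ownerKey.SignData(data, new SHA1CryptoServiceProvider());

        // The client verifies the result set using only the signature and the public key.
        RSACryptoServiceProvider clientKey = new RSACryptoServiceProvider();
        clientKey.FromXmlString(publicKeyXml);
        bool authentic = clientKey.VerifyData(data, new SHA1CryptoServiceProvider(), signature);

        Console.WriteLine(authentic ? "Result set is authentic." : "Result set was tampered with.");
    }
}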
Advantages:
The result set provided by the system is accurate.
Using a public key cryptosystem, the result set is fully authenticated to the user and can be verified against the signer's signature.
Because the AMNN method is used, the client sees accurate data.
SOFTWARE REQUIREMENTS
Operating system : Windows 7 / XP Professional
Front End : Microsoft Visual Studio .NET 2008
Coding Language : Visual C# .NET
Backend : SQL Server 2005
HARDWARE REQUIREMENTS
Design
Class diagram
Object diagram
State diagram
Activity diagram
Sequence diagram
Collaboration diagram
Component diagram