ASSOCIATION RULE - A TOOL FOR DATA MINING
INTRODUCTION: What is Data Mining?
Data mining tools analyze data and may uncover important patterns that contribute to business strategy, knowledge bases, and scientific and medical research. The widening gap between data and information calls for the systematic development of data mining tools. Put simply, data mining refers to extracting, or "mining", knowledge from large amounts of data.
KDD (Knowledge Discovery in Databases): Many people treat data mining as a synonym for another popular term, KDD. Others view data mining as one essential step in the process of knowledge discovery in databases. The KDD process is depicted in the figure.
2. Data integration: Where multiple data sources may be combined.
3. Data selection: Where data relevant to the analysis task are retrieved from the database.
4. Data transformation: Where data are transformed into forms appropriate for mining.
5. Data mining: An essential process where intelligent methods are applied in order to extract data patterns.
6. Pattern evaluation: To identify the interesting patterns.
7. Knowledge presentation: Where visualization and knowledge representation techniques are used.
What is association analysis?
Association analysis is the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data. It is widely used for market basket or transaction data analysis. An association rule A ⇒ B has two main properties: support and confidence.
For a rule A ⇒ B, support and confidence are defined as:
Support(A ⇒ B) = P(A ∪ B)
Confidence(A ⇒ B) = P(B | A)
Relating the two through support counts gives:
Confidence(A ⇒ B) = P(B | A) = Support_count(A ∪ B) / Support_count(A)
where Support_count(A ∪ B) is the number of transactions containing the item set A ∪ B (that is, every item of both A and B), and Support_count(A) is the number of transactions containing the item set A.
"How are association rules mined from large databases?" Association rule mining is a two-step process:
1. Find all frequent item sets: By definition, each of these item sets will occur at least as frequently as a predetermined minimum support count.
2. Generate strong association rules from the frequent item sets: By definition, these rules must satisfy minimum support and minimum confidence.
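The definitions of support and confidence, and the second step of the mining process, can be illustrated with a minimal Python sketch. The toy transactions, the function names, and the 0.8 confidence threshold are all assumptions made for illustration; none of them come from the article.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "milk"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(a, b, transactions):
    """Support(A => B): fraction of transactions containing all items of A and B."""
    return support_count(a | b, transactions) / len(transactions)

def confidence(a, b, transactions):
    """Confidence(A => B) = Support_count(A u B) / Support_count(A)."""
    return support_count(a | b, transactions) / support_count(a, transactions)

def strong_rules(frequent_itemset, transactions, min_conf):
    """Split one frequent item set into rules A => B that meet min_conf."""
    items = frozenset(frequent_itemset)
    rules = []
    for r in range(1, len(items)):
        for antecedent in combinations(sorted(items), r):
            a = frozenset(antecedent)
            b = items - a
            conf = confidence(a, b, transactions)
            if conf >= min_conf:
                rules.append((a, b, conf))
    return rules

print(support({"bread"}, {"butter"}, transactions))     # 3 of 5 transactions
print(confidence({"bread"}, {"butter"}, transactions))  # 3 of the 4 bread transactions
print(strong_rules({"bread", "butter"}, transactions, min_conf=0.8))
```

With this data, only butter ⇒ bread passes the 0.8 threshold, because every transaction containing butter also contains bread, while only three of the four bread transactions contain butter.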
APRIORI ALGORITHM: (Finding Frequent Item Sets Using Candidate Generation)
Apriori is an influential algorithm for mining frequent item sets. The name of the algorithm reflects the fact that it uses prior knowledge of frequent item set properties. Apriori employs an iterative approach known as a level-wise search. To improve the efficiency of the level-wise generation of frequent item sets, it relies on an important property called the Apriori property: "all nonempty subsets of a frequent item set must also be frequent." Each pass of the Apriori algorithm is a two-step process.
procedure AprioriAlg()
L_1 := {frequent 1-itemsets};
end
The algorithm makes multiple passes over the database. In the first pass, it simply counts item occurrences to determine the frequent 1-itemsets (itemsets with 1 item). A subsequent pass, say pass k, consists of two phases. First, the frequent itemsets L_{k-1} (the set of all frequent (k-1)-itemsets) found in the (k-1)th pass are used to generate the candidate itemsets C_k, using the apriori-gen() function. This function first joins L_{k-1} with L_{k-1}, the joining condition being that the lexicographically ordered first k-2 items are the same. Next, it deletes from the join result all those itemsets that have some (k-1)-subset that is not in L_{k-1}, yielding C_k. Second, the algorithm scans the database. For each transaction, it determines which of the candidates in C_k are contained in the transaction, using a hash-tree data structure, and increments the count of those candidates. At the end of the pass, C_k is examined to determine which of the candidates are frequent, yielding L_k. The algorithm terminates when L_k becomes empty.
FP-TREE GROWTH ALGORITHM:
The Apriori algorithm suffers from the following two shortcomings:
1. It is costly to handle large numbers of candidate sets. For instance, if there are 10^4 frequent 1-itemsets, then more than 10^7 candidate 2-itemsets are generated.
2. It is tedious to repeatedly scan the database and check a large set of candidates by pattern matching.
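Both shortcomings are easier to see in a minimal Python sketch of the level-wise search described earlier. This is an illustration under stated assumptions, not the original implementation: the function names and the toy data are hypothetical, and candidates are matched against transactions with a plain subset test rather than the hash-tree mentioned in the text.

```python
from itertools import combinations
from math import comb

def apriori_gen(prev_frequent, k):
    """Join L_{k-1} with itself, then prune candidates that have an infrequent (k-1)-subset."""
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    prev_set = set(prev)
    candidates = set()
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            # Join condition: the first k-2 items (in sorted order) are the same.
            if a[:k - 2] == b[:k - 2]:
                c = tuple(sorted(set(a) | set(b)))
                # Prune step, based on the Apriori property.
                if all(sub in prev_set for sub in combinations(c, k - 1)):
                    candidates.add(c)
    return candidates

def apriori(transactions, min_count):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets."""
    # Pass 1: count single items to obtain L_1.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {c: n for c, n in counts.items() if n >= min_count}
    answer = dict(frequent)
    k = 2
    while frequent:  # stop when L_{k-1} is empty
        candidates = apriori_gen(frequent.keys(), k)
        counts = {c: 0 for c in candidates}
        for t in transactions:  # one full database scan per pass
            for c in candidates:
                if set(c) <= t:  # simple subset test (assumption: no hash-tree)
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        answer.update(frequent)
        k += 1
    return answer

# Hypothetical toy database, not taken from the article.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "milk"},
]
result = apriori(transactions, min_count=3)
print(result)

# Candidate 2-itemsets produced from 10^4 frequent 1-itemsets.
print(comb(10**4, 2))  # 49995000
```

Every pass rescans all transactions, and the number of candidate pairs grows quadratically with the number of frequent items. These are the two costs listed above.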
To address these shortcomings, a class of algorithms has been proposed that avoids generating large numbers of candidate sets. We describe one such method, the FP-tree growth algorithm, which was proposed by Han et al. The main idea of the algorithm is to maintain a frequent pattern tree of the database. A frequent pattern tree (or FP-tree) is a tree structure consisting of an item-prefix tree and a frequent-item header table.
o It consists of a root node labeled null.
o Each non-root node consists of three fields:
   * Item name
   * Support count
   * Node link
o Each entry in the frequent-item header table consists of two fields:
   * Item name
   * Head of node link, which points to the first node in the FP-tree carrying the item name
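The structure above can be sketched in Python as follows. The article does not spell out how the tree is built, so the construction here is an assumption: it follows the usual two-scan approach, counting items first and then inserting each transaction with its frequent items sorted by descending count. The class and function names, the `tails` helper, and the toy data are hypothetical.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item = item        # item name (None for the root, i.e. "null")
        self.count = 0          # support count
        self.node_link = None   # next node in the tree carrying the same item
        self.parent = parent
        self.children = {}      # item name -> child FPNode

def build_fp_tree(transactions, min_count):
    """Build an FP-tree and its header table (item name -> head of node link)."""
    # First scan: count items and keep only the frequent ones.
    freq = Counter(item for t in transactions for item in t)
    freq = {i: n for i, n in freq.items() if n >= min_count}
    root = FPNode(None, None)
    header = {}
    tails = {}  # helper: last node of each node-link chain
    # Second scan: insert each transaction as a path from the root.
    for t in transactions:
        # Descending count, ties broken by name so the order is deterministic.
        items = sorted((i for i in t if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                if item in header:
                    tails[item].node_link = child  # extend the chain
                else:
                    header[item] = child           # first node for this item
                tails[item] = child
            child.count += 1
            node = child
    return root, header

def linked_count(header, item):
    """Follow the node links from the header table and sum the support counts."""
    total, node = 0, header.get(item)
    while node is not None:
        total += node.count
        node = node.node_link
    return total

# Hypothetical toy database, not taken from the article.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "milk"},
]
root, header = build_fp_tree(transactions, min_count=3)
for item in ("bread", "milk", "butter"):
    print(item, linked_count(header, item))
```

Transactions that share a prefix share a path in the tree, so the database is stored in compressed form. The header table and node links make it possible to reach every occurrence of an item without rescanning the transactions.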
CONCLUSION:
Association rules should not be used directly for prediction without further analysis or domain knowledge, because they do not necessarily indicate causation. They are, however, a helpful starting point for further exploration, which makes them a popular tool for understanding data.
Source: E-mail, March 20, 2004