
=Data Mining= Tuesday, April 15

**Topic overview:**
In this class we will introduce some basic data mining concepts and algorithms and discuss how data mining fits into an overall BI program.

Specific topics that we will cover include:
 * Introduction to data mining
 * Overview of selected data mining techniques and algorithms
 * The role of data mining in a BI program

**Preparation for class:**
Please complete the following reading in preparation for class:
 * Introduction to Data Mining and Knowledge Discovery, a report by the Two Crows Corporation. Although the entire report is interesting and useful, in class we will focus on the material in the first 25 pages. The report is available for download at:
 * http://www.twocrows.com/intro-dm.pdf

Slides:

Acknowledgement: Thanks to Professor Michael Trick, who contributed material used in today's slides and provided guidance and recommendations on the presentation of the topics.

-ALC //Solution to Class Exercise//: I am posting my solution to the class exercise so we can verify our answers. If you got something else, please post your solution and method too.
 * LETTUCE and TOMATOES => HAMBURGER confidence: 8/12; lift: .67/.025
 * LETTUCE and TOMATOES => SALAD DRESSING confidence: 6/12; lift: .5/.01
 * HAMBURGER => KETCHUP confidence: 12/25; lift: .48/.015
 * HAMBURGER and BUNS => KETCHUP confidence: 9/16; lift: .5625/.015

-Arlette's solution. I got the same results. To decide which rule is the strongest predictor of market basket contents, we can use either the confidence or the lift measure. By confidence, the association rule "Lettuce and tomatoes => hamburger" is the strongest, at about 67% (8/12). By lift, however, the rule "Lettuce and tomatoes => salad dressing" is the best predictor, with a lift of 50 (.5/.01). That rule is also the only one whose confidence (50%) and lift (50) come out at the same number, so I would think it is the strongest predictor overall.
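For anyone who wants to check the arithmetic in the posts above, here is a minimal Python sketch that recomputes both measures and picks the top rule under each. It assumes that the second number in each posted "lift: a/b" pair is the support of the right-hand-side item (the posts do not label it), and the names `RULES`, `lift`, and `best_rule` are made up for this illustration. It uses exact fractions, so the first lift comes out as 8/12 divided by .025 (about 26.7) instead of the rounded .67/.025.

```python
from fractions import Fraction

# (rule, confidence, support of the consequent), copied from the posted solution.
# Treating the denominator of each posted lift as the consequent's support
# is an assumption; the posts do not say so explicitly.
RULES = [
    ("LETTUCE and TOMATOES => HAMBURGER", Fraction(8, 12), Fraction("0.025")),
    ("LETTUCE and TOMATOES => SALAD DRESSING", Fraction(6, 12), Fraction("0.01")),
    ("HAMBURGER => KETCHUP", Fraction(12, 25), Fraction("0.015")),
    ("HAMBURGER and BUNS => KETCHUP", Fraction(9, 16), Fraction("0.015")),
]


def lift(confidence, consequent_support):
    """Lift of A => B: confidence(A => B) divided by support(B)."""
    return confidence / consequent_support


def best_rule(rules, key):
    """Return the name of the rule that maximizes the given measure."""
    return max(rules, key=key)[0]


best_by_confidence = best_rule(RULES, key=lambda r: r[1])
best_by_lift = best_rule(RULES, key=lambda r: lift(r[1], r[2]))

for name, conf, supp in RULES:
    print(f"{name}: confidence={float(conf):.2%}, lift={float(lift(conf, supp)):.1f}")
print("Highest confidence:", best_by_confidence)
print("Highest lift:", best_by_lift)
```

Confidence is a proportion between 0 and 1, while lift is a ratio with no upper bound, so the two measures can rank the same rules differently, as they do here.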