Global Health and Medical Data Mining Information Intelligence and Trends: Key Intelligence and Expert Resources on Critical Global Health and Medical Data Mining Information Trends and Solutions
Global Information Intelligence and Trends: Critical Intelligence on Current and Emerging Global Information Trends and Solutions
Summary
Health and Medical Data Mining Problems: The rapid growth of global health and medical data mining information makes it difficult for readers everywhere to assimilate, filter, examine, analyze and digest the relevant, critical information they need for daily use.
Health and Medical Data Mining Solutions: This site is intended to be a single, time-saving source of important global information on health and medical data mining that you will find valuable for regular use, whether for personal, corporate or global data mining and reality mining.
Health and Medical Data Mining Subject Areas: This site addresses key areas of global health and medical data mining for digital data and globally transmitted information that affect everyone who uses the Internet or stores information electronically. A comprehensive list follows; it will grow over time.
Global Information Intelligence - Health and Medical Data Mining
Products on Global Health and Medical Data Mining topics include Free and Discounted Articles, E-Books, Expert Analysis, Tips, Tools and Resources.
Global Health and Medical Data Mining Topics: The topics cover the following global trends and solutions. Whether you are a beginner or an expert, you will find emerging and practical solutions:
§ Global Health and Medical Data Mining and Decision Sciences
§ Global Health and Medical Data Mining and Epidemiology
§ Global Health and Medical Data Mining in Population, Vaccines and Patients
§ Global Health and Medical Data Mining in Health and Medical Sciences
§ Global Health and Medical Data Mining in Population, Medical Technology
§ Global Health and Medical Data Mining in Population Health
§ Global Health and Medical Data Mining Quality
§ Global Health and Medical Data Mining Training Datasets
§ Global Health and Medical Data Mining Test Datasets
§ Global Health and Medical Data Mining Hybrid Frameworks
§ Efficient Frameworks for Global Health and Medical Data Mining Applications
§ Intelligent Global Health and Medical Data Mining
§ Global Health and Medical Data Mining Data Attributes, Feature Attributes
§ Global Health and Medical Data Mining Data Classes, Groups, Categories and Subcategories
§ Global Health and Medical Data Mining Algorithms and Machine Learning
§ Global Health and Medical Data Mining and Accurate Prediction Models
§ Global Health and Medical Data Mining Privacy Preserving Algorithms
§ Global Health and Medical Data Mining Search Algorithms
§ Global Health and Medical Data Mining Machine Learning Intelligence
§ Global Health and Medical Data Mining Artificial Intelligence
§ Global Health and Medical Data Mining Qualitative and Quantitative Research
§ Emerging Issues on Global Health and Medical Data Mining
Overview of Major Problems, Issues, Challenges, Trends and Solutions on Global Health and Medical Data Mining
What Is Global Health and Medical Data Mining?
Global health and medical data mining is the mining of data, including real datasets, to support intelligent decision-making, research and policy development on health and medical issues concerning an individual or an entity such as an organization, corporation or government. Mining health and medical information on a global scale is challenging, especially because sensitive and confidential information is frequently used and exchanged over the Internet, corporate networks, intranets and extranets. This creates major challenges for data-mining-based decisions on health and medical issues such as vaccines, and for value- and evidence-based research in the health and medical sciences.
Significant Intelligent Seminar Series: Data Mining of Global Health Information and Medical Datasets
Intelligent Data Mining of Global Health and Medical Datasets
Seminar Series I: Introduction
Seminar Series II: Intermediate
Seminar Series III: Advanced
Seminar Series I: Introduction to Data Mining of Health Information and Data Sets
Dr. Emmanuel Hooper, PhD, PhD, PhD
ehooper@fas.harvard.edu
Seminar Objectives
This seminar focuses on an introduction to Data Mining of Health Information and Health Datasets. Attendees from all backgrounds will benefit, whether they are students, health and medical personnel, corporate executives, managers, beginners, people with business backgrounds, or experts in data management, decision science, health information and health dataset management, public policy or other professions. The seminar is based on the eBook and covers key strategic steps for an effective IT and information security program; these steps apply to all users of information assets, including an organization's control environment. This introductory seminar defines emerging trends and challenges in Data Mining of Health Information and Health Datasets, together with the strategic steps essential for achieving and maintaining effective, incremental decision-making with health information and health datasets in a global context. It prepares students for the other levels of the seminar series:
- Seminar Series I: Introduction to Data Mining of Health Information and Data Sets
- Seminar Series II: Intermediate Data Mining of Health Information and Data Sets
- Seminar Series III: Advanced Data Mining of Health Information and Data Sets
- Online Seminar: Introduction to Data Mining of Health Information and Data Sets
Prerequisite: There is no prerequisite for this class, although a basic understanding of health information and health datasets will be helpful. Experienced students and professionals will also find the seminar useful for enhancing their expertise, so attendees with little or no background can benefit as well as seasoned professionals, middle managers and corporate executives.
Target audience: This seminar is for students and individuals from all backgrounds, including health, medicine, corporate leadership, management and business, whether beginners, mid-career professionals or experts. It is designed for anyone who wants to learn or update their understanding of health information, health datasets or related areas. It is especially useful for practitioners, researchers, faculty and professionals in health and medicine, in the private or public sector, at national, multinational or global scale, who want to understand the rapidly changing trends, challenges and solutions in health information and health datasets.
Benefits of Attending: This seminar provides time-saving resources that address important global information on Data Mining of Health Information and Health Datasets. You will learn strategic techniques based on nearly 30 years of professional experience, research and consulting for global companies. The material is presented through ebooks and discussions of real examples that you can apply immediately to your own situation. We also reference key resources and expert information that you will find valuable for regular use.
You will find the seminar stimulating: it provides cost-saving tips and free analysis of rapidly emerging global information and trends, with solutions and recommendations for meeting challenges and solving both common and complex problems in Data Mining of Health Information and Health Datasets.
Handouts: Relevant examples, seminar slides and other materials will be provided as appropriate.
Seminar Objectives: The seminar combines two main areas: (1) Data Mining of Health Information and Health Datasets, and (2) emerging trends and solutions for Data Mining of Health Information and Health Datasets requirements. By the end of the seminar you are expected to have a broad and in-depth understanding of both areas and to be able to apply them immediately to real and emerging challenges in your own situation.
Assignments
Class Participation: Students engage in class discussions on the key topics of each session.
Paper 1: Students write a short paper (approximately 2-3 pages) on an instructor-approved topic in Data Mining of Health Information and Health Datasets.
Paper 2: Students write a short paper (approximately 2-3 pages) on an instructor-approved topic in Data Protection and Privacy Regulations.
Paper 3: Students write a short paper (approximately 2-3 pages) on the benefits of Data Mining of Health Information and Health Datasets.
Final Paper: Students research an instructor-approved topic related to Data Mining of Health Datasets using any of the classification techniques covered in the seminar, and write a paper (approximately 10 pages) that demonstrates their understanding and application of the topic and the lessons learned in class.
Papers should be written in Times or Arial, 12-point, double-spaced. Put your name, the page number and the date on each page.
Seminar Evaluation
The grading of the seminar is as follows:
Class Participation: 35%
Paper 1: 10%
Paper 2: 10%
Paper 3: 10%
Final Paper: 35%
Seminar Textbook
Data Mining: Concepts and Techniques, Second Edition (2006). J. Han and M. Kamber. Publisher: Morgan Kaufmann. ISBN: 1-55860-901-6
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Main Experiment Tools
Weka (open source)
XLMiner
Rosetta (open source)
Various Handouts
The seminar is organized around experiments to aid understanding and rapid assimilation. The reading includes clear explanations, summaries, and detailed academic and practical solutions. Students learn through presentations, PowerPoint slides, explanations, sample demonstrations, discussions, and applicable real-world information, including technical examples and detailed references for future development and use.
Optional Reading
References
The following references are provided for background reading and to enhance students' understanding.
Yang, Jian, Alejandro F. Frangi, Jing-Yu Yang, David Zhang and Zhong Jin. KPCA Plus LDA: A Complete Kernel Fisher Discriminant Framework for Feature Extraction and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(2):230–244, 2005.
Øhrn, Aleksander. Discernibility and Rough Sets in Medicine: Tools and Applications. PhD thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, Dec. 1999. NTNU report 1999:133. URL: http://www.idi.ntnu.no/~aleks/thesis
Pagallo, G. and D. Haussler. Boolean Feature Discovery in Empirical Learning. Machine Learning, 5(1):71–99, 1990.
Papagelis, A. and D. Kalles. Breeding Decision Trees Using Evolutionary Techniques. In Proceedings of the 18th International Conference on Machine Learning, ICML-2001, pages 393–400. Morgan Kaufmann, San Francisco, CA, 2001.
Pei, J., J. Han, and R. Mao. CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets. In Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pages 21–30. ACM Press, New York, 2000.
Data Mining and Health Data Set Software Tools
Top Ten Data Mining Software Tools
1. SPSS / SPSS Clementine
2. Salford Systems CART/MARS/TreeNet/RF
3. YALE (now RapidMiner) (open source)
4. SAS / SAS Enterprise Miner
5. Angoss Knowledge Studio / Knowledge Seeker
6. KXEN
7. Weka (open source)
8. R (open source)
9. Microsoft SQL Server
10. MATLAB
Reference: KDnuggets Survey (May 2007), http://www.kdnuggets.com/; http://www.the-data-mine.com/
Additional Data Mining Tools: RuleQuest Research, Rule Induction with C5.0, See5/Cubist software, 2002–2005; Rosetta software (Øhrn, 1999); XLMiner.
See the end of this syllabus for more data mining tools.
Seminar Sessions
Class Session 1:
- Purpose, Definition and Significance of Data Mining of Health Information and Data Sets
- Decision Science and Data Mining of Health Datasets
- Introduction to Data Mining of Health Information and Data Sets
- Strategic Steps for Data Mining of Health Information and Data Sets
- Pattern Recognition, Pattern Matching and Pattern Analysis
- Data Mining of Health Datasets: Machine Learning, Applications, Algorithms and Health Databases
- Impact on Decision Science and Data Mining of Health Datasets
Reading: Data Mining: Concepts and Techniques, Second Edition (2006). J. Han and M. Kamber. Publisher: Morgan Kaufmann
Optional Reading: Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates. See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499-543, in Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source)
Class Session 2:
- Introductory Analysis of Data Mining and Health Data Sets
- Data Mining and Health Data Sets: Data Preprocessing
- Data Mining and Health Data Sets: Feature Attributes
- Data Mining and Health Data Sets: Data Dictionary
Reading: Data Mining: Concepts and Techniques, Second Edition (2006). J. Han and M. Kamber. Publisher: Morgan Kaufmann
Optional Reading: Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates. See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499-543, in Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source)
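Session 2 introduces data preprocessing. The following is a minimal plain-Python sketch, not the seminar's own material or Weka's implementation, of two common preprocessing steps: filling missing numeric values with the attribute mean, then min-max normalization to [0, 1]. The records and the attribute names `age` and `systolic_bp` are hypothetical toy values.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "systolic_bp": 120},
    {"age": None, "systolic_bp": 135},
    {"age": 58, "systolic_bp": None},
    {"age": 46, "systolic_bp": 150},
]


def impute_mean(rows, attribute):
    """Replace missing values of one attribute with the mean of its observed values."""
    observed = [r[attribute] for r in rows if r[attribute] is not None]
    fill = mean(observed)
    for r in rows:
        if r[attribute] is None:
            r[attribute] = fill
    return rows


def min_max_normalize(rows, attribute, new_min=0.0, new_max=1.0):
    """Linearly rescale one attribute into [new_min, new_max]."""
    values = [r[attribute] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    for r in rows:
        if span == 0:
            r[attribute] = new_min  # constant attribute: map every value to new_min
        else:
            r[attribute] = (r[attribute] - lo) / span * (new_max - new_min) + new_min
    return rows


for attr in ("age", "systolic_bp"):
    impute_mean(records, attr)
    min_max_normalize(records, attr)

for r in records:
    print(r)
```

Imputing before normalizing keeps the filled-in values inside the observed range. Distance-based classifiers such as k-nearest neighbor are sensitive to attribute scale, which is one common reason to normalize.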
Class Session 3:
- Data Mining and Health Data Sets: Data Mining Methodologies
- Data Mining and Health Data Sets: Overview of Classification
- Data Mining and Health Data Sets: Introduction to Classification and Decision Trees
- Data Mining and Health Data Sets: Introduction to Clustering
- Data Mining and Health Data Sets: Introduction to Rule Induction
- Data Mining and Health Data Sets: Introduction to Association Rules
Optional Reading: Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates. See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499-543, in Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source)
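Session 3 introduces association rules. As a hedged illustration (a brute-force sketch over item pairs, not Apriori and not the seminar's code), the Python below computes the two standard rule measures: support, the fraction of transactions containing an itemset, and confidence, the estimated P(consequent | antecedent). Each "transaction" is a hypothetical set of conditions recorded for one patient.

```python
from itertools import combinations

# Hypothetical toy transactions: the set of conditions recorded for each patient.
transactions = [
    {"hypertension", "diabetes", "obesity"},
    {"hypertension", "obesity"},
    {"diabetes", "obesity"},
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes", "obesity"},
    {"asthma", "obesity"},
]


def support_count(itemset, rows):
    """Number of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in rows if itemset <= t)


def support(itemset, rows):
    """Fraction of transactions containing every item in itemset."""
    return support_count(itemset, rows) / len(rows)


def confidence(antecedent, consequent, rows):
    """Estimated P(consequent | antecedent) = count(A and C) / count(A)."""
    both = support_count(set(antecedent) | set(consequent), rows)
    return both / support_count(antecedent, rows)


def pair_rules(rows, min_support=0.4, min_confidence=0.7):
    """Brute-force search for rules {A} -> {C} over item pairs (no Apriori pruning)."""
    items = sorted(set().union(*rows))
    found = []
    for a, c in combinations(items, 2):
        if support({a, c}, rows) < min_support:
            continue  # pair is not frequent, so neither rule direction qualifies
        for lhs, rhs in ((a, c), (c, a)):
            conf = confidence({lhs}, {rhs}, rows)
            if conf >= min_confidence:
                found.append((lhs, rhs, support({a, c}, rows), conf))
    return found


for lhs, rhs, s, conf in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={conf:.2f}")
```

Note that a rule and its reverse share the same support but can differ in confidence: here `obesity -> hypertension` is rejected while `hypertension -> obesity` is kept. Algorithms such as Apriori avoid enumerating every candidate by pruning itemsets whose subsets are already infrequent.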
Class Session 4:
- Data Mining and Health Data Sets: Data Mining and Classification Techniques
- Data Mining and Health Data Sets: Classification and Decision Trees
- Data Mining and Health Data Sets: Decision Trees
- Experiments in Decision Trees using Health Data Sets
- Optimizations in Classification Decision Trees using Health Data Sets
Optional Reading: Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates. See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499-543, in Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source)
Assignments: Paper 1 Due
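Session 4 covers decision trees. A key step in building a tree is choosing which attribute to split on. The minimal sketch below (plain Python, not the seminar's or Weka's code) uses information gain, the criterion of ID3; C4.5 refines it with the gain ratio. The attributes `smoker` and `age_group` and their labels are hypothetical toy data.

```python
from collections import Counter
from math import log2

# Hypothetical toy training data: (features, heart_disease label).
training = [
    ({"smoker": "yes", "age_group": "old"}, "yes"),
    ({"smoker": "yes", "age_group": "young"}, "yes"),
    ({"smoker": "no", "age_group": "old"}, "yes"),
    ({"smoker": "no", "age_group": "young"}, "no"),
    ({"smoker": "no", "age_group": "young"}, "no"),
    ({"smoker": "yes", "age_group": "old"}, "yes"),
    ({"smoker": "no", "age_group": "old"}, "no"),
    ({"smoker": "no", "age_group": "young"}, "no"),
]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(data, attribute):
    """Label entropy minus the size-weighted entropy of the partitions on attribute."""
    labels = [label for _, label in data]
    partitions = {}
    for features, label in data:
        partitions.setdefault(features[attribute], []).append(label)
    remainder = sum(len(p) / len(data) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_split(data, attributes):
    """Attribute with the highest information gain (the root split ID3 would choose)."""
    return max(attributes, key=lambda a: information_gain(data, a))


for attr in ("smoker", "age_group"):
    print(attr, round(information_gain(training, attr), 3))
print("root split:", best_split(training, ["smoker", "age_group"]))
```

A full tree builder would apply `best_split` recursively to each partition until the labels are pure or no attributes remain.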
Class Session 5:
- Data Mining and Health Data Sets: Data Mining and Classification Techniques
- Data Mining and Health Data Sets: Other Classification Techniques
- Data Mining and Health Data Sets: Bayesian Classification
- Experiments in Bayesian Classification using Health Data Sets
- Optimizations in Classification Techniques using Health Data Sets
Optional Reading: Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates. See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499-543, in Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source)
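Session 5 introduces Bayesian classification, which rests on Bayes' theorem: P(H | E) = P(E | H) P(H) / P(E). The short sketch below applies it to a hypothetical screening test and then picks the maximum a posteriori (MAP) class. The prevalence, sensitivity and false-positive figures are illustrative assumptions, not results from any study.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem.

    prior: P(disease); sensitivity: P(positive | disease);
    false_positive_rate: P(positive | no disease) = 1 - specificity.
    """
    # Total probability of a positive result, P(positive).
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


def map_class(priors, likelihoods):
    """Maximum a posteriori class: argmax over c of P(evidence | c) * P(c).

    P(evidence) is the same for every class, so it can be left out of the comparison.
    """
    return max(priors, key=lambda c: likelihoods[c] * priors[c])


# Hypothetical screening-test figures (illustrative only).
p = posterior(prior=0.01, sensitivity=0.9, false_positive_rate=0.05)
print(f"P(disease | positive) = {p:.3f}")

label = map_class(
    priors={"disease": 0.01, "healthy": 0.99},
    likelihoods={"disease": 0.9, "healthy": 0.05},  # P(positive | class)
)
print("MAP class for a positive result:", label)
```

With these assumed numbers the posterior is only about 0.15, so the MAP class for a positive result is still `healthy`: when a condition is rare, the prior dominates even a fairly sensitive test.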
Class Session 6:
- Data Mining and Health Data Sets: Data Mining and Classification Techniques
- Data Mining of Health Information using Classifiers
- Maximum Entropy Classifiers
- Naive Bayes Classifiers and Probabilistic Classifiers
- Bayesian Networks, Statistical Inference and Classification
- Experiments in Health Data Sets using Naive Bayes Probabilistic Classification
Optional Reading: Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates. See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499-543, in Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source)
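Session 6 covers naive Bayes classifiers, which assume that attributes are conditionally independent given the class, so P(c | x) is proportional to P(c) multiplied by the product of P(x_j | c) over the attributes. Below is a minimal categorical sketch with Laplace (add-one) smoothing, written in plain Python rather than Weka; the `fever`/`cough` records and the `flu`/`cold` labels are hypothetical.

```python
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        # counts[c][j][v] = number of class-c rows whose attribute j equals v
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values seen for attribute j
        for features, label in zip(X, y):
            for j, v in enumerate(features):
                self.counts[label][j][v] += 1
                self.values[j].add(v)
        return self

    def score(self, features, c):
        """Unnormalized posterior: P(c) times the product of smoothed P(x_j | c)."""
        p = self.class_counts[c] / self.n
        for j, v in enumerate(features):
            k = len(self.values[j])  # number of distinct values of attribute j
            p *= (self.counts[c][j][v] + 1) / (self.class_counts[c] + k)
        return p

    def predict(self, features):
        return max(self.class_counts, key=lambda c: self.score(features, c))


# Hypothetical toy data: (fever, cough) -> diagnosis.
X = [("high", "yes"), ("high", "yes"), ("high", "no"),
     ("normal", "yes"), ("normal", "no"), ("normal", "no")]
y = ["flu", "flu", "flu", "cold", "cold", "cold"]

model = NaiveBayes().fit(X, y)
print(model.predict(("high", "yes")), model.predict(("normal", "no")))
```

Smoothing keeps a single unseen attribute value from forcing a class probability to zero; for many attributes, summing log-probabilities instead of multiplying avoids numerical underflow.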
Class Session 7:
- Data Mining and Health Data Sets: Data Mining and Classification Techniques
- Data Mining of Health Information using Classifiers
- k-Nearest Neighbor Classifiers
- Experiments in Health Data Sets using k-Nearest Neighbor Classification
Reading: Pages 2–5, Data Mining: Concepts and Techniques, Second Edition (2006). J. Han and M. Kamber. Publisher: Morgan Kaufmann. ISBN: 1-55860-901-6
Optional Reading: Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates. See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499-543, in Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source)
Assignments: Paper 2 Due
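Session 7 covers k-nearest neighbor classification: a new case receives the majority label among the k most similar training cases. The minimal sketch below uses Euclidean distance via `math.dist` (Python 3.8+). The two attributes, assumed to be glucose and BMI already normalized to [0, 1], and their labels are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to query (Euclidean distance)."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical normalized (glucose, bmi) pairs with labels.
train = [
    ((0.10, 0.20), "non-diabetic"),
    ((0.20, 0.10), "non-diabetic"),
    ((0.15, 0.30), "non-diabetic"),
    ((0.80, 0.90), "diabetic"),
    ((0.90, 0.70), "diabetic"),
    ((0.85, 0.80), "diabetic"),
]

print(knn_predict(train, (0.20, 0.25)))
print(knn_predict(train, (0.80, 0.80)))
```

k-NN has no training phase, but every prediction scans the whole training set. An odd k avoids tied votes in two-class problems, and because distance is scale-sensitive, attributes should be normalized first.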
Class Session 8:
- Data Mining and Health Data Sets: Data Mining and Classification Techniques
- Data Mining of Health Information using Other Classifiers
- Support Vector Machines
- Experiments in Health Data Sets using Support Vector Machines Classification
Optional Reading: Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates. See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499-543, in Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source)
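Session 8 covers support vector machines. Weka trains SVMs with its SMO classifier; as a simpler illustration, the sketch below trains a linear SVM with Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. This is a different solver from SMO. Appending a constant 1 to each input, so that the bias is also regularized, is a simplifying assumption, and the two-class points are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    X: list of feature tuples; y: labels in {-1, +1}. A constant 1.0 is appended to
    each input so the last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for _ in range(len(data)):
            t += 1
            x, label = rng.choice(data)
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink toward zero: the gradient step for the regularizer.
            w = [(1 - eta * lam) * wi for wi in w]
            # If the hinge loss is active (margin < 1), step toward the correct side.
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable two-class data.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

The regularization weight `lam` trades margin width against training errors. Non-linear decision boundaries need kernels, which this linear sketch does not cover.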
Class Session 9:
- Data Mining and Health Data Sets: Data Mining and Classification Techniques
- Data Mining of Health Information using Other Classifiers
- Linear Classifiers: Fisher's Linear Discriminant
- Experiments in Health Data Sets using Linear Discriminant Classification
Reading: Pages 2–5, Data Mining: Concepts and Techniques, Second Edition (2006). J. Han and M. Kamber. Publisher: Morgan Kaufmann. ISBN: 1-55860-901-6
Optional Reading: Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates. See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499-543, in Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source)
Assignments: Paper 3 Due
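Session 9 covers Fisher's linear discriminant, which projects the data onto the direction w = S_W^{-1}(m1 - m0). Here m0 and m1 are the class means and S_W is the within-class scatter matrix. The sketch below works in two dimensions with a hand-coded 2x2 inverse and assumes a midpoint threshold between the projected means, one simple choice among several. The points are hypothetical.

```python
def mean_vec(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, m):
    """2x2 within-class scatter matrix: sum over points of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_lda(class0, class1):
    """Fisher direction w = S_W^{-1} (m1 - m0) and a midpoint threshold, for 2-D data."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]  # assumed non-zero
    inv = [[sw[1][1] / det, -sw[0][1] / det], [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]

    def project(p):
        return w[0] * p[0] + w[1] * p[1]

    threshold = (project(m0) + project(m1)) / 2
    return w, threshold


def classify(w, threshold, p):
    return 1 if w[0] * p[0] + w[1] * p[1] > threshold else 0


# Hypothetical two-attribute data for two classes.
class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 2.0)]
class1 = [(6.0, 5.0), (7.0, 7.0), (8.0, 6.0), (7.0, 6.0)]
w, threshold = fisher_lda(class0, class1)
print(w, threshold)
```

Fisher's criterion chooses the direction that maximizes between-class separation relative to within-class spread. The kernel Fisher discriminant framework of Yang et al. (in the references) extends the idea to non-linear feature spaces.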
Class Session 10:
- Data Mining and Health Data Sets: Data Mining and Classification Techniques
- Learning Vector Quantization
- Perceptrons and Neural Networks: Multi-Layer Perceptrons
- Data Mining and Health Data Sets: Classification and Misclassification
- Experiments in Health Data Sets using Optimizations in Classification
Optional Reading: Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates. See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499-543, in Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source)
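Session 10 introduces perceptrons, the building blocks of multi-layer perceptrons. The minimal sketch below implements the classic single-layer perceptron learning rule: when an example is misclassified, its input is added to or subtracted from the weights. The training set is a hypothetical rule, "high risk only when both binary risk factors are present", which is linearly separable, so the rule converges.

```python
def train_perceptron(X, y, epochs=20, lr=1.0):
    """Single-layer perceptron for labels in {-1, +1}; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(X, y):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            predicted = 1 if activation > 0 else -1
            if predicted != target:
                # Move the decision boundary toward classifying this example correctly.
                w = [wi + lr * target * xi for wi, xi in zip(w, x)]
                b += lr * target
                errors += 1
        if errors == 0:  # every training example is now classified correctly
            break
    return w, b


def perceptron_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical rule: high risk (+1) only when both binary risk factors are present.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
w, b = train_perceptron(X, y)
print(w, b, [perceptron_predict(w, b, x) for x in X])
```

A single perceptron can only learn linearly separable patterns. It cannot represent XOR, for example, which is one motivation for the multi-layer perceptrons listed in this session.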
Class Session 11:
- Data Mining and Health Data Sets: Data Mining and Classification Techniques
- Strategic Steps for Data Mining of Health Information and Data Sets
- Data Mining and Health Data Sets: Training Data Set
- Data Mining and Health Data Sets: Test Data Sets
- Data Mining and Health Data Sets: Training and Test Results
- Data Mining and Health Data Sets: Analysis and Conclusions
Optional Reading: Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates. See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499-543, in Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source)
Assignments: Final Paper Due
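Sessions 10 and 11 cover misclassification and the evaluation of training and test results. The minimal sketch below shows a reproducible train/test split and a binary confusion matrix. From the matrix it derives the misclassification rate, sensitivity and specificity. The function names and the actual/predicted labels are hypothetical illustrations, not output from any classifier in the seminar.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and hold out test_fraction of them as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def confusion_matrix(actual, predicted, positive="yes"):
    """Counts of true/false positives and negatives for a binary problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


def summarize(cm):
    total = sum(cm.values())
    return {
        "misclassification_rate": (cm["FP"] + cm["FN"]) / total,
        "sensitivity": cm["TP"] / (cm["TP"] + cm["FN"]),  # recall on the positive class
        "specificity": cm["TN"] / (cm["TN"] + cm["FP"]),
    }


# Hypothetical test-set labels and classifier predictions.
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "no", "yes", "no"]
predicted = ["yes", "no", "yes", "no", "no", "yes", "no", "no", "yes", "no"]
cm = confusion_matrix(actual, predicted)
print(cm, summarize(cm))
```

The test set must be held out from model fitting so that these figures estimate performance on unseen cases. In health data, sensitivity and specificity are often more informative than the overall error rate because a false negative and a false positive can have very different costs.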
Seminar Series II: Intermediate Data Mining of Health Information and Data Sets
Dr. Emmanuel Hooper, PhD, PhD, PhD ehooper@fas.harvard.edu
Seminar Objectives
This seminar focuses on Intermediate Data Mining of Health Information and Health Datasets. Attendees from all backgrounds will benefit, whether they are students, health and medical personnel, corporate executives, managers, beginners, people with business backgrounds, or experts in data management, decision science, health information and health dataset management, public policy or other professions. This intermediate seminar defines emerging trends and challenges in Data Mining of Health Information and Health Datasets, together with the strategic steps essential for achieving and maintaining effective, incremental decision-making with health information and health datasets in a global context. It prepares students for the other levels of the seminar series:
- Seminar Series I: Introduction to Data Mining of Health Information and Data Sets
- Seminar Series II: Intermediate Data Mining of Health Information and Data Sets
- Seminar Series III: Advanced Data Mining of Health Information and Data Sets
- Online Seminar: Introduction to Data Mining of Health Information and Data Sets
Prerequisite: The prerequisite for this class is Seminar Series I: Introduction to Data Mining of Health Information and Data Sets, or a similar background. An introductory understanding of health information and health datasets will be helpful. Experienced students and professionals will also find the seminar useful for enhancing their expertise, so attendees with some background can benefit as well as seasoned professionals, middle managers and corporate executives.
Target audience: This seminar is for students and individuals from all backgrounds, including health, medicine, corporate leadership, management and business, whether beginners, mid-career professionals or experts. It is designed for anyone who wants to learn or update their understanding of health information, health datasets or related areas. It is especially useful for practitioners, researchers, faculty and professionals in health and medicine, in the private or public sector, at national, multinational or global scale, who want to understand the rapidly changing trends, challenges and solutions in health information and health datasets.
Benefits of Attending: This seminar provides time-saving resources that address important global information on Data Mining of Health Information and Health Datasets. You will learn strategic techniques based on nearly 30 years of professional experience, research and consulting for global companies. The material is presented through lectures and discussions of real examples that you can apply immediately to your own situation. We also reference key resources and expert information that you will find valuable for regular use.
You will find the seminar stimulating: it provides cost-saving tips and free analysis of rapidly emerging global information and trends, with solutions and recommendations for meeting challenges and solving both common and complex problems in Data Mining of Health Information and Health Datasets.
Handouts: Relevant examples, seminar slides and other materials will be provided as appropriate.
Seminar Objectives: The seminar combines two main areas: (1) Data Mining of Health Information and Health Datasets, and (2) emerging trends and solutions for Data Mining of Health Information and Health Datasets requirements. By the end of the seminar you are expected to have a broad and in-depth understanding of both areas and to be able to apply them immediately to real and emerging challenges in your own situation.
Assignments
Class Participation: Students engage in class discussions on the key topics of each session.
Paper 1: Students write a short paper (approximately 2-3 pages) on an instructor-approved topic in Data Mining of Health Information and Health Datasets.
Paper 2: Students write a short paper (approximately 2-3 pages) on an instructor-approved topic in Data Protection and Privacy Regulations.
Paper 3: Students write a short paper (approximately 2-3 pages) on the benefits of Data Mining of Health Information and Health Datasets.
Final Paper: Students research an instructor-approved topic related to Data Mining of Health Datasets using any of the classification techniques covered in the seminar, and write a paper (approximately 10 pages) that demonstrates their understanding and application of the topic and the lessons learned in class.
Papers should be written in Times or Arial, 12-point, double-spaced. Put your name, the page number and the date on each page.
Seminar Evaluation
The grading of the seminar is as follows:
Class Participation: 35%
Paper 1: 10%
Paper 2: 10%
Paper 3: 10%
Final Paper: 35%
Seminar Textbook
Data Mining: Concepts and Techniques, Second Edition (2006). J. Han and M. Kamber. Publisher: Morgan Kaufmann. ISBN: 1-55860-901-6
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Main Experiment Tools
Weka (open source)
XLMiner
Rosetta (open source)
Various Handouts
The seminar is organized around experiments to aid understanding and rapid assimilation. The reading includes clear explanations, summaries, and detailed academic and practical solutions. Students learn through presentations, PowerPoint slides, explanations, sample demonstrations, discussions, and applicable real-world information, including technical examples and detailed references for future development and use.
Optional Reading
References
The following references are provided for background reading and to enhance students' understanding.
Yang, Jian, Alejandro F. Frangi, Jing-Yu Yang, David Zhang and Zhong Jin. KPCA Plus LDA: A Complete Kernel Fisher Discriminant Framework for Feature Extraction and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(2):230–244, 2005.
Øhrn, Aleksander. Discernibility and Rough Sets in Medicine: Tools and Applications. PhD thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, Dec. 1999. NTNU report 1999:133. URL: http://www.idi.ntnu.no/~aleks/thesis
Pagallo, G. and D. Haussler. Boolean Feature Discovery in Empirical Learning. Machine Learning, 5(1):71–99, 1990.
Papagelis, A. and D. Kalles. Breeding Decision Trees Using Evolutionary Techniques. In Proceedings of the 18th International Conference on Machine Learning, ICML-2001, pages 393–400. Morgan Kaufmann, San Francisco, CA, 2001.
Pei, J., J. Han, and R. Mao. CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets. In Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pages 21–30. ACM Press, New York, 2000.
Data Mining and Health Data Set Software Tools
Top Ten Data Mining Software Tools
1. SPSS / SPSS Clementine
2. Salford Systems CART/MARS/TreeNet/RF
3. YALE (now RapidMiner) (open source)
4. SAS / SAS Enterprise Miner
5. Angoss Knowledge Studio / Knowledge Seeker
6. KXEN
7. Weka (open source)
8. R (open source)
9. Microsoft SQL Server
10. MATLAB
Reference: KDnuggets survey (May 2007), http://www.kdnuggets.com/ ; http://www.the-data-mine.com/
Additional Data Mining Tools: RuleQuest Research, Rule Induction with C5.0, See5/Cubist software, 2002–2005; Rosetta software (Øhrn, 1999); XLMiner.
See the end of this syllabus for more Data Mining tools.
Seminar Sessions
Class Session 1: Strategic Steps for Data Mining of Health Information and Data Sets
- Decision Science and Data Mining of Health Datasets
- Impact on Decision Science and Data Mining of Health Datasets
- Intermediate Analysis of Data Mining and Health Data Sets
- Data Mining and Health Data Sets: Development and Analysis of Data Preprocessing
- Data Mining and Health Data Sets: Development and Analysis of Feature Attributes
- Data Mining and Health Data Sets: Development and Analysis of the Data Dictionary
- Data Mining and Health Data Sets: Development and Analysis of Data Post-processing
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in)
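Data preprocessing, one of the Session 1 topics, commonly includes filling in missing values and rescaling numeric attributes before mining. The sketch below is a minimal illustration under stated assumptions, not the seminar's own procedure: each attribute is a list of floats with `None` marking a missing value, missing values are replaced by the attribute mean, and values are then min-max scaled to [0, 1]. The function names and the blood pressure readings are hypothetical.

```python
from typing import List, Optional


def impute_mean(column: List[Optional[float]]) -> List[float]:
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max_scale(column: List[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical systolic blood pressure readings with one missing value.
systolic = [120.0, None, 140.0, 160.0]
filled = impute_mean(systolic)   # None is replaced by 140.0, the mean of 120, 140, 160
scaled = min_max_scale(filled)
print(filled)   # [120.0, 140.0, 140.0, 160.0]
print(scaled)   # [0.0, 0.5, 0.5, 1.0]
```

Mean imputation is only one option; for skewed clinical measurements the median may be the safer choice.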
Class Session 2: Strategic Steps for Data Mining of Health Information and Data Sets
- Data Mining and Health Data Sets: Clustering Techniques
- Data Mining and Health Data Sets: Clustering Model
- Data Mining and Health Data Sets: Development and Analysis of the Training Data Set
- Data Mining and Health Data Sets: Development and Analysis of Test Data Sets
- Data Mining and Health Data Sets: Development and Analysis of Training and Test Results
- Data Mining and Health Data Sets: Analysis and Conclusions
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in)
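Session 2 develops separate training and test data sets. One common way to build them, shown here as a sketch rather than the seminar's prescribed method, is a seeded random holdout split. The 70/30 ratio, the seed, and the function name `holdout_split` are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(records: Sequence[T], test_fraction: float = 0.3,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and split them into (training, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


patient_ids = list(range(1, 11))            # ten hypothetical patient records
train, test = holdout_split(patient_ids)
print(len(train), len(test))                # 7 3
```

Because the two lists come from one shuffled copy, no record can appear in both the training and the test set.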
Class Session 3: Data Mining and Health Data Sets: Data Mining Clustering Methodologies
- Data Mining and Health Data Sets: Intermediate Clustering
- Data Mining and Health Data Sets: Intermediate Rule Induction
- Data Mining and Health Data Sets: Intermediate Association Rules
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in)
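Association rules, covered in Session 3, are usually scored by support (the fraction of records containing every item in the rule) and confidence (the fraction of records containing the antecedent that also contain the consequent). A minimal sketch, assuming each patient record is encoded as a set of findings; the findings and records are hypothetical.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in the itemset."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]], antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    a = set(antecedent)
    base = support(transactions, a)
    if base == 0:
        return 0.0
    return support(transactions, a | set(consequent)) / base


records = [
    {"obesity", "hypertension", "diabetes"},
    {"obesity", "hypertension"},
    {"obesity", "diabetes"},
    {"hypertension"},
]
print(support(records, {"obesity", "diabetes"}))        # 0.5
print(confidence(records, {"obesity"}, {"diabetes"}))   # about 0.667
```

The rule obesity => diabetes holds in two of the three records that mention obesity, which is why its confidence is 2/3 while its support is 0.5.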
Class Session 4: Data Mining and Health Data Sets: Data Mining and Clustering Techniques
- Data Mining and Health Data Sets: Other Clustering Techniques
- Data Mining and Health Data Sets: Sum of Squares Clustering Method
- Data Mining and Health Data Sets: Sum of Squares Clustering Criteria
- Data Mining and Health Data Sets: Sum of Squares Clustering Algorithms
- Optimizations in Classification Decision Trees using Health Data Sets
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in)
Assignments: Paper 1 Due
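The sum-of-squares clustering criterion in Session 4 scores a partition by adding up, over all clusters, the squared Euclidean distance from each point to its cluster's centroid; lower values mean tighter clusters. A small sketch, assuming points are tuples of floats and a clustering is given as a list of non-empty lists of points (this representation is an assumption for illustration).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def within_cluster_ss(clusters: List[List[Point]]) -> float:
    """Sum over clusters of squared Euclidean distances to the cluster centroid."""
    total = 0.0
    for cluster in clusters:
        c = centroid(cluster)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)
    return total


clusters = [[(1.0, 1.0), (3.0, 1.0)], [(10.0, 10.0), (10.0, 12.0)]]
print(within_cluster_ss(clusters))  # 4.0
```

Each cluster here contributes 2.0 (two points, each at distance 1 from the centroid), giving a total of 4.0.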
Class Session 5: Data Mining and Health Data Sets: Data Mining and Classification Techniques
- Data Mining of Health Information using Clustering Algorithms
- Hierarchical clustering algorithms: either agglomerative ("bottom-up") or divisive ("top-down").
- Agglomerative clustering algorithms: begin with each element as a separate cluster and merge clusters into successively larger clusters.
- Divisive clustering algorithms: begin with the whole set and divide it into successively smaller clusters.
- Partitional clustering algorithms: typically determine all clusters at once, but can also be used as the divisive algorithms in hierarchical clustering.
- Density-based clustering algorithms: discover arbitrary-shaped clusters, where a cluster is a region in which the density of data objects exceeds a threshold.
- Experiments in Health Data Sets using K-means Clustering Techniques
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in)
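The agglomerative ("bottom-up") procedure listed for Session 5 can be sketched in a few lines: start with every point as its own cluster and repeatedly merge the two closest clusters. This sketch assumes single linkage (cluster distance = distance between the closest pair of members) and stops at a requested number of clusters; both choices are illustrative, and the brute-force loop is meant for small teaching data sets only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def single_link(a: List[Point], b: List[Point]) -> float:
    """Distance between the closest pair of points drawn from clusters a and b."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points: List[Point], n_clusters: int) -> List[List[Point]]:
    """Merge the two closest clusters until only n_clusters remain."""
    clusters = [[p] for p in points]          # each point starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))   # j > i, so index i is unaffected
    return clusters


pts = [(1.0, 1.0), (1.5, 1.0), (8.0, 8.0), (8.5, 8.0), (1.0, 1.5)]
print(agglomerative(pts, 2))
# [[(1.0, 1.0), (1.5, 1.0), (1.0, 1.5)], [(8.0, 8.0), (8.5, 8.0)]]
```

Recording the merge distances instead of stopping early would yield the full dendrogram; swapping `single_link` for another linkage function changes the cluster shapes the method favors.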
Class Session 6: Data Mining and Health Data Sets: Data Mining and Clustering Techniques
- Data Mining and Health Data Sets: Data Mining and Partitional Clustering
- Data Mining and Health Data Sets: K-means Clustering Techniques
- Multiple Cluster Sizes and Cluster Distances
- Experiments in Health Data Sets using K-means Clustering Techniques
- Optimizations in Classification Techniques using Health Data Sets
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in)
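The K-means experiments in Sessions 5 to 7 rely on an algorithm that alternates two steps: assign each point to its nearest centroid, then recompute each centroid as the mean of its assigned points. Each round cannot increase the sum-of-squares criterion. The plain-Python sketch below (Lloyd's algorithm) makes two assumptions of its own: initial centroids are drawn at random from the data with a fixed seed, and an empty cluster keeps its previous centroid. Tools such as Weka's SimpleKMeans handle these details in their own ways.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def mean_point(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points: List[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: returns (centroids, cluster label for each point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # k data points as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: recompute each centroid; an empty cluster keeps its old one.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean_point(members) if members else centroids[j])
        if new_centroids == centroids:         # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels


pts = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(pts, 2)
print(labels)
```

K-means finds a local optimum that depends on the starting centroids, so in practice it is run from several seeds and the run with the lowest sum of squares is kept.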
Class Session 7: Data Mining and Health Data Sets: Data Mining and Clustering Techniques
- Cluster Outliers and Model Improvement
- Optimizations on Multiple Cluster Distances and Outliers
- Experiments in Health Data Sets using K-means Clustering Techniques
- Optimizations in Classification Techniques using Health Data Sets
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in)
Assignments: Paper 2 Due
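Session 7 turns to cluster outliers. One simple way to flag them, offered as an illustrative assumption rather than the method taught in the seminar, is to compute each point's distance to its cluster centroid and flag points whose distance exceeds a chosen multiple of the median distance in that cluster. The multiplier of 3 and the sample points are arbitrary.

```python
import math
import statistics
from typing import List, Tuple

Point = Tuple[float, ...]


def flag_outliers(cluster: List[Point], factor: float = 3.0) -> List[Point]:
    """Return points whose distance to the centroid exceeds factor * median distance."""
    centroid = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    dists = [math.dist(p, centroid) for p in cluster]
    cutoff = factor * statistics.median(dists)
    return [p for p, d in zip(cluster, dists) if d > cutoff]


cluster = [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.0), (9.0, 9.0)]
print(flag_outliers(cluster))  # [(9.0, 9.0)]
```

Note that the outlier itself pulls the centroid toward it; a more robust variant would measure distances from the coordinate-wise median instead of the mean.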
Class Session 8: Data Mining and Health Data Sets: Data Mining and Clustering Techniques
- Data Mining and Health Data Sets: Intermediate Principal Component Analysis
- Data Mining and Health Data Sets: Principal Component Analysis: Correlation and Unique Uncorrelated Components
- Data Mining and Health Data Sets: Intermediate Discriminant Component Analysis
- Experiments in Health Data Sets using Principal Component Analysis
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in)
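Session 8 covers principal component analysis, which replaces correlated attributes with uncorrelated components. For the special case of two attributes, PCA reduces to finding the eigenvalues of a 2×2 covariance matrix, which have a closed form. The sketch below is limited to two numeric attributes and uses the population covariance. It returns the eigenvalues (the variances of the two uncorrelated components) and the share of variance explained by the first component. Analyses with more attributes would use a linear-algebra library or the principal-components options in tools such as Weka.

```python
import math
from typing import List, Tuple


def pca_2d(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Return (lambda1, lambda2, share of variance explained by component 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Population covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x in xs) / n
    c = sum((y - my) ** 2 for y in ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Eigenvalues of a symmetric 2x2 matrix: centre of the diagonal +/- a radius.
    mid = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + radius, mid - radius
    total = lam1 + lam2
    return lam1, lam2, (lam1 / total if total > 0 else 0.0)


# Perfectly correlated hypothetical attributes: one component carries all variance.
print(pca_2d([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))  # (6.25, 0.0, 1.0)
```

A second eigenvalue of zero means the two attributes are redundant: the data lie on a line, so keeping only the first component loses no information.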
Class Session 9: Data Mining and Health Data Sets: Data Mining and Classification Techniques
- Data Mining of Health Information using Clustering Optimizations
- Experiments in Health Data Sets using Clustering Algorithms and Optimizations
- Experiments in Health Data Sets using Ward’s Clustering Algorithm
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in)
Assignments: Paper 3 Due
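Session 9 includes experiments with Ward's clustering algorithm. Ward's method is agglomerative: at each step it merges the pair of clusters whose union causes the smallest increase in the within-cluster sum of squares. For clusters A and B with sizes nA, nB and centroids mA, mB, that increase is (nA * nB / (nA + nB)) * ||mA - mB||^2. The sketch below computes this merge cost and, as a check, compares it with a direct sum-of-squares computation on hypothetical points.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def sse(points: List[Point]) -> float:
    """Sum of squared Euclidean distances from each point to the centroid."""
    m = centroid(points)
    return sum(sum((x - mi) ** 2 for x, mi in zip(p, m)) for p in points)


def ward_cost(a: List[Point], b: List[Point]) -> float:
    """Increase in within-cluster sum of squares caused by merging clusters a and b."""
    na, nb = len(a), len(b)
    ma, mb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ma, mb))
    return na * nb * sq_dist / (na + nb)


a = [(0.0, 0.0), (2.0, 0.0)]
b = [(10.0, 0.0)]
print(ward_cost(a, b))                  # 54.0
print(sse(a + b) - sse(a) - sse(b))     # 54.0, the same increase computed directly
```

The closed-form cost lets an implementation rank candidate merges from cluster sizes and centroids alone, without recomputing sums of squares over every point.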
Class Session 10: Data Mining and Health Data Sets: Data Mining and Classification Techniques
- Data Mining of Health Information using Clustering Optimizations
- Experiments in Health Data Sets using Clustering Analysis and Optimizations
- Experiments in Health Data Sets using Clustering Optimization Techniques
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in)
Class Session 11: Data Mining and Health Data Sets: Comparative Analysis of Classification and Clustering Techniques
- Strategic Steps for Data Mining of Health Information and Data Sets
- Experiments in Health Data Sets using Optimized Clustering Techniques
- Data Mining and Health Data Sets: Optimizations in the Training Data Set
- Data Mining and Health Data Sets: Optimizations in Test Data Sets
- Data Mining and Health Data Sets: Training and Test Results Analysis
- Data Mining and Health Data Sets: Analysis and Conclusions
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in)
Assignments: Final Paper Due
Seminar Series III: Advanced Data Mining of Health Information and Data Sets
Dr. Emmanuel Hooper, PhD, PhD, PhD
ehooper@fas.harvard.edu
Seminar Objectives
This unique seminar focuses on Advanced Data Mining of Health Information and Health Datasets. Attendees from all backgrounds will benefit, whether they are students, health and medical personnel, corporate executives, managers with business backgrounds, or experts in data management, decision science, health information and health dataset management, public policy, or other professions. This advanced seminar presents effective, intelligent strategies and solutions to the emerging trends and challenges in Data Mining of Health Information and Health Datasets, including the strategic steps essential for achieving and maintaining simultaneous and incremental effective decision-making with health information and health datasets in a global context. This is the final seminar for students in the following seminar series:
- Seminar Series I: Introduction to Data Mining of Health Information and Data Sets
- Seminar Series II: Intermediate Data Mining of Health Information and Data Sets
- Seminar Series III: Advanced Data Mining of Health Information and Data Sets
- Online Seminar: Introduction to Data Mining of Health Information and Data Sets
Prerequisite: The prerequisite for this class is Seminar Series II: Intermediate Data Mining of Health Information and Data Sets, or a similar background. An intermediate understanding of health information and health datasets will be helpful. However, any student or professional with experience will find this seminar useful for enhancing their expertise, so attendees with some background can benefit, as can seasoned professionals, middle managers, and corporate executives.
Target audience: This seminar is for students and individuals from all backgrounds, including health, medical, corporate executives, managers, business, beginners, professionals, mid-career, and experts. It is designed for anyone who intends to learn or update their knowledge and understanding of health information and health datasets or related areas. It is especially informative for practitioners, researchers, faculty, and professionals in health and medicine, in the private or public sectors, whether national, multinational, or global, who want to understand the rapidly changing and emerging trends, challenges, and solutions in health information and health datasets.
Benefits of Attending: This seminar will provide you with time-saving resources that address important and useful global information on Data Mining of Health Information and Health Datasets. You will learn strategic techniques based on nearly 30 years of professional experience, research, and consulting for global companies. The information is presented with e-books and discussions of real examples that you can apply immediately to your situation. We also reference key resources and expert information that you will find vital and indispensable for regular use.
You will find the seminar and its information stimulating, with cost-saving tips and free analysis of rapidly emerging global information and trends, along with solutions and recommendations for meeting challenges and solving both common and complex problems in Data Mining of Health Information and Health Datasets.
Handouts: Relevant examples, seminar slides, and other materials will be provided as appropriate.
Seminar Objectives: The objectives of this seminar are to provide you with a combined focus on Data Mining of Health Information and Health Datasets, and on emerging trends and solutions for Data Mining of Health Information and Health Datasets requirements. It is intended that you will learn each of these areas by the end of the seminar and apply them immediately to solve real and emerging challenges in Data Mining of Health Information and Health Datasets. By the end of the seminar you will gain both a broad and an in-depth understanding of these areas for practical implementation in your situation.
Assignments
Class Participation: Students engage in class discussions on the key topics of each session.
Paper 1: Students write a short paper of approximately 2–3 pages on a topic approved by the instructor in the area of Data Mining of Health Information and Health Datasets.
Paper 2: Students write a short paper of approximately 2–3 pages on a topic approved by the instructor in the area of Data Protection and Privacy Regulations.
Paper 3: Students write a short paper of approximately 2–3 pages on the benefits of Data Mining of Health Information and Health Datasets.
Final Paper: Students research a topic related to Data Mining of Health Datasets using any of the Classification Techniques covered in the seminar, and write a paper (approximately 10 pages), on a topic approved by the instructor, that demonstrates their understanding and application of the topic and the lessons learned in the class.
Papers should be written in Times or Arial font, 12 point, double-spaced. Be sure to put your name, the page number, and the date on each page of the paper.
Seminar Evaluation
The grading of the seminar will be as follows:
Class Participation: 35%
Paper 1: 10%
Paper 2: 10%
Paper 3: 10%
Final Paper: 35%
Seminar Textbook
Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Additional Reading
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
Sumathi, S. and Sivanandam, S. N. (2006). Introduction to data mining and its applications. Studies in Computational Intelligence 29. Publisher: Springer, New York. Library of Congress Control Number: 2006926723. See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in), Rosetta (open source)
Health Data Sets: http://www.ehdp.com/vitalnet/datasets.htm
Various Handouts
The seminar is well organized, with experiments to facilitate understanding and rapid assimilation. The reading includes clear explanations, summaries, and detailed analytical, academic, and practical solutions. Students will learn through presentations, PowerPoint slides, explanations, sample demonstrations, discussions, and relevant real-world information, including technical examples and detailed references for future development and use.
Optional Reading
The following references are provided only as background reading to enhance students' understanding.
Yang, Jian, Alejandro F. Frangi, Jing-Yu Yang, David Zhang and Zhong Jin. KPCA Plus LDA: A Complete Kernel Fisher Discriminant Framework for Feature Extraction and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(2):230–244, 2005.
Øhrn, Aleksander. Discernibility and Rough Sets in Medicine: Tools and Applications. PhD thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, Dec. 1999. NTNU report 1999:133. URL: http://www.idi.ntnu.no/~aleks/thesis
Pagallo, G. and D. Haussler. Boolean Feature Discovery in Empirical Learning. Machine Learning, 5(1):71–99, 1990.
Papagelis, A. and D. Kalles. Breeding Decision Trees using Evolutionary Techniques. In Proceedings of the 18th International Conference on Machine Learning, ICML-2001, pages 393–400. Morgan Kaufmann, San Francisco, CA, 2001.
Pei, J., J. Han, and R. Mao. CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets. In Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pages 21–30. ACM Press, New York, 2000.
Data Mining and Health Data Set Software Tools
Top Ten Data Mining Software Tools
1. SPSS / SPSS Clementine
2. Salford Systems CART/MARS/TreeNet/RF
3. YALE (now RapidMiner) (open source)
4. SAS / SAS Enterprise Miner
5. Angoss Knowledge Studio / Knowledge Seeker
6. KXEN
7. Weka (open source)
8. R (open source)
9. Microsoft SQL Server
10. MATLAB
Reference: KDnuggets survey (May 2007), http://www.kdnuggets.com/ ; http://www.the-data-mine.com/
Additional Data Mining Tools: RuleQuest Research, Rule Induction with C5.0, See5/Cubist software, 2002–2005; Rosetta software (Øhrn, 1999); XLMiner.
See the end of this syllabus for more Data Mining tools.
Seminar Sessions
Class Session 1: Intelligent Decision Science and Data Mining of Health Datasets
- Advanced Data Mining of Health Information and Data Sets
- Intelligent Strategic Steps for Data Mining of Health Information and Data Sets
- Advanced Pattern Recognition, Pattern Matching and Pattern Analysis
- Impact on Decision Science and Data Mining of Health Datasets
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in)
Class Session 2: Data Mining and Health Data Sets: Intelligent Data Mining Techniques
- Strategic Steps for Intelligent Data Mining of Health Information and Data Sets
- Experiments in Health Data Sets using Optimized Techniques
- Data Mining and Health Data Sets: Optimizations in the Training Data Set
- Data Mining and Health Data Sets: Optimizations in Test Data Sets
- Data Mining and Health Data Sets: Training and Test Results Analysis
- Data Mining and Health Data Sets: Analysis and Conclusions
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in)
Class Session 3: Data Mining and Health Data Sets: Data Mining Methodologies
- Data Mining and Health Data Sets: Association Rules and Algorithms
- Decision Science Data Mining and Health Data Sets: Decision Rule Induction
- Experiments in Health Data Sets using Rule Induction Optimization Techniques
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in)
Class Session 4: Data Mining and Health Data Sets: Data Mining Methodologies
- Data Mining and Health Data Sets: Association Rules and Data Mining
- Data Mining and Health Data Sets: Rule Induction and Data Mining
- Experiments in Health Data Sets using Rule Induction Optimization Techniques
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in)
Class Session 5: Data Mining and Health Data Sets: Data Mining, Rule Induction and Classification Techniques
- Data Mining and Health Data Sets: Intelligent Rule Induction Techniques
- Hypothesis testing algorithms: RULEX; rough set rules
- Optimizations in Rule Induction Techniques using Health Data Sets
- Experiments in Health Data Sets using Rule Induction Techniques
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in), Rosetta (open source)
Class Session 6: Data Mining and Health Data Sets: Data Mining, Rule Induction and Clustering Techniques
- Data Mining and Health Data Sets: Intelligent Rule Induction Techniques
- Optimizations in Rule Induction Techniques using Health Data Sets
- Experiments in Health Data Sets using Rule Induction Techniques and Holte’s 1R Algorithm
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
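Session 6 experiments with Holte's 1R ("one rule") algorithm. For each attribute, 1R builds a rule that predicts the majority class for every value of that attribute, then keeps the single attribute whose rule makes the fewest errors on the training data. A minimal sketch for nominal attributes only: Holte's discretization of numeric attributes and handling of missing values are omitted, ties go to the attribute listed first, and the attribute names and records are hypothetical. Weka provides an implementation as its OneR classifier.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]


def one_r(records: List[Record], target: str) -> Tuple[str, Dict[str, str], int]:
    """Return (best attribute, value -> predicted class rule, training errors)."""
    best = None
    attributes = [a for a in records[0] if a != target]
    for attr in attributes:
        # Count class frequencies for each value of this attribute.
        counts: Dict[str, Counter] = defaultdict(Counter)
        for r in records:
            counts[r[attr]][r[target]] += 1
        # Each value predicts its most frequent class; the rest are errors.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1] for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


data = [
    {"smoker": "yes", "age": "old", "risk": "high"},
    {"smoker": "yes", "age": "young", "risk": "high"},
    {"smoker": "yes", "age": "old", "risk": "low"},
    {"smoker": "no", "age": "old", "risk": "low"},
    {"smoker": "no", "age": "young", "risk": "low"},
    {"smoker": "no", "age": "young", "risk": "low"},
]
print(one_r(data, "risk"))  # ('smoker', {'yes': 'high', 'no': 'low'}, 1)
```

Here "smoker" misclassifies one record while "age" misclassifies two, so 1R keeps the smoker rule.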
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in), Rosetta (open source)
Assignments: Paper 1 Due
Class Session 7: Data Mining and Health Data Sets: Data Mining, Rule Induction and Clustering Techniques
- Data Mining and Health Data Sets: Frequent Pattern Matching and Rule Induction Techniques
- Optimizations in Rule Induction Techniques using Health Data Sets
- Experiments in Health Data Sets using Rule Induction and Frequent Pattern Matching Techniques
Reading: Data Mining: Concepts and Techniques, Second Edition (2006), J. Han and M. Kamber. Publisher: Morgan Kaufmann.
Optional Reading:
Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
See “Data Mining in Biomedicine and Science”, Chapter 21, pages 499–543, in Sumathi and Sivanandam (2006), Introduction to data mining and its applications.
Main Experiment Tools: Weka (open source), XLMiner (Excel add-in), Rosetta (open source)
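Frequent pattern mining, a Session 7 topic, is classically done with the Apriori algorithm. Apriori builds frequent k-itemsets only from frequent (k-1)-itemsets, because every subset of a frequent itemset must itself be frequent. The sketch below is a compact, unoptimized version; the minimum support count and the symptom records are assumptions for illustration. CLOSET, listed in the optional reading, is a different algorithm that mines only closed itemsets.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset that appears in at least min_count transactions."""
    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {c: n for c, n in count(items).items() if n >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        frequent = {c: n for c, n in count(candidates).items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result


records = [
    {"fever", "cough", "fatigue"},
    {"fever", "cough"},
    {"fever", "fatigue"},
    {"cough", "fatigue"},
    {"fever", "cough", "fatigue"},
]
frequent_sets = apriori(records, min_count=3)
for itemset, n in sorted(frequent_sets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With a minimum count of 3, each single symptom (count 4) and each pair (count 3) is frequent, but the full triple appears in only two records and is dropped.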
Class Session 8:
• Data Mining and Health Data Sets: Data Mining and Rule Induction Techniques
• Data Mining and Health Data Sets: Optimizations in Decision Rule Algorithms
• Decision Science, Data Mining and Health Data Sets: Optimizations in Decision Rule Algorithms
• Experiments in Health Data Sets using Decision Rule Algorithms
Reading: Han, J. and Kamber, M. (2006). Data Mining: Concepts and Techniques, Second Edition. Morgan Kaufmann.
Optional Reading: Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates. See also "Data Mining in Biomedicine and Science", Chapter 21, pages 499-543, in Introduction to Data Mining and Its Applications.
Main Experiment Tools: Weka (open source), XLMiner (commercial Excel add-in), Rosetta (open source)
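Decision rule algorithms range from one-attribute rule sets to full rule-induction systems. As a minimal, hedged illustration of the simplest end of that range, the sketch below implements the 1R (OneR) idea in plain Python: for each attribute it builds one rule per attribute value (predict the majority class seen with that value) and keeps the attribute whose rules make the fewest training errors. The attribute names, records and the helpers `one_r` and `predict` are hypothetical; Weka's `OneR` classifier implements the same idea with extra options such as handling numeric attributes.

```python
from collections import Counter, defaultdict


def one_r(rows, target):
    """Learn a 1R rule set from a list of dicts with categorical attributes.

    Returns (attribute, {value: predicted_class}, training_errors).
    """
    best = None
    for attr in (a for a in rows[0] if a != target):
        # Tally the class labels observed with each value of this attribute.
        tallies = defaultdict(Counter)
        for row in rows:
            tallies[row[attr]][row[target]] += 1
        # One rule per value: predict the majority class for that value.
        rules = {value: t.most_common(1)[0][0] for value, t in tallies.items()}
        # Training errors: records outside the majority class of their value.
        errors = sum(sum(t.values()) - t[rules[v]] for v, t in tallies.items())
        # Strict "<" keeps the first attribute when error counts tie.
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


def predict(model, row, default=None):
    """Apply the learned rules; unseen attribute values fall back to `default`."""
    attr, rules, _ = model
    return rules.get(row[attr], default)


patients = [  # hypothetical, illustrative records
    {"smoker": "yes", "bmi": "high", "risk": "high"},
    {"smoker": "yes", "bmi": "normal", "risk": "high"},
    {"smoker": "no", "bmi": "high", "risk": "high"},
    {"smoker": "no", "bmi": "normal", "risk": "low"},
    {"smoker": "no", "bmi": "normal", "risk": "low"},
    {"smoker": "yes", "bmi": "normal", "risk": "high"},
]
model = one_r(patients, target="risk")
print(model)  # ('smoker', {'yes': 'high', 'no': 'low'}, 1)
```

Here "smoker" wins with one training error against two for "bmi". A single-attribute rule set is easy to read and audit, which matters for clinical decision support, but it can underfit when outcomes depend on several attributes at once.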
Class Session 9:
• Data Mining and Health Data Sets: Hybrid Data Mining Techniques
• Data Mining and Health Data Sets: Training Data Sets and Avoiding Overfitting in Training and Test Data Sets
• Data Mining and Health Data Sets: Optimal Training Data Sets
• Data Mining and Health Data Sets: Optimal Test Data Sets
• Experiments in Health Data Sets using Hybrid Data Mining Techniques
Reading: Han, J. and Kamber, M. (2006). Data Mining: Concepts and Techniques, Second Edition. Morgan Kaufmann.
Optional Reading: Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates. See also "Data Mining in Biomedicine and Science", Chapter 21, pages 499-543, in Introduction to Data Mining and Its Applications.
Main Experiment Tools: Weka (open source), XLMiner (commercial Excel add-in), Rosetta (open source)
Assignments: Paper 2 due
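Overfitting shows up as a model that scores much better on its training set than on records it has never seen, so training and test sets must be kept disjoint. The sketch below is a minimal illustration under assumed names (`holdout_split`, `k_fold_splits` are hypothetical helpers, not any tool's API). It shows a reproducible holdout split and k-fold cross-validation indices in plain Python; the 30% test fraction and the seeds are arbitrary choices.

```python
import random


def holdout_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and hold out a fraction as the test set."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def k_fold_splits(n, k=5):
    """Yield (train_indices, test_indices) so each index is tested exactly once.

    Folds are contiguous, so shuffle the records first if they are ordered
    (for example by admission date or by class label).
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


train, test = holdout_split(range(10), test_fraction=0.3, seed=0)
print(len(train), len(test))  # 7 3
for train_idx, test_idx in k_fold_splits(10, k=3):
    print(test_idx)  # [0, 1, 2, 3], then [4, 5, 6], then [7, 8, 9]
```

Fitting a model on `train` and comparing its accuracy on `train` against `test` gives the gap that signals overfitting. Cross-validation averages that comparison over k folds, which is more stable when health data sets are small.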
Class Session 10:
• Data Mining and Health Data Sets: Hybrid Data Mining Techniques
• Optimizations of Data Mining and Health Data Sets
• Data Mining of Health Information using Data Mining Techniques
• Model Optimizations: Effective Use of Anomalies and Outliers
• Experiments in Health Data Sets using Optimal Data Mining Techniques
Reading: Han, J. and Kamber, M. (2006). Data Mining: Concepts and Techniques, Second Edition. Morgan Kaufmann.
Optional Reading: Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates. See also "Data Mining in Biomedicine and Science", Chapter 21, pages 499-543, in Introduction to Data Mining and Its Applications.
Main Experiment Tools: Weka (open source), XLMiner (commercial Excel add-in), Rosetta (open source)
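For the session's topic of anomalies and outliers, one common and robust screening rule is Tukey's fences: values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are flagged. The sketch below uses Python's standard `statistics.quantiles` (Python 3.8+) with the inclusive method. The helper name `iqr_outliers` and the readings (hypothetical systolic blood pressures in mmHg) are assumptions for illustration. Whether a flagged value is a data-entry error or a clinically meaningful anomaly remains a domain judgement.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (outliers, (low_fence, high_fence)) using Tukey's fences.

    Quartiles come from statistics.quantiles with method="inclusive";
    other quartile conventions give slightly different fences.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low or v > high]
    return outliers, (low, high)


readings = [118, 122, 125, 119, 121, 124, 120, 210]  # hypothetical mmHg values
print(iqr_outliers(readings))  # ([210], (113.0, 131.0))
```

Unlike a mean and standard deviation (z-score) rule, the quartiles are barely moved by the extreme value itself, which makes this rule less prone to an outlier masking its own detection.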
Class Session 11:
• Data Mining and Health Data Sets: Intelligent Data Mining Techniques
• Decision Science and Decision Making: Predicted Results
• Decision Science and Decision Making: Actual Results
• Data Mining and Health Data Sets: Confusion Matrices and Training and Test Results
• Data Mining and Health Data Sets: Confusion Matrix and Training Results
• Data Mining and Health Data Sets: Confusion Matrix and Test Results
• Data Mining and Health Data Sets and Health Information
• Data Mining of Health Information and Electronic Medical Records
• Data Mining and Health Data Sets: Analysis and Conclusions
Reading: Han, J. and Kamber, M. (2006). Data Mining: Concepts and Techniques, Second Edition. Morgan Kaufmann.
Optional Reading: Bath, P. A. (2004). Data mining in health and medical information. Annual Review of Information Science and Technology, 38: 331–369. doi: 10.1002/aris.1440380108 http://onlinelibrary.wiley.com/doi/10.1002/aris.1440380108/full
Advances in data mining: applications in medicine, web mining, marketing. Sixth Industrial Conference on Data Mining (ICDM 2006), Leipzig, Germany, July 2006. Publisher: Springer.
Ye, Nong (ed.). (2003). The Handbook of Data Mining. Mahwah, New Jersey: Lawrence Erlbaum Associates.
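Session 11 compares predicted with actual results through confusion matrices on training and test data. The sketch below is a minimal illustration with hypothetical labels and helper names (`confusion_counts`, `summarize`). It tallies a binary confusion matrix and derives accuracy, sensitivity, specificity and precision. Computing it separately for training and test predictions exposes the same overfitting gap covered in Session 9.

```python
def confusion_counts(actual, predicted, positive):
    """Return (tp, fp, fn, tn) for a binary task, given the positive class label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # correctly flagged case
        elif p == positive:
            fp += 1  # false alarm
        elif a == positive:
            fn += 1  # missed case
        else:
            tn += 1  # correctly cleared
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Common rates derived from the matrix; undefined ratios become NaN."""
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": ratio(tp, tp + fn),  # recall, true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": ratio(tp, tp + fp),  # positive predictive value
    }


actual = ["sick", "sick", "sick", "healthy", "healthy", "healthy", "healthy", "healthy"]
predicted = ["sick", "sick", "healthy", "sick", "healthy", "healthy", "healthy", "healthy"]
counts = confusion_counts(actual, predicted, positive="sick")
print(counts)  # (2, 1, 1, 4)
print(summarize(*counts))
```

Accuracy alone (0.75 here) can hide a poor sensitivity on rare conditions. In screening settings a false negative is a missed patient, so sensitivity and specificity are usually reported alongside it.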
Lecturer’s biography:
Dr. Emmanuel Hooper earned three PhDs simultaneously in Computing Sciences and Information Security, and multiple master's degrees from leading global universities including Yale University, USA. He conducted graduate studies at Oxford University and postdoctoral collaborative research at Harvard, MIT and Yale Universities. His professional experience includes leading consultancy for major global companies and adjunct faculty positions at various leading US universities, including the University of California. He is General Vice Chair and Technical Chair for many IEEE and other leading global associations, consortia, committees, conferences and groups, including IEEE, ISO, Ivy League, Harvard and Yale Alumni Professionals, and Harvard-MIT-Yale Cyber Scholar, and he serves on the editorial review committees of leading international journals. He is recognized in "Who's Who in the World" (2008-2011) and "Who's Who in America" (2007-2011). Dr. Hooper developed the most comprehensive and cost-effective strategy for Intelligent Data Mining in Decision-Making and a Decision Science Approach to Global Health Information for multinational companies. He is an author of Federal CIO Standards advocating Intelligent Hybrid Data Mining of Health Information and Electronic Medical Records for Effective Decision-Making, Decision Science and Public Policy Development, and of major IEEE publications. Dr. Hooper is a lecturer at leading universities worldwide, and a researcher and consultant in Strategic Global Information Intelligence, including Intelligent Hybrid Data Mining of Health Information and Electronic Medical Records for Effective Decision Science, Development Data Management, Data Mining and Reality Mining.
This work includes data mining of health information and datasets, decision science for medical and health information, data mining and analysis of vaccine response, epidemiology, and intelligent hybrid data mining of datasets on health and medical technologies to support effective public policy and remedial strategies for US and global health.
Main Experiment Tools: Weka (open source), XLMiner (commercial Excel add-in), Rosetta (open source)
Additional Data Mining Tools and Resources
• Angoss Software KnowledgeStudio 4.2 and Mining Manager 2.1
• Computer Associates CleverPath Predictive Analysis Server 3.0
• Fair Isaac Enterprise Decision Management suite
• Genalytics Predictive Suite 5.0
• IBM DB2 Intelligent Miner
• Insightful Miner 3.0
• Oracle Data Mining
• Quadstone System V. 5
Reference: http://www.dmoz.org/Computers/Software/Databases/Data_Mining//
Health Data Sets Data Mining Resources
• Acxiom Corporation - Provides a range of information services and products geared towards enterprise data management and retrieval.
• Advanced Software Applications - Data mining, analysis, and decision support software including predictive modeling, custom clustering, segmentation, scoring, rank order and profiling.
• Alterian Ltd - Specialises in the development of data analysis and visualisation technology including Nucleus, Atom, and Molecule.
• Alyuda Research, Inc. - Provides neural network software for data mining and forecasting as well as consulting and research services in neural networks and data mining.
• BayesiaLab - Bayesian network laboratory including a set of data mining and machine learning tools.
• The Chi-Square Works, Inc. - Provides multi-window, dynamic data mining systems (e.g., Panmo) that use graphical direct manipulation as the main user interface. Data can be specified, retrieved, and passed to analytical functions (e.g., SOM and CART) graphically.
• Cymfony - Develops information extraction technology, including the InfoXtract engine and the Dashboard line of media measurement applications.
• The Data Mining Group - Consortium of technology organizations developing the Predictive Model Markup Language (PMML) and standards-based APIs for predictive modeling.
• DataEngine - Software tool for intelligent data analysis which unites statistical methods with neural networks and fuzzy technologies. Site offers information and an evaluation download.
• Data-Miner Software Kit - Consists of routines for predictive data mining. Runs under Unix, Windows 9x/2000/XP or Certified 100% Pure Java. Includes classical and state-of-the-art prediction methods.
• Direct Insite - Provider of data mining and data visualization software, EBPP, professional services and transaction rating and billing services to the telecommunications market.
• DM-II - Builds classifiers using association rules. Developed at the School of Computing, National University of Singapore.
• Ellipse - Offers a visual data mining tool based on the Self-Organizing Map.
• Enterprise Miner - SAS Institute's entry to the world of data mining. Site contains a white paper for download describing Enterprise Miner as well as links to the entire SAS family of products.
• Exclusive Ore Inc. - Consulting group specializing in data mining. Known for XAffinity.
• IBM Knowledge Management Solutions - Tools that help knowledge workers present organized, understandable information in the right business context. Includes case studies and a trial version.
• IBM's Intelligent Miner Family - Collection of tools to identify and extract high-value business intelligence from DB2 data assets. Powerful visualization built in. A text mining tool is also available.
• Inforsense - Kensington Discovery Edition platform includes data integration, transformation, visualisation, mining and discovery process creation. Vertical applications in life science and general discovery informatics.
• Insightful Corporation - Supplier of software and services for statistical data mining, business analytics, knowledge management, and information retrieval. Products include S-PLUS, Insightful Miner and InFact.
• Intelli-mine - Provides analytical solutions that help organizations improve the quality of their daily decisions.
• ISoft - Provides tools to extract vital information from your data.
• KnowledgeMiner - Self-organizing data mining for the Mac. It uses three advanced self-organizing modeling technologies: Group Method of Data Handling (GMDH Neural Networks), Analog Complexing and Fuzzy Rule Induction. This is the first time that all of these algorithms have been available in one place on any computer platform.
• KXEN Inc. - Data mining automation company offering business analytics software.
• Magnify, Inc. - Data mining and predictive modeling software for the financial services and insurance industries.
• Management Intelligenter Technologien GmbH - Data mining and clustering software for numerical and textual data. Main product is DataEngine, with interfaces to LabView and BridgeView. DataEngine is a software tool for data analysis in which fuzzy rules, fuzzy clustering, neural networks and fuzzy neural systems are offered in combination with mathematics, statistics and signal processing.
• Mantas - Provider of data mining software that delivers business intelligence and compliance solutions for the global financial services industry.
• Marketswitch - Designs, develops and sells decisioning software that maximizes the capital value of the customer asset. Provides true mathematical constraint-based optimization software.
• MARS (Salford Systems) - A new high-speed predictive modeling solution that provides superior forecasting accuracy. An essential component of any data mining solution, MARS automates the development and deployment of accurate and easy-to-understand regression models.
• Megaputer Intelligence, Ltd. - Provides a family of solutions for data mining, knowledge discovery in databases, and natural language text retrieval and analysis. Best known for PolyAnalyst.
• Miner3D - Provides advanced visualization, sonification and speech technologies for interactive data analysis, real-time data mining and visual navigation through complex information systems.
• Mobile and Distributed Data Mining - AGNIK, LLC specializes in mobile and distributed data mining software.
• Model and Mine - Companion to the well-regarded Data Preparation for Data Mining book by Dorian Pyle. Lists and reviews ETL and mining software and resources, as well as articles by Pyle.
• Norkom Global - Provides financial crime and compliance software solutions for the global financial services industry.
• Partek Incorporated - Pattern recognition software used in life sciences and engineering for gene expression (microarray) data analysis, high-throughput screening, and drug design including SAR and ADME prediction.
• PolyVista, Inc. - Front-end software for Microsoft Analysis Services that integrates OLAP, data mining, and 3D data visualization.
• Portrait Software - Provides customer analytics and interaction management tools.
• QL2 Software - Specializes in web data harvesting and extraction using an SQL-like query language (WebQL).
• Raptor International - RapAnalyst is a powerful application that transforms complex data into actionable information.
• Retsel Group - Report Miner documents Crystal Reports files for both developers and end-users to enhance understanding of reports, and can identify differences between report files.
• RuleQuest Research - Produces knowledge discovery and data mining tools for Unix and Windows 95/98/NT/2000. These include See5 (decision trees), also known as C5.0, and Magnum Opus (association rules).
• Sagarmatha - An advanced engine providing data mining solutions for one-to-one marketing and personalization.
• Salentica Systems Inc. - An end-to-end analytics solution that creates and distributes knowledge to help financial service providers operate at peak effectiveness.
• Salford Systems - Provider of data mining and choice-modeling software and consultation services.
• Scientio - XML Miner and XML Rule are two tools designed to mine unstructured XML data and create a fuzzy expert system explaining what is found. The expert system rules can be reused on fresh XML or embedded data sources.
• Screen Scraper - Products and services for web site data extraction. The flagship product, screen-scraper, provides a GUI to define links to follow and information to extract, and works with several programming languages and platforms.
• Slice And Dice Data - Lists and describes all data items available from the long-form US Census, and allows visitors to create custom multidimensional reports based on any criteria.
• Smart Research BV - Statistical Modeling and Artificial Reasoning Technology for solving complex problems using intelligent techniques such as neural networks and graphical models.
• Spotfire - Emphasis on visual discovery and visualization, especially for life sciences.
• SPSS, Inc. - Develops, markets and supports an integrated line of statistical software products that let users bring marketplace and enterprise data to bear on decision-making.
• Statistica - Provides information about StatSoft's complete line of data analysis software.
• STATISTICA UK - Provides information about StatSoft's range of software for data mining, data analysis, quality control, graphics and web-based analysis.
• Stonefield Query - End-user data mining, querying, reporting, and business intelligence tool. Also offers an associated development kit for building custom reporting.
• Tennyson Maxwell Information Systems, Inc. - Develops and markets webspider applications and services. Products and services include the offline browser Teleport Pro, Dataplex (a worldwide data mining service), and WebDisc.
• thinkAnalytics - Provides a scalable, extensible system for real-time predictive analytics.
• Thinkmap - Offers software that animates data, creating interactive displays that interface directly with databases.
• Thinx Data Visualization Software - A 32-bit Windows application that is becoming the standard by which users connect graphics with data. In Thinx, graphics and data blend into one: change the data and the change is automatically reflected in the graphics.
• Two Crows Data Mining Technology Report - Details on 26 different data mining products from 24 vendors.
• Unica Corporation - Features a cross-channel marketing suite that helps businesses implement multi-channel marketing and campaign management strategies.
• Visual Analytics Inc. - Visual data mining and link analysis systems, pattern and trend detection, and visualization (VisuaLinks). Popular for government and military data work.
• Visual Numerics - The leading provider of visualization, mathematics, analysis and network software solutions including PV-WAVE, JWAVE, IMSL and JNL.
• XSB Inc. - Tools to extract data from poorly structured sources. Data warehousing and integration, data cleansing and data mining over the web, based on open source advanced logic programming technology.
• Zinnote - Offers a data reporting and integration tool. Links with databases through ODBC or OLE DB connections to create graphical and tabular data reports.
Reference: http://www.dmoz.org/Computers/Software/Databases/Data_Mining/Tool_Vendors/