Global Quantitative Research on Education Information Intelligence Trends
Key Intelligence and Expert Resources on Critical Global Quantitative Research Methods for Education Information Trends and Solutions
Global Information Intelligence and Trends
Critical Intelligence on Current and Emerging Global Quantitative Research Information Trends and Solutions
Summary
Quantitative Research Methods: Evidence Based Research
Quantitative research methods support rigorous scientific evaluation of effectiveness in areas such as US education, critical infrastructure, electricity markets, economics, health care costs, ethics and integrity, and regulatory reform.
The Empirical Quantitative Research Approach includes the following:
· Experimental Quantitative and Qualitative Research Methods
· Evaluation of existing intervention Programs and Activities
· Implementation capacities of Research Programs and Activities
See information on using Quantitative Methods at Harvard:
Quantitative Research Methods
Evidence-Based Research
Quantitative Research analysis
Comparative Analysis
· Econometrics and Labor Economics
o Regression Analysis
§ Linear Regression
§ “Natural Experiments” Instead of “Controlled Experiments”
§ Observational Data
§ Omitted-Variable Bias
§ Single Equation Methods Model – Dependent Variable
§ Simultaneous Equation Methods– Instrumental Variables
o Data Sets
§ Time Series Data – observations of one variable over time
§ Cross Sectional Data – many students’ performance in a given year
§ Panel Data – consists of both Time Series Data and Cross Sectional Data
§ Multidimensional Panel Data – contains observations across time, cross-sectionally, and across some third dimension
Scientific Quantitative Statistical Research Methods
Statistical Methods
· Statistical Chi Square Methods
· Statistical Rule Induction
· Statistical Clustering
· Variance, Covariance, Distribution, Variables
· Cluster Means
· Sum of Squares
· Support Vector Machines
Artificial Intelligence and Machine Learning
· Statistical Pattern Recognition
· Classification, Supervised Algorithms
· Hidden Markov Model and applications
· Decision processes and reinforcement learning
· Machine Learning
· Bayesian and Neural Networks: Representation, Inference and Learning
· Statistical Clustering
· Variance, Covariance, Distribution, Variables
· Cluster Means
· Sum of Squares
Quantitative Research Methods
· Tools: SPSS, Rosetta, XL Miner Software
· Data Mining and Reality Mining
· Hybrid Data Mining and Statistical Analysis
Quantitative Research Data
· Data Sets
· Real Datasets
· Historical Data Sets
· Current Data Sets
· Emerging Data Sets
· Training and Test Data Sets
· Feature Dictionary
· Feature Sets
· Feature Attributes
· Categories
· Subcategories
· Classes
· Types
· Attributes
Quantitative Statistical Research: Data Mining Algorithms
· Bayesian Classification
· Discriminant Analysis
· Ward Clustering
· Chi Square Statistical Analysis
· Association Rules
· K-Means Clustering
· Sum of Squares
· Rule Induction Algorithms
· Cubist/C5.0 Rule Induction
· Holte's 1 rule Induction
· Genetic Algorithms
· Discriminant Component Analysis (DCA)
· Fisher’s Discriminant Analysis
· Principal Component Analysis (PCA)
· Regression test statistics
· Principal Component Regression
· Linear combinations of student deviations
· Statistical methods: Sequence of performance of student profile and past Performance.
· Probability distribution of educational data
· Probabilities with maximum variance, correlation and non-correlation
· Transition matrix of possible groups
· Bayesian approach to performance transitions
· Bayes Factor statistic - testing null hypothesis:
- Observed performance transition probabilities
- Profiled performance transition matrix
· Decision tree classification
· Rule learner
· Naive Bayes
· Maximum Support Rules
· Conditional Rules
· Filtering for Maximum Support Attributes
· Validation of Test Data
· Statistical theory and neural networks
· Ripley neural nets- class of statistical models
· High-dimensional parameters model choice
· Predictive Bayesian inference in computation
· Neural Networks
- Multi-layer perceptron
- Back propagation network
- Feed-forward network (FFNN)
- Performance in multilayer perceptrons in NN limits of student
- Performance analysis in terms of over-fitting in NN complexity
· Linear Discriminant, nearest neighbor, etc.
· Reducing over-fitting using logistic regression
· Performance conditions and characteristics
· Convergence or initialization conditions
· Representation and data sizes
Quantitative Statistical Techniques
· Pattern recognition using neural networks
· Frequent pattern mining techniques
· Pattern Recognition
· Patterns Analysis
· Correlation and Cross-Tabulation
· Cross-Correlation
· Measurement of Failures and Effectiveness
· Monitoring, Detection, Prevention and Response
· Feature Cost-Sensitive (FCS)
· Cost Factors - Dynamic Costs
· Fault, Failure and Success Detection
· Heuristics
Quantitative Research: Field research Data Acquisition and Analysis
· Records on Performance
· Field Surveys
· Case Studies
· Questionnaires
· Test Cases
· Test Procedures
· Performance Matrices
· Corroborative Data
· Machine Learning
· Artificial Intelligence
· Quantitative Statistical Analysis
Evidence Results and Presentation
· Quantitative Statistical Analysis Outputs
· Receiver Operating Characteristic (ROC) curves of performance
· Confusion Matrices
· Statistical Tables
· Effectiveness, Robustness, Scalability, Transportability, Accuracies, Efficiencies
Quantitative Research: Data Mining Techniques
B. Median Distance: this defines the distance between two clusters as the distance
between the cluster medians.
2. The Sum-of-Squares Methods
Methods that minimize a sum-of-squares error criterion include K-means
clustering. The sum-of-squares clustering method identifies a partition of the data that optimizes a predefined clustering criterion based on the within-class and between-class scatter matrices.
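As a rough illustration of this criterion, the following Python sketch computes the within-cluster and between-cluster sums of squares for a given partition (these are the traces of the corresponding scatter matrices) and checks that they add up to the total sum of squares. The function names and sample points are assumptions made for this example only.

```python
def mean(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scatter_decomposition(clusters):
    """Return (within, between, total) sums of squares for a partition."""
    all_points = [p for c in clusters for p in c]
    grand = mean(all_points)
    # Within: spread of each point around its own cluster mean.
    within = sum(sq_dist(p, mean(c)) for c in clusters for p in c)
    # Between: spread of cluster means around the grand mean, weighted by size.
    between = sum(len(c) * sq_dist(mean(c), grand) for c in clusters)
    # Total: spread of every point around the grand mean.
    total = sum(sq_dist(p, grand) for p in all_points)
    return within, between, total


# Hypothetical two-cluster partition of four points.
clusters = [[(1.0, 1.0), (2.0, 1.0)], [(8.0, 9.0), (9.0, 9.0)]]
within, between, total = scatter_decomposition(clusters)
print(within, between, total)
```

Because the total is fixed for a given dataset, minimizing the within-cluster sum of squares is equivalent to maximizing the between-cluster sum of squares.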
3. The Clustering Criteria
The task is for clustering methods to partition a set of n data samples into g clusters to optimize the clustering criterion.
4. Clustering Algorithms of the Sum-of-Squares Methods
Partitioning n objects into g groups so as to optimize the selected criterion is a
combinatorial optimization problem. Solving it exhaustively requires the evaluation of all possible partitions.
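To give a sense of why evaluating every partition is rarely practical, the sketch below counts the partitions of n labelled objects into g non-empty groups using the standard recurrence for Stirling numbers of the second kind. The function name is chosen for this example.

```python
def stirling2(n, g):
    """Number of ways to partition n labelled objects into g non-empty groups."""
    # table[i][j] holds S(i, j), built from S(i, j) = j*S(i-1, j) + S(i-1, j-1).
    table = [[0] * (g + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, g) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][g]


print(stirling2(4, 2))   # 7
print(stirling2(10, 3))  # 9330
for n in (20, 30, 50):
    print(n, stirling2(n, 3))
```

The count grows very quickly with n, which is why iterative methods such as K-means are used instead of exhaustive search.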
5. K-means Clustering
The K-means algorithm partitions data into k clusters to minimize the within-group
sum of squares.
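A minimal sketch of K-means in plain Python follows, assuming the usual alternating assignment and update steps (often called Lloyd's algorithm). The initial centres are passed in explicitly so the run is reproducible; the function names and sample points are illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, max_iter=100):
    """Return (centers, groups, within-group sum of squares)."""
    centers = [tuple(c) for c in centers]
    groups = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Update step: each centre moves to the mean of its group
        # (an empty group keeps its previous centre).
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    wss = sum(sq_dist(p, centers[j]) for j, g in enumerate(groups) for p in g)
    return centers, groups, wss


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, groups, wss = kmeans(points, centers=[(1.0, 1.0), (2.0, 1.0)])
print(centers, wss)
```

The result depends on the initial centres, so in practice the algorithm is usually run from several starting points and the partition with the smallest within-group sum of squares is kept.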
6. Selecting Number of Clusters
The selection of the number of clusters has been a problem in cluster validity tests
and analysis since no single method can validate unstructured clusters.
Formal validation methods include:
· Likelihood ratio
· Chi square
· Covariance matrices
7. Analysis of Sum-of-Squares Clustering Methods
The problem for clustering methods is to partition a set of n data samples into g
clusters to optimize the clustering criterion.
Rule induction techniques and algorithms are used to extract information from data
because the representation of information is intuitive and readily understood.
Rule induction methods return “if/then” rules as outputs of
- systems based learning (e.g. k-nearest neighbor)
- statistical techniques (e.g. naïve Bayes classifier)
- neural networks
- support vector machines (SVM)
- Classification: each training example is represented by a set of predictor attributes and a class attribute. The algorithm analyses relationships between the predictor and the goal attributes to create a model that can be used later to predict the value of the goal attribute of new examples.
1. The Divide and Conquer Rule Induction Technique
The divide and conquer technique generates decision trees. The decision tree
algorithms use the divide and conquer technique to construct decision trees via a top down, greedy search. The divide and conquer technique evaluates all the predictor attributes to classify the examples in the training set.
2. The Separate and Conquer Rule Induction Technique
The separate and conquer technique generates a set of rules. The technique learns
a rule from a training dataset, then removes from it the examples covered by the rule, and subsequently learns recursively other rules that cover the remaining examples. This is the most common technique for rule induction algorithms.
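The separate and conquer loop can be sketched as follows. This is a simplified illustration, not a reproduction of any specific published algorithm: rules are conjunctions of attribute = value tests, each rule is grown greedily by precision, and the function names and the small student dataset are assumptions made for the example.

```python
def covers(rule, example):
    """A rule is a dict of attribute -> required value (a conjunction)."""
    return all(example[a] == v for a, v in rule.items())


def learn_one_rule(examples, target, attributes, label="class"):
    """Greedily add conditions until the rule covers only `target` examples."""
    rule = {}
    covered = list(examples)
    while any(e[label] != target for e in covered):
        best = None
        for a in attributes:
            if a in rule:
                continue
            for v in sorted({e[a] for e in covered}):
                subset = [e for e in covered if e[a] == v]
                pos = sum(e[label] == target for e in subset)
                if pos == 0:
                    continue
                score = (pos / len(subset), pos)  # precision first, then coverage
                if best is None or score > best[0]:
                    best = (score, a, v, subset)
        if best is None:
            break  # no attribute left to specialise on
        _, a, v, covered = best
        rule[a] = v
    return rule


def separate_and_conquer(examples, target, attributes, label="class"):
    """Learn rules for `target`, removing covered examples after each rule."""
    remaining = list(examples)
    rules = []
    while any(e[label] == target for e in remaining):
        rule = learn_one_rule(remaining, target, attributes, label)
        rules.append(rule)
        # "Separate": drop everything the new rule covers, then "conquer" the rest.
        remaining = [e for e in remaining if not covers(rule, e)]
    return rules


# Hypothetical training data.
data = [
    {"attendance": "high", "homework": "done", "class": "pass"},
    {"attendance": "high", "homework": "missed", "class": "pass"},
    {"attendance": "low", "homework": "done", "class": "pass"},
    {"attendance": "low", "homework": "missed", "class": "fail"},
    {"attendance": "low", "homework": "missed", "class": "fail"},
]
print(separate_and_conquer(data, "fail", ["attendance", "homework"]))
```

On this data the sketch returns a single rule that reads: if attendance is low and homework is missed, then the class is fail.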
3. Rule Induction Algorithms and Techniques
Rule induction algorithms specify procedures based on the above techniques.
I. Search mechanism – this involves the search strategy and method. The search strategy is implemented using the following procedures.
a. InitializeRules – specify whether the initial rule is a generic rule without an antecedent, a specific rule derived from an example, or a rule between these two extremes.
b. RefineRules – determine if the current rule is generalized or specialized, so that the chosen operation is consistent with the type of initial rule specified in the InitializeRules procedure. The search methods are based on SelectCandidates and FilterRules.
c. SelectCandidates – the procedure selects the subset of rules that will be generalized or specialized. This procedure is specified through a beam search. This is followed by a specific search method through instantiation of the beam width parameter. A greedy search method can be obtained by setting the beam width parameter to 1.
d. FilterRules – the procedure can use the same search method as
SelectCandidates or a different search method. The search method can
be specified similarly in both procedures.
II. Rules Representation – this is implemented by the RefineRules procedure, which determines the conditions that can be added to the candidate rules.
III. Rule Evaluation – this is directly defined by the procedure EvaluateRule, since it determines the rule-quality measure in the rule evaluation process.
IV. Pruning Methods – this is determined by the Stopping Criterion and Post Processing procedures. The Stopping Criterion implements pre-pruning methods by determining when to stop refining the rules, and Post Processing implements post-pruning methods.
Analysis of Separate and Conquer Algorithms
Many of the rule induction algorithms based on the separate and conquer approach differ from each other in four ways:
1. The representation of the candidate rules;
2. The search mechanism used to explore the space of candidate rules;
3. How the created rules are evaluated;
4. The pruning method.
The rule representation has a significant influence on the learning process, since some concepts can be expressed in one representation but not in others. In particular, rules can be represented in propositional or first-order logic.
Propositional rule algorithms
Propositional rules are composed of selectors, each of which associates an attribute with a value.
· CN2 and C4.5 rules
· RIPPER
· FOIL, PROGOL and Reduced Error Pruning (REP)
Prolog representation
Inductive Logic Programming (ILP)
ILP uses the same principles of rule induction algorithms
Analysis of Rule Induction Algorithms
1. Association Rules Induction Technique
Association rules are rules based on the simultaneous occurrence of a set of event items that satisfy specific conditions. In association-rule discovery, any association algorithm must discover precisely the same rule set, i.e. the set of all rules that have support and confidence greater than user-specified thresholds. In rule induction techniques, association rules may be used to analyze multiple feature attributes of various datasets.
1. To format it into a database file where each row is an audit record
and each column is a field of features in the audit records.
2. To enable continuous merging of the rules from each run and thereby aggregate the rule set of previous runs.
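The support and confidence thresholds mentioned in the description of association rules can be illustrated with a brute-force sketch. It enumerates every itemset, so it is only suitable for tiny examples and is not the Apriori algorithm; the function names and the records are assumptions for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (sets) that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def association_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for every rule passing both thresholds."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(itemset, transactions)
            if s == 0 or s < min_support:
                continue
            # Try every split of the itemset into a left and right side.
            for k in range(1, size):
                for lhs in combinations(itemset, k):
                    rhs = tuple(i for i in itemset if i not in lhs)
                    confidence = s / support(lhs, transactions)
                    if confidence >= min_confidence:
                        rules.append((lhs, rhs, s, confidence))
    return rules


# Hypothetical records: each row is one record, each item one observed feature.
records = [
    {"low_attendance", "missed_homework", "fail"},
    {"low_attendance", "missed_homework", "fail"},
    {"low_attendance", "pass"},
    {"missed_homework", "pass"},
    {"pass"},
]
for lhs, rhs, s, c in association_rules(records, min_support=0.4, min_confidence=0.9):
    print(lhs, "->", rhs, f"support={s:.2f} confidence={c:.2f}")
```

Support is the fraction of records containing all items of the rule, and confidence is that support divided by the support of the left-hand side alone.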
2. Frequent Episodes
A frequent episode is defined as a set of frequent events that occur within a time window of a specified length. The events in a serial episode occur in sequence within a specified time of minimum frequency.
1. Examination of the frequent associations of the feature attributes of the attack event.
2. Computation of the frequent sequential patterns from the associations.
3. The associations of attributes and sequential patterns of records are combined into a single rule.
2. C4.5 Rules - first generates a decision tree using the divide and conquer technique, and subsequently extracts one rule for each leaf node of the tree
3. CN2 - comprises the separate and conquer technique. This also includes
RIPPER and AQ algorithms, and Evolutionary algorithms, which include genetic algorithms and Genetic Programming (GP), to extract rules from datasets.
4. Rule induction: ID3 algorithm
Rule induction within the artificial intelligence (AI) research involves the use of algorithms, such as ID3, which uses entropy as the criterion for selecting the data fields for tree branching and the grouping of field values between branches. The generated rules can be more succinctly summarized. One method involves the omission of each rule condition in turn to see whether this results in any misclassification of data.
Quinlan’s “Iterative Dichotomiser” (ID3) system
The ID3 rule induction algorithm is applied to a sample of data to generate a set of rules and then other data items that were misclassified by the current rules are examined. A number of similar data items is added to the initial set and the rule induction algorithm is re-run to generate a new set of rules.
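The entropy criterion that ID3 uses to choose the field for each branch can be sketched as below. The code computes the information gain of each attribute (entropy before the split minus the weighted entropy after it) and picks the attribute with the largest gain. The function names and the four-row dataset are assumptions for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, label="class"):
    """Entropy of the class minus the weighted entropy after splitting on `attribute`."""
    base = entropy([r[label] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[label] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, label="class"):
    """Attribute that ID3 would branch on at this node."""
    return max(attributes, key=lambda a: information_gain(rows, a, label))


# Hypothetical training data.
rows = [
    {"attendance": "high", "homework": "done", "class": "pass"},
    {"attendance": "high", "homework": "missed", "class": "pass"},
    {"attendance": "low", "homework": "done", "class": "fail"},
    {"attendance": "low", "homework": "missed", "class": "fail"},
]
print(best_attribute(rows, ["attendance", "homework"]))  # attendance
```

Here attendance separates the classes perfectly (gain of 1 bit) while homework tells us nothing (gain of 0), so the tree branches on attendance first.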
5. Decision Trees
A decision tree corresponds directly to a set of rules, with as many rules as there are
leaf nodes in the whole tree. Each rule is a tracing out of the path from the top of the
tree to a leaf node. The key question is which of the attributes is the most useful
determiner of the conclusion of the rules.
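The correspondence between a tree and its rule set can be shown with a short sketch. The nested-dictionary tree format and the function name are assumptions chosen for this example: an internal node is a dictionary with one attribute as its key, and a leaf is simply the predicted class.

```python
def tree_to_rules(tree, conditions=()):
    """Trace each root-to-leaf path of a nested-dict tree into one rule."""
    if not isinstance(tree, dict):
        # Leaf node: the conditions collected on the way down form the rule body.
        return [(list(conditions), tree)]
    (attribute, branches), = tree.items()  # an internal node tests one attribute
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
    return rules


# Hypothetical tree with three leaf nodes.
tree = {
    "attendance": {
        "high": "pass",
        "low": {"homework": {"done": "pass", "missed": "fail"}},
    }
}
for body, conclusion in tree_to_rules(tree):
    tests = " and ".join(f"{a} = {v}" for a, v in body)
    print(f"if {tests} then {conclusion}")
```

The tree has three leaves, so exactly three rules are produced.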
6. Rule induction: CN2 Algorithm
The CN2 algorithm induces an ordered list of classification rules from the dataset, using entropy as its search heuristic. CN2 consists of a search procedure and a control procedure.
7. Rule induction: C4.5 Rules
C4.5 uses a divide-and-conquer approach to growing decision trees that was pioneered by Hunt et al. The default splitting criterion used by C4.5 is the gain ratio, an information-based measure that takes into account different numbers and different probabilities of test outcomes.
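The gain ratio divides the information gain of a test by its split information, i.e. the entropy of the distribution of test outcomes. The sketch below, with function names and data assumed for illustration, shows how this penalizes an attribute with many distinct values, such as a unique identifier.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(rows, attribute, label="class"):
    """Class entropy minus the weighted class entropy after the split."""
    base = entropy([r[label] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[label] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def gain_ratio(rows, attribute, label="class"):
    """Information gain normalised by the entropy of the split itself."""
    split_info = entropy([r[attribute] for r in rows])
    if split_info == 0:
        return 0.0  # the attribute has a single value, so it cannot split the data
    return information_gain(rows, attribute, label) / split_info


# Hypothetical data: student_id is unique per row, attendance has two values.
rows = [
    {"student_id": 1, "attendance": "high", "class": "pass"},
    {"student_id": 2, "attendance": "high", "class": "pass"},
    {"student_id": 3, "attendance": "low", "class": "fail"},
    {"student_id": 4, "attendance": "low", "class": "fail"},
]
for a in ("student_id", "attendance"):
    print(a, information_gain(rows, a), gain_ratio(rows, a))
```

Both attributes have the same information gain on this data, but the identifier's four-way split has a split information of 2 bits, so its gain ratio is half that of attendance.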
8. RIPPER
RIPPER uses a heuristic value function and an encoding length to determine when to stop adding rules to a rule set, followed by a post-pass that optimizes the rule set. Individual rules are grown and pruned. The encoding length heuristic is as follows: after each rule is added, the total description length of the rule set and the examples is computed.
9. AQ Algorithms
The AQ algorithm is a rule induction technique for producing a complete and consistent description of classes. A class description is formed by a collection of disjuncts of decision rules describing all the training examples given for that particular class. A decision rule is a set of conjuncts of allowable tests of feature values. It uses the given parameters to direct the AQ algorithm in the process of searching for a complete and consistent set of classes.
10. Genetic Programming and Genetic Algorithm
Genetic programming (GP) is the main kind of Evolutionary Algorithm (EA) designed to evolve programs. Hence, GP is a kind of EA where the individuals being evolved are computer programs. Banzhaf defined GP as the direct evolution of programs or algorithms for the purpose of inductive learning. The four major kinds of EAs are genetic algorithms, genetic programming, evolutionary strategies and evolutionary programming.
11. Genetic Algorithms in Rule Induction Technique
A genetic algorithm (GA) is used to explore the space of all subsets of the given feature set. Each of the selected feature subsets is evaluated (its fitness measured) by invoking a rule induction algorithm such as AQ15 with the correspondingly reduced feature space and training set, and measuring the recognition rate of the rules produced. The best feature subset is used in the actual design of the recognition system.
I. Representation Issues
The first step in applying GAs to the problem of feature selection is to map the
search space into a representation suitable for genetic search.
II. Fitness Function
In order to use genetic algorithms as the search procedure, it is necessary to
define a fitness function, which properly assesses the decision rules generated by
the AQ algorithm.
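A minimal sketch of the two design choices above, assuming a bit-string representation (one bit per feature, 1 meaning the feature is kept) and a fitness function supplied by the caller. In the approach described here the fitness would come from running a rule induction algorithm on the reduced feature set and measuring the recognition rate; the `toy_fitness` function below is only a hypothetical stand-in so the example can run on its own. All names and parameter values are assumptions.

```python
import random


def tournament(pop, fitness, rng):
    """Pick the fitter of two randomly chosen individuals."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def genetic_feature_selection(n_features, fitness, pop_size=20, generations=30,
                              mutation_rate=0.05, seed=0):
    """Evolve bit masks over the features (n_features >= 2); return the best mask found."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features))
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: the best mask so far always survives
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


RELEVANT = {0, 2}  # hypothetical: pretend only features 0 and 2 are informative


def toy_fitness(mask):
    """Stand-in for a recognition rate: reward relevant features, penalise size."""
    return sum(mask[i] for i in RELEVANT) - 0.1 * sum(mask)


print(genetic_feature_selection(6, toy_fitness))
```

The small penalty on the number of selected features is one possible way to prefer compact subsets; a real fitness function would balance recognition rate against feature cost in whatever way suits the application.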
12. Rule induction: Holte’s 1R
As an empirical learning method, 1R takes as input a set of training examples, each described by the values of several attributes and a class. It then generates a rule that predicts the class from the values of a single attribute: the 1R algorithm selects the most informative single attribute and bases the rule on this attribute. Holte reported the results of experiments measuring the performance of very simple rules on the datasets commonly used in machine learning research. The specific kind of rules examined, called "1-rules", are rules that classify an object on the basis of a single attribute (i.e. they are 1-level decision trees). Holte described a system, called 1R, whose input is a set of training examples and whose output is a 1-rule.
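The core idea of 1R can be sketched in a few lines: for each attribute, predict the majority class for each of its values, count the training errors, and keep the attribute with the fewest errors. This simplified version handles categorical attributes only and omits the discretization of numeric attributes; the function name and data are assumptions for this example.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, label="class"):
    """Return (attribute, {value: predicted class}, training errors) for the best 1-rule."""
    best = None
    for a in attributes:
        # Count class frequencies for each value of this attribute.
        counts = defaultdict(Counter)
        for r in rows:
            counts[r[a]][r[label]] += 1
        # Predict the majority class for every value.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best


# Hypothetical training data.
rows = [
    {"attendance": "high", "homework": "done", "class": "pass"},
    {"attendance": "high", "homework": "missed", "class": "pass"},
    {"attendance": "high", "homework": "done", "class": "pass"},
    {"attendance": "low", "homework": "done", "class": "fail"},
    {"attendance": "low", "homework": "missed", "class": "fail"},
    {"attendance": "low", "homework": "done", "class": "pass"},
]
attribute, rule, errors = one_r(rows, ["attendance", "homework"])
print(attribute, rule, errors)
```

On this data the attendance rule misclassifies one training example and the homework rule misclassifies two, so 1R selects attendance.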
Application of Holte’s 1R Using Rosetta
The process of using rule induction to isolate each conditional attribute so that it can be identified in terms of its maximum support is illustrated in Rosetta software. Robert Holte’s 1R Algorithm can be adapted to provide the individual support levels. Holte’s 1R was implemented using Rosetta’s 1R Reducer (Holte’s 1R Reduct) which returns all attribute sets. The set of all 1R rules, i.e., univariate decision rules, are indirectly returned as a child of the returned set of single reducts. The first implication is that 1R can be used to predict the accuracy of the rules produced by more sophisticated machine learning systems.
13. Quantitative Scientific Evidence Reports
Quantitative Evidence Reports include the following methods:
Hybrid Quantitative Methods
· Hybrid Quantitative Methods for Performance
· Effectiveness
· Optimization
· Accuracies
· Evidence Reports
Quantitative Statistical Heuristics Methods