Training Analysis Page: Correlation
The Modify Field Dialog allows you to analyze and modify the properties of a field. The Training Analysis page allows you to view an analysis of the prediction results versus the desired values.
For help with predictions, see Predicting and Modeling Financial Data.
Correlation Sub-Page Data
This sub-page displays an analysis of the statistical correlation between the desired values and the predicted values. Specifically, the reported correlation coefficient is an analysis of whether the desired values and the predicted values move in the same direction by similar amounts. If a strong relationship is found, the changes in the predicted values are very similar to the changes in the desired values.
A correlation value of 1 indicates that both values change by the same amount for each sample. A value of 0 indicates that there is no correlation between the changes in the values. Negative values indicate that the values move in opposite directions.
correlation = 1 perfect correlation
correlation = 0 no correlation found
correlation = -1 inverse correlation
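This page does not name the formula behind the reported coefficient. As a minimal sketch, assuming it behaves like the standard Pearson correlation coefficient, the three cases above can be reproduced as follows. The function name pearson and the sample values are illustrative only and are not part of the product.

```python
import math

def pearson(desired, predicted):
    """Pearson correlation of two equal-length sequences (assumed formula)."""
    n = len(desired)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 samples")
    mean_d = sum(desired) / n
    mean_p = sum(predicted) / n
    # Co-movement of the two series around their means.
    cov = sum((d - mean_d) * (p - mean_p) for d, p in zip(desired, predicted))
    var_d = sum((d - mean_d) ** 2 for d in desired)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_d == 0 or var_p == 0:
        return None  # undefined when either series never changes
    return cov / math.sqrt(var_d * var_p)

desired = [1.0, 2.0, 3.0, 4.0]
same_direction = pearson(desired, [2.0, 4.0, 6.0, 8.0])      # moves together
opposite_direction = pearson(desired, [8.0, 6.0, 4.0, 2.0])  # moves opposite
print(same_direction, opposite_direction)
```

Under this assumption, the first pair yields a value at or very near 1 and the second a value at or very near -1.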
A correlation value is calculated for the actual values and for the change or percent change values. When predicting using change or percent change values, the correlation of the actual values is typically very high, because the differences between the desired and predicted changes are small relative to the scale of the data. The change and percent change values, however, are evaluated relative to the scale of the other changes. The correlation of the change or percent change values can therefore be a more accurate measure of the effectiveness of the model.
The analysis is performed for the training, cross validation, and accuracy testing sets used for the prediction. Each type of analysis is performed on the entire subset. In addition, each type of analysis is performed separately for the samples where the value is predicted to increase and for the samples where it is predicted to decrease. This is useful for determining whether predictions in a particular direction are more accurate than others.
Note: If a value associated with a predicted direction is listed as "n/a", no predictions were made in that direction.
Correlation Sub-Page Analysis
The analysis presented on this page is based on the correlation values. It detects common characteristics to look for in the data and is intended only as a starting point for evaluating the model. Some common results include:
· This might be a good / reasonable / weak correlational model.
This is an analysis of the correlation in the testing set. If no testing set is used, the cross validation or training set correlation is used. The model is classified according to the following table:
correlation > 0.20 excellent model
correlation > 0.15 very good model
correlation > 0.10 good model
correlation > 0.05 reasonable model
correlation <= 0.05 weak model
Evaluating models in this way is very abstract. Models that are classified as "excellent" or "good" may mirror the desired data well, yet still not predict the values in a way that is useful for the intended purpose. Similarly, models that are classified as "weak" may produce values that are still useful. As stated above, this is intended only as a starting point for evaluating the model.
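The classification table can be expressed as a short lookup. The thresholds are taken from the table above; the function name classify_model is hypothetical and used only for illustration.

```python
def classify_model(correlation):
    """Map a correlation value to the label used in the classification table."""
    # Thresholds are checked from highest to lowest; each comparison is strict.
    if correlation > 0.20:
        return "excellent"
    if correlation > 0.15:
        return "very good"
    if correlation > 0.10:
        return "good"
    if correlation > 0.05:
        return "reasonable"
    return "weak"  # includes zero and inverse (negative) correlations

for value in (0.25, 0.12, 0.05, -0.30):
    print(value, classify_model(value))
```

Note that a value exactly on a threshold falls into the lower class, since the table uses strict "greater than" comparisons.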
· It produced values with a strong / weak correlation to the desired data.
This is an explanation of the rationale for the analysis text. As explained above, a higher correlation is desirable. An inverse correlation is generally not a desirable characteristic, since the model is supposed to be trained to produce results similar to the desired data, not opposite to it.
· It is specialized on the training data.
This is an analysis of the correlation in the testing set as compared to the correlation in the training set. If no testing set is used, the correlation in the cross validation set is used.
This is significant when determining whether the training has produced a model that is good at making generalizations outside of the training set. If a model over-trains or simply memorizes the training data, it may perform very well at predicting values in the training set; however, it will tend to perform poorly at predicting the values outside of the training set. A good predictive model will generally perform equally well on data in the training, cross validation, and testing sets.
This message may appear if the samples-to-weights ratio is not adequate for the type of data being used. A good rule of thumb is to have about ten training samples for each weight in the network. Significantly lower ratios may allow the neural network to simply use the weights to memorize the data, rather than make effective generalizations about its characteristics. See the Prediction Model page for a report of this ratio and help for improving it.
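The rule of thumb can be checked with a rough calculation. The sketch below assumes a fully connected feed-forward network with one bias weight per non-input node; the product may count weights differently, so the ratio reported on the Prediction Model page remains the authoritative figure. The function names and layer sizes are hypothetical.

```python
def count_weights(layer_sizes, include_bias=True):
    """Count weights in a fully connected feed-forward network (assumed layout)."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between layers
        if include_bias:
            total += n_out     # assumption: one bias weight per receiving node
    return total

def samples_to_weights_ratio(n_samples, layer_sizes):
    """Training samples available per weight in the network."""
    return n_samples / count_weights(layer_sizes)

# Hypothetical network: 5 inputs, 4 hidden nodes, 1 output.
layers = [5, 4, 1]
weights = count_weights(layers)
ratio = samples_to_weights_ratio(290, layers)
print(weights, ratio)
print("adequate" if ratio >= 10 else "below the rule of thumb")
```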
· It is specialized on upward / downward trends.
This is an analysis of the correlation across all data sets, comparing the correlation when upward changes are predicted to the correlation when downward changes are predicted. If the correlation is significantly better for predictions in a given direction, the model may be specialized on predicting changes in that direction. This is significant when determining whether directional information from the prediction is useful.
This message may appear when the training set contains data that trends mostly in the reported direction. In this case, the network may determine that the average change is in a given direction and use that average change when it does not have enough information to correctly predict a value. A report of the actual number of changes up and down is on the Overview sub-page.
· It is only producing upward / downward predictions for this data.
This is an analysis of the correlation across all data sets. It indicates that the model is predicting changes in only one direction. If the training set does not contain predominantly changes in one direction, this is typically an indication that the training did not have enough relevant information to produce an effective model.
If a problem occurred during the training or calculation phase, the analysis is replaced with a description of the error. A summary of these error messages is provided in the help for the Modify Field Dialog: Training Analysis page.
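A one-directional model can be spotted by counting the predicted changes in each direction. This is a minimal sketch of that check, not the product's own logic; the function name prediction_directions and its return labels are hypothetical.

```python
def prediction_directions(predicted_changes):
    """Report whether the predicted changes go up only, down only, or both ways."""
    ups = sum(1 for p in predicted_changes if p > 0)
    downs = sum(1 for p in predicted_changes if p < 0)
    if ups and not downs:
        return "upward only"
    if downs and not ups:
        return "downward only"
    if not ups and not downs:
        return "no directional predictions"  # empty input or all changes are zero
    return "mixed"

print(prediction_directions([0.4, 1.2, 0.1]))   # every change is positive
print(prediction_directions([0.4, -1.2, 0.1]))  # changes in both directions
```

Comparing this result with the actual counts of upward and downward changes on the Overview sub-page helps show whether the one-sided predictions reflect the data or a lack of useful information.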
How Did I Get Here?
This is a sub-page of the Modify Field Dialog: Training Analysis page.