Principal Component Analysis
Inputs: Continuous data (unlimited number, minimum 2)
Outputs: Specific P.C.A. matrix

Description:
The purpose of a Principal Component Analysis is to summarize and rank the information contained in a matrix
made up of n rows (n being the number of entities on the working map) and p columns (p being the number of data
elements entered). The entities on the working map are called the individual values, and the input data are called
the variables.
Calculations:
- Standardizing the data: each variable is centered and reduced, i.e. x'i = (xi - mean of x) / standard deviation of x,
so that every variable has mean 0 and standard deviation 1.
- Drawing up the correlation matrix of the variables. This is a symmetric square matrix of order p: entry (i, j)
holds the correlation coefficient between variables i and j.
- Finding the eigenvectors of the correlation matrix, together with their associated eigenvalues.
- Calculating the coordinates of the individual values and the variables on these vectors, for graphical representation.
- Calculating the other parameters (see below).
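The calculation steps above can be sketched in plain Python using only the standard library. This is a minimal illustration under stated assumptions, not the module's own code: the population standard deviation (divisor n) is assumed for the standardization, the eigenvectors are found with the cyclic Jacobi method (a standard technique for symmetric matrices that the module may or may not use), and the function names and data are hypothetical.

```python
import math
from statistics import fmean, pstdev


def standardize(column):
    """Center and reduce one variable: (x - mean) / standard deviation.

    Assumption: population standard deviation (divisor n).
    """
    mean, sd = fmean(column), pstdev(column)
    return [(x - mean) / sd for x in column]


def correlation_matrix(z_columns):
    """p x p correlation matrix computed from already standardized columns."""
    n = len(z_columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in z_columns]
            for ci in z_columns]


def jacobi_eigen(matrix, tol=1e-22, max_sweeps=50):
    """Eigenvalues and unit eigenvectors of a symmetric matrix (cyclic Jacobi).

    Returns (values, vectors) sorted by decreasing eigenvalue; vectors[k]
    is the eigenvector associated with values[k].
    """
    p = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:  # matrix is diagonal to working precision
            break
        for k in range(p - 1):
            for l in range(k + 1, p):
                if a[k][l] == 0.0:
                    continue
                # Smallest rotation angle that zeroes the (k, l) entry.
                phi = (a[l][l] - a[k][k]) / (2 * a[k][l])
                t = math.copysign(1.0, phi) / (abs(phi) + math.sqrt(phi * phi + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for i in range(p):  # A <- A G (columns k and l)
                    aik, ail = a[i][k], a[i][l]
                    a[i][k], a[i][l] = c * aik - s * ail, s * aik + c * ail
                for j in range(p):  # A <- G^T A (rows k and l)
                    akj, alj = a[k][j], a[l][j]
                    a[k][j], a[l][j] = c * akj - s * alj, s * akj + c * alj
                for i in range(p):  # V <- V G accumulates the eigenvectors
                    vik, vil = v[i][k], v[i][l]
                    v[i][k], v[i][l] = c * vik - s * vil, s * vik + c * vil
    order = sorted(range(p), key=lambda j: a[j][j], reverse=True)
    values = [a[j][j] for j in order]
    vectors = [[v[i][j] for i in range(p)] for j in order]
    return values, vectors


# Hypothetical data: 5 individual values (rows) described by 3 variables (columns).
columns = [
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.0, 3.0, 2.0, 5.0, 4.0],
    [9.0, 7.0, 8.0, 3.0, 1.0],
]
z = [standardize(col) for col in columns]
r = correlation_matrix(z)
eigenvalues, eigenvectors = jacobi_eigen(r)
print([round(x, 4) for x in eigenvalues])
```

Because every standardized variable has variance 1, the diagonal of the correlation matrix is 1 and the eigenvalues sum to p (here 3).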
Interpretation:
The eigenvectors give the factor axes; by choosing two of them, we can plot the individual values and the
variables on a graph.
The factor axes are determined so as to render as much as possible of the dispersion of the data in the matrix:
each successive axis accounts for as much of the remaining dispersion as it can. The first two axes therefore give
the most information for graphical representation, but you can choose any pair of axes.
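Once the eigenvectors are known, placing points on a chosen pair of axes is a projection. The sketch below assumes a common convention for a normalized P.C.A.: the coordinate of an individual value on axis k is the dot product of its standardized row with eigenvector k, and the coordinate of a variable is sqrt(eigenvalue k) times its component in eigenvector k, which equals its correlation with that axis. The module's exact scaling is not documented here, and the function names and data are hypothetical. With two variables, the correlation matrix [[1, r], [r, 1]] (r > 0) has eigenvalues 1 + r and 1 - r with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2), so no eigen-solver is needed.

```python
import math
from statistics import fmean, pstdev


def standardize(column):
    """(x - mean) / standard deviation; population SD (divisor n) is an assumption."""
    mean, sd = fmean(column), pstdev(column)
    return [(x - mean) / sd for x in column]


def individual_coords(z_rows, eigenvectors, axis1, axis2):
    """Project each standardized row onto the two chosen axes (numbered from 1)."""
    u1, u2 = eigenvectors[axis1 - 1], eigenvectors[axis2 - 1]
    return [(sum(a * b for a, b in zip(row, u1)),
             sum(a * b for a, b in zip(row, u2))) for row in z_rows]


def variable_coords(eigenvalues, eigenvectors, axis1, axis2):
    """Variable j on axis k: sqrt(eigenvalue k) * u_k[j], its correlation with axis k."""
    k1, k2 = axis1 - 1, axis2 - 1
    p = len(eigenvectors[0])
    return [(math.sqrt(eigenvalues[k1]) * eigenvectors[k1][j],
             math.sqrt(eigenvalues[k2]) * eigenvectors[k2][j]) for j in range(p)]


# Two hypothetical variables measured on 5 individual values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
zx, zy = standardize(x), standardize(y)
r = sum(a * b for a, b in zip(zx, zy)) / len(x)  # correlation coefficient
h = 1 / math.sqrt(2)
eigenvalues = [1 + r, 1 - r]
eigenvectors = [[h, h], [h, -h]]

ind = individual_coords(list(zip(zx, zy)), eigenvectors, 1, 2)  # red squares
var = variable_coords(eigenvalues, eigenvectors, 1, 2)          # blue squares
print(ind)
print(var)
```

The coordinates of the individual values on each axis have mean 0 and variance equal to that axis's eigenvalue, which is why the first axes spread the points out the most.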
Graphics:
Under each graphic, a button creates a "graphical window" from the current drawing. This window lets you compare
graphics, export them, or print them.
Results:
The table in the lower left-hand corner of the parameter window shows the eigenvalues and the percentage of the
information accounted for by the corresponding axis. The graph in the lower right-hand corner shows the histogram of
the eigenvalues. Clicking a red rectangle (each one represents an eigenvalue) displays the number of the selected
eigenvalue above the graph.
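In the usual definition, the percentage for each axis is its eigenvalue divided by the sum of all the eigenvalues (the total inertia of the cloud), which equals p for a correlation matrix; the cumulative percentages presumably correspond to the accumulated hierarchies written to the results file. A minimal sketch with hypothetical eigenvalues for 4 variables; the function name is illustrative.

```python
from itertools import accumulate


def axis_hierarchy(eigenvalues):
    """Total inertia, percentage of information per axis, and cumulative percentages."""
    total = sum(eigenvalues)  # equals p for a correlation matrix
    percents = [100 * lam / total for lam in eigenvalues]
    cumulative = list(accumulate(percents))
    return total, percents, cumulative


# Hypothetical eigenvalues for 4 variables (they sum to 4).
total, percents, cumulative = axis_hierarchy([2.2, 1.0, 0.5, 0.3])
print(total, [round(x, 1) for x in percents], [round(x, 1) for x in cumulative])
```

Here the first two axes would carry 80 % of the information, which is why they are usually the first pair examined.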
The graph in the top right-hand corner shows the individual values (red squares) and the variables (blue squares)
in the selected factorial plane. The horizontal axis is the axis entered in the axis 1 input field, and the vertical
axis is the one entered in the axis 2 input field. Axis numbers range from 1 to the number of data elements entered.
A click on a red or blue square displays the name of the individual value or the variable in question under the graph.
To obtain the exact coordinates of a variable or an individual value in the factorial plane, click the Save in a file
button and read the file it creates.
If you resize the window, the images are automatically adjusted to the new size.
If the all axes box is checked, the calculations are made for all the axes; if not, they are made only for
the 2 selected axes, which reduces both the calculation time (especially if the module contains a large number of
entries) and the size of the results file.
Clicking the Save in a file button creates a tab-delimited text file in which all the results of the
P.C.A. are recorded.
In this file, you can find:
- The initial centered, reduced matrix
- The eigenvalues of the correlation matrix
- The eigenvectors of the correlation matrix
- The total inertia of the cloud
- The hierarchy of the axes (the percentage of the total information carried by each axis, given by its
eigenvalue)
- The cumulative hierarchies (cumulative percentages) for the axes
- Information concerning:
  - all the axes, if you have checked the all axes box
  - the two desired axes, if you have checked the 2 axes box
The information concerning the axes is as follows. For each individual value and each variable, the results are of
five types:
- Their coordinates on the selected axes, which locate them in the axis system.
- Their contributions to the selected axes, measuring the part each one plays in forming the axis.
- The quality of their representation on the selected axes, measuring how close they lie to the axes.
- Their share of the total inertia of the cloud, giving an idea of how specific they are compared with the
average.
- Their relative weight, showing how much each one counts in the processing.
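The five result types for individual values can be sketched with the usual textbook formulas for a normalized P.C.A. in which every individual value has the same weight 1/n. These formulas are an assumption, since the module's documentation does not state them. With F the coordinate of individual value i on axis k, lambda_k the eigenvalue of that axis and z_i the standardized row: contribution = (F^2 / n) / lambda_k, quality = F^2 / |z_i|^2 (the squared cosine), share of inertia = (|z_i|^2 / n) / p, relative weight = 1/n. The example uses two hypothetical variables, whose 2 x 2 correlation matrix [[1, r], [r, 1]] (r > 0) has eigenvalues 1 + r and 1 - r with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2); the function name is illustrative.

```python
import math
from statistics import fmean, pstdev


def standardize(column):
    """(x - mean) / standard deviation; population SD (divisor n) is an assumption."""
    mean, sd = fmean(column), pstdev(column)
    return [(x - mean) / sd for x in column]


def individual_results(z_rows, eigenvalues, eigenvectors):
    """The five result types for each individual value, assuming equal weights 1/n."""
    n, p = len(z_rows), len(z_rows[0])
    results = []
    for row in z_rows:
        coords = [sum(a * b for a, b in zip(row, u)) for u in eigenvectors]
        sq_norm = sum(a * a for a in row)  # squared distance to the centre of the cloud
        results.append({
            "coords": coords,
            "contributions": [f * f / n / lam for f, lam in zip(coords, eigenvalues)],
            "qualities": [f * f / sq_norm if sq_norm else 0.0 for f in coords],
            "inertia_share": sq_norm / n / p,
            "weight": 1 / n,
        })
    return results


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
zx, zy = standardize(x), standardize(y)
r = sum(a * b for a, b in zip(zx, zy)) / len(x)  # correlation coefficient
h = 1 / math.sqrt(2)
results = individual_results(list(zip(zx, zy)), [1 + r, 1 - r], [[h, h], [h, -h]])
for res in results:
    print([round(c, 3) for c in res["coords"]], [round(q, 3) for q in res["qualities"]])
```

Under these formulas, the contributions to each axis sum to 1 over all individual values, and the qualities over all axes sum to 1 for each individual value; if the module uses the same conventions, these sums are a quick sanity check on the results file.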
Output:
The table containing the coordinates of the individual values on all the axes is supplied as an output, so that an
Ascending Hierarchical Classification (A.H.C.) can be created from it once this module has been run. See the A.H.C.
module for further details on this classification.
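An A.H.C. typically groups individual values according to the distances between them. Because the factor axes are orthonormal, the coordinates on all the axes are only a rotation of the standardized data, so Euclidean distances between individual values are unchanged; keeping only some of the axes gives shorter, approximate distances. The sketch below illustrates this with hypothetical rows (any rows would do, since the property depends only on the axes being orthonormal) and the eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2) of a 2 x 2 correlation matrix. It does not describe how the A.H.C. module itself measures distance.

```python
import math


def factor_coords(z_rows, eigenvectors):
    """Coordinates of each individual value on all the factor axes."""
    return [[sum(a * b for a, b in zip(row, u)) for u in eigenvectors] for row in z_rows]


# Hypothetical rows and an orthonormal pair of eigenvectors.
h = 1 / math.sqrt(2)
eigenvectors = [[h, h], [h, -h]]
z_rows = [[-1.4, -0.7], [0.0, 0.7], [1.4, 0.0]]
coords = factor_coords(z_rows, eigenvectors)

d_data = math.dist(z_rows[0], z_rows[2])      # distance on the original variables
d_factor = math.dist(coords[0], coords[2])    # same distance over all the axes
d_axis1 = abs(coords[0][0] - coords[2][0])    # distance using axis 1 only
print(round(d_data, 6), round(d_factor, 6), round(d_axis1, 6))
```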
Notions of statistics:
The detailed results above must be interpreted with care, because they are meaningful only for the highest (and
lowest) coordinate values of the individual values and the variables.
The closer a coordinate is to 0, the less significant it is for the corresponding axis (the variable or individual
value plays a smaller and smaller part in the structure shown by the axis). To interpret the coordinates, we must
therefore think in terms of opposition, and hence study the extreme values.
Moreover, a single axis (even the first one) is not enough to get a clear picture of the characteristics of a
particular individual value, or to compare two individual values. By examining several axes, however, we can
determine fairly precisely the specific characteristics of each individual value with respect to all the variables.
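Studying the extreme values on an axis amounts to ranking the coordinates and reading both ends of the ranking, which show the two poles being opposed; values near 0 drop out. A minimal sketch with hypothetical names and coordinates; the function name `extremes` is illustrative. Repeating it axis by axis is one way to follow the advice to examine several axes.

```python
def extremes(coords_by_name, count=2):
    """Return the `count` lowest and the `count` highest coordinates on one axis."""
    ranked = sorted(coords_by_name.items(), key=lambda item: item[1])
    return ranked[:count], ranked[::-1][:count]


# Hypothetical coordinates of five individual values on one factor axis.
axis_coords = {"A": -2.1, "B": 0.1, "C": 1.8, "D": -1.3, "E": 2.6}
lowest, highest = extremes(axis_coords)
print("negative pole:", lowest)   # [('A', -2.1), ('D', -1.3)]
print("positive pole:", highest)  # [('E', 2.6), ('C', 1.8)]
```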
For further details, you can consult the following book:
Sanders, Lena. L'analyse statistique des données en géographie (Statistical analysis of geographical data).
Collection Alidade. Montpellier: Groupement d'Intérêt Public Reclus, 1989.
Script:
2 module untyped_list ""
3 mod_type integer "103"
3 mod_subtype integer "520"
3 mod_name string "ACP"
3 mod_dads integer_list ""
4 ? integer "7"
4 ? integer "6"
4 ? integer "4"
4 ? integer "5"
3 all_axis boolean "F"
3 axis1 integer "1"
3 axis2 integer "2"
3 ind_nb integer "20"
3 var_nb integer "4"
Samples