
Quantification

Input: 1 Continuous data or 1 Continuous matrix
Output: Discrete data or Discrete matrix

Display:

Description:
The input data are divided into classes according to the selected quantification. The output gives the class number of each initial entity.

The parameter window has two tabs:
   - The Parameters tab, containing the choice of quantification type, the standard deviation factor for the standard and standard zero centered quantifications, the desired number of classes, and the table of automatic and custom thresholds.
   - The Drawings tab, containing the histograms and the box-and-whisker plot.

The quantification should be chosen according to the type of input data. Users who are unfamiliar with this operation will quickly learn to recognize the appropriate type, especially with the help of the box-and-whisker plot in the Drawings tab. Quantiles are selected by default, but they are not necessarily the best choice.

The number of classes can be any integer from 1 to 512, except for recursive averages, for which it must be a power of 2 (2, 4, 8, 16, 32, ..., 512). ATTENTION: as a rule, the number of classes should not exceed 12, so that the classes remain clearly distinguishable in colors or qualitative symbols.
In addition, two rules give a theoretical optimum number of classes:
   - according to Brooks and Carruthers:  number of classes = integer part of ( 5*log10(N) + 0.5 )
   - according to Huntsberger:  number of classes = integer part of ( 3.3*log10(N) + 1.5 )
where N = number of values in the input data.
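As a quick check, the two rules can be written as a short Python sketch (the function names are illustrative, not part of the module). "Integer part" is taken as truncation, which is the same as rounding down for these positive values.

```python
import math


def brooks_carruthers(n: int) -> int:
    """Brooks and Carruthers: integer part of (5 * log10(N) + 0.5)."""
    if n < 1:
        raise ValueError("N must be at least 1")
    return int(5 * math.log10(n) + 0.5)


def huntsberger(n: int) -> int:
    """Huntsberger: integer part of (3.3 * log10(N) + 1.5)."""
    if n < 1:
        raise ValueError("N must be at least 1")
    return int(3.3 * math.log10(n) + 1.5)


print(brooks_carruthers(100), huntsberger(100))    # 10 8
print(brooks_carruthers(1000), huntsberger(1000))  # 15 11
```

Both rules grow slowly with N; even for 1000 values the Brooks and Carruthers result already exceeds the recommended maximum of 12 classes.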

The standard deviation factor is a multiplier that changes the class width for the standard and standard zero centered quantifications.

For quantification by custom thresholds, the thresholds can be:

  • entered in the right-hand column. Validate each threshold you enter with the Down Arrow or Tab key on your keyboard.
  • set with the mouse in the Drawings tab (see below).

The arrow button under the threshold table copies the automatic thresholds into the custom thresholds column.

The text area below the Parameters tab contains the Jenks and Tai indicators, the inter-group and intra-group variances, the number of elements in each class, and a statistical summary of the input data. See the paragraph on indicators below.

The Drawings tab, shown in the display above, contains:
   - At the bottom: the box-and-whisker plot (in green).
   - In the middle: a frequency diagram of the input data (in black), with its cumulative frequency curve (in blue).
   - At the top: a density diagram (in red), in which each class is drawn with a height H calculated as follows:
                   H = number of elements in the class / width of the class
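A minimal Python sketch of the density height H, assuming the thresholds are listed from the minimum to the maximum (one more entry than there are classes). The function name and the handling of zero-width classes are hypothetical; the thresholds and counts in the usage line are taken from the script example on this page.

```python
def class_densities(thresholds, counts):
    """Height H of each bar in the density diagram: count / class width.

    thresholds lists the class limits from the minimum to the maximum,
    so it has exactly one more entry than counts.
    """
    if len(thresholds) != len(counts) + 1:
        raise ValueError("need exactly one more threshold than classes")
    heights = []
    for i, count in enumerate(counts):
        width = thresholds[i + 1] - thresholds[i]
        # Assumption: a zero-width class gets an infinite height here;
        # how the module actually draws such a class is not documented.
        heights.append(count / width if width > 0 else float("inf"))
    return heights


# Thresholds and class counts from the script example on this page.
print(class_densities([17, 45, 65, 78, 95], [4, 4, 4, 3]))
```

Dividing by the width is what makes classes of unequal width comparable: the narrow third class (65 to 78) gets the tallest bar even though it holds the same number of elements as the first two.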

When you move the mouse pointer over the graphic area, the value under the pointer is displayed in the Data value field below the graphic area.

For quantification by custom thresholds, gray vertical bars representing the thresholds appear on the graphic area. You can drag them with the mouse to set the thresholds manually with the help of the drawings; they can only be moved between the minimum and maximum values of the data.
When a bar is moved, the corresponding threshold is automatically updated in the table of the Parameters tab, and vice versa.
To refresh the density diagram (top graphic, in red) with the new thresholds, click the Apply button.

The Center button re-centers the graphs if you have moved the drawings with the scroll bars.

If the window size is changed, the drawings are automatically adjusted to the new size.

Drawings:
Under each graphic there is a button that creates a "graphical window" from the current drawing. In this window you can compare, export, or print graphics.

Types of data:
This module changes its output type according to the input data type. By default, it works on continuous data and produces discrete data. If you connect matrix data as input, its output is a discrete matrix.

However, if the module already has output connections, its input type is fixed: to connect a matrix input, you must first disconnect the module from its child modules.

Formulas used for the indicators:

Jenks indicator (named after its author):

deviation of a class = absolute value of:
         (width / average) - (width / center)

     Jenks' indicator = (sum total of the deviations of each class) / number of classes
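The formula above can be sketched in Python. The document does not define its terms precisely, so this sketch assumes that the width is the upper limit minus the lower limit, the average is the mean of the values in the class, and the center is the midpoint of the class. The function name and the (lower, upper, values) input layout are hypothetical, and a class whose average or center is zero would cause a division by zero.

```python
from statistics import fmean


def jenks_indicator(classes):
    """classes: list of (lower, upper, values) tuples, one per class.

    Assumptions: width = upper - lower, average = mean of the values in
    the class, center = (lower + upper) / 2. An empty class is counted
    with a deviation of zero, since it has no average.
    """
    total_deviation = 0.0
    for lower, upper, values in classes:
        if not values:
            continue
        width = upper - lower
        average = fmean(values)
        center = (lower + upper) / 2
        total_deviation += abs(width / average - width / center)
    return total_deviation / len(classes)


# First class: |10/4 - 10/5| = 0.5; second class: |10/15 - 10/15| = 0.
print(jenks_indicator([(0, 10, [2, 4, 6]), (10, 20, [15])]))  # 0.25
```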

Tai indicator (Tabular Accuracy Index, also from Jenks):

Distance1 = sum total of the distances between the values and the average of the class in which they are found
Distance2 = sum total of the distances between the values and the overall average

     Tai indicator = 1 - Distance1 / Distance2
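A Python sketch of the Tai indicator, assuming the distances are absolute differences and that the classes are given as lists of their values (a hypothetical input layout and function name). If all values are equal, Distance2 is zero and the indicator is undefined.

```python
from statistics import fmean


def tai_indicator(groups):
    """groups: one list of values per class (empty classes are allowed)."""
    all_values = [x for g in groups for x in g]
    overall = fmean(all_values)
    # Distance1: distances from each value to the average of its own class.
    distance1 = 0.0
    for g in groups:
        if g:
            m = fmean(g)
            distance1 += sum(abs(x - m) for x in g)
    # Distance2: distances from each value to the overall average.
    distance2 = sum(abs(x - overall) for x in all_values)
    return 1 - distance1 / distance2


print(tai_indicator([[1, 2, 3], [10, 11, 12]]))  # 1 - 4/27, about 0.852
```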

The closer the Tai indicator is to one, the better the quantification.
The closer the Jenks indicator is to zero, the better the quantification.
The reliability of these indicators depends on the configuration of the classes. In particular, if the values are integers, do not rely on them; look at the graphics instead.

Inter-group variance:

This indicator shows whether the classes obtained are similar or different.

Distance = sum, over all classes, of the squared distance between the class average and the overall average of the data, multiplied by the number of values in the class

    Inter-group variance =  Distance / (number of classes - 1)

Intra-group variance:

This indicator shows whether the classes obtained are homogeneous or heterogeneous.

Distance = sum total of the squares of the distances between the values and the average of the class in which they are to be found

     Intra-group variance =  Distance / (number of values - number of classes)

When comparing different quantifications, look carefully at these variances: aim to increase the inter-group variance and reduce the intra-group variance.
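Both variances can be computed together. The Python sketch below (hypothetical function name, classes given as lists of their values) compares two ways of splitting the same six values: the first has the higher inter-group variance and the lower intra-group variance, so it is the better quantification by the criterion above.

```python
from statistics import fmean


def group_variances(groups):
    """Return (inter-group variance, intra-group variance).

    groups: one list of values per class.
    """
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    overall = fmean(all_values)
    between = 0.0  # sum of n_class * (class average - overall average)^2
    within = 0.0   # sum of (value - class average)^2
    for g in groups:
        if not g:
            continue  # an empty class adds nothing to either sum
        m = fmean(g)
        between += len(g) * (m - overall) ** 2
        within += sum((x - m) ** 2 for x in g)
    return between / (k - 1), within / (n - k)


good = group_variances([[1, 2, 3], [10, 11, 12]])
poor = group_variances([[1, 2], [3, 10, 11, 12]])
print(good, poor)  # (121.5, 1.0) (75.0, 12.625)
```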

List of quantifications:

Standard:
This quantification assumes a Gaussian (normal) distribution of the input values. The classes are centered on the arithmetic mean. By default they are one standard deviation wide, but the standard deviation factor lets you set a width of two, three or any other multiple of the standard deviation. If the extreme classes are too narrow to contain the full range of input values, they are widened to cover the minimum and the maximum.

For a given standard deviation factor, this quantification produces only two patterns of distribution:
   - one when the number of classes is even,
   - another when the number of classes is odd.

For example, with 3, 5, 7, ... classes and a standard deviation factor of 1, the central class is always centered on the mean and is always one standard deviation wide.

This type of quantification is not suitable for relative data: when mapping percentages, for example, the average of the percentages is not the overall percentage.

Standard zero centered:
This quantification is identical to the standard quantification, except that the classes are centered on zero rather than on the arithmetic mean. This makes it possible to display negative and positive classes in contrasting cold and warm colors.
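A sketch of how the class limits of the standard and zero-centered standard quantifications could be computed in Python. It assumes the population standard deviation and takes the data minimum and maximum as the outer limits (the script section notes that these appear in the list of automatic thresholds); neither detail is confirmed for the module, and the function name is hypothetical. With the zero-centered option and data far from zero, some inner limits can fall outside the data range, which this sketch does not correct.

```python
from statistics import fmean, pstdev


def standard_thresholds(values, n_classes, sd_factor=1.0, zero_centered=False):
    """Class limits for the standard / zero-centered standard quantification.

    Assumptions: population standard deviation; the outer limits are the
    data minimum and maximum; inner limits are not clipped to the data range.
    """
    center = 0.0 if zero_centered else fmean(values)
    width = sd_factor * pstdev(values)
    # Inner limits are one class width apart around the center: with an odd
    # number of classes the central class straddles the center, with an even
    # number a limit falls exactly on it.
    inner = [center + (j - n_classes / 2) * width for j in range(1, n_classes)]
    return [min(values)] + inner + [max(values)]


print(standard_thresholds([1, 2, 3, 4, 5], 3))  # central class 3 +/- 0.707...
print(standard_thresholds([1, 2, 3, 4, 5], 4))  # a limit falls exactly on 3.0
```

This also shows why only two patterns exist for a given factor: the inner limits depend only on whether the number of classes is odd or even, and adding classes just adds limits further out.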

Quantiles:
The classes are calculated so that each contains the same number of elements. Depending on the distribution, the counts may differ slightly from class to class.
An example where the number of elements per class is not an integer:
  data values: 1 , 4 , 4 , 4 , 4 , 10
  number of classes: 4   => 1.5 elements per class
  this gives:
    class 1 = [1;4[ contains 2 elements
    class 2 = [4;4[ contains 1 element
    class 3 = [4;10[ contains 2 elements
    class 4 = [10;10] contains 1 element
Remark: class 1 contains the values 1 and 4, despite the limits shown.
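The following Python sketch reproduces the example above by cutting the sorted values at ranks rounded to the nearest integer (halves rounded up) and taking the first value of each class as its lower limit. This is one rule consistent with the example, not necessarily the module's exact algorithm; the function name is hypothetical.

```python
import math


def quantile_classes(values, n_classes):
    """Rank-based split into classes of (almost) equal size.

    Class i ends at rank floor(i * N / n_classes + 0.5); each limit is the
    first value of a class, and the last limit is the maximum.
    """
    data = sorted(values)
    n = len(data)
    ends = [math.floor(i * n / n_classes + 0.5) for i in range(1, n_classes + 1)]
    starts = [0] + ends[:-1]
    counts = [end - start for start, end in zip(starts, ends)]
    # min(...) guards against an empty trailing class when classes outnumber values.
    thresholds = [data[min(start, n - 1)] for start in starts] + [data[-1]]
    return thresholds, counts


print(quantile_classes([1, 4, 4, 4, 4, 10], 4))  # ([1, 4, 4, 10, 10], [2, 1, 2, 1])
```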

Equal sizes:
The range of the data values, from minimum to maximum, is divided into classes of equal width.
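A minimal Python sketch of equal-size limits (the function name is hypothetical):

```python
def equal_size_thresholds(values, n_classes):
    """Split [min, max] into n_classes intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    # Each limit is computed from lo (not by repeated addition) to avoid
    # accumulating rounding errors; the last limit is set to hi exactly.
    return [lo + i * step for i in range(n_classes)] + [hi]


print(equal_size_thresholds([0, 3, 10], 4))  # [0.0, 2.5, 5.0, 7.5, 10]
```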

Recursive averages:
The limits of the classes are placed at the nested averages of the data values. For example, for 4 classes:
  - Classes 2 and 3 meet at the average of all the data.
  - Classes 1 and 2 meet at the average of the data in classes 1 and 2 combined.
  - Classes 3 and 4 meet at the average of the data in classes 3 and 4 combined.
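A recursive Python sketch of this procedure. It assumes that values equal to a limit go into the upper group and that an empty group reuses its parent's average as its limit; the document specifies neither case, and the function name is hypothetical.

```python
from statistics import fmean


def recursive_average_thresholds(values, n_classes):
    """Class limits placed at nested averages; n_classes must be 2, 4, ..., 512."""
    if n_classes < 2 or n_classes > 512 or n_classes & (n_classes - 1):
        raise ValueError("number of classes must be a power of 2 between 2 and 512")
    data = sorted(values)

    def inner_limits(group, n, fallback):
        # A single class needs no inner limit.
        if n == 1:
            return []
        # Assumption: an empty group reuses its parent's average.
        m = fmean(group) if group else fallback
        lower = [x for x in group if x < m]   # assumption: ties go to the upper half
        upper = [x for x in group if x >= m]
        half = n // 2
        return inner_limits(lower, half, m) + [m] + inner_limits(upper, half, m)

    return [data[0]] + inner_limits(data, n_classes, data[0]) + [data[-1]]


print(recursive_average_thresholds([1, 2, 3, 4, 5, 6, 7, 8], 4))
# [1, 2.5, 4.5, 6.5, 8]
```

The power-of-2 restriction follows directly from the recursion: each level splits every group into exactly two halves.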

This type of quantification is not suitable for relative data: when mapping percentages, for example, the average of the percentages is not the overall percentage.

Jenks:
Jenks-type quantification is based on variance, i.e. the dispersion of the input values around the mean. It aims to minimize the intra-class variance (maximizing the homogeneity of the classes) and to maximize the inter-class variance (increasing the difference between classes).
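The document does not say which algorithm the module uses. One common way to obtain Jenks-type breaks is the Fisher-Jenks dynamic program, which finds the class limits that minimize the total within-class sum of squared deviations (for a fixed data set this also maximizes the between-class sum of squares). It is sketched below in Python, in O(k * N^2) time, which is fine for moderate N. Returning the first value of each class as its lower limit follows the convention of the quantiles example and is an assumption; the function name is hypothetical.

```python
def jenks_breaks(values, n_classes):
    """Limits minimizing the total within-class sum of squared deviations.

    Returns n_classes + 1 limits: the first value of each class, then the maximum.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("need between 1 and len(values) classes")

    # Prefix sums give the squared deviation of any run data[i:j] in O(1).
    s, s2 = [0.0], [0.0]
    for x in data:
        s.append(s[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best total for data[:j] split into c classes;
    # start[c][j]: index where the last of those c classes begins.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back from the full data set to recover where each class starts.
    starts, j = [], n
    for c in range(n_classes, 0, -1):
        j = start[c][j]
        starts.append(j)
    starts.reverse()
    return [data[i] for i in starts] + [data[-1]]


print(jenks_breaks([1, 2, 3, 10, 11, 12], 2))  # [1, 10, 12]
```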

Custom:
The class thresholds are entered by the user in the column provided for this purpose, or set manually with the mouse on the drawings.

Principle shared by all quantifications (except quantiles):
If several classes share the same limits, the data concerned are placed in the first of these classes.
Example:
  data values: 12 , 12 , 15 , 15 , 19
  distribution:
    class 1 = [12;12[
    class 2 = [12;12[
    class 3 = [12;19]
  then:
    class 1 contains 2 elements
    class 2 is empty
    class 3 contains 3 elements
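The rule can be expressed as a Python sketch. It assumes half-open classes [lower;upper[, a closed last class, and that a value equal to a limit shared by several classes goes to the first of them. This reproduces the example above but is not taken from the module's code; the function names are hypothetical.

```python
def assign_class(value, thresholds):
    """Return the 1-based class number of value.

    Classes are [t(i); t(i+1)[ except the last, which is closed. When several
    classes share the same limit, a value equal to it goes to the first of them.
    """
    k = len(thresholds) - 1
    for i in range(k):
        lo, hi = thresholds[i], thresholds[i + 1]
        is_last = i == k - 1
        if lo <= value < hi or (value == hi and (lo == hi or is_last)):
            return i + 1
    raise ValueError("value lies outside the data range")


def class_counts(values, thresholds):
    """Number of values falling in each class."""
    counts = [0] * (len(thresholds) - 1)
    for v in values:
        counts[assign_class(v, thresholds) - 1] += 1
    return counts


print(class_counts([12, 12, 15, 15, 19], [12, 12, 12, 19]))  # [2, 0, 3]
```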

Attention: the thresholds are stored in single precision, so the quantification cannot distinguish between values whose first eight figures are identical.
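The effect can be reproduced in Python by rounding a double-precision value to single precision with the standard struct module (the helper name is hypothetical). Near 12 million, consecutive single-precision numbers are 1 apart, so two values that agree in their first eight figures become identical.

```python
import struct


def as_single(x: float) -> float:
    """Round a Python float (double precision) to single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


a, b = 12345678.1, 12345678.2
print(a == b)                        # False: distinct in double precision
print(as_single(a) == as_single(b))  # True: both become 12345678.0
```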

Script:

2      module untyped_list ""
3        mod_type integer "103"
3        mod_subtype integer "502"
3        mod_name string "Discretisation"
3        mod_dads integer_list ""
4          ? integer "4"
3        work_on_matrix boolean "F"
3        quant_type integer "802"
3        class_nb integer "4"
3        stddur_fact double "1"
3        user_thresholds double_list ""
4          ? double "0"
4          ? double "0"
4          ? double "0"
4          ? double "0"
4          ? double "0"
3        auto_thresholds double_list ""
4          ? double "17"
4          ? double "45"
4          ? double "65"
4          ? double "78"
4          ? double "95"
3        classes_count integer_list ""
4          ? integer "4"
4          ? integer "4"
4          ? integer "4"
4          ? integer "3"
3        jenks double "0.013676485"
3        tai double "0.63260762"
3        inter_grp double "2054.4944"
3        intra_grp double "75.113636"

Values for quant_type:
Custom                     801
Quantiles                  802
Equal sizes                803
Recursive averages         804
Standard                   805
Standard centered on zero  806
Jenks                      807

The list of automatic thresholds also contains the minimum and maximum of the values.
These entries are mandatory, but they are recalculated by the module,
so zeros can be entered in their place.

