NAG Toolbox: nag_stat_plot_stem_leaf (g01ar)

 Contents

    1  Purpose
    2  Syntax
    7  Accuracy
    9  Example

Purpose

nag_stat_plot_stem_leaf (g01ar) produces a stem and leaf display for a single sample of observations.

Syntax

[unit, lines, ifail, plot, sorty] = g01ar(y, nstepx, nstepy, 'range', range, 'prt', prt, 'n', n, 'unit', unit)
[unit, lines, ifail, plot, sorty] = nag_stat_plot_stem_leaf(y, nstepx, nstepy, 'range', range, 'prt', prt, 'n', n, 'unit', unit)
Note: the interface to this routine has changed since earlier releases of the toolbox:
At Mark 23: range was made optional (default 'E'); prt was made optional (default 'P'); unit was made optional (default 0); output parameters were reordered

Description

nag_stat_plot_stem_leaf (g01ar) produces a stem and leaf display for a single sample of n observations. The stem and leaf display shows data values separated into the form of a ‘stem’ and a ‘leaf’. For example, a value of 473 could be represented as 47 3 where the stem is 47 and the leaf is 3. The data is scaled using a value known as the ‘leaf digit unit’. In the above example the leaf digit unit would be 1.0.
The following example illustrates a stem and leaf display.
For the 10 observations:
1.8 2.3 2.1 1.9 2.1 2.4 2.0 2.0 1.9 2.1  
the stem and leaf display is:
1  1  8
3  1  99
5  2  00
5  2  111
2  2
2  2  3
1  2  4
where the leaf digit unit is 0.1 so that 1 8 represents 1.8 (i.e., 18×0.1). The leaf digit unit distinguishes between the numbers 18.0, 1.8, 0.18, etc. which may otherwise all be represented by 1 8.
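The split into stem and leaf can be sketched in a few lines of MATLAB. This is an illustration only, not the routine's internal method: it assumes each value is rounded to a whole number of leaf digit units, and it does not distinguish the -0 and +0 stems.

```matlab
% Split the 10 observations above into stems and leaves, unit = 0.1.
y    = [1.8 2.3 2.1 1.9 2.1 2.4 2.0 2.0 1.9 2.1];
unit = 0.1;                        % leaf digit unit

scaled = round(sort(y) / unit);    % whole leaf units (rounding is an assumption)
stem   = fix(scaled / 10);         % everything left of the last digit
leaf   = abs(rem(scaled, 10));     % the last digit

for k = 1:numel(scaled)
  fprintf('%4.1f  ->  stem %d, leaf %d\n', scaled(k) * unit, stem(k), leaf(k));
end
```

With unit = 0.1 the value 1.8 becomes 18 leaf units, giving stem 1 and leaf 8, as in the first line of the display.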
Included in the above display is an initial column specifying the cumulative count of values, up to and including that particular line, from either the top or bottom of the display, whichever is smaller. An exception to this is when the line on which the median lies is reached, in which case the actual count of values on that line is displayed, rather than a cumulative count, and this is highlighted by enclosing the count in parentheses. In this case the median is 2.05 and thus falls between the two lines at which the cumulative count has reached n/2 where n is the number of observations.
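The count column can be reproduced from the number of leaves on each line. The sketch below reads the leaf counts off the display above. The test used for the median line (fewer than n/2 values strictly above and strictly below the line) is an assumption made for illustration, not taken from the routine.

```matlab
% Leaves per line, read from the first display (top to bottom).
counts = [1 2 2 3 0 1 1];
n      = sum(counts);

fromTop    = cumsum(counts);                   % cumulative count from the top
fromBottom = fliplr(cumsum(fliplr(counts)));   % cumulative count from the bottom
depth      = min(fromTop, fromBottom);         % value shown in the first column

% Assumed test: a line holds the median when fewer than n/2 values lie
% strictly above it and fewer than n/2 lie strictly below it.
above        = fromTop - counts;
below        = fromBottom - counts;
isMedianLine = (above < n/2) & (below < n/2) & (counts > 0);

disp(depth);
disp(any(isMedianLine));
```

For this sample no line satisfies the test, which agrees with the median 2.05 falling between two lines.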
Some of the other features of the stem and leaf display are illustrated by the following two examples.
For the 30 observations:
-19.0 -3.0 -1.0 0.0 1.0 2.0 2.0 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0 6.0 6.0 6.0 7.0 7.0 8.0 10.0 11.0 11.0 13.0 31.0  
the stem and leaf display may be:
 1  -1.  9
 1  -1*
 1  -0.
 3  -0*  13
15  +0*  012233344444
15  +0.  5555666778
 5   1*  011
 2   1.  3
 1   2*
 1   2.
 1   3*  1
In the above display all the data are plotted and the leaf digit unit is 1.0. Also in this display different leaves, that is different digits, may be plotted on a particular line. In this case we have 5 possible digits per line, that is 2 lines per stem, and these are represented as follows:
*   the digits 0, 1, 2, 3 and 4
.   the digits 5, 6, 7, 8 and 9
Alternatively the stem and leaf display may look like:
      LO   -19

  2   -0T  3
  3   -0*  1
  5   +0*  01
 10   +0T  22333
( 9)  +0F  444445555
 11   +0S  66677
  6   +0.  8
  5    1*  011
  2    1T  3

      HI   31
Again the leaf digit unit is 1.0 but in this display just the data between the fences are plotted. The fences are the lower hinge minus 1.5 × the inter-hinge range and the upper hinge plus 1.5 × the inter-hinge range. Any data points that fall outside the fences are presented separately in the display under the headings LO for those points below the lower fence and HI for those points above the upper fence.
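The fences for the 30 observations can be checked by hand. The sketch below takes the fence multiplier as 1.5 and assumes Tukey's depth rule for the hinges (hinge depth = (floor(median depth) + 1)/2). The routine's own hinge definition is not stated here, so treat this as an illustration.

```matlab
% The 30 observations from the example above.
y = sort([-19 -3 -1 0 1 2 2 3 3 3 4 4 4 4 4 5 5 5 5 6 6 6 7 7 8 10 11 11 13 31]);
n = numel(y);

medianDepth = (n + 1) / 2;                     % 15.5
hingeDepth  = (floor(medianDepth) + 1) / 2;    % 8 (assumed Tukey rule)
lo = floor(hingeDepth);
hi = ceil(hingeDepth);

lowerHinge = (y(lo) + y(hi)) / 2;
upperHinge = (y(n + 1 - lo) + y(n + 1 - hi)) / 2;

step       = 1.5 * (upperHinge - lowerHinge);  % 1.5 x inter-hinge range
lowerFence = lowerHinge - step;
upperFence = upperHinge + step;

belowFence = y(y < lowerFence);                % listed under LO
aboveFence = y(y > upperFence);                % listed under HI

fprintf('hinges %g and %g, fences %g and %g\n', ...
        lowerHinge, upperHinge, lowerFence, upperFence);
```

Under these assumptions the hinges are 3 and 7, the fences are -3 and 13, and only -19 and 31 fall outside them, matching the LO and HI entries in the display.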
Again in this display different leaves, that is different digits, may be plotted on a particular line. However in this case we have 2 possible digits per line, that is 5 lines per stem, and these are represented as follows:
*   the digits 0 and 1
T   the digits 2 and 3
F   the digits 4 and 5
S   the digits 6 and 7
.   the digits 8 and 9
A display may also allow 10 different digits (0 to 9) per line, that is 1 line per stem, or just 1 digit per line, that is 10 lines per stem, as in the first of the three examples above.
Note that the median here is 4.5. This falls between two lines in the first display but is highlighted on the second display since it lies on a particular line.
Finally if there are positive and negative numbers on the display these are highlighted by a + or - sign where the distinction is required, that is near the zero-point.
If there are too many leaves to fit in the plot width allowed, nag_stat_plot_stem_leaf (g01ar) plots as many leaves as possible and places an asterisk to the right to indicate that some leaves are not displayed. If this occurs and you wish to be able to plot all the leaves then the width of the plot may be adjusted.
Options also allow the leaf unit and the height of the display to be specified by you or calculated by nag_stat_plot_stem_leaf (g01ar). These arguments may be used to control the type of the display you wish to obtain. Fixing the unit and changing the height of the display may alter the number of lines used per stem, that is the number of different digits per line. nag_stat_plot_stem_leaf (g01ar) will choose a display for the fixed unit that attempts to make as much use of the available height as possible, thus increasing the height may allow for more lines per stem whereas decreasing the height may force the display to use fewer lines per stem. Similarly you may wish to fix the height and vary the leaf digit unit used on the display. See Further Comments for further details.
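One way to explore this is to fix the leaf digit unit and call the routine with different heights, comparing the number of lines returned. The sketch below uses the calling sequence given under Syntax. The heights 20 and 40 are arbitrary illustrative choices, and the call is skipped when g01ar is not on the path.

```matlab
% The 30 observations from the Description, as a column vector.
y = [-19 -3 -1 0 1 2 2 3 3 3 4 4 4 4 4 5 5 5 5 6 6 6 7 7 8 10 11 11 13 31]';
nstepx    = int64(72);          % plot width in character positions
unitFixed = 1.0;                % leaf digit unit held fixed
heights   = int64([20 40]);     % illustrative values for nstepy

haveNag = exist('g01ar') ~= 0;  % only call the routine if it is installed
for k = 1:numel(heights)
  if haveNag
    % prt = 'N' suppresses printing; only the line count is reported here.
    [unitUsed, lines, ifail, plt, sorty] = ...
      g01ar(y, nstepx, heights(k), 'prt', 'N', 'unit', unitFixed);
    fprintf('nstepy = %d: unit used = %g, lines = %d\n', ...
            heights(k), unitUsed, lines);
  else
    fprintf('nstepy = %d: g01ar not found, call skipped\n', heights(k));
  end
end
```

The same loop with nstepy fixed and a range of values for unit shows the opposite trade-off.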
The display is returned in a character array with the option of printing the display.

References

Erickson B H and Nosanchuk T A (1985) Understanding Data Open University Press, Milton Keynes
Tukey J W (1977) Exploratory Data Analysis Addison–Wesley
Velleman P F and Hoaglin D C (1981) Applications, Basics, and Computing of Exploratory Data Analysis Duxbury Press, Boston, MA

Parameters

Compulsory Input Parameters

1:     y(n) – double array
The n observations.
2:     nstepx – int64/int32/nag_int scalar
The number of character positions to be plotted horizontally.
Constraint: nstepx ≥ 35.
3:     nstepy – int64/int32/nag_int scalar
The maximum number of character positions to be plotted vertically.
If nstepy ≤ 0, a suitable value will be used by nag_stat_plot_stem_leaf (g01ar) for the number of character positions to be plotted vertically. This will be less than or equal to the value of ldplot.
Constraint: nstepy ≤ 0 or nstepy ≥ 5.

Optional Input Parameters

1:     range – string (length ≥ 1)
Default: 'E'
Indicates whether you wish to scale the plot to the extremes of the data or to the fences.
range='E'
The display is a plot to the extremes, that is a plot of all the data.
range='F'
The display is a plot of the data between the fences.
Constraint: range='E' or 'F'.
2:     prt – string (length ≥ 1)
Default: 'P'
Indicates whether the stem and leaf display is to be output to an external file.
prt='N'
The display is not output to an external file.
prt='P'
The display is output to the current advisory message unit as defined by nag_file_set_unit_advisory (x04ab). Only the first 132 characters of each line are actually printed.
Constraint: prt='P' or 'N'.
3:     n – int64/int32/nag_int scalar
Default: the dimension of the array y.
n, the number of observations.
Constraint: n ≥ 2.
4:     unit – double scalar
Default: 0
Indicates the leaf digit unit to be used.
If unit>0.0 and is not a power of ten, it will be converted to the nearest power of ten below the input value for unit.
If unit ≤ 0.0, the optimum unit will be used. This is based on the range of the data to be plotted and the number of lines available for the display.
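The conversion of a positive unit to the nearest power of ten below it can be illustrated as follows. This is a sketch of the rule as stated above, not the routine's code, and the input values are arbitrary.

```matlab
% Positive units that are not powers of ten (illustrative values).
requested = [0.5 25 0.03 4000];

% Nearest power of ten below each value.
used = 10 .^ floor(log10(requested));

for k = 1:numel(requested)
  fprintf('unit = %g  ->  %g\n', requested(k), used(k));
end
```

So a requested unit of 25 would be treated as 10, and 0.5 as 0.1.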

Output Parameters

1:     unit – double scalar
Contains the actual unit used in the stem and leaf display.
2:     lines – int64/int32/nag_int scalar
The actual number of lines needed for the display.
3:     ifail – int64/int32/nag_int scalar
ifail=0 unless the function detects an error (see Error Indicators and Warnings).
4:     plot(ldplot,nstepx) – cell array of strings
The stem and leaf display.
5:     sorty(n) – double array
The observations sorted into ascending order.

Error Indicators and Warnings

Errors or warnings detected by the function:
   ifail=1
On entry, n<2,
or nstepx<35,
or 0<nstepy<5,
or ldplot<5,
or ldplot<nstepy.
   ifail=2
On entry, prt≠'P' or 'N',
or range≠'E' or 'F'.
   ifail=3
The number of lines needed to produce the display exceeds the maximum number of lines allowed. You may wish to increase nstepy.
   ifail=4
One of the observations is too large and causes a value to exceed the maximum integer allowed.
   ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
   ifail=-399
Your licence key may have expired or may not have been installed correctly.
   ifail=-999
Dynamic memory allocation failed.
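The ifail=1 and ifail=2 conditions that involve the MATLAB arguments can be checked before the call. The sketch below is a hypothetical pre-check, not part of the toolbox. It assumes that only the first character of range and prt is significant, as the Example's use of 'Fences' and 'Noprint' suggests. The variable names rangeOpt and prtOpt are illustrative.

```matlab
% Illustrative inputs.
y        = [1.8 2.3 2.1 1.9 2.1 2.4 2.0 2.0 1.9 2.1];
nstepx   = int64(72);
nstepy   = int64(20);
rangeOpt = 'E';
prtOpt   = 'N';

% Conditions corresponding to ifail=1.
okN      = numel(y) >= 2;
okNstepx = nstepx >= 35;
okNstepy = (nstepy <= 0) || (nstepy >= 5);

% Conditions corresponding to ifail=2 (first character only: an assumption).
okRange  = any(rangeOpt(1) == 'EF');
okPrt    = any(prtOpt(1) == 'PN');

inputsOk = okN && okNstepx && okNstepy && okRange && okPrt;
fprintf('inputs acceptable: %d\n', inputsOk);
```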

Accuracy

Accuracy is limited by the number of significant figures that may be represented on the display which will depend on the data, the number of lines available and the unit used.

Further Comments

nag_stat_plot_stem_leaf (g01ar) uses integer representations of the data. If very large data values are being used they should be scaled before using this function. The largest integer can be found by calling nag_machine_integer_max (x02bb).
If an asterisk is plotted at the end of a line to indicate that some leaves are not displayed, you should increase nstepx if you wish to be able to print the rest of the leaves on that line.
Note that if you request nag_stat_plot_stem_leaf (g01ar) to print the plot only the first 132 characters of each line are printed. The full plot is stored in the array plot so you do have the option of printing a plot which has more than 132 characters on a line.
When the leaf digit unit is set, the number of lines per stem is decided as follows:
Let r be the range of the data to be plotted, let l be the number of lines available for the plot, and let e = (r/unit + 1)/l. The number of lines per stem is then chosen according to the value of e.
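As a worked instance of the quantity e, reading it as (r/unit + 1)/l, take the 30 observations from the Description with a leaf digit unit of 1.0. The value l = 11 is a hypothetical number of available lines chosen for illustration. The values of e at which the number of lines per stem changes are not reproduced here, so the sketch stops at computing e.

```matlab
% The 30 observations from the Description.
y    = [-19 -3 -1 0 1 2 2 3 3 3 4 4 4 4 4 5 5 5 5 6 6 6 7 7 8 10 11 11 13 31];
unit = 1.0;     % leaf digit unit
l    = 11;      % lines available for the plot (hypothetical value)

r = max(y) - min(y);        % range of the data to be plotted
e = (r / unit + 1) / l;     % distinct leaf-unit values per available line

fprintf('r = %g, e = %.4f\n', r, e);
```

Here r/unit + 1 = 51 is the number of distinct whole leaf units spanned by the data, so e can be read as how many of them each available line would have to carry.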
The time taken by the function increases with n.

Example

A program to produce two stem and leaf displays for a sample of 30 observations. The first illustrates a plot produced automatically by nag_stat_plot_stem_leaf (g01ar) and the second shows how to print the display under your control.
function g01ar_example


fprintf('g01ar example results\n\n');

y = [31;  1;   2;   3;    4;     5;     6;     7;     8;    -9;
          1;   2;   3;    4;     5;     6;     7;     8;
               2;   3;    4;     5;     6;     7;
                    3;    4;     5;     6;
                          4;     5];
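nstepx = int64(72);   % plot width in character positions (at least 35)
nstepy = int64(20);   % maximum plot height in lines

% First display: plot the data between the fences. prt defaults to 'P',
% so g01ar prints the display itself.
[unit, lines, ifail, plot, sorty] = ...
  g01ar( ...
    y, nstepx, nstepy, 'range', 'Fences');

% Second display: plot to the extremes with printing suppressed, so the
% display can be printed from the returned array instead.
[unit, lines, ifail, plot, sorty] = ...
  g01ar( ...
    y, nstepx, nstepy, 'range', 'Extremes', 'prt', 'Noprint');

% Print the returned display one line at a time.
fprintf('\n');
for i = 1:lines
  fprintf('%s\n', char(plot(i,1:nstepx)));
end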


g01ar example results

 Stem-and-leaf display
 Leaf digit unit = 1.0
 1  2  represents  12.

       LO  -9

   3    0  11
   6    0  222
  10    0  3333
  15    0  44444
  15    0  55555
  10    0  6666
   6    0  777
   3    0  88

       HI  31

Stem-and-leaf display                                                   
Leaf digit unit = 1.0                                                   
1  2  represents  12.                                                   
                                                                        
  1    -0. 9                                                            
  1    -0*                                                              
 15    +0* 11222333344444                                               
 15    +0. 55555666677788                                               
  1     1*                                                              
  1     1.                                                              
  1     2*                                                              
  1     2.                                                              
  1     3* 1                                                            

© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015