G01 Class

This chapter covers three topics:
  • plots, descriptive statistics, and exploratory data analysis;
  • statistical distribution functions and their inverses;
  • testing for Normality and other distributions.

Syntax

C#
public static class G01
Visual Basic (Declaration)
Public NotInheritable Class G01
Visual C++
public ref class G01 abstract sealed
F#
[<AbstractClassAttribute>]
[<SealedAttribute>]
type G01 =  class end

Background to the Problems

Plots, Descriptive Statistics and Exploratory Data Analysis

Statistical Distribution Functions and Their Inverses

Statistical distributions are commonly used in three problems:
  • evaluation of probabilities and expected frequencies for a distribution model;
  • testing of hypotheses about the variables being observed;
  • evaluation of confidence limits for the parameters of a fitted model, for example the mean of a Normal distribution.
Random variables can be either discrete (i.e., they can take only a finite or countable set of values) or continuous (i.e., they can take any value in a given range). However, for a large sample from a discrete distribution, an approximation by a continuous distribution, usually the Normal distribution, can be used. Distributions commonly used as a model for discrete random variables are the binomial, hypergeometric, and Poisson distributions. The binomial distribution arises when there is a fixed probability of a selected outcome, as in sampling with replacement; the hypergeometric distribution is used in sampling from a finite population without replacement; and the Poisson distribution is often used to model counts.
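The large-sample approximation mentioned above can be checked numerically. The following is a minimal sketch in plain Python (standard library only); it does not call any method of this class, and the function names and the values of n, p and k are illustrative assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Normal approximation to P(X <= k) with a continuity correction of 0.5."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return NormalDist(mean, sd).cdf(k + 0.5)


n, p, k = 200, 0.3, 65  # illustrative values only
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"exact = {exact:.4f}, Normal approximation = {approx:.4f}")
```

The term-by-term sum is adequate for moderate n; for very large n a dedicated distribution function is the safer choice.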
Distributions commonly used as a model for continuous random variables are the Normal, gamma, and beta distributions. The Normal is a symmetric distribution whereas the gamma is skewed and only appropriate for non-negative values. The beta is for variables in the range $(0,1)$ and may take many different shapes. For circular data, the ‘equivalent’ to the Normal distribution is the von Mises distribution. The assumption of the Normal distribution leads to procedures for testing and interval estimation based on the $\chi^2$, $F$ (variance ratio), and Student's $t$-distributions.
In the hypothesis testing situation, a statistic X with known distribution under the null hypothesis is evaluated, and the probability α of observing such a value or a more ‘extreme’ one is found. This probability (the significance) is usually then compared with a preassigned value (the significance level of the test) to decide whether the null hypothesis can be rejected in favour of an alternative hypothesis on the basis of the sample values. Many tests make use of those distributions derived from the Normal distribution as listed above, but for some tests specific distributions such as the Studentized range distribution and the distribution of the Durbin–Watson test have been derived. Nonparametric tests as given in G08 (not in this release), such as the Kolmogorov–Smirnov test, often use statistics with distributions specific to the test. The probability that the null hypothesis will be rejected when a simple alternative hypothesis is true (the power of the test) can be found from the corresponding noncentral distribution.
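As an illustration of the testing procedure just described, the sketch below computes the significance of a two-sided test on a Normal mean with known standard deviation. It is plain Python using only the standard library, not a method of this class; the function name, the data, and the values of mu0 and sigma are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test_p_value(sample, mu0, sigma):
    """Two-sided significance for H0: mean == mu0, with sigma assumed known."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    # Probability, under H0, of a statistic at least as extreme as |z|.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.4]  # illustrative data
z, p = z_test_p_value(sample, mu0=5.0, sigma=0.5)
level = 0.05  # preassigned significance level
print(f"z = {z:.3f}, significance = {p:.4f}, reject H0: {p < level}")
```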
The confidence interval problem requires the inverse calculation. In other words, given a probability α, the value x is to be found, such that the probability that a value not exceeding x is observed is equal to α. A confidence interval of size 1-2α, for the quantity of interest, can then be computed as a function of x and the sample values.
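The inverse calculation can be sketched in the same way. The example below, again plain Python with the standard library and not a method of this class, finds the deviates for tail probability α and uses them to form an interval of size 1-2α for a Normal mean. It assumes the standard deviation is known; the function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def mean_confidence_interval(sample, sigma, alpha):
    """Interval of size 1 - 2*alpha for the mean, with sigma assumed known."""
    std_normal = NormalDist()
    x_lo = std_normal.inv_cdf(alpha)        # P(Z <= x_lo) = alpha
    x_hi = std_normal.inv_cdf(1.0 - alpha)  # P(Z <= x_hi) = 1 - alpha
    se = sigma / sqrt(len(sample))
    centre = fmean(sample)
    return centre + x_lo * se, centre + x_hi * se


sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.4]  # illustrative data
lower, upper = mean_confidence_interval(sample, sigma=0.5, alpha=0.025)
print(f"95% interval for the mean: ({lower:.3f}, {upper:.3f})")
```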
The statistics required for testing hypotheses or constructing confidence intervals can be computed with the aid of methods in this chapter and in the G02 class (Regression), G04 (Analysis of Designed Experiments; not in this release), the G13 class (Time Series), and the E04 class (Nonlinear Least-squares Problems).
Pseudo-random numbers from many statistical distributions can be generated by methods in the G05 class.

Testing for Normality and Other Distributions

Methods of checking that observations (or residuals from a model) come from a specified distribution, for example, the Normal distribution, are often based on order statistics. Graphical methods include the use of probability plots. These can be either P-P plots (probability–probability plots), in which the empirical probabilities are plotted against the theoretical probabilities for the distribution, or Q-Q plots (quantile–quantile plots), in which the sample points are plotted against the theoretical quantiles. Q-Q plots are more common, partly because they are invariant to differences in scale and location. In either case if the observations come from the specified distribution then the plotted points should roughly lie on a straight line.
If $y_i$ is the $i$th smallest observation from a sample of size $n$ (i.e., the $i$th order statistic), then in a Q-Q plot for a distribution with cumulative distribution function $F$, the value $y_i$ is plotted against $x_i$, where $F(x_i) = (i-\alpha)/(n-2\alpha+1)$, a common value of $\alpha$ being $\tfrac{1}{2}$. For the Normal distribution, the Q-Q plot is known as a Normal probability plot.
The values $x_i$ used in Q-Q plots can be regarded as approximations to the expected values of the order statistics. For a sample from a Normal distribution the expected values of the order statistics are known as Normal scores and for an exponential distribution they are known as Savage scores.
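The plotting positions and the exponential expected order statistics can be sketched as follows. This is plain Python with the standard library, not a method of this class; the function names and data are hypothetical. The first function evaluates the inverse of F at (i-α)/(n-2α+1) for a Normal F. The second uses the standard result that the ith expected order statistic of a unit exponential sample of size n is 1/n + 1/(n-1) + ... + 1/(n-i+1); some texts define Savage scores with an additional centring constant, which is not applied here.

```python
from statistics import NormalDist


def qq_points(sample, alpha=0.5, dist=None):
    """Return (x_i, y_i) pairs for a Q-Q plot; dist defaults to the standard Normal."""
    if dist is None:
        dist = NormalDist()
    y = sorted(sample)  # order statistics y_1 <= ... <= y_n
    n = len(y)
    x = [dist.inv_cdf((i - alpha) / (n - 2 * alpha + 1)) for i in range(1, n + 1)]
    return list(zip(x, y))


def exponential_expected_order_statistics(n):
    """Expected order statistics of a unit exponential sample of size n."""
    scores, total = [], 0.0
    for i in range(1, n + 1):
        total += 1.0 / (n - i + 1)
        scores.append(total)
    return scores


sample = [2.3, -0.4, 1.1, 0.2, -1.7]  # illustrative data
points = qq_points(sample)
for x, y in points:
    print(f"{x:8.4f} {y:8.4f}")
print(exponential_expected_order_statistics(3))
```

If the sample is Normal, the printed pairs should lie roughly on a straight line whose intercept and slope estimate the mean and standard deviation.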
More formal tests offer an alternative to probability plots. One test for Normality is Shapiro and Wilk's $W$ test, which uses Normal scores. Other tests are the $\chi^2$ goodness-of-fit test and the Kolmogorov–Smirnov test; both can be found in G08 (not in this release).

Distribution of Quadratic Forms

Energy Loss Distributions

Recommendations on Choice and Use of Available Methods

Descriptive statistics / Exploratory analysis: 
    plots: 
        box and whisker g01as
        histogram g01aj
        Normal probability (Q-Q) plot g01ah
        scatter plot g01ag
        stem and leaf g01ar
    summaries: 
        frequency / contingency table, 
            one variable g01ae
            two variables, with $\chi^2$ and Fisher's exact test g01af
        mean, variance, skewness, kurtosis (one variable), 
            from frequency table g01ad
            from raw data g01aa
        mean, variance, sums of squares and products (two variables) g01ab
        median, hinges / quartiles, minimum, maximum g01al
        quantiles: 
            approximate: 
                large arbitrary-sized data stream g01ap
                large fixed-sized data stream g01an
            unordered vector g01am
Distributions: 
    Beta: 
        central: 
            deviates g01fe
            probabilities and probability density function g01ee
        non-central: 
            probabilities g01ge
    binomial: 
        distribution function g01bj
    Durbin–Watson statistic: 
        probabilities g01ep
    Energy loss distributions: 
        Landau: 
            density g01mt
            derivative of density g01rt
            distribution g01et
            first moment g01pt
            inverse distribution g01ft
            second moment g01qt
        Vavilov: 
            density g01mu
            distribution g01eu
            initialization g01zu
    F: 
        central: 
            deviates g01fd
            probabilities g01ed
        non-central: 
            probabilities g01gd
    gamma: 
        deviates g01ff
        probabilities g01ef
        probability density function g01kf
    Hypergeometric: 
        distribution function g01bl
    Kolmogorov–Smirnov: 
        probabilities: 
            one-sample g01ey
            two-sample g01ez
    Normal: 
        bivariate: 
            probabilities g01ha
        multivariate: 
            probabilities g01hb
            quadratic forms: 
                cumulants and moments g01na
                moments of ratios g01nb
        univariate: 
            deviates g01fa
            probabilities g01ea
            probability density function g01ka
            reciprocal of Mills' ratio g01mb
            Shapiro and Wilk's test for Normality g01dd
    Poisson: 
        distribution function g01bk
    Student's t: 
        central: 
            deviates g01fb
            probabilities g01eb
        non-central: 
            probabilities g01gb
    Studentized range statistic: 
        deviates g01fm
        probabilities g01em
    von Mises: 
        probabilities g01er
    $\chi^2$: 
        central: 
            deviates g01fc
            probabilities g01ec
            probability of linear combination g01jd
        non-central: 
            probabilities g01gc
            probability of linear combination g01jc
Scores: 
    Normal scores, ranks or exponential (Savage) scores g01dh
    Normal scores: 
        accurate g01da
        approximate g01db
        variance-covariance matrix g01dc
Note:  the Student's $t$, $\chi^2$, and $F$ methods do not aim to achieve a high degree of accuracy, only about four or five significant figures, but this should be quite sufficient for hypothesis testing. However, both the Student's $t$ and the $F$-distributions can be transformed to a beta distribution and the $\chi^2$-distribution can be transformed to a gamma distribution, so a higher accuracy can be obtained by calls to the gamma or beta methods.
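For reference, the transformations mentioned in this note are the standard ones below, where $I_x(a,b)$ denotes the regularized incomplete beta function (the distribution function of a beta variable) and $\gamma(a,x)/\Gamma(a)$ the regularized incomplete gamma function. They are stated here as textbook identities; the parameter conventions expected by g01ee and g01ef are not reproduced, so consult those methods before relying on the mapping.

```latex
\begin{align*}
% T follows Student's t with \nu degrees of freedom
\frac{T^2}{\nu + T^2} &\sim \mathrm{Beta}\!\left(\tfrac{1}{2}, \tfrac{\nu}{2}\right),
& \Pr(|T| \le t) &= I_{t^2/(\nu + t^2)}\!\left(\tfrac{1}{2}, \tfrac{\nu}{2}\right), \quad t \ge 0, \\
% F follows the F-distribution with \nu_1 and \nu_2 degrees of freedom
\frac{\nu_1 F}{\nu_1 F + \nu_2} &\sim \mathrm{Beta}\!\left(\tfrac{\nu_1}{2}, \tfrac{\nu_2}{2}\right),
& \Pr(F \le f) &= I_{\nu_1 f/(\nu_1 f + \nu_2)}\!\left(\tfrac{\nu_1}{2}, \tfrac{\nu_2}{2}\right), \quad f \ge 0, \\
% X follows chi-squared with \nu degrees of freedom
\frac{X}{2} &\sim \mathrm{Gamma}\!\left(\text{shape } \tfrac{\nu}{2},\ \text{scale } 1\right),
& \Pr(X \le x) &= \frac{\gamma\!\left(\tfrac{\nu}{2}, \tfrac{x}{2}\right)}{\Gamma\!\left(\tfrac{\nu}{2}\right)}, \quad x \ge 0.
\end{align*}
```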
Note:  g01dh computes ranks, approximations to the Normal scores, Normal scores, or Savage scores for a given sample, and gives you control over how tied observations are handled. g01da computes the Normal scores for a given sample size to a requested accuracy; the scores are returned in ascending order. g01da can be used either when high accuracy is required or when Normal scores are required for many samples of the same size; in either case you will have to sort the data or the scores so that each score is matched to the right observation.
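The two bookkeeping tasks in this note, ranking with ties and matching ascending scores back to unsorted data, can be sketched in plain Python. This is not the behaviour of g01dh or g01da: averaging the ranks of tied observations is only one possible tie treatment, assumed here for illustration, and the function names and data are hypothetical.

```python
def average_ranks(data):
    """Ranks 1..n in data order; tied observations share the average of their ranks."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    ranks = [0.0] * len(data)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and data[order[end + 1]] == data[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of the 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def scores_in_data_order(data, ascending_scores):
    """Attach scores supplied in ascending order to observations in their original order."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    out = [0.0] * len(data)
    for pos, idx in enumerate(order):
        out[idx] = ascending_scores[pos]
    return out


data = [3.1, 1.2, 3.1, 0.5]  # illustrative data with one tie
print(average_ranks(data))
print(scores_in_data_order([3.1, 1.2, 0.5], [-1.0, 0.0, 1.0]))
```

Sorting an index array once, rather than the data itself, keeps the original observation order available, which is what makes the same set of scores reusable across many samples of the same size.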
