Unit ProbFun/ProbF87
Probability distribution functions for statistical calculations
Copyright 1990 by J. W. Rider

This unit uses the math and specfun units (available separately) for
defining float types, and the beta-, erf-, and gamma-related functions.

This unit is not intended to be a self-contained tutorial in
probability. The probability cumulative distribution functions (CDFs)
are provided with the caveat that they only work with the correct
inputs. The "probability" returned assumes that the "null hypothesis"
(that some number is a random variate with a particular probability
distribution) is true. Based upon the returned probability, you can
determine at what level you want to accept or reject the "null
hypothesis".

This unit provides "probability distributions" rather than
"statistical routines". They will not help you compute statistics.
However, they will tell you the probability of obtaining a given
statistic from a particular distribution.

Because of the number of probability distributions available, I have
tried to adopt a consistent (but not standard anywhere else) way of
describing the functions. Each distribution has a base "prefix" to
which is appended either "CDF", "INV", or "PF".

"CDF" indicates that the function is a "cumulative distribution
function". Where possible, I have tried to make this consistently the
probability that a random variate will be less than or equal to ("<=")
the "x" argument. "INV" indicates an "inverse cumulative distribution
function", which takes a probability as the argument and returns an
"x" for which the "CDF" would yield the given probability. "PF"
indicates a "probability density function", which is the derivative of
the "CDF". (Or, inversely, the "CDF" is the integral from minus
infinity to "x" of the "PF".) For the discrete distributions, the "PF"
is a probability mass function and the "CDF" is the corresponding sum.
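
The relationships among the three suffixes can be checked numerically. The following is a minimal sketch in Python (not the unit's Pascal code), using the unit-mean exponential distribution because its closed forms are short. The names "exp_cdf", "exp_inv", and "exp_pf" are illustrative stand-ins, not identifiers from the unit.

```python
import math

def exp_cdf(x):
    """P(X <= x) for an exponential variate with mean 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def exp_inv(prob):
    """The x for which exp_cdf(x) equals prob (requires 0 <= prob < 1)."""
    return -math.log(1.0 - prob)

def exp_pf(x):
    """Density: the derivative of exp_cdf."""
    return math.exp(-x) if x >= 0 else 0.0

# INV undoes CDF.
roundtrip = exp_cdf(exp_inv(0.9))

# PF is the slope of CDF (central difference approximation at x = 1).
h = 1e-6
slope = (exp_cdf(1.0 + h) - exp_cdf(1.0 - h)) / (2 * h)

print(roundtrip, slope, exp_pf(1.0))
```

The same two checks (round trip through INV, and slope of CDF against PF) apply to any continuous distribution that has all three functions.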

Not all distributions have a complete set of functions defined.
Probability functions supplied for:

PROB DIST                        PREFIX     CDF  PF   INV  type
Beta                             beta-       y    y        cnts
Binomial                         bin-        y    y        disc
Cauchy                           cauchy-     y    y    y   cnts
Chi-square                       chs-        y    y        cnts
Double-Exponential (Laplacian)   dx-         y    y    y   cnts
Error                            erf-        y    y        cnts
Exponential                      x-          y    y    y   cnts
Snedecor's F                     f-          y    y        cnts
Gamma                            gam-        y    y        cnts
Gaussian (Normal)                g-          y    y        cnts
Geometric                        geo-             y        disc
Hypergeometric                   hgeo-            y        disc
Kolmogorov-Smirnov's D           ks-         y             cnts
Maxwell                          maxwell-                  cnts
Pascal (Negative Binomial)       pas-        y    y        disc
Poisson                          poi-        y    y        disc
Rayleigh                         ray-                      cnts
Student's T                      t-          y    y        cnts
Uniform (Rectangular)            u-          y    y    y   cnts

(type: cnts = continuous, disc = discrete)

In most of the references cited at the end of this document, some sort
of mathematical expression is provided for any particular
distribution. Unfortunately, this is often insufficient to determine
when the practitioner should use a particular distribution. Knowing
precisely what circumstances yield random variates that follow a
particular distribution can be especially fruitful in determining
which hypotheses can be tested. Consequently, I have tried to explain
those circumstances in the descriptions that follow.

The concept of a "Bernoulli trial" occurs frequently in connection
with probability distributions. Briefly, a Bernoulli trial has a fixed
probability of success without regard to when it is tried, and any one
Bernoulli trial is independent of all others.

BETA. The beta distribution is a continuous distribution related to
the binomial distribution. Mathematically, the beta distribution
describes the distribution of the probability of success of "n"
Bernoulli trials given "s" successes, rather than the distribution of
the number of successes out of "n" Bernoulli trials with a given
probability "p". If X1 and X2 are independent chi-square random
variates with "degrees of freedom" v1 and v2 respectively, then the
expression "X1/(X1+X2)" will follow a beta distribution with
parameters v1/2 and v2/2.
Limits: 0 <= x <= 1
        0 < dof1,dof2
function betacdf(x,dof1,dof2:xfloat):xfloat
function betapf(x,dof1,dof2:xfloat):xfloat
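
The chi-square construction above can be tried by simulation. This is a rough Python sketch, not part of the unit. It relies on the fact that a chi-square variate with v degrees of freedom is a gamma variate with shape v/2 and scale 2, and it compares the sample mean with the beta mean a/(a+b). The helper name "beta_from_chi_squares", the seed, and the sample size are arbitrary choices for illustration.

```python
import random

def beta_from_chi_squares(v1, v2, rng):
    """Draw X1/(X1+X2), where X1 and X2 are chi-square with v1 and v2 dof."""
    x1 = rng.gammavariate(v1 / 2.0, 2.0)   # chi-square(v1) as gamma(v1/2, scale 2)
    x2 = rng.gammavariate(v2 / 2.0, 2.0)   # chi-square(v2) as gamma(v2/2, scale 2)
    return x1 / (x1 + x2)

rng = random.Random(1990)                  # fixed seed so the run is repeatable
v1, v2 = 3, 5
n = 20000
sample_mean = sum(beta_from_chi_squares(v1, v2, rng) for _ in range(n)) / n

# A beta variate with parameters a = v1/2 and b = v2/2 has mean a/(a+b).
a, b = v1 / 2.0, v2 / 2.0
expected_mean = a / (a + b)
print(sample_mean, expected_mean)
```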

BINOMIAL. The binomial distribution describes the number of successes
out of a specific number of Bernoulli trials. The CDF returns the
probability of fewer than "k" successes out of "n" trials with
probability "p" of success per trial. Low values indicate that "p" is
likely too big; high values, that "p" is likely too small, for fewer
than "k" events out of "n".
Limits: 0 <= k <= n
        0 <= p <= 1
function bincdf(k,n,p:xfloat):xfloat
function binpf(k,n,p:xfloat):xfloat
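
For reference, the textbook binomial formulas can be sketched in a few lines of Python. This is not the unit's implementation: it uses integer "k" and "n" (the unit takes xfloat arguments), and it follows the "fewer than k" reading of the CDF given above, which is an assumption about how "bincdf" behaves. The names "bin_pf" and "bin_cdf_below" are illustrative.

```python
import math

def bin_pf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def bin_cdf_below(k, n, p):
    """Probability of fewer than k successes, that is, 0 to k-1."""
    return sum(bin_pf(j, n, p) for j in range(k))

print(bin_pf(2, 4, 0.5))         # 6/16
print(bin_cdf_below(2, 4, 0.5))  # (1 + 4)/16
```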

CAUCHY. The continuous Cauchy distribution is peculiar in the sense
that it has no well-defined mean or variance. However, it arises in
some physical phenomena.
function cauchycdf(x:xfloat):xfloat
function cauchyinv(prob:xfloat):xfloat
function cauchypf(x:xfloat):xfloat
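
The Cauchy functions have simple closed forms. The Python sketch below assumes the standard Cauchy distribution (location 0, scale 1), which seems consistent with the single-argument signatures above but is not confirmed by this document. The function names are illustrative.

```python
import math

def cauchy_cdf(x):
    """P(X <= x) for a standard Cauchy variate."""
    return 0.5 + math.atan(x) / math.pi

def cauchy_inv(prob):
    """The x for which cauchy_cdf(x) equals prob (requires 0 < prob < 1)."""
    return math.tan(math.pi * (prob - 0.5))

def cauchy_pf(x):
    """Standard Cauchy density."""
    return 1.0 / (math.pi * (1.0 + x * x))

# Three quarters of the probability lies below x = 1.
print(cauchy_cdf(1.0), cauchy_inv(0.75))
```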

CHI-SQUARE. If X1, X2, ..., XN are independent gaussian random
variates of zero mean and unit variance, then the sum of the squares
of the variates will follow a continuous chi-square distribution with
"N" degrees of freedom. (If the variates are instead measured from
their own sample mean, one degree of freedom is lost, leaving "N-1".)
The CDF returns the probability that an observed chi-square statistic
will be less than "chs" with "dof" degrees of freedom. Low values
indicate "cooked" or "biased" experimentation. High values indicate
significant differences between model predictions and experimental
outcomes.
Limits: 0 < chs,dof
function chscdf(chs,dof:xfloat):xfloat
function chspf(chs,dof:xfloat):xfloat
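
One way to compute these quantities, sketched here in Python, is through the regularized incomplete gamma function P(dof/2, chs/2), evaluated by its power series. This is an illustration only: the unit presumably obtains its gamma-related functions from the specfun unit, and the series below is adequate only for moderate values of "chs". The names "chs_pf" and "chs_cdf" are stand-ins.

```python
import math

def chs_pf(chs, dof):
    """Chi-square density with dof degrees of freedom (requires chs > 0)."""
    a = dof / 2.0
    return math.exp((a - 1.0) * math.log(chs) - chs / 2.0
                    - a * math.log(2.0) - math.lgamma(a))

def chs_cdf(chs, dof, max_terms=500, tol=1e-15):
    """P(X <= chs) from the series for the regularized incomplete gamma."""
    a, x = dof / 2.0, chs / 2.0
    term = 1.0 / a                 # n = 0 term of sum x^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

# With 2 degrees of freedom the CDF reduces to 1 - exp(-chs/2).
print(chs_cdf(2.0, 2), 1.0 - math.exp(-1.0))
```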

DOUBLE-EXPONENTIAL. The double-exponential or Laplacian is a
continuous distribution that is a double-ended version of the
exponential.
function dxcdf(x:xfloat):xfloat
function dxinv(prob:xfloat):xfloat
function dxpf(x:xfloat):xfloat
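
A Python sketch of the double-exponential functions follows, assuming zero location and unit scale (the signatures above take no other parameters, but the scaling used by the unit is not stated here). The names are illustrative.

```python
import math

def dx_cdf(x):
    """P(X <= x) for a Laplacian variate with location 0 and scale 1."""
    return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)

def dx_inv(prob):
    """The x for which dx_cdf(x) equals prob (requires 0 < prob < 1)."""
    if prob < 0.5:
        return math.log(2.0 * prob)
    return -math.log(2.0 * (1.0 - prob))

def dx_pf(x):
    """Density: an exponential tail on each side of zero."""
    return 0.5 * math.exp(-abs(x))

print(dx_cdf(0.0), dx_inv(dx_cdf(-1.3)), dx_inv(dx_cdf(2.0)))
```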

ERROR. The distribution of the absolute values of gaussian variates
(with zero mean and unit variance). For positive x, this is the same
as the "error function" (ERF) defined in the SpecFun unit. There is
one difference: ERF is an "odd" function in that "ERF(-x)=-ERF(x)",
whereas for negative values of "x" the CDF and PF functions here are
strictly zero.
Limits: 0 <= x
function erfcdf(x:xfloat):xfloat
function erfpf(x:xfloat):xfloat
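
A note on scaling may help here. With the usual definition of the error function, erf(x) equals P(|Z| <= x) for a zero-mean gaussian Z whose variance is 1/2; for unit variance the corresponding probability is erf(x/sqrt(2)). Which scaling "erfcdf" applies is not stated in this document, so the Python sketch below takes the standard deviation as a parameter. The names "abs_gauss_cdf" and "abs_gauss_pf" are hypothetical.

```python
import math

def abs_gauss_cdf(x, sigma=1.0):
    """P(|Z| <= x) for a zero-mean gaussian Z with standard deviation sigma."""
    if x <= 0:
        return 0.0
    return math.erf(x / (sigma * math.sqrt(2.0)))

def abs_gauss_pf(x, sigma=1.0):
    """Density of |Z|: twice the gaussian density for x >= 0, zero below."""
    if x < 0:
        return 0.0
    return math.sqrt(2.0 / math.pi) / sigma * math.exp(-0.5 * (x / sigma) ** 2)

# sigma = sqrt(1/2) reproduces erf(x); sigma = 1 gives the unit-variance case.
print(abs_gauss_cdf(1.0, sigma=math.sqrt(0.5)), math.erf(1.0))
print(abs_gauss_cdf(1.0))
```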

EXPONENTIAL. The continuous exponential distribution describes the
intervals between Poisson events. The CDF returns the probability that
an observed exponential deviate (mean 1) will be less than "x".
Another easy-to-understand function, not as useless as "ucdf".
Limits: 0 <= x
function xcdf(x:float):float
function xinv(prob:xfloat):xfloat
function xpf(x:xfloat):xfloat

SNEDECOR'S F. If X1 and X2 are independent chi-square variates with v1
and v2 degrees of freedom, then the expression (the "F-ratio")
(X1/v1)/(X2/v2) follows an F-distribution. The CDF returns the
probability that an observed F-ratio will be less than "f" with "dof1"
and "dof2" degrees of freedom. Low and high values indicate
significant differences between two sample variances.
Limits: 0 < f,dof1,dof2
function fcdf(f,dof1,dof2:xfloat):xfloat
function fpf(f,dof1,dof2:xfloat):xfloat

GAMMA.
Limits: 0 <= x
        0 < p < 1
function gamcdf(x,p:xfloat):xfloat
function gampf(x,p:xfloat):xfloat

GAUSSIAN. The Gaussian (Normal) distribution describes the sum of many
small variates. The CDF returns the probability that a random gaussian
deviate (mean 0, var 1) will be less than "x". Another
easy-to-understand function, and quite useful considering the number
of ways that the gaussian distribution arises.
function gcdf(x:xfloat):xfloat
function gpf(x:xfloat):xfloat
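
The standard gaussian CDF and density can be written directly in terms of the error function. The Python sketch below is an illustration rather than the unit's code; "g_cdf" and "g_pf" are stand-in names.

```python
import math

def g_cdf(x):
    """P(Z <= x) for a gaussian variate with mean 0 and variance 1."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def g_pf(x):
    """Standard gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# About 97.5% of the probability lies below x = 1.96.
print(g_cdf(0.0), g_cdf(1.96), g_pf(0.0))
```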

GEOMETRIC. The interval between Bernoulli successes, or the number of
trials until the first success.
function geopf(x,p:xfloat):xfloat
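
A minimal Python sketch of the geometric probability function follows. It assumes "x" counts trials starting at 1 (the trial on which the first success occurs). The unit might instead count the failures before the first success, starting at 0, and this document does not say which. The name "geo_pf" is illustrative.

```python
def geo_pf(x, p):
    """Probability that the first success occurs on trial x (x = 1, 2, ...)."""
    if x < 1:
        return 0.0
    # x-1 failures, each with probability 1-p, followed by one success.
    return p * (1.0 - p) ** (x - 1)

print(geo_pf(1, 0.25), geo_pf(3, 0.5))
```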

HYPERGEOMETRIC. This is perhaps the most primitive of the probability
distributions in this collection. In a finite population "Npop" of
items there is a specific number "T" of items of interest. Examine
"Nsamp" of the population items (sampled without replacement). The
number of items of interest in the sample follows a hypergeometric
distribution.
Limits: 0 <= x <= min(Nsamp,T)
        0 <= Nsamp,T <= Npop
function hgeopf(x,Nsamp,T,Npop:xfloat):xfloat
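
The sampling description above corresponds to the usual counting formula C(T,x) C(Npop-T,Nsamp-x) / C(Npop,Nsamp). Here is a Python sketch with integer arguments (the unit takes xfloat); the name "hgeo_pf" and the lowercase parameter names are illustrative.

```python
import math

def hgeo_pf(x, nsamp, t, npop):
    """Probability of x items of interest in a sample of nsamp items."""
    # Outside the attainable range the probability is zero.
    if x < 0 or x > nsamp or x > t or nsamp - x > npop - t:
        return 0.0
    return (math.comb(t, x) * math.comb(npop - t, nsamp - x)
            / math.comb(npop, nsamp))

# Population of 4 with 2 items of interest; sample 2 without replacement.
print([hgeo_pf(x, 2, 2, 4) for x in range(3)])
```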

KOLMOGOROV-SMIRNOV D. The CDF returns the probability that the
observed D-statistic will be less than "d". High values indicate a
significant difference between source distributions.
function kscdf(d,dof:xfloat):xfloat; { NR calls this "PROBKS" }

MAXWELL. If X1, X2, and X3 are independent gaussian random variates
with zero mean and unit variance, then sqrt( sqr(X1) + sqr(X2) +
sqr(X3)) has a Maxwell distribution. This distribution arises in
three-dimensional applications with "spherical error probabilities".
Limits: 0 <= x
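
The unit lists no functions for the Maxwell distribution, but for the unit-variance case described above the closed forms are short. The Python sketch below is offered only as an illustration; "maxwell_cdf" and "maxwell_pf" are hypothetical names, not identifiers from the unit.

```python
import math

def maxwell_cdf(x):
    """P(R <= x), where R is the length of a 3-D standard gaussian vector."""
    if x <= 0:
        return 0.0
    return (math.erf(x / math.sqrt(2.0))
            - math.sqrt(2.0 / math.pi) * x * math.exp(-0.5 * x * x))

def maxwell_pf(x):
    """Density of R: the derivative of maxwell_cdf."""
    if x <= 0:
        return 0.0
    return math.sqrt(2.0 / math.pi) * x * x * math.exp(-0.5 * x * x)

# Check that the density matches the slope of the CDF at x = 1.5.
h = 1e-5
slope = (maxwell_cdf(1.5 + h) - maxwell_cdf(1.5 - h)) / (2 * h)
print(slope, maxwell_pf(1.5))
```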

PASCAL. (Negative Binomial) The distribution of failures in a run of
Bernoulli trials that have exactly "n" successes where the probability
of success of each trial is "p".
Limits: 0 <= x
        0 < p < 1
function pascdf(x,n,p:xfloat):xfloat
function paspf(x,n,p:xfloat):xfloat
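
Read as "x failures before the n-th success", the description above gives the textbook probability C(x+n-1,x) p^n (1-p)^x. The Python sketch below assumes that reading and integer arguments; whether "paspf" uses exactly this parameterization is not confirmed here. The name "pas_pf" is illustrative.

```python
import math

def pas_pf(x, n, p):
    """Probability of exactly x failures before the n-th success."""
    if x < 0:
        return 0.0
    # Arrange x failures among the first x+n-1 trials; the last trial succeeds.
    return math.comb(x + n - 1, x) * p**n * (1.0 - p)**x

print(pas_pf(2, 3, 0.5))
print(sum(pas_pf(x, 3, 0.5) for x in range(200)))  # total probability, close to 1
```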

POISSON. Poisson is a limiting case of the binomial distribution as
the probability of each individual Bernoulli event goes to zero, and
the number of trials goes to infinity, but the expected number of
events remains constant. The CDF returns the probability that a
Poisson (mean "mu") random variate will be less than "k" (that is, 0
to k-1). Low values indicate that "mu" is too high; high values, that
"mu" is too low.
Limits: 0 <= k
        0 < mu
function poicdf(k,mu:xfloat):xfloat
function poipf(k,mu:xfloat):xfloat
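
Both the probability function and the limiting relationship with the binomial can be illustrated in Python. This is a sketch with integer "k", not the unit's code. The names "poi_pf" and "poi_cdf_below" are stand-ins, and the trial count of 10000 is an arbitrary choice for the comparison.

```python
import math

def poi_pf(k, mu):
    """Probability of exactly k events when the expected number is mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

def poi_cdf_below(k, mu):
    """Probability of fewer than k events, that is, 0 to k-1."""
    return sum(poi_pf(j, mu) for j in range(k))

# Binomial with many trials and a small per-trial probability, same mean.
mu, n_trials, k = 2.0, 10000, 3
p = mu / n_trials
binomial = math.comb(n_trials, k) * p**k * (1.0 - p)**(n_trials - k)
print(binomial, poi_pf(k, mu))
```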

RAYLEIGH. If X1 and X2 are independent gaussian random variates with
zero mean and unit variance, then sqrt( sqr(X1) + sqr(X2)) has a
Rayleigh distribution. This distribution arises in two-dimensional
applications with "circular error probabilities".
Limits: 0 <= x
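
The unit lists no functions for the Rayleigh distribution. As an illustration only, the Python sketch below writes the unit-variance closed forms and compares the CDF with a simulation of the construction described above. The names "ray_cdf" and "ray_pf", the seed, and the sample size are arbitrary choices.

```python
import math
import random

def ray_cdf(x):
    """P(R <= x), where R is the length of a 2-D standard gaussian vector."""
    return 1.0 - math.exp(-0.5 * x * x) if x > 0 else 0.0

def ray_pf(x):
    """Density of R: the derivative of ray_cdf."""
    return x * math.exp(-0.5 * x * x) if x > 0 else 0.0

rng = random.Random(1990)          # fixed seed so the run is repeatable
n = 20000
radii = [math.hypot(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
         for _ in range(n)]
fraction_within_1 = sum(r <= 1.0 for r in radii) / n
print(fraction_within_1, ray_cdf(1.0))
```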

STUDENT'S T. Student's T is the distribution of sample means drawn
from a normal distribution. The CDF returns the probability that an
observed t-statistic will be greater than "t" (or less than "-t") with
"dof" degrees of freedom. This is a two-tail test. Low values indicate
significant differences between sample means.
function tcdf(t,dof:xfloat):xfloat
function tpf(t,dof:xfloat):xfloat
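
The two-tail probability described above can be approximated without an incomplete beta routine by integrating the density numerically. The Python sketch below uses Simpson's rule, which is a swapped-in technique chosen for brevity and almost certainly not how the unit computes "tcdf". The names "t_pf" and "t_two_tail" and the step count are illustrative.

```python
import math

def t_pf(t, dof):
    """Student's t density with dof degrees of freedom."""
    log_norm = (math.lgamma((dof + 1) / 2.0) - math.lgamma(dof / 2.0)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2.0 * math.log1p(t * t / dof))

def t_two_tail(t, dof, steps=2000):
    """P(|T| > t): one minus the Simpson's-rule integral of t_pf on [-t, t]."""
    t = abs(t)
    if t == 0.0:
        return 1.0
    h = 2.0 * t / steps            # steps must be even for Simpson's rule
    total = t_pf(-t, dof) + t_pf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pf(-t + i * h, dof)
    return 1.0 - total * h / 3.0

# With 1 degree of freedom, Student's t is the Cauchy distribution.
print(t_two_tail(1.0, 1))
```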

UNIFORM. (Rectangular) The trivial "uniform" probability distribution
function. The CDF returns the probability that an observed uniform
deviate between 0 and 1 will be less than "x". Not particularly
useful, but provided because the distribution is easy to understand.
Limits: 0 <= x <= 1
function ucdf(x:xfloat):xfloat
function uinv(prob:xfloat):xfloat
function upf(x:xfloat):xfloat

References:
[HMF] Abramowitz and Stegun, Handbook of Mathematical Functions,
      Government Printing Office. (also available as a Dover reprint)
[HMS] Beyer, Handbook of Mathematical Sciences, CRC Press.
[BST] Beyer, Basic Statistical Tables, CRC Press.
[SNA] Knuth, Seminumerical Algorithms.
[FFP] Menzel, Fundamental Formulas of Physics, Dover reprint.
[HAM] Pearson, Handbook of Applied Mathematics, Van Nostrand Reinhold.
[NR]  Press, et al., Numerical Recipes, Cambridge.