**Probability Theory**

- the product of n consecutive positive integers from n to 1 is n!
- n! = n(n-1)(n-2)(n-3)...3.2.1
- note: 0! is defined as 1.

**Permutation:**

- each of the ordered subsets which can be formed by selecting some or all of the elements of a set;
**The multiplication principle:**

- if one operation can be performed in *m* different ways, and when it has been performed in any of these ways, a second operation can then be performed in *n* different ways, the number of ways of performing the two operations is *m* x *n*
- eg. if there are 3 different main courses & 4 different desserts, you have a choice of 3 x 4 = 12 different two course meals

**nPr:**

- the no. of arrangements of n different objects taken r at a time;
- = n!/(n-r)!;
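
A minimal sketch of the factorial, multiplication principle and nPr formulas above, using Python's standard `math` module. The helper name `n_p_r` is illustrative only and is not part of the original notes.

```python
import math

def n_p_r(n: int, r: int) -> int:
    """Arrangements of n different objects taken r at a time: n!/(n-r)!."""
    return math.factorial(n) // math.factorial(n - r)

print(math.factorial(0))  # 0! is defined as 1
print(n_p_r(5, 2))        # 5!/3! = 20
print(math.perm(5, 2))    # the standard library gives the same count
print(3 * 4)              # multiplication principle: 3 mains x 4 desserts = 12 meals
```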

**Arrangements in a circle:**

- the no. of ways of arranging n different objects in a circle, regarding clockwise & anticlockwise as different:
- = (n-1)!;

**Arrangements of n objects in a row, when not all are different:**

- if p are alike of one kind, q alike of another kind, etc.:
- = n!/(p!q!...);
- eg. how many ways can the letters of the word "mammal" be rearranged to make different words?
- n = 6; p = 2 for letter a; q = 1 for letter l; r = 3 for letter m;
- thus = 6! / (2! x 1! x 3!) = 60
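
The "mammal" count can be checked two ways: with the n!/(p!q!...) formula and by brute-force enumeration. This is only a sketch; the function name `distinct_arrangements` is made up for illustration.

```python
import math
from collections import Counter
from itertools import permutations

def distinct_arrangements(word: str) -> int:
    """n!/(p!q!...) where p, q, ... are the counts of each repeated letter."""
    result = math.factorial(len(word))
    for count in Counter(word).values():
        result //= math.factorial(count)
    return result

# Brute force: generate all 6! orderings and keep only the distinct ones.
brute = len(set(permutations("mammal")))

print(distinct_arrangements("mammal"), brute)  # 60 60
```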

**Arrangements with restrictions:**

- restriction principle: always fill a restriction first
- eg. number of ways of arranging 6 men and 2 boys in a row:
- if the 2 boys must be together:
- regard the boys as 1 unit, thus 7 objects not 8
- thus 7! ways, but as the 2 boys can be arranged in 2! ways amongst themselves => 2!7!

- if the 2 boys must NOT be together:
- number of arrangements without restriction = 8!
- => answer = 8! - number of arrangements in which the 2 boys are together => 8! - (2!7!)

- if there must be at least 3 men separating the boys:
- calculate the number of ways of arranging the boys alone:
- sum up the number of positions available to boy B when boy A is placed in each of the 8 positions = 4 + 3 + 2 + 1 + 1 + 2 + 3 + 4 = 20

- use the multiplication principle to include the arrangements of the remaining 6 men => 20 x 6!
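
The three answers for the 6 men and 2 boys can be verified by enumerating all 8! rows. This is a brute-force sketch; the labels `B1`, `M1`, etc. are placeholders chosen here.

```python
from itertools import permutations
from math import factorial

people = ["B1", "B2", "M1", "M2", "M3", "M4", "M5", "M6"]

together = apart = separated = 0
for row in permutations(people):
    gap = abs(row.index("B1") - row.index("B2"))
    if gap == 1:
        together += 1      # boys side by side
    else:
        apart += 1         # boys not adjacent
    if gap >= 4:
        separated += 1     # at least 3 men between the boys

print(together, 2 * factorial(7))                   # 2!7!
print(apart, factorial(8) - 2 * factorial(7))       # 8! - 2!7!
print(separated, 20 * factorial(6))                 # 20 x 6!
```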


**Combination:**

- each of the subsets which can be formed by selecting some or all of the elements of the set without regard to the order in which the elements appear in the subset;
**nCr:**

- the no. of combinations of n different objects taken r at a time;
- = nPr/r! = n!/[r!(n-r)!];
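
A short sketch of nCr = nPr/r!, cross-checked against an explicit list of subsets. The helper `n_c_r` and the letters "ABCDE" are illustrative assumptions.

```python
import math
from itertools import combinations

def n_c_r(n: int, r: int) -> int:
    """Combinations of n different objects taken r at a time: nPr/r!."""
    return math.perm(n, r) // math.factorial(r)

subsets = list(combinations("ABCDE", 2))  # order within a subset is ignored

print(n_c_r(5, 2), len(subsets), math.comb(5, 2))  # 10 10 10
```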

**Mutually exclusive operations:**

- when the selection of one object eliminates the possibility of it being selected again in that arrangement;
- if two operations are mutually exclusive then the no. of arrangements possible with each are added (not multiplied) to obtain the total no. of possible arrangements;
- ie. the intersection of A & B is the null set;
- if two or more events cannot occur at the same time, Pr(A or B) = Pr(A) + Pr(B); (addition principle)

**Event:** a set of favourable outcomes;

**Trial:** eg. the tossing of a die;

**Sample space:** E = all possible outcomes;

**Probability of outcomes corresponding to event A:**

- assume all sample points equally likely,
- Pr(A) = [no. outcomes A]/[total no. possible outcomes];
- Pr(A or B) = Pr(A) + Pr(B) - Pr(A&B);
- thus, the probability of drawing an Ace or a Heart from a pack of cards = 1/13 + 1/4 - 1/52 = 16/52 = 4/13
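
The Ace-or-Heart example can be reproduced by counting outcomes in an explicit 52-card sample space. This is a sketch with exact fractions; the `deck` construction and the helper `pr` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(v) for v in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely sample points

def pr(event) -> Fraction:
    """Pr(A) = [no. outcomes A]/[total no. possible outcomes]."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_ace(card):
    return card[0] == "A"

def is_heart(card):
    return card[1] == "hearts"

pr_union = pr(lambda c: is_ace(c) or is_heart(c))
by_formula = pr(is_ace) + pr(is_heart) - pr(lambda c: is_ace(c) and is_heart(c))

print(pr_union, by_formula)  # 4/13 4/13
```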

**Independent events:**

- A & B are independent if:
- Pr(A&B) = Pr(A).Pr(B);

**Conditional Probability:**

- Pr(B given A) = Pr(B|A) = Pr(A&B)/Pr(A);
- if A,B are independent, then:
- Pr(B|A) = [Pr(A).Pr(B)]/Pr(A) = Pr(B);
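
As a rough illustration of conditional probability and independence, take A = "card is a Heart" and B = "card is an Ace" in a standard 52-card pack. These events and the helper names are chosen here for the example, not taken from the notes.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(v) for v in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def pr(event) -> Fraction:
    """Probability of an event when all cards are equally likely."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_heart(card):  # event A
    return card[1] == "hearts"

def is_ace(card):    # event B
    return card[0] == "A"

pr_a = pr(is_heart)
pr_b = pr(is_ace)
pr_a_and_b = pr(lambda c: is_heart(c) and is_ace(c))

pr_b_given_a = pr_a_and_b / pr_a   # Pr(B given A) = Pr(A&B)/Pr(A)

print(pr_b_given_a, pr_b)          # 1/13 1/13, so knowing A does not change Pr(B)
print(pr_a_and_b == pr_a * pr_b)   # True: A and B are independent
```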

**Statistics:**

- see also: Statistics
- 2 types of variables:
- continuous (eg. height);
- discrete (eg. no. of peas);

- Population: the group of items/individuals;
- Sample range: that part of pop. measured;
- Class intervals: subdivisions of the sample range into classes;
- Class frequency: no. observations in each class;
- Mode: most frequent value of the variable;
- Quantile: a value of the variable below which falls a given % of the frequency:
- 25% quantile = lower quartile = Q1;
- 75% quantile = upper quartile = Q3;

- Semi-interquartile range(d) = 0.5(Q3-Q1);
- Median: 50% quantile;
- Arithmetic mean: the average of a set of observations;
- = (Sum of x)/n;

- Variance(s^2):
- = (1/(n-1))(Sum of (x - mean)^2);
- Standard deviation(s):
- = SQR(s^2);
- Standard score(z): eliminates scales by standardising the variable wrt mean & s;
- = (x - mean)/s;
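
A small sketch of these summary statistics using Python's standard `statistics` module. The data values are hypothetical, and note that `statistics.variance` and `statistics.stdev` use the sample (n-1) divisor.

```python
from statistics import mean, median, mode, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

m = mean(data)        # (Sum of x)/n
s2 = variance(data)   # sample variance, n-1 divisor
s = stdev(data)       # sample standard deviation
z = [(x - m) / s for x in data]  # standard scores

print(m, median(data), mode(data))  # 5 4.5 4
print(round(s2, 3), round(s, 3))
print([round(v, 2) for v in z])     # scale-free: mean 0, sd 1
```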

**Correlation coefficient(r):**

- the degree of assoc. between variables;
- r=1, then positively assoc. linear relationship;
- r=0, no linear relationship;
- r=-1, then negatively assoc. linear relationship;
- Does not allow for non-linear associations & is unduly influenced by extreme observations;
- = (1/(N-1))(sum z(x).z(y));

**Rank correlation coefficient(r'):**

- a better measure of degree of assoc. with less influence from extremes & some measure of non-linear relationship, but need to rank all observations:
- => u = rank(x), v = rank(y), u,v E 1,2,3,...
- r'(x,y) = r(u,v) = s(u,v)/[SQR(s(uu).s(vv))];
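
The difference between r and r' shows up on data that is monotonic but not linear. The sketch below computes r from standard scores and r' as r applied to the ranks. The function names are illustrative, and the ranking helper assumes there are no tied values.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """r = (1/(N-1)) * sum of z(x).z(y)."""
    n = len(xs)
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

def ranks(values):
    """Rank 1 for the smallest value; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def rank_r(xs, ys):
    """r'(x,y) = r(u,v) with u = rank(x), v = rank(y)."""
    return pearson_r(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # increasing but non-linear

print(round(pearson_r(x, y), 3))  # a little below 1
print(round(rank_r(x, y), 3))     # 1.0: the ranks agree perfectly
```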

**Sampling Distribution:**

- if x is the mean of the sample, s' is the sd of the pop., and n is the no. of items in the sample, then the standard error of the sampling distribution
- = s'/SQR(n);
- u (the mean of pop.) almost certainly lies b/n x +/- 3s'/SQR(n);
- u(x) = u, s'(x) = s'/SQR(n);
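
The relations u(x) = u and s'(x) = s'/SQR(n) can be checked exactly on a tiny population by listing every possible sample. The sketch assumes sampling with replacement from a six-value population (a fair die), which is an example chosen here.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [1, 2, 3, 4, 5, 6]  # assumed population: faces of a fair die
n = 2                            # sample size

# Every equally likely sample of size n (with replacement) and its mean.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

u, s_pop = mean(population), pstdev(population)

print(mean(sample_means), u)                    # both 3.5
print(pstdev(sample_means), s_pop / sqrt(n))    # standard error = s'/SQR(n)
```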

**Central Limit Theorem:**

- the sampling distribution of the sample mean becomes closer to normal the larger the sample size is and the closer to normal the pop. distribution is;

**Probability Distribution Curves:**

- see also: Statistics
- curve = f(x),
- Integral f(x) from -infinity to infinity = 1,

**Normal Distribution:**

- f(x) = [1/(s'.SQR(2pi))]e^[-0.5((x-u)/s')^2],
- where:
- u = mean value of x in pop.
- s'= stand. dev. of x in pop.

- Standard Normal Curve:
- z = [1/(SQR(2pi))]e^[-0.5t^2],
- where t = (x-u)/s'; z = s'f(x);
- has a mean of zero, sd of 1;
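
A sketch of the normal density and its standardised form, cross-checked against `statistics.NormalDist` from the standard library. The parameter values u = 10 and s' = 2 are arbitrary examples.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, u: float, s: float) -> float:
    """f(x) = [1/(s.SQR(2pi))] e^[-0.5((x-u)/s)^2]."""
    return math.exp(-0.5 * ((x - u) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def standard_normal_pdf(t: float) -> float:
    """z = [1/SQR(2pi)] e^[-0.5 t^2]."""
    return math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)

u, s, x = 10.0, 2.0, 12.0
t = (x - u) / s

print(normal_pdf(x, u, s), NormalDist(u, s).pdf(x))   # same density
print(s * normal_pdf(x, u, s), standard_normal_pdf(t))  # z = s'f(x)
```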

**Cauchy Distribution:**

- f(x) = (1/pi)[a/(a^2+x^2)], x E R;

**Exponential Distribution:**

- f(x) = ke^(-kx) for x >= 0,
- = 0 for x < 0,

**Binomial Distribution:**

- when to use:
- when the same trial is repeated several times & there are only 2 possible outcomes in each trial either one of which must occur;

- variables:
- n = no. of independent trials;
- p = prob. success;
- q = prob. failure;

- u = mean = np;
- s'= sd = SQR(npq);
- x = no. successes in n independent trials;
- prob.(X=x) = nCr(n#x)(q^(n-x))p^x
- = prob. exactly x successes;

- Normal approximation:
- can approx. to normal distrib. using u,s'
- if n>30, p>0.1;
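
A minimal sketch of the binomial probabilities, with the mean and sd recovered from the distribution itself. The values n = 10 and p = 0.5 (eg. tossing a fair coin 10 times) are an assumed example.

```python
from math import comb, sqrt

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Pr(X = x) = nCx * q^(n-x) * p^x, with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * q ** (n - x) * p ** x

n, p = 10, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean_x = sum(x * pr for x, pr in enumerate(pmf))
sd_x = sqrt(sum((x - mean_x) ** 2 * pr for x, pr in enumerate(pmf)))

print(binomial_pmf(5, n, p))               # 0.24609375
print(mean_x, n * p)                       # mean = np
print(sd_x, sqrt(n * p * (1 - p)))         # sd = SQR(npq)
```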

**Hypergeometric Distribution:**

- when to use:
- same as for binomial except there is sampling without replacement;

- variables:
- N = size of pop.;
- n = size of sample;
- D = no. of kind A in pop.;
- N-D = no of kind B in pop.;
- X = no. of kind A in sample;

- Pr(X=x) = nCr(D#x).nCr(N-D#n-x)/nCr(N#n)
- = Pr(sample contains x of kind A),

- u = nD/N
- s'^2 = nD(1-D/N)(N-n)/[N(N-1)],
- if N is very large, & n small, can approx. to binomial distribution;
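
The hypergeometric formulas can be checked exactly with fractions. The example (drawing n = 5 cards from N = 52 and counting the D = 4 aces) is an assumption made for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(x: int, N: int, D: int, n: int) -> Fraction:
    """Pr(X = x) = C(D, x).C(N-D, n-x)/C(N, n)."""
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))

N, D, n = 52, 4, 5  # pack size, aces in pack, cards drawn without replacement
pmf = {x: hypergeometric_pmf(x, N, D, n) for x in range(min(n, D) + 1)}

mean_x = sum(x * pr for x, pr in pmf.items())
var_x = sum((x - mean_x) ** 2 * pr for x, pr in pmf.items())

print(sum(pmf.values()))                       # 1
print(mean_x == Fraction(n * D, N))            # u = nD/N
print(var_x == Fraction(n * D, N) * (1 - Fraction(D, N)) * Fraction(N - n, N - 1))
```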

**Poisson distribution:**

- when to use:
- as an approx. to binomial distrib. when p is very small and n is very large;
- when the no. of times an event occurs can be counted but there is no upper limit to the no. of times it may occur;

- Equation:
- Pr(X=x) = e^(-u) * u^x / x!,

- Eg. On average there are 2.5 cars per quarter-hour at a petrol station; what is the prob. that during a particular quarter-hour there will be some cars at the petrol station?
- u = 2.5, Pr(X>=1) = 1 - Pr(X=0),
- Pr(X=0) = e^(-2.5) => Pr(X>=1) = 0.92;

- Normal Approx.:
- u = u, s' = SQR(u),
- Good approx. if u is large;
- Eg. On average, there are 20 people asking for an item each week; what is the minimum no. of items the store must have in stock each week to be almost certain of not having to refuse demand for this item?
- u = 20, s' = SQR(20),
- Min. no. items = u + 3s' = 20 + 13.4 = 33.4, ie. 34 items;
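
Both Poisson examples can be reproduced in a few lines. The last step, summing the exact Poisson probabilities up to the stock level, is an extra check added here and is not part of the original working.

```python
from math import ceil, exp, factorial, sqrt

def poisson_pmf(x: int, u: float) -> float:
    """Pr(X = x) = e^(-u) * u^x / x!."""
    return exp(-u) * u ** x / factorial(x)

# Petrol station: u = 2.5 cars per quarter-hour.
pr_some_cars = 1 - poisson_pmf(0, 2.5)
print(round(pr_some_cars, 2))  # 0.92

# Stock level: u = 20 requests per week, normal approx. with s' = SQR(u).
u = 20
stock = ceil(u + 3 * sqrt(u))
print(stock)                   # 34

# Exact Poisson probability that weekly demand does not exceed the stock.
coverage = sum(poisson_pmf(x, u) for x in range(stock + 1))
print(coverage > 0.99)         # True
```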


**Hypothesis testing:**

- Null hypothesis(H0): that there is no effect of one variable on another;
- Alternative hypothesis(H1): that there is an effect;
- H1 is likely to be true if the results are very unlikely to have been obtained if H0 were true;
- Significance level:
- p = Pr(obtaining obs. as extreme as the ones obtained if H0 were true),

- Type I error:
- a = Pr(deciding to reject H0 when H0 true);

- Type II error:
- b = Pr(accepting H0 when H1 is true);
- if p <= a, then reject H0,
- if p > a, then accept H0,
- Usually a is designated 0.05;

- Z-test:
- If H0 is a standard distribution:
- p = Pr(z <= (x-u0)SQR(n)/s') if x < u0, or Pr(z >= (x-u0)SQR(n)/s') if x > u0,
- where u0, s' are the mean & sd if H0 is true, n = size of sample, x = mean of sample;
- the pop. should be approx. normally distributed; if it is skewed, a large sample is necessary for the z-test to be valid;
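
A sketch of the one-sided z-test p-value, using `statistics.NormalDist` for the standard normal curve. The numbers (u0 = 100, s' = 15, n = 36, sample mean 106) and the function name `z_test_p` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p(x_bar: float, u0: float, s_pop: float, n: int):
    """Return (z, p) with p taken in the tail on the same side as the sample mean."""
    z = (x_bar - u0) * sqrt(n) / s_pop
    standard = NormalDist()  # mean 0, sd 1
    p = standard.cdf(z) if x_bar < u0 else 1 - standard.cdf(z)
    return z, p

z, p = z_test_p(x_bar=106, u0=100, s_pop=15, n=36)
a = 0.05

print(round(z, 2), round(p, 4))            # 2.4 0.0082
print("reject H0" if p <= a else "accept H0")
```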

- t-test:
- used instead of z-test if pop. sd. unknown;
- need to use t-tables;
- as for z-test, but:
- t = (x-u0)SQRn/s, s = sd of sample,
- no. of degrees of freedom(df) = n-1;
- t-test of u1=u2 comparing 2 independent samples:
- eg. treated vs control group;
- t = (x1-x2)/[s(1@2)SQR[(1/n1)+(1/n2)]],
- df = n1 + n2 - 2,
- s(1@2)^2 = [(n1-1)s1^2 + (n2-1)s2^2]/(n1 + n2 - 2),

- t-test of u1=u2 comparing matched pairs:
- eg. twins;
- t = d'SQRn/sd,
- df = n-1, ud = 0 if H0 is true,
- d' = (1/n)(sum d) = mean of the differences b/n each pair,
- sd^2 = (1/(n-1))[sum(d^2) - ((sum d)^2)/n],
- d = difference b/n a pair,
- sd = stand. dev. of all d,
- n = no. of matched pairs;
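
The three t statistics can be sketched as below. Only the statistic and df are computed; the p-value still has to be read from t-tables, since the Python standard library has no t distribution. The function names and data are hypothetical.

```python
import math
from statistics import mean, stdev

def t_one_sample(xs, u0):
    """t = (x - u0).SQR(n)/s with df = n - 1."""
    n = len(xs)
    return (mean(xs) - u0) * math.sqrt(n) / stdev(xs), n - 1

def t_two_sample(xs1, xs2):
    """Pooled-variance t for 2 independent samples, df = n1 + n2 - 2."""
    n1, n2 = len(xs1), len(xs2)
    pooled_var = ((n1 - 1) * stdev(xs1) ** 2 + (n2 - 1) * stdev(xs2) ** 2) / (n1 + n2 - 2)
    t = (mean(xs1) - mean(xs2)) / (math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def t_paired(first, second):
    """t = d'.SQR(n)/sd for matched pairs, df = n - 1."""
    d = [b - a for a, b in zip(first, second)]
    n = len(d)
    sd2 = (sum(v * v for v in d) - sum(d) ** 2 / n) / (n - 1)
    return mean(d) * math.sqrt(n) / math.sqrt(sd2), n - 1

print(t_two_sample([5, 7, 9], [1, 3, 5]))             # treated vs control
print(t_paired([10, 12, 9, 11], [12, 14, 10, 14]))    # matched pairs
```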


- Chi-squared test of goodness of fit:
- used to test hypotheses concerning the proportions of the pop. in each of the categories;
- k = no. of categories in the pop.;
- n = size of the random sample from the pop.;
- Cx = category x;
- Ox = observed freq. of sample in Cx;
- Ex = expected freq. of sample in Cx if H0 true;
- Px = proportion of pop. in Cx;
- Ex = nPx, sum(x=1 to k) Px = 1,
- χo^2 = sum[((Ox-Ex)^2)/Ex],
- if H0 true, then χo^2 (Chi-squared) small; if H1 true, then χo^2 large;
- p = Pr(χ^2 > χo^2);
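
A sketch of the goodness-of-fit statistic for a hypothetical die rolled 60 times, with H0 that all six faces are equally likely. Only the statistic is computed; p would still come from Chi-squared tables.

```python
def chi_squared_statistic(observed, proportions):
    """Sum of (Ox - Ex)^2 / Ex, with Ex = n.Px under H0."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 12, 9, 11, 10, 10]   # hypothetical counts for faces 1..6
proportions = [1 / 6] * 6           # H0: fair die

chi_o = chi_squared_statistic(observed, proportions)
print(round(chi_o, 3))  # 1.0, a small value, consistent with H0
```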

- Row by Column contingency tables:
- r = no. of rows, c = no. of columns,
- Ex = expected freq. assuming independence,
- = row total * column total / grand total,

- df = (r-1)(c-1),
- χo^2 = sum[((Ox-Ex)^2)/Ex], which follows the Chi-squared distribution if H0 true, provided Ex > 5 for at least 80% of categ.
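
A sketch of the expected frequencies, the Chi-squared statistic and df for a small contingency table. The 2 x 2 counts are hypothetical and the function names are illustrative.

```python
def expected_frequencies(table):
    """Ex = row total * column total / grand total, assuming independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def contingency_chi_squared(table):
    """Return (Chi-squared statistic, df) for an r x c table of observed counts."""
    expected = expected_frequencies(table)
    chi = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi, df

table = [[10, 20],
         [30, 40]]  # hypothetical observed counts

print(expected_frequencies(table))     # [[12.0, 18.0], [28.0, 42.0]]
print(contingency_chi_squared(table))  # statistic and df = 1
```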