Oodles of old stuff including How-Tos, Writers' Resources, Directories of all Types, Technology Reviews, Health Information, Marketing Statistics, Affluent Markets, Hospital and Medical Market Data and more

Monday, September 04, 2006

Statistics Terminology

Chapter 1

Statistics: a set of methods for organizing, summarizing and interpreting information.

Population: the set of all individuals of interest in a study.

Sample: a set of selected individuals representative of the population.

Parameter: a value (usually numeric) that describes a population.

Statistic: a value (usually numeric) that describes a sample.

Data: measures or observations.

Data Set: a collection of measurements or observations.

Datum: a single measurement or observation.

Score: a single measurement or observation.

Descriptive statistics: statistical procedures used to summarize, organize, and simplify data.

Inferential statistics: techniques that allow the study of samples in order to make generalizations about the population.

Sampling error: the discrepancy or amount of error that exists between a sample statistic and the corresponding population parameter.

Variable: a characteristic or condition that changes or has different values for different individuals.

Constant: a characteristic or condition that does not vary between individuals.

Correlational method: two variables are observed to see if there is a relationship.

Experimental method: one variable is manipulated while changes are observed in another variable. Seeks to establish cause/effect. Must use randomization and a control group.

Independent variable: the variable manipulated by the researcher.

Dependent variable: the variable observed for changes.

Control group: a condition of the independent variable that does not receive the experimental treatment.

Experimental group: receives the treatment.

Confounding variable: an uncontrolled variable that is unintentionally allowed to vary systematically with the independent variable.

Quasi-experimental method: like the experimental method but lacking full control or manipulation of the independent variable (for example, groups that cannot be randomly assigned).

Constructs: hypothetical concepts used in theories to organize observations in terms of underlying mechanisms.

Operational definition: defines a construct in specific operational or procedural measurements.

Hypothesis: a prediction about the outcome of an experiment…usually about how the manipulation of the independent variable will affect the dependent variable.

Scales of measurement: Nominal, Ordinal, Interval, Ratio

Nominal: labels observations by category only. Example: Male/Female.

Ordinal: a set of categories that are rank ordered. Example: Best to worst employee.

Interval: rank ordered categories that form intervals of the same size. Allows you to measure the difference in size or amount...magnitude.

Ratio: an interval scale with an absolute zero point.

Discrete variable: separate indivisible categories.

Continuous variable: infinite number of possible values between two observed values.

Real limits: the boundaries of a score on a continuous scale, located halfway between adjacent scores. Each score has a lower real limit and an upper real limit.

Frequency distribution: an organized tabulation of the number of individuals located in each category on the scale of measurement.
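
As a rough illustration of a frequency distribution, here is a minimal Python sketch. The quiz scores are made-up data (not from the text), and the layout of the printed table is just one reasonable choice.

```python
from collections import Counter

# Hypothetical quiz scores (illustrative data, not from the source).
scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8]

# Count how many individuals obtained each score.
frequency = Counter(scores)

# List each score (highest first) alongside its frequency.
table = [(x, frequency[x]) for x in sorted(frequency, reverse=True)]

for x, f in table:
    print(f"{x:>2}  {f}")
```

The frequencies should always add up to the total number of individuals, which is a handy check on the tabulation.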

Histogram: vertical bars are drawn above each score so the height corresponds to the frequency and the width corresponds to the real limits of the score. Used for interval or ratio scales.

Bar Graph: vertical bar is drawn above each score or category so the height of the bar corresponds to the frequency and there is a space separating each bar. Used for nominal or ordinal scales.

Polygon: a single dot is drawn above each score so the dot is centered above the score and the height of the dot corresponds to the frequency. A continuous line is then drawn to connect the dots and down to the zero frequency at each end of the range of scores. Used with interval or ratio scores.

Relative frequencies: used when there are too many scores to plot individually. Like a polygon, except the vertical axis shows proportions and the distribution is drawn as a smooth curve rather than a series of lines…it shows the overall distribution of scores rather than individual scores as the polygon does.

Rank or percentile rank: the percentage of individuals in the distribution with scores at or below a particular value.

Symmetrical distribution: equal on each side.

Skewed distribution: scores pile up on one end and taper at the other end.

Tail: the side where the scores taper off.

Positive skew: the tail points toward the positive (right) end of the x axis.

Negative skew: tail points toward the negative (left) end.

Central Tendency: a statistical measure that identifies a single score as representative of an entire distribution. The goal is to find a single score most representative of the group.

Mean: the arithmetic average; the sum of the scores divided by the number of scores.

Weighted mean: the overall mean of two or more combined groups, found by dividing the combined sum of the scores by the combined number of scores.

Median: the score that divides a distribution in half. Equivalent to the 50th percentile.

Mode: the most common observation among a group of scores.
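
The measures of central tendency above can be sketched with Python's standard `statistics` module. The two groups of scores are invented for illustration, and the weighted mean here is taken to mean the overall mean of the combined groups (combined sum divided by combined n).

```python
import statistics

# Hypothetical scores for two groups (illustrative data only).
group_a = [6, 8, 7, 7]
group_b = [3, 5, 4, 4, 4, 10]

mean_a = statistics.mean(group_a)      # sum of scores / number of scores
mean_b = statistics.mean(group_b)
median_b = statistics.median(group_b)  # midpoint of the ordered scores
mode_b = statistics.mode(group_b)      # most frequent score

# Weighted mean of the combined groups: combined sum / combined n.
weighted_mean = (sum(group_a) + sum(group_b)) / (len(group_a) + len(group_b))

print(mean_a, mean_b, median_b, mode_b, weighted_mean)
```

Note that the weighted mean is pulled toward the mean of the larger group, so it is not simply the average of the two group means.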

Open ended distribution: when one category is left open as with 75 and above…etcetera.

Variability: a quantitative measure of the degree to which scores in a distribution are spread or clustered.

Range: the distance between the largest and smallest score in the distribution…or the upper real limit of the largest x value and the lower real limit of the smallest x value.

Interquartile range: the distance between the first quartile and the third quartile.

Semi-interquartile range: one half the interquartile range.

Deviation: the distance from the mean. (deviation scores must always add to zero)

Population variance: the mean squared deviation. Variance is the mean of the squared deviation scores.

Standard deviation: is equal to the square root of the variance.

SS: sum of squares: the sum of the squared deviation scores.
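
To see how deviation, SS, variance, and standard deviation fit together, here is a small Python sketch that treats a made-up set of five scores as an entire population (so the variance divides SS by N). The data and variable names are assumptions for illustration.

```python
import math

# Hypothetical population of scores (illustrative data only).
scores = [1, 9, 5, 8, 7]
N = len(scores)

mu = sum(scores) / N                    # population mean
deviations = [x - mu for x in scores]   # distance of each score from the mean
ss = sum(d ** 2 for d in deviations)    # SS: sum of squared deviations
variance = ss / N                       # population variance: mean squared deviation
sd = math.sqrt(variance)                # standard deviation: square root of variance

print(deviations, ss, variance, round(sd, 3))
```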

DF: Degrees of freedom: see text.

z-score: the precise location of each x value within a distribution. Can be positive or negative...the sign indicates whether x is above or below the mean, and the numerical value indicates the distance from the mean by counting the number of standard deviations between x and μ.

Standardized distribution: transformed scores that result in predetermined values for μ and σ regardless of their values for the raw score distribution. Used to make dissimilar distributions comparable.

Standard score: a transformed score that provides information about its location in a distribution. A z-score is an example of a standard score.
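
A z-score and a standardized score can be sketched in a few lines of Python. The function names and the numbers (a raw distribution with mean 100 and standard deviation 15, rescaled to mean 50 and standard deviation 10) are assumptions chosen only for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations between x and the mean mu."""
    return (x - mu) / sigma


def standardize(x, mu, sigma, new_mu, new_sigma):
    """Move x to the same relative location in a distribution
    with a predetermined mean and standard deviation."""
    return new_mu + z_score(x, mu, sigma) * new_sigma


print(z_score(130, mu=100, sigma=15))
print(z_score(85, mu=100, sigma=15))
print(standardize(130, mu=100, sigma=15, new_mu=50, new_sigma=10))
```

A score above the mean gives a positive z, and a score below the mean gives a negative z.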

Probability: in a situation where several outcomes are possible, probability is the fraction or proportion of any particular outcome. Must have random sampling.

Random sampling: for a sample to be random, each individual in the population has an equal chance of being selected and if more than one individual is to be selected there must be a constant probability for each and every selection.

Normal distribution: a symmetrical distribution in which the mean, median, and mode are equal; fifty percent of the scores are below the mean and fifty percent are above; most of the scores pile up around the mean and extreme scores are rare.

Sampling error: the discrepancy or amount of error between a sample statistic and the corresponding population parameter.

Distribution of sample means: the collection of sample means for all possible random samples of a particular size (n) that can be obtained from a population.

Sampling distribution: a distribution of statistics obtained by selecting all possible samples of a specific size from a population.
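
For a very small population, the distribution of sample means can be listed out completely. The sketch below assumes a made-up population of four scores and sampling with replacement (so the probabilities stay constant from one selection to the next), with n = 2.

```python
from collections import Counter
from itertools import product
import statistics

# Hypothetical tiny population (illustrative data only).
population = [2, 4, 6, 8]
n = 2  # sample size

# Every possible sample of size n, sampling with replacement.
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# How often each sample mean occurs across all possible samples.
distribution = Counter(sample_means)

mean_of_means = statistics.mean(sample_means)
standard_error = statistics.pstdev(sample_means)  # spread of the sample means

for m in sorted(distribution):
    print(m, distribution[m])
print(mean_of_means, round(standard_error, 3))
```

In this setup the mean of all the sample means matches the population mean, and the standard deviation of the sample means matches the population standard deviation divided by the square root of n.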

Hypothesis testing: an inferential procedure that uses sample data to evaluate the credibility of a hypothesis about a population.

Null hypothesis: predicts the independent variable (treatment) has no effect on the dependent variable.

Alternative hypothesis: predicts the independent variable (treatment) will have an effect on the dependent variable.

Type I error: rejecting the null hypothesis when the null is actually true.

Type II error: the investigator fails to reject a null hypothesis that is really false.

Alpha level or level of significance: a probability value that defines the unlikely sample outcomes when the null hypothesis is true. Defines the probability of a type I error.

Critical region: extreme sample values that are unlikely to be obtained if the null hypothesis is true.

One-tailed test: a directional hypothesis: the statistical hypotheses specify an increase or a decrease in the population mean score.

Power: the probability a test will correctly reject a false null hypothesis.
Factors affecting power include alpha level, one-tailed versus two-tailed tests, and sample size.

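The general elements of hypothesis testing:
1. State the null hypothesis (and the alternative hypothesis).
2. Use the sample data to calculate a sample statistic that corresponds to the hypothesized population parameter.
3. Measure how much difference to expect by chance by computing the standard error, which reflects sampling error and the variability of the scores.
4. Compute a test statistic (a z-score or other statistic) that compares the sample statistic with the hypothesized parameter.
5. Compare the test statistic with the critical region set by the alpha level (level of significance) and decide whether to reject the null hypothesis.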
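
The steps above can be sketched as a two-tailed z-test in Python. This is only an illustration under stated assumptions: the population standard deviation is treated as known, and the function name `z_test` and all the numbers are invented for the example. `NormalDist` comes from the standard `statistics` module.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test of the null hypothesis that the population mean is mu0.

    Assumes the population standard deviation sigma is known.
    """
    standard_error = sigma / sqrt(n)                 # expected chance difference
    z = (sample_mean - mu0) / standard_error         # test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # boundary of the critical region
    reject = abs(z) > critical                       # decision about the null
    return z, critical, reject


# Hypothetical study: null says mu = 80, sigma = 20, sample of n = 16 with mean 92.
z, critical, reject = z_test(sample_mean=92, mu0=80, sigma=20, n=16)
print(f"z = {z:.2f}, critical = ±{critical:.2f}, reject H0: {reject}")
```

A smaller alpha pushes the critical boundary further out, which lowers the risk of a Type I error but makes it harder to reject the null.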

t-test: used instead of a z-test when the population standard deviation is not known. Uses the estimated standard error, computed from the sample variance.

Degrees of freedom: the number of scores in a sample that are free to vary.
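
Here is a minimal sketch of a one-sample t statistic, showing where the degrees of freedom (n - 1) enter the calculation. The function name `one_sample_t`, the sample scores, and the hypothesized mean of 10 are all assumptions for illustration.

```python
from math import sqrt


def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    df = n - 1                                 # scores free to vary
    variance = ss / df                         # sample variance
    standard_error = sqrt(variance / n)        # estimated standard error
    t = (mean - mu0) / standard_error
    return t, df


# Hypothetical sample (illustrative data only).
sample = [9, 9, 11, 13, 13, 13, 15, 17, 17]
t, df = one_sample_t(sample, mu0=10)
print(t, df)
```

The resulting t would then be compared with the critical value from a t table for the same df and alpha level.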

Independent measures research design: an experiment that uses a separate sample for each treatment condition or each population.

Repeated measure study: a single sample of subjects is used to compare two or more different treatment conditions. Each individual is measured in one treatment and then again in the second treatment.

Matched subject study: each individual in one sample is matched with a subject in the other sample.

Estimation: the inferential process of using sample data to estimate population parameters.

Point estimate: use of a single number as the estimate of an unknown quantity.

Interval estimate: use of a range of values as an estimate of an unknown quantity. When it is accompanied with a specific level of confidence or probability it is called a confidence interval.
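
As a sketch of the difference between a point estimate and a confidence interval, the Python below builds an interval around a sample mean. The function name and the numbers are hypothetical; the critical value 2.306 is the two-tailed t value for df = 8 at the 95% level, as found in a standard t table.

```python
def confidence_interval(sample_mean, standard_error, t_critical):
    """Interval estimate: sample mean plus or minus t times the standard error."""
    margin = t_critical * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample results: mean 13, estimated standard error 1, df = 8.
point_estimate = 13.0
lower, upper = confidence_interval(point_estimate, standard_error=1.0, t_critical=2.306)
print(point_estimate, (round(lower, 3), round(upper, 3)))
```

Asking for more confidence means a larger critical value and therefore a wider interval.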

ANOVA (analysis of variance): a hypothesis-testing procedure used to evaluate mean differences between two or more treatments or populations. Its major advantage is that it can compare more than two treatments in a single test.

In ANOVA, a factor is an independent variable.
Single factor design: a research study with only one independent variable.
Factorial design: a study with more than one independent variable.

Error term: in an ANOVA, the denominator of the F-ratio. The error term provides a measure of the variance due to chance. When the treatment effect is zero (the null hypothesis is true), the error term measures the same sources of variance as the numerator of the F-ratio, so the value of the F-ratio is expected to be nearly equal to 1.00.

Levels of the factor: the individual treatment conditions that make up a factor.

K= the number of treatment conditions / the number of levels of the factor in ANOVA

Experimentwise alpha level: the overall probability of a type I error accumulated over a series of separate hypothesis tests.
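
To get a feel for how the risk of a type I error builds up, here is a rough Python sketch. It assumes the separate tests are independent and each uses the same alpha, which is a simplification not stated in the definition above; the function name is made up for the example.

```python
def experimentwise_alpha(alpha, num_tests):
    """Chance of at least one type I error across num_tests tests.

    Assumes the tests are independent and each uses the same alpha.
    """
    return 1 - (1 - alpha) ** num_tests


for c in (1, 3, 6):
    print(c, round(experimentwise_alpha(0.05, c), 4))
```

This accumulation is one reason a single ANOVA is preferred over running many separate pairwise tests.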

Variance between treatments comes from:
1. Treatment effect: the different treatment conditions produce different effects and cause the individuals' scores to be higher or lower in one condition than another.
2. Experimental error: any time behavior is measured there may be error introduced. It may cause an individual's scores to differ across conditions.

Variance due to chance (the error term) comes from:
1. Individual differences: within each treatment the scores come from different individuals.
2. Experimental error: uncontrolled and unsystematic error can always be a source of differences.

ANOVA notations
K= the number of treatment conditions

n= the number of scores in each treatment

N= the total number of scores in the entire study.

G= the sum of all the scores in the study. The value of G corresponds to the summation of x for all N scores and specifies the grand total for the experiment.

T= the sum of the scores in each treatment condition (treatment total).

SS= the sum of the squares for each treatment (SS)
ΣX²= the sum of the squared scores for the entire study.

P= personal total. Used to measure individual differences in the analysis.

F-ratio: the between-treatments variance divided by the error variance.
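
The notation above can be tied together in a short Python sketch of a single-factor, independent-measures ANOVA. The function name, the dictionary keys, and the three sets of scores are assumptions made up for illustration; the variable names follow the k, n, N, G, T, and SS notation.

```python
def one_way_anova(treatments):
    """Compute the F-ratio for an independent-measures, single-factor design."""
    k = len(treatments)                               # number of treatment conditions
    N = sum(len(scores) for scores in treatments)     # total number of scores
    G = sum(sum(scores) for scores in treatments)     # grand total

    # SS inside each treatment: sum of squared scores minus T squared over n.
    ss_within = sum(
        sum(x ** 2 for x in scores) - sum(scores) ** 2 / len(scores)
        for scores in treatments
    )
    # SS between treatments: sum of T squared over n, minus G squared over N.
    ss_between = sum(sum(scores) ** 2 / len(scores) for scores in treatments) - G ** 2 / N

    df_between = k - 1
    df_within = N - k
    ms_between = ss_between / df_between   # between-treatments variance
    ms_within = ss_within / df_within      # error variance (the error term)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": ms_between / ms_within,
    }


# Hypothetical scores for three treatment conditions (illustrative data only).
result = one_way_anova([[0, 1, 3, 1, 0], [4, 3, 6, 3, 4], [1, 2, 2, 0, 0]])
print(result)
```

An F-ratio near 1.00 suggests no treatment effect, while a much larger value suggests the treatments differ by more than chance alone would produce.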

Main effect: the mean differences among the levels of one factor.

Interaction: the effect of one factor contingent upon the level of the second factor.

Between-treatment variance is caused by the treatment effect, individual differences, and experimental error.

Positive correlation: two variables tend to move in the same direction. As the x variable increases the y variable also increases, and vice versa.

Negative correlation: two variables tend to go in opposite directions. As the x variable increases the y variable decreases.

Correlations are used for prediction, for testing validity and reliability, and for theory verification.

Pearson correlation: measures the degree and direction of linear relationship between two variables.
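
A Pearson correlation can be computed by hand from the sum of products of deviations (SP) and the two sums of squares. The sketch below uses made-up paired scores, and the function name is an assumption for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: SP divided by the square root of SSx times SSy."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # sum of products
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


# Hypothetical paired scores (illustrative data only).
x = [0, 10, 4, 8, 8]
y = [2, 6, 2, 4, 6]
r = pearson_r(x, y)
print(r)
```

The sign of r gives the direction of the relationship and its size (from 0 to 1) gives the degree.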

Spearman correlation: measures the degree and direction of relationships between two variables where both are measured on ordinal scales…that is, both x and y consist of ranks.
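
One way to sketch the Spearman correlation is to convert each variable to ranks and then apply the Pearson calculation to the ranks. The helper below assumes there are no tied scores (ties need averaged ranks, which this sketch does not handle), and the function names and data are made up for illustration.

```python
from math import sqrt


def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """Spearman correlation: the Pearson formula applied to the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_rank = (n + 1) / 2  # mean of the ranks 1..n
    sp = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_rank) ** 2 for a in rx)
    ss_y = sum((b - mean_rank) ** 2 for b in ry)
    return sp / sqrt(ss_x * ss_y)


# Hypothetical scores with one extreme y value (illustrative data only).
x = [10, 20, 30, 40, 50]
y = [1, 4, 3, 9, 100]
rs = spearman_rs(x, y)
print(ranks(y), rs)
```

Because only the ranks are used, the extreme score of 100 has no more influence than any other top-ranked score would.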

Point biserial correlation: used to measure the relationship between two variables when one is measured on an interval or ratio scale but the second has only two different values (dichotomous variables such as male/female).

Phi coefficient: used to measure the relationship between two variables when both (x and y) are dichotomous.

Regression: the technique for finding the best fitting straight line for a set of data.

Standard error of estimate: a measure of the standard distance between a regression line and the actual data points.
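
Regression and the standard error of estimate can be sketched together in Python. The sketch assumes a least-squares line of the form predicted y = slope * x + intercept and divides the residual sum of squares by n - 2, which is the usual degrees of freedom for this measure; the function names and data are made up for illustration.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def standard_error_of_estimate(xs, ys, slope, intercept):
    """Standard distance between the data points and the regression line."""
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return sqrt(ss_residual / (len(xs) - 2))  # df = n - 2


# Hypothetical paired scores (illustrative data only).
x = [0, 10, 4, 8, 8]
y = [2, 6, 2, 4, 6]
slope, intercept = fit_line(x, y)
see = standard_error_of_estimate(x, y, slope, intercept)
print(slope, intercept, round(see, 3))
```

A smaller standard error of estimate means the data points sit closer to the line, so predictions from the line are more accurate.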

