Glossary - StatsFind

B

Bar Chart – A method for displaying categorical data. The frequency is on one axis and the categories are on the other.

C

Categorical Data – Values that place things into different groups or categories.

Categorical Variables – Non-numerical values that you study. Examples are hair color, letter grade, and type of dog.

Categorical and Nominal – Categorical variables that have no logical ordering. For example, you can order hair color alphabetically but the values themselves have no logical order.

Categorical and Ordinal – Categorical variables that have logical ordering. For example, you can order letter grades from first to last or last to first.

Continuous Variable – A variable that can take on any numerical value within a range, including fractions and decimals. For example, someone can weigh 105 lbs, 185 lbs, or 170.683 lbs.

D

Descriptive Statistics – Explains the data collected. For example, if there are ten people in a room and two are women, a descriptive statistic would be that 20% of the people are female.

Digital Transformation – Digital transformation (DX) is the adoption of digital technologies in all areas of a business. It includes, but is not limited to, integrating artificial intelligence, mobile, and social media channels. The aim of digital transformation is to deliver a better customer experience and gain a competitive edge.

Discrete Variable – A variable that can take only certain separate values, usually counts. For example, the number of pets can be 1, 2, or 30 but not 2.7.

F

Frequency Distribution – Shows how many data values fall within a certain interval.
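As a rough illustration of a frequency distribution, the Python sketch below counts how many values fall into equal-width intervals. The function name `frequency_distribution`, the interval width, and the sample ages are all made up for this example; intervals with no values are simply left out.

```python
from collections import Counter

def frequency_distribution(values, width):
    """Count how many values fall in each interval [start, start + width)."""
    # Map every value to the start of its interval, then tally the starts.
    counts = Counter((v // width) * width for v in values)
    # Return (interval_start, interval_end, count) tuples in ascending order.
    return [(start, start + width, counts[start]) for start in sorted(counts)]

ages = [3, 7, 12, 15, 18, 21, 24, 29, 33]  # made-up sample data
for start, end, count in frequency_distribution(ages, 10):
    print(f"{start} to under {end}: {count}")
```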

H

Histogram – A method for displaying quantitative data; it is a form of frequency distribution. The frequency is on one axis and the intervals of the quantitative variable are on the other.

I

Inferential Statistics – Looks at the data from a sample and draws a conclusion about the population based on it. For example, are left-handed people better at math than right-handed people? In this case, we could select 100 lefties and 100 righties and have them take the same math test. After reviewing the test scores, what can you infer? That’s inferential statistics.

Interval Variables – Numerical variables whose values are evenly spaced on a scale. Examples include clock time and temperature in degrees Celsius. Interval data can be negative, and the zero point is an arbitrary position on the scale rather than a true absence of the quantity.

M

Mean – The arithmetic average of the data set. The mean is the sum of all data values divided by the number of data values.

Median – The data value that is positioned in the middle of an ordered data set. You can use the following formula to find the position of the median:

Median position = (n + 1) / 2, where n is the number of data values

If there are two middle values (because the data set has an even number of values), the median is the average of the two middle values.

Mode – The data value that is most frequently observed; the most popular value.
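The mean, median, and mode entries can be checked with a few lines of Python. This is a minimal sketch on made-up test scores; the position rule (n + 1) / 2 used in the comments is the common textbook convention, and `statistics.mode` comes from Python's standard library.

```python
import statistics

scores = [70, 85, 85, 90, 100, 62]  # made-up test scores

# Mean: sum of the values divided by the number of values.
mean = sum(scores) / len(scores)

# Median: middle value of the ordered data.
ordered = sorted(scores)
n = len(ordered)
position = (n + 1) / 2  # 3.5 here, so the median sits between the 3rd and 4th values
if n % 2 == 1:
    median = ordered[n // 2]
else:
    # Even number of values: average the two middle values.
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Mode: the most frequently observed value.
mode = statistics.mode(scores)

print(mean, median, mode)
```

With these scores the mean is 82.0, the median is 85.0, and the mode is 85.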

O

Outlier – An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. In a sense, this definition leaves it up to the analyst (or a consensus process) to decide what will be considered abnormal.
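Because the definition leaves "abnormal" to the analyst, any concrete rule is a choice. The sketch below uses one common convention that this glossary does not itself prescribe: flag values more than 1.5 times the interquartile range below the first quartile or above the third. The function name `iqr_outliers` and the data are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

commute_minutes = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]  # made-up data
print(iqr_outliers(commute_minutes))
```

A different multiplier `k`, or a different rule altogether, would flag a different set of points.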

P

Pie Chart – A method for displaying categorical data. It shows the relative size of each value in relation to the whole.

Population – The entire group of things being studied. Things can be people, vehicles, animals, buildings, etc.

Q

Quantitative Data – Numerical data that you can perform arithmetic calculations on (e.g., the average).

Quantitative Variables – Numerical things that you are studying. Examples are age, weight, and temperature.

R

Range – A measure of spread: the maximum value minus the minimum value.
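As a quick worked example of the range, here is a short Python sketch on made-up values:

```python
values = [12, 7, 3, 15, 8]  # made-up data

# Range: maximum value minus minimum value.
data_range = max(values) - min(values)
print(data_range)  # 15 - 3 = 12
```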

Ratio Variables – Numerical variables with a true zero, so you can calculate meaningful ratios. For instance, temperature is an interval value but not a ratio value because 4 degrees Celsius is not half as warm as 8 degrees Celsius. However, weight is a ratio value because 50 lbs is half the weight of 100 lbs. Other examples of ratio variables are height, distance, and mass.
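One way to see the difference is to change units and check whether the ratio survives. The sketch below is only an illustration: the Kelvin scale is not mentioned in this glossary and is brought in as an assumption, because it measures temperature from a true zero.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin, a temperature scale with a true zero."""
    return celsius + 273.15

def pounds_to_kilograms(pounds):
    """Convert pounds to kilograms (1 lb = 0.45359237 kg)."""
    return pounds * 0.45359237

# Temperature: the ratio depends on where the scale puts zero.
ratio_celsius = 8 / 4                                        # 2.0
ratio_kelvin = celsius_to_kelvin(8) / celsius_to_kelvin(4)   # about 1.014

# Weight: the ratio is the same in pounds and in kilograms.
ratio_pounds = 100 / 50                                              # 2.0
ratio_kilograms = pounds_to_kilograms(100) / pounds_to_kilograms(50)

print(ratio_celsius, round(ratio_kelvin, 3), ratio_pounds, round(ratio_kilograms, 3))
```

The weight ratio stays at 2 after the unit change, while the temperature ratio does not, which is why 8 degrees Celsius cannot be called "twice as warm" as 4 degrees Celsius.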

Relative Frequency Distribution – Similar to a frequency distribution in that it displays the number of data values within an interval. The difference is that a relative frequency distribution displays each count as a proportion of the whole.
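A minimal Python sketch of the idea: divide each count by the total number of values. The example uses made-up letter grades as the groups; the same division applies to counts per numeric interval. The function name `relative_frequencies` is hypothetical.

```python
from collections import Counter

def relative_frequencies(values):
    """Return each distinct value's share of the whole data set."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

grades = ["A", "B", "B", "C", "B", "A", "C", "B"]  # made-up data
for grade, share in sorted(relative_frequencies(grades).items()):
    print(f"{grade}: {share:.0%}")
```

The proportions always add up to 1 (100%), which is a handy check on the arithmetic.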

S

Sample – A small part of the population that is used for study.

Sample Size – The total number of things in a sample.

Sensitivity – Term used in disease screening. Sensitivity is the proportion of people with active disease whom the test correctly identifies as positive (true positives). A similar term is positive percent agreement (PPA), which compares the test being evaluated, such as an at-home kit, to the best testing available.

Social Proof – Also known as normative social influence. When someone adjusts their behavior to gain social approval or because they think the majority knows best.

Social Psychology – The scientific study of how interactions with others shape a person’s thoughts, beliefs, feelings, and even goals.

Specificity – Term used in disease screening. Specificity is the proportion of people without active disease whom the test correctly identifies as negative (true negatives). A similar term is negative percent agreement (NPA), which compares the test being evaluated, such as an at-home kit, to the best testing available.
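Sensitivity and specificity are usually computed from four counts: true positives, false negatives, true negatives, and false positives. The Python sketch below uses those standard formulas; the function names and the screening counts are hypothetical and do not describe any real test.

```python
def sensitivity(true_positives, false_negatives):
    """Share of people WITH the disease who test positive."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Share of people WITHOUT the disease who test negative."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical screening results:
# 100 people have the disease; 90 test positive and 10 test negative.
# 200 people do not have it; 190 test negative and 10 test positive.
sens = sensitivity(true_positives=90, false_negatives=10)
spec = specificity(true_negatives=190, false_positives=10)

print(f"Sensitivity: {sens:.0%}, specificity: {spec:.0%}")
```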

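Standard Deviation – A measure of spread that tells us how close the values in a data set are to the mean. A small standard deviation means the values cluster near the mean; a large one means they are more spread out.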
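To make the calculation concrete, the sketch below works through the standard deviation step by step on made-up data. It shows both the population form (divide by n) and the sample form (divide by n - 1); which one applies depends on whether the data are the whole population or a sample from it.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data

# Step 1: find the mean.
mean = sum(data) / len(data)

# Step 2: square each value's distance from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Step 3: average the squared distances, then take the square root.
population_sd = math.sqrt(sum(squared_deviations) / len(data))
sample_sd = math.sqrt(sum(squared_deviations) / (len(data) - 1))

print(population_sd, sample_sd)

# The standard library gives the same results.
print(statistics.pstdev(data), statistics.stdev(data))
```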

Statistics – The science of collecting, analyzing, presenting, and interpreting data.

Stemplots – Stemplots are like histograms except they show each data point. The stemplot has two components: the stem and the leaf. The stem represents all the digits except the last one and is positioned on the left side of the stemplot. The leaf represents the last digit and is positioned on the right side of the stemplot. Combining the value on the left with a value on the right gives you the complete number.
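A stemplot is easy to build by hand, and just as easy to sketch in Python. The example below assumes non-negative whole numbers and splits each one into a stem (all digits but the last) and a leaf (the last digit); the function name `stemplot` and the data are made up.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a stemplot for non-negative whole numbers."""
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 23 -> stem 2, leaf 3
        leaves_by_stem[stem].append(leaf)

    lines = []
    # Include every stem between the smallest and largest, even empty ones.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

ages = [12, 15, 21, 23, 23, 38, 40]  # made-up data
print("\n".join(stemplot(ages)))
```

Reading the row `2 | 1 3 3` from left to right gives back the values 21, 23, and 23.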

T

Timeplot – Shows how a variable changes over time. Time is plotted on the x-axis and the variable is on the y-axis.

V

Variable – What you are studying. A variable can be measured, counted, or categorized. Examples include height, test scores, and eye color.