Class 11 · Math

Statistics

Statistics is the science of data: collecting, organizing, analyzing, and interpreting it.

Feynman Lens

Start with the simplest version: this lesson is about Statistics. If you can explain the core idea to a friend using everyday language, examples, and one clear reason why it matters, you have moved from memorising to understanding.

A single data point tells one story; statistics reveals the patterns hidden within collections of data. This chapter introduces measures of central tendency (mean, median, mode) that describe where data clusters, and measures of dispersion (variance, standard deviation) that describe how spread out it is. Understanding statistics is essential for science (designing experiments), business (market analysis), medicine (clinical trials), and informed citizenship (interpreting news and policy claims). In each case, statistics turns raw data into summaries that can support a decision.

Data: Raw Material for Statistics

Data comes from observations or measurements. A dataset is a collection of data values, often organized in a table or list.

Types of data:

Qualitative: Descriptive (colors, categories)
Quantitative: Numerical (ages, heights, test scores)

- Discrete: Countable (number of students)
- Continuous: Measurable across a range (height, weight)

A population is the entire set of objects of interest. A sample is a subset of the population studied to estimate population characteristics.

Frequency Distributions

Organizing data into a frequency distribution makes patterns visible.

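Create a table with class intervals (ranges) and count how many data points fall in each interval. Because adjacent intervals share a boundary, a convention is needed for boundary values; a common one counts a score such as 40 in the upper interval (40-60). For test scores 0-100: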

0-20: 2 students
20-40: 5 students
40-60: 8 students
60-80: 10 students
80-100: 5 students

The frequency is the count. The relative frequency is the proportion (count/total). A histogram visualizes this as a bar chart with class intervals on the x-axis and frequencies on the y-axis.
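As a rough illustration of the table above, the sketch below turns the listed counts into relative frequencies and prints a crude text histogram. The function name `relative_frequencies` is made up for this example; the intervals and counts are the ones from the lesson.

```python
# Class intervals and frequencies copied from the test-score table above.
intervals = ["0-20", "20-40", "40-60", "60-80", "80-100"]
frequencies = [2, 5, 8, 10, 5]


def relative_frequencies(counts):
    """Return each count divided by the total count."""
    total = sum(counts)
    return [count / total for count in counts]


rel = relative_frequencies(frequencies)

for label, count, share in zip(intervals, frequencies, rel):
    # One '#' per student gives a sideways text histogram.
    print(f"{label:>6}: {'#' * count:<10} {count:2d}  ({share:.1%})")
```

The relative frequencies always add up to 1, which is a quick check that no data point was dropped or counted twice.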

Measures of Central Tendency

These numbers summarize where the center of the data lies.

Mean (Average): The sum of all values divided by the count.

Mean = Σx / n

For data {2, 4, 6, 8}, mean = (2+4+6+8)/4 = 5

The mean is sensitive to outliers. One extremely large value pulls it up.

Median: The middle value when data is arranged in order.

For {2, 4, 6, 8}, median = (4+6)/2 = 5 (average of two middle values for even-sized sets)

The median is robust—outliers don't affect it as much.
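To see the difference in sensitivity, the sketch below adds one extreme value to the lesson's dataset and recomputes both measures with Python's standard `statistics` module. The outlier value 100 is a hypothetical choice made only for this illustration.

```python
from statistics import mean, median

data = [2, 4, 6, 8]
with_outlier = data + [100]  # hypothetical extreme value, not from the lesson

mean_before, median_before = mean(data), median(data)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)      # 5 -> 24
print("median:", median_before, "->", median_after)  # 5.0 -> 6
```

One added value moves the mean from 5 to 24, while the median only shifts from 5 to 6.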

Mode: The value that appears most frequently.

For {2, 2, 4, 6, 8, 8, 8}, mode = 8 (appears 3 times)

Data can have one mode (unimodal), two (bimodal), more than two (multimodal), or no mode if no value repeats.
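A minimal sketch of these cases follows. The helper `modes` is a made-up name, and returning an empty list when no value repeats is a choice made here to match the lesson's "no mode" convention (Python's built-in `statistics.multimode` would instead return every value in that situation).

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value appears once: treated as "no mode"
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 2, 4, 6, 8, 8, 8]))  # [8]     unimodal
print(modes([1, 1, 2, 2, 3]))        # [1, 2]  bimodal
print(modes([1, 2, 3]))              # []      no mode
```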

Measures of Dispersion

These measure how spread out the data is around the center.

Range: Difference between maximum and minimum values.

For {2, 4, 6, 8}, range = 8 - 2 = 6

The range is simple to compute but sensitive to outliers, because it depends only on the two extreme values.

Variance (σ²): Average of squared deviations from the mean.

Variance = Σ(x - mean)² / n

For each value, calculate how far it is from the mean and square that distance; then average the squares.

For {2, 4, 6, 8} with mean = 5: Variance = [(2-5)² + (4-5)² + (6-5)² + (8-5)²] / 4 = [9 + 1 + 1 + 9] / 4 = 5

Standard Deviation (σ): The square root of variance.

σ = √Variance

For the example above: σ = √5 ≈ 2.24

Standard deviation is in the original units of measurement, making it more interpretable than variance. A small standard deviation means data is tightly clustered around the mean.
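The worked example can be checked with a few lines of code. This is a sketch that follows the lesson's formula, dividing by n; the function names are illustrative. Note that Python's `statistics.variance` divides by n - 1 (the sample variance), so `statistics.pvariance` and `statistics.pstdev` are the built-ins that match the formula used here.

```python
import math


def population_variance(values):
    """Average of squared deviations from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def population_std(values):
    """Positive square root of the population variance."""
    return math.sqrt(population_variance(values))


data = [2, 4, 6, 8]
variance = population_variance(data)  # 5.0
sigma = population_std(data)          # about 2.236

print(variance, round(sigma, 2))
```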

Coefficient of Variation

When comparing datasets with different scales, the coefficient of variation (CV) is useful:

CV = (Standard Deviation / Mean) × 100%

This expresses dispersion as a percentage of the mean, allowing comparison across different units.
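As an illustration, the sketch below compares two small datasets that have the same standard deviation but different means. Both datasets are hypothetical values invented for this example, and `coefficient_of_variation` is an illustrative helper name.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    return pstdev(values) / mean(values) * 100


heights_cm = [150, 160, 170, 180]  # hypothetical data
weights_kg = [40, 50, 60, 70]      # hypothetical data

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"heights: sd = {pstdev(heights_cm):.2f} cm, CV = {cv_heights:.1f}%")
print(f"weights: sd = {pstdev(weights_kg):.2f} kg, CV = {cv_weights:.1f}%")
```

Both standard deviations are about 11.18 in their own units, yet the weights vary far more relative to their mean (CV near 20%) than the heights do (CV near 7%).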

Probability and the Normal Distribution

The normal distribution (bell curve) is ubiquitous in nature. Many phenomena—heights, test scores, measurement errors—follow it approximately.

The normal distribution is characterized by:

Symmetry: Mean = median = mode
68-95-99.7 Rule:

- 68% of data falls within 1 standard deviation of the mean
- 95% within 2 standard deviations
- 99.7% within 3 standard deviations
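The percentages in the 68-95-99.7 rule are rounded values, and they can be recovered from the cumulative distribution function of a normal distribution. The sketch below uses `statistics.NormalDist` from Python's standard library (available from Python 3.8); the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
# approximately 0.6827, 0.9545, 0.9973
```

Because every normal distribution is a shifted and rescaled copy of the standard one, the same proportions hold for any mean and standard deviation.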

Correlation and Relationship Between Variables

Correlation measures how two variables move together.

The correlation coefficient (r) ranges from -1 to +1:

r = +1: Perfect positive correlation (one increases with the other)
r = 0: No linear correlation (the variables may still be related in a nonlinear way)
r = -1: Perfect negative correlation (one decreases as the other increases)

Important: Correlation does not imply causation. Two variables might move together because of a third variable or pure chance.
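The lesson does not state a formula for r. The sketch below assumes the usual Pearson definition (the sum of products of deviations divided by the square root of the product of the summed squared deviations) and applies it to three small hypothetical datasets. The function name `pearson_r` and all data values are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


x = [1, 2, 3, 4, 5]
r_rising = pearson_r(x, [2, 4, 6, 8, 10])   # y rises with x
r_falling = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises

# y = x**2 on a range symmetric about zero: fully determined by x, yet r is 0.
symmetric_x = [-2, -1, 0, 1, 2]
r_curved = pearson_r(symmetric_x, [v ** 2 for v in symmetric_x])

print(r_rising, r_falling, r_curved)
```

The third case is a reminder that r only captures straight-line association: a value of 0 does not rule out a strong curved relationship.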

Real-World Applications

Medicine: Clinical trials use statistics to test drug effectiveness.

Quality Control: Manufacturing monitors product consistency using statistical samples.

Economics: Unemployment rates, inflation, and GDP growth are statistical measures guiding policy.

Psychology: Research findings rely on statistical significance testing.

Sports: Advanced analytics reveal player performance patterns.

Key Formulas

Mean: x̄ = Σx / n
Variance: σ² = Σ(x - x̄)² / n
Standard Deviation: σ = √Variance
Coefficient of Variation: CV = (σ / x̄) × 100%
68-95-99.7 Rule: In a normal distribution, about 68%, 95%, and 99.7% of data fall within 1, 2, and 3 standard deviations of the mean

Socratic Questions

Why is the median often preferred over the mean for describing typical values when data contains outliers? Can you construct an example where they differ dramatically?

Variance is the average of squared deviations from the mean. Why square the deviations instead of using absolute values? What would be lost if we used |x - mean| instead?

The standard deviation measures spread. If one dataset has σ = 2 and another has σ = 10, what can you conclude about which is more clustered? How would this affect predictions?

The correlation coefficient measures association between variables, but "correlation is not causation." Can you think of two variables that are correlated but not causally related? What other factors might explain their association?

The normal distribution describes many real phenomena. Why is this distribution so common? What fundamental principles in nature lead different independent random processes to produce bell-shaped patterns?

🃏 Flashcards — Quick Recall

Range

Range = Maximum value − Minimum value. The simplest measure of dispersion; gives a quick spread but ignores how data is distributed.

Mean of ungrouped data

x̄ = (∑ xᵢ) / n. Add all observations and divide by the number of observations.

Mean of frequency distribution

x̄ = (∑ fᵢxᵢ) / N, where N = ∑ fᵢ is the total frequency.

Mean deviation about a

M.D.(a) = (1/n) ∑ |xᵢ − a|. Absolute values are taken so positive and negative deviations don't cancel.

Why ∑(xᵢ − x̄) is not used

The sum of deviations from the mean is always zero, so the average signed deviation gives no information about spread; we use absolute values or squared values instead.

Variance σ²

σ² = (1/N) ∑ fᵢ(xᵢ − x̄)². The mean of squared deviations from the mean. For ungrouped data, replace fᵢ by 1 and use n instead of N.

Standard deviation σ

σ = √(variance), the positive square root. Reported in the same units as the original observations, making it more interpretable than variance.

Effect of adding a constant

Adding (or subtracting) the same constant to every observation leaves the variance and standard deviation unchanged, because deviations from the new mean are identical.

Effect of multiplying by a constant

If each observation is multiplied by k, the new variance becomes k² times the original variance, and the new standard deviation becomes |k| times the original.

Coefficient of variation

C.V. = (σ / x̄) × 100. A unit-free measure used to compare variability of two distributions with different means or scales; a smaller C.V. means more consistent data.
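The two flashcards on adding and multiplying by a constant can be checked numerically. The sketch below uses `statistics.pvariance` and `statistics.pstdev`, which divide by n as the lesson's formulas do. The shift of 10 and the factor of -3 are arbitrary values picked for this illustration.

```python
from statistics import pstdev, pvariance

data = [2, 4, 6, 8]
shift, k = 10, -3  # arbitrary illustrative constants

shifted = [x + shift for x in data]
scaled = [k * x for x in data]

var_original = pvariance(data)    # 5
var_shifted = pvariance(shifted)  # unchanged by the shift
var_scaled = pvariance(scaled)    # k**2 times the original

print(var_original, var_shifted, var_scaled)
print(pstdev(data), pstdev(scaled))  # second is |k| times the first
```

A negative k is used on purpose: the variance picks up a factor of k² = 9, and the standard deviation a factor of |k| = 3, so neither can become negative.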

📝 Quick Quiz — Test Yourself

Batsman A scored 0, 117, 30, 91, 64, 42, 80, 30, 5, 71. The range of his scores is:

A 117
B 53
C 87
D 14

For the data 2, 4, 6, 8 the variance is:

A √5
B 5
C 4
D 20

The variance of 20 observations is 5. If each observation is multiplied by 2, the new variance is:

A 20
B 10
C 5
D 25

Each observation in a data set is increased by 7. The variance of the new set, compared with the original, is:

A Increased by 7
B Increased by 49
C Multiplied by 7
D Unchanged

The mean and standard deviation of 5 observations 1, 2, 6, x, y are 4.4 and √8.24 respectively. Then x and y are:

A 5 and 8
B 6 and 7
C 4 and 9
D 3 and 10