

Course 1, Unit 2 - Patterns in Data
Overview
Patterns in Data is an introduction to the analysis of univariate
(one variable) data. Throughout this unit students will be developing
tools and strategies that will help them make sense of data and communicate
their conclusions. The focus is on displaying data (to observe shape,
center, and variability/spread) and then computing and interpreting summary
statistics such as measures of center (mean, median, and mode) and measures
of variability (range, interquartile range, and standard deviation).
Key Ideas from Course 1, Unit 2

Dot plot (or number line plot): A way of organizing one-variable
data. Dot plots are particularly useful when the data set is small and/or
spread out. Shown below is the dot plot for the lengths of 100 male bears. (See
page 76.)

Histogram: A way of organizing one-variable data. For example,
in the histogram below of test scores, 3 students have a score
of at least 10 but less than 20, 7 students have a score of at
least 20 but less than 30, and so on. (See page 79.)
Relative frequency histogram: In this type of histogram, the vertical axis shows the proportion or percentage of the data values that fall into each bar, rather than the frequency or count. This plot is particularly useful if the sample is very large. (See page 79.)
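
As a rough illustration of the difference between frequency and relative frequency, the sketch below converts bar counts into proportions. The function name relative_frequencies and the list of counts are hypothetical; only the first two counts (3 and 7) come from the test-score example above, and the rest are made up for illustration.

```python
def relative_frequencies(counts):
    """Convert the count in each bar to a proportion of all data values."""
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one data value")
    return [count / total for count in counts]


# Hypothetical counts for five score intervals. Only 3 and 7 appear in the
# text; the remaining counts are invented for this sketch.
counts = [3, 7, 5, 4, 1]
proportions = relative_frequencies(counts)

for count, proportion in zip(counts, proportions):
    print(f"count = {count}, relative frequency = {proportion:.2f}")
```

The proportions always add up to 1 (or 100%), which is why this display makes it easier to compare samples of different sizes.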

Shape of the distribution: Distributions of one-variable
data can be symmetric or skewed. (See page 77.)

Center: We can use the mean, median, or mode as the measure of
center, depending on which is most appropriate. Mean = (sum
of the data values) / (number of data values). Median = middle
data value in the ordered list (or the mean of the two middle
values when the number of values is even). Mode = most frequently
occurring data value. (See pages 84 and 94.)
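
The three measures of center can be computed in a few lines. This is a minimal sketch using Python's standard statistics module; the data list is hypothetical and is not taken from the unit.

```python
from statistics import median, multimode

# Hypothetical data set, chosen only to illustrate the three definitions.
data = [2, 3, 3, 5, 7]

# Mean = (sum of the data values) / (number of data values)
mean_value = sum(data) / len(data)

# Median = middle value of the ordered list
median_value = median(data)

# Mode = most frequently occurring value(s); multimode returns a list
# because a data set can have more than one mode.
modes = multimode(data)

print(mean_value, median_value, modes)
```

Here the mean (4.0) is pulled above the median (3) by the larger values 5 and 7, a small example of how the mean responds to values far from the center.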

Percentiles: Percentiles are often used to measure the position of a data value in the distribution. Percentiles are typically used only when there is a very large or infinite number of possible values, such as with heights. So, for example, look at the growth chart for girls on page 105. For this chart, you can see that a 15-year-old girl who weighs about 105 lbs would be at the 25th percentile. This means that about 75% of the girls her age weigh more than 105 lbs. (See pages 103-105.)
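
One simple way to think about a percentile is as the percentage of data values that fall below a given value. The sketch below uses that definition; other definitions exist and can give slightly different answers. The function name percentile_rank and the list of weights are hypothetical and are not read from the growth chart.

```python
def percentile_rank(values, x):
    """Percentage of the data values that are strictly less than x."""
    if not values:
        raise ValueError("need at least one data value")
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


# Hypothetical weights (lbs) for eight girls of the same age.
weights = [90, 100, 110, 115, 120, 125, 130, 140]

rank = percentile_rank(weights, 105)
percent_heavier = 100 * sum(1 for w in weights if w > 105) / len(weights)
print(rank, percent_heavier)
```

With these made-up weights, 105 lbs sits at the 25th percentile and 75% of the values are larger, which mirrors the interpretation given above.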
Five-number summary (minimum, 1st quartile, median, 3rd quartile,
maximum): Using our example, we can determine the five-number
summary as follows. Put the values in order and count to the middle;
this is the median. The median is 31. Count to the middle of the
first (lower) 50% of the data; this data value is the first quartile,
Q1. Q1 is 23. Count to the middle of the second (upper) 50% of the
data; this data value is the third quartile, Q3. Q3 is 40. (See page 108.)
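
The counting procedure above can be sketched in code. This version assumes that, when the number of values is odd, the median itself is left out of both halves; conventions differ on this point. The function name five_number_summary is hypothetical, and the data list is not the original data set: it was made up so that the median and quartiles match the values quoted above (31, 23, and 40).

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) by the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower = ordered[:half]
    # Assumption: with an odd count, the middle value is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


# Hypothetical data, constructed to reproduce the quartiles in the text.
data = [18, 23, 27, 31, 35, 40, 44]
summary = five_number_summary(data)
print(summary)
```

Statistical software often uses interpolation rules for quartiles, so its output may differ slightly from this hand-counting method on the same data.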

Box plot: Use the five-number summary to make a box plot.
You need a scale on the horizontal axis to make sense of the graph.
The box contains the middle 50% of the data values, starting at the
first quartile and ending at the third quartile. Interquartile range =
Q3 - Q1 = 40 - 23 = 17. (See pages 108-111.)
Outliers: Data values that are far from, and separated from,
the rest of the distribution. If the data are represented by a box
plot, then any value that is more than 1.5 times the interquartile range
(see above) above Q3 or below Q1 will be represented as a dot, separated
from the other data. (See pages 113-116.)
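
The 1.5 times IQR rule can be checked with a short calculation. The sketch below uses Q1 = 23 and Q3 = 40 from the example above; the function names outlier_fences and find_outliers and the list of test values are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs from the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values lying below the lower cutoff or above the upper one."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]


# Quartiles from the example in the text: Q1 = 23, Q3 = 40, so IQR = 17.
low, high = outlier_fences(23, 40)

# Hypothetical values to test against the cutoffs.
flagged = find_outliers([18, 31, 44, 70], 23, 40)
print(low, high, flagged)
```

With an IQR of 17, the cutoffs are 23 - 25.5 = -2.5 and 40 + 25.5 = 65.5, so only the made-up value 70 would be drawn as a separate dot.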

Spread of a distribution: Spread (or variability) could be
measured by the range, by the IQR (interquartile range, see above or
page 108), or by the standard deviation (see pages 116-124).
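
To compare two of these measures side by side, the sketch below computes the range and the standard deviation using Python's standard statistics module. The data list is hypothetical. Because the text here does not say whether the standard deviation divides by n or by n - 1, both versions are shown; which one the unit uses is an assumption to check against the textbook.

```python
from statistics import pstdev, stdev

# Hypothetical data set whose mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range = maximum - minimum
data_range = max(data) - min(data)

# Standard deviation dividing by n (treats the data as the whole population)
population_sd = pstdev(data)

# Standard deviation dividing by n - 1 (treats the data as a sample)
sample_sd = stdev(data)

print(data_range, population_sd, round(sample_sd, 3))
```

The range depends only on the two extreme values, while the standard deviation uses every value's distance from the mean, so a single unusual value affects the two measures differently.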
