Box plot

In descriptive statistics, a box plot (also known as a box-and-whisker diagram) is a convenient way of graphically depicting the five-number summary, which consists of the smallest observation, lower quartile, median, upper quartile and largest observation.

The box plot may also identify outliers and possibly the mean.
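As a rough illustration, the five-number summary and the mean can be computed with Python's standard library. This is only a sketch: the sample values are hypothetical, and the choice of `statistics.quantiles` with `method="inclusive"` is an assumption, since several quartile conventions exist and they can give slightly different Q1 and Q3.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a sequence of at least two numbers."""
    ordered = sorted(data)
    # Assumption: linear-interpolation quartiles; other conventions exist.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical sample, chosen only for illustration.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(five_number_summary(sample))  # (1, 3.0, 5.0, 7.0, 9)
print(statistics.mean(sample))      # the mean, which a box plot may also mark
```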

A plain-text version of a box plot might look like this:

```
                            +-----+-+
  *           o     --------|   + | |----
                            +-----+-+
+---+---+---+---+---+---+---+---+---+---+   number line
0   1   2   3   4   5   6   7   8   9  10
```

For this data set (values are approximate, based on the figure):

• smallest observation (minimum or min) = 0.5
• lower (first) quartile (Q1) = 7
• median (second quartile) (Med) = 8.5
• upper (third) quartile (Q3) = 9
• largest observation (maximum or max) = 10
• mean = 8
• interquartile range, IQR = Q3 - Q1 = 2
• the value 3.5 is a "mild" outlier, between 1.5*(IQR) and 3*(IQR) below Q1
• the value 0.5 is an "extreme" outlier, more than 3*(IQR) below Q1
• the smallest value that is not an outlier is 5
• the data is skewed to the left (negatively skewed)

The horizontal lines (the "whiskers") extend from each end of the box to at most 1.5 times the box width (the interquartile range). Each whisker must end at an observed value, so it connects all the values outside the box that lie no more than 1.5 times the box width from the box. Three times the box width marks the boundary between "mild" and "extreme" outliers.
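The whisker rule and the outlier labels can be sketched in a few lines of Python. The function names are hypothetical, and the list holds only the observations named in the example above, not the full data set, which the article does not give. The multiplier for the extreme boundary is left as a parameter (`extreme_k`, defaulting here to 3, the outer fence in Tukey's convention).

```python
def fences(q1, q3, k):
    """Return (lower, upper) cut-offs lying k box widths (IQRs) beyond the box."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def whisker_ends(values, q1, q3, k=1.5):
    """Whiskers stop at the most extreme observed values inside the k*IQR fences.

    Assumes at least one observation lies inside the fences.
    """
    lo, hi = fences(q1, q3, k)
    inside = [v for v in values if lo <= v <= hi]
    return min(inside), max(inside)


def classify(value, q1, q3, mild_k=1.5, extreme_k=3.0):
    """Label a value as 'extreme', 'mild' or 'not an outlier'."""
    lo_mild, hi_mild = fences(q1, q3, mild_k)
    lo_ext, hi_ext = fences(q1, q3, extreme_k)
    if value < lo_ext or value > hi_ext:
        return "extreme"
    if value < lo_mild or value > hi_mild:
        return "mild"
    return "not an outlier"


# Quartiles and observations quoted in the example above (not the full data set).
q1, q3 = 7, 9
named_values = [0.5, 3.5, 5, 10]

print(whisker_ends(named_values, q1, q3))  # (5, 10)
for v in named_values:
    print(v, classify(v, q1, q3))
```

With these inputs the lower whisker ends at 5 and the upper at 10, while 3.5 is labelled mild and 0.5 extreme, matching the list above.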

Various software packages implement this detail of the box plot differently, for example by extending the whiskers to at most the 5th and 95th (or some more extreme) percentiles. Such approaches do not conform to Tukey's definition, with its emphasis on the median in particular and counting methods in general, and they tend to produce "outliers" for all data sets larger than ten, no matter what the shape of the distribution.
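A small experiment illustrates the difference. The sketch below uses hypothetical, evenly spread data and counts the points that fall outside percentile-based whiskers and outside 1.5*(IQR) whiskers. The quartile and percentile method (`statistics.quantiles` with `method="inclusive"`) is an assumption standing in for whatever a particular package does; Tukey's own construction is based on counting rather than interpolation.

```python
import statistics


def outside_percentile_whiskers(data, low_pct=5, high_pct=95):
    """Values beyond whiskers drawn at the given percentiles."""
    cuts = statistics.quantiles(data, n=100, method="inclusive")  # 99 cut points
    lo, hi = cuts[low_pct - 1], cuts[high_pct - 1]
    return [x for x in data if x < lo or x > hi]


def outside_iqr_whiskers(data, k=1.5):
    """Values more than k box widths (IQRs) beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]


# Hypothetical data with no unusual values: 1, 2, ..., 100.
data = list(range(1, 101))

print(len(outside_percentile_whiskers(data)))  # 10
print(len(outside_iqr_whiskers(data)))         # 0
```

The percentile rule marks about a tenth of the points as lying beyond the whiskers by construction, whereas the 1.5*(IQR) rule marks none for this evenly spread sample.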