April, 2018
Packages used: plotrix, vioplot, vcd
* Examples of categorical variables: gender, supported candidate, blood type, etc.
    + Gender is a categorical variable with two levels, male and female.
    + Blood type has four levels: A, B, AB and O.
* Summarization of categorical data: frequency and relative frequency (cf. What tools can we use to summarize continuous data?)
* Visualization of categorical data:
    + Bar plot
    + Pie chart
    + Other useful tools?
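As a minimal sketch of the two summaries named above, frequency and relative frequency can be computed with `table()` and `prop.table()`. The built-in `state.region` factor serves only as sample data here, and the names `freq` and `rel.freq` are illustrative.

```r
# frequency: number of observations in each level
freq <- table(state.region)

# relative frequency: each count divided by the total count
rel.freq <- prop.table(freq)

print(freq)
print(round(rel.freq, 2))
```

The relative frequencies always sum to 1, which makes them easier to compare across datasets of different sizes.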
The built-in `state.region` data is a factor giving the region of each US state.

    counts = table(state.region)
    counts
    barplot(counts, main = "simple bar chart", xlab = "region", ylab = "freq")

    ## state.region
    ##     Northeast         South North Central          West
    ##             9            16            12            13

The `table()` function computes the frequency of each level in `state.region`.
The `barplot()` function draws a bar plot from the `counts` variable.
The `xlab` and `ylab` options set the axis labels.
The `mtcars` dataset has both continuous and categorical variables.
Draw a bar plot of the `cyl` variable in the `mtcars` dataset.

    freq.cyl = table(mtcars$cyl)
    barplot(freq.cyl, main = "simple bar chart", col = "orange")
The `names.arg` option takes a vector of level names and prints them under the bars. Because the length of `freq.cyl` is 3, we supply a name vector of length 3.

    cyl.name = c("4 cyl", "6 cyl", "8 cyl")
    barplot(freq.cyl, main = "simple bar chart", col = "orange", names.arg = cyl.name)
The `pie()` function draws a pie chart of categorical data. Its input is the result of the `table()` function. By default `pie()` labels each slice with the level name only, so the `labels` option is often used to add percentages, as in an Excel chart.

    # percentage of each level, rounded to a whole number
    pct = round(100 * freq.cyl / sum(freq.cyl))
    cyl.name2 = paste0(cyl.name, " (", pct, "%)")
    pie(freq.cyl, labels = cyl.name2, col = rainbow(length(freq.cyl)), main = "pie chart")
The `paste0()` function concatenates its arguments element by element and returns a character vector. Check the output of `paste0()` and compare it with the related function `paste()`.
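A short sketch of the comparison suggested above: `paste0()` joins its arguments with no separator, while `paste()` inserts a space unless `sep` says otherwise. The vectors `name` and `n` are illustrative; `n` holds the counts of `cyl` in `mtcars`.

```r
name <- c("4 cyl", "6 cyl", "8 cyl")
n <- as.vector(table(mtcars$cyl))    # counts of each cyl level

a <- paste0(name, "(", n, ")")       # no separator between the pieces
b <- paste(name, n)                  # default separator is a single space
d <- paste(name, n, sep = ": ")      # custom separator

print(a)
print(b)
print(d)
```

Both functions are vectorized, so the three-element inputs give three-element outputs.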
`pie3D()` from the plotrix package draws a 3-dimensional pie chart.

    library(plotrix)
    pie3D(freq.cyl, labels = cyl.name2, explode = 0.1, main = "3d pie plot")

The `explode` option controls the gaps between the slices of the pie.

`fan.plot()` draws a fan plot, an alternative to the pie chart. Relative frequencies of levels are easier to compare in a fan plot than in a pie chart.

    fan.plot(freq.cyl, labels = cyl.name2, main = "Fan plot")
Functions used: `xtabs()`, `barplot()`, `spine()`
    library(vcd)
    head(Arthritis, n = 3)

    ##   ID Treatment  Sex Age Improved
    ## 1 57   Treated Male  27     Some
    ## 2 46   Treated Male  29     None
    ## 3 77   Treated Male  30     None

    my.table <- xtabs(~ Treatment + Improved, data = Arthritis)
    my.table

    ##          Improved
    ## Treatment None Some Marked
    ##   Placebo   29    7      7
    ##   Treated   13    7     21

In the formula `~ Treatment + Improved`, `Treatment` is the row variable and `Improved` is the column variable. `xtabs()` produces a cross table of categorical variables.

    barplot(my.table, xlab = "Improved", ylab = "Frequency",
            legend.text = TRUE, col = c("green", "red"))
* Better display? Swap the rows and columns of the cross table with `t(my.table)`. The bars then correspond to `Treatment`, so `col=` needs three colors, one for each level of `Improved`.

    barplot(t(my.table), xlab = "Treatment", ylab = "Frequency",
            legend.text = TRUE, col = c("green", "red", "orange"))

    t(my.table)

    ##         Treatment
    ## Improved Placebo Treated
    ##   None        29      13
    ##   Some         7       7
    ##   Marked       7      21
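The bar charts above show counts, which are hard to compare when the groups differ in size (43 placebo and 41 treated patients). A possible complement, sketched here, is to plot row proportions instead. To keep the block self-contained, the counts are typed in from the cross table shown above; the names `counts` and `row.prop` are illustrative.

```r
# cross table of Treatment by Improved, copied from the xtabs() output
counts <- matrix(c(29, 13, 7, 7, 7, 21), nrow = 2,
                 dimnames = list(Treatment = c("Placebo", "Treated"),
                                 Improved = c("None", "Some", "Marked")))

# margin = 1 divides every row by its row total
row.prop <- prop.table(counts, margin = 1)
print(round(row.prop, 2))

# side-by-side bars of the proportions within each treatment group
barplot(t(row.prop), beside = TRUE, legend.text = TRUE,
        xlab = "Treatment", ylab = "Proportion",
        col = c("green", "red", "orange"))
```

Each row of `row.prop` sums to 1, so the bars of the two groups are directly comparable.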
    tmp = c("buckled", "unbuckled")
    belt <- matrix(c(58, 2, 8, 16), ncol = 2,
                   dimnames = list(parent = tmp, child = tmp))
    belt

    ##            child
    ## parent      buckled unbuckled
    ##   buckled        58         8
    ##   unbuckled       2        16

`spine()`: this function draws a spine plot, a kind of mosaic plot. The width of each bar represents the marginal probability of the row variable, and the split within each bar represents the conditional probability of the column variable.

    library(vcd)
    spine(belt, main = "spine plot for child seat-belt usage",
          gp = gpar(fill = c("green", "red")))
    x = rnorm(100)
    boxplot(x, main = "boxplot", col = 'lightblue')

The ends of the upper and lower whiskers are
\[ \max \{ x_i: x_i \leq Q3 + 1.5\times(Q3-Q1) \}\]
and
\[ \min \{ x_i: x_i \geq Q1 - 1.5\times(Q3-Q1) \},\]
respectively, where Q1 and Q3 are the first and third quartiles.
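The whisker rule can be checked numerically. The sketch below assumes the usual 1.5 times the interquartile range rule, with the hinges from `fivenum()` standing in for Q1 and Q3 (this is what `boxplot.stats()` uses). `faithful$waiting` is used as sample data so that the result is reproducible.

```r
x <- faithful$waiting

h <- fivenum(x)      # minimum, lower hinge, median, upper hinge, maximum
q1 <- h[2]
q3 <- h[4]
iqr <- q3 - q1

# largest observation not above Q3 + 1.5 * IQR
upper <- max(x[x <= q3 + 1.5 * iqr])
# smallest observation not below Q1 - 1.5 * IQR
lower <- min(x[x >= q1 - 1.5 * iqr])

print(c(lower, upper))
# whisker ends as computed by R itself
print(boxplot.stats(x)$stats[c(1, 5)])
```

Observations beyond the whiskers are the points that `boxplot()` draws individually as outliers.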
    x = faithful$waiting
    hist(faithful$waiting, nclass = 8)

The `hist()` function produces a histogram.
The `nclass` option sets the approximate number of bins.
The bin boundaries can be given explicitly with the `breaks` option.
Densities are drawn instead of counts with the `probability = T` option.

    x = faithful$waiting
    hist(faithful$waiting, breaks = seq(min(x), max(x), length = 10), probability = T)
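To see what `breaks` and `probability = T` do to the numbers behind the picture, the histogram can be computed without drawing it. This is a small sketch using `plot = FALSE`; the names `h` and `area` are illustrative.

```r
x <- faithful$waiting

# ten equally spaced boundaries give nine bins
h <- hist(x, breaks = seq(min(x), max(x), length.out = 10), plot = FALSE)

print(h$counts)     # number of observations in each bin
print(h$density)    # heights drawn when probability = T

# on the density scale the bars have total area 1
area <- sum(h$density * diff(h$breaks))
print(area)
```

Because the total area is 1, a histogram on the density scale can share its y-axis with a density curve.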
The `density()` function returns a kernel density estimate.

    x = faithful$waiting
    hist(faithful$waiting, nclass = 10, probability = T)
    lines(density(x), col = "red", lwd = 2)
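As a rough illustration of what the kernel method computes, the sketch below evaluates a Gaussian kernel estimate by hand at a single point and compares it with the curve returned by `density()`. It relies on the defaults of `density()` (Gaussian kernel, bandwidth from `bw.nrd0()`); the evaluation point `x0 = 70` is an arbitrary choice.

```r
x <- faithful$waiting
d <- density(x)
bw <- d$bw                      # bandwidth chosen by density()

# kernel estimate at x0: average of normal bumps centred at the data points
x0 <- 70
f0 <- mean(dnorm((x0 - x) / bw)) / bw
print(f0)

# the same point read off the density() grid by linear interpolation
f.grid <- approx(d$x, d$y, xout = x0)$y
print(f.grid)
```

The two values agree only approximately, because `density()` evaluates the estimate on a grid with a fast approximation.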
    library(vioplot)
    x = rpois(1000, lambda = 3)
    vioplot(x, col = "lightblue")
With `boxplot()` we can compare summaries of a continuous variable across the levels of a categorical variable.
The formula `mpg~cyl` means that `mpg` goes on the y-axis and `cyl` on the x-axis.

    attach(mtcars)
    boxplot(mpg~cyl, data = mtcars, names = c('4 cyl', '6 cyl', '8 cyl'),
            main = "MPG dist by cylinder")
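The grouping that the formula `mpg~cyl` performs can also be done numerically. As a small sketch, `tapply()` splits `mpg` by the levels of `cyl` and applies a summary to each group; the medians it returns are the values marked by the thick line in each box.

```r
# median mpg within each cylinder group
med <- tapply(mtcars$mpg, mtcars$cyl, median)
print(med)

# group sizes, useful for judging how reliable each box is
n <- table(mtcars$cyl)
print(n)
```

The columns are referenced as `mtcars$mpg` and `mtcars$cyl` so that the block runs without `attach(mtcars)`.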
    hist(mpg[cyl==4], xlab = "MPG", main = "MPG dist by cylinder",
         xlim = c(5, 40), ylim = c(0, 10), col = 'lightblue',
         nclass = trunc(sqrt(length(mpg[cyl==4]))))
    hist(mpg[cyl==6], xlab = "MPG", main = "MPG dist by cylinder",
         xlim = c(5, 40), ylim = c(0, 10), col = 'orange',
         nclass = trunc(sqrt(length(mpg[cyl==6]))), add = TRUE)
    hist(mpg[cyl==8], xlab = "MPG", main = "MPG dist by cylinder",
         xlim = c(5, 40), ylim = c(0, 10), col = 'red',
         nclass = trunc(sqrt(length(mpg[cyl==8]))), add = TRUE)

Example of poor visualization: the overlaid histograms hide one another.
Use a vertical layout with the `mfrow` option of `par()`.
Use the same `xlim` in every panel for a fair comparison of locations.

    par(mfrow = c(3, 1))
    hist(mpg[cyl==4], xlab = "MPG", main = "MPG dist by cylinder",
         xlim = c(5, 40), ylim = c(0, 10), col = 'lightblue',
         nclass = trunc(sqrt(length(mpg[cyl==4]))))
    hist(mpg[cyl==6], xlab = "MPG", main = "MPG dist by cylinder",
         xlim = c(5, 40), ylim = c(0, 10), col = 'orange',
         nclass = trunc(sqrt(length(mpg[cyl==6]))))
    hist(mpg[cyl==8], xlab = "MPG", main = "MPG dist by cylinder",
         xlim = c(5, 40), ylim = c(0, 10), col = 'red',
         nclass = trunc(sqrt(length(mpg[cyl==8]))))
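The three nearly identical `hist()` calls can be written as a loop. The sketch below is one possible variant, not the original code: it also shares the bin boundaries across panels (the vector `brk` is an assumption chosen to cover the range of `mpg`) and restores the plotting layout afterwards.

```r
old.par <- par(mfrow = c(3, 1))        # three rows, one column
brk <- seq(10, 35, by = 2.5)           # common bin boundaries (assumed)
cyls <- c(4, 6, 8)
cols <- c("lightblue", "orange", "red")

for (i in seq_along(cyls)) {
  hist(mtcars$mpg[mtcars$cyl == cyls[i]], breaks = brk,
       xlim = c(5, 40), ylim = c(0, 10), col = cols[i],
       xlab = "MPG", main = paste(cyls[i], "cylinders"))
}

par(old.par)                           # restore the previous layout
```

With common bins and a common `xlim`, differences between panels reflect the data rather than the binning.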
    plot(density(mpg[cyl==4]), xlab = "MPG", main = "MPG dist by cylinder",
         xlim = c(5, 40), ylim = c(0., 0.26))
    lines(density(mpg[cyl==6]), col = "red", lty = 2)
    lines(density(mpg[cyl==8]), col = "blue", lty = 3)
    legend("topright", paste(c(4, 6, 8), "Cylinder"),
           col = c("black", "red", "blue"), lty = c(1, 2, 3), lwd = 3, bty = "n")
    tmp = c("buckled", "unbuckled")
    belt <- matrix(c(58, 2, 8, 16), ncol = 2,
                   dimnames = list(parent = tmp, child = tmp))
    belt

    ##            child
    ## parent      buckled unbuckled
    ##   buckled        58         8
    ##   unbuckled       2        16

    barplot(t(belt), main = "Stacked Bar chart for child seat-belt usage",
            xlab = "parent", ylab = "Frequency",
            legend.text = TRUE, col = c("green", "red"))