
Why are the boxplots so different?

boxplot(loan.part.value ~ platform, p2p)

[Image: plot produced by the base R boxplot() call above]

ggplot(p2p, aes(loan.part.value, platform)) + geom_boxplot()

[Image: plot produced by the ggplot() call above]

(I redacted the tick labels.)

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_0.9.3.1
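
One difference between the two calls is visible without the data: the formula in boxplot() reads as response ~ group, so loan.part.value is on the y axis, while aes() takes x first and y second, so the ggplot call puts loan.part.value on x and platform on y. Below is a minimal sketch of the swapped mapping. The data frame p2p_demo and its contents are made up for illustration, since the real p2p data is not shown.

```r
library(ggplot2)

# Made-up stand-in for the p2p data frame (assumption: platform is a
# grouping variable and loan.part.value is numeric)
set.seed(1)
p2p_demo <- data.frame(
  platform = rep(c("a", "b", "c"), each = 50),
  loan.part.value = c(rlnorm(50, 3), rlnorm(50, 4), rlnorm(50, 5))
)

# aes(x, y): grouping variable first, numeric variable second,
# which matches boxplot(loan.part.value ~ platform, ...)
p <- ggplot(p2p_demo, aes(platform, loan.part.value)) + geom_boxplot()
```

Printing p should then give one box per platform, as in the base graphics version.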

I am trying to use the group= option in geom_boxplot. It works for one grouping function but not for another. The first plot runs, but the 2nd and 3rd plots (really the same, called differently) both fail to produce two-month boxplots for dates before 2017 and one-month boxplots for 2017, as the grouper intends. For the grouper function, ggplot issues "Warning message: position_dodge requires non-overlapping x intervals", but the x value is the same across graphs. This is clearly related to my groupdates function, but the groups appear to be constructed properly. Suggestions welcome. With thanks.

library(tidyverse)
library(lubridate)
# I want two month groups before 2017, and one-month groups in 2017

groupdates <- function(date) {
  month_candidate <-case_when(
    year(date) < 2017 ~ paste0(year(date), "-", (floor(((0:11)/12)*6)*2)+1),
    TRUE ~ paste0(year(date), "_", month(date))
  )
  month_candidate2 <-case_when(
    (str_length(month_candidate)==6) ~ paste0(str_sub(month_candidate,1,5), "0", str_sub(month_candidate,6)),
    TRUE ~ month_candidate
  )
  return(month_candidate2)
}

generate_fake_date_time <- function(N, st="2015/01/02", et="2017/02/28") {
  st <- as.POSIXct(as.Date(st))
  et <- as.POSIXct(as.Date(et))
  dt <- as.numeric(difftime(et, st, units="secs"))
  ev <- sort(runif(N, 0, dt))
  st + ev
}

n=5000
set.seed(250)
test <-as.data.frame(generate_fake_date_time(n))
colnames(test) <- "posixctdate"
# nrow(test) gives one random offset per row; length(test) would count columns
test$ranvalue <- month(test$posixctdate)+runif(nrow(test), 0,1)
test$grouped_time <-groupdates(test$posixctdate)
table(test$grouped_time)

ggplot(test)+geom_boxplot(aes(x=posixctdate, y=ranvalue, group=paste0(year(posixctdate), "_", month(posixctdate))))
#ggplot(test)+geom_violin(aes(x=posixctdate, y=ranvalue, group=junk))
ggplot(test)+geom_boxplot(aes(x=posixctdate, y=ranvalue, group=grouped_time))
ggplot(test)+geom_boxplot(aes(x=posixctdate, y=ranvalue, group=groupdates(posixctdate)))
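
One thing that stands out in groupdates is that the pre-2017 label is built from the constant vector 0:11 rather than from month(date). paste0 recycles that vector along the rows, so the two-month label depends on row position and not on the date. Every label still shows up in table(), which may be why the groups look correct. The sketch below is one possible correction, not a confirmed fix for the warning. The name groupdates_fixed and the single "_" separator are my own choices.

```r
library(lubridate)

# Hypothetical replacement for groupdates(): the bin is derived from each
# row's own month instead of from the recycled vector 0:11
groupdates_fixed <- function(date) {
  m <- month(date)
  # Months 1-2 -> 1, 3-4 -> 3, ..., 11-12 -> 11
  two_month_start <- ((m - 1) %/% 2) * 2 + 1
  # Two-month bins before 2017, single months from 2017 onwards
  bin_start <- ifelse(year(date) < 2017, two_month_start, m)
  # Zero-padded label such as "2015_01"
  sprintf("%d_%02d", as.integer(year(date)), as.integer(bin_start))
}

d <- as.POSIXct(c("2015-01-15", "2015-02-15", "2016-12-01", "2017-02-10"), tz = "UTC")
groupdates_fixed(d)
# "2015_01" "2015_01" "2016_11" "2017_02"
```

With labels like these, every group covers a contiguous date range, so boxes from different groups should no longer overlap on the x axis.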

    sessionInfo()


I have a dataset that is something like this:

A, B, C, D, E
1, 2, 3, 4, 5
2, 3, 4, 5, 6
3, 4, 5, 6, 7
...

I want to make a ggplot2 graph in R that is a boxplot where each box corresponds to a letter. According to what I researched, the way to do this is to have a dataset like this:

Letter, Value
A, 1
A, 2
A, 3
...
B, 2
B, 3
B, 4
...
...
E, 5
E, 6
E, 7

The original dataset doesn't actually need to have the same number of elements for each box. Is there any way to (a) make the boxplot from the original dataset without changing it, using ggplot2 (not R's built-in boxplot)? This is what I really want. If there is no way, then (b) how can I transform one dataset into the other with R?

Thank you!

P.S. (I don't know if I can ask this in the same question; if not, I am very sorry.) If you know any good tutorial for beginners in ggplot2 that actually teaches how to use ggplot instead of qplot, I would appreciate it a lot. Thank you again!
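
For option (b), a minimal sketch using stack() from base R, which turns each column into (value, column name) pairs. As far as I know, geom_boxplot wants one column for the grouping and one for the values, so reshaping is the usual route. The data frame wide below is illustrative and only mirrors the layout in the question.

```r
library(ggplot2)

# Illustrative wide data with one column per letter
wide <- data.frame(A = 1:3, B = 2:4, C = 3:5, D = 4:6, E = 5:7)

# stack() returns two columns: `values` and `ind` (the source column name)
long <- stack(wide)
names(long) <- c("Value", "Letter")

# One box per letter
p <- ggplot(long, aes(x = Letter, y = Value)) + geom_boxplot()
```

If the columns have different numbers of observations, the shorter ones would be padded with NA in the wide data frame, and those rows can be dropped from long before plotting.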

I want to produce a grouped boxplot, so first I modified a piece of code I found on the internet (http://www.r-bloggers.com/ggplot2-multiple-boxplots-with-metadata/) to generate a dataframe of test values:

Y <- data.frame(
  values = c(rnorm(mean=20, sd=4, n=3), rnorm(mean=10, sd=2, n=3), rnorm(mean=50, sd=10, n=3), rnorm(mean=60, sd=12, n=3)),
  factor1 = rep(c('oil1', 'oil2'), each = 3),
  factor2 = rep(c('product1', 'product2'), each = 6)
)

      values factor1  factor2
1  13.527314    oil1 product1
2  23.495898    oil1 product1
3  14.881210    oil1 product1
4   9.110103    oil2 product1
5   9.330372    oil2 product1
6  10.846560    oil2 product1
7  40.786020    oil1 product2
8  43.157393    oil1 product2
9  43.050182    oil1 product2
10 39.588651    oil2 product2
11 65.963630    oil2 product2
12 63.425253    oil2 product2

Then, the code:

ggplot(Y, aes(x = factor2, y = values, fill = factor1)) +
  geom_boxplot()

produces the boxplot I want.

My real data are in this other data frame, created reading a .csv file:

   values factor1  factor2
1     0.2    oil1 product1
2     1.7    oil1 product1
3     3.2    oil1 product1
4    27.8    oil2 product1
5    29.8    oil2 product1
6    31.8    oil2 product1
7       0    oil1 product2
8       1    oil1 product2
9     2.5    oil1 product2
10   29.3    oil2 product2
11   31.3    oil2 product2
12   33.3    oil2 product2

Yet when I try to create a boxplot from these data using the code above, the plot contains horizontal segments at y = value instead of boxes.

How can I resolve this problem?
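
One possible cause, which is an assumption since the .csv import is not shown, is that the values column was read as a factor or character vector instead of numbers. ggplot2 then treats every distinct value as its own group, and a box built from a single observation collapses to a horizontal line. The sketch below recreates that situation with a stand-in data frame called real (a hypothetical name) and converts the column back to numeric.

```r
library(ggplot2)

# Hypothetical stand-in for the data read from the .csv file, with
# `values` deliberately stored as a factor to mimic the symptom
real <- data.frame(
  values  = c("0.2", "1.7", "3.2", "27.8", "29.8", "31.8",
              "0", "1", "2.5", "29.3", "31.3", "33.3"),
  factor1 = rep(rep(c("oil1", "oil2"), each = 3), times = 2),
  factor2 = rep(c("product1", "product2"), each = 6),
  stringsAsFactors = TRUE
)

# str() shows the column types; `values` should be num, not Factor or chr
str(real)

# Go through as.character() so factor labels, not level codes, are converted
real$values <- as.numeric(as.character(real$values))

p <- ggplot(real, aes(x = factor2, y = values, fill = factor1)) +
  geom_boxplot()
```

If str() already reports values as num in the real data, this is not the cause and the grouping would need a closer look.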
