I have 5 columns of numerical data (Equipment, Hygiene.items, etc.) and 1 column of categorical data (A or D). I'd like to make a grouped boxplot of the numerical data, grouped by category, but I cannot find a way:

 head(sc)
  Equipment Hygiene.items Patient Near.bed Far.bed Care
1         0             0       1        5       1    D
2         1             4       1        2       0    D
3         3             1       1        2       0    D
4         0             2       2        3       1    A
5         1             2       1        5       2    A
6         1             2       1        1       1    A

boxplot(sc~sc$Care) would seem like the most appropriate way, right? I like ggplot2, but it doesn't look like I can simply do this:

ggplot(sc, aes(y=sc)) + 
  geom_boxplot(aes(fill=Care))

EDIT: What I like the look of:

I think what I'm after is something like this one I made in Matlab (a long time ago):

[image: grouped boxplot made in Matlab]

Or the 4th graph on here: Plotly

[image: grouped boxplot example from Plotly]

What I have so far:

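library(ggplot2)
library(RColorBrewer)

# melt_A is the data in long format, with columns Care, variable and value
ggplot(melt_A, aes(x = Care, y = value, fill = Care)) +
  geom_boxplot() +                      # ylim is not a geom_boxplot argument; limits are set on the scale below
  facet_grid(~variable) +
  labs(x = "Care", y = "Surface contacts", fill = "Care") +  # the legend belongs to the fill aesthetic
  scale_y_continuous(limits = c(0, 6)) +
  scale_fill_brewer(palette = "Purples") +
  theme_bw() +
  theme(strip.background = element_rect(fill = "black")) +
  theme(strip.text = element_text(color = "white", face = "bold"))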
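
The plotting code relies on melt_A, which is never shown. As a minimal sketch, assuming melt_A is simply sc reshaped to long format (probably with melt() from reshape2, though that is a guess), the same structure can be built in base R with stack(). The six rows are the head(sc) output from the question. The plot here puts the variables on the x-axis with one box per Care level instead of using facets; that layout is one possible reading of the "grouped" look, not the only one.

```r
library(ggplot2)

# The six rows printed by head(sc) in the question
sc <- data.frame(
  Equipment     = c(0, 1, 3, 0, 1, 1),
  Hygiene.items = c(0, 4, 1, 2, 2, 2),
  Patient       = c(1, 1, 1, 2, 1, 1),
  Near.bed      = c(5, 2, 2, 3, 5, 1),
  Far.bed       = c(1, 0, 0, 1, 2, 1),
  Care          = c("D", "D", "D", "A", "A", "A")
)

# Wide -> long: stack() puts all numeric columns under each other
# ("values") and records the source column name ("ind")
measure_cols <- setdiff(names(sc), "Care")
melt_A <- data.frame(
  Care = rep(sc$Care, times = length(measure_cols)),
  stack(sc[measure_cols])
)
names(melt_A) <- c("Care", "value", "variable")

# One cluster per variable, one box per Care level within the cluster
p <- ggplot(melt_A, aes(x = variable, y = value, fill = Care)) +
  geom_boxplot() +
  labs(x = NULL, y = "Surface contacts", fill = "Care")
```

Printing p draws the plot. The key point is that ggplot2 wants one row per observation, with the measured quantity in a single column.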

Question

How can I change the Care labels from D, H, Me, etc. to something else, e.g. Direct care, Housekeeping, Medication round?

Fixed:

Found the answer here: Stack

I added the following to my ggplot command:

scale_fill_brewer(palette="Purples",
  labels = c("Direct care", "Housekeeping", "Medication round", "Mealtimes", "Miscellaneous care", "Personal care"))

[image: resulting plot with the relabelled legend]
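
Passing labels to scale_fill_brewer() changes only the legend, and it depends on the labels being listed in the same order as the factor levels. A possible alternative, sketched below, is to relabel the factor itself so the axis and the legend both pick up the new names. The toy values are hypothetical, and only the three codes named in the question (D, H, Me) are used because the remaining codes are not given.

```r
library(ggplot2)

# Hypothetical long-format data using three of the codes from the question
melt_A <- data.frame(
  Care  = rep(c("D", "H", "Me"), each = 4),
  value = c(1, 2, 3, 2, 0, 1, 1, 2, 4, 5, 3, 4)
)

# Explicit code -> label lookup, so the mapping does not depend on level order
care_labels <- c(D = "Direct care", H = "Housekeeping", Me = "Medication round")
melt_A$Care <- factor(melt_A$Care,
                      levels = names(care_labels),
                      labels = unname(care_labels))

# Axis ticks and legend now both show the long names
p <- ggplot(melt_A, aes(x = Care, y = value, fill = Care)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Purples")
```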

I am creating a grouped boxplot with a scatterplot overlay using ggplot2. I would like to group each scatterplot datapoint with the grouped boxplot that it corresponds to.

However, I'd also like the scatterplot points to be different symbols. I seem to be able to get my scatterplot points to group with my grouped boxplots OR get my scatterplot points to be different symbols... but not both simultaneously. Below is some example code to illustrate what's happening:

library(scales)
library(ggplot2) 

# Generate the data frame to plot
Gene <- rep(c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"), each = 24)
Clone <- rep(c("D1", "D2", "D3", "D4", "D5", "D6"), times = 20)
variable <- rep(rep(c("Day10", "Day20", "Day30", "Day40"), each = 6), times = 5)
value <- c(rnorm(24, mean = 0.5, sd = 0.5), rnorm(24, mean = 10, sd = 8), rnorm(24, mean = 1000, sd = 900),
           rnorm(24, mean = 25000, sd = 9000), rnorm(24, mean = 8000, sd = 3000))
value <- abs(value)  # same as sqrt(value*value): keeps values non-negative
Tdata <- data.frame(Gene, Clone, variable, value)

# Creates the Plot of All Data
# The below code groups the data exactly how I'd like but the scatter plot points are all the same shape
# and I'd like them to each have different shapes.                        
ln_clr <- "black"
bk_clr <- "white"
point_shapes <- c(0,15,1,16,2,17)
blue_cols <- c("#EFF2FB","#81BEF7","#0174DF","#0000FF","#0404B4")

lp1 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
    stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25, 
                 size = 0.7, coef = 4) +
    geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3, 
                  alpha = 1, colour = ln_clr) +
    geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7, 
               pch=15)


lp1 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
    expand_limits(y=c(0.01,10^5)) +
    scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
                  labels = trans_format("log10", math_format(10^.x)))

ggsave("Scatter Grouped-Wrong Symbols.png")

#*************************************************************************************************************************************
# The below code doesn't group the scatterplot data how I'd like but the points each have different shapes
lp2 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
    stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25, 
                 size = 0.7, coef = 4) +
    geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3, 
                  alpha = 1, colour = ln_clr) +
    geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7, 
               aes(shape=Clone))


lp2 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
    expand_limits(y=c(0.01,10^5)) +
    scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
                  labels = trans_format("log10", math_format(10^.x)))

ggsave("Scatter Ungrouped-Right Symbols.png")

If anyone has any suggestions I'd really appreciate it.

Thank you Nathan
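
One likely explanation: mapping shape = Clone makes ggplot2 split the points into one group per Gene/Clone combination, and the dodge is computed per group, so the points no longer line up with the five boxes. A minimal sketch of a possible fix is to set group = Gene explicitly in the point layer so that only Gene drives the dodging. The data below is a simplified stand-in for Tdata (one normal distribution for all genes, and a fixed seed, both assumptions made for brevity).

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the example is reproducible
Tdata <- data.frame(
  Gene     = rep(c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"), each = 24),
  Clone    = rep(c("D1", "D2", "D3", "D4", "D5", "D6"), times = 20),
  variable = rep(rep(c("Day10", "Day20", "Day30", "Day40"), each = 6), times = 5),
  value    = abs(rnorm(120, mean = 10, sd = 8))  # simplified stand-in values
)

p <- ggplot(Tdata, aes(x = variable, y = value, fill = Gene)) +
  geom_boxplot(outlier.shape = NA, position = position_dodge(width = 0.83)) +
  # shape varies by Clone, but group = Gene keeps the dodge tied to the boxes
  geom_point(aes(shape = Clone, group = Gene),
             position = position_jitterdodge(dodge.width = 0.83),
             size = 1.8, alpha = 0.7) +
  scale_shape_manual(values = c(0, 15, 1, 16, 2, 17))
```

The same aes(shape = Clone, group = Gene) mapping should drop into the lp2 code from the question in place of aes(shape = Clone).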

Currently, I have gene expression data from TCGA and have loaded certain genes into a data.frame like this (with T = tumor sample and N = normal tissue sample):

             Gene1  Gene2  Gene3 ...
Patient_T 1    2      3      1
Patient_T 2    1      5      6 
Patient_N 1    3      6      1
Patient_N 2    3      6      1
...

I now want to create a grouped boxplot with ggplot2. The graph should show all the gene candidates on the x-axis and the expression level on the y-axis, grouped by tumor and normal for each gene.

Other threads about grouped boxplots use a different data.frame format. I wondered whether there is a practical way to create a grouped plot from this format (i.e. with the patient ID as the row name).
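
A minimal sketch of one way to do this: reshape to long format, recover the sample type from the row names, then map the type to fill. The exact row-name format is not clear from the excerpt, so the names below (P1_T, P1_N, and so on) and the "_T" suffix rule are assumptions; the expression values are the four rows shown above.

```r
library(ggplot2)

# The four rows from the question; row names are hypothetical
expr <- data.frame(
  Gene1 = c(2, 1, 3, 3),
  Gene2 = c(3, 5, 6, 6),
  Gene3 = c(1, 6, 1, 1),
  row.names = c("P1_T", "P2_T", "P1_N", "P2_N")
)

# Wide -> long: one row per (sample, gene) pair
long <- stack(expr)
names(long) <- c("expression", "gene")
long$sample <- rep(rownames(expr), times = ncol(expr))

# Derive tumor/normal from the assumed "_T" / "_N" suffix
long$type <- ifelse(grepl("_T$", long$sample), "Tumor", "Normal")

# Genes on the x-axis, one box per sample type within each gene
p <- ggplot(long, aes(x = gene, y = expression, fill = type)) +
  geom_boxplot()
```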

Similar Question 3 : Boxplot in R using ggplot2

I'm new to R and have been trying to make a boxplot. Part of the data I'm using is shown below.

            h1          h2          h3          h4          h5          h6          h7          h8          h9         h10
1  0.003719430 0.002975544 0.003049933 0.003421876 0.003421876 0.003347487 0.003645042 0.003496264 0.007364472 0.009075410
2  0.003400540 0.002749373 0.003038781 0.003328188 0.003328188 0.003400540 0.003472892 0.003400540 0.007741656 0.009333398
3  0.003741387 0.002918282 0.003142765 0.003367248 0.003367248 0.003367248 0.003666559 0.003516904 0.008081396 0.008156223
4  0.003870634 0.002884002 0.003187581 0.003339370 0.003567055 0.003415265 0.003794739 0.003491160 0.008348426 0.007741268
5  0.003782963 0.002950711 0.003177689 0.003480326 0.003404667 0.003404667 0.003707304 0.003631645 0.008927793 0.007414608
6  0.003643736 0.002884624 0.003264180 0.003416002 0.003491913 0.003416002 0.003871469 0.003795558 0.009033428 0.007135649
7  0.003718600 0.003035592 0.003111482 0.003339151 0.003566821 0.003566821 0.003642710 0.003870380 0.008120209 0.008044319
8  0.003819313 0.002979064 0.003284609 0.003360995 0.003590154 0.003437382 0.003895699 0.003590154 0.008326102 0.007791398
9  0.003899334 0.002981844 0.003211216 0.003364131 0.003669961 0.003440589 0.003746419 0.003669961 0.008410328 0.007569295
10 0.003828488 0.002986220 0.003292499 0.003445639 0.003522209 0.003522209 0.003598778 0.003598778 0.008422673 0.007810115

When I use the default boxplot command then this is what I get

boxplot(df)

[image: output of boxplot(df)]

I have been trying to generate the boxplot for same data using ggplot2 but it gives an error which I am unable to resolve. Here's what I tried.

library(ggplot2)
df <- readRDS('data.Rda')
ggplot(df) + geom_boxplot()

Here's the error

Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous
Error: Aesthetics must either be length one, or the same length as the dataProblems:df[, 6:15]

I saw the ggplot2 docs for geom_boxplot and realize (from the example) that I need to rearrange my data like

col1        col2
h1   0.003719430
h1   0.003400540
h1   0.003741387
h1   0.003870634
h1   0.003782963
h1   0.003643736
h2   0.002975544
h2   0.002749373
h2   0.002918282
h2   0.002884002
h2   0.002950711
h2   0.002884624
...

and use something like

ggplot(df, aes(factor(col1), col2)) + geom_boxplot()

But that is a lot of work. I believe that there must be some way to do this automatically which I'm not able to find. Any help is appreciated.
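
The rearrangement described above does not have to be done by hand. As a minimal sketch, base R's stack() turns a data frame of numeric columns into exactly that two-column layout: "values" holds the numbers and "ind" holds the source column name. Only three columns and three rows from the data above are used here to keep the example short.

```r
library(ggplot2)

# First three rows of h1, h2 and h9 from the question's data
df <- data.frame(
  h1 = c(0.003719430, 0.003400540, 0.003741387),
  h2 = c(0.002975544, 0.002749373, 0.002918282),
  h9 = c(0.007364472, 0.007741656, 0.008081396)
)

# Wide -> long in one call
long <- stack(df)

# One box per original column, like boxplot(df)
p <- ggplot(long, aes(x = ind, y = values)) +
  geom_boxplot()
```

Packages such as reshape2 (melt) or tidyr (pivot_longer) do the same job and give more control over column names.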

Similar Question 4 (1 solutions) : Order boxplot at higher level (R, ggplot2)

Similar Question 5 (1 solutions) : R boxplot vs ggplot2 geom_boxplot

Similar Question 6 (3 solutions) : annotate boxplot in ggplot2

Similar Question 7 (1 solutions) : Creating a boxplot, ggplot2

Similar Question 8 (2 solutions) : ggplot2 boxplot

Similar Question 9 (1 solutions) : ggplot2 width of boxplot
