
I am creating a grouped boxplot with a scatterplot overlay using ggplot2. I would like to group each scatterplot datapoint with the grouped boxplot that it corresponds to.

However, I'd also like the scatterplot points to be different symbols. I seem to be able to get my scatterplot points to group with my grouped boxplots OR get my scatterplot points to be different symbols... but not both simultaneously. Below is some example code to illustrate what's happening:

library(scales)
library(ggplot2) 

# Generates Data frame to plot
Gene <- c(rep("GeneA",24),rep("GeneB",24),rep("GeneC",24),rep("GeneD",24),rep("GeneE",24))
Clone <- c(rep(c("D1","D2","D3","D4","D5","D6"),20))
variable <- c(rep(c(rep("Day10",6),rep("Day20",6),rep("Day30",6),rep("Day40",6)),5))
value <- c(rnorm(24, mean = 0.5, sd = 0.5),rnorm(24, mean = 10, sd = 8),rnorm(24, mean = 1000, sd = 900), 
           rnorm(24, mean = 25000, sd = 9000), rnorm(24, mean = 8000, sd = 3000))
# Make all values positive so they can be plotted on a log scale
value <- abs(value)
Tdata <- data.frame(Gene, Clone, variable, value)

# Creates the Plot of All Data
# The below code groups the data exactly how I'd like but the scatter plot points are all the same shape
# and I'd like them to each have different shapes.                        
ln_clr <- "black"
bk_clr <- "white"
point_shapes <- c(0,15,1,16,2,17)
blue_cols <- c("#EFF2FB","#81BEF7","#0174DF","#0000FF","#0404B4")

lp1 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
    stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25, 
                 size = 0.7, coef = 4) +
    geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3, 
                  alpha = 1, colour = ln_clr) +
    geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7, 
               pch=15)


lp1 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
    expand_limits(y=c(0.01,10^5)) +
    scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
                  labels = trans_format("log10", math_format(10^.x)))

ggsave("Scatter Grouped-Wrong Symbols.png")

#*************************************************************************************************************************************
# The below code doesn't group the scatterplot data how I'd like but the points each have different shapes
lp2 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
    stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25, 
                 size = 0.7, coef = 4) +
    geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3, 
                  alpha = 1, colour = ln_clr) +
    geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7, 
               aes(shape=Clone))


lp2 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
    expand_limits(y=c(0.01,10^5)) +
    scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
                  labels = trans_format("log10", math_format(10^.x)))

ggsave("Scatter Ungrouped-Right Symbols.png")

If anyone has any suggestions I'd really appreciate it.

Thank you, Nathan
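
One possible direction, offered as a sketch rather than a confirmed fix: `position_jitterdodge()` dodges by the `group` aesthetic, and mapping `shape = Clone` in the point layer makes the default grouping the combination of Gene and Clone, so the points get spread over many more slots than the five boxplots. Setting `group = Gene` explicitly in the point layer should keep the dodge at five positions per day while the shape still varies by clone. The data frame `demo` and the plot object `p_sketch` are made-up stand-ins for `Tdata`, not part of the original code.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

# Hypothetical stand-in for Tdata: 5 genes x 4 days x 6 clones
demo <- expand.grid(
  Clone    = paste0("D", 1:6),
  variable = paste0("Day", c(10, 20, 30, 40)),
  Gene     = paste0("Gene", LETTERS[1:5])
)
demo$value <- abs(rnorm(nrow(demo), mean = 100, sd = 50)) + 1

dodge_w <- 0.83  # same dodge width for boxes and points so they line up

p_sketch <- ggplot(demo, aes(x = variable, y = value, fill = Gene)) +
  geom_boxplot(outlier.shape = NA,
               position = position_dodge(width = dodge_w)) +
  geom_point(
    # shape varies by Clone, but dodging is pinned to Gene via `group`
    aes(shape = Clone, group = Gene),
    position = position_jitterdodge(dodge.width = dodge_w),
    size = 1.8, alpha = 0.7
  ) +
  scale_shape_manual(values = c(0, 15, 1, 16, 2, 17)) +
  scale_y_log10()

p_sketch
```

If this behaves as expected, the same `aes(shape = Clone, group = Gene)` mapping could be dropped into the `geom_point()` call of `lp2`.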

I am creating faceted box plots that are grouped by a variable. Instead of having the x-axis text be the factors for the x-axis variable I'd like the x-axis text to be the grouping variable.

However, I don't want to simply use the grouping variable as my x-axis variable, because I'd like the boxplots to cluster. It's hard to explain well, but I think it's clear from the code and comments below.

Let me know if you have any suggestions or can help and thanks in advance!

    library(ggplot2)
    library(scales)
    ln_clr <- "black"
    bk_clr <- "white"
    set.seed(1)

    # Creates variables for a dataset
    donor <- rep(paste0("Donor", 1:3), each = 40)
    machine <- sample(rep(paste0("Machine", 1:4), 30))
    gene <- rep(paste0("Gene", LETTERS[1:5]), each = 24)
    value <- rnorm(24 * 5, mean = rep(c(0.5, 10, 1000, 25000, 8000), each = 24),
                   sd = rep(c(0.5, 8, 900, 9000, 3000), each = 24))

    # Makes all values positive
    value <- abs(value)

    # Creates a data frame from variables
    df <- data.frame(donor, machine, gene, value)

    # Adds a clone variable: rows cycle through clones A-D within each donor
    clns <- LETTERS[1:4]
    df$clone <- factor(paste0(df$donor, rep(clns, length.out = nrow(df))))


#*************************************************************************************************************************************
    # Creates the facet of the machine, but what I want on the x-axis is clone, not donor.
    # However, if I set x to clone it doesn't group the boxplots and it's harder to read
    # the graph.
    bp1 <- ggplot(df, aes(x=donor, y=value, group=clone)) +
        stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), 
                     width = 0.25, size = 0.7, coef = 1) +
        geom_boxplot(coef=1, outlier.shape = NA, position = position_dodge(width = .83), 
                     lwd = 0.3, alpha = 1, colour = ln_clr) +
        geom_point(position = position_dodge(width = 0.83), size = 1.8, alpha = 0.9, 
                    mapping=aes(group=clone)) +
        facet_wrap(~ machine, ncol=2, scales="free_x") 

    bp1 + scale_y_log10(expand = c(0, 0)) +
        theme(axis.text.x= element_text(size=rel(1), colour = "black", angle=45, hjust=1),
              strip.background = element_rect(colour = ln_clr, fill = bk_clr, size = 1))

    # Creates the facet of the donor and clusters the clones but doesn't facet the
    # machine. This could be okay if I could put spaces in between the different
    # machine values but not the donors, and could remove the donor facet labels and
    # only have the machine values show up once.
    bp2 <- ggplot(df, aes(x=clone, y=value)) +
        stat_boxplot(geom ='errorbar', position = position_dodge(width = .83),  
                     width = 0.25, size = 0.7, coef = 1) +
        geom_boxplot(coef=1, outlier.shape = NA, position = position_dodge(width = .83), 
                     lwd = 0.3, alpha = 1, colour = ln_clr) +
        geom_point(position = position_dodge(width = 0.83), size = 1.8, alpha = 0.9) +
        facet_wrap(machine ~ donor, scales="free_x", ncol=6) 

    bp2 + scale_y_log10(expand = c(0, 0)) +
        theme(axis.text.x= element_text(size=rel(1), colour = "black", angle=45, hjust=1),
              strip.background = element_rect(colour = ln_clr, fill = bk_clr, size = 1),
              panel.spacing = unit(0, "lines"))    

Below is an example comparing what I'd like in an ideal world (Top two facets) as compared to what I'm getting (bottom two facets).

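
A possible workaround, sketched under assumptions: put the boxes on a numeric x-axis where each clone gets a hand-computed position (donor index plus a small per-clone offset), then relabel the ticks with the clone names via `scale_x_continuous()`. That way the clones cluster by donor while the axis text shows the clone. The names `demo`, `xpos`, `axis_key`, `p_cluster` and the 0.2 offset are all hypothetical choices for illustration, and the simulated data is not the `df` above.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

# Hypothetical data: 2 machines x 3 donors x 4 clones x 5 replicates
demo <- expand.grid(
  replicate = 1:5,
  letter    = LETTERS[1:4],
  donor     = paste0("Donor", 1:3),
  machine   = paste0("Machine", 1:2)
)
demo$clone <- paste0(demo$donor, demo$letter)
demo$value <- abs(rnorm(nrow(demo), mean = 100, sd = 30)) + 1

# Numeric x position: donors sit at 1, 2, 3 and their clones are spread
# around that centre in steps of 0.2, leaving a wider gap between donors
demo$xpos <- as.integer(demo$donor) + (as.integer(demo$letter) - 2.5) * 0.2

# One row per tick: where it goes and which clone name it shows
axis_key <- unique(demo[, c("xpos", "clone")])

p_cluster <- ggplot(demo, aes(x = xpos, y = value, group = clone)) +
  geom_boxplot(width = 0.18, outlier.shape = NA) +
  geom_point(size = 1.8, alpha = 0.9) +
  facet_wrap(~ machine, ncol = 2) +
  scale_x_continuous(breaks = axis_key$xpos, labels = axis_key$clone) +
  scale_y_log10() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p_cluster
```

The trade-off is that the spacing is managed by hand, so the offsets would need adjusting if the number of clones per donor changes.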

I am a new R user and found graphs I would like to replicate with my data. The plot looks as though it was made in ggplot2, but I've searched and can't find a template for it in ggplot2 or any other package. Has anyone seen template code for this?

See attached image and paper here: http://ehp.niehs.nih.gov/1205963/


I am trying to make a plot like the one below using ggplot2 in R. See panels g, h, j, l, or m of the figure: a scatterplot with boxplots next to the main plot. I tried the method described here, but it didn't work at all: http://www.r-bloggers.com/scatterplot-with-marginal-boxplots/ ...

FYI, the figure is from this article: http://www.ncbi.nlm.nih.gov/pubmed/26437030

Thank you
