
I have gene expression data from TCGA and have loaded certain genes into a data.frame like this (T = tumor sample, N = normal tissue sample):

            Gene1  Gene2  Gene3  ...
Patient_T1      2      3      1
Patient_T2      1      5      6
Patient_N1      3      6      1
Patient_N2      3      6      1
...

I now want to create a grouped boxplot with ggplot2. The graph should show all the candidate genes on the x-axis and the expression level on the y-axis, with separate tumor and normal boxes for each gene.

Other threads on grouped boxplots use a different data.frame format. Is there a practical way to create a grouped boxplot from this format, i.e. with the patient ID as the row name?
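ggplot2 expects long format (one row per observation), so the usual route is to reshape the wide table first and derive the tumor/normal group from the row names. A minimal base R sketch, assuming the row names carry `_T`/`_N` tags as in the example; `expr`, `long` and the column names `Patient`, `Gene`, `Expression` and `Group` are hypothetical.

```r
library(ggplot2)

# Hypothetical wide data in the question's layout: one column per gene,
# patient IDs as row names, "_T" = tumor and "_N" = normal.
expr <- data.frame(
  Gene1 = c(2, 1, 3, 3),
  Gene2 = c(3, 5, 6, 6),
  Gene3 = c(1, 6, 1, 1),
  row.names = c("Patient_T1", "Patient_T2", "Patient_N1", "Patient_N2")
)

# Reshape to long format: one row per (patient, gene) pair.
# unlist() on a data.frame concatenates its columns in order, so each gene
# name repeats nrow() times while the patient IDs cycle once per gene.
long <- data.frame(
  Patient    = rep(rownames(expr), times = ncol(expr)),
  Gene       = rep(colnames(expr), each = nrow(expr)),
  Expression = unlist(expr, use.names = FALSE)
)

# Derive the tissue group from the row name (assumes the "_T" tag marks tumors).
long$Group <- ifelse(grepl("_T", long$Patient), "Tumor", "Normal")

# Grouped boxplot: genes on x, one tumor box and one normal box per gene.
p <- ggplot(long, aes(x = Gene, y = Expression, fill = Group)) +
  geom_boxplot()
p
```

If tidyr and tibble are available, `tidyr::pivot_longer(tibble::rownames_to_column(expr, "Patient"), -Patient, names_to = "Gene", values_to = "Expression")` performs the same reshape.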

I am trying to add p-values to my graph using the stat_signif function.
The problem is that my box plots are grouped: I want to compare each pair of box plots within the same category, but the comparisons argument of stat_signif takes x-axis values to compare.
This is my code:

library(ggplot2)
library(ggsignif)

# `dinuc` (used in the y-axis label) must already be defined, e.g. as a character string.
p <- ggplot(plot.data, aes(x = Element, y = Value, fill = Group)) + # Define the elements for plotting; group by "strandness".
  geom_boxplot(outlier.shape = NA, colour = "black") +
  scale_fill_manual(values = c("goldenrod", "darkgreen")) +
  coord_cartesian(ylim = c(0, 0.03)) +
  stat_summary(fun = mean, colour = "black", geom = "point", shape = 18, size = 4,
               show.legend = FALSE, position = position_dodge(0.75)) +
  theme(legend.title = element_blank(), legend.text = element_text(size = 16),
        axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        panel.background = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.5),
        legend.key = element_rect(colour = "transparent", fill = "white")) +
  theme(plot.title = element_text(lineheight = .8, hjust = 0.5, size = 20),
        axis.title.y = element_text(size = 20, angle = 90, margin = margin(t = 0, r = 20, b = 0, l = 0))) +
  labs(x = "", y = paste0(dinuc, " frequency")) +
  theme(plot.margin = unit(c(2, 1, 1, 1), "cm")) +
  # stat_compare_means(aes(group = Group)) +
  # map_signif_level is an argument of stat_signif itself, not of the test
  stat_signif(comparisons = list(c("Genes", "mRNA")),
              test = "wilcox.test",
              test.args = list(paired = FALSE, exact = FALSE, correct = FALSE),
              map_signif_level = TRUE, y_position = 0.02)

The plot.data data frame looks like this:

  Group            Value        Element
1 Transcribed      0.004814926  Genes
2 Non-transcribed  0.008926     Genes
3 Transcribed      0.086000026  mRNA
4 Non-transcribed  0.00548      mRNA
5 Transcribed      0.258400078  Exons
6 Non-transcribed  0.23008457   Exons
7 Transcribed      0.00005687   Introns
8 Non-transcribed  0.890000521  Introns

and so on (there are about 10,000 rows for every element).

The code above produces a plot (figure not included) that compares the Genes and mRNA categories, whereas I actually want to compare the transcribed and non-transcribed box plots within every element.
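Because `comparisons` selects x-axis categories, one workaround is to compute one p-value per element yourself and place the brackets manually with the `xmin`, `xmax`, `y_position` and `annotations` arguments of ggsignif's `geom_signif`. A sketch under stated assumptions: the data are random stand-ins shaped like plot.data, there are exactly two groups, and the boxes are dodged with `position_dodge(width = 0.75)`, which puts each pair at x - 0.1875 and x + 0.1875. The names `pvals`, `annots` and `dodge` are hypothetical.

```r
library(ggplot2)
library(ggsignif)

# Hypothetical data in the same long format as plot.data (Group, Value, Element).
set.seed(1)
plot.data <- expand.grid(
  rep     = 1:50,
  Group   = c("Transcribed", "Non-transcribed"),
  Element = c("Genes", "mRNA", "Exons", "Introns")
)
plot.data$Value <- runif(nrow(plot.data), 0, 0.03)

# One Wilcoxon rank-sum test per Element: Transcribed vs Non-transcribed.
elements <- levels(plot.data$Element)
pvals <- sapply(elements, function(el) {
  sub <- plot.data[plot.data$Element == el, ]
  wilcox.test(Value ~ Group, data = sub, exact = FALSE, correct = FALSE)$p.value
})
annots <- sprintf("p = %.2g", pvals)

# With position_dodge(width = 0.75) and two groups, the boxes of
# Element i are centred at i - 0.75/4 and i + 0.75/4.
dodge <- 0.75
x <- seq_along(elements)
xmin <- x - dodge / 4
xmax <- x + dodge / 4

p <- ggplot(plot.data, aes(x = Element, y = Value)) +
  geom_boxplot(aes(fill = Group), outlier.shape = NA,
               position = position_dodge(width = dodge)) +
  scale_fill_manual(values = c("goldenrod", "darkgreen")) +
  # Manual brackets: one per Element, spanning its two dodged boxes.
  geom_signif(xmin = xmin, xmax = xmax,
              y_position = rep(0.032, length(x)),
              annotations = annots, tip_length = 0)
p
```

Keeping `fill = Group` inside `geom_boxplot` rather than in the global `aes()` means `geom_signif` sees one group per x category. If ggpubr is installed, `stat_compare_means(aes(group = Group))` (capital G, matching the column name) runs the per-element comparison without manual positions.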

I have repeatedly measured a given behavior in male and female animals over four different reproductive states (virgin, mated, expecting and parent). I would like to represent my data (x: reproductive state, y: behavior value) in the following manner:

  1. a dot plot in which dots with the same behavior value spread horizontally instead of overlapping;
  2. a segment for each subgroup (e.g. virgin males) showing the mean value of the behavior;
  3. thin lines connecting the dots that belong to the same individual across reproductive states, so each animal can be traced.

I have managed 1) and 2), but I couldn't combine them with the third objective. Can someone help me?

Here is an example:

library(ggplot2)

# Function returning min, mean - SEM, mean, mean + SEM and max for each group
MinMeanSEMMax <- function(x) {
  v <- c(min(x), mean(x) - sd(x)/sqrt(length(x)), mean(x), mean(x) + sd(x)/sqrt(length(x)), max(x))
  names(v) <- c("ymin", "lower", "middle", "upper", "ymax")
  v
}

# Mock dataframe:
Sex<-rep(c("M","F"), times=12)
ID<-rep(seq(from=1, to=6), times=4)
Behavior<-rnorm(24, mean=10, sd=3)
State<-rep(c("virgin", "virgin", "mated", "mated", "expecting", "expecting", "parent", "parent"), times=3)
d<-data.frame(ID,Sex,Behavior,State)

# Prepare mean value for plotting of mean segments
  g<-ggplot(d, aes(x=factor(State), y=Behavior, colour=Sex))+
    stat_summary(fun.data=MinMeanSEMMax, geom="boxplot", position=position_dodge(), outlier.shape = 21, outlier.size = 3, linewidth=1)+
    scale_x_discrete(limits=c("virgin", "mated", "expecting", "parent"), labels=c("virgin"="Virgin", "mated"="Mated", "expecting"="Expecting", "parent"="Parent"))
  # Extract the computed summaries; the "middle" column holds each subgroup mean
  dat.g <- layer_data(g, 1)
  g

  # The plot

  b<-ggplot(d, aes(x=factor(State), y=Behavior, colour=factor(Sex)))+
    geom_segment(data=dat.g, aes(x=xmin, xend=xmax,y=middle, yend=middle), colour=c("blue3","brown2","blue3","brown2","blue3","brown2","blue3","brown2"), linewidth=1)+
    geom_dotplot(aes(fill=Sex),binaxis="y", stackdir="center", position=position_dodge(width=1), binwidth = 0.3)+
    labs(x="",y="Behavior")+
    theme_classic()+ 
    theme(axis.line.x = element_line(color="black", linewidth = 1),
          axis.line.y = element_line(color="black", linewidth = 1))+
    theme(legend.position="none")+
    theme(axis.text.x =element_text(size=10),axis.text.y=element_text(size=10), axis.title=element_text(size=11,face="bold"))+
    scale_fill_manual(name="Sex", values=c("brown2", "blue3"), breaks=c("F", "M"))+
    scale_colour_manual(name="Sex",values=c("brown2","blue3"),breaks=c("F", "M"),labels=c("Female", "Male"))+
    scale_x_discrete(limits=c("virgin", "mated", "expecting", "parent"), labels=c("virgin"="Virgin", "mated"="Mated", "expecting"="Expecting", "parent"="Parent"))+
    theme(text=element_text(family="serif"))
  b

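One possible way to add objective 3, sketched under assumptions: `geom_line(aes(group = ID))` cannot follow the dodge by sex on its own, so this reproduces `position_dodge(width = 1)` by hand in a numeric column (hypothetically named `x_dodged`: F at x - 0.25, M at x + 0.25) and uses that column for both the individual lines and the mean segments. It computes the means with `aggregate()` instead of extracting them from a built plot, adds `set.seed()` for reproducibility, and uses `linewidth`, which needs ggplot2 3.4.0 or later. Lines meet the centre of each dot stack, so a dot pushed sideways by stacking sits slightly off its line.

```r
library(ggplot2)

# Mock data from the question; set.seed() added so the example is reproducible.
set.seed(42)
Sex <- rep(c("M", "F"), times = 12)
ID <- rep(1:6, times = 4)
Behavior <- rnorm(24, mean = 10, sd = 3)
State <- rep(c("virgin", "virgin", "mated", "mated", "expecting", "expecting",
               "parent", "parent"), times = 3)
d <- data.frame(ID, Sex, Behavior, State)
d$State <- factor(d$State, levels = c("virgin", "mated", "expecting", "parent"))

# Reproduce position_dodge(width = 1) by hand for the two sexes:
# the first level (F) sits at x - 0.25, the second (M) at x + 0.25.
dodge <- 1
d$x_dodged <- as.numeric(d$State) + ifelse(d$Sex == "F", -dodge / 4, dodge / 4)

# Subgroup means (State x Sex) at the same dodged positions.
means <- aggregate(cbind(Behavior, x_dodged) ~ State + Sex, data = d, FUN = mean)

b <- ggplot(d, aes(x = State, y = Behavior, colour = Sex)) +
  # 3) thin lines tracing each animal across states, drawn behind the dots
  geom_line(aes(x = x_dodged, group = ID), linewidth = 0.3, alpha = 0.6) +
  # 1) centred dot stacks, dodged by sex
  geom_dotplot(aes(fill = Sex), binaxis = "y", stackdir = "center",
               position = position_dodge(width = dodge), binwidth = 0.3) +
  # 2) a short horizontal segment at each subgroup mean
  geom_segment(data = means,
               aes(x = x_dodged - 0.2, xend = x_dodged + 0.2, yend = Behavior),
               linewidth = 1) +
  # An explicit discrete scale lets the numeric x_dodged share the axis.
  scale_x_discrete(labels = c(virgin = "Virgin", mated = "Mated",
                              expecting = "Expecting", parent = "Parent")) +
  scale_colour_manual(values = c(F = "brown2", M = "blue3")) +
  scale_fill_manual(values = c(F = "brown2", M = "blue3")) +
  labs(x = "", y = "Behavior") +
  theme_classic() +
  theme(legend.position = "none")
b
```

Declaring `scale_x_discrete()` explicitly matters here: the first layer maps a numeric column to x, and without a discrete scale already in place ggplot2 would create a continuous x scale and then reject the factor `State` in the dot plot layer.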

Similar Question 3: Join values when creating boxplot

I have a table of 983 observations of 27 variables. I can provide the data if needed, but I do not think it is necessary, as the following cross-table should summarise it well enough:

Kjønn   Antall   <>    e    f    g    s    ug
Sex     Count    NA    w    d    m    s    um
k       282      2     26   5    41        208
m       701      11    56   4    148  2    480

Abbreviations (with English translation):

e[nkemann],  f[raskilt], g[ift],    s[eparert],  ug[ift]
w[idow(er)], d[ivorced], m[arried], s[eparated], u[n]m[arried]

Sex: k[vinne] = female, m[ann] = male; <> = missing (NA).

I would like to create a variable-width boxplot showing the distribution of these individuals, but as the table shows, the NAs, the divorced and the separated form such small groups that their boxes would be hardly legible (and pointless). How can I join these groups to create a boxplot showing e, f+s, g and ug?

My current code:

library(ggplot2)

# The basis for the boxplot
dBox_SexAge <- ggplot(data = tblHoved) +
  geom_boxplot(
    mapping = aes(colour = KJONN, x = KJONN, y = 1875-FAAR),
    notch = TRUE,
    lwd = .5, fatten = .125,
    varwidth = TRUE
  )

# Create the final boxplot
dBox_SexAgeMStat <- dBox_SexAge +
  facet_grid(SIVST ~ .) +
  coord_flip()

# Run it
dBox_SexAgeMStat

Current plot (figure not included), in which I would like to merge the f and s groups.
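A minimal sketch of merging f and s before plotting, assuming `SIVST` holds the codes shown in the table and missing values are stored as `NA`. Base R's `levels<-` merges factor levels that are given the same new label. Here `tblHoved` is a small randomly generated stand-in for the real table, and `SIVST2` and `plot_data` are hypothetical names.

```r
library(ggplot2)

# Hypothetical stand-in for tblHoved (the real table has 983 rows).
set.seed(1)
n <- 300
tblHoved <- data.frame(
  KJONN = sample(c("k", "m"), n, replace = TRUE),
  FAAR  = sample(1790:1870, n, replace = TRUE),
  SIVST = sample(c("e", "f", "g", "s", "ug", NA), n, replace = TRUE,
                 prob = c(0.08, 0.03, 0.20, 0.02, 0.65, 0.02))
)

# Fix the level set explicitly, then give f and s the same new label:
# `levels<-` merges levels that receive identical labels.
tblHoved$SIVST2 <- factor(tblHoved$SIVST, levels = c("e", "f", "g", "s", "ug"))
levels(tblHoved$SIVST2)[levels(tblHoved$SIVST2) %in% c("f", "s")] <- "f+s"

# Drop the NA group before plotting (assumes missing status is stored as NA).
plot_data <- subset(tblHoved, !is.na(SIVST2))

dBox_SexAgeMStat <- ggplot(plot_data) +
  geom_boxplot(
    mapping = aes(colour = KJONN, x = KJONN, y = 1875 - FAAR),
    notch = TRUE,
    varwidth = TRUE
  ) +
  facet_grid(SIVST2 ~ .) +
  coord_flip()
dBox_SexAgeMStat
```

With forcats, `forcats::fct_collapse(SIVST, "f+s" = c("f", "s"))` performs the same merge.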

Similar Question 5 (1 solution): Boxplot in R using ggplot2

Similar Question 7 (1 solution): Boxplot of CSV data with ggplot2

Similar Question 8 (1 solution): R boxplot vs ggplot2 geom_boxplot

Similar Question 9 (1 solution): R ggplot: Change Grouped Boxplot Median line
