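A volcano plot is a type of scatter plot. It combines a measure of statistical significance from a statistical test (such as the p-value) with the magnitude of change, so that data points with both a large change and statistical significance stand out quickly and intuitively.
Volcano plots are widely used because they clearly show significantly up-regulated and down-regulated proteins/genes. They come in many styles; this article shows how to draw a gradient volcano plot, as shown in the figure below: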
NO.1
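Example data and loading the R packages
library(openxlsx)  # reads .xlsx files
library(ggplot2)   # plotting
library(ggrepel)   # package used for the point labels
data <- read.xlsx("data.xlsx")  # read the data
head(data)  # inspect the data: rows with Marker equal to 1 are to be labeled, rows with Marker equal to 0 are not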
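If data.xlsx is not at hand, a small synthetic table with the same four columns is enough to try the steps that follow. The sketch below is only an assumption about the layout: the column names Accession, FC, PValue, and Marker are the ones used by the code in this article, while the values, the seed, and the name `demo` are invented for illustration.

```r
set.seed(42)  # arbitrary seed so the toy table is reproducible
n <- 200      # number of simulated proteins/genes (made up)

demo <- data.frame(
  Accession = sprintf("P%05d", seq_len(n)),     # invented identifiers
  FC        = 2^rnorm(n, mean = 0, sd = 0.4),   # fold change, always positive
  PValue    = runif(n, min = 1e-6, max = 1),    # p-values between 1e-6 and 1
  stringsAsFactors = FALSE
)

# Flag the five smallest p-values for labeling, mirroring the Marker column
demo$Marker <- as.integer(rank(demo$PValue) <= 5)

head(demo)
```

Assigning `data <- demo` in place of the `read.xlsx()` call should let the later plotting code run on this toy table, although the fixed annotation positions may not suit the simulated ranges.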
NO.2
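Draw a basic volcano plot: first draw the overall shape of the plot, with the points colored on a gradient and sized on a gradient
# Cutoffs for the dashed guide lines, inferred from the annotation labels below (FC = 1.2, P-value = 0.05)
fc_cutoff <- 1.2
p_cutoff <- 0.05
ggplot(data, aes(log2(FC), -1 * log10(PValue))) +  # load the data and define the x and y axes
  geom_point(aes(color = -log10(PValue), size = -log10(PValue))) +  # scatter points
  geom_vline(xintercept = c(-log2(fc_cutoff), log2(fc_cutoff)), lty = 2, col = "#999999", lwd = 0.8) +  # dashed vertical lines at the fold-change cutoffs
  geom_hline(yintercept = -log10(p_cutoff), lty = 2, col = "#999999", lwd = 0.8) +  # dashed horizontal line at the p-value cutoff; lty sets the line type, lwd the line width
  annotate("text", x = 1.3, y = -log10(p_cutoff) - 0.1, label = "P-value=0.05", colour = "black") +
  annotate("text", x = -0.37, y = 6.4, label = "FC=1/1.2", colour = "black") +
  annotate("text", x = 0.35, y = 6.4, label = "FC=1.2", colour = "black") +  # annotate() adds the labels for the dashed lines
  scale_color_gradientn(values = seq(0, 1, length.out = 5),  # one position per color
                        colors = c("#39489f", "#39bbec", "#f9ed36", "#f38466", "#b81f25")) +  # color gradient
  scale_size_continuous(range = c(2, 3)) +  # point size gradient
  labs(title = "Volcano Plot",       # plot title
       x = "log2(FC)",               # x-axis label
       y = "-log10(P-value)") +      # y-axis label
  theme_bw() +  # set the theme
  theme(panel.grid = element_blank())  # remove the grid lines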
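The dashed lines split the plot into regions: points above the horizontal line and outside the two vertical lines are the significantly up-regulated or down-regulated ones. As a rough sketch, the same cutoffs can be applied to the table directly to see which group each row falls in. The toy values and the names `toy_reg`, `fc_cutoff`, `p_cutoff`, and `Regulation` are assumptions made for illustration; the cutoff values are taken from the annotation labels (FC = 1.2, P-value = 0.05).

```r
# Invented rows covering each case
toy_reg <- data.frame(
  Accession = c("A", "B", "C", "D", "E"),
  FC        = c(2.0, 0.5, 1.1, 1.5, 0.9),
  PValue    = c(0.001, 0.01, 0.001, 0.2, 0.8),
  stringsAsFactors = FALSE
)

fc_cutoff <- 1.2   # fold-change cutoff, as in the "FC=1.2" label
p_cutoff  <- 0.05  # p-value cutoff, as in the "P-value=0.05" label

# Start with "NS" (not significant), then overwrite the rows that pass both cutoffs
toy_reg$Regulation <- "NS"
toy_reg$Regulation[toy_reg$PValue < p_cutoff & log2(toy_reg$FC) >=  log2(fc_cutoff)] <- "Up"
toy_reg$Regulation[toy_reg$PValue < p_cutoff & log2(toy_reg$FC) <= -log2(fc_cutoff)] <- "Down"

table(toy_reg$Regulation)  # counts per group
```

Row C has a small p-value but too small a change, and row D has a large change but a large p-value, so both stay in the "NS" group.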
NO.3
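Add labels to some of the points
(1) Using the Marker column, add a new column that stores the label text
data$label <- ifelse(data$Marker == 1, as.character(data$Accession), "")
(2) Or select suitable points by an upper limit on PValue and a lower limit on the absolute value of log2(FC)
PvalueLimit <- 0.01  # example cutoff; adjust as needed
FCLimit <- 1         # example cutoff on |log2(FC)|; adjust as needed
data$label <- ifelse(data$PValue < PvalueLimit & abs(log2(data$FC)) >= FCLimit, as.character(data$Accession), "")
# Plot
# Cutoffs for the dashed guide lines, inferred from the annotation labels below (FC = 1.2, P-value = 0.05)
fc_cutoff <- 1.2
p_cutoff <- 0.05
ggplot(data, aes(log2(FC), -1 * log10(PValue))) +  # load the data and define the x and y axes
  geom_point(aes(color = -log10(PValue), size = -log10(PValue))) +  # scatter points
  geom_vline(xintercept = c(-log2(fc_cutoff), log2(fc_cutoff)), lty = 2, col = "#999999", lwd = 0.8) +  # dashed vertical lines at the fold-change cutoffs
  geom_hline(yintercept = -log10(p_cutoff), lty = 2, col = "#999999", lwd = 0.8) +  # dashed horizontal line at the p-value cutoff; lty sets the line type, lwd the line width
  annotate("text", x = 1.3, y = -log10(p_cutoff) - 0.1, label = "P-value=0.05", colour = "black") +
  annotate("text", x = -0.37, y = 6.4, label = "FC=1/1.2", colour = "black") +
  annotate("text", x = 0.35, y = 6.4, label = "FC=1.2", colour = "black") +  # annotate() adds the labels for the dashed lines
  scale_color_gradientn(values = seq(0, 1, length.out = 5),  # one position per color
                        colors = c("#39489f", "#39bbec", "#f9ed36", "#f38466", "#b81f25")) +  # color gradient
  scale_size_continuous(range = c(2, 3)) +  # point size gradient
  geom_text_repel(aes(label = label, color = -1 * log10(PValue)),
                  max.overlaps = 1000000) +  # labels that overlap too many others are dropped when there are many points; a large value keeps them
  labs(title = "Volcano Plot",       # plot title
       x = "log2(FC)",               # x-axis label
       y = "-log10(P-value)") +      # y-axis label
  theme_bw() +  # set the theme
  theme(panel.grid = element_blank())  # remove the grid lines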
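Before plotting, it can help to run the threshold filter on a few rows and check which labels survive. This is a minimal sketch with invented values: the `toy_lab` table is hypothetical, and the limits 0.01 and 1 are the ones shown in the code above.

```r
# Invented rows: two pass both limits, one fails on fold change, one on p-value
toy_lab <- data.frame(
  Accession = c("A", "B", "C", "D"),
  FC        = c(4.0, 0.25, 1.5, 3.0),
  PValue    = c(0.001, 0.005, 0.001, 0.05),
  stringsAsFactors = FALSE
)

PvalueLimit <- 0.01  # keep rows with PValue below this
FCLimit     <- 1     # keep rows with |log2(FC)| at or above this

# Rows that pass both limits keep their Accession as the label; the rest get ""
toy_lab$label <- ifelse(
  toy_lab$PValue < PvalueLimit & abs(log2(toy_lab$FC)) >= FCLimit,
  toy_lab$Accession,
  ""
)

toy_lab$label
```

Because the test uses the absolute value of log2(FC), a fold change of 0.25 (a four-fold decrease) passes just as a fold change of 4 does.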
You can also set arrow = arrow(length = unit(0.015, "npc")) to add arrows to the labels, as shown in the figure below:
NO.4
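Add marginal density plots and histograms and set their colors
library(ggExtra)  # provides ggMarginal()
data$label <- ifelse(data$Marker == 1, as.character(data$Accession), "")
# Cutoffs for the dashed guide lines, inferred from the annotation labels below (FC = 1.2, P-value = 0.05)
fc_cutoff <- 1.2
p_cutoff <- 0.05
p <- ggplot(data, aes(log2(FC), -1 * log10(PValue))) +  # load the data and define the x and y axes
  geom_point(aes(color = -log10(PValue), size = -log10(PValue))) +  # scatter points
  geom_vline(xintercept = c(-log2(fc_cutoff), log2(fc_cutoff)), lty = 2, col = "#999999", lwd = 0.8) +  # dashed vertical lines at the fold-change cutoffs
  geom_hline(yintercept = -log10(p_cutoff), lty = 2, col = "#999999", lwd = 0.8) +  # dashed horizontal line at the p-value cutoff; lty sets the line type, lwd the line width
  annotate("text", x = 1.3, y = -log10(p_cutoff) - 0.1, label = "P-value=0.05", colour = "black") +
  annotate("text", x = -0.37, y = 6.4, label = "FC=1/1.2", colour = "black") +
  annotate("text", x = 0.35, y = 6.4, label = "FC=1.2", colour = "black") +  # annotate() adds the labels for the dashed lines
  scale_color_gradientn(values = seq(0, 1, length.out = 5),  # one position per color
                        colors = c("#39489f", "#39bbec", "#f9ed36", "#f38466", "#b81f25")) +  # color gradient
  scale_size_continuous(range = c(2, 3)) +  # point size gradient
  geom_text_repel(aes(label = label, color = -1 * log10(PValue)),
                  max.overlaps = 1000000,  # labels that overlap too many others are dropped when there are many points; a large value keeps them
                  arrow = arrow(length = unit(0.015, "npc")),  # add arrows
                  box.padding = 1.2, segment.size = 0.65,  # padding around each label and thickness of its connecting line
                  show.legend = FALSE,
                  nudge_x = ifelse(log2(data$FC) < 0, -0.5, 0.5)) +  # push labels left for negative log2(FC), right otherwise
  labs(title = "Volcano Plot",       # plot title
       x = "log2(FC)",               # x-axis label
       y = "-log10(P-value)") +      # y-axis label
  guides(color = guide_colourbar(title = "-Log10(P-value)"), size = "none") +  # hide the size legend
  theme_bw() +  # set the theme
  theme(panel.grid = element_blank(),  # remove the grid lines
        legend.position = "left")      # move the legend
q <- ggMarginal(p, type = "densigram",  # add the marginal plots
                xparams = list(fill = "orange"),
                yparams = list(fill = "skyblue"))
q  # display the combined plot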
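After these four steps the gradient volcano plot is complete. This series is not over yet, though: next, Xiaolu will use R to draw a hyperbolic-curve volcano plot, so stay tuned~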
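End-of-article bonus
As you may know, in the KEGG PATHWAY Database, Map01100 is the global metabolic map, but the downloaded version is always blurry: the metabolites are hard to make out, which is unsatisfying.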