[Solved] r - Plotting each column of a dataframe as one line using ggplot


The whole dataset describes a module (or cluster if you prefer).

In order to reproduce the example, the dataset is available at: https://www.dropbox.com/s/y1905suwnlib510/example_dataset.txt?dl=0

(54kb file)

You can read it in with:

test_example <- read.table(file='example_dataset.txt')

What I would like to have in my plot is this (the image from the original post is not preserved here):

On the plot, the x-axis is my Timepoints column, and the y-axis shows every column of the dataset except the last 3. I then used facet_wrap() to group by the ConditionID column.

This is exactly what I want, but the way I achieved this was with the following code:

plot <- ggplot(dataset, aes(x=Timepoints))
plot <- plot + geom_line(aes(y=dataset[,1],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,2],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,3],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,4],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,5],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,6],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,7],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,8],colour = dataset$InModule))
...

As you can see, it is not very automated. I thought about putting it in a loop, like this:

columns <- dim(dataset)[2] - 3
for (i in seq(1:columns))
{
  plot <- plot + geom_line(aes(y=dataset[,i],colour = dataset$InModule))
}
(plot <- plot + facet_wrap(  ~ ConditionID, ncol=6) )

That doesn't work. I found the topic "Use for loop to plot multiple lines in single plot with ggplot2", which corresponds to my problem, and tried the solution given there using the melt() function.

The problem is that when I use melt on my dataset, I lose the Timepoints column that I need for my x-axis. This is what I did:
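One plausible reason the loop fails: aes() does not evaluate dataset[,i] when the layer is created. It stores the expression and evaluates it only when the plot is built, by which point i holds its final value, so every layer ends up drawing the same column. The sketch below reproduces this on a hypothetical toy data frame (toy, g1 and g2 are illustrative names, not part of the original dataset) and shows one possible workaround, assuming a ggplot2 version that supports the .data pronoun (3.0.0 or later).

```r
library(ggplot2)

# Hypothetical toy data: one x-axis column and two measurement columns.
toy <- data.frame(Timepoints = 1:3, g1 = c(1, 2, 3), g2 = c(3, 2, 1))

# The loop from the question, reduced to two columns. aes() keeps the
# expression `toy[, i]` unevaluated until the plot is built.
p_loop <- ggplot(toy, aes(x = Timepoints))
for (i in 2:3) {
  p_loop <- p_loop + geom_line(aes(y = toy[, i]))
}
# After the loop i is 3, so the first layer also plots column 3 (g2).
first_layer_y <- layer_data(p_loop, 1)$y

# Possible workaround: build each layer inside a function call, so every
# layer keeps its own column name.
make_layer <- function(col) {
  force(col)  # fix the column name for this layer
  geom_line(aes(y = .data[[col]]))
}
p_ok <- ggplot(toy, aes(x = Timepoints)) + lapply(c("g1", "g2"), make_layer)
```

Reshaping the data to long format is usually the more idiomatic route in ggplot2, since it also gives a proper legend and grouping variable.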

data_melted <- dataset
as.character(data_melted$Timepoints)
dataset_melted <- melt(data_melted)

I also tried using aggregate:

aggdata <-aggregate(dataset, by=list(dataset$ConditionID), FUN=length)

Now, with aggdata, I at least know how many Timepoints I have for each ConditionID, but I don't know how to proceed from here and combine this with ggplot.

Can anyone suggest an approach? I know I could use the ugly solution of creating new datasets in a loop with rbind (also given in that link), but I don't want to do that, as it sounds really inefficient. I want to learn the right way.

Thanks

Tags: r, plot, ggplot2, facet-wrap
asked Nov 19 '14 at 14:20, edited Nov 19 '14 at 15:20, by Rafael Santos

Comments:

To convert your data into long format (using e.g. melt) is the standard ggplot way here I would say. Please provide a minimal, self contained example (see e.g. here) and show your attempts using melt. – Henrik, Nov 19 '14 at 14:27

Suggest you to post sample data in the Q directly such that folks here could test before putting forward their solution. – KFB, Nov 19 '14 at 14:28

thanks for the feedback. I added the data now for reproducibility. – Rafael Santos, Nov 19 '14 at 15:22

dropbox file no longer available – Boern, May 18 '16 at 8:40

1 Answer

Solution

You have to specify id.vars in your call to melt.data.frame to keep all information you need. In the call to ggplot you then need to specify the correct grouping variable to get the same result as before. Here's a possible solution:

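library(reshape2)  # provides melt()
library(ggplot2)
# Keep the three descriptive columns as ids; every other column is melted into variable/value pairs.
melted <- melt(dataset, id.vars=c("Timepoints", "InModule", "ConditionID"))
# Draw one line per original column (and InModule value), coloured by InModule.
p <- ggplot(melted, aes(Timepoints, value, color = InModule)) +
  geom_line(aes(group=paste0(variable, InModule)))
p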
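Because the linked dataset is no longer downloadable, here is a minimal sketch of the same approach on made-up data. The data frame toy and its columns gene1 and gene2 are hypothetical stand-ins; only Timepoints, InModule and ConditionID come from the question. The facet_wrap() call from the question is added back to split the panels by condition.

```r
library(reshape2)
library(ggplot2)

# Hypothetical stand-in for the real dataset: two measurement columns
# followed by the three descriptive columns named in the question.
toy <- data.frame(
  gene1       = c(1.0, 2.0, 1.5, 0.5, 1.0, 2.5),
  gene2       = c(2.0, 1.0, 0.5, 1.5, 2.5, 1.0),
  Timepoints  = c(0, 1, 2, 0, 1, 2),
  InModule    = factor(rep("yes", 6)),
  ConditionID = factor(rep(c("A", "B"), each = 3))
)

# Long format: one row per (original row, measurement column) pair.
toy_melted <- melt(toy, id.vars = c("Timepoints", "InModule", "ConditionID"))

# One line per measurement column, one panel per condition.
p_toy <- ggplot(toy_melted, aes(Timepoints, value, colour = InModule)) +
  geom_line(aes(group = paste0(variable, InModule))) +
  facet_wrap(~ ConditionID)
```

Printing p_toy should give two panels (A and B), each with one line for gene1 and one for gene2.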

answered Nov 19 '14 at 15:59 by shadow

Comments:

Thanks! This solved my problem. However, I'm a bit confused. Why, when I don't specify id.vars, did melt automatically keep almost all the columns I needed except for the last one? What is the criterion here? Is it because it recognized every column as numeric, then finally found columns that were factors, and assumed that all the columns up to that point were the correct ones? Also, the way you did it treats the Timepoints column as an id, which is not quite true. Its entries are values like those in the other columns, but they only make sense when grouped by the condition they represent. – Rafael Santos, Nov 19 '14 at 17:01
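On the question raised in the comment: as far as I can tell, when neither id.vars nor measure.vars is supplied, reshape2's melt() uses every factor, character or logical column as an id variable and melts all remaining (numeric) columns, regardless of column position. That would explain why a numeric Timepoints column disappears into the variable/value pairs. The sketch below uses a hypothetical three-column data frame to illustrate this. It also shows that converting Timepoints to character only helps if the result is assigned back, which the as.character() line in the question does not do.

```r
library(reshape2)

# Hypothetical data: a numeric measurement, a numeric Timepoints column
# and a factor ConditionID column.
toy <- data.frame(
  gene1       = c(1, 2),
  Timepoints  = c(0, 1),
  ConditionID = factor(c("A", "A"))
)

# No id.vars given: only the factor column is kept as an id, so
# Timepoints is melted together with gene1.
default_melt <- melt(toy)
levels(default_melt$variable)  # "gene1" "Timepoints"

# Assigning the converted column back makes Timepoints non-numeric,
# so the default rule now keeps it as an id variable.
toy$Timepoints <- as.character(toy$Timepoints)
kept_melt <- melt(toy)
names(kept_melt)  # "Timepoints" "ConditionID" "variable" "value"
```

Passing id.vars explicitly, as in the answer, avoids relying on this default and keeps Timepoints numeric for the x-axis.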
