r - Splitting then plotting uneven vector lengths to a single graph
I'm using data in the format shown below; the actual data set is longer. The column labels are: date | variable 1 | variable 2 | failed?
I'm sorting the data into date order. Some dates may be missing, but the ordering function should sort that out. From there, I'm trying to split the data into new sets, with the end of each set denoted by the far right column registering a 1. I'm then trying to plot these sets on a single graph with the number of days passed on the x-axis. I've looked into using the ggplot function, but it seems to require data frames in which the length of each vector is known. I tried creating a matrix with a length based on the maximum number of days passed across the sets and filling the spare cells with NaN values so that it could be plotted, but this took ages as my data set is quite large. I was wondering whether there is a more elegant way of plotting the values against days passed for all sets on a single graph, and then iterating the process for additional variables.
Any help would be appreciated.
The code for a reproducible example is included here:
test <- matrix(c(
  "01/03/1997", 0.521583294, 0.315170092, 0,
  "02/03/1997", 0.63946859, 0.270870821, 0,
  "03/03/1997", 0.698687101, 0.253495021, 0,
  "04/03/1997", 0.828754157, 0.233024574, 0,
  "05/03/1997", 0.87078867, 0.214507537, 0,
  "06/03/1997", 0.883279874, 0.212268627, 0,
  "07/03/1997", 0.952083969, 0.062663598, 0,
  "08/03/1997", 0.991100195, 0.054875256, 0,
  "09/03/1997", 0.992490126, 0.026610776, 1,
  "10/03/1997", 0.020707391, 0.866874513, 0,
  "11/03/1997", 0.32405139, 0.778696984, 0,
  "12/03/1997", 0.32665243, 0.703234151, 0,
  "13/03/1997", 0.603941956, 0.362869647, 0,
  "14/03/1997", 0.944046386, 0.026992527, 1,
  "15/03/1997", 0.108246142, 0.939363715, 0,
  "16/03/1997", 0.152195386, 0.907458966, 0,
  "17/03/1997", 0.285748169, 0.765212667, 0),
  ncol = 4, byrow = TRUE)
colnames(test) <- c("date", "variable 1", "variable 2", "failed")
test <- as.table(test)
test
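Since the dates above are stored as dd/mm/yyyy text, sorting them as strings would not generally give chronological order. The following is a minimal sketch of parsing and ordering, assuming that layout; the three-row `raw` data frame and its `value` column are hypothetical and only stand in for the real data.

```r
# Hypothetical three-row sample in the same dd/mm/yyyy layout, deliberately out of order
raw <- data.frame(
  date   = c("03/03/1997", "01/03/1997", "02/03/1997"),
  value  = c(0.70, 0.52, 0.64),
  failed = c(0, 0, 1),
  stringsAsFactors = FALSE
)

# Parse the text into Date objects so that ordering is chronological
raw$date <- as.Date(raw$date, format = "%d/%m/%Y")

# Sort by date; any missing days simply leave gaps between rows
sorted <- raw[order(raw$date), ]
```

Building a data frame directly, rather than a character matrix, also keeps the numeric columns numeric, so no `as.numeric(as.character(...))` conversion is needed later.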
I've managed to hash together a solution, but it looks messy. I'm convinced there is a far more elegant way of solving this.
z = as.data.frame.matrix(test)
attach(z)
x = as.numeric(as.character(failed))
x = cumsum(x) # variable names are recycled
A corrected cumulative failure sum puts the data into sets according to the number of preceding failures.
z <- within(z, acc_sum <- x)
attach(z)
# subtract the flag so that each failure row stays in the set it ends
z$acc_sum <- as.numeric(as.character(z$acc_sum)) - as.numeric(as.character(z$failed))
attach(z)
z = data.frame(z, group_index = ave(acc_sum == acc_sum, acc_sum, FUN = cumsum))
A column is created that holds the number of days passed since the start of measurement for each set. It's easier to read the code if I keep the new variable names rather than keep indexing directly.
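To make the two steps above easier to follow, here is a small sketch of the same `cumsum` and `ave` idea applied to a made-up vector of failure flags. The values are hypothetical, and the sketch assumes the flags are already numeric.

```r
# Hypothetical failure flags: a 1 marks the last day of a set
failed <- c(0, 0, 1, 0, 1, 0, 0)

# The raw cumulative sum already increments on the failure row itself
raw_sum <- cumsum(failed)

# Subtracting the flag keeps each failure row in the set it ends
acc_sum <- raw_sum - failed

# Counter that restarts at 1 within each set
group_index <- ave(rep(1, length(acc_sum)), acc_sum, FUN = cumsum)

print(data.frame(failed, raw_sum, acc_sum, group_index))
```

Without the subtraction, the third and fifth rows would be assigned to the following set instead of closing their own.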
attach(z)
x = (max(acc_sum) + 1) # this is the number of sets of variable results
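Because some dates may be missing, a counter that increments once per row can differ from the number of days actually elapsed. As a possible alternative, the sketch below derives days passed from the dates themselves within each set. The `dates` and `acc_sum` vectors are hypothetical, and it assumes the dates have already been parsed with `as.Date`.

```r
# Hypothetical data: 3 March is missing from the first set
dates   <- as.Date(c("1997-03-01", "1997-03-02", "1997-03-04",
                     "1997-03-05", "1997-03-06"))
acc_sum <- c(0, 0, 0, 1, 1)   # set id, as produced by the corrected cumulative sum

# Days elapsed since the first date of each set; the gap is preserved
days_passed <- ave(as.numeric(dates), acc_sum, FUN = function(d) d - min(d))
```

Here the first set gets 0, 1, 3 rather than 1, 2, 3, so the gap shows up on the x-axis instead of being silently closed.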
The current columns read: date|variable.1|variable.2|failed|acc_sum|group_index
library(ggplot2)
n = data.frame(acc_sum, group_index)
This initialises the data frame, and should make things faster as group_index and acc_sum aren't read in each time.
for (j in 1:(ncol(z) - 4)) { # this iterates through the variables to generate a new set of lists; -4 removes date, failed, group_index and acc_sum
  n$variable <- z[, (j + 1)] # this reads in the new variable data; requires the variables to be next to each other
  n[] <- lapply(n, function(x) as.numeric(as.character(x))) # this ensures the values are numeric for plotting
  plot <- ggplot(n, aes(x = group_index, y = variable, colour = acc_sum)) +
    theme_bw() +
    geom_line(aes(group = acc_sum)) # linetype = "dotted"
  print(plot) # this ensures the graph is presented in every iteration
  cat("press [enter] to continue") # this waits for user input before moving to the next variable
  line <- readline()
}
The graph could be improved by having the actual variable name change with what is being plotted. This could be done by including ylabel in the for loop.
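One way this might look is sketched below, using `labs()` from ggplot2 to set the y-axis label inside the loop. The small `z` data frame is a hypothetical stand-in for the one built above, and looping over column names (`vars`) instead of column positions is a substitution of mine, not part of the original code. Since the data are in long format with a grouping column, the two sets can have different lengths (three and two rows here) without any NaN padding.

```r
library(ggplot2)

# Hypothetical stand-in for the z data frame built above
z <- data.frame(
  variable.1  = c(0.52, 0.64, 0.99, 0.02, 0.32),
  variable.2  = c(0.32, 0.27, 0.03, 0.87, 0.78),
  acc_sum     = c(0, 0, 0, 1, 1),
  group_index = c(1, 2, 3, 1, 2)
)

vars  <- c("variable.1", "variable.2")  # columns to plot, selected by name
plots <- list()

for (v in vars) {
  # Long-format frame for this variable; sets may differ in length
  n <- data.frame(group_index = z$group_index,
                  acc_sum     = z$acc_sum,
                  variable    = z[[v]])
  plots[[v]] <- ggplot(n, aes(x = group_index, y = variable,
                              colour = factor(acc_sum))) +
    theme_bw() +
    geom_line(aes(group = acc_sum)) +
    labs(x = "Days passed", y = v, colour = "Set")  # y label follows the variable
}
```

Calling `print(plots[["variable.1"]])` then shows the first graph. Storing the plots in a list rather than pausing on `readline()` is only a convenience for the sketch.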