statByIndex               package:plw               R Documentation

_C_o_m_p_u_t_e_s _s_t_a_t_i_s_t_i_c_s _b_y _i_n_d_e_x _o_r _b_y _r_o_w

_D_e_s_c_r_i_p_t_i_o_n:

     These function give the same result as  by(x,index,mad)
     by(x,index,mean) by(x,index,median) but are much faster. NOTE: The
     index vector is assumed to be SORTED and should contain INTEGER
     values only.
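
     Because index is sorted, every level occupies one contiguous run
     of x, so the groups can be located from run boundaries alone. The
     sketch below illustrates that idea in plain R;
     medianBySortedIndexSketch is a hypothetical name, and this is not
     the plw implementation.

```r
## Hypothetical sketch: per-level median for a SORTED index, locating each
## level's contiguous block of x from run boundaries (not the plw source).
medianBySortedIndexSketch <- function(x, index) {
  stopifnot(length(x) == length(index), !is.unsorted(index))
  runs   <- rle(index)                # value and length of each run of equal labels
  ends   <- cumsum(runs$lengths)      # last position of each run
  starts <- ends - runs$lengths + 1   # first position of each run
  res <- vapply(seq_along(starts),
                function(i) median(x[starts[i]:ends[i]]),
                numeric(1))
  names(res) <- runs$values
  res
}
```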

     The function meanSdByRow computes the mean and standard deviation
     of each row of the matrix mat. It returns a list with components
     mean and sd, which gives the same result as:

     list(mean=apply(mat,1,mean),sd=apply(mat,1,sd))
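
     A vectorised way to obtain the same two components is sketched
     below, using rowMeans() and rowSums() with the n - 1 denominator
     that sd() uses. meanSdByRowSketch is a hypothetical name; this is
     an illustration, not necessarily how plw computes it.

```r
## Hypothetical sketch of a vectorised row-wise mean and sd; not
## necessarily how plw::meanSdByRow is implemented.
meanSdByRowSketch <- function(mat) {
  n <- ncol(mat)
  m <- rowMeans(mat)
  ## mat - m recycles m down the columns, i.e. subtracts each row's own mean
  s <- sqrt(rowSums((mat - m)^2) / (n - 1))  # n - 1 denominator, as in sd()
  list(mean = m, sd = s)
}
```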

_U_s_a_g_e:

     madByIndex(x,index)
     meanByIndex(x,index)
     medianByIndex(x,index)
     orderStatByIndex(x,index,orderStat)
     sdByIndex(x,index)
     meanSdByRow(mat)
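
     For meanByIndex and sdByIndex, the same statistics can be sketched
     with base R's rowsum(), which sums a vector within each group in
     one call. meanSdByIndexSketch is a hypothetical helper, not part
     of plw. Groups of size 1 give NaN for the sd here, whereas sd()
     returns NA.

```r
## Hypothetical sketch (not part of plw): per-level mean and sd via rowsum().
## Assumes index is sorted, as meanByIndex/sdByIndex do.
meanSdByIndexSketch <- function(x, index) {
  cnt <- as.vector(rowsum(rep(1, length(x)), index))  # size of each level
  mu  <- as.vector(rowsum(x, index)) / cnt            # group means
  ## With a sorted index, rep() lines each group mean up with its elements
  dev2 <- (x - rep(mu, cnt))^2
  s <- sqrt(as.vector(rowsum(dev2, index)) / (cnt - 1))  # n - 1 denominator, as in sd()
  list(mean = mu, sd = s)
}
```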

_A_r_g_u_m_e_n_t_s:

       x: Data vector

   index: Index vector

orderStat: Which order statistic to compute

     mat: Matrix
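
     The meaning of orderStat is not spelled out above. Assuming that
     orderStat = k selects the k-th smallest value within each level of
     index (an assumption, not confirmed by this page), a sketch could
     look as follows; orderStatBySortedIndexSketch is a hypothetical
     name, and levels with fewer than k values return NA.

```r
## Hypothetical sketch. ASSUMPTION: orderStat = k means "k-th smallest value
## within each level of index"; this is not confirmed by the plw docs.
orderStatBySortedIndexSketch <- function(x, index, orderStat) {
  stopifnot(length(x) == length(index), !is.unsorted(index))
  groups <- split(x, index)  # one element per level, in sorted level order
  vapply(groups, function(g) {
    if (orderStat > length(g)) return(NA_real_)  # level too small
    sort(g, partial = orderStat)[orderStat]      # a partial sort is enough
  }, numeric(1))
}
```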

_D_e_t_a_i_l_s:

     See the definition (R-code) of each function for details.
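
     As an illustration of the kind of definition involved, madByIndex
     should agree with stats::mad() at its defaults, i.e. 1.4826 times
     the median absolute deviation from the group median. The sketch
     below computes this per level of a sorted index in two passes;
     madBySortedIndexSketch is a hypothetical name, not the plw source.

```r
## Hypothetical sketch of a per-level MAD for a sorted index, using the same
## definition as stats::mad() with its defaults: 1.4826 * median(|x - median(x)|).
madBySortedIndexSketch <- function(x, index) {
  stopifnot(length(x) == length(index), !is.unsorted(index))
  runs <- rle(index)
  grp  <- rep(seq_along(runs$lengths), runs$lengths)  # run number of each element
  med  <- vapply(split(x, grp), median, numeric(1))   # pass 1: group medians
  absdev <- abs(x - med[grp])
  res <- 1.4826 * vapply(split(absdev, grp), median, numeric(1))  # pass 2
  names(res) <- runs$values
  res
}
```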

_V_a_l_u_e:

     All but the last function: A vector with the statistic for each
     level if index. meanSdByRow: A list with items mean and sd.

_A_u_t_h_o_r(_s):

     Magnus Astrand

_S_e_e _A_l_s_o:

     by, apply

_E_x_a_m_p_l_e_s:

     ## Example 1
     ## Computing mad, mean and median by index.
     ## Compares with the results obtained using tapply(...), the
     ## per-group equivalent of by(...) for a plain vector x.

     n<-10000
     x<-rnorm(n)
     index<-sort(round(runif(n,0.5,10.5)))

     mad1<-madByIndex(x,index)
     mad2<-tapply(x,index,mad)

     mean1<-meanByIndex(x,index)
     mean2<-tapply(x,index,mean)

     median1<-medianByIndex(x,index)
     median2<-tapply(x,index,median)

     par(mfrow=c(2,2),mar=c(4,4,1.5,.5),mgp=c(1.5,.25, 0))
     plot(mad1,mad2,main="Comparing mad",pch=19)
     abline(a=0,b=1,col=2)
     plot(mean1,mean2,main="Comparing mean",pch=19)
     abline(a=0,b=1,col=2)
     plot(median1,median2,main="Comparing median",pch=19)
     abline(a=0,b=1,col=2)

     ## Example 2
     ## Computing median by index.
     ## Compares with the running time when using tapply(...)
     n<-200000
     x<-rnorm(n)
     index<-sort(round(runif(n,0.5,10.5)))

     system.time(median1<-medianByIndex(x,index))

     system.time(median2<-tapply(x,index,median))

     ## Example 3
     ## Computing mean and sd by row.
     ## Compares with using apply
     nrow<-5000
     ncol<-20
     mat<-matrix(rnorm(ncol*nrow),nrow,ncol)

     system.time(res1<-meanSdByRow(mat))
     system.time(res2<-list(mean=apply(mat,1,mean),sd=apply(mat,1,sd)))

     par(mfrow=c(1,2),mar=c(4,4,1.5,.5),mgp=c(1.5,.25, 0))
     plot(res1$mean,res2$mean,pch='.')
     plot(res1$sd,res2$sd,pch='.')

