weightedMedian      package:aroma.light      R Documentation(latin1)

_W_e_i_g_h_t_e_d _M_e_d_i_a_n _V_a_l_u_e

_D_e_s_c_r_i_p_t_i_o_n:

     Computes a weighted median of a numeric vector.

_U_s_a_g_e:

     ## Default S3 method:
     weightedMedian(x, w, na.rm=NA, interpolate=is.null(ties), ties=NULL, method=c("quick", "shell"), ...)

_A_r_g_u_m_e_n_t_s:

       x: a 'numeric' 'vector' containing the values whose weighted
          median is to be computed.

       w: a vector of weights the same length as 'x' giving the weights
          to use for each element of 'x'. Negative weights are treated
          as zero weights. Default value is equal weight to all values.

   na.rm: a logical value indicating whether 'NA' values in 'x' should
          be stripped before the computation proceeds, or not.  If
          'NA', no check at all for 'NA's is done. Default value is
          'NA' (for efficiency).

interpolate: If 'TRUE', linear interpolation is used to get a
          consistent estimate of the weighted median.

    ties: If 'interpolate == FALSE', a character string specifying how
          to resolve ties between two 'x' values that both satisfy the
          weighted median criterion. Note that at most two values can
          satisfy the criterion. When 'ties' is '"min"', the smaller
          value of the two is returned and when it is '"max"', the
          larger value is returned. If 'ties' is '"mean"', the mean of
          the two values is returned and if it is '"both"', both values
          are returned. Finally, if 'ties' is '"weighted"' (or 'NULL')
          a weighted average of the two is returned, where the weights
          are the total weights of all values 'x[i] <= x[k]' and
          'x[i] >= x[k]', respectively.

  method: If '"shell"', 'order()' is used for sorting; if '"quick"',
          the internal 'qsort()' is used.

     ...: Not used.

_D_e_t_a_i_l_s:

     For the 'n' elements 'x = c(x[1], x[2], ..., x[n])' with positive
     weights 'w = c(w[1], w[2], ..., w[n])' such that 'sum(w) = S', the
     _weighted median_ is defined as the element 'x[k]' for which the
     total weight of all elements 'x[i] < x[k]' is less than or equal
     to 'S/2' and for which the total weight of all elements
     'x[i] > x[k]' is less than or equal to 'S/2' (cf. [1]).
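
     The definition above can be checked directly in base R. The
     following is a minimal sketch, not the package's implementation;
     the helper name 'weightedMedianCandidates' is hypothetical and it
     assumes strictly positive weights and no 'NA's. It returns every
     element that satisfies the criterion (at most two distinct
     values).

```r
# Hypothetical helper (not part of aroma.light): returns every distinct
# x[k] that satisfies the weighted median criterion.
# Assumes all weights are strictly positive and there are no NAs.
weightedMedianCandidates <- function(x, w) {
  stopifnot(length(x) == length(w), all(w > 0))
  S <- sum(w)
  ok <- vapply(x, function(xk) {
    below <- sum(w[x < xk])   # total weight strictly below x[k]
    above <- sum(w[x > xk])   # total weight strictly above x[k]
    below <= S / 2 && above <= S / 2
  }, logical(1))
  sort(unique(x[ok]))
}

weightedMedianCandidates(1:10, rep(1, 10))       # 5 6
weightedMedianCandidates(1:10, c(5, rep(1, 9)))  # 3 4
```

     When two values qualify, as in both calls here, the 'ties'
     argument decides how they are combined into the returned value.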

     If 'w' is missing then all elements of 'x' are given the same
     positive weight. If all weights are zero, 'NA' is returned.

     If one or more weights are 'Inf', this is treated as if those
     elements all had the same weight and all other elements had
     weight zero. This makes things easier in cases where the weights
     are the result of a division by zero. In this case 'median()' is
     used internally.

     When all the weights are the same (after values with weight zero
     are excluded and 'Inf' weights are taken care of), 'median()' is
     used internally.
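
     The handling of special weights described above might be
     sketched as follows. This is an illustration only, not the
     package's actual code; the helper name 'normalizeWeights' is
     hypothetical and the sketch assumes there are no 'NA' weights.

```r
# Hypothetical preprocessing sketch; not the package's actual code.
# Assumes w contains no NA values.
normalizeWeights <- function(x, w) {
  w[w < 0] <- 0                      # negative weights count as zero
  if (any(is.infinite(w))) {
    w <- as.numeric(is.infinite(w))  # Inf -> 1, all other weights -> 0
  }
  keep <- w > 0                      # drop zero-weight elements
  list(x = x[keep], w = w[keep])
}

x <- 1:10
w <- c(Inf, rep(1, 8), Inf)
nw <- normalizeWeights(x, w)
nw$x          # 1 10
nw$w          # 1 1
median(nw$x)  # 5.5, since the remaining weights are all equal
```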

     The weighted median solves the following optimization problem:


         alpha^* = arg min_{alpha} sum_{k=1}^{K} w_k |x_k - alpha|

     where x=(x_1,x_2,...,x_K) are scalars and w=(w_1,w_2,...,w_K) are
     the corresponding "weights" for each individual x value.
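
     The optimization view can be illustrated numerically in base R.
     The sketch below uses a hypothetical helper named 'objective'
     and evaluates the weighted absolute deviation at each data
     point; because the objective is piecewise linear with kinks at
     the data points, a minimum is attained at some 'x[k]'.

```r
# Weighted sum of absolute deviations for a candidate alpha.
# 'objective' is an illustrative name, not a package function.
objective <- function(alpha, x, w) sum(w * abs(x - alpha))

x <- 1:10
w <- c(5, rep(1, 9))

# Evaluate the objective at every data point and keep the minimizers
loss <- vapply(x, function(a) objective(a, x, w), numeric(1))
x[loss == min(loss)]   # 3 4
min(loss)              # 39

# Any alpha between the two minimizers gives the same objective value
objective(3.5, x, w)   # 39
```

     Here the minimum is attained on the whole interval from 3 to 4,
     which is the situation that the 'ties' and 'interpolate'
     arguments resolve.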

_V_a_l_u_e:

     Returns the weighted median.

_B_e_n_c_h_m_a_r_k_s:

     When implementing this function, speed was given high priority,
     and it also makes use of the internal quick sort algorithm (from
     R v1.5.0). The result is that 'weightedMedian(x)' is roughly half
     as fast as 'median(x)'. The exact ratio is hard to state because
     it depends on the data set, and it is also hard to measure
     precisely because the internal garbage collector etc. may
     disturb the timings.

     Initial tests also indicate that 'method="shell"', which uses
     'order()', is slower than 'method="quick"', which uses the internal
     'qsort()'.  The non-weighted median can use partial sorting, which
     is faster because not all values have to be sorted.
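
     As a rough illustration of the partial sorting idea, the sketch
     below computes a plain median with base R's 'sort(x, partial=)'.
     The helper name 'medianPartial' is hypothetical, and the sketch
     assumes a non-empty numeric vector without 'NA's. A weighted
     median, by contrast, generally needs the weights accumulated
     along the full ordering, so this shortcut does not carry over
     directly.

```r
# Hypothetical helper: median via partial sorting (base R only).
# Assumes x is a non-empty numeric vector without NAs.
medianPartial <- function(x) {
  n <- length(x)
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L) {
    # Odd length: only the middle position must be in its sorted place
    sort(x, partial = half)[half]
  } else {
    # Even length: the two middle positions must be in place
    idx <- half + 0L:1L
    mean(sort(x, partial = idx)[idx])
  }
}

set.seed(1)
x <- rnorm(1001)
stopifnot(isTRUE(all.equal(medianPartial(x), median(x))))
```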

     See examples below for some simple benchmarking tests.

_A_u_t_h_o_r(_s):

     Henrik Bengtsson and Ola Hossjer, Centre for Mathematical
     Sciences, Lund University. Thanks to Roger Koenker, Econometrics,
     University of Illinois, for the initial ideas.

_R_e_f_e_r_e_n_c_e_s:

     [1]  T.H. Cormen, C.E. Leiserson, R.L. Rivest, Introduction to
     Algorithms, The MIT Press, Massachusetts Institute of Technology,
     1989.

_S_e_e _A_l_s_o:

     'median()', 'mean()' and 'weighted.mean()'.

_E_x_a_m_p_l_e_s:

     x <- 1:10
     n <- length(x)

     m1 <- median(x)                           # 5.5
     m2 <- weightedMedian(x)                   # 5.5
     stopifnot(identical(m1, m2))

     w <- rep(1, n)
     m1 <- weightedMedian(x, w)                # 5.5 (default)
     m2 <- weightedMedian(x, ties="weighted")  # 5.5 (default)
     m3 <- weightedMedian(x, ties="min")       # 5
     m4 <- weightedMedian(x, ties="max")       # 6
     stopifnot(identical(m1,m2))

     # Pull the median towards the first value
     w[1] <- 5
     m1 <- weightedMedian(x, w)                # 3.5
     y <- c(rep(x[1],w[1]), x[-1])             # Only possible for integer weights
     m2 <- median(y)                           # 3.5
     stopifnot(identical(m1,m2))

     # Put even more weight on the first value
     w[1] <- 8.5
     weightedMedian(x, w)                # 2

     # All weight on the first value
     w[1] <- Inf
     weightedMedian(x, w)                # 1

     # All weight on the last value
     w[1] <- 1
     w[n] <- Inf
     weightedMedian(x, w)                # 10

     # All weights set to zero
     w <- rep(0, n)
     weightedMedian(x, w)                # NA

     # Simple benchmarking
     bench <- function(N=1e5, K=10) {
       x <- rnorm(N)
       t <- c()
       gc()
       t[1] <- system.time(for (k in 1:K) median(x))[3]
       gc()
       t[2] <- system.time(for (k in 1:K) weightedMedian(x, method="quick"))[3]
       gc()
       t[3] <- system.time(for (k in 1:K) weightedMedian(x, method="shell"))[3]
       t <- t / t[1]
       t[4] <- t[2]/t[3]
       names(t) <- c("median", "wMed-quick", "wMed-shell", "quick/shell")
       t
     }

     print(bench(N=  5, K=1000))
     print(bench(N=100, K=1000))
     print(bench(N=1e3, K=100))
     print(bench(N=1e5, K=10))
     print(bench(N=1e6, K=1))

