pyspark.pandas.DataFrame.quantile
DataFrame.quantile(q: Union[float, Iterable[float]] = 0.5, axis: Union[int, str] = 0, numeric_only: bool = True, accuracy: int = 10000) → Union[DataFrame, Series]

Return value at the given quantile.
Note
Unlike pandas, pandas-on-Spark returns an approximate quantile, derived from an approximate percentile computation, because computing an exact quantile across a large dataset is extremely expensive.
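To make the difference from pandas concrete, here is a minimal pure-Python sketch of a nearest-rank quantile, which always returns an observed value instead of interpolating between two of them. It is an illustration only: `nearest_rank_quantile` is a hypothetical helper, it computes an exact result on an in-memory list, and it is not the algorithm Spark uses internally. On the small frames in the Examples section it gives the same values as shown there.

```python
import math
from typing import Sequence


def nearest_rank_quantile(values: Sequence[float], q: float) -> float:
    """Return the smallest observed value whose rank covers fraction q.

    Hypothetical helper for illustration; not part of pyspark.pandas.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must satisfy 0 <= q <= 1")
    if len(values) == 0:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # 1-based rank of the chosen element; q == 0 maps to the minimum.
    rank = max(1, math.ceil(q * len(ordered)))
    return float(ordered[rank - 1])


column_a = [1, 2, 3, 4, 5]
column_b = [6, 7, 8, 9, 0]
quartiles_a = [nearest_rank_quantile(column_a, q) for q in (0.25, 0.5, 0.75)]
quartiles_b = [nearest_rank_quantile(column_b, q) for q in (0.25, 0.5, 0.75)]
even_median = nearest_rank_quantile([1, 2, 3, 4], 0.5)
```

For an even-length input such as `[1, 2, 3, 4]`, this sketch returns an element of the data, whereas pandas' default linear interpolation yields 2.5. An element-selecting quantile can therefore differ from the pandas result even on tiny inputs.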
- Parameters
- q : float or array-like, default 0.5 (50% quantile)
0 <= q <= 1, the quantile(s) to compute.
- axis : int or str, default 0 or ‘index’
Can only be set to 0 at the moment.
- numeric_only : bool, default True
If False, the quantile of datetime and timedelta data will be computed as well. Can only be set to True at the moment.
- accuracy : int, optional, default 10000
Accuracy of the approximation. A larger value gives better accuracy. The relative error is 1.0 / accuracy.
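The relation between `accuracy` and the relative error can be sketched in plain Python. The helpers below are hypothetical and are not part of pyspark.pandas. `rank_bounds` further assumes that the relative error is a rank error, in the sense Spark documents for `DataFrame.approxQuantile`: the returned value's exact rank lies between `floor((q - err) * N)` and `ceil((q + err) * N)`. Whether `quantile` gives exactly this guarantee is an assumption here, not something this page states.

```python
import math
from typing import Tuple


def relative_error(accuracy: int = 10000) -> float:
    """Relative error implied by an accuracy setting: 1.0 / accuracy."""
    if accuracy <= 0:
        raise ValueError("accuracy must be a positive integer")
    return 1.0 / accuracy


def rank_bounds(q: float, n_rows: int, accuracy: int = 10000) -> Tuple[int, int]:
    """Rank interval for the returned value under the assumed rank-error reading."""
    err = relative_error(accuracy)
    lower = max(0, math.floor((q - err) * n_rows))
    upper = min(n_rows, math.ceil((q + err) * n_rows))
    return lower, upper


default_error = relative_error()              # error for the default accuracy
coarse_error = relative_error(100)            # a lower accuracy gives a larger error
lower, upper = rank_bounds(0.5, 1_000_000)    # median of one million rows
```

Under that assumption, the default setting keeps the median of a million-row column within roughly a hundred ranks of the exact one on either side. Raising `accuracy` narrows the interval.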
- Returns
- Series or DataFrame
If q is an array, a DataFrame will be returned where the index is q, the columns are the columns of self, and the values are the quantiles. If q is a float, a Series will be returned where the index is the columns of self and the values are the quantiles.
Examples
>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]})
>>> psdf
   a  b
0  1  6
1  2  7
2  3  8
3  4  9
4  5  0

>>> psdf.quantile(.5)
a    3.0
b    7.0
Name: 0.5, dtype: float64

>>> psdf.quantile([.25, .5, .75])
        a    b
0.25  2.0  6.0
0.50  3.0  7.0
0.75  4.0  8.0