नोट
इस पृष्ठ तक पहुंच के लिए प्राधिकरण की आवश्यकता होती है। आप साइन इन करने या निर्देशिकाएँ बदलने का प्रयास कर सकते हैं।
इस पृष्ठ तक पहुंच के लिए प्राधिकरण की आवश्यकता होती है। आप निर्देशिकाएँ बदलने का प्रयास कर सकते हैं।
Applies to:
Databricks SQL
Databricks Runtime 10.2 and above
Computes a histogram on expr using numBins bins.
Syntax
histogram_numeric ( [ALL | DISTINCT ] expr, numBins ) [ FILTER ( WHERE cond ) ]
Arguments
expr: A numeric,TIMESTAMP,DATE, orINTERVALexpression that the function consumes and computes the histogram on.numBins: AnINTEGERliteral that must be greater than 1, specifying the number of bins for the histogram computation.cond: An optionalBOOLEANexpression that filters the rows for aggregation.
Returns
The return value is an ARRAY of STRUCTS with fields x and y representing the centers of the histogram's bins.
The type of x is the same as the type of expr, and the type of y is DOUBLE.
Increasing the value of numBins refines the histogram approximation, making it finer-grained. However, it can introduce artifacts around outliers.
Typically, 20-40 bins are effective for histograms, though skewed or smaller datasets might require more bins.
Note that this function creates a histogram with non-uniform bin widths.
It offers no guarantees in terms of the mean-squared-error of the histogram, but in practice is comparable to the histograms produced by other computing packages.
Specifying DISTINCT makes the function operate only on a unique set of expr values.
Examples
> SELECT histogram_numeric(col, 5)
FROM VALUES (0), (1), (2), (10) AS tab(col);
[{"x":0.0,"y":1.0},{"x":1.0,"y":1.0},{"x":2.0,"y":1.0},{"x":10.0,"y":1.0}]
> SELECT histogram_numeric(col, 5)
FROM VALUES (0L), (1L), (2L), (10L) AS tab(col);
[{"x":0,"y":1.0},{"x":1,"y":1.0},{"x":2,"y":1.0},{"x":10,"y":1.0}]
> SELECT histogram_numeric(col, 5)
FROM VALUES (0F), (1F), (2F), (10F) AS tab(col);
[{"x":0.0,"y":1.0},{"x":1.0,"y":1.0},{"x":2.0,"y":1.0},{"x":10.0,"y":1.0}]
> SELECT histogram_numeric(col, 5)
FROM VALUES (0D), (1D), (2D), (10D) AS tab(col);
[{"x":0.0,"y":1.0},{"x":1.0,"y":1.0},{"x":2.0,"y":1.0},{"x":10.0,"y":1.0}]
> SELECT histogram_numeric(col, 5)
FROM VALUES (INTERVAL 0 YEAR), (INTERVAL 1 YEAR), (INTERVAL 2 YEAR),
(INTERVAL 3 YEAR) AS tab(col);
[{"x":0-0,"y":1.0},{"x":1-0,"y":1.0},{"x":2-0,"y":1.0},{"x":3-0,"y":1.0}]
> SELECT histogram_numeric(col, 5)
FROM VALUES (INTERVAL 0 DAY), (INTERVAL 1 DAY), (INTERVAL 2 DAY),
(INTERVAL 3 DAY) AS tab(col);
[{"x":0 00:00:00.000000000,"y":1.0},{"x":1 00:00:00.000000000,"y":1.0},{"x":2 00:00:00.000000000,"y":1.0},{"x":3 00:00:00.000000000,"y":1.0}]
> SELECT histogram_numeric(col, 5)
FROM VALUES (TIMESTAMP '2020-01-01'), (TIMESTAMP'2020-02-01'),
(TIMESTAMP'2020-03-01'), (TIMESTAMP'2020-10-01') AS tab(col)
[{"x":2020-01-01 00:00:00,"y":1.0},{"x":2020-02-01 00:00:00,"y":1.0},{"x":2020-03-01 00:00:00,"y":1.0},{"x":2020-10-01 00:00:00,"y":1.0}]