본문 바로가기

SQL

Window Function

Window Function


  • Window Function은 데이터의 부분집합(subset)에 대해 실행되는 함수에 기반해 새로운 컬럼을 생성한다. 레코드마다 실행되기 때문에 시간이 오래걸릴 수 있다..
  • sum, min, max, avg, first_value, last_value, nth_value, percent_rank, rank, row_number, lag, lead 등이 있다.
Window_function(arg1, arg2, ...) OVER (
    [PARTITION BY <partition expression>]
    [ORDER BY <sort_expression> [ASC|DESC] [NULLS FIRST|NULLS LAST]
)
-- Example
SELECT
  *,
    MAX(salary) OVER () -- maximum salary among all 
FROM salaries
LIMIT 5;
 emp_no | salary | from_date  |  to_date   |  max
--------+--------+------------+------------+--------
  10001 |  60117 | 1986-06-26 | 1987-06-26 | 158220
  10001 |  62102 | 1987-06-26 | 1988-06-25 | 158220
  10001 |  66074 | 1988-06-25 | 1989-06-25 | 158220
  10001 |  66596 | 1989-06-25 | 1990-06-25 | 158220
  10001 |  66961 | 1990-06-25 | 1991-06-25 | 158220

PARTITION BY


  • 역할: 레코드들을 기준에 따라 파티션으로 나눈다. 파티션은 함수를 적용하는 기본 단위가 된다.(optional)
-- Example
-- Q. Average salary per department

SELECT DISTINCT
    dp.dept_name,
    AVG(salary) OVER (
        PARTITION BY dp.dept_name
    )
FROM salaries
JOIN dept_emp AS de USING (emp_no)
JOIN departments AS dp USING (dept_no);
     dept_name      |        avg
--------------------+--------------------
 Customer Service   | 58770.366479762480
 Development        | 59478.901162431823
 Finance            | 70489.364896996098
 Human Resources    | 55574.879369695531
 Marketing          | 71913.200004191532
 Production         | 59605.482461651448
 Quality Management | 57251.271913415990
 Research           | 59665.181701268599
 Sales              | 80667.605755337693

Frame Clause

파티션 안의 frame(subrange)을 생성할 수 있다. → framing

  • ROWS/RANGE : 프레임으로 ROWS나 RANGE를 사용
  • PRECEDING : rows before the current one
  • FOLLOWING : rows after the current one
  • UNBOUNDED PRECEDING / FOLLOWING : return all before / after the current one
  • CURRENT ROW : your current row

💡주의 ORDER BY 유무에 따라 기본(default) 프레이밍이 달라진다!

  • without ORDER BY : all partition rows
  • with ORDER BY : everything before the current row and the current row
-- Example
-- Q. Employees' current salary

SELECT DISTINCT e.emp_no, e.first_name, d.dept_name,
LAST_VALUE(s.salary) OVER (
        PARTITION BY e.emp_no
        ORDER BY s.from_date
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS "Current Salary"
FROM salaries AS s
JOIN employees AS e USING(emp_no)
JOIN dept_emp AS de USING(emp_no)
JOIN departments AS d USING(dept_no)
ORDER BY e.emp_no;
emp_no | first_name |    dept_name    | Current Salary
--------+------------+-----------------+----------------
  10001 | Georgi     | Development     |          88958
  10002 | Bezalel    | Sales           |          72527
  10003 | Parto      | Production      |          43311
  10004 | Chirstian  | Production      |          74057
  10005 | Kyoichi    | Human Resources |          94692
                        ...

Window Function Examples


FIRST_VALUE / LAST_VALUE

그룹 안의 처음/마지막 레코드의 값을 리턴한다. (프레이밍에 따라 범위가 달라진다)

-- Example
-- Q. How my price is compared to the item with the highest price in the same category?

SELECT prod_id, price, category,
LAST_VALUE(price) OVER (
        PARTITION BY category ORDER BY price
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS "most expensive in category"
FROM products;
 prod_id | price | category | most expensive in category
---------+-------+----------+----------------------------
    5977 |  9.99 |        1 |                      29.99
     803 |  9.99 |        1 |                      29.99
    9386 |  9.99 |        1 |                      29.99
    6609 |  9.99 |        1 |                      29.99
    4175 |  9.99 |        1 |                      29.99
                          ...

SUM

그룹 안의 값들을 모두 더한 값을 리턴한다. (프레이밍에 따라 범위가 달라진다)

-- Example
-- Q How much cumulatively a customer has ordered at our store?

SELECT o.orderid, o.customerid, o.netamount,
SUM(o.netamount) OVER (
        PARTITION BY o.customerid
        ORDER BY o.orderid
    ) AS "cum sum"
FROM orders AS o
ORDER BY o.customerid;

ROW_NUMBER

프레이밍에 상관없이 파티션 안의 현재 로우에 1부터 숫자를 매긴다.

-- Example
-- Q. Where my product is positioned in the category by price?

SELECT prod_id, price,category,
ROW_NUMBER() OVER (
        PARTITION BY category ORDER BY price
    ) AS "position in category by price"
FROM products
ORDER BY prod_id;
 prod_id | price | category | position in category by price
---------+-------+----------+-------------------------------
       1 | 25.99 |       14 |                           466
       2 | 20.99 |        6 |                           338
       3 | 28.99 |        6 |                           593
       4 | 14.99 |        3 |                           165
       5 | 11.99 |        3 |                            64
                          ...

'SQL' 카테고리의 다른 글

View  (0) 2021.09.27
CASE, NULLIF  (0) 2021.09.27
UNION, GROUPING SETS, ROLL UP  (0) 2021.09.27
GROUP BY, HAVING  (0) 2021.09.27
JOIN  (0) 2021.09.27