Row_number() Shows Unexpected Values
Solution 1:
The PARTITION BY
clause of the ROW_NUMBER()
function instructs it to partition the entire row set by Price
values and assign row numbers in the ascending order of ID
s.
It seems like you want to distinguish between any two groups of rows with identical Price
values that are separated by at least one row with a different Price
.
There may be various ways to achieve that. In SQL Server (and I think the same would work in PostgreSQL too), I would first use two ROW_NUMBER()
calls to get an additional partitioning criterion, then rank rows once again using that criterion, like this:
WITH partitioned AS (
SELECT
ID,
Date_trans,
Time_trans,
Price,
ROW_NUMBER() OVER ( ORDERBY ID) -
ROW_NUMBER() OVER (PARTITION BY Price ORDERBY ID) AS PriceGroup
FROM MyTable
)
SELECT
ID,
Date_trans,
Time_trans,
Price,
ROW_NUMBER() OVER (PARTITION BY Price, PriceGroup ORDERBY ID) AS RowCount
FROM partitioned
ORDERBY ID
;
Here's a SQL Fiddle demo.
Solution 2:
Pure SQL
WITH x AS (
SELECT id, date_trans, time_trans, price
,(price <> lag(price) OVER (ORDERBY id))::int ASstepFROM tbl
)
,y AS (
SELECT *, sum(step) OVER (ORDERBY id) AS grp
FROM x
)
SELECT id, date_trans, time_trans, price
,row_number() OVER (PARTITION BY grp ORDERBY id) As row_ct
FROM y
ORDERBY id;
The logic:
- Remember when the price changes compared to the last row in
step
. (Special case of first row works, too.) - Sum up steps, so that identical prices in sequence end up in the same group
grp
. - Number rows per group.
Honestly, I think @Andriy's solution is a wee bit more elegant. It needs three window functions, too, but can do it in only two query steps. In a quick test on the small sample it was also slightly faster. So, +1 from me.
If performance is of the essence, a more specialized solution with a
PL/pgSQL function
should be considerably faster, because it only needs to scan and order the table once.
CREATEOR REPLACE FUNCTION f_my_row_ct()
RETURNSTABLE (
id int
,date_trans date
,time_trans time
,price numeric
,row_ct int
) AS
$BODY$
DECLARE
_last_price numeric; -- remember price of last rowBEGINFOR id, date_trans, time_trans, price INSELECT t.id, t.date_trans, t.time_trans, t.price
FROM tbl t
ORDERBY t.id
LOOP
IF _last_price = price THEN-- works with 1st row, too
row_ct := row_ct +1;
ELSE
row_ct :=1;
END IF;
RETURN NEXT;
_last_price = price; -- remember last priceEND LOOP;
END;
$BODY$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_my_row_ct()
In another quick test on the small sample this was 3-4x faster. Test with EXPLAIN ANALYZE
to see.
As an aside: you could simplify your table (and queries) and save some bytes of storage by merging date_trans date
and time_trans time
into ts_trans timestamp
.
It's very simple and very fast to extract date
or time
from a timestamp
with a cast:
ts_trans::datets_trans::time
Solution 3:
1699100 Price 58.0 - is showing 3 because 1699097,8 are 1,2
1699104 Price 57.96 – is showing 2 because 1669101 is 1.
1699105, 1699106 Price 57.93 – showing 2, 3, because 1699103 is 1
If you want to find items of the same value in a sequence, one option is to join the data to the previous ID and see if the values are the same
Solution 4:
From what I can gather by your expections of results, you need to partition over Time_trans too:
SELECT ID, Date_trans, Time_trans, Price
,ROW_NUMBER() OVER(PARTITIONBY Time_trans, Price ORDERBY ID) RowCount
FROM MyTable
ORDERBY ID
I believe this is the case as your expecting the ROW_NUMBER to start again when the Time-trans value changes as you progress through the data.
Also you might want to add Date_trans in there too if there could be multiple dates in the table, which I would expect.
Post a Comment for "Row_number() Shows Unexpected Values"