How to order and partition by multiple columns?

8.7K    Asked by DavidEDWARDS in SQL Server , Asked on Mar 20, 2023

I am using PostgreSQL.

I have a table with 2 columns in this example:

I want to add a new column with a unique id corresponding to partitions by name and category as shown in the result. Then, I want to take a random sample choosing 2 (or more) unique ids because under each unique id, there will be a lot of other historical data.


I have tried this so far, but I get 1s for everything. I'm missing something really simple here, but how do I correct my mistake? I have to do this operation for several million rows in the real table.


SELECT
  dense_rank() over (partition by name, category order by category) as unique_id,
  *
FROM
 example_table
After this, presumably I'll have to use RAND() somewhere but how do I do this?

This is my naive approach to get the solution in the above pic.

with ranks (
  ...
)
select * from ranks where unique_id = 2 or unique_id = 4
Answered by David EDWARDS

Use ROW_NUMBER() function better for order and partition by multiple columns.


SELECT
    name,
    category,
    ROW_NUMBER() OVER (PARTITION BY name, category ORDER BY name, category) unique_id
FROM
    test;
name | category | unique_id
:--- | :------- | --------:
A | Alpha | 1
A | Alpha | 2
A | Alpha | 3
A | Beta | 1
A | Beta | 2
B | Alpha | 1
B | Alpha | 2
B | Theta | 1
db<>fiddle here
To get a unique_id for each pair of name-category, remove the PARTITION part and use only ORDER BY:
SELECT
    name,
    category,
    DENSE_RANK() OVER (ORDER BY name, category) unique_id
FROM
    test;
name | category | unique_id
:--- | :------- | --------:
A | Alpha | 1
A | Alpha | 1
A | Alpha | 1
A | Beta | 2
A | Beta | 2
B | Alpha | 3
B | Alpha | 3
B | Theta | 4
db<>fiddle here
To select two rows of each unique_id: (As far as I've not used any order it is supposed to be random)
WITH ct AS
(
  SELECT
      name,
      category,
      DENSE_RANK() OVER (ORDER BY name, category) unique_id
  FROM
      test
)
SELECT
    a.unique_id, c.name, c.category
FROM
    (SELECT DISTINCT unique_id
     FROM ct) a
JOIN LATERAL (SELECT b.name, b.category
              FROM ct b
              WHERE b.unique_id = a.unique_id
              LIMIT 2) c ON TRUE
ORDER BY
    a.unique_id;
unique_id | name | category
--------: | :--- | :-------
        1 | A | Alpha
        1 | A | Alpha
        2 | A | Beta
        2 | A | Beta
        3 | B | Alpha
        3 | B | Alpha
        4 | B | Theta


Your Answer

Interviews

Parent Categories