Is This A Good Or Bad Way Of Generating Random Numbers For Each Record?

January 31, 2023 Post a Comment

A colleague of mine discovered a behaviour in SQL Server which I was unaware of. CREATE VIEW dbo.vRandNumber AS SELECT RAND() as RandNumber GO CREATE FUNCTION dbo.RandNumber() RET

Solution 1:

I would not do this for a piece of software I wanted to continue working on future versions of SQL Server. I found a way to return a different values from RAND() for each row in a select statement. This discovery was 1) a bit of a hack and 2) was made on SQL Server 2005. It no longer works on SQL Server 2008. That experience makes me extra leary of relying on trickery to get rand() to return a random value per row.

Also, I believe SQL Server is allowed to optimize away the multiple calls to a UDF ... though that might be changing since they do allow some non-deterministic functions now.

For SQL Server 2005 only, a way to force rand() to execute per row in a select statement. Does not work on SQL Server 2008. Not tested on any version prior to 2005:

create table #t (i int)
insert into #t values (1)
insert into #t values (2)
insert into #t values (3)

select i, case when i = 1 then rand() else rand() end as r
from #t

1   0.84923391682467
2   0.0482397143838935
3   0.939738172108974

Also, I know you said you were not asking about the randomness of rand(), but I will a good reference is: http://msdn.microsoft.com/en-us/library/aa175776(SQL.80).aspx. It compares rand() to newid() and rand(FunctionOf(PK, current datetime)).

Solution 2:

It depends on what you need the random value for. It also depends on the format that you need the value in INTEGER, VARCHAR, etc.

if I need to sort rows randomly, I do something like

SELECT *
FROM [MyTable]
ORDER BY newID()

Likewise, you could generate a table of ints using the identity "feature" of SQL Server and perform a similar query and that could give you a random number.

My colleague needed a random integer per row, so he added a calculated field to our table and that generates one random number (integer) per row returned in a query. I'm not sure I recommend this; it caused issues in certain tools but it gave random integers for each table. We could then combine my solution of newid() and that table and get a set of random numbers when needed.

So I return to it depends. Can you elaborate on what you need it?

Update: Here is the table definition snippet my colleague used to have a computed column return a different random number per row, each time the table is queried:

CREATE TABLE [dbo].[Table](
    -- ...
    [OrderID] [smallint] NOT NULL,  --Not sure what happens if this is null
    -- ...
    [RandomizeID]  AS (convert(int,(1000 * rand(([OrderID] * 100 * datepart(millisecond,getdate())))))),
    -- ...
)

Solution 3:

If I had to select a random number for each row in SQL, and you could prove to me that RAND() is generating true random numbers...

Yes. I would probably use something like that.

Solution 4:

I wouldn't use this. As far as I know, RAND() uses the system time as seed and produces the same values when executed more than once quickly after each other. For example, try this:

SELECT    *, 
          RAND()
FROM      SomeTable

RAND() will give you the same value for each row.

Solution 5:

The view and udf approach is clumsy for me: excess trivial objects to use a flawed function.

I'd use CHECKSUM(NEWID()) to generate a random number (rather than RAND() * xxx), or the new SQL Server 2008 CRYPT_GEN_RANDOM

Lab Sql