Using MySQL: select random row, but fast

Using MySQL RAND()

One may wonder why I am touching this subject, as SQL responds to this problem ordering by RAND() like this:

SELECT * FROM table ORDER BY RAND() LIMIT 1

But there is only one problem with this solution - it is very slow, as RAND() should be executed for each row, assigned and only after this MySQL will be able to select the record. If table is small this approach will work and probably will work best. But what if table contains 100 000+ records? or even more...

Using MySQL index column to select random row

MySQL gurus suggest to use the key in 3 steps

  1. Calculate maximum value of the key
  2. Generate random value from 1 to max value
  3. Select 1 row where key value is greater than random value

Something like this:

<?php
function random_row($table, $column) {
      $max_sql = "SELECT max(" . $column . ")  AS max_id FROM " . $table;
      $max_row = mysql_fetch_array(mysql_query($max_sql));
      $random_number = mt_rand(1, $max_row['max_id']);
      $random_sql = "SELECT * FROM " . $table . " WHERE " . $column . " >= " .$random_number . "  ORDER BY " . $column . " ASC LIMIT 1";
      $random_row = mysql_fetch_row(mysql_query($random_sql));
      if (!is_array($random_row)) {
          $random_sql = "SELECT * FROM " . $table . " WHERE " . $column . " < " . $random_number . "  ORDER BY " . $column . " DESC LIMIT 1";
          $random_row = mysql_fetch_row(mysql_query($random_sql));
      }
      return $random_row;
}

//USAGE
echo '<pre>';
print_r(random_row('YOUR_TABLE', 'YOUR_COLUMN'));
echo '</pre>';

?>

This approach wors perfect, but has a huge issue - keys should be consistent, because if record A has key 5 and record B has index 6, than record A will be selected 5 times more often than record B. So, keys should be recalculated after each rows removal. One more small issue is that algorithm has 3 stems and required execution of 2 SQL statements. To solve this problems I am introducing slightly modified algorithm.

MySQL random row - funny way

Firs of all I understand that this algorithm is not ideal and will not feet perfectly for all cases, but I think it may work for lot of them. So, idea is to add to the table one more key column and for each row insertion assign it a random value. Something like that:

INSERT INTO table (column1, column2, rand_col) VALUES(1,2,RAND());

Than selection of the random row becomes quite trivial task:

SELECT * FROM table WHERE rand_col>RAND() LIMIT 1

This algorithm allows removal of the records without the extra steps of key column recalculation. If table contains lot of recors this may be very important and save server resources. May be selection of several columns will work good for some projects. May be with ORDER BY rand_col. Like that:

SELECT * FROM table WHERE rand_col>RAND() ORDER BY rand_col LIMIT 10

Sure selection of single row executed 10 times will provide more random values, but for some cases second statement will work quire fine.

Do not forget to follow my blog to be informed about new articles.

Posted by:
Enjoyed this post? Share and Leave a comment below, thanks! :)