Using MySQL PHP: select random row based on weight

Im one of my previous articles I have reviewed several ways to select random row from MySQL database table. Today I am going to describe simple algorithm that allows to select random rows taking into consideration so called weight values, so some of the rows will be selected more often than other according to this weight walues. General idea is very simple - if we will represent all possible random values as a segment and weights as distanses between points W1, W2, W3, W4, W5 (see the image) on this segment, than it becomes clear that longer segments will catch more generated random values than shorter ones. Javascript implementation of this algorithm - weighted random value generation in javascript.

Now the only task left is to describe all this in terms of SQL and PHP. I will create very primitive table just to ilustrate thealgorithm:

CREATE TABLE `test_rand` (
  `ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `Weight` int(10) unsigned NOT NULL DEFAULT '0',
  `Lookup` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `Lookup` (`Lookup`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

As yu can see I have defined a key by Lookup column, so row serarch will work quite fast even for big tables. And here is PHP code:

<?
$mysqli = new mysqli("localhost", "root", "password", "test");

/* check connection */
if (mysqli_connect_errno()) {
    printf("Connect failed: %s\n", mysqli_connect_error());
    exit();
}

// Calculating lookup values
calcLookup();
// Getting maximum value for random generation
$Max = maxValue();

$rand = mt_rand(0, $Max);
$sql = "SELECT * FROM test_rand WHERE Lookup>=".$rand." LIMIT 1";
if ($result = $mysqli->query($sql)) {
	$Row = $result->fetch_object();
	var_dump($Row);
	
	$result->close();
}

$mysqli->close();

function calcLookup(){
    global $mysqli;

	$Weights = array();
    $sql = "SELECT ID, Weight FROM test_rand";
    if ($result = $mysqli->query($sql)) {
    	while ($Row = $result->fetch_object()){
    		$Weights[$Row->ID] = $Row->Weight;
    	}
    	$result->close();
	}
	
	$Cnt = 0;
	foreach($Weights as $ID=>$Weight){
		$Cnt+=$Weight;
		$sql = "UPDATE test_rand SET Lookup = ".$Cnt." WHERE ID=".$ID;
		$mysqli->query($sql);
	}
}

function maxValue(){
	global $mysqli;

    $sql = "SELECT SUM(Weight) AS TotWeight FROM test_rand";
    if ($result = $mysqli->query($sql)) {
    	$Row = $result->fetch_object();
    	$TotWeight = $Row->TotWeight;
	    $result->close();
	}
	
	return $TotWeight;
}

?>

I have tried to keep PHP code simple and clear, so I hope my description of the algorithm itself makes this piece of code clear to the reader. I should notice that it is not required to recalculate Lookup values before each search execution. It should not be recalculated even after new record insertion, because only Lookup for the recently added row should be calculated. It is very simple, as only requires to add weight of the new record to the maximum value in the column before insertion.

Important! After each row removal all Lookup values should be recalculated and this may me quite slow process for huge tables. So, I suggest to devide table data for smaller subsets, if possible.

Follow me on twitter and google+ to be informed about new articles. Do not hesitate to leave comments if you have ideas how to improve the algorithm.

Posted by:
Enjoyed this post? Share and Leave a comment below, thanks! :)