[C#] 產生相異亂數

C# 有 Random 物件和 .Next() 等方法產生亂數，但不像 Python 有 random.sample() 等可直接產生相異隨機數字的好用方法，就整理了一下幾個並比較效率。以下目標都是產生 start 到 end 之間（頭尾有包含），number 個的相異且隨機的整數，並成一個整數陣列。例如 start = 20 、end = 50、number = 10，會產生 {37, 25, 48, 20, 32, 50, 44, 42, 29, 46}

作法一：單純 for loop

利用 Array 的 .Contains() 方法，若產生的新數已經出現過，就將 index-- 來重取，直到迴圈結束：

public int[] RandomDistinctByForLoop(int start, int end, int number)
{
  Random rnd = new Random();

  int[] arrayResult = new int[number];
  for (int i = 0; i < number; i++)
  {
    int rndResult = rnd.Next(start, end + 1);
    if (arrayResult.Contains(rndResult))  // 若已存在
    {
      i--;                                // 使 for 不會步進，再重覆一次
    }
    else                                  // 若不存在
    {
      arrayResult[i] = rndResult;         // 存入亂數結果
    }
  }
  return arrayResult;
}

簡單好寫，缺點是如果 number 很大時，在後期會因為多數都已重覆所以耗費大量時間。

作法二：單純 while loop

和上面差不多，用 while 來思考，很容易想到的寫法，若產生的新數沒有出現過，就放入 Array，並將 index++ ，直到數量達成`:

public int[] RandomDistinctByWhileLoop(int start, int end, int number)
{
  Random rnd = new Random();
  int[] arrayResult = new int[number];
  int index = 0, rndResult;
  do
  {
    rndResult = rnd.Next(start, end + 1);
    if (!arrayResult.Contains(rndResult))       // 亂數結果不存在
    {
      arrayResult[index++] = rndResult;         // 才存入，且index遞增
    }
  } while (index < number);
  return arrayResult;
}

和上面相似，所以也有後期效率問題。

作法三：巢狀for配while loop

弄作業時同學去 google 到的寫法，沒有利用 array 的 .Contains() 方法而是多一個內層的 for loop 和 while 檢查是否已存在：

public int[] RandomDistinctByForAndWhile(int start, int end, int number)
{
  Random rnd = new Random();
  int[] arrayResult = new int[number];
  for(int i=0; i < number; i++)
  {
    arrayResult[i] = rnd.Next(start, end + 1);
    for(int j=0; j < i; j++)
    {
      while (arrayResult[j] == arrayResult[i])
      {
        arrayResult[i] = rnd.Next(start, end + 1);
        j = 0;
      }
    }
  }
  return arrayResult;
}

這個的效率糟到不行，下面會看到。

作法四：用 List 亂數取值同時移除掉

這是我寫踩地雷和貪食蛇時用的方法，亂數出 List 中元素的位置 index，取出來該位置的數放入 Array 就把 List 的移除掉：

using System.Collections.Generic;
public int[] RandomDistinctByListRemove(int start, int end, int number)
{
  Random rnd = new Random();
  List<int> pickList = new List<int>();
  int[] arrayResult = new int[number];
  for (int i = start; i < end + 1; i++) 
  {
    pickList.Add(i);                  // 先造出連續數列，也可以用 Enumerable ToList
  }
  for (int i = 0; i < number; i++)
  {
    int rndIndex = rnd.Next(pickList.Count);    // 用剩餘長度取亂數作為 index
    arrayResult[i] = pickList[rndIndex];        // 取出該 index 位置的數
    pickList.RemoveAt(rndIndex);                // 移除使 List 變短
  }
  return arrayResult;
}

這個作法的好處是當 number 很大，並不會有重覆一直重新取亂數的情形，每次都是從 List 剩下的數中取，又因為之前有砍掉所以剩下也絕對不會重覆。

作法五：用 Linq 排序

StackOverflow 上很有名的作法，簡潔又有效率，無可挑剔：

using System.Linq;
public int[] RandomDistinctByLinq(int start, int end, int number)
{
Random rnd = new Random(DateTime.Now.Millisecond);
return Enumerable.Range(start, end - start + 1)
            .OrderBy(x => rnd.Next(number))
            .Take(number)
            .ToArray();
}

Enumerable.Range() 產生一個連續序列，.OrderBy() 將各位置置換到亂數後的位置，.Take() 只取前 number 個出來。

效率

下面用 Stopwatch 物件的 .Start() 和 .Stop() 計算之間經過的時間，再印出 StopwarchObject.ElapseMilliseconds 經過的毫秒數，例如：

using System.Diagnostics;
var sw = new Stopwatch();
sw.Start();
RandomDistinctByForLoop(1000000, 2000000, 100000);
sw.Stop();
Console.WriteLine("by for loop: " + sw.ElapsedMilliseconds);
sw.Reset();

以上程式碼會印出第一個只用 for 的方法，在取出 1000000 到 2000000 之間的 100000 個相異隨機整數，所花費的毫秒數。下面測試取 100000 個和取 900000 個的效率，前者佔範圍的十分之一，後者佔範圍的十分之九：

方法	取100000個	取900000個
RandomDistinctByForLoop	4719	604648
RandomDistinctByWhileLoop	4671	601386
RandomDistinctByForAndWhile	10470	1833934
RandomDistinctByListRemove	4033	20928
RandomDistinctByLinq	263	290

單位：毫秒(千分之一秒)

當取的數量愈來愈接近整個範圍時，例如一百萬個數字之中取九十萬個數，前三個方法後期愈有嚴重的效率問題，雖然「已重覆就重新再取」的想法很簡單，若有取逼近全範圍數量的需求時就要考慮後面兩個做法。例如貪食蛇要隨機生成食物，在遊戲後期因為蛇的成長使得食物的位置嚴重受限，可能要使用「將可以的位置做出來，從中亂數取一位置」，而不是「亂數取所有位置其中一個，若是蛇的身體則重取」。

機率公平性

但是 Linq 的寫法與效率快雖快，取個幾次之後發現，似乎沒有那麼均勻，似乎前面偏多，不太容易出現範圍後半段的數字，我就試驗了一下統計各個方法取出來數字的機率。以 1 到 100 之間取 20 個相異整數為例，理論上每個數字被取到的機率是 C⁹⁹₁₉ / C¹⁰⁰₂₀ = 1/5 = 0.20，實際結果呢？我試驗了 10000 次，每次都 1 到 100 之間取 20 個相異整數，確實每個數字都很接近 2000 個，大約是 0.20 ，但仔細看會發現 Linq 真的在範圍前段比較容易出現，不容易取得後段的整數。（亂數物件都有使用 GUID 即 new Random(Guid.NewGuid().GetHashCode()) 使得密集取亂數的結果不致於偏頗）

靠 for loop 和 Array 的 .Contains() ：