Musings

I'm just copying my father

Thesis Appendix: Latin Hypercube

First Published: 2024 January 8

Draft 2: 8 January 2024

Ok so I missed a few days of posting there. However, I do want to make some progress on my thesis work tonight, even if it is only an appendix. So, let’s revise RebelFit Latin Hypercube.1

One of the fundamental algorithms in Rebelfit is Latin Hypercube Sampling. It was initially described by () at () as a method for sampling a space sparsely but effectively.

It is sometimes difficult to conceptualize sampling high dimensional spaces. For instance, if one wanted to take five samples in each dimension and do it as a grid, the number of samples grows exponentially as the number of dimensions increases. By the time that one reaches 8 dimensions (the number of variables in a Reduced Rotational Hamiltonian with Quartic Centrifugal Distortion), a grid five samples wide would require nearly four hundred thousand samples.
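To make that growth concrete, here is a quick sketch of the arithmetic. The function name `grid_size` is just something I made up for illustration; it is not part of Rebelfit.

```python
def grid_size(samples_per_axis: int, dimensions: int) -> int:
    """Number of points in a full grid: one factor of samples_per_axis per dimension."""
    return samples_per_axis ** dimensions


# Five samples per axis, from 1 up to 8 dimensions.
for m in range(1, 9):
    print(f"{m} dimensions: {grid_size(5, m):,} samples")
```

The last line printed is the 8 dimensional case, 5^8 = 390,625, which is the "nearly four hundred thousand" above.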

Another instinctual method for sampling a high dimensional space is to use a random sample. Or, at least, a pseudo-random sample (see appendix: randomness for more details). Latin Hypercube is demonstrably better at mapping a space than random sampling, though that improvement is reduced as dimensions and numbers of samples increase.2

The name Latin Hypercube comes from the concept of a Latin Square, which is inspired by the work of Leonhard Euler.3 The method does include some aspects of randomness. In each of M dimensions, the sample space is cut into N slices, where N is the total number of samples desired. The slices can be defined in any arbitrary way, but the most common choices are even spacings and a Bayesian distribution based on prior knowledge. Once each dimension is sliced, a point is randomly picked within each of the N slices in each dimension, and the picks from the different dimensions are randomly paired together to form the N samples. Unlike random sampling, this ensures that values are measured over the entire range of each variable. Without randomly correlating the dimensions, you get the trivial Latin Hypercube, where you sample along the hyperhypotenuse.
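The construction described above can be sketched in a few lines of plain Python. This is a minimal sketch, not the Rebelfit implementation: it assumes evenly spaced slices over the unit hypercube [0, 1), and the names `latin_hypercube` and `slices_hit` are hypothetical helpers invented for this example.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=None):
    """Return n_samples points, each a list of n_dims coordinates in [0, 1).

    Assumes evenly spaced slices; other slicing schemes are not handled here.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One random point inside each of the N even slices of this dimension.
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so the slices of this dimension are randomly paired with
        # the slices of the others (skipping this gives the trivial diagonal).
        rng.shuffle(column)
        columns.append(column)
    # Transpose: point k takes the k-th entry of every dimension's column.
    return [list(point) for point in zip(*columns)]


def slices_hit(points):
    """Per dimension, count how many of the N even slices contain a point."""
    n = len(points)
    n_dims = len(points[0])
    return [len({int(p[d] * n) for p in points}) for d in range(n_dims)]


# Compare slice coverage against plain pseudo-random sampling.
lhs = latin_hypercube(8, 3, seed=1)
rand = random.Random(1)
uniform = [[rand.random() for _ in range(3)] for _ in range(8)]
print("Latin Hypercube slices hit:", slices_hit(lhs))
print("Random sampling slices hit:", slices_hit(uniform))
```

By construction the Latin Hypercube hits every one of the N slices in every dimension, while plain random sampling will usually leave some slices empty and double up in others.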

Draft 1: 31 December 2023

One of the fundamental algorithms in Rebelfit is the Latin Hypercube. It was initially described by () at () as a method of more effectively sparsely sampling a space.

Its name comes from the concept of a Latin Square, which is an arrangement of rooks on a grid such that none can take each other. As with all sampling methods that are better than random sampling, Latin Hypercube sampling becomes less better4 than random sampling as the number of samples or the number of dimensions increases.

In order to construct a Latin Hypercube, you need to define the number of dimensions (M) and the number of samples (N) that you will use. In the general form of the algorithm, each of the dimensions is then cut into N different wells, and a point within each well is randomly selected.5 Each dimension’s wells are then randomly assigned to each other, so that you don’t just sample around the n dimensional hyperhypotenuse.6

Latin Hypercube’s benefits are best demonstrated by comparison to other common sampling methods. If we compare to a Grid Search, where we want to sample 10 different points in each dimension, that requires a million data points by the time that we reach the 6th dimension. With a Latin Hypercube, the number of samples N stays constant over any number of dimensions you add.


  1. n.b. the previous draft probably doesn’t need to be read, because I’ve actually revised rather than wholesale rewritten↩︎

  2. I forget where I read that, but it does seem to be common knowledge at this point↩︎

  3. cite wikipedia unless I can find a better source/read the original source↩︎

  4. oof that’s rough↩︎

  5. see the appendix on randomness↩︎

  6. I know that’s not the real word, but it feels appropriate↩︎