\sectionnn{Introduction}

Bin packing is the process of packing a set of items of different sizes into
containers of a fixed capacity in a way that minimizes the number of containers
used. It has applications in many fields, such as logistics, where we want to
optimize the storage and transport of items in boxes, containers, trucks, etc.

Building mathematical models for bin packing is useful for understanding the
problem and for designing better algorithms, depending on the use case. An
algorithm optimized for packing cubes into boxes will not perform as well as
one designed for packing long items into trucks. Studying the mathematics
behind the algorithms gives us a better understanding of what works best.
When operating at scale, every small detail can have a huge impact on overall
efficiency and cost. Carefully developing algorithms based on solid
mathematical models is therefore crucial. As we have seen in our Automatics
class, a small logic flaw can become a serious issue in the long run for
systems that are supposed to run autonomously. This situation can be avoided
by using mathematical models during the design process, which leads to better
choices reconciling economic and reliability concerns.

We will conduct a probabilistic analysis of multiple algorithms and compare the
results to theoretical values. We will also consider the algorithms' complexity
and performance, both in resource consumption and in bin usage.

\clearpage
\section{Bin packing use cases}

Before studying the mathematics behind bin packing algorithms, we will have a
look at the motivations behind this project.

Bin packing has applications in many fields and makes it possible to automate
and optimize complex systems. We will illustrate this with examples focusing
on two use cases: logistics and computer science. We will consider examples in
several dimensions to show the versatility of bin packing algorithms.

\paragraph{} Nowadays, an effective supply chain relies on production lines
automated with sensors and actuators installed along conveyor belts, where a
packing procedure must often be implemented. All of this is controlled by a
computer system running continuously.

\subsection{3D : Containers}

Storing items in containers is a prime application of bin packing. These
three-dimensional objects of standardized size are used to transport goods.
While the dimensions of the containers are predictable, those of the
transported items are not. Storage is further complicated by the fact that
voids between items allow them to move around, and multiple types of items can
be stored in the same container.

There are many ways to optimize the storage of items in containers. For
example, by ensuring items are of an optimal standardized size, or by storing a
single type of item in each container, both of which eliminate the randomness
in item size. In these settings, it is easy to fill a container by treating the
items as rectangular blocks. However, when items come in pseudo-random
dimensions, it is intuitive to start filling the container with larger items
and then fill the remaining gaps with smaller items. As containers must be
closed, in the event of an overflow the remaining items must be stored in
another container.

\subsection{2D : Cutting stock problem}

In industries such as woodworking, bin packing algorithms are used to
minimize material waste when cutting large planks into smaller pieces of
desired sizes. Many tools rely on this two-dimensional cutting process. For
example, at the Fabric'INSA Fablab, the milling machine, the laser cutter and
many other machines are used to cut large planks of wood into smaller pieces
for student projects. In this scenario, we try to arrange the desired cuts in
a way that minimizes the unusable excess wood.

\begin{figure}[ht]
\centering
\includegraphics[width=0.65\linewidth]{graphics/fraiseuse.jpg}
\caption[]{Milling machine at the Fabric'INSA Fablab \footnotemark}
\label{fig:fraiseuse}
\end{figure}
\footnotetext{Photo courtesy of Inés Bafaluy}

The placement of items with complex shapes can be optimized by various
algorithms minimizing the waste of material.

\subsection{1D : Networking}

When managing network traffic at scale, efficiently routing packets is
necessary to avoid congestion, which leads to lower bandwidth and higher
latency. Say you're an internet service provider and your users are watching
videos on popular streaming platforms. You want to ensure that the traffic is
balanced between the different routes to minimize throttling and energy
consumption.

\paragraph{} We can consider the different routes as bins and the users'
bandwidth as the items. If a bin overflows, we can redirect the traffic to
another route. Using fewer bins means lower energy consumption and decreased
operating costs. This is a good example of bin packing in a dynamic
environment, where the items are constantly changing. Humans are not involved
in the process, as it is fast-paced and requires a high level of automation.

\vspace{0.4cm}

\paragraph{} We have seen multiple examples of how bin packing algorithms can
be used in various technical fields. In each of these examples, an algorithm
was chosen for its effectiveness and reliability, based on a probabilistic
analysis that adapts the algorithm to the use case. We will now conduct our
own analysis and study various algorithms and their probabilistic advantages,
focusing on one-dimensional bin packing, where we try to store items of
different heights in a linear bin.

\section{Next Fit Bin Packing algorithm (NFBP)}

Our goal is to study the number of bins $ H_n $ required to store $ n $ items
for each algorithm. We first consider the Next Fit Bin Packing algorithm, where
we store each item in the current bin if it fits; otherwise, we open a new bin.

\begin{figure}[h]
\centering
\begin{tikzpicture}[scale=0.8]
% Bins
\draw[thick] (0,0) rectangle (2,6);
\draw[thick] (3,0) rectangle (5,6);
\draw[thick] (6,0) rectangle (8,6);

% Items
\draw[fill=red] (0.5,0.5) rectangle (1.5,3.25);
\draw[fill=blue] (0.5,3.5) rectangle (1.5,5.5);
\draw[fill=green] (3.5,0.5) rectangle (4.5,1.5);
\draw[fill=orange] (3.5,1.75) rectangle (4.5,3.75);
\draw[fill=purple] (6.5,0.5) rectangle (7.5,2.75);
\draw[fill=yellow] (6.5,3) rectangle (7.5,4);

% Arrows
\draw[->, thick] (8.6,3.5) -- (7.0,3.5);
\draw[->, thick] (8.6,1.725) -- (7.0,1.725);

% Labels
\node at (1,-0.75) {Bin 0};
\node at (4,-0.75) {Bin 1};
\node at (7,-0.75) {Bin 2};
\node at (10.0,3.5) {Yellow item};
\node at (10.0,1.725) {Purple item};
\end{tikzpicture}
\caption{Next Fit Bin Packing example}
\label{fig:nfbp}
\end{figure}

\paragraph{} The example in figure \ref{fig:nfbp} shows the limitations of the
NFBP algorithm. The yellow item is stored in bin 2 even though it would fit in
bin 1, because the purple item is considered first and is too large to fit.

\paragraph{} Each bin will have a fixed capacity of $ 1 $ and items
will be of random sizes between $ 0 $ and $ 1 $.

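\paragraph{} To make the procedure concrete, here is a minimal Python sketch
of NFBP under these assumptions (unit capacity, uniform item sizes); the
\texttt{next\_fit} helper is ours, not the annex code :

\begin{lstlisting}[language=python]
from random import random

def next_fit(items):
    # Pack items into bins of capacity 1 with Next Fit.
    bins = [[]]    # start with a single open bin
    space = 1.0    # remaining capacity of the current bin
    for size in items:
        if size <= space:          # the item fits: store it
            bins[-1].append(size)
            space -= size
        else:                      # otherwise, open a new bin
            bins.append([size])
            space = 1.0 - size
    return bins

items = [random() for _ in range(50)]   # N = 50 items
H_n = len(next_fit(items))              # number of bins used
\end{lstlisting}
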
\subsection{Variables used in models}

We use the following variables in our algorithms and models :

\begin{itemize}

\item $ U_n $ : the size of the $ n $-th item. $ (U_n)_{n \in \mathbb{N^*}} $
denotes the mathematical sequence of random variables of uniform
distribution on $ [0, 1] $ representing the items' sizes.

\item $ T_i $ : the number of items in the $ i $-th bin.

\item $ V_i $ : the size of the first item in the $ i $-th bin.

\item $ H_n $ : the number of bins required to store $ n $ items.

\end{itemize}

Mathematically, the NFBP algorithm imposes the following constraint on the
first bin :

\begin{align*}
T_1 = k \iff & U_1 + U_2 + \ldots + U_{k} < 1 \\
\text{ and } & U_1 + U_2 + \ldots + U_{k+1} \geq 1 \qquad \text{ with } k \geq 1
\end{align*}

For example, $ T_1 = 2 $ means that the first two items fit in the first bin
($ U_1 + U_2 < 1 $) but the third one makes it overflow ($ U_1 + U_2 + U_3 \geq 1 $).

\subsection{Implementation and results}

We implemented the NFBP algorithm in Python \footnotemark, for its ease of use
and wide adoption. We used the \texttt{random} library to generate
random numbers between $ 0 $ and $ 1 $ and \texttt{matplotlib} to plot the
results as histograms.

\footnotetext{The code is available in Annex \ref{annex:probabilistic}}

We will approximate $ \mathbb{E}[T_i] $ and $ \mathbb{E}[V_i] $ with $
\overline{X_N} $ using $ {S_N}^2 $. This will be done for both $ R = 2 $ and
$ R = 10^5 $ simulations.

\[
\overline{X_N} = \frac{1}{N} \sum_{i=1}^{N} X_i
\]

As the true variance is unknown, we will use $ {S_N}^2 $ to estimate it and
then determine the Confidence Interval at the 95 \% level.

\begin{align*}
{S_N}^2 & = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \overline{X_N})^2 \\
IC_{95\%}(m) & = \left[ \overline{X_N} \pm \frac{S_N}{\sqrt{N}} \cdot t_{1 - \frac{\alpha}{2}, N-1} \right]
\end{align*}

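\paragraph{} As an illustration, here is a minimal sketch of how such an
interval can be computed from simulation results (the helper name is ours; the
Student quantile comes from \texttt{scipy}) :

\begin{lstlisting}[language=python]
from statistics import mean, stdev
from scipy.stats import t

def confidence_interval(samples, alpha=0.05):
    # Confidence interval at level 1 - alpha for the mean of samples.
    n = len(samples)
    x_bar = mean(samples)  # empirical mean
    s = stdev(samples)     # empirical standard deviation S_N
    half_width = s / n**0.5 * t.ppf(1 - alpha / 2, n - 1)
    return (x_bar - half_width, x_bar + half_width)
\end{lstlisting}
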
\paragraph{2 simulations} We first ran $ R = 2 $ simulations to observe the
behavior of the algorithm and the low precision of the results.

\begin{figure}[h]
\centering
\includegraphics[width=0.8\textwidth]{graphics/graphic-NFBP-Ti-2-sim}
\caption{Histogram of $ T_i $ for $ R = 2 $ simulations and $ N = 50 $ items (number of items per bin)}
\label{fig:graphic-NFBP-Ti-2-sim}
\end{figure}

On this graph (figure \ref{fig:graphic-NFBP-Ti-2-sim}), we can see each value
of $ T_i $. Our calculations yielded $ \overline{T_1} = 1.0 $ and $
{S_N}^2 = 2.7 $. Our Student coefficient is $ t_{0.975, 2} = 4.303 $.

We can now calculate the Confidence Interval for $ T_1 $ for $ R = 2 $ simulations :

\begin{align*}
IC_{95\%}(T_1) & = \left[ 1.0 \pm \frac{\sqrt{2.7}}{\sqrt{2}} \cdot 4.303 \right] \\
& = \left[ 1.0 \pm 5.0 \right]
\end{align*}

We can see that the Confidence Interval is very large, which is due to the low
number of simulations. Looking at figure \ref{fig:graphic-NFBP-Ti-2-sim}, we
easily notice the high variance.

\begin{figure}[h]
\centering
\includegraphics[width=0.8\textwidth]{graphics/graphic-NFBP-Vi-2-sim}
\caption{Histogram of $ V_i $ for $ R = 2 $ simulations and $ N = 50 $ items (size of the first item in a bin)}
\label{fig:graphic-NFBP-Vi-2-sim}
\end{figure}

On the graph of $ V_i $ (figure \ref{fig:graphic-NFBP-Vi-2-sim}), we can see
that the sizes are scattered pseudo-randomly between $ 0 $ and $ 1 $, which is
unsurprising given the low number of simulations. The process determining the
statistics is the same as for $ T_i $, yielding $ \overline{V_1} = 0.897 $, $
{S_N}^2 = 0.2 $ and $ IC_{95\%}(V_1) = \left[ 0.897 \pm 1.3 \right] $. In this
particular run, the two values for $ V_1 $ are high (being bounded between
$ 0 $ and $ 1 $).

\paragraph{100 000 simulations} In order to ensure better precision, we then
ran $ R = 10^5 $ simulations with $ N = 50 $ different items each.

\begin{figure}[h]
\centering
\includegraphics[width=0.8\textwidth]{graphics/graphic-NFBP-Ti-105-sim}
\caption{Histogram of $ T_i $ for $ R = 10^5 $ simulations and $ N = 50 $ items (number of items per bin)}
\label{fig:graphic-NFBP-Ti-105-sim}
\end{figure}

On this graph (figure \ref{fig:graphic-NFBP-Ti-105-sim}), we can see each value
of $ T_i $. Our calculations yielded $ \overline{T_1} = 1.72 $ and $
{S_N}^2 = 0.88 $. With this many simulations, the Student coefficient is
$ t_{0.975, R-1} \approx 1.96 $.

We can now calculate the Confidence Interval for $ T_1 $ for $ R = 10^5 $ simulations :

\begin{align*}
IC_{95\%}(T_1) & = \left[ 1.72 \pm \frac{\sqrt{0.88}}{\sqrt{10^5}} \cdot 1.96 \right] \\
& = \left[ 1.72 \pm 0.006 \right]
\end{align*}

We can see that the Confidence Interval is very small, thanks to the large
number of iterations. This results in a steady curve in figure
\ref{fig:graphic-NFBP-Ti-105-sim}.

\begin{figure}[h]
\centering
\includegraphics[width=0.8\textwidth]{graphics/graphic-NFBP-Vi-105-sim}
\caption{Histogram of $ V_i $ for $ R = 10^5 $ simulations and $ N = 50 $ items (size of the first item in a bin)}
\label{fig:graphic-NFBP-Vi-105-sim}
\end{figure}

\paragraph{Asymptotic behavior of $ H_n $} Finally, we analyzed how many bins
were needed to store $ n $ items. We used the numbers from the $ R = 10^5 $
simulations.

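\paragraph{} A minimal sketch of how this measurement can be made (our
variable names, not the annex code): we pack the items one by one and record
$ H_n $ after each item, averaging over the $ R $ runs.

\begin{lstlisting}[language=python]
from random import random

R, N = 10**5, 50
totals = [0] * (N + 1)
for _ in range(R):
    bins, space = 1, 1.0         # current H_n and remaining capacity
    for n in range(1, N + 1):
        size = random()
        if size <= space:        # the item fits in the current bin
            space -= size
        else:                    # otherwise, open a new bin
            bins += 1
            space = 1.0 - size
        totals[n] += bins        # accumulate H_n over the runs
avg_H = [totals[n] / R for n in range(1, N + 1)]  # estimates of E[H_n]
\end{lstlisting}
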
\section{Next Fit Dual Bin Packing algorithm (NFDBP)}

Next Fit Dual Bin Packing is a variation of NFBP in which we allow the bins to
overflow: the item that makes a bin overflow is stored in that bin, which is
then closed. Every bin must therefore be filled completely, unless it is the
last bin.

\begin{figure}[h]
\centering
\begin{tikzpicture}[scale=0.8]
% Bins
\draw[thick] (0,0) rectangle (2,6);
\draw[thick] (3,0) rectangle (5,6);

% Transparent Tops
\fill[white,opacity=1.0] (0,5.9) rectangle (2,6.5);
\fill[white,opacity=1.0] (3,5.9) rectangle (5,6.5);

% Items
\draw[fill=red] (0.5,0.5) rectangle (1.5,3.25);
\draw[fill=blue] (0.5,3.5) rectangle (1.5,5.5);
\draw[fill=green] (0.5,5.75) rectangle (1.5,6.75);
\draw[fill=orange] (3.5,0.5) rectangle (4.5,2.5);
\draw[fill=purple] (3.5,2.75) rectangle (4.5,5.0);
\draw[fill=yellow] (3.5,5.25) rectangle (4.5,6.25);

% Labels
\node at (1,-0.75) {Bin 0};
\node at (4,-0.75) {Bin 1};
\end{tikzpicture}
\caption{Next Fit Dual Bin Packing example}
\label{fig:nfdbp}
\end{figure}

\paragraph{} The example in figure \ref{fig:nfdbp} shows how NFDBP uses
fewer bins than NFBP, due to less stringent constraints. The top of the bin is
effectively removed, allowing an extra item to be stored in the bin. We can
easily see that with NFDBP each bin contains at least two items.

\paragraph{} The variables used are the same as for NFBP. Mathematically, the
new constraints on the first bin can be expressed as follows :

\begin{align*}
T_1 = k \iff & U_1 + U_2 + \ldots + U_{k-1} < 1 \\
\text{ and } & U_1 + U_2 + \ldots + U_{k} \geq 1 \qquad \text{ with } k \geq 2
\end{align*}

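\paragraph{} For comparison with NFBP, here is a minimal Python sketch of
NFDBP under the same assumptions (the \texttt{next\_fit\_dual} helper is ours,
not the annex code) :

\begin{lstlisting}[language=python]
from random import random

def next_fit_dual(items):
    # Pack items with Next Fit Dual: a bin is closed once it overflows.
    bins = [[]]
    level = 0.0            # total size stored in the current bin
    for size in items:
        bins[-1].append(size)
        level += size
        if level >= 1.0:   # the bin just overflowed: close it
            bins.append([])
            level = 0.0
    return [b for b in bins if b]   # drop a trailing empty bin

items = [random() for _ in range(50)]   # N = 50 items
H_n = len(next_fit_dual(items))         # number of bins used
\end{lstlisting}
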
\subsection{The big proof}

Let $ k \geq 2 $. Let $ (U_n)_{n \in \mathbb{N}^*} $ be a sequence of
independent random variables with uniform distribution on $ [0, 1] $,
representing the size of the $ n $-th item.

Let $ i \in \mathbb{N} $. $ T_i $ denotes the number of items in the $ i $-th
bin. We write the condition for the first bin; every bin behaves the same way,
since each new bin starts with a fresh, independent sequence of items. We have
that

\begin{equation*}
T_i = k \iff U_1 + U_2 + \ldots + U_{k-1} < 1 \text{ and } U_1 + U_2 + \ldots + U_{k} \geq 1
\end{equation*}

Let $ A_k = \{ U_1 + U_2 + \ldots + U_{k} < 1 \} $. Hence,

\begin{align}
\label{eq:prob}
P(T_i = k)
& = P(A_{k-1} \cap A_k^c) \\
& = P(A_{k-1}) - P(A_k) \qquad \text{ (as $ A_k \subset A_{k-1} $)}
\end{align}

We will show that $ \forall k \geq 1 $, $ P(A_k) = \frac{1}{k!} $. To do so,
we will use induction to prove the following proposition \eqref{eq:induction},
$ \forall k \geq 1 $:

\begin{equation}
\label{eq:induction}
\tag{$ \mathcal{H}_k $}
P(U_1 + U_2 + \ldots + U_{k} < a) = \frac{a^k}{k!} \qquad \forall a \in [0, 1].
\end{equation}

Let us denote $ S_k = U_1 + U_2 + \ldots + U_{k} \qquad \forall k \geq 1 $.

\paragraph{Base case} $ k = 1 $ : $ P(U_1 < a) = a = \frac{a^1}{1!} $, proving $ (\mathcal{H}_1) $.

\paragraph{Induction step} Let $ k \geq 2 $. We assume $ (\mathcal{H}_{k-1}) $ is
true. We will show that $ (\mathcal{H}_{k}) $ is true.

\begin{align*}
P(S_k < a) & = P(S_{k-1} + U_k < a) \\
& = \iint_{\mathcal{D}} f_{S_{k-1}, U_k}(x, y) \, dx \, dy \\
\text{where } \mathcal{D} & = \{ (x, y) \in [0, 1]^2 \mid x + y < a \} \\
& = \{ (x, y) \in [0, 1]^2 \mid 0 < x < a \text{ and } 0 < y < a - x \} \\
P(S_k < a) & = \iint_{\mathcal{D}} f_{S_{k-1}}(x) \cdot f_{U_k}(y) \, dx \, dy \qquad
\text{because $ S_{k-1} $ and $ U_k $ are independent} \\
& = \int_{0}^{a} f_{S_{k-1}}(x) \cdot \left( \int_{0}^{a-x} f_{U_k}(y) \, dy \right) dx
\end{align*}

$ (\mathcal{H}_{k-1}) $ gives us that $ \forall x \in [0, 1] $,
$ F_{S_{k-1}}(x) = P(S_{k-1} < x) = \frac{x^{k-1}}{(k-1)!} $.

By differentiating, we get that $ \forall x \in [0, 1] $,

\[
f_{S_{k-1}}(x) = F'_{S_{k-1}}(x) = \frac{x^{k-2}}{(k-2)!}
\]

Furthermore, $ U_k $ is uniformly distributed on $ [0, 1] $, so
$ f_{U_k}(y) = 1 $. We can then integrate by parts :

\begin{align*}
P(S_k < a)
& = \int_{0}^{a} f_{S_{k-1}}(x) \cdot \left( \int_{0}^{a-x} 1 \, dy \right) dx \\
& = \int_{0}^{a} f_{S_{k-1}}(x) \cdot (a - x) \, dx \\
& = a \int_{0}^{a} f_{S_{k-1}}(x) \, dx - \int_{0}^{a} x f_{S_{k-1}}(x) \, dx \\
& = a \int_0^a F'_{S_{k-1}}(x) \, dx - \left[ x F_{S_{k-1}}(x) \right]_0^a
+ \int_{0}^{a} F_{S_{k-1}}(x) \, dx \qquad \text{(by parts: $ x $ and $ F_{S_{k-1}} $ are $ C^1([0,1]) $)} \\
& = a \left[ F_{S_{k-1}}(x) \right]_0^a - \left[ x F_{S_{k-1}}(x) \right]_0^a
+ \int_{0}^{a} \frac{x^{k-1}}{(k-1)!} \, dx \\
& = \left[ \frac{x^k}{k!} \right]_0^a \\
& = \frac{a^k}{k!}
\end{align*}

\paragraph{Conclusion} We have shown that $ (\mathcal{H}_{k}) $ is true, so by induction, $ \forall k \geq 1 $,
$ \forall a \in [0, 1] $, $ P(U_1 + U_2 + \ldots + U_{k} < a) = \frac{a^k}{k!} $. Take
$ a = 1 $ to get

\[ P(U_1 + U_2 + \ldots + U_{k} < 1) = \frac{1}{k!} \]

Finally, plugging this into \eqref{eq:prob} gives us

\[
P(T_i = k) = P(A_{k-1}) - P(A_{k}) = \frac{1}{(k-1)!} - \frac{1}{k!} \qquad \forall k \geq 2
\]

As a sanity check, these probabilities telescope: summing over $ k \geq 2 $
gives $ \sum_{k=2}^{\infty} P(T_i = k) = \frac{1}{1!} = 1 $.

\subsection{Expected value and variance of $ T_i $}

We now compute the expected value $ \mu $ and variance $ \sigma^2 $ of $ T_i $.

\begin{align*}
\mu = E(T_i) & = \sum_{k=2}^{\infty} k \cdot P(T_i = k) \\
& = \sum_{k=2}^{\infty} \left( \frac{k}{(k-1)!} - \frac{1}{(k-1)!} \right) \\
& = \sum_{k=2}^{\infty} \frac{k-1}{(k-1)!} \\
& = \sum_{k=2}^{\infty} \frac{1}{(k-2)!} \\
& = \sum_{k=0}^{\infty} \frac{1}{k!} \\
& = e
\end{align*}

\begin{align*}
E({T_i}^2) & = \sum_{k=2}^{\infty} k^2 \cdot P(T_i = k) \\
& = \sum_{k=2}^{\infty} \left( \frac{k^2}{(k-1)!} - \frac{k}{(k-1)!} \right) \\
& = \sum_{k=2}^{\infty} \frac{(k-1)k}{(k-1)!} \\
& = \sum_{k=2}^{\infty} \frac{k}{(k-2)!} \\
& = \sum_{k=0}^{\infty} \frac{k+2}{k!} \\
& = \sum_{k=1}^{\infty} \frac{1}{(k-1)!} + \sum_{k=0}^{\infty} \frac{2}{k!} \\
& = e + 2e \\
& = 3e
\end{align*}

\begin{align*}
\sigma^2 = E({T_i}^2) - E(T_i)^2 = 3e - e^2 \approx 0.766
\end{align*}

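\paragraph{} A quick Monte Carlo sanity check of these two values (a sketch
under the same assumptions, with our own variable names) :

\begin{lstlisting}[language=python]
from random import random
from statistics import mean, variance

def first_bin_count():
    # Number of items in an NFDBP bin: add items until the bin overflows.
    total, k = 0.0, 0
    while total < 1.0:
        total += random()
        k += 1
    return k

samples = [first_bin_count() for _ in range(10**5)]
print(mean(samples))      # close to e = 2.718...
print(variance(samples))  # close to 3e - e^2 = 0.766...
\end{lstlisting}
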
\section{Complexity and implementation optimization}

Both the NFBP and NFDBP algorithms have a linear complexity $ O(n) $, as we
only need to iterate over the items once. While the algorithms themselves are
linear, calculating the statistics may not be. In this section, we will
discuss how to optimize the implementation of the statistical analysis.

\subsection{Performance optimization}

When implementing the statistical analysis, the intuitive way to do it is to
run $ R $ simulations, store the results, then conduct the analysis. However,
when running a large number of simulations, this can consume a lot of memory.
We can optimize the process by computing the statistics on the fly, using
running sums. This requires nearly constant memory, as we only need to store
the current sum and the current sum of squares of the different variables.

While the mean can easily be calculated by summing then dividing, the empirical
variance can be calculated using the following formula:

\begin{align*}
{S_N}^2 & = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \overline{X})^2 \\
& = \frac{1}{N-1} \sum_{i=1}^{N} X_i^2 - \frac{N}{N-1} \overline{X}^2
\end{align*}

The sum $ \sum_{i=1}^{N} X_i^2 $ can be accumulated iteratively after each
simulation, and the $ \frac{1}{N-1} $ factor applied at the end.

\subsection{Effective resource consumption}

We set out to study the resource consumption of the algorithms. We implemented
the above formulae to calculate the mean and variance of $ N = 10^6 $ random
numbers. We wrote the following algorithms \footnotemark :

\footnotetext{The full code used to measure performance can be found in Annex \ref{annex:performance}.}

\paragraph{Intuitive algorithm} Store values first, calculate later

\begin{lstlisting}[language=python]
from random import random
from statistics import mean, variance

N = 10**6
values = [random() for _ in range(N)]  # store all N values
mean_value = mean(values)
variance_value = variance(values)
\end{lstlisting}

Execution time : $ 4.8 $ seconds

Memory usage : $ 32 $ MB

\paragraph{Improved algorithm} Continuous calculation

\begin{lstlisting}[language=python]
from random import random

N = 10**6
Tot = 0.0   # running sum of the values
Tot2 = 0.0  # running sum of the squared values
for _ in range(N):
    item = random()
    Tot += item
    Tot2 += item ** 2
mean = Tot / N
# unbiased variance, following the formula derived above
variance = Tot2 / (N-1) - N / (N-1) * mean**2
\end{lstlisting}

Execution time : $ 530 $ milliseconds

Memory usage : $ 1.3 $ kB

\paragraph{Analysis} Memory usage is, as expected, much lower when calculating
the statistics on the fly. What we had not anticipated is the difference in
execution time: the improved algorithm is nearly 10 times faster than the
intuitive one. This can be explained by the time taken to allocate memory and
then calculate the statistics (which iterates multiple times over the array).
\footnotemark

\footnotetext{Performance was measured on a single computer and will vary
between devices. Execution time and memory usage do not include the import of
libraries.}

\subsection{NFBP vs NFDBP}
\subsection{Optimal algorithm}
\sectionnn{Conclusion}
\nocite{bin-packing-approximation:2022}
\nocite{hofri:1987}