Splitting Methods
They differ primarily in the techniques used to split an overflowing node
during insertion. The goal is to minimize coverage and overlap. These
goals are at times contradictory and thus heuristics are often used. The
node splitting algorithms described below range from being quite simple
(e.g., exhaustive search) to being fairly complicated (e.g., R*-tree [Beck90c]). Some just split the
overflowing node, while others try to reinsert some of the objects and
nodes from the overflowing nodes thereby striving for better overall
behavior (e.g., reduction in coverage and overlap).
- Exhaustive Search:
If the R-tree node overflows, then try all possible ways of
splitting the node into two new nodes that satisfy the requirements on
the minimal and maximal number of children that can be stored in a
node.
Choose the split which causes bounding boxes of the two new nodes to
have the smallest area [Gutt84].
Thus the goal of this method is to reduce the coverage.
This method is quite complex as it tries all possibilities.
- Quadratic method:
Examine all the children of the overflowing node and find the pair of
bounding boxes that would waste the most area were they to be inserted
in the same node.
This is determined by subtracting the sum of the areas of the two
bounding boxes from the area of the covering bounding box.
These two bounding boxes are placed in separate nodes, say j
and k .
The set of remaining bounding boxes are examined and the bounding box
i whose addition maximizes the difference in coverage between
the bounding boxes associated with j and k is added
to the node whose coverage is minimized by the addition.
This process is reapplied to the remaining bounding boxes [Gutt84].
This method takes quadratic time.
- Linear Method:
Find the two bounding boxes with the greatest normalized separation along
both axes, and split along this axis. The remaining bounding boxes in the
node are assigned to the nodes whose covering bounding box is increased
the least by the addition [Gutt84].
This method takes linear time.
- R*-tree:
The R*-tree [Beck90c] is a name
given to a variant of the
R-tree which makes use of the most complex of the node splitting
algorithms.
The algorithm differs from the other algorithms as it attempts to
reduce both overlap and coverage.
In particular, the primary focus is on reducing overlap with ties
broken by favoring the splits that reduce the coverage by using the
splits that minimize the perimeter of the bounding boxes of the
resulting nodes.
In addition, when a node a overflows, instead of immediately
splitting a , an attempt is made first to see if some of the
objects
in a could possibly be more suited to being in another node.
This is achieved by reinserting a fraction (30% has been
found to yield good performance [Beck90c]) of these objects in
the tree (termed forced reinsertion).
The node is only split if it has been found to overflow after
reinsertion has taken place.
This method is quite complex.
- Ang/Tan Method:
Form four sets, one for each face (i.e., side) of the overflowing
node a .
Associate each bounding box o in a with the closest
face of a in each of the two dimensions.
Once the four sets representing four partitions have been constructed
(i.e., each bounding box o has been associated with two sets),
select the partition that ensures the most even distribution of
bounding boxes.
In case of a tie, choose the partition with the least overlap.
In case of another tie, choose the partition with the least coverage.
This method takes linear time [Ang97].
- Hilbert Nonpacked Method:
Order the objects on the basis of the Peano-Hilbert number (e.g.,
[Same90a]) corresponding to
their centroid [Kame94]. The objects are stored in a B+-tree.
- Morton Nonpacked Method:
Order the objects on the basis of the Morton number (e.g., [Same90a]) corresponding to their
centroid [Whit82]. This number is
obtained by interleaving the x and y coordinate values of
the centroid. The objects are stored in a B+-tree.
- Packed Method:
Order the objects on the basis of some criterion [Rous85].
It has two stages. In our implementation, the first stage orders the
centroids of the objects in Peano-Hilbert order.
The second stage fills the leaf nodes by examining the objects in
increasing Peano-Hilbert order of their centroids (obtained by the
first stage).
Each leaf node is filled with the first unprocessed object and its
M-1 nearest neighbors (in the space in which the objects lie)
which have not yet been inserted in other leaf nodes.
Once an entire level of the packed R-tree has been obtained, the
algorithm is reapplied to add nodes at the next level using the same
nearest neighbor criterion, terminating when a level contains just one
node. The only difference between the ordering that is applied at the
levels containing the nonleaf nodes from that used at the level of the
leaf nodes is that in the former case we are ordering the bounding
boxes while in the latter case we are ordering the actual objects.
- Hilbert Packed Method:
Order the objects on the basis of the Peano-Hilbert number
corresponding to their centroid [Kame93].
Once this ordering has been obtained, the leaf nodes are filled to
capacity by examining the objects in increasing order.
The nodes at the remaining levels are ordered according to the time at
which they were created.
- Morton Packed R-tree:
Order the objects on the basis of the Morton number corresponding to
their centroid (e.g., [Kame93]).
This number is obtained by interleaving the x and y
coordinate values of the centroid. Once this ordering has been
obtained, the leaf nodes are filled to capacity by examining the
objects in increasing order. The nodes at the remaining levels are
ordered according to the time at which they were created.