Motion Estimation Search Techniques: Full Search
Algorithms on fast motion estimation are always hot research spot, especially fast integer pel motion
estimation has achieved much more attention because traditional fractional pel motion estimation (such as 1/2-pel) only take a very few proportion in the computation load of whole motion estimation. In this post I wanted to put the information I have studying on the Search Techniques used in different implementations of Motion Estimation of the H.264 encoder. I think this helps as quick reference for any Video Engineer to understand the algorithms. I will keep on post on different search techniques as keep reading.
Motion Estimation: ME is defined as searching the best motion vector (MV), which is the displacement of the coordinate of the best similar block within a given search range in previously-encoded video frames for a block in the current video frame.
Matching Error: Is the sum of absolute differences (SAD) between the reference and the candidate block.
Figure-1 shows the Motion Estimation flow in reference software- JM16.0.
 
Figure-1: Motion Estimation Flow in JM
Full Search Technique
In Block-matching full search (FS) algorithms, the matching errors will be calculated for all candidate points within the predefined search range in the reference frame. The MV that gives minimum matching error will be taken as the best MV. The calculation is huge.
These are the calculation I did for a search window with widths and heights defined as max_x, max_y, min_x, min_y. This calculation is based on the JM16.0 reference software code.
search_range = max_x;
Total search points the computation will be done = (2* search_range+1)* (2* search_range+1)
Figure-2: 9 Sets of Reference and Current blocks for search range = 1
SAD Computation: 
The following code snippet is from the JM reference software.
   for (y=0; y
   {
for (x = 0; x <>
    {
      mcost += iabs( *src_line++ - *ref_line++ );
      mcost += iabs( *src_line++ - *ref_line++ );
      mcost += iabs( *src_line++ - *ref_line++ );
      mcost += iabs( *src_line++ - *ref_line++ );
    }
    if (mcost >= min_mcost) return mcost;
    ref_line += pad_size_x;
   }
As we can see, the SAD computation for one set of current (source) block and reference block will have,
blocksize_x*blocksize_y       Additions
blocksize_x*blocksize_y       Subtractions
blocksize_x*blocksize_y       Absolute Computations
2*blocksize_x*blocksize_y     Address Generations
2*blocksize_x*blocksize_y     Loads
blocksize_x*blocksize_y       Stores
*The loop overhead is not included
So even for a typical DSP processor with ABS instruction in it takes more than 8*blocksize_x* blocksize_y cycles (assuming single cycle per operation) for one set. As the figure shows there will be nine sets for search range equal to one. So the total operations will be 72* blocksize_x* blocksize_y. For block size of 16, and search range as 1, there will be 18462 operations for only SAD computations. HD resolution sequences need to have higher search ranges. 
All the above computations are for only Integer Motion Vector Estimation (IME). After IME for a selected block size, the Fractional Pixel Motion Estimation (FME) will be carried out around the estimated integer motion vector. This process is called as Half-Pel Refinement and Quarter-Pel Refinement. Again the SADs will be computed for a search window which is an area bounded by eight neighbor integer pels positions around the best integer pel position. For 1/4-pel case, the search range is three 1/4-pel units. In generation of these fractional pel positions, a 6 tap filter is used to produce the 1/2-pel positions, 1/4-pel positions are produced by linear interpolation. 

Comments
Post a Comment