Searching Operations

Apache C++ Standard Library User's Guide

13.3 Searching Operations

The next category of algorithms we describe are used to locate elements within a sequence that satisfy certain properties. Most commonly the result of a search is then used as an argument to a further operation, for example, std::copy() (Section 13.2.2), std::partition() (Section 13.4.4), or std::inplace_merge() (Section 13.4.6.)

NOTE -- The example functions described in the following sections are in the file alg2.cpp.

The searching routines described in this section return an iterator that identifies the first element that satisfies the search condition. It is common to store this value in an iterator variable, as follows:

std::list<int>::iterator where;
where = std::find(aList.begin(), aList.end(), 7);

If you want to locate all the elements that satisfy the search conditions, you must write a loop. In that loop, the value yielded by a previous search is first advanced, since otherwise the value yielded by the previous search would once again be returned. The resulting value is then used as a starting point for the new search. For example, the following loop from the std::adjacent_find() example program (Section 13.3.2) will print the value of all repeated characters in a string argument:

while ((where = std::adjacent_find(where, stop)) != stop) {
  std::cout << "double " << *where << " in position " 
            << where - start << std::endl;
  ++where;
}

NOTE -- The searching algorithms in the C++ Standard Library all return the end-of-sequence iterator if no value is found that matches the search condition. As it is generally illegal to dereference the end-of-sequence value, it is important to check for this condition before proceeding to use the result of a search.

Many of the searching algorithms have an optional argument that can specify a function for comparing elements in place of the equality operator==() for the container element type. In the descriptions of the algorithms, we write these optional arguments inside square brackets to indicate they need not be specified if the standard equality operator is acceptable.

13.3.1 Find an Element Satisfying a Condition

There are two algorithms, std::find() and std::find_if(), that are used to find the first element that satisfies a condition. The declarations of these two algorithms are as follows:

namespace std {
  InputIterator find_if(InputIterator first,
                        InputIterator last, Predicate);

  InputIterator find(InputIterator first, 
                     InputIterator last, const T&);
}

The algorithm std::find_if() takes as argument a predicate function, which can be any function that returns a boolean value (see Section 3.3). std::find_if() returns a new iterator that designates the first element in the sequence that satisfies the predicate. The second argument, the past-the-end iterator, is returned if no element is found that matches the requirement. Because the resulting value is an iterator, the dereference operator * must be used to obtain the matching value. This is illustrated in the example program.

The second form of the algorithm, std::find(), replaces the predicate function with a specific value, and returns the first element in the sequence that tests equal to this value, using operator==() for the given datatype.

NOTE -- The algorithms find() and find_if() perform a linear sequential search through the associated structures. The set and map data structures, which are ordered, provide their own find() member functions, which are more efficient. Because of this, the generic find() algorithm should not be used with set and map.

The following example program illustrates the use of std::find() and std::find_if():

void find_test ()
// illustrates the use of the find algorithm
// see alg2.cpp for complete source code
{
   int vintageYears[] = {1967, 1972, 1974, 1980, 1995};
   int *start = vintageYears;
   int *stop = start + 5;
   int *where = std::find_if(start, stop, isLeapYear);

   if (where != stop)
      std::cout << "first vintage leap year is "
                << *where << std::endl;
   else
      std::cout << "no vintage leap years" << std::endl;

   where = std::find(start, stop, 1995);

   if (where != stop)
      std::cout << "1995 is position " << where - start 
                << " in sequence" << std::endl;
   else
      std::cout << "1995 does not occur in sequence" << std::endl;
}

13.3.2 Find Consecutive Duplicate Elements

The std::adjacent_find() algorithm is used to discover the first element in a sequence equal to the next immediately following element. For example, if a sequence contained the values 1 4 2 5 6 6 7 5, the algorithm would return an iterator corresponding to the first 6 value. If no value satisfying the condition is found, then the end-of-sequence iterator is returned. The declaration of the algorithm is as follows:

namespace std {
  ForwardIterator adjacent_find (ForwardIterator first, 
                                 ForwardIterator last
                                 [, BinaryPredicate ] );
}

The first two arguments specify the sequence to be examined. The optional third argument must be a binary predicate, a binary function returning a boolean value. If present, the binary function is used to test adjacent elements, otherwise the equality operator, operator==(), is used.

The example program searches a text string for adjacent letters. In the example text these are found in positions 5, 7, 9, 21, and 37. The increment is necessary inside the loop in order to avoid the same position being discovered repeatedly.

void adjacent_find_example ()
// illustrates the use of the adjacent_find instruction
// see alg2.cpp for complete source code
{
   char *text = "The bookkeeper carefully opened the door.";

   char *start = text;
   char *stop = text + strlen(text);
   char *where = start;

   std::cout << "In the text: " << text << std::endl;
   while ((where = std::adjacent_find(where, stop)) != stop) {
      std::cout << "double " << *where 
                << " in position " << where - start << std::endl;
      ++where;
   }
}

13.3.3 Find the First Occurrence of Any Value from a Sequence

The algorithm std::find_first_of() is used to find the first occurrence of some element from one sequence that is also contained in another sequence:

namespace std {
  ForwardIterator1 find_first_of 
    (ForwardIterator1 first1, ForwardIterator1 last1,
     ForwardIterator2 first2, ForwardIterator2 last2
     [, BinaryPredicate pred ] );
}

The algorithm returns a new iterator that designates the first element found in [first1,last1) that is also contained in [first2,last2). If no match is found, the second argument is returned. Because the resulting value is an iterator, the dereference operator * must be used to obtain the matching value. This is illustrated in the example program:

void find_test ()
// illustrates the use of the find algorithm
// see alg2.cpp for complete source code
{
   int vintageYears[] = {1967, 1972, 1974, 1980, 1995};
   int requestedYears[] = {1923, 1970, 1980, 1974 };
   int *start = vintageYears;
   int *stop = start + 5;
   int *where = find_first_of(start, stop, requestedyears,
                              requestedyears+4);

   if (where != stop)
      std::cout << "first requested vintage year is "
                << *where << std::endl;
   else
      std::cout << "no requested vintage years" << std::endl;

}

// The output would indicate 1974.

Note that this algorithm, unlike many that manipulate two sequences, uses a starting and ending iterator pair for both sequences, not just the first sequence.

Like the algorithms std::equal() and std::mismatch(), an alternative version of std::find_first_of() takes an optional binary predicate that is used to compare elements from the two sequences.

The basic_string class provides its own versions of the find_first_of and find_end algorithms, including several convenience overloads of the basic pattern indicated here.

13.3.4 Find a Sub-Sequence within a Sequence

The algorithms std::search() and std::search_n() are used to locate the beginning of a particular sub-sequence within a larger sequence. The easiest example to understand is the problem of looking for a particular substring within a larger string, although the algorithm can be generalized to other uses. The arguments are assumed to have at least the capabilities of forward iterators.

namespace std {
  ForwardIterator search 
         (ForwardIterator first1, ForwardIterator last1, 
          ForwardIterator first2, ForwardIterator last2 
          [, BinaryPredicate ]);
}

Suppose, for example, that we want to discover the location of the string "ration" in the string "dreams and aspirations". The solution to this problem is shown in the example program. If no appropriate match is found, the value returned is the past-the-end iterator for the first sequence.

void search_example ()
// illustrate the use of the search algorithm
// see alg2.cpp for complete source code
{
  char *base = "dreams and aspirations";
  char *text = "ration";

  char *where = std::search(base, base + std::strlen(base), 
                            text, text + std::strlen(text));

  if (*where != '\0')
    std::cout << "substring position: "
              << where - base << std::endl;
  else
    std::cout << "substring does not occur in text" << std::endl;
}

Note that this algorithm, unlike many that manipulate two sequences, uses a starting and ending iterator pair for both sequences, not just the first sequence.

Like the algorithms std::equal() and std::mismatch(), an alternative version of std::search() takes an optional binary predicate that is used to compare elements from the two sequences.

What determines the speed of the search? In the worst case, the number of comparisons performed by the algorithm std::search() is the product of the number of elements in the two sequences. Except in rare cases, however, this worst case behavior is highly unlikely.

13.3.5 Find the Last Occurrence of a Sub-Sequence

The algorithm std::find_end() is used to locate the beginning of the last occurrence of a particular sub-sequence within a larger sequence. The easiest example to understand is the problem of looking for a particular substring within a larger string, although the algorithm can be generalized to other uses. The arguments are assumed to have at least the capabilities of forward iterators.

namespace std {
  ForwardIterator find_end 
    (ForwardIterator first1, ForwardIterator last1, 
     ForwardIterator first2, ForwardIterator last2 
     [, BinaryPredicate ]);
}

Suppose, for example, that we want to discover the location of the last occurrence of the string "le" in the string "The road less traveled". The solution to this problem is shown in the example program. If no appropriate match is found, the value returned is the past-the-end iterator for the first sequence.

void find_end_example ()
// illustrate the use of the find_end algorithm
// see alg2.cpp for complete source code
{
  char *base = "The road less traveled";
  char *text = "le";

  char *where = std::find(base, base + std::strlen(base), 
                          text, text + std::strlen(text));

  if (*where != '\0')
    std::cout << "substring position: "
              << where - base << std::endl;
  else
    std::cout << "substring does not occur in text" << std::endl;
}

Note that this algorithm, unlike many that manipulate two sequences, uses a starting and ending iterator pair for both sequences, not just the first sequence.

Like the algorithms std::find_first_of() and std::search(), an alternative version of std::find_end() takes an optional binary predicate that is used to compare elements from the two sequences.

What determines the speed of the search? As with std::search(), in the worst case, the number of comparisons performed by the algorithm std::find_end() is the product of the number of elements in the two sequences.

13.3.6 Locate Maximum or Minimum Element

The functions std::max() and std::min() can be used to find the maximum and minimum of a pair of values. These can optionally take a third argument that defines the comparison function to use in place of the less-than operator <. The arguments are values, not iterators:

namespace std{
  template <class T> 
     const T& max(const T& a, const T& b [, Compare ] );
  template <class T> 
     const T& min(const T& a, const T& b [, Compare ] );
}

The maximum and minimum functions are generalized to entire sequences by the generic algorithms std::max_element() and std::min_element(). For these functions the arguments are input iterators.

namespace std {
  ForwardIterator max_element (ForwardIterator first, 
                               ForwardIterator last
                               [, Compare ] );
  ForwardIterator min_element (ForwardIterator first,
                               ForwardIterator last
                               [, Compare ] );
}

These algorithms return an iterator that denotes the largest or smallest of the values in a sequence, respectively. Should more than one value satisfy the requirement, the result yielded is the first satisfactory value. Both algorithms can optionally take a third argument, which is the function to be used as the comparison operator in place of the default operator.

The example program illustrates several uses of these algorithms. The function named split() that was used to divide a string into words in the string example is described in Section 12.3. The function randomInteger() is described in Section 2.5.

void max_min_example()
// illustrates use of max_element and min_element algorithms
// see alg2.cpp for complete source code
{
  // make a vector of random numbers between 0 and 99
  std::vector<int> numbers(25);
  for (int i = 0; i < 25; i++)
    numbers[i] = randomInteger(100);

  // print the maximum
  std::vector<int>::iterator max = 
    std::max_element(numbers.begin(), numbers.end());
  std::cout << "largest value was " << * max << std::endl;

  // example using strings
  std::string text = 
    "It was the best of times, it was the worst of times.";
  std::list<std::string> words;
  std::split (text, " .,!:;", words);
  std::cout << "The smallest word is " 
            << *std::min_element(words.begin(), words.end())
            << " and the largest word is "
            << *std::max_element(words.begin(), words.end())
            << std::endl;
}

The maximum and minimum algorithms can be used with all the datatypes provided by the C++ Standard Library. However, for the ordered datatypes set and map, the maximum or minimum values are more easily accessed as the first or last elements in the structure.

13.3.7 Locate the First Mismatched Elements in Parallel Sequences

The name std::mismatch() might lead you to think this algorithm is the inverse of the std::equal() algorithm, which determines if two sequences are equal (Section 13.6.4). Instead, the std::mismatch() algorithm returns a pair of iterators that together indicate the first positions where two parallel sequences have differing elements. The structure pair is described in Section 9.1 and Section 9.2.1.

The second sequence is denoted only by a starting position, without an ending position. It is assumed, but not checked, that the second sequence contains at least as many elements as the first. The arguments and return type for mismatch() can be described as follows:

namespace std {
  pair<InputIterator, InputIterator> mismatch 
       (InputIterator first1, InputIterator last1, 
        InputIterator first2 [, BinaryPredicate ] );
}

The elements of the two sequences are examined in parallel, element by element. When a mismatch is found, that is, a point where the two sequences differ, then a pair containing iterators denoting the locations of the two differing elements is constructed and returned. If the first sequence is exhausted before discovering any mismatched elements, then the resulting pair contains the ending value for the first sequence, and the last value examined in the second sequence. The second sequence need not yet be exhausted.

The example program illustrates the use of this procedure. The function std::mismatch_test() takes as arguments two string values. These are lexicographically compared and a message printed indicating their relative ordering. This is similar to the analysis performed by the std::lexicographic_compare() algorithm, although that function simply returns a boolean value.

Because the std::mismatch() algorithm assumes the second sequence is at least as long as the first, a comparison of the two string lengths is performed first, and the arguments are reversed if the second string is shorter than the first. After the call on std::mismatch(), the elements of the resulting pair are separated into their component parts. These parts are then tested to determine the appropriate ordering.

void mismatch_test (char *a, char *b)
// illustrates the use of the mismatch algorithm
// see alg2.cpp for complete source code
{
  std::pair<char *, char *> differPositions(0, 0);
  char *aDiffPosition;
  char *bDiffPosition;

  if (std::strlen(a) < std::strlen(b)) {
    // make sure longer string is second
    differPositions = std::mismatch(a, a + std::strlen(a), b);
    aDiffPosition = differPositions.first;
    bDiffPosition = differPositions.second;
  }
  else {
    differPositions = std::mismatch(b, b + std::strlen(b), a);
    // note following reverse ordering
    aDiffPosition = differPositions.second;
    bDiffPosition = differPositions.first;
  }

  // compare resulting values
  std::cout << "string " << a;

  if (*aDiffPosition == *bDiffPosition)
    std::cout << " is equal to ";
  else if (*aDiffPosition < *bDiffPosition)
    std::cout << " is less than ";
  else
    std::cout << " is greater than ";

  std::cout << b << std::endl;
}

A second form of the std::mismatch() algorithm is similar to this one, except it accepts a binary predicate as a fourth argument. This binary function is used to compare elements in place of operator==().