Searching Data

Introduction to Searching Data

  • When handling data, there are often instances where you need to find specific information. This process is known as searching.
  • Searching is a fundamental concept in computer science and is used in a wide variety of applications, from databases to the internet.
  • A search algorithm is a method used to locate specific data within a collection of data. Like any algorithm, it is a step-by-step procedure for solving a problem, in this case the problem of finding a particular item.
  • A linear search is the simplest type of search algorithm.
  • It starts at the beginning of a list and checks each element in turn until it finds the required value or reaches the end of the list.
  • This is not the fastest search method, especially when the list is very large, but it is extremely straightforward and works on any list, even an unordered one.
  • A binary search is a more advanced type of search algorithm that uses a divide-and-conquer strategy.
  • This can only be used on a list that is already sorted.
  • It works by checking the middle value in the list: if this is not the required value, it discards either the bottom half or the top half of the list, depending on whether the required value is higher or lower than the middle value.
  • The process is then repeated on the remaining half of the list.
  • On a large, sorted list this is much faster than a linear search, because each step halves the size of the search range. For example, a sorted list of 1,000,000 items needs at most 20 comparisons.
  • Search algorithms can be used to find a range of different types of data, from simple data types like numbers and strings, to more complex data types such as records and objects.
  • It’s important to understand that the efficiency of different search methods can depend on the characteristics of the data, such as how many items there are, and how they are arranged or sorted.
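The two algorithms described above can be sketched in Python as follows. This is a minimal illustration, assuming the data is held in a Python list; returning -1 when the value is absent is a convention chosen here for the example, not something the notes specify.

```python
def linear_search(items, target):
    """Check each element in turn, starting at the beginning of the list."""
    for index, value in enumerate(items):
        if value == target:
            return index      # found it: stop immediately
    return -1                 # reached the end without finding it


def binary_search(sorted_items, target):
    """Repeatedly check the middle of the remaining range.

    Assumes sorted_items is sorted in ascending order.
    """
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1     # target is higher: discard the bottom half
        else:
            high = mid - 1    # target is lower: discard the top half
    return -1                 # search range is empty: target not present


unordered = [42, 7, 19, 3, 88]
print(linear_search(unordered, 19))   # 2
ordered = sorted(unordered)           # [3, 7, 19, 42, 88]
print(binary_search(ordered, 42))     # 3
print(binary_search(ordered, 50))     # -1
```

The binary search only gives correct results on a sorted list: run on `unordered`, `binary_search(unordered, 42)` returns -1 even though 42 is at index 0, because the first comparison discards the half of the list that contains it.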

Efficiency of Search Algorithms

  • The efficiency of a search algorithm is a measure of the resources that it uses (time, memory, etc.) in relation to the size of the dataset it is working with.
  • Efficiency is important when dealing with large datasets, because the extra work done by a less efficient algorithm grows with the size of the dataset and can slow a program down significantly.
  • The efficiency of a search algorithm is often expressed in Big O notation, which describes how the number of steps an algorithm takes grows with the size of the dataset. It is most commonly used to describe the worst-case scenario.
  • For example, the worst case for a linear search (where the wanted item is at the end of the list, or not in the list at all) is O(n), where n is the number of items in the list, because every item has to be checked. A binary search, by contrast, is O(log n) in the worst case, because it halves the remaining range at each step.
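To make the growth rates concrete, the sketch below counts how many elements each algorithm examines when the target is larger than every item in a sorted list, which is a worst case for both. The helpers `linear_steps` and `binary_steps` are hypothetical, written only for this illustration.

```python
def linear_steps(items, target):
    """Return how many elements a linear search examines."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_steps(sorted_items, target):
    """Return how many middle elements a binary search examines."""
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        mid = (low + high) // 2
        steps += 1
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps


for n in (10, 1_000, 1_000_000):
    data = list(range(n))   # sorted list 0, 1, ..., n-1
    missing = n             # larger than every item, so it is never found
    print(n, linear_steps(data, missing), binary_steps(data, missing))
```

This prints `10 10 4`, `1000 1000 10` and `1000000 1000000 20`: the linear count grows in step with n, while the binary count grows by only about one each time n doubles.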

Hashing

  • Hashing is a technique used to speed up searching.
  • It uses a hash function to calculate an index (a “hash”) from each key in the dataset, giving the position where that key’s data is stored.
  • When searching for a particular key, the hash function is used to directly locate the corresponding data, rather than having to search through the dataset.
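  • Hashing is extremely fast and effective, but it requires a good hash function that distributes keys evenly across the available positions. Collisions (two keys getting the same hash value) cannot always be avoided, so a good hash function keeps them rare, and the data structure needs a way to handle them when they do occur.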
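A minimal hash table sketch follows. The hash function here (the sum of a string key's character codes, modulo the table size) is a deliberately simple, hypothetical choice, and the collision strategy, separate chaining, where each slot holds a list of every entry that hashes to it, is one common approach that the notes above do not themselves specify. In everyday Python code, the built-in `dict` already provides a hash table.

```python
class HashTable:
    """A small hash table that handles collisions by separate chaining."""

    def __init__(self, size=8):
        self.size = size
        # Each slot ("bucket") holds a list of (key, value) pairs.
        self.buckets = [[] for _ in range(size)]

    def _hash(self, key):
        # Hypothetical toy hash function: add up the character codes
        # and wrap the total into the range 0 .. size-1.
        return sum(ord(ch) for ch in key) % self.size

    def insert(self, key, value):
        bucket = self.buckets[self._hash(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # key already present: update it
                return
        bucket.append((key, value))        # new key (possibly a collision)

    def search(self, key):
        # Go straight to a single bucket instead of scanning every entry.
        bucket = self.buckets[self._hash(key)]
        for existing_key, value in bucket:
            if existing_key == key:
                return value
        return None                        # key is not in the table


table = HashTable()
table.insert("cat", 3)
table.insert("act", 7)   # same letters as "cat", so the same toy hash: a collision
print(table.search("cat"))   # 3
print(table.search("act"))   # 7
print(table.search("dog"))   # None
```

Because "cat" and "act" contain the same letters, this toy hash function sends them to the same bucket. Chaining lets both be stored and found, but a search must then check each entry in that bucket, which is why a hash function that spreads keys evenly matters.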