Binary search

Suppose we know something more about our list—that it’s sorted ascending, i.e., each element is no smaller than the previous one (but maybe equal, if there are duplicates). Can we use that information to write a better search? Absolutely! Here’s the intuition: suppose we’re looking for the number 10 in a list of sorted numbers. We glance at the list and see the number 50. If the list is sorted ascending, we know that if 10 is in the list, it’s to the left of 50—we can ignore everything to the right, since it’s greater than or equal to 50 which is greater than 10.

The algorithm is called binary search, because we cut the elements we’re looking at—the range of the list we’re considering—in half every time. Here’s the idea to find a needle/target element v in a list of n elements:

  1. Start out with a lower bound of lo = 0 and an upper bound of hi = n-1. These define our current range.

  2. Look at the middle element of our range (i.e., mid = lo + (hi - lo) // 2); call it x.
    1. If v is equal to x, return the index mid.

    2. If v is less than x, set hi to mid-1.

    3. If v is greater than x, set lo to mid+1.

  3. If hi < lo, the element is not in the list.

  4. Go back to (2).

Let’s see it in Python:
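
Here's one way to write it, as a sketch: the function name binary_search, the parameter names lst and v, and the convention of returning -1 when the target isn't found are illustrative choices here, not fixed by the description above.

    def binary_search(lst, v):
        """Return an index of v in the sorted list lst, or -1 if v is not present."""
        lo = 0
        hi = len(lst) - 1
        while lo <= hi:
            mid = lo + (hi - lo) // 2   # middle of the current range
            # print("lo =", lo, "mid =", mid, "hi =", hi)
            if lst[mid] == v:
                return mid              # found it!
            elif v < lst[mid]:
                hi = mid - 1            # v can only be to the left of mid
            else:
                lo = mid + 1            # v can only be to the right of mid
        return -1                       # hi < lo: v is not in the list

For example, binary_search([1, 3, 5, 7, 9], 7) returns 3, while binary_search([1, 3, 5, 7, 9], 4) returns -1.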

If you’re not sure why the code is working, try uncommenting the print call. It will tell you the values of mid, lo, and hi on every iteration.

Correctness here is harder to show—it’s the sort of thing we’d talk about in a data structures or algorithms course. The general method is to go by induction on the length of the range, using the sortedness of the list to argue that everything above hi is strictly greater than v and everything below lo is strictly less than v. At every step, our range shrinks, and we get closer to v (or its absence). In the base case of the induction, hi < lo, and that means all elements in the list are either greater or less than v, so v cannot be in the list—and we’re correct to indicate “not found”.

Performance

Binary search is a more complicated algorithm than linear search, and it's very easy to get wrong! But it comes with a huge advantage: when your list is sorted, binary search is much faster than linear search.

A deep analysis requires some more advanced tools (induction again, and some careful math). But we can sketch it out here, again using “number of comparisons” as our measure.

In the best case, our element is exactly in the middle: we find it on our first try, making only one comparison.

In the worst case, our element isn't in the list. How many elements will we consider? Each time we run the loop, our range halves. How many times can we run the loop before our range shrinks to nothing (i.e., hi < lo) and we have to give up? The number of times you can halve a number corresponds to its logarithm base 2, a/k/a its binary logarithm, written log2(n).

Logarithmic performance is fantastic. Why? Well, suppose we have 1000 elements. Linear search will perform 1000 comparisons before giving up. We’ll perform log2(1000) comparisons. If you know that 2**10 == 1024, you can guess how many comparisons, but we can also just compute:
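
As a quick sketch, the standard-library function math.log2 computes base-2 logarithms directly:

    import math

    print(math.log2(1000))   # about 9.97
    print(math.log2(2000))   # about 10.97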

We can expect 10 comparisons before giving up. That’s a huge improvement over linear search’s 1000 comparisons! On 2000 elements, we’d have 11 comparisons, while linear search would do… 2000! Yikes!

Our average case resembles the worst case: if the target is equally likely to be anywhere in the list (or absent), we can expect to do no more than log2(n) comparisons.

Invariants

It’s very common to use an invariant like “the list is sorted” to find a better algorithm. Such invariants can be tricky to maintain, but they often allow for much better performance. They’re common not only in computer science but throughout the world: dictionaries list their words in sorted order to make them easy to find; libraries use classification systems to help you locate the particular books you’re looking for; a restaurant menu is broken into sections to help you find the dishes you’re interested in.
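
To make this concrete in code, here's a sketch of checking the sortedness invariant before searching; is_sorted and checked_binary_search are illustrative helpers of our own, and binary_search refers to the sketch above.

    def is_sorted(lst):
        # The invariant: each element is no smaller than the one before it.
        for i in range(1, len(lst)):
            if lst[i] < lst[i - 1]:
                return False
        return True

    def checked_binary_search(lst, v):
        # Catch a violated invariant early instead of silently returning a wrong answer.
        assert is_sorted(lst), "binary search requires a sorted list"
        return binary_search(lst, v)

Of course, checking the invariant takes a pass over the whole list, which gives up binary search's speed advantage, so you'd typically run a check like this only while testing and debugging.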

