For today's lab, we will be investigating the benefits of using an efficient algorithm. We will be searching datasets ranging in size from 100 to 10,000 elements.

- Sizes of the lists should be from 100 to 10,000, stepping by 100. The list should be the numbers from 0 to N-1.
sizes = range(100,10000, 100)

- Shuffle the list, using
**shuffle**in the**random**library. - For each list, search for each of following values:
queries = range(1000)

- To record the clock time, use
**clock**in the**time**library.

For each list size:

- Record the clock (t1).
- Use a linear search on the random list for each query. (
**if key in list:**) - Record the clock (t2).
- Sort the list. (Use
**sorted(L)**) - Use a binary search on the sorted list for each query. (The binary search code is provided below.)
- Record the clock (t3).
- Create a dictionary from the random list.
- Search for the same query in the dictionary.
- Record the clock (t4).

From this information, you can compare the time it takes for the queries in a linear search versus the same queries on the dictionary.

def binary_search(L, key): ### This function assumes a sorted list! first = 0 last = len(L)-1 current = (first + last) // 2 while first != last: current = (first + last) // 2 if key == L[current]: return True if key < L[current]: last = current - 1 if key > L[current]: first = current + 1 return L[current] == key

p1 = plot(sizes, list_times, 'rD') p2 = plot(sizes, sorted_times, 'gD') p3 = plot(sizes, dict_times, 'bD') title('List vs Dictionary') xlabel('Data set size') ylabel('Running time') legend(('List', 'Dictionary', 'Sorted List'), 2) show()

Graph your results of data set size (x-axis) versus time (y-axis). There should be three plots, one for the random list (t2-t1), one for the sorted list (t3-t2), and one for the dictionary (t4-t3).

What is the general trend of the three plots?

from pylab import plot, show, ylabel, xlabel, title, legend from random import randint,shuffle from time import clock sizes = range(100,10000, 100) queries = range(1000) ltimes = [] btimes = [] dtimes = [] for size in sizes: r = range(size) shuffle(r) dlist = r ddict = {} start = clock() for q in queries: if q in dlist: pass end = clock() ltimes.append(end - start) start = clock() slist = sorted(r) for q in queries: found = False first = 0 last = len(slist)-1 while first != last and not found: current = (first + last) // 2 if q == slist[current]: found = True if q < slist[current]: last = current - 1 if q > slist[current]: first = current + 1 end = clock() btimes.append(end - start) start = clock() for d in dlist: ddict[d] = None for q in queries: if q in ddict: pass end = clock() dtimes.append(end - start) p1 = plot(sizes, ltimes, 'rD') p2 = plot(sizes, dtimes, 'bD') p3 = plot(sizes, btimes, 'gD') title('List vs Dictionary') xlabel('Data set size') ylabel('Running time') legend(('List', 'Dictionary', 'Sorted List'), 2) show()