# Lab 13: In lab problem

For today's lab, we will be investigating the benefits of using an efficient algorithm. We will be searching datasets ranging in size from 100 to 10,000 elements.

• Sizes of the lists should be from 100 to 10,000, stepping by 100. The list should be the numbers from 0 to N-1.
`sizes = range(100,10000, 100)`
• Shuffle the list, using shuffle in the random library.
• For each list, search for each of following values:
`queries = range(1000)`
• To record the clock time, use clock in the time library.

## Procedure

For each list size:

1. Record the clock (t1).
2. Use a linear search on the random list for each query. (if key in list:)
3. Record the clock (t2).
4. Sort the list. (Use sorted(L))
5. Use a binary search on the sorted list for each query. (The binary search code is provided below.)
6. Record the clock (t3).
7. Create a dictionary from the random list.
8. Search for the same query in the dictionary.
9. Record the clock (t4).

From this information, you can compare the time it takes for the queries in a linear search versus the same queries on the dictionary.

## Given Search Code

```def binary_search(L, key):
### This function assumes a sorted list!
first = 0
last = len(L)-1
current = (first + last) // 2
while first != last:
current = (first + last) // 2
if key == L[current]:
return True
if key < L[current]:
last = current - 1
if key > L[current]:
first = current + 1
return L[current] == key```

## Given Plot Code

```p1 = plot(sizes, list_times, 'rD')
p2 = plot(sizes, sorted_times, 'gD')
p3 = plot(sizes, dict_times, 'bD')
title('List vs Dictionary')
xlabel('Data set size')
ylabel('Running time')
legend(('List', 'Dictionary', 'Sorted List'), 2)
show()```

## Graph the Results

Graph your results of data set size (x-axis) versus time (y-axis). There should be three plots, one for the random list (t2-t1), one for the sorted list (t3-t2), and one for the dictionary (t4-t3).

What is the general trend of the three plots?

## Sample Code

```from pylab import plot, show, ylabel, xlabel, title, legend
from random import randint,shuffle
from time import clock

sizes = range(100,10000, 100)
queries = range(1000)
ltimes = []
btimes = []
dtimes = []

for size in sizes:
r = range(size)
shuffle(r)
dlist = r
ddict = {}

start = clock()
for q in queries:
if q in dlist:
pass
end = clock()
ltimes.append(end - start)

start = clock()
slist = sorted(r)
for q in queries:
found = False
first = 0
last = len(slist)-1
while first != last and not found:
current = (first + last) // 2
if q == slist[current]:
found = True
if q < slist[current]:
last = current - 1
if q > slist[current]:
first = current + 1
end = clock()
btimes.append(end - start)

start = clock()
for d in dlist:
ddict[d] = None
for q in queries:
if q in ddict:
pass
end = clock()
dtimes.append(end - start)

p1 = plot(sizes, ltimes, 'rD')
p2 = plot(sizes, dtimes, 'bD')
p3 = plot(sizes, btimes, 'gD')
title('List vs Dictionary')
xlabel('Data set size')
ylabel('Running time')
legend(('List', 'Dictionary', 'Sorted List'), 2)
show()```

cs190c/lab13.txt · Last modified: 2008/07/24 12:13 (external edit)