
Lab 13: In lab problem

For today's lab, we will investigate the benefits of using an efficient search algorithm by searching datasets ranging in size from 100 to 10,000 elements.

Procedure

For each list size:

  1. Record the clock (t1).
  2. Use a linear search on the random list for each query. (if key in list:)
  3. Record the clock (t2).
  4. Sort the list. (Use sorted(L))
  5. Use a binary search on the sorted list for each query. (The binary search code is provided below.)
  6. Record the clock (t3).
  7. Create a dictionary from the random list.
  8. Search for the same queries in the dictionary.
  9. Record the clock (t4).

From these times, you can compare how long the queries take with a linear search, with a binary search on the sorted list, and with the dictionary.
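The timing skeleton in the steps above can be sketched as follows (a minimal sketch: the dataset size, the query count, and the use of the standard `bisect` module for the binary-search step are illustrative assumptions, not part of the assignment):

```python
import random
from bisect import bisect_left
from time import perf_counter

data = list(range(1000))            # example dataset
random.shuffle(data)
queries = range(100)                # every query happens to be present

t1 = perf_counter()
linear_hits = sum(1 for q in queries if q in data)       # linear: O(n) per query
t2 = perf_counter()

slist = sorted(data)
def bsearch(L, key):
    # Binary search via the standard library: O(log n) per query.
    i = bisect_left(L, key)
    return i < len(L) and L[i] == key
binary_hits = sum(1 for q in queries if bsearch(slist, q))
t3 = perf_counter()

d = dict.fromkeys(data)
dict_hits = sum(1 for q in queries if q in d)             # hash lookup: O(1) average
t4 = perf_counter()

print(linear_hits, binary_hits, dict_hits)  # → 100 100 100
```

The differences t2-t1, t3-t2, and t4-t3 are the three quantities you will plot.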

Given Search Code

def binary_search(L, key):
    ### This function assumes a sorted list!
    first = 0
    last = len(L) - 1
    while first <= last:
        current = (first + last) // 2
        if key == L[current]:
            return True
        elif key < L[current]:
            last = current - 1
        else:
            first = current + 1
    return False
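A quick sanity check of the search (a self-contained sketch: it uses the standard `bisect` module as an independent reference implementation, so it runs on its own and should agree with the hand-written version above on any sorted list):

```python
from bisect import bisect_left

def binary_search(L, key):
    # Reference implementation via the standard library.
    i = bisect_left(L, key)
    return i < len(L) and L[i] == key

L = sorted([7, 2, 9, 4, 1])          # [1, 2, 4, 7, 9]
print([binary_search(L, k) for k in (1, 4, 9, 3, 10)])
# → [True, True, True, False, False]
```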

Given Plot Code

p1 = plot(sizes, list_times, 'rD')
p2 = plot(sizes, sorted_times, 'gD')
p3 = plot(sizes, dict_times, 'bD')
title('List vs Dictionary')
xlabel('Data set size')
ylabel('Running time')
legend(('List', 'Sorted List', 'Dictionary'), loc='upper left')
show()

Graph the Results

Graph your results of data set size (x-axis) versus time (y-axis). There should be three plots, one for the random list (t2-t1), one for the sorted list (t3-t2), and one for the dictionary (t4-t3).

What is the general trend of the three plots?

Sample Code

from pylab import plot, show, ylabel, xlabel, title, legend
from random import shuffle
from time import perf_counter   # time.clock() was removed in Python 3.8

sizes = range(100, 10000, 100)
queries = range(1000)
ltimes = []
btimes = []
dtimes = []

for size in sizes:
    r = list(range(size))   # list() so shuffle can modify it in place
    shuffle(r)
    dlist = r
    ddict = {}

    # Linear search on the unsorted list
    start = perf_counter()
    for q in queries:
        if q in dlist:
            pass
    end = perf_counter()
    ltimes.append(end - start)

    # Sort, then binary search (timing includes the sort)
    start = perf_counter()
    slist = sorted(r)
    for q in queries:
        found = False
        first = 0
        last = len(slist) - 1
        while first <= last and not found:
            current = (first + last) // 2
            if q == slist[current]:
                found = True
            elif q < slist[current]:
                last = current - 1
            else:
                first = current + 1
    end = perf_counter()
    btimes.append(end - start)

    # Build the dictionary, then look up each query (timing includes the build)
    start = perf_counter()
    for d in dlist:
        ddict[d] = None
    for q in queries:
        if q in ddict:
            pass
    end = perf_counter()
    dtimes.append(end - start)
 
p1 = plot(sizes, ltimes, 'rD')
p2 = plot(sizes, dtimes, 'bD')
p3 = plot(sizes, btimes, 'gD')
title('List vs Dictionary')
xlabel('Data set size')
ylabel('Running time')
legend(('List', 'Dictionary', 'Sorted List'), loc='upper left')
show()