Today's in-lab project is similar to the movie-matching game Six Degrees of Kevin Bacon. We are providing a data file that represents an undirected graph in which an edge connects an actor and a movie if that actor appeared in that movie. Your task is to use NetworkX to look up the shortest paths between actors.
The following is a function that you can use to import the graph.
    import networkx

    def import_graph():
        """Read a tab-separated actor/movie edge list into an undirected graph."""
        fname = input("Enter the filename of the graph: ")
        G = networkx.Graph()
        with open(fname) as f:
            for edge in f:
                # Each line holds one actor and one movie, separated by a tab.
                a, b = edge.rstrip("\n").split("\t")
                G.add_edge(a, b)
        return G
The \t (tab) character separates the actor from the movie on each line. Also, note that the order of the two arguments to add_edge doesn't matter here, since the graph is undirected.
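To see the file format and the undirected behavior in isolation, here is a minimal sketch that builds the same kind of graph from a few in-memory lines instead of a file. The helper name graph_from_lines and the sample rows are assumptions made for illustration; they are not part of the provided data file or of NetworkX.

```python
import networkx

def graph_from_lines(lines):
    """Build an undirected actor/movie graph from tab-separated lines.

    Hypothetical helper for illustration only.
    """
    G = networkx.Graph()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        actor, movie = line.split("\t")
        G.add_edge(actor, movie)
    return G

# Illustrative rows: one actor and one movie per line, separated by a tab.
sample = [
    "Burns, George\tMovie Movie\n",
    "Wallach, Eli\tMovie Movie\n",
    "Wallach, Eli\tMystic River\n",
    "Bacon, Kevin\tMystic River\n",
]
G = graph_from_lines(sample)

# In an undirected graph the same edge is found from either endpoint.
print(G.has_edge("Burns, George", "Movie Movie"))
print(G.has_edge("Movie Movie", "Burns, George"))
```

Both checks report the same edge, which is why it makes no difference whether the actor or the movie is passed first to add_edge.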
Then, create a procedure named separation that uses the imported graph to find the shortest path between two actors. Prompt for the two actors' names, then use the NetworkX function shortest_path in the following manner:
    networkx.shortest_path(G, 'Bloom, Orlando', 'Bacon, Kevin')
This returns a list of nodes. Remember to make sure that both actors are in the graph before calling the shortest path function. If either actor is not in the graph, print an error message and return. The list returned by the shortest path function has the following form:
actor -> movie -> actor -> movie -> ... -> actor
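The membership check and the lookup described above might be sketched as follows. This is one possible arrangement, not the required solution: the helper name find_path is hypothetical, it takes the two names as parameters instead of prompting, and the handling of the case where no path exists is an added assumption that the lab text does not ask for. The small graph is built by hand so the sketch runs without the data file.

```python
import networkx

def find_path(G, actor_a, actor_b):
    """Return the shortest path between two actors, or None on failure.

    Hypothetical helper; the lab's separation procedure would prompt for names.
    """
    # Both actors must be nodes in the graph before the lookup.
    for name in (actor_a, actor_b):
        if name not in G:
            print("Error: %s is not in the graph." % name)
            return None
    try:
        return networkx.shortest_path(G, actor_a, actor_b)
    except networkx.NetworkXNoPath:
        # Assumption: report disconnected actors instead of crashing.
        print("Error: no path between %s and %s." % (actor_a, actor_b))
        return None

# Small hand-built graph for illustration.
G = networkx.Graph()
G.add_edges_from([
    ("Burns, George", "Movie Movie"),
    ("Wallach, Eli", "Movie Movie"),
    ("Wallach, Eli", "Mystic River"),
    ("Bacon, Kevin", "Mystic River"),
])

path = find_path(G, "Burns, George", "Bacon, Kevin")
print(path)
```

The result alternates actor, movie, actor, and so on, so the actors sit at the even positions of the list and the movies at the odd positions.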
Parse the list returned in the following way, and output the results:
Burns, George was in "Movie Movie" with Wallach, Eli
Wallach, Eli was in "Mystic River" with Bacon, Kevin
The connection number of Burns, George and Bacon, Kevin is 2.
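One way to turn the returned list into this output is to step through it two items at a time. The sketch below is a suggestion only: the function name describe_path is hypothetical, and it assumes that the connection number is the number of movies on the path, which is inferred from the sample output above.

```python
def describe_path(path):
    """Format an actor -> movie -> actor -> ... path as output lines.

    Hypothetical helper for illustration only.
    """
    lines = []
    # Each step covers three consecutive items: actor, movie, next actor.
    for i in range(0, len(path) - 2, 2):
        actor, movie, co_star = path[i], path[i + 1], path[i + 2]
        lines.append('%s was in "%s" with %s' % (actor, movie, co_star))
    # Assumption: connection number = number of movies on the path.
    connection = (len(path) - 1) // 2
    lines.append("The connection number of %s and %s is %d."
                 % (path[0], path[-1], connection))
    return lines

# Example path in the form returned by the shortest path function.
path = ["Burns, George", "Movie Movie", "Wallach, Eli",
        "Mystic River", "Bacon, Kevin"]
for line in describe_path(path):
    print(line)
```

Returning the lines instead of printing inside the function keeps the formatting easy to check separately from the graph lookup.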
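Connection number is defined as the number of movies on the shortest path between the two actors, that is, (number of nodes in the path - 1) / 2. For Burns, George and Bacon, Kevin the path has 5 nodes, so the connection number is 2.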
You can use Cytoscape to verify and view the graph. Also, there is an online version of this game which uses the complete data set of IMDB. We are using a reduced dataset.