amazon module

This module provides a function for loading an Amazon dataset.

The dataset consists of reviews for products in six categories. The list of categories is defined in CATEGORIES. If you pass one category, or a list of categories chosen from that list, to load(), the function loads only reviews for products that belong to the given categories.

This package also provides a helper function, print_state(), which outputs the state of a graph object.

To use both functions, the graph object must implement the graph interface.

The following table shows each rating score and the number of reviewers who gave it:

Rating score	The number of reviewers
1.0	26754
2.0	16964
3.0	20294
4.0	57011
5.0	148373
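As a reading aid for the table above, the following sketch computes the total count, the share of each rating score, and the count-weighted mean score. The counts are copied from the table; the variable names are illustrative only and are not part of this module.

```python
# Counts copied from the table above (rating score -> number of reviewers).
rating_counts = {
    1.0: 26754,
    2.0: 16964,
    3.0: 20294,
    4.0: 57011,
    5.0: 148373,
}

total = sum(rating_counts.values())

# Share of each rating score among all counted entries.
shares = {score: count / total for score, count in rating_counts.items()}

# Count-weighted mean rating score.
mean_rating = sum(score * count for score, count in rating_counts.items()) / total

for score in sorted(rating_counts):
    print(f"{score}: {rating_counts[score]} ({shares[score]:.1%})")
print(f"total: {total}, mean rating: {mean_rating:.2f}")
```

More than half of the counted entries carry the top score, so the distribution is heavily skewed toward high ratings.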

amazon.CATEGORIES = ['cameras', 'laptops', 'mobilephone', 'tablets', 'TVs', 'video_surveillance']

The categories in this dataset.

amazon.load(graph, categories=None)

Load the Amazon dataset into a given graph object.

The graph object must implement the graph interface.

If a list of categories is given, only reviews which belong to one of the given categories are added to the graph.

Parameters:	graph – an instance of a bipartite graph. categories – a category name or a list of category names chosen from CATEGORIES (default: None).
Returns:	The given graph instance.
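The categories argument can be a single category name or a list of names. The sketch below shows one way such an argument could be normalized and checked against CATEGORIES. Here normalize_categories is a hypothetical helper written for illustration, not a function of this module. Treating None as "all categories" and raising ValueError for unknown names are assumptions, not documented behavior.

```python
# The category list as documented for amazon.CATEGORIES.
CATEGORIES = [
    "cameras", "laptops", "mobilephone", "tablets", "TVs", "video_surveillance",
]


def normalize_categories(categories=None):
    """Return the categories to load as a list (hypothetical helper).

    None selects every category (assumption), a single string selects one
    category, and any other iterable is treated as a collection of names.
    """
    if categories is None:
        return list(CATEGORIES)
    if isinstance(categories, str):
        categories = [categories]
    selected = list(categories)
    unknown = [c for c in selected if c not in CATEGORIES]
    if unknown:
        # Assumption: unknown names are reported instead of silently ignored.
        raise ValueError(f"unknown categories: {unknown}")
    return selected


print(normalize_categories("cameras"))
print(normalize_categories(["laptops", "tablets"]))
```

With the module itself, the corresponding calls would pass the same values as the categories argument of load(), together with a graph object that implements the graph interface.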

amazon.print_state(g, i, output=sys.stdout)

Print the current state of a given graph.

This function outputs the current state of a graph as a sequence of JSON objects. A graph object must have two properties, reviewers and products, which return a collection of reviewers and a collection of products, respectively. See the graph interface for more information.

In this output format, each line represents a reviewer or product object.

Reviewer objects are defined as

{
   "iteration": <the iteration number given as i>
   "reviewer":
   {
      "reviewer_id": <Reviewer's ID>
      "score": <Anomalous score of the reviewer>
   }
}

Product objects are defined as

{
   "iteration": <the iteration number given as i>
   "product":
   {
      "product_id": <Product's ID>
      "summary": <Summary of the reviews for the product>
   }
}

Parameters:	g – Graph instance. i – Iteration number. output – A writable object (default: sys.stdout).
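The line-oriented output described above can be sketched with a stub graph and an in-memory buffer. This is a minimal illustration, not the module's implementation: StubGraph, Reviewer, Product, and print_state_sketch are hypothetical names. The attribute names name, anomalous_score, and summary, the "product" and "summary" keys used for product records, and the order in which reviewers and products are written are all assumptions.

```python
import io
import json
from collections import namedtuple

# Hypothetical stand-ins for graph nodes; the attribute names are assumptions.
Reviewer = namedtuple("Reviewer", ["name", "anomalous_score"])
Product = namedtuple("Product", ["name", "summary"])


class StubGraph:
    """Minimal object exposing the reviewers and products properties."""

    def __init__(self, reviewers, products):
        self._reviewers = reviewers
        self._products = products

    @property
    def reviewers(self):
        return self._reviewers

    @property
    def products(self):
        return self._products


def print_state_sketch(g, i, output):
    """Write one JSON object per line: reviewers first, then products."""
    for r in g.reviewers:
        record = {
            "iteration": i,
            "reviewer": {"reviewer_id": r.name, "score": r.anomalous_score},
        }
        output.write(json.dumps(record) + "\n")
    for p in g.products:
        record = {
            "iteration": i,
            "product": {"product_id": p.name, "summary": p.summary},
        }
        output.write(json.dumps(record) + "\n")


graph = StubGraph([Reviewer("r1", 0.25)], [Product("p1", 4.5)])
buffer = io.StringIO()
print_state_sketch(graph, 0, buffer)

# Every output line parses independently as a JSON object.
records = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(records)
```

Because each line is a complete JSON object, the output can be processed line by line without loading the whole stream, and any writable object (a file, a buffer, or sys.stdout) can serve as output.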