amazon module

This module provides a function for loading an Amazon dataset.

The dataset consists of reviews for products in six categories. The list of categories is defined in CATEGORIES. If you pass one category, or a list of categories chosen from that list, to load(), the function loads only reviews for products that belong to the given categories.

This module also provides a helper function, print_state(), which outputs the state of a graph object.

To use both functions, the graph object must implement the graph interface.

The following table shows the number of reviewers for each rating score:

Rating score    Number of reviewers
1.0             26754
2.0             16964
3.0             20294
4.0             57011
5.0             148373
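
The counts above are heavily skewed toward high ratings. As a small worked example, the sketch below recomputes the total, the share of each rating score, and the mean rating from the table. The variable names are illustrative only and are not part of the amazon module.

```python
# Rating counts copied from the table above.
rating_counts = {1.0: 26754, 2.0: 16964, 3.0: 20294, 4.0: 57011, 5.0: 148373}

# Total count and count-weighted mean rating.
total = sum(rating_counts.values())
mean_rating = sum(score * n for score, n in rating_counts.items()) / total

# Print each rating score with its count and its share of the total.
for score, n in sorted(rating_counts.items()):
    print(f"{score:.1f}  {n:>7}  {n / total:6.1%}")
print(f"total={total}  mean={mean_rating:.3f}")
```

Roughly 55% of the counts fall on the 5.0 score, and the mean rating is about 4.05.
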
amazon.CATEGORIES = ['cameras', 'laptops', 'mobilephone', 'tablets', 'TVs', 'video_surveillance']

The categories available in this dataset.
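
Category names are case-sensitive strings (note 'TVs'), so it can help to check a selection before passing it to load(). The sketch below is a hypothetical helper, not part of the amazon module: normalize_categories is an invented name, the list is a local copy of CATEGORIES, and treating None as "all categories" is an assumption based on the description of load().

```python
# Local copy of amazon.CATEGORIES so that the sketch runs on its own.
CATEGORIES = ['cameras', 'laptops', 'mobilephone', 'tablets', 'TVs',
              'video_surveillance']


def normalize_categories(categories=None):
    """Hypothetical helper: return a validated list of category names."""
    if categories is None:
        # Assumption: no selection means every category.
        return list(CATEGORIES)
    if isinstance(categories, str):
        # A single category name is wrapped into a one-element list.
        categories = [categories]
    selected = list(categories)
    unknown = [c for c in selected if c not in CATEGORIES]
    if unknown:
        raise ValueError(f"unknown categories: {unknown}")
    return selected


print(normalize_categories("cameras"))
print(normalize_categories(["TVs", "tablets"]))
```

Whether load() itself rejects unknown names is not stated here, so validating up front avoids relying on that behavior.
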

amazon.load(graph, categories=None)

Load the Amazon dataset into a given graph object.

The graph object must implement the graph interface.

If a category or a list of categories is given, only reviews for products in the given categories are added to the graph.

Parameters:
  • graph – An instance of a bipartite graph.
  • categories – A category or a list of categories chosen from CATEGORIES (default: None).
Returns: The given graph instance.
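
As a rough usage sketch, the code below defines a stand-in graph class and a thin wrapper around load(). Everything except amazon.load() and its two parameters is an assumption: MinimalGraph and load_selected are invented names, and the methods new_reviewer, new_product, and add_review are a guess at what a graph interface might require. Consult the graph interface documentation for the real requirements.

```python
class MinimalGraph:
    """Hypothetical stand-in for a bipartite review graph (illustration only)."""

    def __init__(self):
        self.reviewers = []  # reviewer names, in insertion order
        self.products = []   # product names, in insertion order
        self.reviews = []    # (reviewer, product, rating) triples

    # The three method names below are assumptions about the graph interface.
    def new_reviewer(self, name):
        self.reviewers.append(name)
        return name

    def new_product(self, name):
        self.products.append(name)
        return name

    def add_review(self, reviewer, product, rating):
        self.reviews.append((reviewer, product, rating))
        return rating


def load_selected(graph, categories):
    """Load only the given categories into graph via amazon.load()."""
    import amazon  # imported lazily so the sketch parses without the package
    return amazon.load(graph, categories=categories)
```

With the package installed, a call such as load_selected(MinimalGraph(), ["cameras", "laptops"]) would return the same graph object it was given, populated with reviews from those two categories only.
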
amazon.print_state(g, i, output=sys.stdout)

Print the current state of a given graph.

This function outputs the current state of a graph as a set of JSON objects. Graph objects must have two properties, reviewers and products, which return a set of reviewers and a set of products, respectively. See the graph interface for more information.

In this output format, each line represents a reviewer or product object.

Reviewer objects are defined as

{
   "iteration": <the iteration number given as i>
   "reviewer":
   {
      "reviewer_id": <Reviewer's ID>
      "score": <Anomalous score of the reviewer>
   }
}

Product objects are defined as

{
   "iteration": <the iteration number given as i>
   "product":
   {
      "product_id": <Product's ID>
      "summary": <Summary of the reviews for the product>
   }
}
Parameters:
  • g – Graph instance.
  • i – Iteration number.
  • output – A writable object (default: sys.stdout).
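
Because each output line is one object, the output can be read back line by line. The sketch below assumes that every line is valid JSON and groups reviewer and product IDs by iteration. parse_states is a hypothetical helper, not part of the amazon module, and the sample lines are hand-written with made-up IDs and values rather than real output.

```python
import io
import json
from collections import defaultdict


def parse_states(stream):
    """Group reviewer and product IDs by iteration from print_state-style lines."""
    states = defaultdict(lambda: {"reviewers": [], "products": []})
    for line in stream:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        iteration = record["iteration"]
        # Inspect the nested object instead of relying on the outer key name.
        for value in record.values():
            if not isinstance(value, dict):
                continue
            if "reviewer_id" in value:
                states[iteration]["reviewers"].append(value["reviewer_id"])
            elif "product_id" in value:
                states[iteration]["products"].append(value["product_id"])
    return dict(states)


# Hand-written sample lines (made-up IDs and values), one JSON object per line.
sample = io.StringIO(
    '{"iteration": 0, "reviewer": {"reviewer_id": "r1", "score": 0.2}}\n'
    '{"iteration": 0, "product": {"product_id": "p1", "summary": 4.5}}\n'
    '{"iteration": 1, "reviewer": {"reviewer_id": "r1", "score": 0.1}}\n'
)
states = parse_states(sample)
print(states)
```

The same function accepts any iterable of lines, so a file object previously passed as output can be reopened for reading and parsed the same way.
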