rsd package

Review Graph Based Online Store Review Spammer Detection.

RSD is an algorithm introduced by Guan Wang et al. at ICDM 2011. It represents review data as the following graph.

digraph bipartite {
   graph [label="Graph model used in RSD.", rankdir = LR];
   "r1" [label="Reviewer 1
(trustiness: 0.1)"];
   "r2" [label="Reviewer 2
(trustiness: 0.9)"];
   "r3" [label="Reviewer 3
(trustiness: 0.5)"];
   "p1" [label="Product 1
(reliability: 0.3)"];
   "p2" [label="Product 2
(reliability: 0.8)"];
   "r1p1" [label="0.3"];
   "r1p2" [label="0.9"];
   "r2p2" [label="0.1"];
   "r3p2" [label="0.5"];
   "r1" -> "r1p1" -> "p1";
   "r1" -> "r1p2" -> "p2";
   "r2" -> "r2p2" -> "p2";
   "r3" -> "r3p2" -> "p2";
   "d_r1p1" [shape=box, label="time: 1
honesty: 0.4
agreement: 1.0 "];
   "d_r1p2" [shape=box, label="time: 4
honesty: 0.1
agreement: 0.3 "];
   "d_r2p2" [shape=box, label="time: 2
honesty: 0.8
agreement: 0.3 "];
   "d_r3p2" [shape=box, label="time: 3
honesty: 0.2
agreement: 0.3 "];
   "r1p1" -> "d_r1p1" [style=dotted];
   "r1p2" -> "d_r1p2" [style=dotted];
   "r2p2" -> "d_r2p2" [style=dotted];
   "r3p2" -> "d_r3p2" [style=dotted];
 }

This package exports ReviewGraph, which is an alias of rsd.graph.ReviewGraph.

class rsd.ReviewGraph(theta: float)

Bases: object

A bipartite graph in which one set of nodes represents reviewers and the other set represents products.

Each edge has a label representing a review.

Parameters:

theta – A parameter for updating: the time ratio used to compute delta. See the paper for more details.

add_review(reviewer: Reviewer, product: Product, review: float, time: Optional[int] = None) → Review

Add a new review.

Parameters:
  • reviewer – An instance of Reviewer.

  • product – An instance of Product.

  • review – A real number representing the review score.

  • time – An integer representing the review time (optional).

Returns:

The new review object.

new_product(name: Optional[str] = None) → Product

Create a new product.

Parameters:

name – The name of the new product.

Returns:

A new product instance.

new_reviewer(name: Optional[str] = None, anomalous: Optional[float] = None) → Reviewer

Create a new reviewer.

Parameters:
  • name – The name of the new reviewer.

  • anomalous – The initial anomalous score of the new reviewer.

Returns:

A new reviewer instance.

retrieve_products(review: Review) → Collection[Product]

Find products associated with a review.

Parameters:

review – A review instance.

Returns:

A list of products associated with the given review.

retrieve_reviewers(review: Review) → Collection[Reviewer]

Find reviewers associated with a review.

Parameters:

review – A review instance.

Returns:

A list of reviewers associated with the review.

retrieve_reviews(review: Review, time_diff: Optional[float] = None, score_diff: float = 0.25) → ReviewSet

Find agree and disagree reviews.

This method retrieves two groups of reviews: agree reviews, whose scores are similar to the score of the given review, and disagree reviews, whose scores differ from it.

Parameters:
  • review – A review instance.

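  • time_diff – A float (optional) giving the time span within which reviews are retrieved.

  • score_diff – A float threshold: reviews whose scores differ from the score of the given review by at most this value are agree reviews (default: 0.25).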

Returns:

A ReviewSet, i.e. a named tuple (agree, disagree) holding the agree reviews and the disagree reviews.
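The agree/disagree split can be sketched in plain Python as below. SimpleReview, SimpleReviewSet and split_reviews are hypothetical names, not part of rsd. The sketch assumes the caller passes the other reviews of the same product, and that time_diff=None disables the time filter; both are assumptions about the real method, not documented behaviour.

```python
from typing import NamedTuple, Optional, Sequence


class SimpleReview(NamedTuple):
    """Hypothetical stand-in for rsd.graph.Review (only time and rating)."""
    time: int
    rating: float


class SimpleReviewSet(NamedTuple):
    """Mirrors the (agree, disagree) shape of rsd.graph.ReviewSet."""
    agree: list
    disagree: list


def split_reviews(target: SimpleReview,
                  others: Sequence[SimpleReview],
                  time_diff: Optional[float] = None,
                  score_diff: float = 0.25) -> SimpleReviewSet:
    """Partition `others` into reviews agreeing or disagreeing with `target`."""
    agree, disagree = [], []
    for r in others:
        if r is target:
            continue  # a review is not compared with itself
        # Assumption: time_diff=None means "no time restriction".
        if time_diff is not None and abs(r.time - target.time) > time_diff:
            continue
        # Ratings within score_diff (inclusive) count as agreeing.
        if abs(r.rating - target.rating) <= score_diff:
            agree.append(r)
        else:
            disagree.append(r)
    return SimpleReviewSet(agree, disagree)
```

With time_diff=2, a review posted at time 10 is ignored for a target posted at time 2, while reviews at times 1 and 3 are split by their rating distance.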

retrieve_reviews_by_product(product: Product) → Collection[Review]

Find reviews of a product.

Parameters:

product – A product instance.

Returns:

A list of reviews of the product.

retrieve_reviews_by_reviewer(reviewer: Reviewer) → Collection[Review]

Find reviews given by a reviewer.

Parameters:

reviewer – A reviewer instance.

Returns:

A list of reviews given by the reviewer.
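The real class keeps this information in its networkx graph. The sketch below instead uses two plain dictionaries to show what the per-product and per-reviewer lookups return; ReviewIndex, Rec and their methods are hypothetical names used only for illustration, and the sample data follows the figure at the top of this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rec:
    """Hypothetical review record: who reviewed what, when, with which score."""
    reviewer: str
    product: str
    rating: float
    time: int


class ReviewIndex:
    """Sketch of per-reviewer and per-product lookups over one review list."""

    def __init__(self) -> None:
        self.reviews: list = []
        self._by_product = defaultdict(list)
        self._by_reviewer = defaultdict(list)

    def add_review(self, reviewer: str, product: str, rating: float, time: int) -> Rec:
        rec = Rec(reviewer, product, rating, time)
        self.reviews.append(rec)
        self._by_product[product].append(rec)
        self._by_reviewer[reviewer].append(rec)
        return rec

    def reviews_by_product(self, product: str) -> list:
        # .get avoids inserting empty entries for unknown products
        return list(self._by_product.get(product, []))

    def reviews_by_reviewer(self, reviewer: str) -> list:
        return list(self._by_reviewer.get(reviewer, []))


idx = ReviewIndex()
idx.add_review("r1", "p1", 0.3, 1)
idx.add_review("r1", "p2", 0.9, 4)
idx.add_review("r2", "p2", 0.1, 2)
idx.add_review("r3", "p2", 0.5, 3)
```

Keeping both indexes makes each lookup proportional to the number of matching reviews rather than to the size of the whole review list.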

update() → float

Update reviewers’ anomalous scores and products’ summaries.

This update process consists of four steps:

  1. Update honesty of reviews (see also Review.update_honesty()),

  2. Update trustiness of reviewers (see also Reviewer.update_trustiness()),

  3. Update reliability of products (see also Product.update_reliability()),

  4. Update agreements of reviews (see also Review.update_agreement()).

Returns:

The sum of the maximum absolute updates in the four steps above.
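Because update() returns the total change of one round, a caller can repeat it until the scores settle. The loop below is a minimal sketch; run_until_converged, epsilon and max_iterations are hypothetical names and illustrative defaults, not values taken from rsd or the paper.

```python
from typing import Callable


def run_until_converged(update: Callable[[], float],
                        epsilon: float = 1e-6,
                        max_iterations: int = 100) -> int:
    """Call `update` until its returned change drops below `epsilon`.

    Returns the number of iterations performed (at most `max_iterations`).
    """
    for i in range(1, max_iterations + 1):
        if update() < epsilon:
            return i
    return max_iterations
```

With a real graph this would be called as run_until_converged(graph.update), since update() takes no arguments and returns a float.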

property delta: float

Time delta.

This value is defined by \(\delta = (t_{\rm max} - t_{\rm min}) \times \theta\), where \(t_{\rm max}\) and \(t_{\rm min}\) are the maximum and minimum times among all reviews, respectively, and \(\theta\) is the given parameter defining the time ratio.
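The definition of delta reduces to one line of arithmetic. The helper below is a hypothetical sketch (time_delta is not part of rsd); raising on an empty list is a choice made here, since the behaviour of the real property without reviews is not documented.

```python
from typing import Iterable


def time_delta(times: Iterable[int], theta: float) -> float:
    """delta = (t_max - t_min) * theta over the posting times of all reviews."""
    ts = list(times)
    if not ts:
        raise ValueError("at least one review time is required")
    return (max(ts) - min(ts)) * theta
```

For the review times 1, 4, 2 and 3 shown in the figure and theta = 0.5, delta is (4 - 1) × 0.5 = 1.5.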

graph: Final[DiGraph]

Graph object of networkx.

products: Final[list[rsd.graph.Product]]

Collection of products.

reviewers: Final[list[rsd.graph.Reviewer]]

Collection of reviewers.

reviews: Final[list[rsd.graph.Review]]

Collection of reviews.

Submodules

rsd.graph module

Implementation of RSD.

class rsd.graph.Node(graph: ReviewGraph, name: Optional[str] = None)

Bases: object

Abstract base class for nodes of a review graph.

Parameters:
  • graph – the graph object this node will belong to.

  • name – name of this node.

name: Final[str]

Name of this node.

class rsd.graph.Product(graph: ReviewGraph, name: Optional[str] = None)

Bases: Node

A node class representing a product.

Parameters:
  • graph – Graph object this product belongs to.

  • name – Name of this product.

update_reliability() → float

Update product’s reliability.

The new reliability is defined by

\[{\rm reliability}(p) = \frac{2}{1 + e^{-\theta}} - 1, \quad \theta = \sum_{r \in R(p)} {\rm trustiness}(r)({\rm review}(r, p) - \hat{s}),\]

where \(R(p)\) is the set of reviewers who have reviewed product \(p\), trustiness is defined in Reviewer.trustiness, review(r, p) is the review score reviewer \(r\) has given to product \(p\), and \(\hat{s}\) is the median of review scores.

Returns:

The absolute difference between the old reliability and the new one.
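The reliability formula can be evaluated directly as in the sketch below. The function and its argument layout are hypothetical, and it assumes \(\hat{s}\) is the median over all review scores, passed in as all_scores; the text above only says "the median of review scores".

```python
import math
from statistics import median
from typing import Sequence, Tuple


def reliability(received: Sequence[Tuple[float, float]],
                all_scores: Sequence[float]) -> float:
    """Reliability of one product.

    received: (trustiness of reviewer, score given to this product) pairs.
    all_scores: scores used for the median s_hat (assumption, see lead-in).
    """
    s_hat = median(all_scores)
    theta = sum(t * (score - s_hat) for t, score in received)
    # 2 / (1 + e^{-theta}) - 1 maps theta smoothly into (-1, 1).
    return 2.0 / (1.0 + math.exp(-theta)) - 1.0
```

Using the figure's values for Product 2 (trustiness 0.1, 0.9, 0.5 with scores 0.9, 0.1, 0.5) and the median 0.4 of all four scores, theta is 0.05 - 0.27 + 0.05 = -0.17, so the result is slightly negative: the formula yields a negative reliability whenever trusted reviewers rate below the median.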

reliability: float

A float value which represents the reliability of this product. Values produced by update_reliability() lie in \((-1, 1)\).

property summary: float

Summary of reviews.

This value is the same as reliability. The original algorithm uses reliability, whereas our algorithm uses summary; both properties are kept for convenience.

class rsd.graph.Review(graph: ReviewGraph, time: int, rating: float)

Bases: object

A graph entity representing a review.

Parameters:
  • graph – Graph object this review belongs to.

  • time – Time when this review was posted.

  • rating – Rating of this review.

update_agreement(delta: float) → float

Update agreement of this review.

This process considers only reviews posted close in time to this review. More precisely, let \(t\) be the time when this review was posted and \(\delta\) be the time span; only reviews whose posting times are in \([t - \delta, t + \delta]\) are considered.

The updated agreement of a review \(r\) is computed from these reviews by

\[{\rm agreement}(r) = \frac{2}{1 + \exp\left( -\left( \sum_{v \in R_{+}} {\rm trustiness}(v) - \sum_{v \in R_{-}} {\rm trustiness}(v) \right) \right)} - 1\]

where \(R_{+}\) is the set of reviews that agree with \(r\), i.e. whose ratings differ from the rating of \(r\) by at most a score threshold (see score_diff in ReviewGraph.retrieve_reviews()), and \(R_{-}\) is the set of the other reviews. The trustiness of a review means the trustiness of the reviewer who posted the review.

Parameters:

delta – a time span \(\delta\). Only reviews posted in the span will be considered for this update.

Returns:

absolute difference between old agreement and new one.
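The agreement update can be sketched as a function of the trustiness values on each side. The function name is hypothetical, and the sketch uses the convention that trusted agreeing reviewers push the agreement towards +1: the exponent is the negated difference between the two trustiness sums, the same logistic form used by update_reliability() and update_trustiness().

```python
import math
from typing import Iterable


def agreement(agree_trust: Iterable[float], disagree_trust: Iterable[float]) -> float:
    """Agreement from the trustiness of agreeing and disagreeing reviewers."""
    margin = sum(agree_trust) - sum(disagree_trust)
    # Logistic squashing into (-1, 1): a positive margin gives positive agreement.
    return 2.0 / (1.0 + math.exp(-margin)) - 1.0
```

A review backed by one reviewer of trustiness 0.9 and opposed by one of trustiness 0.1 gets a positive agreement; swapping the two makes it negative, and with no nearby reviews the agreement is 0.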

update_honesty() → float

Update honesty of this review.

The updated honesty of this review \(r\) is defined by

\[{\rm honesty}(r) = |{\rm reliability}(P(r))| \times {\rm agreement}(r)\]

where \(P(r)\) is the product this review was posted to.

Returns:

absolute difference between old honesty and new one.
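The honesty formula is a single product; the hypothetical helper below just makes the role of the absolute value visible. The magnitude comes from the product's reliability, while the sign comes only from the review's agreement.

```python
def honesty(product_reliability: float, review_agreement: float) -> float:
    """honesty(r) = |reliability(P(r))| * agreement(r)."""
    return abs(product_reliability) * review_agreement
```

For example, a review with agreement 0.4 on a product with reliability -0.5 still gets a positive honesty of 0.2.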

agreement: float

Agreement score.

honesty: float

Honesty score.

rating: Final[float]

Rating score of this review.

time: Final[int]

Time when this review was posted.

class rsd.graph.ReviewGraph(theta: float)

This class is exported at package level as rsd.ReviewGraph; its full documentation is given in the rsd.ReviewGraph entry above.

class rsd.graph.ReviewSet(agree: Collection[Review], disagree: Collection[Review])

Bases: NamedTuple

Pair of agreeing reviews and disagreeing reviews.

agree: Collection[Review]

Collection of agreeing reviews.

disagree: Collection[Review]

Collection of disagreeing reviews.

class rsd.graph.Reviewer(graph: ReviewGraph, name: Optional[str] = None, anomalous: Optional[float] = None)

Bases: Node

A node class representing a reviewer.

Parameters:
  • graph – Graph object this reviewer belongs to.

  • name – Name of this reviewer.

  • anomalous – Initial anomalous score (default: None).

update_trustiness() → float

Update trustiness of this reviewer.

The updated trustiness of a reviewer \(u\) is defined by

\[{\rm trustiness}(u) = \frac{2}{1 + \exp(-\sum_{r \in R(u)} {\rm honesty}(r) )} - 1\]

where \(R(u)\) is the set of reviews posted by reviewer \(u\).

Returns:

absolute difference between the old trustiness and updated one.
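The trustiness update and its return value can be sketched as follows. Both function names are hypothetical; updated_trustiness only mirrors the documented pattern of returning the absolute difference between the old and the new value.

```python
import math
from typing import Iterable, Tuple


def trustiness(honesties: Iterable[float]) -> float:
    """trustiness(u) = 2 / (1 + exp(-sum of honesty of u's reviews)) - 1."""
    total = sum(honesties)
    return 2.0 / (1.0 + math.exp(-total)) - 1.0


def updated_trustiness(old: float, honesties: Iterable[float]) -> Tuple[float, float]:
    """Return (new trustiness, |new - old|), like update_trustiness() reports."""
    new = trustiness(honesties)
    return new, abs(new - old)
```

A reviewer whose reviews have honesty 0.3 and 0.1 gets trustiness tanh(0.2), about 0.197; a reviewer with no reviews gets 0.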

property anomalous_score: float

Returns the anomalous score of this reviewer.

The anomalous score is defined by 1 - trustiness.
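Since the anomalous score is 1 - trustiness, flagging likely spammers amounts to sorting reviewers by that value. The sketch below works on a plain name-to-trustiness mapping; rank_by_anomalous_score is a hypothetical helper, and the sample values come from the figure at the top of this page.

```python
from typing import Dict, List


def rank_by_anomalous_score(trustiness: Dict[str, float]) -> List[str]:
    """Order reviewer names from most to least suspicious (1 - trustiness)."""
    scores = {name: 1.0 - t for name, t in trustiness.items()}
    return sorted(scores, key=scores.__getitem__, reverse=True)
```

With trustiness 0.1, 0.9 and 0.5 for Reviewers 1, 2 and 3, the order is Reviewer 1, Reviewer 3, Reviewer 2. With the real package, the same ordering would presumably be obtained by sorting graph.reviewers with key=lambda r: r.anomalous_score and reverse=True.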

trustiness: float

A float value which represents the trustiness of this reviewer. Values produced by update_trustiness() lie in \((-1, 1)\).