rsd package

Review Graph Based Online Store Review Spammer Detection.

RSD is an algorithm introduced by Guan Wang et al. at ICDM 2011. It represents review data as the following graph.

digraph bipartite {
   graph [label="Graph model used in RSD.", rankdir = LR];
   "r1" [label="Reviewer 1\n(trustiness: 0.1)"];
   "r2" [label="Reviewer 2\n(trustiness: 0.9)"];
   "r3" [label="Reviewer 3\n(trustiness: 0.5)"];
   "p1" [label="Product 1\n(reliability: 0.3)"];
   "p2" [label="Product 2\n(reliability: 0.8)"];
   "r1p1" [label="0.3"];
   "r1p2" [label="0.9"];
   "r2p2" [label="0.1"];
   "r3p2" [label="0.5"];
   "r1" -> "r1p1" -> "p1";
   "r1" -> "r1p2" -> "p2";
   "r2" -> "r2p2" -> "p2";
   "r3" -> "r3p2" -> "p2";
   "d_r1p1" [shape=box, label="time: 1\nhonesty: 0.4\nagreement: 1.0"];
   "d_r1p2" [shape=box, label="time: 4\nhonesty: 0.1\nagreement: 0.3"];
   "d_r2p2" [shape=box, label="time: 2\nhonesty: 0.8\nagreement: 0.3"];
   "d_r3p2" [shape=box, label="time: 3\nhonesty: 0.2\nagreement: 0.3"];
   "r1p1" -> "d_r1p1" [style=dotted];
   "r1p2" -> "d_r1p2" [style=dotted];
   "r2p2" -> "d_r2p2" [style=dotted];
   "r3p2" -> "d_r3p2" [style=dotted];
}

The root module exposes rsd.graph.ReviewGraph under the alias ReviewGraph.

Submodules

rsd.graph module

Implementation of RSD.

class rsd.graph.Product(graph, name=None)[source]

Bases: rsd.graph._Node

A node class representing a product.

Parameters:
  • graph – Graph object this product belongs to.
  • name – Name of this product.
reliability

a float value in [0, 1], which represents reliability of this product.

summary

Summary of reviews.

This value is the same as reliability. The original algorithm uses reliability, whereas our algorithm uses summary. Both properties are kept for convenience.

update_reliability()[source]

Update product’s reliability.

The new reliability is defined by

\[{\rm reliability}(p) = \frac{2}{1 + e^{-\theta}} - 1, \quad \theta = \sum_{r \in R(p)} {\rm trustiness}(r)({\rm review}(r, p) - \hat{s}),\]

where \(R(p)\) is the set of reviewers who have reviewed product \(p\), trustiness is defined in Reviewer.trustiness(), review(r, p) is the review score reviewer \(r\) has given to product \(p\), and \(\hat{s}\) is the median of review scores.

Returns: Absolute difference between the old reliability and the new one.
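
The reliability formula above can be sketched in plain Python as follows. This is a minimal illustration, not the package's implementation: the function name `reliability`, the representation of reviews as `(trustiness, score)` pairs, and the handling of a product without reviews are all assumptions made for this example.

```python
import math
from statistics import median


def reliability(reviews):
    """Compute the reliability of one product from its reviews.

    `reviews` is a list of (trustiness, score) pairs, one per reviewer
    of the product (a hypothetical input format for this sketch).
    """
    if not reviews:
        # Assumption: with no reviews, theta is 0 and so is the result.
        return 0.0
    # s_hat is the median of the review scores.
    s_hat = median(score for _, score in reviews)
    # theta sums the trust-weighted deviations from the median.
    theta = sum(trust * (score - s_hat) for trust, score in reviews)
    return 2.0 / (1.0 + math.exp(-theta)) - 1.0
```

For example, `reliability([(0.9, 1.0), (0.1, 0.0), (0.5, 0.5)])` weights the high score of the trusted reviewer more heavily than the low score of the untrusted one, so the result is positive.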
class rsd.graph.Review(graph, time, rating)[source]

Bases: object

A graph entity representing a review.

rating

rating score of this review.

honesty

honesty score.

agreement

agreement score.

time

time at which this review was posted.

update_agreement(delta)[source]

Update agreement of this review.

This process considers only reviews posted close in time to this review. More precisely, let \(t\) be the time at which this review was posted and \(\delta\) be the time span; only reviews whose posting times lie in \([t - \delta, t+\delta]\) are considered.

The updated agreement of a review \(r\) is computed from those reviews by

\[{\rm agreement}(r) = \frac{2}{1 + \exp\left( -\left( \sum_{v \in R_{+}} {\rm trustiness}(v) - \sum_{v \in R_{-}} {\rm trustiness}(v) \right) \right)} - 1\]

where \(R_{+}\) is the set of reviews whose ratings are close to that of the review \(r\), i.e. the difference between the ratings is smaller than or equal to a score threshold, and \(R_{-}\) is the set of the other reviews. The trustiness of a review means the trustiness of the reviewer who posted it.

Parameters: delta – A time span \(\delta\). Only reviews posted within this span are considered for the update.
Returns: Absolute difference between the old agreement and the new one.
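
As a rough illustration of the agreement update, the sketch below takes the trustiness values of the reviewers behind the agreeing and disagreeing reviews and maps their difference through a logistic function. The function name `agreement` and its two list arguments are hypothetical. The sketch negates the exponent, matching the logistic form used by the reliability and trustiness updates, so that trusted agreeing reviews raise the score; treat that sign convention as an assumption of this example.

```python
import math


def agreement(agree_trust, disagree_trust):
    """Compute the agreement of one review.

    `agree_trust` and `disagree_trust` hold the trustiness of the
    reviewers who posted the agreeing (R+) and disagreeing (R-)
    reviews inside the time window (hypothetical inputs).
    """
    diff = sum(agree_trust) - sum(disagree_trust)
    # Assumed sign convention: a positive diff gives a positive agreement.
    return 2.0 / (1.0 + math.exp(-diff)) - 1.0
```

With no neighbouring reviews at all, both sums are zero and the agreement is 0.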
update_honesty()[source]

Update honesty of this review.

The updated honesty of this review \(r\) is defined by

\[{\rm honesty}(r) = |{\rm reliability}(P(r))| \times {\rm agreement}(r)\]

where \(P(r)\) is the product on which this review was posted.

Returns: Absolute difference between the old honesty and the new one.
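
The honesty formula is a single product, shown here as a small sketch. The function name `honesty` and its two float arguments are hypothetical and stand in for the reliability of the reviewed product and the agreement of the review.

```python
def honesty(product_reliability, review_agreement):
    """Compute the honesty of a review.

    The absolute value of the product's reliability scales the
    agreement of the review, so the sign comes from the agreement only.
    """
    return abs(product_reliability) * review_agreement
```

Because only the magnitude of the reliability is used, a review with positive agreement on a product with strongly negative reliability still receives a high honesty.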
class rsd.graph.ReviewGraph(theta)[source]

Bases: object

A review graph is a bipartite graph in which one set of nodes represents reviewers and the other set represents products.

Each edge has a label representing a review.

graph

the underlying networkx graph object.

reviewers

a collection of reviewers.

products

a collection of products.

reviews

a collection of reviews.

add_review(reviewer, product, review, time=None)[source]

Add a new review.

Parameters:
  • reviewer – An instance of Reviewer.
  • product – An instance of Product.
  • review – A real number representing review score.
  • time – An integer representing reviewing time. (optional)
Returns:

the new review object.

delta

Time delta.

This value is defined by \(\delta = (t_{\rm max} - t_{\rm min}) \times \theta\), where \(t_{\rm max}\) and \(t_{\rm min}\) are the maximum and minimum times over all reviews, respectively, and \(\theta\) is the given parameter defining the time ratio.
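
A small worked sketch of this definition follows. The function name `time_delta` and the choice to return 0 for an empty collection of times are assumptions made for the example.

```python
def time_delta(times, theta):
    """Compute delta = (t_max - t_min) * theta over all review times.

    `times` is any iterable of review times and `theta` is the time
    ratio given to the graph.
    """
    times = list(times)
    if not times:
        # Assumption: no reviews means an empty time span.
        return 0
    return (max(times) - min(times)) * theta
```

With the four review times shown in the graph at the top of this page (1, 4, 2 and 3) and a ratio of 0.5, `time_delta([1, 4, 2, 3], 0.5)` gives 1.5.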

new_product(name=None)[source]

Create a new product.

Parameters: name – The name of the new product.
Returns: A new product instance.
new_reviewer(name=None, anomalous=None)[source]

Create a new reviewer.

Parameters:
  • name – The name of the new reviewer.
  • anomalous – Initial anomalous score (default: None).
Returns:

A new reviewer instance.

retrieve_products(*args)[source]

Find products associated with a review.

Parameters: review – A review instance.
Returns: A list of products associated with the given review.
retrieve_reviewers(*args)[source]

Find reviewers associated with a review.

Parameters: review – A review instance.
Returns: A list of reviewers associated with the review.
retrieve_reviews(review, time_diff=None, score_diff=0.25)[source]

Find agreeing and disagreeing reviews.

This method retrieves two groups of reviews. Agreeing reviews have scores similar to that of the given review, whereas disagreeing reviews have different scores.

Parameters:
  • review – A review instance.
  • time_diff – An integer.
  • score_diff – A float value.
Returns:

A tuple consisting of (a list of agreeing reviews, a list of disagreeing reviews).
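
The split into agreeing and disagreeing reviews can be pictured with the sketch below. It does not reproduce the method's internals: `SimpleReview` and `split_reviews` are hypothetical names, the candidate reviews are passed in explicitly, and the inclusive comparisons, as well as treating `time_diff=None` as "no time filter", are assumptions.

```python
from collections import namedtuple

# Hypothetical stand-in for a review; only time and rating are needed.
SimpleReview = namedtuple("SimpleReview", ["time", "rating"])


def split_reviews(target, others, time_diff=None, score_diff=0.25):
    """Partition `others` into reviews agreeing and disagreeing with `target`."""
    agree, disagree = [], []
    for r in others:
        # Skip reviews posted outside the time window, if one is given.
        if time_diff is not None and abs(r.time - target.time) > time_diff:
            continue
        # Reviews with a similar rating agree; all the others disagree.
        if abs(r.rating - target.rating) <= score_diff:
            agree.append(r)
        else:
            disagree.append(r)
    return agree, disagree
```

For a target review with rating 0.5, a review rated 0.6 falls in the first list and reviews rated 0.1 or 0.9 fall in the second, provided they lie inside the time window.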

retrieve_reviews_by_product(*args)[source]

Find reviews to a product.

Parameters: product – A Product instance.
Returns: A list of reviews to the product.
retrieve_reviews_by_reviewer(*args)[source]

Find reviews given by a reviewer.

Parameters: reviewer – A Reviewer instance.
Returns: A list of reviews given by the reviewer.
update()[source]

Update reviewers’ anomalous scores and products’ summaries.

This update process consists of four steps:

  1. Update the honesty of reviews (see also Review.update_honesty()),
  2. Update the trustiness of reviewers (see also Reviewer.update_trustiness()),
  3. Update the reliability of products (see also Product.update_reliability()),
  4. Update the agreement of reviews (see also Review.update_agreement()).
Returns: The sum of the maximum absolute updates over the above four steps.
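
Since update() returns the total amount of change, one natural way to use it is to call it repeatedly until that amount becomes small. The helper below is a sketch of such a loop, not part of the package: `run_until_converged`, `threshold` and `max_iterations` are hypothetical names, and `graph` may be any object whose update() method returns a number as described above.

```python
def run_until_converged(graph, threshold=1e-5, max_iterations=100):
    """Call graph.update() until the reported change drops below threshold.

    Returns the number of update() calls that were made. The default
    threshold and iteration cap are arbitrary choices for this sketch.
    """
    for iteration in range(1, max_iterations + 1):
        if graph.update() < threshold:
            return iteration
    # The cap was reached without the change falling below the threshold.
    return max_iterations
```

The iteration cap guards against inputs for which the scores keep oscillating instead of settling.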
class rsd.graph.Reviewer(graph, name=None, anomalous=None)[source]

Bases: rsd.graph._Node

A node class representing a reviewer.

Parameters:
  • graph – Graph object this reviewer belongs to.
  • name – Name of this reviewer.
  • anomalous – Initial anomalous score (default: None).
trustiness

a float value in [0, 1] which represents trustiness of this reviewer.

anomalous_score

Returns the anomalous score of this reviewer.

The anomalous score is defined by 1 - trustiness.

update_trustiness()[source]

Update trustiness of this reviewer.

The updated trustiness of a reviewer \(u\) is defined by

\[{\rm trustiness}(u) = \frac{2}{1 + \exp(-\sum_{r \in R(u)} {\rm honesty}(r) )} - 1\]

where \(R(u)\) is the set of reviews posted by the reviewer \(u\).

Returns: Absolute difference between the old trustiness and the updated one.
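
The trustiness formula can be sketched as a function of the honesty scores of a reviewer's reviews. The function name `trustiness` and its list argument are hypothetical; this is an illustration of the formula, not the package's code.

```python
import math


def trustiness(honesties):
    """Compute a reviewer's trustiness from the honesty of their reviews.

    `honesties` is a list with the honesty score of every review the
    reviewer has posted (a hypothetical input format).
    """
    total = sum(honesties)
    return 2.0 / (1.0 + math.exp(-total)) - 1.0
```

Following the definition of anomalous_score given above, the corresponding anomalous score would be `1 - trustiness(honesties)`.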