rsd package

Review Graph Based Online Store Review Spammer Detection.

RSD is an algorithm introduced by Guan Wang et al. at ICDM 2011. It represents review data as the following graph.

$digraph bipartite { graph [label="Graph model used in RSD.", rankdir = LR]; "r1" [label="Reviewer 1 (trustiness: 0.1)"]; "r2" [label="Reviewer 2 (trustiness: 0.9)"]; "r3" [label="Reviewer 3 (trustiness: 0.5)"]; "p1" [label="Product 1 (reliability: 0.3)"]; "p2" [label="Product 2 (reliability: 0.8)"]; "r1p1" [label="0.3"]; "r1p2" [label="0.9"]; "r2p2" [label="0.1"]; "r3p2" [label="0.5"]; "r1" -> "r1p1" -> "p1"; "r1" -> "r1p2" -> "p2"; "r2" -> "r2p2" -> "p2"; "r3" -> "r3p2" -> "p2"; "d_r1p1" [shape=box, label="time: 1 honesty: 0.4 agreement: 1.0 "]; "d_r1p2" [shape=box, label="time: 4 honesty: 0.1 agreement: 0.3 "]; "d_r2p2" [shape=box, label="time: 2 honesty: 0.8 agreement: 0.3 "]; "d_r3p2" [shape=box, label="time: 3 honesty: 0.2 agreement: 0.3 "]; "r1p1" -> "d_r1p1" [style=dotted]; "r1p2" -> "d_r1p2" [style=dotted]; "r2p2" -> "d_r2p2" [style=dotted]; "r3p2" -> "d_r3p2" [style=dotted]; }$

This package exports ReviewGraph, which is an alias of rsd.graph.ReviewGraph.

class rsd.ReviewGraph(theta: float)[source]

Bases: object

A bipartite graph in which one set of nodes represents reviewers and the other set represents products.

Each edge has a label representing a review.

Parameters:: theta – The time-ratio parameter used when updating (see delta). See the paper for more details.

add_review(reviewer: Reviewer, product: Product, review: float, time: int | None = None) → Review[source]

Add a new review.

Parameters:

reviewer – An instance of Reviewer.
product – An instance of Product.
review – A real number representing review score.
time – An integer representing reviewing time. (optional)

Returns:

the new review object.

property delta: float

Time delta.

This value is defined by $\delta = (t_{\rm max} - t_{\rm min}) \times \theta$, where $t_{\rm max}$ and $t_{\rm min}$ are the maximum and minimum times over all reviews, respectively, and $\theta$ is the given parameter defining the time ratio.
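As a worked illustration of the formula, the following minimal sketch computes the time delta from a plain list of review times. The helper name time_delta is hypothetical and not part of rsd; the sample times are the ones shown in the graph model above, and theta = 0.5 is an arbitrary choice.

```python
def time_delta(times, theta):
    """Return (t_max - t_min) * theta for a non-empty iterable of review times.

    Hypothetical helper for illustration only; not part of the rsd package.
    """
    times = list(times)
    if not times:
        raise ValueError("at least one review time is required")
    return (max(times) - min(times)) * theta


# Review times 1, 4, 2, 3 with theta = 0.5: (4 - 1) * 0.5
print(time_delta([1, 4, 2, 3], 0.5))  # 1.5
```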

graph: Final[DiGraph]: Graph object of networkx.

new_product(name: str) → Product[source]

Create a new product.

Parameters:: name – The name of the new product.
Returns:: A new product instance.

new_reviewer(name: str, anomalous: float | None = None) → Reviewer[source]

Create a new reviewer.

Parameters:

name – the name of the new reviewer.
anomalous – the anomalous score of the new reviewer.

Returns:

A new reviewer instance.

products: Final[list[Product]]: Collection of products.

retrieve_products(review: Review) → Collection[Product][source]

Find products associated with a review.

Parameters:: review – A review instance.
Returns:: A list of products associated with the given review.

retrieve_reviewers(review: Review) → Collection[Reviewer][source]

Find reviewers associated with a review.

Parameters:: review – A review instance.
Returns:: A list of reviewers associated with the review.

retrieve_reviews(review: Review, time_diff: float | None = None, score_diff: float = 0.25) → ReviewSet[source]

Find agree and disagree reviews.

This method retrieves two groups of reviews. Agree reviews have scores similar to that of the given review; disagree reviews, on the other hand, have different scores.

Parameters:

review – A review instance.
time_diff – A float value, or None (optional).
score_diff – A float value (default: 0.25).

Returns:

A ReviewSet, i.e. a tuple consisting of (a collection of agree reviews, a collection of disagree reviews).
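The agree/disagree split can be sketched without the package as follows. SimpleReview and split_reviews are hypothetical names, not part of rsd. The sketch assumes that a review agrees when the absolute rating difference is at most score_diff, and that time_diff, when given, keeps only reviews within that time distance of the target; the actual implementation may differ in both respects.

```python
from typing import NamedTuple


class SimpleReview(NamedTuple):
    """Hypothetical stand-in for rsd.graph.Review (time and rating only)."""
    time: int
    rating: float


def split_reviews(target, others, time_diff=None, score_diff=0.25):
    """Split `others` into (agree, disagree) lists relative to `target`.

    Assumptions of this sketch: reviews farther than `time_diff` from the
    target are ignored, and ratings within `score_diff` count as agreeing.
    """
    agree, disagree = [], []
    for other in others:
        if time_diff is not None and abs(other.time - target.time) > time_diff:
            continue  # outside the time window, skip entirely
        if abs(other.rating - target.rating) <= score_diff:
            agree.append(other)
        else:
            disagree.append(other)
    return agree, disagree


target = SimpleReview(time=2, rating=0.5)
others = [SimpleReview(1, 0.6), SimpleReview(3, 0.9), SimpleReview(9, 0.5)]
agree, disagree = split_reviews(target, others, time_diff=2)
print(agree)     # only the review at time 1
print(disagree)  # only the review at time 3; the review at time 9 is too far away
```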

retrieve_reviews_by_product(product: Product) → Collection[Review][source]

Find reviews of a product.

Parameters:: product – A Product instance.
Returns:: A list of reviews of the product.

retrieve_reviews_by_reviewer(reviewer: Reviewer) → Collection[Review][source]

Find reviews given by a reviewer.

Parameters:: reviewer – A Reviewer instance.
Returns:: A list of reviews given by the reviewer.

reviewers: Final[list[Reviewer]]: Collection of reviewers.

reviews: Final[list[Review]]: Collection of reviews.

update() → float[source]

Update reviewers’ anomalous scores and products’ summaries.

This update process consists of four steps:

Update honesty of reviews (See also Review.update_honesty()),
Update trustiness of reviewers (See also Reviewer.update_trustiness()),
Update reliability of products (See also Product.update_reliability()),
Update agreements of reviews (See also Review.update_agreement()).

Returns:: the sum of the maximum absolute updates over the above four steps.
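Because update() returns the total amount of change, a caller can repeat it until that value becomes small. The loop below is a minimal sketch of this idea; run_until_converged, the threshold, the iteration cap, and the FakeGraph stand-in are all assumptions made for illustration and are not part of rsd. Any object with an update() method returning a float would work in place of FakeGraph.

```python
def run_until_converged(graph, threshold=1e-3, max_iterations=100):
    """Call graph.update() until the returned change drops below threshold.

    Returns the number of update() calls made (at most max_iterations).
    """
    for iteration in range(1, max_iterations + 1):
        if graph.update() < threshold:
            return iteration
    return max_iterations


class FakeGraph:
    """Hypothetical stand-in whose update() reports a change that halves each call."""

    def __init__(self):
        self.change = 1.0

    def update(self):
        self.change /= 2
        return self.change


# Reported changes are 0.5, 0.25, 0.125, 0.0625; the fourth is below 0.1.
print(run_until_converged(FakeGraph(), threshold=0.1))  # 4
```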

Submodules

rsd.graph module

Implementation of RSD.

class rsd.graph.Node(graph: ReviewGraph, name: str)[source]

Bases: object

Abstract base class for nodes of the review graph.

Parameters:

graph – the graph object this node will belong to.
name – name of this node.

name: Final[str]: Name of this node.

class rsd.graph.Product(graph: ReviewGraph, name: str)[source]

Bases: Node

A node class representing a product.

Parameters:

graph – Graph object this product belongs to.
name – Name of this product.

reliability: float: A float value in [0, 1], which represents reliability of this product.

property summary: float

Summary of reviews.

This value is the same as reliability. The original algorithm uses reliability, whereas our algorithm uses summary. For convenience, both properties are kept.

update_reliability() → float[source]

Update product’s reliability.

The new reliability is defined by

\[{\rm reliability}(p) = \frac{2}{1 + e^{-\theta}} - 1, \quad \theta = \sum_{r \in R(p)} {\rm trustiness}(r)({\rm review}(r, p) - \hat{s}),\]

where $R(p)$ is the set of reviewers who have reviewed product $p$, trustiness is defined in Reviewer.trustiness, ${\rm review}(r, p)$ is the review score reviewer $r$ has given to product $p$, and $\hat{s}$ is the median of review scores.

Returns:: absolute difference between old reliability and new one.
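The formula can be evaluated by hand with a few lines of Python. This is only a sketch of the arithmetic: the function name reliability and the (trustiness, rating) pair representation are hypothetical, and $\hat{s}$ is passed in explicitly because which scores the median is taken over is not specified here. The example takes the median of the example ratings themselves, which is an assumption.

```python
import math
from statistics import median


def reliability(reviews, s_hat):
    """Evaluate 2 / (1 + exp(-theta)) - 1 for one product.

    `reviews` is a list of (trustiness, rating) pairs, one per reviewer.
    `s_hat` is the median review score, supplied by the caller.
    """
    theta = sum(trust * (rating - s_hat) for trust, rating in reviews)
    return 2.0 / (1.0 + math.exp(-theta)) - 1.0


# Three reviewers: a low-trust one rating high, a high-trust one rating low,
# and a mid-trust one rating at the median.
reviews = [(0.1, 0.9), (0.9, 0.1), (0.5, 0.5)]
s_hat = median(rating for _, rating in reviews)  # assumption: median of these ratings
value = reliability(reviews, s_hat)
print(value)  # negative, because the trusted reviewer rated below the median
```

The expression is a logistic function rescaled to the open interval (-1, 1), so a product with no reviews, or with ratings that balance out around $\hat{s}$, gets a value of 0.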

class rsd.graph.Review(graph: ReviewGraph, time: int, rating: float)[source]

Bases: object

A graph entity representing a review.

Parameters:

graph – Graph object this review belongs to.
time – When this review was posted.
rating – Rating of this review.

agreement: float: Agreement score.

honesty: float: Honesty score.

rating: Final[float]: Rating score of this review.

time: Final[int]: Time when this review was posted.

update_agreement(delta: float) → float[source]

Update agreement of this review.

This process considers only reviews posted close in time to this review. More precisely, let $t$ be the time when this review was posted and $\delta$ be the time span; only reviews whose posting times are in $[t - \delta, t + \delta]$ are considered.

The updated agreement of a review $r$ is computed from those reviews by

\[{\rm agreement}(r) = \frac{2}{1 + \exp\left(-\left( \sum_{v \in R_{+}} {\rm trustiness}(v) - \sum_{v \in R_{-}} {\rm trustiness}(v) \right)\right)} - 1\]

where $R_{+}$ is the set of reviews whose ratings are close to that of the review $r$, i.e. the difference between the ratings is smaller than or equal to a given score threshold, and $R_{-}$ is the set of the other reviews. The trustiness of a review means the trustiness of the reviewer who posted it.

Parameters:: delta – a time span $\delta$. Only reviews posted in the span will be considered for this update.
Returns:: absolute difference between old agreement and new one.
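The agreement update can be sketched as a small function over the trustiness values of the agreeing and disagreeing reviews. The function name agreement and its list arguments are hypothetical. The sketch assumes the same logistic form used for trustiness and reliability, with exponent $-(\sum_{R_{+}} - \sum_{R_{-}})$, so that agreement grows as trusted reviewers agree; treat that sign convention as an assumption of this example.

```python
import math


def agreement(agree_trust, disagree_trust):
    """Agreement from trustiness values of agreeing and disagreeing reviews.

    Assumption of this sketch: the exponent is the negated difference, so
    more trusted agreement yields a larger (more positive) score.
    """
    diff = sum(agree_trust) - sum(disagree_trust)
    return 2.0 / (1.0 + math.exp(-diff)) - 1.0


print(agreement([], []))                # 0.0 when nothing agrees or disagrees
print(agreement([0.9, 0.5], [0.1]) > 0)  # True: trusted reviewers agree
print(agreement([0.1], [0.9, 0.5]) < 0)  # True: trusted reviewers disagree
```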

update_honesty() → float[source]

Update honesty of this review.

The updated honesty of this review $r$ is defined by

\[{\rm honesty}(r) = |{\rm reliability}(P(r))| \times {\rm agreement}(r)\]

where $P(r)$ is the product this review was posted for.

Returns:: absolute difference between old honesty and new one.
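As a quick numeric check of the formula, the sketch below multiplies the absolute reliability of the product by the agreement of the review. The function name honesty and the sample values are hypothetical and chosen only for illustration.

```python
import math


def honesty(product_reliability, review_agreement):
    """Return |reliability| * agreement for a single review."""
    return abs(product_reliability) * review_agreement


# The sign of the reliability is discarded; the sign of the agreement is kept.
print(math.isclose(honesty(-0.5, 0.8), 0.4))   # True
print(math.isclose(honesty(0.5, -0.8), -0.4))  # True
```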

class rsd.graph.ReviewGraph(theta: float)[source]

Bases: object

This is the same class that the package exports as rsd.ReviewGraph; see the rsd.ReviewGraph entry above for its parameters, attributes, and methods.

class rsd.graph.ReviewSet(agree: Collection[Review], disagree: Collection[Review])[source]

Bases: NamedTuple

Pair of agreed reviews and disagreed reviews.

agree: Collection[Review]: Collection of agreed reviews.

disagree: Collection[Review]: Collection of disagreed reviews.

class rsd.graph.Reviewer(graph: ReviewGraph, name: str, anomalous: float | None = None)[source]

Bases: Node

A node class representing a reviewer.

Parameters:

graph – Graph object this reviewer belongs to.
name – Name of this reviewer.
anomalous – Initial anomalous score (default: None).

property anomalous_score: float

Returns the anomalous score of this reviewer.

The anomalous score is defined by 1 - trustiness.

trustiness: float: A float value in [0, 1] which represents trustiness of this reviewer.

update_trustiness() → float[source]

Update trustiness of this reviewer.

The updated trustiness of a reviewer $u$ is defined by

\[{\rm trustiness}(u) = \frac{2}{1 + \exp(-\sum_{r \in R(u)} {\rm honesty}(r) )} - 1\]

where $R(u)$ is the set of reviews posted by the reviewer $u$.

Returns:: absolute difference between the old trustiness and the updated one.
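The trustiness formula and the anomalous score derived from it (1 - trustiness, as stated for anomalous_score) can be sketched together. The function names and the list-of-floats input are hypothetical; this only illustrates the arithmetic, not the package's internals.

```python
import math


def trustiness(honesties):
    """Return 2 / (1 + exp(-sum of honesty scores)) - 1 for one reviewer."""
    return 2.0 / (1.0 + math.exp(-sum(honesties))) - 1.0


def anomalous_score(trust):
    """Anomalous score defined as 1 - trustiness."""
    return 1.0 - trust


# A reviewer with no reviews has trustiness 0 and anomalous score 1.
print(trustiness([]))                    # 0.0
print(anomalous_score(trustiness([])))   # 1.0

# Honest reviews raise trustiness and lower the anomalous score.
t = trustiness([0.4, 0.1])
print(t > 0 and anomalous_score(t) < 1.0)  # True
```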