rsd package
Review Graph Based Online Store Review Spammer Detection.
RSD is an algorithm introduced by Guan Wang et al. at ICDM 2011. The algorithm represents review data as a bipartite graph of reviewers and products whose edges are reviews.
This package exports ReviewGraph, which is an alias of rsd.graph.ReviewGraph.
- class rsd.ReviewGraph(theta: float)
Bases: object
A bipartite graph in which one set of nodes represents reviewers and the other set represents products.
Each edge has a label representing a review.
- Parameters:
theta – A parameter for updating: the time ratio used to compute delta. See the paper for more details.
- add_review(reviewer: Reviewer, product: Product, review: float, time: Optional[int] = None) → Review
Add a new review.
- Parameters:
reviewer – An instance of Reviewer.
product – An instance of Product.
review – A real number representing the review score.
time – An integer representing the time of the review (optional).
- Returns:
The new review object.
- new_product(name: Optional[str] = None) → Product
Create a new product.
- Parameters:
name – The name of the new product.
- Returns:
A new product instance.
- new_reviewer(name: Optional[str] = None, anomalous: Optional[float] = None) → Reviewer
Create a new reviewer.
- Parameters:
name – The name of the new reviewer.
anomalous – The initial anomalous score of the new reviewer.
- Returns:
A new reviewer instance.
- retrieve_products(review: Review) → Collection[Product]
Find products associated with a review.
- Parameters:
review – A review instance.
- Returns:
A list of products associated with the given review.
- retrieve_reviewers(review: Review) → Collection[Reviewer]
Find reviewers associated with a review.
- Parameters:
review – A review instance.
- Returns:
A list of reviewers associated with the review.
- retrieve_reviews(review: Review, time_diff: Optional[float] = None, score_diff: float = 0.25) → ReviewSet
Find agree and disagree reviews.
This method retrieves two groups of reviews. Agree reviews have scores similar to that of the given review; disagree reviews, on the other hand, have different scores.
- Parameters:
review – A review instance.
time_diff – A time span; only reviews posted within this span of the given review are considered.
score_diff – A float threshold on the score difference that separates agree reviews from disagree reviews.
- Returns:
A ReviewSet, i.e. a named tuple (agree, disagree) holding the agree reviews and the disagree reviews.
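To make the agree/disagree split concrete, here is a minimal standalone sketch that does not call rsd. The names SimpleReview, SplitReviews and split_reviews are hypothetical, and the exact rules (no time restriction when time_diff is None, agreement when the rating difference is at most score_diff, the review itself excluded) are assumptions rather than the package's confirmed behavior.

```python
from typing import List, NamedTuple, Optional, Sequence


class SimpleReview(NamedTuple):
    """Hypothetical stand-in for rsd.graph.Review (time and rating only)."""
    time: float
    rating: float


class SplitReviews(NamedTuple):
    """Same (agree, disagree) shape as rsd.graph.ReviewSet."""
    agree: List[SimpleReview]
    disagree: List[SimpleReview]


def split_reviews(target: SimpleReview, others: Sequence[SimpleReview],
                  time_diff: Optional[float] = None,
                  score_diff: float = 0.25) -> SplitReviews:
    """Split reviews of the same product into agree and disagree groups.

    Assumed rules: time_diff=None means no time restriction; a review agrees
    when its rating differs from the target's by at most score_diff.
    """
    agree: List[SimpleReview] = []
    disagree: List[SimpleReview] = []
    for other in others:
        if other is target:
            continue  # never compare a review with itself
        if time_diff is not None and abs(other.time - target.time) > time_diff:
            continue  # outside [t - time_diff, t + time_diff]
        if abs(other.rating - target.rating) <= score_diff:
            agree.append(other)
        else:
            disagree.append(other)
    return SplitReviews(agree, disagree)


r0 = SimpleReview(time=10, rating=0.8)
reviews = [r0, SimpleReview(11, 0.9), SimpleReview(12, 0.2), SimpleReview(100, 0.8)]
print(split_reviews(r0, reviews, time_diff=5))
```

With time_diff=5 the review posted at time 100 falls outside the window and is ignored; without a time_diff it would count as an agree review.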
- retrieve_reviews_by_product(product: Product) → Collection[Review]
Find reviews of a product.
- Parameters:
product – A product instance.
- Returns:
A list of reviews of the product.
- retrieve_reviews_by_reviewer(reviewer: Reviewer) → Collection[Review]
Find reviews given by a reviewer.
- Parameters:
reviewer – A reviewer instance.
- Returns:
A list of reviews given by the reviewer.
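The two lookups above are what a bipartite adjacency structure provides. The following is a minimal sketch, assuming reviewers and products are identified by plain strings and each review is a (reviewer, product, rating) tuple; TinyReviewIndex is a hypothetical class, not part of rsd.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Edge = Tuple[str, str, float]  # (reviewer, product, rating)


class TinyReviewIndex:
    """Hypothetical bipartite index: reviewers and products are nodes,
    and each review is an edge labelled with its rating."""

    def __init__(self) -> None:
        self.by_reviewer: DefaultDict[str, List[Edge]] = defaultdict(list)
        self.by_product: DefaultDict[str, List[Edge]] = defaultdict(list)

    def add_review(self, reviewer: str, product: str, rating: float) -> Edge:
        edge = (reviewer, product, rating)
        # Register the edge under both endpoints so either side can be queried.
        self.by_reviewer[reviewer].append(edge)
        self.by_product[product].append(edge)
        return edge

    def reviews_by_product(self, product: str) -> List[Edge]:
        # .get avoids creating empty entries for unknown products.
        return list(self.by_product.get(product, []))

    def reviews_by_reviewer(self, reviewer: str) -> List[Edge]:
        return list(self.by_reviewer.get(reviewer, []))


idx = TinyReviewIndex()
idx.add_review("alice", "p1", 0.9)
idx.add_review("bob", "p1", 0.2)
idx.add_review("alice", "p2", 0.5)
print(idx.reviews_by_product("p1"))      # [('alice', 'p1', 0.9), ('bob', 'p1', 0.2)]
print(idx.reviews_by_reviewer("alice"))  # [('alice', 'p1', 0.9), ('alice', 'p2', 0.5)]
```

Storing each edge under both endpoints trades a little memory for constant-time access from either side of the graph.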
- update() → float
Update reviewers’ anomalous scores and products’ summaries.
This update process consists of four steps:
1. Update honesty of reviews (see also Review.update_honesty()).
2. Update trustiness of reviewers (see also Reviewer.update_trustiness()).
3. Update reliability of products (see also Product.update_reliability()).
4. Update agreements of reviews (see also Review.update_agreement()).
- Returns:
The sum of the maximum absolute updates of the above four steps.
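Because update() reports how much the scores changed, a caller can repeat it until the change becomes small. The helper below is a hypothetical sketch of such a loop; the name run_until_converged, the tolerance, and the iteration cap are illustrative assumptions, not values prescribed by rsd.

```python
from typing import Callable


def run_until_converged(update: Callable[[], float], tol: float = 1e-4,
                        max_iter: int = 100) -> int:
    """Call `update` until the change it reports drops below `tol`.

    `update` is assumed to behave like ReviewGraph.update(): perform one
    round of the four steps and return the summed maximum absolute change.
    Returns the number of rounds executed (at most `max_iter`).
    """
    for i in range(1, max_iter + 1):
        if update() < tol:
            return i
    return max_iter
```

With the package installed, it would be called as run_until_converged(graph.update) on a populated ReviewGraph. The iteration cap guards against inputs for which the scores keep oscillating.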
- property delta: float
Time delta.
This value is defined by \(\delta = (t_{\rm max} - t_{\rm min}) \times \theta\), where \(t_{\rm max}\) and \(t_{\rm min}\) are the maximum and minimum times of all reviews, respectively, and \(\theta\) is the given parameter defining the time ratio.
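A worked instance of the formula, written as a hypothetical helper time_delta that takes the review times directly rather than reading them from a graph: with times 3, 7 and 10 and θ = 0.5, δ = (10 - 3) × 0.5 = 3.5.

```python
from typing import Iterable


def time_delta(times: Iterable[float], theta: float) -> float:
    """Evaluate delta = (t_max - t_min) * theta over all review times."""
    times = list(times)  # allow any iterable, e.g. a generator
    return (max(times) - min(times)) * theta


print(time_delta([3, 10, 7], 0.5))  # (10 - 3) * 0.5 = 3.5
```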
- products: Final[list[rsd.graph.Product]]
Collection of products.
- reviewers: Final[list[rsd.graph.Reviewer]]
Collection of reviewers.
- reviews: Final[list[rsd.graph.Review]]
Collection of reviews.
Submodules
rsd.graph module
Implementation of RSD.
- class rsd.graph.Node(graph: ReviewGraph, name: Optional[str] = None)
Bases: object
Abstract base class of the nodes in a review graph.
- Parameters:
graph – The graph object this node will belong to.
name – The name of this node.
- class rsd.graph.Product(graph: ReviewGraph, name: Optional[str] = None)
Bases: Node
A node class representing a product.
- Parameters:
graph – Graph object this product belongs to.
name – Name of this product.
- update_reliability() → float
Update the product’s reliability.
The new reliability is defined by
\[{\rm reliability}(p) = \frac{2}{1 + e^{-\theta}} - 1, \quad \theta = \sum_{r \in R(p)} {\rm trustiness}(r)({\rm review}(r, p) - \hat{s}),\]
where \(R(p)\) is the set of reviewers who have reviewed product p, trustiness is defined in Reviewer.update_trustiness(), review(r, p) is the review score reviewer r has given to product p, and \(\hat{s}\) is the median of review scores. Note that \(\theta\) here is a local quantity, not the theta parameter of ReviewGraph.
- Returns:
Absolute difference between the old reliability and the new one.
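The reliability formula can be evaluated directly once each reviewer's trustiness and the median score \(\hat{s}\) are known. The sketch below is standalone; the function name, the (trustiness, score) pair input, and the sample data are hypothetical. Note that \(2/(1 + e^{-x}) - 1 = \tanh(x/2)\), so reliability always lies in (-1, 1).

```python
import math
from statistics import median
from typing import Iterable, Tuple


def reliability(trust_and_scores: Iterable[Tuple[float, float]], s_hat: float) -> float:
    """Evaluate reliability(p) = 2 / (1 + exp(-theta)) - 1, where
    theta = sum over reviewers r of trustiness(r) * (review(r, p) - s_hat).

    `trust_and_scores` holds one (trustiness, score) pair per reviewer of p.
    """
    theta = sum(trust * (score - s_hat) for trust, score in trust_and_scores)
    return 2.0 / (1.0 + math.exp(-theta)) - 1.0


# Hypothetical data: two trusted reviewers rate at or above the median,
# one distrusted reviewer rates below it.
pairs = [(0.9, 0.8), (0.7, 0.9), (-0.5, 0.1)]
s_hat = median(score for _, score in pairs)  # median of these scores: 0.8
print(reliability(pairs, s_hat))
```

The distrusted reviewer's low score raises reliability here, because a negative trustiness flips the sign of its contribution.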
- class rsd.graph.Review(graph: ReviewGraph, time: int, rating: float)
Bases: object
A graph entity representing a review.
- Parameters:
graph – Graph object this review belongs to.
time – Time when this review was posted.
rating – Rating of this review.
- update_agreement(delta: float) → float
Update agreement of this review.
This process considers only reviews posted close in time to this review. More precisely, let \(t\) be the time when this review was posted and \(\delta\) be the time span; only reviews whose posting times lie in \([t - \delta, t + \delta]\) are considered.
The updated agreement of a review \(r\) is computed from such reviews by
\[{\rm agreement}(r) = \frac{2}{1 + \exp\left( -\left( \sum_{v \in R_{+}} {\rm trustiness}(v) - \sum_{v \in R_{-}} {\rm trustiness}(v) \right) \right)} - 1\]
where \(R_{+}\) is the set of reviews whose ratings are close to that of \(r\), i.e. whose rating difference from \(r\) is within a score threshold, and \(R_{-}\) is the set of the other reviews. The trustiness of a review means the trustiness of the reviewer who posted it.
- Parameters:
delta – A time span \(\delta\). Only reviews posted within the span are considered for this update.
- Returns:
Absolute difference between the old agreement and the new one.
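A standalone sketch of the agreement formula, written with the sign convention shared by the reliability and trustiness formulas, i.e. 2 / (1 + exp(-(S+ - S-))) - 1, so that trust from agreeing reviewers pushes the score toward 1. The inputs are assumed to be the trustiness values of the reviewers behind \(R_{+}\) and \(R_{-}\); the function name is hypothetical.

```python
import math
from typing import Iterable


def agreement(agree_trust: Iterable[float], disagree_trust: Iterable[float]) -> float:
    """Evaluate 2 / (1 + exp(-(sum(R+) - sum(R-)))) - 1 from trustiness values.

    `agree_trust` holds the trustiness of the reviewers of reviews in R+,
    `disagree_trust` those of the reviews in R-.
    """
    x = sum(agree_trust) - sum(disagree_trust)
    return 2.0 / (1.0 + math.exp(-x)) - 1.0


print(agreement([0.9, 0.8], [0.1]))  # trusted reviewers agree: positive
print(agreement([0.1], [0.9, 0.8]))  # trusted reviewers disagree: negative
```

Swapping the two groups negates the result, and with no nearby reviews at all the agreement is 0.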
- update_honesty() → float
Update honesty of this review.
The updated honesty of this review \(r\) is defined by
\[{\rm honesty}(r) = |{\rm reliability}(P(r))| \times {\rm agreement}(r)\]
where \(P(r)\) is the product this review was posted for.
- Returns:
Absolute difference between the old honesty and the new one.
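Honesty combines product reliability and review agreement. A hypothetical helper evaluating it shows one consequence of the absolute value: agreement alone carries the sign, so a review that agrees with its peers counts as honest even on a product whose reliability is negative.

```python
def honesty(product_reliability: float, review_agreement: float) -> float:
    """Evaluate honesty(r) = |reliability(P(r))| * agreement(r)."""
    return abs(product_reliability) * review_agreement


# Reliability only scales the magnitude; agreement decides the sign.
print(honesty(-0.5, 0.4))
print(honesty(0.5, -0.4))
```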
- class rsd.graph.ReviewGraph(theta: float)
The class exported by the package as rsd.ReviewGraph; its documentation is identical to that of rsd.ReviewGraph above.
- class rsd.graph.ReviewSet(agree: Collection[Review], disagree: Collection[Review])
Bases: NamedTuple
Pair of agree reviews and disagree reviews.
- agree: Collection[Review]
Collection of agree reviews.
- disagree: Collection[Review]
Collection of disagree reviews.
- class rsd.graph.Reviewer(graph: ReviewGraph, name: Optional[str] = None, anomalous: Optional[float] = None)
Bases: Node
A node class representing a reviewer.
- Parameters:
graph – Graph object this reviewer belongs to.
name – Name of this reviewer.
anomalous – Initial anomalous score (default: None).
- update_trustiness() → float
Update trustiness of this reviewer.
The updated trustiness of a reviewer \(u\) is defined by
\[{\rm trustiness}(u) = \frac{2}{1 + \exp\left(-\sum_{r \in R(u)} {\rm honesty}(r) \right)} - 1\]
where \(R(u)\) is the set of reviews the reviewer \(u\) has posted.
- Returns:
Absolute difference between the old trustiness and the updated one.
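A matching sketch for trustiness, assuming the honesty values of the reviewer's reviews are already available; the function name is hypothetical. A reviewer with no reviews gets a trustiness of 0, mostly honest reviews push it toward 1, and mostly dishonest ones push it toward -1.

```python
import math
from typing import Iterable


def trustiness(honesties: Iterable[float]) -> float:
    """Evaluate trustiness(u) = 2 / (1 + exp(-sum(honesty(r)))) - 1 over the
    honesty values of the reviews posted by u."""
    return 2.0 / (1.0 + math.exp(-sum(honesties))) - 1.0


print(trustiness([0.5, 0.5]))    # equals tanh(0.5), about 0.462
print(trustiness([-0.9, -0.8]))  # negative: dishonest reviews dominate
```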