A wrapper of the FRAUDAR algorithm

This package wraps the FRAUDAR algorithm to provide the APIs defined in the Review Graph Mining project.

Installation

Use pip to install this package.

pip install --upgrade rgmining-fraudar

Graph model

The FRAUDAR algorithm assumes review data are represented by a bipartite graph. This graph has two kinds of nodes, reviewers and products. A reviewer node and a product node are connected by an edge if the reviewer reviews the product. We extend the bipartite graph so that we can compute a summary of rating scores. In our bipartite graph, each review, i.e. each edge, has a normalized rating score, which means all scores are in \([0, 1]\).

digraph bipartite {
graph [label="Bipartite graph example.", rankdir = LR];
"reviewer-0";
"reviewer-1";
"product-0";
"product-1";
"product-2";
"reviewer-0" -> "product-0" [label="0.2"];
"reviewer-0" -> "product-1" [label="0.9"];
"reviewer-0" -> "product-2" [label="0.6"];
"reviewer-1" -> "product-1" [label="0.1"];
"reviewer-1" -> "product-2" [label="0.7"];
}

In the above bipartite graph example, there are two reviewers and three products. Both reviewers review product 1 and product 2, but product 0 is only reviewed by reviewer 0.
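To make the structure concrete, the same example can be written down as plain Python data, independent of this package. This is only an illustrative sketch: the names `reviews` and `reviewers_of` are hypothetical and are not part of the fraudar API.

```python
# Edges of the example graph: (reviewer, product) -> normalized rating.
# Plain data for illustration only; this is not the fraudar API.
reviews = {
    ("reviewer-0", "product-0"): 0.2,
    ("reviewer-0", "product-1"): 0.9,
    ("reviewer-0", "product-2"): 0.6,
    ("reviewer-1", "product-1"): 0.1,
    ("reviewer-1", "product-2"): 0.7,
}


def reviewers_of(product):
    """Return the sorted names of reviewers who reviewed `product`."""
    return sorted(r for (r, p) in reviews if p == product)


# List the reviewers attached to each product node.
for product in ("product-0", "product-1", "product-2"):
    print(product, reviewers_of(product))
```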

Usage

Graph Construction

This package provides a review graph class, fraudar.ReviewGraph, which represents the above bipartite graph. The constructor of this class takes two arguments: the number of kinds of fraudulent patterns the algorithm assumes, and the subroutine to use. Currently, three functions are available as the subroutine; aveDegree is the one used in the examples on this page.

See API references and the original article for more information about the subroutines.

To construct a review graph instance that assumes \(n\) kinds of fraudulent patterns and uses aveDegree as the subroutine:

import fraudar
graph = fraudar.ReviewGraph(n, fraudar.aveDegree)

The constructed graph object implements the graph interface.

After constructing a graph instance, you need to add reviewer nodes, product nodes, and review edges. Two methods, new_reviewer() and new_product(), create a reviewer node and a product node, respectively. Both methods take one argument, name, which is the ID of the node. This name must be unique within a graph.

The add_review() method adds a review to the graph. It takes a reviewer object, a product object, and a rating score. The reviewer object and the product object must be created by the above two methods, and the rating score must be a float value in \([0, 1]\).
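If your raw data use another scale, the scores have to be mapped onto \([0, 1]\) before they are passed to add_review(). The helper below is a minimal sketch of one way to do that, assuming a 1 to 5 star scale and a linear mapping. The function normalize_rating is hypothetical and not part of this package, and the source does not say how ratings should be normalized.

```python
def normalize_rating(stars, lowest=1.0, highest=5.0):
    """Map a rating on [lowest, highest] linearly onto [0, 1].

    The 1-5 default range is an assumption made for this example.
    """
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    if not lowest <= stars <= highest:
        raise ValueError("rating out of range: {0}".format(stars))
    return (stars - lowest) / (highest - lowest)


# A 4-star review on a 1-5 scale becomes 0.75.
print(normalize_rating(4))
```

The returned value can then be used as the third argument of add_review().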

For example, let us construct a review graph instance which represents the bipartite graph example in the Graph model section. The graph construction code is

import fraudar

# Construct a Review Graph instance.
# In this example, we choose 1 as the `n`.
n = 1
graph = fraudar.ReviewGraph(n, fraudar.aveDegree)

# Create reviewers and products.
reviewers = [graph.new_reviewer("reviewer-{0}".format(i)) for i in range(2)]
products = [graph.new_product("product-{0}".format(i)) for i in range(3)]

# Add reviews.
graph.add_review(reviewers[0], products[0], 0.2)
graph.add_review(reviewers[0], products[1], 0.9)
graph.add_review(reviewers[0], products[2], 0.6)
graph.add_review(reviewers[1], products[1], 0.1)
graph.add_review(reviewers[1], products[2], 0.7)

Analysis

Method update() starts the FRAUDAR algorithm.

# Run one iteration.
graph.update()

Result

Each reviewer has an anomalous score. If the anomalous score of a reviewer is 1, the reviewer is classified as FRAUD; otherwise, the reviewer is classified as HONEST. The anomalous_score property returns the anomalous score.

The ReviewGraph has a reviewers property, which returns a collection of reviewers, so you can list the names of FRAUD reviewers by

for r in graph.reviewers:
    if r.anomalous_score == 1:
        print(r.name)

Each product, on the other hand, has a summarized rating score. The summarized rating score of a product is the average of the rating scores posted to the product by HONEST reviewers. The summary property returns the summarized rating score.

The ReviewGraph also has a products property, which returns a collection of products, so you can print the summarized rating scores of all products by

for p in graph.products:
    print(p.name, p.summary)
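The definition of the summarized rating score can be illustrated in plain Python. This is a sketch of the averaging rule as described above, not the package's implementation. The function honest_average is hypothetical, and returning None for a product with no HONEST reviews is an assumption, since the text does not say how that case is handled.

```python
def honest_average(ratings, fraud_reviewers):
    """Average the ratings given by reviewers not in `fraud_reviewers`.

    `ratings` maps reviewer name -> normalized rating for one product.
    Returns None when no honest reviewer rated the product (an assumption
    made for this sketch).
    """
    honest = [s for name, s in ratings.items() if name not in fraud_reviewers]
    if not honest:
        return None
    return sum(honest) / len(honest)


# Ratings for one product; reviewer "c" is assumed to be classified as FRAUD.
ratings = {"a": 0.2, "b": 0.8, "c": 1.0}
print(honest_average(ratings, {"c"}))
```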

Script

To summarize the above usage, we write an executable script which takes the parameter n as a command line argument and analyzes the above graph. Let us save the following script as analyze.py.

#!/usr/bin/env python
import click
import fraudar

@click.command()
@click.argument("n", type=int)
def analyze(n):
    graph = fraudar.ReviewGraph(n, fraudar.aveDegree)

    # Create reviewers and products.
    reviewers = [graph.new_reviewer("reviewer-{0}".format(i)) for i in range(2)]
    products = [graph.new_product("product-{0}".format(i)) for i in range(3)]

    # Add reviews.
    graph.add_review(reviewers[0], products[0], 0.2)
    graph.add_review(reviewers[0], products[1], 0.9)
    graph.add_review(reviewers[0], products[2], 0.6)
    graph.add_review(reviewers[1], products[1], 0.1)
    graph.add_review(reviewers[1], products[2], 0.7)

    # Run the algorithm.
    graph.update()

    # Print anomalous reviewers.
    print("Anomalous reviewers.")
    for r in graph.reviewers:
        if r.anomalous_score == 1:
            print(r.name)

    # Print summarized rating scores.
    print("Summaries.")
    for p in graph.products:
        print(p.name, p.summary)


if __name__ == "__main__":
    analyze()

Note that the above script uses click. If you have not installed it, run pip install click.

Then, to analyze the graph with a specific \(n\), for example 5, run the script by

./analyze.py 5

Parameter tuning

In general, a bigger \(n\) produces a better result, but too big an \(n\) produces a worse one. You need to find the value of \(n\) that gives the best result, and that value depends heavily on the data you want to analyze, so you should run the algorithm many times with different values. The parallel_evaluation project may help reduce the evaluation time.
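A simple way to compare several values of \(n\) is to rebuild the graph for each candidate, run update(), and record which reviewers are flagged. The sketch below assumes a caller-supplied build_graph function (hypothetical) that takes \(n\) and returns a fully populated review graph. It uses only the update(), reviewers, name, and anomalous_score members described above. It does not decide which \(n\) is best, since that requires labeled data or some other evaluation criterion.

```python
def flagged_by_n(build_graph, candidates):
    """Run the analysis once per candidate n and collect flagged reviewers.

    `build_graph` is a hypothetical caller-supplied function that takes n
    and returns a review graph with all nodes and reviews already added.
    Returns a dict mapping each n to the sorted names of FRAUD reviewers.
    """
    results = {}
    for n in candidates:
        graph = build_graph(n)
        graph.update()
        results[n] = sorted(
            r.name for r in graph.reviewers if r.anomalous_score == 1
        )
    return results
```

Here build_graph would typically create fraudar.ReviewGraph(n, fraudar.aveDegree) and add the reviewers, products, and reviews as in the script above.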

API Reference

License

This software is released under the GNU General Public License Version 3; see COPYING for more detail.

The original FRAUDAR source code, which is in fraudar/export, was written by Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, and Christos Faloutsos, and is licensed under the Apache License, Version 2.0.