Review Graph Mining
A framework for review data mining based on a graph model.
It serves both data analysts who want to analyze their review data and research scientists who want to develop new mining algorithms.
Data analysts can easily compare several algorithms:
- Mutually Reinforcing Analysis (MRA) [1] (rgmining-ria),
- Repeated Improvement Analysis (RIA) [2] (rgmining-ria),
- Detecting Product Review Spammers Using Rating Behaviors [3] (rgmining-ria),
- Review Spammer Detection [4] (rgmining-rsd),
- Fraud Eagle [5] (rgmining-fraud-eagle),
- FRAUDAR [6] (rgmining-fraudar).
All packages are also available on PyPI, and you can install them with pip install.
To use an algorithm, you create the review graph object provided by its package, create reviewers and products, add reviews, and then run the algorithm. For example, the following code shows one way to run the MRA algorithm on a small review graph:
# Construct a Review Graph instance.
import ria
graph = ria.mra_graph()
# Create reviewers and products.
reviewers = [graph.new_reviewer("reviewer-{0}".format(i)) for i in range(1, 4)]
products = [graph.new_product("product-{0}".format(i)) for i in range(1, 3)]
# Add reviews.
graph.add_review(reviewers[0], products[0], 0.3)
graph.add_review(reviewers[0], products[1], 0.9)
graph.add_review(reviewers[1], products[1], 0.1)
graph.add_review(reviewers[2], products[1], 0.5)
# Start the algorithm.
for i in range(10000):
    # Run one iteration.
    diff = graph.update()
    print("Iteration %d ends. (diff=%s)" % (i + 1, diff))
    # If the update becomes negligible, the algorithm ends.
    if diff < 10**-5:
        break
# Print results.
for r in graph.reviewers:
    print(r.name, r.anomalous_score)
for p in graph.products:
    print(p.name, p.summary)
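The iteration loop in the example can be factored into a small helper so the same stopping rule is reused across algorithms. The following is a minimal sketch, assuming only that the graph object exposes an update() method returning a numeric change, as in the example above. The names run_until_converged and HalvingGraph are hypothetical and not part of any rgmining package; HalvingGraph is a stand-in so the sketch runs without the packages installed.

```python
def run_until_converged(graph, max_iterations=10000, tolerance=1e-5):
    """Call graph.update() until the returned change drops below tolerance.

    Returns the number of iterations performed and the last reported change.
    """
    iterations = 0
    diff = float("inf")
    while iterations < max_iterations:
        diff = graph.update()
        iterations += 1
        # Stop as soon as the update becomes negligible.
        if diff < tolerance:
            break
    return iterations, diff


class HalvingGraph:
    """Hypothetical stand-in for a review graph: each update halves the change."""

    def __init__(self):
        self.diff = 1.0

    def update(self):
        self.diff /= 2
        return self.diff


iterations, last_diff = run_until_converged(HalvingGraph())
print(iterations, last_diff)
```

With a real graph, such as the one built in the example, the call would be run_until_converged(graph) followed by the result-printing loops.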
If you want to analyze a dataset provided by this project, or your review data are in JSON format, you can run the algorithms via the analyze command provided by the Scripts for Analyzing Review Graphs package.
For example, the following command analyzes the synthetic dataset with the Fraud Eagle algorithm and parameter \(\epsilon = 0.25\):
$ analyze synthetic feagle --method-param epsilon=0.25
To run the above command, you may need to install the rgmining-script, rgmining-synthetic-dataset, and rgmining-fraud-eagle packages.
Alternatively, the following command analyzes your review data file review.json with the FRAUDAR algorithm and parameter blocks = 10:
$ analyze file --dataset-param file=review.json fraudar --method-param blocks=10
To run the above command, you may need to install the rgmining-script and rgmining-fraudar packages.
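When comparing several algorithms, it can help to assemble these command lines programmatically. The sketch below only builds argument lists and does not run anything. The layout it assumes (dataset name, dataset parameters, method name, method parameters) is inferred from the two example commands above and may not cover every option of the analyze command; build_analyze_command is a hypothetical helper, not part of rgmining-script.

```python
import shlex


def build_analyze_command(dataset, method, dataset_params=None, method_params=None):
    """Assemble an argument list for the `analyze` command.

    Assumed layout, inferred from the two examples above:
    analyze <dataset> [--dataset-param k=v ...] <method> [--method-param k=v ...]
    """
    command = ["analyze", dataset]
    for key, value in (dataset_params or {}).items():
        command += ["--dataset-param", "{0}={1}".format(key, value)]
    command.append(method)
    for key, value in (method_params or {}).items():
        command += ["--method-param", "{0}={1}".format(key, value)]
    return command


synthetic = build_analyze_command(
    "synthetic", "feagle", method_params={"epsilon": 0.25}
)
from_file = build_analyze_command(
    "file",
    "fraudar",
    dataset_params={"file": "review.json"},
    method_params={"blocks": 10},
)
print(shlex.join(synthetic))
print(shlex.join(from_file))
```

Once the required packages are installed, each list could be passed to subprocess.run to execute the command.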
Research scientists can evaluate their own algorithms by comparing them with the other algorithms. This project also provides several dataset loaders:
- A Synthetic Review Dataset Loader (rgmining-synthetic-dataset),
- Six Categories of Amazon Product Reviews Loader (rgmining-amazon-dataset),
- Trip Advisor Dataset Loader (rgmining-tripadvisor-dataset).
To use these loaders, you need to make a review graph object which implements the Graph interface.
For example, the following code shows a way to load the Six Categories of Amazon Product Reviews Dataset into a graph object graph:
import amazon
# `graph` must implement the graph interface.
amazon.load(graph)
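As a rough illustration of what such a graph object might look like, the sketch below defines a minimal in-memory graph. It assumes the Graph interface consists of the three methods used in the MRA example (new_reviewer, new_product, and add_review). The actual interface may require more, for instance extra arguments that individual loaders pass in, so check each loader's documentation. SimpleGraph and the named tuples are hypothetical names, not part of the project.

```python
from collections import namedtuple

# Hypothetical record types, used only for this illustration.
Reviewer = namedtuple("Reviewer", ["name"])
Product = namedtuple("Product", ["name"])
Review = namedtuple("Review", ["reviewer", "product", "rating"])


class SimpleGraph:
    """Minimal in-memory review graph (assumed interface, not the project's)."""

    def __init__(self):
        self.reviewers = []
        self.products = []
        self.reviews = []

    def new_reviewer(self, name):
        """Create, store, and return a reviewer node."""
        reviewer = Reviewer(name)
        self.reviewers.append(reviewer)
        return reviewer

    def new_product(self, name):
        """Create, store, and return a product node."""
        product = Product(name)
        self.products.append(product)
        return product

    def add_review(self, reviewer, product, rating):
        """Record a review as an edge from a reviewer to a product."""
        review = Review(reviewer, product, rating)
        self.reviews.append(review)
        return review


graph = SimpleGraph()
first_reviewer = graph.new_reviewer("reviewer-1")
first_product = graph.new_product("product-1")
graph.add_review(first_reviewer, first_product, 0.3)
print(len(graph.reviewers), len(graph.products), len(graph.reviews))
```

A graph like this only stores the loaded data; running a mining algorithm still requires one of the graph classes provided by the algorithm packages.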
Contribution
We welcome contributions to this project, such as issue reports, new mining algorithms, and new datasets.
If you publish something using this project, please send us a link to it. We would be glad to link to your materials.
You can find contact information at the bottom of this page.
License
All software in this project is released under the GNU General Public License Version 3; see COPYING for details.
References
[1] Kazuki Tawaramoto, Junpei Kawamoto, Yasuhito Asano, and Masatoshi Yoshikawa, “A Bipartite Graph Model and Mutually Reinforcing Analysis for Review Sites,” Proc. of the 22nd International Conference on Database and Expert Systems Applications (DEXA 2011), pp.341-348, Toulouse, France, August 31, 2011.
[2] Junpei Kawamoto, Kazuki Tawaramoto, Yasuhito Asano, and Masatoshi Yoshikawa, “Long-Term Rating Estimation Using Initial Reviews” (in Japanese), the 7th Forum on Data Engineering and Information Management, D3-6, Fukushima, Japan, March 2-4, 2015.
[3] Ee-Peng Lim, Viet-An Nguyen, Nitin Jindal, Bing Liu, and Hady Wirawan Lauw, “Detecting Product Review Spammers Using Rating Behaviors,” Proc. of the 19th ACM International Conference on Information and Knowledge Management, pp.939-948, 2010.
[4] Guan Wang, Sihong Xie, Bing Liu, and Philip S. Yu, “Review Graph Based Online Store Review Spammer Detection,” Proc. of the 11th IEEE International Conference on Data Mining (ICDM 2011), pp.1242-1247, 2011.
[5] Leman Akoglu, Rishi Chandy, and Christos Faloutsos, “Opinion Fraud Detection in Online Reviews by Network Effects,” Proc. of the 7th International AAAI Conference on Weblogs and Social Media (ICWSM 2013), Boston, MA, July, 2013.
[6] Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, and Christos Faloutsos, “FRAUDAR: Bounding Graph Fraud in the Face of Camouflage,” Proc. of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), pp.895-904, 2016.