Analyze command usage

The basic usage of this command is

$ analyze <dataset-specifier> <dataset-parameters> <method-specifier> <method-parameters> <options>

The dataset-specifier and dataset-parameters are the same as those described in the explanation of the dataset command.

The method-specifier is the name of an installed method. You can list the available method names with analyze -h.

method-parameters are optional arguments given with the --method-param flag. The flag takes a string that joins a key and a value with a single =, and it can be given multiple times.
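
For instance, two parameters could be passed in a single run as follows, where alpha and beta are hypothetical keys used only to illustrate the syntax (the keys a real method accepts are described below):

$ analyze <dataset-specifier> <dataset-parameters> <method-specifier> --method-param alpha=0.1 --method-param beta=2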

The parameter keys a method accepts are documented in the constructor of the review graph object defined in that method.
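
For example, if the Fraud Eagle package is installed, its constructor documentation can be read with pydoc. The module and class names below (fraud_eagle and ReviewGraph) are assumptions about the package layout and may differ in your installation:

$ python -m pydoc fraud_eagle.ReviewGraph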

For example, Fraud Eagle takes one parameter, epsilon, and you can set it with --method-param epsilon=0.25.
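
Putting it together, a Fraud Eagle run might look like the following, assuming the method is registered under the name fraud_eagle (run analyze -h to confirm the exact name in your installation):

$ analyze <dataset-specifier> <dataset-parameters> fraud_eagle --method-param epsilon=0.25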

analyze also takes three options:

  • --threshold: the threshold below which an update is considered negligible (Default: \(10^{-5}\)),
  • --loop: the maximum number of iterations (Default: 20),
  • --output: file path to store results (Default: stdout).

Most of the methods the Review Graph Mining project provides are loop-based algorithms, which iterate a procedure until the update becomes negligible. The --threshold flag sets that threshold: if an update is smaller than or equal to it, the update is considered negligible and the iteration ends.

On the other hand, some methods do not converge on some datasets, or converge only after a long time. The --loop flag sets the maximum number of iterations so that such algorithms still stop.
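
For example, the following hypothetical run tightens the threshold, allows up to 50 iterations, and writes the results to a file instead of stdout:

$ analyze <dataset-specifier> <dataset-parameters> <method-specifier> --threshold 1e-6 --loop 50 --output result.txt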

Datasets the Review Graph Mining Project provides

All packages are available on PyPI, and you can install them with pip install.
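
For example, the following command installs the analysis script together with the Fraud Eagle method. The package names follow the project's rgmining-* naming on PyPI and are given here as examples; check PyPI for the exact packages you need:

$ pip install rgmining-script rgmining-fraud-eagle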
