SyntheticDataGenerator seq options

Options for generating sequences can be displayed by typing:

SyntheticDataGenerator seq

Available Options
Name         Value     Description                              Default value
-fname       filename  Output base file name                    No default
-tlen        double    Average transaction length               2.5
-nitems      integer   Item count                               10000
-randseed    integer   Master random seed (must be <= 0)        0*
-lit.npats   integer   Large item set pattern count             10000
-lit.patlen  integer   Large item set average pattern length    1.25
-lit.corr    double    Large item set correlation (-corr)       0.25
-lit.conf    double    Large item set confidence (-conf)        0.75
-ncust       integer   Number of customers / sequences (-nseq)  100000
-slen        double    Average sequence length                  10
-seq.npats   integer   Sequence pattern count                   5000
-seq.patlen  double    Sequence average pattern length          4
-seq.corr    double    Sequence correlation                     0.25
-seq.conf    double    Sequence confidence                      0.75
-flat        boolean   Write sequences in a flat format         false

* A randseed of zero results in a random seed being automatically generated
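
As an illustration of how the options combine on one command line, here is a minimal Python sketch that assembles an argument list from the option names in the table. The helper name build_seq_command is hypothetical and not part of SyntheticDataGenerator; the sketch only assumes that each option is passed as "-name value" and that booleans are written as true/false, as in "-flat true".

```python
from typing import Dict, List, Optional, Union

OptionValue = Union[str, int, float, bool]


def build_seq_command(fname: str,
                      options: Optional[Dict[str, OptionValue]] = None) -> List[str]:
    """Assemble an argument list for 'SyntheticDataGenerator seq' (hypothetical helper)."""
    if not fname:
        # -fname has no default, so a base file name must always be supplied.
        raise ValueError("-fname is required")
    argv = ["SyntheticDataGenerator", "seq", "-fname", fname]
    for name, value in (options or {}).items():
        if name == "randseed" and value > 0:
            # The option table states that the master random seed must be <= 0.
            raise ValueError("-randseed must be <= 0")
        if isinstance(value, bool):
            # Assumption: booleans are spelled true/false, as in '-flat true'.
            value = "true" if value else "false"
        argv += ["-" + name, str(value)]
    return argv


# Example: 1000 sequences of average length 8, written in the flat format.
command = build_seq_command("sequences", {"ncust": 1000, "slen": 8, "flat": True})
```

The resulting list could be handed to subprocess.run, provided the SyntheticDataGenerator executable is on the PATH.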

The sequence generator produces four files:

filename.config        The parameters used to generate the sequences
filename.seq.patterns  The sequence patterns
filename.lit.patterns  The large item set patterns
filename.sequences     The sequences
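
A script that post-processes a run may want to derive the output names listed above from the base name given to -fname. The following Python sketch does that; expected_output_files is a hypothetical helper, and it assumes each suffix is appended directly to the base name exactly as shown in the list.

```python
from typing import List

# Suffixes taken from the list of generated files.
SUFFIXES = [".config", ".seq.patterns", ".lit.patterns", ".sequences"]


def expected_output_files(base: str) -> List[str]:
    """Return the file names a run with '-fname base' is expected to produce."""
    return [base + suffix for suffix in SUFFIXES]


files = expected_output_files("sequences")
```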

  • At a minimum, an output file name must be specified, e.g. SyntheticDataGenerator seq -fname sequences
  • Sequences may be written in a flat format with the -flat true option, e.g. 1, 2, 3; 4, 5, 6; 7, 8, 9, where items are separated by commas and elements by semicolons
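
To show how the flat format can be read back, here is a minimal Python sketch that splits one flat sequence into its elements and items. The function name parse_flat_sequence is hypothetical. The sketch assumes that items are integers, as in the example above, and that it is given a single sequence as a string; how sequences are laid out within the output file is not specified here.

```python
from typing import List


def parse_flat_sequence(text: str) -> List[List[int]]:
    """Split a flat sequence into elements, each a list of integer items."""
    elements = []
    for chunk in text.split(";"):      # elements are separated by semicolons
        chunk = chunk.strip()
        if not chunk:
            continue                   # tolerate empty input or a trailing semicolon
        # Items are separated by commas; int() ignores surrounding spaces.
        elements.append([int(item) for item in chunk.split(",") if item.strip()])
    return elements


sequence = parse_flat_sequence("1, 2, 3; 4, 5, 6; 7, 8, 9")
```

Here sequence holds three elements of three items each: [[1, 2, 3], [4, 5, 6], [7, 8, 9]].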

