FlowSim User Manual
FlowSim is used to measure the resources needed to find the maximal strong simulation relation for a probabilistic model (Markov chain, probabilistic automaton) under different optimizations. The results can be printed in plain text, in LaTeX table format or exported as a gnuplot script and data file. FlowSim can also automatically generate randomized models to study the behavior of different optimizations in a generalized fashion. The facility for this is described in more detail in the section "Random Models" below.
Usage
FlowSim is invoked from the command line with a list of options
followed by a list of models. At the very least, the --type
option must be
specified to define the model type as it isn't determined automatically. The
--type
option is discussed in detail in the section about model types. In
addition, there are options to control which optimizations should be benchmarked, which
resource types should be tracked, how the report should be delivered to the user and options
pertaining to averaging. All these additional options will be discussed in detail in the
following sections.
Output File Names
-o, --name TITLE | Specify a title for the benchmark |
For LaTeX
For each benchmarked data type, a file named
<title>_<data>.tex
will be generated where "<title>
"
is replaced by the title given through the --name
option (defaults to "benchmark
")
and "<data>
" is replaced by the corresponding data type.
Example: mybench_usertime.tex
.
For gnuplot
For each benchmarked data type, two files are
generated. The naming convention depends on the plot mode. For regular models (--plot n/m
), two
files are generated with the names <title>_<data>.data
and
<title>_<data>.gnuplot
where "<title>
" is replaced by the
title given via the --name
option and "<data>
" is replaced by the
corresponding data type. When plotting for random models, these files are called
<title>_<data>_<source>.data
and
<title>_<data>_<source>.gnuplot
with "<title>
"
and "<data>
" substituted as before. "<source>
" is substituted
by the model parameter which is being varied. This can be a combination of two parameters if a 3D
plot is being generated.
For plain text
Plain text is always printed to stdout and the title option does not apply.
Optimizations
-O, --opt OPT[:NAME] | Specify a configuration to benchmark |
Each configuration to benchmark is given by an individual specification of
--opt
(short -O
). The argument consists of an integer, optionally
followed by a colon and a title for this configuration. This title is used to identify this
configuration in all output formats. The integer is a bitvector where each bit corresponds to an
optimization which may be turned on or off. Not all optimizations can be combined. The following
table lists all supported configurations:
Decimal | Binary | Description |
0 | 00000 |
No optimization |
1 | 00001 |
State partitioning |
2 | 00010 |
Quotient |
4 | 00100 |
P-Invariant checking |
5 | 00101 |
State partitioning, P-Invariant checking |
6 | 00110 |
Quotient, P-Invariant checking |
16 | 10000 |
Parametric maximum flow |
17 | 10001 |
Parametric maximum flow, State partitioning |
18 | 10010 |
Parametric maximum flow, Quotient |
20 | 10100 |
Parametric maximum flow, P-Invariant checking |
21 | 10101 |
Parametric maximum flow, P-Invariant checking, State partitioning |
22 | 10110 |
Parametric maximum flow, P-Invariant checking, Quotient |
24 | 11000 |
Parametric maximum flow, Significant arc detection |
25 | 11001 |
Parametric maximum flow, Significant arc detection, State partitioning |
28 | 11100 |
Parametric maximum flow, Significant arc detection, P-Invariant checking |
29 | 11101 |
Parametric maximum flow, Significant arc detection, P-Invariant checking, State partitioning |
You may use --opt all
to benchmark all supported configurations.
Note
When some of the earlier case studies were made, in particular when the paper An Experimental Evaluation of Probabilistic Simulation was written, the quotient automaton approach hadn't been part of this project yet. At that time, the bit which now refers to the quotient approach had been unused and was omitted to avoid confusion, such that a 4bit bit vector was formed in which the second, third and fourth bits (counting from the right) corresponded to the third, forth and fifth bit in the representation above.
Data Types
-d, --data DATA | Specify a data type to include in output |
Using the --data
(short -d
) option, you can specify which
data you are interested in. If no --data
option is given, only usertime
will
be benchmarked. As with the --opt
option, you may specify --data all
for all
available data types. Note that this option is simply an output filter: All data is always recorded, so
there is no performance difference between few and many data types. The following table lists and
explains available data types:
Data Type | Description | |
usertime | Time needed to find the simulation relation (user mode time) | |
systemtime | Time needed to find the simulation relation (system mode time) | |
realtime | Real time from start of simulation until end | |
memory | Overall peak memory usage by data structures used in the algorithm (minor local variables, stack, program code etc. are not counted) | |
initialsize | Number of pairs in the initial relation minus reflexive relation (Quotient: initial number of blocks) | |
finalsize | Number of pairs in the result relation (minus reflexive relation) | |
partitions | Number of blocks if state partitioning is enabled, 0 otherwise (Quotient: final number of blocks) | |
iterations | Number of iterations | |
maxflow | Number of times maximum flow was (re)computed | |
pivfail | Number of networks that were discarded due to P-Invariant violation | |
safail | Number of networks that were discarded due to significant arc deleted | |
cache | Number of networks stored by parametric maximum flow | |
cachehits | Number of times a network was found in cache and reused |
Note
The values initialsize
and finalsize
do not
include the reflexive relation, which is obviously always there. This means that a final number of
pairs of zero does not mean that the relation is empty, it means that the relation is empty except
for the reflexive relation.
Output Formats
-q, --quiet | Suppress plain-text output | |
-l, --latex | Enable LaTeX table output | |
-p, --plot SRC | Enable gnuplot export and choose data source | |
--xrng R, --yrng R | For gnuplot, explicitly specify ranges, default automatic | |
--time-unit U | Unit for time (ms, s, m, h), default automatic but read note | |
--space-unit U | Unit for memory (B, kB, MB, GB), default automatic but read note | |
--precision N | Number of decimals in output (default 3) |
The results of the benchmark can be printed in plain text, exported as a LaTeX table or as a gnuplot script.
Plain text output is enabled by default, but can be turned
off using the --quiet
(short: -q
) switch.
LaTeX output can be enabled using the --latex
(short:
-l
) switch. The LaTeX table will be written to a file; see the section on
file names above for more information about the naming convention for output files.
One file will be generated for each selected data type. Note that the LaTeX files are
not complete documents as they only contain the table but no document preamble.
LaTeX output is not available for random models.
Gnuplot script generation is enabled by selecting the data source via the
--plot SRC
(short: -p SRC
) option. Valid values of SRC are:
n
: Number of states in modelm
: Number of transitions in modelr
: Primary variable in randomly generated model (see section on model types)r+
: Both variables in randomly generated model, fails if not present (3D plot)
Plotting will create a pair of output files (data set and script) for each data type that you are interested in. The section on output file names specifies how the names for these files will be formed.
By default, gnuplot will automatically show the entire range of available data.
In some cases you may want to override this behavior, for example in order to clip extreme outliers
or in order to move your data away from the coordinate axes for better visibility. You can use the
--xrng R
and --yrng R
options to adjust the visible data range. The argument
R
is the low end of the range, a colon and the high end of the range, e.g.
"--xrng 0:100
" would show the x-axis from zero to 100.
By default, FlowSim will try to select the most suitable units for time and
memory for you. However, in some cases you may wish to use the --time-unit
and
--space-unit
options to override the default. When benchmarking random models,
the units are not determined automatically and will be given in seconds and kilobytes unless
you use these options to specify otherwise.
In plain text output as well as LaTeX output, floating-point values will be
printed with three decimals after the point by default. This behavior can be changed by using
the --precision
option with which you can supply an arbitrary number of decimals
to be printed. For gnuplot output, this option has no effect: Floating-point values are always
written into the data file with maximum precision to ensure an as accurate representation in
the plot as possible.
Labels
-n, --labels N | Specify number of labels to use [deprecated] | |
-x, --lext X | Specify label file suffix to append to model file name (default '.label') |
-n, --labels N
: This option was used to specify some number of
different labels, to be distributed arbitrarily across the state space. This option is now
deprecated and will be ignored with a warning. Use the option --lext
(short -x
) instead.
-x, --lext X
: This option specifies a suffix string which is appended
to each input model file name to form a new file name. This file is interpreted as a label
specification, which associates arbitrary labels to the states of the input model. By using
different suffix strings, different label configurations can be prepared for each input model.
This option does not apply to random models.
Label specification format
The label specification format is a line-based format in which each line falls into one of the following categories:
- Comment line; begins with a
#
-sign. Everything until the next line break is ignored. - Blank line
- State/label association line; begins with a state specification, followed by at least one white-space and concluded with an arbitrary string (interpreted as a label).
Note that all lines, without exception, may begin and/or end in an arbitrary
number of white-space characters (20h
(space) and 09h
(tab) are considered white-space).
State specifications
may take three different forms: (1) a number, identifying a single state, (2) two numbers separated by a dash, identifying a sequence of consecutive states and (3) arbitrarily many numbers separated by commas, identifying a list of states. No other characters than digits, dashes and commas are allowed (i.e. no white-spaces). The different forms may not be mixed, meaning that the fictional specification "3,5-7,11" is malformed.
Labels
are arbitrary strings including anything but line breaks. Internally, the labels are represented as numbers. Note that label names should never be longer than 512 characters.
# This is an example label specification,
# for a fictional model which has 10 states.
0-4 a
5,8,9 b
6 c
States which do no appear in a state specification are assigned a separate label. State numbers which are out of range for the model that the specification is being applied to are silently ignored. Therefore, be sure that the first state is addressed as 0, not 1, since such a mistake will not generate any warnings.
Note
If you do not have specific label information for a model and
want to use the old behavior of distributing a fixed number of labels across the state space,
you can use the following shell command to generate a label file with the same labels that
would have been assigned ($MODEL
is the model and $LABELS
is the
desired number of labels):
jot `cat "$MODEL" | awk '{ if ($1 == int($1)) { print $1; exit; } }'` 0 | awk '{ print $1 " " ($1 % $LABELS); }' > "$MODEL.label"
Other Settings
--extra-rmap | Print memory used by the relation map separately (plain and LaTeX only) | |
--model-info | Add number of states and number of transitions (plain and LaTeX only) | |
-a, --avg N | Benchmark each model several times to stabilize time | |
--fp-approx X | Manually specify a floating-point approximation threshold (see below) |
There are additional options that allow you to fine-tune the way FlowSim operates.
--extra-rmap
: If this option is specified, the memory used by the
relation map will be printed in a separate row of the memory statistics. This is possible because
the size of the relation map is determined by the number of states in the model. The option may not
be used when benchmarking configurations that use the Quotient algorithm, where the relation map
is not related to the number of states. Note: This option generates a warning when used with
optimizations based on the quotient automaton approach.
--model-info
: Add rows listing the number of states, transitions
and actions (if present) for each model. This is only applicable for plain text and LaTeX output.
For random models, these values may be rounded averages.
-a, --avg N
: Benchmark every input model N times and take the averages
of usertime
, systemtime
and realtime
. All other data types
are deterministic and will have the same result every time. Using a high number for this value
ensures that random fluctuations in timing don't falsify the result as much, however the overall time
used by the benchmark process is multiplied by this number. When benchmarking random models, every
single instance of a certain random model class will be benchmarked multiple times. For random
models, it is better to leave this switch out and use the model regeneration parameter for the random
model instead, which will generate multiple instances to average over.
--fp-approx X
: In order to compute the correct relation, it is
often necessary to accommodate for rounding errors in floating-point numbers. This is done by
considering two floating-point numbers whose difference is smaller than some epsilon to be
equal. FlowSim will automatically compute a suitable value for epsilon for each input model
(only for discrete time models with distributions that are either stochastic or absorbing).
This option allows you to manually override this, as well as to specify an appropriate value for
continuous time models or models with sub-stochastic distributions. If you suspect that the
automatically selected threshold is erroneous and causing problems, you can use the
modelstat
tool to find out the threshold used for a particular model.
Types of Models
-t, --type T
: Specify the input model type. This option must always be
specified. The following model types are supported:
Model Type | Description | |
dtmc | Discrete-time markov chain | |
ctmc | Continuous-time markov chain | |
pa | Probabilistic automaton and markov decision process (MDP) | |
cpa | Continuous-time probabilistic automaton | |
random | Randomly generated model (input is a specification of random model parameters) |
The format used for these models is described in the following sections.
DTMC/CTMC File Format
For DTMCs and CTMCs, we use the format obtained when exporting models as text from the PRISM model checker. The format works as follows.
The first line contains two integer numbers which represent the number of states and the number of transitions in the model. After that, each line describes exactly one transition through three numbers: The source state, the target state and the probability (for DTMC) or rate (for CTMC). The source states MUST be listed in ascending order, and all transitions leaving the same state must be defined in adjacent lines, i.e. the successor distributions of two different states may not be mixed in the file. The number of transitions must match the number of transitions declared in the first line, otherwise the parser will fail.
Note
Within one line, any two successive arguments should be separated by exactly one space character (0x20).
PA/CPA File Format
For PAs, CPAs and MDPs, we support two different kinds of formats. PRISM's format only supports MDPs, which are a subset of PAs. While we can load the format exported by PRISM, we needed to modify it slightly in order to be able to load PAs which add non-determinism to MDPs.
The PRISM format
starts with a line containing the number of states, the number of actions and the number of transitions in the model, similarly to DTMCs/CTMCs. The model is then defined by a sequence of lines defining the transitions in the model as four numbers: The source state, the action (as an integer number), the target state and the corresponding probability (MDPs, PAs) or rate (CPAs). As with DTMCs/CTMCs, the source states must appear in ascending order, and the specifications of two successor distributions may not overlap.
The extended format
is identical save for a different way to specify the action. Instead of an integer number, two numbers separated by a forward slash (without any additional spaces in between) are used. The first number specifies the action as before. The second number can be of arbitrary value, except that it must be the same for all transitions belonging to the same distribution, and must be different between two adjacent distributions. It is permissible to alternate this number between zero and one for successive distributions. This feature is necessary in order to specify states with non-determinism, i.e. two or more actions with the same symbol.
Note
Within one line, any two successive arguments should be separated by exactly one space character (0x20).
Random Model File Format
Random models are defined by a sequence of 16 arguments. Currently, only random DTMCs are implemented in this fashion. These parameters should be all on the first line, separated by spaces. The parser will ignore anything after the 16th argument, so one may add comments or notes on the model without having to adhere to a particular syntax.
In order to generate a plot from a range of different, randomly
generated models, the first 9 arguments as well as the last (16th) argument may be
replaced by the strings "$x
" and "$z
", each of which may
appear at most once. The following table lists the meanings of each of the parameters.
Range | Description | ||
1. N | 0<N | Number of states in the model | |
2. A | 0≤A | Minimum number of successors per successor distribution | |
3. B | A≤B | Maximum number of successors per successor distribution | |
4. C | 0<C≤N | Number of clusters in the model, only effective in conjunction with clustering bias | |
5. Fb | [-1;1] | Fanout bias | |
6. Lb | [0;1] | Linearity bias | |
7. Cb | [0;1] | Clustering bias | |
8. Pb | [0;1] | Probability distribution bias | |
9. Sb | [0;1] | Successor bias | |
10. MS | 0<MS | Model samples (average over this many models generated with the same parameters) | |
11. X0 | - | First value for $x placeholder, must be present even when unused | |
12. X1 | X0≤X1 | Last value for $x placeholder, must be present even when unused | |
13. Z0 | - | First value for $z placeholder, must be present even when unused | |
14. Z1 | Z0≤Z1 | Last value for $z placeholder, must be present even when unused | |
15. AS | 2≤AS | Number of points to sample on the $x and $z axes when plotting, as applicable | |
16. L | 0<L | Number of labels to distribute over the state space |
Fanout bias
At zero, the number of successors for a state is chosen as a uniform random number between the minimum and maximum numbers of successors per state. A lower or higher number introduces a bias towards the lower or upper bound, respectively. At the boundaries, i.e. for -1 and 1, all states will have the minimum and maximum number of successors, respectively.
Linearity bias
At zero, successor states are chosen randomly, such that cycles are likely to appear in the model. A higher linearity bias makes cycles increasingly less likely to appear in the model. At 1, the model is completely linear in the sense that there exists a distinguished state which has no predecessors, from which every state can be reached exactly once while traversing state transitions (if clustering is enabled, unreachable clusters may be created, in which case the previous argument holds for every individual cluster).
Clustering bias
At zero, successors are chosen at random (influences of
other biases non-withstanding). When the clustering bias is used, the state space is subdivided
into a number of clusters (the C
parameter) and successors are more likely to be
picked from the same cluster as the predecessor. If the clustering bias is one, the resulting
model will have a number of entirely separate clusters such that there are no transitions between
any two states of any two clusters.
Probability bias
Perhaps more aptly called heterogenity bias, this value controls whether probability distributions are uniform or not. At zero, all distributions are uniform, i.e. all successors have the same transition probability. With higher values, distributions are more likely to be subdivided into multiple parts which are internally uniform. As the bias increases, the distributions are more likely to be divided into more parts. In the limit, this means that all distributions are subdivided into parts of one successor each, or in other words, no two successors of a distribution have the same transition probability. Note, however, that the probabilities for a distribution with a certain number of successors, subdivided into a certain number of parts, are computed in a deterministic fashion, meaning that the same probabilities will be assigned to two distributions of the same order being subdivided into the same number of parts. This is important to keep in mind when interpreting the results of a study involving this bias.
Successor bias
At zero, successors are chosen at random. This bias makes the choice of particular states more likely than the rest. However, since the successors in a distribution are all different, this primarily means that a certain state is likely to appear in most or all distributions. This bias is only meaningful if the state space is significantly larger than the order of "most" distributions. In particular, since distributions whose order is close to the size of the state space have all or almost all available states as successors, the probability of a certain state being more likely to be picked as a successor is meaningless.
Example
200 5 10 $x 0 0 $z 0 0 5 1 10 0 1 10 1
The example above defines a set of models, each of which have 200 states and between 5 and 10 successors per state. All states have the same label (there is only one label). This set of models can be used to create a 3D plot, since it uses both variables; in this case, this x-axis is the number of clusters and the z-axis is the clustering bias. On each axis, there will be 10 samples and for each sample, five instances of the model will be created, making a total of 10*10*5 or 500 model instances. The number of clusters will be sampled between 1 and 10, whereas the clustering bias will be sampled between 0 and 1, such that the resulting plot will visualize the effect of low, medium and high clustering biases for varying numbers of clusters.
Usage Example
Suppose the random model parameters above are stored in a file called
cluster_3d.rndm
, then the following command might be used to create a plot of
the processing time needed to compute the maximum simulation relation using no optimizations:
flowsim --type random --plot r+ --quiet --name cluster --data usertime
--opt 0:None -- cluster_3d.rndm