Environment variables are mainly used to store sensitive information such as credentials, as well as TLS parameters. All these environment variables are optional.
Variable | Values | Default | Notes |
---|---|---|---|
es_username | String | "" | Username to connect to Elasticsearch |
es_password | String | "" | Password to connect to Elasticsearch |
verify_certs | Boolean | True | Whether the Elasticsearch certificate must be validated or not |
ca_certs | String | None | A path to a valid CA to validate the Elasticsearch server certificate |
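As a rough illustration of how these optional variables and their defaults could be consumed, here is a minimal Python sketch. The helper name `read_es_env` and the rule used to interpret `verify_certs` as a boolean are assumptions made for this example, not the way ee-outliers necessarily reads them.

```python
import os


def read_es_env(environ=os.environ):
    """Collect the optional Elasticsearch settings, falling back to the documented defaults."""
    # Assumption: "false", "0" and "no" (case-insensitive) disable verification.
    verify_raw = environ.get("verify_certs", "True")
    return {
        "es_username": environ.get("es_username", ""),
        "es_password": environ.get("es_password", ""),
        "verify_certs": verify_raw.strip().lower() not in ("false", "0", "no"),
        "ca_certs": environ.get("ca_certs"),  # None when the variable is unset
    }
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out without touching the real environment.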
ee-outliers makes use of a single configuration file containing all required parameters such as connectivity with your Elasticsearch cluster, logging, etc.
A default configuration file with all required configuration sections and parameters, along with an explanation, can be found in defaults/outliers.conf.
General | ||
---|---|---|
Key parameters (*Mandatory) | Values | Notes |
es_url * |
String |
URL to connect to Elasticsearch. It supports https schema for TLS |
es_index_pattern * |
String |
The name of the Elasticsearch index. Can be a glob pattern such as my_indexes* . |
es_scan_size * |
Int |
Size of the batch used by Elasticsearch for each search request. |
es_scroll_time * |
[Integer][letter] where letter represents a duration (Hours, Minutes, Seconds) | Specify how long a consistent view of the index should be maintained for scrolled Elasticsearch search. |
es_timeout * |
Int |
Explicit timeout in seconds for each Elasticsearch request. |
timestamp_field |
String |
The field name representing the event timestamp in Elasticsearch.
Default value: timestamp . |
es_save_results * |
0 , 1 |
If set to 1 , save outlier detection results to Elasticsearch.
If set to 0 , do nothing. |
print_outliers_to_console |
0 , 1 |
If set to 1 , print outlier matches to the console.
If set to 0 , do nothing. Default value: 0 . |
history_window_days * |
Int |
Specify how many days back in time to process events and search for outliers.
This value is combined with history_window_hours . |
history_window_hours * |
Int |
Specify how many hours back in time to process events and search for outliers.
This value is combined with history_window_days . |
es_wipe_all_existing_outliers * |
0 , 1 |
If set to 1 , wipe all existing outliers that fall in the history window upon first run.
If set to 0 , do nothing. |
es_wipe_all_whitelisted_outliers * |
0 , 1 |
If set to 1 , existing outliers are checked and wiped if they match with the whitelisting.
If set to 0 , do nothing. |
run_models * |
0 , 1 |
If set to 1 , run all use cases with key parameter run_model set to 1.
If set to 0 , do nothing. |
test_models * |
0 , 1 |
If set to 1 , run all use cases with key parameter test_model set to 1.
If set to 0 , do nothing. |
log_verbosity * |
0-5+ |
0 for no progress info, 1-4 for progressively more output, 5+ for all the log output. |
log_level * |
CRITICAL , ERROR , WARNING , INFO ,
DEBUG |
Sets the threshold for the logger. Logging messages which are less severe than level will be ignored. |
log_file * |
String |
File path where the log messages will be saved |
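The way history_window_days and history_window_hours combine can be sketched as follows. This assumes the two values are simply added together to form one look-back period; the function name `history_window_start` is hypothetical.

```python
from datetime import datetime, timedelta


def history_window_start(now, history_window_days, history_window_hours):
    """Return the earliest timestamp covered by the history window."""
    # Assumption: days and hours are summed into a single look-back duration.
    return now - timedelta(days=history_window_days, hours=history_window_hours)


start = history_window_start(datetime(2024, 1, 10, 12, 0), 1, 12)
print(start)  # 2024-01-09 00:00:00
```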
This section makes it possible to extract additional information from outliers and save it in the dictionary field outliers.assets.
General | ||
---|---|---|
Key parameters (*Mandatory) | Values | Notes |
Any existing field name | String |
Example: timestamp=time will extract the value of the field
timestamp and add it to the dictionary field outliers.assets under the key time . |
For more information about the notification system, see the page Notifications.
Notifier | ||
---|---|---|
Key parameters (*Mandatory) | Values | Notes |
email_notifier * |
0 , 1 |
If set to 1 , enable the notification system and the other key parameters from the
section [notifier] , except max_cache_ignore , become mandatory.
If set to 0 , do nothing. |
notification_email |
String |
Email where the information needs to be sent. |
smtp_user |
String |
SMTP username. |
smtp_pass |
String |
SMTP password |
smtp_server |
String |
SMTP server address. |
smtp_port |
int |
SMTP port. |
max_cache_ignore |
int |
Number of elements kept in memory to avoid sending the same notification twice.
Default value: 1000 . |
Used when ee-outliers is running in daemon mode.
In daemon mode, ee-outliers will continuously run based on a cron schedule, which is defined by the following schedule parameter.
General | ||
---|---|---|
Key parameters | Values | Notes |
schedule |
Standard cron format | Only used when running ee-outliers in daemon mode.
Example: schedule=10 0 * * * will run ee-outliers at 00:10 each night. |
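To make the five cron fields of the schedule value explicit, the following sketch splits the example into named parts. It only labels the fields and does not evaluate ranges, lists or step values; `parse_cron` is a hypothetical helper, not part of ee-outliers.

```python
def parse_cron(schedule):
    """Split a standard five-field cron expression into named fields."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields, got %d" % len(fields))
    names = ("minute", "hour", "day_of_month", "month", "day_of_week")
    return dict(zip(names, fields))


print(parse_cron("10 0 * * *"))
```

For `10 0 * * *` the minute field is `10` and the hour field is `0`, which corresponds to 00:10 every day.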
Global parameters for all use cases of type simplequery.
The only global parameter for simplequery use cases is highlight_match. If set to 1, ee-outliers will use the Elasticsearch highlight mechanism to find the fields and values that matched the search query. The matched fields and values are respectively added to the new dictionary fields outliers.matched_fields and outliers.matched_values.
Example: If the search query is es_query_filter=CurrentDirectory : sysmon AND Image: System32 AND Image: cmd.exe
and the log
event contains the fields:
CurrentDirectory: C:\sysmon\
Image: C:\Windows\System32\cmd.exe
It will add the fields:
outliers.matched_fields: {"CurrentDirectory": ["C:\\<value>sysmon</value>\\"],
"Image": ["C:\\Windows\\<value>System32</value>\\<value>cmd.exe</value>"]}
outliers.matched_values: {'CurrentDirectory': ['sysmon'], 'Image': ['System32', 'cmd.exe']}
Note that in the field outliers.matched_fields, the values that match the search query have been tagged as follows: <value>MATCHED_VALUE</value>.
General | ||
---|---|---|
Key parameters | Values | Notes |
highlight_match |
0 , 1 |
If set to 1 , it will use the Elasticsearch highlight mechanism to find the fields
and values that matched the search query. The matched fields and values are respectively added to new
dictionary fields outliers.matched_fields and outliers.matched_values .
If set to 0 , do nothing. Default: 0 . |
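The relationship between outliers.matched_fields and outliers.matched_values in the example above can be reproduced with a short Python sketch that pulls the text out of the `<value>` tags. The function name and the regular expression are assumptions for illustration; they are not taken from the ee-outliers source.

```python
import re

# Non-greedy match of whatever sits between one pair of <value> tags.
VALUE_TAG = re.compile(r"<value>(.*?)</value>")


def matched_values(matched_fields):
    """Derive the matched values from highlighted field fragments."""
    return {
        field: [value for fragment in fragments for value in VALUE_TAG.findall(fragment)]
        for field, fragments in matched_fields.items()
    }


fields = {
    "CurrentDirectory": ["C:\\<value>sysmon</value>\\"],
    "Image": ["C:\\Windows\\<value>System32</value>\\<value>cmd.exe</value>"],
}
print(matched_values(fields))
# {'CurrentDirectory': ['sysmon'], 'Image': ['System32', 'cmd.exe']}
```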
Global parameters for all use cases of type terms.
General | ||
---|---|---|
Key parameters | Values | Notes |
terms_batch_eval_size |
Int |
Define how many events should be processed at the same time, before looking for outliers. A bigger batch means better results but increases memory usage. |
Global parameters for all use cases of type metrics.
General | ||
---|---|---|
Key parameters | Values | Notes |
metric_batch_eval_size |
Int |
Define how many events should be processed at the same time, before looking for outliers. A bigger batch means better results but increases memory usage. |
Global parameters for all use cases of type sudden_appearance.
General | ||
---|---|---|
Key parameters | Values | Notes |
max_num_aggregators |
Int |
Maximum number of estimated aggregations.
If the number of aggregations defined in aggregator is bigger than max_num_aggregators , the returned results will not be accurate.
Default: 100000 . |
max_num_targets |
Int |
Maximum number of estimated targets.
If the number of terms defined in target is bigger than max_num_targets , the
returned results will not be accurate. Default: 100000 . |
Global parameters for all use cases of type word2vec.
General | ||
---|---|---|
Key parameters | Values | Notes |
word2vec_batch_eval_size |
Int |
Define how many events should be processed at the same time, before looking for outliers. A bigger batch means better results but increases memory usage. |
min_target_buckets |
Int |
Minimum number of events required within an aggregation before processing word2vec analyzer. |
drop_duplicates |
0 , 1 |
If set to 1 , drops duplicate target elements within each
aggregation.
If set to 0 , do nothing. Set to 0 by default.
Note that when activated, drop_duplicates can increase memory usage. The reason is that it generally
increases the size of the vocabulary and therefore the size of the word2vec model. |
use_prob_model |
0 , 1 |
If set to 1 , use a probabilistic model instead of word2vec.
If set to 0 , use word2vec model.
Used mainly to evaluate the performance of word2vec.
The probabilistic model computes the true probability that a context word appears given a certain center word.
P(context_word|center_word) = (number of times the pair context_word-center_word appears) / (number of times center_word appears) .
Set to 0 by default.
|
output_prob |
0 , 1 |
If set to 1 , the models output the probability that a context word appears,
given a certain center word.
If set to 0 , and use_prob_model=0 it outputs the raw value of word2vec
(layer before the softmax).
If set to 0 , and use_prob_model=1 it outputs the logarithm of the probabilities.
Set to 1 by default. |
separators |
regex format between quotes | Will split target elements by the occurrence of the regex pattern.
Example: If separators="\.| " and target of one event is "Our website is nviso.eu"
the output tokens will become ["Our", "website", "is", "nviso", "eu"] . |
size_window |
Int |
Size of the context window. Note that as you increase the window size, the number of center word - context word combinations increases, which in turn increases memory usage and computation time. |
min_uniq_word_occurrence |
Int |
If a word appears less than min_uniq_word_occurrence times, it will be replaced by
the 'UNKNOWN' word. Set to 1 by default.
Note that as it reduces the vocabulary size of the model, it reduces the memory size. |
num_epoch |
Int |
Number of times word2vec model trains on all events within one aggregation.
Set to 1 by default. |
learning_rate |
Float |
The learning rate of the word2vec model.
Set to 0.001 by default. |
embedding_size |
Int |
Embedding size of the word2vec model.
Set to 40 by default. |
seed |
Int |
The random seed to make word2vec deterministic.
If set to 0 , word2vec is non-deterministic.
If deterministic, it will also read documents chronologically and therefore reduce Elasticsearch scanning performance.
Set to 0 by default. |
print_score_table |
0 , 1 |
Print all outlier scores on a table. Set to 0 by default. |
print_confusion_matrix |
0 , 1 |
Print confusion matrix and precision, recall and F-Measure metrics.
Works only if the field "label" (equal to 0 or 1 ) exists in Elasticsearch events.
Set to 0 by default. |
trigger_focus |
word , text |
If set to text , it triggers events based on global text score.
If set to word , it triggers events based on word score.
Set to word by default. |
trigger_score |
center , context , total , mean |
Type of score the events are triggered on. mean is only compatible with trigger_focus=text . |
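The separators, size_window and use_prob_model parameters can be illustrated together with a small Python sketch: it tokenizes a target with the separators regex and then estimates P(context_word|center_word) by counting. This is a simplified reading of the formula above, not the ee-outliers implementation. In particular, it assumes that a pair is counted at most once per occurrence of the center word, and that the window spans size_window tokens on each side.

```python
import re
from collections import Counter


def tokenize(text, separators):
    """Split a target on the separators regex, dropping empty tokens."""
    return [token for token in re.split(separators, text) if token]


def context_probabilities(tokens, size_window):
    """Estimate P(context|center) as pair count divided by center count."""
    pair_counts = Counter()
    center_counts = Counter()
    for i, center in enumerate(tokens):
        center_counts[center] += 1
        start = max(0, i - size_window)
        end = min(len(tokens), i + size_window + 1)
        # Assumption: each context word counts once per center occurrence.
        contexts = {tokens[j] for j in range(start, end) if j != i}
        for context in contexts:
            pair_counts[(context, center)] += 1
    return {pair: count / center_counts[pair[1]] for pair, count in pair_counts.items()}


print(tokenize("Our website is nviso.eu", r"\.| "))
# ['Our', 'website', 'is', 'nviso', 'eu']
```

The keys of the returned dictionary are `(context_word, center_word)` tuples, so `probabilities[("b", "a")]` reads as the probability of seeing `b` around `a`.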
Some fields contain several pieces of information, like timestamp, which can be split into the sub fields year, month, etc.
This section takes any existing field name (e.g. timestamp) as key parameter and a GROK pattern as value to extract the sub information.
The sub information will be extracted from all processed events, and added as new fields in case an outlier event is found.
The format for the new field will be: outliers.derived_<field_name> (e.g. outliers.derived_timestamp_year).
Note that these fields are extracted BEFORE the analysis happens and with their original field_name (e.g. timestamp_year), which means that they can also be used, for example, as aggregators or targets in use cases.
General | ||
---|---|---|
Key parameters | Values | Notes |
Any existing field name | GROK format |
Example: timestamp=%{YEAR:timestamp_year}-%{MONTHNUM:timestamp_month}-%{MONTHDAY:timestamp_day}[T ]%{HOUR:timestamp_hour}:?%{MINUTE:timestamp_minute}(?::?%{SECOND:timestamp_second})?%{ISO8601_TIMEZONE:timestamp_timezone}?
will create, from the field timestamp , the fields derived_timestamp_year ,
derived_timestamp_month , etc. |
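To show what the extraction produces, the sketch below uses a plain Python regular expression with named groups as a stand-in for the GROK pattern. It covers only the year, month, day, hour and minute parts of the example, and the helper name `derive_fields` is hypothetical; the actual GROK handling in ee-outliers may differ.

```python
import re

# Regex stand-in for part of the GROK example; group names mirror the GROK field names.
TIMESTAMP_PATTERN = re.compile(
    r"(?P<timestamp_year>\d{4})-(?P<timestamp_month>\d{2})-(?P<timestamp_day>\d{2})"
    r"[T ](?P<timestamp_hour>\d{2}):?(?P<timestamp_minute>\d{2})"
)


def derive_fields(value):
    """Return the derived_* fields extracted from a timestamp string."""
    match = TIMESTAMP_PATTERN.match(value)
    if match is None:
        return {}
    return {"derived_" + name: text for name, text in match.groupdict().items()}


print(derive_fields("2019-03-07T14:05:00Z"))
```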
By whitelisting an outlier, you prevent it from being tagged and stored in Elasticsearch.
For events that have already been enriched and that match a whitelist later, the
es_wipe_all_whitelisted_outliers
flag can be used in order to remove them.
For more information about literal whitelists, see the page Whitelisting outliers.
General | ||
---|---|---|
Key parameters | Values | Notes |
Any existing field name | String |
This whitelist will only hit for outlier events that contain an exact whitelisted string as one
of its event field values. The whitelist is checked against all the event fields, not only the outlier fields!
Example: slack_connection=rare outbound connection: Slack.exe . |
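The exact-match behaviour described in the table can be sketched in Python as follows. The example assumes a flat event dictionary with string values; the function name is hypothetical and nested fields are not handled.

```python
def hits_literal_whitelist(event, whitelist_literals):
    """Return True if any event field value equals a whitelisted string exactly."""
    return any(
        value in whitelist_literals
        for value in event.values()
        if isinstance(value, str)
    )


literals = {"rare outbound connection: Slack.exe"}
event = {"host": "pc-01", "summary": "rare outbound connection: Slack.exe"}
print(hits_literal_whitelist(event, literals))  # True
```

Because the comparison is an equality test, a value that merely contains the whitelisted string does not produce a hit.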
By whitelisting an outlier, you prevent it from being tagged and stored in Elasticsearch.
For events that have already been enriched and that match a whitelist later, the
es_wipe_all_whitelisted_outliers
flag can be used in order to remove them.
For more information about whitelists, see the page Whitelisting outliers.
General | ||
---|---|---|
Key parameters | Values | Notes |
Any existing field name | regex format | This whitelist will hit for all outlier events that contain a regular expression match against
one of its event field values. The whitelist is checked against all the event fields, not only the outlier fields.
Example: autorun_user_specific=^.*rare autorun:.*-.*-.*-.*-.*$ . |
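A regular expression whitelist check could look like the sketch below, which tests every string field value against each pattern. It again assumes a flat event dictionary, and the sample event value is invented for the example.

```python
import re


def hits_regex_whitelist(event, patterns):
    """Return True if any event field value matches one of the whitelist regexes."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return any(
        regex.match(value) is not None
        for regex in compiled
        for value in event.values()
        if isinstance(value, str)
    )


patterns = [r"^.*rare autorun:.*-.*-.*-.*-.*$"]
# Hypothetical event value containing the four hyphens the pattern requires.
event = {"summary": "rare autorun: S-1-5-21-1004"}
print(hits_regex_whitelist(event, patterns))  # True
```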
To have more information about the configuration of one analyzer, visit the page Building detection use cases .
All analyzers | ||
---|---|---|
Key parameters (*Mandatory) | Values | Notes |
es_query_filter * |
String |
Any valid Elasticsearch query. |
es_dsl_filter |
String |
DSL filter on Elasticsearch query. |
timestamp_field |
String |
Can be any document field.
It will override the general settings timestamp_field . |
history_window_days |
Int |
Override history_window_days parameter in general settings. |
history_window_hours |
Int |
Override history_window_hours parameter in general settings. |
should_notify |
0 , 1 |
If set to 1 ,
notify the use case via the notifier if email_notifier is set to 1 .
If set to 0 , do nothing. |
use_derived_fields |
0 , 1 |
Whether or not to use derived fields. |
es_index |
String |
Override the es_index_pattern parameter in general settings |
outlier_type * |
String |
Freetext field which will be added to the outlier event as new field named outliers.outlier_type . |
outlier_reason * |
String |
Freetext field which will be added to the outlier event as new field named outliers.reason . |
outlier_summary * |
String |
Freetext field which will be added to the outlier event as new field named outliers.summary . |
run_model * |
0 , 1 |
If set to 1 , the model runs if the run_models
parameter in general settings is set to 1 . |
test_model * |
0 , 1 |
If set to 1 , the model runs if the test_models
parameter in general settings is set to 1 . |
The following parameters can be used for the analyzers terms, metrics and word2vec.
More information available here.
Usual model parameters (Terms, Metrics) | ||
---|---|---|
Key parameters (*Mandatory) | Values | Notes |
trigger_on * |
low , high |
If set to low , triggers events with model computed value lower than the decision boundary.
If set to high , triggers events with model computed value higher than the decision boundary. |
trigger_method * | percentile | Percentile. trigger_sensitivity ranges from 0-100 . |
| pct_of_max_value | Percentage of maximum value. trigger_sensitivity ranges from 0-100 . |
| pct_of_median_value | Percentage of median value. trigger_sensitivity ranges from 0-100 . |
| pct_of_avg_value | Percentage of average value. trigger_sensitivity ranges from 0-100 . |
| mad | Median Absolute Deviation. trigger_sensitivity defines the total number of deviations and ranges from 0-Inf. |
| madpos | Same as mad but the trigger value will always be positive. If the mad trigger value is negative, 0 is used instead. |
| stdev | Standard Deviation. trigger_sensitivity defines the total number of deviations and ranges from 0-Inf. |
| float | Fixed value to trigger on. trigger_sensitivity defines the trigger value. |
| coeff_of_variation | Coefficient of variation. trigger_sensitivity defines the total number of coefficients of variation and ranges from 0-Inf. |
trigger_sensitivity * |
0-100 , 0-Inf. |
Value of the sensitivity linked to the trigger_method |
process_documents_chronologically |
0 , 1 |
If set to 1 , process documents chronologically when analysing the model.
Set by default to 0 as it has high impact on Elasticsearch scanning performance. |
target * |
String |
Document field that will be used to do the computation
(based on the trigger_method selected). |
aggregator * |
Strings separated by a , |
One or multiple document fields that will be used to group documents. |
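As a rough guide to how trigger_method, trigger_sensitivity and trigger_on interact, the following Python sketch computes a decision boundary for some of the methods. The exact formulas are assumptions made for illustration (for example, that stdev and mad boundaries are centred on the mean and median respectively), and percentile and coeff_of_variation are left out. The ee-outliers implementation may differ in detail.

```python
import statistics


def decision_boundary(values, trigger_method, trigger_sensitivity, trigger_on="high"):
    """Compute an illustrative decision boundary for a list of numbers."""
    sign = 1 if trigger_on == "high" else -1
    if trigger_method == "pct_of_max_value":
        return max(values) * trigger_sensitivity / 100
    if trigger_method == "pct_of_median_value":
        return statistics.median(values) * trigger_sensitivity / 100
    if trigger_method == "pct_of_avg_value":
        return statistics.mean(values) * trigger_sensitivity / 100
    if trigger_method == "stdev":
        # Assumption: boundary sits trigger_sensitivity deviations from the mean.
        return statistics.mean(values) + sign * trigger_sensitivity * statistics.pstdev(values)
    if trigger_method in ("mad", "madpos"):
        median = statistics.median(values)
        mad = statistics.median([abs(v - median) for v in values])
        boundary = median + sign * trigger_sensitivity * mad
        # madpos never returns a negative boundary.
        return max(boundary, 0) if trigger_method == "madpos" else boundary
    if trigger_method == "float":
        return trigger_sensitivity
    raise ValueError("unsupported trigger_method: " + trigger_method)


def is_outlier(value, boundary, trigger_on):
    """Apply trigger_on: flag values above (high) or below (low) the boundary."""
    return value > boundary if trigger_on == "high" else value < boundary
```

With values `[1, 2, 3, 4, 100]`, `mad` and a sensitivity of 3, the median is 3 and the median absolute deviation is 1, so the `high` boundary under these assumptions is 6 and only 100 would be flagged.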
Any other parameters that are not used by the model will be automatically copied to the outlier parameter. More information available here.
Simple query | ||
---|---|---|
Key parameters | Values | Notes |
highlight_match | 0 , 1 |
Override highlight_match parameter in general simplequery settings.
|
Metrics | ||
---|---|---|
Key parameters (*Mandatory) | Values | Notes |
metric * | numerical_value | Use the numerical value of the target field as metric. Example: numerical_value("2") => 2. |
| length | Use the target field length as metric. Example: length("outliers") => 8. |
| entropy | Use the entropy of the field as metric. Example: entropy("houston") => 2.5216406363433186. |
| hex_encoded_length | Calculate total length of hexadecimal encoded substrings in the target and use this as metric. |
| base64_encoded_length | Calculate total length of base64 encoded substrings in the target and use this as metric. Example: base64_encoded_length("houston we have a cHJvYmxlbQ==") => base64_decoded_string: problem, base64_encoded_length: 7. |
| url_length | Extract all URLs from the target value and use their total length as metric. Example: url_length("why don't we go http://www.dance.com") => extracted_urls: http://www.dance.com, extracted_urls_length: 20. |
| relative_english_entropy | Compute Kullback Leibler entropy. |
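The entropy example in the table is consistent with Shannon entropy over the characters of the string, measured in bits. The sketch below is written under that assumption, with a hypothetical function name; it is not a copy of the ee-outliers code.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy (in bits) of the character distribution of a string."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(shannon_entropy("houston"))
print(len("outliers"))  # 8, the length metric from the table
```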
Terms | ||
---|---|---|
Key parameters (*Mandatory) | Values | Notes |
target_count_method * |
within_aggregator , across_aggregators |
If set to across_aggregators , the analysis will be performed across all values of the
aggregator at the same time. If set to within_aggregator , it will be performed for each value of the aggregator separately. |
min_target_buckets |
Int |
Minimum number of events within an aggregation before processing terms analyzer.
Only used when target_count_method is set to within_aggregator .
|
Sudden Appearance | ||
---|---|---|
Key parameters (*Mandatory) | Values | Notes |
target * |
String separated by , |
One or multiple document fields that will be analyzed for sudden appearance in group documents. |
aggregator * |
String separated by , |
One or multiple document fields that will be used to group documents. Each document that contains the same combination of field values will be assembled in the same group. |
history_window_days |
Int |
Override history_window_days parameter in general settings. |
history_window_hours |
Int |
Override history_window_hours parameter in general settings. |
sliding_window_size * |
DDD :HH :MM |
Size of the sliding window where DDD defines the number of days,
HH the number of hours and MM the number of minutes.
Example: 20 :13 :20 will correspond to a sliding window of size 20 days, 13
hours and 20 minutes.
|
sliding_window_step_size * |
DDD :HH :MM |
Size of the sliding step where DDD defines the number of days,
HH the number of hours and MM the number of minutes. The sliding step represents the
jump in time by which the sliding window moves within the global window.
Example: 10 :01 :02 will correspond to a sliding step of size 10 days, 1
hour and 2 minutes.
|
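The DDD:HH:MM format used by sliding_window_size and sliding_window_step_size maps naturally onto a duration. A minimal Python sketch, using a hypothetical `parse_window` helper:

```python
from datetime import timedelta


def parse_window(value):
    """Convert a DDD:HH:MM string into a timedelta."""
    # int() tolerates surrounding whitespace, so "20 :13 :20" also parses.
    days, hours, minutes = (int(part) for part in value.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes)


print(parse_window("20:13:20"))  # 20 days, 13:20:00
```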
Word2vec | ||
---|---|---|
Key parameters | Values | Notes |
word2vec_batch_eval_size |
Int |
Override word2vec_batch_eval_size parameter in
word2vec general configuration. |
min_target_buckets |
Int |
Override min_target_buckets parameter in
word2vec general configuration. |
drop_duplicates |
0 , 1 |
Override drop_duplicates parameter in
word2vec general configuration. |
use_prob_model |
0 , 1 |
Override use_prob_model parameter in
word2vec general configuration. |
output_prob |
0 , 1 |
Override output_prob parameter in
word2vec general configuration. |
separators |
regex format between quotes | Override separators parameter in
word2vec general configuration. |
size_window |
Int |
Override size_window parameter in
word2vec general configuration. |
min_uniq_word_occurrence |
Int |
Override min_uniq_word_occurrence parameter in
word2vec general configuration. |
num_epoch |
Int |
Override num_epoch parameter in
word2vec general configuration. |
learning_rate |
Float |
Override learning_rate parameter in
word2vec general configuration. |
embedding_size |
Int |
Override embedding_size parameter in
word2vec general configuration. |
seed |
Int |
Override seed parameter in
word2vec general configuration. |
print_score_table |
0 , 1 |
Override print_score_table parameter in
word2vec general configuration. |
print_confusion_matrix |
0 , 1 |
Override print_confusion_matrix parameter in
word2vec general configuration. |
trigger_focus |
word , text |
Override trigger_focus parameter in
word2vec general configuration. |
trigger_score |
center , context , total , mean |
Override trigger_score parameter in
word2vec general configuration. |