DOROTHEA
DOcker-based fRamework fOr gaTHering nEtflow trAffic
DOROTHEA is a Docker-based solution for gathering NetFlow traffic. It is highly scalable, since users can easily incorporate their own customized scripts for both attacks and benign traffic. Users can also define complex network topologies.
DOROTHEA can simulate both benign and malicious network traffic. Benign traffic corresponds to the network traffic that a real user generates when using a web browser, a mail client, or a remote desktop client, or when doing any other legitimate task. To generate benign traffic, DOROTHEA uses network traffic simulators that send packets to a gateway. Simulators use scripts to generate legitimate traffic. These scripts are isolated items, so users may customize them or incorporate their own. The traffic generated by the simulators is received by the gateway, which performs two main tasks that replicate the behavior of commercial routers. First, it routes packets to the internet. Second, it sends 1 out of every X packets to a NetFlow generator, where X is the sampling threshold set by the user. Commercial routers use packet sampling to decrease the number of packets that must be processed to generate flow data, which reduces router congestion.
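The 1-out-of-X sampling performed by the gateway can be illustrated with a minimal Python sketch. This is not DOROTHEA's actual code: the function name `sample_one_in_x` and the packet labels are hypothetical, and the sketch assumes deterministic sampling (every X-th packet), whereas a real gateway may sample differently.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_one_in_x(packets: Iterable[T], x: int) -> Iterator[T]:
    """Yield every x-th packet (assumed deterministic 1-in-X sampling)."""
    if x < 1:
        raise ValueError("sampling threshold must be >= 1")
    # Count packets from 1 so that the x-th, 2x-th, ... packets are selected.
    for index, packet in enumerate(packets, start=1):
        if index % x == 0:
            yield packet


# Hypothetical packet stream of 10 packets, sampled with X = 5.
packets = [f"pkt-{i}" for i in range(1, 11)]
sampled = list(sample_one_in_x(packets, x=5))
print(sampled)  # ['pkt-5', 'pkt-10']
```

With X = 1 every packet is forwarded to the flow generator, which corresponds to generating flow data without sampling.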
Malicious traffic is generated differently: attacks are distributed among several nodes. To do so, DOROTHEA uses Celery, a Python library that distributes tasks among nodes through a queue. The user defines the number of nodes that will carry out the attacks and the number of nodes that will be attacked. As with benign traffic generation, attack nodes use scripts to perform attacks. Again, these attack scripts are isolated items, so users may customize them or incorporate their own.
Once malicious-traffic generation is running, the launcher node loads the attack scripts and adds the corresponding tasks to the queue. The attack nodes then take their tasks from the queue and begin executing the attacks.
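The launcher and attack-node interaction described above can be sketched as follows. To stay self-contained, this sketch uses Python's standard-library `queue` and `threading` modules as a stand-in for Celery and its broker; it is not DOROTHEA's implementation. The script names, target names, and the choice to run every script against every target are assumptions made for illustration, and `run_attack` is a placeholder that generates no traffic.

```python
import queue
import threading


def run_attack(script_name: str, target: str) -> str:
    # Placeholder: a real attack script would generate traffic here.
    return f"{script_name} -> {target}"


def attack_node(tasks: queue.Queue, results: list, lock: threading.Lock) -> None:
    """Take tasks from the queue until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel: no more work for this node
            tasks.task_done()
            break
        script_name, target = task
        outcome = run_attack(script_name, target)
        with lock:
            results.append(outcome)
        tasks.task_done()


def launch(scripts: list, targets: list, n_attack_nodes: int) -> list:
    """Launcher: start the attack nodes, enqueue the tasks, wait for completion."""
    tasks: queue.Queue = queue.Queue()
    results: list = []
    lock = threading.Lock()
    nodes = [
        threading.Thread(target=attack_node, args=(tasks, results, lock))
        for _ in range(n_attack_nodes)
    ]
    for node in nodes:
        node.start()
    # Assumption: one task per (script, target) pair.
    for script in scripts:
        for target in targets:
            tasks.put((script, target))
    for _ in nodes:
        tasks.put(None)  # one sentinel per node so every node shuts down
    tasks.join()
    for node in nodes:
        node.join()
    return sorted(results)


completed = launch(["port_scan", "syn_flood"], ["victim-1", "victim-2"], n_attack_nodes=3)
print(completed)
```

In a Celery-based setup, the shared queue would live in a message broker and each attack node would run a Celery worker, but the division of roles is the same: the launcher only enqueues tasks, and the nodes pull work until the queue is empty.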
Attack nodes are connected to a gateway. The gateway, the NetFlow generator, and the NetFlow warehouse work as explained above for benign traffic generation. Once all the attack scripts have been run and all the tasks in the queue have been completed, DOROTHEA saves flow data and shuts all the nodes down.
DOROTHEA's most important feature regarding malicious-traffic generation is that it is isolated from the internet. This ensures that all network traffic used to build flow data corresponds to malicious attacks, which facilitates data labeling.
Finally, the proposed solution is both scalable and flexible: DOROTHEA allows users to add customized scripts for benign and malicious traffic generation, and Docker makes it possible to define complex network topologies that simulate realistic environments.
If you wish, you can also use some of the datasets that we have already created with this tool:
NetFlow data without sampling for training (D1) [1]
NetFlow data without sampling for test (D2) [2]