TraceGen: User Activity Emulation for Digital Forensic Test Image Generation

Du, Xiaoyu; Hargreaves, Christopher; Sheppard, John; Scanlon, Mark

Publication Date: March 2021

Publication Name: Forensic Science International: Digital Investigation

Abstract:

Digital forensic test images are commonly used across a variety of digital forensic use cases including education and training, tool testing and validation, proficiency testing, malware analysis, and research and development. Using real digital evidence for these purposes is often not viable or permissible, especially when factoring in the ethical and in some cases legal considerations of working with individuals' personal data. The creation of synthetic digital forensic test images typically involves an arduous, time-consuming process of manually performing a list of actions, or following a 'story' to generate artefacts in a subsequently imaged disk. Besides the manual effort and time needed in executing the relevant actions in the scenario, there is often little room to build a realistic volume of non-pertinent wear-and-tear or 'background noise' on the suspect device, meaning the resulting disk images are inherently limited and to a certain extent simplistic. This work presents the TraceGen framework, an automated system focused on the emulation of user actions to create realistic and comprehensive artefacts in an auditable and reproducible manner. The framework consists of a series of actions contained within scripts that are executed both externally and internally to a target virtual machine. These actions use existing automation APIs to emulate a real user's behaviour on a Windows system to generate realistic and comprehensive artefacts. These actions can be quickly scripted together to form complex stories or to emulate wear-and-tear on the test image. In addition to the development of the framework, evaluation is also performed in terms of the ability to produce background artefacts at scale, and also the realism of the artefacts compared with their human-generated counterparts.

BibTeX Entry:

@article{du2021tracegen,
author={Du, Xiaoyu and Hargreaves, Christopher and Sheppard, John and Scanlon, Mark},
title="{TraceGen: User Activity Emulation for Digital Forensic Test Image Generation}",
journal="{Forensic Science International: Digital Investigation}",
year=2021,
month=mar,
publisher={Elsevier},
abstract={Digital forensic test images are commonly used across a variety of digital forensic use cases including education and training, tool testing and validation, proficiency testing, malware analysis, and research and development. Using real digital evidence for these purposes is often not viable or permissible, especially when factoring in the ethical and in some cases legal considerations of working with individuals' personal data. The creation of synthetic digital forensic test images typically involves an arduous, time-consuming process of manually performing a list of actions, or following a `story' to generate artefacts in a subsequently imaged disk. Besides the manual effort and time needed in executing the relevant actions in the scenario, there is often little room to build a realistic volume of non-pertinent wear-and-tear or `background noise' on the suspect device, meaning the resulting disk images are inherently limited and to a certain extent simplistic. This work presents the TraceGen framework, an automated system focused on the emulation of user actions to create realistic and comprehensive artefacts in an auditable and reproducible manner. The framework consists of a series of actions contained within scripts that are executed both externally and internally to a target virtual machine. These actions use existing automation APIs to emulate a real user's behaviour on a Windows system to generate realistic and comprehensive artefacts. These actions can be quickly scripted together to form complex stories or to emulate wear-and-tear on the test image. In addition to the development of the framework, evaluation is also performed in terms of the ability to produce background artefacts at scale, and also the realism of the artefacts compared with their human-generated counterparts.}
}