Introduction

Overview of the Dataset on Hugging Face

The dataset is publicly available on Hugging Face under the repository ID LLM-Digital-Twin/Twin-2K-500.

Each entry in the dataset represents a unique persona and consists of four fields: pid, persona_text, persona_summary, and persona_json.

pid: a unique participant identifier.
persona_text: the complete text of the survey along with the corresponding responses.
persona_summary: a textual summary of the individual based on the survey content.
persona_json: a structured JSON object containing the organized survey data and responses, which can be used for downstream processing or analysis.
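For downstream code, it can help to mirror these four fields in a typed record. The sketch below is a minimal illustration only. It assumes that each row arrives as a Python dict keyed by these column names, and that persona_json is a JSON-encoded string (the json.loads call in the loading example implies this). The class name PersonaRecord and its from_row method are hypothetical and are not part of the dataset's tooling.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class PersonaRecord:
    """Hypothetical typed view of one dataset row (not part of the dataset's tooling)."""
    pid: Any              # unique participant identifier (exact type left open; assumption)
    persona_text: str     # full survey text together with the participant's responses
    persona_summary: str  # textual summary of the participant
    persona_json: Any     # structured survey data, already decoded from JSON

    @classmethod
    def from_row(cls, row: dict) -> "PersonaRecord":
        # Assumption: persona_json is stored as a JSON string, so decode it here.
        return cls(
            pid=row["pid"],
            persona_text=row["persona_text"],
            persona_summary=row["persona_summary"],
            persona_json=json.loads(row["persona_json"]),
        )


# Synthetic row with placeholder values (not real data).
example_row = {
    "pid": "example-pid",
    "persona_text": "Q: ... A: ...",
    "persona_summary": "A short summary.",
    "persona_json": '[{"name": "example"}]',
}
record = PersonaRecord.from_row(example_row)
```

Decoding persona_json once, when the record is built, spares later code from re-parsing the string on every access.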

The following code snippet demonstrates how to load and parse the persona_json field from the dataset:

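```python
import json

from datasets import load_dataset

# Load the "full_persona" configuration of the dataset from the Hugging Face Hub.
ds = load_dataset('LLM-Digital-Twin/Twin-2K-500', 'full_persona')

# persona_json is stored as a JSON string; decode the first record into Python objects.
first_person = json.loads(ds['data'][0]['persona_json'])
```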
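To look up personas by participant, the same parsing step can be applied to every row. This is a minimal sketch. index_by_pid is a hypothetical helper, and it assumes only what the snippet above already relies on: iterating over a split such as ds['data'] yields one dict per row, each with pid and persona_json keys. Since pid is documented as unique, the helper raises an error if it encounters a duplicate.

```python
import json
from typing import Any, Dict, Iterable, Mapping


def index_by_pid(rows: Iterable[Mapping[str, Any]]) -> Dict[Any, Any]:
    """Hypothetical helper: map each pid to its decoded persona_json."""
    index: Dict[Any, Any] = {}
    for row in rows:
        pid = row["pid"]
        # pid is described as unique, so a repeat signals unexpected input.
        if pid in index:
            raise ValueError(f"duplicate pid: {pid!r}")
        # persona_json is a JSON string; decode it as in the loading example.
        index[pid] = json.loads(row["persona_json"])
    return index


# With the dataset loaded, this would be: personas = index_by_pid(ds['data'])
# Here a small synthetic list (placeholder values) stands in for the split.
sample_rows = [
    {"pid": "a", "persona_json": "[1, 2]"},
    {"pid": "b", "persona_json": '{"k": 3}'},
]
personas = index_by_pid(sample_rows)
```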

The next section describes the structure of persona_json in detail.

Structure of the `persona_json` file

This section describes the hierarchical structure of persona_json, which stores each participant's responses in the study. The structure has two levels: blocks and, within each block, a set of questions with their corresponding answers.
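A simple nested loop can walk this two-level layout. This section does not give the exact key names used inside persona_json, so the sketch below accepts them as parameters. The defaults "BlockName", "Questions", "QuestionText" and "Answers" are hypothetical placeholders; check them against a decoded record (for example, first_person from the loading snippet) before use. The sketch also assumes that the top level is a list of block objects.

```python
from typing import Any, Iterator, List, Mapping, Tuple


def iter_questions(
    blocks: List[Mapping[str, Any]],
    block_name_key: str = "BlockName",        # hypothetical key name
    questions_key: str = "Questions",         # hypothetical key name
    question_text_key: str = "QuestionText",  # hypothetical key name
    answers_key: str = "Answers",             # hypothetical key name
) -> Iterator[Tuple[str, str, Any]]:
    """Yield (block name, question text, answers) for every question in every block."""
    for block in blocks:
        block_name = block.get(block_name_key, "")
        # Blocks without a questions list are skipped instead of raising an error.
        for question in block.get(questions_key, []):
            yield (
                block_name,
                question.get(question_text_key, ""),
                question.get(answers_key),
            )


# Synthetic example shaped to match the assumed keys (not real survey data).
sample_blocks = [
    {"BlockName": "Block A", "Questions": [
        {"QuestionText": "Q1?", "Answers": "yes"},
        {"QuestionText": "Q2?", "Answers": 3},
    ]},
    {"BlockName": "Block B", "Questions": []},
]
flat = list(iter_questions(sample_blocks))
```

Passing the key names as arguments lets the same loop be adapted once the real field names are confirmed, without changing its logic.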

The survey was conducted across four waves, and each block is associated with one of these waves. Blocks group questions by theme and vary in the number and content of the questions they contain. Blocks and questions may be presented in a fixed or randomized order. In some cases, blocks or individual questions are randomly selected for inclusion based on experimental conditions or display logic; in the JSON data, this applies especially to the blocks in the fourth wave.
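When analyzing waves separately, you can group blocks by wave. The sketch below assumes that block labels carry a "[Wave N]" tag, as the block headings in this document do (for example, "Product Preferences - Pricing [Wave 4]"). It is not confirmed here that block names inside persona_json follow this convention. If the wave is stored elsewhere, replace the hypothetical parse_wave function accordingly.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Matches the "[Wave N]" tag used in this document's block headings (assumed convention).
WAVE_TAG = re.compile(r"\[Wave\s+(\d+)\]")


def parse_wave(block_label: str) -> Optional[int]:
    """Hypothetical helper: return N from a label such as '... [Wave N]', else None."""
    match = WAVE_TAG.search(block_label)
    return int(match.group(1)) if match else None


def group_by_wave(block_labels: Iterable[str]) -> Dict[Optional[int], List[str]]:
    """Group block labels by wave; labels without a tag are collected under None."""
    groups: Dict[Optional[int], List[str]] = defaultdict(list)
    for label in block_labels:
        groups[parse_wave(label)].append(label)
    return dict(groups)


labels = [
    "Non-experimental heuristics and biases [Wave 4]",
    "Product Preferences - Pricing [Wave 4]",
    "Untagged block",  # synthetic label with no wave tag
]
by_wave = group_by_wave(labels)
```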

This section provides a comprehensive listing of all possible question blocks and their contents. However, the actual set of questions encountered by each participant (or digital twin) may differ due to such randomization and conditional display mechanisms.
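Because each participant may see a different subset of questions, it is often useful to separate questions answered by everyone from those answered by only some participants before comparing personas. The sketch below assumes you have already reduced each decoded persona to a set of question identifiers (for instance, question texts collected while walking the blocks). coverage_report and its "shared"/"partial" output keys are hypothetical, and the sample pids and question IDs are placeholders.

```python
from typing import Dict, Hashable, Set


def coverage_report(seen: Dict[Hashable, Set[Hashable]]) -> Dict[str, Set[Hashable]]:
    """Hypothetical helper: split question IDs into those all participants saw and the rest."""
    if not seen:
        return {"shared": set(), "partial": set()}
    question_sets = list(seen.values())
    # Questions present in every participant's set.
    shared = set(question_sets[0]).intersection(*question_sets[1:])
    # Questions present in at least one participant's set.
    everything = set().union(*question_sets)
    return {"shared": shared, "partial": everything - shared}


# Placeholder participants and question IDs (not real data).
seen = {
    "p1": {"q1", "q2", "q3"},
    "p2": {"q1", "q3"},
    "p3": {"q1", "q3", "q4"},
}
report = coverage_report(seen)
```

Comparisons restricted to the "shared" set are not affected by randomized or conditionally displayed questions.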

Block: Non-experimental heuristics and biases [Wave 4]

Block: Product Preferences - Pricing [Wave 4]
