Table 1 Frequencies (in per cent) of categorical variables in the original confidential population and in the synthetic population

From: Constructing synthetic populations in the age of big data

| Variable | Category | Original confidential population, frequency (%) | Synthetic population, frequency (%) |
|---|---|---|---|
| Lung cancer | No | 99.86 | 99.88 |
| | Yes | 0.14 | 0.12 |
| Pancreas cancer | No | 99.99 | 99.99 |
| | Yes | 0.01 | 0.01 |
| Main source of income | Employee | 47 | 47 |
| | Civil servant | 7.4 | 7.4 |
| | Salary as company director | 2.4 | 2.4 |
| | Other income from labour | 0.3 | 0.3 |
| | Income as company owner | 14.7 | 14.7 |
| | Income from property | 0.4 | 0.4 |
| | Unemployment benefits | 1 | 1 |
| | Disability pension | 2.9 | 2.9 |
| | Retirement pension | 17.8 | 17.8 |
| | Social assistance benefits | 3.2 | 3.2 |
| | Other social security | 1 | 1 |
| | Study grant | 0.8 | 0.8 |
| | Other | 0.1 | 0.1 |
| | No income | 1 | 1 |
| Household size (number of persons) | 1 | 17.5 | 17.2 |
| | 2 | 29.7 | 30.2 |
| | 3 | 16.4 | 16.7 |
| | 4 | 23.1 | 22.4 |
| | 5 | 9.4 | 9.4 |
| | 6 and more | 3.9 | 4.1 |
| Migration background | Dutch | 78.9 | 78.7 |
| | Moroccan | 2.2 | 2.2 |
| | Turkish | 2.4 | 2.4 |
| | Surinam | 2.1 | 2.1 |
| | Netherlands Antilles and Aruba | 0.9 | 0.9 |
| | Other non-Western | 4.2 | 4.2 |
| | Other Western | 9.4 | 9.5 |
| Type of household | Institutional | 1.4 | 1.6 |
| | Non-institutional | 98.6 | 98.4 |

* Hospital admission data from the National Medical Registry (LRM), primary care data from the Netherlands Institute for Health Services Research (NIVEL), and drug reimbursement data from the Dutch registry on medication use (Medicijntab), all provided through Statistics Netherlands.
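One simple way to summarise how closely the synthetic population reproduces each marginal distribution in Table 1 is to compare the two frequency columns variable by variable. The sketch below is an illustrative addition, not the authors' evaluation method. It takes the published (rounded) percentages and computes, for each variable, the largest absolute difference and the total variation distance (half the sum of absolute differences), both in percentage points. The names `TABLE_1` and `compare_marginals` are hypothetical. Because the inputs are rounded, the results are only approximate.

```python
# Illustrative only: hypothetical names, published rounded percentages from Table 1.
# variable -> list of (category, original %, synthetic %)
TABLE_1 = {
    "Lung cancer": [("No", 99.86, 99.88), ("Yes", 0.14, 0.12)],
    "Pancreas cancer": [("No", 99.99, 99.99), ("Yes", 0.01, 0.01)],
    "Main source of income": [
        ("Employee", 47.0, 47.0),
        ("Civil servant", 7.4, 7.4),
        ("Salary as company director", 2.4, 2.4),
        ("Other income from labour", 0.3, 0.3),
        ("Income as company owner", 14.7, 14.7),
        ("Income from property", 0.4, 0.4),
        ("Unemployment benefits", 1.0, 1.0),
        ("Disability pension", 2.9, 2.9),
        ("Retirement pension", 17.8, 17.8),
        ("Social assistance benefits", 3.2, 3.2),
        ("Other social security", 1.0, 1.0),
        ("Study grant", 0.8, 0.8),
        ("Other", 0.1, 0.1),
        ("No income", 1.0, 1.0),
    ],
    "Household size": [
        ("1", 17.5, 17.2),
        ("2", 29.7, 30.2),
        ("3", 16.4, 16.7),
        ("4", 23.1, 22.4),
        ("5", 9.4, 9.4),
        ("6 and more", 3.9, 4.1),
    ],
    "Migration background": [
        ("Dutch", 78.9, 78.7),
        ("Moroccan", 2.2, 2.2),
        ("Turkish", 2.4, 2.4),
        ("Surinam", 2.1, 2.1),
        ("Netherlands Antilles and Aruba", 0.9, 0.9),
        ("Other non-Western", 4.2, 4.2),
        ("Other Western", 9.4, 9.5),
    ],
    "Type of household": [
        ("Institutional", 1.4, 1.6),
        ("Non-institutional", 98.6, 98.4),
    ],
}


def compare_marginals(rows):
    """Return (max absolute difference, total variation distance) in percentage points."""
    diffs = [abs(orig - synth) for _, orig, synth in rows]
    # Total variation distance: half the sum of absolute differences between the two distributions.
    return max(diffs), 0.5 * sum(diffs)


if __name__ == "__main__":
    for variable, rows in TABLE_1.items():
        max_diff, tvd = compare_marginals(rows)
        print(f"{variable:<25} max |diff| = {max_diff:.2f} pp   TVD = {tvd:.2f} pp")
```

On these figures, household size shows the largest gap: 0.7 percentage points for four-person households, with a total variation distance of about 1.0 percentage point. Main source of income matches exactly at the published precision.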