Ronny Gunnarsson. Data collection [in Science Network TV]. Available at: http://science-network.tv/data-collection/. Accessed April 29, 2017.


# Data collection in studies using a quantitative approach

Collecting observations is part of what we call data collection. However, data collection starts long before that, when you are planning your study. It involves making several definitions and decisions before the data are actually collected:

- Define the population of interest
- Define the sampling frame
- Decide on the sampling method
- Decide on inclusion and exclusion criteria
- Decide what type of data should be collected
- Decide the sample size
- Plan practicalities around data collection
- Perform data collection

## Defining the population of interest

If your project involves humans, the participants in your study can be seen as a sample taken from an underlying population. The results from your project can hopefully be generalised to this underlying population. You must be able to describe the population to which you expect your results to apply. An example of a population of interest could be “all Caucasian women in the age group 40-70 years with diabetes mellitus type II living in a developed country”. You cannot investigate all these people, so you will take a small sample of them and hope that your sample is representative of your population of interest.

## Defining sampling frame

The sampling frame consists of those members of your population of interest who, for practical reasons, are eligible for inclusion. An example could be “all Caucasian women in the age group 40-70 years with diabetes mellitus type II known to the primary health care centre or hospital in XX town”. You will rarely include the whole sampling frame, only a sample of it.

## Deciding sampling method

There are two main approaches to sampling: non-probability sampling and probability sampling. In probability sampling, each individual's probability of being chosen for the study is known in advance. In non-probability sampling, it is unknown. Examples of non-probability sampling:

- Convenience sampling (use whoever is at hand)
- Snowball sampling (existing participants help recruit further participants)
- Quota sampling (not truly stratified)
- Consecutive sampling (include observations in the order they appear)
- Purposeful sampling (purposefully pick individuals to get a reasonable dispersion / variation with respect to age, gender, experience of the phenomenon of interest, etc.)

In all of the above sampling methods, each individual's probability of being recruited to the study is unknown, which may increase the risk of bias.

Examples of probability (random) sampling:

- Simple random sampling
- Systematic sampling
- Stratified sampling
- Cluster sampling

In all of the above sampling methods, each individual's probability of being recruited to the study can be calculated before data collection commences, which is likely to reduce the risk of bias.
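The four probability sampling methods listed above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: it assumes a hypothetical sampling frame held in memory as a list of records, and the field names (`clinic`, `age_group`), the frame size and the helper functions are all invented for the example. The point is that for each method the chance that a given individual ends up in the sample can be written down before any data are collected.

```python
import random
from collections import defaultdict

# Hypothetical sampling frame: 1,000 patients spread over 20 clinics and two age groups.
# All names and values are invented for illustration only.
frame = [
    {"id": i, "clinic": f"clinic_{i % 20}", "age_group": "55-70" if i % 3 == 0 else "40-54"}
    for i in range(1000)
]
rng = random.Random(2017)  # fixed seed so the draw can be reproduced and documented


def simple_random_sample(frame, n, rng):
    """Every individual has the same, known inclusion probability n / N."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Random start, then every k-th individual; inclusion probability is 1 / k."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]


def stratified_sample(frame, key, fraction, rng):
    """Simple random sample within each stratum; inclusion probability within a
    stratum is (stratum sample size) / (stratum size)."""
    strata = defaultdict(list)
    for person in frame:
        strata[person[key]].append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample


def cluster_sample(frame, key, n_clusters, rng):
    """Randomly pick whole clusters; inclusion probability is n_clusters / number of clusters."""
    clusters = sorted({person[key] for person in frame})
    chosen = set(rng.sample(clusters, n_clusters))
    return [person for person in frame if person[key] in chosen]


print(len(simple_random_sample(frame, 100, rng)))            # 100
print(len(systematic_sample(frame, 100, rng)))               # 100
print(len(stratified_sample(frame, "age_group", 0.1, rng)))  # 100 (33 + 67)
print(len(cluster_sample(frame, "clinic", 2, rng)))          # 100 (2 clinics x 50)
```

Stratified sampling can also use a different fraction in each stratum (for example, to over-sample a small group); the inclusion probabilities then differ between strata but are still known in advance.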

Recommended sampling methods:

- Most non-probability sampling methods are acceptable for a pilot study estimating feasibility before a larger randomised controlled trial (RCT) is done.
- Consecutive sampling is often acceptable for an early phase I or phase II RCT aiming to show whether there is any effect at all.
- Some kind of probability sampling is desirable for a large phase III RCT proving an effect in the clinical setting. However, most phase III trials use consecutive sampling, which is a non-probability sampling method.
- Probability sampling is required for any observational study trying to clarify associations between different phenomena.

(More description of the different sampling methods will come)

## Deciding inclusion and exclusion criteria

Inclusion criteria are used to identify subjects suitable for inclusion. In health care related research, it is common for some of these criteria to be the absence of pregnancy, dementia, end-stage renal disease or other coexisting conditions that would make a person unsuitable to participate. Exclusion criteria are applied later to determine whether patients who have already been included should be excluded. Exclusion criteria are common in prospective studies where patients are followed over time. However, exclusion criteria are usually not needed in cross-sectional studies, where all data are collected on a single occasion.

It is a common misconception that exclusion criteria mirror the inclusion criteria. A typical example is when age >18 years is one of the inclusion criteria and age <18 years is then stated as an exclusion criterion. However, individuals younger than 18 years were never included in the first place, because they did not meet the inclusion criteria. Hence, there is no need to exclude them.
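The argument above can be checked mechanically. The sketch below uses hypothetical screening records (the ids, ages and the `pregnant` flag are invented) and applies the example inclusion criteria at recruitment; a "mirrored" exclusion criterion applied afterwards to the included group finds nobody to remove.

```python
# Hypothetical screening records; ids, ages and flags are invented for illustration.
candidates = [
    {"id": 1, "age": 16, "pregnant": False},
    {"id": 2, "age": 45, "pregnant": False},
    {"id": 3, "age": 30, "pregnant": True},
    {"id": 4, "age": 62, "pregnant": False},
]


def meets_inclusion_criteria(person):
    """Example inclusion criteria: age >18 years and absence of pregnancy."""
    return person["age"] > 18 and not person["pregnant"]


included = [p for p in candidates if meets_inclusion_criteria(p)]

# A "mirrored" exclusion criterion (age <18 years) applied to those already included
# can never remove anyone, because under-18s already failed the inclusion criteria.
removed_by_mirror = [p for p in included if p["age"] < 18]

print([p["id"] for p in included])  # [2, 4]
print(removed_by_mirror)            # []
```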

## Decide what type of data should be collected

We use the label “variable” for a specific type of observation. Examples of variables might be age, gender, presence of high blood pressure, etc. Once you have your results, these variables serve two functions:

- Describing what kind of observations / patients were included in your study. This is labelled descriptive statistics.
- Serving as the basis for drawing conclusions. This is labelled inferential statistics.

Many variables are used for both descriptive and inferential statistics. Variables used for inferential statistics should be subjected to sample size calculations (see below). Sometimes the sample size calculation may show that a variable requires an unreasonably high number of observations / patients. In that scenario the variable might be dropped, or it might be kept and used solely for descriptive statistics. There is usually an interplay between the preliminary list of desired variables and the sample size calculation before you end up with the final list of variables intended for descriptive and / or inferential statistics. The type of data to be collected can be:

- Direct measurements (such as measurements of the body and its chemistry, or of bodily reactions)
- Indirect measurements of knowledge, attitudes or perceptions using surveys or structured interviews
  - Binary questions (Yes/No)
  - Surveys measuring attitudes or perceptions (Likert scale, visual analogue scale or similar)
  - Surveys with other fixed response alternatives
- Structured observations
  - Structured observations of behavior
  - Structured observations of events or processes
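The interplay between the preliminary list of variables and the sample size calculation, described earlier in this section, can be sketched as a simple triage loop. This is a hedged illustration under stated assumptions: it uses one common normal-approximation formula for comparing two group means (two-sided test, equal group sizes, common standard deviation), and the variable names, standard deviations, smallest worthwhile differences and the 200-per-group ceiling are all hypothetical. Real studies may need a different formula for each type of variable.

```python
import math
from statistics import NormalDist


def n_per_group_two_means(sd, min_difference, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing two means
    (two-sided test, two equally sized groups, common standard deviation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd**2 / min_difference**2)


# Hypothetical candidate variables: (assumed SD, smallest difference worth detecting).
candidate_variables = {
    "systolic_bp_mmHg": (15, 10),
    "hba1c_percent": (1.2, 0.2),
    "bmi_kg_m2": (5, 2),
}
FEASIBLE_PER_GROUP = 200  # hypothetical recruitment ceiling per group

inferential, descriptive_only = [], []
for name, (sd, diff) in candidate_variables.items():
    n = n_per_group_two_means(sd, diff)
    if n <= FEASIBLE_PER_GROUP:
        inferential.append((name, n))
    else:
        # Too many patients needed: keep the variable for description only (or drop it).
        descriptive_only.append((name, n))

print(inferential)       # [('systolic_bp_mmHg', 36), ('bmi_kg_m2', 99)]
print(descriptive_only)  # [('hba1c_percent', 566)]
```

In practice the loop is rarely this mechanical: the ceiling, the assumed standard deviations and the smallest worthwhile differences are themselves revisited until the final list of variables is settled.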

## Deciding sample size

When using a quantitative approach, it is important to do a sample size calculation for the variables intended to be used for inferential statistics. This involves making some assumptions and decisions. Please read the web page on sample size estimation for detailed information.
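To make "assumptions and decisions" concrete, the sketch below uses one common textbook formula, n = z² · p(1 − p) / d², for estimating a single proportion (for example, the prevalence of high blood pressure) to within a chosen margin d. The function name and the numbers are hypothetical, the formula assumes simple random sampling from a large population, and other designs (comparing groups, correlations, survival) need other formulas.

```python
import math
from statistics import NormalDist


def n_to_estimate_proportion(expected_p, margin, confidence=0.95):
    """Sample size needed to estimate a proportion to within +/- margin
    (normal approximation, large population, simple random sampling)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * expected_p * (1 - expected_p) / margin**2)


# Decisions: 95 % confidence and a margin of +/- 5 percentage points.
# Assumption: the expected proportion; 0.5 is the most conservative guess.
print(n_to_estimate_proportion(0.5, 0.05))  # 385
print(n_to_estimate_proportion(0.3, 0.05))  # 323
```

Halving the margin roughly quadruples the required sample size, which is why these decisions are worth making deliberately before recruitment starts.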

## Plan practicalities around data collection

(Coming)

## Perform data collection

(Coming)

# Data collection in studies using a qualitative approach

## Defining criteria for selecting participants

Purposeful sampling is usually the best choice in studies using a qualitative approach. This means purposefully picking individuals to get a reasonable dispersion / variation with respect to age, gender, experience of the phenomenon of interest, etc. (The rest is under construction…)

## Decide method of data collection

- Interviews with one person at a time
  - Open (unstructured) interviews
  - Partly open (semi-structured) interviews
- Interviews and discussions in groups (focus groups)
- Documents
  - Diaries
  - Written stories
  - Fiction / poetry
- Open (unstructured) observations
  - Non-participatory observations
    - Non-participatory hidden observations (one-way mirror or hidden video cameras)
    - Non-participatory disclosed observations (sitting and observing, or a disclosed video camera)
  - Participatory observations
    - Participatory hidden observations (for example, the undercover journalism of Günter Wallraff)
    - Participatory disclosed observations (common in ethnography, grounded theory and social anthropology)
