Whether social science research relies on questionnaires or behavioral measures, it is common practice to use established measures from previous research, and sometimes to modify and adapt them for new studies. This process can produce several variants of popular measures that are used in different settings and reported in different papers. Sometimes, though, measures lend themselves to adaptation not for different research questions and settings, but for the more problematic purpose of increasing the likelihood of producing a desired result (e.g., a "significant" effect).

Many analyses in behavioral research rely on null hypothesis significance testing (NHST), which assumes that a selected hypothesis test is conducted with only one version of an outcome measure. A determined researcher, however, can run analyses with different versions of a "flexible measure" repeatedly until the desired result is obtained. This increases the likelihood of producing desired findings to report, but of course decreases the likelihood that these reported findings represent a genuine phenomenon. In short, flexible measures can undermine the assumptions of statistical hypothesis tests and increase the number of spurious findings in the research literature.
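
To make the size of this problem concrete, the following is a minimal simulation sketch rather than an analysis taken from this site. It assumes two groups with no true difference, a two-sided z-test at alpha = .05 with known unit variance, and (as a simplifying assumption) that each version of the measure yields statistically independent scores. Versions of a real measure are usually correlated, so the inflation in practice would be smaller than shown here, but it would still exceed the nominal 5%. All function names and parameters are illustrative.

```python
import random
from statistics import NormalDist, mean


def any_version_significant(n_versions, n_per_group, rng, alpha=0.05):
    """Simulate one study with no true effect and report whether at least
    one of the n_versions outcome variants reaches p < alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    for _ in range(n_versions):
        # Assumption: each version produces independent scores in both groups.
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        # Two-sample z statistic with known unit variance in each group.
        z = (mean(group_a) - mean(group_b)) / (2 / n_per_group) ** 0.5
        if abs(z) > z_crit:
            return True  # the researcher stops and reports this version
    return False


def simulated_false_positive_rate(n_versions, n_studies=2000,
                                  n_per_group=30, seed=1):
    """Share of null studies in which some version came out 'significant'."""
    rng = random.Random(seed)
    hits = sum(any_version_significant(n_versions, n_per_group, rng)
               for _ in range(n_studies))
    return hits / n_studies


def expected_false_positive_rate(n_versions, alpha=0.05):
    """Closed-form rate for independent tests: 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** n_versions


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(k,
              round(expected_false_positive_rate(k), 3),
              round(simulated_false_positive_rate(k), 3))
```

Under these assumptions, a single version keeps the false positive rate near the nominal 5%, whereas trying ten versions pushes it to roughly 40% (1 - 0.95^10).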

Given that problematic analyses using flexible measures can be difficult to identify in a published paper unless such analyses are reported (and why would they be?), FlexibleMeasures.com is dedicated to collecting, aggregating, and communicating examples of flexible measures in the literature to encourage reflection about both the use of flexible measures in research and the validity of studies that rely on them.

So far, FlexibleMeasures.com reports extensive analyses on only one measure (the Competitive Reaction Time Task), but hopefully the database will grow - collaborations are welcome!

Flexible Measures




All contents CC-BY Malte Elson (2016), Ruhr University Bochum; malte.elson at rub dot de; @maltoesermalte