JsPsychR

Open source, standard tooling for experimental protocols: towards Registered reports

Gorka Navarrete & Herman Valencia

Presentation

https://gorkang.github.io/jsPsychRpresentation/

The past

Gather ’round little ones

Old school science

 

Old school science


  1. Read literature & come up with a shiny idea 1
  2. Design and run experiment
  3. Prepare data & explore different analytic approaches
  4. If (when) significant result → Write paper

Scientific method

Some issues

Experimenter degrees of freedom, incentives, issues

The need for significance and novelty:

The natural selection of bad science


The persistence of poor methods results partly from incentives that favour them, leading to the natural selection of bad science. This dynamic requires no conscious strategizing (…), only that publication is a principal factor for career advancement.


(…) in the absence of change, the existing incentives will necessarily lead to the degradation of scientific practices. (Smaldino and McElreath 2016) 1

The replication crisis in Psychology

Replication of 100 studies 1

Statistically significant:

  • 97% original → 36% replications

Effect sizes:

  • Replication effects were half the magnitude of original effects

Open-Science-Collaboration (2015)

Is there a crisis?

Baker (2016)

Contributing factors

Baker (2016)

Context

Mean number of publications for new hires in the Canadian cognitive psychology job market

Pennycook and Thompson (2018)

And not only Psychology

Issues

Issues → Opportunities

Improving Replicability


Improve methods

Reduce errors

Openness



Nosek et al. (2022)

Improving Replicability


Improve methods

  • increase the number of participants (see the power sketch after this list)
  • better measures and manipulations
  • improve design
  • piloting
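
A rough illustration of the sample-size point, a minimal sketch using the pwr package (not part of jsPsychR): if published effects are inflated and the true effect is about half the size, as the replication estimates above suggest, the number of participants needed roughly quadruples.

# Minimal sketch (pwr package): participants per group needed for
# 90% power at alpha = .05 in a two-sample t-test
library(pwr)

pwr.t.test(d = 0.50, sig.level = 0.05, power = 0.9, type = "two.sample")$n  # ~86 per group
pwr.t.test(d = 0.25, sig.level = 0.05, power = 0.9, type = "two.sample")$n  # ~338 per group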

Improving Replicability


Reduce errors

  • preregistration, which reduces:

    • p-hacking
    • hypothesizing after the results are known (i.e. HARKing)
    • selective reporting
  • internal replications

Improving Replicability


Openness

  • transparency of the research process

  • sharing methods, materials, procedures, and data

Registered reports

Registered reports (RRs)



RRs were conceived to alter the incentives for authors and journals away from (…) novel, positive, clean findings and towards (…) rigorous research on important questions. Soderberg et al. (2021)

How RRs work

  • Write introduction, method, … before collecting data!
  • Send to journal for review
  • Revise and resubmit (improve before collecting data)
  • Once you get an In-Principle Acceptance (IPA):
    • collect data & run planned analysis
    • report results and conclusions & send for final review

RRs do help

RRs outperformed comparison papers on all 19 criteria (Soderberg et al. 2021)


Sizable improvements in:

  • rigor of methodology and analysis, and overall paper quality

Statistically indistinguishable in:

  • novelty and creativity

RRs could improve research quality while reducing publication bias…

RRs advantages

  • More open, preregistered, reproducible by design
  • It does not matter if p < 0.05:
    • Fewer incentives for p-hacking
    • No hypothesizing after the results are known (HARKing)
    • More trustworthy results, less noise
  • You can still explore, but you have to say so explicitly

Registered reports are great

But isn’t this a bit… hard?

  • Hard to know how to analyze an experiment before having the data

  • There are always surprises when the data arrive. How do you create an analysis plan that will hold?

  • Collecting ALL-THE-THINGS™ allows me to figure out the best way to do the analysis

  • My code is a mess. Would take more time to make it shareable…

Our path towards RRs

Background

  • At the CSCN (~5-10 PIs) we used different technologies for experiments: PsychoPy, Qualtrics, LimeSurvey, jsPsych,…

  • Each protocol started almost from scratch. A single pre-existing task would define the technology used

  • Multiple implementations of the same tasks, not always exact replicas, not always easy to find

Issues



  • Experiments
  • Resources
  • Reproducibility

Experiment issues

  • Errors in experiment logic
  • Errors in item coding
  • Data not what we expected
  • Data structure made data preparation hard
  • Match between hypotheses and data not clear
  • Variables or questions not used in the analysis/paper

Resources issues: projects as islands

  • Time wasted re-programming tasks
  • Thousands of $/€ ‘invested’ in licenses (e.g. Qualtrics)
  • Piloting protocols as a part-time job for Research Assistants
  • Time wasted re-doing data preparation (each software has its own output format)

Reproducibility issues

  • Can I see what the participants saw in the 2012 protocol?
  • Data preparation/analyses so ugly that sharing them is hard (let me clean this up a bit before sending it)
  • Idiosyncratic analyses, some of which require licensed closed software (SPSS, Matlab,…)
  • Location and organization of projects
  • Why is this 2012 paradigm/data analysis not running?

Issues Survey

Our wish list

Our wish list

  • Open source software based on standard technologies
  • Reusable tasks (my project feeds future projects)
  • Based on mature projects and technologies (De Leeuw 2015) 1
  • As many ‘automagic’ things as possible
  • Easy to create, analyze and share (paradigms and analysis)
  • Balancing participants
  • Online/offline
  • etc.

The present

A few years later…

jsPsychR

Open source tools to create experimental paradigms with jsPsych, simulate participants and standardize data preparation and analysis

The tool


A big catalog of reusable tasks in jsPsychMaker. Each task runs with jsPsychMonkeys to create virtual participants, and has a script in jsPsychHelpeR to automate data preparation (re-coding, reversing items, calculating dimensions, etc.)

The goal


Help us have the data preparation and analysis ready before collecting any real data

  • reduce errors in the protocols
  • make the move towards registered reports easier

So far

  • ~100 tasks (maker + helper)
  • Used by researchers in Chile, Colombia, Spain:
    • > 30 online protocols with > 5000 participants (Prolific Academic, Social Media, etc.) + offline
  • Everything open source. > 80-page manual
  • So, many, errors, caught early ♥

The team

Gorka Navarrete, Herman Valencia


Initial idea and development:

  • Gorka Navarrete, Nicolas Sanchez-Fuenzalida, Nicolas Alarcón, Alejandro Cofre, Herman Valencia

Discussions, ideas, testing:

  • Esteban Hurtado, Alvaro Rivera, Juan Pablo Morales, …

jsPsychR



1) jsPsychMaker

2) jsPsychMonkeys

3) jsPsychHelpeR

jsPsychMaker

Available tasks

Features jsPsychMaker

  • Fully open source, based on web standards (jsPsych)
  • Reuse ~ 100 tasks
  • Online and offline protocols
  • Balancing of participants across between-participants conditions
  • Easy to create new tasks
  • Full control over the order of tasks (randomization, etc.)
  • Participants can continue where they left off (or not)
  • Limits on time and number of participants
  • Multilingual support (for a selection of tasks)
  • All the parameters can be quickly changed by editing a single file

Create Tasks and Protocols

  1. Copy example task to your computer
jsPsychMaker::copy_example_tasks(
  destination_folder = "~/Downloads/ExampleTasks", 
  which_tasks = "MultiChoice")
  2. Copy-paste items into the csv/excel file, edit instructions
# Open the source document with the items
rstudioapi::navigateToFile(here::here("R/BRS.txt")) # Brief Resilience Scale

# Edit instructions
rstudioapi::navigateToFile("~/Downloads/ExampleTasks/MultiChoice/MultiChoice_instructions.html")

# Adapt csv/excel file
system("nautilus ~/Downloads/ExampleTasks/MultiChoice/MultiChoice.csv")
  3. Create protocol (see next slide)
jsPsychMaker::create_protocol(
  canonical_tasks = c("BNT"), # Berlin Numeracy Test
  folder_tasks = "~/Downloads/ExampleTasks/",
  folder_output = "~/Downloads/protocol9996",
  launch_browser = TRUE
)

jsPsychR tools



1) jsPsychMaker

2) jsPsychMonkeys

3) jsPsychHelpeR

jsPsychMonkeys

 

Features jsPsychMonkeys

  • Fully open source (R, Docker, Selenium)
  • Works for online and offline protocols
  • Run monkeys sequentially or in parallel
  • Get pictures of each screen
  • Store logs to make debugging easier
  • Watch the monkeys as they work for you
  • Random pauses or refreshing to simulate human behavior
  • Set a random seed to make the monkeys' behavior predictable

Release monkeys

Release a single Monkey and take a look:

jsPsychMonkeys::release_the_monkeys(
  uid = "1",
  local_folder_tasks = "~/Downloads/protocol9996/",
  open_VNC = TRUE,
  wait_retry = 0
)


Release 10 Monkeys in parallel:

jsPsychMonkeys::release_the_monkeys(
  uid = 1:10,
  sequential_parallel = "parallel",
  number_of_cores = 10,
  local_folder_tasks = "~/Downloads/protocol9996/",
  open_VNC = FALSE
)

jsPsychR tools



1) jsPsychMaker

2) jsPsychMonkeys

3) jsPsychHelpeR

jsPsychHelpeR

Features jsPsychHelpeR

  • Fully open source (R)
  • Get tidy output data frames for each task, and for the whole protocol
  • Standard naming for tasks, dimensions, scales, …
  • Include tests for common issues
  • Snapshots to detect changes in data processing
  • Functions to help create correction scripts for new tasks using a standard template
  • Automatic reports with progress, descriptive statistics, codebook, etc.
  • Create a fully reproducible Docker container with the project’s data preparation and analysis
  • Create a blinded data frame to perform blinded analyses

jsPsychHelpeR


Create project for data preparation:

jsPsychHelpeR::run_initial_setup(
  pid = 9996,
  data_location = "~/Downloads/protocol9996/.data/",
  folder = "~/Downloads/jsPsychHelpeR9996", 
  dont_ask = TRUE
  )

Challenge: everything in 3 minutes?

Create protocol, simulate participants and prepare data…

rstudioapi::navigateToFile("R/script-full-process.R")
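
That script is not reproduced here, but a minimal sketch of an end-to-end run, reusing the calls shown in the previous slides (paths and the protocol id 9996 are illustrative), could look like this:

# Hypothetical end-to-end sketch reusing the calls from previous slides;
# paths and the protocol id (9996) are illustrative

# 1) Create a protocol with the Berlin Numeracy Test
jsPsychMaker::create_protocol(
  canonical_tasks = c("BNT"),
  folder_tasks = "~/Downloads/ExampleTasks/",
  folder_output = "~/Downloads/protocol9996",
  launch_browser = FALSE
)

# 2) Simulate 10 participants in parallel
jsPsychMonkeys::release_the_monkeys(
  uid = 1:10,
  sequential_parallel = "parallel",
  number_of_cores = 10,
  local_folder_tasks = "~/Downloads/protocol9996/",
  open_VNC = FALSE
)

# 3) Create the data preparation project for the simulated data
jsPsychHelpeR::run_initial_setup(
  pid = 9996,
  data_location = "~/Downloads/protocol9996/.data/",
  folder = "~/Downloads/jsPsychHelpeR9996",
  dont_ask = TRUE
)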

Create, simulate, prepare

Survey Experiment Issues

Survey results

Let’s try to download the data, process it and show a report with the results:


Plan A: run Experiment Issues project

rstudioapi::openProject("jsPsychHelpeR-ExperimentIssues/jsPsychHelpeR-ExperimentIssues.Rproj", newSession = TRUE)



Plan B: If something fails, we always have the monkeys!

utils::browseURL(here::here("jsPsychHelpeR-ExperimentIssues/outputs/reports/report_analysis_monkeys.html"), 
                 browser = "firefox")

Limitations

  • Easy to create new scales and simple tasks, but complex experimental tasks require JavaScript knowledge (although a good number of examples are available)

  • Data preparation for new experimental tasks requires expertise in R (simple surveys not so much)

  • Analysis reports require some R knowledge (simple templates available)

  • Needs access to a server for online tasks

  • Only behavioral tasks (no EEG or fMRI; maybe eye tracking…)

The future

The past is always tense, the future perfect

Zadie Smith

Too many things, too little time

Development is linked to our needs, time and resources. Future roadmap:

  • Templates for common experimental designs (tasks, data preparation and analysis)
  • More tasks, translations, tests, …
  • Upgrade, improve, clean, simplify, share
  • jsPsychR paper

Help welcome!

  • Javascript programmers

  • R programmers

  • Documentation

  • Task creators

  • Testers

  • Coffee brewers

  • Patrons

Back to Registered reports




Can jsPsychR really help? (1/2)

  • Protocols are standardized with (mostly) clean code, open source, and based on web standards

  • Data preparation 90% automatic, standardized, beautiful

  • Fewer errors in protocols and in data preparation

  • Time from idea to protocol is much shorter

  • Super easy to self-replicate (adapt, re-run, analysis already works)

  • When errors are found and fixed, old protocols can benefit from the corrections, old results can be checked, …

Can jsPsychR really help? (2/2)

  • Trivial to work on analysis before collecting human data

  • Much easier to write up a good analysis plan, share it, improve it, …

  • Easy to create fully reproducible papers and results’ reports

  • Sharing the protocol, materials and data preparation is painless (single command) 1

  • Creating future-proof fully reproducible data preparation projects (with Docker) is one command away 2

jsPsychR ♥ Registered Reports

More information

RRs’ templates, checklists, participating journals (> 300):

Also, check out the future:

And if you are a reviewer:

References

Baker, Monya. 2016. “Reproducibility Crisis.” Nature 533 (26): 353–66.
Begley, C Glenn, and Lee M Ellis. 2012. “Raise Standards for Preclinical Cancer Research.” Nature 483 (7391): 531–33.
Bettis, Richard A. 2012. “The Search for Asterisks: Compromised Statistical Tests and Flawed Theories.” Strategic Management Journal 33 (1): 108–13.
Boulesteix, Anne-Laure, Sabine Hoffmann, Alethea Charlton, and Heidi Seibold. 2020. “A Replication Crisis in Methodological Research?” Significance 17 (5): 18–21.
Bruns, Stephan B, and John PA Ioannidis. 2016. “P-Curve and p-Hacking in Observational Research.” PloS One 11 (2): e0149144.
Coiera, Enrico, Elske Ammenwerth, Andrew Georgiou, and Farah Magrabi. 2018. “Does Health Informatics Have a Replication Crisis?” Journal of the American Medical Informatics Association 25 (8): 963–68.
De Leeuw, Joshua R. 2015. “jsPsych: A JavaScript Library for Creating Behavioral Experiments in a Web Browser.” Behavior Research Methods 47: 1–12.
Forstmeier, Wolfgang, Eric-Jan Wagenmakers, and Timothy H Parker. 2017. “Detecting and Avoiding Likely False-Positive Findings–a Practical Guide.” Biological Reviews 92 (4): 1941–68.
Frias-Navarro, Dolores, Juan Pascual-Llobell, Marcos Pascual-Soler, Jose Perezgonzalez, and Jose Berrios-Riquelme. 2020. “Replication Crisis or an Opportunity to Improve Scientific Production?” European Journal of Education 55 (4): 618–31.
Ioannidis, John PA. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine 2 (8): e124.
Kerr, Norbert L. 1998. “HARKing: Hypothesizing After the Results Are Known.” Personality and Social Psychology Review 2 (3): 196–217.
Maniadis, Zacharias, Fabio Tufano, and John A List. 2017. “To Replicate or Not to Replicate? Exploring Reproducibility in Economics Through the Lens of a Model and a Pilot Study.” Oxford University Press Oxford, UK.
Nosek, Brian A, Tom E Hardwicke, Hannah Moshontz, Aurélien Allard, Katherine S Corker, Anna Dreber, Fiona Fidler, et al. 2022. “Replicability, Robustness, and Reproducibility in Psychological Science.” Annual Review of Psychology 73: 719–48.
Open-Science-Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716.
Pagell, Mark. 2021. “Replication Without Repeating Ourselves: Addressing the Replication Crisis in Operations and Supply Chain Management Research.” Journal of Operations Management 67 (1): 105–15.
Pennycook, Gordon, and Valerie A Thompson. 2018. “An Analysis of the Canadian Cognitive Psychology Job Market (2006–2016).” Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale 72 (2): 71.
Rogers, Lee F. 1999. “Salami Slicing, Shotgunning, and the Ethics of Authorship.” AJR. American Journal of Roentgenology 173 (2): 265.
Rubin, Mark. 2017. “An Evaluation of Four Solutions to the Forking Paths Problem: Adjusted Alpha, Preregistration, Sensitivity Analyses, and Abandoning the Neyman-Pearson Approach.” Review of General Psychology 21 (4): 321–29.
Smaldino, Paul E, and Richard McElreath. 2016. “The Natural Selection of Bad Science.” Royal Society Open Science 3 (9): 160384.
Soderberg, Courtney K, Timothy M Errington, Sarah R Schiavone, Julia Bottesini, Felix Singleton Thorn, Simine Vazire, Kevin M Esterling, and Brian A Nosek. 2021. “Initial Evidence of Research Quality of Registered Reports Compared with the Standard Publishing Model.” Nature Human Behaviour 5 (8): 990–97.

Thanks!

Gorka Navarrete

gorkang@gmail.com