
Peer Feedback

Studying life scientists' experience of peer review and evaluating an alternative approach to peer review.

At A Glance

Design Firm: Wrye Design Lab, LLC

Client: ASAPbio

Website: asapbio.org

Time Frame: 4 weeks at no more than 20 hours/week

Role: Experience Research Consultant

Dates: 01.11.2018 - 02.07.2018

Tools: Google Suite, MATLAB, Calendly

Services: Generative User Research, Evaluative User Research


Context

ASAPbio defines itself as a "scientist-driven initiative to promote transparency and innovation in life sciences communication." One of ASAPbio's newest projects is a re-imagining of the current, journal-based peer review system. I was engaged to perform generative research on life scientists' experience of peer review, along with evaluative user research for a proposed journal-independent peer review service titled Peer Feedback. My research would be used to inform an upcoming summit (February 9th, 2018) with representatives from several scientific journals, as well as further development of the system.

In brief, the proposed system involved scientists submitting their papers to the Peer Feedback organization before submitting to any journal. For a small fee, Peer Feedback would forward each paper to carefully selected experts in the field for review. These experts would be compensated for evaluating the paper's technical rigor, though not its scientific impact (the current journal-based approach requires referees to evaluate both). The reviews would then be posted publicly with the paper on a preprint server and could be forwarded with the paper to any journal, with the intent that technical review would not have to be repeated.

The system was intended to address several systemic issues in life-science publishing, such as bias and abuse in the peer review process, conflation of impact with technical rigor, and journals' control over which scientific endeavors are ultimately deemed worthwhile.

Note: as indicated in the Reflection section, this project is still a work in progress.


Initial Research Planning

Whenever I create a research plan, I first ensure that I have a set of clearly defined research goals. This practice anchors my research development and provides a set of objectives against which I can evaluate possible research protocols. As a contractor, I always begin defining my research goals with a period of stakeholder research to ensure that I deliver meaningful, actionable information to the organization at the end of the project.

Discussions with the director of ASAPbio revealed that stakeholders' primary concern was understanding scientists' reactions to the proposed Peer Feedback system. This evaluative work would, hopefully, yield evidence backing the Peer Feedback system, which could then serve as a foundation for discussion at the upcoming February 9th summit. The ASAPbio team also hoped that my work could inform further development of the proposal post-summit.

I suggested that it might also be useful to include a generative component in the research to provide a deeper understanding of users' experiences of the systemic issues the proposal sought to address. These insights would do double duty, both demonstrating which experience problems the proposal already addresses and suggesting areas for further development and refinement. The director accepted this suggestion and, with this shared understanding, I developed the following research goals (written here in the form of hunt statements):

  • I am going to research life scientists' experience of journal submission and peer review, both as authors and reviewers, in order to inform the design and evaluation of a proposed journal-independent peer review service.

  • I am going to research life scientists' response to ASAPbio's current Peer Feedback proposal in order to gain an understanding of the proposed system's usability and validate assumptions inherent in its design.

Although I would usually complete generative research separately from evaluative research, the timeline (at this point, just over three weeks) wouldn't support two entirely separate rounds of testing. I therefore elected to create a combined generative-evaluative study with separate sections designed to serve their own research goals.

I'm a strong believer in utilizing both qualitative and quantitative data collection methods--in my experience, mixed-methods research yields more than simply the sum of its parts. For the generative portion of the research, I settled on two research methods, given the constraints of the situation:

  • Interviews: ASAPbio's worldwide connections would make procuring a diverse population of interview participants relatively easy. Further, the interviews themselves could be done in person or remotely, and the timeframe required to conduct and analyze interviews fit easily into our three-week schedule. Interviews would provide rich, qualitative data about scientists' habits, thoughts, frustrations, motivations, and goals regarding peer review.

  • A Survey: Again, leveraging ASAPbio's worldwide reach, a survey could pull in several hundred responses from life scientists all over the globe, bringing some statistical power to the research findings. Easy to create and release to our user group, a survey could be sent out early in the schedule and left open for 2-3 weeks to garner as many responses as possible.

The evaluative portion was trickier to devise. The main challenge was how to create a prototype for the system: after all, I was not evaluating a physical or digital product with an interface that could be mocked up and then used for testing. I toyed briefly with creating a small-scale model of the service; however, it quickly became clear that a useful prototype would essentially need to be a fully functioning system (if smaller in scale) with authors, reviewers, and journals already on board. This wasn't feasible, to any degree, in the three weeks I had. Through further investigation into evaluative research methods, I found a workable solution: a modified Wizard of Oz method (also known as an Oz Protocol). Generally used for human-computer interaction research, the Wizard of Oz method is so named because it involves a system that is partially or wholly controlled by human decisions ("pay no attention to that man behind the curtain"). The power of the Oz Protocol for my application was twofold:

  • I could be the "prototype": I'd already determined that assembling even a small-scale model of the system would be a massive undertaking. With the Oz Protocol, I could simulate interactions between scientists and the Peer Feedback system directly, even titrating system responses to more closely study a certain point of friction.

  • It could be done in the same context as the interviews: I could complete a modified Oz Protocol in the same environment, and with the same tools, as an interview. No additional technology (such as InVision, Adobe XD, or another prototyping tool) would be necessary--it could simply be layered on top of the generative interviews, in person or over video call.

What about mixed methods? Could I bring a quantitative bent to the evaluative research as well? Given the impracticality of creating a prototype system in the time available, I decided to add a quantitative component to the evaluative research using the survey I'd already devised for the generative research. Evaluative questions could be interspersed throughout the survey, focusing on specific, analogous behaviors to avoid speculation and bias.

 

My initial research planning was complete--I would include:

  • An Interview + Oz Protocol: This qualitative portion of the research would begin with a generative interview consisting of open-ended questions intended to elucidate life scientists' experience of journal submission and peer review. It would then move on to the Oz Protocol, which would examine scientists' responses to a series of situations, each reflecting an aspect of interaction with the Peer Feedback system.

  • A Survey: The quantitative portion would include both generative and evaluative properties. It would be widely circulated via ASAPbio's worldwide connections, in the hope of procuring a statistically meaningful number of responses from a diverse sample of scientists.


Determining Research Questions

I worked closely with the ASAPbio director while developing the questions included in the Interview, Oz Protocol, and Survey. Over the first week of the project, we met several times over video call (I'm based in San Francisco, she in Boston) to discuss questions we'd compiled in a Google Doc. Her experience in publishing and academia as a postdoctoral fellow proved very valuable in ensuring I used effective, field-specific phrasing and vocabulary while formulating questions. For instance, without her help I wouldn't have known that the term "desk reject" refers to a submitted manuscript that is rejected before even being sent out for review.

Questions for different research methods required different considerations:

  • Interview questions needed to be highly open-ended, encouraging stories and narratives to surface. I would need a section focused on submitting papers in addition to a separate section for those who had reviewed papers (a subset of all interviewees). Some questions focusing on behaviors analogous to those required to participate in the Peer Feedback system could also be included, shedding light on subjects and behaviors important to the evaluative research while avoiding speculation as much as possible. For instance, accepting honoraria for speaking engagements or reviewing grant proposals is analogous, to some degree, to accepting payment for performing peer review. Following some contextual questions intended to get the interviewee talking and build rapport, I decided to aim for six to seven questions for authors and six to seven questions for those with reviewing experience. This would help keep interviews, including the Oz Protocol, under an hour. Examples of interview questions included:

    • Tell me about a recent memorable experience you had going through peer review (as an author of a paper)

    • How would you characterize good feedback on a manuscript?

  • The Oz Protocol questions needed to be both open-ended and specific, each targeting a certain feature or decision in the scientist's interaction with the Peer Feedback system. This required significant collaboration with the director to identify atomic features of the Peer Feedback system and to design questions that would target and probe exactly those features. As in the generative section, I would need two separate question flows for reviewers and authors. Examples of the Oz Protocol questions include:

    • You’re informed that the reviews you receive will, by default, eventually be posted publicly on the web, and the reviewers may choose to sign them. Before they are posted, you will have two weeks to prepare a rebuttal or revise the manuscript that you may publicly post at the same time as these reviews. What do you do?

    • You’re informed that, in order to participate in the service, the content of your finished review will be posted publicly. What do you do?

  • The survey was intended to provide quantitative information and therefore required closed-ended questions that would yield quantifiable data. Doing double duty for both the generative and evaluative sides of the research, the survey required questions about scientists' general experience of peer review as well as questions regarding important features of the Peer Feedback system. Because past behavior is the best indicator of future behavior, I again aimed for questions probing behaviors analogous to those involved in Peer Feedback. The survey would include a brief demographic section as well as sections for both authors and reviewers (like the interviews), with screener questions ensuring each section was shown only to the appropriate participants. To keep the time required of participants below 15 minutes, I limited the total number of questions to 30 (using an estimate of no more than 30 seconds per question). Some survey questions include:


Conducting Research

Interview participants were procured through ASAPbio's connections worldwide and were scheduled using Calendly. I interviewed 17 scientists from various backgrounds and stages in their careers (from graduate students to tenured professors). Most interviews took 40-55 minutes (average: 47 minutes). All interviews were recorded, with explicit consent, for later transcription and review.

To provide some context for my interviewees, I began my interviews with the following preamble:

My name is Will—I’m a user experience researcher working with ASAPbio to develop and assess a proposal for a portable, journal-independent peer review service. Part of my job is to gather and synthesize the perspectives on peer-review and potential service ideas from a diverse group of potential users, like you. I have a series of questions here which I’d like to hear your take on, but first a little preamble:

The success of this brand of qualitative research hinges on you, as the interviewee, answering these questions with a lot of detail. Stories and descriptions of past experiences are especially important, so please feel free to go into as much detail as you’re comfortable with on any of these questions. Does that make sense?

Building rapport with an interviewee is vital to an effective interview. To do this with the scientists I spoke to, I would begin by showing interest in the sort of work they did and offering some of my own history in academia. For instance, several scientists worked in quantitative biology, so I mentioned my previous work in the same field and my general enthusiasm for interdisciplinary science. This established some common ground between me and the interviewee and clued them in to my background. Throughout the rest of the interview, I would continue to offer some personal color when appropriate, especially when it seemed it would bring stories and other useful information to the surface.

I was concerned that introducing the Peer Feedback system, as put forth in the Oz Protocol, might influence the generative interview answers, so I elected to go through the Oz Protocol questions only after the generative portion was complete. Interviewees who indicated during the generative portion that they had reviewed papers for journals were walked through the referee question flow, while the rest were walked through the author flow.

I concluded both sections with a question of the form "is there anything I haven't asked about that you think I should know?" This final catch-all question can, in my experience, lead to the most interesting insights of the interview.


Analysis

Qualitative analysis of both the generative interviews and Oz Protocol began with an affinity diagram: I wrote the frustrations, motivations, behaviors, and goals gathered from the interview recordings on sticky notes, then placed them close by or far away from each other depending on the similarity or dissimilarity of the underlying ideas and insights. Distinct groupings revealed patterns of user insights for inclusion in the final ASAPbio report.

The affinity diagram developing across three walls of my office.

Quantitative analysis was completed in MATLAB: form responses (already anonymized) were loaded into a script I developed to filter and compare the answers from those surveyed. Statistical metrics, such as mean and skew, were also reported to describe the distributions numerically. Unfortunately, schedule slip during the interviews and an immediate need to deliver the information meant I was unable to perform a full statistical analysis of patterns in the data (such as differences in answers between subgroups).

Some of the MATLAB code used for quantitative analysis
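The full script isn't reproduced here, but a minimal sketch of the kind of filter-and-summarize pass described above might look like the following. The file name, column names (HasReviewed, WouldAcceptPaidReview), and Likert scale are hypothetical placeholders for illustration, not the actual survey fields or the script pictured above.

    % Minimal illustrative sketch -- file and column names are hypothetical,
    % not the actual survey export.
    T = readtable('survey_responses_anonymized.csv');

    % Keep only respondents who passed the reviewer screener question
    reviewers = T(strcmp(T.HasReviewed, 'Yes'), :);

    % Likert-style responses (1-5) to one evaluative question
    x = reviewers.WouldAcceptPaidReview;

    % Summary statistics describing the distribution, as reported in the findings
    fprintf('n = %d, mean = %.2f, skew = %.2f\n', numel(x), mean(x), skewness(x));

    % Response distribution, e.g., for a chart in the report
    histogram(x, 0.5:1:5.5);
    xlabel('Response (1 = strongly disagree, 5 = strongly agree)');
    ylabel('Number of respondents');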

As requested by ASAPbio, findings from the qualitative and quantitative data were then interpreted in the context of each other, and the resulting conclusions were compiled into a 20-page report presented to the ASAPbio director over video call. Roughly half the report covered the generative research (interview + survey) and half the evaluative research (Oz Protocol + small parts of the interview + survey).

Example of a generative finding in the report

Findings in the generative portion were reported as follows:

Finding Title: A short, descriptive title that explains the finding.

Discussion: A discussion of the research finding in detail, citing both qualitative and quantitative data. Important quotations and other representative material (for instance, charts and graphs) were included in the discussion as well.

Design question: The finding entry ends with a design question in "How could we..." form to guide further design thinking as the service is refined.

 

Example of an evaluative finding in the report

Findings in the evaluative portion were reported as follows:

Question "Checkpoint" Title: Each question in the Oz protocol targeted a certain feature or design decision (I referred to them as "checkpoints") in the service. As the evaluative research sought to assess these features/decisions, findings were organized by the question in which they arose.

Key Takeaway: To allow readers a way to quickly gather information from the report, the take-home finding was included at the very beginning.

Survey Findings: Applicable survey findings probing related questions were included and interpreted here.

Interview Findings: Interview findings were explained here in detail, including important quotations and a list of positive points and major concerns (if applicable).

 


Reflection

My research ultimately suggested that the Peer Feedback proposal, while strong in many respects, left certain critical user needs unaddressed. Still, the report was well received by the ASAPbio director, who was able to approach the February 9th summit aware of likely points of support and criticism (which, as a member of the board later reported to me, ultimately did land along the lines of my report). As the project heads into revision, the research will also serve as a clarifying force, helping define which design features are important to include, modify, or remove. ASAPbio also indicated that, depending on how the proposal revision proceeds, this research project may be extended in the future. If so, my first move would be to perform a more thorough statistical treatment of the survey results than time allowed in this sprint: this may uncover patterns in responses I wasn't able to resolve and would more rigorously validate the insights already apparent in the data.
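If the project is extended, one piece of that follow-up might look like the sketch below: a chi-square test of independence between a demographic subgroup and a survey answer. The column names (CareerStage, SupportsPublicReviews) and the specific test are illustrative assumptions, not part of the delivered analysis.

    % Hypothetical subgroup comparison -- column names are placeholders,
    % not the actual survey fields.
    T = readtable('survey_responses_anonymized.csv');

    stage   = categorical(T.CareerStage);           % e.g., grad student, postdoc, PI
    support = categorical(T.SupportsPublicReviews); % e.g., yes / unsure / no

    % Chi-square test of independence: does support differ by career stage?
    [counts, chi2, p, labels] = crosstab(stage, support);
    fprintf('chi-square = %.2f, p = %.3f\n', chi2, p);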