Monday, October 10, 2016

Week 8 - Roger Kreuz

17 comments:

  1. In the Rouse paper, the author says that affirming response accuracy appears to have affected the quality of the results, but how can they say this when the affirmation question was the absolute last thing on the survey? It could not have affected the participants’ responses unless they were allowed to go back and change their responses on the Openness to Experience scale, which it did not sound like they were. So I do not understand why the author found a difference between the group who got this affirmation question and the group who did not. I must be missing something here. The author recommends including one of these affirmation questions to improve result quality. That may not be a bad idea, but the item should be included at the beginning in the future tense (e.g., “Will you pay attention and answer honestly?”) rather than at the very end of the survey, where it cannot affect how the participant responds.

    Finding lower reliability in the MTurk sample compared to the original Goldberg sample feels like going back to the replication issue. Is the result of one of the studies “more correct” than the other, or is it simply the case that the two samples were different (perhaps based on demographics, or the difference between participants in 1999 and participants in 2015)? I do not think that differences in findings from MTurk samples compared to “conventional” samples are necessarily a bad thing. Think about a reverse reality: if the majority of research were done on MTurk, and someone came up with the novel idea to use a group of undergraduates as participants, we would run into the same problem where the results of some studies might come out a bit differently. People would likely be wary of running undergraduates because it is not the “standard” practice that people have been using for decades. One type of study (MTurk vs. undergraduates) would not be more correct than the other, though. The samples are just different, and (as with all research) we have to keep in mind the characteristics of the sample that the research is conducted on, and thus the generalizability of the results.
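
    (For concreteness, the “reliability” being compared here is presumably internal-consistency reliability, Cronbach’s alpha, which is a property of a particular sample’s responses rather than of the scale itself. A minimal sketch of how one might compute and compare it; the array names are hypothetical, not taken from either paper:)

        import numpy as np

        def cronbach_alpha(items):
            # items: respondents x items matrix of scores on one scale
            items = np.asarray(items, dtype=float)
            k = items.shape[1]
            item_variances = items.var(axis=0, ddof=1).sum()
            total_variance = items.sum(axis=1).var(ddof=1)
            return (k / (k - 1)) * (1 - item_variances / total_variance)

        # Hypothetical comparison of two samples on the same 10-item scale:
        # print(cronbach_alpha(mturk_scores), cronbach_alpha(student_scores))

    The same scale can legitimately yield different alphas in different samples, which is part of why “whose reliability is correct?” may be the wrong question.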

    In the Ramsey article, in Study 2, I was surprised that so few people followed the instructions for the first question. The MTurk participants were more likely to follow the instructions correctly than the undergraduate participants, but still only about half of the MTurk participants did so. What really stuck out to me was how this finding was different from a previous study that they cite (the Goodman study), which found the opposite pattern (undergraduate participants were more likely to follow the instructions than MTurk participants). Though they mention that their result is different from Goodman’s, the authors carry on like their result is correct without really questioning why it is different or making a case for why their result more accurately reflects reality. As we have learned about replications, it gets murky when two studies find different results for the same (or similar) question because it is difficult to figure out whose results are the “true” results, or if there even is a “true” result. I would have to see a few more studies that find that MTurk participants are better than undergraduate participants at following non-intuitive instructions before I fully buy it.

    ReplyDelete

  2. Let’s start with the Rouse (2016) article. I thought this was a great little primer on MTurk and a good way to gain an understanding of how an MTurk survey is constructed and how one “should” be constructed. I’ve taken several MTurk surveys, but I’ve never actually assembled one. Honestly, I’m a bit jealous of how easily data can be collected, but we won’t go there ;) It’s funny, because being able to collect so much data at once feels like magic. There is a great line from the TV show “Once Upon A Time” that goes “All magic comes with a price!” and in the case of MTurk, this price is reliability.

    With regard to reliability and replication, I did have a bit of a problem with a one-to-one comparison of MTurk data with data from another platform, even though the demographics of MTurk data tend to follow a relatively uniform pattern. I did, however, agree with the recommendations on how to increase reliability, specifically the inclusion of attentiveness criteria. I also thought that creating unique attentiveness questions was a great idea to try to trip up yea- or nay-sayers, rather than simply replicating accuracy screeners from previous studies.

    Lastly, I just wanted to touch on the Ramsey et al. article. I found it HIGHLY encouraging that participants seemed highly attentive to the questions they were presented in an online format. The authors made an “interesting” point, though, when they said that MTurk participants complete surveys out of intrinsic interest in research. Though they may have intrinsic interest, you can’t tell me that they aren’t also being motivated by extrinsic rewards. Do the two simply cancel each other out? How might this interaction affect the ability to replicate recognition and attentiveness findings?

    ReplyDelete
  3. I really enjoyed these articles, especially since they are applicable to any area of focus and are important to keep in mind when conducting research using these sorts of methods. In general, they made me think both of ways to use MTurk for replication and ways to replicate the articles themselves to further explore the validity of the use of MTurk for psychological research.

    As far as using MTurk as a tool for replication, it would be interesting to simply go back to studies that have consistently used only university students and replicate them. Would phenomena hold up with a more diverse sample? This, of course, could improve the generalizability of concepts and theories if results stay consistent, or bring them into question if they do not. However, the downside is that not all studies can be conducted in this manner, such as studies that use any psychophysiological component. Also, as the articles pointed out, the researcher has no control over the environment the person is in while they complete a study using MTurk. This is especially important in the world of cognitive psychology, since many tasks involve attention, sound and visual processing, measuring reaction times, etc. The environment someone using MTurk is in will definitely affect these things.

    As far as the actual articles, the main point I picked up on was the set of issues involved with processing and following instructions. I think it would be worthwhile to build a study around this, creating different conditions in which the instructions are presented differently to see how that affects responses. This would be beneficial to know not only for research using MTurk, but for survey research in general. If you can’t control the environment, the least you can do is figure out how to make it more likely that individuals will accurately answer your questions and follow the instructions for the study.

    ReplyDelete
  4. I am so excited to be discussing MTurk. I have heard about MTurk so many times and never had a full understanding of how it works. I really appreciated the in-depth discussion and explanation of MTurk provided in the Rouse article. I find MTurk fascinating and see the potential benefits of collecting data this way. I feel it is a great way to get around the “college sophomore” problem that many researchers face today. The Ramsey article makes the statement that perhaps one of the weakest links in psychological research is the quality of the data we can collect because of sampling issues. I feel that MTurk is a great way to expand our population sample; it could allow for not only more participants but also more diverse and generalizable results.

    The studies in the Ramsey article provide some interesting results on sex differences in attentiveness and conscientiousness; however, there isn’t much said about how to address these issues. It is clear in both articles that attentiveness is a huge issue both in and out of the lab, but little is suggested about how to improve it. The Rouse article did attempt to improve attentiveness and was unsuccessful, but I found the study very interesting. I found the “did you pay attention during this study” question clever and thought it was a great way to assess attentiveness. I think varying the reward and still giving participants their reward regardless of their answer to the question is a great way to eliminate any data that cannot be deemed useful. I also feel this could be a great way to prime participants for attentiveness in future research: even though it may not address attentiveness in the study they currently participated in, it could plant a seed for how seriously their responses are taken in research they may participate in later. Just a thought.
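
    (To make the screening step concrete, here is a minimal sketch of dropping self-reported inattentive respondents after everyone has already been compensated; the file and column names are hypothetical, not from Rouse’s materials:)

        import pandas as pd

        # One row per respondent, including the end-of-survey affirmation item
        # ("Did you pay attention and answer honestly?").
        df = pd.read_csv("responses.csv")

        # Everyone gets paid regardless of their answer; only the analysis
        # excludes respondents who admit they were inattentive.
        analyzed = df[df["affirmed_attention"] == "yes"]
        print(f"Kept {len(analyzed)} of {len(df)} respondents for analysis")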

    However, I am curious whether anyone has ideas on how we can continue to assess and achieve attentiveness in research. I am pretty new to running my own research and up to this point have only collected data in person, either in a lab or a classroom setting. Although it seems unrealistic to use eye tracking outside the lab, is this in any way possible? Could eye tracking reveal anything about attentiveness in this type of research? Again, I apologize for my lack of knowledge on eye tracking; it is just something that seems applicable in a lab setting, and I was curious whether there is any way to use something like this for those who are participating outside the lab.

    ReplyDelete
  5. While I had heard of MTurk prior to reading these articles, the articles were a nice introduction to more in-depth information about the system. Now I am somewhat tempted to become a Turker myself!

    I can clearly see how MTurk allows access to a more diverse sample than does the basic subject pool of most universities. However, is it safe to view MTurk as a subject pool of its own? At what point does MTurk become the new norm? Will we eventually be looking for a more diverse and representative sample than MTurk can offer?

    Similarly, if I understood correctly, creators of the surveys/experiments can place parameters on who they want to be able to participate in their survey. I know this happens in settings outside of MTurk, but how can you verify that the participants are telling the truth about their demographics or whatever characteristics you want to specify? I guess this is an issue that all experimenters grapple with, not just when using MTurk.

    I also found the attentiveness idea interesting. I wonder how many participants in studies actually pay attention the entire time and how many are simply filling in bubbles or rushing through questions. I assume there has been research done specifically on this subject, but I wonder if the results are applicable to all fields of study. We know that humans in general have a difficult time attending to things they are totally uninterested in and I fear this is the case for most studies done using the subject pool of a university, though this shouldn't become much of an issue with MTurk.

    Altogether, I am excited for this discussion. MTurk seems like a fairly straightforward concept that still has a long way to go before it becomes the norm.

    ReplyDelete
  6. I like the idea of using MTurk, because it reaches a more diverse population than college students. I do have some concerns, however, about participants not being in a controlled environment while completing a survey or cognitive task. Attention checks seem like a good idea, but I kind of doubt that reading one strange question would prompt someone to pay more attention to every other question asked. Another concern is that reliability estimates for some MTurk studies are considerably lower than expected. This begs the question of whether research conducted on this platform is accurate, and whether it can be used to extend knowledge gained in a laboratory setting.

    I also have a problem with the affirmation form used in the first study in the Rouse article. A better way to assure that the data is accurate would be to state at the beginning that full attention is required. It would be interesting, however, to compare data from the yes and no categories and see how the data differs. I think researchers could even use that data to delve into how environment outside of the lab can impact attention.

    MTurk has some perks, though, that would be really interesting to use in replications. For example, researchers can screen for participants from particular geographic areas, including international ones. This is a cheaper way to conduct international research than having to travel to and stay in another country while working on the study. This is an exciting application, and it could be used to collect data from many different nations and cultures simultaneously.

    ReplyDelete
  7. Both articles examined the quality of MTurk as a means of gaining research data. It seems that the major benefits of using MTurk are 1) gaining access to a large and demographically diverse sample of participants, 2) the low cost and ease of getting data, and 3) reducing the likelihood of potentially biasing interactions with the experimenters.

    Rouse’s article found that time expectations and rewards did not affect score reliability. I am a little bit confused by the part where the author examined whether affirming the accuracy of MTurk workers’ responses mattered; he said, “this improvement in reliability was not attained through the elimination of the 2% of respondents who acknowledge that they were inattentive, rather, the reliability of the scores obtained from those who affirmed their attentiveness was higher than that seen for comparable samples who were not asked to give an affirmation”. If I understood correctly, the participants did not know they were taking a survey with an affirmation item until they saw it at the end of the survey, so their responses to the items before the affirmation item should be similar to those of participants who took the non-affirmation survey. I thought the reason the reliability was higher for the participants who affirmed their attentiveness, compared to the participants who were not asked to give an affirmation, was that the scores excluded the respondents who acknowledged inattentiveness. But that seems not to be true, because within that group, elimination of the inattentive respondents would not improve the reliability. So I am curious why there was a difference between the two groups.

    In the article by Ramsey et al., the authors examined participants’ attentiveness across sites. When they compared MTurk with off-site undergraduates, I did not find the criteria used to recruit participants on MTurk, such as their education level (if researchers can set criteria when recruiting on MTurk). It makes sense if the authors just wanted to compare attentiveness between general MTurk participants and off-site undergraduates. But I think it might be clearer if they compared MTurk participants who are undergraduates to off-site undergraduates when evaluating attentiveness, because participants’ education level might influence attentiveness to the instructions (just a guess). The reason I was wondering is that we usually have a target population when designing research, such as adolescents versus adults, so it seems more helpful if we could set some criteria on MTurk when recruiting participants.

    With regard to replication, I think it is very important to use a large sample size when evaluating the quality of Mechanical Turk data, especially for self-report surveys, because it is hard to tell whether the respondents told the truth or really paid attention. In addition, MTurk provides an approach to generalizing research to a diverse population, which is very helpful. I wonder whether MTurk can be accessed from countries worldwide, because it would be very helpful and convenient for conducting cross-national research if it can be accessed by people worldwide.

    ReplyDelete
  8. This comment has been removed by the author.

    ReplyDelete
  9. I find the topic of data gained through Mechanical Turk to be very interesting. But, as the Ramsey et al. article points out, there are plenty of questions regarding its validity. My biggest concern when I first heard of Mechanical Turk was, indeed, the issue of people taking part in experiments in a non-lab environment. I hadn’t realized that similar things had been done before via snail mail. The difference in the results of previous studies regarding off-site testing and, in a broader sense, Mechanical Turk’s validity makes this study essential, in my opinion. I’m glad that the findings seem to support the validity of using off-site testing, such as Mechanical Turk. The issue regarding attending to instructions disappointed me, but I suppose that’s to be expected. I think that means things such as Mechanical Turk are relegated to less intensive studies than what can be done on site. I was a little surprised that nonsensitive questions were better encoded than sensitive ones, as I would have thought that people would be more likely to pay extra attention to questions of a sensitive nature rather than a nonsensitive one. I guess that just goes to show me the importance of testing hypotheses, though.

    I thought the Rouse article brought up an important point regarding protecting the integrity of research while simultaneously improving the ease of research. I appreciated the background on Mechanical Turk that was given in this article, in particular the detailed reporting of previous findings. The Ramsey et al. article seemed to just breeze past these, but Rouse went into much greater detail. I thought the study done by Rouse was well executed. The lack of replication is worrying, and I was disappointed by the findings of lower reliability. I’m glad that Rouse makes a point of acknowledging that his findings do not completely negate Mechanical Turk’s usefulness.

    I am left a little confused by the disparate findings of the two articles. I wonder why they found such different results when both studies seemed valid. Perhaps it was the lack of an affirmation question in the Ramsey et al. study, or something similar. I think future replications should utilize the multiple forms from the Rouse study. Perhaps including those measures alongside the Ramsey et al. recall measures could produce still more useful results.

    ReplyDelete
  10. The article by Ramsey et al. provided comforting results concerning the similarity of findings between typical university and MTurk participants. Further, it was particularly nice to see that MTurk participants were more likely than undergraduates in either the on- or off-site groups to follow directions. This could prove beneficial for research topics requiring greater attention to instructions, though even the higher percentage of participants correctly following directions is a bit depressing (i.e., around 50%).

    While many of the results suggested that participants may be attending to the items equally well regardless of whether they were recruited through MTurk, I still wonder whether MTurk participants might differ in other significant ways. As we discussed in previous classes, the demographics of MTurk users tend to be skewed in certain areas. For instance, the demographics provided in the Ramsey et al. article show that the MTurk users were quite a bit less ethnically diverse (i.e., 76% vs. 58% white) while being considerably more diverse in terms of educational background (i.e., including participants with only a high school diploma alongside those with Bachelor's degrees and graduate education). While a variable such as educational background might be more successfully filtered, selecting only those who fit certain criteria, inherent proportional differences in ethnicity, SES, and other personal factors could prove much more problematic. In these instances, while filtering would still be technically possible (with such a large pool of participants, one could theoretically find a few hundred individuals matching a variety of different stipulations), it becomes more likely that one might be selecting less representative samples.

    For example, if the proportion of certain types of MTurk users is significantly higher or lower than the proportion typically seen in real-world scenarios (e.g., if MTurk users have higher SES, more education, or greater technical skill, or if MTurk users are less likely to be minorities), one runs the risk of overrepresenting certain types of people or underrepresenting others. Phrased another way, if minority group A makes up 20% of the general or regional population, but only 5% of the MTurk population, is there something uniquely similar about the few members of that group that influences their use of MTurk in the first place? This could also potentially limit the types of research questions one might ask, as online means of participant recruitment may prevent certain populations from being reached or otherwise tap into groups of outliers, such as students who spend an above-average amount of their time online or low-SES participants who have personal or regular access to the internet (certain cities like Detroit and Memphis have up to 40% of their population without any means of internet access, and this is even higher in particularly impoverished areas). Hopefully some of these issues will be addressed through conscientious use of MTurk and similar platforms, so that we might continue taking advantage of such means of research while utilizing them in appropriate ways.
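
    (As a toy illustration of that proportionality worry, one could compare a sample’s composition against population benchmarks and compute simple post-stratification weights before analysis; the numbers below are made-up placeholders, not figures from the article:)

        # Hypothetical group shares in the target population vs. an MTurk sample
        population_share = {"group_a": 0.20, "group_b": 0.80}
        sample_share = {"group_a": 0.05, "group_b": 0.95}

        # Post-stratification weight = population share / sample share;
        # an underrepresented group gets a weight greater than 1.
        weights = {g: population_share[g] / sample_share[g] for g in population_share}
        print(weights)  # {'group_a': 4.0, 'group_b': approximately 0.84}

    Of course, reweighting only addresses who is over- or underrepresented; it cannot answer the question of whether the few members of an underrepresented group who do use MTurk are unusual in other ways.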

    ReplyDelete
  11. Both of the articles this week did a fairly good job capturing the pros and cons of using mTurk. While I was beginning my undergraduate honors thesis, I looked into the research on mTurk and decided to use it myself as I had a fairly complex design (8 conditions, mostly between subjects) and a longer study time (approx. 30-45 minutes, though I had predicted it would be closer to an hour). I would have needed around 160-200 participants at 2 SONA credits each in our very small subject pool, so it seemed unlikely that traditional methods would work best.
    Some of the patterns they discuss in both articles I had certainly seen. The most positive of these was the overwhelmingly fast rate of responses. In just a few days I was able to collect all my data. Additionally, I followed many of the suggestions in the literature including the use of manipulation checks. I also decided to collect response times as this might help catch inattentive responses as I had read mixed reviews of the use of mTurk for cognitive psychology studies (see link below: I cannot recommend reviewing this very brief article enough to anyone who is going to use mTurk). Particularly, I had read that longer studies requiring more effort may be problematic.
    I had collected no data from an undergraduate population to compare my data with and my study was a fairly novel combination of the misinformation effect and a manipulation from the reasoning literature, so there was no reasonable way to compare my data, though the classic misinformation effect was repeated. What I began to see, however, was evidence from the timing data of participant inattentiveness. The most obvious example came from the timing data from a video participants were required to watch (i.e. the eyewitness event they would be tested on later). At the completion of the video, participants had to press the arrow to continue with the study. Although it was a fairly small number of participants (somewhere around 10-20), I noticed that it took these outliers sometimes 30 seconds or more to simply click a button. This clearly suggested they were not attending to the information being presented (and which they would be tested on later).
    In relation to the article, this seems a bit higher than the 20 or so participants Rouse eliminated (as my sample size was half as big), though it may not be completely uncommon. I also was able to see that some participants were taking up to a minute to answer a yes/no recognition question (“did you see this event in the film?”). As a result, it certainly did make me a bit apprehensive about the data and, at present, I am collecting additional data to replace the unusable responses.
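    (A minimal sketch of the kind of response-time screening described above; the thresholds and column names here are my own illustrative choices, not values from the thesis or from either article:)

        import pandas as pd

        trials = pd.read_csv("trial_data.csv")  # hypothetical per-trial log

        # Flag respondents who took implausibly long to press "continue" after
        # the eyewitness video ended (more than 30 seconds), or who spent over
        # a minute on a simple yes/no recognition item.
        slow_continue = (trials["screen"] == "video_continue") & (trials["rt_sec"] > 30)
        slow_recognition = (trials["screen"] == "recognition") & (trials["rt_sec"] > 60)

        flagged = trials.loc[slow_continue | slow_recognition, "participant_id"].unique()
        print(f"{len(flagged)} participants flagged for review")
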
    From my experience, these two articles, and a number of other articles I have read, it seems that mTurk may be a great option if a study meets several criteria. For example, it seems excellent for simple surveys, personality measures, attitude measures, and similar methods. However, it does seem that additional steps must be taken for more complex, longer, and more demanding methods (e.g., memory tests, reasoning, categorization). This is somewhat consistent with the results from Ramsey et al. However, while the roughly 50% rate of following instructions among mTurk workers was higher than in the undergraduate populations, it still seems very disappointing. While mTurk is a powerful new tool, precautions certainly need to be taken to ensure that instructions are followed carefully (as suggested by both articles). Still, the Ramsey et al. article did a great job highlighting that although mTurk data is imperfect, so is the data obtained from the average general psychology student.

    ReplyDelete
    Replies
    1. Excellent study that attempted to replicate a number of well-established findings in cognitive psychology using mTurk, with mostly (but not entirely) positive outcomes.
      http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0057410

      Delete
  12. I will begin by saying that I really like using MTurk because it is easy to use and it’s a very fast way of obtaining data. I will, however, say this: in the Rouse article, it is stated that the cost of running MTurk studies is very low. This is true depending on the type of experiment you are planning to run. For example, some of my experiments would require a budget of roughly $25,000 due to participant time requirements.

    I really enjoyed the articles this week. I have heard about some of the validity issues brought up, but they had not been explicitly hashed out in discussion. I was surprised that there was such a gender difference in following directions in the Ramsey article, but I was disappointed that not much was said about how to address this issue. I appreciated the note that researchers added a question about directions before participants continued on to the study, as this seems like a good idea whether directions are atypical or not. I also think that adding attentiveness questions to studies (particularly online studies) would be beneficial regardless of whether the mode of implementation is MTurk or an on-site study. I think that keeping participants engaged in an experiment is always a concern, but perhaps adding attentiveness questions is one way to increase engagement (or at least attention, as engagement seems to have a motivating connotation).

    There is always the concern that people are not participating in a controlled experimental environment. However, I think that there is much to be gained by having people participate “in the wild.” Most of life is spent in the wild, so I think that there is something to be gained by not controlling the participant’s environment in all situations. In particular, I think it is beneficial to run participants both on-site and on MTurk because it allows researchers to examine these environmental differences. When it comes to replication, it is important that this issue be examined by more researchers in order to get a full picture of what all of the issues are and what concerns develop over time with such quickly evolving technologies.

    ReplyDelete
  13. I like the idea of using MTurk for collecting data in survey research. I see many advantages to collecting data using MTurk. Researchers can get a very diverse group in their sample, and using MTurk is also cheaper than other data collection processes. It is true that MTurk has become more popular in psychological research nowadays, and in the literature review of the Rouse article some of the previous research expressed similar thoughts.

    In the article by Rouse, most of the participants in the MTurk data are undergrad students, which is why MTurk could be very useful for instructors. But researchers with an interest in special groups might not be very interested, since they might not get their expected sample. For instance, I am interested in researching elementary school children, and it may be difficult for me to reach this age group via MTurk. Another big concern that caught my attention is that a few people reported they were less attentive when they participated in the MTurk survey. That is also a potential reason why some survey research results get highly skewed.

    The other article, by Ramsey, got a different result than the first one. This makes me think about how we talk a lot about replication, yet two similar studies can go in two different directions. In Rouse’s article it is stated that undergrad students are more likely to follow instructions than MTurk participants, and that they do so in larger numbers than MTurk participants. However, Ramsey’s article got different results.

    It is also interesting that in Ramsey’s article the authors mention finding the participants very enthusiastic, and that they found MTurk participants intrinsically motivated to participate in MTurk research. But to me it is not clear how they established this. There might also be an external reward for participating in an MTurk study, since it can be good money, especially for international workers.

    Finally, I would say both of these articles did a nice job of making a lot of good points about how MTurk should be used in future research. From a researcher’s point of view, I recommend using MTurk in more psychological research, especially in areas where the participants are currently mostly undergrad students.

    ReplyDelete
  14. Since I was not familiar with web-based data collection methods like Mechanical Turk, these two articles were a good introduction. As a natural science major, to tell the truth, I had never thought about experimental methods like collecting data over the internet, because, in my view, collecting data from a website poses a lot of risks to the validity of the data and ultimately the integrity of the experiment.

    After reading these articles, I could see that web-based data collection is an efficient method for social science, especially when there is a need to improve statistical power through a large sample and when the questions being examined are well generalized and clear. However, even though there are significant benefits to these kinds of methods, I still have some worries about test integrity.

    As shown in Ramsey et al. (2016), even though the results are statistically equivalent between on- and off-site samples, there remain questions such as how reliably participants respond and how much researchers can trust those web-based results. Moreover, Rouse’s results showed significantly lower reliability for Mechanical Turk data than for the standardization sample.

    Whether in natural science or social science, it is hard to say that there is a perfect method for academic research. The only thing we can do is minimize the deficits of a method and maximize its benefits so that we can obtain the most precise results. If I adopted MTurk for a replication study, I might use it to obtain a large amount of data to improve statistical power in social science fields such as psychology or political science, since in these research areas participants may feel more comfortable and give more honest answers in a web-based setting.

    ReplyDelete

  15. I find it really strange that both of these papers ignore the vast literature from the HCI community on crowdsourcing. It is a very popular field with its own conferences even. Although I don't keep up on this topic, I would certainly expect them to be citing Kittur at CMU and Bernstein at Stanford, who have each been studying it for over a decade. Kittur's CHI'08 paper and Bernstein's many CHI and CSCW papers as well as his dissertation go into the nuances of what you can expect from mechanical turkers and when you can't trust them. Downs' CHI'10 is another big one in regards to properly screening turkers to make sure they aren't gaming your experiment.

    I find it funny that the Rouse paper says psychological researchers have long been open-minded in adopting new technologies... considering that computer science researchers are often criticized for being 10-20 years behind the standard. So I don't buy it. And I don't think it is that they are "open-minded" so much as that they are incentivized to be the first to study something.

    The experiment in the Rouse paper seems rather superficial without motivating why they did it in relation to the dozens of similar studies published at CHI and CSCW a decade ago.

    While I find it good that researchers are checking the quality of data from things such as mturk, I would say they are being extremely unethical in ignoring the huge number of studies that have been done on this. It is hard to judge this when they are presenting it in a vacuum.

    Were the surveys presented EXACTLY the same to both the undergrads as the mturkers? Minor changes in the interface might be causing the difference in results. Since various usability experts have already presented dos and don'ts for experiments on mturk and since this paper isn't citing them, they don't seem to be following them. This makes me think I can't trust the results since the effect of the interface is hiding other differences. We already know that minor changes during A/B testing can cause significant behavioral differences in users.

    ReplyDelete
  16. This has probably been the easiest article (save for our first set of articles of the semester, which addressed the issue directly) to relate to the issue of replication. Honestly, before this class I had never heard of MTurk—and I’d venture to guess that many of my same-age peers hadn’t heard of it either. This is probably a good thing, seeing as up until last year I was an undergraduate—and MTurk’s goal is to expand the research population beyond just psychology undergrads. I think this is evidenced by the average age for one of the studies being around 32 years old, and by the greater variance in demographics and even geographic location.
    As an undergraduate who has taken many surveys, I have noticed in myself that, depending on the relative importance of the survey, my responses have certainly been more or less attentive. Even on things like completing SETEs (the end-of-term evaluations for professors), I personally know some people who answer according to the extremes, based on their bias for or against a professor. This issue of attentiveness is definitely a concern and a point of skepticism that I’ve had for a while regarding surveys completed by undergraduates. The study at hand certainly did a good job of tackling some of the possible ways in which we can evaluate the attentiveness of participants, by including what was essentially a memory task meant as a proxy for attentiveness. The study admits that this does not necessarily prove truthfulness of answers, but it at least shows some level of conscientiousness.
    I suppose I wasn’t all that surprised that there wouldn’t be much of a difference between undergraduates completing the tasks and the MTurk group. Both are somewhat incentivized in this case, with course credit and with a nominal fee. With that said, I can imagine that MTurk participants likely take dozens more surveys, and I imagine that fatigue can build up over answering so many. Generally, though, I would have expected greater reliability from the MTurk group, since even knowing about the service takes a certain degree of understanding of the importance of reliable information acquisition in the research process. Similarly, one would expect psychology undergraduates to be cognizant of this significance as well.

    ReplyDelete