6 Questionnaire Design

Author

Felipe Lamarca

6.1 Krosnick, J. A. (2018). Questionnaire Design. In D. L. Vannette & J. A. Krosnick (Eds.), The Palgrave Handbook of Survey Research (pp. 439–455). Springer International Publishing.

I have come to the view that question wording is where the crisis lies at the moment, that’s one area where we severely need more work. (p. 440)

First of all, it is true that in experimental comparisons, open-ended questions take, on average, about twice as long to answer as closed questions and respondents prefer closed questions. On practical grounds, it might appear that closed questions are preferable. In studies of reliability, open questions prove to be more reliable than closed questions and in lots of different studies of validity, open questions prove to be superior to closed questions across the board using these various different methods of assessing validity. (p. 443)

The conclusion that I take this literature to suggest is that we should ask open-ended questions when you can’t be sure of the universe of possible answers to a categorical question and the other specify option does not work. The only way to be sure that we know the universe is to pre-test, ask open questions from the population we care about, build the big list, offer it to people and we could do that, but it’s so much work we might as well ask the open-ended question in the real survey. And lastly, if we’re looking for a number we should just ask for the number. That is what I take that literature to suggest and I don’t know if that’s really widely recognized and followed. (p. 445)

The evidence in terms of response speed as rating scales get longer, from 2 to 11 points, for example, indicates that the speed goes up. Although, it’s sort of interesting that five-point scales take significantly less time than their neighbors do, that’s a hint that it is preferable to have mid-points. This is even more direct evidence, when people are asked how difficult it is to use the scale, you can see that three points, seven points, and nine points, these are significantly less difficult than the scales of other lengths. So these are reinforcing that notion that in this case seven points is the optimal way to measure bipolar scales and that not only is it less difficult for people but that it produces more valid and reliable results. (p. 447)

The last thing in this arena that we have found is that it’s helpful to branch bipolar dimensions. This is one of the first branching questions that I paid attention to from the ANES: “generally speaking do you consider yourself to be a Republican, a Democrat, an independent or what?” and Republicans and Democrats are asked if they’re strong or not very strong. The independents are asked if they lean one way or another. You can produce seven-point scale like this. If you do it in those two steps up here it goes more quickly and it produces more valid and reliable results rather than presenting all seven of these, which people have to slog through and place themselves on. (p. 447)
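A minimal Python sketch of how the two-step answers described above could be combined into a single seven-point score. The function name, the answer labels, and the direction of the coding (1 = strong Democrat, 7 = strong Republican) are assumptions made for illustration; the source does not specify them.

```python
from typing import Optional


def party_id_7pt(root: str, follow_up: Optional[str] = None) -> int:
    """Combine the root question and its follow-up into a 1-7 score.

    Assumed coding (not taken from the source):
    1 = strong Democrat, 4 = independent with no lean, 7 = strong Republican.
    """
    root = root.strip().lower()
    follow = (follow_up or "").strip().lower()

    # Partisans are branched on strength.
    if root == "democrat":
        return 1 if follow == "strong" else 2
    if root == "republican":
        return 7 if follow == "strong" else 6

    # Independents are branched on which way they lean, if at all.
    if root == "independent":
        if follow == "democrat":
            return 3
        if follow == "republican":
            return 5
        return 4

    # Other answers ("or what?") are left for the analyst to handle.
    raise ValueError(f"unhandled root answer: {root!r}")


print(party_id_7pt("Democrat", "strong"))      # 1
print(party_id_7pt("Independent", "Democrat")) # 3
```

Each respondent sees at most two short questions, yet the combined variable has the same seven categories as a fully labeled scale.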

If the mid-point is precise, it turns out that asking people, do you lean one way or another actually adds noise. In order to produce the seven-point scale, the people who place themselves at a precise mid-point like this belong there. We shouldn’t branch the mid-point, instead we should branch the end points into three categories. So the people who say increase should be asked, do you want it increased a little, a moderate amount, or a great deal? (p. 447)

Now, with regard to dimensions that have a natural metric, so for example I could ask you how often do you go to the movies, very often, often, sometimes, rarely or never? I’m not going to take the time to go through all of the evidence on this so forgive me for just skipping this. What I can tell you is what this literature says is using those kinds of what you might think of as vague quantifiers actually cause many more problems than they solve. If what you want is a number just ask for the number. Other than saying, how often do you go to the movies, you ask people in the last month how many times did you go to the movies; you can get around that problem. (p. 450)

First of all, use simple, direct, and comprehensible words. Don’t use jargon, be specific in your question, avoid ambiguous words, avoid double barreled questions that ask two things at the same time, and avoid negations if you can avoid the word not. Avoid leading questions, and include filter questions. The common-sense version of using filter questions is don’t ask people what brand of car they have if they might not have a car. Be sure that questions read smoothly aloud, avoid emotionally charged words, avoid prestige names. When you look at textbooks and research design, it’s almost like the later ones copied the earlier ones because they are remarkably consistent in this kind of advice. (p. 450)
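The filter-question advice can be sketched as skip logic in Python. This is only an illustration: the `Question` class, the `administer` function, and the question keys are hypothetical names, not part of any survey package mentioned in the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Answers = Dict[str, str]


@dataclass
class Question:
    key: str
    text: str
    # Optional filter: the question is asked only if this returns True.
    ask_if: Optional[Callable[[Answers], bool]] = None


QUESTIONS: List[Question] = [
    Question("owns_car", "Do you own a car?"),
    Question(
        "car_brand",
        "What brand is your car?",
        ask_if=lambda a: a.get("owns_car") == "yes",
    ),
]


def administer(questions: List[Question],
               respond: Callable[[Question], str]) -> Answers:
    """Ask questions in order, skipping those whose filter fails."""
    answers: Answers = {}
    for q in questions:
        if q.ask_if is not None and not q.ask_if(answers):
            continue  # respondent is not eligible for this question
        answers[q.key] = respond(q)
    return answers


# Scripted respondent without a car: the brand question is never asked.
scripted = {"owns_car": "no", "car_brand": "unused"}
print(administer(QUESTIONS, lambda q: scripted[q.key]))
```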

First, lots and lots of studies are done that help us to inform the issues of questionnaire design and yet there is much more work to be done, especially with regard to language. I think we understand a lot about structure, but we understand much less about language. There is also an issue of dissemination. NSF of course has commitment not only to making discoveries but also to disseminating those discoveries through educational efforts and outreach. I think there is a real opportunity here because there are lots of people who don’t know it’s a bad idea to offer a ‘don’t know’ response option. There are lots of people who don’t know the order of answer choices and close questions matter and how to handle that. There are lots and lots of people who think it’s fine to ask agree/disagree questions, so a major educational outreach effort to disseminate the findings of this literature would help. (p. 455)

6.2 Wolf, C., Joye, D., Smith, T. W., & Fu, Y. (2016). The SAGE handbook of survey methodology. Sage. Ch. 16: Designing Questions and Questionnaires, by Jolene D. Smyth

This is why questionnaire design texts suggest avoiding starting with boring, embarrassing, or sensitive questions and instead make the first question simple, interesting, and applicable to all sample members with a professional design and visual layout that makes the questionnaire look simple to complete. (p. 219)

An analysis plan can also help one refine their initial research question(s) and list of constructs to measure. For example, while age, race, and gender are not explicitly identified in the aforementioned research question, the researcher would likely want to measure them because they may be related to both engagement in risky behaviors and health outcomes, making them necessary for subgroup analyses or control variables in a regression, both of which should be expressed in the final research questions but are sometimes initially overlooked. These and other constructs may also need to be measured for weighting and adjustment purposes, which should also be planned carefully upfront in the context of the research question(s). (p. 220)

Questions should be written to accommodate the retrieval process, which means accounting for how people recall information. People are better able to recall events that happened recently, are distinctive (e.g., buying a new car versus buying groceries), or that are important (e.g., a wedding celebration versus a birthday celebration) (Tourangeau et al., 2000). People can also more accurately recall events when the recall period is shorter (e.g., the last week versus last year). (p. 222)

Thus, in writing questions, it is important to determine whether exact enumeration of events is needed or whether an estimate is good enough. With very frequent and mundane events, estimation might just have to be good enough. However, for events respondents are expected to be able to recall and enumerate, a shorter recall period will help. Slowing down the survey to give more time to think can also improve retrieval, and can be done by asking longer (not more complex) questions (Bradburn and Sudman, 1979; Tourangeau et al., 2000). (p. 222)

In this case, respondent motivation to provide quality open-ended responses can be increased by emphasizing the importance of the question. Statements like, ‘This question is very important to this research’ can increase the length and quality of open-ended responses (Oudejans and Christian, 2011; Smyth et al., 2009). (p. 223)

Using construct-specific rather than vague quantifier labels can also make reporting more straightforward and reduce measurement error (e.g., asking ‘How would you rate the quality of your new car?’ instead of ‘How much do you agree or disagree that your new car is high quality?’) (Saris et al., 2010). (p. 223)

Reliability and validity seem to be maximized at 5 to 7 scale points for bipolar scales (i.e., scales that measure both direction and magnitude like very satisfied/ very dissatisfied) and 4 to 5 scale points for unipolar scales (scales that measure only magnitude, like never to always) (Krosnick and Fabrigar, 1997). In addition, ensuring that the scale points are conceptually equidistant can also help. (p. 223)

Questionnaire designers also have to make sure respondents are willing to provide accurate answers at the reporting stage. Some respondents may be hesitant because of social or normative concerns. Social desirability is a tendency to answer questions in a way that makes one look good (or not look bad) rather than providing the most accurate answer (Tourangeau and Yan, 2007). It can take the form of underreporting negative behaviors like illicit drug use, abortion, and poor college performance or overreporting positive behaviors like voting and church attendance (Bernstein et al., 2001; Hadaway et al., 1993; Kreuter et al., 2008; Tourangeau and Smith, 1996). Respondents are more susceptible to social desirability in interviewer-administered surveys (Aquilino, 1994; de Leeuw, 1992; Dillman et al., 1996), but it is a concern in all modes. (p. 224)

Aside from increasing response privacy by asking sensitive questions in a private, self- administered mode, question design strategies to combat social desirability sometimes focus on making it either more acceptable or safe to answer honestly. Question wording is commonly changed to make the question less threatening such as asking ‘Have you happened to …’, ‘Some people believe and others believe ’, ‘Do you believe or ?’, and ‘There are many reasons people might not do such as not having time, not having transportation, or being sick. How about you, did you do ?’ (Bradburn et al., 2004). While these strategies make intuitive sense, the few empirical evaluations that have been done suggest they are not more effective than direct inquiries (Bradburn et al., 2004; Schuman and Presser, 1981; Yeager and Krosnick, 2012). (p. 224)

For example, an important word may be made more visible by increasing its font size or bolding or italicizing it. Likewise, nonessential information (e.g., office use only, data entry codes, etc.) can be deemphasized by making it smaller and lighter in color and placing it in areas respondents are less likely to look. Applying properties allows designers to impact how respondents understand the relationship between elements on a page. (p. 226)

There are several ways question ordering can help with retrieval. For example, grouping topically similar questions gains efficiencies because respondents can use retrieved information to answer all questions on a topic before moving to a different topic (Dillman et al., 2014). However, one also has to be watchful for unintentional priming effects, which occur when information retrieved for an early question is used to answer a later question simply because it is more easily accessible. For example, Todorov (2000) found that asking respondents about specific chronic conditions early in the National Health Interview Survey on Disability increased the likelihood that they would identify one of the chronic conditions asked about as a cause of their disability in a later question. (p. 229)

Orienting oneself to how respondents will experience the questionnaire and their response process can provide a useful framework for making these design decisions. This includes keeping in mind the goals of encouraging response, promoting optimizing, providing a clear navigational path, and, of course, collecting high quality measurements. It also requires one to think about how their design will impact each stage of the response process. In addition, understanding the response process and where it can break down can help one determine what pretesting method(s) are most appropriate for a given questionnaire and troubleshoot problems. (p. 231)

6.3 Class notes

Respondent theory closely resembles what we saw in Zaller, but the two are not identical. Zaller focuses on reception and on the fact that people filter their answers, doing some degree of mental sampling in order to respond.

Respondent theory assumes that people hold more or less latent values: not innate values, but positive or negative long-term memories tied to particular topics. I have many memories that do not cross my mind every day but that surface, more or less clearly, when an idea is raised. Respondents go through cognitive stages to access attitudes from long-term memory (Tourangeau and Rasinski, 1988).

Comprehension: understanding the question wording
Retrieval: searching memory for the necessary information
Judgment: evaluating and synthesizing the information
Reporting: mapping the judgment onto a response

There is no unified theory of survey response, but existing approaches emphasize two possible types of process (Krosnick, 1991):

Optimizing: when people answer carefully and completely in the minimum time
Satisficing: when they use shortcuts or answer without effort

Basically, there were two groups of respondents: some who answer “abruptly” and others who are very thorough. Those who answered abruptly tended, for example, to acquiesce, to answer at random, or to read only the first few alternatives. When we design surveys, we need to minimize satisficing responses and maximize optimal ones.

With satisficing, people answer very quickly, without thinking; settle for the first reasonable response options (e.g., Yes, Very); opt for modal alternatives (0, 5, 10); skip difficult or complex questions; and frequently answer “don’t know”, even when it is not offered as an option. People tire quickly and go through to the end of the survey answering “any old way”.
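Some of the behaviors listed above can be turned into simple screening flags. The Python sketch below is illustrative only: the function name, the input layout, and both cutoffs (`speed_ratio` and `share_cutoff`) are assumptions, not values taken from Krosnick or from the class.

```python
from typing import Dict, Sequence

MODAL_VALUES = (0, 5, 10)  # the "modal alternatives" on a 0-10 scale


def satisficing_flags(scale_answers: Sequence[int],
                      chosen_positions: Sequence[int],
                      seconds: float,
                      median_seconds: float,
                      speed_ratio: float = 0.5,
                      share_cutoff: float = 0.8) -> Dict[str, bool]:
    """Flag possible satisficing for one respondent.

    scale_answers: answers to 0-10 items.
    chosen_positions: 0-based position of the option picked in each
        closed question (0 means the first option listed).
    seconds / median_seconds: this respondent's duration and the
        sample median duration.
    """
    def share(hits: int, total: int) -> float:
        return hits / total if total else 0.0

    modal_share = share(sum(a in MODAL_VALUES for a in scale_answers),
                        len(scale_answers))
    first_share = share(sum(p == 0 for p in chosen_positions),
                        len(chosen_positions))
    return {
        # Much faster than the typical respondent.
        "speeding": seconds < speed_ratio * median_seconds,
        # Almost only 0, 5 or 10 on the numeric scales.
        "modal_values": modal_share >= share_cutoff,
        # Almost always the first option offered.
        "first_option": first_share >= share_cutoff,
    }


print(satisficing_flags([0, 5, 10, 5], [0, 0, 0, 0], 60, 300))
```

Flags like these only suggest satisficing; a fast respondent who picks 5 may simply hold a moderate view, so they are better used for review than for automatic exclusion.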

For Krosnick, the probability of satisficing is given by:

\[ \mathbb{P}(\text{Satisficing}) = \dfrac{\text{Difficulty}}{\text{Ability} \times \text{Motivation}} \]
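A short Python sketch of how the ratio above behaves (task difficulty divided by the product of respondent ability and motivation). The units are arbitrary and the function name is hypothetical. Because the ratio is not bounded by 1, the sketch treats it as a relative index for comparing scenarios rather than as a calibrated probability; that reading is an assumption of this example.

```python
def satisficing_index(difficulty: float, ability: float,
                      motivation: float) -> float:
    """Difficulty divided by (ability x motivation), in arbitrary units."""
    if ability <= 0 or motivation <= 0:
        raise ValueError("ability and motivation must be positive")
    return difficulty / (ability * motivation)


baseline = satisficing_index(difficulty=1.0, ability=1.0, motivation=1.0)
harder = satisficing_index(difficulty=2.0, ability=1.0, motivation=1.0)
motivated = satisficing_index(difficulty=1.0, ability=1.0, motivation=2.0)

print(baseline)   # 1.0
print(harder)     # 2.0: doubling difficulty doubles the index
print(motivated)  # 0.5: doubling motivation halves it
```

The designer controls mainly the numerator (a simpler instrument) and, through recruitment and framing, part of the motivation term.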

Solution: design questionnaires so as to encourage optimizing, the opposite of satisficing. We can control three aspects to varying degrees.

Comprehension, by reducing the difficulty of the instrument
Retrieval, by offering cues: hooks or phrases that help pull items from memory. “Thinking about the last 7 days…”; “Speaking now about the military in Haiti who are in Brazil…”.

6.3.1 Question wording

The general principles for writing questions:

Simplicity
Specificity
Instructions
Examples

Avoid vague concepts, negations (never, ever use a double negative), and long, complex wording. Pretesting is recommended, ideally with one person from each segment.

Krosnick also suggests branching: Do you agree, yes or no? And, given the answer, do you agree/disagree strongly or somewhat?

6.3.2 Recruitment text

Opt-in surveys naturally solve part of the motivation problem: self-selection recruits on the basis of motivation. For other approaches, recruitment matters:

Avoid approaching busy people
Create a friendly environment
Persuasive recruitment message (try to motivate the person to participate)

6.3.3 Other

When the question is sensitive (race etc.), “Prefer not to answer” is a very easy way out. In some cases it is fair, but in those cases it is better to offer “Don’t know”. For questions with some factual content, “Don’t know” means the person really does not understand the subject; and if they have already answered several questions on the subject, “Don’t know” becomes less attractive and they will give a substantive answer.

Grid formats, in which the question is asked once and the person answers item by item while seeing the whole response scale at once, work very well. They gain efficiency and let the respondent make comparisons.

Is it better to ask demographic questions at the beginning or at the end? Some demographics are not sensitive and are easy to answer, so we can readily ask age and education up front; depending on the survey, color/race and income are sensitive questions. Most polling firms ask the simplest filter questions at the beginning (for sex, in fact, the standard is not even to ask when the survey is conducted in person).

Asking about household income tends to work better because it preserves a degree of privacy. Offering wide brackets also tends to be more comfortable. In addition, we can ask how many people live with you (preferably as a separate question, as IBGE does).

It is common to anchor on the minimum wage. Fernando suggests asking for absolute values and making the adjustment during data processing.
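A minimal Python sketch of that adjustment step: income is collected in absolute currency and converted to minimum-wage multiples afterwards, optionally per capita using the separately asked household size. The function names are hypothetical and `MINIMUM_WAGE` is a placeholder value, not an official figure.

```python
def in_minimum_wages(income: float, minimum_wage: float) -> float:
    """Express an absolute income as a multiple of the minimum wage."""
    if minimum_wage <= 0:
        raise ValueError("minimum_wage must be positive")
    return income / minimum_wage


def per_capita(household_income: float, household_size: int) -> float:
    """Divide household income by the number of residents."""
    if household_size < 1:
        raise ValueError("household_size must be at least 1")
    return household_income / household_size


MINIMUM_WAGE = 1500.0  # hypothetical placeholder; use the value in force at fieldwork

household_income = 4500.0  # answer to the income question, in currency units
household_size = 3         # answer to the separate household-size question

print(in_minimum_wages(household_income, MINIMUM_WAGE))  # 3.0
print(in_minimum_wages(per_capita(household_income, household_size),
                       MINIMUM_WAGE))                    # 1.0
```

Storing the raw value keeps the data comparable across waves, since the conversion can be redone whenever the minimum wage changes.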

“According to the IBGE categories,” [question on color/race identification]. Anchoring on these is always a good option.