Consulting with a Statistician
Posted on July 28, 2006
Sue M. Marcus talks about how a statistician can help with your research design.
Q: My mentor told me that I need to consult with a statistician, but I'm not sure what to ask her, since I haven't even started collecting data yet. What can I expect from a statistician at this point in my research?
A: Unless you're doing a very simple study, you need to have a statistician on board from the beginning. If your study is poorly designed, no type of analysis can really save you. People have brought me data that's a mess, and sometimes it's just not salvageable. A statistician can help you with complex issues such as primary and secondary questions, longitudinal models, power analyses, causal inference, mediators and moderators, selecting appropriate statistical methods, and interpretation of analyses.
If you have not started collecting your data yet, a statistician can help you to define your hypotheses and design your study so as to make sure you can actually answer your primary question given your resources.
Q: What are some of the more common research design problems that you have encountered?
A: The primary question
In study design, one thing people often forget is that you need a clear, narrow focus. When you start out, you may be interested in many outcomes and might ask, "How can I help people with bipolar disorder?" But that's not an appropriate question for a research study. Your question must be very clearly defined: every piece of it has to be stated, each piece has to fit into the puzzle, and the question as a whole has to be one your study can adequately answer.
Sample selection and power
Inclusion and exclusion criteria for your study sample are another challenging area. If you pick a very broad population, you'll have good generalizability, but you may have less power to detect a statistically significant effect. On the other hand, if you study a high-risk population, you may get more significant results but less generalizability. In your sample size calculations, you need to state your sample sizes and the assumptions and estimates they are based on.
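To make the trade-off concrete, here is a minimal sketch of a sample size calculation for a two-group comparison of means. It assumes a two-sided test at α = 0.05 with 80% power and uses the normal approximation, so it comes out slightly below a t-test-based calculation. The function name and the two effect sizes (standardized differences of 0.5 and 0.3) are hypothetical illustrations, not values from any study.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per group for a two-sided, two-sample comparison of means.

    Hypothetical helper using the normal approximation
    n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized difference (difference in means / common SD).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size_d) ** 2)


# Hypothetical effect sizes: a narrower, high-risk sample (larger d)
# versus a broad, more heterogeneous sample (smaller d).
print(n_per_group(0.5))  # 63
print(n_per_group(0.3))  # 175
```

Because the required n grows with 1/d², the smaller effect you might expect in a broad, heterogeneous sample nearly triples the sample size in this illustration.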
A longitudinal design using longitudinal statistical methods can increase your power and, depending on your question, provide a clearer answer. One observation per subject gives you much less information than many observations per subject, so when your sample size is limited, a longitudinal design is a way to increase both your power and your information.
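The gain from repeated measures can be sketched with a simple formula. Assuming m equally correlated measurements per subject (a compound-symmetry assumption) with within-subject correlation ρ, and a question about each subject's average level, one subject contributes the information of m / (1 + (m − 1)ρ) independent observations. This is a simplified illustration with hypothetical numbers and hypothetical function names; questions about change over time need a different, method-specific power analysis.

```python
from math import ceil


def effective_observations(m: int, rho: float) -> float:
    """Information per subject, in units of independent observations.

    Assumes m equally correlated repeated measures (compound symmetry) with
    within-subject correlation rho, and that the quantity of interest is the
    subject's average level: Var(mean of m obs) = sigma^2 * (1 + (m - 1) * rho) / m.
    """
    return m / (1 + (m - 1) * rho)


def subjects_needed(n_single: int, m: int, rho: float) -> int:
    """Subjects needed with m repeated measures, given n_single needed with one each."""
    return ceil(n_single / effective_observations(m, rho))


# Hypothetical: 4 assessments per subject, within-subject correlation 0.5.
print(effective_observations(4, 0.5))  # 1.6
print(subjects_needed(63, 4, 0.5))     # 40
```

With four assessments and ρ = 0.5, each subject is worth 1.6 single observations, so a hypothetical requirement of 63 subjects per group drops to 40. As ρ approaches 1, the extra measurements add almost nothing, which is one reason the within-subject correlation has to be estimated.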
NIMH reviewers do expect a formal power analysis in your proposal, so you need to include certain details. For example, when you are looking for a particular difference between two groups, you need to say, "Here's the statistical method I'm going to use," because each statistical method has its own power analysis. Your assumptions must also be clearly specified; for example, you might say, "I'm assuming 20% dropout between each assessment." For a longitudinal study, you'll need estimates of the dropout between each period as well as the correlation within each individual over time.
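A dropout assumption like the one above translates directly into how many subjects you must enroll. The sketch below assumes the same proportion drops out between each pair of consecutive assessments, so retention after k intervals is (1 − p)^k, and it conservatively inflates a completers-only sample size; a mixed-effects model that uses the partial data from dropouts may need fewer subjects. The function name and the target of 63 completers per group are hypothetical.

```python
from math import ceil


def enrollment_needed(n_completers: int, dropout_per_interval: float, intervals: int) -> int:
    """Subjects to enroll so that about n_completers remain at the final assessment.

    Hypothetical helper. Assumes the same dropout proportion between each pair of
    consecutive assessments, so retention after k intervals is (1 - dropout) ** k.
    """
    retention = (1 - dropout_per_interval) ** intervals
    return ceil(n_completers / retention)


# Hypothetical: 63 completers per group needed, 20% dropout between each of
# 3 follow-up intervals (4 assessments in total), so retention is 0.8 ** 3 = 0.512.
print(enrollment_needed(63, 0.20, 3))  # 124
```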
Another important area related to this is recruitment. If you say, "I'm going to have 90 subjects," you need to specify where you're going to find them. From your institution? Your clinic? Are they coming from multiple places? How many per month were you able to recruit in your pilot work? This is a situation where you really need to be honest with yourself. Many studies have failed because they could not recruit subjects, and if your grant is funded and you're not getting subjects at the rate you specified, you can really get into trouble. If you overestimate the number of subjects, your budget will be much higher, and if that number isn't realistic, you might not be able to accrue enough subjects. But if you underestimate, you won't have enough power and may not get a significant P value.
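Being honest about recruitment comes down to simple arithmetic on your pilot data. This sketch assumes a constant monthly enrollment rate; the function names, the pilot rate of 4 subjects per month, and the 18-month window are hypothetical.

```python
from math import ceil


def months_to_recruit(target_n: int, pilot_rate_per_month: float) -> int:
    """Months needed to enroll target_n subjects at the observed pilot rate (hypothetical helper)."""
    return ceil(target_n / pilot_rate_per_month)


def required_rate(target_n: int, months_available: int) -> float:
    """Subjects per month you must enroll to reach target_n in the time available (hypothetical helper)."""
    return target_n / months_available


# 90 subjects, as in the interview's example, at a hypothetical pilot rate of 4 per month.
print(months_to_recruit(90, 4))  # 23
# Rate needed to finish enrollment within a hypothetical 18-month window.
print(required_rate(90, 18))     # 5.0
```

If the required rate is well above what the pilot achieved, that gap is worth resolving, for example by adding sites, before the proposal goes in.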
Analysis plan
The analysis plan is where a lot of proposals fall short. First, you always need to state your specific aims and hypotheses. It's hard for people to specify their hypotheses clearly. Often people say, "I just want to look at everything related to sleep," and you need to be much more specific. If you want to make friends with a statistician, try to write as much of your own data analysis plan as you can: at the very least, write out each hypothesis very clearly and state its outcome measure.
Generally, reviewers prefer that you choose a primary outcome measure and a primary statistical model. For example, you might want to see whether CBT (cognitive behavioral therapy) or an SSRI (selective serotonin reuptake inhibitor) has greater benefit over time. As your primary analysis, you could specify a mixed-effects model of Beck depression score over time as a function of time, group, and the time-by-group interaction. Secondary analyses could include the same model with relevant covariates added (e.g., age, sex) or the same model using the Hamilton depression score as the outcome.
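A minimal sketch of this kind of primary model, using the statsmodels MixedLM formula interface in Python (an assumption; the interview names no software). The data are simulated, and the arm sizes, assessment weeks, baseline scores, and weekly changes are all invented for illustration. The model includes a random intercept for each subject, and the time-by-group coefficient estimates whether the two arms differ in their rate of change in Beck score.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical simulated trial: 40 subjects per arm, assessed at weeks 0, 4, 8, 12.
n_per_arm, weeks = 40, [0, 4, 8, 12]
rows = []
for subject in range(2 * n_per_arm):
    group = "CBT" if subject < n_per_arm else "SSRI"
    intercept = 28 + rng.normal(0, 4)         # subject-specific baseline (random intercept)
    slope = -0.8 if group == "CBT" else -1.0  # made-up average weekly change per arm
    for week in weeks:
        beck = intercept + slope * week + rng.normal(0, 3)  # within-subject noise
        rows.append({"subject": subject, "group": group, "time": week, "beck": beck})
df = pd.DataFrame(rows)

# Primary model: Beck score as a function of time, group, and the time-by-group
# interaction, with a random intercept for each subject.
model = smf.mixedlm("beck ~ time * C(group)", data=df, groups=df["subject"])
result = model.fit()
print(result.summary())
```

For a secondary analysis with covariates, you would extend the formula, for example to "beck ~ time * C(group) + age + C(sex)", assuming those columns exist in the data; passing re_formula="~time" to mixedlm would also give each subject a random slope for time.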
Also reviewers like to see a detailed plan for each hypothesis. Don't say you will analyze Hypotheses 2, 3, and 4 similarly to Hypothesis 1. Reviewers would rather see concrete details for each hypothesis.