Will G Hopkins
Auckland University of Technology, Auckland, NZ

Quantitative Data Analysis

Summarizing Data: variables; simple statistics; effect statistics and statistical models; complex models.
Generalizing from Sample to Population: precision of estimate, confidence limits, statistical significance, p value, errors.

Reference: Hopkins WG (2002). Quantitative data analysis (Slideshow). Sportscience 6, /jour/0201/Quantitative_analysis.ppt (2046 words)

Summarizing Data

Data are a bunch of values of one or more variables.
A variable is something that has different values.
Values can be numbers or names, depending on the variable:
  Numeric, e.g. weight
  Counting, e.g. number of injuries
  Ordinal, e.g. competitive level (values are numbers/names)
  Nominal, e.g. sex (values are names)
When values are numbers, visualize the distribution of all values in stem and leaf plots or in a frequency histogram.
  Can also use normal probability plots to visualize how well the values fit a normal distribution.
When values are names, visualize the frequency of each value with a pie chart or just a list of values and frequencies.
A statistic is a number summarizing a bunch of values.
  Simple or univariate statistics summarize values of one variable.
  Effect or outcome statistics summarize the relationship between values of two or more variables.

Simple statistics for numeric variables
  Mean: the average
  Standard deviation: the typical variation
  Standard error of the mean: the typical variation in the mean with repeated sampling
    Multiply by √(sample size) to convert to standard deviation.
  Use these also for counting and ordinal variables.
  Use median (middle value or 50th percentile) and quartiles (25th and 75th percentiles) for grossly non-normally distributed data.
  Summarize these and other simple statistics visually with box and whisker plots.

Simple statistics for nominal variables
  Frequencies, proportions, or odds.
  Can also use these for ordinal variables.

Effect statistics
  Derived from statistical model (equation) of the form Y (dependent) vs X (predictor or independent).
  Depend on type of Y and X. Main ones:

  Y        X        Effect statistics               Model/Test
  numeric  numeric  slope, intercept, correlation   regression
  numeric  nominal  mean difference                 t test, ANOVA
  nominal  nominal  frequency difference or ratio   chi-square
  nominal  numeric  frequency ratio per…            categorical

Model: numeric vs numeric
e.g. body fat vs sum of skinfolds
  Model or test: linear regression
  Effect statistics:
    slope and intercept = parameters
    correlation coefficient or variance explained (= 100 × correlation²) = measures of goodness of fit
  Other statistics:
    typical or standard error of the estimate = residual error = best measure of validity (with criterion variable on the Y axis)

Model: numeric vs nominal
e.g. strength vs sex
  Model or test:
    t test (2 groups)
    1-way ANOVA (>2 groups)
  Effect statistics:
    difference between means, expressed as raw difference, percent difference, or fraction of the root mean square error (Cohen's effect-size statistic)
    variance explained or better √(variance explained/100) = measures of goodness of fit
  Other statistics:
    root mean square error = average standard deviation of the two groups

More on expressing the magnitude of the effect
  What often matters is the difference between means relative to the standard deviation.
  [Figures: distributions of strength for females and males, one showing a trivial effect and one a very large effect]
  Fraction or multiple of a standard deviation is known as the effect-size statistic (or Cohen's "d").
  Cohen suggested thresholds for correlations and effect sizes.
  Hopkins agrees with the thresholds for correlations but suggests others for the effect size.
  [Figure: threshold scales for correlations and for effect sizes]
  For studies of athletic performance, percent differences or changes in the mean are better than Cohen effect sizes.

Model: numeric vs nominal (repeated measures)
e.g. strength vs trial
  [Figure: strength vs trial (pre, post)]
  Model or test:
    paired t test (2 trials)
    repeated-measures ANOVA with one within-subject factor (>2 trials)
  Effect statistics:
    change in mean, expressed as raw change, percent change, or fraction of the pre standard deviation
  Other statistics:
    within-subject standard deviation (not visible on above plot) = typical error: conveys error of measurement
    useful to gauge reliability, individual responses, and magnitude of effects (for measures of athletic performance).

Model: nominal vs nominal
e.g. sport vs sex
  [Figure: frequency of rugby disease, 30% of females and 75% of males]
  Model or test: chi-squared test or contingency table
  Effect statistics:
    Relative frequencies, expressed as a difference in frequencies, ratio of frequencies (relative risk), or ratio of odds (odds ratio)
    Relative risk is appropriate for cross-sectional or prospective designs.
      Risk of having rugby disease for males relative to females is (75/100)/(30/100) = 2.5
    Odds ratio is appropriate for case-control designs.
      Calculated as (75/25)/(30/70) = 7.0
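The two ratios in the rugby-disease example can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are illustrative and not part of the original slideshow.

```python
def relative_risk(cases_a: int, total_a: int, cases_b: int, total_b: int) -> float:
    """Ratio of the proportion affected in group A to the proportion affected in group B."""
    return (cases_a / total_a) / (cases_b / total_b)


def odds_ratio(cases_a: int, total_a: int, cases_b: int, total_b: int) -> float:
    """Ratio of the odds (affected / unaffected) in group A to the odds in group B."""
    odds_a = cases_a / (total_a - cases_a)
    odds_b = cases_b / (total_b - cases_b)
    return odds_a / odds_b


# Numbers from the example: 75 of 100 males and 30 of 100 females affected.
rr = relative_risk(75, 100, 30, 100)
orr = odds_ratio(75, 100, 30, 100)
print(f"relative risk = {rr:.1f}")  # relative risk = 2.5
print(f"odds ratio    = {orr:.1f}")  # odds ratio    = 7.0
```

The same counts give quite different values for the two statistics, which is why the choice between them should follow the study design as described above.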

Model: nominal vs numeric
e.g. heart disease vs age
  Model or test: categorical modeling
  Effect statistics:
    relative risk or odds ratio per unit of the numeric variable (e.g., 2.3 per decade)

Model: ordinal or counts vs whatever
  Can sometimes be analyzed as numeric variables using regression or t tests.
  Otherwise logistic regression or generalized linear modeling.

Complex models
  Most reducible to t tests, regression, or relative frequencies.
  Example…

Model: controlled trial (numeric vs 2 nominals)
e.g. strength vs trial vs group
  [Figure: strength vs trial (pre, post) for drug and placebo groups]
  Model or test:
    unpaired t test of change scores (2 trials, 2 groups)
    repeated-measures ANOVA with within- and between-subject factors (>2 trials or groups)
    Note: use line diagram, not bar graph, for repeated measures.
  Effect statistics:
    difference in change in mean, expressed as raw difference, percent difference, or fraction of the pre standard deviation
  Other statistics:
    standard deviation representing individual responses (derived from within-subject standard deviations in the two groups)

Model: extra predictor variable to "control for something"
e.g. heart disease vs physical activity vs age
  Can't reduce to anything simpler.
  Model or test: multiple linear regression or analysis of covariance (ANCOVA)
    Equivalent to the effect of physical activity with everyone at the same age.
    Reduction in the effect of physical activity on disease when age is included implies age is at least partly the reason or mechanism for the effect.
    Same analysis gives the effect of age with everyone at same level of physical activity.
  Can use special analysis (mixed modeling) to include a mechanism variable in a repeated-measures model. See separate presentation at .

Problem: some models don't fit uniformly for different subjects
  That is, between- or within-subject standard deviations differ between some subjects.
  Equivalently, the residuals are non-uniform (have different standard deviations for different subjects).
  Determine by examining standard deviations or plots of residuals vs predicted values.
  Non-uniformity makes p values and confidence limits wrong.
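One simple way to examine standard deviations is to compute the residuals for each group (each value minus its group mean) and compare their standard deviations. The sketch below uses made-up strength values purely for illustration; the data, the group labels, and the function name are assumptions, not material from the slideshow.

```python
from statistics import mean, stdev

# Hypothetical strength scores for two groups (illustrative values only).
strength = {
    "females": [40, 42, 44, 46, 48],
    "males": [50, 60, 70, 80, 90],
}


def residual_sd_by_group(groups: dict[str, list[float]]) -> dict[str, float]:
    """Standard deviation of the residuals (value minus group mean) in each group."""
    result = {}
    for name, values in groups.items():
        group_mean = mean(values)
        residuals = [value - group_mean for value in values]
        result[name] = stdev(residuals)
    return result


residual_sds = residual_sd_by_group(strength)
ratio = max(residual_sds.values()) / min(residual_sds.values())

for name, sd in residual_sds.items():
    print(f"{name}: residual SD = {sd:.2f}")
print(f"largest/smallest SD ratio = {ratio:.1f}")
```

A ratio well above 1, as in these invented data, suggests the residuals are not uniform across groups. How large a ratio is tolerable is a judgment the sketch does not try to make.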

How to fix
  Use unpaired t test for groups with unequal variances, or
  Try taking log of dependent variable before analyzing, or
  Find some other transformation. As a last resort…
  Use rank transformation: convert dependent variable to ranks before analyzing (= non-parametric analysis, same as Wilcoxon, Kruskal-Wallis and other tests).

Generalizing from a Sample to a Population

You study a sample to find out about the population.
  The value of a statistic for a sample is only an estimate of the true (population) value.
  Express precision or uncertainty in true value using 95% confidence limits.
  Confidence limits represent likely range of the true value.
  They do NOT represent a range of values in different subjects.
  There's a 5% chance the true value is outside the 95% confidence interval: the Type 0 error rate.
  Interpret the observed value and the confidence limits as clinically or practically beneficial, trivial, or harmful.
  Even better, work out the probability that the effect is clinically or practically beneficial/trivial/harmful. See .

Statistical significance is an old-fashioned way of generalizing, based on testing whether the true value could be zero or null.
  Assume the null hypothesis: that the true value is zero (null).
  If your observed value falls in a region of extreme values that would occur only 5% of the time, you reject the null hypothesis.
  That is, you decide that the true value is unlikely to be zero; you can state that the result is statistically significant at the 5% level.
  If the observed value does not fall in the 5% unlikely region, most people mistakenly accept the null hypothesis: they conclude that the true value is zero or null!
  The p value helps you decide whether your result falls in the unlikely region.
    If p<0.05, your result is in the unlikely region.

One meaning of the p value: the probability of a more extreme observed value (positive or negative) when true value is zero.
Better meaning of the p value: if you observe a positive effect, 1 - p/2 is the chance the true value is positive, and p/2 is the chance the true value is negative.
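The "better meaning" stated above is simple arithmetic on the p value and can be sketched as follows. The function name and its arguments are illustrative assumptions; the code only restates the slide's rule and is not a general statistical routine.

```python
def chances_from_p(p_value: float, observed_positive: bool = True) -> tuple[float, float]:
    """Return (chance true value is positive, chance true value is negative).

    Applies the rule from the text: the chance that the true value has the
    same sign as the observed effect is 1 - p/2, and the chance that it has
    the opposite sign is p/2.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    same_sign = 1.0 - p_value / 2.0
    opposite_sign = p_value / 2.0
    if observed_positive:
        return same_sign, opposite_sign
    return opposite_sign, same_sign


# A positive observed effect with p = 0.08.
chance_positive, chance_negative = chances_from_p(0.08)
print(f"chance true value is positive: {chance_positive:.0%}")
print(f"chance true value is negative: {chance_negative:.0%}")
```

Passing `observed_positive=False` swaps the two chances, which covers an observed negative effect.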
  Ditto for a negative effect.
  Example: you observe a 1.5% enhancement of performance (p=0.08). Therefore there is a 96% chance that the true effect is any "enhancement" and a 4% chance that the true effect is any "impairment".
  This interpretation does not take into account trivial enhancements and impairments.
Therefore, if you must use p values, show exact values, not inequalities such as p<0.05.
  Meta-analysts also need the exact p value (or confidence limits).

If the true value is zero, there's a 5% chance of getting statistical significance: the Type I error rate, or rate of false positives or false alarms.
There's also a chance that the smallest worthwhile true value will produce an observed value that is not statistically significant: the Type II error rate, or rate of false negatives or failed alarms.
  In the old-fashioned approach to research design, you are supposed to have enough subjects to make a Type II error rate of 20%: that is, your study is supposed to have a power of 80% to detect the smallest worthwhile effect.
If you look at lots of effects in a study, there's an increased chance of being wrong about at least one of them.
  Old-fashioned statisticians like to control this inflation of the Type I error rate within an