




A PowerPoint Presentation Package to Accompany
Applied Statistics in Business & Economics, 4th edition

Chapter 4: Descriptive Statistics

Chapter Contents
4.1 Numerical Description
4.2 Measures of Center
4.3 Measures of Variability
4.4 Standardized Data
4.5 Percentiles, Quartiles, and Box Plots
4.6 Correlation and Covariance
4.7 Grouped Data
4.8 Skewness and Kurtosis

Chapter Learning Objectives
LO4-1: Explain the concepts of center, variability, and shape.
LO4-2: Use Excel to obtain descriptive statistics and visual displays.
LO4-3: Calculate and interpret common measures of center.
LO4-4: Calculate and interpret common measures of variability.
LO4-5: Transform a data set into standardized values.
LO4-6: Apply the Empirical Rule and recognize outliers.
LO4-7: Calculate quartiles and other percentiles.
LO4-8: Make and interpret box plots.
LO4-9: Calculate and interpret a correlation coefficient and covariance.
LO4-10: Calculate the mean and standard deviation from grouped data.
LO4-11: Assess skewness and kurtosis in a sample.

4.1 Numerical Description (LO4-1, LO4-2)

Three key characteristics of numerical data: center, variability, and shape.

[Figure: Excel histogram display for Table 4.3]

4.2 Measures of Center (LO4-3)

Mean
A familiar measure of center.
Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
In Excel, use the function =AVERAGE(Data), where Data is an array of data values (Formulas > Insert Function (fx) > Statistical > AVERAGE).
The mean balances the positive and negative deviations $(x_i - \bar{x})$ from the mean, in that $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
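The balancing property of the mean can be checked numerically. The following is a minimal sketch, assuming a small made-up sample (the values in `data` are hypothetical and do not come from the slides):

```python
from statistics import mean

# Hypothetical sample values, chosen only for illustration.
data = [4, 8, 15, 16, 23, 42]

# Sample mean: the sum of the observations divided by n.
x_bar = mean(data)

# Deviations from the mean: some positive, some negative.
deviations = [x - x_bar for x in data]

# The positive and negative deviations cancel, so the total is zero
# (up to floating-point rounding for non-integer data).
total_deviation = sum(deviations)

print(x_bar, deviations, total_deviation)
```

This cancellation is the reason that measures of variability square the deviations or take their absolute values instead of summing them directly.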
Median
The median (M) is the 50th percentile or midpoint of the sorted sample data.
- M separates the upper and lower halves of the sorted observations.
- If n is odd, the median is the middle observation in the data array.
- If n is even, the median is the average of the middle two observations in the data array.
In Excel: Formulas > Insert Function (fx) > Statistical > MEDIAN.

Mode
The mode is the most frequently occurring data value.
- A data set may have multiple modes or no mode.
- The mode is most useful for discrete or categorical data with only a few distinct data values. For continuous data or data with a wide range, the mode is rarely useful.

Shape (LO4-1)
Compare the mean and median, or look at the histogram, to determine the degree of skewness. Figure 4.10 shows prototype population shapes with varying degrees of skewness.

Geometric Mean
The geometric mean (G) is a multiplicative average: $G = \sqrt[n]{x_1 x_2 \cdots x_n}$
In Excel: Formulas > Insert Function (fx) > Statistical > GEOMEAN.

Growth Rates
A variation on the geometric mean is used to find the average growth rate for a time series: $GR = \sqrt[n-1]{x_n / x_1} - 1$

For example, from 2006 to 2010, JetBlue Airlines revenues are:

Year | Revenue (mil)
2006 | 2,361
2007 | 2,843
2008 | 3,392
2009 | 3,292
2010 | 3,779

The average growth rate is $GR = \sqrt[4]{3779 / 2361} - 1 = 0.125$, or 12.5% per year.

Midrange
The midrange is the point halfway between the lowest and highest values of x: $(x_{min} + x_{max}) / 2$. It is easy to use but sensitive to extreme data values.
For the J.D. Power quality data, the midrange (126.5) is higher than the mean (114.70) or median (113).

Trimmed Mean
To calculate the trimmed mean, first remove the highest and lowest k percent of the observations.
For example, for the n = 33 P/E ratios, we want a 5 percent trimmed mean (i.e., k = .05). To determine how many observations to trim, multiply k by n, which is 0.05 x 33 = 1.65, rounded to 2 observations. So, we would remove the two smallest and two largest observations before averaging the remaining values.
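The growth-rate and trimmed-mean calculations above can be sketched in Python. This is a minimal illustration, not the textbook's own implementation: the function names `average_growth_rate` and `trimmed_mean` are hypothetical, the revenue figures are the JetBlue values quoted above, and the trimming count follows the slide's rule of rounding k x n to the nearest whole observation (spreadsheet functions may count trimmed points differently).

```python
def average_growth_rate(series):
    """Geometric-mean growth rate: (x_n / x_1) ** (1 / (n - 1)) - 1."""
    n = len(series)
    return (series[-1] / series[0]) ** (1 / (n - 1)) - 1


def trimmed_mean(values, k):
    """Drop round(k * n) observations from each end, then average the rest.

    Assumption: the number trimmed per tail is k * n rounded to the
    nearest integer, as in the slide's 0.05 x 33 = 1.65 -> 2 example.
    """
    ordered = sorted(values)
    trim = round(k * len(ordered))
    kept = ordered[trim:len(ordered) - trim]
    return sum(kept) / len(kept)


# JetBlue revenues (millions), 2006 through 2010, from the example above.
revenue = [2361, 2843, 3392, 3292, 3779]
growth = average_growth_rate(revenue)
print(f"Average growth rate: {growth:.1%}")  # 12.5%

# Number of observations trimmed from each tail when n = 33 and k = 0.05.
print(round(0.05 * 33))  # 2
```

Only the first and last revenue values enter the growth-rate formula; the intermediate years affect the path but not the average rate.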
Here is a summary of all the measures of central tendency for the J.D. Power data. The trimmed mean mitigates the effects of very high values, but still exceeds the median.

Measure | Value | Excel
Mean | 114.70 | =AVERAGE(Data)
Median | 113 | =MEDIAN(Data)
Mode | 111 | =MODE.SNGL(Data)
Geometric mean | 113.35 | =GEOMEAN(Data)
Midrange | 126.5 | =(MIN(Data)+MAX(Data))/2
5% trimmed mean | 113.94 | =TRIMMEAN(Data, 0.1)

4.3 Measures of Variability (LO4-4)

Variation is the "spread" of data points about the center of the distribution in a sample. Consider the following measures of variability:

Statistic | Formula | Excel | Pro | Con
Range | $x_{max} - x_{min}$ | =MAX(Data)-MIN(Data) | Easy to calculate. | Sensitive to extreme data values.
Sample variance ($s^2$) | $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ | =VAR.S(Data) | Plays a key role in mathematical statistics. | Nonintuitive meaning.
Sample standard deviation ($s$) | $\sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ | =STDEV.S(Data) | Most common measure. Uses the same units as the raw data ($, grams, etc.). | Nonintuitive meaning.
Sample coefficient of variation (CV) | $100 \times \frac{s}{\bar{x}}$ | None | Measures relative variation in percent, so data sets can be compared. | Requires nonnegative data.
Mean absolute deviation (MAD) | $\frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$ | =AVEDEV(Data) | Easy to understand. | Lacks "nice" theoretical properties.

Variance and Standard Deviation
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ (Formulas > Insert Function (fx) > Statistical > VAR.P)
Population standard deviation: $\sigma = \sqrt{\sigma^2}$ (Formulas > Insert Function (fx) > Statistical > STDEV.P)

Coefficient of Variation
- Useful for comparing variables measured in different units or with different means.
- A unit-free measure of dispersion.
- Expressed as a percent of the mean.
- Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.

Mean Absolute Deviation
- This statistic reveals the average distance from the center.
- Absolute values must be used since otherwise the deviations around the mean would sum to zero.
- It is stated in the unit of measurement.
- The MAD is appealing because of its simple interpretation.
In Excel: Formulas > Insert Function (fx) > Statistical > AVEDEV.
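The measures of variability in Section 4.3 can be computed with Python's standard `statistics` module. The sketch below uses a small made-up sample (the values in `data` are hypothetical); `variance` and `stdev` use the n - 1 divisor, matching =VAR.S and =STDEV.S, while `pvariance` uses N, matching =VAR.P.

```python
from statistics import mean, pvariance, stdev, variance

# Hypothetical sample values, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = mean(data)

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Sample variance and standard deviation (divisor n - 1).
s2 = variance(data)
s = stdev(data)

# Population variance (divisor N), for comparison.
sigma2 = pvariance(data)

# Coefficient of variation: standard deviation as a percent of the mean.
# Only meaningful here because the data are nonnegative and the mean is positive.
cv = 100 * s / x_bar

# Mean absolute deviation: average absolute distance from the mean.
mad = sum(abs(x - x_bar) for x in data) / len(data)

print(data_range, s2, s, sigma2, cv, mad)
```

The sample variance is always somewhat larger than the population variance computed from the same numbers, because it divides the same sum of squares by n - 1 instead of N.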
Central Tendency vs. Dispersion: Manufacturing (LO4-1)
Take frequent samples to monitor quality.

4.4 Standardized Data

Chebyshev's Theorem
For any population with mean $\mu$ and standard deviation $\sigma$, the percentage of observations that lie within k standard deviations of the mean must be at least $100\,[1 - 1/k^2]$.
- For k = 2 standard deviations, $100\,[1 - 1/2^2] = 75\%$, so at least 75.0% will lie within $\mu \pm 2\sigma$.
- For k = 3 standard deviations, $100\,[1 - 1/3^2] = 88.9\%$, so at least 88.9% will lie within $\mu \pm 3\sigma$.
Although applicable to any data set, these limits tend to be rather wide.

The Empirical Rule
The normal distribution is symmetric and is also known as the bell-shaped curve. The Empirical Rule states that for data from a normal distribution, we expect the interval $\mu \pm k\sigma$ to contain a known percentage of data. For
- k = 1, 68.26% will lie within $\mu \pm 1\sigma$
- k = 2, 95.44% will lie within $\mu \pm 2\sigma$
- k = 3, 99.73% will lie within $\mu \pm 3\sigma$
Note: No upper bound is given. Data values outside $\mu \pm 3\sigma$ are rare.

Standardized Data (LO4-5)
A standardized variable (Z) redefines each observation in terms of the number of standard deviations from the mean. A negative z-value means the observation is to the left of the mean. A positive z-value means the observation is to the right of the mean.
Standardization formula for a population: $z_i = \frac{x_i - \mu}{\sigma}$
Standardization formula for a sample (for n > 30): $z_i = \frac{x_i - \bar{x}}{s}$
In Excel: Formulas > Insert Function (fx) > Statistical > STANDARDIZE.

Estimating Sigma (LO4-6)
- For a normal distribution, the range of values is almost $6\sigma$ (from $\mu - 3\sigma$ to $\mu + 3\sigma$).
- If you know the range R (high - low), you can estimate the standard deviation as $\sigma = R/6$.
- This is useful for approximating the standard deviation when only R is known.
- This estimate depends on the assumption of normality.

4.5 Percentiles, Quartiles, and Box Plots (LO4-7)

Percentiles
- Percentiles are data that have been divided into 100 groups. For example, you score in the 83rd percentile on a standardized test. That means that 83% of the test-takers scored below you.
- Deciles are data that have been divided into 10 groups.
- Quintiles are data that have been divided into 5 groups.
- Quartiles are data that have been divided into 4 groups.

Percentiles may be used to establish benchmarks for comparison purposes (e.g., health care, manufacturing, and banking industries use 5th, 25th, 50th, 75th, and 90th percentiles). Quartiles (25, 50, and 75 percent) are commonly used to assess financial performance and stock portfolios. Percentiles can be used in employee merit evaluation and salary benchmarking.
In Excel: Formulas > Insert Function (fx) > Statistical > PERCENTILE.INC
In Excel: Formulas > Insert Function (fx) > Statistical > QUARTILE.INC
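The Section 4.4 results (Chebyshev's bound, standardized values, and the range-based estimate of sigma) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the sample in `data` is made up, and `standardize` uses the sample formula with the n - 1 standard deviation.

```python
from statistics import mean, stdev


def chebyshev_lower_bound(k):
    """Minimum percent of observations within k standard deviations: 100 * (1 - 1/k^2)."""
    return 100 * (1 - 1 / k**2)


def standardize(values):
    """Sample z-scores: (x - x_bar) / s for each observation."""
    x_bar = mean(values)
    s = stdev(values)
    return [(x - x_bar) / s for x in values]


def estimate_sigma_from_range(low, high):
    """Rough sigma estimate R / 6; assumes the data are approximately normal."""
    return (high - low) / 6


print(chebyshev_lower_bound(2))            # 75.0
print(round(chebyshev_lower_bound(3), 1))  # 88.9

# Hypothetical sample values, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
z_scores = standardize(data)

# Negative z: observation lies left of the mean; positive z: right of the mean.
print([round(z, 2) for z in z_scores])
```

Comparing the Chebyshev figures (75% and 88.9%) with the Empirical Rule figures (95.44% and 99.73%) shows how much wider the distribution-free limits are than those that assume normality.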
Quartiles
Quartiles are scale points that divide the sorted data into four groups of approximately equal size. The three values that separate the four groups are called Q1, Q2, and Q3, respectively.

Lower 25% | Q1 | Second 25% | Q2 | Third 25% | Q3 | Upper 25%

The second quartile Q2 is the median, a measure of central tendency.

Lower 50% | Q2 | Upper 50%

Q1 and Q3 measure dispersion, since the interquartile range Q3 - Q1 measures the degree of spread in the middle 50 percent of data values.

Lower 25% | Q1 | Middle 50% | Q3 | Upper 25%

Quartiles: The Method of Medians
The first quartile Q1 is the median of the data values below Q2, and the third quartile Q3 is the median of the data values above Q2.
- For the first half of the data, 50% lie above and 50% lie below Q1.
- For the second half of the data, 50% lie above and 50% lie below Q3.

For small data sets, find quartiles using the method of medians:
Step 1: Sort the observations.
Step 2: Find the median Q2.
Step 3: Find the median of the data values that lie below Q2.
Step 4: Find the median of the data values that lie above Q2.

Example: P/E Ratios and Quartiles
So, to summarize:

Lower 25% of P/E ratios | Q1 = 27 | Second 25% of P/E ratios | Q2 = 35.5 | Third 25% of P/E ratios | Q3 = 40.5 | Upper 25% of P/E ratios

These quartiles express central tendency and dispersion. What is the interquartile range?

Box Plots (LO4-8)
- A useful tool of exploratory data analysis (EDA).
- Also called a box-and-whisker plot.
- Based on a five-number summary: $x_{min}$, Q1, Q2, Q3, $x_{max}$.

Consider the five-number summary for the previous P/E ratios example:

$x_{min}$ | Q1 | Q2 | Q3 | $x_{max}$
7 | 27 | 35.5 | 40.5 | 49

A box plot shows variability and shape.

[Figure: box plot and dot plot from MegaStat descriptive statistics]

Box Plots: Fences and Unusual Data Values
Use quartiles to detect unusual data points by defining fences using the following formulas:

Fence | Inner fences | Outer fences
Lower fence | $Q_1 - 1.5\,(Q_3 - Q_1)$ | $Q_1 - 3.0\,(Q_3 - Q_1)$
Upper fence | $Q_3 + 1.5\,(Q_3 - Q_1)$ | $Q_3 + 3.0\,(Q_3 - Q_1)$

Values outside the inner fences are unusual, while those outside the outer fences are outliers.

For example, consider the P/E ratio data:

Fence | Inner fences | Outer fences
Lower fence | 107 - 1.5 (126 - 107) = 78.5 | 107 - 3.0 (126 - 107) = 50
Upper fence | 126 + 1.5 (126 - 107) = 154.5 | 126 + 3.0 (126 - 107) = 183

There is one outlier (170) that lies above the inner fence. There are no extreme outliers that exceed the outer fence.

Truncate the whisker at the fences and display unusual values and outliers as dots. Based on these fences, there is only one outlier.

Box Plots: Midhinge
The midhinge is the average of the first and third quartiles. The name midhinge derives from the idea that, if the "box" were folded in half, it would resemble a "hinge".
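The method of medians and the fence formulas can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and for an odd number of observations the middle value is assumed to be excluded from both halves (one reading of "values that lie below Q2" and "above Q2"). The quartile inputs 107 and 126 and the value 170 are taken from the fence example above.

```python
from statistics import median


def quartiles_by_medians(values):
    """Return (Q1, Q2, Q3) using the method of medians."""
    ordered = sorted(values)               # Step 1: sort the observations
    n = len(ordered)
    q2 = median(ordered)                   # Step 2: median of all values
    half = n // 2
    lower = ordered[:half]                 # values below Q2
    upper = ordered[half + (n % 2):]       # values above Q2 (skip middle value if n is odd)
    return median(lower), q2, median(upper)  # Steps 3 and 4


def fences(q1, q3):
    """Inner and outer fences as (lower, upper) pairs."""
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }


def classify(x, q1, q3):
    """Label a value by which fences it falls outside."""
    f = fences(q1, q3)
    if x < f["outer"][0] or x > f["outer"][1]:
        return "beyond outer fence"
    if x < f["inner"][0] or x > f["inner"][1]:
        return "beyond inner fence"
    return "inside fences"


print(fences(107, 126))         # inner (78.5, 154.5), outer (50.0, 183.0)
print(classify(170, 107, 126))  # beyond inner fence

# Interquartile range and midhinge for the P/E quartiles Q1 = 27, Q3 = 40.5.
print(40.5 - 27, (27 + 40.5) / 2)  # 13.5 33.75
```

Other quartile conventions, including interpolation-based spreadsheet functions, can give slightly different Q1 and Q3 values for the same data, so fences computed from them may differ as well.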