The Kleinbaum Sample Problem

This problem comes from an example in the text: David G. Kleinbaum, Logistic Regression: A Self-Learning Text. New York: Springer-Verlag, 1994, pages 256-257.

The problem is to examine the relationship between the dependent variable, Confidence in the legal system (CONFIDEN), and three independent variables: Social class (CLASS), Number of times victimized (VICTIM), and Age (AGE).

Confidence in the legal system, the dependent variable, is metric, so we could use multiple regression analysis. However, the author opts to convert Confidence in the legal system to a dichotomous variable by dividing the scores above and below the median value of 10. A new dependent variable, High confidence in the legal system (HIGHCONF), was created, where 1 stands for high confidence and 0 stands for low confidence.

The data for this problem are in the file ConfidenceInLegalSystem.Sav.
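The median-split recoding can be sketched in a few lines of Python. The scores below are hypothetical stand-ins for the values in ConfidenceInLegalSystem.Sav, and sending scores exactly at the median to the low group is an assumption, since the text does not say how ties were handled:

```python
# Hypothetical CONFIDEN scores; the real data are in ConfidenceInLegalSystem.Sav.
confiden = [6, 8, 9, 10, 10, 11, 12, 14, 15, 16]
median = 10  # the median value cited in the text

# HIGHCONF: 1 = high confidence (above the median), 0 = low confidence.
# Assumption: scores at the median go to the low-confidence group.
highconf = [1 if score > median else 0 for score in confiden]
print(highconf)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```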
Stage 1: Define the Research Problem

In this stage, the following issues are addressed:
- Relationship to be analyzed
- Specifying the dependent and independent variables
- Method for including independent variables

Relationship to be analyzed

The problem is to examine the relationship between the dependent variable, Confidence in the legal system (CONFIDEN), and three independent variables: Social class (CLASS), Number of times victimized (VICTIM), and Age (AGE).

Specifying the dependent and independent variables

The dependent variable, High confidence in the legal system (HIGHCONF), was created so that 1 stands for high confidence and 0 stands for low confidence.

The independent variables are:
- CLASS: Social class status
- VICTIM: Number of times victimized
- AGE: Age of respondent

CLASS, Social class status, is a nonmetric variable with three response options: 1 = Low, 2 = Medium, and 3 = High. While a case could be made that it can be treated as a scale variable, we will treat it as nonmetric and use the SPSS facility in logistic regression to enter it as a categorical variable.

VICTIM, Number of times victimized, has a range from 0 to 2.
Method for including independent variables

Since we are interested in the relationship between the dependent variable and all of the independent variables, we will use direct entry of the independent variables.

Stage 2: Develop the Analysis Plan: Sample Size Issues

In this stage, the following issues are addressed:
- Missing data analysis
- Minimum sample size requirement: 15-20 cases per independent variable

Missing data analysis

There is no missing data in this problem.

Minimum sample size requirement: 15-20 cases per independent variable

The CLASS Social class status variable has three categories, so dummy coding it will require two variables, bringing the total number of independent variables to 4. The data set has 39 cases and 4 independent variables, for a ratio of 10 to 1, falling short of the requirement of 15-20 cases per independent variable.
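The arithmetic behind the cases-per-predictor check above can be spelled out directly:

```python
# Cases-per-predictor check: CLASS contributes two dummy variables,
# so the model has 4 predictors for the 39 available cases.
n_cases, n_predictors = 39, 4
ratio = n_cases / n_predictors                      # about 10 to 1
required = (15 * n_predictors, 20 * n_predictors)   # 60 to 80 cases desired
print(round(ratio, 2), required)  # 9.75 (60, 80)
```

With only 39 cases against a desired 60 to 80, the shortfall the text notes is clear.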
Stage 2: Develop the Analysis Plan: Measurement Issues

In this stage, the following issues are addressed:
- Incorporating nonmetric data with dummy variables
- Representing curvilinear effects with polynomials
- Representing interaction or moderator effects

Incorporating nonmetric data with dummy variables

The logistic regression procedure will dummy code the nonmetric variables for us.

Representing curvilinear effects with polynomials

We do not have any evidence of curvilinear effects at this point in the analysis.

Representing interaction or moderator effects

We do not have any evidence at this point in the analysis that we should add interaction or moderator variables.

Stage 3: Evaluate Underlying Assumptions

In this stage, the following issues are addressed:
- Nonmetric dependent variable with two groups
- Metric or dummy-coded independent variables

Nonmetric dependent variable with two groups

The dependent variable, HIGHCONF High confidence in the legal system, is a dichotomous variable.

Metric or dummy-coded independent variables

The independent variable CLASS Social class status is nonmetric and will be recoded into two dichotomous variables automatically, using the SPSS option for designating an independent variable as categorical.

The independent variables VICTIM Number of times victimized and AGE Age of respondent are metric variables.
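The recoding SPSS performs is ordinary indicator coding with the last category omitted as the reference (for CLASS: CLASS(1) = Lower, CLASS(2) = Middle, Upper omitted). A minimal sketch, with hypothetical CLASS scores:

```python
def dummy_code(values, categories):
    """Indicator coding: one 0/1 column per category except the last,
    which becomes the omitted reference category (SPSS's default)."""
    return [[1 if v == c else 0 for c in categories[:-1]] for v in values]

class_values = [1, 2, 3, 1, 3]               # hypothetical CLASS scores
codes = dummy_code(class_values, [1, 2, 3])  # columns: CLASS(1), CLASS(2)
print(codes)  # [[1, 0], [0, 1], [0, 0], [1, 0], [0, 0]]
```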
Stage 4: Estimation of Logistic Regression and Assessing Overall Fit: Model Estimation

In this stage, the following issue is addressed:
- Compute the logistic regression model

Compute the logistic regression

The steps to obtain a logistic regression analysis are detailed on the following screens:
- Requesting a Logistic Regression
- Specifying the Dependent Variable
- Specifying the Independent Variables
- Specify the Categorical Independent Variable
- Specify the Method for Entering Variables
- Specifying Options to Include in the Output
- Specifying the New Variables to Save
- Complete the Logistic Regression Request
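SPSS estimates the coefficients by maximum likelihood. As a rough illustration of what the estimation step is doing (not the algorithm SPSS uses, and on made-up data rather than Kleinbaum's), a one-predictor logistic model can be fit by gradient descent on the negative log likelihood:

```python
import math

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1|x) = 1/(1+exp(-(b0+b1*x))) by gradient descent on -LL."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += (p - y)        # d(-LL)/db0 for this case
            g1 += (p - y) * x    # d(-LL)/db1 for this case
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Toy data: larger x tends to go with y = 1, so b1 comes out positive.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
```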
Stage 4: Estimation of Logistic Regression and Assessing Overall Fit: Assessing Model Fit

In this stage, the following issues are addressed:
- Significance test of the model log likelihood (change in -2LL)
- Measures analogous to R²: Cox and Snell R² and Nagelkerke R²
- Hosmer-Lemeshow goodness-of-fit
- Classification matrices
- Check for numerical problems
- Presence of outliers

Categorical variable recoding

At the start of the output, SPSS reports how it dummy coded the variable CLASS Social class status. SPSS does not assign new names to the dummy-coded variables; instead, it refers to them as CLASS(1) and CLASS(2). CLASS(1) corresponds to Lower class, CLASS(2) corresponds to Middle class, and Upper class is the omitted category.

Initial statistics before independent variables are included

The initial log likelihood function (-2 Log Likelihood, or -2LL) is a statistical measure like the total sum of squares in regression. If our independent variables have a relationship to the dependent variable, we will improve our ability to predict the dependent variable accurately, and the log likelihood value will decrease. The initial -2LL value is 54.040 at step 0, before any variables have been added to the model.
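The step 0 value can be reproduced from the group split alone, because with no predictors every case is assigned the overall proportion of 1s. A sketch, assuming a 20/19 split of the 39 cases (the 51.3%/48.7% figures cited with the classification results); the value is the same whichever group is coded 1:

```python
import math

def neg2ll(ys, ps):
    """-2 log likelihood for binary outcomes ys and predicted probabilities ps."""
    return -2.0 * sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                      for y, p in zip(ys, ps))

ys = [1] * 20 + [0] * 19              # assumed 20/19 split of the 39 cases
p_null = sum(ys) / len(ys)            # 20/39, about 0.513
baseline = neg2ll(ys, [p_null] * len(ys))
print(round(baseline, 2))  # 54.04, matching the initial -2LL SPSS reports
```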
Significance test of the model log likelihood

The difference between the initial -2LL and the -2LL after the independent variables are entered is the model chi-square value (17.863 = 54.040 - 36.177), which is tested for statistical significance. This test is analogous to the F-test for R² or change in R² in multiple regression, which tests whether or not the improvement in the model associated with the additional variables is statistically significant.

In this problem, the model chi-square value of 17.863 has a significance of 0.001, less than 0.05, so we conclude that there is a significant relationship between the dependent variable and the set of independent variables.

Measures analogous to R²

The next SPSS outputs indicate the strength of the relationship between the dependent variable and the independent variables, analogous to the R² measures in multiple regression.

The relationship between the dependent variable and the independent variables is strong, indicated by the Nagelkerke R² value of 0.490. Using the interpretive criteria for R², we would characterize this relationship as strong.
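Both pseudo-R² measures follow directly from the two -2LL values. A sketch using the figures from the text (n = 39 cases):

```python
import math

null_2ll, model_2ll, n = 54.040, 36.177, 39   # figures reported in the text

model_chi_square = null_2ll - model_2ll            # 17.863
cox_snell = 1 - math.exp(-model_chi_square / n)    # 1 - (L0/L1)^(2/n)
max_cox_snell = 1 - math.exp(-null_2ll / n)        # Cox-Snell's upper bound
nagelkerke = cox_snell / max_cox_snell             # rescaled to a 0-1 range
print(round(model_chi_square, 3), round(nagelkerke, 3))  # 17.863 0.49
```

The Nagelkerke value reproduces the 0.490 SPSS reports, which is why it is preferred for a strength-of-relationship interpretation: unlike Cox and Snell, it can reach 1.0.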
Correspondence of actual and predicted values of the dependent variable

The final measure of model fit is the Hosmer and Lemeshow goodness-of-fit statistic, which measures the correspondence between the actual and predicted values of the dependent variable. In this case, better model fit is indicated by a smaller difference between the observed and predicted classifications. A good model fit is indicated by a nonsignificant chi-square value.

The goodness-of-fit measure has a value of 5.507, which has the desirable outcome of nonsignificance.
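The statistic itself compares observed and expected event counts within groups of cases sorted by predicted probability. A minimal sketch on hypothetical, roughly calibrated predictions (not Kleinbaum's data); smaller values indicate better fit:

```python
def hosmer_lemeshow(ys, ps, groups=10):
    """Hosmer-Lemeshow chi-square: sort cases by predicted probability,
    split into groups, compare observed vs expected event counts per group."""
    pairs = sorted(zip(ps, ys))
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        size = len(chunk)
        chi2 += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return chi2

# Hypothetical predictions and outcomes that agree reasonably well.
ps = [i / 21 for i in range(1, 21)]
ys = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
chi2 = hosmer_lemeshow(ys, ps)
```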
The classification matrices

The classification matrices in logistic regression serve the same function as the classification matrices in discriminant analysis, i.e. evaluating the accuracy of the model.

If the predicted and actual group memberships are the same, i.e. 1 and 1 or 0 and 0, then the prediction is accurate for that case. If predicted group membership and actual group membership are different, the model misses for that case. The overall percentage of accurate predictions (71.8% in this case) is the measure of a model that I rely on most heavily, for this analysis as well as for discriminant analysis, because it has a meaning that is readily communicated, i.e. the percentage of cases for which our model predicts accurately.

To evaluate the accuracy of the model, we compute the proportional by chance accuracy rate and, if appropriate, the maximum by chance accuracy rate. The proportional by chance accuracy rate is equal to 0.500 (0.487² + 0.513²). A 25% increase over the proportional by chance accuracy rate would equal 0.625. Our model accuracy rate of 71.79% exceeds this criterion.

With 51% of the cases in one group and 49% of the cases in the other group, we do not have a dominant category that would require us to compare our results to the maximum by chance accuracy rate.
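The by-chance benchmark above is simple arithmetic on the group proportions:

```python
p1, p0 = 0.513, 0.487          # group proportions from the text
by_chance = p1 ** 2 + p0 ** 2  # proportional by chance accuracy rate
criterion = 1.25 * by_chance   # the 25%-improvement criterion
model_accuracy = 0.7179        # the model's overall accuracy rate
print(round(by_chance, 3), round(criterion, 3), model_accuracy > criterion)
# 0.5 0.625 True
```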
The stacked histogram

SPSS provides a visual image of the classification accuracy in the stacked histogram. To the extent that the cases in one group cluster on the left and the other group clusters on the right, the predictive accuracy of the model will be higher. As we can see in this plot, there is some overlap between the two groups.

Check for numerical problems

There are several numerical problems that can occur in logistic regression that are not detected by SPSS or other statistical packages: multicollinearity among the independent variables, zero cells for a dummy-coded independent variable because all of the subjects have the same value for the variable, and complete separation, whereby the two groups of the dependent event variable can be perfectly separated by scores on one of the independent variables.

All of these problems produce large standard errors (over 2) for the variables included in the analysis, and very often produce very large B coefficients as well. If we encounter large standard errors for the predictor variables, we should examine frequency tables, one-way ANOVAs, and correlations for the variables involved to try to identify the source of the problem.

The standard errors and B coefficients are not excessively large, so there is no evidence of a numerical problem with this analysis.
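The screening rule above (flag any predictor whose standard error exceeds 2) is easy to automate. The coefficient table here is a hypothetical placeholder, not the SPSS output for this problem:

```python
# Hypothetical (B, SE) pairs illustrating the SE > 2 screening rule.
coefs = {"CLASS(1)": (-0.52, 0.91), "CLASS(2)": (0.31, 0.84),
         "VICTIM": (-1.44, 0.58), "AGE": (0.18, 0.07)}
flagged = [name for name, (b, se) in coefs.items() if se > 2.0]
print(flagged)  # [] -> no standard error exceeds 2, so no numerical red flag
```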
38、 that we might consider excluding from the analysis: listing of residuals and saving Cooks distance scores to the data set.SPSS provides a casewise list of residuals that identify cases whose residual is above or below a certain number of standard deviation units. Like multiple regression there are
39、a variety of ways to compute the residual. In logistic regression, the residual is the difference between the observed probability of the dependent variable event and the predicted probability based on the model. The standardized residual is the residual divided by an estimate of its standard deviat
40、ion. The deviance is calculated by taking the square root of -2 x the log of the predicted probability for the observed group and attaching a negative sign if the event did not occur for that case. Large values for deviance indicate that the model does not fit the case well. The studentized residual
41、 for a case is the change in the model deviance if the case is excluded. Discrepancies between the deviance and the studentized residual may identify unusual cases. (See the SPSS chapter on Logistic Regression Analysis for additional details).In the output for our problem, SPSS informs us that there
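The raw, standardized, and deviance residuals described above can be computed per case directly from these definitions (a sketch, not SPSS's internal code):

```python
import math

def residuals(y, p):
    """Raw, standardized, and deviance residuals for one case, following
    the definitions paraphrased in the text (y is 0/1, p = predicted P(y=1))."""
    raw = y - p
    standardized = raw / math.sqrt(p * (1 - p))
    p_observed = p if y == 1 else 1 - p      # probability of the observed group
    deviance = math.copysign(math.sqrt(-2 * math.log(p_observed)), raw)
    return raw, standardized, deviance

print(residuals(1, 0.9))   # well-fit case: small residuals
print(residuals(1, 0.1))   # poorly fit case: large residuals on every measure
```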
Cook's distance

SPSS has an option to compute Cook's distance as a measure of influential cases and add the score to the data editor. I am not aware of a precise formula for determining what cutoff value should be used, so we will rely on the more traditional method for interpreting Cook's distance, which is to identify cases that either have a score of 1.0 or higher, or cases whose Cook's distance is substantially different from the others. The prescribed method for detecting unusually large Cook's distance scores is to create a scatterplot of Cook's distance scores versus case id:
- Request the Scatterplot
- Specifying the Variables for the Scatterplot
- The Scatterplot of Cook's Distances

On the plot of Cook's distances, we see no cases that exceed the 1.0 rule of thumb for influential cases. We do, however, identify cases that have relatively larger Cook's distance values (above 0.6) than the majority of cases. However, with the small sample size we have in this problem, I am not inclined to remove any cases unless they were extreme outliers or influential cases.
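The same two-tier screen (at or above 1.0, or merely larger than the rest) can be applied to the saved scores directly. The case ids and distances below are hypothetical, mirroring the pattern seen on the plot:

```python
# Hypothetical saved Cook's distance scores keyed by case id.
cooks = {1: 0.05, 7: 0.12, 15: 0.64, 22: 0.03, 30: 0.61}
over_one = [i for i, d in cooks.items() if d >= 1.0]        # influential cases
notable = [i for i, d in cooks.items() if 0.6 <= d < 1.0]   # merely larger
print(over_one, notable)  # [] [15, 30]
```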
Stage 5: Interpret the Results

In this section, we address the following issues:
- Identifying the statistically significant predictor variables
- Direction of relationship and contribution to the dependent variable

Identifying the statistically significant predictor variables

The coefficients are found in the column labeled B, and the test that a coefficient is not zero, i.e. that it changes the odds of the dependent variable event, is the Wald statistic, instead of the t-test that was used for the individual B coefficients in the multiple regression equation.

Only the variables VICTIM Number of times victimized and AGE Age of respondent have a statistically significant individual relationship with the dependent variable.
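The Wald statistic is simply (B / SE)², compared against a chi-square distribution with 1 degree of freedom (critical value about 3.84 at p = .05). In the sketch below, the B values are recovered from the Exp(B) figures reported in the text, but the standard errors are hypothetical placeholders:

```python
import math

b_victim = math.log(0.236)      # Exp(B) = 0.236  ->  B is about -1.444
b_age = math.log(1.2)           # Exp(B) = 1.2    ->  B is about  0.182
se_victim, se_age = 0.55, 0.08  # hypothetical standard errors

wald_victim = (b_victim / se_victim) ** 2  # compare against 3.84 (1 df, p=.05)
wald_age = (b_age / se_age) ** 2
```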
Direction of relationship and contribution to the dependent variable

The predictor variable with the strongest relationship is VICTIM. The negative sign of the B coefficient and the value of Exp(B) less than 1.0 both indicate that the relationship is inverse: the more times one is victimized, the less likely one is to have high confidence in the legal system.

With an inverse relationship, it may make more sense to invert the odds ratio (1 / odds ratio) and interpret the odds of not belonging to the dependent variable group assigned the code of 1. In this problem, we could say that every time a person is victimized, they are 4.2 times less likely to have high confidence in the legal system (1 / 0.236 = 4.2).

AGE has a direct relationship with confidence in the legal system: as one gets older, one's confidence in the legal system increases. For every 1-year increase in age, the odds of having high confidence in the legal system increase 1.2 times.
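The odds-ratio arithmetic above is just Exp(B) and its reciprocal:

```python
exp_b_victim = 0.236           # Exp(B) for VICTIM, from the text
inverted = 1 / exp_b_victim    # odds against high confidence per victimization
exp_b_age = 1.2                # Exp(B) for AGE: odds multiplier per year
print(round(inverted, 1))      # 4.2
```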
Stage 6: Validate the Model

When we have a small sample in the full data set, as we do in this problem, a split-half validation analysis is almost guaranteed to fail, because we will have little power to detect statistical differences in the analyses of the validation samples. In this circumstance, our alternative is to conduct validation analyses with random samples that comprise the majority of the sample. We will demonstrate this procedure in the following steps:
- Computing the first validation analysis
- Computing the second validation analysis
- The output for the validation analyses

Computing the first validation analysis

We set the random number seed and modify our selection variable so that it selects about 75-80% of the sample:
- Set the Starting Point for Random Number Generation
- Compute the Variable to Select a Large Proportion of the Data Set
- Specify the Cases to Include in the First Validation Analysis
- Specify the Value of the Selection Variable for the First Validation Analysis

Computing the second validation analysis

We reset the random number seed to another value and modify our selection variable so that it selects about 75-80% of the sample:
- Set the Starting Point for Random Number Generation
- Compute the Variable to Select a Large Proportion of the Data Set
- Specify the Cases to Include in the Second Validation Analysis
- Specify the Value of the Selection Variable for the Second Validation Analysis
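The selection step can be sketched outside SPSS as a seeded uniform draw that flags roughly 75-80% of the 39 cases for the learning sample; the seed value here is arbitrary, not the one used in the text:

```python
import random

random.seed(743)  # hypothetical seed; the text sets a new seed for each split
# Selection variable: 1 flags a case for the ~77% learning sample, 0 holds it out.
split = {case: 1 if random.random() < 0.77 else 0 for case in range(1, 40)}
learning = [case for case, flag in split.items() if flag == 1]
holdout = [case for case, flag in split.items() if flag == 0]
```

Rerunning with a different seed produces the second, independent validation split.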
Generalizability of the logistic regression model

We can summarize the results of the validation analyses in the following table:

                                       Full Model       Split1 = 1       Split2 = 1
Model chi-square                       17.863, p=.0013  17.230, p=.0017  10.550, p=.0321
Nagelkerke R²                          .490             .614             .385
Accuracy rate, learning sample         71.79%           85.71%           74.19%
Accuracy rate, validation sample       -                45.45%           87.50%
Significant coefficients (p < 0.05)    VICTIM, AGE      VICTIM           AGE

It is difficult to do a validation analysis with such a small sample in the full model. Based on this evidence, we cannot conclude that the model is generalizable, because none of the independent variables appear in both of the validation analyses, and the accuracy rate for the validation samples is inconsistent across the two analyses (45.45% versus 87.50%).